Automated Development of a Grammatical Dictionary for Georgian Dialects
DOI:
https://doi.org/10.33422/ejest.v8i1.1553
Keywords:
Acquisition of Lexicon, Agglutinative Languages, Language Modelling, Lemmatization Rules, Morphological Analysis
Abstract
This paper presents an automated system for compiling grammatical dictionaries of Georgian and its dialects. Unlike traditional dictionaries, a grammatical dictionary records not only base word forms but also their complete paradigms, together with detailed morphological and syntactic information. This is particularly important for agglutinative-inflectional languages such as Georgian, where word forms vary significantly with context. The system takes a dictionary-based approach: it expands lexical resources by identifying words that share grammatical markers, and it integrates an innovative lemmatization algorithm that handles unknown words by automatically generating their base forms and paradigms. The methodology builds on prior research in dialectal lexicography and syntactic annotation of Georgian corpora, and draws comparative insights from similar linguistic technologies applied to other agglutinative languages. In tests on Georgian literary corpora, the system automated dictionary creation efficiently: only 2% of non-dictionary word forms required manual correction after lemmatization. The affix-based algorithm significantly outperformed traditional suffix-only methods, particularly on complex morphological structures. These results confirm the system's effectiveness in expanding lexical resources and highlight its adaptability to other Kartvelian languages. The study emphasizes the value of integrating linguistic theory with computational approaches to morphological processing and lexicon development, offering both theoretical contributions and practical applications in language technology.
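To make the idea of lemmatizing an unknown word and generating its paradigm concrete, the following is a minimal sketch, not the authors' algorithm, whose details the abstract does not specify. It assumes a toy table of case endings for consonant-stem Georgian nouns and handles suffixes only, so it illustrates the simpler baseline rather than the affix-based method the paper reports as superior. The names `CASE_ENDINGS`, `candidate_stems`, `paradigm`, and `lemmatize` are hypothetical.

```python
# Illustrative case endings for consonant-stem Georgian nouns.
# This table is an assumption for the sketch, not the paper's rule set.
CASE_ENDINGS = {
    "nominative": "ი",
    "ergative": "მა",
    "dative": "ს",
    "genitive": "ის",
    "instrumental": "ით",
    "adverbial": "ად",
    "vocative": "ო",
}


def candidate_stems(word_form):
    """Return (stem, case) pairs found by stripping each matching ending."""
    pairs = []
    for case, ending in CASE_ENDINGS.items():
        if word_form.endswith(ending) and len(word_form) > len(ending):
            pairs.append((word_form[: -len(ending)], case))
    return pairs


def paradigm(stem):
    """Generate every case form of a stem from the ending table."""
    return {case: stem + ending for case, ending in CASE_ENDINGS.items()}


def lemmatize(word_form, known_stems=frozenset()):
    """Guess the nominative base form of a possibly unknown word.

    Stems already in the dictionary win; otherwise the candidate produced
    by stripping the longest ending is chosen (a simple heuristic).
    """
    pairs = candidate_stems(word_form)
    if not pairs:
        return None
    for stem, _case in pairs:
        if stem in known_stems:
            return stem + CASE_ENDINGS["nominative"]
    shortest_stem = min((stem for stem, _case in pairs), key=len)
    return shortest_stem + CASE_ENDINGS["nominative"]


if __name__ == "__main__":
    lemma = lemmatize("კაცის")  # genitive form of "man"
    print(lemma)
    print(paradigm(lemma[:-1]))
```

The longest-ending heuristic resolves ambiguities such as a form ending in both the genitive and the dative marker. A real system would also need vowel-stem nouns, stem alternations, and verbal prefixes, which this sketch omits.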
License
Copyright (c) 2025 Liana L Lortkipanidze, Anna R Chutkerashvili

This work is licensed under a Creative Commons Attribution 4.0 International License.