The purpose of this paper is to construct a dataset of Mongolian near-synonymous compound qualitative adjectives to support semantic analysis and the development of semantic networks for the Mongolian language. By analyzing the characteristics of Mongolian compound adjectives, the paper examines the current challenges in the study of their synonymy. We apply three methods, based on dictionaries, word embeddings, and pre-trained language models respectively, and construct 76 near-synonym sets containing 450 compound adjectives. The experiments show that the dictionary-based method is more accurate but has limited recall. The word-embedding-based method can discover new near-synonyms but is limited by data quality. The pre-trained language model method underperforms in complex contexts. Future work will expand dictionary resources, optimize matching rules, improve data quality, and explore alternative models to build a more comprehensive Mongolian near-synonym resource.
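The abstract does not say how the embedding-based method turns vectors into near-synonym sets. One common approach, assumed here and not necessarily the authors' method, is to link two adjectives when the cosine similarity of their embeddings reaches a threshold and to read off the connected groups as candidate sets. The sketch below uses made-up 3-dimensional vectors and placeholder words (`adj_a` to `adj_e`); the function names and the `threshold=0.8` value are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def near_synonym_sets(vectors, threshold=0.8):
    """Group words whose embeddings are linked (cosine >= threshold) into
    sets, i.e. the connected components of the similarity graph.
    Hypothetical helper for illustration only."""
    parent = {w: w for w in vectors}

    def find(w):
        # Union-find root lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in vectors:
        groups.setdefault(find(w), set()).add(w)
    # Words with no similar neighbour form singletons and are dropped.
    return [g for g in groups.values() if len(g) > 1]


# Made-up toy vectors for placeholder adjectives; real embeddings would be
# trained on a Mongolian corpus.
toy = {
    "adj_a": [0.9, 0.1, 0.0],
    "adj_b": [0.85, 0.15, 0.05],
    "adj_c": [0.0, 0.2, 0.95],
    "adj_d": [0.05, 0.25, 0.9],
    "adj_e": [0.5, -0.8, 0.1],
}
print(sorted(sorted(s) for s in near_synonym_sets(toy)))
# prints [['adj_a', 'adj_b'], ['adj_c', 'adj_d']]
```

Because connected components are transitive, a chain of pairwise-similar words can merge into one set even when its endpoints are dissimilar, so candidate sets produced this way would still need filtering or manual review.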
Funding: Supported by the Doctoral Discipline Innovation Project of the Inner Mongolia Autonomous Region (No. B20231051Z).