When performing English-to-Tamil Neural Machine Translation (NMT), end users face several challenges due to Tamil's rich morphology, free word order, and limited annotated corpora. Although available transformer-based models offer strong baselines, they compromise syntactic awareness and the detection and management of offensive content in cluttered, noisy, and informal text. In this paper, we present POSDEP-Offense-Trans, a multi-task NMT framework that combines Part-of-Speech (POS) tagging and Dependency Parsing (DEP) with a robust offensive language classification module. Our architecture enriches the Transformer encoder with syntax-aware embeddings and syntax-guided attention mechanisms. It incorporates a structure-aware contrastive loss that reinforces syntactic consistency and deploys auxiliary classification heads for POS tagging, dependency parsing, and multi-class offensive detection. The offensive language classifier operates at both sentence and token levels, guided by syntactic features and by formal finite automata rules that model offensive language structures: hate speech, profanity, sarcasm, and threats. Using this architecture, we construct a syntactically enriched, socially annotated corpus. Experimental results show improvements in translation quality, with a BLEU score of 33.5, UAS/LAS parsing accuracies of 92.4% and 90%, and a 4.5% F1-score gain in offensive content detection compared with baseline POS+DEP+Offense models. The proposed model also achieved 92.3% in offensive content neutralization, as confirmed by ablation studies. This comprehensive English-Tamil NMT model unifies syntactic modelling and ethical filtering, laying the groundwork for applications in social media moderation, hate speech mitigation, and policy-compliant multilingual content generation.
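The abstract describes a single objective that trains the translation decoder alongside auxiliary heads for POS tagging, dependency parsing, and offense classification, plus a contrastive term. A minimal sketch of how such a weighted multi-task objective is typically combined is shown below; the loss names and weight values are illustrative assumptions, not the paper's actual hyperparameters.

```python
# Sketch of a weighted multi-task objective in the spirit of the
# framework described above: a primary translation loss plus auxiliary
# POS, dependency, offense, and contrastive terms. All weights here are
# illustrative assumptions, not values reported by the paper.

def multitask_loss(l_trans, l_pos, l_dep, l_off, l_contrastive,
                   w_pos=0.3, w_dep=0.3, w_off=0.2, w_con=0.1):
    """Combine the primary translation loss with auxiliary task losses."""
    return (l_trans
            + w_pos * l_pos
            + w_dep * l_dep
            + w_off * l_off
            + w_con * l_contrastive)

# Example with hypothetical per-batch loss values:
total = multitask_loss(2.1, 0.8, 0.9, 0.4, 0.6)
```

In practice each `l_*` would be a differentiable tensor produced by its classification head, and the weights would be tuned (or learned) so that the auxiliary tasks regularize the encoder without dominating the translation objective.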
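The abstract also mentions formal finite automata rules that model offensive language structures. A minimal sketch of that idea is a deterministic transition table over token categories; the category labels and transitions below are purely illustrative assumptions, not the paper's actual rule set.

```python
# Sketch of a finite automaton over token categories, in the spirit of
# the "formal finite automata rules" mentioned above. The categories
# (PROFANITY, THREAT_VERB, TARGET_PRON) and the transitions are
# illustrative assumptions, not the paper's actual rules.

DFA = {
    ("start", "PROFANITY"): "flagged",
    ("start", "THREAT_VERB"): "threat_pending",
    ("threat_pending", "TARGET_PRON"): "flagged",
}
ACCEPTING = {"flagged"}

def matches_offensive_pattern(categories):
    """Run the automaton over a sequence of token categories.

    Undefined moves keep the current state (a simplification of a
    strict DFA, which would transition to an explicit sink state).
    """
    state = "start"
    for cat in categories:
        state = DFA.get((state, cat), state)
    return state in ACCEPTING
```

Token-level classifier features could then include whether the token completes such a pattern, letting the rules and the learned model reinforce each other.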