Funding: Supported by the National Natural Science Foundation of China under Grant 61972057 and the Hunan Provincial Natural Science Foundation of China under Grant 2022JJ30623.
Abstract: Linguistic steganography (LS) aims to embed secret information into normal natural text for covert communication. It includes modification-based (MLS) and generation-based (GLS) methods. MLS often relies on limited manual rules, resulting in low embedding capacity, while GLS achieves higher embedding capacity through automatic text generation but typically ignores extraction efficiency. To address this, we propose a sentence attribute encoding-based MLS method that enhances extraction efficiency while maintaining strong performance. The proposed method designs a lightweight semantic attribute analyzer to encode sentence attributes for embedding secret information. When the attribute values of the cover sentence differ from the secret information to be embedded, a paraphrasing-based semantic attribute adjuster automatically generates paraphrase sentences with the target attribute, thereby alleviating the problem of insufficient manual rules. During extraction, the secret information can be recovered using the semantic attribute analyzer alone, eliminating the dependence on the paraphrase generation model. Experimental results show that this method achieves an extraction speed of 1141.54 bits/sec, a remarkable advantage over existing methods. Meanwhile, the stego text generated by this method reaches 68.53, 39.88, and 80.77 on BLEU, ΔPPL, and BERTScore, respectively. Compared with existing methods, the text quality is effectively improved.
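The embed-then-extract flow described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_analyzer` (one bit from word-count parity) stands in for the learned semantic attribute analyzer, and `toy_paraphrase` (a hand-written lookup table) stands in for the paraphrase generation model. Neither is the authors' implementation. The point shown is only that extraction calls the analyzer and never the paraphraser.

```python
from typing import Callable, List, Sequence


def toy_analyzer(sentence: str) -> int:
    """Hypothetical attribute analyzer: one bit, the parity of the word count."""
    return len(sentence.split()) % 2


def toy_paraphrase(sentence: str) -> List[str]:
    """Hypothetical adjuster: a real system would call a paraphrase model here."""
    table = {
        "The meeting starts at noon.": ["The meeting will start at noon."],
        "Please bring your notes.": ["Please bring along your notes."],
    }
    return table.get(sentence, [])


def embed(
    covers: Sequence[str],
    bits: Sequence[int],
    paraphrase: Callable[[str], List[str]] = toy_paraphrase,
    analyzer: Callable[[str], int] = toy_analyzer,
) -> List[str]:
    """Carry one bit per sentence, paraphrasing only when the attribute mismatches."""
    if len(covers) != len(bits):
        raise ValueError("need exactly one cover sentence per secret bit")
    stego = []
    for sentence, bit in zip(covers, bits):
        if analyzer(sentence) == bit:
            stego.append(sentence)  # attribute already encodes the bit
            continue
        for candidate in paraphrase(sentence):
            if analyzer(candidate) == bit:
                stego.append(candidate)
                break
        else:
            raise ValueError(f"no paraphrase with attribute {bit}: {sentence!r}")
    return stego


def extract(stego: Sequence[str], analyzer: Callable[[str], int] = toy_analyzer) -> List[int]:
    """Extraction needs only the analyzer, not the paraphrase model."""
    return [analyzer(sentence) for sentence in stego]


covers = ["The meeting starts at noon.", "Please bring your notes."]
secret = [0, 1]
stego = embed(covers, secret)
recovered = extract(stego)
```

In this toy run both cover sentences start with the wrong attribute, so both are replaced by a paraphrase, and `extract` recovers the secret bits from the stego text alone.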
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2020YFB2104100, the National Natural Science Foundation of China under Grant Nos. 61972403 and U1711261, the Fundamental Research Funds for the Central Universities of China, the Research Funds of Renmin University of China, and the Tencent Rhino-Bird Joint Research Program.
Abstract: Identifying semantic types for attributes in relations, known as attribute semantic type (AST) identification, plays an important role in many data analysis tasks, such as data cleaning, schema matching, and keyword search in databases. However, due to a lack of unified naming standards across prevalent information systems (a.k.a. information islands), AST identification remains an open problem. To tackle this problem, we propose a context-aware method to identify the ASTs for relations. We transform AST identification into a multi-class classification problem and propose a schema context aware (SCA) model to learn the representation from a collection of relations associated with attribute values and schema context. Based on the learned representation, we predict the AST for a given attribute from an underlying relation, wherein the predicted AST is mapped to one of the labeled ASTs. To improve the performance of AST identification, especially when the predicted semantic types of attributes are not included in the labeled ASTs, we then introduce knowledge base embeddings (a.k.a. KBVec) to enhance the above representation and construct a schema context aware model with knowledge base enhancement (SCA-KB) to obtain a stable and robust model. Extensive experiments on real datasets demonstrate that our context-aware method outperforms the state-of-the-art approaches by a large margin: up to 6.14% and 25.17% in terms of macro-average F1 score, and up to 0.28% and 9.56% in terms of weighted F1 score, over high-quality and low-quality datasets, respectively.
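The abstract reports both macro-average and weighted F1, which can diverge sharply in multi-class settings with skewed class frequencies. The sketch below computes both from scratch using the standard definitions. The labels `city` and `phone` are invented examples of semantic types, not data from the paper.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 per label, computed as 2*TP / (2*TP + FP + FN); 0.0 when undefined."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class F1 weighted by each class's number of true instances."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Invented example: a frequent type and a rare type that the classifier misses.
y_true = ["city", "city", "city", "city", "phone"]
y_pred = ["city", "city", "city", "city", "city"]
macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
```

Here the rare class is missed entirely, so macro F1 (4/9) falls well below weighted F1 (32/45). This is one reason the two metrics can show gains of very different size on the same dataset.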