The emergence of large language models(LLMs)has brought about revolutionary social value.However,concerns have arisen regarding the generation of deceptive content by LLMs and their potential for misuse.Consequently,a...The emergence of large language models(LLMs)has brought about revolutionary social value.However,concerns have arisen regarding the generation of deceptive content by LLMs and their potential for misuse.Consequently,a crucial research question arises:How can we differentiate between AI-generated and human-authored text?Existing detectors face some challenges,such as operating as black boxes,relying on supervised training,and being vulnerable to manipulation and misinformation.To tackle these challenges,we propose an innovative unsupervised white-box detection method that utilizes a“dual-driven verification mechanism”to achieve high-performance detection,even in the presence of obfuscated attacks in the text content.To be more specific,we initially employ the SpaceInfi strategy to enhance the difficulty of detecting the text content.Subsequently,we randomly select vulnerable spots from the text and perturb them using another pre-trained language model(e.g.,T5).Finally,we apply a dual-driven defense mechanism(D3M)that validates text content with perturbations,whether generated by a model or authored by a human,based on the dimensions of Information TransmissionQuality and Information TransmissionDensity.Through experimental validation,our proposed novelmethod demonstrates state-of-the-art(SOTA)performancewhen exposed to equivalent levels of perturbation intensity across multiple benchmarks,thereby showcasing the effectiveness of our strategies.展开更多
文摘The emergence of large language models(LLMs)has brought about revolutionary social value.However,concerns have arisen regarding the generation of deceptive content by LLMs and their potential for misuse.Consequently,a crucial research question arises:How can we differentiate between AI-generated and human-authored text?Existing detectors face some challenges,such as operating as black boxes,relying on supervised training,and being vulnerable to manipulation and misinformation.To tackle these challenges,we propose an innovative unsupervised white-box detection method that utilizes a“dual-driven verification mechanism”to achieve high-performance detection,even in the presence of obfuscated attacks in the text content.To be more specific,we initially employ the SpaceInfi strategy to enhance the difficulty of detecting the text content.Subsequently,we randomly select vulnerable spots from the text and perturb them using another pre-trained language model(e.g.,T5).Finally,we apply a dual-driven defense mechanism(D3M)that validates text content with perturbations,whether generated by a model or authored by a human,based on the dimensions of Information TransmissionQuality and Information TransmissionDensity.Through experimental validation,our proposed novelmethod demonstrates state-of-the-art(SOTA)performancewhen exposed to equivalent levels of perturbation intensity across multiple benchmarks,thereby showcasing the effectiveness of our strategies.