Funding: Supported by the Ministerial Level Advanced Research Foundation (404050301.4) and the National Natural Science Foundation of China (60605015).
Abstract: EasyGuide, a task-oriented 3D virtual human spoken dialogue system for introducing scenic spots, is presented. The system includes five modules: natural language processing, task domain knowledge database, dialogue management, voice processing, and 3D virtual human text-to-visual speech synthesis. In the first module, dictionary construction, sentence analysis, and semantic representation are illustrated in detail. A tree-structured knowledge database is designed for the task domain. A novel framework based on keyword analysis and context constraints is proposed for dialogue management. For the voice processing module, a software development kit that performs speech recognition and synthesis is introduced briefly. In the last module, 3D viseme synthesis is explained with examples and a text-driven facial animation system is presented. Evaluation results show that the system can achieve satisfactory performance.
Funding: This research was partially supported by Zhejiang Laboratory (2020AA3AB05) and the Fundamental Research Funds for the Provincial Universities of Zhejiang (RF-A2020007).
Abstract: As a representative technique in natural language processing (NLP), named entity recognition is used in many tasks, such as dialogue systems, machine translation, and information extraction. In dialogue systems, a common case for named entity recognition is that many entities are composed of numbers and are split across different positions. For example, in multi-turn dialogue systems, a phone number is likely to be divided into several parts, because it is usually long and is spoken with emphasis. In this paper, an entity consisting of numbers is called a number entity. The discontinuous positions of number entities have many causes, and we identify two from real-world dialogue systems. The first is the repeated confirmation of different components of a number entity, and the second is the interruption of the entity by mood words. The extraction of number entities is useful in many tasks, such as user information completion and service request correction. However, existing entity extraction methods cannot extract entities consisting of discontinuous entity blocks. To address these problems, we propose a comprehensive method for number entity recognition that is capable of extracting number entities in multi-turn dialogue systems. We conduct extensive experiments on a real-world dataset, and the experimental results demonstrate the high performance of our method.
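The discontinuous number entity described in this abstract can be illustrated with a small sketch. This is not the paper's method, which the abstract does not detail; it is a naive baseline under stated assumptions: the dialogue is a list of hypothetical `(speaker, text)` pairs, system confirmations are skipped because they only repeat blocks the user already gave, and the expected digit count is known in advance. The function name `merge_number_entity` is invented for illustration.

```python
import re

def merge_number_entity(dialogue, expected_length):
    """Join digit blocks from user turns into one number entity.

    Assumption: system turns only echo blocks for confirmation, so they
    are ignored. Mood words such as "uh" are dropped implicitly because
    only digit runs are kept. Returns None if the digit count is wrong.
    """
    blocks = []
    for speaker, text in dialogue:
        if speaker != "user":
            continue  # skip the system's repeated confirmations
        blocks.extend(re.findall(r"\d+", text))
    digits = "".join(blocks)
    return digits if len(digits) == expected_length else None

# Hypothetical multi-turn exchange with confirmations and mood words.
dialogue = [
    ("user", "My number is 138"),
    ("system", "138, go ahead"),
    ("user", "uh, 0013"),
    ("system", "0013"),
    ("user", "um, 8000"),
]
print(merge_number_entity(dialogue, 11))
```

A baseline like this cannot handle user corrections ("no, 8001") or several number entities in one dialogue, which is the kind of gap a dedicated recognition method would need to close.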
Abstract: Neural conversation models play a leading role in the increasingly popular task of building conversational agents. A common criticism of these systems is that they seldom understand or use the conversation data efficiently. Growing attention to new techniques has increased the use of neural models for dialogue modeling. In recent years, deep learning (DL) models have achieved significant success in various tasks, and many dialogue systems also employ DL techniques. The primary issues in building a dialogue system are gaining insight into natural language, providing comprehension, and assessing conversations. In this paper, we mainly focus on DL-based dialogue systems. The issue addressed in this work is dialogue supervision, which determines how the framework responds once it has recognized the needs of the user. The dataset used in this research is extracted from movies. The models implemented are the seq2seq model, transformers, and GPT, used together with word embeddings and NLP. The results obtained after implementation showed that all three models produced accurate results. The demand for dialogue systems is greater than ever, so it is essential to take the necessary steps to build effective ones.
Funding: Funded by the Science and Technology Foundation of Chongqing Education Commission (Grant No. KJQN202301153), the Scientific Research Foundation of Chongqing University of Technology (Grant No. 2021ZDZ025), and the Postgraduate Innovation Foundation of Chongqing University of Technology (Grant No. gzlcx20243524).
Abstract: In task-oriented dialogue systems, intent, emotion, and actions are crucial elements of user activity. Analyzing the relationships among these elements to control and manage task-oriented dialogue systems is challenging. However, previous work has primarily focused on recognizing user intent and emotion independently, making it difficult to track both aspects simultaneously in the dialogue tracking module and to use user emotions effectively in subsequent dialogue strategies. We propose a Multi-Head Encoder Shared Model (MESM) that dynamically integrates features from emotion and intent encoders through a feature fusioner. To address the scarcity of datasets containing both emotion and intent labels, we designed a multi-dataset learning approach that enables the model to generate dialogue summaries encompassing both user intent and emotion. Experiments conducted on the MultiWoZ and MELD datasets demonstrate that our model effectively captures user intent and emotion, achieving highly competitive results in dialogue state tracking tasks.
Funding: Supported by the MSIT (Ministry of Science and ICT), Republic of Korea, under the ITRC (Information Technology Research Centre) support program (IITP-2024-RS-2024-00437191) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
Abstract: Dialogue State Tracking (DST) is a critical component of task-oriented spoken dialogue systems (SDS), tasked with maintaining an accurate representation of the conversational state by predicting slots and their corresponding values. Recent advances leverage Large Language Models (LLMs) with prompt-based tuning to improve tracking accuracy and efficiency. However, these approaches often incur substantial computational and memory overheads and typically address slot extraction implicitly within prompts, without explicitly modeling the complex dependencies between slots and values. In this work, we propose PUGG, a novel DST framework that constructs schema-driven prompts to fine-tune GPT-2 and utilizes its tokenizer to implement a memory encoder. PUGG explicitly extracts slot values via GPT-2 and employs Graph Attention Networks (GATs) to model and reason over the intricate relationships between slots and their associated values. We evaluate PUGG on four publicly available datasets, where it achieves state-of-the-art performance across multiple evaluation metrics, highlighting its robustness and generalizability in diverse conversational scenarios. Our results indicate that the integration of GPT-2 substantially reduces model complexity and memory consumption by streamlining key processes. Moreover, prompt tuning enhances the model's flexibility and precision in extracting relevant slot-value pairs, while the incorporation of GATs facilitates effective relational reasoning, leading to improved dialogue state representations.
Abstract: Algorithms for detecting dialogue deviations from a dialogue topic in an agent and ontology-based dialogue management system (AODMS) are proposed. In AODMS, agents and ontologies are introduced to represent domain knowledge. General algorithms that model dialogue phenomena in different domains can be realized because complex relationships between knowledge in different domains can be described by ontologies. An evaluation of the dialogue management system with deviation-judging algorithms on 736 utterances shows that the AODMS is able to talk about the given topic consistently and answer 86.6% of the utterances, while only 72.1% of the utterances are answered correctly without the deviation-judging module.
Abstract: SHTQS is an intelligent telephone-based spoken dialogue system that provides information about the best route between two sites in Shanghai. Instead of separating speech decoding from language parsing, SHTQS achieves close cooperation by integrating an automatic speech recognizer (ASR), language understanding, dialogue management, and a speech generator. In this way, erroneous analysis and uncertainty arising in the preceding stages can be recovered and resolved accurately with high-level knowledge. Moreover, instead of shallow word-level analysis or simple keyword or key phrase matching, a deeper analysis is performed in our system by integrating a robust parser and a semantic interpreter. The robust parser is particularly important for spontaneous speech inputs because most of the inquiry sentences and phrases are ill-formed. In addition, in designing a mixed-initiative dialogue system, understanding users' inquiries is essential; however, simply matching keywords or key phrases can hardly achieve this. Therefore, a semantic interpreter is incorporated in our system. The performance of the system is also evaluated. The dialogue efficiency is 4.4 sentences per query on average, and the case precision rate of the language understanding module is up to 81%. The results are satisfactory.
Abstract: Traditionally, the AI community assumes that a knowledge base must be consistent. Despite that, there are many applications where, due to the existence of rules with exceptions, inconsistent knowledge must be considered. One way of restoring consistency is to withdraw conflicting rules; however, this destroys part of the knowledge. A better alternative is to give precedence to exceptions. This paper proposes a dialogue system for coherent reasoning with inconsistent knowledge, which resolves conflicts by using precedence relations of three kinds: an explicit precedence relation, which is synthesized from precedence rules; an implicit precedence relation, which is synthesized from defeasible rules; and a mixed precedence relation, which is synthesized by combining the explicit and implicit precedence relations.
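The idea of giving precedence to exceptions rather than withdrawing rules can be sketched in a few lines. The abstract does not say how the implicit relation is synthesized, so the sketch below assumes a simple specificity criterion (a rule whose premises strictly contain another rule's premises, with the opposite conclusion, is treated as the exception). The `Rule` class, the `~` negation convention, and all function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str
    premises: frozenset
    conclusion: str  # "~p" denotes the negation of "p"

def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal

def implicit_precedence(rules):
    # Assumption: the more specific of two conflicting rules is the exception.
    return {(a.name, b.name) for a in rules for b in rules
            if a.premises > b.premises and a.conclusion == negate(b.conclusion)}

def conclusions(rules, facts, precedence):
    """Keep every applicable conclusion not beaten by a conflicting rule."""
    applicable = [r for r in rules if r.premises <= facts]
    kept = set()
    for r in applicable:
        beaten = any((o.name, r.name) in precedence
                     for o in applicable if o.conclusion == negate(r.conclusion))
        if not beaten:
            kept.add(r.conclusion)
    return kept

rules = [
    Rule("r1", frozenset({"bird"}), "flies"),
    Rule("r2", frozenset({"bird", "penguin"}), "~flies"),
]
facts = {"bird", "penguin"}
explicit = set()                               # pairs (stronger, weaker), if any
mixed = explicit | implicit_precedence(rules)  # combine both relations
print(conclusions(rules, facts, mixed))
```

Without any precedence relation, both `flies` and `~flies` survive and the knowledge base stays inconsistent; with the mixed relation only the exception's conclusion remains, and no rule has to be deleted.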
Funding: Supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 62306342, 62176076), the Excellent Young Scientists Fund in Hunan Province (2024JJ4070), the Natural Science Foundation of Guangdong (2023A1515012922), Shenzhen Foundational Research Funding (JCYJ20220818102415032), the Major Key Project of PCL (PCL2023A09), and the Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies (2022B1212010005k).
Abstract: Consistency identification in task-oriented dialogue (CI-ToD) can prevent inconsistent dialogue response generation and has recently emerged as an important and growing research area. This paper takes the first step toward exploring a pre-training paradigm for CI-ToD. Pre-training for CI-ToD is non-trivial because it requires a large amount of multi-turn KB-grounded dialogues, which are extremely hard to collect. To alleviate this data scarcity problem, we introduce a modularized pre-training framework (MPFToD), which is capable of utilizing large amounts of KB-free dialogues. Specifically, such modularization allows us to decouple CI-ToD into three sub-modules and propose three pre-training tasks, including (i) query response matching pre-training; (ii) dialogue history consistent identification pre-training; and (iii) KB mask language modeling, to enhance different abilities of the CI-ToD model. As different sub-tasks are solved separately, MPFToD can learn from large amounts of KB-free dialogues for different modules, which are much easier to obtain. Results on the CI-ToD benchmark show that MPFToD pushes the state-of-the-art performance from 56.3% to 61.0%. Furthermore, we show its transferability with promising performance on other downstream tasks (i.e., dialog act recognition, sentiment classification, and table fact checking).
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61936010 and 61876096) and the National Key R&D Program of China (Grant No. 2018YFC0830200).
Abstract: Due to their significance and value in human-computer interaction and natural language processing, task-oriented dialog systems are attracting more and more attention in both academic and industrial communities. In this paper, we survey recent advances and challenges in task-oriented dialog systems. We also discuss three critical topics for task-oriented dialog systems: (1) improving data efficiency to facilitate dialog modeling in low-resource settings, (2) modeling multi-turn dynamics for dialog policy learning to achieve better task-completion performance, and (3) integrating domain ontology knowledge into the dialog model. In addition, we review recent progress in dialog evaluation and some widely used corpora. We believe that this survey, though incomplete, can shed light on future research in task-oriented dialog systems.
Funding: Innovation and Technology Fund (ITF), Government of the Hong Kong Special Administrative Region (HKSAR), China (No. PRP-054-21FX).
Abstract: Dialogue policy learning (DPL) is a key component in a task-oriented dialogue (TOD) system. Its goal is to decide the next action of the dialogue system, given the dialogue state at each turn, based on a learned dialogue policy. Reinforcement learning (RL) is widely used to optimize this dialogue policy. In the learning process, the user is regarded as the environment and the system as the agent. In this paper, we present an overview of the recent advances and challenges in dialogue policy from the perspective of RL. More specifically, we identify the problems and summarize corresponding solutions for RL-based dialogue policy learning. In addition, we provide a comprehensive survey of applying RL to DPL by categorizing recent methods into five basic elements in RL. We believe this survey can shed light on future research in DPL.
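The framing in this abstract, with the user as the environment and the system as the agent, can be made concrete with a toy example. The sketch below is not taken from the survey: the two-slot booking task, the simulated user `user_step`, the reward values, and the choice of tabular Q-learning are all assumptions made for illustration.

```python
import random

ACTIONS = ["request", "book"]
N_SLOTS = 2  # hypothetical task: two slots must be filled before booking

def user_step(filled, action):
    """Simulated user (the environment): returns (next_state, reward, done)."""
    if action == "request":
        # The user answers the request; each turn costs a small penalty.
        return min(filled + 1, N_SLOTS), -1.0, False
    # Booking ends the dialogue: success only if every slot is filled.
    return filled, (10.0 if filled == N_SLOTS else -10.0), True

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over dialogue states (number of filled slots)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_SLOTS + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(20):  # cap the dialogue length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = user_step(state, action)
            future = 0.0 if done else gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = nxt
            if done:
                break
    return q

def greedy_policy(q):
    """Best system action for each dialogue state 0..N_SLOTS."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_SLOTS + 1)]

print(greedy_policy(train()))
```

In this toy setting the learned policy requests information until both slots are filled and then books. Real dialogue policies face far larger state and action spaces, which is where the problems the survey catalogues arise.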
Funding: Supported by the 2030 National Key AI Program of China (No. 2021ZD0113304), the National Science Foundation for Distinguished Young Scholars (No. 62125604), the NSFC projects (key project No. 61936010 and regular project No. 61876096), the Guoqiang Institute of Tsinghua University, China (Nos. 2019GQG1 and 2020GQG0005), and the Tsinghua-Toyota Joint Research Fund.
Abstract: Large-scale pre-training has shown remarkable performance in building open-domain dialogue systems. However, previous works mainly focus on showing and evaluating the conversational performance of the released dialogue model, and neglect some key factors in building a powerful human-like chatbot, especially in Chinese scenarios. In this paper, we conduct extensive experiments to investigate these under-explored factors, including data quality control, model architecture designs, training approaches, and decoding strategies. We propose EVA2.0, a large-scale pre-trained open-domain Chinese dialogue model with 2.8 billion parameters, and will make our models and code publicly available. Automatic and human evaluations show that EVA2.0 significantly outperforms other open-source counterparts. We also discuss the limitations of this work by presenting some failure cases, and pose some future research directions for large-scale Chinese open-domain dialogue systems.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2020AAA0106400, the National Natural Science Foundation of China under Grant Nos. 61922085 and 61976211, the Independent Research Project of the National Laboratory of Pattern Recognition under Grant No. Z-2018013, the Key Research Program of the Chinese Academy of Sciences (CAS) under Grant No. ZDBS-SSW-JSC006, and the Youth Innovation Promotion Association CAS under Grant No. 201912.
Abstract: This paper focuses on end-to-end task-oriented dialogue systems, which jointly handle dialogue state tracking (DST) and response generation. Traditional methods usually adopt a supervised paradigm to learn DST from a manually labeled corpus. However, annotating the corpus is costly and time-consuming, and the annotations cannot cover the wide range of domains in the real world. To solve this problem, we propose a multi-span prediction network (MSPN) that performs unsupervised DST for end-to-end task-oriented dialogue. Specifically, MSPN contains a novel split-merge copy mechanism that captures long-term dependencies in dialogues to automatically extract multiple text spans as keywords. Based on these keywords, MSPN uses a clustering approach based on semantic distance to obtain the values of each slot. In addition, we propose an ontology-based reinforcement learning approach, which employs the values of each slot to train MSPN to generate relevant values. Experimental results on single-domain and multi-domain task-oriented dialogue datasets show that MSPN achieves state-of-the-art performance with significant improvements. We also construct a new Chinese dialogue dataset, MeDial, in the low-resource medical domain, which further demonstrates the adaptability of MSPN.
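Abstract: To enable intelligent dialogue and translation for translation robots, this study proposes a new system based on an improved generative adversarial network (GAN). The new system introduces an attention mechanism to improve the capture of semantic information in English dialogue. The study designs an improved GAN architecture that incorporates the attention mechanism, and optimizes semantic generation ability and dialogue naturalness through adversarial training of the generator and discriminator. A translation-robot dialogue system based on a browser/server (B/S) architecture is also built, implementing voice question answering, translation, and user management, and its running performance is evaluated. The results show that the new system performs better on the FutureBeeAI English General Conversational Text Dataset, where its BLEU score reaches up to 0.82, an improvement of 0.27 over the RNN model. In tests on different datasets, its BLEU score is also 0.20 higher than that of the RNN model. For dialogue translation accuracy, the new system reaches up to 96.52%, an improvement of 37.88% over the RNN, and is 23.44% higher than the RNN in tests on different datasets. Compared with a plain GAN, the new system's BLEU score improves by 0.30. A robot using the new system can also complete dialogue and translation for different questions. The system built in this study therefore enables intelligent dialogue for translation robots, with good translation and dialogue performance.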