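Abstract: Question answering over knowledge bases (QA-KB) aims to analyze a user's query intent more accurately at the semantic level and to answer natural-language questions with concise, accurate results. Most existing QA-KB methods follow the alignment-prediction-answering (APA) framework, which splits the question-answering process into several separate tasks and makes decisions greedily, so it lacks unified modeling and a global optimization strategy. This paper therefore proposes an end-to-end unsupervised QA-KB framework that uses dynamic programming to support global optimization and decision-making. Experimental results show that the method performs well on a Chinese question-answering dataset, particularly on multi-hop questions, and offers a new approach for existing question-answering systems.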
Funding: the National Natural Science Foundation of China (Grant Nos. 61972016 and 62032016) and the Beijing Nova Program (Grant No. 20220484106).
Abstract: We present a novel approach for predicting crystal material properties that is distinct from computationally complex and expensive calculations based on density functional theory (DFT). Instead, we use an attention-based graph neural network that yields high-accuracy predictions. Our approach employs two attention mechanisms for message passing on crystal graphs, which enable the model to attend selectively to pertinent atoms and their local environments and thereby improve performance. We conduct comprehensive experiments to validate our approach, and they demonstrate that our method surpasses existing methods in predictive accuracy. Our results suggest that deep learning, particularly attention-based networks, holds significant promise for predicting crystal material properties, with implications for material discovery and refined intelligent systems.
Funding: the National Key R&D Program of China (No. 2021ZD0113602) and the National Natural Science Foundation of China (Grant Nos. 62176014 and 62202273).
Abstract: Federated learning (FL) is a novel distributed machine learning paradigm that enables participants to collaboratively train a centralized model while preserving privacy, because it eliminates the need to share data. In practice, FL often involves multiple participants and requires a third party to aggregate global information to guide the update of the target participant. In such settings, many FL methods do not work well because the training and test data of each participant may not be sampled from the same feature space and the same underlying distribution. Meanwhile, differences in local devices (system heterogeneity), the continuous influx of online data (incremental data), and the scarcity of labeled data may further degrade the performance of these methods. To solve this problem, federated transfer learning (FTL), which integrates transfer learning (TL) into FL, has attracted the attention of numerous researchers. However, since FL shares knowledge among participants continuously with each communication round while not allowing local data to be accessed by other participants, FTL faces many unique challenges that are not present in TL. In this survey, we focus on categorizing and reviewing current progress on federated transfer learning and on outlining corresponding solutions and applications. Furthermore, the common settings of FTL scenarios, available datasets, and significant related research are summarized.
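The aggregation step described in the abstract above, in which a third party combines information from participants without seeing their data, can be illustrated with a sample-size-weighted average of client parameters. This is a minimal sketch of the widely used federated averaging idea, not a method taken from the survey itself; the function name `federated_average` and the flat-list parameter representation are assumptions made for illustration.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each by its sample count.

    Hypothetical helper: only parameters and sample counts reach the
    aggregator, so raw local data never leaves a participant.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must report vectors of the same length")

    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        share = size / total  # this client's fraction of all training samples
        for i, value in enumerate(weights):
            averaged[i] += share * value
    return averaged


if __name__ == "__main__":
    # Two clients: the second holds three times as many samples as the first.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Weighting by sample count assumes the clients draw from comparable distributions; the heterogeneity discussed in the abstract is exactly the case where this simple average can fall short.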
Funding: This review is supported by the National Key Research and Development Program of China under grant no. 2021ZD0113602, the National Natural Science Foundation of China under grant nos. 62176014 and 62276015, and the Fundamental Research Funds for the Central Universities.
Abstract: Causal inference has recently garnered significant interest among recommender system (RS) researchers because of its ability to dissect cause-and-effect relationships and its broad applicability across multiple fields. It offers a framework for modeling causality in RSs, such as confounding effects, and for dealing with counterfactual problems such as offline policy evaluation and data augmentation. Although there are already some valuable surveys on causal recommendation, they typically classify approaches by the practical issues faced in RS, a classification that may disperse and fragment the unified causal theories. Considering RS researchers' unfamiliarity with causality, it is necessary yet challenging to review relevant studies comprehensively from a coherent causal-theoretical perspective, thereby facilitating a deeper integration of causal inference in RS. This survey provides a systematic review of up-to-date papers in this area from a causal theory standpoint and traces the evolutionary development of RS methods within the same causal strategy. First, we introduce the fundamental concepts of causal inference as the basis of the review. Subsequently, we propose a novel theory-driven taxonomy that categorizes existing methods by the causal theory employed, namely those based on the potential outcome framework, the structural causal model, and general counterfactuals. The review then delves into the technical details of how existing methods apply causal inference to address particular recommender issues. Finally, we highlight some promising directions for future research in this field. Representative papers and open-source resources will be progressively available at https://github.com/Chrissie-Law/Causal-Inference-forRecommendation.
Funding: The authors would like to thank the anonymous referees for their help in improving the quality of the paper. This research was supported in part by the National Natural Science Foundation of China under Grant Nos. 71771034 and 71421001, the Science and Technology Program of Jieyang under Grant No. 2017xm041, the China Postdoctoral Science Foundation under Grant No. 2017M620054, and the Scientific and Technological Innovation Foundation of Dalian under Grant No. 2018J11CY009. This paper is a significantly extended and revised version of the conference paper presented at KSS-2018.
Abstract: Rational drug use requires that patients receive medications for an adequate period of time. An adequate medication duration not only improves the therapeutic effect of medicines but also reduces their side effects and adverse reactions. This paper proposes a data-driven method to mine typical treatment duration patterns for rational drug use from electronic medical records (EMRs). First, a quintuple is defined to describe the drug use duration statistics (DUDS) of each drug, and each treatment record is then represented as a DUDS vector (DUDSV). Next, a similarity measure is adopted to compute the similarity between treatment records, and a clustering algorithm is used to cluster all patient treatment records and extract typical treatment duration patterns, including typical drug sets, effective drug use day sets, and the DUDS of each typical drug. The extracted patterns are then evaluated and annotated based on patients' demographic information, disease severity scores, treatment outcomes, and diagnostic information. Finally, experiments on a real-world EMR dataset indicate that the proposed approach can effectively mine typical treatment duration patterns from EMRs and recommend appropriate treatment regimens for patients based on their admission information.
Abstract: Hierarchical topic models have been widely applied in many real applications because they can build a hierarchy of topics while guaranteeing topic quality. Most traditional methods build a hierarchy by adopting low-level topics as new features to construct high-level ones, which often causes semantic confusion between low-level and high-level topics. To address this problem, we propose a novel topic model named hierarchical sparse NMF with orthogonal constraint (HSOC), which is based on non-negative matrix factorization and builds the topic hierarchy by splitting super-topics into sub-topics. In HSOC, we introduce global independence, local independence, and information consistency to constrain the split topics. Extensive experimental results on real-world corpora show that the proposed model achieves comparable performance on topic quality and better performance on semantic feature representation of documents compared with baseline methods.
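The non-negative matrix factorization that HSOC builds on can be sketched with the classic multiplicative update rules, which approximate a non-negative term-document matrix V by the product W H of two non-negative factors. The sketch below is plain NMF only: the sparsity, orthogonality, independence, and consistency constraints and the topic-splitting procedure named in the abstract are not implemented, and the helper names (`nmf`, `frobenius_error`) are assumptions made for illustration.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def frobenius_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v: Matrix, rank: int, iterations: int = 200,
        seed: int = 0, eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor V (n x m) into W (n x rank) and H (rank x m), all non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Toy term-document matrix with two disjoint groups of terms.
    v = [[2, 4, 0, 0], [1, 2, 0, 0], [0, 0, 3, 6], [0, 0, 1, 2]]
    w, h = nmf(v, rank=2)
    print(frobenius_error(v, w, h))
```

In a topic-model reading, the columns of W act as topics over terms and the columns of H as topic mixtures for documents; because every update multiplies non-negative quantities, both factors stay non-negative throughout.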
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 71771034, 71901011, 71971039), the Science and Technology Program of Jieyang (No. 2017xm041), the Funds for Creative Research Group of China (No. 71421001), and the Scientific and Technological Innovation Foundation of Dalian (No. 2018J11CY009).
Abstract: As one of the most popular kinds of information in markets, sales rankings have a great impact on consumer choice. However, the literature contains few discussions of how sales rankings should be provided to consumers. This paper aims to answer two questions: 1) to what extent does the sales ranking influence consumer choices, and 2) when should the sales ranking be provided to consumers? To do so, the paper first constructs a sales ranking model and then demonstrates the model with detailed simulation experiments. The experimental results show that in markets where consumer preferences differ dramatically, such as music and movie markets, sales rankings do not significantly influence consumer choices and should not be provided until a large number of early independent consumer choices have accumulated. In markets where consumer preferences are similar, such as markets for official supplies, sales rankings have more influence on consumer choices and should be provided earlier. Furthermore, an evolution strategy is proposed to determine the most suitable sales rankings (characterised by suitable influence strength and suitable release time) for specified online markets. The comparison results show that the optimized sales rankings not only help consumers discover higher-quality products but also improve overall sales.
Funding: Supported partially by the National Natural Science Foundation of China under Grant Nos. 11771034 and 11401018.
Abstract: In this paper it is shown how to transform a regular triangular set into a normal triangular set by computing the W-characteristic set of its saturated ideal, and an algorithm is proposed for decomposing any polynomial set into finitely many strong characteristic pairs, each of which is formed by the reduced lexicographic Gröbner basis and the normal W-characteristic set of a characterizable ideal.
Funding: the National Natural Science Foundation of China (NSFC) (Grant Nos. 61872016, 61932007 and 61972013).
Abstract: Software systems are present all around us and play vital roles in our daily lives. The correct functioning of these systems is of prime concern. In addition to classical testing techniques, formal techniques such as model checking are used to reinforce the quality and reliability of software systems. However, obtaining a behavior model of an unknown software system, which is essential for model-based techniques, is a challenging task. To mitigate this problem, an emerging black-box analysis technique called model learning can be applied. It complements existing model-based testing and verification approaches by providing behavior models of black-box systems fully automatically. This paper surveys the model learning technique, which has recently attracted much attention from researchers, especially in the domains of testing and verification. First, we review the background and foundations of model learning, which form the basis of subsequent sections. Second, we present some well-known model learning tools and summarize their merits and shortcomings in a comparison table. Third, we describe successful applications of model learning in multidisciplinary fields, current challenges along with possible future work, and concluding remarks.
Abstract: On May 12, 2019 we shall celebrate the centenary of the birth of the late Professor Wu Wen-Tsun (1919-2017), one of the most famous and influential mathematicians in China. Wu made foundational contributions to the field of topology and established Mathematics Mechanization as a discipline. He devoted himself to an academic career of more than six decades, with leadership and extensive activities in research and education ranging from mathematics to computer science and artificial intelligence. The scope of his research spans algebraic topology, differential topology, and algebraic geometry; automated reasoning, symbolic computation, and game theory; and the history of mathematics. His early work in topology was a major breakthrough, leading to well-known and now classical results, including the characteristic class and formulas named after Wu. In the late 1970s, he pioneered research on Mathematics Mechanization through his invention of the "Wu method", which revolutionized the field of automated reasoning.
Funding: This work was supported partly by the National Key Research and Development Program of China (2019YFB1705902) and partly by the National Natural Science Foundation of China (Grant Nos. 61932007, 61972013, 61976187, 61421003).
Abstract: Crowdsourcing has been a helpful mechanism for leveraging human intelligence to acquire useful knowledge. However, when crowd knowledge is aggregated with currently available voting algorithms, the result is often common knowledge rather than the knowledge that was expected. In this paper, we consider the problem of collecting specific knowledge via crowdsourcing. Using an external knowledge base such as WordNet, we incorporate the semantic relations between alternative answers into a probabilistic model to determine which answer is more specific. We formulate the probabilistic model from a basic assumption, taking both worker ability and task difficulty into account, and solve it with the expectation-maximization (EM) algorithm. To increase algorithm compatibility, we also extend our method to a semi-supervised one. Experimental results show that our approach is robust to hyper-parameters and achieves a larger improvement than majority voting and other algorithms when more specific answers are expected, especially for sparse data.
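To make the EM step in the abstract above concrete, the sketch below aggregates crowd votes with a deliberately simplified model: each worker has a single ability (the probability of giving the true answer) and otherwise picks one of the other labels uniformly. This is an assumption-laden illustration and not the paper's model; task difficulty, the WordNet-based semantic relations between answers, and the semi-supervised variant are all omitted, and the function name `em_aggregate` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def em_aggregate(votes: List[Tuple[str, str, str]], labels: List[str],
                 iterations: int = 20
                 ) -> Tuple[Dict[str, Dict[str, float]], Dict[str, float]]:
    """Estimate per-task answer posteriors and per-worker abilities.

    votes: (task, worker, label) triples; labels: all candidate answers.
    """
    k = len(labels)
    if k < 2:
        raise ValueError("need at least two candidate labels")
    by_task: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for task, worker, label in votes:
        by_task[task].append((worker, label))

    # Initialise each task's posterior with its vote shares (soft majority vote).
    posterior: Dict[str, Dict[str, float]] = {}
    for task, answers in by_task.items():
        counts = {lab: 0.0 for lab in labels}
        for _, lab in answers:
            counts[lab] += 1.0
        posterior[task] = {lab: c / len(answers) for lab, c in counts.items()}

    ability: Dict[str, float] = {}
    for _ in range(iterations):
        # M-step: ability = expected fraction of a worker's votes that are correct.
        hit: Dict[str, float] = defaultdict(float)
        seen: Dict[str, int] = defaultdict(int)
        for task, answers in by_task.items():
            for worker, lab in answers:
                hit[worker] += posterior[task][lab]
                seen[worker] += 1
        # Clip so that no likelihood term becomes exactly zero.
        ability = {w: min(max(hit[w] / seen[w], 0.01), 0.99) for w in seen}

        # E-step: posterior over the true label of each task (uniform prior).
        for task, answers in by_task.items():
            scores = {}
            for cand in labels:
                score = 1.0
                for worker, lab in answers:
                    p = ability[worker]
                    score *= p if lab == cand else (1.0 - p) / (k - 1)
                scores[cand] = score
            norm = sum(scores.values())
            posterior[task] = {lab: s / norm for lab, s in scores.items()}
    return posterior, ability


if __name__ == "__main__":
    votes = [("t1", "a", "dog"), ("t1", "b", "dog"), ("t1", "c", "animal"),
             ("t2", "a", "cat"), ("t2", "b", "cat"), ("t2", "c", "cat"),
             ("t3", "a", "dog"), ("t3", "b", "dog"), ("t3", "c", "cat")]
    post, ability = em_aggregate(votes, ["dog", "cat", "animal"])
    print(max(post["t1"], key=post["t1"].get), ability)
```

With a uniform error model, EM can only reweight workers; preferring the more specific of two compatible answers (for example "dog" over "animal") is the part that would require the semantic relations described in the abstract.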
Funding: Supported by the National Key Research and Development Program (2018YFB1306000), the Ministry of Industry and Information Technology of China (2105-370171-07-02-860873), the State Key Lab of Software Development Environment (SKLSDE), and the Beijing Advanced Innovation Center for Big Data and Brain Computing (BDBC).
Abstract: 1 Introduction. Time series augmentation is an essential approach to solving the overfitting problem in the time series classification (TSC) task [1,2]. Although existing approaches perform better in mitigating this problem, none of them focuses on protecting the saliency regions of time series. The key informative shapelets contained in these regions are the core basis for distinguishing categories (e.g., upward spikes in ECG and high amplitude in Sensor).