Abstract: Existing text truth discovery methods fail to address two challenges: the long-distance dependencies and thematic diversity inherent in long texts, and the subjective sentiment that obscures objective evaluation of source reliability. To address these challenges, a novel truth discovery method named large language model (LLM)-enhanced text truth discovery with dual attention (LTDDA) is proposed. First, LLMs generate embedded representations of text claims and enhance the feature space to tackle long-distance dependencies and thematic diversity. Then, the complex relationship between source reliability and claim credibility is captured by integrating semantic and sentiment features. Finally, dual-layer attention is applied to extract key semantic information and assign consistent weights to similar sources, resulting in accurate truth outputs. Extensive experiments on three real-world datasets demonstrate that LTDDA outperforms state-of-the-art methods, providing new insights for building more reliable and accurate text truth discovery systems.
Abstract: Air pollution has been a global concern for many years. Vehicular crowdsensing systems make it possible to monitor air quality at a fine granularity. To better utilize sensory data of varying credibility, truth discovery frameworks have been introduced. However, in cities, traffic volumes differ significantly across streets and blocks, which leads to a data sparsity problem for truth discovery. Protecting the privacy of participating vehicles is also a crucial task. We first present a data masking-based privacy-preserving truth discovery framework, which incorporates spatial and temporal correlations to solve the sparsity problem. To further improve the truth discovery performance of this framework, an enhanced version is proposed with anonymous communication and data perturbation. Both frameworks are more lightweight than existing cryptography-based methods. We also evaluate the work with simulations and discuss its performance and possible extensions.
Funding: supported by the National Natural Science Foundation of China under Grant No. 62072475.
Abstract: Unmanned and aerial systems, acting as interactors among different system components for communications, have opened up great opportunities for truth data discovery in Mobile Crowd Sensing (MCS), a problem that has not been properly solved in the literature. In this paper, an Unmanned Aerial Vehicles-supported Intelligent Truth Discovery (UAV-ITD) scheme is proposed to obtain truth data at low communication cost for MCS. The main innovations of the UAV-ITD scheme are as follows: (1) The UAV-ITD scheme takes the first step in employing UAVs jointly with Deep Matrix Factorization (DMF) to discover truth data based on a trust mechanism for an Information Elicitation Without Verification (IEWV) problem in MCS. (2) This paper introduces, for the first time, a truth data discovery scheme that only needs to collect a part of n data samples to infer the data of the entire network with high accuracy, which saves more communication costs than most previous data collection schemes, which collect n or kn data samples. Finally, we conducted extensive experiments to evaluate the UAV-ITD scheme. The results show that, compared with previous schemes, our scheme can reduce estimated truth error by 52.25%–96.09%, increase the accuracy of workers' trust evaluation by 0.68–61.82 times, and save recruitment costs by 24.08%–54.15% in truth data discovery.
Funding: Fundamental Research Funds for the Central Universities, China (No. 22D111207).
Abstract: With the rapid progress of the Internet, it is easier for people to get information about the objects that they are interested in. However, this information often contains conflicts. To resolve conflicts and obtain the true information, truth discovery has been proposed and has received widespread attention. Many algorithms have been proposed to adapt to different scenarios. This paper investigates these algorithms and summarizes them from the perspective of algorithm models and specific concepts. Some classic datasets and evaluation metrics are given in this paper. Some future directions are also provided to help readers better understand the field of truth discovery.
Funding: supported by the National High-Tech Development (863) Program of China (No. 2015AA015407) and the National Natural Science Foundation of China (Nos. 61632011 and 61370164).
Abstract: Truth discovery aims to resolve conflicts among multiple sources and find the truth. Conventional methods for truth discovery mainly investigate the mutual effect between the reliability of sources and the credibility of statements. These methods use real numbers, which have a lower representation capability than vectors, to represent the reliability. In addition, neural networks have not been used for truth discovery. In this work, we propose memory-network-based models to address truth discovery. Our proposed models use feedforward and feedback memory networks to learn the representation of the credibility of statements. Specifically, our models adopt a memory mechanism to learn the reliability of sources for truth prediction. The proposed models use categorical and continuous data during model learning by automatically assigning different weights to the loss function on the basis of their own effects. Experimental results show that our proposed models outperform state-of-the-art methods for truth discovery.
Funding: partially supported by the Key Research and Development Plan of National Ministry of Science and Technology (No. 2016YFB1000703), the Key Program of the National Natural Science Foundation of China (Nos. 61190115, 61472099, 61632010, and U1509216), the National Sci-Tech Support Plan (No. 2015BAH10F01), the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province (No. LC2016026), and the MOE-Microsoft Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology.
Abstract: In this era of big data, data are often collected from multiple sources that have different reliabilities, and conflicts inevitably arise among the various pieces of information obtained about the same object. One important task is to identify the most trustworthy value out of all the conflicting claims, and this is known as truth discovery. Existing truth discovery methods simultaneously identify the most trustworthy information and source reliability degrees and are based on the idea that more reliable sources often provide more trustworthy information, and vice versa. However, there are often semantic constraints defined upon a relational database, which can be violated by a single data source. To remove violations, an important task is to repair data to satisfy the constraints, and this is known as data cleaning. The two problems may coexist, and considering them together can provide some benefits; to the authors' knowledge, this has not yet been the focus of any research. In this paper, therefore, a schema-decomposing based method is proposed to simultaneously discover the truth and clean the data, with the aim of improving accuracy. Experimental results using real-world data sets of notebooks and mobile phones, as well as simulated data sets, demonstrate the effectiveness and efficiency of our proposed method.
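The principle stated in the abstract above (more reliable sources provide more trustworthy values, and vice versa) can be illustrated with a minimal iterative sketch. This is a generic weighted-voting loop, not the schema-decomposing method the paper proposes; the function name `discover_truth`, the smoothed accuracy formula, and the iteration count are all assumptions made for illustration.

```python
from collections import defaultdict


def discover_truth(claims, iterations=10):
    """Generic iterative truth discovery for categorical claims.

    claims: list of (source, obj, value) tuples.
    Returns (truths, weights): the chosen value per object and a
    reliability weight per source. Illustrative only.
    """
    sources = {s for s, _, _ in claims}
    weights = {s: 1.0 for s in sources}  # assumption: uniform initial trust
    truths = {}
    for _ in range(iterations):
        # Step 1: each object takes the value with the highest weighted vote.
        votes = defaultdict(lambda: defaultdict(float))
        for s, o, v in claims:
            votes[o][v] += weights[s]
        truths = {o: max(vals, key=vals.get) for o, vals in votes.items()}

        # Step 2: a source's weight is its smoothed agreement with the
        # current truths (add-one smoothing is an assumption).
        hits, total = defaultdict(int), defaultdict(int)
        for s, o, v in claims:
            total[s] += 1
            hits[s] += int(truths[o] == v)
        weights = {s: (hits[s] + 1) / (total[s] + 2) for s in sources}
    return truths, weights


# Hypothetical conflicting product specifications from three sources.
example_claims = [
    ("A", "laptop1", "16GB"), ("A", "laptop2", "8GB"), ("A", "phone1", "64GB"),
    ("B", "laptop1", "16GB"), ("B", "laptop2", "8GB"), ("B", "phone1", "128GB"),
    ("C", "laptop1", "8GB"), ("C", "laptop2", "4GB"), ("C", "phone1", "128GB"),
]
example_truths, example_weights = discover_truth(example_claims)
```

In this toy input, source B agrees with the majority on every object and therefore ends up with the highest weight. Ties between values are broken by whichever value was seen first, which a real system would need to handle more carefully.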
Funding: National Natural Science Foundation of China (No. 62002131); Shuangchuang Ph.D Award (from World Prestigious Universities) of Jiangsu Province, China (No. JSSCBS20211179).
Abstract: There are errors in multi-source uncertain time series data. Truth discovery methods for time series data are effective in finding more accurate values, but some have limitations in their usability. To tackle this challenge, we propose a new and convenient truth discovery method to handle time series data. A more accurate sample is closer to the truth and, consequently, to other accurate samples. Because the mutual-confirm relationship between sensors is very similar to the mutual-quote relationship between web pages, we evaluate sensor reliability based on PageRank and then estimate the truth by sensor reliability. Therefore, this method does not rely on smoothness assumptions or prior knowledge of the data. Finally, we validate the effectiveness and efficiency of the proposed method on real-world and synthetic data sets, respectively.
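The idea in the abstract above, ranking sensors by how strongly they confirm one another and then weighting readings by that rank, might be sketched as follows. The abstract does not give the paper's formulas, so the similarity kernel `exp(-mean absolute difference)`, the damping factor of 0.85, the fixed iteration count, and the weighted-average estimate are all assumptions; the function names are hypothetical.

```python
import math


def pagerank_reliability(series, damping=0.85, iterations=50):
    """Rank sensors with a PageRank-style power iteration.

    series: dict mapping sensor name -> list of readings (equal lengths,
    at least two sensors). Sensors with similar readings "confirm" each
    other, playing the role of links between web pages.
    """
    names = list(series)
    n = len(names)

    def similarity(a, b):
        dist = sum(abs(x - y) for x, y in zip(series[a], series[b]))
        dist /= len(series[a])
        # Small constant keeps every edge weight strictly positive.
        return math.exp(-dist) + 1e-12

    weight = {a: {b: similarity(a, b) for b in names if b != a} for a in names}
    out_sum = {a: sum(weight[a].values()) for a in names}

    rank = {a: 1.0 / n for a in names}
    for _ in range(iterations):
        rank = {
            a: (1 - damping) / n
            + damping * sum(rank[b] * weight[b][a] / out_sum[b]
                            for b in names if b != a)
            for a in names
        }
    return rank


def estimate_truth(series, rank):
    """Reliability-weighted average of the readings at each time step."""
    total = sum(rank.values())
    length = len(next(iter(series.values())))
    return [sum(rank[s] * series[s][t] for s in series) / total
            for t in range(length)]


# Hypothetical readings: three sensors agree, one is offset by about 5 units.
example_series = {
    "s1": [20.0, 21.0, 22.0],
    "s2": [20.1, 21.1, 21.9],
    "s3": [20.0, 20.9, 22.1],
    "s4": [25.0, 26.0, 27.0],
}
example_rank = pagerank_reliability(example_series)
example_truth = estimate_truth(example_series, example_rank)
```

Because the offset sensor receives almost no confirmation from the others, its rank stays close to the teleportation floor and it contributes little to the estimate. Note that the sketch uses no smoothness assumption across time steps, in line with the abstract's claim.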
Funding: supported by grants from the National Research Foundation of Korea (NRF), funded by the Korean government (Grant Nos. NRF-2022R1A2B5B01001553 and NRF-2022R1A4A1033549), and by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant, also funded by the Korean government (MSIT) under Grant No. RS-2022-00155915, for the project titled “Artificial Intelligence Convergence Innovation Human Resources Development (Inha University).”
Abstract: Human intelligence tasks (HITs), such as labeling images for machine learning, are widely utilized for crowdsourcing human knowledge. Centralized crowdsourcing platforms face challenges of a single point of failure and a lack of service transparency. Existing blockchain-based crowdsourcing approaches overlook the low scalability problem of permissionless blockchains or inconveniently rely on existing ground-truth data as the root of trust in evaluating the quality of workers' answers. We propose a blockchain-based crowdsourcing scheme for ensuring dual fairness (i.e., preventing false reporting and free riding) and improving on-chain efficiency concerning on-chain storage and smart contract computation. The proposed scheme does not rely on trusted authorities but rather depends on a public blockchain to guarantee dual fairness. An efficient and publicly verifiable truth discovery scheme is designed based on majority voting and cryptographic accumulators. This truth discovery scheme aims at inferring ground truth from workers' answers. The ground truth is further utilized to estimate the quality of workers' answers. Additionally, a novel blockchain-based protocol is designed to further reduce on-chain costs while ensuring truthfulness. The scheme has O(n) complexity for both on-chain storage and smart contract computation, regardless of the number of questions, where n denotes the number of workers. Formal security analysis is provided, and extensive experiments are conducted to evaluate its effectiveness and performance.
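The two-step flow described in the abstract above, inferring ground truth by majority voting and then scoring each worker against it, can be sketched off-chain as follows. The blockchain protocol, cryptographic accumulators, and public verifiability are omitted entirely; the function names and the use of plain accuracy as the quality score are assumptions for illustration.

```python
from collections import Counter


def majority_truth(answers):
    """Infer a label per question by simple majority voting.

    answers: dict mapping worker -> dict mapping question -> label.
    Ties go to the label encountered first (an arbitrary choice).
    """
    by_question = {}
    for labels in answers.values():
        for question, label in labels.items():
            by_question.setdefault(question, []).append(label)
    return {q: Counter(labels).most_common(1)[0][0]
            for q, labels in by_question.items()}


def worker_quality(answers, truth):
    """Score each worker as the fraction of answers matching the truth."""
    return {
        worker: sum(truth[q] == label for q, label in labels.items()) / len(labels)
        for worker, labels in answers.items()
        if labels  # skip workers who submitted nothing
    }


# Hypothetical image-labeling answers from three workers.
example_answers = {
    "w1": {"q1": "cat", "q2": "dog"},
    "w2": {"q1": "cat", "q2": "dog"},
    "w3": {"q1": "dog", "q2": "dog"},
}
example_truth = majority_truth(example_answers)
example_quality = worker_quality(example_answers, example_truth)
```

The quality scores are what a reward mechanism could use to discourage free riding, since a worker who submits arbitrary labels tends to disagree with the inferred truth.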