The advancement of mobile devices, public networks, and the Internet of Things creates huge amounts of complex data; both structured and unstructured data are being captured in the hope of allowing organizations to make better business decisions, as data is now pivotal to an organization's success. These enormous amounts of data are referred to as Big Data, which, when processed and analyzed appropriately, can provide a competitive advantage over rivals. However, Big Data analytics raises several concerns, including data management, privacy and security, finding an optimal path for transporting data, and data representation. Moreover, the structure of a network does not completely match transportation demand, i.e., there still exist bottlenecks in the network. This paper presents a new approach, based on the knapsack problem, for finding the optimal path for moving valuable data through a given network. Each piece of data is assigned a value that depends on its importance (each piece is defined by two attributes, size and value), and the approach tries to find the optimal path from source to destination; mathematical models are developed to adjust data flows among the shortest paths based on the 0-1 knapsack problem. We also carry out computational experiments using the commercial solver Gurobi and a greedy algorithm (GA), respectively. The results indicate that the proposed models are effective and practical. The paper introduces two algorithms for studying shortest path problems: the first studies shortest path problems with stochastic activities that do not depend on weights, and the second studies shortest path problems that depend on weights.
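The abstract's exact path model is not given, but the selection step it describes is the classic 0-1 knapsack: each data piece has a size and a value, and the total size that can be sent along a path is bounded. A minimal sketch of that dynamic program, with hypothetical (size, value) pieces and a hypothetical capacity budget:

```python
def knapsack_01(items, capacity):
    """Classic 0-1 knapsack DP: items are (size, value) pairs and capacity
    is the available transport budget. Returns the best total value."""
    best = [0] * (capacity + 1)
    for size, value in items:
        # iterate capacities downward so each item is used at most once
        for c in range(capacity, size - 1, -1):
            best[c] = max(best[c], best[c - size] + value)
    return best[capacity]

# hypothetical data pieces: (size, importance value)
pieces = [(3, 60), (2, 100), (4, 120)]
print(knapsack_01(pieces, 5))  # best value within a capacity of 5
```

In the paper's setting this subproblem would be solved per candidate shortest path; Gurobi would handle the same 0-1 model as an integer program, while the greedy algorithm trades optimality for speed.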
Purpose – Data integration combines data residing at different sources and provides users with a unified interface to these data. An important issue in data integration is the existence of conflicts among the different data sources. Data sources may conflict with each other at the data level, which is defined as data inconsistency. The purpose of this paper is to address this problem and propose a solution for data inconsistency in data integration. Design/methodology/approach – A relational data model extended with data source quality criteria is first defined. Then, based on the proposed data model, a data inconsistency solution strategy is provided. To accomplish the strategy, a fuzzy multi-attribute decision-making (MADM) approach based on data source quality criteria is applied to obtain the results. Finally, user feedback strategies are proposed to optimize the result of the fuzzy MADM approach as the final solution for inconsistent data. Findings – To evaluate the proposed method, data obtained from sensors are extracted. Experiments are designed and performed to demonstrate the effectiveness of the proposed strategy. The results substantiate that the solution performs better than other methods on correctness, time cost, and stability indicators. Practical implications – Since inconsistent data collected from sensors are pervasive, the proposed method can solve this problem and correct wrong choices to some extent. Originality/value – In this paper, the authors study for the first time the effect of user feedback on integration results for inconsistent data.
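The paper's fuzzy MADM formulation is not reproduced in the abstract, but the core idea of ranking conflicting sources by weighted quality criteria can be sketched with simple additive weighting, a common MADM scheme. The criterion names, weights, and sensor values below are hypothetical:

```python
def rank_sources(sources, weights):
    """Simple additive weighting (one common MADM scheme): each source carries
    criterion scores in [0, 1]; return the source with the highest weighted sum."""
    def score(s):
        return sum(weights[c] * s["criteria"][c] for c in weights)
    return max(sources, key=score)

# hypothetical quality criteria for two conflicting sensor readings
sources = [
    {"name": "sensor_A", "value": 21.7,
     "criteria": {"accuracy": 0.9, "freshness": 0.6, "completeness": 0.8}},
    {"name": "sensor_B", "value": 19.2,
     "criteria": {"accuracy": 0.7, "freshness": 0.9, "completeness": 0.7}},
]
weights = {"accuracy": 0.5, "freshness": 0.2, "completeness": 0.3}
print(rank_sources(sources, weights)["name"])
```

The user-feedback step the paper adds would then adjust these weights (or override a chosen value) when users reject the selected result.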
In this era of big data, data are often collected from multiple sources with different reliabilities, and conflict is inevitable among the various pieces of information obtained about the same object. One important task is to identify the most trustworthy value among all the conflicting claims; this is known as truth discovery. Existing truth discovery methods simultaneously identify the most trustworthy information and the source reliability degrees, based on the idea that more reliable sources tend to provide more trustworthy information, and vice versa. However, there are often semantic constraints defined on a relational database that can be violated by a single data source. To remove violations, an important task is to repair the data so that it satisfies the constraints; this is known as data cleaning. The two problems above may coexist, and considering them together can provide benefits; to the authors' knowledge, this has not yet been the focus of any research. In this paper, therefore, a schema-decomposing based method is proposed to simultaneously discover the truth and clean the data, with the aim of improving accuracy. Experimental results on real-world data sets of notebooks and mobile phones, as well as on simulated data sets, demonstrate the effectiveness and efficiency of the proposed method.
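The alternating scheme the abstract attributes to existing truth discovery methods can be illustrated with a generic sketch (not the paper's schema-decomposing method): pick each object's value by reliability-weighted voting, then reset each source's reliability to its agreement with the current truths. The claim data below are hypothetical:

```python
from collections import Counter

def truth_discovery(claims, iters=5):
    """Generic iterative truth discovery: alternately (1) choose each object's
    value by reliability-weighted voting and (2) set each source's reliability
    to the fraction of its claims matching the current truths."""
    sources = {src for by_source in claims.values() for src in by_source}
    rel = {s: 1.0 for s in sources}          # start from uniform reliability
    truths = {}
    for _ in range(iters):
        for obj, by_source in claims.items():
            votes = Counter()
            for src, val in by_source.items():
                votes[val] += rel[src]       # weight each vote by reliability
            truths[obj] = votes.most_common(1)[0][0]
        for s in sources:
            mine = [(o, v) for o, bs in claims.items()
                    for src, v in bs.items() if src == s]
            rel[s] = sum(truths[o] == v for o, v in mine) / len(mine)
    return truths, rel

# hypothetical conflicting claims about two attributes of one notebook model
claims = {
    "weight": {"s1": "1.8kg", "s2": "1.8kg", "s3": "2.0kg"},
    "ram":    {"s1": "8GB",   "s2": "6GB",   "s3": "8GB"},
}
truths, rel = truth_discovery(claims)
print(truths)
```

The paper's contribution is to run this kind of estimation jointly with constraint-based repair, so that a value that wins the vote but violates a semantic constraint can still be corrected.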
Cloud water plays an important role in the global atmospheric water cycle and in weather modification, but cloud is one of the most uncertain parameters in the study of weather and climate. Cloud water products from different data sources may show considerable discrepancies. In this study, the total cloud liquid water (termed the cloud liquid water path, LWP) obtained from satellite observations [the Advanced Himawari Imager (AHI) and the Advanced Microwave Scanning Radiometer (AMSR)] and three sets of modern reanalysis data (ERA5, JRA-55, and MERRA-2) are compared and analyzed. Moreover, characteristics of the vertical distribution of cloud liquid water content (LWC) in different regions over East Asia are analyzed using profile data from the reanalyses. The main findings are as follows: (1) in extensive warm marine clouds, AHI and AMSR agree well (with a correlation coefficient larger than 0.7), but AHI shows an overestimation; (2) under warm cloud conditions, the LWP in ERA5 shows a significant positive bias (about 0.065 kg m^-2) over land, while MERRA-2 is closer to the satellite product than ERA5 and JRA-55; and (3) Southwest China (SW) is the area with the most abundant LWC. The LWC is mainly concentrated in the middle and lower troposphere over the study area, and the LWC in ERA5 is higher than that in MERRA-2 and JRA-55. Overall, satellite observations and reanalyses exhibit significant inconsistency in cloud LWP, which needs further investigation and understanding.
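The agreement and bias statistics quoted above (a correlation coefficient larger than 0.7, a positive mean bias) reduce to two standard computations on collocated samples. A minimal sketch with hypothetical LWP values, not the study's actual data:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# hypothetical collocated LWP samples (kg m^-2) from two products
ahi  = [0.12, 0.20, 0.08, 0.31, 0.25]
amsr = [0.10, 0.18, 0.07, 0.27, 0.22]
r = pearson(ahi, amsr)
bias = sum(a - b for a, b in zip(ahi, amsr)) / len(ahi)  # mean (AHI - AMSR)
print(r, bias)
```

A positive mean difference here would correspond to the kind of AHI overestimation the study reports relative to AMSR.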
Funding: Partially supported by the Key Research and Development Plan of the National Ministry of Science and Technology (No. 2016YFB1000703); the Key Program of the National Natural Science Foundation of China (Nos. 61190115, 61472099, 61632010, and U1509216); the National Sci-Tech Support Plan (No. 2015BAH10F01); the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province (No. LC2016026); and the MOE-Microsoft Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology.
Funding: Supported by the National Natural Science Foundation of China (42205044); the National Key Research and Development Program of China (2024YFF1308202); the Fengyun Application Pioneer Project (FY-APP-2022.0111); the Project for the Capacity Construction of Weather Modification in Southwest China [SCIT-ZG(Z)-2024100001]; and the Wuxi University Research Start-up Fund for Introduced Talents (2024r045).