We propose a three-step technique to achieve this purpose. First, we utilize a collection of XML namespaces organized into a hierarchical structure as a medium for expressing data semantics. Second, we define the format of the resource descriptor for the information source discovery scheme so that we can dynamically register and/or deregister Web data sources on the fly. Third, we employ an inverted-index mechanism to identify the subset of information sources that are relevant to a particular user query. We describe the design, architecture, and implementation of our approach, IWDS, and illustrate its use through case examples. Key words: integration; heterogeneity; Web data source; XML namespace. CLC number: TP 311.13. Foundation item: Supported by the National Key Technologies R&D Program of China (2002BA103A04). Biography: WU Wei (1975-), male, Ph.D. candidate; research direction: information integration, distributed computing.
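As a rough illustration of the inverted-index step, here is a minimal Python sketch; the registry layout and source names are hypothetical, since the abstract does not give the descriptor format.

```python
from collections import defaultdict

# Hypothetical registry: each Web data source is described by a resource
# descriptor listing the namespace terms (semantic concepts) it exposes.
source_descriptors = {
    "src_books":  {"isbn", "title", "author", "price"},
    "src_music":  {"artist", "album", "title", "price"},
    "src_papers": {"doi", "title", "author", "venue"},
}

# Inverted index: term -> set of sources whose descriptor contains that term.
inverted_index = defaultdict(set)
for source_id, terms in source_descriptors.items():
    for term in terms:
        inverted_index[term].add(source_id)

def relevant_sources(query_terms):
    """Return the subset of sources that cover every term of the user query."""
    postings = [inverted_index.get(t, set()) for t in query_terms]
    return set.intersection(*postings) if postings else set()

print(relevant_sources({"title", "author"}))  # {'src_books', 'src_papers'}
```

Registering or deregistering a source on the fly then amounts to adding or removing its descriptor and updating the affected postings.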
In the data transaction process within a data asset trading platform, quantifying the trustworthiness of data source nodes is challenging due to their numerous attributes and complex structures. To address this issue, a distributed data source trust assessment management framework, a trust quantification model, and a dynamic adjustment mechanism are proposed. The model integrates the Analytic Hierarchy Process (AHP) and Dempster-Shafer (D-S) evidence theory to determine attribute weights and calculate direct trust values, while the PageRank algorithm is employed to derive indirect trust values. The direct and indirect trust values are then combined to compute the comprehensive trust value of the data source. Furthermore, a dynamic adjustment mechanism is introduced to continuously update the comprehensive trust value based on historical assessment data. By leveraging the collaborative efforts of multiple nodes in the distributed network, the proposed framework enables a comprehensive, dynamic, and objective evaluation of data source trustworthiness. Extensive experimental analyses demonstrate that the trust quantification model effectively handles large-scale data source trust assessments, exhibiting both strong trust differentiation capability and high robustness.
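The abstract does not state the fusion rule for combining the two trust values, so the sketch below simply takes a weighted sum of an AHP-style direct trust vector and a PageRank-derived indirect trust vector; the vouching graph, direct scores, and fusion weight are all hypothetical.

```python
import numpy as np

def pagerank(adj, d=0.85, tol=1e-9, max_iter=200):
    """Plain PageRank over a recommendation graph (edge i -> j means node i
    vouches for node j); used here as the indirect-trust score."""
    n = adj.shape[0]
    M = np.zeros((n, n))                       # column-stochastic transitions
    for i in range(n):
        row_sum = adj[i].sum()
        M[:, i] = adj[i] / row_sum if row_sum > 0 else 1.0 / n
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = (1 - d) / n + d * M @ r
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r / r.sum()

# Hypothetical direct trust values (e.g. AHP/D-S weighted attribute scores)
direct = np.array([0.80, 0.55, 0.70, 0.40])
# Hypothetical vouching graph among the four data source nodes
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 0, 0]], dtype=float)

indirect = pagerank(adj)
indirect = indirect / indirect.max()           # rescale to [0, 1]

alpha = 0.6                                    # illustrative fusion weight
comprehensive = alpha * direct + (1 - alpha) * indirect
print(np.round(comprehensive, 3))
```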
Attention deficit hyperactivity disorder (ADHD) is a common, highly heritable psychiatric disorder characterized by hyperactivity, inattention and increased impulsivity. In recent years, a large number of genetic studies for ADHD have been published and related genetic data has accumulated dramatically. To provide researchers with a comprehensive ADHD genetic resource, we previously developed the first genetic database for ADHD (ADHDgene). The abundant genetic data provides novel candidates for further study. Meanwhile, it also brings new challenges for selecting promising candidate genes for replication and verification research. In this study, we surveyed the computational tools for candidate gene prioritization and selected five tools, which integrate multiple data sources for gene prioritization, to prioritize ADHD candidate genes in ADHDgene. The prioritization analysis resulted in 16 prioritized candidate genes, which are mainly involved in several major neurotransmitter systems or in nervous system development pathways. Among these genes, nervous system development related genes, especially SNAP25, STX1A and the gene-gene interactions related with each of them, deserve further investigation. Our results may provide new insight for further verification studies and facilitate the exploration of the pathogenesis mechanism of ADHD.
Stock price trend prediction is a challenging issue in the financial field. To get improvements in predictive performance, both data and technique are essential. The purpose of this paper is to compare a deep learning model (LSTM) with two ensemble models (RF and XGBoost) using multiple data sources. Data are gathered from four stocks of the financial sector in the China A-share market, and accuracy and F1-measure are used as performance measures. The data of the past three days are applied to classify the rise and fall trend of the price on the next day. The models' performance is tested under different market styles (bull or bear market) and different market activities. The results indicate that under the same conditions, LSTM is the top algorithm, followed by RF and XGBoost. For all models applied in this study, prediction performance in bull markets is much better than in bear markets, and results in active periods are better than in inactive periods on average. It is also found that adding data sources is not always effective in improving forecasting performance, and valuable data sources and proper processing may be more essential than providing a large quantity of data sources.
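The windowing scheme described here (the past three days of features classifying the next day's rise or fall) can be sketched as below; the column names and toy numbers are hypothetical, and the resulting rows would then be fed to LSTM, RF, or XGBoost.

```python
import pandas as pd

def make_windows(df, lookback=3):
    """Turn a daily price frame into (features, label) rows: the features are
    the past `lookback` days and the label is 1 if the next close rises."""
    feature_cols = ["open", "high", "low", "close", "volume"]
    X, y = [], []
    for t in range(lookback - 1, len(df) - 1):
        window = df[feature_cols].iloc[t - lookback + 1:t + 1].to_numpy().ravel()
        X.append(window)
        y.append(int(df["close"].iloc[t + 1] > df["close"].iloc[t]))
    return pd.DataFrame(X), pd.Series(y, name="up_next_day")

# Tiny synthetic example; the real inputs would be the four A-share stocks.
df = pd.DataFrame({
    "open":   [10, 11, 12, 11, 12, 13],
    "high":   [11, 12, 13, 12, 13, 14],
    "low":    [9, 10, 11, 10, 11, 12],
    "close":  [11, 12, 11, 12, 13, 12],
    "volume": [1.0e5, 1.2e5, 0.9e5, 1.1e5, 1.3e5, 1.0e5],
})
X, y = make_windows(df)
print(X.shape, y.tolist())   # (3, 15) [1, 1, 0]
```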
With the popularization of the Internet and the development of technology, cyber threats are increasing day by day. Threats such as malware, hacking, and data breaches have had a serious impact on cybersecurity. The network security environment in the era of big data presents the characteristics of large amounts of data, high diversity, and high real-time requirements. Traditional security defense methods and tools have been unable to cope with the complex and changing network security threats. This paper proposes a machine-learning security defense algorithm based on metadata association features, emphasizing control over unauthorized users through privacy, integrity, and availability. A user model is established and the mapping between the user model and the metadata of the data sources is generated. By analyzing the user model and its corresponding mapping relationship, a query against the user model can be decomposed into queries against the various heterogeneous data sources, and the integration of heterogeneous data sources based on the metadata association characteristics can be realized. The approach defines and classifies customer information, automatically identifies and perceives sensitive data, builds a behavior audit and analysis platform, analyzes user behavior trajectories, and completes the construction of a machine learning customer information security defense system. The experimental results show that when the data volume is 5×10^3 bit, the data storage integrity of the proposed method is 92%, the data accuracy is 98%, and the success rate of data intrusion is only 2.6%. It can be concluded that the data storage method in this paper is safe, the data accuracy is always at a high level, and the data disaster recovery performance is good. This method can effectively resist data intrusion and has high air traffic control security. It can not only detect all viruses in user data storage, but also realize integrated virus processing, and further optimize the security defense effect of user big data.
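A simplified view of the query-decomposition step over the metadata mapping: each user-model field maps to one or more (source, field) pairs, and a query is split accordingly. The mapping and field names below are hypothetical; a real system would derive them from the metadata association rather than hard-code them.

```python
# Hypothetical mapping from user-model fields to (source, source field).
field_map = {
    "customer_name": [("crm_db", "cust_name"), ("billing_db", "name")],
    "email":         [("crm_db", "email_addr")],
    "last_login":    [("auth_log", "login_ts")],
}

def decompose(query_fields):
    """Split a user-model query into one field list per underlying source."""
    per_source = {}
    for field in query_fields:
        for source, source_field in field_map.get(field, []):
            per_source.setdefault(source, []).append(source_field)
    return per_source

print(decompose(["customer_name", "last_login"]))
# {'crm_db': ['cust_name'], 'billing_db': ['name'], 'auth_log': ['login_ts']}
```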
Scientists are dedicated to studying the detection of Alzheimer's disease onset to find a cure, or at the very least, medication that can slow the progression of the disease. This article explores the effectiveness of longitudinal data analysis, artificial intelligence, and machine learning approaches based on magnetic resonance imaging and positron emission tomography neuroimaging modalities for progression estimation and the detection of Alzheimer's disease onset. The significance of feature extraction in highly complex neuroimaging data, the identification of vulnerable brain regions, and the determination of the threshold values for plaques, tangles, and neurodegeneration of these regions will be extensively evaluated. Developing automated methods to improve the aforementioned research areas would enable specialists to determine the progression of the disease and find the link between the biomarkers and more accurate detection of Alzheimer's disease onset.
To construct mediators for data integration systems that integrate structured and semi-structured data, and to facilitate the reformulation and decomposition of queries, the presented system uses the XML processing language (XPL) for the mediator. With XPL, it is easy to construct mediators for data integration based on XML, and it can accelerate the work of the mediator.
Space-time disease cluster detection assists in conducting disease surveillance and implementing control strategies. The state-of-the-art method for this kind of problem is the Space-time Scan Statistics (SaTScan), which has limitations for non-traditional/non-clinical data sources due to its parametric model assumptions such as Poisson or Gaussian counts. Addressing this problem, an eigenspace-based method called Multi-EigenSpot has recently been proposed as a nonparametric solution. However, it is based on population counts data, which are not always available in the least developed countries. In addition, the population counts are difficult to approximate for some surveillance data such as emergency department visits and over-the-counter drug sales, where the catchment area for each hospital/pharmacy is undefined. We extend the population-based Multi-EigenSpot method to approximate the potential disease clusters from the observed/reported disease counts only, with no need for the population counts. The proposed adaptation uses an estimator of expected disease count that does not depend on the population counts. The proposed method was evaluated on a real-world dataset and the results were compared with the population-based methods Multi-EigenSpot and SaTScan. The results show that the proposed adaptation is effective in approximating the important outputs of the population-based methods.
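The abstract does not spell out the population-free estimator, but a natural baseline that uses only the observed counts assigns each region-period cell an expected count proportional to its region and period marginal totals. A sketch with hypothetical counts:

```python
import numpy as np

# counts[i, t] = reported disease cases in region i during time period t
counts = np.array([[ 4,  6,  5,  9],
                   [ 2,  3,  2,  4],
                   [10, 12, 11, 20]], dtype=float)

# Margin-based expectation: under a "no cluster" baseline the cases split
# across regions and periods in proportion to the observed marginal totals,
# so no population denominator is required.
region_tot = counts.sum(axis=1, keepdims=True)   # per-region totals
period_tot = counts.sum(axis=0, keepdims=True)   # per-period totals
expected = region_tot @ period_tot / counts.sum()

excess = (counts - expected) / np.sqrt(expected)  # standardized residuals
print(np.round(excess, 2))                        # large values hint at clusters
```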
Real traffic information was analyzed in terms of its statistical characteristics and approximated as a Gaussian time series. A data source model, called two-state constant bit rate (TSCBR), was proposed for dynamic traffic monitoring sensor networks. Analysis of the autocorrelation of the models shows that the proposed TSCBR model closely matches the statistical characteristics of the real data source. To further verify the validity of the TSCBR data source model, the performance metrics of power consumption and network lifetime were studied in the evaluation of the sensor media access control (SMAC) algorithm. The simulation results show that, compared with traditional data source models, the TSCBR model can significantly improve the accuracy of the algorithm evaluation.
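As a rough sketch of what a two-state constant-bit-rate source looks like as a simulation input (the abstract gives no parameter values, so the rates and transition probability here are hypothetical):

```python
import random

def tscbr_trace(steps, rate_high=8, rate_low=1, p_stay=0.9, seed=42):
    """Generate packets-per-slot from a two-state constant-bit-rate source:
    each state emits at a fixed rate; the state persists with probability
    p_stay and switches otherwise (all numbers here are illustrative)."""
    rng = random.Random(seed)
    state = 0                      # 0 = low-rate state, 1 = high-rate state
    trace = []
    for _ in range(steps):
        trace.append(rate_high if state else rate_low)
        if rng.random() > p_stay:  # leave the current state
            state = 1 - state
    return trace

print(tscbr_trace(20))
```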
The Digital Elevation Model (DEM) data of debris flow prevention engineering form the boundary of a debris flow prevention simulation; providing accurate and reliable DEM data is therefore a key consideration in debris flow prevention simulations. Thus, this paper proposes a multi-source data fusion method. First, we constructed 3D models of debris flow prevention using virtual reality technology according to the relevant specifications. The 3D spatial data generated by 3D modeling were converted into DEM data for debris flow prevention engineering. Then, the accuracy and applicability of the DEM data were verified by error analysis testing and fusion testing of the debris flow prevention simulation. Finally, we propose a Levels of Detail (LOD) algorithm based on a quadtree structure to realize the visualization of a large-scale disaster prevention scene. The test results reveal that the data fusion method controlled the error rate of the DEM data of the debris flow prevention engineering within an allowable range and generated 3D volume data (obj format) to compensate for the deficiency of DEM data, which cannot express the 3D internal entity space. Additionally, the Levels of Detail method can dispatch the data of a large-scale debris flow hazard scene in real time to ensure a realistic 3D visualization. In summary, the proposed methods can be applied to the planning of debris flow prevention engineering and to the simulation of the debris flow prevention process.
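The Levels of Detail dispatch over a quadtree can be pictured as distance-dependent refinement: tiles near the viewpoint are subdivided, distant ones are rendered coarsely. A minimal sketch, with an illustrative split threshold and maximum depth (not taken from the paper):

```python
import math
from dataclasses import dataclass

@dataclass
class QuadNode:
    x: float
    y: float
    size: float
    depth: int

def select_lod(node, viewer, max_depth=4, k=2.5, out=None):
    """Collect terrain tiles to render: a tile is split into four children
    only while it is close to the viewer relative to its size."""
    if out is None:
        out = []
    cx, cy = node.x + node.size / 2, node.y + node.size / 2
    far = math.hypot(cx - viewer[0], cy - viewer[1]) > k * node.size
    if node.depth >= max_depth or far:
        out.append(node)                          # render this tile as-is
        return out
    half = node.size / 2
    for dx in (0, half):
        for dy in (0, half):
            select_lod(QuadNode(node.x + dx, node.y + dy, half, node.depth + 1),
                       viewer, max_depth, k, out)
    return out

tiles = select_lod(QuadNode(0, 0, 1024, 0), viewer=(100, 100))
print(len(tiles), "tiles selected")
```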
This research describes a quantitative, rapid, and low-cost methodology for debris flow susceptibility evaluation at the basin scale using open-access data and geodatabases. The proposed approach can aid decision makers in land management and territorial planning by first screening for areas with a higher debris flow susceptibility. Five environmental predisposing factors, namely bedrock lithology, fracture network, Quaternary deposits, slope inclination, and hydrographic network, were selected as independent parameters, and their mutual interactions were described and quantified using the Rock Engineering System (RES) methodology. For each parameter, specific indexes were proposed, aiming to provide a final synthetic and representative index of debris flow susceptibility at the basin scale. The methodology was tested in four basins located in the Upper Susa Valley (NW Italian Alps), where debris flow events are the predominant natural hazard. The proposed matrix can represent a useful standardized tool, universally applicable, since it is independent of the type and characteristics of the basin.
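In the RES methodology, each parameter's weight is derived from its cause (row sum) and effect (column sum) in the interaction matrix, and the weighted parameter ratings give the basin-scale index. A sketch with a hypothetical interaction matrix and ratings:

```python
import numpy as np

params = ["lithology", "fractures", "Quaternary deposits",
          "slope", "hydrographic network"]

# Hypothetical RES interaction matrix: entry [i, j] codes how strongly
# parameter i influences parameter j (0 = none ... 4 = critical);
# the diagonal is ignored.
M = np.array([[0, 2, 3, 1, 2],
              [3, 0, 2, 2, 1],
              [1, 1, 0, 2, 3],
              [2, 2, 3, 0, 2],
              [1, 1, 2, 2, 0]], dtype=float)

cause  = M.sum(axis=1)            # C_i: influence of parameter i on the system
effect = M.sum(axis=0)            # E_i: influence of the system on parameter i
weights = (cause + effect) / (cause + effect).sum()

# Hypothetical normalized ratings of each parameter for one basin (0-1)
ratings = np.array([0.7, 0.5, 0.8, 0.6, 0.4])
susceptibility_index = float(weights @ ratings)
for p, w in zip(params, weights):
    print(f"{p:22s} weight = {w:.3f}")
print("basin susceptibility index =", round(susceptibility_index, 3))
```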
Neutral beam injection is one of the effective auxiliary heating methods in magnetic-confinement fusion experiments. In order to acquire the suppressor-grid current signal and avoid the grid being damaged by overheating, a data acquisition and over-current protection system based on the PXI (PCI eXtensions for Instrumentation) platform has been developed. The system consists of a current sensor, a data acquisition module and an over-current protection module. In the data acquisition module, the acquired data of one shot are transferred in isolation and saved on a data-storage server as a txt file. They can also be recalled using NBWave for future analysis. The over-current protection module contains two modes: remote and local. This gives it the function of setting a threshold voltage remotely and locally, and the forbidden time of over-current protection can also be set by a host PC in remote mode. Experimental results demonstrate that the data acquisition and over-current protection system has the advantages of a settable forbidden time and isolated transmission.
Over the past few years, the application and usage of Machine Learning (ML) techniques have increased exponentially due to the continuously increasing size of data and computing capacity. Despite the popularity of ML techniques, only a few research studies have focused on the application of ML, especially supervised learning techniques, in Requirement Engineering (RE) activities to solve the problems that occur in RE activities. The authors focus on the systematic mapping of past work to investigate those studies that focused on the application of supervised learning techniques in RE activities between 2002 and 2023. The authors aim to investigate the research trends, main RE activities, ML algorithms, and data sources that were studied during this period. Forty-five research studies were selected based on our exclusion and inclusion criteria. The results show that the scientific community used 57 algorithms. Among those algorithms, researchers mostly used the five following ML algorithms in RE activities: Decision Tree, Support Vector Machine, Naïve Bayes, K-nearest neighbour Classifier, and Random Forest. The results show that researchers used these algorithms in eight major RE activities. Those activities are requirements analysis, failure prediction, effort estimation, quality, traceability, business rules identification, content classification, and detection of problems in requirements written in natural language. Our selected research studies used 32 private and 41 public data sources. The most popular data sources detected in the selected studies are the Metric Data Programme from NASA, Predictor Models in Software Engineering, and the iTrust Electronic Health Care System.
Geological hazard is an adverse geological condition that can cause loss of life and property. Accurate prediction and analysis of geological hazards is an important and challenging task. In the past decade, there has been a great expansion of geohazard detection data and advancement in data-driven simulation techniques. In particular, great efforts have been made in applying deep learning to predict geohazards. To understand the recent progress in this field, this paper provides an overview of the commonly used data sources and deep neural networks in the prediction of a variety of geological hazards.
Portable Pilot Units (PPU) play an important role in modern pilotage. With more than 20 years of PPU development and practice, a comprehensive data analysis is conducted in this paper. The reliabilities and accuracy of different sensors are compared. Finally, the risk of PPU piloting and the corresponding countermeasures are discussed.
The main function of the Internet of Things is to collect and transmit data. At present, data transmission in the Internet of Things lacks an effective trust attestation mechanism and a trust traceability mechanism for the data source. To solve the above problems, a trust attestation mechanism for sensing layer nodes is presented. First, a trusted group is established, and a node that is going to join the group needs to attest its identity and key attributes to the higher-level node. Then the dynamic trust measurement value of the node can be obtained by measuring the node's data transmission behavior. Finally, the node encapsulates the key attributes and trust measurement value and uses a short message group signature to attest its trust to the challenger. This mechanism can measure the data sending and receiving behaviors of sensing nodes and track the data source; it does not expose the private information of nodes, and the sensing nodes can be traced effectively. The trust measurement and verification for sensing nodes is applicable to the Internet of Things, and the simulation experiments show that the trust attestation mechanism is flexible, practical and efficient. Besides, it can accurately and quickly identify malicious nodes, while the impact on system performance is negligible.
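The abstract does not specify how the dynamic trust measurement value is computed from transmission behavior; the toy score below, with entirely illustrative weights, only conveys the idea that dropping packets or sending far above an expected rate lowers the measured trust.

```python
def trust_measurement(sent, forwarded, expected_rate, window=10, prior=0.8):
    """Toy dynamic trust value for a sensing node, updated from its
    data-forwarding behavior over a sliding window (all weights illustrative)."""
    forward_ratio = forwarded / sent if sent else 1.0
    rate_penalty = min(1.0, expected_rate / max(sent / window, 1e-9))
    measured = 0.7 * forward_ratio + 0.3 * rate_penalty
    # Exponential smoothing so a single window cannot swing the value too far.
    return 0.5 * prior + 0.5 * measured

print(round(trust_measurement(sent=100, forwarded=97, expected_rate=9), 3))  # well-behaved node
print(round(trust_measurement(sent=100, forwarded=40, expected_rate=9), 3))  # packet-dropping node
```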
Paddy rice mapping is crucial for cultivation management, yield estimation, and food security. Guangdong, straddling the tropics and subtropics, is a major rice-producing region in China, so mapping paddy rice in Guangdong is essential. However, there are 2 main difficulties in tropical and subtropical paddy rice mapping: the lack of high-quality optical images and differences in paddy rice planting times. This study proposed a paddy rice mapping framework using phenology matching, integrating Sentinel-1 and Sentinel-2 data to incorporate prior knowledge into the classifiers. The transplanting periods of paddy rice were identified with Sentinel-1 data, and the subsequent 3 months were defined as the growth periods. Features during the growth periods obtained by Sentinel-1 and Sentinel-2 were input into machine learning classifiers. The classifiers using matched features substantially improved mapping accuracy compared with those using unmatched features, both for early and late rice mapping. The proposed method also improved the accuracy by 6.44% to 16.10% compared with 3 other comparison methods. The model, utilizing matched features, was applied to early and late rice mapping in Guangdong in 2020. Regression results between the mapped area and statistical data validate the credibility of the paddy rice mapping. Our analysis revealed that thermal conditions, especially cold severity during growing stages, are the primary determinant of paddy rice phenology. Spatial patterns of paddy rice in Guangdong result from a blend of human and physical factors, with slope and minimum temperature emerging as the most important limitations. These findings enhance our understanding of rice ecosystems' dynamics, offering insights for formulating relevant agricultural policies.
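A simplified view of the phenology-matching step: locate the transplanting date from the Sentinel-1 time series, then restrict feature extraction to the following three months. The backscatter-minimum heuristic and all values below are illustrative; the abstract does not state the exact detection rule.

```python
import numpy as np
import pandas as pd

# Hypothetical per-pixel Sentinel-1 VH backscatter time series (dB).
dates = pd.date_range("2020-02-01", periods=20, freq="10D")
vh = np.array([-17, -18, -19, -22, -24, -23, -20, -18, -16, -15,
               -14, -13, -13, -14, -15, -16, -17, -18, -18, -19.0])

# One common heuristic: transplanting coincides with the backscatter minimum
# caused by field flooding.
transplant_date = dates[int(np.argmin(vh))]
growth_start = transplant_date
growth_end = transplant_date + pd.DateOffset(months=3)

in_growth = (dates >= growth_start) & (dates <= growth_end)
print("transplanting:", transplant_date.date())
print("growth window:", growth_start.date(), "->", growth_end.date())
print("acquisitions used for features:", int(in_growth.sum()))
# Features (e.g. VH statistics, Sentinel-2 indices) would be computed over
# the `in_growth` acquisitions and fed to the classifier.
```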
The"omics"revolution has transformed the biomedical research landscape by equipping scientists with the ability to interrogate complex biological phenomenon and disease processes at an unprecedented level.Th...The"omics"revolution has transformed the biomedical research landscape by equipping scientists with the ability to interrogate complex biological phenomenon and disease processes at an unprecedented level.The volume of"big"data generated by the different omics studies such as genomics,transcriptomics,proteomics,and metabolomics has led to the concurrent development of computational tools to enable in silico analysis and aid data deconvolution.Considering the intensive resources and high costs required to generate and analyze big data,there has been centralized,collaborative efforts to make the data and analysis tools freely available as"Open Source,"to benefit the wider research community.Pancreatology research studies have contributed to this"big data rush"and have additionally benefitted from utilizing the open source data as evidenced by the increasing number of new research findings and publications that stem from such data.In this review,we briefly introduce the evolution of open source omics data,data types,the"FAIR"guiding principles for data management and reuse,and centralized platforms that enable free and fair data accessibility,availability,and provide tools for omics data analysis.We illustrate,through the case study of our own experience in mining pancreatitis omics data,the power of repurposing open source data to answer translationally relevant questions in pancreas research.展开更多
Background: Population-based cancer survival is a key metric in evaluating the overall effectiveness of health services and cancer control activities. Advancement in information technology enables accurate vital status tracking through multi-source data linkage. However, its reliability for survival estimates in China is unclear. Methods: We analyzed data from the Dalian Cancer Registry to evaluate the reliability of multi-source data linkage for population-based cancer survival estimates in China. Newly diagnosed cancer patients in 2015 were included and followed until June 2021. We conducted single-source data linkage by linking patients to the Dalian Vital Statistics System, and multi-source data linkage by further linking to the Dalian Household Registration System and the hospital medical records. Patient vital status was subsequently determined through active follow-up via telephone calls, referred to as comprehensive follow-up, which served as the gold standard. Using the cohort method, we calculated 5-year observed survival and age-standardized relative survival for 20 cancer types and all cancers combined. Results: Compared to comprehensive follow-up, single-source data linkage overestimated 5-year observed survival by 3.2% for all cancers combined, ranging from 0.1% to 8.6% across 20 cancer types. Multi-source data linkage provided a relatively complete patient vital status, with an observed survival estimate only 0.3% higher for all cancers, ranging from 0% to 1.5% across 20 cancer types. Conclusion: Multi-source data linkage contributes to reliable population-based cancer survival estimates in China. Linkage of multiple databases might be of great value in improving the efficiency of follow-up and the quality of survival data for cancer patients in developing countries.
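For reference, the cohort-method observed survival compared above can be computed as a product-limit estimate over the linked follow-up data; the tiny dataset in this sketch is purely illustrative.

```python
import numpy as np

# Hypothetical follow-up data: time to death or censoring (years) and an
# event flag (1 = died, 0 = alive at last linkage / lost to follow-up).
time  = np.array([0.6, 1.2, 2.5, 3.1, 4.0, 5.0, 5.0, 5.0, 5.0, 5.0])
event = np.array([1,   1,   0,   1,   1,   0,   0,   0,   0,   0  ])

def observed_survival(time, event, horizon=5.0):
    """Kaplan-Meier style observed survival at `horizon` years."""
    s = 1.0
    for t in np.sort(np.unique(time[event == 1])):
        if t > horizon:
            break
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & (event == 1))
        s *= 1.0 - deaths / at_risk
    return s

print(f"5-year observed survival: {observed_survival(time, event):.1%}")
```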
Traditional assessment indexes could not fully describe offshore wind resources, for the meteorological properties offshore are more complex than onshore. As a result, the uncertainty of offshore wind power projects would be increased and the final economic benefits would be affected. Therefore, a study on offshore wind resource assessment is carried out, including three processes: studying data sources, constructing a multidimensional index system, and proposing an offshore wind resource assessment method based on the analytic hierarchy process (AHP). First, measured wind data and two kinds of reanalysis data are used to analyze the characteristics and reliability of the data sources. Second, indexes such as effective wind speed occurrence, affluent level occurrence, coefficient of variation, and neutral state occurrence have been proposed to depict the availability, richness, and stability of offshore wind resources, respectively. Combined with existing parameters (wind power density, dominant wind direction occurrence, water depth, distance to coast), a multidimensional index system has been built, and on this basis an offshore wind energy potential assessment method has been proposed. Furthermore, the proposed method is verified by the annual energy production of five offshore wind turbines and the practical operating data of four offshore wind farms in China. This study also compares the ranking results of the AHP model to two multi-criteria decision making (MCDM) models, including weighted aggregated sum product assessment (WASPAS) and multi-attribute ideal real comparative analysis (MAIRCA). Results show the proposed method performs well in practical engineering applications, where the economic score values have been considered based on the offshore reasonable utilization hours over the whole life cycle in China.
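The AHP step derives the index weights from a pairwise-comparison matrix via its principal eigenvector and checks consistency before scoring candidate sites. The comparison matrix and site scores below are hypothetical, not the paper's values.

```python
import numpy as np

indexes = ["wind power density", "effective wind speed occurrence",
           "coefficient of variation", "water depth", "distance to coast"]

# Hypothetical AHP pairwise-comparison matrix (Saaty 1-9 scale):
# A[i, j] = relative importance of index i over index j.
A = np.array([[1,   2,   3,   5,   5],
              [1/2, 1,   2,   4,   4],
              [1/3, 1/2, 1,   3,   3],
              [1/5, 1/4, 1/3, 1,   1],
              [1/5, 1/4, 1/3, 1,   1]], dtype=float)

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
weights = np.abs(eigvecs[:, k].real)
weights /= weights.sum()                      # principal-eigenvector weights

# Consistency check (random index RI = 1.12 for a 5x5 matrix)
ci = (eigvals.real[k] - len(A)) / (len(A) - 1)
print("consistency ratio:", round(ci / 1.12, 3))

# Hypothetical normalized scores of one candidate site on each index (0-1)
site_scores = np.array([0.8, 0.7, 0.6, 0.9, 0.5])
print("site assessment score:", round(float(weights @ site_scores), 3))
for name, w in zip(indexes, weights):
    print(f"  {name:32s} {w:.3f}")
```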