Facing the development of future 5G, emerging technologies such as the Internet of Things, big data, cloud computing, and artificial intelligence are driving explosive growth in data traffic. With radical changes in communication theory and implementation technologies, wireless communications and wireless networks have entered a new era. Among them, wireless big data (WBD) has tremendous value, and artificial intelligence (AI) opens up previously unthinkable possibilities. However, for the communities developing big data and applying artificial intelligence, the lack of a sound theoretical foundation and of suitable mathematical methods is a real challenge that needs to be solved. Starting from the basic problem of wireless communication, namely the interrelationship of demand, environment, and capability, this paper investigates the concept and data model of WBD, wireless data mining, wireless knowledge and wireless knowledge learning (WKL), and typical practical examples, in order to facilitate and open up more opportunities for WBD research and development. Such research is beneficial for creating new theoretical foundations and emerging technologies for future wireless communications.
The mining industry faces a number of challenges that promote the adoption of new technologies. Big data, driven by the accelerating progress of information and communication technology, is one of the promising technologies that can reshape the entire mining landscape. Despite numerous attempts to apply big data in the mining industry, fundamental problems of big data, especially big data management (BDM), persist in the industry. This paper aims to fill this gap by presenting the basics of BDM. It provides a brief introduction to big data and BDM, and it discusses the challenges encountered by the mining industry to indicate the necessity of implementing big data. It also summarizes data sources in the mining industry and presents the potential benefits of big data for the sector. The work further envisions a future in which a global database project is established and big data is used together with other technologies (e.g., automation), supported by government policies and following international standards. Finally, the paper outlines precautions for the use of BDM in the mining industry.
Many high-quality studies have emerged from public databases such as Surveillance, Epidemiology, and End Results (SEER), the National Health and Nutrition Examination Survey (NHANES), The Cancer Genome Atlas (TCGA), and the Medical Information Mart for Intensive Care (MIMIC); however, these data are often characterized by high dimensional heterogeneity, timeliness, scarcity, and irregularity, so their value is not fully exploited. Data-mining technology has become a frontier field in medical research, as it shows excellent performance in evaluating patient risks and assisting clinical decision-making when building disease-prediction models. Data mining therefore has unique advantages in clinical big-data research, especially in large-scale medical public databases. This article introduces the main medical public databases and describes the steps, tasks, and models of data mining in simple language, along with data-mining methods and their practical applications. The goal of this work is to help clinical researchers gain a clear and intuitive understanding of how data-mining technology is applied to clinical big data, in order to promote research results that benefit doctors and patients. Funding: the National Social Science Foundation of China (No. 16BGL183).
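To make the idea of a disease-prediction model concrete, the following is a minimal illustrative sketch in Python, assuming a hypothetical cohort file and predictor columns (age, bmi, systolic_bp, creatinine, outcome); it is not drawn from SEER, NHANES, TCGA, or MIMIC and is not the article's own workflow.

```python
# Toy disease-prediction model: logistic regression on routinely collected
# variables. File name and columns are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

cohort = pd.read_csv("cohort.csv")                      # hypothetical extract
features = ["age", "bmi", "systolic_bp", "creatinine"]  # assumed predictors
X_tr, X_te, y_tr, y_te = train_test_split(cohort[features], cohort["outcome"],
                                          stratify=cohort["outcome"], random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```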
Comprehensive evaluation and warning are very important and difficult tasks in food safety. This paper focuses on the application of big data mining in the field of food safety warning. We first introduce the concept of big data mining and three big data mining methods, and we discuss the application of these three methods in food safety. We then compare the methods and propose how to apply a Back Propagation (BP) neural network to food safety risk warning. Funding: Soft Science Research Project of Guizhou Province (R20142023) and Key Youth Fund Project of Guizhou Academy of Sciences (J201402).
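As a hedged illustration of the kind of Back Propagation neural network the paper proposes for risk warning, the sketch below trains a small multilayer perceptron with scikit-learn on synthetic inspection features; the feature meanings and the data are assumptions, not the authors' model.

```python
# Minimal sketch of a back-propagation neural network for food safety risk
# warning (hypothetical features; synthetic data).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical inspection records: [pesticide residue, heavy-metal level,
# microbial count, additive level]; label 1 = high risk, 0 = acceptable.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0.8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)

# A small multilayer perceptron trained with back-propagation (SGD).
bp_net = MLPClassifier(hidden_layer_sizes=(16, 8), solver="sgd",
                       learning_rate_init=0.01, max_iter=2000, random_state=0)
bp_net.fit(scaler.transform(X_train), y_train)
print("warning accuracy:", bp_net.score(scaler.transform(X_test), y_test))
```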
Although big data are widely used in various fields, their application is still rare in the study of mining subsidence prediction (MSP) caused by underground mining. Traditional MSP research tends to oversimplify geological mining conditions and to ignore the spatial fluctuation of rock layers. In the context of geospatial big data, this paper proposes a data-intensive FLAC3D (Fast Lagrangian Analysis of Continua in 3 Dimensions) model based on borehole logs. In the modeling process, we developed a method for handling geospatial big data that makes full use of the borehole logs. The effectiveness of the proposed method was verified by comparing the results of the traditional method, the proposed method, and field observations. The findings show that the proposed method has clear advantages over the traditional prediction results: the relative error of the maximum surface subsidence predicted by the proposed method decreased by 93.7%, and the standard deviation of the prediction results (over 70 points) decreased by 39.4% on average. The data-intensive modeling method is of great significance for improving the accuracy of mining subsidence prediction.
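One data-handling step implied by building a spatially varying model from borehole logs is interpolating stratum elevations between boreholes. The sketch below shows such a step with SciPy on synthetic borehole positions; it illustrates the general idea only and is not the paper's actual FLAC3D modeling pipeline.

```python
# Illustrative preprocessing: interpolating a rock-stratum interface from
# scattered borehole logs onto a regular grid, the kind of step needed before
# building a spatially varying numerical model. All values are synthetic.
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(1)
n_holes = 200
xy = rng.uniform(0, 5000, size=(n_holes, 2))   # borehole plan positions (m)
top_of_coal = 300 + 0.01 * xy[:, 0] + rng.normal(scale=5, size=n_holes)  # elevation (m)

# Regular grid covering the mining area.
gx, gy = np.meshgrid(np.linspace(0, 5000, 101), np.linspace(0, 5000, 101))
surface = griddata(xy, top_of_coal, (gx, gy), method="linear")

print("interpolated grid:", surface.shape,
      "NaNs outside convex hull:", int(np.isnan(surface).sum()))
```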
Integrating machine learning and data mining is crucial for processing big data and extracting valuable insights to enhance decision-making. However, imbalanced target variables within big data present technical challenges that hinder the performance of supervised learning classifiers on key evaluation metrics, limiting their overall effectiveness. This study presents a comprehensive review of both common and recently developed Supervised Learning Classifiers (SLCs) and evaluates their performance in data-driven decision-making. The evaluation uses various metrics, with a particular focus on the harmonic mean score (F-1 score), on an imbalanced real-world bank target marketing dataset. The findings indicate that grid-search random forest and random-search random forest excel in precision and area under the curve, while Extreme Gradient Boosting (XGBoost) outperforms the other traditional classifiers in terms of F-1 score. Employing oversampling methods to address the imbalanced data yields a significant performance improvement in XGBoost, which delivers superior results across all metrics, particularly when using the SMOTE variant known as the BorderlineSMOTE2 technique. The study identifies several key factors for effectively addressing the challenges of supervised learning with imbalanced datasets: selecting appropriate datasets for training and testing, choosing the right classifiers, employing effective techniques for processing and handling imbalanced datasets, and identifying suitable metrics for performance evaluation. These factors also entail the utilisation of effective exploratory data analysis in conjunction with visualisation techniques to yield insights conducive to data-driven decision-making. Funding: supported by the Cyber Technology Institute (CTI) at the School of Computer Science and Informatics, De Montfort University, United Kingdom, along with financial assistance from Universiti Tun Hussein Onn Malaysia and the UTHM Publisher's Office through publication fund E15216.
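For readers who want to see what the described oversampling-plus-XGBoost setup looks like in code, here is a minimal sketch using imbalanced-learn's Borderline-SMOTE (variant 2) and an F1 evaluation; the bank-marketing file name and the "subscribed" target column are placeholders, and the hyperparameters are not the study's tuned values.

```python
# Rebalance the training split with Borderline-SMOTE (variant 2), train XGBoost,
# and report the F1 score on the untouched, still-imbalanced test split.
import pandas as pd
from imblearn.over_sampling import BorderlineSMOTE
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from xgboost import XGBClassifier

df = pd.read_csv("bank_marketing.csv")              # hypothetical extract
X = pd.get_dummies(df.drop(columns=["subscribed"]))
y = (df["subscribed"] == "yes").astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Oversample only the training data to avoid information leakage into the test set.
X_res, y_res = BorderlineSMOTE(kind="borderline-2", random_state=42).fit_resample(X_tr, y_tr)

clf = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1,
                    eval_metric="logloss", random_state=42)
clf.fit(X_res, y_res)
print("F1 on the imbalanced test split:", f1_score(y_te, clf.predict(X_te)))
```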
Supply Chain Finance (SCF) is important for improving the effectiveness of supply chain capital operations and reducing the overall management cost of a supply chain. In recent years, with the deep integration of the supply chain with the Internet, big data, artificial intelligence, the Internet of Things, blockchain, and related technologies, the efficiency of supply chain financial services can be greatly improved by building more customized risk pricing models and conducting more rigorous investment decision-making processes. However, with the rapid development of new technologies, SCF data have increased massively, and new financial fraud behaviors or patterns are becoming more covertly scattered among normal ones. The lack of sufficient capability to handle big data volumes and mitigate financial fraud may lead to huge losses in supply chains. In this article, a distributed big data mining approach is proposed for financial fraud detection in a supply chain. It implements a distributed deep learning model, a Convolutional Neural Network (CNN), on the big data infrastructure of Apache Spark and Hadoop to speed up the processing of the large dataset in parallel and reduce the processing time significantly. By training and testing on the continually updated SCF dataset, the approach can intelligently and automatically classify the massive data samples and discover fraudulent financing behaviors, so as to enhance financial fraud detection with high precision and recall rates and reduce fraud losses in a supply chain. Funding: Hunan Provincial Education Science 13th Five-Year Plan (Grant No. XJK016BXX001), the Social Science Foundation of Hunan Province (Grant No. 17YBA049), and the Open Foundation for University Innovation Platform of Hunan Province, China (Grant No. 18K103).
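A heavily simplified, single-machine sketch of a CNN fraud classifier of the kind described is shown below using Keras; the Spark/Hadoop distribution layer that the article relies on is omitted, and the 32 numeric transaction features and synthetic labels are assumptions for illustration only.

```python
# Single-machine stand-in for the article's distributed CNN: a 1-D convolutional
# classifier over numeric SCF transaction features (all data synthetic).
import numpy as np
import tensorflow as tf

n_features = 32
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features, 1)),   # treat features as a 1-D sequence
    tf.keras.layers.Conv1D(16, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # 1 = fraudulent financing
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# Synthetic stand-in for the SCF dataset with a rare positive (fraud) class.
X = np.random.rand(1024, n_features, 1).astype("float32")
y = (np.random.rand(1024) < 0.05).astype("float32")
model.fit(X, y, epochs=3, batch_size=64, verbose=0)
print(model.evaluate(X, y, verbose=0))
```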
Research on and application of big data mining is currently a hot topic. This paper briefly introduces the basic ideas of big data research, analyses the necessity of applying big data to earthquake precursor observation, and examines issues and solutions that arise when applying this technology in the seismic domain. By doing so, we hope to promote the innovative use of big data in the analysis of earthquake precursor observation data. Funding: sponsored by the Earthquake Monitoring Special Project "Precursor Observation Data Mining", Key Laboratory of Crustal Dynamics, Institute of Crustal Dynamics, China Earthquake Administration.
With the development and improvement of information technologies, the growth of upper-layer application systems, and the rapid expansion of the data accumulated in the campus information environment, a typical campus big data environment has initially taken shape. Because of the characteristics of higher education, students are highly mobile and their learning environments are uncertain, so attendance has mostly relied on manual roll call. A student attendance system based on a big data architecture relies on the campus network and appropriate sensors. Through data-mining technology, combined with the campus One Card solution, attendance can be managed without calling the roll in class. This can not only strengthen the management of students but also improve the management level of colleges and universities.
Big data cloud computing is a new computing mode that integrates distributed processing, parallel processing, network computing, virtualization, load balancing, and other network technologies. Under a big data cloud computing system, computing resources are distributed in a resource pool composed of a large number of computers, allowing users to connect to remote computer systems according to their own data and information needs.
Data mining is the process of extracting hidden, unknown, but potentially valuable information from massive data, and big data has a profound impact on scientific discovery and value creation. Data mining (DM) with big data has been widely used across the lifecycle of electronic products, from the design and production stages to the service stage, and a comprehensive analysis of DM with big data, together with a review of its application across the lifecycle stages, will benefit researchers seeking to develop solid research. In recent years, big data has become a buzzword, which has pushed analysts to extend existing data mining methods to cope with the evolved nature of data and to develop new analytic techniques. In this paper, we develop an empirical evaluation method based on the principles of Design of Experiments. We apply this method to evaluate data mining tools and machine learning algorithms for building big data analytics for telecommunication monitoring data. Two case studies are conducted to provide insights into the relations between the requirements of data analysis and the choice of a tool or algorithm in the context of data analysis workflows.
The cloud computing platform can efficiently allocate dynamic resources, generate dynamic computing and storage capacity according to user requests, and provide a good platform for big data feature analysis and mining. Big data feature mining in the cloud computing environment is an effective method for the efficient use of massive data in the information age. In the process of big data mining, however, the feature-mining method based on gradient sampling lacks logical rigor: it mines big data features only from a single-level perspective, which reduces the precision of big data feature mining.
Clinical databases have accumulated large quantities of information about patients and their medical conditions. Current challenges in biomedical research and clinical practice include information overload and the need to optimize workflows, processes, and guidelines in order to increase capacity while reducing costs and improving efficiency. There is an urgent need for integrative and interactive machine learning solutions, because no medical doctor or biomedical researcher can keep pace today with the increasingly large and complex data sets – often called "Big Data".
Big data is a highlighted challenge for many fields with the rapid expansion of large-volume, complex, and fast-growing sources of data. Mining big data is required for exploring the essence of the data and providing meaningful information. To this end, we previously introduced the theory of physical fields to explore relations between objects in data space and proposed the framework of the data field to discover the underlying distribution of big data. This paper presents an overview of big data mining using the data field. It mainly discusses the theory of the data field and different aspects of its applications, including feature selection for high-dimensional data, clustering, and the recognition of facial expressions in human-computer interaction. In these applications, the data field is employed to capture the intrinsic distribution of data objects for selecting meaningful features, fast clustering, and describing the variation of facial expressions. It is expected that our contributions will help overcome the problems associated with big data.
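As a rough illustration of the data field idea, the sketch below computes a Gaussian-decaying potential contributed by each data object, so that potentials are high in dense regions; the specific field function, the radiation factor sigma, and the sample points are assumptions for illustration rather than the paper's exact formulation.

```python
# Illustrative "data field" potential: each object x_i with mass m_i contributes
# a Gaussian-decaying term, and the aggregate field reflects the data distribution.
import numpy as np

def data_field_potential(points, masses, query, sigma=1.0):
    """phi(x) = sum_i m_i * exp(-(||x - x_i|| / sigma)**2), an assumed field function."""
    d = np.linalg.norm(points - query, axis=1)
    return np.sum(masses * np.exp(-(d / sigma) ** 2))

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(4, 0.5, (100, 2))])
m = np.ones(len(pts))

# Potentials are high near dense regions, which is what fast clustering and
# feature selection based on the data field exploit.
print(data_field_potential(pts, m, np.array([0.0, 0.0])))
print(data_field_potential(pts, m, np.array([2.0, 2.0])))
```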
With the rapid development of the internet, the internet of things, the mobile internet, and cloud computing, the amount of data in circulation has grown rapidly. More social information has contributed to the growth of big data, and data has become a core asset. Big data is challenging in terms of effective storage, efficient computation and analysis, and deep data mining. In this paper, we discuss the significance of big data and the key technologies and problems in big-data analytics. We also discuss the future prospects of big-data analytics.
With the rapid development of the global economy, maritime transportation has become much more convenient due to large capacities and low freight costs. However, this means the sea lanes are becoming more and more crowded, leading to high probabilities of marine accidents in complex maritime environments. According to relevant historical statistics, a large number of accidents have happened in water areas that lack the high-precision navigation data that could be used to enhance navigation safety. The purpose of this work was to carry out ship route planning automatically by mining historical big automatic identification system (AIS) data. Experiential navigation information hidden in maritime big data can be automatically extracted using advanced data mining techniques, assisting in the generation of safe and reliable ship planning routes for complex maritime environments. In this paper, a novel method is proposed to construct a big data-driven framework for generating ship planning routes automatically under varying navigation conditions. The method first performs density-based spatial clustering of applications with noise (DBSCAN) on a large number of ship trajectories to form trajectory vector clusters. It then iteratively calculates the centerline of each trajectory vector cluster and constructs the waterway network from the node-arc topology among these centerlines. Shipping routes can then be generated from the waterway network and, for sea areas not covered by the network, by rasterizing marine environmental risks. Numerous experiments have been conducted on different AIS data sets in different water areas, and the experimental results demonstrate the effectiveness of the proposed ship route planning framework.
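The first step named in the abstract, density-based clustering of AIS trajectories, can be prototyped as follows with scikit-learn's DBSCAN on raw AIS positions using a haversine distance; the file name, column names, and the eps/min_samples values are placeholders rather than the paper's settings.

```python
# Cluster AIS ship positions with DBSCAN to reveal dense traffic regions.
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

ais = pd.read_csv("ais_records.csv")                 # hypothetical AIS extract
coords = np.radians(ais[["lat", "lon"]].to_numpy())  # haversine metric needs radians

# eps of roughly 500 m expressed as an angle on the Earth's surface (radius ~6371 km).
eps_rad = 0.5 / 6371.0
labels = DBSCAN(eps=eps_rad, min_samples=20, metric="haversine").fit_predict(coords)

ais["cluster"] = labels
print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0),
      "noise points:", int((labels == -1).sum()))
```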
The technological evolution is producing a unified (Industrial) Internet of Things network, in which loosely coupled smart manufacturing devices build smart manufacturing systems and enable comprehensive collaboration possibilities that increase the dynamics and volatility of their ecosystems. On the one hand, this evolution creates a huge field for exploitation; on the other hand, it also increases complexity, bringing new challenges and requirements that demand new approaches to several issues. One challenge is the analysis of such systems, which generate huge amounts of (continuously generated) data potentially containing valuable information useful for several use cases, such as knowledge generation, key performance indicator (KPI) optimization, diagnosis, prediction, feedback to design, or decision support. This work presents a review of big data analysis in smart manufacturing systems. It includes the status quo in research, innovation, and development, the next challenges, and a comprehensive list of potential use cases and exploitation possibilities.
Purpose: Big data offer a huge challenge. Their very existence leads to the contradiction that the more data we have, the less accessible they become, as the particular piece of information one is searching for may be buried among terabytes of other data. In this contribution we discuss the origin of big data and point to three challenges that arise with big data: data storage, data processing, and generating insights. Design/methodology/approach: Computer-related challenges can be expressed by the CAP theorem, which states that it is only possible to simultaneously provide two of the following three properties in distributed applications: consistency (C), availability (A), and partition tolerance (P). As an aside we mention Amdahl's law and its application to scientific collaboration. We further discuss data mining in large databases and knowledge representation for handling the results of data mining exercises, offer a short informetric study of the field of big data, and point to the ethical dimension of the big data phenomenon. Findings: There are still serious problems to overcome before the field of big data can deliver on its promises. Implications and limitations: This contribution offers a personal view, focusing on the information science aspects, but much more can be said about software aspects. Originality/value: We express the hope that information scientists, including librarians, will be able to play their full role within the knowledge discovery, data mining, and big data communities, leading to exciting developments, the reduction of scientific bottlenecks, and really innovative applications.
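Since the abstract mentions Amdahl's law only in passing, a short worked example may help: if a fraction p of a task can be parallelised over n workers, the speedup is bounded by 1 / ((1 - p) + p / n). The values below are illustrative and not taken from the paper.

```python
# Amdahl's law: overall speedup when a fraction p of the work is parallelised
# over n workers, while the remaining (1 - p) stays serial.
def amdahl_speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

for n in (2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
# Even with 1024 workers, a 5% serial fraction caps the speedup near 20x.
```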
Privacy protection for big data linking is discussed here in relation to the Central Statistics Office (CSO) Ireland's big data linking project, the 'Structure of Earnings Survey - Administrative Data Project' (SESADP). The result of the project was the creation of datasets and statistical outputs for the years 2011 to 2014 to meet Eurostat's annual earnings statistics requirements and the Structure of Earnings Survey (SES) Regulation. Record linking across the Census and various public sector datasets enabled the necessary information to be acquired to meet the Eurostat earnings requirements. However, the risk of statistical disclosure (i.e., identifying an individual on the dataset) is high unless privacy and confidentiality safeguards are built into the data matching process. This paper looks at the three methods of linking records on big datasets employed in the SESADP, and at how to anonymise the data to protect the identity of individuals where potentially disclosive variables exist.
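The abstract does not spell out the anonymisation mechanics, so the following is only a generic, textbook-style sketch of one common safeguard for privacy-preserving record linkage, keyed hashing of a direct identifier before matching; the identifier format and key are hypothetical, and this is not a description of the CSO's actual SESADP procedure.

```python
# Generic pseudonymisation for record linkage: replace a direct identifier with
# a keyed hash so equal identifiers still link but originals cannot be recovered
# without the secret key held by the trusted party.
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-key-held-by-the-trusted-party"

def pseudonymise(identifier: str) -> str:
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

census_ids = ["1234567A", "7654321B"]        # hypothetical identifiers
earnings_ids = ["7654321B", "9999999C"]
linked = set(map(pseudonymise, census_ids)) & set(map(pseudonymise, earnings_ids))
print("records linkable via pseudonyms:", len(linked))
```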
Global warming has caused the Arctic Ocean ice cover to shrink. This endangers the environment but has made traversing the Arctic channel possible, so the strategic position of the Arctic has been significantly enhanced. As a near-Arctic country, China has formulated relevant policies that will be directly affected by changes in the international relations among the eight Arctic countries (regions). A comprehensive, real-time analysis of the various characteristics of Arctic geographical relationships is therefore needed in China to help formulate political, economic, and diplomatic countermeasures. Massive global real-time open databases provide news data from major media outlets in various countries, which makes it possible to monitor geographical relationships in real time. This paper explores key elements of the social development of the eight Arctic countries (regions) over 2013-2019 based on the GDELT database and the method of labeled latent Dirichlet allocation. It also constructs a national interaction network and identifies the evolution pattern of the relationships between Arctic countries (regions). The following conclusions are drawn. (1) Arctic news hotspots now focus on climate change and ice-cap melting, which is becoming the main driving factor for changes in geographical relationships in the Arctic. (2) There is a strong correlation between the number of news pieces about ice-cap melting and the sea ice area. (3) With the melting of the ice caps, social, economic, and military activities in the Arctic have been booming, and the competition for dominance is becoming increasingly fierce. In general, there is a pattern of domination by Russia and Canada. Funding: National Natural Science Foundation of China (42071153) and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19040401, XDA20080100).
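To illustrate how a country interaction network can be assembled from GDELT event records, here is a minimal pandas/networkx sketch; Actor1CountryCode, Actor2CountryCode, and AvgTone are standard GDELT event-table fields, but the file name, the eight-country filter, and the edge weighting are assumptions rather than the paper's exact procedure.

```python
# Build a country-interaction network from GDELT event records:
# edge weight = number of reported interactions, average tone kept as attribute.
import pandas as pd
import networkx as nx

ARCTIC = {"USA", "CAN", "RUS", "NOR", "DNK", "SWE", "FIN", "ISL"}

events = pd.read_csv("gdelt_events_2013_2019.csv",
                     usecols=["Actor1CountryCode", "Actor2CountryCode", "AvgTone"])
mask = (events["Actor1CountryCode"].isin(ARCTIC)
        & events["Actor2CountryCode"].isin(ARCTIC))
arctic = events[mask]

agg = (arctic.groupby(["Actor1CountryCode", "Actor2CountryCode"])
             .agg(weight=("AvgTone", "size"), mean_tone=("AvgTone", "mean"))
             .reset_index())

G = nx.from_pandas_edgelist(agg, "Actor1CountryCode", "Actor2CountryCode",
                            edge_attr=["weight", "mean_tone"])
print(G.number_of_nodes(), "countries,", G.number_of_edges(), "interaction links")
```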