Opinion (sentiment) analysis on big data streams from the constantly generated text streams on social media networks to hundreds of millions of online consumer reviews provides many organizations in every field with o...Opinion (sentiment) analysis on big data streams from the constantly generated text streams on social media networks to hundreds of millions of online consumer reviews provides many organizations in every field with opportunities to discover valuable intelligence from the massive user generated text streams. However, the traditional content analysis frameworks are inefficient to handle the unprecedentedly big volume of unstructured text streams and the complexity of text analysis tasks for the real time opinion analysis on the big data streams. In this paper, we propose a parallel real time sentiment analysis system: Social Media Data Stream Sentiment Analysis Service (SMDSSAS) that performs multiple phases of sentiment analysis of social media text streams effectively in real time with two fully analytic opinion mining models to combat the scale of text data streams and the complexity of sentiment analysis processing on unstructured text streams. We propose two aspect based opinion mining models: Deterministic and Probabilistic sentiment models for a real time sentiment analysis on the user given topic related data streams. Experiments on the social media Twitter stream traffic captured during the pre-election weeks of the 2016 Presidential election for real-time analysis of public opinions toward two presidential candidates showed that the proposed system was able to predict correctly Donald Trump as the winner of the 2016 Presidential election. The cross validation results showed that the proposed sentiment models with the real-time streaming components in our proposed framework delivered effectively the analysis of the opinions on two presidential candidates with average 81% accuracy for the Deterministic model and 80% for the Probabilistic model, which are 1% - 22% improvements from the results of the existing literature.展开更多
Earthquakes are highly destructive spatio-temporal phenomena whose analysis is essential for disaster preparedness and risk mitigation.Modern seismological research produces vast volumes of heterogeneous data from sei...Earthquakes are highly destructive spatio-temporal phenomena whose analysis is essential for disaster preparedness and risk mitigation.Modern seismological research produces vast volumes of heterogeneous data from seismic networks,satellite observations,and geospatial repositories,creating the need for scalable infrastructures capable of integrating and analyzing such data to support intelligent decision-making.Data warehousing technologies provide a robust foundation for this purpose;however,existing earthquake-oriented data warehouses remain limited,often relying on simplified schemas,domain-specific analytics,or cataloguing efforts.This paper presents the design and implementation of a spatio-temporal data warehouse for seismic activity.The framework integrates spatial and temporal dimensions in a unified schema and introduces a novel array-based approach for managing many-to-many relationships between facts and dimensions without intermediate bridge tables.A comparative evaluation against a conventional bridge-table schema demonstrates that the array-based design improves fact-centric query performance,while the bridge-table schema remains advantageous for dimension-centric queries.To reconcile these trade-offs,a hybrid schema is proposed that retains both representations,ensuring balanced efficiency across heterogeneous workloads.The proposed framework demonstrates how spatio-temporal data warehousing can address schema complexity,improve query performance,and support multidimensional visualization.In doing so,it provides a foundation for integrating seismic analysis into broader big data-driven intelligent decision systems for disaster resilience,risk mitigation,and emergency management.展开更多
With the accelerating aging process of China’s population,the demand for community elderly care services has shown diversified and personalized characteristics.However,problems such as insufficient total care service...With the accelerating aging process of China’s population,the demand for community elderly care services has shown diversified and personalized characteristics.However,problems such as insufficient total care service resources,uneven distribution,and prominent supply-demand contradictions have seriously affected service quality.Big data technology,with core advantages including data collection,analysis and mining,and accurate prediction,provides a new solution for the allocation of community elderly care service resources.This paper systematically studies the application value of big data technology in the allocation of community elderly care service resources from three aspects:resource allocation efficiency,service accuracy,and management intelligence.Combined with practical needs,it proposes optimal allocation strategies such as building a big data analysis platform and accurately grasping the elderly’s care needs,striving to provide operable path references for the construction of community elderly care service systems,promoting the early realization of the elderly care service goal of“adequate support and proper care for the elderly”,and boosting the high-quality development of China’s elderly care service industry.展开更多
With the advent of the big data era,modern statistics has enjoyed unprecedented development opportunities and also faced numerous new challenges.Traditional statistical computing methods are often limited by issues su...With the advent of the big data era,modern statistics has enjoyed unprecedented development opportunities and also faced numerous new challenges.Traditional statistical computing methods are often limited by issues such as computer memory capacity and distributed storage of data across different locations,and are unable to directly apply to large-scale data sets.Therefore,in the context of big data,designing efficient and theoretically guaranteed statistical learning and inference algorithms has become a key issue that the current field of statistics urgently needs to address.In this paper,the application status of statistical analysis methods in the big data environment was systematically reviewed,and its future development directions were analyzed to provide reference and support for the further development of theory and methods of the statistical analysis of big data.展开更多
The convergence of artificial intelligence(AI)and big data is reshaping contemporary oncology by enabling the integration of multimodal information across imaging,pathology,genomics,and clinical records.From a physici...The convergence of artificial intelligence(AI)and big data is reshaping contemporary oncology by enabling the integration of multimodal information across imaging,pathology,genomics,and clinical records.From a physician-centered perspective,these technologies can potentially be used to improve diagnostic precision,support individualized treatment planning,enhance longitudinal patient management,and accelerate both clinical and translational research.In this review,we synthesize the core AI methodologies most relevant to oncology-machine learning,deep learning,and large language models-and examine how they interact with established and emerging oncology data platforms.We further highlight practical use cases in clinical workflows and research pipelines,emphasizing opportunities for advancing precision cancer care while also addressing challenges associated with data heterogeneity,model generalizability,privacy protection,and real-world implementation.By underscoring the synergistic value of AI and big data,this review aims to inform the development of clinically meaningful,context-adapted strategies that promote translational innovation in both global and locally resourced healthcare environments.展开更多
The increasing number of interconnected devices and the incorporation of smart technology into contemporary healthcare systems have significantly raised the attack surface of cyber threats.The early detection of threa...The increasing number of interconnected devices and the incorporation of smart technology into contemporary healthcare systems have significantly raised the attack surface of cyber threats.The early detection of threats is both necessary and complex,yet these interconnected healthcare settings generate enormous amounts of heterogeneous data.Traditional Intrusion Detection Systems(IDS),which are generally centralized and machine learning-based,often fail to address the rapidly changing nature of cyberattacks and are challenged by ethical concerns related to patient data privacy.Moreover,traditional AI-driven IDS usually face challenges in handling large-scale,heterogeneous healthcare data while ensuring data privacy and operational efficiency.To address these issues,emerging technologies such as Big Data Analytics(BDA)and Federated Learning(FL)provide a hybrid framework for scalable,adaptive intrusion detection in IoT-driven healthcare systems.Big data techniques enable processing large-scale,highdimensional healthcare data,and FL can be used to train a model in a decentralized manner without transferring raw data,thereby maintaining privacy between institutions.This research proposes a privacy-preserving Federated Learning–based model that efficiently detects cyber threats in connected healthcare systems while ensuring distributed big data processing,privacy,and compliance with ethical regulations.To strengthen the reliability of the reported findings,the resultswere validated using cross-dataset testing and 95%confidence intervals derived frombootstrap analysis,confirming consistent performance across heterogeneous healthcare data distributions.This solution takes a significant step toward securing next-generation healthcare infrastructure by combining scalability,privacy,adaptability,and earlydetection capabilities.The proposed global model achieves a test accuracy of 99.93%±0.03(95%CI)and amiss-rate of only 0.07%±0.02,representing state-of-the-art performance in privacy-preserving intrusion detection.The proposed FL-driven IDS framework offers an efficient,privacy-preserving,and scalable solution for securing next-generation healthcare infrastructures by combining adaptability,early detection,and ethical data management.展开更多
The development of remote sensing has seen the creation of a global measurement infrastructure of sustainable development due to growing multipolar archives,rising revisit frequency,and the availability of cloud-acces...The development of remote sensing has seen the creation of a global measurement infrastructure of sustainable development due to growing multipolar archives,rising revisit frequency,and the availability of cloud-accessible platforms of Earth observation.This review summarizes how remote sensing big data is being organized into decision-grade sustainability intelligence,the new approaches to analytics,and how Sustainable Development Goals(SDGs)-oriented application pathways inter-relate action pathways that bridge observations with action.The terminologies like new data ecosystem,data readiness and interoperability,changing economics of scalable computation,and detailing the functions of diversity of modalities(optical,Synthetic Aperture Radar—SAR,thermal,Light Detection and Ranging—LiDAR,hyperspectral)have been defined.These themes of analytics,which are transforming the practice of operational analytics,are then condensed:foundations and self-supervised learning of transferable representations,multi-modal fusion to gap fill and richer inference,spatiotemporal intelligence to trend of early warning,physics-aware hybrid methods to enhance robustness and meaning under non-stationary conditions.Across the climate risk,food systems,water resources,sustainable cities,ecosystems and biodiversity,energy transitions,and health exposure pathways,the roles of Earth Observation(EO)products as direct measures and proxies,and concepts of validating,semantic comparability,and communicating uncertainties play a key role in EO products becoming credible when faced with high-stakes deployment decisions.Lastly,we chart world ways of implementation via monitoring services,early warning systems,and systems of multiple regimes,and previously underline cross-cutting priorities,scalable structures in validation,performance,so that domains of shift,agreeable governance,and Dual-use risk safeguards,and sustainable lifecycle support of EO services.These priorities form a realistic set of priorities on the alignment of remote sensing innovation with quantifiable SDGs progress.展开更多
Offshore waters provide resources for human beings,while on the other hand,threaten them because of marine disasters.Ocean stations are part of offshore observation networks,and the quality of their data is of great s...Offshore waters provide resources for human beings,while on the other hand,threaten them because of marine disasters.Ocean stations are part of offshore observation networks,and the quality of their data is of great significance for exploiting and protecting the ocean.We used hourly mean wave height,temperature,and pressure real-time observation data taken in the Xiaomaidao station(in Qingdao,China)from June 1,2017,to May 31,2018,to explore the data quality using eight quality control methods,and to discriminate the most effective method for Xiaomaidao station.After using the eight quality control methods,the percentages of the mean wave height,temperature,and pressure data that passed the tests were 89.6%,88.3%,and 98.6%,respectively.With the marine disaster(wave alarm report)data,the values failed in the test mainly due to the influence of aging observation equipment and missing data transmissions.The mean wave height is often affected by dynamic marine disasters,so the continuity test method is not effective.The correlation test with other related parameters would be more useful for the mean wave height.展开更多
The exponential expansion of the Internet of Things(IoT),Industrial Internet of Things(IIoT),and Transportation Management of Things(TMoT)produces vast amounts of real-time streaming data.Ensuring system dependability...The exponential expansion of the Internet of Things(IoT),Industrial Internet of Things(IIoT),and Transportation Management of Things(TMoT)produces vast amounts of real-time streaming data.Ensuring system dependability,operational efficiency,and security depends on the identification of anomalies in these dynamic and resource-constrained systems.Due to their high computational requirements and inability to efficiently process continuous data streams,traditional anomaly detection techniques often fail in IoT systems.This work presents a resource-efficient adaptive anomaly detection model for real-time streaming data in IoT systems.Extensive experiments were carried out on multiple real-world datasets,achieving an average accuracy score of 96.06%with an execution time close to 7.5 milliseconds for each individual streaming data point,demonstrating its potential for real-time,resourceconstrained applications.The model uses Principal Component Analysis(PCA)for dimensionality reduction and a Z-score technique for anomaly detection.It maintains a low computational footprint with a sliding window mechanism,enabling incremental data processing and identification of both transient and sustained anomalies without storing historical data.The system uses a Multivariate Linear Regression(MLR)based imputation technique that estimates missing or corrupted sensor values,preserving data integrity prior to anomaly detection.The suggested solution is appropriate for many uses in smart cities,industrial automation,environmental monitoring,IoT security,and intelligent transportation systems,and is particularly well-suited for resource-constrained edge devices.展开更多
This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database(HBase).This platform consists of four layers including ETL(extract...This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database(HBase).This platform consists of four layers including ETL(extraction transformation loading) tier,data processing tier,data storage tier and data display tier,achieving long-term store,real-time analysis and inquiry for massive data.Finally,a real dataset cluster is simulated,which are made up of 39 nodes including 2 master nodes and 37 data nodes,and performing function tests of data importing module and real-time query module,and performance tests of HDFS's I/O,the MapReduce cluster,batch-loading and real-time query of massive data.The test results indicate that this platform achieves high performance in terms of response time and linear scalability.展开更多
In the era of big data, huge volumes of data are generated from online social networks, sensor networks, mobile devices, and organizations’ enterprise systems. This phenomenon provides organizations with unprecedente...In the era of big data, huge volumes of data are generated from online social networks, sensor networks, mobile devices, and organizations’ enterprise systems. This phenomenon provides organizations with unprecedented opportunities to tap into big data to mine valuable business intelligence. However, traditional business analytics methods may not be able to cope with the flood of big data. The main contribution of this paper is the illustration of the development of a novel big data stream analytics framework named BDSASA that leverages a probabilistic language model to analyze the consumer sentiments embedded in hundreds of millions of online consumer reviews. In particular, an inference model is embedded into the classical language modeling framework to enhance the prediction of consumer sentiments. The practical implication of our research work is that organizations can apply our big data stream analytics framework to analyze consumers’ product preferences, and hence develop more effective marketing and production strategies.展开更多
The interleaving/multiplexing technique was used to realize a 200?MHz real time data acquisition system. Two 100?MHz ADC modules worked parallelly and every ADC plays out data in ping pang fashion. The design improv...The interleaving/multiplexing technique was used to realize a 200?MHz real time data acquisition system. Two 100?MHz ADC modules worked parallelly and every ADC plays out data in ping pang fashion. The design improved the system conversion rata to 200?MHz and reduced the speed of data transporting and storing to 50?MHz. The high speed HDPLD and ECL logic parts were used to control system timing and the memory address. The multi layer print board and the shield were used to decrease interference produced by the high speed circuit. The system timing was designed carefully. The interleaving/multiplexing technique could improve the system conversion rata greatly while reducing the speed of external digital interfaces greatly. The design resolved the difficulties in high speed system effectively. The experiment proved the data acquisition system is stable and accurate.展开更多
The security of the seed industry is crucial for ensuring national food security.Currently,developed countries in Europe and America,along with international seed industry giants,have entered the Breeding 4.0 era.This...The security of the seed industry is crucial for ensuring national food security.Currently,developed countries in Europe and America,along with international seed industry giants,have entered the Breeding 4.0 era.This era integrates biotechnology,artificial intelligence(AI),and big data information technology.In contrast,China is still in a transition period between stages 2.0 and 3.0,which primarily relies on conventional selection and molecular breeding.In the context of increasingly complex international situations,accurately identifying core issues in China's seed industry innovation and seizing the frontier of international seed technology are strategically important.These efforts are essential for ensuring food security and revitalizing the seed industry.This paper systematically analyzes the characteristics of crop breeding data from artificial selection to intelligent design breeding.It explores the applications and development trends of AI and big data in modern crop breeding from several key perspectives.These include highthroughput phenotype acquisition and analysis,multiomics big data database and management system construction,AI-based multiomics integrated analysis,and the development of intelligent breeding software tools based on biological big data and AI technology.Based on an in-depth analysis of the current status and challenges of China's seed industry technology development,we propose strategic goals and key tasks for China's new generation of AI and big data-driven intelligent design breeding.These suggestions aim to accelerate the development of an intelligent-driven crop breeding engineering system that features large-scale gene mining,efficient gene manipulation,engineered variety design,and systematized biobreeding.This study provides a theoretical basis and practical guidance for the development of China's seed industry technology.展开更多
This paper addresses urban sustainability challenges amid global urbanization, emphasizing the need for innova tive approaches aligned with the Sustainable Development Goals. While traditional tools and linear models ...This paper addresses urban sustainability challenges amid global urbanization, emphasizing the need for innova tive approaches aligned with the Sustainable Development Goals. While traditional tools and linear models offer insights, they fall short in presenting a holistic view of complex urban challenges. System dynamics (SD) models that are often utilized to provide holistic, systematic understanding of a research subject, like the urban system, emerge as valuable tools, but data scarcity and theoretical inadequacy pose challenges. The research reviews relevant papers on recent SD model applications in urban sustainability since 2018, categorizing them based on nine key indicators. Among the reviewed papers, data limitations and model assumptions were identified as ma jor challenges in applying SD models to urban sustainability. This led to exploring the transformative potential of big data analytics, a rare approach in this field as identified by this study, to enhance SD models’ empirical foundation. Integrating big data could provide data-driven calibration, potentially improving predictive accuracy and reducing reliance on simplified assumptions. The paper concludes by advocating for new approaches that reduce assumptions and promote real-time applicable models, contributing to a comprehensive understanding of urban sustainability through the synergy of big data and SD models.展开更多
Viral infectious diseases,characterized by their intricate nature and wide-ranging diversity,pose substantial challenges in the domain of data management.The vast volume of data generated by these diseases,spanning fr...Viral infectious diseases,characterized by their intricate nature and wide-ranging diversity,pose substantial challenges in the domain of data management.The vast volume of data generated by these diseases,spanning from the molecular mechanisms within cells to large-scale epidemiological patterns,has surpassed the capabilities of traditional analytical methods.In the era of artificial intelligence(AI)and big data,there is an urgent necessity for the optimization of these analytical methods to more effectively handle and utilize the information.Despite the rapid accumulation of data associated with viral infections,the lack of a comprehensive framework for integrating,selecting,and analyzing these datasets has left numerous researchers uncertain about which data to select,how to access it,and how to utilize it most effectively in their research.This review endeavors to fill these gaps by exploring the multifaceted nature of viral infectious diseases and summarizing relevant data across multiple levels,from the molecular details of pathogens to broad epidemiological trends.The scope extends from the micro-scale to the macro-scale,encompassing pathogens,hosts,and vectors.In addition to data summarization,this review thoroughly investigates various dataset sources.It also traces the historical evolution of data collection in the field of viral infectious diseases,highlighting the progress achieved over time.Simultaneously,it evaluates the current limitations that impede data utilization.Furthermore,we propose strategies to surmount these challenges,focusing on the development and application of advanced computational techniques,AI-driven models,and enhanced data integration practices.By providing a comprehensive synthesis of existing knowledge,this review is designed to guide future research and contribute to more informed approaches in the surveillance,prevention,and control of viral infectious diseases,particularly within the context of the expanding big-data landscape.展开更多
On October 18,2017,the 19th National Congress Report called for the implementation of the Healthy China Strategy.The development of biomedical data plays a pivotal role in advancing this strategy.Since the 18th Nation...On October 18,2017,the 19th National Congress Report called for the implementation of the Healthy China Strategy.The development of biomedical data plays a pivotal role in advancing this strategy.Since the 18th National Congress of the Communist Party of China,China has vigorously promoted the integration and implementation of the Healthy China and Digital China strategies.The National Health Commission has prioritized the development of health and medical big data,issuing policies to promote standardized applica-tions and foster innovation in"Internet+Healthcare."Biomedical data has significantly contributed to preci-sion medicine,personalized health management,drug development,disease diagnosis,public health monitor-ing,and epidemic prediction capabilities.展开更多
Research into metamorphism plays a pivotal role in reconstructing the evolution of continent,particularly through the study of ancient rocks that are highly susceptible to metamorphic alterations due to multiple tecto...Research into metamorphism plays a pivotal role in reconstructing the evolution of continent,particularly through the study of ancient rocks that are highly susceptible to metamorphic alterations due to multiple tectonic activities.In the big data era,the establishment of new data platforms and the application of big data methods have become a focus for metamorphic rocks.Significant progress has been made in creating specialized databases,compiling comprehensive datasets,and utilizing data analytics to address complex scientific questions.However,many existing databases are inadequate in meeting the specific requirements of metamorphic research,resulting from a substantial amount of valuable data remaining uncollected.Therefore,constructing new databases that can cope with the development of the data era is necessary.This article provides an extensive review of existing databases related to metamorphic rocks and discusses data-driven studies in this.Accordingly,several crucial factors that need to be taken into consideration in the establishment of specialized metamorphic databases are identified,aiming to leverage data-driven applications to achieve broader scientific objectives in metamorphic research.展开更多
High-resolution vehicular emissions inventories are important for managing vehicular pollution and improving urban air quality. This study developed a vehicular emission inventory with high spatio-temporal resolution ...High-resolution vehicular emissions inventories are important for managing vehicular pollution and improving urban air quality. This study developed a vehicular emission inventory with high spatio-temporal resolution in the main urban area of Chongqing, based on realtime traffic data from 820 RFID detectors covering 454 roads, and the differences in spatiotemporal emission characteristics between inner and outer districts were analysed. The result showed that the daily vehicular emission intensities of CO, hydrocarbons, PM2.5, PM10,and NO_(x) were 30.24, 3.83, 0.18, 0.20, and 8.65 kg/km per day, respectively, in the study area during 2018. The pollutants emission intensities in inner district were higher than those in outer district. Light passenger cars(LPCs) were the main contributors of all-day CO emissions in the inner and outer districts, from which the contributors of NO_(x) emissions were different. Diesel and natural gas buses were major contributors of daytime NO_(x) emissions in inner districts, accounting for 40.40%, but buses and heavy duty trucks(HDTs) were major contributors in outer districts. At nighttime, due to the lifting of truck restrictions and suspension of buses, HDTs become the main NO_(x) contributor in both inner and outer districts,and its three NO_(x) emission peak hours were found, which are different to the peak hours of total NO_(x) emission by all vehicles. Unlike most other cities, bridges and connecting channels are always emission hotspots due to long-time traffic congestion. This knowledge will help fully understand vehicular emissions characteristics and is useful for policymakers to design precise prevention and control measures.展开更多
The application and development of a wide-area measurement system(WAMS)has enabled many applications and led to several requirements based on dynamic measurement data.Such data are transmitted as big data information ...The application and development of a wide-area measurement system(WAMS)has enabled many applications and led to several requirements based on dynamic measurement data.Such data are transmitted as big data information flow.To ensure effective transmission of wide-frequency electrical information by the communication protocol of a WAMS,this study performs real-time traffic monitoring and analysis of the data network of a power information system,and establishes corresponding network optimization strategies to solve existing transmission problems.This study utilizes the traffic analysis results obtained using the current real-time dynamic monitoring system to design an optimization strategy,covering the optimization in three progressive levels:the underlying communication protocol,source data,and transmission process.Optimization of the system structure and scheduling optimization of data information are validated to be feasible and practical via tests.展开更多
Predicting the mechanical behaviors of structure and perceiving the anomalies in advance are essential to ensuring the safe operation of infrastructures in the long run.In addition to the incomplete consideration of i...Predicting the mechanical behaviors of structure and perceiving the anomalies in advance are essential to ensuring the safe operation of infrastructures in the long run.In addition to the incomplete consideration of influencing factors,the prediction time scale of existing studies is rough.Therefore,this study focuses on the development of a real-time prediction model by coupling the spatio-temporal correlation with external load through autoencoder network(ATENet)based on structural health monitoring(SHM)data.An autoencoder mechanism is performed to acquire the high-level representation of raw monitoring data at different spatial positions,and the recurrent neural network is applied to understanding the temporal correlation from the time series.Then,the obtained temporal-spatial information is coupled with dynamic loads through a fully connected layer to predict structural performance in next 12 h.As a case study,the proposed model is formulated on the SHM data collected from a representative underwater shield tunnel.The robustness study is carried out to verify the reliability and the prediction capability of the proposed model.Finally,the ATENet model is compared with some typical models,and the results indicate that it has the best performance.ATENet model is of great value to predict the realtime evolution trend of tunnel structure.展开更多
文摘Opinion (sentiment) analysis on big data streams from the constantly generated text streams on social media networks to hundreds of millions of online consumer reviews provides many organizations in every field with opportunities to discover valuable intelligence from the massive user generated text streams. However, the traditional content analysis frameworks are inefficient to handle the unprecedentedly big volume of unstructured text streams and the complexity of text analysis tasks for the real time opinion analysis on the big data streams. In this paper, we propose a parallel real time sentiment analysis system: Social Media Data Stream Sentiment Analysis Service (SMDSSAS) that performs multiple phases of sentiment analysis of social media text streams effectively in real time with two fully analytic opinion mining models to combat the scale of text data streams and the complexity of sentiment analysis processing on unstructured text streams. We propose two aspect based opinion mining models: Deterministic and Probabilistic sentiment models for a real time sentiment analysis on the user given topic related data streams. Experiments on the social media Twitter stream traffic captured during the pre-election weeks of the 2016 Presidential election for real-time analysis of public opinions toward two presidential candidates showed that the proposed system was able to predict correctly Donald Trump as the winner of the 2016 Presidential election. The cross validation results showed that the proposed sentiment models with the real-time streaming components in our proposed framework delivered effectively the analysis of the opinions on two presidential candidates with average 81% accuracy for the Deterministic model and 80% for the Probabilistic model, which are 1% - 22% improvements from the results of the existing literature.
文摘Earthquakes are highly destructive spatio-temporal phenomena whose analysis is essential for disaster preparedness and risk mitigation.Modern seismological research produces vast volumes of heterogeneous data from seismic networks,satellite observations,and geospatial repositories,creating the need for scalable infrastructures capable of integrating and analyzing such data to support intelligent decision-making.Data warehousing technologies provide a robust foundation for this purpose;however,existing earthquake-oriented data warehouses remain limited,often relying on simplified schemas,domain-specific analytics,or cataloguing efforts.This paper presents the design and implementation of a spatio-temporal data warehouse for seismic activity.The framework integrates spatial and temporal dimensions in a unified schema and introduces a novel array-based approach for managing many-to-many relationships between facts and dimensions without intermediate bridge tables.A comparative evaluation against a conventional bridge-table schema demonstrates that the array-based design improves fact-centric query performance,while the bridge-table schema remains advantageous for dimension-centric queries.To reconcile these trade-offs,a hybrid schema is proposed that retains both representations,ensuring balanced efficiency across heterogeneous workloads.The proposed framework demonstrates how spatio-temporal data warehousing can address schema complexity,improve query performance,and support multidimensional visualization.In doing so,it provides a foundation for integrating seismic analysis into broader big data-driven intelligent decision systems for disaster resilience,risk mitigation,and emergency management.
文摘With the accelerating aging process of China’s population,the demand for community elderly care services has shown diversified and personalized characteristics.However,problems such as insufficient total care service resources,uneven distribution,and prominent supply-demand contradictions have seriously affected service quality.Big data technology,with core advantages including data collection,analysis and mining,and accurate prediction,provides a new solution for the allocation of community elderly care service resources.This paper systematically studies the application value of big data technology in the allocation of community elderly care service resources from three aspects:resource allocation efficiency,service accuracy,and management intelligence.Combined with practical needs,it proposes optimal allocation strategies such as building a big data analysis platform and accurately grasping the elderly’s care needs,striving to provide operable path references for the construction of community elderly care service systems,promoting the early realization of the elderly care service goal of“adequate support and proper care for the elderly”,and boosting the high-quality development of China’s elderly care service industry.
文摘With the advent of the big data era,modern statistics has enjoyed unprecedented development opportunities and also faced numerous new challenges.Traditional statistical computing methods are often limited by issues such as computer memory capacity and distributed storage of data across different locations,and are unable to directly apply to large-scale data sets.Therefore,in the context of big data,designing efficient and theoretically guaranteed statistical learning and inference algorithms has become a key issue that the current field of statistics urgently needs to address.In this paper,the application status of statistical analysis methods in the big data environment was systematically reviewed,and its future development directions were analyzed to provide reference and support for the further development of theory and methods of the statistical analysis of big data.
基金Hunan Provincial Natural Science Foundation of China,Grant/Award Numbers:2024JJ6289,2023JJ60464,2023JJ60334Changsha City Technology Program,Grant/Award Number:kq2403120+1 种基金Climb Plan of Hunan Cancer Hospital,Grant/Award Numbers:ZX2021005,QH2023006High-Level Talent Support Program of Hunan Cancer Hospital,Grant/Award Number:20250731-1050。
文摘The convergence of artificial intelligence(AI)and big data is reshaping contemporary oncology by enabling the integration of multimodal information across imaging,pathology,genomics,and clinical records.From a physician-centered perspective,these technologies can potentially be used to improve diagnostic precision,support individualized treatment planning,enhance longitudinal patient management,and accelerate both clinical and translational research.In this review,we synthesize the core AI methodologies most relevant to oncology-machine learning,deep learning,and large language models-and examine how they interact with established and emerging oncology data platforms.We further highlight practical use cases in clinical workflows and research pipelines,emphasizing opportunities for advancing precision cancer care while also addressing challenges associated with data heterogeneity,model generalizability,privacy protection,and real-world implementation.By underscoring the synergistic value of AI and big data,this review aims to inform the development of clinically meaningful,context-adapted strategies that promote translational innovation in both global and locally resourced healthcare environments.
文摘The increasing number of interconnected devices and the incorporation of smart technology into contemporary healthcare systems have significantly raised the attack surface of cyber threats.The early detection of threats is both necessary and complex,yet these interconnected healthcare settings generate enormous amounts of heterogeneous data.Traditional Intrusion Detection Systems(IDS),which are generally centralized and machine learning-based,often fail to address the rapidly changing nature of cyberattacks and are challenged by ethical concerns related to patient data privacy.Moreover,traditional AI-driven IDS usually face challenges in handling large-scale,heterogeneous healthcare data while ensuring data privacy and operational efficiency.To address these issues,emerging technologies such as Big Data Analytics(BDA)and Federated Learning(FL)provide a hybrid framework for scalable,adaptive intrusion detection in IoT-driven healthcare systems.Big data techniques enable processing large-scale,highdimensional healthcare data,and FL can be used to train a model in a decentralized manner without transferring raw data,thereby maintaining privacy between institutions.This research proposes a privacy-preserving Federated Learning–based model that efficiently detects cyber threats in connected healthcare systems while ensuring distributed big data processing,privacy,and compliance with ethical regulations.To strengthen the reliability of the reported findings,the resultswere validated using cross-dataset testing and 95%confidence intervals derived frombootstrap analysis,confirming consistent performance across heterogeneous healthcare data distributions.This solution takes a significant step toward securing next-generation healthcare infrastructure by combining scalability,privacy,adaptability,and earlydetection capabilities.The proposed global model achieves a test accuracy of 99.93%±0.03(95%CI)and amiss-rate of only 0.07%±0.02,representing state-of-the-art performance in privacy-preserving intrusion detection.The proposed FL-driven IDS framework offers an efficient,privacy-preserving,and scalable solution for securing next-generation healthcare infrastructures by combining adaptability,early detection,and ethical data management.
文摘The development of remote sensing has seen the creation of a global measurement infrastructure of sustainable development due to growing multipolar archives,rising revisit frequency,and the availability of cloud-accessible platforms of Earth observation.This review summarizes how remote sensing big data is being organized into decision-grade sustainability intelligence,the new approaches to analytics,and how Sustainable Development Goals(SDGs)-oriented application pathways inter-relate action pathways that bridge observations with action.The terminologies like new data ecosystem,data readiness and interoperability,changing economics of scalable computation,and detailing the functions of diversity of modalities(optical,Synthetic Aperture Radar—SAR,thermal,Light Detection and Ranging—LiDAR,hyperspectral)have been defined.These themes of analytics,which are transforming the practice of operational analytics,are then condensed:foundations and self-supervised learning of transferable representations,multi-modal fusion to gap fill and richer inference,spatiotemporal intelligence to trend of early warning,physics-aware hybrid methods to enhance robustness and meaning under non-stationary conditions.Across the climate risk,food systems,water resources,sustainable cities,ecosystems and biodiversity,energy transitions,and health exposure pathways,the roles of Earth Observation(EO)products as direct measures and proxies,and concepts of validating,semantic comparability,and communicating uncertainties play a key role in EO products becoming credible when faced with high-stakes deployment decisions.Lastly,we chart world ways of implementation via monitoring services,early warning systems,and systems of multiple regimes,and previously underline cross-cutting priorities,scalable structures in validation,performance,so that domains of shift,agreeable governance,and Dual-use risk safeguards,and sustainable lifecycle support of EO services.These priorities form a realistic set of priorities on the alignment of remote sensing innovation with quantifiable SDGs progress.
基金Supported by the National Key Research and Development Program of China(Nos.2016YFC1402000,2018YFC1407003,2017YFC1405300)
文摘Offshore waters provide resources for human beings,while on the other hand,threaten them because of marine disasters.Ocean stations are part of offshore observation networks,and the quality of their data is of great significance for exploiting and protecting the ocean.We used hourly mean wave height,temperature,and pressure real-time observation data taken in the Xiaomaidao station(in Qingdao,China)from June 1,2017,to May 31,2018,to explore the data quality using eight quality control methods,and to discriminate the most effective method for Xiaomaidao station.After using the eight quality control methods,the percentages of the mean wave height,temperature,and pressure data that passed the tests were 89.6%,88.3%,and 98.6%,respectively.With the marine disaster(wave alarm report)data,the values failed in the test mainly due to the influence of aging observation equipment and missing data transmissions.The mean wave height is often affected by dynamic marine disasters,so the continuity test method is not effective.The correlation test with other related parameters would be more useful for the mean wave height.
基金funded by the Ongoing Research Funding Program(ORF-2025-890)King Saud University,Riyadh,Saudi Arabia and was supported by the Competitive Research Fund of theUniversity of Aizu,Japan.
文摘The exponential expansion of the Internet of Things(IoT),Industrial Internet of Things(IIoT),and Transportation Management of Things(TMoT)produces vast amounts of real-time streaming data.Ensuring system dependability,operational efficiency,and security depends on the identification of anomalies in these dynamic and resource-constrained systems.Due to their high computational requirements and inability to efficiently process continuous data streams,traditional anomaly detection techniques often fail in IoT systems.This work presents a resource-efficient adaptive anomaly detection model for real-time streaming data in IoT systems.Extensive experiments were carried out on multiple real-world datasets,achieving an average accuracy score of 96.06%with an execution time close to 7.5 milliseconds for each individual streaming data point,demonstrating its potential for real-time,resourceconstrained applications.The model uses Principal Component Analysis(PCA)for dimensionality reduction and a Z-score technique for anomaly detection.It maintains a low computational footprint with a sliding window mechanism,enabling incremental data processing and identification of both transient and sustained anomalies without storing historical data.The system uses a Multivariate Linear Regression(MLR)based imputation technique that estimates missing or corrupted sensor values,preserving data integrity prior to anomaly detection.The suggested solution is appropriate for many uses in smart cities,industrial automation,environmental monitoring,IoT security,and intelligent transportation systems,and is particularly well-suited for resource-constrained edge devices.
基金Supported by the National Science and Technology Support Project(No.2012BAH01F02)from Ministry of Science and Technology of Chinathe Director Fund(No.IS201116002)from Institute of Seismology,CEA
文摘This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database(HBase).This platform consists of four layers including ETL(extraction transformation loading) tier,data processing tier,data storage tier and data display tier,achieving long-term store,real-time analysis and inquiry for massive data.Finally,a real dataset cluster is simulated,which are made up of 39 nodes including 2 master nodes and 37 data nodes,and performing function tests of data importing module and real-time query module,and performance tests of HDFS's I/O,the MapReduce cluster,batch-loading and real-time query of massive data.The test results indicate that this platform achieves high performance in terms of response time and linear scalability.
文摘In the era of big data, huge volumes of data are generated from online social networks, sensor networks, mobile devices, and organizations’ enterprise systems. This phenomenon provides organizations with unprecedented opportunities to tap into big data to mine valuable business intelligence. However, traditional business analytics methods may not be able to cope with the flood of big data. The main contribution of this paper is the illustration of the development of a novel big data stream analytics framework named BDSASA that leverages a probabilistic language model to analyze the consumer sentiments embedded in hundreds of millions of online consumer reviews. In particular, an inference model is embedded into the classical language modeling framework to enhance the prediction of consumer sentiments. The practical implication of our research work is that organizations can apply our big data stream analytics framework to analyze consumers’ product preferences, and hence develop more effective marketing and production strategies.
文摘The interleaving/multiplexing technique was used to realize a 200?MHz real time data acquisition system. Two 100?MHz ADC modules worked parallelly and every ADC plays out data in ping pang fashion. The design improved the system conversion rata to 200?MHz and reduced the speed of data transporting and storing to 50?MHz. The high speed HDPLD and ECL logic parts were used to control system timing and the memory address. The multi layer print board and the shield were used to decrease interference produced by the high speed circuit. The system timing was designed carefully. The interleaving/multiplexing technique could improve the system conversion rata greatly while reducing the speed of external digital interfaces greatly. The design resolved the difficulties in high speed system effectively. The experiment proved the data acquisition system is stable and accurate.
基金partially supported by the Construction of Collaborative Innovation Center of Beijing Academy of Agricultural and Forestry Sciences(KJCX20240406)the Beijing Natural Science Foundation(JQ24037)+1 种基金the National Natural Science Foundation of China(32330075)the Earmarked Fund for China Agriculture Research System(CARS-02 and CARS-54)。
文摘The security of the seed industry is crucial for ensuring national food security.Currently,developed countries in Europe and America,along with international seed industry giants,have entered the Breeding 4.0 era.This era integrates biotechnology,artificial intelligence(AI),and big data information technology.In contrast,China is still in a transition period between stages 2.0 and 3.0,which primarily relies on conventional selection and molecular breeding.In the context of increasingly complex international situations,accurately identifying core issues in China's seed industry innovation and seizing the frontier of international seed technology are strategically important.These efforts are essential for ensuring food security and revitalizing the seed industry.This paper systematically analyzes the characteristics of crop breeding data from artificial selection to intelligent design breeding.It explores the applications and development trends of AI and big data in modern crop breeding from several key perspectives.These include highthroughput phenotype acquisition and analysis,multiomics big data database and management system construction,AI-based multiomics integrated analysis,and the development of intelligent breeding software tools based on biological big data and AI technology.Based on an in-depth analysis of the current status and challenges of China's seed industry technology development,we propose strategic goals and key tasks for China's new generation of AI and big data-driven intelligent design breeding.These suggestions aim to accelerate the development of an intelligent-driven crop breeding engineering system that features large-scale gene mining,efficient gene manipulation,engineered variety design,and systematized biobreeding.This study provides a theoretical basis and practical guidance for the development of China's seed industry technology.
基金sponsored by the U.S.Department of Housing and Urban Development(Grant No.NJLTS0027-22)The opinions expressed in this study are the authors alone,and do not represent the U.S.Depart-ment of HUD’s opinions.
文摘This paper addresses urban sustainability challenges amid global urbanization, emphasizing the need for innova tive approaches aligned with the Sustainable Development Goals. While traditional tools and linear models offer insights, they fall short in presenting a holistic view of complex urban challenges. System dynamics (SD) models that are often utilized to provide holistic, systematic understanding of a research subject, like the urban system, emerge as valuable tools, but data scarcity and theoretical inadequacy pose challenges. The research reviews relevant papers on recent SD model applications in urban sustainability since 2018, categorizing them based on nine key indicators. Among the reviewed papers, data limitations and model assumptions were identified as ma jor challenges in applying SD models to urban sustainability. This led to exploring the transformative potential of big data analytics, a rare approach in this field as identified by this study, to enhance SD models’ empirical foundation. Integrating big data could provide data-driven calibration, potentially improving predictive accuracy and reducing reliance on simplified assumptions. The paper concludes by advocating for new approaches that reduce assumptions and promote real-time applicable models, contributing to a comprehensive understanding of urban sustainability through the synergy of big data and SD models.
基金supported by the National Natural Science Foundation of China(32370703)the CAMS Innovation Fund for Medical Sciences(CIFMS)(2022-I2M-1-021,2021-I2M-1-061)the Major Project of Guangzhou National Labora-tory(GZNL2024A01015).
文摘Viral infectious diseases,characterized by their intricate nature and wide-ranging diversity,pose substantial challenges in the domain of data management.The vast volume of data generated by these diseases,spanning from the molecular mechanisms within cells to large-scale epidemiological patterns,has surpassed the capabilities of traditional analytical methods.In the era of artificial intelligence(AI)and big data,there is an urgent necessity for the optimization of these analytical methods to more effectively handle and utilize the information.Despite the rapid accumulation of data associated with viral infections,the lack of a comprehensive framework for integrating,selecting,and analyzing these datasets has left numerous researchers uncertain about which data to select,how to access it,and how to utilize it most effectively in their research.This review endeavors to fill these gaps by exploring the multifaceted nature of viral infectious diseases and summarizing relevant data across multiple levels,from the molecular details of pathogens to broad epidemiological trends.The scope extends from the micro-scale to the macro-scale,encompassing pathogens,hosts,and vectors.In addition to data summarization,this review thoroughly investigates various dataset sources.It also traces the historical evolution of data collection in the field of viral infectious diseases,highlighting the progress achieved over time.Simultaneously,it evaluates the current limitations that impede data utilization.Furthermore,we propose strategies to surmount these challenges,focusing on the development and application of advanced computational techniques,AI-driven models,and enhanced data integration practices.By providing a comprehensive synthesis of existing knowledge,this review is designed to guide future research and contribute to more informed approaches in the surveillance,prevention,and control of viral infectious diseases,particularly within the context of the expanding big-data landscape.
文摘On October 18,2017,the 19th National Congress Report called for the implementation of the Healthy China Strategy.The development of biomedical data plays a pivotal role in advancing this strategy.Since the 18th National Congress of the Communist Party of China,China has vigorously promoted the integration and implementation of the Healthy China and Digital China strategies.The National Health Commission has prioritized the development of health and medical big data,issuing policies to promote standardized applica-tions and foster innovation in"Internet+Healthcare."Biomedical data has significantly contributed to preci-sion medicine,personalized health management,drug development,disease diagnosis,public health monitor-ing,and epidemic prediction capabilities.
基金funded by the National Natural Science Foundation of China(No.42220104008)。
文摘Research into metamorphism plays a pivotal role in reconstructing the evolution of continent,particularly through the study of ancient rocks that are highly susceptible to metamorphic alterations due to multiple tectonic activities.In the big data era,the establishment of new data platforms and the application of big data methods have become a focus for metamorphic rocks.Significant progress has been made in creating specialized databases,compiling comprehensive datasets,and utilizing data analytics to address complex scientific questions.However,many existing databases are inadequate in meeting the specific requirements of metamorphic research,resulting from a substantial amount of valuable data remaining uncollected.Therefore,constructing new databases that can cope with the development of the data era is necessary.This article provides an extensive review of existing databases related to metamorphic rocks and discusses data-driven studies in this.Accordingly,several crucial factors that need to be taken into consideration in the establishment of specialized metamorphic databases are identified,aiming to leverage data-driven applications to achieve broader scientific objectives in metamorphic research.
基金supported by the National Key Research Program(No.2018YFB1601105,No.2018YFB1601102)the Natural Science Foundation of China(No.41975165,No.U1811463)Chongqing Science and Technology Project(No.cstc2019jscxfxydX0035)。
文摘High-resolution vehicular emissions inventories are important for managing vehicular pollution and improving urban air quality. This study developed a vehicular emission inventory with high spatio-temporal resolution in the main urban area of Chongqing, based on realtime traffic data from 820 RFID detectors covering 454 roads, and the differences in spatiotemporal emission characteristics between inner and outer districts were analysed. The result showed that the daily vehicular emission intensities of CO, hydrocarbons, PM2.5, PM10,and NO_(x) were 30.24, 3.83, 0.18, 0.20, and 8.65 kg/km per day, respectively, in the study area during 2018. The pollutants emission intensities in inner district were higher than those in outer district. Light passenger cars(LPCs) were the main contributors of all-day CO emissions in the inner and outer districts, from which the contributors of NO_(x) emissions were different. Diesel and natural gas buses were major contributors of daytime NO_(x) emissions in inner districts, accounting for 40.40%, but buses and heavy duty trucks(HDTs) were major contributors in outer districts. At nighttime, due to the lifting of truck restrictions and suspension of buses, HDTs become the main NO_(x) contributor in both inner and outer districts,and its three NO_(x) emission peak hours were found, which are different to the peak hours of total NO_(x) emission by all vehicles. Unlike most other cities, bridges and connecting channels are always emission hotspots due to long-time traffic congestion. This knowledge will help fully understand vehicular emissions characteristics and is useful for policymakers to design precise prevention and control measures.
文摘The application and development of a wide-area measurement system(WAMS)has enabled many applications and led to several requirements based on dynamic measurement data.Such data are transmitted as big data information flow.To ensure effective transmission of wide-frequency electrical information by the communication protocol of a WAMS,this study performs real-time traffic monitoring and analysis of the data network of a power information system,and establishes corresponding network optimization strategies to solve existing transmission problems.This study utilizes the traffic analysis results obtained using the current real-time dynamic monitoring system to design an optimization strategy,covering the optimization in three progressive levels:the underlying communication protocol,source data,and transmission process.Optimization of the system structure and scheduling optimization of data information are validated to be feasible and practical via tests.
基金This work is supported by the National Natural Science Foundation of China(Grant No.51991392)Key Deployment Projects of Chinese Academy of Sciences(Grant No.ZDRW-ZS-2021-3-3)the Second Tibetan Plateau Scientific Expedition and Research Program(STEP)(Grant No.2019QZKK0904).
文摘Predicting the mechanical behaviors of structure and perceiving the anomalies in advance are essential to ensuring the safe operation of infrastructures in the long run.In addition to the incomplete consideration of influencing factors,the prediction time scale of existing studies is rough.Therefore,this study focuses on the development of a real-time prediction model by coupling the spatio-temporal correlation with external load through autoencoder network(ATENet)based on structural health monitoring(SHM)data.An autoencoder mechanism is performed to acquire the high-level representation of raw monitoring data at different spatial positions,and the recurrent neural network is applied to understanding the temporal correlation from the time series.Then,the obtained temporal-spatial information is coupled with dynamic loads through a fully connected layer to predict structural performance in next 12 h.As a case study,the proposed model is formulated on the SHM data collected from a representative underwater shield tunnel.The robustness study is carried out to verify the reliability and the prediction capability of the proposed model.Finally,the ATENet model is compared with some typical models,and the results indicate that it has the best performance.ATENet model is of great value to predict the realtime evolution trend of tunnel structure.