This article presents views on the future development of data science, with a particular focus on its importance to artificial intelligence (AI). After discussing the challenges of data science, it elucidates a possible approach to tackle these challenges by clarifying the logic and principles of data related to the multi-level complexity of the world. Finally, urgently required actions are briefly outlined.
The National Population Health Data Center (NPHDC) is one of China's 20 national-level science data centers, jointly designated by the Ministry of Science and Technology and the Ministry of Finance. Operated by the Chinese Academy of Medical Sciences under the oversight of the National Health Commission, NPHDC adheres to national regulations including the Scientific Data Management Measures and the National Science and Technology Infrastructure Service Platform Management Measures, and is committed to collecting, integrating, managing, and sharing biomedical and health data through an open-access platform, fostering open sharing and engaging in international cooperation.
Photonuclear data are increasingly used in fundamental nuclear research and technological applications. These data are generated using advanced γ-ray sources. The Shanghai laser electron gamma source (SLEGS) is a new laser Compton scattering γ-ray source at the Shanghai Synchrotron Radiation Facility. It delivers energy-tunable, quasi-monoenergetic gamma beams for high-precision photonuclear measurements. This paper presents the flat-efficiency detector (FED) array at SLEGS and its application in photoneutron cross-section measurements. Systematic uncertainties of the FED array were determined to be 3.02% through calibration with a ^(252)Cf neutron source. Using ^(197)Au and ^(159)Tb as representative nuclei, we demonstrate the format and processing methodology for raw photoneutron data. The results validate SLEGS' capability for high-precision photoneutron measurements.
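The kind of cross-section extraction described above can be illustrated with a simplified sketch. The formula, the variable names, and the example units below are illustrative assumptions rather than the actual SLEGS analysis chain; only the 3.02% systematic uncertainty is taken from the text.

```python
import math

def photoneutron_cross_section(n_neutrons, detection_eff, n_gammas,
                               areal_density, sys_frac=0.0302):
    """Simplified photoneutron cross-section estimate (illustrative only).

    sigma = N_n / (eps * N_gamma * n_t), where n_t is the target areal
    density; with n_t in atoms/barn, sigma comes out in barns (assumed
    convention for this sketch). Statistical (sqrt(N)) and systematic
    uncertainties are combined in quadrature.
    """
    sigma = n_neutrons / (detection_eff * n_gammas * areal_density)
    stat_frac = 1.0 / math.sqrt(n_neutrons)           # counting statistics
    total_frac = math.sqrt(stat_frac ** 2 + sys_frac ** 2)
    return sigma, sigma * total_frac
```

For example, 10,000 detected neutrons at 50% efficiency with 10^8 incident photons on a 0.01 atoms/barn target would give a 0.02-barn cross section with a combined relative uncertainty just above the 3.02% systematic floor.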
Viral infectious diseases, characterized by their intricate nature and wide-ranging diversity, pose substantial challenges in the domain of data management. The vast volume of data generated by these diseases, spanning from the molecular mechanisms within cells to large-scale epidemiological patterns, has surpassed the capabilities of traditional analytical methods. In the era of artificial intelligence (AI) and big data, there is an urgent necessity for the optimization of these analytical methods to more effectively handle and utilize the information. Despite the rapid accumulation of data associated with viral infections, the lack of a comprehensive framework for integrating, selecting, and analyzing these datasets has left numerous researchers uncertain about which data to select, how to access them, and how to utilize them most effectively in their research. This review endeavors to fill these gaps by exploring the multifaceted nature of viral infectious diseases and summarizing relevant data across multiple levels, from the molecular details of pathogens to broad epidemiological trends. The scope extends from the micro-scale to the macro-scale, encompassing pathogens, hosts, and vectors. In addition to data summarization, this review thoroughly investigates various dataset sources. It also traces the historical evolution of data collection in the field of viral infectious diseases, highlighting the progress achieved over time. Simultaneously, it evaluates the current limitations that impede data utilization. Furthermore, we propose strategies to surmount these challenges, focusing on the development and application of advanced computational techniques, AI-driven models, and enhanced data integration practices. By providing a comprehensive synthesis of existing knowledge, this review is designed to guide future research and contribute to more informed approaches in the surveillance, prevention, and control of viral infectious diseases, particularly within the context of the expanding big-data landscape.
Data space, as an innovative data management and sharing model, is emerging in the medical and health sectors. This study expounds on the conceptual connotation of data space and delineates its key technologies, including distributed data storage, standardization and interoperability of data sharing, data security and privacy protection, data analysis and mining, and data space assessment. By analyzing real-world cases of data spaces within medicine and health, this study compares their similarities and differences across various dimensions such as purpose, architecture, data interoperability, and privacy protection. Meanwhile, data spaces in these fields are challenged by limited computing resources, the complexities of data integration, and the need for optimized algorithms. Additionally, legal and ethical issues such as unclear data ownership, undefined usage rights, and risks associated with privacy protection need to be addressed. The study notes organizational and management difficulties, calling for enhancements in governance frameworks, data sharing mechanisms, and value assessment systems. In the future, technological innovation, sound regulations, and optimized management will support the development of the medical and health data space. These developments will enable the secure and efficient utilization of data, propelling the medical industry into an era characterized by precision, intelligence, and personalization.
In the era of digital intelligence, data is a key element in promoting social and economic development. Educational data, as a vital component, not only supports teaching and learning but also contains a large amount of sensitive information. How to effectively categorize and protect sensitive data has become an urgent issue in educational data security. This paper systematically constructs a multi-dimensional classification framework for sensitive educational data and discusses its security protection strategy from the aspects of identification and desensitization, aiming to provide new ideas for the security management of sensitive educational data and to support the construction of an educational data security ecosystem in the era of digital intelligence.
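As a concrete illustration of rule-based desensitization of the kind discussed above, the sketch below masks two common kinds of sensitive fields in free-text records. The specific regex patterns (a mainland Chinese mobile number and an 18-digit citizen ID) and the masking conventions are assumptions for the example, not the framework proposed in the paper.

```python
import re

# Each rule pairs a pattern with a masking function that keeps only
# enough characters for the record to remain recognizable.
RULES = [
    # Mobile number: keep first 3 and last 4 digits.
    (re.compile(r"\b1[3-9]\d{9}\b"),
     lambda m: m.group()[:3] + "****" + m.group()[-4:]),
    # 18-digit citizen ID: keep region prefix and checksum tail,
    # hide the birth-date portion.
    (re.compile(r"\b\d{17}[\dXx]\b"),
     lambda m: m.group()[:6] + "********" + m.group()[-4:]),
]

def desensitize(text: str) -> str:
    """Apply every masking rule in order and return the cleaned text."""
    for pattern, mask in RULES:
        text = pattern.sub(mask, text)
    return text
```

In practice such rules would be driven by the classification framework, so that each sensitivity category maps to its own masking policy.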
A sign language dataset is essential for sign language recognition and translation (SLRT). Current public sign language datasets are small and lack diversity, which does not meet the practical application requirements of SLRT. However, building a large-scale and diverse sign language dataset is difficult because sign language data on the Internet are scarce, and during construction the quality of some sign language data falls short of the required standard. This paper proposes a two information streams transformer (TIST) model to judge whether the quality of sign language data is acceptable. To verify that TIST effectively improves sign language recognition (SLR), we construct two datasets: a screened dataset and an unscreened dataset. In our experiments, visual alignment constraint (VAC) serves as the baseline model. The experimental results show that the screened dataset achieves a better word error rate (WER) than the unscreened dataset.
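Word error rate, the evaluation metric mentioned above, is the word-level edit distance between a reference and a hypothesis transcript, normalized by the reference length. A minimal self-contained implementation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

A lower WER on the screened dataset than on the unscreened one is the evidence the paper uses for the benefit of quality filtering.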
The Intelligent Internet of Things (IIoT) involves real-world things that communicate or interact with each other through networking technologies, collecting data from these "things" and using intelligent approaches, such as Artificial Intelligence (AI) and machine learning, to make accurate decisions. Data science is the science of dealing with data and its relationships through intelligent approaches. Most state-of-the-art research focuses independently on either data science or the IIoT, rather than exploring their integration. To address this gap, this article provides a comprehensive survey on the advances and integration of data science with the IIoT by classifying existing IoT-based data science techniques and summarizing their characteristics. The paper analyzes data science and big data security and privacy features, including network architecture, data protection, and continuous monitoring of data, which face challenges in various IoT-based systems. Extensive insights into IoT data security, privacy, and challenges are visualized in the context of data science for the IoT. In addition, this study reveals current opportunities to enhance data science and IoT market development. The current gaps and challenges faced in the integration of data science and the IoT are comprehensively presented, followed by the future outlook and possible solutions.
With the advent of the big data era, real-time data analysis and decision-support systems have been recognized as essential tools for enhancing enterprise competitiveness and optimizing the decision-making process. This study aims to explore the development strategies of real-time data analysis and decision-support systems, and to analyze their application status and future development trends in various industries. The article first reviews the basic concepts and importance of real-time data analysis and decision-support systems, and then discusses in detail key technical aspects such as system architecture, data collection and processing, analysis methods, and visualization techniques.
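A core building block of the real-time analysis pipelines discussed above is windowed aggregation over a data stream. The sketch below shows a minimal sliding-window average; the class name and interface are illustrative, not taken from the study.

```python
from collections import deque

class RollingWindow:
    """Fixed-size sliding window over a metric stream (e.g., orders/second)."""

    def __init__(self, size: int):
        self.size = size
        self.values = deque()
        self.total = 0.0

    def push(self, value: float) -> float:
        """Add a new observation and return the current window average."""
        self.values.append(value)
        self.total += value
        if len(self.values) > self.size:
            # Evict the oldest observation in O(1).
            self.total -= self.values.popleft()
        return self.total / len(self.values)
```

Keeping a running total rather than re-summing the window keeps each update constant-time, which is what makes this pattern viable at streaming rates.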
Background Medical informatics has accumulated vast amounts of data for clinical diagnosis and treatment. However, limited access to follow-up data and the difficulty of integrating data across diverse platforms continue to pose significant barriers to clinical research progress. In response, our research team has developed a specialized clinical research database for cardiology, thereby establishing a comprehensive digital platform that facilitates both clinical decision-making and research endeavors. Methods The database incorporates actual clinical data from patients who received treatment at the Cardiovascular Medicine Department of the Chinese PLA General Hospital from 2012 to 2021. It includes comprehensive data on patients' basic information, medical history, non-invasive imaging studies, and laboratory test results, as well as peri-procedural information related to interventional surgeries, extracted from the Hospital Information System. Additionally, an innovative artificial intelligence (AI)-powered interactive follow-up system has been developed, ensuring that nearly all myocardial infarction patients receive at least one post-discharge follow-up, thereby achieving comprehensive data management throughout the entire care continuum for high-risk patients. Results The database integrates extensive cross-sectional and longitudinal patient data, with a focus on higher-risk acute coronary syndrome patients. It achieves the integration of structured and unstructured clinical data, while innovatively incorporating AI and automatic speech recognition technologies to enhance data integration and workflow efficiency. It creates a comprehensive patient view, thereby improving diagnostic and follow-up quality, and provides high-quality data to support clinical research. Despite limitations in unstructured data standardization and biological sample integrity, the database's development is accompanied by ongoing optimization efforts. Conclusion The cardiovascular specialty clinical database is a comprehensive digital archive integrating clinical treatment and research, which facilitates the digital and intelligent transformation of clinical diagnosis and treatment processes. It supports clinical decision-making and offers data support and potential research directions for the specialized management of cardiovascular diseases.
With the rapid development of information technology, data security issues have received increasing attention. Data encryption and decryption technology, as a key means of ensuring data security, plays an important role in fields such as communication security, data storage, and data recovery. This article explores the fundamental principles and interrelationships of data encryption and decryption, examines the strengths, weaknesses, and applicability of symmetric, asymmetric, and hybrid encryption algorithms, and introduces key application scenarios for data encryption and decryption technology. It then discusses the challenges and corresponding countermeasures related to encryption algorithm security, key management, and encryption-decryption performance. Finally, it analyzes the development trends and future prospects of data encryption and decryption technology. This article provides a systematic understanding of data encryption and decryption techniques, which has good reference value for software designers.
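To make the symmetric encrypt/decrypt round-trip concrete, the toy sketch below derives a keystream from a key (a hash in counter mode) and XORs it with the data, so applying the same function twice restores the plaintext. This is a teaching toy only, not a secure cipher; real systems should use vetted algorithms such as AES through an audited library, and hybrid schemes then protect the symmetric key with an asymmetric algorithm.

```python
import hashlib

def _keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from the key (toy hash-counter mode)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the operation is its own inverse."""
    ks = _keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))
```

The symmetry (`encrypt == decrypt`) is the defining property of stream-cipher-style symmetric schemes; asymmetric schemes, by contrast, use distinct public and private keys for the two directions.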
This paper analyzes the advantages of legal digital currencies and explores their impact on bank big data practices. By examining bank big data collection and processing, it clarifies that legal digital currencies can enhance the efficiency of bank data processing, enrich data types, and strengthen data analysis and application capabilities. In response to future development needs, it is necessary to strengthen data collection management, enhance data processing capabilities, and innovate big data application models, providing references for bank big data practices and promoting the transformation and upgrading of the banking industry in the context of legal digital currencies.
In this paper, the application of agricultural big data in agricultural economic management is deeply explored, and its potential in promoting profit growth and innovation is analyzed. However, challenges persist in data collection and integration, limitations of analytical technologies, talent development, team building, and policy support when applying agricultural big data. Effective application strategies are proposed, including data-driven precision agriculture practices, the construction of data integration and management platforms, data security and privacy protection strategies, and long-term planning and development strategies for agricultural big data, to maximize its impact on agricultural economic management. Future advancements require collaborative efforts in technological innovation, talent cultivation, and policy support to realize the extensive application of agricultural big data in agricultural economic management and ensure sustainable industrial development.
On October 18, 2017, the 19th National Congress Report called for the implementation of the Healthy China Strategy. The development of biomedical data plays a pivotal role in advancing this strategy. Since the 18th National Congress of the Communist Party of China, China has vigorously promoted the integration and implementation of the Healthy China and Digital China strategies. The National Health Commission has prioritized the development of health and medical big data, issuing policies to promote standardized applications and foster innovation in "Internet + Healthcare." Biomedical data has significantly contributed to precision medicine, personalized health management, drug development, disease diagnosis, public health monitoring, and epidemic prediction capabilities.
This work focuses on enhancing low-frequency seismic data using a convolutional neural network trained on synthetic data. Traditional seismic data often lack both high and low frequencies, which are essential for detailed geological interpretation and various geophysical applications. Low-frequency data are particularly valuable for reducing wavelet sidelobes and improving full waveform inversion (FWI). Conventional methods for bandwidth extension include seismic deconvolution and sparse inversion, which have limitations in recovering low frequencies. The study explores the potential of the U-net, which has been successful in other geophysical applications such as noise attenuation and seismic resolution enhancement. The novelty of our approach is that we do not rely on computationally expensive finite-difference modelling to create training data. Instead, our synthetic training data are created from individual randomly perturbed events with variations in bandwidth, making the method more adaptable to different data sets than previous deep learning methods. The method was tested on both synthetic and real seismic data, demonstrating effective low-frequency reconstruction and sidelobe reduction. With a synthetic full waveform inversion to recover a velocity model and a seismic amplitude inversion to estimate acoustic impedance, we demonstrate the validity and benefit of the proposed method. Overall, the study presents a robust approach to seismic bandwidth extension using deep learning, emphasizing the importance of diverse, well-designed, and computationally inexpensive synthetic training data.
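The idea of building training traces from randomly perturbed events, rather than from finite-difference modelling, can be sketched as below: sparse random reflections are convolved with wavelets whose peak frequency (and hence bandwidth) varies per event. The choice of a Ricker wavelet and the specific parameter ranges are assumptions for illustration; the paper's actual event generator may differ.

```python
import math
import random

def ricker(peak_freq: float, dt: float, length: int) -> list:
    """Zero-phase Ricker wavelet sampled at interval dt (seconds)."""
    half = length // 2
    w = []
    for i in range(length):
        t = (i - half) * dt
        a = (math.pi * peak_freq * t) ** 2
        w.append((1.0 - 2.0 * a) * math.exp(-a))
    return w

def synthetic_trace(n_samples: int, n_events: int, dt: float = 0.004) -> list:
    """Sum of random-amplitude events convolved with variable-bandwidth wavelets."""
    trace = [0.0] * n_samples
    for _ in range(n_events):
        onset = random.randrange(n_samples)
        amp = random.uniform(-1.0, 1.0)
        # Peak frequency varied per event to span a range of bandwidths.
        wav = ricker(random.uniform(10.0, 60.0), dt, 81)
        for k, v in enumerate(wav):
            idx = onset + k - 40  # center the wavelet on the event onset
            if 0 <= idx < n_samples:
                trace[idx] += amp * v
    return trace
```

A band-limited copy of each trace would then serve as network input, with the broadband original as the training target.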
Research into metamorphism plays a pivotal role in reconstructing the evolution of continents, particularly through the study of ancient rocks that are highly susceptible to metamorphic alteration due to multiple tectonic events. In the big data era, the establishment of new data platforms and the application of big data methods have become a focus of metamorphic rock research. Significant progress has been made in creating specialized databases, compiling comprehensive datasets, and utilizing data analytics to address complex scientific questions. However, many existing databases are inadequate for the specific requirements of metamorphic research, as a substantial amount of valuable data remains uncollected. Therefore, constructing new databases that can cope with the development of the data era is necessary. This article provides an extensive review of existing databases related to metamorphic rocks and discusses data-driven studies in this field. Accordingly, several crucial factors that need to be taken into consideration in the establishment of specialized metamorphic databases are identified, aiming to leverage data-driven applications to achieve broader scientific objectives in metamorphic research.
The analysis of ancient genomes provides opportunities to explore human population history across both temporal and geographic dimensions (Haak et al., 2015; Wang et al., 2021, 2024). To enhance the accessibility and utility of these ancient genomic datasets, a range of databases and advanced statistical models have been developed, including the Allen Ancient DNA Resource (AADR) (Mallick et al., 2024) and AdmixTools (Patterson et al., 2012). While upstream processes such as sequencing and raw data processing have been streamlined by resources like the AADR, the downstream analysis of these datasets, encompassing population genetics inference and spatiotemporal interpretation, remains a significant challenge. The AADR provides a unified collection of published ancient DNA (aDNA) data, yet its file-based format and reliance on command-line tools, such as those in AdmixTools (Patterson et al., 2012), require advanced computational expertise for effective exploration and analysis. These requirements can present significant challenges for researchers lacking such expertise, limiting the accessibility and broader application of these valuable genomic resources.
Iced transmission line galloping poses a significant threat to the safety and reliability of power systems, directly causing line tripping, disconnections, and power outages. Existing early-warning methods for iced transmission line galloping suffer from issues such as reliance on a single data source, neglect of irregular time series, and lack of attention-based closed-loop feedback, resulting in high rates of missed and false alarms. To address these challenges, we propose an Internet of Things (IoT)-empowered early-warning method for transmission line galloping that integrates time series data from optical fiber sensing and weather forecasts. Initially, the method applies a primary adaptive weighted fusion to the IoT-empowered optical fiber real-time sensing data and weather forecast data, followed by a secondary fusion based on a Back Propagation (BP) neural network, and uses the K-medoids algorithm to cluster the fused data. Furthermore, an adaptive irregular time series perception adjustment module is introduced into the traditional Gated Recurrent Unit (GRU) network, and closed-loop feedback based on an attention mechanism is employed to update network parameters through gradient feedback of the loss function, enabling closed-loop training and time series prediction with the GRU network model. Subsequently, considering the various types of prediction data and the duration of icing, an iced transmission line galloping risk coefficient is established, and warnings are categorized based on this coefficient. Finally, using an IoT-driven realistic dataset of iced transmission line galloping, the effectiveness of the proposed method is validated through multi-dimensional simulation scenarios.
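The primary adaptive weighted fusion step can be illustrated with a common inverse-variance weighting heuristic: the source with the lower recent variance (i.e., the more stable source) receives the higher weight. This is a sketch of one plausible scheme, not necessarily the weighting actually used in the paper.

```python
def adaptive_weighted_fusion(sensing: list, forecast: list) -> list:
    """Fuse two aligned series, weighting each source by its inverse variance.

    `sensing` and `forecast` are assumed to be time-aligned samples of the
    same physical quantity from the two data sources.
    """
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    v_s, v_f = variance(sensing), variance(forecast)
    # Lower variance -> higher weight; epsilon guards against zero variance.
    w_s = 1.0 / (v_s + 1e-9)
    w_f = 1.0 / (v_f + 1e-9)
    total = w_s + w_f
    return [(w_s * s + w_f * f) / total for s, f in zip(sensing, forecast)]
```

Each fused sample is a convex combination of the two sources, so it always lies between them; the secondary BP-network fusion described above would then learn a non-linear refinement of this first-pass estimate.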
Air pollution in China covers a large area with complex sources and formation mechanisms, making it a unique place to conduct air pollution and atmospheric chemistry research. The National Natural Science Foundation of China's Major Research Plan entitled "Fundamental Researches on the Formation and Response Mechanism of the Air Pollution Complex in China" (the Plan) has funded 76 research projects to explore the causes of air pollution in China and the key processes of air pollution in atmospheric physics and atmospheric chemistry. In order to summarize the abundant data from the Plan and exhibit its long-term impacts domestically and internationally, an integration project is responsible for collecting the various types of data generated by the 76 projects of the Plan. This project has classified and integrated these data, forming eight categories containing 258 datasets and 15 technical reports in total. The integration project has led to the successful establishment of the China Air Pollution Data Center (CAPDC) platform, providing storage, retrieval, and download services for the eight categories. The platform has distinct features including data visualization, related project information querying, and bilingual services in both English and Chinese, which allow rapid searching and downloading of data and provide a solid foundation of data and support for future related research. Air pollution control in China, especially over the past decade, is undeniably a global exemplar, and this data center is the first in China to focus on research into the country's air pollution complex.
As a new type of production factor in healthcare, healthcare data elements have been rapidly integrated into various health production processes, such as clinical assistance, health management, biological testing, and operation and supervision [1,2]. Healthcare data elements include biological and clinical data related to disease, environmental health data associated with life, and operational and healthcare management data related to healthcare activities (Figure 1). Activities such as the construction of a data value assessment system, the development of a data circulation and sharing platform, and the authorization of data compliance and operation products support the strong growth momentum of the market for healthcare data elements in China [3].
Funding (SLEGS photoneutron study): supported by the National Key Research and Development Program of China (Nos. 2022YFA1602404 and 2023YFA1606901), the National Natural Science Foundation of China (Nos. 12275338, 12388102, and U2441221), and the Key Laboratory of Nuclear Data Foundation (JCKY2022201C152).
Funding (viral infectious disease data review): supported by the National Natural Science Foundation of China (32370703), the CAMS Innovation Fund for Medical Sciences (CIFMS) (2022-I2M-1-021, 2021-I2M-1-061), and the Major Project of Guangzhou National Laboratory (GZNL2024A01015).
Funding: Education Science Planning Project of Jiangsu Province in 2024 (Grant No. B-b/2024/01/152); 2025 Jiangsu Normal University Graduate Research and Innovation Program school-level project "Research on the Construction and Desensitization Strategies of Education Sensitive Data Classification from the Perspective of Educational Ecology".
Abstract: In the era of digital intelligence, data is a key element in promoting social and economic development. Educational data, as a vital component of data, not only supports teaching and learning but also contains much sensitive information. How to effectively categorize and protect sensitive data has become an urgent issue in educational data security. This paper systematically constructs a multi-dimensional classification framework for sensitive educational data and discusses its security protection strategy from the aspects of identification and desensitization, aiming to provide new ideas for the security management of sensitive educational data and to support the construction of an educational data security ecosystem in the era of digital intelligence.
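Identification and desensitization of sensitive fields are often implemented as pattern-based masking. The sketch below is a minimal illustration of that idea, not the framework proposed in the paper; the field formats (an 11-digit phone number and an 18-character ID number) and the masking choices are assumptions for demonstration only.

```python
import re

# Hypothetical desensitization rules: (pattern, masking function).
RULES = [
    # 11-digit mobile number: keep first 3 and last 4 digits (assumed format).
    (re.compile(r"\b\d{11}\b"),
     lambda m: m.group(0)[:3] + "****" + m.group(0)[-4:]),
    # 18-character ID number: keep first 6 and last 4 characters (assumed format).
    (re.compile(r"\b\d{17}[\dXx]\b"),
     lambda m: m.group(0)[:6] + "********" + m.group(0)[-4:]),
]

def desensitize(text: str) -> str:
    """Mask sensitive substrings in a record while keeping its structure."""
    for pattern, mask in RULES:
        text = pattern.sub(mask, text)
    return text

print(desensitize("student phone 13812345678"))  # → student phone 138****5678
```

A production system would pair such rules with the classification framework, so that each sensitivity level maps to a different masking or access-control policy.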
Funding: Supported by the National Language Commission project on sign language data specifications for artificial intelligence applications and test standards for language service translation systems (No. ZDI145-70).
Abstract: Sign language datasets are essential for sign language recognition and translation (SLRT). Current public sign language datasets are small and lack diversity, which does not meet the practical application requirements of SLRT. However, building a large-scale and diverse sign language dataset is difficult because sign language data on the Internet is scarce, and some of the collected data does not meet quality standards. This paper proposes a two-information-streams transformer (TIST) model to judge whether the quality of sign language data is acceptable. To verify that TIST effectively improves sign language recognition (SLR), we construct two datasets: a screened dataset and an unscreened dataset. Using visual alignment constraint (VAC) as the baseline model, the experiments show that the screened dataset achieves a better word error rate (WER) than the unscreened dataset.
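The word error rate used to compare the screened and unscreened datasets is a standard metric: the word-level edit distance between the recognized sequence and the reference, normalized by the reference length. A generic implementation (not the paper's evaluation code) looks like this:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("hello my friend", "hello friend"))  # one deletion over three words
```

Lower WER on the screened dataset indicates that filtering out low-quality samples improves recognition.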
Funding: Supported in part by the National Natural Science Foundation of China under Grant 62371181; in part by the Changzhou Science and Technology International Cooperation Program under Grant CZ20230029; by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2021R1A2B5B02087169); and under the framework of the international cooperation program managed by the National Research Foundation of Korea (2022K2A9A1A01098051).
Abstract: The Intelligent Internet of Things (IIoT) involves real-world things that communicate or interact with each other through networking technologies, collecting data from these "things" and using intelligent approaches, such as artificial intelligence (AI) and machine learning, to make accurate decisions. Data science is the science of dealing with data and its relationships through intelligent approaches. Most state-of-the-art research focuses independently on either data science or the IIoT, rather than exploring their integration. To address this gap, this article provides a comprehensive survey on the advances and integration of data science with IIoT systems, classifying existing IoT-based data science techniques and summarizing their characteristics. The paper analyzes data science and big data security and privacy features, including network architecture, data protection, and continuous monitoring of data, which face challenges in various IoT-based systems. Extensive insights into IoT data security, privacy, and challenges are presented in the context of data science for the IoT. In addition, this study reveals current opportunities to enhance data science and IoT market development. The gaps and challenges in integrating data science and the IoT are comprehensively presented, followed by a future outlook and possible solutions.
Abstract: With the advent of the big data era, real-time data analysis and decision-support systems have been recognized as essential tools for enhancing enterprise competitiveness and optimizing the decision-making process. This study explores the development strategies of real-time data analysis and decision-support systems and analyzes their application status and future development trends in various industries. The article first reviews the basic concepts and importance of real-time data analysis and decision-support systems, and then discusses in detail key technical aspects such as system architecture, data collection and processing, analysis methods, and visualization techniques.
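A core building block of the data collection and processing layer in such systems is windowed aggregation over a live metric stream. The sketch below illustrates the idea with a time-based sliding window; the window length and the mean statistic are illustrative assumptions, not details from the study.

```python
from collections import deque

class SlidingWindowStats:
    """Rolling statistics over a fixed time window (a minimal sketch of a
    real-time aggregation component; not from the surveyed systems)."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.samples = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, value: float, now: float) -> None:
        self.samples.append((now, value))
        self.total += value
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop samples that have aged out of the window.
        while self.samples and now - self.samples[0][0] > self.window:
            _, old = self.samples.popleft()
            self.total -= old

    def mean(self, now: float) -> float:
        self._evict(now)
        return self.total / len(self.samples) if self.samples else 0.0
```

Because eviction is amortized O(1) per sample, this pattern scales to high-frequency streams and feeds dashboards or alerting rules with up-to-date aggregates.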
Funding: Noncommunicable Chronic Diseases-National Science and Technology Major Project (2023ZD0503906).
Abstract: Background Medical informatics has accumulated vast amounts of data for clinical diagnosis and treatment. However, limited access to follow-up data and the difficulty of integrating data across diverse platforms continue to pose significant barriers to clinical research. In response, our research team has developed a specialized clinical research database for cardiology, establishing a comprehensive digital platform that facilitates both clinical decision-making and research. Methods The database incorporates actual clinical data from patients who received treatment at the Cardiovascular Medicine Department of the Chinese PLA General Hospital from 2012 to 2021. It includes comprehensive data on patients' basic information, medical history, non-invasive imaging studies, and laboratory test results, as well as peri-procedural information related to interventional surgeries, extracted from the Hospital Information System. Additionally, an innovative artificial intelligence (AI)-powered interactive follow-up system was developed, ensuring that nearly all myocardial infarction patients receive at least one post-discharge follow-up, thereby achieving comprehensive data management throughout the entire care continuum for high-risk patients. Results The database integrates extensive cross-sectional and longitudinal patient data, with a focus on higher-risk acute coronary syndrome patients. It combines structured and unstructured clinical data, while innovatively incorporating AI and automatic speech recognition technologies to enhance data integration and workflow efficiency. It creates a comprehensive patient view, improving diagnostic and follow-up quality, and provides high-quality data to support clinical research. Despite limitations in unstructured data standardization and biological sample integrity, the database's development is accompanied by ongoing optimization efforts. Conclusion The cardiovascular specialty clinical database is a comprehensive digital archive integrating clinical treatment and research, facilitating the digital and intelligent transformation of clinical diagnosis and treatment processes. It supports clinical decision-making and offers data support and potential research directions for the specialized management of cardiovascular diseases.
Abstract: With the rapid development of information technology, data security issues have received increasing attention. Data encryption and decryption technology, as a key means of ensuring data security, plays an important role in fields such as communication security, data storage, and data recovery. This article explores the fundamental principles and interrelationships of data encryption and decryption; examines the strengths, weaknesses, and applicability of symmetric, asymmetric, and hybrid encryption algorithms; and introduces key application scenarios for data encryption and decryption technology. It also examines the challenges and corresponding countermeasures related to encryption algorithm security, key management, and encryption-decryption performance. Finally, it analyzes the development trends and future prospects of data encryption and decryption technology, providing a systematic understanding of these techniques that should serve as a useful reference for software designers.
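The symmetric case can be illustrated with a toy stream cipher in which encryption and decryption are the same XOR operation against a keystream derived from a shared key. This is a pedagogical sketch only, not from the article, and not secure for production use; real systems should use vetted primitives such as AES-GCM, with asymmetric algorithms like RSA reserved for key exchange in hybrid schemes.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream by hashing key || nonce || counter.
    Toy construction for illustration; do not use in production."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Encryption and decryption are the same operation: XOR with the keystream.
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

key, nonce = b"shared-secret", b"unique-nonce"
ciphertext = xor_cipher(key, nonce, b"confidential record")
plaintext = xor_cipher(key, nonce, ciphertext)  # same key recovers the data
```

The round trip also shows why key management dominates symmetric-cipher security: anyone holding the shared key can decrypt, and reusing a nonce with the same key leaks information about the plaintexts.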
Abstract: This paper analyzes the advantages of legal digital currencies and explores their impact on bank big data practices. By examining bank big data collection and processing, it shows that legal digital currencies can enhance the efficiency of bank data processing, enrich data types, and strengthen data analysis and application capabilities. To meet future development needs, banks should strengthen data collection management, enhance data processing capabilities, and innovate big data application models. The paper provides references for bank big data practices, promoting the transformation and upgrading of the banking industry in the context of legal digital currencies.
Funding: Supported by Research and Application of Soil Collection Software and Soil Ecological Big Data Platform in Guangxi Woodland (GUILINKEYAN[2022ZC]44); Construction of Soil Information Database and Visualization System for Artificial Forests in Central Guangxi (2023GXZCLK62).
Abstract: This paper explores in depth the application of agricultural big data in agricultural economic management and analyzes its potential for promoting profit growth and innovation. However, challenges persist in data collection and integration, the limitations of analytical technologies, talent development and team building, and policy support. Effective application strategies are proposed, including data-driven precision agriculture practices, the construction of data integration and management platforms, data security and privacy protection strategies, and long-term planning and development strategies for agricultural big data, to maximize its impact on agricultural economic management. Future advancement requires collaborative efforts in technological innovation, talent cultivation, and policy support to realize the extensive application of agricultural big data in agricultural economic management and ensure sustainable industrial development.
Abstract: On October 18, 2017, the 19th National Congress Report called for the implementation of the Healthy China Strategy. The development of biomedical data plays a pivotal role in advancing this strategy. Since the 18th National Congress of the Communist Party of China, China has vigorously promoted the integration and implementation of the Healthy China and Digital China strategies. The National Health Commission has prioritized the development of health and medical big data, issuing policies to promote standardized applications and foster innovation in "Internet + Healthcare." Biomedical data has contributed significantly to precision medicine, personalized health management, drug development, disease diagnosis, public health monitoring, and epidemic prediction capabilities.
Abstract: This work focuses on enhancing low-frequency seismic data using a convolutional neural network trained on synthetic data. Traditional seismic data often lack both high and low frequencies, which are essential for detailed geological interpretation and various geophysical applications. Low-frequency data is particularly valuable for reducing wavelet sidelobes and improving full waveform inversion (FWI). Conventional methods for bandwidth extension include seismic deconvolution and sparse inversion, which have limitations in recovering low frequencies. The study explores the potential of the U-net, which has been successful in other geophysical applications such as noise attenuation and seismic resolution enhancement. The novelty of our approach is that we do not rely on computationally expensive finite-difference modelling to create training data. Instead, our synthetic training data is created from individual randomly perturbed events with variations in bandwidth, making it more adaptable to different datasets than previous deep learning methods. The method was tested on both synthetic and real seismic data, demonstrating effective low-frequency reconstruction and sidelobe reduction. With a synthetic full waveform inversion to recover a velocity model and a seismic amplitude inversion to estimate acoustic impedance, we demonstrate the validity and benefit of the proposed method. Overall, the study presents a robust approach to seismic bandwidth extension using deep learning, emphasizing the importance of diverse, well-designed, but computationally inexpensive synthetic training data.
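The idea of building training traces from individually perturbed events, rather than from finite-difference modelling, can be sketched by superposing Ricker wavelets with randomized arrival time, peak frequency, and amplitude. The event count, frequency range, and amplitude distribution below are illustrative assumptions, not the parameters used in the study.

```python
import math
import random

def ricker(t: float, f: float) -> float:
    """Ricker (Mexican hat) wavelet with peak frequency f in Hz."""
    a = (math.pi * f * t) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)

def synthetic_trace(n_samples=1000, dt=0.002, n_events=8, seed=0):
    """Minimal sketch of event-based synthetic training data: a trace built
    from randomly perturbed wavelet events with varying bandwidth."""
    rng = random.Random(seed)
    trace = [0.0] * n_samples
    for _ in range(n_events):
        t0 = rng.uniform(0.1, (n_samples - 1) * dt - 0.1)  # event time (s)
        f = rng.uniform(10.0, 60.0)                        # peak frequency (Hz)
        amp = rng.uniform(-1.0, 1.0)                       # reflectivity strength
        for i in range(n_samples):
            trace[i] += amp * ricker(i * dt - t0, f)
    return trace
```

Generating input/target pairs then amounts to band-limiting such traces differently (for example, removing the low frequencies from the input while keeping them in the target), which is far cheaper than wave-equation modelling.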
Funding: Funded by the National Natural Science Foundation of China (No. 42220104008).
Abstract: Research into metamorphism plays a pivotal role in reconstructing the evolution of continents, particularly through the study of ancient rocks that are highly susceptible to metamorphic alteration due to multiple tectonic activities. In the big data era, the establishment of new data platforms and the application of big data methods have become a focus of metamorphic rock research. Significant progress has been made in creating specialized databases, compiling comprehensive datasets, and utilizing data analytics to address complex scientific questions. However, many existing databases are inadequate for the specific requirements of metamorphic research, and a substantial amount of valuable data remains uncollected. Therefore, constructing new databases that can keep pace with the development of the data era is necessary. This article provides an extensive review of existing databases related to metamorphic rocks and discusses data-driven studies in this field. Accordingly, several crucial factors to consider in establishing specialized metamorphic databases are identified, with the aim of leveraging data-driven applications to achieve broader scientific objectives in metamorphic research.
Funding: Supported by the National Key Research and Development Program of China (2023YFC3303701-02 and 2024YFC3306701); the National Natural Science Foundation of China (T2425014 and 32270667); the Natural Science Foundation of Fujian Province of China (2023J06013); the Major Project of the National Social Science Foundation of China granted to Chuan-Chao Wang (21&ZD285); the Open Research Fund of the State Key Laboratory of Genetic Engineering at Fudan University (SKLGE-2310); and the Open Research Fund of the Forensic Genetics Key Laboratory of the Ministry of Public Security (2023FGKFKT07).
Abstract: The analysis of ancient genomics provides opportunities to explore human population history across both temporal and geographic dimensions (Haak et al., 2015; Wang et al., 2021, 2024). To enhance the accessibility and utility of these ancient genomic datasets, a range of databases and advanced statistical models have been developed, including the Allen Ancient DNA Resource (AADR) (Mallick et al., 2024) and AdmixTools (Patterson et al., 2012). While upstream processes such as sequencing and raw data processing have been streamlined by resources like the AADR, the downstream analysis of these datasets, encompassing population genetics inference and spatiotemporal interpretation, remains a significant challenge. The AADR provides a unified collection of published ancient DNA (aDNA) data, yet its file-based format and reliance on command-line tools, such as those in AdmixTools (Patterson et al., 2012), require advanced computational expertise for effective exploration and analysis. These requirements can present significant challenges for researchers lacking such expertise, limiting the accessibility and broader application of these valuable genomic resources.
Funding: This research was funded by the Science and Technology Project of State Grid Corporation of China under grant number 5200-202319382A-2-3-XG.
Abstract: Iced transmission line galloping poses a significant threat to the safety and reliability of power systems, leading directly to line tripping, disconnections, and power outages. Existing early-warning methods for iced transmission line galloping suffer from reliance on a single data source, neglect of irregular time series, and lack of attention-based closed-loop feedback, resulting in high rates of missed and false alarms. To address these challenges, we propose an Internet of Things (IoT)-empowered early-warning method for transmission line galloping that integrates time series data from optical fiber sensing and weather forecasts. Initially, the method applies a primary adaptive weighted fusion to the IoT-empowered optical fiber real-time sensing data and weather forecast data, followed by a secondary fusion based on a back-propagation (BP) neural network, and uses the K-medoids algorithm to cluster the fused data. Furthermore, an adaptive irregular time series perception adjustment module is introduced into the traditional gated recurrent unit (GRU) network, and closed-loop feedback based on an attention mechanism is employed to update network parameters through gradient feedback of the loss function, enabling closed-loop training and time series prediction by the GRU network model. Subsequently, considering the various types of prediction data and the duration of icing, an iced transmission line galloping risk coefficient is established, and warnings are categorized based on this coefficient. Finally, using an IoT-driven realistic dataset of iced transmission line galloping, the effectiveness of the proposed method is validated through multi-dimensional simulation scenarios.
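The primary fusion stage combines two observations of the same quantity from sources of different reliability. A common adaptive weighting scheme, shown below as a generic sketch (the paper's exact weights and its BP-network second stage are not reproduced here), weights each source by the inverse of its error variance, so the more reliable source dominates the fused estimate.

```python
def fuse(sensor: float, forecast: float,
         var_sensor: float, var_forecast: float) -> float:
    """Inverse-variance weighted fusion of a sensor reading and a forecast
    value for the same physical quantity. Generic illustration only."""
    w_s = 1.0 / var_sensor      # weight grows as sensor error variance shrinks
    w_f = 1.0 / var_forecast
    return (w_s * sensor + w_f * forecast) / (w_s + w_f)

# Example: the optical-fiber sensor (variance 1.0) is trusted more than the
# weather forecast (variance 4.0), so the result lands closer to the sensor.
fused = fuse(sensor=12.0, forecast=16.0, var_sensor=1.0, var_forecast=4.0)
```

With variances 1.0 and 4.0 the weights are 1.0 and 0.25, giving a fused value of 12.8, much nearer the sensor reading, which is the intended behavior of an adaptive fusion front end.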
Funding: Supported by the National Natural Science Foundation of China (Grant No. 92044303).
Abstract: Air pollution in China covers a large area with complex sources and formation mechanisms, making it a unique place to conduct air pollution and atmospheric chemistry research. The National Natural Science Foundation of China's Major Research Plan entitled "Fundamental Researches on the Formation and Response Mechanism of the Air Pollution Complex in China" (the Plan) has funded 76 research projects to explore the causes of air pollution in China and the key atmospheric physics and atmospheric chemistry processes involved. To summarize the abundant data from the Plan and extend its long-term impact domestically and internationally, an integration project was tasked with collecting the various types of data generated by the 76 projects. The project has classified and integrated these data into eight categories containing 258 datasets and 15 technical reports in total. It has also led to the establishment of the China Air Pollution Data Center (CAPDC) platform, which provides storage, retrieval, and download services for the eight categories. The platform offers distinct features including data visualization, querying of related project information, and bilingual services in English and Chinese, enabling rapid searching and downloading of data and providing a solid foundation for future related research. Air pollution control in China, especially over the past decade, is a global exemplar, and this data center is the first in China to focus on research into the country's air pollution complex.
Funding: Supported by the National Natural Science Foundation of China (Grants 72474022, 71974011, 72174022, 71972012, 71874009) and the "BIT Think Tank" Promotion Plan of the Science and Technology Innovation Program of Beijing Institute of Technology (Grants 2024CX14017, 2023CX13029).
Abstract: As a new type of production factor in healthcare, healthcare data elements have been rapidly integrated into various health production processes, such as clinical assistance, health management, biological testing, and operation and supervision [1,2]. Healthcare data elements include biological and clinical data related to disease, environmental health data associated with daily life, and operational and healthcare management data related to healthcare activities (Figure 1). Activities such as the construction of a data value assessment system, the development of a data circulation and sharing platform, and the authorization of data compliance and operation products support the strong growth momentum of the market for healthcare data elements in China [3].