With the rapid growth of biomedical data, particularly multi-omics data including genomics, transcriptomics, proteomics, metabolomics, and epigenomics, medical research and clinical decision-making confront both new opportunities and obstacles. The huge and diversified nature of these datasets cannot always be managed using traditional data analysis methods. As a consequence, deep learning has emerged as a strong tool for analysing multi-omics data due to its ability to handle complex and non-linear relationships. This paper explores the fundamental concepts of deep learning and how they are used in multi-omics medical data mining. We demonstrate how autoencoders, variational autoencoders, multimodal models, attention mechanisms, transformers, and graph neural networks enable pattern analysis and recognition across omics data. Deep learning has been found to be effective in disease classification, biomarker identification, gene network learning, and therapeutic efficacy prediction. We also consider critical problems such as data quality, model explainability, reproducibility of findings, and computational power requirements. We then consider future directions: combining omics with clinical and imaging data, explainable AI, federated learning, and real-time diagnostics. Overall, this study emphasises the need for cross-disciplinary collaboration to advance deep learning-based multi-omics research for precision medicine and the understanding of complex disorders.
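The autoencoder idea this abstract leads with can be sketched in a few lines: compress high-dimensional omics-like features into a small latent code and train the network to reconstruct its input. The data matrix, layer sizes, and learning rate below are hypothetical illustrations, not anything from the paper; a linear autoencoder is used to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "multi-omics" matrix: 100 samples x 20 features (made-up data).
X = rng.normal(size=(100, 20))

# Linear autoencoder: encoder W1 maps to a 5-dim latent code, decoder W2 maps back.
W1 = rng.normal(scale=0.1, size=(20, 5))
W2 = rng.normal(scale=0.1, size=(5, 20))

lr = 0.01
for _ in range(500):
    Z = X @ W1                       # encode to the latent space
    E = Z @ W2 - X                   # reconstruction error
    # Gradients of the mean squared reconstruction loss.
    gW2 = Z.T @ E / len(X)
    gW1 = X.T @ (E @ W2.T) / len(X)
    W1 -= lr * gW1
    W2 -= lr * gW2

loss = float(np.mean((X @ W1 @ W2 - X) ** 2))   # should fall below the raw variance
```

In practice the encoder/decoder would be non-linear (and variational in a VAE), but the compress-then-reconstruct training loop is the same shape.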
On October 18, 2017, the 19th National Congress Report called for the implementation of the Healthy China Strategy. The development of biomedical data plays a pivotal role in advancing this strategy. Since the 18th National Congress of the Communist Party of China, China has vigorously promoted the integration and implementation of the Healthy China and Digital China strategies. The National Health Commission has prioritized the development of health and medical big data, issuing policies to promote standardized applications and foster innovation in "Internet + Healthcare." Biomedical data has contributed significantly to precision medicine, personalized health management, drug development, disease diagnosis, public health monitoring, and epidemic prediction.
Standardized datasets are foundational to healthcare informatization, enhancing data quality and unleashing the value of data elements. Using bibliometrics and content analysis, this study examines China's healthcare dataset standards from 2011 to 2025. It analyzes their evolution across types, applications, institutions, and themes, highlighting key achievements including substantial growth in quantity, optimized typology, expansion into innovative application scenarios such as health decision support, and broadened institutional involvement. The study also identifies critical challenges, including imbalanced development, insufficient quality control, and a lack of essential metadata, such as authoritative data element mappings and privacy annotations, which hampers the delivery of intelligent services. To address these challenges, the study proposes a multi-faceted strategy focused on optimizing the standard system's architecture, enhancing quality and implementation, and advancing both data governance (through authoritative tracing and privacy protection) and intelligent service provision. These strategies aim to promote the application of dataset standards, thereby fostering and securing the development of new productive forces in healthcare.
Objectives: Electronic health records (EHRs) offer valuable real-world data (RWD) for Chinese medicine research. However, significant methodological challenges remain in developing integrative Chinese-Western medicine (ICWM) databases. This study aims to establish a best-practice methodological framework, referred to as BRIDGE, to guide the construction of ICWM databases using EHRs. Methods: We developed the methodological framework through a comprehensive process, including a systematic literature review, synthesis of empirical experiences, thematic expert discussions, and consultation with an external panel to reach consensus. Results: The BRIDGE framework outlines six core components for ICWM-EHR database development: overall design, database architecture, data extraction and linkage, data governance, data verification, and data quality evaluation. Key data elements include variables related to the study population, treatment or exposure, outcomes, and confounders. These databases support various research applications, particularly in evaluating the effectiveness and safety of integrative therapies. To demonstrate its practical value, we developed an ICWM-EHR database on women's reproductive lifespan, encompassing 2,064,482 patients. This database captures women's health conditions across the life course, from reproductive age to older adulthood. Conclusions: The BRIDGE methodological framework provides a standardized approach to building high-quality ICWM-EHR databases. It offers a unique opportunity to strengthen the methodological rigor and real-world relevance of Chinese medicine research in integrated healthcare settings.
This article introduces the methodologies and instrumentation for data measurement and propagation at the Back-n white neutron facility of the China Spallation Neutron Source. The Back-n facility employs backscattering techniques to generate a broad spectrum of white neutrons. Equipped with advanced detectors such as the light particle detector array and the fission ionization chamber detector, the facility achieves high-precision data acquisition through a general-purpose electronics system. Data are managed and stored in a hierarchical system supported by the National High Energy Physics Science Data Center, ensuring long-term preservation and efficient access. The data from the Back-n experiments contribute significantly to nuclear physics, reactor design, astrophysics, and medical physics, enhancing the understanding of nuclear processes and supporting interdisciplinary research.
Many complex systems are frequently subject to uncertain disturbances, which can exert a profound effect on critical transitions (CTs), potentially resulting in catastrophic consequences. Consequently, it is of utmost importance to provide warnings of noise-induced CTs in various applications. Although capturing generic symptoms of transition behavior from observational and simulated data poses a challenging problem, this work extracts information regarding CTs from simulated data of a tri-stable system driven by Gaussian white noise. Using the extended dynamic mode decomposition (EDMD) algorithm, we initially obtain finite-dimensional approximations of both the stochastic Koopman operator and its generator. Subsequently, the drift parameters and the noise intensity of the system are identified from the simulated data. Using the identified system, the parameter-dependent basin of the unsafe regime (PDBUR) is quantified, enabling data-driven early warning of Gaussian white noise-induced CTs. Finally, an error analysis is carried out to verify the effectiveness of the data-driven results. Our findings may serve as a paradigm for understanding and predicting noise-induced CTs in complex systems from data.
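The EDMD step described here, fitting a finite-dimensional Koopman approximation to simulated stochastic data, can be sketched with a least-squares fit over a monomial dictionary. For brevity the sketch uses a double-well (bistable) drift with illustrative parameters rather than the paper's tri-stable system; everything numeric below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Euler-Maruyama simulation of a 1-D SDE: dx = (x - x^3) dt + sigma dW.
# Drift and noise level are illustrative stand-ins, not the paper's system.
dt, sigma, n = 0.01, 0.3, 20000
x = np.empty(n)
x[0] = 0.5
noise = rng.normal(scale=np.sqrt(dt), size=n - 1)
for k in range(n - 1):
    x[k + 1] = x[k] + (x[k] - x[k] ** 3) * dt + sigma * noise[k]

def psi(v):
    # EDMD observable dictionary: monomials up to degree 3.
    return np.stack([np.ones_like(v), v, v ** 2, v ** 3], axis=1)

# EDMD: least-squares Koopman approximation K with psi(x_{t+1}) ~ psi(x_t) @ K.
A, B = psi(x[:-1]), psi(x[1:])
K, *_ = np.linalg.lstsq(A, B, rcond=None)

# One-step prediction error of the identity observable x; dominated by the
# irreducible noise variance sigma^2 * dt.
err = float(np.mean((A @ K[:, 1] - x[1:]) ** 2))
```

From K (and its matrix logarithm for the generator) one can read off drift coefficients and noise intensity, which is the identification step the abstract refers to.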
Sustained and spatially explicit monitoring of the United Nations 2030 Agenda for Sustainable Development is critical for effectively tracking progress toward the global Sustainable Development Goals (SDGs). Although land cover information has long been recognized as an essential component of SDG monitoring, a standardized scientific framework for identifying and prioritizing land cover-related essential variables does not exist. We therefore propose a novel expert- and data-driven framework for identifying, refining, and selecting a priority list of Essential Land cover-related Variables for SDGs (ELcV4SDGs). This framework integrates expert knowledge-based analysis, clustering of variables with similar attributes, and quantified index calculation to establish the priority list. Applying the framework to 15 specific SDG indicators, we found that the ELcV4SDGs priority list comprises three main categories (type and structure, pattern and intensity, and process and evolution of land cover), which are further divided into 19 subcategories and ultimately encompass 50 general variables. The ELcV4SDGs will support detailed spatial monitoring and enhance scientific applications for SDG monitoring and assessment, thereby guiding future SDG priority actions and informing decision-making to advance the 2030 Agenda at local, national, and global levels.
Remote sensing plays a pivotal role in forest inventory by enabling efficient large-scale monitoring while minimizing fieldwork costs. However, missing values pose a critical challenge in remote sensing applications, as ignoring or mishandling such data gaps can introduce systematic bias into the estimation of target variables for natural resource monitoring. This can lead to cascading errors that propagate through forest and ecosystem management decisions, ultimately hindering progress toward sustainable forest management, biodiversity conservation, and climate change mitigation strategies. This study proposes and demonstrates a procedure that employs hybrid estimators to address the limitations of missing remotely sensed data in forest inventory, using Landsat 7 ETM+ SLC-off data as an archived source for forest resource monitoring as a case in point. We compared forest inventory estimates from the hybrid estimator with those from a conventional model-based (CMB) estimator using Sentinel-2 data without missing values. Monte Carlo simulations revealed three key findings: (1) the hybrid estimator, leveraging remote sensing data with missing values, represented by Landsat 7 ETM+ SLC-off data, achieved a sampling precision of over 90%, meeting China's national standard for the National Forest Inventory (NFI); (2) the hybrid estimator demonstrated efficiency comparable to that of the CMB estimator; (3) the uncertainty associated with hybrid estimators was dominated by model parameter estimation, which could be effectively mitigated by slightly increasing the training sample size or refining the model specification. Overall, in forest inventory, the hybrid estimator can surmount the limitations posed by missing values in remotely sensed auxiliary data, effectively balancing cost-effectiveness and flexibility.
【Objective】Medical imaging data has great value, but it contains a significant amount of sensitive information about patients. At present, laws and regulations regarding the de-identification of medical imaging data are not clearly defined around the world. This study aims to develop a tool that meets compliance-driven desensitization requirements tailored to diverse research needs.【Methods】To enhance the security of medical image data, we designed and implemented a DICOM-format medical image de-identification system on the Windows operating system.【Results】Our custom de-identification system is adaptable to the legal standards of different countries and can accommodate specific research demands. The system offers both web-based online and desktop offline de-identification capabilities, enabling customization of de-identification rules and facilitating batch processing to improve efficiency.【Conclusions】This medical image de-identification system robustly strengthens the stewardship of sensitive medical data, aligning with data security protection requirements while facilitating the sharing and utilization of medical image data. This approach unlocks the intrinsic value inherent in such datasets.
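The customizable de-identification rules the system describes can be pictured as a per-tag action table applied to each record. The tag names, actions, and default-drop behavior below are illustrative assumptions, not the system's actual rule set or the DICOM standard's full confidentiality profile.

```python
import hashlib

# Per-tag rules: keep, hash (pseudonymize), or remove. Hypothetical rule set.
RULES = {"PatientName": "remove", "PatientID": "hash", "StudyDate": "keep"}

def deidentify(record, rules):
    out = {}
    for tag, value in record.items():
        action = rules.get(tag, "remove")   # unknown tags are dropped by default
        if action == "keep":
            out[tag] = value
        elif action == "hash":
            # One-way pseudonym so records for the same patient remain linkable.
            out[tag] = hashlib.sha256(value.encode()).hexdigest()[:16]
    return out

record = {"PatientName": "DOE^JANE", "PatientID": "12345",
          "StudyDate": "20240101", "InstitutionName": "Some Hospital"}
scrubbed = deidentify(record, RULES)
```

Swapping the rule table is what makes such a tool adaptable to different jurisdictions; batch processing is then just applying `deidentify` over a directory of files.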
As smart grid technology rapidly advances, the vast amount of user data collected by smart meters presents significant challenges in data security and privacy protection. Current research emphasizes data security and user privacy concerns within smart grids. However, existing methods struggle with efficiency and security when processing large-scale data. Balancing efficient data processing with stringent privacy protection during data aggregation in smart grids remains an urgent challenge. This paper proposes an AI-based multi-type data aggregation method designed to enhance aggregation efficiency and security by standardizing and normalizing various data modalities. The approach optimizes data preprocessing, integrates Long Short-Term Memory (LSTM) networks for handling time-series data, and employs homomorphic encryption to safeguard user privacy. It also explores the application of Boneh-Lynn-Shacham (BLS) signatures for user authentication. The proposed scheme's efficiency, security, and privacy protection capabilities are validated through rigorous security proofs and experimental analysis.
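The privacy-preserving aggregation idea, summing encrypted meter readings without decrypting any individual one, can be illustrated with textbook additively homomorphic (Paillier-style) encryption. The primes below are tiny demo values with no security whatsoever, and this is a generic textbook construction, not the paper's scheme.

```python
import math
import random

# Textbook Paillier with toy primes (NOT secure parameters; demo only).
p, q = 1789, 2003
n = p * q
n2 = n * n
g = n + 1                                      # standard simplified generator
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)    # modular inverse of L(g^lam)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Aggregator multiplies ciphertexts, which adds the underlying plaintexts,
# so only the grid operator (key holder) learns the total, never one reading.
readings = [120, 340, 75]                      # hypothetical meter readings
aggregate = 1
for m in readings:
    aggregate = (aggregate * encrypt(m)) % n2
total = decrypt(aggregate)
```

Real deployments use 2048-bit-plus moduli and pair the ciphertexts with signatures (the BLS role in the abstract) so the aggregator can authenticate each contribution.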
Characteristic databases in China face issues such as narrow resource coverage, low levels of standardization and normalization, and limited data sharing. To address these challenges, this paper proposes the concept of a characteristic database alliance, using marine characteristic databases as a case for feasibility analysis and discussion. The paper outlines the development path for such alliances and offers recommendations for future growth, aiming to establish a collaborative platform for the development of characteristic databases.
With the rise of data-intensive research, data literacy has become a critical capability for improving scientific data quality and achieving artificial intelligence (AI) readiness. In the biomedical domain, data are characterized by high complexity and privacy sensitivity, calling for robust and systematic data management skills. This paper reviews current trends in scientific data governance and the evolving policy landscape, highlighting persistent challenges such as inconsistent standards, semantic misalignment, and limited awareness of compliance. These issues are largely rooted in the lack of structured training and practical support for researchers. In response, this study builds on existing data literacy frameworks and integrates the specific demands of biomedical research to propose a comprehensive, lifecycle-oriented data literacy competency model with an emphasis on ethics and regulatory awareness. Furthermore, it outlines a tiered training strategy tailored to different research stages (undergraduate, graduate, and professional), offering theoretical foundations and practical pathways for universities and research institutions to advance data literacy education.
Pelvic floor dysfunction (PFD), including conditions such as stress urinary incontinence, pelvic organ prolapse, and fecal incontinence, significantly affects women's quality of life and their physical and mental health. With the advancement of digital medicine, the systematic collection of data and the high-quality development of database platforms have increasingly become central pillars of PFD research and management. We systematically review the developmental stages of PFD-related databases. We then conduct a comparative analysis of representative international and domestic platforms, examining key aspects including organizational structures and construction models, data sources and integration strategies, core functionalities, data quality control and standardization, data security and access management, and research applications. Finally, based on the current status of PFD database development both globally and in China, we offer recommendations to strengthen data infrastructure and guide future directions. The findings may serve as a valuable reference for the optimization of PFD databases worldwide.
Since meteorological conditions are the main factor driving the transport and dispersion of air pollutants, an accurate simulation of the meteorological field directly affects the accuracy of an atmospheric chemical transport model in simulating PM2.5. Based on the NASM joint chemical data assimilation system, the authors quantified the impacts of different meteorological fields on the pollutant simulations and revealed the role of meteorological conditions in the accumulation, maintenance, and dissipation of heavy haze pollution. For the two heavy pollution episodes from 10 to 24 November 2018, meteorological fields obtained from NCEP FNL and ERA5 reanalysis data were each used to drive the WRF model, to analyze the differences in the simulated PM2.5 concentration. The results show that the meteorological field has a strong influence on the concentration levels and spatial distribution of the simulated pollution. The ERA5 group had relatively small simulation errors and yielded more accurate PM2.5 simulation results: its RMSE was 11.86 μg m^(-3) lower than that of the FNL group before assimilation, and 5.77 μg m^(-3) lower after joint assimilation. The authors used the PM2.5 simulation results obtained with the ERA5 data to discuss the role of the wind field and circulation patterns in the pollution process, to analyze the correlation of wind speed, temperature, relative humidity, and boundary layer height with pollutant concentrations, and to further clarify the key formation mechanism of this pollution process.
The widespread use of rechargeable batteries in portable devices, electric vehicles, and energy storage systems has underscored the importance of accurately predicting their lifetimes. However, data scarcity often limits the accuracy of prediction models, a problem exacerbated by incomplete data arising from issues such as sensor failures. To address these challenges, we propose a novel approach that accommodates data insufficiency by extracting additional information from incomplete data samples, which are usually discarded in existing studies. To fully unleash the predictive power of incomplete data, we investigated the Multiple Imputation by Chained Equations (MICE) method, which diversifies the training data by exploring potential data patterns. The experimental results demonstrate that the proposed method significantly outperforms the baselines in most of the considered scenarios, reducing the prediction root mean square error (RMSE) by up to 18.9%. Furthermore, we observed that incorporating incomplete data benefits the explainability of the prediction model by facilitating feature selection.
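The chained-equation imputation at the heart of MICE can be sketched in its simplest form: initialize missing entries with a column mean, then alternate between fitting a regression on the completed data and refilling the missing entries with its predictions. The two-channel data, missingness rate, and linear model below are invented for illustration; real MICE cycles over many columns and draws multiple imputations.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two correlated "sensor" channels; 30% of y is missing, mimicking records
# broken by sensor failures. Data and model are illustrative only.
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.1, size=200)
mask = rng.random(200) < 0.3
y_obs = np.where(mask, np.nan, y)

# Chained-equation-style loop: mean-fill, then iterate fit-and-refill.
y_imp = np.where(mask, np.nanmean(y_obs), y_obs)
for _ in range(10):
    a, b = np.polyfit(x, y_imp, 1)      # regress y on x over completed data
    y_imp[mask] = a * x[mask] + b       # refill missing entries

rmse = float(np.sqrt(np.mean((y_imp[mask] - y[mask]) ** 2)))
```

The loop converges to the fit implied by the observed rows, so the imputed values end up close to the true ones (error near the noise level) instead of the biased mean fill.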
Data production elements are driving profound transformations in the real economy across production objects, methods, and tools, generating significant economic effects such as industrial structure upgrading. This paper aims to reveal the mechanism by which data elements affect the "three transformations" (high-end, intelligent, and green) of the manufacturing sector, theoretically elucidating the intrinsic mechanisms by which data elements influence these transformations. The study finds that data elements significantly enhance the high-end, intelligent, and green levels of China's manufacturing industry. In terms of impact pathways, data elements primarily influence the development of high-tech industries and overall green technological innovation, thereby affecting the high-end, intelligent, and green transformation of the industry.
In this study, we developed a high-resolution (3 arcsec, approximately 90 m) V_(S30) map and an associated open-access dataset for the 140 km × 200 km region affected by the January 2025 M6.8 Dingri, Xizang, China earthquake. This map provides significantly finer resolution than existing V_(S30) maps, which typically use a 30 arcsec grid. The V_(S30) values were estimated using the Cokriging-based V_(S30) proxy model (SCK model), which integrates V_(S30) measurements as primary constraints and uses topographic slope as a secondary parameter. The findings indicate that V_(S30) values range from 200 to 250 m/s in the sedimentary deposit areas near the earthquake's epicenter and from 400 to 600 m/s in the surrounding mountainous regions. This study showcases the capability of the SCK model to efficiently generate V_(S30) estimates across various spatial resolutions and demonstrates its effectiveness in producing reliable estimates in data-sparse regions.
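The slope-as-secondary-variable idea can be illustrated in its crudest form: bin topographic slope into terrain classes and assign a representative velocity per class. The bin edges and velocities below are hypothetical placeholders chosen only to echo the value ranges quoted in the abstract; the SCK model itself blends such a proxy with measured V_(S30) via cokriging, which this sketch does not attempt.

```python
# Hypothetical slope-to-VS30 lookup (slope in m/m): flat terrain maps to
# soft-sediment velocities, steep terrain to stiff-rock velocities.
SLOPE_BINS = [
    (0.002, 225.0),          # near-flat sedimentary deposits
    (0.02, 450.0),           # gentle foothill slopes
    (float("inf"), 600.0),   # steep mountainous terrain
]

def vs30_from_slope(slope_m_per_m):
    for upper_edge, vs30 in SLOPE_BINS:
        if slope_m_per_m < upper_edge:
            return vs30

grid = [0.0005, 0.01, 0.3]                  # toy slope values for three cells
vs30_map = [vs30_from_slope(s) for s in grid]
```

A geostatistical model then corrects this proxy surface wherever nearby measured V_(S30) values disagree with it.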
Previous studies aiming to accelerate data processing have focused on enhanced algorithms, using the graphics processing unit (GPU) to speed up programs, and thread-level parallelism. These methods overlook maximizing the utilization of existing central processing unit (CPU) resources and reducing human and computational time costs via process automation. Accordingly, this paper proposes a scheme, called SSM, that combines the "Srun job submission mode", the "Sbatch job submission mode", and a "Monitor function". The SSM scheme includes three main modules: data management, command management, and resource management. Its core innovations are command splitting and parallel execution. The results show that this method effectively improves CPU utilization and reduces the time required for data processing. In terms of CPU utilization, the average value for this scheme is 89%, whereas the average CPU utilizations of the "Srun job submission mode" and the "Sbatch job submission mode" are significantly lower, at 43% and 52%, respectively. In terms of data-processing time, SSM testing on Five-hundred-meter Aperture Spherical radio Telescope (FAST) data requires only 5.5 h, compared with 8 h in the "Srun job submission mode" and 14 h in the "Sbatch job submission mode". In addition, tests on the FAST and Parkes datasets demonstrate the universality of the SSM scheme, which can process data from different telescopes. The compatibility of the SSM scheme with pulsar searches is verified using 2 days of observational data from the globular cluster M2, with the scheme successfully discovering all published pulsars in M2.
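The core "command splitting and parallel execution" idea can be shown in miniature: split a batch of independent shell commands across a fixed worker pool so CPU resources stay busy. The `echo` commands stand in for per-file pipeline steps; the real SSM scheme dispatches Slurm jobs (srun/sbatch) and monitors them, which this sketch does not attempt.

```python
import concurrent.futures
import subprocess

# Hypothetical per-chunk processing commands (trivial echoes for the demo).
commands = [["echo", f"chunk-{i}"] for i in range(8)]

def run(cmd):
    # Run one command and return its captured stdout: one "task".
    return subprocess.run(cmd, capture_output=True, text=True).stdout.strip()

# Command splitting + parallel execution across 4 workers; map preserves
# the input order of `commands` in `results`.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run, commands))
```

On a cluster, the same pattern applies with the worker count matched to the allocated cores and a monitor thread polling job states.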
Lead (Pb) plays a significant role in the nuclear industry and is extensively used in radiation shielding, radiation protection, neutron moderation, radiation measurements, and various other critical functions. Consequently, the measurement and evaluation of Pb nuclear data are highly valued in nuclear research, emphasizing its crucial role in the field. Using the time-of-flight (ToF) method, the neutron leakage spectra from three ^(nat)Pb samples were measured at 60° and 120° at the neutronics integral experimental facility of the China Institute of Atomic Energy (CIAE). The ^(nat)Pb sample sizes were 30 cm × 30 cm × 5 cm, 30 cm × 30 cm × 10 cm, and 30 cm × 30 cm × 15 cm. Neutron sources were generated by a Cockcroft-Walton accelerator, producing approximately 14.5 MeV and 3.5 MeV neutrons through the T(d,n)^(4)He and D(d,n)^(3)He reactions, respectively. Leakage neutron spectra were also calculated with the Monte Carlo code MCNP-4C, using the nuclear data for Pb isotopes from four libraries individually: CENDL-3.2, JEFF-3.3, JENDL-5, and ENDF/B-VIII.0. By comparing the simulated and experimental results, improvements and deficiencies in the evaluated nuclear data of the Pb isotopes were analyzed. Most of the calculated results were consistent with the experimental results; however, a few regions did not fit well. In the (n,el) energy range, the simulated results from CENDL-3.2 were significantly overestimated; in the (n,inl)D and (n,inl)C energy regions, the results from CENDL-3.2 and ENDF/B-VIII.0 were significantly overestimated at 120°, and the results from JENDL-5 and JEFF-3.3 were underestimated at 60° in the (n,inl)D energy region. The calculated spectra were compared with the experimental spectra in terms of neutron spectrum shape and C/E values. The results indicate that the theoretical simulations, using the different data libraries, overestimated or underestimated the measured values in certain energy ranges. Secondary neutron energies and angular distributions in the data files are presented to explain these discrepancies.
Semantic communication (SemCom) aims to achieve high-fidelity information delivery with low communication consumption by guaranteeing only semantic accuracy. Nevertheless, semantic communication still suffers from unexpected channel volatility, so developing a re-transmission mechanism (e.g., hybrid automatic repeat request [HARQ]) becomes indispensable. In that regard, instead of discarding previously transmitted information, incremental knowledge-based HARQ (IK-HARQ) is deemed a more effective mechanism that can fully utilize the information semantics. However, considering the possible existence of semantic ambiguity in image transmission, a simple bit-level cyclic redundancy check (CRC) might compromise the performance of IK-HARQ. There thus emerges a strong incentive to revolutionize the CRC mechanism, so as to more effectively reap the benefits of both SemCom and HARQ. In this paper, built on top of Swin Transformer-based joint source-channel coding (JSCC) and IK-HARQ, we propose a semantic image transmission framework, SC-TDA-HARQ. In particular, unlike the conventional CRC, we introduce a topological data analysis (TDA)-based error detection method, which extracts the inner topological and geometric information of images, to capture semantic information and determine the necessity of re-transmission. Extensive numerical results validate the effectiveness and efficiency of the proposed SC-TDA-HARQ framework, especially under limited bandwidth, and manifest the superiority of the TDA-based error detection method in image transmission.
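A minimal flavor of topology-based error detection is checking whether a received image preserves a simple topological invariant of the source, here the 0-th Betti number (count of connected components) of a thresholded image, computed with union-find. This is a toy stand-in for the persistence-based summaries the framework actually uses, and the images are invented examples.

```python
# Count connected components (Betti-0) of a binary image via union-find.
def betti0(img):
    rows, cols = len(img), len(img[0])
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    for r in range(rows):
        for c in range(cols):
            if img[r][c]:
                parent[(r, c)] = (r, c)
    for r in range(rows):
        for c in range(cols):
            if img[r][c]:
                if r and img[r - 1][c]:
                    union((r, c), (r - 1, c))
                if c and img[r][c - 1]:
                    union((r, c), (r, c - 1))
    return len({find(k) for k in parent})

# Two separate blobs in the clean image; channel corruption that bridges
# them changes the component count, flagging a semantic-level error.
clean = [[1, 1, 0, 0],
         [1, 0, 0, 1],
         [0, 0, 1, 1]]
corrupted = [[1, 1, 1, 1],
             [1, 0, 0, 1],
             [0, 0, 1, 1]]
```

Comparing such topological signatures of transmitted and received content, rather than raw bits, is what lets the receiver tolerate bit noise that leaves the semantics intact while still triggering re-transmission on structural damage.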
Funding: Supported by the National Key Research & Development Program of China (No. 2024YFC3505800), the National Natural Science Foundation of China (Nos. 82474334, 82474335, and 72174132), the National Science Fund for Distinguished Young Scholars (No. 82225049), the Key Research & Development Projects of the Sichuan Provincial Department of Science and Technology (Nos. 2024YFFK0174 and 2024YFFK0152), the 1.3.5 Project for Disciplines of Excellence, West China Hospital, Sichuan University (Nos. ZYYC24010 and ZYGD23004), and the Special Fund for Traditional Chinese Medicine of the Sichuan Provincial Administration of Traditional Chinese Medicine (No. 2024zd023).
Abstract: Objectives: Electronic health records (EHRs) offer valuable real-world data (RWD) for Chinese medicine research. However, significant methodological challenges remain in developing integrative Chinese-Western medicine (ICWM) databases. This study aims to establish a best-practice methodological framework, referred to as BRIDGE, to guide the construction of ICWM databases using EHRs. Methods: We developed the methodological framework through a comprehensive process, including a systematic literature review, synthesis of empirical experiences, thematic expert discussions, and consultation with an external panel to reach consensus. Results: The BRIDGE framework outlines 6 core components for ICWM-EHR database development: overall design, database architecture, data extraction and linkage, data governance, data verification, and data quality evaluation. Key data elements include variables related to the study population, treatment or exposure, outcomes, and confounders. These databases support various research applications, particularly in evaluating the effectiveness and safety of integrative therapies. To demonstrate its practical value, we developed an ICWM-EHR database on women's reproductive lifespan, encompassing 2,064,482 patients. This database captures women's health conditions across the life course, from reproductive age to older adulthood. Conclusions: The BRIDGE methodological framework provides a standardized approach to building high-quality ICWM-EHR databases. It offers a unique opportunity to strengthen the methodological rigor and real-world relevance of Chinese medicine research in integrated healthcare settings.
Funding: Supported by the National Key Research and Development Plan (No. 2023YFA1606602).
Abstract: This article introduces the methodologies and instrumentation for data measurement and propagation at the Back-n white neutron facility of the China Spallation Neutron Source. The Back-n facility employs backscattering techniques to generate a broad spectrum of white neutrons. Equipped with advanced detectors such as the light particle detector array and the fission ionization chamber detector, the facility achieves high-precision data acquisition through a general-purpose electronics system. Data are managed and stored in a hierarchical system supported by the National High Energy Physics Science Data Center, ensuring long-term preservation and efficient access. The data from the Back-n experiments contribute significantly to nuclear physics, reactor design, astrophysics, and medical physics, enhancing the understanding of nuclear processes and supporting interdisciplinary research.
Funding: Project supported by the National Natural Science Foundation of China (No. 12402033) and the National Natural Science Foundation for Distinguished Young Scholars of China (No. 52225211).
Abstract: Many complex systems are frequently subject to the influence of uncertain disturbances, which can exert a profound effect on critical transitions (CTs), potentially resulting in catastrophic consequences. Consequently, it is of utmost importance to provide warnings for noise-induced CTs in various applications. Although capturing generic symptoms of transition behavior from observational and simulated data poses a challenging problem, this work attempts to extract information regarding CTs from simulated data of a Gaussian white noise-induced tri-stable system. Using the extended dynamic mode decomposition (EDMD) algorithm, we initially obtain finite-dimensional approximations of both the stochastic Koopman operator and its generator. Subsequently, the drift parameters and the noise intensity of the system are identified from the simulated data. Using the identified system, the parameter-dependent basin of the unsafe regime (PDBUR) is quantified, enabling data-driven early warning of Gaussian white noise-induced CTs. Finally, an error analysis is carried out to verify the effectiveness of the data-driven results. Our findings may serve as a paradigm for understanding and predicting noise-induced CTs in complex systems based on data.
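To illustrate the EDMD step this abstract describes, the following is a minimal sketch, not the authors' implementation: snapshot pairs from a simulated stochastic trajectory are lifted through a monomial dictionary, and a least-squares fit yields a finite-dimensional Koopman approximation whose generator estimate follows. The drift `x - x^3` is a simple bistable stand-in (the paper's system is tri-stable), and the dictionary choice and step size are illustrative assumptions.

```python
import numpy as np

def edmd(X, Y, dictionary):
    """Finite-dimensional Koopman approximation from snapshot pairs.

    X, Y : state arrays with Y[k] the one-step successor of X[k].
    dictionary : callable lifting states to observables, shape (n, d).
    Returns the (d, d) matrix K minimizing ||Psi(Y) - Psi(X) K||_F.
    """
    Psi_X = dictionary(X)
    Psi_Y = dictionary(Y)
    K, *_ = np.linalg.lstsq(Psi_X, Psi_Y, rcond=None)
    return K

def monomials(x, degree=5):
    # Dictionary 1, x, x^2, ..., x^degree for a scalar state.
    return np.vander(x, degree + 1, increasing=True)

# Euler-Maruyama simulation of dx = (x - x^3) dt + sigma dW
# (bistable stand-in for the paper's tri-stable system).
rng = np.random.default_rng(0)
dt, sigma, n = 0.01, 0.3, 20000
x = np.empty(n)
x[0] = 0.1
for k in range(n - 1):
    x[k + 1] = x[k] + (x[k] - x[k]**3) * dt \
        + sigma * np.sqrt(dt) * rng.standard_normal()

K = edmd(x[:-1], x[1:], monomials)
# Generator approximation L ~ (K - I) / dt; its slow eigenvalues flag
# metastable dynamics of the kind that precede critical transitions.
L = (K - np.eye(K.shape[0])) / dt
print(np.sort(np.linalg.eigvals(L).real)[-3:])
```

In practice the dictionary, trajectory length, and noise level would be tuned to the identified system before quantifying the unsafe-regime basin.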
Funding: Supported by the Key Program of the National Natural Science Foundation of China (Grant No. 41930650) and the Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 42301310).
Abstract: Sustained and spatially explicit monitoring of the United Nations 2030 Agenda for Sustainable Development is critical for effectively tracking progress toward the global Sustainable Development Goals (SDGs). Although land cover information has long been recognized as an essential component of SDG monitoring, a standardized scientific framework for identifying and prioritizing land cover-related essential variables does not exist. Therefore, we propose a novel expert- and data-driven framework for identifying, refining, and selecting a priority list of Essential Land cover-related Variables for SDGs (ELcV4SDGs). This framework integrates methods including expert knowledge-based analysis, clustering of variables with similar attributes, and quantified index calculation to establish the priority list. Applying the framework to 15 specific SDG indicators, we found that the ELcV4SDGs priority list comprises three main categories (type and structure, pattern and intensity, and process and evolution of land cover), which are further divided into 19 subcategories and ultimately encompass 50 general variables. The ELcV4SDGs will support detailed spatial monitoring and enhance scientific applications for SDG monitoring and assessment, thereby guiding future SDG priority actions and informing decision-making to advance the 2030 SDG agenda at local, national, and global levels.
Funding: Supported by the National Key R&D Program of China (No. 2023YFF1304002-05), the National Social Science Fund of China (No. 22BTJ005), and the National Natural Science Foundation of China (No. 32572049).
Abstract: Remote sensing plays a pivotal role in forest inventory by enabling efficient large-scale monitoring while minimizing fieldwork costs. However, missing values pose a critical challenge in remote sensing applications, as ignoring or mishandling such data gaps can introduce systematic bias into the estimation of target variables for natural resource monitoring. This can lead to cascading errors that propagate through forest and ecosystem management decisions, ultimately hindering progress toward sustainable forest management, biodiversity conservation, and climate change mitigation strategies. This study proposes and demonstrates a procedure that employs hybrid estimators to address the limitations of missing remotely sensed data in forest inventory, using Landsat 7 ETM+ SLC-off data as an archived source for forest resource monitoring as a case in point. We compared forest inventory estimates from the hybrid estimator with those from a conventional model-based (CMB) estimator using Sentinel-2 data without missing values. Monte Carlo simulations revealed three key findings: (1) the hybrid estimator, leveraging missing-data remote sensing represented by Landsat 7 ETM+ SLC-off data, achieved a sampling precision of over 90%, meeting China's national standard for the National Forest Inventory (NFI); (2) the hybrid estimator demonstrated efficiency comparable to the CMB estimator; (3) the uncertainty associated with hybrid estimators was primarily dominated by model parameter estimation, which could be effectively mitigated by slightly increasing the training sample size or refining the model specification. Overall, in forest inventory, the hybrid estimator can surmount the limitations posed by missing values in remotely sensed auxiliary data, effectively balancing cost-effectiveness and flexibility.
Funding: CAMS Innovation Fund for Medical Sciences (CIFMS): "Construction of an Intelligent Management and Efficient Utilization Technology System for Big Data in Population Health Science" (2021-I2M-1-057); Key Projects of the Innovation Fund of the National Clinical Research Center for Orthopedics and Sports Rehabilitation: "National Orthopedics and Sports Rehabilitation Real-World Research Platform System Construction" (23-NCRC-CXJJ-ZD4).
Abstract: Objective: Medical imaging data has great value, but it contains a significant amount of sensitive information about patients. At present, laws and regulations regarding the de-identification of medical imaging data are not clearly defined around the world. This study aims to develop a tool that meets compliance-driven desensitization requirements tailored to diverse research needs. Methods: To enhance the security of medical image data, we designed and implemented a DICOM-format medical image de-identification system on the Windows operating system. Results: Our custom de-identification system is adaptable to the legal standards of different countries and can accommodate specific research demands. The system offers both web-based online and desktop offline de-identification capabilities, enabling customization of de-identification rules and facilitating batch processing to improve efficiency. Conclusions: This medical image de-identification system robustly strengthens the stewardship of sensitive medical data, aligning with data security protection requirements while facilitating the sharing and utilization of medical image data. This approach unlocks the intrinsic value inherent in such datasets.
Funding: Supported by the National Key R&D Program of China (No. 2023YFB2703700), the National Natural Science Foundation of China (Nos. U21A20465, 62302457, 62402444, and 62172292), the Fundamental Research Funds of Zhejiang Sci-Tech University (Nos. 23222092-Y and 22222266-Y), the Program for Leading Innovative Research Team of Zhejiang Province (No. 2023R01001), the Zhejiang Provincial Natural Science Foundation of China (Nos. LQ24F020008 and LQ24F020012), the Foundation of the State Key Laboratory of Public Big Data (No. [2022]417), and the "Pioneer" and "Leading Goose" R&D Program of Zhejiang (No. 2023C01119).
Abstract: As smart grid technology rapidly advances, the vast amount of user data collected by smart meters presents significant challenges in data security and privacy protection. Current research emphasizes data security and user privacy concerns within smart grids. However, existing methods struggle with efficiency and security when processing large-scale data. Balancing efficient data processing with stringent privacy protection during data aggregation in smart grids remains an urgent challenge. This paper proposes an AI-based multi-type data aggregation method designed to enhance aggregation efficiency and security by standardizing and normalizing various data modalities. The approach optimizes data preprocessing, integrates Long Short-Term Memory (LSTM) networks for handling time-series data, and employs homomorphic encryption to safeguard user privacy. It also explores the application of Boneh-Lynn-Shacham (BLS) signatures for user authentication. The proposed scheme's efficiency, security, and privacy protection capabilities are validated through rigorous security proofs and experimental analysis.
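The privacy property this abstract relies on (the aggregator learns only the total, never an individual meter reading) can be sketched without reproducing the paper's homomorphic-encryption scheme. The minimal stand-in below uses additive masking, where per-meter random masks sum to zero modulo a prime; the field modulus and the sample readings are illustrative assumptions, not values from the paper.

```python
import secrets

PRIME = 2**61 - 1  # field modulus for masked modular arithmetic

def mask_readings(readings):
    """Each meter adds a random mask; the masks are constructed to sum
    to zero mod PRIME, so summing the masked values recovers only the
    total consumption, never any individual reading (a simplified
    stand-in for the homomorphic aggregation in the paper)."""
    n = len(readings)
    masks = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    masks.append((-sum(masks)) % PRIME)  # compensating mask
    return [(r + m) % PRIME for r, m in zip(readings, masks)]

def aggregate(masked):
    # The aggregator sums masked values; the masks cancel.
    return sum(masked) % PRIME

readings = [1520, 980, 2310, 1750]  # hypothetical watt-hour readings
masked = mask_readings(readings)
total = aggregate(masked)
print(total)  # 6560 == sum(readings); individual values stay hidden
```

A real deployment would additionally need dropout handling and authentication (the role the BLS signatures play in the proposed scheme).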
Abstract: The characteristic databases in China face issues such as narrow resource coverage, low levels of standardization and normalization, and limited data sharing. To address these challenges, this paper proposes the concept of a characteristic database alliance, using marine characteristic databases as a case for feasibility analysis and discussion. The paper outlines the development path for such alliances and offers recommendations for future growth, aiming to establish a collaborative platform for the development of characteristic databases.
Abstract: With the rise of data-intensive research, data literacy has become a critical capability for improving scientific data quality and achieving artificial intelligence (AI) readiness. In the biomedical domain, data are characterized by high complexity and privacy sensitivity, calling for robust and systematic data management skills. This paper reviews current trends in scientific data governance and the evolving policy landscape, highlighting persistent challenges such as inconsistent standards, semantic misalignment, and limited awareness of compliance. These issues are largely rooted in the lack of structured training and practical support for researchers. In response, this study builds on existing data literacy frameworks and integrates the specific demands of biomedical research to propose a comprehensive, lifecycle-oriented data literacy competency model with an emphasis on ethics and regulatory awareness. Furthermore, it outlines a tiered training strategy tailored to different research stages (undergraduate, graduate, and professional), offering theoretical foundations and practical pathways for universities and research institutions to advance data literacy education.
Abstract: Pelvic floor dysfunction (PFD), including conditions such as stress urinary incontinence, pelvic organ prolapse, and fecal incontinence, significantly affects women's quality of life and their physical and mental health. With the advancement of digital medicine, the systematic collection of data and the high-quality development of database platforms have increasingly become central pillars of PFD research and management. We systematically review the developmental stages of PFD-related databases. We then conduct a comparative analysis of representative international and domestic platforms, examining key aspects including organizational structures and construction models, data sources and integration strategies, core functionalities, data quality control and standardization, data security and access management, and research applications. Finally, based on the current status of PFD database development both globally and in China, we offer recommendations to strengthen data infrastructure and guide future directions. The findings may serve as a valuable reference for the optimization of PFD databases worldwide.
Funding: Supported by the Second Tibetan Plateau Scientific Expedition and Research Program of the Ministry of Science and Technology of the People's Republic of China [grant number 2022QZKK0101] and the Science and Technology Department of Tibet Program [grant number XZ202301ZY0035G].
Abstract: Since meteorological conditions are the main factor driving the transport and dispersion of air pollutants, an accurate simulation of the meteorological field directly affects the accuracy of an atmospheric chemical transport model in simulating PM_(2.5). Based on the NASM joint chemical data assimilation system, the authors quantified the impacts of different meteorological fields on the pollutant simulations and revealed the role of meteorological conditions in the accumulation, maintenance, and dissipation of heavy haze pollution. For the two heavy pollution processes from 10 to 24 November 2018, the meteorological fields were obtained using NCEP FNL and ERA5 reanalysis data, each used to drive the WRF model, to analyze the differences in the simulated PM_(2.5) concentration. The results show that the meteorological field has a strong influence on the concentration levels and spatial distribution of the pollution simulations. The ERA5 group had relatively small simulation errors and yielded more accurate PM_(2.5) simulation results: its RMSE was 11.86 μg m^(-3) lower than that of the FNL group before assimilation, and 5.77 μg m^(-3) lower after joint assimilation. The authors used the PM_(2.5) simulation results obtained with the ERA5 data to discuss the role of the wind field and circulation situation in the pollution process, to analyze the correlation of wind speed, temperature, relative humidity, and boundary layer height with pollutant concentrations, and to further clarify the key formation mechanism of this pollution process.
Abstract: The widespread usage of rechargeable batteries in portable devices, electric vehicles, and energy storage systems has underscored the importance of accurately predicting their lifetimes. However, data scarcity often limits the accuracy of prediction models, a problem exacerbated by data incompleteness arising from issues such as sensor failures. To address these challenges, we propose a novel approach that accommodates data insufficiency by extracting additional information from incomplete data samples, which are usually discarded in existing studies. To fully unleash the predictive power of incomplete data, we investigated the Multiple Imputation by Chained Equations (MICE) method, which diversifies the training data by exploring potential data patterns. The experimental results demonstrate that the proposed method significantly outperforms the baselines in most of the considered scenarios, reducing the prediction root mean square error (RMSE) by up to 18.9%. Furthermore, we observed that incorporating incomplete data benefits the explainability of the prediction model by facilitating feature selection.
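The MICE idea named in this abstract can be sketched in a few lines: cycle over columns containing missing values, regress each on the remaining columns, and refresh the missing entries from the fit, adding residual-scale noise so repeated runs yield multiple plausible imputations. This is a minimal illustration under assumed data, not the authors' pipeline; the toy battery table and column choices are hypothetical.

```python
import numpy as np

def mice_impute(X, n_iter=10, rng=None):
    """Minimal MICE-style chained-equations imputation with
    ordinary-least-squares column models and Gaussian noise."""
    rng = np.random.default_rng(rng)
    X = X.astype(float).copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.nonzero(miss)[1])  # initial mean fill
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(len(X)), others])  # intercept + covariates
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            pred = A[~obs] @ beta
            resid_sd = np.std(X[obs, j] - A[obs] @ beta)
            X[~obs, j] = pred + rng.normal(0.0, resid_sd, pred.shape)
    return X

# Toy table: (cycle count, capacity, internal resistance) with
# sensor dropouts marked as NaN -- hypothetical values.
data = np.array([
    [100, 0.98, 1.01],
    [200, 0.95, np.nan],
    [300, np.nan, 1.08],
    [400, 0.88, 1.12],
    [500, 0.84, np.nan],
])
completed = mice_impute(data, rng=0)
print(completed)
```

Running the imputer with different seeds produces the multiple diversified training sets that the abstract credits with improving prediction robustness.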
Abstract: Data production elements are driving profound transformations in the real economy across production objects, methods, and tools, generating significant economic effects such as industrial structure upgrading. This paper aims to reveal the impact mechanism of data elements on the "three transformations" (high-end, intelligent, and green) of the manufacturing sector, theoretically elucidating the intrinsic mechanisms by which data elements influence these transformations. The study finds that data elements significantly enhance the high-end, intelligent, and green levels of China's manufacturing industry. In terms of impact pathways, data elements primarily influence the development of high-tech industries and overall green technological innovation, thereby affecting the high-end, intelligent, and green transformation of the industry.
Funding: Supported by the National Natural Science Foundation of China (No. 42120104002).
Abstract: In this study, we developed a high-resolution (3 arcsec, approximately 90 m) V_(S30) map and an associated open-access dataset for the 140 km × 200 km region affected by the January 2025 M6.8 Dingri, Xizang, China earthquake. This map provides significantly finer resolution than existing V_(S30) maps, which typically use a 30 arcsec grid. The V_(S30) values were estimated using the Cokriging-based V_(S30) proxy model (SCK model), which integrates V_(S30) measurements as primary constraints and topographic slope as a secondary parameter. The findings indicate that the V_(S30) values range from 200 to 250 m/s in the sedimentary deposit areas near the earthquake's epicenter and from 400 to 600 m/s in the surrounding mountainous regions. This study showcases the capability of the SCK model to efficiently generate V_(S30) estimates across various spatial resolutions and demonstrates its effectiveness in producing reliable estimates in data-sparse regions.
Funding: Supported by the National Natural Science Foundation of China (12363010), the Guizhou Provincial Basic Research Program (Natural Science) (ZK[2023]039), and the Key Technology R&D Program ([2023]352).
Abstract: Previous studies aiming to accelerate data processing have focused on enhancement algorithms, using the graphics processing unit (GPU) to speed up programs, and thread-level parallelism. These methods overlook maximizing the utilization of existing central processing unit (CPU) resources and reducing human and computational time costs via process automation. Accordingly, this paper proposes a scheme, called SSM, that combines the "Srun job submission mode", the "Sbatch job submission mode", and a "Monitor function". The SSM scheme includes three main modules: data management, command management, and resource management. Its core innovations are command splitting and parallel execution. The results show that this method effectively improves CPU utilization and reduces the time required for data processing. In terms of CPU utilization, the average value for this scheme is 89%. In contrast, the average CPU utilizations of the "Srun job submission mode" and the "Sbatch job submission mode" are significantly lower, at 43% and 52%, respectively. In terms of data-processing time, SSM testing on Five-hundred-meter Aperture Spherical radio Telescope (FAST) data requires only 5.5 h, compared with 8 h for the "Srun job submission mode" and 14 h for the "Sbatch job submission mode". In addition, tests on the FAST and Parkes datasets demonstrate the universality of the SSM scheme, which can process data from different telescopes. The compatibility of the SSM scheme with pulsar searches is verified using 2 days of observational data from the globular cluster M2, with the scheme successfully discovering all published pulsars in M2.
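The "command splitting and parallel execution" idea at the core of SSM can be sketched generically: split a long job list into chunks and run the chunks concurrently so CPU workers stay busy. This is an illustrative stand-in, not the SSM implementation; the `echo` commands are hypothetical placeholders for per-file processing commands, and on a SLURM cluster each chunk would be wrapped in `srun` or `sbatch` instead of run locally.

```python
import concurrent.futures
import subprocess

def run_chunk(commands):
    """Run one chunk of shell commands sequentially, returning exit codes."""
    return [subprocess.run(c, shell=True).returncode for c in commands]

def split_and_run(commands, n_workers=4):
    """Command splitting and parallel execution: the job list is split
    into n_workers chunks that execute concurrently, instead of
    serializing the whole list on a single worker."""
    chunks = [commands[i::n_workers] for i in range(n_workers)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_workers) as ex:
        results = list(ex.map(run_chunk, chunks))
    return [rc for chunk in results for rc in chunk]  # flatten exit codes

# Hypothetical stand-ins for per-file pulsar-search commands.
jobs = [f"echo processing file_{i}.fits" for i in range(8)]
codes = split_and_run(jobs, n_workers=4)
print(codes)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

A monitor component, as in SSM's "Monitor function", would additionally poll worker status and resubmit failed chunks.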
Funding: Supported by the National Natural Science Foundation of China (Nos. 11775311 and U2067205), the Stable Support Basic Research Program Grant (BJ010261223282), and the Research and Development Project of the China National Nuclear Corporation.
Abstract: Lead (Pb) plays a significant role in the nuclear industry and is extensively used in radiation shielding, radiation protection, neutron moderation, radiation measurements, and various other critical functions. Consequently, the measurement and evaluation of Pb nuclear data are highly valued in nuclear scientific research, emphasizing its crucial role in the field. Using the time-of-flight (ToF) method, the neutron leakage spectra from three ^(nat)Pb samples were measured at 60° and 120° at the neutronics integral experimental facility of the China Institute of Atomic Energy (CIAE). The ^(nat)Pb sample sizes were 30 cm × 30 cm × 5 cm, 30 cm × 30 cm × 10 cm, and 30 cm × 30 cm × 15 cm. Neutron sources were generated by a Cockcroft-Walton accelerator, producing approximately 14.5 MeV and 3.5 MeV neutrons through the T(d,n)^(4)He and D(d,n)^(3)He reactions, respectively. Leakage neutron spectra were also calculated with the Monte Carlo code MCNP-4C, using the nuclear data for Pb isotopes from four libraries individually: CENDL-3.2, JEFF-3.3, JENDL-5, and ENDF/B-VIII.0. By comparing the simulation and experimental results, improvements and deficiencies in the evaluated nuclear data of the Pb isotopes were analyzed. Most of the calculated results were consistent with the experimental results; however, a few areas did not fit well. In the (n,el) energy range, the simulated results from CENDL-3.2 were significantly overestimated; in the (n,inl)D and (n,inl)C energy regions, the results from CENDL-3.2 and ENDF/B-VIII.0 were significantly overestimated at 120°, and the results from JENDL-5 and JEFF-3.3 were underestimated at 60° in the (n,inl)D energy region. The calculated spectra were analyzed by comparing them with the experimental spectra in terms of neutron spectrum shape and C/E values. The results indicate that the theoretical simulations, using different data libraries, overestimated or underestimated the measured values in certain energy ranges. The secondary neutron energies and angular distributions in the data files are presented to explain these discrepancies.
Funding: Supported in part by the National Key Research and Development Program of China under Grant 2024YFE0200600, in part by the National Natural Science Foundation of China under Grant 62071425, in part by the Zhejiang Key Research and Development Plan under Grant 2022C01093, in part by the Zhejiang Provincial Natural Science Foundation of China under Grant LR23F010005, in part by the National Key Laboratory of Wireless Communications Foundation under Grant 2023KP01601, and in part by the Big Data and Intelligent Computing Key Lab of CQUPT under Grant BDIC-2023-B-001.
Abstract: Semantic communication (SemCom) aims to achieve high-fidelity information delivery with low communication consumption by guaranteeing only semantic accuracy. Nevertheless, semantic communication still suffers from unexpected channel volatility, so developing a re-transmission mechanism (e.g., hybrid automatic repeat request [HARQ]) becomes indispensable. In that regard, instead of discarding previously transmitted information, incremental knowledge-based HARQ (IK-HARQ) is deemed a more effective mechanism that can fully utilize the information semantics. However, considering the possible existence of semantic ambiguity in image transmission, a simple bit-level cyclic redundancy check (CRC) might compromise the performance of IK-HARQ. Therefore, there emerges a strong incentive to revolutionize the CRC mechanism, thereby more effectively reaping the benefits of both SemCom and HARQ. In this paper, built on top of Swin Transformer-based joint source-channel coding (JSCC) and IK-HARQ, we propose a semantic image transmission framework, SC-TDA-HARQ. In particular, unlike the conventional CRC, we introduce a topological data analysis (TDA)-based error detection method, which capably extracts the inner topological and geometric information of images, to capture semantic information and determine the necessity of re-transmission. Extensive numerical results validate the effectiveness and efficiency of the proposed SC-TDA-HARQ framework, especially under limited bandwidth conditions, and manifest the superiority of the TDA-based error detection method in image transmission.
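The contrast between a bit-level CRC and a topology-aware check can be illustrated with a much cruder summary than the persistent homology the paper uses: counting connected components (Betti-0) of a thresholded image. A mild-noise corruption leaves the component count unchanged (no re-transmission), whereas a CRC would fire on any flipped bit. Everything below (the union-find component counter, threshold, and toy images) is an illustrative stand-in, not the SC-TDA-HARQ method.

```python
import numpy as np

def betti0(img):
    """Number of 4-connected foreground components (Betti-0) of a
    binary image, via union-find with path compression."""
    h, w = img.shape
    parent = list(range(h * w))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    for i in range(h):
        for j in range(w):
            if not img[i, j]:
                continue
            if i > 0 and img[i - 1, j]:
                union(i * w + j, (i - 1) * w + j)
            if j > 0 and img[i, j - 1]:
                union(i * w + j, i * w + j - 1)
    return len({find(i * w + j)
                for i in range(h) for j in range(w) if img[i, j]})

def semantic_check(sent, received, threshold=0.5, tol=0):
    """Accept if the component counts of the thresholded images agree;
    a changed count suggests the object structure was corrupted."""
    return abs(betti0(sent > threshold) - betti0(received > threshold)) <= tol

# Two bright blobs survive mild additive noise: the topological
# summary is unchanged, so no re-transmission is requested.
img = np.zeros((16, 16))
img[2:6, 2:6] = 1.0
img[9:13, 9:13] = 1.0
noise = 0.05 * np.random.default_rng(1).standard_normal(img.shape)
noisy = np.clip(img + noise, 0.0, 1.0)
print(semantic_check(img, noisy))  # True
```

Persistent homology would replace the single component count with birth-death diagrams across all thresholds, making the check far more discriminative while keeping this same accept/re-transmit decision structure.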