Funding: Supported by the National Key Research and Development Program of China (Nos. 2022YFA1602404 and 2023YFA1606901), the National Natural Science Foundation of China (Nos. 12275338, 12388102, and U2441221), and the Key Laboratory of Nuclear Data Foundation (JCKY2022201C152).
Abstract: Photonuclear data are increasingly used in fundamental nuclear research and technological applications. These data are generated using advanced γ-ray sources. The Shanghai Laser Electron Gamma Source (SLEGS) is a new laser Compton scattering γ-ray source at the Shanghai Synchrotron Radiation Facility. It delivers energy-tunable, quasi-monoenergetic gamma beams for high-precision photonuclear measurements. This paper presents the flat-efficiency detector (FED) array at SLEGS and its application in photoneutron cross-section measurements. Systematic uncertainties of the FED array were determined to be 3.02% through calibration with a ^(252)Cf neutron source. Using ^(197)Au and ^(159)Tb as representative nuclei, we demonstrate the format and processing methodology for raw photoneutron data. The results validate SLEGS' capability for high-precision photoneutron measurements.
Funding: Funded by the Researchers Supporting Project number (RSPD2025R857), King Saud University, Riyadh, Saudi Arabia.
Abstract: Many bioinformatics applications require determining the class of a newly sequenced deoxyribonucleic acid (DNA) sequence, making DNA sequence classification an integral step in bioinformatics analysis, where large biomedical datasets are transformed into valuable knowledge. Existing methods rely on a feature extraction step and suffer from high computational time requirements. In contrast, newer approaches leveraging deep learning have shown significant promise in enhancing accuracy and efficiency. In this paper, we investigate the performance of various deep learning architectures for DNA sequence classification: Convolutional Neural Network (CNN), CNN-Long Short-Term Memory (CNN-LSTM), CNN-Bidirectional Long Short-Term Memory (CNN-BiLSTM), Residual Network (ResNet), and InceptionV3. Various numerical and visual data representation techniques are utilized to represent the input datasets, including label encoding, k-mer sentence encoding, k-mer one-hot vectors, Frequency Chaos Game Representation (FCGR), and the 5-Color Map (ColorSquare). Three datasets are used for training the models: H3, H4, and the DNA Sequence Dataset (Yeast, Human, Arabidopsis thaliana). Experiments are performed to determine which combination of DNA representation and deep learning architecture yields the best classification performance. Our results indicate that a hybrid CNN-LSTM neural network trained on DNA sequences represented as one-hot encoded k-mer sequences yields the best performance, achieving an accuracy of 92.1%.
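A minimal sketch of the winning representation-model pairing, assuming TensorFlow/Keras and a small classification task; the k-mer length, layer sizes, and training settings below are illustrative choices, not the authors' configuration:

import numpy as np
import tensorflow as tf
from itertools import product

K = 3  # k-mer length (illustrative)
KMERS = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}

def kmer_one_hot(seq, k=K):
    """Encode a DNA string as a (len(seq)-k+1, 4**k) one-hot k-mer matrix."""
    mat = np.zeros((len(seq) - k + 1, 4 ** k), dtype=np.float32)
    for i in range(len(seq) - k + 1):
        idx = KMERS.get(seq[i:i + k].upper())
        if idx is not None:          # skip k-mers containing N or other symbols
            mat[i, idx] = 1.0
    return mat

def build_cnn_lstm(seq_len, num_classes=2):
    """Hybrid CNN-LSTM: convolutional motif filters followed by sequential modelling."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(seq_len - K + 1, 4 ** K)),
        tf.keras.layers.Conv1D(64, 7, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage sketch: X has shape (n_samples, L-K+1, 4**K), y holds integer class labels.
# model = build_cnn_lstm(seq_len=500)
# model.fit(X, y, epochs=10, batch_size=32, validation_split=0.1)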
Abstract: Parametric survival models are essential for analyzing time-to-event data in fields such as engineering and biomedicine. While the log-logistic distribution is popular for its simplicity and closed-form expressions, it often lacks the flexibility needed to capture complex hazard patterns. In this article, we propose a novel extension of the classical log-logistic distribution, termed the new exponential log-logistic (NExLL) distribution, designed to provide enhanced flexibility in modeling time-to-event data with complex failure behaviors. The NExLL model incorporates a new exponential generator to expand the shape adaptability of the baseline log-logistic distribution, allowing it to capture a wide range of hazard rate shapes, including increasing, decreasing, J-shaped, reversed J-shaped, modified bathtub, and unimodal forms. A key feature of the NExLL distribution is its formulation as a mixture of log-logistic densities, offering both symmetric and asymmetric patterns suitable for diverse real-world reliability scenarios. We establish several theoretical properties of the model, including closed-form expressions for its probability density function, cumulative distribution function, moments, hazard rate function, and quantiles. Parameter estimation is performed using seven classical estimation techniques, with extensive Monte Carlo simulations used to evaluate and compare their performance under various conditions. The practical utility and flexibility of the proposed model are illustrated using two real-world datasets from reliability and engineering applications, where the NExLL model demonstrates superior fit and predictive performance compared to existing log-logistic-based models. This contribution advances the toolbox of parametric survival models, offering a robust alternative for modeling complex aging and failure patterns in reliability, engineering, and other applied domains.
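For reference, the baseline log-logistic distribution that the NExLL generator extends has the following standard closed forms, written here in LaTeX with an assumed scale \alpha > 0 and shape \beta > 0 (the NExLL generator itself is not reproduced):

F(x) = \frac{(x/\alpha)^{\beta}}{1 + (x/\alpha)^{\beta}}, \qquad
f(x) = \frac{(\beta/\alpha)\,(x/\alpha)^{\beta-1}}{\bigl[1 + (x/\alpha)^{\beta}\bigr]^{2}}, \qquad
h(x) = \frac{f(x)}{1 - F(x)} = \frac{(\beta/\alpha)\,(x/\alpha)^{\beta-1}}{1 + (x/\alpha)^{\beta}}, \qquad
Q(p) = \alpha\left(\frac{p}{1-p}\right)^{1/\beta}, \qquad x > 0,\ 0 < p < 1.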
Funding: Funded by the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R408), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Encryption, a basic procedure for transforming readable data into encoded forms, ensures security when the right decryption keys are used. Hadoop is susceptible to cyber-attacks because it lacks built-in security measures, even though it can effectively handle and store enormous datasets using the Hadoop Distributed File System (HDFS). The increasing number of data breaches emphasizes how urgently creative encryption techniques are needed in cloud-based big data settings. This paper presents Adaptive Attribute-Based Honey Encryption (AABHE), a state-of-the-art technique that combines honey encryption with Ciphertext-Policy Attribute-Based Encryption (CP-ABE) to provide improved data security. Even if data are intercepted, AABHE ensures that sensitive information cannot be accessed by unauthorized parties. With a focus on protecting huge files in HDFS, the proposed approach achieves 98% security robustness and 95% encryption efficiency, outperforming other encryption methods including CP-ABE, Key-Policy Attribute-Based Encryption (KP-ABE), and the Advanced Encryption Standard combined with Attribute-Based Encryption (AES+ABE). By fixing Hadoop's security flaws, AABHE strengthens its protection against data breaches and enhances Hadoop's dependability as a platform for processing and storing massive amounts of data.
Abstract: With the rapid growth of biomedical data, particularly multi-omics data including genomics, transcriptomics, proteomics, metabolomics, and epigenomics, medical research and clinical decision-making confront both new opportunities and obstacles. The huge and diversified nature of these datasets cannot always be managed using traditional data analysis methods. As a consequence, deep learning has emerged as a strong tool for analysing multi-omics data due to its ability to handle complex and non-linear relationships. This paper explores the fundamental concepts of deep learning and how they are used in multi-omics medical data mining. We demonstrate how autoencoders, variational autoencoders, multimodal models, attention mechanisms, transformers, and graph neural networks enable pattern analysis and recognition across omics data. Deep learning has been found to be effective in illness classification, biomarker identification, gene network learning, and therapeutic efficacy prediction. We also consider critical problems such as data quality, model explainability, reproducibility of findings, and computational power requirements. We then consider future directions of combining omics with clinical and imaging data, explainable AI, federated learning, and real-time diagnostics. Overall, this study emphasises the need for collaboration across disciplines to advance deep learning-based multi-omics research for precision medicine and the understanding of complicated disorders.
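As an illustration of one of the listed building blocks, a minimal sketch, assuming TensorFlow/Keras, of an autoencoder that learns a low-dimensional representation of concatenated, standardized omics features; the input dimension, latent size, and the concatenation strategy are illustrative assumptions, not a method described in the paper:

import tensorflow as tf

def build_autoencoder(input_dim, latent_dim=32):
    """Symmetric dense autoencoder for a concatenated multi-omics feature vector."""
    inputs = tf.keras.Input(shape=(input_dim,))
    h = tf.keras.layers.Dense(256, activation="relu")(inputs)
    z = tf.keras.layers.Dense(latent_dim, activation="relu", name="latent")(h)
    h = tf.keras.layers.Dense(256, activation="relu")(z)
    outputs = tf.keras.layers.Dense(input_dim, activation="linear")(h)
    autoencoder = tf.keras.Model(inputs, outputs)
    encoder = tf.keras.Model(inputs, z)   # reused later for clustering or classification
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder

# Usage sketch: X is an (n_samples, input_dim) matrix of standardized omics features,
# e.g., expression, methylation, and protein-abundance columns stacked side by side.
# ae, enc = build_autoencoder(X.shape[1])
# ae.fit(X, X, epochs=50, batch_size=64, validation_split=0.1)
# embedding = enc.predict(X)   # low-dimensional representation for downstream tasks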
Funding: Supported by the Xuhui District Health Commission, No. SHXH202214.
Abstract: Gastrointestinal tumors require personalized treatment strategies due to their heterogeneity and complexity. Multimodal artificial intelligence (AI) addresses this challenge by integrating diverse data sources, including computed tomography (CT), magnetic resonance imaging (MRI), endoscopic imaging, and genomic profiles, to enable intelligent decision-making for individualized therapy. This approach leverages AI algorithms to fuse imaging, endoscopic, and omics data, facilitating comprehensive characterization of tumor biology, prediction of treatment response, and optimization of therapeutic strategies. By combining CT and MRI for structural assessment, endoscopic data for real-time visual inspection, and genomic information for molecular profiling, multimodal AI enhances the accuracy of patient stratification and treatment personalization. The clinical implementation of this technology demonstrates potential for improving patient outcomes, advancing precision oncology, and supporting individualized care in gastrointestinal cancers. Ultimately, multimodal AI serves as a transformative tool in oncology, bridging data integration with clinical application to effectively tailor therapies.
Abstract: High-throughput transcriptomics has evolved from bulk RNA-seq to single-cell and spatial profiling, yet its clinical translation still depends on effective integration across diverse omics and data modalities. Emerging foundation models and multimodal learning frameworks are enabling scalable and transferable representations of cellular states, while advances in interpretability and real-world data integration are bridging the gap between discovery and clinical application. This paper outlines a concise roadmap for AI-driven, transcriptome-centered multi-omics integration in precision medicine (Figure 1).
Funding: Financially supported by the National 863 Program (Grant No. 2006AA09A102-09) and the National Science and Technology Major Projects (Grant No. 2008ZX05025-001-001).
Abstract: Irregular seismic data cause problems with multi-trace processing algorithms and degrade processing quality. We introduce the Projection onto Convex Sets (POCS) based image restoration method into the field of seismic data reconstruction to interpolate irregularly missing traces. For entirely dead traces, we transfer the POCS iterative reconstruction process from the time domain to the frequency domain to save computational cost, because forward and inverse Fourier time transforms are not needed. In each iteration, the selection threshold parameter is important for reconstruction efficiency. In this paper, we design two types of threshold models to reconstruct irregularly missing seismic data. The experimental results show that an exponential threshold can greatly reduce the number of iterations and improve reconstruction efficiency compared to a linear threshold for the same reconstruction result. We also analyze the anti-noise and anti-alias ability of the POCS reconstruction method. Finally, theoretical model tests and real data examples indicate that the proposed method is efficient and applicable.
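A minimal sketch, in NumPy, of a POCS iteration that thresholds in a 2-D Fourier domain with an exponentially decaying threshold and re-inserts the observed traces each pass; the sparsifying transform, the threshold schedule, and all parameter values are illustrative assumptions rather than the authors' exact scheme:

import numpy as np

def pocs_reconstruct(d_obs, mask, n_iter=50, tau_max=0.9, tau_min=0.01):
    """POCS interpolation of missing traces.

    d_obs : 2-D array (time samples x traces) with zeros at dead traces
    mask  : array broadcastable to d_obs, 1 where a trace was recorded, 0 where missing
    """
    d_rec = d_obs.copy()
    for k in range(n_iter):
        # exponentially decaying threshold, as a fraction of the largest spectral amplitude
        tau = tau_max * (tau_min / tau_max) ** (k / max(n_iter - 1, 1))
        D = np.fft.fft2(d_rec)
        amp = np.abs(D)
        D[amp < tau * amp.max()] = 0.0               # keep only the strong coefficients
        d_thr = np.real(np.fft.ifft2(D))
        d_rec = d_obs * mask + d_thr * (1.0 - mask)  # re-insert the observed samples
    return d_rec

# Usage sketch:
# mask = (np.abs(data).sum(axis=0) > 0).astype(float)[np.newaxis, :]  # 1 for live traces
# reconstructed = pocs_reconstruct(data * mask, mask)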
Abstract: JCOMM has a strategy to establish the network of WMO-IOC Centres for Marine-Meteorological and Oceanographic Climate Data (CMOCs) under the new Marine Climate Data System (MCDS) in 2012, to improve the quality and timeliness of the marine-meteorological and oceanographic data, metadata, and products available to end users. China, as a candidate to host CMOC China, was approved to run it on a trial basis after the 4th Meeting of the Joint IOC/WMO Technical Commission for Oceanography and Marine Meteorology (JCOMM). This article describes the development plans for CMOC China over the next few years through a brief introduction to its critical marine data, products, and service system, and to cooperation projects around the world.
Abstract: The absence of low-frequency information in seismic data is one of the most difficult problems in elastic full waveform inversion. Without low-frequency data, it is difficult to recover the long-wavelength components of subsurface models, and the inversion converges to local minima. To solve this problem, the elastic envelope inversion method is introduced. Based on the elastic envelope operator, which is capable of retrieving low-frequency signals hidden in multicomponent data, the proposed method uses the envelope of multicomponent seismic signals to construct a misfit function and then recovers the long-wavelength components of the subsurface model. Numerical tests verify that the elastic envelope method reduces the inversion nonlinearity and provides better starting models for the subsequent conventional elastic full waveform inversion and elastic depth migration, even when low frequencies are missing in the multicomponent data and the starting model is far from the true model. Numerical tests also suggest that the proposed method is more effective in reconstructing the long-wavelength components of the S-wave velocity model. The inversion of synthetic data based on the Marmousi-2 model shows that the resolution of conventional elastic full waveform inversion improves after using the starting model obtained with the elastic envelope method. Finally, the limitations of the elastic envelope inversion method are discussed.
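A minimal sketch, using NumPy/SciPy, of how a seismic envelope can be computed with the Hilbert transform and combined into a least-squares envelope misfit; this textbook form of the misfit is an assumption about how the paper's objective function is built, not the authors' exact formulation:

import numpy as np
from scipy.signal import hilbert

def envelope(trace):
    """Instantaneous envelope e(t) = |trace + i*H{trace}| via the Hilbert transform."""
    return np.abs(hilbert(trace))

def envelope_misfit(synthetic, observed):
    """Least-squares misfit between envelopes, summed over components and traces.

    synthetic, observed : arrays of shape (n_components, n_traces, n_samples)
    """
    e_syn = np.abs(hilbert(synthetic, axis=-1))
    e_obs = np.abs(hilbert(observed, axis=-1))
    return 0.5 * np.sum((e_syn - e_obs) ** 2)

# The envelope varies much more slowly than the waveform itself, which is why it
# retains long-wavelength information even when the recorded wavelet contains no
# low frequencies.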
Funding: The National Natural Science Foundation of China (No. 60673060) and the Natural Science Foundation of Jiangsu Province (No. BK2005047).
Abstract: A new algorithm for clustering multiple data streams is proposed. The algorithm can effectively cluster data streams that show similar behavior with some unknown time delays. It uses the autoregressive (AR) modeling technique to measure correlations between data streams and exploits the estimated frequency spectra to extract the essential features of the streams. Each stream is represented as the sum of spectral components, and the correlation is measured component-wise. Each spectral component is described by four parameters, namely amplitude, phase, damping rate, and frequency. The ε-lag correlation between two spectral components is calculated, and the algorithm uses this information as the similarity measure in clustering data streams. Based on a sliding window model, the algorithm can continuously report the most recent clustering results and adjust the number of clusters. Experiments on real and synthetic streams show that the proposed clustering method has higher speed and clustering quality than other similar methods.
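A minimal sketch, in NumPy, of extracting per-component frequency and damping rate from an AR model of one stream window; the least-squares AR fit and the pole-to-parameter conversion follow standard signal-processing practice and are assumptions about the implementation, and the ε-lag correlation and clustering steps are not reproduced:

import numpy as np

def fit_ar(x, order):
    """Least-squares fit of AR coefficients a[1..p] in x[n] = sum_k a[k] * x[n-k]."""
    X = np.column_stack([x[order - k - 1:len(x) - k - 1] for k in range(order)])
    y = x[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def spectral_components(x, order, dt=1.0):
    """Frequency (Hz) and damping rate of each AR pole for a window of one stream."""
    a = fit_ar(np.asarray(x, dtype=float), order)
    poles = np.roots(np.concatenate(([1.0], -a)))   # roots of z^p - a1*z^(p-1) - ... - ap
    freqs = np.angle(poles) / (2.0 * np.pi * dt)
    damping = np.log(np.abs(poles)) / dt            # negative values mean decaying components
    return freqs, damping

# Usage sketch:
# f, d = spectral_components(window, order=8, dt=0.1)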
Funding: This research is sponsored by the National Natural Science Foundation of China (No. 40272041) and the Innovative Foundation of CNPC (No. 04E702).
Abstract: In multi-component seismic exploration, the horizontal and vertical components both contain P- and SV-waves. The P- and SV-wavefields in a seismic record can be separated from their horizontal and vertical displacements when upgoing P- and SV-waves arrive at the sea floor. If the sea floor P-wave velocity, S-wave velocity, and density are known, the separation can be achieved in the τ-p domain, and the separated wavefields are then transformed back to the time domain. A method for separating P- and SV-wavefields is presented in this paper and used to effectively separate P- and SV-wavefields in synthetic and real data. The application to real data shows that this method is feasible and effective. It can also be used for free-surface data.
Funding: The Ministry of Science and Technology of China (Nos. 2002CB714001 and 2001CCB00200) and the Youth Fund of the State Oceanic Administration (No. 2004203).
Abstract: A large number of autonomous profiling floats deployed in the global oceans have provided abundant temperature and salinity profiles of the upper ocean. Many floats happen to record profiles during the passage of tropical cyclones. These in-situ observations are valuable and useful for studying the ocean's response to tropical cyclones, which is rarely observed due to harsh weather conditions. In this paper, the upper-ocean response to tropical cyclones in the northwestern Pacific during 2000–2005 is analyzed and discussed based on data from Argo profiling floats. The results suggest that the passage of tropical cyclones caused a deepening of the mixed layer depth (MLD), cooling of the mixed layer temperature (MLT), and freshening of the mixed layer salinity (MLS). The change in MLT is negatively correlated with wind speed. The cooling of the MLT extended for 50–150 km on the right side of the cyclone track. The change in MLS is almost symmetrically distributed on both sides of the track, and the change in MLD is negatively correlated with the pre-cyclone initial MLD.
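A minimal sketch, in NumPy, of estimating mixed layer depth from a single temperature profile; the 0.2 °C threshold relative to a 10 m reference depth is one commonly used criterion and is an assumption here, not necessarily the definition used in the paper:

import numpy as np

def mixed_layer_depth(depth, temperature, ref_depth=10.0, dt_threshold=0.2):
    """Depth where temperature first drops dt_threshold below its value at ref_depth.

    depth, temperature : 1-D arrays from a single profile (depth increasing, m / degC)
    """
    depth = np.asarray(depth, dtype=float)
    temperature = np.asarray(temperature, dtype=float)
    t_ref = np.interp(ref_depth, depth, temperature)   # temperature at the reference depth
    below = depth >= ref_depth
    idx = np.where(temperature[below] <= t_ref - dt_threshold)[0]
    if idx.size == 0:
        return np.nan                                   # criterion never met in this profile
    return depth[below][idx[0]]

# Usage sketch:
# mld_before = mixed_layer_depth(z, T_pre_cyclone)
# mld_after  = mixed_layer_depth(z, T_post_cyclone)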
Funding: Supported by the Scientific Research Special Project of the TCM Profession (200907001E) and the Science and Technology Special Major Project for "Significant New Drugs Formulation" (2009ZX09301-005-02).
Abstract: Objective: To analyze the composition patterns of Chinese patent medicines for anti-influenza and to develop new anti-influenza prescriptions using unsupervised data mining methods. Methods: Chinese patent medicine recipes for anti-influenza were collected and recorded in a database, and then the correlation coefficients between herbs, the core combinations of herbs, and new prescriptions were analyzed using modified mutual information, complex-system entropy clustering, and unsupervised hierarchical clustering, respectively. Results: Based on the analysis of 126 Chinese patent medicine recipes, the frequency of occurrence of each herb in these recipes, 54 frequently used herb pairs, and 34 core combinations were determined, and 4 new recipes for influenza were developed. Conclusion: Unsupervised data mining methods are able to mine composition patterns quickly and develop new prescriptions.
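A minimal sketch, using scikit-learn, of scoring herb-pair associations across a recipe collection with standard mutual information on binary occurrence vectors; this is a simplification of the modified mutual information used in the paper, and the recipe data shown are placeholders:

from itertools import combinations
from sklearn.metrics import mutual_info_score

def herb_pair_scores(recipes, herbs):
    """Mutual information between binary herb-occurrence vectors across recipes.

    recipes : list of sets of herb names (one set per recipe)
    herbs   : list of herb names to score
    """
    occurrence = {h: [int(h in r) for r in recipes] for h in herbs}
    return {
        (h1, h2): mutual_info_score(occurrence[h1], occurrence[h2])
        for h1, h2 in combinations(herbs, 2)
    }

# Usage sketch (placeholder herb names):
# recipes = [{"herb A", "herb B"}, {"herb A", "herb C"}, ...]
# herbs = sorted({h for r in recipes for h in r})
# top_pairs = sorted(herb_pair_scores(recipes, herbs).items(),
#                    key=lambda kv: kv[1], reverse=True)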
Funding: This reservoir research is sponsored by the National 973 Subject Project (No. 2001CB209).
Abstract: Seismic data structure characteristics refer to the waveform character arranged in time sequence at discrete data points in each 2-D or 3-D seismic trace. Hydrocarbon prediction using seismic data structure characteristics is a new reservoir prediction technique. When the main pay interval lies in carbonate fracture and fissure-cavern type reservoirs with very strong inhomogeneity, hydrocarbon prediction faces some difficulties. Because of the special geological conditions of the eighth zone in the Tahe oil field, we apply seismic data structure characteristics to hydrocarbon prediction for the Ordovician reservoir in this zone. We divide the oil zone in this area into favorable and unfavorable blocks. Eighteen well locations were proposed and drilled in the favorable block, and they recovered higher output of oil and gas.
Funding: This project is supported by the Special Foundation for Major State Basic Research of China (Project 973, No. G1998030415).
Abstract: In industrial process settings, principal component analysis (PCA) is a general method for data reconciliation. However, PCA is sometimes unsuitable for nonlinear feature analysis and is limited in its application to nonlinear industrial processes. Kernel PCA (KPCA) is an extension of PCA that can be used for nonlinear feature analysis. A nonlinear data reconciliation method based on KPCA is proposed. The basic idea of this method is that the original data are first mapped to a high-dimensional feature space by a nonlinear function, and PCA is implemented in that feature space. Nonlinear feature analysis is then performed, and the data are reconstructed using the kernel. The KPCA-based data reconciliation method is applied to a ternary distillation column. Simulation results show that this method can filter the noise in measurements of a nonlinear process, and the reconciled data represent the true information of the nonlinear process.
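A minimal sketch, using scikit-learn, of KPCA-based denoising/reconciliation of process measurements; the RBF kernel, component count, and pre-image reconstruction via fit_inverse_transform are illustrative choices, not necessarily those of the paper:

import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.preprocessing import StandardScaler

def kpca_reconcile(X, n_components=5, gamma=0.1):
    """Map measurements to kernel feature space, keep leading components, and
    reconstruct an approximate denoised estimate back in the original variable space."""
    scaler = StandardScaler()
    Xs = scaler.fit_transform(X)
    kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma,
                     fit_inverse_transform=True, alpha=1e-3)
    scores = kpca.fit_transform(Xs)             # nonlinear principal components
    Xs_rec = kpca.inverse_transform(scores)     # pre-image reconstruction
    return scaler.inverse_transform(Xs_rec)

# Usage sketch: X is an (n_samples, n_measured_variables) matrix of noisy plant data,
# e.g., flows, temperatures, and compositions from a ternary distillation column.
# X_reconciled = kpca_reconcile(X)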
Funding: Supported by the Research Fund of Tencent Computer System Co., Ltd. under Grant No. 170125.
Abstract: With the growth of distributed computing systems, modern Big Data analysis platform products often have diversified characteristics. It is hard for users to make decisions when they first come into contact with Big Data platforms. In this paper, we discuss the design principles and research directions of modern Big Data platforms by presenting research on modern Big Data products. We provide a detailed review and comparison of several state-of-the-art frameworks and summarize them into a typical structure with five horizontal layers and one vertical layer. Following this structure, the paper presents the components and modern optimization technologies developed for Big Data, which helps readers choose the most suitable components and architecture from the various Big Data technologies based on their requirements.
Funding: The Sub-project of the National Science and Technology Major Project of China (No. 2016ZX05027-002-003), the National Natural Science Foundation of China (No. 41404089), the State Key Program of the National Natural Science Foundation of China (No. 41430322), and the National Basic Research Program of China (973 Program) (No. 2015CB45300).
Abstract: With the continuous development of full tensor gradiometer (FTG) measurement techniques, three-dimensional (3D) inversion of FTG data is becoming increasingly used in oil and gas exploration. In the fast processing and interpretation of large-scale high-precision data, the use of the graphics processing unit (GPU) and preconditioning methods is very important in data inversion. In this paper, an improved preconditioned conjugate gradient algorithm is proposed by combining the symmetric successive over-relaxation (SSOR) technique with the incomplete Cholesky decomposition conjugate gradient algorithm (ICCG). Since preparing the preconditioner requires extra time, a parallel implementation based on the GPU is proposed. The improved method is then applied to the inversion of noise-contaminated synthetic data to prove its adaptability to the inversion of 3D FTG data. Results show that the parallel SSOR-ICCG algorithm based on an NVIDIA Tesla C2050 GPU achieves a speedup of approximately 25 times over a serial program using a 2.0 GHz central processing unit (CPU). Real airborne gravity-gradiometry data from the Vinton salt dome (southwest Louisiana, USA) are also considered. Good results are obtained, which verifies the efficiency and feasibility of the proposed parallel method for fast inversion of 3D FTG data.
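A minimal sketch, using SciPy on the CPU, of an SSOR-preconditioned conjugate gradient solve; it does not include the incomplete Cholesky part or the GPU parallelization described in the paper, and the relaxation factor omega and the usage setup are assumptions:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg, spsolve_triangular

def ssor_pcg(A, b, omega=1.2, maxiter=500):
    """Conjugate gradient with an SSOR preconditioner for symmetric positive definite A.

    SSOR: M = (1 / (omega*(2-omega))) * (D + omega*L) * D^{-1} * (D + omega*L)^T,
    where A = L + D + L^T; M^{-1} is applied with two triangular solves.
    """
    A = sp.csr_matrix(A)
    d = A.diagonal()
    D = sp.diags(d)
    lower = (D + omega * sp.tril(A, k=-1)).tocsr()   # D + omega * strictly lower part
    upper = (D + omega * sp.triu(A, k=1)).tocsr()    # D + omega * strictly upper part

    def apply_minv(r):
        y = spsolve_triangular(lower, r, lower=True)         # (D + omega*L)^{-1} r
        y = spsolve_triangular(upper, d * y, lower=False)    # (D + omega*L^T)^{-1} D y
        return omega * (2.0 - omega) * y

    M = LinearOperator(A.shape, matvec=apply_minv)
    x, info = cg(A, b, M=M, maxiter=maxiter)
    return x, info

# Usage sketch: A could be a regularized normal-equations system built from an
# assumed FTG forward operator G, e.g. A = G.T @ G + mu * I, with b = G.T @ d_obs.
# x, info = ssor_pcg(A, b)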
Abstract: The lives of a certain type of bearing follow a Weibull distribution. In a life test with 20 sets of bearings, only one set failed within the specified time, and none of the remainder failed even after the test time had been extended. With a set of test data like that in Table 1, it is required to estimate the reliability at the mission time. In this paper, we first use a hierarchical Bayesian method to determine the prior distribution and the Bayesian estimates of the various probabilities of failure, p_i, and then use the method of least squares to estimate the parameters of the Weibull distribution and the reliability. Actual computation shows that the estimates so obtained are rather robust, and the results have been adopted for practical use.
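A minimal sketch, in NumPy, of the least-squares step only: fitting the Weibull shape beta and scale eta to failure-probability estimates p_i at inspection times t_i and evaluating the reliability; the hierarchical Bayesian estimation of the p_i is not reproduced, and the numbers in the usage comment are illustrative, not the data of Table 1:

import numpy as np

def weibull_lsq(t, p):
    """Least-squares Weibull fit from times t_i and failure-probability estimates p_i.

    Uses the linearization ln(-ln(1 - p)) = beta*ln(t) - beta*ln(eta).
    """
    t = np.asarray(t, dtype=float)
    p = np.asarray(p, dtype=float)
    x = np.log(t)
    y = np.log(-np.log(1.0 - p))
    beta, intercept = np.polyfit(x, y, 1)   # slope gives the shape parameter beta
    eta = np.exp(-intercept / beta)         # scale parameter
    return beta, eta

def reliability(t_mission, beta, eta):
    """Weibull reliability R(t) = exp(-(t/eta)^beta)."""
    return np.exp(-(t_mission / eta) ** beta)

# Usage sketch (illustrative numbers only):
# beta, eta = weibull_lsq(t=[500.0, 1000.0, 1500.0], p=[0.01, 0.03, 0.05])
# R = reliability(2000.0, beta, eta)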