Long noncoding RNAs (IncRNAs) have been increasingly implicated in a variety of human diseases, including autoimmune disease (Wu et al., 2015), neurodegenerative diseases (Wapinski and Chang, 2011) and cancer (...Long noncoding RNAs (IncRNAs) have been increasingly implicated in a variety of human diseases, including autoimmune disease (Wu et al., 2015), neurodegenerative diseases (Wapinski and Chang, 2011) and cancer (Huarte, 2015). Due to recent advances in next-generation sequencing technologies, tens of thousands of lncRNAs have been identified and annotated, a number of them have been proven to be functional in diverse biological processes through various mechanisms.展开更多
RNA-sequencing(RNA-seq),based on next-generation sequencing technologies,has rapidly become a standard and popular technology for transcriptome analysis.However,serious challenges still exist in analyzing and interpre...RNA-sequencing(RNA-seq),based on next-generation sequencing technologies,has rapidly become a standard and popular technology for transcriptome analysis.However,serious challenges still exist in analyzing and interpreting the RNA-seq data.With the development of high-throughput sequencing technology,the sequencing depth of RNA-seq data increases explosively.The intricate biological process of transcriptome is more complicated and diversified beyond our imagination.Moreover,most of the remaining organisms still have no available reference genome or have only incomplete genome annotations.Therefore,a large number of bioinformatics methods for various transcriptomics studies are proposed to effectively settle these challenges.This review comprehensively summarizes the various studies in RNA-seq data analysis and their corresponding analysis methods,including genome annotation,quality control and pre-processing of reads,read alignment,transcriptome assembly,gene and isoform expression quantification,differential expression analysis,data visualization and other analyses.展开更多
Background:Data from RNA-seq experiments provide a wealth of information about the transcriptome of an organism.However,the analysis of such data is very demanding.In this study,we aimed to establish robust analysis p...Background:Data from RNA-seq experiments provide a wealth of information about the transcriptome of an organism.However,the analysis of such data is very demanding.In this study,we aimed to establish robust analysis procedures that can be used in clinical practice.Methods:We studied RNA-seq data from triple-negative breast cancer patients.Specifically,we investigated the subsampling of RNA-seq data.Results:The main results of our investigations are as follows:(1) the subsampling of RNA-seq data gave biologically realistic simulations of sequencing experiments with smaller sequencing depth but not direct scaling of count matrices;(2) the saturation of results required an average sequencing depth larger than 32 million reads and an individual sequencing depth larger than 46 million reads;and(3) for an abrogated feature selection,higher moments of the distribution of all expressed genes had a higher sensitivity for signal detection than the corresponding mean values.Conclusions:Our results reveal important characteristics of RNA-seq data that must be understood before one can apply such an approach to translational medicine.展开更多
Gene selection is an indispensable step for analyzing noisy and high-dimensional single-cell RNA-seq(scRNA-seq)data.Compared with the commonly used variance-based methods,by mimicking the human maker selection in the ...Gene selection is an indispensable step for analyzing noisy and high-dimensional single-cell RNA-seq(scRNA-seq)data.Compared with the commonly used variance-based methods,by mimicking the human maker selection in the 2D visualization of cells,a new feature selection method called HRG(Highly Regional Genes)is proposed to find the informative genes,which show regional expression patterns in the cell-cell similarity network.We mathematically find the optimal expression patterns that can maximize the proposed scoring function.In comparison with several unsupervised methods,HRG shows high accuracy and robustness,and can increase the performance of downstream cell clustering and gene correlation analysis.Also,it is applicable for selecting informative genes of sequencing-based spatial transcriptomic data.展开更多
Background: The Poisson and the Negative Binomial distributions are commonly used to model count data. The Poisson is characterized by the equality of mean and variance whereas the Negative Binomial has a variance lar...Background: The Poisson and the Negative Binomial distributions are commonly used to model count data. The Poisson is characterized by the equality of mean and variance whereas the Negative Binomial has a variance larger than the mean and therefore both models are appropriate to model over-dispersed count data. Objectives: A new two-parameter probability distribution called the Quasi-Negative Binomial Distribution (QNBD) is being studied in this paper, generalizing the well-known negative binomial distribution. This model turns out to be quite flexible for analyzing count data. Our main objectives are to estimate the parameters of the proposed distribution and to discuss its applicability to genetics data. As an application, we demonstrate that the QNBD regression representation is utilized to model genomics data sets. Results: The new distribution is shown to provide a good fit with respect to the “Akaike Information Criterion”, AIC, considered a measure of model goodness of fit. The proposed distribution may serve as a viable alternative to other distributions available in the literature for modeling count data exhibiting overdispersion, arising in various fields of scientific investigation such as genomics and biomedicine.展开更多
Earthquakes are highly destructive spatio-temporal phenomena whose analysis is essential for disaster preparedness and risk mitigation.Modern seismological research produces vast volumes of heterogeneous data from sei...Earthquakes are highly destructive spatio-temporal phenomena whose analysis is essential for disaster preparedness and risk mitigation.Modern seismological research produces vast volumes of heterogeneous data from seismic networks,satellite observations,and geospatial repositories,creating the need for scalable infrastructures capable of integrating and analyzing such data to support intelligent decision-making.Data warehousing technologies provide a robust foundation for this purpose;however,existing earthquake-oriented data warehouses remain limited,often relying on simplified schemas,domain-specific analytics,or cataloguing efforts.This paper presents the design and implementation of a spatio-temporal data warehouse for seismic activity.The framework integrates spatial and temporal dimensions in a unified schema and introduces a novel array-based approach for managing many-to-many relationships between facts and dimensions without intermediate bridge tables.A comparative evaluation against a conventional bridge-table schema demonstrates that the array-based design improves fact-centric query performance,while the bridge-table schema remains advantageous for dimension-centric queries.To reconcile these trade-offs,a hybrid schema is proposed that retains both representations,ensuring balanced efficiency across heterogeneous workloads.The proposed framework demonstrates how spatio-temporal data warehousing can address schema complexity,improve query performance,and support multidimensional visualization.In doing so,it provides a foundation for integrating seismic analysis into broader big data-driven intelligent decision systems for disaster resilience,risk mitigation,and emergency management.展开更多
This work contributes to the theoretical foundation for pricing in data markets and offers practical insights for managing digital data exchanges in the era of big data.We propose a structured pricing model for data e...This work contributes to the theoretical foundation for pricing in data markets and offers practical insights for managing digital data exchanges in the era of big data.We propose a structured pricing model for data exchanges transitioning from quasi-public to marketoriented operations.To address the complex dynamics among data exchanges,suppliers,and consumers,the authors develop a threestage Stackelberg game framework.In this model,the data exchange acts as a leader setting transaction commission rates,suppliers are intermediate leaders determining unit prices,and consumers are followers making purchasing decisions.Two pricing strategies are examined:the Independent Pricing Approach(IPA)and the novel Perfectly Competitive Pricing Approach(PCPA),which accounts for competition among data providers.Using backward induction,the study derives subgame-perfect equilibria and proves the existence and uniqueness of Stackelberg equilibria under both approaches.Extensive numerical simulations are carried out in the model,demonstrating that PCPA enhances data demander utility,encourages supplier competition,increases transaction volume,and improves the overall profitability and sustainability of data exchanges.Social welfare analysis further confirms PCPA’s superiority in promoting efficient and fair data markets.展开更多
Ovarian cancer(OC)is one of the leading causes of death related to gynecological cancer,with the main difficulty of its early diagnosis and a heterogeneous nature of tumor biomarkers.Machine learning(ML)has the potent...Ovarian cancer(OC)is one of the leading causes of death related to gynecological cancer,with the main difficulty of its early diagnosis and a heterogeneous nature of tumor biomarkers.Machine learning(ML)has the potential to process complex datasets and support decision-making in OC diagnosis.Nevertheless,traditional ML models tend to be biased,overfitting,noisy,and less generalized.Moreover,their black-box nature reduces interpretability and limits their practical clinical applicability.In this study,we introduce an explainable ensemble learning(EL)model,TreeX-Stack,based on a stacking architecture that employs tree-based learners such as Decision Tree(DT),Random Forest(RF),Gradient Boosting(GB),and Extreme Gradient Boosting(XGBoost)as base learners,and Logistic Regression(LR)as the meta-learner to enhance ovarian cancer(OC)diagnosis.Local Interpretable ModelAgnostic Explanations(LIME)are used to explain individual predictions,making the model outputs more clinically interpretable and applicable.The model is trained on the dataset that includes demographic information,blood test,general chemistry,and tumor markers.Extensive preprocessing includes handling missing data using iterative imputation with Bayesian Ridge and addressing multicollinearity by removing features with correlation coefficients above 0.7.Relevant features are then selected using the Boruta feature selection method.To obtain robust and unbiased performance estimates during hyperparameter tuning,nested cross-validation(CV)with grid search is employed,and all experiments are repeated five times to ensure statistical reliability.TreeX-Stack demonstrates excellent diagnostic performance,achieving an accuracy of 0.9027,a precision of 0.8673,a recall of 0.9391,and an F1-score of 0.9012.Feature-importance analyses using LIME and permutation importance highlight Human Epididymis Protein 4(HE4)as the most significant biomarker for OC.The combination of high predictive performance and interpretability makes TreeX-Stack a reliable tool for clinical decision support in OC diagnosis.展开更多
Accurately assessing the relationship between tree growth and climatic factors is of great importance in dendrochronology.This study evaluated the consistency between alternative climate datasets(including station and...Accurately assessing the relationship between tree growth and climatic factors is of great importance in dendrochronology.This study evaluated the consistency between alternative climate datasets(including station and gridded data)and actual climate data(fixed-point observations near the sampling sites),in northeastern China’s warm temperate zone and analyzed differences in their correlations with tree-ring width index.The results were:(1)Gridded temperature data,as well as precipitation and relative humidity data from the Huailai meteorological station,was more consistent with the actual climate data;in contrast,gridded soil moisture content data showed significant discrepancies.(2)Horizontal distance had a greater impact on the representativeness of actual climate conditions than vertical elevation differences.(3)Differences in consistency between alternative and actual climate data also affected their correlations with tree-ring width indices.In some growing season months,correlation coefficients,both in magnitude and sign,differed significantly from those based on actual data.The selection of different alternative climate datasets can lead to biased results in assessing forest responses to climate change,which is detrimental to the management of forest ecosystems in harsh environments.Therefore,the scientific and rational selection of alternative climate data is essential for dendroecological and climatological research.展开更多
To address the severe challenges of PM_(2.5) and ozone co-control during the"14^(th) Five-Year Plan"period and to enhance the precision and intelligence level of air environment governance,it is imperative t...To address the severe challenges of PM_(2.5) and ozone co-control during the"14^(th) Five-Year Plan"period and to enhance the precision and intelligence level of air environment governance,it is imperative to build an efficient comprehensive management platform for regional air quality.In this paper,the specific practice in Zibo City,Shandong Province is as an example to systematically analyze the top-level design,technical implementation,and innovative application of a comprehensive management platform for regional air quality integrating"perception monitoring,data fusion,research judgment of early warnings,analysis of sources,collaborative dispatching,and evaluation assessment".Through the construction of an"sky-air-ground"integrated three-dimensional monitoring network,the platform integrates multi-source heterogeneous environmental data,and employs big data,cloud computing,artificial intelligence,CALPUFF/CMAQ,and other numerical model technologies to achieve comprehensive perception,precise prediction,intelligent source tracing,and closed-loop management of air pollution.The platform innovatively establishes a full-process closed-loop management mechanism of"data-early warning-disposition-evaluation",and achieves a fundamental transformation from passive response to active anticipation and from experience-based judgment to data driving in environmental supervision.The application results show that this platform significantly improves the scientific decision-making ability and collaborative execution efficiency of air pollution governance in Zibo City,providing a replicable and scalable comprehensive solution for similar industrial cities to achieve the continuous improvement of air quality.展开更多
tRNA-derived small RNAs(tsRNAs),as a class of regulatory small noncoding RNA,have been implicated in a wide variety of human diseases.Large amounts of tsRNA–disease associations have been identified in recent years f...tRNA-derived small RNAs(tsRNAs),as a class of regulatory small noncoding RNA,have been implicated in a wide variety of human diseases.Large amounts of tsRNA–disease associations have been identified in recent years from accumulating studies.However,repositories for cataloging the detailed information on tsRNA–disease associations are scarce.In this study,we provide a tsRNADisease database by integrating experimentally and computationally supported tsRNA–disease associations from manual curation of literatures and other related resources.tsRNADisease contains 5571 manually curated associations between 4759 tsRNAs and 166 diseases with experimental evidence from 346 studies.In addition,it also contains 5013 predicted associations between 1297 tsRNAs and 111 diseases.tsRNADisease provides a user-friendly interface to browse,retrieve,and download data conveniently.This database can improve our understanding of tsRNA deregulation in diseases and serve as a valuable resource for investigating the mechanism of disease-related tsRNAs.tsRNADisease is freely available at http://www.compgenelab.info/tsRNADisease.展开更多
0 INTRODUCTION Earth science is a natural science concerned with the composition,dynamics,spatiotemporal evolution,and formation mechanisms of Earth materials(Chen and Yang,2023).Traditional Earth science research has...0 INTRODUCTION Earth science is a natural science concerned with the composition,dynamics,spatiotemporal evolution,and formation mechanisms of Earth materials(Chen and Yang,2023).Traditional Earth science research has largely been discipline-based,relying on field investigations,data collection,experimental analyses,and data interpretation to study individual components of the Earth system.展开更多
Photoacoustic-computed tomography is a novel imaging technique that combines high absorption contrast and deep tissue penetration capability,enabling comprehensive three-dimensional imaging of biological targets.Howev...Photoacoustic-computed tomography is a novel imaging technique that combines high absorption contrast and deep tissue penetration capability,enabling comprehensive three-dimensional imaging of biological targets.However,the increasing demand for higher resolution and real-time imaging results in significant data volume,limiting data storage,transmission and processing efficiency of system.Therefore,there is an urgent need for an effective method to compress the raw data without compromising image quality.This paper presents a photoacoustic-computed tomography 3D data compression method and system based on Wavelet-Transformer.This method is based on the cooperative compression framework that integrates wavelet hard coding with deep learning-based soft decoding.It combines the multiscale analysis capability of wavelet transforms with the global feature modeling advantage of Transformers,achieving high-quality data compression and reconstruction.Experimental results using k-wave simulation suggest that the proposed compression system has advantages under extreme compression conditions,achieving a raw data compression ratio of up to 1:40.Furthermore,three-dimensional data compression experiment using in vivo mouse demonstrated that the maximum peak signal-to-noise ratio(PSNR)and structural similarity index(SSIM)values of reconstructed images reached 38.60 and 0.9583,effectively overcoming detail loss and artifacts introduced by raw data compression.All the results suggest that the proposed system can significantly reduce storage requirements and hardware cost,enhancing computational efficiency and image quality.These advantages support the development of photoacoustic-computed tomography toward higher efficiency,real-time performance and intelligent functionality.展开更多
Amid the increasing demand for data sharing,the need for flexible,secure,and auditable access control mechanisms has garnered significant attention in the academic community.However,blockchain-based ciphertextpolicy a...Amid the increasing demand for data sharing,the need for flexible,secure,and auditable access control mechanisms has garnered significant attention in the academic community.However,blockchain-based ciphertextpolicy attribute-based encryption(CP-ABE)schemes still face cumbersome ciphertext re-encryption and insufficient oversight when handling dynamic attribute changes and cross-chain collaboration.To address these issues,we propose a dynamic permission attribute-encryption scheme for multi-chain collaboration.This scheme incorporates a multiauthority architecture for distributed attribute management and integrates an attribute revocation and granting mechanism that eliminates the need for ciphertext re-encryption,effectively reducing both computational and communication overhead.It leverages the InterPlanetary File System(IPFS)for off-chain data storage and constructs a cross-chain regulatory framework—comprising a Hyperledger Fabric business chain and a FISCO BCOS regulatory chain—to record changes in decryption privileges and access behaviors in an auditable manner.Security analysis shows selective indistinguishability under chosen-plaintext attack(sIND-CPA)security under the decisional q-Parallel Bilinear Diffie-Hellman Exponent Assumption(q-PBDHE).In the performance and experimental evaluations,we compared the proposed scheme with several advanced schemes.The results show that,while preserving security,the proposed scheme achieves higher encryption/decryption efficiency and lower storage overhead for ciphertexts and keys.展开更多
With the popularization of new technologies,telephone fraud has become the main means of stealing money and personal identity information.Taking inspiration from the website authentication mechanism,we propose an end-...With the popularization of new technologies,telephone fraud has become the main means of stealing money and personal identity information.Taking inspiration from the website authentication mechanism,we propose an end-to-end datamodem scheme that transmits the caller’s digital certificates through a voice channel for the recipient to verify the caller’s identity.Encoding useful information through voice channels is very difficult without the assistance of telecommunications providers.For example,speech activity detection may quickly classify encoded signals as nonspeech signals and reject input waveforms.To address this issue,we propose a novel modulation method based on linear frequency modulation that encodes 3 bits per symbol by varying its frequency,shape,and phase,alongside a lightweightMobileNetV3-Small-based demodulator for efficient and accurate signal decoding on resource-constrained devices.This method leverages the unique characteristics of linear frequency modulation signals,making them more easily transmitted and decoded in speech channels.To ensure reliable data delivery over unstable voice links,we further introduce a robust framing scheme with delimiter-based synchronization,a sample-level position remedying algorithm,and a feedback-driven retransmission mechanism.We have validated the feasibility and performance of our system through expanded real-world evaluations,demonstrating that it outperforms existing advanced methods in terms of robustness and data transfer rate.This technology establishes the foundational infrastructure for reliable certificate delivery over voice channels,which is crucial for achieving strong caller authentication and preventing telephone fraud at its root cause.展开更多
Artificial Intelligence(AI)in healthcare enables predicting diabetes using data-driven methods instead of the traditional ways of screening the disease,which include hemoglobin A1c(HbA1c),oral glucose tolerance test(O...Artificial Intelligence(AI)in healthcare enables predicting diabetes using data-driven methods instead of the traditional ways of screening the disease,which include hemoglobin A1c(HbA1c),oral glucose tolerance test(OGTT),and fasting plasma glucose(FPG)screening techniques,which are invasive and limited in scale.Machine learning(ML)and deep neural network(DNN)models that use large datasets to learn the complex,nonlinear feature interactions,but the conventional ML algorithms are data sensitive and often show unstable predictive accuracy.Conversely,DNN models are more robust,though the ability to reach a high accuracy rate consistently on heterogeneous datasets is still an open challenge.For predicting diabetes,this work proposed a hybrid DNN approach by integrating a bidirectional long short-term memory(BiLSTM)network with a bidirectional gated recurrent unit(BiGRU).A robust DL model,developed by combining various datasets with weighted coefficients,dense operations in the connection of deep layers,and the output aggregation using batch normalization and dropout functions to avoid overfitting.The goal of this hybrid model is better generalization and consistency among various datasets,which facilitates the effective management and early intervention.The proposed DNN model exhibits an excellent predictive performance as compared to the state-of-the-art and baseline ML and DNN models for diabetes prediction tasks.The robust performance indicates the possible usefulness of DL-based models in the development of disease prediction in healthcare and other areas that demand high-quality analytics.展开更多
Reducing carbon emissions is fundamental to achieving carbon neutrality.Existing studies have typically estimated emissions by predicting fossil fuel consumption across sectors under different socioeconomic scenarios;...Reducing carbon emissions is fundamental to achieving carbon neutrality.Existing studies have typically estimated emissions by predicting fossil fuel consumption across sectors under different socioeconomic scenarios;however,uncertainties in future development often lead to deviations from these assumptions.To address this limitation,this study proposes a data-driven approach for evaluating national carbon emissions using historical data.Countries with similar energy consumption patterns were selected as reference samples,and their emission pathways were analyzed to predict future emissions for countries that have not yet reached their peak.Key indicators,including peak levels,timing,plateau duration,and post-peak decline rates,were identified.The results indicate that the trends in unpeaked economies can be effectively assessed based on the emission patterns of countries with comparable energy structures.Applying this framework to China suggests a carbon peak between 2027 and 2030,in the range of 14.207 to 16.234 Gt,followed by a gradual decline from 2031 to 2036.Compared with the average results of the existing studies,the predicted minimum and maximum emissions show error margins of 10.1% and 1.41%,respectively.This study proposes a top-down methodology that provides a transparent,reproducible,and empirical framework for forecasting carbon emission pathways,thereby offering a scientific basis for assessing countries that have not yet reached their emissions peak.展开更多
基金supported by grants from the National Key R&D Program of China (2017YFA0106700)National Natural Science Foundation of China (81772614, U1611261, 81772586 and 81602461)+3 种基金China Postdoctoral Science Foundation (2017M610573)Young Elite Scientists Sponsorship Program by CAST (2017QNRC001)Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2017, to J. Zheng)Fundamental Research Funds for the Central Universities (SYSU:17ykzd32)
文摘Long noncoding RNAs (IncRNAs) have been increasingly implicated in a variety of human diseases, including autoimmune disease (Wu et al., 2015), neurodegenerative diseases (Wapinski and Chang, 2011) and cancer (Huarte, 2015). Due to recent advances in next-generation sequencing technologies, tens of thousands of lncRNAs have been identified and annotated, a number of them have been proven to be functional in diverse biological processes through various mechanisms.
文摘RNA-sequencing(RNA-seq),based on next-generation sequencing technologies,has rapidly become a standard and popular technology for transcriptome analysis.However,serious challenges still exist in analyzing and interpreting the RNA-seq data.With the development of high-throughput sequencing technology,the sequencing depth of RNA-seq data increases explosively.The intricate biological process of transcriptome is more complicated and diversified beyond our imagination.Moreover,most of the remaining organisms still have no available reference genome or have only incomplete genome annotations.Therefore,a large number of bioinformatics methods for various transcriptomics studies are proposed to effectively settle these challenges.This review comprehensively summarizes the various studies in RNA-seq data analysis and their corresponding analysis methods,including genome annotation,quality control and pre-processing of reads,read alignment,transcriptome assembly,gene and isoform expression quantification,differential expression analysis,data visualization and other analyses.
基金supported In part by the Arkansas Biosciences Institute under Grant(No.UL1TR000039)the IDeANetworks of Biomedical Research Excellence(INBRE) Grant(No.P20RR16460)
文摘Background:Data from RNA-seq experiments provide a wealth of information about the transcriptome of an organism.However,the analysis of such data is very demanding.In this study,we aimed to establish robust analysis procedures that can be used in clinical practice.Methods:We studied RNA-seq data from triple-negative breast cancer patients.Specifically,we investigated the subsampling of RNA-seq data.Results:The main results of our investigations are as follows:(1) the subsampling of RNA-seq data gave biologically realistic simulations of sequencing experiments with smaller sequencing depth but not direct scaling of count matrices;(2) the saturation of results required an average sequencing depth larger than 32 million reads and an individual sequencing depth larger than 46 million reads;and(3) for an abrogated feature selection,higher moments of the distribution of all expressed genes had a higher sensitivity for signal detection than the corresponding mean values.Conclusions:Our results reveal important characteristics of RNA-seq data that must be understood before one can apply such an approach to translational medicine.
基金supported by the National Key Research and Development Program(2020YFA0712403,2020YFA0906900)National Natural Science Foundation of China(61922047,81890993,61721003,62133006)BNRIST Young Innovation Fund(BNR2020RC01009)。
文摘Gene selection is an indispensable step for analyzing noisy and high-dimensional single-cell RNA-seq(scRNA-seq)data.Compared with the commonly used variance-based methods,by mimicking the human maker selection in the 2D visualization of cells,a new feature selection method called HRG(Highly Regional Genes)is proposed to find the informative genes,which show regional expression patterns in the cell-cell similarity network.We mathematically find the optimal expression patterns that can maximize the proposed scoring function.In comparison with several unsupervised methods,HRG shows high accuracy and robustness,and can increase the performance of downstream cell clustering and gene correlation analysis.Also,it is applicable for selecting informative genes of sequencing-based spatial transcriptomic data.
文摘Background: The Poisson and the Negative Binomial distributions are commonly used to model count data. The Poisson is characterized by the equality of mean and variance whereas the Negative Binomial has a variance larger than the mean and therefore both models are appropriate to model over-dispersed count data. Objectives: A new two-parameter probability distribution called the Quasi-Negative Binomial Distribution (QNBD) is being studied in this paper, generalizing the well-known negative binomial distribution. This model turns out to be quite flexible for analyzing count data. Our main objectives are to estimate the parameters of the proposed distribution and to discuss its applicability to genetics data. As an application, we demonstrate that the QNBD regression representation is utilized to model genomics data sets. Results: The new distribution is shown to provide a good fit with respect to the “Akaike Information Criterion”, AIC, considered a measure of model goodness of fit. The proposed distribution may serve as a viable alternative to other distributions available in the literature for modeling count data exhibiting overdispersion, arising in various fields of scientific investigation such as genomics and biomedicine.
文摘Earthquakes are highly destructive spatio-temporal phenomena whose analysis is essential for disaster preparedness and risk mitigation.Modern seismological research produces vast volumes of heterogeneous data from seismic networks,satellite observations,and geospatial repositories,creating the need for scalable infrastructures capable of integrating and analyzing such data to support intelligent decision-making.Data warehousing technologies provide a robust foundation for this purpose;however,existing earthquake-oriented data warehouses remain limited,often relying on simplified schemas,domain-specific analytics,or cataloguing efforts.This paper presents the design and implementation of a spatio-temporal data warehouse for seismic activity.The framework integrates spatial and temporal dimensions in a unified schema and introduces a novel array-based approach for managing many-to-many relationships between facts and dimensions without intermediate bridge tables.A comparative evaluation against a conventional bridge-table schema demonstrates that the array-based design improves fact-centric query performance,while the bridge-table schema remains advantageous for dimension-centric queries.To reconcile these trade-offs,a hybrid schema is proposed that retains both representations,ensuring balanced efficiency across heterogeneous workloads.The proposed framework demonstrates how spatio-temporal data warehousing can address schema complexity,improve query performance,and support multidimensional visualization.In doing so,it provides a foundation for integrating seismic analysis into broader big data-driven intelligent decision systems for disaster resilience,risk mitigation,and emergency management.
基金supported by the National Natural Science Foundation of China[grant numbers 12171158,12371474 and 12571510]Fundamental Research Funds for the Central Universities[grant number 2025ECNU-WLJC006].
文摘This work contributes to the theoretical foundation for pricing in data markets and offers practical insights for managing digital data exchanges in the era of big data.We propose a structured pricing model for data exchanges transitioning from quasi-public to marketoriented operations.To address the complex dynamics among data exchanges,suppliers,and consumers,the authors develop a threestage Stackelberg game framework.In this model,the data exchange acts as a leader setting transaction commission rates,suppliers are intermediate leaders determining unit prices,and consumers are followers making purchasing decisions.Two pricing strategies are examined:the Independent Pricing Approach(IPA)and the novel Perfectly Competitive Pricing Approach(PCPA),which accounts for competition among data providers.Using backward induction,the study derives subgame-perfect equilibria and proves the existence and uniqueness of Stackelberg equilibria under both approaches.Extensive numerical simulations are carried out in the model,demonstrating that PCPA enhances data demander utility,encourages supplier competition,increases transaction volume,and improves the overall profitability and sustainability of data exchanges.Social welfare analysis further confirms PCPA’s superiority in promoting efficient and fair data markets.
基金supported and funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University(IMSIU)under the grant number IMSIU-DDRSP2601.
文摘Ovarian cancer(OC)is one of the leading causes of death related to gynecological cancer,with the main difficulty of its early diagnosis and a heterogeneous nature of tumor biomarkers.Machine learning(ML)has the potential to process complex datasets and support decision-making in OC diagnosis.Nevertheless,traditional ML models tend to be biased,overfitting,noisy,and less generalized.Moreover,their black-box nature reduces interpretability and limits their practical clinical applicability.In this study,we introduce an explainable ensemble learning(EL)model,TreeX-Stack,based on a stacking architecture that employs tree-based learners such as Decision Tree(DT),Random Forest(RF),Gradient Boosting(GB),and Extreme Gradient Boosting(XGBoost)as base learners,and Logistic Regression(LR)as the meta-learner to enhance ovarian cancer(OC)diagnosis.Local Interpretable ModelAgnostic Explanations(LIME)are used to explain individual predictions,making the model outputs more clinically interpretable and applicable.The model is trained on the dataset that includes demographic information,blood test,general chemistry,and tumor markers.Extensive preprocessing includes handling missing data using iterative imputation with Bayesian Ridge and addressing multicollinearity by removing features with correlation coefficients above 0.7.Relevant features are then selected using the Boruta feature selection method.To obtain robust and unbiased performance estimates during hyperparameter tuning,nested cross-validation(CV)with grid search is employed,and all experiments are repeated five times to ensure statistical reliability.TreeX-Stack demonstrates excellent diagnostic performance,achieving an accuracy of 0.9027,a precision of 0.8673,a recall of 0.9391,and an F1-score of 0.9012.Feature-importance analyses using LIME and permutation importance highlight Human Epididymis Protein 4(HE4)as the most significant biomarker for OC.The combination of high predictive performance and interpretability makes TreeX-Stack a reliable tool for clinical decision support in OC diagnosis.
基金supported by the International Partnership program of the Chinese Academy of Sciences(170GJHZ2023074GC)National Natural Science Foundation of China(42425706 and 42488201)+1 种基金National Key Research and Development Program of China(2024YFF0807902)Beijing Natural Science Foundation(8242041),and China Postdoctoral Science Foundation(2025M770353).
文摘Accurately assessing the relationship between tree growth and climatic factors is of great importance in dendrochronology.This study evaluated the consistency between alternative climate datasets(including station and gridded data)and actual climate data(fixed-point observations near the sampling sites),in northeastern China’s warm temperate zone and analyzed differences in their correlations with tree-ring width index.The results were:(1)Gridded temperature data,as well as precipitation and relative humidity data from the Huailai meteorological station,was more consistent with the actual climate data;in contrast,gridded soil moisture content data showed significant discrepancies.(2)Horizontal distance had a greater impact on the representativeness of actual climate conditions than vertical elevation differences.(3)Differences in consistency between alternative and actual climate data also affected their correlations with tree-ring width indices.In some growing season months,correlation coefficients,both in magnitude and sign,differed significantly from those based on actual data.The selection of different alternative climate datasets can lead to biased results in assessing forest responses to climate change,which is detrimental to the management of forest ecosystems in harsh environments.Therefore,the scientific and rational selection of alternative climate data is essential for dendroecological and climatological research.
文摘To address the severe challenges of PM_(2.5) and ozone co-control during the"14^(th) Five-Year Plan"period and to enhance the precision and intelligence level of air environment governance,it is imperative to build an efficient comprehensive management platform for regional air quality.In this paper,the specific practice in Zibo City,Shandong Province is as an example to systematically analyze the top-level design,technical implementation,and innovative application of a comprehensive management platform for regional air quality integrating"perception monitoring,data fusion,research judgment of early warnings,analysis of sources,collaborative dispatching,and evaluation assessment".Through the construction of an"sky-air-ground"integrated three-dimensional monitoring network,the platform integrates multi-source heterogeneous environmental data,and employs big data,cloud computing,artificial intelligence,CALPUFF/CMAQ,and other numerical model technologies to achieve comprehensive perception,precise prediction,intelligent source tracing,and closed-loop management of air pollution.The platform innovatively establishes a full-process closed-loop management mechanism of"data-early warning-disposition-evaluation",and achieves a fundamental transformation from passive response to active anticipation and from experience-based judgment to data driving in environmental supervision.The application results show that this platform significantly improves the scientific decision-making ability and collaborative execution efficiency of air pollution governance in Zibo City,providing a replicable and scalable comprehensive solution for similar industrial cities to achieve the continuous improvement of air quality.
基金supported by the National Natural Science Foundation of China(91959106)the Foundation of the Shanghai Municipal Education Commission(24RGZNC02)+4 种基金Shanghai Key Laboratory of Intelligent Information Processing,Fudan University(IIPL-2025-RD3-02)Key University Science Research Project of Anhui Province(2023AH030108)Climbing Peak Training Program for Innovative Technology team of Yijishan Hospital,Wannan Medical College(PF201904)Peak Training Program for Scientific Research of Yijishan Hospital,Wannan Medical College(GF2019G15)the talent project of the First Affiliated Hospital of Wannan Medical College(Yijishan Hospital of Wannan Medical College)(YR202422).
文摘tRNA-derived small RNAs(tsRNAs),as a class of regulatory small noncoding RNA,have been implicated in a wide variety of human diseases.Large amounts of tsRNA–disease associations have been identified in recent years from accumulating studies.However,repositories for cataloging the detailed information on tsRNA–disease associations are scarce.In this study,we provide a tsRNADisease database by integrating experimentally and computationally supported tsRNA–disease associations from manual curation of literatures and other related resources.tsRNADisease contains 5571 manually curated associations between 4759 tsRNAs and 166 diseases with experimental evidence from 346 studies.In addition,it also contains 5013 predicted associations between 1297 tsRNAs and 111 diseases.tsRNADisease provides a user-friendly interface to browse,retrieve,and download data conveniently.This database can improve our understanding of tsRNA deregulation in diseases and serve as a valuable resource for investigating the mechanism of disease-related tsRNAs.tsRNADisease is freely available at http://www.compgenelab.info/tsRNADisease.
基金supported by National Key R&D Program of China(No.2021YFF0501301)the National Natural Science Foundation of China(No.42172231)。
文摘0 INTRODUCTION Earth science is a natural science concerned with the composition,dynamics,spatiotemporal evolution,and formation mechanisms of Earth materials(Chen and Yang,2023).Traditional Earth science research has largely been discipline-based,relying on field investigations,data collection,experimental analyses,and data interpretation to study individual components of the Earth system.
基金supported by the National Key R&D Program of China[Grant No.2023YFF0713600]the National Natural Science Foundation of China[Grant No.62275062]+3 种基金Project of Shandong Innovation and Startup Community of High-end Medical Apparatus and Instruments[Grant No.2023-SGTTXM-002 and 2024-SGTTXM-005]the Shandong Province Technology Innovation Guidance Plan(Central Leading Local Science and Technology Development Fund)[Grant No.YDZX2023115]the Taishan Scholar Special Funding Project of Shandong Provincethe Shandong Laboratory of Advanced Biomaterials and Medical Devices in Weihai[Grant No.ZL202402].
文摘Photoacoustic-computed tomography is a novel imaging technique that combines high absorption contrast and deep tissue penetration capability,enabling comprehensive three-dimensional imaging of biological targets.However,the increasing demand for higher resolution and real-time imaging results in significant data volume,limiting data storage,transmission and processing efficiency of system.Therefore,there is an urgent need for an effective method to compress the raw data without compromising image quality.This paper presents a photoacoustic-computed tomography 3D data compression method and system based on Wavelet-Transformer.This method is based on the cooperative compression framework that integrates wavelet hard coding with deep learning-based soft decoding.It combines the multiscale analysis capability of wavelet transforms with the global feature modeling advantage of Transformers,achieving high-quality data compression and reconstruction.Experimental results using k-wave simulation suggest that the proposed compression system has advantages under extreme compression conditions,achieving a raw data compression ratio of up to 1:40.Furthermore,three-dimensional data compression experiment using in vivo mouse demonstrated that the maximum peak signal-to-noise ratio(PSNR)and structural similarity index(SSIM)values of reconstructed images reached 38.60 and 0.9583,effectively overcoming detail loss and artifacts introduced by raw data compression.All the results suggest that the proposed system can significantly reduce storage requirements and hardware cost,enhancing computational efficiency and image quality.These advantages support the development of photoacoustic-computed tomography toward higher efficiency,real-time performance and intelligent functionality.
文摘Amid the increasing demand for data sharing,the need for flexible,secure,and auditable access control mechanisms has garnered significant attention in the academic community.However,blockchain-based ciphertextpolicy attribute-based encryption(CP-ABE)schemes still face cumbersome ciphertext re-encryption and insufficient oversight when handling dynamic attribute changes and cross-chain collaboration.To address these issues,we propose a dynamic permission attribute-encryption scheme for multi-chain collaboration.This scheme incorporates a multiauthority architecture for distributed attribute management and integrates an attribute revocation and granting mechanism that eliminates the need for ciphertext re-encryption,effectively reducing both computational and communication overhead.It leverages the InterPlanetary File System(IPFS)for off-chain data storage and constructs a cross-chain regulatory framework—comprising a Hyperledger Fabric business chain and a FISCO BCOS regulatory chain—to record changes in decryption privileges and access behaviors in an auditable manner.Security analysis shows selective indistinguishability under chosen-plaintext attack(sIND-CPA)security under the decisional q-Parallel Bilinear Diffie-Hellman Exponent Assumption(q-PBDHE).In the performance and experimental evaluations,we compared the proposed scheme with several advanced schemes.The results show that,while preserving security,the proposed scheme achieves higher encryption/decryption efficiency and lower storage overhead for ciphertexts and keys.
文摘With the popularization of new technologies,telephone fraud has become the main means of stealing money and personal identity information.Taking inspiration from the website authentication mechanism,we propose an end-to-end datamodem scheme that transmits the caller’s digital certificates through a voice channel for the recipient to verify the caller’s identity.Encoding useful information through voice channels is very difficult without the assistance of telecommunications providers.For example,speech activity detection may quickly classify encoded signals as nonspeech signals and reject input waveforms.To address this issue,we propose a novel modulation method based on linear frequency modulation that encodes 3 bits per symbol by varying its frequency,shape,and phase,alongside a lightweightMobileNetV3-Small-based demodulator for efficient and accurate signal decoding on resource-constrained devices.This method leverages the unique characteristics of linear frequency modulation signals,making them more easily transmitted and decoded in speech channels.To ensure reliable data delivery over unstable voice links,we further introduce a robust framing scheme with delimiter-based synchronization,a sample-level position remedying algorithm,and a feedback-driven retransmission mechanism.We have validated the feasibility and performance of our system through expanded real-world evaluations,demonstrating that it outperforms existing advanced methods in terms of robustness and data transfer rate.This technology establishes the foundational infrastructure for reliable certificate delivery over voice channels,which is crucial for achieving strong caller authentication and preventing telephone fraud at its root cause.
基金supported by the School of Digital Science,Universiti Brunei Darussalam,Brunei.
文摘Artificial Intelligence(AI)in healthcare enables predicting diabetes using data-driven methods instead of the traditional ways of screening the disease,which include hemoglobin A1c(HbA1c),oral glucose tolerance test(OGTT),and fasting plasma glucose(FPG)screening techniques,which are invasive and limited in scale.Machine learning(ML)and deep neural network(DNN)models that use large datasets to learn the complex,nonlinear feature interactions,but the conventional ML algorithms are data sensitive and often show unstable predictive accuracy.Conversely,DNN models are more robust,though the ability to reach a high accuracy rate consistently on heterogeneous datasets is still an open challenge.For predicting diabetes,this work proposed a hybrid DNN approach by integrating a bidirectional long short-term memory(BiLSTM)network with a bidirectional gated recurrent unit(BiGRU).A robust DL model,developed by combining various datasets with weighted coefficients,dense operations in the connection of deep layers,and the output aggregation using batch normalization and dropout functions to avoid overfitting.The goal of this hybrid model is better generalization and consistency among various datasets,which facilitates the effective management and early intervention.The proposed DNN model exhibits an excellent predictive performance as compared to the state-of-the-art and baseline ML and DNN models for diabetes prediction tasks.The robust performance indicates the possible usefulness of DL-based models in the development of disease prediction in healthcare and other areas that demand high-quality analytics.
基金The National Natural Science Foundation of China(No.52470211)Special Foundation of Jiangsu Province Science and Technology Plan(No.BZ2024017)RECLAIM Network Plus Project(No.EP/W034034/1).
文摘Reducing carbon emissions is fundamental to achieving carbon neutrality.Existing studies have typically estimated emissions by predicting fossil fuel consumption across sectors under different socioeconomic scenarios;however,uncertainties in future development often lead to deviations from these assumptions.To address this limitation,this study proposes a data-driven approach for evaluating national carbon emissions using historical data.Countries with similar energy consumption patterns were selected as reference samples,and their emission pathways were analyzed to predict future emissions for countries that have not yet reached their peak.Key indicators,including peak levels,timing,plateau duration,and post-peak decline rates,were identified.The results indicate that the trends in unpeaked economies can be effectively assessed based on the emission patterns of countries with comparable energy structures.Applying this framework to China suggests a carbon peak between 2027 and 2030,in the range of 14.207 to 16.234 Gt,followed by a gradual decline from 2031 to 2036.Compared with the average results of the existing studies,the predicted minimum and maximum emissions show error margins of 10.1% and 1.41%,respectively.This study proposes a top-down methodology that provides a transparent,reproducible,and empirical framework for forecasting carbon emission pathways,thereby offering a scientific basis for assessing countries that have not yet reached their emissions peak.