Parkinson’s disease (PD) is a debilitating neurological disorder affecting over 10 million people worldwide. PD classification models using voice signals as input are common in the literature. It is believed that deep learning algorithms can further enhance performance; nevertheless, this is challenging because PD datasets are typically small-scale and imbalanced. This paper proposes a convolutional neural network-based deep support vector machine (CNN-DSVM) that automates feature extraction with the CNN and extends the conventional SVM to a DSVM for better classification performance on small-scale PD datasets. A customized kernel function reduces the impact of biased classification towards the majority class (healthy candidates in our setting). An improved generative adversarial network (IGAN) was designed to generate additional training data to enhance the model’s performance. In the performance evaluation, the proposed algorithm achieves a sensitivity of 97.6% and a specificity of 97.3%. The performance comparison is evaluated from five perspectives, including comparisons with different data generation algorithms, feature extraction techniques, kernel functions, and existing works. Results reveal the effectiveness of the IGAN algorithm, which improves sensitivity and specificity by 4.05%–4.72% and 4.96%–5.86%, respectively, and the effectiveness of the CNN-DSVM algorithm, which improves sensitivity by 1.24%–57.4% and specificity by 1.04%–163% while reducing biased detection towards the majority class. Ablation experiments confirm the effectiveness of the individual components. Two future research directions are also suggested.
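The abstract does not specify the customized kernel, but the general idea of countering majority-class bias can be illustrated with "balanced" per-sample weights applied to an RBF Gram matrix. The sketch below is a minimal NumPy illustration of that weighting idea, not the paper's actual CNN-DSVM kernel; the data and function names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Standard RBF kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def class_weighted_gram(X, y, gamma=0.5):
    """Scale each sample's Gram-matrix row by the inverse of its class
    frequency ('balanced' weights), so minority-class samples contribute
    more to a downstream kernel classifier's objective."""
    K = rbf_kernel(X, X, gamma)
    counts = np.bincount(y)
    w = len(y) / (len(counts) * counts[y])  # per-sample balanced weight
    return K * w[:, None]

# Toy imbalanced set: six "healthy" (0) vs. two "PD" (1) samples.
X = np.array([[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [2.0], [2.1]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
G = class_weighted_gram(X, y)
```

With six majority and two minority samples, the minority rows are scaled by 2.0 and the majority rows by roughly 0.67, so misclassifying a minority sample costs proportionally more.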
We develop various statistical methods important for multidimensional genetic data analysis, and establish theorems justifying their application. We concentrate on multifactor dimensionality reduction, logic regression, random forests, and stochastic gradient boosting, along with their new modifications. We use complementary approaches to study the risk of complex diseases such as cardiovascular ones. The roles of certain combinations of single nucleotide polymorphisms and non-genetic risk factors are examined. To perform the data analysis concerning coronary heart disease and myocardial infarction, the Lomonosov Moscow State University supercomputer “Chebyshev” was employed.
The unique composition of milk makes this basic foodstuff an exceptional raw material for the production of new ingredients with desired properties and diverse applications in the food industry. The fractionation of milk is key to the development of those ingredients and products; hence, continuous research and development in this field, especially on various levels of fractionation and separation by filtration, have been carried out. This review focuses on the production of milk fractions as well as their particular properties, applications, and processes that increase their exploitation. Whey proteins and caseins from the protein fraction are excellent emulsifiers and protein supplements. Besides, they can be chemically or enzymatically modified to obtain bioactive peptides with numerous functional and nutritional properties. In this context, valorization techniques are being developed for cheese-whey proteins, a by-product of the dairy industry that constitutes both an economic and an environmental problem. Phospholipids from the milk fat fraction are powerful emulsifiers and also have exclusive nutraceutical properties. In addition, enzymatic modification of milk phospholipids makes it possible to tailor emulsifiers with particular properties. However, several aspects remain to be overcome; these concern a deeper understanding of the health, functional, and nutritional properties of these new ingredients, which might otherwise be barriers to their use and acceptability. Additionally, this review also introduces alternative applications of milk constituents in non-food areas, such as the manufacture of plastic materials and textile fibers.
Unmet needs, cross-fertilization between various protein domains, carbon footprint requirements, environmental necessities, new health and wellness demands, etc., are dominant factors in the search for innovation approaches; these factors also outline the further innovation potential deriving from those “apparent” constraints, obliging science and technology to take them into account.
Smart materials, along with innovation and artificial intelligence, are among the most used “buzz” words in all media. Central to their practical realization, many talents must be gathered within new contextual data influxes. Has this, in the last 20 years, changed some of the essential fundamental dimensions and the required skills of the actors, such as providers, users, and insiders? This is a preliminary focus and prelude of this review. As an example, polysaccharide materials are the most abundant macromolecules present as an integral part of the natural system of our planet. They are renewable, biodegradable, and carbon neutral, with low environmental, health, and safety risks, and serve as structural materials in the cell walls of plants. Many of them have long been used as engineering materials in important industrial processes, such as pulp and papermaking and the manufacture of synthetic textile fibres. They are also used in other domains, such as conversion into biofuels and, more recently, in the design of processes using polysaccharide nanoparticles. The main properties of polysaccharides (e.g., low density, thermal stability, chemical resistance, high mechanical strength), together with their biocompatibility, biodegradability, functionality, durability, and uniformity, allow their use for manufacturing smart materials such as blends and composites, electroactive polymers, and hydrogels, which can be obtained 1) through direct utilization and/or 2) after chemical or physical modification of the polysaccharides.
This paper reviews recent works on polysaccharides, mainly cellulose, hemicelluloses, chitin, chitosans, alginates, and their derivatives (blends and composites), with the objective of manufacturing smart materials. It is worth noting that, today, the fundamental understanding of the molecular-level interactions that confer smartness to polysaccharides remains poor. One can predict that new experimental and theoretical tools will emerge to develop the necessary understanding of the structure-property-function relationships, enabling polysaccharide smartness to be better understood and controlled and giving rise to new and innovative applications in nanotechnology, foods, cosmetics, and medicine (e.g., controlled drug release and regenerative medicine), thus opening up major commercial markets in the context of green chemistry.
0 INTRODUCTION Earth science is a natural science concerned with the composition, dynamics, spatiotemporal evolution, and formation mechanisms of Earth materials (Chen and Yang, 2023). Traditional Earth science research has largely been discipline-based, relying on field investigations, data collection, experimental analyses, and data interpretation to study individual components of the Earth system.
This research paper describes the design and implementation of the Consultative Committee for Space Data Systems (CCSDS) standards [1] for the Space Data Link Layer Protocol (SDLP). The primary focus is the telecommand (TC) part of the standard. The standard was implemented as DLL functions using the C++ programming language. The second objective of this paper was to use the DLL functions within the OMNeT++ simulation environment to create a simulator in order to analyze the mean end-to-end packet delay, the maximum achievable application-layer throughput for a given fixed link capacity, and the normalized protocol overhead, defined as the total number of bytes transmitted on the link in a given period of time (e.g., per second) divided by the number of bytes of application data received at the application-layer model data sink. In addition, the DLL was also integrated with the Ground Support Equipment Operating System (GSEOS), a software system for space instruments and small spacecraft especially suited for low-budget missions. The SDLP implementation is designed for rapid test-system design and high flexibility for changing telemetry and command requirements. GSEOS can be seamlessly moved from EM/FM development (bench testing) to flight operations. It features the Python programming language as a configuration/scripting tool and can easily be extended to accommodate custom hardware interfaces. This paper also presents the simulation results and their analysis.
Earthquakes are highly destructive spatio-temporal phenomena whose analysis is essential for disaster preparedness and risk mitigation. Modern seismological research produces vast volumes of heterogeneous data from seismic networks, satellite observations, and geospatial repositories, creating the need for scalable infrastructures capable of integrating and analyzing such data to support intelligent decision-making. Data warehousing technologies provide a robust foundation for this purpose; however, existing earthquake-oriented data warehouses remain limited, often relying on simplified schemas, domain-specific analytics, or cataloguing efforts. This paper presents the design and implementation of a spatio-temporal data warehouse for seismic activity. The framework integrates spatial and temporal dimensions in a unified schema and introduces a novel array-based approach for managing many-to-many relationships between facts and dimensions without intermediate bridge tables. A comparative evaluation against a conventional bridge-table schema demonstrates that the array-based design improves fact-centric query performance, while the bridge-table schema remains advantageous for dimension-centric queries. To reconcile these trade-offs, a hybrid schema is proposed that retains both representations, ensuring balanced efficiency across heterogeneous workloads. The proposed framework demonstrates how spatio-temporal data warehousing can address schema complexity, improve query performance, and support multidimensional visualization. In doing so, it provides a foundation for integrating seismic analysis into broader big-data-driven intelligent decision systems for disaster resilience, risk mitigation, and emergency management.
Ovarian cancer (OC) is one of the leading causes of death related to gynecological cancer, the main difficulties being its early diagnosis and the heterogeneous nature of tumor biomarkers. Machine learning (ML) has the potential to process complex datasets and support decision-making in OC diagnosis. Nevertheless, traditional ML models tend to be biased, prone to overfitting and noise, and less generalizable. Moreover, their black-box nature reduces interpretability and limits their practical clinical applicability. In this study, we introduce an explainable ensemble learning (EL) model, TreeX-Stack, based on a stacking architecture that employs tree-based learners, namely Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), and Extreme Gradient Boosting (XGBoost), as base learners, and Logistic Regression (LR) as the meta-learner to enhance OC diagnosis. Local Interpretable Model-Agnostic Explanations (LIME) are used to explain individual predictions, making the model outputs more clinically interpretable and applicable. The model is trained on a dataset that includes demographic information, blood tests, general chemistry, and tumor markers. Extensive preprocessing includes handling missing data using iterative imputation with Bayesian Ridge and addressing multicollinearity by removing features with correlation coefficients above 0.7. Relevant features are then selected using the Boruta feature selection method. To obtain robust and unbiased performance estimates during hyperparameter tuning, nested cross-validation (CV) with grid search is employed, and all experiments are repeated five times to ensure statistical reliability. TreeX-Stack demonstrates excellent diagnostic performance, achieving an accuracy of 0.9027, a precision of 0.8673, a recall of 0.9391, and an F1-score of 0.9012. Feature-importance analyses using LIME and permutation importance highlight Human Epididymis Protein 4 (HE4) as the most significant biomarker for OC. The combination of high predictive performance and interpretability makes TreeX-Stack a reliable tool for clinical decision support in OC diagnosis.
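TreeX-Stack's full pipeline (Boruta selection, nested CV, LIME, XGBoost) is not reproduced here, but the core stacking architecture, tree-based base learners with a logistic-regression meta-learner, can be sketched in a few lines of scikit-learn on synthetic data. XGBoost is omitted to keep the sketch self-contained; the dataset and hyperparameters are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a tabular diagnostic dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Tree-based base learners; logistic regression combines their
# cross-validated predictions as the meta-learner.
stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(Xtr, ytr)
acc = stack.score(Xte, yte)
```

On real clinical data the base learners' out-of-fold predictions give the meta-learner a lower-variance view than any single tree ensemble, which is the rationale for stacking here.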
Over the last few years, the Internet of Things (IoT) has become an omnipresent term. The IoT expands the existing common concepts of anytime and anyplace to connectivity for anything. The proliferation of IoT offers opportunities but may also bear risks. A hitherto neglected aspect is the possible increase in power consumption, as smart devices in IoT applications are expected to be reachable by other devices at all times. This implies that a device consumes electrical energy even when it is not in use for its primary function. Many research communities have started addressing the storage capability, such as the cache memory of smart devices, using the concept of Named Data Networking (NDN) to achieve a more energy-efficient communication model. In NDN, memory or buffer overflow is a common challenge, especially when the internal memory of a node exceeds its limit: data with the highest degree of freshness may not be accommodated, and the entire scenario behaves like a traditional network. In such cases, data caching is not performed by intermediate nodes to guarantee the highest degree of freshness. For the periodic updates sent by data producers, it is strongly required that data consumers get up-to-date information at the cost of the least energy. Consequently, there is a challenge in maintaining the trade-off between freshness and energy consumption during publisher-subscriber interaction. In our work, we propose an architecture that overcomes this caching-strategy issue with a Smart Caching Algorithm for improved memory management and data freshness. The smart caching strategy updates the data at precise intervals while taking garbage data into consideration. It is also observed from experiments that data redundancy can easily be avoided by ignoring/dropping data packets carrying information that is not of interest to other participating nodes in the network, ultimately optimizing the trade-off between freshness and the energy required.
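As a rough illustration of the freshness/overflow trade-off described above, the following sketch evicts the stalest entry only when an incoming item is fresher, so a full buffer never displaces fresher data with staler data. This is a hypothetical simplification for illustration, not the paper's Smart Caching Algorithm; the class and names are invented.

```python
class FreshnessCache:
    """Fixed-capacity content store that keeps the freshest data:
    on overflow, the stalest entry is evicted, and only if the
    incoming item is fresher than that stalest entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}  # name -> (data, freshness)

    def put(self, name, data, freshness):
        if name not in self.store and len(self.store) >= self.capacity:
            stalest = min(self.store, key=lambda n: self.store[n][1])
            if self.store[stalest][1] >= freshness:
                return False  # incoming item is no fresher: drop it
            del self.store[stalest]
        self.store[name] = (data, freshness)
        return True

    def get(self, name):
        entry = self.store.get(name)
        return entry[0] if entry else None

cache = FreshnessCache(capacity=2)
cache.put("/sensor/a", b"t=1", freshness=1)
cache.put("/sensor/b", b"t=2", freshness=2)
accepted = cache.put("/sensor/c", b"t=3", freshness=3)  # evicts /sensor/a
rejected = cache.put("/sensor/d", b"t=0", freshness=0)  # staler: dropped
```

The rejection branch is what distinguishes this from plain LRU: a stale update never costs the cache a fresher entry.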
Gastrointestinal tumors require personalized treatment strategies due to their heterogeneity and complexity. Multimodal artificial intelligence (AI) addresses this challenge by integrating diverse data sources, including computed tomography (CT), magnetic resonance imaging (MRI), endoscopic imaging, and genomic profiles, to enable intelligent decision-making for individualized therapy. This approach leverages AI algorithms to fuse imaging, endoscopic, and omics data, facilitating comprehensive characterization of tumor biology, prediction of treatment response, and optimization of therapeutic strategies. By combining CT and MRI for structural assessment, endoscopic data for real-time visual inspection, and genomic information for molecular profiling, multimodal AI enhances the accuracy of patient stratification and treatment personalization. The clinical implementation of this technology demonstrates potential for improving patient outcomes, advancing precision oncology, and supporting individualized care in gastrointestinal cancers. Ultimately, multimodal AI serves as a transformative tool in oncology, bridging data integration with clinical application to effectively tailor therapies.
Benthic habitat mapping is an emerging discipline in the international marine field in recent years, providing an effective tool for marine spatial planning, marine ecological management, and decision-making applications. Seabed sediment classification is one of the main components of seabed habitat mapping. In response to the limitations of remote sensing imaging quality and acoustic measurement range, where a single data source does not fully reflect the substrate type, we propose a high-precision seabed sediment classification method that integrates data from multiple sources. Based on WorldView-2 multispectral remote sensing imagery and multibeam bathymetry data, we constructed a random forest (RF) classifier with optimal feature selection. A seabed sediment classification experiment integrating optical and acoustic remote sensing data was carried out in the shallow water area of Wuzhizhou Island, Hainan, South China. Different seabed sediment types, such as sand, seagrass, and coral reefs, were effectively identified, with an overall classification accuracy of 92%. Experimental results show that the RF classifier optimized through feature selection on the fused multi-source remote sensing data outperformed classifiers trained on simple combinations of data sources, improving the accuracy of seabed sediment classification. Therefore, the method proposed in this paper can be effectively applied to high-precision seabed sediment classification and habitat mapping around islands and reefs.
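A minimal sketch of the feature-level fusion plus RF feature selection described above, using synthetic stand-ins for the optical (multispectral) and acoustic (bathymetry-derived) features. The data, feature counts, and the mean-importance threshold are all assumptions for illustration, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 300
# Hypothetical fused feature sets: 8 optical band features and
# 4 bathymetry-derived features per seabed cell.
optical = rng.normal(size=(n, 8))
acoustic = rng.normal(size=(n, 4))
labels = (optical[:, 0] + acoustic[:, 0] > 0).astype(int)  # toy 2-class map
X = np.hstack([optical, acoustic])  # feature-level fusion

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

# Keep only features above the mean importance, then refit on the subset.
keep = rf.feature_importances_ > rf.feature_importances_.mean()
rf_sel = RandomForestClassifier(n_estimators=100, random_state=0)
rf_sel.fit(X[:, keep], labels)
train_acc = rf_sel.score(X[:, keep], labels)
```

In practice the selection threshold and the validation protocol (e.g., held-out ground-truth samples) matter far more than this toy refit suggests.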
Switzerland is one of the most desirable European destinations for Chinese tourists; therefore, a better understanding of Chinese tourists is essential for successful business practices. In China, the largest and leading social media platform—Sina Weibo, a hybrid of Twitter and Facebook—has more than 600 million users. Weibo’s great market penetration suggests that tourism operators and marketers need to understand how to build effective and sustainable communications on Chinese social media platforms. In order to offer a better decision support platform to tourism destination managers as well as Chinese tourists, we propose a framework using linked data on Sina Weibo. Linked Data is a term referring to using the Internet to connect related data. We show how it can be used and how an ontology can be designed to include the users’ context (e.g., GPS locations). Our framework provides a good theoretical foundation for further understanding Chinese tourists’ expectations, experiences, behaviors, and new trends in Switzerland.
In Chinese language studies, both “The Textual Research on Historical Documents” and “The Comparative Study of Historical Data” are traditional methodologies, and both deserve to be treasured, passed on, and further developed. It will certainly harm the development of academic research if either of the two methods is given unreasonable priority. The author claims that the best, or one of the best, methodologies for the historical study of the Chinese language is the combination of the two, hence a new interpretation of “The Double-Proof Method”. Meanwhile, this essay also attempts to put forward “The Law of Quan-ma and Gui-mei” in Chinese language studies, in which the author argues that it is not advisable either to treat Gui-mei as Quan-ma or vice versa in linguistic research. It is crucial for us always to respect the language facts first, which is considered the very soul of linguistics.
The relatively rapid recession of glaciers in the Himalayas and the formation of moraine-dammed glacial lakes (MDGLs) in the recent past have increased the risk of glacier lake outburst floods (GLOFs) in Nepal and Bhutan and in the mountainous territory of Sikkim in India. As a product of climate change and global warming, such risk has not only raised the level of threat to the habitation and infrastructure of the region, but has also contributed to the worsening of the balance of the unique ecosystem that exists in this domain, which sustains several of the highest mountain peaks of the world. This study attempts to present up-to-date mapping of the MDGLs in the central and eastern Himalayan regions using remote sensing data, with the objective of analysing their surface-area variations over time from 1990 through 2015, disaggregated over six episodes. The study also includes the evaluation of the susceptibility of MDGLs to GLOFs with the least criteria decision analysis (LCDA). Forty-two major MDGLs, each having a lake surface area greater than 0.2 km², identified in the Himalayan ranges of Nepal, Bhutan, and Sikkim, have been categorized according to their surface-area expansion rates in space and time. The lakes are located within the elevation range of 3800 m to 6800 m above mean sea level (amsl). With a total surface area of 37.9 km², these MDGLs as a whole were observed to have expanded by an astonishing 43.6% in area over the 25-year period of this study. A factor is introduced to numerically sort the lakes in terms of their relative yearly expansion rates, based on the interpretation of their surface-area extents from satellite imagery. Verification of predicted GLOF events in the past using this factor, with the limited field data reported in the literature, indicates that the present analysis may be considered a sufficiently reliable and rapid technique for assessing the potential bursting susceptibility of the MDGLs. The analysis also indicates that, as of now, there are eight MDGLs in the region which appear to be in highly vulnerable states and have a high chance of causing GLOF events in the near future.
The Mountain Science Data Center (MSDC), founded in 2021 under the Institute of Mountain Hazards and Environment (IMHE), Chinese Academy of Sciences, manages the entire lifecycle of mountain science data. It integrates data from diverse sources, including debris flows, landslides, soils, ecology, geology, natural resources, basic geographic information, and socio-economic data. The center provides comprehensive services, including data collection, processing, analysis tools, modeling, and application support, offering reliable data backing for numerous research projects within the institute. Data management and services are accessible via the Mountain Science Data Center Portal (https://www.msdc.ac.cn/), ensuring long-term, stable, and trustworthy access to facilitate scientific research and institutional development.
Regional healthcare platforms collect clinical data from hospitals in specific areas for the purpose of healthcare management. It is a common requirement to reuse these data for clinical research. However, we have to face challenges such as the inconsistency of terminology in electronic health records (EHRs) and the complexities of data quality and data formats on regional healthcare platforms. In this paper, we propose a methodology and process for constructing large-scale cohorts, which form the basis of causality and comparative-effectiveness relationships in epidemiology. We first constructed a Chinese terminology knowledge graph to deal with the diversity of vocabularies on the regional platform. Second, we built special disease case repositories (e.g., a heart failure repository) that utilize the graph to search for related patients and to normalize the data. Based on the requirements of a clinical research study that aimed to explore the effect of taking statins on 180-day readmission in patients with heart failure, we built a large-scale retrospective cohort with 29,647 cases of heart failure patients from the heart failure repository. After propensity score matching, a study group (n=6346) and a control group (n=6346) with parallel clinical characteristics were acquired. Logistic regression analysis showed that taking statins had a negative correlation with 180-day readmission in heart failure patients. This paper presents the workflow and an application example of big data mining based on regional EHR data.
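The propensity score matching step mentioned above can be sketched as 1:1 greedy nearest-neighbour matching on logistic-regression scores. The synthetic "treatment" assignment below is purely illustrative and not the paper's cohort; variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                   # baseline covariates
treated = X[:, 0] + rng.normal(size=n) > 0    # treatment confounded by X

# Propensity score: P(treated | covariates) from logistic regression.
ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

# Greedy 1:1 nearest-neighbour matching without replacement.
control_idx = np.where(~treated)[0].tolist()
pairs = []
for t in np.where(treated)[0]:
    if not control_idx:
        break
    j = min(control_idx, key=lambda c: abs(ps[c] - ps[t]))
    pairs.append((t, j))
    control_idx.remove(j)  # each control is used at most once
```

Real PSM pipelines typically add a caliper (a maximum allowed score distance) and check covariate balance after matching; both are omitted here for brevity.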
Translation is a crucial step in gene expression. Over the past decade, the development and application of ribosome profiling (Ribo-seq) have significantly advanced our understanding of translational regulation in vivo. However, the analysis and visualization of Ribo-seq data remain challenging. Despite the availability of various analytical pipelines, improvements in comprehensiveness, accuracy, and user-friendliness are still necessary. In this study, we develop RiboParser/RiboShiny, a robust framework for analyzing and visualizing Ribo-seq data. Building on published methods, we optimize ribosome structure-based and start/stop-based models to improve the accuracy and stability of P-site detection, even in species with a high proportion of leaderless transcripts. Leveraging these improvements, RiboParser offers comprehensive analyses, including quality control, gene-level analysis, codon-level analysis, and the analysis of Ribo-seq variants. Meanwhile, RiboShiny provides a user-friendly and adaptable platform for data visualization, facilitating deeper insights into the translational landscape. Furthermore, the integration of standardized genome annotation renders our platform universally applicable to organisms with sequenced genomes. This framework has the potential to significantly improve the precision and efficiency of Ribo-seq data interpretation, thereby deepening our understanding of translational regulation.
Deep neural networks have achieved excellent classification results on several computer vision benchmarks. This has led to the popularity of machine learning as a service, where trained algorithms are hosted on the cloud and inference can be obtained on real-world data. In most applications, it is important to compress the vision data due to the enormous bandwidth and memory requirements. Video codecs exploit spatial and temporal correlations to achieve high compression ratios, but they are computationally expensive. This work computes the motion fields between consecutive frames to facilitate the efficient classification of videos. However, contrary to the normal practice of reconstructing the full-resolution frames through motion compensation, this work proposes to infer the class label directly from the block-based computed motion fields. Motion fields are a richer and more complex representation than raw motion vectors, where each motion vector carries magnitude and direction information. This approach has two advantages: the cost of motion compensation and video decoding is avoided, and the dimensions of the input signal are greatly reduced. This results in a shallower network for classification. The neural network can be trained using motion vectors in two ways: as complex representations or as magnitude-direction pairs. The proposed work trains a convolutional neural network on the direction and magnitude tensors of the motion fields. Our experimental results show 20× faster convergence during training, reduced overfitting, and accelerated inference on a hand gesture recognition dataset compared to full-resolution and downsampled frames. We validate the proposed methodology on the HGds dataset, achieving a testing accuracy of 99.21%, on the HMDB51 dataset, achieving 82.54% accuracy, and on the UCF101 dataset, achieving 97.13% accuracy, outperforming state-of-the-art methods in computational efficiency.
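The magnitude-direction tensors mentioned above can be derived from a block-based motion-vector field in a few NumPy lines. The field shape and channel layout here are assumptions for illustration, not the paper's exact input format.

```python
import numpy as np

def motion_field_to_tensor(mv):
    """Convert a block-based motion-vector field of shape (H, W, 2),
    holding (dx, dy) per block, into a 2-channel magnitude/direction
    tensor of shape (2, H, W) suitable as CNN input."""
    dx, dy = mv[..., 0], mv[..., 1]
    mag = np.hypot(dx, dy)      # per-block motion magnitude
    ang = np.arctan2(dy, dx)    # per-block motion direction (radians)
    return np.stack([mag, ang], axis=0)

# Toy 4x4 field of motion vectors with a single moving block.
mv = np.zeros((4, 4, 2))
mv[0, 0] = (3.0, 4.0)
tensor = motion_field_to_tensor(mv)
```

Because the field is block-based (one vector per macroblock rather than per pixel), the resulting tensor is far smaller than a decoded frame, which is what permits the shallower classification network.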
High-throughput transcriptomics has evolved from bulk RNA-seq to single-cell and spatial profiling, yet its clinical translation still depends on effective integration across diverse omics and data modalities. Emerging foundation models and multimodal learning frameworks are enabling scalable and transferable representations of cellular states, while advances in interpretability and real-world data integration are bridging the gap between discovery and clinical application. This paper outlines a concise roadmap for AI-driven, transcriptome-centered multi-omics integration in precision medicine (Figure 1).
Funding: The work described in this paper was fully supported by a grant from Hong Kong Metropolitan University (RIF/2021/05).
Abstract: Parkinson's disease (PD) is a debilitating neurological disorder affecting over 10 million people worldwide. PD classification models using voice signals as input are common in the literature. Deep learning algorithms are believed to further enhance performance; nevertheless, applying them is challenging because PD datasets are typically small and imbalanced. This paper proposes a convolutional neural network-based deep support vector machine (CNN-DSVM) that automates feature extraction with the CNN and extends the conventional SVM to a DSVM for better classification performance on small-scale PD datasets. A customized kernel function reduces the impact of biased classification towards the majority class (healthy candidates in our setting). An improved generative adversarial network (IGAN) was designed to generate additional training data and enhance the model's performance. In the performance evaluation, the proposed algorithm achieves a sensitivity of 97.6% and a specificity of 97.3%. The performance comparison covers five perspectives, including comparisons with different data generation algorithms, feature extraction techniques, kernel functions, and existing works. Results reveal the effectiveness of the IGAN algorithm, which improves sensitivity by 4.05%–4.72% and specificity by 4.96%–5.86%, and of the CNN-DSVM algorithm, which improves sensitivity by 1.24%–57.4% and specificity by 1.04%–163% while reducing biased detection towards the majority class. Ablation experiments confirm the effectiveness of the individual components. Two future research directions are also suggested.
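As a rough illustration of the class-imbalance handling described above, the sketch below substitutes scikit-learn's `class_weight="balanced"` SVM for the paper's customized kernel (whose form is not specified here); the data, class sizes, and feature dimensions are all synthetic assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic imbalanced data: 90 "healthy" (majority) vs 10 "PD" (minority)
# samples with 8 hypothetical voice features each.
rng = np.random.default_rng(0)
X_major = rng.normal(loc=0.0, scale=1.0, size=(90, 8))
X_minor = rng.normal(loc=2.0, scale=1.0, size=(10, 8))
X = np.vstack([X_major, X_minor])
y = np.array([0] * 90 + [1] * 10)

# class_weight="balanced" rescales the per-class penalty inversely to
# class frequency: a standard stand-in for a kernel designed to reduce
# bias toward the majority class.
clf = SVC(kernel="rbf", class_weight="balanced", gamma="scale")
clf.fit(X, y)

pred = clf.predict(X)
sensitivity = (pred[y == 1] == 1).mean()   # true-positive rate on PD class
specificity = (pred[y == 0] == 0).mean()   # true-negative rate on healthy
```

Without the class weighting, a small minority class like this tends to be absorbed into the majority decision region, which is exactly the bias the paper's kernel targets.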
Abstract: We develop various statistical methods important for multidimensional genetic data analysis and establish theorems justifying their application. We concentrate on multifactor dimensionality reduction, logic regression, random forests, and stochastic gradient boosting, along with new modifications of these methods. We use complementary approaches to study the risk of complex diseases such as cardiovascular disease, examining the roles of certain combinations of single nucleotide polymorphisms and non-genetic risk factors. The data analysis concerning coronary heart disease and myocardial infarction was performed on the Lomonosov Moscow State University supercomputer "Chebyshev".
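A minimal sketch of the tree-ensemble side of the toolbox (random forests and gradient boosting) on a toy SNP matrix; the genotype coding, the epistatic target, and all sample sizes are invented for illustration and have no connection to the study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Toy data: 10 SNPs coded 0/1/2 plus one non-genetic covariate (age);
# disease status depends on an interaction of loci 0 and 3, the kind of
# combined effect MDR-style methods are designed to detect.
rng = np.random.default_rng(1)
n = 400
snps = rng.integers(0, 3, size=(n, 10))
age = rng.normal(55, 10, size=(n, 1))
X = np.hstack([snps, age])
y = ((snps[:, 0] > 0) & (snps[:, 3] > 1)).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
gb = GradientBoostingClassifier(random_state=0).fit(X, y)

# With a deterministic two-locus target, the forest's importances should
# concentrate on the interacting loci.
top2 = set(np.argsort(rf.feature_importances_)[-2:])
```

Importance rankings like `top2` are one simple way such ensembles flag candidate SNP combinations for closer examination.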
Abstract: The unique composition of milk makes this basic foodstuff an exceptional raw material for the production of new ingredients with desired properties and diverse applications in the food industry. Fractionation of milk is key to the development of those ingredients and products; hence, continuous research and development in this field, especially on various levels of fractionation and separation by filtration, has been carried out. This review focuses on the production of milk fractions as well as their particular properties, applications, and processes that increase their exploitation. Whey proteins and caseins from the protein fraction are excellent emulsifiers and protein supplements. Moreover, they can be chemically or enzymatically modified to obtain bioactive peptides with numerous functional and nutritional properties. In this context, valorization techniques for cheese-whey proteins, a by-product of the dairy industry that poses both economic and environmental problems, are being developed. Phospholipids from the milk fat fraction are powerful emulsifiers and also have distinctive nutraceutical properties. In addition, enzymatic modification of milk phospholipids makes it possible to tailor emulsifiers with particular properties. However, several aspects remain to be overcome; these concern a deeper understanding of the health, functional, and nutritional properties of these new ingredients, gaps that might be barriers to their use and acceptability. Additionally, this review introduces alternative applications of milk constituents in non-food areas, such as the manufacture of plastic materials and textile fibers.
Unmet needs, cross-fertilization between various protein domains, carbon footprint requirements, environmental necessities, and new health and wellness demands are dominant factors in the search for innovation approaches; these factors also outline the further innovation potential deriving from those "apparent" constraints, which oblige science and technology to take them into account.
Abstract: Smart Materials, along with Innovation and Artificial Intelligence, are among the most used "buzz" words in all media. Central to their practical realization, many talents must be gathered within new contextual data influxes. Has this, in the last 20 years, changed any of the essential fundamental dimensions and the required skills of actors such as providers, users, and insiders? This is a preliminary focus and prelude of this review. As an example, polysaccharide materials are the most abundant macromolecules present as an integral part of the natural system of our planet. They are renewable, biodegradable, and carbon neutral, with low environmental, health, and safety risks, and serve as structural materials in the cell walls of plants. Most of them have long been used as engineering materials in many important industrial processes, such as pulp and papermaking and the manufacture of synthetic textile fibres. They are also used in other domains such as conversion into biofuels and, more recently, in the design of processes using polysaccharide nanoparticles. The main properties of polysaccharides (e.g., low density, thermal stability, chemical resistance, high mechanical strength), together with their biocompatibility, biodegradability, functionality, durability, and uniformity, allow their use for manufacturing smart materials such as blends and composites, electroactive polymers, and hydrogels, which can be obtained 1) through direct utilization and/or 2) after chemical or physical modification of the polysaccharides. This paper reviews recent work on polysaccharides, mainly cellulose, hemicelluloses, chitin, chitosans, alginates, and their by-products (blends and composites), with the objective of manufacturing smart materials.
It is worth noting that, today, the fundamental understanding of the molecular-level interactions that confer smartness on polysaccharides remains poor. One can predict that new experimental and theoretical tools will emerge to develop the necessary understanding of the structure-property-function relationships, enabling polysaccharide smartness to be better understood and controlled. This would give rise to new and innovative applications in nanotechnology, foods, cosmetics, and medicine (e.g., controlled drug release and regenerative medicine), opening up major commercial markets in the context of green chemistry.
Funding: Supported by the National Key R&D Program of China (No. 2021YFF0501301) and the National Natural Science Foundation of China (No. 42172231).
Abstract: Earth science is a natural science concerned with the composition, dynamics, spatiotemporal evolution, and formation mechanisms of Earth materials (Chen and Yang, 2023). Traditional Earth science research has largely been discipline-based, relying on field investigations, data collection, experimental analyses, and data interpretation to study individual components of the Earth system.
Abstract: This research paper describes the design and implementation of the Consultative Committee for Space Data Systems (CCSDS) standards [1] for the Space Data Link Layer Protocol (SDLP). The primary focus is the telecommand (TC) part of the standard. The standard was implemented as DLL functions in the C++ programming language. The second objective of this paper was to use the DLL functions with the OMNeT++ simulation environment to create a simulator for analyzing the mean end-to-end packet delay, the maximum achievable application-layer throughput for a given fixed link capacity, and the normalized protocol overhead, defined as the total number of bytes transmitted on the link in a given period of time (e.g., per second) divided by the number of bytes of application data received at the application-layer data sink. In addition, the DLL was integrated with the Ground Support Equipment Operating System (GSEOS), a software system for space instruments and small spacecraft especially suited for low-budget missions. The SDLP implementation is designed for rapid test-system design and high flexibility under changing telemetry and command requirements. GSEOS can be seamlessly moved from EM/FM development (bench testing) to flight operations. It features the Python programming language as a configuration/scripting tool and can easily be extended to accommodate custom hardware interfaces. This paper also presents the simulation results and their analysis.
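The normalized-overhead metric defined above is simple enough to sketch directly. The byte counts below are hypothetical; the 5-byte TC primary header and 2-byte frame error control field follow the CCSDS TC framing convention, but the figures are illustrative rather than taken from the paper's simulations.

```python
# Hypothetical byte accounting for the link metrics named above.

def normalized_overhead(link_bytes: int, app_bytes: int) -> float:
    """Bytes transmitted on the link per byte of application data delivered."""
    return link_bytes / app_bytes

def throughput_bps(app_bytes: int, seconds: float) -> float:
    """Application-layer throughput in bits per second."""
    return app_bytes * 8 / seconds

# A TC Transfer Frame wraps the payload in a 5-byte primary header and an
# optional 2-byte frame error control field.
frame_overhead = 5 + 2
payload = 100            # bytes of application data per frame (assumed)
frames = 1000
link_total = frames * (payload + frame_overhead)

oh = normalized_overhead(link_total, frames * payload)
bps = throughput_bps(frames * payload, seconds=10.0)
```

An overhead of 1.0 would mean zero framing cost; values above 1.0 quantify the protocol bytes added per application byte delivered.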
Abstract: Earthquakes are highly destructive spatio-temporal phenomena whose analysis is essential for disaster preparedness and risk mitigation. Modern seismological research produces vast volumes of heterogeneous data from seismic networks, satellite observations, and geospatial repositories, creating the need for scalable infrastructures capable of integrating and analyzing such data to support intelligent decision-making. Data warehousing technologies provide a robust foundation for this purpose; however, existing earthquake-oriented data warehouses remain limited, often relying on simplified schemas, domain-specific analytics, or cataloguing efforts. This paper presents the design and implementation of a spatio-temporal data warehouse for seismic activity. The framework integrates spatial and temporal dimensions in a unified schema and introduces a novel array-based approach for managing many-to-many relationships between facts and dimensions without intermediate bridge tables. A comparative evaluation against a conventional bridge-table schema demonstrates that the array-based design improves fact-centric query performance, while the bridge-table schema remains advantageous for dimension-centric queries. To reconcile these trade-offs, a hybrid schema is proposed that retains both representations, ensuring balanced efficiency across heterogeneous workloads. The proposed framework demonstrates how spatio-temporal data warehousing can address schema complexity, improve query performance, and support multidimensional visualization. In doing so, it provides a foundation for integrating seismic analysis into broader big-data-driven intelligent decision systems for disaster resilience, risk mitigation, and emergency management.
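The array-based versus bridge-table trade-off can be mocked up in plain Python (this is our own toy model, not the paper's warehouse schema or data):

```python
# Bridge-table style: facts and dimensions linked through a separate
# association table of (fact_id, region_id) rows.
facts = {1: {"magnitude": 6.1}, 2: {"magnitude": 5.4}}
regions = {10: "coastal", 20: "inland"}
bridge = [(1, 10), (1, 20), (2, 20)]

def regions_of_fact_bridge(fact_id):
    # Fact-centric query: must scan/join the bridge table.
    return sorted(r for f, r in bridge if f == fact_id)

# Array-based style: the region ids are embedded in the fact row itself,
# so fact-centric queries need no join at all.
facts_array = {
    1: {"magnitude": 6.1, "region_ids": [10, 20]},
    2: {"magnitude": 5.4, "region_ids": [20]},
}

def regions_of_fact_array(fact_id):
    return sorted(facts_array[fact_id]["region_ids"])
```

Both forms answer fact-centric queries identically; the array form skips the bridge scan, while dimension-centric queries ("all facts in region 20") must unnest the arrays, which is the trade-off the evaluation measures and the hybrid schema reconciles.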
Funding: Supported and funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) under grant number IMSIU-DDRSP2601.
Abstract: Ovarian cancer (OC) is one of the leading causes of death related to gynecological cancer, owing mainly to the difficulty of early diagnosis and the heterogeneous nature of tumor biomarkers. Machine learning (ML) has the potential to process complex datasets and support decision-making in OC diagnosis. Nevertheless, traditional ML models tend to be biased, prone to overfitting, sensitive to noise, and poorly generalized. Moreover, their black-box nature reduces interpretability and limits their practical clinical applicability. In this study, we introduce an explainable ensemble learning (EL) model, TreeX-Stack, based on a stacking architecture that employs tree-based learners, namely Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), and Extreme Gradient Boosting (XGBoost), as base learners, and Logistic Regression (LR) as the meta-learner to enhance OC diagnosis. Local Interpretable Model-Agnostic Explanations (LIME) are used to explain individual predictions, making the model outputs more clinically interpretable and applicable. The model is trained on a dataset that includes demographic information, blood tests, general chemistry, and tumor markers. Extensive preprocessing includes handling missing data using iterative imputation with Bayesian Ridge regression and addressing multicollinearity by removing features with correlation coefficients above 0.7. Relevant features are then selected using the Boruta feature selection method. To obtain robust and unbiased performance estimates during hyperparameter tuning, nested cross-validation (CV) with grid search is employed, and all experiments are repeated five times to ensure statistical reliability. TreeX-Stack demonstrates excellent diagnostic performance, achieving an accuracy of 0.9027, a precision of 0.8673, a recall of 0.9391, and an F1-score of 0.9012. Feature-importance analyses using LIME and permutation importance highlight Human Epididymis Protein 4 (HE4) as the most significant biomarker for OC. The combination of high predictive performance and interpretability makes TreeX-Stack a reliable tool for clinical decision support in OC diagnosis.
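A hedged sketch of a TreeX-Stack-style stacking layout using scikit-learn on synthetic data; XGBoost is swapped for scikit-learn's GradientBoostingClassifier to keep the example dependency-free, and none of the study's preprocessing (imputation, Boruta, nested CV) is reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a tabular diagnostic dataset.
X, y = make_classification(n_samples=300, n_features=12, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(),   # LR meta-learner
    cv=5,   # meta-features come from out-of-fold base predictions
)
stack.fit(X, y)
acc = stack.score(X, y)
```

The `cv=5` setting matters: the meta-learner is trained on out-of-fold base-model predictions, which is what keeps a stacking ensemble from simply memorizing its base learners' training-set outputs.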
Abstract: Over the last few years, the Internet of Things (IoT) has become an omnipresent term. The IoT expands the common concepts of anytime and anyplace to connectivity for anything. The proliferation of IoT offers opportunities but may also bear risks. A hitherto neglected aspect is the possible increase in power consumption, as smart devices in IoT applications are expected to be reachable by other devices at all times. This implies that a device consumes electrical energy even when it is not in use for its primary function. Many research communities have started addressing the storage capability (e.g., cache memory) of smart devices using the concept of Named Data Networking (NDN) to achieve a more energy-efficient communication model. In NDN, memory or buffer overflow is a common challenge, especially when the internal memory of a node exceeds its limit: data with the highest degree of freshness may not be accommodated, and the entire scenario behaves like a traditional network. In such cases, intermediate nodes do not perform data caching, so the highest degree of freshness cannot be guaranteed. Given the periodic updates sent by data producers, data consumers should receive up-to-date information at the least energy cost. Consequently, there is a challenge in maintaining the trade-off between freshness and energy consumption during publisher-subscriber interaction. In our work, we propose an architecture that addresses the cache-strategy issue with a Smart Caching Algorithm for improved memory management and data freshness. The smart caching strategy updates data at precise intervals while taking garbage data into consideration. Experiments also show that data redundancy can be reduced by ignoring/dropping data packets carrying information that is of no interest to other participating nodes in the network, ultimately optimizing the trade-off between freshness and the energy required.
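The freshness/energy trade-off above can be illustrated with a toy freshness-aware cache (our own simplification, not the paper's Smart Caching Algorithm): entries carry a freshness lifetime, stale "garbage" entries are collected, and eviction prefers the entry closest to expiry. All names and lifetimes are invented.

```python
class FreshnessCache:
    """Toy NDN-style content store with per-entry freshness lifetimes."""

    def __init__(self, capacity, now=0.0):
        self.capacity = capacity
        self.now = now
        self.store = {}   # content name -> (expiry_time, data)

    def tick(self, dt):
        """Advance time and garbage-collect entries past their lifetime."""
        self.now += dt
        self.store = {k: v for k, v in self.store.items() if v[0] > self.now}

    def put(self, name, data, lifetime):
        if len(self.store) >= self.capacity and name not in self.store:
            # Evict the entry closest to expiry rather than rejecting
            # the fresher incoming item.
            stalest = min(self.store, key=lambda k: self.store[k][0])
            del self.store[stalest]
        self.store[name] = (self.now + lifetime, data)

    def get(self, name):
        v = self.store.get(name)
        return v[1] if v and v[0] > self.now else None

cache = FreshnessCache(capacity=2)
cache.put("/sensor/temp", 21.5, lifetime=10)
cache.put("/sensor/rh", 40, lifetime=2)
cache.tick(5)                                 # rh expires, temp stays fresh
cache.put("/sensor/co2", 400, lifetime=10)    # fits after garbage collection
```

Collecting expired entries before capacity forces an eviction is the point: fresh data is never displaced while garbage is still occupying the store.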
Funding: Supported by the Xuhui District Health Commission, No. SHXH202214.
Abstract: Gastrointestinal tumors require personalized treatment strategies due to their heterogeneity and complexity. Multimodal artificial intelligence (AI) addresses this challenge by integrating diverse data sources, including computed tomography (CT), magnetic resonance imaging (MRI), endoscopic imaging, and genomic profiles, to enable intelligent decision-making for individualized therapy. This approach leverages AI algorithms to fuse imaging, endoscopic, and omics data, facilitating comprehensive characterization of tumor biology, prediction of treatment response, and optimization of therapeutic strategies. By combining CT and MRI for structural assessment, endoscopic data for real-time visual inspection, and genomic information for molecular profiling, multimodal AI enhances the accuracy of patient stratification and treatment personalization. The clinical implementation of this technology demonstrates potential for improving patient outcomes, advancing precision oncology, and supporting individualized care in gastrointestinal cancers. Ultimately, multimodal AI serves as a transformative tool in oncology, bridging data integration with clinical application to tailor therapies effectively.
Funding: Supported by the National Natural Science Foundation of China (Nos. 42376185, 41876111) and the Shandong Provincial Natural Science Foundation (No. ZR2023MD073).
Abstract: Benthic habitat mapping is an emerging discipline in the international marine field, providing an effective tool for marine spatial planning, marine ecological management, and decision-making applications. Seabed sediment classification is one of the main components of seabed habitat mapping. In response to the limitations of remote sensing imaging quality and acoustic measurement range, where a single data source does not fully reflect the substrate type, we propose a high-precision seabed sediment classification method that integrates data from multiple sources. Based on WorldView-2 multispectral remote sensing imagery and multibeam bathymetry data, we constructed a random forest (RF) classifier with optimal feature selection. A seabed sediment classification experiment integrating optical and acoustic remote sensing data was carried out in the shallow water around Wuzhizhou Island, Hainan, South China. Different seabed sediment types, such as sand, seagrass, and coral reefs, were effectively identified, with an overall classification accuracy of 92%. Experimental results show that the RF classifier with feature selection over the fused multi-source remote sensing data outperformed classifiers trained on simple combinations of the data sources, improving the accuracy of seabed sediment classification. The proposed method can therefore be effectively applied to high-precision seabed sediment classification and habitat mapping around islands and reefs.
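A minimal sketch of an RF classifier with importance-based feature selection over a fused feature table. The "optical" and "bathymetry" features below are random stand-ins, not WorldView-2 bands or multibeam derivatives, and `SelectFromModel` with a mean-importance threshold is one plausible reading of "optimal feature selection", not necessarily the authors' procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Simulated fused feature table: 8 "optical" + 4 "bathymetry" features.
rng = np.random.default_rng(2)
n = 300
optical = rng.normal(size=(n, 8))
depth = rng.normal(size=(n, 4))
X = np.hstack([optical, depth])
# Toy two-class substrate label driven by one feature from each source.
y = (X[:, 0] + X[:, 9] > 0).astype(int)

# Keep features whose RF importance exceeds the mean importance,
# then refit the classifier on the reduced feature set.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0),
    threshold="mean",
).fit(X, y)
X_sel = selector.transform(X)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_sel, y)
```

The selection step is what makes the fusion pay off: uninformative bands from either source are dropped instead of diluting the forest's splits.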
Abstract: Switzerland is one of the most desirable European destinations for Chinese tourists; therefore, a better understanding of Chinese tourists is essential for successful business practices. In China, the largest and leading social media platform, Sina Weibo, a hybrid of Twitter and Facebook, has more than 600 million users. Weibo's great market penetration suggests that tourism operators and marketers need to understand how to build effective and sustainable communications on Chinese social media platforms. In order to offer a better decision-support platform to tourism destination managers as well as Chinese tourists, we propose a framework using linked data on Sina Weibo. Linked Data refers to using the Internet to connect related data. We show how it can be used and how an ontology can be designed to include the users' context (e.g., GPS locations). Our framework provides a good theoretical foundation for further understanding Chinese tourists' expectations, experiences, behaviors, and new trends in Switzerland.
Abstract: In Chinese language studies, both "The Textual Research on Historical Documents" and "The Comparative Study of Historical Data" are traditional methodologies, and both deserve to be treasured, passed on, and further developed. It will certainly harm the development of academic research if either of the two methods is given unreasonable priority. The author claims that the best methodology for the historical study of the Chinese language, or one of the best, is the combination of the two, hence a new interpretation of "The Double-Proof Method". Meanwhile, this essay also attempts to put forward "The Law of Quan-ma and Gui-mei" in Chinese language studies, by which the author means that it is not advisable to treat Gui-mei as Quan-ma, or vice versa, in linguistic research. It is crucial always to respect the language facts first, which is considered the very soul of linguistics.
Abstract: The relatively rapid recession of glaciers in the Himalayas and the formation of moraine-dammed glacial lakes (MDGLs) in the recent past have increased the risk of glacial lake outburst floods (GLOFs) in Nepal and Bhutan and in the mountainous territory of Sikkim in India. As a product of climate change and global warming, this risk has not only raised the level of threat to the habitation and infrastructure of the region but has also contributed to the worsening imbalance of the unique ecosystem of this domain, which sustains several of the highest mountain peaks in the world. This study presents up-to-date mapping of the MDGLs in the central and eastern Himalayan regions using remote sensing data, with the objective of analysing their surface-area variations over time from 1990 through 2015, disaggregated over six episodes. The study also evaluates the susceptibility of MDGLs to GLOFs with a least-criteria decision analysis (LCDA). Forty-two major MDGLs, each having a lake surface area greater than 0.2 km2, identified in the Himalayan ranges of Nepal, Bhutan, and Sikkim, have been categorized according to their surface-area expansion rates in space and time. The lakes lie within the elevation range of 3800 m to 6800 m above mean sea level (amsl). With a total surface area of 37.9 km2, these MDGLs as a whole were observed to have expanded by an astonishing 43.6% in area over the 25-year period of this study. A factor is introduced to numerically sort the lakes in terms of their relative yearly expansion rates, based on the interpretation of their surface-area extents from satellite imagery. Verification of past predicted GLOF events using this factor against the limited field data reported in the literature indicates that the present analysis may be considered a sufficiently reliable and rapid technique for assessing the potential bursting susceptibility of the MDGLs. The analysis also indicates that, as of now, eight MDGLs in the region appear to be in highly vulnerable states with high chances of causing GLOF events in the near future.
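The relative yearly expansion rate behind the sorting factor can be sketched as simple arithmetic (our own formula, not necessarily the authors' exact factor; we also assume the 37.9 km2 figure is the final mapped total):

```python
def yearly_expansion_rate(area_start: float, area_end: float, years: float) -> float:
    """Fractional area growth per year, relative to the starting area."""
    return (area_end - area_start) / (area_start * years)

# Aggregate figures from the study: a 43.6% expansion over 25 years,
# ending at 37.9 km^2, implies a mean relative growth of ~1.7%/yr.
end_total = 37.9
start_total = end_total / 1.436   # back-computed starting area
rate = yearly_expansion_rate(start_total, end_total, 25)
```

Applied per lake rather than to the aggregate, a rate like this gives the dimensionless ranking the study uses to flag the most rapidly expanding (and hence most GLOF-susceptible) lakes.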
Abstract: The Mountain Science Data Center (MSDC), founded in 2021 under the Institute of Mountain Hazards and Environment (IMHE), Chinese Academy of Sciences, manages the entire lifecycle of mountain science data. It integrates data from diverse sources, including debris flows, landslides, soils, ecology, geology, natural resources, basic geographic information, and socio-economic data. The center provides comprehensive services, including data collection, processing, analysis tools, modeling, and application support, offering reliable data backing for numerous research projects within the institute. Data management and services are accessible via the Mountain Science Data Center Portal (https://www.msdc.ac.cn/), ensuring long-term, stable, and trustworthy access to facilitate scientific research and institutional development.
Funding: Supported by the National Major Scientific and Technological Special Project for "Significant New Drugs Development" (No. 2018ZX09201008) and the Special Fund Project for Information Development from the Shanghai Municipal Commission of Economy and Information (No. 201701013).
Abstract: Regional healthcare platforms collect clinical data from hospitals in specific areas for the purpose of healthcare management. Reusing these data for clinical research is a common requirement. However, we face challenges such as inconsistent terminology in electronic health records (EHR) and the complexity of data quality and data formats on a regional healthcare platform. In this paper, we propose a methodology and process for constructing the large-scale cohorts that form the basis of causal and comparative effectiveness studies in epidemiology. We first constructed a Chinese terminology knowledge graph to deal with the diversity of vocabularies on the regional platform. Second, we built special disease case repositories (e.g., a heart failure repository) that use the graph to search for related patients and to normalize the data. Driven by a clinical research question on the effect of statins on 180-day readmission in patients with heart failure, we built a large-scale retrospective cohort of 29,647 heart failure patients from the heart failure repository. After propensity score matching, a study group (n=6346) and a control group (n=6346) with comparable clinical characteristics were obtained. Logistic regression analysis showed that taking statins was negatively correlated with 180-day readmission in heart failure patients. This paper presents the workflow and an application example of big data mining based on regional EHR data.
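The propensity score matching step can be sketched as a greedy 1:1 nearest-neighbour match on a fitted logistic propensity model. Everything below is synthetic and much simpler than the study's matching (no caliper, no covariate balance diagnostics):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic cohort: 4 clinical covariates; treatment assignment depends
# on the first covariate, so raw group comparisons would be confounded.
rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 4))
p_treat = 1.0 / (1.0 + np.exp(-(X[:, 0] - 1.0)))   # roughly 28% treated
treated = (rng.random(n) < p_treat).astype(int)

# Step 1: propensity score = estimated P(treatment | covariates).
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: greedy 1:1 nearest-neighbour matching without replacement.
t_idx = np.where(treated == 1)[0]
c_idx = list(np.where(treated == 0)[0])
pairs = []
for t in t_idx:
    j = min(range(len(c_idx)), key=lambda k: abs(ps[c_idx[k]] - ps[t]))
    pairs.append((t, c_idx.pop(j)))   # each control is used at most once

matched_controls = [c for _, c in pairs]
```

After matching, the outcome model (here, the study's logistic regression on 180-day readmission) is fitted on the matched pairs, so the treated and control groups being compared share similar covariate distributions.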
基金supported by the National Key Research and Development Program of China(2022YFA0912100)the National Natural Science Foundation of China(32270098 and 32470073)+1 种基金the Fundamental Research Funds for the Central Universities(2662024JC015)the National Key Laboratory of Agricultural Microbiology(AML2024D02)to Z.Z.
Abstract: Translation is a crucial step in gene expression. Over the past decade, the development and application of ribosome profiling (Ribo-seq) have significantly advanced our understanding of translational regulation in vivo. However, the analysis and visualization of Ribo-seq data remain challenging. Despite the availability of various analytical pipelines, improvements in comprehensiveness, accuracy, and user-friendliness are still necessary. In this study, we develop RiboParser/RiboShiny, a robust framework for analyzing and visualizing Ribo-seq data. Building on published methods, we optimize ribosome structure-based and start/stop-based models to improve the accuracy and stability of P-site detection, even in species with a high proportion of leaderless transcripts. Leveraging these improvements, RiboParser offers comprehensive analyses, including quality control, gene-level analysis, codon-level analysis, and the analysis of Ribo-seq variants. Meanwhile, RiboShiny provides a user-friendly and adaptable platform for data visualization, facilitating deeper insights into the translational landscape. Furthermore, the integration of standardized genome annotation renders our platform universally applicable to various organisms with sequenced genomes. This framework has the potential to significantly improve the precision and efficiency of Ribo-seq data interpretation, thereby deepening our understanding of translational regulation.
Funding: Supported by the Princess Nourah bint Abdulrahman University Researchers Supporting Project, number PNURSP2025R896.
Abstract: Deep neural networks have achieved excellent classification results on several computer vision benchmarks. This has led to the popularity of machine learning as a service, where trained algorithms are hosted on the cloud and inference can be obtained on real-world data. In most applications, it is important to compress the vision data due to the enormous bandwidth and memory requirements. Video codecs exploit spatial and temporal correlations to achieve high compression ratios, but they are computationally expensive. This work computes the motion fields between consecutive frames to facilitate the efficient classification of videos. However, contrary to the normal practice of reconstructing the full-resolution frames through motion compensation, this work proposes to infer the class label directly from the block-based computed motion fields. Motion fields are a richer and more complex representation of motion vectors, where each motion vector carries magnitude and direction information. This approach has two advantages: the cost of motion compensation and video decoding is avoided, and the dimensions of the input signal are greatly reduced. This allows a shallower network for classification. The neural network can be trained using motion vectors in two ways: as complex representations or as magnitude-direction pairs. The proposed work trains a convolutional neural network on the direction and magnitude tensors of the motion fields. Our experimental results show 20× faster convergence during training, reduced overfitting, and accelerated inference on a hand gesture recognition dataset compared to full-resolution and downsampled frames. We validate the proposed methodology on the HGds dataset, achieving a testing accuracy of 99.21%, on the HMDB51 dataset, achieving 82.54% accuracy, and on the UCF101 dataset, achieving 97.13% accuracy, outperforming state-of-the-art methods in computational efficiency.
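The magnitude-direction input described above reduces to two element-wise operations on the motion field; the field shape and block size below are assumptions, not the paper's configuration:

```python
import numpy as np

# Hypothetical block-based motion field: one (dx, dy) vector per 16x16
# block of a frame, so a 480x640 frame yields a 30x40 grid of vectors.
rng = np.random.default_rng(4)
mvs = rng.normal(size=(2, 30, 40))

dx, dy = mvs[0], mvs[1]
magnitude = np.hypot(dx, dy)    # per-block motion strength
direction = np.arctan2(dy, dx)  # per-block motion angle in radians

# Stacked (2, H, W) tensor: the CNN input, far smaller than a
# full-resolution frame, which is the source of the reported speedups.
x = np.stack([magnitude, direction])
```

Compared with a 480x640x3 RGB frame, this input carries roughly 380 times fewer values, which is why a much shallower network suffices.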
Abstract: High-throughput transcriptomics has evolved from bulk RNA-seq to single-cell and spatial profiling, yet its clinical translation still depends on effective integration across diverse omics and data modalities. Emerging foundation models and multimodal learning frameworks are enabling scalable and transferable representations of cellular states, while advances in interpretability and real-world data integration are bridging the gap between discovery and clinical application. This paper outlines a concise roadmap for AI-driven, transcriptome-centered multi-omics integration in precision medicine (Figure 1).