Metro is an important form of public transport in Shanghai. Based on metro card data, we conduct a cluster analysis of Shanghai metro stations according to the pattern of passenger flow changing with time. The characteristics of travel time and surrounding land use are then investigated for different types of stations to explore the relationship between urban land-use characteristics and the travel activities reflected by passenger flow at metro stations. It is found that the passenger flow pattern of a metro station is closely related to its location and the surrounding land-use pattern. Based on these characteristics, 285 metro stations are classified into four types: residential-oriented, employment-oriented, employment-residence-oriented, and integrated functional-oriented stations. The classification reflects the interaction between spontaneous travel behavior and urban land-use characteristics and provides a reference for optimizing the urban functional structure and the spatial allocation of facilities.
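The abstract does not name the clustering algorithm, so the following is only a minimal sketch of how stations could be grouped by the shape of their daily passenger-flow profile: a plain k-means over normalized flow vectors. All station profiles below are hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two flow profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def normalize(profile):
    """Scale an hourly flow profile to sum to 1, so clustering compares
    the shape of the daily pattern rather than total ridership."""
    total = sum(profile)
    return [v / total for v in profile]

def assign(p, centroids):
    """Index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))

def kmeans(profiles, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [profiles[0]]
    while len(centroids) < k:
        centroids.append(max(profiles, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in profiles:
            clusters[assign(p, centroids)].append(p)
        centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return centroids

# Hypothetical 4-period inflow profiles (morning, midday, evening, night):
# residential stations peak in the morning, employment stations in the evening.
residential = [normalize(p) for p in ([60, 10, 15, 15], [55, 12, 16, 17], [62, 9, 14, 15])]
employment  = [normalize(p) for p in ([12, 20, 58, 10], [10, 18, 60, 12], [14, 16, 62, 8])]
centroids = kmeans(residential + employment, k=2)
```

On such well-separated toy profiles the two station types fall into different clusters; the real study would use full daily time series and more clusters.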
Assessment of the SDG 11.3.1 indicator of the United Nations Sustainable Development Goals (SDGs) is a valuable tool for policymakers in urban planning. This study aims to enhance the accuracy of the SDG 11.3.1 evaluation and explore the impact of varying precision levels of urban built-up area data on the indicator's assessment outcomes. We developed an algorithm to generate accurate urban built-up area data products based on China's Geographical Condition Monitoring data with a 2 m resolution. The study evaluates urban land-use efficiency in China from 2015 to 2020 across different geographical units using both the research product and data derived from other studies utilizing medium- and low-resolution imagery. The results indicate: (1) a significant improvement in the accuracy of our urban built-up area data, with the SDG 11.3.1 evaluation results demonstrating a more precise reflection of spatiotemporal characteristics; the indicator shows a positive correlation with the accuracy level of the built-up area data. (2) From 2015 to 2020, Chinese prefecture-level cities underwent faster urbanization in terms of land expansion relative to population growth, leading to less optimal land resource utilization. Only in extra-large cities does urban population growth show a relatively balanced pattern; in other regions and in cities of other sizes, urban population growth lags behind land urbanization. Notably, Northeast China and small to medium cities encounter significant challenges in urban population growth. The comprehensive framework developed for evaluating SDG 11.3.1 with high-precision urban built-up area data can be adapted to different national regions, yielding more accurate SDG 11.3.1 outcomes. Our urban area and built-up area data products provide crucial inputs for calculating at least four indicators related to SDG 11.
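For reference, SDG 11.3.1 is the ratio of the land consumption rate to the population growth rate (LCRPGR), with both rates computed as annualized log-ratios over the same period. A minimal sketch of that formulation follows; the city numbers are made up for illustration.

```python
import math

def lcrpgr(area_t1, area_t2, pop_t1, pop_t2, years):
    """SDG 11.3.1: land consumption rate / population growth rate,
    both annualized as log-ratios over the same period."""
    lcr = math.log(area_t2 / area_t1) / years   # land consumption rate
    pgr = math.log(pop_t2 / pop_t1) / years     # population growth rate
    return lcr / pgr

# Hypothetical city: built-up area grows 20% while population grows 10%
# over five years -> LCRPGR > 1, i.e., land expands faster than population.
ratio = lcrpgr(area_t1=100.0, area_t2=120.0, pop_t1=1.0e6, pop_t2=1.1e6, years=5)
```

A ratio above 1 is exactly the "land urbanization outpacing population growth" pattern the study reports for most prefecture-level cities.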
A strong summer convective precipitation event over Beijing on 10 July 2004 is numerically simulated in this paper, and the impact of the urban heat island (UHI) on summer convective rain is investigated. The analysis reveals that a mesoscale convective cloud cluster system led to this heavy rainfall event, with moisture supplied by the large-scale circulation. Before the initiation of precipitation, a generally weak UHI of 2-3°C existed in the urban area. Much like a sea breeze, the anomalously warm urban air created relatively low pressure, inducing an inflow of cooler rural air towards the urban center, which favored ascending motion and the formation of convective precipitation over the urban area. In addition, the numerical simulation of this event suggests that the precipitation simulated using the 2002 LANDSAT-7 land-use data with 30-m resolution is much better than that using the 1992-1993 USGS land-use data with 1-km resolution, both in the magnitude and in the location of the precipitation. The simulation confirms to some extent that the UHI plays a significant role in causing this extreme rainfall event.
This study examines the impacts of land-use data on the simulation of surface air temperature in Northwest China by the Weather Research and Forecasting (WRF) model. International Geosphere-Biosphere Program (IGBP) land-use data with 500-m spatial resolution are generated from Moderate Resolution Imaging Spectroradiometer (MODIS) satellite products and used to replace the default U.S. Geological Survey (USGS) land-use data in the WRF model. The results are compared and evaluated against records from national basic meteorological observing stations in Northwest China. It is found that replacing the default USGS land-use data with the IGBP data improves the model's simulation of surface air temperature in Northwest China in July and December 2015. Errors in the simulated daytime surface air temperature are reduced, although the degree and spatial range of the impact vary between seasons. Using the IGBP data, the simulated daytime surface air temperature in July 2015 improves at a relatively small number of stations but to a relatively large degree, whereas the simulation for December 2015 improves at almost all stations but only to a relatively small degree (within 1°C). Mitigation of the daytime surface air temperature overestimation in July 2015 is influenced mainly by the change in ground heat flux, while the correction of the underestimated temperature in December 2015 comes mainly from the improvement of simulated net radiation.
Although a land-cover database is very important to national land use, including urban planning and land-use management, building one through digitization of 1:10000 paper land-use maps and manual data input is laborious and time-consuming. Here we propose a new, highly automated technique to build a land-use database, which has proved useful and practical in building a land-use database of Baotou City.
Land-use change has recently become a central concern in global environmental change and is used by city and regional planners to design sustainable cities. Nakuru, in the central Rift Valley of Kenya, has undergone rapid urban growth in the last decade. This paper focused on urban growth using multi-sensor satellite imagery and explored the potential benefits of combining data from optical sensors (Landsat, WorldView-2) with radar data from the Advanced Land Observing Satellite (ALOS) Phased Array type L-band Synthetic Aperture Radar (PALSAR) for urban land-use mapping. Landsat has sufficient spectral bands to allow better delineation of urban green and impervious surfaces; WorldView-2 has a higher spatial resolution and facilitates urban growth mapping; and PALSAR has a higher temporal resolution than other operational sensors and can penetrate clouds irrespective of weather conditions and time of day, a condition prevalent in Nakuru because it lies in a tropical area. Classical and modern classifiers, namely maximum likelihood (ML) and support vector machine (SVM), were applied for image classification and their performance assessed. The land-use data for 1986, 2000, and 2010 were compiled and analyzed using post-classification comparison (PCC). Our research illustrated that the SVM algorithm yielded better results than ML, and that integrating multi-temporal Landsat imagery with ALOS PALSAR gave better results than classifying ALOS PALSAR alone. Between 2000 and 2010, 19.70 km2 of land changed from non-urban to urban land-use, indicating that rapid urban growth has taken place. Land-use information supports comprehensive land-use planning and the integrated management of resources to ensure the sustainability of land and to achieve social equity, economic efficiency, and environmental sustainability.
A major threat to biodiversity in North Dakota is the conversion of forested land to cultivable land, especially land that acts as a riparian buffer. To reverse this trend, a validation and prediction model is necessary to assess the change. Spatial prediction within a Geographic Information System (GIS) using kriging is a popular stochastic method. The objective of this study was to predict the spatial and temporal transformation of a small agricultural watershed, Pipestem Creek in North Dakota, USA, using satellite imagery from 1976 to 2015. To enhance the difference between forested and non-forested land, a spectral transformation method, the Tasseled Cap Greenness Index (TCGI), was used. To study the spatial structure present in the imagery within the study period, semivariograms were generated. The kriging prediction maps were post-classified using remote sensing change-detection techniques to obtain the direction and intensity of forest to non-forest change. The TCGI yielded higher values from 1976 to 2000 and gradually decreased from 2000 to 2011, indicating a loss of forested land.
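As a sketch of the semivariogram step, the empirical semivariance at lag h is half the mean squared difference over all observation pairs h apart; it is the spatial structure that kriging weights exploit. A minimal 1-D version (the transect values below are toy data, not the study's):

```python
def semivariance(values, lag):
    """Empirical semivariance: gamma(h) = sum((z_i - z_{i+h})^2) / (2 * N_pairs)."""
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))

# Toy greenness transect with a spatial trend: semivariance grows with lag,
# i.e., nearby pixels are more alike than distant ones.
transect = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
g1, g2 = semivariance(transect, 1), semivariance(transect, 2)
```

Here g1 = 0.5 and g2 = 2.0; plotting gamma against lag and fitting a model to the resulting curve is what the generated semivariograms support.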
Missing data presents a crucial challenge in data analysis, especially in high-dimensional datasets, where it often leads to biased conclusions and degraded model performance. In this study, we present a novel autoencoder-based imputation framework that integrates a composite loss function to enhance robustness and precision. The proposed loss combines (i) a guided, masked mean squared error focusing on missing entries; (ii) a noise-aware regularization term to improve resilience against data corruption; and (iii) a variance penalty to encourage expressive yet stable reconstructions. We evaluate the proposed model across four missingness mechanisms, namely Missing Completely at Random, Missing at Random, Missing Not at Random, and Missing Not at Random with quantile censorship, under systematically varied feature counts, sample sizes, and missingness ratios ranging from 5% to 60%. Four publicly available real-world datasets (Stroke Prediction, Pima Indians Diabetes, Cardiovascular Disease, and Framingham Heart Study) were used, and the results show that our model consistently outperforms baseline methods, including traditional and deep learning-based techniques. An ablation study reveals the additive value of each component of the loss function. Additionally, we assessed the downstream utility of the imputed data through classification tasks, where datasets imputed by the proposed method yielded the highest receiver operating characteristic area under the curve scores across all scenarios. The model demonstrates strong scalability and robustness, improving performance with larger datasets and higher feature counts. These results underscore the capacity of the proposed method to produce imputations that are not only numerically accurate but also semantically useful, making it a promising solution for robust data recovery in clinical applications.
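The abstract names the three loss terms but not their exact form, so the following is only one plausible reading, with every interpretation an assumption: the masked MSE scores reconstruction error on missing entries only, the noise-aware term penalizes instability between reconstructions of clean and noise-corrupted inputs, and the variance penalty discourages collapsed, near-constant outputs.

```python
def composite_loss(x_true, x_recon, miss_mask, x_recon_noisy, alpha=0.1, beta=0.01):
    """Sketch of a composite imputation loss with the three terms above.
    miss_mask[i] is True where the entry was originally missing.
    All three term definitions are assumptions, not the paper's formulas."""
    # (i) guided, masked MSE over the missing entries only
    missing = [(t - r) ** 2 for t, r, m in zip(x_true, x_recon, miss_mask) if m]
    masked_mse = sum(missing) / max(len(missing), 1)
    # (ii) noise-aware term: reconstruction should be stable under input noise
    noise_reg = sum((r - rn) ** 2 for r, rn in zip(x_recon, x_recon_noisy)) / len(x_recon)
    # (iii) variance penalty: a near-constant reconstruction is penalized
    mean_r = sum(x_recon) / len(x_recon)
    var_r = sum((r - mean_r) ** 2 for r in x_recon) / len(x_recon)
    var_penalty = 1.0 / (var_r + 1e-8)
    return masked_mse + alpha * noise_reg + beta * var_penalty

# A reconstruction that matches the truth on missing entries, is noise-stable,
# and preserves spread should score lower than a flattened, wrong one.
x_true = [1.0, 2.0, 3.0, 4.0]
mask = [True, False, False, True]
loss_good = composite_loss(x_true, [1.0, 2.0, 3.0, 4.0], mask, [1.0, 2.0, 3.0, 4.0])
loss_bad = composite_loss(x_true, [2.0, 2.0, 3.0, 3.0], mask, [2.0, 2.0, 3.0, 3.0])
```

In a real training loop these terms would be tensor operations under autograd; this list-based version only shows how the components combine.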
As an important resource in a data link, time slots should be strategically allocated to enhance transmission efficiency and resist eavesdropping, especially given the tremendous increase in the number of nodes and the diversity of communication needs. It is crucial to design control sequences with robust randomness and conflict-freeness to properly support differentiated access control in the data link. In this paper, we propose a hierarchical access control scheme based on control sequences to achieve high utilization of time slots and differentiated access control. A theoretical bound on the hierarchical control sequence set is derived to characterize the constraints on the parameters of the sequence set. Moreover, two classes of optimal hierarchical control sequence sets satisfying the theoretical bound are constructed, both of which enable the scheme to achieve maximum utilization of time slots. Compared with the fixed time-slot allocation scheme, our scheme reduces the symbol error rate by up to 9%, indicating a significant improvement in anti-interference and anti-eavesdropping capabilities.
Modern intrusion detection systems (IDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class intrusion detection using the KDD99 and related IDS datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with sophisticated data preprocessing, incorporating both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions, ensuring standardized and model-ready inputs. Dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks' hunting strategies; HHO efficiently identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners; this layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was recorded with an accuracy of 99.44% on UNSW-NB15, demonstrating the model's effectiveness. After balancing, the model showed a clear improvement in detecting attacks. We tested the model on four datasets to show the effectiveness of the proposed approach and performed an ablation study to check the effect of each parameter. The proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
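The HHO and stacking details are specific to the paper, but the SMOTE step it applies has a simple core: synthesize a minority-class sample by interpolating between a real minority point and one of its k nearest minority neighbors. A minimal dependency-free sketch, with made-up feature vectors:

```python
import random

def smote_sample(minority, k=3, rng=None):
    """Create one synthetic minority sample a la SMOTE: pick a point, pick one
    of its k nearest minority-class neighbors, and interpolate between them
    at a random fraction t in [0, 1)."""
    rng = rng or random.Random()
    x = rng.choice(minority)
    neighbors = sorted(
        (p for p in minority if p is not x),
        key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
    )[:k]
    n = rng.choice(neighbors)
    t = rng.random()
    return [a + t * (b - a) for a, b in zip(x, n)]

# Hypothetical minority-class feature vectors (e.g., a rare attack type).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_sample(minority, k=2, rng=random.Random(42))
```

Because each synthetic point is a convex combination of two existing minority points, it always lies inside the minority class's convex hull, which is why SMOTE densifies rather than distorts the minority region.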
Reversible data hiding (RDH) enables secret data embedding while preserving complete cover image recovery, making it crucial for applications requiring image integrity. The pixel value ordering (PVO) technique used with multiple stego images provides good image quality but often yields low embedding capacity. To address this, this paper proposes a high-capacity PVO-based RDH scheme that generates three stego images from a single cover image. The cover image is partitioned into non-overlapping blocks with pixels sorted in ascending order. Four secret bits are embedded into each block's maximum pixel value, and three additional bits are embedded into the second-largest value when the pixel difference exceeds a predefined threshold. A similar embedding strategy is applied on the minimum side of the block, including the second-smallest pixel value. This design enables each block to embed up to 14 bits of secret data. Experimental results demonstrate that the proposed method achieves significantly higher embedding capacity and improved visual quality compared with existing triple-stego RDH approaches, advancing the field of reversible steganography.
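The paper's multi-bit, triple-stego scheme is an extension not reproduced here, but the underlying PVO idea can be sketched on a single block: sort the pixels, predict the maximum from the second-largest, and carry data in the prediction error. This is the classic one-bit-per-maximum variant, shown only to make the reversibility mechanism concrete:

```python
def pvo_embed_block(block, bit):
    """Classic PVO on one block: embed a bit into the maximum pixel when the
    prediction error (max - second max) equals 1; shift when it is larger
    so that extraction stays unambiguous. Returns (stego block, embedded?)."""
    order = sorted(range(len(block)), key=lambda i: block[i])
    i_max, i_2nd = order[-1], order[-2]
    e = block[i_max] - block[i_2nd]
    out = list(block)
    if e == 1:            # embeddable: error becomes 1 + bit
        out[i_max] += bit
        return out, True
    if e > 1:             # not embeddable: shift by 1 to keep errors disjoint
        out[i_max] += 1
    return out, False     # e == 0 leaves the block untouched

def pvo_extract_block(block):
    """Invert pvo_embed_block: return (restored block, bit or None)."""
    order = sorted(range(len(block)), key=lambda i: block[i])
    i_max, i_2nd = order[-1], order[-2]
    e = block[i_max] - block[i_2nd]
    out = list(block)
    if e == 1:
        return out, 0
    if e == 2:
        out[i_max] -= 1
        return out, 1
    if e > 2:             # shifted block: undo the shift, no payload
        out[i_max] -= 1
    return out, None
```

A round trip on a toy block, e.g. `pvo_embed_block([52, 50, 51, 53], 1)` followed by `pvo_extract_block`, restores the cover exactly, which is the reversibility property the scheme builds on.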
The increasing complexity of China's electricity market creates substantial challenges for settlement automation, data consistency, and operational scalability. Existing provincial settlement systems are fragmented, lack a unified data structure, and depend heavily on manual intervention to process high-frequency and retroactive transactions. To address these limitations, a graph-based unified settlement framework is proposed to enhance automation, flexibility, and adaptability in electricity market settlements. A flexible attribute-graph model is employed to represent heterogeneous multi-market data, enabling standardized integration, rapid querying, and seamless adaptation to evolving business requirements. An extensible operator library is designed to support configurable settlement rules, and a suite of modular tools, including dataset generation, formula configuration, billing templates, and task scheduling, facilitates end-to-end automated settlement processing. A robust refund-clearing mechanism is further incorporated, utilizing sandbox execution, data-version snapshots, dynamic lineage tracing, and real-time change-capture technologies to enable rapid and accurate recalculation under dynamic policy and data revisions. Case studies based on real-world data from regional Chinese markets validate the effectiveness of the proposed approach, demonstrating marked improvements in computational efficiency, system robustness, and automation. Moreover, enhanced settlement accuracy and high temporal granularity improve price-signal fidelity, promote cost-reflective tariffs, and incentivize energy-efficient and demand-responsive behavior among market participants. The method not only supports equitable and transparent market operations but also provides a generalizable, scalable foundation for modern electricity settlement platforms in increasingly complex and dynamic market environments.
With the increasing emphasis on personal information protection, encryption through security protocols has become a critical requirement in data transmission and reception. Nevertheless, IoT ecosystems comprise heterogeneous networks in which outdated systems coexist with the latest devices, spanning a range from non-encrypted to fully encrypted equipment. Given the limited visibility into payloads in this context, this study investigates AI-based attack detection methods that leverage encrypted traffic metadata, eliminating the need for decryption and minimizing system performance degradation, especially in light of these heterogeneous devices. Using the UNSW-NB15 and CICIoT-2023 datasets, encrypted and unencrypted traffic were categorized according to security protocol, and AI-based intrusion detection experiments were conducted for each traffic type based on metadata. To mitigate class imbalance, eight data sampling techniques were applied, and their effectiveness was comparatively analyzed using two ensemble models and three deep learning (DL) models from various perspectives. The experimental results confirmed that metadata-based attack detection is feasible using only encrypted traffic. In the UNSW-NB15 dataset, the F1-score for encrypted traffic was approximately 0.98, about 4.3% higher than that for unencrypted traffic (approximately 0.94). Analysis of the encrypted traffic in the CICIoT-2023 dataset using the same method showed a significantly lower F1-score of roughly 0.43, indicating that dataset quality and the preprocessing approach have a substantial impact on detection performance. Furthermore, when data sampling techniques were applied to encrypted traffic, recall improved by up to 23.0% on the UNSW-NB15 (encrypted) dataset and by 20.26% on the CICIoT-2023 (encrypted) dataset, a similar level of improvement. Notably, on CICIoT-2023, the F1-score and the Receiver Operating Characteristic Area Under the Curve (ROC-AUC) increased by 59.0% and 55.94%, respectively. These results suggest that data sampling can have a positive effect even in encrypted environments, although the extent of the improvement may vary depending on data quality, model architecture, and sampling strategy.
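Since the comparison above turns on F1-score and recall, a quick reminder of how both fall out of the confusion counts; the numbers below are illustrative, not the paper's.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1) from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative detector: finds 80 of 100 attacks while raising 20 false alarms.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
```

Because F1 is a harmonic mean, a sampling technique that trades a little precision for a large recall gain can still raise F1 sharply on imbalanced data, which is consistent with the CICIoT-2023 improvements reported above.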
While the Ordos Basin is recognized for its substantial hydrocarbon exploration prospects, its rugged loess tableland terrain has made seismic exploration exceptionally challenging [1-3]. Persistent obstacles, such as complex 3D survey planning, low signal-to-noise ratio in the raw data, inadequate near-surface velocity modeling, and imaging inaccuracy, have long hindered the advancement of seismic exploration across this region. Through a problem-solving approach rooted in geological target analysis, this research systematically investigates nodal seismometer-based high-density seismic acquisition in the loess plateau. Tailored advancements in waveform enhancement and depth velocity modeling methodologies have been engineered. Field validations confirm that the optimized workflow delivers marked improvements in amplitude preservation and imaging resolution, offering novel insights for future reservoir characterization.
Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES, combining text-based, vector-based, and embedding-based similarity measures to improve scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine-learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R2 of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R2 to 88.95%. In Experiment 4, a data-efficient training approach was introduced, with training portions increasing from 5% to 50% of the data; using just 10% of the data achieved near-peak performance, with an R2 of 85.49%, an effective trade-off between performance and computational cost. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
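Of the three similarity families combined above, the embedding-based one typically reduces to cosine similarity between an essay's vector and a reference vector. A minimal sketch, where the short toy vectors stand in for real sentence embeddings (an assumption for illustration):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors:
    dot(u, v) / (|u| * |v|), in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# A student answer embedded near the reference scores close to 1;
# an unrelated (orthogonal) answer scores close to 0.
reference = [0.6, 0.8, 0.0]
close = [0.6, 0.79, 0.05]
unrelated = [0.0, 0.0, 1.0]
```

Such similarity values then serve as one feature column alongside the text-based and vector-based measures fed to the Random Forest model.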
Objective expertise evaluation of individuals, as a prerequisite for team formation, has been a long-standing desideratum in large software development companies. With the rapid advancement of machine learning methods, and given the reliable data already stored in project management tools, automating this evaluation process becomes a natural step forward. In this context, our approach quantifies software developer expertise using metadata from task-tracking systems. We mathematically formalize two categories of expertise: technology-specific expertise, which denotes the skills required for a particular technology, and general expertise, which encapsulates overall knowledge of the software industry. We then automatically classify the zones of expertise associated with each task a developer has worked on, using Bidirectional Encoder Representations from Transformers (BERT)-like models to handle the unique characteristics of project tool datasets effectively. Finally, our method evaluates the proficiency of each software specialist across completed projects from both technology-specific and general perspectives. The method was experimentally validated, yielding promising results.
With the rapid growth of biomedical data, particularly multi-omics data spanning genomics, transcriptomics, proteomics, metabolomics, and epigenomics, medical research and clinical decision-making confront both new opportunities and new obstacles. The huge and diversified nature of these datasets cannot always be managed with traditional data analysis methods. As a consequence, deep learning has emerged as a strong tool for analysing multi-omics data, owing to its ability to handle complex and non-linear relationships. This paper explores the fundamental concepts of deep learning and how they are applied in multi-omics medical data mining. We demonstrate how autoencoders, variational autoencoders, multimodal models, attention mechanisms, transformers, and graph neural networks enable pattern analysis and recognition across omics data. Deep learning has proved effective in disease classification, biomarker identification, gene network learning, and therapeutic efficacy prediction. We also consider critical problems such as data quality, model explainability, reproducibility of findings, and computational requirements, as well as future directions: combining omics with clinical and imaging data, explainable AI, federated learning, and real-time diagnostics. Overall, this study emphasises the need for cross-disciplinary collaboration to advance deep learning-based multi-omics research for precision medicine and the understanding of complex disorders.
Translation is a crucial step in gene expression. Over the past decade, the development and application of ribosome profiling (Ribo-seq) have significantly advanced our understanding of translational regulation in vivo. However, the analysis and visualization of Ribo-seq data remain challenging; despite the availability of various analytical pipelines, improvements in comprehensiveness, accuracy, and user-friendliness are still needed. In this study, we develop RiboParser/RiboShiny, a robust framework for analyzing and visualizing Ribo-seq data. Building on published methods, we optimize ribosome structure-based and start/stop-based models to improve the accuracy and stability of P-site detection, even in species with a high proportion of leaderless transcripts. Leveraging these improvements, RiboParser offers comprehensive analyses, including quality control, gene-level analysis, codon-level analysis, and the analysis of Ribo-seq variants. Meanwhile, RiboShiny provides a user-friendly and adaptable platform for data visualization, facilitating deeper insights into the translational landscape. Furthermore, the integration of standardized genome annotation renders the platform applicable to any organism with a sequenced genome. This framework has the potential to significantly improve the precision and efficiency of Ribo-seq data interpretation, thereby deepening our understanding of translational regulation.
High-throughput transcriptomics has evolved from bulk RNA-seq to single-cell and spatial profiling, yet its clinical translation still depends on effective integration across diverse omics and data modalities. Emerging foundation models and multimodal learning frameworks are enabling scalable and transferable representations of cellular states, while advances in interpretability and real-world data integration are bridging the gap between discovery and clinical application. This paper outlines a concise roadmap for AI-driven, transcriptome-centered multi-omics integration in precision medicine (Figure 1).
Gastrointestinal tumors require personalized treatment strategies due to their heterogeneity and complexity. Multimodal artificial intelligence (AI) addresses this challenge by integrating diverse data sources, including computed tomography (CT), magnetic resonance imaging (MRI), endoscopic imaging, and genomic profiles, to enable intelligent decision-making for individualized therapy. This approach leverages AI algorithms to fuse imaging, endoscopic, and omics data, facilitating comprehensive characterization of tumor biology, prediction of treatment response, and optimization of therapeutic strategies. By combining CT and MRI for structural assessment, endoscopic data for real-time visual inspection, and genomic information for molecular profiling, multimodal AI enhances the accuracy of patient stratification and treatment personalization. The clinical implementation of this technology demonstrates potential for improving patient outcomes, advancing precision oncology, and supporting individualized care in gastrointestinal cancers. Ultimately, multimodal AI serves as a transformative tool in oncology, bridging data integration with clinical application to effectively tailor therapies.
Abstract: The metro is an important form of public transport in Shanghai. Based on metro card data, we conduct a cluster analysis of Shanghai metro stations according to the pattern of passenger flow over time. The characteristics of travel time and surrounding land use are then investigated for different types of stations to explore the relationship between urban land-use characteristics and the travel activities reflected by passenger flow at metro stations. We find that the passenger-flow pattern of a metro station is closely related to its location and the surrounding land-use pattern. Based on these characteristics, 285 metro stations are classified into four types: residential-oriented, employment-oriented, employment-residence-oriented, and integrated-function-oriented stations. The classification reflects the interaction between spontaneous travel behavior and urban land-use characteristics and provides a reference for optimizing the urban functional structure and the spatial allocation of facilities.
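The station types above can be illustrated with a toy heuristic (not the paper's clustering method): a strong morning inbound peak suggests surrounding residences, a strong evening inbound peak suggests workplaces, and a balanced profile suggests mixed use. The function names, hour windows, and `peak_ratio` threshold below are hypothetical.

```python
# Illustrative sketch only: classify a station by where its inbound flow
# concentrates. `hourly_inflow` is a hypothetical 24-element list of
# passengers entering per hour; thresholds are made up for demonstration.

def flow_profile(hourly_inflow):
    """Normalize hourly inflow counts into shares that sum to 1."""
    total = sum(hourly_inflow)
    return [h / total for h in hourly_inflow]

def classify_station(hourly_inflow, peak_ratio=1.5):
    """Label a station by the ratio of morning to evening inbound share."""
    p = flow_profile(hourly_inflow)
    morning = sum(p[7:10])   # 07:00-09:59 share
    evening = sum(p[17:20])  # 17:00-19:59 share
    if morning > peak_ratio * evening:
        return "residential-oriented"
    if evening > peak_ratio * morning:
        return "employment-oriented"
    return "mixed / integrated"

# A station with a pronounced morning inbound peak:
residential = [10]*7 + [300, 400, 250] + [40]*7 + [60, 70, 50] + [20]*4
print(classify_station(residential))  # residential-oriented
```

A real analysis would cluster full 24-hour inbound/outbound profiles (e.g., with k-means) rather than thresholding two windows, but the heuristic captures why flow shape encodes land use.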
Funding: This work was funded by the National Key Research and Development Program of China (Grant No. 2023YFC3804001), the Natural Resources Planning and Management Project (Grant Nos. A2417 and A2418), and the Fundamental Scientific Research Funds for Central Public Welfare Research Institutes (Grant No. AR2409).
Abstract: Assessment of the SDG 11.3.1 indicator of the United Nations Sustainable Development Goals (SDGs) is a valuable tool for policymakers in urban planning. This study aims to enhance the accuracy of the SDG 11.3.1 evaluation and explore the impact of varying precision levels in urban built-up area data on the indicator's assessment outcomes. We developed an algorithm to generate accurate urban built-up area data products based on China's Geographical Condition Monitoring data with a 2 m resolution. The study evaluates urban land-use efficiency in China from 2015 to 2020 across different geographical units using both our research product and data derived from other studies that used medium- and low-resolution imagery. The results indicate: (1) a significant improvement in the accuracy of our urban built-up area data, with the SDG 11.3.1 evaluation results reflecting spatiotemporal characteristics more precisely; the indicator shows a positive correlation with the accuracy level of the built-up area data; (2) from 2015 to 2020, Chinese prefecture-level cities underwent faster urbanization in terms of land expansion relative to population growth, leading to less optimal land resource utilization. Only in extra-large cities does urban population growth show a relatively balanced pattern; urban population growth in other regions and in cities of other sizes lags behind land urbanization. Notably, Northeast China and small to medium cities face significant challenges in urban population growth. The comprehensive framework developed for evaluating SDG 11.3.1 with high-precision urban built-up area data can be adapted to different national regions, yielding more accurate SDG 11.3.1 outcomes. Our urban area and built-up area data products provide crucial inputs for calculating at least four indicators related to SDG 11.
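The UN metadata for SDG 11.3.1 defines the indicator as the ratio of the land consumption rate (LCR) to the population growth rate (PGR), each computed as a logarithmic annual rate. A minimal sketch, with made-up numbers for a hypothetical city:

```python
import math

def land_consumption_rate(urb_t1, urb_t2, years):
    """LCR = ln(Urb_t2 / Urb_t1) / y, per the UN SDG 11.3.1 metadata."""
    return math.log(urb_t2 / urb_t1) / years

def population_growth_rate(pop_t1, pop_t2, years):
    """PGR = ln(Pop_t2 / Pop_t1) / y."""
    return math.log(pop_t2 / pop_t1) / years

def sdg_11_3_1(urb_t1, urb_t2, pop_t1, pop_t2, years):
    """SDG 11.3.1 = LCR / PGR; > 1 means land expands faster than population."""
    return (land_consumption_rate(urb_t1, urb_t2, years)
            / population_growth_rate(pop_t1, pop_t2, years))

# Hypothetical city: built-up area 100 -> 120 km^2,
# population 1.00 -> 1.10 million, over 5 years.
ratio = sdg_11_3_1(100.0, 120.0, 1.00e6, 1.10e6, 5)
print(round(ratio, 3))  # 1.913 -> land urbanization outpaces population growth
```

This is why the indicator is sensitive to the accuracy of the built-up area term: any bias in `urb_t1` or `urb_t2` propagates directly into the ratio.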
Funding: Natural Science Foundation of Beijing (No. 8072009), Beijing Specific Project to Foster Elitist (No. 20061D0200800060), and the Beijing New Star Project on Science & Technology (2004A57).
Abstract: A strong summer convective precipitation event over Beijing on 10 July 2004 is numerically simulated in this paper, and the impact of the urban heat island (UHI) on summer convective rain is investigated. The analysis reveals that a mesoscale convective cloud cluster system led to this heavy rainfall event. Before the initiation of precipitation, moisture was supplied by the large-scale circulation, and a generally weak UHI of 2-3°C existed in the urban area. Much like a sea breeze, the anomalously warm urban air created relatively low pressure, inducing an inflow of cooler rural air toward the urban center, which favored ascending motion and the formation of convective precipitation over the urban area. In addition, the numerical simulation of the event shows that the precipitation simulated using the 2002 LANDSAT-7 land-use data with 30-m resolution is much better than that using the 1992-1993 USGS land-use data with 1-km resolution, both in the magnitude and in the location of the precipitation. The simulation confirms to some extent that the UHI played a significant role in causing this extreme rainfall event.
Funding: Supported by the China Meteorological Administration Special Public Welfare Research Fund (GYHY201506001) and the National Natural Science Foundation of China (41675015).
Abstract: This study examines the impacts of land-use data on the simulation of surface air temperature in Northwest China by the Weather Research and Forecasting (WRF) model. International Geosphere-Biosphere Program (IGBP) land-use data with 500-m spatial resolution are generated from Moderate Resolution Imaging Spectroradiometer (MODIS) satellite products and used to replace the default U.S. Geological Survey (USGS) land-use data in the WRF model. Results are compared and evaluated against data recorded by national basic meteorological observing stations in Northwest China. Replacing the default USGS land-use data with the IGBP data improves the model's simulation of surface air temperature in Northwest China in July and December 2015: errors in the simulated daytime surface air temperature are reduced, although the degree and spatial range of the improvement vary between seasons. Using the IGBP data, the simulated daytime surface air temperature in July 2015 improves at a relatively small number of stations but to a relatively large degree, whereas the simulation for December 2015 improves at almost all stations but only to a relatively small degree (within 1°C). Mitigation of the daytime surface air temperature overestimation in July 2015 is influenced mainly by the change in ground heat flux, while the correction of the underestimated December 2015 temperature comes mainly from the improvement in simulated net radiation.
Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 40471090), the Education Committee Foundation of Beijing (Grant No. KM200510028013), and the Science Innovation Group of Beijing.
Abstract: Although a land-cover database is very important to national land use, including urban planning and land-use management, such a database is laborious and time-consuming to build through digitization of paper land-use maps (1:10000) and manual data entry. Here we propose a new, highly automated technique for building a land-use database, which has proved useful and practical in building the land-use database of Baotou City.
Abstract: Land-use change has recently been a main concern in worldwide environmental change and is used by city and regional planners to design sustainable cities. Nakuru, in the central Rift Valley of Kenya, has undergone rapid urban growth in the last decade. This paper focused on urban growth using multi-sensor satellite imagery and explored the potential benefits of combining data from optical sensors (Landsat, WorldView-2) with radar data from the Advanced Land Observing Satellite (ALOS) Phased Array type L-band Synthetic Aperture Radar (PALSAR) for urban land-use mapping. Landsat has sufficient spectral bands to allow better delineation of urban green and impervious surfaces; WorldView-2 has a higher spatial resolution that facilitates urban growth mapping; and PALSAR has a higher temporal resolution than other operational sensors and can penetrate clouds irrespective of weather conditions and time of day, a useful property in Nakuru, which lies in a tropical area. Classical and modern classifiers, namely maximum likelihood (ML) and support vector machine (SVM), were applied for image classification and their performance assessed. The land-use data for the years 1986, 2000 and 2010 were compiled and analyzed using post-classification comparison (PCC). The value of combining multi-temporal Landsat imagery and PALSAR was explored and demonstrated in this research: the SVM algorithm yielded better results than ML, and the integration of Landsat and ALOS PALSAR gave better results than classifying ALOS PALSAR alone. Between 2000 and 2010, 19.70 km² of land changed from non-urban to urban land use, indicating that rapid urban growth has taken place. Land-use information is useful for comprehensive land-use planning and integrated management of resources to ensure the sustainability of land and to achieve social equity, economic efficiency and environmental sustainability.
Abstract: A major threat to biodiversity in North Dakota is the conversion of forested land to cultivable land, especially land that acts as a riparian buffer. To reverse this trend, a validation and prediction model is necessary to assess the change. Spatial prediction within a Geographic Information System (GIS) using kriging is a popular stochastic method. The objective of this study was to predict the spatial and temporal transformation of a small agricultural watershed, Pipestem Creek in North Dakota, USA, using satellite imagery from 1976 to 2015. To enhance the difference between forested and non-forested land, a spectral transformation method, the Tasseled Cap Greenness Index (TCGI), was used. To study the spatial structure present in the imagery within the study period, semivariograms were generated. The kriging prediction maps were post-classified using remote sensing change-detection techniques to obtain the direction and intensity of forest to non-forest change. The TCGI produced higher values from 1976 to 2000 and gradually decreased from 2000 to 2011, indicating loss of forested land.
Abstract: Missing data presents a crucial challenge in data analysis, especially in high-dimensional datasets, where it often leads to biased conclusions and degraded model performance. In this study, we present a novel autoencoder-based imputation framework that integrates a composite loss function to enhance robustness and precision. The proposed loss combines (i) a guided, masked mean squared error focusing on missing entries; (ii) a noise-aware regularization term to improve resilience against data corruption; and (iii) a variance penalty to encourage expressive yet stable reconstructions. We evaluate the proposed model across four missingness mechanisms, namely Missing Completely at Random, Missing at Random, Missing Not at Random, and Missing Not at Random with quantile censorship, under systematically varied feature counts, sample sizes, and missingness ratios ranging from 5% to 60%. Four publicly available real-world datasets (Stroke Prediction, Pima Indians Diabetes, Cardiovascular Disease, and Framingham Heart Study) were used, and the results show that the proposed model consistently outperforms baseline methods, including traditional and deep learning-based techniques. An ablation study reveals the additive value of each component of the loss function. Additionally, we assessed the downstream utility of the imputed data through classification tasks, where datasets imputed by the proposed method yielded the highest receiver operating characteristic area under the curve scores across all scenarios. The model demonstrates strong scalability and robustness, improving performance with larger datasets and higher feature counts. These results underscore the capacity of the proposed method to produce not only numerically accurate but also semantically useful imputations, making it a promising solution for robust data recovery in clinical applications.
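The abstract does not give the exact formulation or weights of the composite loss, so the sketch below is one plausible reading of its three terms (masked MSE on missing entries, a clean-vs-corrupted consistency penalty, and a variance-matching penalty), written in plain Python for clarity; the function names and weights `lam_noise`/`lam_var` are hypothetical.

```python
# Hedged sketch of a composite imputation loss with three additive terms.

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def composite_loss(x_true, x_hat, x_hat_noisy, mask,
                   lam_noise=0.1, lam_var=0.01):
    """mask[i] is True where the entry was held out as 'missing' for training.

    - masked MSE: reconstruction error only on the missing entries;
    - noise term: reconstructions of clean vs. corrupted input should agree;
    - variance term: discourage collapsed (near-constant) reconstructions.
    """
    missing = [(t - h) ** 2 for t, h, m in zip(x_true, x_hat, mask) if m]
    masked_mse = mean(missing) if missing else 0.0
    noise_term = mean([(a - b) ** 2 for a, b in zip(x_hat, x_hat_noisy)])
    var_term = (variance(x_true) - variance(x_hat)) ** 2
    return masked_mse + lam_noise * noise_term + lam_var * var_term

loss = composite_loss(
    x_true=[1.0, 2.0, 3.0, 4.0],
    x_hat=[1.1, 2.2, 2.9, 3.8],
    x_hat_noisy=[1.0, 2.3, 2.8, 3.9],
    mask=[False, True, True, False],
)
print(round(loss, 4))  # 0.0268
```

In a real autoencoder these would be tensor operations inside the training loop; the point here is only how the three penalties compose into one objective.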
Funding: This work was supported by the National Science Foundation of China (No. 62171387), the Science and Technology Program of Sichuan Province (No. 2024NSFSC0468), and the China Postdoctoral Science Foundation (No. 2019M663475).
Abstract: As an important resource in data links, time slots should be strategically allocated to enhance transmission efficiency and resist eavesdropping, especially given the tremendous increase in the number of nodes and their diverse communication needs. It is crucial to design control sequences with robust randomness and conflict-freeness to properly support differentiated access control in a data link. In this paper, we propose a hierarchical access control scheme based on control sequences that achieves high utilization of time slots and differentiated access control. A theoretical bound on the hierarchical control sequence set is derived to characterize the constraints on the parameters of the sequence set. Moreover, two classes of optimal hierarchical control sequence sets satisfying the theoretical bound are constructed, both of which enable the scheme to achieve maximum utilization of time slots. Compared with a fixed time-slot allocation scheme, our scheme reduces the symbol error rate by up to 9%, indicating a significant improvement in anti-interference and anti-eavesdropping capabilities.
Funding: This work was funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R104), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Modern intrusion detection systems (MIDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class intrusion detection using the KDD99 and IDS datasets, which reflect real-world network behavior through a blend of normal traffic and diverse attack classes. The methodology begins with sophisticated data preprocessing, incorporating both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions, ensuring standardized, model-ready inputs. Dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks' hunting strategies; HHO efficiently identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners; this layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was recorded on UNSW-NB15, with an accuracy of 99.44%, demonstrating the model's effectiveness, and after balancing the model showed a clear improvement in detecting attacks. We tested the model on four datasets to show the effectiveness of the proposed approach and performed an ablation study to check the effect of each parameter; the proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
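The class-balancing step named above, SMOTE, has a simple core idea: each synthetic minority sample is a random interpolation between a minority sample and one of its nearest minority neighbors. A minimal pure-Python sketch (real pipelines typically use imbalanced-learn's `SMOTE`; the function below is illustrative):

```python
import random

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def smote(minority, n_new, k=2, rng=None):
    """Generate n_new synthetic minority samples by neighbor interpolation."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen base sample
        neighbors = sorted((s for s in minority if s is not base),
                           key=lambda s: euclidean(base, s))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
new_samples = smote(minority, n_new=4)
print(len(new_samples))  # 4 synthetic samples inside the minority region
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the minority region rather than being noisy copies.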
Funding: This work was funded by the University of Transport and Communications (UTC) under grant number T2025-CN-004.
Abstract: Reversible data hiding (RDH) enables secret data embedding while preserving complete cover image recovery, making it crucial for applications requiring image integrity. The pixel value ordering (PVO) technique used with multiple stego images provides good image quality but often low embedding capacity. To address these challenges, this paper proposes a high-capacity RDH scheme based on PVO that generates three stego images from a single cover image. The cover image is partitioned into non-overlapping blocks with pixels sorted in ascending order. Four secret bits are embedded into each block's maximum pixel value, while three additional bits are embedded into the second-largest value when the pixel difference exceeds a predefined threshold. A similar embedding strategy is applied on the minimum side of the block, including the second-smallest pixel value. This design enables each block to embed up to 14 bits of secret data. Experimental results demonstrate that the proposed method achieves significantly higher embedding capacity and improved visual quality compared with existing triple-stego RDH approaches, advancing the field of reversible steganography.
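For readers new to PVO, the classic single-bit form of the idea (which the paper extends to multi-bit, triple-stego embedding) works on the maximum of a sorted block: embed one bit when the top two pixels differ by exactly 1, shift the maximum when they differ by more. A minimal sketch showing the reversibility, not the paper's actual scheme:

```python
def pvo_embed_max(sorted_block, bit):
    """Embed `bit` into the block maximum; returns the modified block."""
    block = list(sorted_block)
    d = block[-1] - block[-2]
    if d == 1:
        block[-1] += bit        # embeddable: max becomes max + bit
    elif d > 1:
        block[-1] += 1          # not embeddable: shift to keep decodability
    return block                # d == 0: left unchanged

def pvo_extract_max(stego_block):
    """Recover (bit_or_None, original_block) from a stego block."""
    block = list(stego_block)
    d = block[-1] - block[-2]
    if d == 1:                  # was d == 1, embedded bit 0
        return 0, block
    if d == 2:                  # was d == 1, embedded bit 1
        block[-1] -= 1
        return 1, block
    if d > 2:                   # was d > 1, shifted
        block[-1] -= 1
    return None, block          # d == 0 carries no data

stego = pvo_embed_max([3, 5, 7, 8], bit=1)
bit, restored = pvo_extract_max(stego)
print(stego, bit, restored)  # [3, 5, 7, 9] 1 [3, 5, 7, 8]
```

The disjoint post-embedding ranges (d = 0, d in {1, 2}, d > 2) are what make extraction unambiguous and the cover block exactly recoverable, which is the property the proposed multi-bit scheme must also preserve.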
Funding: This work was funded by the Science and Technology Project of State Grid Corporation of China (5108-202355437A-3-2-ZN).
Abstract: The increasing complexity of China's electricity market creates substantial challenges for settlement automation, data consistency, and operational scalability. Existing provincial settlement systems are fragmented, lack a unified data structure, and depend heavily on manual intervention to process high-frequency and retroactive transactions. To address these limitations, a graph-based unified settlement framework is proposed to enhance automation, flexibility, and adaptability in electricity market settlements. A flexible attribute-graph model is employed to represent heterogeneous multi-market data, enabling standardized integration, rapid querying, and seamless adaptation to evolving business requirements. An extensible operator library is designed to support configurable settlement rules, and a suite of modular tools, including dataset generation, formula configuration, billing templates, and task scheduling, facilitates end-to-end automated settlement processing. A robust refund-clearing mechanism is further incorporated, utilizing sandbox execution, data-version snapshots, dynamic lineage tracing, and real-time change-capture technologies to enable rapid and accurate recalculation under dynamic policy and data revisions. Case studies based on real-world data from regional Chinese markets validate the effectiveness of the proposed approach, demonstrating marked improvements in computational efficiency, system robustness, and automation. Moreover, enhanced settlement accuracy and high temporal granularity improve price-signal fidelity, promote cost-reflective tariffs, and incentivize energy-efficient and demand-responsive behavior among market participants. The method not only supports equitable and transparent market operations but also provides a generalizable, scalable foundation for modern electricity settlement platforms in increasingly complex and dynamic market environments.
Funding: This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2023-00235509, Development of security monitoring technology based on network behavior against encrypted cyber threats in ICT convergence environment).
Abstract: With the increasing emphasis on personal information protection, encryption through security protocols has emerged as a critical requirement in data transmission and reception. Nevertheless, IoT ecosystems comprise heterogeneous networks where outdated systems coexist with the latest devices, spanning a range from non-encrypted to fully encrypted devices. Given the limited visibility into payloads in this context, this study investigates AI-based attack detection methods that leverage encrypted traffic metadata, eliminating the need for decryption and minimizing system performance degradation, especially in light of these heterogeneous devices. Using the UNSW-NB15 and CICIoT-2023 datasets, encrypted and unencrypted traffic were categorized according to security protocol, and AI-based intrusion detection experiments were conducted for each traffic type based on metadata. To mitigate the problem of class imbalance, eight different data sampling techniques were applied, and their effectiveness was comparatively analyzed from various perspectives using two ensemble models and three deep learning (DL) models. The experimental results confirmed that metadata-based attack detection is feasible using only encrypted traffic. On the UNSW-NB15 dataset, the F1-score for encrypted traffic was approximately 0.98, about 4.3% higher than that for unencrypted traffic (approximately 0.94). In addition, analysis of the encrypted traffic in the CICIoT-2023 dataset using the same method showed a significantly lower F1-score of roughly 0.43, indicating that the quality of the dataset and the preprocessing approach have a substantial impact on detection performance. Furthermore, when data sampling techniques were applied to encrypted traffic, recall on the UNSW-NB15 (encrypted) dataset improved by up to 23.0%, and on the CICIoT-2023 (encrypted) dataset by 20.26%, a similar level of improvement. Notably, on CICIoT-2023, the F1-score and Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) increased by 59.0% and 55.94%, respectively. These results suggest that data sampling can have a positive effect even in encrypted environments, although the extent of the improvement may vary with data quality, model architecture, and sampling strategy.
Abstract: While the Ordos Basin is recognized for its substantial hydrocarbon exploration prospects, its rugged loess tableland terrain has made seismic exploration exceptionally challenging [1-3]. Persistent obstacles such as complex 3D survey planning, low signal-to-noise-ratio raw data, inadequate near-surface velocity modeling, and imaging inaccuracy have long hindered the advancement of seismic exploration across this region. Through a problem-solving approach rooted in geological target analysis, this research systematically investigates the behavior of nodal seismometer-based high-density seismic acquisition on the loess plateau. Tailored advances in waveform enhancement and depth velocity modeling methodologies have been engineered. Field validations confirm that the optimized workflow delivers marked improvements in amplitude preservation and imaging resolution, offering novel insights for future reservoir characterization.
Funding: This work was funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. DGSSR-2024-02-01264.
Abstract: Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES that combines text-based, vector-based, and embedding-based similarity measures to improve scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine-learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R² of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R² to 88.95%. In Experiment 4, an optimal data-efficiency training approach was introduced, with training data portions increased from 5% to 50%; the study found that using just 10% of the data achieved near-peak performance, with an R² of 85.49%, emphasizing an effective trade-off between performance and computational cost. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
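The abstract does not name its exact similarity measures, but a typical vector-based feature of the kind such hybrid AES systems combine is the cosine similarity between an essay and a reference answer represented as term-count vectors; embedding-based variants apply the same formula to dense vectors. An illustrative sketch (English text for readability; the functions and vocabulary are hypothetical):

```python
import math

def term_vector(text, vocabulary):
    """Represent a text as counts of each vocabulary term."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

def cosine_similarity(a, b):
    """cos(a, b) = a . b / (|a| |b|); 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

vocab = ["water", "cycle", "evaporation", "rain", "clouds"]
reference = term_vector("water evaporation forms clouds then rain", vocab)
essay = term_vector("the water cycle makes clouds and rain", vocab)
print(round(cosine_similarity(reference, essay), 3))  # 0.75
```

In the hybrid setup, scores like this become one column in the feature matrix fed to the RF regressor, alongside text-based and embedding-based measures.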
Funding: This work was supported by the project "Romanian Hub for Artificial Intelligence - HRIA", Smart Growth, Digitization and Financial Instruments Program, 2021-2027, MySMIS No. 334906.
Abstract: Objective expertise evaluation of individuals, as a prerequisite for team formation, has been a long-standing desideratum in large software development companies. With the rapid advancement of machine learning methods and the reliable data stored in project management tools' datasets, automating this evaluation process becomes a natural step forward. In this context, our approach focuses on quantifying software developer expertise using metadata from task-tracking systems. We mathematically formalize two categories of expertise: technology-specific expertise, which denotes the skills required for a particular technology, and general expertise, which encapsulates overall knowledge of the software industry. We then automatically classify the zones of expertise associated with each task a developer has worked on, using Bidirectional Encoder Representations from Transformers (BERT)-like transformers to handle the unique characteristics of project tool datasets effectively. Finally, our method evaluates the proficiency of each software specialist across completed projects from both technology-specific and general perspectives. The method was experimentally validated, yielding promising results.
Abstract: With the rapid growth of biomedical data, particularly multi-omics data spanning genomics, transcriptomics, proteomics, metabolomics, and epigenomics, medical research and clinical decision-making confront both new opportunities and obstacles. The huge and diversified nature of these datasets cannot always be managed with traditional data analysis methods. As a consequence, deep learning has emerged as a strong tool for analyzing omics data owing to its ability to handle complex, non-linear relationships. This paper explores the fundamental concepts of deep learning and how they are used in multi-omics medical data mining. We demonstrate how autoencoders, variational autoencoders, multimodal models, attention mechanisms, transformers, and graph neural networks enable pattern analysis and recognition across omics data. Deep learning has been found to be effective in disease classification, biomarker identification, gene network learning, and therapeutic efficacy prediction. We also consider critical problems such as data quality, model explainability, reproducibility of findings, and computational requirements, along with future directions in combining omics with clinical and imaging data, explainable AI, federated learning, and real-time diagnostics. Overall, this study emphasizes the need for cross-disciplinary collaboration to advance deep learning-based multi-omics research for precision medicine and the understanding of complex disorders.
Funding: This work was supported by the National Key Research and Development Program of China (2022YFA0912100), the National Natural Science Foundation of China (32270098 and 32470073), the Fundamental Research Funds for the Central Universities (2662024JC015), and the National Key Laboratory of Agricultural Microbiology (AML2024D02) to Z.Z.
Abstract: Translation is a crucial step in gene expression. Over the past decade, the development and application of ribosome profiling (Ribo-seq) have significantly advanced our understanding of translational regulation in vivo. However, the analysis and visualization of Ribo-seq data remain challenging: despite the availability of various analytical pipelines, improvements in comprehensiveness, accuracy, and user-friendliness are still necessary. In this study, we develop RiboParser/RiboShiny, a robust framework for analyzing and visualizing Ribo-seq data. Building on published methods, we optimize ribosome structure-based and start/stop-based models to improve the accuracy and stability of P-site detection, even in species with a high proportion of leaderless transcripts. Leveraging these improvements, RiboParser offers comprehensive analyses, including quality control, gene-level analysis, codon-level analysis, and the analysis of Ribo-seq variants. Meanwhile, RiboShiny provides a user-friendly and adaptable platform for data visualization, facilitating deeper insights into the translational landscape. Furthermore, the integration of standardized genome annotation makes the platform universally applicable to organisms with sequenced genomes. This framework has the potential to significantly improve the precision and efficiency of Ribo-seq data interpretation, deepening our understanding of translational regulation.
Abstract: High-throughput transcriptomics has evolved from bulk RNA-seq to single-cell and spatial profiling, yet its clinical translation still depends on effective integration across diverse omics and data modalities. Emerging foundation models and multimodal learning frameworks are enabling scalable and transferable representations of cellular states, while advances in interpretability and real-world data integration are bridging the gap between discovery and clinical application. This paper outlines a concise roadmap for AI-driven, transcriptome-centered multi-omics integration in precision medicine (Figure 1).
Funding: Supported by the Xuhui District Health Commission, No. SHXH202214.
Abstract: Gastrointestinal tumors require personalized treatment strategies due to their heterogeneity and complexity. Multimodal artificial intelligence (AI) addresses this challenge by integrating diverse data sources, including computed tomography (CT), magnetic resonance imaging (MRI), endoscopic imaging, and genomic profiles, to enable intelligent decision-making for individualized therapy. This approach leverages AI algorithms to fuse imaging, endoscopic, and omics data, facilitating comprehensive characterization of tumor biology, prediction of treatment response, and optimization of therapeutic strategies. By combining CT and MRI for structural assessment, endoscopic data for real-time visual inspection, and genomic information for molecular profiling, multimodal AI enhances the accuracy of patient stratification and treatment personalization. The clinical implementation of this technology demonstrates potential for improving patient outcomes, advancing precision oncology, and supporting individualized care in gastrointestinal cancers. Ultimately, multimodal AI serves as a transformative tool in oncology, bridging data integration with clinical application to tailor therapies effectively.