Abstract: The data production elements are driving profound transformations in the real economy across production objects, methods, and tools, generating significant economic effects such as industrial structure upgrading. This paper reveals the mechanisms by which data elements drive the "three transformations" (high-end, intelligent, and green) of the manufacturing sector, theoretically elucidating how data elements influence these transformations. The study finds that data elements significantly enhance the high-end, intelligent, and green levels of China's manufacturing industry. In terms of impact pathways, data elements primarily influence the development of high-tech industries and overall green technological innovation, thereby affecting the high-end, intelligent, and green transformation of the industry.
Abstract: As a new type of production factor in healthcare, healthcare data elements have been rapidly integrated into various health production processes, such as clinical assistance, health management, biological testing, and operation and supervision [1,2]. Healthcare data elements include biological and clinical data related to disease, environmental health data associated with life, and operational and healthcare management data related to healthcare activities (Figure 1). Activities such as the construction of a data value assessment system, the development of a data circulation and sharing platform, and the authorization of data compliance and operation products support the strong growth momentum of the market for healthcare data elements in China [3].
Funding: Supported by the National Natural Science Foundation of China (Grants 72474022, 71974011, 72174022, 71972012, 71874009) and the "BIT Think Tank" Promotion Plan of the Science and Technology Innovation Program of Beijing Institute of Technology (Grants 2024CX14017, 2023CX13029).
Abstract: With increasing demand for data circulation, ensuring data security and privacy is paramount, specifically protecting privacy while maximizing utility. Blockchain, while decentralized and transparent, faces challenges in privacy protection and data verification, especially for sensitive data, and existing schemes often suffer from inefficiency and high overhead. We propose a privacy protection scheme using BGV homomorphic encryption and Pedersen secret sharing. The scheme enables secure computation on encrypted data, with the private key sharded and verified via Pedersen secret sharing, ensuring data consistency and immutability. The blockchain framework manages key shards, verifies secrets, and aids security auditing. This approach allows trusted computation without revealing the underlying data. Preliminary results demonstrate the scheme's feasibility in ensuring data privacy and security, making data available but not visible. This study provides an effective solution for data sharing and privacy protection in blockchain applications.
Funding: Supported by the National Key Research and Development Plan of China (Grant No. 2020YFB1005500).
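The Pedersen component can be illustrated compactly. Below is a minimal sketch of Pedersen-style verifiable secret sharing in Python: a key is split into Shamir shares alongside a blinding polynomial, coefficient commitments are published, and each shareholder can verify its share against them. The tiny group parameters are toys chosen for readability, and the stand-alone setting is an assumption; this is not the paper's BGV-integrated protocol.

```python
# Toy Pedersen verifiable secret sharing (VSS) over a small prime field.
# Illustrative parameters only -- real deployments need large safe primes.
import random

p = 2039           # group modulus (a safe prime: 2039 = 2*1019 + 1)
q = 1019           # order of the subgroup in which exponents live
g, h = 4, 9        # two subgroup generators with unknown mutual discrete log (toy choice)

def share(secret, t, n):
    """Split `secret` into n shares, any t of which reconstruct it."""
    f = [secret % q] + [random.randrange(q) for _ in range(t - 1)]  # secret polynomial
    r = [random.randrange(q) for _ in range(t)]                     # blinding polynomial
    commits = [pow(g, fj, p) * pow(h, rj, p) % p for fj, rj in zip(f, r)]
    shares = [(i, sum(fj * i**j for j, fj in enumerate(f)) % q,
                  sum(rj * i**j for j, rj in enumerate(r)) % q) for i in range(1, n + 1)]
    return shares, commits

def verify(i, si, ri, commits):
    """Check g^si * h^ri against the published coefficient commitments."""
    lhs = pow(g, si, p) * pow(h, ri, p) % p
    rhs = 1
    for j, c in enumerate(commits):
        rhs = rhs * pow(c, i**j % q, p) % p
    return lhs == rhs

shares, commits = share(secret=123, t=3, n=5)
print(all(verify(i, si, ri, commits) for i, si, ri in shares))  # True
```

The check works because the product of commitments raised to powers of i telescopes to g^f(i) * h^r(i), so a tampered share fails verification without the verifier learning the secret.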
Abstract: This paper explores the development of interpretable data elements from raw data using Kolmogorov-Arnold Networks (KAN). With the exponential growth of data in contemporary society, there is an urgent need for effective data processing methods to unlock the full potential of this resource. The study focuses on the application of KAN in the transportation sector to transform raw traffic data into meaningful data elements. The core of the research is the KAN-T-GCN model, which synergizes Kolmogorov-Arnold Networks with Temporal Graph Convolutional Networks (T-GCN). The model demonstrates superior performance in predicting traffic speeds, outperforming existing methods in accuracy, reliability, and interpretability. It was evaluated on real-world datasets from Shenzhen, Los Angeles, and the San Francisco Bay Area, showing significant improvements across metrics. The paper highlights the potential of KAN-T-GCN to revolutionize data-driven decision-making in traffic management and other sectors, underscoring its ability to handle dynamic updates and maintain data integrity.
Funding: Supported by the EU H2020 Research and Innovation Program under the Marie Sklodowska-Curie Grant Agreement (Project DEEP, Grant No. 101109045), the National Natural Science Foundation of China (Nos. NSFC 61925105 and 62171257), the Tsinghua University-China Mobile Communications Group Co., Ltd. Joint Institute, and the Fundamental Research Funds for the Central Universities, China (No. FRF-NP-20-03).
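The KAN building block, a learnable univariate function attached to each input-output edge, can be sketched in a few lines. The NumPy snippet below uses Gaussian radial basis functions where the original KAN formulation uses B-splines; it is an illustrative stand-in, not the paper's KAN-T-GCN.

```python
# Minimal KAN-style layer: each input-output edge carries its own learnable
# univariate function, parameterized here as a sum of Gaussian bumps.
# (Original KAN uses B-splines; this RBF stand-in keeps the sketch short.)
import numpy as np

rng = np.random.default_rng(0)

class KANLayer:
    def __init__(self, d_in, d_out, n_basis=8, lo=-1.0, hi=1.0):
        self.centers = np.linspace(lo, hi, n_basis)          # shared basis grid
        self.width = (hi - lo) / n_basis
        # one coefficient vector per (input, output) edge
        self.coef = rng.normal(0, 0.1, (d_in, d_out, n_basis))

    def __call__(self, x):                                   # x: (batch, d_in)
        # phi[b, i, k] = exp(-((x_bi - c_k)/w)^2), basis evaluated per input
        phi = np.exp(-(((x[:, :, None] - self.centers) / self.width) ** 2))
        # y_bo = sum_i sum_k coef[i, o, k] * phi[b, i, k]  (sum of edge functions)
        return np.einsum('bik,iok->bo', phi, self.coef)

layer = KANLayer(d_in=4, d_out=2)
x = rng.uniform(-1, 1, (16, 4))
print(layer(x).shape)  # (16, 2)
```

Because each edge function is a linear combination of fixed basis functions, the learned coefficients can be plotted per edge, which is the source of KAN's interpretability claim.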
Abstract: This article explores the characteristics of data resources from the perspective of production factors, analyzes the requirements for trustworthy circulation technology, and designs a fusion architecture and related solutions, including multi-party data intersection calculation and distributed machine learning. It also compares performance differences, conducts formal verification, points out the value and limitations of the architectural innovation, and looks ahead to future opportunities.
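One of the listed solutions, multi-party data intersection calculation, can be illustrated with a toy two-party private set intersection based on commutative masking. This Diffie-Hellman-style construction, shown under toy parameters, is a generic stand-in and not necessarily the article's design.

```python
# Toy Diffie-Hellman-style private set intersection (PSI), one building block
# for data intersection without revealing non-shared records.
# Toy modulus; a deployment needs a large prime-order group and hashing to it.
import hashlib
import random

p = 2**61 - 1   # toy prime modulus (a Mersenne prime)

def h(item):    # hash items into the multiplicative group mod p
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % (p - 1) + 1

a, b = random.randrange(2, p - 1), random.randrange(2, p - 1)  # secret keys

alice = {"id_001", "id_007", "id_042"}
bob = {"id_007", "id_042", "id_999"}

# Each party masks its hashed items with its own exponent, exchanges the
# results, and masks the other side's values again; since (h^a)^b == (h^b)^a,
# doubly masked values coincide exactly on the intersection.
alice_masked = {x: pow(h(x), a, p) for x in alice}
bob_masked = {x: pow(h(x), b, p) for x in bob}
alice_double = {pow(v, b, p) for v in alice_masked.values()}   # computed by Bob
bob_double = {x: pow(v, a, p) for x, v in bob_masked.items()}  # computed by Alice

print(sorted(x for x, v in bob_double.items() if v in alice_double))
# ['id_007', 'id_042']
```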
Abstract: On October 18, 2017, the 19th National Congress Report called for the implementation of the Healthy China Strategy. The development of biomedical data plays a pivotal role in advancing this strategy. Since the 18th National Congress of the Communist Party of China, China has vigorously promoted the integration and implementation of the Healthy China and Digital China strategies. The National Health Commission has prioritized the development of health and medical big data, issuing policies to promote standardized applications and foster innovation in "Internet + Healthcare." Biomedical data has significantly contributed to precision medicine, personalized health management, drug development, disease diagnosis, public health monitoring, and epidemic prediction capabilities.
Abstract: [Objective] In response to the issue of insufficient integrity in hourly routine meteorological element data files, this paper aims to improve the availability and reliability of data files and provide high-quality data file support for meteorological forecasting and services. [Method] An efficient and accurate method for data file quality control and fusion processing is developed. By locating the missing measurement times, data are extracted from the "AWZ.db" database and the minute routine meteorological element data file, and merged into the hourly routine meteorological element data file. [Result] Data processing efficiency and accuracy are significantly improved, and the problem of incomplete hourly routine meteorological element data files is solved. The paper also emphasizes the importance of ensuring the accuracy of the files used and carefully checking and verifying the fusion results, and proposes strategies to improve data quality. [Conclusion] This method provides convenience for observation personnel and effectively improves the integrity and accuracy of data files. In the future, it is expected to provide more reliable data support for meteorological forecasting and services.
Funding: Supported by the Fifth Batch of Innovation Teams of Wuzhou Meteorological Bureau ("Wuzhou Innovation Team for Enhancing the Comprehensive Meteorological Observation Ability through Digitization and Intelligence") and the Wuzhou Science and Technology Planning Project (202402122, 202402119).
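The locate-and-merge step maps naturally onto a dataframe workflow. The sketch below is a minimal pandas version, with hypothetical column and file names, and a CSV read standing in for the "AWZ.db" extraction.

```python
# Sketch of the fusion step: locate missing hourly records, then rebuild them
# from minute-level observations. Column and file names are hypothetical.
import pandas as pd

hourly = pd.read_csv("hourly.csv", parse_dates=["time"]).set_index("time")
minute = pd.read_csv("minute.csv", parse_dates=["time"]).set_index("time")

# 1) Locate missing measurement times on a full hourly grid.
grid = pd.date_range(hourly.index.min(), hourly.index.max(), freq="h")
missing = grid.difference(hourly.index)

# 2) Aggregate minute data to hourly values (mean for temperature/pressure,
#    sum for precipitation), then keep only the missing hours.
agg = minute.resample("h").agg({"temperature": "mean",
                                "pressure": "mean",
                                "precip": "sum"})
patch = agg.loc[agg.index.intersection(missing)]

# 3) Merge the patch into the hourly file and flag fused rows for verification.
fused = pd.concat([hourly, patch.assign(source="minute_fusion")]).sort_index()
fused.to_csv("hourly_fused.csv")
```

Flagging fused rows with a source column keeps the manual check-and-verify step the paper recommends straightforward.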
Abstract: We propose a novel workflow for fast forward modeling of well logs in axially symmetric 2D models of the near-wellbore environment. The approach integrates the finite element method with deep residual neural networks to achieve exceptional computational efficiency and accuracy. The workflow is demonstrated through the modeling of wireline electromagnetic propagation resistivity logs, where the measured responses exhibit a highly nonlinear relationship with formation properties. The motivation for this research is the need for advanced modeling algorithms fast enough for use in modern quantitative interpretation tools, where thousands of simulations may be required in iterative inversion processes. The proposed algorithm is up to 3000 times faster than the finite element method alone when utilizing a GPU, while still ensuring high accuracy, making it well suited for practical applications where reliable payzone assessment is needed in complex environmental scenarios. Furthermore, the algorithm's efficiency positions it as a promising tool for stochastic Bayesian inversion, facilitating reliable uncertainty quantification in subsurface property estimation.
Funding: Financially supported by the Russian federal research project No. FWZZ-2022-0026, "Innovative aspects of electrodynamics in problems of exploration and oilfield geophysics".
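A minimal version of such a surrogate, a residual multilayer perceptron trained on precomputed finite-element input-response pairs, is sketched below in PyTorch; the network sizes and the placeholder dataset are illustrative assumptions, not the paper's architecture.

```python
# Sketch of a residual-MLP surrogate: formation parameters in, simulated log
# responses out, trained on precomputed finite-element pairs. All sizes and
# the random placeholder dataset are illustrative.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, width):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(width, width), nn.ReLU(),
                                 nn.Linear(width, width))
    def forward(self, x):
        return torch.relu(x + self.net(x))       # residual skip connection

class Surrogate(nn.Module):
    def __init__(self, n_params=6, n_responses=4, width=128, depth=6):
        super().__init__()
        self.inp = nn.Linear(n_params, width)
        self.blocks = nn.Sequential(*[ResBlock(width) for _ in range(depth)])
        self.out = nn.Linear(width, n_responses)
    def forward(self, x):
        return self.out(self.blocks(torch.relu(self.inp(x))))

model = Surrogate()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
X = torch.rand(1024, 6)       # placeholder for FEM input parameters
Y = torch.rand(1024, 4)       # placeholder for FEM-simulated responses
for epoch in range(5):        # once trained, inference replaces the FEM call
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), Y)
    loss.backward()
    opt.step()
```

The speedup in such workflows comes from amortization: the FEM cost is paid once to build the training set, after which each inversion query is a single batched forward pass.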
Abstract: In modern ZnO varistors, traditional aging mechanisms based on increased power consumption are no longer relevant, because power consumption decreases during DC aging. Prolonged exposure to both AC and DC voltages results in increased leakage current, decreased breakdown voltage, and lower nonlinearity, ultimately compromising protective performance. To investigate the evolution of electrical properties during DC aging, this work developed a finite element model based on Voronoi networks and conducted accelerated aging tests on commercial varistors. Throughout the aging process, current-voltage characteristics and Schottky barrier parameters were measured and analyzed. The results indicate that, when subjected to constant voltage, current flows through regions with larger grain sizes, forming discharge channels. As aging progresses, the current increasingly concentrates in these channels, leading to a decline in the varistor's overall performance. Furthermore, analysis of the Schottky barrier parameters shows that the changes in electrical performance during aging are non-monotonic. These findings offer theoretical support for understanding the aging mechanisms and condition assessment of modern stable ZnO varistors.
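The geometric core of such models, a Voronoi tessellation whose cells play the role of ZnO grains, can be generated with a few lines of SciPy; the electrical network solved on top of it is beyond this sketch.

```python
# Geometric starting point for Voronoi-network varistor models: tessellate a
# unit square into "grains" and measure their sizes. Grain-size dispersion is
# what lets current favor larger-grain paths in the full electrical model.
import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(1)
seeds = rng.uniform(0, 1, (200, 2))          # grain nuclei
vor = Voronoi(seeds)

def polygon_area(pts):
    """Shoelace formula for a simple polygon given as an (n, 2) array."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

areas = []
for region_idx in vor.point_region:
    region = vor.regions[region_idx]
    if -1 in region or not region:           # skip unbounded boundary cells
        continue
    areas.append(polygon_area(vor.vertices[region]))

print(f"{len(areas)} bounded grains, mean area {np.mean(areas):.4f}")
```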
Abstract: Missing data presents a crucial challenge in data analysis, especially in high-dimensional datasets, where it often leads to biased conclusions and degraded model performance. In this study, we present a novel autoencoder-based imputation framework that integrates a composite loss function to enhance robustness and precision. The proposed loss combines (i) a guided, masked mean squared error focusing on missing entries; (ii) a noise-aware regularization term to improve resilience against data corruption; and (iii) a variance penalty to encourage expressive yet stable reconstructions. We evaluate the proposed model across four missingness mechanisms, namely Missing Completely at Random, Missing at Random, Missing Not at Random, and Missing Not at Random with quantile censorship, under systematically varied feature counts, sample sizes, and missingness ratios ranging from 5% to 60%. Four publicly available real-world datasets (Stroke Prediction, Pima Indians Diabetes, Cardiovascular Disease, and Framingham Heart Study) were used, and the results show that the proposed model consistently outperforms baseline methods, including traditional and deep learning-based techniques. An ablation study reveals the additive value of each component of the loss function. Additionally, we assessed the downstream utility of imputed data through classification tasks, where datasets imputed by the proposed method yielded the highest receiver operating characteristic area under the curve scores across all scenarios. The model demonstrates strong scalability and robustness, improving performance with larger datasets and higher feature counts. These results underscore the capacity of the proposed method to produce imputations that are not only numerically accurate but also semantically useful, making it a promising solution for robust data recovery in clinical applications.
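The three loss components can be written down directly. The PyTorch sketch below is one plausible reading of the composite loss; the exact functional forms and weights are assumptions, not the paper's implementation.

```python
# Sketch of the composite imputation loss: (i) masked MSE on missing entries,
# (ii) a noise-aware term on observed entries, (iii) a variance penalty.
# Functional forms and weights here are illustrative assumptions.
import torch

def composite_loss(x_true, x_hat, miss_mask, noise_std=0.05,
                   w_missing=1.0, w_noise=0.5, w_var=0.1):
    obs_mask = 1.0 - miss_mask
    # (i) guided, masked MSE: reconstruction error only where values were masked
    l_missing = ((x_hat - x_true) ** 2 * miss_mask).sum() / miss_mask.sum().clamp(min=1)
    # (ii) noise-aware regularization: observed entries must survive perturbation
    x_noisy = x_true + noise_std * torch.randn_like(x_true)
    l_noise = ((x_hat - x_noisy) ** 2 * obs_mask).sum() / obs_mask.sum().clamp(min=1)
    # (iii) variance penalty: keep reconstruction variance near the data variance
    l_var = (x_hat.var(dim=0) - x_true.var(dim=0)).abs().mean()
    return w_missing * l_missing + w_noise * l_noise + w_var * l_var

x_true = torch.rand(32, 10)
miss = (torch.rand(32, 10) < 0.3).float()        # artificial missingness mask
x_hat = torch.rand(32, 10, requires_grad=True)   # stand-in for decoder output
loss = composite_loss(x_true, x_hat, miss)
loss.backward()
```

As in standard imputation training, the mask marks artificially hidden entries, so ground truth is available for the masked term during training.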
Abstract: Modern intrusion detection systems (IDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class IDS using the KDD99 and IDS datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with careful data preprocessing, incorporating both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions, ensuring standardized, model-ready inputs. Dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks' hunting strategies, which identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners; this layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was an accuracy of 99.44% on UNSW-NB15, demonstrating the model's effectiveness, and after balancing the model showed a clear improvement in detecting attacks. We tested the model on four datasets to show the effectiveness of the proposed approach, performed an ablation study to check the effect of each parameter, and found the proposed model to be computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
Funding: Funded by the Princess Nourah bint Abdulrahman University Researchers Supporting Project (No. PNURSP2025R104), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
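The balancing and stacking stages map onto standard libraries. The sketch below uses scikit-learn, imbalanced-learn, and the xgboost package on a synthetic imbalanced dataset; the HHO feature-selection step is replaced by a plain univariate filter as a stand-in, so this is an outline of the pipeline rather than the paper's method.

```python
# Sketch of the balancing and stacking stages: SMOTE on the training split,
# then XGBoost/SVM/RF base learners under a stacking meta-classifier.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Synthetic imbalanced stand-in for an IDS dataset (90% normal, 10% attack).
X, y = make_classification(n_samples=2000, n_features=30, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

scaler = RobustScaler().fit(X_tr)                        # outlier-robust scaling
selector = SelectKBest(f_classif, k=15).fit(scaler.transform(X_tr), y_tr)
X_tr_sel = selector.transform(scaler.transform(X_tr))    # HHO stand-in

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr_sel, y_tr)  # rebalance

stack = StackingClassifier(
    estimators=[("xgb", XGBClassifier(eval_metric="logloss")),
                ("svm", SVC(probability=True)),
                ("rf", RandomForestClassifier())],
    final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X_bal, y_bal)
print(stack.score(selector.transform(scaler.transform(X_te)), y_te))
```

Note that SMOTE is fit only on the training split, so synthetic samples never leak into evaluation.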
Abstract: Reversible data hiding (RDH) enables secret data embedding while preserving complete cover image recovery, making it crucial for applications requiring image integrity. The pixel value ordering (PVO) technique used with multiple stego images provides good image quality but often low embedding capacity. To address these challenges, this paper proposes a high-capacity RDH scheme based on PVO that generates three stego images from a single cover image. The cover image is partitioned into non-overlapping blocks whose pixels are sorted in ascending order. Four secret bits are embedded into each block's maximum pixel value, while three additional bits are embedded into the second-largest value when the pixel difference exceeds a predefined threshold. A similar embedding strategy is applied to the minimum side of the block, including the second-smallest pixel value. This design enables each block to embed up to 14 bits of secret data. Experimental results demonstrate that the proposed method achieves significantly higher embedding capacity and improved visual quality compared with existing triple-stego RDH approaches, advancing the field of reversible steganography.
Funding: Funded by the University of Transport and Communications (UTC) under grant number T2025-CN-004.
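For orientation, the sketch below implements classic single-bit, max-side PVO embedding for one block, which the proposed multi-bit, triple-stego scheme generalizes; it is the textbook baseline, not the paper's variant.

```python
# Classic single-bit, max-side PVO embedding for one block; the paper's
# scheme extends this to multi-bit payloads across three stego images.
# Min-side embedding is symmetric and omitted here.
import numpy as np

def pvo_embed_max(block, bit):
    """Embed one bit into the block's maximum pixel; returns modified block."""
    flat = block.flatten().astype(int)
    order = np.argsort(flat, kind="stable")      # ascending pixel order
    i_max, i_2nd = order[-1], order[-2]
    e = flat[i_max] - flat[i_2nd]                # prediction error of the max
    if e == 1:                                   # embeddable bin
        flat[i_max] += bit
    elif e >= 2:                                 # shift to keep decoding unique
        flat[i_max] += 1
    return flat.reshape(block.shape)             # e == 0: left unchanged

def pvo_extract_max(block):
    """Recover the bit (or None) and restore the original block."""
    flat = block.flatten().astype(int)
    order = np.argsort(flat, kind="stable")
    i_max, i_2nd = order[-1], order[-2]
    e = flat[i_max] - flat[i_2nd]
    if e == 1:   bit = 0                          # embedded 0, nothing to undo
    elif e == 2: bit, flat[i_max] = 1, flat[i_max] - 1   # embedded 1
    elif e > 2:  bit, flat[i_max] = None, flat[i_max] - 1  # undo the shift
    else:        bit = None                      # e == 0: was untouched
    return bit, flat.reshape(block.shape)

blk = np.array([[52, 55], [57, 58]])
stego = pvo_embed_max(blk, bit=1)
print(pvo_extract_max(stego))                    # (1, original block)
```

Reversibility follows from the disjoint bins: after embedding, an error of 1 or 2 encodes a bit, anything above 2 is a shifted pixel, and 0 marks an untouched block.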
Abstract: With the increasing emphasis on personal information protection, encryption through security protocols has emerged as a critical requirement in data transmission and reception. Nevertheless, IoT ecosystems comprise heterogeneous networks where outdated systems coexist with the latest devices, spanning non-encrypted to fully encrypted equipment. Given the limited visibility into payloads in this context, this study investigates AI-based attack detection methods that leverage encrypted traffic metadata, eliminating the need for decryption and minimizing system performance degradation in such heterogeneous environments. Using the UNSW-NB15 and CICIoT-2023 datasets, encrypted and unencrypted traffic were categorized according to security protocol, and AI-based intrusion detection experiments were conducted for each traffic type based on metadata. To mitigate class imbalance, eight data sampling techniques were applied, and their effectiveness was comparatively analyzed using two ensemble models and three deep learning (DL) models from various perspectives. The experimental results confirmed that metadata-based attack detection is feasible using only encrypted traffic. In the UNSW-NB15 dataset, the F1-score for encrypted traffic was approximately 0.98, about 4.3% higher than that for unencrypted traffic (approximately 0.94). Analysis of the encrypted traffic in the CICIoT-2023 dataset using the same method showed a much lower F1-score of roughly 0.43, indicating that dataset quality and the preprocessing approach have a substantial impact on detection performance. Furthermore, when data sampling techniques were applied to encrypted traffic, recall improved by up to 23.0% on UNSW-NB15 (encrypted) and by 20.26% on CICIoT-2023 (encrypted), a similar level of improvement. Notably, on CICIoT-2023, the F1-score and Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) increased by 59.0% and 55.94%, respectively. These results suggest that data sampling can have a positive effect even in encrypted environments, although the extent of the improvement may vary with data quality, model architecture, and sampling strategy.
Funding: Supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2023-00235509, Development of security monitoring technology based network behavior against encrypted cyber threats in ICT convergence environment).
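The sampling comparison itself is straightforward to reproduce in outline: train one classifier under several imbalanced-learn strategies and compare F1 and ROC-AUC on a held-out set. Synthetic data stands in for the traffic metadata below.

```python
# Sketch of the sampling comparison: same classifier, several imblearn
# strategies, F1 / ROC-AUC on a held-out split. Metadata features are
# simulated here with a synthetic imbalanced dataset.
from imblearn.combine import SMOTEENN
from imblearn.over_sampling import ADASYN, RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

samplers = {"none": None, "ros": RandomOverSampler(random_state=42),
            "smote": SMOTE(random_state=42), "adasyn": ADASYN(random_state=42),
            "rus": RandomUnderSampler(random_state=42),
            "smoteenn": SMOTEENN(random_state=42)}

for name, sampler in samplers.items():
    Xb, yb = (X_tr, y_tr) if sampler is None else sampler.fit_resample(X_tr, y_tr)
    clf = RandomForestClassifier(random_state=42).fit(Xb, yb)
    proba = clf.predict_proba(X_te)[:, 1]
    print(f"{name:9s} f1={f1_score(y_te, proba > 0.5):.3f} "
          f"auc={roc_auc_score(y_te, proba):.3f}")
```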
Abstract: The increasing complexity of China's electricity market creates substantial challenges for settlement automation, data consistency, and operational scalability. Existing provincial settlement systems are fragmented, lack a unified data structure, and depend heavily on manual intervention to process high-frequency and retroactive transactions. To address these limitations, a graph-based unified settlement framework is proposed to enhance automation, flexibility, and adaptability in electricity market settlements. A flexible attribute-graph model represents heterogeneous multi-market data, enabling standardized integration, rapid querying, and seamless adaptation to evolving business requirements. An extensible operator library supports configurable settlement rules, and a suite of modular tools, including dataset generation, formula configuration, billing templates, and task scheduling, facilitates end-to-end automated settlement processing. A robust refund-clearing mechanism is further incorporated, utilizing sandbox execution, data-version snapshots, dynamic lineage tracing, and real-time change-capture technologies to enable rapid and accurate recalculations under dynamic policy and data revisions. Case studies based on real-world data from regional Chinese markets validate the effectiveness of the proposed approach, demonstrating marked improvements in computational efficiency, system robustness, and automation. Moreover, enhanced settlement accuracy and high temporal granularity improve price-signal fidelity, promote cost-reflective tariffs, and incentivize energy-efficient and demand-responsive behavior among market participants. The method not only supports equitable and transparent market operations but also provides a generalizable, scalable foundation for modern electricity settlement platforms in increasingly complex and dynamic market environments.
Funding: Funded by the Science and Technology Project of State Grid Corporation of China (5108-202355437A-3-2-ZN).
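The attribute-graph idea can be sketched with networkx: market entities as attributed nodes, contracts as attributed edges, and settlement rules as operators applied over the graph. All names, attributes, and the tariff formula below are hypothetical.

```python
# Sketch of the flexible attribute-graph model: attributed nodes/edges plus a
# configurable settlement "operator" evaluated by graph traversal.
import networkx as nx

G = nx.DiGraph()
G.add_node("gen_A", kind="generator")
G.add_node("retail_B", kind="retailer")
G.add_edge("gen_A", "retail_B", contract="bilateral",
           volumes_mwh=[120.0, 95.5], prices=[310.0, 298.0])  # per interval

def settle(graph, operator):
    """Apply a configurable settlement operator to every contract edge."""
    return {(u, v): operator(d) for u, v, d in graph.edges(data=True)}

# One operator from an extensible library: a simple energy charge summed
# over settlement intervals (volume * price per interval).
energy_charge = lambda d: sum(v * p for v, p in zip(d["volumes_mwh"], d["prices"]))

bills = settle(G, energy_charge)
print(bills)   # {('gen_A', 'retail_B'): 65659.0}
```

New market rules then become new operators and new edge attributes rather than schema migrations, which is the flexibility the attribute-graph model is after.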
Abstract: The moving morphable component (MMC) topology optimization method, a typical explicit topology optimization method, has attracted wide attention. In the MMC framework, the surrogate material model is currently the main vehicle for finite element analysis, and its effectiveness has been fully confirmed. However, the surrogate material model suffers accuracy problems when dealing with boundary elements, which affects the topology optimization results. In this study, a boundary element reconstruction (BER) model is proposed, built on the surrogate material model within the MMC framework, to improve the accuracy of topology optimization. The BER model reconstructs boundary elements by refining the local meshes and obtaining new nodes within boundary elements; the density of each boundary element is then recalculated from the new node information, which is more accurate than in the original model. Based on the new boundary-element densities, the material properties and volume information of the boundary elements are updated. Compared with other finite element analysis methods, the BER model is simple, feasible, and improves computational accuracy. Finally, the effectiveness and superiority of the proposed method are verified by comparison with the optimization results of the original surrogate material model on several numerical examples.
Funding: Supported by the Science and Technology Research Project of Henan Province (242102241055), the Industry-University-Research Collaborative Innovation Base on Automobile Lightweight of "Science and Technology Innovation in Central Plains" (2024KCZY315), and the Opening Fund of the State Key Laboratory of Structural Analysis, Optimization and CAE Software for Industrial Equipment (GZ2024A03-ZZU).
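The effect of boundary-element reconstruction can be seen in a toy example: sample a component's topology description function (TDF) on refined nodes inside one element and recompute its density as the covered fraction. The axis-aligned bar TDF below stands in for the MMC superellipse description.

```python
# Sketch of the BER idea: subdivide an element crossed by a component
# boundary and recompute its density from refined nodes. The TDF here is a
# toy axis-aligned bar (>= 0 inside the component, < 0 outside).
import numpy as np

def tdf(x, y, cx=0.5, cy=0.5, half_len=0.3, half_wid=0.08):
    return np.minimum(half_len - np.abs(x - cx), half_wid - np.abs(y - cy))

def element_density(x0, y0, h, refine=8):
    """Fraction of a square element (corner (x0, y0), size h) inside the TDF."""
    s = (np.arange(refine) + 0.5) / refine * h        # refined node offsets
    xx, yy = np.meshgrid(x0 + s, y0 + s)
    return float(np.mean(tdf(xx, yy) >= 0))

coarse = element_density(0.75, 0.45, 0.1, refine=1)    # single-point estimate
refined = element_density(0.75, 0.45, 0.1, refine=16)  # BER-style refinement
print(coarse, refined)   # 1.0 vs ~0.5: refinement resolves partial coverage
```

The coarse estimate declares the element fully solid, while refinement reveals that the component boundary cuts it roughly in half, which is exactly the boundary-element error the BER model targets.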
Abstract: On the ancient stone-paved lanes of Songyang County, morning mist lingered as a faint fragrance of tea wafted from a century-old house. Inside, Yang Junjie, a tea maker born in the 1980s, worked deftly at the stove, his hands moving swiftly over the scorching iron wok as tender green tea leaves danced between his fingers.
Abstract: Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES that combines text-based, vector-based, and embedding-based similarity measures to improve scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine-learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R² of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R² to 88.95%. In Experiment 4, a data-efficient training approach was introduced, with training portions increased from 5% to 50%; using just 10% of the data achieved near-peak performance, with an R² of 85.49%, an effective trade-off between performance and computational cost. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
Funding: Funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. DGSSR-2024-02-01264.
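Two of the three similarity channels can be sketched directly; the embedding channel is appended the same way once a sentence-embedding model is chosen. Texts and the feature set below are illustrative.

```python
# Sketch of the hybrid feature idea: score an essay by its similarity to a
# high-scoring reference along a text-based channel (character n-gram
# overlap) and a vector-based channel (TF-IDF cosine).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def char_ngrams(text, n=3):
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a, b):
    return len(a & b) / max(len(a | b), 1)

def hybrid_features(essay, reference):
    tfidf = TfidfVectorizer().fit([essay, reference])
    vecs = tfidf.transform([essay, reference])
    return {
        "text_sim": jaccard(char_ngrams(essay), char_ngrams(reference)),
        "vector_sim": float(cosine_similarity(vecs[0], vecs[1])[0, 0]),
        # "embedding_sim": cosine of sentence-embedding vectors (model-dependent)
    }

print(hybrid_features("التعليم أساس التقدم في المجتمع",
                      "التعليم هو أساس تقدم المجتمعات"))
```

Feature dictionaries of this form, computed against several references, would then feed the Random Forest regressor described in the experiments.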
Abstract: Objective expertise evaluation of individuals, as a prerequisite for team formation, has been a long-standing desideratum in large software development companies. With rapid advancements in machine learning methods, and reliable existing data stored in project management tools' datasets, automating this evaluation process becomes a natural step forward. In this context, our approach quantifies software developer expertise using metadata from task-tracking systems. We mathematically formalize two categories of expertise: technology-specific expertise, which denotes the skills required for a particular technology, and general expertise, which encapsulates overall knowledge of the software industry. We then automatically classify the zones of expertise associated with each task a developer has worked on, using Bidirectional Encoder Representations from Transformers (BERT)-like transformers to handle the unique characteristics of project tool datasets effectively. Finally, our method evaluates the proficiency of each software specialist across already completed projects from both technology-specific and general perspectives. The method was experimentally validated, yielding promising results.
Funding: Supported by the project "Romanian Hub for Artificial Intelligence - HRIA", Smart Growth, Digitization and Financial Instruments Program, 2021-2027, MySMIS No. 334906.
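The two expertise formalizations reduce to weighted aggregations over classified tasks. The sketch below assumes the upstream BERT-like classifier has already labeled each task with a technology zone; the effort weighting is an illustrative assumption.

```python
# Sketch of the expertise formalization: technology-specific expertise as an
# effort-weighted sum over matching tasks, general expertise over all tasks.
# Fields and weights are illustrative, not the paper's exact definitions.
from collections import defaultdict

tasks = [  # (developer, predicted technology zone, effort e.g. story points)
    ("dev1", "kotlin", 8), ("dev1", "kotlin", 5), ("dev1", "sql", 3),
    ("dev2", "sql", 13), ("dev2", "devops", 8),
]

def expertise(task_log):
    tech = defaultdict(float)     # (developer, technology) -> score
    general = defaultdict(float)  # developer -> overall score
    for dev, zone, effort in task_log:
        tech[(dev, zone)] += effort
        general[dev] += effort
    return tech, general

tech, general = expertise(tasks)
print(tech[("dev1", "kotlin")], general["dev2"])  # 13.0 21.0
```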
Abstract: With the rapid growth of biomedical data, particularly multi-omics data spanning genomics, transcriptomics, proteomics, metabolomics, and epigenomics, medical research and clinical decision-making confront both new opportunities and obstacles. The huge and diversified nature of these datasets cannot always be managed with traditional data analysis methods. As a consequence, deep learning has emerged as a strong tool for analysing multi-omics data owing to its ability to handle complex, non-linear relationships. This paper explores the fundamental concepts of deep learning and how they are used in multi-omics medical data mining. We demonstrate how autoencoders, variational autoencoders, multimodal models, attention mechanisms, transformers, and graph neural networks enable pattern analysis and recognition across omics data. Deep learning has been found effective in illness classification, biomarker identification, gene network learning, and therapeutic efficacy prediction. We also consider critical problems such as data quality, model explainability, reproducibility of findings, and computational power requirements, as well as future directions combining omics with clinical and imaging data, explainable AI, federated learning, and real-time diagnostics. Overall, this study emphasises the need for cross-disciplinary collaboration to advance deep learning-based multi-omics research for precision medicine and the understanding of complicated disorders.
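As one concrete instance of the multimodal models discussed, the sketch below wires per-omics encoders into a shared fused representation for a downstream classifier; the dimensions and the two chosen views are illustrative.

```python
# Minimal sketch of multi-omics fusion: each omics block gets its own encoder,
# and the concatenated latent codes feed a downstream disease classifier.
import torch
import torch.nn as nn

class OmicsEncoder(nn.Module):
    def __init__(self, d_in, d_latent=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(),
                                 nn.Linear(128, d_latent))
    def forward(self, x):
        return self.net(x)

enc_rna = OmicsEncoder(d_in=2000)      # transcriptomics view
enc_meth = OmicsEncoder(d_in=5000)     # epigenomics (methylation) view
head = nn.Linear(64, 2)                # classifier on the fused latent code

x_rna, x_meth = torch.rand(8, 2000), torch.rand(8, 5000)
z = torch.cat([enc_rna(x_rna), enc_meth(x_meth)], dim=1)   # fused code
logits = head(z)
print(logits.shape)    # torch.Size([8, 2])
```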
Abstract: We report the results of an experiment on synthesizing the ^(287,288)Mc isotopes (Z=115) using the fusion-evaporation reaction ^(243)Am(^(48)Ca,4n,3n)^(287,288)Mc at the Spectrometer for Heavy Atoms and Nuclear Structure-2 (SHANS2), a gas-filled recoil separator located at the China Accelerator Facility for Superheavy Elements (CAFE2). In total, 20 decay chains are attributed to ^(288)Mc and 1 decay chain is assigned to ^(287)Mc. The measured α-decay properties of ^(287,288)Mc and their descendants are consistent with the known data. No additional decay chains originating from the 2n or 5n reaction channels were detected. The excitation function of the ^(243)Am(^(48)Ca,3n)^(288)Mc reaction was measured at the picobarn cross-section level, which indicates the facility's promising capability for the study of heavy and superheavy nuclei.
Funding: Supported in part by the National Key R&D Program of China (Contract Nos. 2023YFA1606500, 2024YFE0109800, and 2024YFE0110400), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB34010000), the Gansu Key Project of Science and Technology (Grant No. 23ZDGA014), the Guangdong Major Project of Basic and Applied Basic Research (Grant No. 2021B0301030006), the National Natural Science Foundation of China (Grant Nos. 12105328, W2412040, 12475126, 12422507, 12035011, 12375118, 12435008, and W2412043), the Chinese Academy of Sciences Project for Young Scientists in Basic Research (Grant No. YSBR-002), the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Grant Nos. 2020409 and 2023439), and the Russian Science Foundation (Grant No. 25-42-00003).