With the rise of remote collaboration, the demand for advanced storage and collaboration tools has rapidly increased. However, traditional collaboration tools primarily rely on access control, leaving data stored on cloud servers vulnerable due to insufficient encryption. This paper introduces a novel mechanism that encrypts data in ‘bundle’ units, designed to meet the dual requirements of efficiency and security for frequently updated collaborative data. Each bundle includes update information, allowing only the updated portions to be re-encrypted when changes occur. The encryption method proposed in this paper addresses the inefficiencies of traditional encryption modes, such as Cipher Block Chaining (CBC) and Counter (CTR), which require decrypting and re-encrypting the entire dataset whenever updates occur. The proposed method leverages update-specific information embedded within data bundles and metadata that maps the relationship between these bundles and the plaintext data. By utilizing this information, the method accurately identifies the modified portions and applies algorithms to selectively re-encrypt only those sections. This approach significantly enhances the efficiency of data updates while maintaining high performance, particularly in large-scale data environments. To validate this approach, we conducted experiments measuring execution time as both the size of the modified data and the total dataset size varied. Results show that the proposed method significantly outperforms CBC and CTR modes in execution speed, with greater performance gains as data size increases. Additionally, our security evaluation confirms that this method provides robust protection against both passive and active attacks.
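A minimal sketch of the bundle-level idea described above, assuming each bundle is sealed independently with AES-GCM under a fresh nonce; the bundle size, container layout, and update function are illustrative assumptions, not the scheme from the paper:

```python
# Illustrative sketch: re-encrypt only the bundles that changed, instead of
# decrypting and re-encrypting the whole dataset (as CBC/CTR modes would require).
# Assumes the `cryptography` package; bundle size and metadata layout are hypothetical.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

BUNDLE_SIZE = 4096  # hypothetical bundle granularity in bytes

def encrypt_bundles(key: bytes, plaintext: bytes) -> list[dict]:
    aead = AESGCM(key)
    bundles = []
    for off in range(0, len(plaintext), BUNDLE_SIZE):
        chunk = plaintext[off:off + BUNDLE_SIZE]
        nonce = os.urandom(12)                       # fresh nonce per bundle
        bundles.append({"offset": off,               # metadata mapping bundle -> plaintext range
                        "nonce": nonce,
                        "ct": aead.encrypt(nonce, chunk, None)})
    return bundles

def update_bundle(key: bytes, bundles: list[dict], index: int, new_chunk: bytes) -> None:
    """Re-encrypt only the modified bundle; all other ciphertexts stay untouched."""
    aead = AESGCM(key)
    nonce = os.urandom(12)
    bundles[index]["nonce"] = nonce
    bundles[index]["ct"] = aead.encrypt(nonce, new_chunk, None)

key = AESGCM.generate_key(bit_length=256)
store = encrypt_bundles(key, os.urandom(3 * BUNDLE_SIZE))
update_bundle(key, store, 1, b"edited section".ljust(BUNDLE_SIZE, b"\0"))
```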
With the continuous advancement of the tiered diagnosis and treatment system, the medical consortium model has gained increasing attention as an important approach to promoting the vertical integration of healthcare resources. Within this context, laboratory data, as a key component of healthcare information systems, urgently requires efficient sharing and intelligent analysis. This paper designs and constructs an intelligent early warning system for laboratory data based on a cloud platform tailored to the medical consortium model. Through standardized data formats and unified access interfaces, the system enables the integration and cleaning of laboratory data across multiple healthcare institutions. By combining medical rule sets with machine learning models, the system achieves graded alerts and rapid responses to abnormal key indicators and potential outbreaks of infectious diseases. Practical deployment results demonstrate that the system significantly improves the utilization efficiency of laboratory data, strengthens public health event monitoring, and optimizes inter-institutional collaboration. The paper also discusses challenges encountered during system implementation, such as inconsistent data standards, security and compliance concerns, and model interpretability, and proposes corresponding optimization strategies. These findings provide a reference for the broader application of intelligent medical early warning systems.
Airborne LiDAR (Light Detection and Ranging) is an evolving high-tech active remote sensing technology that can acquire large-area topographic data and quickly generate DEM (Digital Elevation Model) products. Combined with image data, this technology can further enrich and extract spatial geographic information. In practice, however, because of the limited operating range of airborne LiDAR and the large area of a typical survey task, it is necessary to register and stitch the point clouds of adjacent flight strips. Gross errors must be eliminated and the systematic errors in the data effectively reduced. This paper therefore investigates point cloud registration methods in urban building areas, aiming to improve the accuracy and processing efficiency of airborne LiDAR data. An improved post-ICP (Iterative Closest Point) point cloud registration method is proposed to achieve accurate registration and efficient stitching of point clouds, providing potential technical support for applications in related fields.
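As a hedged illustration of the baseline step such a workflow builds on, the sketch below registers two neighbouring strips with the standard point-to-point ICP in Open3D; the file names, voxel size, and correspondence distance are assumptions, and the paper's improved post-ICP refinement is not reproduced here:

```python
# Minimal ICP registration of two adjacent LiDAR strips with Open3D.
# File names, voxel size and correspondence distance are illustrative assumptions.
import numpy as np
import open3d as o3d

source = o3d.io.read_point_cloud("strip_a.ply")
target = o3d.io.read_point_cloud("strip_b.ply")

# Downsample to speed up correspondence search on large airborne strips.
source_ds = source.voxel_down_sample(voxel_size=0.5)
target_ds = target.voxel_down_sample(voxel_size=0.5)

result = o3d.pipelines.registration.registration_icp(
    source_ds, target_ds,
    max_correspondence_distance=1.0,
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())

print("fitness:", result.fitness, "inlier RMSE:", result.inlier_rmse)
source.transform(result.transformation)   # apply the estimated rigid transform for stitching
```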
Large-scale point cloud datasets form the basis for training various deep learning networks and achieving high-quality network processing tasks. Because of the diversity and robustness constraints of the data, data augmentation (DA) methods are utilised to expand dataset diversity and scale. However, owing to the complex and distinct characteristics of LiDAR point cloud data from different platforms (such as missile-borne and vehicular LiDAR data), directly applying traditional 2D visual-domain DA methods to 3D data can lead to networks that do not robustly achieve the corresponding tasks. To address this issue, the present study explores DA for missile-borne LiDAR point clouds using a Monte Carlo (MC) simulation method that closely resembles practical application. Firstly, a model of the multi-sensor imaging system is established, taking into account the joint errors arising from the platform itself and the relative motion during the imaging process. A distortion simulation method based on MC simulation for augmenting missile-borne LiDAR point cloud data is then proposed, underpinned by an analysis of the combined errors between different modal sensors, achieving high-quality augmentation of point cloud data. The effectiveness of the proposed method in addressing imaging system errors and distortion simulation is validated using the imaging scene dataset constructed in this paper. Comparative experiments against current state-of-the-art algorithms on point cloud detection and single-object tracking tasks demonstrate that the proposed method improves the performance of networks trained on unaugmented datasets by over 17.3% and 17.9%, respectively, surpassing the SOTA performance of current point cloud DA algorithms.
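A minimal Monte Carlo-style sketch of the kind of error-driven augmentation discussed above, assuming the joint platform and motion errors can be summarised as small random rotations, translations, and per-point range noise; the error magnitudes are hypothetical, not those derived in the paper:

```python
# Monte Carlo distortion augmentation for a LiDAR point cloud (N x 3 array).
# Error standard deviations are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def mc_augment(points: np.ndarray, n_samples: int = 4,
               angle_std_deg: float = 0.3, trans_std_m: float = 0.2,
               range_noise_std_m: float = 0.05) -> list[np.ndarray]:
    """Draw several distorted copies of one point cloud."""
    samples = []
    for _ in range(n_samples):
        # Small random attitude error (roll, pitch, yaw), converted to a rotation matrix.
        r, p, y = np.deg2rad(rng.normal(0.0, angle_std_deg, size=3))
        Rx = np.array([[1, 0, 0], [0, np.cos(r), -np.sin(r)], [0, np.sin(r), np.cos(r)]])
        Ry = np.array([[np.cos(p), 0, np.sin(p)], [0, 1, 0], [-np.sin(p), 0, np.cos(p)]])
        Rz = np.array([[np.cos(y), -np.sin(y), 0], [np.sin(y), np.cos(y), 0], [0, 0, 1]])
        R = Rz @ Ry @ Rx
        t = rng.normal(0.0, trans_std_m, size=3)                        # random positioning error
        noise = rng.normal(0.0, range_noise_std_m, size=points.shape)   # per-point sensor noise
        samples.append(points @ R.T + t + noise)
    return samples

cloud = rng.uniform(-50, 50, size=(10000, 3))
augmented = mc_augment(cloud)
```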
A basic procedure for transforming readable data into encoded forms is encryption, which ensures security when the right decryption keys are used. Hadoop is susceptible to possible cyber-attacks because it lacks built-in security measures, even though it can effectively handle and store enormous datasets using the Hadoop Distributed File System (HDFS). The increasing number of data breaches emphasizes how urgently creative encryption techniques are needed in cloud-based big data settings. This paper presents Adaptive Attribute-Based Honey Encryption (AABHE), a state-of-the-art technique that combines honey encryption with Ciphertext-Policy Attribute-Based Encryption (CP-ABE) to provide improved data security. Even if intercepted, AABHE makes sure that sensitive data cannot be accessed by unauthorized parties. With a focus on protecting huge files in HDFS, the suggested approach achieves 98% security robustness and 95% encryption efficiency, outperforming other encryption methods including Ciphertext-Policy Attribute-Based Encryption (CP-ABE), Key-Policy Attribute-Based Encryption (KP-ABE), and Advanced Encryption Standard combined with Attribute-Based Encryption (AES+ABE). By fixing Hadoop’s security flaws, AABHE fortifies its protections against data breaches and enhances Hadoop’s dependability as a platform for processing and storing massive amounts of data.
The integration of the Internet of Things (IoT) into healthcare systems improves patient care, boosts operational efficiency, and contributes to cost-effective healthcare delivery. However, overcoming several associated challenges, such as data security, interoperability, and ethical concerns, is crucial to realizing the full potential of IoT in healthcare. Real-time anomaly detection plays a key role in protecting patient data and maintaining device integrity amidst the additional security risks posed by interconnected systems. In this context, this paper presents a novel method for healthcare data privacy analysis. The technique is based on the identification of anomalies in cloud-based IoT networks, and it is optimized using explainable artificial intelligence. For anomaly detection, the Radial Boltzmann Gaussian Temporal Fuzzy Network (RBGTFN) is used to perform the information privacy analysis for healthcare data. Remora Colony Swarm Optimization is then used to optimize the network. The performance of the model in identifying anomalies across a variety of healthcare data is evaluated in an experimental study that measures the accuracy, precision, latency, Quality of Service (QoS), and scalability of the model. A remarkable 95% precision, 93% latency, 89% quality of service, 98% detection accuracy, and 96% scalability were obtained by the suggested model, as shown by the subsequent findings.
Well logging technology has accumulated a large amount of historical data through four generations of technological development, which forms the basis of well logging big data and digital assets. However, the value of these data has not been well stored, managed and mined. The development of cloud computing technology provides a rare opportunity for a logging big data private cloud. The traditional petrophysical evaluation and interpretation model has encountered great challenges when faced with new evaluation objects, and research on a solution that integrates distributed storage, processing and learning functions in a logging big data private cloud has not yet been carried out. The goal is to establish a distributed logging big data private cloud platform centered on a unified learning model, which achieves the distributed storage and processing of logging big data and facilitates the learning of novel knowledge patterns via a unified logging learning model integrating physical simulation and data models in a large-scale function space, thus resolving the geo-engineering evaluation problem of geothermal fields. Following the research idea of “logging big data cloud platform - unified logging learning model - large function space - knowledge learning & discovery - application”, the theoretical foundation of the unified learning model, the cloud platform architecture, data storage and learning algorithms, computing power allocation and platform monitoring, platform stability, and data security are analyzed. The designed logging big data cloud platform realizes parallel distributed storage and processing of data and learning algorithms. The feasibility of constructing a well logging big data cloud platform based on a unified learning model of physics and data is analyzed in terms of the structure, ecology, management and security of the cloud platform. The case study shows that the logging big data cloud platform has obvious technical advantages over traditional logging evaluation methods in terms of knowledge discovery method, data, software and results sharing, accuracy, speed and complexity.
Cloud computing has become an essential technology for the management and processing of large datasets, offering scalability, high availability, and fault tolerance. However, optimizing data replication across multiple data centers poses a significant challenge, especially when balancing opposing goals such as latency, storage costs, energy consumption, and network efficiency. This study introduces a novel dynamic optimization algorithm called Dynamic Multi-Objective Gannet Optimization (DMGO), designed to enhance data replication efficiency in cloud environments. Unlike traditional static replication systems, DMGO adapts dynamically to variations in network conditions, system demand, and resource availability. The approach utilizes multi-objective optimization techniques to efficiently balance data access latency, storage efficiency, and operational costs. DMGO continuously evaluates data center performance and adjusts replication strategies in real time to guarantee optimal system efficiency. Experimental evaluations conducted in a simulated cloud environment demonstrate that DMGO significantly outperforms conventional static algorithms, achieving faster data access, lower storage overhead, reduced energy consumption, and improved scalability. The proposed methodology offers a robust and adaptable solution for modern cloud systems, ensuring efficient resource consumption while maintaining high performance.
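As a generic illustration of the trade-off being balanced above, and not the DMGO algorithm itself, the sketch below scores candidate replica sites with a weighted combination of normalised latency, storage cost, and energy; the weights, metrics, and candidate list are hypothetical:

```python
# Illustrative weighted multi-objective score for choosing replica sites;
# weights, metrics and candidates are hypothetical and this is not the DMGO algorithm.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    latency_ms: float      # mean access latency to the demand region
    storage_cost: float    # relative storage cost per GB
    energy_kwh: float      # estimated energy per replica

def normalised(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo or 1.0) for v in values]

def rank_sites(sites, w_latency=0.5, w_cost=0.3, w_energy=0.2):
    lat = normalised([s.latency_ms for s in sites])
    cost = normalised([s.storage_cost for s in sites])
    eng = normalised([s.energy_kwh for s in sites])
    scores = [w_latency * l + w_cost * c + w_energy * e for l, c, e in zip(lat, cost, eng)]
    return sorted(zip(scores, sites), key=lambda x: x[0])   # lower combined cost is better

candidates = [Site("dc-east", 12, 0.8, 4.1), Site("dc-west", 35, 0.5, 3.2), Site("dc-eu", 80, 0.4, 2.9)]
for score, site in rank_sites(candidates):
    print(f"{site.name}: {score:.3f}")
```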
Snow cover plays a critical role in global climate regulation and hydrological processes. Accurate monitoring is essential for understanding snow distribution patterns, managing water resources, and assessing the impacts of climate change. Remote sensing has become a vital tool for snow monitoring, with the widely used Moderate-resolution Imaging Spectroradiometer (MODIS) snow products from the Terra and Aqua satellites. However, cloud cover often interferes with snow detection, making cloud removal techniques crucial for reliable snow product generation. This study evaluated the accuracy of four MODIS snow cover datasets generated through different cloud removal algorithms. Using real-time field camera observations from four stations in the Tianshan Mountains, China, this study assessed the performance of these datasets during three distinct snow periods: the snow accumulation period (September-November), the snowmelt period (March-June), and the stable snow period (December-February of the following year). The findings showed that cloud-free snow products generated using the Hidden Markov Random Field (HMRF) algorithm consistently outperformed the others, particularly under cloud cover, while cloud-free snow products using near-day synthesis and the spatiotemporal adaptive fusion method with error correction (STAR) demonstrated varying performance depending on terrain complexity and cloud conditions. This study highlighted the importance of considering terrain features, land cover types, and snow dynamics when selecting cloud removal methods, particularly in areas with rapid snow accumulation and melting. The results suggested that future research should focus on improving cloud removal algorithms through the integration of machine learning, multi-source data fusion, and advanced remote sensing technologies. By expanding validation efforts and refining cloud removal strategies, more accurate and reliable snow products can be developed, contributing to enhanced snow monitoring and better management of water resources in alpine and arid areas.
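A toy sketch of the near-day synthesis idea mentioned above: cloud-flagged pixels in one day's snow map are filled from the nearest cloud-free observation in adjacent days. The class codes and window size are assumptions and do not correspond to the official MODIS product flags:

```python
# Toy near-day synthesis: replace cloud-flagged pixels with the nearest-in-time
# cloud-free observation. Class codes (0 = no snow, 1 = snow, 255 = cloud) are assumptions.
import numpy as np

CLOUD = 255

def near_day_composite(stack: np.ndarray, day: int, max_offset: int = 2) -> np.ndarray:
    """stack: (days, rows, cols) snow/cloud maps; returns a cloud-reduced map for `day`."""
    out = stack[day].copy()
    for off in range(1, max_offset + 1):
        for d in (day - off, day + off):          # look backward then forward in time
            if 0 <= d < stack.shape[0]:
                mask = (out == CLOUD) & (stack[d] != CLOUD)
                out[mask] = stack[d][mask]
    return out

rng = np.random.default_rng(1)
days = rng.choice([0, 1, CLOUD], size=(5, 100, 100), p=[0.4, 0.4, 0.2]).astype(np.uint8)
filled = near_day_composite(days, day=2)
print("cloud pixels before:", int((days[2] == CLOUD).sum()), "after:", int((filled == CLOUD).sum()))
```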
Cloud storage, a core component of cloud computing, plays a vital role in the storage and management of data. Electronic Health Records (EHRs), which document users’ health information, are typically stored on cloud servers. However, users’ sensitive data would then become unregulated. In the event of data loss, cloud storage providers might conceal the fact that data has been compromised to protect their reputation and mitigate losses. Ensuring the integrity of data stored in the cloud remains a pressing issue that urgently needs to be addressed. In this paper, we propose a data auditing scheme for cloud-based EHRs that incorporates recoverability and batch auditing, alongside a thorough security and performance evaluation. Our scheme builds upon the indistinguishability-based privacy-preserving auditing approach proposed by Zhou et al. We identify that this scheme is insecure and vulnerable to forgery attacks on data storage proofs. To address these vulnerabilities, we enhanced the auditing process using masking techniques and designed new algorithms to strengthen security. We also provide formal proof of the security of the signature algorithm and the auditing scheme. Furthermore, our results show that our scheme effectively protects user privacy and is resilient against malicious attacks. Experimental results indicate that our scheme is not only secure and efficient but also supports batch auditing of cloud data. Specifically, when auditing 10,000 users, batch auditing reduces computational overhead by 101 s compared to normal auditing.
Cloud data centres have evolved with an energy management issue due to the constant increase in size, complexity and enormous consumption of energy. Energy management is a challenging issue that is critical in cloud data centres and an important research concern. In this paper, we propose a cuckoo search (CS)-based optimisation technique for virtual machine (VM) selection and a novel placement algorithm considering different constraints. The energy consumption model and the simulation model have been implemented for the efficient selection of VMs. The proposed model, CSOA-VM, not only reduces service level agreement (SLA) violations but also minimises VM migrations. The proposed model also saves energy, and the performance analysis shows that the energy consumption obtained is 1.35 kWh, the SLA violation is 9.2 and the number of VM migrations is about 268. Thus, there is an improvement of about 1.8% in energy consumption and a 2.1% reduction in SLA violations in comparison to existing techniques.
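For readers unfamiliar with cuckoo search, the hedged sketch below shows a generic CS loop with Lévy flights minimising a placeholder cost; the cost function, bounds, and parameters are illustrative stand-ins, not the CSOA-VM energy/SLA model:

```python
# Generic cuckoo search with Levy flights minimising a placeholder cost function;
# the cost function, bounds and parameters are illustrative, not the CSOA-VM model.
import numpy as np

rng = np.random.default_rng(42)

def levy(dim, beta=1.5):
    # Mantegna's algorithm for Levy-distributed step lengths.
    from math import gamma, sin, pi
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, dim)
    v = rng.normal(0, 1, dim)
    return u / np.abs(v) ** (1 / beta)

def cost(x):
    return float(np.sum(x ** 2))   # stand-in for an energy/SLA cost model

def cuckoo_search(dim=5, n_nests=15, iters=200, pa=0.25, lb=-5.0, ub=5.0):
    nests = rng.uniform(lb, ub, (n_nests, dim))
    fitness = np.array([cost(n) for n in nests])
    best = nests[fitness.argmin()].copy()
    for _ in range(iters):
        for i in range(n_nests):
            candidate = np.clip(nests[i] + 0.01 * levy(dim) * (nests[i] - best), lb, ub)
            f = cost(candidate)
            if f < fitness[i]:
                nests[i], fitness[i] = candidate, f
        # Abandon a fraction pa of the worst nests and rebuild them randomly.
        worst = fitness.argsort()[-int(pa * n_nests):]
        nests[worst] = rng.uniform(lb, ub, (len(worst), dim))
        fitness[worst] = [cost(n) for n in nests[worst]]
        best = nests[fitness.argmin()].copy()
    return best, fitness.min()

print(cuckoo_search())
```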
Early detection of convective clouds is vital for minimizing hazardous impacts. Forecasting convective initiation (CI) using current multispectral geostationary meteorological satellites is often challenged by high false-alarm rates and missed detections caused by limited resolution. In contrast, high-resolution earth observation satellites offer more detailed texture information, improving early detection capabilities. The authors propose a novel methodology that integrates the advanced features of China’s latest-generation satellites, Gaofen-4 (GF-4) and Fengyun-4A (FY-4A). This fusion method retains GF-4’s high-resolution details and FY-4A’s multispectral information. Two cases from different observational scenarios and weather conditions under GF-4’s staring mode were examined to compare the CI forecasts based on the fused data with those based solely on FY-4A data. The fused data demonstrated superior performance in detecting smaller-scale convective clouds, enabling earlier forecasting with a lead time of 15–30 minutes, and more accurate location identification. Integrating high-resolution earth observation satellites into early convective cloud detection provides valuable insights for forecasters and decision-makers, particularly given the current resolution limitations of geostationary meteorological satellites.
In this study, a variety of high-resolution satellite data were used to analyze the similarities and differences in the horizontal and vertical cloud microphysical characteristics of 11 tropical cyclones (TCs) in three different ocean basins. The results show that for the 11 TCs in different ocean basins, regardless of the season in which they were generated, when they reached or approached Category 4 their melting layers were all distributed in the vertical direction at a height of about 5 km. The high values of ice water content in the vertical direction of the 11 TCs all reach or approach about 2000 g cm^(–3). The total attenuated backscatter at 532 nm (TAB-532) can successfully characterize the distribution of areas with high ice water content when its vertical distribution was concentrated near 0.1 km^(–1) sr^(–1), possibly because the diameter distribution of the corresponding range of aerosol particles had a more favorable effect on the formation of ice nuclei, indicating that aerosols had a significant impact on the ice-phase processes and characteristics. Moreover, analysis of the horizontal distributions of cloud water path (CWP) and ice water path (IWP) shows that when the sea surface temperature was relatively high and the vertical wind shear was relatively small, the CWP and the IWP could reach relatively high values, which also demonstrates the importance of environmental field factors in shaping TC cloud microphysical characteristics.
The spatial distribution of discontinuities and the size of rock blocks are key indicators for rock mass quality evaluation and rockfall risk assessment. Traditional manual measurement is often dangerous or unreachable on some high and steep rock slopes. In contrast, unmanned aerial vehicle (UAV) photogrammetry is not limited by terrain conditions and can efficiently collect high-precision three-dimensional (3D) point clouds of rock masses through all-round, multi-angle photography for rock mass characterization. In this paper, a new method based on a 3D point cloud is proposed for discontinuity identification and refined rock block modeling. The method consists of four steps: (1) establish a point cloud spatial topology, and calculate the point cloud normal vectors and average point spacing based on several machine learning algorithms; (2) extract discontinuities using the density-based spatial clustering of applications with noise (DBSCAN) algorithm and fit the discontinuity plane by combining principal component analysis (PCA) with the natural breaks (NB) method; (3) insert points along line segments to generate an embedded discontinuity point cloud; and (4) adopt a Poisson reconstruction method for refined rock block modeling. The proposed method was applied to an outcrop of an ultrahigh, steep rock slope and compared with the results of previous studies and manual surveys. The results show that the method can eliminate the influence of discontinuity undulations on orientation measurement and capture the local concave-convex characteristics in the modeling of rock blocks. The calculation results are accurate and reliable and can meet the practical requirements of engineering.
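A hedged sketch of the clustering and plane-fitting step described in point (2): point normals are clustered with DBSCAN to separate discontinuity sets, and a best-fit plane is obtained per set with PCA. The file name, normal-estimation parameters, and DBSCAN settings are assumptions, and the natural breaks refinement is omitted:

```python
# Sketch of discontinuity-set extraction: cluster point normals with DBSCAN,
# then fit a best-fit plane to each cluster with PCA. Parameters are illustrative.
import numpy as np
import open3d as o3d
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

pcd = o3d.io.read_point_cloud("rock_outcrop.ply")          # hypothetical file name
pcd.estimate_normals(o3d.geometry.KDTreeSearchParamKNN(knn=30))
points = np.asarray(pcd.points)
normals = np.asarray(pcd.normals)

labels = DBSCAN(eps=0.05, min_samples=50).fit_predict(normals)

for lab in set(labels) - {-1}:                              # -1 marks noise points
    cluster = points[labels == lab]
    pca = PCA(n_components=3).fit(cluster)
    normal = pca.components_[-1]                            # least-variance direction = plane normal
    dip_dir = np.degrees(np.arctan2(normal[0], normal[1])) % 360   # approximate, east-north frame assumed
    dip = np.degrees(np.arccos(abs(normal[2])))
    print(f"set {lab}: {len(cluster)} pts, dip {dip:.1f} deg, dip direction {dip_dir:.1f} deg")
```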
Missing data presents a crucial challenge in data analysis, especially in high-dimensional datasets, where missing data often leads to biased conclusions and degraded model performance. In this study, we present a novel autoencoder-based imputation framework that integrates a composite loss function to enhance robustness and precision. The proposed loss combines (i) a guided, masked mean squared error focusing on missing entries; (ii) a noise-aware regularization term to improve resilience against data corruption; and (iii) a variance penalty to encourage expressive yet stable reconstructions. We evaluate the proposed model across four missingness mechanisms, namely Missing Completely at Random, Missing at Random, Missing Not at Random, and Missing Not at Random with quantile censorship, under systematically varied feature counts, sample sizes, and missingness ratios ranging from 5% to 60%. Four publicly available real-world datasets (Stroke Prediction, Pima Indians Diabetes, Cardiovascular Disease, and Framingham Heart Study) were used, and the results show that our proposed model consistently outperforms baseline methods, including traditional and deep learning-based techniques. An ablation study reveals the additive value of each component in the loss function. Additionally, we assessed the downstream utility of imputed data through classification tasks, where datasets imputed by the proposed method yielded the highest receiver operating characteristic area under the curve scores across all scenarios. The model demonstrates strong scalability and robustness, improving performance with larger datasets and higher feature counts. These results underscore the capacity of the proposed method to produce not only numerically accurate but also semantically useful imputations, making it a promising solution for robust data recovery in clinical applications.
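A minimal PyTorch sketch of a composite imputation loss in the spirit of terms (i)-(iii) above; the weighting coefficients and the exact form of each term are assumptions and may differ from the paper's definitions:

```python
# Sketch of a composite imputation loss: masked MSE on missing entries,
# a noise-aware consistency term, and a variance penalty. Weights are illustrative.
import torch

def composite_loss(x_true, x_hat, x_hat_noisy, miss_mask,
                   alpha=1.0, beta=0.1, gamma=0.01):
    # (i) guided, masked MSE: only positions that were actually missing contribute
    masked_mse = ((x_hat - x_true) ** 2 * miss_mask).sum() / miss_mask.sum().clamp(min=1)
    # (ii) noise-aware regularisation: reconstructions from corrupted inputs should agree
    noise_term = torch.mean((x_hat - x_hat_noisy) ** 2)
    # (iii) variance penalty: discourage collapsed, low-variance reconstructions
    var_term = torch.mean((x_hat.var(dim=0) - x_true.var(dim=0)) ** 2)
    return alpha * masked_mse + beta * noise_term + gamma * var_term

# Toy usage with random tensors standing in for a batch of imputed records.
x_true = torch.randn(32, 10)
mask = (torch.rand(32, 10) < 0.3).float()          # 1 where the value was missing
x_hat = torch.randn(32, 10, requires_grad=True)
x_hat_noisy = x_hat + 0.05 * torch.randn(32, 10)
loss = composite_loss(x_true, x_hat, x_hat_noisy, mask)
loss.backward()
```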
Modern intrusion detection systems (MIDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class IDS using the KDD99 and IDS datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with sophisticated data preprocessing, incorporating both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions, ensuring standardized and model-ready inputs. Critical dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks’ hunting strategies. HHO efficiently identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners. This layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was recorded with an accuracy of 99.44% for UNSW-NB15, demonstrating the model’s effectiveness. After balancing, the model demonstrated a clear improvement in detecting the attacks. We tested the model on four datasets to show the effectiveness of the proposed approach and performed an ablation study to check the effect of each parameter. The proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
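A hedged sketch of the balancing and stacking stages described above: SMOTE is applied to the training split only, and XGBoost, SVM, and Random Forest are stacked under a logistic-regression meta-learner. HHO feature selection is omitted, `y` is assumed to be integer-encoded class labels, and all hyperparameters are illustrative:

```python
# Sketch of SMOTE balancing + a stacked XGBoost/SVM/RF ensemble; hyperparameters are illustrative.
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer, RobustScaler
from sklearn.svm import SVC
from xgboost import XGBClassifier

def train_ids(X, y):
    # y is assumed to be integer-encoded class labels.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    X_tr, y_tr = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # balance only the training data
    stack = StackingClassifier(
        estimators=[("xgb", XGBClassifier(n_estimators=200, eval_metric="mlogloss")),
                    ("svm", make_pipeline(RobustScaler(), QuantileTransformer(), SVC(probability=True))),
                    ("rf", RandomForestClassifier(n_estimators=200))],
        final_estimator=LogisticRegression(max_iter=1000))
    stack.fit(X_tr, y_tr)
    print(classification_report(y_te, stack.predict(X_te)))
    return stack
```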
Reversible data hiding (RDH) enables secret data embedding while preserving complete cover image recovery, making it crucial for applications requiring image integrity. The pixel value ordering (PVO) technique used in multi-stego images provides good image quality but often results in low embedding capacity. To address these challenges, this paper proposes a high-capacity RDH scheme based on PVO that generates three stego images from a single cover image. The cover image is partitioned into non-overlapping blocks with pixels sorted in ascending order. Four secret bits are embedded into each block’s maximum pixel value, while three additional bits are embedded into the second-largest value when the pixel difference exceeds a predefined threshold. A similar embedding strategy is also applied to the minimum side of the block, including the second-smallest pixel value. This design enables each block to embed up to 14 bits of secret data. Experimental results demonstrate that the proposed method achieves significantly higher embedding capacity and improved visual quality compared to existing triple-stego RDH approaches, advancing the field of reversible steganography.
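To illustrate the underlying PVO principle, the sketch below embeds and extracts a single bit in the block maximum via the difference to the second-largest pixel; it is a simplified classical PVO step, not the paper's multi-bit, triple-stego-image scheme, and overflow handling and the location map are omitted:

```python
# Simplified single-bit PVO embedding/extraction on the block maximum, for illustration only.
# Overflow handling (pixels at 255) and the location map are omitted for brevity.
import numpy as np

def pvo_embed_block(block: np.ndarray, bit: int):
    flat = block.ravel().astype(np.int64)
    order = np.argsort(flat, kind="stable")          # ascending order of pixel values
    i_max, i_2nd = order[-1], order[-2]
    d = flat[i_max] - flat[i_2nd]
    used = False
    if d == 1:                                        # embeddable position
        flat[i_max] += bit
        used = True
    elif d > 1:                                       # shift to keep extraction unambiguous
        flat[i_max] += 1
    # d == 0: leave the block unchanged
    return flat.reshape(block.shape), used

def pvo_extract_block(stego: np.ndarray):
    flat = stego.ravel().astype(np.int64)
    order = np.argsort(flat, kind="stable")
    i_max, i_2nd = order[-1], order[-2]
    d = flat[i_max] - flat[i_2nd]
    bit = None
    if d in (1, 2):                                   # a bit was embedded
        bit = d - 1
        flat[i_max] -= bit
    elif d > 2:                                       # was shifted, carries no payload
        flat[i_max] -= 1
    return flat.reshape(stego.shape), bit

block = np.array([[52, 55], [57, 58]])
stego, used = pvo_embed_block(block, bit=1)
recovered, bit = pvo_extract_block(stego)
assert np.array_equal(recovered, block) and bit == 1
```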
With the increasing emphasis on personal information protection, encryption through security protocols has emerged as a critical requirement in data transmission and reception processes. Nevertheless, IoT ecosystems comprise heterogeneous networks where outdated systems coexist with the latest devices, spanning a range from non-encrypted to fully encrypted devices. Given the limited visibility into payloads in this context, this study investigates AI-based attack detection methods that leverage encrypted traffic metadata, eliminating the need for decryption and minimizing system performance degradation, especially in light of these heterogeneous devices. Using the UNSW-NB15 and CICIoT-2023 datasets, encrypted and unencrypted traffic were categorized according to security protocol, and AI-based intrusion detection experiments were conducted for each traffic type based on metadata. To mitigate the problem of class imbalance, eight different data sampling techniques were applied. The effectiveness of these sampling techniques was then comparatively analyzed from various perspectives using two ensemble models and three Deep Learning (DL) models. The experimental results confirmed that metadata-based attack detection is feasible using only encrypted traffic. In the UNSW-NB15 dataset, the F1-score for encrypted traffic was approximately 0.98, which is 4.3% higher than that of unencrypted traffic (approximately 0.94). In addition, analysis of the encrypted traffic in the CICIoT-2023 dataset using the same method showed a significantly lower F1-score of roughly 0.43, indicating that the quality of the dataset and the preprocessing approach have a substantial impact on detection performance. Furthermore, when data sampling techniques were applied to encrypted traffic, the recall on the UNSW-NB15 (Encrypted) dataset improved by up to 23.0%, and on the CICIoT-2023 (Encrypted) dataset by 20.26%, showing a similar level of improvement. Notably, in CICIoT-2023, the F1-score and Receiver Operating Characteristic Area Under the Curve (ROC-AUC) increased by 59.0% and 55.94%, respectively. These results suggest that data sampling can have a positive effect even in encrypted environments. However, the extent of the improvement may vary depending on data quality, model architecture, and sampling strategy.
Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language’s complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES by combining text-based, vector-based, and embedding-based similarity measures to improve essay scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R2 of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R2 to 88.95%. In Experiment 4, an optimal data-efficiency training approach was introduced, in which the training data portion was increased from 5% to 50%. The study found that using just 10% of the data achieved near-peak performance, with an R2 of 85.49%, emphasizing an effective trade-off between performance and computational costs. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
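A hedged sketch of the hybrid-feature idea: embedding-based, vector-based (TF-IDF), and surface text-based similarities between an essay and a reference answer are combined as features for a Random Forest regressor. The multilingual embedding model name and the exact feature set are assumptions, not the paper's configuration:

```python
# Sketch of hybrid similarity features feeding a Random Forest score regressor.
# The embedding model name and feature set are illustrative assumptions.
import numpy as np
from difflib import SequenceMatcher
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model

def essay_features(essay: str, reference: str, tfidf: TfidfVectorizer) -> np.ndarray:
    emb = embedder.encode([essay, reference])
    emb_sim = cosine_similarity(emb[:1], emb[1:2])[0, 0]          # embedding-based similarity
    vec = tfidf.transform([essay, reference])
    vec_sim = cosine_similarity(vec[0], vec[1])[0, 0]             # vector-based (TF-IDF) similarity
    text_sim = SequenceMatcher(None, essay, reference).ratio()    # surface text-based similarity
    return np.array([emb_sim, vec_sim, text_sim, len(essay.split())])

def train_scorer(essays, references, scores):
    tfidf = TfidfVectorizer().fit(essays + references)
    X = np.vstack([essay_features(e, r, tfidf) for e, r in zip(essays, references)])
    return RandomForestRegressor(n_estimators=200, random_state=0).fit(X, scores), tfidf
```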
Objective expertise evaluation of individuals, as a prerequisite stage for team formation, has been a long-term desideratum in large software development companies. With the rapid advancements in machine learning methods, and based on reliable existing data stored in project management tools’ datasets, automating this evaluation process becomes a natural step forward. In this context, our approach focuses on quantifying software developer expertise by using metadata from task-tracking systems. For this, we mathematically formalize two categories of expertise: technology-specific expertise, which denotes the skills required for a particular technology, and general expertise, which encapsulates overall knowledge in the software industry. Afterward, we automatically classify the zones of expertise associated with each task a developer has worked on, using Bidirectional Encoder Representations from Transformers (BERT)-like transformers to handle the unique characteristics of project tool datasets effectively. Finally, our method evaluates the proficiency of each software specialist across already completed projects from both technology-specific and general perspectives. The method was experimentally validated, yielding promising results.
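A simple hedged sketch of the aggregation stage: once each completed task has been tagged with a technology zone by the classifier, technology-specific expertise can be accumulated per zone (here as a complexity-weighted task count) and general expertise as the sum over zones. The weighting by story points and the field names are illustrative assumptions, not the paper's formalization:

```python
# Illustrative aggregation of per-zone and general expertise from classified tasks.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Task:
    developer: str
    zone: str          # zone predicted by the BERT-like classifier, e.g. "backend-java"
    story_points: int  # complexity proxy taken from the task-tracking tool

def expertise_scores(tasks):
    tech = defaultdict(lambda: defaultdict(float))     # developer -> zone -> score
    for t in tasks:
        tech[t.developer][t.zone] += t.story_points
    general = {dev: sum(zones.values()) for dev, zones in tech.items()}
    return tech, general

tasks = [Task("alice", "backend-java", 5), Task("alice", "devops", 3), Task("bob", "backend-java", 8)]
tech, general = expertise_scores(tasks)
print(dict(tech["alice"]), general)
```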
基金supported by the Institute of Information&communications Technology Planning&Evaluation(IITP)grant funded by the Korea government(MSIT)(RS-2024-00399401,Development of Quantum-Safe Infrastructure Migration and Quantum Security Verification Technologies).
文摘With the rise of remote collaboration,the demand for advanced storage and collaboration tools has rapidly increased.However,traditional collaboration tools primarily rely on access control,leaving data stored on cloud servers vulnerable due to insufficient encryption.This paper introduces a novel mechanism that encrypts data in‘bundle’units,designed to meet the dual requirements of efficiency and security for frequently updated collaborative data.Each bundle includes updated information,allowing only the updated portions to be reencrypted when changes occur.The encryption method proposed in this paper addresses the inefficiencies of traditional encryption modes,such as Cipher Block Chaining(CBC)and Counter(CTR),which require decrypting and re-encrypting the entire dataset whenever updates occur.The proposed method leverages update-specific information embedded within data bundles and metadata that maps the relationship between these bundles and the plaintext data.By utilizing this information,the method accurately identifies the modified portions and applies algorithms to selectively re-encrypt only those sections.This approach significantly enhances the efficiency of data updates while maintaining high performance,particularly in large-scale data environments.To validate this approach,we conducted experiments measuring execution time as both the size of the modified data and the total dataset size varied.Results show that the proposed method significantly outperforms CBC and CTR modes in execution speed,with greater performance gains as data size increases.Additionally,our security evaluation confirms that this method provides robust protection against both passive and active attacks.
文摘With the continuous advancement of the tiered diagnosis and treatment system,the medical consortium model has gained increasing attention as an important approach to promoting the vertical integration of healthcare resources.Within this context,laboratory data,as a key component of healthcare information systems,urgently requires efficient sharing and intelligent analysis.This paper designs and constructs an intelligent early warning system for laboratory data based on a cloud platform tailored to the medical consortium model.Through standardized data formats and unified access interfaces,the system enables the integration and cleaning of laboratory data across multiple healthcare institutions.By combining medical rule sets with machine learning models,the system achieves graded alerts and rapid responses to abnormal key indicators and potential outbreaks of infectious diseases.Practical deployment results demonstrate that the system significantly improves the utilization efficiency of laboratory data,strengthens public health event monitoring,and optimizes inter-institutional collaboration.The paper also discusses challenges encountered during system implementation,such as inconsistent data standards,security and compliance concerns,and model interpretability,and proposes corresponding optimization strategies.These findings provide a reference for the broader application of intelligent medical early warning systems.
基金Guangxi Key Laboratory of Spatial Information and Geomatics(21-238-21-12)Guangxi Young and Middle-aged Teachers’Research Fundamental Ability Enhancement Project(2023KY1196).
文摘Airborne LiDAR(Light Detection and Ranging)is an evolving high-tech active remote sensing technology that has the capability to acquire large-area topographic data and can quickly generate DEM(Digital Elevation Model)products.Combined with image data,this technology can further enrich and extract spatial geographic information.However,practically,due to the limited operating range of airborne LiDAR and the large area of task,it would be necessary to perform registration and stitching process on point clouds of adjacent flight strips.By eliminating grow errors,the systematic errors in the data need to be effectively reduced.Thus,this paper conducts research on point cloud registration methods in urban building areas,aiming to improve the accuracy and processing efficiency of airborne LiDAR data.Meanwhile,an improved post-ICP(Iterative Closest Point)point cloud registration method was proposed in this study to determine the accurate registration and efficient stitching of point clouds,which capable to provide a potential technical support for applicants in related field.
基金Postgraduate Innovation Top notch Talent Training Project of Hunan Province,Grant/Award Number:CX20220045Scientific Research Project of National University of Defense Technology,Grant/Award Number:22-ZZCX-07+2 种基金New Era Education Quality Project of Anhui Province,Grant/Award Number:2023cxcysj194National Natural Science Foundation of China,Grant/Award Numbers:62201597,62205372,1210456foundation of Hefei Comprehensive National Science Center,Grant/Award Number:KY23C502。
文摘Large-scale point cloud datasets form the basis for training various deep learning networks and achieving high-quality network processing tasks.Due to the diversity and robustness constraints of the data,data augmentation(DA)methods are utilised to expand dataset diversity and scale.However,due to the complex and distinct characteristics of LiDAR point cloud data from different platforms(such as missile-borne and vehicular LiDAR data),directly applying traditional 2D visual domain DA methods to 3D data can lead to networks trained using this approach not robustly achieving the corresponding tasks.To address this issue,the present study explores DA for missile-borne LiDAR point cloud using a Monte Carlo(MC)simulation method that closely resembles practical application.Firstly,the model of multi-sensor imaging system is established,taking into account the joint errors arising from the platform itself and the relative motion during the imaging process.A distortion simulation method based on MC simulation for augmenting missile-borne LiDAR point cloud data is proposed,underpinned by an analysis of combined errors between different modal sensors,achieving high-quality augmentation of point cloud data.The effectiveness of the proposed method in addressing imaging system errors and distortion simulation is validated using the imaging scene dataset constructed in this paper.Comparative experiments between the proposed point cloud DA algorithm and the current state-of-the-art algorithms in point cloud detection and single object tracking tasks demonstrate that the proposed method can improve the network performance obtained from unaugmented datasets by over 17.3%and 17.9%,surpassing SOTA performance of current point cloud DA algorithms.
基金funded by Princess Nourah bint Abdulrahman UniversityResearchers Supporting Project number (PNURSP2024R408), Princess Nourah bint AbdulrahmanUniversity, Riyadh, Saudi Arabia.
文摘A basic procedure for transforming readable data into encoded forms is encryption, which ensures security when the right decryption keys are used. Hadoop is susceptible to possible cyber-attacks because it lacks built-in security measures, even though it can effectively handle and store enormous datasets using the Hadoop Distributed File System (HDFS). The increasing number of data breaches emphasizes how urgently creative encryption techniques are needed in cloud-based big data settings. This paper presents Adaptive Attribute-Based Honey Encryption (AABHE), a state-of-the-art technique that combines honey encryption with Ciphertext-Policy Attribute-Based Encryption (CP-ABE) to provide improved data security. Even if intercepted, AABHE makes sure that sensitive data cannot be accessed by unauthorized parties. With a focus on protecting huge files in HDFS, the suggested approach achieves 98% security robustness and 95% encryption efficiency, outperforming other encryption methods including Ciphertext-Policy Attribute-Based Encryption (CP-ABE), Key-Policy Attribute-Based Encryption (KB-ABE), and Advanced Encryption Standard combined with Attribute-Based Encryption (AES+ABE). By fixing Hadoop’s security flaws, AABHE fortifies its protections against data breaches and enhances Hadoop’s dependability as a platform for processing and storing massive amounts of data.
基金funded by Deanship of Scientific Research(DSR)at King Abdulaziz University,Jeddah under grant No.(RG-6-611-43)the authors,therefore,acknowledge with thanks DSR technical and financial support.
文摘The integration of the Internet of Things(IoT)into healthcare systems improves patient care,boosts operational efficiency,and contributes to cost-effective healthcare delivery.However,overcoming several associated challenges,such as data security,interoperability,and ethical concerns,is crucial to realizing the full potential of IoT in healthcare.Real-time anomaly detection plays a key role in protecting patient data and maintaining device integrity amidst the additional security risks posed by interconnected systems.In this context,this paper presents a novelmethod for healthcare data privacy analysis.The technique is based on the identification of anomalies in cloud-based Internet of Things(IoT)networks,and it is optimized using explainable artificial intelligence.For anomaly detection,the Radial Boltzmann Gaussian Temporal Fuzzy Network(RBGTFN)is used in the process of doing information privacy analysis for healthcare data.Remora Colony SwarmOptimization is then used to carry out the optimization of the network.The performance of the model in identifying anomalies across a variety of healthcare data is evaluated by an experimental study.This evaluation suggested that themodel measures the accuracy,precision,latency,Quality of Service(QoS),and scalability of themodel.A remarkable 95%precision,93%latency,89%quality of service,98%detection accuracy,and 96%scalability were obtained by the suggested model,as shown by the subsequent findings.
基金supported By Grant (PLN2022-14) of State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation (Southwest Petroleum University)。
文摘Well logging technology has accumulated a large amount of historical data through four generations of technological development,which forms the basis of well logging big data and digital assets.However,the value of these data has not been well stored,managed and mined.With the development of cloud computing technology,it provides a rare development opportunity for logging big data private cloud.The traditional petrophysical evaluation and interpretation model has encountered great challenges in the face of new evaluation objects.The solution research of logging big data distributed storage,processing and learning functions integrated in logging big data private cloud has not been carried out yet.To establish a distributed logging big-data private cloud platform centered on a unifi ed learning model,which achieves the distributed storage and processing of logging big data and facilitates the learning of novel knowledge patterns via the unifi ed logging learning model integrating physical simulation and data models in a large-scale functional space,thus resolving the geo-engineering evaluation problem of geothermal fi elds.Based on the research idea of“logging big data cloud platform-unifi ed logging learning model-large function space-knowledge learning&discovery-application”,the theoretical foundation of unified learning model,cloud platform architecture,data storage and learning algorithm,arithmetic power allocation and platform monitoring,platform stability,data security,etc.have been carried on analysis.The designed logging big data cloud platform realizes parallel distributed storage and processing of data and learning algorithms.The feasibility of constructing a well logging big data cloud platform based on a unifi ed learning model of physics and data is analyzed in terms of the structure,ecology,management and security of the cloud platform.The case study shows that the logging big data cloud platform has obvious technical advantages over traditional logging evaluation methods in terms of knowledge discovery method,data software and results sharing,accuracy,speed and complexity.
文摘Cloud computing has become an essential technology for the management and processing of large datasets,offering scalability,high availability,and fault tolerance.However,optimizing data replication across multiple data centers poses a significant challenge,especially when balancing opposing goals such as latency,storage costs,energy consumption,and network efficiency.This study introduces a novel Dynamic Optimization Algorithm called Dynamic Multi-Objective Gannet Optimization(DMGO),designed to enhance data replication efficiency in cloud environments.Unlike traditional static replication systems,DMGO adapts dynamically to variations in network conditions,system demand,and resource availability.The approach utilizes multi-objective optimization approaches to efficiently balance data access latency,storage efficiency,and operational costs.DMGO consistently evaluates data center performance and adjusts replication algorithms in real time to guarantee optimal system efficiency.Experimental evaluations conducted in a simulated cloud environment demonstrate that DMGO significantly outperforms conventional static algorithms,achieving faster data access,lower storage overhead,reduced energy consumption,and improved scalability.The proposed methodology offers a robust and adaptable solution for modern cloud systems,ensuring efficient resource consumption while maintaining high performance.
基金funded by the Third Xinjiang Scientific Expedition Program(2021xjkk1400)the National Natural Science Foundation of China(42071049)+2 种基金the Natural Science Foundation of Xinjiang Uygur Autonomous Region(2019D01C022)the Xinjiang Uygur Autonomous Region Innovation Environment Construction Special Project&Science and Technology Innovation Base Construction Project(PT2107)the Tianshan Talent-Science and Technology Innovation Team(2022TSYCTD0006).
文摘Snow cover plays a critical role in global climate regulation and hydrological processes.Accurate monitoring is essential for understanding snow distribution patterns,managing water resources,and assessing the impacts of climate change.Remote sensing has become a vital tool for snow monitoring,with the widely used Moderate-resolution Imaging Spectroradiometer(MODIS)snow products from the Terra and Aqua satellites.However,cloud cover often interferes with snow detection,making cloud removal techniques crucial for reliable snow product generation.This study evaluated the accuracy of four MODIS snow cover datasets generated through different cloud removal algorithms.Using real-time field camera observations from four stations in the Tianshan Mountains,China,this study assessed the performance of these datasets during three distinct snow periods:the snow accumulation period(September-November),snowmelt period(March-June),and stable snow period(December-February in the following year).The findings showed that cloud-free snow products generated using the Hidden Markov Random Field(HMRF)algorithm consistently outperformed the others,particularly under cloud cover,while cloud-free snow products using near-day synthesis and the spatiotemporal adaptive fusion method with error correction(STAR)demonstrated varying performance depending on terrain complexity and cloud conditions.This study highlighted the importance of considering terrain features,land cover types,and snow dynamics when selecting cloud removal methods,particularly in areas with rapid snow accumulation and melting.The results suggested that future research should focus on improving cloud removal algorithms through the integration of machine learning,multi-source data fusion,and advanced remote sensing technologies.By expanding validation efforts and refining cloud removal strategies,more accurate and reliable snow products can be developed,contributing to enhanced snow monitoring and better management of water resources in alpine and arid areas.
基金supported by National Natural Science Foundation of China(No.62172436)Additionally,it is supported by Natural Science Foundation of Shaanxi Province(No.2023-JC-YB-584)Engineering University of PAP’s Funding for Scientific Research Innovation Team and Key Researcher(No.KYGG202011).
文摘Cloud storage,a core component of cloud computing,plays a vital role in the storage and management of data.Electronic Health Records(EHRs),which document users’health information,are typically stored on cloud servers.However,users’sensitive data would then become unregulated.In the event of data loss,cloud storage providers might conceal the fact that data has been compromised to protect their reputation and mitigate losses.Ensuring the integrity of data stored in the cloud remains a pressing issue that urgently needs to be addressed.In this paper,we propose a data auditing scheme for cloud-based EHRs that incorporates recoverability and batch auditing,alongside a thorough security and performance evaluation.Our scheme builds upon the indistinguishability-based privacy-preserving auditing approach proposed by Zhou et al.We identify that this scheme is insecure and vulnerable to forgery attacks on data storage proofs.To address these vulnerabilities,we enhanced the auditing process using masking techniques and designed new algorithms to strengthen security.We also provide formal proof of the security of the signature algorithm and the auditing scheme.Furthermore,our results show that our scheme effectively protects user privacy and is resilient against malicious attacks.Experimental results indicate that our scheme is not only secure and efficient but also supports batch auditing of cloud data.Specifically,when auditing 10,000 users,batch auditing reduces computational overhead by 101 s compared to normal auditing.
文摘The cloud data centres evolved with an issue of energy management due to the constant increase in size,complexity and enormous consumption of energy.Energy management is a challenging issue that is critical in cloud data centres and an important concern of research for many researchers.In this paper,we proposed a cuckoo search(CS)-based optimisation technique for the virtual machine(VM)selection and a novel placement algorithm considering the different constraints.The energy consumption model and the simulation model have been implemented for the efficient selection of VM.The proposed model CSOA-VM not only lessens the violations at the service level agreement(SLA)level but also minimises the VM migrations.The proposed model also saves energy and the performance analysis shows that energy consumption obtained is 1.35 kWh,SLA violation is 9.2 and VM migration is about 268.Thus,there is an improvement in energy consumption of about 1.8%and a 2.1%improvement(reduction)in violations of SLA in comparison to existing techniques.
基金supported by the Demonstration System for High Resolution Meteorological Application(Ⅱ)[grant number 32-Y30F08-9001-20/22]the National Natural Science Foundation of China[grant numbers 12292981 and 12292984]。
文摘Early detection of convective clouds is vital for minimizing hazardous impacts.Forecasting convective initiation(CI)using current multispectral geostationary meteorological satellites is often challenged by high false-alarm rates and missed detections caused by limited resolution.In contrast,high-resolution earth observation satellites offer more detailed texture information,improving early detection capabilities.The authors propose a novel methodology that integrates the advanced features of China’s latest-generation satellites,Gaofen-4(GF-4)and Fengyun-4A(FY-4A).This fusion method retains GF’s high-resolution details and FY-4A’s multispectral information.Two cases from different observational scenarios and weather conditions under GF-4’s staring mode were carried out to compare the CI forecast results based on fused data and solely on FY-4A data.The fused data demonstrated superior performance in detecting smaller-scale convective clouds,enabling earlier forecasting with a lead time of 15–30 minutes,and more accurate location identification.Integrating high-resolution earth observation satellites into early convective cloud detection provides valuable insights for forecasters and decision-makers,particularly given the current resolution limitations of geostationary meteorological satellites.
Funding: National Natural Science Foundation of China (42192554, 42175008); Shanghai Typhoon Research Foundation (TFJJ202201); S&T Development Fund of CAMS (2022KJ012); Basic Research Fund of CAMS (2022Y006).
Abstract: In this study, a variety of high-resolution satellite data were used to analyze the similarities and differences in the horizontal and vertical cloud microphysical characteristics of 11 tropical cyclones (TCs) in three different ocean basins. The results show that for the 11 TCs, regardless of the season in which they formed, when they reached or approached Category 4 their melting layers were all located at a height of about 5 km. The peak ice water contents in the vertical profiles of all 11 TCs reach or approach about 2000 g cm⁻³. The total attenuated backscatter coefficient at 532 nm (TAB-532) can successfully characterize the distribution of areas with high ice water content when its vertical distribution is concentrated near 0.1 km⁻¹ sr⁻¹, possibly because the diameter distribution of the corresponding range of aerosol particles favored the formation of ice nuclei, indicating that aerosols had a significant impact on ice-phase processes and characteristics. Moreover, analysis of the horizontal distributions of cloud water path (CWP) and ice water path (IWP) shows that when the sea surface temperature was relatively high and the vertical wind shear relatively small, both the CWP and the IWP reached relatively high values, which also demonstrates the importance of environmental-field factors for TC cloud microphysical characteristics.
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 41941017 and 42177139) and the Graduate Innovation Fund of Jilin University (Grant No. 2024CX099).
Abstract: The spatial distribution of discontinuities and the size of rock blocks are key indicators for rock mass quality evaluation and rockfall risk assessment. Traditional manual measurement is often dangerous or infeasible on high, steep rock slopes. In contrast, unmanned aerial vehicle (UAV) photogrammetry is not limited by terrain conditions and can efficiently collect high-precision three-dimensional (3D) point clouds of rock masses through all-round, multi-angle photography for rock mass characterization. In this paper, a new method based on 3D point clouds is proposed for discontinuity identification and refined rock block modeling. The method consists of four steps: (1) establish a point cloud spatial topology, and calculate the point cloud normal vectors and average point spacing using several machine learning algorithms; (2) extract discontinuities using the density-based spatial clustering of applications with noise (DBSCAN) algorithm, and fit the discontinuity planes by combining principal component analysis (PCA) with the natural breaks (NB) method; (3) insert points along line segments to generate an embedded discontinuity point cloud; and (4) adopt Poisson reconstruction for refined rock block modeling. The proposed method was applied to an outcrop of an ultrahigh, steep rock slope and compared with the results of previous studies and manual surveys. The results show that the method can eliminate the influence of discontinuity undulations on orientation measurement and capture local concave-convex features in the modeling of rock blocks. The calculation results are accurate and reliable and can meet the practical requirements of engineering.
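As an illustration of step (2), the following is a minimal sketch that clusters candidate discontinuity points with scikit-learn's DBSCAN and fits a plane to each cluster via PCA, where the plane normal is the direction of least variance; the input file name, the eps/min_samples values, and the dip-angle conversion are assumptions for illustration, and the natural-breaks refinement is omitted.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def fit_plane_pca(points):
    """Fit a plane to an (N, 3) cluster: the normal is the principal
    direction with the smallest variance (last row of V^T in the SVD)."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    if normal[2] < 0:               # orient normals consistently upward
        normal = -normal
    return centroid, normal

def dip_and_dip_direction(normal):
    """Convert a unit normal (x=east, y=north, z=up) to dip / dip direction (degrees)."""
    nx, ny, nz = normal
    dip = np.degrees(np.arccos(abs(nz)))
    dip_dir = (np.degrees(np.arctan2(nx, ny)) + 360) % 360
    return dip, dip_dir

# Hypothetical input: an (N, 3) point cloud; eps/min_samples would be tuned
# from the average point spacing computed in step (1).
cloud = np.loadtxt("slope_points.xyz")
labels = DBSCAN(eps=0.15, min_samples=30).fit_predict(cloud)

for k in set(labels) - {-1}:        # label -1 marks DBSCAN noise
    c, n = fit_plane_pca(cloud[labels == k])
    dip, ddir = dip_and_dip_direction(n)
    print(f"discontinuity {k}: dip {dip:.1f} deg, dip direction {ddir:.1f} deg")
```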
Abstract: Missing data presents a crucial challenge in data analysis, especially in high-dimensional datasets, where it often leads to biased conclusions and degraded model performance. In this study, we present a novel autoencoder-based imputation framework that integrates a composite loss function to enhance robustness and precision. The proposed loss combines (i) a guided, masked mean squared error focusing on missing entries; (ii) a noise-aware regularization term to improve resilience against data corruption; and (iii) a variance penalty to encourage expressive yet stable reconstructions. We evaluate the proposed model under four missingness mechanisms: Missing Completely at Random, Missing at Random, Missing Not at Random, and Missing Not at Random with quantile censorship, with systematically varied feature counts, sample sizes, and missingness ratios ranging from 5% to 60%. Four publicly available real-world datasets (Stroke Prediction, Pima Indians Diabetes, Cardiovascular Disease, and Framingham Heart Study) were used, and the results show that the proposed model consistently outperforms baseline methods, including traditional and deep-learning-based techniques. An ablation study reveals the additive value of each component of the loss function. Additionally, we assessed the downstream utility of the imputed data through classification tasks, where datasets imputed by the proposed method yielded the highest receiver operating characteristic area-under-the-curve scores across all scenarios. The model demonstrates strong scalability and robustness, improving performance with larger datasets and higher feature counts. These results underscore the capacity of the proposed method to produce imputations that are not only numerically accurate but also semantically useful, making it a promising solution for robust data recovery in clinical applications.
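A minimal PyTorch sketch of such a composite loss is shown below; the exact form of the noise-aware and variance terms, the loss weights, and the tiny autoencoder are assumptions for illustration rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    """Tiny illustrative autoencoder; width and depth are arbitrary choices."""
    def __init__(self, d, h=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, h // 2))
        self.dec = nn.Sequential(nn.Linear(h // 2, h), nn.ReLU(), nn.Linear(h, d))
    def forward(self, x):
        return self.dec(self.enc(x))

def composite_loss(model, x_true, miss_mask, sigma=0.05, lam_noise=0.1, lam_var=0.01):
    """(i) masked MSE on missing entries + (ii) noise-aware regularization
    + (iii) variance penalty; terms and weights are illustrative guesses."""
    x_in = x_true * (1 - miss_mask)              # zero out missing entries at input
    x_hat = model(x_in)
    # (i) guided, masked MSE restricted to the missing positions
    mse = ((x_hat - x_true) ** 2 * miss_mask).sum() / miss_mask.sum().clamp(min=1)
    # (ii) noise-aware term: output should be stable under input perturbation
    x_noisy = x_in + sigma * torch.randn_like(x_in)
    noise_reg = ((model(x_noisy) - x_hat) ** 2).mean()
    # (iii) variance penalty: keep per-feature output variance close to the
    # observed variance so reconstructions stay expressive but not degenerate
    var_pen = ((x_hat.var(dim=0) - x_true.var(dim=0)) ** 2).mean()
    return mse + lam_noise * noise_reg + lam_var * var_pen

# Usage sketch: one optimization step on a synthetic batch
d = 16
model = DenoisingAE(d)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(128, d)
mask = (torch.rand(128, d) < 0.3).float()        # ~30% of entries "missing"
opt.zero_grad()
loss = composite_loss(model, x, mask)
loss.backward()
opt.step()
print(float(loss))
```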
Funding: funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R104), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Modern intrusion detection systems (MIDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class intrusion detection using the KDD99 and IDS datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with careful data preprocessing, incorporating both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions and to ensure standardized, model-ready inputs. Dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks' hunting strategies; HHO efficiently identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners; this layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was recorded on UNSW-NB15, with an accuracy of 99.44%, demonstrating the model's effectiveness. After balancing, the model showed a clear improvement in detecting attacks. We tested the model on four datasets to show the effectiveness of the proposed approach and performed an ablation study to check the effect of each parameter. The proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
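The preprocessing, resampling, and stacking pipeline can be sketched with scikit-learn, imbalanced-learn, and xgboost as follows; the synthetic data stands in for a preprocessed IDS dataset, the HHO feature-selection step is omitted, and all hyperparameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer, RobustScaler
from sklearn.svm import SVC
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

# Placeholder data: X, y would come from a preprocessed IDS dataset
# (e.g., UNSW-NB15) after HHO-based feature selection, omitted here.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = rng.integers(0, 2, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Scaling as described: RobustScaler for outliers, QuantileTransformer for skew
prep = make_pipeline(RobustScaler(), QuantileTransformer(output_distribution="normal"))
X_tr = prep.fit_transform(X_tr)
X_te = prep.transform(X_te)

# SMOTE only on the training split, to avoid leaking synthetic samples
X_tr, y_tr = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

# Stacked ensemble: XGBoost, SVM, and RF base learners
stack = StackingClassifier(
    estimators=[("xgb", XGBClassifier(n_estimators=200, eval_metric="logloss")),
                ("svm", SVC(probability=True)),
                ("rf", RandomForestClassifier(n_estimators=200))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr, y_tr)
print("accuracy:", stack.score(X_te, y_te))
```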
Funding: funded by the University of Transport and Communications (UTC) under grant number T2025-CN-004.
Abstract: Reversible data hiding (RDH) enables secret data to be embedded while preserving complete recovery of the cover image, making it crucial for applications requiring image integrity. The pixel value ordering (PVO) technique used with multiple stego images provides good image quality but often yields low embedding capacity. To address these challenges, this paper proposes a high-capacity PVO-based RDH scheme that generates three stego images from a single cover image. The cover image is partitioned into non-overlapping blocks whose pixels are sorted in ascending order. Four secret bits are embedded into each block's maximum pixel value, while three additional bits are embedded into the second-largest value when the pixel difference exceeds a predefined threshold. A similar embedding strategy is applied on the minimum side of the block, including the second-smallest pixel value. This design enables each block to embed up to 14 bits of secret data. Experimental results demonstrate that the proposed method achieves significantly higher embedding capacity and improved visual quality compared to existing triple-stego RDH approaches, advancing the field of reversible steganography.
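For orientation, below is a minimal sketch of the classic one-bit PVO primitive that multi-bit, multi-stego schemes such as the one above build upon: sort the block, embed into the prediction error of the maximum, and shift non-embeddable errors so extraction stays reversible. This is the textbook primitive, not the paper's 14-bit scheme, and it ignores the saturation handling (pixels at 255) that a full implementation would need.

```python
import numpy as np

def pvo_embed_block(block, bit):
    """Embed one bit into the block maximum via its prediction error
    e = max - second_max: e == 1 carries the bit, e > 1 is shifted by +1,
    and e == 0 is left untouched. Returns (block, bits_used)."""
    flat = block.ravel().astype(np.int32)
    order = np.argsort(flat, kind="stable")   # stable sort keeps ties reversible
    i_max, i_2nd = order[-1], order[-2]
    e = flat[i_max] - flat[i_2nd]
    used = 0
    if e == 1:
        flat[i_max] += bit
        used = 1
    elif e > 1:
        flat[i_max] += 1
    return flat.reshape(block.shape), used

def pvo_extract_block(block):
    """Recover the embedded bit (if any) and restore the original maximum."""
    flat = block.ravel().astype(np.int32)
    order = np.argsort(flat, kind="stable")
    i_max, i_2nd = order[-1], order[-2]
    e = flat[i_max] - flat[i_2nd]
    if e == 0:                                # nothing embedded, nothing shifted
        return flat.reshape(block.shape), None
    if e in (1, 2):                           # e' == 1 -> bit 0, e' == 2 -> bit 1
        bit = e - 1
        flat[i_max] -= bit
        return flat.reshape(block.shape), bit
    flat[i_max] -= 1                          # e' > 2: undo the shift
    return flat.reshape(block.shape), None

blk = np.array([[52, 55], [57, 58]])
stego, used = pvo_embed_block(blk, 1)
restored, bit = pvo_extract_block(stego)
print(used, bit, np.array_equal(restored, blk))   # -> 1 1 True
```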
Funding: supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2023-00235509, Development of security monitoring technology based on network behavior against encrypted cyber threats in ICT convergence environment).
Abstract: With the increasing emphasis on personal information protection, encryption through security protocols has become a critical requirement in data transmission and reception. Nevertheless, IoT ecosystems comprise heterogeneous networks in which outdated systems coexist with the latest devices, ranging from non-encrypted to fully encrypted ones. Given the limited visibility into payloads in this context, this study investigates AI-based attack detection methods that leverage encrypted-traffic metadata, eliminating the need for decryption and minimizing system performance degradation, especially in light of such heterogeneous devices. Using the UNSW-NB15 and CICIoT-2023 datasets, encrypted and unencrypted traffic were categorized according to security protocol, and AI-based intrusion detection experiments were conducted for each traffic type based on metadata. To mitigate class imbalance, eight different data sampling techniques were applied, and their effectiveness was comparatively analyzed from various perspectives using two ensemble models and three deep learning (DL) models. The experimental results confirm that metadata-based attack detection is feasible using only encrypted traffic. On the UNSW-NB15 dataset, the F1-score for encrypted traffic was approximately 0.98, about 4.3% higher than that for unencrypted traffic (approximately 0.94). In contrast, analysis of the encrypted traffic in the CICIoT-2023 dataset using the same method yielded a significantly lower F1-score of roughly 0.43, indicating that dataset quality and the preprocessing approach have a substantial impact on detection performance. Furthermore, when data sampling techniques were applied to encrypted traffic, recall on the UNSW-NB15 (Encrypted) dataset improved by up to 23.0% and on the CICIoT-2023 (Encrypted) dataset by 20.26%, a similar level of improvement. Notably, on CICIoT-2023, the F1-score and the Receiver Operating Characteristic Area Under the Curve (ROC-AUC) increased by 59.0% and 55.94%, respectively. These results suggest that data sampling can have a positive effect even in encrypted environments, although the extent of the improvement may vary with data quality, model architecture, and sampling strategy.
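How such a sampling comparison might be scripted is sketched below with imbalanced-learn, using a random forest on synthetic flow-metadata features; the feature matrix, the roughly 5% attack rate, and the choice of four samplers are placeholders rather than the study's eight techniques and five models.

```python
import numpy as np
from imblearn.over_sampling import ADASYN, RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder for flow-level metadata features (durations, byte/packet
# counts, inter-arrival statistics); a real run would load the flow
# records of UNSW-NB15 or CICIoT-2023.
rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 12))
y = (rng.random(5000) < 0.05).astype(int)     # ~5% attack class: imbalanced

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

samplers = {"none": None,
            "smote": SMOTE(random_state=1),
            "adasyn": ADASYN(random_state=1),
            "ros": RandomOverSampler(random_state=1),
            "rus": RandomUnderSampler(random_state=1)}

for name, s in samplers.items():
    Xr, yr = (X_tr, y_tr) if s is None else s.fit_resample(X_tr, y_tr)
    clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(Xr, yr)
    proba = clf.predict_proba(X_te)[:, 1]
    print(f"{name:>6}: f1={f1_score(y_te, proba > 0.5):.3f} "
          f"auc={roc_auc_score(y_te, proba):.3f}")
```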
Funding: funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. (DGSSR-2024-02-01264).
Abstract: Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES that combines text-based, vector-based, and embedding-based similarity measures to improve scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine-learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R² of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R² to 88.95%. In Experiment 4, a data-efficient training approach was introduced, with training portions increasing from 5% to 50% of the data. The study found that using just 10% of the data achieved near-peak performance, with an R² of 85.49%, demonstrating an effective trade-off between performance and computational cost. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
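One way to assemble such hybrid similarity features and feed them to an RF regressor is sketched below; the essays, the reference answer, and the multilingual sentence-embedding model name are placeholder assumptions, not the paper's feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

# Hypothetical data: student essays with human scores plus one reference
# (model) answer per thematic group; a single group is shown for brevity.
essays = ["student answer about water scarcity in arid regions",
          "another student answer discussing irrigation and rainfall"]
scores = np.array([3.5, 2.0])
reference = "model answer describing causes and remedies of water scarcity"

def similarity_features(texts, ref):
    """One text-based (TF-IDF), one vector-based (raw counts), and one
    embedding-based cosine similarity per essay; the specific vectorizers
    and embedding model are illustrative choices."""
    feats = []
    for vec in (TfidfVectorizer(), CountVectorizer()):
        m = vec.fit_transform(texts + [ref])
        feats.append(cosine_similarity(m[:-1], m[-1]).ravel())
    emb = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model
    e = emb.encode(texts + [ref])
    feats.append(cosine_similarity(e[:-1], e[-1:]).ravel())
    return np.column_stack(feats)

X = similarity_features(essays, reference)
model = RandomForestRegressor(n_estimators=300).fit(X, scores)
print(model.predict(X))
```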
Funding: supported by the project “Romanian Hub for Artificial Intelligence-HRIA”, Smart Growth, Digitization and Financial Instruments Program, 2021–2027, MySMIS No. 334906.
Abstract: Objective expertise evaluation of individuals, as a prerequisite stage for team formation, has been a long-term desideratum in large software development companies. With the rapid advancement of machine learning methods, and given the reliable data already stored in project management tools, automating this evaluation process becomes a natural step forward. In this context, our approach quantifies software developer expertise using metadata from task-tracking systems. We mathematically formalize two categories of expertise: technology-specific expertise, which denotes the skills required for a particular technology, and general expertise, which encapsulates overall knowledge of the software industry. We then automatically classify the zones of expertise associated with each task a developer has worked on, using Bidirectional Encoder Representations from Transformers (BERT)-like transformers to handle the unique characteristics of project-tool datasets effectively. Finally, our method evaluates the proficiency of each software specialist across completed projects from both the technology-specific and the general perspective. The method was experimentally validated, yielding promising results.
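The classify-then-aggregate idea can be sketched with the Hugging Face transformers library; here a zero-shot classifier stands in for the paper's fine-tuned BERT-like model, and the expertise zones, tasks, and story-point weighting are hypothetical.

```python
from collections import defaultdict
from transformers import pipeline

# Zero-shot classification as an illustrative stand-in for a fine-tuned
# BERT-like task classifier; zones and tasks below are hypothetical.
zones = ["backend", "frontend", "databases", "devops"]
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

tasks = [  # (developer, task summary from the tracking tool, story points)
    ("alice", "Fix connection pooling in the PostgreSQL repository layer", 5),
    ("alice", "Add an index to speed up the orders query", 3),
    ("bob", "Rework the React checkout form validation", 8),
]

tech = defaultdict(lambda: defaultdict(float))   # technology-specific expertise
general = defaultdict(float)                     # general (overall) expertise

for dev, text, points in tasks:
    # Top-ranked zone for this task; story points as a crude effort proxy
    top_zone = classifier(text, candidate_labels=zones)["labels"][0]
    tech[dev][top_zone] += points
    general[dev] += points

for dev in tech:
    print(dev, dict(tech[dev]), "| general:", general[dev])
```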