Missing data presents a crucial challenge in data analysis, especially in high-dimensional datasets, where missing data often leads to biased conclusions and degraded model performance. In this study, we present a novel autoencoder-based imputation framework that integrates a composite loss function to enhance robustness and precision. The proposed loss combines (i) a guided, masked mean squared error focusing on missing entries; (ii) a noise-aware regularization term to improve resilience against data corruption; and (iii) a variance penalty to encourage expressive yet stable reconstructions. We evaluate the proposed model across four missingness mechanisms, namely Missing Completely at Random, Missing at Random, Missing Not at Random, and Missing Not at Random with quantile censorship, under systematically varied feature counts, sample sizes, and missingness ratios ranging from 5% to 60%. Four publicly available real-world datasets (Stroke Prediction, Pima Indians Diabetes, Cardiovascular Disease, and Framingham Heart Study) were used, and the results show that the proposed model consistently outperforms baseline methods, including traditional and deep learning-based techniques. An ablation study reveals the additive value of each component of the loss function. Additionally, we assessed the downstream utility of imputed data through classification tasks, where datasets imputed by the proposed method yielded the highest receiver operating characteristic area under the curve scores across all scenarios. The model demonstrates strong scalability and robustness, improving performance with larger datasets and higher feature counts. These results underscore the capacity of the proposed method to produce not only numerically accurate but also semantically useful imputations, making it a promising solution for robust data recovery in clinical applications.
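To make the three-term objective above concrete, the sketch below gives one plausible PyTorch reading of it. The masking convention, the noise-consistency form of term (ii), and the weights `alpha` and `beta` are illustrative assumptions, not the authors' exact formulation.

```python
import torch

def composite_loss(x_hat, x_true, miss_mask, x_noisy_hat=None, alpha=0.1, beta=0.01):
    """Hypothetical sketch: masked MSE + noise-aware term + variance penalty.

    miss_mask: 1.0 at originally missing entries, 0.0 elsewhere (assumed convention).
    x_noisy_hat: reconstruction of a noise-corrupted copy of the input, if used.
    """
    # (i) guided, masked MSE: penalize reconstruction error only on missing entries
    mse_missing = ((x_hat - x_true) ** 2 * miss_mask).sum() / miss_mask.sum().clamp(min=1)

    # (ii) noise-aware regularization: clean and corrupted reconstructions should agree
    noise_term = ((x_hat - x_noisy_hat) ** 2).mean() if x_noisy_hat is not None else 0.0

    # (iii) variance penalty: keep feature-wise variance close to the data's,
    # discouraging collapsed (near-constant) reconstructions
    var_term = (x_true.var(dim=0) - x_hat.var(dim=0)).abs().mean()

    return mse_missing + alpha * noise_term + beta * var_term
```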
Modern intrusion detection systems (MIDS) face persistent challenges in coping with the rapid evolution of cyber threats, high-volume network traffic, and imbalanced datasets. Traditional models often lack the robustness and explainability required to detect novel and sophisticated attacks effectively. This study introduces an advanced, explainable machine learning framework for multi-class IDS using the KDD99 and IDS datasets, which reflect real-world network behavior through a blend of normal and diverse attack classes. The methodology begins with sophisticated data preprocessing, incorporating both RobustScaler and QuantileTransformer to address outliers and skewed feature distributions, ensuring standardized and model-ready inputs. Critical dimensionality reduction is achieved via the Harris Hawks Optimization (HHO) algorithm, a nature-inspired metaheuristic modeled on hawks' hunting strategies. HHO efficiently identifies the most informative features by optimizing a fitness function based on classification performance. Following feature selection, SMOTE is applied to the training data to resolve class imbalance by synthetically augmenting underrepresented attack types. A stacked architecture is then employed, combining the strengths of XGBoost, SVM, and RF as base learners. This layered approach improves prediction robustness and generalization by balancing bias and variance across diverse classifiers. The model was evaluated using standard classification metrics: precision, recall, F1-score, and overall accuracy. The best overall performance was recorded with an accuracy of 99.44% for UNSW-NB15, demonstrating the model's effectiveness. After balancing, the model demonstrated a clear improvement in detecting attacks. We tested the model on four datasets to show the effectiveness of the proposed approach and performed an ablation study to check the effect of each parameter. The proposed model is also computationally efficient. To support transparency and trust in decision-making, explainable AI (XAI) techniques are incorporated that provide both global and local insight into feature contributions and offer intuitive visualizations for individual predictions. This makes the framework suitable for practical deployment in cybersecurity environments that demand both precision and accountability.
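As a rough illustration of the pipeline after feature selection, the following scikit-learn/imbalanced-learn sketch chains the two named scalers, SMOTE, and a stacked ensemble of the three base learners. HHO feature selection is omitted (`X_train`, `y_train` are assumed to hold the already-selected features), and the logistic-regression meta-learner is an assumption the abstract does not specify.

```python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer, RobustScaler
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Preprocessing as described: robust scaling followed by a quantile transform.
pre = make_pipeline(RobustScaler(), QuantileTransformer(output_distribution="normal"))
X_pre = pre.fit_transform(X_train)

# SMOTE on the training split only, to synthesize minority attack samples.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_pre, y_train)

# Stacked ensemble of the three base learners named in the abstract.
stack = StackingClassifier(
    estimators=[
        ("xgb", XGBClassifier(n_estimators=200)),
        ("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier(n_estimators=200)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
)
stack.fit(X_res, y_res)
```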
Reversible data hiding (RDH) enables secret data embedding while preserving complete cover image recovery, making it crucial for applications requiring image integrity. The pixel value ordering (PVO) technique used in multi-stego images provides good image quality but often results in low embedding capacity. To address these challenges, this paper proposes a high-capacity RDH scheme based on PVO that generates three stego images from a single cover image. The cover image is partitioned into non-overlapping blocks with pixels sorted in ascending order. Four secret bits are embedded into each block's maximum pixel value, while three additional bits are embedded into the second-largest value when the pixel difference exceeds a predefined threshold. A similar embedding strategy is also applied to the minimum side of the block, including the second-smallest pixel value. This design enables each block to embed up to 14 bits of secret data. Experimental results demonstrate that the proposed method achieves significantly higher embedding capacity and improved visual quality compared to existing triple-stego RDH approaches, advancing the field of reversible steganography.
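The per-block capacity rule can be sketched as follows. This is an accounting illustration only, under an assumed block size and an assumed reading of the threshold test; the actual PVO pixel-modification and recovery logic that guarantees reversibility across the three stego images is not reproduced.

```python
import numpy as np

def block_capacity(block, thresh=2):
    """Estimate embeddable bits in one block under the stated rule (sketch only)."""
    p = np.sort(block.ravel())
    bits = 4                        # four bits in the maximum pixel value
    if p[-1] - p[-2] > thresh:      # assumed form of the difference test
        bits += 3                   # three more in the second-largest value
    bits += 4                       # symmetric rule on the minimum side
    if p[1] - p[0] > thresh:
        bits += 3                   # and in the second-smallest value
    return bits                     # at most 4 + 3 + 4 + 3 = 14 bits

img = np.random.randint(0, 256, (512, 512), dtype=np.uint8)  # placeholder cover
B = 4  # hypothetical block size
total_bits = sum(block_capacity(img[i:i + B, j:j + B])
                 for i in range(0, img.shape[0], B)
                 for j in range(0, img.shape[1], B))
```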
With the increasing emphasis on personal information protection, encryption through security protocols has emerged as a critical requirement in data transmission and reception processes. Nevertheless, IoT ecosystems comprise heterogeneous networks where outdated systems coexist with the latest devices, spanning a range from non-encrypted to fully encrypted equipment. Given the limited visibility into payloads in this context, this study investigates AI-based attack detection methods that leverage encrypted traffic metadata, eliminating the need for decryption and minimizing system performance degradation, especially in light of these heterogeneous devices. Using the UNSW-NB15 and CICIoT-2023 datasets, encrypted and unencrypted traffic were categorized according to security protocol, and AI-based intrusion detection experiments were conducted for each traffic type based on metadata. To mitigate the problem of class imbalance, eight different data sampling techniques were applied. The effectiveness of these sampling techniques was then comparatively analyzed using two ensemble models and three Deep Learning (DL) models from various perspectives. The experimental results confirmed that metadata-based attack detection is feasible using only encrypted traffic. In the UNSW-NB15 dataset, the F1-score of encrypted traffic was approximately 0.98, which is 4.3% higher than that of unencrypted traffic (approximately 0.94). In addition, analysis of the encrypted traffic in the CICIoT-2023 dataset using the same method showed a significantly lower F1-score of roughly 0.43, indicating that the quality of the dataset and the preprocessing approach have a substantial impact on detection performance. Furthermore, when data sampling techniques were applied to encrypted traffic, the recall in the UNSW-NB15 (Encrypted) dataset improved by up to 23.0%, and in the CICIoT-2023 (Encrypted) dataset by 20.26%, showing a similar level of improvement. Notably, in CICIoT-2023, the F1-score and Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) increased by 59.0% and 55.94%, respectively. These results suggest that data sampling can have a positive effect even in encrypted environments. However, the extent of the improvement may vary depending on data quality, model architecture, and sampling strategy.
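A minimal sketch of the sampling comparison, assuming binary labels (attack vs. benign) and flow-metadata features already extracted: several imbalanced-learn samplers are swapped into an otherwise fixed training loop. Four of the study's eight samplers plus a no-sampling baseline are shown, and a random forest stands in for the ensemble/DL models compared in the paper.

```python
from imblearn.over_sampling import ADASYN, RandomOverSampler, SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, recall_score, roc_auc_score

samplers = {
    "none": None,
    "smote": SMOTE(random_state=0),
    "adasyn": ADASYN(random_state=0),
    "ros": RandomOverSampler(random_state=0),
    "rus": RandomUnderSampler(random_state=0),
}

# X_train, y_train, X_test, y_test: encrypted-traffic metadata splits (assumed).
for name, s in samplers.items():
    Xr, yr = (X_train, y_train) if s is None else s.fit_resample(X_train, y_train)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xr, yr)
    pred = clf.predict(X_test)
    proba = clf.predict_proba(X_test)[:, 1]
    print(name, f1_score(y_test, pred), recall_score(y_test, pred),
          roc_auc_score(y_test, proba))
```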
Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language's complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES that combines text-based, vector-based, and embedding-based similarity measures to improve essay scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection, data size, and model performance. Experiment 1 established a baseline using a non-machine-learning approach, selecting the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R² of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results, improving R² to 88.95%. In Experiment 4, an optimal data-efficiency training approach was introduced, where training data portions increased from 5% to 50%. The study found that using just 10% of the data achieved near-peak performance, with an R² of 85.49%, emphasizing an effective trade-off between performance and computational costs. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
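One way to realize the three similarity families is sketched below. The concrete choices (character-sequence ratio, TF-IDF cosine, sentence-embedding cosine) are plausible stand-ins rather than the paper's exact measures, and `embed` is assumed to be any Arabic-capable sentence encoder.

```python
import numpy as np
from difflib import SequenceMatcher
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def similarity_features(essay, reference, embed):
    """Text-, vector-, and embedding-based similarity between essay and reference."""
    # text-based: raw character-sequence similarity
    text_sim = SequenceMatcher(None, essay, reference).ratio()

    # vector-based: cosine similarity of TF-IDF vectors
    tfidf = TfidfVectorizer().fit([essay, reference])
    v = tfidf.transform([essay, reference])
    vec_sim = cosine_similarity(v[0], v[1])[0, 0]

    # embedding-based: cosine similarity of sentence embeddings
    e1, e2 = embed(essay), embed(reference)
    emb_sim = float(np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2)))
    return [text_sim, vec_sim, emb_sim]

# These features would then feed the RF regressor, e.g.:
# X = [similarity_features(e, ref, embed) for e in essays]
# model = RandomForestRegressor().fit(X, scores)
```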
Objective expertise evaluation of individuals, as a prerequisite stage for team formation, has been a long-term desideratum in large software development companies. With the rapid advancement of machine learning methods and the reliable data stored in project management tools' datasets, automating this evaluation process becomes a natural step forward. In this context, our approach focuses on quantifying software developer expertise using metadata from task-tracking systems. For this, we mathematically formalize two categories of expertise: technology-specific expertise, which denotes the skills required for a particular technology, and general expertise, which encapsulates overall knowledge in the software industry. Afterward, we automatically classify the zones of expertise associated with each task a developer has worked on, using Bidirectional Encoder Representations from Transformers (BERT)-like transformers to handle the unique characteristics of project tool datasets effectively. Finally, our method evaluates the proficiency of each software specialist across already completed projects from both technology-specific and general perspectives. The method was experimentally validated, yielding promising results.
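The paper's mathematical formalization is not given in the abstract; the sketch below shows one hypothetical additive aggregation consistent with the two categories, where each task carries a transformer-predicted technology zone and an effort weight (both assumed).

```python
from collections import defaultdict

def expertise_scores(tasks):
    """Aggregate per-developer expertise from task-tracking metadata (illustrative).

    tasks: iterable of dicts like {"dev": "alice", "zone": "backend", "weight": 2.5},
    where "zone" is the predicted expertise zone of the task and "weight" is some
    effort or complexity measure. The additive rule is an assumption, not the
    paper's exact formula.
    """
    tech = defaultdict(lambda: defaultdict(float))  # technology-specific expertise
    general = defaultdict(float)                    # general expertise
    for t in tasks:
        tech[t["dev"]][t["zone"]] += t["weight"]
        general[t["dev"]] += t["weight"]
    return tech, general
```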
Parkinson’s disease (PD) is a debilitating neurological disorder affecting over 10 million people worldwide. PD classification models using voice signals as input are common in the literature. Deep learning algorithms are believed to further enhance performance; nevertheless, applying them is challenging due to the small-scale and imbalanced nature of PD datasets. This paper proposes a convolutional neural network-based deep support vector machine (CNN-DSVM) that automates feature extraction with the CNN and extends the conventional SVM to a DSVM for better classification performance on small-scale PD datasets. A customized kernel function reduces the impact of biased classification towards the majority class (healthy candidates in our setting). An improved generative adversarial network (IGAN) was designed to generate additional training data to enhance the model’s performance. In the performance evaluation, the proposed algorithm achieves a sensitivity of 97.6% and a specificity of 97.3%. The performance comparison covers five perspectives, including comparisons with different data generation algorithms, feature extraction techniques, kernel functions, and existing works. Results reveal the effectiveness of the IGAN algorithm, which improves sensitivity and specificity by 4.05%–4.72% and 4.96%–5.86%, respectively, and the effectiveness of the CNN-DSVM algorithm, which improves sensitivity by 1.24%–57.4% and specificity by 1.04%–163% while reducing biased detection towards the majority class. Ablation experiments confirm the effectiveness of the individual components. Two future research directions are also suggested.
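The CNN-to-SVM handoff can be sketched as below. The toy 1-D CNN architecture is assumed, and a standard RBF SVM with balanced class weights stands in for the paper's customized kernel and deep SVM, which are its actual contributions.

```python
import torch
import torch.nn as nn
from sklearn.svm import SVC

class VoiceCNN(nn.Module):
    """Toy 1-D CNN feature extractor for voice-feature vectors (architecture assumed)."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, x):              # x: (batch, 1, n_voice_features)
        return self.fc(self.conv(x).squeeze(-1))

# X: (n_samples, n_voice_features) array, y: labels (assumed available)
cnn = VoiceCNN().eval()
with torch.no_grad():
    feats = cnn(torch.as_tensor(X, dtype=torch.float32).unsqueeze(1)).numpy()
svm = SVC(kernel="rbf", class_weight="balanced").fit(feats, y)
```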
[Objective] The aim was to establish a tea germplasm database of Yunnan Province and promote sharing of tea germplasm resources. [Method] Eight hundred and thirty accessions of tea germplasm resources of Yunnan Province were first systematically documented using Access database software; the generic descriptions of 631 tea resources and the characteristic descriptions of 300 tea resources were submitted to the e-platform, then linked with the national e-platform for natural scientific and technological resources, and the tea germplasm database of Yunnan Province was established. [Result] Based on the conservation and utilization status of tea germplasm resources, a sharing and utilization framework for tea germplasm resources was presented. Several problems and suggestions concerning tea germplasm resources in the process of conservation, documentation concordance, and sharing were pointed out; for example, conservation areas were separated and the system was incomplete, the main trait assessment and identification research had not been fully accomplished, and sharing was inefficient. [Conclusion] The paper lays a foundation for standardized, digitized, and information-based management of tea germplasm resources.
With the advent of cloud storage, users can share their own data in the remote cloud as a group. To ensure the security of stored data and the normal operation of public auditing, once a user is revoked from the user group, the data files he signed should be re-signed by other legal users in the group. In this paper, we propose a new re-signature scheme utilizing backup files to rebuild data, which can resist collusion between the cloud and revoked users, and we use the Shamir Secret Sharing Scheme to encrypt data in the multi-manager system, which separates the authority of the group managers. Moreover, our scheme is more practical because we do not need managers to be online all the time. Performance evaluation shows that our mechanism can improve the efficiency of the data re-signature process.
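The Shamir component can be illustrated with a minimal self-contained split/reconstruct pair (toy parameters; the paper's authority-separation and encryption details are not reproduced):

```python
import random

P = 2**127 - 1  # a Mersenne prime modulus (toy choice)

def split(secret, n, k):
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = split(123456789, n=5, k=3)       # e.g., one share per group manager
assert reconstruct(shares[:3]) == 123456789
```

A (k, n) split like this is what lets a scheme spread authority across multiple managers while tolerating some of them being offline.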
Sharing of personal health records (PHR) in cloud computing is an essential functionality in the healthcare system. However, how to securely, efficiently, and flexibly share a patient's PHR data in a multi-receiver setting has not been well addressed. For instance, since the trust domain of the cloud server is not identical to that of the data owner or data user, a semi-trusted cloud service provider may intentionally destroy or tamper with a user's shared PHR data, transform only part of the shared ciphertext, or even return wrong computation results to save its storage and computation resources, in pursuit of maximum economic interest or other malicious purposes. Thus, PHR data storing or sharing via the cloud server should be performed with consistency and integrity verification. Fortunately, the emergence of blockchain technology provides new ideas and prospects for ensuring the consistency and integrity of shared PHR data. To this end, in this work, we leverage consortium blockchain technology to enhance the trustworthiness of each participant and propose a blockchain-based patient-centric data sharing scheme for PHRs in cloud computing (BC-PC-Share). Different from state-of-the-art schemes, our proposal achieves the following desired properties: (1) realizing patient-centric PHR sharing with a public verification function, i.e., ensuring that the returned shared data is consistent with the requested shared data and that the integrity of the shared data is not compromised; (2) supporting scalable and fine-grained access control and sharing of PHR data with multiple domain users, such as hospitals, medical research institutes, and medical insurance companies; (3) achieving efficient user decryption by leveraging the transformation key technique and efficient user revocation by introducing time-controlled access. The security analysis and simulation experiments demonstrate that the proposed BC-PC-Share scheme is a feasible and promising solution for PHR data sharing via consortium blockchain.
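The public-verification idea in property (1) can be sketched as a digest check against the chain. The `ledger` below is a toy stand-in for a consortium-blockchain client, not the paper's actual smart-contract design.

```python
import hashlib

def publish_digest(ledger, record_id, ciphertext: bytes):
    """Record a tamper-evident digest of the shared PHR ciphertext on-chain."""
    ledger[record_id] = hashlib.sha256(ciphertext).hexdigest()

def verify_returned_data(ledger, record_id, returned: bytes) -> bool:
    """Public verification: the cloud's response must match the on-chain digest."""
    return hashlib.sha256(returned).hexdigest() == ledger.get(record_id)

ledger = {}  # toy in-memory stand-in for the consortium chain
publish_digest(ledger, "phr-001", b"encrypted PHR payload")
assert verify_returned_data(ledger, "phr-001", b"encrypted PHR payload")
assert not verify_returned_data(ledger, "phr-001", b"tampered payload")
```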
In order to explore the travel characteristics and space-time distribution of different groups of bikeshare users, an online analytical processing (OLAP) tool called the data cube was used for treating and displaying multi-dimensional data. We extended and modified the traditional three-dimensional data cube into four dimensions, which are space, date, time, and user, each with a user-specified hierarchy, and took transaction numbers and travel time as two quantitative measures. The results suggest that there are two obvious transaction peaks during the morning and afternoon rush hours on weekdays, while the volume at weekends has an approximately even distribution. Bad weather conditions significantly restrict bikeshare usage. Besides, seamless smartcard users generally take longer trips than exclusive smartcard users, and non-native users ride faster than native users. These findings not only support the applicability and efficiency of the data cube in the field of visualizing massive smartcard data, but also raise equity concerns among bikeshare users with different demographic backgrounds.
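The four-dimensional cube maps naturally onto a grouped aggregation; a minimal pandas sketch under an assumed transaction schema:

```python
import pandas as pd

# One row per smartcard transaction; file name and column names are assumed.
trips = pd.read_csv("bikeshare_transactions.csv")
# columns: station_zone (space), date, hour (time), user_type (user), travel_min

cube = trips.groupby(["station_zone", "date", "hour", "user_type"]).agg(
    transactions=("travel_min", "size"),    # measure 1: transaction count
    avg_travel_min=("travel_min", "mean"),  # measure 2: travel time
)

# OLAP-style roll-up: collapse space and date to compare user groups by hour,
# which is how the weekday rush-hour peaks become visible.
hourly = cube["transactions"].groupby(level=["hour", "user_type"]).sum()
```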
The accelerated advancement of the Internet of Things (IoT) has generated substantial data, including sensitive and private information. Consequently, it is imperative to guarantee the security of data sharing. While facilitating fine-grained access control, Ciphertext Policy Attribute-Based Encryption (CP-ABE) can effectively ensure the confidentiality of shared data. Nevertheless, the conventional centralized CP-ABE scheme is plagued by the issues of key misuse, key escrow, and heavy computation, which result in security risks. This paper proposes a lightweight IoT data security sharing scheme that integrates blockchain technology and CP-ABE to address the above issues. The integrity and traceability of shared data are guaranteed by the use of blockchain technology to store and verify access transactions. The encryption and decryption operations of the CP-ABE algorithm are implemented using elliptic curve scalar multiplication to accommodate lightweight IoT devices, in place of the more computationally expensive bilinear pairing found in the traditional CP-ABE algorithm. Additionally, a portion of the computation is delegated to edge nodes to alleviate the computational burden on users. A distributed key management method is proposed to address the issues of key escrow and misuse. This method employs the edge blockchain to facilitate the storage and distribution of attribute private keys. Meanwhile, data security sharing is enhanced by combining off-chain and on-chain ciphertext storage. The security and performance analysis indicates that the proposed scheme is more efficient and secure.
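The elliptic-curve primitive the scheme relies on is plain scalar multiplication; a toy double-and-add over a tiny curve is shown below. The parameters are illustrative only and far from secure, and the full CP-ABE construction is not reproduced.

```python
# Toy curve y^2 = x^3 + 2x + 3 over GF(97); the point (3, 6) lies on it.
P_MOD, A = 97, 2

def ec_add(p, q):
    """Point addition on the toy curve (None is the point at infinity)."""
    if p is None: return q
    if q is None: return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def scalar_mult(k, p):
    """k·P by double-and-add: the cheap operation replacing bilinear pairings."""
    acc = None
    while k:
        if k & 1:
            acc = ec_add(acc, p)
        p, k = ec_add(p, p), k >> 1
    return acc

Q = scalar_mult(5, (3, 6))  # e.g., deriving a public value from scalar 5
```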
Proceeding from trade structure variations, this paper provides a new perspective on the study of the share of labor income in China. China's commodity trade structure has experienced a step change in recent years. According to theoretical analysis, trade exerts not only a direct effect on the share of labor income through international division of labor and specialization but also an indirect effect through factor intensity variations and technology progress bias. The empirical study finds that exports have a significant negative effect on the share of China's labor income while imports have a positive effect. Imports and exports have different magnitudes and directions of effect on sectors with different factor intensities.
Cloud storage has been widely used for teamwork and cooperative development. Data owners set up groups, generating and uploading their data to cloud storage, while other users in the groups download and make use of it, which is called group data sharing. Like all kinds of cloud services, group data sharing suffers from hardware/software failures and human errors. Provable Data Possession (PDP) schemes are proposed to check the integrity of data stored in the cloud without downloading it. However, there are still some unmet needs in auditing group-shared data. Researchers have identified four properties necessary for secure auditing of group-shared data: public verification, identity privacy, collusion attack resistance, and traceability. However, none of the published work has succeeded in achieving all of these properties so far. In this paper, we propose a novel blockchain-based ring signature PDP scheme for group-shared data, with an instance deployed on a cloud server. We design a linkable ring signature method called Linkable Homomorphic Authenticable Ring Signature (LHARS) to implement public anonymous auditing for group data. We also build smart contracts to resist collusion attacks in group auditing. The security analysis and performance evaluation prove that our scheme is both secure and efficient.
Iced transmission line galloping poses a significant threat to the safety and reliability of power systems, leading directly to line tripping, disconnections, and power outages. Existing early warning methods for iced transmission line galloping suffer from issues such as reliance on a single data source, neglect of irregular time series, and lack of attention-based closed-loop feedback, resulting in high rates of missed and false alarms. To address these challenges, we propose an Internet of Things (IoT)-empowered early warning method for transmission line galloping that integrates time series data from optical fiber sensing and weather forecasts. Initially, the method applies a primary adaptive weighted fusion to the IoT-empowered optical fiber real-time sensing data and weather forecast data, followed by a secondary fusion based on a Back Propagation (BP) neural network, and uses the K-medoids algorithm to cluster the fused data. Furthermore, an adaptive irregular time series perception adjustment module is introduced into the traditional Gated Recurrent Unit (GRU) network, and closed-loop feedback based on an attention mechanism is employed to update network parameters through gradient feedback of the loss function, enabling closed-loop training and time series prediction by the GRU network model. Subsequently, considering the various types of prediction data and the duration of icing, an iced transmission line galloping risk coefficient is established, and warnings are categorized based on this coefficient. Finally, using an IoT-driven realistic dataset of iced transmission line galloping, the effectiveness of the proposed method is validated through multi-dimensional simulation scenarios.
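A minimal sketch of the primary adaptive weighted fusion, using inverse-variance weights as one common adaptive heuristic; the paper's exact weighting rule, the secondary BP-network fusion, and the K-medoids clustering are not reproduced.

```python
import numpy as np

def adaptive_weighted_fusion(sensor, forecast, eps=1e-6):
    """Fuse aligned optical-fiber sensing and weather-forecast series.

    Each source is weighted by the inverse of its variance, so the steadier
    source dominates (an assumed, illustrative rule).
    """
    w_s = 1.0 / (np.var(sensor) + eps)
    w_f = 1.0 / (np.var(forecast) + eps)
    return (w_s * sensor + w_f * forecast) / (w_s + w_f)

# Toy aligned series for one line section (e.g., wind-load proxies)
sensor = np.array([3.1, 3.3, 3.0, 3.4])
forecast = np.array([2.8, 3.6, 3.1, 3.2])
fused = adaptive_weighted_fusion(sensor, forecast)
```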
With the increasing popularity of blockchain applications, the security of data sources on the blockchain is gradually receiving attention. Providing reliable data for the blockchain safely and efficiently has become a research hotspot, and the security of the oracle responsible for providing reliable data has attracted much attention. The most widely used centralized oracles in blockchain, such as Provable and Town Crier, all rely on a single oracle to obtain data, which suffers from a single point of failure and limits the large-scale development of blockchain. To this end, the distributed oracle scheme is put forward, but the existing distributed oracle schemes such as Chainlink and Augur generally have low execution efficiency and high communication overhead, which leads to their poor applicability. To solve the above problems, this paper proposes a trusted distributed oracle scheme based on a share recovery threshold signature. First, a data verification method of distributed oracles is designed based on threshold signature. By aggregating the signatures of oracles, data from different data sources can be mutually verified, leading to a more efficient data verification and aggregation process. Then, a credibility-based cluster head election algorithm is designed, which reduces the communication overhead by clarifying the function distribution and building a hierarchical structure. Considering the good performance of the BLS threshold signature in large-scale applications, this paper combines it with distributed oracle technology and proposes a BLS threshold signature algorithm that supports share recovery in distributed oracles. The share recovery mechanism enables the proposed scheme to solve the key loss issue, and the setting of the threshold value enables the proposed scheme to complete signature aggregation with only a threshold number of oracles, making the scheme more robust. Finally, experimental results indicate that, by using the threshold signature technology and the cluster head election algorithm, our scheme effectively improves the execution efficiency of oracles and solves the problem of a single point of failure, leading to higher scalability and robustness.
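The credibility-based election step can be sketched independently of the signature scheme: within each cluster, the highest-credibility oracle becomes head, so aggregation traffic flows through one hub per cluster. The data layout below is assumed for illustration.

```python
def elect_cluster_heads(oracles):
    """Pick the highest-credibility oracle in each cluster (illustrative).

    oracles: list of dicts like {"id": str, "credibility": float, "cluster": int}.
    """
    heads = {}
    for o in oracles:
        c = o["cluster"]
        if c not in heads or o["credibility"] > heads[c]["credibility"]:
            heads[c] = o
    return {c: h["id"] for c, h in heads.items()}

oracles = [
    {"id": "o1", "credibility": 0.82, "cluster": 0},
    {"id": "o2", "credibility": 0.91, "cluster": 0},
    {"id": "o3", "credibility": 0.77, "cluster": 1},
]
assert elect_cluster_heads(oracles) == {0: "o2", 1: "o3"}
```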
BACKGROUND There is a lack of studies and educational programs focused on biosimilars and shared decision-making among patients diagnosed with various rheumatic diseases. AIM To improve knowledge and awareness of biosimilars and shared decision-making among patients attending rheumatology practices in Colorado, and to assess rheumatology patients' interest in discussing biosimilars and shared decision-making with others (e.g., medical professionals, family members, friends). METHODS Our goal was to work with 80 rheumatology teams in Colorado. We developed and distributed 2000 multi-page brochures to each participating office and later conducted an online anonymous survey. RESULTS A total of 49 (2.5%) rheumatology patients responded to our survey. After reading our educational booklet, many survey respondents identified the correct answer in most questions focused on biosimilars or shared decision-making. Our survey results suggest that patients attending rheumatology practices in Colorado are generally not involved in discussions with their providers regarding treatment plans or options. The improvement in scores after reading our educational materials was statistically significant for both biosimilars and shared decision-making. CONCLUSION Overall, the level of knowledge and awareness of biosimilars and shared decision-making among patients attending rheumatology practices in Colorado was low. More educational programs, as well as follow-up trainings to measure changes in knowledge and awareness regarding biosimilars and shared decision-making among patients attending rheumatology practices, are recommended.