Most studies on predicting energy consumption integrate data from many sources for model training. However, centralizing data can cause data leakage. Meanwhile, many laws and regulations on data security and privacy have been enacted, making it difficult to centralize data, which can lead to a data silo problem. Thus, to train the model while maintaining user privacy, we adopt a federated learning framework. However, in the secure aggregation step of all classical federated learning frameworks, the Federated Averaging (FedAvg) method directly computes a weighted average of the model parameters, which may have an adverse effect on the model. Therefore, we propose the Federated Reinforcement Learning (FedRL) model, in which multiple users collaboratively train the model. Each household trains a local model on local data. These local data never leave the local area, and only the encrypted parameters are uploaded to the central server to participate in the secure aggregation of the global model. We improve FedAvg by incorporating a Q-learning algorithm to assign weights to each uploaded local model, which improves predictive performance. We validate the performance of the FedRL model by testing it on a real-world dataset and compare the experimental results with other models. The proposed method improves on both the centralized and distributed models in most of the evaluation metrics.
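The aggregation step described above can be illustrated with a minimal sketch. It assumes each local model is a dictionary of flat parameter lists (a hypothetical representation) and takes the per-client weights as an input: standard FedAvg would pass the local sample counts, whereas FedRL would pass weights chosen by its Q-learning agent, which is not reproduced here. Encryption of the uploaded parameters is also omitted.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # hypothetical layout: parameter name -> flat values


def aggregate(local_params: List[Params], weights: List[float]) -> Params:
    """Return the weighted average of the clients' parameters."""
    if not local_params or len(local_params) != len(weights):
        raise ValueError("need one weight per local model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalise so the weights sum to 1

    global_params: Params = {}
    for name, first in local_params[0].items():
        global_params[name] = [
            sum(nw * client[name][i] for nw, client in zip(norm, local_params))
            for i in range(len(first))
        ]
    return global_params


# Two households; weighting by sample counts (1 and 3) mimics plain FedAvg.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
global_model = aggregate(clients, [1.0, 3.0])
```

Swapping the weight list is the only change needed to compare count-based weighting with any learned weighting scheme.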
This paper deals with the recommendation system in the so-called user-centric payment environment, where users, i.e., the payers, can make payments without providing personal information to merchants. This service retains only the minimum purchase information, such as the purchased product names, the time of purchase, and the place of purchase, for possible refunds or cancellations. This study aims to develop an AI-based recommendation system by utilizing the minimum transaction data generated by the user-centric payment service. First, we developed a matrix-based extrapolative collaborative filtering algorithm based on open transaction data. The recommendation methodology was then verified with real transaction data. Based on the experimental results, we confirmed that the recommendation performance is satisfactory even with only the minimum purchase information.
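To make the idea of recommending from minimal purchase records concrete, here is a small sketch of ordinary item-based collaborative filtering with cosine similarity over a binary payer-by-product matrix. This is a generic stand-in, not the paper's matrix-based extrapolative algorithm, and the anonymous payer IDs and product names are invented for illustration.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Set


def recommend(purchases: Dict[str, Set[str]], user: str, k: int = 3) -> List[str]:
    """Rank products the user has not bought by similarity to those they have."""
    # Invert the records: product -> set of payers who bought it.
    buyers: Dict[str, Set[str]] = defaultdict(set)
    for payer, items in purchases.items():
        for item in items:
            buyers[item].add(payer)

    owned = purchases.get(user, set())
    scores: Dict[str, float] = {}
    for candidate, cand_buyers in buyers.items():
        if candidate in owned:
            continue
        score = 0.0
        for item in owned:
            common = len(cand_buyers & buyers[item])
            if common:
                # Cosine similarity between two binary purchase columns.
                score += common / sqrt(len(cand_buyers) * len(buyers[item]))
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


purchases = {
    "u1": {"coffee", "bagel"},
    "u2": {"coffee", "bagel", "juice"},
    "u3": {"coffee", "juice"},
    "u4": {"tea"},
}
suggestions = recommend(purchases, "u1")
```

Only product names and an opaque payer key are used, which matches the kind of minimal information the abstract says the service retains.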
Funding: Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19050302 to Z.Z.; XDA08020102 to Z.Z.); National Natural Science Foundation of China (31871328 to Z.Z.); K.C. Wong Education Foundation (to Z.Z.); The Youth Innovation Promotion Association of Chinese Academy of Sciences (2017141 to S.S.).
Abstract: With the rapid advancement of sequencing technologies and the growing volume of omics data in plants, there is much anticipation of digging out the treasure from such big data and accordingly refining current agricultural practice in the near future. Toward this end, database resources that deliver web services for plant omics data submission, archiving, and integration are urgently needed. As part of the Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences (CAS), the BIG Data Center (http://bigd.big.ac.cn) provides open access to a suite of database resources (Table 1), with the aim of supporting plant research activities for domestic and international users in both academia and industry and translating big data into big discoveries (BIG Data Center Members, 2017; BIG Data Center Members, 2018; BIG Data Center Members, 2019). Here, we give a brief introduction to the plant-related database resources in the BIG Data Center and appeal to plant research communities to make full use of these resources for plant data submission, archiving, and integration.
Abstract: Wearable technologies have the potential to become a valuable influence on daily life, where they may enable observing the world in new ways, for example through augmented reality (AR) applications. Wearable technology uses electronic devices that may be carried as accessories, worn as clothes, or even embedded in the user's body. Although the potential benefits of smart wearables are numerous, their extensive and continual usage creates several privacy concerns and difficult information security challenges. In this paper, we present a comprehensive survey of recent privacy-preserving big data analytics applications based on wearable sensors. We highlight the fundamental features of security and privacy for wearable device applications. Then, we examine the use of deep learning algorithms with cryptography and determine their usability for wearable sensors. We also present a case study on privacy-preserving machine learning techniques, in which we theoretically and empirically evaluate the performance of a privacy-preserving deep learning framework. We explain the implementation details of a secure prediction service using a convolutional neural network (CNN) model and the Cheon-Kim-Kim-Song (CKKS) homomorphic encryption algorithm. Finally, we explore the obstacles and gaps in the deployment of practical real-world applications, identify the most important obstacles that must be overcome, and discuss some interesting future research directions.
Abstract: Despite the widespread use of decision trees (DTs) across various applications, their performance tends to suffer on imbalanced datasets, where certain classes significantly outnumber others. Cost-sensitive learning is a strategy to solve this problem, and several cost-sensitive DT algorithms have been proposed to date. However, existing algorithms are heuristic and greedily select either a better splitting point or feature node, leading to local optima for tree nodes and ignoring the cost of the whole tree. In addition, determining the costs is difficult and often requires domain expertise. This study proposes a DT for imbalanced data, called Swarm-based Cost-sensitive DT (SCDT), using the cost-sensitive learning strategy and an enhanced swarm-based algorithm. The DT is encoded using a hybrid individual representation. A hybrid artificial bee colony approach is designed to optimize rules, considering specified costs in an F-Measure-based fitness function. Experimental comparisons with state-of-the-art DT algorithms show that the SCDT method achieved the highest performance on most datasets. Moreover, SCDT also excels in other critical performance metrics, namely recall, precision, F1-score, and AUC, with average values of 83%, 87.3%, 85%, and 80.7%, respectively.
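Why an F-measure suits imbalanced data better than accuracy can be seen in a short sketch. The function below is the standard F-beta score for a chosen positive (minority) class; it is not the SCDT fitness function, whose cost terms the abstract does not specify. The `beta` parameter is used here only as a simple, assumed way to weight recall more heavily.

```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int],
              positive: int = 1, beta: float = 1.0) -> float:
    """F-beta score for the `positive` class; beta > 1 favours recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# 2 minority samples out of 10: always predicting the majority class
# reaches 80% accuracy but scores 0 on the F-measure.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
majority_only = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, majority_only)) / len(y_true)
f_majority = f_measure(y_true, majority_only)
f_mixed = f_measure(y_true, [1, 0, 1, 0, 0, 0, 0, 0, 0, 0])
```

A fitness function built on this score therefore rewards trees that actually find minority-class samples rather than trees that ignore them.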
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 82204396, 82304491, and 82400511).
Abstract: Diabetic kidney disease (DKD), with its increasing global prevalence, lacks effective therapeutic targets to halt or reverse its progression. Therapeutic targets supported by causal genetic evidence are more likely to succeed in randomized clinical trials. In this study, we integrated large-scale plasma proteomics, genetic-driven causal inference, and experimental validation to identify prioritized targets for DKD using the UK Biobank (UKB) and FinnGen cohorts. Among 2844 diabetic patients (528 with DKD), we identified 37 targets significantly associated with incident DKD, supported by both observational and causal evidence. Of these, 22% (8/37) of the potential targets are currently under investigation for DKD or other diseases. Our prospective study confirmed that higher levels of three prioritized targets (insulin-like growth factor binding protein 4 [IGFBP4], family with sequence similarity 3 member C [FAM3C], and prostaglandin D2 synthase [PTGDS]) were associated with a 4.35-, 3.51-, and 3.57-fold increased likelihood of developing DKD, respectively. In addition, population-level protein-altering variant (PAV) analysis and in vitro experiments cross-validated FAM3C and IGFBP4 as potential new target candidates for DKD, acting through the classic NLR family pyrin domain containing 3 (NLRP3)-caspase-1-gasdermin D (GSDMD) apoptotic axis. Our results demonstrate that integrating omics data mining with causal inference may be a promising strategy for prioritizing therapeutic targets.
Funding: Supported by the National Natural Science Foundation of China (72204169, 82425101, 82271516, 81801187); Noncommunicable Chronic Diseases-National Science and Technology Major Project (2023ZD0504800, 2023ZD0504801, 2023ZD0504802, 2023ZD0504803, 2023ZD0504804); Beijing Municipal Science & Technology Commission (Z231100004823036); Capital's Funds for Health Improvement and Research (2022-2-2045); National Key R&D Program of China (2024YFC3044800, 2022YFF1501500, 2022YFF1501501, 2022YFF1501502, 2022YFF1501503, 2022YFF1501504, 2022YFF1501505).
Abstract: Background: Stroke is the second leading cause of death and third leading cause of disability worldwide and is the leading cause of death and disability among adults in China, with its incidence rate continuing to rise. In China, the average age of first-time stroke patients is 66.4 years, and the intravenous thrombolysis rate using recombinant tissue plasminogen activator within 3 h of onset is only 16%. Given this, there is a pressing need for real-time predictive tools, particularly for elderly individuals at home, that can provide early warnings of potential strokes. Methods: We collected continuous monitoring data from nonintrusive smart beds and multimodal temporal data from electronic medical records at the National Center for Neurological Disorders. The data included smart bed monitoring indicators, laboratory tests, nurse observations, and static data as potential predictors, with stroke as the outcome. We applied feature representation and feature selection techniques and then input the predictors into machine learning models. Additionally, deep learning models were used after preprocessing the irregular temporal data. Finally, we evaluated the performance of the stroke prediction models and assessed the importance of the features. We used continuously updated vital signs and clinical data during hospitalization to generate timely stroke risk alerts during the same period of admission. Results: A total of 37,041 samples were analyzed, of which 7020 patients were diagnosed with stroke. When only the smart bed features were used for prediction, the model achieved an area under the receiver operating characteristic curve (AUROC) of 0.59-0.63, with an accuracy ranging from 60% to 65%. Among the four artificial intelligence algorithms, the random forest model demonstrated the best performance. After all the available features were incorporated, the AUROC increased to 0.94, and the accuracy improved to 92%. Conclusions: In this study, the occurrence of stroke was successfully identified by integrating multimodal temporal data from electronic medical records. Noncontact monitoring of respiration and heart rate offers a promising approach for daily stroke surveillance in home-based populations, particularly for elderly individuals living alone.
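For readers unfamiliar with the AUROC figures quoted above, the following sketch computes it directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, with ties counted as one half. The labels and scores are made-up illustration values, not study data, and the quadratic pairwise loop is meant for clarity rather than efficiency.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = stroke, 0 = no stroke.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
example_auroc = auroc(labels, scores)
```

Under this reading, an AUROC near 0.6 means the model orders a stroke case above a non-stroke case only slightly more often than chance, while 0.94 means it does so for most pairs.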
Abstract: In the prevention of coronavirus disease 2019 (COVID-19), epidemiological data is essential for controlling the source of infection, cutting off the route of transmission, and protecting vulnerable populations. Under the Law of the People's Republic of China on Prevention and Treatment of Infectious Diseases and other related regulations, medical institutions have been authorized to collect detailed patient information, but this remains a formidable task in megacities because of significant patient mobility and existing barriers to information sharing. As a smart city that strengthens precise epidemic prevention and control, Shanghai has established a multi-department platform named "one-net management" for dynamic information monitoring. We believe that sharing epidemiological data with medical institutions in a secure environment will effectively and comprehensively improve their ability to prevent and control epidemics.
Funding: Supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA08020102 to Z.Z. and S.H.); International Partnership Program of the Chinese Academy of Sciences (No. 153F11KYSB20160008); National Programs for High Technology Research and Development (863 Program; No. 2015AA020108 to Z.Z.); National Natural Science Foundation of China (No. 31100915 to LH.); the 100-Talent Program of Chinese Academy of Sciences (awarded to Z.Z.).
Abstract: Rice is one of the most important staple foods as well as a monocotyledonous model organism for the plant research community. Here, we present RED (Rice Expression Database; http://expression.ic4r.org), an integrated database of rice gene expression profiles derived entirely from RNA-Seq data. RED features a comprehensive collection of 284 high-quality RNA-Seq experiments, integrates a large number of gene expression profiles, and covers a wide range of rice growth stages as well as various treatments. Based on these massive expression profiles, RED provides a list of housekeeping and tissue-specific genes and dynamically constructs co-expression networks for gene(s) of interest. In addition, it provides user-friendly web interfaces for querying, browsing, and visualizing the expression profiles of genes of interest. As a core resource in the BIG Data Center, RED is of great utility for characterizing the function of rice genes and for better understanding the important biological processes and mechanisms underlying complex agronomic traits in rice.
Funding: Supported by the Taishan Scholars (No. ts201712003).
Abstract: Nitrogen dioxide (NO₂) poses a critical potential risk to environmental quality and public health. A reliable machine learning (ML) forecasting framework can provide valuable information to support government decision-making. Based on data from 1609 air quality monitors across China from 2014 to 2020, this study designed an ensemble ML model that integrates multiple types of spatial-temporal variables and three sub-models for time-sensitive prediction over a wide range. The ensemble ML model adds a residual connection to the gated recurrent unit (GRU) network and combines the advantages of Transformer, extreme gradient boosting (XGBoost), and the GRU with residual connection, resulting in a 4.1% ± 1.0% lower root mean square error than XGBoost on the test results. The ensemble model shows strong prediction performance, with coefficients of determination of 0.91, 0.86, and 0.77 for 1-hr, 3-hr, and 24-hr averages on the test results, respectively. In particular, the model achieved excellent performance with low spatial uncertainty in Central, East, and North China, the major site-dense zones. Through interpretability analysis based on the Shapley value at different temporal resolutions, we found that atmospheric chemical processes contribute more to hourly predictions than to daily-scale predictions, while the impact of meteorological conditions becomes more prominent for the latter. Compared with existing models for different spatiotemporal scales, the present model can be implemented at any air quality monitoring station across China to deliver rapid and dependable NO₂ forecasts, which will help in developing effective control policies.
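The two evaluation metrics quoted in this abstract, root mean square error (RMSE) and the coefficient of determination (R²), along with the relative RMSE reduction against a baseline, can be written out as a short sketch. The function names and the sample numbers are illustrative assumptions, not values from the study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observations and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_rmse_reduction(baseline_rmse: float, model_rmse: float) -> float:
    """Fraction by which the model's RMSE is lower than the baseline's."""
    return (baseline_rmse - model_rmse) / baseline_rmse


# Toy observations and predictions (not study data).
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
toy_rmse = rmse(observed, predicted)
toy_r2 = r_squared(observed, predicted)
```

A "4.1% lower RMSE than XGBoost" corresponds to `relative_rmse_reduction` returning about 0.041 when the baseline is the XGBoost error.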
Abstract: With the popularisation of intelligent power, power devices come in different shapes, numbers, and specifications. As a result, power data exhibits distributional variability, and the model learning process cannot extract data features sufficiently, which seriously affects the accuracy and performance of anomaly detection. Therefore, this paper proposes a deep learning-based anomaly detection model for power data, which integrates a data alignment enhancement technique based on random sampling and an adaptive feature fusion method leveraging dimension reduction. To handle the distribution variability of power data, this paper develops a sliding window-based data adjustment method for the model, which addresses the problems of high-dimensional feature noise and low-dimensional missing data. To address insufficient feature fusion, an adaptive feature fusion method based on feature dimension reduction and dictionary learning is proposed to improve the model's anomaly detection accuracy. To verify the effectiveness of the proposed method, we conducted comparisons through ablation experiments. The experimental results show that, compared with traditional anomaly detection methods, the proposed method not only has an advantage in model accuracy but also reduces the amount of parameter computation during feature matching and improves detection speed.
Abstract: Feature Subset Selection (FSS) is an NP-hard problem of removing redundant and irrelevant features, particularly from medical data, and it can be effectively addressed by metaheuristic algorithms. However, existing binary versions of metaheuristic algorithms have convergence issues and lack an effective binarization method, resulting in suboptimal solutions that hinder diagnosis and prediction accuracy. This paper proposes an Improved Binary Quantum-based Avian Navigation Optimizer Algorithm (IBQANA) for FSS in medical data preprocessing to address these suboptimal solutions. IBQANA's contributions include the Hybrid Binary Operator (HBO) and the Distance-based Binary Search Strategy (DBSS). HBO is designed to convert continuous values into binary solutions, even for values outside the [0, 1] range, ensuring accurate binary mapping. DBSS is a two-phase search strategy that enhances the performance of inferior search agents and accelerates convergence. By combining exploration and exploitation phases based on an adaptive probability function, DBSS effectively avoids local optima. The effectiveness of HBO is compared with five transfer function families and thresholding on 12 medical datasets, with feature numbers ranging from 8 to 10,509. IBQANA's effectiveness is evaluated in terms of accuracy, fitness, and number of selected features, and is compared with seven binary metaheuristic algorithms. Furthermore, IBQANA is used to detect COVID-19. The results reveal that the proposed IBQANA outperforms all comparative algorithms on COVID-19 and 11 other medical datasets. The proposed method presents a promising solution to the FSS problem in medical data preprocessing.
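For orientation, the two baseline binarization approaches that the abstract compares HBO against (transfer functions and thresholding) can be sketched as follows. This is a minimal illustration, not the paper's HBO: the sigmoid (S-shaped) transfer function is only one commonly used family, and the function names and the 0.5 threshold are assumptions made for this example.

```python
import numpy as np

def sigmoid_transfer(x):
    """S-shaped transfer function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def binarize_stochastic(x, rng):
    """Select a feature (1) with probability given by the transfer function."""
    prob = sigmoid_transfer(x)
    return (rng.random(prob.shape) < prob).astype(int)

def binarize_threshold(x, threshold=0.5):
    """Plain thresholding: select a feature when its value exceeds the threshold.

    The threshold value is an assumption for this sketch.
    """
    return (np.asarray(x, dtype=float) > threshold).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    position = np.array([0.2, 0.7, 1.5, -3.0])  # continuous search-agent position
    print(binarize_threshold(position))
    print(binarize_stochastic(position, rng))
```

Plain thresholding treats every value above the cut-off identically, whereas a transfer function turns the magnitude into a selection probability. Values outside the [0, 1] range are the case the abstract says HBO is designed to handle.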
Funding: Supported by the National Key R&D Program of China (No. 2020YFC2006602); the National Natural Science Foundation of China (Nos. 62172324, 62072324, 61876217, 6187612); the University Natural Science Foundation of Jiangsu Province (No. 21KJA520005); the Primary Research and Development Plan of Jiangsu Province (No. BE2020026); and the Natural Science Foundation of Jiangsu Province (No. BK20190942).
Abstract: Most studies on predicting energy consumption have run their experiments by integrating data for model training. However, centralizing data can cause data leakage. Meanwhile, many laws and regulations on data security and privacy have been enacted, making it difficult to centralize data, which can lead to a data silo problem. Thus, to train the model while maintaining user privacy, we adopt a federated learning framework. However, in the secure aggregation step of classical federated learning frameworks, the Federated Averaging (FedAvg) method directly computes a weighted average of the model parameters, which may have an adverse effect on the model. Therefore, we propose the Federated Reinforcement Learning (FedRL) model, in which multiple users collaboratively train the model. Each household trains a local model on local data. These local data never leave the local area, and only the encrypted parameters are uploaded to the central server to participate in the secure aggregation of the global model. We improve FedAvg by incorporating a Q-learning algorithm that assigns a weight to each uploaded local model, which improves the model's predictive performance. We validate the performance of the FedRL model by testing it on a real-world dataset and compare the experimental results with other models. On most of the evaluation metrics, the performance of our proposed method is improved compared with both the centralized and distributed models.
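The aggregation step described in this abstract can be illustrated with a minimal sketch. Standard FedAvg weights each client's parameters by its local sample count; the abstract's FedRL instead assigns the weights with Q-learning. The sketch below only shows the aggregation with externally supplied weights and does not implement the Q-learning policy or the encryption. The function names and the flat parameter vectors are assumptions made for this example.

```python
import numpy as np

def weighted_average(client_params, weights):
    """Aggregate client parameter vectors using normalized non-negative weights."""
    params = np.stack([np.asarray(p, dtype=float) for p in client_params])
    w = np.asarray(weights, dtype=float)
    if w.ndim != 1 or w.shape[0] != params.shape[0]:
        raise ValueError("need exactly one weight per client")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()
    # Contract the client axis: sum_k w[k] * params[k]
    return np.tensordot(w, params, axes=1)

def fedavg(client_params, sample_counts):
    """Standard FedAvg: weights are proportional to local dataset sizes."""
    return weighted_average(client_params, sample_counts)

if __name__ == "__main__":
    # Two hypothetical households, each with a two-parameter local model.
    clients = [[1.0, 2.0], [3.0, 4.0]]
    print(fedavg(clients, sample_counts=[1, 3]))
    # Weights from any other source (e.g. a learned policy) plug in the same way.
    print(weighted_average(clients, weights=[0.5, 0.5]))
```

Keeping the weighting separate from the averaging makes it straightforward to swap sample-count weights for learned ones, which is the kind of change the abstract describes.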
Funding: Supported under the framework of the international cooperation program managed by the National Research Foundation of Korea (NRF 2020K2A9A2A06069972, FY2020); by the BK21 FOUR (Fostering Outstanding Universities for Research) program funded by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF); and by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2020S1A5B8103855).
Abstract: This paper deals with recommendation systems in the so-called user-centric payment environment, where users, i.e., the payers, can make payments without providing personal information to merchants. This service maintains only the minimum purchase information, such as the names of the purchased products, the time of purchase, and the place of purchase, for possible refunds or cancellations. This study aims to develop an AI-based recommendation system that uses the minimum transaction data generated by the user-centric payment service. First, we developed a matrix-based extrapolative collaborative filtering algorithm based on open transaction data. The recommendation methodology was then verified with real transaction data. Based on the experimental results, we confirmed that the recommendation performance is satisfactory even with only the minimum purchase information.
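The abstract does not specify its matrix-based extrapolative algorithm, so the following is only a generic sketch of item-based collaborative filtering on a binary purchase matrix, shown to illustrate how recommendations can be derived from purchase records alone. The matrix layout (rows as anonymous payer identifiers, columns as products), the cosine similarity measure, and the function names are assumptions made for this example.

```python
import numpy as np

def item_cosine_similarity(purchases):
    """Cosine similarity between product columns of a payer-by-product matrix."""
    m = np.asarray(purchases, dtype=float)
    norms = np.linalg.norm(m, axis=0)
    norms[norms == 0] = 1.0          # avoid division by zero for unsold products
    sim = (m.T @ m) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)       # a product should not recommend itself
    return sim

def recommend(purchases, user, top_k=2):
    """Rank products the payer has not bought by similarity to past purchases."""
    m = np.asarray(purchases, dtype=float)
    scores = item_cosine_similarity(m) @ m[user]
    scores[m[user] > 0] = -np.inf    # exclude products already purchased
    ranked = np.argsort(-scores)
    return [int(i) for i in ranked[:top_k] if np.isfinite(scores[i])]

if __name__ == "__main__":
    # Hypothetical data: 3 anonymous payers x 4 products, 1 = purchased.
    purchases = [
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 1, 1],
    ]
    print(recommend(purchases, user=0))
```

Only co-purchase patterns enter the computation, so no personal attributes of the payer are needed.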