Funding: Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19050302 to Z.Z.; XDA08020102 to Z.Z.); National Natural Science Foundation of China (31871328 to Z.Z.); K.C. Wong Education Foundation (to Z.Z.); The Youth Innovation Promotion Association of Chinese Academy of Sciences (2017141 to S.S.).
Abstract: With the rapid advancement of sequencing technologies and the growing volume of plant omics data, there is much anticipation of mining such big data for insights that can refine agricultural practice in the near future. Toward this end, database resources that deliver web services for plant omics data submission, archiving, and integration are urgently needed. As part of the Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences (CAS), the BIG Data Center (http://bigd.big.ac.cn) provides open access to a suite of database resources (Table 1), with the aim of supporting plant research activities for domestic and international users in both academia and industry and of translating big data into big discoveries (BIG Data Center Members, 2017; BIG Data Center Members, 2018; BIG Data Center Members, 2019). Here, we briefly introduce the plant-related database resources in the BIG Data Center and appeal to plant research communities to make full use of these resources for plant data submission, archiving, and integration.
Abstract: Wearable technologies have the potential to become a valuable part of daily life, enabling people to observe the world in new ways, for example through augmented reality (AR) applications. Wearable technology uses electronic devices that may be carried as accessories, worn as clothes, or even embedded in the user's body. Although the potential benefits of smart wearables are numerous, their extensive and continual use raises several privacy concerns and difficult information security challenges. In this paper, we present a comprehensive survey of recent privacy-preserving big data analytics applications based on wearable sensors. We highlight the fundamental features of security and privacy for wearable device applications. We then examine the use of deep learning algorithms with cryptography and determine their usability for wearable sensors. We also present a case study on privacy-preserving machine learning techniques, in which we theoretically and empirically evaluate the performance of a privacy-preserving deep learning framework. We explain the implementation details of a secure prediction service that uses a convolutional neural network (CNN) model and the Cheon-Kim-Kim-Song (CKKS) homomorphic encryption algorithm. Finally, we explore the obstacles and gaps in deploying practical real-world applications, identify the most important obstacles that must be overcome, and discuss some interesting future research directions.
Abstract: Despite the widespread use of decision trees (DTs) across various applications, their performance tends to suffer on imbalanced datasets, where certain classes significantly outnumber others. Cost-sensitive learning is one strategy for addressing this problem, and several cost-sensitive DT algorithms have been proposed to date. However, existing algorithms are heuristic: they greedily select either a better splitting point or a better feature node, which leads to local optima at individual tree nodes and ignores the cost of the whole tree. In addition, determining the costs is difficult and often requires domain expertise. This study proposes a DT for imbalanced data, called Swarm-based Cost-sensitive DT (SCDT), using the cost-sensitive learning strategy and an enhanced swarm-based algorithm. The DT is encoded using a hybrid individual representation. A hybrid artificial bee colony approach is designed to optimize rules, considering specified costs in an F-measure-based fitness function. Experimental comparisons with state-of-the-art DT algorithms show that SCDT achieved the highest performance on most datasets. SCDT also performs well on other critical metrics, with average recall, precision, F1-score, and AUC of 83%, 87.3%, 85%, and 80.7%, respectively.
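The abstract above mentions an F-measure-based fitness function that takes specified costs into account but does not give its form. The sketch below shows one plausible way to fold misclassification costs into an F-measure; the function name, the cost parameters, and the weighting scheme are assumptions for illustration, not the SCDT definition. With both costs set to 1 it reduces to the ordinary F1-score.

```python
def cost_weighted_f_measure(y_true, y_pred, fn_cost=1.0, fp_cost=1.0, positive=1):
    """Hypothetical cost-weighted F-measure: 2*TP / (2*TP + fn_cost*FN + fp_cost*FP).

    With fn_cost == fp_cost == 1 this is the standard F1-score.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    denom = 2 * tp + fn_cost * fn + fp_cost * fp
    return 2 * tp / denom if denom else 0.0


# Made-up imbalanced example: 3 minority (positive) samples out of 8.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

plain_f1 = cost_weighted_f_measure(y_true, y_pred)
penalised = cost_weighted_f_measure(y_true, y_pred, fn_cost=5.0)
print(plain_f1, penalised)
```

Raising `fn_cost` lowers the fitness of candidate trees that miss minority-class samples, which is the general effect a cost-sensitive fitness is meant to have.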
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 82204396, 82304491, and 82400511).
Abstract: Diabetic kidney disease (DKD), whose global prevalence is increasing, lacks effective therapeutic targets to halt or reverse its progression. Therapeutic targets supported by causal genetic evidence are more likely to succeed in randomized clinical trials. In this study, we integrated large-scale plasma proteomics, genetic-driven causal inference, and experimental validation to identify prioritized targets for DKD using the UK Biobank (UKB) and FinnGen cohorts. Among 2844 diabetic patients (528 with DKD), we identified 37 targets significantly associated with incident DKD, supported by both observational and causal evidence. Of these, 22% (8/37) are currently under investigation for DKD or other diseases. Our prospective study confirmed that higher levels of three prioritized targets, insulin-like growth factor binding protein 4 (IGFBP4), family with sequence similarity 3 member C (FAM3C), and prostaglandin D2 synthase (PTGDS), were associated with a 4.35-, 3.51-, and 3.57-fold increased likelihood of developing DKD, respectively. In addition, population-level protein-altering variant (PAV) analysis and in vitro experiments cross-validated FAM3C and IGFBP4 as potential new target candidates for DKD, acting through the classic NLR family pyrin domain containing 3 (NLRP3)-caspase-1-gasdermin D (GSDMD) apoptotic axis. Our results demonstrate that integrating omics data mining with causal inference may be a promising strategy for prioritizing therapeutic targets.
Funding: Supported by the National Natural Science Foundation of China (72204169, 82425101, 82271516, 81801187); Noncommunicable Chronic Diseases-National Science and Technology Major Project (2023ZD0504800, 2023ZD0504801, 2023ZD0504802, 2023ZD0504803, 2023ZD0504804); Beijing Municipal Science & Technology Commission (Z231100004823036); Capital's Funds for Health Improvement and Research (2022-2-2045); National Key R&D Program of China (2024YFC3044800, 2022YFF1501500, 2022YFF1501501, 2022YFF1501502, 2022YFF1501503, 2022YFF1501504, 2022YFF1501505).
Abstract: Background: Stroke is the second leading cause of death and the third leading cause of disability worldwide, and it is the leading cause of death and disability among adults in China, where its incidence continues to rise. In China, the average age of first-time stroke patients is 66.4 years, and the rate of intravenous thrombolysis with recombinant tissue plasminogen activator within 3 h of onset is only 16%. There is therefore a pressing need for real-time predictive tools, particularly for elderly individuals at home, that can provide early warnings of potential strokes. Methods: We collected continuous monitoring data from nonintrusive smart beds and multimodal temporal data from electronic medical records at the National Center for Neurological Disorders. The data included smart bed monitoring indicators, laboratory tests, nurse observations, and static data as potential predictors, with stroke as the outcome. We applied feature representation and feature selection techniques and then input the predictors into machine learning models. Deep learning models were also used after preprocessing the irregular temporal data. Finally, we evaluated the performance of the stroke prediction models and assessed the importance of the features. We used continuously updated vital signs and clinical data during hospitalization to generate timely stroke risk alerts during the same admission. Results: A total of 37,041 samples were analyzed, including 7,020 patients diagnosed with stroke. When only the smart bed features were used for prediction, the model achieved an area under the receiver operating characteristic curve (AUROC) of 0.59 to 0.63, with an accuracy of 60% to 65%. Among the four artificial intelligence algorithms, the random forest model performed best. After all the available features were incorporated, the AUROC increased to 0.94 and the accuracy improved to 92%. Conclusions: In this study, the occurrence of stroke was successfully identified by integrating multimodal temporal data from electronic medical records. Noncontact monitoring of respiration and heart rate offers a promising approach for daily stroke surveillance in home-based populations, particularly for elderly individuals living alone.
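The stroke abstract above reports its results as AUROC values. As a reminder of what that metric measures, the sketch below computes AUROC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, counting ties as one half. The labels and scores are made up for illustration and are not taken from the study.

```python
def auroc(labels, scores):
    """AUROC by pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 ground-truth values
    scores: iterable of predicted risk scores (higher means more likely positive)
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    # A positive/negative pair scores 1 if ranked correctly, 0.5 if tied, 0 otherwise.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for four samples (two with the outcome, two without).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 3 of 4 positive/negative pairs are ranked correctly -> 0.75
```

The pairwise form is quadratic in the number of samples, so it suits small checks like this one; for datasets of the size reported above, a rank-based implementation from a statistics library would be the practical choice.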
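Abstract: For the dynamic flexible job shop scheduling problem with order random arrival (DFJSP_ORA), in which multiple orders arrive at random, a modeling and solution framework oriented to real production environments is proposed. First, a mathematical model of DFJSP_ORA is built with minimization of the maximum completion time as the optimization objective. A fluid model is introduced to approximate system behavior continuously, from which key state features are extracted. The scheduling process is modeled as a Markov decision process (MDP), and the proximal policy optimization (PPO) algorithm is used to build an end-to-end deep reinforcement learning framework to solve it. The method combines a discrete action space driven by composite rules with a policy optimization mechanism driven by the advantage function, enabling efficient decision-making in dynamic environments. Finally, the proposed method is compared with 6 priority dispatching rules and 3 reinforcement learning methods on 81 instances of different sizes; the results confirm its superiority and provide an efficient, flexible solution for DFJSP_ORA.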
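The scheduling abstract above relies on PPO with advantage-driven policy updates. The sketch below shows the standard PPO clipped surrogate objective for a batch of sampled actions. It is the generic textbook form, not the paper's implementation; the function name, the default clip range of 0.2, and the sample values are assumptions for illustration.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    new_logp / old_logp: log-probabilities of the chosen actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)                       # probability ratio r
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))  # keep r within the trust range
        total += min(ratio * adv, clipped * adv)         # pessimistic of the two estimates
    return total / len(advantages)


# Hypothetical batch of two scheduling decisions (for example, two chosen dispatching rules).
new_logp = [math.log(1.5), math.log(0.5)]  # ratios 1.5 and 0.5 against old_logp = 0
old_logp = [0.0, 0.0]
advantages = [1.0, -1.0]
print(ppo_clip_objective(new_logp, old_logp, advantages))
```

In the first sample the ratio 1.5 is clipped to 1.2, so the policy gains nothing from pushing a good action's probability beyond the clip range; in the second, the clipped term -0.8 is the smaller value and is kept, so moving away from a bad action beyond the range earns no extra credit either.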
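Abstract: The Dongzhiyuan area lies at the center of the Loess Plateau and suffers frequent landslide disasters, so landslide susceptibility zoning is urgently needed to support science-based prevention and control of landslide hazards in the region. This study therefore takes Dongzhiyuan as the study area and selects 12 influencing factors, including elevation, aspect, and NDVI, as evaluation factors. Based on the frequency ratio (FR) model, combined with random forest (RF) and artificial neural network (ANN) models, it carries out a static landslide susceptibility assessment and analyzes the contribution of each factor to assessment accuracy. The results show that the area under the curve (AUC) values of the FR-RF and FR-ANN models are 0.922 and 0.918, respectively, indicating that the FR-RF model is more accurate for landslide susceptibility assessment in Dongzhiyuan. Slope, aspect, and road density contribute 16.7%, 15.3%, and 1.4%, respectively, to landslide susceptibility. To overcome complex terrain and lagging data updates, the susceptibility results of the FR-RF model are combined with InSAR Stacking results, raising the accuracy of the static landslide susceptibility assessment from 6.9% to 8.1%. The dynamic susceptibility results show that the high-susceptibility zones of Dongzhiyuan are distributed mainly along rivers; they account for 6.5% of the total area and 23.6% of all landslides, with a landslide density of 15.7 per km². Low-susceptibility zones are located mainly in the central area far from rivers; they account for 81.7% of the total area and 57.8% of all landslides, with a landslide density of 4.7 per km². By incorporating the InSAR Stacking method, this study addresses the lag in data updates in static landslide susceptibility assessment, reduces false-negative errors, makes traditional susceptibility assessment timely, enables dynamic landslide susceptibility assessment for Dongzhiyuan, and provides important data support for disaster prevention and control.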
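The landslide abstract above builds on the frequency ratio (FR) model. In its usual form, the FR of a factor class is the share of landslides falling in that class divided by the share of the study area the class occupies, so values above 1 mark classes where landslides are over-represented. The sketch below computes it for one factor; the class names and counts are invented for illustration and are not data from the study.

```python
def frequency_ratio(landslides_per_class, cells_per_class):
    """FR per class = (landslides in class / all landslides) / (cells in class / all cells)."""
    total_landslides = sum(landslides_per_class.values())
    total_cells = sum(cells_per_class.values())
    if total_landslides == 0 or total_cells == 0:
        raise ValueError("need at least one landslide and one cell")
    return {
        cls: (landslides_per_class.get(cls, 0) / total_landslides) / (cells / total_cells)
        for cls, cells in cells_per_class.items()
        if cells > 0  # classes with no area have no defined ratio
    }


# Hypothetical slope classes (degrees) with made-up landslide and raster-cell counts.
landslides = {"0-10": 10, "10-20": 30, ">20": 60}
cells = {"0-10": 500, "10-20": 300, ">20": 200}
for cls, fr in frequency_ratio(landslides, cells).items():
    print(cls, round(fr, 2))
```

The same ratio can be read off the zone shares reported in the abstract: the high-susceptibility zones hold 23.6% of landslides on 6.5% of the area (a ratio of about 3.6), while the low-susceptibility zones hold 57.8% on 81.7% of the area (about 0.71).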