AIM: To develop a framework for incorporating background domain knowledge into classification rule learning for knowledge discovery in biomedicine. METHODS: Bayesian rule learning (BRL) is a rule-based classifier that uses a greedy best-first search over a space of Bayesian belief networks (BNs) to find the BN that best explains the input dataset, and then infers classification rules from this BN. BRL uses a Bayesian score to evaluate the quality of BNs. In this paper, we extended the Bayesian score to include informative structure priors, which encode our prior domain knowledge about the dataset. We call this extension BRL_p. The structure prior has a hyperparameter λ that lets the user tune how strongly the prior knowledge is incorporated into model learning. We studied the effect of λ on model learning using a simulated dataset and a real-world lung cancer prognostic biomarker dataset, by measuring the degree to which our specified prior knowledge was incorporated. We also monitored its effect on the model's predictive performance. Finally, we compared BRL_p with other state-of-the-art classifiers commonly used in biomedicine. RESULTS: We evaluated the degree to which prior knowledge was incorporated into BRL_p on simulated data by measuring the graph edit distance between the true data-generating model and the model learned by BRL_p. We specified the true model using informative structure priors. We observed that increasing λ increased the influence of the specified structure priors on model learning. With a sufficiently large λ, BRL_p returned the true model. This also led to a gain in predictive performance, measured by the area under the receiver operating characteristic curve (AUC). We then obtained a publicly available real-world lung cancer prognostic biomarker dataset and specified a known biomarker from the literature, the epidermal growth factor receptor (EGFR) gene. Again, larger values of λ led to greater incorporation of EGFR into the final BRL_p model, and this relevant background knowledge also led to a gain in AUC. CONCLUSION: BRL_p enables tunable structure priors to be incorporated during Bayesian classification rule learning, integrating data and knowledge, as demonstrated on lung cancer biomarker data.
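The effect of a λ-weighted structure prior can be illustrated on a toy problem. The sketch below is not the authors' BRL_p implementation: it assumes a BDeu marginal likelihood (equivalent sample size 1) as the Bayesian score and a hypothetical log-linear prior that adds λ for every expert-proposed parent of the class variable, and it replaces BRL's greedy best-first search with an exhaustive comparison of the empty parent set and each single-feature parent set. The names `log_bdeu`, `log_structure_prior`, and `best_parent_set` are illustrative.

```python
from collections import Counter
from math import lgamma


def log_bdeu(rows, child, parents, arity, ess=1.0):
    """BDeu log marginal likelihood of `child` given `parents` (discrete data)."""
    q = 1
    for p in parents:
        q *= arity[p]
    r = arity[child]
    a_j, a_jk = ess / q, ess / (q * r)
    # Counts per parent configuration j and per (j, child value k).
    n_j = Counter(tuple(row[p] for p in parents) for row in rows)
    n_jk = Counter((tuple(row[p] for p in parents), row[child]) for row in rows)
    # Unobserved configurations contribute 0, so only observed ones are summed.
    score = sum(lgamma(a_j) - lgamma(a_j + n) for n in n_j.values())
    score += sum(lgamma(a_jk + n) - lgamma(a_jk) for n in n_jk.values())
    return score


def log_structure_prior(parents, prior_parents, lam):
    """Hypothetical log-linear prior: +lam for each expert-proposed parent."""
    return lam * len(set(parents) & set(prior_parents))


def best_parent_set(rows, target, candidates, arity, prior_parents, lam):
    """Exhaustively score the empty set and every single-feature parent set."""
    options = [()] + [(c,) for c in candidates]
    return max(
        options,
        key=lambda ps: log_bdeu(rows, target, ps, arity)
        + log_structure_prior(ps, prior_parents, lam),
    )


# Toy data (x0, x1, y): y copies x0, while x1 is independent of y.
rows = [(i % 2, (i // 2) % 2, i % 2) for i in range(20)]
arity = {0: 2, 1: 2, 2: 2}
for lam in (0.0, 50.0):
    print(lam, best_parent_set(rows, 2, [0, 1], arity, prior_parents={1}, lam=lam))
```

On this toy data, λ = 0 selects the data-driven parent x0 (index 0), while λ = 50 is large enough for the expert-specified parent x1 (index 1) to win, which mirrors the observation that larger λ increases the influence of the prior.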
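Radio frequency identification (RFID) systems have limited channel resources, and when multiple tags compete for the same frequency or time slot, collisions and conflicts occur. To improve the communication efficiency of the broadcast channel, this study investigates an anti-collision algorithm for the IoT RFID broadcast channel based on framed slotted ALOHA. The method introduces the concept of frame slots and divides the communication time into time segments. By analyzing the probabilities of the three slot states (idle, successful identification, and collision), the causes of collisions in the broadcast channel are identified. Combining a Bayesian algorithm with the Poisson distribution, the probability distribution of the number of tags is computed to estimate how many tags lie within the reader's range, and the length of the next frame is adjusted according to this estimate. If tag collisions persist within the adjusted frame, FastICA (a fast independent component analysis algorithm) is used to convert the tag-identification problem within a slot into an EPC (Electronic Product Code) generation problem, enabling parallel identification of multiple tags within the same slot and avoiding collisions. Experiments show that the proposed method estimates the number of tags accurately and, while maintaining the stability of the communication channel, raises the per-slot tag identification rate and effectively improves the transmission efficiency of the broadcast channel.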
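The tag-count estimation and frame-length adjustment step can be sketched as follows. This is an assumed formulation, not the paper's algorithm: slot occupancy is approximated as Poisson with mean ρ = n/L (n tags, L slots), the observed idle, success, and collision counts are scored with a multinomial likelihood, a uniform prior over n is used (so the Bayesian MAP estimate coincides with maximum likelihood), and the next frame length is set to the estimated number of unread tags, since framed slotted ALOHA throughput peaks near L ≈ n. The FastICA stage is not modeled, and all function names are hypothetical.

```python
import math


def log_slot_probs(rho):
    """Log-probabilities of idle, success, and collision slots under an assumed
    Poisson approximation with rho = expected tags per slot (n / L)."""
    log_idle = -rho
    log_success = math.log(rho) - rho
    p_collision = -math.expm1(-rho) - rho * math.exp(-rho)  # 1 - e^-rho - rho*e^-rho
    return log_idle, log_success, math.log(max(p_collision, 1e-300))


def estimate_tags(frame_len, idle, success, collision, n_max=1000):
    """MAP estimate of the tag count n with a uniform prior over candidate n.

    Every collided slot holds at least two tags, so n >= success + 2 * collision.
    """
    n_min = max(success + 2 * collision, 1)
    best_n, best_ll = n_min, -math.inf
    for n in range(n_min, n_max + 1):
        l0, l1, lk = log_slot_probs(n / frame_len)
        # Multinomial log-likelihood of the observed slot counts (constant dropped).
        ll = idle * l0 + success * l1 + collision * lk
        if ll > best_ll:
            best_n, best_ll = n, ll
    return best_n


def next_frame_length(frame_len, idle, success, collision):
    """Size the next frame to the estimated number of still-unread tags."""
    remaining = estimate_tags(frame_len, idle, success, collision) - success
    return max(remaining, 1)
```

For a 16-slot frame with 6 idle, 6 successful, and 4 collided slots, the estimate is about 16 tags, so the next frame is sized at about 10 slots.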
The penalized least squares (PLS) method with appropriate weights has proved to be a successful baseline estimation method for various spectral analyses. It can extract the baseline from a spectrum while retaining the signal peaks in the presence of random noise. The algorithm is implemented by iterating over the weights of the data points. In this study, we propose a new approach for assigning weights based on Bayes' rule. The proposed method provides a self-consistent weighting formula and performs well, particularly for baselines with different curvature components. It was applied to analyze Schottky spectra obtained in ⁸⁶Kr projectile fragmentation measurements at the experimental Cooler Storage Ring (CSRe) in Lanzhou, where it yields an accurate and reliable storage lifetime with a smaller error bar than existing PLS methods. It is also a universal baseline-subtraction algorithm that can be used in spectrum-related experiments, such as precision nuclear mass and lifetime measurements in storage rings.
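The reweighted penalized least squares loop can be sketched with a Whittaker smoother, whose solution is z = (W + λDᵀD)⁻¹Wy with D the second-difference matrix. The Bayesian weighting shown here is an assumed illustration, not the paper's self-consistent formula: each point's weight is the posterior probability that it belongs to the baseline, with baseline residuals modeled as Gaussian noise (σ estimated from the negative residuals) and peak residuals as a flat positive distribution, under a hypothetical prior of 0.5. The dense solver only suits short spectra; long spectra would call for a sparse banded solver.

```python
import numpy as np


def whittaker_smooth(y, w, lam):
    """Solve (W + lam * D^T D) z = W y, with D the second-difference operator."""
    n = y.size
    D = np.diff(np.eye(n), 2, axis=0)  # shape (n - 2, n)
    return np.linalg.solve(np.diag(w) + lam * D.T @ D, w * y)


def bayes_weights(residual, p_baseline=0.5):
    """Posterior probability that each point is baseline rather than peak.

    Hypothetical model: baseline residuals ~ N(0, sigma^2), with sigma estimated
    from the negative residuals; peak residuals ~ Uniform(0, span] (positive only).
    """
    neg = residual[residual < 0]
    sigma = np.sqrt(np.mean(neg**2)) if neg.size else np.std(residual)
    sigma = max(sigma, 1e-12)
    like_base = np.exp(-0.5 * (residual / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    span = max(residual.max() - residual.min(), 1e-12)
    like_peak = np.where(residual > 0, 1.0 / span, 0.0)
    num = p_baseline * like_base
    den = num + (1 - p_baseline) * like_peak
    # Where both likelihoods underflow (very negative residuals), treat as baseline.
    return np.divide(num, den, out=np.ones_like(residual), where=den > 0)


def estimate_baseline(y, lam=1e5, n_iter=20, tol=1e-6):
    """Alternate smoothing and reweighting until the weights stop changing."""
    w = np.ones_like(y)
    for _ in range(n_iter):
        z = whittaker_smooth(y, w, lam)
        w_new = bayes_weights(y - z)
        converged = np.linalg.norm(w_new - w) / np.linalg.norm(w) < tol
        w = w_new
        if converged:
            break
    return z
```

On a synthetic linear baseline with a narrow Gaussian peak, the peak points' weights fall toward zero within a few iterations, so the smoother interpolates the baseline beneath the peak.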
In the privacy preservation of association rules, sensitivity analysis should follow the quantification of items by their occurrence. Traditional methods for preserving the confidentiality of association rules rely on assumptions about which information must be safeguarded rather than on identifying the sensitive items themselves. A further step is therefore needed to remove such assumptions when protecting sensitive information, especially in XML association rule mining. We focus on this central and widely researched area: generating XML association rules without incurring disclosure risks in the mining process. We describe how to identify sensitive items, using a supervised learning technique, so that confidential information can be hidden. Sensitive items are those with a high dependency on other items, measured by statistical significance in a Bayesian network. We propose two methods, based on the probabilistic occurrence of items and on the mode of items. This information is combined into a model, named PPDM (Privacy Preservation in Data Mining), for XML association rules (XARs). The PPDM model can help competitors share market information with a lower risk of creating a monopoly. Finally, the PPDM model computes item sensitivity with high accuracy and opens new directions for research on standardizing such NP-hard problems.
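The idea of flagging sensitive items by the statistical significance of their dependencies can be sketched as follows. This is a simplified stand-in, not the PPDM model: instead of a Bayesian network it applies a pairwise 2×2 chi-square test of independence to item co-occurrence, flags both items of any significant pair (critical value 3.841 for one degree of freedom at α = 0.05), and optionally keeps only items whose occurrence probability reaches a threshold. The XML layout (`<db>`, `<t>`, `<item>`) and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

CHI2_CRIT_DF1_ALPHA05 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def transactions_from_xml(xml_text):
    """Parse a hypothetical schema: <db><t><item>A</item>...</t>...</db>."""
    root = ET.fromstring(xml_text)
    return [{item.text for item in t.findall("item")} for t in root.findall("t")]


def chi_square_2x2(transactions, a, b):
    """Pearson chi-square statistic for co-occurrence of items a and b."""
    n = len(transactions)
    both = sum(1 for t in transactions if a in t and b in t)
    only_a = sum(1 for t in transactions if a in t and b not in t)
    only_b = sum(1 for t in transactions if b in t and a not in t)
    neither = n - both - only_a - only_b
    denom = (both + only_a) * (only_b + neither) * (both + only_b) * (only_a + neither)
    if denom == 0:
        return 0.0  # an item occurs in all or no transactions: no test possible
    return n * (both * neither - only_a * only_b) ** 2 / denom


def sensitive_items(transactions, min_occurrence=0.0, crit=CHI2_CRIT_DF1_ALPHA05):
    """Flag items involved in a significant pairwise dependency."""
    items = sorted(set().union(*transactions))
    occ = {i: sum(i in t for t in transactions) / len(transactions) for i in items}
    flagged = set()
    for a, b in combinations(items, 2):
        if chi_square_2x2(transactions, a, b) > crit:
            flagged.update(x for x in (a, b) if occ[x] >= min_occurrence)
    return flagged
```

In a small database where items A and B always occur together, only A and B are flagged.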
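OBJECTIVE: To investigate the incidence of falls and the factors influencing them among community-dwelling older adults in Guangzhou, and to analyze the synergy patterns and pathways of influence among these factors. METHODS: A prospective cohort design was used. A baseline survey of 4,950 community-dwelling adults aged ≥65 years in Guangzhou was conducted in 2021, with follow-up in 2022 to collect information on fall events and health status. Cox proportional hazards regression was used to analyze the factors influencing falls, and association rules and a Bayesian network model were used to analyze the synergy patterns and pathways of influence among these factors. RESULTS: A total of 3,393 valid participants were included, with a median follow-up of 1.53 years; the incidence of falls was 25.49% (95% CI: 24.02%–26.96%). Cox regression showed that female sex (HR=1.246, 95% CI: 1.088–1.426), age ≥75 years (HR=1.343, 95% CI: 1.133–1.592), having medical insurance (HR=1.440, 95% CI: 1.038–1.997), diabetes (HR=1.309, 95% CI: 1.143–1.498), and stroke (HR=1.914, 95% CI: 1.309–2.799) were independent risk factors for falls, whereas high-school education or above (HR=0.861, 95% CI: 0.750–0.987), daily exercise (HR=0.684, 95% CI: 0.580–0.807), and satisfactory self-rated health (HR=0.484, 95% CI: 0.278–0.841) were associated with a lower risk of falls. Association rule analysis revealed a synergy pattern of diabetes, female sex, and falls (support: 19.07%, confidence: 32.77%, lift: 1.29). The Bayesian network model showed that exercise frequency, age, education level, self-rated health, sex, diabetes, and stroke were key factors influencing falls; the model's area under the receiver operating characteristic curve (AUC) was 0.612. CONCLUSION: The incidence of falls among community-dwelling older adults in Guangzhou is high, and the influencing factors are diverse. Comprehensive interventions should be developed and implemented for high-risk groups such as those who lack exercise, the oldest old, women, and patients with chronic diseases.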
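Two of the reported figures can be cross-checked with simple arithmetic. A normal-approximation (Wald) interval with p = 0.2549 and n = 3,393 reproduces the reported 95% CI, and dividing the rule's confidence by the overall fall proportion reproduces the lift, assuming the lift was computed against the same 25.49% fall proportion. The abstract does not state which interval method was used, so this is only a consistency check; the function names are illustrative.

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


def lift(confidence, consequent_support):
    """Lift of a rule A -> B = confidence(A -> B) / support(B)."""
    return confidence / consequent_support


lo, hi = wald_ci(0.2549, 3393)
print(f"{100 * lo:.2f}%-{100 * hi:.2f}%")  # 24.02%-26.96%
print(round(lift(0.3277, 0.2549), 2))      # 1.29
```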
This paper proposes machine learning techniques that discover knowledge in a dataset in the form of if-then rules, for the purpose of formulating queries to validate a Bayesian belief network model of the same data. Although domain expertise is often available, query formulation is tedious and laborious, so automating it is desirable. To this end, a machine learning algorithm is used to discover if-then rules in the data from which the Bayesian belief network under validation was also induced. The resulting if-then rules are processed and filtered through domain expertise to identify a subset of “interesting” and “significant” rules. This subset is then formulated into corresponding queries that are posed, for validation purposes, to the Bayesian belief network induced from the same dataset. The proposed methodology was assessed through an empirical study on a real-life dataset, the National Crime Victimization Survey, which has over 250 attributes and well over 200,000 data points. The study demonstrated that the approach is feasible and partly automates query formulation for validating a complex probabilistic model, substantially reducing the need for human expert involvement and investment.
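The step of turning a mined if-then rule into a validation query can be sketched as a conditional-probability query against the network. The three-node network A → B → C, its conditional probability tables, the tolerance of 0.05, and the function names are all hypothetical; inference uses brute-force enumeration, which is only practical for small networks (a dataset with 250+ attributes would need a proper inference engine).

```python
from itertools import product

# Hypothetical network: node -> (parents, table mapping parent values -> P(node = 1)).
BN = {
    "A": ((), {(): 0.3}),
    "B": (("A",), {(0,): 0.2, (1,): 0.8}),
    "C": (("B",), {(0,): 0.1, (1,): 0.9}),
}


def joint(bn, assignment):
    """Probability of a full binary assignment via the chain rule."""
    p = 1.0
    for node, (parents, table) in bn.items():
        p1 = table[tuple(assignment[q] for q in parents)]
        p *= p1 if assignment[node] == 1 else 1.0 - p1
    return p


def query(bn, target, evidence):
    """P(target = 1 | evidence) by enumerating all binary assignments."""
    nodes = list(bn)
    num = den = 0.0
    for values in product((0, 1), repeat=len(nodes)):
        a = dict(zip(nodes, values))
        if any(a[k] != v for k, v in evidence.items()):
            continue
        p = joint(bn, a)
        den += p
        if a[target] == 1:
            num += p
    return num / den


def rule_agrees(bn, antecedent, consequent, confidence, tol=0.05):
    """Compare a mined rule's confidence with the network's posterior."""
    target, value = consequent
    p = query(bn, target, antecedent)
    p = p if value == 1 else 1.0 - p
    return abs(p - confidence) <= tol
```

Here a mined rule IF A=1 THEN C=1 with confidence 0.70 becomes the query P(C=1 | A=1), which the network answers as 0.8·0.9 + 0.2·0.1 = 0.74, within the assumed tolerance.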
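In traditional Markov random fields (MRFs), the prior parameter in the potential function of the prior energy is usually chosen manually from experience as a value greater than zero, without considering the distance between pixels or fully accounting for the prior features of the local image neighborhood. To address these problems, a method is proposed that dynamically estimates the prior parameter by combining the prior features of the label field with pixel distance. An influence coefficient between pixels of the observation field is defined in the prior energy, and the Sobel operator is introduced into the likelihood energy function to describe the relationships between pixels of the observation field. Finally, the watershed algorithm is used to eliminate small fragmented regions and further refine the segmentation. Experiments on the Merced Land Use Dataset, a scene-classification dataset, show that the method can be effectively applied to remote sensing image segmentation.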
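The role of pixel distance in the prior energy can be sketched with a Potts-type MRF solved by iterated conditional modes (ICM). This is an assumed simplification, not the paper's method: each 8-neighbour contributes a penalty β/d when its label differs (d = 1 for edge neighbours, √2 for diagonals), the likelihood energy is Gaussian with known class means, and the adaptive estimation of the prior parameter, the Sobel-based likelihood term, and watershed post-processing are omitted. Function names are hypothetical.

```python
import numpy as np

# 8-neighbourhood offsets (row, col).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def prior_energy(labels, i, j, k, beta=1.0):
    """Potts-style prior energy with a hypothetical 1/distance weight per neighbour."""
    h, w = labels.shape
    energy = 0.0
    for di, dj in OFFSETS:
        ni, nj = i + di, j + dj
        if 0 <= ni < h and 0 <= nj < w and labels[ni, nj] != k:
            energy += beta / np.hypot(di, dj)  # 1 for edge neighbours, 1/sqrt(2) for diagonals
    return energy


def icm_segment(image, means, sigma, beta=1.0, n_iter=5):
    """Iterated conditional modes: Gaussian likelihood energy + weighted prior energy."""
    means = np.asarray(means, dtype=float)
    # Initialise each pixel with its nearest class mean (likelihood only).
    labels = np.argmin((image[..., None] - means) ** 2, axis=-1)
    h, w = image.shape
    for _ in range(n_iter):
        for i in range(h):
            for j in range(w):
                energies = [
                    (image[i, j] - m) ** 2 / (2 * sigma**2) + prior_energy(labels, i, j, k, beta)
                    for k, m in enumerate(means)
                ]
                labels[i, j] = int(np.argmin(energies))
    return labels
```

On a 6×6 two-region image with one ambiguous pixel (value 0.6 inside the 0-valued region), the nearest-mean initialization mislabels that pixel, and ICM relabels it to match its neighbours.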
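In modern military operations, coordinated air-ground multi-formation tactics are increasingly important. Existing target-intent recognition methods work well for a single formation but lack effective solutions for scenarios in which air and ground formations operate in coordination. Therefore, a Dynamic Series Bayesian Network (DSBN) is used to recognize the intent of coordinated air-ground formations. The method first uses a DSBN to build an overall model for recognizing air-ground coordinated combat intent, describing the coordinated actions of air and ground formations. It then fuses events from different battlefield domains and their associated probabilistic relationships, combines them with auxiliary battlefield information, and uses an inference network to recognize the enemy's coordinated combat intent. The method fully accounts for the behavioral rules of air targets and describes their behavior patterns and trends in detail, making it better suited to scenarios with multiple coordinated target formations. Finally, simulation examples verify the feasibility and effectiveness of the method.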
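Fusing air and ground evidence over time can be sketched as forward filtering in a simple dynamic Bayesian network with one hidden intent variable per time slice. This is a generic illustration, not the paper's DSBN: the intent set, transition matrix, and observation likelihoods are hypothetical, and the air and ground observations are assumed conditionally independent given the intent so that their likelihoods multiply.

```python
import numpy as np

INTENTS = ["attack", "reconnaissance", "withdrawal"]  # hypothetical intent set

# TRANSITION[i, j] = P(intent_t = j | intent_{t-1} = i); hypothetical values.
TRANSITION = np.array([[0.8, 0.1, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.1, 0.1, 0.8]])

# P(observation | intent); rows follow INTENTS. Hypothetical values.
AIR_OBS = {"approach": 0, "loiter": 1, "depart": 2}
AIR_LIK = np.array([[0.7, 0.2, 0.1],
                    [0.2, 0.6, 0.2],
                    [0.1, 0.2, 0.7]])
GROUND_OBS = {"advance": 0, "hold": 1, "retreat": 2}
GROUND_LIK = np.array([[0.7, 0.2, 0.1],
                       [0.2, 0.6, 0.2],
                       [0.1, 0.2, 0.7]])


def filter_intent(observations, prior=None):
    """Forward filtering over time slices; air and ground evidence are assumed
    conditionally independent given the intent and fused by multiplication."""
    n = len(INTENTS)
    belief = np.full(n, 1.0 / n) if prior is None else np.asarray(prior, dtype=float)
    for air, ground in observations:
        predicted = belief @ TRANSITION  # time update
        likelihood = AIR_LIK[:, AIR_OBS[air]] * GROUND_LIK[:, GROUND_OBS[ground]]
        belief = predicted * likelihood  # measurement update
        belief /= belief.sum()
    return dict(zip(INTENTS, belief))
```

Three consecutive slices of ("approach", "advance") drive the posterior toward "attack".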
Funding (BRL_p study): Supported by the National Institute of General Medical Sciences of the National Institutes of Health, No. R01GM100387.
Funding (PLS baseline-estimation study): Supported by the National Key R&D Program of China (No. 2018YFA0404401), the CAS Project for Young Scientists in Basic Research (No. YSBR-002), and the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDB34000000).