AIM: To develop a framework for incorporating background domain knowledge into classification rule learning for knowledge discovery in biomedicine.
METHODS: Bayesian rule learning (BRL) is a rule-based classifier that uses a greedy best-first search over a space of Bayesian belief networks (BNs) to find the optimal BN to explain the input dataset, and then infers classification rules from this BN. BRL uses a Bayesian score to evaluate the quality of BNs. In this paper, we extended the Bayesian score to include informative structure priors, which encode our prior domain knowledge about the dataset. We call this extension BRL_p. The structure prior has a hyperparameter λ that lets the user tune how strongly the prior knowledge is incorporated during model learning. We studied the effect of λ on model learning using a simulated dataset and a real-world lung cancer prognostic biomarker dataset, by measuring the degree of incorporation of our specified prior knowledge. We also monitored its effect on the model's predictive performance. Finally, we compared BRL_p to other state-of-the-art classifiers commonly used in biomedicine.
RESULTS: With simulated data, we evaluated the degree of incorporation of prior knowledge into BRL_p by measuring the graph edit distance between the true data-generating model and the model learned by BRL_p. We specified the true model using informative structure priors. We observed that increasing the value of λ increased the influence of the specified structure priors on model learning. A large value of λ caused BRL_p to return the true model. This also led to a gain in predictive performance, measured by the area under the receiver operating characteristic curve (AUC). We then obtained a publicly available real-world lung cancer prognostic biomarker dataset and specified a known biomarker from the literature, the epidermal growth factor receptor (EGFR) gene. We again observed that larger values of λ led to increased incorporation of EGFR into the final BRL_p model. This relevant background knowledge also led to a gain in AUC.
CONCLUSION: BRL_p enables tunable structure priors to be incorporated during Bayesian classification rule learning, integrating data and knowledge, as demonstrated using lung cancer biomarker data.
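The role of λ can be illustrated with a small sketch. The abstract does not state the form of the structure prior, so the code below assumes one simple possibility: an unnormalized log prior that subtracts λ for every user-specified edge missing from a candidate network. The function names, the candidate networks, and their data scores are all hypothetical and are not taken from BRL_p.

```python
def log_structure_prior(edges, prior_edges, lam):
    """Assumed prior form: penalize each prior-specified edge absent from the structure."""
    missing = len(set(prior_edges) - set(edges))
    return -lam * missing


def select_structure(candidates, prior_edges, lam):
    """Return the candidate whose data score plus log structure prior is largest."""
    def total_score(name):
        edges, data_log_score = candidates[name]
        return data_log_score + log_structure_prior(edges, prior_edges, lam)
    return max(candidates, key=total_score)


# Hypothetical candidates: name -> (edges, log marginal likelihood of the data).
candidates = {
    "data_only": ([("GENE_A", "Outcome")], -100.0),
    "with_prior": ([("EGFR", "Outcome")], -103.0),
}
prior_edges = [("EGFR", "Outcome")]

for lam in (0.0, 1.0, 5.0):
    print(lam, select_structure(candidates, prior_edges, lam))
```

With λ = 0 the data score alone decides; once λ exceeds the 3-unit gap between the two data scores, the network containing the prior edge is selected.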
The penalized least squares (PLS) method with appropriate weights has proved to be a successful baseline estimation method for various spectral analyses. It can extract the baseline from the spectrum while retaining the signal peaks in the presence of random noise. The algorithm is implemented by iterating over the weights of the data points. In this study, we propose a new approach for assigning weights based on the Bayesian rule. The proposed method provides a self-consistent weighting formula and performs well, particularly for baselines with different curvature components. This method was applied to analyze Schottky spectra obtained in 86Kr projectile fragmentation measurements in the experimental Cooler Storage Ring (CSRe) at Lanzhou. It provides an accurate and reliable storage lifetime with a smaller error bar than existing PLS methods. It is also a universal baseline-subtraction algorithm that can be used for spectrum-related experiments, such as precision nuclear mass and lifetime measurements in storage rings.
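The iterate-over-weights structure of PLS baseline estimation can be sketched as follows. The abstract does not give the Bayesian weighting formula, so this sketch substitutes the well-known asymmetric rule (small weight `p` for points above the current baseline, `1 - p` otherwise) and, to stay dependency-free, a first-order difference penalty solved with the Thomas algorithm. All names and parameter values are illustrative assumptions.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def pls_baseline(y, smoothness=100.0, p=0.01, iterations=20):
    """Iteratively reweighted PLS baseline (asymmetric weights, NOT the Bayesian rule)."""
    n = len(y)
    w = [1.0] * n
    z = list(y)
    for _ in range(iterations):
        # Solve (W + smoothness * D^T D) z = W y, with D the first-difference matrix.
        diag = [w[i] + smoothness * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
        off = [-smoothness] * n
        rhs = [w[i] * y[i] for i in range(n)]
        z = solve_tridiagonal(off, diag, off, rhs)
        # Points above the baseline are treated as peaks and down-weighted.
        w = [p if y[i] > z[i] else 1.0 - p for i in range(n)]
    return z


# Flat baseline at 1.0 with a single peak of height 10 at index 25.
spectrum = [1.0] * 50
spectrum[25] = 11.0
baseline = pls_baseline(spectrum)
print(round(baseline[25], 2), round(spectrum[25] - baseline[25], 2))
```

A method such as the one in the abstract would replace only the weight-update line; the penalized solve stays the same.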
In the privacy preservation of association rules, sensitivity analysis should be reported after items have been quantified in terms of their occurrence. Traditional methodologies for preserving the confidentiality of association rules rely on assumptions when safeguarding sensitive information rather than on identifying the sensitive items themselves. It is therefore time to go one step further and remove such assumptions from the protection of sensitive information, especially in XML association rule mining. We focus on generating XML association rules, a central and highly researched area, without debating the disclosure risks involved in the mining process. We describe how to identify sensitive items through a supervised learning technique so that confidential information can be hidden. These sensitive items show a high dependency on other items, which is measured in terms of statistical significance with a Bayesian network. We propose two methodologies, one based on the probabilistic occurrence of items and one based on the mode of items. All of this information is modeled in what we call the PPDM (Privacy Preservation in Data Mining) model for XML association rules (XARs). The PPDM model is helpful for sharing market information among competitors with a lower chance of creating a monopoly. Finally, the PPDM model computes the sensitivity of items with high accuracy and opens new directions for academia in the standardization of such NP-hard problems.
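Radio frequency identification (RFID) systems have limited channel resources, and collisions occur when multiple tags compete for the same frequency or time slot. To improve the communication efficiency of the broadcast channel, this work studies an anti-collision algorithm for the Internet of Things RFID broadcast channel based on framed slotted ALOHA. The method introduces the concept of frame slots to divide communication time into periods, and analyzes the probabilities of the three possible slot states (idle, successful identification, and collision) to determine the causes of collisions in the broadcast channel. Combining a Bayesian algorithm with the Poisson distribution rule, it computes the probability distribution of the tag count to estimate the number of tags within the reader's range, and adjusts the length of the next frame according to that estimate. If tag collisions persist within the adjusted frame slots, FastICA (fast independent component analysis) is used to convert the tag identification problem within a slot into an EPC (Electronic Product Code) code generation problem, which enables parallel identification of multiple tags in the same slot and avoids collisions. Experiments show that the proposed method estimates the number of tags accurately and, while keeping the communication channel stable, raises the tag identification rate within slots and effectively improves the transmission efficiency of the broadcast channel.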
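The tag-count estimation step can be sketched as below. The abstract gives no formulas, so this sketch assumes the standard Poisson approximation for slot occupancy (mean load ρ = n / L for n tags and frame length L), a uniform prior over the tag count, and independent slots; under a uniform prior the posterior mode coincides with the maximum-likelihood estimate. Setting the next frame length to the estimated number of unread tags is a common rule of thumb, not something the abstract specifies. All function names are hypothetical.

```python
import math


def slot_probabilities(n_tags, frame_len):
    """Poisson approximation of the idle, success, and collision probabilities per slot."""
    rho = n_tags / frame_len  # mean number of tags replying in one slot
    p_idle = math.exp(-rho)
    p_success = rho * math.exp(-rho)
    return p_idle, p_success, 1.0 - p_idle - p_success


def log_likelihood(n_tags, frame_len, idle, success, collision):
    """Log-likelihood of the observed slot counts, treating slots as independent."""
    total = 0.0
    probs = slot_probabilities(n_tags, frame_len)
    for count, p in zip((idle, success, collision), probs):
        if count == 0:
            continue
        if p <= 0.0:
            return -math.inf
        total += count * math.log(p)
    return total


def estimate_tags(frame_len, idle, success, collision, max_tags=500):
    """Posterior mode of the tag count under a uniform prior on [n_min, max_tags]."""
    n_min = success + 2 * collision  # each collision slot hides at least two tags
    return max(
        range(n_min, max_tags + 1),
        key=lambda n: log_likelihood(n, frame_len, idle, success, collision),
    )


def next_frame_length(estimate, success):
    """Assumed rule: one slot per tag that has not been read yet."""
    return max(1, estimate - success)


estimate = estimate_tags(frame_len=16, idle=6, success=6, collision=4)
print(estimate, next_frame_length(estimate, success=6))
```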
This paper proposes machine learning techniques to discover knowledge in a dataset in the form of if-then rules for the purpose of formulating queries for validation of a Bayesian belief network model of the same data. Although domain expertise is often available, the query formulation task is tedious and laborious, and hence automation of query formulation is desirable. To automate the query formulation process, a machine learning algorithm is leveraged to discover knowledge in the form of if-then rules in the data from which the Bayesian belief network model under validation was also induced. The set of if-then rules is processed and filtered through domain expertise to identify a subset that consists of “interesting” and “significant” rules. The subset of interesting and significant rules is formulated into corresponding queries to be posed, for validation purposes, to the Bayesian belief network induced from the same dataset. The promise of the proposed methodology was assessed through an empirical study performed on a real-life dataset, the National Crime Victimization Survey, which has over 250 attributes and well over 200,000 data points. The study demonstrated that the proposed approach is feasible and partially automates the query formulation process for validation of a complex probabilistic model, which yields substantial savings in human expert involvement and investment.
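The rule-to-query step can be pictured with a minimal sketch: an if-then rule becomes a conditional-probability query, P(consequent | antecedent), which is then posed to the model. The abstract does not describe the rule learner or the inference engine, so the code below uses a small explicit joint probability table as a stand-in for Bayesian network inference; the variable names `A` and `B`, the table, and both functions are hypothetical.

```python
def rule_to_query(rule):
    """Turn (antecedent conditions, consequent) into a conditional-probability query."""
    antecedent, consequent = rule
    return {"evidence": dict(antecedent), "target": consequent}


def answer_query(joint, variables, query):
    """P(target | evidence) by summing a full joint table (stand-in for BN inference)."""
    target_var, target_val = query["target"]
    numerator = 0.0
    denominator = 0.0
    for assignment, prob in joint.items():
        row = dict(zip(variables, assignment))
        if all(row[var] == val for var, val in query["evidence"].items()):
            denominator += prob
            if row[target_var] == target_val:
                numerator += prob
    return numerator / denominator if denominator > 0 else None


# Hypothetical two-variable model given as a joint distribution over (A, B).
variables = ("A", "B")
joint = {
    (True, True): 0.2,
    (True, False): 0.2,
    (False, True): 0.1,
    (False, False): 0.5,
}

# Rule: IF A = True THEN B = True.
rule = ([("A", True)], ("B", True))
query = rule_to_query(rule)
print(answer_query(joint, variables, query))
```

Comparing the returned probability with the confidence of the rule in the data is one way such a query could serve as a validation check.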
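An improved particle-filter method for simultaneous localization and map building (SLAM) is proposed. In the improved method, the robot travels roughly 10 steps, completing particle-filter localization against the locally built map, and only then uses its laser sensor to probe the environment and update the map. In addition, during particle-filter localization the particles are distributed only near the robot pose obtained by dead reckoning, which effectively reduces the number of particles. Experimental results show that, compared with the standard particle-filter SLAM algorithm, the improved algorithm raises the accuracy and real-time performance of localization and map building, and provides a new method for SLAM of mobile robots in unknown outdoor environments.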
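The idea of concentrating particles around the dead-reckoning pose can be sketched as follows. The abstract does not specify the motion model or the sampling distribution, so this sketch assumes a simple turn-then-move odometry update and independent Gaussian noise on each pose component; the function names and noise levels are hypothetical.

```python
import math
import random


def dead_reckon(pose, distance, turn):
    """Advance pose (x, y, heading) by a turn followed by a straight move."""
    x, y, theta = pose
    theta = theta + turn
    return (x + distance * math.cos(theta), y + distance * math.sin(theta), theta)


def sample_particles_near(pose, count, pos_sigma=0.1, angle_sigma=0.05, rng=None):
    """Draw particles from Gaussians centered on the dead-reckoning pose (assumed noise model)."""
    rng = rng or random.Random(0)
    x, y, theta = pose
    return [
        (rng.gauss(x, pos_sigma), rng.gauss(y, pos_sigma), rng.gauss(theta, angle_sigma))
        for _ in range(count)
    ]


predicted = dead_reckon((0.0, 0.0, 0.0), distance=1.0, turn=0.0)
particles = sample_particles_near(predicted, count=200, rng=random.Random(1))
mean_x = sum(p[0] for p in particles) / len(particles)
print(predicted, round(mean_x, 2))
```

Because every particle starts close to the predicted pose, far fewer particles are needed than when they are spread over the whole map, which is the saving the abstract refers to.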
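In simultaneous localization and map building (SLAM) for mobile robots in unknown environments, both the robot pose and the environment map are uncertain, which makes localization and mapping more complex. To address this, a locally optimal (globally suboptimal) parameter method is proposed: a locally optimal environment map is built from the locally optimal pose, a locally optimal pose is then sought from that map, and the two steps alternate until a globally determinate pose and a determinate environment map are obtained. Experimental results show that, compared with the standard particle-filter SLAM algorithm (PF-SLAM), the improved algorithm increases the accuracy of localization and the precision of map building, and provides a new method for SLAM of robots in large unknown outdoor environments.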
Funding (BRL_p study): Supported by the National Institute of General Medical Sciences of the National Institutes of Health, No. R01GM100387.
Funding (PLS baseline study): Supported by the National Key R&D Program of China (No. 2018YFA0404401), the CAS Project for Young Scientists in Basic Research (No. YSBR-002), and the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDB34000000).