Comprehensive evaluation and early warning are important and difficult tasks in food safety. This paper focuses on the application of big data mining in the field of food safety warning. First, we introduce the concept of big data mining and three big data methods. We then discuss the application of the three big data mining methods in food safety areas, compare these methods, and propose how to apply the Back Propagation Neural Network to food safety risk warning.
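As a rough illustration of how a Back Propagation Neural Network could be wired up for such a warning task, the sketch below trains a tiny two-layer network with hand-coded back propagation on synthetic indicator data; the indicators, labelling rule, network size and learning rate are all hypothetical, not taken from the paper.

```python
import numpy as np

# Minimal sketch: a two-layer network trained with back propagation on
# synthetic "food safety indicator" data. The indicator names, network size
# and labelling rule are hypothetical illustrations.
rng = np.random.default_rng(0)

# Three hypothetical indicators: pesticide residue, microbial count, additive level
X = rng.uniform(0.0, 1.0, size=(500, 3))
# Hypothetical rule: high combined indicator load -> "risk" label 1
y = (X.sum(axis=1) > 1.8).astype(float).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Network: 3 inputs -> 8 hidden units -> 1 output
W1 = rng.normal(0, 0.5, (3, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)
lr = 0.5

for epoch in range(2000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # backward pass (gradients of the mean squared error)
    d_out = (p - y) * p * (1 - p)              # delta at the output layer
    d_hid = (d_out @ W2.T) * h * (1 - h)       # delta propagated to the hidden layer
    W2 -= lr * h.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_hid / len(X); b1 -= lr * d_hid.mean(axis=0)

pred = (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).astype(float)
print("training accuracy:", (pred == y).mean())  # the predicted class acts as the warning flag
```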
We consider the model selection problem of the dependency between the terminal event and the non-terminal event under semi-competing risks data. When the relationship between the two events is unspecified, the inference on the non-terminal event is not identifiable; we cannot make inference on the non-terminal event without extra assumptions. Thus, an association model for semi-competing risks data is necessary, and it is important to select an appropriate dependence model for a data set. We construct the likelihood function for semi-competing risks data to select an appropriate dependence model. Simulation studies show that the proposed approach performs well. Finally, we apply our method to a bone marrow transplant data set.
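The sketch below conveys the basic idea of likelihood-based dependence-model selection: fit two candidate copulas (Clayton and Frank) by maximum likelihood and compare them by AIC. It assumes complete, uncensored pseudo-observations, whereas the paper's likelihood is built for the censoring structure of semi-competing risks data.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Simplified sketch of likelihood-based dependence-model selection.
# Assumption: complete (u, v) pseudo-observations; the paper's likelihood
# additionally handles the semi-competing risks censoring structure.
rng = np.random.default_rng(1)
n, true_theta = 400, 2.0

# Simulate (u, v) from a Clayton copula by conditional inversion
u = rng.uniform(size=n)
w = rng.uniform(size=n)
v = ((w ** (-true_theta / (1 + true_theta)) - 1) * u ** (-true_theta) + 1) ** (-1 / true_theta)

def clayton_loglik(theta):
    c = (1 + theta) * (u * v) ** (-theta - 1) * (u ** (-theta) + v ** (-theta) - 1) ** (-2 - 1 / theta)
    return np.log(c).sum()

def frank_loglik(theta):
    num = theta * (1 - np.exp(-theta)) * np.exp(-theta * (u + v))
    den = ((1 - np.exp(-theta)) - (1 - np.exp(-theta * u)) * (1 - np.exp(-theta * v))) ** 2
    return np.log(num / den).sum()

results = {}
for name, loglik in [("Clayton", clayton_loglik), ("Frank", frank_loglik)]:
    fit = minimize_scalar(lambda t: -loglik(t), bounds=(0.01, 30), method="bounded")
    results[name] = {"theta_hat": round(fit.x, 3), "AIC": round(2 * 1 + 2 * fit.fun, 2)}  # one parameter each

print(results)  # the dependence model with the smaller AIC is selected
```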
This paper presents a methodology to determine three data quality (DQ) risk characteristics: accuracy, comprehensiveness and nonmembership. The methodology provides a set of quantitative models to confirm the information quality risks for the database of the geographical information system (GIS). Four quantitative measures are introduced to examine how the quality risks of source information affect the quality of information outputs produced using the relational algebra operations Selection, Projection, and Cubic Product. It can be used to determine how quality risks associated with diverse data sources affect the derived data. The GIS is the prime source of information on the location of cables, and detection time strongly depends on whether maps indicate the presence of cables in the construction business. Poor data quality in the GIS can contribute to increased risk or higher risk avoidance costs. A case study provides a numerical example of the calculation of the trade-offs between risk and detection costs and provides an example of the calculation of the costs of data quality. We conclude that the model contributes valuable new insight.
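As a simplified illustration of how quality risk in source data can propagate to derived data, the sketch below assumes independent tuple-level errors and a single accuracy probability per relation; these formulas are a stand-in, not the paper's quantitative models.

```python
# Illustrative sketch (not the paper's exact models): propagate a per-tuple
# accuracy probability through relational-algebra operations, assuming
# tuple-level errors are independent across source relations.

def selection_accuracy(p_tuple: float) -> float:
    # Selection keeps whole tuples, so each output tuple inherits the source accuracy.
    return p_tuple

def projection_accuracy(attr_accuracies: list) -> float:
    # A projected tuple is accurate only if every retained attribute is accurate.
    acc = 1.0
    for p in attr_accuracies:
        acc *= p
    return acc

def product_accuracy(p_left: float, p_right: float) -> float:
    # A product tuple combines one tuple from each operand, so it is accurate
    # only if both source tuples are accurate.
    return p_left * p_right

# Example: a GIS cable layer with 0.97 tuple accuracy combined with a duct layer at 0.92
cable_acc, duct_acc = 0.97, 0.92
print("selection :", selection_accuracy(cable_acc))
print("projection:", projection_accuracy([0.99, 0.98, 0.97]))   # three retained attributes
print("product   :", product_accuracy(cable_acc, duct_acc))
```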
Rough set theory is a relatively new area of soft computing for handling uncertain big data efficiently. It also provides a powerful way to calculate the importance degree of vague and uncertain big data to help in decision making. Risk assessment is very important for safe and reliable investment. Risk management involves assessing the risk sources and designing strategies and procedures to mitigate those risks to an acceptable level. In this paper, we emphasize the classification of different types of risk factors and find a simple and effective way to calculate the risk exposure. The study uses the rough set method to classify and judge the safety attributes related to investment policy. The method, which is based on intelligent knowledge acquisition, provides an innovative way for risk analysis. With this approach, we are able to calculate the significance of each factor and the relative risk exposure based on the original data without assigning weights subjectively.
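A minimal sketch of the rough-set calculation behind such significance scores is given below: the significance of an attribute is measured by how much the positive region of a small, made-up investment decision table shrinks when that attribute is removed.

```python
# Minimal rough-set sketch: attribute significance via the positive region of a
# decision table. The table below is a made-up example, not the paper's data.
# Rows: (market_risk, credit_risk, liquidity_risk) -> decision (invest / avoid)
table = [
    (("high", "low",  "low"),  "avoid"),
    (("high", "high", "low"),  "avoid"),
    (("low",  "low",  "low"),  "invest"),
    (("low",  "high", "low"),  "avoid"),
    (("low",  "low",  "high"), "invest"),
    (("high", "low",  "high"), "avoid"),
]
attrs = ["market_risk", "credit_risk", "liquidity_risk"]

def positive_region_size(attr_idx):
    """Number of objects whose equivalence class (w.r.t. the chosen attributes)
    is consistent, i.e. maps to a single decision value."""
    classes = {}
    for cond, dec in table:
        key = tuple(cond[i] for i in attr_idx)
        classes.setdefault(key, set()).add(dec)
    return sum(1 for cond, _ in table if len(classes[tuple(cond[i] for i in attr_idx)]) == 1)

full = list(range(len(attrs)))
pos_full = positive_region_size(full)
for i, name in enumerate(attrs):
    reduced = [j for j in full if j != i]
    sig = (pos_full - positive_region_size(reduced)) / len(table)
    print(f"significance of {name}: {sig:.2f}")
```

The significance values obtained this way come straight from the data, which is the sense in which no subjective weights are needed.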
This paper considers quantile regression analysis based on semi-competing risks data in which a non-terminal event may be dependently censored by a terminal event. The major interest is the covariate effects on the quantile of the non-terminal event time. Dependent censoring is handled by assuming that the joint distribution of the two event times follows a parametric copula model with unspecified marginal distributions. The technique of inverse probability weighting (IPW) is adopted to adjust for the selection bias. Large-sample properties of the proposed estimator are derived and a model diagnostic procedure is developed to check the adequacy of the model assumption. Simulation results show that the proposed estimator performs well. For illustrative purposes, our method is applied to analyze the bone marrow transplant data in [1].
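The sketch below shows only the final estimation step, a weighted check-loss minimization, with the inverse probability weights taken as given random stand-ins; in the paper the weights are derived from the copula model for the dependent censoring.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of inverse-probability-weighted (IPW) quantile regression.
# Assumption: the weights are given; the paper obtains them from a copula
# model for dependent censoring.
rng = np.random.default_rng(2)
n, tau = 300, 0.5
x = rng.normal(size=n)
log_t = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n)   # log non-terminal event time
weights = rng.uniform(0.5, 1.5, size=n)                 # stand-in IPW weights

def weighted_check_loss(beta):
    resid = log_t - (beta[0] + beta[1] * x)
    rho = resid * (tau - (resid < 0))        # check (pinball) loss
    return np.sum(weights * rho)

fit = minimize(weighted_check_loss, x0=np.zeros(2), method="Nelder-Mead")
print("estimated intercept and slope:", fit.x)   # should land near (1.0, 0.8)
```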
Identification of security risk factors for small reservoirs is the basis for implementation of early warning systems. The manner of identification of the factors for small reservoirs is of practical significance when data are incomplete. The existing grey relational models have some disadvantages in measuring the correlation between categorical data sequences. To this end, this paper introduces a new grey relational model to analyze heterogeneous data. In this study, a set of security risk factors for small reservoirs was first constructed based on theoretical analysis, and heterogeneous data of these factors were recorded as sequences. The sequences were regarded as random variables, and the information entropy and conditional entropy between sequences were measured to analyze the relational degree between risk factors. Then, a new grey relational analysis model for heterogeneous data was constructed, and a comprehensive security risk factor identification method was developed. A case study of small reservoirs in Guangxi Zhuang Autonomous Region in China shows that the model constructed in this study is applicable to security risk factor identification for small reservoirs with heterogeneous and sparse data.
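The sketch below illustrates the entropy-based idea with two made-up categorical factor sequences; the normalization 1 - H(X|Y)/H(X) is an illustrative choice, not the exact grey relational model constructed in the paper.

```python
from collections import Counter
from math import log2

# Sketch of an entropy-based relational degree between two categorical factor
# sequences (hypothetical readings for a small reservoir).
dam_seepage    = ["low", "low", "high", "high", "medium", "low", "high", "medium"]
spillway_state = ["good", "good", "poor", "poor", "fair", "good", "poor", "good"]

def entropy(seq):
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())

def conditional_entropy(x, y):
    """H(X | Y) computed from the empirical joint distribution."""
    n = len(x)
    h = 0.0
    for y_val, cnt in Counter(y).items():
        sub = [xi for xi, yi in zip(x, y) if yi == y_val]
        h += (cnt / n) * entropy(sub)
    return h

h_x = entropy(dam_seepage)
h_x_given_y = conditional_entropy(dam_seepage, spillway_state)
relational_degree = 1 - h_x_given_y / h_x   # illustrative normalization, not the paper's
print(f"H(X) = {h_x:.3f}, H(X|Y) = {h_x_given_y:.3f}, relational degree = {relational_degree:.3f}")
```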
A construction project is not a standalone engineering maneuver; it is closely linked to the well-being of the local communities concerned. The city renovation of Beijing's downtown center for the 2008 Olympics transformed much antique architecture and regional landscape. It delivered a world-recognized achievement in China's modern development and marked a major milestone in China's economic development. In the course of metro construction projects, there are substantial interwoven municipal structures influencing the success of the projects, including, but not limited to, underground cables and ducts, the sewage system, the power consumption of construction works, traffic diversion, air pollution, expatriate business activities and social security. Many US and UK project insurance companies are moving into Asia Pacific; they are doing re-insurance business on major construction guarantees, such as machinery damage, on-time delivery, power consumption, and claims from contractors and communities. Environmental information, such as water quality, indoor and outdoor air quality, people inflow and lift waiting time, plays a deterministic role in a construction's fit-to-use. Big Data has been a contemporary buzzword since 2013, and the key competence is to provide real-time responses to heuristic syndromes in order to make short-term predictions. This paper attempts to develop a conceptual model in big data for construction.
This paper proposes a simple two-step nonparametric procedure to estimate the intraday jump tail and measure the jump tail risk in asset prices with noisy high frequency data. We first propose the pre-averaging threshold approach to estimate the intraday jumps that occurred, and then use the peaks-over-threshold (POT) method and generalized Pareto distribution (GPD) to model the intraday jump tail and further measure the jump tail risk. Finally, an empirical example further demonstrates the power of the proposed method to measure the jump tail risk under the effect of microstructure noise.
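A simplified sketch of the two steps is given below: pre-averaged returns exceeding a volatility-scaled threshold are flagged as jumps, and a generalized Pareto distribution is fitted to the exceedances to read off a tail quantile. The simulated price process, window length and threshold constant are illustrative assumptions, not the paper's choices.

```python
import numpy as np
from scipy.stats import genpareto

# Step-by-step sketch of pre-averaging thresholding + POT/GPD tail estimation
# on simulated noisy returns.
rng = np.random.default_rng(3)
n = 20_000
returns = rng.normal(0, 1e-4, n)                      # diffusive part
jump_idx = rng.choice(n, 40, replace=False)
returns[jump_idx] += rng.normal(0, 2e-3, 40)          # sparse jumps
noisy = returns + rng.normal(0, 5e-5, n)              # microstructure noise

# Step 1: pre-average over a short window to damp the noise, then threshold
window = 10
pre_avg = np.convolve(noisy, np.ones(window) / window, mode="valid")
sigma_hat = np.median(np.abs(pre_avg)) / 0.6745       # robust scale estimate
threshold = 4 * sigma_hat
exceedances = np.abs(pre_avg)[np.abs(pre_avg) > threshold] - threshold

# Step 2: fit a GPD to the exceedances and compute a POT tail quantile
xi, loc, beta = genpareto.fit(exceedances, floc=0.0)
q = 0.999
n_obs, n_exc = len(pre_avg), len(exceedances)
var_q = threshold + (beta / xi) * (((1 - q) * n_obs / n_exc) ** (-xi) - 1)
print(f"{n_exc} exceedances, xi = {xi:.3f}, {q:.1%} jump-tail quantile = {var_q:.5f}")
```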
The paper aims to discuss three interesting issues of statistical inference for a common risk ratio (RR) in sparse meta-analysis data. Firstly, the conventional log-risk ratio estimator encounters a number of problems when the number of events in the experimental or control group is zero in sparse data of a 2 × 2 table. An adjusted log-risk ratio estimator is proposed, with continuity correction points based upon the minimum Bayes risk with respect to the uniform prior density over (0, 1) and the Euclidean loss function. Secondly, the interest is to find the optimal weights of the pooled estimate that minimize the mean square error (MSE), subject to a constraint on the weights. Finally, the performance of this minimum-MSE weighted estimator, adjusted with various values of the correction points, is investigated and compared with other popular estimators, such as the Mantel-Haenszel (MH) estimator and the weighted least squares (WLS) estimator (also known as the inverse-variance weighted estimator), in terms of point estimation and hypothesis testing via simulation studies. The estimation results illustrate that, regardless of the true value of RR, the MH estimator achieves the best performance with the smallest MSE when the study size is rather large and the sample sizes within each study are small. The MSEs of the WLS estimator and the proposed weighted estimator adjusted by the various correction points are close together, and these estimators perform best when the sample sizes within each study are moderate to large while the study size is rather small.
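For reference, the sketch below computes the two benchmark estimators on a made-up set of sparse 2 × 2 tables: the Mantel-Haenszel pooled risk ratio and an inverse-variance pooling of continuity-corrected log risk ratios, using the conventional 0.5 correction rather than the Bayes-risk-derived correction points proposed in the paper.

```python
import numpy as np

# Benchmark pooled risk-ratio estimators for sparse 2x2 tables.
# Each study: (events in experimental group, group size, events in control group, group size)
studies = [
    (0, 25, 2, 25),
    (1, 40, 3, 38),
    (2, 30, 5, 31),
    (0, 20, 1, 22),
]

# Mantel-Haenszel pooled RR: handles zero cells without any correction
mh_num = sum(a * n0 / (n1 + n0) for a, n1, b, n0 in studies)
mh_den = sum(b * n1 / (n1 + n0) for a, n1, b, n0 in studies)
rr_mh = mh_num / mh_den

# Inverse-variance (WLS) pooling of continuity-corrected log risk ratios
cc = 0.5   # conventional correction, standing in for the paper's correction points
log_rr, inv_var = [], []
for a, n1, b, n0 in studies:
    a_c, b_c, n1_c, n0_c = a + cc, b + cc, n1 + 2 * cc, n0 + 2 * cc
    log_rr.append(np.log((a_c / n1_c) / (b_c / n0_c)))
    inv_var.append(1.0 / (1 / a_c - 1 / n1_c + 1 / b_c - 1 / n0_c))
weights = np.array(inv_var) / sum(inv_var)
rr_wls = np.exp(np.sum(weights * np.array(log_rr)))

print(f"Mantel-Haenszel RR = {rr_mh:.3f}, inverse-variance weighted RR = {rr_wls:.3f}")
```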
Cyberattacks are difficult to prevent because the targeted companies and organizations are often relying on new and fundamentally insecure cloud-based technologies, such as the Internet of Things. With increasing industry adoption and migration of traditional computing services to the cloud, one of the main challenges in cybersecurity is to provide mechanisms to secure these technologies. This work proposes a Data Security Framework for cloud computing services (CCS) that evaluates and improves CCS data security from a software engineering perspective, by evaluating the levels of security within the cloud computing paradigm using engineering methods and techniques applied to CCS. This framework is developed by means of a methodology based on a heuristic theory that incorporates knowledge generated by existing works as well as the experience of their implementation. The paper presents the design details of the framework, which consists of three stages: identification of data security requirements, management of data security risks, and evaluation of data security performance in CCS.
Since the creation of spatial data is a costly and time-consuming process, researchers in this domain in most cases rely on open-source spatial attributes for their specific purposes. Likewise, the present research aims at mapping landslide susceptibility in the metropolitan area of Chittagong district of Bangladesh utilizing obtainable open-source spatial data from various web portals. In this regard, we targeted a study region where rainfall-induced landslides reportedly cause casualties as well as property damage each year. In this study, we employed a multi-criteria evaluation (MCE) technique, i.e., a heuristic, knowledge-driven approach based on expert opinions from various disciplines, for landslide susceptibility mapping, combining nine causative factors (geomorphology, geology, land use/land cover (LULC), slope, aspect, plan curvature, drainage distance, relative relief and vegetation) in a geographic information system (GIS) environment. The final susceptibility map was divided into five hazard classes, viz., very low, low, moderate, high, and very high, representing 22 km2 (13%), 90 km2 (53%), 24 km2 (15%), 22 km2 (13%) and 10 km2 (6%) of the area, respectively. This particular study might be beneficial to the local authorities and other stakeholders concerned with disaster risk reduction and mitigation activities. Moreover, this study can also be advantageous for risk-sensitive land use planning in the study area.
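A minimal sketch of the heuristic weighted-overlay step is shown below, combining a few random stand-in factor rasters with hypothetical expert weights and slicing the result into five classes; the actual study uses nine GIS-derived factor layers and expert-defined class breaks.

```python
import numpy as np

# Sketch of a heuristic weighted-overlay (MCE) susceptibility map. The factor
# rasters are random stand-ins and the expert weights are hypothetical; only
# four of the nine factors are shown for brevity.
rng = np.random.default_rng(4)
rows, cols = 50, 50

# Factor scores rescaled to 0-1 (in practice derived from GIS layers)
factors = {
    "slope":         rng.random((rows, cols)),
    "geology":       rng.random((rows, cols)),
    "lulc":          rng.random((rows, cols)),
    "drainage_dist": rng.random((rows, cols)),
}
weights = {"slope": 0.4, "geology": 0.3, "lulc": 0.2, "drainage_dist": 0.1}  # hypothetical expert judgment

susceptibility = sum(weights[k] * factors[k] for k in factors)

# Classify into five hazard classes: very low .. very high (quantile breaks here;
# the study uses expert-defined breaks)
bins = np.quantile(susceptibility, [0.2, 0.4, 0.6, 0.8])
classes = np.digitize(susceptibility, bins)  # 0 = very low, ..., 4 = very high
labels = ["very low", "low", "moderate", "high", "very high"]
for c, name in enumerate(labels):
    share = (classes == c).mean() * 100
    print(f"{name:>9}: {share:.1f}% of cells")
```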
In this paper, the credit-risk decision mechanism for banks is studied when the loan interest rate is fixed and information in the credit market is asymmetric. We present the designs of rationing and non-rationing credit-risk decision mechanisms for the case where the collateral value provided by an entrepreneur is not less than the minimum demanded by the bank. We show that, under this mechanism, banks can efficiently identify the risk size of the project. Finally, the condition under which the bank investigates the project is given.
Background Combinations of coronary heart disease (CHD) and other chronic conditions complicate clinical management and increase healthcare costs. The aim of this study was to evaluate gender-specific relationships between CHD and other comorbidities. Methods We analyzed data from the German Health Interview and Examination Survey (DEGS1), a national survey of 8152 adults aged 18-79 years. Female and male participants with self-reported CHD were compared for 23 chronic medical conditions. Regression models were applied to determine potential associations between CHD and these 23 conditions. Results The prevalence of CHD was 9% (547 participants): 34% (185) were female CHD participants and 66% (362) male. In women, CHD was associated with hypertension (OR = 3.28 (1.81-5.9)), lipid disorders (OR = 2.40 (1.50-3.83)), diabetes mellitus (OR = 2.08 (1.24-3.50)), kidney disease (OR = 2.66 (1.101-6.99)), thyroid disease (OR = 1.81 (1.18-2.79)), gout/high uric acid levels (OR = 2.08 (1.22-3.56)) and osteoporosis (OR = 1.69 (1.01-2.84)). In men, CHD patients were more likely to have hypertension (OR = 2.80 (1.94-4.04)), diabetes mellitus (OR = 1.87 (1.29-2.71)), lipid disorder (OR = 1.82 (1.34-2.47)), and chronic kidney disease (OR = 3.28 (1.81-5.9)). Conclusion Our analysis revealed two sets of chronic conditions associated with CHD. The first set occurred in both women and men, and comprised known risk factors: hypertension, lipid disorders, kidney disease, and diabetes mellitus. The second set appeared unique to women: thyroid disease, osteoporosis, and gout/high uric acid. Identification of shared and unique gender-related associations between CHD and other conditions provides potential to tailor screening, preventive, and therapeutic options.
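Odds ratios of this kind are typically obtained from logistic regression; the sketch below fits such a model on simulated stand-in data (CHD on a hypertension indicator, adjusting for age) and reports the odds ratio with its confidence interval. The data are synthetic, not the DEGS1 survey data.

```python
import numpy as np
import statsmodels.api as sm

# Sketch of how gender-stratified odds ratios can be obtained: logistic
# regression of self-reported CHD on a comorbidity indicator, adjusting for age.
# The data below are simulated stand-ins, not the DEGS1 survey data.
rng = np.random.default_rng(5)
n = 4000
age = rng.uniform(18, 79, n)
hypertension = rng.binomial(1, 0.3, n)
# Simulated CHD risk rising with age and with hypertension (true log-OR = 1.2)
logit = -7 + 0.07 * age + 1.2 * hypertension
chd = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([age, hypertension]))
fit = sm.Logit(chd, X).fit(disp=0)

or_hyp = np.exp(fit.params[2])                 # odds ratio for the hypertension indicator
ci_low, ci_high = np.exp(fit.conf_int()[2])    # 95% confidence interval on the OR scale
print(f"OR for hypertension = {or_hyp:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```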
As machine learning moves into high-risk and sensitive applications such as medical care, autonomous driving, and financial planning, how to interpret the predictions of the black-box model becomes the key to whether people can trust machine learning decisions. Interpretability relies on providing users with additional information or explanations to improve model transparency and help users understand model decisions. However, this information inevitably exposes the dataset or the model to the risk of privacy leaks. We propose a strategy to reduce model privacy leakage for instance interpretability techniques. The specific procedure is as follows. Firstly, the user inputs data into the model, and the model calculates the prediction confidence of the data provided by the user and gives the prediction results. Meanwhile, the model obtains the prediction confidence of the interpretation data set. Finally, the data point with the smallest Euclidean distance between the confidence of the interpretation set and that of the prediction data is returned as the explainable data. Experimental results show that the Euclidean distance between the confidence of the interpretation data and the confidence of the prediction data provided by this method is very small, which shows that the model's prediction of the interpreted data is very similar to the model's prediction of the user data. Finally, we demonstrate the accuracy of the explanatory data: we measure the matching degree between the real label and the predicted label of the interpreted data and the applicability to the network model. The results show that the interpretation method has high accuracy and wide applicability.
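The confidence-matching step described above can be sketched as follows: compute softmax confidence vectors for a public interpretation set and for the user query, and return the interpretation instance with the smallest Euclidean distance; the linear "model" and the synthetic data are stand-ins, not the paper's experimental setup.

```python
import numpy as np

# Sketch of the confidence-matching step: given the model's softmax confidence
# for a user query, return the instance from a public interpretation set whose
# confidence vector is closest in Euclidean distance.
rng = np.random.default_rng(6)
n_classes, dim = 3, 5
W = rng.normal(size=(dim, n_classes))          # stand-in trained model weights

def confidence(x):
    logits = x @ W
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)   # softmax confidence vector

interpretation_set = rng.normal(size=(200, dim))     # public, shareable instances
interp_conf = confidence(interpretation_set)

user_query = rng.normal(size=dim)                    # private user input
user_conf = confidence(user_query)

# Pick the interpretation instance whose confidence is closest to the user's
dist = np.linalg.norm(interp_conf - user_conf, axis=1)
best = int(np.argmin(dist))
print("predicted class for user query:", int(user_conf.argmax()))
print("surrogate instance index:", best, "distance:", float(dist[best]))
print("surrogate confidence:", np.round(interp_conf[best], 3))
```

The surrogate instance, rather than the user's own data, is what would be passed to the downstream explanation step.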