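Abstract: Objective: To construct a Voting ensemble-based model for predicting the anti-inflammatory effects of traditional Chinese medicines (TCMs), using their medicinal properties as feature variables. A second aim was to use visualization techniques to analyze how different medicinal-property features influence anti-inflammatory effects. Methods: A total of 1,247 TCMs from Chinese Materia Medica (《中药学》) and the SymMap database were studied. After preliminary and secondary screening, a standardized database was built containing features such as nature, flavor, and meridian tropism. A Voting ensemble model was built on six base models, including decision tree, support vector machine, and light gradient boosting machine (LightGBM). Its performance was improved with seven-fold cross-validation and with hyperparameter optimization by a tree-structured Bayesian optimization algorithm. The SHAP (SHapley Additive exPlanations) explainer was used to visualize key medicinal-property features. Results: After screening, 522 anti-inflammatory TCMs were included in the database. The Voting ensemble model had the best overall performance, with an F1 score of 0.797 and an AUC of 0.77, an average improvement of 7.4% over the individual models. SHAP analysis showed that the important features associated with anti-inflammatory action included “spleen meridian”, “sweet flavor”, and “tonifying”. The important features associated with a lack of anti-inflammatory action were “warm or neutral nature” and “toxicity”. Conclusion: This study is the first to build a well-performing model for predicting the anti-inflammatory effects of TCMs with an ensemble algorithm. It offers a new approach for research that combines traditional Chinese medicine with machine learning.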
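The Voting ensemble described above combines six base classifiers, but the abstract does not say whether hard or soft voting was used. The following is only a minimal sketch of hard (majority) voting for a binary label (1 = anti-inflammatory, 0 = not), together with the F1 score used to compare models. It assumes each base model already outputs 0/1 predictions. With an even number of voters, ties can occur. Breaking them toward the smaller label is an assumed rule, not one taken from the study.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Majority vote over per-model label lists.

    predictions_per_model: list of equal-length label lists, one per base model.
    Ties are broken in favour of the smallest label (an assumed rule).
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        counts = Counter(model_preds[i] for model_preds in predictions_per_model)
        top = max(counts.values())
        # Among labels sharing the highest count, pick the smallest one.
        combined.append(min(label for label, c in counts.items() if c == top))
    return combined


def f1_binary(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Consider six hypothetical models voting on one sample that splits 3 to 3. That sample is labeled 0 under the assumed tie rule. A soft-voting variant would average predicted probabilities instead and avoids most exact ties.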
Abstract: Small-drone technology has opened a range of new applications for aerial transportation. These drones leverage the Internet of Things (IoT) to offer cross-location navigation services. However, hardware and architectural issues make them susceptible to security and privacy threats. Small drones hold promise for expansion in both the civil and defense sectors. Even so, these safety, security, and privacy threats must be addressed to keep the drones secure and operating without interruption. This study therefore investigates security and privacy preservation for both drones and the Internet of Drones (IoD). It emphasizes the importance of drone networks that are secure and can robustly withstand interception and intrusion. The proposed framework incorporates a weighted voting ensemble of three convolutional neural network (CNN) models to enhance intrusion detection within the network. The CNNs are customized 1D models optimized for better performance. Their outputs are combined by weighted voting, with weights of 0.4, 0.3, and 0.3 for the three CNNs, respectively. Experiments on multiple benchmark datasets achieve an accuracy of up to 99.89% on drone data. The proposed model also shows promising precision, recall, and F1 scores of 99.92%, 99.98%, and 99.97%, respectively. Cross-validation and a performance comparison with existing works are also carried out. The findings indicate that the proposed approach is a promising solution for detecting security threats to aerial and satellite systems with high accuracy.
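The 0.4/0.3/0.3 weighting above can be illustrated with a small sketch. It assumes each CNN outputs a class-probability vector and that "voting" means a weighted average of those vectors followed by an argmax (weighted soft voting). The abstract does not confirm this. If the authors instead weight hard labels, the same code works on one-hot vectors. The function name is hypothetical.

```python
def weighted_vote(prob_outputs, weights=(0.4, 0.3, 0.3)):
    """Combine per-model class-probability vectors with fixed weights.

    prob_outputs[m][i][c] is model m's probability for class c on sample i.
    Returns the class index with the highest weighted score for every sample.
    """
    if len(prob_outputs) != len(weights):
        raise ValueError("need exactly one weight per model")
    n_samples = len(prob_outputs[0])
    n_classes = len(prob_outputs[0][0])
    labels = []
    for i in range(n_samples):
        scores = [0.0] * n_classes
        for w, model_probs in zip(weights, prob_outputs):
            for c in range(n_classes):
                scores[c] += w * model_probs[i][c]
        # Winning class = largest weighted score (first one on an exact tie).
        labels.append(max(range(n_classes), key=scores.__getitem__))
    return labels
```

Take a binary task (0 = normal traffic, 1 = intrusion). Suppose the 0.4-weighted model outputs [0.2, 0.8] while the other two output [0.7, 0.3] and [0.6, 0.4]. The intrusion class then scores 0.53 against 0.47, so the higher-weighted CNN prevails over two weaker disagreeing votes.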
Funding: Supported by the National Natural Science Foundation of China (NSFC) under Grant 61931005.
Abstract: A blockchain-based user-centric access network (UCAN) fails at dynamic access point (AP) management because it lacks an incentive mechanism to promote virtuous behavior. Furthermore, the low throughput of blockchains has been a bottleneck to the widespread adoption of UCAN in 6G. In this paper, we propose Overlap Shard, a blockchain framework based on a novel reputation voting (RV) scheme, to dynamically manage the APs in UCAN. AP nodes in UCAN are distributed across multiple shards according to the RV scheme. Nodes with a good reputation (virtuous behavior) are therefore more likely to be selected for the overlap shard. The RV mechanism ensures the security of UCAN because most APs adopt virtuous behavior. To improve the efficiency of Overlap Shard, we also reduce cross-shard transactions by introducing core nodes. Specifically, a few nodes are overlapped across different shards. These nodes can directly process transactions in two shards without resorting to cross-shard transactions. This greatly increases the speed of transactions between shards and thus the throughput of Overlap Shard. Experiments show that the throughput of Overlap Shard is about 2.5 times that of a non-sharded blockchain.
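The abstract states only that nodes with a good reputation are "more likely" to be selected for the overlap shard. One simple way to realize that is reputation-proportional sampling without replacement. The sketch below assumes that rule and uses the Efraimidis-Spirakis key method. The function name and reputation scores are hypothetical, and this is not the paper's RV scheme.

```python
import random


def select_overlap_nodes(reputation, k, seed=None):
    """Pick k distinct nodes, favouring higher reputation (hypothetical rule).

    reputation: dict node_id -> reputation score.
    Efraimidis-Spirakis weighted sampling without replacement: each node
    draws key = u ** (1 / weight) with u ~ Uniform[0, 1), and the k largest
    keys are selected. A node's chance of ranking first is proportional
    to its reputation.
    """
    rng = random.Random(seed)
    keyed = []
    for node, rep in reputation.items():
        if rep <= 0:
            continue  # nodes with no positive reputation are never selected
        keyed.append((rng.random() ** (1.0 / rep), node))
    keyed.sort(reverse=True)
    return [node for _, node in keyed[:k]]
```

With reputations of 5, 2, and 1, the first node is picked as the single overlap node about 5/8 of the time. A zero-reputation node is excluded entirely. How reputations are updated after each voting round is left out because the abstract does not describe it.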
Abstract: Background: DNA sequencing is an important tool in genetic diagnostics. This field has major implications for understanding the genetic architectures of diseases and for identifying risk factors associated with genetic disorders. Methods: Our study introduces a novel two-tiered analytical framework to improve the precision and reliability of genetic data interpretation. It begins by extracting and analyzing salient features from DNA sequences through CNN-based feature analysis. This step exploits the ability of convolutional neural networks (CNNs) to capture complex patterns and subtle mutations in genetic data. It then combines a selected set of machine learning classifiers through a stringent voting mechanism. This mechanism integrates the predictions of multiple classifiers to produce comprehensive and well-balanced interpretations of the genetic data. Results: The method was tested empirically on a dataset of DNA sequence variants from patients with breast cancer and a control group of healthy individuals. The integration of CNNs with a voting-based ensemble of classifiers produced strong results, with accuracy, precision, recall, and F1-score all reaching 0.88, outperforming previous models. Conclusions: This dual accomplishment underlines the potential of integrating deep learning with ensemble machine learning to add real value to genetic diagnostics and prognostics. The results set a new benchmark for the accuracy of disease diagnosis through DNA sequencing. They also point toward future studies on improved personalized medicine and healthcare approaches based on precise genetic information.
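The abstract does not describe how DNA sequences are fed to the CNN. A common choice, assumed here, is one-hot encoding with A, C, G, and T as four input channels. A convolution filter then slides along the sequence and responds strongly wherever its pattern matches. The sketch writes that arithmetic out in plain Python. The hand-built motif filter is a hypothetical stand-in for weights a trained CNN would learn.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as 4-channel one-hot vectors (A, C, G, T).

    Ambiguous bases such as N become all-zero vectors.
    """
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv1d_max(encoded, kernel):
    """Slide one filter (width x 4 weights) along the sequence.

    Uses valid padding and stride 1, applies ReLU, and returns the
    global max-pooled activation (0.0 if the sequence is too short).
    """
    width = len(kernel)
    best = 0.0
    for start in range(len(encoded) - width + 1):
        act = sum(kernel[j][c] * encoded[start + j][c]
                  for j in range(width) for c in range(4))
        best = max(best, max(act, 0.0))  # ReLU, then running max (pooling)
    return best


def motif_kernel(motif):
    """Hypothetical hand-built filter scoring +1 per matching base."""
    return one_hot(motif)
```

A full model would stack many such filters, feed the pooled activations to classifiers, and combine the classifiers by voting. This sketch covers only the feature-extraction arithmetic.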
Abstract: Breast cancer is among the leading causes of cancer mortality globally. Its diagnosis through histopathological image analysis is prone to inter-observer variability and misclassification. Existing machine learning (ML) methods struggle with intra-class heterogeneity and inter-class similarity, so more robust classification models are needed. This study presents a hybrid model to improve breast cancer histopathology (BCH) image classification. The model combines deep learning (DL) feature extraction, an ensemble of ML classifiers, and Bat Swarm Optimization (BSO) for hyperparameter tuning. It was developed on a dataset of 804 hematoxylin and eosin (H&E)-stained images labeled Benign, In situ, Invasive, or Normal (ICIAR2018_BACH_Challenge). ResNet50 was used for feature extraction. Support Vector Machines (SVM), Random Forest (RF), XGBoost (XGB), Decision Trees (DT), and AdaBoost (ADB) were used for classification. BSO optimized the hyperparameters within a soft voting ensemble. Model performance was assessed with accuracy, precision, recall, specificity, F1-score, Receiver Operating Characteristic (ROC) curves, and Precision-Recall (PR) curves. The ensemble outperformed the individual classifiers, with higher accuracy (~90.0%), precision (~86.4%), recall (~86.3%), and specificity (~96.6%). Its robustness was supported by both ROC and PR curves, which showed AUC values of 1.00, 0.99, and 0.98 for the Benign, Invasive, and In situ classes, respectively. This ensemble model provides a strong and clinically valid methodology for breast cancer classification that enhances precision and minimizes diagnostic errors. Future work should focus on explainable AI, multi-modal fusion, few-shot learning, and edge computing for real-world deployment.
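The reported specificity (~96.6%) is much higher than recall (~86.3%). That gap is expected in a four-class problem once metrics are computed one-vs-rest. For each class, the other three classes all count as negatives, so true negatives dominate the specificity denominator. The abstract does not state how its multi-class figures were averaged. The sketch below assumes one-vs-rest metrics with a macro average, and the labels and predictions in the usage check are hypothetical.

```python
def per_class_metrics(y_true, y_pred, classes):
    """One-vs-rest recall (sensitivity) and specificity for every class."""
    out = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        out[c] = {
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        }
    return out


def macro(metrics, key):
    """Unweighted mean of one metric across classes."""
    return sum(m[key] for m in metrics.values()) / len(metrics)
```

In a small example with two misclassified images out of eight, macro recall falls to 0.75 while macro specificity stays near 0.92. This mirrors the pattern in the reported figures.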
Funding: Supported by the Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) under contract No. SML2021SP310, the National Natural Science Foundation of China under contract Nos. 42227901 and 42475061, and the Key R&D Program of Zhejiang Province under contract No. 2024C03257.
Abstract: In this study, we constructed multi-model ensemble (MME) predictions of the El Niño-Southern Oscillation (ENSO) using a neural network. The predictions were based on hindcast data released from five coupled ocean-atmosphere models of varying complexity. This nonlinear approach showed marked superiority and effectiveness in constructing the ENSO MME. We then used leave-one-out cross-validation and the moving-base method to further validate the robustness of the neural network model in formulating the ENSO MME. The neural network algorithm outperforms the conventional approach of assigning a uniform weight to all models. It achieved higher correlation coefficients and lower prediction errors, and could therefore provide more accurate ENSO forecasts.
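The conventional baseline mentioned above, a uniform weight for every model, and the two evaluation measures (correlation coefficient and prediction error) can be sketched as follows. The study's neural network would replace `equal_weight_mme` with a learned nonlinear mapping from member forecasts to the observed index and be scored with the same functions. RMSE as the error measure is an assumption, since the abstract does not name one. All names are illustrative.

```python
import math


def equal_weight_mme(member_forecasts):
    """Uniform-weight MME: mean of the member forecasts at each time step.

    member_forecasts: list (one per model) of equal-length forecast series.
    """
    n_models = len(member_forecasts)
    return [sum(step) / n_models for step in zip(*member_forecasts)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(pred, obs):
    """Root-mean-square error of a forecast series against observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))
```

Under leave-one-out cross-validation, each combiner would be fitted with one year (or one model) held out and evaluated on the held-out part. The equal-weight mean needs no fitting, which is why it serves as the reference.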
Abstract: Under the Charter of the United Nations, the United Nations Security Council uses an authorized voting system under the “collective security system”. This system has prominent drawbacks, such as difficulty in fully reflecting the will of all Member States. This article combines interdisciplinary, qualitative, and quantitative research methods to address the dilemma of Security Council voting reform. It suggests retaining the Security Council voting system and recommends a simplified “basic and weighted half” model for vote allocation. This model inherits the authorized voting system of the collective security system. It also follows the Charter's principle of sovereign equality in allocation. The model can help Member States “draw on advantages and avoid disadvantages” in international development. It can also shift the “absolute equality” of uniform treatment toward “real fairness” relative to individual contributions. Finally, it can advance the development of international law in the United Nations voting system.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 32200590 to K.L., 81972358 to Q.W., 91959113 to Q.W., and 82372897 to Q.W.) and the Natural Science Foundation of Jiangsu Province (Grant No. BK20210530 to K.L.).
Abstract: Acute myeloid leukemia (AML) shows extremely high inter-patient heterogeneity, so identifying biomarkers for prognostic assessment and therapeutic guidance is critical. Cell surface markers (CSMs) have been shown to play an important role in AML leukemogenesis and progression. In the current study, we evaluated the prognostic potential of all human CSMs in 130 AML patients from The Cancer Genome Atlas (TCGA). The evaluation used differential gene expression analysis and univariable Cox proportional hazards regression. Using multi-model analysis, including adaptive LASSO regression, LASSO regression, and Elastic Net, we constructed a 9-CSM prognostic model for risk stratification of AML patients. The predictive value of the 9-CSM risk score was further validated at the transcriptome and proteome levels. Multivariable Cox regression analysis showed that the risk score was an independent prognostic factor for AML patients. Patients with high 9-CSM risk scores had shorter overall and event-free survival than those with low scores. Notably, single-cell RNA sequencing analysis indicated that patients with high 9-CSM risk scores exhibited chemotherapy resistance. Furthermore, PI3K inhibitors were identified as potential treatments for these high-risk patients. In conclusion, the 9-CSM prognostic model serves as an independent prognostic factor for the survival of AML patients and has potential for guiding drug therapy.
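Penalized Cox models of the kind listed above typically yield a risk score that is a linear combination of marker expression weighted by the fitted coefficients. Patients are then split into high- and low-risk groups, often at the median. The abstract does not list the nine CSMs, their coefficients, or the cut-off. The gene names, coefficients, and median cut-off in this sketch are therefore placeholders.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear Cox risk score: sum over markers of coefficient * expression.

    expression: dict patient_id -> dict marker -> expression value
    coefficients: dict marker -> fitted coefficient (placeholders here)
    """
    return {
        pid: sum(coef * markers[g] for g, coef in coefficients.items())
        for pid, markers in expression.items()
    }


def stratify(scores):
    """Label patients above the median score 'high', the rest 'low'."""
    cut = median(scores.values())
    return {pid: ("high" if s > cut else "low") for pid, s in scores.items()}
```

A positive coefficient raises risk as expression increases, and a negative one lowers it. With an odd number of patients, the patient sitting exactly at the median is assigned "low" by this rule.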
Funding: National Natural Science Foundation of China, Nos. 61962054 and 62372353.
Abstract: Traditional clustering algorithms often struggle to produce satisfactory results on datasets with uneven density. They also incur substantial computational costs on high-dimensional data because they calculate similarity matrices. To alleviate these issues, we use a KD-Tree to partition the dataset and compute the K-nearest-neighbor (KNN) density of each point, which avoids computing similarity matrices. We also apply the rules of voting elections: each data point acts as a voter and casts a vote for the point with the highest density among its KNN. Using the vote count of each point, we develop a strategy for classifying noise points and potential cluster centers. This strategy allows the algorithm to identify clusters with uneven density and complex shapes. Additionally, we define “adhesive points” between two clusters and use them to merge adjacent clusters with similar densities. This process identifies the optimal number of clusters automatically. Experimental results indicate that our algorithm improves both the efficiency and the accuracy of clustering.
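The density-voting idea above can be sketched in a few lines under several stated assumptions:

- Brute-force neighbor search replaces the paper's KD-Tree.
- KNN density is taken as the inverse of the mean distance to the k nearest neighbors.
- Each point votes for the densest point among itself and its KNN.
- Clusters are formed by following vote pointers up to a self-voting density peak.

The paper's exact density formula, noise rule, and adhesive-point merging are not given in the abstract and are not reproduced here.

```python
import math


def knn(points, k):
    """Brute-force k nearest neighbours (a KD-Tree would replace this at scale)."""
    nbrs = []
    for i, p in enumerate(points):
        d = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        nbrs.append([j for _, j in d[:k]])
    return nbrs


def vote_cluster(points, k=3):
    """Return (labels, votes); each label is the index of a density peak."""
    nbrs = knn(points, k)
    # Assumed KNN density: inverse of the mean distance to the k neighbours.
    density = []
    for i, p in enumerate(points):
        mean_d = sum(math.dist(p, points[j]) for j in nbrs[i]) / k
        density.append(1.0 / mean_d if mean_d > 0 else float("inf"))
    # Each point votes for the densest candidate among itself and its KNN.
    # Listing i first means a point ties in its own favour, so every
    # non-self vote points to strictly higher density.
    vote = [max([i] + nbrs[i], key=lambda j: density[j]) for i in range(len(points))]
    votes = [0] * len(points)
    for target in vote:
        votes[target] += 1
    # Follow vote pointers to a self-voting peak; density strictly rises on
    # each hop, so the walk always terminates.
    labels = []
    for i in range(len(points)):
        j = i
        while vote[j] != j:
            j = vote[j]
        labels.append(j)
    return labels, votes
```

High vote counts mark candidate centers. Points that receive no votes and have low density are the natural noise candidates, and this is where the paper's vote-based noise rule would plug in.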
Abstract: Traditional global sensitivity analysis (GSA) maps the relative importance of input rock properties to the model response. In doing so, it neglects the epistemic uncertainties in the probabilistic characteristics of those properties (i.e., distribution type and its parameters) that arise from small datasets. This paper proposes an augmented Bayesian multi-model inference (BMMI) coupled with GSA methodology (BMMI-GSA) to address this issue. BMMI-GSA estimates the imprecision in the moment-independent sensitivity indices of rock structures that arises from the small size of the input data. The methodology employs BMMI to quantify the epistemic uncertainties associated with the model type and parameters of the input properties. These uncertainties are propagated to estimate the imprecision in moment-independent Borgonovo's indices, using a reweighting approach over candidate probabilistic models. The methodology is demonstrated on a rock slope prone to stress-controlled failure in the Himalayan region of India. It proved superior to conventional GSA, which neglects all epistemic uncertainties, and to Bayesian-coupled GSA (B-GSA), which neglects model uncertainty. Its advantage is that it incorporates the uncertainties in both the model type and the parameters of the properties. The imprecise Borgonovo's indices estimated by the proposed methodology give confidence intervals for the sensitivity indices rather than point estimates, which better informs data collection efforts. Analyses with varying sample sizes suggested that the uncertainties in the sensitivity indices decrease significantly as sample size increases. An accurate importance ranking of properties was possible only with large samples. Further, the impact of prior knowledge, in terms of prior ranges and distributions, was significant, so any related assumption should be made carefully.
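For reference, the standard definitions behind the abstract above can be written out. Borgonovo's moment-independent index for input X_i measures the expected shift in the whole output density when X_i is fixed. BMMI assigns each candidate probability model a posterior probability via Bayes' rule. The abstract does not give the paper's reweighting formula, so the importance-weight line is an assumption. It shows a common way to reuse model evaluations drawn under a reference density f_0 when switching to a candidate density f_k.

```latex
% Borgonovo's moment-independent sensitivity index of input X_i
\delta_i = \frac{1}{2}\,\mathbb{E}_{X_i}\!\left[\int \bigl| f_Y(y) - f_{Y \mid X_i}(y) \bigr| \,\mathrm{d}y\right]

% Posterior probability of candidate model M_k given the (small) dataset D
P(M_k \mid D) = \frac{p(D \mid M_k)\,P(M_k)}{\sum_{j} p(D \mid M_j)\,P(M_j)}

% Assumed importance (re)weighting of samples x^{(n)} drawn from a reference density f_0
w_k\bigl(x^{(n)}\bigr) = \frac{f_k\bigl(x^{(n)}\bigr)}{f_0\bigl(x^{(n)}\bigr)}
```

Evaluating \delta_i under each plausible (M_k, parameter) pair yields a set of index values rather than one number. That set is the source of the confidence intervals described above.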
Funding: Linking Health, Place and Urban Planning through the Australian Urban Observatory, by the Ian Potter Foundation, Australia.
Abstract: Data-driven methods for analyzing car crash patterns continue to evolve, yet a holistic analysis remains critical for decoding the complex nuances of this phenomenon. This study addresses that need with a robust examination of car crash occurrence dynamics and their influencing variables in the Greater Melbourne area, Australia. We employed a comprehensive multi-model approach that combines machine learning and geospatial analytics to reveal the complex interactions underlying vehicular incidents. It uses Random Forest with SHAP (SHapley Additive exPlanations), Generalized Linear Regression (GLR), and Geographically Weighted Regression (GWR). This combination not only highlighted pivotal contributing factors but also captured often overlooked complexities. The Random Forest model identified essential factors, and SHAP allowed us to assess the interactions among them. To complement this, we used hexagonal geographic units to refine the granularity of crash density evaluations. Road geometry emerged as a key factor, with intersections showing a significant positive correlation with crashes. Average land surface temperature had variable significance across scales. Socio-economically, regions with a higher proportion of childless populations were more prone to crashes. Public transit usage showed a strong positive association with crashes, especially in densely populated areas. The insights from GLR and the Random Forest SHAP values converged, offering a comprehensive understanding of the underlying patterns and pinpointing high-risk zones and influential determinants. These findings offer pivotal insights for targeted safety interventions in Greater Melbourne, Australia.
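GLR, used in the study above, fits one set of coefficients for the whole region. GWR instead refits the regression at each location, down-weighting distant observations. The sketch shows a single local fit with one predictor, for example intersection count against crash count per hexagon. The Gaussian kernel, the fixed bandwidth, and the single-predictor form are assumptions, because the abstract does not state the study's kernel or bandwidth selection.

```python
import math


def gwr_local_fit(coords, x, y, target, bandwidth):
    """Geographically weighted simple linear regression at one location.

    Each observation gets a Gaussian kernel weight
    exp(-0.5 * (d / bandwidth) ** 2), where d is its distance to `target`.
    A weighted least-squares line y = b0 + b1 * x is then fitted.
    Returns the local coefficients (b0, b1).
    """
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1
```

Repeating the fit at every hexagon centroid produces a map of local slopes. This is how GWR can show an association that is strong in one part of the metropolitan area and weak in another, which a single global GLR coefficient would average away.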