Funding: This research is a phased achievement of the National Social Science Fund of China (23BGL272).
Abstract: Fraudulent websites are an important carrier tool for telecom fraud. At present, criminals can use AI-generated content technology to quickly generate fraudulent website templates and build fraudulent websites in batches. Accurate identification of fraudulent websites will effectively reduce the risk of public victimization. Therefore, this study developed a fraudulent website template identification method based on extracting website fingerprint features from the DOM structure, which addresses the problems of single-dimension identification, low accuracy, and insufficient generalization ability in current approaches to identifying fraudulent website templates. The method uses an improved SimHash algorithm to traverse the DOM tree of a webpage, extract website node features, calculate the weight of each node, and obtain the fingerprint feature vector of the website through dimensionality reduction. Finally, a random forest algorithm is trained on these features and tuned to find the best combination of parameters. The method automatically extracts fingerprint features from websites and identifies which template a website belongs to based on these features. An experimental analysis showed that the method achieves a classification accuracy of 89.8% and demonstrates superior recognition performance.
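To make the fingerprinting step concrete, the following is a minimal sketch of a standard weighted SimHash computed over DOM tag paths. It is not the authors' improved algorithm, whose details the abstract does not give. The names `DomFeatureCollector`, `fingerprint`, `to_bits`, and `hamming` are hypothetical, and both the depth-based weight `1 / (1 + depth)` and the 64-bit width are assumptions chosen only for illustration.

```python
import hashlib
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the path.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DomFeatureCollector(HTMLParser):
    """Walk the DOM in document order and record (tag path, weight) pairs."""

    def __init__(self):
        super().__init__()
        self.path = []       # currently open ancestor tags
        self.features = []   # list of (feature string, weight)

    def handle_starttag(self, tag, attrs):
        depth = len(self.path)
        feature = "/".join(self.path + [tag])
        # Assumed weighting (not from the paper): shallow nodes count more.
        self.features.append((feature, 1.0 / (1 + depth)))
        if tag not in VOID_TAGS:
            self.path.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.path:
            while self.path and self.path.pop() != tag:
                pass


def simhash(features, bits=64):
    """Standard weighted SimHash: each feature votes +w or -w on every bit."""
    totals = [0.0] * bits
    for feature, weight in features:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            totals[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if totals[i] > 0)


def fingerprint(html, bits=64):
    """Reduce a page's DOM structure to a single bits-wide integer."""
    collector = DomFeatureCollector()
    collector.feed(html)
    collector.close()
    return simhash(collector.features, bits)


def to_bits(value, bits=64):
    """Expand a fingerprint into a 0/1 vector usable as classifier features."""
    return [(value >> i) & 1 for i in range(bits)]


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    page_a = "<html><body><div><form><input name='u'><br></form></div></body></html>"
    page_b = "<html><body><div><form><input name='x'><br>Hi</form></div></body></html>"
    print(hamming(fingerprint(page_a), fingerprint(page_b)))
```

Because this sketch hashes only tag paths, two pages that share a template but differ in text or attribute values receive the same fingerprint, while structurally different pages tend to differ in many bits. The bit vector from `to_bits` is one plausible form of input for a random forest classifier; the classifier itself is omitted here.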