Most existing multi-pattern matching algorithms are designed for single English texts leading to issues such as missed matches and space expansion when applied to Chinese-English mixed-text environments.The Hash Trie-...Most existing multi-pattern matching algorithms are designed for single English texts leading to issues such as missed matches and space expansion when applied to Chinese-English mixed-text environments.The Hash Trie-based matching machine demonstrates strong compatibility with both Chinese and English,ensuring high accuracy in text processing and subtree positioning.In this study,a novel functional framework based on the HashTrie structure is proposed and mechanically verified using Isabelle/HOL.This framework is applied to design Functional Multi-Pattern Matching(FMPM),the first functional multi-pattern matching algorithm for Chinese-English mixed texts.FMPM constructs the HashTrie matching machine using character codes and threads the machine according to the associations between pattern strings.The experimental results show that as the stored string information increases,the proposed algorithm demonstrates more significant optimization in retrieval efficiency.FMPM simplifies the implementation of the Threaded Hash Trie(THT)for Chinese-English mixed texts,effectively reducing the uncertainties in the transition from the algorithm description to code implementation.FMPM addresses the problem of space explosion Chinese-English mixed texts and avoids issues such as bound variable iteration errors.The functional framework of the HashTrie structure serves as a reference for the formal verification of future HashTrie-based algorithms.展开更多
This paper presents a twice-gathering information interactive system prototype of e-government based on the condition that the Intranet and the Extranet are physical isolated.Users in the Extranet can gather links of ...This paper presents a twice-gathering information interactive system prototype of e-government based on the condition that the Intranet and the Extranet are physical isolated.Users in the Extranet can gather links of the latest related information from client software which is previously collected by web alert in the Internet.Finally,through ferry-type transport devices,information is browsed by users in the Intranet,and it is transported to a storage device and synchronized with the web platform in the Intranet.During information gathering in the Extranet and data synchronization in the Intranet,it is essential to avoid repeated gathering and copying by means of comparing the extracted information fingerprints gathered from the web pages.This prototype uses HashTrie to store information fingerprints.During testing,the structure based on HashTrie is 2.28 times faster than the Darts(double array Trie)which is the fastest structure in the existing applied patent.The existing 12 types of high speed Hash functions serving for HashTrie are also implemented.When the dictionary content is larger than 5×105 words,the PJWHash or the SuperFastHush function can be adopted;when the dictionary content is 105 words, CalcStrCR32 and ELFHash functions can be adopted.展开更多
基金Supported by the National Natural Science Foundation of China(62462036,62462037)Jiangxi Provincial Natural Science Foundation(20242BAB26017,20232BAB202010)+1 种基金Cultivation Project for Academic and Technical Leader in Major Disciplines in Jiangxi Province(20232BCJ22013)the Jiangxi Province Graduate Innovation Found Project(YC2024-S214)。
文摘Most existing multi-pattern matching algorithms are designed for single English texts leading to issues such as missed matches and space expansion when applied to Chinese-English mixed-text environments.The Hash Trie-based matching machine demonstrates strong compatibility with both Chinese and English,ensuring high accuracy in text processing and subtree positioning.In this study,a novel functional framework based on the HashTrie structure is proposed and mechanically verified using Isabelle/HOL.This framework is applied to design Functional Multi-Pattern Matching(FMPM),the first functional multi-pattern matching algorithm for Chinese-English mixed texts.FMPM constructs the HashTrie matching machine using character codes and threads the machine according to the associations between pattern strings.The experimental results show that as the stored string information increases,the proposed algorithm demonstrates more significant optimization in retrieval efficiency.FMPM simplifies the implementation of the Threaded Hash Trie(THT)for Chinese-English mixed texts,effectively reducing the uncertainties in the transition from the algorithm description to code implementation.FMPM addresses the problem of space explosion Chinese-English mixed texts and avoids issues such as bound variable iteration errors.The functional framework of the HashTrie structure serves as a reference for the formal verification of future HashTrie-based algorithms.
基金The National Basic Research Program of China(973 Program)(No.2007CB310806)
文摘This paper presents a twice-gathering information interactive system prototype of e-government based on the condition that the Intranet and the Extranet are physical isolated.Users in the Extranet can gather links of the latest related information from client software which is previously collected by web alert in the Internet.Finally,through ferry-type transport devices,information is browsed by users in the Intranet,and it is transported to a storage device and synchronized with the web platform in the Intranet.During information gathering in the Extranet and data synchronization in the Intranet,it is essential to avoid repeated gathering and copying by means of comparing the extracted information fingerprints gathered from the web pages.This prototype uses HashTrie to store information fingerprints.During testing,the structure based on HashTrie is 2.28 times faster than the Darts(double array Trie)which is the fastest structure in the existing applied patent.The existing 12 types of high speed Hash functions serving for HashTrie are also implemented.When the dictionary content is larger than 5×105 words,the PJWHash or the SuperFastHush function can be adopted;when the dictionary content is 105 words, CalcStrCR32 and ELFHash functions can be adopted.