Most existing multi-pattern matching algorithms are designed for English-only texts, leading to issues such as missed matches and space expansion when applied to Chinese-English mixed-text environments. The HashTrie-based matching machine demonstrates strong compatibility with both Chinese and English, ensuring high accuracy in text processing and subtree positioning. In this study, a novel functional framework based on the HashTrie structure is proposed and mechanically verified using Isabelle/HOL. This framework is applied to design Functional Multi-Pattern Matching (FMPM), the first functional multi-pattern matching algorithm for Chinese-English mixed texts. FMPM constructs the HashTrie matching machine using character codes and threads the machine according to the associations between pattern strings. The experimental results show that as the stored string information increases, the proposed algorithm demonstrates more significant optimization in retrieval efficiency. FMPM simplifies the implementation of the Threaded Hash Trie (THT) for Chinese-English mixed texts, effectively reducing the uncertainties in the transition from the algorithm description to code implementation. FMPM addresses the problem of space explosion in Chinese-English mixed texts and avoids issues such as bound variable iteration errors. The functional framework of the HashTrie structure serves as a reference for the formal verification of future HashTrie-based algorithms.
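To make the idea of a hash-based trie with threads between related patterns concrete, here is a minimal Python sketch. It is not FMPM and is neither functional nor verified: it is the classical Aho-Corasick construction, with each trie node holding a hash table keyed by character and a failure link standing in for the "thread". The names `Node`, `build_machine`, and `find_all` are illustrative assumptions. Because Python strings are sequences of Unicode code points, Chinese and English characters are handled by the same table lookup.

```python
from collections import deque


class Node:
    """One state of the matching machine (illustrative, not the paper's THT)."""
    __slots__ = ("children", "fail", "outputs")

    def __init__(self):
        self.children = {}  # hash table: character -> child Node
        self.fail = None    # thread to the state of the longest proper suffix
        self.outputs = []   # patterns recognised when this state is reached


def build_machine(patterns):
    root = Node()
    # Phase 1: insert every non-empty pattern into the hash trie.
    for p in patterns:
        if not p:
            continue
        node = root
        for ch in p:
            node = node.children.setdefault(ch, Node())
        node.outputs.append(p)
    # Phase 2: thread the trie breadth-first (Aho-Corasick failure links).
    root.fail = root
    queue = deque()
    for child in root.children.values():
        child.fail = root
        queue.append(child)
    while queue:
        node = queue.popleft()
        for ch, child in node.children.items():
            f = node.fail
            while f is not root and ch not in f.children:
                f = f.fail
            child.fail = f.children.get(ch, root)
            # Inherit patterns that end at the suffix state.
            child.outputs = child.outputs + child.fail.outputs
            queue.append(child)
    return root


def find_all(root, text):
    """Return (start_index, pattern) for every occurrence in text."""
    node, hits = root, []
    for i, ch in enumerate(text):
        while node is not root and ch not in node.children:
            node = node.fail
        node = node.children.get(ch, root)
        for p in node.outputs:
            hits.append((i - len(p) + 1, p))
    return hits


machine = build_machine(["模式", "模式匹配", "匹配", "match"])
hits = find_all(machine, "多模式匹配 multi-match 算法")
```

The failure links are what let overlapping patterns such as "模式匹配" and "匹配" be reported in a single left-to-right pass over the text.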
BACKGROUND: Drug utilization research plays an important role in helping healthcare administrators understand, quantify, and refine prescribing, with the principal objective of enabling the rational use of drugs. Research on the cost of treatment is scarce in developing nations compared with developed countries. Drug utilization studies from developing nations are therefore much needed, and their number has been growing. AIM: To evaluate patterns of antipsychotic drug utilization and to analyze direct medical costs in patients newly diagnosed with schizophrenia. METHODS: The present study was an observational, retrospective cohort study that evaluated patterns of antipsychotic drug utilization using World Health Organization (WHO) core prescribing indicators and anatomical therapeutic chemical/defined daily dose indicators. We also calculated direct medical costs for a period of 6 months. RESULTS: This study found that atypical antipsychotics are the mainstay of treatment for schizophrenia in every age group and in all subcategories of schizophrenia. The evaluation based on WHO prescribing indicators showed a low average number of drugs per prescription and a low prescribing frequency of antipsychotics from the National List of Essential Medicines 2015 and the WHO Essential Medicines List 2019. The total mean drug cost in our study was 1396 Indian rupees. The total mean cost of investigations in our study was 1017.34 Indian rupees. Overall, the total mean direct medical cost incurred by patients in our study was 4337.28 Indian rupees. CONCLUSION: The information from the present study can be used for reviewing and updating treatment policy at the institutional level.
In this study, a machine vision-based pattern matching technique was applied to estimate the location of an autonomous driving robot and perform 3D tunnel mapping in an underground mine environment. The autonomous driving robot continuously detects the wall of the tunnel in the horizontal direction using a light detection and ranging (LiDAR) sensor and performs pattern matching by recognizing the shape of the tunnel wall. The proposed method was designed to measure the heading of the robot by fusion with the inertial measurement unit (IMU) sensor according to the pattern matching accuracy; it is combined with the encoder sensor to estimate the location of the robot. In addition, while the robot is driving, the vertical direction of the underground mine is scanned by the vertical LiDAR sensor and the scans are stacked to create a 3D map of the underground mine. The performance of the proposed method was superior to that of previous studies; the mean absolute error achieved was 0.08 m for the X-Y axes. A root mean square error of 0.05 m² was achieved by comparing the tunnel section maps created by the autonomous driving robot with those from manual surveying.
The rapid development of mobile networks brings opportunities for researchers to analyze user behaviors based on large-scale network traffic data. It is important for Internet Service Providers (ISPs) to optimize resource allocation and provide customized services to users. The first step in analyzing user behaviors is to extract information about user actions from HTTP traffic data by multi-pattern URL matching. However, efficiency is a major problem when performing this work on massive network traffic data. To solve this problem, we propose a novel and accurate algorithm named Multi-Pattern Parallel Matching (MPPM) that takes advantage of HashMap lookups to extract user behaviors from big network data more effectively. Extensive experiments based on real-world traffic data demonstrate the ability of the MPPM algorithm to deal with massive HTTP traffic with better accuracy, concurrency, and efficiency. We expect the proposed algorithm and its parallelized implementation to be a solid base for building a high-performance user behavior analysis engine based on massive HTTP traffic data processing.
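The abstract does not say how MPPM organises its HashMap, so the following Python sketch is only one plausible reading: rules are stored in a dictionary keyed by host, each host holds a dictionary of path prefixes, and a URL is resolved with a few constant-time lookups instead of a scan over every pattern. The rule format, the action labels, the `example.com` hosts, and the function names `build_index` and `match_url` are all hypothetical.

```python
from urllib.parse import urlsplit


def build_index(rules):
    """rules: iterable of (host, path_prefix, action) tuples (hypothetical format)."""
    index = {}
    for host, path_prefix, action in rules:
        index.setdefault(host, {})[path_prefix] = action
    return index


def match_url(index, url):
    """Return the action of the longest matching path prefix, or None."""
    parts = urlsplit(url)
    by_path = index.get(parts.hostname)
    if by_path is None:
        return None
    path = parts.path or "/"
    # Walk from the full path towards "/" one segment at a time;
    # every step is a single hash lookup.
    while True:
        action = by_path.get(path)
        if action is not None:
            return action
        if path == "/":
            return None
        path = path.rpartition("/")[0] or "/"


index = build_index([
    ("www.example.com", "/search", "search"),
    ("www.example.com", "/", "browse"),
    ("video.example.com", "/play", "watch_video"),
])
action = match_url(index, "http://www.example.com/search/web?q=x")
```

Each URL is resolved independently of the others, so a batch of HTTP records could be split across workers without shared mutable state; that parallel step is left out of the sketch.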
The traditional multiple pattern matching algorithm, the deterministic finite state automaton, is implemented with a tree structure. A new algorithm is proposed that substitutes a sequential binary tree for the traditional tree. Experiments show that the algorithm has three features: its construction process is quick, its memory cost is small, and its searching process is as quick as that of the traditional algorithm. The algorithm is suitable for applications that require the patterns to be preprocessed dynamically.
Most Point Pattern Matching (PPM) algorithms perform poorly in the presence of point-position noise and outliers. This paper presents a novel and robust PPM algorithm that combines Point Pair Topological Characteristics (PPTC) and Spectral Matching (SM) to solve the aforementioned issues. First, PPTC, a new shape descriptor, is proposed. Next, a new comparability measurement based on PPTC is defined as the matching probability. Finally, the correct matching results are obtained by the spectral matching method. Experiments on synthetic data show its robustness in comparison with other state-of-the-art algorithms, and experiments on real-world data show its effectiveness.
Pattern matching is a fundamental approach to detecting malicious behaviors and information over the Internet, and it has gradually been adopted in high-speed network traffic analysis. However, multi-pattern matching on online compressed network traffic (CNT) is a performance bottleneck, because malicious and intrusion codes are often embedded in compressed network traffic. In this paper, we propose an online fast multi-pattern matching algorithm on compressed network traffic (FMMCN). FMMCN employs two types of jumping, i.e., jumping during the sliding window and a string jump scanning strategy, to skip unnecessary compressed bytes. Moreover, FMMCN can efficiently process large volumes of network traffic such as HTTP traffic, vehicle traffic, and other Internet-based services. The experimental results show that FMMCN can ignore more than 89.5% of bytes, and its maximum speed reaches 176.470 MB/s on a midrange switch device, which is faster than the current fastest algorithm, ACCH, by almost 73.15 MB/s.
Modern applications require large databases to be searched for regions that are similar to a given pattern. DNA sequence analysis, speech and text recognition, artificial intelligence, the Internet of Things, and many other applications depend heavily on pattern matching or similarity searches. In this paper, we discuss some of the string matching solutions developed in the past. Then, we present a novel mathematical model to search for a given pattern and its near approximations in the text.
This paper presents an efficient pattern matching algorithm (FSW). FSW improves the process of searching for a pattern in a text. It scans the text with the help of four sliding windows. Each window is equal in length to the pattern, allowing multiple alignments in the searching process. The text is divided into two parts; each part is scanned from both sides simultaneously using two sliding windows. The four windows slide in parallel in both parts of the text. The comparisons between the text and the pattern are made from both sides of the pattern in parallel. The experiments show that FSW achieves the best overall results in the number of attempts and the number of character comparisons compared with the following pattern matching algorithms: Two Sliding Windows (TSW), Enhanced Two Sliding Windows (ETSW), and Berry-Ravindran (BR). The best-case time complexity is calculated and found to be ??, while the average-case time complexity is ??.
Pattern matching is a very important topic in computer science. It has been used in various applications such as information retrieval, virus scanning, DNA sequence analysis, data mining, machine learning, network security, and pattern recognition. This paper presents a new pattern matching algorithm, Enhanced ERS-A, which is an improvement over the ERS-S algorithm. In ERS-A, two sliding windows are used to scan the text from the left and the right simultaneously. The proposed algorithm also scans the text from the left and the right simultaneously, and in addition compares the text with the pattern from both sides of the pattern in parallel. The shift technique used in Enhanced ERS-A is based on the four consecutive characters in the text immediately following the pattern window. The experimental results show that Enhanced ERS-A improves the pattern matching process by reducing the number of comparisons performed.
Anomaly detection has been an active research topic in the field of network intrusion detection for many years. A novel method is presented for anomaly detection based on system calls into the kernels of Unix or Linux systems. The method uses a data mining technique to model the normal behavior of a privileged program and uses a variable-length pattern matching algorithm to compare the current behavior with historic normal behavior, which is more suitable for this problem than the fixed-length pattern matching algorithm proposed by Forrest et al. At the detection stage, the particularity of the audit data is taken into account, and two alternative schemes can be used to distinguish between normal behavior and intrusions. The method attends to both computational efficiency and detection accuracy and is especially applicable to on-line detection. The performance of the method is evaluated using a typical testing data set, and the results show that it is significantly better than the anomaly detection method based on hidden Markov models proposed by Yan et al. and the method based on fixed-length patterns proposed by Forrest and Hofmeyr. The novel method has been applied to practical host-based intrusion detection systems and has achieved high detection performance.
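The abstract gives neither the mining procedure nor the two detection schemes, so the Python sketch below only illustrates the general idea of variable-length matching against a library of normal system-call sequences. It assumes, purely for illustration, that the library is the set of call subsequences of length 2 to 4 seen at least twice in normal traces, and that detection greedily consumes the longest library pattern at each position, counting every call that no pattern covers as a mismatch. The function names, parameters, and traces are hypothetical.

```python
from collections import Counter


def mine_patterns(traces, min_len=2, max_len=4, min_support=2):
    """Collect frequent call subsequences of several lengths from normal traces."""
    counts = Counter()
    for trace in traces:
        for n in range(min_len, max_len + 1):
            for i in range(len(trace) - n + 1):
                counts[tuple(trace[i:i + n])] += 1
    return {p for p, c in counts.items() if c >= min_support}


def mismatch_rate(trace, patterns, max_len=4):
    """Fraction of calls in trace not covered by any normal pattern."""
    i = 0
    mismatches = 0
    while i < len(trace):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(max_len, len(trace) - i), 0, -1):
            if tuple(trace[i:i + n]) in patterns:
                i += n
                break
        else:
            mismatches += 1  # no normal pattern starts here
            i += 1
    return mismatches / len(trace) if trace else 0.0


normal = [["open", "read", "write", "close"],
          ["open", "read", "write", "close"]]
library = mine_patterns(normal)
rate = mismatch_rate(["open", "read", "exec", "socket", "write", "close"], library)
```

A trace would then be flagged when its mismatch rate exceeds a chosen threshold; the threshold itself is a tuning decision that the sketch leaves open.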
An approach was proposed to evaluate preparation technology by means of fingerprint-peak matching technology based on high performance liquid chromatography with diode array detection (HPLC-DAD). Similarity analysis and hierarchical clustering analysis (HCA) were applied to identify 15 batches of Xiaochaihu granules from different manufacturers and our laboratory. Peak pattern matching between the composite formula and Radix Bupleuri Chinensis, one of the main ingredients of Xiaochaihu granules, was used to evaluate the preparation technology of Xiaochaihu granules via the relative deviation of retention time (RT) and the UV spectrum feature similarity of their corresponding peaks. Eleven matching peaks were found between Xiaochaihu granules and Radix Bupleuri Chinensis. However, saikosaponin A and saikosaponin D, which are important active components in Radix Bupleuri Chinensis, were not found in Xiaochaihu granules from any manufacturer. The peak areas of the 11 characteristic peaks of the Xiaochaihu granule samples formed an 11 × 15 matrix. The HCA results showed that the samples were divided into four categories, and samples from the same manufacturer were basically clustered into the same category. The results suggested that saikosaponin A and saikosaponin D are prone to structural transformation under decoction conditions and in the presence of organic acidic components. These active components, present in the raw herb, might transform into a series of non-active secondary saikosaponins due to unfavourable preparation technology. The conventional decoction-based preparation technology of Xiaochaihu granules might therefore greatly affect their quality and therapeutic effectiveness. This study demonstrates that fingerprint-peak matching technology can not only be used for quality control of this composite formula but also provide some guidance for the preparation technology of Xiaochaihu granules.
The identification of design pattern instances is important for program understanding and software maintenance. Aiming at the mining of design patterns in existing systems, this paper proposes a subgraph isomorphism approach to discover several design patterns in a legacy system at a time. The attributed relational graph is used to describe design patterns and legacy systems. The subgraph isomorphism approach consists of a decomposition process and a composition process. During the decomposition process, graphs corresponding to the design patterns are decomposed into subgraphs, some of which correspond to the elemental design patterns. The composition process tries to obtain a subgraph isomorphism of the matched graph once a subgraph isomorphism of each subgraph has been obtained. Owing to the common structures shared by design patterns, the proposed approach can reduce the number of times entities and relations are matched. Compared with the existing methods, the proposed algorithm is not linearly dependent on the number of design pattern graphs. Key words: design pattern mining; attributed relational graph; subgraph isomorphism. CLC number: TP 311.5. Foundation item: Supported by the National Natural Science Foundation of China (60273075) and the Science Foundation of Naval University of Engineering (HGDJJ03019). Biography: LI Qing-hua (1940-), male, Professor, research direction: parallel computing.
This paper discusses a potential application of fuzzy set theory, more specifically pattern matching, in assessing risk in chemical plants. Risk factors have been evaluated using linguistic representations of the quantity of the hazardous substance involved, its frequency of interaction with the environment, the severity of its impact, and the uncertainty involved in its detection in advance. For each linguistic value there is a corresponding membership function ranging over a universe of discourse. The risk scenario created by a hazard or hazardous situation having the highest degree of featural value is taken as the known pattern.
Network traffic analysis, which identifies traces of suspicious activity, has been the basis of network security and network management. With the continued emergence of new applications and encrypted traffic, the currently available approaches cannot perform well for all kinds of network data. In this paper, we propose a novel stream pattern matching technique that is not only easily deployed but also combines the advantages of different methods. The main idea is as follows: first, a formal description specification is defined, by which any series of data streams can be unambiguously described by a special stream pattern; then, a tree representation is constructed by parsing the stream pattern; finally, a stream pattern engine is constructed with the Non-finite automata (S-CG-NFA) and bit-parallel searching algorithms. Our stream pattern analysis system has been fully prototyped in the C programming language and on a Xilinx Virtex2 FPGA. The experimental results show that the method provides a high level of recognition efficiency and accuracy.
To solve the problem of data recovery on free disk sectors, an approach to data recovery based on intelligent pattern matching is proposed in this paper. Unlike methods based on the file directory, this approach utilizes the consistency among the data on the disk. A feature pattern library is established for different types of files according to the internal structure of the text. Data on sectors are classified automatically by data clustering and evaluation. When a conflict arises in data classification, it is resolved by adopting a context pattern. Based on this approach, the paper implements a data recovery system targeting pattern matching of TXT, Word, and PDF files. Raw and formatting recovery tests show that the system works well.
Given a set U consisting of strings defined on an alphabet Σ, string cross pattern matching is the problem of finding all the matches between every two strings in U. It is used in text processing tasks such as removing duplicate strings. This paper presents a fast string cross pattern matching algorithm based on extracting high-frequency strings. Compared with existing algorithms, including single-pattern and multi-pattern matching algorithms, this algorithm features both low time complexity and low space complexity. Because the Chinese character set is large and the average length of Chinese words is short, this algorithm is well suited to processing text written in Chinese, especially when the size of Σ is large and the number of strings is far greater than the maximum length of the strings in U.
Graph pattern matching (GPM) can be used to mine the key information in graphs. Exact GPM, one of the most commonly used GPM methods, aims to find exactly all subgraphs of a data graph that match a given query graph. Exact GPM has been widely used in biological data analysis, social network analysis, and other fields. In this paper, the applications of exact GPM are first introduced, and the research progress on exact GPM is summarized. Then, the related algorithms are introduced in detail, and experiments on state-of-the-art exact GPM algorithms are conducted to compare their performance. Based on the experimental results, the applicable scenarios of the algorithms are pointed out. Finally, new research opportunities in this area are proposed.
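As a small illustration of what finding all subgraphs for a query graph means, the Python sketch below enumerates every injective mapping from query nodes to data nodes that preserves the query's edges, using plain backtracking with a degree check for pruning. It is a textbook baseline rather than any of the state-of-the-art algorithms surveyed, and it assumes undirected, unlabeled graphs given as adjacency dictionaries. The function name and the example graphs are hypothetical.

```python
def find_subgraph_matches(query, data):
    """query, data: dict mapping node -> set of neighbours (undirected).

    Returns a list of dicts mapping each query node to a distinct data node
    such that every query edge is also an edge in the data graph.
    """
    q_nodes = list(query)
    matches = []

    def extend(mapping):
        if len(mapping) == len(q_nodes):
            matches.append(dict(mapping))
            return
        u = q_nodes[len(mapping)]  # next query node to place
        for v in data:
            if v in mapping.values():
                continue  # keep the mapping injective
            if len(data[v]) < len(query[u]):
                continue  # degree pruning: v has too few neighbours
            # Every already-mapped neighbour of u must be adjacent to v.
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                extend(mapping)
                del mapping[u]

    extend({})
    return matches


triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
data_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
matches = find_subgraph_matches(triangle, data_graph)
```

The same triangle is reported once per automorphism of the query, which is why one embedded triangle yields six mappings here; practical systems usually deduplicate or break this symmetry.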
By studying single pattern matching algorithms, five factors that affect the time complexity of such algorithms are analyzed. The five factors are: sorting the characters of the pattern string in increasing order of usage frequency, utilizing already-matched pattern suffix information, utilizing already-matched pattern prefix information, utilizing the position factor taken from the Quick Search algorithm, and utilizing the continue-skip idea, which is originally proposed in this paper. Combining all five factors, a new single pattern matching algorithm is implemented. Experiments show that the new algorithm is the most efficient of the algorithms compared.
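Of the five factors, the position factor comes from the published Quick Search algorithm (Sunday, 1990), so it can be shown on its own. The Python sketch below implements only standard Quick Search, not the combined algorithm of the abstract: after each attempt the window is shifted according to the text character immediately to the right of the window. The function name is an illustrative choice.

```python
def quick_search(pattern, text):
    """Return the start indices of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift for a character = distance from its rightmost occurrence in the
    # pattern to one position past the pattern's end.
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        if pos + m >= n:
            break  # no character to the right of the window
        # Characters absent from the pattern let the window jump past them.
        pos += shift.get(text[pos + m], m + 1)
    return hits


positions = quick_search("abc", "xxabcxabcabc")
```

The shift table depends only on the pattern, so it is built once and reused for any number of texts.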
Multi-pattern matching with wildcards is the problem of finding the occurrences of all patterns in a pattern set {p^1, …, p^k} in a given text t. If the percentage of wildcards in the pattern set is not high, this problem can be solved using finite automata. We introduce a multi-pattern matching algorithm with a fixed number of wildcards to handle patterns with a high percentage of wildcards. In our proposed method, patterns are matched as bit patterns using a sliding window approach. The window is a bit window that slides along the given text, matching against stored bit patterns. The matching process is executed using bitwise operations. The experimental results demonstrate that the percentage of wildcards does not affect the proposed algorithm's performance and that the proposed algorithm is more efficient than algorithms based on the fast Fourier transform. The proposed algorithm is simple to implement and runs efficiently in O(n + d(n/σ)(m/w)) time, where n is the text length, d is the symbol distribution over the k patterns, m is the pattern length, and σ is the alphabet size.
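The abstract does not specify the bit layout of the proposed method, so the Python sketch below uses the classical Shift-And technique instead, extended so that a wildcard position accepts any character. It shows how patterns can be stored as bit masks and matched with shifts and ANDs, and it makes no claim about the stated time bound. The wildcard symbol `?` and the function names are assumptions.

```python
def compile_pattern(pattern, wildcard="?"):
    """Return (per-character bit masks, wildcard mask, accepting bit)."""
    masks = {}
    wild = 0
    for i, ch in enumerate(pattern):
        if ch == wildcard:
            wild |= 1 << i  # this position matches any text character
        else:
            masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks, wild, 1 << (len(pattern) - 1)


def find_with_wildcards(patterns, text, wildcard="?"):
    """Return (start_index, pattern) for every match of every pattern."""
    compiled = [(p,) + compile_pattern(p, wildcard) for p in patterns if p]
    states = [0] * len(compiled)
    hits = []
    for j, ch in enumerate(text):
        for k, (p, masks, wild, accept) in enumerate(compiled):
            # Bit i is set when pattern[0..i] matches the text ending at j.
            states[k] = ((states[k] << 1) | 1) & (masks.get(ch, 0) | wild)
            if states[k] & accept:
                hits.append((j - len(p) + 1, p))
    return hits


found = find_with_wildcards(["a?c", "b?"], "abcaxcb")
```

Python integers are unbounded, so one integer holds the whole state of a pattern; in a fixed-width language the state would be split across machine words.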
Funding (FMPM study): Supported by the National Natural Science Foundation of China (62462036, 62462037), the Jiangxi Provincial Natural Science Foundation (20242BAB26017, 20232BAB202010), the Cultivation Project for Academic and Technical Leader in Major Disciplines in Jiangxi Province (20232BCJ22013), and the Jiangxi Province Graduate Innovation Fund Project (YC2024-S214).
Funding (underground mine robot mapping study): Supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A2C1011216).
Funding (MPPM study): Supported in part by the National Natural Science Foundation of China (61671078) and the Director Funds of Beijing Key Laboratory of Network System Architecture and Convergence (2017BKL-NSACZJ-06).
Funding (sequential binary tree automaton study): This project was supported by the National "863" High Technology Research and Development Program of China (2003AA142160) and the National Natural Science Foundation of China (60402019).
文摘The traditional multiple pattern matching algorithm, deterministic finite state automata, is implemented by tree structure. A new algorithm is proposed by substituting sequential binary tree for traditional tree. It is proved by experiment that the algorithm has three features, its construction process is quick, its cost of memory is small. At the same time, its searching process is as quick as the traditional algorithm. The algorithm is suitable for the application which requires preprocessing the patterns dynamically.
Abstract: Most Point Pattern Matching (PPM) algorithms perform poorly when point positions are noisy and outliers exist. This paper presents a novel and robust PPM algorithm that combines Point Pair Topological Characteristics (PPTC) and Spectral Matching (SM) to solve these issues. First, PPTC, a new shape descriptor, is proposed. Then, a new similarity measure based on PPTC is defined as the matching probability. Finally, the correct matching results are obtained by the spectral matching method. Experiments on synthetic data show the algorithm's robustness in comparison with other state-of-the-art algorithms, and experiments on real-world data show its effectiveness.
Funding: Supported by China MOST project (No. 2012BAH46B04).
Abstract: Pattern matching is a fundamental approach to detecting malicious behaviors and information over the Internet, and it has gradually come into use in high-speed network traffic analysis. However, multi-pattern matching on online compressed network traffic (CNT) is a performance bottleneck, because malicious and intrusion code is often embedded in compressed network traffic. In this paper, we propose an online fast multi-pattern matching algorithm on compressed network traffic (FMMCN). FMMCN employs two types of jumping, i.e., jumping during the sliding window and a string jump scanning strategy, to skip unnecessary compressed bytes. Moreover, FMMCN can efficiently process large volumes of traffic from multiple kinds of networks, such as HTTP traffic, vehicle traffic, and other Internet-based services. The experimental results show that FMMCN can ignore more than 89.5% of bytes, and its maximum speed reaches 176.470 MB/s on a midrange switch device, which is faster than the current fastest algorithm, ACCH, by almost 73.15 MB/s.
Abstract: Modern applications require large databases to be searched for regions that are similar to a given pattern. DNA sequence analysis, speech and text recognition, artificial intelligence, the Internet of Things, and many other applications depend heavily on pattern matching or similarity searches. In this paper, we discuss some of the string matching solutions developed in the past. Then, we present a novel mathematical model to search for a given pattern and its near approximations in the text.
Abstract: This paper presents an efficient pattern matching algorithm (FSW). FSW improves the process of searching for a pattern in a text. It scans the text with the help of four sliding windows. Each window is as long as the pattern, which allows multiple alignments during the search. The text is divided into two parts; each part is scanned from both sides simultaneously using two sliding windows. The four windows slide in parallel in both parts of the text. The comparisons between the text and the pattern are made from both sides of the pattern in parallel. The experiments show that FSW achieves the best overall results in the number of attempts and the number of character comparisons compared with the following pattern matching algorithms: Two Sliding Windows (TSW), Enhanced Two Sliding Windows (ETSW), and Berry-Ravindran (BR). The best-case time complexity is calculated and found to be ??, while the average-case time complexity is ??.
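The window layout described in the FSW abstract above can be sketched as follows. This is a simplified, sequential approximation and not the authors' implementation. It splits the candidate start positions (not the raw text) into two parts so that matches straddling the midpoint are not lost, it moves every window by one position instead of using the paper's shift function, and it simulates the four parallel windows in a single loop. The function names are hypothetical.

```python
def matches_at(text, pattern, start):
    """Compare the pattern with the window at `start` from both ends inward."""
    lo, hi = 0, len(pattern) - 1
    while lo <= hi:
        if text[start + lo] != pattern[lo] or text[start + hi] != pattern[hi]:
            return False
        lo += 1
        hi -= 1
    return True


def scan_part(text, pattern, first, last):
    """Scan start positions first..last with a left and a right window."""
    hits = set()
    left, right = first, last
    while left <= right:
        if matches_at(text, pattern, left):
            hits.add(left)
        if right != left and matches_at(text, pattern, right):
            hits.add(right)
        left += 1    # assumption: shift by one, no precomputed shift table
        right -= 1
    return hits


def four_window_search(text, pattern):
    """Return all start positions of pattern in text, in ascending order."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    last_start = n - m
    mid = last_start // 2
    # Two parts, each scanned by two windows: four windows in total.
    hits = scan_part(text, pattern, 0, mid) | scan_part(text, pattern, mid + 1, last_start)
    return sorted(hits)
```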
Abstract: Pattern matching is a very important topic in computer science. It has been used in various applications such as information retrieval, virus scanning, DNA sequence analysis, data mining, machine learning, network security, and pattern recognition. This paper presents a new pattern matching algorithm, Enhanced ERS-A, which is an improvement over the ERS-S algorithm. In ERS-A, two sliding windows are used to scan the text from the left and the right simultaneously. The proposed algorithm likewise scans the text from the left and the right simultaneously, and in addition compares the text with the pattern from both sides of the pattern in parallel. The shift technique used in Enhanced ERS-A is based on the four consecutive characters in the text immediately following the pattern window. The experimental results show that Enhanced ERS-A improves the pattern matching process by reducing the number of comparisons performed.
Funding: Supported by the National Grand Fundamental Research "973" Program of China (2004CB318109), the National High-Technology Research and Development Plan of China (2006AA01Z452), and the National Information Security "242" Program of China (2005C39).
Abstract: Anomaly detection has been an active research topic in the field of network intrusion detection for many years. A novel method is presented for anomaly detection based on system calls into the kernels of Unix or Linux systems. The method uses a data mining technique to model the normal behavior of a privileged program and uses a variable-length pattern matching algorithm to compare the current behavior with historic normal behavior, which is more suitable for this problem than the fixed-length pattern matching algorithm proposed by Forrest et al. At the detection stage, the particularity of the audit data is taken into account, and two alternative schemes can be used to distinguish between normal behavior and intrusions. The method balances computational efficiency and detection accuracy and is especially applicable to online detection. The performance of the method is evaluated using a typical testing data set, and the results show that it is significantly better than the anomaly detection method based on hidden Markov models proposed by Yan et al. and the method based on fixed-length patterns proposed by Forrest and Hofmeyr. The novel method has been applied to practical host-based intrusion detection systems and has achieved high detection performance.
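To make the idea of variable-length pattern matching on system call traces concrete, the sketch below covers a trace with patterns from a library of normal behavior, always trying the longest pattern first, and reports the fraction of calls that no pattern covers. The greedy strategy, the mismatch ratio as an anomaly score, and the sample patterns are assumptions made for illustration. The abstract does not specify the mining step or its two decision schemes.

```python
def mismatch_ratio(trace, normal_patterns):
    """Fraction of system calls not covered by any normal pattern.

    trace: list of system call names.
    normal_patterns: set of tuples of varying length (hypothetical library).
    """
    if not trace:
        return 0.0
    max_len = max((len(p) for p in normal_patterns), default=0)
    i = 0
    mismatches = 0
    while i < len(trace):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(trace) - i), 0, -1):
            if tuple(trace[i:i + length]) in normal_patterns:
                i += length
                break
        else:
            mismatches += 1  # no normal pattern starts here
            i += 1
    return mismatches / len(trace)


# Hypothetical library of normal patterns with different lengths.
NORMAL = {("open", "read", "close"), ("open", "write"), ("stat",)}
```

A detector built on this sketch would compare the ratio over a window of recent calls against a threshold chosen from training data; that threshold is likewise an assumption and not part of the source.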
Funding: Supported by the Fundamental Research Funds for the Central Universities.
Abstract: An approach is proposed for evaluating preparation technology by means of fingerprint-peak matching with high performance liquid chromatography with diode array detection (HPLC-DAD). Similarity analysis and hierarchical clustering analysis (HCA) were applied to identify 15 batches of Xiaochaihu granules from different manufacturers and from our laboratory. Peak pattern matching between the composite formula and Radix Bupleuri Chinensis, one of the main ingredients of Xiaochaihu granules, was used to evaluate the preparation technology of Xiaochaihu granules through two indexes: the relative deviation of retention time (RT) and the UV spectrum feature similarity of corresponding peaks. Eleven matching peaks were found between Xiaochaihu granules and Radix Bupleuri Chinensis. However, saikosaponin A and saikosaponin D, which are important active components of Radix Bupleuri Chinensis, were not found in Xiaochaihu granules from any manufacturer. The peak areas of the 11 characteristic peaks of the Xiaochaihu granule samples formed an 11 × 15 matrix. The HCA results showed that the samples fell into four categories, and samples from the same manufacturer were mostly clustered into the same category. The results suggest that saikosaponin A and saikosaponin D are prone to structural transformation under decoction conditions and in the presence of organic acidic components. These active components, which exist in the raw herb, might be transformed into a series of non-active secondary saikosaponins by unfavourable preparation technology. The conventional decoction-based preparation technology of Xiaochaihu granules might therefore greatly affect their quality and therapeutic effectiveness. This study demonstrates that fingerprint-peak matching can be used not only for quality control of this composite formula but also to provide guidance for the preparation technology of Xiaochaihu granules.
Abstract: The identification of design pattern instances is important for program understanding and software maintenance. Aiming at the mining of design patterns in existing systems, this paper proposes a subgraph isomorphism approach that discovers several design patterns in a legacy system at once. Attributed relational graphs are used to describe design patterns and legacy systems. The subgraph isomorphism approach consists of a decomposition process and a composition process. During decomposition, the graphs corresponding to the design patterns are decomposed into subgraphs, some of which correspond to elemental design patterns. The composition process tries to obtain a subgraph isomorphism of the matched graph once a subgraph isomorphism of each subgraph has been obtained. Because design patterns share common structures, the proposed approach can reduce the number of times entities and relations are matched. In contrast to existing methods, the running time of the proposed algorithm does not depend linearly on the number of design pattern graphs. Key words: design pattern mining; attributed relational graph; subgraph isomorphism. CLC number: TP 311.5. Foundation item: Supported by the National Natural Science Foundation of China (60273075) and the Science Foundation of Naval University of Engineering (HGDJJ03019). Biography: LI Qing-hua (1940-), male, Professor, research direction: parallel computing.
Abstract: This paper discusses the potential application of fuzzy set theory, more specifically pattern matching, in assessing risk in chemical plants. Risk factors have been evaluated using linguistic representations of the quantity of the hazardous substance involved, its frequency of interaction with the environment, the severity of its impact, and the uncertainty involved in its detection in advance. For each linguistic value there is a corresponding membership function ranging over a universe of discourse. The risk scenario created by a hazard or hazardous situation having the highest degree of featural value is taken as the known pattern.
Funding: This work is supported by the following projects: National Natural Science Foundation of China grant 60772136, 111 Development Program of China No. B08038, and National Science & Technology Pillar Program of China No. 2008BAH22B03 and No. 2007BAH08B01.
Abstract: Network traffic analysis, which identifies traces of suspicious activity, is the basis of network security and network management. With the continued emergence of new applications and encrypted traffic, currently available approaches cannot perform well for all kinds of network data. In this paper, we propose a novel stream pattern matching technique that is easily deployed and combines the advantages of different methods. The main idea is as follows: first, a formal description specification is defined, by which any series of data streams can be unambiguously described by a special stream pattern; then, a tree representation is constructed by parsing the stream pattern; finally, a stream pattern engine is constructed with nondeterministic finite automata (S-CG-NFA) and bit-parallel searching algorithms. Our stream pattern analysis system has been fully prototyped in the C programming language and on a Xilinx Virtex2 FPGA. The experimental results show that the method provides a high level of recognition efficiency and accuracy.
Abstract: To solve the problem of data recovery on free disk sectors, this paper proposes a data recovery approach based on intelligent pattern matching. Unlike methods based on the file directory, this approach exploits the consistency among the data on the disk. A feature pattern library is established for different file types according to the internal structure of their text. Data on sectors is classified automatically by data clustering and evaluation. When a conflict occurs in data classification, it is resolved by applying context patterns. Based on this approach, the paper implements a data recovery system that targets pattern matching of txt, Word, and PDF files. Raw and formatting recovery tests show that the system works well.
Abstract: Given a set U consisting of strings defined over an alphabet Σ, string cross pattern matching finds all the matches between every two strings in U. It is used in text processing tasks such as removing duplicate strings. This paper presents a fast string cross pattern matching algorithm based on extracting high-frequency strings. Compared with existing algorithms, including single-pattern and multi-pattern matching algorithms, this algorithm features both low time complexity and low space complexity. Because the Chinese character set is large and Chinese words are short on average, this algorithm is well suited to processing Chinese text, especially when the size of Σ is large and the number of strings in U is far greater than the maximum string length in U.
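The definition of string cross pattern matching in the abstract above can be written out directly as a brute-force baseline. This is only a reference for what the problem asks for; it is not the paper's algorithm based on high-frequency strings and has none of its complexity advantages. The function name `cross_matches` and the output format are assumptions.

```python
def cross_matches(strings):
    """Return (i, j, pos) for every occurrence of strings[i] inside strings[j], i != j."""
    result = []
    for i, pattern in enumerate(strings):
        for j, text in enumerate(strings):
            if i == j:
                continue
            pos = text.find(pattern)
            while pos != -1:
                result.append((i, j, pos))
                pos = text.find(pattern, pos + 1)  # allow overlapping matches
    return result
```

For example, with `U = ["模式", "模式匹配", "匹配"]`, the first and third strings both occur inside the second, which is how such a matcher can flag redundant entries in a word list.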
Abstract: Graph pattern matching (GPM) can be used to mine the key information in graphs. Exact GPM, one of the most commonly used GPM methods, aims to find exactly all subgraphs of a data graph that match a given query graph. Exact GPM has been widely used in biological data analysis, social network analysis, and other fields. In this paper, the applications of exact GPM are first introduced, and the research progress on exact GPM is summarized. Then, the related algorithms are introduced in detail, and experiments on state-of-the-art exact GPM algorithms are conducted to compare their performance. Based on the experimental results, the applicable scenarios of the algorithms are pointed out. Finally, new research opportunities in this area are proposed.
Funding: The National Natural Science Foundation of China (Nos. 60502032 and 60672068).
Abstract: By studying single pattern matching algorithms, five factors that affect the time complexity of an algorithm are analyzed. The five factors are: sorting the characters of the pattern string in increasing order of usage frequency, using information about the already-matched pattern suffix, using information about the already-matched pattern prefix, using the position factor borrowed from the quick search algorithm, and using the continue-skip idea first proposed in this paper. Combining all five factors, a new single pattern matching algorithm is implemented. Experiments show that the new algorithm is the most efficient of the algorithms compared.
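Of the five factors listed in the abstract above, the one borrowed from the quick search algorithm can be shown in isolation. The sketch below is the classic Quick Search (Sunday) shift rule: after each attempt, the window moves according to the text character just past the window. It does not include the other four factors and is not the combined algorithm the paper proposes.

```python
def quick_search(text, pattern):
    """Return all start positions of pattern in text using the Quick Search shift."""
    n, m = len(text), len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")
    # Shift for character c: distance from its rightmost occurrence in the
    # pattern to the position just past the pattern; m + 1 if c is absent.
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        if pos + m >= n:
            break  # no character after the window to base a shift on
        pos += shift.get(text[pos + m], m + 1)
    return hits
```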
Funding: Supported by the European Framework Program (FP7) (FP7-PEOPLE-2011-IRSES) and the National Sci-Tech Support Plan of China (2014BAH02F03).
Abstract: Multi-pattern matching with wildcards is the problem of finding all occurrences of every pattern in a pattern set {p^1, …, p^k} in a given text t. If the percentage of wildcards in the pattern set is not high, this problem can be solved using finite automata. We introduce a multi-pattern matching algorithm with a fixed number of wildcards that copes with a high percentage of wildcards in the patterns. In our proposed method, patterns are matched as bit patterns using a sliding window approach. The window is a bit window that slides along the given text, matching against the stored bit patterns. The matching process is executed using bitwise operations. The experimental results demonstrate that the percentage of wildcards does not affect the proposed algorithm's performance and that the proposed algorithm is more efficient than the algorithms based on the fast Fourier transform. The proposed algorithm is simple to implement and runs efficiently in O(n + d(n/σ)(m/w)) time, where n is the text length, d is the symbol distribution over the k patterns, m is the pattern length, and σ is the alphabet size.
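To show how bitwise operations can match patterns containing wildcards, here is a small sketch based on the well-known Shift-And technique, extended so that a wildcard position accepts any character. It processes each pattern with its own bit state and is not the algorithm of the paper above, so it does not achieve the stated running time. The wildcard symbol `?` and the function names are assumptions.

```python
WILDCARD = "?"  # assumed wildcard symbol; matches any single character


def compile_pattern(pattern):
    """Build per-character bit masks, the wildcard mask, and the accept bit."""
    masks = {}
    wild = 0
    for i, ch in enumerate(pattern):
        if ch == WILDCARD:
            wild |= 1 << i          # position i accepts every character
        else:
            masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks, wild, 1 << (len(pattern) - 1)


def find_all(text, patterns):
    """Return {pattern: [start positions]} for all non-empty patterns."""
    compiled = {p: compile_pattern(p) for p in patterns if p}
    states = {p: 0 for p in compiled}
    hits = {p: [] for p in compiled}
    for j, ch in enumerate(text):
        for p, (masks, wild, accept) in compiled.items():
            # Bit i of the state is set when pattern[0..i] matches the
            # text ending at position j.
            state = ((states[p] << 1) | 1) & (masks.get(ch, 0) | wild)
            states[p] = state
            if state & accept:
                hits[p].append(j - len(p) + 1)
    return hits
```

Because a wildcard only sets extra bits in the mask, the per-character work is the same whether a pattern has no wildcards or many, which mirrors the abstract's observation that the wildcard percentage does not affect performance.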