443 articles found
Parallel Quick Search Algorithm for the Exact String Matching Problem Using OpenMP
1
Authors: Sinan Sameer Mahmood Al-Dabbagh, Nawaf Hazim Barnouti, Mustafa Abdul Sahib Naser, Zaid G. Ali. Journal of Computer and Communications, 2016, Issue 13, pp. 1-11 (11 pages)
String matching is one of the essential problems in computer science, and a variety of computer applications provide a string matching service to their end users. The remarkable growth in the amount of data created and kept by modern computational devices pushes researchers toward ever more powerful methods for this problem. In this research, the Quick Search string matching algorithm is adapted for a multi-core environment using OpenMP directives, which can reduce the overall execution time of the program. English text, protein, and DNA data are used to examine the effect of parallelizing the Quick Search algorithm on a multi-core platform. Experimental outcomes reveal that the overall performance of the algorithm improves, and the gain in execution time is considerable enough to recommend the multi-core environment as a suitable platform for parallelizing the Quick Search string matching algorithm.
Keywords: string matching; pattern matching; string searching; algorithms; Quick Search algorithm; exact string matching algorithm; parallelization; OpenMP
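As a rough illustration of the technique named in this abstract, the sketch below shows a sequential Quick Search (the bad-character rule applied to the character just past the current window) plus a chunked driver that mimics the kind of data decomposition a multi-core version might use. It is only a sketch under stated assumptions: the function names, the `num_chunks` parameter, and the overlap-by-`m - 1` chunking are illustrative choices, not the paper's OpenMP implementation, and the chunks here run one after another rather than in parallel.

```python
def quick_search(pattern: str, text: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table: for each pattern character, the distance from its
    # rightmost occurrence to the position just past the pattern end.
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        nxt = pos + m  # character immediately after the current window
        if nxt >= n:
            break
        # Characters absent from the pattern allow a full shift of m + 1.
        pos += shift.get(text[nxt], m + 1)
    return matches


def chunked_quick_search(pattern: str, text: str, num_chunks: int = 4) -> list[int]:
    """Search independent chunks (hypothetical decomposition, run serially here)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    size = -(-n // num_chunks)  # ceiling division
    matches = []
    for start in range(0, n, size):
        # Overlap each chunk by m - 1 characters so matches that straddle
        # a chunk boundary are found exactly once.
        end = min(n, start + size + m - 1)
        matches.extend(start + i for i in quick_search(pattern, text[start:end]))
    return matches
```

Because each chunk is searched independently, the loop body in `chunked_quick_search` is the part that a threaded or OpenMP-style implementation would distribute across cores.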
A Mathematical Solution to String Matching for Big Data Linking (Cited by: 1)
2
Authors: Kevin McCormack, Mary Smyth. Journal of Statistical Science and Application, 2017, Issue 2, pp. 39-55 (17 pages)
This paper describes how data records can be matched across large datasets using a technique called the Identity Correlation Approach (ICA). The ICA technique is then compared with a string matching exercise. Both the string matching exercise and the ICA technique were employed for a big data project carried out by the CSO. The project was called the SESADP (Structure of Earnings Survey Administrative Data Project) and involved linking the 2011 Irish Census dataset to a large Public Sector Dataset. The ICA technique provides a mathematical tool to link the datasets, and the matching rate for an exact match can be calculated before the matching process begins. Based on the number of variables and the size of the population, the matching rate is calculated in the ICA approach from the MRUI (Matching Rate for Unique Identifier) formula, and false positives are eliminated. No string matching is used in the ICA, so names are not required on the dataset, which makes the data more secure and ensures confidentiality. The SESADP was highly successful using the ICA technique. A comparison of the results obtained with the string matching exercise and with the ICA for the SESADP is discussed here.
Keywords: Big Data; Data Linking; Identity Correlation Approach; string matching; Public Sector Datasets; Data Privacy
Energy Cost Minimization Using String Matching Algorithm in Geo-Distributed Data Centers
3
Authors: Muhammad Imran Khan Khalil, Syed Adeel Ali Shah, Izaz Ahmad Khan, Mohammad Hijji, Muhammad Shiraz, Qaisar Shaheen. Computers, Materials & Continua (SCIE, EI), 2023, Issue 6, pp. 6305-6322 (18 pages)
Data centers are being distributed worldwide by cloud service providers (CSPs) to save energy costs through efficient workload allocation strategies. Many CSPs are challenged by the significant rise in user demands due to their extensive energy consumption during workload processing. Numerous research studies have examined distinct operating cost mitigation techniques for geo-distributed data centers (DCs). However, operating cost savings during workload processing that also consider string-matching techniques in geo-distributed DCs remain unexplored. In this research, we propose a novel string matching-based geographical load balancing (SMGLB) technique to mitigate the operating cost of the geo-distributed DC. The primary goal of this study is to use a string-matching algorithm (i.e., Boyer Moore) to compare the contents of incoming workloads to those of documents that have already been processed in a data center. A successful match prevents the global load balancer from sending the user’s request to a data center for processing; instead, the results of the previously processed workload are displayed to the user, which saves energy. Conversely, if no match can be found, the global load balancer allocates the incoming workload to a specific DC for processing, considering variable energy prices, the number of active servers, on-site green energy, and traces of incoming workload. The results of numerical evaluations show that SMGLB can reduce the operating expenses of geo-distributed data centers more than existing workload distribution techniques.
Keywords: string matching; optimization; geo-distributed data centers; geographical load balancing; green energy
Screen Content Coding with Primary and Secondary Reference Buffers for String Matching and Copying
4
Authors: Tao Lin, Kailun Zhou, Liping Zhao. ZTE Communications, 2015, Issue 4, pp. 53-60 (8 pages)
A screen content coding (SCC) algorithm that uses a primary reference buffer (PRB) and a secondary reference buffer (SRB) for string matching and string copying is proposed. The PRB is typically the traditional reconstructed picture buffer, which provides reference string pixels for the current pixels being coded. The SRB stores a few recently and frequently referenced pixels for repeated reference by the current pixels being coded. In the encoder, the search for an optimal reference string is performed in both the PRB and the SRB, and either a PRB or an SRB string is selected as the optimal reference string on a string-by-string basis. Compared with the HM-16.4+SCM-40 reference software, the proposed SCC algorithm improves coding performance, measured by bit-distortion rate reduction, by an average of 4.19% in the all-intra configuration for the "text and graphics with motion" category of test sequences defined by the JCT-VC common test conditions.
Keywords: HEVC; Image Coding; Screen Content Coding; string matching; Video Coding
Memory Efficient String Matching Algorithm for Network Intrusion Management System (Cited by: 9)
5
Authors: 余建明, 薛一波, 李军. Tsinghua Science and Technology (SCIE, EI, CAS), 2007, Issue 5, pp. 585-593 (9 pages)
As the core algorithm and the most time-consuming part of almost every modern network intrusion management system (NIMS), string matching is essential for inspecting network flows at line speed. This paper presents a memory- and time-efficient string matching algorithm specifically designed for NIMS on commodity processors. Modifications of the Aho-Corasick (AC) algorithm based on the distribution characteristics of NIMS patterns drastically reduce memory usage without sacrificing speed in software implementations. In tests on the Snort pattern set and traces that represent typical NIMS workloads, Snort performance was enhanced by 1.48%-20% compared to other well-known alternatives, with an automaton size reduction of 4.86-6.11 compared to the standard AC implementation. The results show that the special characteristics of the NIMS can be exploited very effectively to optimize the algorithm design.
Keywords: string matching; network intrusion management system (NIMS); Aho-Corasick (AC) algorithm
Fast algorithm on string cross pattern matching
6
Authors: Liu Gongshen, Li Jianhua, Li Shenghong. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2005, Issue 1, pp. 179-186 (8 pages)
Given a set U consisting of strings defined over an alphabet Σ, string cross pattern matching finds all the matches between every two strings in U. It is used in text processing tasks such as removing duplicate strings. This paper presents a fast string cross pattern matching algorithm based on extracting high-frequency strings. Compared with existing algorithms, including single-pattern and multi-pattern matching algorithms, this algorithm features both low time complexity and low space complexity. Because the Chinese character set is large and Chinese words are short on average, this algorithm is better suited to processing text written in Chinese, especially when the size of Σ is large and the number of strings is far greater than the maximum length of the strings in U.
Keywords: pattern matching; high frequency string; string cross pattern matching
CUSMART: effective parallelization of string matching algorithms using GPGPU accelerators
7
Authors: Adnan OZSOY, Mengu NAZLI, Onur CANKUR, Cagri SAHIN. Frontiers of Information Technology & Electronic Engineering, 2025, Issue 6, pp. 877-895 (19 pages)
This study presents a parallel version of the string matching algorithms research tool (SMART) library, implemented on NVIDIA’s compute unified device architecture (CUDA) platform, and uses general-purpose computing on graphics processing unit (GPGPU) programming concepts to enhance performance and gain insight into the parallel versions of these algorithms. We have developed the CUDA-enhanced SMART (CUSMART) library, which incorporates parallelized iterations of 64 string matching algorithms, leveraging the CUDA application programming interface. The performance of these algorithms has been assessed across various scenarios to ensure a comprehensive and impartial comparison, allowing for the identification of their strengths and weaknesses in specific application contexts. We have explored and established optimization techniques to gauge their influence on the performance of these algorithms. The results of this study highlight the potential of GPGPU computing in string matching applications through the scalability of algorithms, suggesting significant performance improvements. Furthermore, we have identified the best and worst performing algorithms in various scenarios.
Keywords: string matching; Parallel programming; Graphics processing unit (GPU) programming; General-purpose computing on GPU (GPGPU); NVIDIA Compute unified device architecture (CUDA); string matching algorithms research tool (SMART)
Surface reconstruction of complex contour lines based on chain code matching technique (Cited by: 1)
8
Author: 姜晓彤. Journal of Southeast University (English Edition) (EI, CAS), 2005, Issue 4, pp. 432-435 (4 pages)
A new method for solving the tiling problem of surface reconstruction is proposed. The method uses a snake algorithm to segment the original images; the contours are then transformed into strings using Freeman's chain code. A symbolic string matching technique is applied to establish a correspondence between two consecutive contours. The surface is composed of the pieces reconstructed from the corresponding points. Experimental results show that the proposed method yields good surface reconstruction quality and that its time complexity is proportional to mn, where m and n are the numbers of vertices of the two consecutive slices, respectively.
Keywords: chain code; string matching; surface reconstruction; local shape feature
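A Parallel String Matching Algorithm in Computational Storage Devices
9
Authors: 张东阳, 刘东石, 苏攀, 马玉梅, 王其乐. 计算机技术与发展, 2025, Issue 8, pp. 25-35 (11 pages)
Traditional string matching algorithms suffer a sharp rise in running time in the worst case, which becomes a performance bottleneck. They also tend to involve frequent movement and manipulation of large volumes of data, and these limitations become more pronounced for data-intensive applications and under input/output (IO) performance constraints. To address the heavy data movement and the worst-case performance bottleneck of traditional string matching solutions, a method based on a computational storage device (CSD) is proposed. By deploying an embedded processing engine inside the storage device, the method moves computation to the storage side, greatly reducing data transfer between the processing unit and the storage unit and thereby significantly improving overall computational efficiency. A field programmable gate array (FPGA) serves as the embedded processing engine of the CSD, and its parallel processing capability is used to design an efficient parallel algorithm for exact string matching. String matching is completed while the FPGA reads the data, which eliminates additional time overhead in the matching process. Experimental results show that the CSD-based method offers a significant performance advantage and provides a new solution to string matching in big-data environments.
Keywords: string matching; computational storage device; field programmable gate array; parallel; algorithm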
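A Toponym Similarity Measure Considering Position Differences of Overlapping Characters
10
Authors: 姜宇荣, 高苏, 蔡忠亮, 王巧. 武汉大学学报(信息科学版) (PKU Core), 2025, Issue 7, pp. 1425-1434 (10 pages)
Common similarity measures for Chinese toponym matching either consider only the number of overlapping characters and ignore their positions, or reflect positional order without considering overlapping characters. By considering the correspondence between the overlapping characters of two Chinese toponym strings and the gap between their positions, a new distance measure and a new similarity measure are constructed. They combine overlapping characters and positional order to compute the offset distance and the offset-distance similarity of two toponyms. For the case where overlapping characters recur, a minimum-offset-distance principle is established and a scheme that matches all occurrences in order is designed. For the case where a character fragment is shifted, the distance measure is adjusted so that it agrees better with the intuitive perception of similarity between two toponyms. The distance measure satisfies positive definiteness and symmetry but not the triangle inequality. Comparative tests against the Jaccard coefficient and edit-distance similarity show that the proposed offset-distance algorithm characterizes similarity in finer detail: it detects position differences among overlapping characters while giving more weight to differences in non-overlapping characters. In the toponym matching experiment, the matching accuracy and running time were 63.64% and 2940.56 s, respectively, both better than those of the Jaccard coefficient and edit-distance similarity.
Keywords: similarity measure; approximate string matching; natural language processing; toponym matching; Chinese toponyms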
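The two baselines this abstract compares against can be sketched in a few lines, which also makes the stated gap concrete: a character-set Jaccard coefficient ignores position entirely, while edit distance is sensitive to order. This is a minimal sketch, not the paper's offset-distance measure, which the abstract does not specify in enough detail to reproduce. Treating each string as a set of characters and normalizing edit distance by the longer string's length are assumptions made here for illustration.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard coefficient over the sets of characters in a and b."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or keep
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1] (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


# Same characters, swapped order: Jaccard sees no difference,
# while edit-distance similarity treats the strings as fully different.
print(jaccard_similarity("南京", "京南"), edit_similarity("南京", "京南"))
```

The swapped two-character pair is a made-up example, chosen only because it separates the two measures as far as possible.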
A Novel Mathematical Model for Similarity Search in Pattern Matching Algorithms (Cited by: 1)
11
Author: P. Vinod-Prasad. Journal of Computer and Communications, 2020, Issue 9, pp. 94-99 (6 pages)
Modern applications require large databases to be searched for regions that are similar to a given pattern. DNA sequence analysis, speech and text recognition, artificial intelligence, the Internet of Things, and many other applications depend heavily on pattern matching or similarity searches. In this paper, we discuss some of the string matching solutions developed in the past. Then, we present a novel mathematical model to search for a given pattern and its near approximations in the text.
Keywords: string matching; Pattern matching; Similarity Search; Substring Search
Multi-Pattern Matching Algorithm with Wildcards Based on Bit-Parallelism
12
Authors: Ahmed A. F. Saif, HU Liang, CHU Jianfeng. Wuhan University Journal of Natural Sciences (CAS, CSCD), 2017, Issue 2, pp. 178-184 (7 pages)
Multi-pattern matching with wildcards is the problem of finding the occurrences of all patterns in a pattern set {p^1, ..., p^k} in a given text t. If the percentage of wildcards in the pattern set is not high, this problem can be solved using finite automata. We introduce a multi-pattern matching algorithm with a fixed number of wildcards to cope with a high percentage of wildcards in the patterns. In the proposed method, patterns are matched as bit patterns using a sliding window approach. The window is a bit window that slides along the given text, matching against stored bit patterns. The matching process is executed using bitwise operations. The experimental results demonstrate that the percentage of wildcard occurrence does not affect the performance of the proposed algorithm and that the algorithm is more efficient than algorithms based on the fast Fourier transform. The proposed algorithm is simple to implement and runs efficiently in O(n + d(n/σ)(m/w)) time, where n is the text length, d is the symbol distribution over the k patterns, m is the pattern length, and σ is the alphabet size.
Keywords: multi-pattern string matching; wildcard; bit-parallelism
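Research on the Application of the String B-tree in Software Reuse
13
Authors: 姚全珠, 罗亚红, 孙越. 计算机工程与应用 (CSCD, PKU Core), 2004, Issue 29, pp. 166-168 (3 pages)
This paper proposes using the String B-tree to solve the parameterized pattern matching problem in software reuse. By applying a transformation to parameterized strings, the String B-tree, a special data structure, can improve matching efficiency. The paper has two main parts: the first introduces the advantages of the String B-tree and its construction process, and the second explains how to use the String B-tree to solve the parameterized pattern matching problem.
Keywords: String B-tree; parameterized string; parameterized pattern matching; p-match; p-occurrence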
Improving Classification Performance with Single-category Concept Match
14
Authors: 尹中航, Wang Yongcheng, Song Juping, Cai Wei. High Technology Letters (EI, CAS), 2001, Issue 4, pp. 20-22 (3 pages)
Setting aside increasingly complicated algorithms, this paper presents a new classification algorithm based on single-category concept matching. It also introduces the method for finding such concepts, which is important to the algorithm. Experimental results show that it can improve classification precision and speed up classification to some extent.
Keywords: Subject concept; string match; Information processing
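An Improved q-gram-Based Multi-Byte Exact String Matching Algorithm
15
Authors: 刘荔, 庞俊奇, 张磊, 谭秋林. 计算机仿真, 2025, Issue 8, pp. 307-311 (5 pages)
With the rapid growth of data traffic in the Internet era, multi-pattern matching algorithms, an important network security technology, need to identify network intrusions more accurately and quickly. To improve the matching efficiency of multi-pattern matching algorithms and obtain larger shift distances, an improved q-gram-based multi-byte exact string matching algorithm is designed. By improving the element structure of the hash table, it removes the jump restrictions imposed by string prefixes and suffixes. A sliding window is also introduced to extract character blocks from the text to be matched, converting the multi-pattern matching algorithm into multiple single-pattern matching algorithms that run in parallel, which reduces the time and computational cost of matching. Theoretical analysis and experimental verification show that the algorithm achieves 100% accuracy and is less affected by factors such as pattern length and the text character set, with significant improvements in matching efficiency, accuracy, and stability.
Keywords: multi-byte fragment; trie; sliding window; exact string matching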
A Fast Pattern Matching Algorithm Using Changing Consecutive Characters
16
Authors: Amjad Hudaib, Dima Suleiman, Arafat Awajan. Journal of Software Engineering and Applications, 2016, Issue 8, pp. 399-411 (13 pages)
Pattern matching is a very important problem in many applications, such as search engines and DNA analysis; its aim is to find a pattern in a text. This paper proposes a Pattern Matching Algorithm Using Changing Consecutive Characters (PMCCC) to make the searching process faster. PMCCC enhances the shift process that determines how the pattern moves when a mismatch occurs between the pattern and the text. It enhances the Berry Ravindran (BR) shift function by using m consecutive characters, where m is the pattern length. The formal basis and the algorithms are presented. The experimental results show that PMCCC improves the searching process by reducing the number of comparisons and the number of attempts. Comparing the results of PMCCC with those of other related algorithms shows significant improvements in the average number of comparisons and the average number of attempts.
Keywords: Pattern; Pattern Matching Algorithms; String Matching; Berry Ravindran; EBR; RS-A; Fast Pattern Matching Algorithms
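A Web Page Similarity Deduplication Algorithm Based on Local Tag Tree Matching
17
Author: 邱紫韵. 西安文理学院学报(自然科学版), 2025, Issue 2, pp. 16-21 (6 pages)
When a search engine deduplicates web pages by similarity, similar content may be merged into a single link, which lowers the precision of search results and makes it hard for users to find the page they actually want. To optimize search engines and improve user experience, a web page similarity deduplication algorithm based on local tag tree matching is proposed. Crawler technology is used to fetch web page content, and the HTML structure of each page is parsed into a tag tree in which each node represents an HTML tag. For each tag node, keywords are selected with the term frequency-inverse document frequency (TF-IDF) algorithm and key sentences are extracted using these keywords, which yields the feature string of the page's local tag tree. The LCS algorithm is used to traverse the entire local tag tree and compute the similarity between tag nodes; pages with high similarity are then removed, which completes the deduplication. Experiments show that the algorithm extracts feature strings efficiently and that both the F-value and the recall stay above 95% during deduplication, so it can effectively perform web page similarity deduplication.
Keywords: local tag tree matching; web page similarity deduplication; LCS algorithm; feature string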
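The similarity step described in this abstract can be illustrated with a small sketch, assuming that "LCS" here means longest common subsequence and that each feature string is an ordinary character string. The function names and the normalization by the longer string's length are hypothetical choices for illustration. The abstract does not say how the score is normalized or which threshold decides that two pages are duplicates, so neither is modeled here.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-wise DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)            # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))   # skip a character of a or b
        prev = cur
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """LCS length divided by the longer string's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else lcs_length(a, b) / longest
```

With this normalization, identical feature strings score 1.0 and strings sharing no characters score 0.0, so a deduplication pass would compare the score against a cutoff of its own choosing.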
State of the Art for String Analysis and Pattern Search Using CPU and GPU Based Programming
18
Authors: Mario Góngora-Blandón, Miguel Vargas-Lombardo. Journal of Information Security, 2012, Issue 4, pp. 314-318 (5 pages)
String matching algorithms are an important component of network intrusion detection systems. In these systems, the string matching algorithms take up more than half of the CPU processing time. In recent years, GPU technology has shown better performance than the CPU in these types of applications. In this article we review the state of the art of the different string matching algorithms used in network intrusion detection systems, as well as some research on CPU and GPU approaches in this area.
Keywords: GPU; String Matching; Pattern Matching
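A Single-Pattern Matching Algorithm Based on Tibetan Syllable Structure (Cited by: 2)
19
Authors: 张学通, 彭展. 计算机仿真, 2024, Issue 8, pp. 374-378 (5 pages)
The number of character comparisons and the jump length after a mismatch are two key factors affecting the efficiency of pattern (string) matching algorithms. The BM algorithm is one of the most classic single-pattern string matching algorithms. For a pattern of length m, its maximum jump length after a mismatch is m. When applied to Tibetan text, however, it performs many meaningless character comparisons, and the maximum jump length can still be increased. To address these two shortcomings, and drawing on the characteristics of the Tibetan script, a single-pattern matching algorithm based on the Tibetan syllable delimiter is proposed: the BM-Tibetan algorithm. The algorithm follows an "align first, then match" approach to reduce the number of matching attempts, and its maximum jump distance after a mismatch is m+k (2≤k≤8). Experimental results show that BM-Tibetan performs fewer character comparisons and fewer pattern shifts than BM, which gives it a certain performance advantage.
Keywords: string matching; single-pattern matching algorithm; Tibetan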
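An Approximate Matching Algorithm for Chinese Strings Based on Edit Distance
20
Authors: 王昭, 薛晨浩, 裴卓雄. 山西电子技术, 2024, Issue 4, pp. 43-45 (3 pages)
Approximate string matching is an important research direction in pattern matching. In approximate matching of Chinese strings, an edit distance based on single-character operations cannot accurately measure similarity that arises from operations such as copying and cut-and-paste. To address this, string move and copy operations are added to the traditional edit distance, and a computation method that performs a dynamic programming search on top of a greedy algorithm is given, which can effectively compute the improved edit distance. Experimental results and analysis on real datasets show that it is effective for text retrieval.
Keywords: string matching; approximate matching; dynamic programming algorithm; edit distance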