Journal Articles
8 articles found
On-line linear time construction of sequential binary suffix trees
1
Authors: Lai Huoyao, Liu Gongshen. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2009, Issue 5, pp. 1104-1110, 7 pages
Suffix trees are a key data structure for text string matching and are used in a wide range of applications, such as bioinformatics and data compression. Ukkonen's algorithm is investigated in depth, and a new algorithm is proposed that reduces the number of memory operations during construction and keeps the resulting tree sequential. Experimental results show that both construction and matching are more efficient than with Ukkonen's algorithm.
Keywords: suffix tree; sequential; linear time construction
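The abstract does not detail the new sequential construction, so the sketch below shows only the baseline it is compared against: a plain Python version of Ukkonen's online, linear-time suffix tree construction (active point, suffix links, a shared leaf end), plus a substring query. The names `Node`, `build_suffix_tree`, `contains`, and `count_leaves` are illustrative, and the input is assumed to end with a unique terminator such as `$`. This is not the authors' memory-optimized algorithm.

```python
class Node:
    """Suffix-tree node; its incoming edge is labelled text[start:end[0]]."""

    def __init__(self, start, end):
        self.start = start   # start index of the incoming edge label
        self.end = end       # one-element list; all leaves share one, so they grow together
        self.children = {}   # first character of child edge -> child node
        self.link = None     # suffix link (internal nodes); None means root

    def edge_len(self):
        return self.end[0] - self.start


def build_suffix_tree(text):
    """Ukkonen's online construction; `text` should end with a unique terminator."""
    root = Node(-1, [-1])
    leaf_end = [0]
    active_node, active_edge, active_len = root, 0, 0
    remaining = 0  # suffixes still to be inserted explicitly
    for i, ch in enumerate(text):
        leaf_end[0] = i + 1  # rule 1: every leaf edge grows by one character
        remaining += 1
        last_internal = None  # internal node awaiting a suffix link
        while remaining > 0:
            if active_len == 0:
                active_edge = i
            key = text[active_edge]
            nxt = active_node.children.get(key)
            if nxt is None:
                # rule 2: no edge starts with `key`, so hang a new leaf here
                active_node.children[key] = Node(i, leaf_end)
                if last_internal is not None:
                    last_internal.link = active_node
                    last_internal = None
            else:
                if active_len >= nxt.edge_len():
                    # skip/count: the active point lies beyond this edge
                    active_edge += nxt.edge_len()
                    active_len -= nxt.edge_len()
                    active_node = nxt
                    continue
                if text[nxt.start + active_len] == ch:
                    # rule 3: the suffix is already implicit; end this phase
                    if last_internal is not None and active_node is not root:
                        last_internal.link = active_node
                        last_internal = None
                    active_len += 1
                    break
                # rule 2 with a split: new internal node plus a new leaf
                split = Node(nxt.start, [nxt.start + active_len])
                active_node.children[key] = split
                split.children[ch] = Node(i, leaf_end)
                nxt.start += active_len
                split.children[text[nxt.start]] = nxt
                if last_internal is not None:
                    last_internal.link = split
                last_internal = split
            remaining -= 1
            if active_node is root and active_len > 0:
                active_len -= 1
                active_edge = i - remaining + 1
            elif active_node is not root:
                active_node = active_node.link or root
    return root


def contains(root, text, pattern):
    """True if `pattern` is a substring of `text` (walk down from the root)."""
    node, i = root, 0
    while i < len(pattern):
        child = node.children.get(pattern[i])
        if child is None:
            return False
        for k in range(child.start, child.end[0]):
            if i == len(pattern):
                return True
            if text[k] != pattern[i]:
                return False
            i += 1
        node = child
    return True


def count_leaves(node):
    """Number of leaves; equals len(text) when text ends with a unique terminator."""
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children.values())
```

After `root = build_suffix_tree("mississippi$")`, a query such as `contains(root, "mississippi$", "ssip")` walks at most one edge per matched character, which is what makes suffix trees attractive for the matching step the abstract measures.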
Fault Tolerant Suffix Trees
2
Authors: Iftikhar Ahmad, Syed Zulfiqar Ali Shah, Ambreen Shahnaz, Sadeeq Jan, Salma Noor, Wajeeha Khalil, Fazal Qudus Khan, Muhammad Iftikhar Khan. Computers, Materials & Continua (SCIE, EI), 2021, Issue 1, pp. 157-164, 8 pages
Classical algorithms and data structures assume that the underlying memory is reliable and that data remain intact during and after processing. This assumption is perilous, however, as several studies have shown that large, inexpensive memories are vulnerable to bit flips, so a few memory faults can compromise the correctness of a classical algorithm's output. Fault-tolerant data structures and resilient algorithms are designed to tolerate a limited number of faults and to produce correct output from the uncorrupted part of the data. The suffix tree is an important data structure with widespread applications, including substring search, the superstring problem, and data compression. The fault-tolerant suffix tree previously presented in the literature relies on complex techniques: encodable and decodable error-correcting codes, blocked data structures, and fault-resistant tries. In this work, we use the natural approach of data replication to develop a fault-tolerant suffix tree based on the faulty-memory random access machine model. The proposed data structure stores copies of the indices to withstand memory faults injected by an adversary. We develop a resilient version of Ukkonen's algorithm for constructing the fault-tolerant suffix tree and derive an upper bound on the number of corrupt suffixes.
Keywords: resilient data structures; fault tolerant data structures; suffix tree
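The abstract says only that copies of the indices are stored. A minimal sketch of that replication idea follows, assuming the adversary corrupts at most `delta` copies of any one value, so that storing 2*delta + 1 replicas and reading them by majority vote always recovers the true value. The classes `ResilientIndex` and `ResilientEdge` are hypothetical; they do not reproduce the paper's resilient Ukkonen construction or its bound on corrupt suffixes.

```python
from collections import Counter


class ResilientIndex:
    """An integer index stored as 2*delta + 1 replicas (hypothetical helper).

    Assumption: at most `delta` replicas are corrupted, so the true value keeps
    a strict majority (at least delta + 1 copies) and a vote recovers it.
    """

    def __init__(self, value, delta):
        self.delta = delta
        self.replicas = [value] * (2 * delta + 1)

    def read(self):
        # Majority vote over all replicas.
        value, _count = Counter(self.replicas).most_common(1)[0]
        return value

    def write(self, value):
        # Rewrite every replica so later reads see the new value.
        self.replicas = [value] * (2 * self.delta + 1)


class ResilientEdge:
    """A suffix-tree edge label text[start:end] with replicated endpoints."""

    def __init__(self, start, end, delta):
        self.start = ResilientIndex(start, delta)
        self.end = ResilientIndex(end, delta)

    def label(self, text):
        return text[self.start.read():self.end.read()]
```

For instance, an edge created as `ResilientEdge(2, 5, delta=2)` still yields the label `"ssi"` from `"mississippi"` after two of its start replicas are overwritten with garbage. The cost of this design is a (2*delta + 1)-fold blow-up in index storage, which is the trade-off replication makes against the more complex coding-based schemes the abstract mentions.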
A Chinese Web Page Clustering Algorithm Based on the Suffix Tree (cited by 4)
3
Author: YANG Jian-wu. Wuhan University Journal of Natural Sciences (EI, CAS), 2004, Issue 5, pp. 817-822, 6 pages
In this paper, an improved algorithm named STC-I is proposed for clustering Chinese Web pages. Based on the characteristics of the Chinese language, it adopts a new unit-selection principle and a novel suffix tree construction policy. Experimental results show that the new algorithm retains the advantages of STC and outperforms STC in both precision and speed when clustering Chinese Web pages. CLC number: TP 311. Foundation item: Supported by the National Information Industry Development Foundation of China. Biography: YANG Jian-wu (1973-), male, Ph.D.; research interests: information retrieval and text mining.
Keywords: clustering; suffix tree; Web mining
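STC-I builds on suffix tree clustering (STC). As background, here is a minimal sketch of STC's two steps under simplifying assumptions: documents are already segmented into token lists (the paper's Chinese unit-selection principle is not reproduced); every phrase shared by at least two documents becomes a base cluster (found by naive enumeration rather than a suffix tree, and without STC's base-cluster scoring); and base clusters are merged when their document sets overlap by more than 50% in both directions, the merge rule of the original STC. `base_clusters` and `merge_clusters` are illustrative names.

```python
from itertools import combinations


def base_clusters(docs, min_docs=2):
    """Map each contiguous phrase to the set of documents containing it.

    Naive enumeration of every phrase; a generalized suffix tree indexes the
    same phrases far more efficiently.
    """
    phrase_docs = {}
    for doc_id, tokens in enumerate(docs):
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens) + 1):
                phrase_docs.setdefault(tuple(tokens[i:j]), set()).add(doc_id)
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}


def merge_clusters(clusters, threshold=0.5):
    """Merge base clusters whose document sets overlap by more than `threshold`
    in both directions, taking connected components of the similarity graph."""
    items = list(clusters.items())
    parent = list(range(len(items)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(range(len(items)), 2):
        da, db = items[a][1], items[b][1]
        inter = len(da & db)
        if inter / len(da) > threshold and inter / len(db) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for idx, (_phrase, doc_set) in enumerate(items):
        groups.setdefault(find(idx), set()).update(doc_set)
    return sorted(groups.values(), key=lambda s: sorted(s))
```

Note that the resulting clusters may overlap: a document sharing different phrases with different groups can belong to several clusters, which is one of STC's characteristic properties.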
ISTC: A New Method for Clustering Search Results (cited by 2)
4
Authors: ZHANG Wei, XU Baowen, ZHANG Weifeng, XU Junling. Wuhan University Journal of Natural Sciences (CAS), 2008, Issue 4, pp. 501-504, 4 pages
A new common-phrase scoring method is proposed based on term frequency-inverse document frequency (TFIDF) and the independence of a phrase. Combining these two properties helps identify more reasonable common phrases, which improves clustering accuracy. An equation to measure the independence of a phrase is also proposed. The new algorithm, which improves the suffix tree clustering (STC) algorithm, is named improved suffix tree clustering (ISTC). To validate it, a prototype system was implemented and used to cluster several groups of Web search results obtained from the Google search engine. Experimental results show that the improved algorithm achieves higher accuracy than traditional suffix tree clustering.
Keywords: Web search results clustering; suffix tree; term frequency-inverse document frequency (TFIDF); independence of phrases
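The abstract does not give the independence equation, so only the TF-IDF half of the ISTC phrase score is sketched here. It assumes tf is the total number of occurrences of a phrase across the result snippets and idf = log(N / df), where N is the number of snippets and df the number containing the phrase. A phrase that occurs in every snippet scores zero, which is why TF-IDF helps demote uninformative common phrases. `count_occurrences` and `phrase_tfidf` are hypothetical helpers.

```python
import math


def count_occurrences(tokens, phrase):
    """Number of (possibly overlapping) positions where `phrase` occurs in `tokens`."""
    m = len(phrase)
    return sum(1 for i in range(len(tokens) - m + 1) if tuple(tokens[i:i + m]) == phrase)


def phrase_tfidf(phrase, docs):
    """TF-IDF of a phrase (a tuple of tokens) over a list of token lists."""
    tf = sum(count_occurrences(d, phrase) for d in docs)
    df = sum(1 for d in docs if count_occurrences(d, phrase) > 0)
    if df == 0:
        return 0.0
    return tf * math.log(len(docs) / df)
```

With four snippets that all contain `("apple",)`, that phrase scores 0.0, while `("apple", "pie")` appearing three times in two of the snippets scores 3 * log(2). ISTC would combine such a score with its own independence measure, which is not reproduced here.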
Research on Extraction Method for Taxonomic Relation among Conceptions of Tea-science Field Ontology
5
Author: TONG Bo (童波). Agricultural Science & Technology (CAS), 2010, Issue 11, pp. 180-182, 3 pages
[Objective] Taking knowledge of the tea-science field as the research object, this paper proposes a method for extracting taxonomic relations among ontology concepts. [Method] By improving rules based on language patterns, a generalized suffix tree was constructed for the concept set of the tea-science field, forming a hierarchical structure and taxonomic relations among concepts. [Result and Conclusion] A corresponding prototype system was developed based on this method, and test results indicate that the method is effective.
Keywords: tea-science field ontology; conception; taxonomic relation; generalized suffix tree
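The abstract does not specify the improved rules. As a clearly hypothetical illustration of how shared suffixes can expose a hierarchy, the sketch assumes a head-final heuristic for Chinese compound terms: a concept that is a proper suffix of another (as 茶, "tea", is of 绿茶, "green tea") is proposed as its parent. It uses a trie over reversed concept names as a simplified stand-in for a generalized suffix tree. The function `suffix_parents` and the sample terms are illustrative assumptions, not the paper's method.

```python
def suffix_parents(concepts):
    """Propose a parent for each concept: its longest proper suffix that is
    itself a concept (hypothetical head-final heuristic), else None."""
    root = {}
    for c in concepts:
        node = root
        for ch in reversed(c):          # insert the reversed term into the trie
            node = node.setdefault(ch, {})
        node[None] = c                  # None key marks the end of a stored concept

    parents = {}
    for c in concepts:
        node, found = root, []
        for ch in reversed(c):          # walk the term from its last character
            node = node[ch]
            if None in node and node[None] != c:
                found.append(node[None])  # a shorter concept that ends this term
        # Suffixes are found shortest-first, so the last one is the most specific.
        parents[c] = found[-1] if found else None
    return parents
```

With the concepts 茶, 绿茶, 红茶, 龙井绿茶, and 茶叶, this heuristic links 绿茶 and 红茶 to 茶 and 龙井绿茶 to 绿茶, but leaves 茶叶 ("tea leaves") without a parent because it does not end with 茶. Rules of the kind the abstract mentions would be needed to catch such cases and to reject false suffix matches.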
Verbumculus and the Discovery of Unusual Words (cited by 1)
6
Authors: Alberto Apostolico, Fang-Cheng Gong, Stefano Lonardi. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2004, Issue 1, pp. 22-41, 20 pages
Measures relating word frequencies to their expectations have long been of interest in bioinformatics studies. With sequence data becoming massively available, exhaustive enumeration of such measures has become conceivable, yet it poses a significant computational burden even when limited to words of bounded maximum length. In addition, displaying the huge tables that may result from these counts poses practical problems of visualization and inference. VERBUMCULUS is a suite of software tools for the efficient and fast detection of over- or under-represented words in nucleotide sequences. The core of VERBUMCULUS rests on subtly interwoven properties of statistics, pattern matching, and combinatorics on words, which make it possible to limit drastically and a priori the set of over- or under-represented candidate words of all lengths in a given sequence, so that such words can be both detected and visualized in a fast and practically useful way. This paper describes the facility and reports experimental results, ranging from simulations on synthetic data to the discovery of regulatory elements in the upstream regions of a set of yeast genes.
Keywords: verbumculus; unusual words; subword statistics; pattern discovery; regulatory elements; suffix trees
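To make "over- or under-represented" concrete, here is a minimal sketch that scores every length-m word by how far its observed count lies from its expectation under an i.i.d. model, with base frequencies estimated from the sequence itself. Assumptions: the variance uses the binomial approximation N * p * (1 - p), which ignores overlaps between occurrences of self-overlapping words. VERBUMCULUS itself relies on more refined statistics and on pruning the candidate set a priori, neither of which is reproduced. `word_zscores` is an illustrative name.

```python
import math
from collections import Counter
from itertools import product


def word_zscores(seq, m):
    """z-score (observed - expected) / sd for every length-m word over the
    sequence's alphabet, under an i.i.d. model with empirical base frequencies."""
    n = len(seq)
    positions = n - m + 1                      # number of length-m windows
    base_freq = {c: k / n for c, k in Counter(seq).items()}
    observed = Counter(seq[i:i + m] for i in range(positions))
    scores = {}
    for letters in product(sorted(base_freq), repeat=m):
        word = "".join(letters)
        p = math.prod(base_freq[c] for c in word)   # probability of the word at one position
        expected = positions * p
        var = positions * p * (1 - p)               # binomial approximation (ignores overlaps)
        f = observed.get(word, 0)                   # unseen words count as 0, so they can score negative
        scores[word] = (f - expected) / math.sqrt(var) if var > 0 else 0.0
    return scores
```

Sorting the result with `sorted(z.items(), key=lambda kv: kv[1])` lists the most under-represented words first and the most over-represented last. Enumerating all 4^m words is exactly the exhaustive approach whose cost and table sizes the abstract describes; it stays practical only for small m.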
STAG-CNS: An Order-Aware Conserved Noncoding Sequences Discovery Tool for Arbitrary Numbers of Species (cited by 3)
7
Authors: Xianjun Lai, Sairam Behera, Zhikai Liang, Yanli Lu, Jitender S. Deogun, James C. Schnable. Molecular Plant (SCIE, CAS, CSCD), 2017, Issue 7, pp. 990-999, 10 pages
One method for identifying noncoding regulatory regions of a genome is to quantify rates of divergence between related species, since functional sequence generally diverges more slowly. Most alignment-based approaches to identifying these conserved noncoding sequences (CNSs) have required relatively long minimum sequence lengths (≥15 bp) compared with the average length of known transcription factor binding sites. To circumvent this constraint, STAG-CNS was developed; it can simultaneously integrate data from the promoters of conserved orthologous genes in three or more species. Using data from up to six grass species made it possible to identify conserved sequences as short as 9 bp with a false discovery rate ≤0.05. These CNSs exhibit greater overlap with open chromatin regions identified using DNase I hypersensitivity assays and are enriched in the promoters of genes involved in transcriptional regulation. STAG-CNS was further employed to characterize the loss of conserved noncoding sequences associated with retained duplicate genes from the ancient maize polyploidy. Genes with fewer retained CNSs show lower overall expression, although this bias is more apparent in samples of complex organ systems containing many cell types, suggesting that CNS loss may correspond to a reduced number of expression contexts rather than lower expression levels across the entire ancestral expression domain.
Keywords: conserved noncoding sequence; comparative genomics; suffix tree; longest path algorithm; grain crops
Discovering User Profiles for Web Personalized Recommendation (cited by 2)
8
Authors: Ai-Bo Song, Mao-Xian Zhao, Zuo-Peng Liang, Yi-Sheng Dong, Jun-Zhou Luo. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2004, Issue 3, pp. 320-328, 9 pages
With the growing popularity of the World Wide Web, large volumes of user access data are gathered automatically by Web servers and stored in Web logs. Discovering and understanding user behavior patterns from log files can support personalized Web recommendation services. This paper presents a novel clustering method for log files, Clustering large Weblog based on Key Path Model (CWKPM), which derives user behavior profiles from a key path model of user browsing. Compared with the earlier Boolean model, the key path model captures the major features of how users access the Web: accesses are ordered, contiguous, and may repeat. Moreover, it has fewer dimensions for clustering. Analysis and experiments show that CWKPM is an efficient and effective approach for clustering large, high-dimensional Web logs.
Keywords: web log; user profile; personalization; generalized suffix tree; clustering
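The abstract does not define the key path model precisely. Assuming that a session is an ordered list of page URLs and that similarity should reward shared contiguous runs (with order and repeats preserved, unlike a Boolean page-set model), a minimal sketch compares two sessions by their longest common contiguous path. A generalized suffix tree answers this longest-common-substring query in linear time; the dynamic program below is a simpler quadratic stand-in. `longest_common_path` and `session_similarity` are hypothetical names, not CWKPM itself.

```python
def longest_common_path(a, b):
    """Longest run of consecutive pages shared by sessions `a` and `b`.

    O(len(a) * len(b)) dynamic program; cur[j] is the length of the common
    run ending at a[i-1] and b[j-1].
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def session_similarity(a, b):
    """Hypothetical similarity: shared contiguous path length over the longer session."""
    if not a or not b:
        return 0.0
    return len(longest_common_path(a, b)) / max(len(a), len(b))
```

Reversing a session's page order breaks every shared run of two or more pages, so the similarity drops even though the set of visited pages is unchanged. That is the distinction the abstract draws between the key path model and the Boolean model.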