Journal Articles
358 articles found
1. Distributed Document Clustering Analysis Based on a Hybrid Method (cited 2 times)
Authors: J.E. Judith, J. Jayakumari. China Communications (SCIE, CSCD), 2017, No. 2, pp. 131-142.
Clustering has recently become a challenging task, since there is an ever-growing amount of data in scientific research and commercial applications. High-quality and fast document clustering algorithms are in great demand to deal with large volumes of data, and the computational requirements for bringing such a growing amount of data to a central site for clustering are demanding. The proposed algorithm uses optimal centroids for K-means clustering based on Particle Swarm Optimization (PSO). PSO is used for its global search ability to provide optimal centroids, which helps generate more compact clusters with improved accuracy. The proposed methodology uses the Hadoop and MapReduce framework, which provides distributed storage and analysis to support data-intensive distributed applications. Experiments were performed on the Reuters and RCV1 document datasets and show an improvement in accuracy with reduced execution time.
Keywords: distributed document clustering, Hadoop, K-means, PSO, MapReduce
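The abstract only outlines the approach. As a rough, single-machine illustration of the core idea (a PSO loop searching for K-means centroids), not the authors' Hadoop/MapReduce implementation, a minimal NumPy sketch might look like the following; all function names and parameter values are assumptions.

import numpy as np

def kmeans_sse(centroids, X):
    """Sum of squared distances from each point to its nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return float((d.min(axis=1) ** 2).sum())

def pso_centroids(X, k, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Each particle encodes k centroids; PSO minimizes the K-means objective."""
    rng = np.random.default_rng(seed)
    dim = k * X.shape[1]
    lo, hi = X.min(axis=0), X.max(axis=0)
    pos = rng.uniform(np.tile(lo, k), np.tile(hi, k), size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([kmeans_sse(p.reshape(k, -1), X) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([kmeans_sse(p.reshape(k, -1), X) for p in pos])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest.reshape(k, -1)

# Toy usage: cluster random "document vectors" with PSO-found centroids.
X = np.random.default_rng(1).random((200, 10))
centroids = pso_centroids(X, k=4)
labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)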
2. Document Clustering Using Graph Based Fuzzy Association Rule Generation
Author: P. Perumal. Computer Systems Science & Engineering (SCIE, EI), 2022, No. 10, pp. 203-218.
With the growth of web-based documents, the need for automatic document clustering and text summarization has increased. Document summarization, that is, extracting the essential content, removing unnecessary data, and presenting the information in a cohesive and coherent manner, is a demanding task. In this research, a novel intelligent model for document clustering is designed with a graph model and fuzzy-based association rule generation (gFAR). Initially, the graph model is used to map the relationships among the (multi-source) data, followed by document clustering with association rules generated using fuzzy concepts. The method eliminates redundancy by mapping related documents through the graph model, and reduces time consumption and improves accuracy through fuzzy association rule generation. The framework provides document clustering in an interpretable way: it iteratively reduces the error rate during relationship mapping among the data (clusters) with the assistance of weighted document content. The model also represents the significance of data features with class discrimination and helps measure feature significance during the clustering process. The simulation is done in the MATLAB 2016b environment and evaluated with empirical measures such as Relative Risk Patterns (RRP), ROUGE score, and the Discrimination Information Measure (DMI). The DailyMail and DUC 2004 datasets are used to obtain the empirical results. The proposed gFAR model gives a better trade-off compared with various prevailing approaches.
Keywords: document clustering, text summarization, fuzzy model, association rule generation, graph model, relevance mapping, feature patterns
3. Genetic-Frog-Leaping Algorithm for Text Document Clustering (cited 1 time)
Authors: Lubna Alhenak, Manar Hosny. Computers, Materials & Continua (SCIE, EI), 2019, No. 9, pp. 1045-1074.
In recent years, the volume of information in digital form has increased tremendously owing to the increased popularity of the World Wide Web. As a result, techniques for extracting useful information from large collections of data, and particularly documents, have become more necessary and challenging. Text clustering is one such technique; it consists of dividing a set of text documents into clusters (groups), so that documents within the same cluster are closely related, whereas documents in different clusters are as different as possible. Clustering depends on measuring the content (i.e., words) of a document in terms of relevance. Nevertheless, as documents usually contain a large number of words, some of them may be irrelevant to the topic under consideration or redundant. This can confuse and complicate the clustering process and make it less accurate. Accordingly, feature selection methods have been employed to reduce data dimensionality by selecting the most relevant features. In this study, we developed a text document clustering optimization model using a novel genetic frog-leaping algorithm that efficiently clusters text documents based on selected features. The proposed approach is based on two metaheuristic algorithms: a genetic algorithm (GA) and a shuffled frog-leaping algorithm (SFLA). The GA performs feature selection, and the SFLA performs clustering. To evaluate its effectiveness, the proposed approach was tested on a well-known text document dataset: the "20Newsgroup" dataset from the University of California Irvine Machine Learning Repository. Overall, after multiple experiments were compared and analyzed, it was demonstrated that using the proposed algorithm on the 20Newsgroup dataset greatly facilitated text document clustering compared with classical K-means clustering, although this improvement requires longer computational time.
Keywords: text document clustering, meta-heuristic algorithms, shuffled frog-leaping algorithm, genetic algorithm, feature selection
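As a hedged sketch of the feature-selection half of this pipeline (the SFLA clustering stage is not reproduced; a plain K-means stands in for it), the fragment below evolves binary feature masks scored by the silhouette of a K-means clustering restricted to the selected features. It assumes scikit-learn is available; population size, mutation rate, and other constants are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def fitness(mask, X, k):
    """Silhouette of a K-means clustering restricted to the selected features."""
    if mask.sum() < 2:
        return -1.0
    sel = X[:, mask.astype(bool)]
    labels = KMeans(n_clusters=k, n_init=5, random_state=0).fit_predict(sel)
    return silhouette_score(sel, labels)

def ga_feature_selection(X, k, pop=20, gens=30, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    population = rng.integers(0, 2, size=(pop, n_feat))
    for _ in range(gens):
        scores = np.array([fitness(ind, X, k) for ind in population])
        parents = population[scores.argsort()[::-1][: pop // 2]]   # truncation selection
        children = []
        for _ in range(pop - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_feat)                           # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_feat) < p_mut                       # bit-flip mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        population = np.vstack([parents, children])
    scores = np.array([fitness(ind, X, k) for ind in population])
    return population[scores.argmax()].astype(bool)

# Usage on toy data: select features, then cluster on the reduced representation.
X = np.random.default_rng(2).random((150, 30))
mask = ga_feature_selection(X, k=5)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X[:, mask])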
4. Semi-supervised Document Clustering Based on Latent Dirichlet Allocation (LDA) (cited 2 times)
Authors: Qin Yongbin, Li Jie, Huang Ruizhang, Li Jing. Journal of Donghua University (English Edition) (EI, CAS), 2016, No. 5, pp. 685-688.
To discover personalized document structure that takes user preferences into account, user preferences are captured by a limited number of instance-level constraints given as interested and uninterested key terms. A semi-supervised document clustering approach based on the latent Dirichlet allocation (LDA) model, named pLDA, is developed, guided by the user-provided key terms. A generalized Polya urn (GPU) model is proposed to integrate the user preferences into the document clustering process, and a Gibbs sampler is used to infer the structure of the document collection. Experiments on real datasets were conducted to explore the performance of pLDA, and the results demonstrate that the pLDA approach is effective.
Keywords: supervised clustering, document, latent Dirichlet, instance, captured, constraints, labeled, interested
5. Document Clustering Based on Constructing Density Tree
Authors: Dai Weidi, Wang Wenjun, Hou Yuexian, Wang Ying, Zhang Lu. Transactions of Tianjin University (EI, CAS), 2008, No. 1, pp. 21-26.
This paper focuses on document clustering with a clustering algorithm based on a density tree (CABDET) to improve clustering accuracy. The CABDET method constructs a density-based tree structure for every potential cluster by dynamically adjusting the neighborhood radius according to local density. It avoids the global density parameters of density-based spatial clustering of applications with noise (DBSCAN) and reduces the input parameters to one. Experiments on real documents show that CABDET achieves better clustering accuracy than DBSCAN. The CABDET algorithm obtains a maximum F-measure of 0.347 with the root node's neighborhood radius set to 0.80, which is higher than the 0.332 of DBSCAN with a neighborhood radius of 0.65 and a minimum number of objects of 6.
Keywords: document handling, clustering, tree structure, vector space model
6. Document Clustering Using Semantic Cliques Aggregation
Authors: Ajit Kumar, I-Jen Chiang. Journal of Computer and Communications, 2015, No. 12, pp. 28-40.
Search engines are indispensable tools for finding information amidst massive numbers of web pages and documents. A good search engine needs to retrieve information not only quickly, but also relevant to the user's query. Most search engines answer user queries in a short time; however, they provide little guarantee of precision, even for highly detailed queries. In such cases, document clustering centered on subject and content may improve search results. This paper presents a novel method of document clustering that uses semantic cliques. First, features are extracted from the documents. Then, associations between frequently co-occurring terms are defined; these are called semantic cliques. Each connected component in the semantic clique graph represents a theme, and documents are clustered by theme using a purpose-built aggregation algorithm. The effectiveness of the aggregation algorithm was evaluated on four kinds of datasets. The results show that semantic-clique-based document clustering performs significantly better than traditional clustering algorithms such as Principal Direction Divisive Partitioning (PDDP), k-means, Auto-Class, and Hierarchical Agglomerative Clustering (HAC). Semantic Clique Aggregation is thus a potential model for representing association rules in text and could be immensely useful for automatic document clustering.
Keywords: document clustering, semantic clique, association, aggregation, theme
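The central idea (frequently co-occurring terms form a graph whose connected components act as themes) can be illustrated with a minimal sketch; the paper's actual clique construction and aggregation algorithm are not reproduced, and the tokenization and co-occurrence threshold below are assumptions.

from collections import Counter, defaultdict
from itertools import combinations

def theme_components(docs, min_cooccur=2):
    """Build a term co-occurrence graph and return its connected components as 'themes'."""
    pair_counts = Counter()
    for doc in docs:
        terms = set(doc.lower().split())
        pair_counts.update(combinations(sorted(terms), 2))
    graph = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_cooccur:            # keep only frequently co-occurring term pairs
            graph[a].add(b)
            graph[b].add(a)
    seen, themes = set(), []
    for node in graph:                  # depth-first search over the co-occurrence graph
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            comp.add(cur)
            stack.extend(graph[cur] - seen)
        themes.append(comp)
    return themes

def assign_documents(docs, themes):
    """Attach each document to the theme sharing the most terms with it."""
    clusters = defaultdict(list)
    for i, doc in enumerate(docs):
        terms = set(doc.lower().split())
        best = max(range(len(themes)), key=lambda t: len(terms & themes[t]))
        clusters[best].append(i)
    return clusters

docs = ["fuzzy clustering of text", "text clustering with fuzzy rules",
        "search engines index web pages", "web pages ranked by search engines"]
themes = theme_components(docs)
print(assign_documents(docs, themes))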
7. A SOM-Based Document Clustering Using Frequent Max Substrings for Non-Segmented Texts
Authors: Todsanai Chumwatana, Kok Wai Wong, Hong Xie. Journal of Intelligent Learning Systems and Applications, 2010, No. 3, pp. 117-125.
This paper proposes a non-segmented document clustering method using a self-organizing map (SOM) and the frequent max substring technique to improve the efficiency of information retrieval. SOM has been widely used for document clustering and is successful in many applications; however, when applied to non-segmented documents, the challenge is to identify interesting patterns efficiently. The proposed method has two main phases: a preprocessing phase and a clustering phase. In the preprocessing phase, the frequent max substring technique is applied to discover the patterns of interest, called frequent max substrings, which are long and frequent substrings rather than individual words, from the non-segmented texts. These discovered patterns are then used as indexing terms, which together with their numbers of occurrences form document vectors. In the clustering phase, SOM generates the document cluster map from the feature vectors of frequent max substrings. To demonstrate the proposed technique, experimental studies and comparison results on clustering Thai text documents, which consist of non-segmented texts, are presented. The results show that the proposed technique can be used for Thai texts, and the resulting document cluster map can be used to find relevant documents more efficiently.
Keywords: frequent max substring, self-organizing map, document clustering
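Frequent max substring mining itself is involved; as a simplified stand-in, the sketch below indexes non-segmented text by character n-grams and trains a tiny self-organizing map in NumPy. The grid size, learning schedule, and the use of plain n-grams instead of frequent max substrings are all assumptions.

import numpy as np
from collections import Counter

def ngram_vector(text, vocab, n=3):
    """Count character n-grams (a crude substitute for frequent max substrings)."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return np.array([counts.get(g, 0) for g in vocab], dtype=float)

def train_som(X, rows=3, cols=3, iters=200, lr0=0.5, sigma0=1.5, seed=0):
    """Minimal rectangular SOM; returns the weight grid of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    W = rng.random((rows * cols, X.shape[1]))
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(iters):
        x = X[rng.integers(len(X))]
        bmu = np.linalg.norm(W - x, axis=1).argmin()           # best matching unit
        lr = lr0 * (1 - t / iters)
        sigma = sigma0 * (1 - t / iters) + 1e-3
        h = np.exp(-np.linalg.norm(coords - coords[bmu], axis=1) ** 2 / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)                         # pull the neighborhood toward x
    return W

# Unsegmented strings stand in for Thai text, which has no word boundaries.
texts = ["documentclusteringwithsom", "textclusteringwithsom", "websearchengineindexing"]
vocab = sorted({t[i:i + 3] for t in texts for i in range(len(t) - 2)})
X = np.vstack([ngram_vector(t, vocab) for t in texts])
W = train_som(X)
cells = np.array([np.linalg.norm(W - x, axis=1).argmin() for x in X])   # map cell per document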
8. Improving Web Document Clustering through Employing User-Related Tag Expansion Techniques (cited 5 times)
Authors: Li Peng, Wang Bin, Jin Wei. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2012, No. 3, pp. 554-566.
As high-quality descriptors of web page semantics, social annotations or tags have been used for web document clustering with promising results. However, most web pages have few tags (fewer than 10), and this sparsity seriously limits the usefulness of tags for clustering. In this work, we propose a user-related tag expansion method to overcome this problem, which incorporates additional useful tags into the original tag document by using user tagging data as background knowledge. Unfortunately, simply adding tags may cause topic drift, i.e., the dominant topic(s) of the original document may change. To tackle this problem, we designed a novel generative model called Folk-LDA, which jointly models original and expanded tags as independent observations. Experimental results show that 1) our user-related tag expansion method can be effectively applied to over 90% of tagged web documents; 2) Folk-LDA can alleviate topic drift during expansion, especially for topic-specific documents; and 3) the proposed tag-based clustering methods significantly outperform word-based methods, which indicates that tags could be a better resource for the clustering task.
Keywords: web document clustering, social bookmarking, topic model, tag expansion
9. Embedding-based Detection and Extraction of Research Topics from Academic Documents Using Deep Clustering (cited 4 times)
Authors: Sahand Vahidnia, Alireza Abbasi, Hussein A. Abbass. Journal of Data and Information Science (CSCD), 2021, No. 3, pp. 99-122.
Purpose: Detecting research fields or topics and understanding their dynamics helps the scientific community make decisions regarding the establishment of scientific fields, and also supports better collaboration with governments and businesses. This study investigates the development of research fields over time, translating it into a topic detection problem.
Design/methodology/approach: We propose a modified deep clustering method to detect research trends from the abstracts and titles of academic documents. Document embedding approaches are used to transform documents into vector-based representations. The proposed method is evaluated by comparing it with combinations of different embedding and clustering approaches and with classical topic modeling algorithms (i.e., LDA) on a benchmark dataset. A case study is also conducted, exploring the evolution of Artificial Intelligence (AI) and detecting the research topics or sub-fields in related AI publications.
Findings: Evaluation with clustering performance indicators shows that the proposed method outperforms similar approaches on the benchmark dataset. Using the proposed method, we also show how topics have evolved over the past 30 years, using a keyword extraction method for cluster tagging and labeling to demonstrate the context of the topics.
Research limitations: It is not possible to generalize one solution for all downstream tasks; solutions need to be fine-tuned or optimized for each task and even each dataset. In addition, the interpretation of cluster labels can be subjective and vary with the reader's opinion, and labeling techniques are difficult to evaluate, which further limits the explanation of the clusters.
Practical implications: As demonstrated in the case study, the proposed method enables researchers and reviewers of academic research to detect, summarize, analyze, and visualize research topics from decades of academic documents, helping the scientific community and related organizations analyze fields quickly and effectively by establishing and explaining the topics.
Originality/value: This study introduces a modified and tuned deep embedding clustering coupled with Doc2Vec representations for topic extraction, together with a concept extraction method used for labeling. The effectiveness of the method is evaluated in a case study of AI publications covering the past three decades.
Keywords: dynamics of science, science mapping, document clustering, artificial intelligence, deep learning
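The embedding-plus-clustering backbone of such a pipeline can be sketched as follows; this is not the authors' modified deep embedded clustering, only Doc2Vec vectors fed to plain K-means, with a crude word-frequency cluster tag in place of their concept extraction method. It assumes gensim and scikit-learn are installed; all hyperparameters are illustrative.

from collections import Counter
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans

abstracts = [
    "deep learning for image recognition",
    "convolutional networks classify images",
    "reinforcement learning agents play games",
    "policy gradients for game playing agents",
]

# 1) Embed titles/abstracts with Doc2Vec (hyperparameters are assumptions).
tagged = [TaggedDocument(words=text.split(), tags=[i]) for i, text in enumerate(abstracts)]
model = Doc2Vec(tagged, vector_size=50, window=5, min_count=1, epochs=60, seed=0)
vectors = [model.infer_vector(text.split()) for text in abstracts]

# 2) Cluster the document vectors; each cluster is treated as a candidate research topic.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# 3) Crude cluster tagging: most frequent words per cluster.
for c in set(labels):
    words = Counter(w for text, l in zip(abstracts, labels) if l == c for w in text.split())
    print(c, [w for w, _ in words.most_common(3)])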
10. Research of Web Documents Clustering Based on Dynamic Concept
Authors: Wang Yun-hua, Chen Shi-hong. Wuhan University Journal of Natural Sciences (EI, CAS), 2004, No. 5, pp. 547-552.
Conceptual clustering is mainly used to address deficiencies and incompleteness in domain knowledge. Based on conceptual clustering technology, and aimed at the structure and characteristics of Web theme information, this paper proposes and implements a dynamic conceptual clustering algorithm and a merging algorithm for Web documents, and analyzes the clustering algorithm's superior performance in efficiency and clustering accuracy.
Keywords: conceptual clustering, clustering center, dynamic conceptual clustering, theme, web documents clustering
11. A Novel Method for Transforming XML Documents to Time Series and Clustering Them Based on Delaunay Triangulation
Author: Narges Shafieian. Applied Mathematics, 2015, No. 6, pp. 1076-1085.
Nowadays, exchanging data in XML format has become more popular and has widespread application because of the simple maintenance and transferability of XML documents, so accelerating search within such documents improves search engine efficiency. In this paper, we propose a technique for detecting similarity in the structure of XML documents and then clustering the documents with a Delaunay triangulation method. The technique is based on the idea of representing the structure of an XML document as a time series in which each occurrence of a tag corresponds to a given impulse, so the Discrete Fourier Transform can be used as a simple method to analyze these signals in the frequency domain and build similarity matrices through a distance measure, in order to group the documents into clusters. Delaunay triangulation is exploited as the clustering method for the d-dimensional points representing the XML documents. The results show significant efficiency and accuracy compared with common methods.
Keywords: XML mining, document clustering, XML clustering, schema matching, similarity measures, Delaunay triangulation, cluster
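The tag-impulse and DFT part of this idea is easy to sketch; the fragment below converts each XML document to a tag time series, takes the magnitude of the first DFT coefficients as a structural signature, and computes a pairwise distance matrix. The Delaunay-triangulation clustering step (which the paper applies to a low-dimensional embedding of the documents) is omitted, and the tag-to-impulse encoding below is an assumption.

import numpy as np
import xml.etree.ElementTree as ET

def xml_to_series(xml_text):
    """Emit one impulse per element in document order; the impulse height encodes the tag."""
    tags = [elem.tag for elem in ET.fromstring(xml_text).iter()]
    symbols = {t: i + 1 for i, t in enumerate(sorted(set(tags)))}
    return np.array([symbols[t] for t in tags], dtype=float)

def dft_signature(series, n_coeff=8):
    """Magnitude of the first DFT coefficients, normalized to unit length."""
    spectrum = np.abs(np.fft.fft(series, n=max(len(series), n_coeff)))[:n_coeff]
    return spectrum / (np.linalg.norm(spectrum) + 1e-12)

docs = [
    "<book><title>t</title><author>a</author></book>",
    "<book><author>a</author><title>t</title><year>y</year></book>",
    "<order><item>i</item><item>i</item><price>p</price></order>",
]
signatures = np.vstack([dft_signature(xml_to_series(d)) for d in docs])

# Pairwise structural distance matrix between documents.
dist = np.linalg.norm(signatures[:, None, :] - signatures[None, :, :], axis=2)
print(np.round(dist, 3))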
12. Application of a soft competition learning method in document clustering
Authors: Zhu Yehang, Zhang Mingjie. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2018, No. 3, pp. 80-91.
Hard competition learning has the feature that each point modifies only the one cluster centroid that wins. Correspondingly, soft competition learning has the feature that each point modifies not only the winning cluster centroid, but also many other cluster centroids near the point. A soft competition learning method is proposed. Centroid all rank distance (CARD), CARDx, and centroid all rank distance batch K-means (CARDBK) are three clustering algorithms that adopt the proposed method. In these algorithms, the extent to which one point affects a cluster centroid depends on the distances from the point to the other nearer cluster centroids, rather than only on the rank of the distance from the point to this centroid among the distances from the point to all cluster centroids. Validation experiments compare the three soft competition learning algorithms CARD, CARDx, and CARDBK with several hard competition learning algorithms as well as the neural gas (NG) algorithm on five data sets from different sources. Judging from the values of five performance indexes on the clustering results, this kind of soft competition learning method has better clustering effectiveness and efficiency, and has linear scalability.
Keywords: clustering methods, text processing, document handling, competition learning method
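The contrast between hard and soft competition can be illustrated with a small NumPy sketch in which every point nudges all centroids with weights that decay with distance rank (reminiscent of neural gas); the specific CARD/CARDx/CARDBK weighting schemes from the paper are not reproduced, and the exponential rank weighting, learning rate, and decay below are assumptions.

import numpy as np

def soft_competition_step(X, centroids, lr=0.1, decay=1.0):
    """One pass in which every point updates all centroids, weighted by distance rank.

    Nearer centroids receive larger updates (rank 0 strongest), unlike hard
    competition (plain K-means) where only the winning centroid moves.
    """
    for x in X:
        d = np.linalg.norm(centroids - x, axis=1)
        ranks = d.argsort().argsort()            # rank of each centroid by distance to x
        weights = np.exp(-ranks / decay)         # winner gets weight 1, others less
        centroids += lr * weights[:, None] * (x - centroids)
    return centroids

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
centroids = rng.random((4, 2))
for _ in range(20):
    centroids = soft_competition_step(X, centroids, lr=0.05)
labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)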
13. Cluster-Merge Ontology Construction Algorithm (cited 2 times)
Authors: Xu Dezhi, Junaid. Computing Technology and Automation, 2010, No. 3, pp. 49-52.
Ontology construction uses various data sources to build a new ontology, or to extend and adapt an existing one, in a semi-automatic way. Most existing ontology construction methods extract a large number of concept terms from domain texts and a background corpus, and then select domain concepts from them to build an ontology. The Cluster-Merge algorithm first clusters the domain documents with the k-means clustering algorithm, then constructs ontologies from the document clustering results, and finally merges them according to ontology similarity to obtain the final output ontology. Experiments show that the ontology obtained with the Cluster-Merge algorithm can improve recall and precision.
Keywords: ontology learning, document clustering, K-means clustering algorithm, similarity, ontology merging
14. Meaningful String Extraction Based on Clustering for Improving Webpage Classification
Authors: Chen Jie, Tan Jianlong, Liao Hao, Zhou Yanquan. China Communications (SCIE, CSCD), 2012, No. 3, pp. 68-77.
Webpage classification differs from traditional text classification in its irregular words and phrases and its massive, unlabeled features, which makes it harder to obtain effective features. To cope with this problem, we propose two scenarios for extracting meaningful strings, based on document clustering and term clustering, with multiple strategies to optimize a Vector Space Model (VSM) in order to improve webpage classification. The results show that document clustering works better than term clustering for handling document content, and the best overall performance is obtained by spectral clustering combined with document clustering. Moreover, since images appear in the same webpages as the document content, the proposed method is also applied to extract meaningful terms for images, and the experimental results show its effectiveness in improving webpage classification.
Keywords: webpage classification, meaningful string extraction, document clustering, term clustering, K-means, spectral clustering
15. Visual Analysis of Research Trends in Pigmented Villonodular Synovitis (cited 1 time)
Authors: Xiong Binglang, Cao Xuhan, Zhang Cheng, Guo Ziyan, Sun Xudong, Bai Zixing, Sun Weidong. Chinese Journal of Tissue Engineering Research (CAS, Peking University Core Journal), 2025, No. 15, pp. 3290-3300.
Background: Considerable controversy remains in research on pigmented villonodular synovitis regarding its etiology, clinical manifestations, diagnosis, and treatment. Bibliometric and visualization studies of pigmented villonodular synovitis can clarify the development of research in this field and point out directions for future research. Objective: To analyze the global research status, hotspots, and trends of pigmented villonodular synovitis. Methods: The Web of Science and CNKI databases were searched for all publications related to pigmented villonodular synovitis from 1995 to 2023, and CiteSpace and Bibliometrics software were used to perform cluster, co-occurrence, and burst-term analyses of all the literature. The Web of Science Core Collection was searched using subject terms plus free terms, and the CNKI database was searched using subject terms. In total, 986 English-language and 599 Chinese-language articles were included. Results and conclusions: (1) The United States holds the leading position in this field, ranking first in total publications, H-index, and citation counts. China ranks 4th in total publications and 12th in H-index, and its publication quality and international collaboration still need to be strengthened. (2) Cluster analysis of related research in the CNKI database shows that the top five clusters are radiotherapy, soft tissue sarcoma, rheumatoid arthritis, magnetic resonance imaging, and diagnosis. (3) Keywords with bursts continuing into 2023 include colony-stimulating factor 1, tenosynovial giant cell tumor, case report, chromosomal translocation, radiotherapy, expression, and kinase. (4) Keyword and co-citation analyses indicate that research on the clinical characteristics of pigmented villonodular synovitis, the development of new colony-stimulating factor 1 inhibitor drugs, and the application of colony-stimulating factor 1 inhibitors in treatment are current research hotspots. (5) Combining topic evolution with current hotspot analysis, future work should focus on improving diagnostic accuracy, enhancing the precision of treatment, and reducing post-treatment recurrence, on the basis of clarifying the etiology, pathogenesis, and clinical characteristics of the disease.
Keywords: bibliometrics, visualization, CiteSpace, pigmented villonodular synovitis, literature cluster analysis, co-occurrence analysis, joint, hotspot
16. Identification of Seal Impression Colorants by Video Fluorescence Spectral Imaging
Authors: Sun Linjie, Chen Weina, Li Kaikai, Tian Ziyun, Pan Zhenghao, Huang Chen. The Journal of Light Scattering (Peking University Core Journal), 2025, No. 2, pp. 306-316.
Seventy-six seal pastes and seal inks of different brands and types from the domestic market were selected and, simulating real conditions, stamped evenly with normal pressure on the same type of A4 printing paper to obtain 76 seal-impression colorant samples. Fluorescence images of the samples were captured with a Dehao VF10 Pro ultra-depth-of-field high-magnification video fluorescence microscope. The average RGB and LCH pixel values of the fluorescence images were computed in MATLAB, and the t-SNE algorithm together with K-means cluster analysis was applied to the brightness and color values of the fluorescence images for comprehensive quantitative examination and identification. The results show that brightness cluster analysis based on differences in LCH values divides the 76 seal pastes and inks into 4 classes, and color cluster analysis based on differences in RGB values divides them into 6 classes. Comprehensive analysis of fluorescence brightness and color ultimately divides the 76 seal pastes and inks into 21 classes. Video fluorescence spectral imaging can broadly and effectively perform systematic analysis and identification of seal pastes and inks of different brands and types, providing a new technical means of high practical value for the forensic examination of seal impressions.
Keywords: video fluorescence, seal impression colorants, cluster analysis, document examination
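The original analysis was done in MATLAB; as a rough Python equivalent of the dimensionality-reduction-plus-clustering step, the sketch below projects per-sample color/brightness features with t-SNE and clusters the embedding with K-means. scikit-learn is assumed to be available, the feature values are random placeholders, and the perplexity and cluster count are illustrative assumptions.

import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

# Placeholder feature table: one row per seal-impression sample, columns are the
# mean R, G, B and L, C, H values measured from its fluorescence image.
rng = np.random.default_rng(0)
features = rng.random((76, 6))

# 1) Project the 6-dimensional measurements to 2-D with t-SNE
#    (perplexity must be smaller than the number of samples).
embedding = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(features)

# 2) K-means on the embedded points; the number of clusters would be chosen to
#    match the groups actually observed (4 by brightness, 6 by color in the paper).
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(embedding)
print(np.bincount(labels))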
17. Research on Document Clustering Based on Big Data and Ensemble Methods
Authors: Zheng Fang, Li Zhiwei, Wang Wei. Computer and Digital Engineering, 2025, No. 1, pp. 152-157, 208.
Clustering is a widely used technique in unsupervised machine learning, but because data labels are usually unavailable, assessing the quality of its results is a difficult problem. Although many previous methods can validate clustering quality, a single metric cannot provide a comprehensive evaluation. Moreover, big data usually contains a considerable proportion of noise, which requires additional refinements during clustering. Addressing these issues, this paper first performs dataset preprocessing for document clustering, including tokenization, stop-word removal, stemming, and vector space transformation; it then clusters the large-scale document data with improved K-means and K-center methods based on fuzzy logic; finally, an ensemble of seven different validity metrics is used, with absolute consensus and majority consensus, to evaluate the results and determine the optimal quality of the fuzzy clustering. In addition, the standard document datasets are clustered with different numbers of clusters, demonstrating that the proposed method can determine the optimal number of clusters.
Keywords: ensemble methods, data mining, document clustering, fuzzy logic
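The paper's fuzzy-logic-based K-means variant is not spelled out in the abstract; as a generic stand-in, a standard fuzzy c-means in NumPy is sketched below (the ensemble of seven validity indices is not reproduced, and the fuzzifier m and other constants are assumptions).

import numpy as np

def fuzzy_c_means(X, k, m=2.0, iters=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means: returns (centroids, membership matrix U of shape n x k)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), k))
    U /= U.sum(axis=1, keepdims=True)                       # memberships sum to 1 per point
    for _ in range(iters):
        Um = U ** m
        centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]    # weighted centroid update
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / (d ** (2 / (m - 1)) * np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centroids, U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.4, (60, 5)), rng.normal(2, 0.4, (60, 5))])
centroids, U = fuzzy_c_means(X, k=2)
hard_labels = U.argmax(axis=1)   # defuzzified assignment, for comparison with plain K-means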
18. Optimization of a Semantic Database for the Information Network Model
Authors: Zhou Ruiping, Liu Mengchi, Tang Shiqi, Xie Shibin. Computer and Digital Engineering, 2025, No. 9, pp. 2509-2515, 2565.
Against the background of big data, the total volume of data generated by industries that use the Internet as a medium for information exchange has grown explosively, posing new challenges for the storage and querying of Web semantic data. To meet the large-scale storage needs of unstructured semantic data, the research group designed and implemented a distributed semantic data storage system for the information network model based on the BerkeleyDB storage engine, which solved the problem that relational databases cannot store massive unstructured semantic data; however, the system still suffered from storage and access efficiency bottlenecks. Therefore, to improve the overall read/write performance of the semantic database for the information network model, this work continues to organize and model semantic data with the information network model, designs a mapping algorithm from the semantic object schema to the BSON document schema, and proposes a large-scale semantic data storage design based on a MongoDB distributed cluster architecture. Comparative experiments on read/write performance show that, at a semantic data scale of 1.2 GB, average insertion time is reduced by 19.8% and average complex query time is reduced by 59.3%.
Keywords: semantic data, information network model, object-document mapping algorithm, MongoDB cluster, sharding strategy
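The paper's object-to-BSON mapping algorithm is not given in the abstract; purely as an illustration of the general idea (flattening an information-network-style object, i.e. attributes plus typed links, into one MongoDB document), a minimal pymongo sketch might look like the following. It assumes a local MongoDB instance; the connection string, database and collection names, and the document layout are all placeholders.

from pymongo import MongoClient

def object_to_bson(obj_id, attributes, relations):
    """Flatten one semantic object (attributes + outgoing links) into a single document."""
    return {
        "_id": obj_id,
        "attrs": attributes,                                         # literal attribute values
        "links": [{"rel": r, "target": t} for r, t in relations],    # typed edges by object id
    }

doc = object_to_bson(
    "person:42",
    {"name": "Ada", "born": 1815},
    [("authored", "work:7"), ("lived_in", "place:3")],
)

client = MongoClient("mongodb://localhost:27017")   # placeholder connection string
coll = client["semantic_db"]["objects"]
coll.insert_one(doc)
# Index the link targets so graph-style lookups ("who links to work:7?") stay fast.
coll.create_index("links.target")
print(coll.find_one({"links.target": "work:7"})["_id"])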
19. Design and Application of a Media Cluster Processing Center for a Converged-Media TV Broadcast System
Author: Zhang Xinyue. Radio & TV Broadcast Engineering, 2025, No. 10, pp. 35-38.
As ultra-high-definition program production at Xinjiang Radio and Television Station continues to deepen, and under the deep integration of production and broadcasting, the station has built a converged-media cluster processing center to meet the business requirements of safe broadcasting and interconnection interfaces. Based on the existing hard-disk playout system, the system adopts a distributed architecture to achieve file-based preparation across the station-wide network, effectively integrating resources such as new media, the 4K news studio, and 5G outside-broadcast vans. This paper focuses on the system architecture of the media processing center and describes the innovation and application of its key technologies. The implementation of the system has significantly improved the integration of production and broadcasting, laying a solid foundation for building a station-wide safe broadcasting system.
Keywords: file-based production and broadcasting, content aggregation subsystem, switching cluster, interconnection
20. A Swarm-Intelligence-Based Web Document Clustering Algorithm (cited 41 times)
Authors: Wu Bin, Fu Weipeng, Zheng Yi, Liu Shaohui, Shi Zhongzhi. Journal of Computer Research and Development (EI, CSCD, Peking University Core Journal), 2002, No. 11, pp. 1429-1435.
Applying a swarm intelligence clustering model to document clustering, this paper proposes a swarm-intelligence-based Web document clustering algorithm. First, the vector space model is used to represent Web document information, and conventional methods such as stop-word removal and feature term reduction rules are used to obtain the text feature set. The document vectors are then randomly distributed on a plane, and a swarm-intelligence-based clustering method is used to cluster the documents; finally, a recursive algorithm collects the clustering results from the plane. To improve the practicality of the algorithm, the original algorithm is combined with the k-means algorithm into a hybrid clustering algorithm. Experimental comparisons show that the swarm-intelligence-based Web document clustering algorithm has good clustering properties and can group Web documents related to one topic into a single cluster relatively completely and accurately.
Keywords: swarm intelligence, Web, document clustering algorithm, self-organizing clustering, swarm similarity, Internet, information retrieval
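The general pick-up/drop mechanism behind this family of methods (ants scatter document vectors on a plane and gradually move similar documents next to each other) can be sketched very roughly in the Lumer-Faieta style below. The paper's own swarm similarity measure, recursive collection step, and k-means hybridization are not reproduced, and the similarity function and constants k1, k2 are assumptions.

import numpy as np

def ant_cluster(X, grid=20, ants=10, steps=20000, radius=1, k1=0.1, k2=0.15, seed=0):
    """Ants pick up isolated documents and drop them next to similar ones on a 2-D grid."""
    rng = np.random.default_rng(seed)
    pos = rng.integers(0, grid, size=(len(X), 2))          # grid cell of each document
    ant_pos = rng.integers(0, grid, size=(ants, 2))
    carrying = [None] * ants
    carried = set()

    def local_density(i, cell):
        """Average similarity of document i to the documents lying near `cell`."""
        near = [j for j in range(len(X))
                if j != i and j not in carried and np.abs(pos[j] - cell).max() <= radius]
        if not near:
            return 0.0
        sims = [1.0 - np.linalg.norm(X[i] - X[j]) /
                (np.linalg.norm(X[i]) + np.linalg.norm(X[j]) + 1e-12) for j in near]
        return max(0.0, float(np.mean(sims)))

    for _ in range(steps):
        a = rng.integers(ants)
        ant_pos[a] = (ant_pos[a] + rng.integers(-1, 2, size=2)) % grid   # random walk on the grid
        if carrying[a] is None:
            here = [i for i in range(len(X))
                    if i not in carried and np.array_equal(pos[i], ant_pos[a])]
            if here:
                i = here[0]
                f = local_density(i, ant_pos[a])
                if rng.random() < (k1 / (k1 + f)) ** 2:    # isolated documents get picked up
                    carrying[a] = i
                    carried.add(i)
        else:
            i = carrying[a]
            f = local_density(i, ant_pos[a])
            if rng.random() < (f / (k2 + f)) ** 2:         # drop next to similar documents
                pos[i] = ant_pos[a]
                carrying[a] = None
                carried.discard(i)
    return pos   # nearby grid cells after many steps indicate documents on the same topic

X = np.random.default_rng(3).random((40, 8))
positions = ant_cluster(X, steps=5000)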