Journal Articles
1,490 articles found
1. Chinese to Braille Translation Based on Braille Word Segmentation Using Statistical Model (Cited by 2)
Authors: 王向东, 杨阳, 张金超, 姜文斌, 刘宏, 钱跃良. Journal of Shanghai Jiaotong University (Science) (EI), 2017, Issue 1, pp. 82-86.
Automatic translation of Chinese text to Chinese Braille is important for blind people in China to acquire information using computers or smart phones. In this paper, a novel scheme of Chinese-Braille translation is proposed. Under the scheme, a Braille word segmentation model based on statistical machine learning is trained on a Braille corpus, and Braille word segmentation is carried out using the statistical model directly, without a separate Chinese word segmentation stage. This method avoids establishing rules concerning syntactic and semantic information and instead uses a statistical model to learn the rules implicitly and automatically. To further improve the performance, an algorithm for fusing the results of Chinese word segmentation and Braille word segmentation is also proposed. Our results show that the proposed method achieves an accuracy of 92.81% for Braille word segmentation and considerably outperforms current approaches using the segmentation-merging scheme.
Keywords: Chinese Braille, word segmentation, perceptron algorithm. CLC: TP391.1
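The statistical segmentation idea above can be sketched as a boundary classifier: a perceptron scores each character position with simple context features, and a positive score marks a word boundary. This is a minimal stand-in under assumed features, not the paper's actual model; `char_features` and the toy training sample are hypothetical.

```python
from collections import defaultdict

def char_features(text, i):
    # assumed context-window features around position i
    prev = text[i - 1] if i > 0 else "<s>"
    nxt = text[i + 1] if i + 1 < len(text) else "</s>"
    return {f"c0={text[i]}", f"c-1={prev}", f"c+1={nxt}", f"bi={text[i]}{nxt}"}

def train_boundary_perceptron(samples, epochs=5):
    """samples: list of (text, set of indices after which a word ends)."""
    w = defaultdict(float)
    for _ in range(epochs):
        for text, bounds in samples:
            for i in range(len(text)):
                feats = char_features(text, i)
                pred = sum(w[f] for f in feats) > 0
                gold = i in bounds
                if pred != gold:
                    delta = 1.0 if gold else -1.0
                    for f in feats:
                        w[f] += delta
    return w

def segment(text, w):
    words, cur = [], ""
    for i, ch in enumerate(text):
        cur += ch
        if sum(w[f] for f in char_features(text, i)) > 0 or i == len(text) - 1:
            words.append(cur)
            cur = ""
    return words
```

In the paper the model is trained directly on a Braille corpus; here the same update rule is shown on a single toy sentence.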
2. Chinese word segmentation with local and global context representation learning (Cited by 2)
Authors: 李岩, Zhang Yinghua, Huang Xiaoping, Yin Xucheng, Hao Hongwei. High Technology Letters (EI, CAS), 2015, Issue 1, pp. 71-77.
A local and global context representation learning model for Chinese characters is designed and a Chinese word segmentation method based on character representations is proposed in this paper. First, the proposed Chinese character learning model uses the semantics of local context and global context to learn the representation of Chinese characters. Then, a Chinese word segmentation model is built with a neural network, and the segmentation model is trained with the character representations as its input features. Finally, experimental results show that the Chinese character representations effectively capture semantic information: characters with similar semantics cluster together in the visualization space. Moreover, the proposed Chinese word segmentation model also achieves notable improvements in precision, recall and F-measure.
Keywords: local and global context representation learning, Chinese character representation, Chinese word segmentation
3. An Improved Unsupervised Approach to Word Segmentation
Authors: WANG Hanshi, HAN Xuhong, LIU Lizhen, SONG Wei, YUAN Mudan. China Communications (SCIE, CSCD), 2015, Issue 7, pp. 82-95.
ESA is an unsupervised approach to word segmentation previously proposed by Wang, an iterative process consisting of three phases: Evaluation, Selection and Adjustment. In this article, we propose ExESA, an extension of ESA. In ExESA, the original approach is extended to a 2-pass process, and the ratio of different word lengths is introduced as a third type of information combined with cohesion and separation. A maximum strategy is adopted to determine the best segmentation of a character sequence in the Selection phase. Besides, in Adjustment, ExESA re-evaluates separation information and individual information to correct overestimated frequencies. Additionally, a smoothing algorithm is applied to alleviate data sparseness. The experimental results show that ExESA further improves performance and saves time by properly utilizing more information from un-annotated corpora. Moreover, the parameters of ExESA can be predicted by a set of empirical formulae or combined with the minimum description length principle.
Keywords: word segmentation, character sequence, smoothing algorithm, maximum strategy
4. Applying rough sets in word segmentation disambiguation based on maximum entropy model
Authors: 姜维, 王晓龙, 关毅, 梁国华. Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2006, Issue 1, pp. 94-98.
To solve the complicated feature extraction and long-distance dependency problems in Word Segmentation Disambiguation (WSD), this paper proposes to apply rough sets in WSD based on the Maximum Entropy model. First, rough set theory is applied to extract complicated features and long-distance features, even from noisy or inconsistent corpora. Second, these features are added into the Maximum Entropy model, so that feature weights can be assigned according to the performance of the whole disambiguation model. Finally, a semantic lexicon is adopted to build class-based rough set features to overcome data sparseness. The experiments indicate that our method performed better than previous models, which had ranked top in WSD in the 863 Evaluation in 2003. This system ranked first and second respectively in the MSR and PKU open tests of the Second International Chinese Word Segmentation Bakeoff held in 2005.
Keywords: word segmentation, feature extraction, rough sets, maximum entropy
5. Design and Implementation of a New Chinese Word Segmentation Dictionary for the Personalized Mobile Search
Authors: Zhongmin Wang, Jingna Qi, Yan He. Communications and Network, 2013, Issue 1, pp. 81-85.
Chinese word segmentation is the basis of natural language processing. The dictionary mechanism significantly influences the efficiency of word segmentation and the understanding of the user's intention implied in the user's query. As traditional dictionary mechanisms cannot meet the present needs of personalized mobile search, this paper presents a new dictionary mechanism that contains word classification information. The paper furthermore puts forward an approach for improving the traditional word bank structure and proposes an improved FMM segmentation algorithm. The results show that the new dictionary mechanism significantly improves query efficiency and better meets users' individual requirements.
Keywords: Chinese word segmentation, dictionary mechanism, natural language processing, personalized search, word classification information
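The FMM (forward maximum matching) baseline that this entry improves on can be sketched in a few lines: scan left to right and greedily take the longest dictionary word starting at the current position, falling back to a single character. The lexicon below is hypothetical; the paper's contribution is the dictionary structure behind the lookup, which this sketch does not reproduce.

```python
def fmm_segment(text, lexicon, max_len=4):
    """Forward maximum matching: greedily take the longest lexicon word."""
    words, i = [], 0
    while i < len(text):
        # try the longest candidate first, shrink until a match (or one char)
        for l in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + l]
            if l == 1 or cand in lexicon:
                words.append(cand)
                i += l
                break
    return words
```

With a toy lexicon, `fmm_segment("中华人民共和国成立", {"中华人民共和国", "成立", "人民"}, max_len=7)` prefers the 7-character word over its sub-words.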
6. Improvement in Accuracy of Word Segmentation of a Web-Based Japanese-to-Braille Translation Program for Medical Information
Authors: Tsuyoshi Oda, Aki Sugano, Masashi Shimbo, Kenji Miura, Mika Ohta, Masako Matsuura, Mineko Ikegami, Tetsuya Watanabe, Shinichi Kita, Akihiro Ichinose, Eiichi Maeda, Yuji Matsumoto, Yutaka Takaoka. 《通讯和计算机(中英文版)》, 2013, Issue 1, pp. 82-89.
Keywords: medical information, translation program, Web, Braille, word segmentation, accuracy, natural language processing, proper nouns
7. Remove Redundancy Samples for SVM in A Chinese Word Segmentation Task
Authors: Feiliang Ren, Tianshun Yao. 《通讯和计算机(中英文版)》, 2006, Issue 5, pp. 103-107.
Keywords: text processing, variable-parameter systems, software development, data processing
8. Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node (Cited by 2)
Authors: Nuo Qun, Hang Yan, Xi-Peng Qiu, Xuan-Jing Huang. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2020, Issue 5, pp. 1115-1126.
Semi-Markov conditional random fields (Semi-CRFs) have been successfully utilized in many segmentation problems, including Chinese word segmentation (CWS). The advantage of Semi-CRF lies in its inherent ability to exploit properties of segments instead of individual elements of sequences. Despite this theoretical advantage, Semi-CRF is still not the best choice for CWS because its computational complexity is quadratic in the sentence length. In this paper, we propose a simple yet effective framework that helps Semi-CRF achieve performance comparable to CRF-based models under similar computational complexity. Specifically, we first adopt a bi-directional long short-term memory (BiLSTM) network at the character level to model context information, and then use a simple but effective fusion layer to represent segment information. Besides, to model arbitrarily long segments within linear time complexity, we also propose a new model named Semi-CRF-Relay. The direct modeling of segments makes it easy to combine word features, and CWS performance can be enhanced merely by adding publicly available pre-trained word embeddings. Experiments on four popular CWS datasets show the effectiveness of the proposed methods. The source code and pre-trained embeddings are available at https://github.com/fastnlp/fastNLP/.
Keywords: Semi-Markov conditional random field (Semi-CRF), Chinese word segmentation, bi-directional long short-term memory, deep learning
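The "properties of segments" idea at the heart of Semi-CRF is a dynamic program over segments rather than characters: with segments capped at length L, decoding a length-n sentence costs O(nL). The sketch below shows only that decoding step with a hypothetical `score(i, j)` for the segment spanning positions i..j; the paper's BiLSTM features, fusion layer, and relay node are not reproduced.

```python
def semi_markov_decode(n, max_len, score):
    """Best segmentation of a length-n sequence into segments of length <= max_len.

    score(i, j) is the (assumed externally supplied) score of segment [i, j).
    Returns the list of (start, end) boundaries of the best-scoring split.
    """
    NEG_INF = float("-inf")
    best = [NEG_INF] * (n + 1)   # best[i]: best score of a split of the prefix [0, i)
    best[0] = 0.0
    back = [0] * (n + 1)         # back[i]: start of the last segment ending at i
    for i in range(1, n + 1):
        for l in range(1, min(max_len, i) + 1):
            s = best[i - l] + score(i - l, i)
            if s > best[i]:
                best[i], back[i] = s, i - l
    # recover segment boundaries by walking the backpointers
    cuts, i = [], n
    while i > 0:
        cuts.append((back[i], i))
        i = back[i]
    return list(reversed(cuts))
```

For example, a toy score that rewards the segments "ab" and "cd" of the string "abcd" yields the split [(0, 2), (2, 4)].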
9. Word Segmentation Based on Database Semantics in NChiql (Cited by 2)
Authors: 孟小峰, 刘爽, 王珊. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2000, Issue 4, pp. 346-354.
In this paper a novel word-segmentation algorithm is presented to delimit words in Chinese natural language queries in NChiql, a Chinese natural language query interface to databases. Although there is a sizable literature on Chinese segmentation, existing methods cannot satisfy the particular requirements of this system. The novel word-segmentation algorithm is based on database semantics, namely a Semantic Conceptual Model (SCM) for specific domain knowledge. Based on SCM, the segmenter labels words directly with database semantics, which eases disambiguation and translation (from natural language to database query) in NChiql.
Keywords: database query, natural language processing, word segmentation, disambiguation
10. Construction of Word Segmentation Model Based on HMM+BI-LSTM
Authors: Hang Zhang, Bin Wen. 《国际计算机前沿大会会议论文集》, 2020, Issue 2, pp. 47-61.
Chinese word segmentation plays an important role in search engines, artificial intelligence, machine translation and so on. There are currently three main word segmentation approaches: dictionary-based, statistics-based, and understanding-based algorithms. However, few works combine these three methods, or even two of them. Therefore, a Chinese word segmentation model is proposed that combines a statistical word segmentation algorithm with an understanding-based one: it combines Hidden Markov Model (HMM) word segmentation and Bi-LSTM word segmentation to improve accuracy. The main method is to collect lexical statistics on the results of the two segmenters, choose the best results based on those statistics, and then combine them into the final segmentation result. This combined word segmentation model is evaluated on the MSRA corpus provided by Bakeoff. Experiments show that the accuracy of its segmentation results is 12.52% higher than that of the traditional HMM model and 0.19% higher than that of the Bi-LSTM model.
Keywords: Chinese word segmentation, HMM, Bi-LSTM, sequence tagging
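The HMM half of this combination treats segmentation as BMES tagging (Begin/Middle/End/Single) decoded with the Viterbi algorithm. The sketch below uses toy hand-set log-probabilities, not the paper's trained parameters, purely to show the mechanics; a path must end in E or S.

```python
NEG_INF = float("-inf")

def viterbi(chars, start, trans, emit, finals=("E", "S")):
    """Best BMES tag path under an HMM given in log-probabilities."""
    V = [{s: start[s] + emit(s, chars[0]) for s in start}]
    path = {s: [s] for s in start}
    for ch in chars[1:]:
        V.append({})
        new_path = {}
        for s in start:
            best_prev, best_score = None, NEG_INF
            for p in start:
                sc = V[-2][p] + trans.get(p, {}).get(s, NEG_INF)
                if sc > best_score:
                    best_prev, best_score = p, sc
            V[-1][s] = best_score + emit(s, ch)
            new_path[s] = path[best_prev] + [s]
        path = new_path
    return path[max(finals, key=lambda s: V[-1][s])]

def tags_to_words(chars, tags):
    """Turn a BMES tag sequence back into words."""
    words, cur = [], ""
    for ch, t in zip(chars, tags):
        cur += ch
        if t in ("E", "S"):
            words.append(cur)
            cur = ""
    if cur:
        words.append(cur)
    return words

# toy hand-set log-probabilities (hypothetical, for illustration only)
START = {"B": -0.7, "M": NEG_INF, "E": NEG_INF, "S": -0.7}
TRANS = {"B": {"M": -1.2, "E": -0.3}, "M": {"M": -1.2, "E": -0.3},
         "E": {"B": -0.4, "S": -1.0}, "S": {"B": -1.0, "S": -0.6}}
EMIT_TABLE = {("B", "中"): -0.2, ("E", "国"): -0.2, ("S", "人"): -0.7}

def emit(state, ch):
    return EMIT_TABLE.get((state, ch), -1.6)
```

Under this toy model, "中国人" decodes to the tags B-E-S, i.e. the words 中国 and 人.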
11. Effective Analysis of Chinese Word-Segmentation Accuracy
Authors: MA Weiyin. 《现代电子技术》, 2007, Issue 4, pp. 108-110.
Automatic word segmentation is widely used for ambiguity resolution when processing large-scale real text, but during unknown word detection in Chinese word segmentation, many detected word candidates are invalid. These false unknown word candidates deteriorate the overall segmentation accuracy, as they affect the segmentation accuracy of known words. In this paper, we propose several methods for reducing the difficulty and improving the accuracy of word segmentation of written Chinese, such as full segmentation of a sentence, handling reduplicated words and idioms, and statistical identification of unknown words. A simulation shows the feasibility of the proposed methods in improving the accuracy of Chinese word segmentation.
Keywords: Chinese information processing, Chinese character processing, automatic segmentation, efficiency analysis
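"Full segmentation of a sentence" means enumerating every candidate split whose multi-character pieces are dictionary words, so that downstream scoring can pick among them. A minimal recursive sketch with a hypothetical lexicon (the paper's actual procedure is not specified here):

```python
def full_segment(text, lexicon, max_len=4):
    """Enumerate every segmentation whose multi-char pieces are lexicon words.

    Single characters are always allowed, so at least one split exists.
    """
    if not text:
        return [[]]
    results = []
    for l in range(1, min(max_len, len(text)) + 1):
        head = text[:l]
        if l == 1 or head in lexicon:
            results += [[head] + rest
                        for rest in full_segment(text[l:], lexicon, max_len)]
    return results
```

For "abc" with lexicon {"ab", "bc"} this yields three candidate splits, including ["ab", "c"] and ["a", "bc"], the classic overlapping-ambiguity pair.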
12. Research on Clustering Chinese Words Using word2vec (Cited by 30)
Authors: 郑文超, 徐鹏. 《软件》, 2013, Issue 12, pp. 160-162.
Text clustering plays an important role in data mining and machine learning, and after years of development the technique has produced a series of theoretical results. Building on previous work, this paper explores a new Chinese clustering method. It first proposes a Chinese word segmentation algorithm to split Chinese text into individual words. The processed corpus is then converted into word vectors with the Word2Vec toolkit, which applies deep neural network algorithms. Finally, the cosine distance between word vectors is defined as the similarity between words, and the K-means algorithm clusters the resulting vectors, so that the system can return the words in the corpus semantically closest to an input word. The method was evaluated on web news data crawled from 2012 and achieved good experimental results.
Keywords: data mining, clustering, word segmentation, word vectors, neural networks
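The "return the semantically closest words" step reduces to ranking the vocabulary by cosine similarity against the query word's vector. A stdlib-only sketch over pre-computed vectors (the toy 2-d vectors stand in for real Word2Vec output):

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def most_similar(word, vectors, topn=3):
    """Rank the vocabulary by cosine similarity to `word`, excluding itself."""
    target = vectors[word]
    ranked = sorted((w for w in vectors if w != word),
                    key=lambda w: cosine(vectors[w], target),
                    reverse=True)
    return ranked[:topn]
```

With hypothetical vectors where 北京 and 上海 point in nearly the same direction, 上海 ranks above an unrelated word.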
13. Several Tips for Editing Long Documents in Word 2007 (Cited by 2)
Authors: 楚叶峰, 周海涛. 《长春大学学报》, 2013, Issue 2, pp. 241-245.
Word 2007 is widely used in office work, and editing long documents with it involves many considerations. This paper briefly introduces several practical tips for editing long documents that can achieve twice the result with half the effort, mainly covering section breaks, creating styles, table-of-contents extraction, double-sided printing, and several special typesetting techniques.
Keywords: Word 2007, typesetting, section breaks, styles, printing
14. Chinese Word Boundary Ambiguity and Unknown Word Resolution Using Unsupervised Methods (Cited by 1)
Authors: 傅国宏. High Technology Letters (EI, CAS), 2000, Issue 2, pp. 29-39.
An unsupervised framework is described that partially resolves four issues in developing a robust Chinese segmentation system: ambiguity, unknown words, knowledge acquisition and efficient algorithms. It first proposes a statistical segmentation model integrating a simplified character juncture model (SCJM) with word formation power. The advantage of this model is that it can simultaneously employ the affinity of characters inside or outside a word and word formation power to process disambiguation, and all the parameters can be estimated in an unsupervised way. After investigating the differences between the real and theoretical size of the segmentation space, we apply the A* algorithm to perform segmentation without exhaustively searching all potential segmentations. Finally, an unsupervised version of Chinese word formation patterns for detecting unknown words is presented. Experiments show that the proposed methods are efficient.
Keywords: word segmentation, character juncture, word formation pattern
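Character-juncture statistics of the kind this entry exploits are often approximated by a pointwise-mutual-information-style cohesion score: a character pair that co-occurs far more often than chance is likely word-internal. This is a simplified stand-in estimated from a raw string, not the paper's SCJM:

```python
import math
from collections import Counter

def cohesion(text, pair):
    """PMI-style cohesion of a two-character string, estimated from `text`."""
    chars = Counter(text)
    bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    n = len(text)
    p_ab = bigrams[pair] / (n - 1)          # joint probability of the pair
    p_a = chars[pair[0]] / n                # marginal of the first character
    p_b = chars[pair[1]] / n                # marginal of the second character
    return math.log(p_ab / (p_a * p_b))
```

In the toy corpus "abab", the pair "ab" (which always co-occurs) scores higher than the accidental juncture "ba".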
15. Word Cloud Design and Optimization Using the Jieba and Wordcloud Libraries (Cited by 22)
Authors: 徐博龙. 《福建电脑》, 2019, Issue 6, pp. 25-28.
Word segmentation is an important application in Python, and many tools implement it, such as jieba, SnowNLP, THULAC and NLPIR. A word cloud is designed and built on top of segmentation: it highlights the key points of a whole text, reveals key concepts, and can be presented in different visual forms to readers in an interesting, efficient and novel way. Taking Chinese word segmentation as an example, this paper describes in detail the design and optimization of word clouds using the jieba and wordcloud libraries.
Keywords: Python, Chinese word segmentation, word cloud, jieba, wordcloud
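Between segmentation and rendering sits a frequency table: in the jieba/wordcloud pipeline described above, one typically segments with `jieba.lcut(text)` and then feeds word counts to `WordCloud().generate_from_frequencies(...)`. The stdlib-only sketch below covers just that middle step (the stop-word list is a hypothetical minimal one; single characters are dropped as noise):

```python
from collections import Counter

STOPWORDS = {"的", "了", "是"}  # assumed minimal stop-word list

def word_frequencies(words, stopwords=STOPWORDS):
    """Frequency table of the shape generate_from_frequencies expects:
    a mapping from word to count, with stop words and single chars removed."""
    return Counter(w for w in words if w not in stopwords and len(w) > 1)
```

Given a pre-segmented list such as `["分词", "的", "词云", "分词", "了"]`, the table keeps only the multi-character content words with their counts.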
16. Tips for Editing and Typesetting a Graduation Thesis in Word (Cited by 4)
Authors: 刘敏. 《电脑学习》, 2009, Issue 2, pp. 112-113.
This paper introduces common tips for editing and typesetting a graduation thesis in Word.
Keywords: Word, typesetting, styles, section breaks, table of contents
17. Intelligent Early Warning of Text Leakage Based on the GA-LGBM Algorithm
Authors: 叶磊, 李卫国, 蔡翔, 魏绪亮, 孙露露, 杜成斌. 《电子设计工程》, 2026, Issue 4, pp. 178-181, 187.
To effectively identify and warn of privacy-leakage risks in text data, an intelligent text-leakage warning method based on the GA-LGBM algorithm is designed. The text data is preprocessed by cleaning, word segmentation and stop-word removal, and then vectorized with the Word2Vec model, converting it into numerical features. A Light Gradient Boosting Machine (LightGBM) model optimized by a Genetic Algorithm (GA), the GA-LGBM algorithm, is proposed, combining GA's global search capability with LightGBM's predictive power to optimize warning performance. Test results show that with larger amounts of data the method produces few false or missed warnings and a high proportion of correct warnings; when the test set shifts from relatively balanced to extremely imbalanced, the method maintains a high AUC value and good warning performance.
Keywords: word segmentation, stop words, Word2Vec model, GA-LGBM algorithm, intelligent early warning
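The GA half of GA-LGBM searches a hyperparameter space by selection, crossover and mutation. The toy sketch below is not the paper's algorithm and does not touch LightGBM: it hill-climbs a stand-in objective whose two coordinates are shaped like a learning-rate and a num-leaves range, with all operators and constants assumed.

```python
import random

def ga_search(fitness, bounds, pop_size=8, generations=20, seed=0):
    """Toy genetic algorithm: keep the top half each generation, breed the
    rest by averaging two elite parents plus gaussian mutation, clip to bounds."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            child = [min(max((x + y) / 2 + rng.gauss(0, 0.05 * (hi - lo)), lo), hi)
                     for x, y, (lo, hi) in zip(a, b, bounds)]
            children.append(child)
        pop = elite + children          # elitism: the best survivor is never lost
    return max(pop, key=fitness)

# hypothetical example: tune a (learning-rate, num-leaves)-style pair
# against a stand-in objective with its optimum at (0.1, 31)
best = ga_search(lambda p: -(p[0] - 0.1) ** 2 - (p[1] - 31) ** 2,
                 bounds=[(0.01, 0.3), (8, 64)])
```

Because elites survive every generation, the returned candidate is never worse than the best member of the initial population.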
18. Grammar as the Framework, Perception as the Application: Optimization and Practice of the Word Segmentation and Conjoined-Writing Rules of National Common Braille
Authors: 潘江. 《牡丹江大学学报》, 2026, Issue 1, pp. 36-41, 61.
Word segmentation and conjoined writing in National Common Braille is a key link through which visually impaired people acquire information by touch, build linguistic logic, and achieve educational equity and social inclusion; the scientific soundness of its rules directly affects Braille readers' reading fluency, depth of semantic understanding, and language-learning outcomes. Unreasonable segmentation and conjoining fragments tactile perception, breaks semantic links, and increases the cognitive burden on visually impaired readers. Current National Common Braille segmentation follows three principles: conforming to Chinese grammar, following the logic and habits of the language, and keeping syllable lengths moderate. In practice, however, these three often conflict because of their differing internal logic, leaving Braille editors facing operational dilemmas. By analyzing the core content of the three principles and how they conflict, this paper adopts a single guiding idea, the grammatical-structure principle as the overall framework with the other principles as auxiliary, and constructs a solution in which subject-predicate structures are written separately as the rule, attributive and verb-object structures are handled as needed, and multi-morpheme combinations are adapted according to syllable properties; in particular, it makes explicit that fixed semantic units such as idioms should be written conjoined as a whole. The aim is to provide a reference for optimizing Chinese Braille writing norms and for educational practice for the visually impaired.
Keywords: National Common Braille, word segmentation and conjoined writing, three basic principles, grammar-based framework, syllable matching, idiom conjoining
19. A New Word Detection Method for Chinese Based on Local Context Information (Cited by 1)
Authors: 曾华琳, 周昌乐, 郑旭玲. Journal of Donghua University (English Edition) (EI, CAS), 2010, Issue 2, pp. 189-192.
Finding out-of-vocabulary words is an urgent and difficult task in Chinese word segmentation. To avoid the defects caused by offline training in traditional methods, this paper proposes an improved prediction by partial match (PPM) segmentation algorithm for Chinese based on extracting local context information, which adds the context information of the test text into the local PPM statistical model so as to guide the detection of new words. The algorithm focuses on online segmentation and new word detection, achieves good results in both closed and open tests, and outperforms some well-known Chinese segmentation systems to a certain extent.
Keywords: new word detection, improved PPM model, context information, Chinese word segmentation
20. User Review Analysis Based on TF-IDF and Word2vec (Cited by 4)
Authors: 刘宇韬, 施莉, 刘诗含. 《成都航空职业技术学院学报》, 2022, Issue 4, pp. 89-92.
Taking computer products as the experimental subject, this paper crawls review data with a web crawler and segments the user reviews into words, then analyzes the results with TF-IDF and Word2vec respectively, computing the high-frequency terms in the reviews and their correlations so as to understand users' concerns about this class of products and related issues, and finally offers guidance to manufacturers and e-commerce platforms.
Keywords: user reviews, Chinese word segmentation, TF-IDF, Word2vec
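The TF-IDF side of this analysis weights each term by its frequency within a review, discounted by how many reviews contain it. A stdlib-only sketch of the standard formulation (raw term frequency times log(N/df); the paper's exact smoothing choices are not specified here):

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one dict of TF-IDF weights per doc."""
    n = len(docs)
    df = Counter()                       # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights
```

A term appearing in every review (like 好 below) gets weight 0, while review-specific terms are promoted, which is exactly why TF-IDF surfaces each product's distinctive complaints.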