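Abstract: To extract information from and classify texts about academic activities at universities, a classification system for university academic activities was built on an attention-based long short-term memory (Attention-Based LSTM) feature selection model. The characteristics of a university academic-activity corpus were analyzed extensively, and the relevant content of each academic activity was extracted accurately, which improved the quality of the text data. An improved Attention-Based LSTM feature selection model was proposed that reduces data dimensionality and effectively highlights key information. Experimental results show that the method improves classification accuracy and clearly outperforms both a plain LSTM (Long Short-Term Memory) model and traditional models.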
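The abstract does not describe the network in detail, so the sketch below shows only the generic attention-pooling step that attention-based LSTM models typically place on top of the LSTM hidden states. It assumes dot-product scoring against a context vector. The hidden states in the usage example are toy numbers, not real LSTM outputs.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: List[List[float]], context: List[float]):
    """Score each time step's hidden state against a context vector
    (dot product, an assumption), normalise the scores with softmax, and
    return the attention weights plus the weighted sum of hidden states."""
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, pooled
```

With hidden states `[[1, 0], [0, 1], [1, 1]]` and context `[1, 1]`, the third time step scores highest and therefore receives the largest weight. This is how attention "highlights key information" before the pooled vector is passed to a classifier.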
Abstract: It has been suggested that text-based computer-mediated communication (CMC) can help learners use the target language both in classrooms and in social contexts. It is therefore necessary to investigate the effect of text-based CMC on learners' communicative competence, and this study does so by means of a systematic review. The findings suggest that text-based CMC settings allow learners to interact, and that this interaction gives them more opportunities to develop their communicative competence in the target language.
Abstract: Advances in science and technology have made it not only possible but very convenient for people in different parts of the world to communicate with each other, giving rise to a new form of communication: computer-mediated communication (CMC). Text-based CMC, in which people send instant messages to others in various settings, is one of the most popular forms of CMC. Because this mode of interaction combines features of both written and spoken language (Greenfield & Subrahmanyam, 2003), it is of great interest whether it follows the same sequential rules as telephone conversation. However, computer-mediated communication has received much less attention than telephone conversation, and text-based CMC has received even less. The existing literature focuses mostly on content analysis and linguistic features and neglects the sequential organization of such interaction (Paolillo, 1999; Greenfield & Subrahmanyam, 2003; Herring, 1999). In light of this, this paper examines the opening moves of instant-message exchanges among Chinese adults in order to identify the unique features that characterize the way they open an online chat. The data are analyzed using the sequential model that Schegloff proposed for American telephone openings.
Abstract: Every day, millions of users use a variety of social media systems. These services produce massive numbers of messages, which play a vital role in the social networking paradigm. An intelligent emotion-learning system is therefore urgently needed to detect the emotions expressed in these messages, and such a system could help in understanding users' feelings towards a particular discussion. This paper proposes a text-based emotion recognition approach that uses personal text data to recognize a user's current emotion by applying the Dominant Meaning Technique. The paper reports promising experimental results for the proposed algorithm on the tested dataset.
Funding: Supported by the Innovation Platform Construction of Qinghai Province (No. 2016-ZJ-Y04) and the Basic Research Program of Qinghai Province (No. 2016-ZJ-740).
Abstract: Text extraction is an important first step in digitizing historical documents. In this paper, we present a text extraction method for historical Tibetan document images based on block projections. Text extraction is treated as a text-area detection and localization problem. Each image is divided into equal-sized blocks, and the blocks are filtered using the categories of their connected components and their corner-point density. Analyzing the projections of the filtered blocks locates the approximate text areas, from which the text regions are extracted. Experiments on a dataset of historical Tibetan documents demonstrate the effectiveness of the proposed method.
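The block-projection idea can be sketched on a binary image as follows. The paper filters blocks using connected-component categories and corner-point density. This sketch substitutes a simple ink-density threshold as a hypothetical stand-in for that filter. The block size and the `min_density` parameter are also assumptions.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def block_mask(img: Image, block: int, min_density: float) -> List[List[bool]]:
    """Split the image into block x block tiles and keep a tile when its ink
    density reaches min_density (hypothetical stand-in for the paper's
    connected-component / corner-density filter)."""
    rows, cols = len(img), len(img[0])
    mask = []
    for by in range(0, rows, block):
        row = []
        for bx in range(0, cols, block):
            cells = [img[y][x]
                     for y in range(by, min(by + block, rows))
                     for x in range(bx, min(bx + block, cols))]
            row.append(sum(cells) / len(cells) >= min_density)
        mask.append(row)
    return mask


def projection_span(profile: List[int]) -> Optional[Tuple[int, int]]:
    """Return the first and last index whose projection value is non-zero."""
    idx = [i for i, v in enumerate(profile) if v > 0]
    return (idx[0], idx[-1]) if idx else None


def locate_text_area(img: Image, block: int = 2, min_density: float = 0.25):
    """Project the kept blocks onto both axes and return the approximate
    text area as (top, bottom, left, right) in pixels, or None."""
    mask = block_mask(img, block, min_density)
    row_profile = [sum(r) for r in mask]        # horizontal projection
    col_profile = [sum(c) for c in zip(*mask)]  # vertical projection
    ys, xs = projection_span(row_profile), projection_span(col_profile)
    if ys is None or xs is None:
        return None
    top, bottom = ys[0] * block, min((ys[1] + 1) * block, len(img)) - 1
    left, right = xs[0] * block, min((xs[1] + 1) * block, len(img[0])) - 1
    return top, bottom, left, right
```

On a 6×8 image whose ink fills rows 2–3 and columns 2–5, `locate_text_area` returns `(2, 3, 2, 5)`. A real system would also split the projection profiles at gaps to separate multiple text regions.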
Funding: Supported by the Teaching Research Projects of Yangtze University "Research on the Backwash Effects of Changes in CET Question Types" (JY2014041) and "Research on Assessment for Learning in Foreign Language Teaching" (JY2014040), and by the 12th Five-Year Plan Project of Hubei Educational Science "Formative Assessment of University English Teachers in Hubei".
Abstract: The present study explores the discursive strategies in Chinese housing news texts on the basis of a self-built corpus. A critical discourse analysis methodology is adopted to uncover the implicit meanings and attitudes in the housing texts, and keyword analysis proves effective in revealing the lexical features of the texts. The findings show that the news texts keep their attitudes implicit by using professional voices and by directing readers' attention.
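The abstract does not say which keyness statistic the keyword analysis used. One common choice in corpus tools is log-likelihood, which compares a word's frequency in the study corpus with its frequency in a reference corpus. The sketch below assumes that statistic. The frequencies in the usage note are invented for illustration.

```python
import math


def log_likelihood(freq_study: int, size_study: int,
                   freq_ref: int, size_ref: int) -> float:
    """Log-likelihood keyness of one word, computed from its frequency in a
    study corpus and in a reference corpus (each with a known token count)."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # the term 0 * ln(0) is taken as 0
            ll += observed * math.log(observed / expected)
    return 2 * ll
```

A word occurring 50 times per 1,000 tokens in the study corpus and 10 times per 1,000 in the reference corpus scores about 29.1, well above 3.84 (the usual p < 0.05 cut-off). Equal relative frequencies score 0. The statistic does not indicate direction, so the two relative frequencies must be compared to tell overuse from underuse.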
Funding: Supported by the Fund for Philosophy and Social Science of Anhui Province and the Fund for Human and Art Social Science of the Education Department of Anhui Province (Grant Nos. AHSKF0708D13 and 2009sk038).
Abstract: The probability-based covering algorithm (PBCA) is a new algorithm based on probability distributions. For test samples lying on the border of a coverage area, it decides the class by voting, based on the probabilities of the training samples. With the original covering algorithm (CA), many test samples located on the border of a coverage cannot be classified by the spherical neighborhoods obtained. The network structure of PBCA is a mixed structure composed of both a feed-forward network and a feedback network. By adding some heterogeneous samples and enlarging the coverage radius, the method can decrease the number of rejected samples and improve recognition accuracy. Computer experiments indicate that the algorithm improves learning precision and achieves reasonably good results in text classification.
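The abstract does not give PBCA's exact voting rule, nor how heterogeneous samples are added. The sketch below is therefore a hypothetical simplification, not the authors' algorithm, and it does not model the feed-forward/feedback structure. It assumes the spherical coverings are already built. A sample inside a covering takes that covering's label. A border sample is decided by a vote among coverings whose enlarged radius reaches it, with each vote weighted by the covering's share of training samples.

```python
import math
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple

# A covering: (centre, radius, class label, number of training samples it covers)
Cover = Tuple[Sequence[float], float, str, int]


def classify(x: Sequence[float], covers: List[Cover],
             enlarge: float = 1.5) -> Optional[str]:
    """Classify x with spherical coverings (hypothetical simplification).

    1. If x lies inside a covering, return the label of the nearest such centre.
    2. Otherwise treat x as a border sample: coverings whose radius, multiplied
       by `enlarge`, reaches x vote, weighted by their share of training samples.
    3. If no enlarged covering reaches x, reject it (return None).
    """
    dists = [(math.dist(x, c), r, label, n) for c, r, label, n in covers]
    inside = [(d, label) for d, r, label, _ in dists if d <= r]
    if inside:
        return min(inside)[1]
    total = sum(n for _, _, _, n in dists)
    votes = defaultdict(float)
    for d, r, label, n in dists:
        if d <= r * enlarge:
            votes[label] += n / total
    if not votes:
        return None
    return max(votes, key=votes.get)
```

With a class-A covering of 30 samples at the origin and a class-B covering of 10 samples at (4, 0), both with radius 1, the point (2, 0) lies in neither covering, so the plain covering algorithm would reject it. With `enlarge=2.0`, both coverings reach the point, and A wins the weighted vote.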
Abstract: With growing interest in e-commerce shopping, customer reviews have become one of the most important elements in determining customer satisfaction with products, which makes text mining of these reviews important. This study uses the Women's Clothing E-Commerce Reviews dataset, which consists of reviews written by real customers, and applies a text-mining approach to classify each review as either positive or negative. Four tree-based methods were applied to this classification problem: Classification Tree, Random Forest, Gradient Boosting, and XGBoost. The dataset was split into training and test sets. The results indicate that Random Forest overfits; XGBoost overfits when the number of trees is too high; Classification Tree is good at detecting negative reviews but poor at detecting positive ones; and Gradient Boosting shows stable values, with quality measures above 77% on the test set. The applied methods agree on the terms that are important for classification.
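The abstract does not list its "quality measures". The sketch below assumes accuracy plus per-class recall: sensitivity on positive reviews and specificity on negative reviews. Per-class recall is what reveals a model that "is good at detecting negative reviews and bad at detecting positive reviews". The 1/0 label encoding is an assumption.

```python
from typing import Dict, Sequence


def class_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus per-class recall for a binary task
    (assumed encoding: 1 = positive review, 0 = negative review)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / pos if pos else float("nan"),  # recall on positive reviews
        "specificity": tn / neg if neg else float("nan"),  # recall on negative reviews
    }
```

Computing the same metrics on both the training and the test set, and comparing them, is a simple way to see overfitting: training scores much higher than test scores.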
Funding: Supported by NIH grants R01LM010817 and P01AG039347.
Abstract: Purpose: The late Don R. Swanson was well appreciated during his lifetime as Dean of the Graduate Library School at the University of Chicago, as winner of the American Society for Information Science Award of Merit for 2000, and as the author of many seminal articles. In this informal essay, I give my personal perspective on Don's contributions to science and outline some current and future directions in literature-based discovery that are rooted in concepts he developed. Design/methodology/approach: Personal recollections and literature review. Findings: The Swanson A-B-C model of literature-based discovery has been used successfully by laboratory investigators to analyze their findings and hypotheses. It continues to be a fertile area of research in a wide range of application areas, including text mining, drug repurposing, studies of scientific innovation, knowledge discovery in databases, and bioinformatics. Recently, additional modes of discovery that do not follow the A-B-C model have also been proposed and explored (e.g., so-called storytelling, gaps, analogies, link prediction, negative consensus, outliers, and the revival of neglected or discarded research questions). Research limitations: This paper reflects the opinions of the author and is neither a comprehensive nor a technically based review of literature-based discovery. Practical implications: The general scientific public is still not aware that tools for literature-based discovery are available. Our Arrowsmith project site maintains a suite of discovery tools that are free and open to the public (http://arrowsmith.psych.uic.edu). BITOLA, maintained by Dimitar Hristovski (http://ibmi.mf.uni-lj.si/bitola), and Epiphanet, maintained by Trevor Cohen (http://epiphanet.uth.tme.edu/), do the same. Bringing user-friendly tools to the public should be a high priority. Even more than advancing basic research in informatics, it is vital to ensure that scientists actually use discovery tools and that these tools actually help them make experimental discoveries in the lab and in the clinic. Originality/value: This paper discusses problems and issues that were inherent in Don's thinking during his life, including some that have not yet been fully taken up and studied systematically.
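The core of the A-B-C model can be sketched as a search over a term co-occurrence map. Starting from term A, the search finds terms C that co-occur with some intermediate term B linked to A but never co-occur with A directly. The toy co-occurrence data below is illustrative only. It is loosely modeled on Swanson's well-known fish oil and Raynaud's syndrome example and does not reflect real literature counts. Real tools add term filtering and ranking on top of this step.

```python
from typing import Dict, List, Set


def abc_candidates(cooccur: Dict[str, Set[str]], a: str) -> Dict[str, List[str]]:
    """Open discovery in the A-B-C style: return each candidate term C that is
    linked to A only through intermediate terms B, with its supporting B terms."""
    direct = cooccur.get(a, set())
    candidates: Dict[str, List[str]] = {}
    for b in sorted(direct):
        for c in sorted(cooccur.get(b, set())):
            if c != a and c not in direct:
                candidates.setdefault(c, []).append(b)
    return candidates


# Toy, illustrative co-occurrences (not real literature counts).
cooccur = {
    "raynaud": {"blood viscosity", "platelet aggregation"},
    "blood viscosity": {"raynaud", "fish oil"},
    "platelet aggregation": {"raynaud", "fish oil", "aspirin"},
    "fish oil": {"blood viscosity", "platelet aggregation"},
    "aspirin": {"platelet aggregation"},
}
```

Here `abc_candidates(cooccur, "raynaud")` links "fish oil" through two B terms and "aspirin" through one. Ranking candidates by the number of supporting B terms is one simple heuristic.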
Abstract: Support vector machines (SVMs) are a popular class of supervised learning algorithms and are particularly well suited to large, high-dimensional classification problems. Like most machine learning methods for data classification and information retrieval, they require manually labeled data samples in the training stage. However, manual labeling is a time-consuming and error-prone task. One possible solution is to exploit the large number of unlabeled samples that are easily accessible via the internet. This paper presents a novel active learning method for text categorization. The main objective of active learning is to reduce the labeling effort, without compromising classification accuracy, by intelligently selecting which samples should be labeled. The proposed method selects a batch of informative samples using the posterior probabilities provided by a set of multi-class SVM classifiers, and an expert then labels these samples manually. Experimental results indicate that the proposed method significantly reduces the labeling effort while also improving classification accuracy.
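The abstract says that batches are chosen from SVM posterior probabilities but does not give the selection criterion. The sketch below substitutes margin sampling, a common uncertainty criterion. It picks the unlabeled samples whose two most probable classes are closest. The posterior values are made up. In practice they would come from probability-calibrated SVM outputs. Many batch methods also add a diversity term, which this sketch omits.

```python
from typing import List, Sequence


def select_batch(posteriors: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Return the indices of the batch_size unlabeled samples with the smallest
    best-versus-second-best posterior margin (the least certain samples)."""
    def margin(probs: Sequence[float]) -> float:
        top = sorted(probs, reverse=True)
        return top[0] - top[1]

    ranked = sorted(range(len(posteriors)), key=lambda i: margin(posteriors[i]))
    return ranked[:batch_size]
```

For posteriors `[[0.90, 0.05, 0.05], [0.40, 0.30, 0.30], [0.50, 0.45, 0.05], [0.70, 0.20, 0.10]]`, a batch of two selects samples 2 and 1. These are then passed to the expert for labeling.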