Journal Articles
50 articles found
1. A Chinese Named Entity Recognition Method for News Domain Based on Transfer Learning and Word Embeddings
Authors: Rui Fang, Liangzhong Cui. Computers, Materials & Continua, 2025, Issue 5, pp. 3247-3275 (29 pages)
Named Entity Recognition (NER) is vital in natural language processing for the analysis of news texts, as it accurately identifies entities such as locations, persons, and organizations, which is crucial for applications like news summarization and event tracking. However, NER in the news domain faces challenges due to insufficient annotated data, complex entity structures, and strong context dependencies. To address these issues, we propose a new Chinese named entity recognition method that integrates transfer learning with word embeddings. Our approach leverages the ERNIE pre-trained model for transfer learning to obtain general language representations, and incorporates the Soft-lexicon word embedding technique to handle varied entity structures. This dual strategy enhances the model's understanding of context and boosts its ability to process complex texts. Experimental results show that our method achieves an F1 score of 94.72% on a news dataset, surpassing baseline methods by 3%-4%, thereby confirming its effectiveness for Chinese named entity recognition in the news domain.
Keywords: news domain; named entity recognition (NER); transfer learning; word embeddings; ERNIE; Soft-lexicon
2. Word Embedding Bootstrapped Deep Active Learning Method to Information Extraction on Chinese Electronic Medical Record (Cited by 1)
Authors: MA Qunsheng, CEN Xingxing, YUAN Junyi, HOU Xumin. Journal of Shanghai Jiaotong University (Science), EI, 2021, Issue 4, pp. 494-502 (9 pages)
Electronic medical records (EMRs) containing rich biomedical information have great potential in disease diagnosis and biomedical research. However, EMR information is usually in the form of unstructured text, which increases the cost of use and hinders its applications. In this work, an effective named entity recognition (NER) method is presented for information extraction on Chinese EMRs, achieved by word embedding bootstrapped deep active learning to promote the acquisition of medical information from Chinese EMRs and to release its value. Deep active learning of a bidirectional long short-term memory network followed by a conditional random field (Bi-LSTM+CRF) is used to capture the characteristics of different information in the labeled corpus, and the continuous-bag-of-words and skip-gram word embedding models are combined in the above model to capture the text features of Chinese EMRs from the unlabeled corpus. To evaluate the performance of the method, NER tasks on Chinese EMRs with "medical history" content were used. Experimental results show that the word embedding bootstrapped deep active learning method using an unlabeled medical corpus achieves better performance than other models.
Keywords: deep active learning; named entity recognition (NER); information extraction; word embedding; Chinese electronic medical record (EMR)
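The continuous-bag-of-words and skip-gram models combined in the entry above both learn from word/context pairs drawn from a sliding window over the corpus. A minimal sketch of how those training examples are generated, using a made-up toy sentence rather than the paper's EMR data:

```python
# Generate (center, context) training pairs as used by skip-gram,
# and (context-window, center) examples as used by CBOW.
# Toy corpus for illustration only; the paper trains on Chinese EMR text.

def skipgram_pairs(tokens, window=2):
    """Each center word predicts every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """The surrounding window (averaged in the real model) predicts the center word."""
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        if context:
            examples.append((context, center))
    return examples

tokens = "patient reports chest pain".split()
print(skipgram_pairs(tokens, window=1))
# [('patient', 'reports'), ('reports', 'patient'), ('reports', 'chest'),
#  ('chest', 'reports'), ('chest', 'pain'), ('pain', 'chest')]
```

In the full models these pairs feed a shallow network whose learned weights become the word vectors; the pair generation above is only the data-preparation step.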
3. Word Embeddings and Semantic Spaces in Natural Language Processing (Cited by 2)
Author: Peter J. Worth. International Journal of Intelligence Science, 2023, Issue 1, pp. 1-21 (21 pages)
One of the critical hurdles, and breakthroughs, in the field of Natural Language Processing (NLP) in the last two decades has been the development of techniques for text representation that solve the so-called curse of dimensionality, a problem which plagues NLP in general given that the feature set for learning starts as a function of the size of the language in question, typically upwards of hundreds of thousands of terms. As such, much of the research and development in NLP in the last two decades has been devoted to finding and optimizing solutions to this problem, that is, to effective feature selection in NLP. This paper looks at the development of these techniques, which leverage a variety of statistical methods resting on linguistic theories advanced in the middle of the last century, namely the distributional hypothesis, which suggests that words found in similar contexts generally have similar meanings. In this survey paper we trace the development of some of the most popular of these techniques from a mathematical as well as a data-structure perspective, from Latent Semantic Analysis to Vector Space Models to their more modern variants, typically referred to as word embeddings. In reviewing algorithms such as Word2Vec, GloVe, ELMo, and BERT, we explore the idea of semantic spaces more generally, beyond their applicability to NLP.
Keywords: natural language processing; vector space models; semantic spaces; word embeddings; representation learning; text vectorization; machine learning; deep learning
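The semantic-space idea surveyed in the entry above rests on comparing directions of word vectors, most commonly with cosine similarity. A minimal sketch with hand-crafted 3-dimensional vectors (real Word2Vec or GloVe embeddings are learned and have hundreds of dimensions):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hand-crafted toy vectors for illustration; real embeddings are learned.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}
print(cosine_similarity(vectors["king"], vectors["queen"]))  # close to 1
print(cosine_similarity(vectors["king"], vectors["apple"]))  # much smaller
```

Under the distributional hypothesis, words appearing in similar contexts end up with nearby vectors, so a high cosine similarity is read as semantic relatedness.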
4. Novel Representations of Word Embedding Based on the Zolu Function
Authors: Jihua Lu, Youcheng Zhang. Journal of Beijing Institute of Technology, EI CAS, 2020, Issue 4, pp. 526-530 (5 pages)
Two learning models based on the Zolu function, Zolu continuous bag of words (ZL-CBOW) and Zolu skip-gram (ZL-SG), are proposed. The slope of the ReLU in word2vec is changed by the Zolu function. The proposed models can process extremely large data sets, as word2vec does, without increasing complexity. The models also outperform several word embedding methods in both word similarity and syntactic accuracy. ZL-CBOW outperforms CBOW in accuracy by 8.43% on the capital-world training set and by 1.24% on the plural-verbs training set. Moreover, experimental simulations on word similarity and syntactic accuracy show that ZL-CBOW and ZL-SG are superior to LL-CBOW and LL-SG, respectively.
Keywords: Zolu function; word embedding; continuous bag of words; word similarity; accuracy
5. Aspect-Based Sentiment Classification Using Deep Learning and Hybrid of Word Embedding and Contextual Position
Authors: Waqas Ahmad, Hikmat Ullah Khan, Fawaz Khaled Alarfaj, Saqib Iqbal, Abdullah Mohammad Alomair, Naif Almusallam. Intelligent Automation & Soft Computing, SCIE, 2023, Issue 9, pp. 3101-3124 (24 pages)
Aspect-based sentiment analysis aims to detect and classify sentiment polarities as negative, positive, or neutral while associating them with their identified aspects from the corresponding context. Prior methodologies widely utilize either word embeddings or tree-based representations, and the separate use of these deep features has become a significant cause of information loss. Generally, word embeddings preserve the syntactic and semantic relations between a pair of terms in a sentence, while the tree-based structure conserves the grammatical and logical dependencies of the context. In addition, the position of a word within a sentence is a critical factor that influences the contextual information of a targeted sentence, so knowledge of position-oriented word information is considered significant. In this study, we propose to use word embeddings, tree-based representations, and contextual position information in combination, to evaluate whether their combination improves effectiveness. Their joint utilization enhances the accurate identification and extraction of targeted aspect terms, which in turn influences the classification process. We propose a method named Attention-Based Multi-Channel Convolutional Neural Network (Att-MC-CNN) that jointly utilizes these three deep features. The three inputs are delivered to a Multi-Channel Convolutional Neural Network (MC-CNN) that identifies and extracts the potential terms and classifies their polarities. These terms are further filtered with an attention mechanism, which determines the most significant words. Empirical analysis proves the proposed approach's effectiveness compared to existing techniques when evaluated on standard datasets. Experimental results show that our approach outperforms in the F1 measure, with an overall achievement of 94% in identifying aspects and 92% in sentiment classification.
Keywords: sentiment analysis; word embedding; aspect extraction; consistency tree; multichannel convolutional neural network; contextual position information
6. Enhanced Image Captioning Using Features Concatenation and Efficient Pre-Trained Word Embedding
Authors: Samar Elbedwehy, T. Medhat, Taher Hamza, Mohammed F. Alrahmawy. Computer Systems Science & Engineering, SCIE EI, 2023, Issue 9, pp. 3637-3652 (16 pages)
One of the issues in computer vision is the automatic generation of descriptions for images, known as image captioning. Deep learning techniques have made significant progress in this area. The typical architecture of image captioning systems consists mainly of an image feature extractor subsystem followed by a caption-generation language subsystem. This paper aims to find optimized models for these two subsystems. For the image feature extraction subsystem, the research tested eight different concatenations of pairs of vision models to find the most expressive extracted feature vector of the image. For the caption-generation language subsystem, the paper tested three pre-trained language embedding models, GloVe (Global Vectors for Word Representation), BERT (Bidirectional Encoder Representations from Transformers), and TaCL (Token-aware Contrastive Learning), to select the most accurate one. Our experiments showed that an image captioning system using a concatenation of the two Transformer-based models SWIN (Shifted Window) and PVT (Pyramid Vision Transformer) as the image feature extractor, combined with the TaCL language embedding model, yields the best result among the tested combinations.
Keywords: image captioning; word embedding; concatenation; transformer
7. Leveraging Pre-Trained Word Embedding Models for Fake Review Identification
Authors: Glody Muka, Patrick Mukala. Journal on Artificial Intelligence, 2024, Issue 1, pp. 211-223 (13 pages)
Reviews have a significant impact on online businesses. Nowadays, online consumers rely heavily on other people's reviews before purchasing a product, rather than on the product description. With the emergence of technology, malicious online actors use techniques such as natural language processing (NLP) to generate large numbers of fake reviews to destroy their competitors' markets. To remedy this situation, several studies have been conducted in the last few years. Most apply NLP techniques to preprocess text before building machine learning (ML) or deep learning (DL) models to detect and filter fake reviews. However, with the same NLP techniques, machine-generated fake reviews are increasing exponentially. This work explores a powerful text representation technique, embedding models, to combat the proliferation of fake reviews in online marketplaces; these embedding structures can capture much more information from the data than other standard text representations. We tested our hypothesis in two different recurrent neural network (RNN) architectures, long short-term memory (LSTM) and gated recurrent unit (GRU), using fake review data from Amazon and TripAdvisor. Our experimental results show that our best-proposed model can distinguish between real and fake reviews with 91.44% accuracy. Furthermore, our results corroborate the state-of-the-art research in this area and demonstrate improvements over other approaches. Proper text representation therefore improves the accuracy of fake review detection.
Keywords: natural language processing; word embedding; deep learning; fake review detection
8. Interactive Embedding of Urban Cultural Word-of-Mouth in Scenic-Spot Brand Value (城市文化口碑对景点品牌价值的交互式嵌入研究) (Cited by 3)
Author: Guo Bin. Commercial Research (商业研究), CSSCI, Peking University Core, 2016, Issue 8, pp. 171-178 (8 pages)
From the perspective of brand-signal generation and transmission, this paper constructs a framework for the formation of urban scenic-spot brands. Through text analysis and the analytic hierarchy process, it evaluates and compares scenic-spot brand scores across cities such as Beijing, Shanghai, and Hong Kong, and explains the interactive embedding effect of urban cultural word-of-mouth on scenic-spot brand value. The study shows that scenic-spot brand value is grounded in tourism competitive advantage but is also embedded in, and subject to interference from, urban cultural word-of-mouth; each of a city's scenic spots embodies a distinctive facet of urban culture, and together they form the material carrier of urban cultural word-of-mouth, the two exhibiting a "rise and fall together" interaction.
Keywords: urban culture; scenic-spot brand; word-of-mouth; interactive embedding
9. The Impact of Network Embeddedness on the Operation of Exhibition Enterprises (网络嵌入性对会展企业经营的影响) (Cited by 1)
Authors: Chen Qiuying, Zhang Li. Journal of Xiamen University of Technology (厦门理工学院学报), 2013, Issue 1, pp. 85-89 (5 pages)
Based on Granovetter's theory of relational and structural embeddedness within network embeddedness, and drawing on case studies of three well-known exhibitions (the China International Fair for Investment and Trade, the West Strait Auto Expo, and the China Import and Export Fair), this paper examines how strong and weak ties under relational embeddedness, and high-density and low-density networks under structural embeddedness, affect the operation of the exhibition industry. It argues that exhibition enterprises should acquire knowledge and capabilities from the networks in which they are embedded in order to strengthen their core competitiveness, while government, industry associations, the media, and other related embedded networks should support the development of exhibition enterprises.
Keywords: network embeddedness; exhibition enterprises; relational embeddedness; structural embeddedness
10. Suggestion Mining from Opinionated Text of Big Social Media Data (Cited by 6)
Authors: Youseef Alotaibi, Muhammad Noman Malik, Huma Hayat Khan, Anab Batool, Saif ul Islam, Abdulmajeed Alsufyani, Saleh Alghamdi. Computers, Materials & Continua, SCIE EI, 2021, Issue 9, pp. 3323-3338 (16 pages)
Social media data are rapidly increasing and constitute a source of user opinions and tips on a wide range of products and services. The increasing availability of such big data of biased reviews and blogs makes it challenging for customers and businesses to review all content in their decision-making process. To overcome this challenge, extracting suggestions from opinionated text is a possible solution. In this study, the characteristics of suggestions are analyzed and a suggestion mining extraction process is presented for classifying suggestive sentences from online customers' reviews. Classification using a word-embedding approach is performed via the XGBoost classifier. The two datasets used in this experiment relate to online hotel reviews and Microsoft Windows App Studio discussion reviews. F1, precision, recall, and accuracy scores are calculated. The results demonstrated that the XGBoost classifier outperforms the alternatives, with an accuracy of more than 80%. Moreover, the results revealed that suggestion keywords and phrases are the predominant features for suggestion extraction. This study thus contributes to knowledge and practice by comparing feature extraction classifiers and identifying XGBoost as the better suggestion mining process for identifying online reviews.
Keywords: suggestion mining; word embedding; Naïve Bayes; random forest; XGBoost; dataset
11. Multi-Level Knowledge Engineering Approach for Mapping Implicit Aspects to Explicit Aspects (Cited by 4)
Authors: Jibran Mir, Azhar Mahmood, Shaheen Khatoon. Computers, Materials & Continua, SCIE EI, 2022, Issue 2, pp. 3491-3509 (19 pages)
Aspect extraction is a critical task in aspect-based sentiment analysis, covering both explicit and implicit aspect identification. While extensive research has addressed explicit aspects, little effort has gone into implicit aspect extraction due to the complexity of the problem. Moreover, existing research on implicit aspect identification has largely been carried out on product reviews targeting specific aspects while neglecting sentence-dependency problems. Therefore, this paper proposes a multi-level knowledge engineering approach for identifying implicit movie aspects. The proposed method first identifies explicit aspects using a variant of BiLSTM-CRF (Bidirectional Long Short-Term Memory with Conditional Random Field), which serves as a memory for processing dependent sentences to infer implicit aspects. It can identify implicit aspects in four types of sentences: independent sentences and three types of dependent sentences. The study is evaluated on a large movie review dataset with 50k examples. Experimental results show that the explicit aspect identification method achieved an 89% F1-score and the implicit aspect extraction methods achieved a 76% F1-score. In addition, the proposed approach outperforms state-of-the-art techniques (NMFIAD and ML-KB+) on the product review dataset, achieving 93% precision, 92% recall, and a 93% F1-score.
Keywords: movie named entities (NEs); aspects; opinion words; annotation process; memory; implicit aspects; implicit aspect mapping; word embedding; BiLSTM
12. Automatic Classification of Swedish Metadata Using Dewey Decimal Classification: A Comparison of Approaches (Cited by 2)
Authors: Koraljka Golub, Johan Hagelback, Anders Ardo. Journal of Data and Information Science, CSCD, 2020, Issue 1, pp. 18-38 (21 pages)
Purpose: With more and more digital collections of various information resources becoming available, the challenge of assigning subject index terms and classes from quality knowledge organization systems is also increasing. While the ultimate purpose is to understand the value of automatically produced Dewey Decimal Classification (DDC) classes for Swedish digital collections, this paper aims to evaluate the performance of six machine learning algorithms as well as a string-matching algorithm based on characteristics of DDC.
Design/methodology/approach: State-of-the-art machine learning algorithms require at least 1,000 training examples per class. The complete data set at the time of research comprised 143,838 records, which had to be reduced to the top three hierarchical levels of DDC in order to provide sufficient training data (totaling 802 classes in the training and testing sample, out of 14,413 classes at all levels).
Findings: Evaluation shows that a Support Vector Machine with linear kernel outperforms the other machine learning algorithms as well as the string-matching algorithm on average; the string-matching algorithm outperforms machine learning for specific classes when the characteristics of DDC are most suitable for the task. Word embeddings combined with different types of neural networks (simple linear network, standard neural network, 1D convolutional neural network, and recurrent neural network) produced worse results than the Support Vector Machine, but come close, with the benefit of a smaller representation size. Analysis of feature impact in machine learning shows that using keywords, or combining titles and keywords, gives better results than using only titles as input. Stemming only marginally improves the results. Removing stop-words reduced accuracy in most cases, while removing less frequent words increased it marginally. The greatest impact is produced by the number of training examples: 81.90% accuracy on the training set is achieved when at least 1,000 records per class are available, and 66.13% when too few records (often fewer than 100 per class) are available on which to train; these figures hold only for the top three hierarchical levels (803 instead of 14,413 classes).
Research limitations: Having to reduce the number of hierarchical levels to the top three levels of DDC because of the lack of training data for all classes skews the results, so that they work in experimental conditions but barely for end users in operational retrieval systems.
Practical implications: For operational information retrieval systems, applying purely automatic DDC does not work, whether using machine learning (because of the lack of training data for the large number of DDC classes) or the string-matching algorithm (because DDC characteristics perform well for automatic classification only in a small number of classes). Over time, more training examples may become available, and DDC may be enriched with synonyms to enhance the accuracy of automatic classification, which may also benefit information retrieval performance based on DDC. For quality information services to reach the objective of the highest possible precision and recall, automatic classification should never be implemented on its own; instead, machine-aided indexing that combines the efficiency of automatic suggestions with the quality of human decisions at the final stage should be the way forward.
Originality/value: The study explored machine learning on a large classification system of over 14,000 classes which is used in operational information retrieval systems. Due to a lack of sufficient training data across the entire set of classes, an approach complementing machine learning, string matching, was applied. This combination should be explored further, since it provides the potential for real-life applications with large target classification systems.
Keywords: LIBRIS; Dewey Decimal Classification; automatic classification; machine learning; Support Vector Machine; Multinomial Naive Bayes; simple linear network; standard neural network; 1D convolutional neural network; recurrent neural network; word embeddings; string matching
13. Neural Machine Translation Models with Attention-Based Dropout Layer (Cited by 1)
Authors: Huma Israr, Safdar Abbas Khan, Muhammad Ali Tahir, Muhammad Khuram Shahzad, Muneer Ahmad, Jasni Mohamad Zain. Computers, Materials & Continua, SCIE EI, 2023, Issue 5, pp. 2981-3009 (29 pages)
In bilingual translation, attention-based neural machine translation (NMT) models are used to achieve synchrony between input and output sequences and the notion of alignment. NMT models have obtained state-of-the-art performance for several language pairs. However, there has been little work exploring useful architectures for Urdu-to-English machine translation. We conducted extensive Urdu-to-English translation experiments using long short-term memory (LSTM), bidirectional recurrent neural networks (Bi-RNN), the statistical recurrent unit (SRU), the gated recurrent unit (GRU), convolutional neural networks (CNN), and the Transformer. Experimental results show that Bi-RNN and LSTM with an attention mechanism, trained iteratively with a scalable data set, make precise predictions on unseen data. The trained models yielded competitive results, achieving 62.6% and 61% accuracy and 49.67 and 47.14 BLEU scores, respectively. From a qualitative perspective, the translations of the test sets were examined manually, and it was observed that the trained models tend to produce repetitive output rather frequently. The attention scores produced by Bi-RNN and LSTM yielded clear alignments, while GRU showed incorrect word translations, poor alignment, and a lack of clear structure. We therefore refined the attention-based models by defining an additional attention-based dropout layer. Attention dropout fixes alignment errors and minimizes translation errors at the word level. After empirical demonstration and comparison with their counterparts, we found an improvement in the quality of the resulting translation system and a decrease in perplexity and over-translation score. The ability of the proposed model was also evaluated using Arabic-English and Persian-English datasets. We empirically concluded that adding an attention-based dropout layer helps improve GRU, SRU, and Transformer translation and is considerably more efficient in translation quality and speed.
Keywords: natural language processing; neural machine translation; word embedding; attention; perplexity; selective dropout; regularization; Urdu; Persian; Arabic; BLEU
14. Identification of Sarcasm in Textual Data: A Comparative Study (Cited by 1)
Authors: Pulkit Mehndiratta, Devpriya Soni. Journal of Data and Information Science, CSCD, 2019, Issue 4, pp. 56-83 (28 pages)
Purpose: The ever-increasing penetration of the Internet in our lives has led to an enormous amount of multimedia content generation online, with textual data contributing a major share of the data generated on the World Wide Web. Understanding people's sentiment is an important aspect of natural language processing, but opinions can be biased and incorrect if people use sarcasm while commenting, posting status updates, or reviewing a product or a movie. Thus, it is of utmost importance to detect sarcasm correctly and make correct predictions about people's intentions.
Design/methodology/approach: This study evaluates various machine learning models along with standard and hybrid deep learning models across several standardized datasets. We vectorized the text using word embedding techniques to convert the textual data into vectors for analysis. We used three standardized datasets available in the public domain and three word embeddings, i.e., Word2Vec, GloVe, and fastText, to validate the hypothesis.
Findings: The key finding is that hybrid models that include Bidirectional Long Short-Term Memory (Bi-LSTM) and a Convolutional Neural Network (CNN) outperform both conventional machine learning and deep learning models across all the datasets considered in this study, validating our hypothesis.
Research limitations: Using data from different sources and customizing the models for each dataset slightly decreases the usability of the technique. Overall, however, this methodology provides effective measures for identifying the presence of sarcasm, with a minimum average accuracy of 80% or above for one dataset and better than the current baseline results for the other datasets.
Practical implications: The results provide solid insights for system developers to integrate this model into real-time analysis of any review or comment posted in the public domain. The study has various other practical implications for businesses that depend on user ratings and public opinions, and it provides a launching platform for researchers working on sarcasm identification in textual data.
Originality/value: This is a first-of-its-kind study comparing conventional and hybrid methods for predicting sarcasm in textual data. It also provides indicators that hybrid models are better suited to analyzing textual data for sarcasm.
Keywords: machine learning; artificial neural networks; word embedding; text vectorization; accuracy
15. Machine Learning-Based Advertisement Banner Identification Technique for Effective Piracy Website Detection Process
Authors: Lelisa Adeba Jilcha, Jin Kwak. Computers, Materials & Continua, SCIE EI, 2022, Issue 5, pp. 2883-2899 (17 pages)
In the contemporary world, digital content that is subject to copyright faces significant challenges from copyright infringement. Billions of dollars are lost annually because of this illegal act. The current most effective trend to tackle this problem is believed to be blocking those websites, particularly through affiliated government bodies. To do so, an effective detection mechanism is a necessary first step. Some researchers have used various approaches to analyze the possible common features of suspected piracy websites. For instance, most of these websites serve online advertisements, which are considered their main source of revenue. In addition, these advertisements have some common attributes that make them unique compared to advertisements posted on normal or legitimate websites. They usually encompass keywords such as click-words (words that redirect to install malicious software) and words frequently used in illegal gambling, illegal sexual acts, and so on. This makes them ideal candidates for key features in the process of successfully detecting websites involved in copyright infringement. Research has been conducted to identify advertisements served on suspected piracy websites. However, these studies use a static approach that relies mainly on manual scanning for the aforementioned keywords, which brings limitations, particularly in coping with the dynamic and ever-changing behavior of advertisements posted on these websites. Therefore, we propose a technique that can continuously fine-tune itself and is intelligent enough to effectively identify advertisement (Ad) banners extracted from suspected piracy websites. We have done this by leveraging the power of machine learning algorithms, particularly the support vector machine with the word2vec word-embedding model. After applying the proposed technique to 1015 Ad banners collected from 98 suspected piracy websites and 90 normal or legitimate websites, we were able to successfully identify Ad banners extracted from suspected piracy websites with an accuracy of 97%. We present this technique with the hope that it will be a useful tool for various effective piracy website detection approaches. To our knowledge, this is the first approach that uses machine learning to identify Ad banners served on suspected piracy websites.
Keywords: copyright infringement; piracy website detection; online advertisement; advertisement banners; machine learning; support vector machine; word embedding; word2vec
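The entry above pairs word2vec features with a support vector machine; one common way (an assumption here, not a detail the abstract confirms) to turn a banner's variable-length text into a fixed-length classifier input is to average its word vectors. A minimal sketch with hypothetical toy vectors:

```python
def average_vector(tokens, vectors, dim=3):
    """Mean of the word vectors for all in-vocabulary tokens (zeros if none match)."""
    hits = [vectors[t] for t in tokens if t in vectors]
    if not hits:
        return [0.0] * dim
    return [sum(col) / len(hits) for col in zip(*hits)]

# Hypothetical 3-d vectors; a real system would load word2vec embeddings
# trained on banner text, then feed the averaged vector to an SVM.
toy_vectors = {
    "free":     [0.9, 0.1, 0.0],
    "download": [0.8, 0.2, 0.1],
    "casino":   [0.7, 0.0, 0.3],
}
feature = average_vector("click free download now".split(), toy_vectors)
print(feature)  # mean of the vectors for "free" and "download"
```

The resulting fixed-length vector is what a linear classifier such as an SVM can consume, regardless of how many words the banner contained.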
Deep Neural Network and Pseudo Relevance Feedback Based Query Expansion
16
作者 Abhishek Kumar Shukla Sujoy Das 《Computers, Materials & Continua》 SCIE EI 2022年第5期3557-3570,共14页
The neural network has attracted researchers immensely in the last couple of years due to its wide applications in various areas such as Data mining,Natural language processing,Image processing,and Information retriev... The neural network has attracted researchers immensely in the last couple of years due to its wide applications in various areas such as Data mining,Natural language processing,Image processing,and Information retrieval etc.Word embedding has been applied by many researchers for Information retrieval tasks.In this paper word embedding-based skip-gram model has been developed for the query expansion task.Vocabulary terms are obtained from the top“k”initially retrieved documents using the Pseudo relevance feedback model and then they are trained using the skip-gram model to find the expansion terms for the user query.The performance of the model based on mean average precision is 0.3176.The proposed model compares with other existing models.An improvement of 6.61%,6.93%,and 9.07%on MAP value is observed compare to the Original query,BM25 model,and query expansion with the Chi-Square model respectively.The proposed model also retrieves 84,25,and 81 additional relevant documents compare to the original query,query expansion with Chi-Square model,and BM25 model respectively and thus improves the recall value also.The per query analysis reveals that the proposed model performs well in 30,36,and 30 queries compare to the original query,query expansion with Chi-square model,and BM25 model respectively. 展开更多
Keywords: Information retrieval, query expansion, word embedding, neural network, deep neural network
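The expansion pipeline described above (top-k pseudo-relevant documents → word vectors → terms nearest the query) can be sketched as follows. Windowed co-occurrence counts stand in for a trained skip-gram model, and the query and documents are made up:

```python
import math

# Hypothetical query and top-k pseudo-relevant documents (pre-tokenized).
query = ["solar", "energy"]
top_k_docs = [
    "solar energy panels convert sunlight into power".split(),
    "renewable energy sources include solar and wind power".split(),
    "panels on rooftops capture sunlight for power".split(),
]

def cooc_vectors(docs, window=2):
    """Windowed co-occurrence counts: a cheap stand-in for skip-gram
    vectors, built from the feedback documents only."""
    vocab = sorted({w for d in docs for w in d})
    idx = {w: i for i, w in enumerate(vocab)}
    vecs = {w: [0.0] * len(vocab) for w in vocab}
    for d in docs:
        for i, w in enumerate(d):
            for j in range(max(0, i - window), min(len(d), i + window + 1)):
                if j != i:
                    vecs[w][idx[d[j]]] += 1.0
    return vecs

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

vecs = cooc_vectors(top_k_docs)
# Query vector = sum of its term vectors; rank candidates by cosine similarity.
qv = [sum(col) for col in zip(*(vecs[t] for t in query if t in vecs))]
candidates = [w for w in vecs if w not in query]
expansion = sorted(candidates, key=lambda w: cosine(vecs[w], qv), reverse=True)[:3]
print(expansion)  # the three highest-scoring expansion terms
```

In the paper the vectors come from a skip-gram model trained on the feedback documents; the retrieve-embed-rank structure is the same.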
Quantum Particle Swarm Optimization with Deep Learning-Based Arabic Tweets Sentiment Analysis
17
Authors: Badriyya B. Al-onazi, Abdulkhaleq Q. A. Hassan, Mohamed K. Nour, Mesfer Al Duhayyim, Abdullah Mohamed, Amgad Atta Abdelmageed, Ishfaq Yaseen, Gouse Pasha Mohammed. Computers, Materials & Continua (SCIE, EI), 2023, No. 5, pp. 2575-2591 (17 pages)
Sentiment Analysis (SA), a Machine Learning (ML) technique, is often applied in the literature, specifically to data collected from social media sites. The research studies conducted earlier on the SA of tweets were mostly aimed at automating the feature extraction process. Against this background, the current study introduces a novel method called Quantum Particle Swarm Optimization with Deep Learning-Based Sentiment Analysis on Arabic Tweets (QPSODL-SAAT). The presented QPSODL-SAAT model determines and classifies the sentiments of tweets written in Arabic. Initially, data pre-processing is performed to convert the raw tweets into a useful format. Then, the word2vec model is applied to generate the feature vectors. The Bidirectional Gated Recurrent Unit (BiGRU) classifier is utilized to identify and classify the sentiments. Finally, the QPSO algorithm is exploited for the optimal fine-tuning of the hyperparameters involved in the BiGRU model. The proposed QPSODL-SAAT model was experimentally validated on standard datasets. An extensive comparative analysis was conducted, and the proposed model achieved a maximum accuracy of 98.35%. The outcomes confirmed the superiority of the proposed QPSODL-SAAT model over other approaches, such as Surface Features (SF), Generic Embeddings (GE), Arabic Sentiment Embeddings constructed using the Hybrid (ASEH) model, and the Bidirectional Encoder Representations from Transformers (BERT) model.
Keywords: Sentiment analysis, Arabic tweets, quantum particle swarm optimization, deep learning, word embedding
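The QPSO step described above searches the hyperparameter space of the BiGRU. A sketch of quantum-behaved PSO itself, using the standard update in which each particle is re-sampled around a local attractor with a spread set by its distance to the mean-best point; the sphere objective stands in for the BiGRU validation loss, and the particle count and coefficient are illustrative choices:

```python
import math
import random

def qpso(f, dim, n_particles=20, iters=100, alpha=0.75, bounds=(-5.0, 5.0)):
    """Quantum-behaved PSO: each particle is re-sampled around an attractor
    (a random mix of its personal best and the global best)."""
    lo, hi = bounds
    X = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pval = [f(x) for x in X]
    gbest = min(pbest, key=f)[:]
    gval = f(gbest)
    for _ in range(iters):
        # Mean of all personal bests ("mbest") sets the sampling spread.
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            for d in range(dim):
                phi = random.random()
                u = 1.0 - random.random()          # in (0, 1], keeps log finite
                attractor = phi * pbest[i][d] + (1 - phi) * gbest[d]
                spread = alpha * abs(mbest[d] - X[i][d]) * math.log(1.0 / u)
                X[i][d] = attractor + spread if random.random() < 0.5 else attractor - spread
                X[i][d] = min(hi, max(lo, X[i][d]))  # clamp to the search box
            v = f(X[i])
            if v < pval[i]:
                pbest[i], pval[i] = X[i][:], v
                if v < gval:
                    gbest, gval = X[i][:], v
    return gbest, gval

random.seed(0)
# Toy objective standing in for the BiGRU validation loss over hyperparameters.
sphere = lambda x: sum(c * c for c in x)
best, val = qpso(sphere, dim=2)
print(best, val)
```

In the paper each evaluation of `f` would train and validate a BiGRU at the candidate hyperparameters, so far fewer particles and iterations are affordable than in this toy run.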
Personality Assessment Based on Natural Stream of Thoughts Empowered with Machine Learning
18
Authors: Mohammed Salahat, Liaqat Ali, Taher M. Ghazal, Haitham M. Alzoubi. Computers, Materials & Continua (SCIE, EI), 2023, No. 7, pp. 1-17 (17 pages)
Knowing each other is obligatory in a multi-agent collaborative environment. Collaborators may develop the desired know-how of each other in various aspects such as habits, job roles, status, and behaviors. Among the different distinguishing characteristics related to a person, personality traits are an effective predictive tool for an individual's behavioral pattern. It has been observed that when people are asked to share their details through questionnaires, they intentionally or unintentionally become biased, whereas in open writing about themselves they knowingly or unknowingly provide enough information in a far less biased manner. Such writings can effectively assess an individual's personality traits, which may yield enormous possibilities for applications such as forensic departments, job interviews, mental health diagnoses, etc. Stream of consciousness, collected by James Pennebaker and Laura King, is one such form of writing, referring to a narrative technique where the emotions and thoughts of the writer are presented in a way that carries the reader fluidly through the mental states of the narrator. Moreover, computationally, various attempts have been made at assessing an individual's personality traits through deep learning algorithms; however, the effectiveness and reliability of the results vary with the word embedding technique used. This article proposes an empirical approach to assessing personality by applying convolutional networks to text documents. The Bidirectional Encoder Representations from Transformers (BERT) word embedding technique is used for word vector generation to enhance the contextual meanings.
Keywords: Personality traits, convolutional neural network, deep learning, word embedding
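The convolutional-network-over-embeddings architecture mentioned above reduces, per filter, to a sliding window over token vectors followed by max-over-time pooling. A shape-level sketch with random vectors standing in for BERT embeddings (in the paper these come from a pre-trained BERT model):

```python
import numpy as np

rng = np.random.default_rng(42)
seq_len, emb_dim = 6, 8
# Random vectors standing in for the contextual BERT embeddings of 6 tokens.
tokens = rng.normal(size=(seq_len, emb_dim))

def conv_maxpool(x, filters):
    """Text-CNN block: slide each filter over the token windows, apply
    ReLU, then max-pool over time so any sequence length maps to one
    value per filter."""
    width = filters.shape[1]
    n_windows = x.shape[0] - width + 1
    feats = []
    for f in filters:                                 # f: (width, emb_dim)
        acts = [float(np.sum(x[i:i + width] * f)) for i in range(n_windows)]
        feats.append(max(0.0, max(acts)))             # ReLU + max-over-time
    return np.array(feats)

filters = rng.normal(size=(4, 3, emb_dim))            # 4 filters of width 3
features = conv_maxpool(tokens, filters)
print(features.shape)  # (4,): one pooled feature per filter
```

The pooled vector is what a dense layer would consume to predict the personality-trait scores; a trained network would learn the filter weights rather than draw them at random.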
Research on Web Page Classification Method Based on Query Log
19
Authors: YE Feiyue, MA Yixing. Journal of Shanghai Jiaotong University (Science) (EI), 2018, No. 3, pp. 404-410 (7 pages)
Web page classification is an important application in many fields of Internet information retrieval, such as providing directory classification and vertical search. Methods based on query logs, a lightweight alternative to content-based Web page classification, avoid crawling Web content and are therefore relatively efficient, but the sparsity of user click data makes the logs difficult to use directly for constructing a classifier. To solve this problem, we explore the semantic relations among different queries through word embedding, and propose three improved graph-structure classification algorithms. To reflect the semantic relevance between queries, we first map each user query into a low-dimensional space according to its query vector. Then, we calculate the uniform resource locator (URL) vector according to the relationship between the query and the URL. Finally, we use the improved label propagation algorithm (LPA) and the bipartite graph expansion algorithm to classify the unlabeled Web pages. Experiments show that our methods achieve about a 20% higher F1-value than other Web page classification methods based on query logs.
Keywords: Web page classification, word embedding, query log
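The label propagation step described above can be illustrated on a tiny click graph. The queries, URLs, and seed labels below are invented, and plain majority voting stands in for the paper's improved LPA:

```python
# Invented query-URL click graph; majority voting stands in for the
# paper's improved label propagation algorithm.
clicks = {                     # query -> URLs clicked for that query
    "q_world_news": ["cnn.example", "bbc.example"],
    "q_football":   ["espn.example", "bbc.example"],
    "q_cheap_tv":   ["amazon.example"],
    "q_buy_phone":  ["amazon.example", "ebay.example"],
}
seeds = {"cnn.example": "news", "amazon.example": "shopping"}

def propagate(clicks, seeds, iters=5):
    labels = dict(seeds)
    urls = {u for us in clicks.values() for u in us}
    for _ in range(iters):
        # Queries take the majority label of their clicked URLs...
        for q, us in clicks.items():
            votes = [labels[u] for u in us if u in labels]
            if votes:
                labels[q] = max(set(votes), key=votes.count)
        # ...then unlabeled URLs take the majority label of their queries.
        for u in urls:
            if u in seeds:
                continue                   # seed labels stay clamped
            votes = [labels[q] for q, us in clicks.items() if u in us and q in labels]
            if votes:
                labels[u] = max(set(votes), key=votes.count)
    return labels

labels = propagate(clicks, seeds)
print(labels["bbc.example"], labels["ebay.example"])  # news shopping
```

Labels reach `espn.example` only through the shared `bbc.example` click, which is the kind of two-hop spread that makes the bipartite query-URL structure useful despite sparse clicks.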
Improved Metaheuristics with Deep Learning Enabled Movie Review Sentiment Analysis
20
Authors: Abdelwahed Motwakel, Najm Alotaibi, Eatedal Alabdulkreem, Hussain Alshahrani, Mohamed Ahmed Elfaki, Mohamed K. Nour, Radwa Marzouk, Mahmoud Othman. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 10, pp. 1249-1266 (18 pages)
Sentiment Analysis (SA) of natural language text is not only a challenging process but also gains significance in various Natural Language Processing (NLP) applications. SA is utilized in various domains, namely education, to improve the learning and teaching processes, marketing strategies, customer trend prediction, and the stock market. Various researchers have applied lexicon-based approaches, Machine Learning (ML) techniques, and so on to conduct SA for multiple languages, for instance, English and Chinese. Owing to the increased popularity of Deep Learning models, the current study used diverse configuration settings of the Convolutional Neural Network (CNN) model and conducted SA for Hindi movie reviews. The current study introduces an Effective Improved Metaheuristics with Deep Learning (DL)-Enabled Sentiment Analysis for Movie Reviews (IMDLSA-MR) model. The presented IMDLSA-MR technique initially applies different levels of pre-processing to convert the input data into a compatible format. Besides, the Term Frequency-Inverse Document Frequency (TF-IDF) model is exploited to generate the word vectors from the pre-processed data. The Deep Belief Network (DBN) model is utilized to analyse and classify the sentiments. Finally, the improved Jellyfish Search Optimization (IJSO) algorithm is utilized for the optimal fine-tuning of the hyperparameters related to the DBN model, which shows the novelty of the work. Different experimental analyses were conducted to validate the better performance of the proposed IMDLSA-MR model. The comparative study outcomes highlighted the enhanced performance of the proposed IMDLSA-MR model over recent DL models, with a maximum accuracy of 98.92%.
Keywords: Corpus linguistics, sentiment analysis, natural language processing, deep learning, word embedding
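The TF-IDF word-vector step above is straightforward to sketch; the three toy reviews below are invented:

```python
import math
from collections import Counter

# Three invented movie reviews, pre-tokenized.
docs = [
    "great movie great acting".split(),
    "boring movie weak plot".split(),
    "great plot and acting".split(),
]

def tfidf(docs):
    """Plain TF-IDF: tf = count / doc length, idf = ln(N / document frequency)."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append({w: (tf[w] / len(d)) * math.log(n / df[w]) for w in tf})
    return vectors

vecs = tfidf(docs)
# "great" occurs twice in doc 0 (tf = 0.5) and in 2 of 3 docs (idf = ln 1.5).
print(round(vecs[0]["great"], 3))  # 0.203
```

In the paper these sparse TF-IDF vectors are the input features that the DBN classifies; note that variants of TF-IDF differ in their idf smoothing, so library implementations may not reproduce these exact values.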