Language is a medium of scientific communication. The language distribution of scientific communication reflects the status of global scientific power. This study, based on scientific tweets, reveals the language distribution in informal scientific communication.
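In social media, the tweets that users post record a wide range of information related to those users. The text covers the various topics contained in the tweets as well as temporal and spatial information, and analyzing the spatio-temporal evolution of topics from this information is of great research significance. For tweet data, a visual analysis pipeline was designed to mine tweet information and to present the spatio-temporal evolution of tweet topics from multiple perspectives through user interaction. First, based on a portion of historical tweet data, the global geographic space is partitioned into regions using the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm combined with Thiessen polygons. Then, for the topic of interest that a user searches for, all related tweets are retrieved through an index and bound to the cluster centers. Finally, several visualization views that combine a time-series clustering algorithm and an adaptive algorithm are designed to show the spatio-temporal evolution of topics. Experiments and analysis were carried out on tweet data crawled and stored through the API provided by the official Twitter website. The results show that the improved adaptive layout algorithm for the visualization views effectively resolves the problem of graphical occlusion and fully presents the spatio-temporal evolution patterns of tweets, and that the geographic partitioning and the visualization components effectively help researchers analyze the spatio-temporal evolution of tweets and the distribution of hot topics of global concern.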
Abstract: Predicting the currency exchange rate is crucial for financial agents, risk managers, and policymakers. Traditional approaches use publicly announced news on macroeconomic and financial variables as predictors of currency exchange. However, the rise of social media may have changed the source of information. For instance, tweets can help investors make informed decisions about the foreign exchange (FX) market by reflecting market sentiment and opinion. Conversely, changes in currency exchange may incite agents to post tweets. Are tweets good predictors of currency exchange? Is the relationship between tweets and currency exchange bidirectional? To answer these questions, we investigate the comovement/causality between the number of #dolar (respectively "enflasyon") tweets and the USDTRY exchange rate using wavelet coherence and transfer entropy (TE). Wavelet coherence allows us to determine the relationship between the number of tweets and the USDTRY rate in the time–frequency domain. TE enables us to quantify the net information flow between the number of tweets and USDTRY. Data from October 2020 to March 2022 were used. The results remain robust regardless of the data frequency (daily or hourly) and the method used (wavelet or TE). Based on our results, USDTRY is correlated with the number of #dolar (#inflation) tweets mainly in the short run and a few times in the medium run. These relationships change through time and frequency (wavelet analysis results). However, the results from TE indicate a bidirectional relationship between the number of #dolar (#inflation) tweets and the USDTRY exchange rate. The influence of the exchange rate on the number of tweets is highly pronounced. Financial agents, risk managers, policymakers, and investors should therefore pay moderate attention to the number of #dolar (#inflation) tweets when trading or forecasting the USD–TRY exchange rate.
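To make the idea of net information flow concrete, the following is a minimal sketch of a plug-in transfer entropy estimator, not the estimator used in the study. It assumes both series have already been discretized into a small set of symbols (for example, "up"/"down" moves) and uses a history length of one; the function names `transfer_entropy` and `net_information_flow` are hypothetical.

```python
import math
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in transfer entropy (in bits) from source to target.

    Assumes two equal-length sequences of discrete symbols and a
    history length of 1 (an illustrative simplification).
    """
    # Count joint occurrences of (target_next, target_now, source_now).
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    n = sum(triples.values())

    now_pairs = Counter()     # counts of (target_now, source_now)
    self_pairs = Counter()    # counts of (target_next, target_now)
    now_singles = Counter()   # counts of target_now
    for (t_next, t_now, s_now), count in triples.items():
        now_pairs[(t_now, s_now)] += count
        self_pairs[(t_next, t_now)] += count
        now_singles[t_now] += count

    te = 0.0
    for (t_next, t_now, s_now), count in triples.items():
        # p(target_next | target_now, source_now)
        p_full = count / now_pairs[(t_now, s_now)]
        # p(target_next | target_now)
        p_self = self_pairs[(t_next, t_now)] / now_singles[t_now]
        te += (count / n) * math.log2(p_full / p_self)
    return te


def net_information_flow(x, y):
    """Positive when x informs y more than y informs x."""
    return transfer_entropy(x, y) - transfer_entropy(y, x)
```

With tweet counts and exchange-rate returns, the discretization step (binning or sign of change) is itself a modeling choice that can change the estimate, so it would need to be reported alongside any result.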
Funding: Supported by the National Natural Science Foundation of China under Grants No. 60773156 and No. 61073004; the Chinese Major State Basic Research Development 973 Program under Grant No. 2011CB302203-2; and the Important National Science & Technology Specific Program under Grant No. 2011ZX01042001-002-2.
Abstract: Microblogs currently play an important role in social communication. Hot topics can become popular within a very short time as a result of retweeting. Understanding retweeting behavior is desirable for a number of tasks, such as topic detection, personalized message recommendation, and the monitoring and prevention of fake information. Interestingly, the propagation of tweets bears some similarity to the spread of infectious diseases. We present a method to model the spread of tweets in microblogs based on the classic Susceptible-Infectious-Susceptible (SIS) epidemic model, which was developed in the medical field for the spread of infectious diseases. On the basis of this model, future retweeting trends can be predicted. Our experiments on data obtained from the Chinese microblogging website Sina Weibo show that the proposed model has a lower predictive error than four commonly used prediction methods.
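As an illustration of the underlying epidemic model, here is a minimal sketch of the textbook SIS dynamics integrated with Euler steps. It is not the paper's fitted model: the mapping of "infectious" to users currently retweeting, the function name `simulate_sis`, and all parameter names and values are assumptions made for this example.

```python
def simulate_sis(n_users, initial_infected, beta, gamma, steps, dt=1.0):
    """Euler integration of the textbook SIS model.

    "Infected" stands for users currently spreading a tweet and
    "susceptible" for users who could still retweet it (assumed mapping).
    beta is the contact/retweet rate, gamma the rate of losing interest.
    Returns the number of infected users at each step, including step 0.
    """
    infected = float(initial_infected)
    history = [infected]
    for _ in range(steps):
        susceptible = n_users - infected
        new_infections = beta * susceptible * infected / n_users
        recoveries = gamma * infected
        infected += dt * (new_infections - recoveries)
        # Keep the state inside the valid range [0, n_users].
        infected = min(max(infected, 0.0), float(n_users))
        history.append(infected)
    return history
```

In this model the spread dies out when beta is below gamma and otherwise settles at the endemic level n_users * (1 - gamma / beta), which is what makes the fitted rates usable for trend prediction.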
Abstract: The availability and advancement of cloud computing service models such as IaaS, SaaS, and PaaS, which introduce on-demand self-service, auto scaling, easy maintenance, and pay-as-you-go, have dramatically transformed the way organizations design and operate their datacenters. However, some organizations still have many concerns, such as security, governance, lack of expertise, and migration. The purpose of this paper is to discuss cloud computing customers' opinions, feedback, attitudes, and emotions towards cloud computing services using sentiment analysis. The associated aim is to help people and organizations understand the benefits and challenges of cloud services from the general public's perspective, as well as opinions about existing cloud providers, focusing on three main providers: Azure, Amazon Web Services (AWS), and Google Cloud. The methodology is based on sentiment analysis applied to tweets extracted from the social media platform Twitter via its search API. We extracted a sample of 11,000 tweets, with each cloud provider accounting for a roughly similar proportion of the tweets, based on relevant hashtags and keywords. The analysis starts by combining the tweets to find the overall polarity towards cloud computing, then splits the tweets to find the specific polarity for each cloud provider. The Bing and NRC lexicons are employed to measure the polarity and emotion of the terms in the tweets. The overall polarity classification of the tweets across all cloud providers shows 68.5% positive and 31.5% negative. More specifically, Azure shows 63.8% positive and 36.2% negative tweets, Google Cloud shows 72.6% positive and 27.4% negative tweets, and AWS shows 69.1% positive and 30.9% negative tweets.
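The lexicon-based polarity step can be sketched as follows. This is an illustrative simplification, not the paper's pipeline: the two word sets are tiny hypothetical stand-ins for a real opinion lexicon such as Bing or NRC, and the function names are made up for the example.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon; a real study would load a full
# opinion lexicon instead of these few illustrative words.
POSITIVE = {"fast", "reliable", "easy", "great", "secure"}
NEGATIVE = {"slow", "outage", "expensive", "confusing", "insecure"}


def tweet_polarity(text):
    """Label a tweet by counting positive and negative lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def polarity_shares(tweets):
    """Percentage of positive vs. negative tweets, ignoring neutral ones."""
    counts = Counter(tweet_polarity(t) for t in tweets)
    polar = counts["positive"] + counts["negative"]
    if polar == 0:
        return {"positive": 0.0, "negative": 0.0}
    return {
        "positive": 100.0 * counts["positive"] / polar,
        "negative": 100.0 * counts["negative"] / polar,
    }
```

Running `polarity_shares` once on all tweets and once per provider subset mirrors the overall-then-per-provider breakdown described in the abstract.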
Funding: The authors are thankful to the Deanship of Scientific Research at Najran University for funding this work under the National Research Priorities funding program, support code NU/NRP/SERC/11/3.
Abstract: Sentiment Analysis (SA) is one of the Machine Learning (ML) techniques that has been investigated by several researchers in recent years, especially due to the evolution of novel data collection methods focused on social media. The literature reports that more SA data has been created for English than for any other language. Performing SA on Arabic Twitter data is challenging owing to the informal nature and rich morphology of the Arabic language. An earlier study of SA for Arabic Twitter focused mostly on automatic extraction of features from the text. Neural word embedding has been employed in the literature, since it is less labor-intensive than automatic feature engineering. Most word-embedding models follow the syntactic data of words while ignoring the context of sentiment. The current study presents a new Dragonfly Optimization with Deep Learning Enabled Sentiment Analysis for Arabic Tweets (DFODL-SAAT) model. The aim of the DFODL-SAAT model is to distinguish the sentiments of opinions tweeted in Arabic. First, data cleaning and pre-processing steps convert the input tweets into a useful format. Next, the TF-IDF model is used as a feature extractor to generate the feature vectors. An Attention-based Bidirectional Long Short Term Memory (ABLSTM) technique is then applied to identify and classify the sentiments. Finally, the hyperparameters of the ABLSTM model are optimized using the DFO algorithm. The performance of the proposed DFODL-SAAT model was validated using a benchmark dataset, and the outcomes were investigated under different aspects. The experimental outcomes highlight the superiority of the DFODL-SAAT model over recent approaches.
Abstract: The rapid growth of internet usage has created a new setting in which bullying can be practiced. Cyberbullying has increased over the past decade, and it has the same adverse effects as face-to-face bullying, such as anger, sadness, anxiety, and fear. With the anonymity people get on the internet, they tend to be more aggressive and express their emotions freely without considering the effects, which may be one reason for the increase in cyberbullying and is the main motive behind the current study. This study presents a thorough background of cyberbullying and the techniques used to collect, preprocess, and analyze the datasets. Moreover, a comprehensive review of the literature was conducted to identify research gaps and effective techniques and practices in cyberbullying detection in various languages, and it was deduced that there is significant room for improvement in the Arabic language. As a result, the current study investigates shortlisted machine learning algorithms in natural language processing (NLP) for the classification of Arabic datasets collected from Twitter (also known as X). In this regard, support vector machine (SVM), Naive Bayes (NB), Random Forest (RF), Logistic Regression (LR), Bootstrap aggregating (Bagging), Gradient Boosting (GBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) were shortlisted and investigated due to their effectiveness in similar problems. Finally, the scheme was evaluated with well-known performance measures: accuracy, precision, recall, and F1-score. XGBoost exhibited the best performance with 89.95% accuracy, which is promising compared to the state of the art.
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through the Small Groups Project under Grant Number (120/43); the Princess Nourah bint Abdulrahman University Researchers Supporting Project, Number (PNURSP2022R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would also like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work under Grant Code (22UQU4310373DSR36).
Abstract: Sentiment Analysis (SA), a Machine Learning (ML) technique, is often applied in the literature, specifically to data collected from social media sites. Earlier research on the SA of tweets mostly aimed at automating the feature extraction process. Against this background, the current study introduces a novel method called Quantum Particle Swarm Optimization with Deep Learning-Based Sentiment Analysis on Arabic Tweets (QPSODL-SAAT). The QPSODL-SAAT model determines and classifies the sentiments of tweets written in Arabic. Initially, data pre-processing converts the raw tweets into a useful format. Then, the word2vec model is applied to generate the feature vectors. The Bidirectional Gated Recurrent Unit (BiGRU) classifier is used to identify and classify the sentiments. Finally, the QPSO algorithm is used for the optimal fine-tuning of the hyperparameters of the BiGRU model. The proposed QPSODL-SAAT model was experimentally validated using standard datasets. An extensive comparative analysis was conducted, and the proposed model achieved a maximum accuracy of 98.35%. The outcomes confirmed the superiority of the proposed QPSODL-SAAT model over the other approaches, namely the Surface Features (SF), Generic Embeddings (GE), Arabic Sentiment Embeddings constructed using the Hybrid (ASEH) model, and the Bidirectional Encoder Representations from Transformers (BERT) model.
Abstract: Extracting information about emerging events in large study areas through spatiotemporal and textual analysis of geotagged tweets makes it possible to monitor the current state of a disaster. This study proposes dynamic spatio-temporal tweet mining as a method for dynamic event extraction from geotagged tweets in large study areas. It introduces a modified version of ordering points to identify the clustering structure (OPTICS) to address the intrinsic heterogeneity of Twitter data. To calculate the textual similarity precisely, three state-of-the-art text embedding methods, Word2vec, GloVe, and FastText, were used to capture both syntactic and semantic similarities. The impact of the selected embedding algorithms on the quality of the outputs was studied. Different combinations of spatial and temporal distances with the textual similarity measure were investigated to improve the event detection outcomes. The proposed method was applied to a case study of 2018 Hurricane Florence. The method was able to precisely identify events of varied sizes and densities before, during, and after the hurricane. The feasibility of the proposed method was quantitatively evaluated using the Silhouette coefficient and qualitatively discussed. The proposed method was also compared to an implementation based on the standard density-based spatial clustering of applications with noise (DBSCAN) algorithm, against which it showed more promising results.
Abstract: In recent years, social media such as Twitter have received much attention as a new data source for rapid flood awareness. The timely response and large coverage provided by citizen sensors significantly compensate for the limitations of non-timely remote sensing data and spatially isolated river gauges. However, automatic extraction of flood tweets from a massive pool of tweets remains a challenge. Taking the 2017 Houston Flood as a case study, this paper presents an automated flood tweet extraction approach that mines both the visual and the textual information a tweet contains. A CNN architecture was designed to classify the visual content of flood pictures during the Houston Flood. A sensitivity test was then applied to extract flood-sensitive keywords, which were further used to refine the CNN classification results. A duplication test was finally performed to trim the database by removing duplicated pictures, creating the flood tweet pool for the flood event. The results indicated that coupling the CNN classification results with flood-sensitive words in tweets allows a significant increase in precision while keeping the recall rate at a high level. The elimination of tweets containing duplicated pictures greatly contributes to higher spatio-temporal relevance to the flood.
Abstract: Purpose - The purpose of the paper is to study the multiple viewpoints required to obtain more informative similarity features among tweet documents, which is useful for achieving robust tweet clustering results. Design/methodology/approach - Let N be the number of tweet documents used for topic extraction. In the initial pre-processing step, unwanted text, punctuation, and other symbols are removed, and tokenization and stemming are performed. A bag of features is determined for the tweets, and the tweets are then modelled with this bag of features during topic extraction. Approximate topic features are extracted for every tweet document. The sets of topic features of the N documents are treated as multi-viewpoints. The key idea of the proposed work is to use multi-viewpoints in the similarity feature computation. The following figure illustrates multi-viewpoint-based cosine similarity computation for five tweet documents (here N = 5), with the corresponding documents defined in a projected space with five viewpoints, say, v_1, v_2, v_3, v_4, and v_5. For example, the similarity features between two documents (viewpoints v_1 and v_2) are computed with respect to the other three viewpoints (v_3, v_4, and v_5), unlike the single viewpoint used in the traditional cosine metric. Findings - Healthcare problems with tweets data. Topic models play a crucial role in the classification of health-related tweets by finding topics (or health clusters) instead of term frequency and inverse document frequency (TF-IDF) for unlabelled tweets. Originality/value - Topic models play a crucial role in the classification of health-related tweets by finding topics (or health clusters) instead of TF-IDF for unlabelled tweets.
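The abstract gives no formula for the multi-viewpoint similarity, so the following is only a minimal sketch of one common formulation: for two documents, each remaining document serves in turn as the viewpoint (origin), the cosine between the two difference vectors is computed, and the results are averaged. The function names and the averaging step are assumptions made for illustration and may differ from the paper's definition.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def multi_viewpoint_similarity(docs, i, j):
    """Average cosine between docs[i] and docs[j] as seen from every other document.

    docs is a list of equal-length topic-feature vectors (one per tweet document).
    """
    sims = []
    for h, view in enumerate(docs):
        if h == i or h == j:
            continue  # a document is never its own viewpoint
        # Shift the origin to the viewpoint before measuring the angle.
        di = [a - b for a, b in zip(docs[i], view)]
        dj = [a - b for a, b in zip(docs[j], view)]
        sims.append(cosine(di, dj))
    if not sims:
        raise ValueError("at least one document other than i and j is required")
    return sum(sims) / len(sims)
```

With five documents, `multi_viewpoint_similarity(docs, 0, 1)` averages over the three viewpoints at indices 2, 3, and 4, which matches the v_1, v_2 versus v_3, v_4, v_5 example in the abstract. The traditional cosine metric corresponds to using the zero vector as the single viewpoint.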
Funding: Supported by the Humanities and Social Science Research Fund of the Ministry of Education in China (Grant No. 18YJC840045) and the Jiangsu Social Science Fund (No. 20TQA001).
Abstract: Since the end of 2019, the worldwide COVID-19 outbreak has not only challenged government agencies in addressing a public health emergency but also tested their capacity to deal with public opinion on social media and to respond to social emergencies. To understand the impact on public emotion of COVID-19-related tweets posted by the major public health agencies in the United States, this paper studies public emotional diffusion in the tweet network, including its process and characteristics, taking the Twitter accounts of four official public health systems in the United States as an example. We extracted the interactions between tweets in the COVID-19-TweetIds data set and drew the tweet diffusion network. We proposed a method to measure the characteristics of the emotional diffusion network, with which we analyzed changes in public emotional intensity and in the proportion of emotional polarity, and investigated the emotional influence of key nodes and users as well as the emotional diffusion of tweets across tweeting times, tweet topics, and posting agencies. The results show that the emotional polarity of tweets changed from negative to positive as pandemic management measures improved. The public's emotional polarity on pandemic-related topics tends to be negative; the emotional intensity for management measures such as pandemic medical services turns from positive to negative to the greatest extent, while the emotional intensity for pandemic-related knowledge changes the most. Tweets posted by the Centers for Disease Control and Prevention and the Food and Drug Administration of the United States have a broad impact on public emotions, and as tweets spread, the proportions of opposite emotional polarities eventually become very close.
Abstract: Language is a medium of scientific communication. The language distribution of scientific communication reflects the status of global scientific power. Based on scientific tweets, this study reveals the language distribution in informal scientific communication.
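Abstract: On social media, the tweets that users post record a wide range of information related to those users. The text covers the topics contained in a tweet together with temporal and spatial information, and analyzing the spatiotemporal evolution of topics from this information is of considerable research value. For tweet data, a visual analytics pipeline was designed to mine tweet information and to present the spatiotemporal evolution of tweet topics from multiple perspectives through user interaction. First, based on a portion of historical tweet data, global geographic space is partitioned into regions using the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm combined with Thiessen polygons. Then, for the topic of interest that a user queries, all related tweets are retrieved through an index and their information is bound to the cluster centers. Finally, several purpose-built visualization views, which combine a temporal clustering algorithm with an adaptive algorithm, display the spatiotemporal evolution of the topic. Experiments and analysis were carried out on tweet data crawled and stored via the API provided by the official Twitter website. The results show that the improved adaptive layout algorithm for the visualization views effectively resolves graphical occlusion and fully presents the spatiotemporal evolution patterns of tweets, and that the geographic partitioning and the visualization components effectively help researchers analyze the spatiotemporal evolution of tweets and the distribution of hot topics of global concern.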