As pivotal green spaces, urban parks play an important role in urban residents’ daily activities. They not only benefit people's physical health but are also more likely to elicit positive sentiment in those who visit them. Recently, social media big data has provided new data sources for sentiment analysis. However, limited research has explored the connection between urban parks and individuals’ sentiments. Therefore, this study first employed a pre-trained language model (BERT, Bidirectional Encoder Representations from Transformers) to calculate sentiment scores based on social media data. Second, it analyzed the relationship between urban parks and individuals’ sentiment from both spatial and temporal perspectives. Finally, using a structural equation model (SEM), we identified 13 factors and analyzed their degree of influence. The research findings are as follows: ① individuals generally experienced positive sentiment, with high sentiment scores in the majority of urban parks; ② urban park type influenced sentiment scores, with higher scores observed in eco-parks, comprehensive parks, and historical parks; ③ urban park level had a low impact on sentiment scores, with distinctions observed mainly at level-3 and level-4; ④ compared with internal factors in parks, the external infrastructure surrounding them exerted a more significant impact on sentiment scores. For instance, the number of bus and subway stations around urban parks led to higher sentiment scores, while scenic spots and restaurants had the inverse result. This study provides a novel method to quantify the services of various urban parks, which can serve as inspiration for similar studies in other cities and countries, enhancing their park planning and management strategies.
COVID-19 posed challenges for global tourism management. This study analyzes changes in visitor temporal and spatial patterns, and their associated determinants, pre- and peri-pandemic in the Canadian Rocky Mountain National Parks. Data were collected through social media programming and analyzed using spatiotemporal analysis and a geographically weighted regression (GWR) model. The results highlight that COVID-19 significantly changed park visitation patterns: visitors tended to explore more remote areas peri-pandemic. The GWR model also indicated that distance to nearby trails was a significant influence on visitor density. Our results indicate that the pandemic influenced the temporal and spatial imbalance of tourism. This research presents a novel approach using combined social media big data that can be extended to the field of tourism management, and it has important implications for managing visitor patterns and allocating resources efficiently to satisfy the multiple objectives of park management.
Urbanization is one of the most impactful human activities across the world today, affecting the quality of urban life and its sustainable development. Urbanization in Africa is occurring at an unprecedented rate, and it threatens the attainment of the Sustainable Development Goals (SDGs). Urban sprawl has resulted in unsustainable urban development patterns from social, environmental, and economic perspectives. This study is among the first examples of research in Africa to combine remote sensing data with social media data to determine urban sprawl from 2011 to 2017 in Morogoro urban municipality, Tanzania. The Random Forest (RF) method was applied for imagery classification, and location-based social media (Twitter usage) data were obtained through a Twitter Application Programming Interface (API). Morogoro urban municipality was classified into built-up, vegetation, agriculture, and water land cover classes, and the classification results were validated using 480 randomly generated points. Using the kernel function, the study measured the location of Twitter users within a 1 km buffer from the center of the city. The results indicate that expansion of the city (built-up land use), which is primarily driven by population expansion, has negative impacts on ecosystem services because pristine grasslands and forests, which provide essential ecosystem services such as carbon sequestration and support for biodiversity, have been replaced by built-up land cover. In addition, social media usage data suggest that Twitter usage is concentrated within the city center and declines away from it, with a significant spatial and numerical increase in Twitter usage in the study area. The outcome of the study suggests that the combination of remote sensing, social sensing, and population data was useful as a proxy/inference for interpreting urban sprawl and the status of access to urban services and infrastructure in Morogoro, an African city where data for urban planning is often unavailable, inaccurate, or stale.
Nowadays, short texts can be widely found in various social data related to the 5G-enabled Internet of Things (IoT). Short text classification is a challenging task due to sparsity and the lack of context. Previous studies mainly tackle these problems by enhancing either the semantic information or the statistical information individually. However, the improvement achieved by a single type of information is limited, while fusing various types of information may improve classification accuracy more effectively. To this end, this article proposes a feature fusion method that integrates the statistical feature and the comprehensive semantic feature by using a weighting mechanism and deep learning models. In the proposed method, we apply Bidirectional Encoder Representations from Transformers (BERT) to generate word vectors at the sentence level automatically, and then obtain the statistical feature, the local semantic feature, and the overall semantic feature using the Term Frequency-Inverse Document Frequency (TF-IDF) weighting approach, a Convolutional Neural Network (CNN), and a Bidirectional Gated Recurrent Unit (BiGRU), respectively. The fused feature is then used for classification. Experiments are conducted on five popular short text classification datasets and a 5G-enabled IoT social dataset, and the results show that the proposed method effectively improves classification performance.
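The statistical feature in the abstract above relies on TF-IDF weighting of word vectors. A minimal sketch of that idea follows, assuming toy two-dimensional vectors in place of BERT word vectors and a plain tf × log(N/df) weight (the abstract does not say which TF-IDF variant is used). The function names `tfidf_weights` and `weighted_sentence_vector` are hypothetical, and the CNN/BiGRU branches and the fusion step are not shown.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {token: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            tok: (count / len(doc)) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        })
    return weights


def weighted_sentence_vector(weights, vectors):
    """Sum the word vectors of one document, scaled by their tf-idf weights."""
    dim = len(next(iter(vectors.values())))
    out = [0.0] * dim
    for tok, w in weights.items():
        for i, x in enumerate(vectors[tok]):
            out[i] += w * x
    return out


# Toy stand-ins for BERT word vectors (assumption, for illustration only).
toy_vectors = {"cheap": [1.0, 0.0], "phone": [0.0, 1.0], "call": [1.0, 1.0]}
docs = [["cheap", "phone"], ["phone", "call"]]
doc_weights = tfidf_weights(docs)
statistical_feature = weighted_sentence_vector(doc_weights[0], toy_vectors)
```

In this toy corpus "phone" occurs in every document, so its weight is zero and only "cheap" contributes to the first document's vector, which is the down-weighting of uninformative tokens that TF-IDF is meant to provide.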
With the emergence of new types of data (e.g., social media data) and cutting-edge computer technology (e.g., Natural Language Processing), the shortcomings of traditional methods (subjective and objective ways) for detecting urban livability can be overcome by an integrated approach. This study aims to develop a comprehensive approach to measuring urban livability based on statistical data, geo-data (e.g., points of interest), questionnaire surveys, and social media data (Instagram), from both objective and subjective angles. Hong Kong, a city with a high level of urbanization and contrasting urban environments, is chosen as the study area. The question “which area of Hong Kong is more suitable for living” is answered through the visualization of GIS-based analysis. The correlation between livability scores and individuals’ sentiment scores is also explored. Specifically, the results show that central areas of Hong Kong with a higher level of urbanization are relatively more livable than suburban regions. However, sentiment analysis shows that individuals who post on Instagram in suburban areas of Hong Kong usually express more positive content and happier emotions than those who post in central urban areas. The study could offer useful information for the policy actions of authorities as well as the residential location choices of citizens.
Blockchain is a viable solution for providing data integrity for the enormous volume of 5G IoT social data, but its throughput bottleneck must first be overcome. Sharding is a promising technology for solving the problem of low throughput in blockchains. However, cross-shard communication hinders the effective improvement of blockchain throughput. It is therefore critical to allocate transactions to different shards reasonably. Existing research on blockchain sharding mainly focuses on shard formation, configuration, and consensus, while ignoring the negative impact of cross-shard communication on throughput. Aiming to maximize the throughput of transaction processing, this paper studies how to allocate blockchain transactions to shards. We propose an Associated Transaction assignment algorithm based on Closest Fit (ATCF). ATCF classifies associated transactions into transaction groups, which are then periodically assigned to different shards in non-ascending order of group size. Within each epoch, ATCF tries to select, for each transaction group, a shard that can handle all of the group's transactions. If there are multiple such shards, ATCF selects the shard whose remaining processing capacity is closest to the number of transactions in the group. When no such shard exists, ATCF chooses the shard with the largest remaining processing capacity. Transaction groups that cannot be completely processed within the current epoch are allocated in subsequent epochs. We prove that ATCF is a 2-approximation algorithm for the associated transaction assignment problem. Simulation results show that ATCF can effectively improve blockchain throughput and reduce the number of cross-shard transactions.
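The per-epoch assignment rule described in the abstract above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: it assumes every shard has the same capacity per epoch, that transaction groups are already formed and given only by their sizes, and that leftover transactions are simply reported for a later epoch. The function name `assign_groups` and its parameters are hypothetical.

```python
def assign_groups(group_sizes, shard_capacity, num_shards):
    """Assign transaction groups to shards for one epoch (closest-fit sketch).

    Returns (assignment, leftover):
      assignment maps group index -> chosen shard index,
      leftover maps group index -> transactions deferred to a later epoch.
    """
    remaining = [shard_capacity] * num_shards
    assignment, leftover = {}, {}
    # Process groups in non-ascending order of size.
    order = sorted(range(len(group_sizes)), key=lambda g: -group_sizes[g])
    for g in order:
        size = group_sizes[g]
        fitting = [s for s in range(num_shards) if remaining[s] >= size]
        if fitting:
            # Closest fit: the smallest remaining capacity that still holds the group.
            shard = min(fitting, key=lambda s: remaining[s])
        else:
            # No shard can hold the whole group: take the largest remaining capacity.
            shard = max(range(num_shards), key=lambda s: remaining[s])
        processed = min(size, remaining[shard])
        remaining[shard] -= processed
        assignment[g] = shard
        if processed < size:
            leftover[g] = size - processed
    return assignment, leftover


assignment, leftover = assign_groups([5, 3, 4, 6], shard_capacity=8, num_shards=2)
```

Keeping a whole group on one shard is what avoids cross-shard transactions; the closest-fit choice leaves larger gaps free on other shards for groups that arrive later in the ordering.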
The explosive growth of mobile data demand is becoming an increasing burden on current cellular networks. To address this issue, we propose an opportunistic data offloading solution for alleviating overloaded cellular traffic. The principle behind it is to select a few important users as seeds for data sharing. The three critical steps are as follows. First, we explore the individual interests of users by constructing user profiles, on which an interest graph is built by Gaussian graphical modeling. Second, we apply extreme value theory to threshold the encounter duration of user pairs, generating a contact graph that indicates the social relationships of users. Third, a contact-interest graph is developed on the basis of the social ties and individual interests of users. Corresponding to the different graphs, three strategies are finally proposed for seed selection, with the aim of maximizing the cellular data offloaded. We evaluate the performance of our algorithms on real-world mobility trace data. The results demonstrate the effectiveness of taking social relationships and individual interests into account.
In recent years, with the development of the social Internet of Things (IoT), all kinds of data have accumulated on the network. These data contain a great deal of social information and opinions. However, they are rarely fully analyzed, which is a major obstacle to the intelligent development of the social IoT. In this paper, we propose a sentence similarity analysis model to analyze the similarity of people’s opinions on hot topics in social media and news pages. Most of these data are unstructured or semi-structured sentences, so the accuracy of sentence similarity analysis largely determines the model’s performance. To improve accuracy, we propose a novel method of sentence similarity computation that extracts the syntactic and semantic information of semi-structured and unstructured sentences. We mainly consider the subjects, predicates, and objects of sentence pairs and use the Stanford Parser to classify the dependency relation triples in order to calculate the syntactic and semantic similarity between two sentences. Finally, we verify the performance of the model on the Microsoft Research Paraphrase Corpus (MRPC), which consists of 4076 pairs of training sentences and 1725 pairs of test sentences, with most of the data drawn from news in social data. Extensive simulations demonstrate that our method outperforms other state-of-the-art methods in terms of the correlation coefficient and the mean deviation.
With the advent of the 5G Internet of Things era, communication and social interaction in daily life have changed considerably, and a large amount of social data is transmitted over the Internet. At the same time, the rapid development of deep forgery technology has brought a new crisis of trust in social data. Therefore, how to ensure the trustworthiness and credibility of social data in the 5G Internet of Things era is an urgent problem. This paper proposes a new method for forgery detection based on GANs. We first discover the hidden gradient information in the grayscale version of the forged image and use this gradient information to guide the generation of forged traces. In the classifier, we replace the traditional binary loss with the focal loss, which focuses on difficult-to-classify samples and can achieve accurate classification when real and fake samples are unbalanced. Experimental results show that the proposed method achieves high accuracy on the DeeperForensics dataset, with a highest accuracy of 98%.
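To illustrate why a focal loss helps with hard samples and class imbalance, here is a minimal sketch of the commonly cited binary formulation, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), next to plain binary cross-entropy. The abstract does not give the loss hyperparameters, so the defaults `gamma=2.0` and `alpha=0.25` below are assumptions, as are the function names.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one sample; p is the predicted probability of class 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample (gamma and alpha defaults are assumed values)."""
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # The (1 - p_t) ** gamma factor shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy_ratio = focal_loss(0.9, 1) / cross_entropy(0.9, 1)  # confident, correct prediction
hard_ratio = focal_loss(0.1, 1) / cross_entropy(0.1, 1)  # confident, wrong prediction
```

With these settings the easy sample keeps a far smaller share of its cross-entropy loss than the hard one, so training gradients are dominated by difficult samples; setting `gamma=0` recovers an alpha-weighted cross-entropy.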
Social media plays a crucial role in the organization of massive social movements. However, the sheer quantity of data generated by such events, together with the data collection restrictions that researchers encounter, creates a series of challenges for those who want to analyze dynamic public discourse and opinion in response to, and in the creation of, world events. In this paper we present gatherTweet, a Python package that helps researchers efficiently collect social media data for events that are composed of many decentralized actions (across both space and time). The package is useful for studies that require analysis of the organizational or baseline messaging before an action, the action itself, and the effects of the action on subsequent public discourse. By capturing these aspects of world events, gatherTweet enables the study of events and actions such as protests, natural disasters, and elections.
Data Acquisition and Preprocessing is a core course on digital intelligence at Wuhan University, designed to cultivate students’ understanding of data sources and improve their preprocessing methods. The course aims to foster digital thinking and literacy and to enhance intelligent computing skills. This study examined digital intelligence education and reform practices integrated into the course, which covered web data, social sensing data, remote sensing data, sensor network data, unmanned aerial vehicle data, and 3D data. Moreover, the study explored the development and implementation of the course’s teaching platform, which was based on the open geospatial engine.
Objective: To discuss how social media data can be used for post-marketing drug safety monitoring in China as soon as possible by systematically reviewing text mining applications, and to provide new ideas and methods for pharmacovigilance. Methods: Relevant domestic and foreign literature was used to explore text classification based on machine learning, text mining based on deep learning (neural networks), and adverse drug reaction (ADR) terminology. Results and Conclusion: Text classification methods based on traditional machine learning mainly include the support vector machine (SVM) algorithm, the naive Bayesian (NB) classifier, decision trees, the hidden Markov model (HMM), and bidirectional encoder representations from transformers (BERT). The main neural networks for text mining based on deep learning are the convolutional neural network (CNN), the recurrent neural network (RNN), and long short-term memory (LSTM). ADR terminology standardization tools mainly include the “Medical Dictionary for Regulatory Activities” (MedDRA), “WHODrug”, and the “Systematized Nomenclature of Medicine-Clinical Terms” (SNOMED CT).
This study aims to conduct an in-depth analysis of social media data using causal inference methods to explore the underlying mechanisms driving user behavior patterns. By leveraging large-scale social media datasets, this research develops a systematic analytical framework that integrates techniques such as propensity score matching, regression analysis, and regression discontinuity design to identify the causal effects of content characteristics, user attributes, and social network structures on user interactions, including clicks, shares, comments, and likes. The empirical findings indicate that factors such as sentiment, topical relevance, and network centrality have significant causal impacts on user behavior, with notable differences observed among various user groups. This study not only enriches the theoretical understanding of social media data analysis but also provides data-driven decision support and practical guidance for fields such as digital marketing, public opinion management, and digital governance.
Social media has been the primary source of information from mainstream news agencies due to the large number of users posting their feedback. The COVID-19 outbreak brought not only a virus but also fear and uncertainty, along with inaccurate information and misinformation spread on social media platforms. This phenomenon caused a state of panic among people. Various studies have been conducted to stop the spread of fake news and help people cope with the situation. In this paper, a semantic analysis with three levels (negative, neutral, and positive) is used to gauge the feelings of Gulf countries towards the pandemic and the lockdown, on the basis of a two-month Twitter dataset and using Natural Language Processing (NLP) techniques. It was observed that there were no mixed emotions during the pandemic: it started with a neutral reaction, followed by positive sentiments and, lastly, peaks of negative reactions. The results show that the feelings of the Gulf countries towards the pandemic were approximately 50.5% neutral, 31.2% positive, and 18.3% negative overall. The study can help government authorities learn the discrepancies between populations from diverse areas and respond to the spread of COVID-19 accordingly.
Faster internet, IoT, and social media have transformed the conventional web into a collaborative web, resulting in enormous amounts of user-generated content. Several studies focus on such content; however, they mainly address textual data, thus underestimating the importance of metadata. Considering this gap, we provide a temporal pattern mining framework to model and utilize the metadata of user-generated content. First, we scrape 2.1 million tweets from Twitter, posted between Nov-2020 and Sep-2021, about 100 hashtag keywords and represent these tweets as 100 User-Tweet-Hashtag (UTH) dynamic graphs. Second, we extract and identify four time series at three timespans (Day, Hour, and Minute) from the UTH dynamic graphs. Lastly, we model these four time series with three machine learning algorithms to mine temporal patterns, with accuracies of 95.89%, 93.17%, 90.97%, and 93.73%, respectively. We demonstrate that the metadata of user-generated content contains valuable information that helps in understanding users' collective behavior and can be beneficial for business and research. The dataset and code are publicly available; the link is given in the dataset section.
Urban public spaces are pivotal to the welfare and prosperity of modern cities. Recognizing their importance, this research addresses the critical gap in understanding and enhancing the qualities of these spaces through advanced analytics, focusing on Tehran’s main traditional market, the Bazaar. A novel methodological framework combining Social Network Analysis (SNA) and Strengths, Weaknesses, Opportunities, and Threats (SWOT) analysis, supported by location-based social media reviews, was employed. This approach assessed the Bazaar’s comfort, vitality, and safety by analyzing real-time public interactions and perceptions through social media data. The findings highlighted the Bazaar’s central role in Tehran’s urban landscape and identified the need for strategic design interventions. These interventions aimed to improve walkability, comfort, safety, and diversity, and have been successfully implemented, significantly enhancing the Bazaar’s quality and usability. This study not only advances urban studies and planning by providing a model for urban public space analysis but also underscores the value of social media data in urban analytics. The successful revitalization of Tehran’s Bazaar sets a precedent for enriching urban experiences and boosting city vitality through similar interventions in other urban spaces.
Clarifying the quality elements that have a significant impact on public perception is a prerequisite for improving the quality of parks, and comparative cross-regional studies can help to identify local landscape preferences and formulate specialized development strategies. Using online review data and natural language processing methods, this study explores how Chongqing and Chengdu residents’ perceptions of the environmental features of urban parks affect their overall satisfaction. The results show that: (1) Residents pay attention to 16 (Chongqing) and 13 (Chengdu) environmental features in urban parks, and residents of both cities pay the most attention to natural features. In addition, Chongqing residents pay more attention to the recreational services of urban parks, while Chengdu residents pay more attention to their aesthetics and culture. (2) Positive environmental factors increase visitors’ satisfaction, but this effect decreases as the frequency with which they are perceived increases, while negative factors continue to have a negative impact on satisfaction. Through online text data and natural language processing technology, the public’s perception of parks can be analyzed on a large scale, in depth, and with high accuracy, providing guidance for sustainable urban construction and the extraction of characteristic styles.
One of the main purposes for which people use Twitter is to share emotions with others. Users can easily post a short text message when they experience emotions such as pleasure or sadness. Such tweets serve to elicit empathy from followers and can possibly influence others' emotions. In this study, we analyze the influence of emotional behaviors on user relationships based on Twitter data, using two dictionaries of emotional words. Emotion scores are calculated via keyword matching. Moreover, we design three experiments with different settings: calculating the average emotion score of a user with random sampling, calculating the average emotion score using all emotional tweets, and calculating the average emotion score using emotional tweets while excluding users with few emotional tweets. We evaluate the influence of emotional behaviors on user relationships through the Brunner-Munzel test. The results show that, under a specific condition, a positive user is more active than a negative user in constructing user relationships.
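The keyword-matching score and the "exclude users with few emotional tweets" setting from the abstract above can be sketched as follows. The word lists are tiny hypothetical stand-ins for the two emotion dictionaries, the score definition (positive matches minus negative matches) is an assumption, and the names `count_matches` and `average_user_score` are illustrative only.

```python
# Hypothetical miniature dictionaries; the study uses two real emotion lexicons.
POSITIVE_WORDS = {"happy", "glad", "love"}
NEGATIVE_WORDS = {"sad", "angry", "hate"}


def count_matches(tweet):
    """Count positive and negative dictionary hits in a whitespace-tokenized tweet."""
    tokens = tweet.lower().split()
    pos = sum(tok in POSITIVE_WORDS for tok in tokens)
    neg = sum(tok in NEGATIVE_WORDS for tok in tokens)
    return pos, neg


def average_user_score(tweets, min_emotional=1):
    """Mean (pos - neg) over a user's emotional tweets.

    Returns None when the user has fewer than `min_emotional` emotional tweets,
    which mirrors the setting that excludes users with few emotional tweets.
    """
    scores = []
    for tweet in tweets:
        pos, neg = count_matches(tweet)
        if pos + neg > 0:  # keep only tweets with at least one dictionary hit
            scores.append(pos - neg)
    if len(scores) < min_emotional:
        return None
    return sum(scores) / len(scores)


user_tweets = ["I love this happy day", "so sad today", "lunch time"]
score_all = average_user_score(user_tweets)
score_strict = average_user_score(user_tweets, min_emotional=3)
```

Per-user averages computed this way for two groups of users could then be compared with a Brunner-Munzel test, for example via `scipy.stats.brunnermunzel` if SciPy is available.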
The European Commission report “Turning FAIR into reality” provides an index of 27 FAIR Action Plan recommendations. This index is used for a self-assessment of CESSDA, the Consortium of European Social Science Data Archives. CESSDA is performing well on “Concepts for FAIR implementation”, “Skills for FAIR”, and “Investment in FAIR”; there is work in progress on “FAIR culture”, and work yet to start on “FAIR ecosystem” and especially on “Incentives and metrics for FAIR data and services”. Next, an analysis of the FAIR components reveals that CESSDA has accomplished the “F”, is working on the “A” (considering the sensitivity and security requirements of social data), has just started on the “I”, and that there is a lack of clarity on what should be in the “R”. On Findability, the CESSDA Data Catalogue is explained, showing the building blocks that need to be in place before one can produce a catalogue. The article ends with a forward look at CESSDA’s deployment of the FAIR principles.
This study explores the influence of social media on stock volatility and builds a feature model with an intelligence algorithm using social media data from Xueqiu.com in China, Sina Finance and Economics, Sina Microblog, and Oriental Fortune. We find that the effect of social factors, such as increased attention to a stock's volatility, is more significant than that of public sentiment. A prediction model based on social factors and public sentiment is introduced to predict stock volatility. Our findings indicate that the influence of social media data is more significant for the next day's volatility but declines over time.
Funding (urban parks and sentiment study): R&D Program of Beijing Municipal Education Commission (No. KM202211417015); Academic Research Projects of Beijing Union University (No. ZK10202209); the team-building subsidy of the “Xuezhi Professorship” of the College of Applied Arts and Science of Beijing Union University (No. BUUCAS-XZJSTD-2024005); Academic Research Projects of Beijing Union University (No. ZKZD202305).
Funding (Canadian Rocky Mountain National Parks visitation study): This research was supported by the UBC APFNet Grant (Project ID: 2022sp2 CAN).
Funding: This work is supported by the National Natural Science Foundation of China [Grant Numbers 41771452, 41771454 and 41890820] and the Natural Science Fund of Hubei Province in China [Grant Number 2018CFA007].
Abstract: Urbanization is one of the most impactful human activities across the world today, affecting the quality of urban life and its sustainable development. Urbanization in Africa is occurring at an unprecedented rate, and it threatens the attainment of the Sustainable Development Goals (SDGs). Urban sprawl has resulted in unsustainable urban development patterns from social, environmental, and economic perspectives. This study is among the first examples of research in Africa to combine remote sensing data with social media data to determine urban sprawl from 2011 to 2017 in Morogoro urban municipality, Tanzania. The Random Forest (RF) method was applied to accomplish imagery classification, and location-based social media (Twitter usage) data were obtained through a Twitter Application Programming Interface (API). Morogoro urban municipality was classified into built-up, vegetation, agriculture, and water land cover classes, while the classification results were validated by the generation of 480 random points. Using the Kernel function, the study measured the location of Twitter users within a 1 km buffer from the center of the city. The results indicate that expansion of the city (built-up land use), which is primarily driven by population expansion, has negative impacts on ecosystem services because pristine grasslands and forests, which provide essential ecosystem services such as carbon sequestration and support for biodiversity, have been replaced by built-up land cover. In addition, social media usage data suggest that Twitter usage is concentrated within the city center and declines away from it, with a significant spatial and numerical increase in Twitter usage in the study area. The outcome of the study suggests that the combination of remote sensing, social sensing, and population data was useful as a proxy/inference for interpreting urban sprawl and the status of access to urban services and infrastructure in Morogoro, an African city where data for urban planning is often unavailable, inaccurate, or stale.
Funding: Supported in part by the Beijing Natural Science Foundation under grants M21032 and 19L2029, in part by the National Natural Science Foundation of China under grants U1836106 and 81961138010, and in part by the Scientific and Technological Innovation Foundation of Foshan under grants BK21BF001 and BK20BF010.
Abstract: Short texts are now widespread in the social data associated with the 5G-enabled Internet of Things (IoT). Short text classification is a challenging task because of sparsity and the lack of context. Previous studies mainly tackle these problems by enhancing either the semantic information or the statistical information individually. However, the improvement achieved by a single type of information is limited, while fusing various types of information may improve classification accuracy more effectively. To fuse various information for short text classification, this article proposes a feature fusion method that integrates the statistical feature and the comprehensive semantic feature by using a weighting mechanism and deep learning models. In the proposed method, we apply Bidirectional Encoder Representations from Transformers (BERT) to generate word vectors on the sentence level automatically, and then obtain the statistical feature, the local semantic feature and the overall semantic feature using the Term Frequency-Inverse Document Frequency (TF-IDF) weighting approach, a Convolutional Neural Network (CNN) and a Bidirectional Gated Recurrent Unit (BiGRU), respectively. The fusion feature is then obtained for classification. Experiments are conducted on five popular short text classification datasets and a 5G-enabled IoT social dataset, and the results show that our proposed method effectively improves the classification performance.
Abstract: With the emergence of new types of data (e.g. social media data) and cutting-edge computer technology (e.g. Natural Language Processing), the shortcomings of traditional methods (subjective and objective ways) for detecting urban livability can be overcome by an integrated approach. This study aims to develop a comprehensive approach to measure urban livability based on statistical data, geo-data (e.g. points of interest), questionnaire surveys, and social media data (Instagram), from both objective and subjective angles. Hong Kong, as a city with a high level of urbanization and contrasting urban environments, is chosen as the study area in this research. Through this study, the question “which area of Hong Kong is more suitable for living” is answered by the visualization of GIS-based analysis. Also, the correlation between livability scores and individuals’ sentiment scores is explored. Specifically, the results show that central areas of Hong Kong with a higher level of urbanization are relatively more livable than suburban regions. However, sentiment analysis shows that individuals who post on Instagram in suburban areas of Hong Kong usually express more positive content and happier emotion than those who post in central urban areas. The study could offer useful information for the policy action of authorities as well as the residential location choices of citizens.
Funding: Supported by the Anhui Provincial Key R&D Program of China (202004a05020040), the open project of the State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System in China (CEMEE2018Z0102B), and the open fund of the Intelligent Interconnected Systems Laboratory of Anhui Province (PA2021AKSK0114), Hefei University of Technology.
Abstract: Blockchain is a viable solution for providing data integrity for the enormous volume of 5G IoT social data, but its throughput bottleneck must first be overcome. Sharding is a promising technology to solve the problem of low throughput in blockchains. However, cross-shard communication hinders the effective improvement of blockchain throughput. Therefore, it is critical to reasonably allocate transactions to different shards to improve blockchain throughput. Existing research on blockchain sharding mainly focuses on shard formation, configuration, and consensus, while ignoring the negative impact of cross-shard communication on blockchain throughput. Aiming to maximize the throughput of transaction processing, we study how to allocate blockchain transactions to shards in this paper. We propose an Associated Transaction assignment algorithm based on Closest Fit (ATCF). ATCF classifies associated transactions into transaction groups, which are then periodically assigned to different shards in non-ascending order of transaction group size. Within each epoch, ATCF tries to select, for each transaction group, a shard that can handle all of its transactions. If there are multiple such shards, ATCF selects the shard whose remaining processing capacity is closest to the number of transactions in the transaction group. When no such shard exists, ATCF chooses the shard with the largest remaining processing capacity for the transaction group. The transaction groups that cannot be completely processed within the current epoch will be allocated in subsequent epochs. We prove that ATCF is a 2-approximation algorithm for the associated transaction assignment problem. Simulation results show that ATCF can effectively improve the blockchain throughput and reduce the number of cross-shard transactions.
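The closest-fit rule in the abstract above can be illustrated with a short Python sketch. This is only a reading of the abstract, not the authors' implementation: the function name `assign_epoch`, the input format (a dict of group sizes and a list of per-shard capacities), and the way an oversized group is partially processed and its remainder deferred are all assumptions made for illustration.

```python
def assign_epoch(group_sizes, capacities):
    """One epoch of a closest-fit assignment (hypothetical sketch).

    group_sizes: dict mapping group id -> number of transactions (assumed format)
    capacities:  list of per-shard processing capacities for this epoch
    Returns (assignment, leftover):
      assignment maps group id -> (shard index, transactions processed now)
      leftover   maps group id -> transactions deferred to a later epoch
    """
    remaining = list(capacities)
    assignment, leftover = {}, {}

    # Visit transaction groups in non-ascending order of size.
    for gid, size in sorted(group_sizes.items(), key=lambda kv: -kv[1]):
        # Shards that can handle the whole group.
        fitting = [i for i, cap in enumerate(remaining) if cap >= size]
        if fitting:
            # Closest fit: smallest gap between remaining capacity and group size.
            shard = min(fitting, key=lambda i: remaining[i] - size)
        else:
            # No shard fits: fall back to the largest remaining capacity.
            shard = max(range(len(remaining)), key=lambda i: remaining[i])

        processed = min(size, remaining[shard])
        remaining[shard] -= processed
        if processed > 0:
            assignment[gid] = (shard, processed)
        if processed < size:
            # Assumption: the unprocessed remainder waits for a later epoch.
            leftover[gid] = size - processed
    return assignment, leftover


if __name__ == "__main__":
    print(assign_epoch({"a": 5, "b": 3, "c": 4}, [6, 4]))
```

Keeping each group on a single shard is what avoids cross-shard transactions in this sketch; the fallback to the largest remaining capacity only applies when no shard can hold the whole group.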
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61502261, 61572457, 61379132, the Key Research and Development Plan Project of Shandong Province under Grant No. 2016GGX101032, the Science, Technology Plan Project for Colleges and Universities of Shandong Province under Grant No. J14LN85, and the Natural Science Foundation of Shandong Province under Grant No. ZR2017PF013.
Abstract: The explosive growth of mobile data demand is becoming an increasing burden on current cellular networks. To address this issue, we propose a solution of opportunistic data offloading for alleviating overloaded cellular traffic. The principle behind it is to select a few important users as seeds for data sharing. The three critical steps are detailed as follows. We first explore the individual interests of users by constructing user profiles, on which an interest graph is built by Gaussian graphical modeling. We then apply extreme value theory to threshold the encounter duration of user pairs, so that a contact graph is generated to indicate the social relationships of users. Moreover, a contact-interest graph is developed on the basis of the social ties and individual interests of users. Corresponding to the different graphs, three strategies are finally proposed for seed selection with the aim of maximizing overloaded cellular data. We evaluate the performance of our algorithms on trace data of real-world mobility. The results demonstrate the effectiveness of the strategy that takes social relationships and individual interests into account.
Funding: Supported by the Major Scientific and Technological Projects of CNPC under Grant ZD2019-183-006, partially supported by the Shandong Provincial Natural Science Foundation, China under Grant ZR2020MF006, and partially supported by “the Fundamental Research Funds for the Central Universities” of China University of Petroleum (East China) under Grants 20CX05017A and 18CX02139A.
Abstract: In recent years, with the development of the social Internet of Things (IoT), all kinds of data have accumulated on the network. These data contain a lot of social information and opinions. However, they are rarely fully analyzed, which is a major obstacle to the intelligent development of the social IoT. In this paper, we propose a sentence similarity analysis model to analyze the similarity in people’s opinions on hot topics in social media and news pages. Most of these data are unstructured or semi-structured sentences, so the accuracy of sentence similarity analysis largely determines the model’s performance. To improve accuracy, we propose a novel method of sentence similarity computation to extract the syntactic and semantic information of semi-structured and unstructured sentences. We mainly consider the subjects, predicates and objects of sentence pairs and use the Stanford Parser to classify the dependency relation triples in order to calculate the syntactic and semantic similarity between two sentences. Finally, we verify the performance of the model with the Microsoft Research Paraphrase Corpus (MRPC), which consists of 4076 pairs of training sentences and 1725 pairs of test sentences, most of which came from news in social data. Extensive simulations demonstrate that our method outperforms other state-of-the-art methods regarding the correlation coefficient and the mean deviation.
Funding: Results of the research project funded by the National Natural Science Foundation of China (No. 61871283), the Foundation of Pre-Research on Equipment of China (No. 61400010304), and the Major Civil-Military Integration Project in Tianjin, China (No. 18ZXJMTG00170).
Abstract: With the advent of the 5G Internet of Things era, communication and social interaction in our daily life have changed a lot, and a large amount of social data is transmitted to the Internet. At the same time, the rapid development of deep forgery technology has brought a new trust crisis for social data. Therefore, how to ensure the trust and credibility of social data in the 5G Internet of Things era is an urgent problem to be solved. This paper proposes a new method for forgery detection based on GANs. We first discover the hidden gradient information in the grayscale image of the forged image and use this gradient information to guide the generation of forged traces. In the classifier, we replace the traditional binary loss with the focal loss, which focuses on difficult-to-classify samples and can achieve accurate classification when the real and fake samples are unbalanced. Experimental results show that the proposed method achieves high accuracy on the DeeperForensics dataset, with the highest accuracy reaching 98%.
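To make the role of the focal loss in the abstract above concrete, here is a minimal Python sketch of the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The abstract does not state the paper's hyperparameters, so the defaults `gamma=2.0` and `alpha=0.25` and the function names are assumptions chosen for illustration only.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one sample; p is the predicted P(y = 1)."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one sample (illustrative defaults, not the paper's).

    gamma controls how strongly well-classified samples are down-weighted;
    alpha balances the two classes.
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)               # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):  # easy, uncertain, hard positive sample
        print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

The factor (1 - p_t)^gamma shrinks the loss of confidently correct samples far more than that of misclassified ones, which is why the loss concentrates training on hard samples when classes are unbalanced.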
Abstract: Social media plays a crucial role in the organization of massive social movements. However, the sheer quantity of data generated by the events, together with the data collection restrictions that researchers encounter, leads to a series of challenges for researchers who want to analyze dynamic public discourse and opinion in response to and in the creation of world events. In this paper we present gatherTweet, a Python package that helps researchers efficiently collect social media data for events that are composed of many decentralized actions (across both space and time). The package is useful for studies that require analysis of the organizational or baseline messaging before an action, the action itself, and the effects of the action on subsequent public discourse. By capturing these aspects of world events, gatherTweet enables the study of events and actions like protests, natural disasters, and elections.
Abstract: Data acquisition and preprocessing is a core course on digital intelligence at Wuhan University that is designed to cultivate students’ understanding of data sources and improve preprocessing methods. The course aims at fostering digital thinking and literacy and enhancing intelligent computing skills. This study examined digital intelligence education and reform practices integrated into the data acquisition and preprocessing course, which covered web data, social sensing data, remote sensing data, sensor network data, unmanned aerial vehicle data, and 3D data. Moreover, the study explored the development and implementation of the course’s teaching platform, which was based on the open geospatial engine.
Abstract: Objective To discuss how to use social media data for post-marketing drug safety monitoring in China as soon as possible by systematically reviewing text mining applications, and to provide new ideas and methods for pharmacovigilance. Methods Relevant domestic and foreign literature was used to explore text classification based on machine learning, text mining based on deep learning (neural networks) and adverse drug reaction (ADR) terminology. Results and Conclusion Text classification based on traditional machine learning mainly includes the support vector machine (SVM) algorithm, the naive Bayesian (NB) classifier, the decision tree, the hidden Markov model (HMM) and bidirectional encoder representations from transformers (BERT). The main neural network text mining methods based on deep learning are the convolutional neural network (CNN), the recurrent neural network (RNN) and long short-term memory (LSTM). ADR terminology standardization tools mainly include the “Medical Dictionary for Regulatory Activities” (MedDRA), “WHODrug” and “Systematized Nomenclature of Medicine-Clinical Terms” (SNOMED CT).
Abstract: This study aims to conduct an in-depth analysis of social media data using causal inference methods to explore the underlying mechanisms driving user behavior patterns. By leveraging large-scale social media datasets, this research develops a systematic analytical framework that integrates techniques such as propensity score matching, regression analysis, and regression discontinuity design to identify the causal effects of content characteristics, user attributes, and social network structures on user interactions, including clicks, shares, comments, and likes. The empirical findings indicate that factors such as sentiment, topical relevance, and network centrality have significant causal impacts on user behavior, with notable differences observed among various user groups. This study not only enriches the theoretical understanding of social media data analysis but also provides data-driven decision support and practical guidance for fields such as digital marketing, public opinion management, and digital governance.
Abstract: Social media has been the primary source of information from mainstream news agencies due to the large number of users posting their feedback. The COVID-19 outbreak brought not only a virus but also fear and uncertainty, along with inaccurate information and misinformation spread on social media platforms. This phenomenon caused a state of panic among people. Different studies were conducted to stop the spread of fake news and help people cope with the situation. In this paper, a semantic analysis of three levels (negative, neutral, and positive) is used to gauge the feelings of Gulf countries towards the pandemic and the lockdown, on the basis of a Twitter dataset covering 2 months, using Natural Language Processing (NLP) techniques. It has been observed that there are no mixed emotions during the pandemic, as it started with a neutral reaction, then positive sentiments, and lastly, peaks of negative reactions. The results show that the feelings of the Gulf countries towards the pandemic are approximately 50.5% neutral, 31.2% positive, and 18.3% negative overall. The study can be useful for government authorities to learn the discrepancies between different populations from diverse areas and to respond to the COVID-19 spread accordingly.
Funding: Supported by the National Natural Science Foundation of China (grant no. 61573328).
Abstract: Faster internet, IoT, and social media have reformed the conventional web into a collaborative web, resulting in enormous user-generated content. Several studies are focused on such content; however, they mainly focus on textual data, thus undermining the importance of metadata. Considering this gap, we provide a temporal pattern mining framework to model and utilize user-generated content's metadata. First, we scrape 2.1 million tweets from Twitter between Nov-2020 and Sep-2021 about 100 hashtag keywords and represent these tweets as 100 User-Tweet-Hashtag (UTH) dynamic graphs. Second, we extract and identify four time-series in three timespans (Day, Hour, and Minute) from the UTH dynamic graphs. Lastly, we model these four time-series with three machine learning algorithms to mine temporal patterns with accuracies of 95.89%, 93.17%, 90.97%, and 93.73%, respectively. We demonstrate that user-generated content's metadata contains valuable information, which helps to understand the users' collective behavior and can be beneficial for business and research. Dataset and codes are publicly available; the link is given in the dataset section.
Abstract: Urban public spaces are pivotal to the welfare and prosperity of modern cities. Recognizing their importance, this research addresses the critical gap in understanding and enhancing the qualities of these spaces through advanced analytics, focusing on Tehran’s main traditional market, the Bazaar. A novel methodological framework combining Social Network Analysis (SNA) and Strengths, Weaknesses, Opportunities, and Threats (SWOT) analysis, supported by location-based social media reviews, was employed. This innovative approach assessed the Bazaar’s comfort, vitality, and safety, analyzing real-time public interactions and perceptions through social media data. The findings highlighted the Bazaar’s central role in Tehran’s urban landscape and identified the need for strategic design interventions. These interventions aimed to improve walkability, comfort, safety, and diversity, and have been successfully implemented, significantly enhancing the Bazaar’s quality and usability. This study not only advances urban studies and planning by providing a model for urban public space analysis but also underscores the value of social media data in urban analytics. The successful revitalization of Tehran’s Bazaar sets a precedent for enriching urban experiences and boosting city vitality through similar interventions in other urban spaces.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52108043) and the Humanities and Social Sciences Research Project of the Chongqing Municipal Education Commission (21SKGH092).
Abstract: Clarifying the quality elements that have a significant impact on public perception is a prerequisite for improving the quality of parks, and comparative cross-regional studies can help to identify local landscape preferences and formulate specialized development strategies. Using online review data and natural language processing methods, this study explores how Chongqing and Chengdu residents’ perceptions of the environmental features of urban parks affect their overall satisfaction. The results show that: (1) There are 16 (Chongqing) and 13 (Chengdu) environmental features that residents pay attention to in urban parks, and residents of both cities pay the most attention to the natural features of urban parks. In addition, Chongqing residents pay more attention to the recreational services of urban parks, while Chengdu residents pay more attention to the aesthetics and culture of urban parks. (2) Positive environmental factors increase visitors’ satisfaction, but this effect decreases as the frequency of perception increases, while negative factors continue to have a negative impact on satisfaction. Through online text data and natural language processing technology, the public’s perception of parks can be analyzed on a large scale, in depth, and with high accuracy, providing guidance for sustainable urban construction and the extraction of characteristic styles.
Abstract: One of the main purposes for which people use Twitter is to share emotions with others. Users can easily post a message as a short text when they experience emotions such as pleasure or sadness. Such tweets serve to acquire empathy from followers and can possibly influence others' emotions. In this study, we analyze the influence of emotional behaviors on user relationships based on Twitter data using two dictionaries of emotional words. Emotion scores are calculated via keyword matching. Moreover, we design three experiments with different settings: calculating the average emotion score of a user with random sampling, calculating the average emotion score using all emotional tweets, and calculating the average emotion score using emotional tweets while excluding users with few emotional tweets. We evaluate the influence of emotional behaviors on user relationships through the Brunner-Munzel test. The result shows that a positive user is more active than a negative user in constructing user relationships in a specific condition.
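As a rough illustration of the keyword-matching step described in the abstract above, the following Python sketch scores tweets against two small word lists and averages the scores per user. The word lists, the scoring formula (positive hits minus negative hits, divided by total hits), and the `min_emotional` threshold are all hypothetical; the abstract does not specify the dictionaries or the exact score definition.

```python
import re
from statistics import mean

# Hypothetical stand-ins for the two emotional-word dictionaries.
POSITIVE = {"happy", "glad", "love", "great"}
NEGATIVE = {"sad", "angry", "hate", "awful"}


def emotion_score(text):
    """Score in [-1, 1] from keyword matches; None if the tweet has no emotional word."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return None
    return (pos - neg) / (pos + neg)  # assumed formula, for illustration only


def user_average(tweets, min_emotional=1):
    """Average score over a user's emotional tweets.

    Returns None when the user has fewer than min_emotional emotional tweets,
    mirroring the setting that excludes users with few emotional tweets.
    """
    scores = [s for s in map(emotion_score, tweets) if s is not None]
    if len(scores) < min_emotional:
        return None
    return mean(scores)


if __name__ == "__main__":
    print(user_average(["I love this great day", "so sad today", "bus is late"]))
```

Two groups of per-user averages produced this way could then be compared with a Brunner-Munzel test, for example via `scipy.stats.brunnermunzel`.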
Abstract: The European Commission report “Turning FAIR into reality” provides an index of 27 FAIR Action Plan recommendations. This index is used for a self-assessment of CESSDA, the Consortium of European Social Science Data Archives. CESSDA is performing well on “Concepts for FAIR implementation”, “Skills for FAIR”, and “Investment in FAIR”; there is work in progress on “FAIR culture”, and work still to start on “FAIR ecosystem” and especially on “Incentives and metrics for FAIR data and services”. Next, an analysis of the FAIR components reveals that CESSDA has accomplished the “F”, is working on the “A” (considering the sensitivity and security requirements of social data), has just started on the “I”, and that there is a lack of clarity on what should be in the “R”. On Findability, the CESSDA Data Catalogue is explained, showing the building blocks that need to be in place before one can produce a catalogue. The article ends with a forward look at CESSDA’s deployment of the FAIR principles.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 71532004).
Abstract: This study explores the influence of social media on stock volatility and builds a feature model with an intelligence algorithm using social media data from Xueqiu.com in China, Sina Finance and Economics, Sina Microblog, and Oriental Fortune. We find that the effect of social factors, such as increased attention to a stock's volatility, is more significant than public sentiment. A prediction model is introduced based on social factors and public sentiment to predict stock volatility. Our findings indicate that the influence of social media data on the next day's volatility is more significant but declines over time.