Abstract: Objective: To study the research status, research hotspots and development trends in the field of real-world data (RWD) through social network analysis and knowledge graph analysis. Methods: Literature on RWD from the past 10 years was retrieved from CNKI, and a bibliometric analysis was performed using UCINET and CiteSpace. Results and Conclusion: Related keywords such as real-world study, hospital information system (HIS), drug combination, data mining and traditional Chinese medicine (TCM) have high frequency and centrality. The clusters labeled clinical medication and RWD contain more keywords than the others. In the most recent 4 years, more articles have involved the keywords data specification, data authenticity, data security and information security. Among them, compound Kushen injection, HIS database and RWD are the top three keywords. The use of HIS to study clinical medication, clinical characteristics, diseases and injections is a long-term research hotspot in both Chinese and Western medicine. In addition, research on RWD databases has shifted from construction to standardized collection and governance, which can make RWD effective. Data authenticity, data security and information security are expected to become new hotspots in RWD research.
Funding: We thank the anonymous reviewers and editors for their very constructive comments. This work was supported by the National Social Science Foundation Project of China under Grant 16BTQ085.
Abstract: Privacy protection for mobile social networks is a frontier topic in the field of social network applications. Existing research on user privacy protection in mobile social networks focuses mainly on privacy-preserving data publishing and access control. There is little research on the association of user privacy information, which makes it difficult to design personalized privacy protection strategies and also increases the complexity of user privacy settings. This paper therefore concentrates on the association of user privacy information, using big data analysis tools, so as to provide data support for the design of personalized privacy protection strategies.
Abstract: Inter-city linkage heat data provided by Baidu Migration are used to characterize inter-city linkages, in order to study the network linkage characteristics and hierarchical structure of the Greater Bay Area urban agglomeration using social network analysis. This is the first application of big data from location-based services (LBS) to the study of urban agglomeration network structure, which offers a novel research perspective on the topic. The study finds that the density of network linkages in the Greater Bay Area urban agglomeration has reached 100%, indicating a mature network-like spatial structure. This structure has given rise to three distinct communities: Shenzhen-Dongguan-Huizhou, Guangzhou-Foshan-Zhaoqing, and Zhuhai-Zhongshan-Jiangmen. In addition, cities within the agglomeration play different roles, suggesting that differentiated development strategies may be needed to achieve staggered development. The study demonstrates that large datasets such as LBS data can offer novel insights and methods for examining urban agglomeration network structures, provided the data are appropriately mined and processed.
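To make the 100% density figure in the abstract above concrete: in social network analysis, the density of an undirected network is the number of observed ties divided by the number of possible ties, n(n-1)/2. The sketch below is only an illustration; the function name `network_density` and the three-city example are assumptions, not the paper's code or data.

```python
from itertools import combinations


def network_density(nodes, edges):
    """Density of an undirected simple graph: observed ties / possible ties."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Store each tie as an unordered pair so (a, b) and (b, a) count once;
    # self-loops are ignored.
    ties = {frozenset(e) for e in edges if e[0] != e[1]}
    return len(ties) / (n * (n - 1) / 2)


# Hypothetical illustration only: three cities with every pair linked.
cities = ["Shenzhen", "Dongguan", "Huizhou"]
full_links = list(combinations(cities, 2))
full_density = network_density(cities, full_links)         # every possible tie present
partial_density = network_density(cities, full_links[:1])  # only one of three ties present
```

A density of 1.0 (100%) means every pair of cities is directly linked, which is the situation the abstract reports for the Greater Bay Area.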
Abstract: The public is increasingly using social media platforms such as Twitter and Facebook to express views on a variety of topics. As a result, social media has emerged as the largest and most effective open source of public opinion. Single-node computational methods are inefficient for sentiment analysis on such large datasets. Supercomputers and parallel or distributed processing are two options for dealing with such large amounts of data. Most parallel programming frameworks, such as MPI (Message Passing Interface), are difficult to use and to scale in environments where supercomputers are expensive. Using the Apache Spark parallel model, this work presents a scalable system for sentiment analysis on Twitter. A Spark-based Naive Bayes training technique is proposed for this purpose; unlike prior research, this algorithm does not need any disk access. Millions of tweets have been classified using the trained model. Experiments with clusters of various sizes show that the proposed approach is highly scalable and cost-effective for larger datasets. It is nearly 12 times faster than the MapReduce-based model and nearly 21 times faster than the Naive Bayes classifier in Apache Mahout. To evaluate the framework's scalability, we gathered a large training corpus from Twitter. The accuracy of the classifier trained with this new dataset was more than 80%.
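The Naive Bayes training step mentioned in the abstract above can be illustrated on a single machine. The sketch below is a plain-Python multinomial Naive Bayes with add-one smoothing; it is not the paper's Spark implementation, and the function names and toy tweets are hypothetical. It does show why the method distributes well: training only accumulates counts, and counts from separate partitions can simply be summed.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: iterable of (tokens, label). Returns the counts the classifier needs."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # per-label token frequencies
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log-posterior for the given tokens."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for t in tokens:
            # Add-one (Laplace) smoothing avoids log(0) for unseen tokens.
            score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, for illustration only.
toy = [
    ("great phone love it".split(), "pos"),
    ("awful battery hate it".split(), "neg"),
    ("love the screen".split(), "pos"),
    ("hate the delay".split(), "neg"),
]
model = train_nb(toy)
```

Calling `predict_nb(model, ["love"])` classifies a new tokenized tweet against the counts gathered during training.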
Abstract: The rising popularity of online social networks (OSNs) such as Twitter, Facebook, MySpace, and LinkedIn in recent years has sparked great interest in sentiment analysis of their data. While many methods exist for identifying sentiment in OSNs, such as communication pattern mining and classification based on emoticons and parts of speech, the majority use a suboptimal batch-mode learning approach when analyzing large amounts of real-time data. As an alternative, we present a stream algorithm using Modified Balanced Winnow for sentiment analysis on OSNs. Tested on three real-world network datasets, our sentiment predictions perform close to batch learning, with the added ability to detect important features dynamically in data streams. These top features reveal keywords important to the analysis of sentiment.
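For readers unfamiliar with the Winnow family, the sketch below shows the standard Balanced Winnow update on a stream of examples with binary features. It is not the Modified Balanced Winnow variant used in the paper; the class name, the parameter values (`alpha`, `beta`, `threshold`), the initial weights and the toy stream are all assumptions chosen for illustration.

```python
from collections import defaultdict


class BalancedWinnow:
    """Standard Balanced Winnow for binary features, learning one example at a time."""

    def __init__(self, alpha=1.5, beta=0.5, threshold=1.0):
        # alpha > 1 promotes weights, 0 < beta < 1 demotes them (assumed values).
        self.alpha, self.beta, self.threshold = alpha, beta, threshold
        self.pos = defaultdict(lambda: 1.0)  # positive-model weights
        self.neg = defaultdict(lambda: 1.0)  # negative-model weights

    def score(self, features):
        return sum(self.pos[f] - self.neg[f] for f in features) - self.threshold

    def predict(self, features):
        return 1 if self.score(features) > 0 else -1

    def learn(self, features, label):
        """Mistake-driven multiplicative update; label is +1 or -1."""
        if self.predict(features) == label:
            return
        up, down = (self.alpha, self.beta) if label == 1 else (self.beta, self.alpha)
        for f in features:
            self.pos[f] *= up
            self.neg[f] *= down


# Hypothetical stream of (active features, label) pairs, seen twice.
stream = [
    (("good", "day"), 1),
    (("bad", "day"), -1),
    (("good", "film"), 1),
    (("bad", "film"), -1),
] * 2

clf = BalancedWinnow()
for feats, y in stream:
    clf.learn(feats, y)
```

Because each example is processed once and then discarded, the model never needs the whole dataset in memory, and the features with the largest gap between `pos` and `neg` weights can be read off at any point in the stream.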
Abstract: Half a century of follow-up surveys has enabled architects and urban planners to design rationally with the aid of planning. Nonetheless, planning has run into limitations because cities change their uses in accordance with their users' demands. In this paper, the authors propose a method to analyze the traits of users in market areas near stations by analyzing a location-based social network. After geotagged tweets were collected, their GPS (Global Positioning System) data were plotted on a map obtained from the Yahoo Open Location Platform. A morphological analysis and terminology extraction system then extracted keywords and their scores. After calculating the distance between stations and users' GPS coordinates, the authors extracted the array of keywords and corresponding scores for a given station market area. Lastly, the ratios of all users' scores to the city's scores were calculated to examine locality. The full combination of data collection, natural language processing and visualization enabled the authors to envisage the distribution of collective background in the city.
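The distance step in the abstract above (from a station to each user's GPS coordinates) can be sketched with the haversine great-circle formula. The abstract does not say which distance formula the authors used, so this is an assumption, as are the function names, the coordinates and the radius.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def tweets_near(station, tweets, radius_km):
    """Keep the tweet coordinates lying within radius_km of the station."""
    return [t for t in tweets
            if haversine_km(station[0], station[1], t[0], t[1]) <= radius_km]


# Hypothetical coordinates, for illustration only.
station = (35.0, 139.0)
tweets = [(35.0, 139.0), (35.005, 139.0), (36.0, 139.0)]
nearby = tweets_near(station, tweets, radius_km=1.0)
```

Tweets retained by `tweets_near` would form the "station market area" whose keywords are then scored.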
Funding: Under the auspices of the National Natural Science Foundation of China (No. 72273151).
Abstract: City clusters are an effective platform for encouraging regionally coordinated development. Coordinated reduction of carbon emissions within a city cluster, via the spatial association network between cities, can help coordinate regional carbon emission management, realize sustainable development, and assist China in achieving its carbon peaking and carbon neutrality goals. This paper applies an improved gravity model and social network analysis (SNA) to the spatial correlation of carbon emissions in city clusters, and analyzes the structural characteristics of the spatial correlation network of carbon emissions in the Yangtze River Delta (YRD) city cluster in China and its influencing factors. The results demonstrate that: 1) The spatial association of carbon emissions in the YRD city cluster exhibits a typical and complex multi-threaded network structure. The number of network associations and the network density show an upward trend, indicating closer spatial association between cities, but their values remain generally low. Meanwhile, network hierarchy and network efficiency show a downward trend but remain high. 2) The spatial association network of carbon emissions in the YRD city cluster shows an obvious 'core-edge' distribution pattern. The network is centered on Shanghai, Suzhou and Wuxi, all of which act as 'bridges', while Zhoushan, Ma'anshan, Tongling and other cities characterized by a remote location, a single transportation mode or a lower economic level are positioned at the edge of the network. 3) Geographic proximity, varying levels of economic development, different industrial structures, degrees of urbanization, levels of technological innovation, energy intensities and environmental regulation are important factors influencing the spatial association of carbon emissions within the YRD city cluster. Finally, policy implications are provided from four aspects: government macro-control and market mechanism guidance, the structural characteristics of the 'core-edge' network, reconfiguration and optimization of the spatial layout of the YRD city cluster, and the application of advanced technologies.
Funding: The Strategic Priority Research of the CAS, No. XDA20040400; National Natural Science Foundation of China, No. 41871118.
Abstract: Owing to the unique geographical location and historical background of Central Asia, the region's geo-relation networks are complex and changeable. This study used social network analysis to visualize the 20-year evolution of bilateral (diplomatic relations) and multilateral (intergovernmental organization (IGO) connections) networks in Central Asia since 1993. A further empirical study identified the significant driving forces behind the construction of the geo-relation networks. The results show that, since the independence of the five Central Asian countries, their degree centrality (C_D(n_i)) values have been increasing; the values are highest for Kazakhstan, followed by Uzbekistan, while the other three countries have relatively low values. The Central Asian countries maintain bilateral relations with post-Soviet nations, neighboring countries, and Western powers, and have gradually deepened and expanded their diplomatic networks. The geostrategic approaches adopted by the five countries differ. Kazakhstan has focused on expanding its bilateral and multilateral relations, while the other Central Asian countries have attempted to increase their influence by joining influential IGOs. Various driving forces, including economic, political, cultural, and geographical factors, have played significant roles in the construction of geo-relation networks in Central Asia. The importance of these factors has changed over time, from political and cultural factors (before 1995) to relations with neighboring countries (1996-2001), and finally to economic power and cultural and religious proximity (after 2002).
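The degree centrality index cited in the abstract above is commonly defined as C_D(n_i) = d(n_i) / (g - 1), where d(n_i) is the number of ties of node n_i and g is the number of nodes in the network. The sketch below computes this normalized form for an undirected network; the abstract does not state whether the authors normalized their values, and the node labels and ties are hypothetical placeholders, not the study's data.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality d(n_i) / (g - 1) for an undirected network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-ties
            neighbours[a].add(b)
            neighbours[b].add(a)
    g = len(neighbours)
    if g < 2:
        return {n: 0.0 for n in neighbours}
    return {n: len(nb) / (g - 1) for n, nb in neighbours.items()}


# Hypothetical ties between four placeholder states A-D.
ties = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
centrality = degree_centrality(ties)
```

Here `centrality["A"]` is the highest value, because A is tied to every other node; a rising value over successive years would correspond to the expanding diplomatic networks the abstract describes.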
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (No. AA420060).
Abstract: Data processing plays a vital role in network-supported collaborative design. Much effort has been spent in this area, and many kinds of approaches have been proposed. Based on the related literature, this paper presents an Extensible Markup Language (XML) based strategy for several important data processing problems in network-supported collaborative design, such as representing the Standard for the Exchange of Product Model Data (STEP) with XML for product information expression, and managing XML documents using a relational database. The paper gives a detailed exposition of how to define the mapping between the XML structure and the relational database structure, and of how XML-QL queries can be translated into Structured Query Language (SQL) queries. Finally, the structure of a data processing system based on XML is presented.
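The idea of mapping an XML structure onto relational tables, described in the abstract above, can be sketched with Python's standard library. The XML document, the `part` table and its columns are hypothetical and are not taken from the paper or from STEP; the sketch only shows one simple mapping in which each child element becomes a row and each attribute a column, after which a selection over the XML data becomes an ordinary SQL query.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical product description, for illustration only.
XML_DOC = """
<product id="P1">
  <part id="A" name="housing" material="steel"/>
  <part id="B" name="shaft" material="aluminium"/>
</product>
"""


def load_parts(conn, xml_text):
    """Map each <part> element to one row of a relational table."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS part "
        "(product_id TEXT, part_id TEXT, name TEXT, material TEXT)"
    )
    rows = [
        (root.get("id"), p.get("id"), p.get("name"), p.get("material"))
        for p in root.findall("part")
    ]
    conn.executemany("INSERT INTO part VALUES (?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = load_parts(conn, XML_DOC)

# "Names of parts made of steel", expressed as SQL over the mapped table.
steel = conn.execute(
    "SELECT name FROM part WHERE material = ?", ("steel",)
).fetchall()
```

Keeping the parent identifier (`product_id`) in each row preserves the XML nesting, so hierarchical queries can be answered with joins or filters on that column.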
Abstract: Despite the recent development of many worldwide initiatives, there is still a need for observation frameworks that provide a comprehensive view of the use of spatial data infrastructures (SDIs). Among the many remaining challenges, a thorough analysis of the information flows between existing SDIs, of their respective uses, and of the way these evolve over time is an important issue to explore. The research presented in this paper introduces a methodological framework for studying SDI use from a diachronic perspective. The approach is based on Social Network Analysis (SNA) and on questionnaires collected through online surveys. We develop a structural and diachronic analysis based on a series of graph-based measures that identify the main patterns appearing over time. The methodological framework is applied to a series of French SDIs and users involved in environmental management. The study identifies a series of structural differences in the data flows that emerge between the users and the SDIs. Lastly, the diachronic network analysis provides an overall understanding of how data flows evolve over time at different institutional levels.
Funding: Supported in part by the National Natural Science Foundation of China under Grant No. 60972010; the Beijing Natural Science Foundation under Grant No. 4102047; the Major Program for Research on Philosophy & Humanity Social Sciences of the Ministry of Education of China under Grant No. 08WL1101; and the Service Business of Scientists and Engineers Project under Grant No. 2009GJA00048.
Abstract: This paper analyzes and models user reading and replying activities in a bulletin board system (BBS) social network. By analyzing a dataset from a well-known Chinese BBS social network, we show how some user activities are distributed and reveal several important features that might characterize user dynamics. We propose a method to model user activities in the BBS social network. The model can reproduce power-law and non-power-law distributions of user activities at the same time. Our results show that user reading and replying activities can be simulated with simple agent-based models. Specifically, the way the BBS server interacts with Internet users in the Web 2.0 application, the way users organize their reading lists, and the distribution of user behavioral traits are the important factors in the formation of activity patterns.
Abstract: Switzerland is one of the most desirable European destinations for Chinese tourists; therefore, a better understanding of Chinese tourists is essential for successful business practices. In China, the largest and leading social media platform, Sina Weibo (a hybrid of Twitter and Facebook), has more than 600 million users. Weibo's great market penetration suggests that tourism operators and markets need to understand how to build effective and sustainable communications on Chinese social media platforms. To offer a better decision support platform to tourism destination managers as well as Chinese tourists, we propose a framework using linked data on Sina Weibo. Linked Data is a term referring to the use of the Internet to connect related data. We show how it can be used and how an ontology can be designed to include the users' context (e.g., GPS locations). Our framework provides a good theoretical foundation for further understanding Chinese tourists' expectations, experiences, behaviors and new trends in Switzerland.