With large-scale applications, the loss of power load data during transmission is inevitable. This paper proposes a data completion method that exploits the low-rank property of the data. Based on the low-rank property of the data and numerical experiments, we find that either the linear interpolation (LI) method or the singular value decomposition (SVD) based method is superior to other methods, depending on the smoothness of the data. We construct an index to measure the smoothness of the data and propose the SVDLI algorithm, which adaptively selects between the algorithms for data completion according to the index. Numerical simulations show that, irrespective of the smoothness of the data, the completion results of SVDLI are comparable to or better than the best of the SVD and LI algorithms. The study is verified using measurements in China and public data from an Australian electricity distribution company and Lawrence Berkeley National Laboratory.
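The two completion strategies named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' SVDLI implementation: the function names, the iterative rank-truncated SVD scheme, the `roughness` index, the threshold, and the rule "smooth data goes to LI, rough data goes to SVD" are all assumptions made for the example, since the abstract does not define the index or the selection rule.

```python
import numpy as np

def linear_interpolate(series):
    """Fill NaNs in a 1-D series by linear interpolation between known samples."""
    x = np.asarray(series, dtype=float)
    idx = np.arange(x.size)
    known = ~np.isnan(x)
    # np.interp holds the nearest known value constant outside the known range
    return np.interp(idx, idx[known], x[known])

def svd_complete(matrix, rank=1, n_iter=200):
    """Fill NaNs in a 2-D array by repeatedly projecting onto rank-`rank` matrices."""
    m = np.asarray(matrix, dtype=float)
    missing = np.isnan(m)
    # Initial guess: column means of the observed entries
    filled = np.where(missing, np.nanmean(m, axis=0), m)
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Keep observed entries, overwrite only the missing ones
        filled = np.where(missing, low_rank, m)
    return filled

def roughness(series):
    """Hypothetical smoothness index: mean |second difference| scaled by the std."""
    x = linear_interpolate(series)
    return np.mean(np.abs(np.diff(x, n=2))) / (np.std(x) + 1e-12)

def choose_method(series, threshold=1.0):
    """Assumed rule: smooth series use LI, rough series use the SVD method."""
    return "LI" if roughness(series) <= threshold else "SVD"
```

For instance, `linear_interpolate([1.0, float("nan"), 3.0])` fills the gap with the midpoint of its neighbours, while `svd_complete` is intended for a matrix whose rows are, say, days and whose columns are time slots, so that the low-rank structure across days can be used.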
Several research projects have shown interest in employing volunteered geographic information (VGI) to improve their systems with up-to-date and detailed data. The European project CAP4Access is one successful example of such international research projects; it aims to improve accessibility for people with restricted mobility using crowdsourced data. In this project, OpenStreetMap (OSM) is used to extend OpenRouteService, a well-known routing platform. However, a basic challenge that the project tackled was the incompleteness of OSM data with regard to certain information required for wheelchair accessibility (e.g., sidewalk information and kerb data). In this article, we present the results of an initial assessment of sidewalk data in OSM at the beginning of the project, as well as our approach to raising awareness and using tools for tagging accessibility data in the OSM database to improve sidewalk data completeness. Several experiments were carried out in different European cities, and the results of the experiments and the lessons learned are discussed. The lessons learned yield recommendations that help in organizing better mapping party events in the future. We conclude by reporting how and to what extent OSM sidewalk data completeness in these study areas had benefited from the mapping parties by the end of the project.
The smart grid is an evolving critical infrastructure that combines renewable energy with the most advanced information and communication technologies to provide more economical and secure power supply services. To cope with the intermittency of ever-increasing renewable energy and to ensure the security of the smart grid, state estimation, which serves as a basic tool for understanding the true states of a smart grid, should be performed at high frequency. More complete system state data are needed to support high-frequency state estimation. This paper therefore studies the data completeness problem for smart grid state estimation. The problem of improving data completeness by recovering high-frequency data from low-frequency data is formulated as a super resolution perception (SRP) problem, and a novel machine-learning-based SRP approach is proposed. The proposed method, the Super Resolution Perception Net for State Estimation (SRPNSE), consists of three steps: feature extraction, information completion, and data reconstruction. Case studies demonstrate the effectiveness and value of the proposed SRPNSE approach in recovering high-frequency data from low-frequency data for state estimation.
Based on the concrete conditions of earthquake data in West China, East China and South China, we studied the completeness of data in these regions using methods suited to local conditions. In addition, we roughly estimated the monitoring capability of local networks in China since 1970 and in some outlying regions where data are lacking. Finally, we gave the regional distribution of the beginning years from which the data for different magnitude intervals are largely complete in the Chinese mainland.
Various stakeholders, such as researchers, government agencies, businesses, and research laboratories, require a large volume of reliable scientific research outcomes, including research articles and patent data, to support their work. These data are crucial for a variety of applications, such as advancing scientific research, conducting business evaluations, and undertaking policy analysis. However, collecting such data is often a time-consuming and laborious task. Consequently, many users turn to openly accessible data for their research. However, existing open dataset releases typically suffer from a lack of relationships between different data sources and from limited temporal coverage. To address this issue, we present a new open dataset, the Intelligent Innovation Dataset (IIDS), which comprises six interrelated datasets spanning nearly 120 years, encompassing paper information, paper citation relationships, patent details, patent legal statuses, and funding information. The extensive contextual and temporal coverage of the IIDS dataset will provide researchers, practitioners, and policymakers with comprehensive data support, enabling them to conduct in-depth scientific research and comprehensive data analyses.
We present VCNet, a new deep learning approach for volume completion by synthesizing missing subvolumes. Our solution leverages a generative adversarial network (GAN) that learns to complete volumes using adversarial and volumetric losses. The core design of VCNet features a dilated residual block and long-term connection. During training, VCNet first randomly masks basic subvolumes (e.g., cuboids, slices) from complete volumes and learns to recover them. Moreover, we design a two-stage algorithm for stabilizing and accelerating network optimization. Once trained, VCNet takes an incomplete volume as input and automatically identifies and fills in the missing subvolumes with high quality. We quantitatively and qualitatively test VCNet with volumetric data sets of various characteristics to demonstrate its effectiveness. We also compare VCNet against a diffusion-based solution and two GAN-based solutions.
Low data quality is a serious problem in the era of big data: it can severely reduce the usability of data, mislead or bias querying, analysis and mining, and lead to huge losses. Incomplete data is common in low-quality data, and it is necessary to determine the data completeness of a dataset to provide hints for follow-up operations on it. Little existing work focuses on the completeness of a dataset, and such work views all missing values as unknown values. In this paper, we study how to determine the real data completeness of a relational dataset. By taking advantage of given functional dependencies, we aim to determine some missing attribute values from other tuples and to capture the attribute cells that are really missing. We propose a data completeness model, formalize the problem of determining the real data completeness of a relational dataset, and give a lower bound on the time complexity of this problem. Two optimal algorithms to determine the data completeness of a dataset for different cases are proposed. We empirically show the effectiveness and the scalability of our algorithms on both real-world data and synthetic data.
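The idea in the abstract above, that a missing cell is not "really missing" when a functional dependency lets another tuple determine it, can be sketched as follows. This is an illustrative fixpoint loop under stated assumptions, not the paper's completeness model or its optimal algorithms: the function name `real_completeness`, the row representation (dicts with `None` for missing cells), and the completeness ratio are all hypothetical choices for the example.

```python
def real_completeness(rows, fds):
    """Infer missing cells via functional dependencies, then measure completeness.

    rows: list of dicts sharing the same keys; None marks a missing cell.
    fds:  list of (lhs_attributes, rhs_attribute) pairs, e.g. (("zip",), "city").
    Returns (rows_with_inferred_values, fraction_of_cells_not_missing).
    """
    rows = [dict(r) for r in rows]  # work on a copy, leave the input untouched
    changed = True
    while changed:  # repeat until no dependency can fill another cell
        changed = False
        for lhs, rhs in fds:
            # Map each fully known left-hand side to a known right-hand value
            known = {}
            for r in rows:
                key = tuple(r[a] for a in lhs)
                if None not in key and r[rhs] is not None:
                    known.setdefault(key, r[rhs])
            # Fill cells whose left-hand side matches a tuple with a known value
            for r in rows:
                key = tuple(r[a] for a in lhs)
                if r[rhs] is None and None not in key and key in known:
                    r[rhs] = known[key]
                    changed = True
    total = sum(len(r) for r in rows)
    missing = sum(v is None for r in rows for v in r.values())
    return rows, 1 - missing / total
```

With rows `{"zip": "10001", "city": "NYC"}`, `{"zip": "10001", "city": None}` and `{"zip": "94105", "city": None}` and the dependency `zip -> city`, the second city is determined by the first tuple, so only one of the six cells counts as really missing.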
Buildings, as an integral aspect of human life, are vital in the domains of urban management and urban analysis. To facilitate large-scale urban planning applications, the acquisition of complete and reliable building data is imperative. A few publicly available products provide a large amount of building data, such as those from Microsoft and OpenStreetMap. However, in East Asia, the more complex distribution of buildings and the scarcity of auxiliary data have led to a lack of building data, hindering large-scale applications in the region. Some studies attempt to simulate large-scale building distribution information from incomplete local building footprint data through regression. However, the reliance on inaccurate building data introduces cumulative errors, rendering the simulated data highly unreliable and limiting precise research in the East Asian region. Therefore, we propose a comprehensive large-scale building mapping framework that accounts for the complexity of buildings in East Asia, and we extract building footprints in 2,897 cities across 5 countries in East Asia, yielding a substantial dataset of 281,093,433 buildings. The evaluation shows the validity of our building product, with an average overall accuracy of 89.63% and an F1 score of 82.55%. In addition, a comparison with existing products further shows the high quality and completeness of our building data. Finally, we conduct a spatial analysis of our building data, revealing its value in supporting urban-related research. The data for this article can be downloaded from https://doi.org/10.5281/zenodo.8174931.
Funding (CAP4Access sidewalk data study): Supported by the European Community’s Seventh Framework Programme [FP7/2007–2013], Grant No. 612096 (CAP4Access).
Funding (SRPNSE study): The Training Program of the Major Research Plan of the National Natural Science Foundation of China (91746118) and the Shenzhen Municipal Science and Technology Innovation Committee Basic Research project (JCYJ20170410172224515).
Funding (VCNet): This work was supported in part by the U.S. National Science Foundation through grants IIS-1455886, CNS-1629914, DUE-1833129, IIS-1955395, IIS-2101696, and OAC-2104158. The authors would like to thank the anonymous reviewers for their insightful comments.
Funding (relational data completeness study): The work was supported by the National Basic Research 973 Program of China under Grant No. 2011CB036202 and the National Natural Science Foundation of China under Grant No. 61532015.
Funding (East Asia building mapping study): Supported in part by the National Key R&D Program of China under Grant 2022YFB3903402, the National Natural Science Foundation of China under Grants 42222106 and 61976234, and the Fundamental Research Funds for the Central Universities, Sun Yat-sen University, under Grant 22lgqb12.