Funding: National Natural Science Foundation of China (No. 42001397); National Key Research and Development Program of China (No. 2016YFB0502102); Introduction and Training Program of Young Creative Talents of Shandong Province (No. 0031802); Doctoral Research Fund of Shandong Jianzhu University (No. XNBS1985); National College Student Innovation and Entrepreneurship Training Program (No. S202110430036).
Abstract: Several Wireless Fidelity (WiFi) fingerprint datasets based on Received Signal Strength (RSS) have been shared for indoor localization. However, they cannot meet all the demands of WiFi RSS-based localization. This work presents a supplementary open dataset for WiFi indoor localization based on RSS, called SODIndoorLoc, covering three buildings with multiple floors. The dataset includes dense and uniformly distributed Reference Points (RPs), with an average distance between two adjacent RPs of less than 1.2 m. The locations and channel information of pre-installed Access Points (APs) are also summarized in SODIndoorLoc, and computer-aided design drawings of each floor are provided. SODIndoorLoc supplies nine training and five testing sheets. Four standard machine learning algorithms and their variants (eight in total) are explored to evaluate positioning accuracy, and the best average positioning accuracy is about 2.3 m. SODIndoorLoc can therefore be treated as a supplement to UJIIndoorLoc with a consistent format. The dataset can be used for clustering, classification, and regression to compare the performance of different indoor positioning applications based on WiFi RSS values, e.g., high-precision positioning, building and floor recognition, fine-grained scene identification, range model simulation, and rapid dataset construction.
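As an illustration of the kind of RSS-based positioning the abstract above evaluates, the following is a minimal sketch of k-nearest-neighbor fingerprint matching. It is not the authors' implementation: the function name `knn_locate`, the toy RSS vectors, and the coordinates are all hypothetical, and the abstract does not state which algorithms or parameters were actually used.

```python
import math

def knn_locate(train_rss, train_xy, query_rss, k=3):
    """Estimate (x, y) as the mean position of the k nearest fingerprints."""
    # Euclidean distance in RSS space between the query and each reference point
    dists = [math.dist(rss, query_rss) for rss in train_rss]
    # Indices of the k reference points with the smallest RSS distance
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    x = sum(train_xy[i][0] for i in nearest) / len(nearest)
    y = sum(train_xy[i][1] for i in nearest) / len(nearest)
    return x, y

# Hypothetical toy fingerprints: one RSS value (dBm) per access point
train_rss = [[-40, -70, -80], [-45, -65, -82], [-70, -40, -75], [-80, -75, -40]]
# Hypothetical reference point coordinates in metres
train_xy = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 5.0)]

estimate = knn_locate(train_rss, train_xy, [-42, -68, -81], k=2)
# Positioning error against an assumed true position of (0.4, 0.0)
error_m = math.dist(estimate, (0.4, 0.0))
```

Averaging `error_m` over a test sheet would give an average positioning error comparable in kind to the figure quoted in the abstract.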
Funding: Supported by the National Key R&D Program of China (2019YFB1600703), the Chinese National Natural Science Foundation (Grant Nos. 72001161 and 52172348), and the Fundamental Research Funds for the Central Universities.
Abstract: This paper presents a real-time millimeter wave radar-based system for tracking vehicle trajectories over a wide area (continuously along a roadway, with essentially no length limit in practice). The trajectory tracking results were first validated for single-vehicle tracking using Real-Time Kinematic positioning based on the Beidou satellite navigation system. The validation showed that vehicle positions were captured with a mean lateral offset of −0.284 m and a mean longitudinal offset of −0.352 m. The mean estimated speeds differed from the ground truth by only −0.048 km/h. The trajectory tracking results were also validated for multi-object tracking using video data from an unmanned aerial vehicle (UAV). Compared with the UAV video footage, the millimeter wave-based system correctly captured around 92% of the total number of vehicles. For the correctly captured vehicles, positions were within 0.99 m (about a quarter of the width of a regular traffic lane) of the ground truth. The entire validated dataset is shared openly and has recently been published online as the TJRD TS platform. A demonstration of using the dataset to detect aggressive driving behaviors, such as speeding, is also presented. The open dataset is expected to help researchers and practitioners further explore the behaviors of road users, track the dynamics of safety risks and congestion formation, and evaluate incident impacts in a more microscopic but comprehensive way.
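The speeding demonstration mentioned in the abstract above could be sketched roughly as follows. This is only an assumption about how such a check might look: the record layout `(vehicle_id, time_s, speed_kmh)`, the function name `flag_speeding`, the 60 km/h limit, and the `min_hits` threshold are all hypothetical and are not taken from the TJRD TS dataset or the paper.

```python
from collections import defaultdict

def flag_speeding(samples, limit_kmh, min_hits=2):
    """Return IDs of vehicles exceeding the limit in at least min_hits samples."""
    hits = defaultdict(int)
    for vehicle_id, _time_s, speed_kmh in samples:
        if speed_kmh > limit_kmh:
            hits[vehicle_id] += 1
    # Requiring several hits guards against a single noisy speed estimate
    return sorted(v for v, n in hits.items() if n >= min_hits)

# Hypothetical trajectory samples: (vehicle_id, time_s, speed_kmh)
samples = [
    ("A", 0.0, 58.0), ("A", 0.1, 59.5),
    ("B", 0.0, 71.2), ("B", 0.1, 73.0), ("B", 0.2, 72.4),
    ("C", 0.0, 59.0), ("C", 0.1, 61.0),
]

speeders = flag_speeding(samples, limit_kmh=60.0)
```

Here vehicle C exceeds the limit in only one sample and is therefore not flagged; with `min_hits=1` it would be.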
Funding: National Natural Science Foundation of China (Nos. 62272480 and 62072470).
Abstract: Data have become valuable assets for enterprises. Data governance aims to manage and reuse data assets, facilitating enterprise management and enabling product innovation. A data lineage graph (DLG) is an abstracted collection of data assets and their data lineages in data governance. Analyzing DLGs can provide rich insights for data governance. However, the progress of data governance technologies is hindered by the shortage of open datasets for DLGs. This paper introduces an open dataset of DLGs, including the DLG model, the dataset construction process, and application areas. This real-world dataset is sourced from Huawei Cloud Computing Technology Company Limited and contains 18 DLGs with three types of data assets and two types of relations. To the best of our knowledge, it is the first open dataset of DLGs for data governance. The dataset can also support the development of other application areas, such as graph analytics and visualization.
Abstract: Various stakeholders, such as researchers, government agencies, businesses, and research laboratories, require a large volume of reliable scientific research outcomes, including research articles and patent data, to support their work. These data are crucial for a variety of applications, such as advancing scientific research, conducting business evaluations, and undertaking policy analysis. However, collecting such data is often a time-consuming and laborious task. Consequently, many users turn to openly accessible data for their research. Existing open dataset releases, however, typically suffer from a lack of links between different data sources and limited temporal coverage. To address this issue, we present a new open dataset, the Intelligent Innovation Dataset (IIDS), which comprises six interrelated datasets spanning nearly 120 years, encompassing paper information, paper citation relationships, patent details, patent legal statuses, and funding information. The extensive contextual and temporal coverage of the IIDS dataset will provide researchers, practitioners, and policy makers with comprehensive data support, enabling them to conduct in-depth scientific research and comprehensive data analyses.
Funding: National Natural Science Foundation of China (Nos. 62272480 and 62072470).
Abstract: Malicious webshells currently present tremendous threats to cloud security. Most relevant studies and open webshell datasets treat malicious webshell defense as a binary classification problem, that is, identifying whether a webshell is malicious or benign. However, fine-grained multi-classification is urgently needed to enable precise responses and active defenses against malicious webshell threats. This paper introduces a malicious webshell family dataset named MWF to facilitate research on webshell multi-classification. The dataset contains 1359 malicious webshell samples originally obtained from the cloud servers of Alibaba Cloud. Each sample is provided with a family label, and samples of the same family generally present similar characteristics or behaviors. The dataset has a total of 78 families and 22 outliers. Moreover, this paper introduces the human-machine collaboration process adopted to remove benign or duplicate samples, address privacy issues, and determine the family of each sample. This paper also compares the distinguishing features of the MWF dataset with those of previous datasets and summarizes potential application areas in cloud security and in generalized sequence, graph, and tree data analytics and visualization.
Abstract: In this paper, a novel deep learning dataset, called Air2Land, is presented for advancing the state of the art in object detection and pose estimation in the context of fixed-wing unmanned aerial vehicle autolanding scenarios. It bridges vision and control for ground-based vision guidance systems, with multi-modal data obtained by diverse sensors, and pushes forward the development of computer vision and autopilot algorithms targeted at visually assisted landing of a fixed-wing vehicle. The dataset is composed of sequential stereo images and synchronised sensor data, in terms of the flying vehicle pose and Pan-Tilt Unit angles, simulated in various climate conditions and landing scenarios. Since real-world automated landing data is very limited, the proposed dataset provides the necessary foundation for vision-based tasks such as flying vehicle detection, key point localisation, and pose estimation. In addition to providing plentiful and scene-rich data, the dataset covers high-risk scenarios that are hardly accessible in reality. The dataset is open and available at https://github.com/micros-uav/micros_air2land.