Artificial Intelligence (AI) and Machine Learning (ML) technologies, particularly Deep Learning (DL), have demonstrated significant potential in the interpretation of Remote Sensing (RS) imagery, covering tasks such as scene classification, object detection, land-cover/land-use classification, change detection, and multi-view stereo reconstruction. Large-scale training samples are essential for ML/DL models to achieve optimal performance. However, the current organization of training samples is ad hoc and vendor-specific, lacking an integrated approach that can effectively manage training samples from different vendors to meet the demands of various RS AI tasks. This article addresses these challenges by designing and implementing LuoJiaSET, a large-scale training sample database system for intelligent interpretation of RS imagery. LuoJiaSET accommodates over five million training samples, supports cross-dataset queries, and serves as a comprehensive training data store for RS AI model training and calibration. It overcomes challenges related to label semantic categories, structural heterogeneity in label representation, and interoperable data access.
Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, with applications in fields such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and develop two sampling-based DBSCAN (SDBSCAN) algorithms. One algorithm introduces the sampling technique inside DBSCAN, and the other uses a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
Since multimode data is composed of many modes and their complex relationships, it cannot be retrieved or mined effectively with traditional analysis and processing techniques designed for single-mode data. To address these challenges, we design and implement a graph-based storage and parallel loading system for multimode medical image data. The system is a framework designed to flexibly store and rapidly load such multimode data. Specifically, it uses a Mode Network to model the modes and their relationships in multimode medical image data, and a graph database to store the data with a parallel loading technique.
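The storage and loading design described above can be illustrated with a minimal sketch. This is a hypothetical simplification, not the system's actual implementation: the mode network is reduced to a plain adjacency map from mode names to file references, and `load_image` is a stand-in for whatever decoder the real system uses; only the fan-out of loading work across threads reflects the parallel loading idea.

```python
from concurrent.futures import ThreadPoolExecutor

def load_image(path):
    # Placeholder loader: in practice this would decode a medical image file.
    return {"path": path, "pixels": b"..."}

def parallel_load(mode_network, max_workers=4):
    """Load every image referenced by the mode network concurrently.

    mode_network: dict mapping a mode name to the list of image paths
    belonging to that mode (a toy stand-in for the graph-modeled modes).
    """
    paths = [p for mode_paths in mode_network.values() for p in mode_paths]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        return list(pool.map(load_image, paths))

# A toy mode network: two imaging modes for the same patient.
mode_network = {
    "CT":  ["ct_001.dcm", "ct_002.dcm"],
    "MRI": ["mri_001.dcm"],
}
images = parallel_load(mode_network)
print(len(images))  # 3
```

In the real system the relationships between modes would live in the graph database rather than a Python dict; the sketch only shows how a mode-to-files index naturally yields independent loading tasks.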
Background: Studies on myocardial infarction (MI) based on large medical databases have become popular in recent years. The influence of the National Inpatient Sample (NIS), the largest collection of administrative healthcare data in the United States, on the field of MI research has not been well investigated. This study aimed to quantify the contribution of the NIS to MI research using bibliometric methods. Methods: We searched the Web of Science Core Collection database to identify publications on MI using the NIS from 2000 to 2022. Bibliometric indicators, such as the number of publications, citations, and Hirsch index (H-index), were summarized by year, author, organization, and journal. VOSviewer and CiteSpace software were used to analyze the keywords and trends of research hot spots. Results: A total of 342 articles on MI based on the NIS were included. Significant growth in MI-related output using the NIS was observed from 2000 to 2020. The publications were mainly from the United States. The Mayo Clinic was the most prolific institution and had the most citations and the highest H-index. The American Journal of Cardiology ranked first among journals in number of publications, citations, and H-index. Mortality and healthcare management are the main focuses of this field; personalized risk and care are receiving increased attention. Conclusion: This study suggests that the NIS contributes significantly to high-quality output in MI research. More efforts are needed to improve the impact of knowledge gained from the NIS on MI.
The huge amount of information stored in databases owned by corporations (e.g., retail, financial, telecom) has spurred tremendous interest in knowledge discovery and data mining. Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, with applications in statistical data analysis, pattern recognition, image processing, and other business domains. Although researchers have worked on clustering algorithms for decades and many such algorithms have been developed, there is still no efficient algorithm for clustering very large databases and high-dimensional data. As an outstanding representative of clustering algorithms, DBSCAN shows good performance in spatial data clustering. However, for large spatial databases, DBSCAN requires a large amount of memory and can incur substantial I/O costs because it operates directly on the entire database. In this paper, several approaches are proposed to scale the DBSCAN algorithm to large spatial databases. First, a fast DBSCAN algorithm is developed that considerably speeds up the original DBSCAN. Then a sampling-based DBSCAN algorithm, a partitioning-based DBSCAN algorithm, and a parallel DBSCAN algorithm are introduced in turn. Based on these algorithms, a synthetic algorithm combining them is also given. Finally, experimental results demonstrate the effectiveness and efficiency of these algorithms.
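To make the density-based idea behind these scaling approaches concrete, here is a minimal pure-Python DBSCAN together with a rough sketch of the "sampling outside DBSCAN" strategy: cluster a random sample, then assign every remaining point to the cluster of its nearest sampled point. This is an illustrative simplification, not the paper's exact SDBSCAN procedure, and the brute-force neighbor search would itself need an index for large databases.

```python
import math
import random

def region_query(points, i, eps):
    """Indices of all points within eps of point i (brute force)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point (-1 = noise)."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1            # provisionally noise
            continue
        cluster += 1                  # i is a core point: start a cluster
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster   # border point reclaimed from noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = region_query(points, j, eps)
            if len(jn) >= min_pts:    # j is itself a core point: expand
                seeds.extend(jn)
    return labels

def sampled_dbscan(points, eps, min_pts, sample_frac=0.5, seed=0):
    """Sketch of sampling outside DBSCAN: cluster a random sample,
    then label each point by its nearest sampled point's cluster."""
    rng = random.Random(seed)
    k = max(min_pts, int(sample_frac * len(points)))
    idx = rng.sample(range(len(points)), k)
    sample = [points[i] for i in idx]
    sample_labels = dbscan(sample, eps, min_pts)
    return [sample_labels[min(range(len(sample)),
                              key=lambda s: math.dist(p, sample[s]))]
            for p in points]

# Two well-separated blobs of four points each.
pts = [(0, 0), (0.1, 0), (0, 0.1), (0.05, 0.05),
       (5, 5), (5.1, 5), (5, 5.1), (5.05, 5.05)]
print(dbscan(pts, eps=0.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

The sampling variant trades a little accuracy near cluster borders for far fewer neighborhood queries, which is the essence of the scalability argument in the paper.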
From the mid-19th century to the end of the 20th century, photographic plates served as the primary detectors for astronomical observations. Astronomical photographic observations in China began in 1901, and over a century approximately 30,000 astronomical photographic plates were captured. These historical plates play an irreplaceable role in long-term, time-domain astronomical research. To preserve and exploit these valuable original observational data, Shanghai Astronomical Observatory organized the transportation of plates, taken during nighttime observations at various stations across the country, to the Sheshan Plate Archive for centralized preservation. For the first time, plate information statistics were compiled. On this basis, the plates were cleaned and digitally scanned, and digitized images were acquired for 29,314 plates. In this study, using Gaia DR2 as the reference star catalog, astrometric processing was successfully carried out on 15,696 single-exposure plates, including object extraction, stellar identification, and plate model computation. For long-focal-length telescopes, such as the 40 cm double-tube refractor and the 1.56 m reflector at Shanghai Astronomical Observatory and the 1 m reflector at Yunnan Astronomical Observatory, the astrometric accuracy obtained for the plates is approximately 0.1″–0.3″. The astrometric accuracy for medium- and short-focal-length telescopes ranges from 0.3″ to 1.0″. The relevant data for this batch of plates, including digitized images and a stellar catalog of the plates, are archived and released by the National Astronomical Data Center. Users can access and download plate data by keywords such as station, telescope, observation year, and observed celestial coordinates.
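The plate model computation mentioned above is classically a linear fit between measured plate coordinates (x, y) and the standard coordinates (ξ, η) of matched reference stars. As an illustrative sketch only (the paper does not state which model it fits, and real pipelines fit many stars by least squares and may add higher-order terms), here is the six-constant linear model solved exactly from three synthetic reference stars:

```python
# Six-constant plate model (a standard astrometric form, assumed here):
#   xi  = a*x + b*y + c
#   eta = d*x + e*y + f
# With exactly three matched stars the 3x3 system can be solved directly.

def solve3(A, b):
    """Solve a 3x3 linear system by Cramer's rule."""
    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det3(A)
    xs = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = b[r]       # replace one column with the RHS
        xs.append(det3(m) / d)
    return xs

def fit_plate_model(stars):
    """stars: list of (x, y, xi, eta) matched reference stars (3 used).
    Returns the (a, b, c) and (d, e, f) coefficient triples."""
    A   = [[x, y, 1.0] for x, y, _, _ in stars[:3]]
    xi  = [s[2] for s in stars[:3]]
    eta = [s[3] for s in stars[:3]]
    return solve3(A, xi), solve3(A, eta)

# Synthetic stars generated from known coefficients
# (a, b, c) = (0.002, -0.0001, 3.0), (d, e, f) = (0.0001, 0.002, -1.5).
stars = [(0, 0, 3.0, -1.5), (1000, 0, 5.0, -1.4), (0, 1000, 2.9, 0.5)]
coef_xi, coef_eta = fit_plate_model(stars)
print([round(v, 6) for v in coef_xi])  # [0.002, -0.0001, 3.0]
```

With more than three stars the same model is fit by least squares, and the residuals of the fit are what yield accuracy figures like the 0.1″–0.3″ quoted above.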
To address the difficulty of training deep learning models for industrial defect visual inspection when samples are scarce, this paper proposes a generative sample synthesis method for data augmentation that combines a Generative Adversarial Network (GAN) with a Physically Based Rendering (PBR) pipeline. The method uses ConSinGAN as the defect-feature augmentation model and introduces a Coordinate Attention (CA) mechanism to optimize the discriminator so that it can more precisely identify defect features in images. The loss function is also adjusted: a weighted combination of a reconstruction loss and a multi-scale structural similarity loss is introduced to alleviate vanishing gradients in small-sample training and improve generation quality. The PBR pipeline then outputs the augmented samples: a three-dimensional model is first built for the workpiece to be augmented, Poisson blending is used to fuse the augmented defect features with the original model texture, and finally workpiece defect samples are rendered by a virtual camera in a virtual production environment. Experimental results on public datasets show that the method can effectively augment a given small set of workpiece defect samples.
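The weighted loss combination described above can be sketched as follows. This is a heavily simplified illustration, not the paper's implementation: `ssim_like_loss` is a crude single-window SSIM-style term on flat lists of pixel values rather than the multi-scale version, and the weight `alpha = 0.84` is a common convention from the image-restoration literature, not a value taken from the paper.

```python
def l1_loss(pred, target):
    """Mean absolute error as the reconstruction term."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def ssim_like_loss(pred, target, c=1e-4):
    """1 - a global (single-window) SSIM-style similarity."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(target) / n
    vp = sum((p - mp) ** 2 for p in pred) / n
    vt = sum((t - mt) ** 2 for t in target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target)) / n
    ssim = ((2 * mp * mt + c) * (2 * cov + c)) / \
           ((mp ** 2 + mt ** 2 + c) * (vp + vt + c))
    return 1.0 - ssim

def combined_loss(pred, target, alpha=0.84):
    """Weighted combination of the structural and reconstruction terms.
    alpha is an illustrative weight, not the paper's choice."""
    return alpha * ssim_like_loss(pred, target) + (1 - alpha) * l1_loss(pred, target)

identical = [0.1, 0.5, 0.9, 0.3]
print(combined_loss(identical, identical) < 1e-9)  # True: identical images
```

In the actual method both terms are computed on generated versus real defect images inside the GAN training loop; the point of the weighting is that the structural term preserves texture while the reconstruction term stabilizes small-sample training.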
Automatic interpretation of railway design control elements from remote sensing imagery is key to "one-click mapping", but deep-learning-based intelligent interpretation of remote sensing imagery requires large numbers of labeled samples. Following railway route design principles, this paper proposes a method for building an intelligent-interpretation sample database of design control elements from multi-source remote sensing data. First, initial samples are generated automatically from multi-source data: Digital Orthophoto Maps (DOM), Digital Line Graphics (DLG), and Light Detection and Ranging (Lidar) point clouds. Second, the initial samples are refined through incremental active-learning iterations to achieve high quality and full coverage along the railway corridor. Then, taking the Changsha-Ganzhou railway as an example, a high-resolution intelligent-interpretation sample database is constructed, focused on four classes of design control elements along the line (buildings, roads, water bodies, and vegetation): the Wuhan University Sample Database of Control Elements of Railway Route Design (WHU-RRDSD), with a ground resolution of 0.1 m and more than 200,000 samples in total. Finally, to verify the usability of the database, detailed validation is carried out from three aspects: qualitative evaluation, quantitative evaluation, and application cases in other scenarios. The IoU scores for the building, road, water, and vegetation sample sets are 84.43%, 82.38%, 90.19%, and 90.28%, respectively, showing excellent interpretation performance. A model trained on WHU-RRDSD was transferred to interpret buildings, roads, water bodies, and vegetation in the Yichang-Fuling high-speed railway scenario, verifying the database's usability in other settings. Two further application cases based on WHU-RRDSD, weakly supervised building extraction and land-cover classification from high-resolution remote sensing images, are briefly introduced, further validating the usability of the sample database constructed by the proposed method.
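The IoU (Intersection over Union) metric used in the evaluation above has a simple pixel-wise definition for binary segmentation masks: overlapping foreground pixels divided by the union of foreground pixels. A minimal computation on toy masks:

```python
def iou(pred, truth):
    """Intersection over Union between two binary masks (nested lists)."""
    inter = union = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            inter += p and t   # 1 only where both masks are foreground
            union += p or t    # 1 where either mask is foreground
    return inter / union if union else 0.0

pred  = [[1, 1, 0],
         [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(iou(pred, truth))  # 0.5: 2 overlapping pixels / 4 union pixels
```

Scores like the 84–90% reported above are this ratio averaged over a test set, one class at a time.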
Funding (LuoJiaSET): supported by the National Natural Science Foundation of China [grant number 42071354], the Fundamental Research Funds for the Central Universities [grant number 2042022dx0001], and the Fundamental Research Funds for the Central Universities [grant number WUT:223108001].
Funding (SDBSCAN): supported by the Open Research Fund Program of LIESMARS (WKL(00)0302).
Funding (NIS/MI bibliometric study): National Clinical Research Center for Geriatric Diseases (Jianchao Liu, grant number NCRCG-PLAGH-2019001) and the National Natural Science Foundation of China (Zhouheng Ye, grant number 82000587).
Funding (DBSCAN scaling study): This work was supported by the National Natural Science Foundation of China (No. 69743001) and the National Doctoral Subject Fou
Funding (photographic plates study): supported by the Shanghai Science and Technology Innovation Action Plan (grant No. 21511104100), the Global Common Challenge Special Project (grant No. 018GJHZ2023110GC), and the China National Key Basic Research Program (grant No. 2012FY120500).