Journal Articles
14,664 articles found
Standardizing Healthcare Datasets in China: Challenges and Strategies
1
Authors: Zheng-Yong Hu, Xiao-Lei Xiu, Jing-Yu Zhang, Wan-Fei Hu, Si-Zhu Wu. 《Chinese Medical Sciences Journal》, 2025, Issue 4: 253-267, I0001 (16 pages)
Standardized datasets are foundational to healthcare informatization by enhancing data quality and unleashing the value of data elements. Using bibliometrics and content analysis, this study examines China's healthcare dataset standards from 2011 to 2025. It analyzes their evolution across types, applications, institutions, and themes, highlighting key achievements including substantial growth in quantity, optimized typology, expansion into innovative application scenarios such as health decision support, and broadened institutional involvement. The study also identifies critical challenges, including imbalanced development, insufficient quality control, and a lack of essential metadata (such as authoritative data element mappings and privacy annotations), which hampers the delivery of intelligent services. To address these challenges, the study proposes a multi-faceted strategy focused on optimizing the standard system's architecture, enhancing quality and implementation, and advancing both data governance (through authoritative tracing and privacy protection) and intelligent service provision. These strategies aim to promote the application of dataset standards, thereby fostering and securing the development of new productive forces in healthcare.
Keywords: healthcare dataset standards; data standardization; data management
Big Texture Dataset Synthesized Based on Gradient and Convolution Kernels Using Pre-Trained Deep Neural Networks
2
Authors: Farhan A. Alenizi, Faten Khalid Karim, Alaa R. Al-Shamasneh, Mohammad Hossein Shakoor. 《Computer Modeling in Engineering & Sciences》, 2025, Issue 8: 1793-1829 (37 pages)
Deep neural networks provide accurate results for most applications. However, they need a big dataset to train properly, and providing a big dataset is a significant challenge in most applications. Image augmentation refers to techniques that increase the amount of image data. Common operations for image augmentation include changes in illumination, rotation, contrast, size, viewing angle, and others. Recently, Generative Adversarial Networks (GANs) have been employed for image generation. However, like image augmentation methods, GAN approaches can only generate images that are similar to the original images; therefore, they also cannot generate new classes of data. Texture images present more challenges than general images, and generating textures is more complex than creating other types of images. This study proposes a gradient-based deep neural network method that generates a new class of texture. It is possible to rapidly generate new classes of textures using different kernels from pre-trained deep networks. After generating new textures for each class, the number of textures increases through image augmentation. During this process, several techniques are proposed to automatically remove incomplete and similar textures that are created. The proposed method is faster than some well-known generative networks by around 4 to 10 times. In addition, the quality of the generated textures surpasses that of these networks: the proposed method can generate textures that surpass those of some GANs and parametric models in certain image quality metrics. It can provide a big texture dataset to train deep networks. A new big texture dataset is created artificially using the proposed method. This dataset is approximately 2 GB in size and comprises 30,000 textures, each 150×150 pixels in size, organized into 600 classes. It is uploaded to the Kaggle site and Google Drive. This dataset is called BigTex. Compared to other texture datasets, the proposed dataset is the largest and can serve as a comprehensive texture dataset for training more powerful deep neural networks and mitigating overfitting.
Keywords: big texture dataset; data generation; pre-trained deep neural network
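The abstract describes producing new texture classes by applying different kernels to images. As a loose, pure-NumPy illustration of the idea (random kernels standing in for pre-trained network kernels; all names and parameters are hypothetical, not the paper's method), one can iteratively convolve noise with kernels and renormalize to obtain texture-like images:

```python
import numpy as np

rng = np.random.default_rng(0)

def synth_texture(size=64, n_iter=5, rng=rng):
    """Iteratively convolve noise with random 5x5 kernels (a crude stand-in
    for pre-trained network kernels) and renormalize to produce a texture."""
    img = rng.random((size, size))
    for _ in range(n_iter):
        k = rng.normal(size=(5, 5))
        K = np.zeros((size, size))
        K[:5, :5] = k                      # zero-pad kernel to image size
        # circular convolution via FFT
        img = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(K)))
        img = np.tanh(img)                 # squashing nonlinearity
        img = (img - img.min()) / (img.max() - img.min() + 1e-9)  # rescale to [0, 1]
    return img

tex = synth_texture()
print(tex.shape)
```

Different kernel draws yield visually distinct "classes", which is the intuition behind the paper's kernel-driven class generation.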
Handling missing data in large-scale TBM datasets: Methods, strategies, and applications
3
Authors: Haohan Xiao, Ruilang Cao, Zuyu Chen, Chengyu Hong, Jun Wang, Min Yao, Litao Fan, Teng Luo. 《Intelligent Geoengineering》, 2025, Issue 3: 109-125 (17 pages)
Substantial advancements have been achieved in Tunnel Boring Machine (TBM) technology and monitoring systems, yet the presence of missing data impedes accurate analysis and interpretation of TBM monitoring results. This study investigates the issue of missing data in extensive TBM datasets. Through a comprehensive literature review, we analyze the mechanisms of missing TBM data and compare different imputation methods, including statistical analysis and machine learning algorithms. We also examine the impact of various missing patterns and rates on the efficacy of these methods. Finally, we propose a dynamic interpolation strategy tailored for TBM engineering sites. The results show that the K-Nearest Neighbors (KNN) and Random Forest (RF) algorithms achieve good interpolation results; as the missing rate increases, the interpolation performance of all methods declines; and performance is worst for block missing, intermediate for mixed missing, and best for sporadic missing. On-site application results validate the proposed interpolation strategy's capability to achieve robust missing value interpolation, applicable in ML scenarios such as parameter optimization, attitude warning, and pressure prediction. These findings contribute to enhancing the efficiency of TBM missing data processing, offering more effective support for large-scale TBM monitoring datasets.
Keywords: tunnel boring machine (TBM); missing data imputation; machine learning (ML); time series interpolation; data preprocessing; real-time data stream
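The abstract's finding that sporadic gaps interpolate far better than block gaps can be reproduced on a toy signal. A minimal NumPy sketch (a sine wave standing in for a real TBM monitoring channel; all names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(500)
x = np.sin(t / 15.0)  # surrogate for one TBM monitoring channel

def interp_rmse(missing_idx):
    """Linearly interpolate the masked samples and report reconstruction RMSE."""
    keep = np.setdiff1d(t, missing_idx)          # indices still observed
    est = np.interp(missing_idx, keep, x[keep])  # fill gaps from neighbors
    return float(np.sqrt(np.mean((est - x[missing_idx]) ** 2)))

sporadic = rng.choice(t[1:-1], size=50, replace=False)  # 50 scattered gaps
block = np.arange(200, 250)                             # one contiguous 50-sample gap
print("sporadic RMSE:", interp_rmse(sporadic))
print("block RMSE   :", interp_rmse(block))
```

With the same number of missing samples, the contiguous block loses all local context and reconstructs far worse, matching the paper's ranking of missing patterns.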
Cooperative Iteration Matching Method for Aligning Samples from Heterogeneous Industrial Datasets
4
Authors: LI Han, SHI Guohong, LIU Zhao, ZHU Ping. 《Journal of Shanghai Jiaotong University (Science)》, 2025, Issue 2: 375-384 (10 pages)
Industrial data mining usually deals with data from different sources. These heterogeneous datasets describe the same object from different views. However, samples from some of the datasets may be lost, so the remaining samples no longer correspond one-to-one. Mismatched datasets caused by missing samples make the industrial data unavailable for further machine learning. In order to align the mismatched samples, this article presents a cooperative iteration matching method (CIMM) based on modified dynamic time warping (DTW). The proposed method regards the sequentially accumulated industrial data as time series, and mismatched samples are aligned by DTW. In addition, dynamic constraints are applied to the warping distance of the DTW process to make the alignment more efficient. A series of models is then trained iteratively with the accumulated samples. Several groups of numerical experiments on different missing patterns and missing locations are designed and analyzed to prove the effectiveness and applicability of the proposed method.
Keywords: dynamic time warping; mismatched samples; sample alignment; industrial data; data missing
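The core alignment primitive named in the abstract, dynamic time warping, is compact enough to sketch directly. Below is a minimal classic DTW (no warping-distance constraints, unlike the paper's modified variant) that realigns a sequence against a copy with one sample dropped, mimicking a lost industrial record:

```python
import math

def dtw_align(a, b):
    """Classic dynamic time warping: returns total cost and the warping path."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # backtrack the optimal warping path from the corner
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    return D[n][m], path[::-1]

a = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]  # full sequence
b = [0.0, 1.0, 3.0, 2.0, 1.0, 0.0]       # same sequence with one sample lost
cost, path = dtw_align(a, b)
print(cost, path)  # cost 1.0: only the dropped sample contributes
```

The returned path pairs each index of `a` with its best-matching index of `b`, which is exactly the correspondence a sample-alignment method needs to recover.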
Machine learning assisted enhancement of petrophysical property dataset of fractured Variscan granites of the Cornubian Batholith, SW UK
5
Authors: A. Turan, E. Artun, I. Sass. 《Artificial Intelligence in Geosciences》, 2025, Issue 2: 236-249 (14 pages)
Outcrop analogue studies play an important role in advancing our comprehension of reservoir architectures, offering insights into hidden reservoir rocks prior to drilling in a cost-effective manner. These studies contribute to the delineation of the three-dimensional geometry of geological structures, the characterization of petro- and thermo-physical properties, and the structural geological aspects of reservoir rocks. Nevertheless, several challenges, including inaccessible sampling sites, limited resources, and the dimensional constraints of different laboratories, hinder the acquisition of comprehensive datasets. In this study, we employ machine learning techniques to estimate missing data in a petrophysical dataset of fractured Variscan granites from the Cornubian Batholith in Southwest UK. The utilization of mean, k-nearest neighbors, and random forest imputation methods addresses the challenge of missing data, revealing the effectiveness of random forest imputation in providing realistic estimations. Subsequently, supervised classification models are trained to classify samples according to their pluton origins, with promising accuracy achieved by models trained with imputed values. Variable importance ranking of the models showed that the choice of imputation method influences the inferred importance of specific petrophysical properties. While porosity (POR) and grain density (GD) were among the important variables, variables with a high missingness ratio were not among the top variables. This study demonstrates the value of machine learning in enhancing petrophysical datasets, while emphasizing the importance of careful method selection and model validation for reliable results. The findings contribute to a more informed decision-making process in geothermal exploration and reservoir characterization efforts, thereby demonstrating the potential of machine learning in advancing subsurface characterization techniques.
Keywords: machine learning; Cornwall; geothermal; granite; petrophysical data; imputation
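The abstract compares mean imputation against neighbor-based imputation for petrophysical properties. A minimal NumPy sketch of why neighbor-based methods win when properties are correlated (the porosity-density relation below is hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
porosity = rng.uniform(0.0, 0.2, n)
# grain density loosely tied to porosity (hypothetical relation, for illustration)
density = 2.65 - 2.0 * porosity + rng.normal(0, 0.01, n)
X = np.column_stack([porosity, density])

miss = rng.choice(n, size=40, replace=False)  # density missing for 40 samples
X_obs = X.copy()
X_obs[miss, 1] = np.nan
complete = ~np.isnan(X_obs[:, 1])

mean_imp = np.nanmean(X_obs[:, 1])            # baseline: column mean

def knn_impute(row_porosity, k=5):
    """Average the density of the k samples with the closest observed porosity."""
    d = np.abs(X_obs[complete, 0] - row_porosity)
    nn = np.argsort(d)[:k]
    return X_obs[complete, 1][nn].mean()

knn_vals = np.array([knn_impute(X_obs[i, 0]) for i in miss])
err_mean = float(np.sqrt(np.mean((mean_imp - X[miss, 1]) ** 2)))
err_knn = float(np.sqrt(np.mean((knn_vals - X[miss, 1]) ** 2)))
print(f"mean-imputation RMSE {err_mean:.4f}  vs  kNN RMSE {err_knn:.4f}")
```

Because the neighbor search exploits the correlated feature, its reconstruction error is far below the column-mean baseline, mirroring the paper's preference for data-driven imputers over the mean.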
Experimental dataset from the BESIII detector at the Beijing electron-positron collider
6
Authors: Ming-Hua Liao, Jian-Shu Liu, Xin-Nan Wang, Sheng-Sen Sun, Zheng-Yun You. 《Nuclear Science and Techniques》, 2025, Issue 11: 83-88 (6 pages)
In the BESIII detector at the Beijing electron-positron collider, billions of events from e+e- collisions were recorded. These events, passing through the trigger system, were saved in raw-data-format files. They play an important role in the study of physics in the τ-charm energy region. Here, we publish an e+e- collision dataset containing both Monte Carlo simulation samples and real data collected by the BESIII detector. The data pass through the detector trigger system, file format conversion, and physics information extraction, and the physics information and detector response are finally saved in text-format files. This dataset is publicly available and is intended to provide interested scientists and those outside of the BESIII collaboration with event information from BESIII, which can be used to understand physics research in e+e- collisions and to develop visualization projects for physics education, public outreach, and science advocacy.
Keywords: electron-positron collision; BESIII; data sharing; education; visualization
Experimental dataset from CDEX's high-purity germanium detectors in China Jinping Underground Laboratory
7
Authors: Li-Tao Yang, Zhen-Yu Zhang, Hao Ma, Qian Yue, Zhi Zeng. 《Nuclear Science and Techniques》, 2025, Issue 11: 89-93 (5 pages)
Founded in 2009, the China Dark Matter Experiment (CDEX) collaboration is dedicated to the detection of dark matter (DM) and neutrinoless double beta decay using high-purity germanium (HPGe) detectors in the China Jinping Underground Laboratory. HPGe detectors are characterized by high energy resolution, a low analysis threshold, and a low radioactive background, making them an ideal platform for the direct detection of DM. Over the years, CDEX has accumulated a massive amount of experimental data, based on which various results on DM detection and neutrinoless double beta decay have been presented. Because the dataset was collected in a low-background environment, apart from the analysis of DM-related physics channels, it has great potential in searches for other rare physical events. Furthermore, by providing raw pulse shapes, the dataset can serve as a tool for effectively understanding the internal mechanisms of HPGe detectors.
Keywords: low-background experiment; pulse shapes; raw data; HPGe detectors; CDEX
A small step towards the epistemic decentralization of science: A dataset of journals and publications indexed in African Journals Online
8
Author: Patricia Alonso-Álvarez. 《Journal of Data and Information Science》, 2025, Issue 4: 104-121 (18 pages)
Purpose: This paper examines African Journals Online (AJOL) as a bibliometric resource, providing a structured dataset of journal and publication metadata. In addition, it integrates AJOL data with OpenAlex to enhance metadata coverage and improve interoperability with other bibliometric sources. Design/methodology/approach: The journal list and publications indexed in AJOL were retrieved using web scraping techniques. This paper details the database construction process, highlighting its strengths and limitations, and presents a descriptive analysis of AJOL's indexed journals and publications. Findings: The publication analysis demonstrates steady growth in the number of publications over time but reveals significant disparities in their distribution across African countries. This paper presents an example of integrating both sources using author country data from OpenAlex. The analysis of author contributions reveals that African journals serve as both regional and international venues, confirming that they play a dual role in fostering both regional and global research engagement. Research limitations: While AJOL contains relevant information for identifying and providing insights about African publications and journals, its metadata are limited; therefore, the kinds of analysis that can be performed with the database presented here are also limited. The integration with OpenAlex aims to overcome some of these limitations. Finally, although some automatic curation procedures have been performed, the metadata have not been manually curated; therefore, if errors or inaccuracies are present in AJOL, they may be reproduced in this database. Practical implications: The database introduced in this article contributes to the accessibility of African scholarly publications by providing structured, accessible metadata derived from AJOL. It facilitates bibliometric analyses that are more representative of African research activities. This contribution complements ongoing efforts to develop alternative data sources and infrastructure that better reflect the diversity of global knowledge production. Originality/value: This paper presents a novel database for bibliometric analysis and offers a detailed report of the retrieval and construction procedures. The inclusion of matched data with OpenAlex further enhances the database's utility. By showcasing AJOL's potential, this study contributes to the broader goal of fostering inclusivity and improving the representation of African research in global bibliometric analyses.
Keywords: decentralization of science; African Journals Online; African science; data paper
Predicting financial distress in high-dimensional imbalanced datasets: a multi-heterogeneous self-paced ensemble learning framework
9
Authors: Ruize Gao, Shaoze Cui, Yu Wang, Wei Xu. 《Financial Innovation》, 2025, Issue 1: 1656-1689 (34 pages)
Financial distress prediction (FDP) is a critical area of study for researchers, industry stakeholders, and regulatory authorities. However, FDP tasks present several challenges, including high-dimensional datasets, class imbalances, and the complexity of parameter optimization. These issues often hinder the predictive model's ability to accurately identify companies at high risk of financial distress. To mitigate these challenges, we introduce FinMHSPE, a novel multi-heterogeneous self-paced ensemble (MHSPE) FDP learning framework. The proposed model uses pairwise comparisons of data from multiple time frames combined with the maximum relevance and minimum redundancy method to select an optimal subset of features, effectively resolving the high-dimensionality issue. Furthermore, the proposed framework incorporates the MHSPE model to iteratively identify the most informative majority-class data samples, effectively addressing the class imbalance issue. To optimize the model's parameters, we leverage the particle swarm optimization algorithm. The robustness of our proposed model is validated through extensive experiments performed on a financial dataset of Chinese listed companies. The empirical results demonstrate that the proposed model outperforms existing competing models in the field of FDP. Specifically, our FinMHSPE framework achieves the highest performance, with an area under the curve (AUC) value of 0.9574, considerably surpassing all existing methods. A comparative analysis of AUC values further reveals that FinMHSPE outperforms state-of-the-art approaches that rely on financial features as inputs. Furthermore, our investigation identifies several valuable features for enhancing FDP model performance, notably those associated with a company's information and growth potential.
Keywords: financial distress prediction; feature selection; imbalanced data; ensemble learning; particle swarm optimization
Generating Synthetic Data for Machine Learning Models from the Pediatric Heart Network Fontan I Dataset
10
Authors: Vatche Bahudian, John Valdovinos. 《Congenital Heart Disease》, 2025, Issue 1: 115-127 (13 pages)
Background: The population of Fontan patients, patients born with a single functioning ventricle, is growing, and there is a growing need to develop algorithms for this population that can predict health outcomes. Artificial intelligence models predicting short-term and long-term health outcomes for patients with the Fontan circulation are needed. Generative adversarial networks (GANs) provide a solution for generating realistic and useful synthetic data that can be used to train such models. Methods: Despite their promise, GANs have not been widely adopted in the congenital heart disease research community, due in part to a lack of knowledge on how to employ them. In this research study, a GAN was used to generate synthetic data from the Pediatric Heart Network Fontan I dataset. A subset of data consisting of the echocardiographic and BNP measures collected from Fontan patients was used to train the GAN. Two sets of synthetic data were created to understand the effect of data missingness on synthetic data generation. Synthetic data was created from real data in which the missing values were imputed using Multiple Imputation by Chained Equations (MICE) (referred to as synthetic from imputed real samples). In addition, synthetic data was created from real data in which the missing values were dropped (referred to as synthetic from dropped real samples). Both synthetic datasets were evaluated for fidelity using visual methods, comparing histograms and principal component analysis (PCA) plots. Fidelity was measured quantitatively by (1) comparing synthetic and real data using the Kolmogorov-Smirnov test to evaluate the similarity between two distributions, and (2) training a neural network to distinguish between real and synthetic samples. Both synthetic datasets were evaluated for utility by training a neural network with synthetic data and testing its ability to classify patients that have ventricular dysfunction using echocardiographic and serological measures. Results: Using histograms, associated probability density functions, and PCA, both synthetic datasets showed visual resemblance in distribution and variance to real Fontan data. Quantitatively, synthetic data from dropped real samples had higher similarity scores, as demonstrated by the Kolmogorov-Smirnov statistic, for all but one feature (age at Fontan) compared to synthetic data from imputed real samples, which demonstrated dissimilar scores for three features (Echo SV, Echo tda, and BNP). In addition, synthetic data from dropped real samples resembled real data to a larger extent (49.3% classification error) than synthetic data from imputed real samples (65.28% classification error); classification errors approximating 50% represent datasets that are indistinguishable. In terms of utility, synthetic data created from real data in which the missing values were imputed classified ventricular dysfunction in real data with a classification error of 10.99%. Similarly, a neural network trained on synthetic data derived from real data in which the missing values were dropped could classify ventricular dysfunction in real data with a classification error of 9.44%. Conclusions: Although representing a limited subset of the vast data available from the Pediatric Heart Network, generative adversarial networks can create synthetic data that mimics the probability distribution of real Fontan echocardiographic measures. Clinicians can use these synthetic data to create models that predict health outcomes for Fontan patients.
Keywords: synthetic data; congenital heart disease; Fontan circulation
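The fidelity metric used above, the two-sample Kolmogorov-Smirnov statistic, is simply the largest gap between two empirical CDFs. A self-contained sketch (the "real" and "synthetic" samples below are made-up placeholders, not Fontan data):

```python
import bisect

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap between the
    empirical CDFs, 0.0 for identical samples and 1.0 for disjoint ones."""
    xs, ys = sorted(x), sorted(y)

    def ecdf(s, v):
        return bisect.bisect_right(s, v) / len(s)  # fraction of samples <= v

    return max(abs(ecdf(xs, v) - ecdf(ys, v)) for v in xs + ys)

real = [1.0, 1.2, 1.1, 0.9, 1.3, 1.0, 1.1]           # stand-in "real" feature
synthetic = [1.05, 1.15, 0.95, 1.25, 1.0, 1.1, 1.2]  # stand-in "synthetic" feature
print(round(ks_statistic(real, synthetic), 3))
```

A small statistic means the synthetic feature tracks the real distribution closely, which is how the paper scores per-feature fidelity.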
MAID: making an accurate transmission line icing detector by enhancing an inaccurate dataset
11
Authors: SUN Wei, WANG Yu, GAO Bo, ZHANG Shujuan, WANG Xiaojin, XING Lu. 《Optoelectronics Letters》, 2025, Issue 10: 606-611 (6 pages)
Power transmission lines are a critical component of the entire power system, and ice accretion incidents can cause immeasurable harm to power systems of all types. Currently, network models used for ice detection on power transmission lines require a substantial amount of sample data for training, and their detection accuracy is significantly affected by inaccurate annotations in the training dataset. We therefore propose a transformer-based detection model, structured in two stages, to address the impact of inaccurate datasets on model training. In the first stage, a spatial similarity enhancement (SSE) module is designed to leverage spatial information in the construction of the detection framework, thereby improving the accuracy of the detector. In the second stage, a target similarity enhancement (TSE) module is introduced to enhance object-related features, reducing the impact of inaccurate data on model training and thereby expanding global correlation. Additionally, by incorporating a multi-head adaptive attention window (MAAW), spatial information is combined with category information to achieve information interaction. Simultaneously, a quasi-wavelet structure, compatible with deep learning, is employed to highlight subtle features at different scales. Experimental results indicate that the proposed model outperforms existing mainstream detection models, demonstrating superior performance and stability.
Keywords: power transmission lines; ice detection; ice accretion; sample data; spatial similarity enhancement; power system; transformer-based model
Grid data management based on dataspace (Cited by 1)
12
Authors: 倪彤前, 吴开贵, 刘鹏, 刘永金. 《Journal of Southeast University (English Edition)》 (EI, CAS), 2008, Issue 3: 257-259 (3 pages)
To better manipulate heterogeneous and distributed data in the data grid, a dataspace management framework for grid data is proposed based on in-depth research on grid technology. Combining dataspace management technologies, such as the data model iDM and the query language iTrails, with the grid data access middleware OGSA-DAI, a grid dataspace management prototype system is built, in which tasks such as data access, abstraction, indexing, service management, and query answering are implemented by OGSA-DAI workflows. Experimental results show that it is feasible to apply a dataspace management mechanism to the grid environment. Dataspace meets grid data management needs in that it hides the heterogeneity and distribution of grid data and can adapt to the dynamic characteristics of the grid. The proposed grid dataspace management provides a new method for grid data management.
Keywords: grid; dataspace; data model; OGSA-DAI; workflow
Application of DataSocket technology in remote measurement and control data transmission (Cited by 7)
13
Authors: 李伯全, 潘海彬, 罗开玉, 周重益. 《Journal of Jiangsu University (Natural Science Edition)》 (EI, CAS), 2004, Issue 4: 285-288 (4 pages)
As a new network communication programming technology, DataSocket is one of the core technologies for building remote, distributed networked measurement and control systems based on virtual instrument technology. Built on TCP/IP and supporting multiple protocols, DataSocket highly encapsulates the underlying layers, greatly simplifying application programming and enabling dynamic data exchange between applications on the same computer and, over the network, between different computers. Its distinctive feature is that it provides automated measurement applications with an easy-to-learn, easy-to-use, high-performance programming interface for real-time data publishing and sharing. Using LabVIEW as the virtual instrument software development platform, a remote measurement and control system was built on DataSocket technology and tested with browser technology; the experiment was successful, achieving remote measurement and control in the true sense.
Keywords: virtual instrument; measurement and control network; DataSocket
Pavement Cracks Coupled With Shadows: A New Shadow-Crack Dataset and a Shadow-Removal-Oriented Crack Detection Approach (Cited by 3)
14
Authors: Lili Fan, Shen Li, Ying Li, Bai Li, Dongpu Cao, Fei-Yue Wang. 《IEEE/CAA Journal of Automatica Sinica》 (SCIE, EI, CSCD), 2023, Issue 7: 1593-1607 (15 pages)
Automatic pavement crack detection is a critical task for maintaining pavement stability and driving safety. The task is challenging because shadows on the pavement may have similar intensity to cracks, which interferes with crack detection performance. To date, efficient algorithm models and training datasets for dealing with the interference brought by shadows have been lacking. To fill this gap, we make several contributions. First, we propose a new pavement shadow and crack dataset, which contains a variety of shadow and pavement pixel size combinations. It also covers all common cracks (linear cracks and network cracks), placing higher demands on crack detection methods. Second, we design a two-step shadow-removal-oriented crack detection approach, SROCD, which improves performance by first removing the shadow and then detecting the crack; in addition to shadows, the method can cope with other noise disturbances. Third, we explore the mechanism of how shadows affect crack detection and, based on this mechanism, propose a data augmentation method based on differences in brightness values, which can adapt to brightness changes caused by seasonal and weather variation. Finally, we introduce a residual feature augmentation algorithm to detect small cracks that can predict sudden disasters, and the algorithm improves the overall performance of the model. We compare our method with state-of-the-art methods on existing pavement crack datasets and the shadow-crack dataset, and the experimental results demonstrate the superiority of our method.
Keywords: automatic pavement crack detection; data augmentation compensation; deep learning; residual feature augmentation; shadow removal; shadow-crack dataset
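The brightness-difference augmentation described in the abstract amounts to generating intensity-shifted copies of each training image. A minimal NumPy sketch (the delta values and image are placeholders, not the paper's actual settings):

```python
import numpy as np

def brightness_augment(img, deltas=(-0.3, -0.15, 0.15, 0.3)):
    """Create brightness-shifted copies of a pavement image, mimicking
    seasonal/weather illumination changes; pixel values stay in [0, 1]."""
    return [np.clip(img + d, 0.0, 1.0) for d in deltas]

rng = np.random.default_rng(0)
img = rng.random((32, 32))            # stand-in pavement patch
variants = brightness_augment(img)
print(len(variants))                  # one shifted copy per delta
```

Training on such shifted copies exposes the detector to the same crack under lighting levels it has not observed, which is the stated goal of the paper's augmentation.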
Intercomparison of the Extended Reconstructed Sea Surface Temperature v4 and v3b Datasets (Cited by 1)
15
Authors: WANG Jinping, CHEN Xianyao. 《Journal of Ocean University of China》 (SCIE, CAS, CSCD), 2018, Issue 2: 209-218 (10 pages)
Version 4 (v4) of the Extended Reconstructed Sea Surface Temperature (ERSST) dataset is compared with its precedent, the widely used version 3b (v3b). The essential upgrades applied to v4 lead to remarkable differences in the characteristics of the sea surface temperature (SST) anomaly (SSTa) in both the temporal and spatial domains. First, the largest discrepancy of the global mean SSTa values, around the 1940s, is due to ship-observation corrections made to reconcile observations from buckets and engine intake thermometers. Second, differences in global and regional mean SSTa values between v4 and v3b exhibit a downward trend (around -0.032°C per decade) before the 1940s, an upward trend (around 0.014°C per decade) during 1950-2015, and interdecadal oscillation with one peak around the 1980s and two troughs during the 1960s and 2000s, respectively. This does not derive from treatments of the polar or other data-void regions, since the difference of the SSTa does not share these features. Third, the spatial pattern of the ENSO-related variability of v4 exhibits a wider but weaker cold tongue in the tropical Pacific Ocean compared with that of v3b, which could be attributed to differences in gap-filling assumptions, since the latter features satellite observations whereas the former features in situ ones. This intercomparison confirms that the structural uncertainty arising from underlying assumptions in the treatment of diverse SST observations, even within the same SST product family, is the main source of significant SST differences in the temporal domain. Why this uncertainty introduces artificial decadal oscillations remains unknown.
Keywords: ERSST datasets; sea surface temperature; global warming; Arctic; data intercomparison
Research on Enhanced Contraband Dataset ACXray Based on ETL (Cited by 1)
16
作者 Xueping Song Jianming Yang +1 位作者 Shuyu Zhang Jicun Zhang 《Computers, Materials & Continua》 SCIE EI 2024年第6期4551-4572,共22页
To address the shortage of public datasets for customs X-ray images of contraband and the difficulties in deploying trained models in engineering applications,a method has been proposed that employs the Extract-Transf... To address the shortage of public datasets for customs X-ray images of contraband and the difficulties in deploying trained models in engineering applications,a method has been proposed that employs the Extract-Transform-Load(ETL)approach to create an X-ray dataset of contraband items.Initially,X-ray scatter image data is collected and cleaned.Using Kafka message queues and the Elasticsearch(ES)distributed search engine,the data is transmitted in real-time to cloud servers.Subsequently,contraband data is annotated using a combination of neural networks and manual methods to improve annotation efficiency and implemented mean hash algorithm for quick image retrieval.The method of integrating targets with backgrounds has enhanced the X-ray contraband image data,increasing the number of positive samples.Finally,an Airport Customs X-ray dataset(ACXray)compatible with customs business scenarios has been constructed,featuring an increased number of positive contraband samples.Experimental tests using three datasets to train the Mask Region-based Convolutional Neural Network(Mask R-CNN)algorithm and tested on 400 real customs images revealed that the recognition accuracy of algorithms trained with Security Inspection X-ray(SIXray)and Occluded Prohibited Items X-ray(OPIXray)decreased by 16.3%and 15.1%,respectively,while the ACXray dataset trained algorithm’s accuracy was almost unaffected.This indicates that the ACXray dataset-trained algorithm possesses strong generalization capabilities and is more suitable for customs detection scenarios. 展开更多
Keywords: X-ray contraband; ETL; data enhancement; dataset
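The mean hash used above for quick image retrieval can be sketched in a few lines of Python. This is a generic average-hash implementation, not the paper's code, and it assumes the image has already been downsampled to a small grayscale grid:

```python
def mean_hash(pixels):
    """Perceptual mean hash: each bit is 1 if the pixel is at or above
    the image's mean intensity, else 0; returned as a hex string."""
    flat = [p for row in pixels for p in row]
    avg = sum(flat) / len(flat)
    bits = ''.join('1' if p >= avg else '0' for p in flat)
    return '%0*x' % (len(bits) // 4, int(bits, 2))

def hamming(h1, h2):
    """Number of differing bits between two equal-length hex hashes."""
    return bin(int(h1, 16) ^ int(h2, 16)).count('1')

# Toy 4x4 grayscale "images": the second is a uniformly brightened copy,
# so its bit pattern relative to its own mean is unchanged.
img_a = [[10, 200, 10, 200],
         [200, 10, 200, 10],
         [10, 200, 10, 200],
         [200, 10, 200, 10]]
img_b = [[p + 5 for p in row] for row in img_a]
ha, hb = mean_hash(img_a), mean_hash(img_b)
print(ha, hb, hamming(ha, hb))  # near-duplicates have Hamming distance 0
```

In a retrieval pipeline like the one described, such hashes are indexed so that a new image can be matched against stored ones by small Hamming distance instead of pixel-level comparison.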
A LiDAR Point Clouds Dataset of Ships in a Maritime Environment (Cited by: 1)
17
Authors: Qiuyu Zhang, Lipeng Wang, Hao Meng, Wen Zhang, Genghua Huang. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 7, pp. 1681-1694 (14 pages)
For the first time, this article introduces a LiDAR Point Clouds Dataset of Ships composed of both collected and simulated data to address the scarcity of LiDAR data in maritime applications. The collected data are acquired using specialized maritime LiDAR sensors in both inland waterways and wide-open ocean environments. The simulated data are generated by placing a ship in the LiDAR coordinate system and scanning it with a redeveloped Blensor that emulates the operation of a LiDAR sensor equipped with various laser beams. Furthermore, we also render point clouds for foggy and rainy weather conditions. To describe a realistic shipping environment, a dynamic tail wave is modeled by iterating the wave elevation of each point in a time series. Finally, networks serving small objects are migrated to ship applications by feeding them our dataset. The positive effect of simulated data is described in object detection experiments, and the negative impact of tail waves as noise is verified in single-object tracking experiments. The dataset is available at https://github.com/zqy411470859/ship_dataset.
Keywords: 3D point clouds; dataset; dynamic tail wave; fog simulation; rain simulation; simulated data
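The dynamic tail wave described above, where each point's wave elevation is iterated over a time series, can be sketched with a single travelling sinusoidal wave. The amplitude, wavelength, speed, and direction below are illustrative assumptions, not the paper's parameters, which are not given in the abstract:

```python
import math

def wave_elevation(x, y, t, amplitude=0.5, wavelength=8.0, speed=2.0,
                   direction=(1.0, 0.0)):
    """Elevation of a travelling sinusoidal wave at point (x, y), time t:
    eta = A * sin(k * (d . p) - omega * t)."""
    k = 2.0 * math.pi / wavelength   # wavenumber
    omega = k * speed                # angular frequency
    phase = k * (direction[0] * x + direction[1] * y) - omega * t
    return amplitude * math.sin(phase)

def displace_points(points, t):
    """One frame of the time series: add the wave elevation to each
    point's z coordinate."""
    return [(x, y, z + wave_elevation(x, y, t)) for x, y, z in points]

cloud = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 1.0, 0.0)]
for t in (0.0, 0.5, 1.0):            # three frames of the animation
    print(t, [round(z, 3) for _, _, z in displace_points(cloud, t)])
```

A realistic sea surface would superpose several such components with different directions and wavelengths, but the per-point, per-frame iteration is the same.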
ADC-DL:Communication-Efficient Distributed Learning with Hierarchical Clustering and Adaptive Dataset Condensation
18
Authors: Zhipeng Gao, Yan Yang, Chen Zhao, Zijia Mo. China Communications (SCIE, CSCD), 2022, No. 12, pp. 73-85 (13 pages)
The rapid growth of modern mobile devices produces a large amount of distributed data, which is extremely valuable for learning models. Unfortunately, model training by collecting all these original data on a centralized cloud server is not applicable due to data privacy and communication cost concerns, hindering artificial intelligence from empowering mobile devices. Moreover, these data are not identically and independently distributed (Non-IID) because of their different contexts, which deteriorates the performance of the model. To address these issues, we propose a novel distributed learning algorithm based on hierarchical clustering and Adaptive Dataset Condensation, named ADC-DL, which learns a shared model by collecting the synthetic samples generated on each device. To tackle the heterogeneity of data distribution, we propose an entropy-TOPSIS comprehensive tiering model for hierarchical clustering, which distinguishes clients in terms of their data characteristics. Subsequently, synthetic dummy samples are generated based on the hierarchical structure using adaptive dataset condensation; the condensation procedure is adjusted adaptively according to the tier of the client. Extensive experiments demonstrate that ADC-DL outperforms existing algorithms in both prediction accuracy and communication cost.
Keywords: distributed learning; Non-IID data partition; hierarchical clustering; adaptive dataset condensation
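The entropy-TOPSIS tiering step described above can be sketched generically: entropy weighting assigns larger weights to criteria that vary more across clients, and TOPSIS scores each client by closeness to the ideal best. The client statistics below (sample count, class count, label entropy) are hypothetical illustrations; the paper's actual criteria are not specified in the abstract:

```python
import math

def entropy_weights(matrix):
    """Entropy weighting: criteria (columns) with more variation across
    clients carry more information and receive larger weights."""
    m, n = len(matrix), len(matrix[0])
    entropies = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        probs = [v / total for v in col]
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0)
                         / math.log(m))
    d = [1 - e for e in entropies]           # divergence of each criterion
    return [di / sum(d) for di in d]

def topsis_scores(matrix, weights):
    """TOPSIS: score each client by relative closeness to the ideal best
    column values (all criteria treated as benefit criteria here)."""
    n = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    best = [max(col) for col in zip(*v)]
    worst = [min(col) for col in zip(*v)]
    scores = []
    for row in v:
        dp = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, best)))
        dm = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, worst)))
        scores.append(dm / (dp + dm))
    return scores

# Hypothetical per-client statistics: [num_samples, num_classes, label_entropy]
clients = [[500, 10, 2.1], [120, 3, 0.9], [800, 10, 2.3]]
scores = topsis_scores(clients, entropy_weights(clients))
print(scores)  # higher score -> higher tier in the hierarchy
```

Clients would then be grouped into tiers by thresholding these scores, with the condensation budget adjusted per tier.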
An accessible seismological dataset of 2021 Yangbi M_(S)6.4 earthquake
19
Authors: Shuguang Wang, Hui Yang, Weilai Wang, Fang Wang, Zheng Liu, Wei Yang, Weitao Wang, Yunpeng Zhang, Lu Li, Jiupeng Hu, Xiaobin Li, Wenjian Cha, Beng Ye, Hongbo Zhu, Jun Yang. Earthquake Science, 2021, No. 5, pp. 460-464 (5 pages)
A M_(S)6.4 earthquake occurred on 21 May 2021 in Yangbi county, Dali prefecture, Yunnan, China, at 21:48 Beijing Time (13:48 UTC). Earthquakes of M3.0 or higher occurred before and after the main shock. Seismic data analysis is essential for the in-depth investigation of the 2021 Yangbi M_(S)6.4 earthquake sequence and the seismotectonics of northwestern Yunnan. The Institute of Geophysics, China Earthquake Administration (CEA), has compiled a dataset of seismological observations from 157 broadband stations located within 500 km of the epicenter and has made this dataset available to the earthquake science research community. The dataset (total file size: 329 GB) consists of event waveforms with a sampling frequency of 100 sps collected from 18 to 28 May 2021, 20-Hz and 100-Hz continuous waveforms collected from 12 to 31 May 2021, and seismic instrument response files. To promote data sharing, the dataset also includes the seismic event waveforms from 20 to 22 May 2021 recorded at 50 stations of the ongoing Binchuan Active Source Geophysical Observation Project, for which the data protection period has not expired. Sample waveforms of the main shock are included in the appendix of this article and can be downloaded from the Earthquake Science website. The event and continuous waveforms are available from the Earthquake Science Data Center website (www.esdc.ac.cn) on application.
Keywords: Yangbi earthquake; seismological dataset; geophysical observation; data sharing
Terrorism Attack Classification Using Machine Learning: The Effectiveness of Using Textual Features Extracted from GTD Dataset
20
Authors: Mohammed Abdalsalam, Chunlin Li, Abdelghani Dahou, Natalia Kryvinska. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 2, pp. 1427-1467 (41 pages)
One of the biggest dangers to society today is terrorism, where attacks have become one of the most significant risks to international peace and national security. Big data, information analysis, and artificial intelligence (AI) have become the basis for making strategic decisions in many sensitive areas, such as fraud detection, risk management, medical diagnosis, and counter-terrorism. However, there is still a need to assess how terrorist attacks are related, initiated, and detected. For this purpose, we propose a novel framework for classifying and predicting terrorist attacks. The proposed framework posits that neglected text attributes included in the Global Terrorism Database (GTD) can influence the accuracy of the model's classification of terrorist attacks, where each part of the data can provide vital information to enrich the ability of classifier learning. Each data point in a multiclass taxonomy has one or more tags attached to it, referred to as "related tags." We applied machine learning classifiers to classify terrorist attack incidents obtained from the GTD. A transformer-based technique called DistilBERT extracts and learns contextual features from text attributes to acquire more information from the text data. The extracted contextual features are combined with the "key features" of the dataset and used to perform the final classification. The study explored different experimental setups with various classifiers to evaluate the model's performance. The experimental results show that the proposed framework outperforms the latest techniques for classifying terrorist attacks, with an accuracy of 98.7% using a combined feature set and an extreme gradient boosting classifier.
Keywords: artificial intelligence; machine learning; natural language processing; data analytics; DistilBERT; feature extraction; terrorism classification; GTD dataset
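The fusion step described above, concatenating contextual text features with the dataset's "key features" before the final classifier, can be sketched generically. The hash-based embedding below is a deterministic stand-in for the real 768-dimensional DistilBERT output, and the incident text and numeric attributes are invented illustrations, not actual GTD records:

```python
import hashlib

def embed_text_stub(text, dim=8):
    """Placeholder for a DistilBERT sentence embedding: a deterministic
    hash-derived vector standing in for the real contextual features."""
    digest = hashlib.sha256(text.encode('utf-8')).digest()
    return [b / 255.0 for b in digest[:dim]]

def fuse_features(text, key_features):
    """Concatenate contextual text features with tabular 'key features';
    the combined vector is what the final classifier would consume."""
    return embed_text_stub(text) + key_features

# Hypothetical GTD-style incident: a free-text summary plus numeric
# attributes (e.g. year, region code, casualty count) -- illustrative only.
summary = "Armed assault on a checkpoint; attackers fled the scene."
key = [1998.0, 6.0, 3.0]
vec = fuse_features(summary, key)
print(len(vec))  # 8 stub text features + 3 key features = 11
```

In the full pipeline, `embed_text_stub` would be replaced by a real DistilBERT forward pass, and the fused vectors would be fed to a gradient-boosted classifier for training.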