3,935 articles found
Robust Multi-Label Cartoon Character Classification on the Novel Kral Sakir Dataset Using Deep Learning Techniques
1
Authors: Candan Tumer, Erdal Guvenoglu, Volkan Tunali. Computers, Materials & Continua, 2025, No. 12, pp. 5135-5158 (24 pages)
Automated cartoon character recognition is crucial for applications in content indexing, filtering, and copyright protection, yet it faces a significant challenge in animated media due to high intra-class visual variability, where characters frequently alter their appearance. To address this problem, we introduce the novel Kral Sakir dataset, a public benchmark of 16,725 images specifically curated for the task of multi-label cartoon character classification under these varied conditions. This paper conducts a comprehensive benchmark study, evaluating the performance of state-of-the-art pretrained Convolutional Neural Networks (CNNs), including DenseNet, ResNet, and VGG, against a custom baseline model trained from scratch. Our experiments, evaluated using F1-Score, accuracy, and Area Under the ROC Curve (AUC), demonstrate that fine-tuning pretrained models is a highly effective strategy. The best-performing model, DenseNet121, achieved an F1-Score of 0.9890 and an accuracy of 0.9898, significantly outperforming our baseline CNN (F1-Score of 0.9545). The findings validate the power of transfer learning for this domain and establish a strong performance benchmark. The introduced dataset provides a valuable resource for future research into developing robust and accurate character recognition systems.
Keywords: Cartoon character recognition; multi-label classification; deep learning; transfer learning; predictive modelling; artificial intelligence-enhanced (AI-Enhanced) systems; Kral Sakir dataset
BWRadarDataset-1.0: A Multi-Band, Multi-Modal Radar Detection and Perception Dataset
2
Authors: 张转花, 靳俊峰, 常沛, 何洋洋, 汪振亚, 侯其立, 李玉景, 郝慧军, 曾怡, 夏勇, 商国军, 许涛, 任伟杰, 雷鸣, 王歆远, 寿博, 邓丽颖, 任乐乐, 窦曼莉, 杨利红, 张琦珺, 李伟, 牛蕾, 林晓斌, 张志成. Radar Science and Technology (《雷达科学与技术》) (PKU Core), 2026, No. 1, pp. 1-14 (14 pages)
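Amid the rapid development of radar detection and perception technology, high-quality datasets play an important role in algorithm innovation, model training, and performance validation. Data-driven methods such as deep learning have become key to improving radar performance in core tasks including detection, tracking, recognition, jamming, and synthetic aperture radar (SAR) imaging. However, most existing datasets are generated by simulation and differ from real electromagnetic environments, which limits their generalization. Existing datasets also target a single function, for example only detection or only SAR, and lack the systematic coverage needed to support integrated research on detection and perception processing. To fill this gap, this paper publicly releases a complete dataset for integrated radar detection, tracking, and recognition. The dataset is drawn from typical measured scenarios and covers multi-band, multi-modal data for signal processing, target tracking, fine-grained recognition, compound jamming, and high-resolution SAR imagery, faithfully reflecting the propagation characteristics of radar signals and the characteristics of targets in complex environments. The paper further extracts and analyzes the key features in the dataset systematically, providing standardized feature inputs for algorithm research and performance evaluation across tasks and a solid foundation for research on intelligent radar signal and information processing.
Keywords: radar detection; public dataset; feature extraction; target detection; target tracking; target recognition; active jamming; SAR images; feature analysis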
A standardized dataset of CO-TPD spectra on transition-metal single-crystal surfaces
3
Authors: YANG Lin, WU Jianghong, WANG He. Journal of Fuel Chemistry and Technology (《燃料化学学报(中英文)》) (PKU Core), 2026, No. 4, pp. 180-190 (11 pages)
Temperature-programmed desorption (TPD) is a fundamental technique in surface science and heterogeneous catalysis for characterizing adsorption behavior and for extracting key parameters such as adsorption energy. However, the majority of existing TPD data is accessible only in the form of published images, without structured, quantitative datasets. This constrains its utility for rigorous quantitative analysis and computational modelling. Using carbon monoxide (CO), a widely adopted probe molecule, a curated and standardized CO-TPD dataset is constructed, encompassing 14 transition-metal single-crystal surfaces, including copper (Cu) and ruthenium (Ru). By systematically extracting numerical data points from published spectra and applying normalization, essential spectral features such as peak shape are fully preserved. The dataset also documents relevant experimental parameters, including heating rates, and was developed using a standardized protocol for data collection and quality control. This resource serves as both a reference library to support the deconvolution of TPD spectra from complex catalysts and an experimental benchmark for calibrating parameters in theoretical models. By providing a reliable and accessible data function, this work advances the microscopic understanding and the rational design of catalyst active centers.
Keywords: CO-TPD; standardized dataset; transition metal; single-crystal surfaces
A Convolutional Neural Network-Based Deep Support Vector Machine for Parkinson’s Disease Detection with Small-Scale and Imbalanced Datasets
4
Authors: Kwok Tai Chui, Varsha Arya, Brij B. Gupta, Miguel Torres-Ruiz, Razaz Waheeb Attar. Computers, Materials & Continua, 2026, No. 1, pp. 1410-1432 (23 pages)
Parkinson's disease (PD) is a debilitating neurological disorder affecting over 10 million people worldwide. PD classification models using voice signals as input are common in the literature. It is believed that using deep learning algorithms further enhances performance; nevertheless, this is challenging because PD datasets are small-scale and imbalanced. This paper proposes a convolutional neural network-based deep support vector machine (CNN-DSVM) to automate the feature extraction process using a CNN and to extend the conventional SVM to a DSVM for better classification performance on small-scale PD datasets. A customized kernel function reduces the impact of biased classification towards the majority class (healthy candidates in our consideration). An improved generative adversarial network (IGAN) was designed to generate additional training data to enhance the model's performance. For performance evaluation, the proposed algorithm achieves a sensitivity of 97.6% and a specificity of 97.3%. The performance comparison is evaluated from five perspectives, including comparisons with different data generation algorithms, feature extraction techniques, kernel functions, and existing works. Results reveal the effectiveness of the IGAN algorithm, which improves the sensitivity and specificity by 4.05%–4.72% and 4.96%–5.86%, respectively, and the effectiveness of the CNN-DSVM algorithm, which improves the sensitivity by 1.24%–57.4% and the specificity by 1.04%–163% and reduces biased detection towards the majority class. The ablation experiments confirm the effectiveness of individual components. Two future research directions have also been suggested.
Keywords: Convolutional neural network; data generation; deep support vector machine; feature extraction; generative artificial intelligence; imbalanced dataset; medical diagnosis; Parkinson's disease; small-scale dataset
A Unified Feature Selection Framework Combining Mutual Information and Regression Optimization for Multi-Label Learning
5
Author: Hyunki Lim. Computers, Materials & Continua, 2026, No. 4, pp. 1262-1281 (20 pages)
High-dimensional data causes difficulties in machine learning due to high time consumption and large memory requirements. In particular, in a multi-label environment, the complexity grows with the number of labels. Moreover, an optimization problem that fully considers all dependencies between features and labels is difficult to solve. In this study, we propose a novel regression-based multi-label feature selection method that integrates mutual information to better exploit the underlying data structure. By incorporating mutual information into the regression formulation, the model captures not only linear relationships but also complex non-linear dependencies. The proposed objective function simultaneously considers three types of relationships: (1) feature redundancy, (2) feature-label relevance, and (3) inter-label dependency. These three quantities are computed using mutual information, allowing the proposed formulation to capture nonlinear dependencies among variables. These three types of relationships are key factors in multi-label feature selection, and our method expresses them within a unified formulation, enabling efficient optimization while simultaneously accounting for all of them. To efficiently solve the proposed optimization problem under non-negativity constraints, we develop a gradient-based optimization algorithm with fast convergence. The experimental results on seven multi-label datasets show that the proposed method outperforms existing multi-label feature selection techniques.
Keywords: feature selection; multi-label learning; regression model optimization; mutual information
Multi-Label Classification Model Using Graph Convolutional Neural Network for Social Network Nodes
6
Authors: Junmin Lyu, Guangyu Xu, Feng Bao, Yu Zhou, Yuxin Liu, Siyu Lu, Wenfeng Zheng. Computer Modeling in Engineering & Sciences, 2026, No. 2, pp. 1235-1256 (22 pages)
Graph neural networks (GNNs) have shown strong performance in node classification tasks, yet most existing models rely on uniform or shared weight aggregation, lacking flexibility in modeling the varying strength of relationships among nodes. This paper proposes a novel graph coupling convolutional model that introduces an adaptive weighting mechanism to assign distinct importance to neighboring nodes based on their similarity to the central node. Unlike traditional methods, the proposed coupling strategy enhances the interpretability of node interactions while maintaining competitive classification performance. The model operates in the spatial domain, utilizing adjacency list structures for efficient convolution and addressing the limitations of weight sharing through a coupling-based similarity computation. Extensive experiments are conducted on five graph-structured datasets, including Cora, Citeseer, PubMed, Reddit, and BlogCatalog, as well as a custom topology dataset constructed from the Open University Learning Analytics Dataset (OULAD) educational platform. Results demonstrate that the proposed model achieves good classification accuracy while significantly reducing training time through direct second-order neighbor fusion and data preprocessing. Moreover, analysis of neighborhood order reveals that considering third-order neighbors offers limited accuracy gains but introduces considerable computational overhead, confirming the efficiency of first- and second-order convolution in practical applications. Overall, the proposed graph coupling model offers a lightweight, interpretable, and effective framework for multi-label node classification in complex networks.
Keywords: GNN; social networks nodes; multi-label classification model; graphic convolution neural network; coupling principle
Multi-label fundus disease classification using dual-branch deep learning:an intelligent diagnosis framework inspired by traditional Chinese medicine Five Wheels theory
7
Authors: Xin He, Xiaohui Li, Jun Peng, Lei Sun, Dan Shu, Li Xiao, Qinghua Peng, Xiaoxia Xiao. Digital Chinese Medicine, 2026, No. 1, pp. 80-90 (11 pages)
Objective: To develop a dual-branch deep learning framework for accurate multi-label classification of fundus diseases, addressing the key limitations of insufficient complementary feature extraction and inadequate cross-modal feature fusion in existing automated diagnostic methods. Methods: The fundus multi-label classification dataset with 12 disease categories (FMLC-12) was constructed by integrating complementary samples from Ocular Disease Intelligent Recognition (ODIR) and the Retinal Fundus Multi-Disease Image Dataset (RFMiD), yielding 6936 fundus images across 12 retinal pathology categories, and the framework was validated on both FMLC-12 and ODIR. Inspired by the holistic multi-regional assessment principle of the Five Wheels theory in traditional Chinese medicine (TCM) ophthalmology, the dual-branch multi-label network (DBMNet) was developed as a novel framework integrating complementary visual feature extraction with pathological correlation modeling. The architecture employed a TransNeXt backbone within a dual-branch design: one branch processed red-green-blue (RGB) images to capture color-dependent features, such as vascular patterns and lesion morphology, while the other processed grayscale-converted images to enhance subtle textural details and contrast variations. A feature interaction module (FIM) effectively integrated the multi-scale features from both branches. Comprehensive ablation studies were conducted to evaluate the contributions of the dual-branch architecture and the FIM. The performance of DBMNet was compared against four state-of-the-art methods, including EfficientNet Ensemble, transfer learning-based convolutional neural network (CNN), BFENet, and EyeDeep-Net, using mean average precision (mAP), F1-score, and Cohen's kappa coefficient. Results: The dual-branch architecture improved mAP by 15.44 percentage points over the single-branch TransNeXt baseline, increasing from 34.41% to 44.24%, and the addition of FIM further boosted mAP to 49.85%. On FMLC-12, DBMNet achieved an mAP of 49.85%, a Cohen's kappa coefficient of 62.14%, and an F1-score of 70.21%. Compared with BFENet (mAP: 45.42%, kappa: 46.64%, F1-score: 71.34%), DBMNet outperformed it by 4.43 percentage points in mAP and 15.50 percentage points in kappa, while BFENet achieved a marginally higher F1-score. On ODIR, DBMNet achieved an F1-score of 85.50%, comparable to state-of-the-art methods. Conclusion: DBMNet effectively integrates RGB and grayscale visual modalities through a dual-branch architecture, significantly improving multi-label fundus disease classification. The framework not only addresses the issue of insufficient feature fusion in existing methods but also demonstrates outstanding performance in balancing detection across both common and rare diseases, providing a promising and clinically applicable pathway for standardized, intelligent fundus disease classification.
Keywords: multi-label classification; Fundus images; Deep learning; Dual-branch network; Traditional Chinese medicine ophthalmology; Five Wheels theory
Detection Method for Bolt Loosening of Fan Base through Bayesian Learning with Small Dataset:A Real-World Application
8
Authors: Zhongyun Tang, Hanyi Xu, Haiyang Hu. Computers, Materials & Continua, 2026, No. 2, pp. 550-578 (29 pages)
With the deep integration of smart manufacturing and IoT technologies, higher demands are placed on the intelligence and real-time performance of industrial equipment fault detection. For industrial fans, base bolt loosening faults are difficult to identify through conventional spectrum analysis, and the extreme scarcity of fault data leads to limited training datasets, making traditional deep learning methods inaccurate in fault identification and incapable of detecting loosening severity. This paper employs Bayesian learning, training on a small fault dataset collected from the actual operation of axial-flow fans in a factory to obtain a posterior distribution. The method comprises specific data processing approaches and a configuration of a Bayesian Convolutional Neural Network (BCNN), and it can effectively improve the model's generalization ability. Experimental results demonstrate high detection accuracy and alignment with real-world applications, offering practical significance and reference value for industrial fan bolt loosening detection under data-limited conditions.
Keywords: Bolt loosening detection; industrial small dataset; Bayesian learning; interpretability; real-world application
Layered Feature Engineering for E-Commerce Purchase Prediction:A Hierarchical Evaluation on Taobao User Behavior Datasets
9
Authors: Liqiu Suo, Lin Xia, Yoona Chung, Eunchan Kim. Computers, Materials & Continua, 2026, No. 4, pp. 1865-1889 (25 pages)
Accurate purchase prediction in e-commerce critically depends on the quality of behavioral features. This paper proposes a layered and interpretable feature engineering framework that organizes user signals into three layers: Basic, Conversion & Stability (efficiency and volatility across actions), and Advanced Interactions & Activity (cross-behavior synergies and intensity). Using real logs from Taobao, Alibaba's primary e-commerce platform (57,976 records for 10,203 users; 25 November–03 December 2017), we conducted a hierarchical, layer-wise evaluation that holds data splits and hyperparameters fixed while varying only the feature set to quantify each layer's marginal contribution. Across logistic regression (LR), decision tree, random forest, XGBoost, and CatBoost models with stratified 5-fold cross-validation, the performance improved monotonically from Basic to Conversion & Stability to Advanced features. With LR, F1 increased from 0.613 (Basic) to 0.962 (Advanced); boosted models achieved high discrimination (AUC of 0.995) and an F1 score up to 0.983. Calibration and precision–recall analyses indicated strong ranking quality and acknowledged potential dataset and period biases given the short (9-day) window. By making feature contributions measurable and reproducible, the framework complements model-centric advances and offers a transparent blueprint for production-grade behavioral modeling. The code and processed artifacts are publicly available, and future work will extend the validation to longer, seasonal datasets and hybrid approaches that combine automated feature learning with domain-driven design.
Keywords: Hierarchical feature engineering; purchase prediction; user behavior dataset; feature importance; e-commerce platform; Taobao
Federated Multi-Label Feature Selection via Dual-Layer Hybrid Breeding Cooperative Particle Swarm Optimization with Manifold and Sparsity Regularization
10
Authors: Songsong Zhang, Huazhong Jin, Zhiwei Ye, Jia Yang, Jixin Zhang, Dongfang Wu, Xiao Zheng, Dingfeng Song. Computers, Materials & Continua, 2026, No. 1, pp. 1141-1159 (19 pages)
Multi-label feature selection (MFS) is a crucial dimensionality reduction technique aimed at identifying informative features associated with multiple labels. However, traditional centralized methods face significant challenges in privacy-sensitive and distributed settings, often neglecting label dependencies and suffering from low computational efficiency. To address these issues, we introduce a novel framework, Fed-MFSDHBCPSO: federated MFS via a dual-layer hybrid breeding cooperative particle swarm optimization algorithm with manifold and sparsity regularization (DHBCPSO-MSR). Leveraging the federated learning paradigm, Fed-MFSDHBCPSO allows clients to perform local feature selection (FS) using DHBCPSO-MSR. Locally selected feature subsets are encrypted with differential privacy (DP) and transmitted to a central server, where they are securely aggregated and refined through secure multi-party computation (SMPC) until global convergence is achieved. Within each client, DHBCPSO-MSR employs a dual-layer FS strategy. The inner layer constructs sample and label similarity graphs, generates Laplacian matrices to capture the manifold structure between samples and labels, and applies L2,1-norm regularization to sparsify the feature subset, yielding an optimized feature weight matrix. The outer layer uses a hybrid breeding cooperative particle swarm optimization algorithm to further refine the feature weight matrix and identify the optimal feature subset. The updated weight matrix is then fed back to the inner layer for further optimization. Comprehensive experiments on multiple real-world multi-label datasets demonstrate that Fed-MFSDHBCPSO consistently outperforms both centralized and federated baseline methods across several key evaluation metrics.
Keywords: multi-label feature selection; federated learning; manifold regularization; sparse constraints; hybrid breeding optimization algorithm; particle swarm optimization algorithm; privacy protection
Fine-Med-Mental-T&P:a dual-track approach for high-quality instructional datasets of mental disorders in traditional Chinese medicine
11
Authors: Yanbai Wei, Xiaoshuo Jing, Junfeng Yan. Digital Chinese Medicine, 2026, No. 1, pp. 31-42 (12 pages)
Objective: To investigate methods for constructing a high-quality instructional dataset for traditional Chinese medicine (TCM) mental disorders and to validate its efficacy. Methods: We proposed the Fine-Med-Mental-T&P methodology for constructing high-quality instruction datasets in TCM mental disorders. This approach integrates theoretical knowledge and practical case studies through a dual-track strategy. (i) Theoretical track: textbooks and guidelines on TCM mental disorders were manually segmented. Initial responses were generated using DeepSeek-V3, followed by refinement by the Qwen3-32B model to align the expression with human preferences. A screening algorithm was then applied to select 16,000 high-quality instruction pairs. (ii) Practical track: starting from over 600 real clinical case seeds, diagnostic and therapeutic instruction pairs were generated using DeepSeek-V3 and subsequently screened through manual evaluation, resulting in 4,000 high-quality practice-oriented instruction pairs. The integration of both tracks yielded the Med-Mental-Instruct-T&P dataset, comprising a total of 20,000 instruction pairs. To validate the dataset's effectiveness, three experimental evaluations (both manual and automated) were conducted: (i) comparative studies to compare the performance of models fine-tuned on different datasets; (ii) benchmarking to compare against mainstream TCM-specific large language models (LLMs); (iii) a data ablation study to investigate the relationship between data volume and model performance. Results: Experimental results demonstrate the superior performance of the T&P-model fine-tuned on the Med-Mental-Instruct-T&P dataset. In the comparative study, the T&P-model significantly outperformed the baseline models trained solely on self-generated or purely human-curated baseline data. This superiority was evident in both automated metrics (ROUGE-L > 0.55) and expert manual evaluations (scoring above 7/10 across accuracy). In benchmark comparisons, the T&P-model also excelled against existing mainstream TCM LLMs (e.g., HuatuoGPT and ZuoyiGPT). It showed particularly strong capabilities in handling diverse clinical presentations, including challenging disorders such as insomnia and coma, showcasing its robustness and versatility. Data ablation studies showed that T&P-model performance had an overall upward trend with minor fluctuations when training data increased from 10% to 50%; beyond 50%, performance improvement slowed significantly, with metrics plateauing and approaching a saturation point.
Keywords: Mental disorder; Traditional Chinese medicine (TCM); Instruction dataset construction; Instruction tuning; Large language model
Efficient Dataset Generation for Stacked Meat Products Instance Segmentation in Food Automation
12
Authors: Hoang Minh Pham, Anh Dong Le, Pablo Malvido-Fresnillo, Saigopal Vasudevan, José L. Martínez Lastra. IEEE/CAA Journal of Automatica Sinica, 2026, No. 1, pp. 224-226 (3 pages)
Dear Editor, This letter presents techniques to simplify dataset generation for instance segmentation of raw meat products, a critical step toward automating food production lines. Accurate segmentation is essential for addressing challenges such as occlusions, indistinct edges, and stacked configurations, which demand large, diverse datasets. To meet these demands, we propose two complementary approaches: a semi-automatic annotation interface using tools like the segment anything model (SAM) and GrabCut, and a synthetic data generation pipeline leveraging 3D-scanned models. These methods reduce reliance on real meat, mitigate food waste, and improve scalability. Experimental results demonstrate that incorporating synthetic data enhances segmentation model performance and, when combined with real data, further boosts accuracy, paving the way for more efficient automation in the food industry.
Keywords: dataset generation; segment anything model (SAM); food automation; raw meat products; automating food production lines; accurate instance segmentation; stacked meat products; semi-automatic annotation
Performance evaluation of seven multi-label classification methods on real-world patent and publication datasets (Cited by: 1)
13
Authors: Shuo Xu, Yuefu Zhang, Xin An, Sainan Pi. Journal of Data and Information Science (CSCD), 2024, No. 2, pp. 81-103 (23 pages)
Purpose: Many science, technology and innovation (STI) resources carry several different labels. To automatically assign labels to an instance of interest, many approaches with good performance on benchmark datasets have been proposed for the multi-label classification task in the literature. Furthermore, several open-source tools implementing these approaches have also been developed. However, the characteristics of real-world multi-label patent and publication datasets are not completely in line with those of benchmark ones. Therefore, the main purpose of this paper is to comprehensively evaluate seven multi-label classification methods on real-world datasets. Research limitations: The three real-world datasets differ in the following aspects: statement, data quality, and purposes. Additionally, open-source tools designed for multi-label classification also have intrinsic differences in their approaches to data processing and feature selection, which in turn affect the performance of a multi-label classification approach. In the near future, we will enhance experimental precision and reinforce the validity of conclusions by employing more rigorous control over variables through expanded parameter settings. Practical implications: The observed Macro F1 and Micro F1 scores on real-world datasets typically fall short of those achieved on benchmark datasets, underscoring the complexity of real-world multi-label classification tasks. Approaches leveraging deep learning techniques offer promising solutions by accommodating the hierarchical relationships and interdependencies among labels. With ongoing enhancements in deep learning algorithms and large-scale models, it is expected that the efficacy of multi-label classification tasks will be significantly improved, reaching a level of practical utility in the foreseeable future. Originality/value: (1) Seven multi-label classification methods are comprehensively compared on three real-world datasets. (2) The TextCNN and TextRCNN models perform better on small-scale datasets with a more complex hierarchical structure of labels and a more balanced document-label distribution. (3) The MLkNN method works better on the larger-scale dataset with a more unbalanced document-label distribution.
Keywords: multi-label classification; Real-World datasets; Hierarchical structure; Classification system; Label correlation; Machine learning
Impact of Dataset Size on Machine Learning Regression Accuracy in Solar Power Prediction (Cited by: 1)
14
Authors: S. M. Rezaul Karim, Md. Shouquat Hossain, Khadiza Akter, Debasish Sarker, Md. Moniul Kabir, Mamdouh Assad. Energy Engineering, 2025, No. 8, pp. 3041-3054 (14 pages)
Understanding how dataset size influences regression models can help improve the accuracy of solar power forecasts and make the most of renewable energy systems. This research explores the influence of dataset size on the accuracy and reliability of regression models for solar power prediction, contributing to better forecasting methods. The study analyzes data from two solar panels, aSiMicro03036 and aSiTandem72-46, over 7, 14, 17, 21, 28, and 38 days, with each dataset comprising five independent parameters and one dependent parameter, split 80–20 for training and testing. Results indicate that Random Forest consistently outperforms other models, achieving the highest correlation coefficient of 0.9822 and the lowest Mean Absolute Error (MAE) of 2.0544 on the aSiTandem72-46 panel with 21 days of data. For the aSiMicro03036 panel, the best MAE of 4.2978 was reached using the k-Nearest Neighbor (k-NN) algorithm, implemented as the instance-based k-nearest neighbors (IBk) learner in Weka and trained on 17 days of data. Regression performance for most models (excluding IBk) stabilizes at 14 days or more. Compared to the 7-day dataset, increasing to 21 days reduced the MAE by around 20% and improved correlation coefficients by around 2.1%, highlighting the value of moderate dataset expansion. These findings suggest that datasets spanning 17 to 21 days, with 80% used for training, can significantly enhance the predictive accuracy of solar power generation models.
Keywords: Correlation coefficients; dataset size; machine learning; mean absolute error; regression; solar power prediction
High-resolution Simulation Dataset of Hourly PM2.5 Chemical Composition in China (CAQRA-aerosol) from 2013 to 2020 (Cited by: 1)
15
Authors: Lei KONG, Xiao TANG, Jiang ZHU, Zifa WANG, Bing LIU, Yuanyuan ZHU, Lili ZHU, Duohong CHEN, Ke HU, Huangjian WU, Qian WU, Jin SHEN, Yele SUN, Zirui LIU, Jinyuan XIN, Dongsheng JI, Mei ZHENG. Advances in Atmospheric Sciences, 2025, No. 4, pp. 697-712 (16 pages)
Scientific knowledge on the chemical compositions of fine particulate matter(PM_(2.5)) is essential for properly assessing its health and climate effects,and for decisionmakers to develop efficient mitigation strategi... Scientific knowledge on the chemical compositions of fine particulate matter(PM_(2.5)) is essential for properly assessing its health and climate effects,and for decisionmakers to develop efficient mitigation strategies.A high-resolution PM_(2.5) chemical composition dataset(CAQRA-aerosol)is developed in this study,which provides hourly maps of organic carbon,black carbon,ammonium,nitrate,and sulfate in China from 2013 to 2020 with a horizontal resolution of 15 km.This paper describes the method,access,and validation results of this dataset.It shows that CAQRA-aerosol has good consistency with observations and achieves higher or comparable accuracy with previous PM_(2.5) composition datasets.Based on CAQRA-aerosol,spatiotemporal changes of different PM_(2.5) compositions were investigated from a national viewpoint,which emphasizes different changes of nitrate from other compositions.The estimated annual rate of population-weighted concentrations of nitrate is 0.23μg m^(−3)yr^(−1) from 2015 to 2020,compared with−0.19 to−1.1μg m^(−3)yr^(−1) for other compositions.The whole dataset is freely available from the China Air Pollution Data Center(https://doi.org/10.12423/capdb_PKU.2023.DA). 展开更多
Keywords: PM2.5 composition dataset black carbon organic carbon AMMONIUM NITRATE SULFATE
MEET: A Million-Scale Dataset for Fine-Grained Geospatial Scene Classification With Zoom-Free Remote Sensing Imagery (cited by: 1)
16
Authors: Yansheng Li, Yuning Wu, Gong Cheng, Chao Tao, Bo Dang, Yu Wang, Jiahao Zhang, Chuge Zhang, Yiting Liu, Xu Tang, Jiayi Ma, Yongjun Zhang. 《IEEE/CAA Journal of Automatica Sinica》, 2025, Issue 5, pp. 1004-1023 (20 pages)
Accurate fine-grained geospatial scene classification using remote sensing imagery is essential for a wide range of applications. However, existing approaches often rely on manually zooming remote sensing images at different scales to create typical scene samples. This approach fails to adequately support the fixed-resolution image interpretation requirements of real-world scenarios. To address this limitation, we introduce the million-scale fine-grained geospatial scene classification dataset (MEET), which contains over 1.03 million zoom-free remote sensing scene samples, manually annotated into 80 fine-grained categories. In MEET, each scene sample follows a scene-in-scene layout, where the central scene serves as the reference and auxiliary scenes provide crucial spatial context for fine-grained classification. Moreover, to tackle the emerging challenge of scene-in-scene classification, we present the context-aware transformer (CAT), a model specifically designed for this task. CAT adaptively fuses spatial context to accurately classify the scene samples by learning attentional features that capture the relationships between the center and auxiliary scenes. Based on MEET, we establish a comprehensive benchmark for fine-grained geospatial scene classification, evaluating CAT against 11 competitive baselines. The results demonstrate that CAT significantly outperforms these baselines, achieving a 1.88% higher balanced accuracy (BA) with the Swin-Large backbone and a notable 7.87% improvement with the Swin-Huge backbone. Further experiments validate the effectiveness of each module in CAT and show the practical applicability of CAT in urban functional zone mapping. The source code and dataset will be publicly available at https://jerrywyn.github.io/project/MEET.html.
Keywords: Fine-grained geospatial scene classification (FGSC) million-scale dataset remote sensing imagery (RSI) scene-in-scene transformer
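The MEET abstract reports gains in balanced accuracy (BA) without defining it. The sketch below assumes the common definition, the unweighted mean of per-class recall, which may differ in detail from the paper's evaluation code. The class names and predictions are hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed definition of BA)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = defaultdict(int)  # correctly classified samples per true class
    total = defaultdict(int)    # all samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical scene labels: four "port" samples and two "farm" samples.
y_true = ["port", "port", "port", "port", "farm", "farm"]
y_pred = ["port", "port", "port", "farm", "farm", "port"]

# Recall is 3/4 for "port" and 1/2 for "farm", so BA is their mean.
print(balanced_accuracy(y_true, y_pred))  # 0.625
```

Because every class contributes equally, BA is less flattering than plain accuracy (4/6 here) when rare classes are misclassified, which matters for a dataset with 80 fine-grained categories.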
A dataset for the structure and electrochemical performance of hard carbon as anodes for sodium-ion batteries
17
Authors: HOU Wei-yan, YI Zong-lin, JIA Wan-ru, YU Hong-tao, DAI Li-qin, YANG Jun-jie, CHEN Jing-peng, XIE Li-jing, SU Fang-yuan, CHEN Cheng-meng. 《新型炭材料(中英文)》 (PKU Core journal), 2025, Issue 5, pp. 1193-1200 (8 pages)
This dataset collects, compares, and contrasts the capacities and structures of a series of hard carbon materials, and then searches for correlations between structure and electrochemical performance. The capacity data of the hard carbons were obtained from charge/discharge tests, and the materials were characterized by XRD, gas adsorption, true density tests, and SAXS. In particular, fitting the SAXS data gave a series of structural parameters that characterized the materials well. The related test details are given together with the structural data of the hard carbons and the electrochemical performance of the sodium-ion batteries.
Keywords: Hard carbon Sodium-ion battery SAXS Structural characterization dataset
UD-TN: A comprehensive ultrasound dataset for benign and malignant thyroid nodule classification
18
Authors: Jialin Zhu, Xuzhou Fu, Zhiqiang Liu, Luchen Chang, Xuewei Li, Jie Gao, Ruiguo Yu, Xi Wei. 《Intelligent Oncology》, 2025, Issue 2, pp. 176-187 (12 pages)
The automatic classification of thyroid nodules in ultrasound images is a critical research focus in medical imaging. However, publicly available thyroid ultrasound datasets remain scarce. In this study, we developed the Ultrasound Dataset for Thyroid Nodules (UD-TN), a comprehensive dataset containing 10,495 labeled images classified as benign or malignant based on pathology-confirmed results. To establish a benchmark, we proposed the Thyroid Ultrasound Image Neural Network (ThyUNet), a deep learning model designed for accurate nodule classification. By incorporating high-resolution feature enhancement, instance normalization, and dilated convolutions into residual blocks, ThyUNet excels at extracting fine-grained features, particularly for small nodules. Experimental results demonstrate that ThyUNet achieves state-of-the-art performance, with an accuracy of 89.7%, a sensitivity of 0.879, and a specificity of 0.910 on the testing set. These results surpass those of other advanced architectures, highlighting the model's effectiveness. UD-TN and ThyUNet contribute significantly to advancing intelligent medical diagnostics. Dataset details and access instructions are available at https://github.com/18811755633/Sample-of-UD-TN.
Keywords: Ultrasound dataset Deep learning Nodule classification Medical imaging dataset
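Accuracy, sensitivity, and specificity, the three figures quoted for ThyUNet, all derive from the same four confusion-matrix counts. The sketch below assumes malignant is treated as the positive class, which the abstract does not state, and uses hypothetical counts rather than the paper's test-set results.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp: malignant predicted malignant   fn: malignant predicted benign
    tn: benign predicted benign         fp: benign predicted malignant
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # share of malignant nodules detected
    specificity = tn / (tn + fp)  # share of benign nodules correctly cleared
    return accuracy, sensitivity, specificity


# Hypothetical counts for a 200-image test set.
acc, sens, spec = binary_metrics(tp=80, fn=20, tn=90, fp=10)
print(acc, sens, spec)  # 0.85 0.8 0.9
```

When the two classes are equally frequent, as in this example, accuracy is the average of sensitivity and specificity; otherwise it is weighted by class frequency.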
Multi-Label Machine Learning Classification of Cardiovascular Diseases
19
Authors: Chih-Ta Yen, Jung-Ren Wong, Chia-Hsang Chang. 《Computers, Materials & Continua》, 2025, Issue 7, pp. 347-363 (17 pages)
In its 2023 global health statistics, the World Health Organization noted that noncommunicable diseases (NCDs) remain the leading cause of disease burden worldwide, with cardiovascular diseases (CVDs) resulting in more deaths than the three other major NCDs combined. In this study, we developed a method that can comprehensively detect which CVDs are present in a patient. Specifically, we propose a multi-label classification method that utilizes photoplethysmography (PPG) signals and physiological characteristics from public datasets to classify four types of CVDs and related conditions: hypertension, diabetes, cerebral infarction, and cerebrovascular disease. Our approach to multi-disease classification of CVDs using PPG signals achieves the highest classification performance when encompassing the broadest range of disease categories, thereby offering a more comprehensive assessment of human health. We employ a multi-label classification strategy to simultaneously predict the presence or absence of multiple diseases. Specifically, we first apply the Savitzky-Golay (S-G) filter to the PPG signals to reduce noise and then transform them into statistical features. We integrate the processed PPG signals with individual physiological features as a multimodal input, thereby expanding the learned feature space. Notably, even with a simple machine learning method, this approach can achieve relatively high accuracy. The proposed method achieved a maximum F1-score of 0.91, a minimum Hamming loss of 0.04, and an accuracy of 0.95. Thus, our method represents an effective and rapid solution for detecting multiple diseases simultaneously, which is beneficial for comprehensively managing CVDs.
Keywords: PHOTOPLETHYSMOGRAPHY machine learning health management multi-label classification cardiovascular disease
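The Hamming loss and F1-score quoted in the abstract above are multi-label metrics computed over every patient-condition pair. A minimal sketch follows. The label order (hypertension, diabetes, cerebral infarction, cerebrovascular disease) and the label matrices are hypothetical, and micro-averaging for F1 is an assumption, since the abstract does not say which averaging was used.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual labels that are predicted incorrectly."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    if total == 0:
        raise ValueError("no labels to compare")
    return wrong / total


def micro_f1(y_true, y_pred):
    """F1 computed from label counts pooled across all classes (micro-average)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical patients; columns follow the assumed label order above.
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 1, 0]]

print(hamming_loss(y_true, y_pred))        # 0.25
print(round(micro_f1(y_true, y_pred), 4))  # 0.6667
```

Hamming loss counts each wrong label separately, so a patient with one of four conditions mislabeled contributes 1/4 of an error rather than a full miss, which is why it can be low even when exact-match accuracy is not perfect.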
A Comprehensive Brain MRI and Neurodevelopmental Dataset in Children with Tetralogy of Fallot
20
Authors: Yang Xu, Yaqi Zhang, Meijiao Zhu, Pengcheng Xue, Siyu Ma, Di Yu, Liang Hu, Yuxi Zhang, Wei Peng, Jirong Qi, Xuyun Wen, Ming Yang, Xuming Mo. 《Congenital Heart Disease》, 2025, Issue 5, pp. 559-570 (12 pages)
Background: The life-course management of children with tetralogy of Fallot (TOF) has focused on demonstrating brain structural alterations, developmental trajectories, and cognition-related changes that unfold over time. Methods: We introduce a magnetic resonance imaging (MRI) dataset comprising children with TOF who underwent brain MRI scanning and cross-sectional neurocognitive follow-up. The dataset includes brain three-dimensional T1-weighted imaging (3D-T1WI), three-dimensional T2-weighted imaging (3D-T2WI), and neurodevelopmental evaluations using the Wechsler Preschool and Primary Scale of Intelligence–Fourth Edition (WPPSI-IV). Results: Thirty-one children with TOF (age range: 4–33 months; 18 males) were recruited and completed corrective surgery at the Children's Hospital of Nanjing Medical University, Nanjing, China. Aiming to promote neurodevelopmental outcomes in children with TOF, we have curated a comprehensive dataset designed to dissect the complex interplay among risk factors, neuroimaging findings, and adverse neurodevelopmental outcomes. Conclusion: This article introduces our open-source dataset on neurodevelopment in children with TOF and covers the data types, data acquisition and processing methods, the procedure for accessing the data, and related publications.
Keywords: Tetralogy of Fallot NEURODEVELOPMENT dataset congenital heart disease