Journal Articles
3,863 articles found
A Convolutional Neural Network-Based Deep Support Vector Machine for Parkinson’s Disease Detection with Small-Scale and Imbalanced Datasets
1
Authors: Kwok Tai Chui, Varsha Arya, Brij B. Gupta, Miguel Torres-Ruiz, Razaz Waheeb Attar. Computers, Materials & Continua, 2026, No. 1, pp. 1410-1432 (23 pages).
Parkinson's disease (PD) is a debilitating neurological disorder affecting over 10 million people worldwide. PD classification models using voice signals as input are common in the literature. It is believed that using deep learning algorithms further enhances performance; nevertheless, this is challenging due to the small-scale and imbalanced nature of PD datasets. This paper proposes a convolutional neural network-based deep support vector machine (CNN-DSVM) that automates feature extraction with a CNN and extends the conventional SVM to a DSVM for better classification performance on small-scale PD datasets. A customized kernel function reduces the impact of classification biased towards the majority class (healthy candidates in our setting). An improved generative adversarial network (IGAN) was designed to generate additional training data to enhance the model's performance. In the performance evaluation, the proposed algorithm achieves a sensitivity of 97.6% and a specificity of 97.3%. The performance comparison covers five perspectives, including comparisons with different data generation algorithms, feature extraction techniques, kernel functions, and existing works. Results reveal the effectiveness of the IGAN algorithm, which improves sensitivity and specificity by 4.05%-4.72% and 4.96%-5.86%, respectively, and of the CNN-DSVM algorithm, which improves sensitivity by 1.24%-57.4% and specificity by 1.04%-163% while reducing biased detection towards the majority class. Ablation experiments confirm the effectiveness of the individual components. Two future research directions are also suggested.
Keywords: convolutional neural network; data generation; deep support vector machine; feature extraction; generative artificial intelligence; imbalanced dataset; medical diagnosis; Parkinson's disease; small-scale dataset
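The division of labour this abstract describes, a CNN for automatic feature extraction feeding a cost-sensitive margin classifier, can be sketched in miniature. Everything below is invented for illustration: a toy 1-D convolutional feature extractor and a class-weighted least-squares classifier stand in for the paper's CNN-DSVM and customized kernel, which are not reproduced here.

```python
import numpy as np

# Illustrative sketch (not the paper's code): a toy 1-D convolutional
# feature extractor feeding a class-weighted linear classifier. All
# layer sizes, kernels, and the synthetic "voice frames" are invented.

rng = np.random.default_rng(0)

def conv1d_features(signal, kernels):
    """Valid-mode 1-D convolution followed by global max pooling."""
    feats = []
    for k in kernels:
        resp = np.convolve(signal, k, mode="valid")
        feats.append(resp.max())          # one pooled feature per kernel
    return np.array(feats)

# Imbalanced toy dataset: 40 "healthy" vs 8 "PD" synthetic frames.
healthy = [rng.normal(0.0, 1.0, 64) for _ in range(40)]
pd_like = [rng.normal(0.8, 1.0, 64) for _ in range(8)]
X_raw = healthy + pd_like
y = np.array([0] * 40 + [1] * 8)

kernels = [rng.normal(size=5) for _ in range(6)]
X = np.stack([conv1d_features(s, kernels) for s in X_raw])

# Weight each sample by the inverse frequency of its class: the same
# bias-reducing intent as the paper's custom kernel, in its simplest form.
w_cls = {0: len(y) / (2 * (y == 0).sum()), 1: len(y) / (2 * (y == 1).sum())}
sw = np.array([w_cls[int(c)] for c in y])
A = np.hstack([X, np.ones((len(X), 1))])      # add bias column
W = np.diag(sw)
theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ (2 * y - 1.0))
pred = (A @ theta > 0).astype(int)

minority_recall = (pred[y == 1] == 1).mean()
print("minority-class recall:", minority_recall)
```

Without the inverse-frequency weights, a plain least-squares fit on 40-vs-8 data drifts toward predicting the majority class; the weighting is the one-line version of the cost-sensitivity idea.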
Layered Feature Engineering for E-Commerce Purchase Prediction:A Hierarchical Evaluation on Taobao User Behavior Datasets
2
Authors: Liqiu Suo, Lin Xia, Yoona Chung, Eunchan Kim. Computers, Materials & Continua, 2026, No. 4, pp. 1865-1889 (25 pages).
Accurate purchase prediction in e-commerce critically depends on the quality of behavioral features. This paper proposes a layered and interpretable feature engineering framework that organizes user signals into three layers: Basic, Conversion & Stability (efficiency and volatility across actions), and Advanced Interactions & Activity (cross-behavior synergies and intensity). Using real Taobao (Alibaba's primary e-commerce platform) logs (57,976 records for 10,203 users; 25 November-3 December 2017), we conducted a hierarchical, layer-wise evaluation that holds data splits and hyperparameters fixed while varying only the feature set, quantifying each layer's marginal contribution. Across logistic regression (LR), decision tree, random forest, XGBoost, and CatBoost models with stratified 5-fold cross-validation, performance improved monotonically from Basic to Conversion & Stability to Advanced features. With LR, F1 increased from 0.613 (Basic) to 0.962 (Advanced); boosted models achieved high discrimination (0.995 AUC) and an F1 score of up to 0.983. Calibration and precision-recall analyses indicated strong ranking quality, while acknowledging potential dataset and period biases given the short (9-day) window. By making feature contributions measurable and reproducible, the framework complements model-centric advances and offers a transparent blueprint for production-grade behavioral modeling. The code and processed artifacts are publicly available, and future work will extend the validation to longer, seasonal datasets and to hybrid approaches that combine automated feature learning with domain-driven design.
Keywords: hierarchical feature engineering; purchase prediction; user behavior dataset; feature importance; e-commerce platform; Taobao
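The layer-wise protocol, keeping the train/test split and model fixed and varying only the feature subset so that metric deltas isolate each layer's marginal contribution, can be sketched as follows. The data, layer definitions, and the tiny gradient-descent logistic regression are all synthetic stand-ins for the paper's Taobao features and tooling.

```python
import numpy as np

# Sketch of layer-wise feature evaluation: one fixed split, one model,
# three cumulative feature sets. Everything here is synthetic.

rng = np.random.default_rng(1)
n = 400
# Three "layers" of synthetic features with increasing signal strength.
basic = rng.normal(size=(n, 2))
conversion = rng.normal(size=(n, 2))
advanced = rng.normal(size=(n, 2))
logit = 0.3 * basic[:, 0] + 0.8 * conversion[:, 0] + 1.5 * advanced[:, 0]
y = (logit + rng.normal(scale=0.5, size=n) > 0).astype(float)

split = n // 2                      # fixed split shared by every run
layers = {
    "Basic": basic,
    "+Conversion&Stability": np.hstack([basic, conversion]),
    "+Advanced": np.hstack([basic, conversion, advanced]),
}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_logreg(X, y, steps=2000, lr=0.1):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        w -= lr * Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
    return w

def f1(y_true, y_pred):
    tp = ((y_pred == 1) & (y_true == 1)).sum()
    fp = ((y_pred == 1) & (y_true == 0)).sum()
    fn = ((y_pred == 0) & (y_true == 1)).sum()
    return 2 * tp / max(2 * tp + fp + fn, 1)

scores = {}
for name, X in layers.items():
    w = fit_logreg(X[:split], y[:split])
    Xb = np.hstack([X[split:], np.ones((n - split, 1))])
    pred = (sigmoid(Xb @ w) > 0.5).astype(float)
    scores[name] = f1(y[split:], pred)

for name, s in scores.items():
    print(f"{name:>24s}  F1={s:.3f}")
```

Because only the feature set changes between runs, the F1 differences can be attributed to the added layer rather than to split or hyperparameter noise, which is the point of the hierarchical evaluation.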
Fine-Med-Mental-T&P:a dual-track approach for high-quality instructional datasets of mental disorders in traditional Chinese medicine
3
Authors: Yanbai Wei, Xiaoshuo Jing, Junfeng Yan. Digital Chinese Medicine, 2026, No. 1, pp. 31-42 (12 pages).
Objective: To investigate methods for constructing a high-quality instructional dataset for traditional Chinese medicine (TCM) mental disorders and to validate its efficacy. Methods: We proposed the Fine-Med-Mental-T&P methodology for constructing high-quality instruction datasets for TCM mental disorders. The approach integrates theoretical knowledge and practical case studies through a dual-track strategy. (i) Theoretical track: textbooks and guidelines on TCM mental disorders were manually segmented. Initial responses were generated with DeepSeek-V3 and then refined by the Qwen3-32B model to align their expression with human preferences; a screening algorithm was then applied to select 16,000 high-quality instruction pairs. (ii) Practical track: starting from over 600 real clinical case seeds, diagnostic and therapeutic instruction pairs were generated with DeepSeek-V3 and screened through manual evaluation, yielding 4,000 high-quality practice-oriented instruction pairs. Integrating both tracks produced the Med-Mental-Instruct-T&P dataset, comprising a total of 20,000 instruction pairs. To validate the dataset's effectiveness, three experimental evaluations (both manual and automated) were conducted: (i) comparative studies of models fine-tuned on different datasets; (ii) benchmarking against mainstream TCM-specific large language models (LLMs); (iii) a data ablation study of the relationship between data volume and model performance. Results: Experimental results demonstrate the superior performance of the T&P-model fine-tuned on the Med-Mental-Instruct-T&P dataset. In the comparative study, the T&P-model significantly outperformed baseline models trained solely on self-generated or purely human-curated baseline data, as evident in both automated metrics (ROUGE-L > 0.55) and expert manual evaluations (scoring above 7/10 on accuracy). In benchmark comparisons, the T&P-model also excelled against existing mainstream TCM LLMs (e.g., HuatuoGPT and ZuoyiGPT), showing particularly strong capabilities on diverse clinical presentations, including challenging disorders such as insomnia and coma, which demonstrates its robustness and versatility. Data ablation studies showed that T&P-model performance trended upward with minor fluctuations as training data increased from 10% to 50%; beyond 50%, improvement slowed significantly, with metrics plateauing and approaching saturation.
Keywords: mental disorder; traditional Chinese medicine (TCM); instruction dataset construction; instruction tuning; large language model
Hint-SQL: A Text-to-SQL Prompting Method Based on Automatic Hint Generation
4
Authors: 谭钊, 刘喜平, 舒晴, 万齐智, 刘德喜, 万常选, 廖国琼. 《计算机学报》 (Chinese Journal of Computers, Peking University Core), 2026, No. 3, pp. 700-720 (21 pages).
Text-to-SQL aims to translate natural-language questions into SQL statements executable by a database system, making data querying more convenient. With the development of large language models (LLMs), LLM-based Text-to-SQL prompting methods have become the mainstream solution in this field. In recent years, researchers have added hints to LLM prompts to convey concrete Text-to-SQL suggestions that guide SQL generation. However, existing hints are mostly hand-written by researchers based on general characteristics of the Text-to-SQL task; their content is overly broad, hard to adjust to the needs of a specific task, and cannot fit all Text-to-SQL tasks. This paper proposes Hint-SQL, a Text-to-SQL prompting method based on automatic hint generation: it automatically generates suitable semantic, operational, and structural hints for the current Text-to-SQL task, guiding LLMs to generate semantically consistent and structurally correct SQL. To generate task-customized hints, we build a hint-generation agent (HAgent), fine-tuned from open-source LLMs under a two-stage fine-tuning framework. The framework automatically synthesizes the data needed for fine-tuning without manual annotation, supporting both supervised fine-tuning and preference-learning optimization. Hint-SQL can be used on its own or to enhance existing methods. Large-scale experiments show that Hint-SQL used alone is competitive with mainstream methods and can also significantly improve the performance of existing methods: on the BIRD dataset, Hint-SQL raises the accuracy of the current best method to 71.58%, an improvement of 4.37%. This study reveals the important role of hints in the Text-to-SQL task and provides a reference for follow-up Text-to-SQL research.
Keywords: natural language processing; Text-to-SQL; large language model; prompt engineering; hints
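The prompt-assembly side of the idea, injecting semantic, operational, and structural hints into a Text-to-SQL prompt, can be sketched without any model. The schema, question, hint texts, and template below are all invented for the demo; in the paper such hints are produced by the fine-tuned HAgent, which is not reproduced here.

```python
# Illustrative sketch of hint-augmented Text-to-SQL prompting in the
# spirit of Hint-SQL. All strings are hypothetical examples.

def build_prompt(schema: str, question: str, hints: dict) -> str:
    """Assemble a prompt with a typed hint block between schema and question."""
    hint_block = "\n".join(
        f"- [{kind}] {text}" for kind, text in hints.items()
    )
    return (
        "### Database schema\n" + schema + "\n"
        "### Hints\n" + hint_block + "\n"
        "### Question\n" + question + "\n"
        "### SQL\n"
    )

schema = "orders(id, user_id, amount, created_at); users(id, name)"
question = "Total order amount per user name in 2023?"
hints = {
    "semantic": "'per user name' requires a join of orders and users.",
    "operation": "Use SUM(amount) with GROUP BY users.name.",
    "structure": "Filter on created_at year before aggregating.",
}

prompt = build_prompt(schema, question, hints)
print(prompt)
```

The three hint types mirror the semantic/operational/structural categories named in the abstract; generating them per task, rather than hand-writing one generic hint list, is the paper's contribution.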
BWRadarDataset-1.0: A Multi-Band, Multi-Modal Radar Detection and Perception Dataset
5
Authors: 张转花, 靳俊峰, 常沛, 何洋洋, 汪振亚, 侯其立, 李玉景, 郝慧军, 曾怡, 夏勇, 商国军, 许涛, 任伟杰, 雷鸣, 王歆远, 寿博, 邓丽颖, 任乐乐, 窦曼莉, 杨利红, 张琦珺, 李伟, 牛蕾, 林晓斌, 张志成. 《雷达科学与技术》 (Radar Science and Technology, Peking University Core), 2026, No. 1, pp. 1-14 (14 pages).
Amid the rapid development of radar detection and perception technology, high-quality datasets play an important role in algorithm innovation, model training, and performance validation. Data-driven methods such as deep learning have become key to improving radar performance in core tasks such as detection, tracking, recognition, jamming, and synthetic aperture radar (SAR) imaging. However, most existing datasets are generated by simulation, differ from real electromagnetic environments, and have limited generalization ability; moreover, existing datasets target only a single function (for example, detection only or SAR only), lack systematicity, and cannot support integrated research on detection and perception processing. To fill this gap, this paper releases a complete integrated radar detection-tracking-recognition dataset. Sourced from typical real-world measurement scenarios, the dataset covers multi-band, multi-modal data for signal processing, target tracking, fine-grained recognition, compound jamming, and high-resolution SAR imagery, faithfully reflecting radar signal propagation characteristics and target characteristics in complex environments. Furthermore, the paper systematically extracts and analyzes key features in the dataset, providing standardized feature inputs for algorithm research and performance evaluation across tasks and laying a solid foundation for research on intelligent radar signal and information processing.
Keywords: radar detection; public dataset; feature extraction; target detection; target tracking; target recognition; active jamming; SAR imagery; feature analysis
Standardizing Healthcare Datasets in China:Challenges and Strategies
6
Authors: Zheng-Yong Hu, Xiao-Lei Xiu, Jing-Yu Zhang, Wan-Fei Hu, Si-Zhu Wu. Chinese Medical Sciences Journal, 2025, No. 4, pp. 253-267, I0001 (16 pages).
Standardized datasets are foundational to healthcare informatization, enhancing data quality and unleashing the value of data elements. Using bibliometrics and content analysis, this study examines China's healthcare dataset standards from 2011 to 2025. It analyzes their evolution across types, applications, institutions, and themes, highlighting key achievements including substantial growth in quantity, an optimized typology, expansion into innovative application scenarios such as health decision support, and broadened institutional involvement. The study also identifies critical challenges, including imbalanced development, insufficient quality control, and a lack of essential metadata (such as authoritative data element mappings and privacy annotations), which hampers the delivery of intelligent services. To address these challenges, the study proposes a multi-faceted strategy focused on optimizing the standard system's architecture, enhancing quality and implementation, and advancing both data governance (through authoritative tracing and privacy protection) and intelligent service provision. These strategies aim to promote the application of dataset standards, thereby fostering and securing the development of new productive forces in healthcare.
Keywords: healthcare dataset standards; data standardization; data management
DCS-SOCP-SVM:A Novel Integrated Sampling and Classification Algorithm for Imbalanced Datasets
7
Authors: Xuewen Mu, Bingcong Zhao. Computers, Materials & Continua, 2025, No. 5, pp. 2143-2159 (17 pages).
When dealing with imbalanced datasets, the traditional support vector machine (SVM) tends to produce a classification hyperplane biased towards the majority class and exhibits poor robustness. This paper proposes a high-performance classification algorithm specifically designed for imbalanced datasets. The proposed method first uses a biased second-order cone programming support vector machine (B-SOCP-SVM) to identify the support vectors (SVs) and non-support vectors (NSVs) in the imbalanced data. It then applies the synthetic minority over-sampling technique (SV-SMOTE) to oversample the support vectors of the minority class and uses the random under-sampling technique (NSV-RUS) multiple times to undersample the non-support vectors of the majority class. Combining the resulting minority-class dataset with the multiple majority-class datasets yields multiple new balanced datasets. Finally, SOCP-SVM classifies each dataset, and the final result is obtained through an ensemble algorithm. Experimental results demonstrate that the proposed method performs excellently on imbalanced datasets.
Keywords: DCS-SOCP-SVM; imbalanced datasets; sampling method; ensemble method; integrated algorithm
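The sampling-plus-ensemble skeleton described here, SMOTE-style interpolation for the minority class, repeated random under-sampling of the majority class, one classifier per balanced set, and a final vote, can be sketched in a few lines. The SOCP-SVM and the support-vector screening are deliberately replaced by a nearest-centroid classifier; this is a minimal sketch of the resampling idea, not the paper's algorithm.

```python
import numpy as np

# Sketch: SMOTE-like oversampling + repeated random under-sampling +
# majority-vote ensemble, with a nearest-centroid stand-in classifier.

rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(200, 2))   # majority class (label 0)
X_min = rng.normal(2.5, 1.0, size=(20, 2))    # minority class (label 1)

def smote(X, n_new):
    """Interpolate between random pairs of minority samples."""
    i = rng.integers(0, len(X), n_new)
    j = rng.integers(0, len(X), n_new)
    lam = rng.random((n_new, 1))
    return X[i] + lam * (X[j] - X[i])

X_min_aug = np.vstack([X_min, smote(X_min, 80)])   # 100 minority samples

def nearest_centroid(c0, c1, X):
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return (d1 < d0).astype(int)

# Ensemble: one member per random under-sample of the majority class.
votes = []
for _ in range(5):
    idx = rng.choice(len(X_maj), size=len(X_min_aug), replace=False)
    c0 = X_maj[idx].mean(axis=0)
    c1 = X_min_aug.mean(axis=0)
    votes.append(nearest_centroid(c0, c1, np.vstack([X_maj, X_min])))

pred = (np.mean(votes, axis=0) > 0.5).astype(int)
y_true = np.array([0] * 200 + [1] * 20)
recall_minority = (pred[y_true == 1] == 1).mean()
print("minority recall:", recall_minority)
```

Restricting the oversampling to minority support vectors and the under-sampling to majority non-support vectors, as the paper does, concentrates both operations where they least distort the decision boundary; the sketch above omits that screening step.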
Development and validation of AI delineation of the thoracic RTOG organs at risk with deep learning on multi-institutional datasets
8
Authors: Xianghua Ye, Dazhou Guo, Lujun Zhao, Congying Xie, Dandan Zheng, Haihua Yang, Xiangzhi Zhu, Xin Sun, Pingping Dong, Huanhuan Li, Weiwei Kong, Jianzhong Cao, Honglei Chen, Juntao Ran, Kai Ren, Hongxin Su, Hao Hu, Cuimeng Tian, Tianlu Wang, Qiang Zeng, Xiao Hu, Ping Peng, Junhua Zhang, Li Zhang, Tingting Zhang, Lue Zhou, Wenchao Guo, Zhanghexuan Ji, Puyang Wang, Hua Zhang, Jiali Liu, Le Lu, Senxiang Yan, Dakai Jin, Feng-Ming (Spring) Kong. Intelligent Oncology, 2025, No. 1, pp. 61-71 (11 pages).
Introduction: Accurate contouring of thoracic organs at risk (OARs) is essential for minimizing complications in radiation treatment. Manual contouring of thoracic OARs is not only time-consuming but also prone to substantial user variation. To enhance efficiency and consistency, we developed a unified deep learning (DL) OAR contouring model, DeepOAR, trained on multiple partially labeled datasets to segment a comprehensive set of thoracic OARs following the Radiation Therapy Oncology Group (RTOG)-guided OAR atlas. The model supports the segmentation of six required and eight optional OARs specified by the NRG-RTOG 1106 trial, providing precise and reproducible OAR contours ready for use in radiotherapy practice. Materials and methods: Following the OAR contouring recommendation of the NRG-RTOG 1106 trial, we collected and curated three private datasets and two public datasets, comprising a total of 531 patients with partially annotated thoracic OARs. These partially annotated datasets were used to develop DeepOAR, which consists of a shared encoder and 14 separate decoders, each dedicated to one specific OAR. For model training, we used all patients from the two public datasets and 75% of the patients from the private datasets, reserving the remaining 25% of the private datasets for independent testing. A multi-user study involving 21 radiation oncologists was conducted on 40 randomly selected patients from the independent testing dataset to evaluate the clinical applicability of DeepOAR. The Dice similarity coefficient (DSC) and average surface distance (ASD) were computed to evaluate the quantitative delineation performance of the model. Results: DeepOAR outperformed nnUNet (the benchmark medical segmentation model) across all 14 OARs, achieving mean DSC and ASD values of 88.4% and 1.0 mm, respectively, on the independent testing set. Multi-user validation demonstrated that 89.7% of DeepOAR-generated OARs were clinically acceptable or required only minor revisions. A comparison using two randomly selected patients showed that the delineation variability of DeepOAR was significantly smaller than the inter-user variation among radiation oncologists. Human editing of DeepOAR's predictions further improved OAR delineation accuracy, with an average 3% increase in DSC and 40% reduction in ASD, while reducing the radiation oncologists' workload for contouring 14 thoracic OARs by an average of 77.0%. Conclusion: We developed DeepOAR, a DL-based unified contouring model trained on multiple partially labeled datasets, to delineate a comprehensive set of 14 thoracic OARs following the RTOG-guided OAR atlas. Both qualitative and quantitative results demonstrated the strong clinical applicability of DeepOAR in the OAR delineation process of thoracic cancer radiotherapy workflows, along with improved efficiency, comprehensiveness, and quality.
Keywords: NRG-RTOG 1106; OAR segmentation; deep learning; partially labeled datasets
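The two reported metrics, the Dice similarity coefficient (DSC) and the average surface distance (ASD), can be sketched on toy 2-D binary masks. The masks and the simple 4-neighbour boundary extraction below are illustrative; clinical evaluation uses 3-D CT masks and proper surface extraction in millimetres.

```python
import numpy as np

# Sketch of DSC and a simplified ASD on toy binary masks.

def dice(a, b):
    """Dice similarity coefficient of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def boundary(mask):
    """Cells of the mask that touch at least one background 4-neighbour."""
    padded = np.pad(mask, 1)
    core = (
        padded[:-2, 1:-1] & padded[2:, 1:-1]
        & padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    return mask & ~core

def asd(a, b):
    """Symmetric average of nearest boundary-to-boundary distances."""
    pa = np.argwhere(boundary(a))
    pb = np.argwhere(boundary(b))
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=2)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

# Two overlapping square "organs" on a 32x32 grid, offset by one cell.
gt = np.zeros((32, 32), dtype=bool)
pr = np.zeros((32, 32), dtype=bool)
gt[8:24, 8:24] = True
pr[9:25, 9:25] = True

print(f"DSC={dice(gt, pr):.3f}  ASD={asd(gt, pr):.2f}")
```

DSC rewards volume overlap while ASD penalizes boundary displacement, which is why the paper reports both: a contour can score well on one and poorly on the other.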
A Comprehensive Review of Face Detection Techniques for Occluded Faces:Methods,Datasets,and Open Challenges
9
Authors: Thaer Thaher, Majdi Mafarja, Muhammed Saffarini, Abdul Hakim H. M. Mohamed, Ayman A. El-Saleh. Computer Modeling in Engineering & Sciences, 2025, No. 6, pp. 2615-2673 (59 pages).
Detecting faces under occlusion remains a significant challenge in computer vision due to variations caused by masks, sunglasses, and other obstructions. Addressing this issue is crucial for applications such as surveillance, biometric authentication, and human-computer interaction. This paper provides a comprehensive review of face detection techniques developed to handle occluded faces. Studies are categorized into four main approaches: feature-based, machine learning-based, deep learning-based, and hybrid methods. We analyzed state-of-the-art studies within each category, examining their methodologies, strengths, and limitations on widely used benchmark datasets and highlighting their adaptability to partial and severe occlusions. The review also identifies key challenges, including dataset diversity, model generalization, and computational efficiency. Our findings reveal that deep learning methods dominate recent studies, benefiting from their ability to extract hierarchical features and handle complex occlusion patterns. More recently, researchers have increasingly explored Transformer-based architectures, such as the Vision Transformer (ViT) and Swin Transformer, to further improve detection robustness under challenging occlusion scenarios. In addition, hybrid approaches, which combine traditional and modern techniques, are emerging as a promising direction for improving robustness. This review provides valuable insights for researchers aiming to develop more robust face detection systems and for practitioners seeking to deploy reliable solutions in real-world, occlusion-prone environments. Further improvements and broader datasets are required to develop more scalable, robust, and efficient models that can handle complex occlusions in real-world scenarios.
Keywords: occluded face detection; feature-based; deep learning; machine learning; hybrid approaches; datasets
Impact of climate changes on Arizona State precipitation patterns using high-resolution climatic gridded datasets
10
Authors: Hayder H. Kareem, Shahla Abdulqader Nassrullah. Journal of Groundwater Science and Engineering, 2025, No. 1, pp. 34-46 (13 pages).
Climate change significantly affects the environment, ecosystems, communities, and economies. These impacts often result in both rapid and gradual changes in water resources, environmental conditions, and weather patterns. A geographical study was conducted in Arizona State, USA, to examine monthly precipitation concentration rates over time. The analysis used high-resolution 0.5° × 0.5° gridded monthly precipitation data from 1961 to 2022, provided by the Climatic Research Unit (CRU). The study aimed to analyze how climatic changes affected the first and last five years of each decade, as well as entire decades, during the specified period; GIS was used to meet these objectives. Arizona received 51-568 mm, 67-560 mm, 63-622 mm, and 52-590 mm of rainfall in the sixth, seventh, eighth, and ninth decades of the second millennium, respectively. Both the first and second five-year periods of each decade showed acceptable rainfall amounts despite fluctuations. However, rainfall decreased in the first and second decades of the third millennium and in the first two years of the third decade, dropping to 42-472 mm, 55-469 mm, and 74-498 mm, respectively, indicating a downward trend in precipitation. The central part of the state received the highest rainfall, while the eastern and western regions (spanning north to south) received significantly less. Over the decades of the third millennium, the average annual rainfall in each five-year period was relatively low and declining due to severe climate changes, generally ranging between 35 mm and 498 mm. The central regions consistently received more rainfall than the eastern and western outskirts. Arizona is currently experiencing a decrease in rainfall due to climate change, a situation that could deteriorate further. This highlights the need to optimize the use of existing rainfall and explore alternative water sources.
Keywords: spatial analysis; climate impact; precipitation rates; CRU dataset; GIS; Arizona State; USA
A standardized dataset of CO-TPD spectra on transition-metal single-crystal surfaces
11
Authors: YANG Lin, WU Jianghong, WANG He. 《燃料化学学报(中英文)》 (Journal of Fuel Chemistry and Technology, Peking University Core), 2026, No. 4, pp. 180-190 (11 pages).
Temperature-programmed desorption (TPD) is a fundamental technique in surface science and heterogeneous catalysis for characterizing adsorption behavior and for extracting key parameters such as adsorption energy. However, the majority of existing TPD data are accessible only as published images rather than as structured, quantitative datasets, which constrains their utility for rigorous quantitative analysis and computational modelling. Using carbon monoxide (CO), a widely adopted probe molecule, a curated and standardized CO-TPD dataset is constructed, encompassing 14 transition-metal single-crystal surfaces, including copper (Cu) and ruthenium (Ru). By systematically extracting numerical data points from published spectra and applying normalization, essential spectral features such as peak shape are fully preserved. The dataset also documents relevant experimental parameters, including heating rates, and was developed using a standardized protocol for data collection and quality control. This resource serves both as a reference library to support the deconvolution of TPD spectra from complex catalysts and as an experimental benchmark for calibrating parameters in theoretical models. By providing a reliable and accessible data foundation, this work advances the microscopic understanding and rational design of catalyst active centers.
Keywords: CO-TPD; standardized dataset; transition-metal single-crystal surfaces
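The kind of standardization described, digitized points resampled onto a common temperature grid and normalized so that peak shape and position survive, can be sketched as follows. The Gaussian toy peak and the irregular temperature axis stand in for a digitized CO-TPD trace; the dataset's actual protocol may differ in details.

```python
import numpy as np

# Sketch: resample a digitized TPD trace onto a shared temperature grid
# and peak-normalize it, preserving peak shape and position.

def standardize_tpd(temps, signal, grid):
    """Interpolate onto a shared grid and scale to unit peak intensity."""
    resampled = np.interp(grid, temps, signal)
    return resampled / resampled.max()

# Toy desorption peak near 450 K sampled on an irregular axis,
# as would come out of digitizing a published figure.
rng = np.random.default_rng(7)
temps = np.sort(rng.uniform(300, 600, 120))
signal = np.exp(-((temps - 450.0) ** 2) / (2 * 25.0 ** 2))

grid = np.linspace(300, 600, 301)           # shared 1 K grid
spec = standardize_tpd(temps, signal, grid)
peak_T = grid[spec.argmax()]
print("peak temperature (K):", peak_T)
```

Putting every digitized spectrum on the same grid and intensity scale is what makes traces from different papers directly comparable, and usable as reference components when deconvolving TPD spectra of complex catalysts.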
The Development of Artificial Intelligence:Toward Consistency in the Logical Structures of Datasets,AI Models,Model Building,and Hardware?
12
Authors: Li Guo, Jinghai Li. Engineering, 2025, No. 7, pp. 13-17 (5 pages).
The aim of this article is to explore potential directions for the development of artificial intelligence (AI). It points out that, while current AI can handle the statistical properties of complex systems, it has difficulty effectively processing and fully representing their spatiotemporal complexity patterns. The article also discusses a potential path for AI development in the engineering domain. Based on the existing understanding of the principles of multilevel complexity, the article suggests that consistency among the logical structures of datasets, AI models, model-building software, and hardware will be an important direction for AI development and is worthy of careful consideration.
Keywords: consistency; datasets; AI models; model building; artificial intelligence; hardware
A Comprehensive Review of Face Detection/Recognition Algorithms and Competitive Datasets to Optimize Machine Vision
13
Authors: Mahmood Ul Haq, Muhammad Athar Javed Sethi, Sadique Ahmad, Naveed Ahmad, Muhammad Shahid Anwar, Alpamis Kutlimuratov. Computers, Materials & Continua, 2025, No. 7, pp. 1-24 (24 pages).
Face recognition has emerged as one of the most prominent applications of image analysis and understanding, gaining considerable attention in recent years. This growing interest is driven by two key factors: its extensive applications in law enforcement and the commercial domain, and the rapid advancement of practical technologies. Despite significant advancements, modern recognition algorithms still struggle in real-world conditions such as varying lighting, occlusion, and diverse facial postures; in such scenarios, human perception remains well above the capabilities of present technology. Using a systematic mapping study, this paper presents an in-depth review of face detection and face recognition algorithms, surveying advancements made between 2015 and 2024. We analyze key methodologies, highlighting their strengths and restrictions in the application context. Additionally, we examine the various datasets used for face detection and recognition, focusing on task-specific applications, size, diversity, and complexity. By analyzing these algorithms and datasets, this survey serves as a valuable resource for researchers, identifying the research gaps in the field of face detection and recognition and outlining potential directions for future research.
Keywords: face recognition algorithms; face detection techniques; face recognition/detection datasets
A critical evaluation of deep-learning based phylogenetic inference programs using simulated datasets
14
Authors: Yixiao Zhu, Yonglin Li, Chuhao Li, Xing-Xing Shen, Xiaofan Zhou. Journal of Genetics and Genomics, 2025, No. 5, pp. 714-717 (4 pages).
Inferring phylogenetic trees from molecular sequences is a cornerstone of evolutionary biology. Many standard phylogenetic methods (such as maximum likelihood [ML]) rely on explicit models of sequence evolution and thus often suffer from model misspecification or inadequacy. Rising deep learning (DL) techniques offer a powerful alternative. Deep learning employs multi-layered artificial neural networks to progressively transform input data into more abstract and complex representations; DL methods can autonomously uncover meaningful patterns from data, thereby bypassing potential biases introduced by predefined features (Franklin, 2005; Murphy, 2012). Recent efforts have aimed to apply deep neural networks (DNNs) to phylogenetics, with a growing number of applications in tree reconstruction (Suvorov et al., 2020; Zou et al., 2020; Nesterenko et al., 2022; Smith and Hahn, 2023; Wang et al., 2023), substitution model selection (Abadi et al., 2020; Burgstaller-Muehlbacher et al., 2023), and diversification rate inference (Voznica et al., 2022; Lajaaiti et al., 2023; Lambert et al., 2023). In phylogenetic tree reconstruction, PhyDL (Zou et al., 2020) and Tree_learning (Suvorov et al., 2020) are two notable DNN-based programs designed to infer unrooted quartet trees directly from alignments of four amino acid (AA) and DNA sequences, respectively.
Keywords: phylogenetic inference; models of sequence evolution; deep learning; molecular sequences; simulated datasets
Detection Method for Bolt Loosening of Fan Base through Bayesian Learning with Small Dataset:A Real-World Application
15
Authors: Zhongyun Tang, Hanyi Xu, Haiyang Hu. Computers, Materials & Continua, 2026, No. 2, pp. 550-578 (29 pages).
With the deep integration of smart manufacturing and IoT technologies, higher demands are placed on the intelligence and real-time performance of industrial equipment fault detection. For industrial fans, base bolt loosening faults are difficult to identify through conventional spectrum analysis, and the extreme scarcity of fault data leads to limited training datasets, making traditional deep learning methods inaccurate in fault identification and incapable of detecting loosening severity. This paper employs Bayesian learning, training on a small fault dataset collected from the actual operation of axial-flow fans in a factory to obtain a posterior distribution. The method includes specific data processing approaches and a configuration of a Bayesian convolutional neural network (BCNN), which effectively improve the model's generalization ability. Experimental results demonstrate high detection accuracy and alignment with real-world applications, offering practical significance and reference value for industrial fan bolt loosening detection under data-limited conditions.
Keywords: bolt loosening detection; industrial small dataset; Bayesian learning; interpretability; real-world application
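One common, lightweight route to the Bayesian-style predictive uncertainty that motivates a BCNN is Monte Carlo dropout: evaluate the network many times with random dropout masks at inference time and read the spread of the outputs as uncertainty. The toy network, weights, and feature vector below are invented; the paper's actual BCNN configuration is not reproduced.

```python
import numpy as np

# Sketch of Monte Carlo dropout as a stand-in for Bayesian inference:
# repeated stochastic forward passes give a predictive mean and spread.

rng = np.random.default_rng(3)
W1 = rng.normal(size=(8, 16))       # toy hidden layer weights
w2 = rng.normal(size=16)            # toy output layer weights

def forward(x, p_drop=0.5):
    h = np.maximum(x @ W1, 0.0)                     # ReLU
    mask = rng.random(h.shape) > p_drop             # dropout at test time
    h = h * mask / (1.0 - p_drop)                   # inverted scaling
    return 1.0 / (1.0 + np.exp(-(h @ w2)))          # P(bolt loosened)

x = rng.normal(size=8)              # one hypothetical vibration feature vector
samples = np.array([forward(x) for _ in range(200)])
mean, std = samples.mean(), samples.std()
print(f"P(loosened) ~ {mean:.2f} +/- {std:.2f}")
```

On a small dataset, the spread matters as much as the point estimate: a prediction with high variance across stochastic passes flags an input the model has too little evidence to judge, which is exactly the failure mode scarce fault data creates.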
Impacts of random negative training datasets on machine learning-based geologic hazard susceptibility assessment
16
作者 Hao Cheng Wei Hong +3 位作者 Zhen-kai Zhang Zeng-lin Hong Zi-yao Wang Yu-xuan Dong 《China Geology》 2025年第4期676-690,共15页
This study investigated the impacts of random negative training datasets (NTDs) on the uncertainty of machine learning models for geologic hazard susceptibility assessment of the Loess Plateau, northern Shaanxi Province, China. Based on 40 randomly generated NTDs, the study developed models for geologic hazard susceptibility assessment using the random forest algorithm and evaluated their performance using the area under the receiver operating characteristic curve (AUC). Specifically, the means and standard deviations of the AUC values from all models were used to assess the overall spatial correlation between the conditioning factors and the susceptibility assessment, as well as the uncertainty introduced by the NTDs. A risk and return methodology was then employed to quantify and mitigate this uncertainty, with log odds ratios used to characterize the susceptibility assessment levels. The risk and return values were calculated from the standard deviations and means, respectively, of the log odds ratios at each location. After the mean log odds ratios were converted into probability values, the final susceptibility map was plotted, which accounts for the uncertainty induced by random NTDs. The results indicate that the AUC values of the models ranged from 0.810 to 0.963, with an average of 0.852 and a standard deviation of 0.035, indicating encouraging predictive performance alongside a degree of uncertainty. The risk and return analysis reveals that low-risk, high-return areas correspond to lower standard deviations and higher means across the model-derived assessments. Overall, this study introduces a new framework for quantifying the uncertainty of multiple training and evaluation models, aimed at improving their robustness and reliability. Additionally, by identifying low-risk, high-return areas, resource allocation for geologic hazard prevention and control can be optimized, ensuring that limited resources are directed toward the most effective prevention and control measures.
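The risk and return construction described in the abstract (risk as the standard deviation and return as the mean of a location's log odds ratios across the NTD-driven models, with the mean mapped back to a susceptibility probability via the logistic function) can be sketched as follows; the function name and toy values are illustrative, not from the paper:

```python
import math
import random
import statistics

def risk_return(log_odds_samples):
    """Risk = standard deviation, return = mean of the log odds ratios
    predicted for one location by models trained on different random NTDs.
    The mean log odds ratio is converted back to a probability."""
    mean_lo = statistics.mean(log_odds_samples)
    risk = statistics.stdev(log_odds_samples)
    probability = 1.0 / (1.0 + math.exp(-mean_lo))  # logistic transform
    return risk, mean_lo, probability

# Example: log odds predicted for one grid cell by 40 models.
rng = random.Random(0)
samples = [rng.gauss(1.2, 0.3) for _ in range(40)]
risk, ret, prob = risk_return(samples)
print(f"risk={risk:.3f} return={ret:.3f} p={prob:.3f}")
```

A cell with low `risk` and high `ret` is one on which the 40 models agree and assign high susceptibility, which is why such areas are singled out for prevention resources.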
Keywords: landslides; debris flows; collapses; ground fissures; geologic hazard prevention and control; engineering; geologic hazard susceptibility assessment; negative training dataset; average spatial correlation; random forest algorithm; risk and return analysis; geological survey engineering; Loess Plateau area
Efficient Dataset Generation for Stacked Meat Products Instance Segmentation in Food Automation
17
Authors: Hoang Minh Pham, Anh Dong Le, Pablo Malvido-Fresnillo, Saigopal Vasudevan, José L. Martínez Lastra. IEEE/CAA Journal of Automatica Sinica, 2026, Issue 1, pp. 224-226 (3 pages)
Dear Editor, This letter presents techniques to simplify dataset generation for instance segmentation of raw meat products, a critical step toward automating food production lines. Accurate segmentation is essential for addressing challenges such as occlusions, indistinct edges, and stacked configurations, which demand large, diverse datasets. To meet these demands, we propose two complementary approaches: a semi-automatic annotation interface using tools such as the segment anything model (SAM) and GrabCut, and a synthetic data generation pipeline leveraging 3D-scanned models. These methods reduce reliance on real meat, mitigate food waste, and improve scalability. Experimental results demonstrate that incorporating synthetic data enhances segmentation model performance and, when combined with real data, further boosts accuracy, paving the way for more efficient automation in the food industry.
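The appeal of the synthetic pipeline is that stacked scenes composed from scanned models come with free, pixel-perfect annotations. A toy occlusion-aware compositor shows the idea: later instances overwrite earlier ones, and visible masks plus tight bounding boxes fall out of the ownership grid. The rectangular "cuts" and the COCO-style xywh output are our simplifications, not the authors' pipeline:

```python
import random

def compose_scene(canvas_w, canvas_h, n_items, rng):
    """Paste n_items randomly placed rectangular 'cuts' onto a blank
    canvas, with later pastes occluding earlier ones, and return the
    tight bounding box (COCO-style x, y, w, h) of each visible region."""
    owner = [[-1] * canvas_w for _ in range(canvas_h)]
    for idx in range(n_items):
        w, h = rng.randint(8, 20), rng.randint(8, 20)
        x0, y0 = rng.randint(0, canvas_w - w), rng.randint(0, canvas_h - h)
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                owner[y][x] = idx  # later instances occlude earlier ones
    boxes = {}
    for y in range(canvas_h):
        for x in range(canvas_w):
            idx = owner[y][x]
            if idx < 0:
                continue
            xa, ya, xb, yb = boxes.get(idx, (x, y, x, y))
            boxes[idx] = (min(xa, x), min(ya, y), max(xb, x), max(yb, y))
    return {i: (xa, ya, xb - xa + 1, yb - ya + 1)
            for i, (xa, ya, xb, yb) in boxes.items()}

print(compose_scene(64, 64, 3, random.Random(42)))
```

A real pipeline would render textured 3D scans instead of rectangles, but the annotation bookkeeping is the same: ground truth is derived from the compositing order, never hand-drawn.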
Keywords: dataset generation; segment anything model (SAM); food automation; raw meat products; automating food production lines; instance segmentation; stacked meat products; semi-automatic annotation
Comparisons of cropland area from multiple datasets over the past 300 years in the traditional cultivated region of China (Cited: 22)
18
Authors: HE Fanneng, LI Shicheng, ZHANG Xuezhen, GE Quansheng, DAI Junhu. Journal of Geographical Sciences (SCIE, CSCD), 2013, Issue 6, pp. 978-990 (13 pages)
Land use/cover change is an important parameter in climate and ecological simulations. Although the SAGE and HYDE datasets, the two representative global historical land use datasets, have been widely used in the community, their accuracy at the regional scale has seldom been assessed. Here, we assessed both for the traditional cultivated region of China (TCRC) over the last 300 years by comparing SAGE2010 and HYDE (v3.1) with the Chinese Historical Cropland Dataset (CHCD). The comparisons were performed at three spatial scales: the entire study area, provincial areas, and 60 km by 60 km grid cells. The results show that (1) the cropland area from SAGE2010 was much larger than that from CHCD; moreover, its growth at a rate of 0.51% from 1700 to 1950 and -0.34% after 1950 was also inconsistent with CHCD. (2) HYDE (v3.1) was closer to CHCD than SAGE over the entire study area; however, large biases were detected at the provincial and 60 km by 60 km grid-cell scales. Grid cells with biases greater than 70% (<-70% or >70%) and 90% (<-90% or >90%) accounted for 56%-63% and 40%-45% of all grid cells, respectively, while those with biases from -10% to 10% and from -30% to 30% accounted for only 5%-6% and 17%, respectively. (3) Using local historical archives to reconstruct historical datasets with high accuracy would be a valuable way to improve the accuracy of climate and ecological simulations.
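The grid-cell bias statistics reported in (2) are straightforward to compute once both datasets are on a common grid. A minimal sketch, with the dataset values and band thresholds as illustrative stand-ins for the CHCD comparison:

```python
def bias_distribution(reference, estimate,
                      bands=((-10, 10), (-30, 30), (-70, 70), (-90, 90))):
    """Relative bias (%) of an estimated cropland area against a
    reference, per grid cell, plus the share of cells inside and
    outside each symmetric band. Cells with zero reference area
    are skipped to avoid division by zero."""
    biases = [100.0 * (e - r) / r for r, e in zip(reference, estimate) if r > 0]
    n = len(biases)
    summary = {}
    for lo, hi in bands:
        inside = sum(1 for b in biases if lo <= b <= hi)
        summary[f"within ±{hi}%"] = round(100.0 * inside / n, 1)
        summary[f"beyond ±{hi}%"] = round(100.0 * (n - inside) / n, 1)
    return summary

# Toy cropland areas (km^2) for five grid cells.
ref = [120.0, 80.0, 50.0, 200.0, 10.0]
est = [130.0, 60.0, 110.0, 190.0, 1.0]
print(bias_distribution(ref, est))
```

Reading the result: a large "beyond ±70%" share, as reported for HYDE vs. CHCD, means the dataset can look acceptable in aggregate while being badly wrong cell by cell.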
Keywords: cropland; dataset comparisons; past 300 years; traditional cultivated region; China
The spatial local accuracy of land cover datasets over the Qiangtang Plateau, High Asia (Cited: 3)
19
Authors: LIU Qionghuan, ZHANG Yili, LIU Linshan, LI Lanhui, QI Wei. Journal of Geographical Sciences (SCIE, CSCD), 2019, Issue 11, pp. 1841-1858 (18 pages)
We analyzed the spatial local accuracy of land cover (LC) datasets for the Qiangtang Plateau, High Asia, incorporating 923 field sampling points and seven LC compilations: the International Geosphere Biosphere Programme Data and Information System (IGBPDIS), Global Land cover mapping at 30 m resolution (GlobeLand30), the MODIS Land Cover Type product (MCD12Q1), Climate Change Initiative Land Cover (CCI-LC), Global Land Cover 2000 (GLC2000), University of Maryland (UMD), and GlobCover 2009 (GlobCover). We first compared similarities and differences in both area and spatial patterns and analyzed their relationships with the underlying data sources. We then applied a geographically weighted regression (GWR) approach to predict local accuracy variation. The results reveal distinct differences, even inverse time-series trends, between the CCI-LC and MCD12Q1 data for 2001-2015, in addition to areal discordance among the categories of the seven datasets. The datasets also show evident discrepancies in spatial patterns: high spatial congruence is mainly seen in the homogeneous southeastern region of the study area, while low spatial congruence is widely distributed across the heterogeneous northwestern and northeastern regions. The overall combined spatial accuracy of the seven LC datasets is less than 70%, and the GlobeLand30 and CCI-LC datasets exhibit higher local accuracy than their counterparts, yielding maximum overall accuracy (OA) values of 77.39% and 61.43%, respectively. Finally, 5.63% of the area is characterized by both high assessment and accuracy (HH) values, mainly located in the central and eastern Qiangtang Plateau, while most low-accuracy regions are found in the northern, northeastern, and western regions.
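GWR fits a separate regression at each prediction point, down-weighting distant samples. A one-predictor version can be sketched in a few lines; the Gaussian kernel, bandwidth, and sample values are assumptions for illustration, not the study's configuration:

```python
import math

def gwr_local_fit(points, target_xy, bandwidth):
    """One-predictor geographically weighted regression: fit a local
    accuracy ~ predictor line at target_xy, with Gaussian distance
    decay. points: list of (x, y, predictor, accuracy) field samples.
    Returns (intercept, slope) of the local weighted least-squares fit."""
    w = [math.exp(-0.5 * (math.dist((px, py), target_xy) / bandwidth) ** 2)
         for px, py, _, _ in points]
    sw = sum(w)
    xbar = sum(wi * p for wi, (_, _, p, _) in zip(w, points)) / sw
    ybar = sum(wi * a for wi, (_, _, _, a) in zip(w, points)) / sw
    num = sum(wi * (p - xbar) * (a - ybar) for wi, (_, _, p, a) in zip(w, points))
    den = sum(wi * (p - xbar) ** 2 for wi, (_, _, p, _) in zip(w, points))
    slope = num / den
    return ybar - slope * xbar, slope

# Toy samples: (x, y, terrain heterogeneity, observed accuracy).
samples = [(0, 0, 1.0, 0.60), (1, 0, 2.0, 0.70),
           (5, 5, 1.0, 0.90), (6, 5, 2.0, 0.95)]
print(gwr_local_fit(samples, (0, 0), bandwidth=2.0))
```

Because the coefficients vary with `target_xy`, GWR can express exactly the kind of spatial pattern the study reports: accuracy that behaves differently in the homogeneous southeast than in the heterogeneous northwest.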
Keywords: land cover datasets; spatial accuracy assessment; remote sensing; Qiangtang Plateau; High Asia
Performances of Seven Datasets in Presenting the Upper Ocean Heat Content in the South China Sea (Cited: 2)
20
Authors: CHEN Xiao, YAN Youfang, CHENG Xuhua, QI Yiquan. Advances in Atmospheric Sciences (SCIE, CAS, CSCD), 2013, Issue 5, pp. 1331-1342 (12 pages)
In this study, the upper ocean heat content (OHC) variations in the South China Sea (SCS) during 1993-2006 were investigated by examining ocean temperatures in seven datasets: World Ocean Atlas 2009 (WOA09) (climatology), the Ishii datasets, the Ocean General Circulation Model for the Earth Simulator (OFES), the Simple Ocean Data Assimilation system (SODA), the Global Ocean Data Assimilation System (GODAS), the China Oceanic ReAnalysis system (CORA), and an ocean reanalysis dataset for the joining area of Asia and the Indian-Pacific Ocean (AIPO1.0). Among these datasets, two were independent of any numerical model, four relied on data assimilation, and one was generated without any data assimilation. The annual cycles revealed by the seven datasets were similar, but the interannual variations differed. Vertical temperature structures along the 18°N, 12.75°N, and 120°E sections were compared with data collected during open cruises in 1998 and 2005-08. The results indicated that Ishii, OFES, CORA, and AIPO1.0 were more consistent with the observations. Through systematic comparisons, we found that each dataset had its own shortcomings and advantages in presenting the upper OHC in the SCS.
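Upper OHC itself is a density-weighted vertical integral of temperature over the chosen layer. A minimal sketch with typical seawater constants; the profile values are illustrative, not taken from the datasets compared here:

```python
def upper_ohc(depths_m, temps_c, rho=1025.0, cp=3985.0):
    """Upper-ocean heat content (J m^-2) from a discrete temperature
    profile via trapezoidal integration over depth. rho (kg m^-3) and
    cp (J kg^-1 K^-1) are typical seawater values, assumed constant."""
    ohc = 0.0
    for (z0, t0), (z1, t1) in zip(zip(depths_m, temps_c),
                                  zip(depths_m[1:], temps_c[1:])):
        ohc += rho * cp * 0.5 * (t0 + t1) * (z1 - z0)
    return ohc

# A schematic upper-400 m tropical profile (illustrative values only).
depths = [0, 50, 100, 200, 400]
temps = [29.0, 28.0, 24.0, 16.0, 9.0]
print(f"{upper_ohc(depths, temps):.3e} J m^-2")
```

Because every dataset supplies temperature on its own depth levels, disagreements in OHC between the seven products trace back directly to differences in these vertical profiles, which is why the cruise-section comparisons are the decisive test.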
Keywords: South China Sea; ocean heat content; multiple datasets; interannual variability