Arrhythmias are a frequently occurring phenomenon in clinical practice, but accurately distinguishing subtle rhythm abnormalities remains an ongoing difficulty faced by the entire research community in ECG-based studies. From a review of existing studies, two main factors appear to contribute to this problem: the uneven distribution of arrhythmia classes and the limited expressiveness of features learned by current models. To overcome these limitations, this study proposes a dual-path multimodal framework, termed DM-EHC (Dual-Path Multimodal ECG Heartbeat Classifier), for ECG-based heartbeat classification. The proposed framework links 1D ECG temporal features with 2D time–frequency features. By setting up these dual paths, the model can exploit more dimensions of feature information. The MIT-BIH arrhythmia database was selected as the baseline dataset for the experiments. Experimental results show that the proposed method outperforms single-modality baselines and performs better on certain specific types of arrhythmias. The model achieved mean precision, recall, and F1 score of 95.14%, 92.26%, and 93.65%, respectively. These results indicate that the framework is robust and has potential value in automated arrhythmia classification.
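The dual-path idea can be sketched as follows. This is a minimal illustration, not the authors' DM-EHC implementation: the function names and the toy beat are ours, and learned network features are replaced by simple summary statistics and a short-time DFT spectrogram, concatenated before classification.

```python
import cmath

def temporal_features(beat):
    """1D path: summary statistics of the raw beat (illustrative stand-ins
    for learned temporal features)."""
    n = len(beat)
    mean = sum(beat) / n
    var = sum((x - mean) ** 2 for x in beat) / n
    return [mean, var, max(beat), min(beat)]

def stft_magnitudes(beat, win=8, hop=4):
    """2D path: magnitude spectrogram via a short-time DFT, flattened."""
    feats = []
    for start in range(0, len(beat) - win + 1, hop):
        frame = beat[start:start + win]
        for k in range(win // 2):  # keep only the non-redundant bins
            coef = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / win)
                       for t in range(win))
            feats.append(abs(coef))
    return feats

def dual_path_features(beat):
    """Concatenate both paths, mimicking the dual-path fusion idea."""
    return temporal_features(beat) + stft_magnitudes(beat)

beat = [0.0, 0.1, 0.9, 0.2, -0.3, 0.0, 0.05, 0.0, 0.0, 0.1, 0.8, 0.15]
fv = dual_path_features(beat)
```

In the real framework each path would be a learned network branch; the point here is only that the fused vector carries both waveform-domain and time–frequency information.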
With the recent increase in data volume and diversity, traditional text representation techniques struggle to capture context, particularly in environments with sparse data. To address these challenges, this study proposes a new model, the Masked Joint Representation Model (MJRM). MJRM approximates the original hypothesis by leveraging multiple elements in a limited context. It dynamically adapts to changes in characteristics based on data distribution through three main components. First, masking-based representation learning, termed selective dynamic masking, integrates topic modeling and sentiment clustering to generate and train multiple instances across different data subsets, whose predictions are then aggregated with optimized weights. This design alleviates sparsity, suppresses noise, and preserves contextual structures. Second, regularization-based improvements are applied. Third, techniques for addressing sparse data are used to perform final inference. As a result, MJRM improves performance by up to 4% compared to existing AI techniques. In our experiments, we analyzed the contribution of each factor, demonstrating that masking, dynamic learning, and aggregating multiple instances complement each other to improve performance. This demonstrates that a masking-based multi-learning strategy is effective for context-aware sparse text classification and can be useful even in challenging situations such as data shortage or shifts in data distribution. We expect that the approach can be extended to diverse fields such as sentiment analysis, spam filtering, and domain-specific document classification.
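The aggregation step described above, combining predictions from multiple instances with optimized weights, can be sketched as follows. This is a hypothetical illustration with fixed weights; in MJRM the weights would be learned, and the per-instance predictions would come from models trained on different data subsets.

```python
def aggregate(predictions, weights):
    """Weighted aggregation of per-instance class-probability predictions.
    Weights are normalized so the fused output remains a distribution."""
    total = sum(weights)
    n_classes = len(predictions[0])
    fused = [0.0] * n_classes
    for probs, w in zip(predictions, weights):
        for c, p in enumerate(probs):
            fused[c] += (w / total) * p
    return fused

# three instances trained on different data subsets, two classes
preds = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
fused = aggregate(preds, weights=[0.5, 0.3, 0.2])
label = max(range(len(fused)), key=fused.__getitem__)
```

The design choice being illustrated: a weak instance (here the third one) is down-weighted rather than discarded, so it still contributes signal without dominating the vote.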
Distributed Denial-of-Service (DDoS) attacks pose severe threats to Industrial Control Networks (ICNs), where service disruption can cause significant economic losses and operational risks. Existing signature-based methods are ineffective against novel attacks, and traditional machine learning models struggle to capture the complex temporal dependencies and dynamic traffic patterns inherent in ICN environments. To address these challenges, this study proposes a deep feature-driven hybrid framework that integrates a Transformer, BiLSTM, and KNN to achieve accurate and robust DDoS detection. The Transformer component extracts global temporal dependencies from network traffic flows, while the BiLSTM captures fine-grained sequential dynamics. The learned embeddings are then classified using an instance-based KNN layer, enhancing decision-boundary precision. This cascaded architecture balances feature abstraction and locality preservation, improving both generalization and robustness. The proposed approach was evaluated on a newly collected real-time ICN traffic dataset and further validated on the public CIC-IDS2017 and Edge-IIoT datasets to demonstrate generalization. Comprehensive metrics including accuracy, precision, recall, F1-score, ROC-AUC, PR-AUC, false positive rate (FPR), and detection latency were employed. Results show that the hybrid framework achieves 98.42% accuracy with an ROC-AUC of 0.992 and an FPR below 1%, outperforming baseline machine learning and deep learning models. Robustness experiments under Gaussian noise perturbations confirmed stable performance with less than 2% accuracy degradation. Moreover, detection latency remained below 2.1 ms per sample, indicating suitability for real-time ICS deployment. In summary, the proposed hybrid temporal learning and instance-based classification model offers a scalable and effective solution for DDoS detection in industrial control environments. By combining global contextual modeling, sequential learning, and instance-based refinement, the framework demonstrates strong adaptability across datasets and resilience against noise, providing practical utility for safeguarding critical infrastructure.
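The final instance-based KNN stage can be sketched as below. This is a toy illustration, not the paper's pipeline: the 2-D "embeddings" stand in for the Transformer+BiLSTM outputs, and the names and data are ours.

```python
import math
from collections import Counter

def knn_predict(embeddings, labels, query, k=3):
    """Instance-based KNN layer: classify a learned embedding by majority
    vote among its k nearest training embeddings (Euclidean distance)."""
    dists = sorted(
        (math.dist(e, query), y) for e, y in zip(embeddings, labels)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# toy 2-D embeddings standing in for deep traffic features
train = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
labels = ["benign", "benign", "ddos", "ddos"]
pred = knn_predict(train, labels, query=(0.85, 0.85), k=3)
```

Classifying in the learned embedding space, rather than on raw traffic features, is what lets the KNN stage draw sharp decision boundaries: the upstream networks have already pulled attack and benign flows apart.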
Skin diseases affect millions worldwide. Early detection is key to preventing disfigurement, lifelong disability, or death. Dermoscopic images acquired in primary-care settings show high intra-class visual similarity and severe class imbalance, and occasional imaging artifacts can create ambiguity for state-of-the-art convolutional neural networks (CNNs). We frame skin lesion recognition as graph-based reasoning and, to ensure fair evaluation and avoid data leakage, adopt a strict lesion-level partitioning strategy. Each image is first over-segmented using SLIC (Simple Linear Iterative Clustering) to produce perceptually homogeneous superpixels. These superpixels form the nodes of a region-adjacency graph whose edges encode spatial continuity. Node attributes are 1280-dimensional embeddings extracted with a lightweight yet expressive EfficientNet-B0 backbone, providing strong representational power at modest computational cost. The resulting graphs are processed by a five-layer Graph Attention Network (GAT) that learns to weight inter-node relationships dynamically and aggregates multi-hop context before classifying lesions into seven classes with a log-softmax output. Extensive experiments on the DermaMNIST benchmark show the proposed pipeline achieves 88.35% accuracy and 98.04% AUC, outperforming contemporary CNNs, AutoML approaches, and alternative graph neural networks. An ablation study indicates EfficientNet-B0 produces superior node descriptors compared with ResNet-18 and DenseNet, and that roughly five GAT layers strike a good balance between being too shallow and too deep while avoiding oversmoothing. The method requires no data augmentation or external metadata, making it a drop-in upgrade for clinical computer-aided diagnosis systems.
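The superpixels-to-graph step can be sketched as follows. This is an illustrative reduction, assuming a superpixel label map has already been produced (e.g., by SLIC): two superpixels become connected nodes whenever any of their pixels are 4-neighbours, which is exactly the "edges encode spatial continuity" rule.

```python
def region_adjacency_graph(labels):
    """Build region-adjacency edges from a 2-D superpixel label map:
    two superpixels are connected if any of their pixels touch
    horizontally or vertically."""
    edges = set()
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols and labels[r][c] != labels[nr][nc]:
                    edges.add(frozenset((labels[r][c], labels[nr][nc])))
    return edges

# toy 4x4 label map with three "superpixels"
lab = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [2, 2, 1, 1],
       [2, 2, 2, 2]]
edges = region_adjacency_graph(lab)
```

In the full pipeline each node would then carry its 1280-dimensional EfficientNet-B0 embedding, and the edge set here would define the GAT's message-passing structure.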
The generation of high-quality 3D models from single 2D images remains challenging in terms of accuracy and completeness. Deep learning has emerged as a promising solution, offering new avenues for improvement. However, building models from scratch is computationally expensive and requires large datasets. This paper presents a transfer-learning-based approach for category-specific 3D reconstruction from a single 2D image. The core idea is to fine-tune a pre-trained model on specific object categories using new, unseen data, resulting in specialized versions of the model that are better adapted to reconstruct particular objects. The proposed approach uses a three-phase pipeline comprising image acquisition, 3D reconstruction, and refinement. After ensuring the quality of the input image, a ResNet50 model is used for object recognition, directing the image to the corresponding category-specific model to generate a voxel-based representation. The voxel-based 3D model is then refined by transforming it into a detailed triangular mesh using the Marching Cubes algorithm and Laplacian smoothing. An experimental study, using the Pix2Vox model and the Pascal3D dataset, was conducted to evaluate and validate the effectiveness of the proposed approach. Results demonstrate that category-specific fine-tuning of Pix2Vox significantly outperforms both the original model and a general model fine-tuned on all object categories, with substantial gains in Intersection over Union (IoU) scores. Visual assessments confirm improvements in geometric detail and surface realism. These findings indicate that combining transfer learning with category-specific fine-tuning and a refinement strategy leads to better-quality 3D model generation.
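The Laplacian smoothing used in the refinement phase can be sketched as follows. This is a generic umbrella-operator version under our own assumptions (vertex/face lists, a fixed step size), not the paper's exact parameters: each vertex is pulled toward the average of its mesh neighbours, which flattens the stair-step noise Marching Cubes leaves behind.

```python
def laplacian_smooth(vertices, faces, lam=0.5, iters=5):
    """Umbrella Laplacian smoothing: move each vertex a fraction `lam`
    of the way toward the centroid of its neighbours, `iters` times."""
    nbrs = {i: set() for i in range(len(vertices))}
    for a, b, c in faces:
        nbrs[a] |= {b, c}; nbrs[b] |= {a, c}; nbrs[c] |= {a, b}
    verts = [list(v) for v in vertices]
    for _ in range(iters):
        new = []
        for i, v in enumerate(verts):
            if not nbrs[i]:
                new.append(v); continue
            avg = [sum(verts[j][k] for j in nbrs[i]) / len(nbrs[i])
                   for k in range(3)]
            new.append([v[k] + lam * (avg[k] - v[k]) for k in range(3)])
        verts = new
    return verts

# a noisy spike above a flat square: smoothing pulls the spike down
verts = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0.5, 0.5, 3.0)]
faces = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
smoothed = laplacian_smooth(verts, faces)
```

The trade-off `lam` controls is smoothing strength versus volume shrinkage; production pipelines often alternate positive and negative steps (Taubin smoothing) to limit the shrinkage this plain version exhibits.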
Accurately counting dense objects in complex and diverse backgrounds is a significant challenge in computer vision, with applications ranging from crowd counting to various other object counting tasks. To address this, we propose HUANNet (High-Resolution Unified Attention Network), a convolutional neural network designed to capture both local features and rich semantic information through a high-resolution representation learning framework, while optimizing computational distribution across parallel branches. HUANNet introduces three core modules: the High-Resolution Attention Module (HRAM), which enhances feature extraction by optimizing multi-resolution feature fusion; the Unified Multi-Scale Attention Module (UMAM), which integrates spatial, channel, and convolutional kernel information through an attention mechanism applied across multiple levels of the network; and the Grid-Assisted Point Matching Module (GPMM), which stabilizes and improves point-to-point matching by leveraging grid-based mechanisms. Extensive experiments show that HUANNet achieves competitive results on the ShanghaiTech Part A/B crowd counting datasets and sets new state-of-the-art performance on dense object counting datasets such as CARPK and XRAY-IECCD, demonstrating its effectiveness and versatility.
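A grid-assisted point-matching step in the spirit of GPMM can be sketched as below. This is our own simplified reading, not the published module: ground-truth points are bucketed by grid cell, and each predicted point is matched greedily to the nearest unused ground-truth point in its own cell, which bounds the search and stabilizes matching in dense scenes.

```python
import math
from collections import defaultdict

def grid_match(preds, gts, cell=1.0):
    """Match predicted points to ground-truth points, restricting each
    prediction's candidates to ground-truth points in the same grid cell."""
    buckets = defaultdict(list)
    for i, (x, y) in enumerate(gts):
        buckets[(int(x // cell), int(y // cell))].append(i)
    matches, used = [], set()
    for j, (x, y) in enumerate(preds):
        cands = [i for i in buckets[(int(x // cell), int(y // cell))]
                 if i not in used]
        if cands:
            best = min(cands, key=lambda i: math.dist(gts[i], (x, y)))
            used.add(best)
            matches.append((j, best))
    return matches

gts = [(0.2, 0.2), (0.8, 0.6), (2.5, 2.5)]
preds = [(0.25, 0.25), (2.4, 2.6), (5.0, 5.0)]
m = grid_match(preds, gts, cell=1.0)
```

The third prediction lands in an empty cell and stays unmatched, illustrating how the grid prunes implausible long-range pairings that a global assignment might otherwise make.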
Radiance field-based 3D reconstruction has emerged as a transformative research direction due to its remarkable efficiency and quality. This paper presents a systematic analysis of representation models, reconstruction methodologies, and future applications in this field. We start with an overview of multi-view 3D reconstruction tasks, then focus on the key issue: how to represent 3D content effectively. Radiance fields are highlighted for their flexibility and representational completeness. Distinguished from the existing review literature, we adopt a multi-dimensional comparison between neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS) to develop a unified and in-depth understanding of the radiance field-based approach. Beyond the initial goal of novel view synthesis (NVS), recent breakthroughs in geometry extraction are summarized. Finally, we explore potential applications across areas such as robot localization and mapping, virtual reality, physical simulation, and stereo display. Empowered by the flexible 3D representation within the radiance field-based paradigm, the latest advancements strive to push boundaries and overcome long-standing bottlenecks in related domains.
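The computation both NeRF-style and splatting-style renderers share is alpha compositing along a ray under accumulated transmittance. A minimal sketch of that quadrature (scalar grayscale color, hand-picked densities, our own names):

```python
import math

def composite(densities, colors, deltas):
    """Volume-rendering quadrature: accumulate per-sample color weighted
    by alpha = 1 - exp(-sigma * delta) and remaining transmittance T."""
    out, T = 0.0, 1.0
    for sigma, c, d in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * d)
        out += T * alpha * c
        T *= 1.0 - alpha          # light remaining after this sample
    return out, T

# one nearly opaque sample behind a faint one
color, transmittance = composite(
    densities=[0.5, 100.0], colors=[1.0, 0.2], deltas=[0.1, 0.1]
)
```

The faint front sample contributes a little of its color, the opaque back sample absorbs nearly all remaining light, and the final transmittance is close to zero, which is why almost nothing behind it would be visible.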
Word cloud visualization is a compelling graphical representation that visually depicts the frequency of words within a given text or dataset [1]. Research on word clouds focuses on two main aspects. The first emphasizes processing words, such as using the latent Dirichlet allocation (LDA) algorithm to uncover topics in documents [2], while the second pursues visual impact through striking word arrangements [3,4]. In the realm of extensive biomedical data, effective knowledge delivery to biologists is crucial.
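The frequency-to-size mapping at the core of any word cloud can be sketched in a few lines. This is a generic illustration (our own function and point-size range), not a specific tool's API:

```python
import re
from collections import Counter

def word_sizes(text, min_pt=10, max_pt=48):
    """Map word frequency to font size by linear scaling between a
    minimum and maximum point size."""
    words = re.findall(r"[a-z']+", text.lower())
    freqs = Counter(words)
    lo, hi = min(freqs.values()), max(freqs.values())
    span = (hi - lo) or 1          # avoid division by zero when uniform
    return {w: min_pt + (max_pt - min_pt) * (f - lo) / span
            for w, f in freqs.items()}

sizes = word_sizes("gene protein gene pathway gene protein")
```

Real word-cloud layout then places the sized words without overlap; for biomedical corpora the interesting research questions sit upstream (which terms to count, e.g., LDA topic terms) and downstream (how to arrange them for impact).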
In global navigation satellite system (GNSS)-denied environments, cross-view geo-localization based on image retrieval provides a critical visual localization solution for Unmanned Aerial Vehicle (UAV) systems. The essence of cross-view geo-localization is matching images containing the same geographical targets across disparate platforms, such as UAV-view and satellite-view images. However, images of the same geographical targets may suffer from occlusions and geometric distortions due to variations in the capturing platform, view, and timing. Existing methods predominantly extract features by segmenting feature maps, which overlooks the holistic semantic distribution and structural information of objects, resulting in loss of image information. To address these challenges, a dilated neighborhood attention Transformer is employed as the feature extraction backbone, and Multi-feature representations based on Multi-scale Hierarchical Contextual Aggregation (MMHCA) is proposed. In MMHCA, multi-scale hierarchical contextual aggregation extracts contextual information from local to global across various granularity levels, establishing associations between contextual information and the global and local information in the image. Subsequently, the multi-feature representation method is used to obtain rich, discriminative feature information, bolstering the robustness of the model in scenarios characterized by positional shifts, varying distances, and scale ambiguities. Comprehensive experiments on the extensively used University-1652 and SUES-200 benchmarks indicate that MMHCA surpasses existing techniques, showing outstanding results in UAV localization and navigation.
Current experimental and computational methods have limitations in accurately and efficiently classifying ion channels within vast protein spaces. Here we have developed a deep learning algorithm, the GPT2 Ion Channel Classifier (GPT2-ICC), which effectively distinguishes ion channels in a test set containing approximately 239 times more non-ion-channel proteins. GPT2-ICC integrates representation learning with a large language model (LLM)-based classifier, enabling highly accurate identification of potential ion channels. Several potential ion channels were predicted from the unannotated human proteome, further demonstrating GPT2-ICC's generalization ability. This study marks a significant advancement in artificial-intelligence-driven ion channel research, highlighting the adaptability and effectiveness of combining representation learning with LLMs to address the challenges of imbalanced protein sequence data. Moreover, it provides a valuable computational tool for uncovering previously uncharacterized ion channels.
Accurate predictions of the remaining useful life (RUL) of mechanical equipment are vital for lowering maintenance costs and maintaining equipment reliability and safety. Data-driven RUL prediction methods have made significant progress, but they often assume that the training and testing data have the same distribution, which is often not the case in practical engineering applications. To address this issue, this paper proposes an RUL prediction model that combines deep learning and transfer learning. In this model, called the transfer convolutional attention mechanism for early-life stage time convolutional network (TCAM-EASTCN), an unsupervised domain adaptation strategy is introduced based on the characterization of subspace distances and orthogonal basis mismatch penalties in the convolutional attention mechanism for early-life stage time convolutional network (CAM-EASTCN). This approach minimizes the distribution differences between domains, enhancing the learning of cross-domain invariant features and effectively reducing the distribution gap between the source and target domains, thereby improving the accuracy of RUL prediction under varying conditions. Experimental results demonstrate that TCAM-EASTCN outperforms other models in terms of RUL prediction accuracy and generalization.
Personalized outfit recommendation has emerged as a hot research topic in the fashion domain. However, existing recommenders do not fully exploit user style preferences. Typically, users prefer particular styles, such as casual or athletic styles, and consider attributes like color and texture when selecting outfits. To achieve personalized outfit recommendations in line with user style preferences, this paper proposes a personal-style-guided outfit recommendation with multi-modal fashion compatibility modeling, termed PSGNet. First, a style classifier is designed to categorize fashion images of various clothing types and attributes into distinct style categories. Second, a personal style prediction module extracts user style preferences by analyzing historical data. Then, to address the limitations of single-modal representations and enhance fashion compatibility, both fashion images and text data are leveraged to extract multi-modal features. Finally, PSGNet integrates these components through Bayesian personalized ranking (BPR) to unify personal style and fashion compatibility, where the former serves as personal style features and guides the output of the personalized outfit recommendation tailored to the target user. Extensive experiments on large-scale datasets demonstrate that the proposed model is effective for personalized outfit recommendation.
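The BPR objective used to integrate the components can be sketched as follows. This is the standard pairwise form, with toy scores rather than PSGNet's learned style/compatibility scores: for each user, an observed (positive) outfit should out-score a sampled negative one.

```python
import math

def bpr_loss(score_pos, score_neg):
    """Bayesian personalized ranking loss: -log(sigmoid(s_pos - s_neg)).
    Smaller when the positive outfit out-scores the negative one."""
    return -math.log(1.0 / (1.0 + math.exp(-(score_pos - score_neg))))

# a well-ranked pair incurs a much lower loss than a mis-ranked one
good = bpr_loss(score_pos=2.0, score_neg=0.5)
bad = bpr_loss(score_pos=0.5, score_neg=2.0)
```

Because the loss depends only on score differences, it optimizes ranking rather than absolute ratings, which suits implicit-feedback fashion data where only "chosen vs. not chosen" is observed.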
Artificial intelligence (AI) researchers and cheminformatics specialists strive to identify effective drug precursors while optimizing costs and accelerating development processes. Digital molecular representation plays a crucial role in achieving this objective by making molecules machine-readable, thereby enhancing the accuracy of molecular prediction tasks and facilitating evidence-based decision making. This study presents a comprehensive review of small-molecule representations and AI-driven drug discovery downstream tasks utilizing these representations. The research methodology begins with a compilation of small-molecule databases, followed by an analysis of fundamental molecular representations and the models that learn these representations from initial forms, capturing patterns and salient features across extensive chemical spaces. The study then examines various drug discovery downstream tasks, including drug-target interaction (DTI) prediction, drug-target affinity (DTA) prediction, drug property (DP) prediction, and drug generation, all based on learned representations. The analysis concludes by highlighting challenges and opportunities associated with machine learning (ML) methods for molecular representation and for improving downstream task performance. Additionally, the representation of small molecules and AI-based downstream tasks demonstrate significant potential in identifying traditional Chinese medicine (TCM) medicinal substances and facilitating TCM target discovery.
Let F_1 be the virtual field consisting of one element and (Q, I) a string pair. In this paper, we study the representations of string pairs over the virtual field F_1. It is proved, using coefficient quivers, that an indecomposable F_1-representation is either a string representation or a band representation. It is worth noting that for a given band and a positive integer, there exists a unique band representation up to isomorphism.
Modified λ-differential Lie-Yamaguti algebras are considered, in which a modified λ-differential Lie-Yamaguti algebra consists of a Lie-Yamaguti algebra and a modified λ-differential operator. First, we introduce the representations of modified λ-differential Lie-Yamaguti algebras. Furthermore, we establish the cohomology of a modified λ-differential Lie-Yamaguti algebra with coefficients in a representation. Finally, we investigate the one-parameter formal deformations and Abelian extensions of modified λ-differential Lie-Yamaguti algebras using the second cohomology group.
Binary Code Similarity Detection (BCSD) is vital for vulnerability discovery, malware detection, and software security, especially when source code is unavailable. Yet it faces challenges from semantic loss, recompilation variations, and obfuscation. Recent advances in artificial intelligence (AI), particularly natural language processing (NLP), graph representation learning (GRL), and large language models (LLMs), have markedly improved accuracy, enabling better recognition of code variants and deeper semantic understanding. This paper presents a comprehensive review of 82 studies published between 1975 and 2025, systematically tracing the historical evolution of BCSD and analyzing the progressive incorporation of AI techniques. Particular emphasis is placed on the role of LLMs, which have recently emerged as transformative tools in advancing semantic representation and enhancing detection performance. The review is organized around five central research questions: (1) the chronological development and milestones of BCSD; (2) the construction of AI-driven technical roadmaps that chart methodological transitions; (3) the design and implementation of general analytical workflows for binary code analysis; (4) the applicability, strengths, and limitations of LLMs in capturing semantic and structural features of binary code; and (5) the persistent challenges and promising directions for future investigation. By synthesizing insights across these dimensions, the study demonstrates how LLMs reshape the landscape of binary code analysis, offering unprecedented opportunities to improve accuracy, scalability, and adaptability in real-world scenarios. This review not only bridges a critical gap in the existing literature but also provides a forward-looking perspective, serving as a valuable reference for researchers and practitioners aiming to advance AI-powered BCSD methodologies and applications.
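For context on the pre-AI end of the timeline traced above, a classical syntactic BCSD baseline is Jaccard overlap of byte n-grams, sketched below with hand-written toy byte strings (not real disassembly). Its brittleness to a single changed byte is precisely the weakness that learned semantic embeddings address.

```python
def ngram_jaccard(a, b, n=4):
    """Syntactic similarity of two byte strings: Jaccard overlap of
    their sets of byte n-grams."""
    def grams(s):
        return {s[i:i + n] for i in range(len(s) - n + 1)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 1.0

# toy function prologues differing in one immediate operand byte
orig = b"\x55\x48\x89\xe5\x48\x83\xec\x10\x89\x7d\xfc"
recompiled = b"\x55\x48\x89\xe5\x48\x83\xec\x20\x89\x7d\xfc"
sim = ngram_jaccard(orig, recompiled)
```

One differing byte invalidates every n-gram that covers it, so the score drops sharply even though the two sequences are semantically near-identical — the recompilation-variation problem in miniature.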
The purpose of this article is to depart from the conventional belief that John Donne, a vibrant 17th-century writer, is a full-blown metaphysical poet as widely claimed, while also acknowledging his poetic ingenuity. While Donne's poetry is rich in matter and manner, and his poems are caked in wit, intellectual superiority, and apt exploration of telling themes, dressing him fully in borrowed robes seems a stretch. Some of Donne's poems, without a shred of doubt, contain flavors of metaphysical poetry, but the term "metaphysical" seems unsuitable for poems such as "A Valediction: Forbidding Mourning".
Dear Editor, This letter proposes an end-to-end feature disentangled Transformer (FDTs) for entanglement-free and semantic feature representation to enable accurate and trustworthy pathology grading of squamous cell carcinoma (SCC). Existing vision transformers (ViTs) can implement representation learning for SCC grading; however, they all adopt class-patch token fuzzy mapping for pattern prediction probability, or window down-sampling, to enhance the representation of contextual information.
Funding: Supported by the Innovative Human Resource Development for Local Intellectualization program through the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. IITP-2026-2020-0-01741), and by the research fund of Hanyang University (HY-2025-1110).
Funding: Supported by SungKyunKwan University and the BK21 FOUR (Graduate School Innovation) program funded by the Ministry of Education (MOE, Korea) and the National Research Foundation of Korea (NRF).
Funding: Supported by the Extra High Voltage Power Transmission Company, China Southern Power Grid Co., Ltd.
Funding: Funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. (DGSSR-2025-02-01296).
Abstract: Skin diseases affect millions worldwide. Early detection is key to preventing disfigurement, lifelong disability, or death. Dermoscopic images acquired in primary-care settings show high intra-class visual similarity and severe class imbalance, and occasional imaging artifacts can create ambiguity for state-of-the-art convolutional neural networks (CNNs). We frame skin lesion recognition as graph-based reasoning and, to ensure fair evaluation and avoid data leakage, adopt a strict lesion-level partitioning strategy. Each image is first over-segmented using SLIC (Simple Linear Iterative Clustering) to produce perceptually homogeneous superpixels. These superpixels form the nodes of a region-adjacency graph whose edges encode spatial continuity. Node attributes are 1280-dimensional embeddings extracted with a lightweight yet expressive EfficientNet-B0 backbone, providing strong representational power at modest computational cost. The resulting graphs are processed by a five-layer Graph Attention Network (GAT) that learns to weight inter-node relationships dynamically and aggregates multi-hop context before classifying lesions into seven classes with a log-softmax output. Extensive experiments on the DermaMNIST benchmark show the proposed pipeline achieves 88.35% accuracy and 98.04% AUC, outperforming contemporary CNNs, AutoML approaches, and alternative graph neural networks. An ablation study indicates that EfficientNet-B0 produces superior node descriptors compared with ResNet-18 and DenseNet, and that roughly five GAT layers strike a good balance between being too shallow and too deep while avoiding oversmoothing. The method requires no data augmentation or external metadata, making it a drop-in upgrade for clinical computer-aided diagnosis systems.
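The graph-construction step above (superpixels as nodes, spatial adjacency as edges) can be sketched independently of the EfficientNet-B0 features and the GAT itself. A minimal illustration, assuming a precomputed superpixel label map; the function name and toy labels are hypothetical, not from the paper:

```python
import numpy as np

def region_adjacency_edges(labels):
    """Build the edge set of a region-adjacency graph from a
    superpixel label map: two regions are connected when any of
    their pixels touch horizontally or vertically."""
    edges = set()
    # horizontally adjacent pixel pairs
    a, b = labels[:, :-1], labels[:, 1:]
    # vertically adjacent pixel pairs
    c, d = labels[:-1, :], labels[1:, :]
    for u, v in zip(np.concatenate([a.ravel(), c.ravel()]),
                    np.concatenate([b.ravel(), d.ravel()])):
        if u != v:
            edges.add((int(min(u, v)), int(max(u, v))))
    return sorted(edges)

# toy 4x4 "segmentation" with three superpixel regions
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 1, 1],
                   [2, 2, 2, 2]])
print(region_adjacency_edges(labels))  # → [(0, 1), (0, 2), (1, 2)]
```

In the actual pipeline each node would then carry its 1280-dimensional EfficientNet-B0 embedding, and the GAT would attend over these edges.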
Funding: Funded by the Research, Development, and Innovation Authority (RDIA), Kingdom of Saudi Arabia, under the supervision of the Energy, Industry, and Advanced Technologies Research Center, Taibah University, Madinah, Saudi Arabia, with grant number (12979-iau-2023-TAU-R-3-1-EI-).
Abstract: The generation of high-quality 3D models from single 2D images remains challenging in terms of accuracy and completeness. Deep learning has emerged as a promising solution, offering new avenues for improvement. However, building models from scratch is computationally expensive and requires large datasets. This paper presents a transfer-learning-based approach for category-specific 3D reconstruction from a single 2D image. The core idea is to fine-tune a pre-trained model on specific object categories using new, unseen data, resulting in specialized versions of the model that are better adapted to reconstruct particular objects. The proposed approach utilizes a three-phase pipeline comprising image acquisition, 3D reconstruction, and refinement. After ensuring the quality of the input image, a ResNet50 model is used for object recognition, directing the image to the corresponding category-specific model to generate a voxel-based representation. The voxel-based 3D model is then refined by transforming it into a detailed triangular mesh representation using the Marching Cubes algorithm and Laplacian smoothing. An experimental study, using the Pix2Vox model and the Pascal3D dataset, has been conducted to evaluate and validate the effectiveness of the proposed approach. Results demonstrate that category-specific fine-tuning of Pix2Vox significantly outperforms both the original model and the general model fine-tuned for all object categories, with substantial gains in Intersection over Union (IoU) scores. Visual assessments confirm improvements in geometric detail and surface realism. These findings indicate that combining transfer learning with the category-specific fine-tuning and refinement strategy of our approach leads to better-quality 3D model generation.
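The refinement stage above (Marching Cubes output smoothed with Laplacian smoothing) can be illustrated with the smoothing step alone. A hedged sketch assuming a triangle mesh is already available; the function name and toy mesh are illustrative, not taken from the paper:

```python
import numpy as np

def laplacian_smooth(verts, faces, iterations=10, lam=0.5):
    """Uniform Laplacian smoothing: move each vertex a fraction
    `lam` toward the centroid of its mesh neighbours, repeated
    for a fixed number of iterations."""
    n = len(verts)
    # build vertex adjacency from the triangle faces
    neighbours = [set() for _ in range(n)]
    for i, j, k in faces:
        neighbours[i].update((j, k))
        neighbours[j].update((i, k))
        neighbours[k].update((i, j))
    v = np.asarray(verts, dtype=float)
    for _ in range(iterations):
        centroids = np.array([v[list(nb)].mean(axis=0) for nb in neighbours])
        v = v + lam * (centroids - v)
    return v

# toy mesh: a square fan of four triangles around a spiky centre vertex
verts = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0], [0.5, 0.5, 0.8]]
faces = [[0, 1, 4], [1, 2, 4], [2, 3, 4], [3, 0, 4]]
smoothed = laplacian_smooth(verts, faces, iterations=20)
print(smoothed[4])  # the spike is pulled toward the base plane
```

Uniform Laplacian smoothing shrinks the mesh slightly, which is usually an acceptable trade-off for removing the staircase artifacts Marching Cubes produces on voxel data.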
Funding: Funded by the National Natural Science Foundation of China (62273213, 62472262, 62572287), the Natural Science Foundation of Shandong Province (ZR2024MF144), the Natural Science Foundation of Shandong Province for Innovation and Development Joint Funds (ZR2022LZH001), and the Taishan Scholarship Construction Engineering.
Abstract: Accurately counting dense objects in complex and diverse backgrounds is a significant challenge in computer vision, with applications ranging from crowd counting to various other object counting tasks. To address this, we propose HUANNet (High-Resolution Unified Attention Network), a convolutional neural network designed to capture both local features and rich semantic information through a high-resolution representation learning framework, while optimizing computational distribution across parallel branches. HUANNet introduces three core modules: the High-Resolution Attention Module (HRAM), which enhances feature extraction by optimizing multiresolution feature fusion; the Unified Multi-Scale Attention Module (UMAM), which integrates spatial, channel, and convolutional kernel information through an attention mechanism applied across multiple levels of the network; and the Grid-Assisted Point Matching Module (GPMM), which stabilizes and improves point-to-point matching by leveraging grid-based mechanisms. Extensive experiments show that HUANNet achieves competitive results on the ShanghaiTech Part A/B crowd counting datasets and sets new state-of-the-art performance on dense object counting datasets such as CARPK and XRAY-IECCD, demonstrating the effectiveness and versatility of HUANNet.
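The idea behind grid-assisted point matching can be made concrete with a toy version: bucket points into grid cells, then greedily match each predicted point to the nearest unused ground-truth point within its own cell. This is a simplified sketch of the general mechanism, not GPMM itself; all names and data are hypothetical:

```python
import numpy as np

def grid_match(pred, gt, grid=4, size=1.0):
    """Toy grid-assisted matching: restrict candidate matches to
    the same grid cell, then match predictions greedily to the
    nearest unused ground-truth point in that cell."""
    def cell(p):
        return (int(p[0] / size * grid), int(p[1] / size * grid))

    buckets = {}
    for i, g in enumerate(gt):
        buckets.setdefault(cell(g), []).append(i)
    matches, used = [], set()
    for j, p in enumerate(pred):
        cands = [i for i in buckets.get(cell(p), []) if i not in used]
        if cands:
            best = min(cands, key=lambda i: float(np.linalg.norm(gt[i] - p)))
            used.add(best)
            matches.append((j, best))
    return matches

# two predicted head positions vs. two annotated ones, in a unit image
pred = np.array([[0.1, 0.1], [0.9, 0.9]])
gt = np.array([[0.12, 0.11], [0.88, 0.92]])
print(grid_match(pred, gt))  # → [(0, 0), (1, 1)]
```

Restricting candidates to a cell is what makes matching stable: a spurious prediction far from any annotation cannot "steal" a ground-truth point on the other side of the image.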
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62573019).
Abstract: Radiance field-based 3D reconstruction has emerged as a transformative research direction due to its remarkable efficiency and quality. This paper presents a systematic analysis of representation models, reconstruction methodologies, and future applications in this field. We start with an overview of multi-view 3D reconstruction tasks, then focus on the key issue: how to represent 3D content effectively. Radiance fields are highlighted for their flexibility and representational completeness. Distinguished from the existing review literature, we adopt a multi-dimensional comparison between neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS) to develop a unified and in-depth understanding of the radiance field-based approach. Beyond the initial goal of novel view synthesis (NVS), recent breakthroughs in geometry extraction are summarized. Finally, we explore potential applications across areas such as robot localization and mapping, virtual reality, physical simulation, and stereo display. Empowered by the flexible 3D representation within the radiance field-based paradigm, the latest advancements strive to push the boundaries and overcome long-standing bottlenecks in related domains.
Funding: Supported by the National Key R&D Program of China (2022YFC2704304 and 2021YFF0702000), the National Natural Science Foundation of China (32341020 and 32341021), the Hubei Innovation Group Project (2021CFA005), and the Research Core Facilities for Life Science (HUST).
Abstract: Word cloud visualization is a compelling graphical representation that visually depicts the frequency of words within a given text or dataset [1]. Research on word clouds focuses on two main aspects. The first emphasizes processing words, such as using the latent Dirichlet allocation (LDA) algorithm to uncover topics in the documents [2], while the second involves visual impact through striking word arrangements [3,4]. In the realm of extensive biomedical data, effective knowledge delivery to biologists is crucial.
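The frequency-counting step that underlies any word cloud is easy to make concrete. A minimal sketch with a naive lowercase tokeniser (illustrative only; real pipelines would add stop-word removal and stemming):

```python
from collections import Counter
import re

def word_frequencies(text, top=3):
    """Word clouds start from word frequencies: tokenise the text
    (here, naively, into lowercase alphabetic runs) and count how
    often each word occurs."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(top)

print(word_frequencies("Gene gene protein cell cell cell"))
# → [('cell', 3), ('gene', 2), ('protein', 1)]
```

These counts then drive the font sizes in the rendered cloud; the topic-modeling variants cited above replace raw counts with per-topic word weights.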
Funding: Supported by the National Natural Science Foundation of China (Nos. 12072027, 62103052, 61603346 and 62103379), the Henan Key Laboratory of General Aviation Technology, China (No. ZHKF-230201), the Funding for the Open Research Project of the Rotor Aerodynamics Key Laboratory, China (No. RAL20200101), the Key Research and Development Program of Henan Province, China (Nos. 241111222000 and 241111222900), the Key Science and Technology Program of Henan Province, China (No. 232102220067), and the Scholarship Funding from the China Scholarship Council (No. 202206030079).
Abstract: In global navigation satellite system denial environments, cross-view geo-localization based on image retrieval presents a critical visual localization solution for Unmanned Aerial Vehicle (UAV) systems. The essence of cross-view geo-localization resides in matching images containing the same geographical targets from disparate platforms, such as UAV-view and satellite-view images. However, images of the same geographical targets may suffer from occlusions and geometric distortions due to variations in the capturing platform, view, and timing. The existing methods predominantly extract features by segmenting feature maps, which overlooks the holistic semantic distribution and structural information of objects, resulting in loss of image information. To address these challenges, a dilated neighborhood attention Transformer is employed as the feature extraction backbone, and Multi-feature representations based on Multi-scale Hierarchical Contextual Aggregation (MMHCA) is proposed. In the proposed MMHCA method, the multi-scale hierarchical contextual aggregation method is utilized to extract contextual information from local to global across various granularity levels, establishing feature associations of contextual information with global and local information in the image. Subsequently, the multi-feature representations method is utilized to obtain rich discriminative feature information, bolstering the robustness of the model in scenarios characterized by positional shifts, varying distances, and scale ambiguities. Comprehensive experiments conducted on the extensively utilized University-1652 and SUES-200 benchmarks indicate that the MMHCA method surpasses existing techniques, showing outstanding results in UAV localization and navigation.
Funding: Funded by grants from the National Key Research and Development Program of China (Grant Nos. 2022YFE0205600 and 2022YFC3400504), the National Natural Science Foundation of China (Grant Nos. 82373792 and 82273857), the Fundamental Research Funds for the Central Universities, China, and the East China Normal University Medicine and Health Joint Fund, China (Grant No. 2022JKXYD07001).
Abstract: Current experimental and computational methods have limitations in accurately and efficiently classifying ion channels within vast protein spaces. Here we have developed a deep learning algorithm, GPT2 Ion Channel Classifier (GPT2-ICC), which effectively distinguishes ion channels from a test set containing approximately 239 times more non-ion-channel proteins. GPT2-ICC integrates representation learning with a large language model (LLM)-based classifier, enabling highly accurate identification of potential ion channels. Several potential ion channels were predicted from the unannotated human proteome, further demonstrating GPT2-ICC's generalization ability. This study marks a significant advancement in artificial-intelligence-driven ion channel research, highlighting the adaptability and effectiveness of combining representation learning with LLMs to address the challenges of imbalanced protein sequence data. Moreover, it provides a valuable computational tool for uncovering previously uncharacterized ion channels.
Funding: Supported in part by the Key Research and Development Program of Shaanxi Province under Grant 2020GY-104, in part by the Key Laboratory of Highway Construction Machinery of Shaanxi Province, Key Laboratory of Road Construction Technology and Equipment (Chang'an University), MOE, under Grant 300102250503, and in part by the Fundamental Research Funds for the Central Universities under Grant CHD 300102250503.
Abstract: Accurate predictions of the remaining useful life (RUL) of mechanical equipment are vital for lowering maintenance costs and maintaining equipment reliability and safety. Data-driven RUL prediction methods have made significant progress, but they often assume that the training and testing data have the same distribution, which is often not the case in practical engineering applications. To address this issue, this paper proposes an RUL prediction model that combines deep learning and transfer learning. In this model, called transfer convolutional attention mechanism for early-life stage time convolutional network (TCAM-EASTCN), an unsupervised domain adaptation strategy, based on the characterization of subspace distances and orthogonal basis mismatch penalties, is introduced into the convolutional attention mechanism for early-life stage time convolutional network (CAM-EASTCN). This approach minimizes the distribution differences between domains, enhancing the learning of cross-domain invariant features and effectively reducing the distribution gap between the source and target domains, thereby improving the accuracy of RUL prediction under varying conditions. Experimental results demonstrate that TCAM-EASTCN outperforms other models in terms of RUL prediction accuracy and generalization.
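The distribution gap that unsupervised domain adaptation minimizes can be illustrated with a standard criterion. The sketch below uses a linear-kernel Maximum Mean Discrepancy (MMD) as a stand-in for the paper's subspace-distance and orthogonal-basis-mismatch penalties; it is not the paper's exact objective, and all names are hypothetical:

```python
import numpy as np

def mmd_linear(source, target):
    """Linear-kernel MMD: squared distance between the mean feature
    vectors of the source and target domains. A common, simple
    measure of distribution gap used in domain adaptation losses."""
    delta = source.mean(axis=0) - target.mean(axis=0)
    return float(delta @ delta)

# toy degradation features: the "near" target domain (a slightly
# different operating condition) should show a smaller gap than
# the "far" one (a very different condition)
rng = np.random.default_rng(1)
src = rng.normal(0.0, 1.0, size=(200, 16))
tgt_near = rng.normal(0.1, 1.0, size=(200, 16))
tgt_far = rng.normal(2.0, 1.0, size=(200, 16))
print(mmd_linear(src, tgt_near) < mmd_linear(src, tgt_far))  # → True
```

In training, a term like this is added to the prediction loss so that the encoder learns features whose source and target distributions cannot be told apart.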
Funding: Shanghai Frontier Science Research Center for Modern Textiles, Donghua University, China; Open Project of Henan Key Laboratory of Intelligent Manufacturing of Mechanical Equipment, Zhengzhou University of Light Industry, China (No. IM202303); National Key Research and Development Program of China (No. 2019YFB1706300).
Abstract: Personalized outfit recommendation has emerged as a hot research topic in the fashion domain. However, existing recommendations do not fully exploit user style preferences. Typically, users prefer particular styles, such as casual and athletic styles, and consider attributes like color and texture when selecting outfits. To achieve personalized outfit recommendations in line with user style preferences, this paper proposes a personal style guided outfit recommendation with multi-modal fashion compatibility modeling, termed PSGNet. Firstly, a style classifier is designed to categorize fashion images of various clothing types and attributes into distinct style categories. Secondly, a personal style prediction module extracts user style preferences by analyzing historical data. Then, to address the limitations of single-modal representations and enhance fashion compatibility, both fashion images and text data are leveraged to extract multi-modal features. Finally, PSGNet integrates these components through Bayesian personalized ranking (BPR) to unify personal style and fashion compatibility, where the former is used as personal style features and guides the output of the personalized outfit recommendation tailored to the target user. Extensive experiments on large-scale datasets demonstrate that the proposed model is effective for personalized outfit recommendation.
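The BPR objective used above to unify personal style and compatibility can be written down in a few lines. A minimal sketch of the pairwise loss only (not PSGNet's full model; names are illustrative):

```python
import numpy as np

def bpr_loss(score_pos, score_neg):
    """Bayesian personalized ranking (BPR) loss: the negative log
    sigmoid of the margin between the score of an outfit the user
    preferred and the score of a non-preferred one."""
    margin = score_pos - score_neg
    return float(-np.log(1.0 / (1.0 + np.exp(-margin))))

# the loss shrinks as the preferred outfit is ranked further above
# the non-preferred one, and grows when the ranking is inverted
print(bpr_loss(2.0, 0.5) < bpr_loss(0.5, 2.0))  # → True
```

Training on such pairs optimizes ranking order directly, which suits recommendation better than pointwise rating prediction: only the relative preference between two outfits is assumed observable.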
Funding: Supported by the Shenzhen Key Laboratory of Intelligent Bioinformatics (No. ZDSYS20220422103800001), the Shenzhen Science and Technology Program (No. JCYJ20230807140709020), the National Natural Science Foundation of China (Nos. 62402489, U22A2041, and 62373172), the China Postdoctoral Science Foundation (No. 2023M743688), and the Guangdong Basic and Applied Basic Research Foundation (Nos. 2024A1515011960 and 2023A1515110570).
Abstract: Artificial intelligence (AI) researchers and cheminformatics specialists strive to identify effective drug precursors while optimizing costs and accelerating development processes. Digital molecular representation plays a crucial role in achieving this objective by making molecules machine-readable, thereby enhancing the accuracy of molecular prediction tasks and facilitating evidence-based decision making. This study presents a comprehensive review of small molecular representations and AI-driven drug discovery downstream tasks utilizing these representations. The research methodology begins with the compilation of small molecule databases, followed by an analysis of fundamental molecular representations and the models that learn these representations from initial forms, capturing patterns and salient features across extensive chemical spaces. The study then examines various drug discovery downstream tasks, including drug-target interaction (DTI) prediction, drug-target affinity (DTA) prediction, drug property (DP) prediction, and drug generation, all based on learned representations. The analysis concludes by highlighting challenges and opportunities associated with machine learning (ML) methods for molecular representation and improving downstream task performance. Additionally, the representation of small molecules and AI-based downstream tasks demonstrates significant potential in identifying traditional Chinese medicine (TCM) medicinal substances and facilitating TCM target discovery.
Abstract: Let F_1 be the virtual field consisting of one element and (Q, I) a string pair. In this paper, we study the representations of string pairs over the virtual field F_1. It is proved, using coefficient quivers, that an indecomposable F_1-representation is either a string representation or a band representation. It is worth noting that, for a given band and a positive integer, there exists a unique band representation up to isomorphism.
Funding: National Natural Science Foundation of China (12161013); Research Projects of Guizhou University of Commerce in 2024.
Abstract: Modified λ-differential Lie-Yamaguti algebras are considered, where a modified λ-differential Lie-Yamaguti algebra consists of a Lie-Yamaguti algebra and a modified λ-differential operator. First, we introduce the representation of modified λ-differential Lie-Yamaguti algebras. Furthermore, we establish the cohomology of a modified λ-differential Lie-Yamaguti algebra with coefficients in a representation. Finally, we investigate the one-parameter formal deformations and Abelian extensions of modified λ-differential Lie-Yamaguti algebras using the second cohomology group.
Abstract: Binary Code Similarity Detection (BCSD) is vital for vulnerability discovery, malware detection, and software security, especially when source code is unavailable. Yet, it faces challenges from semantic loss, recompilation variations, and obfuscation. Recent advances in artificial intelligence, particularly natural language processing (NLP), graph representation learning (GRL), and large language models (LLMs), have markedly improved accuracy, enabling better recognition of code variants and deeper semantic understanding. This paper presents a comprehensive review of 82 studies published between 1975 and 2025, systematically tracing the historical evolution of BCSD and analyzing the progressive incorporation of artificial intelligence (AI) techniques. Particular emphasis is placed on the role of LLMs, which have recently emerged as transformative tools in advancing semantic representation and enhancing detection performance. The review is organized around five central research questions: (1) the chronological development and milestones of BCSD; (2) the construction of AI-driven technical roadmaps that chart methodological transitions; (3) the design and implementation of general analytical workflows for binary code analysis; (4) the applicability, strengths, and limitations of LLMs in capturing semantic and structural features of binary code; and (5) the persistent challenges and promising directions for future investigation. By synthesizing insights across these dimensions, the study demonstrates how LLMs reshape the landscape of binary code analysis, offering unprecedented opportunities to improve accuracy, scalability, and adaptability in real-world scenarios. This review not only bridges a critical gap in the existing literature but also provides a forward-looking perspective, serving as a valuable reference for researchers and practitioners aiming to advance AI-powered BCSD methodologies and applications.
Abstract: The purpose of this article is to depart from the conventional belief that John Donne, a vibrant 17th-century writer, is a full-blown metaphysical poet as widely claimed, while also acknowledging his poetic ingenuity. While Donne's poetry is rich in matter and manner, and his poems are caked in wit, intellectual superiority, and apt exploration of telling themes, dressing him fully in borrowed robes seems a stretch. Some of Donne's poems, without a shred of doubt, contain flavors of metaphysical poetry, but the term "metaphysical" seems unsuitable for poems such as "A Valediction: Forbidding Mourning".
Funding: Supported by the National Natural Science Foundation of China (62272078) and the Chongqing Natural Science Foundation (CSTB2023NSCQ-LZX0069).
Abstract: Dear Editor, This letter proposes an end-to-end feature disentangled Transformer (FDTs) for entanglement-free and semantic feature representation to enable accurate and trustworthy pathology grading of squamous cell carcinoma (SCC). Existing vision transformers (ViTs) can implement representation learning for SCC grading; however, they all adopt the class-patch token fuzzy mapping for pattern prediction probability, or window down-sampling to enhance the representation of contextual information.