100,886 articles found
Transformer-Driven Multimodal for Human-Object Detection and Recognition for Intelligent Robotic Surveillance
1
Authors: Aman Aman Ullah, Yanfeng Wu, Shaheryar Najam, Nouf Abdullah Almujally, Ahmad Jalal, Hui Liu. Computers, Materials & Continua, 2026, Issue 4, pp. 1364-1383 (20 pages)
Human object detection and recognition is essential for elderly monitoring and assisted living; however, models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings. To address this, we present SCENET-3D, a transformer-driven multimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline. In the first stage, scene analysis, rich geometric and texture descriptors are extracted from RGB frames, including surface-normal histograms, angles between neighboring normals, Zernike moments, directional standard deviation, and Gabor-filter responses. In the second stage, scene-object analysis, non-human objects are segmented and represented using local feature descriptors and complementary surface-normal information. In the third stage, human-pose estimation, silhouettes are processed through an enhanced MoveNet to obtain 2D anatomical keypoints, which are fused with depth information and converted into RGB-based point clouds to construct pseudo-3D skeletons. Features from all three stages are fused and fed into a transformer encoder with multi-head attention to resolve visually similar activities. Experiments on UCLA (95.8%), ETRI-Activity3D (89.4%), and CAD-120 (91.2%) demonstrate that combining pseudo-3D skeletons with rich scene-object fusion significantly improves generalizable activity recognition, enabling safer elderly care, natural human–robot interaction, and robust context-aware robotic perception in real-world environments.
Keywords: Human object detection; elderly care; RGB-based pose estimation; scene context analysis; object recognition; Gabor features; point cloud reconstruction
Superpixel-Aware Transformer with Attention-Guided Boundary Refinement for Salient Object Detection
2
Authors: Burhan Baraklı, Can Yüzkollar, Tugrul Taşçı, Ibrahim Yıldırım. Computer Modeling in Engineering & Sciences, 2026, Issue 1, pp. 1092-1129 (38 pages)
Salient object detection (SOD) models struggle to simultaneously preserve global structure, maintain sharp object boundaries, and sustain computational efficiency in complex scenes. In this study, we propose SPSALNet, a task-driven two-stage (macro–micro) architecture that restructures the SOD process around superpixel representations. In the proposed approach, a "split-and-enhance" principle, introduced to our knowledge for the first time in the SOD literature, hierarchically classifies superpixels and then applies targeted refinement only to ambiguous or error-prone regions. At the macro stage, the image is partitioned into content-adaptive superpixel regions, and each superpixel is represented by a high-dimensional region-level feature vector. These representations define a regional decomposition problem in which superpixels are assigned to three classes: background, object interior, and transition regions. Superpixel tokens interact with a global feature vector from a deep network backbone through a cross-attention module and are projected into an enriched embedding space that jointly encodes local topology and global context. At the micro stage, the model employs a U-Net-based refinement process that allocates computational resources only to ambiguous transition regions. The image and distance–similarity maps derived from superpixels are processed through a dual-encoder pathway. Subsequently, channel-aware fusion blocks adaptively combine information from these two sources, producing sharper and more stable object boundaries. Experimental results show that SPSALNet achieves high accuracy with lower computational cost compared to recent competing methods. On the PASCAL-S and DUT-OMRON datasets, SPSALNet exhibits a clear performance advantage across all key metrics, and it ranks first on accuracy-oriented measures on HKU-IS. On the challenging DUT-OMRON benchmark, SPSALNet reaches an MAE of 0.034. Across all datasets, it preserves object boundaries and regional structure in a stable and competitive manner.
Keywords: Salient object detection; superpixel segmentation; Transformers; attention mechanism; multi-level fusion; edge-preserving refinement; model-driven
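The MAE figure quoted for SPSALNet refers to mean absolute error, which in salient object detection is conventionally the average per-pixel absolute difference between a predicted saliency map and the ground-truth mask, both scaled to [0, 1]. The following is a minimal sketch of that metric only, assuming maps are given as nested lists of floats; the function name and the toy values are illustrative and do not come from the paper.

```python
def saliency_mae(prediction, ground_truth):
    """Mean absolute error between two equally sized 2D maps with values in [0, 1]."""
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(prediction, ground_truth):
        if len(pred_row) != len(gt_row):
            raise ValueError("rows must have the same length")
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    if count == 0:
        raise ValueError("maps must not be empty")
    return total / count


# Toy 2x2 example (hypothetical values): two pixels are exact, two are off by 0.5.
toy_prediction = [[0.0, 1.0], [0.5, 0.5]]
toy_ground_truth = [[0.0, 1.0], [1.0, 0.0]]
toy_score = saliency_mae(toy_prediction, toy_ground_truth)
```

Lower values are better; a dataset-level score is usually the mean of this quantity over all test images.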
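WTNet-YOLO: A Cotton-Field Pest Detection Algorithm Combining Discrete Wavelet Transform and Transformer
3
Authors: 刘江涛, 周刚, 刘浩南, 王佳佳, 贾振红. 《计算机工程与应用》, PKU Core, 2026, Issue 3, pp. 226-240 (15 pages)
Cotton is severely damaged by pests during growth, so accurate pest detection has become a key component of smart agriculture. Many cotton-field pests are small targets whose features are hard to extract, and individual pests differ markedly in size, which limits the performance of existing object detection algorithms. This paper proposes WTNet-YOLO (wavelet and Transformer network-YOLO), a YOLO11-based object detection algorithm that combines the discrete wavelet transform with a Transformer. Partial convolution and multi-scale depthwise convolution are fused to build the C3K2-MKPF module, strengthening feature extraction for targets of multiple sizes. In the neck, a wavelet domain fusion module (WDFM) and a cross stage partial local and global block (CSP-LGB) are combined to improve the frequency-domain representation and global localization of pests of all sizes. A multi-scale adaptive spatial attention gate (MASAG) is introduced to dynamically fuse cross-layer features from the backbone and the neck, strengthening the expression of spatial and semantic information. To validate the method, a cotton-field pest dataset, YST-PestCotton (yellow sticky trap pest dataset in cotton), was constructed; it covers pests across multiple size ranges and has pronounced scale diversity, with pest pixel areas differing by a factor of more than 1,200. Experiments show that mAP50 on YST-PestCotton improves by 3.1 percentage points; when pests are divided by bounding-box area into four subsets (0~256, 256~512, 512~1024, and greater than 1024), mAP50 improves by 2.4, 1.3, 1.5, and 3 percentage points, respectively. On the public Yellow sticky traps dataset, mAP50 reaches the highest value of 95.3%. Overall, WTNet-YOLO effectively handles size variation among small targets while meeting the detection needs of pests of different sizes.
Keywords: smart agriculture; pest detection; small objects; multi-size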
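The per-size evaluation in the WTNet-YOLO abstract splits pests into four subsets by bounding-box area. A minimal sketch of such a split is given below. The abstract does not say how boundary values are assigned, so the half-open intervals used here (upper bound inclusive) are an assumption, as are the function names and the sample boxes.

```python
from collections import Counter

# Area ranges named in the abstract; treating each range as (low, high] is an assumption.
AREA_BINS = [(0, 256, "0~256"), (256, 512, "256~512"), (512, 1024, "512~1024")]


def area_subset(width, height):
    """Return the label of the area subset that a width x height box falls into."""
    area = width * height
    if area <= 0:
        raise ValueError("box area must be positive")
    for low, high, label in AREA_BINS:
        if low < area <= high:
            return label
    return ">1024"


def count_by_subset(boxes):
    """Count boxes per area subset; boxes is an iterable of (width, height) pairs."""
    return Counter(area_subset(w, h) for w, h in boxes)


# Hypothetical boxes in pixels, not taken from YST-PestCotton.
sample_counts = count_by_subset([(10, 10), (20, 20), (40, 40), (5, 5)])
```

Computing mAP50 separately on each subset then shows where a detector gains or loses accuracy as object size changes.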
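A YOLOv7tiny Insulator Defect Detection Algorithm Combining Transformer and Composite FPN
4
Authors: 党宏社, 许勃, 张选德. 《电瓷避雷器》, 2026, Issue 1, pp. 85-94 (10 pages)
Accurate and rapid identification and handling of insulator defects helps keep power systems running stably. However, existing insulator defect detection algorithms suffer from complex network structures, long inference times, and poor robustness, and cannot meet the needs of practical inspection. This study therefore proposes a lightweight insulator defect detection algorithm that combines a Transformer with a composite FPN. The algorithm uses YOLOv7tiny as the base framework, introduces EMO, a lightweight Transformer-based architecture, as the backbone network, designs a composite FPN structure in the neck network that combines context enhancement and feature refinement, and uses Wise-IoU as the loss function. Experimental results show that the proposed algorithm achieves an mAP@.5:.95 of 70.7%, a detection speed of 69.12 frames/s, and a model size of 4.19M parameters; in the qualitative result images, the average confidence of insulator defect detections reaches 0.87. These results indicate that the proposed algorithm makes the network lightweight, reduces inference time, and improves detection robustness.
Keywords: deep learning; object detection; YOLOv7tiny network; insulator defects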
Coupling the Power of YOLOv9 with Transformer for Small Object Detection in Remote-Sensing Images (Cited by: 1)
5
Author: Mohammad Barr. Computer Modeling in Engineering & Sciences, 2025, Issue 4, pp. 593-616 (24 pages)
Recent years have seen a surge in interest in object detection on remote sensing images for applications such as surveillance and management. However, challenges like small object detection, scale variation, and the presence of closely packed objects in these images hinder accurate detection. Additionally, the motion blur effect further complicates the identification of such objects. To address these issues, we propose an enhanced YOLOv9 with a transformer head (YOLOv9-TH). The model introduces an additional prediction head for detecting objects of varying sizes and swaps the original prediction heads for transformer heads to leverage self-attention mechanisms. We further improve YOLOv9-TH using several strategies, including data augmentation, multi-scale testing, multi-model integration, and the introduction of an additional classifier. The cross-stage partial (CSP) method and the ghost convolution hierarchical graph (GCHG) are combined to improve detection accuracy by better utilizing feature maps, widening the receptive field, and precisely extracting multi-scale objects. Additionally, we incorporate the E-SimAM attention mechanism to address low-resolution feature loss. Extensive experiments on the VisDrone2021 and DIOR datasets demonstrate the effectiveness of YOLOv9-TH, showing good improvement in mAP compared to the best existing methods. YOLOv9-TH-e achieved an mAP50 of 54.2% on the VisDrone2021 dataset and an mAP of 92.3% on the DIOR dataset. The results confirm the model's robustness and suitability for real-world applications, particularly for small object detection in remote sensing images.
Keywords: Remote sensing images; YOLOv9-TH; multi-scale object detection; transformer heads; VisDrone2021 dataset
Transforming Education with Photogrammetry:Creating Realistic 3D Objects for Augmented Reality Applications
6
Authors: Kaviyaraj Ravichandran, Uma Mohan. Computer Modeling in Engineering & Sciences, SCIE, EI, 2025, Issue 1, pp. 185-208 (24 pages)
Augmented reality (AR) is an emerging dynamic technology that effectively supports education across different levels. The increased use of mobile devices has an even greater impact. As the demand for AR applications in education continues to increase, educators actively seek innovative and immersive methods to engage students in learning. However, exploring these possibilities also entails identifying and overcoming existing barriers to optimal educational integration. Concurrently, this surge in demand has prompted the identification of specific barriers, one of which is three-dimensional (3D) modeling. Creating 3D objects for augmented reality education applications can be challenging and time-consuming for educators. To address this, we have developed a pipeline that creates realistic 3D objects from a two-dimensional (2D) photograph. Applications for augmented and virtual reality can then utilize these created 3D objects. We evaluated the proposed pipeline based on the usability of the 3D object and performance metrics. Quantitatively, with 117 respondents, the co-creation team was surveyed with open-ended questions to evaluate the precision of the 3D object created by the proposed photogrammetry pipeline. We analyzed the survey data using descriptive-analytical methods and found that the proposed pipeline produces 3D models that are positively accurate when compared to real-world objects, with an average mean score above 8. This study adds new knowledge in creating 3D objects for augmented reality applications by using the photogrammetry technique; finally, it discusses potential problems and future research directions for 3D objects in the education sector.
Keywords: Augmented reality; education; immersive learning; 3D object creation; photogrammetry; StructureFromMotion
Point-voxel dual transformer for LiDAR 3D object detection
7
Authors: TONG Jigang, YANG Fanhang, YANG Sen, DU Shengzhi. Optoelectronics Letters, 2025, Issue 9, pp. 547-554 (8 pages)
In this paper, a two-stage light detection and ranging (LiDAR) three-dimensional (3D) object detection framework is presented, namely point-voxel dual transformer (PV-DT3D), which is a transformer-based method. In the proposed PV-DT3D, point-voxel fusion features are used for proposal refinement. Specifically, keypoints are sampled from the entire point cloud scene and used to encode representative scene features via a proposal-aware voxel set abstraction module. Subsequently, following the generation of proposals by the region proposal networks (RPN), the internal encoded keypoints are fed into the dual transformer encoder-decoder architecture. In 3D object detection, the proposed PV-DT3D takes advantage of both point-wise transformer and channel-wise architecture to capture contextual information from the spatial and channel dimensions. Experiments conducted on the highly competitive KITTI 3D car detection leaderboard show that the PV-DT3D achieves superior detection accuracy among state-of-the-art point-voxel-based methods.
Keywords: proposal refinement; encode representative scene features; point voxel dual transformer; object detection; LiDAR 3D object detection; generation proposals
Global-local feature optimization based RGB-IR fusion object detection on drone view (Cited by: 1)
8
Authors: Zhaodong CHEN, Hongbing JI, Yongquan ZHANG. Chinese Journal of Aeronautics, 2026, Issue 1, pp. 436-453 (18 pages)
Visible and infrared (RGB-IR) fusion object detection plays an important role in security, disaster relief, etc. In recent years, deep-learning-based RGB-IR fusion detection methods have been developing rapidly, but still struggle to deal with the complex and changing scenarios captured by drones, mainly due to two reasons: (A) RGB-IR fusion detectors are susceptible to inferior inputs that degrade performance and stability. (B) RGB-IR fusion detectors are susceptible to redundant features that reduce accuracy and efficiency. In this paper, an innovative RGB-IR fusion detection framework based on global-local feature optimization, named GLFDet, is proposed to improve the detection performance and efficiency of drone-captured objects. The key components of GLFDet include a Global Feature Optimization (GFO) module, a Local Feature Optimization (LFO) module and a Channel Separation Fusion (CSF) module. Specifically, GFO calculates the information content of the input image from the frequency domain and optimizes the features holistically. Then, LFO dynamically selects high-value features and filters out low-value features before fusion, which significantly improves the efficiency of fusion. Finally, CSF fuses the RGB and IR features across the corresponding channels, which avoids the rearrangement of the channel relationships and enhances the model stability. Extensive experimental results show that the proposed method achieves the best performance on three popular RGB-IR datasets: Drone Vehicle, VEDAI, and LLVIP. In addition, GLFDet is more lightweight than other comparable models, making it more appealing to edge devices such as drones. The code is available at https://github.com/lao chen330/GLFDet.
Keywords: Object detection; Deep learning; RGB-IR fusion; Drones; Global feature; Local feature
Efficient Spatiotemporal Information Utilization for Video Camouflaged Object Detection
9
Authors: Dongdong Zhang, Chunping Wang, Huiying Wang, Qiang Fu. Computers, Materials & Continua, 2025, Issue 3, pp. 4319-4338 (20 pages)
Video camouflaged object detection (VCOD) has become a fundamental task in computer vision that has attracted significant attention in recent years. Unlike image camouflaged object detection (ICOD), VCOD not only requires spatial cues but also needs motion cues. Thus, effectively utilizing spatiotemporal information is crucial for generating accurate segmentation results. Current VCOD methods, which typically focus on exploring motion representation, often ineffectively integrate spatial and motion features, leading to poor performance in diverse scenarios. To address these issues, we design a novel spatiotemporal network with an encoder-decoder structure. During the encoding stage, an adjacent space-time memory module (ASTM) is employed to extract high-level temporal features (i.e., motion cues) from the current frame and its adjacent frames. In the decoding stage, a selective space-time aggregation module is introduced to efficiently integrate spatial and temporal features. Additionally, a multi-feature fusion module is developed to progressively refine the rough prediction by utilizing the information provided by multiple types of features. Furthermore, we incorporate multi-task learning into the proposed network to obtain more accurate predictions. Experimental results show that the proposed method outperforms existing cutting-edge baselines on VCOD benchmarks.
Keywords: Video camouflaged object detection; spatiotemporal information; feature fusion; multi-task learning
Exploration of the Application of Artificial Intelligence Technology in the Transformation of Old Objects
10
Authors: Tonghuan Zhang, Xinyu Yang, Ying Chen, Qiufan Xie. Journal of Electronic Research and Application, 2025, Issue 2, pp. 51-57 (7 pages)
With the rapid development of technology, artificial intelligence (AI) is increasingly being applied in various fields. In today's context of resource scarcity, pursuit of sustainable development and resource reuse, the transformation of old objects is particularly important. This article analyzes the current status of old object transformation and the opportunities brought by the internet to old objects and delves into the application of artificial intelligence in old object transformation. The focus is on five aspects: intelligent identification and classification, intelligent evaluation and prediction, automation integration, intelligent design and optimization, and integration of 3D printing technology. Finally, the process of "redesigning an old piece of furniture, such as a wooden desk, through AI technology" is described, including the recycling, identification, detection, design, transformation, and final user feedback of the old wooden desk. This illustrates the unlimited potential of the "AI+old object transformation" approach, advocates for people to strengthen green environmental protection, and drives sustainable development.
Keywords: Artificial Intelligence (AI); Old object transformation; Environmental protection
Railway-CLIP:A multimodal model for abnormal object detection in high-speed railway
11
Authors: Jiayu Zhang, Qingji Guan, Junbo Liu, Yaping Huang, Jianyong Guo. High-Speed Railway, 2025, Issue 3, pp. 194-204 (11 pages)
Automated detection of suspended anomalous objects on high-speed railway catenary systems using computer vision-based technology is a critical task for ensuring railway transportation safety. Despite the critical importance of this task, conventional vision-based foreign object detection methodologies have predominantly concentrated on image data, neglecting the exploration and integration of textual information. The currently popular multimodal model Contrastive Language-Image Pre-training (CLIP) employs contrastive learning to enable simultaneous understanding of both visual and textual modalities. Drawing inspiration from CLIP's capabilities, this paper introduces a novel CLIP-based multimodal foreign object detection model tailored for railway applications, referred to as Railway-CLIP. This model leverages CLIP's robust generalization capabilities to enhance performance in the context of catenary foreign object detection. The Railway-CLIP model is primarily composed of an image encoder and a text encoder. Initially, the Segment Anything Model (SAM) is employed to preprocess raw images, identifying candidate bounding boxes that may contain foreign objects. Both the original images and the detected candidate bounding boxes are subsequently fed into the image encoder to extract their respective visual features. In parallel, distinct prompt templates are crafted for both the original images and the candidate bounding boxes to serve as textual inputs. These prompts are then processed by the text encoder to derive textual features. The image and text encoders collaboratively project the multimodal features into a shared semantic space, facilitating the computation of similarity scores between visual and textual representations. The final detection results are determined based on these similarity scores, ensuring a robust and accurate identification of anomalous objects. Extensive experiments on our collected Railway Anomaly Dataset (RAD) demonstrate that the proposed Railway-CLIP outperforms previous state-of-the-art methods, achieving 97.25% AUROC and 92.66% F1-score, thereby validating the effectiveness and superiority of the proposed approach in real-world high-speed railway anomaly detection scenarios.
Keywords: High-speed railway catenary systems; Anomalous object detection; Multimodal model; Railway-CLIP
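The Railway-CLIP abstract describes scoring visual features against textual prompt features in a shared semantic space. CLIP-style models commonly use cosine similarity for this step, so the following is a minimal sketch under that assumption. It is not the authors' code: the two-dimensional vectors stand in for real encoder outputs, and the prompt strings and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def best_prompt(image_feature, text_features):
    """Return (prompt, scores) where prompt has the highest similarity to the image."""
    scores = {
        prompt: cosine_similarity(image_feature, vector)
        for prompt, vector in text_features.items()
    }
    return max(scores, key=scores.get), scores


# Toy stand-ins for encoder outputs (hypothetical values).
toy_image_feature = [1.0, 0.0]
toy_text_features = {
    "a photo of a normal catenary": [0.0, 1.0],
    "a photo of a foreign object on a catenary": [1.0, 1.0],
}
toy_label, toy_scores = best_prompt(toy_image_feature, toy_text_features)
```

In a real pipeline the same scoring would be applied to each candidate bounding box as well as to the full image, with the feature vectors produced by the trained encoders.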
Meyer Wavelet Transform and Jaccard Deep Q Net for Small Object Classification Using Multi-Modal Images
12
Authors: Mian Muhammad Kamal, Syed Zain Ul Abideen, M. A. Al-Khasawneh, Alaa M. Momani, Hala Mostafa, Mohammed Salem Atoum, Saeed Ullah, Jamil Abedalrahim Jamil Alsayaydeh, Mohd Faizal Bin Yusof, Suhaila Binti Mohd Najib. Computer Modeling in Engineering & Sciences, 2025, Issue 9, pp. 3053-3083 (31 pages)
Accurate detection of small objects is critically important in high-stakes applications such as military reconnaissance and emergency rescue. However, low resolution, occlusion, and background interference make small object detection a complex and demanding task. One effective approach to overcome these issues is the integration of multimodal image data to enhance detection capabilities. This paper proposes a novel small object detection method that utilizes three types of multimodal image combinations: Hyperspectral-Multispectral (HS-MS), Hyperspectral-Synthetic Aperture Radar (HS-SAR), and HS-SAR-Digital Surface Model (HS-SAR-DSM). The detection process is done by the proposed Jaccard Deep Q-Net (JDQN), which integrates the Jaccard similarity measure with a Deep Q-Network (DQN) using regression modeling. To produce the final output, a Deep Maxout Network (DMN) is employed to fuse the detection results obtained from each modality. The effectiveness of the proposed JDQN is validated using performance metrics such as accuracy, Mean Squared Error (MSE), precision, and Root Mean Squared Error (RMSE). Experimental results demonstrate that the proposed JDQN method outperforms existing approaches, achieving the highest accuracy of 0.907, a precision of 0.904, the lowest normalized MSE of 0.279, and a normalized RMSE of 0.528.
Keywords: Small object detection; multimodality; deep learning; Jaccard deep Q-net; deep maxout network
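Two quantities in the JDQN abstract have standard definitions that are easy to check: the Jaccard similarity of two sets, and the relation RMSE = sqrt(MSE). The sketch below illustrates only those definitions. How JDQN combines the Jaccard measure with the Deep Q-Network is not stated in the abstract and is not modelled here; the function names and the convention for two empty sets are assumptions.

```python
import math


def jaccard_similarity(a, b):
    """|A intersect B| / |A union B| for two collections treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty sets count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


def rmse_from_mse(mse):
    """Root mean squared error is the square root of the mean squared error."""
    if mse < 0:
        raise ValueError("MSE cannot be negative")
    return math.sqrt(mse)


# The reported normalized MSE of 0.279 is consistent with the reported RMSE of 0.528.
reported_rmse = round(rmse_from_mse(0.279), 3)
```

The rounding check shows the two reported error figures agree with each other, which is a quick sanity test when reading result tables.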
Physics-Informed Graph Learning for Shape Prediction in Robot Manipulate of Deformable Linear Objects
13
Authors: Meixuan Wang, Junliang Wang, Jie Zhang, Xinting Liao, Guojin Li. Chinese Journal of Mechanical Engineering, 2025, Issue 6, pp. 154-165 (12 pages)
Shape prediction of deformable linear objects (DLO) plays critical roles in robotics, medical devices, aerospace, and manufacturing, especially in manipulating objects such as cables, wires, and fibers. Due to the inherent flexibility of DLO and their complex deformation behaviors, such as bending and torsion, it is challenging to predict their dynamic characteristics accurately. Although the traditional physical modeling method can simulate the complex deformation behavior of DLO, the calculation cost is high and it is difficult to meet the demand of real-time prediction. In addition, the scarcity of data resources also limits the prediction accuracy of existing models. To solve these problems, a method of fiber shape prediction based on a physics-informed graph neural network (PIGNN) is proposed in this paper. This method combines the expressive power of graph neural networks with the strict constraints of physical laws. Specifically, we learn the initial deformation model of the fiber through graph neural networks (GNN) to provide a good initial estimate for the model, which helps alleviate the problem of data resource scarcity. During the training process, we incorporate the physical prior knowledge of the dynamic deformation of the fiber optics into the loss function as a constraint, which is then fed back to the network model. This ensures that the shape of the fiber optics gradually approaches the true target shape, effectively solving the complex nonlinear behavior prediction problem of deformable linear objects. Experimental results demonstrate that, compared to traditional methods, the proposed method significantly reduces execution time and prediction error when handling the complex deformations of deformable fibers. This showcases its potential application value and superiority in fiber manipulation.
Keywords: Deformable linear objects; Fiber; Physics-informed graph neural network (PIGNN); Shape prediction
Two Performance Indicators Assisted Infill Strategy for Expensive Many-Objective Optimization
14
Authors: Yi Zhao, Jianchao Zeng, Ying Tan. Journal of Harbin Institute of Technology (New Series), 2025, Issue 5, pp. 24-40 (17 pages)
In recent years, surrogate models derived from genuine data samples have proven to be efficient in addressing optimization challenges that are costly or time-intensive. However, the individuals in the population become indistinguishable as the curse of dimensionality intensifies in the objective space and surrogate approximation errors accumulate. Therefore, in this paper, each objective function is modeled using a radial basis function approach, and the optimal solution set of the surrogate model is located by the multi-objective evolutionary algorithm of strengthened dominance relation. The original objective function values of the true evaluations are converted to two indicator values, and then the surrogate models are set up for the two performance indicators. Finally, an adaptive infill sampling strategy that relies on approximate performance indicators is proposed to assist in selecting individuals for real evaluations from the potential optimal solution set. The algorithm is contrasted against several advanced surrogate-assisted evolutionary algorithms on two suites of test cases, and the experimental findings show that the approach is competitive in solving expensive many-objective optimization problems.
Keywords: expensive multi-objective optimization problems; infill sample strategy; evolutionary optimization algorithm
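Human-Object Interaction Detection with a Knowledge Distillation Transformer (Cited by: 1)
15
Authors: 陈东吉, 赖惠成, 高古学, 马骏, 李俊凯, 权虎拓. 《计算机工程》, PKU Core, 2026, Issue 1, pp. 206-216 (11 pages)
The widely adopted Transformer has also achieved strong results in human-object interaction (HOI) detection. Building on this, a new knowledge-distillation-based Transformer (KDT) network is proposed for end-to-end HOI detection. Because the overall HOI features modeled by a Transformer network are coarse, a basic multi-branch Transformer structure is built for the three HOI detection subtasks: predicting human boxes, predicting object boxes and object categories, and predicting the interactions between humans and objects. The structure comprises a human instance branch, an object instance branch, and an interaction branch, and the decoders of the human and object branches provide region cues about humans and objects to the interaction branch decoder. To supply the Transformer structure with key semantic and spatial information, semantic features of object categories and interaction verbs and spatial features of human and object boxes are generated in advance, giving the different Transformer branches semantic and spatial cues and further improving the decoders' feature extraction for the different HOI tasks. On this basis, another multi-branch Transformer structure is built as a teacher network, whose decoders take the pre-generated features as decoder queries and output more precise HOI features. During training, the basic multi-branch network imitates the outputs of the teacher network, and an additional class similarity loss measures the intra-class and inter-class vector similarity between the two networks' output predictions, thereby improving the performance of the basic network's decoders. Experimental results show that on the HOI benchmark dataset HICO-DET the mean average precision (mAP) for all, rare, and non-rare categories is 32.13%, 28.57%, and 33.19%, respectively, an improvement of up to 4.65 percentage points over the baseline.
Keywords: Transformer network; human-object interaction; pre-generated features; teacher network; class similarity loss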
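Fragrant Pear Object Detection in Complex Environments Based on Deep Fusion of Transformer-CNN Features
16
Authors: 杨瑛, 谭忠, 郑文轩. 《农业机械学报》, PKU Core, 2026, Issue 2, pp. 161-170 (10 pages)
To improve the accuracy of fragrant pear detection in complex environments, this study proposes TC-ICSA-YOLO v8, a fragrant pear detection model based on deep fusion of Transformer-CNN features. The model combines the strength of convolutional neural networks (CNN) in extracting local (high-frequency) image features with the strength of Transformers in extracting global (low-frequency) image features. It designs an Inception dilated convolution module and an adaptive detail fusion module (ADI), introduces the squeeze-enhanced axial attention mechanism (SeaAttention) and the coordinate attention mechanism (CA) to improve feature extraction, applies the Fourier transform (FT) for data augmentation, and uses a frequency ramp structure to better balance local (high-frequency) and global (low-frequency) feature components, thereby optimizing the network's performance during feature extraction. Experimental results show that on the validation set the TC-ICSA-YOLO v8 model reaches a mean average precision (mAP) of 97.01%, a precision of 97.33%, a recall of 95.69%, and a detection speed of 81.21 f/s. For images captured at night, the model's detection accuracy also exceeds that of the YOLO v8s model under the same conditions. Compared with the Faster R-CNN, YOLO v3, YOLO v7s, YOLO v8s, SwinTransformer, and RT-DETR models, its mAP is higher by 14.65, 3.34, 0.52, 0.20, 6.68, and 5.45 percentage points, respectively, and its model memory footprint is smaller by 74.60, 34.16, 9.48, 16.81, 20.84, and 13.64 MB, respectively. The model thus offers higher detection accuracy with fewer parameters, making it better suited to deployment on mobile devices. The proposed model detects fragrant pears well, provides a reference for object detection in complex environments, and can offer technical support for automated fragrant pear picking.
Keywords: Transformer; fragrant pear; object detection; Fourier transform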
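Application of Transformer Networks to Automatic Container Number Recognition
17
Authors: 张明, 涂昊, 程文明, 杜润. 《机械设计与制造》, PKU Core, 2026, Issue 3, pp. 206-209, 214 (5 pages)
Containers are among the most important carriers in modern logistics, and automatic recognition of container numbers plays a vital role in raising the level of automation and informatization at container terminals. However, existing research cannot meet terminals' requirements for real-time, accurate recognition of container numbers. We therefore propose a lightweight container number recognition algorithm. The method consists of two modules: a localization module and a recognition module. In the localization module, a new lightweight localization network, swift-YOLO, is proposed as an improvement on the YOLO v3 algorithm; in the recognition module, a Transformer-based character recognition network is designed to recognize the container number. Experimental results show that the proposed method achieves a recognition success rate of 98.3% with a single-frame recognition time of only 20 ms, significantly better than the best existing results.
Keywords: deep learning; container number; character recognition; object detection; Transformer network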
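A Survey of Transformer-Based DETR Object Detection Algorithms
18
Authors: 李沂杨, 陆声链, 王继杰, 陈明. 《计算机工程》, PKU Core, 2026, Issue 4, pp. 62-81 (20 pages)
In object detection, convolutional neural networks (CNN) have long dominated research thanks to their excellent accuracy and scalability, and have won broad recognition in academia. Within this framework, several representative models emerged in succession, including the region-based convolutional neural network (R-CNN) family (such as Fast R-CNN and Faster R-CNN) and the YOLO (You Only Look Once) family. Following the success of the Transformer in natural language processing, researchers began exploring its use in computer vision, which gave rise to vision backbones such as the Vision Transformer (ViT) and Swin Transformer. To reduce the prior knowledge and post-processing required in object detection, the Facebook team introduced DETR (DEtection TRansformer), a Transformer-based end-to-end object detection algorithm, in 2020. Although DETR shows potential in object detection, it also has drawbacks such as slow convergence, relatively poor accuracy, and the unclear physical meaning of object queries. These drawbacks have prompted further research on and improvements to the algorithm. This study summarizes the improvements proposed for DETR and analyzes their strengths and weaknesses, surveys frontier research and specialized application areas that use DETR, and finally offers an outlook on the future of DETR in computer vision.
Keywords: computer vision; object detection; DETR algorithm; Vision Transformer; image segmentation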
An Improved Variant of Multi-Population Cooperative Constrained Multi-Objective Optimization (MCCMO) for Multi-Objective Optimization Problem
19
Authors: Muhammad Waqar Khan, Adnan Ahmed Siddiqui, Syed Sajjad Hussain Rizvi. Computers, Materials & Continua, 2026, Issue 2, pp. 1874-1888 (15 pages)
Multi-objective optimization problems, especially in constrained environments such as power distribution planning, demand robust strategies for discovering effective solutions. This work presents an improved variant of the Multi-population Cooperative Constrained Multi-Objective Optimization (MCCMO) algorithm, termed Adaptive Diversity Preservation (ADP). This enhancement is primarily focused on the improvement of constraint handling strategies, local search integration, hybrid selection approaches, and adaptive parameter control. The improved variant was evaluated on the RWMOP50 power distribution system planning benchmark. As per the findings, the improved variant outperformed the original MCCMO across the eleven performance metrics, particularly in terms of convergence speed, constraint handling efficiency, and solution diversity. The results also establish that MCCMO-ADP consistently delivers substantial performance gains over the baseline MCCMO, demonstrating its effectiveness across performance metrics. The new variant also excels at maintaining a balanced trade-off between exploration and exploitation throughout the search process, making it especially suitable for complex optimization problems in multi-constrained power systems. These enhancements make MCCMO-ADP a valuable and promising candidate for handling problems such as renewable energy scheduling, logistics planning, and power system optimization. Future work will benchmark MCCMO-ADP against widely recognized algorithms such as NSGA-II, NSGA-III, and MOEA/D and will also extend its validation to large-scale real-world optimization domains to further consolidate its generalizability.
Keywords: MCCMO algorithms; adaptive diversity preservation; RWMOP50; power distribution system; multi-modal multi objective optimization; evolutionary algorithm; multi objective problem
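A LiDAR 3D Object Detection Algorithm Based on Attention and Conv2Former
20
Authors: 杨若楠, 闫梓涵, 冯嘉强, 关丽敏, 张玉杰. 《现代电子技术》, PKU Core, 2026, Issue 3, pp. 137-144 (8 pages)
To address the inadequate feature extraction of existing 3D object detection models caused by point cloud sparsity, this paper proposes a 3D object detection model built on PointPillars that integrates attention and Conv2Former. In the pillar feature encoding network, a stacked multi-attention module is designed that fuses point-level, channel-level, and pillar-level attention to strengthen the representation of local point cloud features, thereby improving detection accuracy for small objects. In the feature fusion network, a Conv2Former module is introduced, which uses a convolutional modulation mechanism to improve the model's global context awareness and further strengthen feature extraction. Experiments on the KITTI 3D object detection dataset show that the proposed model improves 3D detection accuracy for cars, pedestrians, and cyclists by 2.12%, 3.56%, and 2.65%, respectively, over the original PointPillars model; in addition, the proposed model runs at 38.46 f/s, fully meeting real-time processing requirements. These results verify the effectiveness and feasibility of the model in improving both the accuracy and the real-time performance of point cloud 3D object detection.
Keywords: driver assistance; LiDAR; object detection; PointPillars; attention mechanism; Conv2Former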