Augmented reality (AR) is an emerging, dynamic technology that effectively supports education across different levels, and the increasing use of mobile devices has amplified this impact. As the demand for AR applications in education continues to grow, educators actively seek innovative and immersive methods to engage students in learning. Exploring these possibilities, however, also entails identifying and overcoming barriers to effective educational integration. One such barrier is three-dimensional (3D) modeling: creating 3D objects for AR education applications can be challenging and time-consuming for educators. To address this, we developed a pipeline that creates realistic 3D objects from two-dimensional (2D) photographs; augmented and virtual reality applications can then utilize these objects. We evaluated the proposed pipeline based on the usability of the resulting 3D objects and on performance metrics. A co-creation team of 117 respondents was surveyed with open-ended questions to evaluate the precision of the 3D objects created by the proposed photogrammetry pipeline. Analyzing the survey data with descriptive-analytical methods, we found that the pipeline produces 3D models judged to be accurate representations of real-world objects, with a mean score above 8. This study adds new knowledge on creating 3D objects for AR applications using the photogrammetry technique and discusses potential problems and future research directions for 3D objects in the education sector.
To investigate the applicability of four commonly used color difference formulas (CIELAB, CIE94, CMC(1:1), and CIEDE2000) from the printing field to 3D objects, as well as the impact of four standard light sources (D65, D50, A, and TL84) on 3D color difference evaluation, 50 glossy spheres with a diameter of 2 cm were produced on a Sailner J400 3D color printing device. The spheres were centered around the five CIE-recommended colors (gray, red, yellow, green, and blue). Color differences were calculated according to the four formulas, and 111 pairs of experimental samples meeting the CIELAB gray-scale color difference requirements (1.0-14.0) were selected. Ten observers aged between 22 and 27, all with normal color vision, participated in this study, using the gray-scale method from psychophysical experiments to conduct color difference evaluations under the four light sources, with repeated trials for each observer. The results indicated that the overall effect of the D65 light source on the color difference of 3D objects was minimal. In contrast, the D50 and A light sources had a significant impact within the small color difference range, while the TL84 light source considerably influenced both large and small color differences. Among the four formulas, CIEDE2000 demonstrated the best predictive performance for color differences on 3D objects, followed by CMC(1:1), CIE94, and CIELAB.
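As an illustration of how the simpler of these formulas are computed, the following Python sketch implements the CIE76 (CIELAB) and CIE94 color differences; it uses the graphic-arts weighting constants for CIE94 and is not the evaluation code used in the study.

```python
import math

def delta_e_cielab(lab1, lab2):
    """CIE76 color difference: Euclidean distance in L*a*b* space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(lab1, lab2)))

def delta_e_cie94(lab1, lab2, k1=0.045, k2=0.015):
    """CIE94 color difference with graphic-arts weights (kL = kC = kH = 1)."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    dL = L1 - L2
    C1 = math.hypot(a1, b1)
    C2 = math.hypot(a2, b2)
    dC = C1 - C2
    dH_sq = max((a1 - a2) ** 2 + (b1 - b2) ** 2 - dC ** 2, 0.0)
    sL, sC, sH = 1.0, 1.0 + k1 * C1, 1.0 + k2 * C1
    return math.sqrt((dL / sL) ** 2 + (dC / sC) ** 2 + dH_sq / sH ** 2)

# Example with a hypothetical sample pair near the recommended gray center.
print(delta_e_cielab((50, 0, 0), (52, 1, -1)))   # about 2.45
print(delta_e_cie94((50, 0, 0), (52, 1, -1)))
```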
Transorbital craniocerebral injury is a relatively rare type of penetrating head injury that poses a significant threat to the ocular and cerebral structures.[1] The clinical prognosis of transorbital craniocerebral injury is closely related to the size, shape, speed, nature, and trajectory of the foreign object, as well as the incidence of central nervous system damage and secondary complications. The foreign objects reported to have caused these injuries are categorized into wooden items, metallic items,[2-8] and other materials, which penetrate the intracranial region via five major pathways: the orbital roof (OR), superior orbital fissure (SOF), inferior orbital fissure (IOF), optic canal (OC), and sphenoid wing. Herein, we present eight cases of transorbital craniocerebral injury caused by an unusual metallic foreign body.
With the rapid development of technology, artificial intelligence (AI) is increasingly being applied in various fields. In today's context of resource scarcity and the pursuit of sustainable development and resource reuse, the transformation of old objects is particularly important. This article analyzes the current status of old object transformation and the opportunities the internet brings to it, and delves into the application of artificial intelligence in old object transformation. The focus is on five aspects: intelligent identification and classification, intelligent evaluation and prediction, automation integration, intelligent design and optimization, and integration of 3D printing technology. Finally, the process of redesigning an old piece of furniture, such as a wooden desk, through AI technology is described, covering the recycling, identification, detection, design, and transformation of the old desk as well as final user feedback. This illustrates the potential of the "AI + old object transformation" approach, advocates stronger green environmental protection, and promotes sustainable development.
Shape prediction of deformable linear objects (DLOs) plays a critical role in robotics, medical devices, aerospace, and manufacturing, especially when manipulating objects such as cables, wires, and fibers. Due to the inherent flexibility of DLOs and their complex deformation behaviors, such as bending and torsion, it is challenging to predict their dynamic characteristics accurately. Traditional physical modeling can simulate the complex deformation behavior of DLOs, but its computational cost is high, making it difficult to meet real-time prediction requirements. In addition, the scarcity of data resources limits the prediction accuracy of existing models. To solve these problems, this paper proposes a fiber shape prediction method based on a physics-informed graph neural network (PIGNN), which combines the expressive power of graph neural networks (GNNs) with the strict constraints of physical laws. Specifically, an initial deformation model of the fiber is learned through a GNN to provide a good initial estimate, which helps alleviate the scarcity of data resources. During training, physical prior knowledge of the fiber's dynamic deformation is incorporated into the loss function as a constraint and fed back to the network model. This ensures that the predicted fiber shape gradually approaches the true target shape, effectively addressing the complex nonlinear behavior prediction problem of deformable linear objects. Experimental results demonstrate that, compared to traditional methods, the proposed method significantly reduces execution time and prediction error when handling the complex deformations of deformable fibers, showcasing its potential value and superiority in fiber manipulation.
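The central idea of constraining a GNN's shape prediction with a physics prior can be sketched as a composite loss; the residual function and weighting factor below are hypothetical placeholders, not the paper's actual formulation.

```python
import torch

def physics_informed_loss(pred_positions, target_positions, physics_residual_fn, lam=0.1):
    """Data-fitting term plus a penalty on violations of the physical prior
    (e.g., bending/torsion equations of the fiber).
    `physics_residual_fn` is a hypothetical callable returning per-node residuals."""
    data_loss = torch.nn.functional.mse_loss(pred_positions, target_positions)
    residual = physics_residual_fn(pred_positions)      # e.g., shape (num_nodes, 3)
    physics_loss = (residual ** 2).mean()
    return data_loss + lam * physics_loss
```

Back-propagating this scalar through the GNN pulls the predicted node positions toward the observed shape while penalizing violations of the physical constraint.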
Accurate segmentation of camouflaged objects in aerial imagery is vital for improving the efficiency of UAV-based reconnaissance and rescue missions. However, camouflaged object segmentation is increasingly challenging due to advances in both camouflage materials and biological mimicry. Although multispectral-RGB-based technology shows promise, conventional dual-aperture multispectral-RGB imaging systems are constrained by imprecise and time-consuming registration and fusion across modalities, limiting their performance. Here, we propose the Reconstructed Multispectral-RGB Fusion Network (RMRF-Net), which reconstructs RGB images into multispectral ones, enabling efficient multimodal segmentation using only an RGB camera. Specifically, RMRF-Net employs a divergent-similarity feature correction strategy to minimize reconstruction errors and includes an efficient boundary-aware decoder to enhance object contours. Notably, we establish the first real-world aerial multispectral-RGB semantic segmentation dataset for camouflaged objects, covering 11 object categories. Experimental results demonstrate that RMRF-Net outperforms existing methods, achieving 17.38 FPS on the NVIDIA Jetson AGX Orin with only a 0.96% drop in mIoU compared to the RTX 3090, showing its practical applicability in multimodal remote sensing.
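Reconstructing multispectral bands from RGB is, in its crudest form, a per-pixel regression; the least-squares sketch below only illustrates that problem setup and is far simpler than RMRF-Net's learned reconstruction.

```python
import numpy as np

def fit_rgb_to_ms(rgb_train, ms_train):
    """Least-squares linear map from RGB (3 channels) to B multispectral bands.
    rgb_train: (N, 3), ms_train: (N, B). A crude stand-in for a learned
    reconstruction network, shown only to illustrate the task."""
    X = np.hstack([rgb_train, np.ones((rgb_train.shape[0], 1))])  # add bias column
    W, *_ = np.linalg.lstsq(X, ms_train, rcond=None)              # shape (4, B)
    return W

def apply_rgb_to_ms(rgb_image, W):
    """rgb_image: (H, W, 3) float array -> (H, W, B) reconstructed bands."""
    h, w, _ = rgb_image.shape
    X = np.hstack([rgb_image.reshape(-1, 3), np.ones((h * w, 1))])
    return (X @ W).reshape(h, w, -1)
```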
The increasing prevalence of violent incidents in public spaces has created an urgent need for intelligent surveillance systems capable of detecting dangerous objects in real time. Traditional video surveillance relies on human monitoring, an approach limited by fatigue and delayed response times. This study addresses these challenges by developing an automated detection system that uses advanced deep learning techniques to enhance public safety. Our approach leverages state-of-the-art convolutional neural networks (CNNs), specifically You Only Look Once version 4 (YOLOv4) and EfficientDet, for real-time object detection. The system was trained on a comprehensive dataset of over 50,000 images, enhanced through data augmentation to improve robustness across varying lighting conditions and viewing angles. Cloud-based deployment on Amazon Web Services (AWS) ensured scalability and efficient processing. Experimental evaluations demonstrated high performance, with YOLOv4 achieving 92% accuracy while processing images in 0.45 s, and EfficientDet reaching 93% accuracy with a slightly longer processing time of 0.55 s per image. Field tests in high-traffic environments such as train stations and shopping malls confirmed the system's reliability, with a false alarm rate of only 4.5%. The integration of automatic alerts enabled rapid security responses to potential threats. The proposed CNN-based system provides an effective solution for real-time detection of dangerous objects in video surveillance, significantly improving response times and public safety. YOLOv4 proved more suitable for speed-critical applications, while EfficientDet offered marginally better accuracy. Future work will focus on optimizing the system for low-light conditions and further reducing false positives. This research contributes to the advancement of AI-driven surveillance technologies, offering a scalable framework adaptable to various security scenarios.
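A minimal sketch of the alerting logic such a system might use is shown below; the class names, confidence threshold, and notification callback are assumptions for illustration, not details from the study.

```python
from dataclasses import dataclass

DANGEROUS_CLASSES = {"knife", "gun"}   # assumed label set
ALERT_THRESHOLD = 0.80                 # assumed confidence cut-off

@dataclass
class Detection:
    label: str
    confidence: float
    box: tuple  # (x1, y1, x2, y2) in pixels

def filter_alerts(detections):
    """Keep only high-confidence detections of dangerous classes."""
    return [d for d in detections
            if d.label in DANGEROUS_CLASSES and d.confidence >= ALERT_THRESHOLD]

def process_frame(detections, notify):
    """`notify` is a hypothetical callback, e.g. a wrapper around a cloud alert service."""
    alerts = filter_alerts(detections)
    if alerts:
        notify(alerts)
    return alerts
```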
Most image-based object detection methods employ horizontal bounding boxes (HBBs) to capture objects in tunnel images. However, these boxes often fail to tightly enclose objects oriented in arbitrary directions, reducing accuracy and degrading detection performance; moreover, HBBs cannot provide directional information for rotated objects. This study proposes a rotated detection method for identifying apparent defects in shield tunnels. Specifically, the oriented region-based convolutional neural network (oriented R-CNN) is utilized to detect rotated objects in tunnel images. To enhance feature extraction, a novel hybrid backbone combining CNN-based networks with Swin Transformers is proposed, and a feature fusion strategy is employed to integrate the features extracted by both networks. Additionally, a neck network based on the bidirectional feature pyramid network (Bi-FPN) is designed to combine multi-scale object features. A bolt hole dataset is curated to evaluate the efficacy of the proposed method, and a dedicated pre-processing approach is developed for large images to accommodate the rotated, dense, and small-scale characteristics of objects in tunnel images. Experimental results demonstrate that the proposed method achieves an improvement of more than 4% in mAP50-95 over other rotated detectors and a 6.6%-12.7% improvement over mainstream horizontal detectors. Furthermore, it outperforms mainstream methods by 6.5%-14.7% in detecting leakage bolt holes, underscoring its significant engineering applicability.
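Rotated detectors such as oriented R-CNN typically represent a box as center, size, and rotation angle; a small sketch of converting that representation to corner points (with an assumed angle convention) is given below.

```python
import math

def obb_to_corners(cx, cy, w, h, angle_rad):
    """Convert an oriented bounding box (center, size, rotation) to its four
    corner points. The counter-clockwise angle convention is an assumption."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in half]

# Example: a 40x20 box centered at (100, 50), rotated 30 degrees.
print(obb_to_corners(100, 50, 40, 20, math.radians(30)))
```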
The ubiquity of mobile devices has driven advancements in mobile object detection. However, challenges in multi-scale object detection in open, complex environments persist due to limited computational resources. Traditional approaches such as network compression, quantization, and lightweight design often sacrifice accuracy or the robustness of feature representations. This article introduces the Fast Multi-scale Channel Shuffling Network (FMCSNet), a novel lightweight detection model optimized for mobile devices. FMCSNet integrates a fully convolutional Multilayer Perceptron (MLP) module that offers global perception without significantly increasing the parameter count, effectively bridging the gap between CNNs and Vision Transformers. FMCSNet balances computation and accuracy mainly through two key modules: the ShiftMLP module, consisting of a shift operation and an MLP, and a Partial group Convolution (PGConv) module, which reduces computation while enhancing information exchange between channels. With a computational complexity of 1.4G FLOPs and 1.3M parameters, FMCSNet outperforms CNN-based and DWConv-based ShuffleNetV2 by 1% and 4.5% mAP on the Pascal VOC 2007 dataset, respectively. Additionally, FMCSNet achieves an mAP of 30.0 (0.5:0.95 IoU threshold) with only 2.5G FLOPs and 2.0M parameters, and runs at 32 FPS on low-performance i5-series CPUs, meeting real-time detection requirements. The adaptability of the PGConv module across scenarios further highlights FMCSNet as a promising solution for real-time mobile object detection.
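The shift operation at the heart of ShiftMLP-style modules can be illustrated generically: channels are split into groups and each group is displaced by one pixel in a different direction, mixing spatial information at negligible cost. The sketch below is a generic illustration of that idea, not FMCSNet's exact layer.

```python
import numpy as np

def channel_shift(x):
    """Spatial shift: split channels into four groups and shift each group by
    one pixel in a different direction, zero-padding at the border.
    x: float array of shape (C, H, W)."""
    out = np.zeros_like(x)
    c = x.shape[0] // 4
    out[:c, :, 1:] = x[:c, :, :-1]            # shift right
    out[c:2*c, :, :-1] = x[c:2*c, :, 1:]      # shift left
    out[2*c:3*c, 1:, :] = x[2*c:3*c, :-1, :]  # shift down
    out[3*c:, :-1, :] = x[3*c:, 1:, :]        # shift up
    return out
```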
To address frequent identity switches (IDS) and degraded identification accuracy in multi-object tracking (MOT) under complex occlusion scenarios, this study proposes an occlusion-robust tracking framework based on joint face-pedestrian feature modeling. By constructing a joint tracking model centered on "intra-class independent tracking + cross-category dynamic binding", designing a multi-modal matching metric with spatio-temporal and appearance constraints, and introducing a cross-category feature mutual verification mechanism and a dual matching strategy, this work effectively resolves the performance degradation that traditional single-category tracking methods suffer under short-term occlusion, cross-camera tracking, and crowded environments. Experiments on the Chokepoint_Face_Pedestrian_Track test set demonstrate that, in complex scenes, the proposed method improves face-pedestrian matching F1 area under the curve (F1 AUC) by approximately 4 to 43 percentage points compared with several traditional methods. The joint tracking model achieves overall performance of IDF1 = 85.1825% and MOTA = 86.5956%, improvements of 0.91 and 0.06 percentage points, respectively, over the baseline model. Ablation studies confirm the effectiveness of key modules such as the Intersection over Area (IoA)/Intersection over Union (IoU) joint metric and dynamic threshold adjustment, validating the significant role of the cross-category identity matching mechanism in enhancing tracking stability. Our model shows a 16.7% drop in frames per second (FPS) compared with FairMOT (fairness of detection and re-identification in multiple object tracking), with its cross-category binding module adding about 10% overhead, yet it maintains near-real-time performance for face-pedestrian tracking at small resolutions.
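The IoA/IoU pair used in the joint metric can be sketched directly; which box's area IoA is normalized by is an assumption here (the face box, reflecting face-in-body containment).

```python
def box_area(b):
    """Area of an (x1, y1, x2, y2) box, clipped at zero for degenerate boxes."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def intersection(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    inter = intersection(a, b)
    return inter / (box_area(a) + box_area(b) - inter + 1e-9)

def ioa(face_box, pedestrian_box):
    """Intersection over Area: fraction of the face box covered by the
    pedestrian box, useful for containment-style face-to-body association."""
    return intersection(face_box, pedestrian_box) / (box_area(face_box) + 1e-9)
```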
With the rapid expansion of drone applications, accurate detection of objects in aerial imagery has become crucial for intelligent transportation, urban management, and emergency rescue missions. However, existing methods face numerous challenges in practical deployment, including scale variation, feature degradation, and complex backgrounds. To address these issues, we propose Edge-enhanced and Detail-Capturing You Only Look Once (EHDC-YOLO), a novel framework for object detection in Unmanned Aerial Vehicle (UAV) imagery. Built on the You Only Look Once version 11 nano (YOLOv11n) baseline, EHDC-YOLO systematically introduces several architectural enhancements: (1) a Multi-Scale Edge Enhancement (MSEE) module that leverages multi-scale pooling and edge information to enhance boundary feature extraction; (2) an Enhanced Feature Pyramid Network (EFPN) that integrates P2-level features with Cross Stage Partial (CSP) structures and OmniKernel convolutions for finer-grained representation; and (3) a Dynamic Head (DyHead) with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability. Comprehensive experiments on the Vision meets Drones for Detection (VisDrone-DET) 2019 dataset demonstrate that EHDC-YOLO achieves significant improvements over the YOLOv11n baseline, increasing mean Average Precision (mAP)@0.5 from 33.2% to 46.1% (an absolute improvement of 12.9 percentage points) and mAP@0.5:0.95 from 19.5% to 28.0% (an absolute improvement of 8.5 percentage points), while maintaining a reasonable parameter count (2.81M vs. the baseline's 2.58M). Ablation studies confirm the effectiveness of each proposed component, while visualization results highlight EHDC-YOLO's superior performance in detecting objects and handling occlusions in complex drone scenarios.
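One way to read the MSEE idea (multi-scale pooling plus edge information) is as adding back high-frequency residuals computed at several pooling scales; the sketch below is a speculative illustration of that interpretation, not the module's published definition, and the scales and weight are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def multiscale_edge_enhance(feat, scales=(3, 5, 7), weight=0.5):
    """Boost boundary responses by adding the residual between a feature map
    and its local average (a crude edge/high-pass term) at several scales.
    feat: (H, W) float array representing one feature channel."""
    edges = sum(feat - uniform_filter(feat, size=s) for s in scales)
    return feat + weight * edges / len(scales)
```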
Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0. Manual inspection of products on assembly lines remains inefficient, error-prone, and inconsistent, emphasizing the need for a reliable, automated inspection system. Leveraging both object detection and image segmentation, this research proposes a vision-based solution for detecting various kinds of tools in a toolkit using deep learning (DL) models. Two Intel RealSense D455f depth cameras were arranged in a top-down configuration to capture both RGB and depth images of the toolkits. After applying multiple constraints and enhancing the images through preprocessing and augmentation, a dataset of 3300 annotated RGB-D images was generated. Candidate DL models were selected through a comprehensive assessment of mean Average Precision (mAP), precision-recall balance, inference latency (target ≥ 30 FPS), and computational burden, resulting in a preference for YOLO and Region-based Convolutional Neural Network (R-CNN) variants over ViT-based models, whose latency and resource requirements were higher. YOLOv5, YOLOv8, YOLOv11, Faster R-CNN, and Mask R-CNN were trained on the annotated dataset and evaluated using key performance metrics (recall, accuracy, F1-score, and precision). YOLOv11 demonstrated balanced excellence with 93.0% precision, 89.9% recall, and a 90.6% F1-score in object detection, as well as 96.9% precision, 95.3% recall, and a 96.5% F1-score in instance segmentation, with an average inference time of 25 ms per frame (≈40 FPS), demonstrating real-time performance. Leveraging these results, a YOLOv11-based Windows application was deployed in a real-time assembly line environment, where it accurately processed live video streams to detect and segment tools within toolkits, demonstrating its practical effectiveness in industrial automation. In addition to detection and segmentation, the application precisely measures socket dimensions by applying edge detection techniques to YOLOv11 segmentation masks. This enables specification-level quality control directly on the assembly line, improving real-time inspection capability. The implementation represents a significant step toward intelligent manufacturing within the Industry 4.0 paradigm, providing a scalable, efficient, and accurate way to perform automated inspection and dimensional verification.
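The dimension-measurement step (edge/contour analysis on segmentation masks) can be approximated with standard OpenCV operations; the mask format and calibration constant below are assumptions, not the authors' implementation.

```python
import cv2
import numpy as np

def socket_dimensions_mm(mask, mm_per_pixel):
    """Estimate socket width/height from a binary segmentation mask (uint8, 0/255)
    by fitting a minimum-area rotated rectangle to the largest contour.
    `mm_per_pixel` would come from camera calibration (here an assumed constant)."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    (_, _), (w_px, h_px), _ = cv2.minAreaRect(largest)
    return sorted((w_px * mm_per_pixel, h_px * mm_per_pixel))  # [short side, long side] in mm
```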
Funding for the AI-assisted old object transformation study: 2023 College Student Innovation and Entrepreneurship Training Program, Provincial and Ministerial Level (Chongqing): Jiangjiang, a DIY Old Object Transformation Platform Integrating AI Technology (Project No. S202312608036).
Funding for the deformable linear object shape prediction study: Supported by the Fundamental Research Funds for the Central Universities (Grant Nos. 2232024Y-01 and LZB2023001), the DHU Distinguished Young Professor Program, the National Natural Science Foundation of China (Grant No. 52275478), and the AI-Enhanced Research Program of the Shanghai Municipal Education Commission (Grant No. SMEC-AI-DHUY-05).
Funding for the camouflaged object segmentation study: National Natural Science Foundation of China (Grant Nos. 62005049 and 62072110); Natural Science Foundation of Fujian Province (Grant No. 2020J01451).
基金support from the National Natural Science Foundation of China(Grant Nos.52025084 and 52408420)the Beijing Natural Science Foundation(Grant No.8244058).
Funding for the FMCSNet mobile object detection study: Funded by the National Natural Science Foundation of China under Grant No. 62371187 and the Open Program of the Hunan Intelligent Rehabilitation Robot and Auxiliary Equipment Engineering Technology Research Center under Grant No. 2024JS101.
Funding for the face-pedestrian joint tracking study: Supported by confidential research grant No. a8317.
Funding for the toolkit inspection study: National Science and Technology Council, Republic of China, under grant NSTC 113-2221-E-194-011-MY3, and the Research Center on Artificial Intelligence and Sustainability, National Chung Cheng University, under the research project grant titled "Generative Digital Twin System Design for Sustainable Smart City Development in Taiwan".