The ubiquity of mobile devices has driven advancements in mobile object detection. However, challenges in multi-scale object detection in open, complex environments persist due to limited computational resources. Traditional approaches like network compression, quantization, and lightweight design often sacrifice accuracy or feature representation robustness. This article introduces the Fast Multi-scale Channel Shuffling Network (FMCSNet), a novel lightweight detection model optimized for mobile devices. FMCSNet integrates a fully convolutional Multilayer Perceptron (MLP) module, offering global perception without significantly increasing parameters and effectively bridging the gap between CNNs and Vision Transformers. FMCSNet strikes a balance between computation and accuracy mainly through two key modules: the ShiftMLP module, comprising a shift operation and an MLP module, and a Partial group Convolutional (PGConv) module, which reduces computation while enhancing information exchange between channels. With a computational complexity of 1.4G FLOPs and 1.3M parameters, FMCSNet outperforms CNN-based and DWConv-based ShuffleNetv2 by 1% and 4.5% mAP on the Pascal VOC 2007 dataset, respectively. Additionally, FMCSNet achieves an mAP of 30.0 (0.5:0.95 IoU threshold) with only 2.5G FLOPs and 2.0M parameters. It reaches 32 FPS on low-performance i5-series CPUs, meeting real-time detection requirements. The PGConv module's adaptability across scenarios further highlights FMCSNet as a promising solution for real-time mobile object detection.
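The abstract above names two mechanisms, channel shuffling and partial (group) convolution, without giving their details. As an illustration only, here is a minimal NumPy sketch of the generic versions of these ideas: a ShuffleNet-style channel shuffle, and a convolution applied to only a fraction of the channels while the rest pass through untouched. The shapes, the `ratio` parameter, and the 1x1 mixing matrix are illustrative assumptions, not FMCSNet's actual design.

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups (ShuffleNet-style).

    x: feature map of shape (channels, height, width).
    """
    c, h, w = x.shape
    assert c % groups == 0
    # (groups, c_per_group, H, W) -> swap the two group axes -> flatten back
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

def partial_conv(x, weight, ratio=0.25):
    """Convolve only the first `ratio` fraction of channels (a 1x1 conv
    here, for simplicity); the remaining channels pass through untouched.

    weight: (c_active, c_active) mixing matrix for the 1x1 convolution.
    """
    c = x.shape[0]
    c_active = int(c * ratio)
    active, passive = x[:c_active], x[c_active:]
    # A 1x1 convolution is a per-pixel linear mix of the active channels.
    mixed = np.tensordot(weight, active, axes=([1], [0]))
    return np.concatenate([mixed, passive], axis=0)
```

Shuffling after a grouped operation is what restores information exchange between channel groups; the partial convolution is what keeps the FLOP count low, since only `c_active` channels are actually mixed.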
To address frequent identity switches (IDs) and degraded identification accuracy in multi-object tracking (MOT) under complex occlusion scenarios, this study proposes an occlusion-robust tracking framework based on face-pedestrian joint feature modeling. By constructing a joint tracking model centered on "intra-class independent tracking + cross-category dynamic binding", designing a multi-modal matching metric with spatio-temporal and appearance constraints, and introducing a cross-category feature mutual verification mechanism and a dual matching strategy, this work effectively resolves the performance degradation that short-term occlusion, cross-camera tracking, and crowded environments cause in traditional single-category tracking methods. Experiments on the Chokepoint_Face_Pedestrian_Track test set demonstrate that in complex scenes, the proposed method improves Face-Pedestrian Matching F1 area under the curve (F1 AUC) by approximately 4 to 43 percentage points compared with several traditional methods. The joint tracking model achieves overall performance metrics of IDF1: 85.1825% and MOTA: 86.5956%, representing improvements of 0.91 and 0.06 percentage points, respectively, over the baseline model. Ablation studies confirm the effectiveness of key modules such as the Intersection over Area (IoA)/Intersection over Union (IoU) joint metric and dynamic threshold adjustment, validating the significant role of the cross-category identity matching mechanism in enhancing tracking stability. Our model shows a 16.7% frames-per-second (FPS) drop versus FairMOT (fairness of detection and re-identification in multiple object tracking), with its cross-category binding module adding about 10% overhead, yet maintains near-real-time performance for essential face-pedestrian tracking at small resolutions.
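The IoA/IoU joint metric mentioned above is built from two standard box overlaps. As a sketch of the generic definitions (not this paper's exact weighting or thresholds): IoU normalizes the intersection by the union of both boxes, while IoA normalizes it by the area of one box alone, which is the natural choice when one box (a face) is expected to lie inside another (a pedestrian).

```python
def box_area(box):
    """Area of an axis-aligned box given as (x1, y1, x2, y2)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def intersection(a, b):
    """Area of the overlap between two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return box_area((x1, y1, x2, y2))

def iou(a, b):
    """Intersection over Union: symmetric overlap measure."""
    inter = intersection(a, b)
    return inter / (box_area(a) + box_area(b) - inter + 1e-9)

def ioa(a, b):
    """Intersection over the area of box `a` alone; close to 1 whenever
    `a` (e.g. a face box) lies inside `b` (a pedestrian box), even when
    the boxes differ greatly in size and IoU is tiny."""
    return intersection(a, b) / (box_area(a) + 1e-9)
```

For a small face box fully inside a large pedestrian box, IoA is near 1 while IoU is near 0, which is why a joint metric helps cross-category binding.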
With the rapid expansion of drone applications, accurate detection of objects in aerial imagery has become crucial for intelligent transportation, urban management, and emergency rescue missions. However, existing methods face numerous challenges in practical deployment, including scale variation handling, feature degradation, and complex backgrounds. To address these issues, we propose Edge-enhanced and Detail-Capturing You Only Look Once (EHDC-YOLO), a novel framework for object detection in Unmanned Aerial Vehicle (UAV) imagery. Based on the You Only Look Once version 11 nano (YOLOv11n) baseline, EHDC-YOLO systematically introduces several architectural enhancements: (1) a Multi-Scale Edge Enhancement (MSEE) module that leverages multi-scale pooling and edge information to enhance boundary feature extraction; (2) an Enhanced Feature Pyramid Network (EFPN) that integrates P2-level features with Cross Stage Partial (CSP) structures and OmniKernel convolutions for better fine-grained representation; and (3) Dynamic Head (DyHead) with multi-dimensional attention mechanisms for enhanced cross-scale modeling and perspective adaptability. Comprehensive experiments on the Vision meets Drones for Detection (VisDrone-DET) 2019 dataset demonstrate that EHDC-YOLO achieves significant improvements, increasing mean Average Precision (mAP)@0.5 from 33.2% to 46.1% (an absolute improvement of 12.9 percentage points) and mAP@0.5:0.95 from 19.5% to 28.0% (an absolute improvement of 8.5 percentage points) compared with the YOLOv11n baseline, while maintaining a reasonable parameter count (2.81M vs. the baseline's 2.58M). Further ablation studies confirm the effectiveness of each proposed component, while visualization results highlight EHDC-YOLO's superior performance in detecting objects and handling occlusions in complex drone scenarios.
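The MSEE module above combines multi-scale pooling with edge information; its exact design is in the paper. As an illustration of the general idea only, one common way to couple the two is to treat the residual between a feature map and its average-pooled (blurred) version as a high-frequency edge signal and add it back at several scales. The kernel sizes and `strength` factor below are illustrative assumptions.

```python
import numpy as np

def avg_pool_blur(x, k=3):
    """Box-blur a 2-D feature map with a k x k mean filter (edge-padded)."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = xp[i:i + k, j:j + k].mean()
    return out

def edge_enhance(x, scales=(3, 5), strength=0.5):
    """Add back multi-scale high-frequency residuals (x - blur(x)),
    sharpening boundaries while leaving smooth regions unchanged."""
    x = x.astype(float)
    enhanced = x.copy()
    for k in scales:
        enhanced += strength * (x - avg_pool_blur(x, k))
    return enhanced
```

On a step edge this overshoots on the bright side and undershoots on the dark side (contrast grows at the boundary), while a constant region passes through unchanged, which is the behavior an edge-enhancement branch is after.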
Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0. Manual inspection of products on assembly lines remains inefficient, prone to errors, and lacking in consistency, emphasizing the need for a reliable and automated inspection system. Leveraging both object detection and image segmentation approaches, this research proposes a vision-based solution for the detection of various kinds of tools in a toolkit using deep learning (DL) models. Two Intel RealSense D455f depth cameras were arranged in a top-down configuration to capture both RGB and depth images of the toolkits. After applying multiple constraints and enhancing the images through preprocessing and augmentation, a dataset consisting of 3300 annotated RGB-D images was generated. Several DL models were selected through a comprehensive assessment of mean Average Precision (mAP), precision-recall equilibrium, inference latency (target ≥ 30 FPS), and computational burden, resulting in a preference for YOLO and Region-based Convolutional Neural Network (R-CNN) variants over ViT-based models due to the latter's increased latency and resource requirements. YOLOv5, YOLOv8, YOLOv11, Faster R-CNN, and Mask R-CNN were trained on the annotated dataset and evaluated using key performance metrics (Recall, Accuracy, F1-score, and Precision). YOLOv11 demonstrated balanced excellence with 93.0% precision, 89.9% recall, and a 90.6% F1-score in object detection, as well as 96.9% precision, 95.3% recall, and a 96.5% F1-score in instance segmentation, with an average inference time of 25 ms per frame (≈40 FPS), demonstrating real-time performance. Leveraging these results, a YOLOv11-based Windows application was successfully deployed in a real-time assembly line environment, where it accurately processed live video streams to detect and segment tools within toolkits, demonstrating its practical effectiveness in industrial automation. In addition to detection and segmentation, the application precisely measures socket dimensions by applying edge detection techniques to the YOLOv11 segmentation masks. This enables specification-level quality control directly on the assembly line, improving real-time inspection capability. The implementation represents a significant step toward intelligent manufacturing within the Industry 4.0 paradigm, providing a scalable, efficient, and accurate approach to automated inspection and dimensional verification.
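The dimensional-verification step above derives socket measurements from segmentation masks. A simplified sketch of the final measurement stage might look as follows: take the bounding extent of a binary mask and scale it by a pixel-to-millimetre calibration factor. The `mm_per_pixel` parameter is a hypothetical stand-in; the described system uses edge detection on the masks and would obtain calibration from the depth cameras or a reference object.

```python
import numpy as np

def socket_dimensions_mm(mask, mm_per_pixel):
    """Estimate the bounding width/height of a segmented socket.

    mask: binary (H, W) array, e.g. a thresholded segmentation mask.
    mm_per_pixel: calibration factor (hypothetical here; in practice it
    comes from camera calibration or a known-size reference object).
    Returns (width_mm, height_mm), or None for an empty mask.
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    width_px = xs.max() - xs.min() + 1
    height_px = ys.max() - ys.min() + 1
    return width_px * mm_per_pixel, height_px * mm_per_pixel
```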
Purpose: The primary aim of this study was to develop an assessment of the fundamental, combined, and complex movement skills required to support childhood physical literacy. The secondary aim was to establish the feasibility, objectivity, and reliability evidence for the assessment. Methods: An expert advisory group recommended a course format for the assessment that would require children to complete a series of dynamic movement skills. Criterion-referenced skill performance and completion time were the recommended forms of evaluation. Children, 8–12 years of age, self-reported their age and gender and then completed the study assessments while attending local schools or day camps. Face validity was previously established through a Delphi expert (n = 19, 21% female) review process. Convergent validity was evaluated by age and gender associations with assessment performance. Inter- and intra-rater (n = 53, 34% female) objectivity and test–retest (n = 60, 47% female) reliability were assessed through repeated test administration. Results: Median total score was 21 of 28 points (range 5–28). Median completion time was 17 s. Total scores were feasible for all 995 children who self-reported age and gender. Total score did not differ between inside and outside environments (95% confidence interval (CI) of difference: -0.7 to 0.6; p = 0.91) or with/without footwear (95% CI of difference: -2.5 to 1.9; p = 0.77). Older age (p < 0.001, η² = 0.15) and male gender (p < 0.001, η² = 0.02) were associated with a higher total score. Inter-rater objectivity evidence was excellent (intraclass correlation coefficient (ICC) = 0.99) for completion time and substantial for skill score (ICC = 0.69) for 104 attempts by 53 children (34% female). Intra-rater objectivity was moderate (ICC = 0.52) for skill score and excellent for completion time (ICC = 0.99). Reliability was excellent for completion time over a short (2–4 days; ICC = 0.84) or long (8–14 days; ICC = 0.82) interval. Skill score reliability was moderate (ICC = 0.46) over a short interval, and substantial (ICC = 0.74) over a long interval. Conclusion: The Canadian Agility and Movement Skill Assessment is a feasible measure of selected fundamental, complex, and combined movement skills, which are an important building block for childhood physical literacy. Moderate-to-excellent objectivity was demonstrated for children 8–12 years of age. Test–retest reliability has been established over an interval of at least 1 week. The time and skill scores can be accurately estimated by 1 trained examiner.
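The qualitative labels used above (moderate, substantial, excellent) follow a Landis-Koch-style banding convention. As a small helper, the cut-points below are chosen to be consistent with the labels reported in this abstract; they are one common convention, not a universal standard.

```python
def icc_label(icc):
    """Map an intraclass correlation coefficient (ICC) to a qualitative
    band (one common Landis-Koch-style convention; cut-points chosen to
    match the labels used in the abstract above)."""
    if icc > 0.80:
        return "excellent"   # e.g. completion time, ICC = 0.99
    if icc > 0.60:
        return "substantial" # e.g. inter-rater skill score, ICC = 0.69
    if icc > 0.40:
        return "moderate"    # e.g. intra-rater skill score, ICC = 0.52
    return "fair or worse"
```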
For a physically possible deformation field of a continuum, the deformation gradient function F can be decomposed into the direct sum of a symmetric tensor S and an orthogonal tensor R, which is called the S-R decomposition theorem. In this paper, the unique-existence part of the S-R decomposition theorem is proved by employing matrix and tensor methods. A brief proof of its objectivity is also given.
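Stated symbolically, the decomposition claimed in this abstract (a restatement of the claim, not of the paper's proof) is:

```latex
% S-R decomposition: the deformation gradient splits additively into
% a symmetric (strain-like) part and an orthogonal (rotation) part.
F = S + R, \qquad S = S^{\mathrm{T}}, \qquad R^{\mathrm{T}} R = I .
```

The paper's contribution, per the abstract, is proving that such a pair (S, R) exists and is unique for any physically possible deformation field, and that the decomposition is objective (frame-indifferent).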
Although objectivity is mainly accounted for in terms of linguistic thought and communication, in this article I will aim to show that at least one condition of possibility for our understanding of objectivity is grounded on a prepredicative, i.e., pre-linguistic and pre-communicative, level. I will endorse a Husserlian viewpoint on the issue, and I will try to develop some aspects of the Husserlian account of three-dimensional thing-perception by means of which I will show how prepredicative experience can actually offer us a fundamental element of our common understanding of objectivity. In doing this, it will be necessary to acknowledge thing-perception as being primarily intertwined with indeterminacy. I will claim that only on the basis of such an intuitive and prepredicative access to things as partially indeterminate, first, and as determinable, second, is it possible to have an understanding of the world as something (at least partially) independent from the intuition(s) all subjects can have of it. By means of the addition of a consciousness of the thing as accessible to other subjects, one achieves a vision of the thing as fully determinate in itself. This "vision", however, takes one to be aware of the determination of the thing as lying beyond any intuitive grasp of it. The result will, thus, be that the prepredicative constitution of our basic sense of objectivity leads us to intend the world as something which should be accounted for (also) by means of sources different from intuition.
Funding: Funded by the National Natural Science Foundation of China under Grant No. 62371187, and the Open Program of Hunan Intelligent Rehabilitation Robot and Auxiliary Equipment Engineering Technology Research Center under Grant No. 2024JS101.
Funding: Supported by confidential research grant No. a8317.
Funding: National Science and Technology Council, the Republic of China, under grants NSTC 113-2221-E-194-011-MY3, and the Research Center on Artificial Intelligence and Sustainability, National Chung Cheng University, under the research project grant titled "Generative Digital Twin System Design for Sustainable Smart City Development in Taiwan".
Funding: Funded by a grant from the Canadian Institutes of Health Research awarded to Dr. Meghann Lloyd and Dr. Mark Tremblay (IHD 94356).