To maintain the reliability of power systems, routine inspections using drones equipped with advanced object detection algorithms are essential for preempting power-related issues. The increasing resolution of drone-captured images poses a challenge for traditional target detection methods, especially in identifying small objects in high-resolution images. This study presents an enhanced object detection algorithm based on the Faster Region-based Convolutional Neural Network (Faster R-CNN) framework, specifically tailored for detecting small-scale electrical components such as insulators, shock hammers, and screws on transmission lines. The algorithm features an improved backbone network for Faster R-CNN, which significantly boosts the feature extraction network's ability to capture fine details. The Region Proposal Network is optimized using guided feature refinement (GFR), which balances accuracy and speed. The incorporation of Generalized Intersection over Union (GIoU) and Region of Interest (RoI) Align further refines the model's accuracy. Experimental results demonstrate a notable improvement in mean Average Precision, reaching 89.3%, an 11.1% increase compared to the standard Faster R-CNN. This highlights the effectiveness of the proposed algorithm in identifying electrical components in high-resolution aerial images.
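The GIoU measure named in the abstract above can be illustrated with a short sketch. This is the standard textbook definition for two axis-aligned boxes, not the authors' implementation; the function name `giou` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration, and both boxes are assumed to have positive area.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes x2 > x1 and y2 > y1 for both boxes (positive area).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle; width or height is clamped to 0 when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box C that encloses both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # GIoU = IoU - |C \ (A ∪ B)| / |C|, which lies in (-1, 1].
    return iou - (enclose - union) / enclose
```

Unlike plain IoU, the value stays informative for non-overlapping boxes: it becomes more negative as the boxes move apart, which is why `1 - giou(...)` is commonly used as a regression loss.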
The gears of new energy vehicles are required to withstand higher rotational speeds and greater loads, which imposes higher precision requirements on gear manufacturing. However, machining process parameters can change the cutting force and heat, which in turn affects gear machining precision. Therefore, this paper studies the effect of different process parameters on gear machining precision. A multi-objective optimization model is established for the relationship between the process parameters (the cutting speed, feed rate, and cutting depth of the worm wheel gear grinding machine) and tooth surface deviations, tooth profile deviations, and tooth lead deviations. The response surface method (RSM) is used for experimental design, and the corresponding experimental results and optimal process parameters are obtained. Subsequently, gray relational analysis-principal component analysis (GRA-PCA), particle swarm optimization (PSO), and genetic algorithm-particle swarm optimization (GA-PSO) methods are used to analyze the experimental results and obtain different optimal process parameters. The results show that the optimal process parameters obtained by the GRA-PCA, PSO, and GA-PSO methods improve the gear machining precision. Moreover, the gear machining precision obtained by GA-PSO is superior to that of the other methods.
Augmented reality (AR) is an emerging dynamic technology that effectively supports education across different levels. The increased use of mobile devices has an even greater impact. As the demand for AR applications in education continues to increase, educators actively seek innovative and immersive methods to engage students in learning. However, exploring these possibilities also entails identifying and overcoming existing barriers to optimal educational integration. Concurrently, this surge in demand has prompted the identification of specific barriers, one of which is three-dimensional (3D) modeling. Creating 3D objects for augmented reality education applications can be challenging and time-consuming for educators. To address this, we have developed a pipeline that creates realistic 3D objects from two-dimensional (2D) photographs. Applications for augmented and virtual reality can then utilize these created 3D objects. We evaluated the proposed pipeline based on the usability of the 3D object and performance metrics. Quantitatively, the co-creation team, with 117 respondents, was surveyed with open-ended questions to evaluate the precision of the 3D object created by the proposed photogrammetry pipeline. We analyzed the survey data using descriptive-analytical methods and found that the proposed pipeline produces 3D models that are rated as accurate when compared to real-world objects, with a mean score above 8. This study adds new knowledge on creating 3D objects for augmented reality applications using the photogrammetry technique; finally, it discusses potential problems and future research directions for 3D objects in the education sector.
The subcortical visual pathway is generally thought to be involved in processing danger-related information, such as fear processing and defensive behavior. A recent study, published in Human Brain Mapping, shows a new function of the subcortical pathway: the fast processing of non-emotional object perception. Rapid object processing is a critical function of the visual system. Topological perception theory proposes that the initial perception of objects begins with the extraction of topological property (TP). However, the mechanism of rapid TP processing remains unclear. The researchers investigated the subcortical mechanism of TP processing with transcranial magnetic stimulation (TMS). They found that a subcortical magnocellular pathway is responsible for the early processing of TP, and that this subcortical processing of TP accelerates object recognition. Based on their findings, we propose a novel training approach called subcortical magnocellular pathway training (SMPT), aimed at improving the efficiency of the subcortical M pathway to restore visual and attentional functions in disorders associated with subcortical pathway dysfunction.
Dear Editor, This letter focuses on the fact that, in object detection tasks, small objects with few pixels disappear in feature maps with large receptive fields as the network deepens. Therefore, the detection of dense small objects is challenging.
To investigate the applicability to 3D objects of four color difference formulas commonly used in the printing field (CIELAB, CIE94, CMC(1:1), and CIEDE2000), as well as the impact of four standard light sources (D65, D50, A, and TL84) on 3D color difference evaluations, 50 glossy spheres with a diameter of 2 cm were created using the Sailner J4003D color printing device. These spheres were centered around the five colors recommended by the CIE (gray, red, yellow, green, and blue). Color difference was calculated according to the four formulas, and 111 pairs of experimental samples meeting the CIELAB gray scale color difference requirements (1.0-14.0) were selected. Ten observers, aged between 22 and 27 with normal color vision, participated in this study, using the gray scale method from psychophysical experiments to conduct color difference evaluations under the four light sources, with repeated experiments for each observer. The results indicated that the overall effect of the D65 light source on the color difference of 3D objects was minimal. In contrast, the D50 and A light sources had a significant impact within the small color difference range, while the TL84 light source considerably influenced both large and small color differences. Among the four color difference formulas, CIEDE2000 demonstrated the best predictive performance for color difference in 3D objects, followed by CMC(1:1), CIE94, and CIELAB.
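As a point of reference for the simplest of the four formulas above, the CIELAB color difference (ΔE*ab, CIE 1976) is the Euclidean distance between two colors in L*a*b* space. The sketch below shows only that formula; the function name `delta_e_cielab` and the sample values are illustrative assumptions, and the weighted formulas (CIE94, CMC, CIEDE2000) are not reproduced here.

```python
import math


def delta_e_cielab(lab1, lab2):
    """CIE 1976 color difference between two (L*, a*, b*) triples."""
    d_l = lab1[0] - lab2[0]
    d_a = lab1[1] - lab2[1]
    d_b = lab1[2] - lab2[2]
    return math.sqrt(d_l ** 2 + d_a ** 2 + d_b ** 2)


# Hypothetical sample pair: a neutral gray and a slightly lighter, redder gray.
reference = (50.0, 0.0, 0.0)
sample = (53.0, 4.0, 0.0)
difference = delta_e_cielab(reference, sample)
```

A pair like this one, with a difference of 5.0, would fall inside the 1.0-14.0 range that the study used to select its experimental sample pairs.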
Drone-based small object detection is of great significance in practical applications such as military actions, disaster rescue, and transportation. However, the severe scale differences in objects captured by drones and the lack of detail information for small-scale objects make drone-based small object detection a formidable challenge. To address these issues, we first develop a mathematical model to explore how changing receptive fields impacts polynomial fitting results. Subsequently, based on the obtained conclusions, we propose a simple but effective Hybrid Receptive Field Network (HRFNet), whose modules include Hybrid Feature Augmentation (HFA), Hybrid Feature Pyramid (HFP) and Dual Scale Head (DSH). Specifically, HFA employs parallel dilated convolution kernels of different sizes to extend shallow features with different receptive fields, improving the multi-scale adaptability of the network; HFP enhances the perception of small objects by capturing contextual information across layers; and DSH reconstructs the original prediction head using a set of high-resolution and ultrahigh-resolution features. In addition, a corresponding dual-scale loss function is designed to train HRFNet. Finally, comprehensive evaluation results on public benchmarks such as VisDrone-DET and TinyPerson demonstrate the robustness of the proposed method. Most notably, the proposed HRFNet achieves a mAP of 51.0 on VisDrone-DET with 29.3 M parameters, outperforming existing state-of-the-art detectors. HRFNet also performs excellently in complex scenarios captured by drones, achieving the best performance on the CS-Drone dataset we built.
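To make the link between dilation and receptive field in the HFA description concrete, the sketch below computes the effective size of a dilated kernel, k + (k - 1)(d - 1), for a few parallel branches. The branch configuration (3x3 kernels with dilations 1, 2 and 3) is a hypothetical example, not the configuration reported for HRFNet.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by one dilated convolution kernel.

    A kernel with k taps and dilation d spans k + (k - 1) * (d - 1) input
    positions along each axis, while still using only k weights per axis.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Hypothetical parallel branches: the same 3x3 kernel at three dilation rates.
branches = [(3, 1), (3, 2), (3, 3)]
extents = [effective_kernel_size(k, d) for k, d in branches]
```

Running branches like these in parallel over the same shallow feature map lets one layer see several context sizes at once without adding parameters per branch.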
Top-view fisheye cameras are widely used in personnel surveillance for their broad field of view, but their unique imaging characteristics pose challenges such as distortion, complex scenes, scale variations, and small objects near image edges. To tackle these, we propose peripheral focus you only look once (PF-YOLO), an enhanced YOLOv8n-based method. First, we introduce a cutting-patch data augmentation strategy to mitigate the problem of insufficient small-object samples in various scenes. Second, to enhance the model's focus on small objects near the edges, we design the peripheral focus loss, which uses dynamic focus coefficients to provide greater gradient gains for these objects, improving their regression accuracy. Finally, we design the three-dimensional (3D) spatial-channel coordinate attention C2f module, which enhances spatial and channel perception, suppresses noise, and improves personnel detection. Experimental results demonstrate that PF-YOLO achieves strong performance on the challenging events for person detection from overhead fisheye images (CEPDTOF) and in-the-wild events for people detection and tracking from overhead fisheye cameras (WEPDTOF) datasets. Compared to the original YOLOv8n model, PF-YOLO achieves improvements on CEPDTOF with increases of 2.1%, 1.7% and 2.9% in mean average precision 50 (mAP 50), mAP 50-95, and [...], respectively. On WEPDTOF, PF-YOLO achieves substantial improvements with increases of 31.4%, 14.9%, 61.1% and 21.0% in [...] 91.2% and 57.2%, respectively.
To improve small object detection and trajectory estimation from an aerial moving perspective, we propose the Aerial View Attention-PRB (AVA-PRB) model. AVA-PRB integrates two attention mechanisms, Coordinate Attention (CA) and the Convolutional Block Attention Module (CBAM), to enhance detection accuracy. Additionally, Shape-IoU is employed as the loss function to refine localization precision. Our model further incorporates an adaptive feature fusion mechanism, which optimizes multi-scale object representation, ensuring robust tracking in complex aerial environments. We evaluate the performance of AVA-PRB on two benchmark datasets: Aerial Person Detection and VisDrone2019-Det. The model achieves 60.9% mAP@0.5 on the Aerial Person Detection dataset and 51.2% mAP@0.5 on VisDrone2019-Det, demonstrating its effectiveness in aerial object detection. Beyond detection, we propose a novel trajectory estimation method that improves movement path prediction under aerial motion. Experimental results indicate that our approach reduces path deviation by up to 64%, effectively mitigating errors caused by rapid camera movements and background variations. By optimizing feature extraction and enhancing spatial-temporal coherence, our method significantly improves object tracking under aerial moving perspectives. This research addresses the limitations of fixed-camera tracking, enhancing flexibility and accuracy in aerial tracking applications. The proposed approach has broad potential for real-world applications, including surveillance, traffic monitoring, and environmental observation.
AIM: To compare objective dry retinoscopy and subjective refraction measurements in patients with mild keratoconus (KCN) and quantify any differences. METHODS: This cross-sectional study was performed on 68 eyes of 68 patients diagnosed with mild KCN. Objective dry retinoscopy using an autorefractometer and subjective refraction measurements were performed. Sphere, cylinder, J0, J45, and spherical equivalent values were compared between the two techniques. RESULTS: The mean age of the 68 patients with mild KCN was 21.32±5.03 y (12–35 y). There were 37 (54.4%) males. Objective refraction yielded a significantly more myopic sphere (-1.44 D vs -0.57 D), higher cylinder magnitude (-2.24 D vs -1.48 D), and more myopic spherical equivalent (-2.56 D vs -1.31 D) compared to subjective refraction (all P<0.05). The mean differences were -0.87 D for sphere, -0.76 D for cylinder, and -1.25 D for spherical equivalent. No significant differences were found for J0 and J45 values, indicating agreement in astigmatism axis (P>0.05). CONCLUSION: In patients with mild KCN, objective dry retinoscopy overestimates the degree of myopia and astigmatism compared to subjective refraction. The irregular cornea in KCN likely impacts objective measurements. Subjective refraction allows compensation for irregularity, providing a more accurate correction. When determining refractive targets, the tendency of objective methods to overcorrect should be considered.
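The quantities compared in the abstract above follow from the usual power-vector conventions: spherical equivalent is sphere + cylinder/2, and J0 and J45 are -(cylinder/2)·cos(2·axis) and -(cylinder/2)·sin(2·axis). The sketch below applies those standard definitions to the reported mean sphere and cylinder values; the function names are illustrative, and the axis values in the usage lines are hypothetical because the abstract does not report axes.

```python
import math


def spherical_equivalent(sphere, cylinder):
    """Spherical equivalent in diopters: sphere plus half the cylinder."""
    return sphere + cylinder / 2.0


def power_vector(cylinder, axis_deg):
    """Return (J0, J45) astigmatic power-vector components in diopters."""
    angle = math.radians(2.0 * axis_deg)
    j0 = -(cylinder / 2.0) * math.cos(angle)
    j45 = -(cylinder / 2.0) * math.sin(angle)
    return j0, j45


# Mean sphere and cylinder reported in the abstract (D).
se_objective = spherical_equivalent(-1.44, -2.24)
se_subjective = spherical_equivalent(-0.57, -1.48)

# Hypothetical axes, for illustration only.
j0_a, j45_a = power_vector(-2.00, 180.0)
j0_b, j45_b = power_vector(-2.00, 45.0)
```

Because the spherical equivalent is linear in sphere and cylinder, the reported mean values are mutually consistent: the computed results round to -2.56 D and -1.31 D, matching the abstract.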
In this paper, a two-stage light detection and ranging (LiDAR) three-dimensional (3D) object detection framework is presented, namely the point-voxel dual transformer (PV-DT3D), which is a transformer-based method. In the proposed PV-DT3D, point-voxel fusion features are used for proposal refinement. Specifically, keypoints are sampled from the entire point cloud scene and used to encode representative scene features via a proposal-aware voxel set abstraction module. Subsequently, following the generation of proposals by the region proposal network (RPN), the internal encoded keypoints are fed into the dual transformer encoder-decoder architecture. In 3D object detection, the proposed PV-DT3D takes advantage of both point-wise transformer and channel-wise architecture to capture contextual information from the spatial and channel dimensions. Experiments conducted on the highly competitive KITTI 3D car detection leaderboard show that PV-DT3D achieves superior detection accuracy among state-of-the-art point-voxel-based methods.
Three-dimensional (3D) object detection is crucial for applications such as robotic control and autonomous driving. While high-precision sensors like LiDAR are expensive, RGB-D sensors (e.g., Kinect) offer a cost-effective alternative, especially for indoor environments. However, RGB-D sensors still face limitations in accuracy and depth perception. This paper proposes an enhanced method that integrates attention-driven YOLOv9 with xLSTM into the F-ConvNet framework. By improving the precision of 2D bounding boxes generated for 3D object detection, this method addresses issues in indoor environments with complex structures and occlusions. The proposed approach enhances detection accuracy and robustness by combining RGB images and depth data, offering improved indoor 3D object detection performance.
Transorbital craniocerebral injury is a relatively rare type of penetrating head injury that poses a significant threat to the ocular and cerebral structures.[1] The clinical prognosis of transorbital craniocerebral injury is closely related to the size, shape, speed, nature, and trajectory of the foreign object, as well as the incidence of central nervous system damage and secondary complications. The foreign objects reported to have caused these injuries are categorized into wooden items, metallic items,[2-8] and other materials, which penetrate the intracranial region via five major pathways: the orbital roof (OR), superior orbital fissure (SOF), inferior orbital fissure (IOF), optic canal (OC), and sphenoid wing. Herein, we present eight cases of transorbital craniocerebral injury caused by an unusual metallic foreign body.
Detecting oriented targets in remote sensing images amidst complex and heterogeneous backgrounds remains a formidable challenge in the field of object detection. Current frameworks for oriented detection are constrained by intrinsic limitations, including excessive computational and memory overheads, discrepancies between predefined anchors and ground truth bounding boxes, intricate training processes, and feature alignment inconsistencies. To overcome these challenges, we present ASL-OOD (Angle-based SIoU Loss for Oriented Object Detection), a novel, efficient, and robust one-stage framework tailored for oriented object detection. The ASL-OOD framework comprises three core components: the Transformer-based Backbone (TB), the Transformer-based Neck (TN), and the Angle-SIoU (Scylla Intersection over Union) based Decoupled Head (ASDH). By leveraging the Swin Transformer, the TB and TN modules offer several key advantages, such as the capacity to model long-range dependencies, preserve high-resolution feature representations, seamlessly integrate multi-scale features, and enhance parameter efficiency. These improvements enable the model to accurately detect objects across varying scales. The ASDH module further enhances detection performance by incorporating angle-aware optimization based on SIoU, ensuring precise angular consistency and bounding box coherence. This approach effectively harmonizes shape loss and distance loss during the optimization process, thereby significantly boosting detection accuracy. Comprehensive evaluations and ablation studies on standard benchmark datasets, with an mAP (mean Average Precision) of 80.16 percent on DOTA, 91.07 percent on HRSC2016, 85.45 percent on MAR20, and 39.7 percent on UAVDT, demonstrate the clear superiority of ASL-OOD over state-of-the-art oriented object detection models. These findings underscore the model's efficacy as an advanced solution for challenging remote sensing object detection tasks.
To address the low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes, an improved you only look once version 8 (YOLOv8) object detection algorithm for infrared images, F-YOLOv8, is proposed. First, a spatial-to-depth network replaces the strided convolution or pooling layer of the traditional backbone network. It is combined with the channel attention mechanism so that the neural network focuses on the channels with large weight values to better extract low-resolution image feature information. Then, an improved feature pyramid network, the lightweight bidirectional feature pyramid network (L-BiFPN), is proposed, which can efficiently fuse features of different scales. In addition, a loss function based on intersection over union with minimum point distance (MPDIoU) is introduced for bounding box regression, which achieves faster convergence and more accurate regression results. Experimental results on the FLIR dataset show that the improved algorithm can accurately detect infrared road targets in real time, with 3% and 2.2% enhancement in mean average precision at 50% IoU (mAP50) and mean average precision at 50%-95% IoU (mAP50-95), respectively, and 38.1%, 37.3% and 16.9% reduction in the number of model parameters, the model weight, and floating-point operations (FLOPs), respectively. To further demonstrate the detection capability of the improved algorithm, it is tested on the public dataset PASCAL VOC, and the results show that F-YOLOv8 has excellent generalized detection performance.
Knowledge distillation (KD) is an emerging model compression technique for learning compact object detector models. Previous KD methods often focused solely on distilling from the logits layer or the intermediate feature layers, which may limit the comprehensive learning of the student network. Additionally, the imbalance between the foreground and background also affects the performance of the model. To address these issues, this paper employs feature-based distillation to enhance the detection performance of the bounding box localization part, and logit-based distillation to improve the detection performance of the category prediction part. Specifically, for the intermediate layer feature distillation, we introduce feature resampling to reduce the risk of the student model merely imitating the teacher model. At the same time, we incorporate a Spatial Attention Mechanism (SAM) to highlight the foreground features learned by the student model. In terms of output layer feature distillation, we divide the traditional distillation targets into target-class objects and non-target-class objects, aiming to improve overall distillation performance. Furthermore, we introduce a one-to-many matching distillation strategy based on a Feature Alignment Module (FAM), which further enhances the student model's feature representation ability, making its feature distribution closer to that of the teacher model and thus yielding superior localization and classification capabilities in object detection tasks. Experimental results demonstrate that our proposed methodology outperforms conventional distillation techniques in terms of object detection performance.
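For readers unfamiliar with logit-based distillation, the sketch below shows the classic temperature-scaled formulation: the KL divergence between softened teacher and student class distributions, multiplied by T². It is a generic baseline, not the paper's method, which additionally separates target-class and non-target-class terms; the function names, the temperature of 4.0 and the example logits are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by T."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0  # terms with zero teacher probability contribute nothing
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


# Hypothetical three-class logits for one detection.
teacher = [6.0, 2.0, -1.0]
student = [3.0, 2.5, 0.0]
loss = kd_loss(student, teacher)
```

The loss is zero only when the student reproduces the teacher's softened distribution, so minimizing it transfers the teacher's relative confidence across classes rather than just its top prediction.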
At present, the identification of tropical cyclone remote precipitation (TRP) requires subjective participation, leading to inconsistent results among different researchers despite adopting the same identification standard. Thus, establishing an objective identification method is of great importance. In this study, an objective synoptic analysis technique for TRP (OSAT_TRP) is proposed to identify TRP using daily precipitation datasets, historical tropical cyclone (TC) track data, and the ERA5 reanalysis data. This method includes three steps. First, independent rain belts are separated, and those that might relate to TCs' remote effects are distinguished according to their distance from the TCs. Second, the strong water vapor transport belt from the TC is identified using integrated horizontal water vapor transport (IVT). Third, TRP is distinguished by connecting the first two steps. The TRP obtained through this method satisfies three criteria: 1) the precipitation occurs outside the circulation of TCs, 2) the precipitation is affected by TCs, and 3) a gap exists between the TRP and the TC rain belt. Case diagnosis analysis, compared with subjective TRP results and backward trajectory analyses using HYSPLIT, indicates that OSAT_TRP can distinguish TRP even when multiple TCs in the Northwest Pacific are involved. We then applied OSAT_TRP to select typical TRPs and obtained the synoptic-scale environments of the TRP through composite analysis.
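The IVT used in the second step above is conventionally defined as the magnitude of the vertically integrated moisture flux, (1/g)·sqrt((∫q·u dp)² + (∫q·v dp)²). The sketch below evaluates that standard definition for a single column with the trapezoidal rule; the pressure levels and profile values are hypothetical, and the thresholds the method applies to IVT are not given in the abstract and are not modeled here.

```python
import math

GRAVITY = 9.80665  # standard gravity, m s^-2


def trapezoid(values, coords):
    """Trapezoidal integral of values over coords (lists of equal length)."""
    total = 0.0
    for i in range(len(coords) - 1):
        total += 0.5 * (values[i] + values[i + 1]) * (coords[i + 1] - coords[i])
    return total


def ivt(q, u, v, pressure_pa):
    """Integrated vapor transport (kg m^-1 s^-1) for one column.

    q: specific humidity (kg/kg); u, v: wind components (m/s);
    pressure_pa: pressure levels in Pa, in any monotonic order.
    """
    flux_u = trapezoid([qi * ui for qi, ui in zip(q, u)], pressure_pa)
    flux_v = trapezoid([qi * vi for qi, vi in zip(q, v)], pressure_pa)
    # Squaring removes the sign that depends on the level ordering.
    return math.sqrt(flux_u ** 2 + flux_v ** 2) / GRAVITY


# Hypothetical column from 1000 hPa to 300 hPa with uniform moisture and wind.
levels = [100000.0, 85000.0, 70000.0, 50000.0, 30000.0]
column_ivt = ivt([0.01] * 5, [10.0] * 5, [0.0] * 5, levels)
```

For this uniform column the integral reduces to 0.01 × 10 × 70000 / g, so the result can be checked by hand.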
At present, salient object detection (SOD) has achieved considerable progress. However, the methods that perform well still face the issue of inadequate detection accuracy. For example, sometimes there are problems of missed and false detections. Effectively optimizing features to capture key information and better integrating different levels of features to enhance their complementarity are two significant challenges in the domain of SOD. In response to these challenges, this study proposes a novel SOD method based on multi-strategy feature optimization. We propose the multi-size feature extraction module (MSFEM), which uses the attention mechanism, the multi-level feature fusion, and the residual block to obtain finer features. This module provides robust support for the subsequent accurate detection of the salient object. In addition, we use two rounds of feature fusion and the feedback mechanism to optimize the features obtained by the MSFEM to improve detection accuracy. The first round of feature fusion is applied to integrate the features extracted by the MSFEM to obtain more refined features. Subsequently, the feedback mechanism and the second round of feature fusion are applied to refine the features, thereby providing a stronger foundation for accurately detecting salient objects. To improve the fusion effect, we propose the feature enhancement module (FEM) and the feature optimization module (FOM). The FEM integrates the upper and lower features with the optimized features obtained by the FOM to enhance feature complementarity. The FOM uses different receptive fields, the attention mechanism, and the residual block to more effectively capture key information. Experimental results demonstrate that our method outperforms 10 state-of-the-art SOD methods.
Temperature forecasts from February to December 2022 were used, drawn from the intelligent grid forecasts of the Central Meteorological Observatory and of the meteorological bureau of the autonomous region, and from the numerical forecast model of the European Center (EC model). Temperature was predicted from the national intelligent grid forecast, the regional bureau's intelligent grid forecast, the EC model, and other data. Following a synthesis algorithm that selects the grid point forecast with the highest accuracy rate over the most recent three days, temperature grid point correction was conducted in two forms, by station and by grid. To reduce the deviation caused by seasonal systematic temperature differences, a temperature prediction model was established using the rolling forecast errors of 5, 10, 15, 20, 25 and 30 d as the basis data. Verification and evaluation of the objective correction results show that the accuracy rate of temperature forecasts from the regional bureau's intelligent grid, the national intelligent grid, and the EC model could be increased by 10%, 8%, and 12%, respectively.
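One simple way to read the rolling-error correction described above is as a running bias removal: average the forecast-minus-observation error over the most recent N days and subtract it from the new forecast. The sketch below shows that reading only; it is an assumption, not the model built in the study, and the function name and all temperature values are hypothetical.

```python
def rolling_bias_correction(past_forecasts, past_observations, new_forecast, window):
    """Subtract the mean error of the last `window` days from a new forecast.

    Error is defined as forecast minus observation, so a warm-biased model
    has a positive mean error and its new forecast is lowered.
    """
    if window <= 0 or window > len(past_forecasts):
        raise ValueError("window must be between 1 and the number of past days")
    if len(past_forecasts) != len(past_observations):
        raise ValueError("forecast and observation histories must match in length")

    recent_errors = [
        f - o
        for f, o in zip(past_forecasts[-window:], past_observations[-window:])
    ]
    mean_error = sum(recent_errors) / window
    return new_forecast - mean_error


# Hypothetical daily maximum temperatures (degrees C) at one station.
forecasts = [18.0, 20.0, 21.0, 22.0, 24.0]
observations = [18.5, 19.0, 20.0, 21.0, 23.0]
corrected = rolling_bias_correction(forecasts, observations, 25.0, window=3)
```

Trying several window lengths, as the abstract does with 5 to 30 days, trades responsiveness to recent regime changes against noise in the estimated bias.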
The YOLO (You Only Look Once) series, a leading single-stage object detection framework, has gained significant prominence in medical image analysis due to its real-time efficiency and robust performance. Recent iterations of YOLO have further enhanced its accuracy and reliability in critical clinical tasks such as tumor detection, lesion segmentation, and microscopic image analysis, thereby accelerating the development of clinical decision support systems. This paper systematically reviews advances in YOLO-based medical object detection from 2018 to 2024. It compares YOLO's performance with other models (e.g., Faster R-CNN, RetinaNet) in medical contexts, summarizes standard evaluation metrics (e.g., mean Average Precision (mAP), sensitivity), and analyzes hardware deployment strategies using public datasets such as LUNA16, BraTS, and CheXpert. The review highlights the impressive performance of YOLO models, particularly from YOLOv5 to YOLOv8, in achieving high precision (up to 99.17%), sensitivity (up to 97.5%), and mAP exceeding 95% in tasks such as lung nodule, breast cancer, and polyp detection. These results demonstrate the significant potential of YOLO models for early disease detection and real-time clinical applications, indicating their ability to enhance clinical workflows. However, the study also identifies key challenges, including high small-object miss rates, limited generalization in low-contrast images, scarcity of annotated data, and model interpretability issues. Finally, potential future research directions are proposed to address these challenges and further advance the application of YOLO models in healthcare.
Funding: Supported by the Shanghai Science and Technology Innovation Action Plan High-Tech Field Project (Grant No. 22511100601) for the year 2022 and the Technology Development Fund for People's Livelihood Research (Research on Transmission Line Deep Foundation Pit Environmental Situation Awareness System Based on Multi-Source Data).
Funding: Projects (U22B2084, 52275483, 52075142) supported by the National Natural Science Foundation of China; Project (2023ZY01050) supported by the Ministry of Industry and Information Technology High Quality Development, China.
Abstract: The gears of new energy vehicles must withstand higher rotational speeds and greater loads, which places higher precision requirements on gear manufacturing. However, machining process parameters change the cutting force and heat, which in turn affects gear machining precision. Therefore, this paper studies the effect of different process parameters on gear machining precision. A multi-objective optimization model is established for the relationship between the process parameters of the worm wheel gear grinding machine (cutting speed, feed rate, and cutting depth) and tooth surface deviations, tooth profile deviations, and tooth lead deviations. The response surface method (RSM) is used for experimental design, and the corresponding experimental results and optimal process parameters are obtained. Subsequently, gray relational analysis-principal component analysis (GRA-PCA), particle swarm optimization (PSO), and genetic algorithm-particle swarm optimization (GA-PSO) methods are used to analyze the experimental results and obtain different optimal process parameters. The results show that the optimal process parameters obtained by the GRA-PCA, PSO, and GA-PSO methods all improve gear machining precision. Moreover, the gear machining precision obtained by GA-PSO is superior to that of the other methods.
Abstract: Augmented reality (AR) is an emerging dynamic technology that effectively supports education across different levels. The increased use of mobile devices has an even greater impact. As the demand for AR applications in education continues to increase, educators actively seek innovative and immersive methods to engage students in learning. However, exploring these possibilities also entails identifying and overcoming existing barriers to optimal educational integration. One such barrier is three-dimensional (3D) modeling: creating 3D objects for augmented reality education applications can be challenging and time-consuming for educators. To address this, we have developed a pipeline that creates realistic 3D objects from a two-dimensional (2D) photograph. Applications for augmented and virtual reality can then use these 3D objects. We evaluated the proposed pipeline based on the usability of the 3D object and performance metrics. The co-creation team, with 117 respondents, was surveyed with open-ended questions to evaluate the precision of the 3D objects created by the proposed photogrammetry pipeline. We analyzed the survey data using descriptive-analytical methods and found that the proposed pipeline produces 3D models that are rated as accurate when compared to real-world objects, with a mean score above 8. This study adds new knowledge on creating 3D objects for augmented reality applications using photogrammetry; finally, it discusses potential problems and future research directions for 3D objects in the education sector.
Abstract: The subcortical visual pathway is generally thought to be involved in processing danger-related information, such as fear processing and defensive behavior. A recent study, published in Human Brain Mapping, shows a new function of the subcortical pathway: the fast processing of non-emotional object perception. Rapid object processing is a critical function of the visual system. Topological perception theory proposes that the initial perception of objects begins with the extraction of topological property (TP). However, the mechanism of rapid TP processing remains unclear. The researchers investigated the subcortical mechanism of TP processing with transcranial magnetic stimulation (TMS). They found that a subcortical magnocellular pathway is responsible for the early processing of TP, and that this subcortical processing of TP accelerates object recognition. Based on their findings, we propose a novel training approach called subcortical magnocellular pathway training (SMPT), aimed at improving the efficiency of the subcortical M pathway to restore visual and attentional functions in disorders associated with subcortical pathway dysfunction.
Funding: Supported in part by the National Science Foundation of China (52371372); the Project of Science and Technology Commission of Shanghai Municipality, China (22JC1401400, 21190780300); and the 111 Project, China (D18003).
Abstract: Dear Editor, This letter addresses the fact that, in object detection tasks, small objects with few pixels disappear from feature maps with large receptive fields as the network deepens. The detection of dense small objects is therefore challenging.
Abstract: This study investigates the applicability to 3D objects of four color difference formulas commonly used in the printing field (CIELAB, CIE94, CMC(1:1), and CIEDE2000), as well as the impact of four standard light sources (D65, D50, A, and TL84) on 3D color difference evaluations. Fifty glossy spheres with a diameter of 2 cm were created with the Sailner J4003D color printing device. These spheres were centered around the five colors recommended by CIE (gray, red, yellow, green, and blue). Color differences were calculated according to the four formulas, and 111 pairs of experimental samples meeting the CIELAB gray scale color difference requirements (1.0-14.0) were selected. Ten observers, aged between 22 and 27 with normal color vision, participated in this study, using the gray scale method from psychophysical experiments to evaluate color differences under the four light sources, with repeated experiments for each observer. The results indicated that the overall effect of the D65 light source on 3D object color difference was minimal. In contrast, the D50 and A light sources had a significant impact within the small color difference range, while the TL84 light source considerably influenced both large and small color differences. Among the four color difference formulas, CIEDE2000 demonstrated the best predictive performance for color difference in 3D objects, followed by CMC(1:1), CIE94, and CIELAB.
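To make the comparison of formulas concrete, here is a minimal sketch of the two simplest ones named in this abstract: the CIELAB difference (Euclidean distance in L*a*b*) and CIE94. The function names are illustrative, and the CIE94 weights `k1=0.045`, `k2=0.015`, `k_l=1` are the commonly cited graphic-arts defaults, assumed here because the abstract does not state which parametric factors the study used.

```python
import math


def delta_e_cielab(lab_ref, lab_sample):
    """CIELAB color difference: Euclidean distance between (L*, a*, b*) triples."""
    return math.sqrt(sum((r - s) ** 2 for r, s in zip(lab_ref, lab_sample)))


def delta_e_cie94(lab_ref, lab_sample, k_l=1.0, k1=0.045, k2=0.015):
    """CIE94 color difference with assumed graphic-arts weights."""
    l1, a1, b1 = lab_ref
    l2, a2, b2 = lab_sample

    # Chroma of the reference and the sample.
    c1 = math.hypot(a1, b1)
    c2 = math.hypot(a2, b2)

    d_l = l1 - l2
    d_c = c1 - c2
    # Squared hue difference; clamped at zero to absorb rounding error.
    d_h_sq = max(0.0, (a1 - a2) ** 2 + (b1 - b2) ** 2 - d_c ** 2)

    # Weighting functions depend on the chroma of the reference color.
    s_c = 1.0 + k1 * c1
    s_h = 1.0 + k2 * c1

    return math.sqrt((d_l / k_l) ** 2 + (d_c / s_c) ** 2 + d_h_sq / s_h ** 2)
```

For a saturated reference color the two formulas diverge: CIE94 scales down chroma and hue differences as the reference chroma grows, so the same L*a*b* offset yields a smaller reported difference than plain CIELAB.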
Funding: Supported by the National Natural Science Foundation of China (Nos. 62276204 and 62203343); the Fundamental Research Funds for the Central Universities (No. YJSJ24011); the Natural Science Basic Research Program of Shanxi, China (Nos. 2022JM-340 and 2023-JC-QN-0710); and the China Postdoctoral Science Foundation (Nos. 2020T130494 and 2018M633470).
Abstract: Drone-based small object detection is of great significance in practical applications such as military actions, disaster rescue, and transportation. However, the severe scale differences among objects captured by drones and the lack of detail information for small-scale objects make drone-based small object detection a formidable challenge. To address these issues, we first develop a mathematical model to explore how changing receptive fields affects polynomial fitting results. Based on the conclusions obtained, we propose a simple but effective Hybrid Receptive Field Network (HRFNet), whose modules include Hybrid Feature Augmentation (HFA), Hybrid Feature Pyramid (HFP), and Dual Scale Head (DSH). Specifically, HFA employs parallel dilated convolution kernels of different sizes to extend shallow features with different receptive fields, improving the multi-scale adaptability of the network; HFP enhances the perception of small objects by capturing contextual information across layers; and DSH reconstructs the original prediction head using a set of high-resolution and ultrahigh-resolution features. In addition, a corresponding dual-scale loss function is designed to train HRFNet. Finally, comprehensive evaluation results on public benchmarks such as VisDrone-DET and TinyPerson demonstrate the robustness of the proposed method. The proposed HRFNet achieves a mAP of 51.0 on VisDrone-DET with 29.3 M parameters, outperforming existing state-of-the-art detectors. HRFNet also performs well in complex scenarios captured by drones, achieving the best performance on the CS-Drone dataset we built.
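The link between dilation and receptive field that this abstract relies on can be shown with a short sketch. It uses the standard relation for the effective extent of a dilated kernel; the kernel size and dilation rates below are hypothetical values chosen for illustration, since the abstract does not list the ones used in HFA.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by a dilated convolution kernel.

    A k-tap kernel with dilation d spans k + (k - 1) * (d - 1) input positions.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Hypothetical parallel branches: one 3x3 kernel at three dilation rates.
branch_extents = [effective_kernel_size(3, d) for d in (1, 2, 3)]
print(branch_extents)
```

Each branch keeps the same number of weights while covering a wider area, which is how parallel dilated branches can expose one shallow feature map to several receptive fields at once.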
Funding: Supported by the National Natural Science Foundation of China (Nos. 62171042, 62102033, U24A20331); the R&D Program of Beijing Municipal Education Commission (No. KZ202211417048); the Project of Construction and Support for High-Level Innovative Teams of Beijing Municipal Institutions (No. BPHR20220121); the Beijing Natural Science Foundation (Nos. 4232026, 4242020); and the Academic Research Projects of Beijing Union University (Nos. ZKZD202302, ZK20202403).
Abstract: Top-view fisheye cameras are widely used in personnel surveillance for their broad field of view, but their unique imaging characteristics pose challenges such as distortion, complex scenes, scale variations, and small objects near image edges. To tackle these, we propose peripheral focus you only look once (PF-YOLO), an enhanced YOLOv8n-based method. First, we introduce a cutting-patch data augmentation strategy to mitigate the problem of insufficient small-object samples in various scenes. Second, to enhance the model's focus on small objects near the edges, we design the peripheral focus loss, which uses dynamic focus coefficients to provide greater gradient gains for these objects, improving their regression accuracy. Finally, we design the three-dimensional (3D) spatial-channel coordinate attention C2f module, which enhances spatial and channel perception, suppresses noise, and improves personnel detection. Experimental results demonstrate that PF-YOLO achieves strong performance on the challenging events for person detection from overhead fisheye images (CEPDTOF) and in-the-wild events for people detection and tracking from overhead fisheye cameras (WEPDTOF) datasets. Compared to the original YOLOv8n model, PF-YOLO achieves improvements on CEPDTOF with increases of 2.1%, 1.7% and 2.9% in mean average precision 50 (mAP 50), mAP 50-95, and [...], respectively. On WEPDTOF, PF-YOLO achieves substantial improvements with increases of 31.4%, 14.9%, 61.1% and 21.0% in [...] 91.2% and 57.2%, respectively.
Funding: Funded by the National Science and Technology Council (NSTC), Taiwan, under grant numbers NSTC 113-2634-F-A49-007 and NSTC 112-2634-F-A49-007.
Abstract: To improve small object detection and trajectory estimation from an aerial moving perspective, we propose the Aerial View Attention-PRB (AVA-PRB) model. AVA-PRB integrates two attention mechanisms, Coordinate Attention (CA) and the Convolutional Block Attention Module (CBAM), to enhance detection accuracy. Additionally, Shape-IoU is employed as the loss function to refine localization precision. Our model further incorporates an adaptive feature fusion mechanism, which optimizes multi-scale object representation, ensuring robust tracking in complex aerial environments. We evaluate the performance of AVA-PRB on two benchmark datasets: Aerial Person Detection and VisDrone2019-Det. The model achieves 60.9% mAP@0.5 on the Aerial Person Detection dataset and 51.2% mAP@0.5 on VisDrone2019-Det, demonstrating its effectiveness in aerial object detection. Beyond detection, we propose a novel trajectory estimation method that improves movement path prediction under aerial motion. Experimental results indicate that our approach reduces path deviation by up to 64%, effectively mitigating errors caused by rapid camera movements and background variations. By optimizing feature extraction and enhancing spatial-temporal coherence, our method significantly improves object tracking under aerial moving perspectives. This research addresses the limitations of fixed-camera tracking, enhancing flexibility and accuracy in aerial tracking applications. The proposed approach has broad potential for real-world applications, including surveillance, traffic monitoring, and environmental observation.
Abstract: AIM: To compare objective dry retinoscopy and subjective refraction measurements in patients with mild keratoconus (KCN) and quantify any differences. METHODS: This cross-sectional study was conducted on 68 eyes of 68 patients diagnosed with mild KCN. Objective dry retinoscopy using an autorefractometer and subjective refraction measurements were performed. Sphere, cylinder, J0, J45, and spherical equivalent values were compared between the two techniques. RESULTS: The mean age of the 68 patients with mild KCN was 21.32 ± 5.03 years (range 12–35 years). There were 37 (54.4%) males. Objective refraction yielded a significantly more myopic sphere (-1.44 D vs -0.57 D), higher cylinder magnitude (-2.24 D vs -1.48 D), and more myopic spherical equivalent (-2.56 D vs -1.31 D) compared to subjective refraction (all P<0.05). The mean differences were -0.87 D for sphere, -0.76 D for cylinder, and -1.25 D for spherical equivalent. No significant differences were found for J0 and J45 values, indicating agreement in astigmatism axis (P>0.05). CONCLUSION: In patients with mild KCN, objective dry retinoscopy overestimates the degree of myopia and astigmatism compared to subjective refraction. The irregular cornea in KCN likely affects objective measurements. Subjective refraction allows compensation for irregularity, providing a more accurate correction. When determining refractive targets, the tendency of objective methods to overcorrect should be considered.
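The quantities compared in this abstract (spherical equivalent, J0, J45) are the standard power-vector components of a sphero-cylindrical refraction. The sketch below uses the usual conversion; the function name is illustrative, and the axis of 180 degrees is a hypothetical value, since the abstract reports no axes.

```python
import math


def power_vector(sphere, cylinder, axis_deg):
    """Convert sphere/cylinder/axis (diopters, degrees) to (M, J0, J45).

    M is the spherical equivalent; J0 and J45 are the two
    Jackson cross-cylinder components of the astigmatism.
    """
    half_cyl = cylinder / 2.0
    two_axis = math.radians(2.0 * axis_deg)
    m = sphere + half_cyl
    j0 = -half_cyl * math.cos(two_axis)
    j45 = -half_cyl * math.sin(two_axis)
    return m, j0, j45


# Mean values from the abstract; the axis is an assumed placeholder.
objective = power_vector(-1.44, -2.24, axis_deg=180)
subjective = power_vector(-0.57, -1.48, axis_deg=180)
```

Because the spherical equivalent is linear in sphere and cylinder, applying it to the reported means reproduces the reported mean spherical equivalents (-2.56 D and -1.31 D), which is a quick consistency check on the abstract's figures.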
Funding: Supported by the Natural Science Foundation of China (No. 62103298) and the South African National Research Foundation (Nos. 132797 and 137951).
Abstract: This paper presents a two-stage, transformer-based light detection and ranging (LiDAR) three-dimensional (3D) object detection framework, the point-voxel dual transformer (PV-DT3D). In the proposed PV-DT3D, point-voxel fusion features are used for proposal refinement. Specifically, keypoints are sampled from the entire point cloud scene and used to encode representative scene features via a proposal-aware voxel set abstraction module. After the region proposal network (RPN) generates proposals, the internal encoded keypoints are fed into the dual transformer encoder-decoder architecture. In 3D object detection, the proposed PV-DT3D takes advantage of both point-wise transformer and channel-wise architecture to capture contextual information from the spatial and channel dimensions. Experiments conducted on the highly competitive KITTI 3D car detection leaderboard show that PV-DT3D achieves superior detection accuracy among state-of-the-art point-voxel-based methods.
Abstract: Three-dimensional (3D) object detection is crucial for applications such as robotic control and autonomous driving. While high-precision sensors like LiDAR are expensive, RGB-D sensors (e.g., Kinect) offer a cost-effective alternative, especially for indoor environments. However, RGB-D sensors still face limitations in accuracy and depth perception. This paper proposes an enhanced method that integrates attention-driven YOLOv9 with xLSTM into the F-ConvNet framework. By improving the precision of the 2D bounding boxes generated for 3D object detection, this method addresses issues in indoor environments with complex structures and occlusions. The proposed approach enhances detection accuracy and robustness by combining RGB images and depth data, offering improved indoor 3D object detection performance.
Abstract: Transorbital craniocerebral injury is a relatively rare type of penetrating head injury that poses a significant threat to the ocular and cerebral structures.[1] The clinical prognosis of transorbital craniocerebral injury is closely related to the size, shape, speed, nature, and trajectory of the foreign object, as well as the incidence of central nervous system damage and secondary complications. The foreign objects reported to have caused these injuries are categorized into wooden items, metallic items,[2-8] and other materials, which penetrate the intracranial region via five major pathways: the orbital roof (OR), superior orbital fissure (SOF), inferior orbital fissure (IOF), optic canal (OC), and sphenoid wing. Herein, we present eight cases of transorbital craniocerebral injury caused by an unusual metallic foreign body.
Funding: Supported by the Key Research and Development Program of Shaanxi Province (2024GX-YBXM-010).
Abstract: Detecting oriented targets in remote sensing images amidst complex and heterogeneous backgrounds remains a formidable challenge in the field of object detection. Current frameworks for oriented detection modules are constrained by intrinsic limitations, including excessive computational and memory overheads, discrepancies between predefined anchors and ground truth bounding boxes, intricate training processes, and feature alignment inconsistencies. To overcome these challenges, we present ASL-OOD (Angle-based SIoU Loss for Oriented Object Detection), a novel, efficient, and robust one-stage framework tailored for oriented object detection. The ASL-OOD framework comprises three core components: the Transformer-based Backbone (TB), the Transformer-based Neck (TN), and the Angle-SIoU (Scylla Intersection over Union) based Decoupled Head (ASDH). By leveraging the Swin Transformer, the TB and TN modules offer several key advantages, such as the capacity to model long-range dependencies, preserve high-resolution feature representations, seamlessly integrate multi-scale features, and enhance parameter efficiency. These improvements enable the model to accurately detect objects across varying scales. The ASDH module further enhances detection performance by incorporating angle-aware optimization based on SIoU, ensuring precise angular consistency and bounding box coherence. This approach balances shape loss and distance loss during optimization, thereby significantly boosting detection accuracy. Comprehensive evaluations and ablation studies on standard benchmark datasets, with a mean Average Precision (mAP) of 80.16% on DOTA, 91.07% on HRSC2016, 85.45% on MAR20, and 39.7% on UAVDT, demonstrate the superiority of ASL-OOD over state-of-the-art oriented object detection models. These findings underscore the model's efficacy as an advanced solution for challenging remote sensing object detection tasks.
Funding: Supported by the National Natural Science Foundation of China (No. 62103298).
Abstract: To address the low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes, an improved you only look once version 8 (YOLOv8) object detection algorithm for infrared images, F-YOLOv8, is proposed. First, a space-to-depth network replaces the strided convolution or pooling layer of the traditional backbone network. It is combined with a channel attention mechanism so that the neural network focuses on the channels with large weight values to better extract feature information from low-resolution images. Then, a lightweight bidirectional feature pyramid network (L-BiFPN) is proposed as an improved feature pyramid network that efficiently fuses features of different scales. In addition, an intersection over union loss based on the minimum point distance (MPDIoU) is introduced for bounding box regression, which yields faster convergence and more accurate regression results. Experimental results on the FLIR dataset show that the improved algorithm can accurately detect infrared road targets in real time, with improvements of 3% and 2.2% in mean average precision at 50% IoU (mAP50) and mean average precision at 50% to 95% IoU (mAP50-95), respectively, and reductions of 38.1%, 37.3%, and 16.9% in the number of model parameters, the model weight, and floating-point operations (FLOPs), respectively. To further demonstrate the detection capability of the improved algorithm, it is tested on the public PASCAL VOC dataset, and the results show that F-YOLOv8 has excellent generalized detection performance.
Funding: Funded by the National Natural Science Foundation of China (61603245).
Abstract: Knowledge distillation (KD) is an emerging model compression technique for learning compact object detector models. Previous KD methods often focused solely on distilling from either the logits layer or the intermediate feature layers, which may limit the comprehensive learning of the student network. Additionally, the imbalance between foreground and background also affects the performance of the model. To address these issues, this paper employs feature-based distillation to enhance the detection performance of the bounding box localization part, and logit-based distillation to improve the detection performance of the category prediction part. Specifically, for intermediate-layer feature distillation, we introduce feature resampling to reduce the risk of the student model merely imitating the teacher model. At the same time, we incorporate a Spatial Attention Mechanism (SAM) to highlight the foreground features learned by the student model. For output-layer distillation, we divide the traditional distillation targets into target-class objects and non-target-class objects, aiming to improve overall distillation performance. Furthermore, we introduce a one-to-many matching distillation strategy based on a Feature Alignment Module (FAM), which further enhances the student model's feature representation ability, bringing its feature distribution closer to that of the teacher model and thus yielding superior localization and classification capabilities in object detection tasks. Experimental results demonstrate that our proposed methodology outperforms conventional distillation techniques in terms of object detection performance.
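For readers unfamiliar with logit-based distillation, the following is a minimal sketch of the classic temperature-scaled form that such methods build on. It is not the target-class and non-target-class split described in this abstract; the function names and the temperature of 4.0 are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence from teacher to student soft targets, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        t * math.log(t / s)
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl
```

A higher temperature flattens both distributions, so the student also learns from the relative scores the teacher assigns to incorrect classes rather than from the top prediction alone.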
Funding: Supported by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX22_1136); the National Natural Science Foundation of China (No. 42275037); the Basic Research Fund of CAMS (No. 2023Z016); the Key Laboratory of South China Sea Meteorological Disaster Prevention and Mitigation of Hainan Province (No. SCSF202202); and the Jiangsu Collaborative Innovation Center for Climate Change.
Abstract: At present, the identification of tropical cyclone remote precipitation (TRP) requires subjective participation, leading to inconsistent results among different researchers even when they adopt the same identification standard. Establishing an objective identification method is therefore important. In this study, an objective synoptic analysis technique for TRP (OSAT_TRP) is proposed to identify TRP using daily precipitation datasets, historical tropical cyclone (TC) track data, and the ERA5 reanalysis data. The method includes three steps. First, independent rain belts are separated, and those that might relate to TCs' remote effects are distinguished according to their distance from the TCs. Second, the strong water vapor transport belt from the TC is identified using integrated horizontal water vapor transport (IVT). Third, TRP is distinguished by connecting the first two steps. The TRP obtained through this method satisfies three criteria: 1) the precipitation occurs outside the circulation of TCs, 2) the precipitation is affected by TCs, and 3) a gap exists between the TRP and the TC rain belt. Case diagnosis analysis, compared with subjective TRP results and backward trajectory analyses using HYSPLIT, indicates that OSAT_TRP can distinguish TRP even when multiple TCs in the Northwest Pacific are involved. We then applied OSAT_TRP to select typical TRPs and obtained the synoptic-scale environments of the TRP through composite analysis.
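The abstract does not give the expression used for IVT in the second step. The commonly used definition is shown below as a reference; the integration bounds (surface pressure $p_s$ to an upper level $p_t$) are an assumption, since the study's choice of levels is not stated here.

```latex
% g: gravitational acceleration; q: specific humidity;
% u, v: zonal and meridional wind components; p: pressure.
\[
\mathrm{IVT} \;=\; \frac{1}{g}
\sqrt{\left(\int_{p_t}^{p_s} q\,u\,\mathrm{d}p\right)^{2}
    + \left(\int_{p_t}^{p_s} q\,v\,\mathrm{d}p\right)^{2}}
\]
```

With $q$ in kg/kg, wind in m/s, and pressure in Pa, the result has units of kg m$^{-1}$ s$^{-1}$, and a transport belt is typically identified as a connected region where IVT exceeds a chosen threshold.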
Abstract: Salient object detection (SOD) has achieved considerable progress. However, even well-performing methods still suffer from inadequate detection accuracy, such as missed and false detections. Effectively optimizing features to capture key information and better integrating different levels of features to enhance their complementarity are two significant challenges in the domain of SOD. In response to these challenges, this study proposes a novel SOD method based on multi-strategy feature optimization. We propose the multi-size feature extraction module (MSFEM), which uses the attention mechanism, multi-level feature fusion, and the residual block to obtain finer features. This module provides robust support for the subsequent accurate detection of the salient object. In addition, we use two rounds of feature fusion and a feedback mechanism to optimize the features obtained by the MSFEM and improve detection accuracy. The first round of feature fusion integrates the features extracted by the MSFEM to obtain more refined features. Subsequently, the feedback mechanism and the second round of feature fusion refine the features further, providing a stronger foundation for accurately detecting salient objects. To improve the fusion effect, we propose the feature enhancement module (FEM) and the feature optimization module (FOM). The FEM integrates the upper and lower features with the optimized features obtained by the FOM to enhance feature complementarity. The FOM uses different receptive fields, the attention mechanism, and the residual block to capture key information more effectively. Experimental results demonstrate that our method outperforms 10 state-of-the-art SOD methods.
Abstract: Temperature forecasts from February to December 2022 were taken from three sources: the national intelligent grid forecast of the Central Meteorological Observatory, the intelligent grid forecast of the meteorological bureau of the autonomous region (regional bureau), and the numerical forecast model of the European Center (EC model). Based on a grid-point forecast synthesis algorithm that uses the highest accuracy rate over the most recent three days, temperature grid-point correction was conducted in two forms, at stations and on grids. To reduce the deviation caused by systematic seasonal temperature differences, a temperature prediction model was established using the rolling forecast errors of 5, 10, 15, 20, 25, and 30 d as the basis data. Verification and evaluation of the objective correction results show that the accuracy rate of temperature forecasts by the intelligent grid of the regional bureau, the national intelligent grid, and the EC model could be increased by 10%, 8%, and 12%, respectively.
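One simple way to use rolling forecast errors, sketched below, is to subtract the mean error of the most recent days from the new forecast. This is an assumed illustration of the general idea only; the abstract does not specify how the 5 to 30 d error windows enter the actual prediction model, and the function name and sample values are hypothetical.

```python
def rolling_bias_correction(past_forecasts, past_observations, new_forecast, window):
    """Subtract the mean forecast error over the last `window` days.

    past_forecasts and past_observations are equal-length daily series,
    ordered oldest to newest (hypothetical inputs).
    """
    errors = [f - o for f, o in zip(past_forecasts, past_observations)]
    recent = errors[-window:]
    bias = sum(recent) / len(recent)
    return new_forecast - bias


# Hypothetical example: the model has run 1 degree too warm for five days.
forecasts = [10.0, 12.0, 11.0, 13.0, 14.0]
observed = [9.0, 11.0, 10.0, 12.0, 13.0]
corrected = rolling_bias_correction(forecasts, observed, new_forecast=15.0, window=5)
```

A short window reacts quickly to a change in the error pattern, while a long window gives a steadier estimate, which is one reason to evaluate several window lengths such as those listed in the abstract.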
Funding: Supported by the National Natural Science Foundation of China under grant number 62066016; the Natural Science Foundation of Hunan Province of China under grant number 2024JJ7395; the Scientific Research Project of Education Department of Hunan Province of China under grant number 22B0549; the International and Regional Science and Technology Cooperation and Exchange Program of the Hunan Association for Science and Technology under grant number 025SKX-KJ-04; and the Hunan Province Undergraduate Innovation and Entrepreneurship Training Program (grant number S202410531015).
Abstract: The YOLO (You Only Look Once) series, a leading single-stage object detection framework, has gained significant prominence in medical image analysis due to its real-time efficiency and robust performance. Recent iterations of YOLO have further enhanced its accuracy and reliability in critical clinical tasks such as tumor detection, lesion segmentation, and microscopic image analysis, thereby accelerating the development of clinical decision support systems. This paper systematically reviews advances in YOLO-based medical object detection from 2018 to 2024. It compares YOLO's performance with other models (e.g., Faster R-CNN, RetinaNet) in medical contexts, summarizes standard evaluation metrics (e.g., mean Average Precision (mAP), sensitivity), and analyzes hardware deployment strategies using public datasets such as LUNA16, BraTS, and CheXpert. The review highlights the performance of YOLO models, particularly from YOLOv5 to YOLOv8, in achieving high precision (up to 99.17%), sensitivity (up to 97.5%), and mAP exceeding 95% in tasks such as lung nodule, breast cancer, and polyp detection. These results demonstrate the significant potential of YOLO models for early disease detection and real-time clinical applications, indicating their ability to enhance clinical workflows. However, the study also identifies key challenges, including high small-object miss rates, limited generalization in low-contrast images, scarcity of annotated data, and model interpretability issues. Finally, potential future research directions are proposed to address these challenges and further advance the application of YOLO models in healthcare.