Forests are vital ecosystems that play a crucial role in sustaining life on Earth and supporting human well-being. Traditional forest mapping and monitoring methods are often costly and limited in scope, necessitating the adoption of advanced, automated approaches for improved forest conservation and management. This study explores the application of deep learning-based object detection techniques for individual tree detection in RGB satellite imagery. A dataset of 3157 images was collected and divided into training (2528), validation (495), and testing (134) sets. To enhance model robustness and generalization, data augmentation was applied to the training part of the dataset. Various YOLO-based models, including YOLOv8, YOLOv9, YOLOv10, YOLOv11, and YOLOv12, were evaluated using different hyperparameters and optimization techniques, such as stochastic gradient descent (SGD) and auto-optimization. These models were assessed in terms of detection accuracy and the number of detected trees. The highest-performing model, YOLOv12m, achieved a mean average precision (mAP@50) of 0.908, mAP@50:95 of 0.581, recall of 0.851, precision of 0.852, and an F1-score of 0.847. The results demonstrate that YOLO-based object detection offers a highly efficient, scalable, and accurate solution for individual tree detection in satellite imagery, facilitating improved forest inventory, monitoring, and ecosystem management. This study underscores the potential of AI-driven tree detection to enhance environmental sustainability and support data-driven decision-making in forestry.
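As an illustration of how precision, recall, and F1 at an IoU threshold of 0.5 are typically obtained for a detector, the following minimal sketch greedily matches predictions to ground-truth boxes. The function names, the `(x1, y1, x2, y2)` box format, and the greedy matching rule are assumptions made for this example; the abstract does not describe the study's evaluation code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(preds, truths, iou_thr=0.5):
    """preds: list of (box, confidence); truths: list of boxes (hypothetical format)."""
    matched = set()
    tp = 0
    # Visit predictions from most to least confident; each ground-truth
    # box can be claimed by at most one prediction.
    for box, _conf in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, truth in enumerate(truths):
            if j in matched:
                continue
            value = iou(box, truth)
            if value >= best_iou:
                best_j, best_iou = j, value
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching tree
    fn = len(truths) - tp  # trees that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Counting the matched boxes also gives the number of detected trees, the second quantity the abstract says the models were assessed on.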
Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0. Manual inspection of products on assembly lines remains inefficient, error-prone, and inconsistent, emphasizing the need for a reliable and automated inspection system. Leveraging both object detection and image segmentation approaches, this research proposes a vision-based solution for the detection of various kinds of tools in a toolkit using deep learning (DL) models. Two Intel RealSense D455f depth cameras were arranged in a top-down configuration to capture both RGB and depth images of the toolkits. After applying multiple constraints and enhancing the images through preprocessing and augmentation, a dataset of 3300 annotated RGB-D images was generated. Several DL models were selected through a comprehensive assessment of mean Average Precision (mAP), precision-recall equilibrium, inference latency (target ≥30 FPS), and computational burden, resulting in a preference for YOLO and Region-based Convolutional Neural Network (R-CNN) variants over ViT-based models due to the latter’s increased latency and resource requirements. YOLOv5, YOLOv8, YOLOv11, Faster R-CNN, and Mask R-CNN were trained on the annotated dataset and evaluated using key performance metrics (Recall, Accuracy, F1-score, and Precision). YOLOv11 demonstrated balanced excellence with 93.0% precision, 89.9% recall, and a 90.6% F1-score in object detection, as well as 96.9% precision, 95.3% recall, and a 96.5% F1-score in instance segmentation, with an average inference time of 25 ms per frame (≈40 FPS), demonstrating real-time performance. Leveraging these results, a YOLOv11-based Windows application was successfully deployed in a real-time assembly line environment, where it accurately processed live video streams to detect and segment tools within toolkits, demonstrating its practical effectiveness in industrial automation. In addition to detection and segmentation, the application can precisely measure socket dimensions by applying edge detection techniques to YOLOv11 segmentation masks. This enables specification-level quality control directly on the assembly line and improves real-time inspection capability. The implementation is a significant step forward for intelligent manufacturing in the Industry 4.0 paradigm, providing a scalable, efficient, and accurate way to perform automated inspection and dimensional verification.
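The dimensional measurement described above (edge detection on a segmentation mask) can be sketched roughly as follows. This is only an illustration under stated assumptions: the mask is a binary grid of 0/1 values, `mask_edges` and `extent_mm` are hypothetical helper names, and `mm_per_pixel` is an assumed calibration factor. The abstract does not specify the edge detector or the calibration procedure actually used.

```python
def mask_edges(mask):
    """Return (row, col) of mask pixels that touch background or the image border."""
    height, width = len(mask), len(mask[0])
    edges = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            # A foreground pixel is an edge pixel if any 4-neighbour is
            # outside the image or belongs to the background.
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                outside = ny < 0 or ny >= height or nx < 0 or nx >= width
                if outside or not mask[ny][nx]:
                    edges.append((y, x))
                    break
    return edges


def extent_mm(mask, mm_per_pixel):
    """Width and height of the masked object in millimetres (assumed scale factor)."""
    edges = mask_edges(mask)
    if not edges:
        return 0.0, 0.0
    ys = [y for y, _ in edges]
    xs = [x for _, x in edges]
    width_px = max(xs) - min(xs) + 1
    height_px = max(ys) - min(ys) + 1
    return width_px * mm_per_pixel, height_px * mm_per_pixel
```

With a top-down camera at a fixed distance, a single scale factor is a plausible first approximation; a depth-aware calibration would be needed if the object height varies.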
Due to the continuous increase in global energy demand, photovoltaic solar energy generation and associated maintenance requirements have significantly expanded. One critical maintenance challenge in photovoltaic installations is detecting hot spots, localized overheating defects in solar cells that drastically reduce efficiency and can lead to permanent damage. Traditional methods for detecting these defects rely on manual inspections using thermal imaging, which are costly, labor-intensive, and impractical for large-scale installations. This research introduces an automated hybrid system based on two specialized convolutional neural networks deployed in a cascaded architecture. The first convolutional neural network efficiently detects and isolates individual solar panels from high-resolution aerial thermal images captured by drones. Subsequently, a second, more advanced convolutional neural network accurately classifies each isolated panel as either defective or healthy, effectively distinguishing genuine thermal anomalies from false positives caused by reflections or glare. Experimental validation on a real-world dataset comprising thousands of thermal images yielded exceptional accuracy, significantly reducing inspection time, costs, and the likelihood of false defect detections. This proposed system enhances the reliability and efficiency of photovoltaic plant inspections, thus contributing to improved operational performance and economic viability.
In the aerospace field, residual stress directly affects the strength, fatigue life, and dimensional stability of thin-walled structural components, and is a key factor in ensuring flight safety and reliability. At present, research on residual stress at home and abroad mainly focuses on the optimization of traditional detection technology, stress control in manufacturing processes, and service performance evaluation. Research on residual stress detection methods mainly focuses on improving the accuracy, sensitivity, reliability, and other performance characteristics of existing methods, but it still faces many challenges, such as an extremely small detection range, low efficiency, large errors, and a limited range of application.
Vehicular Ad Hoc Networks (VANETs) are central to Intelligent Transportation Systems (ITS), especially for real-time communication involving emergency vehicles. Yet, Distributed Denial of Service (DDoS) attacks can disrupt safety-critical channels and undermine reliability. This paper presents a robust, scalable framework for detecting DDoS attacks in highway VANETs. We construct a new dataset with Network Simulator 3 (NS-3) and Simulation of Urban Mobility (SUMO), enriched with real mobility traces from Germany’s A81 highway (OpenStreetMap). Three traffic classes are modeled: DDoS, Voice over IP (VoIP), and Transmission Control Protocol-based (TCP-based) video streaming (VideoTCP). The pipeline includes normalization, feature selection with SHapley Additive exPlanations (SHAP), and class balancing via Synthetic Minority Over-sampling Technique (SMOTE). Eleven classifiers are benchmarked, including eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), Adaptive Boosting (AdaBoost), Gradient Boosting (GB), and an Artificial Neural Network (ANN), using stratified 5-fold cross-validation. XGBoost, GB, CatBoost, and ANN achieve the highest performance (weighted F1-score = 97%). To assess robustness under non-ideal conditions, we introduce an adversarial evaluation with packet loss and traffic jitter (small-sample deformation); the top models retain strong performance, supporting real-time applicability. Collectively, these results demonstrate that the proposed highway-focused framework is accurate, resilient, and well-suited for deployment in VANET security for emergency communications.
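Stratified k-fold cross-validation, as used in the benchmark above, keeps the class proportions roughly equal in every fold. The dependency-free sketch below shows the idea; `stratified_folds` is a hypothetical helper written for this illustration, and in practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used. The abstract does not say which implementation the authors relied on.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Each fold serves once as the test set; the remaining folds form the training set.
labels = ["ddos"] * 10 + ["voip"] * 5 + ["video"] * 5
folds = stratified_folds(labels, k=5)
```

When oversampling such as SMOTE is part of the pipeline, it is usually applied to the training folds only, so that synthetic samples do not leak into the fold used for testing.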
An improved model based on you only look once version 8 (YOLOv8) is proposed to solve the problem of low detection accuracy caused by the diversity of object sizes in optical remote sensing images. Firstly, the feature pyramid network (FPN) structure of the original YOLOv8 model is replaced by the generalized-FPN (GFPN) structure from GiraffeDet to realize "cross-layer" and "cross-scale" adaptive feature fusion, enriching the semantic and spatial information in the feature map and improving the target detection ability of the model. Secondly, a pyramid pooling module, multi atrous spatial pyramid pooling (MASPP), is designed using the ideas of atrous convolution and the feature pyramid structure to extract multi-scale features and thereby improve the model's handling of multi-scale objects. The experimental results show that the detection accuracy of the improved YOLOv8 model on the DIOR dataset is 92% and the mean average precision (mAP) is 87.9%, which are 3.5% and 1.7% higher, respectively, than those of the original model. This shows that the detection and classification ability of the proposed model on multi-dimensional optical remote sensing targets has been improved.
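Atrous (dilated) convolution, the building block the MASPP module is said to draw on, widens the receptive field by spacing the kernel taps `rate` samples apart without adding parameters. The one-dimensional sketch below illustrates only that mechanism; it is not the MASPP module itself, and `atrous_conv1d` is a name chosen for this example.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid-mode 1D convolution with kernel taps spaced `rate` samples apart."""
    span = (len(kernel) - 1) * rate  # distance between the first and last tap
    return [
        sum(weight * signal[i + j * rate] for j, weight in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


signal = list(range(10))
kernel = [1, 1, 1]
# The same 3-tap kernel covers 3, 5, and 7 input samples at rates 1, 2, and 3.
multi_scale = {rate: atrous_conv1d(signal, kernel, rate) for rate in (1, 2, 3)}
```

Running several rates in parallel and combining their outputs is the general idea behind atrous spatial pyramid pooling: each branch sees the input at a different scale.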
This study investigates the application of Learnable Memory Vision Transformers (LMViT) for detecting metal surface flaws, comparing their performance with traditional CNNs, specifically ResNet18 and ResNet50, as well as other transformer-based models including Token to Token ViT, ViT without memory, and Parallel ViT. Leveraging a widely used steel surface defect dataset, the research applies data augmentation and t-distributed stochastic neighbor embedding (t-SNE) to enhance feature extraction and understanding. These techniques mitigated overfitting, stabilized training, and improved generalization capabilities. The LMViT model achieved a test accuracy of 97.22%, significantly outperforming ResNet18 (88.89%) and ResNet50 (88.90%), as well as the Token to Token ViT (88.46%), ViT without memory (87.18%), and Parallel ViT (91.03%). Furthermore, LMViT exhibited superior training and validation performance, attaining a validation accuracy of 98.2% compared to 91.0% for ResNet18, 96.0% for ResNet50, and 89.12%, 87.51%, and 91.21% for Token to Token ViT, ViT without memory, and Parallel ViT, respectively. The findings highlight LMViT’s ability to capture long-range dependencies in images, an area where CNNs struggle due to their reliance on local receptive fields and hierarchical feature extraction. The additional transformer-based models also demonstrate improved performance in capturing complex features over CNNs, with LMViT excelling particularly at detecting subtle and complex defects, which is critical for maintaining product quality and operational efficiency in industrial applications. For instance, the LMViT model successfully identified fine scratches and minor surface irregularities that CNNs often misclassify. This study not only demonstrates LMViT’s potential for real-world defect detection but also underscores the promise of other transformer-based architectures such as Token to Token ViT, ViT without memory, and Parallel ViT in industrial scenarios where complex spatial relationships are key. Future research may focus on enhancing LMViT’s computational efficiency for deployment in real-time quality control systems.
Because methane is flammable and explosive, the detection process is time-consuming and dangerous, and labeled data are difficult to obtain. To reduce the dependence on labeled data when detecting methane concentration using tunable diode laser absorption spectroscopy (TDLAS), this paper designs a methane gas acquisition platform based on TDLAS and proposes a methane concentration detection model based on semi-supervised learning. Firstly, features are extracted from the methane gas data, and semi-supervised learning is then introduced to select the optimal feature combination. Subsequently, the traditional whale optimization algorithm is improved and used to optimize the parameters of a random forest that detects the methane gas concentration. The results show that the model is able to select the optimal feature combination under limited labeled data, achieves an accuracy of 94.25%, which is better than the traditional model, and is robust in terms of parameter optimization.
Introduction: Early cancer detection represents a critical evolution in healthcare, addressing a significant pain point in cancer treatment: the tendency for diagnoses to occur at advanced stages. Traditionally, many cancers are not identified until they have progressed to late stages, where treatment options become limited, less effective, and more costly. This late detection results in poorer prognoses, higher mortality rates, and increased healthcare costs. Without early detection tools like Fluorescence In Situ Hybridization (FISH), these challenges persist, leaving patients with fewer opportunities for successful outcomes.
A measurement system for the scattering characteristics of warhead fragments based on high-speed imaging systems offers advantages such as simple deployment, flexible maneuverability, and high spatiotemporal resolution, enabling the acquisition of full-process data of the fragment scattering process. However, mismatches between camera frame rates and target velocities can lead to long motion blur tails of high-speed fragment targets, resulting in low signal-to-noise ratios and rendering conventional detection algorithms ineffective in dynamic strong interference testing environments. In this study, we propose a detection framework centered on the separation and suppression of dynamic strong interference disturbance signals. We introduce a mixture Gaussian model constrained under a joint spatial-temporal-transform domain Dirichlet process, combined with total variation regularization, to achieve disturbance signal suppression. Experimental results demonstrate that the proposed disturbance suppression method can be integrated with certain conventional motion target detection tasks, enabling adaptation to real-world data to a certain extent. Moreover, we provide a specific implementation of this process, which achieves a detection rate close to 100% with a false alarm rate of approximately 0% on multiple sets of real target field test data. This research advances the field of damage parameter testing.
The detection of surface defects in concrete bridges using deep learning is of significant importance for reducing operational risks, saving maintenance costs, and driving the intelligent transformation of bridge defect detection. In contrast to subjective and inefficient manual visual inspection, deep learning-based algorithms for concrete defect detection exhibit remarkable advantages and have become a focal point of recent research. This paper comprehensively analyzes the research progress of deep learning algorithms for surface defect detection in concrete bridges in recent years. It introduces the early detection methods for surface defects in concrete bridges and the development of deep learning. Subsequently, it provides an overview of deep learning-based concrete bridge surface defect detection research from three aspects: image classification, object detection, and semantic segmentation. The paper summarizes the strengths and weaknesses of existing methods and the challenges they face. Additionally, it analyzes and forecasts the development trends of surface defect detection in concrete bridges.
Infrared (IR) spectroscopy, a technique within the realm of molecular vibrational spectroscopy, furnishes distinctive chemical signatures pivotal for both structural analysis and compound identification. A notable challenge emerges from the misalignment between the mid-IR light wavelength range and molecular dimensions, culminating in a constrained absorption cross-section and diminished vibrational absorption coefficients (Supplementary data).
In this paper, a two-stage light detection and ranging (LiDAR) three-dimensional (3D) object detection framework is presented, namely the point-voxel dual transformer (PV-DT3D), which is a transformer-based method. In the proposed PV-DT3D, point-voxel fusion features are used for proposal refinement. Specifically, keypoints are sampled from the entire point cloud scene and used to encode representative scene features via a proposal-aware voxel set abstraction module. Subsequently, following the generation of proposals by the region proposal network (RPN), the internal encoded keypoints are fed into the dual transformer encoder-decoder architecture. In 3D object detection, the proposed PV-DT3D takes advantage of both point-wise transformer and channel-wise architecture to capture contextual information from the spatial and channel dimensions. Experiments conducted on the highly competitive KITTI 3D car detection leaderboard show that PV-DT3D achieves superior detection accuracy among state-of-the-art point-voxel-based methods.
To address the low detection efficiency and difficult localization of traditional steel surface defect detection methods, a lightweight steel surface defect detection model based on you only look once version 7 (YOLOv7) is proposed. First, a cascading style sheets (CSS) block module is proposed, which uses more lightweight operations to obtain redundant information in the feature map, reduces the amount of computation, and effectively improves the detection speed. Secondly, an improved spatial pyramid pooling with cross stage partial convolutions (SPPCSPC) structure is adopted so that the model attends to defect location information while predicting defect category information, and obtains richer defect features. In addition, the convolution operation in the original model is simplified, which significantly reduces the size of the model and helps to improve the detection speed. Finally, the efficient intersection over union (EIOU) loss is used to focus on high-quality anchors, speeding up convergence and improving localization accuracy. Experiments were carried out on the Northeastern University-defect (NEU-DET) steel surface defect dataset. Compared with the original YOLOv7 model, the number of parameters of this model was reduced by 40%, the frames per second (FPS) reached 112, and the average accuracy reached 79.1%. Both detection accuracy and speed have been improved, which can meet the needs of steel surface defect detection.
To address the low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes, an improved you only look once version 8 (YOLOv8) object detection algorithm for infrared images, F-YOLOv8, is proposed. First, a spatial-to-depth network replaces the strided convolution or pooling layers of the traditional backbone network. It is combined with a channel attention mechanism so that the neural network focuses on the channels with large weight values and better extracts feature information from low-resolution images. Then, an improved feature pyramid network, the lightweight bidirectional feature pyramid network (L-BiFPN), is proposed, which can efficiently fuse features of different scales. In addition, an intersection over union loss function based on the minimum point distance (MPDIoU) is introduced for bounding box regression, which yields faster convergence and more accurate regression results. Experimental results on the FLIR dataset show that the improved algorithm can accurately detect infrared road targets in real time, with 3% and 2.2% improvements in mean average precision at 50% IoU (mAP50) and mean average precision at 50%-95% IoU (mAP50-95), respectively, and 38.1%, 37.3%, and 16.9% reductions in the number of model parameters, the model weight, and floating-point operations (FLOPs), respectively. To further demonstrate the detection capability of the improved algorithm, it is tested on the public PASCAL VOC dataset, and the results show that F-YOLOv8 has excellent generalized detection performance.
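The MPDIoU idea mentioned above can be sketched as follows, using the commonly cited formulation: the IoU is penalized by the squared distances between the two boxes' top-left corners and between their bottom-right corners, each normalized by the squared diagonal of the input image. The abstract does not spell out the formula, so this formulation, the `(x1, y1, x2, y2)` box format, and the function names are assumptions of this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mpdiou(pred, gt, img_w, img_h):
    """IoU minus normalized squared corner distances (assumed MPDIoU form)."""
    d_top_left = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d_bottom_right = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    norm = img_w ** 2 + img_h ** 2  # squared image diagonal
    return iou(pred, gt) - d_top_left / norm - d_bottom_right / norm


def mpdiou_loss(pred, gt, img_w, img_h):
    """Regression loss: zero when the predicted box equals the ground truth."""
    return 1.0 - mpdiou(pred, gt, img_w, img_h)
```

Unlike plain IoU, the corner-distance terms still provide a signal when the boxes do not overlap, which is one reason such variants tend to converge faster.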
1 Introduction Sound event detection (SED) aims to identify and locate specific sound event categories and their corresponding timestamps within continuous audio streams. To overcome the limitations posed by the scarcity of strongly labeled training data, researchers have increasingly turned to semi-supervised learning (SSL) [1], which leverages unlabeled data to augment training and improve detection performance. Among many SSL methods [2-4].
Fire detection has been of critical importance in computer vision for over half a century. The development of early fire detection strategies is pivotal to the realization of safe, smart, and habitable cities in the future. However, the development of optimal fire and smoke detection models is hindered by limitations of publicly available datasets, such as lack of diversity and class imbalance. In this work, we explore possible ways to overcome these challenges. We study the impact of a class-balanced dataset on the fire detection capability of state-of-the-art (SOTA) vision-based models and propose the use of generative models for data augmentation as a direction for future work. First, a comparative analysis of two prominent object detection architectures, You Only Look Once version 7 (YOLOv7) and YOLOv8, has been carried out using a balanced dataset, where both models have been evaluated across various metrics including precision, recall, and mean Average Precision (mAP). The results are compared to other recent fire detection models, highlighting the superior performance and efficiency of the proposed YOLOv8 architecture as trained on our balanced dataset. Next, a fractal dimension analysis gives deeper insight into the repetition of patterns in fire, and the effectiveness of the results has been demonstrated by a windowing-based inference approach. The proposed Slicing-Aided Hyper Inference (SAHI) improves the fire and smoke detection capability of YOLOv8 for real-life applications, with significantly improved mAP performance under a strict confidence threshold. YOLOv8 with SAHI inference gives a mAP50-95 improvement of more than 25% compared to the base YOLOv8 model. The study also provides insights into future work by exploring the potential of generative models such as the deep convolutional generative adversarial network (DCGAN) and diffusion models such as Stable Diffusion for data augmentation.
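The windowing-based inference described above relies on cutting each image into overlapping slices, running the detector on every slice, and mapping the detections back to full-image coordinates. The sketch below covers only the window computation and the coordinate shift; `slice_windows` and `to_full_image` are hypothetical helpers written for this illustration and are not the API of the SAHI library, and the slice size and overlap ratio are assumed values.

```python
def slice_windows(img_w, img_h, slice_w, slice_h, overlap=0.2):
    """Return overlapping (x1, y1, x2, y2) windows that cover the whole image."""
    step_x = max(1, int(slice_w * (1 - overlap)))
    step_y = max(1, int(slice_h * (1 - overlap)))
    windows = []
    y = 0
    while True:
        # Clamp the window to the image and shift it back so it keeps full size.
        y2 = min(y + slice_h, img_h)
        y1 = max(0, y2 - slice_h)
        x = 0
        while True:
            x2 = min(x + slice_w, img_w)
            x1 = max(0, x2 - slice_w)
            windows.append((x1, y1, x2, y2))
            if x2 >= img_w:
                break
            x += step_x
        if y2 >= img_h:
            break
        y += step_y
    return windows


def to_full_image(box, window):
    """Shift a box predicted inside a window back to full-image coordinates."""
    x1, y1, x2, y2 = box
    off_x, off_y = window[0], window[1]
    return (x1 + off_x, y1 + off_y, x2 + off_x, y2 + off_y)
```

Because neighbouring windows overlap, the same flame or smoke region can be detected more than once, so the merged detections normally go through a de-duplication step such as non-maximum suppression.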
6G is expected to support more intelligent networks, and this trend places importance on self-healing capability when degradation emerges in cellular networks. As a primary component of self-healing networks, fault detection is investigated in this paper. Given its fast response and low time and computational cost, the Online Broad Learning System (OBLS) is applied for the first time to identify outages in cellular networks. In addition, the Automatic-constructed Online Broad Learning System (AOBLS) is put forward to rationalize its structure and consequently avoid over-fitting and under-fitting. Furthermore, a multi-layer classification structure is proposed to further improve the classification performance. To address the challenges caused by imbalanced data in fault detection problems, a novel weighting strategy is derived to obtain the Multilayer Automatic-constructed Weighted Online Broad Learning System (MAWOBLS), which is combined through ensemble learning with a retrained Support Vector Machine (SVM), denoted as EMAWOBLS, to better handle the imbalance. Simulation results show that the proposed algorithm performs well in detecting faults with satisfactory time usage.
Many applications, including security systems, medical diagnostics, and human-computer interfaces, depend on eye gaze recognition. However, due to factors including individual variations, occlusions, and shifting illumination conditions, real-world scenarios continue to pose difficulties for accurate and consistent eye gaze recognition. This work investigates the potential benefits of employing transfer learning to improve the accuracy and efficiency of eye gaze detection. Transfer learning is the process of fine-tuning pre-trained models on smaller, domain-specific datasets after they have been trained on larger datasets. We study several transfer learning algorithms and evaluate their effectiveness on eye gaze identification, including both regression and classification tasks, using a range of deep learning architectures, namely AlexNet, Visual Geometry Group (VGG), InceptionV3, and ResNet. We evaluate the effectiveness of transfer learning-based models against models trained from scratch on eye gaze datasets using various performance and loss metrics such as Precision, Accuracy, and Mean Absolute Error. We investigate the effects of different pre-trained models, dataset sizes, and domain gaps on the transfer learning process. The findings clarify the efficacy of transfer learning for eye gaze detection and offer suggestions for the most successful transfer learning strategies to apply in real-world situations.
Social media has emerged as one of the most transformative developments on the internet, revolutionizing the way people communicate and interact. However, alongside its benefits, social media has also given rise to significant challenges, one of the most pressing being cyberbullying. This issue has become a major concern in modern society, particularly due to its profound negative impacts on the mental health and well-being of its victims. In the Arab world, where social media usage is exceptionally high, cyberbullying has become increasingly prevalent, necessitating urgent attention. Early detection of harmful online behavior is critical to fostering safer digital environments and mitigating the adverse effects of cyberbullying. This underscores the importance of developing advanced tools and systems to identify and address such behavior effectively. This paper investigates the development of a robust cyberbullying detection and classification system tailored for Arabic comments on YouTube. The study explores the effectiveness of various deep learning models, including Bi-LSTM (Bidirectional Long Short-Term Memory), LSTM (Long Short-Term Memory), CNN (Convolutional Neural Networks), and a hybrid CNN-LSTM, in classifying Arabic comments into binary classes (bullying or not) and multiclass categories. A comprehensive dataset of 20,000 Arabic YouTube comments was collected, preprocessed, and labeled to support these tasks. The results revealed that the CNN and hybrid CNN-LSTM models achieved the highest accuracy in binary classification, reaching 91.9%. For multiclass classification, the LSTM and Bi-LSTM models outperformed the others, achieving an accuracy of 89.5%. These findings highlight the effectiveness of deep learning approaches in the mitigation of cyberbullying within Arabic online communities.
Funding: Horizon Europe Framework Programme (HORIZON), call Teaming for Excellence (HORIZON-WIDERA-2022-ACCESS-01-two-stage), Creation of the centre of excellence in smart forestry “Forest 4.0”, No. 101059985; funded by the European Union under the project FOREST 4.0, “Ekscelencijos centras tvariai miško bioekonomikai vystyti”, No. 10-042-P-0002.
文摘Forests are vital ecosystems that play a crucial role in sustaining life on Earth and supporting human well-being.Traditional forest mapping and monitoring methods are often costly and limited in scope,necessitating the adoption of advanced,automated approaches for improved forest conservation and management.This study explores the application of deep learning-based object detection techniques for individual tree detection in RGB satellite imagery.A dataset of 3157 images was collected and divided into training(2528),validation(495),and testing(134)sets.To enhance model robustness and generalization,data augmentation was applied to the training part of the dataset.Various YOLO-based models,including YOLOv8,YOLOv9,YOLOv10,YOLOv11,and YOLOv12,were evaluated using different hyperparameters and optimization techniques,such as stochastic gradient descent(SGD)and auto-optimization.These models were assessed in terms of detection accuracy and the number of detected trees.The highest-performing model,YOLOv12m,achieved a mean average precision(mAP@50)of 0.908,mAP@50:95 of 0.581,recall of 0.851,precision of 0.852,and an F1-score of 0.847.The results demonstrate that YOLO-based object detection offers a highly efficient,scalable,and accurate solution for individual tree detection in satellite imagery,facilitating improved forest inventory,monitoring,and ecosystem management.This study underscores the potential of AI-driven tree detection to enhance environmental sustainability and support data-driven decision-making in forestry.
Abstract: Modern manufacturing processes have become more reliant on automation because of the accelerated transition from Industry 3.0 to Industry 4.0. Manual inspection of products on assembly lines remains inefficient, error-prone, and inconsistent, emphasizing the need for a reliable and automated inspection system. Leveraging both object detection and image segmentation approaches, this research proposes a vision-based solution for the detection of various kinds of tools in the toolkit using deep learning (DL) models. Two Intel RealSense D455f depth cameras were arranged in a top-down configuration to capture both RGB and depth images of the toolkits. After applying multiple constraints and enhancing them through preprocessing and augmentation, a dataset consisting of 3300 annotated RGB-D photos was generated. Several DL models were selected through a comprehensive assessment of mean Average Precision (mAP), precision-recall balance, inference latency (target ≥ 30 FPS), and computational burden, resulting in a preference for YOLO and Region-based Convolutional Neural Network (R-CNN) variants over ViT-based models due to the latter’s increased latency and resource requirements. YOLOv5, YOLOv8, YOLOv11, Faster R-CNN, and Mask R-CNN were trained on the annotated dataset and evaluated using key performance metrics (Recall, Accuracy, F1-score, and Precision). YOLOv11 demonstrated balanced excellence with 93.0% precision, 89.9% recall, and a 90.6% F1-score in object detection, as well as 96.9% precision, 95.3% recall, and a 96.5% F1-score in instance segmentation, with an average inference time of 25 ms per frame (≈ 40 FPS), demonstrating real-time performance. Leveraging these results, a YOLOv11-based Windows application was successfully deployed in a real-time assembly line environment, where it accurately processed live video streams to detect and segment tools within toolkits, demonstrating its practical effectiveness in industrial automation. In addition to detection and segmentation, the application can precisely measure socket dimensions by applying edge detection techniques to YOLOv11 segmentation masks. This enables specification-level quality control directly on the assembly line, improving real-time inspection. The implementation is a significant step forward for intelligent manufacturing in the Industry 4.0 paradigm, providing a scalable, efficient, and accurate approach to automated inspection and dimensional verification.
Funding: Funded by the Spanish Ministerio de Ciencia, Innovación y Universidades, grant number RTC2019-007364-3 (FPGM), and by the Comunidad de Madrid through the direct grant with ref. SI4/PJI/2024-00233 for the promotion of research and technology transfer at the Universidad Autónoma de Madrid.
Abstract: Due to the continuous increase in global energy demand, photovoltaic solar energy generation and associated maintenance requirements have significantly expanded. One critical maintenance challenge in photovoltaic installations is detecting hot spots, localized overheating defects in solar cells that drastically reduce efficiency and can lead to permanent damage. Traditional methods for detecting these defects rely on manual inspections using thermal imaging, which are costly, labor-intensive, and impractical for large-scale installations. This research introduces an automated hybrid system based on two specialized convolutional neural networks deployed in a cascaded architecture. The first convolutional neural network efficiently detects and isolates individual solar panels from high-resolution aerial thermal images captured by drones. Subsequently, a second, more advanced convolutional neural network accurately classifies each isolated panel as either defective or healthy, effectively distinguishing genuine thermal anomalies from false positives caused by reflections or glare. Experimental validation on a real-world dataset comprising thousands of thermal images yielded exceptional accuracy, significantly reducing inspection time, costs, and the likelihood of false defect detections. The proposed system enhances the reliability and efficiency of photovoltaic plant inspections, thus contributing to improved operational performance and economic viability.
Abstract: In the aerospace field, residual stress directly affects the strength, fatigue life, and dimensional stability of thin-walled structural components, and is a key factor in ensuring flight safety and reliability. At present, research on residual stress at home and abroad mainly focuses on the optimization of traditional detection technology, stress control in the manufacturing process, and service performance evaluation. Research on residual stress detection methods mainly focuses on improving the accuracy, sensitivity, reliability, and other performance of existing methods, but it still faces many challenges, such as an extremely small detection range, low efficiency, large error, and a limited application range.
Abstract: Vehicular Ad Hoc Networks (VANETs) are central to Intelligent Transportation Systems (ITS), especially for real-time communication involving emergency vehicles. Yet Distributed Denial of Service (DDoS) attacks can disrupt safety-critical channels and undermine reliability. This paper presents a robust, scalable framework for detecting DDoS attacks in highway VANETs. We construct a new dataset with Network Simulator 3 (NS-3) and Simulation of Urban Mobility (SUMO), enriched with real mobility traces from Germany’s A81 highway (OpenStreetMap). Three traffic classes are modeled: DDoS, Voice over IP (VoIP), and Transmission Control Protocol-based (TCP-based) video streaming (VideoTCP). The pipeline includes normalization, feature selection with SHapley Additive exPlanations (SHAP), and class balancing via the Synthetic Minority Over-sampling Technique (SMOTE). Eleven classifiers are benchmarked, including eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), Adaptive Boosting (AdaBoost), Gradient Boosting (GB), and an Artificial Neural Network (ANN), using stratified 5-fold cross-validation. XGBoost, GB, CatBoost, and ANN achieve the highest performance (weighted F1-score = 97%). To assess robustness under non-ideal conditions, we introduce an adversarial evaluation with packet loss and traffic jitter (small-sample deformation); the top models retain strong performance, supporting real-time applicability. Collectively, these results demonstrate that the proposed highway-focused framework is accurate, resilient, and well-suited for deployment in VANET security for emergency communications.
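To illustrate what stratified 5-fold cross-validation means for the three traffic classes named above, here is a minimal pure-Python sketch. It is not the authors' pipeline: the helper `stratified_folds` and the toy class counts are assumptions made for illustration. Each class is dealt round-robin into the folds so that every fold keeps roughly the overall class mix.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Return k lists of sample indices with a similar class mix in each."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's samples round-robin so each fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Toy label vector (hypothetical counts, not the paper's dataset).
labels = ["DDoS"] * 10 + ["VoIP"] * 20 + ["VideoTCP"] * 15
folds = stratified_folds(labels, k=5)
for fold in folds:
    print(Counter(labels[i] for i in fold))
```

In each round, one fold would serve as the held-out set and the other four as training data; any oversampling such as SMOTE is normally applied to the training folds only, so that synthetic samples do not leak into evaluation.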
Funding: Supported by the National Natural Science Foundation of China (No. 62241109) and the Tianjin Science and Technology Commissioner Project (No. 20YDTPJC01110).
Abstract: An improved model based on you only look once version 8 (YOLOv8) is proposed to solve the problem of low detection accuracy caused by the diversity of object sizes in optical remote sensing images. Firstly, the feature pyramid network (FPN) structure of the original YOLOv8 model is replaced by the generalized-FPN (GFPN) structure from GiraffeDet to realize "cross-layer" and "cross-scale" adaptive feature fusion, enriching the semantic and spatial information in the feature map and improving the target detection ability of the model. Secondly, a multi atrous spatial pyramid pooling (MASPP) module is designed using the ideas of atrous convolution and the feature pyramid structure to extract multi-scale features, improving the model's ability to handle multi-scale objects. The experimental results show that, on the DIOR dataset, the detection accuracy of the improved YOLOv8 model is 92% and its mean average precision (mAP) is 87.9%, respectively 3.5% and 1.7% higher than those of the original model. This shows that the detection and classification ability of the proposed model on multi-dimensional optical remote sensing targets has been improved.
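As a small illustration of the atrous (dilated) convolution idea mentioned above, the following one-dimensional sketch shows how a dilation rate widens the receptive field without adding kernel weights. The function `atrous_conv1d` is a hypothetical helper written for this note; it does not reproduce the MASPP module, whose exact layout is not described in the abstract.

```python
def atrous_conv1d(signal, kernel, dilation=1):
    """Valid 1-D convolution whose kernel taps are spaced `dilation` apart."""
    span = (len(kernel) - 1) * dilation  # receptive field size minus one
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


x = [1, 2, 3, 4, 5, 6, 7]
dense = atrous_conv1d(x, [1, 1, 1], dilation=1)    # each output sees 3 adjacent samples
dilated = atrous_conv1d(x, [1, 1, 1], dilation=2)  # same 3 weights, but spanning 5 samples
print(dense, dilated)
```

A spatial pyramid built from this operation typically runs several dilation rates in parallel and concatenates the results, which is one common way to capture objects at multiple scales.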
Funding: Funded by Woosong University Academic Research 2024.
Abstract: This study investigates the application of Learnable Memory Vision Transformers (LMViT) for detecting metal surface flaws, comparing their performance with traditional CNNs, specifically ResNet18 and ResNet50, as well as other transformer-based models including Token to Token ViT, ViT without memory, and Parallel ViT. Leveraging a widely used steel surface defect dataset, the research applies data augmentation and t-distributed stochastic neighbor embedding (t-SNE) to enhance feature extraction and understanding. These techniques mitigated overfitting, stabilized training, and improved generalization capabilities. The LMViT model achieved a test accuracy of 97.22%, significantly outperforming ResNet18 (88.89%) and ResNet50 (88.90%), as well as Token to Token ViT (88.46%), ViT without memory (87.18%), and Parallel ViT (91.03%). Furthermore, LMViT exhibited superior training and validation performance, attaining a validation accuracy of 98.2% compared to 91.0% for ResNet18, 96.0% for ResNet50, and 89.12%, 87.51%, and 91.21% for Token to Token ViT, ViT without memory, and Parallel ViT, respectively. The findings highlight LMViT’s ability to capture long-range dependencies in images, an area where CNNs struggle due to their reliance on local receptive fields and hierarchical feature extraction. The additional transformer-based models also demonstrate improved performance in capturing complex features over CNNs, with LMViT excelling particularly at detecting subtle and complex defects, which is critical for maintaining product quality and operational efficiency in industrial applications. For instance, the LMViT model successfully identified fine scratches and minor surface irregularities that CNNs often misclassify. This study not only demonstrates LMViT’s potential for real-world defect detection but also underscores the promise of other transformer-based architectures like Token to Token ViT, ViT without memory, and Parallel ViT in industrial scenarios where complex spatial relationships are key. Future research may focus on enhancing LMViT’s computational efficiency for deployment in real-time quality control systems.
Funding: Supported by the Ministry of Education Chunhui Program of China (No. HZKY20220304).
Abstract: Because methane is flammable and explosive, the detection process is time-consuming and dangerous, and it is difficult to obtain labeled data. To reduce the dependence on labeled data when detecting methane concentration using tunable diode laser absorption spectroscopy (TDLAS) technology, this paper designs a methane gas acquisition platform based on TDLAS and proposes a methane gas concentration detection model based on semi-supervised learning. Firstly, features of the methane gas are extracted, and then semi-supervised learning is introduced to select the optimal feature combination; subsequently, the traditional whale optimization algorithm is improved to optimize the parameters of the random forest used to detect the methane gas concentration. The results show that the model is not only able to select the optimal feature combination under limited labeled data, but also achieves an accuracy of 94.25%, which is better than the traditional model, and is robust in terms of parameter optimization.
Funding: Supported by Guangzhou Development Zone Science and Technology (2021GH10, 2020GH10, 2023GH02), the University of Macao (MYRG2022-00271-FST), and the Science and Technology Development Fund (FDCT) of Macao (0032/2022/A).
Abstract: Introduction. Early cancer detection represents a critical evolution in healthcare, addressing a significant pain point in cancer treatment: the tendency for diagnoses to occur at advanced stages. Traditionally, many cancers are not identified until they have progressed to late stages, where treatment options become limited, less effective, and more costly. This late detection results in poorer prognoses, higher mortality rates, and increased healthcare costs. Without early detection tools like Fluorescence In Situ Hybridization (FISH), these challenges persist, leaving patients with fewer opportunities for successful outcomes.
Abstract: A measurement system for the scattering characteristics of warhead fragments based on high-speed imaging systems offers advantages such as simple deployment, flexible maneuverability, and high spatiotemporal resolution, enabling the acquisition of full-process data of the fragment scattering process. However, mismatches between camera frame rates and target velocities can lead to long motion blur tails of high-speed fragment targets, resulting in low signal-to-noise ratios and rendering conventional detection algorithms ineffective in dynamic strong interference testing environments. In this study, we propose a detection framework centered on dynamic strong interference disturbance signal separation and suppression. We introduce a mixture Gaussian model constrained under a joint spatial-temporal-transform domain Dirichlet process, combined with total variation regularization to achieve disturbance signal suppression. Experimental results demonstrate that the proposed disturbance suppression method can be integrated with certain conventional motion target detection tasks, enabling adaptation to real-world data to a certain extent. Moreover, we provide a specific implementation of this process, which achieves a detection rate close to 100% with an approximately 0% false alarm rate in multiple sets of real target field test data. This research effectively advances the development of the field of damage parameter testing.
Funding: Supported by the Key Research and Development Program of Shaanxi Province-International Science and Technology Cooperation Program Project (No. 2020KW-001), the Contract for Xi'an Municipal Science and Technology Plan Project-Xi'an City Strong Foundation Innovation Plan (No. 21XJZZ0074), and the Key Project of Graduate Student Innovation Fund at Xi'an University of Posts and Telecommunications (No. CXJJZL2023013).
Abstract: The detection of surface defects in concrete bridges using deep learning is of significant importance for reducing operational risks, saving maintenance costs, and driving the intelligent transformation of bridge defect detection. In contrast to subjective and inefficient manual visual inspection, deep learning-based algorithms for concrete defect detection exhibit remarkable advantages and have emerged as a focal point in recent research. This paper comprehensively analyzes the research progress of deep learning algorithms in the field of surface defect detection in concrete bridges in recent years. It introduces the early detection methods for surface defects in concrete bridges and the development of deep learning. Subsequently, it provides an overview of deep learning-based concrete bridge surface defect detection research from three aspects: image classification, object detection, and semantic segmentation. The paper summarizes the strengths and weaknesses of existing methods and the challenges they face. Additionally, it analyzes and anticipates the development trends of surface defect detection in concrete bridges.
Funding: Supported by the National Natural Science Foundation of China (Grant No.: 32301161), the Natural Scientific Foundation of Hunan Province, China (Grant No.: 2023JJ60052), the Scientific Research Project of Hunan Provincial Health Commission, China (Grant No.: 202112062218, 20190161), the Scientific Research Project of Hunan Provincial Department of Education, China (Grant No.: 22B0455), the Clinical “4310” Project of the University of South China, China (Grant No.: 20224310NHYCG02), and the Doctoral Scientific Research Foundation of University of South China, China (Grant No.: 200XQD042).
Abstract: Infrared (IR) spectroscopy, a technique within the realm of molecular vibrational spectroscopy, furnishes distinctive chemical signatures pivotal for both structural analysis and compound identification. A notable challenge emerges from the mismatch between the mid-IR light wavelength range and molecular dimensions, culminating in a constrained absorption cross-section and diminished vibrational absorption coefficients (Supplementary data).
Funding: Supported by the Natural Science Foundation of China (No. 62103298) and the South African National Research Foundation (Nos. 132797 and 137951).
Abstract: In this paper, a two-stage light detection and ranging (LiDAR) three-dimensional (3D) object detection framework is presented, namely the point-voxel dual transformer (PV-DT3D), which is a transformer-based method. In the proposed PV-DT3D, point-voxel fusion features are used for proposal refinement. Specifically, keypoints are sampled from the entire point cloud scene and used to encode representative scene features via a proposal-aware voxel set abstraction module. Subsequently, following the generation of proposals by the region proposal network (RPN), the internal encoded keypoints are fed into the dual transformer encoder-decoder architecture. In 3D object detection, the proposed PV-DT3D takes advantage of both point-wise transformer and channel-wise architecture to capture contextual information from the spatial and channel dimensions. Experiments conducted on the highly competitive KITTI 3D car detection leaderboard show that PV-DT3D achieves superior detection accuracy among state-of-the-art point-voxel-based methods.
Funding: Supported by the National Natural Science Foundation of China (No. 62103298) and the Natural Science Foundation of Hebei Province (No. F2018209289).
Abstract: To address the low detection efficiency and difficult positioning of traditional steel surface defect detection methods, a lightweight steel surface defect detection model based on you only look once version 7 (YOLOv7) is proposed. First, a cascading style sheets (CSS) block module is proposed, which uses more lightweight operations to obtain redundant information in the feature map, reduces the amount of computation, and effectively improves the detection speed. Secondly, an improved spatial pyramid pooling with cross stage partial convolutions (SPPCSPC) structure is adopted so that the model attends to defect location information while predicting defect category information, obtaining richer defect features. In addition, the convolution operation in the original model is simplified, which significantly reduces the size of the model and helps to improve the detection speed. Finally, the efficient intersection over union (EIoU) loss is used to focus on high-quality anchors, speed up convergence, and improve positioning accuracy. Experiments were carried out on the Northeastern University-defect (NEU-DET) steel surface defect dataset. Compared with the original YOLOv7 model, the number of parameters of this model was reduced by 40%, the frames per second (FPS) reached 112, and the average accuracy reached 79.1%. Both detection accuracy and speed were improved, which can meet the needs of steel surface defect detection.
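The EIoU loss named above is commonly written as 1 − IoU plus three penalty terms: the squared centre distance, the squared width difference, and the squared height difference, each normalized by the smallest enclosing box. The sketch below follows that common formulation as an assumption; the abstract does not give the exact variant used (for example, whether a focal weighting is applied), and the function name `eiou_loss` is hypothetical.

```python
def eiou_loss(box, gt):
    """EIoU loss for two boxes given as (x1, y1, x2, y2) with x1 < x2, y1 < y2."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Plain IoU.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    iou = inter / (w * h + gw * gh - inter)

    # Width and height of the smallest box enclosing both inputs.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)

    # Squared distance between the two box centres.
    centre_d2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2

    return (
        1 - iou
        + centre_d2 / (cw ** 2 + ch ** 2)
        + (w - gw) ** 2 / cw ** 2
        + (h - gh) ** 2 / ch ** 2
    )


print(eiou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
print(eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # shifted box of the same size
```

Because width and height errors are penalized separately, the loss keeps a useful gradient even when two boxes share the same aspect ratio, which is the usual argument for faster convergence.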
Funding: Supported by the National Natural Science Foundation of China (No. 62103298).
Abstract: To address the low detection accuracy and large model size of existing object detection algorithms applied to complex road scenes, an improved you only look once version 8 (YOLOv8) object detection algorithm for infrared images, F-YOLOv8, is proposed. First, a space-to-depth network replaces the strided convolution or pooling layers of the traditional backbone network. It is combined with a channel attention mechanism so that the neural network focuses on the channels with large weight values to better extract feature information from low-resolution images. Then, an improved feature pyramid network, the lightweight bidirectional feature pyramid network (L-BiFPN), is proposed, which can efficiently fuse features of different scales. In addition, an intersection over union loss function based on the minimum point distance (MPDIoU) is introduced for bounding box regression, which yields faster convergence and more accurate regression results. Experimental results on the FLIR dataset show that the improved algorithm can accurately detect infrared road targets in real time, with 3% and 2.2% improvements in mean average precision at 50% IoU (mAP50) and mean average precision at 50% to 95% IoU (mAP50-95), respectively, and 38.1%, 37.3%, and 16.9% reductions in the number of model parameters, the model weight, and floating-point operations (FLOPs), respectively. To further demonstrate the detection capability of the improved algorithm, it is tested on the public dataset PASCAL VOC, and the results show that F-YOLOv8 has excellent generalized detection performance.
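The space-to-depth step that replaces strided convolution can be pictured as moving each 2×2 spatial block into the channel dimension, so resolution is halved without discarding pixels. The following is a minimal single-channel sketch under that assumption; `space_to_depth` is a hypothetical helper, and the channel ordering shown here is one of several conventions, not necessarily the one used in F-YOLOv8.

```python
def space_to_depth(image, block=2):
    """Rearrange an H x W grid into (H/block) x (W/block) cells of block*block values."""
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("image sides must be divisible by the block size")
    return [
        [
            # Every pixel of the block is kept and stacked as a channel value.
            [image[r + dr][c + dc] for dr in range(block) for dc in range(block)]
            for c in range(0, width, block)
        ]
        for r in range(0, height, block)
    ]


img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
out = space_to_depth(img)
print(out[0][0], out[1][1])
```

A non-strided convolution would typically follow this rearrangement to mix the stacked channels, which is why the approach tends to help on small or low-resolution targets where strided downsampling loses detail.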
Funding: Supported by the Zhejiang Provincial Key R&D Program (Nos. 2024C01108, 2023C01030, 2023C01034), the Hangzhou Key R&D Program (Nos. 2023SZD0046, 2024SZD1A03), and the Ningbo Key R&D Program (No. 2024Z114).
Abstract: 1 Introduction. Sound event detection (SED) aims to identify and locate specific sound event categories and their corresponding timestamps within continuous audio streams. To overcome the limitations posed by the scarcity of strongly labeled training data, researchers have increasingly turned to semi-supervised learning (SSL) [1], which leverages unlabeled data to augment training and improve detection performance. Among many SSL methods [2-4].
Funding: Supported by a grant from the R&D Program "Development of Rail-Specific Digital Resource Technology Based on an AI-Enabled Rail Support Platform", grant number PK2401C1, of the Korea Railroad Research Institute.
Abstract: Fire detection has held stringent importance in computer vision for over half a century. The development of early fire detection strategies is pivotal to the realization of safe and smart cities that remain habitable in the future. However, the development of optimal fire and smoke detection models is hindered by limitations such as the scarcity of publicly available datasets, lack of diversity, and class imbalance. In this work, we explore possible ways forward to overcome these challenges posed by available datasets. We study the impact of a class-balanced dataset on the fire detection capability of state-of-the-art (SOTA) vision-based models and propose the use of generative models for data augmentation as a future work direction. First, a comparative analysis of two prominent object detection architectures, You Only Look Once version 7 (YOLOv7) and YOLOv8, has been carried out using a balanced dataset, where both models have been evaluated across various metrics including precision, recall, and mean Average Precision (mAP). The results are compared to other recent fire detection models, highlighting the superior performance and efficiency of the proposed YOLOv8 architecture as trained on our balanced dataset. Next, a fractal dimension analysis gives deeper insight into the repetition of patterns in fire, and the effectiveness of the results has been demonstrated by a windowing-based inference approach. The proposed Slicing-Aided Hyper Inference (SAHI) improves the fire and smoke detection capability of YOLOv8 for real-life applications, with a significantly improved mAP performance over a strict confidence threshold. YOLOv8 with SAHI inference gives a mAP50-95 improvement of more than 25% compared to the base YOLOv8 model. The study also provides insights into future work directions by exploring the potential of generative models like the deep convolutional generative adversarial network (DCGAN) and diffusion models like Stable Diffusion for data augmentation.
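The windowing-based inference described above can be sketched as cutting a large frame into overlapping tiles, running the detector on each tile, and mapping detections back to full-image coordinates. The snippet below only generates the tile coordinates; it is a simplified illustration, not the SAHI library's API, and the function name, tile size, and overlap ratio are all assumed values.

```python
def slice_windows(width, height, size=512, overlap=0.25):
    """Return (x1, y1, x2, y2) tiles that cover the image with the given overlap."""
    stride = max(1, int(size * (1 - overlap)))

    def starts(extent):
        if extent <= size:
            return [0]
        points = list(range(0, extent - size, stride))
        points.append(extent - size)  # last tile sits flush with the border
        return points

    return [
        (x, y, min(x + size, width), min(y + size, height))
        for y in starts(height)
        for x in starts(width)
    ]


windows = slice_windows(1024, 512)
print(windows)
```

Detections from each tile would then be shifted by the tile's top-left corner and merged, usually with non-maximum suppression, so that small fire or smoke regions occupy more pixels relative to the detector's input size.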
Funding: Supported in part by the National Key Research and Development Project under Grant 2020YFB1806805 and partially funded through a grant from Qualcomm.
Abstract: 6G is expected to support more intelligent networks, and this trend places importance on self-healing capability when degradation emerges in cellular networks. As a primary component of self-healing networks, fault detection is investigated in this paper. Considering its fast response and low time and computational consumption, the Online Broad Learning System (OBLS) is applied for the first time to identify outages in cellular networks. In addition, the Automatic-constructed Online Broad Learning System (AOBLS) is put forward to rationalize its structure and consequently avoid over-fitting and under-fitting. Furthermore, a multi-layer classification structure is proposed to further improve the classification performance. To face the challenges caused by imbalanced data in fault detection problems, a novel weighting strategy is derived to obtain the Multilayer Automatic-constructed Weighted Online Broad Learning System (MAWOBLS), and ensemble learning with a retrained Support Vector Machine (SVM), denoted as EMAWOBLS, is used to better handle this imbalance issue. Simulation results show that the proposed algorithm has excellent performance in detecting faults with satisfactory time usage.
Abstract: Many applications, including security systems, medical diagnostics, and human-computer interfaces, depend on eye gaze recognition. However, due to factors including individual variations, occlusions, and shifting illumination conditions, real-world scenarios continue to pose difficulties for accurate and consistent eye gaze recognition. This work aims to investigate the potential benefits of employing transfer learning to improve eye gaze detection ability and efficiency. Transfer learning is the process of fine-tuning pre-trained models on smaller, domain-specific datasets after they have been trained on larger datasets. We study several transfer learning algorithms and evaluate their effectiveness on eye gaze identification, including both regression and classification tasks, using a range of deep learning architectures, namely AlexNet, Visual Geometry Group (VGG), InceptionV3, and ResNet. In this study, we evaluate the effectiveness of transfer learning-based models against models that were trained from scratch using eye-gazing datasets on the basis of various performance and loss metrics such as Precision, Accuracy, and Mean Absolute Error. We investigate the effects of different pre-trained models, dataset sizes, and domain gaps on the transfer learning process. The findings of our study clarify the efficacy of transfer learning for eye gaze detection and offer suggestions for the most successful transfer learning strategies to apply in real-world situations.
Funding: Financed by the European Union-NextGenerationEU, through the National Recovery and Resilience Plan of the Republic of Bulgaria, Project No. BG-RRP-2.013-0001-C01.
Abstract: Social media has emerged as one of the most transformative developments on the internet, revolutionizing the way people communicate and interact. However, alongside its benefits, social media has also given rise to significant challenges, one of the most pressing being cyberbullying. This issue has become a major concern in modern society, particularly due to its profound negative impacts on the mental health and well-being of its victims. In the Arab world, where social media usage is exceptionally high, cyberbullying has become increasingly prevalent, necessitating urgent attention. Early detection of harmful online behavior is critical to fostering safer digital environments and mitigating the adverse effects of cyberbullying. This underscores the importance of developing advanced tools and systems to identify and address such behavior effectively. This paper investigates the development of a robust cyberbullying detection and classification system tailored for Arabic comments on YouTube. The study explores the effectiveness of various deep learning models, including Bi-LSTM (Bidirectional Long Short-Term Memory), LSTM (Long Short-Term Memory), CNN (Convolutional Neural Networks), and a hybrid CNN-LSTM, in classifying Arabic comments into binary classes (bullying or not) and multiclass categories. A comprehensive dataset of 20,000 Arabic YouTube comments was collected, preprocessed, and labeled to support these tasks. The results revealed that the CNN and hybrid CNN-LSTM models achieved the highest accuracy in binary classification, reaching 91.9%. For multiclass classification, the LSTM and Bi-LSTM models outperformed the others, achieving an accuracy of 89.5%. These findings highlight the effectiveness of deep learning approaches in the mitigation of cyberbullying within Arabic online communities.