Traffic sign recognition (TSR, or Road Sign Recognition, RSR) is one of the Advanced Driver Assistance System (ADAS) functions in modern cars. To address the most important issues, real-time operation and resource efficiency, we propose a highly efficient hardware implementation for TSR. We divide the TSR procedure into two stages, detection and recognition. In the detection stage, under the assumption that most German traffic signs have red or blue colors with circle, triangle, or rectangle shapes, we use the Normalized RGB color transform and Single-Pass Connected Component Labeling (CCL) to find potential traffic signs efficiently. For Single-Pass CCL, our contribution is to eliminate the “merge-stack” operations by recording the connected relations of regions in the scan phase and updating the labels in the iterating phase. In the recognition stage, the Histogram of Oriented Gradients (HOG) is used to generate the descriptor of the signs, and we classify the signs with a Support Vector Machine (SVM). In the HOG module, we analyze the minimum number of bits required at different recognition rates. The proposed method achieves a 96.61% detection rate and a 90.85% recognition rate when tested on the GTSDB dataset. Our hardware implementation reduces the storage of CCL and simplifies the HOG computation. The main CCL storage size is reduced by 20% compared to the most advanced design under typical conditions. Using TSMC 90 nm technology, the proposed design operates at a 105 MHz clock rate and processes 1360 × 800 images at 135 fps. The chip size is about 1 mm² and the power consumption is close to 8 mW. This work is therefore resource efficient and meets the real-time requirement.
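The color transform in the detection stage is simple to illustrate. The sketch below computes the normalized RGB channels and flags candidate red/blue sign pixels; the threshold values are illustrative guesses, not the ones used in the paper.

```python
import numpy as np

def normalized_rgb_mask(img, red_thr=0.4, blue_thr=0.4):
    """Flag candidate red/blue traffic-sign pixels via normalized RGB.

    img: H x W x 3 uint8 array in RGB order. Thresholds are illustrative
    assumptions, not the paper's values.
    """
    rgb = img.astype(np.float32)
    s = rgb.sum(axis=2) + 1e-6            # avoid division by zero
    r, g, b = rgb[..., 0] / s, rgb[..., 1] / s, rgb[..., 2] / s
    red_mask = (r > red_thr) & (g < 0.3)  # dominant-red pixels
    blue_mask = (b > blue_thr) & (r < 0.3)
    return red_mask | blue_mask           # input to connected-component labeling
```

The resulting binary mask is exactly the kind of input the single-pass CCL stage would label into candidate sign regions.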
Human activity recognition is a significant area of research in artificial intelligence for surveillance, healthcare, sports, and human-computer interaction applications. This article benchmarks the performance of a You Only Look Once version 11-based (YOLOv11-based) architecture for multi-class human activity recognition. The dataset consists of 14,186 images across 19 activity classes, ranging from dynamic activities such as running and swimming to static activities such as sitting and sleeping. Preprocessing included resizing all images to 512 × 512 pixels, annotating them in YOLO's bounding box format, and applying data augmentation methods such as flipping, rotation, and cropping to enhance model generalization. The proposed model was trained for 100 epochs with adaptive learning rate methods and hyperparameter optimization, reaching a mAP@0.5 of 74.93% and a mAP@0.5-0.95 of 64.11% and outperforming previous versions of YOLO (v10, v9, and v8) and general-purpose architectures like ResNet50 and EfficientNet. It exhibited improved precision and recall across all activity classes, with high precision values of 0.76 for running, 0.79 for swimming, 0.80 for sitting, and 0.81 for sleeping, and it was tested for real-time deployment with an inference time of 8.9 ms per image, making it computationally light. YOLOv11's improvements are attributed to architectural advancements such as a richer feature extraction process, better attention modules, and an anchor-free detection mechanism. While YOLOv10 was extremely stable in static activity recognition, YOLOv9 performed well in dynamic environments but suffered from overfitting, and YOLOv8, while a decent baseline, failed to differentiate between overlapping static activities. The experimental results establish the proposed YOLOv11 as the most appropriate model, providing an ideal balance among accuracy, computational efficiency, and robustness for real-world deployment. Nevertheless, certain issues remain to be addressed, particularly in discriminating between visually similar activities and in the reliance on publicly available datasets. Future research will incorporate 3D data and multimodal sensor inputs, such as depth and motion information, to enhance recognition accuracy and generalizability in challenging real-world environments.
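A training run of the kind described can be reproduced in a few lines with the Ultralytics API. The dataset YAML name and the augmentation values below are placeholders, not the authors' exact configuration; this is a minimal sketch, assuming the `ultralytics` package with YOLO11 weights.

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")        # pretrained YOLO11 weights (nano variant)
model.train(
    data="activity19.yaml",       # hypothetical 19-class dataset config
    epochs=100,
    imgsz=512,                    # matches the 512 x 512 preprocessing
    fliplr=0.5,                   # flip augmentation probability
    degrees=10.0,                 # rotation augmentation range
)
metrics = model.val()             # reports mAP@0.5 and mAP@0.5-0.95
```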
The development of scientific inquiry and research has yielded numerous benefits in the realm of intelligent traffic control systems, particularly in automatic license plate recognition for vehicles. The design of license plate recognition algorithms has been digitalized through the use of neural networks. There is a growing contemporary demand for vehicle surveillance, driven by the need for efficient vehicle processing and traffic management. The design, development, and implementation of a license plate recognition system therefore hold significant social, economic, and academic importance. The study aims to present contemporary methodologies and empirical findings pertaining to automated license plate recognition. The primary focus of the automatic license plate recognition algorithm was on image extraction, character segmentation, and recognition. Based on our observations, character segmentation is the most challenging of these functions. The license plate recognition project that we designed demonstrated the effectiveness of this method across various observed conditions, particularly in low-light environments such as periods of limited illumination or inclement weather with precipitation. The method was tested on a sample of fifty images, resulting in a 100% accuracy rate. The findings demonstrate the project's ability to effectively determine the optimal outcomes of the simulations.
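Since character segmentation is singled out as the hardest step, a minimal contour-based sketch of it is shown below, assuming a pre-cropped plate image and OpenCV; the height heuristic and parameter values are illustrative, not the study's published settings.

```python
import cv2

def segment_characters(plate_bgr, min_h_frac=0.35):
    """Crude contour-based character segmentation for a cropped plate image."""
    gray = cv2.cvtColor(plate_bgr, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding handles varying illumination better than a fixed cutoff.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    h_img = plate_bgr.shape[0]
    boxes = [cv2.boundingRect(c) for c in contours]         # (x, y, w, h)
    chars = [b for b in boxes if b[3] > min_h_frac * h_img]  # keep tall blobs
    return sorted(chars, key=lambda b: b[0])                 # left-to-right order
```

Each returned box would then be cropped and passed to the character recognizer.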
The human ear has been substantiated as a viable nonintrusive biometric modality for identification and verification. Among the many feasible techniques for ear biometric recognition, convolutional neural network (CNN) models have recently offered high-performance and reliable systems. However, their performance can still be improved further using the capabilities of soft biometrics, a research question yet to be investigated. This research aims to augment traditional CNN-based ear recognition performance by adding highly discriminative ear soft biometric traits. It proposes a novel framework for augmented ear identification/verification using a group of discriminative categorical soft biometrics and deriving new, more perceptive, comparative soft biometrics for feature-level fusion with hard biometric deep features. It conducts several identification and verification experiments for performance evaluation, analysis, and comparison while varying the ear image datasets, hard biometric deep-feature extractors, soft biometric augmentation methods, and classifiers used. The experimental work yields promising results, reaching up to 99.94% accuracy and up to 14% improvement using the AMI and AMIC datasets, along with their corresponding soft biometric label data. The results confirm the proposed augmented approaches' superiority over their standard counterparts and emphasize the robustness of the new ear comparative soft biometrics over their categorical peers.
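Feature-level fusion of the kind described can be sketched as concatenating L2-normalized deep features with a soft-biometric vector before classification. The dimensions, random placeholder data, and linear SVM below are assumptions for illustration, not the paper's configuration.

```python
import numpy as np
from sklearn.svm import SVC

def fuse_features(deep_feats, soft_feats):
    """Concatenate L2-normalized deep features with soft-biometric traits."""
    norms = np.linalg.norm(deep_feats, axis=1, keepdims=True) + 1e-9
    return np.hstack([deep_feats / norms, soft_feats])

deep = np.random.rand(100, 512)     # placeholder CNN embeddings
soft = np.random.rand(100, 8)       # placeholder soft-biometric labels
y = np.random.randint(0, 10, 100)   # placeholder subject identities

clf = SVC(kernel="linear").fit(fuse_features(deep, soft), y)
```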
In the era of artificial intelligence (AI), healthcare and medical sciences are inseparable from various AI technologies [1]. ChatGPT once shocked the medical field, but the latest AI model, DeepSeek, has recently taken the lead [2]. PubMed-indexed publications on DeepSeek are evolving [3] but remain limited to editorials and news articles. In this Letter, we explore the use of DeepSeek in early symptom recognition for stroke care. To the best of our knowledge, this is the first DeepSeek-related writing on stroke.
Molecular recognition by bioreceptors and enzymes relies on orthogonal interactions with small molecules within their cavities. To date, Chinese scientists have developed three types of strategies for introducing active sites inside the cavities of macrocyclic arenes to better mimic the molecular recognition of bioreceptors and enzymes. This editorial aims to enlighten scientists in the field as they develop novel macrocycles for molecular recognition, supramolecular assembly, and applications.
Video action recognition (VAR) aims to analyze dynamic behaviors in videos and achieve semantic understanding. VAR faces challenges such as temporal dynamics, action-scene coupling, and the complexity of human interactions. Existing methods can be categorized into motion-level, event-level, and story-level ones based on spatiotemporal granularity. However, single-modal approaches struggle to capture complex behavioral semantics and human factors. Therefore, in recent years, vision-language models (VLMs) have been introduced into this field, providing new research perspectives for VAR. In this paper, we systematically review spatiotemporal hierarchical methods in VAR and explore how the introduction of large models has advanced the field. Additionally, we propose the concept of a “Factor” to identify and integrate key information from both visual and textual modalities, enhancing multimodal alignment. We also summarize various multimodal alignment methods and provide in-depth analysis and insights into future research directions.
Accessible communication based on sign language recognition (SLR) is key to emergency medical assistance for the hearing-impaired community. Balancing the capture of local and global information in SLR for emergency medicine poses a significant challenge. To address this, we propose a novel approach based on the inter-learning of visual features between global and local information. Specifically, our method enhances the perception capabilities of the visual feature extractor by strategically leveraging the strengths of convolutional neural networks (CNNs), which are adept at capturing local features, and vision transformers, which perform well at perceiving global features. Furthermore, to mitigate the overfitting caused by the limited availability of sign language data for emergency medical applications, we introduce an enhanced short temporal module for data augmentation through additional subsequences. Experimental results on three publicly available sign language datasets demonstrate the efficacy of the proposed approach.
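A dual-branch extractor of this flavor can be sketched by pairing a CNN backbone with a ViT backbone and fusing their per-frame features. The concatenation fusion below is an assumption, since the paper's inter-learning scheme is more elaborate; the backbones are stock torchvision models used for illustration.

```python
import torch
import torchvision.models as tvm

cnn = tvm.resnet18(weights=None)
cnn.fc = torch.nn.Identity()            # 512-d local features per frame
vit = tvm.vit_b_16(weights=None)
vit.heads = torch.nn.Identity()         # 768-d global features per frame

frames = torch.randn(4, 3, 224, 224)    # a small batch of video frames
with torch.no_grad():
    # Local (CNN) and global (ViT) cues fused by concatenation -> (4, 1280)
    fused = torch.cat([cnn(frames), vit(frames)], dim=1)
```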
Convolutional neural networks (CNNs) exhibit superior performance in image feature extraction, making them extensively used in traffic sign recognition. However, the design of existing traffic sign recognition algorithms often relies on expert knowledge to enhance the image feature extraction networks, necessitating image preprocessing and model parameter tuning. This increases the complexity of the model design process. This study introduces an evolutionary neural architecture search (ENAS) algorithm for the automatic design of neural network models tailored to traffic sign recognition. By integrating the construction parameters of the residual network (ResNet) into evolutionary algorithms (EAs), we automatically generate lightweight networks for traffic sign recognition, using blocks as the fundamental building units. Experimental evaluations on the German traffic sign recognition benchmark (GTSRB) dataset reveal that the algorithm attains a recognition accuracy of 99.32% with a mere 2.8 × 10⁶ parameters. Comparisons with other traffic sign recognition algorithms demonstrate that the method can discover neural network architectures more efficiently, significantly reducing the number of network parameters while maintaining recognition accuracy.
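The block-based encoding can be pictured as a small genome of per-stage (block count, channel width) genes evolved under mutation. The ranges and choices below are illustrative, not the paper's actual search space.

```python
import random

def random_genome(stages=4):
    """One candidate: a (residual blocks, channel width) gene per stage."""
    return [(random.randint(1, 4),
             random.choice([16, 32, 64, 128]))
            for _ in range(stages)]

def mutate(genome, p=0.3):
    """Perturb block counts and widths with probability p per gene."""
    out = []
    for blocks, width in genome:
        if random.random() < p:
            blocks = max(1, blocks + random.choice([-1, 1]))
        if random.random() < p:
            width = random.choice([16, 32, 64, 128])
        out.append((blocks, width))
    return out

population = [random_genome() for _ in range(20)]
# Each genome would be decoded into a ResNet-style model, trained briefly on
# GTSRB, and scored by accuracy and parameter count before selection.
```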
Pill image recognition is an important field in computer vision. It has become a vital technology in healthcare and pharmaceuticals because precise medication identification is necessary to prevent errors and ensure patient safety. This survey examines the current state of pill image recognition, focusing on advancements, methodologies, and the challenges that remain unresolved. It provides a comprehensive overview of traditional image processing-based, machine learning-based, deep learning-based, and hybrid methods, and explores the ongoing difficulties in the field. We summarize and classify the methods used in each article, compare the strengths and weaknesses of these four families of methods, and review benchmark datasets for pill image recognition. Additionally, we compare the performance of the proposed methods on popular benchmark datasets. The survey also draws on recent advancements, such as Transformer models, and cutting-edge technologies, such as Augmented Reality (AR), to discuss potential research directions and conclude the review. By offering a holistic perspective, this paper aims to serve as a valuable resource for researchers and practitioners striving to advance the field of pill image recognition.
Smart grid substation operations often take place in hazardous environments and pose significant threats to the safety of power personnel, and relying solely on manual supervision can lead to inadequate oversight. In response to the demand for technology that identifies improper operations in substation work scenarios, this paper proposes a substation safety action recognition technology to prevent misoperation and enhance safety management. The paper uses a dual-branch transformer network to extract spatial and temporal information from a video dataset of operational behaviors in complex substation environments. First, to capture the spatial-temporal correlation of personnel behaviors in smart grid substations, we devise a sparse attention module and a segmented linear attention module that are embedded into the spatial-branch and temporal-branch transformers, respectively. To avoid redundancy between the spatial and temporal information, we fuse the temporal and spatial features with a tensor decomposition fusion module in a decoupled manner. Experimental results indicate that our proposed method accurately detects improper operational behaviors in substation work scenarios, outperforming existing methods in detection and recognition accuracy.
The Cosic Resonance Recognition Model (RRM) for amino acid sequences was applied to the classes of proteins displayed by four strains (Sudan, Zaire, Reston, Ivory Coast) of Ebola virus that produced either high or minimal numbers of human fatalities. The results clearly differentiated highly lethal and non-lethal strains. Solutions for the two lethal strains exhibited near-ultraviolet (~230 nm) photon values, while the two asymptomatic forms displayed near-infrared (~1000 nm) values. Cross-correlations of the spectral densities of the RRM values for the different classes of proteins associated with the viral genomes supported this dichotomy. The strongest coefficient occurred only between the Sudan and Zaire strains, and not for any other pair of strains, for sGP, the small glycoprotein that intercalates with the plasma cell membrane to promote insertion of viral contents into the cellular space. A surprising, statistically significant cross-spectral correlation occurred between the “spike” glycoprotein component (GP1) of the virus, which is associated with anchoring the virus to the mammalian cell plasma membrane, and the Schumann resonance of the earth, whose intensities are determined by the incidence of equatorial thunderstorms. Previous applications of the RRM to shifting photon wavelengths emitted by melanoma cells adapting to reduced ambient temperature have validated Cosic's model and have demonstrated very narrow wavelength (about 10 nm) specificity. One possible ancillary and non-invasive treatment of people harboring the fatal Ebola strains would be whole-body application of narrow-band near-infrared light, pulsed in specific physiologically patterned sequences, with sufficient radiant flux density to perfuse the entire body volume.
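The RRM computation itself is compact: residues are mapped to their electron-ion interaction potential (EIIP) values, the resulting numerical series is Fourier-transformed, and a cross-spectrum highlights frequencies shared by two proteins. The sketch below shows this pipeline with only a handful of approximate EIIP entries and made-up sequences; a real analysis needs Cosic's full published table and the actual viral protein sequences.

```python
import numpy as np

# A few approximate EIIP values for illustration only; use Cosic's full table.
EIIP = {"A": 0.0373, "G": 0.0050, "L": 0.0000, "K": 0.0371,
        "D": 0.1263, "R": 0.0959, "S": 0.0829, "T": 0.0941}

def rrm_spectrum(seq):
    """Map residues to EIIP values and return the amplitude spectrum."""
    x = np.array([EIIP[a] for a in seq], dtype=float)
    x -= x.mean()                     # remove the DC component
    return np.fft.rfft(x)

def cross_spectrum(seq1, seq2):
    """Peaks mark numerical frequencies shared by the two proteins."""
    s1, s2 = rrm_spectrum(seq1), rrm_spectrum(seq2)
    n = min(len(s1), len(s2))
    return np.abs(s1[:n] * np.conj(s2[:n]))

peaks = cross_spectrum("AGLKDRST" * 8, "GKLADTSR" * 8)  # toy sequences
```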
The Internet of Things (IoT) and mobile technology have significantly transformed healthcare by enabling real-time monitoring and diagnosis of patients. Recognizing Medical-Related Human Activities (MRHA) is pivotal for healthcare systems, particularly for identifying actions critical to patient well-being. However, challenges such as high computational demands, low accuracy, and limited adaptability persist in Human Motion Recognition (HMR). While some studies have integrated HMR with IoT for real-time healthcare applications, limited research has focused on recognizing MRHA, which is essential for effective patient monitoring. This study proposes a novel HMR method tailored for MRHA detection, leveraging multi-stage deep learning techniques integrated with IoT. The approach employs EfficientNet to extract optimized spatial features from skeleton frame sequences using seven Mobile Inverted Bottleneck Convolution (MBConv) blocks, followed by Convolutional Long Short-Term Memory (ConvLSTM) to capture spatio-temporal patterns. A classification module with global average pooling, a fully connected layer, and a dropout layer generates the final predictions. The model is evaluated on the NTU RGB+D 120 and HMDB51 datasets, focusing on MRHA such as sneezing, falling, walking, and sitting. It achieves 94.85% accuracy for cross-subject evaluations and 96.45% for cross-view evaluations on NTU RGB+D 120, along with 89.22% accuracy on HMDB51. Additionally, the system integrates IoT capabilities using a Raspberry Pi and a GSM module, delivering real-time alerts to caregivers and patients via Twilio's SMS service. This scalable and efficient solution bridges the gap between HMR and IoT, advancing patient monitoring, improving healthcare outcomes, and reducing costs.
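The described backbone-then-recurrence layout can be sketched in Keras as a TimeDistributed EfficientNet feeding a ConvLSTM layer and the stated classification head. The input shape, feature widths, and class count below are placeholders, not the paper's exact settings.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

frames = layers.Input(shape=(16, 64, 64, 3))          # (time, H, W, C)
backbone = tf.keras.applications.EfficientNetB0(
    include_top=False, weights=None, input_shape=(64, 64, 3))
x = layers.TimeDistributed(backbone)(frames)          # spatial features per frame
x = layers.ConvLSTM2D(64, kernel_size=3, padding="same")(x)  # spatio-temporal
x = layers.GlobalAveragePooling2D()(x)                # classification module:
x = layers.Dense(256, activation="relu")(x)           # pooling, dense, dropout
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(120, activation="softmax")(x)  # NTU RGB+D 120 classes
model = Model(frames, outputs)
```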
In computer vision and artificial intelligence, automatic facial expression-based emotion identification of humans has become a popular research and industry problem. Recent demonstrations and applications in several fields, including computer games, smart homes, expression analysis, gesture recognition, surveillance videos, depression therapy, patient monitoring, and anxiety assessment, have brought attention to its significant academic and commercial importance. This study emphasizes research that has employed only facial images for facial expression recognition (FER), because facial expressions are a basic way that people communicate meaning to each other. The immense success of deep learning has led to a growing use of its many architectures to enhance efficiency. This review covers the use of preprocessing, augmentation techniques, and feature extraction for the temporal properties of successive frames of data in machine learning, deep learning, and hybrid methods. A subsequent section gives a brief summary of publicly accessible assessment criteria and then compares them with benchmark results, the most trustworthy way to statistically assess FER-related research. This brief synopsis of the subject matter may be beneficial both for novices in the field of FER and for seasoned scholars seeking fruitful avenues for further investigation. The information conveys fundamental knowledge and provides a comprehensive understanding of the most recent state-of-the-art research.
The Mahalanobis-Taguchi system (MTS) is a data mining and pattern recognition method that can identify the attribute characteristics of multidimensional data by constructing a Mahalanobis distance (MD) measurement scale. In this paper, considering the influence of the irregular distribution of sample data and abnormal variation of normal data on the accuracy of MTS, a feature recognition and selection model of the equipment state based on an improved MTS is proposed, and two aspects of the model, namely construction of the original Mahalanobis space (MS) and determination of the threshold, are studied. First, the original training sample space is statistically controlled with the X-bar-S control chart, and extreme data of single characteristic attributes are filtered out to reduce the impact of extreme conditions on the accuracy of the model, so as to construct a more robust MS. Furthermore, the box plot method is used to determine the threshold of the model. The stability of the model and its tolerance to extreme conditions are improved by leaving a sufficient range of variation for extreme conditions that are identified as within the normal range. Finally, the improved model is compared with a traditional model based on the unimproved MTS using data from the literature. The results show that, compared with the traditional model, the accuracy and sensitivity of the improved model for state identification are greatly enhanced.
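The two numerical steps, the MD scale and the box-plot threshold, are easy to sketch. Below, distances are computed against a reference "normal" sample and the threshold is set with the conventional 1.5-IQR upper fence, which is an assumption; the data are random placeholders.

```python
import numpy as np

def mahalanobis_distances(X, ref):
    """Mahalanobis distance of each row of X from the reference sample ref."""
    mu = ref.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(ref, rowvar=False))
    d = X - mu
    # Row-wise sqrt(d @ cov_inv @ d^T) without building the full matrix.
    return np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))

normal = np.random.randn(200, 5)        # placeholder "healthy" training samples
md = mahalanobis_distances(normal, normal)
q1, q3 = np.percentile(md, [25, 75])
threshold = q3 + 1.5 * (q3 - q1)        # box-plot upper fence as the MTS cutoff
```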
Objective: To examine and measure the decision-making processes involved in Visual Recognition of Facial Emotional Expressions (VRFEE) and to study the effects of demographic factors on this process. Method: We evaluated a newly designed software application (M.A.R.I.E.) that permits computerized metric measurement of VRFEE. We administered it to 204 cognitively normal participants ranging in age from 20 to 70 years. Results: We established normative values for the recognition of anger, disgust, joy, fear, surprise, and sadness expressed on the faces of three individuals. There was a significant difference in: 1) measurement (F(8, 189) = 3896, p = 0.0001); 2) education level (χ²(12) = 28.4, p = 0.005); 3) face (F(2, 195) = 10, p = 0.0001); 4) series (F(8, 189) = 28, p = 0.0001); and 5) the interaction between identity and recognition of emotions (F(16, 181) = 11, p = 0.0001). However, performance did not differ according to: 1) age (F(6, 19669) = 1.35, p = 0.2) or 2) level of education (F(1, 1587) = 0.6, p = 0.4). Conclusions: In healthy participants, VRFEE remains stable throughout the lifespan when cognitive functions remain optimal. Disgust, sadness, fear, and joy seem to be the four most easily recognized facial emotions, while anger and surprise are not easily recognized. Visual recognition of disgust and fear is independent of aging. The characteristics of a face have a significant influence on the ease with which people recognize expressed emotions (idiosyncrasy). Perception and recognition of emotions is categorical, even when the facial images are integrated in a spectrum of morphs reflecting two different emotions at either end.
Digit recognition is an essential element of the process of scanning and converting documents into electronic format. In this work, a new Multiple-Cell Size (MCS) approach is proposed that utilizes Histogram of Oriented Gradient (HOG) features and a Support Vector Machine (SVM) based classifier for efficient classification of handwritten digits. The HOG-based technique is sensitive to the cell size selected for the feature extraction computations, hence the new MCS approach is used to perform the HOG analysis and compute the HOG features. The system has been tested on the benchmark MNIST database of handwritten digits, and a classification accuracy of 99.36% has been achieved using an independent test set strategy. A cross-validation analysis of the classification system has also been performed using the 10-fold cross-validation strategy, yielding a 10-fold classification accuracy of 99.26%. The classification performance of the proposed system is superior to existing techniques that use complex procedures, since it achieves on-par or better results using simple operations in both the feature space and the classifier space. Plots of the system's confusion matrix and receiver operating characteristics (ROC) provide further evidence of the superior performance of the proposed MCS HOG and SVM based digit classification system.
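The MCS idea reduces to concatenating HOG descriptors computed at several cell sizes before feeding the SVM. A minimal sketch with scikit-image and scikit-learn follows; the particular cell sizes (4, 7, and 14 pixels) and the random stand-in data are assumptions, not the paper's choices.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def mcs_hog(image28):
    """Concatenate HOG descriptors at multiple cell sizes for a 28x28 image."""
    feats = [hog(image28, orientations=9,
                 pixels_per_cell=(c, c), cells_per_block=(2, 2))
             for c in (4, 7, 14)]      # illustrative cell sizes
    return np.concatenate(feats)

# Random stand-ins; in practice these would be MNIST images and labels.
X = np.stack([mcs_hog(np.random.rand(28, 28)) for _ in range(50)])
y = np.random.randint(0, 10, 50)
clf = SVC(kernel="rbf").fit(X, y)
```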
Urdu, a prominent subcontinental language, serves as a versatile means of communication. However, its handwritten forms present challenges for optical character recognition (OCR). While various OCR techniques have been proposed, most of them focus on recognizing printed Urdu characters and digits. To the best of our knowledge, very little research has focused solely on pure handwritten Urdu recognition, and the results of the proposed methods are often inadequate. In this study, we introduce a novel approach to recognizing pure handwritten Urdu digits and characters using Convolutional Neural Networks (CNNs). Our proposed method uses convolutional layers to extract important features from input images and classifies them using fully connected layers, enabling efficient and accurate detection of Urdu handwritten digits and characters. We implemented the proposed technique on a large, publicly available dataset of Urdu handwritten digits and characters. The findings demonstrate that the CNN model achieves an accuracy of 98.30% and an F1 score of 88.6%, indicating its effectiveness in detecting and classifying Urdu handwritten digits and characters. These results have far-reaching implications for applications including document analysis, text recognition, and language understanding, which have previously been unexplored in the context of Urdu handwriting data. This work lays a solid foundation for future research and development in Urdu language detection and processing, opening up new opportunities for advancement in this field.
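The conv-then-dense layout described above corresponds to a compact Keras model like the sketch below; the layer sizes, the 64 × 64 grayscale input, and the class count are assumptions, not the paper's architecture.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),   # convolutional feature extractor
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),      # fully connected classifier
    layers.Dense(50, activation="softmax"),    # placeholder digit+character count
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```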
Accurate recognition of flight deck operations for carrier-based aircraft, based on operation trajectories, is critical for optimizing carrier-based aircraft performance. This recognition involves understanding short-term and long-term spatial collaborative relationships among support agents and positions from long spatial–temporal trajectories. While existing methods excel at recognizing collaborative behaviors from short trajectories, they often struggle with long spatial–temporal trajectories. To address this challenge, this paper introduces a dynamic graph method to enhance flight deck operation recognition. First, spatial–temporal collaborative relationships are modeled as a dynamic graph. Second, a discretization and compression method is proposed to assign values to the states of this dynamic graph. To extract features that represent diverse collaborative relationships among agents and account for the duration of these relationships, a biased random walk is then conducted. Subsequently, the Swin Transformer is employed to comprehend spatial–temporal collaborative relationships, and a fully connected layer is applied for deck operation recognition. Finally, to address the scarcity of real datasets, a simulation pipeline is introduced to generate deck operations in virtual flight deck scenarios. Experimental results on the simulation dataset demonstrate the superior performance of the proposed method.
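The biased random walk step can be pictured on a toy weighted graph where edge weights stand in for collaboration duration; the agents, edges, and weights below are illustrative, not the paper's graph construction.

```python
import random

# A tiny weighted snapshot of the dynamic graph: neighbors with edge weights
# that grow with how long two agents have collaborated (hypothetical values).
graph = {"tractor":  [("aircraft", 3.0), ("fueler", 1.0)],
         "aircraft": [("tractor", 3.0), ("fueler", 2.0)],
         "fueler":   [("aircraft", 2.0), ("tractor", 1.0)]}

def biased_walk(graph, start, length):
    """Random walk whose next step is biased toward heavier (longer) edges."""
    walk = [start]
    for _ in range(length - 1):
        nodes, weights = zip(*graph[walk[-1]])
        walk.append(random.choices(nodes, weights=weights)[0])
    return walk  # walks become token sequences fed to the feature extractor

print(biased_walk(graph, "aircraft", 8))
```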