The satellite-based augmentation system (SBAS) provides differential and integrity augmentation services for safety-of-life applications in aviation and navigation. However, the SBAS signal structure is public, which exposes it to spoofing attacks. To improve the anti-spoofing capability of SBAS, the European Union and the United States are researching navigation message authentication and promoting the standardization of SBAS message authentication. For the development of the BeiDou satellite-based augmentation system (BDSBAS), this paper proposes a navigation message authentication scheme based on the Chinese commercial cryptographic standards. First, the paper describes the architecture and principles of SBAS message authentication. It then designs Timed Efficient Stream Loss-tolerant Authentication (TESLA) and Elliptic Curve Digital Signature Algorithm (ECDSA) authentication schemes based on the Chinese commercial cryptographic standards, together with the message arrangement and the over-the-air rekeying (OTAR) message. Finally, the paper theoretically analyzes the time between authentications (TBA) and the maximum authentication latency (MAL) for L5 TESLA-I and L5 ECDSA-Q. It also simulates the OTAR message reception time, TBA, and MAL under different OTAR message weights and demodulation error rates. The simulation results can provide theoretical support for the standardization of BDSBAS message authentication.
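TESLA rests on two ideas: a one-way key chain and delayed key disclosure. The sketch below illustrates both. It is written under stated assumptions. SHA-256 and HMAC from the Python standard library stand in for the SM3-based primitives that a Chinese-commercial-cryptography design would use. The seed, the chain length, the 16-byte tag size and the message text are hypothetical.

```python
import hashlib
import hmac


def build_key_chain(seed: bytes, length: int) -> list[bytes]:
    """Return keys[0..length-1] with keys[i] == SHA-256(keys[i + 1]).

    keys[0] is the root key that receivers must already trust (in SBAS it
    would be delivered through an authenticated OTAR message); keys[1],
    keys[2], ... are used in turn and disclosed only after their messages.
    """
    chain = [hashlib.sha256(seed).digest()]
    for _ in range(length - 1):
        chain.append(hashlib.sha256(chain[-1]).digest())
    chain.reverse()
    return chain


def mac(key: bytes, message: bytes) -> bytes:
    # Truncated HMAC-SHA-256 tag; the 16-byte length is illustrative only.
    return hmac.new(key, message, hashlib.sha256).digest()[:16]


def key_is_authentic(disclosed: bytes, index: int, root: bytes) -> bool:
    # Hashing a genuine keys[index] exactly `index` times reproduces the root.
    k = disclosed
    for _ in range(index):
        k = hashlib.sha256(k).digest()
    return hmac.compare_digest(k, root)


def verify(message: bytes, tag: bytes, disclosed: bytes, index: int, root: bytes) -> bool:
    # Accept only if the late-disclosed key belongs to the chain and the MAC matches.
    return key_is_authentic(disclosed, index, root) and hmac.compare_digest(
        mac(disclosed, message), tag
    )


keys = build_key_chain(b"hypothetical-seed", 10)
root = keys[0]
payload = b"hypothetical SBAS correction message"
tag = mac(keys[3], payload)                    # broadcast with the message
print(verify(payload, tag, keys[3], 3, root))  # key 3 is disclosed later
```

The sketch omits a check that a real receiver needs for security. The receiver must also confirm that each message arrived before its key could have been disclosed, and that check depends on loose time synchronization. The gap between a message and its key disclosure is one reason why TBA and MAL matter for TESLA.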
Discriminative region localization and efficient feature encoding are crucial for fine-grained object recognition. However, existing data augmentation methods struggle to locate discriminative regions accurately when backgrounds are complex, target objects are small, or training data are limited, which leads to poor recognition. Fine-grained images exhibit "small inter-class differences." Second-order feature encoding enhances discrimination, but it often requires dual convolutional neural networks (CNNs), which increases training time and complexity. This study proposes a model that integrates discriminative region localization with efficient second-order feature encoding. The model ranks feature map channels with a fully connected layer and selects the high-importance channels to generate an enhanced map that accurately locates discriminative regions. Cropping and erasing augmentations further refine recognition. To improve efficiency, a novel second-order feature encoding module generates an attention map from the fourth convolutional group of the 50-layer Residual Network (ResNet-50) and multiplies it with features from the fifth group. This produces second-order features while reducing dimensionality and training time. Experiments on the Caltech-UCSD Birds-200-2011 (CUB-200-2011), Stanford Cars, and Fine-Grained Visual Classification of Aircraft (FGVC-Aircraft) datasets show state-of-the-art accuracies of 88.9%, 94.7%, and 93.3%, respectively.
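The channel-ranking step can be sketched in plain Python: keep the top-scoring channels, fuse them into an enhanced map, then crop or erase the region that the map highlights. This is only an illustration. The per-channel scores are taken as given inputs, standing in for the fully connected layer's output. Averaging the top-k channels and the 0.5 threshold are hypothetical choices, not the paper's settings.

```python
def enhanced_map(features, scores, k):
    """Average the k channels with the highest importance scores.

    features: list of C channels, each an H x W list of lists.
    scores:   one importance score per channel (assumed to come from an FC layer).
    """
    top = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]
    h, w = len(features[0]), len(features[0][0])
    return [[sum(features[c][i][j] for c in top) / k for j in range(w)]
            for i in range(h)]


def crop_box(attn, theta=0.5):
    """Bounding box (top, left, bottom, right) of cells >= theta * peak."""
    peak = max(max(row) for row in attn)
    cells = [(i, j) for i, row in enumerate(attn)
             for j, v in enumerate(row) if v >= theta * peak]
    rows = [i for i, _ in cells]
    cols = [j for _, j in cells]
    return min(rows), min(cols), max(rows), max(cols)


def erase(image, box, fill=0):
    """Erasing augmentation: overwrite the located region with a constant."""
    top, left, bottom, right = box
    return [[fill if top <= i <= bottom and left <= j <= right else v
             for j, v in enumerate(row)] for i, row in enumerate(image)]


# Toy 4 x 4 feature maps: channel 0 responds to a discriminative part.
hot = [[0, 0, 0, 0], [0, 5, 4, 0], [0, 3, 6, 0], [0, 0, 0, 0]]
flat1 = [[1] * 4 for _ in range(4)]
flat2 = [[2] * 4 for _ in range(4)]
attn = enhanced_map([hot, flat1, flat2], scores=[0.9, 0.1, 0.2], k=1)
box = crop_box(attn)  # region to crop (zoom in) or erase
```

The cropped box would be resized and fed back as a zoomed-in training view. Erasing the same box forces the network to look for other discriminative parts.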
The advent of artificial intelligence (AI) has brought augmented reality (AR) display technology to a pivotal juncture, positioning it as a contender for the next generation of mobile intelligent terminals. However, the pursuit of advanced AR displays, particularly those capable of delivering immersive 3D experiences, is significantly hindered by the performance limitations of current hardware and the complexity of system integration. In this study, we present a multi-focal plane AR display system that integrates a non-orthogonal polarization-multiplexing metasurface, freeform optical elements, and an OLED display screen. All optical elements are integrated into a single solid-state architecture using a joint optimization design approach that combines ray tracing and diffraction theory. The compact multiplexing metasurface performs distinct phase functions in different polarization channels, which produces the multi-focal plane AR visual effect. Meanwhile, the freeform surfaces offer ample design flexibility for jointly optimizing the multi-focal plane imaging and see-through systems. After mechanical design and prototype assembly, we demonstrate the system's real-time, multi-focal plane display capabilities. The digital images at all virtual image distances integrate seamlessly with the real environment, demonstrating the system's high parallelism and real-time interactivity. We believe that this design concept and joint design method will spur more innovative and compact intelligent AR display solutions and inject new vitality into hybrid optical systems.
In modern industrial production, foreign object detection in complex environments is crucial for ensuring product quality and production safety. Detection systems based on deep-learning image processing algorithms often struggle to handle high-resolution images and to detect objects accurately against complex backgrounds. To address these issues, this study combines the PatchCore unsupervised anomaly detection algorithm with data augmentation techniques. The goal is to improve the system's generalization across varying lighting conditions, viewing angles, and object scales. The proposed method is evaluated in a complex industrial inspection scenario involving the bogie of an electric multiple unit (EMU). A dataset covering complex backgrounds, diverse lighting conditions, and multiple viewing angles is constructed to validate the detection system's performance in real industrial environments. Experimental results show that the proposed model achieves an average area under the receiver operating characteristic curve (AUROC) of 0.92 and an average F1 score of 0.85. With data augmentation, the model improves AUROC by 0.06 and the F1 score by 0.03, demonstrating enhanced accuracy and robustness for foreign object detection in complex industrial settings. In addition, the effects of key factors on detection performance are systematically analyzed, providing practical guidance for parameter selection in real industrial applications.
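PatchCore stores patch features from defect-free images in a memory bank. It thins the bank with greedy coreset subsampling. It then scores each test patch by its distance to the nearest stored patch, and the image score is the worst patch. The sketch below shows that scoring logic in plain Python under several assumptions. Patch features are taken as already extracted; PatchCore itself obtains them from a pretrained CNN backbone. The 2-D toy vectors and the coreset size are hypothetical. PatchCore's additional score reweighting is omitted.

```python
import math


def greedy_coreset(bank, n):
    """k-center greedy subsampling of the nominal patch memory bank."""
    n = min(n, len(bank))
    chosen = [bank[0]]
    min_d = [math.dist(p, chosen[0]) for p in bank]
    while len(chosen) < n:
        # Add the patch farthest from everything chosen so far.
        idx = max(range(len(bank)), key=lambda i: min_d[i])
        chosen.append(bank[idx])
        min_d = [min(d, math.dist(p, bank[idx])) for d, p in zip(min_d, bank)]
    return chosen


def patch_scores(test_patches, memory):
    """Anomaly score of each test patch = distance to its nearest memory patch."""
    return [min(math.dist(p, m) for m in memory) for p in test_patches]


def image_score(test_patches, memory):
    """Image-level score = score of the most anomalous patch."""
    return max(patch_scores(test_patches, memory))


# Hypothetical patch features from defect-free bogie images.
nominal = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0), (0.0, 0.1)]
memory = greedy_coreset(nominal, 3)

test_ok = [(0.05, 0.05), (0.95, 1.0)]   # resembles nominal patches
test_bad = [(0.05, 0.05), (5.0, 5.0)]   # one patch contains a foreign object
print(image_score(test_ok, memory), image_score(test_bad, memory))
```

Because only defect-free images build the memory bank, the method needs no labelled foreign-object examples. A threshold on the image score then separates normal images from anomalous ones.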
In an era dominated by visual information, the display interface serves as a critical gateway between the human and digital worlds. The relentless pursuit of visual immersion has driven display technology from cinema screens to smartphones and now to virtual and augmented reality (VR/AR) headsets, moving ever closer to the human eye. This evolution places unprecedented demands on pixel density, power efficiency, and form factor, pushing up against fundamental physical and physiological limits.
Background: Penile augmentation with injectable substances is becoming increasingly common, and a growing number of aesthetic clinics offer penile enlargement procedures using various injectable materials. Although these procedures are now performed in more controlled, medically supervised environments, their long-term outcomes remain poorly understood. The promotion of such treatments contributes to growing interest among adult men in self-injection as a way to relieve psychological distress about penile size. At the same time, injectable substances have become increasingly easy to obtain through unofficial or unregulated sources. To our knowledge, we report the first documented case of self-injection with Garamycin® (gentamicin) cream. This case contributes to the literature on the often multidisciplinary management of penile enlargement injections, a field that still lacks well-established guidelines. Case Description: This case report describes a young patient who self-injected Garamycin® into the penis for enlargement. He presented to our urology department with worsening symptoms, including severe, poorly tolerated pain. His primary request was prompt pain relief while preserving, as far as possible, the aesthetic appearance and functional integrity of his penis. The case required a multi-stage surgical approach to salvage the penis and preserve both its structural integrity and its function. Conclusions: To our knowledge, this case report documents the first reported instance of Garamycin® injection for penile enlargement. It provides insight into the clinical course of penile cream injections and demonstrates that a two-stage scrotal flap can achieve both functional and aesthetic outcomes. It also highlights the importance of comprehensive management. Such management should address the traumatic impact of penile deformity secondary to inflammation and/or infection, as well as the body dysmorphic concerns often associated with these cases.
Legal case classification sorts legal documents into predefined categories, which facilitates legal information retrieval and case management. However, real-world legal datasets often suffer from class imbalance because case types are unevenly distributed across legal domains. This biases model performance: accuracy is high for overrepresented categories and low for minority classes. To address this issue, we propose a data augmentation method that selectively masks unimportant terms in a document while preserving terms that are key from a legal-domain perspective. This approach increases data diversity and improves the generalization capability of conventional models. Our experiments show that the proposed augmentation strategy consistently improves accuracy and F1 score across all models, validating its effectiveness for legal case classification.
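The masking idea can be sketched as a token-level transform: non-key terms are replaced with a mask token at random, and key legal terms are never touched. The sketch makes two assumptions. The set of key terms is supplied by the caller, because the abstract does not say how importance is scored. The 0.3 masking probability and the `[MASK]` token are also hypothetical.

```python
import random


def mask_augment(tokens, key_terms, mask_prob=0.3, mask_token="[MASK]", rng=None):
    """Return a copy of `tokens` in which non-key terms are masked at random.

    Key terms (assumed lowercase, e.g. legal concepts) are never masked, so
    the label-bearing content of the document is preserved.
    """
    rng = rng or random.Random()
    return [
        tok if tok.lower() in key_terms or rng.random() >= mask_prob else mask_token
        for tok in tokens
    ]


def augment_minority(docs, key_terms, copies, seed=0):
    """Produce `copies` masked variants of each document, e.g. of a minority class."""
    rng = random.Random(seed)
    return [mask_augment(d, key_terms, rng=rng) for d in docs for _ in range(copies)]


doc = "the defendant breached the lease agreement".split()
key_terms = {"defendant", "breached", "lease"}
for variant in augment_minority([doc], key_terms, copies=3):
    print(" ".join(variant))
```

Applying the transform only to under-represented classes rebalances the training set. The masked copies also keep each label's legally decisive vocabulary intact.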
Surgical navigation has evolved significantly through advances in augmented reality, virtual reality, and mixed reality. These advances have improved precision and safety across many clinical applications, including neurosurgery and maxillofacial, spinal, and arthroplasty procedures. By integrating preoperative imaging with real-time intraoperative data, these systems provide dynamic guidance, reduce radiation exposure, and minimize tissue damage. Key challenges persist, including intraoperative registration accuracy, flexible-tissue deformation, respiratory motion compensation, and real-time imaging quality. Emerging solutions include artificial intelligence-driven segmentation, deformation-field modeling, and hybrid registration techniques. Future developments are expected to include lightweight, portable systems, improved non-rigid registration algorithms, and broader clinical adoption. Despite advances in rigid-tissue applications, soft-tissue navigation requires further innovation to address motion variability and registration reliability. Such progress would ultimately advance minimally invasive surgery and precision medicine.
Automatic and accurate medical image segmentation remains a fundamental task in computer-aided diagnosis and treatment planning. Recent foundation models, such as the medical Segment Anything Model (MedSAM), have demonstrated strong performance. They still face challenges in many medical applications because of anatomical complexity and limited domain-specific prompting. This work introduces a method that improves segmentation robustness and precision by automatically generating multiple informative point prompts rather than relying on a single input. The approach randomly samples sets of spatially distributed point prompts based on image features, enabling MedSAM to better capture fine-grained anatomical structures and boundaries. During inference, the resulting probability maps are aggregated to reduce local misclassifications without additional model training. Extensive experiments on various computed tomography (CT) and magnetic resonance imaging (MRI) datasets show improvements in the Dice Similarity Coefficient (DSC) and Normalized Surface Dice (NSD) compared with baseline SAM and ScribblePrompt models. A semi-automatic variant that samples points from the ground-truth segmentations gave further gains, reaching up to 92.1% DSC and 86.6% NSD. It showed significant improvements in delineating complex structures such as the pancreas, colon, kidney, and brain tumours. The main novelty of the method is that it effectively combines the results of multiple point prompts within the medical segmentation pipeline, outperforming single-point prompt methods. Overall, the proposed approach offers a straightforward yet effective way to improve medical image segmentation while maintaining computational efficiency.
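The sampling-and-aggregation idea can be sketched without the model itself. The sketch draws spread-out point prompts, averages the probability maps that different prompt sets produce, and thresholds the average. It then scores the result with Dice. Several parts are assumptions. The candidate points would come from some image-feature rule that is not specified here. The Manhattan spacing, the 0.5 threshold and the toy maps are hypothetical. The maps stand in for MedSAM outputs; no MedSAM API is called.

```python
import random


def sample_points(candidates, n, min_gap, rng):
    """Randomly pick up to n (row, col) prompts at least min_gap apart (Manhattan)."""
    pool = candidates[:]
    rng.shuffle(pool)
    chosen = []
    for r, c in pool:
        if all(abs(r - r2) + abs(c - c2) >= min_gap for r2, c2 in chosen):
            chosen.append((r, c))
            if len(chosen) == n:
                break
    return chosen


def aggregate(prob_maps):
    """Pixel-wise mean of several H x W probability maps."""
    k, h, w = len(prob_maps), len(prob_maps[0]), len(prob_maps[0][0])
    return [[sum(m[i][j] for m in prob_maps) / k for j in range(w)] for i in range(h)]


def binarize(prob, t=0.5):
    return [[1 if p >= t else 0 for p in row] for row in prob]


def dice(pred, truth):
    """Dice similarity coefficient between two binary masks."""
    inter = sum(p * g for pr, gr in zip(pred, truth) for p, g in zip(pr, gr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    return 2 * inter / total if total else 1.0


truth = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
# Toy outputs for three prompt sets; m1 has a spurious response at (0, 0).
m1 = [[0.9, 0.8, 0.9], [0.1, 0.9, 0.8], [0.0, 0.1, 0.0]]
m2 = [[0.1, 0.9, 0.7], [0.0, 0.8, 0.9], [0.0, 0.0, 0.1]]
m3 = [[0.2, 0.7, 0.8], [0.1, 0.7, 0.6], [0.1, 0.0, 0.0]]
fused = binarize(aggregate([m1, m2, m3]))
print(dice(binarize(m1), truth), dice(fused, truth))
```

Averaging suppresses errors that only one prompt set makes. That is how aggregating over prompts can reduce local misclassifications without any retraining.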
Sign language is a primary mode of communication for people with hearing impairments, conveying meaning through hand shapes and hand movements. Unlike spoken or written languages, sign language relies on the recognition and interpretation of hand gestures captured in video. However, sign language datasets remain limited compared with those for other languages, which hinders the training and performance of deep learning models. In addition, sign language has a word order distinct from that of spoken language, which requires context-aware, natural sentence generation. To address these challenges, this study applies data augmentation techniques to build a Korean Sign Language dataset and train recognition models. It then reconstructs the recognized words into complete sentences. The recognition process uses OpenCV and MediaPipe to extract hand landmarks from sign language videos and analyzes hand position, orientation, and motion. The extracted features are converted into time-series data and fed into a Long Short-Term Memory (LSTM) model. The recognition framework achieved an accuracy of up to 81.25%, and sentence generation achieved an accuracy of up to 95%. The approach is expected to be applicable not only to Korean Sign Language but also to recognition and translation for other low-resource sign languages.
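Landmark frames can be turned into LSTM-ready time series as sketched below. The sketch assumes one hand per frame with MediaPipe Hands' 21 (x, y, z) landmarks, already extracted; no MediaPipe call is made. Several details are hypothetical and not taken from the study: the wrist-relative normalization, zero-padding to a fixed length, and the scale-plus-jitter augmentation.

```python
import random

NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 (x, y, z) landmarks per hand


def frame_features(landmarks):
    """Flatten one frame of landmarks, expressed relative to the wrist (index 0)."""
    wx, wy, wz = landmarks[0]
    return [v for x, y, z in landmarks for v in (x - wx, y - wy, z - wz)]


def to_sequence(frames, length):
    """Convert per-frame landmarks to a fixed-length (length x 63) sequence."""
    seq = [frame_features(f) for f in frames][:length]   # truncate long clips
    pad = [0.0] * (NUM_LANDMARKS * 3)
    return seq + [pad[:] for _ in range(length - len(seq))]  # pad short clips


def augment(frames, rng, scale=0.1, jitter=0.005):
    """Randomly rescale the whole clip and add small per-coordinate Gaussian noise."""
    s = 1.0 + rng.uniform(-scale, scale)
    return [[(x * s + rng.gauss(0, jitter),
              y * s + rng.gauss(0, jitter),
              z * s + rng.gauss(0, jitter)) for x, y, z in f] for f in frames]


# Hypothetical 10-frame clip with a static hand.
frames = [[(0.5 + 0.01 * i, 0.5, 0.0) for i in range(NUM_LANDMARKS)] for _ in range(10)]
clip = to_sequence(augment(frames, random.Random(0)), length=30)  # 30 x 63 LSTM input
```

Wrist-relative coordinates make the features independent of where the hand appears in the frame. The fixed length lets clips of different durations be batched for the LSTM.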
Objective evaluation of individual expertise, as a prerequisite for team formation, has long been a goal of large software development companies. With rapid advances in machine learning, the next step is to automate this evaluation using the reliable data already stored in project management tools. Our approach quantifies software developer expertise using metadata from task-tracking systems. We mathematically formalize two categories of expertise. Technology-specific expertise denotes the skills required for a particular technology, and general expertise captures overall knowledge of the software industry. We then automatically classify the zones of expertise associated with each task a developer has worked on. This classification uses Bidirectional Encoder Representations from Transformers (BERT)-like models to handle the particular characteristics of project-tool datasets. Finally, the method evaluates each software specialist's proficiency across completed projects from both the technology-specific and general perspectives. Experimental validation of the method yielded promising results.
Crack detection accuracy in computer vision is often constrained by limited annotated datasets. Generative adversarial networks (GANs) have been applied for data augmentation, but they frequently introduce blurring and artifacts. To address this challenge, this study uses denoising diffusion probabilistic models (DDPMs) to generate high-quality synthetic crack images. The synthetic images enrich the training set with diverse, structurally consistent samples that improve crack segmentation. The framework is a two-stage pipeline. First, DDPMs synthesize high-fidelity crack images that capture fine structural details. Second, these generated samples are combined with real data to train segmentation networks, improving accuracy and robustness in crack detection. Compared with GAN-based approaches, DDPM achieved the best fidelity, with the highest Structural Similarity Index (SSIM) (0.302) and the lowest Learned Perceptual Image Patch Similarity (LPIPS) (0.461). It produced artifact-free images that preserve fine crack details. To validate its effectiveness, six segmentation models were tested. LinkNet consistently performed best, excelling in both region-level accuracy and structural continuity. Incorporating DDPM-augmented data further improved segmentation, increasing F1 scores by up to 1.1% and intersection over union (IoU) by 1.7%. It also improved boundary alignment and skeleton continuity compared with models trained on real images alone. Experiments with varying augmentation ratios showed consistent improvements, with F1 rising from 0.946 (no augmentation) to 0.957 and IoU from 0.897 to 0.913 at the highest ratio. These findings demonstrate the effectiveness of diffusion-based augmentation for complex crack detection in structural health monitoring.
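A DDPM is trained by corrupting clean images with its fixed forward (noising) process and teaching a network to predict the added noise. The forward process has a closed form: $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. The sketch below implements that closed form in plain Python. It uses the linear $\beta$ schedule (1e-4 to 0.02 over 1000 steps) from the original DDPM paper. That schedule is an assumption here, since the study's own settings are not given. The "image" is a hypothetical flattened list of pixel values in [-1, 1].

```python
import math
import random


def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule beta_0 .. beta_{T-1}."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in one step.

    Returns (x_t, eps); the denoising network is trained to predict eps from x_t.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


abar = alpha_bars(linear_betas())
x0 = [0.5, -0.25, 1.0]                      # hypothetical crack-image pixels
xt, eps = q_sample(x0, 500, abar, random.Random(0))
```

Near the last step, $\bar\alpha_T$ is tiny, so $x_T$ is essentially pure noise. Generating new crack images means running the learned reverse process from such noise.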
Quality control plays a critical role in modern manufacturing. With the rapid development of electric vehicles, 5G communications, and the semiconductor industry, high-speed, high-precision detection of surface defects on silicon carbide (SiC) wafers has become essential. This study developed an automated inspection framework for identifying surface defects on SiC wafers during the coarse grinding stage. The complex machining textures on wafer surfaces hinder conventional machine vision models and often lead to misjudgment, so deep learning algorithms were applied for defect classification. Because defects are rare and imbalanced across categories, data augmentation was performed using a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) alongside conventional methods. An improved YOLOv8-seg instance segmentation model was then trained and tested on datasets with different augmentation strategies. When trained with WGAN-GP-generated data, YOLOv8-seg achieved mean average precision values of 87.0% (bounding box) and 86.6% (segmentation mask). Compared with the traditional WGAN-GP, the proposed model reduced the Fréchet inception distance by 32.2% and the multiscale structural similarity index by 29.8%, generating more realistic and diverse defect images. The framework effectively improves defect detection accuracy under limited data and shows strong potential for industrial applications.
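The "GP" in WGAN-GP is a penalty that pushes the critic's gradient norm toward 1 at points interpolated between real and generated samples. The full critic loss is $\mathbb{E}[D(\tilde x)] - \mathbb{E}[D(x)] + \lambda\,\mathbb{E}[(\lVert\nabla_{\hat x} D(\hat x)\rVert_2 - 1)^2]$. The sketch below computes only the penalty term. Frameworks compute this gradient with autograd; here it is approximated with central finite differences so that the example stays dependency-free. The critics and the 2-D samples are hypothetical toys. λ = 10 is the commonly used default.

```python
import math
import random


def numerical_grad(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        xp, xm = x[:], x[:]
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def gradient_penalty(critic, real, fake, rng, lam=10.0):
    """lam * batch mean of (||grad critic(x_hat)||_2 - 1)^2 at random interpolates."""
    total = 0.0
    for xr, xf in zip(real, fake):
        eps = rng.random()                       # eps ~ U[0, 1)
        x_hat = [eps * a + (1 - eps) * b for a, b in zip(xr, xf)]
        g = numerical_grad(critic, x_hat)
        total += (math.sqrt(sum(v * v for v in g)) - 1.0) ** 2
    return lam * total / len(real)


rng = random.Random(0)
real = [[1.0, 2.0], [0.5, -1.0]]
fake = [[0.0, 0.0], [2.0, 1.0]]
steep = gradient_penalty(lambda x: 3 * x[0] + 4 * x[1], real, fake, rng)          # |grad| = 5
lipschitz = gradient_penalty(lambda x: 0.6 * x[0] + 0.8 * x[1], real, fake, rng)  # |grad| = 1
```

A critic with gradient norm 5 is penalized by 10 × (5 − 1)² = 160, while a 1-Lipschitz critic incurs almost no penalty. This soft constraint is what stabilizes training when defect samples are scarce.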
Many spore-forming Bacillus species can cause serious human diseases through accidental spore infection, so an identification strategy with both high sensitivity and high specificity is greatly needed. In this work, we propose a multi-head self-attention mechanism-guided neural network Raman platform to identify living Bacillus spores at single-cell resolution. The platform combines single-cell Raman spectroscopy, a convolutional neural network (CNN), and a multi-head self-attention mechanism. The original spectral dataset was small, so Gaussian noise-based spectral augmentation was used to enlarge the single-cell Raman spectra dataset for CNN training. With both spectral augmentation and the multi-head self-attention mechanism, the prediction accuracy for five Bacillus spore species improved from 92.29 ± 0.82% to 99.43 ± 0.15%. The relative classification weights of typical Raman bands were visualized with a multi-head self-attention mechanism curve. This revealed the spectral differences captured by the multi-head self-attention mechanism-guided CNN. As spectral augmentation increased from 0 to 1000, the distribution of relative classification weights shifted from a discrete state to a more concentrated one. More importantly, four highlighted Raman bands (1017, 1449, 1576, and 1660 cm⁻¹) were assigned large weights. This shows that spectral differences in these bands contributed most to prediction accuracy. The proposed platform has great potential for accurately identifying Bacillus and related genera at the single-cell level.
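Gaussian noise-based spectral augmentation can be sketched as adding zero-mean noise to randomly chosen spectra of each class while keeping the class label. The sketch rests on assumptions. The noise standard deviation (1% of each spectrum's peak intensity) and the toy 5-point spectra are hypothetical. The abstract does not state the noise level, and real Raman spectra have many more points.

```python
import random


def augment_dataset(spectra_by_label, n_per_class, noise_frac=0.01, seed=0):
    """Return (spectrum, label) pairs: originals plus n_per_class noisy copies per class.

    Each copy adds zero-mean Gaussian noise whose standard deviation is
    noise_frac times the peak absolute intensity of the source spectrum.
    """
    rng = random.Random(seed)
    data = []
    for label, spectra in spectra_by_label.items():
        data += [(s, label) for s in spectra]          # keep the measured spectra
        for _ in range(n_per_class):
            s = rng.choice(spectra)
            sigma = noise_frac * max(abs(v) for v in s)
            data.append(([v + rng.gauss(0.0, sigma) for v in s], label))
    return data


toy = {
    "species_A": [[0, 1, 5, 1, 0], [0, 2, 4, 2, 0]],
    "species_B": [[3, 0, 0, 0, 3]],
}
train = augment_dataset(toy, n_per_class=100)
```

Varying `n_per_class` corresponds to scaling the number of augmented spectra. The abstract reports that such scaling changes how the attention weights concentrate on specific Raman bands.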
IoT devices are highly vulnerable to cyberattacks because of their widespread, distributed nature and limited security features. Intrusion detection can counter these threats, but class imbalance between normal and abnormal traffic often degrades model performance. We propose a multi-generator adversarial data augmentation method that combines the strengths of TMG-GAN (Tabular Multi-Generator Generative Adversarial Network) and R3GAN (Re-GAN). Our approach uses multiple class-specific generators to create diverse, high-quality synthetic samples, improving training stability and minority-class detection. A dual-branch discriminator-classifier improves both authenticity assessment and class prediction, while feature similarity and decoupling techniques ensure clear class separation. Experiments on the TON-IoT and Edge-IIoTset datasets show that our method outperforms existing techniques such as hybrid sampling, SNGAN (Spectral Normalization GAN), and TMG-GAN. It achieves higher detection accuracy and better minority-class recall for imbalanced IoT intrusion detection.
To address inconsistent image quality and data scarcity in bolt defect detection for transmission lines, this paper proposes an improved detection framework based on the sparse region-based convolutional neural network (Sparse R-CNN). The framework integrates image quality evaluation and text-to-image data augmentation. First, a HyperNetwork-based image quality assessment module filters out low-quality inspection images in terms of clarity and structural integrity, yielding a high-quality training dataset. Second, a text-to-image diffusion model is used for sample augmentation. Text prompts describing various bolt defect types under diverse lighting and viewing conditions guide the model to generate realistic synthetic samples automatically. The generated images are further filtered using a combination of quality and perceptual similarity metrics to ensure consistency with the real data distribution. On top of the Sparse R-CNN baseline, a dynamic label assignment mechanism and a random decision path detection head improve bounding box matching and prediction accuracy. Experimental results show that the proposed method significantly improves detection accuracy (mAP@0.5) over the original Sparse R-CNN while maintaining low computational cost. This enables more efficient and intelligent inspection of transmission line components.
This study develops a deep learning model that recognizes vehicle brands and models and integrates it with a law enforcement intelligence platform. The aim is to overcome the limitations of existing license plate recognition techniques, particularly in handling counterfeit, obscured, or absent plates. The research first collected, annotated, and classified images of various vehicle models. It then used image processing and feature extraction methods to train the model on Microsoft Custom Vision. Experimental results indicate that, for most brands and models, the system achieves stable and relatively high precision, recall, and average precision (AP). Simulated tests involving illicit vehicles show that the model can still identify vehicles effectively from exterior body features when plates are reassigned, concealed, or missing. This reduces dependence on plate-specific data. In practical law enforcement scenarios, these findings can accelerate investigations of stolen or forged plates and improve overall accuracy. Several steps will further strengthen the method's applicability within law enforcement intelligence platforms: collecting vehicle images that cover more model types, production years, and modification levels, refining annotation processes, and improving parameter tuning strategies. These steps will enable more precise and comprehensive vehicle recognition and control in real-world operations.
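Precision, recall and AP can be computed from a ranked list of detections as sketched below. The sketch uses the all-point interpolated AP familiar from PASCAL VOC. That definition is an assumption: Microsoft Custom Vision's exact AP computation may differ. The detections and ground-truth count are hypothetical.

```python
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scored, num_positives):
    """All-point interpolated AP for one class.

    scored: list of (confidence, is_true_positive) detections.
    num_positives: number of ground-truth objects of that class.
    """
    if num_positives == 0:
        return 0.0
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    tp = fp = 0
    points = []  # (recall, precision) after each detection, in rank order
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_positives, tp / (tp + fp)))
    ap, prev_recall = 0.0, 0.0
    for i, (r, _) in enumerate(points):
        # Interpolated precision: best precision at this recall level or beyond.
        p_interp = max(p for _, p in points[i:])
        ap += (r - prev_recall) * p_interp
        prev_recall = r
    return ap


# Hypothetical detections of one vehicle model; 4 such vehicles in the test set.
detections = [(0.9, True), (0.8, False), (0.7, True), (0.6, True)]
print(average_precision(detections, num_positives=4))  # 0.625
```

AP rewards a model that ranks its correct detections above its mistakes. A single precision/recall pair at one threshold cannot capture that.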
The emergence of a multipolar global order is fundamentally reshaping the international geopolitical landscape. Community building with neighboring countries, led by regional powers, is emerging as a significant geopolitical factor. For regional powers seeking to augment their geopolitical influence, building communities with neighboring countries has become a strategic imperative. Brazil exemplifies distinct models of regional community building within South America and the Amazon region. In South America, Brazil prioritizes consensual power-building, aspiring to establish a "power pole" centered on itself.
This paper focuses on self-supervised video representation learning. Most existing approaches follow the contrastive learning pipeline and construct positive and negative pairs by sampling different clips. However, this formulation tends to be biased toward static backgrounds and has difficulty establishing global temporal structures. The main reason lies in the positive pairs, i.e., different clips sampled from the same video. These pairs have limited temporal receptive fields and usually share similar backgrounds but differ in motion. To address these problems, we propose a framework that jointly uses local clips and global videos. It learns detailed region-level correspondence as well as general long-term temporal relations. Using a set of designed controllable augmentations, we achieve accurate alignment of appearance and motion patterns through soft spatio-temporal region contrast. Our formulation avoids the low-level redundancy shortcut with an adversarial mutual information minimization objective, which improves generalization. Moreover, we introduce local-global temporal order dependency to further bridge the gap between clip-level and video-level representations for robust temporal modeling. Extensive experiments show that our framework outperforms existing methods on three video benchmarks for action recognition and video retrieval and captures more accurate temporal dynamics.
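The clip-based contrastive pipeline that the paper starts from typically minimizes an InfoNCE loss. The anchor embedding should score its positive (another clip of the same video) above the negatives (clips of other videos). The sketch below shows that generic baseline objective in plain Python. It is not the paper's soft spatio-temporal region contrast or its mutual-information term. The 2-D embeddings and the temperature of 0.1 are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE for one anchor: -log softmax score of the positive among all candidates."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[0]


# Hypothetical clip embeddings: a well-aligned positive gives a low loss.
good = info_nce([1.0, 0.0], [0.9, 0.1], [[-1.0, 0.0], [0.0, 1.0]])
bad = info_nce([1.0, 0.0], [0.0, 1.0], [[-1.0, 0.0], [0.9, 0.1]])
print(good, bad)
```

Nothing in this loss prevents the positive from matching only because both clips share the same static background. That is the shortcut the paper's region-level and temporal-order terms are designed to counter.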
Funding (BDSBAS navigation message authentication study): Supported by the National Natural Science Foundation of China project "Space-based occultation detection with ground-based GNSS atmospheric horizontal gradient model" (41904033).
Funding (fine-grained object recognition study): Supported in part by the National Natural Science Foundation of China under Grants 62272236, 62376128, and 62306139, and by the Natural Science Foundation of Jiangsu Province under Grants BK20201136 and BK20191401.
Funding: Provided by the National Natural Science Foundation of China (U21A20140), the National Key Research and Development Program of China (2021YFA1401200), the Beijing Natural Science Foundation (JQ24028), the Beijing Nova Program (20240484557), and the Synergetic Extreme Condition User Facility (SECUF).
Abstract: The advent of artificial intelligence (AI) has brought augmented reality (AR) display technology to a pivotal juncture, positioning it as a contender for the next generation of mobile intelligent terminals. However, the pursuit of advanced AR displays, particularly those capable of delivering immersive 3D experiences, is significantly hindered by the performance limitations of current hardware and the complexity of system integration. In this study, we present a multi-focal plane AR display system that integrates a non-orthogonal polarization-multiplexing metasurface, freeform optical elements, and an OLED display screen. All optical elements are integrated into a single solid-state architecture, designed through a joint optimization approach that combines ray tracing and diffraction theory. The multi-focal plane AR visual effect is realized by the compact multiplexing metasurface, which applies distinct phase functions in different polarization channels. Meanwhile, the freeform surfaces provide ample design flexibility for the collaborative optimization of the multi-focal plane imaging and see-through systems. Following mechanical design and prototype assembly, we demonstrate real-time, multi-focal plane display. The digital images at all virtual image distances integrate seamlessly with the real environment, demonstrating the system's high parallelism and real-time interactivity. With this design concept and joint design method, we believe our work will spur more innovative and compact intelligent solutions for AR displays and inject new vitality into hybrid optical systems.
Abstract: In modern industrial production, foreign object detection in complex environments is crucial for ensuring product quality and production safety. Detection systems based on deep-learning image processing algorithms often struggle to handle high-resolution images and to detect objects accurately against complex backgrounds. To address these issues, this study employs the PatchCore unsupervised anomaly detection algorithm combined with data augmentation techniques to improve the system's generalization across varying lighting conditions, viewing angles, and object scales. The proposed method is evaluated in a complex industrial detection scenario involving the bogie of an electric multiple unit (EMU). A dataset covering complex backgrounds, diverse lighting conditions, and multiple viewing angles is constructed to validate the detection system's performance in real industrial environments. Experimental results show that the proposed model achieves an average area under the receiver operating characteristic curve (AUROC) of 0.92 and an average F1 score of 0.85. Combined with data augmentation, the model improves AUROC by 0.06 and F1 score by 0.03, demonstrating enhanced accuracy and robustness for foreign object detection in complex industrial settings. In addition, the effects of key factors on detection performance are systematically analyzed, providing practical guidance for parameter selection in real industrial applications.
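PatchCore stores patch-level features of defect-free images in a memory bank, thins the bank with greedy coreset selection, and scores a test image by its most anomalous patch. A minimal NumPy sketch follows. It assumes patch features have already been extracted by a pretrained backbone (random vectors stand in for them here) and omits refinements of the full method, such as random projections to speed up coreset selection.

```python
import numpy as np


def greedy_coreset(bank, n_select, seed=0):
    """Greedy k-center selection: repeatedly keep the patch farthest from those already kept."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(bank)))]
    # Distance from every patch to its nearest selected patch so far.
    min_dist = np.linalg.norm(bank - bank[selected[0]], axis=1)
    for _ in range(n_select - 1):
        idx = int(np.argmax(min_dist))
        selected.append(idx)
        min_dist = np.minimum(min_dist, np.linalg.norm(bank - bank[idx], axis=1))
    return bank[selected]


def patch_scores(test_patches, memory):
    """Euclidean distance from each test patch to its nearest neighbour in the memory bank."""
    d2 = ((test_patches ** 2).sum(axis=1)[:, None]
          - 2.0 * test_patches @ memory.T
          + (memory ** 2).sum(axis=1)[None, :])
    return np.sqrt(np.maximum(d2.min(axis=1), 0.0))


def image_score(test_patches, memory):
    # Image-level anomaly score = score of the most anomalous patch.
    return float(patch_scores(test_patches, memory).max())


rng = np.random.default_rng(0)
normal_bank = rng.standard_normal((2000, 16))   # stand-in for nominal patch features
memory = greedy_coreset(normal_bank, n_select=200)
test_normal = rng.standard_normal((50, 16))
test_anomalous = test_normal.copy()
test_anomalous[0] += 8.0                        # one patch far from the nominal data
print(image_score(test_normal, memory) < image_score(test_anomalous, memory))  # True
```

Because only defect-free images populate the memory bank, no labeled foreign-object examples are needed, which is why data augmentation of the nominal images is the main lever for robustness to lighting and viewpoint.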
Funding: Supported by the National Natural Science Foundation of China (Grant No. 22105106), the Jiangsu Youth Science and Technology Talent Support Program (Grant No. JSTJ-2025-063), the Nanjing Science and Technology Innovation Project for Overseas Students (Grant No. NJKCZYZZ2022-05), and Start-up Funding from NUPTSF (Grant No. NY221003).
Abstract: In an era dominated by visual information, the display interface serves as a critical gateway between the human and digital worlds. The relentless pursuit of visual immersion has driven display technology from cinema screens to smartphones and now to virtual and augmented reality (VR/AR) headsets, moving ever closer to the human eye. This evolution places unprecedented demands on pixel density, power efficiency, and form factor, pushing against fundamental physical and physiological limits.
Abstract: Background: Penile augmentation through injectable substances is becoming increasingly common. A growing number of aesthetic clinics offer penile enlargement procedures using various injectable materials. Although these procedures are now performed in more controlled and medically supervised environments, their long-term outcomes remain poorly understood. The promotion of such treatments contributes to growing interest among adult men in self-injection as a way to relieve psychological distress associated with concerns about penile size. At the same time, injectable substances have become increasingly easy to obtain through unofficial or unregulated sources. To our knowledge, we report the first documented case of self-injection with Garamycin® (gentamicin) cream, contributing to the literature on the often multidisciplinary management of penile enlargement injections, a field still lacking well-established guidelines. Case Description: This case report describes a young patient who self-injected Garamycin® into the penis for the purpose of enlargement. He presented to our urology department with worsening symptoms, including severe and poorly tolerated pain. His primary request was prompt pain relief while preserving, as far as possible, the aesthetic appearance and functional integrity of his penis. The case required a multi-stage surgical approach to salvage the penis and preserve both its structural integrity and its function. Conclusions: To our knowledge, this case report documents the first reported instance of Garamycin® injection for penile enlargement. It provides insight into the clinical course of such penile cream injections, demonstrates that a two-stage scrotal flap can achieve both functional and aesthetic outcomes, and highlights the importance of comprehensive management, particularly addressing the traumatic impact of penile deformity secondary to inflammation and/or infection, as well as the body dysmorphic concerns often associated with these cases.
Funding: Supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [RS-2021-II211341, Artificial Intelligence Graduate School Program (Chung-Ang University)], and by the Chung-Ang University Graduate Research Scholarship in 2024.
Abstract: Legal case classification assigns legal documents to predefined categories, which facilitates legal information retrieval and case management. However, real-world legal datasets often suffer from class imbalance because case types are unevenly distributed across legal domains. This biases model performance, yielding high accuracy for overrepresented categories and poor performance on minority classes. To address this issue, we propose a data augmentation method that selectively masks unimportant terms within a document while preserving terms that are key from a legal-domain perspective. This approach increases data diversity and improves the generalization capability of conventional models. Our experiments show that the proposed augmentation strategy consistently improves accuracy and F1 score across all models, validating its effectiveness for legal case classification.
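A minimal sketch of importance-aware masking follows. It assumes term importance is measured by inverse document frequency and that legally significant terms come from a hand-made protected set; both are illustrative stand-ins, since the abstract does not specify how key terms are identified, and the function names are hypothetical.

```python
import math
import random
from collections import Counter

MASK = "[MASK]"


def idf_scores(corpus):
    """Smoothed inverse document frequency for every token in a tokenized corpus."""
    n_docs = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + c)) + 1 for tok, c in df.items()}


def mask_unimportant(doc, idf, key_terms, mask_ratio=0.15, seed=None):
    """Mask some of the least informative tokens; tokens in key_terms are never masked."""
    rng = random.Random(seed)
    candidates = [i for i, tok in enumerate(doc) if tok not in key_terms]
    if not candidates:
        return list(doc)
    # Lowest IDF first: frequent, uninformative tokens are the preferred targets.
    candidates.sort(key=lambda i: idf.get(doc[i], 0.0))
    n_mask = max(1, int(len(doc) * mask_ratio))
    # Sample from the least informative half so repeated calls give different variants.
    pool = candidates[: max(n_mask, len(candidates) // 2)]
    chosen = set(rng.sample(pool, min(n_mask, len(pool))))
    return [MASK if i in chosen else tok for i, tok in enumerate(doc)]


corpus = [
    ["the", "court", "held", "the", "contract", "void"],
    ["the", "plaintiff", "filed", "a", "motion"],
    ["the", "court", "dismissed", "the", "appeal"],
]
idf = idf_scores(corpus)
protected = {"court", "contract", "void"}  # hypothetical legal key terms
print(mask_unimportant(corpus[0], idf, protected, mask_ratio=0.3, seed=0))
# ['[MASK]', 'court', 'held', 'the', 'contract', 'void']
```

Calling the function with different seeds on minority-class documents yields several distinct but label-preserving variants, which is how such masking can rebalance a skewed training set.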
Funding: Supported by the National Natural Science Foundation of China (NSFC) under Grants 62025104, 62422102, 62331005, 62301034, and U22A2052, and by the Beijing Natural Science Foundation-Daxing Innovation Joint Fund (L256040).
Abstract: Surgical navigation has evolved significantly through advances in augmented reality, virtual reality, and mixed reality, improving precision and safety across many clinical applications, including neurosurgery and maxillofacial, spinal, and arthroplasty procedures. By integrating preoperative imaging with real-time intraoperative data, these systems provide dynamic guidance, reduce radiation exposure, and minimize tissue damage. Key challenges persist, including intraoperative registration accuracy, deformation of flexible tissue, respiratory motion compensation, and real-time imaging quality. Emerging solutions include artificial intelligence-driven segmentation, deformation-field modeling, and hybrid registration techniques. Future developments will include lightweight, portable systems, improved non-rigid registration algorithms, and greater clinical adoption. Despite advances in rigid-tissue applications, soft-tissue navigation requires further innovation to address motion variability and registration reliability, which will ultimately advance minimally invasive surgery and precision medicine.
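Registration is the step that maps preoperative image coordinates into the intraoperative tracking frame. As an illustration only, the sketch below shows the classic paired-point rigid case (the Kabsch/SVD least-squares solution) together with the fiducial registration error (FRE). It assumes corresponding fiducial positions are already known in both frames, uses made-up marker coordinates, and does not address the non-rigid, soft-tissue problems highlighted above.

```python
import numpy as np


def rigid_register(source, target):
    """Least-squares R, t such that target ≈ source @ R.T + t (Kabsch/SVD)."""
    src_c, tgt_c = source.mean(axis=0), target.mean(axis=0)
    H = (source - src_c).T @ (target - tgt_c)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t


def fiducial_registration_error(source, target, R, t):
    """Root-mean-square residual of the registered fiducials."""
    residual = source @ R.T + t - target
    return float(np.sqrt((residual ** 2).sum(axis=1).mean()))


rng = np.random.default_rng(0)
fiducials_ct = rng.uniform(-50, 50, size=(6, 3))   # hypothetical marker positions in image space (mm)
theta = 0.4
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([10.0, -5.0, 2.0])
# Tracker-space positions with simulated localization noise (0.2 mm standard deviation).
fiducials_tracker = fiducials_ct @ R_true.T + t_true + rng.normal(0.0, 0.2, (6, 3))
R, t = rigid_register(fiducials_ct, fiducials_tracker)
print(round(fiducial_registration_error(fiducials_ct, fiducials_tracker, R, t), 2))
```

A rigid transform has only six degrees of freedom, so it cannot follow tissue that bends or moves with respiration; that limitation is exactly what the non-rigid and deformation-field methods mentioned above aim to overcome.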
Funding: Supported by the Autonomous Government of Andalusia (Spain) under project UMA20-FEDERJA-108 and by the Ministry of Science and Innovation of Spain, grant number PID2022-136764OA-I00. It includes funds from the European Regional Development Fund (ERDF). It is also partially supported by the Fundación Unicaja (PUNI-003_2023) and the Instituto de Investigación Biomédica de Málaga y Plataforma en Nanomedicina-IBIMA Plataforma BIONAND (ATECH-25-02).
Abstract: Automatic and accurate medical image segmentation remains a fundamental task in computer-aided diagnosis and treatment planning. Recent foundation models, such as the medical-focused Segment Anything Model (MedSAM), have demonstrated strong performance but face challenges in many medical applications because of anatomical complexity and limited domain-specific prompts. This work introduces a methodology that improves segmentation robustness and precision by automatically generating multiple informative point prompts rather than relying on a single input. The proposed approach randomly samples sets of spatially distributed point prompts based on image features, enabling MedSAM to better capture fine-grained anatomical structures and boundaries. During inference, the resulting probability maps are aggregated to reduce local misclassifications without additional model training. Extensive experiments on various computed tomography (CT) and magnetic resonance imaging (MRI) datasets demonstrate improvements in the Dice Similarity Coefficient (DSC) and Normalized Surface Dice (NSD) compared with baseline SAM and Scribble Prompt models. A semi-automatic point sampling version based on the ground-truth segmentations yielded further improvements, achieving up to 92.1% DSC and 86.6% NSD, with significant gains in delineating complex structures such as the pancreas, colon, kidney, and brain tumours. The main novelty of our method is that it effectively combines the results of multiple point prompts in the medical segmentation pipeline, thereby outperforming single-point prompt methods. Overall, the proposed model offers a straightforward yet effective way to improve medical image segmentation while maintaining computational efficiency.
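The prompt-sampling and aggregation steps can be sketched as follows. This is a minimal NumPy sketch under assumptions: `predict_fn` is a hypothetical wrapper that would run MedSAM with one set of point prompts and return a foreground probability map (a toy function stands in for it here), the weight map is supplied by the caller (here a ground-truth mask, mirroring the semi-automatic variant), and averaging followed by a 0.5 threshold is one simple aggregation rule, not necessarily the paper's.

```python
import numpy as np


def sample_point_sets(weight_map, n_sets, n_points, min_dist, seed=0):
    """Draw n_sets sets of (row, col) prompts, weighted by weight_map and spaced >= min_dist apart."""
    rng = np.random.default_rng(seed)
    p = weight_map.ravel() / weight_map.sum()
    sets = []
    for _ in range(n_sets):
        # Random visiting order over pixels with non-zero weight, biased toward high weights.
        order = rng.choice(p.size, size=np.count_nonzero(p), replace=False, p=p)
        pts = []
        for idx in order:
            cand = np.array(np.unravel_index(idx, weight_map.shape))
            if all(np.linalg.norm(cand - q) >= min_dist for q in pts):
                pts.append(cand)
            if len(pts) == n_points:
                break
        sets.append(np.array(pts))
    return sets


def aggregate_predictions(predict_fn, image, point_sets, threshold=0.5):
    """Average the probability maps produced by each prompt set, then threshold."""
    probs = np.mean([predict_fn(image, pts) for pts in point_sets], axis=0)
    return probs >= threshold, probs


def dice(pred, gt):
    """Dice similarity coefficient between two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 1.0 if total == 0 else 2.0 * inter / total


gt = np.zeros((32, 32), dtype=bool)
gt[8:24, 8:24] = True
point_sets = sample_point_sets(gt.astype(float), n_sets=3, n_points=4, min_dist=4.0)


def toy_predict(image, pts):
    # Stand-in for MedSAM: a fixed probability map with one local error per prompt set.
    prob = np.where(gt, 0.8, 0.1)
    r, c = pts[0]
    prob[r, c] = 0.0
    return prob


mask, probs = aggregate_predictions(toy_predict, None, point_sets)
print(round(dice(mask, gt), 3))
```

In the toy example each prompt set introduces a different local error; averaging across sets outvotes most of them, which illustrates how aggregation can suppress local misclassifications without retraining the model.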
Funding: Supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP)-Innovative Human Resource Development for Local Intellectualization Program grant funded by the Korea government (MSIT) (IITP-2026-RS-2022-00156334, 50%) and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1C1C2011105, 50%).
Abstract: Sign language is a primary mode of communication for individuals with hearing impairments, conveying meaning through hand shapes and hand movements. Unlike spoken or written language, sign language is expressed through gestures, so automatic understanding relies on recognizing and interpreting hand gestures captured in video data. However, sign language datasets remain relatively limited compared with those for other languages, which hinders the training and performance of deep learning models. Additionally, because sign language word order differs from that of spoken language, context-aware and natural sentence generation is required. To address these challenges, this study applies data augmentation techniques to build a Korean Sign Language dataset and train recognition models. Recognized words are then reconstructed into complete sentences. The sign recognition process uses OpenCV and MediaPipe to extract hand landmarks from sign language videos and analyzes hand position, orientation, and motion. The extracted features are converted into time-series data and fed into a Long Short-Term Memory (LSTM) model. The proposed recognition framework achieved an accuracy of up to 81.25%, and sentence generation achieved an accuracy of up to 95%. The proposed approach is expected to be applicable not only to Korean Sign Language but also to recognition and translation tasks for other low-resource sign languages.
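Before landmarks reach an LSTM they are usually normalized and packed into fixed-length sequences. The sketch below assumes MediaPipe Hands output of 21 (x, y, z) landmarks per frame for a single hand (random arrays stand in for real detections), normalizes each frame relative to the wrist, and applies rotation, jitter, and temporal resampling as example augmentations. The normalization, sequence length, and augmentation parameters are illustrative choices, not the study's settings.

```python
import numpy as np

N_LANDMARKS = 21  # MediaPipe Hands returns 21 (x, y, z) landmarks per detected hand


def normalize_frame(landmarks):
    """Center on the wrist (landmark 0) and scale by the wrist-to-middle-MCP distance (landmark 9)."""
    centered = landmarks - landmarks[0]
    return centered / (np.linalg.norm(centered[9]) + 1e-8)


def to_sequence(frames, length=30):
    """Normalize and flatten each frame to 63 values, then resample or zero-pad to `length` frames."""
    seq = np.stack([normalize_frame(f).ravel() for f in frames])
    if len(seq) >= length:
        idx = np.linspace(0, len(seq) - 1, length).round().astype(int)
        return seq[idx]
    return np.vstack([seq, np.zeros((length - len(seq), seq.shape[1]))])


def augment(frames, rng, max_angle=0.2, jitter=0.01, speed_range=(0.8, 1.2)):
    """In-plane rotation, Gaussian jitter, and temporal resampling of raw landmark frames."""
    a = rng.uniform(-max_angle, max_angle)
    rot = np.array([[np.cos(a), -np.sin(a), 0.0],
                    [np.sin(a),  np.cos(a), 0.0],
                    [0.0,        0.0,       1.0]])
    # Resampling the frame indices simulates faster or slower signing.
    n_out = max(2, int(round(len(frames) * rng.uniform(*speed_range))))
    idx = np.linspace(0, len(frames) - 1, n_out).round().astype(int)
    return [frames[i] @ rot.T + rng.normal(0.0, jitter, frames[i].shape) for i in idx]


rng = np.random.default_rng(0)
clip = [rng.random((N_LANDMARKS, 3)) for _ in range(40)]  # stand-in for 40 frames of landmarks
x = to_sequence(augment(clip, rng), length=30)
print(x.shape)  # (30, 63)
```

Because each frame is rescaled during normalization, a scaling augmentation would have no effect here, which is why the sketch uses rotation, jitter, and speed changes instead. The resulting (30, 63) arrays match the (time steps, features) input an LSTM layer expects.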
Funding: Supported by the project “Romanian Hub for Artificial Intelligence-HRIA”, Smart Growth, Digitization and Financial Instruments Program, 2021–2027, MySMIS No. 334906.
Abstract: Objective evaluation of individual expertise, a prerequisite for team formation, has long been a desideratum in large software development companies. With rapid advances in machine learning, automating this evaluation using reliable data already stored in project management tools is a natural next step. In this context, our approach quantifies software developer expertise using metadata from task-tracking systems. To this end, we mathematically formalize two categories of expertise: technology-specific expertise, which denotes the skills required for a particular technology, and general expertise, which captures overall knowledge of the software industry. We then automatically classify the areas of expertise associated with each task a developer has worked on, using Bidirectional Encoder Representations from Transformers (BERT)-like transformers to handle the particular characteristics of project-tool datasets. Finally, our method evaluates each software specialist's proficiency across completed projects from both technology-specific and general perspectives. The method was validated experimentally, yielding promising results.
Funding: National Natural Science Foundation of China (Grant No. 52508343) and the Fundamental Research Funds for the Central Universities (Grant No. B250201004).
Abstract: Crack detection accuracy in computer vision is often constrained by limited annotated datasets. Although Generative Adversarial Networks (GANs) have been applied for data augmentation, they frequently introduce blur and artifacts. To address this challenge, this study uses Denoising Diffusion Probabilistic Models (DDPMs) to generate high-quality synthetic crack images, enriching the training set with diverse and structurally consistent samples that improve crack segmentation. The proposed framework is a two-stage pipeline. First, DDPMs synthesize high-fidelity crack images that capture fine structural details. Second, these generated samples are combined with real data to train segmentation networks, improving accuracy and robustness in crack detection. Compared with GAN-based approaches, the DDPM achieved the best fidelity, with the highest Structural Similarity Index (SSIM) (0.302) and the lowest Learned Perceptual Image Patch Similarity (LPIPS) (0.461), producing artifact-free images that preserve fine crack details. To validate its effectiveness, six segmentation models were tested, among which LinkNet consistently performed best, excelling in both region-level accuracy and structural continuity. Incorporating DDPM-augmented data further improved segmentation, increasing F1 scores by up to 1.1% and intersection over union (IoU) by 1.7%, while also improving boundary alignment and skeleton continuity compared with models trained on real images alone. Experiments with varying augmentation ratios showed consistent improvements, with F1 rising from 0.946 (no augmentation) to 0.957 and IoU from 0.897 to 0.913 at the highest ratio. These findings demonstrate the effectiveness of diffusion-based augmentation for complex crack detection in structural health monitoring.
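The DDPM training signal comes from a closed-form forward noising process, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\bar\alpha_t = \prod_{s\le t}(1-\beta_s)$. The NumPy sketch below shows that process, the simplified noise-prediction loss, and one reverse sampling step, using the linear variance schedule (1e-4 to 0.02 over 1000 steps) from the original DDPM paper. The denoising network itself, which would be trained on crack images, is omitted and represented only by its predicted noise; steps are indexed from 0 here.

```python
import numpy as np


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances (linear schedule)."""
    return np.linspace(beta_start, beta_end, T)


def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ N(sqrt(abar_t) x0, (1 - abar_t) I) in one shot; also return the noise."""
    noise = rng.standard_normal(x0.shape)
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise, noise


def predict_x0(xt, eps, t, alpha_bars):
    # Invert the forward process given the (predicted) noise.
    a = alpha_bars[t]
    return (xt - np.sqrt(1.0 - a) * eps) / np.sqrt(a)


def simple_loss(eps_pred, eps):
    # Simplified DDPM objective: mean squared error between true and predicted noise.
    return float(np.mean((eps_pred - eps) ** 2))


def p_sample_step(xt, t, eps_pred, betas, alpha_bars, rng):
    """One reverse step x_t -> x_{t-1}, using sigma_t^2 = beta_t."""
    beta, a_bar = betas[t], alpha_bars[t]
    mean = (xt - beta / np.sqrt(1.0 - a_bar) * eps_pred) / np.sqrt(1.0 - beta)
    if t == 0:
        return mean
    return mean + np.sqrt(beta) * rng.standard_normal(xt.shape)


betas = linear_beta_schedule()
alpha_bars = np.cumprod(1.0 - betas)
```

Training draws a random step t, noises a real image with `q_sample`, and minimizes `simple_loss` between the network's prediction and the true noise; generation starts from pure noise and applies `p_sample_step` from t = T-1 down to 0. Unlike a GAN, no discriminator is involved, which is one reason diffusion samples tend to avoid adversarial artifacts.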
Funding: Funded by the National Science and Technology Council (NSTC), Taiwan, grant number NSTC 114-2218-E-167-001.
Abstract: Quality control plays a critical role in modern manufacturing. With the rapid development of electric vehicles, 5G communications, and the semiconductor industry, high-speed, high-precision detection of surface defects on silicon carbide (SiC) wafers has become essential. This study developed an automated inspection framework for identifying surface defects on SiC wafers during the coarse grinding stage. The complex machining textures on wafer surfaces hinder conventional machine vision models and often lead to misjudgment. To address this, deep learning algorithms were applied for defect classification. Because defects are rare and imbalanced across categories, data augmentation was performed using a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) along with conventional methods. An improved YOLOv8-seg instance segmentation model was then trained and tested on datasets with different augmentation strategies. Experimental results showed that, when trained with WGAN-GP-generated data, YOLOv8-seg achieved mean average precision values of 87.0% (bounding box) and 86.6% (segmentation mask). Compared with the traditional WGAN-GP, the proposed model reduced the Fréchet inception distance by 32.2% and the multiscale structural similarity index by 29.8%, generating more realistic and diverse defect images. The proposed framework effectively improves defect detection accuracy under limited data conditions and shows strong potential for industrial applications.
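The gradient penalty is what distinguishes WGAN-GP from the original WGAN: the critic is penalized when the norm of its input gradient, evaluated at random interpolates between real and generated samples, departs from 1. To keep the sketch framework-free, it assumes a hypothetical two-layer tanh critic whose input gradient can be written analytically; in practice the gradient would come from automatic differentiation in a deep learning framework. The default weight λ = 10 follows the WGAN-GP paper.

```python
import numpy as np


class TinyCritic:
    """Hypothetical critic D(x) = v . tanh(W x + b) with an analytic input gradient."""

    def __init__(self, dim, hidden, rng):
        self.W = rng.standard_normal((hidden, dim)) / np.sqrt(dim)
        self.b = np.zeros(hidden)
        self.v = rng.standard_normal(hidden) / np.sqrt(hidden)

    def __call__(self, x):  # x: (batch, dim) -> (batch,)
        return np.tanh(x @ self.W.T + self.b) @ self.v

    def input_grad(self, x):
        # dD/dx = W^T (v * (1 - tanh^2(W x + b))), computed row-wise for the batch.
        h = np.tanh(x @ self.W.T + self.b)
        return (self.v * (1.0 - h ** 2)) @ self.W


def gradient_penalty(critic, real, fake, rng, lam=10.0):
    """lam * mean((||grad D(x_hat)||_2 - 1)^2) at random real/fake interpolates x_hat."""
    eps = rng.uniform(0.0, 1.0, size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake
    grad_norms = np.linalg.norm(critic.input_grad(x_hat), axis=1)
    return lam * float(np.mean((grad_norms - 1.0) ** 2))


def critic_loss(critic, real, fake, rng, lam=10.0):
    # The critic minimizes E[D(fake)] - E[D(real)] + gradient penalty.
    wasserstein = float(critic(fake).mean() - critic(real).mean())
    return wasserstein + gradient_penalty(critic, real, fake, rng, lam)


rng = np.random.default_rng(0)
critic = TinyCritic(dim=4, hidden=8, rng=rng)
real = rng.standard_normal((16, 4))
fake = rng.standard_normal((16, 4)) + 2.0
print(critic_loss(critic, real, fake, rng))
```

Penalizing the gradient norm softly enforces the 1-Lipschitz constraint that the Wasserstein formulation needs, which tends to stabilize training on small, imbalanced defect datasets compared with weight clipping.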
Funding: Partially supported by the National Natural Science Foundation of China (62075137), the Guangdong Basic and Applied Basic Research Foundation (2023A1515140161), the Guangxi Natural Science Foundation of China (2021JJB 110003), the Dongguan Science and Technology of Social Development Program (20231800936312), the high-level talent program of Dongguan University of Technology (No. 221110080), and the Sanming Project of Medicine in Shenzhen (No. SZSM202103014).
Abstract: Many spore-forming Bacillus species can cause serious human diseases through accidental spore infection. An identification strategy with both high sensitivity and high specificity is therefore in great demand. In this work, we propose a multi-head self-attention mechanism-guided neural network Raman platform to identify living Bacillus spores at single-cell resolution. The platform combines single-cell Raman spectroscopy, a convolutional neural network (CNN), and a multi-head self-attention mechanism. To compensate for the limited size of the original spectral dataset, Gaussian noise-based spectral augmentation was used to enlarge the single-cell Raman spectral dataset for CNN training. With the help of both spectral augmentation and the multi-head self-attention mechanism, the prediction accuracy for five Bacillus spore species improved from 92.29±0.82% to 99.43±0.15%. To identify the spectral differences captured by the multi-head self-attention mechanism-guided CNN, the relative classification weights of typical Raman bands were visualized via a multi-head self-attention mechanism curve. As spectral augmentation increased from 0 to 1000, the distribution of relative classification weights shifted from a discrete state to a more concentrated one. More importantly, four highlighted Raman bands (1017, 1449, 1576, and 1660 cm⁻¹) were assigned large weights, showing that spectral differences in these bands contributed most to prediction accuracy. We anticipate that the proposed sorting platform has great potential for accurately identifying Bacillus and related genera at the single-cell level.
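Gaussian noise-based spectral augmentation can be sketched in a few lines of NumPy. The noise level and the choice to scale noise by each spectrum's peak intensity are assumptions for illustration, not parameters reported in the study; random arrays stand in for measured spectra.

```python
import numpy as np


def augment_spectra(spectra, labels, n_new, noise_std=0.01, seed=0):
    """Append n_new noisy copies of randomly chosen spectra (with their labels).

    spectra: (n_samples, n_bins); noise is scaled by each source spectrum's peak
    intensity (an assumption) so weak and strong spectra are perturbed proportionally."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(spectra), size=n_new)
    base = spectra[idx]
    peak = np.abs(base).max(axis=1, keepdims=True)
    noisy = base + rng.normal(0.0, noise_std, size=base.shape) * peak
    return np.vstack([spectra, noisy]), np.concatenate([labels, labels[idx]])


rng = np.random.default_rng(1)
spectra = rng.random((10, 500))        # 10 stand-in spectra with 500 wavenumber bins
labels = np.arange(10) % 5             # five species
X, y = augment_spectra(spectra, labels, n_new=1000)
print(X.shape, y.shape)  # (1010, 500) (1010,)
```

The original spectra are kept unchanged at the front of the array, so the augmented set only adds perturbed copies; in practice the test split should be separated before augmentation so that noisy copies of test spectra do not leak into training.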
Funding: Supported by the Key R&D Projects in Hubei Province (2025BAB018, 2022BAA041) and the Wuhan University Comprehensive Undergraduate Education Quality Reform Project.
Abstract: IoT devices are highly vulnerable to cyberattacks because of their widespread, distributed nature and limited security features. Intrusion detection can counter these threats, but class imbalance between normal and abnormal traffic often degrades model performance. We propose a multi-generator adversarial data augmentation method that combines the strengths of TMG-GAN (Tabular Multi-Generator Generative Adversarial Network) and R3GAN (Re-GAN). Our approach uses multiple class-specific generators to create diverse, high-quality synthetic samples, improving training stability and minority-class detection. A dual-branch discriminator-classifier improves both sample authenticity and class prediction, while feature similarity and decoupling techniques ensure clear class separation. Experiments on the TON-IoT and Edge-IIoTset datasets show that our method outperforms existing techniques such as hybrid sampling, SNGAN (Spectral Normalization GAN), and TMG-GAN, achieving higher detection accuracy and better minority-class recall for imbalanced IoT intrusion detection.
Funding: Supported by the Science and Technology Project of State Grid Corporation of China (No. 5700-202490330A-2-1-ZX).
Abstract: To address inconsistent image quality and data scarcity in bolt defect detection for transmission lines, this paper proposes an improved detection framework based on the sparse region-based convolutional neural network (Sparse RCNN) that integrates image quality evaluation and text-to-image data augmentation. First, a HyperNetwork-based image quality assessment module filters out low-quality inspection images in terms of clarity and structural integrity, yielding a high-quality training dataset. Second, a text-to-image diffusion model is used for sample augmentation. By designing text prompts that describe various bolt defect types under diverse lighting and viewing conditions, the model automatically generates realistic synthetic samples. The generated images are further filtered using a combination of quality and perceptual similarity metrics to ensure consistency with the real data distribution. Building on the Sparse RCNN baseline, a dynamic label assignment mechanism and a random decision path detection head are incorporated to improve bounding box matching and prediction accuracy. Experimental results demonstrate that the proposed method significantly improves detection accuracy (mAP@0.5) over the original Sparse RCNN while maintaining low computational cost, enabling more efficient and intelligent inspection of transmission line components.
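The filtering step for generated images might look like the sketch below. It assumes that a quality score per image (for example, from the HyperNetwork-based assessment module) and feature embeddings from some perceptual feature extractor are already available; the thresholds, the cosine-similarity criterion, and the upper bound that rejects near-duplicates of real images are hypothetical choices, and toy two-dimensional embeddings stand in for real features.

```python
import numpy as np


def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def filter_synthetic(quality, syn_emb, real_emb, q_min=0.5, sim_range=(0.6, 0.95)):
    """Indices of synthetic images that pass the quality gate and sit near the real data.

    The lower similarity bound rejects off-distribution samples; the upper bound
    (an assumption) rejects near-copies of existing real images."""
    nearest = cosine_similarity(syn_emb, real_emb).max(axis=1)
    keep = (quality >= q_min) & (nearest >= sim_range[0]) & (nearest <= sim_range[1])
    return np.flatnonzero(keep)


real = np.array([[1.0, 0.0], [0.0, 1.0]])                          # toy real embeddings
syn = np.array([[1.0, 0.0], [0.8, 0.6], [-1.0, 0.0], [0.6, 0.8]])  # toy synthetic embeddings
quality = np.array([0.9, 0.9, 0.9, 0.3])
print(filter_synthetic(quality, syn, real))  # [1]
```

In the toy data, the first sample duplicates a real image, the third is off-distribution, and the fourth fails the quality gate, so only the second survives.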
Funding: This research was financially supported by the National Science and Technology Council, Taiwan (grant No. NSTC 114-2221-E-018-003) and the Ministry of Education’s Teaching Practice Research Program, Taiwan (PSK1142780).
Abstract: This study develops a deep learning model that recognizes vehicle brands and models, integrated with a law enforcement intelligence platform, to overcome the limitations of existing license plate recognition techniques, particularly in handling counterfeit, obscured, or absent plates. The research first involved collecting, annotating, and classifying images of various vehicle models, then applying image processing and feature extraction to train the model on Microsoft Custom Vision. Experimental results indicate that, for most brands and models, the system achieves stable and relatively high Precision, Recall, and Average Precision (AP). Furthermore, simulated tests involving illicit vehicles show that, even when license plates are reassigned, concealed, or missing, the model can rely on exterior body features to identify vehicles effectively, reducing dependence on plate-specific data. In practical law enforcement scenarios, these findings can accelerate investigations of stolen or forged plates and improve overall accuracy. Continued collection of vehicle images across a broader range of model types, production years, and modification levels, along with refined annotation processes and parameter adjustment strategies, will further strengthen the method's applicability within law enforcement intelligence platforms, enabling more precise and comprehensive vehicle recognition and control in real-world operations.
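The reported per-class metrics can be computed as follows. The abstract does not define how AP is calculated, so this sketch uses the common non-interpolated definition (the mean precision at each rank where a true positive occurs) as an assumption; the scores and labels are toy values.

```python
import numpy as np


def average_precision(scores, is_positive):
    """Non-interpolated AP: mean precision at each rank where a true positive appears."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.asarray(is_positive, dtype=bool)[order]
    if hits.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())


def precision_recall(pred_labels, true_labels, cls):
    """Precision and recall for one class (e.g., one vehicle model)."""
    pairs = list(zip(pred_labels, true_labels))
    tp = sum(p == cls and t == cls for p, t in pairs)
    fp = sum(p == cls and t != cls for p, t in pairs)
    fn = sum(p != cls and t == cls for p, t in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


scores = [0.9, 0.8, 0.7, 0.6]        # confidence that each image shows the target model
is_pos = [True, False, True, False]
print(round(average_precision(scores, is_pos), 4))  # 0.8333
print(precision_recall(["A", "A", "B", "A"], ["A", "B", "B", "A"], "A"))
```

Precision penalizes confusing another vehicle model for the target, recall penalizes missing it, and AP summarizes how well the model's confidence ranking separates the two across all thresholds.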
Abstract: The emergence of a multipolar global order is fundamentally reshaping the international geopolitical landscape, and community building with neighboring countries led by regional powers is emerging as a significant geopolitical factor. For regional powers seeking to augment their geopolitical influence, building communities with neighboring countries has become a strategic imperative. Brazil exemplifies distinct models of regional community building within South America and the Amazon region. In South America, Brazil prioritizes consensual power-building, aspiring to establish a “power pole” centered on itself.
Funding: Supported in part by the National Natural Science Foundation of China (No. 62325109, U21B2013).
Abstract: This paper focuses on self-supervised video representation learning. Most existing approaches follow the contrastive learning pipeline, constructing positive and negative pairs by sampling different clips. However, this formulation tends to be biased toward static backgrounds and has difficulty establishing global temporal structures. The main reason is that the positive pairs, i.e., different clips sampled from the same video, have limited temporal receptive fields and usually share similar backgrounds but differ in motion. To address these problems, we propose a framework that jointly uses local clips and global videos to learn both detailed region-level correspondence and general long-term temporal relations. Based on a set of designed controllable augmentations, we achieve accurate appearance and motion pattern alignment through soft spatio-temporal region contrast. Our formulation avoids the low-level redundancy shortcut through an adversarial mutual information minimization objective, improving generalization. Moreover, we introduce local-global temporal order dependency to further bridge the gap between clip-level and video-level representations for robust temporal modeling. Extensive experiments demonstrate that our framework outperforms prior methods on three video benchmarks for action recognition and video retrieval and captures more accurate temporal dynamics.
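For reference, the standard contrastive objective that clip-sampling pipelines optimize is the InfoNCE loss: each anchor's positive is a clip from the same video, and the other clips in the batch serve as negatives. The NumPy sketch below shows only this baseline loss. It is not the paper's soft spatio-temporal region contrast, random vectors stand in for clip embeddings, and the temperature value is an illustrative assumption.

```python
import numpy as np


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: row i of positives is anchor i's positive; all other rows are negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                    # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))         # cross-entropy on the diagonal


rng = np.random.default_rng(0)
clips_a = rng.standard_normal((8, 16))                    # one clip embedding per video
clips_b = clips_a + 0.05 * rng.standard_normal((8, 16))   # another clip of the same videos
print(info_nce(clips_a, clips_b) < info_nce(clips_a, clips_b[::-1]))  # True
```

Because positives here are simply two clips of the same video, anything the clips share, including a static background, lowers the loss; this is the shortcut the abstract identifies as the motivation for region-level and long-term temporal objectives.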