Abstract: To overcome the low efficiency of, and reliance on manual processes in, measuring the geometric parameters of prefabricated bridge components, a method based on deep learning and computer vision is developed to identify these parameters. The study uses a common precast element for highway bridges as its research subject. First, edge feature points of the component cross-section are extracted from images of the precast component cross-sections by combining the Canny operator with mathematical morphology. Next, a deep learning model is developed to identify the geometric parameters of the precast components. It takes the edge coordinates extracted from the images as input and outputs the predefined control parameters of the bridge section. A training dataset is generated by varying the control parameters and noise levels. Finally, field measurements are conducted to validate the accuracy of the method. The results indicate that the method effectively identifies the geometric parameters of precast bridge components, with errors kept within 5%.
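The edge-extraction step pairs the Canny operator with mathematical morphology. The sketch below covers only the morphological half, under stated assumptions: the cross-section has already been segmented into a binary mask, and its outline is taken as the mask minus its 3×3 erosion (the morphological boundary). Canny is not reproduced. The function names are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: morphological boundary extraction (mask minus its erosion).
# Assumes a pre-segmented binary mask; the Canny step is deliberately omitted.

def erode(mask):
    """3x3 binary erosion; pixels outside the image count as background."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # The bounds check runs first, so negative indices never wrap around.
            out[r][c] = int(all(
                0 <= r + dr < rows and 0 <= c + dc < cols and mask[r + dr][c + dc]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            ))
    return out

def boundary_points(mask):
    """Return (row, col) of foreground pixels removed by erosion, i.e. the edge."""
    eroded = erode(mask)
    return [(r, c)
            for r in range(len(mask)) for c in range(len(mask[0]))
            if mask[r][c] and not eroded[r][c]]

# Toy cross-section: a 5x5 solid block inside a 7x7 image.
section = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)] for r in range(7)]
edge = boundary_points(section)
print(len(edge))  # 16 perimeter pixels
```

A coordinate list like `edge` is the kind of input the abstract describes feeding to the parameter-identification network.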
Abstract: Distant supervision can generate a huge amount of training data. Recently, multi-instance multi-label learning has been introduced into distant supervision to combat noisy data and improve relation extraction performance. However, multi-instance multi-label learning uses only hidden variables when inferring the relation between entities, so it cannot make full use of the training data. Moreover, traditional lexical and syntactic features are deficient in reflecting domain knowledge and the global information of a sentence, which limits system performance. This paper presents a novel approach to multi-instance multi-label learning that adopts the idea of fuzzy classification. We use cluster centers as training data, which lets us fully utilize sentence-level features. We also extend the feature set with paragraph vectors, which carry semantic information about sentences. An extensive empirical study verifies our contributions, and the results show that our method outperforms the state-of-the-art distantly supervised baseline.
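The abstract does not say how cluster centers and fuzzy classification are combined. The following is only a hypothetical illustration under two assumptions. First, each entity pair's bag of sentence vectors is summarized by its mean (cluster center). Second, fuzzy membership in each relation prototype uses the standard fuzzy c-means membership formula with fuzzifier `m`. The names `bag_center` and `fuzzy_memberships` are invented for this sketch.

```python
import math

def bag_center(sentence_vectors):
    """Cluster center (mean) of one entity pair's sentence feature vectors."""
    n = len(sentence_vectors)
    return [sum(v[i] for v in sentence_vectors) / n for i in range(len(sentence_vectors[0]))]

def fuzzy_memberships(x, prototypes, m=2.0):
    """Fuzzy c-means style membership of x in each relation prototype (sums to 1)."""
    dists = [math.dist(x, p) for p in prototypes]
    for k, d in enumerate(dists):
        if d == 0.0:  # x coincides with a prototype: crisp membership
            return [1.0 if j == k else 0.0 for j in range(len(prototypes))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]

center = bag_center([[1.0, 0.0], [3.0, 0.0]])
u = fuzzy_memberships(center, [[2.0, 1.0], [2.0, -3.0]])
print(center, [round(v, 3) for v in u])  # [2.0, 0.0] [0.9, 0.1]
```

Soft memberships of this kind let one bag contribute to several relation labels at once, which fits the multi-label setting.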
Funding: Financially supported by the National Science Fund for Distinguished Young Scholars, China (No. 52025041), the National Natural Science Foundation of China (Nos. 52450003, U2341267, and 52174294), the National Postdoctoral Program for Innovative Talents, China (No. BX20240437), and the Fundamental Research Funds for the Central Universities, China (Nos. FRF-IDRY-23-037 and FRF-TP-20-02C2).
Abstract: Rapid advances in computer vision (CV) technology have transformed traditional approaches to material microstructure analysis. This review outlines the history of CV and explores applications of deep-learning (DL)-driven CV in four key areas of materials science: microstructure-based performance prediction, microstructure information generation, microstructure defect detection, and crystal structure-based property prediction. CV has significantly reduced the cost of the traditional experimental methods used in material performance prediction. Moreover, recent progress in generating microstructure images and detecting microstructural defects with CV has increased the efficiency and reliability of material performance assessments. By integrating predictions based on both crystal and microstructural data, DL-driven CV models can accelerate the design of new materials with optimized performance, enabling the discovery and development of next-generation materials. Finally, the review offers insights into rapid interdisciplinary developments in materials science and discusses future prospects.
Abstract: In the competitive retail industry of the digital era, data-driven insights into gender-specific customer behavior are essential. They support the optimization of store performance, layout design, product placement, and targeted marketing. However, existing computer vision solutions often rely on facial recognition to gather such insights, raising significant privacy and ethical concerns. To address these issues, this paper presents a privacy-preserving customer analytics system built on two key strategies. First, we deploy a deep learning framework using YOLOv9s, trained on the RCA-TVGender dataset. Cameras are positioned perpendicular to the observation areas to reduce facial visibility while maintaining accurate gender classification. Second, we apply AES-128 encryption to customer position data, ensuring secure access and regulatory compliance. Overall, the system achieved 81.5% mAP@50, 77.7% precision, and 75.7% recall. Moreover, a 90-min observational study confirmed the system’s ability to generate privacy-protected heatmaps that reveal distinct behavioral patterns between male and female customers. For instance, women spent more time in certain areas and showed interest in different products. These results confirm the system’s effectiveness in enabling personalized layout and marketing strategies without compromising privacy.
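One way to picture the heatmap step is a minimal sketch that assumes the detector emits one `(label, x, y)` floor position per person per frame. Positions are binned into square grid cells, with a separate count grid per predicted gender label. A cell's count multiplied by the frame interval approximates dwell time. The AES-128 encryption stage is not shown. `gender_heatmaps` and its parameters are hypothetical.

```python
import math
from collections import defaultdict

def gender_heatmaps(detections, width, height, cell):
    """Accumulate per-label dwell counts on a grid of `cell`-sized squares.

    detections: iterable of (label, x, y) floor positions, one per frame.
    Returns {label: rows x cols grid of counts}.
    """
    cols, rows = math.ceil(width / cell), math.ceil(height / cell)
    grids = defaultdict(lambda: [[0] * cols for _ in range(rows)])
    for label, x, y in detections:
        if 0 <= x < width and 0 <= y < height:  # ignore out-of-bounds positions
            grids[label][int(y // cell)][int(x // cell)] += 1
    return dict(grids)

detections = [("female", 0.5, 0.5), ("female", 0.6, 0.4), ("male", 3.2, 1.1)]
maps = gender_heatmaps(detections, width=4, height=2, cell=1)
print(maps["female"][0][0], maps["male"][1][3])  # 2 1
```

Keeping one grid per label makes per-gender comparisons, such as the longer dwell times reported for women, a direct cell-by-cell difference.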
Funding: Funded by Woosong University Academic Research 2024.
Abstract: This study investigates the application of Learnable Memory Vision Transformers (LMViT) to detecting metal surface flaws. It compares their performance with traditional CNNs, specifically ResNet18 and ResNet50, and with other transformer-based models, including Token to Token ViT, ViT without memory, and Parallel ViT. Using a widely used steel surface defect dataset, the research applies data augmentation and t-distributed stochastic neighbor embedding (t-SNE) to enhance feature extraction and understanding. These techniques mitigated overfitting, stabilized training, and improved generalization. The LMViT model achieved a test accuracy of 97.22%, significantly outperforming ResNet18 (88.89%) and ResNet50 (88.90%), as well as Token to Token ViT (88.46%), ViT without memory (87.18%), and Parallel ViT (91.03%). LMViT also exhibited superior training and validation performance. It attained a validation accuracy of 98.2%, compared with 91.0% for ResNet18, 96.0% for ResNet50, and 89.12%, 87.51%, and 91.21% for Token to Token ViT, ViT without memory, and Parallel ViT, respectively. The findings highlight LMViT’s ability to capture long-range dependencies in images, an area where CNNs struggle because they rely on local receptive fields and hierarchical feature extraction. The other transformer-based models also capture complex features better than the CNNs. LMViT, however, excels particularly at detecting subtle and complex defects, which is critical for maintaining product quality and operational efficiency in industrial applications. For instance, LMViT successfully identified fine scratches and minor surface irregularities that CNNs often misclassify. This study demonstrates LMViT’s potential for real-world defect detection. It also underscores the promise of other transformer-based architectures, such as Token to Token ViT, ViT without memory, and Parallel ViT, in industrial scenarios where complex spatial relationships are key. Future research may focus on improving LMViT’s computational efficiency for deployment in real-time quality control systems.
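The "learnable memory" idea can be illustrated in simplified form: extra memory tokens are appended to the keys and values an attention layer sees, while outputs are produced only for the original tokens. The sketch below is not the study’s implementation. It assumes single-head attention with identity Q/K/V projections. The memory values, which would be learned parameters in a real model, are fixed numbers here. All names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_with_memory(tokens, memory):
    """Single-head scaled dot-product attention in which input tokens also attend
    to extra memory tokens. Q/K/V projections are identity (a simplification),
    and no outputs are produced for the memory tokens themselves."""
    keys_values = tokens + memory
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, keys_values)) for c in range(d)])
    return out

patches = [[1.0, 0.0], [0.0, 1.0]]
memory = [[2.0, 2.0]]  # would be trainable parameters in an actual LMViT layer
plain = attention_with_memory(patches, [])
with_mem = attention_with_memory(patches, memory)
print(len(with_mem), plain != with_mem)  # 2 True
```

Because the memory outputs are discarded, the sequence length seen by the next layer is unchanged. The memory only shifts what each patch token aggregates.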
Funding: Supported by the Science and Technology Planning Project of Guangzhou City (2024A04J4474).
Abstract: Objective: This study aimed to investigate the prevalence, causes, and influencing factors of vision impairment in the elderly population aged 60 years and above in Mangxin Town, Kashgar region, Xinjiang, China. Mangxin Town lies in a region characterized by intense ultraviolet radiation and arid climatic conditions, which present unique environmental challenges that may exacerbate ocular health issues. Despite the global emphasis on addressing vision impairment among aging populations, updated and region-specific data for Xinjiang remain scarce, necessitating this comprehensive assessment to inform targeted interventions. Methods: A cross-sectional study was conducted from May to June 2024, involving 1,311 elderly participants (76.76% participation rate) out of a total eligible population of 1,708 individuals aged ≥60 years. Participants underwent detailed ocular examinations, including assessments of uncorrected visual acuity (UVA) and best-corrected visual acuity (BCVA) using standard logarithmic charts, slit-lamp biomicroscopy, optical coherence tomography (OCT, Topcon DRI OCT Triton), fundus photography, and intraocular pressure measurement (Canon TX-20 Tonometer). A multidisciplinary team of 10 ophthalmologists and 2 local village doctors, rigorously trained in standardized protocols, ensured consistent data collection. Demographic, lifestyle, and medical history data were collected via questionnaires. Statistical analyses, performed using STATA 16, included multivariate logistic regression to identify risk factors, with significance defined as P<0.05. Results: The overall prevalence of vision impairment was 13.21% (95% CI: 11.37%–15.04%), with low vision at 11.76% (95% CI: 10.01%–13.50%) and blindness at 1.45% (95% CI: 0.80%–2.10%). Cataract was the leading cause, responsible for 68.20% of cases, followed by glaucoma (5.80%), optic atrophy (5.20%), and age-related macular degeneration (2.90%). Prevalence rose significantly with age: 7.74% in the 60–69 age group, 17.79% in 70–79, and 33.72% in those ≥80. Males had a higher prevalence than females (15.84% vs. 10.45%, P=0.004). Multivariate analysis identified age ≥80 years (OR=6.43, 95% CI: 3.79–10.90), male sex (OR=0.53, 95% CI: 0.34–0.83), and daily exercise (OR=0.44, 95% CI: 0.20–0.95) as significant factors. History of eye disease showed a non-significant trend toward increased risk (OR=1.49, P=0.107). Education level, income, and smoking status showed no significant associations. Conclusions: This study underscores cataract as the predominant cause of vision impairment in Mangxin Town’s elderly population, with age and sex as critical determinants. The findings align with global patterns but highlight region-specific challenges, such as environmental factors contributing to cataract prevalence. Public health strategies should prioritize improving access to cataract surgery, strengthening grassroots ophthalmic infrastructure, and integrating portable screening technologies for early detection of fundus diseases. Additionally, health education on UV protection and lifestyle modifications, such as regular exercise, may mitigate risks. Future research should extend to broader regions of Xinjiang, employ advanced diagnostic tools for complex conditions such as glaucoma, and explore longitudinal trends to refine intervention strategies. These efforts are vital to reducing preventable blindness and improving quality of life for aging populations in underserved areas.
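The headline figures can be checked arithmetically. The study does not state how its confidence intervals were computed. The sketch below assumes the normal-approximation (Wald) interval p ± 1.96·√(p(1−p)/n) with n = 1,311. Under that assumption it reproduces the reported intervals to within about 0.01 percentage points, a gap attributable to rounding of p. It also reproduces the 76.76% participation rate (1,311/1,708). `wald_ci` is a name chosen for this sketch.

```python
import math

def wald_ci(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion p observed among n people."""
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

n = 1311
print(f"participation: {n / 1708:.2%}")  # participation: 76.76%
for name, p in [("vision impairment", 0.1321), ("low vision", 0.1176), ("blindness", 0.0145)]:
    lo, hi = wald_ci(p, n)
    print(f"{name}: {p:.2%} ({lo:.2%}-{hi:.2%})")
```

The odds-ratio intervals cannot be rebuilt this way, because they depend on the fitted regression and the underlying counts.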
Funding: Supported by the University of Auckland Faculty Research Development Fund (3716476).
Abstract: Deep learning techniques have recently become the most popular approach for automatically detecting bridge damage in images captured by unmanned aerial vehicles (UAVs). However, three challenges hinder their wider application to real-world scenarios:

① defect scale variance, motion blur, and strong illumination significantly affect the accuracy and reliability of damage detectors;
② commonly used anchor-based damage detectors struggle to generalize to harsh real-world scenarios; and
③ convolutional neural networks (CNNs) lack the capability to model long-range dependencies across the entire image.

This paper presents an efficient Vision Transformer-enhanced anchor-free YOLO (you only look once) method to address these challenges. First, a concrete bridge damage dataset was established and augmented with motion blur and varying brightness. Four key enhancements were then applied to an anchor-based YOLO method:

① four detection heads were introduced to alleviate the multi-scale damage detection issue;
② decoupled heads were employed to resolve the conflict between the classification and bounding box regression tasks inherent in the original coupled head design;
③ an anchor-free mechanism was incorporated to reduce computational complexity and improve generalization to real-world scenarios; and
④ a novel Vision Transformer block, C3MaxViT, was added to enable the CNN to model long-range dependencies.

These enhancements were integrated into the advanced anchor-based YOLOv5l algorithm, and the resulting Vision Transformer-enhanced anchor-free YOLO method was compared against cutting-edge damage detection methods. The experimental results demonstrated the effectiveness of the proposed method. It improved mean average precision at an intersection over union (IoU) threshold of 0.5 (mAP50) by 8.1% and mAP@[0.5:0.05:0.95] by 8.4%. Furthermore, extensive ablation studies revealed that the four detection heads, the decoupled head design, the anchor-free mechanism, and C3MaxViT contributed improvements of 2.4%, 1.2%, 2.6%, and 1.9% in mAP50, respectively.
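Both metrics reported above are built on IoU. The minimal sketch below uses boxes given as `(x1, y1, x2, y2)`, an assumed convention. mAP50 counts a detection as correct when IoU ≥ 0.5. mAP@[0.5:0.05:0.95] averages AP over the ten thresholds 0.50, 0.55, …, 0.95, in the COCO style. The per-threshold AP values are taken as given (for example, from an evaluator), since the full precision-recall computation is beyond this sketch. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# mAP@[0.5:0.05:0.95]: the ten IoU thresholds 0.50, 0.55, ..., 0.95
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

def map_over_thresholds(ap_by_threshold):
    """Average per-threshold AP values into the single mAP@[0.5:0.05:0.95] score."""
    return sum(ap_by_threshold[t] for t in THRESHOLDS) / len(THRESHOLDS)

overlap = iou((0, 0, 10, 10), (2, 0, 12, 10))         # 80 / 120
matched_at = [t for t in THRESHOLDS if overlap >= t]  # true positive only up to 0.65
print(round(overlap, 3), matched_at)                  # 0.667 [0.5, 0.55, 0.6, 0.65]
```

A loosely localized box still counts at 0.5 but fails at the stricter thresholds. That is why mAP@[0.5:0.05:0.95] rewards precise localization more than mAP50 does.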
Funding: Supported by the National Science Fund for Distinguished Young Scholars, China (No. 51625501), the Aeronautical Science Foundation of China (No. 20240046051002), and the National Natural Science Foundation of China (No. 52005028).
Abstract: Real-time, accurate drogue pose measurement during docking is fundamental to Autonomous Aerial Refueling (AAR). Vision measurement is the most practicable technique, but its accuracy and robustness are easily degraded by the limited computing power of airborne equipment, complex aerial scenes, and partial occlusion. To address these challenges, we propose a novel monocular-vision algorithm for drogue keypoint detection and pose measurement, and realize real-time processing on airborne embedded devices. First, a lightweight network is designed with structural re-parameterization to reduce computational cost and improve inference speed. A sub-pixel keypoint prediction head and loss functions are adopted to improve keypoint detection accuracy. Second, a closed-form solution for the drogue pose is computed from two spatial circles and then refined nonlinearly with Levenberg-Marquardt optimization. The method was tested in both virtual and physical simulation experiments. In the virtual simulation, its mean pixel error is 0.787 pixels, significantly lower than that of other methods. In the physical simulation, the mean relative measurement error is 0.788%, and the mean processing time on embedded devices is 13.65 ms.
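The "closed-form solution, then Levenberg-Marquardt refinement" pattern can be illustrated on a simpler 2D analogue. This is not the paper’s double-circle pose solver. The sketch fits a circle to noisy image points: an algebraic (Kåsa) least-squares fit supplies the closed-form initial estimate, and LM then minimizes the geometric residuals |pᵢ − c| − r. All function names and the test data are hypothetical.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, 3):
            f = m[r][c] / m[c][c]
            for k in range(c, 4):
                m[r][k] -= f * m[c][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x

def kasa_fit(pts):
    """Closed-form algebraic circle fit: x^2 + y^2 + D x + E y + F = 0 (least squares)."""
    ata, atb = [[0.0] * 3 for _ in range(3)], [0.0] * 3
    for x, y in pts:
        row, rhs = (x, y, 1.0), -(x * x + y * y)
        for i in range(3):
            atb[i] += row[i] * rhs
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    d, e, f = solve3(ata, atb)
    cx, cy = -d / 2, -e / 2
    return cx, cy, math.sqrt(cx * cx + cy * cy - f)

def geometric_cost(pts, params):
    cx, cy, r = params
    return sum((math.hypot(x - cx, y - cy) - r) ** 2 for x, y in pts)

def lm_refine(pts, params, iters=50, lam=1e-3):
    """Levenberg-Marquardt on geometric residuals |p_i - c| - r."""
    p = list(params)
    for _ in range(iters):
        cx, cy, r = p
        jtj, jtr = [[0.0] * 3 for _ in range(3)], [0.0] * 3
        for x, y in pts:
            dist = math.hypot(x - cx, y - cy)
            res = dist - r
            jac = ((cx - x) / dist, (cy - y) / dist, -1.0)  # d(res)/d(cx, cy, r)
            for i in range(3):
                jtr[i] += jac[i] * res
                for k in range(3):
                    jtj[i][k] += jac[i] * jac[k]
        # Damped normal equations: (J^T J + lam * diag(J^T J)) delta = -J^T r
        a = [[jtj[i][k] + (lam * jtj[i][i] if i == k else 0.0) for k in range(3)] for i in range(3)]
        delta = solve3(a, [-g for g in jtr])
        trial = [p[i] + delta[i] for i in range(3)]
        if geometric_cost(pts, trial) < geometric_cost(pts, p):
            p, lam = trial, lam * 0.1  # accept: lean toward Gauss-Newton
        else:
            lam *= 10.0                # reject: lean toward gradient descent
    return tuple(p)

# Points on a circle (center (2, -1), radius 3) with a small radial perturbation.
angles = [2 * math.pi * k / 36 for k in range(36)]
pts = [(2 + (3 + 0.02 * math.sin(3 * t)) * math.cos(t),
        -1 + (3 + 0.02 * math.sin(3 * t)) * math.sin(t)) for t in angles]
init = kasa_fit(pts)
refined = lm_refine(pts, init)
```

The closed-form stage gives LM a starting point near the optimum. That proximity is what makes a local optimizer reliable enough for real-time use.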
Abstract: China's five-year plans crystallize a governance model that merges long-term strategic vision with adaptive execution. As China prepares to unveil its 15th Five-Year Plan in 2026, policymakers, investors, and scholars around the world are watching closely. For over 70 years, these plans have guided the country's economic and social development.
Abstract: Dear Editor, This letter proposes a novel dynamic vision-enabled intelligent micro-vibration estimation method based on spatiotemporal pattern consistency. Inspired by biological vision, dynamic vision data are collected with an event camera, which can capture the micro-vibration information of mechanical equipment owing to its extremely high temporal sampling frequency.
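The letter's claim rests on the event camera's fine temporal resolution: a vibrating surface modulates the event rate far faster than a conventional frame rate could follow. The sketch below illustrates this with a generic event-rate spectrum approach. It is not the letter's spatiotemporal pattern consistency method. It assumes that vibration shows up as a periodic modulation of event timestamps, and the function name and bin width are hypothetical. Depending on the scene and on whether event polarity is used, the strongest peak may fall at a harmonic of the mechanical frequency.

```python
import numpy as np


def dominant_frequency(timestamps_s, window_s, bin_s=1e-4):
    """Hypothetical estimator: dominant vibration frequency (Hz) from event timestamps.

    Events are binned into an event-rate signal; the strongest non-DC peak of
    its magnitude spectrum is returned.
    """
    n_bins = int(round(window_s / bin_s))
    counts, _ = np.histogram(timestamps_s, bins=n_bins, range=(0.0, window_s))
    signal = counts - counts.mean()          # remove DC so the peak reflects the oscillation
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(n_bins, d=bin_s)
    return freqs[1:][np.argmax(spectrum[1:])]


# Synthetic events: a Poisson stream thinned so its rate oscillates at 120 Hz.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, 200_000)
keep = rng.uniform(0.0, 1.0, t.size) < 0.5 * (1.0 + np.sin(2.0 * np.pi * 120.0 * t))
print(dominant_frequency(t[keep], window_s=1.0))  # close to 120 Hz
```

With 0.1 ms bins, the spectrum extends to 5 kHz. A 30 fps frame camera could only resolve up to 15 Hz, which is the gap the letter points to.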
Abstract: Blindness affected 45 million people globally in 2021, and moderate to severe vision loss affected a further 295 million.[1] The most common causes, cataract and uncorrected refractive error, are generally the easiest to treat, and their treatments are among the most cost-effective procedures in all of medicine and international development.[1-2] Thus, vision impairment is both extremely common and, in principle, readily manageable.
Abstract: Detecting pavement cracks is critical for road safety and infrastructure management. Traditional methods, which rely on manual inspection and basic image processing, are time-consuming and prone to errors. Recent deep learning (DL) methods automate crack detection, but many still struggle with variable crack patterns and environmental conditions. This study addresses these limitations by introducing the Masker Transformer, a novel hybrid deep learning model that integrates the precise localization capabilities of the Mask Region-based Convolutional Neural Network (Mask R-CNN) with the global contextual awareness of the Vision Transformer (ViT). The research focuses on leveraging the strengths of both architectures to enhance segmentation accuracy and adaptability across different pavement conditions. We evaluated the Masker Transformer against other state-of-the-art models, including U-Net, Transformer U-Net (TransUNet), U-Net Transformer (UNETr), Swin U-Net Transformer (Swin-UNETr), You Only Look Once version 8 (YoloV8), and Mask R-CNN, on two benchmark datasets: Crack500 and DeepCrack. The findings show that the Masker Transformer significantly outperforms the existing models, achieving the highest Dice Similarity Coefficient (DSC), precision, recall, and F1-score on both datasets. Specifically, the model attained a DSC of 80.04% on Crack500 and 91.37% on DeepCrack, demonstrating superior segmentation accuracy and reliability. The high precision and recall further substantiate its effectiveness in real-world applications, suggesting that the Masker Transformer can serve as a robust tool for automated pavement crack detection and could potentially replace more traditional methods.
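The reported metrics can all be derived from pixel-level true positives (TP), false positives (FP), and false negatives (FN) between a predicted crack mask and the ground truth. The minimal sketch below assumes binary masks and uses a hypothetical function name. On a single pixel-level mask, DSC = 2TP / (2TP + FP + FN) equals the F1-score. When papers report different values for the two, the difference may come from how scores are averaged across images, which the abstract does not state.

```python
import numpy as np


def segmentation_scores(pred, target):
    """Pixel-level Dice, precision, recall, and F1 for binary crack masks (hypothetical helper)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.logical_and(pred, target).sum()   # crack pixels correctly predicted
    fp = np.logical_and(pred, ~target).sum()  # background predicted as crack
    fn = np.logical_and(~pred, target).sum()  # crack pixels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Convention: two empty masks agree perfectly
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return {"dice": float(dice), "precision": float(precision),
            "recall": float(recall), "f1": float(f1)}


pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
print(segmentation_scores(pred, target))  # every score is 2/3 for this pair
```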
Funding: The National Natural Science Foundation of China (Nos. 52338011, 52378291) and the Young Elite Scientists Sponsorship Program by CAST (No. 2022-2024QNRC0101).
Abstract: To overcome the low efficiency of, and reliance on manual processes in, measuring the geometric parameters of prefabricated bridge components, a method based on deep learning and computer vision is developed to identify these parameters. The study uses a common precast element for highway bridges as its research subject. First, edge feature points of the component cross-section are extracted from images of the precast cross-sections by combining the Canny operator with mathematical morphology. Subsequently, a deep learning model is developed to identify the geometric parameters of the precast components, taking the edge coordinates extracted from the images as input and the predefined control parameters of the bridge section as output. A training dataset is generated by varying the control parameters and noise levels. Finally, field measurements are conducted to validate the accuracy of the developed method. The results indicate that the method effectively identifies the geometric parameters of precast bridge components, with the error kept within 5%.