Blindness affected 45 million people globally in 2021, and moderate to severe vision loss affected a further 295 million.[1] The most common causes, cataract and uncorrected refractive error, are generally the easiest to treat, and their treatments are among the most cost-effective procedures in all of medicine and international development.[1-2] Thus, vision impairment is both extremely common and, in principle, readily manageable.
Detecting pavement cracks is critical for road safety and infrastructure management. Traditional methods rely on manual inspection and basic image processing, which are time-consuming and error-prone. Recent deep-learning (DL) methods automate crack detection, but many still struggle with variable crack patterns and environmental conditions. This study addresses these limitations by introducing the Masker Transformer, a novel hybrid deep-learning model. It integrates the precise localization capabilities of the Mask Region-based Convolutional Neural Network (Mask R-CNN) with the global contextual awareness of the Vision Transformer (ViT). The research leverages the strengths of both architectures to enhance segmentation accuracy and adaptability across different pavement conditions. We evaluated the Masker Transformer against other state-of-the-art models, namely U-Net, Transformer U-Net (TransUNet), U-Net Transformer (UNETR), Swin U-Net Transformer (Swin-UNETR), You Only Look Once version 8 (YOLOv8), and Mask R-CNN, on two benchmark datasets: Crack500 and DeepCrack. The Masker Transformer significantly outperforms the existing models, achieving the highest Dice Similarity Coefficient (DSC), precision, recall, and F1-score on both datasets. Specifically, it attained a DSC of 80.04% on Crack500 and 91.37% on DeepCrack, demonstrating superior segmentation accuracy and reliability. The high precision and recall further substantiate its effectiveness in real-world applications. These results suggest that the Masker Transformer can serve as a robust tool for automated pavement crack detection, potentially replacing more traditional methods.
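The abstract ranks models by DSC, precision, recall, and F1-score but does not define them. A minimal sketch of the standard pixel-level definitions for a single binary crack mask follows. The function name `segmentation_metrics` is hypothetical, and the abstract does not state how scores were aggregated across images.

```python
import numpy as np


def segmentation_metrics(pred: np.ndarray, target: np.ndarray) -> dict:
    """Pixel-level DSC, precision, recall and F1 for one binary mask (1 = crack)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = int(np.logical_and(pred, target).sum())   # crack pixels found
    fp = int(np.logical_and(pred, ~target).sum())  # background marked as crack
    fn = int(np.logical_and(~pred, target).sum())  # crack pixels missed
    eps = 1e-12  # avoids division by zero when a mask is empty
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    dsc = 2 * tp / (2 * tp + fp + fn + eps)
    return {"dsc": dsc, "precision": precision, "recall": recall, "f1": f1}


pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(segmentation_metrics(pred, target))  # TP=2, FP=1, FN=1, so every metric is about 0.667
```

For a single binary mask, pixel-level DSC and F1 are algebraically identical, since both equal 2TP/(2TP+FP+FN). When a paper reports them as different numbers, they were presumably aggregated differently, for example by averaging per-image scores versus pooling pixels.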
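Objective: To explore the ultrasound diagnostic accuracy of a Vision-LSTM-based artificial intelligence (AI) technique for Thyroid Imaging Reporting and Data System Category 4b (TI-RADS 4b) thyroid nodules, and to evaluate the feasibility of using it to assist clinical decision-making. Methods: Ultrasound image data of 401 TI-RADS 4b thyroid nodules were collected at our hospital and used to train and validate the Vision-LSTM model. The AI model's diagnoses were compared with those of junior and senior physicians to evaluate its diagnostic accuracy and stability. Model performance was quantified using the area under the curve (AUC), precision-recall (PR) curves, and other metrics. Results: In independent validation, the Vision-LSTM model's AUC (0.88) and accuracy (89.4%) were significantly higher than those of junior physicians (AUC: 0.624). Its performance was comparable to that of senior physicians (AUC: 0.787), demonstrating its potential for assisted diagnosis. The AI model accurately identified complex features in the ultrasound images and consistently produced stable diagnostic results, showing high accuracy and reliability. Conclusion: AI technology based on the Vision-LSTM model can significantly improve the efficiency and accuracy of diagnosing TI-RADS 4b thyroid nodules, providing effective assistance to physicians and reducing their workload.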
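AUC, the headline metric here, can be read as the probability that a randomly chosen positive case (for example, a malignant nodule) receives a higher model score than a randomly chosen negative one. The sketch below computes AUC directly from that pairwise (Mann-Whitney) definition. The function name and the inputs are hypothetical; they are not study data.

```python
import numpy as np


def auc_pairwise(scores, labels) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative case")
    # Compare every positive score with every negative score via broadcasting.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (pos.size * neg.size))


# Hypothetical model outputs: 1 = malignant, 0 = benign.
print(auc_pairwise([0.9, 0.8, 0.4, 0.3, 0.6], [1, 1, 0, 0, 1]))  # 1.0: every positive outranks every negative
```

The pairwise form is O(P·N) in memory, which is fine for a few hundred nodules. For large datasets a rank-based computation would be preferable.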
Measuring the geometric parameters of prefabricated bridge components is currently inefficient and relies on manual processes. To overcome these limitations, a method based on deep learning and computer vision is developed to identify these parameters. The study uses a common precast element for highway bridges as the research subject. First, edge feature points of the component section are extracted from images of the precast component cross-sections by combining the Canny operator with mathematical morphology. Next, a deep-learning model is developed that takes the extracted edge coordinates as input and outputs the predefined control parameters of the bridge section. A training dataset is generated by varying the control parameters and noise levels. Finally, field measurements are conducted to validate the accuracy of the method. The results indicate that the method effectively identifies the geometric parameters of precast bridge components, with errors kept within 5%.
Adaptive optoelectronic devices have recently become one of the main research directions for developing neuromorphic visual systems. They have attracted extensive attention as a route to optoelectronic transistors with high performance and flexible functionality. Biological adaptive functions help organisms dynamically perceive, filter, and process information in a changing environment. Starting from a description of these functions, this review summarizes the representative strategies for achieving such adaptability in optoelectronic transistors, including adaptation in information detection, adaptive synaptic weight change, and history-dependent plasticity. The key points of each strategy are discussed in detail. Applications of these adaptive optoelectronic transistors are then illustrated individually, including adaptive color detection, signal filtering, extending the response range of light intensity, and improving learning efficiency. Lastly, the challenges in developing adaptive optoelectronic transistors for artificial vision systems are discussed. The description of biological adaptive functions and the corresponding bio-inspired neuromorphic devices is expected to provide insights for the design and application of next-generation artificial visual systems.
Funding: Supported by the Ongoing Research Funding Program (ORFFT-2025-054-1), King Saud University, Riyadh, Saudi Arabia.
Abstract: AIM: To evaluate the efficacy of the total computer vision syndrome questionnaire (CVS-Q) score as a predictive tool for identifying individuals with symptomatic binocular vision anomalies and refractive errors. METHODS: A total of 141 healthy computer users underwent comprehensive clinical visual function assessments. These included evaluations of refractive errors; accommodation (amplitude of accommodation, positive relative accommodation, negative relative accommodation, accommodative accuracy, and accommodative facility); and vergence (phoria, positive and negative fusional vergence, near point of convergence, and vergence facility). Total CVS-Q scores were recorded to explore potential associations between symptom scores and these clinical visual function parameters. RESULTS: The cohort included 54 males (38.3%) with a mean age of 23.9±0.58y and 87 age-matched females (61.7%) with a mean age of 23.9±0.53y. The multiple regression model was statistically significant [R²=0.60, F=13.28, degrees of freedom (DF)=17, 122, P<0.001]. This indicates that 60% of the variance in total CVS-Q scores (reflecting reported symptoms) could be explained by four clinical measurements: amplitude of accommodation, positive relative accommodation, exophoria at distance and near, and positive fusional vergence at near. CONCLUSION: The total CVS-Q score is a valid and reliable tool for predicting the presence of various nonstrabismic binocular vision anomalies and refractive errors in symptomatic computer users.
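For an ordinary least-squares model with k predictors and n observations, the reported R², F, and degrees of freedom are linked by F = (R²/k)/((1 − R²)/(n − k − 1)), with DF = (k, n − k − 1). A minimal numpy sketch of that computation on synthetic data (not the study's data) follows; `ols_r2_f` is a hypothetical helper name.

```python
import numpy as np


def ols_r2_f(X: np.ndarray, y: np.ndarray):
    """Fit y = b0 + X @ b by least squares; return (R^2, F, df_model, df_resid)."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    df_model, df_resid = k, n - k - 1
    f_stat = (r2 / df_model) / ((1.0 - r2) / df_resid)
    return r2, f_stat, df_model, df_resid


# Synthetic example: 141 "participants", 4 predictors, noisy linear response.
rng = np.random.default_rng(0)
X = rng.normal(size=(141, 4))
y = X @ np.array([1.0, 0.5, -0.5, 2.0]) + rng.normal(size=141)
print(ols_r2_f(X, y))
```

With a single predictor, R² reduces to the squared Pearson correlation between that predictor and the outcome. This gives a quick sanity check on the fit.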
Funding: Financially supported by the National Science Fund for Distinguished Young Scholars, China (No. 52025041), the National Natural Science Foundation of China (Nos. 52450003, U2341267, and 52174294), the National Postdoctoral Program for Innovative Talents, China (No. BX20240437), and the Fundamental Research Funds for the Central Universities, China (Nos. FRF-IDRY-23-037 and FRF-TP-20-02C2).
Abstract: Rapid advances in computer vision (CV) technology have transformed traditional approaches to material microstructure analysis. This review outlines the history of CV and explores the applications of deep-learning (DL)-driven CV in four key areas of materials science: microstructure-based performance prediction, microstructure information generation, microstructure defect detection, and crystal structure-based property prediction. CV has significantly reduced the cost of the traditional experimental methods used in material performance prediction. Moreover, recent progress in generating microstructure images and detecting microstructural defects with CV has increased the efficiency and reliability of material performance assessments. By integrating predictions based on both crystal and microstructural data, DL-driven CV models can accelerate the design of new materials with optimized performance, enabling the discovery of next-generation materials. Finally, the review offers insights into the rapid interdisciplinary developments in materials science and its future prospects.
Funding: Supported by the Research and Transformation Application of Capital Clinical Diagnosis and Treatment Technology program of the Beijing Municipal Commission of Science and Technology (No. Z201100005520043).
Abstract: AIM: To investigate the association between functional outcomes and postoperative patient satisfaction 5y after small incision lenticule extraction (SMILE) and femtosecond laser-assisted in situ keratomileusis (FS-LASIK). METHODS: This is a cross-sectional study. The patients underwent basic ophthalmic examinations, axial length measurement, wide-field fundus photography, and accommodation function testing. Data on behavioral habits were collected using a self-administered questionnaire, and visual symptoms were assessed with the Quality of Vision (QoV) questionnaire. Postoperative satisfaction was also recorded. RESULTS: A total of 410 subjects [820 eyes; 160 males (39.02%) and 250 females (60.98%)] who had undergone SMILE or FS-LASIK 5y earlier were enrolled. The mean (standard deviation, SD) age of all patients was 29.83 (6.69)y. The mean (SD) preoperative manifest spherical equivalent (SE) was -5.80 (2.04) diopters (D; range: -0.88 to -13.75). Patient satisfaction at 5y after SMILE or FS-LASIK was 91.70%. Patients were categorized into two groups: a dissatisfied group and a satisfied group. The two groups differed significantly in age (P=0.012), sex (P=0.021), preoperative degree of myopia (P=0.049), postoperative visual symptoms (frequency, P=0.043; severity, P<0.001; bothersomeness, P=0.018), difficulty driving at night (P=0.001), and accommodative amplitude (AMP, P=0.020). Multivariate analysis confirmed that female sex (P=0.024), severity of visual symptoms (P=0.009), and difficulty driving at night (P=0.006) were significantly associated with lower satisfaction. The dissatisfied group was younger and showed higher rates of starbursts, double or multiple images, and high myopia. The frequency, severity, and bothersomeness of distortion decreased with increasing age. CONCLUSION: Patient satisfaction 5y after SMILE and FS-LASIK is high and stable. Difficulty driving at night, sex, and severity of visual symptoms are important factors influencing patient satisfaction. Special attention should be paid to younger, highly myopic female patients, particularly those with starbursts and double or multiple images. It is crucial to monitor postoperative visual outcomes and provide patients with comprehensive preoperative counseling to enhance long-term satisfaction.
Funding: Funded by Woosong University Academic Research 2024.
Abstract: This study investigates the application of Learnable Memory Vision Transformers (LMViT) for detecting metal surface flaws. It compares their performance with traditional CNNs, specifically ResNet18 and ResNet50, and with other transformer-based models including Token to Token ViT, ViT without memory, and Parallel ViT. Using a widely used steel surface defect dataset, the research applies data augmentation and t-distributed stochastic neighbor embedding (t-SNE) to enhance feature extraction and understanding. These techniques mitigated overfitting, stabilized training, and improved generalization. The LMViT model achieved a test accuracy of 97.22%, significantly outperforming ResNet18 (88.89%) and ResNet50 (88.90%), as well as Token to Token ViT (88.46%), ViT without memory (87.18%), and Parallel ViT (91.03%). LMViT also exhibited superior training and validation performance, attaining a validation accuracy of 98.2%. By comparison, ResNet18 reached 91.0% and ResNet50 96.0%, while Token to Token ViT, ViT without memory, and Parallel ViT reached 89.12%, 87.51%, and 91.21%, respectively. The findings highlight LMViT's ability to capture long-range dependencies in images, an area where CNNs struggle because they rely on local receptive fields and hierarchical feature extraction. The other transformer-based models also captured complex features better than CNNs. LMViT excelled particularly at detecting subtle and complex defects, which is critical for maintaining product quality and operational efficiency in industrial applications. For instance, the LMViT model successfully identified fine scratches and minor surface irregularities that CNNs often misclassify. This study demonstrates LMViT's potential for real-world defect detection. It also underscores the promise of other transformer-based architectures such as Token to Token ViT, ViT without memory, and Parallel ViT in industrial scenarios where complex spatial relationships are key. Future research may focus on improving LMViT's computational efficiency for deployment in real-time quality control systems.
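The abstract does not describe how LMViT's memory works internally. One published formulation of learnable-memory ViTs, assumed here, appends a few trainable memory tokens to the sequence so that class and patch tokens can attend to them, while the memory tokens' own outputs are discarded. The single-head numpy sketch below uses hypothetical names and random weights standing in for trained ones.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_with_memory(tokens, memory, w_q, w_k, w_v):
    """Single-head attention where input tokens also attend to learnable memory tokens.

    tokens: (n, d) class + patch embeddings; memory: (m, d) memory embeddings.
    Only the n input positions produce outputs; memory outputs are discarded.
    """
    ctx = np.concatenate([tokens, memory], axis=0)  # (n + m, d)
    q = tokens @ w_q   # queries come only from the input tokens
    k = ctx @ w_k      # keys and values come from inputs and memory
    v = ctx @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


rng = np.random.default_rng(0)
d = 8
tokens = rng.normal(size=(5, d))   # e.g. 1 class token + 4 patch tokens
memory = rng.normal(size=(2, d))   # 2 memory tokens (trained parameters in practice)
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out = attention_with_memory(tokens, memory, w_q, w_k, w_v)
print(out.shape)  # (5, 8): one output per input token, none for memory
```

Only the input tokens issue queries, so the output has the same shape as plain self-attention. A layer like this can therefore replace standard attention without changing the rest of the network; with an empty memory it reduces exactly to ordinary self-attention.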
Funding: Supported by the Science and Technology Planning Project of Guangzhou City (2024A04J4474).
Abstract: Objective: This study aimed to investigate the prevalence, causes, and influencing factors of vision impairment in the elderly population aged 60 years and above in Mangxin Town, Kashgar region, Xinjiang, China. Mangxin Town lies in a region with intense ultraviolet radiation and an arid climate, environmental conditions that may exacerbate ocular health issues. Despite the global emphasis on vision impairment among aging populations, updated, region-specific data for Xinjiang remain scarce, and this comprehensive assessment was undertaken to inform targeted interventions. Methods: A cross-sectional study was conducted from May to June 2024, involving 1,311 elderly participants (76.76% participation rate) out of a total eligible population of 1,708 individuals aged ≥60 years. Participants underwent detailed ocular examinations: uncorrected visual acuity (UVA) and best-corrected visual acuity (BCVA) using standard logarithmic charts, slit-lamp biomicroscopy, optical coherence tomography (OCT, Topcon DRI OCT Triton), fundus photography, and intraocular pressure measurement (Canon TX-20 Tonometer). A multidisciplinary team of 10 ophthalmologists and 2 local village doctors, rigorously trained in standardized protocols, ensured consistent data collection. Demographic, lifestyle, and medical history data were collected via questionnaires. Statistical analyses were performed using STATA 16 and included multivariate logistic regression to identify risk factors, with significance defined as P<0.05. Results: The overall prevalence of vision impairment was 13.21% (95%CI: 11.37%-15.04%), with low vision at 11.76% (95%CI: 10.01%-13.50%) and blindness at 1.45% (95%CI: 0.80%-2.10%). Cataract was the leading cause, responsible for 68.20% of cases, followed by glaucoma (5.80%), optic atrophy (5.20%), and age-related macular degeneration (2.90%). The prevalence of vision impairment increased significantly with age: 7.74% in the 60–69 age group, 17.79% in 70–79, and 33.72% in those ≥80. Males exhibited higher prevalence than females (15.84% vs. 10.45%, P=0.004). Multivariate analysis identified age ≥80 years (OR=6.43, 95%CI: 3.79-10.90), male sex (OR=0.53, 95%CI: 0.34-0.83), and daily exercise (OR=0.44, 95%CI: 0.20-0.95) as significant factors. History of eye disease showed a non-significant trend toward increased risk (OR=1.49, P=0.107). Education level, income, and smoking status showed no significant associations. Conclusions: This study identifies cataract as the predominant cause of vision impairment in Mangxin Town's elderly population, with age and sex as critical determinants. The findings align with global patterns but highlight region-specific challenges, such as environmental factors contributing to cataract prevalence. Public health strategies should prioritize improving access to cataract surgery, strengthening grassroots ophthalmic infrastructure, and integrating portable screening technologies for early detection of fundus diseases. Promoting health education on UV protection and lifestyle changes such as regular exercise may also reduce risk. Future research should extend to broader regions of Xinjiang, employ advanced diagnostic tools for complex conditions like glaucoma, and explore longitudinal trends to refine intervention strategies. These efforts are vital to reducing preventable blindness and improving quality of life for aging populations in underserved areas.
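The reported participation rate and prevalence intervals can be checked by hand. The sketch below assumes a normal-approximation (Wald) interval, which the abstract does not name; under that assumption it reproduces the reported bounds to within about 0.01 percentage points.

```python
import math


def wald_ci(p: float, n: int, z: float = 1.96):
    """Normal-approximation (Wald) confidence interval for a proportion p observed in n people."""
    se = math.sqrt(p * (1.0 - p) / n)
    return p - z * se, p + z * se


n_examined, n_eligible = 1311, 1708
participation = n_examined / n_eligible
overall = wald_ci(0.1321, n_examined)      # any vision impairment
low_vision = wald_ci(0.1176, n_examined)
blindness = wald_ci(0.0145, n_examined)
print(f"participation {participation:.2%}")  # participation 76.76%
for name, (lo, hi) in [("overall", overall), ("low vision", low_vision), ("blindness", blindness)]:
    print(f"{name}: {lo:.2%} to {hi:.2%}")
```

The small residual differences come from the prevalence inputs themselves being rounded to two decimals.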
Funding: Supported by the National Science Fund for Distinguished Young Scholars, China (No. 51625501), the Aeronautical Science Foundation of China (No. 20240046051002), and the National Natural Science Foundation of China (No. 52005028).
Abstract: Real-time, accurate drogue pose measurement during docking is fundamental to Autonomous Aerial Refueling (AAR). Vision-based measurement is the most practicable technique. However, its accuracy and robustness are easily degraded by the limited computing power of airborne equipment, complex aerial scenes, and partial occlusion. To address these challenges, we propose a novel drogue keypoint detection and pose measurement algorithm based on monocular vision that runs in real time on airborne embedded devices. First, a lightweight network is designed with structural re-parameterization to reduce computational cost and improve inference speed. A sub-pixel keypoint prediction head and corresponding loss functions are adopted to improve keypoint detection accuracy. Second, a closed-form solution for the drogue pose is computed from two spatial circles, followed by nonlinear refinement using Levenberg-Marquardt optimization. The method was tested in both virtual and physical simulation experiments. In the virtual simulation, its mean pixel error is 0.787 pixels, significantly lower than that of the compared methods. In the physical simulation, the mean relative measurement error is 0.788%, and the mean processing time on embedded devices is 13.65 ms.
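The Levenberg-Marquardt refinement step can be illustrated on a simpler problem. The sketch below is a hypothetical toy, not the authors' drogue pose cost. It fits a 2D circle (centre `cx`, `cy`, radius `r`) to points by solving the damped normal equations (J^T J + λ·diag(J^T J))δ = −J^T r. It decreases λ after a step that lowers the cost and increases it after a step that does not.

```python
import numpy as np


def residuals(params, pts):
    """Signed distance of each point from the circle (cx, cy, r)."""
    cx, cy, r = params
    return np.hypot(pts[:, 0] - cx, pts[:, 1] - cy) - r


def jacobian(params, pts):
    """Partial derivatives of the residuals w.r.t. (cx, cy, r)."""
    cx, cy, _ = params
    dx, dy = pts[:, 0] - cx, pts[:, 1] - cy
    d = np.hypot(dx, dy)
    return np.column_stack([-dx / d, -dy / d, -np.ones_like(d)])


def levenberg_marquardt(params, pts, lam=1e-3, iters=100, tol=1e-10):
    params = np.asarray(params, dtype=float)
    cost = 0.5 * np.sum(residuals(params, pts) ** 2)
    for _ in range(iters):
        r = residuals(params, pts)
        J = jacobian(params, pts)
        A, g = J.T @ J, J.T @ r
        # Damped normal equations with Marquardt's diagonal scaling
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), -g)
        candidate = params + step
        new_cost = 0.5 * np.sum(residuals(candidate, pts) ** 2)
        if new_cost < cost:
            # Accept the step and move toward Gauss-Newton behaviour
            params, cost = candidate, new_cost
            lam *= 0.1
            if np.linalg.norm(step) < tol:
                break
        else:
            # Reject the step and move toward gradient descent
            lam *= 10.0
    return params


# Synthetic points on a circle centred at (2, -1) with radius 3
theta = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
pts = np.column_stack([2.0 + 3.0 * np.cos(theta), -1.0 + 3.0 * np.sin(theta)])
fitted = levenberg_marquardt([1.0, 0.0, 1.0], pts)
print(fitted)
```

In the described pipeline, the closed-form double-circle solution plays the role of the initial guess. A good initialization is what lets a local optimizer such as LM converge quickly.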
Abstract: China’s five-year plans crystallize a governance model that merges long-term strategic vision with adaptive execution. As China prepares to unveil its 15th Five-Year Plan in 2026, policymakers, investors, and scholars around the world are watching closely. For over 70 years, these plans have guided the country’s economic and social development.
Funding: Supported by the University of Auckland Faculty Research Development Fund (3716476).
Abstract: Deep learning has recently become the most popular approach for automatically detecting bridge damage in images captured by unmanned aerial vehicles (UAVs). However, three challenges hinder its wider application to real-world scenarios: ① defect scale variance, motion blur, and strong illumination significantly affect the accuracy and reliability of damage detectors; ② commonly used anchor-based damage detectors struggle to generalize to harsh real-world scenarios; and ③ convolutional neural networks (CNNs) cannot model long-range dependencies across the entire image. This paper presents an efficient Vision Transformer-enhanced anchor-free YOLO (you only look once) method to address these challenges. First, a concrete bridge damage dataset was established and augmented with motion blur and varying brightness. Four key enhancements were then applied to an anchor-based YOLO method: ① four detection heads were introduced to alleviate the multi-scale damage detection problem; ② decoupled heads were employed to resolve the conflict between the classification and bounding box regression tasks inherent in the original coupled head design; ③ an anchor-free mechanism was incorporated to reduce computational complexity and improve generalization to real-world scenarios; and ④ a novel Vision Transformer block, C3MaxViT, was added to enable the CNN to model long-range dependencies. These enhancements were integrated into the advanced anchor-based YOLOv5l algorithm. The resulting Vision Transformer-enhanced anchor-free YOLO method was then compared against cutting-edge damage detection methods. The experimental results demonstrated its effectiveness. Mean average precision at an intersection over union (IoU) threshold of 0.5 (mAP50) improved by 8.1%, and mAP@[0.5:0.05:0.95] improved by 8.4%. Furthermore, extensive ablation studies revealed that the four detection heads, decoupled head design, anchor-free mechanism, and C3MaxViT contributed improvements of 2.4%, 1.2%, 2.6%, and 1.9% in mAP50, respectively.
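The two reported metrics differ in how strictly a detection must overlap the ground truth. mAP50 counts a box as correct at IoU ≥ 0.5, while mAP@[0.5:0.05:0.95] averages over ten thresholds from 0.50 to 0.95. The sketch below covers only the IoU step, not a full mAP evaluation. It assumes boxes given as `(x1, y1, x2, y2)` and shows at which of those thresholds a single prediction would count as a match. The function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style IoU thresholds 0.50, 0.55, ..., 0.95
THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]


def matched_thresholds(pred, gt):
    """Thresholds at which `pred` would count as a true positive for `gt`."""
    value = iou(pred, gt)
    return [t for t in THRESHOLDS if value >= t]


pred, gt = (0, 0, 10, 10), (2, 0, 12, 10)
print(iou(pred, gt), matched_thresholds(pred, gt))
```

Here the IoU is 80/120 ≈ 0.67, so the box is a hit for mAP50 but only at 4 of the 10 stricter thresholds. This is why mAP@[0.5:0.05:0.95] rewards precise localization.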
Abstract: Blindness affected 45 million people globally in 2021, and moderate to severe vision loss affected a further 295 million.[1] The most common causes, cataract and uncorrected refractive error, are generally the easiest to treat. Their treatments are among the most cost-effective interventions in all of medicine and international development.[1-2] Vision impairment is therefore both extremely common and, in principle, readily manageable.
Abstract: Detecting pavement cracks is critical for road safety and infrastructure management. Traditional methods rely on manual inspection and basic image processing, and they are time-consuming and error-prone. Recent deep learning (DL) methods automate crack detection, but many still struggle with variable crack patterns and environmental conditions. This study addresses these limitations by introducing the Masker Transformer, a novel hybrid deep learning model. It integrates the precise localization capabilities of the Mask Region-based Convolutional Neural Network (Mask R-CNN) with the global contextual awareness of the Vision Transformer (ViT). The research focuses on leveraging the strengths of both architectures to improve segmentation accuracy and adaptability across different pavement conditions. We evaluated the Masker Transformer against other state-of-the-art models on two benchmark datasets, Crack500 and DeepCrack. The comparison models were U-Net, Transformer U-Net (TransUNet), U-Net Transformer (UNETR), Swin U-Net Transformer (Swin-UNETR), You Only Look Once version 8 (YOLOv8), and Mask R-CNN. The Masker Transformer significantly outperformed the existing models, achieving the highest Dice Similarity Coefficient (DSC), precision, recall, and F1-score on both datasets. Specifically, it attained a DSC of 80.04% on Crack500 and 91.37% on DeepCrack, demonstrating superior segmentation accuracy and reliability. Its high precision and recall further support its effectiveness in real-world applications. These results suggest that the Masker Transformer can serve as a robust tool for automated pavement crack detection and could potentially replace more traditional methods.
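The Dice Similarity Coefficient used to rank the models measures overlap between a predicted crack mask A and a ground-truth mask B as DSC = 2|A ∩ B| / (|A| + |B|). The minimal NumPy sketch below computes it for binary masks. The small `eps` smoothing term is an assumption to avoid division by zero on empty masks; the paper's exact formulation is not given.

```python
import numpy as np


def dice(pred, target, eps=1e-8):
    """Dice Similarity Coefficient between two binary masks.

    eps is an illustrative smoothing term so that two empty masks score 1.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()  # |A ∩ B|
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)


# Toy 2x3 crack masks: 2 overlapping pixels, 3 pixels in each mask
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
print(round(float(dice(pred, target)), 4))  # 2*2 / (3+3) ≈ 0.6667
```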
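Abstract: Objective: To investigate the accuracy of artificial intelligence (AI) based on Vision-LSTM in the ultrasound diagnosis of Thyroid Imaging Reporting and Data System category 4b (TI-RADS 4b) thyroid nodules, and to assess its feasibility for supporting clinical decision-making. Methods: Ultrasound image data of 401 TI-RADS 4b thyroid nodules were collected at our hospital and used to train and validate a Vision-LSTM model. The diagnoses made by the AI model were compared with those of junior and senior physicians to evaluate diagnostic accuracy, stability, and related aspects. Model performance was quantified using the area under the curve (AUC), precision-recall (PR) curves, and other metrics. Results: In independent validation, the AUC (0.88) and accuracy (89.4%) of the Vision-LSTM model were significantly higher than those of junior physicians (AUC: 0.624). They reached a level comparable to that of senior physicians (AUC: 0.787), demonstrating the model's potential for assisted diagnosis. The AI model accurately identified complex features in ultrasound images and consistently produced stable diagnostic results, showing high accuracy and reliability. Conclusions: AI based on the Vision-LSTM model can significantly improve the efficiency and accuracy of diagnosing TI-RADS 4b thyroid nodules, providing effective support for physicians and reducing their workload.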
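The AUC values compared above have a direct probabilistic reading: the chance that a randomly chosen malignant case receives a higher score than a randomly chosen benign one. The sketch below computes AUC this way, as the Mann-Whitney statistic with ties counted as one half. It assumes binary labels (1 = malignant, 0 = benign) and real-valued scores. It is a generic illustration, not the study's evaluation code.

```python
def auc_rank(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. O(P*N), which is fine for small illustrative sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: 3 of 4 malignant/benign pairs are ordered correctly
print(auc_rank([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```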
Funding: Supported by the National Natural Science Foundation of China (No. 52338011, 52378291) and the Young Elite Scientists Sponsorship Program by CAST (No. 2022-2024QNRC0101).
Abstract: Measuring the geometric parameters of prefabricated bridge components is currently inefficient and relies on manual processes. To overcome these limitations, a method based on deep learning and computer vision is developed to identify these parameters. The study uses a common precast element for highway bridges as its subject. First, edge feature points of the component section are extracted from images of precast component cross-sections by combining the Canny operator with mathematical morphology. Next, a deep learning model is developed to identify the geometric parameters of the precast components. The model takes the edge coordinates extracted from the images as input and outputs the predefined control parameters of the bridge section. A training dataset is generated by varying the control parameters and noise levels. Finally, field measurements are conducted to validate the accuracy of the method. The results indicate that the method effectively identifies the geometric parameters of precast bridge components, with errors kept within 5%.
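The edge-extraction step produces a list of boundary coordinates that the learning model consumes. As a dependency-free stand-in, the sketch below uses a morphological internal gradient (mask minus its 3×3 erosion) on a pre-thresholded image in place of the Canny operator. The synthetic rectangle, the threshold of 100, and the helper names are all hypothetical. From the resulting `(row, col)` edge points it also reads off the extent of the section in pixels.

```python
import numpy as np


def erode3x3(mask):
    """Binary erosion with a 3x3 square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, 1, constant_values=False)
    out = np.ones_like(mask, dtype=bool)
    for dy in range(3):
        for dx in range(3):
            # A pixel survives only if all 9 neighbours are foreground
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def edge_points(mask):
    """Boundary pixels (mask minus its erosion) as (row, col) coordinates."""
    mask = np.asarray(mask, dtype=bool)
    boundary = mask & ~erode3x3(mask)
    return np.argwhere(boundary)


# Hypothetical cross-section: a bright 10 x 12 rectangle on a dark background
img = np.zeros((20, 20))
img[5:15, 4:16] = 200.0
mask = img > 100  # illustrative global threshold

edges = edge_points(mask)
height = edges[:, 0].max() - edges[:, 0].min() + 1
width = edges[:, 1].max() - edges[:, 1].min() + 1
print(len(edges), height, width)  # 40 boundary pixels, 10 x 12 section
```

Real section images need noise handling, which is presumably why the described method pairs the Canny operator with morphology and trains on data with varied noise levels.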
Funding: Supported by the National Key Research and Development Program of China (2021YFA0717900), the National Natural Science Foundation of China (62471251, 62405144, 62288102, 22275098, and 62174089), the Basic Research Program of Jiangsu (BK20240033, BK20243057), and the Jiangsu Funding Program for Excellent Postdoctoral Talent (2022ZB402).
Abstract: Adaptive optoelectronic devices have recently become a major research direction in the development of neuromorphic visual systems. They have attracted extensive attention as a route to optoelectronic transistors with high performance and flexible functionalities. This review first describes the biological adaptive functions that help organisms dynamically perceive, filter, and process information in varying environments. It then summarizes representative strategies for achieving these adaptive behaviors in optoelectronic transistors, including adaptation in information detection, adaptive synaptic weight change, and history-dependent plasticity. The key points of each strategy are discussed comprehensively. Applications of these adaptive optoelectronic transistors, including adaptive color detection, signal filtering, extending the response range of light intensity, and improving learning efficiency, are also illustrated separately. Finally, the challenges in developing adaptive optoelectronic transistors for artificial vision systems are discussed. This account of biological adaptive functions and the neuromorphic devices they inspire is expected to provide insights for the design and application of next-generation artificial visual systems.
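How adaptation can extend the usable range of light intensity can be shown with a generic toy model of sensory adaptation. This is not a device model from the review. The sketch assumes a Naka-Rushton-style response R = I / (I + S) whose semi-saturation level `S` tracks recent intensity through an exponential moving average with a hypothetical rate `alpha`. A sudden change in brightness produces a strong transient response. Under any steady background, the output then relaxes back to mid-range (0.5) instead of saturating.

```python
def adapted_response(intensities, alpha=0.2):
    """Toy adaptive photoresponse R = I / (I + S).

    S (semi-saturation) follows an exponential moving average of the input,
    so the operating point shifts with the background light level.
    alpha is an illustrative adaptation rate, not a measured device constant.
    """
    s = intensities[0]  # start adapted to the first input level
    out = []
    for i in intensities:
        s = (1.0 - alpha) * s + alpha * i  # adaptation state update
        out.append(i / (i + s))
    return out


# Without adaptation (fixed S = 1), a bright input saturates: 1000/1001 ≈ 1.0.
# With adaptation, a steady bright background settles at mid-range.
steady = adapted_response([1000.0] * 50)
# A step from dim (1) to bright (100) gives a transient, then re-centres.
step = adapted_response([1.0] * 10 + [100.0] * 100)
print(round(steady[-1], 3), round(step[10], 3), round(step[-1], 3))
```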