Images taken in dim environments frequently exhibit issues such as insufficient brightness, noise, color shifts, and loss of detail. These problems pose significant challenges to dark image enhancement tasks. Current approaches, while effective in global illumination modeling, often struggle to simultaneously suppress noise and preserve structural details, especially under heterogeneous lighting. Furthermore, misalignment between luminance and color channels introduces additional challenges to accurate enhancement. In response to these difficulties, we introduce a single-stage framework, M2ATNet, built on a multi-scale multi-attention and Transformer architecture. First, to address texture blurring and residual noise, we design a multi-scale multi-attention denoising module (MMAD), which is applied separately to the luminance and color channels to enhance structural and texture modeling capabilities. Second, to solve the non-alignment of the luminance and color channels, we introduce a multi-channel feature fusion Transformer (CFFT) module, which effectively recovers dark details and corrects color shifts through cross-channel alignment and deep feature interaction. To guide the model to learn more stably and efficiently, we also fuse multiple types of loss functions into a hybrid loss term. We extensively evaluate the proposed method on standard datasets, including LOL-v1, LOL-v2, DICM, LIME, and NPE. Evaluations in terms of numerical metrics and visual quality demonstrate that M2ATNet consistently outperforms existing advanced approaches. Ablation studies further confirm the critical roles played by the MMAD and CFFT modules in detail preservation and visual fidelity under challenging illumination-deficient environments.
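The abstract does not specify how the luminance/color split that MMAD and CFFT operate on is computed. As a hedged illustration only, a common concrete choice for such a split is a YCbCr-style transform; the coefficients below are standard ITU-R BT.601 values, not necessarily those used by M2ATNet.

```python
import numpy as np

def split_luma_chroma(rgb):
    """Split an RGB image (H, W, 3) with values in [0, 1] into luminance Y
    and chroma components (Cb, Cr), BT.601-style."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance channel
    cb = 0.5 + (b - y) * 0.564              # blue-difference chroma
    cr = 0.5 + (r - y) * 0.713              # red-difference chroma
    return y, cb, cr

def merge_luma_chroma(y, cb, cr):
    """Exact inverse of split_luma_chroma, back to RGB."""
    r = y + (cr - 0.5) / 0.713
    b = y + (cb - 0.5) / 0.564
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return np.stack([r, g, b], axis=-1)

# Sanity check: the split is lossless, so per-channel processing cannot
# lose information before enhancement even begins.
rgb = np.random.default_rng(0).random((4, 4, 3))
y, cb, cr = split_luma_chroma(rgb)
assert np.allclose(merge_luma_chroma(y, cb, cr), rgb, atol=1e-6)
```

A lossless split like this is what makes "denoise luminance and color separately, then realign" a well-posed pipeline.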
Convolutional neural network (CNN)-based technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities. However, due to their inability to effectively capture global information from images, CNNs can easily lose contours and textures in segmentation results. Notice that the transformer model can effectively capture long-range dependencies in the image and, furthermore, that combining the CNN and the transformer can effectively extract both local details and global contextual features. Motivated by this, we propose a multi-branch and multi-scale attention network (M2ANet) for medical image segmentation, whose architecture consists of three components. Specifically, in the first component, we construct an adaptive multi-branch patch module for parallel extraction of image features to reduce the information loss caused by downsampling. In the second component, we apply residual blocks to the well-known convolutional block attention module to enhance the network's ability to recognize important image features and alleviate gradient vanishing. In the third component, we design a multi-scale feature fusion module, in which we adopt adaptive average pooling and position encoding to enhance contextual features, and then introduce multi-head attention to further enrich the feature representation. Finally, we validate the effectiveness and feasibility of the proposed M2ANet through comparative experiments on four benchmark medical image segmentation datasets, particularly in the context of preserving contours and textures.
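As a rough, hypothetical illustration of the second component (a residual connection wrapped around CBAM-style attention), the numpy sketch below gates a feature map by channel attention, then spatial attention, and adds the identity back. The real M2ANet block is a trained convolutional module; the weights here are arbitrary and the spatial branch uses simple per-pixel pooling instead of a learned convolution.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_residual(x, w1, w2):
    """x: feature map (C, H, W); w1: (C//r, C) and w2: (C, C//r) form the
    shared channel-attention MLP with reduction ratio r."""
    # Channel attention: shared MLP over avg- and max-pooled descriptors.
    avg = x.mean(axis=(1, 2))   # (C,)
    mx = x.max(axis=(1, 2))     # (C,)
    ca = sigmoid(w2 @ np.maximum(w1 @ avg, 0) + w2 @ np.maximum(w1 @ mx, 0))
    x_ca = x * ca[:, None, None]
    # Spatial attention: channel-wise avg + max, gated per pixel
    # (a learned 7x7 conv in CBAM proper; a sum stands in here).
    sa = sigmoid(x_ca.mean(axis=0) + x_ca.max(axis=0))   # (H, W)
    # Residual connection: keeps a clean gradient path, as the abstract notes.
    return x + x_ca * sa[None, :, :]
```

For non-negative inputs the block can only add an attended copy on top of the identity, which is exactly the "enhance important features without blocking gradients" behaviour the component is for.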
The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible image. To meet these requirements, an autoencoder-based method for infrared and visible image fusion is proposed. The encoder, designed according to the optimization objective, consists of a base encoder and a detail encoder, which extract low-frequency and high-frequency information from the image, respectively. Because this extraction may leave some information uncaptured, a compensation encoder is proposed to supplement the missing information. Multi-scale decomposition is also employed to extract image features more comprehensively. The decoder combines the low-frequency, high-frequency, and supplementary information to obtain multi-scale features. Subsequently, an attention strategy and a fusion module are introduced to perform multi-scale fusion for image reconstruction. Experimental results on three datasets show that the fused images generated by this network effectively retain salient targets while being more consistent with human visual perception.
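The base/detail encoder split can be pictured as a low-pass/high-pass decomposition. In this sketch a simple box blur stands in for the learned base encoder and the residual plays the role of the detail branch; the paper's compensation encoder, which recovers whatever the learned split misses, is not modelled here.

```python
import numpy as np

def base_detail_split(img, k=3):
    """img: 2-D float array. Returns (base, detail) where base is a k x k
    box blur (edge-padded) and detail is the residual, so base + detail == img."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    base = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            base[i, j] = padded[i:i + k, j:j + k].mean()   # low-frequency part
    detail = img - base                                     # high-frequency part
    return base, detail
```

The residual formulation guarantees perfect reconstruction by construction; a learned base/detail pair offers no such guarantee, which is precisely why the paper adds a compensation encoder.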
This paper aims to develop a nonrigid registration method for preoperative and intraoperative thoracoabdominal CT images in computer-assisted interventional surgeries, for accurate tumor localization and enhanced tissue visualization. However, fine-structure registration of complex thoracoabdominal organs and large-deformation registration caused by respiratory motion are challenging. To deal with this problem, we propose a 3D multi-scale attention VoxelMorph (MA-VoxelMorph) registration network. To alleviate the large-deformation problem, a multi-scale axial attention mechanism is utilized, using residual dilated pyramid pooling for multi-scale feature extraction and position-aware axial attention to capture long-distance dependencies between pixels. To further improve large-deformation and fine-structure registration results, a multi-scale context channel attention mechanism is employed, utilizing content information via adjacent encoding layers. Our method was evaluated on four public lung datasets (DIR-Lab, Creatis, Learn2Reg, and OASIS) and a local dataset. Results show that the proposed method achieved better registration performance than current state-of-the-art methods, especially in handling the registration of large deformations and fine structures. It also proved fast for 3D image registration, taking about 1.5 s, faster than most methods. Qualitative and quantitative assessments show that the proposed MA-VoxelMorph has the potential to realize precise and fast tumor localization in clinical interventional surgeries.
This paper introduces a novel method for medical image retrieval and classification by integrating a multi-scale encoding mechanism with Vision Transformer (ViT) architectures and a dynamic multi-loss function. The multi-scale encoding significantly enhances the model's ability to capture both fine-grained and global features, while the dynamic loss function adapts during training to optimize classification accuracy and retrieval performance. Our approach was evaluated on the ISIC-2018 and ChestX-ray14 datasets, yielding notable improvements. Specifically, on the ISIC-2018 dataset, our method achieves an F1-Score improvement of +4.84% compared to the standard ViT, with a precision increase of +5.46% for melanoma (MEL). On the ChestX-ray14 dataset, the method delivers an F1-Score improvement of +5.3% over the conventional ViT, with precision gains of +5.0% for pneumonia (PNEU) and +5.4% for fibrosis (FIB). Experimental results demonstrate that our approach outperforms traditional CNN-based models and existing ViT variants, particularly in retrieving relevant medical cases and enhancing diagnostic accuracy. These findings highlight the potential of the proposed method for large-scale medical image analysis, offering improved tools for clinical decision-making through superior classification and case comparison.
This study seeks to establish a novel, semi-automatic system that applies Industry 4.0 principles to effectively classify concrete cubes as acceptable or rejectable with regard to their failure modes, contributing significantly to the dependability of concrete quality evaluations. The study utilizes image processing and machine learning (ML) methods, namely object detection models such as YOLOv8 and Convolutional Neural Networks (CNNs), to evaluate images of concrete cubes. These models are trained and validated on an extensive database of annotated images from real-world and laboratory conditions. Preliminary results indicate good performance in the classification of concrete cube failure modes. The proposed system accurately identifies cracks and determines the severity of structural damage, indicating the potential to minimize the human errors and discrepancies that can occur with current techniques for detecting the failure mode of concrete cubes. The developed system could significantly improve the reliability of concrete cube assessments, reduce resource wastage, and contribute to more sustainable construction practices. By minimizing material costs and errors, this innovation supports the construction industry's move towards sustainability.
The application of image super-resolution (SR) has brought significant assistance to the medical field, aiding doctors in making more precise diagnoses. However, relying solely on a convolutional neural network (CNN) for image SR may lead to issues such as blurry details and excessive smoothness. To address these limitations, we propose an algorithm based on the generative adversarial network (GAN) framework. In the generator network, three different sizes of convolutions connected by a residual dense structure are used to extract detailed features, and an attention mechanism combining dual channel and spatial information is applied to concentrate computing power on crucial areas. In the discriminator network, using InstanceNorm to normalize tensors speeds up the training process while retaining feature information. The experimental results demonstrate that our algorithm achieves a higher peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) than other methods, resulting in improved visual quality.
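For reference, the two metrics quoted here can be computed as follows. The SSIM shown is the simplified single-window (global) form of the formula; SR benchmarks normally use the sliding Gaussian-window variant, so the constants match but the protocol does not.

```python
import math

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB; ref and test are flat lists of pixels."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def ssim_global(x, y, peak=1.0):
    """Single-window SSIM with the usual stabilising constants c1, c2."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((v - mx) ** 2 for v in x) / n
    vy = sum((v - my) ** 2 for v in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))
```

A uniform brightness offset of 0.1 on a unit-range image, for instance, gives an MSE of 0.01 and hence a PSNR of exactly 20 dB.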
In micro milling, tool wear directly affects workpiece quality and accuracy, making effective tool wear monitoring a key factor in ensuring product integrity. Machine vision-based methods can provide an intuitive and efficient representation of tool wear conditions. However, micro milling tools have non-flat flanks, their thin coatings can peel off, and the spindle orientation is uncertain during downtime. These factors result in low pixel values, uneven illumination, and arbitrary tool positions. To address this, we propose an image-based tool wear monitoring method that combines multiple algorithms to restore pixels lost to uneven illumination during segmentation and to accurately extract wear areas. Experimental results demonstrate that the proposed algorithm is highly robust to such images, effectively addressing the effects of illumination and spindle orientation. Additionally, the algorithm has low complexity and fast execution time, and significantly reduces the in-situ detection time.
Deep learning now underpins many state-of-the-art systems for biomedical image and signal processing, enabling automated lesion detection, physiological monitoring, and therapy planning with accuracy that rivals expert performance. This survey reviews the principal model families (convolutional, recurrent, generative, reinforcement, autoencoder, and transfer-learning approaches), emphasising how their architectural choices map to tasks such as segmentation, classification, reconstruction, and anomaly detection. A dedicated treatment of multimodal fusion networks shows how imaging features can be integrated with genomic profiles and clinical records to yield more robust, context-aware predictions. To support clinical adoption, we outline post-hoc explainability techniques (Grad-CAM, SHAP, LIME) and describe emerging intrinsically interpretable designs that expose decision logic to end users. Regulatory guidance from the U.S. FDA, the European Medicines Agency, and the EU AI Act is summarised, linking transparency and lifecycle-monitoring requirements to concrete development practices. Remaining challenges, such as data imbalance, computational cost, privacy constraints, and cross-domain generalization, are discussed alongside promising solutions such as federated learning, uncertainty quantification, and lightweight 3-D architectures. The article therefore offers researchers, clinicians, and policymakers a concise, practice-oriented roadmap for deploying trustworthy deep-learning systems in healthcare.
Structural Health Monitoring (SHM) systems play a key role in managing buildings and infrastructure by delivering vital insights into their strength and structural integrity. More efficient defect detection techniques are needed, as traditional methods are often prone to human error; image processing (IP) is one way this issue is addressed. Beyond IP, Artificial Intelligence (AI) technologies such as Machine Learning (ML) and Deep Learning (DL) make possible the automated, accurate, and real-time detection of structural defects, such as cracks, corrosion, and material degradation, that conventional inspection techniques may miss. This review examines the integration of computer vision and AI techniques in SHM, investigating their effectiveness in detecting various forms of structural deterioration. It also evaluates ML and DL models in SHM for their accuracy in identifying and assessing structural damage, ultimately enhancing safety, durability, and maintenance practices in the field. Key findings reveal that AI-powered approaches, especially those utilizing IP and DL models such as CNNs, significantly improve detection efficiency and accuracy, with high reported accuracies across various SHM tasks. However, significant research gaps remain, including challenges with the consistency, quality, and environmental resilience of image data; a notable lack of standardized models and datasets for training across diverse structures; and concerns regarding computational cost, model interpretability, and seamless integration with existing systems. Future work should focus on developing more robust models through data augmentation, transfer learning, and hybrid approaches, standardizing protocols, and fostering interdisciplinary collaboration to overcome these limitations and achieve more reliable, scalable, and affordable SHM systems.
All-optical image processing has been viewed as a promising technique for its high computation speed and low power consumption. However, current methods are often restricted to few functionalities and low reconfigurability, which cannot meet the growing demand for device integration and scenario adaptation in next-generation vision regimes. Here, we propose and experimentally demonstrate a bilayer liquid crystal computing platform for reconfigurable image processing. Under different in-situ/ex-situ twisted/untwisted conditions of the layers, our approach allows for eight kinds of image processing functions, including one/two-channel bright-field imaging, one/two-channel vortex filtering, horizontally/vertically one-dimensional edge detection, vertex detection, and photonic spin Hall effect-based resolution-adjustable edge detection. A unified theoretical framework for this scheme is established on transfer function theory and coincides well with the experimental results. The proposed method offers an easily switchable multi-functional solution to optical image processing by introducing mechanical degrees of freedom, which may enable emerging applications in computer vision, autonomous driving, and biomedical microscopy.
This paper provides a comprehensive introduction to the mini-Si Tian Real-time Image Processing pipeline (STRIP) and evaluates its operational performance. The STRIP pipeline is specifically designed for real-time alert triggering and light curve generation for transient sources. Applied to both simulated and real observational data of the Mini-Si Tian survey, it successfully identified various types of variable sources, including stellar flares, supernovae, variable stars, and asteroids, while meeting the requirement of completing reduction within 5 minutes. For the real observational data set, the pipeline detected one flare event, 127 variable stars, and 14 asteroids from three monitored sky regions. Additionally, two data sets were generated: a real-bogus training data set comprising 218,818 training samples, and a variable star light curve data set with 421 instances. These data sets will be used to train machine learning algorithms, which are planned for future integration into STRIP.
Breast cancer remains one of the most pressing global health concerns, and early detection plays a crucial role in improving survival rates. Integrating digital mammography with computational techniques and advanced image processing has significantly enhanced the ability to identify abnormalities. However, existing methodologies face persistent challenges, including low image contrast, noise interference, and inaccuracies in segmenting regions of interest. To address these limitations, this study introduces a novel computational framework for analyzing mammographic images, evaluated using the Mammographic Image Analysis Society (MIAS) dataset comprising 322 samples. The proposed methodology follows a structured three-stage approach. Initially, mammographic scans are classified using the Breast Imaging Reporting and Data System (BI-RADS), ensuring systematic and standardized image analysis. Next, the pectoral muscle, which can interfere with accurate segmentation, is removed to refine the region of interest (ROI). The final stage involves an advanced image pre-processing module utilizing Independent Component Analysis (ICA) to enhance contrast, suppress noise, and improve image clarity. Following these enhancements, a robust segmentation technique is employed to delineate abnormal regions. Experimental results validate the efficiency of the proposed framework, demonstrating a significant improvement in the Effective Measure of Enhancement (EME) and a 3 dB increase in Peak Signal-to-Noise Ratio (PSNR), indicating superior image quality. The model also achieves an accuracy of approximately 97%, surpassing contemporary techniques evaluated on the MIAS dataset. Furthermore, its ability to process mammograms across all BI-RADS categories highlights its adaptability and reliability for clinical applications. This study presents an advanced and dependable computational framework for mammographic image analysis, effectively addressing critical challenges in noise reduction, contrast enhancement, and segmentation precision. The proposed approach lays the groundwork for seamless integration into computer-aided diagnostic (CAD) systems, with the potential to significantly enhance early breast cancer detection and contribute to improved patient outcomes.
In order to obtain good welding quality, quality control must be applied, because there are many influencing factors in the laser welding process. The key to realizing welding quality control is obtaining quality information, and abundant weld quality information is contained in the weld pool and keyhole. For Nd:YAG laser welding of stainless steel, a coaxial visual sensing system was constructed and images of the weld pool and keyhole were obtained. Based on the gray-level character of the weld pool and keyhole in these images, an image processing algorithm was designed, and the search start point and search criteria for the weld pool and keyhole edges were determined respectively.
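The edge-search idea above (pick a start point from the gray-level character, then scan until a criterion fires) can be illustrated with a toy routine. The actual start points and criteria in the paper are more elaborate; the brightest-pixel start and fixed intensity threshold here are assumptions for illustration only.

```python
def find_edge_right(gray, thresh):
    """gray: 2-D list of intensities. Start at the brightest pixel (the keyhole
    is typically the most intense region in a coaxial image) and scan rightward
    until intensity drops below thresh; return that (row, col)."""
    r, c = max(((i, j) for i in range(len(gray)) for j in range(len(gray[0]))),
               key=lambda ij: gray[ij[0]][ij[1]])   # search start point
    for j in range(c, len(gray[0])):
        if gray[r][j] < thresh:                     # search criterion
            return r, j
    return r, len(gray[0]) - 1                      # edge not found: image border
```

In practice the same scan would be repeated in several directions and with separate thresholds for the keyhole and the (dimmer) weld pool.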
Using the method of mathematical morphology, this paper performs filtering, segmentation, and extraction of morphological features on satellite cloud images. It also presents the relevant algorithms, which are realized with parallel C programming on Transputer networks. The method has been successfully used to process typhoon and low tornado cloud images, and it will be used in weather forecasting.
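A minimal sketch of the morphological filtering step, assuming binary images and a 3x3 square structuring element (the paper's Transputer-parallel implementation is of course quite different): an opening, i.e. erosion followed by dilation, removes isolated noise pixels while preserving larger cloud regions.

```python
import numpy as np

def dilate(img):
    """Binary dilation with a 3x3 square: max over the 9 shifted copies."""
    p = np.pad(img, 1)   # zero padding outside the image
    h, w = img.shape
    return np.max([p[i:i + h, j:j + w] for i in range(3) for j in range(3)],
                  axis=0)

def erode(img):
    """Binary erosion with a 3x3 square: min over the 9 shifted copies."""
    p = np.pad(img, 1, constant_values=1)   # pad with 1s so borders erode fairly
    h, w = img.shape
    return np.min([p[i:i + h, j:j + w] for i in range(3) for j in range(3)],
                  axis=0)

def opening(img):
    """Opening = erosion then dilation: suppresses features smaller than 3x3."""
    return dilate(erode(img))
```

The shift-and-reduce formulation is also why the operation parallelizes well: each shifted comparison is independent, which suits a Transputer-style network.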
The traditional printing checking method always uses printing control strips, but the results are not satisfactory in repeatability and stability. In this paper, image-based methods for checking printing quality are taken as the research object. Building on traditional methods for checking printing quality, and combining digital image processing theory with printing theory in the new domain of image quality checking, we construct an image-processing-based system for checking printing quality and expound its theoretical design and model. This is an application of machine vision. The system uses a high-resolution industrial CCD (Charge Coupled Device) color camera. It displays real-time photographs on the monitor, feeds the video signal to an image-gathering card, and then transmits the image data through the computer's PCI bus to memory; at the same time, the system carries out processing and data analysis. The method is validated by experiments, mainly concerning the data conversion of images and the ink limit in printing.
Camouflaged Object Detection (COD) aims to identify objects that share highly similar patterns, such as texture, intensity, and color, with their surrounding environment. Because of their intrinsic resemblance to the background, camouflaged objects often exhibit vague boundaries and varying scales, making it challenging to accurately locate targets and delineate their indistinct edges. To address this, we propose a novel camouflaged object detection network called the Edge-Guided and Multi-scale Fusion Network (EGMFNet), which leverages edge-guided multi-scale integration for enhanced performance. The model incorporates two innovative components: a Multi-scale Fusion Module (MSFM) and an Edge-Guided Attention Module (EGA). These designs exploit multi-scale features to uncover subtle cues between candidate objects and the background while emphasizing camouflaged object boundaries. Moreover, recognizing the rich contextual information in fused features, we introduce a Dual-Branch Global Context Module (DGCM) to refine features using extensive global context, thereby generating more informative representations. Experimental results on four benchmark datasets demonstrate that EGMFNet outperforms state-of-the-art methods across five evaluation metrics. Specifically, on COD10K, our EGMFNet-P improves F_(β) by 4.8 points and reduces the mean absolute error (MAE) by 0.006 compared with ZoomNeXt; on NC4K, it achieves a 3.6-point increase in F_(β); and on CAMO and CHAMELEON, it obtains 4.5-point increases in F_(β). These consistent gains substantiate the superiority and robustness of EGMFNet.
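The F_(β) and MAE scores quoted above can be computed from binary prediction/ground-truth masks as below. COD papers conventionally use β² = 0.3 (precision-weighted); the adaptive and threshold-swept variants of F_(β) are omitted here for brevity.

```python
def f_beta(pred, gt, beta2=0.3, eps=1e-8):
    """Weighted F-measure for flat binary lists; beta2 = 0.3 favours precision."""
    tp = sum(p * g for p, g in zip(pred, gt))       # true positives
    precision = tp / (sum(pred) + eps)
    recall = tp / (sum(gt) + eps)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)

def mae(pred, gt):
    """Mean absolute error between prediction and ground-truth masks."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)
```

A 0.006 MAE reduction thus means 0.6% fewer mislabelled pixels on average, which is substantial at COD10K's image resolutions.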
Background: Diabetic macular edema is a prevalent retinal condition and a leading cause of visual impairment among diabetic patients. Early detection of affected areas is beneficial for effective diagnosis and treatment. Traditionally, diagnosis relies on optical coherence tomography imaging interpreted by ophthalmologists. However, this manual image interpretation is often slow and subjective, so developing automated segmentation for macular edema images is essential to improve diagnostic efficiency and accuracy. Methods: To improve clinical diagnostic efficiency and accuracy, we propose a SegNet network structure integrated with a convolutional block attention module (CBAM). This network introduces a multi-scale input module, the CBAM attention mechanism, and skip connections. The multi-scale input module enhances the network's perceptual capabilities, while the lightweight CBAM effectively fuses relevant features across channel and spatial dimensions, allowing for better learning of varying information levels. Results: Experimental results demonstrate that the proposed network achieves an IoU of 80.127% and an accuracy of 99.162%. Compared to traditional segmentation networks, this model has fewer parameters, trains and tests faster, and performs better on semantic segmentation tasks, indicating its high practical applicability. Conclusion: The C-SegNet proposed in this study enables accurate segmentation of diabetic macular edema lesion images, facilitating quicker diagnosis for healthcare professionals.
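The IoU figure reported in the results is the standard intersection-over-union for binary segmentation masks, for example:

```python
def iou(pred, gt, eps=1e-8):
    """Intersection over union for flat binary masks (0/1 values)."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / (union + eps)
```

Note the gap between the two reported numbers: with small lesions on large healthy backgrounds, pixel accuracy (99.162%) is dominated by easy negatives, which is why IoU (80.127%) is the more informative figure for this task.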
Unmanned aerial vehicle (UAV)-borne gamma-ray spectrum surveys play a crucial role in geological mapping, radioactive mineral exploration, and environmental monitoring. However, raw data are often compromised by flight and instrument background noise, as well as detector resolution limitations, which affect the accuracy of geological interpretations. This study explores the application of the Real-ESRGAN algorithm to super-resolution reconstruction of UAV-borne gamma-ray spectrum images, to enhance spatial resolution and the quality of geological feature visualization. We conducted super-resolution reconstruction experiments at 2×, 4×, and 6× magnification using the Real-ESRGAN algorithm, comparing the results with three other mainstream algorithms (SRCNN, SRGAN, and FSRCNN) to verify its superiority in image quality. The experimental results indicate that Real-ESRGAN achieved a structural similarity index (SSIM) value of 0.950 at 2× magnification, significantly higher than the other algorithms, demonstrating its advantage in detail preservation. Furthermore, Real-ESRGAN effectively reduced ringing and overshoot artifacts, enhancing the clarity of geological structures and mineral deposit sites and thus providing high-quality visual information for geological exploration.
With the rapid development of transportation infrastructure, ensuring road safety through timely and accurate highway inspection has become increasingly critical. Traditional manual inspection methods are not only time-consuming and labor-intensive, but also struggle to provide consistent, high-precision detection and real-time monitoring of pavement surface defects. To overcome these limitations, we propose an Automatic Recognition of Pavement Defect (ARPD) algorithm, which leverages unmanned aerial vehicle (UAV)-based aerial imagery to automate the inspection process. The ARPD framework incorporates a backbone network based on the Selective State Space Model (S3M), which is designed to capture long-range temporal dependencies. This enables effective modeling of dynamic correlations among the redundant and often repetitive structures commonly found in road imagery. Furthermore, a neck structure based on Semantics and Detail Infusion (SDI) is introduced to guide cross-scale feature fusion. The SDI module enhances the integration of low-level spatial details with high-level semantic cues, thereby improving feature expressiveness and defect localization accuracy. Experimental evaluations demonstrate that the ARPD algorithm achieves a mean average precision (mAP) of 86.1% on a custom-labeled pavement defect dataset, outperforming the state-of-the-art YOLOv11 segmentation model. The algorithm also maintains strong generalization ability on public datasets. These results confirm that ARPD is well-suited for diverse real-world applications in intelligent, large-scale highway defect monitoring and maintenance planning.
Funding: The M2ATNet study was funded by the National Natural Science Foundation of China, grant numbers 52374156 and 62476005.
Funding: Supported by the Natural Science Foundation of the Anhui Higher Education Institutions of China (Grant Nos. 2023AH040149 and 2024AH051915), the Anhui Provincial Natural Science Foundation (Grant No. 2208085MF168), the Science and Technology Innovation Tackle Plan Project of Maanshan (Grant No. 2024RGZN001), and the Scientific Research Fund Project of Anhui Medical University (Grant No. 2023xkj122).
Abstract: Convolutional neural network (CNN)-based technologies have been widely used in medical image segmentation because of their strong representation and generalization abilities. However, because CNNs cannot effectively capture global information from images, they can easily lose contours and textures in segmentation results. The Transformer model, by contrast, effectively captures long-range dependencies in an image, and combining a CNN with a Transformer can extract both local details and global contextual features. Motivated by this, we propose a multi-branch and multi-scale attention network (M2ANet) for medical image segmentation, whose architecture consists of three components. In the first component, we construct an adaptive multi-branch patch module for parallel extraction of image features, reducing the information loss caused by downsampling. In the second component, we add a residual block to the well-known convolutional block attention module to strengthen the network's ability to recognize important image features and to alleviate gradient vanishing. In the third component, we design a multi-scale feature fusion module that adopts adaptive average pooling and position encoding to enhance contextual features, followed by multi-head attention to further enrich the feature representation. Finally, we validate the effectiveness and feasibility of the proposed M2ANet through comparative experiments on four benchmark medical image segmentation datasets, particularly with respect to preserving contours and textures.
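The convolutional block attention module mentioned above has a channel branch that gates each feature channel by a learned importance score. A dependency-free sketch of that channel-attention idea (squeeze by global average pooling, excite by a tiny two-layer MLP; the weights `w1`/`w2` here are placeholders, not the paper's parameters):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    # feat: (C, H, W) feature map.
    # Squeeze: one summary value per channel via global average pooling.
    squeezed = feat.mean(axis=(1, 2))            # (C,)
    # Excite: a small bottleneck MLP produces one gate per channel.
    hidden = np.maximum(0.0, w1 @ squeezed)      # ReLU, (C // r,)
    gates = sigmoid(w2 @ hidden)                 # (C,) in (0, 1)
    # Reweight each channel by its gate.
    return feat * gates[:, None, None]
```

With all-zero (untrained) weights every gate is sigmoid(0) = 0.5, so the output is simply half the input, which is a handy sanity check.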
Funding: Supported by the Henan Province Key Research and Development Project (231111211300), the Central Government of Henan Province Guides Local Science and Technology Development Funds (Z20231811005), the Henan Province Key Research and Development Project (231111110100), the Henan Provincial Outstanding Foreign Scientist Studio (GZS2024006), and the Henan Provincial Joint Fund for Scientific and Technological Research and Development Plan (Application and Overcoming Technical Barriers) (242103810028).
Abstract: The fusion of infrared and visible images should emphasize the salient targets in the infrared image while preserving the textural details of the visible image. To meet these requirements, an autoencoder-based method for infrared and visible image fusion is proposed. The encoder, designed according to the optimization objective, consists of a base encoder and a detail encoder, which extract low-frequency and high-frequency information from the image, respectively. Because this extraction may leave some information uncaptured, a compensation encoder is proposed to supplement the missing information. Multi-scale decomposition is also employed to extract image features more comprehensively. The decoder combines the low-frequency, high-frequency, and supplementary information to obtain multi-scale features. Subsequently, an attention strategy and a fusion module perform multi-scale fusion for image reconstruction. Experimental results on three datasets show that the fused images generated by this network effectively retain salient targets while agreeing more closely with human visual perception.
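The base/detail split described above can be illustrated in its simplest form: estimate the low-frequency part with a smoothing filter and take the residual as the high-frequency part, so the two parts reconstruct the image exactly. This is only a hand-built stand-in for the paper's learned encoders:

```python
import numpy as np

def box_blur(im, k=3):
    # k x k mean filter with edge padding: a crude stand-in for the
    # base (low-frequency) encoder.
    pad = k // 2
    padded = np.pad(im, pad, mode="edge")
    out = np.zeros_like(im, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + im.shape[0], dx:dx + im.shape[1]]
    return out / (k * k)

def decompose(im):
    # Split an image into low-frequency (base) and high-frequency
    # (detail) parts; by construction base + detail reconstructs im.
    base = box_blur(im.astype(float))
    detail = im - base
    return base, detail
```

A fusion rule can then treat the two parts differently, e.g. averaging bases while taking the max-magnitude detail, before summing them back into one image.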
Funding: Supported in part by the National Natural Science Foundation of China [62301374], the Hubei Provincial Natural Science Foundation of China [2022CFB804], the Hubei Provincial Education Research Project [B2022057], the Youths Science Foundation of Wuhan Institute of Technology [K202240], and the 15th Graduate Education Innovation Fund of Wuhan Institute of Technology [CX2023295].
Abstract: This paper develops a nonrigid registration method for preoperative and intraoperative thoracoabdominal CT images in computer-assisted interventional surgery, aiming at accurate tumor localization and enhanced tissue visualization. However, registering the fine structures of complex thoracoabdominal organs and the large deformations caused by respiratory motion is challenging. To deal with this problem, we propose a 3D multi-scale attention VoxelMorph (MA-VoxelMorph) registration network. To alleviate the large-deformation problem, a multi-scale axial attention mechanism is employed, using residual dilated pyramid pooling for multi-scale feature extraction and position-aware axial attention to capture long-distance dependencies between pixels. To further improve large-deformation and fine-structure registration, a multi-scale context channel attention mechanism exploits content information across adjacent encoding layers. Our method was evaluated on four public lung datasets (DIR-Lab, Creatis, Learn2Reg, and OASIS) and a local dataset. The results show that the proposed method achieves better registration performance than current state-of-the-art methods, especially for large deformations and fine structures. It is also fast, completing 3D image registration in about 1.5 s, quicker than most methods. Qualitative and quantitative assessments indicate that MA-VoxelMorph has the potential to realize precise and fast tumor localization in clinical interventional surgery.
Funding: Funded by the Deanship of Research and Graduate Studies at King Khalid University through small group research under grant number RGP1/278/45.
Abstract: This paper introduces a novel method for medical image retrieval and classification that integrates a multi-scale encoding mechanism with Vision Transformer (ViT) architectures and a dynamic multi-loss function. The multi-scale encoding significantly enhances the model's ability to capture both fine-grained and global features, while the dynamic loss function adapts during training to optimize classification accuracy and retrieval performance. Our approach was evaluated on the ISIC-2018 and ChestX-ray14 datasets, yielding notable improvements. Specifically, on ISIC-2018, our method achieves an F1-score improvement of +4.84% over the standard ViT, with a precision increase of +5.46% for melanoma (MEL). On ChestX-ray14, it delivers an F1-score improvement of 5.3% over the conventional ViT, with precision gains of +5.0% for pneumonia (PNEU) and +5.4% for fibrosis (FIB). Experimental results demonstrate that our approach outperforms traditional CNN-based models and existing ViT variants, particularly in retrieving relevant medical cases and enhancing diagnostic accuracy. These findings highlight the potential of the proposed method for large-scale medical image analysis, offering improved tools for clinical decision-making through superior classification and case comparison.
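The abstract describes a loss that adapts during training to balance classification and retrieval objectives. One common heuristic for such dynamic weighting, shown here purely as an assumed sketch (the paper's actual scheme is not specified in the abstract), is to track a running mean of each task loss and weight each term by its inverse, so no single task dominates:

```python
import numpy as np

class DynamicMultiLoss:
    # Keeps an exponential running mean of each task loss and weights
    # each term by the inverse of that mean. Illustrative heuristic only.
    def __init__(self, n_tasks, momentum=0.9, eps=1e-8):
        self.avg = np.ones(n_tasks)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, losses):
        losses = np.asarray(losses, dtype=float)
        # Update running means, then derive normalized inverse weights.
        self.avg = self.momentum * self.avg + (1 - self.momentum) * losses
        weights = 1.0 / (self.avg + self.eps)
        weights /= weights.sum()
        return float(np.sum(weights * losses)), weights
```

A task whose loss stays large receives a progressively smaller weight, preventing it from swamping the combined gradient.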
Abstract: This study establishes a novel, semi-automatic system that applies Industry 4.0 principles to classify concrete cubes as acceptable or rejectable according to their failure modes, contributing significantly to the dependability of concrete quality evaluations. The study uses image processing and machine learning (ML) methods, namely object detection models such as YOLOv8 and convolutional neural networks (CNNs), to evaluate images of concrete cubes. These models are trained and validated on an extensive database of annotated images from real-world and laboratory conditions. Preliminary results indicate good performance in classifying concrete cube failure modes. The proposed system accurately identifies cracks and determines the severity of structural damage, indicating its potential to minimize the human errors and discrepancies that can occur with current techniques for detecting the failure modes of concrete cubes. The developed system could significantly improve the reliability of concrete cube assessments, reduce resource wastage, and contribute to more sustainable construction practices. By minimizing material costs and errors, this innovation supports the construction industry's move toward sustainability.
Abstract: Image super-resolution (SR) has brought significant assistance to the medical field, helping doctors make more precise diagnoses. However, relying solely on a convolutional neural network (CNN) for image SR may lead to issues such as blurred details and excessive smoothness. To address these limitations, we propose an algorithm based on the generative adversarial network (GAN) framework. In the generator network, convolutions of three different sizes, connected by a residual dense structure, extract detailed features, and an attention mechanism combining channel and spatial information concentrates computing power on crucial areas. In the discriminator network, using InstanceNorm to normalize tensors speeds up training while retaining feature information. Experimental results demonstrate that our algorithm achieves a higher peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) than other methods, resulting in improved visual quality.
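The PSNR metric cited here is a standard fidelity measure; as a reference, it is the log-ratio of the squared peak intensity to the mean squared error between reference and test images:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    # Peak signal-to-noise ratio in dB; higher means the test image
    # is closer to the reference. peak is the maximum possible pixel
    # value (255 for 8-bit images).
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

For 8-bit images, a uniform error of one gray level gives an MSE of 1 and hence a PSNR of about 48.13 dB.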
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52175528).
Abstract: In micro milling, tool wear directly affects workpiece quality and accuracy, making effective tool wear monitoring a key factor in ensuring product integrity. Machine vision-based methods can provide an intuitive and efficient representation of tool wear conditions. However, micro milling tools have non-flat flanks, their thin coatings can peel off, and the spindle orientation is uncertain during downtime. These factors result in low pixel values, uneven illumination, and arbitrary tool positions. To address this, we propose an image-based tool wear monitoring method that combines multiple algorithms to restore pixels lost to uneven illumination during segmentation and to accurately extract wear areas. Experimental results demonstrate that the proposed algorithm is highly robust to such images, effectively handling the effects of illumination and spindle orientation. Additionally, the algorithm has low complexity and fast execution, significantly reducing in-situ detection time.
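Uneven illumination of the kind described above is often handled by estimating the slowly varying illumination field with a large smoothing filter and dividing it out before segmentation. The sketch below is a generic flat-fielding step under assumed parameters, not the paper's multi-algorithm pipeline:

```python
import numpy as np

def correct_illumination(im, k=15):
    # Estimate the illumination field with a large k x k mean filter
    # (edge-padded), then divide it out and renormalize to [0, 1].
    # k is an assumed kernel size, tuned to the image scale.
    pad = k // 2
    padded = np.pad(im.astype(float), pad, mode="edge")
    bg = np.zeros_like(im, dtype=float)
    for dy in range(k):
        for dx in range(k):
            bg += padded[dy:dy + im.shape[0], dx:dx + im.shape[1]]
    bg /= k * k
    flat = im / np.maximum(bg, 1e-6)   # remove the illumination trend
    return flat / flat.max()
```

After flattening, a single global threshold becomes far more reliable than it would be on the raw, unevenly lit image.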
Funding: Supported by the Science Committee of the Ministry of Higher Education and Science of the Republic of Kazakhstan within the framework of grant AP23489899, "Applying Deep Learning and Neuroimaging Methods for Brain Stroke Diagnosis".
Abstract: Deep learning now underpins many state-of-the-art systems for biomedical image and signal processing, enabling automated lesion detection, physiological monitoring, and therapy planning with accuracy that rivals expert performance. This survey reviews the principal model families, including convolutional, recurrent, generative, reinforcement, autoencoder, and transfer-learning approaches, emphasizing how their architectural choices map to tasks such as segmentation, classification, reconstruction, and anomaly detection. A dedicated treatment of multimodal fusion networks shows how imaging features can be integrated with genomic profiles and clinical records to yield more robust, context-aware predictions. To support clinical adoption, we outline post-hoc explainability techniques (Grad-CAM, SHAP, LIME) and describe emerging intrinsically interpretable designs that expose decision logic to end users. Regulatory guidance from the U.S. FDA, the European Medicines Agency, and the EU AI Act is summarized, linking transparency and lifecycle-monitoring requirements to concrete development practices. Remaining challenges, such as data imbalance, computational cost, privacy constraints, and cross-domain generalization, are discussed alongside promising solutions such as federated learning, uncertainty quantification, and lightweight 3D architectures. The article therefore offers researchers, clinicians, and policymakers a concise, practice-oriented roadmap for deploying trustworthy deep-learning systems in healthcare.
Abstract: Structural Health Monitoring (SHM) systems play a key role in managing buildings and infrastructure by delivering vital insights into their strength and structural integrity. More efficient defect-detection techniques are needed, since traditional methods are prone to human error, and image processing (IP) helps address this issue. Beyond IP, Artificial Intelligence (AI) technologies such as Machine Learning (ML) and Deep Learning (DL) enable automated, accurate, real-time detection of structural defects, such as cracks, corrosion, and material degradation, that conventional inspection techniques may miss. This review examines the integration of computer vision and AI techniques in SHM, investigating their effectiveness in detecting various forms of structural deterioration. It also evaluates ML and DL models in SHM for their accuracy in identifying and assessing structural damage, ultimately enhancing safety, durability, and maintenance practices in the field. Key findings reveal that AI-powered approaches, especially those utilizing IP and DL models such as CNNs, significantly improve detection efficiency and accuracy across a range of SHM tasks. However, significant research gaps remain, including challenges with the consistency, quality, and environmental resilience of image data; a notable lack of standardized models and datasets for training across diverse structures; and concerns about computational cost, model interpretability, and seamless integration with existing systems. Future work should focus on developing more robust models through data augmentation, transfer learning, and hybrid approaches, on standardizing protocols, and on fostering interdisciplinary collaboration to overcome these limitations and achieve more reliable, scalable, and affordable SHM systems.
Funding: Supported in part by the National Natural Science Foundation of China (12421005, 12374273, and 61805077), in part by the Natural Science Foundation of Hunan Province (2025JJ50046), and in part by the Hunan Provincial Major Sci-Tech Program (2023ZJ1010).
Abstract: All-optical image processing is viewed as a promising technique for its high computation speed and low power consumption. However, current methods are often restricted to few functionalities and low reconfigurability, which cannot meet the growing demand for device integration and scenario adaptation in next-generation vision regimes. Here, we propose and experimentally demonstrate a bilayer liquid crystal computing platform for reconfigurable image processing. Under different in-situ/ex-situ twisted/untwisted conditions of the layers, our approach provides eight image processing functions: one- and two-channel bright-field imaging, one- and two-channel vortex filtering, horizontally and vertically one-dimensional edge detection, vertex detection, and photonic spin Hall effect-based resolution-adjustable edge detection. A unified theoretical framework for this scheme, established on transfer function theory, agrees well with the experimental results. The proposed method offers an easily switchable, multi-functional solution to optical image processing by introducing mechanical degrees of freedom, which may enable emerging applications in computer vision, autonomous driving, and biomedical microscopy.
Funding: Supported by the Strategic Pioneer Program of the Astronomy Large-Scale Scientific Facility, Chinese Academy of Sciences; the Science and Education Integration Funding of the University of Chinese Academy of Sciences; the National Key Basic R&D Program of China (2023YFA1608303); the Strategic Priority Research Program of the Chinese Academy of Sciences (grant Nos. XDB0550103 and XDB0550000); the National Natural Science Foundation of China (NSFC, grant Nos. 12422303, 12261141690, 12403024, 11988101, and 11933004); the Postdoctoral Fellowship Program of CPSF (grant No. GZB20240731); the Young Data Scientist Project of the National Astronomical Data Center; the China Postdoctoral Science Foundation (No. 2023M743447); and the New Cornerstone Science Foundation through the New Cornerstone Investigator Program and the XPLORER PRIZE.
Abstract: This paper provides a comprehensive introduction to the Mini-SiTian Real-time Image Processing pipeline (STRIP) and evaluates its operational performance. The STRIP pipeline is specifically designed for real-time alert triggering and light curve generation for transient sources. Applied to both simulated and real observational data from the Mini-SiTian survey, the pipeline successfully identified various types of variable sources, including stellar flares, supernovae, variable stars, and asteroids, while meeting the requirement of completing reduction within 5 minutes. For the real observational dataset, the pipeline detected one flare event, 127 variable stars, and 14 asteroids in three monitored sky regions. Additionally, two datasets were generated: a real-bogus training set comprising 218,818 samples and a variable star light curve set with 421 instances. These datasets will be used to train machine learning algorithms, which are planned for future integration into STRIP.
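Transient-detection pipelines of this kind typically rest on difference imaging: subtract a reference frame from the new science frame and flag pixels that deviate by many standard deviations of the residual. The sketch below shows that core idea only; it is not STRIP's actual candidate-detection code, and the threshold `k` is an assumed parameter:

```python
import numpy as np

def detect_transients(science, reference, k=5.0):
    # Difference imaging: subtract the reference frame, then flag
    # pixels deviating by more than k sigma of the residual image.
    diff = science.astype(float) - reference.astype(float)
    sigma = diff.std()
    if sigma == 0:
        return np.zeros_like(diff, dtype=bool)  # frames identical
    return np.abs(diff) > k * sigma
```

Candidates flagged this way would then feed a real-bogus classifier, such as one trained on the 218,818-sample set mentioned in the abstract, to reject artifacts.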
Funding: Funded by the Deanship of Graduate Studies and Scientific Research at Najran University, supporting the research project through the Nama'a program, project code NU/GP/MRC/13/771-4.
Abstract: Breast cancer remains one of the most pressing global health concerns, and early detection plays a crucial role in improving survival rates. Integrating digital mammography with computational techniques and advanced image processing has significantly enhanced the ability to identify abnormalities. However, existing methodologies face persistent challenges, including low image contrast, noise interference, and inaccuracies in segmenting regions of interest. To address these limitations, this study introduces a novel computational framework for analyzing mammographic images, evaluated on the Mammographic Image Analysis Society (MIAS) dataset of 322 samples. The proposed methodology follows a structured three-stage approach. First, mammographic scans are classified using the Breast Imaging Reporting and Data System (BI-RADS), ensuring systematic and standardized image analysis. Next, the pectoral muscle, which can interfere with accurate segmentation, is removed to refine the region of interest (ROI). The final stage is an advanced pre-processing module that uses Independent Component Analysis (ICA) to enhance contrast, suppress noise, and improve image clarity. Following these enhancements, a robust segmentation technique delineates abnormal regions. Experimental results validate the efficiency of the proposed framework, demonstrating a significant improvement in the Effective Measure of Enhancement (EME) and a 3 dB increase in peak signal-to-noise ratio (PSNR), indicating superior image quality. The model also achieves an accuracy of approximately 97%, surpassing contemporary techniques evaluated on the MIAS dataset. Furthermore, its ability to process mammograms across all BI-RADS categories highlights its adaptability and reliability for clinical applications. This study presents an advanced and dependable computational framework for mammographic image analysis, effectively addressing critical challenges in noise reduction, contrast enhancement, and segmentation precision. The proposed approach lays the groundwork for seamless integration into computer-aided diagnosis (CAD) systems, with the potential to significantly enhance early breast cancer detection and improve patient outcomes.
Funding: Project 10776020, supported by the Joint Foundation of the National Natural Science Foundation of China and the China Academy of Engineering Physics.
Abstract: To obtain good welding quality, quality control must be applied, because many factors influence the laser welding process. The key to welding quality control is obtaining quality information, and abundant weld quality information is contained in the weld pool and keyhole. For Nd:YAG laser welding of stainless steel, a coaxial visual sensing system was constructed and images of the weld pool and keyhole were obtained. Based on the gray-level characteristics of the weld pool and keyhole in these images, an image processing algorithm was designed, and the search start points and search criteria for the weld pool and keyhole edges were determined.
Abstract: Using mathematical morphology, this paper performs filtering and segmentation of satellite cloud images and extracts their morphological features. It also presents the corresponding algorithms, implemented in parallel C on Transputer networks. The method has been successfully used to process typhoon and tornado cloud images and will be applied in weather forecasting.
Abstract: Traditional print inspection relies on printing control strips, but its results have poor repeatability and stability. This paper takes image-based methods for checking printing quality as its research object. Building on traditional printing-quality inspection methods and combining digital image processing theory with printing theory in the new domain of image quality inspection, we construct an image-based printing quality inspection system and expound its theoretical design and model. This is an application of machine vision. The system uses a high-resolution industrial color CCD (charge-coupled device) camera; it displays real-time images on a monitor, feeds the video signal to an image acquisition card, and transmits the image data over the computer's PCI bus to memory, where the system carries out processing and data analysis. The method is verified by experiments, mainly concerning image data conversion and the ink-limit representation of printing.
Funding: Financially supported by the Chongqing University of Technology Graduate Innovation Foundation (Grant No. gzlcx20253267).
Abstract: Camouflaged object detection (COD) aims to identify objects that share highly similar patterns, such as texture, intensity, and color, with their surrounding environment. Because of their intrinsic resemblance to the background, camouflaged objects often exhibit vague boundaries and varying scales, making it challenging to accurately locate targets and delineate their indistinct edges. To address this, we propose a novel camouflaged object detection network called the Edge-Guided and Multi-scale Fusion Network (EGMFNet), which leverages edge-guided multi-scale integration for enhanced performance. The model incorporates two innovative components: a Multi-scale Fusion Module (MSFM) and an Edge-Guided Attention module (EGA). These designs exploit multi-scale features to uncover subtle cues between candidate objects and the background while emphasizing camouflaged object boundaries. Moreover, recognizing the rich contextual information in the fused features, we introduce a Dual-Branch Global Context Module (DGCM) that refines features using extensive global context, generating more informative representations. Experimental results on four benchmark datasets demonstrate that EGMFNet outperforms state-of-the-art methods across five evaluation metrics. Specifically, on COD10K, our EGMFNet-P improves F_β by 4.8 points and reduces the mean absolute error (MAE) by 0.006 compared with ZoomNeXt; on NC4K, it achieves a 3.6-point increase in F_β; and on CAMO and CHAMELEON, it obtains 4.5-point increases in F_β. These consistent gains substantiate the superiority and robustness of EGMFNet.
Funding: Supported by the Guangdong Pharmaceutical University 2024 Higher Education Research Projects (GKP202403, GMP202402) and the Guangdong Pharmaceutical University College Students' Innovation and Entrepreneurship Training Programs (Grant Nos. 202504302033, 202504302034, 202504302036, and 202504302244).
Abstract: Background: Diabetic macular edema is a prevalent retinal condition and a leading cause of visual impairment among diabetic patients. Early detection of affected areas benefits effective diagnosis and treatment. Traditionally, diagnosis relies on optical coherence tomography imaging interpreted by ophthalmologists, but this manual interpretation is often slow and subjective. Developing automated segmentation for macular edema images is therefore essential to improve diagnostic efficiency and accuracy. Methods: We propose a SegNet network structure integrated with a convolutional block attention module (CBAM). The network introduces a multi-scale input module, the CBAM attention mechanism, and skip connections. The multi-scale input module enhances the network's perceptual capability, while the lightweight CBAM effectively fuses relevant features across the channel and spatial dimensions, allowing the network to better learn information at varying levels. Results: Experimental results demonstrate that the proposed network achieves an IoU of 80.127% and an accuracy of 99.162%. Compared to traditional segmentation networks, this model has fewer parameters, faster training and testing speed, and superior performance on semantic segmentation tasks, indicating high practical applicability. Conclusion: The proposed C-SegNet enables accurate segmentation of diabetic macular edema lesion images, facilitating quicker diagnosis for healthcare professionals.
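Complementing CBAM's channel branch, its spatial branch pools across channels and gates every spatial location. The sketch below keeps only that pooling-and-gating idea; the original CBAM applies a learned 7x7 convolution to the pooled maps, which is replaced here by a simple sum to stay dependency-free, so this is an assumed simplification rather than the module as published:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat):
    # feat: (C, H, W). Pool across channels with mean and max, combine
    # the two maps, and gate every spatial location with a value in (0, 1).
    avg_map = feat.mean(axis=0)          # (H, W)
    max_map = feat.max(axis=0)           # (H, W)
    gate = sigmoid(avg_map + max_map)    # (H, W)
    return feat * gate[None, :, :]
```

Locations where both the average and maximum channel responses are strong receive gates near 1 and are emphasized; weakly responding locations are suppressed.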
Funding: Supported by the National Natural Science Foundation of China (Nos. 12205044 and 12265003) and the 2024 Jiangxi Province Civil-Military Integration Research Institute "BeiDou+" Project Subtopic (No. 2024JXRH0Y06).
Abstract: Unmanned aerial vehicle (UAV)-borne gamma-ray spectrum surveys play a crucial role in geological mapping, radioactive mineral exploration, and environmental monitoring. However, raw data are often compromised by flight and instrument background noise, as well as by detector resolution limits, which affect the accuracy of geological interpretation. This study explores the application of the Real-ESRGAN algorithm to super-resolution reconstruction of UAV-borne gamma-ray spectrum images, aiming to enhance spatial resolution and the quality of geological feature visualization. We conducted super-resolution reconstruction experiments at 2×, 4×, and 6× magnification using Real-ESRGAN and compared the results with three mainstream algorithms (SRCNN, SRGAN, and FSRCNN) to verify its superior image quality. The experimental results indicate that Real-ESRGAN achieved a structural similarity index (SSIM) of 0.950 at 2× magnification, significantly higher than the other algorithms, demonstrating its advantage in detail preservation. Furthermore, Real-ESRGAN effectively reduced ringing and overshoot artifacts, enhancing the clarity of geological structures and mineral deposit sites and thus providing high-quality visual information for geological exploration.
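The SSIM score cited above combines luminance, contrast, and structure comparisons. The published metric averages local windows; the single-window (global) variant below keeps the sketch short while showing the formula's shape, so it is an illustrative simplification rather than the exact evaluation used here:

```python
import numpy as np

def ssim_global(x, y, peak=255.0):
    # Single-window SSIM over the whole image. c1, c2 are the usual
    # stabilizing constants from the SSIM definition.
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    x, y = x.astype(float), y.astype(float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2)) /
            ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

Identical images score exactly 1.0, and any degradation pulls the score below 1, which is why an SSIM of 0.950 indicates close structural agreement.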
Funding: Supported in part by the Technical Service for the Development and Application of an Intelligent Visual Management Platform for Expressway Construction Progress Based on BIM Technology (Grant No. JKYZLX-2023-09) and in part by the Technical Service for the Development of an Early Warning Model in the Research and Application of Key Technologies for Tunnel Operation Safety Monitoring and Early Warning Based on Digital Twin (Grant No. JK-S02-ZNGS-202412-JISHU-FA-0035), sponsored by Yunnan Transportation Science Research Institute Co., Ltd.
Abstract: With the rapid development of transportation infrastructure, ensuring road safety through timely and accurate highway inspection has become increasingly critical. Traditional manual inspection methods are not only time-consuming and labor-intensive but also struggle to provide consistent, high-precision detection and real-time monitoring of pavement surface defects. To overcome these limitations, we propose an Automatic Recognition of Pavement Defect (ARPD) algorithm, which leverages unmanned aerial vehicle (UAV)-based aerial imagery to automate the inspection process. The ARPD framework incorporates a backbone network based on the Selective State Space Model (S3M), designed to capture long-range temporal dependencies. This enables effective modeling of dynamic correlations among the redundant and often repetitive structures commonly found in road imagery. Furthermore, a neck structure based on Semantics and Detail Infusion (SDI) is introduced to guide cross-scale feature fusion. The SDI module enhances the integration of low-level spatial details with high-level semantic cues, thereby improving feature expressiveness and defect localization accuracy. Experimental evaluations demonstrate that the ARPD algorithm achieves a mean average precision (mAP) of 86.1% on a custom-labeled pavement defect dataset, outperforming the state-of-the-art YOLOv11 segmentation model. The algorithm also maintains strong generalization ability on public datasets. These results confirm that ARPD is well suited for diverse real-world applications in intelligent, large-scale highway defect monitoring and maintenance planning.
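The mAP figure quoted above rests on intersection-over-union (IoU): a detection counts as correct when its IoU with a ground-truth box exceeds a chosen threshold (the abstract does not state which threshold ARPD uses; 0.5 is a common choice). The standard box IoU computation:

```python
def box_iou(a, b):
    # Intersection-over-union of two axis-aligned boxes given as
    # (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Per-class average precision is then the area under the precision-recall curve built from these matches, and mAP is the mean of those values over all defect classes.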