The COVID-19 pandemic has devastated our daily lives, leaving horrific repercussions in its aftermath. Due to its rapid spread, it was quite difficult for medical personnel to diagnose it in such large numbers. Patients who test positive for COVID-19 are diagnosed via a nasal PCR test. However, polymerase chain reaction (PCR) findings take a few hours to a few days. The PCR test is expensive, although the government may bear the expense in certain places. Furthermore, subsets of the population resist invasive testing such as swabs. Therefore, chest X-rays or Computed Tomography (CT) scans are preferred in most cases; more importantly, they are non-invasive, inexpensive, and provide a faster response time. Recent advances in Artificial Intelligence (AI), in combination with state-of-the-art methods, have allowed for the diagnosis of COVID-19 using chest X-rays. This article proposes a method for classifying COVID-19 as positive or negative on a decentralized dataset based on the federated learning scheme. To build a progressive global COVID-19 classification model, two edge devices are employed to train the model on their respective localized datasets, and a 3-layered custom Convolutional Neural Network (CNN) model, which can be deployed from the server, is used for training. The two edge devices then communicate their learned parameters and weights to the server, which aggregates them and updates the global model. The proposed model is trained using an image dataset available on Kaggle. The Kaggle collection contains more than 13,000 X-ray images, of which 9,000 Normal and COVID-19-positive images are used. Each edge node possesses a different number of images: edge node 1 has 3,200 images, while edge node 2 has 5,800. There is no association between the datasets of the various nodes in the network, so each node has access to a separate image collection with no correlation to the others. The diagnosis of COVID-19 has become considerably more efficient with the proposed algorithm and dataset, and the findings we have obtained are quite encouraging.
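The server-side aggregation described above can be sketched as a dataset-size-weighted average of the clients' parameters (FedAvg-style). The abstract does not state the exact aggregation rule, so the weighting scheme and function names below are illustrative assumptions:

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted FedAvg sketch: average each parameter tensor across
    clients, weighting every client by its local dataset size."""
    total = sum(client_sizes)
    coeffs = [n / total for n in client_sizes]
    aggregated = []
    for tensors in zip(*client_weights):  # iterate layer by layer
        agg = sum(c * t for c, t in zip(coeffs, tensors))
        aggregated.append(agg)
    return aggregated

# Two hypothetical edge nodes, each with a single 2x2 weight matrix,
# weighted by the dataset sizes quoted in the abstract (3200 vs. 5800).
w_node1 = [np.full((2, 2), 1.0)]
w_node2 = [np.full((2, 2), 2.0)]
global_w = federated_average([w_node1, w_node2], client_sizes=[3200, 5800])
```

With these sizes the larger node dominates the average, which is the intended behavior when local datasets are unbalanced.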
Medical image segmentation, i.e., labeling structures of interest in medical images, is crucial for disease diagnosis and treatment in radiology. In reversible data hiding in medical images (RDHMI), segmentation consists of only two regions: the focal and nonfocal regions. The focal region mainly contains information for diagnosis, while the nonfocal region serves as the monochrome background. The traditional segmentation methods currently used in RDHMI are inaccurate for complex medical images, and manual segmentation is time-consuming, poorly reproducible, and operator-dependent. Implementing state-of-the-art deep learning (DL) models would bring key benefits, but the lack of domain-specific labels for existing medical datasets makes this impossible. To address this problem, this study provides labels for existing medical datasets based on a hybrid segmentation approach to facilitate the implementation of DL segmentation models in this domain. First, an initial segmentation based on a 3×3 kernel is performed to analyze identified contour pixels before classifying pixels into focal and nonfocal regions. Then, several human expert raters evaluate and classify the generated labels into accurate and inaccurate labels. The inaccurate labels undergo manual segmentation by medical practitioners and are scored based on a hierarchical voting scheme before being assigned to the proposed dataset. To ensure reliability and integrity in the proposed dataset, we evaluate the accurate automated labels against labels manually segmented by medical practitioners using five assessment metrics: dice coefficient, Jaccard index, precision, recall, and accuracy. The experimental results show that labels in the proposed dataset are consistent with the subjective judgment of human experts, with an average accuracy score of 94% and dice coefficient scores between 90% and 99%. The study further proposes a ResNet-UNet with concatenated spatial and channel squeeze and excitation (scSE) architecture for semantic segmentation to validate and illustrate the usefulness of the proposed dataset. The results demonstrate the superior performance of the proposed architecture in accurately separating the focal and nonfocal regions compared to state-of-the-art architectures. Dataset information is released at the following URL: https://www.kaggle.com/lordamoah/datasets (accessed on 31 March 2025).
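The dice coefficient and Jaccard index used to validate the labels can be computed directly from binary masks. The masks below are toy stand-ins, not samples from the dataset:

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice = 2|A∩B| / (|A| + |B|) for binary masks."""
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum())

def jaccard_index(pred, target):
    """Jaccard (IoU) = |A∩B| / |A∪B| for binary masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union

# Tiny illustrative masks: prediction covers one extra pixel.
pred = np.array([[1, 1], [0, 0]], dtype=bool)
gt   = np.array([[1, 0], [0, 0]], dtype=bool)
d = dice_coefficient(pred, gt)   # 2*1 / (2+1) = 0.667
j = jaccard_index(pred, gt)      # 1 / 2 = 0.5
```

Note that Dice is always at least as large as Jaccard on the same pair of masks, which is worth keeping in mind when comparing the two score ranges reported above.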
The explosive increase in the number of images on the Internet has brought with it the great challenge of how to effectively index, retrieve, and organize these resources. Assigning proper tags to visual content is key to the success of many applications such as image retrieval and content mining. Although recent years have witnessed many advances in image tagging, these methods depend on high-quality, large-scale training data that are expensive to obtain. In this paper, we propose a novel semantic neighbor learning method based on user-contributed social image datasets that can be acquired from the Web's inexhaustible social image content. In contrast to existing image tagging approaches that rely on high-quality image-tag supervision, we acquire weak supervision for our neighbor learning method by progressive neighborhood retrieval from noisy and diverse user-contributed image collections. The retrieved neighbor images are not only visually alike and partially correlated but also semantically related. We offer a step-by-step and easy-to-use implementation of the proposed method. Extensive experimentation on several datasets demonstrates that the proposed method significantly outperforms others.
This paper presents a large-gathering dataset of images extracted from publicly filmed videos by 24 cameras installed on the premises of Masjid Al-Nabvi, Madinah, Saudi Arabia. This dataset consists of raw and processed images reflecting a highly challenging and unconstrained environment. The methodology for building the dataset consists of four core phases: acquisition of videos, extraction of frames, localization of face regions, and cropping and resizing of detected face regions. The raw images in the dataset consist of a total of 4,613 frames obtained from video sequences. The processed images consist of the face regions of 250 persons extracted from the raw images to ensure the authenticity of the presented data. The dataset further contains 8 images for each of the 250 subjects (persons), for a total of 2,000 images. It portrays a highly unconstrained and challenging environment with human faces of varying sizes and pixel quality (resolution). Since the face regions in the video sequences are severely degraded by various unavoidable factors, the dataset can be used as a benchmark to test and evaluate face detection and recognition algorithms for research purposes. We have also gathered and displayed records of the presence of subjects who appear in the presented frames, in a temporal context. This can also be used as a temporal benchmark for tracking, finding persons, activity monitoring, and crowd counting in large-crowd scenarios.
A phase-aware cross-modal framework is presented that synthesizes UWF_FA from non-invasive UWF_RI for diabetic retinopathy (DR) stratification. A curated cohort of 1,198 patients (2,915 UWF_RI and 17,854 UWF_FA images) with strict registration quality supports training across three angiographic phases (initial, mid, final). The generator is based on a modified pix2pixHD with an added Gradient Variance Loss to better preserve microvasculature, and is evaluated using MAE, PSNR, SSIM, and MS-SSIM on held-out pairs. Quantitatively, the mid phase achieves the lowest MAE (98.76±42.67), while SSIM remains high across phases. Expert review shows substantial agreement (Cohen's κ = 0.78–0.82) and Turing-style misclassification of 50%–70% of synthetic images as real, indicating strong perceptual realism. For downstream DR stratification, fusing multi-phase synthetic UWF_FA with UWF_RI in a Swin Transformer classifier yields significant gains over a UWF_RI-only baseline, with the full-phase setting (Set D) reaching AUC = 0.910 and accuracy = 0.829. These results support synthetic UWF_FA as a scalable, non-invasive complement to dye-based angiography that enhances screening accuracy while avoiding injection-related risks.
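Two of the image-quality metrics listed above, MAE and PSNR, can be sketched as follows. The arrays here are toy stand-ins, not UWF images, and the helper names are illustrative:

```python
import numpy as np

def mae(a, b):
    """Mean absolute error between two images."""
    return np.mean(np.abs(a.astype(float) - b.astype(float)))

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

# Toy "real" vs. "synthetic" 8-bit patches with a constant offset of 16.
real  = np.zeros((4, 4), dtype=np.uint8)
synth = np.full((4, 4), 16, dtype=np.uint8)
m = mae(real, synth)   # 16.0
p = psnr(real, synth)  # ~24 dB
```

Casting to float before subtracting matters: uint8 arithmetic would wrap around and silently corrupt both metrics.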
Historically, yarn-dyed plaid fabrics (YDPFs) have enjoyed enduring popularity with many rich plaid patterns, but production data are still classified and searched only according to production parameters. This process does not satisfy the visual needs of sample order production, fabric design, and stock management. This study produced an image dataset for YDPFs, collected from 10,661 fabric samples. The authors believe that the dataset will have significant utility in further research into YDPFs. Convolutional neural networks, such as VGG, ResNet, and DenseNet, with different hyperparameter groups, seemed the most promising tools for the study. This paper reports on the authors' exhaustive evaluation of the YDPF dataset. With an overall accuracy of 88.78%, CNNs proved to be effective in YDPF image classification. This was true even for Windowpane fabrics, whose accuracy was low because they are often mistaken for the Prince of Wales pattern. Image classification of traditional patterns is also improved by utilizing the strip pooling model to extract local detail features along the horizontal and vertical directions. The strip pooling model characterizes the horizontal and vertical crisscross patterns of YDPFs with considerable success. The proposed method using the strip pooling model (SPM) improves classification performance on the YDPF dataset by 2.64% for ResNet18, 3.66% for VGG16, and 3.54% for DenseNet121. The results reveal that the SPM significantly improves YDPF classification accuracy and reduces the error rate on Windowpane patterns as well.
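Strip pooling averages a feature map along one spatial axis at a time, which is what lets it capture the long horizontal and vertical runs of a plaid crisscross. A minimal single-channel sketch follows; the broadcast-add fusion of the two strips is an assumption for illustration, not the paper's exact design:

```python
import numpy as np

def strip_pool(feature_map):
    """Strip pooling sketch on a 2-D (H, W) feature map:
    average along the width to get a horizontal strip (H, 1),
    average along the height to get a vertical strip (1, W),
    then broadcast-add the strips back to full (H, W) resolution."""
    h_strip = feature_map.mean(axis=1, keepdims=True)  # (H, 1)
    v_strip = feature_map.mean(axis=0, keepdims=True)  # (1, W)
    return h_strip + v_strip

fm = np.array([[0.0, 2.0],
               [4.0, 6.0]])
out = strip_pool(fm)  # each cell = its row mean + its column mean
```

Unlike square average pooling, each output cell here depends on an entire row and an entire column, giving the long-range banded context that plaid patterns need.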
In the field of satellite imagery, remote sensing image captioning (RSIC) is a hot topic facing the challenges of overfitting and the difficulty of image-text alignment. To address these issues, this paper proposes a vision-language aligning paradigm for RSIC to jointly represent vision and language. First, a new RSIC dataset, DIOR-Captions, is built by augmenting the object detection in optical remote sensing images (DIOR) dataset with manually annotated Chinese and English contents. Second, a Vision-Language aligning model with Cross-modal Attention (VLCA) is presented to generate accurate and abundant bilingual descriptions for remote sensing images. Third, a cross-modal learning network is introduced to address the problem of visual-lingual alignment. Notably, VLCA is also applied to end-to-end Chinese caption generation using a Chinese pre-trained language model. Experiments with various baselines validate VLCA on the proposed dataset. The results demonstrate that the proposed algorithm produces more descriptive and informative captions than existing algorithms.
Recently, many researchers have tried to develop a robust, fast, and accurate algorithm for eye tracking and pupil-position detection in applications such as head-mounted eye tracking, gaze-based human-computer interaction, medical applications (such as for deaf and diabetes patients), and attention analysis. Many real-world conditions challenge eye appearance, such as illumination, reflections, and occlusions, along with individual differences in eye physiology and other sources of noise, such as contact lenses or make-up. The present work introduces a robust pupil detection algorithm with higher accuracy than previous attempts for real-time analytics applications. The proposed circular Hough transform with morphing Canny edge detection for pupillometry (CHMCEP) algorithm can handle even blurred or noisy images by applying different filtering methods in the pre-processing phase to remove blur and noise, followed by a second filtering step before the circular Hough transform for center fitting to ensure better accuracy. The performance of the proposed CHMCEP algorithm was tested against recent pupil detection methods. Simulations show that the proposed CHMCEP algorithm achieved detection rates of 87.11, 78.54, 58, and 78 on the Świrski, ExCuSe, ElSe, and labeled pupils in the wild (LPW) datasets, respectively. These results show that the proposed approach outperforms other pupil detection methods by a large margin, providing exact and robust pupil positions on challenging ordinary eye images.
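The center-fitting step of a circular Hough transform can be sketched as fixed-radius voting: every edge pixel votes for all candidate centers lying one radius away, and the most-voted accumulator cell wins. This is a generic textbook illustration, not the CHMCEP implementation:

```python
import math
from collections import Counter

def hough_circle_center(edge_points, radius, n_angles=64):
    """Fixed-radius circular Hough voting sketch: each edge point
    votes for candidate centers on a circle of `radius` around it;
    the accumulator cell with the most votes is the fitted center."""
    votes = Counter()
    for x, y in edge_points:
        for k in range(n_angles):
            t = 2 * math.pi * k / n_angles
            cx = round(x - radius * math.cos(t))
            cy = round(y - radius * math.sin(t))
            votes[(cx, cy)] += 1
    return votes.most_common(1)[0][0]

# Synthetic pupil contour: 36 points on a circle of radius 5 at (10, 10).
pts = [(10 + 5 * math.cos(a), 10 + 5 * math.sin(a))
       for a in (2 * math.pi * i / 36 for i in range(36))]
center = hough_circle_center(pts, radius=5)
```

In practice the radius is also unknown, so real detectors run this voting over a range of radii (a 3-D accumulator); fixing it here keeps the sketch short.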
Space object recognition plays an important role in space exploitation and surveillance, and faces two main problems: lack of data and drastic changes in viewpoint. In this article, we first build a three-dimensional (3D) satellite dataset named the BUAA Satellite Image Dataset (BUAA-SID 1.0) to supply data for 3D space object research. Then, based on the dataset, we propose to recognize full-viewpoint 3D space objects using kernel locality preserving projections (KLPP). To obtain a more accurate and separable description of the objects, we first build feature vectors employing moment invariants, Fourier descriptors, region covariance, and histograms of oriented gradients. We then map the features into kernel space and apply dimensionality reduction with KLPP to obtain the submanifold of the features. Finally, k-nearest neighbor (kNN) classification is applied. Experimental results show that the proposed approach is well suited to space object recognition under changing viewpoints. Encouraging recognition rates are obtained on images in BUAA-SID 1.0, with the highest result reaching 95.87%.
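The final kNN classification step can be sketched as a majority vote among the nearest training vectors. The feature vectors and satellite labels below are toy stand-ins for the KLPP-reduced features:

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training vectors under Euclidean distance."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Hypothetical 2-D reduced features for two satellite classes.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = ["sat_A", "sat_A", "sat_B", "sat_B"]
label = knn_predict(X, y, np.array([0.05, 0.1]), k=3)  # "sat_A"
```

With k=3, the query near the first cluster picks up two "sat_A" neighbors and one "sat_B" neighbor, so the vote resolves to "sat_A"; an odd k avoids ties in the two-class case.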
The automatic classification of thyroid nodules in ultrasound images is a critical research focus in medical imaging. However, publicly available thyroid ultrasound datasets remain scarce. In this study, we developed the Ultrasound Dataset for Thyroid Nodules (UD-TN), a comprehensive dataset containing 10,495 labeled images classified as benign or malignant based on pathology-confirmed results. To establish a benchmark, we proposed the Thyroid Ultrasound Image Neural Network (ThyUNet), a deep learning model designed for accurate nodule classification. By incorporating high-resolution feature enhancement, instance normalization, and dilated convolutions into residual blocks, ThyUNet excels at extracting fine-grained features, particularly for small nodules. Experimental results demonstrate that ThyUNet achieves state-of-the-art performance, with an accuracy of 89.7%, a sensitivity of 0.879, and a specificity of 0.910 on the testing set. These results surpass those of other advanced architectures, highlighting the model's effectiveness. UD-TN and ThyUNet contribute significantly to advancing intelligent medical diagnostics. Dataset details and access instructions are available at https://github.com/18811755633/Sample-of-UD-TN.
During emergency evacuation, it is crucial to accurately detect and classify different groups of evacuees based on their behaviours using computer vision. Traditional object detection models trained on standard image databases often fail to recognise individuals in specific groups, such as the elderly, disabled individuals, and pregnant women, who require additional assistance during emergencies. To address this limitation, this study proposes a novel image dataset called the Human Behaviour Detection Dataset (HBDset), specifically collected and annotated for public safety and emergency response purposes. This dataset contains eight categories of human behaviour: normal adult, child, holding a crutch, holding a baby, using a wheelchair, pregnant woman, lugging luggage, and using a mobile phone. The dataset comprises more than 1,500 images collected from various public scenarios, with more than 2,900 bounding box annotations. The images were carefully selected, cleaned, and then manually annotated using the LabelImg tool. To demonstrate the effectiveness of the dataset, classical object detection algorithms were trained and tested on the HBDset, and the average detection accuracy exceeds 90%, highlighting the robustness and universality of the dataset. The developed open HBDset has the potential to enhance public safety, provide early disaster warnings, and prioritise the needs of vulnerable individuals during emergency evacuation.
Images are widely used by companies to advertise their products and promote awareness of their brands. The automatic synthesis of advertising images is challenging because the advertising message must be clearly conveyed while complying with the style required for the product, brand, or target audience. In this study, we propose a data-driven method to capture individual design attributes and the relationships between elements in advertising images, with the aim of automatically synthesizing input elements into an advertising image in a specified style. To achieve this multi-format advertisement design, we created a dataset containing 13,280 advertising images with rich annotations that encompass the outlines and colors of the elements, in addition to the classes and goals of the advertisements. Using our probabilistic models, users guide the style of synthesized advertisements via additional constraints (e.g., context-based keywords). We applied our method to a variety of design tasks, and the results were evaluated in several perceptual studies, which showed that our method improved users' satisfaction by 7.1% compared with designs generated by nonprofessional students, and that more users preferred the coloring of our designs to that produced by the color harmony model and Colormind.
Detection of cracks in concrete structures is critical for their safety and the sustainability of maintenance processes. Traditional inspection techniques are costly, time-consuming, and inefficient in terms of human resources. Deep learning architectures have become more widespread in recent years by accelerating these processes and increasing their efficiency. Deep learning models (DLMs) stand out as an effective solution for crack detection due to features such as end-to-end learning capability, model adaptation, and automatic learning processes. However, achieving an optimal balance between model performance and computational efficiency in DLMs remains a vital research topic. In this article, three different methods are proposed for detecting cracks in concrete structures. In the first method, a Separable Convolutional with Attention and Multi-layer Enhanced Fusion Network (SCAMEFNet) deep neural network, which has a deep architecture and can balance model depth against parameter count, has been developed. This model was designed using a convolutional neural network, multi-head attention, and various fusion techniques. The second method proposes a modified vision transformer (ViT) model. The third method proposes a two-stage ensemble learning model, the deep feature-based two-stage ensemble model (DFTSEM), which uses deep features and machine learning methods. The proposed approaches are evaluated using the Concrete Cracks Image Dataset, which the authors collected and which contains concrete cracks on building surfaces. The results show that the SCAMEFNet model achieved an accuracy rate of 98.83%, the ViT model 97.33%, and the DFTSEM model 99.00%. These findings show that the proposed techniques successfully detect surface cracks and deformations and can provide practical solutions to real-world problems. In addition, the developed methods can contribute as a tool for BIM platforms in smart cities for building health.
Offline handwritten formula recognition is a challenging task due to the variety of handwritten symbols and two-dimensional formula structures. Recently, deep neural network recognizers based on the encoder-decoder framework have achieved great improvements on this task. However, unsatisfactory recognition performance on formulas with long LaTeX strings is one shortcoming of the existing work. Moreover, the lack of sufficient training data also limits the capability of these recognizers. In this paper, we design a multimodal dependence attention (MDA) module to help the model learn visual and semantic dependencies among symbols in the same formula, improving recognition performance on formulas with long LaTeX strings. To alleviate overfitting and further improve recognition performance, we also propose a new dataset, the Handwritten Formula Image Dataset (HFID), which contains 25,620 handwritten formula images collected from real life. We conduct extensive experiments to demonstrate the effectiveness of our proposed MDA module and HFID dataset, achieving state-of-the-art performance with 63.79% and 65.24% expression accuracy on CROHME 2014 and CROHME 2016, respectively.
The recent development of imaging and sequencing technologies enables systematic advances in the clinical study of lung cancer. Meanwhile, the human mind is limited in effectively handling and fully utilizing such enormous accumulations of data. Machine learning-based approaches play a critical role in integrating and analyzing these large and complex datasets, which have extensively characterized lung cancer from the different perspectives offered by these accrued data. In this review, we provide an overview of machine learning-based approaches that strengthen various aspects of lung cancer diagnosis and therapy, including early detection, auxiliary diagnosis, prognosis prediction, and immunotherapy practice. Moreover, we highlight the challenges and opportunities for future applications of machine learning in lung cancer.
Funding: Supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R66), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62072250, 61772281, 61702235, U1636117, U1804263, 62172435, 61872203 and 61802212), the Zhongyuan Science and Technology Innovation Leading Talent Project of China (Grant No. 214200510019), the Suqian Municipal Science and Technology Plan Project in 2020 (S202015), the Plan for Scientific Talent of Henan Province (Grant No. 2018JR0018), the Opening Project of Guangdong Provincial Key Laboratory of Information Security Technology (Grant No. 2020B1212060078), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) Fund.
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 61502094 and 61402099) and the Natural Science Foundation of Heilongjiang Province of China (Nos. F2016002 and F2015020).
Funding: This research was supported by the Deanship of Scientific Research, Islamic University of Madinah, Madinah (KSA), under the Tammayuz program, Grant Number 1442/505.
Abstract: This paper presents a large gathering dataset of images extracted from publicly filmed videos by 24 cameras installed on the premises of Masjid Al-Nabvi, Madinah, Saudi Arabia. The dataset consists of raw and processed images reflecting a highly challenging and unconstrained environment. The methodology for building the dataset consists of four core phases: acquisition of videos, extraction of frames, localization of face regions, and cropping and resizing of detected face regions. The raw images in the dataset consist of a total of 4613 frames obtained from video sequences. The processed images consist of the face regions of 250 persons extracted from the raw images to ensure the authenticity of the presented data. The dataset further contains 8 images for each of the 250 subjects (persons), for a total of 2000 images. It portrays a highly unconstrained and challenging environment with human faces of varying sizes and pixel quality (resolution). Since the face regions in the video sequences are severely degraded by various unavoidable factors, the dataset can be used as a benchmark to test and evaluate face detection and recognition algorithms for research purposes. We have also gathered and displayed records of the presence of subjects appearing in the presented frames in a temporal context. These can also serve as a temporal benchmark for tracking, person finding, activity monitoring, and crowd counting in large-crowd scenarios.
Funding: Funded by the Deanship of Research and Graduate Studies at King Khalid University through a Large Research Project under grant number RGP2/417/46.
Abstract: A phase-aware cross-modal framework is presented that synthesizes UWF_FA from non-invasive UWF_RI for diabetic retinopathy (DR) stratification. A curated cohort of 1198 patients (2915 UWF_RI and 17,854 UWF_FA images) with strict registration quality supports training across three angiographic phases (initial, mid, final). The generator is based on a modified pix2pixHD with an added Gradient Variance Loss to better preserve microvasculature, and is evaluated using MAE, PSNR, SSIM, and MS-SSIM on held-out pairs. Quantitatively, the mid phase achieves the lowest MAE (98.76±42.67), while SSIM remains high across phases. Expert review shows substantial agreement (Cohen's κ=0.78–0.82) and Turing-style misclassification of 50%–70% of synthetic images as real, indicating strong perceptual realism. For downstream DR stratification, fusing multi-phase synthetic UWF_FA with UWF_RI in a Swin Transformer classifier yields significant gains over a UWF_RI-only baseline, with the full-phase setting (Set D) reaching AUC=0.910 and accuracy=0.829. These results support synthetic UWF_FA as a scalable, non-invasive complement to dye-based angiography that enhances screening accuracy while avoiding injection-related risks.
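Two of the pixel-level fidelity metrics named above, MAE and PSNR, can be sketched for a pair of grey-scale images as below. The uniform-error toy images are illustrative only; SSIM and MS-SSIM require windowed local statistics and are omitted here:

```python
import numpy as np

def mae(a, b):
    """Mean absolute error between two images of identical shape."""
    return np.mean(np.abs(a.astype(float) - b.astype(float)))

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy pair: a "real" frame and a "synthetic" frame off by 5 grey levels
real = np.full((8, 8), 120.0)
synth = real + 5.0

err = mae(real, synth)       # mean absolute error in grey levels
quality = psnr(real, synth)  # fidelity in dB
```

MAE is reported in the same grey-level units as the images (hence values like 98.76 on 8-bit angiograms), while PSNR compresses the squared error onto a logarithmic scale.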
Funding: This work was supported by the China Social Science Foundation under Grant [17CG209]. The fabric samples were provided by Jiangsu Sunshine Group and Jiangsu Lianfa Textile Group.
Abstract: Historically, yarn-dyed plaid fabrics (YDPFs) have enjoyed enduring popularity, with many rich plaid patterns, but production data are still classified and searched only according to production parameters. This process does not satisfy the visual needs of sample order production, fabric design, and stock management. This study produced an image dataset for YDPFs, collected from 10,661 fabric samples; the authors believe the dataset will have significant utility in further research into YDPFs. Convolutional neural networks, such as VGG, ResNet, and DenseNet, with different hyperparameter groups, seemed the most promising tools for the study. This paper reports on the authors' exhaustive evaluation of the YDPF dataset. With an overall accuracy of 88.78%, CNNs proved effective in YDPF image classification, despite low accuracy on Windowpane fabrics, whose predictions often mistakenly include the Prince of Wales pattern. Image classification of traditional patterns is further improved by utilizing the strip pooling model to extract local detail features in the horizontal and vertical directions. The strip pooling model characterizes the horizontal and vertical crisscross patterns of YDPFs with considerable success. The proposed method using the strip pooling model (SPM) improves classification performance on the YDPF dataset by 2.64% for ResNet18, 3.66% for VGG16, and 3.54% for DenseNet121. The results reveal that the SPM significantly improves YDPF classification accuracy and reduces the error rate on Windowpane patterns as well.
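The core pooling step behind strip pooling, averaging over whole rows and whole columns and broadcasting the results back over the feature map, can be sketched in NumPy as below. This is a minimal illustration of why strips respond strongly to crisscross patterns; the published SPM additionally involves convolutions and gating, which are omitted here:

```python
import numpy as np

def strip_pooling(x):
    """Minimal strip pooling on a single-channel feature map x of shape (H, W).

    Horizontal strips average each row and vertical strips average each
    column; the two pooled maps are broadcast back to (H, W) and summed,
    so long horizontal/vertical structures dominate the response.
    """
    h_pool = x.mean(axis=1, keepdims=True)  # (H, 1): one value per row
    v_pool = x.mean(axis=0, keepdims=True)  # (1, W): one value per column
    return h_pool + v_pool                  # broadcast back to (H, W)

# A vertical stripe in column 2 of a toy 4x4 activation map
fmap = np.array([[0., 0., 4., 0.],
                 [0., 0., 4., 0.],
                 [0., 0., 4., 0.],
                 [0., 0., 4., 0.]])
out = strip_pooling(fmap)
```

In the output, the stripe column carries a much larger response than the background, whereas a square pooling window of the same area would dilute it; this is the intuition behind using strips for plaid patterns.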
Funding: Supported by the National Natural Science Foundation of China (61702528, 61806212).
Abstract: In the field of satellite imagery, remote sensing image captioning (RSIC) is a hot topic that faces the challenges of overfitting and of aligning image and text. To address these issues, this paper proposes a vision-language aligning paradigm for RSIC to jointly represent vision and language. First, a new RSIC dataset, DIOR-Captions, is built by augmenting the object detection in optical remote sensing images (DIOR) dataset with manually annotated Chinese and English contents. Second, a Vision-Language aligning model with Cross-modal Attention (VLCA) is presented to generate accurate and abundant bilingual descriptions for remote sensing images. Third, a cross-modal learning network is introduced to address the problem of visual-lingual alignment. Notably, VLCA is also applied to end-to-end Chinese caption generation by using a pre-trained Chinese language model. Experiments are carried out against various baselines to validate VLCA on the proposed dataset. The results demonstrate that the proposed algorithm produces more descriptive and informative captions than existing algorithms.
Funding: This research was funded by Taif University Researchers Supporting Project, grant number TURSP-2020/345, Taif University, Taif, Saudi Arabia.
Abstract: Recently, many researchers have tried to develop a robust, fast, and accurate algorithm for eye tracking and pupil detection in applications such as head-mounted eye tracking, gaze-based human-computer interaction, medical applications (such as for deaf and diabetic patients), and attention analysis. Many real-world conditions challenge eye appearance, such as illumination, reflections, and occlusions, alongside individual differences in eye physiology and other sources of noise such as contact lenses or make-up. The present work introduces a robust pupil detection algorithm with higher accuracy than previous attempts for real-time analytics applications. The proposed circular Hough transform with morphing canny edge detection for pupillometry (CHMCEP) algorithm can handle even blurred or noisy images by applying different filtering methods in the pre-processing phase to remove blur and noise, followed by a second filtering step before the circular Hough transform performs center fitting, ensuring better accuracy. The performance of the proposed CHMCEP algorithm was tested against recent pupil detection methods. Simulations show that the proposed CHMCEP algorithm achieved detection rates of 87.11, 78.54, 58, and 78 on the Świrski, ExCuSe, ElSe, and Labeled Pupils in the Wild (LPW) datasets, respectively. These results show that the proposed approach outperforms the other pupil detection methods by a large margin, providing exact and robust pupil positions on challenging everyday eye images.
Funding: National Natural Science Foundation of China (60776793, 60802043); National Basic Research Program of China (2010CB327900).
Abstract: Space object recognition plays an important role in space exploitation and surveillance, and faces two main problems: lack of data and drastic changes in viewpoint. In this article, we first build a three-dimensional (3D) satellite dataset named the BUAA Satellite Image Dataset (BUAA-SID 1.0) to supply data for 3D space object research. Then, based on the dataset, we propose to recognize full-viewpoint 3D space objects using kernel locality preserving projections (KLPP). To obtain a more accurate and separable description of the objects, we first build feature vectors employing moment invariants, Fourier descriptors, region covariance, and histograms of oriented gradients. We then map the features into kernel space and reduce dimensionality with KLPP to obtain the submanifold of the features. Finally, k-nearest neighbor (kNN) is used to accomplish the classification. Experimental results show that the proposed approach is well suited to space object recognition under changing viewpoints. Encouraging recognition rates were obtained on images in BUAA-SID 1.0, with the highest reaching 95.87%.
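The final classification stage described above, kNN voting over the reduced feature vectors, can be sketched as below. The 2-D features and class names are hypothetical stand-ins for the KLPP submanifold coordinates, and the KLPP projection itself is omitted:

```python
import math
from collections import Counter

def knn_predict(train_feats, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    features under Euclidean distance, mirroring the final kNN stage."""
    dists = [math.dist(f, query) for f in train_feats]
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D submanifold coordinates for two satellite classes
feats = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
         (0.90, 0.80), (0.80, 0.90), (0.85, 0.85)]
labels = ["sat_A", "sat_A", "sat_A", "sat_B", "sat_B", "sat_B"]

pred = knn_predict(feats, labels, (0.12, 0.18), k=3)
```

Because KLPP is designed to keep same-class samples close on the learned submanifold, even this simple voting rule can separate viewpoints of the same object from those of other objects.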
Funding: The Young Scientists Fund of the National Natural Science Foundation of China (Grant Nos. 82402274 and 82272008) and the Science & Technology Development Fund of Tianjin Education Commission for Higher Education (Grant No. 2021KJ194).
Abstract: The automatic classification of thyroid nodules in ultrasound images is a critical research focus in medical imaging. However, publicly available thyroid ultrasound datasets remain scarce. In this study, we developed the Ultrasound Dataset for Thyroid Nodules (UD-TN), a comprehensive dataset containing 10,495 labeled images classified as benign or malignant based on pathology-confirmed results. To establish a benchmark, we proposed the Thyroid Ultrasound Image Neural Network (ThyUNet), a deep learning model designed for accurate nodule classification. By incorporating high-resolution feature enhancement, instance normalization, and dilated convolutions into residual blocks, ThyUNet excels at extracting fine-grained features, particularly for small nodules. Experimental results demonstrate that ThyUNet achieves state-of-the-art performance, with an accuracy of 89.7%, a sensitivity of 0.879, and a specificity of 0.910 on the testing set. These results surpass those of other advanced architectures, highlighting the model's effectiveness. UD-TN and ThyUNet contribute significantly to advancing intelligent medical diagnostics. Dataset details and access instructions are available at https://github.com/18811755633/Sample-of-UD-TN.
Funding: Funded by the Hong Kong Research Grants Council Theme-based Research Scheme (T22-505/19-N), the National Natural Science Foundation of China (52204232), and the MTR Research Fund (PTU-23005).
Abstract: During emergency evacuation, it is crucial to accurately detect and classify different groups of evacuees based on their behaviours using computer vision. Traditional object detection models trained on standard image databases often fail to recognise individuals in specific groups, such as the elderly, disabled individuals, and pregnant women, who require additional assistance during emergencies. To address this limitation, this study proposes a novel image dataset called the Human Behaviour Detection Dataset (HBDset), specifically collected and annotated for public safety and emergency response purposes. The dataset contains eight human behaviour categories: normal adult, child, holding a crutch, holding a baby, using a wheelchair, pregnant woman, lugging luggage, and using a mobile phone. It comprises more than 1,500 images collected from various public scenarios, with more than 2,900 bounding box annotations. The images were carefully selected, cleaned, and then manually annotated using the LabelImg tool. To demonstrate the effectiveness of the dataset, classical object detection algorithms were trained and tested on the HBDset; the average detection accuracy exceeds 90%, highlighting the robustness and universality of the dataset. The open HBDset has the potential to enhance public safety, provide early disaster warnings, and prioritise the needs of vulnerable individuals during emergency evacuation.
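Scoring a detector against bounding-box annotations like those in the HBDset typically rests on the intersection-over-union (IoU) between a predicted box and a ground-truth box. A minimal sketch, with hypothetical box coordinates:

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).

    IoU is the standard criterion for deciding whether a predicted box
    matches a ground-truth annotation when scoring detection accuracy.
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])  # overlap top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])  # overlap bottom-right
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

gt = (10, 10, 50, 50)    # hypothetical annotated box for one behaviour class
pred = (20, 10, 60, 50)  # detector output shifted 10 px to the right
iou = box_iou(gt, pred)
```

A prediction is commonly counted as correct when its IoU with a ground-truth box of the same class exceeds a threshold such as 0.5, from which per-class precision and the headline detection accuracy follow.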
Funding: Project supported by the National Science and Technology Innovation 2030 Major Project of the Ministry of Science and Technology of China (No. 2018AAA0100700), the National Natural Science Foundation of China (No. 61672451), the Provincial Key Research and Development Plan of Zhejiang Province, China (No. 2019C03137), the China Postdoctoral Science Foundation (No. 2018M630658), and the Alibaba-Zhejiang University Joint Institute of Frontier Technologies.
Abstract: Images are widely used by companies to advertise their products and promote awareness of their brands. The automatic synthesis of advertising images is challenging because the advertising message must be clearly conveyed while complying with the style required for the product, brand, or target audience. In this study, we propose a data-driven method to capture individual design attributes and the relationships between elements in advertising images, with the aim of automatically synthesizing input elements into an advertising image in a specified style. To achieve this multi-format advertisement design, we created a dataset containing 13,280 advertising images with rich annotations that encompass the outlines and colors of the elements, as well as the classes and goals of the advertisements. Using our probabilistic models, users guide the style of synthesized advertisements via additional constraints (e.g., context-based keywords). We applied our method to a variety of design tasks, and the results were evaluated in several perceptual studies, which showed that our method improved users' satisfaction by 7.1% compared with designs generated by nonprofessional students, and that more users preferred the coloring results of our designs to those generated by the color harmony model and Colormind.
Abstract: Detection of cracks in concrete structures is critical for their safety and the sustainability of maintenance processes. Traditional inspection techniques are costly, time-consuming, and inefficient in terms of human resources. Deep learning architectures have become more widespread in recent years, accelerating these processes and increasing their efficiency. Deep learning models (DLMs) stand out as an effective solution for crack detection thanks to features such as end-to-end learning capability, model adaptation, and automatic learning processes. However, striking an optimal balance between model performance and the computational efficiency of DLMs remains a vital research topic. In this article, three different methods are proposed for detecting cracks in concrete structures. In the first method, a Separable Convolutional with Attention and Multi-layer Enhanced Fusion Network (SCAMEFNet) deep neural network, which has a deep architecture and can balance the depth of DLMs against the number of model parameters, has been developed. This model was designed using a convolutional neural network, multi-head attention, and various fusion techniques. The second method proposes a modified vision transformer (ViT) model. The third method proposes a two-stage ensemble learning model, the deep feature-based two-stage ensemble model (DFTSEM), which uses deep features and machine learning methods. The proposed approaches are evaluated using the Concrete Cracks Image Dataset, which the authors collected and which contains concrete cracks on building surfaces. The results show that the SCAMEFNet model achieved an accuracy rate of 98.83%, the ViT model 97.33%, and the DFTSEM model 99.00%. These findings show that the proposed techniques successfully detect surface cracks and deformations and can provide practical solutions to real-world problems. In addition, the developed methods can serve as a tool for BIM platforms in smart cities for monitoring building health.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2020YFB1313602.
Abstract: Offline handwritten formula recognition is a challenging task due to the variety of handwritten symbols and two-dimensional formula structures. Recently, deep neural network recognizers based on the encoder-decoder framework have achieved great improvements on this task. However, one shortcoming of existing work is unsatisfactory recognition performance for formulas with long LaTeX strings. Moreover, the lack of sufficient training data also limits the capability of these recognizers. In this paper, we design a multimodal dependence attention (MDA) module to help the model learn visual and semantic dependencies among symbols in the same formula, improving recognition performance on formulas with long LaTeX strings. To alleviate overfitting and further improve recognition performance, we also propose a new dataset, the Handwritten Formula Image Dataset (HFID), which contains 25,620 handwritten formula images collected from real life. We conduct extensive experiments to demonstrate the effectiveness of the proposed MDA module and HFID dataset, achieving state-of-the-art performance with 63.79% and 65.24% expression accuracy on CROHME 2014 and CROHME 2016, respectively.
Funding: Supported in part by the National Institutes of Health, USA (Grant Nos. U01TR003528 and R01LM013337).
Abstract: The recent development of imaging and sequencing technologies enables systematic advances in the clinical study of lung cancer. Meanwhile, the human mind is limited in effectively handling and fully utilizing such enormous accumulations of data. Machine learning-based approaches play a critical role in integrating and analyzing these large and complex datasets, which have extensively characterized lung cancer from the different perspectives offered by these accrued data. In this review, we provide an overview of machine learning-based approaches that strengthen various aspects of lung cancer diagnosis and therapy, including early detection, auxiliary diagnosis, prognosis prediction, and immunotherapy practice. Moreover, we highlight the challenges and opportunities for future applications of machine learning in lung cancer.