Abstract: Over the past decade, large-scale pre-trained autoregressive and diffusion models have rejuvenated the field of text-guided image generation. However, these models require enormous datasets and parameters, and their multi-step generation processes are often inefficient and difficult to control. To address these challenges, we propose CAFE-GAN, a CLIP-Projected GAN with Attention-Aware Generation and Multi-Scale Discrimination, which incorporates a pre-trained CLIP model along with several key architectural innovations. First, we embed a coordinate attention mechanism into the generator to capture long-range dependencies and enhance feature representation. Second, we introduce a trainable linear projection layer after the CLIP text encoder, which aligns textual embeddings with the generator’s semantic space. Third, we design a multi-scale discriminator that leverages pre-trained visual features and integrates a feature regularization strategy, thereby improving training stability and discrimination performance. Experiments on the CUB and COCO datasets demonstrate that CAFE-GAN outperforms existing text-to-image generation methods, generating images with superior visual quality and semantic fidelity. It achieves Fréchet Inception Distance (FID) scores of 9.84 on CUB and 5.62 on COCO, surpassing current state-of-the-art text-to-image models by varying degrees. These findings offer valuable insights for future research on efficient, controllable text-to-image synthesis.
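For readers unfamiliar with the metric, the FID reported in this abstract is conventionally defined as below. This is the standard textbook definition, not a detail taken from the paper; the symbols are introduced here for illustration, with \mu_r, \Sigma_r the mean and covariance of Inception features of real images and \mu_g, \Sigma_g those of generated images. Lower values indicate that the two feature distributions are closer.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```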
Funding: Financially supported by the National Science Fund for Distinguished Young Scholars, China (No. 52025041); the National Natural Science Foundation of China (Nos. 52450003, U2341267, and 52174294); the National Postdoctoral Program for Innovative Talents, China (No. BX20240437); and the Fundamental Research Funds for the Central Universities, China (Nos. FRF-IDRY-23-037 and FRF-TP-20-02C2).
Abstract: The rapid advancements in computer vision (CV) technology have transformed the traditional approaches to material microstructure analysis. This review outlines the history of CV and explores the applications of deep-learning (DL)-driven CV in four key areas of materials science: microstructure-based performance prediction, microstructure information generation, microstructure defect detection, and crystal structure-based property prediction. CV has significantly reduced the cost of the traditional experimental methods used in material performance prediction. Moreover, recent progress in generating microstructure images and detecting microstructural defects using CV has increased the efficiency and reliability of material performance assessments. DL-driven CV models can accelerate the design of new materials with optimized performance by integrating predictions based on both crystal and microstructural data, thereby enabling the discovery of next-generation materials. Finally, the review provides insights into the rapid interdisciplinary developments in materials science and their future prospects.
Funding: Funded by the National Natural Science Foundation of China (Grant Nos. 62322410, 52272168, 624B2135, 61804047) and the Fundamental Research Funds for the Central Universities (No. WK2030000103).
Abstract: Human action recognition (HAR) is crucial for the development of efficient computer vision, where bioinspired neuromorphic visual perception systems have emerged as a vital solution to transmission bottlenecks across sensor-processor interfaces. However, the absence of interactions among versatile biomimicking functionalities within a single device, which was developed for specific vision tasks, restricts the computational capacity, practicality, and scalability of in-sensor vision computing. Here, we propose a bioinspired vision sensor composed of a GaN/AlN-based ultrathin quantum-disks-in-nanowires (QD-NWs) array to mimic not only Parvo cells for high-contrast vision and Magno cells for dynamic vision in the human retina but also the synergistic activity between the two cell types for in-sensor vision computing. By simply tuning the bias voltage applied to each QD-NW-array-based pixel, we achieve two biosimilar photoresponse characteristics, with slow and fast reactions to light stimuli, that enhance the in-sensor image quality and HAR efficiency, respectively. Strikingly, the synergistic interaction of the two photoresponse modes within a single device markedly increased the HAR accuracy from 51.4% to 81.4%, owing to the integrated artificial vision system. The demonstration of an intelligent vision sensor offers a promising device platform for the development of highly efficient HAR systems and future smart optoelectronics.
Abstract: This systematic review aims to comprehensively examine and compare deep learning methods for brain tumor segmentation and classification using MRI and other imaging modalities, focusing on recent trends from 2022 to 2025. The primary objective is to evaluate methodological advancements, model performance, dataset usage, and existing challenges in developing clinically robust AI systems. We included peer-reviewed journal articles and high-impact conference papers published between 2022 and 2025, written in English, that proposed or evaluated deep learning methods for brain tumor segmentation and/or classification. Excluded were non-open-access publications, books, and non-English articles. A structured search was conducted across Scopus, Google Scholar, Wiley, and Taylor & Francis, with the last search performed in August 2025. Risk of bias was not formally quantified but was considered during full-text screening based on dataset diversity, validation methods, and availability of performance metrics. We used narrative synthesis and tabular benchmarking to compare performance metrics (e.g., accuracy, Dice score) across model types (CNN, Transformer, Hybrid), imaging modalities, and datasets. A total of 49 studies were included (43 journal articles and 6 conference papers). These studies spanned over 9 public datasets (e.g., BraTS, Figshare, REMBRANDT, MOLAB) and utilized a range of imaging modalities, predominantly MRI. Hybrid models, especially ResViT and UNetFormer, consistently achieved high performance, with classification accuracy exceeding 98% and segmentation Dice scores above 0.90 across multiple studies. Transformers and hybrid architectures showed increasing adoption post-2023. Many studies lacked external validation and were evaluated only on a few benchmark datasets, raising concerns about generalizability and dataset bias. Few studies addressed clinical interpretability or uncertainty quantification. Despite promising results, particularly for hybrid deep learning models, widespread clinical adoption remains limited due to lack of validation, interpretability concerns, and real-world deployment barriers.
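As a rough illustration of the Dice score that the review uses to compare segmentation models, the sketch below computes it for two binary masks. The function name `dice_score`, the flat-list mask representation, and the toy masks are assumptions made for this example; they are not taken from any of the reviewed studies.

```python
def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count pixels labelled as tumor in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total number of tumor pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 6-pixel masks: predicted segmentation vs. ground truth.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
print(f"Dice = {dice_score(pred, target):.3f}")
```

A score of 1.0 means the predicted and reference masks coincide exactly, so the "above 0.90" figures quoted in the abstract indicate a large overlap relative to the combined mask size.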
Abstract: In the competitive retail industry of the digital era, data-driven insights into gender-specific customer behavior are essential. They support the optimization of store performance, layout design, product placement, and targeted marketing. However, existing computer vision solutions often rely on facial recognition to gather such insights, raising significant privacy and ethical concerns. To address these issues, this paper presents a privacy-preserving customer analytics system built on two key strategies. First, we deploy a deep learning framework using YOLOv9s, trained on the RCA-TVGender dataset. Cameras are positioned perpendicular to observation areas to reduce facial visibility while maintaining accurate gender classification. Second, we apply AES-128 encryption to customer position data, ensuring secure access and regulatory compliance. Overall, our system achieved 81.5% mAP@50, 77.7% precision, and 75.7% recall. Moreover, a 90-min observational study confirmed the system’s ability to generate privacy-protected heatmaps revealing distinct behavioral patterns between male and female customers. For instance, women spent more time in certain areas and showed interest in different products. These results confirm the system’s effectiveness in enabling personalized layout and marketing strategies without compromising privacy.
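To make the precision and recall figures in this abstract concrete, the sketch below shows how the two metrics follow from detection counts. The helper `precision_recall` and the counts used are hypothetical and chosen only for illustration; they are not the paper's data.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive and false-negative counts."""
    # Precision: fraction of predicted detections that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of ground-truth objects that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 80 correct detections, 20 spurious ones, 25 missed customers.
p, r = precision_recall(tp=80, fp=20, fn=25)
print(f"precision = {p:.3f}, recall = {r:.3f}")
```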
Funding: Supported by St. Erik Eye Hospital philanthropic donations and Vetenskapsrådet 2022-00799 (to PAW).
Abstract: Neurodegenerative diseases account for a large and increasing health and economic burden worldwide. With an increasingly aged population, this burden is set to increase. Optic neuropathies make up a large proportion of neurodegenerative diseases, with glaucoma being highly prevalent. Glaucoma is characterized by the progressive dysfunction and loss of retinal ganglion cells and their axons, which make up the optic nerve. It is the leading cause of irreversible vision loss and affects an estimated 80 million people. The mammalian central nervous system is non-regenerative and, once lost or injured, retinal ganglion cells cannot regenerate an axon into the optic nerve under basal conditions. Thus, strategies that provide neuroprotection to stressed, dysfunctional, or dying retinal ganglion cells are likely to be of high therapeutic and translational value. Advancing age, genetics, and elevated intraocular pressure are all major risk factors for glaucoma; however, all clinically available glaucoma treatments focus on intraocular pressure management and do not directly address the neurodegenerative component of glaucoma.
Funding: Supported by the National Key Research and Development Program of China, No. 2019YFA0111200; the National Natural Science Foundation of China, Nos. U23A20436, 82371047; Key Research Project in Shanxi Province, No. 202302130501008; Shanxi Provincial Science Fund for Distinguished Young Scholars, No. 202103021221008; Key Research and Development Program in Shanxi Province, No. 202204051001023; Shanxi Medical University Doctor’s Startup Fund Project, No. SD22028 (all to YG).
Abstract: Retinal ganglion cells are the bridging neurons between the eye and the central nervous system, transmitting visual signals to the brain. The injury and loss of retinal ganglion cells are the primary pathological changes in several retinal degenerative diseases, including glaucoma, ischemic optic neuropathy, diabetic neuropathy, and optic neuritis. In mammals, injured retinal ganglion cells lack regenerative capacity and undergo apoptotic cell death within a few days of injury. Additionally, these cells exhibit limited regenerative ability, ultimately contributing to vision impairment and potentially leading to blindness. Currently, the only effective clinical treatment for glaucoma is to prevent vision loss by lowering intraocular pressure through medications or surgery; however, this approach cannot halt the effect of retinal ganglion cell loss on visual function. This review comprehensively investigates the mechanisms underlying retinal ganglion cell degeneration in retinal degenerative diseases and further explores the current status and potential of cell replacement therapy for regenerating retinal ganglion cells. As our understanding of the complex processes involved in retinal ganglion cell degeneration deepens, we can explore new treatment strategies, such as cell transplantation, which may offer more effective ways to mitigate the effect of retinal degenerative diseases on vision.