In recent years, the rapid development of artificial intelligence has driven the widespread deployment of visual systems in complex environments such as autonomous driving, security surveillance, and medical diagnosis. However, existing image sensors, such as CMOS and CCD devices, intrinsically suffer from the limitation of a fixed spectral response. Especially in environments with strong glare, haze, or dust, external spectral conditions often severely mismatch the device's design range, leading to significant degradation in image quality and a sharp drop in target recognition accuracy. While algorithmic post-processing (such as color-bias correction or background suppression) can mitigate these issues, algorithmic approaches typically introduce computational latency and increased energy consumption, making them unsuitable for edge computing or high-speed scenarios.
With the rapid development of automated visual analysis, visual analysis systems have become a popular research topic in computer vision. Such systems can assist humans in detecting anomalous events (e.g., fighting, or walking alone on the grass). In general, existing methods for visual anomaly detection are based on an autoencoder architecture, i.e., reconstructing the current frame or predicting the future frame; the reconstruction error is then adopted as the evaluation metric to identify whether an input is abnormal. The flaw of these methods is that abnormal samples can also be reconstructed well. In this paper, inspired by the human memory ability, we propose a novel deep neural network (DNN) based model termed cognitive memory-augmented network (CMAN) for the visual anomaly detection problem. The proposed CMAN model assumes that the visual analysis system imitates humans: it remembers normal samples and then distinguishes abnormal events in the collected videos. Specifically, CMAN introduces a memory module that simulates the memory capacity of humans and a density estimation network that learns the data distribution. The reconstruction errors and the novelty scores are used together to distinguish abnormal events from videos. In addition, we develop a two-step training scheme so that the memory module and the density estimation network can cooperate to improve performance. Comprehensive experiments on various popular benchmarks show the superiority and effectiveness of the proposed CMAN model for visual anomaly detection compared with state-of-the-art methods. The implementation code of our CMAN method can be accessed at https://github.com/CMANcode/CMAN_pytorch.
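The reconstruction-error criterion that this line of work builds on can be illustrated with a minimal sketch. This is a toy illustration of the general idea only, not the authors' CMAN implementation; the MSE metric, the threshold value, and the tiny 4-value "frames" are all assumptions made for the example.

```python
# Toy sketch of reconstruction-error anomaly scoring (not the CMAN code):
# a frame is flagged as anomalous when its reconstruction error exceeds
# a threshold calibrated on normal data.

def mse(frame, reconstruction):
    """Mean squared error between a frame and its reconstruction."""
    n = len(frame)
    return sum((a - b) ** 2 for a, b in zip(frame, reconstruction)) / n

def anomaly_scores(frames, reconstructions):
    """One error score per frame; higher means more anomalous."""
    return [mse(f, r) for f, r in zip(frames, reconstructions)]

def flag_anomalies(scores, threshold):
    """Binary decision per frame: True if the score exceeds the threshold."""
    return [s > threshold for s in scores]

# Hypothetical 4-pixel "frames"; the last one is poorly reconstructed,
# mimicking an anomalous sample the model has not memorized.
frames          = [[0.1, 0.2, 0.1, 0.2], [0.1, 0.1, 0.2, 0.2], [0.9, 0.8, 0.9, 0.7]]
reconstructions = [[0.1, 0.2, 0.1, 0.2], [0.1, 0.2, 0.2, 0.2], [0.2, 0.1, 0.3, 0.2]]

scores = anomaly_scores(frames, reconstructions)
flags = flag_anomalies(scores, threshold=0.1)
print(flags)  # [False, False, True] -> only the badly reconstructed frame is flagged
```

CMAN's contribution is precisely that this criterion alone is unreliable (anomalies can also reconstruct well), which is why the paper combines it with a memory module and a density-based novelty score.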
Rapid developments in artificial intelligence trigger demands for perception and learning of external environments through visual perception systems. Neuromorphic devices and integrated systems with photosensing and response functions can be constructed to mimic complex biological visual sensing behaviors. Here, recent progress on optoelectronic neuromorphic memristors and optoelectronic neuromorphic transistors is briefly reviewed. A variety of visual synaptic functions demonstrated on optoelectronic neuromorphic devices are discussed, including light-triggered short-term plasticity, long-term plasticity, and neural facilitation. These optoelectronic neuromorphic devices can also mimic human visual perception, information processing, and cognition. Optoelectronic neuromorphic devices that simulate biological visual perception functions will have potential applications in areas such as bionic neurological optoelectronic systems and intelligent robots.
Optoelectronic memristors possess capabilities of data storage and mimicking human visual perception, and they hold great promise in neuromorphic visual systems (NVSs). This study introduces an amorphous wide-bandgap Ga_(2)O_(3) photoelectric synaptic memristor, which achieves 3-bit data storage through the adjustment of current compliance (Icc) and the utilization of variable ultraviolet (UV, 254 nm) light intensities. The "AND" and "OR" logic gates in memristor-aided logic (MAGIC) are implemented by utilizing voltage polarity and UV light as input signals. The device also exhibits highly stable synaptic characteristics such as paired-pulse facilitation (PPF), spike-intensity dependent plasticity (SIDP), spike-number dependent plasticity (SNDP), spike-time dependent plasticity (STDP), spike-frequency dependent plasticity (SFDP), and learning-experience behavior. Finally, when integrated into an artificial neural network (ANN), the Ag/Ga_(2)O_(3)/Pt memristive device mimicked optical-pulse potentiation and electrical-pulse depression with high pattern accuracy (90.7%). Single memristive cells with such multifunctional features are promising candidates for optoelectronic memory storage, neuromorphic computing, and artificial visual perception applications.
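Paired-pulse facilitation, one of the synaptic characteristics listed above, is commonly quantified as the ratio of the second response amplitude to the first. A minimal sketch, assuming this common definition and hypothetical response amplitudes (the paper's measured values are not reproduced here):

```python
def ppf_index(a1, a2):
    """Paired-pulse facilitation index: second response amplitude as a
    percentage of the first. Values above 100% indicate facilitation,
    i.e., the second closely spaced pulse evokes a larger response."""
    if a1 <= 0:
        raise ValueError("first response amplitude must be positive")
    return 100.0 * a2 / a1

# Hypothetical current responses (arbitrary units) to two closely
# spaced light pulses.
print(ppf_index(1.0, 1.4))  # roughly 140%: facilitation
print(ppf_index(2.0, 2.0))  # exactly 100%: no facilitation
```

Measured PPF curves in such devices typically decay as the inter-pulse interval grows, which is how the facilitation time constant is extracted.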
Direct volume rendering (DVR) is a powerful visualization technique which allows users to effectively explore and study volumetric datasets. Different transparency settings can be flexibly assigned to different structures such that valuable information can be revealed in direct volume rendered images (DVRIs). However, end-users often feel that some risks are associated with DVR because they do not know whether any important information is missing from the transparent regions of DVRIs. In this paper, we investigate how to semi-automatically generate a set of DVRIs, and also an animation, which can reveal information missed in the original DVRIs while satisfying image quality criteria such as coherence. A complete framework is developed to tackle various problems related to the generation and quality evaluation of visibility-aware DVRIs and animations. Our technique can reduce the risk of using direct volume rendering and thus boost the confidence of users in volume rendering systems.
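The visibility problem at the heart of this work stems from the standard DVR compositing model, in which each sample's contribution is attenuated by the opacity already accumulated in front of it. A minimal sketch of front-to-back compositing along one viewing ray, using scalar intensities instead of RGB colors for brevity (an assumption for the example, not the paper's renderer):

```python
def composite_front_to_back(samples):
    """Front-to-back compositing along a viewing ray.

    `samples` is a list of (color, opacity) pairs ordered from the eye
    into the volume; color is a scalar intensity here for brevity.
    Standard DVR accumulation:
        C += (1 - A) * alpha_i * c_i
        A += (1 - A) * alpha_i
    """
    color, alpha = 0.0, 0.0
    for c, a in samples:
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        if alpha >= 0.999:  # early ray termination: back samples invisible
            break
    return color, alpha

# A fully opaque first sample hides everything behind it -- exactly the
# "missing information" risk the paper addresses.
c, a = composite_front_to_back([(0.8, 1.0), (0.2, 1.0)])
print(c, a)  # 0.8 1.0
```

The example makes the risk concrete: once accumulated opacity saturates, samples deeper along the ray contribute nothing to the image, which is why visibility-aware DVRI generation is needed.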
Acoustic quality detection is vital in the quality control of manufactured products, since sound reflects the condition of machines or products. Recent work has employed machine learning models on manufacturing audio data to detect anomalous patterns. A major challenge is how to select applicable audio features to improve a model's accuracy and precision. To address this challenge, we extract and analyze three types of audio features, namely time-domain, frequency-domain, and cepstrum features, to help identify potential linear and non-linear relationships. In addition, we design a visual analysis system, AFExplorer, to assist data scientists in extracting audio features and selecting potential feature combinations. AFExplorer integrates four main views to present the detailed distribution and relevance of the audio features, which helps users observe the impact of features visually during feature selection. We perform case studies with AFExplorer on the ToyADMOS and MIMII datasets to demonstrate the usability and effectiveness of the proposed system.
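Two of the simplest time-domain features in the family described above are RMS energy and zero-crossing rate. A minimal pure-Python sketch, for illustration only (AFExplorer's actual feature extractors are not specified at this level of detail):

```python
import math

def rms(signal):
    """Root-mean-square energy, a basic time-domain loudness feature."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def zero_crossing_rate(signal):
    """Fraction of consecutive sample pairs whose sign changes; a rough
    proxy for how noisy or high-frequency the signal is."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)

# A hypothetical alternating signal crosses zero at every step.
sig = [1.0, -1.0, 1.0, -1.0, 1.0]
print(rms(sig), zero_crossing_rate(sig))  # 1.0 1.0
```

In practice, features like these are computed per frame over a sliding window, and a system like AFExplorer then helps visualize how informative each feature is for separating normal and anomalous machine sounds.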
Visualization literacy, the ability to interpret and comprehend visual designs, is recognized as an essential skill by the visualization community. We identify and investigate barriers to comprehending parallel coordinates plots (PCPs), one of the advanced graphical representations for displaying multivariate and high-dimensional data. We develop a parallel coordinates literacy test with diverse images generated using popular PCP software tools; the test both improves PCP literacy and evaluates the user's literacy skills. We introduce an interactive educational tool that assists the teaching and learning of parallel coordinates by offering a more active learning experience, with the aim of advancing novice users' parallel coordinates literacy skills. Based on the hypothesis that a tool that interactively links traditional Cartesian coordinates with PCPs will enhance PCP literacy further than static slides, we compare the learning experience using traditional slides with our novel software tool and investigate the efficiency of the educational software in an online, crowdsourced user study. The results show that our pedagogical tool positively impacts a user's PCP comprehension.
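The Cartesian-to-parallel-coordinates link that such a pedagogical tool exploits boils down to mapping each n-dimensional point to a polyline across n vertical axes. A minimal sketch, assuming linear min-max normalization per axis and unit axis spacing (the actual tool's mapping details are assumptions here):

```python
def to_pcp_polyline(point, mins, maxs, axis_spacing=1.0):
    """Map an n-dimensional point to a parallel-coordinates polyline.

    Axis i is a vertical line at x = i * axis_spacing; the point's i-th
    value is linearly normalized to a height in [0, 1] on that axis.
    Returns the polyline as a list of (x, y) vertices.
    """
    polyline = []
    for i, (v, lo, hi) in enumerate(zip(point, mins, maxs)):
        y = (v - lo) / (hi - lo)  # min-max normalization onto the axis
        polyline.append((i * axis_spacing, y))
    return polyline

# A 3-D point becomes a polyline through three parallel axes.
print(to_pcp_polyline([5.0, 0.0, 10.0], mins=[0, 0, 0], maxs=[10, 10, 10]))
# [(0.0, 0.5), (1.0, 0.0), (2.0, 1.0)]
```

Showing this one-point-to-one-polyline correspondence interactively, rather than on static slides, is exactly the kind of linking the study's hypothesis concerns.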
Innovations in Internet of Everything (IoE) enabled systems are driving a change in the settings where we interact, in smart units recognized globally as smart city environments. Intelligent video-surveillance systems are critical to increasing the security of these smart cities. More precisely, in today's world of smart video surveillance, person re-identification (Re-ID) has gained increased consideration from researchers. Various researchers have designed deep learning-based algorithms for person Re-ID because such algorithms have achieved substantial breakthroughs in computer vision problems. In this line of research, we designed an adaptive feature refinement-based deep learning architecture for person Re-ID. In the proposed architecture, spatial and channel attention are learned from the inter-channel and inter-spatial relationships of features between images of the same individual taken from nonidentical camera viewpoints. In addition, a spatial pyramid pooling layer is inserted to extract multiscale, fixed-dimension feature vectors irrespective of the size of the feature maps. The model's effectiveness is validated on the CUHK01 and CUHK02 datasets. Compared with existing approaches, the approach presented in this paper achieves encouraging Rank-1 and Rank-5 scores of 24.6% and 54.8%, respectively.
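The spatial pyramid pooling idea mentioned above yields a fixed-length vector from feature maps of any size by pooling over coarser-to-finer grids. A minimal single-channel sketch (the pyramid levels and the max-pooling operator are assumptions for illustration, not the paper's exact configuration):

```python
def spp_max(feature_map, levels=(1, 2)):
    """Spatial pyramid pooling sketch on a single-channel 2-D map.

    For each pyramid level n, the map is divided into an n x n grid and
    max-pooled per cell, so the output length (the sum of n*n over all
    levels) is fixed regardless of the input size.
    """
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for n in levels:
        for gi in range(n):
            for gj in range(n):
                r0, r1 = gi * h // n, (gi + 1) * h // n
                c0, c1 = gj * w // n, (gj + 1) * w // n
                out.append(max(
                    feature_map[r][c]
                    for r in range(r0, r1) for c in range(c0, c1)
                ))
    return out

# Two differently sized maps both yield 1 + 4 = 5 pooled values.
small = [[1, 2], [3, 4]]
large = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 1, 1]]
print(len(spp_max(small)), len(spp_max(large)))  # 5 5
```

This size-independence is what lets a Re-ID network feed fixed-dimension vectors to its fully connected layers no matter the input image resolution.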
Background: In the classical psychological refractory period (PRP) paradigm, two stimuli are presented in brief succession, and participants are asked to make separate speeded responses to both stimuli. Due to a central cognitive bottleneck, responses to the second stimulus are delayed, especially at short stimulus-onset asynchronies (SOAs) between the two stimuli. Although the mechanisms of dual-task interference in the classical PRP paradigm have been extensively investigated, the specific mechanisms underlying the cross-modal PRP paradigm are not well understood. In particular, it remains unknown whether the dominance of vision over audition manifests in cross-modal PRP tasks. The present study aimed to investigate whether the visual dominance effect manifests in the cross-modal PRP paradigm. Methods: We adapted the classical PRP paradigm by manipulating the order of a visual and an auditory task: the visual task could either precede the auditory task or vice versa, at either short or long SOAs. Twenty-five healthy participants took part in Experiment 1, and thirty-three new participants took part in Experiment 2. Reaction time and accuracy data were calculated and further analyzed by repeated-measures analysis of variance. Results: Visual precedence in the Visual-Auditory condition caused larger impairments to the subsequent auditory processing than vice versa in the Auditory-Visual condition: a larger delay of the second response was revealed in the Visual-Auditory condition (135 ± 10 ms) than in the Auditory-Visual condition (88 ± 9 ms). This effect was found only at the short SOAs, under the central bottleneck, and not at the long SOAs. Moreover, this effect occurred both when the single visual and the single auditory task were of equal difficulty in Experiment 1 and when the single auditory task was more difficult than the single visual task in Experiment 2. Conclusion: The results of the two experiments suggest that the visual dominance effect occurs under the central bottleneck of cognitive processing.
Funding: supported in part by STI 2030-Major Projects (2022ZD0209200), in part by the National Natural Science Foundation of China (62374099), in part by the Beijing Natural Science Foundation-Xiaomi Innovation Joint Fund (L233009) and the Beijing Natural Science Foundation (L248104), in part by the Independent Research Program of the School of Integrated Circuits, Tsinghua University, and in part by the Tsinghua University Fuzhou Data Technology Joint Research Institute.
Funding: supported by the National Natural Science Foundation of China (61976049, 62072080, U20B2063), the Fundamental Research Funds for the Central Universities (ZYGX2019Z015), the Sichuan Science and Technology Program, China (2018GZDZX0032, 2019ZDZX0008, 2019YFG0003, 2019YFG0533, 2020YFS0057), and the Dongguan Songshan Lake Introduction Program of Leading Innovative and Entrepreneurial Talents. Recommended by Associate Editor Huimin Lu.
Funding: project supported by the National Natural Science Foundation of China (Grant No. 51972316), the Open Project of the State Key Laboratory of ASIC & System (Grant No. 2019KF006), the Zhejiang Provincial Natural Science Foundation of China (Grant No. LR18F040002), and the Program for Ningbo Municipal Science and Technology Innovative Research Team, China (Grant No. 2016B10005).
Funding: supported by the National Key Research and Development Program of China (Grants 2021YFA0715600, 2021YFA0717700, 2018YFB2202900), the National Natural Science Foundation of China (52192610, 62274127, 62374128, 62304167), the 2023 Qinchuangyuan Construction Two Chain Integration Special Project (23LLRH0043), the Key Research and Development Program of Shaanxi Province (Grant 2024GXYBXM-512), the CAS Project for Young Scientists in Basic Research (YSBR-113), the open fund of the State Key Laboratory of Infrared Physics (SITP-NLIST-ZD-2023-03), the open research fund of Songshan Lake Materials Laboratory (2023SLABFN02), the Wuhu and Xidian University special fund for industry-university-research cooperation (XWYCXY-012021004), the China Postdoctoral Science Foundation (2023TQ0255), the Fundamental Research Funds for the Central Universities, and the Innovation Fund of Xidian University.
Funding: supported in part by Hong Kong RGC CERG under Grant No. 618705.
Funding: supported by the National Key Research and Development Program of China (2020YFB1707700), the National Natural Science Foundation of China (61972356, 62036009), and the Fundamental Research Funds for the Provincial Universities of Zhejiang, China (RF-A2020001).
Funding: supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0008703, The Competency Development Program for Industry Specialist), and by the MSIT (Ministry of Science and ICT), Republic of Korea, under the ITRC (Information Technology Research Center) support program (IITP-2022-2018-0-01799) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).