The Multisensory Lab at Peking University has carried out basic studies in multisensory space and time processing, intersensory binding, and haptic/tactile perception. We exploited a typical multisensory illusion paradigm, the temporal ventriloquist effect, and applied it to a wide range of multisensory interactions (mainly focusing on temporal processing). In this work, we summarized how tactile stimuli were used to compose tactile cues and, as tactile apparent motion, to interface with other sensory stimuli (visual and auditory) to examine the underlying perceptual organization in a multisensory context. Moreover, we introduced two examples of wearable haptic/tactile perception in our lab, using two customized tactile devices, and discussed potential applications in this field.
This paper studies the consensus problem for a group of agents with switching topology and time-varying communication delays, where the dynamics of each agent is modeled as a high-order integrator. A linear distributed consensus protocol is proposed that depends only on the agent's own information and partial information from its neighbors. By introducing a decomposition of the state vector and performing a state-space transformation, the closed-loop dynamics of the multi-agent system are converted into two decoupled subsystems. Based on the decoupled subsystems, sufficient conditions for convergence to consensus are established, which provide upper bounds on the admissible communication delays. The explicit expression of the consensus state is also derived. Moreover, the results on consensus seeking for the group of high-order agents are extended to a network of agents whose dynamics are modeled as completely controllable linear time-invariant systems. It is proved that convergence to consensus of this network is equivalent to that of the group of high-order agents. Finally, numerical examples are given to demonstrate the effectiveness of the main results.
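The protocol in this abstract handles high-order integrator dynamics, switching topologies, and time-varying delays. None of that is reproduced here. As a minimal sketch of the underlying consensus idea only, the code below runs the classic first-order discrete-time update x_i(k+1) = x_i(k) + eps * sum over neighbors j of (x_j(k) - x_i(k)). It uses a fixed, undirected, delay-free graph. The graph, the step size `eps`, and the function names are illustrative assumptions. On a connected undirected graph with 0 < eps < 1/(maximum degree), the states converge to the average of the initial values.

```python
def consensus_step(x, neighbors, eps):
    """One synchronous first-order update: x_i += eps * sum_{j in N(i)} (x_j - x_i)."""
    return [xi + eps * sum(x[j] - xi for j in neighbors[i]) for i, xi in enumerate(x)]


def run_consensus(x0, neighbors, eps=0.2, steps=200):
    """Iterate the update on a fixed graph; `neighbors[i]` lists agent i's neighbors."""
    x = list(x0)
    for _ in range(steps):
        x = consensus_step(x, neighbors, eps)
    return x


if __name__ == "__main__":
    # Hypothetical 4-agent undirected ring: max degree 2, so eps = 0.2 < 1/2.
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    final = run_consensus([1.0, 5.0, 3.0, 7.0], ring)
    print([round(v, 6) for v in final])  # every agent ends near the initial average, 4.0
```

Because the graph is undirected, the update preserves the sum of the states, which is why the agreement value is the initial average.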
In many image analysis and processing problems, discriminating the size and shape of each individual object in an aggregate pile projected in an image is an important task. It is relatively easy to distinguish these features among objects that are already separated from each other. The problem becomes considerably more complex and challenging when the objects touch and/or overlap. This letter presents an algorithm for separating touching and overlapping objects within a 2-D image. The approach first converts the gray-scale image to a binary image and then, using erosion operations, to a 3-D topographic representation. A template (mask) is designed to search the topographic surface for saddle points, from which the segmenting orientation is determined, followed by the separating operation. The algorithm was tested on a real image, and the results are satisfactory and encouraging.
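The abstract does not specify the saddle-point template, so only the first two steps are sketched here, under stated assumptions. The first step thresholds a gray-scale image into a binary one. The second builds a topographic height map in which each foreground pixel's height is the number of successive erosions it survives. The 3x3 cross-shaped structuring element and the function names are hypothetical choices.

```python
def binarize(gray, threshold):
    """Foreground (1) where intensity exceeds the threshold, else background (0)."""
    return [[1 if v > threshold else 0 for v in row] for row in gray]


def erode(img):
    """Binary erosion with a 3x3 cross; pixels outside the image count as background."""
    h, w = len(img), len(img[0])

    def on(r, c):
        return 0 <= r < h and 0 <= c < w and img[r][c] == 1

    return [[1 if on(r, c) and on(r - 1, c) and on(r + 1, c) and on(r, c - 1) and on(r, c + 1) else 0
             for c in range(w)] for r in range(h)]


def topographic_map(binary):
    """Height of a foreground pixel = number of erosion levels at which it is still present."""
    h, w = len(binary), len(binary[0])
    height = [[0] * w for _ in range(h)]
    current = [row[:] for row in binary]
    level = 0
    while any(any(row) for row in current):
        level += 1
        for r in range(h):
            for c in range(w):
                if current[r][c]:
                    height[r][c] = level
        current = erode(current)
    return height


if __name__ == "__main__":
    gray = [[0, 0, 0, 0, 0],
            [0, 200, 200, 200, 0],
            [0, 200, 200, 200, 0],
            [0, 200, 200, 200, 0],
            [0, 0, 0, 0, 0]]
    for row in topographic_map(binarize(gray, 128)):
        print(row)
```

For two touching blobs, the height map has two peaks joined by a lower ridge across the contact neck. A saddle point on that ridge is the kind of location the abstract's template searches for before cutting.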
The electrocardiogram (ECG) has broad applications in the clinical diagnosis and prognosis of cardiovascular disease. Many researchers have contributed to its progressive development. To commemorate those pioneers, and to better study and promote the use of ECG, we present here a systematic review of the history, hotspots, and trends of ECG. In the historical part, the invention, improvement, and extensive applications of ECG, such as in long QT syndrome (LQTS), angina, and myocardial infarction (MI), are presented chronologically. New technologies and applications from the 1990s onward are also introduced. In the second part, we use bibliometric analysis to identify hotspots in ECG-related research. Using total citations and year-specific total citations as the main criteria, four key hotspots were identified from 11 articles: atrial fibrillation, LQTS, angina and MI, and heart rate variability. Recent studies in these four areas are also reported. In the final part, we discuss future trends in ECG-related research. The authors believe that improvements in ECG instrumentation, big-data mining of ECG, and diagnostic accuracy and application will be areas of continuing concern.
According to the Motor Theory of speech perception, the interaction between the auditory and motor systems plays an essential role in speech perception. Since it was proposed, the Motor Theory has received considerable attention in the field. However, each of its three hypotheses still needs further verification. In this review, we focus on how auditory-motor anatomical and functional associations contribute to speech perception. We discuss why previous studies have not reached agreement, and in particular whether motor-system involvement in speech perception is task-load dependent. Finally, we suggest that the auditory-motor link is particularly useful for speech perception under adverse listening conditions, and that a further revised Motor Theory is a potential solution to the "cocktail-party" problem.
This paper discusses an autofocusing technique based on the contourlet transform and proposes an autofocusing method for images with rich information in certain directions. Experimental results show that the proposed method focuses accurately and has a higher sensitivity ratio than other autofocusing methods based on conventional image processing.
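The contourlet-based focus measure itself is not sketched here. To show the surrounding autofocus loop, the code below scores each frame of a focus stack and picks the sharpest one. It uses a conventional gradient-energy measure, the kind of conventional baseline the abstract compares against. This measure is a swapped-in technique; the abstract does not name its baselines. The function names and toy frames are hypothetical.

```python
def gradient_energy(img):
    """Sum of squared horizontal and vertical first differences (a classic sharpness measure)."""
    h, w = len(img), len(img[0])
    energy = 0.0
    for r in range(h):
        for c in range(w):
            if c + 1 < w:
                energy += (img[r][c + 1] - img[r][c]) ** 2
            if r + 1 < h:
                energy += (img[r + 1][c] - img[r][c]) ** 2
    return energy


def best_focus(stack, measure=gradient_energy):
    """Index of the frame in a focus stack that maximizes the focus measure."""
    return max(range(len(stack)), key=lambda i: measure(stack[i]))


if __name__ == "__main__":
    blurred = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
    sharp = [[0, 10, 0], [10, 0, 10], [0, 10, 0]]
    partly = [[3, 7, 3], [7, 3, 7], [3, 7, 3]]
    print(best_focus([blurred, sharp, partly]))
```

The `measure` parameter is where a directional, transform-domain measure such as the one proposed in the abstract would be plugged in.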
Objective: Computer-aided diagnosis using deep learning algorithms has been initially applied in mammography, but it has not yet seen large-scale clinical application. Methods: This study aimed to develop and validate an artificial intelligence model for mammography. First, mammograms retrospectively collected from six centers were randomly assigned to a training dataset and a validation dataset to establish the model. Second, the model was tested by comparing the performance of 12 radiologists with and without it. Finally, prospectively enrolled women with mammograms from six centers were diagnosed by radiologists using the model. Detection and diagnostic capabilities were evaluated using the free-response receiver operating characteristic (FROC) curve and the ROC curve. Results: The sensitivity of the model for detecting lesions after matching was 0.908 at a false positive rate of 0.25 in unilateral images. The area under the ROC curve (AUC) for distinguishing benign from malignant lesions was 0.855 [95% confidence interval (95% CI): 0.830, 0.880]. The performance of the 12 radiologists with the model was higher than that of the radiologists alone (AUC: 0.852 vs. 0.805, P=0.005). The mean reading time alone was longer than that with the model (80.18 s vs. 62.28 s, P=0.032). In prospective application, detection sensitivity reached 0.887 at a false positive rate of 0.25. The AUC of radiologists with the model was 0.983 (95% CI: 0.978, 0.988), with sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 94.36%, 98.07%, 87.76%, and 99.09%, respectively. Conclusions: The artificial intelligence model exhibits high accuracy for detecting and diagnosing breast lesions, improves diagnostic accuracy, and saves time.
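The diagnostic metrics reported above follow standard definitions. Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), PPV = TP/(TP+FP), and NPV = TN/(TN+FN). The AUC equals the probability that a randomly chosen malignant case scores higher than a randomly chosen benign one, with ties counted as one half. The sketch below computes these quantities. The counts and scores in the example are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # malignant, called malignant
    fp: int  # benign, called malignant
    tn: int  # benign, called benign
    fn: int  # malignant, called benign

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    counts = ConfusionCounts(tp=90, fp=10, tn=880, fn=20)  # hypothetical counts
    print(counts.sensitivity(), counts.specificity(), counts.ppv(), counts.npv())
    print(auc_mann_whitney([0.9, 0.3], [0.5]))  # one win, one loss
```

The pairwise AUC loop is quadratic in the number of cases. It is fine for illustration; a rank-based formula is preferable for large datasets.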
Recently, approaches that use spatial-temporal features to form Bag-of-Words (BoW) models have achieved great success due to their simplicity and effectiveness. However, they still have difficulty distinguishing between actions with high inter-class ambiguity. The main reason is that they describe actions as an orderless bag of features and ignore the spatial and temporal structure of visual words. To improve classification performance, we present a novel approach called sequential Bag-of-Words. It captures temporal sequential structure by segmenting the entire action into sub-actions. Meanwhile, we pay more attention to the distinguishing parts of an action by classifying sub-actions separately; these classifications then vote for the final result. Extensive experiments are conducted on challenging datasets and real scenes to evaluate our method. Specifically, we compare our results with several state-of-the-art classification approaches and confirm the advantage of our approach in distinguishing similar actions. Results show that our approach is robust and outperforms most existing BoW-based classification approaches, especially on complex datasets with interactive activities, cluttered backgrounds, and inter-class action ambiguities.
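The segment-then-vote idea above can be sketched with visual words reduced to integer indices. In this minimal sketch, each action is split into `k` equal temporal segments and a normalized BoW histogram is built per segment. Each segment is classified against per-segment reference histograms, and the segment labels vote. The nearest-template (L1 distance) classifier, the equal-length segmentation, and the toy "sit down"/"stand up" vocabulary are assumptions for illustration, not the paper's classifier. The example shows two actions with identical global BoW histograms that the sequential version still separates.

```python
from collections import Counter


def bow_histogram(words, vocab_size):
    """Normalized histogram of visual-word indices (an orderless bag)."""
    hist = [0.0] * vocab_size
    for w in words:
        hist[w] += 1
    total = sum(hist) or 1.0
    return [v / total for v in hist]


def split_segments(words, k):
    """Cut a word sequence into k contiguous, roughly equal temporal segments."""
    n = len(words)
    return [words[i * n // k:(i + 1) * n // k] for i in range(k)]


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def fit_segment_models(train, k, vocab_size):
    """train: label -> word sequence; returns one {label: histogram} dict per segment."""
    models = [{} for _ in range(k)]
    for label, words in train.items():
        for s, seg in enumerate(split_segments(words, k)):
            models[s][label] = bow_histogram(seg, vocab_size)
    return models


def classify_sequential(words, models, vocab_size):
    """Classify each sub-action separately, then take a majority vote."""
    votes = Counter()
    for s, seg in enumerate(split_segments(words, len(models))):
        h = bow_histogram(seg, vocab_size)
        votes[min(models[s], key=lambda lab: l1(h, models[s][lab]))] += 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy vocabulary: 0 = standing pose, 1 = bending, 2 = seated.
    train = {"sit_down": [0, 0, 1, 1, 2, 2], "stand_up": [2, 2, 1, 1, 0, 0]}
    models = fit_segment_models(train, k=3, vocab_size=3)
    print(classify_sequential([2, 2, 1, 1, 0, 0], models, 3))
```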
As one of the most effective methods for improving the accuracy and robustness of speech tasks, audio-visual fusion has recently been introduced into Keyword Spotting (KWS). However, existing audio-visual keyword spotting models are limited to detecting isolated words, and keyword spotting in unconstrained speech remains a challenging problem. To this end, an Audio-Visual Keyword Transformer (AVKT) network is proposed to spot keywords in unconstrained video clips. The authors present a transformer classifier with learnable CLS tokens to extract distinctive keyword features from variable-length audio and visual inputs. The outputs of the audio and visual branches are combined in a decision fusion module. Just as humans can easily notice whether a keyword appears in a sentence, our AVKT network can detect whether a video clip containing a spoken sentence includes a pre-specified keyword. Moreover, the position of the keyword is localised in the attention map without additional position labels. Experimental results on the LRS2-KWS dataset and our newly collected PKU-KWS dataset show that the accuracy of AVKT exceeds 99% in clean scenes and 85% in extremely noisy conditions. The code is available at https://github.com/jialeren/AVKT.
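Two ingredients named above can be illustrated in miniature: a CLS-style query that attends over a variable-length sequence of frame features, and late (decision) fusion of the two branches. The sketch below is a heavily simplified assumption, not the AVKT architecture. It uses a single attention head with no learned projections and a fixed query vector standing in for a learned CLS token. The fusion is a weighted average of per-branch keyword probabilities, and `audio_weight` and `threshold` are hypothetical parameters. The attention weights show how a keyword's position can be read from an attention map.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cls_attention_pool(cls_query, frames):
    """Scaled dot-product attention of one query over T frames -> (pooled vector, weights)."""
    d = len(cls_query)
    scores = [sum(q * f for q, f in zip(cls_query, frame)) / math.sqrt(d) for frame in frames]
    weights = softmax(scores)
    pooled = [sum(w * frame[i] for w, frame in zip(weights, frames)) for i in range(d)]
    return pooled, weights


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fuse_decisions(audio_logit, visual_logit, audio_weight=0.5, threshold=0.5):
    """Decision-level fusion: weighted average of the two branch probabilities."""
    p = audio_weight * sigmoid(audio_logit) + (1.0 - audio_weight) * sigmoid(visual_logit)
    return p, p >= threshold


if __name__ == "__main__":
    frames = [[0.0, 0.0], [5.0, 5.0], [0.0, 0.0]]  # the "keyword" sits in frame 1
    _, attn = cls_attention_pool([1.0, 1.0], frames)
    print(max(range(len(attn)), key=attn.__getitem__))
    # In loud noise one might trust the visual branch more (lower audio_weight).
    print(fuse_decisions(-3.0, 3.0, audio_weight=0.2))
```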
The research progress of swarm robotics is reviewed in detail. Swarm robotics, inspired by nature, combines swarm intelligence and robotics and shows great potential in several respects. First, cooperation in natural swarms and swarm intelligence are briefly introduced, and the special features of swarm robotics, compared with a single robot and other multi-individual systems, are summarized. Then, modeling methods for swarm robotics are described through a list of widely used physical swarm robotics projects and simulation platforms. Finally, as the main part of this paper, current research on swarm robotic algorithms is presented in detail, including cooperative control mechanisms for flocking, navigation, and searching applications.
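Flocking, one of the applications listed above, is commonly illustrated with Reynolds-style rules. Cohesion steers toward neighbors, alignment matches their velocity, and separation avoids crowding. The sketch below is that classic model, not any specific algorithm from the review. The neighborhood radius, weights, and distances are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def flock_step(boids, radius=5.0, w_coh=0.01, w_ali=0.1, w_sep=0.05, min_dist=1.0, dt=1.0):
    """One synchronous update: every boid reacts to the previous state of its neighbors."""
    updated = []
    for b in boids:
        nbrs = [o for o in boids if o is not b and math.hypot(o.x - b.x, o.y - b.y) < radius]
        ax = ay = 0.0
        if nbrs:
            n = len(nbrs)
            # Cohesion: steer toward the neighbors' centroid.
            ax += w_coh * (sum(o.x for o in nbrs) / n - b.x)
            ay += w_coh * (sum(o.y for o in nbrs) / n - b.y)
            # Alignment: steer toward the neighbors' mean velocity.
            ax += w_ali * (sum(o.vx for o in nbrs) / n - b.vx)
            ay += w_ali * (sum(o.vy for o in nbrs) / n - b.vy)
            # Separation: push away from neighbors that are too close.
            for o in nbrs:
                d = math.hypot(o.x - b.x, o.y - b.y)
                if 0.0 < d < min_dist:
                    ax += w_sep * (b.x - o.x) / d
                    ay += w_sep * (b.y - o.y) / d
        vx, vy = b.vx + ax * dt, b.vy + ay * dt
        updated.append(Boid(b.x + vx * dt, b.y + vy * dt, vx, vy))
    return updated


if __name__ == "__main__":
    swarm = [Boid(0.0, 0.0, 1.0, 0.0), Boid(2.0, 0.0, 0.0, 1.0), Boid(1.0, 2.0, 0.5, 0.5)]
    for _ in range(20):
        swarm = flock_step(swarm)
    print([(round(b.vx, 2), round(b.vy, 2)) for b in swarm])
```

Each rule uses only local neighbor information, which reflects the decentralized control that distinguishes swarm robotics from centrally coordinated multi-robot systems.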
Fine-grained image classification, which aims to distinguish images with subtle differences, is challenging for two main reasons: the lack of sufficient training data for every class and the difficulty of learning discriminative features for representation. In this paper, to address these two issues, we propose a two-phase framework for recognizing images from unseen fine-grained classes, i.e., zero-shot fine-grained classification. In the first phase, feature learning, we fine-tune deep convolutional neural networks using the hierarchical semantic structure among fine-grained classes to extract discriminative deep visual features. Meanwhile, a domain adaptation structure is incorporated into the deep convolutional neural networks to avoid domain shift from training data to test data. In the second phase, label inference, a semantic directed graph is constructed over the attributes of fine-grained classes. Based on this graph, we develop a label propagation algorithm to infer the labels of images in the unseen classes. Experimental results on two benchmark datasets demonstrate that our model outperforms state-of-the-art zero-shot learning models. In addition, the features obtained by our feature learning model also yield significant gains when used by other zero-shot learning models, which shows the flexibility of our model in zero-shot fine-grained classification.
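The abstract does not give the propagation rule, so the sketch below uses a generic, well-known iteration as a stand-in: F <- alpha * P * F + (1 - alpha) * Y. Here P is the row-normalized (possibly directed) edge-weight matrix and Y holds one-hot seed labels. It illustrates how labels can flow from labeled nodes to unlabeled ones over a graph. It is not the authors' algorithm, and `alpha`, the iteration count, and the toy chain graph are assumptions.

```python
def propagate_labels(weights, seeds, num_classes, alpha=0.8, iters=100):
    """weights[i][j]: edge weight i -> j; seeds: node -> class index. Returns (labels, scores)."""
    n = len(weights)
    # Row-normalize so each node averages over its out-neighbors.
    P = []
    for row in weights:
        s = sum(row)
        P.append([w / s if s else 0.0 for w in row])
    Y = [[0.0] * num_classes for _ in range(n)]
    for node, cls in seeds.items():
        Y[node][cls] = 1.0
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(P[i][j] * F[j][c] for j in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(num_classes)] for i in range(n)]
    labels = [max(range(num_classes), key=lambda c: F[i][c]) for i in range(n)]
    return labels, F


if __name__ == "__main__":
    # Chain 0 - 1 - 2 - 3: node 0 seeded with class 0, node 3 with class 1.
    chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    labels, _ = propagate_labels(chain, {0: 0, 3: 1}, num_classes=2)
    print(labels)
```

With alpha below 1 the iteration is a contraction and converges. Each unlabeled node ends up with the class of the seed it is more strongly connected to.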
Tiny defect detection (TDD), which aims to perform quality control of printed circuit boards (PCBs), is a basic and essential task in the production of most electronic products. Although significant progress has been made in PCB defect detection, traditional methods still struggle to cope with complex and diverse PCBs. To deal with these problems, this article proposes a tiny defect detection network (TDD-Net) to improve the performance of PCB defect detection. In this method, the inherent multi-scale and pyramidal hierarchies of deep convolutional networks are exploited to construct feature pyramids. Compared with existing approaches, TDD-Net has three novel changes. First, reasonable anchors are designed using k-means clustering. Second, TDD-Net strengthens the relationship between feature maps from different levels and benefits from low-level structural information, which is suitable for tiny defect detection. Finally, considering the small and imbalanced dataset, online hard example mining is adopted throughout the training phase to improve the quality of region-of-interest (ROI) proposals and make more effective use of the data. Quantitative results on the PCB defect dataset show that the proposed method has better portability and achieves 98.90% mAP, outperforming the state of the art. The code will be made publicly available.
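The abstract says only that anchors are designed "using k-means clustering". A common choice in anchor-based detectors is to cluster ground-truth box sizes (w, h) with the distance 1 - IoU rather than Euclidean distance, so that large boxes do not dominate the clusters. This sketch assumes that choice; the abstract does not confirm it. It uses a deterministic area-spread initialization so that it is reproducible, and the function names and toy boxes are hypothetical.

```python
def iou_wh(a, b):
    """IoU of two boxes given as (width, height), both anchored at the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=100):
    """Cluster (w, h) pairs with distance 1 - IoU; returns k anchors sorted by area."""
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    # Deterministic init: k boxes spread evenly through the area-sorted list.
    centers = [tuple(map(float, ordered[i * len(ordered) // k])) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Smallest 1 - IoU is the same as largest IoU.
            nearest = max(range(k), key=lambda c: iou_wh(box, centers[c]))
            clusters[nearest].append(box)
        new_centers = [
            (sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl)) if cl else centers[c]
            for c, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda wh: wh[0] * wh[1])


if __name__ == "__main__":
    gt_sizes = [(2, 2), (2, 3), (3, 2), (20, 20), (22, 18), (18, 22)]  # toy defect boxes
    print(kmeans_anchors(gt_sizes, k=2))
```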
Various time-frequency (T-F) masks are being applied to sound source localization tasks, and deep learning has dramatically advanced T-F mask estimation. However, existing masks are usually designed for speech separation and are suitable only for single-channel signals. A novel complex-valued T-F mask is proposed that preserves the head-related transfer function (HRTF) and is customized for binaural sound source localization. In addition, because the convolutional neural network used to estimate the proposed mask takes binaural spectral information as both input and output, accurate binaural cues can be preserved. Conventional T-F masks emphasize T-F units dominated by a single speech source. In contrast, HRTF-reserved masks eliminate the speech component while keeping the direct propagation path. Thus, the estimated HRTF can be used to extract more reliable localization features for the final direction-of-arrival estimation. Hence, binaural sound source localization guided by the proposed T-F mask is robust in noisy and reverberant acoustic environments. The experimental results demonstrate that the new T-F mask is superior to conventional T-F masks and leads to better sound source localization performance in adverse environments.
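The HRTF-reserved mask target is not reproduced here. As a generic illustration of why a complex-valued mask matters for binaural cues, the sketch below compares two masks on a single time-frequency bin per ear: an ideal complex ratio mask (target divided by mixture) and a conventional real-valued magnitude mask. The values are toy numbers and the function names are hypothetical. The complex mask restores the target's interaural phase difference, while the magnitude mask keeps the mixture's phase and therefore a distorted binaural cue.

```python
import cmath


def ideal_complex_mask(target, mixture, eps=1e-12):
    """Per-bin complex ratio mask M = target / mixture (corrects magnitude and phase)."""
    return [t / m if abs(m) > eps else 0j for t, m in zip(target, mixture)]


def magnitude_mask(target, mixture, eps=1e-12):
    """Conventional real-valued mask |target| / |mixture| (keeps the mixture phase)."""
    return [abs(t) / abs(m) if abs(m) > eps else 0.0 for t, m in zip(target, mixture)]


def apply_mask(mask, mixture):
    return [g * x for g, x in zip(mask, mixture)]


def interaural_phase(left, right):
    """Per-bin interaural phase difference, phase(right * conj(left))."""
    return [cmath.phase(rb * lb.conjugate()) for lb, rb in zip(left, right)]


if __name__ == "__main__":
    left_t, right_t = [1 + 0j], [cmath.exp(-0.5j)]  # target: IPD of -0.5 rad
    left_x, right_x = [left_t[0] + 0.5j], [right_t[0] + 0.5]  # plus toy interference
    est_l = apply_mask(ideal_complex_mask(left_t, left_x), left_x)
    est_r = apply_mask(ideal_complex_mask(right_t, right_x), right_x)
    mag_l = apply_mask(magnitude_mask(left_t, left_x), left_x)
    mag_r = apply_mask(magnitude_mask(right_t, right_x), right_x)
    print(interaural_phase(est_l, est_r), interaural_phase(mag_l, mag_r))
```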
This article proposes a deep neural network (DNN)-based direct-path relative transfer function (DP-RTF) enhancement method for robust direction-of-arrival (DOA) estimation in noisy and reverberant environments. The DP-RTF is the ratio between the direct-path acoustic transfer functions of the two microphone channels. First, the complex-valued DP-RTF is decomposed, in the time-frequency domain, into the inter-channel intensity difference and sinusoidal functions of the inter-channel phase difference. Then, the decomposed DP-RTF features from a series of temporal context frames are used to train a DNN model. The model maps DP-RTF features contaminated by noise and reverberation to clean ones, and also provides a time-frequency (TF) weight indicating the reliability of the mapping. The DP-RTF enhancement network thus helps enhance the DP-RTF against noise and reverberation. Finally, the DOA of a sound source is estimated by integrating the weighted matching between the enhanced DP-RTF features and DP-RTF templates. Experimental results on simulated data show the superiority of the proposed DP-RTF enhancement network for estimating the DOA of a sound source in environments with various levels of noise and reverberation.
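The feature decomposition and template matching steps above can be sketched without the DNN. For each frequency bin, the ratio x2/x1 is turned into an intensity difference in dB plus the sine and cosine of the phase difference. The sine/cosine pair avoids 2π phase-wrapping discontinuities. The DOA is then the template with the smallest weighted squared distance. Several pieces here are illustrative assumptions: the free-field two-microphone delay model used to build templates, the 0.2 m spacing, the per-bin weights, and the function names. The weights stand in for the network's TF reliability output.

```python
import cmath
import math


def simulate_direct_path(x1, freqs, gain, delay_s):
    """Hypothetical free-field model: mic 2 = gain * mic 1 delayed by delay_s seconds."""
    return [gain * a * cmath.exp(-2j * math.pi * f * delay_s) for a, f in zip(x1, freqs)]


def rtf_features(x1, x2, eps=1e-12):
    """Per-bin [intensity difference (dB), sin(IPD), cos(IPD)] of the ratio x2 / x1."""
    feats = []
    for a, b in zip(x1, x2):
        ild_db = 20.0 * math.log10((abs(b) + eps) / (abs(a) + eps))
        ipd = cmath.phase(b * a.conjugate())  # phase of x2 / x1 without dividing
        feats.append([ild_db, math.sin(ipd), math.cos(ipd)])
    return feats


def build_templates(freqs, angles_deg, mic_distance=0.2, c=343.0):
    """Clean features for each candidate angle under the free-field model."""
    x1 = [1.0 + 0j] * len(freqs)
    return {ang: rtf_features(x1, simulate_direct_path(
        x1, freqs, 1.0, mic_distance * math.sin(math.radians(ang)) / c)) for ang in angles_deg}


def match_doa(observed, templates, weights=None):
    """Angle whose template minimizes the per-bin weighted squared feature distance."""
    weights = weights or [1.0] * len(observed)

    def cost(template):
        return sum(w * sum((o - t) ** 2 for o, t in zip(fo, ft))
                   for w, fo, ft in zip(weights, observed, template))

    return min(templates, key=lambda angle: cost(templates[angle]))


if __name__ == "__main__":
    freqs = [250.0, 500.0, 1000.0]
    templates = build_templates(freqs, [-60, -30, 0, 30, 60])
    print(match_doa(templates[30], templates))
```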
In robot binaural sound source localization (SSL), locating the direction of the sound source accurately in the shortest time is important. This depends on algorithm complexity, but even more on the minimum duration of signal required. A novel binaural SSL method based on feature and frequency weighting is proposed. More specifically, in the training stage, the direction-related interaural cross-correlation function (CCF) and interaural intensity difference (IID) in each frequency band are calculated under noiseless conditions and used as templates. In the testing stage, the cosine similarities between the CCF and IID of the test signal and the templates are first calculated for all features and frequency bands. Then, the direction likelihood is obtained by weighting the similarities. Finally, the direction with the maximum likelihood is taken as the direction of the sound source. Experiments were carried out on subject 003 of the CIPIC dataset with different noises from the NOISEX-92 dataset. They demonstrated that the method can accurately locate the sound source with a short signal duration.
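The testing stage described above can be sketched as weighted template matching. Cosine similarity is computed per feature and per band, the similarities are combined with feature weights and band weights into a likelihood, and the direction with the maximum likelihood is returned. The abstract does not say how IID is vectorized or how the weights are chosen. Here every feature in every band is assumed to be a vector (for example, the CCF over lags), and the weights, directions, and toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def localize(test, templates, feature_weights, band_weights):
    """test[feature][band] -> vector; templates[direction][feature][band] -> vector."""
    def likelihood(tmpl):
        return sum(feature_weights[f] * band_weights[b] * cosine(test[f][b], tmpl[f][b])
                   for f in test for b in range(len(test[f])))

    scores = {d: likelihood(t) for d, t in templates.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    templates = {
        0: {"ccf": [[1, 0, 0], [0, 1, 0]], "iid": [[1, 0], [1, 0]]},
        90: {"ccf": [[0, 0, 1], [0, 0, 1]], "iid": [[0, 1], [0, 1]]},
    }
    noisy_90 = {"ccf": [[0.1, 0, 1], [0, 0.2, 1]], "iid": [[0.1, 1], [0, 1]]}
    best, scores = localize(noisy_90, templates, {"ccf": 0.7, "iid": 0.3}, [0.5, 0.5])
    print(best, scores)
```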
The computer virus is considered one of the most serious threats to the security of computer systems worldwide. The rapid development of the evasion techniques used by viruses renders signature-based virus detection ineffective. Many novel computer virus detection approaches have been proposed to cope with this ineffectiveness, mainly falling into three categories: static, dynamic, and heuristic techniques. Owing to the natural similarities between the biological immune system (BIS) and the computer security system (CSS), the artificial immune system (AIS) was developed as a new prototype in the anti-virus research community. The immune mechanisms of the BIS make it possible to construct computer virus detection models that are robust and adaptive and able to detect unseen viruses. In this paper, a variety of classic computer virus detection approaches are introduced and reviewed against the background of computer virus history. Next, a variety of immune-based computer virus detection approaches are discussed in detail. Promising experimental results suggest that immune-based approaches are able to detect new variants and unseen viruses at lower false positive rates, which has paved a new way for anti-virus research.
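The negative selection algorithm is a classic building block of immune-based detectors. In the classic formulation, candidate detectors that match any "self" (normal) pattern are discarded, and a sample is flagged as non-self if any surviving detector matches it. The sketch below uses binary strings and the r-contiguous-bits matching rule. To keep it deterministic for a toy string length, it enumerates all candidate detectors instead of sampling them at random. The self set, string length, and r are illustrative values, not taken from the reviewed papers.

```python
from itertools import product


def r_contiguous_match(a, b, r):
    """True if strings a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, length, r):
    """Keep every candidate string that matches no self string (censoring step)."""
    detectors = []
    for bits in product("01", repeat=length):
        candidate = "".join(bits)
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors


def is_nonself(sample, detectors, r):
    """Monitoring step: flag the sample if any detector matches it."""
    return any(r_contiguous_match(sample, d, r) for d in detectors)


if __name__ == "__main__":
    normal = {"000000", "000111"}  # toy "self" signatures
    dets = generate_detectors(normal, length=6, r=4)
    print(len(dets), is_nonself("000000", dets, 4), is_nonself("111000", dets, 4))
```

Because matching is symmetric, no self string can ever be flagged. Unseen strings are flagged only if the censored detector set happens to cover them, which is the source of the approach's ability to detect novel patterns.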
2D-to-3D video conversion is a feasible way to generate 3D programs for the current 3DTV industry. However, for large-scale 3D video production, current systems are no longer adequate in terms of the time and labor required for conversion. In this paper, we introduce a distributed 2D-to-3D video conversion system that includes a 2D-to-3D video conversion module, an architecture for parallel computation on the cloud, and 3D video coding. The system enables multiple users to cooperate in completing their conversion tasks simultaneously, which greatly improves conversion efficiency. In the experiments, we evaluate the system on criteria related to both time consumption and video coding performance.
Simultaneous localization and mapping (SLAM) has attracted considerable research interest from the robotics and computer-vision communities for more than 30 years. With steady and progressive efforts, modern SLAM systems now support robust, online applications in real-world scenes. We examined the evolution of this powerful perception tool in detail and noticed that the insights concerning incremental computation and temporal guidance have been persistently retained. Herein, we call this temporal continuity a flow basis and present, for the first time, a survey that specifically focuses on this flow-based nature, ranging from geometric computation to emerging learning techniques. We start by reviewing two essential stages of geometric computation, presenting the de facto standard pipeline and problem formulation, along with the use of temporal cues. Recently emerging techniques are then summarized, covering a wide range of areas such as learning techniques, sensor fusion, and continuous-time trajectory modeling. This survey aims to draw attention to how robust SLAM systems benefit from their continuously observing nature, as well as to topics worthy of further investigation for better use of temporal cues.
Indoor multi-target tracking (multi-tracking) is more challenging than outdoor tracking due to frequent occlusion, view truncation, severe scale change, and pose variation. These can introduce considerable unreliability and ambiguity into target representation and data association. Discriminative and reliable target representation is therefore vital for accurate data association in multi-tracking. Previous works typically combine many features to increase discriminative power. However, this is prone to error accumulation and unnecessary computational cost, and may instead increase ambiguity. Moreover, the reliability of the same feature can vary greatly across scenes. This is especially true for today's widespread network cameras, which are installed in varied and complex indoor scenes, so previous fixed feature selection schemes cannot meet general requirements. To handle these problems, we first propose a scene-adaptive hierarchical data association scheme. It adaptively selects the features that are most reliable for target representation in the given scene and gradually combines features until the minimum needed to discriminate ambiguous targets is reached. Second, a novel depth-invariant, part-based appearance model using RGB-D data is proposed, which makes the appearance model robust to scale change, partial occlusion, and view truncation. The introduction of RGB-D data increases the diversity of features, which provides more feature types for selection in data association and enhances the final multi-tracking performance.
We validate our method in various indoor scenes from several aspects, including the scene-adaptive feature selection scheme, the hierarchical data association scheme, and the RGB-D-based appearance modeling scheme. The results demonstrate its effectiveness and efficiency in improving multi-tracking performance.
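The abstract does not give the association algorithm in detail. The sketch below is one hypothetical reading of "gradually combining features to the minimum requirement". Features are tried in order of a per-scene reliability score, and a detection is assigned as soon as its best track beats the runner-up by a normalized cost margin. It matches each detection greedily and independently, with no one-to-one constraint, and it requires at least one track. All names, distance functions, reliability values, and the margin are assumptions.

```python
def associate(detections, tracks, features, reliability, margin=0.2):
    """features: name -> distance(det, track) in [0, 1]. Returns det index -> (track, n_features)."""
    order = sorted(features, key=lambda f: reliability[f], reverse=True)
    assignment = {}
    for d_idx, det in enumerate(detections):
        costs = [0.0] * len(tracks)
        used = 0
        for name in order:
            used += 1
            for t_idx, trk in enumerate(tracks):
                costs[t_idx] += features[name](det, trk)
            ranked = sorted(range(len(tracks)), key=lambda t: costs[t])
            # Stop adding features once the best match is unambiguous.
            if len(ranked) == 1 or (costs[ranked[1]] - costs[ranked[0]]) / used >= margin:
                break
        assignment[d_idx] = (ranked[0], used)
    return assignment


if __name__ == "__main__":
    feats = {
        "pos": lambda d, t: min(1.0, abs(d["pos"] - t["pos"]) / 10.0),
        "color": lambda d, t: 0.0 if d["color"] == t["color"] else 1.0,
    }
    tracks = [{"pos": 0, "color": "red"}, {"pos": 1, "color": "blue"}, {"pos": 50, "color": "red"}]
    dets = [{"pos": 0.5, "color": "blue"}, {"pos": 49, "color": "red"}]
    print(associate(dets, tracks, feats, reliability={"pos": 0.9, "color": 0.6}))
```

In the example, the first detection is ambiguous by position alone and needs a second feature, while the second detection is resolved by position only. This mirrors the idea of using no more features than needed.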
Correction to: Zero-shot Fine-grained Classification by Deep Feature Learning with Semantics. DOI: 10.1007/s11633-019-1177-8. Authors: Ao-Xue Li, Ke-Xin Zhang, Li-Wei Wang. The article "Zero-shot Fine-grained Classification by Deep Feature Learning with Semantics", written by Ao-Xue Li, Ke-Xin Zhang, and Li-Wei Wang, was originally published in vol. 16, no. 5 of International Journal of Automation and Computing without Open Access. After publication, the authors decided to opt for Open Choice and make the article an Open Access publication.
Funding (multisensory perception review): Supported by the Natural Science Foundation of China (NSFC 61527804), NSFC and the German Research Foundation (DFG) in Project Crossmodal Learning (NSFC 61621136008/DFG TRR-169), and a research fund from the brain lab, TAL Education Group, China.
Funding (consensus of high-order agents): Supported by the National Natural Science Foundation of China (Nos. 60674050, 60736022, 10972002, 60774089, 60704039).
Funding (separation of touching objects): Supported by the Scientific Research Start-up Foundation of Ningbo University (No. 2004037) and the Zhejiang Provincial Foundation for Returned Overseas Students and Scholars (No. 2004884).
Funding (ECG review): This research was supported in part by the National Natural Science Foundation of China, the Research Funds of China Space Medical Engineering, and the State Key Laboratory of Space Medicine Fundamentals and Applications, China Astronaut Research and Training Centre.
Funding: Supported by the National Basic Research Development Program of China (2009CB320901, 2011CB707805, 2013CB329304), the National Natural Science Foundation of China (31170985, 91120001, 61121002), and "985" project grants from Peking University.
Abstract: According to the Motor Theory of speech perception, the interaction between the auditory and motor systems plays an essential role in speech perception. Since it was proposed, the Motor Theory has received considerable attention in the field. However, each of the theory's three hypotheses still needs further verification. In this review, we focus on how auditory-motor anatomical and functional associations contribute to speech perception, and we discuss why previous studies have failed to reach agreement, particularly on whether motor-system involvement in speech perception is task-load dependent. Finally, we suggest that the auditory-motor link is particularly useful for speech perception under adverse listening conditions and that a further revised Motor Theory is a potential solution to the "cocktail-party" problem.
Funding: This work was supported by the National Natural Science Foundation of China (grant Nos. 60472100, 60672073), the Natural Science Foundation of Zhejiang Province (grants RC01057, Y105577), and the Key Project of the Chinese Ministry of Education (grant No. 206059).
Abstract: This paper discusses autofocusing based on the contourlet transform and proposes an autofocusing method for images whose information is concentrated in particular directions. Experimental results show that the proposed method focuses accurately and that its sensitivity ratio is higher than those of other autofocusing methods based on conventional image processing.
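The contourlet transform itself is not available in NumPy, so the sketch below swaps in a much simpler stand-in: mean squared finite differences along four directions play the role of directional subband energies, and the frame whose dominant direction carries the most energy is taken as in focus. The function names and the max-over-directions rule are assumptions for illustration, not the paper's method.

```python
import numpy as np

def directional_energies(img):
    """Mean squared first difference along 0, 45, 90 and 135 degrees (contourlet stand-in)."""
    img = np.asarray(img, dtype=float)
    diffs = {
        0: img[:, 1:] - img[:, :-1],
        45: img[:-1, 1:] - img[1:, :-1],
        90: img[1:, :] - img[:-1, :],
        135: img[1:, 1:] - img[:-1, :-1],
    }
    return {angle: float(np.mean(d ** 2)) for angle, d in diffs.items()}

def focus_measure(img):
    # score the dominant direction, where strongly oriented images keep most of their detail
    return max(directional_energies(img).values())

def autofocus(stack):
    """Index of the sharpest frame in a focal stack."""
    return int(np.argmax([focus_measure(frame) for frame in stack]))
```

Taking the maximum over directions, rather than the sum, is meant to mirror the paper's focus on images with strongly directional content; a real contourlet-based measure would replace `directional_energies`.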
Funding: Supported by the Beijing Municipal Science & Technology Commission (No. Z181100001918001), the Beijing Municipal Administration of Hospitals Clinical Medicine Development of Special Funding Support (No. ZYLX201803), the Beijing Hospitals Authority Ascent Plan (No. DFL20191103), and the Beijing Municipal Administration of Hospitals Incubating Program (No. PX2018041).
Abstract: Objective: Computer-aided diagnosis using deep learning algorithms has begun to be applied in mammography, but it has not yet seen large-scale clinical application. Methods: This study developed and validated an artificial intelligence model based on mammography. First, mammograms retrospectively collected from six centers were randomly assigned to a training dataset and a validation dataset to establish the model. Second, the model was tested by comparing the performance of 12 radiologists with and without it. Finally, prospectively enrolled women with mammograms from six centers were diagnosed by radiologists with the model. Detection and diagnostic capabilities were evaluated using the free-response receiver operating characteristic (FROC) curve and the ROC curve. Results: After matching, the model's sensitivity for detecting lesions was 0.908 at a false positive rate of 0.25 in unilateral images. The area under the ROC curve (AUC) for distinguishing benign from malignant lesions was 0.855 [95% confidence interval (95% CI): 0.830, 0.880]. The performance of the 12 radiologists with the model was higher than that of the radiologists alone (AUC: 0.852 vs. 0.805, P = 0.005). The mean reading time with the model was shorter than that of reading alone (62.28 s vs. 80.18 s, P = 0.032). In prospective application, the detection sensitivity reached 0.887 at a false positive rate of 0.25; the AUC of radiologists with the model was 0.983 (95% CI: 0.978, 0.988), with sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 94.36%, 98.07%, 87.76%, and 99.09%, respectively. Conclusions: The artificial intelligence model exhibits high accuracy in detecting and diagnosing breast lesions, improves diagnostic accuracy, and saves time.
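The reported sensitivity, specificity, PPV and NPV follow directly from a 2x2 confusion matrix, and the ROC AUC equals the probability that a randomly chosen malignant case is scored higher than a benign one. A small worked sketch; the counts and scores used in the checks are hypothetical, not data from the study.

```python
import numpy as np

def diagnostic_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics (malignant = positive)."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

def auc(pos_scores, neg_scores):
    """ROC AUC as P(malignant score > benign score), counting ties as 1/2."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))
```

For example, 90 true positives, 20 false positives, 80 true negatives and 10 false negatives give a sensitivity of 0.9 and a specificity of 0.8, while PPV (90/110) also depends on disease prevalence in the sample.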
Abstract: Recently, approaches that use spatial-temporal features to form Bag-of-Words (BoW) models have achieved great success owing to their simplicity and effectiveness. However, they still have difficulty distinguishing between actions with high inter-class ambiguity. The main reason is that they describe actions as an orderless bag of features and ignore the spatial and temporal structure of the visual words. To improve classification performance, we present a novel approach called sequential Bag-of-Words. It captures temporal sequential structure by segmenting the entire action into sub-actions. Meanwhile, we pay more attention to the distinguishing parts of an action by classifying the sub-actions separately, and these classifications then vote for the final result. Extensive experiments are conducted on challenging datasets and real scenes to evaluate our method. Concretely, we compare our results with several state-of-the-art classification approaches and confirm the advantages of our approach in distinguishing similar actions. Results show that our approach is robust and outperforms most existing BoW-based classification approaches, especially on complex datasets with interactive activities, cluttered backgrounds, and inter-class action ambiguities.
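A minimal sketch of the sequential idea, assuming equal-length temporal segmentation, per-segment nearest-template classification, and majority voting; the paper's actual segmentation, classifiers, and voting scheme may differ, and all names here are hypothetical. The checks use two toy actions whose orderless histograms are identical but whose sub-action order differs.

```python
import numpy as np
from collections import Counter

def bow_histogram(words, vocab_size):
    """Normalised bag-of-words histogram of visual-word indices."""
    h = np.bincount(np.asarray(words, dtype=int), minlength=vocab_size).astype(float)
    return h / h.sum() if h.sum() > 0 else h

def sub_action_histograms(words, n_segments, vocab_size):
    # split the time-ordered visual words into equal-length sub-actions
    parts = np.array_split(np.asarray(words, dtype=int), n_segments)
    return [bow_histogram(part, vocab_size) for part in parts]

def classify(words, templates, n_segments, vocab_size):
    """templates: label -> list of per-sub-action histograms of a training example."""
    votes = []
    for k, h in enumerate(sub_action_histograms(words, n_segments, vocab_size)):
        # each sub-action is classified on its own by its nearest template sub-action
        votes.append(min(templates, key=lambda label: np.linalg.norm(h - templates[label][k])))
    return Counter(votes).most_common(1)[0][0]  # majority vote over sub-actions
```

A single orderless histogram cannot separate "sit down" from "stand up" in the toy example, while the per-segment votes can.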
Funding: Science and Technology Plan of Shenzhen, Grant/Award Number: JCYJ20200109140410340; National Natural Science Foundation of China, Grant/Award Number: 62073004.
Abstract: As one of the most effective methods for improving the accuracy and robustness of speech tasks, the audio-visual fusion approach has recently been introduced into the field of Keyword Spotting (KWS). However, existing audio-visual keyword spotting models are limited to detecting isolated words, and keyword spotting in unconstrained speech remains a challenging problem. To this end, an Audio-Visual Keyword Transformer (AVKT) network is proposed to spot keywords in unconstrained video clips. The authors present a transformer classifier with learnable CLS tokens to extract distinctive keyword features from variable-length audio and visual inputs. The outputs of the audio and visual branches are combined in a decision fusion module. Just as humans can easily notice whether a keyword appears in a sentence, our AVKT network can detect whether a video clip of a spoken sentence contains a pre-specified keyword. Moreover, the position of the keyword is localised in the attention map without additional position labels. Experimental results on the LRS2-KWS dataset and our newly collected PKU-KWS dataset show that the accuracy of AVKT exceeds 99% in clean scenes and 85% in extremely noisy conditions. The code is available at https://github.com/jialeren/AVKT.
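The decision fusion and attention-based localization steps might look roughly like the following. This is a hypothetical sketch: the fixed weighted average of branch probabilities, the relative attention threshold, and the function names are assumptions rather than the AVKT design (see the linked repository for the actual code).

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - np.max(logits))  # subtract the max for numerical stability
    return z / z.sum()

def fuse(audio_logits, visual_logits, audio_weight=0.5):
    """Decision-level fusion: weighted average of per-branch class probabilities."""
    pa = softmax(np.asarray(audio_logits, dtype=float))
    pv = softmax(np.asarray(visual_logits, dtype=float))
    return audio_weight * pa + (1.0 - audio_weight) * pv

def localise_keyword(cls_attention, rel_threshold=0.5):
    """First and last frame whose CLS attention is at least rel_threshold times the peak."""
    a = np.asarray(cls_attention, dtype=float)
    idx = np.flatnonzero(a >= rel_threshold * a.max())
    return int(idx[0]), int(idx[-1])
```

In noisy audio, lowering `audio_weight` would shift trust toward the visual branch; a learned fusion module could set this per clip instead.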
Funding: Sponsored by the National Natural Science Foundation of China under Grants 61170057 and 60875080.
Abstract: The research progress of swarm robotics is reviewed in detail. Swarm robotics, inspired by nature, combines swarm intelligence and robotics and shows great potential in several respects. First, cooperation in natural swarms and swarm intelligence are briefly introduced, and the special features of swarm robotics, compared with a single robot and with other multi-individual systems, are summarized. Then, the modeling methods for swarm robotics are described through a list of widely used physical swarm robotics projects and simulation platforms. Finally, as the main part of this paper, current research on swarm robotic algorithms is presented in detail, including cooperative control mechanisms in swarm robotics for flocking, navigation, and searching applications.
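As an illustration of the kind of cooperative flocking mechanism such surveys cover, the classic Reynolds boids rules (separation, alignment, cohesion) can be written in a few lines of NumPy. The weights, neighbourhood radius and function name are arbitrary assumptions for this sketch, not values from the paper.

```python
import numpy as np

def boids_step(pos, vel, radius=5.0, w_sep=0.05, w_ali=0.1, w_coh=0.01, dt=1.0):
    """One synchronous update of Reynolds' separation/alignment/cohesion rules."""
    new_vel = vel.copy()
    for i in range(len(pos)):
        dist = np.linalg.norm(pos - pos[i], axis=1)
        nbr = (dist > 0) & (dist < radius)
        if not nbr.any():
            continue
        cohesion = pos[nbr].mean(axis=0) - pos[i]       # steer toward the local centre
        alignment = vel[nbr].mean(axis=0) - vel[i]      # match neighbours' velocity
        close = nbr & (dist < radius / 2)
        separation = (pos[i] - pos[close]).sum(axis=0)  # push away from crowding neighbours
        new_vel[i] = vel[i] + w_coh * cohesion + w_ali * alignment + w_sep * separation
    return pos + dt * new_vel, new_vel
```

Each robot uses only local information (neighbours within `radius`), which is the property that lets such rules scale to large swarms without central control.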
Funding: Supported by the National Basic Research Program of China (973 Program) (No. 2015CB352502), the National Natural Science Foundation of China (No. 61573026), and the Beijing Natural Science Foundation (No. L172037).
Abstract: Fine-grained image classification, which aims to distinguish images with subtle differences, is challenging for two main reasons: the lack of sufficient training data for every class and the difficulty of learning discriminative features for representation. In this paper, to address both issues, we propose a two-phase framework for recognizing images from unseen fine-grained classes, i.e., zero-shot fine-grained classification. In the first (feature learning) phase, we fine-tune deep convolutional neural networks using the hierarchical semantic structure among fine-grained classes to extract discriminative deep visual features. Meanwhile, a domain adaptation structure is incorporated into the deep convolutional neural networks to avoid domain shift from training data to test data. In the second (label inference) phase, a semantic directed graph is constructed over the attributes of fine-grained classes. Based on this graph, we develop a label propagation algorithm to infer the labels of images in the unseen classes. Experimental results on two benchmark datasets demonstrate that our model outperforms state-of-the-art zero-shot learning models. In addition, the features obtained by our feature learning model also yield significant gains when used by other zero-shot learning models, which shows the flexibility of our model in zero-shot fine-grained classification.
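The label inference phase builds on graph-based label propagation. The sketch below shows the widely used symmetric-normalized iteration on an undirected graph; it is not the authors' algorithm, which operates on a semantic directed graph over class attributes, and `alpha`, `iters`, and the toy graph are assumptions.

```python
import numpy as np

def label_propagation(W, Y, alpha=0.9, iters=200):
    """Iterate F <- alpha * S F + (1 - alpha) * Y with S = D^-1/2 W D^-1/2."""
    W = np.asarray(W, dtype=float)
    Y = np.asarray(Y, dtype=float)  # one row per node; one-hot for labelled, zeros otherwise
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard against isolated nodes
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1.0 - alpha) * Y  # spread scores, then re-inject known labels
    return F.argmax(axis=1)
```

On a four-node chain with the two ends labelled differently, each unlabelled node takes the label of the nearer end.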
Abstract: Tiny defect detection (TDD), which aims to perform quality control of printed circuit boards (PCBs), is a basic and essential task in the production of most electronic products. Although significant progress has been made in PCB defect detection, traditional methods still struggle to cope with complex and diverse PCBs. To deal with these problems, this article proposes a tiny defect detection network (TDD-Net) to improve the performance of PCB defect detection. In this method, the inherent multi-scale and pyramidal hierarchies of deep convolutional networks are exploited to construct feature pyramids. Compared with existing approaches, TDD-Net has three novel changes. First, reasonable anchors are designed using k-means clustering. Second, TDD-Net strengthens the relationship between feature maps from different levels and benefits from low-level structural information, which is suitable for tiny defect detection. Finally, given the small and imbalanced dataset, online hard example mining is adopted throughout the training phase to improve the quality of region-of-interest (ROI) proposals and make more effective use of the data. Quantitative results on the PCB defect dataset show that the proposed method has better portability and achieves 98.90% mAP, outperforming state-of-the-art methods. The code will be made publicly available.
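The abstract says only that anchors are designed with k-means clustering. A common way to do this, popularized by YOLOv2, clusters ground-truth box widths and heights with the distance d = 1 - IoU; the sketch below assumes that metric, and the function names and toy boxes are hypothetical.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU of (w, h) pairs, treating each box and anchor as sharing one centre."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_a = (anchors[:, 0] * anchors[:, 1])[None, :]
    return inter / (area_b + area_a - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # nearest anchor under d = 1 - IoU is the anchor with the highest IoU
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j) else anchors[j]
                        for j in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]  # sorted by area
```

Using 1 - IoU instead of Euclidean distance keeps small boxes, such as tiny PCB defects, from being dominated by large ones when cluster centres are formed.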
Funding: National Natural Science Foundation of China, Grant/Award Numbers: 61673030, U1613209; National Natural Science Foundation of Shenzhen, Grant/Award Number: JCYJ20190808182209321.
Abstract: Various time-frequency (T-F) masks are being applied to sound source localization tasks, and deep learning has dramatically advanced T-F mask estimation. However, existing masks are usually designed for speech separation tasks and are suitable only for single-channel signals. A novel complex-valued T-F mask that reserves the head-related transfer function (HRTF) is proposed, customized for binaural sound source localization. In addition, because the convolutional neural network used to estimate the proposed mask takes binaural spectral information as both input and output, accurate binaural cues can be preserved. Compared with conventional T-F masks that emphasize T-F units dominated by a single speech source, HRTF-reserved masks eliminate the speech component while keeping the direct propagation path. Thus, the estimated HRTF can be used to extract more reliable localization features for the final direction-of-arrival estimation. Hence, binaural sound source localization guided by the proposed T-F mask is robust in noisy and reverberant acoustic environments. The experimental results demonstrate that the new T-F mask is superior to conventional T-F masks and leads to better sound source localization performance in adverse environments.
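One way to read "eliminate the speech component while keeping the direct propagation path" is an ideal complex mask whose product with each ear's noisy STFT returns the direct-path transfer function. The sketch below assumes that reading and a time-invariant direct path; it is not the paper's mask definition or network, and the function names are hypothetical.

```python
import numpy as np

def ideal_hrtf_mask(mixture, direct_path, eps=1e-12):
    """Complex mask M with M * mixture = direct-path transfer function (assumed target)."""
    return direct_path / (mixture + eps)

def apply_mask(mask, mixture):
    return mask * mixture

def interaural_cues(est_left, est_right, eps=1e-12):
    """Interaural level difference (dB) and phase difference per T-F unit."""
    ratio = est_left / (est_right + eps)
    return 20.0 * np.log10(np.abs(ratio) + eps), np.angle(ratio)
```

In practice the network would only approximate such a mask; the point of the sketch is that both ears' masked outputs keep the direct-path phase and level relationship from which localization cues are computed.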
Funding: Supported by the National Natural Science Foundation of China (No. 61673030, U1613209) and the Science and Technology Plan Project of Shenzhen (No. JCYJ20200109140410340).
Abstract: This article proposes a deep neural network (DNN)-based direct-path relative transfer function (DP-RTF) enhancement method for robust direction of arrival (DOA) estimation in noisy and reverberant environments. The DP-RTF refers to the ratio between the direct-path acoustic transfer functions of the two microphone channels. First, the complex-valued DP-RTF is decomposed into the inter-channel intensity difference and sinusoidal functions of the inter-channel phase difference in the time-frequency domain. Then, the decomposed DP-RTF features from a series of temporal context frames are used to train a DNN model that maps DP-RTF features contaminated by noise and reverberation to clean ones, while also providing a time-frequency (TF) weight that indicates the reliability of the mapping. The DP-RTF enhancement network thus helps enhance the DP-RTF against noise and reverberation. Finally, the DOA of a sound source is estimated by integrating the weighted matching between the enhanced DP-RTF features and the DP-RTF templates. Experimental results on simulated data show the superiority of the proposed DP-RTF enhancement network in estimating the DOA of the sound source in environments with various levels of noise and reverberation.
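The feature decomposition and weighted template matching can be sketched as follows. The sketch assumes the ratio is channel 2 over channel 1, the intensity difference is in dB, and matching uses a Gaussian-style similarity weighted by a per-bin reliability; these choices, the names, and the toy delay-and-attenuation model in the checks are assumptions, and the DNN enhancement stage is omitted.

```python
import numpy as np

def dp_rtf_features(X1, X2, eps=1e-12):
    """Decompose the complex ratio X2/X1 into [IID (dB), sin(IPD), cos(IPD)] per T-F bin."""
    rtf = X2 / (X1 + eps)
    iid = 20.0 * np.log10(np.abs(rtf) + eps)
    ipd = np.angle(rtf)
    return np.stack([iid, np.sin(ipd), np.cos(ipd)], axis=-1)

def estimate_doa(features, weights, templates):
    """Template DOA with the largest reliability-weighted similarity."""
    scores = {}
    for doa, tpl in templates.items():
        sq_err = np.sum((features - tpl) ** 2, axis=-1)         # per T-F bin
        scores[doa] = float(np.sum(weights * np.exp(-sq_err)))  # hypothetical similarity kernel
    return max(scores, key=scores.get)
```

Encoding the phase difference as its sine and cosine avoids the 2-pi wrap-around that would make raw phase targets discontinuous for a regression network.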
Abstract: In robot binaural sound source localization (SSL), locating the direction of the sound source accurately in the shortest time is important. This depends on the algorithm complexity, but even more on the minimum signal duration required. A novel binaural SSL method based on feature and frequency weighting is proposed. More specifically, in the training stage, the direction-related interaural cross-correlation function (CCF) and interaural intensity difference (IID) in each frequency band are calculated under noiseless conditions and used as templates. In the testing stage, the cosine similarities between the CCF and IID of the test signal and those of the templates are first calculated for all features and frequency bands. Then, the direction likelihood is obtained by weighting the similarities. Finally, the direction with the maximum likelihood is taken as the direction of the sound source. Experiments were carried out on subject 003 of the CIPIC dataset with different noises from the noisex-92 dataset and demonstrated that the method can accurately locate the sound source with a short signal duration.
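The testing stage reduces to a weighted sum of per-band cosine similarities followed by an argmax over candidate directions. In this sketch each feature in each band is assumed to be a vector (the CCF over lags; the IID over the frequency bins in that band), and the feature and band weights are passed in; how the paper chooses them is not reproduced here, and all names are hypothetical.

```python
import numpy as np

def cosine(a, b, eps=1e-12):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def localise_binaural(test, templates, feature_weights, band_weights):
    """
    test:      feature name -> array (n_bands, dim), e.g. "ccf" over lags, "iid" over bins
    templates: direction -> dict in the same layout, measured without noise
    Returns the direction with the largest weighted sum of per-band cosine similarities.
    """
    likelihood = {}
    for direction, tpl in templates.items():
        total = 0.0
        for name, w_feat in feature_weights.items():
            for b, w_band in enumerate(band_weights):
                total += w_feat * w_band * cosine(test[name][b], tpl[name][b])
        likelihood[direction] = total
    return max(likelihood, key=likelihood.get)
```

Cosine similarity ignores overall scale, which makes the match less sensitive to source loudness; the weights let bands that are less corrupted by noise count for more.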
Funding: National Natural Science Foundation of China (Nos. 61170057, 60875080)
Abstract: The computer virus is considered one of the most serious threats to the security of computer systems worldwide. The rapid development of the evasion techniques used by viruses renders signature-based computer virus detection ineffective. Many novel computer virus detection approaches have been proposed to cope with this ineffectiveness; they fall mainly into three categories: static, dynamic, and heuristic techniques. Owing to the natural similarities between the biological immune system (BIS) and the computer security system (CSS), the artificial immune system (AIS) has been developed as a new paradigm in the anti-virus research community. The immune mechanisms of the BIS provide opportunities to construct computer virus detection models that are robust and adaptive, with the ability to detect unseen viruses. In this paper, a variety of classic computer virus detection approaches are introduced and reviewed, based on background knowledge of computer virus history. Next, a variety of immune-based computer virus detection approaches are discussed in detail. Promising experimental results suggest that immune-based computer virus detection approaches are able to detect new variants and unseen viruses at lower false positive rates, which has paved a new way for anti-virus research.
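A classic immune-inspired mechanism behind many of these approaches is negative selection: random detectors are generated, and those that match normal ("self") data are discarded, so the survivors flag only anomalous ("nonself") inputs. The sketch uses binary strings with the r-contiguous matching rule; encoding programs as bit strings, the parameter values, and the function names are assumptions for illustration.

```python
import random

def r_contiguous_match(a, b, r):
    """True if strings a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def generate_detectors(self_set, length, r, n_detectors, seed=0, max_tries=100_000):
    """Negative selection: keep random candidates that match no 'self' string."""
    rng = random.Random(seed)
    detectors = []
    for _ in range(max_tries):
        if len(detectors) >= n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors

def is_nonself(sample, detectors, r):
    """Flag a sample as anomalous if any detector matches it."""
    return any(r_contiguous_match(sample, d, r) for d in detectors)
```

Because detectors are censored against self data only, no virus samples are needed during training, which is why this family of methods can in principle flag previously unseen viruses.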
Funding: Supported by the National Key Basic Research Program of China (973 Program) under Grant No. 2009CB320904, the National Natural Science Foundation of China under Grants No. 61121002, No. 61231010, and No. 91120004, and the Key Projects in the National Science and Technology Pillar Program under Grant No. 2011BAH08B03.
Abstract: 2D-to-3D video conversion is a feasible way to generate 3D programs for the current 3DTV industry. However, for large-scale 3D video production, current systems are no longer adequate in terms of the time and labor required for conversion. In this paper, we introduce a distributed 2D-to-3D video conversion system that includes a 2D-to-3D video conversion module, a parallel computing architecture on the cloud, and 3D video coding. The system enables multiple users to cooperate in completing their conversion tasks simultaneously, so that conversion efficiency is greatly improved. In the experiments, we evaluate the system using criteria related to both time consumption and video coding performance.
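The parallel architecture is not detailed in the abstract. A minimal single-machine sketch of the general pattern (split the frame sequence into contiguous chunks, convert them concurrently, and reassemble them in order) is shown below; `convert_chunk` is a hypothetical placeholder for the real per-chunk conversion, and a thread pool stands in for cloud workers.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(frames, n_chunks):
    """Contiguous, near-equal chunks so each worker gets a temporally coherent segment."""
    k, m = divmod(len(frames), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + k + (1 if i < m else 0)
        chunks.append(frames[start:end])
        start = end
    return chunks

def convert_chunk(chunk):
    # hypothetical placeholder: pair each frame with a stand-in for its depth map / second view
    return [(frame, f"depth({frame})") for frame in chunk]

def distributed_convert(frames, n_workers=4):
    chunks = split_into_chunks(frames, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(convert_chunk, chunks))  # map preserves chunk order
    return [item for chunk in results for item in chunk]
```

Keeping chunks contiguous matters for 2D-to-3D conversion, since depth estimates are usually propagated between neighbouring frames within a shot.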
Funding: National Key Research and Development Program of China (2017YFB1002601); National Natural Science Foundation of China (61632003, 61771026). The authors thank Xin WANG, Qiuyuan WANG, Fei XUE, Pijian SUN, Shunkai LI, Junqiu WANG, Zhaoyang LV, and Wei DONG for their instructive discussion and feedback.
Abstract: Simultaneous localization and mapping (SLAM) has attracted considerable research interest from the robotics and computer-vision communities for more than 30 years. Thanks to steady and progressive efforts, modern SLAM systems allow robust, online applications in real-world scenes. We examined the evolution of this powerful perception tool in detail and noticed that the insights concerning incremental computation and temporal guidance have been persistently retained. Herein, we denote this temporal continuity as a flow basis and present, for the first time, a survey that specifically focuses on this flow-based nature, ranging from geometric computation to emerging learning techniques. We start by reviewing two essential stages of geometric computation, presenting the de facto standard pipeline and problem formulation, along with the use of temporal cues. Recently emerging techniques are then summarized, covering a wide range of areas such as learning techniques, sensor fusion, and continuous-time trajectory modeling. This survey aims to draw attention to how robust SLAM systems benefit from their continuously observing nature, as well as to topics worthy of further investigation for better use of temporal cues.
Funding: This work is supported by the National Natural Science Foundation of China (NSFC, No. 61340046), the National High Technology Research and Development Program of China (863 Program, No. 2006AA04Z247), the Scientific and Technical Innovation Commission of Shenzhen Municipality (JCYJ20130331144631730, JCYJ20130331144716089), and the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20130001110011).
Abstract: Indoor multi-tracking is more challenging than outdoor tracking because of frequent occlusion, view truncation, severe scale change, and pose variation, which can make target representation and data association considerably unreliable and ambiguous. Discriminative and reliable target representation is therefore vital for accurate data association in multi-tracking. Previous works typically combine many features to increase discriminative power, but this is prone to error accumulation and unnecessary computational cost, which may, on the contrary, increase ambiguity. Moreover, the reliability of the same feature may vary greatly across scenes; in particular, for the now-widespread network cameras installed in varied and complex indoor scenes, previous fixed feature-selection schemes cannot meet general requirements. To handle these problems, first, we propose a scene-adaptive hierarchical data association scheme that adaptively selects the features that represent targets most reliably in the given scene and gradually combines features only up to the minimum needed to discriminate ambiguous targets. Second, we propose a novel depth-invariant, part-based appearance model using RGB-D data, which makes the appearance model robust to scale change, partial occlusion, and view truncation. The introduction of RGB-D data increases feature diversity, which provides more types of features for selection in data association and enhances the final multi-tracking performance. We validate our method from several aspects, including the scene-adaptive feature selection scheme, the hierarchical data association scheme, and the RGB-D-based appearance modeling scheme, in various indoor scenes, demonstrating its effectiveness and efficiency in improving multi-tracking performance.
Abstract: Correction to: Zero-shot Fine-grained Classification by Deep Feature Learning with Semantics. DOI: 10.1007/s11633-019-1177-8. Authors: Ao-Xue Li, Ke-Xin Zhang, Li-Wei Wang. The article "Zero-shot Fine-grained Classification by Deep Feature Learning with Semantics", written by Ao-Xue Li, Ke-Xin Zhang and Li-Wei Wang, was originally published in vol. 16, no. 5 of International Journal of Automation and Computing without Open Access. After publication, the authors decided to opt for Open Choice and to make the article an Open Access publication.