With the continuous progress of the times and the development of technology, the rise of online social media has brought explosive growth in image data. As one of the main means of people's daily communication, images are widely used as a carrier of communication because of their rich content and intuitiveness. Image recognition based on convolutional neural networks is the first application in the field of image recognition. A series of algorithmic operations such as image feature extraction, recognition and convolution are used to identify and analyze different images. The rapid development of artificial intelligence makes machine learning increasingly important in this research field: algorithms learn from each piece of data and predict the outcome. This has become an important key to opening the door of artificial intelligence. In machine vision, image recognition is the foundation, but how to associate the low-level information in an image with high-level image semantics is the key problem of image recognition. Predecessors have provided many models and algorithms, which have laid a solid foundation for the development of artificial intelligence and image recognition. The multi-level information fusion model based on the VGG16 model is an improvement on the fully connected neural network. Unlike a fully connected network, a convolutional neural network does not fully connect the neurons in each layer but connects only some of the nodes. Although this reduces computation time, the convolutional neural network model loses some useful feature information during propagation and calculation. This paper therefore improves the model into a multi-level information fusion of the convolution calculation method that recovers the discarded feature information, so as to improve the image recognition rate. VGG divides the network into five groups (mimicking the five layers of AlexNet), yet it uses 3×3 filters and combines them into a convolution sequence. The deeper the DCNN, the larger the number of channels. The recognition rate of the model was verified on the ORL Face Database, the BioID Face Database and the CASIA Face Image Database.
The novel eye-based human-computer interaction (HCI) system aims to provide people, especially disabled persons, with a new way of communicating with their surroundings. It adopts a series of continual eye movements as input to perform simple control activities. Identification of eye movements is the crucial technology in these eye-based HCI systems. At present, research on eye movement identification mainly focuses on frontal face images. In fact, acquiring non-frontal face images is more realistic in real applications. In this paper, we discuss the process of identifying eye movements from non-frontal face images. First, the original head-shoulder images at azimuths of 0° to ±60° are sampled without any auxiliary light source. Second, the non-frontal face region is detected using AdaBoost cascade classifiers. After that, we roughly extract eye windows with the integral projection function. Then, we propose a new method to calculate the x-y coordinates of the pupil center point by searching for the minimal intensity value in the eye windows. According to the trajectory of the pupil center points, different eye movements (eye moving left, right, up or down) are successfully identified. A set of experiments is presented.
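The pupil-localization step described in this abstract (searching for the minimal intensity value inside an eye window, then reading the movement off the trajectory of the pupil centers) can be illustrated with a minimal sketch. The function names, the list-of-rows image representation, and the `threshold` parameter are assumptions made for illustration; they are not taken from the paper, and directions are expressed in image coordinates (y grows downward).

```python
def pupil_center(eye_window):
    """Return (x, y) of the darkest pixel in a grayscale eye window.

    eye_window is a list of rows, each row a list of intensities (0-255).
    If several pixels share the minimum, the first one in scan order wins.
    """
    best_x, best_y = 0, 0
    best_val = eye_window[0][0]
    for y, row in enumerate(eye_window):
        for x, val in enumerate(row):
            if val < best_val:
                best_val, best_x, best_y = val, x, y
    return best_x, best_y


def classify_movement(prev_center, curr_center, threshold=2):
    """Label the dominant displacement between two pupil centers.

    threshold (in pixels) is a hypothetical dead zone that suppresses jitter.
    """
    dx = curr_center[0] - prev_center[0]
    dy = curr_center[1] - prev_center[1]
    if max(abs(dx), abs(dy)) < threshold:
        return "still"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


window = [
    [200, 180, 190],
    [170, 20, 160],
    [210, 175, 185],
]
print(pupil_center(window))
print(classify_movement((1, 1), (5, 1)))
```

In practice a single darkest pixel is sensitive to noise, eyelashes and specular highlights, so a real system would likely smooth the window or average the darkest region first; the sketch keeps only the core idea.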
The utilization of digital picture search and retrieval has grown substantially in numerous fields for different purposes during the last decade, owing to continuing advances in image processing and computer vision approaches. In multiple real-life applications, for example social media, content-based face picture retrieval is a well-invested technique for large-scale databases, where there is a significant need for reliable retrieval capabilities enabling quick search across a vast number of pictures. Humans widely employ faces for recognizing and identifying people. Thus, face recognition through formal or personal pictures is increasingly used in various real-life applications, such as helping crime investigators retrieve matching images from face image databases to identify victims and criminals. However, such face image retrieval becomes more challenging in large-scale databases, where traditional vision-based face analysis requires ample storage space, beyond that already occupied by the raw face images, to store the lengthy extracted feature vectors, and takes much longer to process and match thousands of face images. This work mainly contributes to enhancing face image retrieval performance in large-scale databases using hash codes inferred by locality-sensitive hashing (LSH) for facial hard and soft biometrics, termed Hard BioHash and Soft BioHash respectively, to be used as a search input for retrieving the top-k matching faces. Moreover, we propose the multi-biometric score-level fusion of both face hard and soft BioHashes (Hard-Soft BioHash Fusion) for further augmented face image retrieval. The experimental outcomes on the Labeled Faces in the Wild (LFW) dataset and the related attributes dataset (LFW-attributes) demonstrate that the suggested fusion approach (Hard-Soft BioHash Fusion) significantly improves retrieval performance compared to using Hard BioHash or Soft BioHash in isolation, providing an augmented accuracy of 87% when executed on 1000 samples and 77% on 5743 samples. These results remarkably outperform the results of the Hard BioHash method (by 50% on the 1000 samples and 30% on the 5743 samples) and the Soft BioHash method (by 78% on the 1000 samples and 63% on the 5743 samples).
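The retrieval idea in this abstract (compact hash codes from locality-sensitive hashing, compared per modality and then combined by score-level fusion to rank the top-k faces) can be sketched roughly as follows. The abstract does not say which LSH family or fusion rule is used, so the random-hyperplane hashing, the equal-weight average, and every function name here are assumptions for illustration only.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplanes (one Gaussian normal vector per bit)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vector, hyperplanes):
    """Sign-random-projection LSH: one bit per hyperplane side."""
    return [
        1 if sum(w * v for w, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    ]


def similarity(code_a, code_b):
    """Fraction of matching bits (1 minus the normalized Hamming distance)."""
    matches = sum(1 for a, b in zip(code_a, code_b) if a == b)
    return matches / len(code_a)


def fused_score(hard_sim, soft_sim, weight=0.5):
    """Score-level fusion as a weighted sum; the weight is a hypothetical choice."""
    return weight * hard_sim + (1 - weight) * soft_sim


def top_k(query, gallery, k):
    """Rank gallery entries by fused similarity to the query.

    query is (hard_code, soft_code); gallery maps a name to such a pair.
    """
    scored = [
        (fused_score(similarity(query[0], hard), similarity(query[1], soft)), name)
        for name, (hard, soft) in gallery.items()
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:k]]


planes = make_hyperplanes(n_bits=16, dim=4)
code = hash_code([0.3, -1.2, 0.8, 0.1], planes)
gallery = {"a": ([1, 0, 1, 1], [0, 1]), "b": ([0, 1, 0, 0], [1, 0])}
print(len(code), top_k(([1, 0, 1, 1], [0, 1]), gallery, k=1))
```

Storing short binary codes instead of long real-valued feature vectors is what addresses the storage and matching-time concerns raised in the abstract; the fusion step lets one modality compensate when the other is ambiguous.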
In this paper, a new type of neural network model, the Partially Connected Neural Evolutionary (PARCONE) model, is introduced to recognize gender from a face. The neural network has a mesh structure in which each neuron does not connect to all other neurons but maintains a fixed number of connections with other neurons. In training, an evolutionary computation method is used to improve the network's performance by changing which neurons are connected and the connection weights. With this new model, no feature extraction is needed and all of the pixels of a sample image can be used as inputs to the neural network. The gender recognition experiment was carried out on 490 face images (245 females and 245 males from the Color FERET database), which include not only frontal faces but also faces rotated horizontally from -40° to 40°. After 300 to 600 generations of evolution, the gender recognition rate, rejection rate and error rate on the positive examples are 96.2%, 1.1% and 2.7%, respectively. Furthermore, a large-scale GPU parallel computing method was used to accelerate neural network training. The experimental results show that the new neural model has better pattern recognition ability and may be applied to many other pattern recognition tasks that need a large amount of input information.
To address the low reconstruction efficiency, blurred texture details and difficult convergence of network-based face image super-resolution reconstruction algorithms, a new super-resolution reconstruction algorithm with residual attention was proposed. First, to reduce the influence of redundant and invalid information on the face image super-resolution reconstruction network, an attention mechanism was introduced into the feature extraction module of the network, which improved the feature utilization rate of the overall network. Second, to alleviate the vanishing gradient problem, an adaptive residual was introduced into the network to make the model easier to converge during training, with features supplemented as needed during training. The experimental results showed that the proposed algorithm had better reconstruction performance, more facial details and clearer texture in the reconstructed face image than the comparison algorithms. In objective evaluation, the proposed algorithm's peak signal-to-noise ratio and structural similarity were also better than those of the other algorithms.
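The peak signal-to-noise ratio used for the objective evaluation above is defined as PSNR = 10 · log10(MAX² / MSE), where MSE is the mean squared error between the reference and the reconstructed image. A minimal sketch, assuming 8-bit grayscale images stored as lists of rows (the function names are illustrative and not from the paper):

```python
import math


def mean_squared_error(reference, reconstructed):
    """Average squared pixel difference between two equally sized images."""
    total = 0.0
    count = 0
    for ref_row, rec_row in zip(reference, reconstructed):
        for ref_px, rec_px in zip(ref_row, rec_row):
            total += (ref_px - rec_px) ** 2
            count += 1
    return total / count


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    err = mean_squared_error(reference, reconstructed)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


high_res = [[100, 100], [100, 100]]
restored = [[110, 90], [110, 90]]
print(round(psnr(high_res, restored), 2))
```

Higher PSNR means the reconstruction is numerically closer to the reference, but it does not always track perceived quality, which is why it is usually reported together with structural similarity, as in this abstract.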
Recent studies have shown remarkable success in the face image generation task. However, existing approaches offer limited diversity, quality and controllability in the generated results. To address these issues, we propose a novel end-to-end learning framework to generate diverse, realistic and controllable face images guided by face masks. The face mask provides a good geometric constraint for a face by specifying the size and location of its different components, such as the eyes, nose and mouth. The framework consists of four components: a style encoder, a style decoder, a generator and a discriminator. The style encoder generates a style code that represents the style of the resulting face; the generator translates the input face mask into a real face based on the style code; the style decoder learns to reconstruct the style code from the generated face image; and the discriminator classifies an input face image as real or fake. With the style code, the proposed model can generate different face images matching the input face mask, and by manipulating the face mask we can finely control the generated face image. We empirically demonstrate the effectiveness of our approach on the mask-guided face image synthesis task.
Face hallucination, or super-resolution, is an underdetermined inverse problem, and compressive sensing (CS) theory provides an effective way of seeking solutions to inverse problems. In this paper, a novel compressive sensing based face hallucination method is presented, which comprises three steps: dictionary learning, sparse coding, and solving a maximum a posteriori (MAP) formulation. In the first step, the K-SVD dictionary learning algorithm is adopted to obtain a dictionary that can sparsely represent high-resolution (HR) face image patches. In the second step, we seek the sparsest representation of each input low-resolution (LR) face image patch using the learned dictionary; super-resolution image blocks are obtained from the sparsest coefficients and dictionaries and are then assembled into a super-resolution (SR) image. Finally, the MAP formulation is introduced to satisfy the consistency constraint and obtain higher-quality HR images. The experimental results demonstrate that our approach achieves better super-resolution faces than other state-of-the-art methods.
Significant progress has been made in computational imaging (CI), in which deep convolutional neural networks (CNNs) have demonstrated that sparse speckle patterns can be reconstructed. However, due to the limited "local" kernel size of the convolutional operator, the performance of CNNs is limited for spatially dense patterns such as generic face images. Here, we propose a "non-local" model, termed the Speckle-Transformer (SpT) UNet, for speckle feature extraction from generic face images. It is worth noting that the lightweight SpT UNet shows high efficiency and strong comparative performance, with the Pearson correlation coefficient (PCC) and structural similarity measure (SSIM) exceeding 0.989 and 0.950, respectively.
Near infrared-visible (NIR-VIS) face recognition aims to match an NIR face image to a VIS image. The main challenges of NIR-VIS face recognition are the gap caused by cross-modality and the lack of sufficient paired NIR-VIS face images to train models. This paper focuses on the generation of paired NIR-VIS face images and proposes a dual variational generator based on ResNeSt (RS-DVG). RS-DVG can generate a large number of paired NIR-VIS face images from noise, and these generated images can be used as the training set together with the real NIR-VIS face images. In addition, a triplet loss function is introduced and a novel triplet selection method is proposed specifically for training the face recognition model, which maximizes the inter-class distance and minimizes the intra-class distance of the input face images. The proposed method was evaluated on the CASIA NIR-VIS 2.0 and BUAA-VisNir datasets, and relatively good results were obtained.
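The triplet loss mentioned above pulls an anchor embedding toward a positive (same identity) and pushes it away from a negative (different identity) until they are separated by a margin: L = max(0, d(a, p) - d(a, n) + margin). A minimal sketch of the standard formulation follows; the squared Euclidean distance and the margin value of 0.2 are common defaults assumed here, and the paper's novel triplet selection method is not reproduced.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: zero once the negative is farther than the
    positive by at least `margin` (an assumed hyperparameter)."""
    pos_dist = squared_distance(anchor, positive)
    neg_dist = squared_distance(anchor, negative)
    return max(0.0, pos_dist - neg_dist + margin)


anchor = [0.0, 0.0]
same_identity = [0.0, 1.0]
other_identity = [3.0, 0.0]
print(triplet_loss(anchor, same_identity, other_identity))
```

Triplets that already satisfy the margin contribute zero loss and therefore no gradient, which is why the choice of triplets matters for training and why the abstract proposes a dedicated selection method.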
Coal-rock interface identification technology is pivotal to automatically adjusting the shearer's cutting drum during coal mining. However, it is also a technical bottleneck hindering the advancement of intelligent coal mining. This study aimed to address the poor accuracy of current coal-rock identification technology on comprehensive working faces, coupled with the limited availability of coal-rock datasets. The loss function of the SegFormer model was enhanced, the model's hyperparameters and learning rate were adjusted, and an automatic recognition method for coal-rock interfaces based on FL-SegFormer was proposed. Additionally, an experimental platform was constructed to simulate the dusty environment during coal-rock cutting by the shearer, enabling the collection of coal-rock test image datasets. Morphology-based algorithms were employed to expand the coal-rock image datasets through image rotation, color dithering and Gaussian noise injection, so as to augment the diversity and applicability of the datasets. As a result, a coal-rock image dataset comprising 8424 samples was generated. The findings demonstrated that the FL-SegFormer model achieved a mean intersection over union (MIoU) and mean pixel accuracy (MPA) of 97.72% and 98.83%, respectively. The FL-SegFormer model outperformed other models in terms of recognition accuracy, as evidenced by an MIoU exceeding 95.70% of the original image. Furthermore, the FL-SegFormer model was validated using original coal-rock images from the No. 15205 working face of the Yulin test mine in northern Shaanxi. The calculated average error was only 1.77%, and the model operated at a rate of 46.96 frames per second, meeting the practical application and deployment requirements of underground settings. These results provide a theoretical foundation for achieving automatic and efficient mining with coal mining machines and for the intelligent development of coal mines.
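The two segmentation metrics reported above, mean intersection over union (MIoU) and mean pixel accuracy (MPA), are both computed from a per-class confusion matrix. A minimal sketch, assuming flattened integer label maps (for example 0 for coal and 1 for rock, a labeling chosen here purely for illustration):

```python
def confusion_matrix(true_labels, pred_labels, n_classes):
    """cm[t][p] counts pixels of true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, pred_labels):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average of TP / (TP + FP + FN) over classes that appear at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


def mean_pixel_accuracy(cm):
    """Average of per-class recall TP / (TP + FN) over classes present."""
    accs = [cm[c][c] / sum(cm[c]) for c in range(len(cm)) if sum(cm[c]) > 0]
    return sum(accs) / len(accs)


truth = [0, 0, 1, 1]
prediction = [0, 1, 1, 1]
cm = confusion_matrix(truth, prediction, n_classes=2)
print(mean_iou(cm), mean_pixel_accuracy(cm))
```

MIoU penalizes both false positives and false negatives for each class, so it is typically lower than MPA on the same prediction, which matches the ordering of the two figures reported in the abstract.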
Face image analysis is one of several important cues in computer vision. Over the last five decades, methods for face analysis have received immense attention due to large-scale applications in various face analysis tasks. Face parsing strongly benefits various human face image analysis tasks, including face pose estimation. In this paper, we propose a 3D head pose estimation framework built on a prior end-to-end deep face parsing model. We have developed an end-to-end face parts segmentation framework using deep convolutional neural networks (DCNNs). To train a deep face parts parsing model, we label face images with seven different classes: eyes, brows, nose, hair, mouth, skin and background. We extract features from grayscale images using DCNNs and train a classifier on the extracted features. We use the probabilistic classification method to produce grayscale images in the form of probability maps for each dense semantic class. We then use a second stage of DCNNs to extract features from the grayscale probability maps created during the segmentation phase. We assess the performance of our newly proposed model on four standard head pose datasets, including Pointing'04, Annotated Facial Landmarks in the Wild (AFLW), Boston University (BU) and ICT-3DHP, obtaining superior results compared to previous results.
Evaluating individuals' personality traits and intelligence from their faces plays a crucial role in interpersonal relationships and in important social events such as elections and court sentencing. To assess the possible correlations between personality traits (and measured intelligence) and face images, we first construct a dataset consisting of face photographs, personality measurements and intelligence measurements. Then, we build an end-to-end convolutional neural network for predicting personality traits and intelligence, to investigate whether self-reported personality traits and intelligence can be predicted reliably from a face image. To our knowledge, this is the first work in which deep learning is applied to this problem. Experimental results show the following three points: 1) "Rule-consciousness" and "Tension" can be reliably predicted from face images. 2) It is difficult, if not impossible, to predict intelligence from face images, a finding in accord with previous studies. 3) Convolutional neural network (CNN) features outperform traditional handcrafted features in predicting traits.
One-shot face reenactment is a challenging task due to the identity mismatch between source and driving faces. Most existing methods fail to completely eliminate the interference of the driving subject's identity information, which may lead to face shape distortion and undermine the realism of reenactment results. To solve this problem, we propose using a 3D morphable model (3DMM) for explicit facial semantic decomposition and identity disentanglement. Instead of using 3D coefficients alone for reenactment control, we take advantage of the generative ability of the 3DMM to render textured face proxies. These proxies contain abundant yet compact geometric and semantic information about human faces, which enables us to compute the face motion field between source and driving images by estimating the dense correspondence. In this way, we can approximate reenactment results by warping source images according to the motion field, and a generative adversarial network (GAN) is adopted to further improve the visual quality of the warping results. Extensive experiments on various datasets demonstrate the advantages of the proposed method over existing state-of-the-art benchmarks in both identity preservation and reenactment fulfillment.
Objective: To develop a multimodal deep-learning model for classifying Chinese medicine constitution, i.e., the balanced and unbalanced constitutions, based on inspection of tongue and face images, pulse waves from palpation, and health information from a total of 540 subjects. Methods: The study data consisted of tongue and face images, pulse waves obtained by palpation, and health information, including personal information, life habits, medical history and current symptoms, from 540 subjects (202 males and 338 females). Convolutional neural networks, recurrent neural networks and fully connected neural networks were used to extract deep features from the data. Feature fusion and decision fusion models were constructed for the multimodal data. Results: The optimal models for tongue and face images, pulse waves and health information were ResNet18, Gated Recurrent Unit and entity embedding, respectively. Feature fusion was superior to decision fusion. The multimodal analysis revealed that multimodal data compensated for the loss of information from a single modality, resulting in improved classification performance. Conclusions: Multimodal data fusion can supplement single-modality information and improve classification performance. Our research underscores the effectiveness of multimodal deep learning technology in identifying body constitution for modernizing and improving the intelligent application of Chinese medicine.
Funding (for the study on eye movement identification from non-frontal face images): Supported by the Innovation Program of Shanghai Municipal Education Commission of China (No. 14YZ169).
Funding (for the Hard-Soft BioHash face image retrieval study): Supported and funded by the KAU Scientific Endowment, King Abdulaziz University, Jeddah, Saudi Arabia, grant number 077416-04.
Funding (for the PARCONE gender recognition study): National Natural Science Foundation of China (No. 60975084).
Funding (for the face image super-resolution reconstruction study): Supported by the National Natural Science Foundation of China (No. 62063014).
Funding (for the mask-guided face image generation study): This work is supported by the National Key Research and Development Program of China (2018YFF0214700).
Abstract: Face hallucination, or super-resolution, is an underdetermined inverse problem, and compressive sensing (CS) theory provides an effective way of seeking solutions to inverse problems. In this paper, a novel compressive-sensing-based face hallucination method is presented, which comprises three steps: dictionary learning, sparse coding, and solving a maximum a posteriori (MAP) formulation. In the first step, the K-SVD dictionary learning algorithm is adopted to obtain a dictionary which can sparsely represent high-resolution (HR) face image patches. In the second step, we seek the sparsest representation for each input low-resolution (LR) face image patch using the learned dictionary; super-resolution image blocks are obtained from the sparsest coefficients and dictionaries, and are then assembled into a super-resolution (SR) image. Finally, the MAP formulation is introduced to satisfy the consistency constraint and obtain higher-quality HR images. The experimental results demonstrate that our approach can achieve better super-resolution faces compared with other state-of-the-art methods.
Funding: Funding support from the Science and Technology Commission of Shanghai Municipality (Grant No. 21DZ1100500); the Shanghai Frontiers Science Center Program (2021-2025 No. 20); the Zhangjiang National Innovation Demonstration Zone (Grant No. ZJ2019ZD-005); supported by a fellowship from the China Postdoctoral Science Foundation (2020M671169); the International Postdoctoral Exchange Program from the Administrative Committee of Post-Doctoral Researchers of China ([2020]33).
Abstract: Significant progress has been made in computational imaging (CI), in which deep convolutional neural networks (CNNs) have demonstrated that sparse speckle patterns can be reconstructed. However, because of the limited "local" kernel size of the convolutional operator, the performance of CNNs is limited for spatially dense patterns such as generic face images. Here, we propose a "non-local" model, termed the Speckle-Transformer (SpT) UNet, for speckle feature extraction of generic face images. It is worth noting that the lightweight SpT UNet shows high efficiency and strong comparative performance, with the Pearson correlation coefficient (PCC) and structural similarity measure (SSIM) exceeding 0.989 and 0.950, respectively.
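The Pearson correlation coefficient reported above compares a reconstructed image with its ground truth. A minimal sketch of the standard formula, assuming both images have been flattened into equal-length lists of pixel values; the name `pearson_cc` is illustrative and not taken from the paper:

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two flattened images."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and the same length")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Covariance and variances (the common 1/n factors cancel out).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant image")
    return cov / math.sqrt(var_x * var_y)
```

A value close to 1 indicates that the reconstruction varies almost linearly with the ground truth.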
Funding: National Natural Science Foundation of China (No. 62006039); National Key Research and Development Program of China (No. 2019YFE0190500).
Abstract: Near infrared-visible (NIR-VIS) face recognition aims to match an NIR face image to a VIS image. The main challenges of NIR-VIS face recognition are the cross-modality gap and the lack of sufficient paired NIR-VIS face images to train models. This paper focuses on the generation of paired NIR-VIS face images and proposes a dual variational generator based on ResNeSt (RS-DVG). RS-DVG can generate a large number of paired NIR-VIS face images from noise, and these generated NIR-VIS face images can be used as the training set together with the real NIR-VIS face images. In addition, a triplet loss function is introduced and a novel triplet selection method is proposed specifically for the training of the current face recognition model, which maximizes the inter-class distance and minimizes the intra-class distance in the input face images. The method proposed in this paper was evaluated on the CASIA NIR-VIS 2.0 and BUAA-VisNir datasets, and relatively good results were obtained.
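The triplet loss mentioned above pulls same-identity embeddings together and pushes different identities apart. The sketch below shows the standard margin-based formulation only; the paper's triplet selection method is not described in the abstract and is not reproduced here. The function names and the default margin are assumptions for illustration:

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard margin-based triplet loss for one (anchor, positive, negative).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (a hypothetical default value).
    """
    d_pos = euclidean(anchor, positive)  # intra-class distance
    d_neg = euclidean(anchor, negative)  # inter-class distance
    return max(0.0, d_pos - d_neg + margin)
```

For example, with an NIR embedding as the anchor, a VIS embedding of the same person as the positive, and a VIS embedding of another person as the negative, minimizing this loss reduces the intra-class distance relative to the inter-class distance.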
Funding: Funded by the National Natural Science Foundation of China (52004201, 52274143, 52204153) and the China Postdoctoral Science Foundation (2021T140551).
Abstract: Coal-rock interface identification technology was pivotal in automatically adjusting the shearer's cutting drum during coal mining. However, it also served as a technical bottleneck hindering the advancement of intelligent coal mining. This study aimed to address the poor accuracy of current coal-rock identification technology on comprehensive working faces, coupled with the limited availability of coal-rock datasets. The loss function of the SegFormer model was enhanced, the model's hyperparameters and learning rate were adjusted, and an automatic recognition method was proposed for coal-rock interfaces based on FL-SegFormer. Additionally, an experimental platform was constructed to simulate the dusty environment during coal-rock cutting by the shearer, enabling the collection of coal-rock test image datasets. Morphology-based algorithms were employed to expand the coal-rock image datasets through image rotation, color dithering, and Gaussian noise injection so as to augment the diversity and applicability of the datasets. As a result, a coal-rock image dataset comprising 8424 samples was generated. The findings demonstrated that the FL-SegFormer model achieved a mean intersection over union (MIoU) and mean pixel accuracy (MPA) of 97.72% and 98.83%, respectively. The FL-SegFormer model outperformed other models in terms of recognition accuracy, as evidenced by an MIoU exceeding 95.70% of the original image. Furthermore, the FL-SegFormer model using original coal-rock images was validated on the No. 15205 working face of the Yulin test mine in northern Shaanxi. The calculated average error was only 1.77%, and the model operated at a rate of 46.96 frames per second, meeting the practical application and deployment requirements in underground settings. These results provided a theoretical foundation for achieving automatic and efficient mining with coal mining machines and the intelligent development of coal mines.
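The two segmentation metrics reported above are commonly derived from a per-class confusion matrix. A minimal sketch using the usual definitions (not code from the study), where `confusion[i][j]` is assumed to count pixels of true class `i` predicted as class `j`; the function name is illustrative:

```python
def miou_and_mpa(confusion):
    """Return (mean IoU, mean pixel accuracy) from a square confusion matrix."""
    num_classes = len(confusion)
    ious, accuracies = [], []
    for i in range(num_classes):
        true_positive = confusion[i][i]
        actual = sum(confusion[i])  # pixels whose true class is i
        predicted = sum(confusion[r][i] for r in range(num_classes))
        union = actual + predicted - true_positive

        # Classes absent from both prediction and ground truth are skipped.
        if union > 0:
            ious.append(true_positive / union)
        if actual > 0:
            accuracies.append(true_positive / actual)
    return sum(ious) / len(ious), sum(accuracies) / len(accuracies)
```

For a two-class coal/rock problem, the matrix is 2x2 and both metrics are averaged over the coal and rock classes.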
Funding: Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (2020-0-01592); Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education under Grant (2019R1F1A1058548) and Grant (2020R1G1A1013221).
Abstract: Face image analysis is one of several important cues in computer vision. Over the last five decades, methods for face analysis have received immense attention because of their large-scale applications in various face analysis tasks. Face parsing strongly benefits various human face image analysis tasks, including face pose estimation. In this paper we propose a 3D head pose estimation framework developed through a prior end-to-end deep face parsing model. We have developed an end-to-end face parts segmentation framework through deep convolutional neural networks (DCNNs). For training a deep face parts parsing model, we label face images for seven different classes, including eyes, brows, nose, hair, mouth, skin, and back. We extract features from grayscale images by using DCNNs. We train a classifier using the extracted features. We use the probabilistic classification method to produce grayscale images in the form of probability maps for each dense semantic class. We use a next stage of DCNNs and extract features from the grayscale images created as probability maps during the segmentation phase. We assess the performance of our newly proposed model on four standard head pose datasets, including Pointing'04, Annotated Facial Landmarks in the Wild (AFLW), Boston University (BU), and ICT-3DHP, obtaining superior results compared to previous results.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61333015, 61421004 and 61375042) and the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDB02070002).
Abstract: Evaluating individuals' personality traits and intelligence from their faces plays a crucial role in interpersonal relationships and important social events such as elections and court sentences. To assess the possible correlations between personality traits (and measured intelligence) and face images, we first construct a dataset consisting of face photographs, personality measurements, and intelligence measurements. Then, we build an end-to-end convolutional neural network for prediction of personality traits and intelligence to investigate whether self-reported personality traits and intelligence can be predicted reliably from a face image. To our knowledge, it is the first work where deep learning is applied to this problem. Experimental results show the following three points: 1) "Rule-consciousness" and "Tension" can be reliably predicted from face images. 2) It is difficult, if not impossible, to predict intelligence from face images, a finding in accord with previous studies. 3) Convolutional neural network (CNN) features outperform traditional handcrafted features in predicting traits.
Funding: Supported in part by the Beijing Municipal Natural Science Foundation, China (No. 4222054), in part by the National Natural Science Foundation of China (Nos. 62276263 and 62076240), and the Youth Innovation Promotion Association CAS, China (No. Y2023143).
Abstract: One-shot face reenactment is a challenging task because of the identity mismatch between source and driving faces. Most existing methods fail to completely eliminate the interference of the driving subjects' identity information, which may lead to face shape distortion and undermine the realism of reenactment results. To solve this problem, in this paper, we propose using a 3D morphable model (3DMM) for explicit facial semantic decomposition and identity disentanglement. Instead of using 3D coefficients alone for reenactment control, we take advantage of the generative ability of 3DMM to render textured face proxies. These proxies contain abundant yet compact geometric and semantic information of human faces, which enables us to compute the face motion field between source and driving images by estimating the dense correspondence. In this way, we can approximate reenactment results by warping source images according to the motion field, and a generative adversarial network (GAN) is adopted to further improve the visual quality of warping results. Extensive experiments on various datasets demonstrate the advantages of the proposed method over existing state-of-the-art benchmarks in both identity preservation and reenactment fulfillment.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2018YFC1707704.
Abstract: Objective: To develop a multimodal deep-learning model for classifying Chinese medicine constitution, i.e., the balanced and unbalanced constitutions, based on inspection of tongue and face images, pulse waves from palpation, and health information from a total of 540 subjects. Methods: The study data consisted of tongue and face images, pulse waves obtained by palpation, and health information, including personal information, life habits, medical history, and current symptoms, from 540 subjects (202 males and 338 females). Convolutional neural networks, recurrent neural networks, and fully connected neural networks were used to extract deep features from the data. Feature fusion and decision fusion models were constructed for the multimodal data. Results: The optimal models for tongue and face images, pulse waves, and health information were ResNet18, Gated Recurrent Unit, and entity embedding, respectively. Feature fusion was superior to decision fusion. The multimodal analysis revealed that multimodal data compensated for the loss of information from a single mode, resulting in improved classification performance. Conclusions: Multimodal data fusion can supplement single-mode information and improve classification performance. Our research underscores the effectiveness of multimodal deep learning technology to identify body constitution for modernizing and improving the intelligent application of Chinese medicine.
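The difference between the feature fusion and decision fusion compared above can be sketched as follows. This is only an illustration of the two general strategies; the abstract does not state how the study combined features or decisions, so concatenation and probability averaging are assumptions, as are the function and modality names:

```python
def feature_fusion(features_by_modality):
    """Feature-level fusion: concatenate per-modality feature vectors.

    The fused vector would then be passed to a single classifier.
    Modalities are visited in sorted name order so the layout is stable.
    """
    fused = []
    for name in sorted(features_by_modality):
        fused.extend(features_by_modality[name])
    return fused


def decision_fusion(probabilities_by_modality):
    """Decision-level fusion: average class probabilities across modalities.

    Each modality has its own classifier; only their outputs are combined.
    """
    outputs = list(probabilities_by_modality.values())
    num_classes = len(outputs[0])
    return [
        sum(probs[c] for probs in outputs) / len(outputs)
        for c in range(num_classes)
    ]
```

Feature fusion lets one classifier learn interactions between modalities, whereas decision fusion combines modalities only after each has already made its own prediction.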