This paper proposes a cascade deep convolutional neural network to address the loosening detection problem of bolts on axlebox covers. Firstly, an SSD network based on ResNet50 and a CBAM module is proposed to enhance bolt image features and locate bolts on axlebox covers. Then, A2-PFN is proposed, exploiting the slender shape of the marker lines to extract more accurate marker-line regions of the bolts. Finally, a rectangular approximation method is proposed to regularize the marker-line regions, from which the angle of each marker line is calculated; all angle values are recorded in an angle table whose criteria determine whether a bolt with a marker line is in danger of loosening. In the object localization stage, the improved algorithm is compared with the algorithm before improvement. The results show a significant improvement in both detection accuracy and detection speed, with mAP (IoU = 0.75) reaching 0.77 and the frame rate reaching 16.6 fps. In the saliency detection stage, qualitative and quantitative comparisons show that the method significantly outperforms other state-of-the-art methods, with MAE of 0.092, F-measure of 0.948, and AUC of 0.943. Ultimately, according to the angle table, out of 676 bolt samples, 60 bolts are loose, 69 are at risk of loosening, and 547 are properly tightened.
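As a hedged illustration of the kind of attention module this work attaches to its ResNet50-based SSD detector, the sketch below implements a standard CBAM block (channel attention followed by spatial attention) in PyTorch; the reduction ratio, kernel size, and the example feature-map shape are generic assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention, then spatial attention.
    A generic sketch; reduction ratio and spatial kernel are assumed defaults."""
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        # Channel attention: shared MLP applied to avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))            # (B, C)
        mx = self.mlp(x.amax(dim=(2, 3)))             # (B, C)
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        spatial_in = torch.cat([x.mean(dim=1, keepdim=True),
                                x.amax(dim=1, keepdim=True)], dim=1)  # (B, 2, H, W)
        return x * torch.sigmoid(self.spatial(spatial_in))

# Example: refine a ResNet50 stage-3 feature map before the SSD detection heads.
feat = torch.randn(2, 1024, 38, 38)
refined = CBAM(1024)(feat)   # same shape, attention-weighted
```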
The rapid development of information technology has fueled an ever-increasing demand for ultrafast and ultralow-energy-consumption computing. Existing computing instruments are predominantly electronic processors, which use electrons as information carriers and possess von Neumann architecture featured by physical separation of storage and processing. The scaling of computing speed is limited not only by data transfer between memory and processing units, but also by RC delay associated with integrated circuits. Moreover, excessive heating due to Ohmic losses is becoming a severe bottleneck for both speed and power consumption scaling. Using photons as information carriers is a promising alternative. Owing to the weak third-order optical nonlinearity of conventional materials, building integrated photonic computing chips under traditional von Neumann architecture has been a challenge. Here, we report a new all-optical computing framework to realize ultrafast and ultralow-energy-consumption all-optical computing based on convolutional neural networks. The device is constructed from cascaded silicon Y-shaped waveguides with side-coupled silicon waveguide segments, which we termed “weight modulators”, to enable complete phase and amplitude control in each waveguide branch. The generic device concept can be used for equation solving, multifunctional logic operations as well as many other mathematical operations. Multiple computing functions including transcendental equation solvers, multifarious logic gate operators, and half-adders were experimentally demonstrated to validate the all-optical computing performances. The time-of-flight of light through the network structure corresponds to an ultrafast computing time of the order of several picoseconds with an ultralow energy consumption of dozens of femtojoules per bit. Our approach can be further expanded to fulfill other complex computing tasks based on non-von Neumann architectures and thus paves a new way for on-chip all-optical computing.
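To make the weighted-interference idea concrete, here is a minimal NumPy sketch, not the authors' design, that models each branch of a Y-junction network as a complex (amplitude and phase) weight acting on an optical field, with square-law detection at the output; the encoding, weights, and threshold below are illustrative assumptions.

```python
import numpy as np

def branch_output(fields, weights):
    """Coherently sum branch fields after complex weighting. The detected
    intensity |sum(w_i * E_i)|^2 acts as a weighted sum followed by a
    square-law nonlinearity."""
    fields = np.asarray(fields, dtype=complex)
    weights = np.asarray(weights, dtype=complex)
    return np.abs(np.sum(weights * fields)) ** 2

def encode(bit):
    """Encode logical 0/1 as unit-amplitude fields with phase 0 or pi (assumed mapping)."""
    return np.exp(1j * np.pi * bit)

# With equal weights, in-phase inputs interfere constructively and out-of-phase
# inputs cancel, so thresholding the intensity gives XNOR-like behaviour --
# an illustrative toy, not the gates reported in the paper.
w = [0.5, 0.5]
for a in (0, 1):
    for b in (0, 1):
        intensity = branch_output([encode(a), encode(b)], w)
        print(a, b, intensity > 0.5)   # True exactly when a == b
```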
The background pattern of patterned fabrics is complex and strongly interferes with the extraction of defect features. Traditional machine vision algorithms rely on hand-designed features, which are greatly affected by background patterns and have difficulty extracting flaw features effectively. Therefore, a convolutional neural network (CNN) with automatic feature extraction is proposed. On the basis of the two-stage detection model Faster R-CNN, ResNet-50 is used as the backbone network; the problem of flaws with extreme aspect ratios is addressed by improving the initialization of the anchor aspect ratios, and an improved multi-scale model is designed to improve the detection of small defects. Cascade R-CNN is introduced to improve the accuracy of defect detection, the online hard example mining (OHEM) algorithm is used to strengthen learning on hard samples and reduce the interference of complex backgrounds on patterned-fabric defect detection, and focal loss is constructed as the loss function to reduce the impact of sample imbalance. To verify the effectiveness of the improved algorithm, a defect-detection comparison experiment was set up. The experimental results show that the accuracy of the proposed defect detection algorithm for patterned fabrics reaches 95.7%, and it can accurately locate defects and meet the practical needs of the factory.
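A hedged sketch of the focal-loss idea this work uses to counter sample imbalance; the alpha and gamma values below are the commonly used defaults, not necessarily the values chosen in the paper.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy examples so training focuses on
    hard, misclassified samples. `logits` and `targets` share the same shape."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class balancing weight
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

# Example: per-anchor objectness logits against binary labels.
logits = torch.randn(8, 100)
labels = torch.randint(0, 2, (8, 100)).float()
print(focal_loss(logits, labels).item())
```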
With breakthroughs in data processing and pattern recognition through deep learning technologies, the use of advanced algorithmic models for analyzing and interpreting soil spectral information has provided an efficient and economical method for soil quality assessment. However, traditional single-output networks exhibit limitations in the prediction process, particularly in their inability to fully utilize the correlations among various elements. As a result, single-output networks tend to be optimized for a single task, neglecting the interrelationships among different soil elements, which limits prediction accuracy and model generalizability. To overcome this limitation, in this study, a multi-task learning architecture with a progressive extraction network was implemented for the simultaneous prediction of multiple indicators in soil, including nitrogen (N), organic carbon (OC), calcium carbonate (CaCO3), cation exchange capacity (CEC), and pH. Furthermore, while incorporating the Pearson correlation coefficient, convolutional neural networks, long short-term memory networks, and attention mechanisms were combined to extract local abstract features from the original spectra, thereby further improving the model. This architecture is referred to as the Relevance-sharing Progressive Layered Extraction Network. The model employs an adaptive joint loss optimization method to update the weights of individual task losses in the multi-task learning training process.
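The adaptive joint loss is not specified in detail here; as one common way to realize such adaptive task weighting, the sketch below uses learnable homoscedastic-uncertainty weights over the per-task regression losses. This is an assumed stand-in, not the paper's exact scheme, and the five-task setup mirrors the soil indicators listed above.

```python
import torch
import torch.nn as nn

class AdaptiveMultiTaskLoss(nn.Module):
    """Weights each task loss by a learnable log-variance:
    total = sum_i ( exp(-s_i) * L_i + s_i ), a standard adaptive weighting scheme."""
    def __init__(self, num_tasks: int):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for s, loss in zip(self.log_vars, task_losses):
            total = total + torch.exp(-s) * loss + s
        return total

# Example with five soil targets (N, OC, CaCO3, CEC, pH), each an MSE regression.
criterion = AdaptiveMultiTaskLoss(num_tasks=5)
preds = [torch.randn(16, 1, requires_grad=True) for _ in range(5)]
labels = [torch.randn(16, 1) for _ in range(5)]
losses = [nn.functional.mse_loss(p, y) for p, y in zip(preds, labels)]
print(criterion(losses).item())   # joint loss; log_vars are trained with the model
```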
Deep neural networks are now widely used in the medical image segmentation field for their superior performance and freedom from manual feature extraction. U-Net has been the baseline model from the very beginning, owing to a symmetrical U-structure that supports better feature extraction and fusion and suits small datasets. To enhance the segmentation performance of U-Net, cascaded U-Net places two U-Nets in succession to segment targets from coarse to fine. However, the plain cascaded U-Net has too few connections between the two networks, so the contextual information learned by the former U-Net cannot be fully used by the latter one. In this article, we devise a novel Inner Cascaded U-Net and Inner Cascaded U²-Net as improvements to the plain cascaded U-Net for medical image segmentation. The proposed Inner Cascaded U-Net adds inner nested connections between the two U-Nets to share more contextual information. To further boost segmentation performance, we propose Inner Cascaded U²-Net, which applies residual U-blocks to capture more global contextual information at different scales. The proposed models can be trained from scratch in an end-to-end fashion and have been evaluated on the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2013 and ISBI Liver Tumor Segmentation Challenge (LiTS) datasets in comparison with the related U-Net, cascaded U-Net, U-Net++, U²-Net, and state-of-the-art methods. Our experiments demonstrate that the proposed Inner Cascaded U-Net and Inner Cascaded U²-Net achieve better segmentation performance in terms of Dice similarity coefficient and Hausdorff distance, as well as finer outline segmentation.
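To illustrate the coarse-to-fine cascade, here is a minimal PyTorch sketch that chains two tiny U-Nets and, as an assumed form of the "inner" connections, feeds the first network's decoder features into the second network alongside the image; the real wiring and depth in the paper are richer than this toy.

```python
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    """One-level U-Net: encode, downsample, bottleneck, upsample, decode with a skip."""
    def __init__(self, cin, base=16):
        super().__init__()
        self.enc, self.bott, self.dec = block(cin, base), block(base, base), block(2 * base, base)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, x):
        e = self.enc(x)
        b = self.up(self.bott(self.pool(e)))
        return self.dec(torch.cat([e, b], dim=1))       # feature map at input resolution

class InnerCascaded(nn.Module):
    """Second U-Net sees the image plus the first U-Net's features (assumed wiring)."""
    def __init__(self, cin=1, base=16, num_classes=2):
        super().__init__()
        self.unet1 = TinyUNet(cin, base)
        self.unet2 = TinyUNet(cin + base, base)
        self.head = nn.Conv2d(base, num_classes, 1)

    def forward(self, x):
        f1 = self.unet1(x)                               # coarse contextual features
        f2 = self.unet2(torch.cat([x, f1], dim=1))       # refined, context-aware features
        return self.head(f2)

logits = InnerCascaded()(torch.randn(1, 1, 64, 64))      # (1, 2, 64, 64) class logits
```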
Image super-resolution is an important field of computer vision research. The current mainstream image super-resolution technology uses deep learning to mine deeper features of the image and then uses them for image restoration. However, most such models are trained on images at a single scale and do not consider the relationships between images at different scales. In order to utilize the information of images at different scales, we design a cascade network structure of cascaded super-resolution convolutional neural networks. This network contains three cascaded FSRCNNs. Since each sub-FSRCNN processes an image at a specific scale, our network can simultaneously exploit three image scales and use the information from three different scales of images. Experiments on multiple datasets confirm that the proposed network achieves better performance for image SR.
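A hedged sketch of chaining three FSRCNN-style sub-networks so that each stage upscales by 2x and feeds the next, yielding outputs at three scales; the layer widths and the 2x-per-stage factor are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class FSRCNNStage(nn.Module):
    """FSRCNN-like stage: feature extraction, shrinking, mapping, expanding,
    then a transposed convolution that upscales by 2x."""
    def __init__(self, channels=1, d=32, s=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, d, 5, padding=2), nn.PReLU(),
            nn.Conv2d(d, s, 1), nn.PReLU(),
            nn.Conv2d(s, s, 3, padding=1), nn.PReLU(),
            nn.Conv2d(s, d, 1), nn.PReLU(),
        )
        self.up = nn.ConvTranspose2d(d, channels, 4, stride=2, padding=1)

    def forward(self, x):
        return self.up(self.body(x))

class CascadedSR(nn.Module):
    """Three cascaded stages give 2x, 4x and 8x outputs from one low-resolution input."""
    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList([FSRCNNStage() for _ in range(3)])

    def forward(self, x):
        outputs = []
        for stage in self.stages:
            x = stage(x)
            outputs.append(x)        # each scale can be supervised against its HR target
        return outputs

outs = CascadedSR()(torch.randn(1, 1, 32, 32))
print([o.shape for o in outs])       # spatial sizes 64, 128, 256
```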
Automatic modulation classification (AMC) aims to identify the modulation format of received signals corrupted by noise, and plays a major role in radio monitoring. In this paper, we propose a novel cascaded convolutional neural network (CasCNN)-based hierarchical digital modulation classification scheme in which M-ary phase shift keying (PSK) and M-ary quadrature amplitude modulation (QAM) formats are to be classified. In CasCNN, two block convolutional neural networks are cascaded. The first block network classifies the modulation class, namely PSK or QAM. The second block identifies the modulation order within the PSK or QAM class. Moreover, the grid constellation diagram extracted from the received signal is used as the input to CasCNN. Extensive simulations demonstrate that CasCNN yields performance gains and exhibits stronger robustness to frequency offset compared with other recent methods. Specifically, CasCNN achieves 90% classification accuracy at a 4 dB signal-to-noise ratio when the symbol length is set to 256.
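As a hedged illustration of the grid constellation input, the following sketch bins complex baseband symbols into a 2D occupancy image that a two-stage classifier could consume; the grid size, extent, and normalization are assumptions, not the paper's preprocessing.

```python
import numpy as np

def grid_constellation(symbols: np.ndarray, bins: int = 32, extent: float = 1.5):
    """Map complex symbols onto a bins x bins grid over the IQ plane and
    normalize to [0, 1]; this image is the assumed CasCNN input."""
    hist, _, _ = np.histogram2d(symbols.real, symbols.imag,
                                bins=bins, range=[[-extent, extent]] * 2)
    return hist / max(hist.max(), 1.0)

# Example: 256 noisy QPSK symbols -> one 32x32 single-channel image.
rng = np.random.default_rng(0)
bits = rng.integers(0, 4, size=256)
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * bits))
noisy = qpsk + 0.2 * (rng.standard_normal(256) + 1j * rng.standard_normal(256))
image = grid_constellation(noisy)
print(image.shape)   # (32, 32); stage 1 would predict PSK vs. QAM, stage 2 the order
```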
To generate realistic three-dimensional animation of a virtual character, capturing real facial expressions is the primary task. Due to diverse facial expressions and complex backgrounds, facial landmarks recognized by existing strategies suffer from deviations and low accuracy. Therefore, this paper proposes a facial expression capture method based on a two-stage neural network that takes advantage of an improved multi-task cascaded convolutional network (MTCNN) and a high-resolution network. Firstly, the convolution operations of the traditional MTCNN are improved: face information in the input image is quickly filtered by feature fusion in the first stage, and Octave Convolution replaces the original convolutions in the second stage to enhance the feature extraction ability of the network, which further rejects a large number of false candidates. The model outputs more accurate face candidate windows for better landmark recognition and locates the faces. Then the images cropped after face detection are fed into the high-resolution network. Multi-scale feature fusion is realized by connecting multi-resolution streams in parallel, and rich high-resolution heatmaps of facial landmarks are obtained. Finally, the changes of the recognized facial landmarks are tracked in real time, and the expression parameters are extracted and transmitted to the Unity3D engine to drive the virtual character's face, realizing synchronized facial expression animation. Extensive experimental results on the WFLW database demonstrate the superiority of the proposed method in terms of accuracy and robustness, especially for diverse expressions and complex backgrounds. The method can accurately capture facial expressions and generate three-dimensional animation effects, making online entertainment and social interaction more immersive in shared virtual spaces.
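A hedged sketch of the second-stage step described above: decoding landmark coordinates from high-resolution heatmaps by taking each channel's peak. The heatmap resolution is an assumption, and 98 landmarks simply reflects the WFLW annotation scheme; the paper's decoding may include refinements not shown here.

```python
import torch

def heatmaps_to_landmarks(heatmaps: torch.Tensor) -> torch.Tensor:
    """Convert (B, K, H, W) landmark heatmaps to (B, K, 2) pixel coordinates
    by locating each channel's maximum response (no sub-pixel refinement)."""
    b, k, h, w = heatmaps.shape
    flat = heatmaps.view(b, k, -1)
    idx = flat.argmax(dim=-1)                                  # (B, K) flat indices
    ys = torch.div(idx, w, rounding_mode="floor").float()
    xs = (idx % w).float()
    return torch.stack([xs, ys], dim=-1)                       # (B, K, 2) in heatmap pixels

# Example: 98 WFLW-style landmarks decoded from 64x64 heatmaps.
coords = heatmaps_to_landmarks(torch.rand(2, 98, 64, 64))
print(coords.shape)   # torch.Size([2, 98, 2]); scale by crop_size / 64 to map back to the face crop
```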
In this paper, we propose a multi-task system that can identify dish types, food ingredients, and cooking methods from food images with deep convolutional neural networks. We built a dataset of 360 classes of different foods with at least 500 images for each class. To reduce the noise of the data, which was collected from the Internet, outlier images were detected and eliminated through a one-class SVM trained with deep convolutional features. We simultaneously trained a dish identifier, a cooking-method recognizer, and a multi-label ingredient detector, which share a few low-level layers in the deep network architecture. The proposed framework shows higher accuracy than traditional methods with handcrafted features, and the cooking-method recognizer and ingredient detector can be applied to dishes that are not included in the training dataset to provide reference information for users.
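A hedged sketch of the shared-low-level-layers idea: one convolutional trunk feeding a dish classifier, a cooking-method classifier, and a multi-label ingredient head. The ResNet-18 backbone and the method/ingredient class counts are stand-in assumptions; only the 360 dish classes come from the abstract.

```python
import torch
import torch.nn as nn
from torchvision import models

class FoodMultiTask(nn.Module):
    """Shared CNN trunk with three task heads: dish type (360-way softmax),
    cooking method (softmax), and ingredients (multi-label sigmoid)."""
    def __init__(self, num_dishes=360, num_methods=10, num_ingredients=200):
        super().__init__()
        backbone = models.resnet18(weights=None)                       # stand-in shared layers
        self.trunk = nn.Sequential(*list(backbone.children())[:-1])    # ends with global pooling
        feat = backbone.fc.in_features
        self.dish_head = nn.Linear(feat, num_dishes)
        self.method_head = nn.Linear(feat, num_methods)
        self.ingredient_head = nn.Linear(feat, num_ingredients)

    def forward(self, x):
        f = self.trunk(x).flatten(1)
        return self.dish_head(f), self.method_head(f), self.ingredient_head(f)

model = FoodMultiTask()
dish, method, ingredients = model(torch.randn(4, 3, 224, 224))
# Train with cross-entropy on dish/method logits and BCEWithLogitsLoss on ingredient logits.
```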
Deep learning based methods have been successfully applied to semantic segmentation of optical remote sensing images. However, as more and more remote sensing data is available, it is a new challenge to comprehensively utilize multi-modal remote sensing data to break through the performance bottleneck of single-modal interpretation. In addition, semantic segmentation and height estimation in remote sensing data are two tasks with strong correlation, but existing methods usually study individual tasks separately, which leads to high computational resource overhead. To this end, we propose a Multi-Task learning framework for Multi-Modal remote sensing images (MM_MT). Specifically, we design a Cross-Modal Feature Fusion (CMFF) method, which aggregates complementary information of different modalities to improve the accuracy of semantic segmentation and height estimation. Besides, a dual-stream multi-task learning method is introduced for Joint Semantic Segmentation and Height Estimation (JSSHE), extracting common features in a shared network to save time and resources, and then learning task-specific features in two task branches. Experimental results on the public multi-modal remote sensing image dataset Potsdam show that, compared to training two tasks independently, multi-task learning saves 20% of training time and achieves competitive performance, with mIoU of 83.02% for semantic segmentation and accuracy of 95.26% for height estimation.
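A hedged sketch of the dual-task idea: per-modality encoders, a simple concatenation-plus-convolution fusion standing in for the CMFF block, and a segmentation head plus a height-regression head on the shared representation; the encoder depth, channel sizes, and modality choice (RGB plus a single-channel elevation raster) are assumptions.

```python
import torch
import torch.nn as nn

class FusionMultiTask(nn.Module):
    """Toy cross-modal fusion: per-modality encoders, concat + 1x1 conv fusion,
    then a segmentation head and a height-regression head on shared features."""
    def __init__(self, classes=6, base=32):
        super().__init__()
        self.rgb_enc = nn.Sequential(nn.Conv2d(3, base, 3, padding=1), nn.ReLU(inplace=True))
        self.dsm_enc = nn.Sequential(nn.Conv2d(1, base, 3, padding=1), nn.ReLU(inplace=True))
        self.fuse = nn.Conv2d(2 * base, base, 1)
        self.seg_head = nn.Conv2d(base, classes, 1)      # per-pixel class logits
        self.height_head = nn.Conv2d(base, 1, 1)         # per-pixel height regression

    def forward(self, rgb, dsm):
        shared = torch.relu(self.fuse(torch.cat([self.rgb_enc(rgb),
                                                 self.dsm_enc(dsm)], dim=1)))
        return self.seg_head(shared), self.height_head(shared)

seg, height = FusionMultiTask()(torch.randn(1, 3, 128, 128), torch.randn(1, 1, 128, 128))
print(seg.shape, height.shape)    # (1, 6, 128, 128) and (1, 1, 128, 128)
```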
To address the complex backgrounds, small defect targets, and imbalanced class counts of vibration dampers on high-voltage transmission lines, an improved Cascade R-CNN (cascade region convolutional neural networks) model is proposed for damper defect recognition. An SE (squeeze and excitation) module is embedded into ResNet-101 (residual network-101) to enhance the learning ability of the network. An FPN (feature pyramid networks) module is introduced to extract multi-scale defect features. The Focal Loss function is used to reduce the classification loss of the region proposal module of Cascade R-CNN. Experimental results show that, compared with four other models, the proposed model achieves relatively high recognition accuracy and recognizes damper defects well, demonstrating its effectiveness.
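As a hedged illustration of the FPN module this work adds for multi-scale defect features, here is a minimal feature-pyramid sketch in PyTorch; the input channel widths follow typical ResNet-101 stages, and the 256-channel pyramid width is the usual default rather than the paper's confirmed setting.

```python
import torch
import torch.nn as nn

class TinyFPN(nn.Module):
    """Minimal feature pyramid: 1x1 lateral convs align channel widths, then a
    top-down pathway upsamples coarser levels and adds them into finer ones."""
    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        self.laterals = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):                     # feats ordered fine -> coarse
        laterals = [l(f) for l, f in zip(self.laterals, feats)]
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + nn.functional.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]

# Example: C3/C4/C5-like feature maps from a ResNet-101 backbone.
c3 = torch.randn(1, 512, 64, 64)
c4 = torch.randn(1, 1024, 32, 32)
c5 = torch.randn(1, 2048, 16, 16)
p3, p4, p5 = TinyFPN()([c3, c4, c5])
print(p3.shape, p4.shape, p5.shape)   # all 256-channel, at 64/32/16 resolution
```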
Prevailing linguistic steganalysis approaches focus on learning sensitive features to distinguish a particular category of steganographic texts from non-steganographic texts by performing binary classification. However, the setting in which various categories of non-steganographic or steganographic texts coexist remains an unsolved problem and poses a significant threat to the security of cyberspace. In this paper, we propose a general linguistic steganalysis framework named LS-MTL, which introduces the idea of multi-task learning to deal with the classification of various categories of steganographic and non-steganographic texts. LS-MTL captures sensitive linguistic features from multiple related linguistic steganalysis tasks and can concurrently handle diverse tasks with a single constructed model. In the proposed framework, convolutional neural networks (CNNs) are utilized as private base models to extract sensitive features for each steganalysis task. Besides, a shared CNN is built to capture potential interaction information and share linguistic features among all tasks. Finally, LS-MTL incorporates the private and shared sensitive features to identify the detected text as steganographic or non-steganographic. Experimental results demonstrate that the proposed framework LS-MTL outperforms the baseline in the multi-category linguistic steganalysis task, with average Acc, Pre, and Rec increased by 0.5%, 1.4%, and 0.4%, respectively. Further ablation results show that LS-MTL with the shared module has robust generalization capability and achieves good detection performance even in the case of sparse data.
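A hedged sketch of the shared-plus-private structure: each task has its own text CNN, one shared CNN serves all tasks, and the two feature vectors are concatenated for that task's binary (stego vs. cover) classifier. The vocabulary size, embedding width, filter settings, and the concatenation choice are illustrative assumptions, not the LS-MTL specification.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """1D-convolutional sentence encoder: embed tokens, convolve, max-pool over time."""
    def __init__(self, vocab=20000, embed=128, filters=64, width=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, embed)
        self.conv = nn.Conv1d(embed, filters, width, padding=width // 2)

    def forward(self, tokens):                        # (B, T) token ids
        x = self.embed(tokens).transpose(1, 2)        # (B, embed, T)
        return torch.relu(self.conv(x)).amax(dim=2)   # (B, filters)

class SharedPrivateSteganalysis(nn.Module):
    """One private TextCNN per task plus a shared TextCNN; their features are
    concatenated and fed to a per-task binary classifier."""
    def __init__(self, num_tasks=3, filters=64):
        super().__init__()
        self.shared = TextCNN(filters=filters)
        self.private = nn.ModuleList(TextCNN(filters=filters) for _ in range(num_tasks))
        self.heads = nn.ModuleList(nn.Linear(2 * filters, 2) for _ in range(num_tasks))

    def forward(self, tokens, task_id: int):
        feats = torch.cat([self.private[task_id](tokens), self.shared(tokens)], dim=1)
        return self.heads[task_id](feats)

logits = SharedPrivateSteganalysis()(torch.randint(0, 20000, (4, 50)), task_id=1)
print(logits.shape)   # (4, 2): steganographic vs. non-steganographic logits
```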
Funding (all-optical computing): financial support from the National Key Research and Development Program of China (2018YFB2200403) and the National Natural Science Foundation of China (NSFC) (61775003, 11734001, 91950204, 11527901, 11604378, 91850117).
Funding (patterned-fabric defect detection): National Key Research and Development Project, China (No. 2018YFB1308800).
Funding (Inner Cascaded U-Net): supported in part by the National Natural Science Foundation of China (No. 62172299), in part by the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0100), and in part by the Fundamental Research Funds for the Central Universities of China.
Funding (cascaded super-resolution network): supported in part by the National Natural Science Foundation of China under Grant 61806099, in part by the Natural Science Foundation of Jiangsu Province of China under Grant BK20180790, in part by the Natural Science Research of Jiangsu Higher Education Institutions of China under Grant 8KJB520033, and in part by the Startup Foundation for Introducing Talent of Nanjing University of Information Science and Technology under Grant 2243141701077.
Funding (CasCNN modulation classification): National Key Research and Development Program of China (2019YFB1804404); Beijing Natural Science Foundation (4202046); National Natural Science Foundation of China (61801052); Guangdong Key Field R&D Program (2018B010124001).
Funding (facial expression capture): this research was funded by the College Student Innovation and Entrepreneurship Training Program (grant numbers 2021055Z and S202110082031) and the Special Project for Cultivating Scientific and Technological Innovation Ability of College and Middle School Students in Hebei Province (grant number 2021H011404).
Funding (food image multi-task recognition): this work was supported by the National High Technology Research and Development 863 Program of China under Grant No. 2013AA013903, the National Natural Science Foundation of China under Grant No. 61373069, the Research Grant of Beijing Higher Institution Engineering Research Center, and the Tsinghua University Initiative Scientific Research Program.
基金National Key R&D Program of China(No.2022ZD0118401).
Funding (LS-MTL linguistic steganalysis): this work is partly supported by the National Natural Science Foundation of China under Grants 61972057 and 62172059, the Hunan Provincial Natural Science Foundation of China under Grants 2022JJ30623 and 2019JJ50287, and the Scientific Research Fund of Hunan Provincial Education Department of China under Grants 21A0211 and 19A265.