Dropout and other feature noising schemes have shown promise in controlling over-fitting by artificially corrupting the training data. Though extensive studies have been performed for generalized linear models, little has been done for support vector machines (SVMs), one of the most successful approaches for supervised learning. This paper presents dropout training for both linear SVMs and the nonlinear extension with latent representation learning. For linear SVMs, to deal with the intractable expectation of the non-smooth hinge loss under corrupting distributions, we develop an iteratively re-weighted least squares (IRLS) algorithm by exploring data augmentation techniques. Our algorithm iteratively minimizes the expectation of a re-weighted least squares problem, where the re-weights are analytically updated. For nonlinear latent SVMs, we consider learning one layer of latent representations in SVMs and extend the data augmentation technique, in conjunction with a first-order Taylor expansion, to deal with the intractable expected hinge loss and the nonlinearity of latent representations. Finally, we apply similar data augmentation ideas to develop a new IRLS algorithm for the expected logistic loss under corrupting distributions, and we further develop a nonlinear extension of logistic regression by incorporating one layer of latent representations. Our algorithms offer insights into the connection and difference between the hinge loss and the logistic loss in dropout training. Empirical results on several real datasets demonstrate the effectiveness of dropout training in significantly boosting the classification accuracy of both linear and nonlinear SVMs.
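To make the "intractable expectation of the non-smooth hinge loss" concrete, the following Python sketch estimates that expectation by brute-force sampling and checks it against quantities that do have closed forms. It is not the IRLS algorithm of the paper. It assumes unbiased dropout noise (each feature is zeroed with probability q and the survivors are rescaled by 1/(1 - q)), and the function names and the margin parameter `ell` are hypothetical choices made for illustration.

```python
import numpy as np

def dropout_corrupt(x, q, rng, n_samples):
    """Draw n_samples unbiased-dropout copies of x: each feature is zeroed
    with probability q and the survivors are rescaled by 1 / (1 - q)."""
    keep = rng.random((n_samples, x.shape[0])) >= q
    return keep * x / (1.0 - q)

def margin_moments(w, x, q):
    """Closed-form mean and variance of w @ x_tilde under the noise above."""
    mean = w @ x
    var = q / (1.0 - q) * np.sum((w * x) ** 2)
    return mean, var

def expected_hinge_mc(w, x, y, q, rng, n_samples=20000, ell=1.0):
    """Monte Carlo estimate of E[max(0, ell - y * w @ x_tilde)].
    Sampling is used because the max() prevents a simple closed form."""
    x_tilde = dropout_corrupt(x, q, rng, n_samples)
    return np.maximum(0.0, ell - y * (x_tilde @ w)).mean()

rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5])   # illustrative weights
x = np.array([1.0, 0.5, 2.0])    # illustrative input
clean_hinge = max(0.0, 1.0 - 1.0 * (w @ x))
noisy_hinge = expected_hinge_mc(w, x, y=1.0, q=0.5, rng=rng)
print(clean_hinge, noisy_hinge)
```

In this toy case the clean input sits exactly on the margin, so its hinge loss is zero, while the expected loss under corruption is strictly positive. The mean and variance of the margin are available in closed form, but the expected hinge loss is not a function of those two moments alone, which is the difficulty the abstract refers to.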
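Objective: Ship hull number detection and recognition is a key technology for maritime situational awareness, and accurate hull number detection and recognition is important for protecting maritime rights and interests. However, no public dataset currently supports this task. We therefore first construct a sparse ship hull number dataset in real scene (SSHN-RS), containing 3004 ship images with 11328 hull number characters in total. It covers hull numbers from many countries and ship types, including horizontal, tilted, simple-background, complex-background, poorly lit, and occluded samples, which makes it a challenging dataset. Based on SSHN-RS, we study hull number detection and recognition, whose main difficulties are: 1) samples are sparse, so models overfit easily; 2) hull number characters are densely distributed, so the network struggles to fully extract the features of each character; 3) some characters contain nested regions and similar regions, so the network outputs many redundant results. To address these difficulties, we propose a ship hull number detection and recognition algorithm based on multi-view progressive context decoupling. Method: First, a random perspective transformation with a fixed center and maximized area is introduced; it expands the range of hull number poses without increasing the number of samples, achieving data augmentation and improving the generalization ability of the model. Second, a progressive context decoupling technique is proposed: a series of new samples is generated by erasing the characters of a hull number one after another, and a feature extraction network then extracts and fuses the multi-scale features of these samples, which both reduces the interference of character context with feature learning and augments the data a second time. Finally, at test time, an inter-mask perturbation suppression technique is proposed: new samples are generated from the predictions using a method similar to progressive context decoupling and are predicted again, and a 1-D non-maximum suppression technique is then introduced to remove erroneous redundant characters from the predictions and output the best detection and recognition result, further improving network performance. Results: Qualitative and quantitative evaluations are carried out on SSHN-RS against mainstream instance segmentation algorithms. Quantitatively, the proposed algorithm reaches a hull number detection precision, recall, F-measure, and recognition rate of 0.9854, 0.9576, 0.9713, and 0.9018, respectively, all better than the other algorithms. Compared with the second-ranked algorithm, these are improvements of 4.51%, 3.45%, 3.97%, and 8.83%, respectively. Qualitatively, the proposed algorithm is better suited to the hull number detection and recognition task and achieves higher detection and recognition performance. In addition, the proposed algorithm generalizes to other instance segmentation algorithms: taking the classic Mask RCNN (mask region based convolutional neural network) as an example, adding the proposed modules improves the four metrics by 9.82%, 6.04%, 7.80%, and 6.73%, respectively. Conclusion: The proposed algorithm addresses the problems caused by sparse samples, densely distributed hull numbers, and nested or similar characters in the hull number detection and recognition task, achieves state-of-the-art performance in both subjective and objective evaluation, and is general. SSHN-RS is available at https://github.com/Bingchuan897/SSHN-RS.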
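The 1-D non-maximum suppression step mentioned in the method can be pictured with a short Python sketch. The abstract does not give the details of that step, so everything here is an assumption made for illustration: each predicted character is reduced to a horizontal interval `(start, end)` with a confidence score, overlap is measured by 1-D intersection over union, and the threshold of 0.5 and the function names are hypothetical.

```python
def iou_1d(a, b):
    """Intersection over union of two intervals given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def nms_1d(intervals, scores, iou_threshold=0.5):
    """Greedy 1-D non-maximum suppression.
    Visits predictions from highest to lowest score and keeps one only if it
    does not overlap an already kept prediction by more than iou_threshold.
    Returns the indices of the kept predictions in ascending order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou_1d(intervals[i], intervals[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return sorted(kept)

# Two pairs of heavily overlapping character predictions (made-up values).
intervals = [(0, 10), (1, 11), (12, 20), (13, 19)]
scores = [0.9, 0.8, 0.7, 0.95]
print(nms_1d(intervals, scores))
```

Within each overlapping pair only the higher-scoring prediction survives, which is the sense in which redundant characters, such as one character nested inside another, would be dropped under these assumptions.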