School of Measurement and Control Technology and Communication Engineering, Harbin University of Science and Technology, Harbin 150080, China
To address the low detection rate of Chinese characters in natural scenes, the difficulty of detecting small characters, and the large number of character categories, this paper proposes an improved YOLOv2-based method and applies it to Chinese character detection in natural scenes. First, the k-means++ clustering algorithm is used to determine the number and the aspect-ratio dimensions of the character candidate boxes (anchors). A multi-layer feature-fusion strategy is then proposed: the feature map output before the fourth max-pooling layer of the original network is convolved with 3×3 and 1×1 kernels and downsampled by a factor of 4 to obtain local features, and the feature map output before the fifth max-pooling layer is convolved with 3×3 and 1×1 kernels and downsampled by a factor of 2 to obtain further local features; these local features are fused with the global features. At the same time, the number of consecutive, repeated 3×3×1024 convolution layers in the high-level convolution stage is increased from 3 to 5. Finally, the original and improved YOLOv2 algorithms are compared on the Chinese Text in the Wild (CTW) dataset. Experimental results show that the improved YOLOv2 achieves a mean average precision (mAP) of 78.3% on Chinese character detection, 7.3 percentage points higher than the original YOLOv2 and significantly higher than that of other Chinese character detection methods for natural scenes.
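The anchor-clustering step can be illustrated with a short sketch. The abstract does not give the implementation details, so the following is a minimal reconstruction under common YOLOv2 conventions: anchors are clustered on ground-truth box shapes (width, height) using the distance d = 1 − IoU, with k-means++ seeding (next seed sampled proportional to the distance to the nearest existing centre). Function names and the mean-based centre update are illustrative assumptions, not the paper's code.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """Pairwise IoU between (w, h) boxes (N, 2) and anchors (K, 2),
    with all boxes assumed to share the same centre -> (N, K) matrix."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = ((boxes[:, 0] * boxes[:, 1])[:, None]
             + (anchors[:, 0] * anchors[:, 1])[None, :] - inter)
    return inter / union

def kmeanspp_anchors(boxes, k, iters=50, seed=0):
    """Cluster box shapes with k-means++ seeding under d = 1 - IoU (a sketch)."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.integers(len(boxes))][None, :].copy()
    for _ in range(k - 1):
        # k-means++ seeding: sample the next centre proportional to the
        # 1 - IoU distance to the nearest existing centre
        d = (1.0 - iou_wh(boxes, anchors)).min(axis=1)
        anchors = np.vstack([anchors, boxes[rng.choice(len(boxes), p=d / d.sum())]])
    for _ in range(iters):
        # Lloyd updates: assign each box to its nearest anchor, then
        # move each anchor to the mean shape of its assigned boxes
        assign = (1.0 - iou_wh(boxes, anchors)).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = boxes[assign == j].mean(axis=0)
    return anchors
```

Using 1 − IoU rather than Euclidean distance keeps the clustering scale-aware: a 10×10 and a 100×100 box with the same aspect ratio are closer under IoU than under raw width/height differences, which is why YOLOv2-style anchor selection favours it.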
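The shape arithmetic behind the fusion strategy can also be sketched. Assuming the standard Darknet-19 backbone with a 416×416 input, the feature map before the 4th max-pool is 52×52 and the one before the 5th is 26×26, while the top-level (global) features are 13×13; 4× and 2× downsampling therefore bring both local maps to 13×13 so they can be concatenated channel-wise. The channel counts after the 3×3/1×1 reduction convolutions are not given in the abstract, so the 64-channel figures below are hypothetical, and the reorg-style (space-to-depth) downsampling is an assumption borrowed from YOLOv2's passthrough layer.

```python
import numpy as np

def space_to_depth(x, r):
    """Reorg-style downsampling: (C, H, W) -> (C*r*r, H/r, W/r),
    moving each r x r spatial block into the channel dimension."""
    c, h, w = x.shape
    x = x.reshape(c, h // r, r, w // r, r)
    return x.transpose(0, 2, 4, 1, 3).reshape(c * r * r, h // r, w // r)

# Assumed Darknet-19 shapes for a 416x416 input; 64 is a hypothetical
# channel count after the 3x3 / 1x1 reduction convolutions.
local4 = np.zeros((64, 52, 52))      # features before the 4th max-pool
local5 = np.zeros((64, 26, 26))      # features before the 5th max-pool
global13 = np.zeros((1024, 13, 13))  # top-level (global) features

fused = np.concatenate([space_to_depth(local4, 4),   # 64*16 = 1024 ch at 13x13
                        space_to_depth(local5, 2),   # 64*4  = 256 ch at 13x13
                        global13], axis=0)
print(fused.shape)  # (2304, 13, 13)
```

The fused 13×13 map carries both fine-grained local detail (useful for the small characters the paper targets) and high-level semantics, and feeds the subsequent 3×3×1024 convolution stack.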