IEEE Transactions on Circuits and Systems for Video Technology
Tongkun Guan, Chaochen Gu, Changsheng Lu, Jingzheng Tu, Qi Feng, Kaijie Wu, Xinping Guan
Detecting the marking characters on industrial metal parts remains challenging because images of these parts exhibit low visual contrast, uneven illumination, corroded surfaces, and cluttered backgrounds. Under these conditions, the bounding boxes generated by most existing methods fail to localize low-contrast text areas accurately. In this paper, we propose a refined feature-attentive network (RFN) to address this inaccurate localization problem. Specifically, we first design a parallel feature integration mechanism that constructs an adaptive feature representation from multi-resolution features, which enhances the perception of multi-scale text at each scale-specific level and yields a high-quality attention map. Then, an attentive proposal refinement module uses the attention map to rectify the location deviation of candidate boxes. In addition, a re-scoring mechanism is designed to select the text boxes with the best rectified locations. To promote research on industrial scene text detection, we contribute two industrial scene text datasets containing a total of 102,156 images and 1,948,809 text instances with various character structures and metal parts. Extensive experiments on our dataset and four public datasets demonstrate that the proposed method achieves state-of-the-art performance. Both the code and the dataset are available at: https://github.com/TongkunGuan/RFN.
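
To make the re-scoring idea concrete, the following is a minimal illustrative sketch and not the RFN implementation. It assumes, purely for illustration, that a candidate box is re-scored by multiplying its detector confidence by the mean attention value inside the box; the names `mean_attention`, `rescore`, and `best_box`, the half-open box convention, and the scoring rule itself are all hypothetical and do not come from the paper or its repository.

```python
from typing import List, Sequence, Tuple

# Assumed box convention: (x0, y0, x1, y1) in pixels, half-open on x1 and y1.
Box = Tuple[int, int, int, int]


def mean_attention(attention: List[List[float]], box: Box) -> float:
    """Average attention value inside a box; returns 0.0 for an empty box."""
    x0, y0, x1, y1 = box
    values = [attention[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values) if values else 0.0


def rescore(attention: List[List[float]],
            boxes: Sequence[Box],
            scores: Sequence[float]) -> List[float]:
    """Hypothetical rule: detector score times mean attention inside the box."""
    return [s * mean_attention(attention, b) for b, s in zip(boxes, scores)]


def best_box(attention: List[List[float]],
             boxes: Sequence[Box],
             scores: Sequence[float]) -> Tuple[Box, float]:
    """Return the candidate with the highest re-scored confidence."""
    new_scores = rescore(attention, boxes, scores)
    best = max(range(len(boxes)), key=new_scores.__getitem__)
    return boxes[best], new_scores[best]


if __name__ == "__main__":
    # Toy 4x4 attention map: high response only in the central 2x2 region.
    attention = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 1.0, 0.0],
        [0.0, 1.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    candidates = [(0, 0, 2, 2), (1, 1, 3, 3)]
    detector_scores = [0.9, 0.8]
    print(best_box(attention, candidates, detector_scores))
```

In this toy case the first candidate has the higher detector score but overlaps the attended region in only one of its four pixels, so the second, better-aligned candidate is selected after re-scoring. This mirrors the stated goal of preferring boxes with the best rectified location, under the assumptions above.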