Ahmadyfard A, Tolou Beidokhti M A. Removing Geometrical Distortion of Documents using Geometry of text lines. JSDP. 2017; 14(2): 141-158
Associate Professor, Shahrood University
Document images produced by a scanner or digital camera usually have geometric and photometric distortions, and either type of distortion degrades the performance of OCR systems. In this paper, we present a novel method to eliminate geometric distortion from document images. First, text lines are extracted from the image, and each line is divided into equal-width segments. The direction of each segment is then corrected so that the segment lies horizontally: the segment is rotated through a set of candidate angles, the horizontal projection of its image is computed for each rotation, and the rotation that yields the maximum of the projection is taken as the corrected direction of that segment. From each segment made parallel to the horizon in this way, a reference point, used as the base direction, is extracted. A polynomial is then fitted to the text line using the reference points of its segments. Finally, the geometric distortion of each part of a text line is eliminated using a perspective transform estimated from the fitted polynomial. To increase the stability of the method for short text lines, their curve fitting also uses reference information from adjacent long lines. The proposed method was evaluated on Persian and English databases and compared with existing methods. The results indicate the efficiency and accuracy of the proposed method in eliminating geometric distortions.
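The two central steps described above, the projection-based angle search for a segment and the polynomial fit through reference points, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `estimate_segment_angle` and `fit_baseline`, the angle range, the 0.5 degree step, the sign convention of the angle, and the polynomial degree are all assumptions made for illustration. The sketch also rotates ink-pixel coordinates rather than resampling the image, and it leaves out text-line extraction and the perspective transform.

```python
import numpy as np

def estimate_segment_angle(segment, angles_deg=None):
    """Return the candidate angle (degrees) whose horizontal projection has the highest peak.

    `segment` is a 2D boolean array in which True marks ink pixels.
    The angle range and step are illustrative assumptions.
    """
    if angles_deg is None:
        angles_deg = np.arange(-15.0, 15.5, 0.5)
    ys, xs = np.nonzero(segment)
    if ys.size == 0:
        return 0.0  # no ink, nothing to correct
    best_angle, best_peak = 0.0, -1
    for a in angles_deg:
        t = np.deg2rad(a)
        # Row coordinate of every ink pixel after rotating the segment by angle a.
        rows = np.round(ys * np.cos(t) - xs * np.sin(t)).astype(int)
        # Horizontal projection: number of ink pixels in each rotated row.
        profile = np.bincount(rows - rows.min())
        if profile.max() > best_peak:
            best_angle, best_peak = float(a), int(profile.max())
    return best_angle

def fit_baseline(ref_x, ref_y, degree=2):
    """Fit a polynomial y = p(x) through the reference points of one text line.

    Returns coefficients with the highest power first; the degree is an assumption.
    """
    return np.polyfit(ref_x, ref_y, degree)
```

In this sketch a segment whose strokes slope downward to the right by about 5 degrees should yield an estimate close to 5, and `np.polyval(coeffs, x)` evaluates the fitted curve at any column `x` of the line.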

Type of Study: Research | Subject: Paper
Received: 2015/08/22 | Accepted: 2017/03/5 | Published: 2017/10/21 | ePublished: 2017/10/21

