Document Image Binarization

Doctoral Dissertation uoadl:1308846

Unit:
Division of Communications and Signal Processing
Library of the School of Science
Deposit date:
2013-04-16
Year:
2013
Author:
Ntirogiannis Konstantinos (Ντιρογιάννης Κωνσταντίνος)
Dissertation committee:
Sergios Theodoridis, Professor (Supervisor); Vasileios Gatos, Researcher A, NCSR "Demokritos"; Georgios Kouroupetroglou, Associate Professor; Stavros Perantonis, Researcher A, NCSR "Demokritos"; Nikolaos Papamarkos, Professor, Democritus University of Thrace; Ioannis Pratikakis, Assistant Professor, Democritus University of Thrace; Alexandros Eleftheriadis, Associate Professor
Original Title:
Δυαδική Μετατροπή Εικόνων Κειμένου
Languages:
Greek
Translated title:
Document Image Binarization
Summary:
This thesis addresses document image binarization, covering both binarization
techniques and evaluation methodologies. First, a performance evaluation
methodology was developed that makes use of the skeleton of the characters.
This methodology was then improved to produce more reliable ground-truth
images, and several different evaluation measures were studied in the course
of developing the new measures. The new measures are based on (a) weights
defined relative to the ground-truth contour and (b) the local stroke width.
Experimental results demonstrate the effectiveness of the new measures for
document images. Concerning the binarization techniques, an existing technique
was improved to give better results for documents with fonts of various sizes
and to detect faint characters more reliably. To be more robust against
different degradation types, a new binarization technique was developed, based
on background estimation and on the combination of selected global and local
binarization techniques. Additionally, a technique was developed for
binarizing text areas captured from video content. Finally, the document image
binarization contests that we organized produced a publicly available
benchmark that aids the development of document image binarization techniques
and evaluation methodologies.
Keywords:
Pre-processing, Binarization, Evaluation metrics, Ground-truth image, Historical document image processing
Index:
Yes
Number of index pages:
17-35
Contains images:
Yes
Number of references:
157
Number of pages:
199
document.pdf (19 MB)