Extracting Textual Information from Images and Videos for Automatic Content-Based Annotation and Retrieval

Bibliographic Details
Main Author: Gllavata, Julinda
Contributors: Freisleben, Bernd (Prof. Dr.) (Thesis advisor)
Format: Dissertation
Language: English
Published: Philipps-Universität Marburg, 2007
Department: Mathematics and Computer Science (Mathematik und Informatik)
Description
Summary: One way to use semantic knowledge for annotating databases of digital images and videos is to exploit the textual information they contain. Such text usually conveys important information about the content and is well suited for keyword-based queries. In this context, the extraction of scene text and artificial text from images and videos is an important research problem, with the aim of achieving automatic content-based retrieval and summarization of visual information. The text extraction process comprises several steps:

- Text detection identifies the image regions that contain text.
- Text localization merges text regions that belong to the same text candidate and determines the exact text positions.
- Text tracking follows the localized text across successive video frames.
- Text segmentation and binarization separate the localized text from the image background. The output of this step is a binary image in which black text characters appear on a white background.
- Character recognition applies optical character recognition (OCR) to the binarized image and converts it to ASCII text.

In this thesis, a robust system for automatically extracting text that appears in images and videos with complex backgrounds is presented. Different algorithms are proposed for the individual steps of the text extraction process listed above. The system operates on JPEG images and MPEG-1 videos. The tracking of text in videos is also addressed, and a novel tracking algorithm is presented. Individual and comparative experimental results demonstrate the performance of the proposed algorithms for the main processing steps (text detection, localization, and segmentation) and, in particular, of their combination. Text in images or videos can appear in different scripts, such as Latin, ideographic, or Arabic.
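
To make the segmentation and binarization step concrete, here is a minimal sketch of how a localized text region can be turned into black text on a white background. The thesis's own segmentation algorithm is not described in this summary; this example substitutes Otsu's global thresholding as a stand-in, assumes 8-bit grayscale pixels stored as nested Python lists, and uses a hypothetical `text_is_dark` flag to handle text polarity.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels <= t form one class and pixels > t the other (Otsu's method).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_low, w_low = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_low += hist[t]
        if w_low == 0:
            continue  # no pixels at or below t yet
        w_high = total - w_low
        if w_high == 0:
            break  # every pixel is already in the lower class
        sum_low += t * hist[t]
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: list[list[int]], text_is_dark: bool = True) -> list[list[int]]:
    """Map a grayscale text region to 0 (black text) and 255 (white background).

    text_is_dark is a hypothetical polarity hint: True if the text pixels are
    darker than the background, False for bright text on a dark background.
    """
    t = otsu_threshold([p for row in image for p in row])
    if text_is_dark:
        return [[0 if p <= t else 255 for p in row] for row in image]
    return [[0 if p > t else 255 for p in row] for row in image]
```

For example, `binarize([[200, 200, 200], [200, 30, 200]])` marks only the dark pixel as text (0). A single global threshold is the simplest choice and tends to fail on the complex, uneven backgrounds this thesis targets, which is why locally adaptive or region-specific methods are usually preferred in practice.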

Identifying the script of a text can help improve segmentation results and increase OCR accuracy by allowing the appropriate algorithms to be chosen. Therefore, a novel technique for script recognition in complex images is presented. Content-based media retrieval has received considerable attention in recent years, and query by example is the most widely used methodology. In this context, it may be useful to search for images or video frames in which text visually similar to an input text image appears. For this purpose, a novel technique for the holistic comparison of text images is proposed. Recently, relevance feedback methods have attracted researchers because they allow interaction with the user to improve the performance of a content-based image retrieval (CBIR) system. However, given the growing number of images and the user's need to explore the media before making a decision, techniques for visualizing and browsing image collections are becoming increasingly important. Consequently, several visualization and browsing methods are proposed to facilitate interactive exploratory analysis of large image data sets and to assist the user during semantic search.
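
As an illustration of how relevance feedback can refine a CBIR query, the sketch below applies the classic Rocchio update: the query feature vector is moved toward the centroid of images the user marked as relevant and away from those marked as non-relevant, and the database is then re-ranked by cosine similarity. This is a textbook technique shown for illustration only, not the method proposed in the thesis; the feature vectors, the function names, and the weights `alpha`, `beta`, `gamma` (common default values) are assumptions.

```python
import math


def rocchio(query: list[float], relevant: list[list[float]],
            non_relevant: list[list[float]], alpha: float = 1.0,
            beta: float = 0.75, gamma: float = 0.15) -> list[float]:
    """Rocchio update: alpha*q + beta*mean(relevant) - gamma*mean(non_relevant)."""
    dim = len(query)

    def centroid(vectors: list[list[float]]) -> list[float]:
        # An empty feedback set contributes a zero vector.
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    rel_c, non_c = centroid(relevant), centroid(non_relevant)
    return [alpha * query[i] + beta * rel_c[i] - gamma * non_c[i] for i in range(dim)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(query: list[float], database: dict[str, list[float]]) -> list[str]:
    """Return image identifiers sorted from most to least similar to the query."""
    return sorted(database, key=lambda name: cosine(query, database[name]), reverse=True)
```

Usage: after the user marks results, call `q = rocchio(q, relevant_vectors, non_relevant_vectors)` and then `rank(q, database)` to produce the next result page. Each round of feedback shifts the query in feature space without requiring the user to formulate an explicit new query.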
DOI: https://doi.org/10.17192/z2007.0107