Extracting Textual Information from Images and Videos for Automatic Content-Based Annotation and Retrieval
Gllavata, Julinda
One way to bring semantic knowledge into the annotation of digital image and video databases is to use the text that appears in them. Such text usually carries important information about the content and is well suited to keyword-based queries. The extraction of scene text and artificial text from images and videos is therefore an important research problem, aimed at automatic content-based retrieval and summarization of visual information.
The process of text extraction includes several steps:
- Text detection is aimed at identifying image parts containing text.
- Text localization merges text regions which belong to the same text candidate and determines the exact text positions.
- Text tracking tracks the localized text over successive frames in a video.
- Text segmentation and binarization separate the localized text from the image background. The output of this step is a binary image in which black text characters appear on a white background.
- Character recognition applies optical character recognition (OCR) to the binarized image and converts it to ASCII text.
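The segmentation and binarization step listed above can be illustrated with a minimal sketch. It assumes a localized text region is available as a grid of 8-bit grey values and uses Otsu's global threshold as a stand-in; the abstract does not state which binarization method the thesis uses, so the function names `otsu_threshold` and `binarize` and the `text_is_dark` flag are hypothetical and serve only as illustration.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the grey level t (0..255) that maximizes the between-class
    variance; pixels <= t form the dark class, pixels > t the bright class."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of grey values in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break               # bright class empty, nothing left to split
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2   # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(region: List[List[int]], text_is_dark: bool = True) -> List[List[int]]:
    """Map a grey-level text region to black text (0) on a white background (255).

    text_is_dark is an assumption supplied by the caller: True for dark text
    on a bright background, False for bright text on a dark background."""
    flat = [p for row in region for p in row]
    t = otsu_threshold(flat)
    return [[0 if (p <= t) == text_is_dark else 255 for p in row] for row in region]


if __name__ == "__main__":
    # Hypothetical 3x3 region: two dark "text" pixels on a bright background.
    region = [
        [200, 200, 200],
        [200, 30, 200],
        [200, 40, 200],
    ]
    for row in binarize(region):
        print(row)
```

A global threshold is only a reasonable choice for regions with fairly uniform contrast; for complex backgrounds such as those the thesis targets, a local or adaptive method would likely be needed.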
This thesis presents a robust system for automatically extracting text from images and videos with complex backgrounds. Algorithms are proposed for the individual steps of the text extraction process listed above. The system operates on JPEG images and MPEG-1 videos. Text tracking in videos is also addressed, and a novel tracking algorithm is presented. Individual and comparative experimental results demonstrate the performance of the proposed algorithms for the main processing steps (text detection, localization, and segmentation) and, in particular, for their combination.
Text in images or videos can appear in different scripts, such as Latin, ideographic, or Arabic. Identifying the script in use can improve segmentation results and increase OCR accuracy, because the appropriate algorithms can then be chosen. A novel technique for script recognition in complex images is therefore presented.
Content-based media retrieval has received considerable attention in recent years, and query by example is the most widely used methodology. In this context, it may be of interest to search for images or video frames in which text visually similar to the input text image appears. A novel technique for the holistic comparison of text images is therefore proposed. Relevance feedback methods have recently attracted researchers because they allow interaction with the user to increase the performance of a content-based image retrieval (CBIR) system. However, given the growing number of images and the user's need to explore the media before making a decision, techniques for visualizing or browsing an image collection are becoming important. Consequently, several visualization and browsing methods are proposed to facilitate the interactive exploratory analysis of large image data sets and to assist the user during semantic search.
Philipps-Universität Marburg
Data processing Computer science
urn:nbn:de:hebis:04-z2007-01071
https://doi.org/10.17192/z2007.0107
opus:1618
2011-08-10
Text Segmentation
Indexing [subject indexing]
Extraction of Textual Information from Image and Video Collections for Automatic Content-Based Retrieval Systems
The textual information present in digital images and videos offers an excellent opportunity to bring semantic knowledge into the indexing of image and video collections. Linking this information to the content of the digital media enables word-based queries that exploit it. Text extraction from images and videos is therefore of great importance for automatic content-based search systems.
Text extraction from images and videos consists of the following steps:
- Text detection is the process of identifying the regions of an image in which text appears.
- Text localization builds on text detection and merges the regions belonging to the same text in order to determine the exact text position.
- Text tracking in videos follows previously localized text across several successive frames.
- Text segmentation and text binarization is the process of separating text pixels from background pixels. The output of this step is a binary image in which the characters appear in black on a white background.
- Character recognition aims to extract ASCII text from a binary image by means of optical character recognition.
This thesis presents a robust system for the automatic extraction of text in images and videos. Several algorithms are presented for each of the problems listed above. The system works with both JPEG images and MPEG-1 videos. The experimental results document the quality of the individual steps and of their combination.
Because text in images can appear in different scripts (e.g. ideographic or Latin script), recognizing the script beforehand enables better text segmentation or text recognition. For this purpose, a method for script recognition in images with complex backgrounds is presented.
Furthermore, a new method has been developed to enable the holistic comparison of text images. In content-based search, such approaches are of interest because they simplify the search for images with similar text occurrences. In addition, query by example is becoming increasingly important in content-based search. Relevance feedback methods have recently come into focus because they allow users to interact with the system. Moreover, there is a growing need for methods for the visualization and exploration ("browsing") of image collections, driven by their increasing size and the resulting user interest in searching these large collections quickly and easily. New methods are therefore proposed that support the user during this semantic search process.
Text Detektion and Lokalisierung
Text Extraction
235
application/pdf
2007-02-05
Mathematik und Informatik
Multimedia Indexierung
ths
Prof.Dr.
Freisleben
Bernd
Freisleben, Bernd (Prof.Dr.)
ppn:185729150
English
Text Extrahierung
Image processing
https://archiv.ub.uni-marburg.de/diss/z2007/0107/cover.png
2007
Informatik
Text Segmentierung
Data Mining
monograph
Multimedia
Clustering
Cluster analysis
doctoralThesis
Publikationsserver der Universitätsbibliothek Marburg
Universitätsbibliothek Marburg
Text Detection and Localization
Gllavata
Julinda
Fachbereich Mathematik und Informatik
2007-04-12
Multimedia Indexing