LAMP Seminar
Language and Media Processing Laboratory
Conference Room 4424
A.V. Williams Building
University of Maryland

December 1, 2000, 1:00 PM
A. Antonacopoulos

University of Liverpool
Text Extraction from WWW Images


There is a significant need to analyse the text in images on WWW pages, both for effective indexing and for presentation by non-visual means (e.g. audio). This talk argues that extracting text from such images benefits from an anthropocentric approach to distinguishing colour regions. The method described is part of a systematic effort to approximate the characteristics of human colour perception in order to identify character regions. In this instance, the image is decomposed by histogram analysis of Hue and Luminance in the HLS colour space, followed by merging using a wavelength-luminance colour representation.
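
As a rough illustration of the histogram step only, the sketch below builds Hue and Luminance histograms for a list of RGB pixels using Python's standard colorsys module. The function name hls_histograms, the bin count, and the choice to skip hue for unsaturated (grey) pixels are assumptions made for this example; it is not the speaker's implementation and does not cover the merging stage.

```python
import colorsys


def hls_histograms(pixels, bins=8):
    """Return (hue_hist, lum_hist) for an iterable of (r, g, b) pixels.

    Channel values are assumed to be integers in 0..255.
    The name and the bin count are illustrative assumptions.
    """
    hue_hist = [0] * bins
    lum_hist = [0] * bins
    for r, g, b in pixels:
        # colorsys works on floats in [0, 1] and returns (hue, lightness, saturation).
        h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that a value of exactly 1.0 falls into the last bin.
        lum_hist[min(int(l * bins), bins - 1)] += 1
        # Hue carries no information for greys (saturation 0), so skip them.
        if s > 0.0:
            hue_hist[min(int(h * bins), bins - 1)] += 1
    return hue_hist, lum_hist


if __name__ == "__main__":
    sample = [(255, 0, 0), (0, 0, 255), (255, 255, 255), (0, 0, 0)]
    hue_hist, lum_hist = hls_histograms(sample)
    print(hue_hist)
    print(lum_hist)
```

Peaks in the two histograms would then suggest candidate colour ranges for splitting the image into regions; how those ranges are chosen and merged is left to the talk.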

