LAMP Seminar
Language and Media Processing Laboratory
Conference Room 4406
A.V. Williams Building
University of Maryland

Nov 15, 3 PM
Multi-Lingual Page Readers

Henry S. Baird
Lucent Technologies


Perhaps the greatest versatility demonstrable today by computer vision technology occurs in the interpretation of images of unconstrained multilingual machine-printed documents. I review the state of the art with particular attention to the analysis of complex unoriented page layouts, classification of symbols, and exploitation of linguistic contexts. Classifiers, to be useful in this domain, must generalize strongly across a wide range of writing systems, typefaces, and image degradations. The talk is illustrated with Chinese, English, Hebrew, Japanese, Korean, Swedish, Russian, Thai, Tibetan, and Turkish examples, plus a Russian-English dictionary. I conduct a brief tour of a software system architecture that allows our largely language-independent page-reader to be rapidly retargeted to new languages. (This is joint work with David Ittner, Tin Kam Ho, Craig Nohl, Dar-Shyang Lee, and others.)
Henry S. Baird is a Member of Technical Staff, Lucent Technologies, Murray Hill, New Jersey. His research focuses on the design and analysis of algorithms for machine vision with emphasis on the interpretation of images of printed documents. He is an Area Editor for the journal Computer Vision and Image Understanding. During 1989-91, he was an Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence. He was chair of the 1996 Symposium on Document Analysis and Information Retrieval, and was principal organizer of the 1990 IAPR Workshop on Syntactic and Structural Pattern Recognition. His Princeton University Ph.D. thesis on algorithms for image matching won a 1984 ACM Distinguished Dissertation Award and was published by the MIT Press. In 1976, his Master's thesis gave the first complete description of the sweep-line algorithm, now seen as a fundamental technique in computational geometry. He is a Fellow of the IAPR, a senior member of the IEEE, and a member of the ACM.

