3, 2000, 1:00 PM
Labs, Lucent Technologies Inc.
Noisy Data Using Noisy Queries
As technologies for speech, handwriting, and printed character recognition
become more prevalent and less obtrusive, one can imagine situations
where they will be applied entirely in the background, without
imposing on the user, for the purposes of indexing and retrieval.
This scenario, however, raises the issue of coping with undetected,
uncorrected recognition errors. Consider the problem of querying
via voice a database that was created from faxed documents. To
accomplish this task, we must contend with ASR errors from the
speech recognition process, a completely different class of errors
from the OCR process, and the issue of judging the similarity
between spoken and printed keywords. In this talk, I'll describe
a new formalism, known as cross-domain approximate string matching,
for resolving these disparate constraints. We have formulated
this in terms of an optimization problem and developed a polynomial
time algorithm for its solution, along with several variations.
I'll conclude by presenting the results of a recent experiment
showing how cross-domain string matching can improve the effectiveness
of retrieval when searching a database of scanned, OCR'ed documents
using handwritten queries. (This is joint work with Gordon Wilfong,
also of Bell Labs.)
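
For readers unfamiliar with approximate string matching, the following is a minimal sketch of the classic single-domain edit distance, computed by dynamic programming in time proportional to the product of the two string lengths. It is not the cross-domain algorithm described in the talk, whose details the abstract does not give. The function name `edit_distance`, the `sub_cost` parameter, and the example confusion cost of 0.2 are illustrative assumptions; the idea is only that recognition errors (for example, an OCR engine reading "l" as "1") could be modeled by making likely confusions cheap.

```python
def edit_distance(a, b, sub_cost=None, indel_cost=1.0):
    """Classic dynamic-programming edit distance between strings a and b.

    sub_cost(x, y) is a hypothetical hook returning the cost of substituting
    character x with y; by default it is 0 for a match and 1 otherwise.
    """
    if sub_cost is None:
        sub_cost = lambda x, y: 0.0 if x == y else 1.0

    # prev[j] holds the cost of turning the processed prefix of a into b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,            # delete ca
                curr[j - 1] + indel_cost,        # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def ocr_cost(x, y):
    """Assumed, illustrative confusion model: 'l' and '1' are easily confused."""
    if x == y:
        return 0.0
    if {x, y} == {"l", "1"}:
        return 0.2
    return 1.0
```

With the default costs, `edit_distance("kitten", "sitting")` counts two substitutions and one insertion. Passing `sub_cost=ocr_cost` makes a noisy string such as "mode1" score as very close to the keyword "model", which is the kind of tolerance a retrieval system over recognized text would need.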