structures can provide a great deal of information in determining
which documents are relevant to a given query. Structure-based
matching enhances existing content-based matching and information
retrieval capabilities, and provides an effective way to quickly
reduce the set of candidate documents for similarity matching using document
I will present underlying techniques for structure-based retrieval,
such as document structure analysis, representation, similarity
matching, and classification, as described in the following papers:
Hao, Jason T. L. Wang, Michael P. Bieber, Peter A. Ng, "Heuristic
Classification of Office Documents", International Journal
on Artificial Intelligence Tools, pages 233-265, 1995.
Dengel, Frank Dubiel, "Clustering and Classification of Document
Structure - a Machine Learning Approach", Third International
Conference on Document Analysis and Recognition, Montreal, Canada,
pages 587-591, Aug. 1995.
Hao, Jason T. L. Wang, Peter A. Ng, "Nested Segmentation:
An Approach for Layout Analysis in Document Classification",
Second International Conference on Document Analysis and Recognition,
Tokyo, Japan, pages 319-322, Oct. 1993.
Dengel, "Initial Learning of Document Structure", Second
International Conference on Document Analysis and Recognition,
Tokyo, Japan, pages 86-90, Oct. 1993.