The conference will be spread over three days, November 2-4, 2005. Each session will include a government representative discussing programs or needs related to the session's technology focus. Thursday afternoon will be devoted to demo abstracts and demonstrations. The Friday agenda will have additional talks and government discussions on potential collaboration and funding sources.

2005 Symposium on Document Image Understanding Technology
The Marriott Inn and Conference Center,
University of Maryland University College, Adelphi, Maryland
November 2-4, 2005

Tentative Agenda

Wednesday November 2nd

9:00 Welcome

9:15 Session 1: Document Analysis Systems

A Flexible Experimentation and Configuration Platform for Multilingual HMM OCR, E. MacRostie and P. Natarajan (BBN Technologies)

Document Information Processing: Towards Software and Test Collections, G. Agam, S. Argamon, O. Frieder, D. Grossman and D. Lewis* (Illinois Institute of Technology, David D. Lewis Consulting*)

DOCLIB: A Document Processing Research Tool, K. Chen*, S. Jaeger, G. Zhu and D. Doermann (University of Maryland, Booz Allen Hamilton*)

A Process Flow for Realizing High Accuracy for OCR Text, K. Taghva, J. Borsack and T. Nartker (University of Nevada, Las Vegas)

10:35 Break

11:00 Session 2: Document Processing and Enhancement

Government Briefing: The NSA/HLT Program - Barb Wheatley (NSA)

A Document Image Enhancement Module: Perspective Warp Correction, C. Monnier, V. Ablavsky*, S. Holden and M. Snorrason (Charles River Analytics Inc., Boston University*)

Document Recognition and Translation with Handheld Mobile Devices, E.D. Haritaoglu and I. Haritaoglu (Polar Rain, Inc.)

12:00 Lunch

1:00 KEYNOTE SPEAKER: Google Print, Luc Vincent (Google, Inc.)

2:15 Session 3: Processing of Arabic Documents

A System for Discriminating Handwriting from Machine Print on Noisy Arabic Documents, J. C. Femiani, M. Phielipp and A. Razdan (Arizona State University)

NLP-Enhanced Exploitation of Degraded Arabic Texts, M. Shalev (Encyclopaedia Britannica)

Initial Results in Offline Arabic Handwriting Recognition Using Large-Scale Geometric Features, I. Zavorin, E. Borovikov and M. Turner (CACI International Inc.)

Classification of Machine Print and Handwriting in Mixed Arabic Documents, K. Sridharan, F. Farooq and V. Govindaraju (CEDAR, SUNY at Buffalo)

3:35 Break

4:05 Session 4: Extraction and Information Retrieval

Government Briefing: The Sequoyah Foreign Language Translation System, Edward A. Cerutti (US Army)

Accurate Document Categorization of OCR Generated Text, R. Price* and A. Zukas (Science Applications International Corp., Content Analyst Co., LLC*)

Effect of Degraded Input on Statistical Machine Translation, F. Farooq and Y. Al-Onaizan* (CEDAR, SUNY at Buffalo, IBM T.J. Watson Research Center*)

Mobile Interactive Support System for Time-Critical Document Exploitation, G. Nagy and D. Lopresti* (Rensselaer Polytechnic Institute, Lehigh University*)

Thursday November 3rd

9:00 Session 5: Arabic Document Analysis

Government Briefing: The LASER ATDC Program, Luis Hernandez and Melissa Holland (Army Research Laboratory)

Handwritten Arabic Word Spotting Using the CEDARABIC Document Analysis System, S. Srihari, H. Srinivasan, P. Babu and C. Bhole (CEDAR, University of Buffalo, State University of New York)

Word Spotting in Arabic Script Documents, A.L. Spitz and J. Yaghi (DocRec, Ltd.)

Challenges in Adapting Machine-Print Arabic OCR for Handwriting, S.G. Schlosser and R.C. Vogt (NovoDynamics, Inc)

10:20 Break

10:50 Session 6: Cross Domain Applications

Government Briefing: Government Needs at NMEC, Betsi McGrath (NMEC)

Robot Navigation Techniques for Engineering Drawing Analysis, T.C. Henderson and C. Xu (University of Utah)

Perceptual Organization in Semantic Role Labeling, P. Sarkar and E. Saund (Palo Alto Research Center)

Pictographic Recognition Technique Applied to Distinctive Characteristics of Handwritten Arabic Text, M. Walch and D. Gantz* (The Gannon Technologies Group, George Mason University*)

12:10 Lunch

1:10 KEYNOTE SPEAKER: EDUCE: Enhanced Digital Unwrapping for Conservation and Exploration of Inaccessible Texts, W. Brent Seales (University of Kentucky)

2:15 Session 7: Document Analysis Applications

Government Briefing: Supporting Document Analysis Research at DARPA, Joe Olive (DARPA)

A Document Triage System for Army Application, F. Fisher, K. Marcus, R. Chang and J. Turner (Army Research Laboratory)

Multi-Language Handwriting Derived Biometric Identification, D. Gantz, J. Miller and M.A. Walch* (George Mason University, The Gannon Technologies Group*)

Tradeoff Studies about Storage and Retrieval Efficiency of Boundary Data Representations for LLS, TIGER and DLG Data Structures, D. Clutter and P. Bajcsy (NCSA, University of Illinois at Urbana-Champaign)

3:30 Demo Abstracts

ABBYY Software House OCR, Forms Processing and Data Capture Technology Review, I. Valenzula (ABBYY)

VERUS: A Middle Eastern Language OCR, Steven G. Schlosser (NovoDynamics, Inc.)

IDG: A Business Information Extraction, Management, and Routing Front-End for Content Management Systems, Vikas Krishna, Savitha Srinivasan, Neil Boyette, Isaac Cheng, Jeffrey Kreulen and Tapas Kanungo (IBM Almaden Research Center)

Fast Skew and Slant Correction for Arabic Written Word or Line, E.M. Zaki and M. El-Adawi (Sakhr Software USA, Inc.)

4:30-7:30 Demos/Reception

Friday November 4th

9:00-10:00

Government Panel and Discussion: Document Analysis Needs and Funding Opportunities

10:00-4:00

Workshop on Evaluation of Document Image Processing Technologies

This SDIUT Workshop will bring together experts in the field of document image processing to focus on evaluation issues. By bringing together experienced researchers with a variety of document image exploitation approaches, we hope to make progress toward enhanced ground truth representation and corresponding evaluation techniques. Expanding on the character- and word-accuracy-based metrics in current use facilitates a more complete approach to evaluating the performance of established recognition systems. It also enables component-level evaluation of specialized technologies such as script/language identification, line and word finding, signature detection, handwriting recognition, word spotting, forms processing and others. We hope that this session will help define solutions that can be incorporated into future government programs.

The day will be broken into morning and afternoon sessions. In the morning, researchers are invited to give ten-minute briefings that describe their work, highlighting evaluation techniques and issues. The working lunch and afternoon sessions will feature panel discussions focused on synthesizing information from the morning sessions and identifying strategies for enhanced ground truth representation and evaluation. Working notes from the workshop will be made available to all participants on the SDIUT WWW page.

The workshop is open to all SDIUT attendees. The organizer for this workshop is Jen Doyon of MITRE, whose contact information is listed below. If you have any questions about the workshop, please contact her directly.

Jen Doyon 703-983-3275

Evaluation Workshop Abstracts

Performance Evaluation of Multilingual Document Exploitation Systems, Steven G. Schlosser (NovoDynamics, Inc.)

Issues with Automatic OCR Evaluation and Metrics, Kristian J. Concepcion

Evaluation Issues in Image Refiner, Kristen Summers and Eugene Borovikov (CACI)

The Sporadic Nature of OCR Evaluations, Tapas Kanungo (IBM Almaden Research Center)

Ground Truth Representation used in Testing and Optimization of the Optical Word Recognition System, Mike Ladwig (Northrop Grumman Corporation)

Metrics for Word Spotting, Sargur Srihari (CEDAR, University of Buffalo)

Evaluating with Informational Confidence, Stefan Jaeger and David Doermann (University of Maryland)