Dissertation Defense: Jason Soo
Candidate Name: Jason Soo
Major: Computer Science
Advisor: Ophir Frieder, Ph.D.
Title: Search in Adverse Environments
Today, search is a ubiquitous task. This task often carries the expectation that relevant results will be returned within the first 10 documents. While the advent of modern online search engines has created such expectations, there exist environments in which such approaches fall short. These environments are defined by their lack of vital resources, such as the Internet, query logs, user models, and refined algorithms. This amalgam of resources is the keystone of modern search systems. Without these resources, systemic error rates become intractable, and a novel, customized approach is required.
Frequently, adverse environments host information of great value, such as medical records, personal information, historical documents, or national security data. These collections often contain errors introduced by users, by systematic processes (for example, Optical Character Recognition), or both. Accounting for such errors while still retrieving relevant documents is the focus of my research.
I assert that a solution that effectively considers both a term's context and its substring features can yield superior results with minimal external dependencies when searching in such adverse environments.
In this dissertation, I present my solution for searching corrupted document collections in adverse environments. My solution, Segments, is a language-independent, domain-independent, unsupervised approach that I experimentally show is as good as or better than the prior art, state-of-the-art, and commonly deployed solutions. Segments achieves its results by analyzing context and substring features of corrupted terms. The approach described is in use within the Archives Section of the United States Holocaust Memorial Museum to search multilingual collections with sparse query logs.
This dissertation is dedicated to describing my experimental results and demonstrating both the strengths and the drawbacks of Segments in real-world deployments.
Monday, February 22, 2016 at 3:00pm to 5:00pm
St. Mary's Hall, 326
3700 Reservoir Road, N.W., Washington