Seminar: ASIndex: A Structure For String Search Using ngrams and Algebraic Signatures

AS-Index is a new data structure for string search in database files based on pattern matching. Its defining property is the use of algebraic signatures (AS) for indexing all or some n-grams in the database. The AS-Index construction preprocesses each record in linear time and adds n-grams references to an AS-based two-dimensional look-up data structure. Searches use this data structure and perform in essentially constant time. We discuss the construction and use of the index. We show how to trade storage overhead from several times the file size to less than that for the file for a slow down in search performance. We present the performance analysis and discuss related work. We conclude that AS-Index should be a practical proposal that offers faster search performance or lower indexing overhead than its competitors at present.

This is joint work with Cedric du Mouza, Philippe Rigaux, and Thomas Schwarz.

When:
Wednesday, April 23, 2008 at 12:45 PM

Where:
E2-599

CRSS Contact:
Long, Darrell D. E.

Streaming video is available for this event.

Last modified 24 May 2019