The BioText Project

University of California, Berkeley

The Layered Query Language

Searching MEDLINE Using Annotation Layers

As natural language processing (NLP) algorithms become ever more successful, methods are needed for conveniently re-using their results, both for additional processing and for end applications such as text mining and information retrieval. We have developed the Layered Query Language (LQL) and a system architecture that supports queries over layers of annotation on natural language text. The model allows for both hierarchical and overlapping layers and for querying at multiple levels of description. The implementation is built on top of a standard relational database management system (RDBMS) and, through carefully constructed indexes, can execute complex queries over very large collections.
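To make the idea of annotation layers stored in an RDBMS concrete, here is a minimal sketch in Python with SQLite. It is not the LQL implementation or its actual schema: the `annotation` table, its column names, the index, the layer names, and the helper functions `spans_within` and `spans_overlapping` are all assumptions chosen for illustration. Each annotation is a row giving a layer, a tag, and a token span, so nested (hierarchical) and partially overlapping layers can both be expressed and queried with ordinary SQL joins.

```python
import sqlite3

# Hypothetical schema (an assumption, not the real LQL schema): one row per
# annotation, identified by its layer, its tag, and an inclusive token span.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE annotation (
        doc_id    INTEGER NOT NULL,
        layer     TEXT    NOT NULL,
        tag       TEXT    NOT NULL,
        start_pos INTEGER NOT NULL,
        end_pos   INTEGER NOT NULL
    )
    """
)
# A composite index so that span comparisons within a layer can use the index.
conn.execute(
    "CREATE INDEX idx_annotation_span "
    "ON annotation (layer, doc_id, start_pos, end_pos)"
)

# Toy document, tokens 0..5: the p53 tumor suppressor gene expression
rows = [
    (1, "shallow_parse", "NP", 0, 4),                    # noun phrase
    (1, "gene", "p53", 1, 1),                            # nested inside the NP
    (1, "term", "suppressor gene expression", 3, 5),     # overlaps the NP edge
]
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)", rows)


def _span_join(conn, condition, outer_layer, outer_tag, inner_layer):
    """Join the annotation table with itself under a given span condition."""
    sql = f"""
        SELECT o.doc_id, o.start_pos, o.end_pos, i.tag
        FROM annotation AS o
        JOIN annotation AS i ON i.doc_id = o.doc_id AND {condition}
        WHERE o.layer = ? AND o.tag = ? AND i.layer = ?
        ORDER BY o.doc_id, o.start_pos, i.start_pos
    """
    return conn.execute(sql, (outer_layer, outer_tag, inner_layer)).fetchall()


def spans_within(conn, outer_layer, outer_tag, inner_layer):
    """Hierarchical case: inner annotations fully contained in the outer span."""
    condition = "i.start_pos >= o.start_pos AND i.end_pos <= o.end_pos"
    return _span_join(conn, condition, outer_layer, outer_tag, inner_layer)


def spans_overlapping(conn, outer_layer, outer_tag, inner_layer):
    """Overlapping case: inner annotations sharing at least one token."""
    condition = "i.start_pos <= o.end_pos AND i.end_pos >= o.start_pos"
    return _span_join(conn, condition, outer_layer, outer_tag, inner_layer)


print(spans_within(conn, "shallow_parse", "NP", "gene"))
print(spans_overlapping(conn, "shallow_parse", "NP", "term"))
```

On this toy data, the containment query finds the gene mention nested in the noun phrase, while the term annotation is found only by the overlap query because it extends past the end of the phrase. A real system over a very large collection would need more careful index design than the single composite index assumed here.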

More information:
Two short papers on the system and LQL:
  • Supporting Annotation Layers for Natural Language Processing, Preslav Nakov, Ariel Schwartz, Brian Wolf, and Marti Hearst, in ACL 2005 Poster/Demo Track (pdf)
  • Scaling Up BioNLP: Application of a Text Annotation Architecture to Noun Compound Bracketing, Preslav Nakov, Ariel Schwartz, Brian Wolf, and Marti Hearst, in ACL/ISMB BioLINK SIG 2005 (pdf)