
This directory contains the text obtained by applying
Optical Character Recognition to the NIPS collection.

The files are in an XML-type format.
Tags are included for pages, columns, regions, paragraphs, lines, 
and words. The words are annotated by the coordinates of the
containing rectangle on the page.  Those coordinates are
(unfortunately) relative to an origin at the top left of the
page (whereas all of the DjVu software assumes a coordinate
origin at the lower left corner). 
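
To use a word rectangle with the DjVu tools, its vertical coordinates
have to be flipped about the page height. The sketch below shows one
way to do that in Python. It assumes a rectangle is given as
(xmin, ymin, xmax, ymax) with y growing downward, and that the page
height is known in the same units; neither the layout nor the name
flip_rect comes from the files themselves.

```python
def flip_rect(rect, page_height):
    """Convert a rectangle from a top-left origin to a bottom-left origin.

    rect is assumed to be (xmin, ymin, xmax, ymax) with y increasing
    downward; this layout is an assumption, not taken from the files.
    """
    xmin, ymin, xmax, ymax = rect
    # The old bottom edge (ymax) becomes the new lower edge, and the
    # old top edge (ymin) becomes the new upper edge.
    return (xmin, page_height - ymax, xmax, page_height - ymin)
```

Applying the function twice with the same page height returns the
original rectangle, so the same code converts in either direction.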

The directory structure and file names are
identical to the DjVu counterparts, except
that the .djvu extensions were changed to .lsp.
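
Given that naming rule, the OCR file for a DjVu file can be located by
swapping the extension. A minimal Python sketch follows; the function
name lsp_path_for and the example path are hypothetical, and the path
is treated as relative to the root of either directory tree.

```python
from pathlib import PurePosixPath

def lsp_path_for(djvu_path):
    """Return the relative path of the OCR file matching a DjVu file."""
    # Only the extension changes; directories and base name are kept.
    return PurePosixPath(djvu_path).with_suffix(".lsp")

# Hypothetical relative path, for illustration only.
print(lsp_path_for("volume/paper.djvu"))
```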

  -- Yann LeCun
     http://www.research.att.com/~yann
