SIGDAT, the Association for Computational Linguistics' special interest group on linguistic data and corpus-based approaches to NLP, invites submissions to EMNLP 2001. The conference will be held at Carnegie Mellon University, Pittsburgh, PA, USA, on June 3 and 4, immediately preceding the meeting of the North American Chapter of the ACL (NAACL 2001).
We are interested in papers from academia, government, and industry on all areas of traditional interest to the SIGDAT community and related fields, including but not limited to:
Also, to encourage reflection on the current state of the art in corpus-based methods, the conference will have the following theme:
Successes --- We solicit papers showing the success of empirical methods in and across application settings. Examples include improvements in information retrieval performance resulting from the use of language modeling techniques; effective use of statistical word segmentation algorithms in machine translation systems; and increased speech recognition accuracy through the incorporation of statistical parsing.
Challenges --- Empirical and corpus-based methods have clearly enjoyed many successes in recent years, but in looking ahead, the community needs to be aware of the limitations of various techniques and paradigms. We welcome papers that carefully expose and study such limitations. Examples include the identification and exploration of: classes of domains or problems in which popular techniques perform poorly; significant gaps between human and machine performance on tasks where statistical approaches have made great progress; and important practical situations where common assumptions fail to hold. We emphasize that we seek submissions that thoughtfully document fundamental limitations, rather than simply report on unsuccessful experiments. Such papers should include a thorough examination, through careful experimentation, of the critical factors contributing to the "negative" result.