Distributional Clustering of English Words.
Fernando Pereira, Naftali Tishby, and Lillian Lee
Proceedings of the 31st ACL, pp. 183-190, 1993.

Abstract: We describe and experimentally evaluate a method for automatically clustering words according to their distribution in particular syntactic contexts. Deterministic annealing is used to find lowest distortion sets of clusters. As the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data. Clusters are used as the basis for class models of word cooccurrence, and the models are evaluated with respect to held-out test data.
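
The annealing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each word is represented by a probability distribution over contexts, that distortion is measured by KL divergence from word to centroid, that all words are weighted equally, and that the number of clusters is fixed at two rather than grown by successive splits. The function names, the beta schedule, and the toy data are all invented for illustration.

```python
import math
import random


def kl(p, q):
    """KL divergence D(p || q) in nats; assumes q[j] > 0 wherever p[j] > 0."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)


def memberships(dists, centroids, beta):
    """Soft assignments proportional to exp(-beta * D(word || centroid))."""
    rows = []
    for p in dists:
        d = [kl(p, c) for c in centroids]
        d_min = min(d)  # shift by the minimum so at least one term equals 1
        w = [math.exp(-beta * (dk - d_min)) for dk in d]
        total = sum(w)
        rows.append([wk / total for wk in w])
    return rows


def update_centroids(dists, member):
    """Each centroid becomes the membership-weighted mean of the word distributions."""
    dim = len(dists[0])
    centroids = []
    for k in range(len(member[0])):
        mass = sum(m[k] for m in member)
        centroids.append(
            [sum(m[k] * p[j] for m, p in zip(member, dists)) / mass for j in range(dim)]
        )
    return centroids


def perturb(centroid, rng, scale=0.01):
    """Jitter a centroid slightly and renormalize so it stays a distribution."""
    noisy = [x * (1.0 + scale * rng.uniform(-1.0, 1.0)) for x in centroid]
    total = sum(noisy)
    return [x / total for x in noisy]


def anneal(dists, n_clusters=2, betas=(1.0, 5.0, 25.0, 125.0), iters=50, seed=0):
    """Run the fixed-point updates at each beta in an increasing schedule."""
    rng = random.Random(seed)
    dim = len(dists[0])
    mean = [sum(p[j] for p in dists) / len(dists) for j in range(dim)]
    centroids = [list(mean) for _ in range(n_clusters)]
    for beta in betas:
        # Re-perturb at every stage so merged centroids are able to separate.
        centroids = [perturb(c, rng) for c in centroids]
        for _ in range(iters):
            member = memberships(dists, centroids, beta)
            centroids = update_centroids(dists, member)
    return memberships(dists, centroids, betas[-1]), centroids


# Toy data (invented): four "words", each a distribution over three contexts.
WORDS = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.3, 0.6],
]

if __name__ == "__main__":
    member, centroids = anneal(WORDS)
    for row in member:
        print([round(x, 3) for x in row])
```

At small beta the two centroids collapse onto the overall mean and every word belongs about equally to both clusters. As beta grows, the merged solution becomes unstable and the centroids separate, which is the subdivision behavior the abstract describes. Repeating the perturb-and-split step inside each resulting cluster would give a hierarchy, which this sketch does not attempt.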

Paper formats: ps, pdf, other

BibTeX entry:

@inproceedings{Pereira+Tishby+Lee:93a,
  author =	 {Fernando Pereira and Naftali Tishby and Lillian Lee},
  title =	 {Distributional Clustering of {E}nglish Words},
  booktitle = 	 "31st Annual Meeting of the ACL",
  year = 	 1993,
  pages =	 {183--190}
}

