The main publication venues are ACL, NAACL, EMNLP, TACL, EACL, CoNLL, and CL. All the papers from these venues can be found in the ACL Anthology. In addition, NLP publications often appear in ML and AI conferences, including ICML, NIPS, ICLR, AAAI, and IJCAI. A calendar of NLP events is available here, and ACL-sponsored events are listed here.
Both parsing corpora below (PTB and UD) contain POS tags. Each parse tree contains POS tags for all leaf nodes. You can view a sample of the PTB in NLTK:
>>> import nltk
>>> # Print the first PTB sentence as word/tag pairs (Penn Treebank tagset)
>>> print(' '.join('/'.join(x) for x in nltk.corpus.treebank.tagged_sents()[0]))
Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./.
>>> # The same sentence, mapped to the universal tagset
>>> print(' '.join('/'.join(x) for x in nltk.corpus.treebank.tagged_sents(tagset='universal')[0]))
Pierre/NOUN Vinken/NOUN ,/. 61/NUM years/NOUN old/ADJ ,/. will/VERB join/VERB the/DET board/NOUN as/ADP a/DET nonexecutive/ADJ director/NOUN Nov./NOUN 29/NUM ./.
The CoNLL 2002 shared task is available in NLTK:
>>> import nltk
>>> len(nltk.corpus.conll2002.iob_sents())
35651
>>> len(nltk.corpus.conll2002.iob_words())
678377
>>> # Each token is a (word, POS tag, IOB tag) triple; print word/IOB for the first sentence
>>> print(' '.join(x[0] + '/' + x[2] for x in nltk.corpus.conll2002.iob_sents()[0]))
Sao/B-LOC Paulo/I-LOC (/O Brasil/B-LOC )/O ,/O 23/O may/O (/O EFECOM/B-ORG )/O ./O
CoNLL 2002 is annotated with the IOB annotation scheme and multiple entity types.
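To make the IOB scheme concrete, here is a minimal sketch of how IOB tags can be turned back into entity spans. The helper name iob_to_spans is hypothetical (it is not part of NLTK), and the word and tag lists are copied by hand from the sample sentence above so that the example runs without downloading the corpus. It treats a stray I- tag with no preceding B- tag as the start of a new entity, which is one common lenient convention and an assumption here.

```python
def iob_to_spans(tags):
    """Convert a list of IOB tags into (entity_type, start, end) spans.

    `end` is exclusive. Hypothetical helper for illustration only.
    """
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        # An I- tag continues the open span only if the entity type matches.
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((etype, start, i))
            start, etype = None, None
        # B- always opens a span; a stray I- opens one too (lenient reading).
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    if start is not None:
        spans.append((etype, start, len(tags)))
    return spans


# Words and tags copied from the CoNLL 2002 sample sentence.
words = ["Sao", "Paulo", "(", "Brasil", ")", ",", "23", "may", "(", "EFECOM", ")", "."]
tags = ["B-LOC", "I-LOC", "O", "B-LOC", "O", "O", "O", "O", "O", "B-ORG", "O", "O"]

entities = [(etype, " ".join(words[s:e])) for etype, s, e in iob_to_spans(tags)]
print(entities)
```

With the sample sentence, this groups "Sao Paulo" and "Brasil" as LOC entities and "EFECOM" as an ORG entity.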
The Universal Dependencies (UD) project is publicly available online. The website includes statistics for all annotated languages. You can easily download v1.3 from here. UD files follow the simple CoNLL-U format.
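As a rough illustration of the CoNLL-U format, the sketch below reads it with plain Python. It assumes the standard layout: one token per line with ten tab-separated columns, comment lines starting with "#", and a blank line between sentences. The function name parse_conllu and the tiny sample sentence are made up for this example and are not taken from any UD treebank; multiword-token and empty-node lines are kept as ordinary rows rather than handled specially.

```python
# The ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Parse CoNLL-U text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            # Comment lines are skipped in this sketch.
            continue
        else:
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError("expected 10 columns, got %d" % len(cols))
            current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# A made-up sentence, annotated by hand for illustration.
SAMPLE = "\n".join([
    "# made-up example sentence",
    "\t".join(["1", "The", "the", "DET", "DT", "Definite=Def|PronType=Art", "2", "det", "_", "_"]),
    "\t".join(["2", "board", "board", "NOUN", "NN", "Number=Sing", "3", "nsubj", "_", "_"]),
    "\t".join(["3", "met", "meet", "VERB", "VBD", "Mood=Ind|Tense=Past|VerbForm=Fin", "0", "root", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", ".", "_", "3", "punct", "_", "_"]),
    "",
])

for token in parse_conllu(SAMPLE)[0]:
    print(token["form"], token["upos"], token["head"], token["deprel"])
```

Each token dict keeps all values as strings; the head column holds the id of the governing token, with 0 marking the root.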
The Penn Treebank is available from the LDC. You will find tgrep useful for quickly searching the corpus for patterns. NLTK can also be used to load parse trees. A few more browsers are available online.
The WMT shared task from 2016 is a good source for newswire bi-text.
Textual entailment (TE) has been studied extensively for more than a decade. Recently, SNLI has been receiving significant attention.
We will look at three data sets commonly used for semantic parsing:
If you encounter an interesting demo or system not listed here, please email the course instructor.
Deep Learning frameworks and tools: