This chapter continues to present programming concepts by example, in the context of a linguistic processing task.

Often there is insufficient government or industrial support for developing language resources, and individual efforts are piecemeal and hard to discover or re-use.

Sometimes these categories overlap, notably in the case of topical categories, as a text can be relevant to more than one topic. Occasionally, text collections have temporal structure, news collections being the most common example.

Figure: Common Structures for Text Corpora. The simplest kind of corpus is a collection of isolated texts with no particular organization; some corpora are structured into categories like genre (Brown Corpus); some categorizations overlap, such as topic categories (Reuters Corpus); other corpora represent language use over time (Inaugural Address Corpus).

NLTK's corpus readers support efficient access to a variety of corpora, and can be used to work with new corpora.
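The corpus structures described above (categorized, overlapping, temporal) can be pictured as a small index over file identifiers. The following is a minimal plain-Python sketch, not NLTK's implementation. The toy corpus, its file ids, texts, categories and years are all invented for illustration. The helper names `categories` and `fileids` loosely echo the methods that NLTK's categorized readers such as `reuters` provide, but here they are hypothetical stand-alone functions. `fileids_by_year` is likewise a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical toy corpus: file id -> (text, set of categories, year).
# All ids, texts, categories and years are invented for illustration.
TOY_CORPUS = {
    "doc1": ("wheat and barley exports rose", {"grain", "trade"}, 1987),
    "doc2": ("barley harvest forecast revised", {"grain"}, 1987),
    "doc3": ("crude oil prices fell", {"crude"}, 1988),
}


def categories(corpus, fileid=None):
    """Return the sorted categories of one file, or of the whole corpus."""
    if fileid is not None:
        return sorted(corpus[fileid][1])
    return sorted({cat for _, cats, _ in corpus.values() for cat in cats})


def fileids(corpus, category=None):
    """Return the sorted file ids, optionally restricted to one category."""
    return sorted(
        fid for fid, (_, cats, _) in corpus.items()
        if category is None or category in cats
    )


def fileids_by_year(corpus):
    """Group file ids by year, exposing the corpus's temporal structure."""
    groups = defaultdict(list)
    for fid, (_, _, year) in sorted(corpus.items()):
        groups[year].append(fid)
    return dict(groups)


if __name__ == "__main__":
    print(categories(TOY_CORPUS, "doc1"))  # ['grain', 'trade']
    print(fileids(TOY_CORPUS, "grain"))    # ['doc1', 'doc2']
    print(fileids_by_year(TOY_CORPUS))     # {1987: ['doc1', 'doc2'], 1988: ['doc3']}
```

Because `doc1` carries both `grain` and `trade`, it is returned for either category. This is the overlapping categorization seen in topic-based collections. A corpus with no organization would simply omit the category and year fields.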

As just mentioned, a text corpus is a large body of text.

