Canterbury corpus

The Canterbury Corpus is a collection of texts commonly used in the field of linguistics, particularly in studies related to language modeling, text analysis, and natural language processing. It comprises a variety of written texts that are representative of different styles, genres, and forms of literature. The corpus was originally compiled by researchers at the University of Kent at Canterbury as a resource for linguistic analysis and is often used for tasks such as testing algorithms for text generation, machine translation, and lexical studies.

 Ancestors (6)

Data compression
Information theory
Applied mathematics
Fields of mathematics
Mathematics
 Home