I have a project which requires a corpus of conversational English in plain text (although I can perform some processing as needed). Since I am a student, I need to find a corpus that is free and downloadable. I would appreciate it if someone could provide suggestions for such corpora.
-
MICASE might meet your needs. Or it might not. – john lawler in exile Mar 04 '15 at 03:26
-
Do you care for answers to your question? Why don't you do anything (accept, vote, comment)? – Sir Cornflakes Jan 28 '16 at 08:40
-
Possible duplicate of English text corpus for download – new Q Open Wid Feb 10 '19 at 22:57
-
@zixuan I think "text" and "conversational" corpora are two different enough things to deserve different questions. – Sir Cornflakes Feb 11 '19 at 16:45
3 Answers
There are many spoken English corpora available, but you generally need to settle more than 'plain text' before you can find the right one: length, level of annotation, format of annotation, type of conversation, genre/register, dialect, natural vs. elicited speech, etc. Those will all depend on the type of research questions you want to answer.
If you just want any one corpus you could try:
Santa Barbara Corpus of Spoken American English: http://www.linguistics.ucsb.edu/research/santa-barbara-corpus#Contents
CHILDES collection of corpora (most of it is conversation): http://childes.psy.cmu.edu/
BASE - British Academic Spoken English http://www2.warwick.ac.uk/fac/soc/al/research/collect/base/
There are others (like the Switchboard corpus) which you can download for a fee or buy on CD (like the Edinburgh Map Task corpus).
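Corpora like these often come with transcription markup rather than as bare text, so some conversion is usually needed. As a rough sketch, assuming the files follow the CHAT layout used by CHILDES (header lines start with `@`, speaker turns with `*`, dependent annotation tiers with `%`, continuation lines with a tab), the spoken turns could be pulled out like this. The function name and the sample transcript are made up for illustration, and any in-line codes inside the utterances are left untouched.

```python
def chat_to_plain(chat_text):
    """Return (speaker, utterance) pairs from CHAT-style transcript text.

    Assumption: '*' starts a speaker turn, '%' a dependent tier,
    '@' a header, and a leading tab continues the previous line.
    """
    utterances = []
    in_turn = False  # True while the last tier seen was a speaker turn
    for line in chat_text.splitlines():
        if line.startswith("*"):
            speaker, _, text = line[1:].partition(":")
            utterances.append([speaker.strip(), text.strip()])
            in_turn = True
        elif line.startswith("\t") and in_turn:
            # Continuation of the current speaker turn
            utterances[-1][1] += " " + line.strip()
        else:
            # Header or dependent tier: skip it and its continuations
            in_turn = False
    return [(speaker, text) for speaker, text in utterances]


# Made-up sample, not taken from any real corpus file
sample = (
    "@Begin\n"
    "@Participants:\tCHI Target_Child, MOT Mother\n"
    "*MOT:\twhat is that ?\n"
    "%mor:\tpro:int|what cop|be&3S pro:dem|that ?\n"
    "*CHI:\ta doggie .\n"
    "@End\n"
)

for speaker, text in chat_to_plain(sample):
    print(f"{speaker}: {text}")
```

Keeping the speaker label separate from the utterance makes it easy to write out either a dialogue-style file or just the bare utterances, whichever the project needs.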
Here you can find the Saarbrücken Corpus of Spoken English (SCoSE):
-
Those files encode tone, power and pauses; but lack tagging of parts-of-speech or lemmas. – amI Jul 03 '17 at 22:12
-
There are decent tools for those tasks freely available, so you can add these annotations automatically. – Sir Cornflakes Jul 04 '17 at 08:09
-
I think the characters added to encode sonic features would choke any POS tagger. Are there plain text versions of the SCoSE files? – amI Jul 05 '17 at 21:49
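If no plain-text versions turn up, one workaround is to strip the prosodic markers before handing the text to a tagger. The sketch below is only illustrative: the marker inventory (parenthesised pauses, brackets, `=`, `:`, `^`, angle brackets) is a hypothetical stand-in, not the documented SCoSE conventions, so it would need adjusting against the corpus's own transcription key.

```python
import re

# Hypothetical marker inventory; replace with the symbols that the
# corpus's transcription conventions actually define.
PAUSE = re.compile(r"\((?:\.+|\d+(?:\.\d+)?)\)")  # (.)  (..)  (2.5)
SYMBOLS = re.compile(r"[\[\]=:^<>]")              # overlap, latching, lengthening, etc.


def strip_markup(utterance):
    """Remove the assumed prosodic markers and collapse leftover whitespace.

    Apply this to utterance text only, after any 'SPEAKER:' label has
    been split off, because ':' is treated as a marker here.
    """
    utterance = PAUSE.sub(" ", utterance)
    utterance = SYMBOLS.sub("", utterance)
    return re.sub(r"\s+", " ", utterance).strip()


# Made-up example line, not taken from the corpus
print(strip_markup("we:ll (2.5) I [think so] (.) ye=ah"))
```

The cleaned strings can then go through whichever POS tagger or lemmatiser you prefer, and the original marked-up lines can be kept alongside them if the prosodic information matters later.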