
I have a project which requires a corpus of conversational English in plain text (although I can perform some processing as needed). Since I am a student, I need to find a corpus that is free and downloadable. I would appreciate it if someone could provide suggestions for such corpora.

Stephane Rolland
adeora

3 Answers


There are many spoken English corpora available, but 'plain text' alone is not enough to pick the right one. You also need to decide on length, level and format of annotation, type of conversation, genre/register, dialect, and whether the speech is natural or elicited. Those choices all depend on the research questions you want to answer.

If you just want any one corpus you could try:

There are others (like the Switchboard corpus) which you can download for a fee or buy on CD (like the Edinburgh Map Task corpus).

Dominik Lukes

Here you can find the Saarbrücken Corpus of Spoken English (SCoSE):

http://www.uni-saarland.de/lehrstuhl/engling/scose.html

Sir Cornflakes
  • Those files encode tone, power and pauses, but lack tagging of parts of speech or lemmas. – amI Jul 03 '17 at 22:12
  • There are decent tools for those tasks freely available, so you can add these annotations automatically. – Sir Cornflakes Jul 04 '17 at 08:09
  • I think the characters added to encode sonic features would choke any POS tagger. Are there plain text versions of the SCoSE files? – amI Jul 05 '17 at 21:49
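
One way to address the concern in the last comment is to strip the prosodic markup before running a tagger. The following is only a minimal sketch: the marker set (parenthesised or bracketed annotations and a few single-character symbols) and the function name `strip_markup` are hypothetical and are not taken from the SCoSE documentation, so they would need adjusting to the corpus's actual transcription conventions.

```python
import re

# Hypothetical marker set, NOT taken from the SCoSE documentation.
# Assumed: annotations in parentheses or square brackets, e.g. (1.5) or [laughs],
# plus a few single-character prosody symbols.
BRACKETED = re.compile(r"\([^)]*\)|\[[^\]]*\]")
SYMBOLS = re.compile(r"[↑↓=°]")


def strip_markup(line: str) -> str:
    """Return the line with the assumed annotation marks removed."""
    line = BRACKETED.sub(" ", line)   # drop bracketed annotations
    line = SYMBOLS.sub("", line)      # drop single-character symbols
    return re.sub(r"\s+", " ", line).strip()  # collapse leftover whitespace


if __name__ == "__main__":
    print(strip_markup("well (1.5) I ↑think [laughs] so="))
```

The cleaned lines could then be passed to any freely available POS tagger or lemmatizer.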

I had success with the conversation transcripts provided by UNC Charlotte.

Dirigo