
I want to measure the quality of speech: is it higher-level or lower-level (vocabulary, grammar, etc.)? I also want to measure the understandability of the speech, i.e., is the teacher using language above a student's head?

The Flesch-Kincaid readability test measures a written document's readability (which is really a measurement of understandability). There are numerous methods of measuring a written text's readability, but processing spoken text is different from written text. Is there a similar way to measure spoken text for understandability? I have transcribed (typed) dialogue.
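(For reference, the grade-level variant of Flesch-Kincaid combines words per sentence and syllables per word. A minimal Python sketch is below; the vowel-group syllable counter is a rough stand-in for a real syllabifier, and the sample text is invented.)

```python
import re

def fk_grade_from_counts(words, sentences, syllables):
    # Standard Flesch-Kincaid Grade Level coefficients.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def count_syllables(word):
    # Crude heuristic: count vowel groups; real syllabification is harder.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(sentences):
    """Grade level for a list of sentences (or, for speech, utterances)."""
    words = [w for s in sentences for w in re.findall(r"[A-Za-z']+", s)]
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade_from_counts(len(words), len(sentences), syllables)

print(round(fk_grade(["The cat sat on the mat.", "It was a sunny day."]), 2))
```

For speech, the open question is what to substitute for the sentence: the same arithmetic runs fine over utterances, but whether the coefficients remain meaningful is exactly what's being asked here.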

P.S. I've seen Flesch-Kincaid applied to speech, including President Obama's speeches, and used to determine the quality of the language used, but it seems inappropriate both to apply a readability measurement to spoken text and to use readability as a measure of quality.

Tyler Rinker
  • I am guessing that if you want to measure understandability in the sense that longer, syntactically complex sentences and longer words are less understandable, then Flesch-Kincaid should work just as well. It might be helpful to know whether the measure is intended to gauge understandability by uneducated people, or understandability by second-language learners. –  Dec 22 '11 at 23:59
  • @jlovegren Good point. This analysis would be used in a typical classroom to measure student and teacher language understandability and quality. The intended use is not for ESL students; however, I'm writing this into a package for the computer program R, which is free for anyone to use, and other users may want to apply it to ESL learners. – Tyler Rinker Dec 23 '11 at 00:10
  • In that case I think Flesch-Kincaid should work well enough. I think in the case of ESL students a big factor is the use of conventionalized expressions (e.g. "be under oath", "run the red light") whose meaning is computable, but not obvious, from the parts, but these aren't as much of an issue for children. –  Dec 23 '11 at 00:39
  • Two follow-ups: 1) Have you or others seen this done in research (using Flesch-Kincaid for understandability and quality-of-speech measures)? 2) Is this something the field needs to develop a bit more? – Tyler Rinker Dec 24 '11 at 16:17
  • As far as I know, I haven't seen any work trying to measure the amount of difficulty introduced by conventionalized expressions, but this isn't my area of specialization. There is, however, a fair-sized literature on idioms (see http://lingo.stanford.edu/sag/papers/idioms.pdf for an overview), and the subject is prominent in the development of large-scale grammars, such as the English Resource Grammar (http://www.delph-in.net/erg/). As for the intuition that conventionalized expressions are most difficult for L2 learners, this is a hunch, but probably not controversial. –  Dec 24 '11 at 17:13
  • [continued] I just checked Google Scholar; a possibly useful paper in this regard might be "Conventionalized language forms and the development of communicative competence", C. A. Yorio, TESOL Quarterly, 1980 (JSTOR; there is a link to a PDF in Google Scholar). –  Dec 24 '11 at 17:19
  • [continued] I realized I might have misinterpreted your question. As for the use of Flesch-Kincaid in analyzing speech, I'm unfamiliar with the literature here, so I couldn't say whether anyone has endorsed it in print. In your place I'd look through the original proposal for Flesch-Kincaid and see whether the simplifying assumptions it makes about written language can be transferred to spoken language. –  Dec 24 '11 at 17:32
  • If I understand correctly, the Flesch-Kincaid test measures primarily the average length of words in syllables. If one is ready to accept that this is truly what determines the understandability of text, then I can see no difference between written and spoken text. But you should beware of confusing it with any attempt at quantifying the quality of text. In my opinion, the idea is already shaky for understandability, and for quality it is outright nonsense. – kamil-s Feb 29 '12 at 09:57

1 Answer


While I don't know of existing work on the subject, one approach you might take would be to recreate a Flesch-Kincaid-like scale for speech.

First step: determine what levels you want your scale to map to.
You might take something like "conversation/dialogue", "presentations/monologue", and "formal speeches" to be three levels on such a scale (at least for an initial experiment).

Second step: build a classifier.
You could use Naive Bayes or MaxEnt in the NLTK, for instance, with a simple set of features. If you're going for a strict analogue of Flesch-Kincaid, pick the two simplest features to extract that approximately measure lexical difficulty and syntactic difficulty. For a corpus of speech, you might use word length (measured in syllables, phones, or orthographic length) and utterance length (rather than sentence length; i.e., how many words are in each utterance). For each document, calculate average word length and average utterance length, and see how well your classifier does.
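A minimal sketch of that pipeline, using only the two features named above; a hand-rolled nearest-centroid classifier stands in for a trained model here (NLTK's `nltk.NaiveBayesClassifier.train` would consume the same feature/label pairs), and all documents are invented toy data:

```python
def doc_features(utterances):
    """Average orthographic word length and average utterance length (in words)."""
    words = [w for u in utterances for w in u.split()]
    avg_word_len = sum(len(w) for w in words) / len(words)
    avg_utt_len = len(words) / len(utterances)
    return (avg_word_len, avg_utt_len)

def train_centroids(labeled_docs):
    """Mean feature vector per class: a crude stand-in for a real classifier."""
    totals = {}
    for utterances, label in labeled_docs:
        f = doc_features(utterances)
        s = totals.setdefault(label, [0.0, 0.0, 0])
        s[0] += f[0]; s[1] += f[1]; s[2] += 1
    return {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in totals.items()}

def classify(centroids, utterances):
    """Assign the class whose centroid is nearest in feature space."""
    f = doc_features(utterances)
    return min(centroids,
               key=lambda lab: (centroids[lab][0] - f[0]) ** 2 +
                               (centroids[lab][1] - f[1]) ** 2)

# Invented toy training documents: one list of utterances per document.
train = [
    (["hi there", "you ok", "yeah fine"], "conversation"),
    (["nevertheless the committee determined that substantial reforms were necessary today"],
     "formal speech"),
]
centroids = train_centroids(train)
print(classify(centroids, ["so um what now", "dunno"]))
```

With more data and more levels, swapping in a probabilistic classifier is the natural next step; the point of the sketch is just that two averaged features already separate short, choppy dialogue from long, heavy monologue.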

If it's able to consistently distinguish among the classes, you ought to be able to come up with an F-K-style formula that could be used to classify things on the "conversation-monologue-speech" continuum.

dmh
  • Welcome to Linguistics, dmh! :) I added a link to one classifier, but couldn't find a page for the second one. – Alenanno May 31 '12 at 19:16