Computationally differentiating between CVC, VCV and CV syllables

Question

I want to develop a program which can differentiate between CVC, VCV, and CV syllable types.

I'm having trouble determining where the vowel ends in CVC and CV syllables.

I want to read more about it. Are there any websites, papers, or books that would be recommended reading for this task?

Welcome to Linguistics SE! I have a few questions that I think will help people answer your question. Am I correct in assuming you're looking at audio files? Could you please give us a bit more information about your program and what tools you're currently using? It would be helpful if you described your current process as well. — acattle, Jul 24 '13 at 09:50
Incidentally, I'm neither a phonologist nor a phonetician, but can a VCV syllable actually exist? Surely that would be two syllables (most likely V-CV). — acattle, Jul 24 '13 at 09:51
@acattle is correct--in most standard phonological theories, a vowel is by definition syllabic, so a VCV sequence would by definition contain two syllables. I would also like to ask--what language is the program operating on? Syllable structure constraints and phonotactic constraints are language-specific. — musicallinguist, Jul 24 '13 at 13:50

robert · Answer 1 · 2013-07-25T08:59:48.910

I assume you're dealing with audio data. In that case, your problem is that phonemes in actual speech are not discrete units, and there is no single right answer to the question of where the boundary between a consonant and a following vowel lies. For example, in the syllable /pa/, the articulators move from the most consonant-like position, where airflow is completely blocked (the lips are closed), to a fully open position at the midpoint of /a/. The articulators need time for that movement, so there is a range of intermediate points between the two sounds that you might want to consider as potential boundaries.

When dealing with a sequence of an obstruent (a consonant such as /p,t,f,z/) and a vowel, common criteria are:

onset/offset of stable formant pattern in the vowel
rapid increase or decrease in intensity
and sometimes onset/offset of voicing
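The intensity criterion can be sketched in a few lines. This is only a minimal illustration, not a robust segmenter: it assumes the audio is already available as a list of float samples, and the function names, the 10 ms frame length, and the synthetic "closure + tone + closure" signal standing in for a CVC syllable are all made up for the example.

```python
import math

def frame_levels_db(samples, frame_len, hop):
    """Short-time RMS intensity in dB for each frame of the signal."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20.0 * math.log10(max(rms, 1e-10)))  # floor avoids log(0)
    return levels

def steepest_change(levels, rising=True):
    """Index i such that the step from frame i to frame i+1 is the largest rise (or fall)."""
    diffs = [b - a for a, b in zip(levels, levels[1:])]
    pick = max if rising else min
    return pick(range(len(diffs)), key=lambda i: diffs[i])

def boundary_time(index, frame_len, hop, sample_rate):
    """Time halfway between the centres of frames index and index+1, in seconds."""
    centre = index * hop + frame_len / 2.0
    return (centre + hop / 2.0) / sample_rate

# Synthetic stand-in for a CVC syllable (an assumption, not real speech):
# 50 ms near-silent closure, 100 ms of a 150 Hz tone as the "vowel", 50 ms closure.
sample_rate = 16000
def tone(amplitude, n_samples):
    return [amplitude * math.sin(2 * math.pi * 150 * n / sample_rate)
            for n in range(n_samples)]
signal = tone(0.001, 800) + tone(0.5, 1600) + tone(0.001, 800)

frame_len, hop = 160, 80  # 10 ms frames, 5 ms hop
levels = frame_levels_db(signal, frame_len, hop)
vowel_onset = boundary_time(steepest_change(levels, rising=True), frame_len, hop, sample_rate)
vowel_offset = boundary_time(steepest_change(levels, rising=False), frame_len, hop, sample_rate)
print(vowel_onset, vowel_offset)
```

On real recordings the steepest intensity change is only one cue, so you would normally combine it with the formant and voicing criteria rather than rely on it alone.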

When dealing with a sequence of a sonorant (a consonant such as /n,m,w/) and a vowel, these criteria might be less useful or not useful at all. But a change in intensity and a change in formant pattern will usually be observable. For example, approximant /r/ (as in standard American and British English) usually has a low third formant, so the boundary could be set at the midpoint of the third-formant trajectory in an /r/ + vowel sequence.
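As a rough sketch of the third-formant idea: given an F3 track for an /r/ + vowel stretch, you can look for the point where F3 passes halfway between its minimum (taken here as the /r/ target) and its final value (taken here as the vowel target). The function name, those two simplifying assumptions, and the formant values below are invented for illustration; a real track would come from a formant tracker such as the one in Praat.

```python
def f3_midpoint_boundary(times, f3):
    """Time at which F3 first rises through the midpoint between its
    minimum and its last value, linearly interpolated between frames.
    Returns None if no such crossing is found."""
    low_idx = min(range(len(f3)), key=lambda i: f3[i])
    target = (f3[low_idx] + f3[-1]) / 2.0
    for k in range(low_idx, len(f3) - 1):
        a, b = f3[k], f3[k + 1]
        if b > a and a <= target <= b:
            frac = (target - a) / (b - a)
            return times[k] + frac * (times[k + 1] - times[k])
    return None

# Made-up F3 values in Hz, one every 10 ms: low during /r/, rising into the vowel.
f3_track = [1600, 1600, 1650, 1800, 2000, 2200, 2400, 2500, 2500]
frame_times = [k * 0.01 for k in range(len(f3_track))]

boundary = f3_midpoint_boundary(frame_times, f3_track)
print(boundary)
```

Here the midpoint is 2050 Hz, which falls between the frames at 40 ms and 50 ms, so the boundary lands at roughly 42.5 ms.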

Here are two references you might find useful:

Machač, Pavel and Radek Skarnitzl (2009). Principles of Phonetic Segmentation. Prague: Epocha.
Wiget, Lukas, Laurence White, Barbara Schuppler, Izabelle Grenon, Olesya Rauch, and Sven L. Mattys (2010). How stable are acoustic metrics of contrastive speech rhythm? Journal of the Acoustical Society of America 127.3:1559-1569.

The latter is not primarily about your topic, but it gives a good description of segmentation criteria and points to further references.

Before you start writing your own program, you might also want to consider using or adapting existing solutions. Some tools, such as HTK, can perform phonemic forced alignment: given an acoustic model of the language you are working on and an orthographic transcription of the text, they produce a time-aligned phonemic transcription of an audio recording. I have achieved good results, even for varieties of English other than American English, with P2FA, which provides a wrapper and an acoustic model of American English and outputs Praat TextGrids. You could also take a look at MAUS, which provides a web interface for a small number of languages and also produces Praat TextGrids.
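Once you have a time-aligned phone sequence, telling the patterns apart and reading off where each vowel ends becomes straightforward. The sketch below assumes the alignment has already been read into (label, start, end) tuples; it does not parse TextGrid files. It also assumes ARPAbet-style labels with optional stress digits, as in the CMU Pronouncing Dictionary, and the pause labels "sp" and "sil". The function names and the example timings are hypothetical.

```python
# ARPAbet vowel symbols (stress digits removed before lookup).
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}
PAUSE_LABELS = {"", "sp", "sil"}  # assumed pause/silence labels

def is_vowel(label):
    """True if the phone label, without its stress digit, is a vowel."""
    return label.rstrip("012") in ARPABET_VOWELS

def cv_pattern(phones):
    """Map aligned phones to a string of C and V, skipping pauses."""
    return "".join(
        "V" if is_vowel(label) else "C"
        for label, _start, _end in phones
        if label not in PAUSE_LABELS
    )

def vowel_offsets(phones):
    """End time in seconds of every vowel interval in the alignment."""
    return [end for label, _start, end in phones if is_vowel(label)]

# Hypothetical alignments for "pat", "pa" and "a pa" (times in seconds).
pat = [("P", 0.00, 0.08), ("AE1", 0.08, 0.25), ("T", 0.25, 0.33)]
pa = [("sp", 0.00, 0.05), ("P", 0.05, 0.12), ("AA1", 0.12, 0.30)]
apa = [("AH0", 0.00, 0.07), ("P", 0.07, 0.15), ("AA1", 0.15, 0.32)]

for phones in (pat, pa, apa):
    print(cv_pattern(phones), vowel_offsets(phones))
```

Note that this labels segment sequences, not syllables: as the comments on the question point out, a VCV sequence would normally be syllabified as two syllables.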
