4

Disclaimer: I am not a linguist, please provide any corrections for terminology.

From the question How languages compare with the number of different syllables from all words?: Yoon Mi Oh's thesis counted the distinct syllables in the 20,000 most frequent words of several languages. Her results, ordered by increasing number of syllables, are:

Japanese: 643
Korean: 1104
Mandarin: 1274
Cantonese: 1298
Basque: 2082
Thai: 2438
Italian: 2729
Spanish: 2778
French: 2949
Turkish: 3260
Catalan: 3600
Serbian: 3831
Finnish: 3844
Hungarian: 4325
German: 5100
Vietnamese: 5156
English: 6949
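
As a rough illustration of what such a count involves (this is only a sketch, not the procedure used in the thesis), the following assumes the words have already been syllabified, with syllables separated by a dot. The function name `distinct_syllables` and the toy word list are made up for the example.

```python
def distinct_syllables(syllabified_words, sep="."):
    """Return the set of distinct syllables found in the given words.

    Each word is assumed to be pre-syllabified, with its syllables
    joined by `sep` (an assumption made for this sketch).
    """
    inventory = set()
    for word in syllabified_words:
        # Skip empty pieces produced by doubled or trailing separators.
        inventory.update(s for s in word.split(sep) if s)
    return inventory


# Hypothetical toy input, not data from the thesis.
toy_words = ["ka.ta.na", "sa.ka.na", "ta.ka", "na.tsu"]
inventory = distinct_syllables(toy_words)
print(len(inventory), sorted(inventory))
```

The hard part in practice is the syllabification itself, which depends on the phonological analysis chosen for each language; the counting step is trivial once that is done.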

We can see that all the languages whose writing systems were historically based on (approximately) syllabaries, with characters associated with syllables (Japanese, Korean, Mandarin, Cantonese), have considerably fewer distinct syllables than the languages written with alphabets. Note that until the end of the 19th century Korean used Hanja, a writing system based mainly on Chinese characters. There is also an exception: Vietnamese, which has a large number of distinct syllables and until the 20th century used Chữ Nôm, also a writing system based mainly on Chinese characters. Despite this case, can a language restrict its number of distinct syllables over time because it is written with a syllabary?
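
To make the grouping concrete, here is a small sketch that splits the counts listed above into the two groups described in the text and compares them. The counts are the ones from the list; the variable names and the choice of summary statistics are only assumptions for illustration, and with 17 languages this says nothing about significance.

```python
from statistics import mean

# Distinct-syllable counts as listed above.
counts = {
    "Japanese": 643, "Korean": 1104, "Mandarin": 1274, "Cantonese": 1298,
    "Basque": 2082, "Thai": 2438, "Italian": 2729, "Spanish": 2778,
    "French": 2949, "Turkish": 3260, "Catalan": 3600, "Serbian": 3831,
    "Finnish": 3844, "Hungarian": 4325, "German": 5100,
    "Vietnamese": 5156, "English": 6949,
}

# Grouping used in the text; Vietnamese is treated there as an exception.
character_based = {"Japanese", "Korean", "Mandarin", "Cantonese"}

group_a = [n for lang, n in counts.items() if lang in character_based]
group_b = [n for lang, n in counts.items() if lang not in character_based]

print("character-based: max", max(group_a), "mean", round(mean(group_a)))
print("other languages: min", min(group_b), "mean", round(mean(group_b)))

# If Vietnamese (historically written with Chữ Nôm) is moved into the
# first group, the two ranges overlap and the clean separation is gone.
with_vietnamese = group_a + [counts["Vietnamese"]]
print("ranges overlap with Vietnamese:", max(with_vietnamese) > min(group_b))
```

The separation between the two groups therefore depends entirely on how the single exception is classified.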


I only have an idea for the case of Mandarin. Although there are many Chinese characters, literate individuals know and use between 3,000 and 4,000 of them (see 3). Maybe this is the physiological upper limit on the number of items of a syllabary that an average person can remember? In that case, it would place an upper bound on the number of distinct syllables of a language using a syllabary. Furthermore, Chinese characters are constructed by combining only 214 radicals, which are sometimes responsible for a character's phonetics. And the 10 most used radicals appear in 10,665 characters (or 23% of the dictionary), potentially restricting the phonetics even more.

Supporting the direction of the hypothesis (from San Duanmu, The Phonology of Standard Chinese):

While Middle Chinese (about AD 600) had over 3,000 syllables (including tonal distinctions), modern Standard Chinese (SC) has just over 1,300. Thus, over a period of 1,500 years, Chinese lost more than half of its syllables. Moreover, the syllable inventory of modern Chinese continues to shrink. In addition, about 200 of the 1,300 syllables in SC are now rarely used.
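
The figures in this quote can be turned into rough rates. The sketch below only uses the rounded numbers quoted above ("over 3,000", "just over 1,300", "about 200", 1,500 years), so the results are approximate; the variable names are made up for the example.

```python
# Rounded figures from the quote; "over 3,000" is treated as 3,000,
# so the loss computed here is, if anything, an underestimate.
middle_chinese = 3000
standard_chinese = 1300
rarely_used = 200
years = 1500

lost_fraction = 1 - standard_chinese / middle_chinese
lost_per_century = (middle_chinese - standard_chinese) / years * 100
in_common_use = standard_chinese - rarely_used

print(f"fraction of syllables lost: {lost_fraction:.0%}")
print(f"average loss per century: {lost_per_century:.0f} syllables")
print(f"syllables still in common use: {in_common_use}")
```

Assuming a constant rate is of course a simplification; the quote does not say how the loss was distributed over time.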

In addition, Evelyn Rawski writes that during the Qīng Dynasty (1644-1911):

Evidence of the large number of potential teachers and the widespread distribution of private schools led us to conclude that it was possible for a broad cross-section of Ch'ing [= Qīng] males to attain some degree of literacy in private and charitable schools. Information from the mid and late nineteenth century suggests that 30 to 45 percent of the men and from 2 to 10 percent of the women in China knew how to read and write. This group included the fully literate members of the elite and, on the opposite pole, those knowing only a few hundred characters. Thus loosely defined, there was an average of almost one literate person per family.

Thus, a large part of the population was literate, which could allow for some plausible influence from the writing system to the language.

For the specific case of Chinese, I also asked a separate question: Why Mandarin Chinese has a few number of different syllables?

Puco4
  • 4
    I notice she didn't count any Polynesian languages: they generally have a small consonant inventory, and a correspondingly small number of different syllables. They are all written alphabetically as far as I know. – Colin Fine Aug 07 '20 at 22:26
  • 4
    I'm not sure there's a pattern there to be explained. Korean has the second lowest number of syllables but is written with an alphabet; Vietnamese has the second highest but was written with logograms for centuries. The count for Thai is on the low side at 2438. In principle the Thai writing system could represent over 10000 syllables, so I don't think it can be the reason for the comparatively low number of actual syllables. – rchivers Aug 08 '20 at 07:41
  • 1
    I haven't read the thesis but would assume that Oh counted syllables differing only in tone as the same. If you were to count them as different, the languages that really do rely on logograms would shoot right up the table. – rchivers Aug 08 '20 at 07:43
  • @rchivers I couldn't confirm it from Oh's thesis, but I am quite sure she considered the tones, at least for Mandarin. If you didn't count the tones in Mandarin you would get around 400 distinct syllables. – Puco4 Aug 08 '20 at 08:01
  • OK. I can’t see any reason to think that the fact that a language uses an alphabet or abugida influences the number of syllables in that language – alphabets cover the range from Korean to English, and the only abugida represented is capable of representing a far greater number of syllables than are actually found in that language (an order of magnitude greater, if we are counting tones). – rchivers Aug 08 '20 at 09:12
  • As for the logographic systems, I don’t think the question can be answered – you’d have to be careful about Japanese because it is a hybrid system, and if you are only looking at Mandarin and Cantonese, the similarity in the number of syllables is at least as likely to reflect the similarity of the languages themselves as the writing system. That said, assuming there was no real problem writing Vietnamese in Chinese characters, this type of writing system doesn't seem to restrict the number of syllables either (or if it does, it is a very slow process). – rchivers Aug 08 '20 at 09:12
  • As far as I can see that just means that there are no characters for non-existent words. According to wikipedia (https://en.wikipedia.org/wiki/Ch%E1%BB%AF_N%C3%B4m) the Chinese system was adapted for Vietnamese but remained a logographic system. On that basis there have been syllable-rich languages with logographic writing systems and syllable-poor languages with alphabetic systems. – rchivers Aug 08 '20 at 11:38
  • 2
    Maybe, although I think much of that could be said about Mandarin as well. With regard to Korean, although the modern script is an alphabet as @Draconis has pointed out, it has only been in use for 100 - 150 years, and the previous system was logographic (not that any of these scripts are purely logographic). Of course, for much of history the vast majority of speakers of any given language have been illiterate, so there would have to be some special pleading to explain how a writing system used by a tiny minority would constrain the speech patterns of the language as a whole. – rchivers Aug 08 '20 at 12:35
  • It should be noted that Chinese characters were not created to write Mandarin – they were made to write Old Chinese, which had a much higher number of possible syllables, allowing both initial and final clusters. – Janus Bahs Jacquet Aug 08 '20 at 21:03
  • 1
    Pardon me; what are these objects of which Japanese has 643? Can you give an example? – Kaz Aug 09 '20 at 00:20
  • @Kaz We are talking here about all the distinct syllables in the 20,000 most frequent words of a language. For example, for Japanese, here are the 20,000 most frequent words. If you count the distinct syllables across all those words, you get only 643 in total. – Puco4 Aug 09 '20 at 07:00
  • @Puco4 What is a "syllable" and what isn't? Can you give four or five examples? What are the syllables in word #94 in that list: 研究? Are they "ken" and "kyuu"? It's amazing that this can explode to 643, because it seems that all the syllables can be built from the basic morae (a, i, u, e, o, ka, ki, ku, ke, ko, kya, kyu, kyo, ...ha, hi, fu, he, ho, ...pa, pi, ... hya, ... bya, and so on). These are all syllables on their own. They can be lengthened, or have N stuck on to them: an, in, un, ... aa, ii, uu, ... kyoo, kyon, ... – Kaz Aug 09 '20 at 16:51
  • @Puco4 Are we counting, for instance, GA as two different syllables based on whether or not it is velarized in the middle of speech? – Kaz Aug 09 '20 at 16:52
  • @Kaz Sorry, I don't know anything about Japanese, but I expect this number to be approximately all possible sounds. You can look up the details in Yoon Mi Oh's thesis, p. 32, where she says: The Japanese Internet Corpus [Sharoff, 2006] was retrieved online, which was already lemmatized. It was then converted into Katakana by an online Kanji converter and was transcribed again into IPA by means of a list of phonemic entities corresponding to morae provided by the National Institute for Japanese Language and Linguistics (NINJAL). The transcribed data was syllabified by a bash shell script. – Puco4 Aug 09 '20 at 17:24
  • Vietnamese was written in chữ Nôm, which was created based on Chinese characters, and it still has a large sound inventory – Lưu Vĩnh Phúc Aug 10 '20 at 00:05
  • Assuming that a word like 顔 (kao, face) or 蝿 (hae, fly (insect)) counts as two syllables (for which there is good justification), Japanese has 308 syllables, by my count. I'm assuming that the kana じ、ぢ、ず、づ are distinct; whether they are varies regionally. – Kaz Aug 10 '20 at 22:03
  • I'm sorry I can't help you with this @Kaz. Maybe if you are still interested you can open a question asking about this point either here or in Japanese SE. – Puco4 Aug 10 '20 at 22:12

4 Answers

9

No. The use of a ‘characters writing system’ (I take it you mean something not simply alphabetic) does not restrict the number of distinct syllables. Even if you look at Yoon Mi Oh's list there's no reason to assume this. The gap between Cantonese and Basque isn't all that great, and Korean uses an alphabet. The list is also fairly biased: for example, many languages with very small syllable inventories use the Roman alphabet, like the Polynesian languages, or a syllabary, like some indigenous American languages.

One might be tempted to think, okay, maybe using, say, a syllabary doesn't restrict what the language can do, but people speaking a language with many syllables will not use a syllabary, so there could be a connection the other way round. But a long time ago the language that would eventually turn into Ancient Greek was written with a script containing both syllabic and ideographic signs even though the language wasn't well-suited for that. Our knowledge here is scant, but the consensus is that they adapted their script from the Minoan script whose language presumably was better suited to it, although we don't really know. The way the Greeks dealt with the situation was basically by omitting some consonants and writing filler vowels.

The last technique is similar to how the Japanese adapt foreign words to their phonology, although they do so in speaking as well as in writing. Which is a nice segue to what happened when the syllable inventory of Japanese expanded beyond what their syllabaries (kana) could express. Initially they had to make do by spelling their unspellable words with the kana they had, mainly by using historical spellings, but eventually their kana system was reformed and expanded to cope.

Another historical example is Ancient Egyptian. We have no reason to assume it had a particularly small syllable inventory, but they used a kind of hybrid system where ideographs could also stand for a sequence of consonants (vowels were not written) and this system was in use well into the Roman period. The last surviving hieroglyphic text dates from 394 and the last surviving demotic text from 452.

I think that writing systems instead originate from sources of power, whether cultural or otherwise, and spread from there. Think of the Chinese ideographs spreading over all Chinese territory and even beyond to Korea and Japan. Or the Roman letters spreading over the former empire and beyond and over the world in the colonial age. Think of the Greek letters turning into Cyrillic and spreading into Russia. Or the Phoenician letters spreading over the Mediterranean. Or the Arabic script being adopted even by languages wholly unsuited to it, such as Ottoman Turkic.

People tend to adopt the script of the cultural hegemonic power in their area. If they're lucky they'll adapt it to fit their languages; think, for example, of all the digraphs in languages using the Roman alphabet that are needed to spell sounds that Latin didn't have. Cases of shaking off the yoke, like the Koreans and the Turks did, seem to me to be exceptions, at least for the bigger language communities. The Japanese won't switch to a kana-only way of writing any time soon, I'm afraid.

Anonymous
7

There is a very good reason for thinking that this is coincidence. The reason is that a language has the same number of syllables whether it is written or not, and whether it is written with one form of script or another. Mandarin has the same number of syllables whether it is written in characters or Pinyin.

The only way the pattern could be significant is if the syllable inventory somehow influenced the choice of writing system. But there is no evidence for this: on the contrary, most scripts are adopted because they are borrowed from neighbouring languages, or for political or religious reasons.

Colin Fine
7

In some cases, I do think there's a causal link here. However, Japanese, Mandarin, Korean, and Thai have very different writing systems, so I wouldn't group them all together as "characters". (Japanese is a syllabary plus logograms, Mandarin is pure logograms, Korean is an alphabet, and Thai is an abugida.)

Focusing on Japanese specifically, Japanese is a language with a fairly small inventory of syllables. Writing each mora with its own symbol (morae are sort of like syllables, and are the basis for the Japanese writing system), you end up needing only 71 symbols to write the entire language.

Compare this to English, with its ~7,000 different possible syllables. A syllable-based system is significantly less feasible here!

Is there a reason any of these languages couldn't be written with an alphabet, like English is? Not at all. Roumaji (Japanese written in the Roman alphabet) can convey exactly the same information that kana (Japanese syllabic characters) do. But there is a reason why English isn't written with a syllabary, and that reason is tied to the size of the syllable inventory.

P.S. It's worth noting that some languages with enormous inventories do/did use syllabic writing systems, such as Hittite and Mycenaean Greek. In these cases, the writing system doesn't fit very well, and writers have to use awkward workarounds: inserting extra vowels, as in Hittite harkzi (two syllables) written ha-ar-ak-zi (four signs), or leaving out some consonants, as in Mycenaean sperma written pe-ma.
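
The two workarounds can be illustrated with a toy sketch. This is not the actual spelling convention of Hittite cuneiform or Linear B (the attested Hittite spelling above also uses vowel-consonant signs such as ar and ak, which this toy does not model); it only assumes a hypothetical script with bare-vowel and consonant-vowel signs. The function name, the `strategy` parameter and the filler vowel are all invented for the example.

```python
VOWELS = set("aeiou")


def spell_with_cv_signs(word, strategy="fill", filler="a"):
    """Toy rendering of `word` using only V and CV signs.

    strategy="fill": every consonant not followed by a vowel gets a
                     filler vowel, so it can be written as a CV sign.
    strategy="drop": such consonants are simply left unwritten.
    """
    if strategy not in ("fill", "drop"):
        raise ValueError("strategy must be 'fill' or 'drop'")
    signs = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in VOWELS:
            signs.append(ch)                # bare vowel sign
            i += 1
        elif i + 1 < len(word) and word[i + 1] in VOWELS:
            signs.append(ch + word[i + 1])  # ordinary CV sign
            i += 2
        else:
            if strategy == "fill":
                signs.append(ch + filler)   # consonant plus empty vowel
            i += 1                          # "drop": skip the consonant
    return "-".join(signs)


print(spell_with_cv_signs("sperma", strategy="drop"))
print(spell_with_cv_signs("harkzi", strategy="fill"))
```

With the "drop" strategy the toy happens to reproduce the pe-ma spelling quoted above; with "fill" a two-syllable word like harkzi needs four signs, which is the kind of mismatch being described.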

Draconis
  • I see the logic but what about the fact that Vietnamese was written with Chinese characters until fairly recently? Admittedly I don't know how well this worked, or to what extent the writing system had to be adapted. – rchivers Aug 08 '20 at 07:12
  • Japanese isn't written with kana alone though. And many Japanese texts won't be legible if written in "only 71 symbols" of it (or romaji). – Alice Aug 08 '20 at 12:29
  • 6
    @Alice: a thousand years ago Genji Monogatari and Makura no Sōshi were written entirely in kana. It's true that Japanese readers today aren't used to reading texts entirely in kana; and there may be more homophones today than there were before Sinicisation; but "won't be legible"? – Colin Fine Aug 08 '20 at 12:42
  • 2
    @rchivers A number of east Asian languages used Han characters historically. Japanese is the only big one other than 'Chinese' (which is really more than a dozen spoken languages sharing a rigorously standardized writing system) that still does (though Japanese calls them kanji). Most of the others (including Vietnamese and Korean) dropped Han characters due to how difficult they make learning to read. Japanese is kind of stuck with them now though due to their very high frequency of homophones (which are by definition also homographs when written with anything other than kanji). – Austin Hemmelgarn Aug 08 '20 at 15:55
  • @AustinHemmelgarn Going by the table above though, Vietnamese is the only one that doesn't fit the hypothesis - it is the only language with a high syllable count that has been written using a logographic system. – rchivers Aug 08 '20 at 16:12
  • 1
    @rchivers I contend that that's a side-effect of the selection of languages and the relative infrequency of logographic writing systems. – Austin Hemmelgarn Aug 08 '20 at 16:23
  • @AustinHemmelgarn You mean the correlation between having a relatively low number of distinct syllables and using a logographic writing system is a coincidence? If so, I think so too. The reason for singling out Vietnamese is that it's the only language in the list that bucks the trend identified by OP. He thinks it may be a special case, as you can see above. I'm sceptical about that but don't know enough to comment in any detail - it's possible I suppose. – rchivers Aug 08 '20 at 16:50
  • @rchivers I do believe it's a coincidence, but my point was more that it's a very small sample size and we're reasoning about something that's uncommon to begin with, so outliers are to be expected. – Austin Hemmelgarn Aug 08 '20 at 17:26
  • @Draconis I removed Thai from my list and instead used the term "characters with syllables associated", which I believe the other languages I mention have in common. I don't know what the linguistic term for that is, though. – Puco4 Aug 08 '20 at 17:51
  • @Puco4 Sort of—Chinese uses characters with meanings associated, not sounds (called a "logography"), while Korean uses characters with individual sounds associated (an alphabet), it just groups the letters into squares. Japanese is the only one of those three that assigns a syllable to each character (called a "syllabary"). – Draconis Aug 08 '20 at 18:07
  • (I use the word "Chinese" here to mean "any language written primarily in the Han script", so Mandarin and Cantonese among others.) – Draconis Aug 08 '20 at 18:07
  • 1
    @Puco4 Fundamentally, each Han character stands for a meaning; the sound is incidental. For example, the character 一 (a single horizontal line) has the meaning of "one", but every language pronounces it differently: yī in Mandarin, jat¹ in Cantonese, etc. The pronunciation can even vary within the same language: in Mandarin, that character is pronounced in most cases, when it describes a noun without a classifier, yāo when spelling out a number… – Draconis Aug 08 '20 at 18:19
  • 1
    @Puco4 It's somewhat like the symbol "1". Yes, in English most people will read this as the syllable "one". But a Spanish-speaker would read it as uno instead, an English-speaker would read it differently in different contexts (e.g. reading 100 as "a hundred"), and you wouldn't use it to spell a word like "wonder" ("1der"?) that contains that syllable but not that meaning. – Draconis Aug 08 '20 at 18:22
  • (Anyone who speaks Mandarin/Cantonese/etc feel free to correct me; I'm looking up these pronunciations in a dictionary since I don't speak any of those languages myself.) – Draconis Aug 08 '20 at 18:25
  • I understand your point, but what I am mainly interested in is symbols associated with syllables. Probably in some cases "1" can have different syllables associated with it, but mostly it is associated with one syllable. This is also the case for Mandarin characters. From Wikipedia: Unlike an alphabet, a character-based writing system associates each logogram with an entire sound and thus may be compared in some aspects to a syllabary. – Puco4 Aug 08 '20 at 18:39
  • 2
    @Puco4 In that case have a look at the passage here. It seems there is evidence that languages using syllabaries tend to have comparatively few syllables (the article also says that in practice the Han script is largely syllabic). I don't know if you'd be able to get any data for the other languages mentioned. I still doubt that the writing system constrains the number of syllables - I think it's more plausible that languages with many syllables don't adopt syllabic writing systems in the first place, or don't stick with them. – rchivers Aug 08 '20 at 18:47
  • 2
    @Puco4 There are also some languages which use syllabic writing systems but have complicated syllable structures (such as Hittite or Mycenaean Greek); when this happens, they usually stretch the writing system to make it fit, adding "empty" vowels to break up consonant clusters (Hittite hark-zi, two syllables, written ha-ar-ak-zi, four characters) or just not representing some of the consonants in writing (Mycenaean Greek sperma written pe-ma, ignoring the S and the R). – Draconis Aug 08 '20 at 19:02
  • @Puco4 Updated. – Draconis Aug 08 '20 at 19:25
2

The problem with trying to draw any conclusion from statistics like the one shown is that the mapping from spoken to written languages is not one-to-one:

  • writing systems are shared by many unrelated languages
  • a single language may adopt different writing systems at different times
  • multiple writing systems may even be in use simultaneously, as with Japanese
  • both the writing system and the spoken language evolve over time

However, we might try to look for other evidence disproving the hypothesis.

Are languages which have fewer distinct syllables more likely to use syllabaries or logographies rather than alphabets?

As far as I know, spoken language always precedes written language, so it is quite reasonable to conjecture that the structure of a spoken language would influence a writing system designed for that language.

Unfortunately, most languages do not invent their own writing systems; they adopt and adapt systems from cultures they admire, or simply from cultures they have been conquered by. For example, Maori is written using the Latin alphabet because of the British conquest of New Zealand, not because it is suitable for the language.

That alphabet in turn was adopted for English under the cultural influence of Roman Christianity, and was itself adapted from the Greek alphabet, which was adapted from the Phoenician. At best, we would need to discard all evolution of European spoken languages, and examine the sounds of Classical Latin 2000 years ago, when that alphabet last changed substantially.

Do languages which are written with syllabaries or logographies rather than alphabets evolve to use fewer distinct syllables?

Your question discusses the opposite relationship, that writing influences speech. The clearest evidence for this would not be differences between languages, but differences over time - if a language was originally unwritten, or written using an alphabet, and then began to be written with a syllabary, our hypothesis would suggest that it would start to lose distinct syllables over time.

One broad reason to be skeptical of this hypothesis is that until fairly recently, the speakers of most languages vastly outnumbered the users of their written forms. Any influence would presumably need to first take root in a literate elite, then supplant the dialects used by less literate sections of society.

We can also look again to the relationship of English to the Latin script. Modern English has 20 to 30 distinct vowel sounds, but uses a writing system with only 5 vowel letters. Although some are reasonably consistently represented by digraphs like "ee" and "oo", others are simply ignored by the spelling - notably the unstressed "schwa", which is actually one of the most common vowel sounds in spoken English. The ability to distinguish written sounds doesn't seem to be a prerequisite for distinguishing spoken sounds.

IMSoP
  • @Puco4 An open question of why Chinese has undergone such a reduction would be interesting to explore. But I'm not seeing strong evidence for your original conjecture that it is related to some property of the writing system, rather than simply part of the evolution of the spoken language itself. – IMSoP Aug 09 '20 at 19:01
  • @Puco4 That quote from Wikipedia is about evolution of the writing systems themselves, and about how they represent certain sounds, not the existence of those sounds, as clarified later in the paragraph: "Bimoraic syllables are now written with two letters..." If anything, it lends weight to my suggestion that written and spoken language are under independent evolutionary influences, rather than one influencing the other. – IMSoP Aug 09 '20 at 19:06
  • 1
    @Puco4 Yes, I came across that, although it's still very recent compared to the evolution of the language. If it is a longer trend, an interesting hypothesis might be that highly literate societies lead to reduction in distinct syllables in spoken language. But again, that's not what this answer, or this question, are about. – IMSoP Aug 09 '20 at 19:21
  • Also, the example of English might not generalize completely because, as you mentioned, its alphabet represents the phonetics rather poorly. On the other side we have Spanish, for example, whose alphabet represents the phonetics quite well. So I expect that the way a foreign word is written will strongly influence its final pronunciation. – Puco4 Aug 09 '20 at 21:49
  • (I moved some of the comments to the original question for clarity). – Puco4 Aug 09 '20 at 22:12
  • @Puco4 Since Spanish is written in exactly the same alphabet as English (except for the extra letter ñ), doesn't that difference rather suggest the writing system isn't the key factor? I suspect the influence is strongly in the opposite direction: Spanish has fairly straightforward pronunciation rules, so has been able (with the appropriate political climate) to maintain a standardised phonetic spelling system. – IMSoP Aug 10 '20 at 07:41
  • The influence in the direction from the language to the writing system is clear. However, I imagine there could also be some influence in the opposite direction. But I guess it is difficult to decide whether this has happened or not. – Puco4 Aug 10 '20 at 08:02
  • @Puco4 You seem to be very attached to your hypothesis, and keen to find positive evidence for it, rather than either being curious about alternative hypotheses, or applying the scientific method and looking for evidence that would disprove a hypothesis. – IMSoP Aug 10 '20 at 09:28
  • I don't know why you think I do not want to apply the scientific method. Otherwise I wouldn't have asked this question... But it is true that I find this hypothesis interesting and logical. Furthermore, I am quite open to hearing other hypotheses explaining why the mentioned languages that were historically written with syllabaries have a low number of syllables (either individual reasons or general reasons). I understand that these results could be an artifact of the limited sample of languages studied, so considering a larger sample might help here. – Puco4 Aug 10 '20 at 11:40