Most Popular

1500 questions
11
votes
3 answers

Impossible bigrams in the English Language

Is there a list that contains every two-letter combination that is not found in any English word? I have searched for a very long time and found nothing. It would also be useful if I had three-letter combinations that do not exist. I have to search…
11
votes
2 answers

How does the nonsense word "frabjous" conform to English phonotactics?

I am aware that this question is rather more complex than I am treating it, but I am looking for a few general rules (e.g. basic phonotactic constraints) that would lead to the conclusion that the nonsense word "frabjous" conforms to English…
Rad Anyaz
  • 119
  • 1
  • 6
11
votes
2 answers

How to best clean a large historical corpus ridden with OCR errors

Overview: I have a very large corpus of historical newspapers (17th-20th cent.). The word count is about 20 billion. It's raw OCR-ed data in txt files totaling about 150 GB. One newspaper issue per file (some 12 million files). Here are some further…
Mat
  • 231
  • 1
  • 5
11
votes
3 answers

Is it true that English speakers will only accept one of the 120 possible combinations of the 5 morphemes de-nation-al-ize-ation?

Dominique Sportiche, Hilda Koopman, and Edward Stabler [1] make the following claim about Affixation in section 2.3.2 of their Introduction to Syntactic Analysis and Theory: There are 5!=120 different orderings of the five morphemes in…
rjpond
  • 219
  • 1
  • 6
11
votes
0 answers

What kind of features support the claim that Slavic languages are closer to Germanic languages than to Indo-Iranian languages?

Inspired by this answer to a different question, I ask what kind of features justify a claim that Balto-Slavic languages are closer to Germanic languages than to Indo-Iranian languages. The features may be inherited or later acquired in a Sprachbund…
Sir Cornflakes
  • 30,154
  • 3
  • 65
  • 128
11
votes
5 answers

In Turkish, how exactly does "ğ" affect the vowel it follows?

In Standard Turkish, "ğ" is described as having no sound of its own but instead lengthening the previous vowel. So would "aa" and "ağ" sound alike? What about "â" and "ağa"? Can there sometimes be three vowel length distinctions in Turkish? (This is a…
hippietrail
  • 14,687
  • 7
  • 61
  • 146
11
votes
1 answer

When and how did the Japanese honorific system evolve?

I know that languages, in general, can denote honorifics, especially with second person pronouns (T/V distinction, etc), and I imagine that the Japanese system of honorifics is probably an extension of that into other persons with more granularity.…
11
votes
4 answers

Hierarchy of morphology, auxiliaries, and suppletion of verbal accidents?

I would like to make a hierarchy of verbal accidents that would have the following features. For any two accidents in the hierarchy, if a language marks only one of them by lexical suppletion, it is significantly likelier to be the one earlier in…
11
votes
3 answers

Why do most languages have a different form for singular vs plural nouns?

I've been wondering about this for a while. It makes sense intuitively, but I feel this is probably partly due to having been conditioned to think about it this way throughout all our lives, because it's just the way most languages work. Coming from…
Samuele B.
  • 235
  • 2
  • 5
11
votes
1 answer

What is the origin of the "redundant" pronouns in the Venetian language?

From the examples taken from Wikipedia: • Venetian: (Ti) te jèra onto or even Ti te jèri/xeri onto (lit. "(You) you were dirty"). • Venetian: El can el jèra onto (lit. "The dog he was dirty"). It is possible to see that Venetian uses lots of double…
Ergative Man
  • 1,436
  • 1
  • 8
  • 22
11
votes
2 answers

What research has been done on the effects of learning Esperanto on acquiring other languages?

I have recently started learning Esperanto because I thought it would be an interesting exercise to compare and contrast it with the natural languages I speak. Anyone who has done even light research on Esperanto knows its two main selling points are…
acattle
  • 2,898
  • 1
  • 14
  • 26
11
votes
3 answers

How are line breaks handled in bidirectional messages containing both English and Hebrew?

I have some Hebrew (right-to-left) text within an English (left-to-right) text as such: The Hebrew text (right-to-left) by itself looks like this: When the paper does not have enough width, the Hebrew text would wrap naturally as such: What's the…
Pacerier
  • 621
  • 1
  • 6
  • 12
11
votes
6 answers

Do multi-dimensional writing systems exist?

I am not sure whether the linguistics board is the right place to ask this question, but since I couldn't find a better place, here it is: Most (all?) writing systems use a vector-like/linear alignment of text. It may be left…
Diagon
  • 111
  • 3
11
votes
2 answers

Are there sentence boundary disambiguation algorithms which can handle punctuation errors with decent accuracy?

Most algorithms I've found for splitting text into sentences rely on punctuation being correct. However, in many real-world applications, there will be substantial numbers of punctuation errors (missing periods, extraneous periods, etc.). Are…
Alexey Romanov
  • 303
  • 2
  • 9
11
votes
4 answers

Are the morphologies of languages based on regular grammars?

Is the set of possible morphemes of any given language a regular set, one that can thus be recognized by a finite state automaton or, equivalently, matched by regular expressions? Or are there any examples of recursive syntax in morphology that…
Kaz
  • 250
  • 1
  • 7