I am the developer of the Readable Passphrase plugin for KeePass, which is all about creating random yet grammatical text. So here is a brief empirical analysis of what it produces.
Based on the phrases produced, passphrases from a ~14k word dictionary (version 0.15 of the plugin) contain between 7.4 and 9.3 bits of entropy per word, depending on which grammatical forms of a word are allowed. Given that this is the result of trying very hard to be random, I'd interpret those numbers as an upper bound on the entropy present in normal prose.
I took great pains when developing the plugin to count the number of combinations each pattern can produce, but this analysis is entirely dependent on my counts being correct. (Entropy is derived from the part of speech and allowed grammatical forms for each word in the pattern. It is also reported as a range, as different parts of speech affect what is grammatically allowed.)
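To make the counting idea concrete, here is a minimal sketch of how entropy follows from the number of choices: each slot in a pattern contributes log2 of its number of equally likely options, and the per-slot values add up. This is not the plugin's code. The function name `pattern_entropy_bits`, the slot names, and the counts are all hypothetical placeholders, not real dictionary figures.

```python
import math

def pattern_entropy_bits(choices_per_slot):
    """Sum log2(n) over all slots.

    Assumes every slot is filled uniformly at random and independently
    of the other slots (an assumption of this sketch).
    """
    return sum(math.log2(n) for n in choices_per_slot.values())

# Hypothetical counts for a <noun> <verb> <noun> pattern (illustrative only).
slots = {
    "noun_1": 4096,  # 2**12 options -> 12 bits
    "verb": 1024,    # 2**10 options -> 10 bits
    "noun_2": 4096,  # 2**12 options -> 12 bits
}

total_bits = pattern_entropy_bits(slots)
print(total_bits)  # 34.0
```

Running the same sum with the smallest and the largest plausible count for each slot is one way a min / max range like the ones reported here could arise.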
Method
- Create 1000 passphrases which follow a fixed grammatical pattern.
- Determine the theoretical entropy of such a phrase based on numbers of words in the dictionary.
- Find average number of words per phrase = total words generated / 1000.
- Find average entropy per word = theoretical entropy / average words per phrase.
- Rinse and repeat for a different pattern.
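The steps above can be sketched in a few lines of Python. This is only an illustration, not the script used for the analysis: the function name `entropy_per_word` is hypothetical, and it is fed the five basic-pattern samples quoted in this post rather than the full set of 1000 phrases, so its output will not match the reported averages.

```python
def entropy_per_word(phrases, theoretical_entropy_bits):
    """Return (average words per phrase, entropy per word in bits)."""
    # Words are counted by splitting on whitespace (an assumption of this sketch).
    total_words = sum(len(phrase.split()) for phrase in phrases)
    avg_words = total_words / len(phrases)
    return avg_words, theoretical_entropy_bits / avg_words

# The five basic-pattern samples quoted in this post.
samples = [
    "my trite one examines the supply",
    "should Waldo knit the sophist",
    "how does their bifocal thing coil the daydream",
    "should a secret thing enqueue the decade",
    "the 1 risk whams a whaler",
]

# 44.8 bits is the average theoretical entropy reported for the basic pattern.
avg_words, bits_per_word = entropy_per_word(samples, 44.8)
print(avg_words)                # 6.4
print(round(bits_per_word, 2))  # 7.0
```

Five phrases is far too small a sample to estimate the average length, which is why the analysis uses 1000 per pattern.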
Basic Pattern
<noun> <verb> <noun> - aka strength NormalRequired.
Note that nouns may be common, proper or derived from an adjective, and may have a definite or indefinite article or personal pronoun. The first noun may also be replaced by a number from 0-999 (digits). The verb uses present, past and future tenses. The entire phrase may be in the interrogative.
Samples:
my trite one examines the supply
should Waldo knit the sophist
how does their bifocal thing coil the daydream
should a secret thing enqueue the decade
the 1 risk whams a whaler
- Average words per phrase: 5.21
- Theoretical entropy per phrase (bits, min / avg / max): 39.5 / 44.8 / 46.4
- Entropy per word (min / avg / max): 7.58 / 8.59 / 8.90
Long Pattern
<noun> <adjective> <adverb> <verb> <adverb> <preposition> <adjective> <noun> <conjunction> <noun> - aka strength InsaneRequiredAnd
In addition to the previous pattern, this includes plural nouns and demonstratives, as well as continuous present, continuous past, perfect and subjunctive verb tenses, and intransitive verbs. Note that intransitive verbs dramatically shorten these phrases (mostly because the plugin doesn't handle them very well). The conjunction is either "and" or "or".
Samples:
should their streaky one variably elevate plus these hoarse liars because of the overdone oddity
should these mellow ones feasibly repose apart from this torrid poacher but not this depleted hewer
when does this disfigured one decisively replace except for the deadly intruders or the real sawdust
the 22 armful of logicians earlier smooched amidst that gay turret and even a homemade skywriter
their convex thing profited evermore
- Average words per phrase: 13.50
- Theoretical entropy per phrase (bits, min / avg / max): 119.49 / 123.03 / 124.17
- Entropy per word (min / avg / max): 8.85 / 9.12 / 9.20
Comment
Adding extra parts of speech adds, at best, 1.5 bits of entropy per word, while also introducing considerably more complexity (making it much harder to remember the phrase).
In order to get to 9 bits per word, the length and complexity of the phrase get quite out of hand. It would take non-trivial but reasonable effort to memorise, but once done, you're very close to the magic 128 bits.
The shorter phrases are pretty easy to pick up (I memorised several of them during development of the plugin!).
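As a quick check on how close the long pattern comes to 128 bits, the snippet below subtracts the reported per-phrase entropy figures from that target. The helper name `shortfall_bits` is hypothetical; the numbers are the min / avg / max values listed for the long pattern.

```python
TARGET_BITS = 128

def shortfall_bits(phrase_bits, target=TARGET_BITS):
    """Bits still missing to reach the target (negative if the target is exceeded)."""
    return target - phrase_bits

# Reported theoretical entropy per phrase for the long pattern.
long_pattern_bits = {"min": 119.49, "avg": 123.03, "max": 124.17}

for label, bits in long_pattern_bits.items():
    print(f"{label}: {shortfall_bits(bits):.2f} bits short")
# min: 8.51 bits short
# avg: 4.97 bits short
# max: 3.83 bits short
```

At roughly 9 bits per word, the average gap is less than one extra word's worth of entropy.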
And yes, for those interested in the plugin, it does have patterns in between these two.