As a Dane, I have difficulty distinguishing between Swedish, Nynorsk and Bokmål when given a text. To me they all simply appear as non-Danish Nordic languages.
Are there good, quick heuristics for distinguishing between these three non-Danish languages given a short text? To me, the Swedish "och" (meaning "and" in English) is a clear indication of Swedish. Are there other such rules?
Consider the text "Samfunnsoppdrag under press: Erfaringer og vurderinger i norske bibliotek under Covid-19". To me, the word "og" clearly indicates that this is not Swedish. Many of the words are similar to their Danish counterparts, so I presume that this is Bokmål. The word "Samfunnsoppdrag" is surely not Danish. But could the text be Nynorsk?
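To make the kind of rule I am asking about concrete, here is a minimal sketch in Python. Only the "och"/"og" rule comes from my own observation above; the other marker words (for example "ikke" for Bokmål versus "ikkje" for Nynorsk) are assumptions on my part and surely incomplete. The names `MARKERS` and `guess_languages` are made up for illustration.

```python
import re

# Hypothetical marker-word lists. Only "och" and "og" come from the
# question itself; the remaining words are assumed examples.
MARKERS = {
    "Swedish": {"och", "inte", "är"},
    "Bokmål": {"og", "ikke", "jeg", "hva"},
    "Nynorsk": {"og", "ikkje", "eg", "kva"},
}


def guess_languages(text):
    """Return the languages whose marker words match the text best.

    Several languages are returned when they tie, and an empty list
    is returned when no marker word occurs at all.
    """
    # Lowercase and split into word tokens (\w is Unicode-aware in Python 3).
    tokens = set(re.findall(r"\w+", text.lower()))
    # Score each language by how many of its marker words appear.
    scores = {lang: len(tokens & words) for lang, words in MARKERS.items()}
    best = max(scores.values())
    if best == 0:
        return []
    return sorted(lang for lang, score in scores.items() if score == best)


title = ("Samfunnsoppdrag under press: Erfaringer og vurderinger "
         "i norske bibliotek under Covid-19")
print(guess_languages(title))
```

With these word lists the title above matches only "og", so the sketch rules out Swedish but leaves Bokmål and Nynorsk tied, which is exactly where I am stuck.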
BTW: I have a webapp that, based on Wikidata lexemes, attempts to detect the language of a given text. The data entered in Wikidata is unfortunately limited, so for the above text it erroneously guesses Danish, Swedish and Bokmål, in that order: https://ordia.toolforge.org/text-to-languages?text=Samfunnsoppdrag%20under%20press:%20Erfaringer%20og%20vurderinger%20i%20norske%20bibliotek%20under%20Covid-19