
In traditional linguistics literature there is a clear separation between words and non-words. Words are basically what you'd find in a dictionary. But in today's world you find all kinds of word-like "things" in tweets and other social media, for example:

I'm so excited, yay!!ah!!woooohoo:):):Dboom
I've been w#ndreeeng what this means.
I don;t knw if this <asdfasdfasdf> sense.
Maybe the wordsarestucktogeth ...er.

Where:

  • yay!!ah!!woooohoo:):):Dboom is basically enthusiasm.
  • w#ndreeeng is "wondering" with a sounded-out part (reeeng) and the # for playfulness.
  • don;t is misspelled.
  • knw leaves out some letters.
  • <asdfasdfasdf> can be interpreted as "makes", as in "makes sense".
  • Words can be stuck together (wordsarestucktogeth).
  • But they can also be split apart (...er).

These are just off the top of my head. I am wondering how to describe them linguistically. Calling them just "words" doesn't make sense, and calling them "things" is too general. "Clauses" doesn't fit either, since a clause is usually a set of words. Are there any formalisms for this sort of thing, and how should these items be treated?

You can also have more normalized "non-word" structures such as:

This is a #hashtag.
This is a https://linguistics.stackexchange.com link.
This is a @username.
This is an example.domain.com.

What are those types of things called when they are found in text, writing, or language?

Sir Cornflakes
Lance
  • "Such as, calling them just "words", but that doesn't make sense." Why? Words are just things you put spaces around ;) – curiousdannii Jul 30 '18 at 04:12
  • In linguistics there's a clear separation between words and non-words? Which textbook are you reading? :p – Luke Sawczak Jul 30 '18 at 04:18
  • I am talking about e.g. punctuation, paragraphs, phrases, as opposed to words. – Lance Jul 30 '18 at 04:22
  • In terms of parsing you could just call them all tokens. – curiousdannii Jul 30 '18 at 11:51
  • "Words are just things you put spaces around" That's just one definition of a word and the worst one. That's the concept of a graphematic word which is useful when you want to count "words" of a text or are interested in spelling conventions. – unknown_person_1000 Dec 15 '18 at 15:58

1 Answer


Those things are treated as special parts of speech in corpus linguistics. Several part-of-speech tag sets are in general use; in Universal Dependencies such items are tagged as symbols (SYM). Some of them may also be classified as interjections, particles, "other", or something else.
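To make the "token" view concrete, here is a minimal sketch of how a program might split text into tokens and give the non-word ones a class label. The class names (URL, MENTION, HASHTAG, DOMAIN, WORD, OTHER) and the regular expressions are illustrative assumptions of mine, not the categories of any particular tag set or tagger.

```python
import re

# Hypothetical token classes. Both the labels and the patterns are
# assumptions made for illustration; they are not a standard tag set.
TOKEN_PATTERNS = [
    ("URL", r"https?://\S+"),
    ("MENTION", r"@\w+"),
    ("HASHTAG", r"#\w+"),
    ("DOMAIN", r"(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}"),
    ("WORD", r"[A-Za-z]+(?:'[A-Za-z]+)?"),
    ("OTHER", r"\S"),  # any remaining non-space character, e.g. punctuation
]

# One alternation with a named group per class; earlier classes win.
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)


def tokenize(text):
    """Return (class, token) pairs for every non-whitespace token in text."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


for sentence in [
    "This is a #hashtag.",
    "This is a @username.",
    "This is an example.domain.com.",
]:
    print(tokenize(sentence))
```

A real tagger would map such classes onto whatever its tag set provides; this sketch only shows that these items can be handled as tokens of their own kind rather than as dictionary words.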

Sir Cornflakes