Is there any open source software to automatically extract information from a book: genre, style, location and time of the story, characters, etc.?
-
I think this question concerns available software and the identification of books rather than linguistics per se. – James Grossmann Apr 04 '12 at 05:25
-
In any case, I doubt it would be possible to infer information like that from a book automatically... I can't see how that could be automated. Andrey, do you mean extracting information from a book or from a database? Please include more detail in your question! :) – Alenanno Apr 04 '12 at 10:31
-
@Alenanno Well, according to the answer to this question it's at least possible to extract named entities from a running text; so it sounds like this larger task is conceptually possible. – Mark Beadles Apr 08 '12 at 00:57
-
@MarkBeadles What kind of sorcery is that??? :D I honestly didn't think it was possible, but you never stop learning. eheh :) – Alenanno Apr 08 '12 at 11:07
1 Answer
This, generally speaking, corresponds to the fields of text mining, entity extraction, and automated (meta)data extraction. It's not part of linguistics per se; it's more applied linguistics with relations to information science and library science.
It's an enormously important field right now: consider, for example, the Google Books project. Google needs to be able to mine through thousands of scanned books to extract author, publisher, title, date, etc. so that it can create metadata to index the books with. They don't solely use text mining to do this, but it is one of the methods. I don't know if these methods are good enough to find genre, characters, etc., but it's a similar idea.
If you're looking for software, I suggest starting with the list of FOSS information extraction tools on Wikipedia.
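To give a rough feel for what "entity extraction" means in practice, here is a minimal sketch using only the Python standard library. It is not how real information extraction tools work (those typically rely on trained statistical models); it is just a naive heuristic that treats capitalized words appearing mid-sentence as candidate names of characters or places and counts them. The function name `candidate_entities` and the sample text are made up for illustration.

```python
import re
from collections import Counter

# Runs of capitalized words, e.g. "Sherlock Holmes" or "London".
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def candidate_entities(text):
    """Count capitalized word runs that do not start a sentence.

    Naive heuristic (an assumption of this sketch, not a real NER method):
    a capitalized word in the middle of a sentence is probably a name.
    """
    counts = Counter()
    # Rough sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in CAPITALIZED_RUN.finditer(sentence):
            # Sentence-initial capitalization tells us nothing, so skip it.
            if match.start() == 0:
                continue
            counts[match.group()] += 1
    return counts


sample = (
    "It was raining when Sherlock Holmes returned to London. "
    "The doctor had not seen Holmes for weeks. "
    "Nobody in London expected him."
)
entities = candidate_entities(sample)
for name, count in entities.most_common():
    print(name, count)
```

On a whole novel, the most frequent candidates would likely be dominated by the main characters and settings, but this heuristic misses names at the start of sentences, cannot tell a person from a place, and does not know that "Holmes" and "Sherlock Holmes" are the same character. Those gaps are what dedicated tools try to close.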