jorge@home:~$

θεόφιλος Journey

I have lots of PDF documents: books, articles, papers, etc.

Calibre helps me keep them organized locally on my Linux machine. Storing them in the cloud is still a work in progress; it would give me progress sync and the ability to read anywhere. I Librarian would be my first choice for this purpose.

From time to time I open one of them, read a little, and then add a tag or a label to help me find it later or to improve filtering and classification.

The main topics in my library are Theology, Machine Learning, Maths, and Origami. Reading ALL of them and tagging/labeling each one seems to be an unattainable goal, as much as I would like to do it.

So… what about letting an AI do it?

When it gets something wrong, I could just get in and tighten a few parameter bolts so it keeps improving.

So I would get some kind of recommendation system based on each document's “fingerprint” according to the model. A new model, a new fingerprint.

Right now it seems that a different model will be necessary for each general topic: Theology, for example, would have its own specialized model.

Each time a new document arrives, I could get a fingerprint, a general idea of what it is about, and a list of the books that are topically close to it. With some extra work, sentiment analysis could be part of the fingerprint too.
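
To make the fingerprint idea a bit more concrete, here is a minimal sketch in plain Python. It is not a topic model: as a stand-in, it assumes the fingerprint is just a normalized word-count vector and that "topically close" means high cosine similarity. The tiny library, the document names, and the function names (`fingerprint`, `similarity`, `closest`) are all made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy library: names and texts are invented for this sketch.
LIBRARY = {
    "grace_and_faith": "faith grace scripture church prayer grace",
    "neural_nets": "gradient network training loss network data",
    "paper_cranes": "fold paper crease crane fold valley",
}


def fingerprint(text):
    """Return a unit-length word-count vector as a dict of word -> weight."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def similarity(fp_a, fp_b):
    """Cosine similarity of two unit-length fingerprints."""
    return sum(weight * fp_b.get(word, 0.0) for word, weight in fp_a.items())


def closest(new_text, library):
    """Rank library documents by similarity to a new document, best first."""
    new_fp = fingerprint(new_text)
    scores = {
        name: similarity(new_fp, fingerprint(text))
        for name, text in library.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = closest("training a network with gradient descent", LIBRARY)
print(ranking[0][0])  # prints "neural_nets"
```

A real topic model would replace `fingerprint` with a vector of topic weights, but the recommendation step (compare the new fingerprint against every stored one and sort) could stay roughly the same.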

Sounds like a job for Natural Language Processing, with Topic Modeling as a starting point. So this blog is about this little project. Later on, maybe I could use the experience to build something similar that processes movie/series subtitles and/or public comments.
