
  • Hebrew

    Let’s see how far we can get using Hebrew. NLTK: the first good sign comes from NLTK; running the command >>> nltk.corpus.udhr.fileids() returns, as part of its output, 'Hebrew_Ivrit-Hebrew' and 'Hebrew_Ivrit-UTF8'. Getting the Text: sadly, the API chosen at the beginning does not provide a bible in Hebrew. For a...
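    A quick, hedged sketch of that NLTK check (the Hebrew filter below is an illustrative addition, not part of the post):

      import nltk
      nltk.download('udhr')                        # fetch the UDHR corpus if it is missing
      from nltk.corpus import udhr

      # List every fileid and keep only the Hebrew entries
      hebrew_ids = [f for f in udhr.fileids() if f.startswith('Hebrew')]
      print(hebrew_ids)                            # ['Hebrew_Ivrit-Hebrew', 'Hebrew_Ivrit-UTF8']

      # Peek at the start of the UTF-8 encoded text
      print(udhr.raw('Hebrew_Ivrit-UTF8')[:200])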

  • One result ...several questions

    At this point we have tokenized the whole text of the “World Messianic Bible” and we have the first report of the most-used words that are also semantically valuable: “LORD”, “shall”, and “said” are the first of 2362 unique token-words found. The API service that provided us with the...
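    A minimal sketch of how such a frequency report can be produced with NLTK; the file name and the stopword filtering are assumptions, not the post’s actual script:

      from collections import Counter
      import nltk
      nltk.download('punkt')
      nltk.download('stopwords')
      from nltk.corpus import stopwords

      # Assumed local copy of the World Messianic Bible text
      text = open('world_messianic_bible.txt', encoding='utf-8').read()
      tokens = nltk.word_tokenize(text)

      # Keep alphabetic tokens and drop English stopwords so the top
      # entries are the semantically valuable words
      stops = set(stopwords.words('english'))
      words = [t for t in tokens if t.isalpha() and t.lower() not in stops]

      freq = Counter(words)
      print(len(set(words)), 'unique token-words')
      print(freq.most_common(10))                  # e.g. LORD, shall, said ...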

  • Tokenizing all documents

    Scripts Source Code: the purpose is to go chapter by chapter and run the “tokenizing by words” process, then store the results in a database so we can start gathering statistics on the words used. All our JSON documents are stored in TinyDB structures; from there we will extract information...
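    A hedged sketch of that chapter-by-chapter pipeline; the database path, field names, and the query at the end are assumptions for illustration:

      import nltk
      from tinydb import TinyDB, Query

      nltk.download('punkt')
      db = TinyDB('chapters.json')                 # TinyDB persists everything as JSON on disk

      def store_chapter(book, chapter, text):
          """Tokenize one chapter by words and persist the result."""
          tokens = nltk.word_tokenize(text)
          db.insert({'book': book, 'chapter': chapter, 'tokens': tokens})

      # Later, pull the stored tokens back out to compute word statistics
      Chapter = Query()
      docs = db.search(Chapter.book == 'Genesis')
      all_tokens = [t for doc in docs for t in doc['tokens']]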