
  • Hebrew

    Let’s see how far we can get using Hebrew. NLTK: the first good sign comes from NLTK; running the command >>> nltk.corpus.udhr.fileids() returns, as part of its output, 'Hebrew_Ivrit-Hebrew' and 'Hebrew_Ivrit-UTF8'. Getting the Text: sadly, the API chosen at the beginning does not provide a bible in Hebrew. For a...
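    A quick, hedged sketch of that NLTK check (the Hebrew filter below is an illustrative addition, not part of the post):

      import nltk
      nltk.download('udhr')                        # fetch the UDHR corpus if it is missing
      from nltk.corpus import udhr

      # List every fileid and keep only the Hebrew entries
      hebrew_ids = [f for f in udhr.fileids() if f.startswith('Hebrew')]
      print(hebrew_ids)                            # ['Hebrew_Ivrit-Hebrew', 'Hebrew_Ivrit-UTF8']

      # Peek at the start of the UTF-8 encoded text
      print(udhr.raw('Hebrew_Ivrit-UTF8')[:200])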

  • One result ...several questions

    At this point we have tokenized the whole text of the “World Messianic Bible” and we have the first report of the most-used words that are also semantically valuable: “LORD”, “shall”, and “said” are the first of 2362 unique token-words found. The API service that provided us with the...
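    A minimal sketch of how such a frequency report can be produced with NLTK; the file name and the stopword filtering are assumptions, not the post’s actual script:

      from collections import Counter
      import nltk
      nltk.download('punkt')
      nltk.download('stopwords')
      from nltk.corpus import stopwords

      # Assumed local copy of the World Messianic Bible text
      text = open('world_messianic_bible.txt', encoding='utf-8').read()
      tokens = nltk.word_tokenize(text)

      # Keep alphabetic tokens and drop English stopwords so the top
      # entries are the semantically valuable words
      stops = set(stopwords.words('english'))
      words = [t for t in tokens if t.isalpha() and t.lower() not in stops]

      freq = Counter(words)
      print(len(set(words)), 'unique token-words')
      print(freq.most_common(10))                  # e.g. LORD, shall, said ...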

  • Tokenizing all documents

    Scripts Source Code: the purpose is to go chapter by chapter and run the “tokenizing by words” process, then store the results in a database so we can start gathering statistics on the words used. All our JSON documents are stored in TinyDB structures; from there we will extract information...
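    A hedged sketch of that chapter-by-chapter pipeline; the database path, field names, and the query at the end are assumptions for illustration:

      import nltk
      from tinydb import TinyDB, Query

      nltk.download('punkt')
      db = TinyDB('chapters.json')                 # TinyDB persists everything as JSON on disk

      def store_chapter(book, chapter, text):
          """Tokenize one chapter by words and persist the result."""
          tokens = nltk.word_tokenize(text)
          db.insert({'book': book, 'chapter': chapter, 'tokens': tokens})

      # Later, pull the stored tokens back out to compute word statistics
      Chapter = Query()
      docs = db.search(Chapter.book == 'Genesis')
      all_tokens = [t for doc in docs for t in doc['tokens']]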