Milestone 1

04 Aug 2020

This is what we have so far:

Data: The whole Bible text stored as documents in a TinyDB database.
Code: A function that returns the raw text of a whole chapter given its ChapterID.
Environment: NLTK up and running.

The code used to retrieve this information lives in small Python scripts available in the GitHub repo associated with this project. Since the data is stored in a database, we can query it any way we want. The chapter data is also in JSON format, which gives us access to individual paragraphs and verses.
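To make the JSON idea concrete, here is a minimal sketch of how a chapter document could give us paragraphs, verses, and raw text. The schema (`chapter_id`, `paragraphs`, `verses`, `number`, `text`), the helper names, and the placeholder verses are all assumptions for illustration; the real documents in the project's database may be shaped differently.

```python
import json

# Hypothetical chapter document; the real schema in the database may differ.
chapter_json = """
{
  "chapter_id": "GEN-1",
  "paragraphs": [
    {"verses": [{"number": 1, "text": "First verse."},
                {"number": 2, "text": "Second verse."}]},
    {"verses": [{"number": 3, "text": "Third verse."}]}
  ]
}
"""


def chapter_paragraphs(chapter):
    """Return each paragraph as one string made of its verses."""
    return [" ".join(v["text"] for v in p["verses"])
            for p in chapter["paragraphs"]]


def chapter_verses(chapter):
    """Return a flat list of (verse number, verse text) pairs."""
    return [(v["number"], v["text"])
            for p in chapter["paragraphs"]
            for v in p["verses"]]


def chapter_raw_text(chapter):
    """Return the whole chapter as raw text, one paragraph per line."""
    return "\n".join(chapter_paragraphs(chapter))


chapter = json.loads(chapter_json)
print(chapter_raw_text(chapter))
```

With a structure like this, the same document can feed both a per-verse lookup and the whole-chapter text we will hand to NLTK later.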

Here’s where the fun begins. We can start playing with queries and code to gain insights into the data we have:

python d_d_print_books_chapters.py | wc -l
1255

We can see that there are 1255 chapters. We will treat each one as a separate document so that, in the end, we can find the main topic(s) of any particular chapter/document.
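The "one chapter, one document" idea can be sketched in a few lines of Python. The `books` list below is a toy stand-in for what the database holds, and the `iter_documents` helper and the `Book-N` identifiers are hypothetical names, not the ones used in the repo.

```python
# Hypothetical in-memory stand-in for the books stored in the database.
books = [
    {"book": "BookA", "chapters": ["text of A1", "text of A2"]},
    {"book": "BookB", "chapters": ["text of B1"]},
]


def iter_documents(books):
    """Yield (document_id, raw_text) for every chapter of every book."""
    for book in books:
        for number, text in enumerate(book["chapters"], start=1):
            yield f'{book["book"]}-{number}', text


# Each chapter becomes one entry in the corpus, keyed by its document id.
documents = dict(iter_documents(books))
print(len(documents))  # counts the chapter documents in this toy data
```

Counting the entries of `documents` plays the same role as piping the script's output through `wc -l`.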
