jorge@home:~$

NLTK

This toolkit bundles a lot of implementations and resources that promise to make building NLP projects much easier.

Again, an independent Linux server seems ideal for handling all the NLP processing behind “Theophilus” (that’s the code name for this little project of mine). An AMI 2 instance in AWS (again, free!), this time without a GUI, will serve as a starting point to handle:

NLP

  • Python
  • Anaconda
  • NLTK

Coding Tools

  • Nano (because editing code in the terminal makes you feel like a real badass)
  • mc : Midnight Commander, Terminal File Browser
  • micro : Terminal Text Editor with mouse integration

Other Tools

  • Jekyll : Static page Generator, to maintain this blog. Includes a server for preview

Once more, the whole installation goes off without a hitch; the Amazon VM’s internet connection is faster than my local one.

The first thing to try is tokenizing. Beyond that, according to the documentation, NLTK includes an interface to WordNet, which will help us with:

  • Looking up the definition of a word
  • Finding synonyms and antonyms
  • Exploring word relations and similarity
  • Word sense disambiguation for words that have multiple uses and definitions
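To make those four bullets concrete, here is a minimal sketch of how they might look in code, assuming the `wordnet` data is already installed. The helper names (`tokenize`, `synonyms_and_antonyms`, `demo`) and the sample sentence are made up for illustration; only the `nltk` calls themselves come from the library. The tokenizer used here is the regex-based `wordpunct_tokenize`, chosen because it needs no extra data download.

```python
from nltk.corpus import wordnet as wn
from nltk.tokenize import wordpunct_tokenize
from nltk.wsd import lesk


def tokenize(text):
    """Split text into word and punctuation tokens (regex-based, no data needed)."""
    return wordpunct_tokenize(text)


def synonyms_and_antonyms(word):
    """Collect lemma names (synonyms) and antonyms across every sense of a word."""
    synonyms, antonyms = set(), set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name())
            for antonym in lemma.antonyms():
                antonyms.add(antonym.name())
    synonyms.discard(word)  # the word itself is not a useful synonym
    return synonyms, antonyms


def demo(sentence, word, other="river"):
    """Walk through the four WordNet uses for one word in one sentence."""
    tokens = tokenize(sentence)
    print("Tokens:", tokens)

    # 1. Definitions: WordNet keeps one synset per sense of the word.
    for synset in wn.synsets(word)[:3]:
        print(synset.name(), "->", synset.definition())

    # 2. Synonyms and antonyms, gathered from the lemmas of each synset.
    synonyms, antonyms = synonyms_and_antonyms(word)
    print("Synonyms:", sorted(synonyms)[:5])
    print("Antonyms:", sorted(antonyms)[:5])

    # 3. Similarity between the first sense of each word (may be None).
    first, second = wn.synsets(word), wn.synsets(other)
    if first and second:
        print("Path similarity:", first[0].path_similarity(second[0]))

    # 4. Word sense disambiguation with the simple Lesk algorithm.
    sense = lesk(tokens, word)
    if sense is not None:
        print("Lesk picks:", sense.name(), "->", sense.definition())


if __name__ == "__main__":
    try:
        demo("The bank approved my loan.", "bank")
    except LookupError:
        print("WordNet data not found; run: python -m nltk.downloader wordnet")
```

Lesk is a rough heuristic, so the sense it picks will not always match what a human reader would choose; it is only here to show where disambiguation plugs in.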

Installing NLTK’s “all data” as suggested:

python -m nltk.downloader all
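The “all” collection is large. As a lighter alternative, the same downloader can be driven from Python to fetch only selected packages. This is just a sketch: the `PACKAGES` list and the `download_packages` helper are my own assumptions about what the WordNet experiments need, not something the NLTK documentation prescribes.

```python
import nltk

# Assumed minimal set for WordNet experiments; extend it as the project grows.
PACKAGES = ["wordnet", "omw-1.4"]


def download_packages(packages, quiet=True):
    """Download each NLTK data package and report which ones succeeded."""
    return {name: bool(nltk.download(name, quiet=quiet)) for name in packages}
```

Calling `download_packages(PACKAGES)` on the server returns a dictionary such as `{"wordnet": True, ...}`, where a `False` value means that package could not be fetched.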