NLTK
This toolkit bundles a large collection of implementations and resources that promise to make building NLP projects much easier.
Again, an independent Linux server seems ideal to handle all the NLP processing behind “Theophilus” (that’s the code name for this little project of mine). An AMI 2 instance on AWS (again, free!), this time without a GUI, will serve as a starting point to handle:
NLP
- Python
- Anaconda
- NLTK
Coding Tools
- Nano (because editing code in the terminal makes you feel like a real badass)
- mc : Midnight Commander, Terminal File Browser
- micro : Terminal Text Editor with mouse integration
Other Tools
- Jekyll : Static page Generator, to maintain this blog. Includes a server for preview
Once more, the whole installation goes off without a hitch; the Amazon VM's internet connection is faster than my local one.
The first thing to try is tokenizing. Beyond that, according to the documentation, NLTK includes an interface to WordNet, which will help us with:
- Looking up the definition of a word
- Finding synonyms and antonyms
- Exploring word relations and similarity
- Word sense disambiguation for words that have multiple uses and definitions
Installing NLTK’s “all data” as suggested:
python -m nltk.downloader all
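With the data in place, tokenizing can be tried right away. The sketch below is only a first experiment, not the final approach for Theophilus: `simple_tokens` uses NLTK's regex-based `wordpunct_tokenize`, which needs no downloaded data, while `sentences_and_words` relies on `sent_tokenize` and `word_tokenize`, which do need the tokenizer models installed above. Both function names and the sample sentence are made up for this example.

```python
from nltk.tokenize import sent_tokenize, word_tokenize, wordpunct_tokenize


def simple_tokens(text):
    """Split text into runs of word characters and runs of punctuation."""
    return wordpunct_tokenize(text)


def sentences_and_words(text):
    """Split text into sentences, then each sentence into word tokens.

    Requires the tokenizer models from the NLTK data download.
    """
    return [word_tokenize(sentence) for sentence in sent_tokenize(text)]


sample = "Theophilus is a little NLP project."
tokens = simple_tokens(sample)
print(tokens)
```

The regex tokenizer is crude (it splits contractions like "don't" into three pieces), which is a good reason to compare its output with `word_tokenize` on real text before settling on one.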