Reading the first document
Started reading
Debortoli, Stefan; Müller, Oliver; Junglas, Iris; and vom Brocke, Jan (2016) "Text Mining For Information Systems Researchers: An Annotated Topic Modeling Tutorial,"
Communications of the Association for Information Systems: Vol. 39, Article 7. DOI: 10.17705/1CAIS.03907 Available at http://aisel.aisnet.org/cais/vol39/iss1/7
Looks promising, it goes along with my idea of a practical application and example with theoretical support, good references, and example cases; cool!
Based on computer science and linguistics is particularly focused on probabilistic topic modeling
Leximancer seems to implement a similar idea, might check it more later to see how they do it
A sufficiently large data set (n > 1,000) will be needed.
Note that an extensive preprocessing phase is necessary before one can statistically analyze data through topic models.
Low-level natural language processing (NLP) steps will be implemented, such as:
tokenization
Splitting up documents into sentences and sentences into words; n-gram creation: creating n consecutive words: 1-grams include, for example, “fast”, “food”, or “chain”; 2-grams concatenate two 1-grams, such as “fast food”; and 3-grams comprise three 1-grams, such as “fast food chain”
stopping
Removing common or uninformative words), partof-speech filtering (i.e., identifying and filtering words by their part of speech
lemmatizing
Reducing a word into its dictionary form; for example, plural to singular for nouns, verbs to the simple present tense
stemming
Reducing a word to its stem
Ok, halfway through the document and the practical part starts, so hands on!
Minemytext was the tool used on this publication, this online tool seems pretty neat. Since I want a more hands-on code kind of approach I’ll try some toolkits:
Previous | Home | Next |
---|---|---|
Work Environment | θεόφιλος Journey | NLTK |