- get text data

Repeat 20 times:
1. separate 90:10 (train:test)
2. text ==> topics
3. topics ==> classes (e.g. buggy)
   - needs some param tuning (e.g. words per topic)
4. new data sets:
   - top words in each topic + LDA
5. classifying with SVM
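The loop above could be sketched roughly as follows with scikit-learn. This is one possible reading of the notes, not the actual setup: it assumes each document already carries a class label (e.g. 1 = buggy), that the "new data set" is the counts of each topic's top words joined with the LDA topic proportions, and that the SVM learns the topic-to-class mapping of step 3. The name `run_experiment` and the parameters `n_topics`, `n_top_words`, and `seed` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.svm import SVC


def run_experiment(texts, labels, n_topics=5, n_top_words=10,
                   n_repeats=20, seed=0):
    """Repeat a 90:10 split n_repeats times; return one test accuracy per run."""
    texts = np.asarray(texts, dtype=object)
    labels = np.asarray(labels)
    # Step 1: 90:10 split, repeated (stratified so both classes appear in each part).
    splitter = StratifiedShuffleSplit(
        n_splits=n_repeats, test_size=0.1, random_state=seed
    )
    scores = []
    for train_idx, test_idx in splitter.split(texts, labels):
        # Step 2: text ==> topics, fitted on the training part only.
        vectorizer = CountVectorizer(stop_words="english")
        train_counts = vectorizer.fit_transform(texts[train_idx])
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
        lda.fit(train_counts)

        # Step 4 (assumed reading): column indices of the top words of every topic.
        top_ids = np.unique(
            np.argsort(lda.components_, axis=1)[:, -n_top_words:]
        )

        def featurize(docs):
            counts = vectorizer.transform(docs)
            topic_mix = lda.transform(counts)          # document-topic proportions
            top_counts = counts[:, top_ids].toarray()  # counts of top words only
            return np.hstack([top_counts, topic_mix])

        # Steps 3 and 5: the SVM maps the topic-based features to classes.
        clf = SVC(kernel="linear")
        clf.fit(featurize(texts[train_idx]), labels[train_idx])
        scores.append(clf.score(featurize(texts[test_idx]), labels[test_idx]))
    return scores
```

Averaging the returned list (for example with `np.mean`) gives a single accuracy estimate over the 20 repetitions. The "words per topic" tuning mentioned in step 3 corresponds here to `n_top_words`, and `n_topics` is a second knob that would likely need tuning as well.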