Data Categorization & Sentiment Analysis

Categorization and sentiment analysis are hot topics in the online world today. The abundance of data has led to a pressing demand to sort it into specific areas so that it is more relevant and focused. Understanding the category or sentiment of a piece of information, which is typically unstructured, is a hard problem and a topic of active academic research. We have implemented custom solutions for our clients' requirements using Weka, Rainbow, NLTK and other tools and techniques.


We were asked to categorize the large volume of data arriving from various online social media sources and to present this categorization visually on the main website (

We scrape a variety of online sources to obtain information on a given set of companies and then, based on client inputs, sort each piece of information into one of several bins. This is done by our custom categorizer, which is built on Naive Bayes. We also estimate the sentiment of each piece of information using n-gram analysis. The estimated sentiment is translated into an up/down indicator or a slider score for the company and shown visually to the end user.
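
To make the pipeline above more concrete, here is a minimal sketch in Python of how such a categorizer and sentiment score could be put together. It is an illustration under stated assumptions, not our production code: the class name NaiveBayesCategorizer, the helper functions ngrams and slider_score, the category labels, the "positive"/"negative" labels and the sample sentences are all hypothetical. It assumes a multinomial Naive Bayes model with add-one smoothing over unigram and bigram counts, and it maps the estimated probability of positive sentiment onto a slider between -1 and +1, where a value above zero would read as "up".

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and return all 1..n_max-grams."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class NaiveBayesCategorizer:
    """Hypothetical multinomial Naive Bayes over n-gram counts."""

    def __init__(self):
        self.class_docs = Counter()                 # documents per category
        self.feature_counts = defaultdict(Counter)  # n-gram counts per category
        self.vocab = set()                          # every n-gram seen in training

    def train(self, text, category):
        self.class_docs[category] += 1
        for gram in ngrams(text):
            self.feature_counts[category][gram] += 1
            self.vocab.add(gram)

    def log_scores(self, text):
        """Return the unnormalized log posterior for each category."""
        total_docs = sum(self.class_docs.values())
        grams = [g for g in ngrams(text) if g in self.vocab]  # skip unseen n-grams
        scores = {}
        for category, n_docs in self.class_docs.items():
            counts = self.feature_counts[category]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n_docs / total_docs)           # log prior
            for gram in grams:
                score += math.log((counts[gram] + 1) / denom)
            scores[category] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def slider_score(model, text):
    """Map P(positive | text) to [-1, 1]; assumes a 'positive' label exists."""
    scores = model.log_scores(text)
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}  # stable softmax
    p_positive = weights["positive"] / sum(weights.values())
    return 2 * p_positive - 1


if __name__ == "__main__":
    # Hypothetical training data; real bins would come from client inputs.
    categorizer = NaiveBayesCategorizer()
    categorizer.train("quarterly earnings beat revenue forecast", "finance")
    categorizer.train("new phone launch features better camera", "product")
    print(categorizer.classify("revenue forecast raised"))

    sentiment = NaiveBayesCategorizer()
    sentiment.train("great results strong growth", "positive")
    sentiment.train("weak sales poor outlook", "negative")
    score = slider_score(sentiment, "strong growth this quarter")
    print("up" if score > 0 else "down", round(score, 2))
```

The same class is reused for both steps in this sketch purely to keep it short: trained on bin labels it acts as the categorizer, and trained on positive/negative labels it yields the sentiment score. A real system could just as well use separate models, or a library such as NLTK or Weka, for each step.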