What do they think? An opinion mining Shiny web application

Opinion-rich resources such as customer review sites, social media, online newspapers and personal blogs provide us with a broad range of data on what people think about new marketing tools, politics, movies and more. Mining the data from these environments, where opinions are shared in real time, plays a crucial role in setting future strategies for individuals and companies. For example, users' opinions of a newly launched product, the effects of large investment projects on the communities linked to them, and individuals' views on decisions taken by the government are all important signals that brands, governments and institutions need to take into account.

Such valuable information can be retrieved by sentiment analysis, a widely used method in computational linguistics and natural language processing for extracting subjective information from source data. The technical details of sentiment analysis are beyond the scope of this article, but broadly speaking the method takes word frequencies into account and assigns a polarity (positive, negative or neutral) to each word. In general there are two approaches:

  • Supervised: a predictive model is trained on sentences that have been annotated with their sentiment;
  • Lexicon-based: words are scored using predefined sentiment dictionaries.
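To make the lexicon approach concrete, here is a minimal Python sketch (the application itself is written in R and uses qdap). The `LEXICON` dictionary and the `tokenize`, `score` and `label` functions are hypothetical illustrations, and simply summing word polarities is a simplification of what real lexicon tools such as qdap do.

```python
import re

# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
# A real application would load a full sentiment dictionary instead.
LEXICON = {
    "good": 1, "great": 1, "love": 1, "fast": 1,
    "bad": -1, "slow": -1, "hate": -1, "broken": -1,
}

def tokenize(text):
    """Lowercase the text and split it into word tokens (drops '#', '@', punctuation)."""
    return re.findall(r"[a-z']+", text.lower())

def score(text, lexicon=LEXICON):
    """Sum the polarity of every token found in the lexicon; unknown words count 0."""
    return sum(lexicon.get(tok, 0) for tok in tokenize(text))

def label(text, lexicon=LEXICON):
    """Map the summed score to positive, negative or neutral."""
    s = score(text, lexicon)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"

print(label("I love how fast Spark is"))  # positive
```

Words missing from the lexicon contribute nothing, which is why the quality of a lexicon-based analysis depends heavily on the coverage of the dictionary used.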


In our web application we use the lexicon method to extract the sentiment values of tweets posted under some of the main Twitter hashtags about data science: #datascience, #bigdata, #hadoop, #spark, #python, #scala, #rstats, #dataviz and #tableau. Extraction is limited to 3000 tweets per day, and the application keeps the tweets from the last 7 days.
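The daily cap and 7-day window described above could be maintained along the lines of the following Python sketch. It assumes tweets are stored in a dictionary keyed by date; the `store` layout and the `add_day` and `prune` helpers are hypothetical, and reading "the last 7 days" as today plus the six preceding days is an assumption.

```python
from datetime import date, timedelta

DAILY_CAP = 3000     # maximum tweets kept per day (figure from the text)
RETENTION_DAYS = 7   # number of days retained (figure from the text)

def add_day(store, day, tweets, cap=DAILY_CAP):
    """Hypothetical helper: store at most `cap` tweets for `day`."""
    store[day] = list(tweets)[:cap]

def prune(store, today, keep_days=RETENTION_DAYS):
    """Hypothetical helper: drop days before the window (assumed: today + 6 previous days)."""
    cutoff = today - timedelta(days=keep_days - 1)
    for day in [d for d in store if d < cutoff]:
        del store[day]
    return store

# Example with placeholder data: ten days of 5000 dummy "tweets" each.
store = {}
today = date(2016, 1, 10)
for offset in range(10):
    add_day(store, today - timedelta(days=offset), range(5000))
prune(store, today)
print(len(store))  # 7
```

Keying the store by date makes pruning a simple comparison against a cutoff date, so old days can be dropped without inspecting individual tweets.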


Main page of the application window


Our web application also has an interactive part, which lets users extract tweets based on a keyword of their choice. Extraction is limited to 2000 tweets on the selected date, and users can search tweets in English or Dutch sent within the last 7 days. Sentiment analysis of English tweets is performed using the qdap dictionaries, while Dutch tweets are analysed with a Dutch sentiment lexicon.
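The keyword search and per-language scoring could look roughly like the Python sketch below. The tweet fields (`text`, `lang`, `date`), the `search` and `polarity` helpers and the tiny lexicons are all hypothetical; the real application does this in R, with qdap for English and a full Dutch lexicon.

```python
import re

MAX_TWEETS = 2000  # per-query limit described in the text

# Hypothetical toy lexicons, one per supported language.
LEXICONS = {
    "en": {"good": 1, "great": 1, "bad": -1, "slow": -1},
    "nl": {"goed": 1, "mooi": 1, "slecht": -1, "traag": -1},
}

def search(tweets, keyword, lang, day, limit=MAX_TWEETS):
    """Return up to `limit` tweets in `lang` from `day` that mention `keyword`."""
    kw = keyword.lower()
    hits = [t for t in tweets
            if t["lang"] == lang and t["date"] == day and kw in t["text"].lower()]
    return hits[:limit]

def polarity(text, lang):
    """Sum word polarities using the lexicon that matches the tweet's language."""
    lexicon = LEXICONS[lang]
    return sum(lexicon.get(w, 0) for w in re.findall(r"\w+", text.lower()))

# Example with placeholder tweets.
tweets = [
    {"text": "Spark is goed en snel", "lang": "nl", "date": "2016-01-10"},
    {"text": "Spark is traag vandaag", "lang": "nl", "date": "2016-01-10"},
    {"text": "Spark is great", "lang": "en", "date": "2016-01-10"},
]
hits = search(tweets, "spark", "nl", "2016-01-10")
print([polarity(t["text"], t["lang"]) for t in hits])  # [1, -1]
```

Selecting the lexicon by the tweet's language keeps the scoring step identical for both languages; only the dictionary changes.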

Main page of the user input window

All calculations and analyses are performed in R, and the web application is built with Shiny, R's web application framework.
