Sentiment Symposium Tutorial: Summarization

  1. Overview
  2. Why visualize?
  3. Visualization best practices
  4. Words and lexicons
  5. Products and services
  6. Tools
  7. Summary of conclusions


This section focusses on sentiment summarization via visualization. While there is work on textual sentiment summarization, I think high-level visual summaries are better in this area. Any linguistic summary will leave out important nuances of the original source texts, which could be misleading. Of course, visual summaries can make such mistakes too, but we expect them to be high-level and approximate, so we are less likely to be misled.

The central online demos all summarize their results visually in addition to providing numerical information:

Demo: Lexicon visualization
Demo: Text scoring
Demo: Trained model predictions

Why visualize?

It's often the case that a visualization can capture nuances in the data that numerical or linguistic summaries cannot easily capture. Figure fig:tufte is a famous example involving datasets that might be summarized in the same way but nonetheless have very different properties.

Figure fig:tufte
Anscombe’s Quartet (Anscombe 1973), via Tufte (2001): four dramatically different datasets with the same mean of y (7.50), standard deviation of y (2.03), and least-squares fit (y = 3 + 0.5x).
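The shared summary statistics can be checked directly. The sketch below uses only the Python standard library; the data values are the ones commonly reproduced from Anscombe (1973), and the helper names `least_squares` and `summarize` are my own, introduced for illustration.

```python
from statistics import mean, stdev

# Anscombe's (1973) four datasets; sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def summarize(xs, ys):
    """Collect the summary statistics quoted in the figure caption."""
    intercept, slope = least_squares(xs, ys)
    return {"mean_y": mean(ys), "sd_y": stdev(ys),
            "intercept": intercept, "slope": slope}

summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, s in summaries.items():
    print(f"{name:>3}: mean={s['mean_y']:.2f} sd={s['sd_y']:.2f} "
          f"fit={s['intercept']:.2f} + {s['slope']:.2f}x")
```

All four rows agree to two decimal places, which is exactly why a table of these numbers alone would hide the differences that a scatterplot makes obvious.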

Visualization best practices

Visualization is an art and a science in its own right, so I cannot hope to do justice to it here. The following advice from Tufte (2001, 2006) is easy to keep in mind (if only so that your violations of it are conscious and motivated):

  1. Draw attention to the data, not the visualization.
  2. Use a minimum of ink.
  3. Avoid creating graphical puzzles.
  4. Use tables where possible.

And some basic experimental evidence concerning effective visualization:

  1. Proportion judgments: highest accuracy with side-by-side bar charts (Cleveland and McGill 1984; Heer and Bostock 2010).
  2. For Web-based displays: gridlines improve accuracy, and for a 0-100 scale the chart should be at least 80 pixels tall (there is no evidence that increasing the height beyond that helps; Heer and Bostock 2010).
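As a rough illustration of these two findings, here is a minimal sketch that writes a side-by-side bar chart as SVG, with light gridlines and an 80-pixel plot height for a 0-100 scale. The function name `grouped_bar_svg`, its parameters, the colors, and the data are all assumptions made for this example; nothing here comes from the cited studies.

```python
def grouped_bar_svg(labels, series_a, series_b, height=80, bar_w=12, gap=10):
    """Return an SVG string: side-by-side bars on a 0-100 scale with gridlines.

    `height` is the plot height in pixels; 14 extra pixels hold the labels.
    """
    group_w = 2 * bar_w + gap
    width = gap + group_w * len(labels)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height + 14}">']
    # Light horizontal gridlines at 0, 25, 50, 75, and 100.
    for tick in range(0, 101, 25):
        y = height - tick * height / 100
        parts.append(f'<line x1="0" x2="{width}" y1="{y:.1f}" y2="{y:.1f}" '
                     f'stroke="#ccc"/>')
    for i, (label, a, b) in enumerate(zip(labels, series_a, series_b)):
        x0 = gap + i * group_w
        # Two adjacent bars per group, one for each series.
        for j, (value, color) in enumerate(((a, "#444"), (b, "#999"))):
            h = value * height / 100
            parts.append(f'<rect x="{x0 + j * bar_w}" y="{height - h:.1f}" '
                         f'width="{bar_w}" height="{h:.1f}" fill="{color}"/>')
        parts.append(f'<text x="{x0}" y="{height + 12}" '
                     f'font-size="10">{label}</text>')
    parts.append("</svg>")
    return "\n".join(parts)

# Hypothetical percentages of positive reviews for two products.
svg = grouped_bar_svg(["a", "b", "c"], [62, 40, 85], [55, 71, 78])
```

Writing `svg` to a file with an `.svg` extension and opening it in a browser shows the result. The two gray fills and the absence of borders or legends are an attempt to use a minimum of ink.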

Words and lexicons

The online interface to SentiWordNet uses colored triangles to place words in a space defined by positive, negative, and neutral sentiment (figure fig:sentiwordnet).

Figure fig:sentiwordnet
SentiWordNet lexical visualizations for superb, great, good, legislative, and terrible.

Twitter Sentiment uses Google Charts to summarize its search results, and it also provides the raw data so that users can probe more deeply (figure fig:twittersentiment).

Figure fig:twittersentiment
Twitter Sentiment results for Netflix.

Twitrratr blends the data and summarization together (figure fig:twittrratr).

Figure fig:twittrratr
Twitrratr results for Netflix.

We Feel Fine aggregates enormous amounts of data and then visualizes the results for strings of the form "we feel X" (figure fig:wefeelfine).

Figure fig:wefeelfine
Visualizations from We Feel Fine: Mobs, Murmurs, and Mounds.

Figure fig:ep2d uses the t-SNE algorithm to embed a very high-dimensional lexicon into a 2d space.

Figure fig:ep2d
A 2d embedding of a lexicon derived from Experience Project data using an extension of the model from Maas, Daly, Pham, Huang, Ng and Potts 2010.

Figure fig:imdbep visualizes scores derived from the IMDB and Experience Project websites using the methods described in the lexicons section.

Figure fig:imdbep
Merged IMDB and EP lexicons. The x-axis represents attenuation and emphasis. The y-axis represents sentiment polarity. The colors represent the (largely orthogonal) Experience Project scores.

Figure fig:gephi uses the Gephi network visualization program to graph the relationships between modifiers in WordNet as given by the similar-to graph.

Figure fig:gephi
WordNet modifier relationships visualized using Gephi.

Products and services

Many online retailers and social networking sites do an excellent job of summarizing rating information about specific products and services. I think the summaries in figure fig:ratingsum work particularly well.

Figure fig:ratingsum
Effective rating summaries of products and services, from Amazon, IMDB, OpenTable, TripAdvisor, and Goodreads.

If you build a classifier model, I think it makes sense to provide similar distributional information, so that it is apparent not only what predictions your system makes but also where it is particularly certain or uncertain. Figure fig:predict provides some examples from the classifier demo.

Figure fig:predict
Classifier predictions for three short reviews: "This movie was terrible!", "This movie was amazing!", and "This movie was okay but a bit too long." The rightmost (third) case is unclear, and this is reflected in the relatively slim margin by which neg wins out over pos, as compared with the more certain judgments of the other two reviews.
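One simple way to expose this kind of certainty information is to show the full class distribution together with the margin between the top two classes. The sketch below does this with plain text bars. The function `describe_prediction` is hypothetical, and the probabilities are invented for illustration; they are not the output of the classifier demo.

```python
def describe_prediction(probs, width=40):
    """Format a class distribution as text bars and report the winning margin.

    `probs` maps at least two class labels to probabilities that sum to 1.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    lines = [f"{label:>4} {'#' * round(p * width):<{width}} {p:.2f}"
             for label, p in ranked]
    # Margin between the winner and the runner-up: small means uncertain.
    margin = ranked[0][1] - ranked[1][1]
    return ranked[0][0], margin, "\n".join(lines)

# Hypothetical distributions, not taken from the classifier demo.
examples = {
    "This movie was terrible!": {"pos": 0.08, "neg": 0.92},
    "This movie was okay but a bit too long.": {"pos": 0.46, "neg": 0.54},
}
for text, probs in examples.items():
    label, margin, chart = describe_prediction(probs)
    print(f"{text}\n{chart}\n-> {label} (margin {margin:.2f})\n")
```

Both examples are labeled neg, but the bars and the margin make it clear that the second label is far less certain than the first.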

Bing Liu often uses boxplot-like visualizations to compare products along a variety of dimensions. In figure fig:liu, this is particularly valuable, since reducing the comparison to a single number might be misleading, as each product has its own strengths and weaknesses (which individual users might care about at different levels).

Figure fig:liu
Comparing two products along multiple dimensions (Liu, Hu, and Cheng 2005).
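To make the point about multi-dimensional comparison concrete, here is a minimal text sketch that lays out positive and negative opinion counts per feature for two products, rather than collapsing them into one score. It is not a reproduction of the visualization in Liu, Hu, and Cheng (2005); the product names, features, counts, and the function `feature_table` are all made up for this example.

```python
# Hypothetical (positive, negative) opinion counts per feature.
products = {
    "Camera A": {"picture": (120, 15), "battery": (40, 60), "size": (70, 10)},
    "Camera B": {"picture": (90, 30), "battery": (85, 12), "size": (35, 40)},
}

def feature_table(products, scale=10):
    """One row per feature and product: negatives left of '|', positives right."""
    features = sorted({f for counts in products.values() for f in counts})
    rows = []
    for feature in features:
        for name, counts in products.items():
            pos, neg = counts.get(feature, (0, 0))
            # Each symbol stands for roughly `scale` opinions.
            left = ("-" * round(neg / scale)).rjust(12)
            right = "+" * round(pos / scale)
            rows.append(f"{feature:<8} {name:<9} {left}|{right}")
    return rows

rows = feature_table(products)
print("\n".join(rows))
```

In this invented data, Camera A does better on picture and size while Camera B does better on battery, which is the kind of trade-off a single aggregate rating would obscure.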

Finally, Wordle graphics are extremely popular these days. They usually represent the words in a text, with size corresponding to frequency (and the colors often randomly assigned). Figure fig:wordle does something slightly different: it visualizes a text as a cloud of semantic classes from the Harvard General Inquirer (left) and LIWC (right).

A cautionary note about Wordle: naive users tend to assume that the color choices are deliberate and that size corresponds to importance, a cognitively much deeper notion than frequency. (The practice of filtering very high-frequency function words encourages this misconception.)

Figure fig:wordle
Wordle-like visualizations of a text from the Experience Project. At left, the text is reduced to its Harvard General Inquirer classes, with the size of a class name given by the number of words from the text in that class. At right is the same kind of visualization using LIWC semantic classes.
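The reduction behind such a figure can be sketched in a few lines: map each word to its semantic classes, count class occurrences, and scale the counts to font sizes. The lexicon below is a toy stand-in with invented word-to-class assignments (the real Harvard General Inquirer and LIWC resources are far larger), and the function names and size range are assumptions for illustration.

```python
from collections import Counter
import re

# Hypothetical mini-lexicon; the word-to-class assignments are invented.
TOY_LEXICON = {
    "happy": ["Positiv", "Pleasur"],
    "love": ["Positiv", "Affil"],
    "afraid": ["Negativ"],
    "lonely": ["Negativ"],
    "friend": ["Affil"],
}

def class_counts(text, lexicon):
    """Count how many tokens of `text` fall into each semantic class."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        counts.update(lexicon.get(token, []))
    return counts

def font_sizes(counts, min_pt=10, max_pt=48):
    """Scale counts linearly to font sizes; size encodes frequency only."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {c: min_pt + (n - lo) * (max_pt - min_pt) / span
            for c, n in counts.items()}

text = "I love my friend, but I am afraid and lonely. Love makes me happy."
counts = class_counts(text, TOY_LEXICON)
sizes = font_sizes(counts)
```

Because the size is a direct function of the count, it carries no information about importance, which is worth stating explicitly next to any such graphic.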


Tools

Some accessible, open-source visualization toolkits:

Summary of conclusions

  1. Visualization is often the best way to summarize sentiment information.
  2. Colors and shapes can provide a quick mental framework for sentiment analysis.
  3. Dimensionality reduction is generally very important, but it should be done with caution.
  4. Favor visualizations that convey the degree of certainty you have in your conclusions.
  5. Try to make the raw data supporting your conclusions easily accessible, so that users can drill down to gain a better understanding.