Sentiment Symposium Tutorial: Stemming

  1. Overview
  2. Porter stemmer
  3. Lancaster stemmer
  4. WordNet stemmer
  5. Assessment
    1. Classification accuracy
    2. Speed
  6. Summary of conclusions

Overview

Stemming is a method for collapsing morphologically related word forms into a single base form (stem). This reduces the vocabulary size, which can sharpen one's results, especially for small data sets.

This section reviews three common stemming algorithms in the context of sentiment: the Porter stemmer, the Lancaster stemmer, and the WordNet stemmer.

My overall conclusion is that the Porter and Lancaster stemmers destroy too many sentiment distinctions. The WordNet stemmer does not have this problem nearly so severely, but it doesn't do enough collapsing to be worth the resources necessary to run it.


Porter stemmer

The Porter stemmer is one of the earliest and best-known stemming algorithms. It works by heuristically identifying word suffixes (endings) and stripping them off, with some regularization of the endings.

The Porter stemmer often collapses sentiment distinctions by mapping two words with different sentiment to the same stemmed form. Table tab:porter provides examples of such collapsing relative to the disjoint Positiv/Negativ classes of the Harvard General Inquirer, a large gold-standard semantic resource containing extensive sentiment information.

Table tab:porter
Porter stemming. 36 instances in which a Harvard Inquirer Positiv/Negativ distinction is destroyed by the algorithm.
Positiv Negativ Stemmed
captivation captive captiv
common commoner common
defend defendant defend
defense defensive defens
dependability dependent depend
dependable dependent depend
desirable desire desir
dominance dominate domin
dominance domination domin
extravagance extravagant extravag
home homely home
pass passe pass
patron patronize patron
prosecute prosecution prosecut
affection affectation affect
capitalize capital capit
closeness close close
commitment commit commit
competence compete compet
competency compete compet
competent compete compet
conviction convict convict
defender defendant defend
desirous desire desir
impetus impetuous impetu
indulgence indulge indulg
objective object object
objective objection object
rational ration ration
subsidize subside subsid
temperance temper temper
temperate temper temper
tolerance tolerable toler
tolerant tolerable toler
tolerate tolerable toler
toleration tolerable toler
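
A quick way to spot-check these collapses is to run a few of the table's pairs through an off-the-shelf Porter implementation. The sketch below uses NLTK's PorterStemmer; this is an assumption about the implementation in use, and exact outputs can vary slightly across Porter variants, but per Table tab:porter each pair should reduce to a single stem.

from nltk.stem import PorterStemmer

porter = PorterStemmer()

# Positiv/Negativ pairs taken from Table tab:porter.
pairs = [("desirable", "desire"),     # expected stem: desir
         ("objective", "objection"),  # expected stem: object
         ("tolerant", "tolerable")]   # expected stem: toler

for positiv, negativ in pairs:
    print(positiv, "->", porter.stem(positiv), "|",
          negativ, "->", porter.stem(negativ))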

Lancaster stemmer

The Lancaster stemmer is another widely used stemming algorithm. For sentiment analysis, however, it is arguably even more problematic than the Porter stemmer, since it collapses even more words of differing sentiment. Table tab:lancaster illustrates this with a random selection of such collapses, again using the Harvard General Inquirer's Positiv/Negativ distinction as a gold standard.

Table tab:lancaster
Lancaster stemming. 50 randomly selected instances in which a Harvard Inquirer Positiv/Negativ distinction is destroyed by the algorithm.
Positiv Negativ Stemmed
apprehend apprehensive apprehend
arbitrate arbitrary arbit
arbitration arbitrary arbit
audible audacious aud
call callous cal
capitalize capital capit
captivation capture capt
captivation captive capt
comical commiseration com
comely commiseration com
comic commiseration com
commitment commit commit
competency compete compet
compliment complicate comply
compliment complication comply
consummate consumptive consum
content conceal cont
contentment conceal cont
conviction convict convict
credentials credulous cred
credibility credulous cred
cute cut cut
deference defeat def
defender defensive defend
defend defensive defend
defend defendant defend
dependability dependent depend
desirous desire desir
dominance dominate domin
famous famished fam
fill filth fil
flourish floor flo
meaningful mean mean
notoriety notorious not
notable notorious not
passionate passe pass
pass passe pass
patronage patronize patron
rational ration rat
refuge refugee refug
repentance repeal rep
repent repeal rep
ripe rip rip
savings savage sav
simplify simplistic simpl
simplicity simplistic simpl
suffice sufferer suff
temperate temper temp
tolerant tolerable tol
truth truant tru
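
The same spot-check can be run with NLTK's LancasterStemmer (again, an assumption about the exact implementation of the Paice/Husk algorithm). Per Table tab:lancaster, each pair below should collapse to a single, very short stem.

from nltk.stem import LancasterStemmer

lancaster = LancasterStemmer()

# Positiv/Negativ pairs taken from Table tab:lancaster.
pairs = [("arbitrate", "arbitrary"),    # expected stem: arbit
         ("credibility", "credulous"),  # expected stem: cred
         ("savings", "savage")]         # expected stem: sav

for positiv, negativ in pairs:
    print(positiv, "->", lancaster.stem(positiv), "|",
          negativ, "->", lancaster.stem(negativ))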

WordNet stemmer

WordNet (Fellbaum 1998) has high-precision stemming functionality, but it is probably of limited use for sentiment analysis. To effect real change, it requires (word, part-of-speech tag) pairs, where the part-of-speech tag is a (adjective), n (noun), r (adverb), or v (verb). When given such pairs, it collapses tense, aspect, and number marking.

The only danger I know of for sentiment analysis is that it collapses base, comparative, and superlative adjective forms. Table tab:wordnet provides some illustrations.

Table tab:wordnet
WordNet stemming. Representative examples of what the stemmer does and doesn't do. Collapsing adjectival forms is the only worrisome behavior when it comes to sentiment.
Word Stemmed
(exclaims, v) exclaim
(exclaimed, v) exclaim
(exclaiming, v) exclaim
(exclamation, n) exclamation
(proved, v) prove
(proven, v) prove
(proven, a) proven
(happy, a) happy
(happier, a) happy
(happiest, a) happy
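
These behaviors are easy to reproduce. The sketch below uses NLTK's WordNetLemmatizer as the WordNet stemmer, which is an assumption about the specific interface; it also requires the WordNet data (e.g., via nltk.download('wordnet')). Expected outputs follow Table tab:wordnet.

from nltk.stem import WordNetLemmatizer

wn = WordNetLemmatizer()

print(wn.lemmatize("exclaimed", pos="v"))    # exclaim
print(wn.lemmatize("exclamation", pos="n"))  # exclamation (left alone)
print(wn.lemmatize("proven", pos="a"))       # proven (left alone)
print(wn.lemmatize("happiest", pos="a"))     # happy -- the risky adjective collapse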

Assessment

Classification accuracy

To assess the impact of the stemming algorithms, I use the same experimental set-up that I used when assessing tokenizers. Here, though, I compare just the plain sentiment-aware tokenizer with the same tokenizer plus Porter or Lancaster stemming applied to its output. (Since the WordNet stemmer requires part-of-speech tagged data, and since its changes are minimal, I don't assess it here.)
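
A minimal sketch of how this comparison might be wired up is given below. It assumes a simple bag-of-words classifier, and sentiment_tokenize is only a placeholder standing in for the sentiment-aware tokenizer from the tokenizing section; the actual experimental design may differ in its details.

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

porter = PorterStemmer()

def sentiment_tokenize(text):
    # Placeholder: a real sentiment-aware tokenizer preserves emoticons,
    # markup, capitalization cues, and so on.
    return text.lower().split()

def stemmed_tokenize(text):
    # Stemming applied to the output of sentiment tokenization.
    return [porter.stem(tok) for tok in sentiment_tokenize(text)]

def build_classifier(tokenizer):
    # Bag-of-words features over the given tokenizer, fed to Naive Bayes.
    vectorizer = CountVectorizer(tokenizer=tokenizer, token_pattern=None)
    return make_pipeline(vectorizer, MultinomialNB())

# plain   = build_classifier(sentiment_tokenize)
# stemmed = build_classifier(stemmed_tokenize)
# Each pipeline is then fit and scored on the same train/test splits.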

The results of the experiment are given in figure fig:stemmer_accuracy. The stemmers appear to benefit somewhat from their reduced vocabulary size when the amount of training data is small, though not enough to improve on the sentiment-aware tokenizer, and they are quickly outpaced as the training data grows.

Figure fig:stemmer_accuracy
Assessing stemming algorithms via classification. (Details on the experimental design.)
figures/stemmer-accuracy.png

Speed

Table tab:tokenizer_speed extends the tokenizer speed assessment given earlier with comparable numbers for the stemming algorithms.

Table tab:tokenizer_speed
Tokenizer speed for 12,000 OpenTable reviews. The numbers are averages for 100 rounds. The average review length is about 50 words.
Tokenizer Total time (secs) Average secs/text
Whitespace 1.305 0.0001
Treebank 9.085 0.001
Sentiment 29.915 0.002
Sentiment + Porter stemming 49.471 0.004
Sentiment + Lancaster stemming 62.938 0.005
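
For reference, a rough sketch of how per-text timings like these might be collected is given below; reviews (the list of texts) and sentiment_tokenize are assumed to exist, and the actual measurements may have been taken differently.

import time

def time_tokenizer(tokenize, texts, rounds=100):
    # Returns (average total seconds per round, average seconds per text).
    start = time.time()
    for _ in range(rounds):
        for text in texts:
            tokenize(text)
    total = (time.time() - start) / rounds
    return total, total / len(texts)

# from nltk.stem import PorterStemmer
# porter = PorterStemmer()
# total_secs, per_text = time_tokenizer(
#     lambda text: [porter.stem(w) for w in sentiment_tokenize(text)], reviews)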

Summary of conclusions

  1. My central conclusion is that, for sentiment analysis, running a stemmer is costly both in computational resources and in classification accuracy.
  2. I can imagine running the WordNet stemmer if matching against a restricted vocabulary became important, but in that case it would be better to run the algorithm in reverse to expand the word list.
  3. I don't mean to suggest that stemming could never help with sentiment analysis, but rather only that these off-the-shelf stemming algorithms can weaken sentiment systems.