Sentiment Symposium Tutorial: Linguistic structure

Overview
Negation
Other scope marking
Part-of-speech tagging
Dependency parsing
Summary of conclusions

Overview

So far, the only structure we've imposed on our texts is to (carefully) turn them into lists of tokens. The present section explores practical methods for building on that basic structure by identifying semantic groupings and relationships that are relevant for sentiment.

Demo See the effects of negation marking on our own input texts:

http://sentiment.christopherpotts.net/tokenizing/

Demo The Stanford parser has an online interface:

http://nlp.stanford.edu:8080/parser/

Negation

Sentiment words behave very differently when under the semantic scope of negation. The goal of this section is to present and support a method for approximating this semantic influence.

The issue

The rules of thumb for how negation interacts with sentiment words are roughly as follows:

Weak (mild) words such as good and bad behave like their opposites when negated: bad ≈ not good; good ≈ not bad.
Strong (intense) words like superb and terrible have very general meanings under negation: not superb is consistent with everything from horrible to just-shy-of-superb, and different lexical items favor different senses.

These observations suggest that it would be difficult to have a general a priori rule for how to handle negation. It doesn't just turn good to bad and bad to good. Its effects depend on the words being negated.

An additional challenge for negation is that its expression is lexically diverse and its influences are far-reaching (syntactically speaking). In all of the following, for example, we have something like neg-enjoy:

I didn't enjoy it.
I never enjoy it.
No one enjoys it.
I have yet to enjoy it.
I don't think I will enjoy it.

Paying attention to bigrams like not enjoy would partially address the challenge, but it would leave out a lot of the effects of negation.

We could capture the effects of negation fairly exactly if we had accurate semantic parses and were willing to devote extensive time and energy to implementing the relevant principles of semantic scope-taking. Though I myself love such projects, I suspect that it would not be worth the effort. The approximate method I develop next does extremely well.

The approach

The method I favor for approximating the effects of negation is due to Das and Chen 2001 and Pang, Lee, and Vaithyanathan 2002. It works as as follows:

Definition: Negation marking: Append a _NEG suffix to every word appearing between a negation and a clause-level punctuation mark.

Negation and clause-level punctuation are defined as follows:

Definition: Negation

A negation is any word matching the following regular expression:

(?:
    ^(?:never|no|nothing|nowhere|noone|none|not|
        havent|hasnt|hadnt|cant|couldnt|shouldnt|
        wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint
    )$
)
|
n't

Definition: Clause-level punctuation

A clause-level punctuation mark is any word matching the following regular expression:

^[.:;!?]$

The sentiment tokenization strategy we've been using makes this straightforward, since it isolates the clausal punctation from the word-internal punctuation.

The algorithm captures simple negations:

No one enjoys it.

no one_NEG enjoys_NEG it_NEG .

It also handles long-distance effects:

I don't think I will enjoy it: it might be too spicy.

i don't think_NEG i_NEG will_NEG enjoy_NEG it_NEG : it might be too spicy .

The algorithm has literally turned enjoy into two tokens: enjoy outside of the scope of negation, enjoy_NEG inside the scope of negation. Thus, we needn't stipulate a relationship between the negated and un-negated forms; the sentiment analysis system can learn how the two behave.

The above regular expression for clause boundaries does not include commas, which are often used for sub-clausal boundaries. However, some applications might benefit from the more conservative behavior that comes from treating the comma as a boundary. This would prevent neg-marking of but I might in the following example:

I don't think I will enjoy it, but I might.

Demo See the effects of negation marking on our own input texts:

http://sentiment.christopherpotts.net/tokenizing/

Assessment

Figure fig:tokenizer_accuracy_neg extends the tokenizer assessment we did earlier with data on the sentiment-aware tokenizer plus negation marking. As you can see, negation marking results in substantial gains at all amounts of training data.

Negation marking is very difficult to implement for the other tokenizer strategies, since they either fail to reliably separate punctuation (Whitespace) or they separate too much of it (Treebank). Thus, I've not tried to do negation marking on their output.

I should add also that negation marking is most important when the texts are short. For them, a single never very good might be the only indication of negativity, whereas longer texts tend to contain more sentiment cues and are thus less dependent on any single one.

Figure fig:tokenizer_accuracy_neg

Assessing the value of negation marking.

Negation marking also helps substantially with out-of-domain testing, as we see in Figure fig:tokenizer_accuracy_neg_xtrain. Whereas the sentiment tokenizer alone only barely improves on the simpler methods, it leaps ahead of them when negation marking is added.

Figure fig:tokenizer_accuracy_neg_xtrain

Assessing the value of negation marking by training on OpenTable data and testing on IMDB user-supplied movie reviews.

figures/tokenizer-accuracy-xtrain-neg.png

Other scope marking

Scope marking is also effective for marking quotation and the effects of attitude reports like say, claim, etc., which are often used to create distance between the speaker's commitments and those of others:

They said it would be horrible, but they were wrong: I loved it!!!
This "wonderful" car turned out to be a piece of junk.

For quotation, the strategy is to turn it on and off at quotation marks. To account for nesting, one can keep a counter (though nesting is rare). For attitude verbs, the strategy is the same as the one for negation: _REPORT marking between the relevant predicates and clause-level punctuation.

The only concern is that small data sets might not support the very large increase in vocabulary size that this marking will produce.

Part-of-speech tagging

There are many cases in which a sentiment contrast exists between words that have the same string representation but different parts of speech. Table tab:pos provides a sample, using the Harvard General Inquirer's disjoint Positiv/Negativ classes to assess the polarity values.

Table tab:pos

Sentiment contrasts between identical strings with differing parts of speech.

Word	Tag1	Polarity	Tag2	Polarity
arrest	jj	Positiv	vb	Negativ
board	jj	Positiv	rb	Negativ
deal	intj	Negativ	nn	Positiv
even	jj	Positiv	vb	Negativ
even	rb	Positiv	vb	Negativ
fine	jj	Positiv	nn	Negativ
fine	jj	Positiv	vb	Negativ
fine	nn	Negativ	rb	Positiv
fine	rb	Positiv	vb	Negativ
fun	nn	Positiv	vb	Negativ
help	jj	Positiv	vbn	Negativ
help	nn	Positiv	vbn	Negativ
help	vb	Positiv	vbn	Negativ
hit	jj	Negativ	vb	Positiv
matter	jj	Negativ	vb	Positiv
mind	nn	Positiv	vb	Negativ
order	jj	Positiv	vb	Negativ
order	nn	Positiv	vb	Negativ
pass	nn	Negativ	vb	Positiv

This suggests that it might be worth the effort to run a part-of-speech tagger on your sentiment data and then use the resulting word–tag pairs as features or components of features.

I look to the Stanford Log-Linear Part-of-Speech Tagger (Toutanova, Klein, Manning, and Singer) for all my tagging needs. It is extremely well documented, it comes with sample Java code to facilitate incorporating it into other Java programs, it has a flexible command-line interface in case your existing system is in another programming language, and it is impressively fast (15000 words/second on a standard 2008 desktop). It is free for academic research and has very reasonably priced, ready-to-sign commercial licenses.

The only challenge is that the default tokenization is Treebank style. We saw in the tokenizing unit that this is a sub-optimal tokenizing strategy for sentiment applications.

However, it is straightforward to write new tokenizers to feed the POS-tagger. If you do this, there will be tokens that it handles incorrectly, but these errors might not affect your sentiment analysis.

To illustrate, let's work a bit with the following file, which I'll call foo.txt. The format is one sentence per line:

It is fine if you fine me, but don't rub it in! The play was a hit until someone hit the lead actor. :-( I found it to be booooring!

Suppose we tokenize this using the sentiment tokenizer and then store it on a single line, space-separated, in a file called foo.tok. The output:

It is fine if you fine me , but don't rub it in ! The play was a hit until someone hit the lead actor . :-( I found it to be boooring !

Then the command-line call we want is as follows (I assume that you are in the same directory as the Part-of-Speech Tagger):

java -mx3000m -cp stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model models/bidirectional-distsim-wsj-0-18.tagger -outputFormat slashTags -tagSeparator / -tokenize false -textFile foo.txt

The output (in the actual output, each example is on a single line with no blank lines between):

It/PRP is/VBZ fine/JJ if/IN you/PRP fine/VBP me/PRP ,/, but/CC don't/NN rub/VBP it/PRP in/IN !/. The/DT play/NN was/VBD a/DT hit/NN until/IN someone/NN hit/VBD the/DT lead/JJ actor/NN ./. :-(/NN I/PRP found/VBD it/PRP to/TO be/VB boooring/VBG !/.

Because the Treebank-style turns contractions into two tokens, the tagger gets the tag for can't wrong (it wants to see ca and n't). This can have detrimental effects on the surrounding tags, so it is perhaps worth pursuing a blend of the sentiment and Treebank styles if POS-tagging is going to be a large part of your system. However, the errors might not be problematic, since they seem to be very consistent.

For more on using the tagger efficiently from the command-line, see Matt Jockers' tutorial.

Figure fig:pos_accuracy uses our familiar classifier assessment to see what POS-tagging contributes. The performance is not all that different from the regular sentiment-aware tokenization, but we shouldn't conclude from this that POS-tagging is not valuable. Wider testing might reveal that there are situations in which it helps substantially.

Figure fig:pos_accuracy

Assessing POS-tagging.

Dependency parsing

Dependency parsing transforms a sentence into a quasi-semantic structure that can be extremely useful for extracting sentiment information, particularly where the goal is to relativize the sentiment information to particular entities or topics.

The Stanford Parser (Klein and Manning 2003a, b) can map raw strings, tokenized strings, or POS-tagged strings, to dependency structures (and others). I focus here on the Stanford Dependencies (de Marneffe, Manning, and MacCartney 2006).

There isn't space here to review dependency parsing in detail, but a few simple examples probably suffice to convey how this tool can be used in sentiment analysis.

I put the output of our tagging sample above into a file called foo.tagged and then ran the following command from inside the parser's distribution directory:

java -mx3000m -cp stanford-parser.jar edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "typedDependencies" -tokenized -tagSeparator / englishPCFG.ser.gz foo.tagged

The output is a list of sequences of dependecy-graph edges, with indices keeping track of linear order:

nsubj(fine-3, It-1) cop(fine-3, is-2) mark(fine-6, if-4) nsubj(fine-6, you-5) advcl(fine-3, fine-6) dobj(fine-6, me-7) nsubj(rub-11, don't-10) conj_but(fine-3, rub-11) dobj(rub-11, it-12) prep(rub-11, in-13) det(play-2, The-1) nsubj(hit-5, play-2) cop(hit-5, was-3) det(hit-5, a-4) mark(hit-8, until-6) nsubj(hit-8, someone-7) advcl(hit-5, hit-8) det(actor-11, the-9) amod(actor-11, lead-10) dobj(hit-8, actor-11) nsubj(found-2, I-1) nsubj(boooring-6, it-3) aux(boooring-6, to-4) aux(boooring-6, be-5) xcomp(found-2, boooring-6)

Figure fig:dep provides the graphical structure for these examples:

Figure fig:dep

Stanford Dependency structures.

Such dependency can isolate not only what the sentiment of a text is but also where that sentiment is coming from and whom it is directed at.

Demo The Stanford parser has an online interface:

http://nlp.stanford.edu:8080/parser/

Like the POS-tagger, the Parser is free for academic use and has reasonably priced, ready-to-sign commercial licenses.

Summary of conclusions

Negation marking is helpful at all amounts of training data, in both in-domain and out-of-domain testing.
Other kind of semantic marking (particularly for reported information) is also likely to be helpful where there is enough data to support the much larger vocabulary that this will create.
POS-tagging did not have much of an impact in our experimental design, but POS distinctions do matter for sentiment, so it might nonetheless be worth the resources. POS-tagging is often a step along the way to richer analyses as well.
Dependency structures identify useful semantic relationships.