Info/CS 4300: Language and Information - in-class demo

Lecture 23

Sentiment analysis

Building lexicons tailored to a domain for which we don't have sentiment labels

In [3]:
%matplotlib inline

from __future__ import print_function
import json
from operator import itemgetter
from collections import defaultdict

from matplotlib import pyplot as plt
import numpy as np

from nltk.tokenize import TreebankWordTokenizer
from nltk import FreqDist,pos_tag
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.datasets import load_files
from sklearn.naive_bayes import MultinomialNB

tokenizer = TreebankWordTokenizer()

Using the movie review data, but this time we will not use the sentiment labels (we will pretend we don't have labels).

In [6]:
## loading movie review data: 
## http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz
data = load_files('txt_sentoken')
print(data.data[0])
arnold schwarzenegger has been an icon for action enthusiasts , since the late 80's , but lately his films have been very sloppy and the one-liners are getting worse . 
it's hard seeing arnold as mr . freeze in batman and robin , especially when he says tons of ice jokes , but hey he got 15 million , what's it matter to him ? 
once again arnold has signed to do another expensive blockbuster , that can't compare with the likes of the terminator series , true lies and even eraser . 
in this so called dark thriller , the devil ( gabriel byrne ) has come upon earth , to impregnate a woman ( robin tunney ) which happens every 1000 years , and basically destroy the world , but apparently god has chosen one man , and that one man is jericho cane ( arnold himself ) . 
with the help of a trusty sidekick ( kevin pollack ) , they will stop at nothing to let the devil take over the world ! 
parts of this are actually so absurd , that they would fit right in with dogma . 
yes , the film is that weak , but it's better than the other blockbuster right now ( sleepy hollow ) , but it makes the world is not enough look like a 4 star film . 
anyway , this definitely doesn't seem like an arnold movie . 
it just wasn't the type of film you can see him doing . 
sure he gave us a few chuckles with his well known one-liners , but he seemed confused as to where his character and the film was going . 
it's understandable , especially when the ending had to be changed according to some sources . 
aside form that , he still walked through it , much like he has in the past few films . 
i'm sorry to say this arnold but maybe these are the end of your action days . 
speaking of action , where was it in this film ? 
there was hardly any explosions or fights . 
the devil made a few places explode , but arnold wasn't kicking some devil butt . 
the ending was changed to make it more spiritual , which undoubtedly ruined the film . 
i was at least hoping for a cool ending if nothing else occurred , but once again i was let down . 
i also don't know why the film took so long and cost so much . 
there was really no super affects at all , unless you consider an invisible devil , who was in it for 5 minutes tops , worth the overpriced budget . 
the budget should have gone into a better script , where at least audiences could be somewhat entertained instead of facing boredom . 
it's pitiful to see how scripts like these get bought and made into a movie . 
do they even read these things anymore ? 
it sure doesn't seem like it . 
thankfully gabriel's performance gave some light to this poor film . 
when he walks down the street searching for robin tunney , you can't help but feel that he looked like a devil . 
the guy is creepy looking anyway ! 
when it's all over , you're just glad it's the end of the movie . 
don't bother to see this , if you're expecting a solid action flick , because it's neither solid nor does it have action . 
it's just another movie that we are suckered in to seeing , due to a strategic marketing campaign . 
save your money and see the world is not enough for an entertaining experience . 

In [7]:
## building the term documnet matrix
vec = CountVectorizer(min_df = 50)
X = vec.fit_transform(data.data)
terms = vec.get_feature_names()
len(terms)
Out[7]:
2153

We want to only look at adjectives and adverbs.

We will use the NLTK part of speech tokenizer.

We want to only keep words that are taged as "JJ" (adjectives) or "RB" (adverbs).

In [61]:
##example part of speech (POS) tagging (note that you need to tokenize the sentence first)
pos_tag(tokenizer.tokenize("This was a great day but the time is running out fast"))
Out[61]:
[('This', 'DT'),
 ('was', 'VBD'),
 ('a', 'DT'),
 ('great', 'JJ'),
 ('day', 'NN'),
 ('but', 'CC'),
 ('the', 'DT'),
 ('time', 'NN'),
 ('is', 'VBZ'),
 ('running', 'VBG'),
 ('out', 'RP'),
 ('fast', 'JJ')]
In [9]:
## POS tagging  all reviews
## POS tagging is relatively slow, so this will take a while
#reviews_pos_tagged=[pos_tag(tokenizer.tokenize(m)) for m in data.data]

## Reconstructing adjective-and-adverb-only reviews
reviews_adj_adv_only=[" ".join([w for w,tag in m if tag in ["RB","JJ"]])
                      for m in reviews_pos_tagged]
In [10]:
## It kind of works:
reviews_adj_adv_only[1]
Out[10]:
"good hard great rare rare strong masterful together virtually unheard true real married n't much enough david american anti-government only forward available highly operative wrong always terry surprising david own notable very simple complex character-driven well-written long sharply not caruso b-movie caruso too many memorable stoic memorable extremely well skillfully old-school the"
In [84]:
## term doc matrix only for adj/adv
X = vec.fit_transform(reviews_adj_adv_only)
X = X > 0  # we only keep binary values (is the word in the document)
terms = vec.get_feature_names()
In [85]:
len(terms)
Out[85]:
483
In [86]:
# PMI type measure via matrix multiplication
def getcollocations_matrix(X):
    XX=X.T.dot(X)  ## multiply X with it's transpose to get number docs in which both w1 (row) and w2 (column) occur
    term_freqs = np.asarray(X.sum(axis=0)) ## number of docs in which a word occurs
    #pmi=np.array(XX) * 1.0 / np.array(X.sum(axis=0)).T / np.array(X.sum(axis=0))
    pmi = XX.toarray() * 1.0  ## Casting to float, making it an array to use simple operations
    pmi /= term_freqs.T ## dividing by the number of documents in which w1 occurs
    pmi /= term_freqs  ## dividing by the number of documents in which w2 occurs
    
    return pmi  # this is not technically PMI beacuse we are ignoring some normalization factor and not taking the log 
                # but it's sufficient for ranking
In [87]:
pmi_matrix=getcollocations_matrix(X)
a.shape  # n_words by n_words
Out[87]:
(483, 483)
In [88]:
a
Out[88]:
array([[ 0.00399405,  0.00053261,  0.00085641, ...,  0.00061296,
         0.00066274,  0.00049234],
       [ 0.00053261,  0.01697531,  0.00082139, ...,  0.00045094,
         0.00042829,  0.00057458],
       [ 0.00085641,  0.00082139,  0.00670598, ...,  0.00069823,
         0.00045   ,  0.00055221],
       ..., 
       [ 0.00061296,  0.00045094,  0.00069823, ...,  0.00902344,
         0.00044339,  0.00087074],
       [ 0.00066274,  0.00042829,  0.00045   , ...,  0.00044339,
         0.00298861,  0.00054673],
       [ 0.00049234,  0.00057458,  0.00055221, ...,  0.00087074,
         0.00054673,  0.00278998]])
In [89]:
pmi_matrix[:,1].ravel().tolist()
Out[89]:
[5.14668039114771e-05,
 0.0002227667631989307,
 8.991188635137565e-05,
 0.00026652452025586353,
 6.692992436918547e-05,
 0.00011940298507462687,
 2.6002392220084247e-05,
 3.0030931859815612e-05,
 0.00013568521031207597,
 0.0002261420171867933,
 0.00013819789939192924,
 0.00012756729174639623,
 2.5426530041445244e-05,
 0.00010974539069359087,
 5.7185337679418995e-05,
 1.3935922627757571e-05,
 3.503608716978488e-05,
 5.632216277105041e-05,
 0.00017768301350390902,
 0.00014490653528474132,
 0.000292654375182909,
 0.00024073182474723158,
 0.0002487562189054726,
 0.00029850746268656717,
 0.0002261420171867933,
 8.15594160345812e-05,
 0.00020169423154497784,
 2.8757944382135565e-05,
 0.0002227667631989307,
 0.0002227667631989307,
 0.0002227667631989307,
 0.00026184865147944484,
 3.2731081434930606e-05,
 0.0001243781094527363,
 2.438786459857575e-05,
 1.940880771694195e-05,
 0.00021949078138718174,
 0.00015076134479119556,
 0.00013948946854512484,
 8.577800651912849e-05,
 0.00020445716622367614,
 6.692992436918547e-05,
 0.00012866700977869275,
 2.2997493273233217e-05,
 0.00018201674554058975,
 6.517630189663038e-05,
 0.00018426386585590566,
 0.00013568521031207597,
 0.00019135093761959434,
 0.00010815487778498811,
 0.00017355085039916696,
 0.0002369106846718787,
 0.0001122208506340478,
 0.00014214641080312722,
 8.48032564450475e-05,
 5.0594485201113076e-05,
 0.00029850746268656717,
 0.00012233912405187178,
 6.60414740457007e-05,
 9.387027128508402e-05,
 0.0001320829480914014,
 0.0002332089552238806,
 0.0002763957987838585,
 0.00010222858311183807,
 0.00014490653528474132,
 0.00020169423154497784,
 0.0,
 0.00014351320321469576,
 5.5900273911342166e-05,
 0.0001344628210299852,
 9.387027128508402e-05,
 4.228151029554776e-05,
 0.00014351320321469576,
 0.00027137042062415194,
 0.0002870264064293915,
 0.0001148105625717566,
 0.00021949078138718174,
 0.00021949078138718174,
 0.000281610813855252,
 0.00026652452025586353,
 0.00026652452025586353,
 0.00021949078138718174,
 0.0001554726368159204,
 0.0002261420171867933,
 0.00010364842454394692,
 9.884353069091628e-05,
 5.9941257567583765e-05,
 0.00026652452025586353,
 9.884353069091628e-05,
 0.00029850746268656717,
 0.00024073182474723158,
 0.00019383601473153714,
 4.564334291843534e-05,
 9.950248756218906e-05,
 0.00017355085039916696,
 0.0,
 0.0,
 0.00025297242600556537,
 0.00018201674554058975,
 0.00012233912405187178,
 9.156670634557275e-05,
 8.677542519958348e-05,
 0.000169606512890095,
 5.89935697009026e-05,
 9.629272989889263e-05,
 8.200754469411187e-05,
 8.067769261799113e-05,
 0.00024073182474723158,
 8.831581736288969e-05,
 0.00026652452025586353,
 0.0002870264064293915,
 4.8459003682884284e-05,
 8.111615833874107e-05,
 0.0002227667631989307,
 0.00020729684908789384,
 0.0001015331505736623,
 2.2410470171664203e-05,
 0.0002763957987838585,
 0.0002332089552238806,
 4.6065966463976415e-05,
 0.00013092432573972242,
 0.0002369106846718787,
 0.0001463271875914545,
 3.990741479766941e-05,
 0.00026184865147944484,
 1.1561094604437148e-05,
 7.316359379572725e-05,
 2.6184865147944487e-05,
 7.614986293024671e-05,
 0.000169606512890095,
 5.6750468191362575e-05,
 0.00027137042062415194,
 0.00029850746268656717,
 7.107320540156361e-05,
 0.0002227667631989307,
 0.0001243781094527363,
 9.884353069091628e-05,
 0.00013092432573972242,
 0.00021321961620469082,
 3.175611305176246e-05,
 0.00011940298507462687,
 0.0002870264064293915,
 0.00012335019119279634,
 0.00018892877385225768,
 0.00016223231667748214,
 2.1413734769481146e-05,
 5.0594485201113076e-05,
 5.0594485201113076e-05,
 1.428265371706063e-05,
 0.00012335019119279634,
 0.00025297242600556537,
 0.000281610813855252,
 9.950248756218906e-05,
 0.0,
 0.00027137042062415194,
 0.0001166044776119403,
 0.00024073182474723158,
 0.00016048788316482107,
 0.00029850746268656717,
 4.753303545964446e-05,
 0.0,
 3.0459945172098693e-05,
 0.00015076134479119556,
 0.00014214641080312722,
 0.0002870264064293915,
 0.0001105583195135434,
 0.00015229972586049343,
 0.0002487562189054726,
 0.0001463271875914545,
 0.0002261420171867933,
 1.3012531067417923e-05,
 0.00019638648860958365,
 2.1757103694356208e-05,
 0.00021321961620469082,
 0.00013326226012793177,
 0.00012648621300278268,
 3.462963604252519e-05,
 0.00014214641080312722,
 0.000292654375182909,
 0.0002332089552238806,
 0.00012036591237361579,
 1.8821403700287966e-05,
 3.769033619779889e-05,
 0.00012036591237361579,
 8.528784648187633e-05,
 0.000169606512890095,
 0.0001658374792703151,
 2.282167145921767e-05,
 0.0001015331505736623,
 7.210325185665874e-05,
 0.00017559262510974537,
 0.00026184865147944484,
 0.0001029336078229542,
 8.291873963515754e-05,
 0.00010815487778498811,
 8.15594160345812e-05,
 0.00014925373134328358,
 0.000169606512890095,
 0.00013568521031207597,
 0.0001463271875914545,
 0.0002369106846718787,
 0.0002369106846718787,
 0.00029850746268656717,
 0.00026652452025586353,
 3.324136555529701e-05,
 0.0002296211251435132,
 0.0002870264064293915,
 0.00015229972586049343,
 3.439026067817594e-05,
 0.0,
 0.0002573340195573855,
 1.124745526324669e-05,
 0.00012134449702705983,
 9.7551458394303e-05,
 0.00026652452025586353,
 2.500062501562539e-05,
 7.0402703463813e-05,
 5.3304904051172706e-05,
 0.00026652452025586353,
 0.00021321961620469082,
 0.00017559262510974537,
 0.00011138338159946535,
 0.00020729684908789384,
 0.0001243781094527363,
 0.00010084711577248892,
 0.0001463271875914545,
 1.649212501030758e-05,
 0.0001554726368159204,
 8.528784648187633e-05,
 2.3467567821271004e-05,
 0.00025297242600556537,
 0.0001798237727027513,
 0.0,
 9.884353069091628e-05,
 0.0002369106846718787,
 0.00024467824810374357,
 4.893564962074871e-05,
 0.0002369106846718787,
 6.0919890344197386e-05,
 0.00020729684908789384,
 1.8940828850670503e-05,
 0.00020445716622367614,
 7.654037504783774e-05,
 0.0002296211251435132,
 0.00012036591237361579,
 0.0001166044776119403,
 0.00026652452025586353,
 0.00010894432944765224,
 0.0002332089552238806,
 0.00026184865147944484,
 0.00021949078138718174,
 0.00011138338159946535,
 0.0002227667631989307,
 6.815238874122537e-05,
 0.000281610813855252,
 1.3126977250948425e-05,
 0.00010660980810234541,
 0.00015076134479119556,
 0.00013568521031207597,
 0.0002261420171867933,
 0.000292654375182909,
 0.00016770082173402648,
 0.00024467824810374357,
 5.6750468191362575e-05,
 0.000281610813855252,
 0.00016401508938822373,
 1.7621455884685192e-05,
 1.9535828709853875e-05,
 4.6065966463976415e-05,
 6.72314105149926e-05,
 0.00016223231667748214,
 0.00016770082173402648,
 0.0001320829480914014,
 0.00015878056525881233,
 8.795152112155779e-06,
 2.1916847480658382e-05,
 0.00017355085039916696,
 0.00016401508938822373,
 5.876131155247385e-05,
 7.175660160734788e-05,
 0.0002487562189054726,
 0.00013693002875530606,
 0.00016770082173402648,
 0.0002573340195573855,
 0.000169606512890095,
 0.00012866700977869275,
 0.00026184865147944484,
 5.042355788624446e-05,
 2.5169263295663334e-05,
 0.00016048788316482107,
 3.148812897537628e-05,
 0.0001029336078229542,
 1.140211851361983e-05,
 0.00012756729174639623,
 0.000281610813855252,
 3.545219271811962e-05,
 0.0002332089552238806,
 1.3470553370332453e-05,
 0.00016401508938822373,
 0.00024073182474723158,
 0.00012978585334198572,
 0.00010364842454394692,
 0.0002227667631989307,
 2.5469920024451123e-05,
 0.0,
 0.00025297242600556537,
 0.00010974539069359087,
 6.878052135635188e-05,
 6.753562504220977e-05,
 0.0001658374792703151,
 0.0001798237727027513,
 0.0001554726368159204,
 0.00010222858311183807,
 4.012197079120527e-05,
 0.00010017028949213663,
 0.00017559262510974537,
 0.0001658374792703151,
 9.506607091928892e-05,
 0.0,
 9.213193292795283e-05,
 0.00024467824810374357,
 6.815238874122537e-05,
 0.00012756729174639623,
 0.0001175226231049477,
 9.950248756218906e-05,
 0.00029850746268656717,
 9.446438692612884e-05,
 0.00021021652301870928,
 3.9174207701649236e-05,
 9.629272989889263e-05,
 0.0002369106846718787,
 0.00014214641080312722,
 3.52013517319065e-05,
 0.0002296211251435132,
 0.0002763957987838585,
 0.00027137042062415194,
 0.000169606512890095,
 6.846501437765303e-05,
 0.0,
 3.4232507188826514e-05,
 0.00029850746268656717,
 0.00015710919088766692,
 3.168869030642963e-05,
 0.00016770082173402648,
 0.00015076134479119556,
 2.419023198432473e-05,
 0.00013568521031207597,
 1.7214963246053472e-05,
 7.693491306354824e-05,
 0.0001175226231049477,
 0.0002573340195573855,
 0.0002261420171867933,
 0.0002332089552238806,
 0.0002261420171867933,
 0.00010660980810234541,
 0.00011570056693277797,
 3.631477648255075e-05,
 9.156670634557275e-05,
 0.0001658374792703151,
 0.0,
 2.407318247472316e-05,
 0.00014925373134328358,
 0.00021949078138718174,
 0.00024073182474723158,
 4.678800355588827e-05,
 0.00012756729174639623,
 9.328358208955224e-05,
 8.111615833874107e-05,
 9.950248756218906e-05,
 4.4553352639786145e-05,
 9.7551458394303e-05,
 0.0002261420171867933,
 0.0,
 5.85308750365818e-05,
 0.00024073182474723158,
 0.0001166044776119403,
 8.93734918223255e-05,
 7.28066982162359e-05,
 4.550418638514744e-05,
 7.316359379572725e-05,
 0.00010894432944765224,
 9.819324430479182e-05,
 0.00012978585334198572,
 4.3641441913240814e-05,
 1.0451941970818177e-05,
 0.00016770082173402648,
 0.0002763957987838585,
 0.0002870264064293915,
 0.00010084711577248892,
 7.388798581350672e-05,
 0.00012335019119279634,
 0.00014351320321469576,
 0.00024073182474723158,
 4.468674591116275e-05,
 4.100377234705593e-05,
 0.0002227667631989307,
 0.0001554726368159204,
 0.00021021652301870928,
 0.00024073182474723158,
 2.117074203450831e-05,
 9.950248756218906e-05,
 6.72314105149926e-05,
 7.981482959533881e-05,
 0.0002332089552238806,
 8.884150675195451e-05,
 0.0002573340195573855,
 1.8916822730454192e-05,
 0.00011845534233593934,
 0.00020729684908789384,
 0.00019638648860958365,
 5.2186619350798456e-05,
 0.00026184865147944484,
 0.00019383601473153714,
 8.291873963515754e-05,
 0.0002870264064293915,
 0.00014214641080312722,
 0.00019900497512437813,
 0.00018201674554058975,
 0.00012233912405187178,
 0.00027137042062415194,
 0.00015710919088766692,
 0.00021321961620469082,
 8.627383314640669e-05,
 1.7375288864177365e-05,
 2.3880597014925376e-05,
 0.00021630975556997622,
 0.00016048788316482107,
 9.387027128508402e-05,
 0.00021949078138718174,
 0.00013326226012793177,
 0.000292654375182909,
 0.000140805406927626,
 0.00013693002875530606,
 4.509176173513099e-05,
 1.5827543090486065e-05,
 5.349596105494035e-05,
 0.00026184865147944484,
 0.00012542330364981812,
 0.00016401508938822373,
 0.00026652452025586353,
 0.00027137042062415194,
 4.830217842824712e-05,
 6.489292667099286e-05,
 0.0002573340195573855,
 0.0001166044776119403,
 0.0002296211251435132,
 0.00021021652301870928,
 0.00010737678513905294,
 0.0002870264064293915,
 0.0,
 4.536587578823209e-05,
 0.00021021652301870928,
 0.00018892877385225768,
 0.00027137042062415194,
 0.0002573340195573855,
 0.000292654375182909,
 7.175660160734788e-05,
 0.00010017028949213663,
 7.500187504687617e-05,
 0.00020169423154497784,
 0.00011307100859339665,
 1.554726368159204e-05,
 0.00017155601303825698,
 0.00010084711577248892,
 0.00019900497512437813,
 0.00016770082173402648,
 0.0002870264064293915,
 0.0002296211251435132,
 1.3975068477835541e-05,
 0.0002296211251435132,
 9.884353069091628e-05,
 4.0338846308995566e-05,
 0.00021949078138718174,
 0.0002332089552238806,
 0.00020445716622367614,
 8.627383314640669e-05,
 0.00019900497512437813,
 8.779631255487269e-05,
 0.00017155601303825698,
 5.6750468191362575e-05,
 0.00011307100859339665,
 3.8969642648376916e-05,
 3.0522235448524254e-05]
In [90]:
"worse" in terms
Out[90]:
False
In [91]:
def getcollocations(w):
    if w not in terms:
        return []
    idx = terms.index(w)
    col = a[:,idx].ravel().tolist()
    return sorted([(terms[i],val) for i,val in enumerate(col)],key=itemgetter(1),reverse=True)
In [106]:
## words that are close to "good", not enough info yet
getcollocations("good")
Out[106]:
[(u'good', 0.0012990019157613248),
 (u'sean', 0.0009894664672151583),
 (u'nicely', 0.0009215728176087187),
 (u'forward', 0.0008879991787290832),
 (u'fairly', 0.0008726003490401396),
 (u'sad', 0.0008549720591605408),
 (u'pretty', 0.0008460801423536256),
 (u'stupid', 0.0008334223741852762),
 (u'technical', 0.0008266740148801322),
 (u'totally', 0.0008214479147860624),
 (u'shot', 0.000813992862910578),
 (u'sadly', 0.0008132974126976058),
 (u'average', 0.0008102717526801297),
 (u'intelligent', 0.0007956062005954214),
 (u'horrible', 0.0007921177925752724),
 (u'naturally', 0.0007839768760907504),
 (u'terrific', 0.0007831028773437151),
 (u'nice', 0.0007824948782153426),
 (u'therefore', 0.0007769729135288914),
 (u'thankfully', 0.0007742791829511099),
 (u'acting', 0.000772321679896414),
 (u'lovely', 0.0007690714940692756),
 (u'present', 0.0007649418644183042),
 (u'bad', 0.0007640757922921771),
 (u'climactic', 0.0007468867394326619),
 (u'really', 0.0007457428528542657),
 (u'suspenseful', 0.0007447192634049467),
 (u'mainly', 0.0007442767682989426),
 (u'entertaining', 0.0007378605892618828),
 (u'badly', 0.0007348213465601175),
 (u'total', 0.000731858357259472),
 (u'disappointing', 0.0007271669575334497),
 (u'looking', 0.0007249295207410391),
 (u'maybe', 0.0007220863150353994),
 (u'national', 0.0007177841580814052),
 (u'about', 0.0007150475082412255),
 (u'probably', 0.0007141673247730752),
 (u'particular', 0.0007139457401237506),
 (u'subtle', 0.0007139457401237506),
 (u'slightly', 0.0007136830669301804),
 (u'dull', 0.0007111692844677137),
 (u'fantastic', 0.0007082795040910224),
 (u'terribly', 0.0007071071793945959),
 (u'general', 0.0007068550313453645),
 (u'critic', 0.0007057796940765836),
 (u'weird', 0.000705621269902829),
 (u'able', 0.0007052875977492574),
 (u'very', 0.0007032159940293782),
 (u'natural', 0.0007031633880614717),
 (u'regular', 0.0006980802792321117),
 (u'sure', 0.0006975654707666012),
 (u'right', 0.0006953815152660083),
 (u'usual', 0.0006951223119472298),
 (u'black', 0.0006943271594512939),
 (u'scary', 0.0006937443768766327),
 (u'seemingly', 0.0006931543095197882),
 (u'great', 0.0006917807579957256),
 (u'brilliant', 0.0006905855523078406),
 (u'cool', 0.0006894620041798633),
 (u'definitely', 0.0006877273937350253),
 (u'actually', 0.0006873293419173826),
 (u'interesting', 0.000687056324644594),
 (u'overall', 0.0006852150176757506),
 (u'individual', 0.0006847488850106651),
 (u'mean', 0.0006847488850106651),
 (u'wonderfully', 0.0006826814495431681),
 (u'gary', 0.0006817190226876091),
 (u'well', 0.0006810369408554363),
 (u'fly', 0.0006808926965995029),
 (u'tight', 0.0006796214256947241),
 (u'impressive', 0.0006780063744889347),
 (u'musical', 0.0006771152059110175),
 (u'basically', 0.0006761618818391603),
 (u'sometimes', 0.000675186962089994),
 (u'realistic', 0.0006745871929118003),
 (u'major', 0.000674416540953057),
 (u'evil', 0.0006739860904899585),
 (u'necessary', 0.0006734198345853251),
 (u'special', 0.0006732364731610216),
 (u'funny', 0.0006728094275396127),
 (u'surprising', 0.0006724155630838723),
 (u'basic', 0.0006720430107526881),
 (u'hardly', 0.0006709490078754015),
 (u'offensive', 0.0006702582391177884),
 (u'just', 0.0006698552799986053),
 (u'ensemble', 0.0006694950953842451),
 (u'usually', 0.0006688654657840717),
 (u'believable', 0.0006680846422338569),
 (u'whatever', 0.0006676714791898038),
 (u'anti', 0.0006672826198542243),
 (u'co', 0.0006663493574488339),
 (u'also', 0.0006659902460502185),
 (u'again', 0.0006639350481827149),
 (u'then', 0.0006622797160026675),
 (u'anyway', 0.0006614873613691381),
 (u'relatively', 0.0006610608704849543),
 (u'somewhere', 0.000660454392622124),
 (u'tough', 0.0006601411336216709),
 (u'give', 0.0006577062332317469),
 (u'extremely', 0.0006562095366773631),
 (u'too', 0.0006548689759591067),
 (u'however', 0.0006546717339499118),
 (u'especially', 0.0006544502617801048),
 (u'supposedly', 0.0006544502617801048),
 (u'terrible', 0.0006529247366943702),
 (u'even', 0.0006527486230339266),
 (u'generally', 0.0006523322997678714),
 (u'huge', 0.000651853236931771),
 (u'ahead', 0.0006513777253398226),
 (u'largely', 0.0006509875619823263),
 (u'there', 0.0006507706217540319),
 (u'unbelievable', 0.0006507528026740025),
 (u'so', 0.0006499259587981563),
 (u'always', 0.000649825171590846),
 (u'never', 0.000649801949854652),
 (u'slowly', 0.0006492971101125448),
 (u'fair', 0.0006483905371339927),
 (u'not', 0.0006475484673745697),
 (u'next', 0.0006473426148267702),
 (u'likable', 0.0006461660812512426),
 (u'occasionally', 0.000645284291727162),
 (u'interested', 0.0006442563324688881),
 (u'back', 0.0006437715861799632),
 (u'superior', 0.0006429686782401028),
 (u'entirely', 0.000642160116018976),
 (u'later', 0.0006418646798227951),
 (u'nearly', 0.0006416179037059851),
 (u'strong', 0.0006413284520201026),
 (u'second', 0.000641094133988674),
 (u'little', 0.0006402652800623065),
 (u'powerful', 0.0006399069226294357),
 (u'instead', 0.0006398543246578239),
 (u'personal', 0.0006360797281161018),
 (u'capable', 0.0006352017246689251),
 (u'completely', 0.0006340763356350902),
 (u'wrong', 0.0006339986910994765),
 (u'predictable', 0.0006339610270650738),
 (u'quiet', 0.0006335317602620191),
 (u'ever', 0.0006327797233105649),
 (u'wild', 0.0006318830113738942),
 (u'social', 0.0006306520704426464),
 (u'remarkable', 0.0006302113631956564),
 (u'ago', 0.0006285680480373887),
 (u'recently', 0.0006284024901669663),
 (u'finally', 0.0006282722513089005),
 (u'quite', 0.0006271815008726003),
 (u'frankly', 0.0006266857052197366),
 (u'soft', 0.0006266857052197366),
 (u'typical', 0.0006264023934181003),
 (u'big', 0.0006261946314302335),
 (u'apparent', 0.000625995902572274),
 (u'much', 0.000625995902572274),
 (u'hard', 0.0006251580239147996),
 (u'small', 0.000624448268427326),
 (u'cute', 0.0006242994367116446),
 (u'many', 0.0006232859636000998),
 (u'screen', 0.0006232859636000998),
 (u'dramatic', 0.0006229646821755636),
 (u'moral', 0.0006225856422926838),
 (u'anywhere', 0.0006221317303341736),
 (u'entire', 0.000622029971733903),
 (u'mary', 0.0006217277486910995),
 (u'out', 0.0006213972182558571),
 (u'poor', 0.0006213279972222983),
 (u'important', 0.000620966760014611),
 (u'awful', 0.0006208887098939455),
 (u'perfectly', 0.0006202821758237137),
 (u'fully', 0.0006202306402491188),
 (u'effectively', 0.0006200055111600992),
 (u'brief', 0.0006196195755789227),
 (u'short', 0.0006194569030728637),
 (u'here', 0.0006189801970444799),
 (u'hot', 0.0006180919139034323),
 (u'still', 0.0006177443197554717),
 (u'highly', 0.0006159531875577456),
 (u'extra', 0.0006154948890550985),
 (u'yet', 0.0006150487041692614),
 (u'long', 0.0006143818784058126),
 (u'far', 0.0006138641990340517),
 (u'seriously', 0.0006133867159429216),
 (u'common', 0.0006129837162678666),
 (u'enough', 0.0006123595613716085),
 (u'responsible', 0.0006120321892573201),
 (u'almost', 0.0006113555819655456),
 (u'final', 0.0006112807194463247),
 (u'same', 0.0006110246003817371),
 (u'certain', 0.0006099298358086691),
 (u'quickly', 0.0006097896139945858),
 (u'sweet', 0.000609664482276389),
 (u'obvious', 0.0006093157609676838),
 (u'together', 0.0006092378084619628),
 (u'practically', 0.0006091738285751918),
 (u'possible', 0.0006077038145100972),
 (u'positive', 0.000606729930191972),
 (u'obviously', 0.0006066103303634303),
 (u'comic', 0.0006062872555019151),
 (u'immediately', 0.0006049315303161704),
 (u'solid', 0.0006049315303161704),
 (u'flat', 0.0006041079339508659),
 (u'likely', 0.0006039032903418039),
 (u'visually', 0.0006032792536573804),
 (u'due', 0.0006032517719129537),
 (u'other', 0.000602488434013854),
 (u'incredible', 0.0006016349774960963),
 (u'now', 0.0006013986260286734),
 (u'laughable', 0.0006013867270411772),
 (u'only', 0.0006012920510346847),
 (u'magic', 0.0006007031388319802),
 (u'mental', 0.000599912739965096),
 (u'emotional', 0.0005994544414624489),
 (u'willing', 0.0005992556613890115),
 (u'wonderful', 0.0005986444255042818),
 (u'surprisingly', 0.0005984850561479492),
 (u'similar', 0.0005981534650678376),
 (u'utterly', 0.0005973157151167622),
 (u'several', 0.0005970423440800956),
 (u'off', 0.0005968586387434555),
 (u'early', 0.000596662159194665),
 (u'previous', 0.0005958077652048264),
 (u'unfortunately', 0.0005950586149867968),
 (u'apparently', 0.0005949547834364588),
 (u'often', 0.0005935574189947833),
 (u'away', 0.0005935532409548652),
 (u'whole', 0.0005926566748309103),
 (u'male', 0.0005917634550961866),
 (u'unnecessary', 0.0005911163654788043),
 (u'happy', 0.0005908947245468662),
 (u'professional', 0.0005906014557527775),
 (u'intense', 0.0005898131988882425),
 (u'past', 0.0005896869546247819),
 (u'first', 0.0005888259341605106),
 (u'exactly', 0.0005876096626532927),
 (u'few', 0.0005874664175591812),
 (u'main', 0.0005869133854502856),
 (u'free', 0.0005864554293873665),
 (u'easy', 0.0005863400535405463),
 (u'old', 0.0005863261994427605),
 (u'close', 0.0005852807219171669),
 (u'quick', 0.000584929904301632),
 (u'third', 0.000584929904301632),
 (u'fast', 0.0005848951614942964),
 (u'soon', 0.0005843305908750934),
 (u'time', 0.0005840792658897709),
 (u'criminal', 0.0005837819236536145),
 (u'enjoyable', 0.0005835180248182529),
 (u'rather', 0.0005833916014348164),
 (u'easily', 0.0005824294195746387),
 (u'else', 0.0005817335660267597),
 (u'friendly', 0.0005817335660267597),
 (u'non', 0.0005817335660267597),
 (u'certainly', 0.000580122115705356),
 (u'absolutely', 0.0005793878661637486),
 (u'critical', 0.0005792260937594031),
 (u'minor', 0.0005792260937594031),
 (u'like', 0.000578571970559223),
 (u'as', 0.0005780282566890096),
 (u'such', 0.0005780075865749354),
 (u'spectacular', 0.0005779560753382743),
 (u'normal', 0.0005778029338238763),
 (u'incredibly', 0.0005770421663007374),
 (u'simple', 0.0005743698500011045),
 (u'unfunny', 0.0005740791770000919),
 (u'appropriate', 0.0005726439790575916),
 (u'necessarily', 0.0005721969501902554),
 (u'pathetic', 0.000571238372825246),
 (u'double', 0.0005711565920990004),
 (u'straight', 0.0005711565920990004),
 (u'last', 0.0005711090907190853),
 (u'popular', 0.0005705463820647067),
 (u'animal', 0.000570005066711704),
 (u'effective', 0.000569183476404355),
 (u'clever', 0.0005690871841566128),
 (u'dimensional', 0.0005690871841566128),
 (u'minute', 0.0005690871841566128),
 (u'available', 0.0005683383852300909),
 (u'disturbing', 0.0005665390325857622),
 (u'impossible', 0.0005663772451844384),
 (u'international', 0.0005658267888307156),
 (u'constantly', 0.0005653467050119214),
 (u'originally', 0.0005653467050119214),
 (u'ready', 0.0005652932695955688),
 (u'already', 0.0005646237552612668),
 (u'real', 0.000564423918761781),
 (u'suddenly', 0.0005636870867887034),
 (u'cold', 0.0005633242759626218),
 (u'original', 0.0005631458713257642),
 (u'essentially', 0.0005629679671226707),
 (u'worthy', 0.0005629679671226707),
 (u'particularly', 0.0005622242086295208),
 (u'different', 0.0005621765084891854),
 (u'once', 0.0005610615956620964),
 (u'no', 0.0005609573672400897),
 (u'nowhere', 0.0005609573672400897),
 (u're', 0.0005609573672400897),
 (u'top', 0.0005591295491650161),
 (u'perhaps', 0.0005588034632622061),
 (u'comedy', 0.0005578920264354991),
 (u'computer', 0.000557121761310243),
 (u'somewhat', 0.0005567711182308353),
 (u'worth', 0.0005565775199283053),
 (u'military', 0.0005565053246429462),
 (u'favorite', 0.0005564408022864658),
 (u'new', 0.0005556301367819692),
 (u'potential', 0.0005552911312073616),
 (u'steven', 0.0005548014564884839),
 (u'oddly', 0.000554618865915343),
 (u'ex', 0.0005545497545301821),
 (u'visual', 0.0005544145864067186),
 (u'mysterious', 0.0005536245911977356),
 (u'virtually', 0.0005535851676706262),
 (u'thoroughly', 0.0005530565592507927),
 (u'clear', 0.000552812152943043),
 (u'complex', 0.0005519010754612848),
 (u'shallow', 0.0005508289703315881),
 (u'open', 0.0005504094509330112),
 (u'familiar', 0.0005498931193655318),
 (u'true', 0.0005498272607472308),
 (u'rare', 0.0005494150345808287),
 (u'successful', 0.0005488937679446039),
 (u'young', 0.0005464644125943242),
 (u'epic', 0.0005453752181500873),
 (u'rarely', 0.0005442952672230574),
 (u'bright', 0.0005440919823426753),
 (u'simply', 0.0005430653936967457),
 (u'english', 0.0005429513282916424),
 (u'merely', 0.0005426483420593368),
 (u'low', 0.0005416140097490522),
 (u'earth', 0.0005414796808775866),
 (u'unique', 0.00054041726162145),
 (u'song', 0.0005401811684534198),
 (u'aware', 0.0005398290294909338),
 (u'billy', 0.0005398290294909338),
 (u'serious', 0.000539794634522505),
 (u'silly', 0.0005396344263800864),
 (u'traditional', 0.0005385580279232111),
 (u'weak', 0.0005376628413277627),
 (u'future', 0.0005367184686556415),
 (u'truly', 0.0005366173679316187),
 (u'emotionally', 0.0005365312956935994),
 (u'painful', 0.0005364346408033645),
 (u'indeed', 0.0005360259286960858),
 (u'wealthy', 0.0005358072318667524),
 (u'clearly', 0.000535459305092813),
 (u'humorous', 0.000535459305092813),
 (u'literally', 0.0005350851196944252),
 (u'comedic', 0.0005349871187567523),
 (u'memorable', 0.0005349159673910445),
 (u'single', 0.000533044995826694),
 (u'up', 0.0005324341112787293),
 (u'poorly', 0.0005301281690405149),
 (u'light', 0.0005284762677285353),
 (u'fat', 0.0005281528428400844),
 (u'mostly', 0.0005281528428400844),
 (u'large', 0.0005280223702998572),
 (u'eventually', 0.000527196044211751),
 (u'difficult', 0.0005267526497254501),
 (u'sympathetic', 0.0005267526497254501),
 (u'late', 0.0005266354529449301),
 (u'the', 0.0005262589733901873),
 (u'classic', 0.0005260356714071764),
 (u'music', 0.0005259185887458138),
 (u'surely', 0.0005250391930665246),
 (u'genuinely', 0.000524923647469459),
 (u'thus', 0.0005242934870283192),
 (u'overly', 0.0005235602094240837),
 (u'star', 0.0005224829250425528),
 (u'lucky', 0.0005223311948479709),
 (u'successfully', 0.0005221297170486082),
 (u'giant', 0.0005217295793212024),
 (u'psychological', 0.0005207453695884705),
 (u'constant', 0.0005202040542354678),
 (u'rich', 0.0005175983436853001),
 (u'graphic', 0.0005165392870754849),
 (u'perfect', 0.0005149128185777401),
 (u'dangerous', 0.0005117100812272425),
 (u'barely', 0.0005116611137553546),
 (u'further', 0.0005110944901520818),
 (u'ultimate', 0.0005109304675300817),
 (u'own', 0.000510909126296736),
 (u'hilarious', 0.0005108439229305341),
 (u'slow', 0.0005107278513499641),
 (u'standard', 0.0005104712041884816),
 (u'laugh', 0.0005102493566421155),
 (u'eccentric', 0.0005100410490868422),
 (u'to', 0.0005096279349436109),
 (u'meanwhile', 0.0005095719595539747),
 (u'dead', 0.0005094771025250182),
 (u'alive', 0.0005090168702734148),
 (u'over', 0.0005090168702734148),
 (u'physical', 0.0005082177857046968),
 (u'beautiful', 0.0005078008720166698),
 (u'human', 0.0005066196165672606),
 (u'dark', 0.000505922542794549),
 (u'movie', 0.0005055541704756365),
 (u'complete', 0.0005051896757600809),
 (u'private', 0.0005044720767888307),
 (u'ill', 0.0005040019257387013),
 (u'equally', 0.0005037283833095351),
 (u'initially', 0.000502854438429911),
 (u'heavily', 0.0005024062615685653),
 (u'attractive', 0.000500937237411932),
 (u'recent', 0.0005004011480277578),
 (u'full', 0.0004998273977331568),
 (u'key', 0.0004994884756574592),
 (u'high', 0.0004967698478307461),
 (u'various', 0.000496184512199295),
 (u'wide', 0.0004952596575633224),
 (u'hearted', 0.0004939247258717772),
 (u'life', 0.0004938342634677713),
 (u'female', 0.0004935921166287658),
 (u'blue', 0.0004930191972076788),
 (u'unusual', 0.000491829287640806),
 (u'ultimately', 0.000488766651202863),
 (u'british', 0.0004880646020055018),
 (u'public', 0.0004880646020055018),
 (u'possibly', 0.00048703275295263604),
 (u'david', 0.00048603387317002595),
 (u'former', 0.00048587973980644136),
 (u'political', 0.00048477797168896647),
 (u'of', 0.0004842452486431544),
 (u'modern', 0.00048378862888960117),
 (u'down', 0.00048271508670305596),
 (u'accidentally', 0.0004822265086800772),
 (u'outstanding', 0.0004812134277794888),
 (u'painfully', 0.00047993019197207685),
 (u'sexual', 0.00047947571262361836),
 (u'narrative', 0.00047871824704285443),
 (u'jean', 0.00047596382674916705),
 (u'unable', 0.00047596382674916705),
 (u'self', 0.0004755441055615576),
 (u'fresh', 0.00047517840789314535),
 (u'thin', 0.0004732747655810927),
 (u'cast', 0.00047240777517000657),
 (u'directly', 0.00047120418848167544),
 (u'on', 0.00047051979605105567),
 (u'elaborate', 0.0004692284895781883),
 (u'extraordinary', 0.0004674644727000748),
 (u'year', 0.00046463135468371064),
 (u'one', 0.00046339956424181324),
 (u'ridiculous', 0.0004612316130640738),
 (u'actual', 0.0004608804660423272),
 (u'famous', 0.00046053907310451814),
 (u'local', 0.00046017729849878015),
 (u'foreign', 0.00045926334160007343),
 (u'heavy', 0.00045926334160007343),
 (u'deadly', 0.0004588674249262803),
 (u'half', 0.0004575831098625123),
 (u'desperately', 0.00045707637330673985),
 (u'fellow', 0.00045707637330673985),
 (u'inevitable', 0.00045499875342807283),
 (u'initial', 0.00045276433204912905),
 (u'trouble', 0.00045188232361007235),
 (u'american', 0.000450472337961557),
 (u'be', 0.0004477817580600717),
 (u'alone', 0.00044681343173742094),
 (u'cheesy', 0.00044453225328459944),
 (u'desperate', 0.00044437980738155263),
 (u'two', 0.00044382259132213993),
 (u'occasional', 0.0004433372741091032),
 (u'odd', 0.0004406199782281893),
 (u'paul', 0.0004401957117925704),
 (u'cinematic', 0.00043958062696007036),
 (u'sole', 0.0004363001745200698),
 (u'ugly', 0.0004363001745200698),
 (u'numerous', 0.0004319371727748691),
 (u'sharp', 0.000429587864142838),
 (u'romantic', 0.0004293747749245131),
 (u'all', 0.0004259938711849501),
 (u'current', 0.0004246655031995346),
 (u'frequently', 0.0004234678164459501),
 (u'green', 0.0004226657940663176),
 (u'amusing', 0.00042159342706433713),
 (u'opposite', 0.00042071802543006733),
 (u'live', 0.0004198360169910106),
 (u'creative', 0.0004113687359760658),
 (u'unlikely', 0.00039802822938673035),
 (u'in', 0.000395902010212656),
 (u'and', 0.00039267015706806284),
 (u'central', 0.00038942494915840945),
 (u'white', 0.0003889657216240245),
 (u'love', 0.00037332901531098755),
 (u'previously', 0.0003593060260753516),
 (u'detective', 0.0003431098459818025)]
In [107]:
## suming scores from a list of seed words for which we know the polarity
def seed_score(pos_seed):
    score=defaultdict(int)
    for seed in pos_seed:
        c=dict(getcollocations(seed))
        for w in c:
            score[w]+=c[w]
    return score
In [108]:
# words that are closest to the seed set (still many negatives in there, so we need some more work)
sorted(seed_score(['good','great','perfect','cool']).items(),key=itemgetter(1),reverse=True)
Out[108]:
[(u'cool', 0.01836803051789725),
 (u'perfect', 0.014235784691719532),
 (u'generally', 0.004914620304679139),
 (u'great', 0.0044212228153536195),
 (u'shallow', 0.004387519939375499),
 (u'green', 0.004031276633069313),
 (u'quiet', 0.0038389323602368874),
 (u'sadly', 0.0037656812902670915),
 (u'cold', 0.00372639619058579),
 (u'eccentric', 0.003566246811727291),
 (u'anyway', 0.0035435598196591803),
 (u'mary', 0.003528786619802258),
 (u'like', 0.0034720852363373366),
 (u'willing', 0.0034463352861676677),
 (u'overall', 0.003416962030468515),
 (u'off', 0.0033758301235972932),
 (u'visually', 0.0033742110974529374),
 (u'therefore', 0.0033455317539115297),
 (u'close', 0.0033320331697695668),
 (u'sad', 0.0033247711223444326),
 (u'nicely', 0.00329676987067692),
 (u'entirely', 0.0032905492598513672),
 (u'intelligent', 0.0032582896033018925),
 (u'lovely', 0.0032553034585332107),
 (u'surely', 0.0032499006749901536),
 (u'totally', 0.003242437962550958),
 (u'minor', 0.0032310265819358213),
 (u'slowly', 0.0032165337363270928),
 (u'attractive', 0.003213592455513773),
 (u'terribly', 0.00320713317502573),
 (u'sean', 0.0031974453908803564),
 (u'good', 0.003195157496514654),
 (u'basically', 0.0031879411336983307),
 (u'definitely', 0.003181514976699522),
 (u'desperate', 0.003168203518859585),
 (u'lucky', 0.0031488310596797884),
 (u'whatever', 0.0031396648899829774),
 (u'actually', 0.003130303792115775),
 (u'utterly', 0.0031247667719795837),
 (u'wrong', 0.0030847532558246297),
 (u'believable', 0.0030755305239444446),
 (u'classic', 0.003066005635935617),
 (u'computer', 0.003062877208242161),
 (u'pretty', 0.0030497972634102315),
 (u'forward', 0.003016719069848052),
 (u'horrible', 0.003004876906152966),
 (u'out', 0.0029975287288694433),
 (u'looking', 0.002994825908284486),
 (u'incredible', 0.002980813104868238),
 (u'huge', 0.002967788559403041),
 (u'subtle', 0.0029647516154097213),
 (u'sympathetic', 0.0029574799728487563),
 (u'sure', 0.002942750201925073),
 (u'perfectly', 0.00291984566130735),
 (u'mental', 0.0029124135206031318),
 (u'necessary', 0.002908556234257104),
 (u'anti', 0.0028944363752663417),
 (u'hearted', 0.002893223092613725),
 (u'climactic', 0.002883438227928318),
 (u'very', 0.0028788226459838047),
 (u'past', 0.0028782458455121037),
 (u'acting', 0.002870504043733427),
 (u'tight', 0.002868221653510638),
 (u'memorable', 0.002866545817151434),
 (u'soft', 0.0028548578086146567),
 (u'weird', 0.0028444428229098093),
 (u'outstanding', 0.0028443425210123883),
 (u'ensemble', 0.0028432682292655605),
 (u'wonderfully', 0.00284156676875144),
 (u'interesting', 0.0028386296854845793),
 (u'mainly', 0.002837331341168971),
 (u'impossible', 0.002836475423211005),
 (u'similar', 0.002825617162065585),
 (u'different', 0.002822047002694978),
 (u'really', 0.002820943592758371),
 (u'high', 0.002816492704189309),
 (u'wonderful', 0.0028127558963416985),
 (u'funny', 0.0028126166780049876),
 (u'same', 0.0028115870709538504),
 (u'constant', 0.0027988950584166682),
 (u'especially', 0.0027943160712365573),
 (u'genuinely', 0.0027935832384329714),
 (u'blue', 0.0027852847247673237),
 (u'evil', 0.0027836162729910803),
 (u'incredibly', 0.002781830187028037),
 (u'apparent', 0.0027805310210175945),
 (u'effectively', 0.0027769925322682076),
 (u'nice', 0.002768021053852345),
 (u'fly', 0.0027663925539353675),
 (u'slightly', 0.002754343268868139),
 (u'barely', 0.0027521125520677554),
 (u'entire', 0.0027417046601842126),
 (u'necessarily', 0.0027360158931798834),
 (u'also', 0.0027332608938965353),
 (u'stupid', 0.002726806900623024),
 (u'love', 0.002718968110495262),
 (u'brilliant', 0.0027078034963873266),
 (u'present', 0.0027074937178435976),
 (u'third', 0.002701147706793944),
 (u'inevitable', 0.002698495114571493),
 (u'paul', 0.0026975221112758327),
 (u'solid', 0.002696724010656708),
 (u'second', 0.002694117545555503),
 (u'moral', 0.002693374009622139),
 (u'successful', 0.002687012182681193),
 (u'comic', 0.002682085766238037),
 (u'final', 0.0026721227616635025),
 (u'black', 0.0026710705445859924),
 (u'yet', 0.0026689962966847036),
 (u'original', 0.0026679941151914128),
 (u'completely', 0.0026594370192654596),
 (u'regular', 0.002651073081898117),
 (u'somewhere', 0.0026399150876791496),
 (u'traditional', 0.0026370942869986857),
 (u'probably', 0.0026316485937714876),
 (u'no', 0.002629862097209245),
 (u'famous', 0.0026273424381649635),
 (u'realistic', 0.00262547541830983),
 (u'never', 0.002625244395827706),
 (u'social', 0.002624405668474319),
 (u'give', 0.0026229033988971594),
 (u'fantastic', 0.0026222127905733282),
 (u'individual', 0.0026185379920023863),
 (u'right', 0.0026181746246368414),
 (u'only', 0.002617469727038716),
 (u'dull', 0.00261633648375256),
 (u'next', 0.0026161268333529906),
 (u'major', 0.002608481485985101),
 (u'always', 0.002607315437125177),
 (u'still', 0.002598904288393414),
 (u'key', 0.0025986126167018165),
 (u'nearly', 0.00259743762192478),
 (u'first', 0.0025949812360450634),
 (u'just', 0.002594077290946316),
 (u'particular', 0.0025934035019344534),
 (u'absolutely', 0.002592714515796534),
 (u'bad', 0.0025889114735255002),
 (u'fat', 0.0025853394975499586),
 (u'physical', 0.002584729666138146),
 (u'surprising', 0.0025785158544206115),
 (u'real', 0.0025666908886398223),
 (u'fully', 0.002563717589503081),
 (u'young', 0.00256224077034713),
 (u'elaborate', 0.002543562602388069),
 (u'late', 0.0025428097935556535),
 (u'cute', 0.002542474876328181),
 (u'silly', 0.0025367706469425193),
 (u'two', 0.002531225625310705),
 (u'again', 0.002529209527150347),
 (u'poor', 0.002527607544401137),
 (u'initial', 0.0025266776157293384),
 (u'short', 0.0025229835673570736),
 (u'alive', 0.002515791916264147),
 (u're', 0.00251410007668532),
 (u'movie', 0.0025137021481225247),
 (u'popular', 0.002510852243939196),
 (u'favorite', 0.0025054240669844224),
 (u'surprisingly', 0.0024978235225739834),
 (u'hot', 0.0024926082900537955),
 (u'unfortunately', 0.002491947150824106),
 (u'hilarious', 0.0024888159472125394),
 (u'terrific', 0.002486994559448905),
 (u'not', 0.0024838196518674758),
 (u'extremely', 0.0024836795804661924),
 (u'ever', 0.0024799846740085688),
 (u'slow', 0.0024765659763576393),
 (u'so', 0.002470072047171524),
 (u'fast', 0.0024699275239350137),
 (u'international', 0.002466387666107279),
 (u'little', 0.0024596442383871923),
 (u'merely', 0.0024589474397309834),
 (u'further', 0.0024579186591984735),
 (u'wild', 0.0024570737029157236),
 (u'powerful', 0.0024570439249516234),
 (u'unique', 0.0024550283562681952),
 (u'long', 0.0024542093072923944),
 (u'sexual', 0.002452222374912764),
 (u'usual', 0.002451004703699789),
 (u'too', 0.002450506270197633),
 (u'together', 0.0024498262842226424),
 (u'important', 0.0024469273639049914),
 (u'painful', 0.0024428002565871736),
 (u'general', 0.0024406913582367905),
 (u'somewhat', 0.0024383784285768053),
 (u'professional', 0.002437781447361272),
 (u'exactly', 0.0024375182927824286),
 (u'able', 0.002431802338319177),
 (u'disappointing', 0.0024294287360479513),
 (u'once', 0.0024280866937187373),
 (u'much', 0.0024268078320277466),
 (u'thankfully', 0.0024263668609984675),
 (u'well', 0.0024262964525530823),
 (u'trouble', 0.002424968855520134),
 (u'new', 0.0024245287179007918),
 (u'else', 0.0024238107605589858),
 (u'away', 0.002420198956604946),
 (u'however', 0.002419841628909794),
 (u'eventually', 0.0024192507548939125),
 (u'strong', 0.0024176279579377715),
 (u'female', 0.0024166696982525944),
 (u'future', 0.002411977424524619),
 (u'here', 0.0024116498209578057),
 (u'fair', 0.002410216236868189),
 (u'actual', 0.0024067205090930856),
 (u'effective', 0.0024056607781946858),
 (u'male', 0.0024051920208348716),
 (u'all', 0.0023966930782590217),
 (u'ultimate', 0.0023952146015562177),
 (u'hardly', 0.002393683141220838),
 (u'common', 0.0023916366049515445),
 (u'former', 0.002391407046403267),
 (u'even', 0.0023895010595202347),
 (u'literally', 0.0023869605379775384),
 (u'last', 0.0023856404344273687),
 (u'responsible', 0.002384712475009501),
 (u'comedic', 0.0023833468102536985),
 (u'as', 0.002382005332303402),
 (u'shot', 0.0023801158805992965),
 (u'hard', 0.0023792820325943408),
 (u'possible', 0.002378353733308164),
 (u'typical', 0.0023766948299850463),
 (u'certainly', 0.0023757811271592727),
 (u'clear', 0.00237362502544932),
 (u'then', 0.0023720256544745794),
 (u'special', 0.002371843688843257),
 (u'essentially', 0.0023630257444100496),
 (u'soon', 0.002357546474735235),
 (u'scary', 0.0023563717854545954),
 (u'quick', 0.0023473630392837866),
 (u'spectacular', 0.0023470585574532337),
 (u'other', 0.0023456108036209035),
 (u'later', 0.00234524260869092),
 (u'practically', 0.0023435453372715248),
 (u'straight', 0.0023371369082872784),
 (u'normal', 0.0023315921809888523),
 (u'seriously', 0.002328428800073818),
 (u'remarkable', 0.002325943042587466),
 (u'supposedly', 0.0023247640235971347),
 (u'big', 0.002322568876824087),
 (u'maybe', 0.0023225510972794436),
 (u'certain', 0.0023224311981699755),
 (u'billy', 0.002321059790449179),
 (u'easy', 0.0023189144504748046),
 (u'likable', 0.0023167935895555937),
 (u'whole', 0.002315434359066139),
 (u'easily', 0.0023119519296581773),
 (u'particularly', 0.0023108098883486295),
 (u'dead', 0.002303688222871656),
 (u'badly', 0.002303117290501056),
 (u'awful', 0.0022977472494975833),
 (u'many', 0.0022958937460551363),
 (u'visual', 0.0022940334855966285),
 (u'truly', 0.0022931001606245156),
 (u'co', 0.002289362745168104),
 (u'occasionally', 0.0022872519083199224),
 (u'ridiculous', 0.002283616768782563),
 (u'simple', 0.002281308555367873),
 (u'numerous', 0.0022812106887150517),
 (u'english', 0.002281201759352531),
 (u'obviously', 0.00227965655105327),
 (u'light', 0.0022778656862701215),
 (u'technical', 0.002276158023703666),
 (u'total', 0.0022756510142716104),
 (u'back', 0.0022745184635103223),
 (u'sharp', 0.002271797549489778),
 (u'there', 0.0022643260623824704),
 (u'relatively', 0.002262957154496509),
 (u'few', 0.0022623065819445416),
 (u'modern', 0.002261513095689698),
 (u'complete', 0.0022609091817375844),
 (u'disturbing', 0.002259201977809433),
 (u'rich', 0.002251969643273991),
 (u'far', 0.002250680245334822),
 (u'ahead', 0.002248176753787567),
 (u'fresh', 0.0022469860426400155),
 (u'now', 0.0022465177203474275),
 (u'such', 0.0022452292343507884),
 (u'worthy', 0.00224307369529924),
 (u'thin', 0.0022405183075006065),
 (u'laugh', 0.002236045366253346),
 (u'constantly', 0.002235679637427152),
 (u'single', 0.00223521992903317),
 (u'basic', 0.002232160715715428),
 (u'clearly', 0.002228490855521868),
 (u'usually', 0.0022177495835547746),
 (u'often', 0.0022175746655211962),
 (u'private', 0.002216483776545628),
 (u'various', 0.002212771836886069),
 (u'criminal', 0.0022077352184160784),
 (u'old', 0.0022071295402323414),
 (u'rare', 0.002206383285594105),
 (u'and', 0.0022039288310718346),
 (u'serious', 0.002202448873556133),
 (u'epic', 0.002200503937262294),
 (u'early', 0.0021999110030616106),
 (u'musical', 0.002197821272071629),
 (u'potential', 0.002196168886167113),
 (u'beautiful', 0.00219266018688451),
 (u'superior', 0.0021878102643613584),
 (u'intense', 0.0021849912646545433),
 (u'rarely', 0.002182915232381559),
 (u'almost', 0.0021827639379282137),
 (u'human', 0.002181964429166975),
 (u'top', 0.0021818912489780643),
 (u'deadly', 0.002179747441952342),
 (u'ago', 0.0021771875760520647),
 (u'quite', 0.00217716996792519),
 (u'rather', 0.0021709921383870814),
 (u'frequently', 0.0021679379865216565),
 (u'instead', 0.002167905989219789),
 (u'enough', 0.0021667069012216832),
 (u'seemingly', 0.0021624517939975384),
 (u'natural', 0.002161209011346006),
 (u'extra', 0.0021597403058495348),
 (u'difficult', 0.0021552941515333313),
 (u'true', 0.0021499827474573394),
 (u'up', 0.002149383094462342),
 (u'emotionally', 0.0021459115484230838),
 (u'song', 0.0021406378342695626),
 (u'american', 0.0021403246942986046),
 (u'several', 0.002139164769962724),
 (u'ready', 0.002137019223248206),
 (u'weak', 0.002131642351726366),
 (u'wealthy', 0.0021269001542290247),
 (u'star', 0.002115006546130686),
 (u'national', 0.0021147161369235675),
 (u'emotional', 0.002114499497453295),
 (u'full', 0.0021131329326073193),
 (u'creative', 0.0021078831775885464),
 (u'of', 0.0021047860848702653),
 (u'suddenly', 0.002102724335000132),
 (u'cinematic', 0.002099879500881235),
 (u'sometimes', 0.002098097167524616),
 (u'dark', 0.0020945422999583195),
 (u'mean', 0.0020915701447731016),
 (u'directly', 0.002091255612454668),
 (u'dramatic', 0.002082295634128307),
 (u'due', 0.0020775561173657585),
 (u'main', 0.002077327688112775),
 (u'highly', 0.0020734043999109187),
 (u'perhaps', 0.0020718721850718756),
 (u'recent', 0.002071123097896627),
 (u'accidentally', 0.0020672308456623984),
 (u'dangerous', 0.0020655845472295226),
 (u'available', 0.002060730486497321),
 (u'obvious', 0.0020567832665825863),
 (u'white', 0.002056694729345633),
 (u'initially', 0.0020457429869681775),
 (u'ultimately', 0.0020445486115517135),
 (u'simply', 0.002040809336654068),
 (u'virtually', 0.0020385289886886617),
 (u'opposite', 0.002034424295999083),
 (u'immediately', 0.002033911005728943),
 (u'the', 0.0020321660786504357),
 (u'brief', 0.002031927792962111),
 (u'quickly', 0.0020285452399449004),
 (u'unable', 0.0020276856993266543),
 (u'naturally', 0.002027352047162),
 (u'originally', 0.0020264246642947964),
 (u'capable', 0.0020229531124203127),
 (u'predictable', 0.002020379107232477),
 (u'flat', 0.0020194064080728694),
 (u'thoroughly', 0.0020183942576732744),
 (u'non', 0.0020174620063781877),
 (u'personal', 0.0020086691296012722),
 (u'happy', 0.002007374294492547),
 (u'small', 0.0020040568969961446),
 (u'political', 0.001999982549818061),
 (u'music', 0.0019927175648187696),
 (u'aware', 0.0019904472963272134),
 (u'entertaining', 0.001990095838036924),
 (u'offensive', 0.0019832429280615565),
 (u'gary', 0.0019823038097920038),
 (u'clever', 0.0019820969802297793),
 (u'already', 0.001981432011068098),
 (u'impressive', 0.0019801420317751032),
 (u'finally', 0.0019792514454253494),
 (u'dimensional', 0.0019747302138405634),
 (u'amusing', 0.0019594628873836764),
 (u'critical', 0.001955164082273457),
 (u'possibly', 0.0019511120404093365),
 (u'painfully', 0.0019487341117563886),
 (u'own', 0.0019485898408926775),
 (u'equally', 0.001941586594786285),
 (u'tough', 0.001935686228433603),
 (u'comedy', 0.001928955676509114),
 (u'cast', 0.0019216965131673314),
 (u'foreign', 0.0019214942499954125),
 (u'average', 0.001919281774346123),
 (u'thus', 0.0019142452806710012),
 (u'on', 0.0019121680081721627),
 (u'british', 0.00190909801041013),
 (u'suspenseful', 0.0019074161664669616),
 (u'double', 0.001903480651695787),
 (u'familiar', 0.001901770791001876),
 (u'time', 0.0018998176942151696),
 (u'standard', 0.0018994180833314393),
 (u'nowhere', 0.0018987297134157972),
 (u'magic', 0.0018913580242581073),
 (u'likely', 0.001888523417620946),
 (u'ill', 0.0018874309629878874),
 (u'over', 0.0018741371896421003),
 (u'pathetic', 0.0018694014862299202),
 (u'previous', 0.0018656264501616096),
 (u'indeed', 0.0018505566549915182),
 (u'frankly', 0.0018490553767872694),
 (u'unnecessary', 0.0018442958286519423),
 (u'meanwhile', 0.0018373819237956107),
 (u'sweet', 0.0018317965315737415),
 (u'bright', 0.0018209271692124643),
 (u'unusual', 0.0018096760600330329),
 (u'graphic', 0.0018072146818752881),
 (u'apparently', 0.0018057793360379136),
 (u'earth', 0.0018034447908819122),
 (u'year', 0.0018024798404822854),
 (u'interested', 0.0018019685951838995),
 (u'unbelievable', 0.00179761403594268),
 (u'heavily', 0.001782206560976708),
 (u'odd', 0.0017814712511314912),
 (u'romantic', 0.0017799989468994474),
 (u'open', 0.0017782724435018858),
 (u'complex', 0.0017774284447795035),
 (u'friendly', 0.0017772249575965825),
 (u'local', 0.0017699980405700344),
 (u'ex', 0.001765399600161043),
 (u'self', 0.0017556819143849144),
 (u'to', 0.0017544354081247865),
 (u'enjoyable', 0.0017532791488678407),
 (u'appropriate', 0.0017216651035689122),
 (u'life', 0.0017144139154199722),
 (u'in', 0.0017140816145215269),
 (u'be', 0.0017024490343991747),
 (u'steven', 0.0017010809857261344),
 (u'one', 0.0016979741150993989),
 (u'overly', 0.0016902408532289107),
 (u'screen', 0.0016889626085993827),
 (u'david', 0.0016848396539122083),
 (u'unfunny', 0.0016754948810989746),
 (u'about', 0.0016658770266088703),
 (u'fairly', 0.0016657738475276476),
 (u'detective', 0.0016518359786180817),
 (u'humorous', 0.001649419643652082),
 (u'large', 0.0016448568761717962),
 (u'animal', 0.001639990007664387),
 (u'terrible', 0.0016251396126525615),
 (u'mostly', 0.0016236936700630583),
 (u'anywhere', 0.0016195641003155631),
 (u'narrative', 0.001614732454952487),
 (u'occasional', 0.0016102652278808267),
 (u'central', 0.0016081788011969805),
 (u'giant', 0.0016035176486315535),
 (u'free', 0.0015934002865996032),
 (u'desperately', 0.0015771875152043742),
 (u'half', 0.0015768131466758427),
 (u'mysterious', 0.0015751527176854929),
 (u'largely', 0.0015751355449538386),
 (u'down', 0.0015691734113758155),
 (u'worth', 0.0015565706237899838),
 (u'previously', 0.001552396463687127),
 (u'low', 0.0015113894893284531),
 (u'heavy', 0.0014976075171991509),
 (u'minute', 0.0014966726528048834),
 (u'psychological', 0.0014689363873557058),
 (u'fellow', 0.0014653527083804605),
 (u'positive', 0.0014402283127099625),
 (u'sole', 0.0014359629388887166),
 (u'cheesy', 0.001433531207507379),
 (u'critic', 0.0014261314813257182),
 (u'jean', 0.0014204952971273396),
 (u'successfully', 0.0014128102927179332),
 (u'military', 0.0013847281416612812),
 (u'wide', 0.0013709124940004726),
 (u'ugly', 0.0013535263057754747),
 (u'live', 0.0013501882237028178),
 (u'extraordinary', 0.0013479763826237494),
 (u'recently', 0.0012873275710996563),
 (u'poorly', 0.0012552509687996207),
 (u'alone', 0.001210747694372369),
 (u'current', 0.0012097881183875095),
 (u'public', 0.00115049929567981),
 (u'oddly', 0.0011000750444722194),
 (u'unlikely', 0.0010737672952186752),
 (u'laughable', 0.0009970586913405925)]
In [109]:
posscores=seed_score(['good','great','perfect','cool'])
negscores=seed_score(['bad','terrible','wrong',"crap"])

## sentiment polarity score will be the difference between the words that are close to the positive seed
## and the words that are close to the negative seed
sentscores={}
for w in terms:
    sentscores[w]=posscores[w]-negscores[w]

    
In [104]:
sorted(sentscores.items(),key=itemgetter(1),reverse=False)
Out[104]:
[(u'terrible', -0.009855717788299525),
 (u'wrong', -0.002892807170410995),
 (u'laughable', -0.0022372681494660608),
 (u'frankly', -0.0013849740220763036),
 (u'bad', -0.0013658125844714167),
 (u'poorly', -0.0013222754461456841),
 (u'anywhere', -0.0012546869557127096),
 (u'ugly', -0.001176343204772265),
 (u'current', -0.001117276129549125),
 (u'successfully', -0.001037488811300721),
 (u'unfunny', -0.0010133916696818575),
 (u'foreign', -0.000904334983761761),
 (u'sole', -0.0008546952219981477),
 (u'terribly', -0.0007390733994578876),
 (u'oddly', -0.0007073190373175831),
 (u'total', -0.0006894899408381215),
 (u'military', -0.0006728887380600891),
 (u'positive', -0.000609054423130711),
 (u'pathetic', -0.0005971148063656133),
 (u'awful', -0.000573324177411706),
 (u'earth', -0.0005105042292985202),
 (u'unnecessary', -0.0005082660511734676),
 (u'about', -0.0004718460547538108),
 (u'graphic', -0.00043745568496709455),
 (u'recently', -0.0004229737674965365),
 (u'critic', -0.00042093904961075274),
 (u'public', -0.0004157576360775922),
 (u'horrible', -0.00039944889995236713),
 (u'low', -0.0003892064134552527),
 (u'giant', -0.00037211464872255115),
 (u'worth', -0.0003552629512555594),
 (u'painful', -0.00035280475832298163),
 (u'offensive', -0.00033902732360354196),
 (u'desperately', -0.0003089528104637091),
 (u'entertaining', -0.0003081982911766674),
 (u'cheesy', -0.0003000668539786858),
 (u'one', -0.0002932272950073745),
 (u'ill', -0.0002868316466521808),
 (u'the', -0.0002369155377649466),
 (u'superior', -0.0002351620574958151),
 (u'unbelievable', -0.00022197959492024325),
 (u'half', -0.00020857413062678088),
 (u'jean', -0.00020618265558128746),
 (u'overly', -0.000192769761486823),
 (u'fellow', -0.0001800691312555103),
 (u'live', -0.00017944376467854146),
 (u'to', -0.00015489195817700615),
 (u'fairly', -0.00015233442125027106),
 (u'ex', -0.000133240536831384),
 (u'gary', -0.00011401582564645818),
 (u'already', -0.00011372581575243724),
 (u'fair', -9.02324502693951e-05),
 (u'stupid', -7.923432614896815e-05),
 (u'alone', -7.62403716351636e-05),
 (u'heavy', -7.111023622717464e-05),
 (u'seriously', -6.864833713224761e-05),
 (u'friendly', -6.761240116471266e-05),
 (u'previously', -6.525170730300057e-05),
 (u'minute', -6.519403682034059e-05),
 (u'painfully', -6.447850775579854e-05),
 (u'complete', -6.009777417811035e-05),
 (u'interested', -4.968938910418128e-05),
 (u'free', -2.996631042662557e-05),
 (u'apparently', -2.6996430693535575e-05),
 (u'dull', -2.052676565124454e-05),
 (u'equally', -9.762344666475664e-06),
 (u'standard', -3.637292259084249e-06),
 (u'ridiculous', 1.045081253261233e-05),
 (u'nowhere', 1.3030279211109991e-05),
 (u'tough', 2.8891238996220023e-05),
 (u'psychological', 2.92245726104584e-05),
 (u'predictable', 2.9484108354909007e-05),
 (u'occasional', 3.871281960579815e-05),
 (u'mostly', 4.2000269254770326e-05),
 (u'central', 5.328131393685185e-05),
 (u'silly', 5.40261995142597e-05),
 (u'international', 5.6200091819184374e-05),
 (u'over', 6.536716625644957e-05),
 (u'happy', 6.91126047207718e-05),
 (u'double', 6.995106926406145e-05),
 (u'brief', 8.330731593712165e-05),
 (u'unlikely', 8.62313866111619e-05),
 (u'down', 9.272424872767753e-05),
 (u'obvious', 9.409962691220991e-05),
 (u'thankfully', 0.00010456881380261187),
 (u'amusing', 0.00011606002779824664),
 (u'possibly', 0.0001170851976322212),
 (u'frequently', 0.00013607217060806467),
 (u'screen', 0.00014518477176753792),
 (u'elaborate', 0.00014565575042842227),
 (u'indeed', 0.00015197075399033238),
 (u'song', 0.00015267859031716782),
 (u'appropriate', 0.0001751277874693666),
 (u'of', 0.0001804054214429602),
 (u'supposedly', 0.0001813588769441093),
 (u'female', 0.000187585537309287),
 (u'extraordinary', 0.00018831007560704592),
 (u'sweet', 0.00019714980251914747),
 (u'odd', 0.00020447281519450286),
 (u'wide', 0.00021107841013971252),
 (u'largely', 0.0002230787156980189),
 (u'unique', 0.0002248681331836178),
 (u'weak', 0.00022527489860168668),
 (u'bright', 0.00022655330862714424),
 (u'badly', 0.00022762041065908269),
 (u'responsible', 0.00023230990156556134),
 (u'animal', 0.00023587487871816514),
 (u'complex', 0.00023957432196753032),
 (u'maybe', 0.0002527025964321641),
 (u'ahead', 0.00026528322946812576),
 (u'due', 0.00026764984856503596),
 (u'large', 0.0002742074206140816),
 (u'mean', 0.00027969318616533913),
 (u'such', 0.0002824625084295192),
 (u'spectacular', 0.000284731383087323),
 (u'single', 0.00029996981466053584),
 (u'david', 0.0003010343209289986),
 (u'early', 0.00030347445343878064),
 (u'there', 0.0003038652637019942),
 (u'unable', 0.0003089529589726911),
 (u'huge', 0.00031045486780024393),
 (u'truly', 0.0003143382466061274),
 (u'potential', 0.00034506606430874886),
 (u'aware', 0.00035049698049125226),
 (u'cinematic', 0.0003555319660876414),
 (u'hardly', 0.0003568441532017749),
 (u'opposite', 0.0003592578641072545),
 (u'romantic', 0.00035927730492690433),
 (u'capable', 0.00037030876580039463),
 (u'instead', 0.0003732553191903758),
 (u'meanwhile', 0.0003773411383428758),
 (u'available', 0.0003778464154304529),
 (u'up', 0.00038072585383780227),
 (u'possible', 0.00038259323279369803),
 (u'top', 0.000395909835942861),
 (u'finally', 0.00040172767143438004),
 (u'poor', 0.0004049527571028011),
 (u'national', 0.0004081816531386359),
 (u'even', 0.00040860453897041725),
 (u'absolutely', 0.0004115845765526563),
 (u'local', 0.00041357643629792045),
 (u'merely', 0.0004142841255082557),
 (u'easily', 0.0004149536193439774),
 (u'now', 0.00041557406708773526),
 (u'climactic', 0.0004205084905875853),
 (u'back', 0.000423291524248446),
 (u'whole', 0.00042383737917805336),
 (u'flat', 0.00043316669872226756),
 (u'right', 0.0004343735579328614),
 (u'simple', 0.0004346085198177101),
 (u'future', 0.00043509480822239224),
 (u'completely', 0.00043809695875969166),
 (u'open', 0.0004403087704105159),
 (u'straight', 0.000447631071003372),
 (u'basic', 0.00044899193653893314),
 (u'enjoyable', 0.0004501205216749073),
 (u'extremely', 0.00045181378697531447),
 (u'be', 0.00045577667832395166),
 (u'enough', 0.0004580487416557797),
 (u'on', 0.00046164403065117494),
 (u'usual', 0.00046192139465692835),
 (u'year', 0.00046657720097720116),
 (u'physical', 0.00046966667466785596),
 (u'so', 0.0004702228437914336),
 (u'laugh', 0.0004705173665808082),
 (u'mysterious', 0.0004715847558962325),
 (u'steven', 0.00047178890762968246),
 (u'practically', 0.0004770875391759445),
 (u'then', 0.00047833134786841704),
 (u'deadly', 0.00047872745964739846),
 (u'simply', 0.00047928947054113203),
 (u'previous', 0.0004864957003770933),
 (u'perhaps', 0.0004868518033162277),
 (u'obviously', 0.0004890121054869465),
 (u'detective', 0.0004894785073747004),
 (u'incredibly', 0.0004925436804560221),
 (u'dangerous', 0.0004971554463562978),
 (u'immediately', 0.000501778461674787),
 (u'originally', 0.0005042280610767363),
 (u'modern', 0.00050532562264158),
 (u'easy', 0.0005160148279703838),
 (u'disappointing', 0.0005186392844217117),
 (u'almost', 0.0005210870768316755),
 (u'else', 0.0005235152247228339),
 (u'later', 0.0005242486416186713),
 (u'impressive', 0.0005288163131077718),
 (u'thoroughly', 0.000535035438876993),
 (u'likable', 0.0005376939884275277),
 (u'accidentally', 0.0005387884356078381),
 (u'ultimate', 0.0005392830539742888),
 (u'anti', 0.0005405223310637991),
 (u'give', 0.0005434451039073836),
 (u'likely', 0.0005437373193750207),
 (u'self', 0.0005457375222126925),
 (u'small', 0.000547757267443128),
 (u'here', 0.0005489917746541125),
 (u'quick', 0.0005495308938904965),
 (u'in', 0.0005517231263248987),
 (u'big', 0.000553192795083189),
 (u'white', 0.0005571827961379774),
 (u'magic', 0.0005600402871066973),
 (u'music', 0.0005637757253686389),
 (u'seemingly', 0.0005673230234262952),
 (u'ultimately', 0.0005810759372053523),
 (u'serious', 0.0005854546565258667),
 (u'far', 0.0005878709423325309),
 (u'ever', 0.0005935266537370389),
 (u'paul', 0.0006015292658011918),
 (u'often', 0.000601672706603956),
 (u'dimensional', 0.0006021707502093074),
 (u'personal', 0.0006024528384928436),
 (u'main', 0.0006061750765146369),
 (u'rather', 0.0006090053018368551),
 (u'shot', 0.0006166843447406927),
 (u'time', 0.0006197846233195092),
 (u'thus', 0.0006224508013473646),
 (u'suddenly', 0.0006225691668584994),
 (u'old', 0.0006231237833857315),
 (u'full', 0.000624230460502307),
 (u'few', 0.000625522981820726),
 (u'critical', 0.0006271427574530369),
 (u'cast', 0.0006278059321773298),
 (u'emotional', 0.0006324621028321394),
 (u'dramatic', 0.0006337923808277456),
 (u'own', 0.0006353768623771551),
 (u'funny', 0.0006364526576666772),
 (u'technical', 0.0006420939225105851),
 (u'exactly', 0.0006428260279371192),
 (u'other', 0.0006498393372772956),
 (u'quite', 0.0006520081935745188),
 (u'clear', 0.0006539342315067483),
 (u'somewhere', 0.0006600920831501596),
 (u'just', 0.0006614607540827121),
 (u'common', 0.0006616229182801092),
 (u'hard', 0.0006619152572155417),
 (u'initially', 0.0006631595206745941),
 (u'narrative', 0.00066571276719812),
 (u'many', 0.000666141040993326),
 (u'not', 0.000672503120031562),
 (u'private', 0.0006733306849319425),
 (u'too', 0.0006818211717397056),
 (u'hilarious', 0.000683333186303766),
 (u'much', 0.0006841450446583987),
 (u'non', 0.0006871977899058012),
 (u'short', 0.000688800750842894),
 (u'usually', 0.0006989595906073285),
 (u'and', 0.0007005594235027154),
 (u'evil', 0.000701845065714783),
 (u'directly', 0.0007058816437173249),
 (u'necessarily', 0.0007099367697196331),
 (u'special', 0.0007110126412170818),
 (u'rich', 0.0007113668767104394),
 (u'clever', 0.0007128021013105938),
 (u'musical', 0.0007131823283974886),
 (u'typical', 0.0007183468693659971),
 (u'comedy', 0.000720336688148201),
 (u'certain', 0.0007265186777930407),
 (u'rarely', 0.0007327426410832743),
 (u'quickly', 0.0007337175378568676),
 (u'certainly', 0.0007351596198168088),
 (u'essentially', 0.0007375733628729414),
 (u'constantly', 0.0007376214223499883),
 (u'political', 0.000740313723538055),
 (u'dead', 0.00074437860122146),
 (u'as', 0.0007449121089841216),
 (u'alive', 0.0007452446800648737),
 (u'various', 0.0007474671137830931),
 (u'particular', 0.000751232883341537),
 (u'humorous', 0.0007585234451972629),
 (u'difficult', 0.0007601149087694834),
 (u'weird', 0.0007629208107821734),
 (u'once', 0.0007711851652305634),
 (u'average', 0.0007762013016956169),
 (u'never', 0.0007782027708719897),
 (u'major', 0.0007806552606664213),
 (u'utterly', 0.0007845844987133164),
 (u'american', 0.0007849357525979634),
 (u'only', 0.0007875764912710663),
 (u'away', 0.000789959495941285),
 (u'fresh', 0.0007924534921635159),
 (u'last', 0.0007952742073522234),
 (u'star', 0.0007961050312125886),
 (u'literally', 0.0007963635287101941),
 (u'really', 0.0007989344758660985),
 (u'ago', 0.000800378971986425),
 (u'favorite', 0.000800875986490055),
 (u'unfortunately', 0.0008097636860386844),
 (u'eventually', 0.000810251304977691),
 (u'worthy', 0.0008118883366741764),
 (u'beautiful', 0.0008158724141659971),
 (u'entire', 0.0008176639014100804),
 (u'out', 0.0008176780862230349),
 (u'recent', 0.0008191931673215587),
 (u'heavily', 0.0008204630367004118),
 (u'long', 0.0008214448818757603),
 (u'life', 0.0008238209176573776),
 (u'normal', 0.0008299171776015202),
 (u'sometimes', 0.000833889437982206),
 (u'true', 0.0008341628074066598),
 (u'comedic', 0.0008402304397220433),
 (u'wild', 0.0008460609266492368),
 (u'next', 0.0008506094512549538),
 (u'inevitable', 0.0008532965586213147),
 (u'male', 0.0008542524439990413),
 (u'movie', 0.0008568299207744503),
 (u'acting', 0.0008587059212509835),
 (u'together', 0.0008606082608744094),
 (u'soon', 0.0008612898410095002),
 (u'virtually', 0.0008620769385564711),
 (u'several', 0.0008635915908004243),
 (u'new', 0.0008717183653308082),
 (u'unusual', 0.000874049125520341),
 (u'totally', 0.0008753879812317773),
 (u'first', 0.0008767644151872808),
 (u'occasionally', 0.0008812392433480141),
 (u'ready', 0.0008867537055616045),
 (u'familiar', 0.0008871839350435358),
 (u'naturally', 0.0008931806678361168),
 (u'english', 0.0008953479422171709),
 (u'able', 0.0009025465029791912),
 (u'blue', 0.0009034182359929403),
 (u'third', 0.0009059647687182104),
 (u'clearly', 0.0009255258001603715),
 (u'little', 0.0009273497091342535),
 (u'soft', 0.0009315285079749412),
 (u'relatively', 0.000932682230440861),
 (u'numerous', 0.0009345468307633423),
 (u'creative', 0.0009391832311759108),
 (u'strong', 0.0009461852637215791),
 (u'real', 0.0009476095907813163),
 (u'attractive', 0.0009515625205987143),
 (u'well', 0.0009556768775209261),
 (u'sure', 0.000957112811378156),
 (u'again', 0.0009573117992908559),
 (u'nearly', 0.000963154733930963),
 (u'fat', 0.0009731769297739406),
 (u'regular', 0.0009743180446978744),
 (u'brilliant', 0.0009763815352689291),
 (u'dark', 0.0009772170725694694),
 (u'however', 0.0009825371290734996),
 (u'british', 0.0009831906491023189),
 (u'mental', 0.0009840764115491539),
 (u'successful', 0.0009841226854034244),
 (u'further', 0.000991798712094388),
 (u'impossible', 0.0009942979187305947),
 (u'no', 0.0009994894636115113),
 (u'still', 0.0010004181941142172),
 (u'extra', 0.001000860658100024),
 (u'disturbing', 0.001012984943329316),
 (u'particularly', 0.0010143822405221036),
 (u'light', 0.0010158854563900267),
 (u'scary', 0.0010190982983072799),
 (u'trouble', 0.001020603686622701),
 (u'probably', 0.0010229791746550003),
 (u'visual', 0.0010354518033044457),
 (u'entirely', 0.0010390508978398907),
 (u'subtle', 0.0010431120412786679),
 (u'general', 0.001044149818511135),
 (u'interesting', 0.001047814632179693),
 (u'present', 0.0010513260178107865),
 (u'human', 0.0010549180635384455),
 (u'thin', 0.001060201360202429),
 (u'important', 0.0010609448834153767),
 (u'billy', 0.0010614612617553588),
 (u'powerful', 0.001065105346526924),
 (u'remarkable', 0.0010801064408230525),
 (u'fast', 0.001080821099915891),
 (u'suspenseful', 0.0010865325836996308),
 (u'criminal', 0.0010893235304361155),
 (u'realistic', 0.0010908174254513768),
 (u'late', 0.001091566540796039),
 (u'necessary', 0.001093511991173771),
 (u're', 0.0010975283450386578),
 (u'slow', 0.0010992769336009133),
 (u'professional', 0.0011030029044699398),
 (u'ensemble', 0.0011086940528515163),
 (u'moral', 0.0011097710285118065),
 (u'famous', 0.0011131365147124647),
 (u'natural', 0.001122529450799036),
 (u'former', 0.0011246773108892236),
 (u'original', 0.001128075984996672),
 (u'co', 0.0011286547003453904),
 (u'pretty', 0.0011358695479615653),
 (u'different', 0.0011388889967668557),
 (u'good', 0.0011441582764286306),
 (u'individual', 0.0011444814831923626),
 (u'past', 0.0011477950766972849),
 (u'surprising', 0.0011522704428492743),
 (u'epic', 0.001153369353919784),
 (u'fly', 0.0011553456814920866),
 (u'two', 0.0011625152220849123),
 (u'cute', 0.00116783314798835),
 (u'highly', 0.0011731709941409391),
 (u'key', 0.0011735971782954077),
 (u'yet', 0.001186086529327471),
 (u'genuinely', 0.0011881735919781282),
 (u'initial', 0.0011920844650729454),
 (u'always', 0.0012053506155200685),
 (u'social', 0.0012062341230046482),
 (u'hot', 0.0012147898735283),
 (u'all', 0.0012153680273157782),
 (u'popular', 0.0012223294117901093),
 (u'second', 0.0012287549328360563),
 (u'comic', 0.0012292978367918522),
 (u'also', 0.001244627079060148),
 (u'slightly', 0.0012516757235332284),
 (u'similar', 0.0012528991713053467),
 (u'wealthy', 0.0012540156628997642),
 (u'desperate', 0.0012569748772776634),
 (u'rare', 0.0012578526078817402),
 (u'intense', 0.00125833299075402),
 (u'surprisingly', 0.0012584850456408068),
 (u'young', 0.0012594256270070184),
 (u'believable', 0.0012598114191262274),
 (u'actual', 0.0012604891484779095),
 (u'same', 0.00126803310786752),
 (u'effective', 0.0012723696620154205),
 (u'very', 0.0012726233793848608),
 (u'high', 0.0012981881983572924),
 (u'terrific', 0.0013008601686946928),
 (u'actually', 0.0013010291440174549),
 (u'final', 0.001320570482805893),
 (u'mainly', 0.0013263575562702952),
 (u'intelligent', 0.001334069929045109),
 (u'fantastic', 0.001340352895863594),
 (u'whatever', 0.0013462640084586114),
 (u'fully', 0.0013561219995253926),
 (u'basically', 0.001361770536748289),
 (u'black', 0.0013692127463810653),
 (u'effectively', 0.0013842696658279276),
 (u'emotionally', 0.001388499806196994),
 (u'nice', 0.0013909346255527887),
 (u'lovely', 0.0013961014520419077),
 (u'tight', 0.0014159678767012348),
 (u'especially', 0.0014182485621920152),
 (u'apparent', 0.0014586719518504473),
 (u'sharp', 0.0014619769627225439),
 (u'memorable', 0.0014656373965283931),
 (u'forward', 0.0014997473704841816),
 (u'therefore', 0.001499850682161086),
 (u'classic', 0.0015069583117860513),
 (u'sadly', 0.0015086339919077834),
 (u'sexual', 0.0015304010624382883),
 (u'barely', 0.001538589560672312),
 (u'sympathetic', 0.0015470191484813604),
 (u'love', 0.0015536385250374978),
 (u'looking', 0.0015587191721389278),
 (u'wonderfully', 0.0015793179219199137),
 (u'traditional', 0.0015967771118905387),
 (u'somewhat', 0.0016020153067423828),
 (u'close', 0.0016101565484702794),
 (u'sean', 0.0016256896243415598),
 (u'solid', 0.0016490646385887168),
 (u'wonderful', 0.0016554449555764227),
 (u'anyway', 0.001657321486801025),
 (u'constant', 0.0016655798457468024),
 (u'incredible', 0.0017345921494111424),
 (u'minor', 0.0017420880132325658),
 (u'mary', 0.0017445623912850411),
 (u'like', 0.0017654873172872217),
 (u'eccentric', 0.0017903988553942978),
 (u'perfectly', 0.0017942758139415226),
 (u'sad', 0.001823676709594714),
 (u'overall', 0.0018272778033300422),
 (u'visually', 0.001846739019031568),
 (u'nicely', 0.0019101214163914795),
 (u'slowly', 0.0019188497875388232),
 (u'shallow', 0.001926157429543946),
 (u'lucky', 0.0019562832861092887),
 (u'computer', 0.0019639866708498765),
 (u'surely', 0.002010656098858258),
 (u'hearted', 0.002040214009628856),
 (u'willing', 0.0021015517297600377),
 (u'off', 0.00219411315635401),
 (u'outstanding', 0.0022042146455797927),
 (u'definitely', 0.002310570684473473),
 (u'cold', 0.0023123785439538133),
 (u'great', 0.002751364025528922),
 (u'green', 0.002834670446207629),
 (u'quiet', 0.0030352435008745037),
 (u'generally', 0.003558180988855352),
 (u'perfect', 0.012866406697882273),
 (u'cool', 0.016159462179642536)]
In [ ]:
 

We got a reasonably good sentiment lexicon tailored to the specific data we are working without using any labels! These lexicons are very similar to the ones we obtained in last lecture when we used the labels, only that this method can be applied to dataset where we do not have any sentiment annotations.

In [ ]: