%matplotlib inline
from __future__ import print_function
import json
from operator import itemgetter
from collections import defaultdict
from matplotlib import pyplot as plt
import numpy as np
from nltk.tokenize import TreebankWordTokenizer
from nltk import FreqDist,pos_tag
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.datasets import load_files
from sklearn.naive_bayes import MultinomialNB
tokenizer = TreebankWordTokenizer()
We use the movie review data again, but this time we will not use the sentiment labels (we will pretend we don't have them).
## loading movie review data:
## http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz
data = load_files('txt_sentoken')
print(data.data[0])
arnold schwarzenegger has been an icon for action enthusiasts , since the late 80's , but lately his films have been very sloppy and the one-liners are getting worse . it's hard seeing arnold as mr . freeze in batman and robin , especially when he says tons of ice jokes , but hey he got 15 million , what's it matter to him ? once again arnold has signed to do another expensive blockbuster , that can't compare with the likes of the terminator series , true lies and even eraser . in this so called dark thriller , the devil ( gabriel byrne ) has come upon earth , to impregnate a woman ( robin tunney ) which happens every 1000 years , and basically destroy the world , but apparently god has chosen one man , and that one man is jericho cane ( arnold himself ) . with the help of a trusty sidekick ( kevin pollack ) , they will stop at nothing to let the devil take over the world ! parts of this are actually so absurd , that they would fit right in with dogma . yes , the film is that weak , but it's better than the other blockbuster right now ( sleepy hollow ) , but it makes the world is not enough look like a 4 star film . anyway , this definitely doesn't seem like an arnold movie . it just wasn't the type of film you can see him doing . sure he gave us a few chuckles with his well known one-liners , but he seemed confused as to where his character and the film was going . it's understandable , especially when the ending had to be changed according to some sources . aside form that , he still walked through it , much like he has in the past few films . i'm sorry to say this arnold but maybe these are the end of your action days . speaking of action , where was it in this film ? there was hardly any explosions or fights . the devil made a few places explode , but arnold wasn't kicking some devil butt . the ending was changed to make it more spiritual , which undoubtedly ruined the film . 
i was at least hoping for a cool ending if nothing else occurred , but once again i was let down . i also don't know why the film took so long and cost so much . there was really no super affects at all , unless you consider an invisible devil , who was in it for 5 minutes tops , worth the overpriced budget . the budget should have gone into a better script , where at least audiences could be somewhat entertained instead of facing boredom . it's pitiful to see how scripts like these get bought and made into a movie . do they even read these things anymore ? it sure doesn't seem like it . thankfully gabriel's performance gave some light to this poor film . when he walks down the street searching for robin tunney , you can't help but feel that he looked like a devil . the guy is creepy looking anyway ! when it's all over , you're just glad it's the end of the movie . don't bother to see this , if you're expecting a solid action flick , because it's neither solid nor does it have action . it's just another movie that we are suckered in to seeing , due to a strategic marketing campaign . save your money and see the world is not enough for an entertaining experience .
## building the term document matrix
vec = CountVectorizer(min_df = 50)
X = vec.fit_transform(data.data)
terms = vec.get_feature_names()
len(terms)
2153
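With an integer value, `min_df = 50` keeps only terms that occur in at least 50 documents. The idea can be sketched in plain Python on a toy corpus. The helper `filter_by_doc_freq` and the three toy documents are hypothetical, and whitespace splitting only approximates the tokenization that `CountVectorizer` performs.

```python
from collections import Counter

def filter_by_doc_freq(docs, min_df):
    """Return the sorted terms that occur in at least `min_df` documents."""
    doc_freq = Counter()
    for doc in docs:
        # set() so that a term is counted once per document, not once per occurrence
        doc_freq.update(set(doc.split()))
    return sorted(term for term, df in doc_freq.items() if df >= min_df)

# Hypothetical toy corpus, for illustration only
toy_docs = ["good good film", "bad film", "good plot"]
kept = filter_by_doc_freq(toy_docs, min_df=2)
print(kept)
```

Note that "good" appears twice in the first toy document but still counts as one document, which is the distinction between document frequency and raw term frequency.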
We want to only look at adjectives and adverbs.
We will use the NLTK part of speech tokenizer.
We want to only keep words that are tagged as "JJ" (adjectives) or "RB" (adverbs).
##example part of speech (POS) tagging (note that you need to tokenize the sentence first)
pos_tag(tokenizer.tokenize("This was a great day but the time is running out fast"))
[('This', 'DT'), ('was', 'VBD'), ('a', 'DT'), ('great', 'JJ'), ('day', 'NN'), ('but', 'CC'), ('the', 'DT'), ('time', 'NN'), ('is', 'VBZ'), ('running', 'VBG'), ('out', 'RP'), ('fast', 'JJ')]
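The filtering step can be tried on the tagged sentence above without re-running the tagger. This is a minimal sketch: `keep_tags` and `keep_tag_prefixes` are hypothetical helper names, and the `('worse', 'JJR')` pair is a hand-written token added for illustration, not tagger output. In the Penn Treebank tag set, comparatives and superlatives carry their own tags (JJR, JJS, RBR, RBS), so an exact match on "JJ" and "RB" leaves them out, while a prefix match keeps them.

```python
# Tagged tokens copied from the example output above
tagged = [('This', 'DT'), ('was', 'VBD'), ('a', 'DT'), ('great', 'JJ'),
          ('day', 'NN'), ('but', 'CC'), ('the', 'DT'), ('time', 'NN'),
          ('is', 'VBZ'), ('running', 'VBG'), ('out', 'RP'), ('fast', 'JJ')]

def keep_tags(tagged_tokens, wanted=("JJ", "RB")):
    """Keep words whose tag is exactly one of `wanted`."""
    return " ".join(w for w, tag in tagged_tokens if tag in wanted)

def keep_tag_prefixes(tagged_tokens, prefixes=("JJ", "RB")):
    """Keep words whose tag starts with one of `prefixes` (also JJR, JJS, RBR, RBS)."""
    return " ".join(w for w, tag in tagged_tokens if tag.startswith(prefixes))

# Hypothetical extra token, to show the difference between the two filters
extended = tagged + [('worse', 'JJR')]

exact = keep_tags(extended)
with_comparatives = keep_tag_prefixes(extended)
print(exact)
print(with_comparatives)
```

Which variant is preferable depends on whether comparative forms should count as separate vocabulary items in the term-document matrix.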
## POS tagging all reviews
## POS tagging is relatively slow, so this will take a while
reviews_pos_tagged=[pos_tag(tokenizer.tokenize(m)) for m in data.data]
## Reconstructing adjective-and-adverb-only reviews
reviews_adj_adv_only=[" ".join([w for w,tag in m if tag in ["RB","JJ"]])
for m in reviews_pos_tagged]
## It kind of works:
reviews_adj_adv_only[1]
"good hard great rare rare strong masterful together virtually unheard true real married n't much enough david american anti-government only forward available highly operative wrong always terry surprising david own notable very simple complex character-driven well-written long sharply not caruso b-movie caruso too many memorable stoic memorable extremely well skillfully old-school the"
## term doc matrix only for adj/adv
X = vec.fit_transform(reviews_adj_adv_only)
X = X > 0 # we only keep binary values (is the word in the document)
terms = vec.get_feature_names()
len(terms)
483
# PMI type measure via matrix multiplication
def getcollocations_matrix(X):
    XX=X.T.dot(X) ## multiply X with its transpose to get the number of docs in which both w1 (row) and w2 (column) occur
    term_freqs = np.asarray(X.sum(axis=0)) ## number of docs in which a word occurs
    #pmi=np.array(XX) * 1.0 / np.array(X.sum(axis=0)).T / np.array(X.sum(axis=0))
    pmi = XX.toarray() * 1.0 ## Casting to float, making it an array to use simple operations
    pmi /= term_freqs.T ## dividing by the number of documents in which w1 occurs
    pmi /= term_freqs ## dividing by the number of documents in which w2 occurs
    return pmi # this is not technically PMI because we are ignoring some normalization factor and not taking the log
    # but it's sufficient for ranking
pmi_matrix=getcollocations_matrix(X)
pmi_matrix.shape # n_words by n_words
(483, 483)
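To make the "not technically PMI" remark concrete, here is a small sketch on a hypothetical dense toy matrix (4 documents, 3 terms). It computes the same co-occurrence ratio as above and then the document-level PMI, log(N * c(w1, w2) / (df(w1) * df(w2))), where N is the number of documents. The names `X_toy`, `ratio`, and `pmi` are illustrative. The toy matrix is stored as integers so that the matrix product counts documents; that is an assumption of this sketch, not a statement about how the boolean sparse matrix above behaves.

```python
import numpy as np

# Hypothetical toy data: rows are documents, columns are terms (1 = term present)
X_toy = np.array([[1, 1, 0],
                  [1, 1, 0],
                  [1, 0, 1],
                  [0, 0, 1]], dtype=int)

n_docs = X_toy.shape[0]
cooc = X_toy.T.dot(X_toy)        # cooc[i, j] = number of docs containing both term i and term j
doc_freq = X_toy.sum(axis=0)     # number of docs containing each term

# The unnormalized score: co-occurrence count divided by both document frequencies
ratio = cooc / np.outer(doc_freq, doc_freq)

# PMI adds the factor n_docs and takes the log; pairs that never co-occur get -inf
with np.errstate(divide="ignore"):
    pmi = np.log(n_docs * ratio)

print(ratio)
print(pmi)
```

Because the logarithm is monotonic and `n_docs` is a constant, sorting a row of `ratio` and sorting the same row of `pmi` give the same order, which is why the simpler score is enough for ranking.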
pmi_matrix
array([[ 0.00399405, 0.00053261, 0.00085641, ..., 0.00061296, 0.00066274, 0.00049234], [ 0.00053261, 0.01697531, 0.00082139, ..., 0.00045094, 0.00042829, 0.00057458], [ 0.00085641, 0.00082139, 0.00670598, ..., 0.00069823, 0.00045 , 0.00055221], ..., [ 0.00061296, 0.00045094, 0.00069823, ..., 0.00902344, 0.00044339, 0.00087074], [ 0.00066274, 0.00042829, 0.00045 , ..., 0.00044339, 0.00298861, 0.00054673], [ 0.00049234, 0.00057458, 0.00055221, ..., 0.00087074, 0.00054673, 0.00278998]])
pmi_matrix[:,1].ravel().tolist()
[5.14668039114771e-05, 0.0002227667631989307, 8.991188635137565e-05, 0.00026652452025586353, 6.692992436918547e-05, 0.00011940298507462687, 2.6002392220084247e-05, 3.0030931859815612e-05, 0.00013568521031207597, 0.0002261420171867933, 0.00013819789939192924, 0.00012756729174639623, 2.5426530041445244e-05, 0.00010974539069359087, 5.7185337679418995e-05, 1.3935922627757571e-05, 3.503608716978488e-05, 5.632216277105041e-05, 0.00017768301350390902, 0.00014490653528474132, 0.000292654375182909, 0.00024073182474723158, 0.0002487562189054726, 0.00029850746268656717, 0.0002261420171867933, 8.15594160345812e-05, 0.00020169423154497784, 2.8757944382135565e-05, 0.0002227667631989307, 0.0002227667631989307, 0.0002227667631989307, 0.00026184865147944484, 3.2731081434930606e-05, 0.0001243781094527363, 2.438786459857575e-05, 1.940880771694195e-05, 0.00021949078138718174, 0.00015076134479119556, 0.00013948946854512484, 8.577800651912849e-05, 0.00020445716622367614, 6.692992436918547e-05, 0.00012866700977869275, 2.2997493273233217e-05, 0.00018201674554058975, 6.517630189663038e-05, 0.00018426386585590566, 0.00013568521031207597, 0.00019135093761959434, 0.00010815487778498811, 0.00017355085039916696, 0.0002369106846718787, 0.0001122208506340478, 0.00014214641080312722, 8.48032564450475e-05, 5.0594485201113076e-05, 0.00029850746268656717, 0.00012233912405187178, 6.60414740457007e-05, 9.387027128508402e-05, 0.0001320829480914014, 0.0002332089552238806, 0.0002763957987838585, 0.00010222858311183807, 0.00014490653528474132, 0.00020169423154497784, 0.0, 0.00014351320321469576, 5.5900273911342166e-05, 0.0001344628210299852, 9.387027128508402e-05, 4.228151029554776e-05, 0.00014351320321469576, 0.00027137042062415194, 0.0002870264064293915, 0.0001148105625717566, 0.00021949078138718174, 0.00021949078138718174, 0.000281610813855252, 0.00026652452025586353, 0.00026652452025586353, 0.00021949078138718174, 0.0001554726368159204, 0.0002261420171867933, 0.00010364842454394692, 
9.884353069091628e-05, 5.9941257567583765e-05, 0.00026652452025586353, 9.884353069091628e-05, 0.00029850746268656717, 0.00024073182474723158, 0.00019383601473153714, 4.564334291843534e-05, 9.950248756218906e-05, 0.00017355085039916696, 0.0, 0.0, 0.00025297242600556537, 0.00018201674554058975, 0.00012233912405187178, 9.156670634557275e-05, 8.677542519958348e-05, 0.000169606512890095, 5.89935697009026e-05, 9.629272989889263e-05, 8.200754469411187e-05, 8.067769261799113e-05, 0.00024073182474723158, 8.831581736288969e-05, 0.00026652452025586353, 0.0002870264064293915, 4.8459003682884284e-05, 8.111615833874107e-05, 0.0002227667631989307, 0.00020729684908789384, 0.0001015331505736623, 2.2410470171664203e-05, 0.0002763957987838585, 0.0002332089552238806, 4.6065966463976415e-05, 0.00013092432573972242, 0.0002369106846718787, 0.0001463271875914545, 3.990741479766941e-05, 0.00026184865147944484, 1.1561094604437148e-05, 7.316359379572725e-05, 2.6184865147944487e-05, 7.614986293024671e-05, 0.000169606512890095, 5.6750468191362575e-05, 0.00027137042062415194, 0.00029850746268656717, 7.107320540156361e-05, 0.0002227667631989307, 0.0001243781094527363, 9.884353069091628e-05, 0.00013092432573972242, 0.00021321961620469082, 3.175611305176246e-05, 0.00011940298507462687, 0.0002870264064293915, 0.00012335019119279634, 0.00018892877385225768, 0.00016223231667748214, 2.1413734769481146e-05, 5.0594485201113076e-05, 5.0594485201113076e-05, 1.428265371706063e-05, 0.00012335019119279634, 0.00025297242600556537, 0.000281610813855252, 9.950248756218906e-05, 0.0, 0.00027137042062415194, 0.0001166044776119403, 0.00024073182474723158, 0.00016048788316482107, 0.00029850746268656717, 4.753303545964446e-05, 0.0, 3.0459945172098693e-05, 0.00015076134479119556, 0.00014214641080312722, 0.0002870264064293915, 0.0001105583195135434, 0.00015229972586049343, 0.0002487562189054726, 0.0001463271875914545, 0.0002261420171867933, 1.3012531067417923e-05, 0.00019638648860958365, 2.1757103694356208e-05, 
0.00021321961620469082, 0.00013326226012793177, 0.00012648621300278268, 3.462963604252519e-05, 0.00014214641080312722, 0.000292654375182909, 0.0002332089552238806, 0.00012036591237361579, 1.8821403700287966e-05, 3.769033619779889e-05, 0.00012036591237361579, 8.528784648187633e-05, 0.000169606512890095, 0.0001658374792703151, 2.282167145921767e-05, 0.0001015331505736623, 7.210325185665874e-05, 0.00017559262510974537, 0.00026184865147944484, 0.0001029336078229542, 8.291873963515754e-05, 0.00010815487778498811, 8.15594160345812e-05, 0.00014925373134328358, 0.000169606512890095, 0.00013568521031207597, 0.0001463271875914545, 0.0002369106846718787, 0.0002369106846718787, 0.00029850746268656717, 0.00026652452025586353, 3.324136555529701e-05, 0.0002296211251435132, 0.0002870264064293915, 0.00015229972586049343, 3.439026067817594e-05, 0.0, 0.0002573340195573855, 1.124745526324669e-05, 0.00012134449702705983, 9.7551458394303e-05, 0.00026652452025586353, 2.500062501562539e-05, 7.0402703463813e-05, 5.3304904051172706e-05, 0.00026652452025586353, 0.00021321961620469082, 0.00017559262510974537, 0.00011138338159946535, 0.00020729684908789384, 0.0001243781094527363, 0.00010084711577248892, 0.0001463271875914545, 1.649212501030758e-05, 0.0001554726368159204, 8.528784648187633e-05, 2.3467567821271004e-05, 0.00025297242600556537, 0.0001798237727027513, 0.0, 9.884353069091628e-05, 0.0002369106846718787, 0.00024467824810374357, 4.893564962074871e-05, 0.0002369106846718787, 6.0919890344197386e-05, 0.00020729684908789384, 1.8940828850670503e-05, 0.00020445716622367614, 7.654037504783774e-05, 0.0002296211251435132, 0.00012036591237361579, 0.0001166044776119403, 0.00026652452025586353, 0.00010894432944765224, 0.0002332089552238806, 0.00026184865147944484, 0.00021949078138718174, 0.00011138338159946535, 0.0002227667631989307, 6.815238874122537e-05, 0.000281610813855252, 1.3126977250948425e-05, 0.00010660980810234541, 0.00015076134479119556, 0.00013568521031207597, 0.0002261420171867933, 
0.000292654375182909, 0.00016770082173402648, 0.00024467824810374357, 5.6750468191362575e-05, 0.000281610813855252, 0.00016401508938822373, 1.7621455884685192e-05, 1.9535828709853875e-05, 4.6065966463976415e-05, 6.72314105149926e-05, 0.00016223231667748214, 0.00016770082173402648, 0.0001320829480914014, 0.00015878056525881233, 8.795152112155779e-06, 2.1916847480658382e-05, 0.00017355085039916696, 0.00016401508938822373, 5.876131155247385e-05, 7.175660160734788e-05, 0.0002487562189054726, 0.00013693002875530606, 0.00016770082173402648, 0.0002573340195573855, 0.000169606512890095, 0.00012866700977869275, 0.00026184865147944484, 5.042355788624446e-05, 2.5169263295663334e-05, 0.00016048788316482107, 3.148812897537628e-05, 0.0001029336078229542, 1.140211851361983e-05, 0.00012756729174639623, 0.000281610813855252, 3.545219271811962e-05, 0.0002332089552238806, 1.3470553370332453e-05, 0.00016401508938822373, 0.00024073182474723158, 0.00012978585334198572, 0.00010364842454394692, 0.0002227667631989307, 2.5469920024451123e-05, 0.0, 0.00025297242600556537, 0.00010974539069359087, 6.878052135635188e-05, 6.753562504220977e-05, 0.0001658374792703151, 0.0001798237727027513, 0.0001554726368159204, 0.00010222858311183807, 4.012197079120527e-05, 0.00010017028949213663, 0.00017559262510974537, 0.0001658374792703151, 9.506607091928892e-05, 0.0, 9.213193292795283e-05, 0.00024467824810374357, 6.815238874122537e-05, 0.00012756729174639623, 0.0001175226231049477, 9.950248756218906e-05, 0.00029850746268656717, 9.446438692612884e-05, 0.00021021652301870928, 3.9174207701649236e-05, 9.629272989889263e-05, 0.0002369106846718787, 0.00014214641080312722, 3.52013517319065e-05, 0.0002296211251435132, 0.0002763957987838585, 0.00027137042062415194, 0.000169606512890095, 6.846501437765303e-05, 0.0, 3.4232507188826514e-05, 0.00029850746268656717, 0.00015710919088766692, 3.168869030642963e-05, 0.00016770082173402648, 0.00015076134479119556, 2.419023198432473e-05, 0.00013568521031207597, 
1.7214963246053472e-05, 7.693491306354824e-05, 0.0001175226231049477, 0.0002573340195573855, 0.0002261420171867933, 0.0002332089552238806, 0.0002261420171867933, 0.00010660980810234541, 0.00011570056693277797, 3.631477648255075e-05, 9.156670634557275e-05, 0.0001658374792703151, 0.0, 2.407318247472316e-05, 0.00014925373134328358, 0.00021949078138718174, 0.00024073182474723158, 4.678800355588827e-05, 0.00012756729174639623, 9.328358208955224e-05, 8.111615833874107e-05, 9.950248756218906e-05, 4.4553352639786145e-05, 9.7551458394303e-05, 0.0002261420171867933, 0.0, 5.85308750365818e-05, 0.00024073182474723158, 0.0001166044776119403, 8.93734918223255e-05, 7.28066982162359e-05, 4.550418638514744e-05, 7.316359379572725e-05, 0.00010894432944765224, 9.819324430479182e-05, 0.00012978585334198572, 4.3641441913240814e-05, 1.0451941970818177e-05, 0.00016770082173402648, 0.0002763957987838585, 0.0002870264064293915, 0.00010084711577248892, 7.388798581350672e-05, 0.00012335019119279634, 0.00014351320321469576, 0.00024073182474723158, 4.468674591116275e-05, 4.100377234705593e-05, 0.0002227667631989307, 0.0001554726368159204, 0.00021021652301870928, 0.00024073182474723158, 2.117074203450831e-05, 9.950248756218906e-05, 6.72314105149926e-05, 7.981482959533881e-05, 0.0002332089552238806, 8.884150675195451e-05, 0.0002573340195573855, 1.8916822730454192e-05, 0.00011845534233593934, 0.00020729684908789384, 0.00019638648860958365, 5.2186619350798456e-05, 0.00026184865147944484, 0.00019383601473153714, 8.291873963515754e-05, 0.0002870264064293915, 0.00014214641080312722, 0.00019900497512437813, 0.00018201674554058975, 0.00012233912405187178, 0.00027137042062415194, 0.00015710919088766692, 0.00021321961620469082, 8.627383314640669e-05, 1.7375288864177365e-05, 2.3880597014925376e-05, 0.00021630975556997622, 0.00016048788316482107, 9.387027128508402e-05, 0.00021949078138718174, 0.00013326226012793177, 0.000292654375182909, 0.000140805406927626, 0.00013693002875530606, 4.509176173513099e-05, 
1.5827543090486065e-05, 5.349596105494035e-05, 0.00026184865147944484, 0.00012542330364981812, 0.00016401508938822373, 0.00026652452025586353, 0.00027137042062415194, 4.830217842824712e-05, 6.489292667099286e-05, 0.0002573340195573855, 0.0001166044776119403, 0.0002296211251435132, 0.00021021652301870928, 0.00010737678513905294, 0.0002870264064293915, 0.0, 4.536587578823209e-05, 0.00021021652301870928, 0.00018892877385225768, 0.00027137042062415194, 0.0002573340195573855, 0.000292654375182909, 7.175660160734788e-05, 0.00010017028949213663, 7.500187504687617e-05, 0.00020169423154497784, 0.00011307100859339665, 1.554726368159204e-05, 0.00017155601303825698, 0.00010084711577248892, 0.00019900497512437813, 0.00016770082173402648, 0.0002870264064293915, 0.0002296211251435132, 1.3975068477835541e-05, 0.0002296211251435132, 9.884353069091628e-05, 4.0338846308995566e-05, 0.00021949078138718174, 0.0002332089552238806, 0.00020445716622367614, 8.627383314640669e-05, 0.00019900497512437813, 8.779631255487269e-05, 0.00017155601303825698, 5.6750468191362575e-05, 0.00011307100859339665, 3.8969642648376916e-05, 3.0522235448524254e-05]
"worse" in terms
False
def getcollocations(w):
    if w not in terms:
        return []
    idx = terms.index(w)
    col = pmi_matrix[:,idx].ravel().tolist()
    return sorted([(terms[i],val) for i,val in enumerate(col)],key=itemgetter(1),reverse=True)
## words that are close to "good", not enough info yet
getcollocations("good")
[(u'good', 0.0012990019157613248), (u'sean', 0.0009894664672151583), (u'nicely', 0.0009215728176087187), (u'forward', 0.0008879991787290832), (u'fairly', 0.0008726003490401396), (u'sad', 0.0008549720591605408), (u'pretty', 0.0008460801423536256), (u'stupid', 0.0008334223741852762), (u'technical', 0.0008266740148801322), (u'totally', 0.0008214479147860624), (u'shot', 0.000813992862910578), (u'sadly', 0.0008132974126976058), (u'average', 0.0008102717526801297), (u'intelligent', 0.0007956062005954214), (u'horrible', 0.0007921177925752724), (u'naturally', 0.0007839768760907504), (u'terrific', 0.0007831028773437151), (u'nice', 0.0007824948782153426), (u'therefore', 0.0007769729135288914), (u'thankfully', 0.0007742791829511099), (u'acting', 0.000772321679896414), (u'lovely', 0.0007690714940692756), (u'present', 0.0007649418644183042), (u'bad', 0.0007640757922921771), (u'climactic', 0.0007468867394326619), (u'really', 0.0007457428528542657), (u'suspenseful', 0.0007447192634049467), (u'mainly', 0.0007442767682989426), (u'entertaining', 0.0007378605892618828), (u'badly', 0.0007348213465601175), (u'total', 0.000731858357259472), (u'disappointing', 0.0007271669575334497), (u'looking', 0.0007249295207410391), (u'maybe', 0.0007220863150353994), (u'national', 0.0007177841580814052), (u'about', 0.0007150475082412255), (u'probably', 0.0007141673247730752), (u'particular', 0.0007139457401237506), (u'subtle', 0.0007139457401237506), (u'slightly', 0.0007136830669301804), (u'dull', 0.0007111692844677137), (u'fantastic', 0.0007082795040910224), (u'terribly', 0.0007071071793945959), (u'general', 0.0007068550313453645), (u'critic', 0.0007057796940765836), (u'weird', 0.000705621269902829), (u'able', 0.0007052875977492574), (u'very', 0.0007032159940293782), (u'natural', 0.0007031633880614717), (u'regular', 0.0006980802792321117), (u'sure', 0.0006975654707666012), (u'right', 0.0006953815152660083), (u'usual', 0.0006951223119472298), (u'black', 0.0006943271594512939), (u'scary', 
0.0006937443768766327), (u'seemingly', 0.0006931543095197882), (u'great', 0.0006917807579957256), (u'brilliant', 0.0006905855523078406), (u'cool', 0.0006894620041798633), (u'definitely', 0.0006877273937350253), (u'actually', 0.0006873293419173826), (u'interesting', 0.000687056324644594), (u'overall', 0.0006852150176757506), (u'individual', 0.0006847488850106651), (u'mean', 0.0006847488850106651), (u'wonderfully', 0.0006826814495431681), (u'gary', 0.0006817190226876091), (u'well', 0.0006810369408554363), (u'fly', 0.0006808926965995029), (u'tight', 0.0006796214256947241), (u'impressive', 0.0006780063744889347), (u'musical', 0.0006771152059110175), (u'basically', 0.0006761618818391603), (u'sometimes', 0.000675186962089994), (u'realistic', 0.0006745871929118003), (u'major', 0.000674416540953057), (u'evil', 0.0006739860904899585), (u'necessary', 0.0006734198345853251), (u'special', 0.0006732364731610216), (u'funny', 0.0006728094275396127), (u'surprising', 0.0006724155630838723), (u'basic', 0.0006720430107526881), (u'hardly', 0.0006709490078754015), (u'offensive', 0.0006702582391177884), (u'just', 0.0006698552799986053), (u'ensemble', 0.0006694950953842451), (u'usually', 0.0006688654657840717), (u'believable', 0.0006680846422338569), (u'whatever', 0.0006676714791898038), (u'anti', 0.0006672826198542243), (u'co', 0.0006663493574488339), (u'also', 0.0006659902460502185), (u'again', 0.0006639350481827149), (u'then', 0.0006622797160026675), (u'anyway', 0.0006614873613691381), (u'relatively', 0.0006610608704849543), (u'somewhere', 0.000660454392622124), (u'tough', 0.0006601411336216709), (u'give', 0.0006577062332317469), (u'extremely', 0.0006562095366773631), (u'too', 0.0006548689759591067), (u'however', 0.0006546717339499118), (u'especially', 0.0006544502617801048), (u'supposedly', 0.0006544502617801048), (u'terrible', 0.0006529247366943702), (u'even', 0.0006527486230339266), (u'generally', 0.0006523322997678714), (u'huge', 0.000651853236931771), (u'ahead', 
0.0006513777253398226), (u'largely', 0.0006509875619823263), (u'there', 0.0006507706217540319), (u'unbelievable', 0.0006507528026740025), (u'so', 0.0006499259587981563), (u'always', 0.000649825171590846), (u'never', 0.000649801949854652), (u'slowly', 0.0006492971101125448), (u'fair', 0.0006483905371339927), (u'not', 0.0006475484673745697), (u'next', 0.0006473426148267702), (u'likable', 0.0006461660812512426), (u'occasionally', 0.000645284291727162), (u'interested', 0.0006442563324688881), (u'back', 0.0006437715861799632), (u'superior', 0.0006429686782401028), (u'entirely', 0.000642160116018976), (u'later', 0.0006418646798227951), (u'nearly', 0.0006416179037059851), (u'strong', 0.0006413284520201026), (u'second', 0.000641094133988674), (u'little', 0.0006402652800623065), (u'powerful', 0.0006399069226294357), (u'instead', 0.0006398543246578239), (u'personal', 0.0006360797281161018), (u'capable', 0.0006352017246689251), (u'completely', 0.0006340763356350902), (u'wrong', 0.0006339986910994765), (u'predictable', 0.0006339610270650738), (u'quiet', 0.0006335317602620191), (u'ever', 0.0006327797233105649), (u'wild', 0.0006318830113738942), (u'social', 0.0006306520704426464), (u'remarkable', 0.0006302113631956564), (u'ago', 0.0006285680480373887), (u'recently', 0.0006284024901669663), (u'finally', 0.0006282722513089005), (u'quite', 0.0006271815008726003), (u'frankly', 0.0006266857052197366), (u'soft', 0.0006266857052197366), (u'typical', 0.0006264023934181003), (u'big', 0.0006261946314302335), (u'apparent', 0.000625995902572274), (u'much', 0.000625995902572274), (u'hard', 0.0006251580239147996), (u'small', 0.000624448268427326), (u'cute', 0.0006242994367116446), (u'many', 0.0006232859636000998), (u'screen', 0.0006232859636000998), (u'dramatic', 0.0006229646821755636), (u'moral', 0.0006225856422926838), (u'anywhere', 0.0006221317303341736), (u'entire', 0.000622029971733903), (u'mary', 0.0006217277486910995), (u'out', 0.0006213972182558571), (u'poor', 0.0006213279972222983), 
(u'important', 0.000620966760014611), (u'awful', 0.0006208887098939455), (u'perfectly', 0.0006202821758237137), (u'fully', 0.0006202306402491188), (u'effectively', 0.0006200055111600992), (u'brief', 0.0006196195755789227), (u'short', 0.0006194569030728637), (u'here', 0.0006189801970444799), (u'hot', 0.0006180919139034323), (u'still', 0.0006177443197554717), (u'highly', 0.0006159531875577456), (u'extra', 0.0006154948890550985), (u'yet', 0.0006150487041692614), (u'long', 0.0006143818784058126), (u'far', 0.0006138641990340517), (u'seriously', 0.0006133867159429216), (u'common', 0.0006129837162678666), (u'enough', 0.0006123595613716085), (u'responsible', 0.0006120321892573201), (u'almost', 0.0006113555819655456), (u'final', 0.0006112807194463247), (u'same', 0.0006110246003817371), (u'certain', 0.0006099298358086691), (u'quickly', 0.0006097896139945858), (u'sweet', 0.000609664482276389), (u'obvious', 0.0006093157609676838), (u'together', 0.0006092378084619628), (u'practically', 0.0006091738285751918), (u'possible', 0.0006077038145100972), (u'positive', 0.000606729930191972), (u'obviously', 0.0006066103303634303), (u'comic', 0.0006062872555019151), (u'immediately', 0.0006049315303161704), (u'solid', 0.0006049315303161704), (u'flat', 0.0006041079339508659), (u'likely', 0.0006039032903418039), (u'visually', 0.0006032792536573804), (u'due', 0.0006032517719129537), (u'other', 0.000602488434013854), (u'incredible', 0.0006016349774960963), (u'now', 0.0006013986260286734), (u'laughable', 0.0006013867270411772), (u'only', 0.0006012920510346847), (u'magic', 0.0006007031388319802), (u'mental', 0.000599912739965096), (u'emotional', 0.0005994544414624489), (u'willing', 0.0005992556613890115), (u'wonderful', 0.0005986444255042818), (u'surprisingly', 0.0005984850561479492), (u'similar', 0.0005981534650678376), (u'utterly', 0.0005973157151167622), (u'several', 0.0005970423440800956), (u'off', 0.0005968586387434555), (u'early', 0.000596662159194665), (u'previous', 
0.0005958077652048264), (u'unfortunately', 0.0005950586149867968), (u'apparently', 0.0005949547834364588), (u'often', 0.0005935574189947833), (u'away', 0.0005935532409548652), (u'whole', 0.0005926566748309103), (u'male', 0.0005917634550961866), (u'unnecessary', 0.0005911163654788043), (u'happy', 0.0005908947245468662), (u'professional', 0.0005906014557527775), (u'intense', 0.0005898131988882425), (u'past', 0.0005896869546247819), (u'first', 0.0005888259341605106), (u'exactly', 0.0005876096626532927), (u'few', 0.0005874664175591812), (u'main', 0.0005869133854502856), (u'free', 0.0005864554293873665), (u'easy', 0.0005863400535405463), (u'old', 0.0005863261994427605), (u'close', 0.0005852807219171669), (u'quick', 0.000584929904301632), (u'third', 0.000584929904301632), (u'fast', 0.0005848951614942964), (u'soon', 0.0005843305908750934), (u'time', 0.0005840792658897709), (u'criminal', 0.0005837819236536145), (u'enjoyable', 0.0005835180248182529), (u'rather', 0.0005833916014348164), (u'easily', 0.0005824294195746387), (u'else', 0.0005817335660267597), (u'friendly', 0.0005817335660267597), (u'non', 0.0005817335660267597), (u'certainly', 0.000580122115705356), (u'absolutely', 0.0005793878661637486), (u'critical', 0.0005792260937594031), (u'minor', 0.0005792260937594031), (u'like', 0.000578571970559223), (u'as', 0.0005780282566890096), (u'such', 0.0005780075865749354), (u'spectacular', 0.0005779560753382743), (u'normal', 0.0005778029338238763), (u'incredibly', 0.0005770421663007374), (u'simple', 0.0005743698500011045), (u'unfunny', 0.0005740791770000919), (u'appropriate', 0.0005726439790575916), (u'necessarily', 0.0005721969501902554), (u'pathetic', 0.000571238372825246), (u'double', 0.0005711565920990004), (u'straight', 0.0005711565920990004), (u'last', 0.0005711090907190853), (u'popular', 0.0005705463820647067), (u'animal', 0.000570005066711704), (u'effective', 0.000569183476404355), (u'clever', 0.0005690871841566128), (u'dimensional', 0.0005690871841566128), (u'minute', 
0.0005690871841566128), (u'available', 0.0005683383852300909), (u'disturbing', 0.0005665390325857622), (u'impossible', 0.0005663772451844384), (u'international', 0.0005658267888307156), (u'constantly', 0.0005653467050119214), (u'originally', 0.0005653467050119214), (u'ready', 0.0005652932695955688), (u'already', 0.0005646237552612668), (u'real', 0.000564423918761781), (u'suddenly', 0.0005636870867887034), (u'cold', 0.0005633242759626218), (u'original', 0.0005631458713257642), (u'essentially', 0.0005629679671226707), (u'worthy', 0.0005629679671226707), (u'particularly', 0.0005622242086295208), (u'different', 0.0005621765084891854), (u'once', 0.0005610615956620964), (u'no', 0.0005609573672400897), (u'nowhere', 0.0005609573672400897), (u're', 0.0005609573672400897), (u'top', 0.0005591295491650161), (u'perhaps', 0.0005588034632622061), (u'comedy', 0.0005578920264354991), (u'computer', 0.000557121761310243), (u'somewhat', 0.0005567711182308353), (u'worth', 0.0005565775199283053), (u'military', 0.0005565053246429462), (u'favorite', 0.0005564408022864658), (u'new', 0.0005556301367819692), (u'potential', 0.0005552911312073616), (u'steven', 0.0005548014564884839), (u'oddly', 0.000554618865915343), (u'ex', 0.0005545497545301821), (u'visual', 0.0005544145864067186), (u'mysterious', 0.0005536245911977356), (u'virtually', 0.0005535851676706262), (u'thoroughly', 0.0005530565592507927), (u'clear', 0.000552812152943043), (u'complex', 0.0005519010754612848), (u'shallow', 0.0005508289703315881), (u'open', 0.0005504094509330112), (u'familiar', 0.0005498931193655318), (u'true', 0.0005498272607472308), (u'rare', 0.0005494150345808287), (u'successful', 0.0005488937679446039), (u'young', 0.0005464644125943242), (u'epic', 0.0005453752181500873), (u'rarely', 0.0005442952672230574), (u'bright', 0.0005440919823426753), (u'simply', 0.0005430653936967457), (u'english', 0.0005429513282916424), (u'merely', 0.0005426483420593368), (u'low', 0.0005416140097490522), (u'earth', 
0.0005414796808775866), (u'unique', 0.00054041726162145), (u'song', 0.0005401811684534198), (u'aware', 0.0005398290294909338), (u'billy', 0.0005398290294909338), (u'serious', 0.000539794634522505), (u'silly', 0.0005396344263800864), (u'traditional', 0.0005385580279232111), (u'weak', 0.0005376628413277627), (u'future', 0.0005367184686556415), (u'truly', 0.0005366173679316187), (u'emotionally', 0.0005365312956935994), (u'painful', 0.0005364346408033645), (u'indeed', 0.0005360259286960858), (u'wealthy', 0.0005358072318667524), (u'clearly', 0.000535459305092813), (u'humorous', 0.000535459305092813), (u'literally', 0.0005350851196944252), (u'comedic', 0.0005349871187567523), (u'memorable', 0.0005349159673910445), (u'single', 0.000533044995826694), (u'up', 0.0005324341112787293), (u'poorly', 0.0005301281690405149), (u'light', 0.0005284762677285353), (u'fat', 0.0005281528428400844), (u'mostly', 0.0005281528428400844), (u'large', 0.0005280223702998572), (u'eventually', 0.000527196044211751), (u'difficult', 0.0005267526497254501), (u'sympathetic', 0.0005267526497254501), (u'late', 0.0005266354529449301), (u'the', 0.0005262589733901873), (u'classic', 0.0005260356714071764), (u'music', 0.0005259185887458138), (u'surely', 0.0005250391930665246), (u'genuinely', 0.000524923647469459), (u'thus', 0.0005242934870283192), (u'overly', 0.0005235602094240837), (u'star', 0.0005224829250425528), (u'lucky', 0.0005223311948479709), (u'successfully', 0.0005221297170486082), (u'giant', 0.0005217295793212024), (u'psychological', 0.0005207453695884705), (u'constant', 0.0005202040542354678), (u'rich', 0.0005175983436853001), (u'graphic', 0.0005165392870754849), (u'perfect', 0.0005149128185777401), (u'dangerous', 0.0005117100812272425), (u'barely', 0.0005116611137553546), (u'further', 0.0005110944901520818), (u'ultimate', 0.0005109304675300817), (u'own', 0.000510909126296736), (u'hilarious', 0.0005108439229305341), (u'slow', 0.0005107278513499641), (u'standard', 0.0005104712041884816), 
(u'laugh', 0.0005102493566421155), (u'eccentric', 0.0005100410490868422), (u'to', 0.0005096279349436109), (u'meanwhile', 0.0005095719595539747), (u'dead', 0.0005094771025250182), (u'alive', 0.0005090168702734148), (u'over', 0.0005090168702734148), (u'physical', 0.0005082177857046968), (u'beautiful', 0.0005078008720166698), (u'human', 0.0005066196165672606), (u'dark', 0.000505922542794549), (u'movie', 0.0005055541704756365), (u'complete', 0.0005051896757600809), (u'private', 0.0005044720767888307), (u'ill', 0.0005040019257387013), (u'equally', 0.0005037283833095351), (u'initially', 0.000502854438429911), (u'heavily', 0.0005024062615685653), (u'attractive', 0.000500937237411932), (u'recent', 0.0005004011480277578), (u'full', 0.0004998273977331568), (u'key', 0.0004994884756574592), (u'high', 0.0004967698478307461), (u'various', 0.000496184512199295), (u'wide', 0.0004952596575633224), (u'hearted', 0.0004939247258717772), (u'life', 0.0004938342634677713), (u'female', 0.0004935921166287658), (u'blue', 0.0004930191972076788), (u'unusual', 0.000491829287640806), (u'ultimately', 0.000488766651202863), (u'british', 0.0004880646020055018), (u'public', 0.0004880646020055018), (u'possibly', 0.00048703275295263604), (u'david', 0.00048603387317002595), (u'former', 0.00048587973980644136), (u'political', 0.00048477797168896647), (u'of', 0.0004842452486431544), (u'modern', 0.00048378862888960117), (u'down', 0.00048271508670305596), (u'accidentally', 0.0004822265086800772), (u'outstanding', 0.0004812134277794888), (u'painfully', 0.00047993019197207685), (u'sexual', 0.00047947571262361836), (u'narrative', 0.00047871824704285443), (u'jean', 0.00047596382674916705), (u'unable', 0.00047596382674916705), (u'self', 0.0004755441055615576), (u'fresh', 0.00047517840789314535), (u'thin', 0.0004732747655810927), (u'cast', 0.00047240777517000657), (u'directly', 0.00047120418848167544), (u'on', 0.00047051979605105567), (u'elaborate', 0.0004692284895781883), (u'extraordinary', 
0.0004674644727000748), (u'year', 0.00046463135468371064), (u'one', 0.00046339956424181324), (u'ridiculous', 0.0004612316130640738), (u'actual', 0.0004608804660423272), (u'famous', 0.00046053907310451814), (u'local', 0.00046017729849878015), (u'foreign', 0.00045926334160007343), (u'heavy', 0.00045926334160007343), (u'deadly', 0.0004588674249262803), (u'half', 0.0004575831098625123), (u'desperately', 0.00045707637330673985), (u'fellow', 0.00045707637330673985), (u'inevitable', 0.00045499875342807283), (u'initial', 0.00045276433204912905), (u'trouble', 0.00045188232361007235), (u'american', 0.000450472337961557), (u'be', 0.0004477817580600717), (u'alone', 0.00044681343173742094), (u'cheesy', 0.00044453225328459944), (u'desperate', 0.00044437980738155263), (u'two', 0.00044382259132213993), (u'occasional', 0.0004433372741091032), (u'odd', 0.0004406199782281893), (u'paul', 0.0004401957117925704), (u'cinematic', 0.00043958062696007036), (u'sole', 0.0004363001745200698), (u'ugly', 0.0004363001745200698), (u'numerous', 0.0004319371727748691), (u'sharp', 0.000429587864142838), (u'romantic', 0.0004293747749245131), (u'all', 0.0004259938711849501), (u'current', 0.0004246655031995346), (u'frequently', 0.0004234678164459501), (u'green', 0.0004226657940663176), (u'amusing', 0.00042159342706433713), (u'opposite', 0.00042071802543006733), (u'live', 0.0004198360169910106), (u'creative', 0.0004113687359760658), (u'unlikely', 0.00039802822938673035), (u'in', 0.000395902010212656), (u'and', 0.00039267015706806284), (u'central', 0.00038942494915840945), (u'white', 0.0003889657216240245), (u'love', 0.00037332901531098755), (u'previously', 0.0003593060260753516), (u'detective', 0.0003431098459818025)]
## summing scores from a list of seed words for which we know the polarity
def seed_score(pos_seed):
    # accumulate, for every word, its association score with each seed word
    score=defaultdict(int)
    for seed in pos_seed:
        c=dict(getcollocations(seed))
        for w in c:
            score[w]+=c[w]
    return score
# words that are closest to the seed set (still many negatives in there, so we need some more work)
sorted(seed_score(['good','great','perfect','cool']).items(),key=itemgetter(1),reverse=True)
[(u'cool', 0.01836803051789725), (u'perfect', 0.014235784691719532), (u'generally', 0.004914620304679139), (u'great', 0.0044212228153536195), (u'shallow', 0.004387519939375499), (u'green', 0.004031276633069313), (u'quiet', 0.0038389323602368874), (u'sadly', 0.0037656812902670915), (u'cold', 0.00372639619058579), (u'eccentric', 0.003566246811727291), (u'anyway', 0.0035435598196591803), (u'mary', 0.003528786619802258), (u'like', 0.0034720852363373366), (u'willing', 0.0034463352861676677), (u'overall', 0.003416962030468515), (u'off', 0.0033758301235972932), (u'visually', 0.0033742110974529374), (u'therefore', 0.0033455317539115297), (u'close', 0.0033320331697695668), (u'sad', 0.0033247711223444326), (u'nicely', 0.00329676987067692), (u'entirely', 0.0032905492598513672), (u'intelligent', 0.0032582896033018925), (u'lovely', 0.0032553034585332107), (u'surely', 0.0032499006749901536), (u'totally', 0.003242437962550958), (u'minor', 0.0032310265819358213), (u'slowly', 0.0032165337363270928), (u'attractive', 0.003213592455513773), (u'terribly', 0.00320713317502573), (u'sean', 0.0031974453908803564), (u'good', 0.003195157496514654), (u'basically', 0.0031879411336983307), (u'definitely', 0.003181514976699522), (u'desperate', 0.003168203518859585), (u'lucky', 0.0031488310596797884), (u'whatever', 0.0031396648899829774), (u'actually', 0.003130303792115775), (u'utterly', 0.0031247667719795837), (u'wrong', 0.0030847532558246297), (u'believable', 0.0030755305239444446), (u'classic', 0.003066005635935617), (u'computer', 0.003062877208242161), (u'pretty', 0.0030497972634102315), (u'forward', 0.003016719069848052), (u'horrible', 0.003004876906152966), (u'out', 0.0029975287288694433), (u'looking', 0.002994825908284486), (u'incredible', 0.002980813104868238), (u'huge', 0.002967788559403041), (u'subtle', 0.0029647516154097213), (u'sympathetic', 0.0029574799728487563), (u'sure', 0.002942750201925073), (u'perfectly', 0.00291984566130735), (u'mental', 0.0029124135206031318), (u'necessary', 
0.002908556234257104), (u'anti', 0.0028944363752663417), (u'hearted', 0.002893223092613725), (u'climactic', 0.002883438227928318), (u'very', 0.0028788226459838047), (u'past', 0.0028782458455121037), (u'acting', 0.002870504043733427), (u'tight', 0.002868221653510638), (u'memorable', 0.002866545817151434), (u'soft', 0.0028548578086146567), (u'weird', 0.0028444428229098093), (u'outstanding', 0.0028443425210123883), (u'ensemble', 0.0028432682292655605), (u'wonderfully', 0.00284156676875144), (u'interesting', 0.0028386296854845793), (u'mainly', 0.002837331341168971), (u'impossible', 0.002836475423211005), (u'similar', 0.002825617162065585), (u'different', 0.002822047002694978), (u'really', 0.002820943592758371), (u'high', 0.002816492704189309), (u'wonderful', 0.0028127558963416985), (u'funny', 0.0028126166780049876), (u'same', 0.0028115870709538504), (u'constant', 0.0027988950584166682), (u'especially', 0.0027943160712365573), (u'genuinely', 0.0027935832384329714), (u'blue', 0.0027852847247673237), (u'evil', 0.0027836162729910803), (u'incredibly', 0.002781830187028037), (u'apparent', 0.0027805310210175945), (u'effectively', 0.0027769925322682076), (u'nice', 0.002768021053852345), (u'fly', 0.0027663925539353675), (u'slightly', 0.002754343268868139), (u'barely', 0.0027521125520677554), (u'entire', 0.0027417046601842126), (u'necessarily', 0.0027360158931798834), (u'also', 0.0027332608938965353), (u'stupid', 0.002726806900623024), (u'love', 0.002718968110495262), (u'brilliant', 0.0027078034963873266), (u'present', 0.0027074937178435976), (u'third', 0.002701147706793944), (u'inevitable', 0.002698495114571493), (u'paul', 0.0026975221112758327), (u'solid', 0.002696724010656708), (u'second', 0.002694117545555503), (u'moral', 0.002693374009622139), (u'successful', 0.002687012182681193), (u'comic', 0.002682085766238037), (u'final', 0.0026721227616635025), (u'black', 0.0026710705445859924), (u'yet', 0.0026689962966847036), (u'original', 0.0026679941151914128), (u'completely', 
0.0026594370192654596), (u'regular', 0.002651073081898117), (u'somewhere', 0.0026399150876791496), (u'traditional', 0.0026370942869986857), (u'probably', 0.0026316485937714876), (u'no', 0.002629862097209245), (u'famous', 0.0026273424381649635), (u'realistic', 0.00262547541830983), (u'never', 0.002625244395827706), (u'social', 0.002624405668474319), (u'give', 0.0026229033988971594), (u'fantastic', 0.0026222127905733282), (u'individual', 0.0026185379920023863), (u'right', 0.0026181746246368414), (u'only', 0.002617469727038716), (u'dull', 0.00261633648375256), (u'next', 0.0026161268333529906), (u'major', 0.002608481485985101), (u'always', 0.002607315437125177), (u'still', 0.002598904288393414), (u'key', 0.0025986126167018165), (u'nearly', 0.00259743762192478), (u'first', 0.0025949812360450634), (u'just', 0.002594077290946316), (u'particular', 0.0025934035019344534), (u'absolutely', 0.002592714515796534), (u'bad', 0.0025889114735255002), (u'fat', 0.0025853394975499586), (u'physical', 0.002584729666138146), (u'surprising', 0.0025785158544206115), (u'real', 0.0025666908886398223), (u'fully', 0.002563717589503081), (u'young', 0.00256224077034713), (u'elaborate', 0.002543562602388069), (u'late', 0.0025428097935556535), (u'cute', 0.002542474876328181), (u'silly', 0.0025367706469425193), (u'two', 0.002531225625310705), (u'again', 0.002529209527150347), (u'poor', 0.002527607544401137), (u'initial', 0.0025266776157293384), (u'short', 0.0025229835673570736), (u'alive', 0.002515791916264147), (u're', 0.00251410007668532), (u'movie', 0.0025137021481225247), (u'popular', 0.002510852243939196), (u'favorite', 0.0025054240669844224), (u'surprisingly', 0.0024978235225739834), (u'hot', 0.0024926082900537955), (u'unfortunately', 0.002491947150824106), (u'hilarious', 0.0024888159472125394), (u'terrific', 0.002486994559448905), (u'not', 0.0024838196518674758), (u'extremely', 0.0024836795804661924), (u'ever', 0.0024799846740085688), (u'slow', 0.0024765659763576393), (u'so', 
0.002470072047171524), (u'fast', 0.0024699275239350137), (u'international', 0.002466387666107279), (u'little', 0.0024596442383871923), (u'merely', 0.0024589474397309834), (u'further', 0.0024579186591984735), (u'wild', 0.0024570737029157236), (u'powerful', 0.0024570439249516234), (u'unique', 0.0024550283562681952), (u'long', 0.0024542093072923944), (u'sexual', 0.002452222374912764), (u'usual', 0.002451004703699789), (u'too', 0.002450506270197633), (u'together', 0.0024498262842226424), (u'important', 0.0024469273639049914), (u'painful', 0.0024428002565871736), (u'general', 0.0024406913582367905), (u'somewhat', 0.0024383784285768053), (u'professional', 0.002437781447361272), (u'exactly', 0.0024375182927824286), (u'able', 0.002431802338319177), (u'disappointing', 0.0024294287360479513), (u'once', 0.0024280866937187373), (u'much', 0.0024268078320277466), (u'thankfully', 0.0024263668609984675), (u'well', 0.0024262964525530823), (u'trouble', 0.002424968855520134), (u'new', 0.0024245287179007918), (u'else', 0.0024238107605589858), (u'away', 0.002420198956604946), (u'however', 0.002419841628909794), (u'eventually', 0.0024192507548939125), (u'strong', 0.0024176279579377715), (u'female', 0.0024166696982525944), (u'future', 0.002411977424524619), (u'here', 0.0024116498209578057), (u'fair', 0.002410216236868189), (u'actual', 0.0024067205090930856), (u'effective', 0.0024056607781946858), (u'male', 0.0024051920208348716), (u'all', 0.0023966930782590217), (u'ultimate', 0.0023952146015562177), (u'hardly', 0.002393683141220838), (u'common', 0.0023916366049515445), (u'former', 0.002391407046403267), (u'even', 0.0023895010595202347), (u'literally', 0.0023869605379775384), (u'last', 0.0023856404344273687), (u'responsible', 0.002384712475009501), (u'comedic', 0.0023833468102536985), (u'as', 0.002382005332303402), (u'shot', 0.0023801158805992965), (u'hard', 0.0023792820325943408), (u'possible', 0.002378353733308164), (u'typical', 0.0023766948299850463), (u'certainly', 
0.0023757811271592727), (u'clear', 0.00237362502544932), (u'then', 0.0023720256544745794), (u'special', 0.002371843688843257), (u'essentially', 0.0023630257444100496), (u'soon', 0.002357546474735235), (u'scary', 0.0023563717854545954), (u'quick', 0.0023473630392837866), (u'spectacular', 0.0023470585574532337), (u'other', 0.0023456108036209035), (u'later', 0.00234524260869092), (u'practically', 0.0023435453372715248), (u'straight', 0.0023371369082872784), (u'normal', 0.0023315921809888523), (u'seriously', 0.002328428800073818), (u'remarkable', 0.002325943042587466), (u'supposedly', 0.0023247640235971347), (u'big', 0.002322568876824087), (u'maybe', 0.0023225510972794436), (u'certain', 0.0023224311981699755), (u'billy', 0.002321059790449179), (u'easy', 0.0023189144504748046), (u'likable', 0.0023167935895555937), (u'whole', 0.002315434359066139), (u'easily', 0.0023119519296581773), (u'particularly', 0.0023108098883486295), (u'dead', 0.002303688222871656), (u'badly', 0.002303117290501056), (u'awful', 0.0022977472494975833), (u'many', 0.0022958937460551363), (u'visual', 0.0022940334855966285), (u'truly', 0.0022931001606245156), (u'co', 0.002289362745168104), (u'occasionally', 0.0022872519083199224), (u'ridiculous', 0.002283616768782563), (u'simple', 0.002281308555367873), (u'numerous', 0.0022812106887150517), (u'english', 0.002281201759352531), (u'obviously', 0.00227965655105327), (u'light', 0.0022778656862701215), (u'technical', 0.002276158023703666), (u'total', 0.0022756510142716104), (u'back', 0.0022745184635103223), (u'sharp', 0.002271797549489778), (u'there', 0.0022643260623824704), (u'relatively', 0.002262957154496509), (u'few', 0.0022623065819445416), (u'modern', 0.002261513095689698), (u'complete', 0.0022609091817375844), (u'disturbing', 0.002259201977809433), (u'rich', 0.002251969643273991), (u'far', 0.002250680245334822), (u'ahead', 0.002248176753787567), (u'fresh', 0.0022469860426400155), (u'now', 0.0022465177203474275), (u'such', 0.0022452292343507884), 
(u'worthy', 0.00224307369529924), (u'thin', 0.0022405183075006065), (u'laugh', 0.002236045366253346), (u'constantly', 0.002235679637427152), (u'single', 0.00223521992903317), (u'basic', 0.002232160715715428), (u'clearly', 0.002228490855521868), (u'usually', 0.0022177495835547746), (u'often', 0.0022175746655211962), (u'private', 0.002216483776545628), (u'various', 0.002212771836886069), (u'criminal', 0.0022077352184160784), (u'old', 0.0022071295402323414), (u'rare', 0.002206383285594105), (u'and', 0.0022039288310718346), (u'serious', 0.002202448873556133), (u'epic', 0.002200503937262294), (u'early', 0.0021999110030616106), (u'musical', 0.002197821272071629), (u'potential', 0.002196168886167113), (u'beautiful', 0.00219266018688451), (u'superior', 0.0021878102643613584), (u'intense', 0.0021849912646545433), (u'rarely', 0.002182915232381559), (u'almost', 0.0021827639379282137), (u'human', 0.002181964429166975), (u'top', 0.0021818912489780643), (u'deadly', 0.002179747441952342), (u'ago', 0.0021771875760520647), (u'quite', 0.00217716996792519), (u'rather', 0.0021709921383870814), (u'frequently', 0.0021679379865216565), (u'instead', 0.002167905989219789), (u'enough', 0.0021667069012216832), (u'seemingly', 0.0021624517939975384), (u'natural', 0.002161209011346006), (u'extra', 0.0021597403058495348), (u'difficult', 0.0021552941515333313), (u'true', 0.0021499827474573394), (u'up', 0.002149383094462342), (u'emotionally', 0.0021459115484230838), (u'song', 0.0021406378342695626), (u'american', 0.0021403246942986046), (u'several', 0.002139164769962724), (u'ready', 0.002137019223248206), (u'weak', 0.002131642351726366), (u'wealthy', 0.0021269001542290247), (u'star', 0.002115006546130686), (u'national', 0.0021147161369235675), (u'emotional', 0.002114499497453295), (u'full', 0.0021131329326073193), (u'creative', 0.0021078831775885464), (u'of', 0.0021047860848702653), (u'suddenly', 0.002102724335000132), (u'cinematic', 0.002099879500881235), (u'sometimes', 0.002098097167524616), 
(u'dark', 0.0020945422999583195), (u'mean', 0.0020915701447731016), (u'directly', 0.002091255612454668), (u'dramatic', 0.002082295634128307), (u'due', 0.0020775561173657585), (u'main', 0.002077327688112775), (u'highly', 0.0020734043999109187), (u'perhaps', 0.0020718721850718756), (u'recent', 0.002071123097896627), (u'accidentally', 0.0020672308456623984), (u'dangerous', 0.0020655845472295226), (u'available', 0.002060730486497321), (u'obvious', 0.0020567832665825863), (u'white', 0.002056694729345633), (u'initially', 0.0020457429869681775), (u'ultimately', 0.0020445486115517135), (u'simply', 0.002040809336654068), (u'virtually', 0.0020385289886886617), (u'opposite', 0.002034424295999083), (u'immediately', 0.002033911005728943), (u'the', 0.0020321660786504357), (u'brief', 0.002031927792962111), (u'quickly', 0.0020285452399449004), (u'unable', 0.0020276856993266543), (u'naturally', 0.002027352047162), (u'originally', 0.0020264246642947964), (u'capable', 0.0020229531124203127), (u'predictable', 0.002020379107232477), (u'flat', 0.0020194064080728694), (u'thoroughly', 0.0020183942576732744), (u'non', 0.0020174620063781877), (u'personal', 0.0020086691296012722), (u'happy', 0.002007374294492547), (u'small', 0.0020040568969961446), (u'political', 0.001999982549818061), (u'music', 0.0019927175648187696), (u'aware', 0.0019904472963272134), (u'entertaining', 0.001990095838036924), (u'offensive', 0.0019832429280615565), (u'gary', 0.0019823038097920038), (u'clever', 0.0019820969802297793), (u'already', 0.001981432011068098), (u'impressive', 0.0019801420317751032), (u'finally', 0.0019792514454253494), (u'dimensional', 0.0019747302138405634), (u'amusing', 0.0019594628873836764), (u'critical', 0.001955164082273457), (u'possibly', 0.0019511120404093365), (u'painfully', 0.0019487341117563886), (u'own', 0.0019485898408926775), (u'equally', 0.001941586594786285), (u'tough', 0.001935686228433603), (u'comedy', 0.001928955676509114), (u'cast', 0.0019216965131673314), (u'foreign', 
0.0019214942499954125), (u'average', 0.001919281774346123), (u'thus', 0.0019142452806710012), (u'on', 0.0019121680081721627), (u'british', 0.00190909801041013), (u'suspenseful', 0.0019074161664669616), (u'double', 0.001903480651695787), (u'familiar', 0.001901770791001876), (u'time', 0.0018998176942151696), (u'standard', 0.0018994180833314393), (u'nowhere', 0.0018987297134157972), (u'magic', 0.0018913580242581073), (u'likely', 0.001888523417620946), (u'ill', 0.0018874309629878874), (u'over', 0.0018741371896421003), (u'pathetic', 0.0018694014862299202), (u'previous', 0.0018656264501616096), (u'indeed', 0.0018505566549915182), (u'frankly', 0.0018490553767872694), (u'unnecessary', 0.0018442958286519423), (u'meanwhile', 0.0018373819237956107), (u'sweet', 0.0018317965315737415), (u'bright', 0.0018209271692124643), (u'unusual', 0.0018096760600330329), (u'graphic', 0.0018072146818752881), (u'apparently', 0.0018057793360379136), (u'earth', 0.0018034447908819122), (u'year', 0.0018024798404822854), (u'interested', 0.0018019685951838995), (u'unbelievable', 0.00179761403594268), (u'heavily', 0.001782206560976708), (u'odd', 0.0017814712511314912), (u'romantic', 0.0017799989468994474), (u'open', 0.0017782724435018858), (u'complex', 0.0017774284447795035), (u'friendly', 0.0017772249575965825), (u'local', 0.0017699980405700344), (u'ex', 0.001765399600161043), (u'self', 0.0017556819143849144), (u'to', 0.0017544354081247865), (u'enjoyable', 0.0017532791488678407), (u'appropriate', 0.0017216651035689122), (u'life', 0.0017144139154199722), (u'in', 0.0017140816145215269), (u'be', 0.0017024490343991747), (u'steven', 0.0017010809857261344), (u'one', 0.0016979741150993989), (u'overly', 0.0016902408532289107), (u'screen', 0.0016889626085993827), (u'david', 0.0016848396539122083), (u'unfunny', 0.0016754948810989746), (u'about', 0.0016658770266088703), (u'fairly', 0.0016657738475276476), (u'detective', 0.0016518359786180817), (u'humorous', 0.001649419643652082), (u'large', 
0.0016448568761717962), (u'animal', 0.001639990007664387), (u'terrible', 0.0016251396126525615), (u'mostly', 0.0016236936700630583), (u'anywhere', 0.0016195641003155631), (u'narrative', 0.001614732454952487), (u'occasional', 0.0016102652278808267), (u'central', 0.0016081788011969805), (u'giant', 0.0016035176486315535), (u'free', 0.0015934002865996032), (u'desperately', 0.0015771875152043742), (u'half', 0.0015768131466758427), (u'mysterious', 0.0015751527176854929), (u'largely', 0.0015751355449538386), (u'down', 0.0015691734113758155), (u'worth', 0.0015565706237899838), (u'previously', 0.001552396463687127), (u'low', 0.0015113894893284531), (u'heavy', 0.0014976075171991509), (u'minute', 0.0014966726528048834), (u'psychological', 0.0014689363873557058), (u'fellow', 0.0014653527083804605), (u'positive', 0.0014402283127099625), (u'sole', 0.0014359629388887166), (u'cheesy', 0.001433531207507379), (u'critic', 0.0014261314813257182), (u'jean', 0.0014204952971273396), (u'successfully', 0.0014128102927179332), (u'military', 0.0013847281416612812), (u'wide', 0.0013709124940004726), (u'ugly', 0.0013535263057754747), (u'live', 0.0013501882237028178), (u'extraordinary', 0.0013479763826237494), (u'recently', 0.0012873275710996563), (u'poorly', 0.0012552509687996207), (u'alone', 0.001210747694372369), (u'current', 0.0012097881183875095), (u'public', 0.00115049929567981), (u'oddly', 0.0011000750444722194), (u'unlikely', 0.0010737672952186752), (u'laughable', 0.0009970586913405925)]
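The seed summation above can be sketched on a toy example. This is only an illustration: `TOY_COLLOCATIONS`, `toy_getcollocations` and `toy_seed_score` are hypothetical names, the scores are made up, and the only assumption carried over from the notebook is that `getcollocations(seed)` yields `(word, score)` pairs.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical association scores standing in for getcollocations();
# the numbers are invented for illustration, not taken from the corpus.
TOY_COLLOCATIONS = {
    'good': {'very': 0.25, 'pretty': 0.5},
    'great': {'very': 0.5, 'truly': 0.125},
}

def toy_getcollocations(seed):
    """Return (word, score) pairs for a seed word (empty if the seed is unknown)."""
    return list(TOY_COLLOCATIONS.get(seed, {}).items())

def toy_seed_score(seeds):
    """Sum each word's score over all seed words, as seed_score does."""
    score = defaultdict(float)
    for seed in seeds:
        for word, s in toy_getcollocations(seed):
            score[word] += s
    return score

toy_scores = toy_seed_score(['good', 'great'])
# Rank words from most to least associated with the seed set.
toy_ranked = sorted(toy_scores.items(), key=itemgetter(1), reverse=True)
print(toy_ranked)
```

A word that co-occurs with several seeds ('very' here) accumulates a higher total than a word tied to a single seed, which is why frequent intensifiers and the seed words themselves can dominate the top of the ranking.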
posscores=seed_score(['good','great','perfect','cool'])
negscores=seed_score(['bad','terrible','wrong',"crap"])
## the sentiment polarity score of a word is the difference between its score against the positive seeds
## and its score against the negative seeds (words missing from either dict count as 0)
sentscores={}
for w in terms:
    sentscores[w]=posscores[w]-negscores[w]
sorted(sentscores.items(),key=itemgetter(1),reverse=False)
[(u'terrible', -0.009855717788299525), (u'wrong', -0.002892807170410995), (u'laughable', -0.0022372681494660608), (u'frankly', -0.0013849740220763036), (u'bad', -0.0013658125844714167), (u'poorly', -0.0013222754461456841), (u'anywhere', -0.0012546869557127096), (u'ugly', -0.001176343204772265), (u'current', -0.001117276129549125), (u'successfully', -0.001037488811300721), (u'unfunny', -0.0010133916696818575), (u'foreign', -0.000904334983761761), (u'sole', -0.0008546952219981477), (u'terribly', -0.0007390733994578876), (u'oddly', -0.0007073190373175831), (u'total', -0.0006894899408381215), (u'military', -0.0006728887380600891), (u'positive', -0.000609054423130711), (u'pathetic', -0.0005971148063656133), (u'awful', -0.000573324177411706), (u'earth', -0.0005105042292985202), (u'unnecessary', -0.0005082660511734676), (u'about', -0.0004718460547538108), (u'graphic', -0.00043745568496709455), (u'recently', -0.0004229737674965365), (u'critic', -0.00042093904961075274), (u'public', -0.0004157576360775922), (u'horrible', -0.00039944889995236713), (u'low', -0.0003892064134552527), (u'giant', -0.00037211464872255115), (u'worth', -0.0003552629512555594), (u'painful', -0.00035280475832298163), (u'offensive', -0.00033902732360354196), (u'desperately', -0.0003089528104637091), (u'entertaining', -0.0003081982911766674), (u'cheesy', -0.0003000668539786858), (u'one', -0.0002932272950073745), (u'ill', -0.0002868316466521808), (u'the', -0.0002369155377649466), (u'superior', -0.0002351620574958151), (u'unbelievable', -0.00022197959492024325), (u'half', -0.00020857413062678088), (u'jean', -0.00020618265558128746), (u'overly', -0.000192769761486823), (u'fellow', -0.0001800691312555103), (u'live', -0.00017944376467854146), (u'to', -0.00015489195817700615), (u'fairly', -0.00015233442125027106), (u'ex', -0.000133240536831384), (u'gary', -0.00011401582564645818), (u'already', -0.00011372581575243724), (u'fair', -9.02324502693951e-05), (u'stupid', -7.923432614896815e-05), (u'alone', 
-7.62403716351636e-05), (u'heavy', -7.111023622717464e-05), (u'seriously', -6.864833713224761e-05), (u'friendly', -6.761240116471266e-05), (u'previously', -6.525170730300057e-05), (u'minute', -6.519403682034059e-05), (u'painfully', -6.447850775579854e-05), (u'complete', -6.009777417811035e-05), (u'interested', -4.968938910418128e-05), (u'free', -2.996631042662557e-05), (u'apparently', -2.6996430693535575e-05), (u'dull', -2.052676565124454e-05), (u'equally', -9.762344666475664e-06), (u'standard', -3.637292259084249e-06), (u'ridiculous', 1.045081253261233e-05), (u'nowhere', 1.3030279211109991e-05), (u'tough', 2.8891238996220023e-05), (u'psychological', 2.92245726104584e-05), (u'predictable', 2.9484108354909007e-05), (u'occasional', 3.871281960579815e-05), (u'mostly', 4.2000269254770326e-05), (u'central', 5.328131393685185e-05), (u'silly', 5.40261995142597e-05), (u'international', 5.6200091819184374e-05), (u'over', 6.536716625644957e-05), (u'happy', 6.91126047207718e-05), (u'double', 6.995106926406145e-05), (u'brief', 8.330731593712165e-05), (u'unlikely', 8.62313866111619e-05), (u'down', 9.272424872767753e-05), (u'obvious', 9.409962691220991e-05), (u'thankfully', 0.00010456881380261187), (u'amusing', 0.00011606002779824664), (u'possibly', 0.0001170851976322212), (u'frequently', 0.00013607217060806467), (u'screen', 0.00014518477176753792), (u'elaborate', 0.00014565575042842227), (u'indeed', 0.00015197075399033238), (u'song', 0.00015267859031716782), (u'appropriate', 0.0001751277874693666), (u'of', 0.0001804054214429602), (u'supposedly', 0.0001813588769441093), (u'female', 0.000187585537309287), (u'extraordinary', 0.00018831007560704592), (u'sweet', 0.00019714980251914747), (u'odd', 0.00020447281519450286), (u'wide', 0.00021107841013971252), (u'largely', 0.0002230787156980189), (u'unique', 0.0002248681331836178), (u'weak', 0.00022527489860168668), (u'bright', 0.00022655330862714424), (u'badly', 0.00022762041065908269), (u'responsible', 0.00023230990156556134), 
(u'animal', 0.00023587487871816514), (u'complex', 0.00023957432196753032), (u'maybe', 0.0002527025964321641), (u'ahead', 0.00026528322946812576), (u'due', 0.00026764984856503596), (u'large', 0.0002742074206140816), (u'mean', 0.00027969318616533913), (u'such', 0.0002824625084295192), (u'spectacular', 0.000284731383087323), (u'single', 0.00029996981466053584), (u'david', 0.0003010343209289986), (u'early', 0.00030347445343878064), (u'there', 0.0003038652637019942), (u'unable', 0.0003089529589726911), (u'huge', 0.00031045486780024393), (u'truly', 0.0003143382466061274), (u'potential', 0.00034506606430874886), (u'aware', 0.00035049698049125226), (u'cinematic', 0.0003555319660876414), (u'hardly', 0.0003568441532017749), (u'opposite', 0.0003592578641072545), (u'romantic', 0.00035927730492690433), (u'capable', 0.00037030876580039463), (u'instead', 0.0003732553191903758), (u'meanwhile', 0.0003773411383428758), (u'available', 0.0003778464154304529), (u'up', 0.00038072585383780227), (u'possible', 0.00038259323279369803), (u'top', 0.000395909835942861), (u'finally', 0.00040172767143438004), (u'poor', 0.0004049527571028011), (u'national', 0.0004081816531386359), (u'even', 0.00040860453897041725), (u'absolutely', 0.0004115845765526563), (u'local', 0.00041357643629792045), (u'merely', 0.0004142841255082557), (u'easily', 0.0004149536193439774), (u'now', 0.00041557406708773526), (u'climactic', 0.0004205084905875853), (u'back', 0.000423291524248446), (u'whole', 0.00042383737917805336), (u'flat', 0.00043316669872226756), (u'right', 0.0004343735579328614), (u'simple', 0.0004346085198177101), (u'future', 0.00043509480822239224), (u'completely', 0.00043809695875969166), (u'open', 0.0004403087704105159), (u'straight', 0.000447631071003372), (u'basic', 0.00044899193653893314), (u'enjoyable', 0.0004501205216749073), (u'extremely', 0.00045181378697531447), (u'be', 0.00045577667832395166), (u'enough', 0.0004580487416557797), (u'on', 0.00046164403065117494), (u'usual', 
0.00046192139465692835), (u'year', 0.00046657720097720116), (u'physical', 0.00046966667466785596), (u'so', 0.0004702228437914336), (u'laugh', 0.0004705173665808082), (u'mysterious', 0.0004715847558962325), (u'steven', 0.00047178890762968246), (u'practically', 0.0004770875391759445), (u'then', 0.00047833134786841704), (u'deadly', 0.00047872745964739846), (u'simply', 0.00047928947054113203), (u'previous', 0.0004864957003770933), (u'perhaps', 0.0004868518033162277), (u'obviously', 0.0004890121054869465), (u'detective', 0.0004894785073747004), (u'incredibly', 0.0004925436804560221), (u'dangerous', 0.0004971554463562978), (u'immediately', 0.000501778461674787), (u'originally', 0.0005042280610767363), (u'modern', 0.00050532562264158), (u'easy', 0.0005160148279703838), (u'disappointing', 0.0005186392844217117), (u'almost', 0.0005210870768316755), (u'else', 0.0005235152247228339), (u'later', 0.0005242486416186713), (u'impressive', 0.0005288163131077718), (u'thoroughly', 0.000535035438876993), (u'likable', 0.0005376939884275277), (u'accidentally', 0.0005387884356078381), (u'ultimate', 0.0005392830539742888), (u'anti', 0.0005405223310637991), (u'give', 0.0005434451039073836), (u'likely', 0.0005437373193750207), (u'self', 0.0005457375222126925), (u'small', 0.000547757267443128), (u'here', 0.0005489917746541125), (u'quick', 0.0005495308938904965), (u'in', 0.0005517231263248987), (u'big', 0.000553192795083189), (u'white', 0.0005571827961379774), (u'magic', 0.0005600402871066973), (u'music', 0.0005637757253686389), (u'seemingly', 0.0005673230234262952), (u'ultimately', 0.0005810759372053523), (u'serious', 0.0005854546565258667), (u'far', 0.0005878709423325309), (u'ever', 0.0005935266537370389), (u'paul', 0.0006015292658011918), (u'often', 0.000601672706603956), (u'dimensional', 0.0006021707502093074), (u'personal', 0.0006024528384928436), (u'main', 0.0006061750765146369), (u'rather', 0.0006090053018368551), (u'shot', 0.0006166843447406927), (u'time', 0.0006197846233195092), 
(u'thus', 0.0006224508013473646), (u'suddenly', 0.0006225691668584994), (u'old', 0.0006231237833857315), (u'full', 0.000624230460502307), (u'few', 0.000625522981820726), (u'critical', 0.0006271427574530369), (u'cast', 0.0006278059321773298), (u'emotional', 0.0006324621028321394), (u'dramatic', 0.0006337923808277456), (u'own', 0.0006353768623771551), (u'funny', 0.0006364526576666772), (u'technical', 0.0006420939225105851), (u'exactly', 0.0006428260279371192), (u'other', 0.0006498393372772956), (u'quite', 0.0006520081935745188), (u'clear', 0.0006539342315067483), (u'somewhere', 0.0006600920831501596), (u'just', 0.0006614607540827121), (u'common', 0.0006616229182801092), (u'hard', 0.0006619152572155417), (u'initially', 0.0006631595206745941), (u'narrative', 0.00066571276719812), (u'many', 0.000666141040993326), (u'not', 0.000672503120031562), (u'private', 0.0006733306849319425), (u'too', 0.0006818211717397056), (u'hilarious', 0.000683333186303766), (u'much', 0.0006841450446583987), (u'non', 0.0006871977899058012), (u'short', 0.000688800750842894), (u'usually', 0.0006989595906073285), (u'and', 0.0007005594235027154), (u'evil', 0.000701845065714783), (u'directly', 0.0007058816437173249), (u'necessarily', 0.0007099367697196331), (u'special', 0.0007110126412170818), (u'rich', 0.0007113668767104394), (u'clever', 0.0007128021013105938), (u'musical', 0.0007131823283974886), (u'typical', 0.0007183468693659971), (u'comedy', 0.000720336688148201), (u'certain', 0.0007265186777930407), (u'rarely', 0.0007327426410832743), (u'quickly', 0.0007337175378568676), (u'certainly', 0.0007351596198168088), (u'essentially', 0.0007375733628729414), (u'constantly', 0.0007376214223499883), (u'political', 0.000740313723538055), (u'dead', 0.00074437860122146), (u'as', 0.0007449121089841216), (u'alive', 0.0007452446800648737), (u'various', 0.0007474671137830931), (u'particular', 0.000751232883341537), (u'humorous', 0.0007585234451972629), (u'difficult', 0.0007601149087694834), (u'weird', 
0.0007629208107821734), (u'once', 0.0007711851652305634), (u'average', 0.0007762013016956169), (u'never', 0.0007782027708719897), (u'major', 0.0007806552606664213), (u'utterly', 0.0007845844987133164), (u'american', 0.0007849357525979634), (u'only', 0.0007875764912710663), (u'away', 0.000789959495941285), (u'fresh', 0.0007924534921635159), (u'last', 0.0007952742073522234), (u'star', 0.0007961050312125886), (u'literally', 0.0007963635287101941), (u'really', 0.0007989344758660985), (u'ago', 0.000800378971986425), (u'favorite', 0.000800875986490055), (u'unfortunately', 0.0008097636860386844), (u'eventually', 0.000810251304977691), (u'worthy', 0.0008118883366741764), (u'beautiful', 0.0008158724141659971), (u'entire', 0.0008176639014100804), (u'out', 0.0008176780862230349), (u'recent', 0.0008191931673215587), (u'heavily', 0.0008204630367004118), (u'long', 0.0008214448818757603), (u'life', 0.0008238209176573776), (u'normal', 0.0008299171776015202), (u'sometimes', 0.000833889437982206), (u'true', 0.0008341628074066598), (u'comedic', 0.0008402304397220433), (u'wild', 0.0008460609266492368), (u'next', 0.0008506094512549538), (u'inevitable', 0.0008532965586213147), (u'male', 0.0008542524439990413), (u'movie', 0.0008568299207744503), (u'acting', 0.0008587059212509835), (u'together', 0.0008606082608744094), (u'soon', 0.0008612898410095002), (u'virtually', 0.0008620769385564711), (u'several', 0.0008635915908004243), (u'new', 0.0008717183653308082), (u'unusual', 0.000874049125520341), (u'totally', 0.0008753879812317773), (u'first', 0.0008767644151872808), (u'occasionally', 0.0008812392433480141), (u'ready', 0.0008867537055616045), (u'familiar', 0.0008871839350435358), (u'naturally', 0.0008931806678361168), (u'english', 0.0008953479422171709), (u'able', 0.0009025465029791912), (u'blue', 0.0009034182359929403), (u'third', 0.0009059647687182104), (u'clearly', 0.0009255258001603715), (u'little', 0.0009273497091342535), (u'soft', 0.0009315285079749412), (u'relatively', 
0.000932682230440861), (u'numerous', 0.0009345468307633423), (u'creative', 0.0009391832311759108), (u'strong', 0.0009461852637215791), (u'real', 0.0009476095907813163), (u'attractive', 0.0009515625205987143), (u'well', 0.0009556768775209261), (u'sure', 0.000957112811378156), (u'again', 0.0009573117992908559), (u'nearly', 0.000963154733930963), (u'fat', 0.0009731769297739406), (u'regular', 0.0009743180446978744), (u'brilliant', 0.0009763815352689291), (u'dark', 0.0009772170725694694), (u'however', 0.0009825371290734996), (u'british', 0.0009831906491023189), (u'mental', 0.0009840764115491539), (u'successful', 0.0009841226854034244), (u'further', 0.000991798712094388), (u'impossible', 0.0009942979187305947), (u'no', 0.0009994894636115113), (u'still', 0.0010004181941142172), (u'extra', 0.001000860658100024), (u'disturbing', 0.001012984943329316), (u'particularly', 0.0010143822405221036), (u'light', 0.0010158854563900267), (u'scary', 0.0010190982983072799), (u'trouble', 0.001020603686622701), (u'probably', 0.0010229791746550003), (u'visual', 0.0010354518033044457), (u'entirely', 0.0010390508978398907), (u'subtle', 0.0010431120412786679), (u'general', 0.001044149818511135), (u'interesting', 0.001047814632179693), (u'present', 0.0010513260178107865), (u'human', 0.0010549180635384455), (u'thin', 0.001060201360202429), (u'important', 0.0010609448834153767), (u'billy', 0.0010614612617553588), (u'powerful', 0.001065105346526924), (u'remarkable', 0.0010801064408230525), (u'fast', 0.001080821099915891), (u'suspenseful', 0.0010865325836996308), (u'criminal', 0.0010893235304361155), (u'realistic', 0.0010908174254513768), (u'late', 0.001091566540796039), (u'necessary', 0.001093511991173771), (u're', 0.0010975283450386578), (u'slow', 0.0010992769336009133), (u'professional', 0.0011030029044699398), (u'ensemble', 0.0011086940528515163), (u'moral', 0.0011097710285118065), (u'famous', 0.0011131365147124647), (u'natural', 0.001122529450799036), (u'former', 0.0011246773108892236), 
(u'original', 0.001128075984996672), (u'co', 0.0011286547003453904), (u'pretty', 0.0011358695479615653), (u'different', 0.0011388889967668557), (u'good', 0.0011441582764286306), (u'individual', 0.0011444814831923626), (u'past', 0.0011477950766972849), (u'surprising', 0.0011522704428492743), (u'epic', 0.001153369353919784), (u'fly', 0.0011553456814920866), (u'two', 0.0011625152220849123), (u'cute', 0.00116783314798835), (u'highly', 0.0011731709941409391), (u'key', 0.0011735971782954077), (u'yet', 0.001186086529327471), (u'genuinely', 0.0011881735919781282), (u'initial', 0.0011920844650729454), (u'always', 0.0012053506155200685), (u'social', 0.0012062341230046482), (u'hot', 0.0012147898735283), (u'all', 0.0012153680273157782), (u'popular', 0.0012223294117901093), (u'second', 0.0012287549328360563), (u'comic', 0.0012292978367918522), (u'also', 0.001244627079060148), (u'slightly', 0.0012516757235332284), (u'similar', 0.0012528991713053467), (u'wealthy', 0.0012540156628997642), (u'desperate', 0.0012569748772776634), (u'rare', 0.0012578526078817402), (u'intense', 0.00125833299075402), (u'surprisingly', 0.0012584850456408068), (u'young', 0.0012594256270070184), (u'believable', 0.0012598114191262274), (u'actual', 0.0012604891484779095), (u'same', 0.00126803310786752), (u'effective', 0.0012723696620154205), (u'very', 0.0012726233793848608), (u'high', 0.0012981881983572924), (u'terrific', 0.0013008601686946928), (u'actually', 0.0013010291440174549), (u'final', 0.001320570482805893), (u'mainly', 0.0013263575562702952), (u'intelligent', 0.001334069929045109), (u'fantastic', 0.001340352895863594), (u'whatever', 0.0013462640084586114), (u'fully', 0.0013561219995253926), (u'basically', 0.001361770536748289), (u'black', 0.0013692127463810653), (u'effectively', 0.0013842696658279276), (u'emotionally', 0.001388499806196994), (u'nice', 0.0013909346255527887), (u'lovely', 0.0013961014520419077), (u'tight', 0.0014159678767012348), (u'especially', 0.0014182485621920152), (u'apparent', 
0.0014586719518504473), (u'sharp', 0.0014619769627225439), (u'memorable', 0.0014656373965283931), (u'forward', 0.0014997473704841816), (u'therefore', 0.001499850682161086), (u'classic', 0.0015069583117860513), (u'sadly', 0.0015086339919077834), (u'sexual', 0.0015304010624382883), (u'barely', 0.001538589560672312), (u'sympathetic', 0.0015470191484813604), (u'love', 0.0015536385250374978), (u'looking', 0.0015587191721389278), (u'wonderfully', 0.0015793179219199137), (u'traditional', 0.0015967771118905387), (u'somewhat', 0.0016020153067423828), (u'close', 0.0016101565484702794), (u'sean', 0.0016256896243415598), (u'solid', 0.0016490646385887168), (u'wonderful', 0.0016554449555764227), (u'anyway', 0.001657321486801025), (u'constant', 0.0016655798457468024), (u'incredible', 0.0017345921494111424), (u'minor', 0.0017420880132325658), (u'mary', 0.0017445623912850411), (u'like', 0.0017654873172872217), (u'eccentric', 0.0017903988553942978), (u'perfectly', 0.0017942758139415226), (u'sad', 0.001823676709594714), (u'overall', 0.0018272778033300422), (u'visually', 0.001846739019031568), (u'nicely', 0.0019101214163914795), (u'slowly', 0.0019188497875388232), (u'shallow', 0.001926157429543946), (u'lucky', 0.0019562832861092887), (u'computer', 0.0019639866708498765), (u'surely', 0.002010656098858258), (u'hearted', 0.002040214009628856), (u'willing', 0.0021015517297600377), (u'off', 0.00219411315635401), (u'outstanding', 0.0022042146455797927), (u'definitely', 0.002310570684473473), (u'cold', 0.0023123785439538133), (u'great', 0.002751364025528922), (u'green', 0.002834670446207629), (u'quiet', 0.0030352435008745037), (u'generally', 0.003558180988855352), (u'perfect', 0.012866406697882273), (u'cool', 0.016159462179642536)]
We got a reasonably good sentiment lexicon, tailored to the specific data we are working with, without using any labels! These lexicons are very similar to the ones we obtained in the last lecture when we used the labels, but this method can also be applied to datasets that have no sentiment annotations.
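Since the full printed list is long, it can be easier to judge a lexicon like this by looking only at its two ends. The snippet below is a minimal sketch of that idea, not part of the original notebook: `lexicon_extremes` and `sample` are hypothetical names, and `sample` holds just a handful of `(word, score)` pairs copied from the output above, assuming the scores are the ones printed there.

```python
from __future__ import print_function
from operator import itemgetter


def lexicon_extremes(scored_words, n=5):
    """Return the n lowest- and n highest-scoring (word, score) pairs.

    scored_words: any iterable of (word, score) tuples (hypothetical input,
    shaped like the list printed above).
    """
    ranked = sorted(scored_words, key=itemgetter(1))  # ascending by score
    lowest = ranked[:n]
    highest = ranked[-n:][::-1]  # highest score first
    return lowest, highest


# A few entries copied from the printed output, for illustration only.
sample = [
    ('thus', 0.0006224508013473646),
    ('suddenly', 0.0006225691668584994),
    ('great', 0.002751364025528922),
    ('green', 0.002834670446207629),
    ('quiet', 0.0030352435008745037),
    ('generally', 0.003558180988855352),
    ('perfect', 0.012866406697882273),
    ('cool', 0.016159462179642536),
]

lowest, highest = lexicon_extremes(sample, n=3)
print('highest:', [word for word, _ in highest])
print('lowest: ', [word for word, _ in lowest])
```

Applied to the full list, the same call would show the strongest entries at a glance instead of requiring a scan through every tuple.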