Info/CS 4300: Language and Information - in-class demo

Sentiment analysis

Building lexicons tailored to a domain for which we don't have sentiment labels

In [326]:
%matplotlib inline

from __future__ import print_function
import json
from operator import itemgetter
from collections import defaultdict

from matplotlib import pyplot as plt
import numpy as np

from nltk.tokenize import TreebankWordTokenizer
from nltk import FreqDist,pos_tag
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.datasets import load_files
from sklearn.naive_bayes import MultinomialNB

tokenizer = TreebankWordTokenizer()

Using the movie review data, but this time we will not use the sentiment labels (we will pretend we don't have labels).

In [327]:
## loading movie review data: 
## http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz
data = load_files('txt_sentoken')
print(data.data[0])
arnold schwarzenegger has been an icon for action enthusiasts , since the late 80's , but lately his films have been very sloppy and the one-liners are getting worse . 
it's hard seeing arnold as mr . freeze in batman and robin , especially when he says tons of ice jokes , but hey he got 15 million , what's it matter to him ? 
once again arnold has signed to do another expensive blockbuster , that can't compare with the likes of the terminator series , true lies and even eraser . 
in this so called dark thriller , the devil ( gabriel byrne ) has come upon earth , to impregnate a woman ( robin tunney ) which happens every 1000 years , and basically destroy the world , but apparently god has chosen one man , and that one man is jericho cane ( arnold himself ) . 
with the help of a trusty sidekick ( kevin pollack ) , they will stop at nothing to let the devil take over the world ! 
parts of this are actually so absurd , that they would fit right in with dogma . 
yes , the film is that weak , but it's better than the other blockbuster right now ( sleepy hollow ) , but it makes the world is not enough look like a 4 star film . 
anyway , this definitely doesn't seem like an arnold movie . 
it just wasn't the type of film you can see him doing . 
sure he gave us a few chuckles with his well known one-liners , but he seemed confused as to where his character and the film was going . 
it's understandable , especially when the ending had to be changed according to some sources . 
aside form that , he still walked through it , much like he has in the past few films . 
i'm sorry to say this arnold but maybe these are the end of your action days . 
speaking of action , where was it in this film ? 
there was hardly any explosions or fights . 
the devil made a few places explode , but arnold wasn't kicking some devil butt . 
the ending was changed to make it more spiritual , which undoubtedly ruined the film . 
i was at least hoping for a cool ending if nothing else occurred , but once again i was let down . 
i also don't know why the film took so long and cost so much . 
there was really no super affects at all , unless you consider an invisible devil , who was in it for 5 minutes tops , worth the overpriced budget . 
the budget should have gone into a better script , where at least audiences could be somewhat entertained instead of facing boredom . 
it's pitiful to see how scripts like these get bought and made into a movie . 
do they even read these things anymore ? 
it sure doesn't seem like it . 
thankfully gabriel's performance gave some light to this poor film . 
when he walks down the street searching for robin tunney , you can't help but feel that he looked like a devil . 
the guy is creepy looking anyway ! 
when it's all over , you're just glad it's the end of the movie . 
don't bother to see this , if you're expecting a solid action flick , because it's neither solid nor does it have action . 
it's just another movie that we are suckered in to seeing , due to a strategic marketing campaign . 
save your money and see the world is not enough for an entertaining experience . 

In [328]:
## building the term documnet matrix
vec = CountVectorizer(min_df = 50)
X = vec.fit_transform(data.data)
terms = vec.get_feature_names()
len(terms)
Out[328]:
2153
In [329]:
# PMI type measure via matrix multiplication
def getcollocations_matrix(X):
    XX=X.T.dot(X)  ## multiply X with it's transpose to get number docs in which both w1 (row) and w2 (column) occur
    term_freqs = np.asarray(X.sum(axis=0)) ## number of docs in which a word occurs
    pmi = XX.toarray() * 1.0  ## Casting to float, making it an array to use simple operations
    pmi /= term_freqs.T ## dividing by the number of documents in which w1 occurs
    pmi /= term_freqs  ## dividing by the number of documents in which w2 occurs
    
    return pmi  # this is not technically PMI beacuse we are ignoring some normalization factor and not taking the log 
                # but it's sufficient for ranking
In [330]:
pmi_matrix = getcollocations_matrix(X)
pmi_matrix.shape 
Out[330]:
(2153, 2153)
In [331]:
def getcollocations(w,PMI_MATRIX=pmi_matrix,TERMS=terms):
    if w not in TERMS:
        return []
    idx = TERMS.index(w)
    col = PMI_MATRIX[:,idx].ravel().tolist()
    return sorted([(TERMS[i],val) for i,val in enumerate(col)],key=itemgetter(1),reverse=True)
In [332]:
getcollocations("good")
Out[332]:
[(u'good', 0.0012711337380982813),
 (u'trek', 0.0010038914000850665),
 (u'sean', 0.0009922470727116103),
 (u'nudity', 0.0009374840201587473),
 (u'nicely', 0.0009268742752181751),
 (u'trash', 0.0009217014608968155),
 (u'showed', 0.000916850400576306),
 (u'compared', 0.00091151987499156),
 (u'fairly', 0.0008716089901959017),
 (u'comparison', 0.0008698557537213697),
 (u'laughed', 0.0008665639627895953),
 (u'crap', 0.0008473706979212659),
 (u'pulp', 0.0008450365730278281),
 (u'parts', 0.0008435572066033899),
 (u'fifteen', 0.0008424927416009955),
 (u'sorry', 0.0008413817621615216),
 (u'pretty', 0.0008334590198961828),
 (u'nights', 0.0008333717375608706),
 (u'chris', 0.000833301911692621),
 (u'doctor', 0.0008330167404996009),
 (u'rating', 0.0008322781072402701),
 (u'average', 0.0008295313148071339),
 (u'forward', 0.0008295313148071339),
 (u'watched', 0.0008295313148071339),
 (u'cool', 0.0008275372491465399),
 (u'stupid', 0.0008213343650560753),
 (u'sadly', 0.0008174507616788748),
 (u'matt', 0.0008162941129751053),
 (u'hate', 0.0008140549843070009),
 (u'kills', 0.0008135787895223813),
 (u'terrific', 0.0008122494124153186),
 (u'horrible', 0.0008093970595933685),
 (u'agrees', 0.0008091330037872864),
 (u'subplot', 0.0008082612810941305),
 (u'totally', 0.0008068044294699522),
 (u'sad', 0.0008064887782847135),
 (u'technical', 0.0008033355890763824),
 (u'therefore', 0.0008002537389904116),
 (u'handled', 0.0007999051964211649),
 (u'scientist', 0.0007949675100235034),
 (u'lovely', 0.0007943816828237808),
 (u'barry', 0.000792934345036231),
 (u'villain', 0.0007926632563712613),
 (u'event', 0.0007924384511368963),
 (u'producers', 0.0007895539020453444),
 (u'okay', 0.0007863956864371629),
 (u'fit', 0.000785871771922548),
 (u'mentioned', 0.0007854073087003715),
 (u'detail', 0.0007852359533368501),
 (u'information', 0.0007839070924927416),
 (u'allen', 0.0007790149847387508),
 (u'seven', 0.0007784369946922017),
 (u'shouldn', 0.0007783256780906441),
 (u'naturally', 0.0007776856076316881),
 (u'comments', 0.0007747509449613798),
 (u'entertain', 0.0007747509449613798),
 (u'jail', 0.0007734819016444898),
 (u'fbi', 0.0007733651320337342),
 (u'climactic', 0.0007732919036337689),
 (u'bad', 0.0007712559966343031),
 (u'ended', 0.0007711136165812794),
 (u'judge', 0.0007694203499660373),
 (u'ones', 0.0007682591154179706),
 (u'nice', 0.0007668341805484552),
 (u'kill', 0.000764042000480255),
 (u'critics', 0.0007636954961716471),
 (u'danny', 0.0007634624490260348),
 (u'presented', 0.0007617330823469355),
 (u'rent', 0.0007604037052398728),
 (u'sub', 0.0007604037052398728),
 (u'genius', 0.0007595059440766616),
 (u'thankfully', 0.0007594300769361086),
 (u'wanted', 0.0007590994107197358),
 (u'breaking', 0.0007584286306808082),
 (u'batman', 0.0007559768139867969),
 (u'total', 0.0007557951979353888),
 (u'wasn', 0.0007554151148587302),
 (u'bigger', 0.0007552449284064951),
 (u'ensemble', 0.000752202124443757),
 (u'steals', 0.000752202124443757),
 (u'lot', 0.0007517244632705712),
 (u'kiss', 0.0007491175649023608),
 (u'directing', 0.0007486014304357063),
 (u'perspective', 0.0007479380707277437),
 (u'badly', 0.0007476696718985352),
 (u'crash', 0.0007476696718985352),
 (u'adds', 0.0007473255088352558),
 (u'really', 0.0007456730464617401),
 (u'job', 0.0007452354168096464),
 (u'army', 0.000744825652379645),
 (u'brown', 0.0007446186605355376),
 (u'mainly', 0.0007441383853416938),
 (u'pay', 0.0007420807243907193),
 (u'dumb', 0.000741226368392181),
 (u'explosions', 0.0007406529596492268),
 (u'yeah', 0.0007402779454924423),
 (u'driver', 0.0007399796387768184),
 (u'recommend', 0.0007395349929176807),
 (u'blame', 0.0007389503091672745),
 (u'twice', 0.0007382828701783492),
 (u'gary', 0.0007379596761595932),
 (u'wouldn', 0.0007373611687174524),
 (u'cares', 0.0007367547861773888),
 (u'killed', 0.000736304171629268),
 (u'fiction', 0.0007362894228326887),
 (u'price', 0.0007357152732515652),
 (u'murphy', 0.0007352663926699596),
 (u'hits', 0.0007338161630986185),
 (u'accent', 0.0007334803204610447),
 (u'acts', 0.0007329420521241115),
 (u'saw', 0.0007328073077446421),
 (u'suspenseful', 0.0007327526614129683),
 (u'guilty', 0.0007299875570302779),
 (u'advice', 0.000729680323209979),
 (u'ending', 0.0007295169009651391),
 (u'aren', 0.0007290304055131928),
 (u'jackson', 0.000728289303944846),
 (u'ok', 0.0007282513286969607),
 (u'actor', 0.0007277390106091889),
 (u'news', 0.0007277251988989857),
 (u'fights', 0.0007273426745772697),
 (u'thinks', 0.00072659677209384),
 (u'throw', 0.0007247925124324958),
 (u'saying', 0.0007246200014638788),
 (u'cop', 0.0007238458347956482),
 (u'loves', 0.0007235356468040002),
 (u'extra', 0.0007228772886176453),
 (u'villains', 0.0007228772886176453),
 (u'performance', 0.0007227970372811597),
 (u'range', 0.0007226977363850031),
 (u'flash', 0.0007224950161223425),
 (u'gives', 0.0007223630680223859),
 (u'thrills', 0.0007222643344441425),
 (u'said', 0.0007214337497047879),
 (u'surprised', 0.0007209945072622753),
 (u'treat', 0.0007200792663256371),
 (u'guys', 0.0007196493682561889),
 (u'writing', 0.0007196184155951888),
 (u'particular', 0.0007191066917321584),
 (u'witty', 0.0007183570148845284),
 (u'natural', 0.000717863637813866),
 (u'acted', 0.0007174324884818456),
 (u'liked', 0.0007171432011881029),
 (u'cliched', 0.0007164134082425248),
 (u'grace', 0.0007158548012965267),
 (u'national', 0.000715805247454543),
 (u'acting', 0.0007155453571609738),
 (u'aliens', 0.0007152403336559289),
 (u'chemistry', 0.0007146731327569155),
 (u'guess', 0.0007139107996902104),
 (u'instance', 0.0007133969307341352),
 (u'violent', 0.0007131582166867087),
 (u'mediocre', 0.0007118275471655811),
 (u'alien', 0.0007110268412632576),
 (u'scary', 0.0007110268412632576),
 (u'ask', 0.0007106124899571602),
 (u'probably', 0.0007102573316947909),
 (u'nevertheless', 0.0007100225660637334),
 (u'mean', 0.0007095577775416395),
 (u'allowed', 0.0007091154787867436),
 (u'loud', 0.0007090011237667811),
 (u'flick', 0.0007089106899499742),
 (u'fun', 0.000708794736453356),
 (u'slightly', 0.000708672447749141),
 (u'plain', 0.0007081364882499924),
 (u'allows', 0.0007078066110039132),
 (u'prison', 0.0007071414486880486),
 (u'trailer', 0.00070639776026545),
 (u'stuff', 0.0007058992438503015),
 (u'fantastic', 0.0007056402742839906),
 (u'dog', 0.0007054282047178778),
 (u'critic', 0.0007051016175860639),
 (u'hey', 0.0007051016175860639),
 (u'overall', 0.0007051016175860639),
 (u'working', 0.0007049068919253111),
 (u'developed', 0.0007046989324817885),
 (u'person', 0.0007034519814486633),
 (u'visuals', 0.0007030783704767782),
 (u'emotion', 0.0007022736699219486),
 (u'menace', 0.0007016129344864077),
 (u'murdered', 0.0007013310207005769),
 (u'requires', 0.0007008109383715443),
 (u'track', 0.0007006720814390355),
 (u'usual', 0.0007006493308681725),
 (u'lines', 0.0006999170468685193),
 (u'saving', 0.0006999170468685193),
 (u'yes', 0.0006999170468685193),
 (u'able', 0.0006998405782738653),
 (u'get', 0.0006997175379902146),
 (u'maybe', 0.0006996916307503651),
 (u'think', 0.0006990926451643161),
 (u'bring', 0.0006989242567311171),
 (u'remember', 0.0006988801327250104),
 (u'de', 0.0006986421524297788),
 (u'annoying', 0.0006978596775361603),
 (u'wonderfully', 0.0006977822236318833),
 (u'disappointing', 0.00069756042381509),
 (u'included', 0.0006972871921567213),
 (u'friends', 0.0006965130357913436),
 (u'tell', 0.0006964706559291111),
 (u'williams', 0.0006959289155473312),
 (u'realistic', 0.0006955301024152124),
 (u'except', 0.0006951165184263484),
 (u'episode', 0.0006940976307569897),
 (u'impressive', 0.0006938603053760607),
 (u'terribly', 0.0006936598063473447),
 (u'very', 0.0006935024277037634),
 (u'language', 0.000693488179178764),
 (u'doing', 0.000693268245804233),
 (u'feeling', 0.0006931194985944053),
 (u'somewhere', 0.0006923647194453244),
 (u'study', 0.0006912760956726117),
 (u'theatre', 0.0006912760956726117),
 (u'dull', 0.0006902443403059361),
 (u'decided', 0.000689946718565549),
 (u'hotel', 0.000689668476845466),
 (u'seemingly', 0.0006890461727833452),
 (u'thrillers', 0.0006890461727833452),
 (u'mood', 0.0006884254725976731),
 (u'confused', 0.0006883344952654942),
 (u'anti', 0.0006881339316013725),
 (u'brilliant', 0.0006881339316013725),
 (u'reason', 0.000688112360680975),
 (u'smart', 0.0006876378004322295),
 (u'direction', 0.0006873678209267594),
 (u'jackie', 0.0006873678209267594),
 (u'actually', 0.0006873117883139156),
 (u'drop', 0.000686248633158629),
 (u'planet', 0.0006861555320009626),
 (u'brian', 0.0006859585872443607),
 (u'above', 0.000685583233708249),
 (u'lawyer', 0.000685264999188502),
 (u'better', 0.0006851280870126166),
 (u'warm', 0.0006849917675301333),
 (u'biggest', 0.0006847808840354194),
 (u'hundred', 0.0006846925138090629),
 (u'screenplay', 0.0006846474207826003),
 (u'did', 0.0006843905146410103),
 (u'lose', 0.0006843633347158854),
 (u'will', 0.0006842884672687007),
 (u'direct', 0.0006840940063669222),
 (u'scene', 0.0006840515924536997),
 (u'george', 0.0006839995051918474),
 (u'considered', 0.000683922094654818),
 (u'sheer', 0.0006838028405842591),
 (u'criminal', 0.0006834206854945138),
 (u'general', 0.0006833962645302296),
 (u'develops', 0.000683143435723522),
 (u'rules', 0.000683143435723522),
 (u'guy', 0.0006827392523224378),
 (u'talent', 0.0006825963958166327),
 (u'looks', 0.0006825376168108359),
 (u'had', 0.0006825121813937092),
 (u'great', 0.0006824846052572631),
 (u'tension', 0.0006824512944512591),
 (u'learn', 0.0006824341921233109),
 (u'fact', 0.0006821735781395314),
 (u'entertainment', 0.0006821027635973353),
 (u'agent', 0.0006814007228772886),
 (u'explained', 0.0006814007228772886),
 (u'hit', 0.0006810888689995415),
 (u'reasons', 0.0006807222621508924),
 (u'moved', 0.0006804749066777271),
 (u'offensive', 0.0006802156781418498),
 (u'threatening', 0.0006799437006615852),
 (u'feel', 0.0006798736033728572),
 (u'huge', 0.000679823000596379),
 (u'running', 0.0006792911231160586),
 (u'master', 0.0006792539027043922),
 (u'cops', 0.0006791217906937525),
 (u'why', 0.000678920708442264),
 (u'gore', 0.0006787074393876551),
 (u'failure', 0.0006781978992679947),
 (u'soundtrack', 0.0006781978992679947),
 (u'besides', 0.0006781089319455142),
 (u'either', 0.0006780236523876963),
 (u'aforementioned', 0.0006779823246019844),
 (u'feels', 0.000677834616034533),
 (u'me', 0.0006772941322099265),
 (u'definitely', 0.0006772162428792703),
 (u'capable', 0.0006770439407617049),
 (u'intelligent', 0.0006762483544623375),
 (u'rated', 0.0006761248387811572),
 (u'flicks', 0.0006759144046576647),
 (u'girls', 0.0006755789965569017),
 (u'care', 0.0006753585539959397),
 (u'anyway', 0.0006753514107461222),
 (u'well', 0.0006750278432664558),
 (u'relief', 0.0006745639263266804),
 (u'done', 0.0006741033421380078),
 (u'asking', 0.0006739941932807963),
 (u'evil', 0.0006738733408165179),
 (u'jump', 0.0006732428062202827),
 (u'supporting', 0.0006732428062202827),
 (u'gets', 0.0006727355113724907),
 (u'feet', 0.0006727296638374928),
 (u'sure', 0.0006725072227117109),
 (u'although', 0.0006724942545826389),
 (u'credit', 0.0006722064102747464),
 (u'weird', 0.0006719203649937786),
 (u'happening', 0.0006718035295973268),
 (u'necessary', 0.0006717400320992552),
 (u'right', 0.0006715253500819656),
 (u'1996', 0.0006704431174468618),
 (u'hurt', 0.0006704431174468618),
 (u'basically', 0.0006703084795005513),
 (u'dies', 0.0006700060619596082),
 (u'roles', 0.0006697805845704244),
 (u'interesting', 0.0006689558957182921),
 (u'star', 0.0006687483070094306),
 (u'usually', 0.0006687411040075134),
 (u'whom', 0.0006685774776057497),
 (u'try', 0.0006682335591501913),
 (u'though', 0.0006680374524563834),
 (u'haunting', 0.0006679343054291209),
 (u'major', 0.0006678430076837096),
 (u'role', 0.0006678107603149175),
 (u'path', 0.0006675134798838656),
 (u'regular', 0.0006675134798838656),
 (u'sign', 0.0006674389889252802),
 (u'loved', 0.0006670457995356336),
 (u'don', 0.0006670226685137735),
 (u'thing', 0.0006670088013375039),
 (u'expecting', 0.0006667751707626963),
 (u'make', 0.0006663531085448293),
 (u'knows', 0.0006663202799443585),
 (u'isn', 0.0006661965036826523),
 (u'amount', 0.0006661387831026985),
 (u'relatively', 0.0006661387831026985),
 (u'bruce', 0.0006656732773143668),
 (u'ideas', 0.0006653532420848887),
 (u'he', 0.0006653376447000585),
 (u'want', 0.0006651063577650056),
 (u'reminded', 0.0006650552782505471),
 (u'subplots', 0.0006650552782505471),
 (u'grow', 0.0006648449508380707),
 (u'rise', 0.0006648449508380707),
 (u'sometimes', 0.0006648229310006633),
 (u'special', 0.0006647811930510132),
 (u'individual', 0.0006646885535313573),
 (u'forever', 0.0006645170210014137),
 (u'scenes', 0.0006644715123710205),
 (u'action', 0.0006642620639475215),
 (u'aspect', 0.0006642051436742437),
 (u'kind', 0.0006640702385978397),
 (u'getting', 0.0006640336879613757),
 (u'just', 0.0006639106047940057),
 (u'believable', 0.0006636250518457072),
 (u'boring', 0.0006636250518457072),
 (u'cliche', 0.0006636250518457072),
 (u'funny', 0.0006636250518457072),
 (u'irritating', 0.0006636250518457072),
 (u'weight', 0.0006636250518457072),
 (u'went', 0.0006636250518457072),
 (u'also', 0.0006635828794351934),
 (u'effects', 0.0006633694181585554),
 (u'jack', 0.0006633457483727081),
 (u'bit', 0.0006630408748634487),
 (u'need', 0.00066283752211646),
 (u'but', 0.0006625489862068561),
 (u'disappointment', 0.0006623869454056965),
 (u'hardly', 0.0006622308815687205),
 (u'tight', 0.0006621697337495544),
 (u'likes', 0.000661949231007713),
 (u'budget', 0.0006618118686439429),
 (u'frightening', 0.0006616499772866426),
 (u'heard', 0.0006615893921774687),
 (u'black', 0.0006614045091292062),
 (u'serves', 0.0006612206132520633),
 (u'typical', 0.0006606624400071102),
 (u'myself', 0.0006603810746369642),
 (u'again', 0.0006602878569010809),
 (u'superb', 0.0006600571752228807),
 (u'we', 0.0006600378894032979),
 (u'musical', 0.0006598544549602202),
 (u'nobody', 0.0006598544549602202),
 (u'afraid', 0.0006595453896417377),
 (u'richard', 0.0006595031571137463),
 (u'system', 0.0006593710451031064),
 (u'him', 0.000658615728676154),
 (u'longer', 0.0006585592117552819),
 (u'terrible', 0.0006584042253888791),
 (u'decides', 0.0006583581863548682),
 (u'knowing', 0.0006583581863548682),
 (u'does', 0.0006581230584311701),
 (u'makes', 0.0006581059926947726),
 (u'wars', 0.0006580639480592907),
 (u'sounds', 0.0006580116820462604),
 (u'nothing', 0.0006577440462556566),
 (u'built', 0.0006576998281685133),
 (u'reading', 0.0006575553105178501),
 (u'confusing', 0.0006574476909907604),
 (u'wasted', 0.0006572981180887036),
 (u'grown', 0.0006572440417318061),
 (u'drawn', 0.0006571006482461006),
 (u'fly', 0.000656712290888981),
 (u'responsible', 0.000656712290888981),
 (u'played', 0.0006564938091899696),
 (u'was', 0.0006564883957972653),
 (u'survive', 0.0006562747743727326),
 (u'childhood', 0.0006560838580747332),
 (u'gave', 0.0006559768907872017),
 (u'too', 0.0006556821711522463),
 (u'basic', 0.0006555973294443478),
 (u'calls', 0.0006553297386976359),
 (u'surprising', 0.0006553297386976359),
 (u'some', 0.0006548712038000038),
 (u'brief', 0.0006547372163299164),
 (u'became', 0.0006544925970037938),
 (u'beat', 0.0006542782201295704),
 (u'started', 0.0006538658599067997),
 (u'anyone', 0.0006534515545886386),
 (u'jerry', 0.0006531742636276646),
 (u'however', 0.0006529728297040989),
 (u'heroes', 0.0006529479161105658),
 (u'like', 0.000652946803213366),
 (u'admit', 0.0006528050781743098),
 (u'shoot', 0.0006528050781743098),
 (u'case', 0.0006527621417708519),
 (u'then', 0.0006527316279785068),
 (u'depth', 0.0006526033071035145),
 (u'script', 0.0006525362013649949),
 (u'movies', 0.0006524133101944996),
 (u'times', 0.0006520875564461009),
 (u'buy', 0.0006517746044913196),
 (u'provide', 0.0006517746044913196),
 (u'performances', 0.0006514996521165078),
 (u'tough', 0.0006513116963915387),
 (u'thrown', 0.0006512548480284078),
 (u'hill', 0.0006512208452691519),
 (u'beginning', 0.000651103824452392),
 (u'loving', 0.0006510856249939714),
 (u'ups', 0.0006510245761777506),
 (u'see', 0.0006509615377774679),
 (u'course', 0.000650951656758376),
 (u'problem', 0.0006504279627465027),
 (u'best', 0.0006503077449163204),
 (u'room', 0.00065000588100559),
 (u'filmmakers', 0.0006499420610860018),
 (u'places', 0.0006493805747227564),
 (u'never', 0.0006493165422671562),
 (u'supposedly', 0.0006487360282466047),
 (u'kevin', 0.0006486366355657029),
 (u'especially', 0.0006485261265981212),
 (u'even', 0.000648425062841444),
 (u'occasionally', 0.0006482891787988526),
 (u'company', 0.0006482129946307112),
 (u'money', 0.0006480713396930734),
 (u'fair', 0.0006478244553731903),
 (u'science', 0.0006477404096472726),
 (u'not', 0.0006476204670378808),
 (u'next', 0.0006475053385230357),
 (u'know', 0.0006468572043976747),
 (u'seems', 0.000646841698041768),
 (u'memories', 0.0006467532284936977),
 (u'unbelievable', 0.0006467532284936977),
 (u'sick', 0.0006463880375120525),
 (u'actors', 0.0006462354435466341),
 (u'supposed', 0.0006459044138513752),
 (u'idea', 0.0006457879795324968),
 (u'likable', 0.0006457743779827689),
 (u'extremely', 0.0006455626764426486),
 (u've', 0.0006454666510550725),
 (u'plays', 0.0006453135893114008),
 (u'creature', 0.0006451910226277709),
 (u'held', 0.0006451910226277709),
 (u'mike', 0.0006451910226277709),
 (u'seconds', 0.0006451910226277709),
 (u'time', 0.0006449425340547377),
 (u'entertaining', 0.0006446039516335691),
 (u'my', 0.0006441661752358124),
 (u'help', 0.0006441611885932493),
 (u'awful', 0.0006441436346040245),
 (u'could', 0.0006440929900534719),
 (u'considering', 0.0006440669964559454),
 (u'dr', 0.0006440243119177749),
 (u'should', 0.000643552677141085),
 (u'slowly', 0.0006433766496732495),
 (u'fans', 0.0006433099992381855),
 (u'pull', 0.0006432382652953624),
 (u'mistake', 0.0006431873237997343),
 (u'moral', 0.0006431197833898005),
 (u'occur', 0.0006428867689755288),
 (u'characterization', 0.0006425946804843995),
 (u'entirely', 0.0006425946804843995),
 (u'fire', 0.0006425946804843995),
 (u'bond', 0.0006422177921087489),
 (u'nomination', 0.0006422177921087489),
 (u'doesn', 0.0006421234962308942),
 (u'series', 0.000641827148682892),
 (u'today', 0.000641765780712276),
 (u'albeit', 0.0006417129039074055),
 (u'present', 0.0006415906262961427),
 (u'ahead', 0.0006415042167841836),
 (u'speed', 0.0006414399120310978),
 (u'anywhere', 0.0006410014705327854),
 (u'efforts', 0.0006410014705327854),
 (u'mad', 0.0006410014705327854),
 (u'possible', 0.0006410014705327854),
 (u'realize', 0.0006410014705327854),
 (u'selling', 0.0006410014705327854),
 (u'it', 0.0006405730734197016),
 (u'flashbacks', 0.000640446970990802),
 (u'holes', 0.000640446970990802),
 (u'predictable', 0.0006403799435736392),
 (u'flaw', 0.0006403399623072613),
 (u'generally', 0.0006402693157977394),
 (u'used', 0.0006399878692194824),
 (u'animals', 0.000639924157136932),
 (u'got', 0.0006397980885480555),
 (u'things', 0.0006396737955731069),
 (u'non', 0.0006396386041886335),
 (u'pieces', 0.0006394303884971658),
 (u'everything', 0.000639365173771159),
 (u'so', 0.0006390972320839649),
 (u'hasn', 0.0006390776966116186),
 (u'place', 0.0006386716708311836),
 (u'appearance', 0.0006386074407642222),
 (u'largely', 0.0006386074407642222),
 (u'stuck', 0.0006384594951043672),
 (u'wants', 0.0006382347851870402),
 (u'revolves', 0.0006381010113901031),
 (u'theme', 0.0006378593064615462),
 (u'seemed', 0.0006378000203469945),
 (u'exciting', 0.0006377021982579842),
 (u'fake', 0.0006377021982579842),
 (u'saved', 0.0006376248166054836),
 (u'go', 0.0006376136925584035),
 (u'frank', 0.0006375101771202974),
 (u'helped', 0.0006375101771202974),
 (u'oh', 0.0006375101771202974),
 (u'decent', 0.0006373228394249932),
 (u'difference', 0.0006373228394249932),
 (u'happened', 0.0006373228394249932),
 (u'trust', 0.0006373228394249932),
 (u'directors', 0.0006372308736472984),
 (u'work', 0.0006371939070111661),
 (u'etc', 0.0006370800497718789),
 (u'our', 0.0006369861926514015),
 (u'strikes', 0.000636961545298335),
 (u'seen', 0.0006367336520799815),
 (u'little', 0.0006363792864759592),
 (u'funniest', 0.0006363527894410891),
 (u'damn', 0.0006362882244259267),
 (u'couple', 0.0006362330341904833),
 (u'this', 0.0006362222842527883),
 (u'way', 0.0006359903405904665),
 (u'began', 0.0006359740080188027),
 (u'pulls', 0.0006359740080188027),
 (u'making', 0.0006359280760523128),
 (u'instead', 0.0006357293085159097),
 (u'always', 0.0006355965193658757),
 (u'problems', 0.0006355965193658757),
 (u'or', 0.0006355875258306249),
 (u'entire', 0.000635364058522621),
 (u'turn', 0.0006352884449487142),
 (u'personal', 0.0006352463489707263),
 (u'later', 0.0006351777737724782),
 (u'exact', 0.000635109912899212),
 (u'attention', 0.0006350561310452954),
 (u'happens', 0.0006350094367225153),
 (u'ever', 0.0006349762899425742),
 (u'common', 0.0006349105063331525),
 (u'describe', 0.000634717142390307),
 (u'straight', 0.0006345914558274575),
 (u'minor', 0.000634529550505457),
 (u'been', 0.0006344902934719933),
 (u'face', 0.0006343474760289847),
 (u'fight', 0.0006343474760289847),
 (u'twist', 0.0006343474760289847),
 (u'have', 0.0006342080874479353),
 (u'move', 0.0006342056273089425),
 (u'society', 0.0006341900697073896),
 (u'followed', 0.0006339989334597381),
 (u'combination', 0.0006338871367865835),
 (u'nearly', 0.0006336322258054493),
 (u'hot', 0.0006335790357188346),
 (u'may', 0.0006335218734437214),
 (u'if', 0.0006334845249732937),
 (u'social', 0.0006334602767618115),
 (u'strong', 0.0006329819174554437),
 (u'add', 0.0006326933757003564),
 (u'subtle', 0.0006325922256802605),
 (u'talking', 0.0006325176275404396),
 (u'patrick', 0.0006324149627737556),
 (u'took', 0.0006322647216517789),
 (u'eddie', 0.0006318913035611389),
 (u'government', 0.0006318913035611389),
 (u'put', 0.0006318847691429929),
 (u'before', 0.0006317650285653122),
 (u'learned', 0.000631720001276202),
 (u'together', 0.000631683328804283),
 (u'cross', 0.0006314900549657912),
 (u'deserves', 0.0006314900549657912),
 (u'give', 0.0006313901451384067),
 (u'character', 0.000631182985573547),
 (u'ability', 0.0006310363216211411),
 (u'player', 0.0006309111408392287),
 (u'poor', 0.0006306710681067937),
 (u'formula', 0.0006306130913584845),
 (u'needs', 0.0006305939406678666),
 (u'interested', 0.0006305786823940408),
 (u'do', 0.0006304871168155528),
 (u'game', 0.0006304437992534219),
 (u'suspense', 0.0006303616674400746),
 (u'short', 0.0006301762085067098),
 (u'wild', 0.0006300908072045678),
 (u'follow', 0.0006299954039481207),
 (u'second', 0.0006299059485256166),
 (u'all', 0.0006294991237565685),
 (u'ago', 0.00062944335947677),
 (u'say', 0.0006293887843642655),
 (u'because', 0.0006290448272023219),
 (u'powerful', 0.0006289852826559587),
 (u'seeing', 0.0006288777224349069),
 (u'audiences', 0.0006284328142478287),
 (u'worker', 0.0006284328142478287),
 (u'days', 0.0006283205941024273),
 (u'were', 0.0006281126594862564),
 (u'shot', 0.0006281077627921833),
 (u'charming', 0.0006280737097825443),
 (u'oliver', 0.0006280737097825443),
 (u'film', 0.0006279666236763682),
 (u'singing', 0.0006279091202359556),
 (u'leaves', 0.0006278772935280517),
 (u'films', 0.0006278191103276649),
 (u'quite', 0.0006278043814335809),
 (u'laughable', 0.0006277534274216149),
 (u'battle', 0.0006275584729410491),
 (u'powers', 0.0006275584729410491),
 (u'details', 0.0006273766246440509),
 (u'hell', 0.000627333056822895),
 (u'taking', 0.0006272902091310146),
 (u'mark', 0.0006271456627005742),
 (u'perfectly', 0.0006271456627005742),
 (u'robert', 0.000627137076486493),
 (u'made', 0.0006271226129930316),
 (u'generated', 0.000627086172503012),
 (u'big', 0.0006268262942715562),
 (u'starring', 0.0006266568084684328),
 (u'suppose', 0.0006266568084684328),
 (u'dramatic', 0.0006264244207177584),
 (u'what', 0.0006260189663520424),
 (u'dozen', 0.0006259190829908375),
 (u'touches', 0.0006259190829908375),
 (u'wrong', 0.0006259190829908375),
 (u'seriously', 0.0006257867813457326),
 (u'thoughts', 0.0006257867813457326),
 (u'seem', 0.0006257614273719321),
 (u'back', 0.0006256700813097204),
 (u'loose', 0.0006256634493036858),
 (u'sam', 0.0006256241759718609),
 (u'violence', 0.0006255706449948188),
 (u'any', 0.0006253106130995327),
 (u'gotten', 0.0006251540343474053),
 (u'record', 0.0006251540343474053),
 (u'robin', 0.0006250970571295465),
 (u'surprises', 0.0006250693710166432),
 (u'completely', 0.0006249764337694657),
 (u'join', 0.0006246470744029624),
 (u'results', 0.0006245882840900773),
 (u'people', 0.0006245715157190483),
 (u'bunch', 0.0006245321967800837),
 (u'industry', 0.0006245321967800837),
 (u'cliches', 0.0006244274182888866),
 (u'amazing', 0.0006244026472868915),
 (u'point', 0.0006242677266906242),
 (u'ass', 0.0006242017814390315),
 (u'disturbing', 0.000624161911626727),
 (u'which', 0.0006240510808640825),
 (u'sense', 0.0006240167998774386),
 (u'monster', 0.000623891198951584),
 (u'write', 0.000623891198951584),
 (u'ship', 0.00062371956814097),
 (u'hold', 0.0006237077554940856),
 (u'order', 0.0006236089285609969),
 (u'movie', 0.0006234062228935263),
 (u'unlike', 0.0006233902994508702),
 (u're', 0.0006232476883776215),
 (u'save', 0.0006229081301665292),
 (u'heart', 0.0006228812876201978),
 (u'killer', 0.0006228611418740851),
 (u'between', 0.0006227931995624544),
 (u'take', 0.0006223776384022585),
 (u'asks', 0.0006221484861053505),
 (u'edge', 0.0006221484861053505),
 (u'finally', 0.0006221484861053505),
 (u'lacking', 0.0006221484861053505),
 (u'quiet', 0.0006221484861053505),
 (u'shooting', 0.0006221484861053505),
 (u'stunning', 0.0006221484861053505),
 (u'tommy', 0.0006221484861053505),
 (u'tradition', 0.0006221484861053505),
 (u'going', 0.0006216814076623284),
 (u'they', 0.000621589734442527),
 (u'cast', 0.0006213394503626908),
 (u'sound', 0.000621302025580037),
 (u'mission', 0.0006211748578015862),
 (u'there', 0.0006210483119477813),
 (u'doubt', 0.0006209634413699117),
 (u'kids', 0.0006208839566620469),
 (u'brought', 0.0006208275763683964),
 (u'inside', 0.0006208275763683964),
 (u'six', 0.0006207377185631616),
 (u'small', 0.0006206982565340094),
 (u'thought', 0.0006206420733060639),
 (u'race', 0.0006205155504462813),
 (u'can', 0.000620277579253773),
 (u'one', 0.0006202348373100844),
 (u'explain', 0.000620135060583974),
 (u'using', 0.0006200746578183327),
 (u'many', 0.0006198587703310405),
 (u'humanity', 0.0006197086881206236),
 (u'much', 0.0006196181929293892),
 (u'fan', 0.0006195233870078595),
 (u'accept', 0.00061938338172266),
 (u'trying', 0.0006192172800459613),
 (u'1995', 0.0006191429378632956),
 (u'lee', 0.00061902994732788),
 (u'car', 0.0006189182239760392),
 (u'claims', 0.0006188566951735761),
 (u'out', 0.0006185562072468923),
 (u'effectively', 0.0006185101908649683),
 (u'frankly', 0.0006183778892198636),
 (u'hard', 0.000618262266146714),
 (u'told', 0.0006182356025449395),
 (u'born', 0.0006181603547841623),
 (u'fully', 0.0006180821561308057),
 (u'air', 0.0006180282974556462),
 (u'still', 0.0006179889451285238),
 (u'rob', 0.0006177360854946742),
 (u'against', 0.000617664533052339),
 (u'silent', 0.0006176401637422683),
 (u'failed', 0.0006175399788008664),
 (u'plot', 0.0006173511305174044),
 (u'important', 0.0006173256296239136),
 (u'none', 0.0006170067630796864),
 (u'broken', 0.0006169639153878059),
 (u'shock', 0.0006169639153878059),
 (u'south', 0.0006169639153878059),
 (u'books', 0.0006168309776770996),
 (u'spend', 0.0006166182773399696),
 (u'means', 0.0006164822885998373),
 (u'girlfriend', 0.0006164407018291546),
 (u'same', 0.0006164275804859909),
 (u'suspects', 0.0006163878519747455),
 (u'five', 0.000616306716282765),
 (u'being', 0.0006162586897745075),
 (u'weren', 0.0006162232624281567),
 (u'obsessed', 0.0006160489911435334),
 (u'whatever', 0.0006160489911435334),
 (u'van', 0.0006160232548778717),
 (u'college', 0.0006159579539052972),
 (u'recently', 0.0006159579539052972),
 (u'logic', 0.0006158641579628722),
 (u'them', 0.0006158507656812686),
 (u'marry', 0.000615458717437551),
 (u'speech', 0.000615458717437551),
 (u'far', 0.00061529015633726),
 (u'would', 0.0006151668925359685),
 (u'shows', 0.0006150671212228505),
 (u'those', 0.0006149973540811511),
 (u'here', 0.0006148167699391258),
 (u'must', 0.0006147659258603031),
 (u'long', 0.0006147065185682051),
 (u'exist', 0.0006146527212125149),
 (u'something', 0.0006145255546073584),
 (u'land', 0.000614467640597877),
 (u'no', 0.0006144303549400738),
 (u'telling', 0.0006143521391616743),
 (u'she', 0.0006141595264514455),
 (u'winner', 0.0006140686356364499),
 (u'almost', 0.0006140554976682077),
 (u'throughout', 0.0006139081088059418),
 (u'liners', 0.0006138531729572792),
 (u'chance', 0.0006137787755299422),
 (u'standing', 0.0006136259041039073),
 (u'that', 0.0006135531164153746),
 (u'fascinating', 0.0006135075349094428),
 (u'ex', 0.0006133504267058808),
 (u'quickly', 0.000613202560161352),
 (u'minutes', 0.000613131841379186),
 (u'obviously', 0.0006130527480043951),
 (u'mess', 0.0006130184244643914),
 (u'cute', 0.0006128626878052707),
 (u'plenty', 0.0006128626878052707),
 (u'comedies', 0.000612721993891633),
 (u'enough', 0.000612576970934499),
 (u'drama', 0.0006122731133100275),
 (u'notice', 0.0006122731133100275),
 (u'terms', 0.0006122731133100275),
 (u'decide', 0.0006121138331036512),
 (u'destroy', 0.0006117793446702613),
 (u'50', 0.0006116035965103445),
 (u'style', 0.0006116035965103445),
 (u'succeeds', 0.0006115134692488488),
 (u'theater', 0.0006115134692488488),
 (u'has', 0.0006113816301969087),
 (u'talented', 0.0006113753521468163),
 (u'superior', 0.0006112336003842039),
 (u'off', 0.000610998871659018),
 (u'introduced', 0.0006108366954488896),
 (u'certain', 0.0006106851136645484),
 (u'remarkable', 0.0006106272178441403),
 (u'taste', 0.0006106272178441403),
 (u'john', 0.0006101940874583805),
 (u'end', 0.0006100413906444177),
 (u'smith', 0.0006098461149111769),
 (u'read', 0.0006098176152095687),
 (u'other', 0.0006097992006179753),
 (u'sweet', 0.0006096555446172912),
 (u'use', 0.0006096429888972028),
 (u'visually', 0.0006093470769262281),
 (u'ed', 0.000609187059311489),
 (u'fox', 0.0006090702897007335),
 (u'play', 0.0006088723226785255),
 (u'credits', 0.000608867812346123),
 (u'tried', 0.0006085813851622432),
 (u'part', 0.0006082067833354827),
 (u'office', 0.0006081900264811919),
 (u'main', 0.0006081150616067336),
 (u'despite', 0.0006080087477847743),
 (u'where', 0.0006079708396391428),
 (u'meeting', 0.000607944182769612),
 (u'ways', 0.0006078840587343283),
 (u'involved', 0.0006076402223690117),
 (u'figures', 0.0006075440615488869),
 (u'door', 0.0006073354269123659),
 (u'halfway', 0.0006073354269123659),
 (u'screenwriter', 0.0006073354269123659),
 (u'willing', 0.0006073354269123659),
 (u'opening', 0.0006072386095320195),
 (u'married', 0.0006071569563196793),
 (u'truth', 0.0006070411277231013),
 (u'humor', 0.0006069741327857078),
 (u'highly', 0.0006068997487008076),
 (u'effort', 0.0006068383443891115),
 (u'comic', 0.0006066880695697419),
 (u'led', 0.0006064640704892491),
 (u'friend', 0.0006064299831964133),
 (u'worse', 0.0006060657361243958),
 (u'than', 0.0006058864534585655),
 (u'driving', 0.0006057761575236307),
 (u'final', 0.0006057761575236307),
 (u'paced', 0.0006057761575236307),
 (u'yet', 0.0006057761575236307),
 (u'points', 0.0006056087513009137),
 (u'editing', 0.0006055578598092078),
 (u'disaster', 0.0006054973100782),
 (u'works', 0.0006053961127561814),
 (u'hope', 0.0006052028074954771),
 (u'conclusion', 0.0006051498935888108),
 (u'manage', 0.0006050699002122624),
 (u'pg', 0.0006050699002122624),
 (u'comes', 0.0006048901606608638),
 (u'generation', 0.0006048665837135352),
 (u'past', 0.0006048665837135352),
 (u'adaptation', 0.0006046583680220675),
 (u'score', 0.0006045405100835009),
 (u'students', 0.000604372815073769),
 (u'value', 0.000604372815073769),
 (u'you', 0.000604281417718327),
 (u'only', 0.000604277821507802),
 (u'watch', 0.0006039208079607493),
 (u'how', 0.0006036931915274067),
 (u'talk', 0.0006036321621141198),
 (u'woody', 0.000603555542842432),
 (u'about', 0.0006033704212867672),
 (u'owner', 0.0006032955016779157),
 (u'is', 0.0006032580875393416),
 (u'are', 0.0006032575189779611),
 (u'll', 0.0006030994567791901),
 (u'came', 0.0006030916856300515),
 (u'field', 0.0006028570601796031),
 (u'90', 0.0006025840683032954),
 (u'enjoyed', 0.0006025840683032954),
 (u'multiple', 0.0006025840683032954),
 (u'everyone', 0.000602438547840305),
 (u'obvious', 0.0006022624614353165),
 (u'wonderful', 0.0006020791801019521),
 (u'look', 0.000602031109907932),
 (u'wait', 0.0006019862666482326),
 (u'likely', 0.0006019160150124935),
 (u'jeff', 0.0006018168362326266),
 (u'dialogue', 0.0006017698266375452),
 (u'didn', 0.0006017325599637242),
 (u'now', 0.0006016610695602147),
 (u'island', 0.0006015685107379979),
 (u'over', 0.0006014962542014384),
 (u'1998', 0.0006014102032351721),
 (u'agree', 0.0006014102032351721),
 (u'date', 0.0006014102032351721),
 (u'opera', 0.0006014102032351721),
 (u'remake', 0.0006011771888209005),
 (u'be', 0.0006011213317860572),
 (u'chief', 0.0006011096484109667),
 (u'games', 0.0006011096484109667),
 (u'producer', 0.0006010587069153386),
 (u'while', 0.000600992473040906),
 (u'due', 0.0006009869729725155),
 (u'phone', 0.0006009869729725155),
 (u'building', 0.0006006950900327521),
 (u'given', 0.0006006665994669187),
 (u'falls', 0.0006004557215967957),
 (u'mary', 0.0006003950425352334),
 (u'figure', 0.0006003187146630575),
 (u'travel', 0.0006003187146630575),
 (u'naked', 0.0006001903042428087),
 (u'am', 0.0006000416871066832),
 (u'addition', 0.0006000008053702085),
 (u'unnecessary', 0.0005998149507066969),
 (u'super', 0.0005997287208402928),
 (u'hero', 0.0005996418225253119),
 (u'forces', 0.0005995622374348592),
 (u'onto', 0.0005994932191043153),
 (u'stands', 0.0005994932191043153),
 (u'choice', 0.0005994659892160929),
 (u'imagine', 0.0005994659892160929),
 (u'worth', 0.0005993948842101705),
 (u'enjoy', 0.0005993089675258589),
 (u'without', 0.000599238187956086),
 (u'position', 0.00059910594958293),
 (u'for', 0.0005988931282437008),
 (u'released', 0.0005988905987743094),
 (u'several', 0.0005988859731006158),
 (u'law', 0.0005988595053420486),
 (u'technology', 0.000598522594227932),
 (u'otherwise', 0.0005982196981782216),
 (u'early', 0.0005981933738790719),
 (u'who', 0.0005981748562238021),
 (u'version', 0.000598001170434595),
 (u'immediately', 0.0005979750275450198),
 (u'studio', 0.0005979750275450198),
 (u'yourself', 0.0005979538227568091),
 (u'pacing', 0.0005977505062580819),
 (u'witness', 0.0005977505062580819),
 (u'most', 0.0005976870249575253),
 (u'audience', 0.0005976437317292098),
 (u'happen', 0.0005976396063496852),
 (u'similar', 0.0005976193343234191),
 (u'often', 0.000597486744313787),
 (u'stop', 0.0005973863573051375),
 (u'surprisingly', 0.0005971756847433556),
 (u'mental', 0.0005970111735354374),
 (u'every', 0.0005969647212682807),
 (u'creating', 0.0005967546703459484),
 (u'girl', 0.0005964550383015896),
 (u'background', 0.000596414850427027),
 (u'million', 0.0005963657560505341),
 (u'1997', 0.0005962256325176275),
 (u'around', 0.0005961969250385714),
 (u'leave', 0.0005961050611055916),
 (u'unfortunately', 0.0005960899107710949),
 (u'growing', 0.000595860521903716),
 (u'members', 0.0005957036958682102),
 (u'brings', 0.0005954556467674971),
 (u'faces', 0.0005954556467674971),
 (u'apparently', 0.0005953574029716272),
 (u'let', 0.0005953107082733549),
 (u'student', 0.0005950985519268569),
 (u'veteran', 0.0005950985519268569),
 (u'more', 0.0005950716124445559),
 (u'forget', 0.0005949507380788871),
 (u'whole', 0.0005949163974879445),
 (u'career', 0.0005948830146027255),
 (u'catherine', 0.0005948612718024842),
 (u'watching', 0.0005948059224834115),
 (u'anything', 0.0005946316706465377),
 (u'starts', 0.0005945849455816957),
 (u'screen', 0.0005942032822377343),
 (u'up', 0.0005941929153640528),
 (u'model', 0.0005941237795240284),
 (u'and', 0.0005940629658477276),
 (u'capture', 0.0005935439580085528),
 (u'solid', 0.0005935439580085528),
 (u'positive', 0.0005934339405927958),
 (u'lots', 0.000593387363876636),
 (u'buddy', 0.0005932723960329503),
 (u'era', 0.0005932113472167296),
 (u'storyline', 0.0005930761269415491),
 (u'thriller', 0.0005929729550712593),
 (u'old', 0.0005925904737386595),
 (u'as', 0.0005925848590923172),
 (u'cause', 0.0005925223677193814),
 (u'handle', 0.0005925223677193814),
 (u'heroine', 0.0005925223677193814),
 (u'mouth', 0.0005925223677193814),
 (u'provided', 0.0005925223677193814),
 (u'easy', 0.0005922554657519402),
 (u'sets', 0.0005920306479121454),
 (u'twenty', 0.0005920159383452623),
 (u'ben', 0.0005919268678523268),
 (u'us', 0.0005918044934621259),
 (u'haven', 0.0005917675621554076),
 (u'stone', 0.0005917675621554076),
 (u'taken', 0.0005917323378957556),
 (u'fill', 0.0005916510112962647),
 (u'least', 0.0005914705528654417),
 (u'begins', 0.0005914642920627396),
 (u'friendly', 0.0005914251040754566),
 ...]
In [333]:
sorted(sentscores.items(),key=itemgetter(1),reverse=False)
Out[333]:
[(u'worst', -0.047847985347985345),
 (u'bad', -0.005144142257015721),
 (u'over', -0.0008741258741258741),
 (u'ever', -0.0006927835481992246),
 (u'old', -0.000509611311233071),
 (u'horrible', -0.0005045767240889192),
 (u'appropriate', -0.0004807692307692308),
 (u'single', -0.0003521130740894542),
 (u'worried', -0.0003434065934065934),
 (u'rich', -0.00020903010033444816),
 (u'year', -0.0001923076923076923),
 (u'normal', -0.00016869095816464237),
 (u'ready', -0.00016411333242216783),
 (u'busy', -0.00014386161489820027),
 (u'able', -0.00011459845186260283),
 (u'enough', -0.0001019798896944335),
 (u'high', -8.761468369791503e-05),
 (u'rude', -8.029485482690247e-05),
 (u'seriously', -7.754342431761787e-05),
 (u'sorry', -7.302249637155297e-05),
 (u'still', -5.139207374979732e-05),
 (u'other', -5.07118667496026e-05),
 (u'like', -4.767420925957511e-05),
 (u'too', -4.1431243431412225e-05),
 (u'away', -4.001232677893313e-05),
 (u'fast', -3.4470246734397684e-05),
 (u'little', -1.496198113885625e-05),
 (u'hard', -1.1084353093817969e-05),
 (u'probably', -9.115554299477096e-06),
 (u'now', -8.629079693513414e-06),
 (u'never', -8.614478291966799e-06),
 (u'long', -8.133879221071868e-06),
 (u'much', -3.840238804308329e-06),
 (u'together', -3.67649335290002e-06),
 (u'just', -2.2528019305203796e-06),
 (u'rob', 0.0),
 (u'skin', 0.0),
 (u'young', 0.0),
 (u'finally', 0.0),
 (u'ta', 0.0),
 (u'worse', 0.0),
 (u'fat', 0.0),
 (u'bunim', 0.0),
 (u'anxious', 0.0),
 (u'quick', 0.0),
 (u'anal', 0.0),
 (u'ten', 0.0),
 (u'tired', 0.0),
 (u'past', 0.0),
 (u'second', 0.0),
 (u'uncomfortable', 0.0),
 (u'kris', 0.0),
 (u'public', 0.0),
 (u'full', 0.0),
 (u'alone', 0.0),
 (u'sexy', 0.0),
 (u'dry', 0.0),
 (u'bible', 0.0),
 (u'ahead', 0.0),
 (u'guilty', 0.0),
 (u'later', 0.0),
 (u'weird', 0.0),
 (u'extra', 0.0),
 (u'private', 0.0),
 (u'moral', 0.0),
 (u'total', 0.0),
 (u'angry', 0.0),
 (u'live', 0.0),
 (u'acceptable', 0.0),
 (u'everywhere', 0.0),
 (u'basically', 0.0),
 (u'glad', 0.0),
 (u'male', 0.0),
 (u'embarrassing', 0.0),
 (u'awesome', 0.0),
 (u'huge', 0.0),
 (u'awkward', 0.0),
 (u'rather', 0.0),
 (u'truthful', 0.0),
 (u'guys', 0.0),
 (u'short', 0.0),
 (u'natural', 0.0),
 (u'tall', 0.0),
 (u'cute', 0.0),
 (u'murray', 0.0),
 (u'scott', 0.0),
 (u'cold', 0.0),
 (u'easier', 0.0),
 (u'safe', 0.0),
 (u'bigger', 0.0),
 (u'mean', 0.0),
 (u'em', 0.0),
 (u'sexual', 0.0),
 (u'special', 0.0),
 (u'god', 0.0),
 (u'red', 0.0),
 (u'free', 0.0),
 (u'completely', 0.0),
 (u'scary', 0.0),
 (u'atm', 0.0),
 (u'american', 0.0),
 (u'major', 0.0),
 (u'delicious', 0.0),
 (u'open', 0.0),
 (u'top', 0.0),
 (u'wonderful', 0.0),
 (u'white', 0.0),
 (u'hundred', 0.0),
 (u'huh', 0.0),
 (u'forward', 0.0),
 (u'ridiculous', 0.0),
 (u'double', 0.0),
 (u'miserable', 0.0),
 (u'apparently', 0.0),
 (u'clearly', 0.0),
 (u'afraid', 0.0),
 (u'potential', 0.0),
 (u'lily', 0.0),
 (u'regular', 0.0),
 (u'forever', 0.0),
 (u'clear', 0.0),
 (u'hungry', 0.0),
 (u'professional', 0.0),
 (u'normally', 0.0),
 (u'anyway', 0.0),
 (u'bright', 0.0),
 (u'wasteful', 0.0),
 (u'truly', 0.0),
 (u'gray', 0.0),
 (u'twice', 0.0),
 (u'stupid', 0.0),
 (u'common', 0.0),
 (u'boring', 0.0),
 (u'fair', 0.0),
 (u'dumb', 0.0),
 (u'desperate', 0.0),
 (u'outside', 0.0),
 (u'barely', 0.0),
 (u'quiet', 0.0),
 (u'somewhere', 0.0),
 (u'tryclearblue', 0.0),
 (u'wear', 0.0),
 (u'tough', 0.0),
 (u'drunk', 0.0),
 (u'active', 0.0),
 (u'late', 0.0),
 (u'basic', 0.0),
 (u'present', 0.0),
 (u'fur', 0.0),
 (u'straight', 0.0),
 (u'ugly', 0.0),
 (u'alcoholic', 0.0),
 (u'almost', 0.0),
 (u'in', 0.0),
 (u'rid', 0.0),
 (u'grown', 0.0),
 (u'sensitive', 0.0),
 (u'belly', 0.0),
 (u'difficult', 0.0),
 (u'off', 0.0),
 (u'accurate', 0.0),
 (u'touch', 0.0),
 (u'yes', 0.0),
 (u'yet', 0.0),
 (u'early', 0.0),
 (u'possibly', 0.0),
 (u'disappointed', 0.0),
 (u'apart', 0.0),
 (u'necessary', 0.0),
 (u'often', 0.0),
 (u'dead', 0.0),
 (u'supportive', 0.0),
 (u'gross', 0.0),
 (u'literally', 0.0),
 (u'laker', 0.0),
 (u'exciting', 0.0),
 (u'oh', 0.0),
 (u'favorite', 0.0),
 (u'down', 0.0),
 (u'kmart', 0.0),
 (u'constantly', 0.0),
 (u'low', 0.0),
 (u'biggest', 0.0),
 (u'complete', 0.0),
 (u'diaper', 0.0),
 (u'true', 0.0),
 (u'khloe', 0.0),
 (u'inside', 0.0),
 (u'uh', 0.0),
 (u'emotional', 0.0),
 (u'certain', 0.0),
 (u'deep', 0.0),
 (u'girlfriend', 0.0),
 (u'annoying', 0.0),
 (u'selfish', 0.0),
 (u'incredible', 0.0),
 (u'sick', 0.0),
 (u'poor', 0.0),
 (u'welcome', 0.0),
 (u'luxurious', 0.0),
 (u'important', 0.0),
 (u'thebouncedryer', 0.0),
 (u'ago', 0.0),
 (u'younger', 0.0),
 (u'kardashian', 0.0),
 (u'serious', 0.0),
 (u'so', 9.719929433431632e-07),
 (u'up', 2.3948220007729206e-06),
 (u'only', 3.4565482998269166e-06),
 (u'not', 6.769332261068325e-06),
 (u'even', 7.76747119728939e-06),
 (u'crazy', 8.742022904100009e-06),
 (u'more', 8.838651139547831e-06),
 (u'wrong', 9.475619231716793e-06),
 (u'right', 9.72750361669927e-06),
 (u'around', 1.2384058436690012e-05),
 (u'nice', 1.3744503317492389e-05),
 (u'already', 1.4114724480578139e-05),
 (u'perfect', 1.6524555489457333e-05),
 (u'anymore', 1.782912565967765e-05),
 (u'here', 1.8486252267219916e-05),
 (u'comfortable', 1.8819632640770853e-05),
 (u'cool', 1.9084697889232416e-05),
 (u'really', 2.003658328600757e-05),
 (u'few', 2.2583559168925024e-05),
 (u'gorgeous', 2.8229448961156277e-05),
 (u'same', 2.902757619738752e-05),
 (u'fine', 2.9456816307293507e-05),
 (u'least', 3.0111412225233366e-05),
 (u'obviously', 3.070864690791617e-05),
 (u'nervous', 3.0795762503079576e-05),
 (u'beautiful', 3.22622273841786e-05),
 (u'as', 3.4001658469599825e-05),
 (u'armenian', 3.474393718296157e-05),
 (u'healthy', 3.662198784150004e-05),
 (u'gon', 3.8714672861014324e-05),
 (u'then', 3.890054232777123e-05),
 (u'easy', 4.337453914552158e-05),
 (u'there', 4.834990733331861e-05),
 (u'smart', 5.01856870420556e-05),
 (u'real', 5.2930216802168025e-05),
 (u'close', 5.313778627982358e-05),
 (u'whole', 5.386263165728602e-05),
 (u'totally', 5.519544414651547e-05),
 (u'um', 5.741582839557209e-05),
 (u'clean', 5.76601510695958e-05),
 (u'about', 5.76601510695958e-05),
 (u'different', 6.123038880785898e-05),
 (u'pregnant', 6.174868577077505e-05),
 (u'sometimes', 6.302388605281402e-05),
 (u'big', 6.341485539311127e-05),
 (u'yeah', 6.45244547683572e-05),
 (u'next', 6.532809402019291e-05),
 (u'new', 6.927439464142474e-05),
 (u'back', 6.954514654828853e-05),
 (u'super', 7.13165026387106e-05),
 (u'black', 7.259001161440186e-05),
 (u'better', 7.326569222565542e-05),
 (u'first', 7.429889392282526e-05),
 (u'many', 7.472445357743322e-05),
 (u'excited', 7.49737591842855e-05),
 (u'instead', 7.527853056308341e-05),
 (u'amazing', 7.547169811320755e-05),
 (u'all', 7.970667941973537e-05),
 (u'san', 8.66884090568301e-05),
 (u'always', 8.746618938689085e-05),
 (u'naked', 8.775778850372971e-05),
 (u'jealous', 8.775778850372971e-05),
 (u'actually', 9.610766562275378e-05),
 (u'exactly', 9.746588693957115e-05),
 (u'else', 9.833424385220351e-05),
 (u'happy', 0.00010227503659065664),
 (u'half', 0.00010423181154888472),
 (u'maybe', 0.00010762602164254015),
 (u'fun', 0.00010986596352450011),
 (u'last', 0.0001109612782436421),
 (u'well', 0.00011183991935902711),
 (u'kimberly', 0.00011291779584462511),
 (u'very', 0.00011345906421838313),
 (u'soon', 0.00011466574934067194),
 (u'pretty', 0.00011686194177483377),
 (u'two', 0.0001231830500123183),
 (u'honest', 0.0001231830500123183),
 (u'online', 0.0001231830500123183),
 (u'though', 0.0001231830500123183),
 (u'dramatic', 0.0001231830500123183),
 (u'definitely', 0.00012496943308475518),
 (u'okay', 0.000128214961394897),
 (u'funny', 0.0001314146790196465),
 (u'again', 0.00013204006859581066),
 (u'most', 0.00013290802764486976),
 (u'anywhere', 0.00013550135501355014),
 (u'and', 0.00013550135501355014),
 (u'married', 0.0001397624039133473),
 (u'sad', 0.0001538935056940597),
 (u'sweet', 0.0001538935056940597),
 (u'honestly', 0.00015723999979378361),
 (u'strong', 0.00016260162601626016),
 (u'upset', 0.0001719986240110079),
 (u'sure', 0.00017624762858331133),
 (u'entire', 0.00018126879276463763),
 (u'such', 0.00018364057710583252),
 (u'own', 0.00018487766158777996),
 (u'sudden', 0.00019860973187686197),
 (u'interested', 0.00019860973187686197),
 (u'fabulous', 0.00020165355918531962),
 (u'before', 0.00020514393823183516),
 (u'older', 0.00020885547201336674),
 (u'na', 0.00021189213182167),
 (u'proud', 0.00021557033752155703),
 (u'less', 0.00022197558268590456),
 (u'lately', 0.00022197558268590456),
 (u'extremely', 0.00022583559168925022),
 (u'hopefully', 0.0002358490566037736),
 (u'mad', 0.00023869196801527628),
 (u'you', 0.0002463661000246366),
 (u'along', 0.00025157232704402514),
 (u'far', 0.00025590039182125566),
 (u'usually', 0.0002658160552897395),
 (u'small', 0.0002695417789757412),
 (u'hot', 0.0002696971971726943),
 (u'possible', 0.000278473962684489),
 (u'kim', 0.00029239766081871346),
 (u'out', 0.00030596634370219276),
 (u'nude', 0.00030795762503079576),
 (u'fresh', 0.0003103721974849886),
 (u'awful', 0.00031269543464665416),
 (u'absolutely', 0.0003298922527165572),
 (u'especially', 0.00034956718133497076),
 (u'personal', 0.0003654970760233918),
 (u'kelly', 0.0003654970760233918),
 (u'done', 0.00037735849056603777),
 (u'light', 0.00037735849056603777),
 (u'female', 0.00037735849056603777),
 (u'scared', 0.0004192872117400419),
 (u'secret', 0.000449842555105713),
 (u'positive', 0.0006097560975609756),
 (u'also', 0.0006190667334485414),
 (u'once', 0.0006244240240072233),
 (u'certainly', 0.0006289308176100629),
 (u'adrienne', 0.0006548151605917973),
 (u'willing', 0.000732656922256554),
 (u'couple', 0.0009433962264150943),
 (u'changei', 0.0013550135501355014),
 (u'good', 0.0015092722230527766),
 (u'great', 0.004056482093964539),
 (u'best', 0.0062437027559645916)]

lots of words that correlate with good which do not even have a polarity, so we need to focus on words that are more likely to have a polarity: adverbs and adjectives.

In [334]:
##example part of speech (POS) tagging (note that you need to tokenize the sentence first)
pos_tag(tokenizer.tokenize("This was a great day but the time is running out fast"))
Out[334]:
[('This', 'DT'),
 ('was', 'VBD'),
 ('a', 'DT'),
 ('great', 'JJ'),
 ('day', 'NN'),
 ('but', 'CC'),
 ('the', 'DT'),
 ('time', 'NN'),
 ('is', 'VBZ'),
 ('running', 'VBG'),
 ('out', 'RP'),
 ('fast', 'RB')]
In [335]:
## POS tagging  all reviews
## POS tagging is relatively slow, so this will take a while

#reviews_pos_tagged=[pos_tag(tokenizer.tokenize(m)) for m in data.data]

## Reconstructing adjective-and-adverb-only reviews
reviews_adj_adv_only=[" ".join([w for w,tag in m if tag in ["JJ","RB","RBS","RBJ","JJR","JJS"]])
                      for m in reviews_pos_tagged]
In [336]:
print(data.data[1])
good films are hard to find these days . 
great films are beyond rare . 
proof of life , russell crowe's one-two punch of a deft kidnap and rescue thriller , is one of those rare gems . 
a taut drama laced with strong and subtle acting , an intelligent script , and masterful directing , together it delivers something virtually unheard of in the film industry these days , genuine motivation in a story that rings true . 
consider the strange coincidence of russell crowe's character in proof of life making the moves on a distraught wife played by meg ryan's character in the film -- all while the real russell crowe was hitching up with married woman meg ryan in the outside world . 
i haven't seen this much chemistry between actors since mcqueen and mcgraw teamed up in peckinpah's masterpiece , the getaway . 
but enough with the gossip , let's get to the review . 
the film revolves around the kidnapping of peter bowman ( david morse ) , an american engineer working in south america who is kidnapped during a mass ambush of civilians by anti-government soldiers . 
upon discovering his identity , the rebel soldiers decide to ransom him for $6 million . 
the only problem is that the company peter bowman works for is being auctioned off , and no one will step forward with the money . 
with no choice available to her , bowman's wife alice ( ryan ) hires terry thorne ( crowe ) , a highly skilled negotiator and rescue operative , to arrange the return of her husband . 
but when things go wrong -- as they always do in these situations -- terry and his team ( which includes the most surprising casting choice of the year : david caruso ) take matters into their own hands . 
the film is notable in that it takes this very simple story line and creates a complex and intelligent character-driven vehicle filled with well-written dialogue , shades of motivation , and convincing acting by all the actors . 
the script is based on both a book ( the long march to freedom ) and a magazine article pertaining to kidnap/ransom situations , and the story has been sharply pieced together by tony gilroy , screenwriter of the devil's advocate and dolores claiborne . 
the biggest surprise for me was not the chemistry between crowe and ryan , but that between crowe and david caruso . 
dug out from b-movie hell , caruso pulls off a gutsy performance as crowe's right hand gun while providing most of the film's humor . 
ryan cries a lot and smokes too many cigarettes , david morse ends up getting everyone at the guerilla camp to hate him , and crowe provides another memorable acting turn as the stoic , gunslinger character of terry thorne . 
the most memorable pieces of the film lie in its action scenes . 
the bulk of those scenes , which bookend the movie , work extremely well as establishment and closure devices for all of the story's characters . 
the scenes are skillfully crafted and executed with amazing accuracy and poise . 
director taylor hackford mixes both his old-school style of filmmaking with the dizziness of a lars von trier film . 
proof of life is a thinking man's action movie . 
it is a film about the choices men and women make in the face of love and war , and the sacrifices one makes for those choices -- the sacrifices that help you sleep at night . 

In [337]:
## It kind of works:
reviews_adj_adv_only[1]
Out[337]:
"good hard great rare one-two rare taut strong subtle intelligent masterful together virtually unheard genuine true strange distraught meg real married outside n't much enough david american south anti-government only forward available ryan terry highly skilled wrong always most surprising own notable very simple complex intelligent character-driven well-written long sharply together tony biggest not gutsy right most ryan too many david memorable gunslinger terry most memorable extremely well skillfully amazing old-school trier"
In [338]:
## term doc matrix only for adj/adv
X = vec.fit_transform(reviews_adj_adv_only)
terms = vec.get_feature_names()
In [339]:
len(terms)
Out[339]:
562
In [ ]:
 
In [340]:
pmi_matrix=getcollocations_matrix(X)
pmi_matrix.shape  # n_words by n_words
Out[340]:
(562, 562)
In [ ]:
 
In [342]:
getcollocations("good",pmi_matrix,terms)
Out[342]:
[(u'good', 0.0012845617524013917),
 (u'sean', 0.0009252217997465145),
 (u'nicely', 0.0009139270410318754),
 (u'fairly', 0.0008755655970071575),
 (u'robin', 0.0008653937882442727),
 (u'pretty', 0.0008548338879871134),
 (u'forward', 0.0008305488343511157),
 (u'terrific', 0.0008224793031847478),
 (u'cool', 0.0008204205677528381),
 (u'sadly', 0.0008203411798967191),
 (u'horrible', 0.0008162394739972354),
 (u'stupid', 0.0008141637119023551),
 (u'technical', 0.0008138216263091188),
 (u'lovely', 0.000809148389221857),
 (u'totally', 0.0007957590383758413),
 (u'sad', 0.0007916292386003339),
 (u'anti', 0.000788200947102258),
 (u'therefore', 0.0007862742336760081),
 (u'climactic', 0.0007856565791326648),
 (u'naturally', 0.0007855407689057879),
 (u'thankfully', 0.0007735470703392302),
 (u'bad', 0.0007711712373639965),
 (u'total', 0.0007710181664554288),
 (u'average', 0.0007709092809637673),
 (u'nice', 0.0007687057994165336),
 (u'mainly', 0.0007579711225428067),
 (u'fun', 0.0007575426481942805),
 (u'dumb', 0.000753415011970145),
 (u'bigger', 0.0007503673016413496),
 (u'really', 0.0007459124729101358),
 (u'twice', 0.0007415796995928053),
 (u'suspenseful', 0.0007411854520119478),
 (u'badly', 0.000739332488381918),
 (u'boring', 0.000739332488381918),
 (u'extra', 0.000739332488381918),
 (u'witty', 0.0007356047615497403),
 (u'guilty', 0.0007334647702201568),
 (u'gary', 0.0007322912265878045),
 (u'co', 0.0007263617429717089),
 (u'violent', 0.000724764360532028),
 (u'nevertheless', 0.0007242440702516748),
 (u'natural', 0.0007230834227031945),
 (u'smart', 0.0007227257388674994),
 (u'fantastic', 0.0007187573727497682),
 (u'maybe', 0.0007172577930466747),
 (u'slightly', 0.0007134523539730903),
 (u'either', 0.000711347986379672),
 (u'probably', 0.0007112067778635317),
 (u'though', 0.0007101187426403834),
 (u'particular', 0.0007097591888466413),
 (u'scary', 0.0007082680981137702),
 (u'usual', 0.0007081268963398241),
 (u'longer', 0.000707932266867628),
 (u'looking', 0.0007065542007196655),
 (u'terribly', 0.0007065542007196655),
 (u'robert', 0.0007063265737220109),
 (u'brilliant', 0.0007041261794113506),
 (u'intelligent', 0.0007031810436000601),
 (u'realistic', 0.0007030259822560203),
 (u'overall', 0.0007022972802440484),
 (u'somewhere', 0.0007019084591612359),
 (u'able', 0.0007016336973603369),
 (u'impressive', 0.000698175817331818),
 (u'very', 0.0006967103421644604),
 (u'loud', 0.0006940672339911883),
 (u'plain', 0.0006934978597221227),
 (u'right', 0.0006931443949618157),
 (u'weird', 0.0006928601605407688),
 (u'national', 0.0006882265560052878),
 (u'there', 0.0006878385631616218),
 (u'alien', 0.0006874604710229162),
 (u'great', 0.0006869373973507421),
 (u'past', 0.0006869029491235909),
 (u'actually', 0.000685955181232993),
 (u'wonderfully', 0.0006859017371207038),
 (u'general', 0.0006852349892320217),
 (u'better', 0.0006850957421299627),
 (u'capable', 0.0006849227381546773),
 (u'sure', 0.0006843408156923539),
 (u'disappointing', 0.000683743579481022),
 (u'dull', 0.000683416585899252),
 (u'believable', 0.0006827207435572456),
 (u'huge', 0.0006824607585063859),
 (u'seemingly', 0.0006814124316884037),
 (u'necessary', 0.0006796348340405209),
 (u'biggest', 0.0006793866109455464),
 (u'relatively', 0.0006785215910691195),
 (u'before', 0.0006768233275566246),
 (u'evil', 0.000676808909574656),
 (u'definitely', 0.0006754837585539397),
 (u'major', 0.0006745290111920259),
 (u'sometimes', 0.0006742712294043093),
 (u'black', 0.0006720462994227253),
 (u'well', 0.000671504346629487),
 (u'as', 0.0006689450718072615),
 (u'basic', 0.0006683427178347082),
 (u'funny', 0.0006678723347275578),
 (u'hardly', 0.0006674406137613473),
 (u'forever', 0.0006673891613551061),
 (u'also', 0.0006673611178298724),
 (u'fair', 0.0006665727831760785),
 (u'special', 0.0006664405529076444),
 (u'musical', 0.0006658888637082176),
 (u'offensive', 0.0006655439230052492),
 (u'anyway', 0.0006651745184226375),
 (u'brief', 0.0006649400268180232),
 (u'moral', 0.0006631886108409231),
 (u'responsible', 0.0006630521522790218),
 (u'just', 0.0006628096845863869),
 (u'usually', 0.0006625610216839845),
 (u'again', 0.0006619799217660092),
 (u'interesting', 0.0006613756613756613),
 (u'regular', 0.0006609700587377516),
 (u'occasionally', 0.0006603401817000564),
 (u'then', 0.0006594307667562458),
 (u'awful', 0.000658276102612472),
 (u'especially', 0.0006559491250305739),
 (u'fake', 0.0006555657532450505),
 (u'supposedly', 0.0006553789823751801),
 (u'terrible', 0.0006545398287485793),
 (u'however', 0.000654004356326862),
 (u'next', 0.0006525654523148672),
 (u'extremely', 0.0006524524033416466),
 (u'basically', 0.0006518196632265073),
 (u'typical', 0.0006518196632265073),
 (u'too', 0.0006512609037979682),
 (u'ahead', 0.0006508409550234646),
 (u'best', 0.0006491146329308628),
 (u'danny', 0.0006485894666690468),
 (u'never', 0.0006477344547582753),
 (u'tough', 0.0006469159273341783),
 (u'even', 0.0006464883829070581),
 (u'around', 0.0006460357696099141),
 (u'social', 0.0006452356262242194),
 (u'together', 0.0006444825500965067),
 (u'unbelievable', 0.0006444544692917445),
 (u'entirely', 0.0006426391045895142),
 (u'personal', 0.0006426077868943588),
 (u'not', 0.0006423935572122158),
 (u'minor', 0.0006421630756231517),
 (u'likable', 0.0006417352521217373),
 (u'mean', 0.000641257770535337),
 (u'quite', 0.0006410216256057863),
 (u'predictable', 0.000640754823264329),
 (u'subtle', 0.0006402131877417048),
 (u'always', 0.0006402020962292961),
 (u'generally', 0.0006398661203194408),
 (u'entire', 0.0006378554801726352),
 (u'frankly', 0.0006378554801726352),
 (u'little', 0.0006375234626413791),
 (u'ever', 0.000637266073888979),
 (u'short', 0.0006372459670525467),
 (u'largely', 0.0006370665432769362),
 (u'common', 0.0006370141529362062),
 (u'second', 0.0006359849362425101),
 (u'wrong', 0.000635925476169937),
 (u'instead', 0.0006355829230084756),
 (u'strong', 0.00063450471448079),
 (u'nearly', 0.000634401632655308),
 (u'back', 0.0006337135614702154),
 (u'quiet', 0.0006337135614702154),
 (u'possible', 0.0006312087647845625),
 (u'poor', 0.0006307932224772652),
 (u'mental', 0.0006302506458337662),
 (u'interested', 0.0006301929305731587),
 (u'stunning', 0.0006300076342101557),
 (u'so', 0.0006286547908608524),
 (u'dramatic', 0.0006282410782105418),
 (u'funniest', 0.0006278458433084542),
 (u'professional', 0.0006275006834165859),
 (u'frank', 0.0006273124143846578),
 (u'wild', 0.0006271417171290429),
 (u'naked', 0.000627112378538234),
 (u'star', 0.000627112378538234),
 (u'powerful', 0.000626749676179334),
 (u'decent', 0.0006267189305489107),
 (u'john', 0.0006262076478825818),
 (u'completely', 0.0006255335079053),
 (u'later', 0.0006254064548591827),
 (u'worse', 0.0006235667649983488),
 (u'perfectly', 0.0006235334239365573),
 (u'intriguing', 0.0006233248145608677),
 (u'laughable', 0.0006222952991013828),
 (u'surprising', 0.000622107085985413),
 (u'anywhere', 0.000621756701819834),
 (u'tight', 0.000621756701819834),
 (u'finally', 0.0006216428269660209),
 (u'big', 0.0006213592862226337),
 (u'much', 0.0006212148913906888),
 (u'ago', 0.0006211534728644996),
 (u'more', 0.0006199798310358765),
 (u'hard', 0.0006190442660658123),
 (u'straight', 0.0006188376562713841),
 (u'present', 0.0006177106937563211),
 (u'enough', 0.000617575663857709),
 (u'same', 0.0006162318080503474),
 (u'small', 0.0006154949120728588),
 (u'many', 0.0006153885110596677),
 (u'effectively', 0.0006151839251699169),
 (u'hot', 0.0006151839251699169),
 (u'recently', 0.0006147967387397613),
 (u'almost', 0.0006143694356622113),
 (u'here', 0.000613960063766426),
 (u'certain', 0.0006135451231654682),
 (u'still', 0.0006133362708912624),
 (u'important', 0.0006130810269107201),
 (u'remarkable', 0.0006128872941918517),
 (u'far', 0.0006120224706352686),
 (u'spectacular', 0.0006114779979098571),
 (u'visually', 0.0006102426888231705),
 (u'other', 0.0006101703594775709),
 (u'seriously', 0.0006099649270166965),
 (u'quickly', 0.0006096904329961811),
 (u'obviously', 0.0006096250342798272),
 (u'earlier', 0.000609151020327959),
 (u'slowly', 0.0006090691451908182),
 (u'long', 0.0006088771107782514),
 (u'fully', 0.0006088620492556972),
 (u'superior', 0.00060817931540365),
 (u'running', 0.0006077720706497973),
 (u'talented', 0.0006052776965324494),
 (u'wonderful', 0.0006052776965324494),
 (u'comic', 0.0006048073288417495),
 (u'oh', 0.0006045773057704355),
 (u'unfunny', 0.0006042385120995078),
 (u'about', 0.0006041402619349387),
 (u'top', 0.0006033140179310539),
 (u'likely', 0.0006028007048131317),
 (u'only', 0.0006026167735226865),
 (u'final', 0.0006025837724857136),
 (u'obvious', 0.0006024997205272381),
 (u'main', 0.000602477888849712),
 (u'apparently', 0.0006023816309988012),
 (u'incredibly', 0.0006020278833967047),
 (u'unfortunately', 0.0006016407983242533),
 (u'due', 0.0006013812369054086),
 (u'immediately', 0.0006013151176322699),
 (u'early', 0.000600729577650614),
 (u'often', 0.0006006006006006006),
 (u'incredible', 0.0006003602161296778),
 (u'cute', 0.0005998111898689282),
 (u'most', 0.0005985685875326978),
 (u'emotional', 0.0005982114011637608),
 (u'positive', 0.0005979656169770238),
 (u'now', 0.0005979394088065742),
 (u'several', 0.0005979190802734094),
 (u'highly', 0.0005964362931484381),
 (u'along', 0.0005962790050964475),
 (u'similar', 0.000595100190341206),
 (u'whole', 0.0005942959715223074),
 (u'willing', 0.0005934777797895668),
 (u'few', 0.0005933943030640883),
 (u'absolutely', 0.0005928288155689112),
 (u'sweet', 0.0005920610269134877),
 (u'away', 0.000591620460799738),
 (u'french', 0.0005914659907055344),
 (u'yet', 0.0005911647602544493),
 (u'worth', 0.0005908950775870928),
 (u'mary', 0.0005906836282839662),
 (u'least', 0.0005905845212708019),
 (u'surprisingly', 0.0005905812248256458),
 (u'old', 0.000590101593956502),
 (u'practically', 0.0005898717427521502),
 (u'happy', 0.000589500987414154),
 (u'up', 0.0005884162722214598),
 (u'middle', 0.00058779228889991),
 (u'effective', 0.0005876058065747514),
 (u'future', 0.0005875052809463456),
 (u'international', 0.0005875052809463456),
 (u'solid', 0.0005870447332999283),
 (u'exciting', 0.0005864956882626307),
 (u'tony', 0.0005864215046440799),
 (u'soon', 0.0005851288550908323),
 (u'unnecessary', 0.000584966364434045),
 (u'easy', 0.0005848159101222051),
 (u'free', 0.0005844776707294218),
 (u'third', 0.0005841639414375648),
 (u'apart', 0.0005819005029852293),
 (u'first', 0.0005809329714224865),
 (u'last', 0.000580213367814585),
 (u'amazing', 0.0005800090223625701),
 (u'extreme', 0.0005789481919604438),
 (u'non', 0.000578608034385849),
 (u'utterly', 0.0005783893616593237),
 (u'available', 0.0005781246525693194),
 (u'simple', 0.0005776035065483734),
 (u'otherwise', 0.0005774847802366472),
 (u'double', 0.0005765033093930432),
 (u'fascinating', 0.0005761032377001958),
 (u'rather', 0.0005757935047767011),
 (u'shallow', 0.000575036379852603),
 (u'light', 0.000574915395972979),
 (u'pure', 0.0005743029150823828),
 (u'exactly', 0.000574280538191088),
 (u'else', 0.0005736775398572476),
 (u'pathetic', 0.0005736775398572476),
 (u'honest', 0.0005733598889492426),
 (u'apparent', 0.0005718358063098241),
 (u'already', 0.0005715847809339198),
 (u'entertaining', 0.0005714539835012119),
 (u'enjoyable', 0.0005711245677447621),
 (u'friendly', 0.0005711245677447621),
 (u'single', 0.0005710582658446291),
 (u'deep', 0.000570424399040635),
 (u'certainly', 0.0005699326028365557),
 (u'quick', 0.000568979380459817),
 (u'constantly', 0.0005682595785953575),
 (u'painful', 0.0005679181643776794),
 (u'previous', 0.0005678436930736697),
 (u'real', 0.0005673647128220205),
 (u'popular', 0.0005661603391815123),
 (u'such', 0.0005660274727813993),
 (u'easily', 0.0005659961633545785),
 (u'normal', 0.0005658848928113239),
 (u'ready', 0.0005658156798841209),
 (u'particularly', 0.000565018324454474),
 (u'less', 0.0005649012303004698),
 (u'favorite', 0.0005643759453297084),
 (u'convincing', 0.0005633009435290804),
 (u'known', 0.0005633009435290804),
 (u'nasty', 0.0005633009435290804),
 (u'impossible', 0.0005624207858048163),
 (u'excellent', 0.0005616441760481125),
 (u'flat', 0.0005611509399278243),
 (u'originally', 0.0005593340354760587),
 (u'truly', 0.0005591590248266607),
 (u'virtually', 0.000558758193984491),
 (u'appropriate', 0.0005584449009124504),
 (u'perhaps', 0.0005572308902582929),
 (u'new', 0.0005569594344598478),
 (u'fast', 0.0005568997964435227),
 (u'screen', 0.0005568997964435227),
 (u'thoroughly', 0.000556594979915639),
 (u'clever', 0.000556329397198275),
 (u'safe', 0.0005561705518388389),
 (u'half', 0.0005560325442577374),
 (u'key', 0.000555889089008961),
 (u'aside', 0.0005552537871929507),
 (u'michael', 0.0005551134298149949),
 (u'once', 0.0005541844887360203),
 (u'different', 0.0005541309281693047),
 (u'necessarily', 0.0005540665018318823),
 (u'classic', 0.0005538756324660938),
 (u'emotionally', 0.0005537857248883865),
 (u'critical', 0.0005535888582958204),
 (u'chris', 0.0005533836733965262),
 (u'nowhere', 0.0005531382976406692),
 (u'original', 0.0005529724353966713),
 (u'dangerous', 0.0005528694445748382),
 (u'suddenly', 0.0005519938078013069),
 (u'down', 0.0005516731717589848),
 (u'lee', 0.0005513666015051592),
 (u'mysterious', 0.0005513666015051592),
 (u'serial', 0.0005504986493579649),
 (u'young', 0.0005498438886870331),
 (u'secret', 0.0005498397077462163),
 (u'intense', 0.0005497274268175362),
 (u'acting', 0.0005484772344888414),
 (u'military', 0.0005478256428826772),
 (u'humorous', 0.000547298075815186),
 (u'slow', 0.0005471823924341218),
 (u'low', 0.0005470205694386445),
 (u'somewhat', 0.0005465928646955907),
 (u'familiar', 0.0005464631435866351),
 (u'soft', 0.0005463047943708755),
 (u'animated', 0.0005462312179675932),
 (u'potential', 0.0005459686068051087),
 (u'over', 0.0005457929412302035),
 (u'essentially', 0.0005451299453507229),
 (u'silly', 0.0005447713072287817),
 (u'indeed', 0.0005445242454114444),
 (u'out', 0.0005438499440978276),
 (u'like', 0.0005435359981420951),
 (u'literally', 0.0005420443041506245),
 (u'crazy', 0.0005417629662764979),
 (u'no', 0.0005416355226241158),
 (u'serious', 0.0005409268406318974),
 (u'worst', 0.0005406021390612145),
 (u'memorable', 0.0005403090682829956),
 (u'true', 0.0005402317728588797),
 (u'psychological', 0.0005402148392860853),
 (u'standard', 0.0005398300708820353),
 (u'visual', 0.0005398300708820353),
 (u'close', 0.0005394994952109503),
 (u'jean', 0.0005392124163386921),
 (u'cold', 0.0005386565272496831),
 (u'bright', 0.0005385404624948351),
 (u'rarely', 0.0005384494313145622),
 (u'mad', 0.0005383158210338389),
 (u'complex', 0.0005376963551868495),
 (u'merely', 0.0005360160540768906),
 (u'poorly', 0.0005348362681911748),
 (u'computer', 0.0005346958174904943),
 (u'giant', 0.0005346958174904943),
 (u'clear', 0.0005346585226716695),
 (u'simply', 0.0005340171912077672),
 (u'successful', 0.0005337730714892496),
 (u'dark', 0.0005322365532609326),
 (u'time', 0.0005320064466663538),
 (u'rich', 0.0005315765772039537),
 (u'unique', 0.0005310015775010368),
 (u'to', 0.0005304770163685514),
 (u'life', 0.0005304417218232174),
 (u'day', 0.0005298847858621011),
 (u'oddly', 0.0005298847858621011),
 (u'physical', 0.0005292552821069931),
 (u'fi', 0.0005289821885661743),
 (u'mostly', 0.0005289533250212096),
 (u'billy', 0.0005280946345585128),
 (u'genuinely', 0.0005280946345585128),
 (u'minute', 0.0005280946345585128),
 (u'graphic', 0.0005267906971892326),
 (u'constant', 0.0005267229601830362),
 (u'dead', 0.0005264528895806107),
 (u'dimensional', 0.0005256383804442872),
 (u'sci', 0.0005254319725355288),
 (u'comedic', 0.0005253864569453923),
 (u'lucky', 0.0005250769509324643),
 (u'clearly', 0.0005248940610157341),
 (u'weak', 0.000524032368138832),
 (u'overly', 0.0005238698774820448),
 (u'female', 0.0005235025073014823),
 (u'surely', 0.0005227241806477484),
 (u'private', 0.0005221904709423308),
 (u'open', 0.0005221162047333222),
 (u'difficult', 0.0005217574989438107),
 (u'older', 0.0005216281696455515),
 (u'traditional', 0.0005214934516265315),
 (u'eventually', 0.0005210533727643994),
 (u'all', 0.0005208105706335679),
 (u'late', 0.0005208105706335679),
 (u'united', 0.000520725872215836),
 (u'attractive', 0.000520366420394242),
 (u'married', 0.0005194373454673897),
 (u'large', 0.000519041583680367),
 (u'aware', 0.0005188298164083636),
 (u'various', 0.0005184929139301763),
 (u'strange', 0.0005179504438381798),
 (u'own', 0.0005168251212652047),
 (u'rare', 0.0005158650746003157),
 (u'heavily', 0.0005156688784512538),
 (u'hilarious', 0.0005154203633291086),
 (u'barely', 0.0005146522256788417),
 (u'genuine', 0.0005143182527874212),
 (u'greatest', 0.0005139070175106722),
 (u'lead', 0.0005117311388397984),
 (u'ultimate', 0.0005114179618882441),
 (u'david', 0.0005107542137222632),
 (u'complete', 0.0005099706766860905),
 (u'grand', 0.000509006876682904),
 (u'narrative', 0.0005084029702190428),
 (u'perfect', 0.0005083120418988606),
 (u'near', 0.0005081954164447138),
 (u'thus', 0.0005069708491761723),
 (u'possibly', 0.0005056402170261037),
 (u'successfully', 0.0005055856829215927),
 (u'high', 0.0005051027593124279),
 (u'nonetheless', 0.0005037210360404276),
 (u'english', 0.0005033753112387527),
 (u'further', 0.0005031733147254146),
 (u'beautiful', 0.0005017852267718433),
 (u'initially', 0.0005012423650046902),
 (u'fresh', 0.0005008732616431257),
 (u'recent', 0.0005005636346526189),
 (u'eccentric', 0.0005004712229046829),
 (u'alive', 0.0004996234455649235),
 (u'tim', 0.0004987560437497066),
 (u'wide', 0.0004972891142092662),
 (u'human', 0.000496486050592237),
 (u'innocent', 0.0004943864663952036),
 (u'thin', 0.0004928883255879454),
 (u'public', 0.0004908173662367355),
 (u'steven', 0.0004887464068855257),
 (u'political', 0.0004883160776696898),
 (u'painfully', 0.00048584706379383184),
 (u'sole', 0.0004850647013722637),
 (u'sympathetic', 0.0004850647013722637),
 (u'equally', 0.00048392671966816455),
 (u'unusual', 0.00048392671966816455),
 (u'sexual', 0.0004825117292597781),
 (u'modern', 0.0004824400016353898),
 (u'ex', 0.00048142580638822573),
 (u'ultimately', 0.00048130143909130293),
 (u'somehow', 0.00048103669682557603),
 (u'sexy', 0.0004809723440902148),
 (u'off', 0.00047995539576202255),
 (u'full', 0.00047982005577615577),
 (u'outstanding', 0.00047956701949097386),
 (u'lame', 0.00047839161012947634),
 (u'william', 0.0004781867899738622),
 (u'hearted', 0.00047724107715658195),
 (u'empty', 0.00047698870218188265),
 (u'numerous', 0.0004768537690270928),
 (u'unable', 0.00047624534316549524),
 (u'chinese', 0.0004752851711026616),
 (u'subject', 0.00047456175379504724),
 (u'accidentally', 0.00047435868928764663),
 (u'cheap', 0.0004739971354086165),
 (u'foreign', 0.0004734641551214253),
 (u'blue', 0.00047337639531510066),
 (u'worthy', 0.0004729205682613548),
 (u'self', 0.00047247283281211326),
 (u'fine', 0.0004720845975598827),
 (u'alone', 0.0004717329650406088),
 (u'occasional', 0.00047017457786499854),
 (u'famous', 0.00046941745294090036),
 (u'former', 0.0004683969802171158),
 (u'british', 0.00046774096203754004),
 (u'of', 0.0004671606382632999),
 (u'frequently', 0.0004659658540222173),
 (u'sharp', 0.0004641282422035381),
 (u'green', 0.0004633604535481145),
 (u'sudden', 0.00046088259016015666),
 (u'year', 0.00046039019423049836),
 (u'deadly', 0.0004588960272715354),
 (u'the', 0.0004575102785248385),
 (u'be', 0.00045646800596322034),
 (u'desperately', 0.00045600552571401745),
 (u'actual', 0.0004552026990842392),
 (u'initial', 0.0004543606667144941),
 (u'local', 0.0004540037455309006),
 (u'extraordinary', 0.00045265254390729675),
 (u'ugly', 0.00045265254390729675),
 (u'heavy', 0.0004506407548232643),
 (u'ridiculous', 0.0004496348602812481),
 (u'limited', 0.0004480802959890412),
 (u'cinematic', 0.00044788778028721993),
 (u'one', 0.00044732721986132855),
 (u'unfortunate', 0.00044594658029385536),
 (u'younger', 0.0004456886586164153),
 (u'current', 0.00044442249765443685),
 (u'american', 0.0004437132656613717),
 (u'ill', 0.00044259359848713466),
 (u'inevitable', 0.0004405818094031022),
 (u'tiny', 0.0004393747359526827),
 (u'greater', 0.0004358876348736932),
 (u'bottom', 0.00043547496018978905),
 (u'directly', 0.000433893970015643),
 (u'fellow', 0.0004321877928800703),
 (u'latter', 0.00042951696944092376),
 (u'white', 0.00042928983196369435),
 (u'latest', 0.0004270929284954093),
 (u'meanwhile', 0.00042687649626813127),
 (u'tom', 0.0004224757076468103),
 (u'unlikely', 0.0004224757076468103),
 (u'red', 0.0004156615833299263),
 (u'romantic', 0.00041560618394523616),
 (u'odd', 0.00041385375442952845),
 (u'unexpected', 0.0004098644924931742),
 (u'teen', 0.00040827484352422847),
 (u'and', 0.0004045741946109285),
 (u'desperate', 0.00039819549456366027),
 (u'creative', 0.0003972532773395381),
 (u'two', 0.00039229887138632385),
 (u'ten', 0.0003886776510350655),
 (u'in', 0.0003876323503151146),
 (u'central', 0.0003875603599074045),
 (u'on', 0.0003857386895905659),
 (u'live', 0.0003802281368821293),
 (u'previously', 0.00037898556127140335),
 (u'bizarre', 0.0003456619426201175),
 (u'angry', 0.0003360602219917809)]

We can make this better by combining multiple seet terms

In [343]:
def seed_score(pos_seed,PMI_MATRIX=pmi_matrix,TERMS=terms):
    score=defaultdict(int)
    for seed in pos_seed:
        c=dict(getcollocations(seed,PMI_MATRIX,TERMS))
        for w in c:
            score[w]+=c[w]
    return score
In [345]:
sorted(seed_score(['good','great','perfect','cool']).items(),key=itemgetter(1),reverse=True)
Out[345]:
[(u'cool', 0.012001912748204434),
 (u'perfect', 0.006782938654467102),
 (u'great', 0.004234935151833858),
 (u'anti', 0.004160925070909675),
 (u'fake', 0.003978386428679741),
 (u'looking', 0.003957222634925364),
 (u'frank', 0.003953470579252501),
 (u'lovely', 0.0038977169233890795),
 (u'eccentric', 0.0038458229553531894),
 (u'greatest', 0.0037893056708582906),
 (u'totally', 0.0036293608998168546),
 (u'amazing', 0.003617561923228757),
 (u'stupid', 0.0035962513836334904),
 (u'generally', 0.003553253311814994),
 (u'climactic', 0.003537863066483464),
 (u'fun', 0.0035376706229829896),
 (u'twice', 0.0034429868622216564),
 (u'known', 0.0034156002474412875),
 (u'plain', 0.0033558593778353143),
 (u'good', 0.003300231759403832),
 (u'nicely', 0.0032826646303092937),
 (u'alien', 0.0032506377240264714),
 (u'overall', 0.003246573557239219),
 (u'convincing', 0.0032306160532405035),
 (u'necessary', 0.0032268324022759576),
 (u'earlier', 0.00320436224269597),
 (u'pretty', 0.0032003653412340915),
 (u'sad', 0.003187690012362329),
 (u'painful', 0.0031550334960725986),
 (u'quiet', 0.0031356051653670504),
 (u'terribly', 0.0031347252607038514),
 (u'pure', 0.003130142164470019),
 (u'past', 0.0031298060186221582),
 (u'intriguing', 0.003115520114794243),
 (u'apart', 0.0030991381997725305),
 (u'tony', 0.003069088503440111),
 (u'mary', 0.0030514393148461045),
 (u'actually', 0.0030506607023727842),
 (u'black', 0.003027284836003895),
 (u'best', 0.00302388631921692),
 (u'perfectly', 0.0030237182959442044),
 (u'horrible', 0.0030226425573185774),
 (u'friendly', 0.003018374685245776),
 (u'maybe', 0.003014183313704173),
 (u'nonetheless', 0.0030069332166719628),
 (u'non', 0.003003272731140307),
 (u'definitely', 0.0030031570539691705),
 (u'necessarily', 0.002993810771295877),
 (u'musical', 0.002991784203916739),
 (u'shallow', 0.002981744162904472),
 (u'extra', 0.002979129889469642),
 (u'classic', 0.0029709775923671394),
 (u'sean', 0.0029546706033888666),
 (u'basically', 0.0029421152254114707),
 (u'visually', 0.0029358658029767504),
 (u'bigger', 0.002923170822543089),
 (u'really', 0.0029218575415147856),
 (u'light', 0.002902417490684917),
 (u'straight', 0.00290152535467921),
 (u'forward', 0.0029006646141443147),
 (u'brilliant', 0.0028974531469318663),
 (u'somewhere', 0.002888470361242569),
 (u'probably', 0.0028667284143727334),
 (u'technical', 0.0028610204137455974),
 (u'fully', 0.00286017815256421),
 (u'forever', 0.002852944783095861),
 (u'green', 0.0028405257859683893),
 (u'sympathetic', 0.002839253674438178),
 (u'excellent', 0.00283813047530481),
 (u'nice', 0.002833410593649864),
 (u'day', 0.0028257993835809634),
 (u'slightly', 0.0028222905234820566),
 (u'mainly', 0.0028146496498095697),
 (u'literally', 0.002808983880889349),
 (u'sadly', 0.002807958075124012),
 (u'sure', 0.0028000700575213024),
 (u'huge', 0.0027946757739472075),
 (u'blue', 0.002783309447572288),
 (u'professional', 0.002773927402898334),
 (u'scary', 0.002770530957340576),
 (u'regular', 0.0027687862144207126),
 (u'interesting', 0.002767678804545164),
 (u'all', 0.002767455753874969),
 (u'moral', 0.0027672538806976406),
 (u'present', 0.002766172519637502),
 (u'john', 0.002763285894343504),
 (u'utterly', 0.0027598334785873114),
 (u'witty', 0.002759802990681556),
 (u'stunning', 0.002754567313948104),
 (u'very', 0.002754475732165381),
 (u'wonderful', 0.0027528746021938714),
 (u'nasty', 0.002750778743910107),
 (u'entire', 0.0027486000007843282),
 (u'nevertheless', 0.0027422265029806514),
 (u'quick', 0.0027375966375266063),
 (u'second', 0.0027345827164219757),
 (u'especially', 0.002733256847499587),
 (u'cold', 0.0027272967804060654),
 (u'same', 0.002726697589261401),
 (u'memorable', 0.0027248127669407428),
 (u'steven', 0.0027193586944238095),
 (u'french', 0.002716473947971556),
 (u'exactly', 0.002715773745840665),
 (u'realistic', 0.0027153709885526394),
 (u'anyway', 0.002711612563487999),
 (u'mad', 0.0027046117438334037),
 (u'entirely', 0.0027039462090588176),
 (u'still', 0.0027030152438311653),
 (u'third', 0.002694415514054729),
 (u'smart', 0.002690365929270928),
 (u'extreme', 0.0026859194745589665),
 (u'though', 0.002681282965005704),
 (u'soft', 0.0026798542026700055),
 (u'also', 0.002679265025818489),
 (u'famous', 0.002674277568707314),
 (u'badly', 0.0026718373920243425),
 (u'constantly', 0.0026691142464650183),
 (u'yet', 0.0026600761270785295),
 (u'funny', 0.0026531437436938293),
 (u'just', 0.002647198765749662),
 (u'always', 0.002645782644141673),
 (u'wrong', 0.00264456909940569),
 (u'inevitable', 0.0026424017992415253),
 (u'boring', 0.002638324709954785),
 (u'again', 0.0026361664213644374),
 (u'final', 0.0026325569909139406),
 (u'never', 0.002625848256123294),
 (u'not', 0.0026194513346862627),
 (u'future', 0.002616954084588698),
 (u'like', 0.0026162045111048924),
 (u'usual', 0.002613310253136084),
 (u'suspenseful', 0.002611112619052867),
 (u'wonderfully', 0.0026104285952339),
 (u'chinese', 0.002610227628625764),
 (u'then', 0.0026099428487068874),
 (u'right', 0.0026074503637217653),
 (u'before', 0.002604359817305096),
 (u'incredibly', 0.002602876658439028),
 (u'willing', 0.0026028274849918065),
 (u'robert', 0.00260242630248303),
 (u'mental', 0.002600612637348587),
 (u'likable', 0.002596743985407318),
 (u'completely', 0.0025966342812692813),
 (u'out', 0.002595421591107008),
 (u'weird', 0.002594854320599445),
 (u'bad', 0.002593939708098807),
 (u'about', 0.0025924880567343464),
 (u'over', 0.002583915258341187),
 (u'poor', 0.002580714006945396),
 (u'top', 0.0025748163584370844),
 (u'danny', 0.002571782212215268),
 (u'else', 0.0025710205049421482),
 (u'deadly', 0.002569539962164636),
 (u'next', 0.0025692064920089814),
 (u'easily', 0.0025685985055154945),
 (u'general', 0.0025597407189698524),
 (u'there', 0.002558311952404383),
 (u'later', 0.002553925871598969),
 (u'dumb', 0.0025518493910496854),
 (u'long', 0.002551438021081097),
 (u'nearly', 0.0025482925872103374),
 (u'evil', 0.002547619195532207),
 (u'obviously', 0.0025417900394499255),
 (u'surely', 0.002540410069301379),
 (u'ever', 0.0025299666270243693),
 (u'as', 0.0025295649156630716),
 (u'whole', 0.00252928721041847),
 (u'attractive', 0.00252799391976053),
 (u'single', 0.002525086026324002),
 (u'little', 0.0025237664375132246),
 (u'minor', 0.002523388079428801),
 (u'therefore', 0.002521419517208388),
 (u'already', 0.0025180371571217573),
 (u'terrific', 0.002516692390671686),
 (u'and', 0.0025147449761868494),
 (u'believable', 0.0025147020336030515),
 (u'intelligent', 0.0025117257402157674),
 (u'certain', 0.0025098535845574114),
 (u'tom', 0.0025081816901269868),
 (u'first', 0.002506983660237951),
 (u'superior', 0.0025068521818184517),
 (u'similar', 0.0025067487200958753),
 (u'average', 0.002506085475734242),
 (u'only', 0.0025057815352721303),
 (u'strong', 0.002502877514350509),
 (u'together', 0.002502209075754721),
 (u'well', 0.0025007427911392047),
 (u'able', 0.002500684083336215),
 (u'lucky', 0.0024996255614257753),
 (u'important', 0.0024967694522345547),
 (u'comic', 0.002494477741432366),
 (u'close', 0.0024890312948109006),
 (u'major', 0.002488996763330637),
 (u'hardly', 0.0024843138320983773),
 (u'too', 0.0024787095371459206),
 (u'desperately', 0.0024772938757274057),
 (u'original', 0.002476324684059444),
 (u'late', 0.0024747309617063505),
 (u'powerful', 0.0024742040581959284),
 (u'seriously', 0.002474163813548089),
 (u'fascinating', 0.0024735683609717705),
 (u'creative', 0.0024716985872205726),
 (u'older', 0.002470205167597138),
 (u'sometimes', 0.002468843682041278),
 (u'disappointing', 0.0024683926476614993),
 (u'almost', 0.0024642063085405586),
 (u'different', 0.002463192492144597),
 (u'here', 0.002461013442194645),
 (u'slowly', 0.00245995228639654),
 (u'traditional', 0.0024577471015898586),
 (u'other', 0.002457246892174661),
 (u'favorite', 0.0024567949763845435),
 (u'away', 0.0024530830995225053),
 (u'seemingly', 0.002452568157058489),
 (u'beautiful', 0.0024503474408517005),
 (u'total', 0.0024495169417283976),
 (u'hearted', 0.002449259178206941),
 (u'sexy', 0.0024479446336431497),
 (u'so', 0.0024453791255933297),
 (u'back', 0.0024437749617670125),
 (u'extremely', 0.002442881899286231),
 (u'last', 0.0024427706214193625),
 (u'once', 0.0024427435350683263),
 (u'violent', 0.0024413148535007635),
 (u'computer', 0.0024410664738611504),
 (u'most', 0.002440544878761546),
 (u'greater', 0.002439910529833608),
 (u'simple', 0.002438425301845975),
 (u'incredible', 0.002436860796263535),
 (u'hot', 0.0024361171552712284),
 (u'wild', 0.0024361150529840993),
 (u'emotional', 0.002431028224492918),
 (u'barely', 0.0024295038179497886),
 (u'much', 0.002429219418852587),
 (u'normal', 0.0024291864897619296),
 (u'robin', 0.002429169545568417),
 (u'originally', 0.002428926091347079),
 (u'occasionally', 0.0024264869193837964),
 (u'instead', 0.002426421516587577),
 (u'biggest', 0.002425814817705052),
 (u'special', 0.002423753782279029),
 (u'key', 0.0024198049087821665),
 (u'emotionally', 0.0024168893316711564),
 (u'awful', 0.0024166736599597668),
 (u'merely', 0.0024160066437449),
 (u'tough', 0.002415850760785141),
 (u'longer', 0.0024137232490317575),
 (u'silly', 0.002412926146521142),
 (u'clear', 0.0024129094987903332),
 (u'thankfully', 0.002407958791075349),
 (u'subtle', 0.002407822697286698),
 (u'solid', 0.0024069495164455216),
 (u'animated', 0.002403947054178053),
 (u'far', 0.002402836369689498),
 (u'effective', 0.0024020099789288174),
 (u'psychological', 0.0023996855512158134),
 (u'absolutely', 0.002397100755116605),
 (u'effectively', 0.002396397590933667),
 (u'desperate', 0.0023941211834815623),
 (u'oh', 0.0023922063459022565),
 (u'intense', 0.002386542866431266),
 (u'half', 0.002379333472657521),
 (u'short', 0.0023792589292618623),
 (u'enough', 0.002377507831705425),
 (u'eventually', 0.002377299019332627),
 (u'no', 0.0023770760677420803),
 (u'least', 0.002368181962538259),
 (u'lame', 0.002365220157299579),
 (u'real', 0.0023617633351179614),
 (u'remarkable', 0.0023606985831208143),
 (u'more', 0.0023605999078189417),
 (u'responsible', 0.002359948741648729),
 (u'soon', 0.002359938226674398),
 (u'outstanding', 0.002359314496448564),
 (u'high', 0.0023563850582576634),
 (u'less', 0.002354248180937743),
 (u'aware', 0.00235403878234738),
 (u'off', 0.0023540198109719282),
 (u'unfortunately', 0.0023538478030223913),
 (u'entertaining', 0.0023512886326058097),
 (u'fairly', 0.0023501918963178886),
 (u'ago', 0.002346131658173984),
 (u'old', 0.0023456785476582464),
 (u'big', 0.002343169816371394),
 (u'usually', 0.0023415700617843986),
 (u'apparent', 0.0023411235971029913),
 (u'even', 0.002339017723228175),
 (u'british', 0.0023343079898948908),
 (u'private', 0.0023336707837249876),
 (u'truly', 0.0023333647838521543),
 (u'brief', 0.002328450862839378),
 (u'political', 0.002328219300061392),
 (u'new', 0.0023280603289829024),
 (u'personal', 0.002327406823136519),
 (u'pathetic', 0.0023266504982631434),
 (u'many', 0.0023251322427836583),
 (u'dull', 0.002323983691361993),
 (u'successful', 0.00232378963778053),
 (u'natural', 0.00232347857148544),
 (u'several', 0.0023221381576374552),
 (u'such', 0.0023216832271255963),
 (u'secret', 0.002320036307164827),
 (u'certainly', 0.002313826438589921),
 (u'national', 0.002310571362533237),
 (u'now', 0.0023071026749417445),
 (u'gary', 0.0023059201550554424),
 (u'ahead', 0.002305765277776907),
 (u'possible', 0.002305683370535466),
 (u'co', 0.002305675716173628),
 (u'up', 0.002304147677700143),
 (u'michael', 0.002304054526092925),
 (u'main', 0.0023034755151444515),
 (u'previously', 0.002302853044911183),
 (u'hilarious', 0.0022996054700948603),
 (u'surprisingly', 0.0022975986386493423),
 (u'indeed', 0.002294415457668248),
 (u'quickly', 0.0022894075586336105),
 (u'honest', 0.0022858426152568534),
 (u'obvious', 0.0022824351550353363),
 (u'lead', 0.0022801250926040386),
 (u'empty', 0.0022787458629400835),
 (u'bright', 0.0022781744601933487),
 (u'otherwise', 0.0022759441718963173),
 (u'typical', 0.002272176030126653),
 (u'directly', 0.0022716390510039673),
 (u'acting', 0.002270839907615521),
 (u'visual', 0.0022693937359295714),
 (u'red', 0.002267580396747179),
 (u'potential', 0.002267569683244895),
 (u'suddenly', 0.002267308638256236),
 (u'alive', 0.0022647225458673897),
 (u'particular', 0.002261098546637612),
 (u'cute', 0.0022601846543045196),
 (u'difficult', 0.0022583789129845796),
 (u'serial', 0.0022574422917854783),
 (u'serious', 0.0022555576176684624),
 (u'impossible', 0.002252838161706063),
 (u'human', 0.002251761457857284),
 (u'capable', 0.002251214627027211),
 (u'however', 0.002250878310420556),
 (u'small', 0.0022489610616680407),
 (u'basic', 0.002247545248914553),
 (u'rare', 0.002245093696867987),
 (u'initial', 0.002236370482808303),
 (u'somewhat', 0.0022362775505530485),
 (u'occasional', 0.0022325583381675933),
 (u'perhaps', 0.002229127640019336),
 (u'better', 0.00222872431278556),
 (u'immediately', 0.00222837544249588),
 (u'happy', 0.0022281474572684256),
 (u'sci', 0.002226199463490062),
 (u'unexpected', 0.0022242244195155953),
 (u'initially', 0.002223905432665024),
 (u'fi', 0.0022224232964827497),
 (u'deep', 0.002220172553119152),
 (u'english', 0.0022199070209070445),
 (u'relatively', 0.0022194005513029185),
 (u'frankly', 0.0022170437766284835),
 (u'tim', 0.0022164491592922848),
 (u'either', 0.002215226909613434),
 (u'decent', 0.0022151294486753015),
 (u'hard', 0.0022118044609765446),
 (u'fantastic', 0.0022116860546185424),
 (u'true', 0.0022096622931451456),
 (u'around', 0.0022052101513649414),
 (u'common', 0.0021964869321670502),
 (u'guilty', 0.002191528056240432),
 (u'impressive', 0.0021908868057265865),
 (u'overly', 0.002187284444122418),
 (u'the', 0.002186014099868943),
 (u'tight', 0.0021806236289156106),
 (u'william', 0.0021792244357119687),
 (u'few', 0.0021725275981657947),
 (u'worse', 0.002172125969565442),
 (u'sharp', 0.002167733580520371),
 (u'american', 0.0021671834202428948),
 (u'quite', 0.0021662678232227126),
 (u'grand', 0.0021638514386397864),
 (u'ultimate', 0.0021635430435549214),
 (u'naked', 0.002163253934841081),
 (u'fair', 0.0021623850316764993),
 (u'clever', 0.002161758342135364),
 (u'numerous', 0.0021531205363190913),
 (u'fast', 0.0021521990875953485),
 (u'spectacular', 0.0021506353944526525),
 (u'popular', 0.002149773941789711),
 (u'international', 0.002145442826787117),
 (u'dead', 0.002144923871247314),
 (u'thin', 0.0021420490070066913),
 (u'rich', 0.002140442241369157),
 (u'genuinely', 0.0021385328377386517),
 (u'finally', 0.002135069354217433),
 (u'strange', 0.00213249946133963),
 (u'david', 0.002131674948749883),
 (u'two', 0.0021311542507946985),
 (u'actual', 0.002130019594255519),
 (u'critical', 0.0021285787536416668),
 (u'early', 0.0021275775518247056),
 (u'ready', 0.002126255316360624),
 (u'complete', 0.002124450206798375),
 (u'rather', 0.0021216369548865406),
 (u'full', 0.002120302975858754),
 (u'often', 0.002120249530885974),
 (u'own', 0.0021176220394398546),
 (u'talented', 0.002117397757039607),
 (u'star', 0.002110450555424798),
 (u'sexual', 0.0021100038830677214),
 (u'slow', 0.002106556435290481),
 (u'ultimately', 0.002102429070835513),
 (u'standard', 0.0021011071697935426),
 (u'recent', 0.0020989821112242834),
 (u'successfully', 0.00209794634455777),
 (u'easy', 0.0020913916102462925),
 (u'cinematic', 0.0020863510243774347),
 (u'practically', 0.0020863338600298487),
 (u'innocent', 0.0020829247055854675),
 (u'apparently', 0.0020818272530590243),
 (u'white', 0.0020805888193709023),
 (u'teen', 0.00208043455309245),
 (u'unique', 0.0020798213565664508),
 (u'along', 0.0020766370391498675),
 (u'unable', 0.0020763540400483855),
 (u'modern', 0.0020714825896373953),
 (u'latter', 0.0020675605429683686),
 (u'unusual', 0.002062296039972969),
 (u'latest', 0.002061272246439931),
 (u'previous', 0.002057485557325315),
 (u'social', 0.0020569363927636954),
 (u'simply', 0.002056561312485684),
 (u'due', 0.0020562829551382658),
 (u'equally', 0.0020540934511252187),
 (u'funniest', 0.0020485830167337998),
 (u'unfortunate', 0.0020484195089316586),
 (u'physical', 0.0020473480444393867),
 (u'accidentally', 0.0020469232499676572),
 (u'safe', 0.0020448668547450237),
 (u'dangerous', 0.0020417794161610345),
 (u'possibly', 0.0020386220903686582),
 (u'various', 0.0020339050284206316),
 (u'ridiculous', 0.0020325475952677158),
 (u'offensive', 0.0020292458611658707),
 (u'terrible', 0.002026561555154676),
 (u'virtually', 0.0020262283399145745),
 (u'weak', 0.0020211161232871035),
 (u'highly', 0.0020189553152879535),
 (u'supposedly', 0.0020187061521539625),
 (u'surprising', 0.0020172937346282825),
 (u'clearly', 0.002006958446305926),
 (u'rarely', 0.0020063581098494266),
 (u'tiny', 0.0020019326856074065),
 (u'female', 0.002000906579375961),
 (u'worst', 0.001997082267018422),
 (u'interested', 0.0019965592969395252),
 (u'married', 0.0019958703396158782),
 (u'predictable', 0.001995796500886652),
 (u'mean', 0.0019952780185648017),
 (u'giant', 0.0019938810972402903),
 (u'former', 0.0019935747326038445),
 (u'wide', 0.001992678512824471),
 (u'thoroughly', 0.0019876947213072387),
 (u'open', 0.001985391496371365),
 (u'of', 0.0019840710261414297),
 (u'aside', 0.001980879544615943),
 (u'particularly', 0.0019799884254985325),
 (u'essentially', 0.001979535958320732),
 (u'young', 0.001979527444016401),
 (u'likely', 0.001978555463730754),
 (u'unbelievable', 0.0019781321288305305),
 (u'middle', 0.0019779221502780183),
 (u'graphic', 0.0019757563038609325),
 (u'fine', 0.0019730514211073512),
 (u'meanwhile', 0.0019706149351678606),
 (u'worthy', 0.0019701465238407046),
 (u'to', 0.001967727301545104),
 (u'on', 0.001966072780443591),
 (u'cheap', 0.0019635613315031036),
 (u'be', 0.0019614212350543806),
 (u'mysterious', 0.001955105730777367),
 (u'constant', 0.0019517160902452907),
 (u'dramatic', 0.0019502335417707808),
 (u'nowhere', 0.0019443437460459846),
 (u'enjoyable', 0.001944335976798465),
 (u'complex', 0.0019425574374672867),
 (u'mostly', 0.0019343488295512537),
 (u'ugly', 0.0019244600986641602),
 (u'available', 0.0019198981505397614),
 (u'exciting', 0.0019180413841347022),
 (u'ex', 0.0019178209412079668),
 (u'dark', 0.0019172892665868906),
 (u'comedic', 0.0019083717108983882),
 (u'thus', 0.0019002996618598237),
 (u'fresh', 0.0018948749882022184),
 (u'life', 0.0018937712571957634),
 (u'heavily', 0.0018928253789474128),
 (u'screen', 0.0018902765330460658),
 (u'down', 0.001889898895897701),
 (u'subject', 0.0018831542361833758),
 (u'ill', 0.0018819965739884203),
 (u'double', 0.0018796786045038223),
 (u'loud', 0.0018791703282450395),
 (u'running', 0.0018682127198850135),
 (u'poorly', 0.0018675936750734959),
 (u'younger', 0.0018669711858897257),
 (u'live', 0.0018649847829290982),
 (u'unnecessary', 0.001860470865744998),
 (u'lee', 0.001845920547781358),
 (u'bizarre', 0.0018344290227850471),
 (u'sweet', 0.0018344275577852434),
 (u'military', 0.0018252766899189777),
 (u'time', 0.0018157018163832289),
 (u'unfunny', 0.0018107499211615543),
 (u'central', 0.0018098151545962167),
 (u'familiar', 0.0018088856542559902),
 (u'extraordinary', 0.001807414152125126),
 (u'dimensional', 0.001807168544195006),
 (u'billy', 0.0018071267256039524),
 (u'angry', 0.0018045689485650237),
 (u'laughable', 0.0018010179250777143),
 (u'large', 0.0017995900716021266),
 (u'further', 0.0017991308991933287),
 (u'romantic', 0.0017907365126554233),
 (u'sudden', 0.0017901045645434214),
 (u'flat', 0.0017847476326207986),
 (u'chris', 0.0017769997084619025),
 (u'public', 0.0017746031851269188),
 (u'somehow', 0.0017618448125107278),
 (u'worth', 0.0017543764067179425),
 (u'largely', 0.0017465367771711331),
 (u'year', 0.0017394608281398418),
 (u'foreign', 0.0017392301594858725),
 (u'sole', 0.0017355727619161666),
 (u'positive', 0.0017317615894826737),
 (u'genuine', 0.001718798629778317),
 (u'recently', 0.0017025823715422455),
 (u'in', 0.0016990012040850324),
 (u'ten', 0.0016791478447310657),
 (u'painfully', 0.001678652772728573),
 (u'low', 0.0016759724349185725),
 (u'anywhere', 0.0016733952458503438),
 (u'limited', 0.001668362161432197),
 (u'free', 0.0016667490901339968),
 (u'frequently', 0.0016625677147607005),
 (u'near', 0.0016612218149129076),
 (u'naturally', 0.0016607535392798487),
 (u'humorous', 0.0016352419056944998),
 (u'odd', 0.0016225109209537126),
 (u'self', 0.0016005335142226706),
 (u'fellow', 0.001598076389246976),
 (u'united', 0.001593774057041965),
 (u'appropriate', 0.0015883322489310553),
 (u'one', 0.0015787649736899742),
 (u'local', 0.0015762071787598516),
 (u'narrative', 0.0015677455276479862),
 (u'crazy', 0.0015569066000159573),
 (u'bottom', 0.001546149619251545),
 (u'alone', 0.0015445381171166998),
 (u'minute', 0.001523900044625383),
 (u'unlikely', 0.0015000995356950866),
 (u'jean', 0.0014303533753913659),
 (u'oddly', 0.0014165951312930588),
 (u'current', 0.0013195330501843442),
 (u'heavy', 0.001248448290982055)]
In [347]:
posscores=seed_score(['good','great','perfect','cool'])
negscores=seed_score(['bad','terrible','wrong',"crap","long","boring"])

## sentiment polarity score will be the difference between the words that are close to the positive seed
## and the words that are close to the negative seed
sentscores={}
for w in terms:
    sentscores[w] = posscores[w] - negscores[w]
    
In [348]:
sorted(sentscores.items(),key=itemgetter(1),reverse=False)
Out[348]:
[(u'terrible', -0.010972487858524456),
 (u'boring', -0.009152588531402),
 (u'wrong', -0.0037842569272043196),
 (u'unfunny', -0.0028839715464925985),
 (u'bad', -0.002745669347410218),
 (u'frankly', -0.002735683658733542),
 (u'worst', -0.002650800210468679),
 (u'terribly', -0.002497993000217121),
 (u'anywhere', -0.002479642275811881),
 (u'laughable', -0.0024600189948948362),
 (u'horrible', -0.0023085769877623907),
 (u'awful', -0.0022332893067823654),
 (u'exciting', -0.0021194079061992045),
 (u'dull', -0.0019475225393855247),
 (u'running', -0.001919677366722775),
 (u'ugly', -0.0019027857871608356),
 (u'total', -0.0018358263440521236),
 (u'oddly', -0.001825867801362017),
 (u'painfully', -0.0017780445048585325),
 (u'ridiculous', -0.0017569353131335745),
 (u'poorly', -0.0017508500966694365),
 (u'bottom', -0.0016995579532760772),
 (u'current', -0.0016987113085641865),
 (u'successfully', -0.0016642378925818217),
 (u'pathetic', -0.0016356962074799996),
 (u'long', -0.0016266635116819225),
 (u'loud', -0.001602897330102698),
 (u'supposedly', -0.001601963787952321),
 (u'ten', -0.0015963920401226126),
 (u'longer', -0.0015882846862663351),
 (u'fair', -0.0015781345487195808),
 (u'complete', -0.0015756336615723376),
 (u'responsible', -0.0015332676187930182),
 (u'sadly', -0.001527851579187234),
 (u'foreign', -0.0015251667153189461),
 (u'chinese', -0.0015061170510252096),
 (u'positive', -0.0014976172523279425),
 (u'minute', -0.0014838443386630507),
 (u'worth', -0.0014745224449065645),
 (u'low', -0.001469731011335925),
 (u'sole', -0.0014495317049276017),
 (u'worse', -0.0013882709260687686),
 (u'stupid', -0.0013202645150308997),
 (u'unbelievable', -0.001319704417793653),
 (u'unnecessary', -0.0013068250007934096),
 (u'giant', -0.0012862481305175371),
 (u'guilty', -0.0012696870826236872),
 (u'huge', -0.0011716363434408194),
 (u'particular', -0.0011649531345376174),
 (u'nowhere', -0.0011620638741387348),
 (u'predictable', -0.0011613024162225889),
 (u'one', -0.0011593874971369273),
 (u'frequently', -0.0011554285647090838),
 (u'weak', -0.0011531149853807226),
 (u'down', -0.0011419182039187215),
 (u'offensive', -0.0011366657479160089),
 (u'graphic', -0.0011363816849142734),
 (u'seriously', -0.0011344455374323872),
 (u'desperately', -0.0011161513132458799),
 (u'oh', -0.001111547662698142),
 (u'double', -0.0011020533566854943),
 (u'international', -0.0010928198567703579),
 (u'thankfully', -0.0010902301162478444),
 (u'completely', -0.0010760318931627264),
 (u'poor', -0.00103866702754081),
 (u'silly', -0.001038540312821459),
 (u'absolutely', -0.001027435748663518),
 (u'jean', -0.0010140017472082751),
 (u'to', -0.0010115858226274967),
 (u'gary', -0.0010070218297941547),
 (u'possible', -0.0010025590916556012),
 (u'standard', -0.0009912099239919566),
 (u'of', -0.0009878949504556712),
 (u'dumb', -0.0009784543398925452),
 (u'disappointing', -0.0009756900327863331),
 (u'heavy', -0.0009713451522002838),
 (u'flat', -0.0009663081712942411),
 (u'middle', -0.0009621073865780962),
 (u'somehow', -0.0009587009918721025),
 (u'ex', -0.0009476020360521058),
 (u'no', -0.0009470747212862265),
 (u'due', -0.0009423906128011132),
 (u'sudden', -0.0009346695321723595),
 (u'hardly', -0.0009309010280151705),
 (u'narrative', -0.000928523352480348),
 (u'accidentally', -0.0009249678255190534),
 (u'female', -0.0009243270408085982),
 (u'public', -0.0009236482751538752),
 (u'entertaining', -0.0009229050621359738),
 (u'equally', -0.0009172811304320738),
 (u'modern', -0.0009162848762493533),
 (u'physical', -0.0008950524891427249),
 (u'apart', -0.0008926619097619935),
 (u'safe', -0.000887888000516411),
 (u'naked', -0.0008820158184256301),
 (u'cheap', -0.0008739972045461218),
 (u'basic', -0.0008681093271798832),
 (u'possibly', -0.0008674045241233957),
 (u'subject', -0.0008619626400079704),
 (u'plain', -0.0008577237983704335),
 (u'sweet', -0.00084739561689972),
 (u'robin', -0.0008467671520451816),
 (u'twice', -0.0008415855534974386),
 (u'anyway', -0.0008388387974757141),
 (u'angry', -0.0008340617818132718),
 (u'overly', -0.0008330763662077099),
 (u'largely', -0.0008316600758162794),
 (u'aside', -0.0008156820574845604),
 (u'slow', -0.0008140632049922348),
 (u'talented', -0.000813361131030282),
 (u'essentially', -0.0008084723479426809),
 (u'up', -0.0008050649163935399),
 (u'ultimate', -0.000804619511346139),
 (u'practically', -0.0008036196525271233),
 (u'obvious', -0.0007897023305959558),
 (u'appropriate', -0.0007858599546488657),
 (u'unfortunately', -0.0007830743334537918),
 (u'tiny', -0.0007810016970390661),
 (u'lame', -0.0007784745629901267),
 (u'unique', -0.000778336129572441),
 (u'complex', -0.0007777213054844434),
 (u'attractive', -0.0007720889163268158),
 (u'bizarre', -0.0007715003386157852),
 (u'better', -0.0007709039498411344),
 (u'odd', -0.0007552599151565062),
 (u'recently', -0.0007500236040481291),
 (u'half', -0.0007497782196305021),
 (u'rich', -0.0007471604119471463),
 (u'incredibly', -0.000744163886792942),
 (u'even', -0.0007426983048095066),
 (u'central', -0.0007425050228027793),
 (u'extremely', -0.0007316220772763207),
 (u'indeed', -0.0007309357257491715),
 (u'dead', -0.0007016908132519372),
 (u'superior', -0.0007015707094964823),
 (u'least', -0.0007011989621771744),
 (u'big', -0.0006905031732485781),
 (u'otherwise', -0.0006875860325658988),
 (u'aware', -0.0006873664321727212),
 (u'truly', -0.0006829821092590271),
 (u'fascinating', -0.0006779817080232037),
 (u'else', -0.0006768431043362028),
 (u'hard', -0.0006751686847043361),
 (u'free', -0.0006750478288130044),
 (u'military', -0.0006696152782781776),
 (u'mostly', -0.0006673677322305189),
 (u'be', -0.0006666705541028459),
 (u'too', -0.0006648100456499773),
 (u'utterly', -0.0006634455006497703),
 (u'such', -0.0006628058555271372),
 (u'interested', -0.0006590829464311145),
 (u'honest', -0.000657989708254409),
 (u'now', -0.000656435974339334),
 (u'time', -0.0006548332902411959),
 (u'sci', -0.0006544917451371882),
 (u'so', -0.0006524748570703415),
 (u'merely', -0.0006515577959294184),
 (u'the', -0.0006467840837392916),
 (u'fi', -0.0006423589762960891),
 (u'apparently', -0.000639277256057974),
 (u'totally', -0.0006388136916126528),
 (u'alone', -0.0006386257017009485),
 (u'there', -0.0006295539285055061),
 (u'rather', -0.0006289319771022664),
 (u'surprising', -0.0006273162579565118),
 (u'potential', -0.0006213470215711814),
 (u'easy', -0.000614095507121281),
 (u'already', -0.0006138704527556523),
 (u'various', -0.0006066641881557529),
 (u'crazy', -0.0005990740681345009),
 (u'future', -0.00059231466233218),
 (u'former', -0.000592036636930244),
 (u'special', -0.0005892044631823196),
 (u'local', -0.0005876959334561714),
 (u'bigger', -0.0005838593238211101),
 (u'available', -0.0005827300803454157),
 (u'year', -0.000580922857070901),
 (u'though', -0.0005779332324228033),
 (u'dark', -0.0005772746048261371),
 (u'major', -0.0005768078397518804),
 (u'funny', -0.000570069950523432),
 (u'latter', -0.0005694397396364772),
 (u'finally', -0.0005684924103107219),
 (u'only', -0.000567828620661859),
 (u'screen', -0.000567599131636246),
 (u'critical', -0.0005671683394556418),
 (u'chris', -0.0005591477399963297),
 (u'along', -0.0005545144723625335),
 (u'thus', -0.0005539437349819129),
 (u'here', -0.000552998751320494),
 (u'somewhere', -0.0005526140253634174),
 (u'much', -0.0005472668310253412),
 (u'wide', -0.0005417488267065382),
 (u'whole', -0.0005410414667989857),
 (u'painful', -0.0005375112978087131),
 (u'entirely', -0.0005363774322553268),
 (u'rarely', -0.0005360939600075622),
 (u'cinematic', -0.0005355064060903643),
 (u'fellow', -0.0005223937119934513),
 (u'tough', -0.0005223932466972897),
 (u'impressive', -0.0005204000586038864),
 (u'desperate', -0.0005201503282672598),
 (u'just', -0.0005141574466121833),
 (u'perhaps', -0.00051230856441907),
 (u'decent', -0.0005111996882843865),
 (u'few', -0.0005012104054137241),
 (u'wild', -0.0004973584556874676),
 (u'common', -0.000495809685954213),
 (u'early', -0.000495441064020094),
 (u'simply', -0.000494655499325071),
 (u'near', -0.0004927411616316157),
 (u'self', -0.0004917660782779683),
 (u'brief', -0.0004904708404110509),
 (u'strange', -0.0004856694163470239),
 (u'genuine', -0.0004851490258353313),
 (u'young', -0.00048397945794040765),
 (u'main', -0.00048026091243232646),
 (u'single', -0.00047829531336838213),
 (u'really', -0.0004775721943443196),
 (u'seemingly', -0.00047519001237246884),
 (u'large', -0.0004729756630322639),
 (u'shallow', -0.0004705897720634869),
 (u'either', -0.000468003391278571),
 (u'ultimately', -0.000458935267087319),
 (u'enough', -0.0004587562198272491),
 (u'romantic', -0.00045773308400239607),
 (u'deep', -0.00045621751119198726),
 (u'likely', -0.0004485419059861665),
 (u'on', -0.0004469204589403505),
 (u'younger', -0.0004434298896550497),
 (u'lead', -0.000443396464230313),
 (u'far', -0.0004417554607726356),
 (u'unlikely', -0.0004358413663177571),
 (u'spectacular', -0.0004354932315535024),
 (u'dramatic', -0.0004335562827193116),
 (u'mainly', -0.00043321869064741683),
 (u'away', -0.00043155390057646875),
 (u'short', -0.0004314804989861156),
 (u'instead', -0.00043116388455471987),
 (u'occasional', -0.00043083489345647694),
 (u'relatively', -0.0004248372994793332),
 (u'previous', -0.0004235288177482448),
 (u'climactic', -0.00042281811888987813),
 (u'back', -0.0004193178584839625),
 (u'lee', -0.00041872402509387585),
 (u'not', -0.0004185272461878745),
 (u'ever', -0.00041848457566656325),
 (u'pretty', -0.00041769940523365403),
 (u'funniest', -0.00041687076238961385),
 (u'ahead', -0.0004165029303795421),
 (u'impossible', -0.0004155484971273449),
 (u'live', -0.00041471729212954624),
 (u'therefore', -0.00041341784099136174),
 (u'quick', -0.00041187248937471065),
 (u'alive', -0.0004117125503202801),
 (u'badly', -0.0004108524460450782),
 (u'typical', -0.000409235693942564),
 (u'happy', -0.0004068915656132992),
 (u'constant', -0.00040174761783828296),
 (u'previously', -0.00040000604515426147),
 (u'never', -0.0003983941378316201),
 (u'evil', -0.00039506279830728557),
 (u'full', -0.0003942338907744736),
 (u'simple', -0.00039197455254401034),
 (u'worthy', -0.00039168348357142236),
 (u'then', -0.00039077370012267605),
 (u'dimensional', -0.0003896250283736383),
 (u'successful', -0.0003867862347564691),
 (u'often', -0.00038614169990034124),
 (u'easily', -0.0003860212360946879),
 (u'difficult', -0.000380357876911706),
 (u'quite', -0.000379233709865312),
 (u'thin', -0.00037527043094111485),
 (u'familiar', -0.00037472614921202135),
 (u'around', -0.0003720425242989393),
 (u'many', -0.00037151132475908176),
 (u'old', -0.0003686425008043049),
 (u'real', -0.00036753891906066106),
 (u'teen', -0.0003654854190965301),
 (u'maybe', -0.0003611065804915033),
 (u'ago', -0.0003574564824057854),
 (u'certainly', -0.00035481411511190455),
 (u'serious', -0.0003486634118993405),
 (u'however', -0.00034805065516387056),
 (u'ill', -0.0003462147957509242),
 (u'quickly', -0.00034413844309944637),
 (u'as', -0.00033728996602781043),
 (u'acting', -0.0003339975947398836),
 (u'different', -0.0003321248305263773),
 (u'ready', -0.0003294929896921037),
 (u'small', -0.00032775007962081376),
 (u'naturally', -0.00032626435831311154),
 (u'apparent', -0.0003176915986955846),
 (u'usual', -0.0003167083678088465),
 (u'comedic', -0.0003126089317118585),
 (u'entire', -0.0003109801245195676),
 (u'white', -0.0003090466973696814),
 (u'fast', -0.0003051831476534321),
 (u'directly', -0.00030150982004841865),
 (u'other', -0.0003014951572378752),
 (u'first', -0.00030124314858892727),
 (u'bright', -0.000301236409938576),
 (u'important', -0.00030112588430155924),
 (u'forever', -0.0003000092986698397),
 (u'billy', -0.0002992233022370851),
 (u'meanwhile', -0.00029487907702504526),
 (u'interesting', -0.0002943217343913131),
 (u'capable', -0.0002897533323065397),
 (u'immediately', -0.00028673215006844643),
 (u'psychological', -0.0002864699751391384),
 (u'social', -0.00028308869410780327),
 (u'next', -0.0002809045391921473),
 (u'english', -0.0002802107507452966),
 (u'obviously', -0.00027704797934609325),
 (u'in', -0.00027678957462428494),
 (u'eventually', -0.00027305546760727313),
 (u'fairly', -0.00027229365184522174),
 (u'intense', -0.00026985637420645593),
 (u'unable', -0.0002683859719104266),
 (u'more', -0.00026716151340399783),
 (u'violent', -0.00026459918765267066),
 (u'innocent', -0.00026205143202198515),
 (u'virtually', -0.0002589694961312146),
 (u'genuinely', -0.00025800944437602774),
 (u'soon', -0.0002545594041449741),
 (u'personal', -0.00025286180823031953),
 (u'out', -0.00025086090745423143),
 (u'right', -0.00025051898635050206),
 (u'mean', -0.0002502807827501163),
 (u'recent', -0.00024867750382279695),
 (u'limited', -0.00024784695862205795),
 (u'likable', -0.00024042725922320766),
 (u'mary', -0.00023855912341423993),
 (u'believable', -0.00023802304819741564),
 (u'little', -0.00023745748842529495),
 (u'tight', -0.00023723216108617606),
 (u'further', -0.00023584754345536097),
 (u'hilarious', -0.00023404885314757227),
 (u'sexy', -0.00023348151474449725),
 (u'exactly', -0.00023063377365658633),
 (u'again', -0.00022367754474764872),
 (u'suddenly', -0.00022132068168206898),
 (u'thoroughly', -0.00022076378917465717),
 (u'basically', -0.00021691414673864718),
 (u'about', -0.00021618939077793116),
 (u'clear', -0.00021400614115190146),
 (u'barely', -0.00021357675691639636),
 (u'mental', -0.00020952791125236842),
 (u'several', -0.00020854603013824432),
 (u'regular', -0.0002042841580624016),
 (u'new', -0.0002020211638352632),
 (u'certain', -0.00020169403649717126),
 (u'sure', -0.00020014067646794255),
 (u'david', -0.00019961008410458595),
 (u'human', -0.00019925137263039324),
 (u'top', -0.0001899368272478951),
 (u'cute', -0.00018961027049064529),
 (u'third', -0.00018946261705126213),
 (u'empty', -0.00018932179220444711),
 (u'life', -0.0001881061929275838),
 (u'average', -0.00018529951201760137),
 (u'necessarily', -0.00018083506978792746),
 (u'heavily', -0.00018056593853440356),
 (u'able', -0.00017675713661988212),
 (u'famous', -0.00017340787040879407),
 (u'most', -0.00016877056962312335),
 (u'mysterious', -0.00016791894103787043),
 (u'straight', -0.00016612820272913012),
 (u'actual', -0.00016296115754637318),
 (u'effectively', -0.00016113722883281424),
 (u'popular', -0.00016088036779547807),
 (u'nearly', -0.00015392284302645064),
 (u'open', -0.00015301437709358805),
 (u'united', -0.000150506573371944),
 (u'like', -0.00014684414499384693),
 (u'own', -0.0001465038219682995),
 (u'well', -0.00014394345620901295),
 (u'fantastic', -0.00014191595466896134),
 (u'almost', -0.0001400886356707153),
 (u'once', -0.0001368067428779078),
 (u'fine', -0.00011134453079526731),
 (u'good', -0.00010961438203885027),
 (u'two', -0.00010518289697645242),
 (u'together', -0.00010393211528060466),
 (u'actually', -0.00010134456936546597),
 (u'numerous', -9.962759606895041e-05),
 (u'over', -9.620453254701551e-05),
 (u'sexual', -8.823802345022313e-05),
 (u'probably', -8.39454478180437e-05),
 (u'star', -7.851460842608774e-05),
 (u'alien', -7.513072776308598e-05),
 (u'incredible', -7.510927174698197e-05),
 (u'national', -7.010488710382408e-05),
 (u'last', -6.685297325562545e-05),
 (u'high', -6.557268151637574e-05),
 (u'similar', -6.532964703839881e-05),
 (u'technical', -6.412624532036197e-05),
 (u'occasionally', -6.369915895968652e-05),
 (u'yet', -6.284366017743652e-05),
 (u'close', -6.248830923808076e-05),
 (u'subtle', -5.8596990943074294e-05),
 (u'clearly', -5.6041362763975706e-05),
 (u'red', -5.3225460752209883e-05),
 (u'less', -5.217972277071119e-05),
 (u'tom', -4.91060465189477e-05),
 (u'older', -4.3549597780270094e-05),
 (u'emotional', -4.032711358606736e-05),
 (u'usually', -3.9379866510426756e-05),
 (u'inevitable', -3.6282134903240816e-05),
 (u'later', -3.373131751310907e-05),
 (u'very', -3.174277372530645e-05),
 (u'private', -3.1614666906534025e-05),
 (u'stunning', -2.981091908429035e-05),
 (u'originally', -2.2794236577809452e-05),
 (u'dangerous', -2.2306624481820136e-05),
 (u'same', -1.9959277388562156e-05),
 (u'enjoyable', -9.85683993637407e-06),
 (u'extraordinary', -7.890864187094478e-06),
 (u'key', -7.418521772090299e-06),
 (u'cold', -4.944599197635301e-06),
 (u'favorite', -8.634601881634361e-07),
 (u'particularly', -7.76250227279944e-07),
 (u'general', 1.101516064030755e-05),
 (u'true', 1.1805019675490552e-05),
 (u'visual', 3.201351655334698e-05),
 (u'off', 3.20506856284937e-05),
 (u'deadly', 3.215829148920345e-05),
 (u'soft', 3.795379437129039e-05),
 (u'french', 3.837116293783395e-05),
 (u'before', 4.1991564575691656e-05),
 (u'also', 4.4783796449971575e-05),
 (u'sometimes', 4.768285359741612e-05),
 (u'and', 4.90900251027852e-05),
 (u'weird', 4.947972548887455e-05),
 (u'still', 5.6334215726506e-05),
 (u'terrific', 5.969721653504645e-05),
 (u'rare', 6.314096557611401e-05),
 (u'original', 6.409569795860466e-05),
 (u'normal', 6.539526715559733e-05),
 (u'surprisingly', 6.691039584375957e-05),
 (u'tim', 6.729369707955809e-05),
 (u'nasty', 7.067260219057901e-05),
 (u'realistic', 7.212112179272876e-05),
 (u'strong', 7.35983227081842e-05),
 (u'nice', 7.386518593153991e-05),
 (u'married', 7.393385255529284e-05),
 (u'late', 7.592604510232729e-05),
 (u'powerful', 8.25294213718502e-05),
 (u'american', 8.326019014575541e-05),
 (u'clever', 8.329213289233256e-05),
 (u'secret', 8.548700184917704e-05),
 (u'smart', 9.268513096002354e-05),
 (u'fresh', 9.767509726045531e-05),
 (u'michael', 0.0001016982636787467),
 (u'humorous', 0.00010207761912246673),
 (u'grand', 0.00010435025496580716),
 (u'robert', 0.00012628071077446694),
 (u'danny', 0.00013877623758952332),
 (u'necessary', 0.0001438228039764189),
 (u'mad', 0.00014411909468519356),
 (u'slowly', 0.00014953402449897164),
 (u'greater', 0.00015546179668127127),
 (u'hot', 0.00015992211413762758),
 (u'intelligent', 0.00016263174566505736),
 (u'especially', 0.00019207270167562098),
 (u'intriguing', 0.00019872580881751846),
 (u'extra', 0.00019993570131601312),
 (u'fun', 0.00020292320002075152),
 (u'brilliant', 0.00021403595449744723),
 (u'nevertheless', 0.00021588599225877848),
 (u'witty', 0.00022385923452095634),
 (u'slightly', 0.00022961481408002122),
 (u'sympathetic', 0.0002302595681872681),
 (u'biggest', 0.00023079503523019528),
 (u'beautiful', 0.00023579711492265282),
 (u'comic', 0.00023855794721270923),
 (u'extreme', 0.00024578270717409137),
 (u'latest', 0.00024824345238829565),
 (u'unusual', 0.0002530611082013979),
 (u'initially', 0.0002530628290485521),
 (u'highly', 0.00026438372066875534),
 (u'unfortunate', 0.00026675969756389346),
 (u'natural', 0.00026923164506055097),
 (u'initial', 0.0002748938324180273),
 (u'non', 0.0002752331506748959),
 (u'serial', 0.0002756322163883231),
 (u'blue', 0.00029048945785789407),
 (u'moral', 0.0002912641065306379),
 (u'always', 0.00029159719750840004),
 (u'willing', 0.000296038683355248),
 (u'second', 0.0003020498409727593),
 (u'literally', 0.00030452762415470206),
 (u'final', 0.0003152084839881915),
 (u'all', 0.00032168970628067804),
 (u'co', 0.0003417054595779819),
 (u'memorable', 0.0003498900458939291),
 (u'william', 0.0003540704811879115),
 (u'black', 0.00035521208784980137),
 (u'remarkable', 0.0003566660721891061),
 (u'visually', 0.000359933098097261),
 (u'minor', 0.0003618028522602351),
 (u'lucky', 0.0003672702501654841),
 (u'wonderful', 0.0003778512338285145),
 (u'effective', 0.0003830990144812743),
 (u'light', 0.0003857061782790933),
 (u'forward', 0.0004006219278071864),
 (u'animated', 0.00040249539286419094),
 (u'constantly', 0.0004036991860032579),
 (u'present', 0.000431445941106833),
 (u'unexpected', 0.0004471150032176463),
 (u'solid', 0.00045683245559449605),
 (u'scary', 0.0004586701851304227),
 (u'political', 0.0004673296065559056),
 (u'fully', 0.00047278144155897105),
 (u'overall', 0.0004748742705482376),
 (u'sad', 0.0005042726632991479),
 (u'fake', 0.0005083772153702511),
 (u'creative', 0.0005144691118677577),
 (u'steven', 0.0005346174697472804),
 (u'british', 0.0005471901111347385),
 (u'computer', 0.0005693113675447382),
 (u'somewhat', 0.0005694140016848977),
 (u'surely', 0.000576167354583601),
 (u'classic', 0.0005922804075833362),
 (u'earlier', 0.0006228774574494906),
 (u'sharp', 0.0006241216900057577),
 (u'best', 0.0006285737755879527),
 (u'green', 0.0006580402536190189),
 (u'wonderfully', 0.0006671343716093095),
 (u'friendly', 0.0006784541548675283),
 (u'pure', 0.0006843221033917685),
 (u'john', 0.0007138235664288961),
 (u'professional', 0.0007243867175374245),
 (u'definitely', 0.0007244897007716382),
 (u'musical', 0.0007417843886875146),
 (u'sean', 0.0007546407856791931),
 (u'past', 0.0007576435856556799),
 (u'excellent', 0.0007616899195319734),
 (u'emotionally', 0.0007727788550710918),
 (u'day', 0.0007769853341199288),
 (u'perfectly', 0.0007822992428988228),
 (u'generally', 0.0008262218888549855),
 (u'nonetheless', 0.0008952668065729035),
 (u'amazing', 0.0008995151565794725),
 (u'outstanding', 0.000948975985921854),
 (u'traditional', 0.0009850701844558505),
 (u'known', 0.0010065109302026419),
 (u'nicely', 0.0010459151709915544),
 (u'hearted', 0.0010560164273679716),
 (u'tony', 0.0010741410742027158),
 (u'suspenseful', 0.0011158487392470323),
 (u'anti', 0.001117982032020231),
 (u'eccentric', 0.0012097055720385144),
 (u'frank', 0.0012810788158073706),
 (u'convincing', 0.0013331906526105812),
 (u'great', 0.0014669121817867479),
 (u'quiet', 0.0014792862564728217),
 (u'lovely', 0.0014831258055575964),
 (u'looking', 0.0014881835706561514),
 (u'greatest', 0.0017617484017726204),
 (u'perfect', 0.004193549800085605),
 (u'cool', 0.008314337620380668)]
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 

Now let's apply this methodology to real (and important!) scenario where we don't have any sentiment labels: the Kardashians

In [349]:
## Loading the Kardashian data
with open("kardashian-transcripts.json", "rb") as f:
    transcripts = json.load(f)
In [350]:
msgs = [m['text'].lower() for transcript in transcripts
        for m in transcript ]
In [351]:
#msgs_pos_tagged = [pos_tag(tokenizer.tokenize(m)) for m in msgs]
In [352]:
msgs_adj_adv_only_tokenized=[[w for w,tag in m if tag in ["JJ","RB","RBS","RBJ","JJR","JJS"]]
                      for m in msgs_pos_tagged]
In [353]:
msgs_adj_adv_only=[" ".join([w for w,tag in m if tag in ["JJ","RB","RBS","RBJ","JJR","JJS"]])
                      for m in msgs_pos_tagged]
In [354]:
msgs[23]
Out[354]:
u'and then if you could take out the trash, and then if you go to dash, maybe tomorrow or whatever, later today and just...'
In [355]:
msgs_adj_adv_only[23]
Out[355]:
u'then then maybe later just'
In [356]:
vec = CountVectorizer(min_df = 10)
X = vec.fit_transform(msgs_adj_adv_only)
terms_kard = vec.get_feature_names()
len(terms_kard)
Out[356]:
347
In [ ]:
 
In [370]:
pmi_matrix_kard=getcollocations_matrix(X)
In [371]:
getcollocations("good",pmi_matrix_kard,terms_kard)
Out[371]:
[(u'good', 0.0014394723893038387),
 (u'changei', 0.0013550135501355014),
 (u'positive', 0.0006097560975609756),
 (u'horrible', 0.0003695491500369549),
 (u'awful', 0.00031269543464665416),
 (u'nude', 0.00030795762503079576),
 (u'you', 0.0002463661000246366),
 (u'extremely', 0.00022583559168925022),
 (u'proud', 0.00021557033752155703),
 (u'willing', 0.00019357336430507162),
 (u'pretty', 0.00016592002654720425),
 (u'strong', 0.00016260162601626016),
 (u'and', 0.00013550135501355014),
 (u'anywhere', 0.00013550135501355014),
 (u'such', 0.00013428062208550013),
 (u'adrienne', 0.0001231830500123183),
 (u'dramatic', 0.0001231830500123183),
 (u'honest', 0.0001231830500123183),
 (u'online', 0.0001231830500123183),
 (u'though', 0.0001231830500123183),
 (u'two', 0.0001231830500123183),
 (u'kimberly', 0.00011291779584462511),
 (u'fun', 0.00010986596352450011),
 (u'half', 0.00010423181154888472),
 (u'very', 0.0001007104665641251),
 (u'really', 9.148499128303384e-05),
 (u'all', 7.970667941973537e-05),
 (u'instead', 7.527853056308341e-05),
 (u'too', 7.501805121857447e-05),
 (u'black', 7.259001161440186e-05),
 (u'super', 7.13165026387106e-05),
 (u'big', 6.929046563192905e-05),
 (u'actually', 6.900531968282646e-05),
 (u'yeah', 6.45244547683572e-05),
 (u'sometimes', 6.302388605281402e-05),
 (u'like', 6.159152500615915e-05),
 (u'hard', 6.067224851352991e-05),
 (u'about', 5.76601510695958e-05),
 (u'clean', 5.76601510695958e-05),
 (u'um', 5.741582839557209e-05),
 (u'busy', 5.6458897922312554e-05),
 (u'sure', 5.474802222769703e-05),
 (u'before', 5.4200542005420054e-05),
 (u'close', 5.313778627982358e-05),
 (u'real', 5.2930216802168025e-05),
 (u'always', 5.211590577444236e-05),
 (u'great', 5.113258679756609e-05),
 (u'smart', 5.01856870420556e-05),
 (u'also', 4.9573666468372e-05),
 (u'not', 4.392812007920578e-05),
 (u'back', 4.3245113302196845e-05),
 (u'maybe', 4.065040650406504e-05),
 (u'single', 4.0448165675686606e-05),
 (u'own', 3.9562439420014635e-05),
 (u'gon', 3.8714672861014324e-05),
 (u'definitely', 3.8062178374592734e-05),
 (u'still', 3.7639265281541705e-05),
 (u'healthy', 3.662198784150004e-05),
 (u'armenian', 3.474393718296157e-05),
 (u'okay', 3.3875338753387534e-05),
 (u'so', 3.373213628801389e-05),
 (u'beautiful', 3.22622273841786e-05),
 (u'best', 3.169622339498249e-05),
 (u'rude', 3.151194302640701e-05),
 (u'nervous', 3.0795762503079576e-05),
 (u'different', 3.0449742699674187e-05),
 (u'least', 3.0111412225233366e-05),
 (u'fine', 2.9456816307293507e-05),
 (u'hot', 2.88300755347979e-05),
 (u'as', 2.8526601055484238e-05),
 (u'whole', 2.8526601055484238e-05),
 (u'again', 2.8426857695150378e-05),
 (u'gorgeous', 2.8229448961156277e-05),
 (u'ready', 2.6920799009314596e-05),
 (u'absolutely', 2.656889313991179e-05),
 (u'far', 2.656889313991179e-05),
 (u'well', 2.589197866500958e-05),
 (u'pregnant', 2.5809781907342885e-05),
 (u'only', 2.50928435210278e-05),
 (u'just', 2.3923261831488373e-05),
 (u'probably', 2.3362302588543127e-05),
 (u'then', 2.3228803716608595e-05),
 (u'right', 2.2860658054433303e-05),
 (u'few', 2.2583559168925024e-05),
 (u'happy', 2.128293534244243e-05),
 (u'now', 2.089995193011056e-05),
 (u'long', 2.0846362309776944e-05),
 (u'next', 2.068722977306109e-05),
 (u'never', 2.0592911096284215e-05),
 (u'here', 2.018863308703201e-05),
 (u'together', 2.012396361587378e-05),
 (u'better', 1.9357336430507162e-05),
 (u'cool', 1.9084697889232416e-05),
 (u'comfortable', 1.8819632640770853e-05),
 (u'anymore', 1.782912565967765e-05),
 (u'obviously', 1.7371968591480784e-05),
 (u'enough', 1.6728562347351868e-05),
 (u'perfect', 1.6524555489457333e-05),
 (u'first', 1.6325464459463874e-05),
 (u'honestly', 1.6325464459463874e-05),
 (u'old', 1.4188623561628287e-05),
 (u'already', 1.4114724480578139e-05),
 (u'ever', 1.3415975743915856e-05),
 (u'bad', 1.302897644361059e-05),
 (u'na', 1.2663678038649544e-05),
 (u'new', 1.2569698980848807e-05),
 (u'up', 1.0112041418921651e-05),
 (u'there', 9.801183002788436e-06),
 (u'wrong', 9.475619231716793e-06),
 (u'else', 9.03342366757001e-06),
 (u'crazy', 8.742022904100009e-06),
 (u'last', 8.416233230655287e-06),
 (u'little', 6.913334439466843e-06),
 (u'more', 5.000050000500005e-06),
 (u'even', 3.4391206856230997e-06),
 (u'much', 2.9780517585395634e-06),
 (u'able', 0.0),
 (u'acceptable', 0.0),
 (u'accurate', 0.0),
 (u'active', 0.0),
 (u'afraid', 0.0),
 (u'ago', 0.0),
 (u'ahead', 0.0),
 (u'alcoholic', 0.0),
 (u'almost', 0.0),
 (u'alone', 0.0),
 (u'along', 0.0),
 (u'amazing', 0.0),
 (u'american', 0.0),
 (u'anal', 0.0),
 (u'angry', 0.0),
 (u'annoying', 0.0),
 (u'anxious', 0.0),
 (u'anyway', 0.0),
 (u'apart', 0.0),
 (u'apparently', 0.0),
 (u'appropriate', 0.0),
 (u'around', 0.0),
 (u'atm', 0.0),
 (u'away', 0.0),
 (u'awesome', 0.0),
 (u'awkward', 0.0),
 (u'barely', 0.0),
 (u'basic', 0.0),
 (u'basically', 0.0),
 (u'belly', 0.0),
 (u'bible', 0.0),
 (u'bigger', 0.0),
 (u'biggest', 0.0),
 (u'boring', 0.0),
 (u'bright', 0.0),
 (u'bunim', 0.0),
 (u'certain', 0.0),
 (u'certainly', 0.0),
 (u'clear', 0.0),
 (u'clearly', 0.0),
 (u'cold', 0.0),
 (u'common', 0.0),
 (u'complete', 0.0),
 (u'completely', 0.0),
 (u'constantly', 0.0),
 (u'couple', 0.0),
 (u'cute', 0.0),
 (u'dead', 0.0),
 (u'deep', 0.0),
 (u'delicious', 0.0),
 (u'desperate', 0.0),
 (u'diaper', 0.0),
 (u'difficult', 0.0),
 (u'disappointed', 0.0),
 (u'done', 0.0),
 (u'double', 0.0),
 (u'down', 0.0),
 (u'drunk', 0.0),
 (u'dry', 0.0),
 (u'dumb', 0.0),
 (u'early', 0.0),
 (u'easier', 0.0),
 (u'easy', 0.0),
 (u'em', 0.0),
 (u'embarrassing', 0.0),
 (u'emotional', 0.0),
 (u'entire', 0.0),
 (u'especially', 0.0),
 (u'everywhere', 0.0),
 (u'exactly', 0.0),
 (u'excited', 0.0),
 (u'exciting', 0.0),
 (u'extra', 0.0),
 (u'fabulous', 0.0),
 (u'fair', 0.0),
 (u'fast', 0.0),
 (u'fat', 0.0),
 (u'favorite', 0.0),
 (u'female', 0.0),
 (u'finally', 0.0),
 (u'forever', 0.0),
 (u'forward', 0.0),
 (u'free', 0.0),
 (u'fresh', 0.0),
 (u'full', 0.0),
 (u'funny', 0.0),
 (u'fur', 0.0),
 (u'girlfriend', 0.0),
 (u'glad', 0.0),
 (u'god', 0.0),
 (u'gray', 0.0),
 (u'gross', 0.0),
 (u'grown', 0.0),
 (u'guilty', 0.0),
 (u'guys', 0.0),
 (u'high', 0.0),
 (u'hopefully', 0.0),
 (u'huge', 0.0),
 (u'huh', 0.0),
 (u'hundred', 0.0),
 (u'hungry', 0.0),
 (u'important', 0.0),
 (u'in', 0.0),
 (u'incredible', 0.0),
 (u'inside', 0.0),
 (u'interested', 0.0),
 (u'jealous', 0.0),
 (u'kardashian', 0.0),
 (u'kelly', 0.0),
 (u'khloe', 0.0),
 (u'kim', 0.0),
 (u'kmart', 0.0),
 (u'kris', 0.0),
 (u'laker', 0.0),
 (u'late', 0.0),
 (u'lately', 0.0),
 (u'later', 0.0),
 (u'less', 0.0),
 (u'light', 0.0),
 (u'lily', 0.0),
 (u'literally', 0.0),
 (u'live', 0.0),
 (u'low', 0.0),
 (u'luxurious', 0.0),
 (u'mad', 0.0),
 (u'major', 0.0),
 (u'male', 0.0),
 (u'many', 0.0),
 (u'married', 0.0),
 (u'mean', 0.0),
 (u'miserable', 0.0),
 (u'moral', 0.0),
 (u'most', 0.0),
 (u'murray', 0.0),
 (u'naked', 0.0),
 (u'natural', 0.0),
 (u'necessary', 0.0),
 (u'nice', 0.0),
 (u'normal', 0.0),
 (u'normally', 0.0),
 (u'off', 0.0),
 (u'often', 0.0),
 (u'oh', 0.0),
 (u'older', 0.0),
 (u'once', 0.0),
 (u'open', 0.0),
 (u'other', 0.0),
 (u'out', 0.0),
 (u'outside', 0.0),
 (u'over', 0.0),
 (u'past', 0.0),
 (u'personal', 0.0),
 (u'poor', 0.0),
 (u'possible', 0.0),
 (u'possibly', 0.0),
 (u'potential', 0.0),
 (u'present', 0.0),
 (u'private', 0.0),
 (u'professional', 0.0),
 (u'public', 0.0),
 (u'quick', 0.0),
 (u'quiet', 0.0),
 (u'rather', 0.0),
 (u'red', 0.0),
 (u'regular', 0.0),
 (u'rich', 0.0),
 (u'rid', 0.0),
 (u'ridiculous', 0.0),
 (u'rob', 0.0),
 (u'sad', 0.0),
 (u'safe', 0.0),
 (u'same', 0.0),
 (u'san', 0.0),
 (u'scared', 0.0),
 (u'scary', 0.0),
 (u'scott', 0.0),
 (u'second', 0.0),
 (u'secret', 0.0),
 (u'selfish', 0.0),
 (u'sensitive', 0.0),
 (u'serious', 0.0),
 (u'seriously', 0.0),
 (u'sexual', 0.0),
 (u'sexy', 0.0),
 (u'short', 0.0),
 (u'sick', 0.0),
 (u'skin', 0.0),
 (u'small', 0.0),
 (u'somewhere', 0.0),
 (u'soon', 0.0),
 (u'sorry', 0.0),
 (u'special', 0.0),
 (u'straight', 0.0),
 (u'stupid', 0.0),
 (u'sudden', 0.0),
 (u'supportive', 0.0),
 (u'sweet', 0.0),
 (u'ta', 0.0),
 (u'tall', 0.0),
 (u'ten', 0.0),
 (u'thebouncedryer', 0.0),
 (u'tired', 0.0),
 (u'top', 0.0),
 (u'total', 0.0),
 (u'totally', 0.0),
 (u'touch', 0.0),
 (u'tough', 0.0),
 (u'true', 0.0),
 (u'truly', 0.0),
 (u'truthful', 0.0),
 (u'tryclearblue', 0.0),
 (u'twice', 0.0),
 (u'ugly', 0.0),
 (u'uh', 0.0),
 (u'uncomfortable', 0.0),
 (u'upset', 0.0),
 (u'usually', 0.0),
 (u'wasteful', 0.0),
 (u'wear', 0.0),
 (u'weird', 0.0),
 (u'welcome', 0.0),
 (u'white', 0.0),
 (u'wonderful', 0.0),
 (u'worried', 0.0),
 (u'worse', 0.0),
 (u'worst', 0.0),
 (u'year', 0.0),
 (u'yes', 0.0),
 (u'yet', 0.0),
 (u'young', 0.0),
 (u'younger', 0.0)]
In [375]:
posscores=seed_score(['good',"rude"],pmi_matrix_kard,terms_kard)
negscores=seed_score(['bad'],pmi_matrix_kard,terms_kard)

## sentiment polarity score will be the difference between the words that are close to the positive seed
## and the words that are close to the negative seed
sentscores={}
for w in terms_kard:
    sentscores[w]=posscores[w]-negscores[w]

neglexicon_kard = sorted(sentscores.items(),key=itemgetter(1),reverse=False)[:10]
poslexicon_kard = sorted(sentscores.items(),key=itemgetter(1),reverse=False)[-10:]
In [380]:
sorted(sentscores.items(),key=itemgetter(1),reverse=False)
Out[380]:
[(u'bad', -0.004933346763201359),
 (u'over', -0.0008741258741258741),
 (u'horrible', -0.0005045767240889192),
 (u'appropriate', -0.0004807692307692308),
 (u'san', -0.00040064102564102563),
 (u'worried', -0.0003434065934065934),
 (u'able', -0.0002403846153846154),
 (u'worst', -0.00022893772893772894),
 (u'high', -0.00022361359570661896),
 (u'rich', -0.00020903010033444816),
 (u'year', -0.0001923076923076923),
 (u'normal', -0.00016869095816464237),
 (u'ready', -0.00016411333242216783),
 (u'fast', -0.00016025641025641026),
 (u'especially', -0.00015762925598991173),
 (u'busy', -0.00014386161489820027),
 (u'entire', -0.00012651821862348178),
 (u'sorry', -0.0001201923076923077),
 (u'enough', -0.0001019798896944335),
 (u'rude', -8.029485482690247e-05),
 (u'seriously', -7.754342431761787e-05),
 (u'again', -7.243382008860433e-05),
 (u'other', -6.868131868131868e-05),
 (u'away', -6.585879873551106e-05),
 (u'now', -6.56137913959721e-05),
 (u'probably', -5.9528944095807e-05),
 (u'around', -5.7234432234432234e-05),
 (u'really', -5.550177990755901e-05),
 (u'long', -5.3118134731643175e-05),
 (u'still', -5.139207374979732e-05),
 (u'so', -4.982099447037755e-05),
 (u'like', -4.767420925957511e-05),
 (u'obviously', -4.426511227636932e-05),
 (u'too', -4.1431243431412225e-05),
 (u'totally', -4.074315514993481e-05),
 (u'not', -3.9996683296969165e-05),
 (u'never', -3.7859275015476366e-05),
 (u'nice', -3.4965034965034965e-05),
 (u'then', -3.171625122844635e-05),
 (u'little', -2.988022913981102e-05),
 (u'right', -2.875567040238595e-05),
 (u'much', -2.8721018402069058e-05),
 (u'up', -2.5766259384752287e-05),
 (u'different', -2.356927199349781e-05),
 (u'last', -2.1445209674265877e-05),
 (u'even', -2.0965408795048512e-05),
 (u'just', -1.8517524783027054e-05),
 (u'more', -1.274051202050482e-05),
 (u'hard', -1.1084353093817969e-05),
 (u'old', -1.0982540352991126e-05),
 (u'only', -1.0519692091507817e-05),
 (u'ever', -1.0384481224857945e-05),
 (u'also', -9.056727527875654e-06),
 (u'here', -5.0928339122264e-06),
 (u'well', -4.7302653330305954e-06),
 (u'together', -3.67649335290002e-06),
 (u'first', -2.982536776248202e-06),
 (u'rob', 0.0),
 (u'skin', 0.0),
 (u'certainly', 0.0),
 (u'young', 0.0),
 (u'finally', 0.0),
 (u'ta', 0.0),
 (u'worse', 0.0),
 (u'fat', 0.0),
 (u'bunim', 0.0),
 (u'anxious', 0.0),
 (u'quick', 0.0),
 (u'anal', 0.0),
 (u'ten', 0.0),
 (u'tired', 0.0),
 (u'past', 0.0),
 (u'second', 0.0),
 (u'fabulous', 0.0),
 (u'uncomfortable', 0.0),
 (u'kris', 0.0),
 (u'public', 0.0),
 (u'full', 0.0),
 (u'alone', 0.0),
 (u'sexy', 0.0),
 (u'along', 0.0),
 (u'dry', 0.0),
 (u'bible', 0.0),
 (u'ahead', 0.0),
 (u'guilty', 0.0),
 (u'later', 0.0),
 (u'usually', 0.0),
 (u'weird', 0.0),
 (u'extra', 0.0),
 (u'private', 0.0),
 (u'moral', 0.0),
 (u'total', 0.0),
 (u'angry', 0.0),
 (u'live', 0.0),
 (u'acceptable', 0.0),
 (u'everywhere', 0.0),
 (u'basically', 0.0),
 (u'glad', 0.0),
 (u'male', 0.0),
 (u'embarrassing', 0.0),
 (u'awesome', 0.0),
 (u'huge', 0.0),
 (u'awkward', 0.0),
 (u'rather', 0.0),
 (u'truthful', 0.0),
 (u'mad', 0.0),
 (u'guys', 0.0),
 (u'short', 0.0),
 (u'natural', 0.0),
 (u'tall', 0.0),
 (u'cute', 0.0),
 (u'soon', 0.0),
 (u'murray', 0.0),
 (u'scott', 0.0),
 (u'cold', 0.0),
 (u'personal', 0.0),
 (u'amazing', 0.0),
 (u'easier', 0.0),
 (u'safe', 0.0),
 (u'bigger', 0.0),
 (u'mean', 0.0),
 (u'em', 0.0),
 (u'sexual', 0.0),
 (u'special', 0.0),
 (u'out', 0.0),
 (u'god', 0.0),
 (u'red', 0.0),
 (u'free', 0.0),
 (u'small', 0.0),
 (u'completely', 0.0),
 (u'scary', 0.0),
 (u'atm', 0.0),
 (u'american', 0.0),
 (u'major', 0.0),
 (u'done', 0.0),
 (u'delicious', 0.0),
 (u'open', 0.0),
 (u'top', 0.0),
 (u'wonderful', 0.0),
 (u'white', 0.0),
 (u'hundred', 0.0),
 (u'exactly', 0.0),
 (u'huh', 0.0),
 (u'forward', 0.0),
 (u'ridiculous', 0.0),
 (u'double', 0.0),
 (u'light', 0.0),
 (u'sad', 0.0),
 (u'miserable', 0.0),
 (u'apparently', 0.0),
 (u'clearly', 0.0),
 (u'afraid', 0.0),
 (u'potential', 0.0),
 (u'lily', 0.0),
 (u'most', 0.0),
 (u'regular', 0.0),
 (u'forever', 0.0),
 (u'clear', 0.0),
 (u'upset', 0.0),
 (u'hungry', 0.0),
 (u'professional', 0.0),
 (u'normally', 0.0),
 (u'anyway', 0.0),
 (u'bright', 0.0),
 (u'wasteful', 0.0),
 (u'hopefully', 0.0),
 (u'truly', 0.0),
 (u'gray', 0.0),
 (u'married', 0.0),
 (u'naked', 0.0),
 (u'twice', 0.0),
 (u'stupid', 0.0),
 (u'common', 0.0),
 (u'boring', 0.0),
 (u'fair', 0.0),
 (u'dumb', 0.0),
 (u'desperate', 0.0),
 (u'outside', 0.0),
 (u'many', 0.0),
 (u'barely', 0.0),
 (u'quiet', 0.0),
 (u'somewhere', 0.0),
 (u'tryclearblue', 0.0),
 (u'wear', 0.0),
 (u'tough', 0.0),
 (u'drunk', 0.0),
 (u'sweet', 0.0),
 (u'active', 0.0),
 (u'late', 0.0),
 (u'secret', 0.0),
 (u'basic', 0.0),
 (u'present', 0.0),
 (u'fur', 0.0),
 (u'straight', 0.0),
 (u'ugly', 0.0),
 (u'alcoholic', 0.0),
 (u'almost', 0.0),
 (u'sudden', 0.0),
 (u'in', 0.0),
 (u'rid', 0.0),
 (u'grown', 0.0),
 (u'funny', 0.0),
 (u'sensitive', 0.0),
 (u'same', 0.0),
 (u'belly', 0.0),
 (u'difficult', 0.0),
 (u'kim', 0.0),
 (u'jealous', 0.0),
 (u'off', 0.0),
 (u'older', 0.0),
 (u'kelly', 0.0),
 (u'less', 0.0),
 (u'accurate', 0.0),
 (u'touch', 0.0),
 (u'yes', 0.0),
 (u'yet', 0.0),
 (u'interested', 0.0),
 (u'easy', 0.0),
 (u'excited', 0.0),
 (u'couple', 0.0),
 (u'possible', 0.0),
 (u'early', 0.0),
 (u'possibly', 0.0),
 (u'disappointed', 0.0),
 (u'apart', 0.0),
 (u'necessary', 0.0),
 (u'often', 0.0),
 (u'scared', 0.0),
 (u'dead', 0.0),
 (u'supportive', 0.0),
 (u'gross', 0.0),
 (u'literally', 0.0),
 (u'laker', 0.0),
 (u'exciting', 0.0),
 (u'oh', 0.0),
 (u'favorite', 0.0),
 (u'down', 0.0),
 (u'female', 0.0),
 (u'kmart', 0.0),
 (u'constantly', 0.0),
 (u'low', 0.0),
 (u'biggest', 0.0),
 (u'complete', 0.0),
 (u'diaper', 0.0),
 (u'true', 0.0),
 (u'khloe', 0.0),
 (u'inside', 0.0),
 (u'uh', 0.0),
 (u'emotional', 0.0),
 (u'certain', 0.0),
 (u'deep', 0.0),
 (u'girlfriend', 0.0),
 (u'annoying', 0.0),
 (u'selfish', 0.0),
 (u'incredible', 0.0),
 (u'lately', 0.0),
 (u'sick', 0.0),
 (u'poor', 0.0),
 (u'welcome', 0.0),
 (u'luxurious', 0.0),
 (u'important', 0.0),
 (u'fresh', 0.0),
 (u'thebouncedryer', 0.0),
 (u'ago', 0.0),
 (u'younger', 0.0),
 (u'kardashian', 0.0),
 (u'serious', 0.0),
 (u'once', 0.0),
 (u'whole', 3.2229573307878824e-06),
 (u'as', 3.2229573307878824e-06),
 (u'own', 4.469794838318963e-06),
 (u'crazy', 8.742022904100009e-06),
 (u'else', 9.03342366757001e-06),
 (u'wrong', 9.475619231716793e-06),
 (u'there', 9.801183002788436e-06),
 (u'new', 1.2569698980848807e-05),
 (u'na', 1.2663678038649544e-05),
 (u'already', 1.4114724480578139e-05),
 (u'honestly', 1.6325464459463874e-05),
 (u'perfect', 1.6524555489457333e-05),
 (u'anymore', 1.782912565967765e-05),
 (u'comfortable', 1.8819632640770853e-05),
 (u'cool', 1.9084697889232416e-05),
 (u'better', 1.9357336430507162e-05),
 (u'next', 2.068722977306109e-05),
 (u'happy', 2.128293534244243e-05),
 (u'few', 2.2583559168925024e-05),
 (u'actually', 2.4489650167156945e-05),
 (u'pregnant', 2.5809781907342885e-05),
 (u'far', 2.656889313991179e-05),
 (u'absolutely', 2.656889313991179e-05),
 (u'gorgeous', 2.8229448961156277e-05),
 (u'hot', 2.88300755347979e-05),
 (u'fine', 2.9456816307293507e-05),
 (u'least', 3.0111412225233366e-05),
 (u'sure', 3.0466747946422746e-05),
 (u'nervous', 3.0795762503079576e-05),
 (u'best', 3.169622339498249e-05),
 (u'beautiful', 3.22622273841786e-05),
 (u'great', 3.299035167419889e-05),
 (u'back', 3.3015980732638744e-05),
 (u'okay', 3.3875338753387534e-05),
 (u'armenian', 3.474393718296157e-05),
 (u'healthy', 3.662198784150004e-05),
 (u'definitely', 3.8062178374592734e-05),
 (u'gon', 3.8714672861014324e-05),
 (u'single', 4.0448165675686606e-05),
 (u'maybe', 4.065040650406504e-05),
 (u'big', 4.1974032065495483e-05),
 (u'such', 4.765553546041351e-05),
 (u'smart', 5.01856870420556e-05),
 (u'always', 5.211590577444236e-05),
 (u'real', 5.2930216802168025e-05),
 (u'close', 5.313778627982358e-05),
 (u'before', 5.4200542005420054e-05),
 (u'um', 5.741582839557209e-05),
 (u'clean', 5.76601510695958e-05),
 (u'about', 5.76601510695958e-05),
 (u'sometimes', 6.302388605281402e-05),
 (u'yeah', 6.45244547683572e-05),
 (u'super', 7.13165026387106e-05),
 (u'black', 7.259001161440186e-05),
 (u'instead', 7.527853056308341e-05),
 (u'all', 7.970667941973537e-05),
 (u'very', 0.0001007104665641251),
 (u'half', 0.00010423181154888472),
 (u'fun', 0.00010986596352450011),
 (u'kimberly', 0.00011291779584462511),
 (u'pretty', 0.00011686194177483377),
 (u'two', 0.0001231830500123183),
 (u'honest', 0.0001231830500123183),
 (u'online', 0.0001231830500123183),
 (u'though', 0.0001231830500123183),
 (u'dramatic', 0.0001231830500123183),
 (u'adrienne', 0.0001231830500123183),
 (u'anywhere', 0.00013550135501355014),
 (u'and', 0.00013550135501355014),
 (u'strong', 0.00016260162601626016),
 (u'willing', 0.00019357336430507162),
 (u'proud', 0.00021557033752155703),
 (u'extremely', 0.00022583559168925022),
 (u'you', 0.0002463661000246366),
 (u'nude', 0.00030795762503079576),
 (u'awful', 0.00031269543464665416),
 (u'positive', 0.0006097560975609756),
 (u'changei', 0.0013550135501355014),
 (u'good', 0.001426443412860228)]
In [ ]:
 

We (roughly) calculate the each sentence's sentiment score by comparing the number of words with positive sentiment score vs negative sentiment score (according to our automatically induced lexicon)

In [389]:
final_message_sentiment = {}

for k, m in enumerate(msgs_adj_adv_only_tokenized):
    m_sent_score = sum([sentscores.get(w,0)>0 for w in m])-sum([sentscores.get(w,0)<0 for w in m])
    final_message_sentiment[msgs[k]]=m_sent_score

sorted(final_message_sentiment.items(), key=itemgetter(1), reverse=False)[:10]
Out[389]:
[(u"i couldn't be any more sorry, and i'll never excuse the way i acted the other night in vegas, but, like, i don't know what i ever did so bad to, like, deserve you to, like, hate me so much.",
  -9),
 (u"he just needs to be pushed a little bit so that he takes care of something that's made him feel really bad for a really long time.",
  -7),
 (u"i mean, honestly i really thought you brought me here to spend time with you and like, it's like a bonding thing and you really wanted to take me to lunch and hang out, but obviously, this is not really why i'm here.",
  -6),
 (u'now, i do not know what case you have him on, but whatever it is, it is going bad, and it sounds like it is going bad right now.',
  -6),
 (u"i understand that we're gonna fight 'cause we do so much stuff together, but i got you guys a little gift because i felt a little bad-- for you both.",
  -6),
 (u"khloe getting married has really made me think about my own love life, and you know, i'm still sad, but i don't really, it's not really my personality to get really mad and yell, and i just don't want to fight with my brother rob.",
  -6),
 (u"i'm just trying to keep busy and not think about it, but i just have so much to do, so i'm going to have to pass.",
  -6),
 (u"just as i feel like i'm getting all the moves down, and all i need to do is really put it together, one after the other, i fall, and i just get really mad.",
  -6),
 (u"she hit me really hard, but because i'm a bigger girl, i know not to hit her, because i will really, like, give her a concussion, and that's not what i, i'm not trying to hurt my sister",
  -6),
 (u"we've just been fighting so much lately, and i don't really understand what i've done, or i don't really understand, like, just everyone is fighting.",
  -6)]
In [390]:
sorted(final_message_sentiment.items(), key=itemgetter(1))[-10:]
Out[390]:
[(u"it's gonna be a pretty big game.", 4),
 (u"this is a great time to tell khloe that it's not always all about us and that maybe once in a while it's a great thing to help somebody else out.",
  4),
 (u"so, tonight, khloe, i ask you to honor that very same promise to his grandmother, that you will always support lamar and stand by him because you have realized very quickly what the rest of us already know: it's very easy to love lamar.",
  4),
 (u"i wouldn't be a good manager or a good mom if i didn't find out who's really single out there and who would be a great match for kim.",
  4),
 (u"they're always pretty strong women, actually.", 4),
 (u'i feel very at peace, very comfortable in my own skin.', 4),
 (u"i definitely feel protective over summer because she's so young and new to the industry, but i think the smart thing to do is let her learn her own lessons and kind of feel her way through on her own.",
  4),
 (u"i just want to say all you kids, i'm extremely proud of you because you all know where you're going, you all have good direction, and you all have very good work ethics.",
  4),
 (u'i feel very at peace, very comfortable in my own skin, very secure.', 5),
 (u"as far back as i can remember, khloe's always had body issues, and i'm always having to remind her how beautiful she is.",
  5)]

Pretty good considering that we had absolutely no sentiment labels to start with!

In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]: