%matplotlib inline
from __future__ import print_function
import json
from operator import itemgetter
from collections import defaultdict
from matplotlib import pyplot as plt
import numpy as np
from nltk.tokenize import TreebankWordTokenizer
from nltk import FreqDist,pos_tag
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.datasets import load_files
from sklearn.naive_bayes import MultinomialNB
tokenizer = TreebankWordTokenizer()
We use the movie review data again, but this time without the sentiment labels: we pretend the corpus is unlabeled.
## loading movie review data:
## http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz
data = load_files('txt_sentoken')
print(data.data[0])
arnold schwarzenegger has been an icon for action enthusiasts , since the late 80's , but lately his films have been very sloppy and the one-liners are getting worse . it's hard seeing arnold as mr . freeze in batman and robin , especially when he says tons of ice jokes , but hey he got 15 million , what's it matter to him ? once again arnold has signed to do another expensive blockbuster , that can't compare with the likes of the terminator series , true lies and even eraser . in this so called dark thriller , the devil ( gabriel byrne ) has come upon earth , to impregnate a woman ( robin tunney ) which happens every 1000 years , and basically destroy the world , but apparently god has chosen one man , and that one man is jericho cane ( arnold himself ) . with the help of a trusty sidekick ( kevin pollack ) , they will stop at nothing to let the devil take over the world ! parts of this are actually so absurd , that they would fit right in with dogma . yes , the film is that weak , but it's better than the other blockbuster right now ( sleepy hollow ) , but it makes the world is not enough look like a 4 star film . anyway , this definitely doesn't seem like an arnold movie . it just wasn't the type of film you can see him doing . sure he gave us a few chuckles with his well known one-liners , but he seemed confused as to where his character and the film was going . it's understandable , especially when the ending had to be changed according to some sources . aside form that , he still walked through it , much like he has in the past few films . i'm sorry to say this arnold but maybe these are the end of your action days . speaking of action , where was it in this film ? there was hardly any explosions or fights . the devil made a few places explode , but arnold wasn't kicking some devil butt . the ending was changed to make it more spiritual , which undoubtedly ruined the film . 
i was at least hoping for a cool ending if nothing else occurred , but once again i was let down . i also don't know why the film took so long and cost so much . there was really no super affects at all , unless you consider an invisible devil , who was in it for 5 minutes tops , worth the overpriced budget . the budget should have gone into a better script , where at least audiences could be somewhat entertained instead of facing boredom . it's pitiful to see how scripts like these get bought and made into a movie . do they even read these things anymore ? it sure doesn't seem like it . thankfully gabriel's performance gave some light to this poor film . when he walks down the street searching for robin tunney , you can't help but feel that he looked like a devil . the guy is creepy looking anyway ! when it's all over , you're just glad it's the end of the movie . don't bother to see this , if you're expecting a solid action flick , because it's neither solid nor does it have action . it's just another movie that we are suckered in to seeing , due to a strategic marketing campaign . save your money and see the world is not enough for an entertaining experience .
## building the term-document matrix
vec = CountVectorizer(min_df = 50)
X = vec.fit_transform(data.data)
terms = vec.get_feature_names()
len(terms)
2153
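The `min_df = 50` setting keeps only terms that occur in at least 50 documents, which is how the vocabulary ends up at 2153 terms. The following is a minimal pure-Python sketch of that document-frequency filter on a toy corpus. The documents and the helper `build_vocab` are made up for illustration, and the regex mirrors CountVectorizer's default token pattern (lowercased tokens of two or more word characters).

```python
import re
from collections import Counter

# Default CountVectorizer token pattern: tokens of 2+ word characters
TOKEN = re.compile(r"(?u)\b\w\w+\b")

def build_vocab(docs, min_df):
    """Return the sorted terms that occur in at least `min_df` documents (hypothetical helper)."""
    df = Counter()
    for doc in docs:
        # set() so that each term is counted once per document
        df.update(set(TOKEN.findall(doc.lower())))
    return sorted(t for t, n in df.items() if n >= min_df)

toy_docs = [
    "a good film with a good cast",
    "a bad film",
    "good acting , bad script",
]
vocab = build_vocab(toy_docs, min_df=2)
print(vocab)
```

Only "bad", "film" and "good" appear in two or more of the toy documents, so the rarer terms are dropped, and single-character tokens such as "a" never reach the vocabulary at all.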
# PMI-type measure via matrix multiplication
def getcollocations_matrix(X):
    XX = X.T.dot(X)  ## multiply X's transpose by X: entry (w1, w2) sums, over documents, the product of the two words' counts
    term_freqs = np.asarray(X.sum(axis=0))  ## total count of each word across all documents, shape (1, n_terms)
    pmi = XX.toarray() * 1.0  ## cast to float and densify so we can use simple element-wise operations
    pmi /= term_freqs.T  ## divide each row by the total count of w1
    pmi /= term_freqs  ## divide each column by the total count of w2
    # this is not technically PMI because we are ignoring some normalization factor and not taking the log,
    # but it's sufficient for ranking
    return pmi
pmi_matrix = getcollocations_matrix(X)
pmi_matrix.shape
(2153, 2153)
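For comparison, here is a minimal sketch of a document-level PMI that does include the normalization factor and the log: PMI(w1, w2) = log(N * n12 / (n1 * n2)), where N is the number of documents, n12 the number of documents containing both words, and n1, n2 the number containing each word. The helper `doc_pmi` and the toy matrix are assumptions for illustration. Unlike `getcollocations_matrix`, which works on raw counts, this sketch first binarizes the counts, and it assumes every term occurs in at least one document.

```python
import numpy as np

def doc_pmi(counts):
    """Document-level PMI from a dense documents x terms count array (hypothetical helper)."""
    B = (np.asarray(counts) > 0).astype(float)  # 1.0 if the term occurs in the document
    n_docs = B.shape[0]
    co = B.T.dot(B)       # co[i, j] = number of documents containing both terms
    df = B.sum(axis=0)    # df[i] = number of documents containing term i
    with np.errstate(divide="ignore"):
        # pairs that never co-occur get log(0) = -inf
        return np.log(co * n_docs / np.outer(df, df))

toy = np.array([[2, 1, 0],
                [1, 0, 1],
                [0, 0, 3],
                [1, 1, 0]])
pmi = doc_pmi(toy)
print(np.round(pmi, 3))
```

Because the log is monotonic and N is constant, dropping both leaves the ranking of a binarized score unchanged. The notebook's count-weighted score is a related but not identical quantity.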
def getcollocations(w, PMI_MATRIX=pmi_matrix, TERMS=terms):
    if w not in TERMS:
        return []
    idx = TERMS.index(w)
    col = PMI_MATRIX[:, idx].ravel().tolist()
    return sorted([(TERMS[i], val) for i, val in enumerate(col)], key=itemgetter(1), reverse=True)
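`getcollocations` returns a score for every term in the vocabulary, often with the query word itself at or near the top. When only the strongest few neighbours are of interest, a variant along the following lines may be easier to read. `top_collocations`, its parameter `n`, and the toy matrix are hypothetical names and data introduced for illustration, and the sketch assumes `terms` is a plain Python list.

```python
import numpy as np

def top_collocations(w, matrix, terms, n=5):
    """Return the n highest-scoring (term, score) pairs for w, skipping w itself (hypothetical helper)."""
    if w not in terms:
        return []
    idx = terms.index(w)
    col = np.asarray(matrix)[:, idx]
    order = np.argsort(-col)                 # indices sorted by descending score
    order = [i for i in order if i != idx]   # drop the query word itself
    return [(terms[i], float(col[i])) for i in order[:n]]

toy_terms = ["bad", "film", "good"]
toy_matrix = np.array([[0.9, 0.2, 0.1],
                       [0.2, 0.8, 0.5],
                       [0.1, 0.5, 0.7]])
print(top_collocations("good", toy_matrix, toy_terms, n=2))
```

With the notebook's objects, the equivalent call would be `top_collocations("good", pmi_matrix, terms, n=20)`.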
getcollocations("good")
[(u'good', 0.0012711337380982813), (u'trek', 0.0010038914000850665), (u'sean', 0.0009922470727116103), (u'nudity', 0.0009374840201587473), (u'nicely', 0.0009268742752181751), (u'trash', 0.0009217014608968155), (u'showed', 0.000916850400576306), (u'compared', 0.00091151987499156), (u'fairly', 0.0008716089901959017), (u'comparison', 0.0008698557537213697), (u'laughed', 0.0008665639627895953), (u'crap', 0.0008473706979212659), (u'pulp', 0.0008450365730278281), (u'parts', 0.0008435572066033899), (u'fifteen', 0.0008424927416009955), (u'sorry', 0.0008413817621615216), (u'pretty', 0.0008334590198961828), (u'nights', 0.0008333717375608706), (u'chris', 0.000833301911692621), (u'doctor', 0.0008330167404996009), (u'rating', 0.0008322781072402701), (u'average', 0.0008295313148071339), (u'forward', 0.0008295313148071339), (u'watched', 0.0008295313148071339), (u'cool', 0.0008275372491465399), (u'stupid', 0.0008213343650560753), (u'sadly', 0.0008174507616788748), (u'matt', 0.0008162941129751053), (u'hate', 0.0008140549843070009), (u'kills', 0.0008135787895223813), (u'terrific', 0.0008122494124153186), (u'horrible', 0.0008093970595933685), (u'agrees', 0.0008091330037872864), (u'subplot', 0.0008082612810941305), (u'totally', 0.0008068044294699522), (u'sad', 0.0008064887782847135), (u'technical', 0.0008033355890763824), (u'therefore', 0.0008002537389904116), (u'handled', 0.0007999051964211649), (u'scientist', 0.0007949675100235034), (u'lovely', 0.0007943816828237808), (u'barry', 0.000792934345036231), (u'villain', 0.0007926632563712613), (u'event', 0.0007924384511368963), (u'producers', 0.0007895539020453444), (u'okay', 0.0007863956864371629), (u'fit', 0.000785871771922548), (u'mentioned', 0.0007854073087003715), (u'detail', 0.0007852359533368501), (u'information', 0.0007839070924927416), (u'allen', 0.0007790149847387508), (u'seven', 0.0007784369946922017), (u'shouldn', 0.0007783256780906441), (u'naturally', 0.0007776856076316881), (u'comments', 0.0007747509449613798), 
(u'entertain', 0.0007747509449613798), (u'jail', 0.0007734819016444898), (u'fbi', 0.0007733651320337342), (u'climactic', 0.0007732919036337689), (u'bad', 0.0007712559966343031), (u'ended', 0.0007711136165812794), (u'judge', 0.0007694203499660373), (u'ones', 0.0007682591154179706), (u'nice', 0.0007668341805484552), (u'kill', 0.000764042000480255), (u'critics', 0.0007636954961716471), (u'danny', 0.0007634624490260348), (u'presented', 0.0007617330823469355), (u'rent', 0.0007604037052398728), (u'sub', 0.0007604037052398728), (u'genius', 0.0007595059440766616), (u'thankfully', 0.0007594300769361086), (u'wanted', 0.0007590994107197358), (u'breaking', 0.0007584286306808082), (u'batman', 0.0007559768139867969), (u'total', 0.0007557951979353888), (u'wasn', 0.0007554151148587302), (u'bigger', 0.0007552449284064951), (u'ensemble', 0.000752202124443757), (u'steals', 0.000752202124443757), (u'lot', 0.0007517244632705712), (u'kiss', 0.0007491175649023608), (u'directing', 0.0007486014304357063), (u'perspective', 0.0007479380707277437), (u'badly', 0.0007476696718985352), (u'crash', 0.0007476696718985352), (u'adds', 0.0007473255088352558), (u'really', 0.0007456730464617401), (u'job', 0.0007452354168096464), (u'army', 0.000744825652379645), (u'brown', 0.0007446186605355376), (u'mainly', 0.0007441383853416938), (u'pay', 0.0007420807243907193), (u'dumb', 0.000741226368392181), (u'explosions', 0.0007406529596492268), (u'yeah', 0.0007402779454924423), (u'driver', 0.0007399796387768184), (u'recommend', 0.0007395349929176807), (u'blame', 0.0007389503091672745), (u'twice', 0.0007382828701783492), (u'gary', 0.0007379596761595932), (u'wouldn', 0.0007373611687174524), (u'cares', 0.0007367547861773888), (u'killed', 0.000736304171629268), (u'fiction', 0.0007362894228326887), (u'price', 0.0007357152732515652), (u'murphy', 0.0007352663926699596), (u'hits', 0.0007338161630986185), (u'accent', 0.0007334803204610447), (u'acts', 0.0007329420521241115), (u'saw', 0.0007328073077446421), 
(u'suspenseful', 0.0007327526614129683), (u'guilty', 0.0007299875570302779), (u'advice', 0.000729680323209979), (u'ending', 0.0007295169009651391), (u'aren', 0.0007290304055131928), (u'jackson', 0.000728289303944846), (u'ok', 0.0007282513286969607), (u'actor', 0.0007277390106091889), (u'news', 0.0007277251988989857), (u'fights', 0.0007273426745772697), (u'thinks', 0.00072659677209384), (u'throw', 0.0007247925124324958), (u'saying', 0.0007246200014638788), (u'cop', 0.0007238458347956482), (u'loves', 0.0007235356468040002), (u'extra', 0.0007228772886176453), (u'villains', 0.0007228772886176453), (u'performance', 0.0007227970372811597), (u'range', 0.0007226977363850031), (u'flash', 0.0007224950161223425), (u'gives', 0.0007223630680223859), (u'thrills', 0.0007222643344441425), (u'said', 0.0007214337497047879), (u'surprised', 0.0007209945072622753), (u'treat', 0.0007200792663256371), (u'guys', 0.0007196493682561889), (u'writing', 0.0007196184155951888), (u'particular', 0.0007191066917321584), (u'witty', 0.0007183570148845284), (u'natural', 0.000717863637813866), (u'acted', 0.0007174324884818456), (u'liked', 0.0007171432011881029), (u'cliched', 0.0007164134082425248), (u'grace', 0.0007158548012965267), (u'national', 0.000715805247454543), (u'acting', 0.0007155453571609738), (u'aliens', 0.0007152403336559289), (u'chemistry', 0.0007146731327569155), (u'guess', 0.0007139107996902104), (u'instance', 0.0007133969307341352), (u'violent', 0.0007131582166867087), (u'mediocre', 0.0007118275471655811), (u'alien', 0.0007110268412632576), (u'scary', 0.0007110268412632576), (u'ask', 0.0007106124899571602), (u'probably', 0.0007102573316947909), (u'nevertheless', 0.0007100225660637334), (u'mean', 0.0007095577775416395), (u'allowed', 0.0007091154787867436), (u'loud', 0.0007090011237667811), (u'flick', 0.0007089106899499742), (u'fun', 0.000708794736453356), (u'slightly', 0.000708672447749141), (u'plain', 0.0007081364882499924), (u'allows', 0.0007078066110039132), (u'prison', 
0.0007071414486880486), (u'trailer', 0.00070639776026545), (u'stuff', 0.0007058992438503015), (u'fantastic', 0.0007056402742839906), (u'dog', 0.0007054282047178778), (u'critic', 0.0007051016175860639), (u'hey', 0.0007051016175860639), (u'overall', 0.0007051016175860639), (u'working', 0.0007049068919253111), (u'developed', 0.0007046989324817885), (u'person', 0.0007034519814486633), (u'visuals', 0.0007030783704767782), (u'emotion', 0.0007022736699219486), (u'menace', 0.0007016129344864077), (u'murdered', 0.0007013310207005769), (u'requires', 0.0007008109383715443), (u'track', 0.0007006720814390355), (u'usual', 0.0007006493308681725), (u'lines', 0.0006999170468685193), (u'saving', 0.0006999170468685193), (u'yes', 0.0006999170468685193), (u'able', 0.0006998405782738653), (u'get', 0.0006997175379902146), (u'maybe', 0.0006996916307503651), (u'think', 0.0006990926451643161), (u'bring', 0.0006989242567311171), (u'remember', 0.0006988801327250104), (u'de', 0.0006986421524297788), (u'annoying', 0.0006978596775361603), (u'wonderfully', 0.0006977822236318833), (u'disappointing', 0.00069756042381509), (u'included', 0.0006972871921567213), (u'friends', 0.0006965130357913436), (u'tell', 0.0006964706559291111), (u'williams', 0.0006959289155473312), (u'realistic', 0.0006955301024152124), (u'except', 0.0006951165184263484), (u'episode', 0.0006940976307569897), (u'impressive', 0.0006938603053760607), (u'terribly', 0.0006936598063473447), (u'very', 0.0006935024277037634), (u'language', 0.000693488179178764), (u'doing', 0.000693268245804233), (u'feeling', 0.0006931194985944053), (u'somewhere', 0.0006923647194453244), (u'study', 0.0006912760956726117), (u'theatre', 0.0006912760956726117), (u'dull', 0.0006902443403059361), (u'decided', 0.000689946718565549), (u'hotel', 0.000689668476845466), (u'seemingly', 0.0006890461727833452), (u'thrillers', 0.0006890461727833452), (u'mood', 0.0006884254725976731), (u'confused', 0.0006883344952654942), (u'anti', 0.0006881339316013725), (u'brilliant', 
0.0006881339316013725), (u'reason', 0.000688112360680975), (u'smart', 0.0006876378004322295), (u'direction', 0.0006873678209267594), (u'jackie', 0.0006873678209267594), (u'actually', 0.0006873117883139156), (u'drop', 0.000686248633158629), (u'planet', 0.0006861555320009626), (u'brian', 0.0006859585872443607), (u'above', 0.000685583233708249), (u'lawyer', 0.000685264999188502), (u'better', 0.0006851280870126166), (u'warm', 0.0006849917675301333), (u'biggest', 0.0006847808840354194), (u'hundred', 0.0006846925138090629), (u'screenplay', 0.0006846474207826003), (u'did', 0.0006843905146410103), (u'lose', 0.0006843633347158854), (u'will', 0.0006842884672687007), (u'direct', 0.0006840940063669222), (u'scene', 0.0006840515924536997), (u'george', 0.0006839995051918474), (u'considered', 0.000683922094654818), (u'sheer', 0.0006838028405842591), (u'criminal', 0.0006834206854945138), (u'general', 0.0006833962645302296), (u'develops', 0.000683143435723522), (u'rules', 0.000683143435723522), (u'guy', 0.0006827392523224378), (u'talent', 0.0006825963958166327), (u'looks', 0.0006825376168108359), (u'had', 0.0006825121813937092), (u'great', 0.0006824846052572631), (u'tension', 0.0006824512944512591), (u'learn', 0.0006824341921233109), (u'fact', 0.0006821735781395314), (u'entertainment', 0.0006821027635973353), (u'agent', 0.0006814007228772886), (u'explained', 0.0006814007228772886), (u'hit', 0.0006810888689995415), (u'reasons', 0.0006807222621508924), (u'moved', 0.0006804749066777271), (u'offensive', 0.0006802156781418498), (u'threatening', 0.0006799437006615852), (u'feel', 0.0006798736033728572), (u'huge', 0.000679823000596379), (u'running', 0.0006792911231160586), (u'master', 0.0006792539027043922), (u'cops', 0.0006791217906937525), (u'why', 0.000678920708442264), (u'gore', 0.0006787074393876551), (u'failure', 0.0006781978992679947), (u'soundtrack', 0.0006781978992679947), (u'besides', 0.0006781089319455142), (u'either', 0.0006780236523876963), (u'aforementioned', 
0.0006779823246019844), (u'feels', 0.000677834616034533), (u'me', 0.0006772941322099265), (u'definitely', 0.0006772162428792703), (u'capable', 0.0006770439407617049), (u'intelligent', 0.0006762483544623375), (u'rated', 0.0006761248387811572), (u'flicks', 0.0006759144046576647), (u'girls', 0.0006755789965569017), (u'care', 0.0006753585539959397), (u'anyway', 0.0006753514107461222), (u'well', 0.0006750278432664558), (u'relief', 0.0006745639263266804), (u'done', 0.0006741033421380078), (u'asking', 0.0006739941932807963), (u'evil', 0.0006738733408165179), (u'jump', 0.0006732428062202827), (u'supporting', 0.0006732428062202827), (u'gets', 0.0006727355113724907), (u'feet', 0.0006727296638374928), (u'sure', 0.0006725072227117109), (u'although', 0.0006724942545826389), (u'credit', 0.0006722064102747464), (u'weird', 0.0006719203649937786), (u'happening', 0.0006718035295973268), (u'necessary', 0.0006717400320992552), (u'right', 0.0006715253500819656), (u'1996', 0.0006704431174468618), (u'hurt', 0.0006704431174468618), (u'basically', 0.0006703084795005513), (u'dies', 0.0006700060619596082), (u'roles', 0.0006697805845704244), (u'interesting', 0.0006689558957182921), (u'star', 0.0006687483070094306), (u'usually', 0.0006687411040075134), (u'whom', 0.0006685774776057497), (u'try', 0.0006682335591501913), (u'though', 0.0006680374524563834), (u'haunting', 0.0006679343054291209), (u'major', 0.0006678430076837096), (u'role', 0.0006678107603149175), (u'path', 0.0006675134798838656), (u'regular', 0.0006675134798838656), (u'sign', 0.0006674389889252802), (u'loved', 0.0006670457995356336), (u'don', 0.0006670226685137735), (u'thing', 0.0006670088013375039), (u'expecting', 0.0006667751707626963), (u'make', 0.0006663531085448293), (u'knows', 0.0006663202799443585), (u'isn', 0.0006661965036826523), (u'amount', 0.0006661387831026985), (u'relatively', 0.0006661387831026985), (u'bruce', 0.0006656732773143668), (u'ideas', 0.0006653532420848887), (u'he', 0.0006653376447000585), (u'want', 
0.0006651063577650056), (u'reminded', 0.0006650552782505471), (u'subplots', 0.0006650552782505471), (u'grow', 0.0006648449508380707), (u'rise', 0.0006648449508380707), (u'sometimes', 0.0006648229310006633), (u'special', 0.0006647811930510132), (u'individual', 0.0006646885535313573), (u'forever', 0.0006645170210014137), (u'scenes', 0.0006644715123710205), (u'action', 0.0006642620639475215), (u'aspect', 0.0006642051436742437), (u'kind', 0.0006640702385978397), (u'getting', 0.0006640336879613757), (u'just', 0.0006639106047940057), (u'believable', 0.0006636250518457072), (u'boring', 0.0006636250518457072), (u'cliche', 0.0006636250518457072), (u'funny', 0.0006636250518457072), (u'irritating', 0.0006636250518457072), (u'weight', 0.0006636250518457072), (u'went', 0.0006636250518457072), (u'also', 0.0006635828794351934), (u'effects', 0.0006633694181585554), (u'jack', 0.0006633457483727081), (u'bit', 0.0006630408748634487), (u'need', 0.00066283752211646), (u'but', 0.0006625489862068561), (u'disappointment', 0.0006623869454056965), (u'hardly', 0.0006622308815687205), (u'tight', 0.0006621697337495544), (u'likes', 0.000661949231007713), (u'budget', 0.0006618118686439429), (u'frightening', 0.0006616499772866426), (u'heard', 0.0006615893921774687), (u'black', 0.0006614045091292062), (u'serves', 0.0006612206132520633), (u'typical', 0.0006606624400071102), (u'myself', 0.0006603810746369642), (u'again', 0.0006602878569010809), (u'superb', 0.0006600571752228807), (u'we', 0.0006600378894032979), (u'musical', 0.0006598544549602202), (u'nobody', 0.0006598544549602202), (u'afraid', 0.0006595453896417377), (u'richard', 0.0006595031571137463), (u'system', 0.0006593710451031064), (u'him', 0.000658615728676154), (u'longer', 0.0006585592117552819), (u'terrible', 0.0006584042253888791), (u'decides', 0.0006583581863548682), (u'knowing', 0.0006583581863548682), (u'does', 0.0006581230584311701), (u'makes', 0.0006581059926947726), (u'wars', 0.0006580639480592907), (u'sounds', 
0.0006580116820462604), (u'nothing', 0.0006577440462556566), (u'built', 0.0006576998281685133), (u'reading', 0.0006575553105178501), (u'confusing', 0.0006574476909907604), (u'wasted', 0.0006572981180887036), (u'grown', 0.0006572440417318061), (u'drawn', 0.0006571006482461006), (u'fly', 0.000656712290888981), (u'responsible', 0.000656712290888981), (u'played', 0.0006564938091899696), (u'was', 0.0006564883957972653), (u'survive', 0.0006562747743727326), (u'childhood', 0.0006560838580747332), (u'gave', 0.0006559768907872017), (u'too', 0.0006556821711522463), (u'basic', 0.0006555973294443478), (u'calls', 0.0006553297386976359), (u'surprising', 0.0006553297386976359), (u'some', 0.0006548712038000038), (u'brief', 0.0006547372163299164), (u'became', 0.0006544925970037938), (u'beat', 0.0006542782201295704), (u'started', 0.0006538658599067997), (u'anyone', 0.0006534515545886386), (u'jerry', 0.0006531742636276646), (u'however', 0.0006529728297040989), (u'heroes', 0.0006529479161105658), (u'like', 0.000652946803213366), (u'admit', 0.0006528050781743098), (u'shoot', 0.0006528050781743098), (u'case', 0.0006527621417708519), (u'then', 0.0006527316279785068), (u'depth', 0.0006526033071035145), (u'script', 0.0006525362013649949), (u'movies', 0.0006524133101944996), (u'times', 0.0006520875564461009), (u'buy', 0.0006517746044913196), (u'provide', 0.0006517746044913196), (u'performances', 0.0006514996521165078), (u'tough', 0.0006513116963915387), (u'thrown', 0.0006512548480284078), (u'hill', 0.0006512208452691519), (u'beginning', 0.000651103824452392), (u'loving', 0.0006510856249939714), (u'ups', 0.0006510245761777506), (u'see', 0.0006509615377774679), (u'course', 0.000650951656758376), (u'problem', 0.0006504279627465027), (u'best', 0.0006503077449163204), (u'room', 0.00065000588100559), (u'filmmakers', 0.0006499420610860018), (u'places', 0.0006493805747227564), (u'never', 0.0006493165422671562), (u'supposedly', 0.0006487360282466047), (u'kevin', 0.0006486366355657029), 
(u'especially', 0.0006485261265981212), (u'even', 0.000648425062841444), (u'occasionally', 0.0006482891787988526), (u'company', 0.0006482129946307112), (u'money', 0.0006480713396930734), (u'fair', 0.0006478244553731903), (u'science', 0.0006477404096472726), (u'not', 0.0006476204670378808), (u'next', 0.0006475053385230357), (u'know', 0.0006468572043976747), (u'seems', 0.000646841698041768), (u'memories', 0.0006467532284936977), (u'unbelievable', 0.0006467532284936977), (u'sick', 0.0006463880375120525), (u'actors', 0.0006462354435466341), (u'supposed', 0.0006459044138513752), (u'idea', 0.0006457879795324968), (u'likable', 0.0006457743779827689), (u'extremely', 0.0006455626764426486), (u've', 0.0006454666510550725), (u'plays', 0.0006453135893114008), (u'creature', 0.0006451910226277709), (u'held', 0.0006451910226277709), (u'mike', 0.0006451910226277709), (u'seconds', 0.0006451910226277709), (u'time', 0.0006449425340547377), (u'entertaining', 0.0006446039516335691), (u'my', 0.0006441661752358124), (u'help', 0.0006441611885932493), (u'awful', 0.0006441436346040245), (u'could', 0.0006440929900534719), (u'considering', 0.0006440669964559454), (u'dr', 0.0006440243119177749), (u'should', 0.000643552677141085), (u'slowly', 0.0006433766496732495), (u'fans', 0.0006433099992381855), (u'pull', 0.0006432382652953624), (u'mistake', 0.0006431873237997343), (u'moral', 0.0006431197833898005), (u'occur', 0.0006428867689755288), (u'characterization', 0.0006425946804843995), (u'entirely', 0.0006425946804843995), (u'fire', 0.0006425946804843995), (u'bond', 0.0006422177921087489), (u'nomination', 0.0006422177921087489), (u'doesn', 0.0006421234962308942), (u'series', 0.000641827148682892), (u'today', 0.000641765780712276), (u'albeit', 0.0006417129039074055), (u'present', 0.0006415906262961427), (u'ahead', 0.0006415042167841836), (u'speed', 0.0006414399120310978), (u'anywhere', 0.0006410014705327854), (u'efforts', 0.0006410014705327854), (u'mad', 0.0006410014705327854), (u'possible', 
0.0006410014705327854), (u'realize', 0.0006410014705327854), (u'selling', 0.0006410014705327854), (u'it', 0.0006405730734197016), (u'flashbacks', 0.000640446970990802), (u'holes', 0.000640446970990802), (u'predictable', 0.0006403799435736392), (u'flaw', 0.0006403399623072613), (u'generally', 0.0006402693157977394), (u'used', 0.0006399878692194824), (u'animals', 0.000639924157136932), (u'got', 0.0006397980885480555), (u'things', 0.0006396737955731069), (u'non', 0.0006396386041886335), (u'pieces', 0.0006394303884971658), (u'everything', 0.000639365173771159), (u'so', 0.0006390972320839649), (u'hasn', 0.0006390776966116186), (u'place', 0.0006386716708311836), (u'appearance', 0.0006386074407642222), (u'largely', 0.0006386074407642222), (u'stuck', 0.0006384594951043672), (u'wants', 0.0006382347851870402), (u'revolves', 0.0006381010113901031), (u'theme', 0.0006378593064615462), (u'seemed', 0.0006378000203469945), (u'exciting', 0.0006377021982579842), (u'fake', 0.0006377021982579842), (u'saved', 0.0006376248166054836), (u'go', 0.0006376136925584035), (u'frank', 0.0006375101771202974), (u'helped', 0.0006375101771202974), (u'oh', 0.0006375101771202974), (u'decent', 0.0006373228394249932), (u'difference', 0.0006373228394249932), (u'happened', 0.0006373228394249932), (u'trust', 0.0006373228394249932), (u'directors', 0.0006372308736472984), (u'work', 0.0006371939070111661), (u'etc', 0.0006370800497718789), (u'our', 0.0006369861926514015), (u'strikes', 0.000636961545298335), (u'seen', 0.0006367336520799815), (u'little', 0.0006363792864759592), (u'funniest', 0.0006363527894410891), (u'damn', 0.0006362882244259267), (u'couple', 0.0006362330341904833), (u'this', 0.0006362222842527883), (u'way', 0.0006359903405904665), (u'began', 0.0006359740080188027), (u'pulls', 0.0006359740080188027), (u'making', 0.0006359280760523128), (u'instead', 0.0006357293085159097), (u'always', 0.0006355965193658757), (u'problems', 0.0006355965193658757), (u'or', 0.0006355875258306249), (u'entire', 
0.000635364058522621), (u'turn', 0.0006352884449487142), (u'personal', 0.0006352463489707263), (u'later', 0.0006351777737724782), (u'exact', 0.000635109912899212), (u'attention', 0.0006350561310452954), (u'happens', 0.0006350094367225153), (u'ever', 0.0006349762899425742), (u'common', 0.0006349105063331525), (u'describe', 0.000634717142390307), (u'straight', 0.0006345914558274575), (u'minor', 0.000634529550505457), (u'been', 0.0006344902934719933), (u'face', 0.0006343474760289847), (u'fight', 0.0006343474760289847), (u'twist', 0.0006343474760289847), (u'have', 0.0006342080874479353), (u'move', 0.0006342056273089425), (u'society', 0.0006341900697073896), (u'followed', 0.0006339989334597381), (u'combination', 0.0006338871367865835), (u'nearly', 0.0006336322258054493), (u'hot', 0.0006335790357188346), (u'may', 0.0006335218734437214), (u'if', 0.0006334845249732937), (u'social', 0.0006334602767618115), (u'strong', 0.0006329819174554437), (u'add', 0.0006326933757003564), (u'subtle', 0.0006325922256802605), (u'talking', 0.0006325176275404396), (u'patrick', 0.0006324149627737556), (u'took', 0.0006322647216517789), (u'eddie', 0.0006318913035611389), (u'government', 0.0006318913035611389), (u'put', 0.0006318847691429929), (u'before', 0.0006317650285653122), (u'learned', 0.000631720001276202), (u'together', 0.000631683328804283), (u'cross', 0.0006314900549657912), (u'deserves', 0.0006314900549657912), (u'give', 0.0006313901451384067), (u'character', 0.000631182985573547), (u'ability', 0.0006310363216211411), (u'player', 0.0006309111408392287), (u'poor', 0.0006306710681067937), (u'formula', 0.0006306130913584845), (u'needs', 0.0006305939406678666), (u'interested', 0.0006305786823940408), (u'do', 0.0006304871168155528), (u'game', 0.0006304437992534219), (u'suspense', 0.0006303616674400746), (u'short', 0.0006301762085067098), (u'wild', 0.0006300908072045678), (u'follow', 0.0006299954039481207), (u'second', 0.0006299059485256166), (u'all', 0.0006294991237565685), (u'ago', 
0.00062944335947677), (u'say', 0.0006293887843642655), (u'because', 0.0006290448272023219), (u'powerful', 0.0006289852826559587), (u'seeing', 0.0006288777224349069), (u'audiences', 0.0006284328142478287), (u'worker', 0.0006284328142478287), (u'days', 0.0006283205941024273), (u'were', 0.0006281126594862564), (u'shot', 0.0006281077627921833), (u'charming', 0.0006280737097825443), (u'oliver', 0.0006280737097825443), (u'film', 0.0006279666236763682), (u'singing', 0.0006279091202359556), (u'leaves', 0.0006278772935280517), (u'films', 0.0006278191103276649), (u'quite', 0.0006278043814335809), (u'laughable', 0.0006277534274216149), (u'battle', 0.0006275584729410491), (u'powers', 0.0006275584729410491), (u'details', 0.0006273766246440509), (u'hell', 0.000627333056822895), (u'taking', 0.0006272902091310146), (u'mark', 0.0006271456627005742), (u'perfectly', 0.0006271456627005742), (u'robert', 0.000627137076486493), (u'made', 0.0006271226129930316), (u'generated', 0.000627086172503012), (u'big', 0.0006268262942715562), (u'starring', 0.0006266568084684328), (u'suppose', 0.0006266568084684328), (u'dramatic', 0.0006264244207177584), (u'what', 0.0006260189663520424), (u'dozen', 0.0006259190829908375), (u'touches', 0.0006259190829908375), (u'wrong', 0.0006259190829908375), (u'seriously', 0.0006257867813457326), (u'thoughts', 0.0006257867813457326), (u'seem', 0.0006257614273719321), (u'back', 0.0006256700813097204), (u'loose', 0.0006256634493036858), (u'sam', 0.0006256241759718609), (u'violence', 0.0006255706449948188), (u'any', 0.0006253106130995327), (u'gotten', 0.0006251540343474053), (u'record', 0.0006251540343474053), (u'robin', 0.0006250970571295465), (u'surprises', 0.0006250693710166432), (u'completely', 0.0006249764337694657), (u'join', 0.0006246470744029624), (u'results', 0.0006245882840900773), (u'people', 0.0006245715157190483), (u'bunch', 0.0006245321967800837), (u'industry', 0.0006245321967800837), (u'cliches', 0.0006244274182888866), (u'amazing', 
0.0006244026472868915), (u'point', 0.0006242677266906242), (u'ass', 0.0006242017814390315), (u'disturbing', 0.000624161911626727), (u'which', 0.0006240510808640825), (u'sense', 0.0006240167998774386), (u'monster', 0.000623891198951584), (u'write', 0.000623891198951584), (u'ship', 0.00062371956814097), (u'hold', 0.0006237077554940856), (u'order', 0.0006236089285609969), (u'movie', 0.0006234062228935263), (u'unlike', 0.0006233902994508702), (u're', 0.0006232476883776215), (u'save', 0.0006229081301665292), (u'heart', 0.0006228812876201978), (u'killer', 0.0006228611418740851), (u'between', 0.0006227931995624544), (u'take', 0.0006223776384022585), (u'asks', 0.0006221484861053505), (u'edge', 0.0006221484861053505), (u'finally', 0.0006221484861053505), (u'lacking', 0.0006221484861053505), (u'quiet', 0.0006221484861053505), (u'shooting', 0.0006221484861053505), (u'stunning', 0.0006221484861053505), (u'tommy', 0.0006221484861053505), (u'tradition', 0.0006221484861053505), (u'going', 0.0006216814076623284), (u'they', 0.000621589734442527), (u'cast', 0.0006213394503626908), (u'sound', 0.000621302025580037), (u'mission', 0.0006211748578015862), (u'there', 0.0006210483119477813), (u'doubt', 0.0006209634413699117), (u'kids', 0.0006208839566620469), (u'brought', 0.0006208275763683964), (u'inside', 0.0006208275763683964), (u'six', 0.0006207377185631616), (u'small', 0.0006206982565340094), (u'thought', 0.0006206420733060639), (u'race', 0.0006205155504462813), (u'can', 0.000620277579253773), (u'one', 0.0006202348373100844), (u'explain', 0.000620135060583974), (u'using', 0.0006200746578183327), (u'many', 0.0006198587703310405), (u'humanity', 0.0006197086881206236), (u'much', 0.0006196181929293892), (u'fan', 0.0006195233870078595), (u'accept', 0.00061938338172266), (u'trying', 0.0006192172800459613), (u'1995', 0.0006191429378632956), (u'lee', 0.00061902994732788), (u'car', 0.0006189182239760392), (u'claims', 0.0006188566951735761), (u'out', 0.0006185562072468923), (u'effectively', 
0.0006185101908649683), (u'frankly', 0.0006183778892198636), (u'hard', 0.000618262266146714), (u'told', 0.0006182356025449395), (u'born', 0.0006181603547841623), (u'fully', 0.0006180821561308057), (u'air', 0.0006180282974556462), (u'still', 0.0006179889451285238), (u'rob', 0.0006177360854946742), (u'against', 0.000617664533052339), (u'silent', 0.0006176401637422683), (u'failed', 0.0006175399788008664), (u'plot', 0.0006173511305174044), (u'important', 0.0006173256296239136), (u'none', 0.0006170067630796864), (u'broken', 0.0006169639153878059), (u'shock', 0.0006169639153878059), (u'south', 0.0006169639153878059), (u'books', 0.0006168309776770996), (u'spend', 0.0006166182773399696), (u'means', 0.0006164822885998373), (u'girlfriend', 0.0006164407018291546), (u'same', 0.0006164275804859909), (u'suspects', 0.0006163878519747455), (u'five', 0.000616306716282765), (u'being', 0.0006162586897745075), (u'weren', 0.0006162232624281567), (u'obsessed', 0.0006160489911435334), (u'whatever', 0.0006160489911435334), (u'van', 0.0006160232548778717), (u'college', 0.0006159579539052972), (u'recently', 0.0006159579539052972), (u'logic', 0.0006158641579628722), (u'them', 0.0006158507656812686), (u'marry', 0.000615458717437551), (u'speech', 0.000615458717437551), (u'far', 0.00061529015633726), (u'would', 0.0006151668925359685), (u'shows', 0.0006150671212228505), (u'those', 0.0006149973540811511), (u'here', 0.0006148167699391258), (u'must', 0.0006147659258603031), (u'long', 0.0006147065185682051), (u'exist', 0.0006146527212125149), (u'something', 0.0006145255546073584), (u'land', 0.000614467640597877), (u'no', 0.0006144303549400738), (u'telling', 0.0006143521391616743), (u'she', 0.0006141595264514455), (u'winner', 0.0006140686356364499), (u'almost', 0.0006140554976682077), (u'throughout', 0.0006139081088059418), (u'liners', 0.0006138531729572792), (u'chance', 0.0006137787755299422), (u'standing', 0.0006136259041039073), (u'that', 0.0006135531164153746), (u'fascinating', 
0.0006135075349094428), (u'ex', 0.0006133504267058808), (u'quickly', 0.000613202560161352), (u'minutes', 0.000613131841379186), (u'obviously', 0.0006130527480043951), (u'mess', 0.0006130184244643914), (u'cute', 0.0006128626878052707), (u'plenty', 0.0006128626878052707), (u'comedies', 0.000612721993891633), (u'enough', 0.000612576970934499), (u'drama', 0.0006122731133100275), (u'notice', 0.0006122731133100275), (u'terms', 0.0006122731133100275), (u'decide', 0.0006121138331036512), (u'destroy', 0.0006117793446702613), (u'50', 0.0006116035965103445), (u'style', 0.0006116035965103445), (u'succeeds', 0.0006115134692488488), (u'theater', 0.0006115134692488488), (u'has', 0.0006113816301969087), (u'talented', 0.0006113753521468163), (u'superior', 0.0006112336003842039), (u'off', 0.000610998871659018), (u'introduced', 0.0006108366954488896), (u'certain', 0.0006106851136645484), (u'remarkable', 0.0006106272178441403), (u'taste', 0.0006106272178441403), (u'john', 0.0006101940874583805), (u'end', 0.0006100413906444177), (u'smith', 0.0006098461149111769), (u'read', 0.0006098176152095687), (u'other', 0.0006097992006179753), (u'sweet', 0.0006096555446172912), (u'use', 0.0006096429888972028), (u'visually', 0.0006093470769262281), (u'ed', 0.000609187059311489), (u'fox', 0.0006090702897007335), (u'play', 0.0006088723226785255), (u'credits', 0.000608867812346123), (u'tried', 0.0006085813851622432), (u'part', 0.0006082067833354827), (u'office', 0.0006081900264811919), (u'main', 0.0006081150616067336), (u'despite', 0.0006080087477847743), (u'where', 0.0006079708396391428), (u'meeting', 0.000607944182769612), (u'ways', 0.0006078840587343283), (u'involved', 0.0006076402223690117), (u'figures', 0.0006075440615488869), (u'door', 0.0006073354269123659), (u'halfway', 0.0006073354269123659), (u'screenwriter', 0.0006073354269123659), (u'willing', 0.0006073354269123659), (u'opening', 0.0006072386095320195), (u'married', 0.0006071569563196793), (u'truth', 0.0006070411277231013), (u'humor', 
0.0006069741327857078), (u'highly', 0.0006068997487008076), (u'effort', 0.0006068383443891115), (u'comic', 0.0006066880695697419), (u'led', 0.0006064640704892491), (u'friend', 0.0006064299831964133), (u'worse', 0.0006060657361243958), (u'than', 0.0006058864534585655), (u'driving', 0.0006057761575236307), (u'final', 0.0006057761575236307), (u'paced', 0.0006057761575236307), (u'yet', 0.0006057761575236307), (u'points', 0.0006056087513009137), (u'editing', 0.0006055578598092078), (u'disaster', 0.0006054973100782), (u'works', 0.0006053961127561814), (u'hope', 0.0006052028074954771), (u'conclusion', 0.0006051498935888108), (u'manage', 0.0006050699002122624), (u'pg', 0.0006050699002122624), (u'comes', 0.0006048901606608638), (u'generation', 0.0006048665837135352), (u'past', 0.0006048665837135352), (u'adaptation', 0.0006046583680220675), (u'score', 0.0006045405100835009), (u'students', 0.000604372815073769), (u'value', 0.000604372815073769), (u'you', 0.000604281417718327), (u'only', 0.000604277821507802), (u'watch', 0.0006039208079607493), (u'how', 0.0006036931915274067), (u'talk', 0.0006036321621141198), (u'woody', 0.000603555542842432), (u'about', 0.0006033704212867672), (u'owner', 0.0006032955016779157), (u'is', 0.0006032580875393416), (u'are', 0.0006032575189779611), (u'll', 0.0006030994567791901), (u'came', 0.0006030916856300515), (u'field', 0.0006028570601796031), (u'90', 0.0006025840683032954), (u'enjoyed', 0.0006025840683032954), (u'multiple', 0.0006025840683032954), (u'everyone', 0.000602438547840305), (u'obvious', 0.0006022624614353165), (u'wonderful', 0.0006020791801019521), (u'look', 0.000602031109907932), (u'wait', 0.0006019862666482326), (u'likely', 0.0006019160150124935), (u'jeff', 0.0006018168362326266), (u'dialogue', 0.0006017698266375452), (u'didn', 0.0006017325599637242), (u'now', 0.0006016610695602147), (u'island', 0.0006015685107379979), (u'over', 0.0006014962542014384), (u'1998', 0.0006014102032351721), (u'agree', 0.0006014102032351721), (u'date', 
0.0006014102032351721), (u'opera', 0.0006014102032351721), (u'remake', 0.0006011771888209005), (u'be', 0.0006011213317860572), (u'chief', 0.0006011096484109667), (u'games', 0.0006011096484109667), (u'producer', 0.0006010587069153386), (u'while', 0.000600992473040906), (u'due', 0.0006009869729725155), (u'phone', 0.0006009869729725155), (u'building', 0.0006006950900327521), (u'given', 0.0006006665994669187), (u'falls', 0.0006004557215967957), (u'mary', 0.0006003950425352334), (u'figure', 0.0006003187146630575), (u'travel', 0.0006003187146630575), (u'naked', 0.0006001903042428087), (u'am', 0.0006000416871066832), (u'addition', 0.0006000008053702085), (u'unnecessary', 0.0005998149507066969), (u'super', 0.0005997287208402928), (u'hero', 0.0005996418225253119), (u'forces', 0.0005995622374348592), (u'onto', 0.0005994932191043153), (u'stands', 0.0005994932191043153), (u'choice', 0.0005994659892160929), (u'imagine', 0.0005994659892160929), (u'worth', 0.0005993948842101705), (u'enjoy', 0.0005993089675258589), (u'without', 0.000599238187956086), (u'position', 0.00059910594958293), (u'for', 0.0005988931282437008), (u'released', 0.0005988905987743094), (u'several', 0.0005988859731006158), (u'law', 0.0005988595053420486), (u'technology', 0.000598522594227932), (u'otherwise', 0.0005982196981782216), (u'early', 0.0005981933738790719), (u'who', 0.0005981748562238021), (u'version', 0.000598001170434595), (u'immediately', 0.0005979750275450198), (u'studio', 0.0005979750275450198), (u'yourself', 0.0005979538227568091), (u'pacing', 0.0005977505062580819), (u'witness', 0.0005977505062580819), (u'most', 0.0005976870249575253), (u'audience', 0.0005976437317292098), (u'happen', 0.0005976396063496852), (u'similar', 0.0005976193343234191), (u'often', 0.000597486744313787), (u'stop', 0.0005973863573051375), (u'surprisingly', 0.0005971756847433556), (u'mental', 0.0005970111735354374), (u'every', 0.0005969647212682807), (u'creating', 0.0005967546703459484), (u'girl', 0.0005964550383015896), 
(u'background', 0.000596414850427027), (u'million', 0.0005963657560505341), (u'1997', 0.0005962256325176275), (u'around', 0.0005961969250385714), (u'leave', 0.0005961050611055916), (u'unfortunately', 0.0005960899107710949), (u'growing', 0.000595860521903716), (u'members', 0.0005957036958682102), (u'brings', 0.0005954556467674971), (u'faces', 0.0005954556467674971), (u'apparently', 0.0005953574029716272), (u'let', 0.0005953107082733549), (u'student', 0.0005950985519268569), (u'veteran', 0.0005950985519268569), (u'more', 0.0005950716124445559), (u'forget', 0.0005949507380788871), (u'whole', 0.0005949163974879445), (u'career', 0.0005948830146027255), (u'catherine', 0.0005948612718024842), (u'watching', 0.0005948059224834115), (u'anything', 0.0005946316706465377), (u'starts', 0.0005945849455816957), (u'screen', 0.0005942032822377343), (u'up', 0.0005941929153640528), (u'model', 0.0005941237795240284), (u'and', 0.0005940629658477276), (u'capture', 0.0005935439580085528), (u'solid', 0.0005935439580085528), (u'positive', 0.0005934339405927958), (u'lots', 0.000593387363876636), (u'buddy', 0.0005932723960329503), (u'era', 0.0005932113472167296), (u'storyline', 0.0005930761269415491), (u'thriller', 0.0005929729550712593), (u'old', 0.0005925904737386595), (u'as', 0.0005925848590923172), (u'cause', 0.0005925223677193814), (u'handle', 0.0005925223677193814), (u'heroine', 0.0005925223677193814), (u'mouth', 0.0005925223677193814), (u'provided', 0.0005925223677193814), (u'easy', 0.0005922554657519402), (u'sets', 0.0005920306479121454), (u'twenty', 0.0005920159383452623), (u'ben', 0.0005919268678523268), (u'us', 0.0005918044934621259), (u'haven', 0.0005917675621554076), (u'stone', 0.0005917675621554076), (u'taken', 0.0005917323378957556), (u'fill', 0.0005916510112962647), (u'least', 0.0005914705528654417), (u'begins', 0.0005914642920627396), (u'friendly', 0.0005914251040754566), ...]
sorted(sentscores.items(),key=itemgetter(1),reverse=False)
[(u'worst', -0.047847985347985345), (u'bad', -0.005144142257015721), (u'over', -0.0008741258741258741), (u'ever', -0.0006927835481992246), (u'old', -0.000509611311233071), (u'horrible', -0.0005045767240889192), (u'appropriate', -0.0004807692307692308), (u'single', -0.0003521130740894542), (u'worried', -0.0003434065934065934), (u'rich', -0.00020903010033444816), (u'year', -0.0001923076923076923), (u'normal', -0.00016869095816464237), (u'ready', -0.00016411333242216783), (u'busy', -0.00014386161489820027), (u'able', -0.00011459845186260283), (u'enough', -0.0001019798896944335), (u'high', -8.761468369791503e-05), (u'rude', -8.029485482690247e-05), (u'seriously', -7.754342431761787e-05), (u'sorry', -7.302249637155297e-05), (u'still', -5.139207374979732e-05), (u'other', -5.07118667496026e-05), (u'like', -4.767420925957511e-05), (u'too', -4.1431243431412225e-05), (u'away', -4.001232677893313e-05), (u'fast', -3.4470246734397684e-05), (u'little', -1.496198113885625e-05), (u'hard', -1.1084353093817969e-05), (u'probably', -9.115554299477096e-06), (u'now', -8.629079693513414e-06), (u'never', -8.614478291966799e-06), (u'long', -8.133879221071868e-06), (u'much', -3.840238804308329e-06), (u'together', -3.67649335290002e-06), (u'just', -2.2528019305203796e-06), (u'rob', 0.0), (u'skin', 0.0), (u'young', 0.0), (u'finally', 0.0), (u'ta', 0.0), (u'worse', 0.0), (u'fat', 0.0), (u'bunim', 0.0), (u'anxious', 0.0), (u'quick', 0.0), (u'anal', 0.0), (u'ten', 0.0), (u'tired', 0.0), (u'past', 0.0), (u'second', 0.0), (u'uncomfortable', 0.0), (u'kris', 0.0), (u'public', 0.0), (u'full', 0.0), (u'alone', 0.0), (u'sexy', 0.0), (u'dry', 0.0), (u'bible', 0.0), (u'ahead', 0.0), (u'guilty', 0.0), (u'later', 0.0), (u'weird', 0.0), (u'extra', 0.0), (u'private', 0.0), (u'moral', 0.0), (u'total', 0.0), (u'angry', 0.0), (u'live', 0.0), (u'acceptable', 0.0), (u'everywhere', 0.0), (u'basically', 0.0), (u'glad', 0.0), (u'male', 0.0), (u'embarrassing', 0.0), (u'awesome', 0.0), (u'huge', 0.0), (u'awkward', 
0.0), (u'rather', 0.0), (u'truthful', 0.0), (u'guys', 0.0), (u'short', 0.0), (u'natural', 0.0), (u'tall', 0.0), (u'cute', 0.0), (u'murray', 0.0), (u'scott', 0.0), (u'cold', 0.0), (u'easier', 0.0), (u'safe', 0.0), (u'bigger', 0.0), (u'mean', 0.0), (u'em', 0.0), (u'sexual', 0.0), (u'special', 0.0), (u'god', 0.0), (u'red', 0.0), (u'free', 0.0), (u'completely', 0.0), (u'scary', 0.0), (u'atm', 0.0), (u'american', 0.0), (u'major', 0.0), (u'delicious', 0.0), (u'open', 0.0), (u'top', 0.0), (u'wonderful', 0.0), (u'white', 0.0), (u'hundred', 0.0), (u'huh', 0.0), (u'forward', 0.0), (u'ridiculous', 0.0), (u'double', 0.0), (u'miserable', 0.0), (u'apparently', 0.0), (u'clearly', 0.0), (u'afraid', 0.0), (u'potential', 0.0), (u'lily', 0.0), (u'regular', 0.0), (u'forever', 0.0), (u'clear', 0.0), (u'hungry', 0.0), (u'professional', 0.0), (u'normally', 0.0), (u'anyway', 0.0), (u'bright', 0.0), (u'wasteful', 0.0), (u'truly', 0.0), (u'gray', 0.0), (u'twice', 0.0), (u'stupid', 0.0), (u'common', 0.0), (u'boring', 0.0), (u'fair', 0.0), (u'dumb', 0.0), (u'desperate', 0.0), (u'outside', 0.0), (u'barely', 0.0), (u'quiet', 0.0), (u'somewhere', 0.0), (u'tryclearblue', 0.0), (u'wear', 0.0), (u'tough', 0.0), (u'drunk', 0.0), (u'active', 0.0), (u'late', 0.0), (u'basic', 0.0), (u'present', 0.0), (u'fur', 0.0), (u'straight', 0.0), (u'ugly', 0.0), (u'alcoholic', 0.0), (u'almost', 0.0), (u'in', 0.0), (u'rid', 0.0), (u'grown', 0.0), (u'sensitive', 0.0), (u'belly', 0.0), (u'difficult', 0.0), (u'off', 0.0), (u'accurate', 0.0), (u'touch', 0.0), (u'yes', 0.0), (u'yet', 0.0), (u'early', 0.0), (u'possibly', 0.0), (u'disappointed', 0.0), (u'apart', 0.0), (u'necessary', 0.0), (u'often', 0.0), (u'dead', 0.0), (u'supportive', 0.0), (u'gross', 0.0), (u'literally', 0.0), (u'laker', 0.0), (u'exciting', 0.0), (u'oh', 0.0), (u'favorite', 0.0), (u'down', 0.0), (u'kmart', 0.0), (u'constantly', 0.0), (u'low', 0.0), (u'biggest', 0.0), (u'complete', 0.0), (u'diaper', 0.0), (u'true', 0.0), (u'khloe', 0.0), (u'inside', 
0.0), (u'uh', 0.0), (u'emotional', 0.0), (u'certain', 0.0), (u'deep', 0.0), (u'girlfriend', 0.0), (u'annoying', 0.0), (u'selfish', 0.0), (u'incredible', 0.0), (u'sick', 0.0), (u'poor', 0.0), (u'welcome', 0.0), (u'luxurious', 0.0), (u'important', 0.0), (u'thebouncedryer', 0.0), (u'ago', 0.0), (u'younger', 0.0), (u'kardashian', 0.0), (u'serious', 0.0), (u'so', 9.719929433431632e-07), (u'up', 2.3948220007729206e-06), (u'only', 3.4565482998269166e-06), (u'not', 6.769332261068325e-06), (u'even', 7.76747119728939e-06), (u'crazy', 8.742022904100009e-06), (u'more', 8.838651139547831e-06), (u'wrong', 9.475619231716793e-06), (u'right', 9.72750361669927e-06), (u'around', 1.2384058436690012e-05), (u'nice', 1.3744503317492389e-05), (u'already', 1.4114724480578139e-05), (u'perfect', 1.6524555489457333e-05), (u'anymore', 1.782912565967765e-05), (u'here', 1.8486252267219916e-05), (u'comfortable', 1.8819632640770853e-05), (u'cool', 1.9084697889232416e-05), (u'really', 2.003658328600757e-05), (u'few', 2.2583559168925024e-05), (u'gorgeous', 2.8229448961156277e-05), (u'same', 2.902757619738752e-05), (u'fine', 2.9456816307293507e-05), (u'least', 3.0111412225233366e-05), (u'obviously', 3.070864690791617e-05), (u'nervous', 3.0795762503079576e-05), (u'beautiful', 3.22622273841786e-05), (u'as', 3.4001658469599825e-05), (u'armenian', 3.474393718296157e-05), (u'healthy', 3.662198784150004e-05), (u'gon', 3.8714672861014324e-05), (u'then', 3.890054232777123e-05), (u'easy', 4.337453914552158e-05), (u'there', 4.834990733331861e-05), (u'smart', 5.01856870420556e-05), (u'real', 5.2930216802168025e-05), (u'close', 5.313778627982358e-05), (u'whole', 5.386263165728602e-05), (u'totally', 5.519544414651547e-05), (u'um', 5.741582839557209e-05), (u'clean', 5.76601510695958e-05), (u'about', 5.76601510695958e-05), (u'different', 6.123038880785898e-05), (u'pregnant', 6.174868577077505e-05), (u'sometimes', 6.302388605281402e-05), (u'big', 6.341485539311127e-05), (u'yeah', 6.45244547683572e-05), (u'next', 
6.532809402019291e-05), (u'new', 6.927439464142474e-05), (u'back', 6.954514654828853e-05), (u'super', 7.13165026387106e-05), (u'black', 7.259001161440186e-05), (u'better', 7.326569222565542e-05), (u'first', 7.429889392282526e-05), (u'many', 7.472445357743322e-05), (u'excited', 7.49737591842855e-05), (u'instead', 7.527853056308341e-05), (u'amazing', 7.547169811320755e-05), (u'all', 7.970667941973537e-05), (u'san', 8.66884090568301e-05), (u'always', 8.746618938689085e-05), (u'naked', 8.775778850372971e-05), (u'jealous', 8.775778850372971e-05), (u'actually', 9.610766562275378e-05), (u'exactly', 9.746588693957115e-05), (u'else', 9.833424385220351e-05), (u'happy', 0.00010227503659065664), (u'half', 0.00010423181154888472), (u'maybe', 0.00010762602164254015), (u'fun', 0.00010986596352450011), (u'last', 0.0001109612782436421), (u'well', 0.00011183991935902711), (u'kimberly', 0.00011291779584462511), (u'very', 0.00011345906421838313), (u'soon', 0.00011466574934067194), (u'pretty', 0.00011686194177483377), (u'two', 0.0001231830500123183), (u'honest', 0.0001231830500123183), (u'online', 0.0001231830500123183), (u'though', 0.0001231830500123183), (u'dramatic', 0.0001231830500123183), (u'definitely', 0.00012496943308475518), (u'okay', 0.000128214961394897), (u'funny', 0.0001314146790196465), (u'again', 0.00013204006859581066), (u'most', 0.00013290802764486976), (u'anywhere', 0.00013550135501355014), (u'and', 0.00013550135501355014), (u'married', 0.0001397624039133473), (u'sad', 0.0001538935056940597), (u'sweet', 0.0001538935056940597), (u'honestly', 0.00015723999979378361), (u'strong', 0.00016260162601626016), (u'upset', 0.0001719986240110079), (u'sure', 0.00017624762858331133), (u'entire', 0.00018126879276463763), (u'such', 0.00018364057710583252), (u'own', 0.00018487766158777996), (u'sudden', 0.00019860973187686197), (u'interested', 0.00019860973187686197), (u'fabulous', 0.00020165355918531962), (u'before', 0.00020514393823183516), (u'older', 0.00020885547201336674), (u'na', 
0.00021189213182167), (u'proud', 0.00021557033752155703), (u'less', 0.00022197558268590456), (u'lately', 0.00022197558268590456), (u'extremely', 0.00022583559168925022), (u'hopefully', 0.0002358490566037736), (u'mad', 0.00023869196801527628), (u'you', 0.0002463661000246366), (u'along', 0.00025157232704402514), (u'far', 0.00025590039182125566), (u'usually', 0.0002658160552897395), (u'small', 0.0002695417789757412), (u'hot', 0.0002696971971726943), (u'possible', 0.000278473962684489), (u'kim', 0.00029239766081871346), (u'out', 0.00030596634370219276), (u'nude', 0.00030795762503079576), (u'fresh', 0.0003103721974849886), (u'awful', 0.00031269543464665416), (u'absolutely', 0.0003298922527165572), (u'especially', 0.00034956718133497076), (u'personal', 0.0003654970760233918), (u'kelly', 0.0003654970760233918), (u'done', 0.00037735849056603777), (u'light', 0.00037735849056603777), (u'female', 0.00037735849056603777), (u'scared', 0.0004192872117400419), (u'secret', 0.000449842555105713), (u'positive', 0.0006097560975609756), (u'also', 0.0006190667334485414), (u'once', 0.0006244240240072233), (u'certainly', 0.0006289308176100629), (u'adrienne', 0.0006548151605917973), (u'willing', 0.000732656922256554), (u'couple', 0.0009433962264150943), (u'changei', 0.0013550135501355014), (u'good', 0.0015092722230527766), (u'great', 0.004056482093964539), (u'best', 0.0062437027559645916)]
Many of the words that correlate with "good" carry no polarity of their own, so we need to focus on the words that are most likely to be polar: adjectives and adverbs.
## Example of part-of-speech (POS) tagging (note that you need to tokenize the sentence first)
pos_tag(tokenizer.tokenize("This was a great day but the time is running out fast"))
[('This', 'DT'), ('was', 'VBD'), ('a', 'DT'), ('great', 'JJ'), ('day', 'NN'), ('but', 'CC'), ('the', 'DT'), ('time', 'NN'), ('is', 'VBZ'), ('running', 'VBG'), ('out', 'RP'), ('fast', 'RB')]
## POS tagging all reviews
## POS tagging is relatively slow, so this will take a while
reviews_pos_tagged=[pos_tag(tokenizer.tokenize(m)) for m in data.data]
## Reconstructing adjective-and-adverb-only reviews
## Penn Treebank tags: JJ/JJR/JJS are adjectives, RB/RBR/RBS are adverbs
reviews_adj_adv_only=[" ".join([w for w,tag in m if tag in ["JJ","RB","RBS","RBR","JJR","JJS"]])
                      for m in reviews_pos_tagged]
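As a quick sanity check of the filtering step, here is a minimal sketch that applies the same idea to the example sentence tagged earlier. The names `ADJ_ADV_TAGS` and `keep_adj_adv` are hypothetical helpers introduced only for this illustration; they are not part of the notebook. The tag list assumes the Penn Treebank tag set, in which comparative and superlative adverbs are tagged `RBR` and `RBS`.

```python
from __future__ import print_function

# Penn Treebank tags for adjectives (JJ, JJR, JJS) and adverbs (RB, RBR, RBS).
# ADJ_ADV_TAGS is a hypothetical name used only in this sketch.
ADJ_ADV_TAGS = {"JJ", "JJR", "JJS", "RB", "RBR", "RBS"}

def keep_adj_adv(tagged_tokens, tags=ADJ_ADV_TAGS):
    """Return the words whose POS tag is in `tags`, joined by spaces."""
    return " ".join(word for word, tag in tagged_tokens if tag in tags)

# pos_tag output for the example sentence, copied by hand so that the
# sketch runs without loading the NLTK tagger.
tagged = [('This', 'DT'), ('was', 'VBD'), ('a', 'DT'), ('great', 'JJ'),
          ('day', 'NN'), ('but', 'CC'), ('the', 'DT'), ('time', 'NN'),
          ('is', 'VBZ'), ('running', 'VBG'), ('out', 'RP'), ('fast', 'RB')]

print(keep_adj_adv(tagged))  # prints: great fast
```

Using a set for the tags makes each membership test constant time, which matters little here but keeps the filter cheap when it runs over every token of every review.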
print(data.data[1])
good films are hard to find these days . great films are beyond rare . proof of life , russell crowe's one-two punch of a deft kidnap and rescue thriller , is one of those rare gems . a taut drama laced with strong and subtle acting , an intelligent script , and masterful directing , together it delivers something virtually unheard of in the film industry these days , genuine motivation in a story that rings true . consider the strange coincidence of russell crowe's character in proof of life making the moves on a distraught wife played by meg ryan's character in the film -- all while the real russell crowe was hitching up with married woman meg ryan in the outside world . i haven't seen this much chemistry between actors since mcqueen and mcgraw teamed up in peckinpah's masterpiece , the getaway . but enough with the gossip , let's get to the review . the film revolves around the kidnapping of peter bowman ( david morse ) , an american engineer working in south america who is kidnapped during a mass ambush of civilians by anti-government soldiers . upon discovering his identity , the rebel soldiers decide to ransom him for $6 million . the only problem is that the company peter bowman works for is being auctioned off , and no one will step forward with the money . with no choice available to her , bowman's wife alice ( ryan ) hires terry thorne ( crowe ) , a highly skilled negotiator and rescue operative , to arrange the return of her husband . but when things go wrong -- as they always do in these situations -- terry and his team ( which includes the most surprising casting choice of the year : david caruso ) take matters into their own hands . the film is notable in that it takes this very simple story line and creates a complex and intelligent character-driven vehicle filled with well-written dialogue , shades of motivation , and convincing acting by all the actors . 
the script is based on both a book ( the long march to freedom ) and a magazine article pertaining to kidnap/ransom situations , and the story has been sharply pieced together by tony gilroy , screenwriter of the devil's advocate and dolores claiborne . the biggest surprise for me was not the chemistry between crowe and ryan , but that between crowe and david caruso . dug out from b-movie hell , caruso pulls off a gutsy performance as crowe's right hand gun while providing most of the film's humor . ryan cries a lot and smokes too many cigarettes , david morse ends up getting everyone at the guerilla camp to hate him , and crowe provides another memorable acting turn as the stoic , gunslinger character of terry thorne . the most memorable pieces of the film lie in its action scenes . the bulk of those scenes , which bookend the movie , work extremely well as establishment and closure devices for all of the story's characters . the scenes are skillfully crafted and executed with amazing accuracy and poise . director taylor hackford mixes both his old-school style of filmmaking with the dizziness of a lars von trier film . proof of life is a thinking man's action movie . it is a film about the choices men and women make in the face of love and war , and the sacrifices one makes for those choices -- the sacrifices that help you sleep at night .
## It kind of works:
reviews_adj_adv_only[1]
"good hard great rare one-two rare taut strong subtle intelligent masterful together virtually unheard genuine true strange distraught meg real married outside n't much enough david american south anti-government only forward available ryan terry highly skilled wrong always most surprising own notable very simple complex intelligent character-driven well-written long sharply together tony biggest not gutsy right most ryan too many david memorable gunslinger terry most memorable extremely well skillfully amazing old-school trier"
## term doc matrix only for adj/adv
X = vec.fit_transform(reviews_adj_adv_only)
terms = vec.get_feature_names()
len(terms)
562
pmi_matrix=getcollocations_matrix(X)
pmi_matrix.shape # n_words by n_words
(562, 562)
getcollocations("good",pmi_matrix,terms)
[(u'good', 0.0012845617524013917), (u'sean', 0.0009252217997465145), (u'nicely', 0.0009139270410318754), (u'fairly', 0.0008755655970071575), (u'robin', 0.0008653937882442727), (u'pretty', 0.0008548338879871134), (u'forward', 0.0008305488343511157), (u'terrific', 0.0008224793031847478), (u'cool', 0.0008204205677528381), (u'sadly', 0.0008203411798967191), (u'horrible', 0.0008162394739972354), (u'stupid', 0.0008141637119023551), (u'technical', 0.0008138216263091188), (u'lovely', 0.000809148389221857), (u'totally', 0.0007957590383758413), (u'sad', 0.0007916292386003339), (u'anti', 0.000788200947102258), (u'therefore', 0.0007862742336760081), (u'climactic', 0.0007856565791326648), (u'naturally', 0.0007855407689057879), (u'thankfully', 0.0007735470703392302), (u'bad', 0.0007711712373639965), (u'total', 0.0007710181664554288), (u'average', 0.0007709092809637673), (u'nice', 0.0007687057994165336), (u'mainly', 0.0007579711225428067), (u'fun', 0.0007575426481942805), (u'dumb', 0.000753415011970145), (u'bigger', 0.0007503673016413496), (u'really', 0.0007459124729101358), (u'twice', 0.0007415796995928053), (u'suspenseful', 0.0007411854520119478), (u'badly', 0.000739332488381918), (u'boring', 0.000739332488381918), (u'extra', 0.000739332488381918), (u'witty', 0.0007356047615497403), (u'guilty', 0.0007334647702201568), (u'gary', 0.0007322912265878045), (u'co', 0.0007263617429717089), (u'violent', 0.000724764360532028), (u'nevertheless', 0.0007242440702516748), (u'natural', 0.0007230834227031945), (u'smart', 0.0007227257388674994), (u'fantastic', 0.0007187573727497682), (u'maybe', 0.0007172577930466747), (u'slightly', 0.0007134523539730903), (u'either', 0.000711347986379672), (u'probably', 0.0007112067778635317), (u'though', 0.0007101187426403834), (u'particular', 0.0007097591888466413), (u'scary', 0.0007082680981137702), (u'usual', 0.0007081268963398241), (u'longer', 0.000707932266867628), (u'looking', 0.0007065542007196655), (u'terribly', 0.0007065542007196655), (u'robert', 
0.0007063265737220109), (u'brilliant', 0.0007041261794113506), (u'intelligent', 0.0007031810436000601), (u'realistic', 0.0007030259822560203), (u'overall', 0.0007022972802440484), (u'somewhere', 0.0007019084591612359), (u'able', 0.0007016336973603369), (u'impressive', 0.000698175817331818), (u'very', 0.0006967103421644604), (u'loud', 0.0006940672339911883), (u'plain', 0.0006934978597221227), (u'right', 0.0006931443949618157), (u'weird', 0.0006928601605407688), (u'national', 0.0006882265560052878), (u'there', 0.0006878385631616218), (u'alien', 0.0006874604710229162), (u'great', 0.0006869373973507421), (u'past', 0.0006869029491235909), (u'actually', 0.000685955181232993), (u'wonderfully', 0.0006859017371207038), (u'general', 0.0006852349892320217), (u'better', 0.0006850957421299627), (u'capable', 0.0006849227381546773), (u'sure', 0.0006843408156923539), (u'disappointing', 0.000683743579481022), (u'dull', 0.000683416585899252), (u'believable', 0.0006827207435572456), (u'huge', 0.0006824607585063859), (u'seemingly', 0.0006814124316884037), (u'necessary', 0.0006796348340405209), (u'biggest', 0.0006793866109455464), (u'relatively', 0.0006785215910691195), (u'before', 0.0006768233275566246), (u'evil', 0.000676808909574656), (u'definitely', 0.0006754837585539397), (u'major', 0.0006745290111920259), (u'sometimes', 0.0006742712294043093), (u'black', 0.0006720462994227253), (u'well', 0.000671504346629487), (u'as', 0.0006689450718072615), (u'basic', 0.0006683427178347082), (u'funny', 0.0006678723347275578), (u'hardly', 0.0006674406137613473), (u'forever', 0.0006673891613551061), (u'also', 0.0006673611178298724), (u'fair', 0.0006665727831760785), (u'special', 0.0006664405529076444), (u'musical', 0.0006658888637082176), (u'offensive', 0.0006655439230052492), (u'anyway', 0.0006651745184226375), (u'brief', 0.0006649400268180232), (u'moral', 0.0006631886108409231), (u'responsible', 0.0006630521522790218), (u'just', 0.0006628096845863869), (u'usually', 0.0006625610216839845), 
(u'again', 0.0006619799217660092), (u'interesting', 0.0006613756613756613), (u'regular', 0.0006609700587377516), (u'occasionally', 0.0006603401817000564), (u'then', 0.0006594307667562458), (u'awful', 0.000658276102612472), (u'especially', 0.0006559491250305739), (u'fake', 0.0006555657532450505), (u'supposedly', 0.0006553789823751801), (u'terrible', 0.0006545398287485793), (u'however', 0.000654004356326862), (u'next', 0.0006525654523148672), (u'extremely', 0.0006524524033416466), (u'basically', 0.0006518196632265073), (u'typical', 0.0006518196632265073), (u'too', 0.0006512609037979682), (u'ahead', 0.0006508409550234646), (u'best', 0.0006491146329308628), (u'danny', 0.0006485894666690468), (u'never', 0.0006477344547582753), (u'tough', 0.0006469159273341783), (u'even', 0.0006464883829070581), (u'around', 0.0006460357696099141), (u'social', 0.0006452356262242194), (u'together', 0.0006444825500965067), (u'unbelievable', 0.0006444544692917445), (u'entirely', 0.0006426391045895142), (u'personal', 0.0006426077868943588), (u'not', 0.0006423935572122158), (u'minor', 0.0006421630756231517), (u'likable', 0.0006417352521217373), (u'mean', 0.000641257770535337), (u'quite', 0.0006410216256057863), (u'predictable', 0.000640754823264329), (u'subtle', 0.0006402131877417048), (u'always', 0.0006402020962292961), (u'generally', 0.0006398661203194408), (u'entire', 0.0006378554801726352), (u'frankly', 0.0006378554801726352), (u'little', 0.0006375234626413791), (u'ever', 0.000637266073888979), (u'short', 0.0006372459670525467), (u'largely', 0.0006370665432769362), (u'common', 0.0006370141529362062), (u'second', 0.0006359849362425101), (u'wrong', 0.000635925476169937), (u'instead', 0.0006355829230084756), (u'strong', 0.00063450471448079), (u'nearly', 0.000634401632655308), (u'back', 0.0006337135614702154), (u'quiet', 0.0006337135614702154), (u'possible', 0.0006312087647845625), (u'poor', 0.0006307932224772652), (u'mental', 0.0006302506458337662), (u'interested', 0.0006301929305731587), 
(u'stunning', 0.0006300076342101557), (u'so', 0.0006286547908608524), (u'dramatic', 0.0006282410782105418), (u'funniest', 0.0006278458433084542), (u'professional', 0.0006275006834165859), (u'frank', 0.0006273124143846578), (u'wild', 0.0006271417171290429), (u'naked', 0.000627112378538234), (u'star', 0.000627112378538234), (u'powerful', 0.000626749676179334), (u'decent', 0.0006267189305489107), (u'john', 0.0006262076478825818), (u'completely', 0.0006255335079053), (u'later', 0.0006254064548591827), (u'worse', 0.0006235667649983488), (u'perfectly', 0.0006235334239365573), (u'intriguing', 0.0006233248145608677), (u'laughable', 0.0006222952991013828), (u'surprising', 0.000622107085985413), (u'anywhere', 0.000621756701819834), (u'tight', 0.000621756701819834), (u'finally', 0.0006216428269660209), (u'big', 0.0006213592862226337), (u'much', 0.0006212148913906888), (u'ago', 0.0006211534728644996), (u'more', 0.0006199798310358765), (u'hard', 0.0006190442660658123), (u'straight', 0.0006188376562713841), (u'present', 0.0006177106937563211), (u'enough', 0.000617575663857709), (u'same', 0.0006162318080503474), (u'small', 0.0006154949120728588), (u'many', 0.0006153885110596677), (u'effectively', 0.0006151839251699169), (u'hot', 0.0006151839251699169), (u'recently', 0.0006147967387397613), (u'almost', 0.0006143694356622113), (u'here', 0.000613960063766426), (u'certain', 0.0006135451231654682), (u'still', 0.0006133362708912624), (u'important', 0.0006130810269107201), (u'remarkable', 0.0006128872941918517), (u'far', 0.0006120224706352686), (u'spectacular', 0.0006114779979098571), (u'visually', 0.0006102426888231705), (u'other', 0.0006101703594775709), (u'seriously', 0.0006099649270166965), (u'quickly', 0.0006096904329961811), (u'obviously', 0.0006096250342798272), (u'earlier', 0.000609151020327959), (u'slowly', 0.0006090691451908182), (u'long', 0.0006088771107782514), (u'fully', 0.0006088620492556972), (u'superior', 0.00060817931540365), (u'running', 0.0006077720706497973), 
(u'talented', 0.0006052776965324494), (u'wonderful', 0.0006052776965324494), (u'comic', 0.0006048073288417495), (u'oh', 0.0006045773057704355), (u'unfunny', 0.0006042385120995078), (u'about', 0.0006041402619349387), (u'top', 0.0006033140179310539), (u'likely', 0.0006028007048131317), (u'only', 0.0006026167735226865), (u'final', 0.0006025837724857136), (u'obvious', 0.0006024997205272381), (u'main', 0.000602477888849712), (u'apparently', 0.0006023816309988012), (u'incredibly', 0.0006020278833967047), (u'unfortunately', 0.0006016407983242533), (u'due', 0.0006013812369054086), (u'immediately', 0.0006013151176322699), (u'early', 0.000600729577650614), (u'often', 0.0006006006006006006), (u'incredible', 0.0006003602161296778), (u'cute', 0.0005998111898689282), (u'most', 0.0005985685875326978), (u'emotional', 0.0005982114011637608), (u'positive', 0.0005979656169770238), (u'now', 0.0005979394088065742), (u'several', 0.0005979190802734094), (u'highly', 0.0005964362931484381), (u'along', 0.0005962790050964475), (u'similar', 0.000595100190341206), (u'whole', 0.0005942959715223074), (u'willing', 0.0005934777797895668), (u'few', 0.0005933943030640883), (u'absolutely', 0.0005928288155689112), (u'sweet', 0.0005920610269134877), (u'away', 0.000591620460799738), (u'french', 0.0005914659907055344), (u'yet', 0.0005911647602544493), (u'worth', 0.0005908950775870928), (u'mary', 0.0005906836282839662), (u'least', 0.0005905845212708019), (u'surprisingly', 0.0005905812248256458), (u'old', 0.000590101593956502), (u'practically', 0.0005898717427521502), (u'happy', 0.000589500987414154), (u'up', 0.0005884162722214598), (u'middle', 0.00058779228889991), (u'effective', 0.0005876058065747514), (u'future', 0.0005875052809463456), (u'international', 0.0005875052809463456), (u'solid', 0.0005870447332999283), (u'exciting', 0.0005864956882626307), (u'tony', 0.0005864215046440799), (u'soon', 0.0005851288550908323), (u'unnecessary', 0.000584966364434045), (u'easy', 0.0005848159101222051), (u'free', 
0.0005844776707294218), (u'third', 0.0005841639414375648), (u'apart', 0.0005819005029852293), (u'first', 0.0005809329714224865), (u'last', 0.000580213367814585), (u'amazing', 0.0005800090223625701), (u'extreme', 0.0005789481919604438), (u'non', 0.000578608034385849), (u'utterly', 0.0005783893616593237), (u'available', 0.0005781246525693194), (u'simple', 0.0005776035065483734), (u'otherwise', 0.0005774847802366472), (u'double', 0.0005765033093930432), (u'fascinating', 0.0005761032377001958), (u'rather', 0.0005757935047767011), (u'shallow', 0.000575036379852603), (u'light', 0.000574915395972979), (u'pure', 0.0005743029150823828), (u'exactly', 0.000574280538191088), (u'else', 0.0005736775398572476), (u'pathetic', 0.0005736775398572476), (u'honest', 0.0005733598889492426), (u'apparent', 0.0005718358063098241), (u'already', 0.0005715847809339198), (u'entertaining', 0.0005714539835012119), (u'enjoyable', 0.0005711245677447621), (u'friendly', 0.0005711245677447621), (u'single', 0.0005710582658446291), (u'deep', 0.000570424399040635), (u'certainly', 0.0005699326028365557), (u'quick', 0.000568979380459817), (u'constantly', 0.0005682595785953575), (u'painful', 0.0005679181643776794), (u'previous', 0.0005678436930736697), (u'real', 0.0005673647128220205), (u'popular', 0.0005661603391815123), (u'such', 0.0005660274727813993), (u'easily', 0.0005659961633545785), (u'normal', 0.0005658848928113239), (u'ready', 0.0005658156798841209), (u'particularly', 0.000565018324454474), (u'less', 0.0005649012303004698), (u'favorite', 0.0005643759453297084), (u'convincing', 0.0005633009435290804), (u'known', 0.0005633009435290804), (u'nasty', 0.0005633009435290804), (u'impossible', 0.0005624207858048163), (u'excellent', 0.0005616441760481125), (u'flat', 0.0005611509399278243), (u'originally', 0.0005593340354760587), (u'truly', 0.0005591590248266607), (u'virtually', 0.000558758193984491), (u'appropriate', 0.0005584449009124504), (u'perhaps', 0.0005572308902582929), (u'new', 
0.0005569594344598478), (u'fast', 0.0005568997964435227), (u'screen', 0.0005568997964435227), (u'thoroughly', 0.000556594979915639), (u'clever', 0.000556329397198275), (u'safe', 0.0005561705518388389), (u'half', 0.0005560325442577374), (u'key', 0.000555889089008961), (u'aside', 0.0005552537871929507), (u'michael', 0.0005551134298149949), (u'once', 0.0005541844887360203), (u'different', 0.0005541309281693047), (u'necessarily', 0.0005540665018318823), (u'classic', 0.0005538756324660938), (u'emotionally', 0.0005537857248883865), (u'critical', 0.0005535888582958204), (u'chris', 0.0005533836733965262), (u'nowhere', 0.0005531382976406692), (u'original', 0.0005529724353966713), (u'dangerous', 0.0005528694445748382), (u'suddenly', 0.0005519938078013069), (u'down', 0.0005516731717589848), (u'lee', 0.0005513666015051592), (u'mysterious', 0.0005513666015051592), (u'serial', 0.0005504986493579649), (u'young', 0.0005498438886870331), (u'secret', 0.0005498397077462163), (u'intense', 0.0005497274268175362), (u'acting', 0.0005484772344888414), (u'military', 0.0005478256428826772), (u'humorous', 0.000547298075815186), (u'slow', 0.0005471823924341218), (u'low', 0.0005470205694386445), (u'somewhat', 0.0005465928646955907), (u'familiar', 0.0005464631435866351), (u'soft', 0.0005463047943708755), (u'animated', 0.0005462312179675932), (u'potential', 0.0005459686068051087), (u'over', 0.0005457929412302035), (u'essentially', 0.0005451299453507229), (u'silly', 0.0005447713072287817), (u'indeed', 0.0005445242454114444), (u'out', 0.0005438499440978276), (u'like', 0.0005435359981420951), (u'literally', 0.0005420443041506245), (u'crazy', 0.0005417629662764979), (u'no', 0.0005416355226241158), (u'serious', 0.0005409268406318974), (u'worst', 0.0005406021390612145), (u'memorable', 0.0005403090682829956), (u'true', 0.0005402317728588797), (u'psychological', 0.0005402148392860853), (u'standard', 0.0005398300708820353), (u'visual', 0.0005398300708820353), (u'close', 0.0005394994952109503), (u'jean', 
0.0005392124163386921), (u'cold', 0.0005386565272496831), (u'bright', 0.0005385404624948351), (u'rarely', 0.0005384494313145622), (u'mad', 0.0005383158210338389), (u'complex', 0.0005376963551868495), (u'merely', 0.0005360160540768906), (u'poorly', 0.0005348362681911748), (u'computer', 0.0005346958174904943), (u'giant', 0.0005346958174904943), (u'clear', 0.0005346585226716695), (u'simply', 0.0005340171912077672), (u'successful', 0.0005337730714892496), (u'dark', 0.0005322365532609326), (u'time', 0.0005320064466663538), (u'rich', 0.0005315765772039537), (u'unique', 0.0005310015775010368), (u'to', 0.0005304770163685514), (u'life', 0.0005304417218232174), (u'day', 0.0005298847858621011), (u'oddly', 0.0005298847858621011), (u'physical', 0.0005292552821069931), (u'fi', 0.0005289821885661743), (u'mostly', 0.0005289533250212096), (u'billy', 0.0005280946345585128), (u'genuinely', 0.0005280946345585128), (u'minute', 0.0005280946345585128), (u'graphic', 0.0005267906971892326), (u'constant', 0.0005267229601830362), (u'dead', 0.0005264528895806107), (u'dimensional', 0.0005256383804442872), (u'sci', 0.0005254319725355288), (u'comedic', 0.0005253864569453923), (u'lucky', 0.0005250769509324643), (u'clearly', 0.0005248940610157341), (u'weak', 0.000524032368138832), (u'overly', 0.0005238698774820448), (u'female', 0.0005235025073014823), (u'surely', 0.0005227241806477484), (u'private', 0.0005221904709423308), (u'open', 0.0005221162047333222), (u'difficult', 0.0005217574989438107), (u'older', 0.0005216281696455515), (u'traditional', 0.0005214934516265315), (u'eventually', 0.0005210533727643994), (u'all', 0.0005208105706335679), (u'late', 0.0005208105706335679), (u'united', 0.000520725872215836), (u'attractive', 0.000520366420394242), (u'married', 0.0005194373454673897), (u'large', 0.000519041583680367), (u'aware', 0.0005188298164083636), (u'various', 0.0005184929139301763), (u'strange', 0.0005179504438381798), (u'own', 0.0005168251212652047), (u'rare', 0.0005158650746003157), 
(u'heavily', 0.0005156688784512538), (u'hilarious', 0.0005154203633291086), (u'barely', 0.0005146522256788417), (u'genuine', 0.0005143182527874212), (u'greatest', 0.0005139070175106722), (u'lead', 0.0005117311388397984), (u'ultimate', 0.0005114179618882441), (u'david', 0.0005107542137222632), (u'complete', 0.0005099706766860905), (u'grand', 0.000509006876682904), (u'narrative', 0.0005084029702190428), (u'perfect', 0.0005083120418988606), (u'near', 0.0005081954164447138), (u'thus', 0.0005069708491761723), (u'possibly', 0.0005056402170261037), (u'successfully', 0.0005055856829215927), (u'high', 0.0005051027593124279), (u'nonetheless', 0.0005037210360404276), (u'english', 0.0005033753112387527), (u'further', 0.0005031733147254146), (u'beautiful', 0.0005017852267718433), (u'initially', 0.0005012423650046902), (u'fresh', 0.0005008732616431257), (u'recent', 0.0005005636346526189), (u'eccentric', 0.0005004712229046829), (u'alive', 0.0004996234455649235), (u'tim', 0.0004987560437497066), (u'wide', 0.0004972891142092662), (u'human', 0.000496486050592237), (u'innocent', 0.0004943864663952036), (u'thin', 0.0004928883255879454), (u'public', 0.0004908173662367355), (u'steven', 0.0004887464068855257), (u'political', 0.0004883160776696898), (u'painfully', 0.00048584706379383184), (u'sole', 0.0004850647013722637), (u'sympathetic', 0.0004850647013722637), (u'equally', 0.00048392671966816455), (u'unusual', 0.00048392671966816455), (u'sexual', 0.0004825117292597781), (u'modern', 0.0004824400016353898), (u'ex', 0.00048142580638822573), (u'ultimately', 0.00048130143909130293), (u'somehow', 0.00048103669682557603), (u'sexy', 0.0004809723440902148), (u'off', 0.00047995539576202255), (u'full', 0.00047982005577615577), (u'outstanding', 0.00047956701949097386), (u'lame', 0.00047839161012947634), (u'william', 0.0004781867899738622), (u'hearted', 0.00047724107715658195), (u'empty', 0.00047698870218188265), (u'numerous', 0.0004768537690270928), (u'unable', 0.00047624534316549524), (u'chinese', 
0.0004752851711026616), (u'subject', 0.00047456175379504724), (u'accidentally', 0.00047435868928764663), (u'cheap', 0.0004739971354086165), (u'foreign', 0.0004734641551214253), (u'blue', 0.00047337639531510066), (u'worthy', 0.0004729205682613548), (u'self', 0.00047247283281211326), (u'fine', 0.0004720845975598827), (u'alone', 0.0004717329650406088), (u'occasional', 0.00047017457786499854), (u'famous', 0.00046941745294090036), (u'former', 0.0004683969802171158), (u'british', 0.00046774096203754004), (u'of', 0.0004671606382632999), (u'frequently', 0.0004659658540222173), (u'sharp', 0.0004641282422035381), (u'green', 0.0004633604535481145), (u'sudden', 0.00046088259016015666), (u'year', 0.00046039019423049836), (u'deadly', 0.0004588960272715354), (u'the', 0.0004575102785248385), (u'be', 0.00045646800596322034), (u'desperately', 0.00045600552571401745), (u'actual', 0.0004552026990842392), (u'initial', 0.0004543606667144941), (u'local', 0.0004540037455309006), (u'extraordinary', 0.00045265254390729675), (u'ugly', 0.00045265254390729675), (u'heavy', 0.0004506407548232643), (u'ridiculous', 0.0004496348602812481), (u'limited', 0.0004480802959890412), (u'cinematic', 0.00044788778028721993), (u'one', 0.00044732721986132855), (u'unfortunate', 0.00044594658029385536), (u'younger', 0.0004456886586164153), (u'current', 0.00044442249765443685), (u'american', 0.0004437132656613717), (u'ill', 0.00044259359848713466), (u'inevitable', 0.0004405818094031022), (u'tiny', 0.0004393747359526827), (u'greater', 0.0004358876348736932), (u'bottom', 0.00043547496018978905), (u'directly', 0.000433893970015643), (u'fellow', 0.0004321877928800703), (u'latter', 0.00042951696944092376), (u'white', 0.00042928983196369435), (u'latest', 0.0004270929284954093), (u'meanwhile', 0.00042687649626813127), (u'tom', 0.0004224757076468103), (u'unlikely', 0.0004224757076468103), (u'red', 0.0004156615833299263), (u'romantic', 0.00041560618394523616), (u'odd', 0.00041385375442952845), (u'unexpected', 
0.0004098644924931742), (u'teen', 0.00040827484352422847), (u'and', 0.0004045741946109285), (u'desperate', 0.00039819549456366027), (u'creative', 0.0003972532773395381), (u'two', 0.00039229887138632385), (u'ten', 0.0003886776510350655), (u'in', 0.0003876323503151146), (u'central', 0.0003875603599074045), (u'on', 0.0003857386895905659), (u'live', 0.0003802281368821293), (u'previously', 0.00037898556127140335), (u'bizarre', 0.0003456619426201175), (u'angry', 0.0003360602219917809)]
We can make this better by combining multiple seed terms.
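def seed_score(pos_seed,PMI_MATRIX=pmi_matrix,TERMS=terms):
    ## sum each word's collocation score over all the seed terms
    score=defaultdict(int)
    for seed in pos_seed:
        ## word -> association score with this seed
        c=dict(getcollocations(seed,PMI_MATRIX,TERMS))
        for w in c:
            score[w]+=c[w]
    return score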
sorted(seed_score(['good','great','perfect','cool']).items(),key=itemgetter(1),reverse=True)
[(u'cool', 0.012001912748204434), (u'perfect', 0.006782938654467102), (u'great', 0.004234935151833858), (u'anti', 0.004160925070909675), (u'fake', 0.003978386428679741), (u'looking', 0.003957222634925364), (u'frank', 0.003953470579252501), (u'lovely', 0.0038977169233890795), (u'eccentric', 0.0038458229553531894), (u'greatest', 0.0037893056708582906), (u'totally', 0.0036293608998168546), (u'amazing', 0.003617561923228757), (u'stupid', 0.0035962513836334904), (u'generally', 0.003553253311814994), (u'climactic', 0.003537863066483464), (u'fun', 0.0035376706229829896), (u'twice', 0.0034429868622216564), (u'known', 0.0034156002474412875), (u'plain', 0.0033558593778353143), (u'good', 0.003300231759403832), (u'nicely', 0.0032826646303092937), (u'alien', 0.0032506377240264714), (u'overall', 0.003246573557239219), (u'convincing', 0.0032306160532405035), (u'necessary', 0.0032268324022759576), (u'earlier', 0.00320436224269597), (u'pretty', 0.0032003653412340915), (u'sad', 0.003187690012362329), (u'painful', 0.0031550334960725986), (u'quiet', 0.0031356051653670504), (u'terribly', 0.0031347252607038514), (u'pure', 0.003130142164470019), (u'past', 0.0031298060186221582), (u'intriguing', 0.003115520114794243), (u'apart', 0.0030991381997725305), (u'tony', 0.003069088503440111), (u'mary', 0.0030514393148461045), (u'actually', 0.0030506607023727842), (u'black', 0.003027284836003895), (u'best', 0.00302388631921692), (u'perfectly', 0.0030237182959442044), (u'horrible', 0.0030226425573185774), (u'friendly', 0.003018374685245776), (u'maybe', 0.003014183313704173), (u'nonetheless', 0.0030069332166719628), (u'non', 0.003003272731140307), (u'definitely', 0.0030031570539691705), (u'necessarily', 0.002993810771295877), (u'musical', 0.002991784203916739), (u'shallow', 0.002981744162904472), (u'extra', 0.002979129889469642), (u'classic', 0.0029709775923671394), (u'sean', 0.0029546706033888666), (u'basically', 0.0029421152254114707), (u'visually', 0.0029358658029767504), (u'bigger', 
0.002923170822543089), (u'really', 0.0029218575415147856), (u'light', 0.002902417490684917), (u'straight', 0.00290152535467921), (u'forward', 0.0029006646141443147), (u'brilliant', 0.0028974531469318663), (u'somewhere', 0.002888470361242569), (u'probably', 0.0028667284143727334), (u'technical', 0.0028610204137455974), (u'fully', 0.00286017815256421), (u'forever', 0.002852944783095861), (u'green', 0.0028405257859683893), (u'sympathetic', 0.002839253674438178), (u'excellent', 0.00283813047530481), (u'nice', 0.002833410593649864), (u'day', 0.0028257993835809634), (u'slightly', 0.0028222905234820566), (u'mainly', 0.0028146496498095697), (u'literally', 0.002808983880889349), (u'sadly', 0.002807958075124012), (u'sure', 0.0028000700575213024), (u'huge', 0.0027946757739472075), (u'blue', 0.002783309447572288), (u'professional', 0.002773927402898334), (u'scary', 0.002770530957340576), (u'regular', 0.0027687862144207126), (u'interesting', 0.002767678804545164), (u'all', 0.002767455753874969), (u'moral', 0.0027672538806976406), (u'present', 0.002766172519637502), (u'john', 0.002763285894343504), (u'utterly', 0.0027598334785873114), (u'witty', 0.002759802990681556), (u'stunning', 0.002754567313948104), (u'very', 0.002754475732165381), (u'wonderful', 0.0027528746021938714), (u'nasty', 0.002750778743910107), (u'entire', 0.0027486000007843282), (u'nevertheless', 0.0027422265029806514), (u'quick', 0.0027375966375266063), (u'second', 0.0027345827164219757), (u'especially', 0.002733256847499587), (u'cold', 0.0027272967804060654), (u'same', 0.002726697589261401), (u'memorable', 0.0027248127669407428), (u'steven', 0.0027193586944238095), (u'french', 0.002716473947971556), (u'exactly', 0.002715773745840665), (u'realistic', 0.0027153709885526394), (u'anyway', 0.002711612563487999), (u'mad', 0.0027046117438334037), (u'entirely', 0.0027039462090588176), (u'still', 0.0027030152438311653), (u'third', 0.002694415514054729), (u'smart', 0.002690365929270928), (u'extreme', 
0.0026859194745589665), (u'though', 0.002681282965005704), (u'soft', 0.0026798542026700055), (u'also', 0.002679265025818489), (u'famous', 0.002674277568707314), (u'badly', 0.0026718373920243425), (u'constantly', 0.0026691142464650183), (u'yet', 0.0026600761270785295), (u'funny', 0.0026531437436938293), (u'just', 0.002647198765749662), (u'always', 0.002645782644141673), (u'wrong', 0.00264456909940569), (u'inevitable', 0.0026424017992415253), (u'boring', 0.002638324709954785), (u'again', 0.0026361664213644374), (u'final', 0.0026325569909139406), (u'never', 0.002625848256123294), (u'not', 0.0026194513346862627), (u'future', 0.002616954084588698), (u'like', 0.0026162045111048924), (u'usual', 0.002613310253136084), (u'suspenseful', 0.002611112619052867), (u'wonderfully', 0.0026104285952339), (u'chinese', 0.002610227628625764), (u'then', 0.0026099428487068874), (u'right', 0.0026074503637217653), (u'before', 0.002604359817305096), (u'incredibly', 0.002602876658439028), (u'willing', 0.0026028274849918065), (u'robert', 0.00260242630248303), (u'mental', 0.002600612637348587), (u'likable', 0.002596743985407318), (u'completely', 0.0025966342812692813), (u'out', 0.002595421591107008), (u'weird', 0.002594854320599445), (u'bad', 0.002593939708098807), (u'about', 0.0025924880567343464), (u'over', 0.002583915258341187), (u'poor', 0.002580714006945396), (u'top', 0.0025748163584370844), (u'danny', 0.002571782212215268), (u'else', 0.0025710205049421482), (u'deadly', 0.002569539962164636), (u'next', 0.0025692064920089814), (u'easily', 0.0025685985055154945), (u'general', 0.0025597407189698524), (u'there', 0.002558311952404383), (u'later', 0.002553925871598969), (u'dumb', 0.0025518493910496854), (u'long', 0.002551438021081097), (u'nearly', 0.0025482925872103374), (u'evil', 0.002547619195532207), (u'obviously', 0.0025417900394499255), (u'surely', 0.002540410069301379), (u'ever', 0.0025299666270243693), (u'as', 0.0025295649156630716), (u'whole', 0.00252928721041847), (u'attractive', 
0.00252799391976053), (u'single', 0.002525086026324002), (u'little', 0.0025237664375132246), (u'minor', 0.002523388079428801), (u'therefore', 0.002521419517208388), (u'already', 0.0025180371571217573), (u'terrific', 0.002516692390671686), (u'and', 0.0025147449761868494), (u'believable', 0.0025147020336030515), (u'intelligent', 0.0025117257402157674), (u'certain', 0.0025098535845574114), (u'tom', 0.0025081816901269868), (u'first', 0.002506983660237951), (u'superior', 0.0025068521818184517), (u'similar', 0.0025067487200958753), (u'average', 0.002506085475734242), (u'only', 0.0025057815352721303), (u'strong', 0.002502877514350509), (u'together', 0.002502209075754721), (u'well', 0.0025007427911392047), (u'able', 0.002500684083336215), (u'lucky', 0.0024996255614257753), (u'important', 0.0024967694522345547), (u'comic', 0.002494477741432366), (u'close', 0.0024890312948109006), (u'major', 0.002488996763330637), (u'hardly', 0.0024843138320983773), (u'too', 0.0024787095371459206), (u'desperately', 0.0024772938757274057), (u'original', 0.002476324684059444), (u'late', 0.0024747309617063505), (u'powerful', 0.0024742040581959284), (u'seriously', 0.002474163813548089), (u'fascinating', 0.0024735683609717705), (u'creative', 0.0024716985872205726), (u'older', 0.002470205167597138), (u'sometimes', 0.002468843682041278), (u'disappointing', 0.0024683926476614993), (u'almost', 0.0024642063085405586), (u'different', 0.002463192492144597), (u'here', 0.002461013442194645), (u'slowly', 0.00245995228639654), (u'traditional', 0.0024577471015898586), (u'other', 0.002457246892174661), (u'favorite', 0.0024567949763845435), (u'away', 0.0024530830995225053), (u'seemingly', 0.002452568157058489), (u'beautiful', 0.0024503474408517005), (u'total', 0.0024495169417283976), (u'hearted', 0.002449259178206941), (u'sexy', 0.0024479446336431497), (u'so', 0.0024453791255933297), (u'back', 0.0024437749617670125), (u'extremely', 0.002442881899286231), (u'last', 0.0024427706214193625), (u'once', 
0.0024427435350683263), (u'violent', 0.0024413148535007635), (u'computer', 0.0024410664738611504), (u'most', 0.002440544878761546), (u'greater', 0.002439910529833608), (u'simple', 0.002438425301845975), (u'incredible', 0.002436860796263535), (u'hot', 0.0024361171552712284), (u'wild', 0.0024361150529840993), (u'emotional', 0.002431028224492918), (u'barely', 0.0024295038179497886), (u'much', 0.002429219418852587), (u'normal', 0.0024291864897619296), (u'robin', 0.002429169545568417), (u'originally', 0.002428926091347079), (u'occasionally', 0.0024264869193837964), (u'instead', 0.002426421516587577), (u'biggest', 0.002425814817705052), (u'special', 0.002423753782279029), (u'key', 0.0024198049087821665), (u'emotionally', 0.0024168893316711564), (u'awful', 0.0024166736599597668), (u'merely', 0.0024160066437449), (u'tough', 0.002415850760785141), (u'longer', 0.0024137232490317575), (u'silly', 0.002412926146521142), (u'clear', 0.0024129094987903332), (u'thankfully', 0.002407958791075349), (u'subtle', 0.002407822697286698), (u'solid', 0.0024069495164455216), (u'animated', 0.002403947054178053), (u'far', 0.002402836369689498), (u'effective', 0.0024020099789288174), (u'psychological', 0.0023996855512158134), (u'absolutely', 0.002397100755116605), (u'effectively', 0.002396397590933667), (u'desperate', 0.0023941211834815623), (u'oh', 0.0023922063459022565), (u'intense', 0.002386542866431266), (u'half', 0.002379333472657521), (u'short', 0.0023792589292618623), (u'enough', 0.002377507831705425), (u'eventually', 0.002377299019332627), (u'no', 0.0023770760677420803), (u'least', 0.002368181962538259), (u'lame', 0.002365220157299579), (u'real', 0.0023617633351179614), (u'remarkable', 0.0023606985831208143), (u'more', 0.0023605999078189417), (u'responsible', 0.002359948741648729), (u'soon', 0.002359938226674398), (u'outstanding', 0.002359314496448564), (u'high', 0.0023563850582576634), (u'less', 0.002354248180937743), (u'aware', 0.00235403878234738), (u'off', 0.0023540198109719282), 
(u'unfortunately', 0.0023538478030223913), (u'entertaining', 0.0023512886326058097), (u'fairly', 0.0023501918963178886), (u'ago', 0.002346131658173984), (u'old', 0.0023456785476582464), (u'big', 0.002343169816371394), (u'usually', 0.0023415700617843986), (u'apparent', 0.0023411235971029913), (u'even', 0.002339017723228175), (u'british', 0.0023343079898948908), (u'private', 0.0023336707837249876), (u'truly', 0.0023333647838521543), (u'brief', 0.002328450862839378), (u'political', 0.002328219300061392), (u'new', 0.0023280603289829024), (u'personal', 0.002327406823136519), (u'pathetic', 0.0023266504982631434), (u'many', 0.0023251322427836583), (u'dull', 0.002323983691361993), (u'successful', 0.00232378963778053), (u'natural', 0.00232347857148544), (u'several', 0.0023221381576374552), (u'such', 0.0023216832271255963), (u'secret', 0.002320036307164827), (u'certainly', 0.002313826438589921), (u'national', 0.002310571362533237), (u'now', 0.0023071026749417445), (u'gary', 0.0023059201550554424), (u'ahead', 0.002305765277776907), (u'possible', 0.002305683370535466), (u'co', 0.002305675716173628), (u'up', 0.002304147677700143), (u'michael', 0.002304054526092925), (u'main', 0.0023034755151444515), (u'previously', 0.002302853044911183), (u'hilarious', 0.0022996054700948603), (u'surprisingly', 0.0022975986386493423), (u'indeed', 0.002294415457668248), (u'quickly', 0.0022894075586336105), (u'honest', 0.0022858426152568534), (u'obvious', 0.0022824351550353363), (u'lead', 0.0022801250926040386), (u'empty', 0.0022787458629400835), (u'bright', 0.0022781744601933487), (u'otherwise', 0.0022759441718963173), (u'typical', 0.002272176030126653), (u'directly', 0.0022716390510039673), (u'acting', 0.002270839907615521), (u'visual', 0.0022693937359295714), (u'red', 0.002267580396747179), (u'potential', 0.002267569683244895), (u'suddenly', 0.002267308638256236), (u'alive', 0.0022647225458673897), (u'particular', 0.002261098546637612), (u'cute', 0.0022601846543045196), (u'difficult', 
0.0022583789129845796), (u'serial', 0.0022574422917854783), (u'serious', 0.0022555576176684624), (u'impossible', 0.002252838161706063), (u'human', 0.002251761457857284), (u'capable', 0.002251214627027211), (u'however', 0.002250878310420556), (u'small', 0.0022489610616680407), (u'basic', 0.002247545248914553), (u'rare', 0.002245093696867987), (u'initial', 0.002236370482808303), (u'somewhat', 0.0022362775505530485), (u'occasional', 0.0022325583381675933), (u'perhaps', 0.002229127640019336), (u'better', 0.00222872431278556), (u'immediately', 0.00222837544249588), (u'happy', 0.0022281474572684256), (u'sci', 0.002226199463490062), (u'unexpected', 0.0022242244195155953), (u'initially', 0.002223905432665024), (u'fi', 0.0022224232964827497), (u'deep', 0.002220172553119152), (u'english', 0.0022199070209070445), (u'relatively', 0.0022194005513029185), (u'frankly', 0.0022170437766284835), (u'tim', 0.0022164491592922848), (u'either', 0.002215226909613434), (u'decent', 0.0022151294486753015), (u'hard', 0.0022118044609765446), (u'fantastic', 0.0022116860546185424), (u'true', 0.0022096622931451456), (u'around', 0.0022052101513649414), (u'common', 0.0021964869321670502), (u'guilty', 0.002191528056240432), (u'impressive', 0.0021908868057265865), (u'overly', 0.002187284444122418), (u'the', 0.002186014099868943), (u'tight', 0.0021806236289156106), (u'william', 0.0021792244357119687), (u'few', 0.0021725275981657947), (u'worse', 0.002172125969565442), (u'sharp', 0.002167733580520371), (u'american', 0.0021671834202428948), (u'quite', 0.0021662678232227126), (u'grand', 0.0021638514386397864), (u'ultimate', 0.0021635430435549214), (u'naked', 0.002163253934841081), (u'fair', 0.0021623850316764993), (u'clever', 0.002161758342135364), (u'numerous', 0.0021531205363190913), (u'fast', 0.0021521990875953485), (u'spectacular', 0.0021506353944526525), (u'popular', 0.002149773941789711), (u'international', 0.002145442826787117), (u'dead', 0.002144923871247314), (u'thin', 0.0021420490070066913), 
(u'rich', 0.002140442241369157), (u'genuinely', 0.0021385328377386517), (u'finally', 0.002135069354217433), (u'strange', 0.00213249946133963), (u'david', 0.002131674948749883), (u'two', 0.0021311542507946985), (u'actual', 0.002130019594255519), (u'critical', 0.0021285787536416668), (u'early', 0.0021275775518247056), (u'ready', 0.002126255316360624), (u'complete', 0.002124450206798375), (u'rather', 0.0021216369548865406), (u'full', 0.002120302975858754), (u'often', 0.002120249530885974), (u'own', 0.0021176220394398546), (u'talented', 0.002117397757039607), (u'star', 0.002110450555424798), (u'sexual', 0.0021100038830677214), (u'slow', 0.002106556435290481), (u'ultimately', 0.002102429070835513), (u'standard', 0.0021011071697935426), (u'recent', 0.0020989821112242834), (u'successfully', 0.00209794634455777), (u'easy', 0.0020913916102462925), (u'cinematic', 0.0020863510243774347), (u'practically', 0.0020863338600298487), (u'innocent', 0.0020829247055854675), (u'apparently', 0.0020818272530590243), (u'white', 0.0020805888193709023), (u'teen', 0.00208043455309245), (u'unique', 0.0020798213565664508), (u'along', 0.0020766370391498675), (u'unable', 0.0020763540400483855), (u'modern', 0.0020714825896373953), (u'latter', 0.0020675605429683686), (u'unusual', 0.002062296039972969), (u'latest', 0.002061272246439931), (u'previous', 0.002057485557325315), (u'social', 0.0020569363927636954), (u'simply', 0.002056561312485684), (u'due', 0.0020562829551382658), (u'equally', 0.0020540934511252187), (u'funniest', 0.0020485830167337998), (u'unfortunate', 0.0020484195089316586), (u'physical', 0.0020473480444393867), (u'accidentally', 0.0020469232499676572), (u'safe', 0.0020448668547450237), (u'dangerous', 0.0020417794161610345), (u'possibly', 0.0020386220903686582), (u'various', 0.0020339050284206316), (u'ridiculous', 0.0020325475952677158), (u'offensive', 0.0020292458611658707), (u'terrible', 0.002026561555154676), (u'virtually', 0.0020262283399145745), (u'weak', 0.0020211161232871035), 
(u'highly', 0.0020189553152879535), (u'supposedly', 0.0020187061521539625), (u'surprising', 0.0020172937346282825), (u'clearly', 0.002006958446305926), (u'rarely', 0.0020063581098494266), (u'tiny', 0.0020019326856074065), (u'female', 0.002000906579375961), (u'worst', 0.001997082267018422), (u'interested', 0.0019965592969395252), (u'married', 0.0019958703396158782), (u'predictable', 0.001995796500886652), (u'mean', 0.0019952780185648017), (u'giant', 0.0019938810972402903), (u'former', 0.0019935747326038445), (u'wide', 0.001992678512824471), (u'thoroughly', 0.0019876947213072387), (u'open', 0.001985391496371365), (u'of', 0.0019840710261414297), (u'aside', 0.001980879544615943), (u'particularly', 0.0019799884254985325), (u'essentially', 0.001979535958320732), (u'young', 0.001979527444016401), (u'likely', 0.001978555463730754), (u'unbelievable', 0.0019781321288305305), (u'middle', 0.0019779221502780183), (u'graphic', 0.0019757563038609325), (u'fine', 0.0019730514211073512), (u'meanwhile', 0.0019706149351678606), (u'worthy', 0.0019701465238407046), (u'to', 0.001967727301545104), (u'on', 0.001966072780443591), (u'cheap', 0.0019635613315031036), (u'be', 0.0019614212350543806), (u'mysterious', 0.001955105730777367), (u'constant', 0.0019517160902452907), (u'dramatic', 0.0019502335417707808), (u'nowhere', 0.0019443437460459846), (u'enjoyable', 0.001944335976798465), (u'complex', 0.0019425574374672867), (u'mostly', 0.0019343488295512537), (u'ugly', 0.0019244600986641602), (u'available', 0.0019198981505397614), (u'exciting', 0.0019180413841347022), (u'ex', 0.0019178209412079668), (u'dark', 0.0019172892665868906), (u'comedic', 0.0019083717108983882), (u'thus', 0.0019002996618598237), (u'fresh', 0.0018948749882022184), (u'life', 0.0018937712571957634), (u'heavily', 0.0018928253789474128), (u'screen', 0.0018902765330460658), (u'down', 0.001889898895897701), (u'subject', 0.0018831542361833758), (u'ill', 0.0018819965739884203), (u'double', 0.0018796786045038223), (u'loud', 
0.0018791703282450395), (u'running', 0.0018682127198850135), (u'poorly', 0.0018675936750734959), (u'younger', 0.0018669711858897257), (u'live', 0.0018649847829290982), (u'unnecessary', 0.001860470865744998), (u'lee', 0.001845920547781358), (u'bizarre', 0.0018344290227850471), (u'sweet', 0.0018344275577852434), (u'military', 0.0018252766899189777), (u'time', 0.0018157018163832289), (u'unfunny', 0.0018107499211615543), (u'central', 0.0018098151545962167), (u'familiar', 0.0018088856542559902), (u'extraordinary', 0.001807414152125126), (u'dimensional', 0.001807168544195006), (u'billy', 0.0018071267256039524), (u'angry', 0.0018045689485650237), (u'laughable', 0.0018010179250777143), (u'large', 0.0017995900716021266), (u'further', 0.0017991308991933287), (u'romantic', 0.0017907365126554233), (u'sudden', 0.0017901045645434214), (u'flat', 0.0017847476326207986), (u'chris', 0.0017769997084619025), (u'public', 0.0017746031851269188), (u'somehow', 0.0017618448125107278), (u'worth', 0.0017543764067179425), (u'largely', 0.0017465367771711331), (u'year', 0.0017394608281398418), (u'foreign', 0.0017392301594858725), (u'sole', 0.0017355727619161666), (u'positive', 0.0017317615894826737), (u'genuine', 0.001718798629778317), (u'recently', 0.0017025823715422455), (u'in', 0.0016990012040850324), (u'ten', 0.0016791478447310657), (u'painfully', 0.001678652772728573), (u'low', 0.0016759724349185725), (u'anywhere', 0.0016733952458503438), (u'limited', 0.001668362161432197), (u'free', 0.0016667490901339968), (u'frequently', 0.0016625677147607005), (u'near', 0.0016612218149129076), (u'naturally', 0.0016607535392798487), (u'humorous', 0.0016352419056944998), (u'odd', 0.0016225109209537126), (u'self', 0.0016005335142226706), (u'fellow', 0.001598076389246976), (u'united', 0.001593774057041965), (u'appropriate', 0.0015883322489310553), (u'one', 0.0015787649736899742), (u'local', 0.0015762071787598516), (u'narrative', 0.0015677455276479862), (u'crazy', 0.0015569066000159573), (u'bottom', 
0.001546149619251545), (u'alone', 0.0015445381171166998), (u'minute', 0.001523900044625383), (u'unlikely', 0.0015000995356950866), (u'jean', 0.0014303533753913659), (u'oddly', 0.0014165951312930588), (u'current', 0.0013195330501843442), (u'heavy', 0.001248448290982055)]
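To make the accumulation step easier to follow, here is a minimal sketch on toy data. It assumes a hypothetical `toy_collocations` dictionary standing in for what `getcollocations` returns for each seed; the words and scores are made up for illustration and are not taken from the review corpus.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical association scores standing in for the notebook's
# getcollocations(seed, PMI_MATRIX, TERMS); the values are invented.
toy_collocations = {
    'good':  [('nice', 0.25), ('fun', 0.5), ('dull', 0.125)],
    'great': [('nice', 0.5), ('fun', 0.125)],
}

def toy_seed_score(seeds, collocations=toy_collocations):
    """Sum each word's association score over every seed term."""
    score = defaultdict(float)
    for seed in seeds:
        for word, value in collocations[seed]:
            score[word] += value
    return score

combined = toy_seed_score(['good', 'great'])
# Highest combined score first, as in the sorted(...) call above.
ranked = sorted(combined.items(), key=itemgetter(1), reverse=True)
print(ranked)
```

A word associated with several seeds accumulates a higher total than one associated with a single seed, which is why `dull` ends up last here. In the real output above, three of the four seed words (`cool`, `perfect`, `great`) take the top three places, so it may be worth filtering the seeds out before reading the ranking.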
posscores=seed_score(['good','great','perfect','cool'])
negscores=seed_score(['bad','terrible','wrong',"crap","long","boring"])
## sentiment polarity score of a word: its combined score against the positive seeds
## minus its combined score against the negative seeds
sentscores={}
for w in terms:
    sentscores[w] = posscores[w] - negscores[w]
sorted(sentscores.items(),key=itemgetter(1),reverse=False)
[(u'terrible', -0.010972487858524456), (u'boring', -0.009152588531402), (u'wrong', -0.0037842569272043196), (u'unfunny', -0.0028839715464925985), (u'bad', -0.002745669347410218), (u'frankly', -0.002735683658733542), (u'worst', -0.002650800210468679), (u'terribly', -0.002497993000217121), (u'anywhere', -0.002479642275811881), (u'laughable', -0.0024600189948948362), (u'horrible', -0.0023085769877623907), (u'awful', -0.0022332893067823654), (u'exciting', -0.0021194079061992045), (u'dull', -0.0019475225393855247), (u'running', -0.001919677366722775), (u'ugly', -0.0019027857871608356), (u'total', -0.0018358263440521236), (u'oddly', -0.001825867801362017), (u'painfully', -0.0017780445048585325), (u'ridiculous', -0.0017569353131335745), (u'poorly', -0.0017508500966694365), (u'bottom', -0.0016995579532760772), (u'current', -0.0016987113085641865), (u'successfully', -0.0016642378925818217), (u'pathetic', -0.0016356962074799996), (u'long', -0.0016266635116819225), (u'loud', -0.001602897330102698), (u'supposedly', -0.001601963787952321), (u'ten', -0.0015963920401226126), (u'longer', -0.0015882846862663351), (u'fair', -0.0015781345487195808), (u'complete', -0.0015756336615723376), (u'responsible', -0.0015332676187930182), (u'sadly', -0.001527851579187234), (u'foreign', -0.0015251667153189461), (u'chinese', -0.0015061170510252096), (u'positive', -0.0014976172523279425), (u'minute', -0.0014838443386630507), (u'worth', -0.0014745224449065645), (u'low', -0.001469731011335925), (u'sole', -0.0014495317049276017), (u'worse', -0.0013882709260687686), (u'stupid', -0.0013202645150308997), (u'unbelievable', -0.001319704417793653), (u'unnecessary', -0.0013068250007934096), (u'giant', -0.0012862481305175371), (u'guilty', -0.0012696870826236872), (u'huge', -0.0011716363434408194), (u'particular', -0.0011649531345376174), (u'nowhere', -0.0011620638741387348), (u'predictable', -0.0011613024162225889), (u'one', -0.0011593874971369273), (u'frequently', -0.0011554285647090838), (u'weak', 
-0.0011531149853807226), (u'down', -0.0011419182039187215), (u'offensive', -0.0011366657479160089), (u'graphic', -0.0011363816849142734), (u'seriously', -0.0011344455374323872), (u'desperately', -0.0011161513132458799), (u'oh', -0.001111547662698142), (u'double', -0.0011020533566854943), (u'international', -0.0010928198567703579), (u'thankfully', -0.0010902301162478444), (u'completely', -0.0010760318931627264), (u'poor', -0.00103866702754081), (u'silly', -0.001038540312821459), (u'absolutely', -0.001027435748663518), (u'jean', -0.0010140017472082751), (u'to', -0.0010115858226274967), (u'gary', -0.0010070218297941547), (u'possible', -0.0010025590916556012), (u'standard', -0.0009912099239919566), (u'of', -0.0009878949504556712), (u'dumb', -0.0009784543398925452), (u'disappointing', -0.0009756900327863331), (u'heavy', -0.0009713451522002838), (u'flat', -0.0009663081712942411), (u'middle', -0.0009621073865780962), (u'somehow', -0.0009587009918721025), (u'ex', -0.0009476020360521058), (u'no', -0.0009470747212862265), (u'due', -0.0009423906128011132), (u'sudden', -0.0009346695321723595), (u'hardly', -0.0009309010280151705), (u'narrative', -0.000928523352480348), (u'accidentally', -0.0009249678255190534), (u'female', -0.0009243270408085982), (u'public', -0.0009236482751538752), (u'entertaining', -0.0009229050621359738), (u'equally', -0.0009172811304320738), (u'modern', -0.0009162848762493533), (u'physical', -0.0008950524891427249), (u'apart', -0.0008926619097619935), (u'safe', -0.000887888000516411), (u'naked', -0.0008820158184256301), (u'cheap', -0.0008739972045461218), (u'basic', -0.0008681093271798832), (u'possibly', -0.0008674045241233957), (u'subject', -0.0008619626400079704), (u'plain', -0.0008577237983704335), (u'sweet', -0.00084739561689972), (u'robin', -0.0008467671520451816), (u'twice', -0.0008415855534974386), (u'anyway', -0.0008388387974757141), (u'angry', -0.0008340617818132718), (u'overly', -0.0008330763662077099), (u'largely', -0.0008316600758162794), 
(u'aside', -0.0008156820574845604), (u'slow', -0.0008140632049922348), (u'talented', -0.000813361131030282), (u'essentially', -0.0008084723479426809), (u'up', -0.0008050649163935399), (u'ultimate', -0.000804619511346139), (u'practically', -0.0008036196525271233), (u'obvious', -0.0007897023305959558), (u'appropriate', -0.0007858599546488657), (u'unfortunately', -0.0007830743334537918), (u'tiny', -0.0007810016970390661), (u'lame', -0.0007784745629901267), (u'unique', -0.000778336129572441), (u'complex', -0.0007777213054844434), (u'attractive', -0.0007720889163268158), (u'bizarre', -0.0007715003386157852), (u'better', -0.0007709039498411344), (u'odd', -0.0007552599151565062), (u'recently', -0.0007500236040481291), (u'half', -0.0007497782196305021), (u'rich', -0.0007471604119471463), (u'incredibly', -0.000744163886792942), (u'even', -0.0007426983048095066), (u'central', -0.0007425050228027793), (u'extremely', -0.0007316220772763207), (u'indeed', -0.0007309357257491715), (u'dead', -0.0007016908132519372), (u'superior', -0.0007015707094964823), (u'least', -0.0007011989621771744), (u'big', -0.0006905031732485781), (u'otherwise', -0.0006875860325658988), (u'aware', -0.0006873664321727212), (u'truly', -0.0006829821092590271), (u'fascinating', -0.0006779817080232037), (u'else', -0.0006768431043362028), (u'hard', -0.0006751686847043361), (u'free', -0.0006750478288130044), (u'military', -0.0006696152782781776), (u'mostly', -0.0006673677322305189), (u'be', -0.0006666705541028459), (u'too', -0.0006648100456499773), (u'utterly', -0.0006634455006497703), (u'such', -0.0006628058555271372), (u'interested', -0.0006590829464311145), (u'honest', -0.000657989708254409), (u'now', -0.000656435974339334), (u'time', -0.0006548332902411959), (u'sci', -0.0006544917451371882), (u'so', -0.0006524748570703415), (u'merely', -0.0006515577959294184), (u'the', -0.0006467840837392916), (u'fi', -0.0006423589762960891), (u'apparently', -0.000639277256057974), (u'totally', -0.0006388136916126528), 
(u'alone', -0.0006386257017009485), (u'there', -0.0006295539285055061), (u'rather', -0.0006289319771022664), (u'surprising', -0.0006273162579565118), (u'potential', -0.0006213470215711814), (u'easy', -0.000614095507121281), (u'already', -0.0006138704527556523), (u'various', -0.0006066641881557529), (u'crazy', -0.0005990740681345009), (u'future', -0.00059231466233218), (u'former', -0.000592036636930244), (u'special', -0.0005892044631823196), (u'local', -0.0005876959334561714), (u'bigger', -0.0005838593238211101), (u'available', -0.0005827300803454157), (u'year', -0.000580922857070901), (u'though', -0.0005779332324228033), (u'dark', -0.0005772746048261371), (u'major', -0.0005768078397518804), (u'funny', -0.000570069950523432), (u'latter', -0.0005694397396364772), (u'finally', -0.0005684924103107219), (u'only', -0.000567828620661859), (u'screen', -0.000567599131636246), (u'critical', -0.0005671683394556418), (u'chris', -0.0005591477399963297), (u'along', -0.0005545144723625335), (u'thus', -0.0005539437349819129), (u'here', -0.000552998751320494), (u'somewhere', -0.0005526140253634174), (u'much', -0.0005472668310253412), (u'wide', -0.0005417488267065382), (u'whole', -0.0005410414667989857), (u'painful', -0.0005375112978087131), (u'entirely', -0.0005363774322553268), (u'rarely', -0.0005360939600075622), (u'cinematic', -0.0005355064060903643), (u'fellow', -0.0005223937119934513), (u'tough', -0.0005223932466972897), (u'impressive', -0.0005204000586038864), (u'desperate', -0.0005201503282672598), (u'just', -0.0005141574466121833), (u'perhaps', -0.00051230856441907), (u'decent', -0.0005111996882843865), (u'few', -0.0005012104054137241), (u'wild', -0.0004973584556874676), (u'common', -0.000495809685954213), (u'early', -0.000495441064020094), (u'simply', -0.000494655499325071), (u'near', -0.0004927411616316157), (u'self', -0.0004917660782779683), (u'brief', -0.0004904708404110509), (u'strange', -0.0004856694163470239), (u'genuine', -0.0004851490258353313), (u'young', 
-0.00048397945794040765), (u'main', -0.00048026091243232646), (u'single', -0.00047829531336838213), (u'really', -0.0004775721943443196), (u'seemingly', -0.00047519001237246884), (u'large', -0.0004729756630322639), (u'shallow', -0.0004705897720634869), (u'either', -0.000468003391278571), (u'ultimately', -0.000458935267087319), (u'enough', -0.0004587562198272491), (u'romantic', -0.00045773308400239607), (u'deep', -0.00045621751119198726), (u'likely', -0.0004485419059861665), (u'on', -0.0004469204589403505), (u'younger', -0.0004434298896550497), (u'lead', -0.000443396464230313), (u'far', -0.0004417554607726356), (u'unlikely', -0.0004358413663177571), (u'spectacular', -0.0004354932315535024), (u'dramatic', -0.0004335562827193116), (u'mainly', -0.00043321869064741683), (u'away', -0.00043155390057646875), (u'short', -0.0004314804989861156), (u'instead', -0.00043116388455471987), (u'occasional', -0.00043083489345647694), (u'relatively', -0.0004248372994793332), (u'previous', -0.0004235288177482448), (u'climactic', -0.00042281811888987813), (u'back', -0.0004193178584839625), (u'lee', -0.00041872402509387585), (u'not', -0.0004185272461878745), (u'ever', -0.00041848457566656325), (u'pretty', -0.00041769940523365403), (u'funniest', -0.00041687076238961385), (u'ahead', -0.0004165029303795421), (u'impossible', -0.0004155484971273449), (u'live', -0.00041471729212954624), (u'therefore', -0.00041341784099136174), (u'quick', -0.00041187248937471065), (u'alive', -0.0004117125503202801), (u'badly', -0.0004108524460450782), (u'typical', -0.000409235693942564), (u'happy', -0.0004068915656132992), (u'constant', -0.00040174761783828296), (u'previously', -0.00040000604515426147), (u'never', -0.0003983941378316201), (u'evil', -0.00039506279830728557), (u'full', -0.0003942338907744736), (u'simple', -0.00039197455254401034), (u'worthy', -0.00039168348357142236), (u'then', -0.00039077370012267605), (u'dimensional', -0.0003896250283736383), (u'successful', -0.0003867862347564691), (u'often', 
-0.00038614169990034124), (u'easily', -0.0003860212360946879), (u'difficult', -0.000380357876911706), (u'quite', -0.000379233709865312), (u'thin', -0.00037527043094111485), (u'familiar', -0.00037472614921202135), (u'around', -0.0003720425242989393), (u'many', -0.00037151132475908176), (u'old', -0.0003686425008043049), (u'real', -0.00036753891906066106), (u'teen', -0.0003654854190965301), (u'maybe', -0.0003611065804915033), (u'ago', -0.0003574564824057854), (u'certainly', -0.00035481411511190455), (u'serious', -0.0003486634118993405), (u'however', -0.00034805065516387056), (u'ill', -0.0003462147957509242), (u'quickly', -0.00034413844309944637), (u'as', -0.00033728996602781043), (u'acting', -0.0003339975947398836), (u'different', -0.0003321248305263773), (u'ready', -0.0003294929896921037), (u'small', -0.00032775007962081376), (u'naturally', -0.00032626435831311154), (u'apparent', -0.0003176915986955846), (u'usual', -0.0003167083678088465), (u'comedic', -0.0003126089317118585), (u'entire', -0.0003109801245195676), (u'white', -0.0003090466973696814), (u'fast', -0.0003051831476534321), (u'directly', -0.00030150982004841865), (u'other', -0.0003014951572378752), (u'first', -0.00030124314858892727), (u'bright', -0.000301236409938576), (u'important', -0.00030112588430155924), (u'forever', -0.0003000092986698397), (u'billy', -0.0002992233022370851), (u'meanwhile', -0.00029487907702504526), (u'interesting', -0.0002943217343913131), (u'capable', -0.0002897533323065397), (u'immediately', -0.00028673215006844643), (u'psychological', -0.0002864699751391384), (u'social', -0.00028308869410780327), (u'next', -0.0002809045391921473), (u'english', -0.0002802107507452966), (u'obviously', -0.00027704797934609325), (u'in', -0.00027678957462428494), (u'eventually', -0.00027305546760727313), (u'fairly', -0.00027229365184522174), (u'intense', -0.00026985637420645593), (u'unable', -0.0002683859719104266), (u'more', -0.00026716151340399783), (u'violent', -0.00026459918765267066), 
(u'innocent', -0.00026205143202198515), (u'virtually', -0.0002589694961312146), (u'genuinely', -0.00025800944437602774), (u'soon', -0.0002545594041449741), (u'personal', -0.00025286180823031953), (u'out', -0.00025086090745423143), (u'right', -0.00025051898635050206), (u'mean', -0.0002502807827501163), (u'recent', -0.00024867750382279695), (u'limited', -0.00024784695862205795), (u'likable', -0.00024042725922320766), (u'mary', -0.00023855912341423993), (u'believable', -0.00023802304819741564), (u'little', -0.00023745748842529495), (u'tight', -0.00023723216108617606), (u'further', -0.00023584754345536097), (u'hilarious', -0.00023404885314757227), (u'sexy', -0.00023348151474449725), (u'exactly', -0.00023063377365658633), (u'again', -0.00022367754474764872), (u'suddenly', -0.00022132068168206898), (u'thoroughly', -0.00022076378917465717), (u'basically', -0.00021691414673864718), (u'about', -0.00021618939077793116), (u'clear', -0.00021400614115190146), (u'barely', -0.00021357675691639636), (u'mental', -0.00020952791125236842), (u'several', -0.00020854603013824432), (u'regular', -0.0002042841580624016), (u'new', -0.0002020211638352632), (u'certain', -0.00020169403649717126), (u'sure', -0.00020014067646794255), (u'david', -0.00019961008410458595), (u'human', -0.00019925137263039324), (u'top', -0.0001899368272478951), (u'cute', -0.00018961027049064529), (u'third', -0.00018946261705126213), (u'empty', -0.00018932179220444711), (u'life', -0.0001881061929275838), (u'average', -0.00018529951201760137), (u'necessarily', -0.00018083506978792746), (u'heavily', -0.00018056593853440356), (u'able', -0.00017675713661988212), (u'famous', -0.00017340787040879407), (u'most', -0.00016877056962312335), (u'mysterious', -0.00016791894103787043), (u'straight', -0.00016612820272913012), (u'actual', -0.00016296115754637318), (u'effectively', -0.00016113722883281424), (u'popular', -0.00016088036779547807), (u'nearly', -0.00015392284302645064), (u'open', -0.00015301437709358805), (u'united', 
-0.000150506573371944), (u'like', -0.00014684414499384693), (u'own', -0.0001465038219682995), (u'well', -0.00014394345620901295), (u'fantastic', -0.00014191595466896134), (u'almost', -0.0001400886356707153), (u'once', -0.0001368067428779078), (u'fine', -0.00011134453079526731), (u'good', -0.00010961438203885027), (u'two', -0.00010518289697645242), (u'together', -0.00010393211528060466), (u'actually', -0.00010134456936546597), (u'numerous', -9.962759606895041e-05), (u'over', -9.620453254701551e-05), (u'sexual', -8.823802345022313e-05), (u'probably', -8.39454478180437e-05), (u'star', -7.851460842608774e-05), (u'alien', -7.513072776308598e-05), (u'incredible', -7.510927174698197e-05), (u'national', -7.010488710382408e-05), (u'last', -6.685297325562545e-05), (u'high', -6.557268151637574e-05), (u'similar', -6.532964703839881e-05), (u'technical', -6.412624532036197e-05), (u'occasionally', -6.369915895968652e-05), (u'yet', -6.284366017743652e-05), (u'close', -6.248830923808076e-05), (u'subtle', -5.8596990943074294e-05), (u'clearly', -5.6041362763975706e-05), (u'red', -5.3225460752209883e-05), (u'less', -5.217972277071119e-05), (u'tom', -4.91060465189477e-05), (u'older', -4.3549597780270094e-05), (u'emotional', -4.032711358606736e-05), (u'usually', -3.9379866510426756e-05), (u'inevitable', -3.6282134903240816e-05), (u'later', -3.373131751310907e-05), (u'very', -3.174277372530645e-05), (u'private', -3.1614666906534025e-05), (u'stunning', -2.981091908429035e-05), (u'originally', -2.2794236577809452e-05), (u'dangerous', -2.2306624481820136e-05), (u'same', -1.9959277388562156e-05), (u'enjoyable', -9.85683993637407e-06), (u'extraordinary', -7.890864187094478e-06), (u'key', -7.418521772090299e-06), (u'cold', -4.944599197635301e-06), (u'favorite', -8.634601881634361e-07), (u'particularly', -7.76250227279944e-07), (u'general', 1.101516064030755e-05), (u'true', 1.1805019675490552e-05), (u'visual', 3.201351655334698e-05), (u'off', 3.20506856284937e-05), (u'deadly', 
3.215829148920345e-05), (u'soft', 3.795379437129039e-05), (u'french', 3.837116293783395e-05), (u'before', 4.1991564575691656e-05), (u'also', 4.4783796449971575e-05), (u'sometimes', 4.768285359741612e-05), (u'and', 4.90900251027852e-05), (u'weird', 4.947972548887455e-05), (u'still', 5.6334215726506e-05), (u'terrific', 5.969721653504645e-05), (u'rare', 6.314096557611401e-05), (u'original', 6.409569795860466e-05), (u'normal', 6.539526715559733e-05), (u'surprisingly', 6.691039584375957e-05), (u'tim', 6.729369707955809e-05), (u'nasty', 7.067260219057901e-05), (u'realistic', 7.212112179272876e-05), (u'strong', 7.35983227081842e-05), (u'nice', 7.386518593153991e-05), (u'married', 7.393385255529284e-05), (u'late', 7.592604510232729e-05), (u'powerful', 8.25294213718502e-05), (u'american', 8.326019014575541e-05), (u'clever', 8.329213289233256e-05), (u'secret', 8.548700184917704e-05), (u'smart', 9.268513096002354e-05), (u'fresh', 9.767509726045531e-05), (u'michael', 0.0001016982636787467), (u'humorous', 0.00010207761912246673), (u'grand', 0.00010435025496580716), (u'robert', 0.00012628071077446694), (u'danny', 0.00013877623758952332), (u'necessary', 0.0001438228039764189), (u'mad', 0.00014411909468519356), (u'slowly', 0.00014953402449897164), (u'greater', 0.00015546179668127127), (u'hot', 0.00015992211413762758), (u'intelligent', 0.00016263174566505736), (u'especially', 0.00019207270167562098), (u'intriguing', 0.00019872580881751846), (u'extra', 0.00019993570131601312), (u'fun', 0.00020292320002075152), (u'brilliant', 0.00021403595449744723), (u'nevertheless', 0.00021588599225877848), (u'witty', 0.00022385923452095634), (u'slightly', 0.00022961481408002122), (u'sympathetic', 0.0002302595681872681), (u'biggest', 0.00023079503523019528), (u'beautiful', 0.00023579711492265282), (u'comic', 0.00023855794721270923), (u'extreme', 0.00024578270717409137), (u'latest', 0.00024824345238829565), (u'unusual', 0.0002530611082013979), (u'initially', 0.0002530628290485521), (u'highly', 
0.00026438372066875534), (u'unfortunate', 0.00026675969756389346), (u'natural', 0.00026923164506055097), (u'initial', 0.0002748938324180273), (u'non', 0.0002752331506748959), (u'serial', 0.0002756322163883231), (u'blue', 0.00029048945785789407), (u'moral', 0.0002912641065306379), (u'always', 0.00029159719750840004), (u'willing', 0.000296038683355248), (u'second', 0.0003020498409727593), (u'literally', 0.00030452762415470206), (u'final', 0.0003152084839881915), (u'all', 0.00032168970628067804), (u'co', 0.0003417054595779819), (u'memorable', 0.0003498900458939291), (u'william', 0.0003540704811879115), (u'black', 0.00035521208784980137), (u'remarkable', 0.0003566660721891061), (u'visually', 0.000359933098097261), (u'minor', 0.0003618028522602351), (u'lucky', 0.0003672702501654841), (u'wonderful', 0.0003778512338285145), (u'effective', 0.0003830990144812743), (u'light', 0.0003857061782790933), (u'forward', 0.0004006219278071864), (u'animated', 0.00040249539286419094), (u'constantly', 0.0004036991860032579), (u'present', 0.000431445941106833), (u'unexpected', 0.0004471150032176463), (u'solid', 0.00045683245559449605), (u'scary', 0.0004586701851304227), (u'political', 0.0004673296065559056), (u'fully', 0.00047278144155897105), (u'overall', 0.0004748742705482376), (u'sad', 0.0005042726632991479), (u'fake', 0.0005083772153702511), (u'creative', 0.0005144691118677577), (u'steven', 0.0005346174697472804), (u'british', 0.0005471901111347385), (u'computer', 0.0005693113675447382), (u'somewhat', 0.0005694140016848977), (u'surely', 0.000576167354583601), (u'classic', 0.0005922804075833362), (u'earlier', 0.0006228774574494906), (u'sharp', 0.0006241216900057577), (u'best', 0.0006285737755879527), (u'green', 0.0006580402536190189), (u'wonderfully', 0.0006671343716093095), (u'friendly', 0.0006784541548675283), (u'pure', 0.0006843221033917685), (u'john', 0.0007138235664288961), (u'professional', 0.0007243867175374245), (u'definitely', 0.0007244897007716382), (u'musical', 
0.0007417843886875146), (u'sean', 0.0007546407856791931), (u'past', 0.0007576435856556799), (u'excellent', 0.0007616899195319734), (u'emotionally', 0.0007727788550710918), (u'day', 0.0007769853341199288), (u'perfectly', 0.0007822992428988228), (u'generally', 0.0008262218888549855), (u'nonetheless', 0.0008952668065729035), (u'amazing', 0.0008995151565794725), (u'outstanding', 0.000948975985921854), (u'traditional', 0.0009850701844558505), (u'known', 0.0010065109302026419), (u'nicely', 0.0010459151709915544), (u'hearted', 0.0010560164273679716), (u'tony', 0.0010741410742027158), (u'suspenseful', 0.0011158487392470323), (u'anti', 0.001117982032020231), (u'eccentric', 0.0012097055720385144), (u'frank', 0.0012810788158073706), (u'convincing', 0.0013331906526105812), (u'great', 0.0014669121817867479), (u'quiet', 0.0014792862564728217), (u'lovely', 0.0014831258055575964), (u'looking', 0.0014881835706561514), (u'greatest', 0.0017617484017726204), (u'perfect', 0.004193549800085605), (u'cool', 0.008314337620380668)]
Now let's apply this methodology to a real (and important!) scenario where we don't have any sentiment labels: the Kardashians.
## Loading the Kardashian data
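with open("kardashian-transcripts.json", "rb") as f:
    transcripts = json.load(f)
# flatten the transcripts into one lowercased string per message
msgs = [m['text'].lower() for transcript in transcripts
        for m in transcript]
# POS-tag every message (slow on the full corpus)
msgs_pos_tagged = [pos_tag(tokenizer.tokenize(m)) for m in msgs]
# keep only adjectives and adverbs
# NOTE: "RBJ" is not a Penn Treebank tag, so it never matches; comparative adverbs are tagged "RBR"
msgs_adj_adv_only_tokenized = [[w for w, tag in m if tag in ["JJ","RB","RBS","RBJ","JJR","JJS"]]
                               for m in msgs_pos_tagged]
msgs_adj_adv_only = [" ".join([w for w, tag in m if tag in ["JJ","RB","RBS","RBJ","JJR","JJS"]])
                     for m in msgs_pos_tagged]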
msgs[23]
u'and then if you could take out the trash, and then if you go to dash, maybe tomorrow or whatever, later today and just...'
msgs_adj_adv_only[23]
u'then then maybe later just'
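To make the filtering step concrete, here is a minimal sketch on a hand-tagged message. The tagged pairs are hypothetical (written by hand rather than produced by `pos_tag`), the tag set is the standard Penn Treebank one for adjectives and adverbs, and `keep_adj_adv` and `ADJ_ADV_TAGS` are illustrative names that do not appear in the notebook.

```python
# Penn Treebank tags for adjectives (JJ, JJR, JJS) and adverbs (RB, RBR, RBS)
ADJ_ADV_TAGS = {"JJ", "JJR", "JJS", "RB", "RBR", "RBS"}

def keep_adj_adv(tagged_msg):
    """Return the adjectives/adverbs of a POS-tagged message, joined by spaces."""
    return " ".join(w for w, tag in tagged_msg if tag in ADJ_ADV_TAGS)

# Hand-tagged toy message (assumed tags, not real pos_tag output)
toy_tagged = [("that", "DT"), ("was", "VBD"), ("really", "RB"),
              ("rude", "JJ"), ("and", "CC"), ("totally", "RB"),
              ("unfair", "JJ")]

print(keep_adj_adv(toy_tagged))
```

Everything that is not an adjective or adverb (determiners, verbs, conjunctions) is dropped, which is why message 23 above shrinks to a handful of adverbs.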
vec = CountVectorizer(min_df = 10)
X = vec.fit_transform(msgs_adj_adv_only)
terms_kard = vec.get_feature_names()
len(terms_kard)
347
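`min_df = 10` tells `CountVectorizer` to drop every term that occurs in fewer than 10 messages, which leaves the 347 terms counted above. The sketch below mimics only that document-frequency cutoff in plain Python on toy data. It is not scikit-learn's implementation: `filter_by_min_df` is a hypothetical helper, it splits on whitespace instead of using the vectorizer's tokenizer, and the threshold is lowered to 2 so the effect is visible.

```python
from collections import Counter

def filter_by_min_df(docs, min_df):
    """Return the sorted vocabulary of terms occurring in at least min_df documents."""
    doc_freq = Counter()
    for doc in docs:
        # set() so a term is counted once per document, however often it repeats
        doc_freq.update(set(doc.split()))
    return sorted(t for t, df in doc_freq.items() if df >= min_df)

# Toy "messages" (made up for illustration)
toy_docs = ["really good", "good good fun", "really bad", "weird"]
print(filter_by_min_df(toy_docs, min_df=2))
```

Note that "good" appears three times overall but only in two documents; the cutoff looks at the number of documents, not the raw count.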
pmi_matrix_kard=getcollocations_matrix(X)
getcollocations("good",pmi_matrix_kard,terms_kard)
[(u'good', 0.0014394723893038387), (u'changei', 0.0013550135501355014), (u'positive', 0.0006097560975609756), (u'horrible', 0.0003695491500369549), (u'awful', 0.00031269543464665416), (u'nude', 0.00030795762503079576), (u'you', 0.0002463661000246366), (u'extremely', 0.00022583559168925022), (u'proud', 0.00021557033752155703), (u'willing', 0.00019357336430507162), (u'pretty', 0.00016592002654720425), (u'strong', 0.00016260162601626016), (u'and', 0.00013550135501355014), (u'anywhere', 0.00013550135501355014), (u'such', 0.00013428062208550013), (u'adrienne', 0.0001231830500123183), (u'dramatic', 0.0001231830500123183), (u'honest', 0.0001231830500123183), (u'online', 0.0001231830500123183), (u'though', 0.0001231830500123183), (u'two', 0.0001231830500123183), (u'kimberly', 0.00011291779584462511), (u'fun', 0.00010986596352450011), (u'half', 0.00010423181154888472), (u'very', 0.0001007104665641251), (u'really', 9.148499128303384e-05), (u'all', 7.970667941973537e-05), (u'instead', 7.527853056308341e-05), (u'too', 7.501805121857447e-05), (u'black', 7.259001161440186e-05), (u'super', 7.13165026387106e-05), (u'big', 6.929046563192905e-05), (u'actually', 6.900531968282646e-05), (u'yeah', 6.45244547683572e-05), (u'sometimes', 6.302388605281402e-05), (u'like', 6.159152500615915e-05), (u'hard', 6.067224851352991e-05), (u'about', 5.76601510695958e-05), (u'clean', 5.76601510695958e-05), (u'um', 5.741582839557209e-05), (u'busy', 5.6458897922312554e-05), (u'sure', 5.474802222769703e-05), (u'before', 5.4200542005420054e-05), (u'close', 5.313778627982358e-05), (u'real', 5.2930216802168025e-05), (u'always', 5.211590577444236e-05), (u'great', 5.113258679756609e-05), (u'smart', 5.01856870420556e-05), (u'also', 4.9573666468372e-05), (u'not', 4.392812007920578e-05), (u'back', 4.3245113302196845e-05), (u'maybe', 4.065040650406504e-05), (u'single', 4.0448165675686606e-05), (u'own', 3.9562439420014635e-05), (u'gon', 3.8714672861014324e-05), (u'definitely', 3.8062178374592734e-05), (u'still', 
3.7639265281541705e-05), (u'healthy', 3.662198784150004e-05), (u'armenian', 3.474393718296157e-05), (u'okay', 3.3875338753387534e-05), (u'so', 3.373213628801389e-05), (u'beautiful', 3.22622273841786e-05), (u'best', 3.169622339498249e-05), (u'rude', 3.151194302640701e-05), (u'nervous', 3.0795762503079576e-05), (u'different', 3.0449742699674187e-05), (u'least', 3.0111412225233366e-05), (u'fine', 2.9456816307293507e-05), (u'hot', 2.88300755347979e-05), (u'as', 2.8526601055484238e-05), (u'whole', 2.8526601055484238e-05), (u'again', 2.8426857695150378e-05), (u'gorgeous', 2.8229448961156277e-05), (u'ready', 2.6920799009314596e-05), (u'absolutely', 2.656889313991179e-05), (u'far', 2.656889313991179e-05), (u'well', 2.589197866500958e-05), (u'pregnant', 2.5809781907342885e-05), (u'only', 2.50928435210278e-05), (u'just', 2.3923261831488373e-05), (u'probably', 2.3362302588543127e-05), (u'then', 2.3228803716608595e-05), (u'right', 2.2860658054433303e-05), (u'few', 2.2583559168925024e-05), (u'happy', 2.128293534244243e-05), (u'now', 2.089995193011056e-05), (u'long', 2.0846362309776944e-05), (u'next', 2.068722977306109e-05), (u'never', 2.0592911096284215e-05), (u'here', 2.018863308703201e-05), (u'together', 2.012396361587378e-05), (u'better', 1.9357336430507162e-05), (u'cool', 1.9084697889232416e-05), (u'comfortable', 1.8819632640770853e-05), (u'anymore', 1.782912565967765e-05), (u'obviously', 1.7371968591480784e-05), (u'enough', 1.6728562347351868e-05), (u'perfect', 1.6524555489457333e-05), (u'first', 1.6325464459463874e-05), (u'honestly', 1.6325464459463874e-05), (u'old', 1.4188623561628287e-05), (u'already', 1.4114724480578139e-05), (u'ever', 1.3415975743915856e-05), (u'bad', 1.302897644361059e-05), (u'na', 1.2663678038649544e-05), (u'new', 1.2569698980848807e-05), (u'up', 1.0112041418921651e-05), (u'there', 9.801183002788436e-06), (u'wrong', 9.475619231716793e-06), (u'else', 9.03342366757001e-06), (u'crazy', 8.742022904100009e-06), (u'last', 8.416233230655287e-06), 
(u'little', 6.913334439466843e-06), (u'more', 5.000050000500005e-06), (u'even', 3.4391206856230997e-06), (u'much', 2.9780517585395634e-06), (u'able', 0.0), (u'acceptable', 0.0), (u'accurate', 0.0), (u'active', 0.0), (u'afraid', 0.0), (u'ago', 0.0), (u'ahead', 0.0), (u'alcoholic', 0.0), (u'almost', 0.0), (u'alone', 0.0), (u'along', 0.0), (u'amazing', 0.0), (u'american', 0.0), (u'anal', 0.0), (u'angry', 0.0), (u'annoying', 0.0), (u'anxious', 0.0), (u'anyway', 0.0), (u'apart', 0.0), (u'apparently', 0.0), (u'appropriate', 0.0), (u'around', 0.0), (u'atm', 0.0), (u'away', 0.0), (u'awesome', 0.0), (u'awkward', 0.0), (u'barely', 0.0), (u'basic', 0.0), (u'basically', 0.0), (u'belly', 0.0), (u'bible', 0.0), (u'bigger', 0.0), (u'biggest', 0.0), (u'boring', 0.0), (u'bright', 0.0), (u'bunim', 0.0), (u'certain', 0.0), (u'certainly', 0.0), (u'clear', 0.0), (u'clearly', 0.0), (u'cold', 0.0), (u'common', 0.0), (u'complete', 0.0), (u'completely', 0.0), (u'constantly', 0.0), (u'couple', 0.0), (u'cute', 0.0), (u'dead', 0.0), (u'deep', 0.0), (u'delicious', 0.0), (u'desperate', 0.0), (u'diaper', 0.0), (u'difficult', 0.0), (u'disappointed', 0.0), (u'done', 0.0), (u'double', 0.0), (u'down', 0.0), (u'drunk', 0.0), (u'dry', 0.0), (u'dumb', 0.0), (u'early', 0.0), (u'easier', 0.0), (u'easy', 0.0), (u'em', 0.0), (u'embarrassing', 0.0), (u'emotional', 0.0), (u'entire', 0.0), (u'especially', 0.0), (u'everywhere', 0.0), (u'exactly', 0.0), (u'excited', 0.0), (u'exciting', 0.0), (u'extra', 0.0), (u'fabulous', 0.0), (u'fair', 0.0), (u'fast', 0.0), (u'fat', 0.0), (u'favorite', 0.0), (u'female', 0.0), (u'finally', 0.0), (u'forever', 0.0), (u'forward', 0.0), (u'free', 0.0), (u'fresh', 0.0), (u'full', 0.0), (u'funny', 0.0), (u'fur', 0.0), (u'girlfriend', 0.0), (u'glad', 0.0), (u'god', 0.0), (u'gray', 0.0), (u'gross', 0.0), (u'grown', 0.0), (u'guilty', 0.0), (u'guys', 0.0), (u'high', 0.0), (u'hopefully', 0.0), (u'huge', 0.0), (u'huh', 0.0), (u'hundred', 0.0), (u'hungry', 0.0), (u'important', 0.0), 
(u'in', 0.0), (u'incredible', 0.0), (u'inside', 0.0), (u'interested', 0.0), (u'jealous', 0.0), (u'kardashian', 0.0), (u'kelly', 0.0), (u'khloe', 0.0), (u'kim', 0.0), (u'kmart', 0.0), (u'kris', 0.0), (u'laker', 0.0), (u'late', 0.0), (u'lately', 0.0), (u'later', 0.0), (u'less', 0.0), (u'light', 0.0), (u'lily', 0.0), (u'literally', 0.0), (u'live', 0.0), (u'low', 0.0), (u'luxurious', 0.0), (u'mad', 0.0), (u'major', 0.0), (u'male', 0.0), (u'many', 0.0), (u'married', 0.0), (u'mean', 0.0), (u'miserable', 0.0), (u'moral', 0.0), (u'most', 0.0), (u'murray', 0.0), (u'naked', 0.0), (u'natural', 0.0), (u'necessary', 0.0), (u'nice', 0.0), (u'normal', 0.0), (u'normally', 0.0), (u'off', 0.0), (u'often', 0.0), (u'oh', 0.0), (u'older', 0.0), (u'once', 0.0), (u'open', 0.0), (u'other', 0.0), (u'out', 0.0), (u'outside', 0.0), (u'over', 0.0), (u'past', 0.0), (u'personal', 0.0), (u'poor', 0.0), (u'possible', 0.0), (u'possibly', 0.0), (u'potential', 0.0), (u'present', 0.0), (u'private', 0.0), (u'professional', 0.0), (u'public', 0.0), (u'quick', 0.0), (u'quiet', 0.0), (u'rather', 0.0), (u'red', 0.0), (u'regular', 0.0), (u'rich', 0.0), (u'rid', 0.0), (u'ridiculous', 0.0), (u'rob', 0.0), (u'sad', 0.0), (u'safe', 0.0), (u'same', 0.0), (u'san', 0.0), (u'scared', 0.0), (u'scary', 0.0), (u'scott', 0.0), (u'second', 0.0), (u'secret', 0.0), (u'selfish', 0.0), (u'sensitive', 0.0), (u'serious', 0.0), (u'seriously', 0.0), (u'sexual', 0.0), (u'sexy', 0.0), (u'short', 0.0), (u'sick', 0.0), (u'skin', 0.0), (u'small', 0.0), (u'somewhere', 0.0), (u'soon', 0.0), (u'sorry', 0.0), (u'special', 0.0), (u'straight', 0.0), (u'stupid', 0.0), (u'sudden', 0.0), (u'supportive', 0.0), (u'sweet', 0.0), (u'ta', 0.0), (u'tall', 0.0), (u'ten', 0.0), (u'thebouncedryer', 0.0), (u'tired', 0.0), (u'top', 0.0), (u'total', 0.0), (u'totally', 0.0), (u'touch', 0.0), (u'tough', 0.0), (u'true', 0.0), (u'truly', 0.0), (u'truthful', 0.0), (u'tryclearblue', 0.0), (u'twice', 0.0), (u'ugly', 0.0), (u'uh', 0.0), (u'uncomfortable', 0.0), 
(u'upset', 0.0), (u'usually', 0.0), (u'wasteful', 0.0), (u'wear', 0.0), (u'weird', 0.0), (u'welcome', 0.0), (u'white', 0.0), (u'wonderful', 0.0), (u'worried', 0.0), (u'worse', 0.0), (u'worst', 0.0), (u'year', 0.0), (u'yes', 0.0), (u'yet', 0.0), (u'young', 0.0), (u'younger', 0.0)]
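The next cell turns seed closeness into a polarity score: each word's closeness to the positive seeds minus its closeness to the negative seeds. Here is a minimal sketch of that arithmetic with made-up numbers. The two dictionaries stand in for the output of `seed_score`, whose internals are not reproduced here, and `polarity_lexicons` is a hypothetical helper.

```python
from operator import itemgetter

def polarity_lexicons(posscores, negscores, k):
    """Score each word as pos - neg; return (k most negative, k most positive)."""
    sentscores = {w: posscores[w] - negscores[w] for w in posscores}
    ranked = sorted(sentscores.items(), key=itemgetter(1))  # ascending order
    return ranked[:k], ranked[-k:]

# Made-up closeness scores, for illustration only
toy_pos = {"great": 0.75, "awful": 0.0, "fun": 0.25, "slow": 0.25}
toy_neg = {"great": 0.25, "awful": 0.5, "fun": 0.0, "slow": 0.5}

neg_lex, pos_lex = polarity_lexicons(toy_pos, toy_neg, k=2)
print(neg_lex)
print(pos_lex)
```

Because the ranking is ascending, the negative lexicon starts with the most negative word, while the positive lexicon ends with the most positive one, just like the `[:10]` and `[-10:]` slices used below.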
posscores=seed_score(['good',"rude"],pmi_matrix_kard,terms_kard)
negscores=seed_score(['bad'],pmi_matrix_kard,terms_kard)
## the sentiment polarity score of a word is the difference between its closeness to the positive seeds
## and its closeness to the negative seeds
sentscores={}
for w in terms_kard:
    sentscores[w]=posscores[w]-negscores[w]
# scores are sorted in ascending order: the first 10 are the most negative words, the last 10 the most positive
neglexicon_kard = sorted(sentscores.items(),key=itemgetter(1),reverse=False)[:10]
poslexicon_kard = sorted(sentscores.items(),key=itemgetter(1),reverse=False)[-10:]
sorted(sentscores.items(),key=itemgetter(1),reverse=False)
[(u'bad', -0.004933346763201359), (u'over', -0.0008741258741258741), (u'horrible', -0.0005045767240889192), (u'appropriate', -0.0004807692307692308), (u'san', -0.00040064102564102563), (u'worried', -0.0003434065934065934), (u'able', -0.0002403846153846154), (u'worst', -0.00022893772893772894), (u'high', -0.00022361359570661896), (u'rich', -0.00020903010033444816), (u'year', -0.0001923076923076923), (u'normal', -0.00016869095816464237), (u'ready', -0.00016411333242216783), (u'fast', -0.00016025641025641026), (u'especially', -0.00015762925598991173), (u'busy', -0.00014386161489820027), (u'entire', -0.00012651821862348178), (u'sorry', -0.0001201923076923077), (u'enough', -0.0001019798896944335), (u'rude', -8.029485482690247e-05), (u'seriously', -7.754342431761787e-05), (u'again', -7.243382008860433e-05), (u'other', -6.868131868131868e-05), (u'away', -6.585879873551106e-05), (u'now', -6.56137913959721e-05), (u'probably', -5.9528944095807e-05), (u'around', -5.7234432234432234e-05), (u'really', -5.550177990755901e-05), (u'long', -5.3118134731643175e-05), (u'still', -5.139207374979732e-05), (u'so', -4.982099447037755e-05), (u'like', -4.767420925957511e-05), (u'obviously', -4.426511227636932e-05), (u'too', -4.1431243431412225e-05), (u'totally', -4.074315514993481e-05), (u'not', -3.9996683296969165e-05), (u'never', -3.7859275015476366e-05), (u'nice', -3.4965034965034965e-05), (u'then', -3.171625122844635e-05), (u'little', -2.988022913981102e-05), (u'right', -2.875567040238595e-05), (u'much', -2.8721018402069058e-05), (u'up', -2.5766259384752287e-05), (u'different', -2.356927199349781e-05), (u'last', -2.1445209674265877e-05), (u'even', -2.0965408795048512e-05), (u'just', -1.8517524783027054e-05), (u'more', -1.274051202050482e-05), (u'hard', -1.1084353093817969e-05), (u'old', -1.0982540352991126e-05), (u'only', -1.0519692091507817e-05), (u'ever', -1.0384481224857945e-05), (u'also', -9.056727527875654e-06), (u'here', -5.0928339122264e-06), (u'well', -4.7302653330305954e-06), 
(u'together', -3.67649335290002e-06), (u'first', -2.982536776248202e-06), (u'rob', 0.0), (u'skin', 0.0), (u'certainly', 0.0), (u'young', 0.0), (u'finally', 0.0), (u'ta', 0.0), (u'worse', 0.0), (u'fat', 0.0), (u'bunim', 0.0), (u'anxious', 0.0), (u'quick', 0.0), (u'anal', 0.0), (u'ten', 0.0), (u'tired', 0.0), (u'past', 0.0), (u'second', 0.0), (u'fabulous', 0.0), (u'uncomfortable', 0.0), (u'kris', 0.0), (u'public', 0.0), (u'full', 0.0), (u'alone', 0.0), (u'sexy', 0.0), (u'along', 0.0), (u'dry', 0.0), (u'bible', 0.0), (u'ahead', 0.0), (u'guilty', 0.0), (u'later', 0.0), (u'usually', 0.0), (u'weird', 0.0), (u'extra', 0.0), (u'private', 0.0), (u'moral', 0.0), (u'total', 0.0), (u'angry', 0.0), (u'live', 0.0), (u'acceptable', 0.0), (u'everywhere', 0.0), (u'basically', 0.0), (u'glad', 0.0), (u'male', 0.0), (u'embarrassing', 0.0), (u'awesome', 0.0), (u'huge', 0.0), (u'awkward', 0.0), (u'rather', 0.0), (u'truthful', 0.0), (u'mad', 0.0), (u'guys', 0.0), (u'short', 0.0), (u'natural', 0.0), (u'tall', 0.0), (u'cute', 0.0), (u'soon', 0.0), (u'murray', 0.0), (u'scott', 0.0), (u'cold', 0.0), (u'personal', 0.0), (u'amazing', 0.0), (u'easier', 0.0), (u'safe', 0.0), (u'bigger', 0.0), (u'mean', 0.0), (u'em', 0.0), (u'sexual', 0.0), (u'special', 0.0), (u'out', 0.0), (u'god', 0.0), (u'red', 0.0), (u'free', 0.0), (u'small', 0.0), (u'completely', 0.0), (u'scary', 0.0), (u'atm', 0.0), (u'american', 0.0), (u'major', 0.0), (u'done', 0.0), (u'delicious', 0.0), (u'open', 0.0), (u'top', 0.0), (u'wonderful', 0.0), (u'white', 0.0), (u'hundred', 0.0), (u'exactly', 0.0), (u'huh', 0.0), (u'forward', 0.0), (u'ridiculous', 0.0), (u'double', 0.0), (u'light', 0.0), (u'sad', 0.0), (u'miserable', 0.0), (u'apparently', 0.0), (u'clearly', 0.0), (u'afraid', 0.0), (u'potential', 0.0), (u'lily', 0.0), (u'most', 0.0), (u'regular', 0.0), (u'forever', 0.0), (u'clear', 0.0), (u'upset', 0.0), (u'hungry', 0.0), (u'professional', 0.0), (u'normally', 0.0), (u'anyway', 0.0), (u'bright', 0.0), (u'wasteful', 0.0), 
(u'hopefully', 0.0), (u'truly', 0.0), (u'gray', 0.0), (u'married', 0.0), (u'naked', 0.0), (u'twice', 0.0), (u'stupid', 0.0), (u'common', 0.0), (u'boring', 0.0), (u'fair', 0.0), (u'dumb', 0.0), (u'desperate', 0.0), (u'outside', 0.0), (u'many', 0.0), (u'barely', 0.0), (u'quiet', 0.0), (u'somewhere', 0.0), (u'tryclearblue', 0.0), (u'wear', 0.0), (u'tough', 0.0), (u'drunk', 0.0), (u'sweet', 0.0), (u'active', 0.0), (u'late', 0.0), (u'secret', 0.0), (u'basic', 0.0), (u'present', 0.0), (u'fur', 0.0), (u'straight', 0.0), (u'ugly', 0.0), (u'alcoholic', 0.0), (u'almost', 0.0), (u'sudden', 0.0), (u'in', 0.0), (u'rid', 0.0), (u'grown', 0.0), (u'funny', 0.0), (u'sensitive', 0.0), (u'same', 0.0), (u'belly', 0.0), (u'difficult', 0.0), (u'kim', 0.0), (u'jealous', 0.0), (u'off', 0.0), (u'older', 0.0), (u'kelly', 0.0), (u'less', 0.0), (u'accurate', 0.0), (u'touch', 0.0), (u'yes', 0.0), (u'yet', 0.0), (u'interested', 0.0), (u'easy', 0.0), (u'excited', 0.0), (u'couple', 0.0), (u'possible', 0.0), (u'early', 0.0), (u'possibly', 0.0), (u'disappointed', 0.0), (u'apart', 0.0), (u'necessary', 0.0), (u'often', 0.0), (u'scared', 0.0), (u'dead', 0.0), (u'supportive', 0.0), (u'gross', 0.0), (u'literally', 0.0), (u'laker', 0.0), (u'exciting', 0.0), (u'oh', 0.0), (u'favorite', 0.0), (u'down', 0.0), (u'female', 0.0), (u'kmart', 0.0), (u'constantly', 0.0), (u'low', 0.0), (u'biggest', 0.0), (u'complete', 0.0), (u'diaper', 0.0), (u'true', 0.0), (u'khloe', 0.0), (u'inside', 0.0), (u'uh', 0.0), (u'emotional', 0.0), (u'certain', 0.0), (u'deep', 0.0), (u'girlfriend', 0.0), (u'annoying', 0.0), (u'selfish', 0.0), (u'incredible', 0.0), (u'lately', 0.0), (u'sick', 0.0), (u'poor', 0.0), (u'welcome', 0.0), (u'luxurious', 0.0), (u'important', 0.0), (u'fresh', 0.0), (u'thebouncedryer', 0.0), (u'ago', 0.0), (u'younger', 0.0), (u'kardashian', 0.0), (u'serious', 0.0), (u'once', 0.0), (u'whole', 3.2229573307878824e-06), (u'as', 3.2229573307878824e-06), (u'own', 4.469794838318963e-06), (u'crazy', 
8.742022904100009e-06), (u'else', 9.03342366757001e-06), (u'wrong', 9.475619231716793e-06), (u'there', 9.801183002788436e-06), (u'new', 1.2569698980848807e-05), (u'na', 1.2663678038649544e-05), (u'already', 1.4114724480578139e-05), (u'honestly', 1.6325464459463874e-05), (u'perfect', 1.6524555489457333e-05), (u'anymore', 1.782912565967765e-05), (u'comfortable', 1.8819632640770853e-05), (u'cool', 1.9084697889232416e-05), (u'better', 1.9357336430507162e-05), (u'next', 2.068722977306109e-05), (u'happy', 2.128293534244243e-05), (u'few', 2.2583559168925024e-05), (u'actually', 2.4489650167156945e-05), (u'pregnant', 2.5809781907342885e-05), (u'far', 2.656889313991179e-05), (u'absolutely', 2.656889313991179e-05), (u'gorgeous', 2.8229448961156277e-05), (u'hot', 2.88300755347979e-05), (u'fine', 2.9456816307293507e-05), (u'least', 3.0111412225233366e-05), (u'sure', 3.0466747946422746e-05), (u'nervous', 3.0795762503079576e-05), (u'best', 3.169622339498249e-05), (u'beautiful', 3.22622273841786e-05), (u'great', 3.299035167419889e-05), (u'back', 3.3015980732638744e-05), (u'okay', 3.3875338753387534e-05), (u'armenian', 3.474393718296157e-05), (u'healthy', 3.662198784150004e-05), (u'definitely', 3.8062178374592734e-05), (u'gon', 3.8714672861014324e-05), (u'single', 4.0448165675686606e-05), (u'maybe', 4.065040650406504e-05), (u'big', 4.1974032065495483e-05), (u'such', 4.765553546041351e-05), (u'smart', 5.01856870420556e-05), (u'always', 5.211590577444236e-05), (u'real', 5.2930216802168025e-05), (u'close', 5.313778627982358e-05), (u'before', 5.4200542005420054e-05), (u'um', 5.741582839557209e-05), (u'clean', 5.76601510695958e-05), (u'about', 5.76601510695958e-05), (u'sometimes', 6.302388605281402e-05), (u'yeah', 6.45244547683572e-05), (u'super', 7.13165026387106e-05), (u'black', 7.259001161440186e-05), (u'instead', 7.527853056308341e-05), (u'all', 7.970667941973537e-05), (u'very', 0.0001007104665641251), (u'half', 0.00010423181154888472), (u'fun', 0.00010986596352450011), 
(u'kimberly', 0.00011291779584462511), (u'pretty', 0.00011686194177483377), (u'two', 0.0001231830500123183), (u'honest', 0.0001231830500123183), (u'online', 0.0001231830500123183), (u'though', 0.0001231830500123183), (u'dramatic', 0.0001231830500123183), (u'adrienne', 0.0001231830500123183), (u'anywhere', 0.00013550135501355014), (u'and', 0.00013550135501355014), (u'strong', 0.00016260162601626016), (u'willing', 0.00019357336430507162), (u'proud', 0.00021557033752155703), (u'extremely', 0.00022583559168925022), (u'you', 0.0002463661000246366), (u'nude', 0.00030795762503079576), (u'awful', 0.00031269543464665416), (u'positive', 0.0006097560975609756), (u'changei', 0.0013550135501355014), (u'good', 0.001426443412860228)]
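We (roughly) calculate each sentence's sentiment score as the number of its words with a positive sentiment score minus the number of its words with a negative sentiment score (according to our automatically induced lexicon). Words with a score of 0, or missing from the lexicon, do not count either way.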
final_message_sentiment = {}
for k, m in enumerate(msgs_adj_adv_only_tokenized):
    # number of positive-scored words minus number of negative-scored words
    m_sent_score = sum(sentscores.get(w, 0) > 0 for w in m) - sum(sentscores.get(w, 0) < 0 for w in m)
    final_message_sentiment[msgs[k]] = m_sent_score
sorted(final_message_sentiment.items(), key=itemgetter(1), reverse=False)[:10]
[(u"i couldn't be any more sorry, and i'll never excuse the way i acted the other night in vegas, but, like, i don't know what i ever did so bad to, like, deserve you to, like, hate me so much.", -9), (u"he just needs to be pushed a little bit so that he takes care of something that's made him feel really bad for a really long time.", -7), (u"i mean, honestly i really thought you brought me here to spend time with you and like, it's like a bonding thing and you really wanted to take me to lunch and hang out, but obviously, this is not really why i'm here.", -6), (u'now, i do not know what case you have him on, but whatever it is, it is going bad, and it sounds like it is going bad right now.', -6), (u"i understand that we're gonna fight 'cause we do so much stuff together, but i got you guys a little gift because i felt a little bad-- for you both.", -6), (u"khloe getting married has really made me think about my own love life, and you know, i'm still sad, but i don't really, it's not really my personality to get really mad and yell, and i just don't want to fight with my brother rob.", -6), (u"i'm just trying to keep busy and not think about it, but i just have so much to do, so i'm going to have to pass.", -6), (u"just as i feel like i'm getting all the moves down, and all i need to do is really put it together, one after the other, i fall, and i just get really mad.", -6), (u"she hit me really hard, but because i'm a bigger girl, i know not to hit her, because i will really, like, give her a concussion, and that's not what i, i'm not trying to hurt my sister", -6), (u"we've just been fighting so much lately, and i don't really understand what i've done, or i don't really understand, like, just everyone is fighting.", -6)]
sorted(final_message_sentiment.items(), key=itemgetter(1))[-10:]
[(u"it's gonna be a pretty big game.", 4), (u"this is a great time to tell khloe that it's not always all about us and that maybe once in a while it's a great thing to help somebody else out.", 4), (u"so, tonight, khloe, i ask you to honor that very same promise to his grandmother, that you will always support lamar and stand by him because you have realized very quickly what the rest of us already know: it's very easy to love lamar.", 4), (u"i wouldn't be a good manager or a good mom if i didn't find out who's really single out there and who would be a great match for kim.", 4), (u"they're always pretty strong women, actually.", 4), (u'i feel very at peace, very comfortable in my own skin.', 4), (u"i definitely feel protective over summer because she's so young and new to the industry, but i think the smart thing to do is let her learn her own lessons and kind of feel her way through on her own.", 4), (u"i just want to say all you kids, i'm extremely proud of you because you all know where you're going, you all have good direction, and you all have very good work ethics.", 4), (u'i feel very at peace, very comfortable in my own skin, very secure.', 5), (u"as far back as i can remember, khloe's always had body issues, and i'm always having to remind her how beautiful she is.", 5)]
Pretty good considering that we had absolutely no sentiment labels to start with!
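To make the counting rule concrete, here is a minimal, self-contained sketch on a toy example. The helper name `count_based_sentiment`, the `toy_lexicon` dictionary, and the `toy_messages` are hypothetical and only for illustration; the lexicon values are rounded from the scores printed above. Note that only the sign of each word's score matters, not its magnitude.

```python
def count_based_sentiment(tokens, lexicon):
    """Return (# tokens with positive score) - (# tokens with negative score).

    Tokens that are missing from the lexicon, or have score 0, are ignored.
    """
    n_pos = sum(1 for w in tokens if lexicon.get(w, 0) > 0)
    n_neg = sum(1 for w in tokens if lexicon.get(w, 0) < 0)
    return n_pos - n_neg

# Hypothetical toy lexicon (values rounded from the induced scores above)
toy_lexicon = {'good': 0.0014, 'proud': 0.0002,
               'bad': -0.0049, 'sorry': -0.0001,
               'sad': 0.0}

# Hypothetical pre-tokenized messages (adjectives/adverbs only)
toy_messages = [['good', 'proud', 'good'],
                ['bad', 'sorry', 'sad'],
                ['good', 'bad']]

toy_scores = [count_based_sentiment(m, toy_lexicon) for m in toy_messages]
print(toy_scores)  # [3, -2, 0]
```

Because the rule ignores magnitudes, a weakly negative word such as 'sorry' cancels out a strongly positive word such as 'good'. Summing the raw scores instead would be one alternative, but that is not what the cell above does.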