Info/CS 4300: Language and Information - in-class demo

Sentiment analysis

Building lexicons tailored to a domain for which we don't have sentiment labels

In [76]:
%matplotlib inline

from __future__ import print_function
import json
from operator import itemgetter
from collections import defaultdict

from matplotlib import pyplot as plt
import numpy as np

from nltk.tokenize import TweetTokenizer
from nltk import FreqDist,pos_tag
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.datasets import load_files
from sklearn.naive_bayes import MultinomialNB

tokenizer = TweetTokenizer()

Using the movie review data, but this time we will not use the sentiment labels (we will pretend we don't have labels).

In [77]:
## loading movie review data: 
## http://www.cs.cornell.edu/people/pabo/movie-review-data/review_polarity.tar.gz
data = load_files('txt_sentoken')
print(data.data[0])
b"arnold schwarzenegger has been an icon for action enthusiasts , since the late 80's , but lately his films have been very sloppy and the one-liners are getting worse . \nit's hard seeing arnold as mr . freeze in batman and robin , especially when he says tons of ice jokes , but hey he got 15 million , what's it matter to him ? \nonce again arnold has signed to do another expensive blockbuster , that can't compare with the likes of the terminator series , true lies and even eraser . \nin this so called dark thriller , the devil ( gabriel byrne ) has come upon earth , to impregnate a woman ( robin tunney ) which happens every 1000 years , and basically destroy the world , but apparently god has chosen one man , and that one man is jericho cane ( arnold himself ) . \nwith the help of a trusty sidekick ( kevin pollack ) , they will stop at nothing to let the devil take over the world ! \nparts of this are actually so absurd , that they would fit right in with dogma . \nyes , the film is that weak , but it's better than the other blockbuster right now ( sleepy hollow ) , but it makes the world is not enough look like a 4 star film . \nanyway , this definitely doesn't seem like an arnold movie . \nit just wasn't the type of film you can see him doing . \nsure he gave us a few chuckles with his well known one-liners , but he seemed confused as to where his character and the film was going . \nit's understandable , especially when the ending had to be changed according to some sources . \naside form that , he still walked through it , much like he has in the past few films . \ni'm sorry to say this arnold but maybe these are the end of your action days . \nspeaking of action , where was it in this film ? \nthere was hardly any explosions or fights . \nthe devil made a few places explode , but arnold wasn't kicking some devil butt . \nthe ending was changed to make it more spiritual , which undoubtedly ruined the film . \ni was at least hoping for a cool ending if nothing else occurred , but once again i was let down . \ni also don't know why the film took so long and cost so much . \nthere was really no super affects at all , unless you consider an invisible devil , who was in it for 5 minutes tops , worth the overpriced budget . \nthe budget should have gone into a better script , where at least audiences could be somewhat entertained instead of facing boredom . \nit's pitiful to see how scripts like these get bought and made into a movie . \ndo they even read these things anymore ? \nit sure doesn't seem like it . \nthankfully gabriel's performance gave some light to this poor film . \nwhen he walks down the street searching for robin tunney , you can't help but feel that he looked like a devil . \nthe guy is creepy looking anyway ! \nwhen it's all over , you're just glad it's the end of the movie . \ndon't bother to see this , if you're expecting a solid action flick , because it's neither solid nor does it have action . \nit's just another movie that we are suckered in to seeing , due to a strategic marketing campaign . \nsave your money and see the world is not enough for an entertaining experience . \n"
In [78]:
## building the term documnet matrix
vec = CountVectorizer(min_df = 50)
X = vec.fit_transform(data.data)
terms = vec.get_feature_names()
len(terms)
Out[78]:
2153
In [79]:
# PMI type measure via matrix multiplication
def getcollocations_matrix(X):
    XX=X.T.dot(X)  ## multiply X with it's transpose to get number docs in which both w1 (row) and w2 (column) occur
    term_freqs = np.asarray(X.sum(axis=0)) ## number of docs in which a word occurs
    pmi = XX.toarray() * 1.0  ## Casting to float, making it an array to use simple operations
    pmi /= term_freqs.T ## dividing by the number of documents in which w1 occurs
    pmi /= term_freqs  ## dividing by the number of documents in which w2 occurs
    
    return pmi  # this is not technically PMI beacuse we are ignoring some normalization factor and not taking the log 
                # but it's sufficient for ranking
In [80]:
pmi_matrix = getcollocations_matrix(X)
pmi_matrix.shape 
Out[80]:
(2153, 2153)
In [81]:
def getcollocations(w,PMI_MATRIX=pmi_matrix,TERMS=terms):
    if w not in TERMS:
        return []
    idx = TERMS.index(w)
    col = PMI_MATRIX[:,idx].ravel().tolist()
    return sorted([(TERMS[i],val) for i,val in enumerate(col)],key=itemgetter(1),reverse=True)
In [82]:
getcollocations("good")
Out[82]:
[('good', 0.0012711337380982813),
 ('trek', 0.0010038914000850665),
 ('sean', 0.0009922470727116103),
 ('nudity', 0.0009374840201587473),
 ('nicely', 0.0009268742752181751),
 ('trash', 0.0009217014608968155),
 ('showed', 0.000916850400576306),
 ('compared', 0.00091151987499156),
 ('fairly', 0.0008716089901959017),
 ('comparison', 0.0008698557537213697),
 ('laughed', 0.0008665639627895953),
 ('crap', 0.0008473706979212659),
 ('pulp', 0.0008450365730278281),
 ('parts', 0.0008435572066033899),
 ('fifteen', 0.0008424927416009955),
 ('sorry', 0.0008413817621615216),
 ('pretty', 0.0008334590198961828),
 ('nights', 0.0008333717375608706),
 ('chris', 0.000833301911692621),
 ('doctor', 0.0008330167404996009),
 ('rating', 0.0008322781072402701),
 ('average', 0.0008295313148071339),
 ('forward', 0.0008295313148071339),
 ('watched', 0.0008295313148071339),
 ('cool', 0.0008275372491465399),
 ('stupid', 0.0008213343650560753),
 ('sadly', 0.0008174507616788748),
 ('matt', 0.0008162941129751053),
 ('hate', 0.0008140549843070009),
 ('kills', 0.0008135787895223813),
 ('terrific', 0.0008122494124153186),
 ('horrible', 0.0008093970595933685),
 ('agrees', 0.0008091330037872864),
 ('subplot', 0.0008082612810941305),
 ('totally', 0.0008068044294699522),
 ('sad', 0.0008064887782847135),
 ('technical', 0.0008033355890763824),
 ('therefore', 0.0008002537389904116),
 ('handled', 0.0007999051964211649),
 ('scientist', 0.0007949675100235034),
 ('lovely', 0.0007943816828237808),
 ('barry', 0.000792934345036231),
 ('villain', 0.0007926632563712613),
 ('event', 0.0007924384511368963),
 ('producers', 0.0007895539020453444),
 ('okay', 0.0007863956864371629),
 ('fit', 0.000785871771922548),
 ('mentioned', 0.0007854073087003715),
 ('detail', 0.0007852359533368501),
 ('information', 0.0007839070924927416),
 ('allen', 0.0007790149847387508),
 ('seven', 0.0007784369946922017),
 ('shouldn', 0.0007783256780906441),
 ('naturally', 0.0007776856076316881),
 ('comments', 0.0007747509449613798),
 ('entertain', 0.0007747509449613798),
 ('jail', 0.0007734819016444898),
 ('fbi', 0.0007733651320337342),
 ('climactic', 0.0007732919036337689),
 ('bad', 0.0007712559966343031),
 ('ended', 0.0007711136165812794),
 ('judge', 0.0007694203499660373),
 ('ones', 0.0007682591154179706),
 ('nice', 0.0007668341805484552),
 ('kill', 0.000764042000480255),
 ('critics', 0.0007636954961716471),
 ('danny', 0.0007634624490260348),
 ('presented', 0.0007617330823469355),
 ('rent', 0.0007604037052398728),
 ('sub', 0.0007604037052398728),
 ('genius', 0.0007595059440766616),
 ('thankfully', 0.0007594300769361086),
 ('wanted', 0.0007590994107197358),
 ('breaking', 0.0007584286306808082),
 ('batman', 0.0007559768139867969),
 ('total', 0.0007557951979353888),
 ('wasn', 0.0007554151148587302),
 ('bigger', 0.0007552449284064951),
 ('ensemble', 0.000752202124443757),
 ('steals', 0.000752202124443757),
 ('lot', 0.0007517244632705712),
 ('kiss', 0.0007491175649023608),
 ('directing', 0.0007486014304357063),
 ('perspective', 0.0007479380707277437),
 ('badly', 0.0007476696718985352),
 ('crash', 0.0007476696718985352),
 ('adds', 0.0007473255088352558),
 ('really', 0.0007456730464617401),
 ('job', 0.0007452354168096464),
 ('army', 0.000744825652379645),
 ('brown', 0.0007446186605355376),
 ('mainly', 0.0007441383853416938),
 ('pay', 0.0007420807243907193),
 ('dumb', 0.000741226368392181),
 ('explosions', 0.0007406529596492268),
 ('yeah', 0.0007402779454924423),
 ('driver', 0.0007399796387768184),
 ('recommend', 0.0007395349929176807),
 ('blame', 0.0007389503091672745),
 ('twice', 0.0007382828701783492),
 ('gary', 0.0007379596761595932),
 ('wouldn', 0.0007373611687174524),
 ('cares', 0.0007367547861773888),
 ('killed', 0.000736304171629268),
 ('fiction', 0.0007362894228326887),
 ('price', 0.0007357152732515652),
 ('murphy', 0.0007352663926699596),
 ('hits', 0.0007338161630986185),
 ('accent', 0.0007334803204610447),
 ('acts', 0.0007329420521241115),
 ('saw', 0.0007328073077446421),
 ('suspenseful', 0.0007327526614129683),
 ('guilty', 0.0007299875570302779),
 ('advice', 0.000729680323209979),
 ('ending', 0.0007295169009651391),
 ('aren', 0.0007290304055131928),
 ('jackson', 0.000728289303944846),
 ('ok', 0.0007282513286969607),
 ('actor', 0.0007277390106091889),
 ('news', 0.0007277251988989857),
 ('fights', 0.0007273426745772697),
 ('thinks', 0.00072659677209384),
 ('throw', 0.0007247925124324958),
 ('saying', 0.0007246200014638788),
 ('cop', 0.0007238458347956482),
 ('loves', 0.0007235356468040002),
 ('extra', 0.0007228772886176453),
 ('villains', 0.0007228772886176453),
 ('performance', 0.0007227970372811597),
 ('range', 0.0007226977363850031),
 ('flash', 0.0007224950161223425),
 ('gives', 0.0007223630680223859),
 ('thrills', 0.0007222643344441425),
 ('said', 0.0007214337497047879),
 ('surprised', 0.0007209945072622753),
 ('treat', 0.0007200792663256371),
 ('guys', 0.0007196493682561889),
 ('writing', 0.0007196184155951888),
 ('particular', 0.0007191066917321584),
 ('witty', 0.0007183570148845284),
 ('natural', 0.000717863637813866),
 ('acted', 0.0007174324884818456),
 ('liked', 0.0007171432011881029),
 ('cliched', 0.0007164134082425248),
 ('grace', 0.0007158548012965267),
 ('national', 0.000715805247454543),
 ('acting', 0.0007155453571609738),
 ('aliens', 0.0007152403336559289),
 ('chemistry', 0.0007146731327569155),
 ('guess', 0.0007139107996902104),
 ('instance', 0.0007133969307341352),
 ('violent', 0.0007131582166867087),
 ('mediocre', 0.0007118275471655811),
 ('alien', 0.0007110268412632576),
 ('scary', 0.0007110268412632576),
 ('ask', 0.0007106124899571602),
 ('probably', 0.0007102573316947909),
 ('nevertheless', 0.0007100225660637334),
 ('mean', 0.0007095577775416395),
 ('allowed', 0.0007091154787867436),
 ('loud', 0.0007090011237667811),
 ('flick', 0.0007089106899499742),
 ('fun', 0.000708794736453356),
 ('slightly', 0.000708672447749141),
 ('plain', 0.0007081364882499924),
 ('allows', 0.0007078066110039132),
 ('prison', 0.0007071414486880486),
 ('trailer', 0.00070639776026545),
 ('stuff', 0.0007058992438503015),
 ('fantastic', 0.0007056402742839906),
 ('dog', 0.0007054282047178778),
 ('critic', 0.0007051016175860639),
 ('hey', 0.0007051016175860639),
 ('overall', 0.0007051016175860639),
 ('working', 0.0007049068919253111),
 ('developed', 0.0007046989324817885),
 ('person', 0.0007034519814486633),
 ('visuals', 0.0007030783704767782),
 ('emotion', 0.0007022736699219486),
 ('menace', 0.0007016129344864077),
 ('murdered', 0.0007013310207005769),
 ('requires', 0.0007008109383715443),
 ('track', 0.0007006720814390355),
 ('usual', 0.0007006493308681725),
 ('lines', 0.0006999170468685193),
 ('saving', 0.0006999170468685193),
 ('yes', 0.0006999170468685193),
 ('able', 0.0006998405782738653),
 ('get', 0.0006997175379902146),
 ('maybe', 0.0006996916307503651),
 ('think', 0.0006990926451643161),
 ('bring', 0.0006989242567311171),
 ('remember', 0.0006988801327250104),
 ('de', 0.0006986421524297788),
 ('annoying', 0.0006978596775361603),
 ('wonderfully', 0.0006977822236318833),
 ('disappointing', 0.00069756042381509),
 ('included', 0.0006972871921567213),
 ('friends', 0.0006965130357913436),
 ('tell', 0.0006964706559291111),
 ('williams', 0.0006959289155473312),
 ('realistic', 0.0006955301024152124),
 ('except', 0.0006951165184263484),
 ('episode', 0.0006940976307569897),
 ('impressive', 0.0006938603053760607),
 ('terribly', 0.0006936598063473447),
 ('very', 0.0006935024277037634),
 ('language', 0.000693488179178764),
 ('doing', 0.000693268245804233),
 ('feeling', 0.0006931194985944053),
 ('somewhere', 0.0006923647194453244),
 ('study', 0.0006912760956726117),
 ('theatre', 0.0006912760956726117),
 ('dull', 0.0006902443403059361),
 ('decided', 0.000689946718565549),
 ('hotel', 0.000689668476845466),
 ('seemingly', 0.0006890461727833452),
 ('thrillers', 0.0006890461727833452),
 ('mood', 0.0006884254725976731),
 ('confused', 0.0006883344952654942),
 ('anti', 0.0006881339316013725),
 ('brilliant', 0.0006881339316013725),
 ('reason', 0.000688112360680975),
 ('smart', 0.0006876378004322295),
 ('direction', 0.0006873678209267594),
 ('jackie', 0.0006873678209267594),
 ('actually', 0.0006873117883139156),
 ('drop', 0.000686248633158629),
 ('planet', 0.0006861555320009626),
 ('brian', 0.0006859585872443607),
 ('above', 0.000685583233708249),
 ('lawyer', 0.000685264999188502),
 ('better', 0.0006851280870126166),
 ('warm', 0.0006849917675301333),
 ('biggest', 0.0006847808840354194),
 ('hundred', 0.0006846925138090629),
 ('screenplay', 0.0006846474207826003),
 ('did', 0.0006843905146410103),
 ('lose', 0.0006843633347158854),
 ('will', 0.0006842884672687007),
 ('direct', 0.0006840940063669222),
 ('scene', 0.0006840515924536997),
 ('george', 0.0006839995051918474),
 ('considered', 0.000683922094654818),
 ('sheer', 0.0006838028405842591),
 ('criminal', 0.0006834206854945138),
 ('general', 0.0006833962645302296),
 ('develops', 0.000683143435723522),
 ('rules', 0.000683143435723522),
 ('guy', 0.0006827392523224378),
 ('talent', 0.0006825963958166327),
 ('looks', 0.0006825376168108359),
 ('had', 0.0006825121813937092),
 ('great', 0.0006824846052572631),
 ('tension', 0.0006824512944512591),
 ('learn', 0.0006824341921233109),
 ('fact', 0.0006821735781395314),
 ('entertainment', 0.0006821027635973353),
 ('agent', 0.0006814007228772886),
 ('explained', 0.0006814007228772886),
 ('hit', 0.0006810888689995415),
 ('reasons', 0.0006807222621508924),
 ('moved', 0.0006804749066777271),
 ('offensive', 0.0006802156781418498),
 ('threatening', 0.0006799437006615852),
 ('feel', 0.0006798736033728572),
 ('huge', 0.000679823000596379),
 ('running', 0.0006792911231160586),
 ('master', 0.0006792539027043922),
 ('cops', 0.0006791217906937525),
 ('why', 0.000678920708442264),
 ('gore', 0.0006787074393876551),
 ('failure', 0.0006781978992679947),
 ('soundtrack', 0.0006781978992679947),
 ('besides', 0.0006781089319455142),
 ('either', 0.0006780236523876963),
 ('aforementioned', 0.0006779823246019844),
 ('feels', 0.000677834616034533),
 ('me', 0.0006772941322099265),
 ('definitely', 0.0006772162428792703),
 ('capable', 0.0006770439407617049),
 ('intelligent', 0.0006762483544623375),
 ('rated', 0.0006761248387811572),
 ('flicks', 0.0006759144046576647),
 ('girls', 0.0006755789965569017),
 ('care', 0.0006753585539959397),
 ('anyway', 0.0006753514107461222),
 ('well', 0.0006750278432664558),
 ('relief', 0.0006745639263266804),
 ('done', 0.0006741033421380078),
 ('asking', 0.0006739941932807963),
 ('evil', 0.0006738733408165179),
 ('jump', 0.0006732428062202827),
 ('supporting', 0.0006732428062202827),
 ('gets', 0.0006727355113724907),
 ('feet', 0.0006727296638374928),
 ('sure', 0.0006725072227117109),
 ('although', 0.0006724942545826389),
 ('credit', 0.0006722064102747464),
 ('weird', 0.0006719203649937786),
 ('happening', 0.0006718035295973268),
 ('necessary', 0.0006717400320992552),
 ('right', 0.0006715253500819656),
 ('1996', 0.0006704431174468618),
 ('hurt', 0.0006704431174468618),
 ('basically', 0.0006703084795005513),
 ('dies', 0.0006700060619596082),
 ('roles', 0.0006697805845704244),
 ('interesting', 0.0006689558957182921),
 ('star', 0.0006687483070094306),
 ('usually', 0.0006687411040075134),
 ('whom', 0.0006685774776057497),
 ('try', 0.0006682335591501913),
 ('though', 0.0006680374524563834),
 ('haunting', 0.0006679343054291209),
 ('major', 0.0006678430076837096),
 ('role', 0.0006678107603149175),
 ('path', 0.0006675134798838656),
 ('regular', 0.0006675134798838656),
 ('sign', 0.0006674389889252802),
 ('loved', 0.0006670457995356336),
 ('don', 0.0006670226685137735),
 ('thing', 0.0006670088013375039),
 ('expecting', 0.0006667751707626963),
 ('make', 0.0006663531085448293),
 ('knows', 0.0006663202799443585),
 ('isn', 0.0006661965036826523),
 ('amount', 0.0006661387831026985),
 ('relatively', 0.0006661387831026985),
 ('bruce', 0.0006656732773143668),
 ('ideas', 0.0006653532420848887),
 ('he', 0.0006653376447000585),
 ('want', 0.0006651063577650056),
 ('reminded', 0.0006650552782505471),
 ('subplots', 0.0006650552782505471),
 ('grow', 0.0006648449508380707),
 ('rise', 0.0006648449508380707),
 ('sometimes', 0.0006648229310006633),
 ('special', 0.0006647811930510132),
 ('individual', 0.0006646885535313573),
 ('forever', 0.0006645170210014137),
 ('scenes', 0.0006644715123710205),
 ('action', 0.0006642620639475215),
 ('aspect', 0.0006642051436742437),
 ('kind', 0.0006640702385978397),
 ('getting', 0.0006640336879613757),
 ('just', 0.0006639106047940057),
 ('believable', 0.0006636250518457072),
 ('boring', 0.0006636250518457072),
 ('cliche', 0.0006636250518457072),
 ('funny', 0.0006636250518457072),
 ('irritating', 0.0006636250518457072),
 ('weight', 0.0006636250518457072),
 ('went', 0.0006636250518457072),
 ('also', 0.0006635828794351934),
 ('effects', 0.0006633694181585554),
 ('jack', 0.0006633457483727081),
 ('bit', 0.0006630408748634487),
 ('need', 0.00066283752211646),
 ('but', 0.0006625489862068561),
 ('disappointment', 0.0006623869454056965),
 ('hardly', 0.0006622308815687205),
 ('tight', 0.0006621697337495544),
 ('likes', 0.000661949231007713),
 ('budget', 0.0006618118686439429),
 ('frightening', 0.0006616499772866426),
 ('heard', 0.0006615893921774687),
 ('black', 0.0006614045091292062),
 ('serves', 0.0006612206132520633),
 ('typical', 0.0006606624400071102),
 ('myself', 0.0006603810746369642),
 ('again', 0.0006602878569010809),
 ('superb', 0.0006600571752228807),
 ('we', 0.0006600378894032979),
 ('musical', 0.0006598544549602202),
 ('nobody', 0.0006598544549602202),
 ('afraid', 0.0006595453896417377),
 ('richard', 0.0006595031571137463),
 ('system', 0.0006593710451031064),
 ('him', 0.000658615728676154),
 ('longer', 0.0006585592117552819),
 ('terrible', 0.0006584042253888791),
 ('decides', 0.0006583581863548682),
 ('knowing', 0.0006583581863548682),
 ('does', 0.0006581230584311701),
 ('makes', 0.0006581059926947726),
 ('wars', 0.0006580639480592907),
 ('sounds', 0.0006580116820462604),
 ('nothing', 0.0006577440462556566),
 ('built', 0.0006576998281685133),
 ('reading', 0.0006575553105178501),
 ('confusing', 0.0006574476909907604),
 ('wasted', 0.0006572981180887036),
 ('grown', 0.0006572440417318061),
 ('drawn', 0.0006571006482461006),
 ('fly', 0.000656712290888981),
 ('responsible', 0.000656712290888981),
 ('played', 0.0006564938091899696),
 ('was', 0.0006564883957972653),
 ('survive', 0.0006562747743727326),
 ('childhood', 0.0006560838580747332),
 ('gave', 0.0006559768907872017),
 ('too', 0.0006556821711522463),
 ('basic', 0.0006555973294443478),
 ('calls', 0.0006553297386976359),
 ('surprising', 0.0006553297386976359),
 ('some', 0.0006548712038000038),
 ('brief', 0.0006547372163299164),
 ('became', 0.0006544925970037938),
 ('beat', 0.0006542782201295704),
 ('started', 0.0006538658599067997),
 ('anyone', 0.0006534515545886386),
 ('jerry', 0.0006531742636276646),
 ('however', 0.0006529728297040989),
 ('heroes', 0.0006529479161105658),
 ('like', 0.000652946803213366),
 ('admit', 0.0006528050781743098),
 ('shoot', 0.0006528050781743098),
 ('case', 0.0006527621417708519),
 ('then', 0.0006527316279785068),
 ('depth', 0.0006526033071035145),
 ('script', 0.0006525362013649949),
 ('movies', 0.0006524133101944996),
 ('times', 0.0006520875564461009),
 ('buy', 0.0006517746044913196),
 ('provide', 0.0006517746044913196),
 ('performances', 0.0006514996521165078),
 ('tough', 0.0006513116963915387),
 ('thrown', 0.0006512548480284078),
 ('hill', 0.0006512208452691519),
 ('beginning', 0.000651103824452392),
 ('loving', 0.0006510856249939714),
 ('ups', 0.0006510245761777506),
 ('see', 0.0006509615377774679),
 ('course', 0.000650951656758376),
 ('problem', 0.0006504279627465027),
 ('best', 0.0006503077449163204),
 ('room', 0.00065000588100559),
 ('filmmakers', 0.0006499420610860018),
 ('places', 0.0006493805747227564),
 ('never', 0.0006493165422671562),
 ('supposedly', 0.0006487360282466047),
 ('kevin', 0.0006486366355657029),
 ('especially', 0.0006485261265981212),
 ('even', 0.000648425062841444),
 ('occasionally', 0.0006482891787988526),
 ('company', 0.0006482129946307112),
 ('money', 0.0006480713396930734),
 ('fair', 0.0006478244553731903),
 ('science', 0.0006477404096472726),
 ('not', 0.0006476204670378808),
 ('next', 0.0006475053385230357),
 ('know', 0.0006468572043976747),
 ('seems', 0.000646841698041768),
 ('memories', 0.0006467532284936977),
 ('unbelievable', 0.0006467532284936977),
 ('sick', 0.0006463880375120525),
 ('actors', 0.0006462354435466341),
 ('supposed', 0.0006459044138513752),
 ('idea', 0.0006457879795324968),
 ('likable', 0.0006457743779827689),
 ('extremely', 0.0006455626764426486),
 ('ve', 0.0006454666510550725),
 ('plays', 0.0006453135893114008),
 ('creature', 0.0006451910226277709),
 ('held', 0.0006451910226277709),
 ('mike', 0.0006451910226277709),
 ('seconds', 0.0006451910226277709),
 ('time', 0.0006449425340547377),
 ('entertaining', 0.0006446039516335691),
 ('my', 0.0006441661752358124),
 ('help', 0.0006441611885932493),
 ('awful', 0.0006441436346040245),
 ('could', 0.0006440929900534719),
 ('considering', 0.0006440669964559454),
 ('dr', 0.0006440243119177749),
 ('should', 0.000643552677141085),
 ('slowly', 0.0006433766496732495),
 ('fans', 0.0006433099992381855),
 ('pull', 0.0006432382652953624),
 ('mistake', 0.0006431873237997343),
 ('moral', 0.0006431197833898005),
 ('occur', 0.0006428867689755288),
 ('characterization', 0.0006425946804843995),
 ('entirely', 0.0006425946804843995),
 ('fire', 0.0006425946804843995),
 ('bond', 0.0006422177921087489),
 ('nomination', 0.0006422177921087489),
 ('doesn', 0.0006421234962308942),
 ('series', 0.000641827148682892),
 ('today', 0.000641765780712276),
 ('albeit', 0.0006417129039074055),
 ('present', 0.0006415906262961427),
 ('ahead', 0.0006415042167841836),
 ('speed', 0.0006414399120310978),
 ('anywhere', 0.0006410014705327854),
 ('efforts', 0.0006410014705327854),
 ('mad', 0.0006410014705327854),
 ('possible', 0.0006410014705327854),
 ('realize', 0.0006410014705327854),
 ('selling', 0.0006410014705327854),
 ('it', 0.0006405730734197016),
 ('flashbacks', 0.000640446970990802),
 ('holes', 0.000640446970990802),
 ('predictable', 0.0006403799435736392),
 ('flaw', 0.0006403399623072613),
 ('generally', 0.0006402693157977394),
 ('used', 0.0006399878692194824),
 ('animals', 0.000639924157136932),
 ('got', 0.0006397980885480555),
 ('things', 0.0006396737955731069),
 ('non', 0.0006396386041886335),
 ('pieces', 0.0006394303884971658),
 ('everything', 0.000639365173771159),
 ('so', 0.0006390972320839649),
 ('hasn', 0.0006390776966116186),
 ('place', 0.0006386716708311836),
 ('appearance', 0.0006386074407642222),
 ('largely', 0.0006386074407642222),
 ('stuck', 0.0006384594951043672),
 ('wants', 0.0006382347851870402),
 ('revolves', 0.0006381010113901031),
 ('theme', 0.0006378593064615462),
 ('seemed', 0.0006378000203469945),
 ('exciting', 0.0006377021982579842),
 ('fake', 0.0006377021982579842),
 ('saved', 0.0006376248166054836),
 ('go', 0.0006376136925584035),
 ('frank', 0.0006375101771202974),
 ('helped', 0.0006375101771202974),
 ('oh', 0.0006375101771202974),
 ('decent', 0.0006373228394249932),
 ('difference', 0.0006373228394249932),
 ('happened', 0.0006373228394249932),
 ('trust', 0.0006373228394249932),
 ('directors', 0.0006372308736472984),
 ('work', 0.0006371939070111661),
 ('etc', 0.0006370800497718789),
 ('our', 0.0006369861926514015),
 ('strikes', 0.000636961545298335),
 ('seen', 0.0006367336520799815),
 ('little', 0.0006363792864759592),
 ('funniest', 0.0006363527894410891),
 ('damn', 0.0006362882244259267),
 ('couple', 0.0006362330341904833),
 ('this', 0.0006362222842527883),
 ('way', 0.0006359903405904665),
 ('began', 0.0006359740080188027),
 ('pulls', 0.0006359740080188027),
 ('making', 0.0006359280760523128),
 ('instead', 0.0006357293085159097),
 ('always', 0.0006355965193658757),
 ('problems', 0.0006355965193658757),
 ('or', 0.0006355875258306249),
 ('entire', 0.000635364058522621),
 ('turn', 0.0006352884449487142),
 ('personal', 0.0006352463489707263),
 ('later', 0.0006351777737724782),
 ('exact', 0.000635109912899212),
 ('attention', 0.0006350561310452954),
 ('happens', 0.0006350094367225153),
 ('ever', 0.0006349762899425742),
 ('common', 0.0006349105063331525),
 ('describe', 0.000634717142390307),
 ('straight', 0.0006345914558274575),
 ('minor', 0.000634529550505457),
 ('been', 0.0006344902934719933),
 ('face', 0.0006343474760289847),
 ('fight', 0.0006343474760289847),
 ('twist', 0.0006343474760289847),
 ('have', 0.0006342080874479353),
 ('move', 0.0006342056273089425),
 ('society', 0.0006341900697073896),
 ('followed', 0.0006339989334597381),
 ('combination', 0.0006338871367865835),
 ('nearly', 0.0006336322258054493),
 ('hot', 0.0006335790357188346),
 ('may', 0.0006335218734437214),
 ('if', 0.0006334845249732937),
 ('social', 0.0006334602767618115),
 ('strong', 0.0006329819174554437),
 ('add', 0.0006326933757003564),
 ('subtle', 0.0006325922256802605),
 ('talking', 0.0006325176275404396),
 ('patrick', 0.0006324149627737556),
 ('took', 0.0006322647216517789),
 ('eddie', 0.0006318913035611389),
 ('government', 0.0006318913035611389),
 ('put', 0.0006318847691429929),
 ('before', 0.0006317650285653122),
 ('learned', 0.000631720001276202),
 ('together', 0.000631683328804283),
 ('cross', 0.0006314900549657912),
 ('deserves', 0.0006314900549657912),
 ('give', 0.0006313901451384067),
 ('character', 0.000631182985573547),
 ('ability', 0.0006310363216211411),
 ('player', 0.0006309111408392287),
 ('poor', 0.0006306710681067937),
 ('formula', 0.0006306130913584845),
 ('needs', 0.0006305939406678666),
 ('interested', 0.0006305786823940408),
 ('do', 0.0006304871168155528),
 ('game', 0.0006304437992534219),
 ('suspense', 0.0006303616674400746),
 ('short', 0.0006301762085067098),
 ('wild', 0.0006300908072045678),
 ('follow', 0.0006299954039481207),
 ('second', 0.0006299059485256166),
 ('all', 0.0006294991237565685),
 ('ago', 0.00062944335947677),
 ('say', 0.0006293887843642655),
 ('because', 0.0006290448272023219),
 ('powerful', 0.0006289852826559587),
 ('seeing', 0.0006288777224349069),
 ('audiences', 0.0006284328142478287),
 ('worker', 0.0006284328142478287),
 ('days', 0.0006283205941024273),
 ('were', 0.0006281126594862564),
 ('shot', 0.0006281077627921833),
 ('charming', 0.0006280737097825443),
 ('oliver', 0.0006280737097825443),
 ('film', 0.0006279666236763682),
 ('singing', 0.0006279091202359556),
 ('leaves', 0.0006278772935280517),
 ('films', 0.0006278191103276649),
 ('quite', 0.0006278043814335809),
 ('laughable', 0.0006277534274216149),
 ('battle', 0.0006275584729410491),
 ('powers', 0.0006275584729410491),
 ('details', 0.0006273766246440509),
 ('hell', 0.000627333056822895),
 ('taking', 0.0006272902091310146),
 ('mark', 0.0006271456627005742),
 ('perfectly', 0.0006271456627005742),
 ('robert', 0.000627137076486493),
 ('made', 0.0006271226129930316),
 ('generated', 0.000627086172503012),
 ('big', 0.0006268262942715562),
 ('starring', 0.0006266568084684328),
 ('suppose', 0.0006266568084684328),
 ('dramatic', 0.0006264244207177584),
 ('what', 0.0006260189663520424),
 ('dozen', 0.0006259190829908375),
 ('touches', 0.0006259190829908375),
 ('wrong', 0.0006259190829908375),
 ('seriously', 0.0006257867813457326),
 ('thoughts', 0.0006257867813457326),
 ('seem', 0.0006257614273719321),
 ('back', 0.0006256700813097204),
 ('loose', 0.0006256634493036858),
 ('sam', 0.0006256241759718609),
 ('violence', 0.0006255706449948188),
 ('any', 0.0006253106130995327),
 ('gotten', 0.0006251540343474053),
 ('record', 0.0006251540343474053),
 ('robin', 0.0006250970571295465),
 ('surprises', 0.0006250693710166432),
 ('completely', 0.0006249764337694657),
 ('join', 0.0006246470744029624),
 ('results', 0.0006245882840900773),
 ('people', 0.0006245715157190483),
 ('bunch', 0.0006245321967800837),
 ('industry', 0.0006245321967800837),
 ('cliches', 0.0006244274182888866),
 ('amazing', 0.0006244026472868915),
 ('point', 0.0006242677266906242),
 ('ass', 0.0006242017814390315),
 ('disturbing', 0.000624161911626727),
 ('which', 0.0006240510808640825),
 ('sense', 0.0006240167998774386),
 ('monster', 0.000623891198951584),
 ('write', 0.000623891198951584),
 ('ship', 0.00062371956814097),
 ('hold', 0.0006237077554940856),
 ('order', 0.0006236089285609969),
 ('movie', 0.0006234062228935263),
 ('unlike', 0.0006233902994508702),
 ('re', 0.0006232476883776215),
 ('save', 0.0006229081301665292),
 ('heart', 0.0006228812876201978),
 ('killer', 0.0006228611418740851),
 ('between', 0.0006227931995624544),
 ('take', 0.0006223776384022585),
 ('asks', 0.0006221484861053505),
 ('edge', 0.0006221484861053505),
 ('finally', 0.0006221484861053505),
 ('lacking', 0.0006221484861053505),
 ('quiet', 0.0006221484861053505),
 ('shooting', 0.0006221484861053505),
 ('stunning', 0.0006221484861053505),
 ('tommy', 0.0006221484861053505),
 ('tradition', 0.0006221484861053505),
 ('going', 0.0006216814076623284),
 ('they', 0.000621589734442527),
 ('cast', 0.0006213394503626908),
 ('sound', 0.000621302025580037),
 ('mission', 0.0006211748578015862),
 ('there', 0.0006210483119477813),
 ('doubt', 0.0006209634413699117),
 ('kids', 0.0006208839566620469),
 ('brought', 0.0006208275763683964),
 ('inside', 0.0006208275763683964),
 ('six', 0.0006207377185631616),
 ('small', 0.0006206982565340094),
 ('thought', 0.0006206420733060639),
 ('race', 0.0006205155504462813),
 ('can', 0.000620277579253773),
 ('one', 0.0006202348373100844),
 ('explain', 0.000620135060583974),
 ('using', 0.0006200746578183327),
 ('many', 0.0006198587703310405),
 ('humanity', 0.0006197086881206236),
 ('much', 0.0006196181929293892),
 ('fan', 0.0006195233870078595),
 ('accept', 0.00061938338172266),
 ('trying', 0.0006192172800459613),
 ('1995', 0.0006191429378632956),
 ('lee', 0.00061902994732788),
 ('car', 0.0006189182239760392),
 ('claims', 0.0006188566951735761),
 ('out', 0.0006185562072468923),
 ('effectively', 0.0006185101908649683),
 ('frankly', 0.0006183778892198636),
 ('hard', 0.000618262266146714),
 ('told', 0.0006182356025449395),
 ('born', 0.0006181603547841623),
 ('fully', 0.0006180821561308057),
 ('air', 0.0006180282974556462),
 ('still', 0.0006179889451285238),
 ('rob', 0.0006177360854946742),
 ('against', 0.000617664533052339),
 ('silent', 0.0006176401637422683),
 ('failed', 0.0006175399788008664),
 ('plot', 0.0006173511305174044),
 ('important', 0.0006173256296239136),
 ('none', 0.0006170067630796864),
 ('broken', 0.0006169639153878059),
 ('shock', 0.0006169639153878059),
 ('south', 0.0006169639153878059),
 ('books', 0.0006168309776770996),
 ('spend', 0.0006166182773399696),
 ('means', 0.0006164822885998373),
 ('girlfriend', 0.0006164407018291546),
 ('same', 0.0006164275804859909),
 ('suspects', 0.0006163878519747455),
 ('five', 0.000616306716282765),
 ('being', 0.0006162586897745075),
 ('weren', 0.0006162232624281567),
 ('obsessed', 0.0006160489911435334),
 ('whatever', 0.0006160489911435334),
 ('van', 0.0006160232548778717),
 ('college', 0.0006159579539052972),
 ('recently', 0.0006159579539052972),
 ('logic', 0.0006158641579628722),
 ('them', 0.0006158507656812686),
 ('marry', 0.000615458717437551),
 ('speech', 0.000615458717437551),
 ('far', 0.00061529015633726),
 ('would', 0.0006151668925359685),
 ('shows', 0.0006150671212228505),
 ('those', 0.0006149973540811511),
 ('here', 0.0006148167699391258),
 ('must', 0.0006147659258603031),
 ('long', 0.0006147065185682051),
 ('exist', 0.0006146527212125149),
 ('something', 0.0006145255546073584),
 ('land', 0.000614467640597877),
 ('no', 0.0006144303549400738),
 ('telling', 0.0006143521391616743),
 ('she', 0.0006141595264514455),
 ('winner', 0.0006140686356364499),
 ('almost', 0.0006140554976682077),
 ('throughout', 0.0006139081088059418),
 ('liners', 0.0006138531729572792),
 ('chance', 0.0006137787755299422),
 ('standing', 0.0006136259041039073),
 ('that', 0.0006135531164153746),
 ('fascinating', 0.0006135075349094428),
 ('ex', 0.0006133504267058808),
 ('quickly', 0.000613202560161352),
 ('minutes', 0.000613131841379186),
 ('obviously', 0.0006130527480043951),
 ('mess', 0.0006130184244643914),
 ('cute', 0.0006128626878052707),
 ('plenty', 0.0006128626878052707),
 ('comedies', 0.000612721993891633),
 ('enough', 0.000612576970934499),
 ('drama', 0.0006122731133100275),
 ('notice', 0.0006122731133100275),
 ('terms', 0.0006122731133100275),
 ('decide', 0.0006121138331036512),
 ('destroy', 0.0006117793446702613),
 ('50', 0.0006116035965103445),
 ('style', 0.0006116035965103445),
 ('succeeds', 0.0006115134692488488),
 ('theater', 0.0006115134692488488),
 ('has', 0.0006113816301969087),
 ('talented', 0.0006113753521468163),
 ('superior', 0.0006112336003842039),
 ('off', 0.000610998871659018),
 ('introduced', 0.0006108366954488896),
 ('certain', 0.0006106851136645484),
 ('remarkable', 0.0006106272178441403),
 ('taste', 0.0006106272178441403),
 ('john', 0.0006101940874583805),
 ('end', 0.0006100413906444177),
 ('smith', 0.0006098461149111769),
 ('read', 0.0006098176152095687),
 ('other', 0.0006097992006179753),
 ('sweet', 0.0006096555446172912),
 ('use', 0.0006096429888972028),
 ('visually', 0.0006093470769262281),
 ('ed', 0.000609187059311489),
 ('fox', 0.0006090702897007335),
 ('play', 0.0006088723226785255),
 ('credits', 0.000608867812346123),
 ('tried', 0.0006085813851622432),
 ('part', 0.0006082067833354827),
 ('office', 0.0006081900264811919),
 ('main', 0.0006081150616067336),
 ('despite', 0.0006080087477847743),
 ('where', 0.0006079708396391428),
 ('meeting', 0.000607944182769612),
 ('ways', 0.0006078840587343283),
 ('involved', 0.0006076402223690117),
 ('figures', 0.0006075440615488869),
 ('door', 0.0006073354269123659),
 ('halfway', 0.0006073354269123659),
 ('screenwriter', 0.0006073354269123659),
 ('willing', 0.0006073354269123659),
 ('opening', 0.0006072386095320195),
 ('married', 0.0006071569563196793),
 ('truth', 0.0006070411277231013),
 ('humor', 0.0006069741327857078),
 ('highly', 0.0006068997487008076),
 ('effort', 0.0006068383443891115),
 ('comic', 0.0006066880695697419),
 ('led', 0.0006064640704892491),
 ('friend', 0.0006064299831964133),
 ('worse', 0.0006060657361243958),
 ('than', 0.0006058864534585655),
 ('driving', 0.0006057761575236307),
 ('final', 0.0006057761575236307),
 ('paced', 0.0006057761575236307),
 ('yet', 0.0006057761575236307),
 ('points', 0.0006056087513009137),
 ('editing', 0.0006055578598092078),
 ('disaster', 0.0006054973100782),
 ('works', 0.0006053961127561814),
 ('hope', 0.0006052028074954771),
 ('conclusion', 0.0006051498935888108),
 ('manage', 0.0006050699002122624),
 ('pg', 0.0006050699002122624),
 ('comes', 0.0006048901606608638),
 ('generation', 0.0006048665837135352),
 ('past', 0.0006048665837135352),
 ('adaptation', 0.0006046583680220675),
 ('score', 0.0006045405100835009),
 ('students', 0.000604372815073769),
 ('value', 0.000604372815073769),
 ('you', 0.000604281417718327),
 ('only', 0.000604277821507802),
 ('watch', 0.0006039208079607493),
 ('how', 0.0006036931915274067),
 ('talk', 0.0006036321621141198),
 ('woody', 0.000603555542842432),
 ('about', 0.0006033704212867672),
 ('owner', 0.0006032955016779157),
 ('is', 0.0006032580875393416),
 ('are', 0.0006032575189779611),
 ('ll', 0.0006030994567791901),
 ('came', 0.0006030916856300515),
 ('field', 0.0006028570601796031),
 ('90', 0.0006025840683032954),
 ('enjoyed', 0.0006025840683032954),
 ('multiple', 0.0006025840683032954),
 ('everyone', 0.000602438547840305),
 ('obvious', 0.0006022624614353165),
 ('wonderful', 0.0006020791801019521),
 ('look', 0.000602031109907932),
 ('wait', 0.0006019862666482326),
 ('likely', 0.0006019160150124935),
 ('jeff', 0.0006018168362326266),
 ('dialogue', 0.0006017698266375452),
 ('didn', 0.0006017325599637242),
 ('now', 0.0006016610695602147),
 ('island', 0.0006015685107379979),
 ('over', 0.0006014962542014384),
 ('1998', 0.0006014102032351721),
 ('agree', 0.0006014102032351721),
 ('date', 0.0006014102032351721),
 ('opera', 0.0006014102032351721),
 ('remake', 0.0006011771888209005),
 ('be', 0.0006011213317860572),
 ('chief', 0.0006011096484109667),
 ('games', 0.0006011096484109667),
 ('producer', 0.0006010587069153386),
 ('while', 0.000600992473040906),
 ('due', 0.0006009869729725155),
 ('phone', 0.0006009869729725155),
 ('building', 0.0006006950900327521),
 ('given', 0.0006006665994669187),
 ('falls', 0.0006004557215967957),
 ('mary', 0.0006003950425352334),
 ('figure', 0.0006003187146630575),
 ('travel', 0.0006003187146630575),
 ('naked', 0.0006001903042428087),
 ('am', 0.0006000416871066832),
 ('addition', 0.0006000008053702085),
 ('unnecessary', 0.0005998149507066969),
 ('super', 0.0005997287208402928),
 ('hero', 0.0005996418225253119),
 ('forces', 0.0005995622374348592),
 ('onto', 0.0005994932191043153),
 ('stands', 0.0005994932191043153),
 ('choice', 0.0005994659892160929),
 ('imagine', 0.0005994659892160929),
 ('worth', 0.0005993948842101705),
 ('enjoy', 0.0005993089675258589),
 ('without', 0.000599238187956086),
 ('position', 0.00059910594958293),
 ('for', 0.0005988931282437008),
 ('released', 0.0005988905987743094),
 ('several', 0.0005988859731006158),
 ('law', 0.0005988595053420486),
 ('technology', 0.000598522594227932),
 ('otherwise', 0.0005982196981782216),
 ('early', 0.0005981933738790719),
 ('who', 0.0005981748562238021),
 ('version', 0.000598001170434595),
 ('immediately', 0.0005979750275450198),
 ('studio', 0.0005979750275450198),
 ('yourself', 0.0005979538227568091),
 ('pacing', 0.0005977505062580819),
 ('witness', 0.0005977505062580819),
 ('most', 0.0005976870249575253),
 ('audience', 0.0005976437317292098),
 ('happen', 0.0005976396063496852),
 ('similar', 0.0005976193343234191),
 ('often', 0.000597486744313787),
 ('stop', 0.0005973863573051375),
 ('surprisingly', 0.0005971756847433556),
 ('mental', 0.0005970111735354374),
 ('every', 0.0005969647212682807),
 ('creating', 0.0005967546703459484),
 ('girl', 0.0005964550383015896),
 ('background', 0.000596414850427027),
 ('million', 0.0005963657560505341),
 ('1997', 0.0005962256325176275),
 ('around', 0.0005961969250385714),
 ('leave', 0.0005961050611055916),
 ('unfortunately', 0.0005960899107710949),
 ('growing', 0.000595860521903716),
 ('members', 0.0005957036958682102),
 ('brings', 0.0005954556467674971),
 ('faces', 0.0005954556467674971),
 ('apparently', 0.0005953574029716272),
 ('let', 0.0005953107082733549),
 ('student', 0.0005950985519268569),
 ('veteran', 0.0005950985519268569),
 ('more', 0.0005950716124445559),
 ('forget', 0.0005949507380788871),
 ('whole', 0.0005949163974879445),
 ('career', 0.0005948830146027255),
 ('catherine', 0.0005948612718024842),
 ('watching', 0.0005948059224834115),
 ('anything', 0.0005946316706465377),
 ('starts', 0.0005945849455816957),
 ('screen', 0.0005942032822377343),
 ('up', 0.0005941929153640528),
 ('model', 0.0005941237795240284),
 ('and', 0.0005940629658477276),
 ('capture', 0.0005935439580085528),
 ('solid', 0.0005935439580085528),
 ('positive', 0.0005934339405927958),
 ('lots', 0.000593387363876636),
 ('buddy', 0.0005932723960329503),
 ('era', 0.0005932113472167296),
 ('storyline', 0.0005930761269415491),
 ('thriller', 0.0005929729550712593),
 ('old', 0.0005925904737386595),
 ('as', 0.0005925848590923172),
 ('cause', 0.0005925223677193814),
 ('handle', 0.0005925223677193814),
 ('heroine', 0.0005925223677193814),
 ('mouth', 0.0005925223677193814),
 ('provided', 0.0005925223677193814),
 ('easy', 0.0005922554657519402),
 ('sets', 0.0005920306479121454),
 ('twenty', 0.0005920159383452623),
 ('ben', 0.0005919268678523268),
 ('us', 0.0005918044934621259),
 ('haven', 0.0005917675621554076),
 ('stone', 0.0005917675621554076),
 ('taken', 0.0005917323378957556),
 ('fill', 0.0005916510112962647),
 ('least', 0.0005914705528654417),
 ('begins', 0.0005914642920627396),
 ('friendly', 0.0005914251040754566),
 ...]
In [83]:
##example part of speech (POS) tagging (note that you need to tokenize the sentence first)
pos_tag(tokenizer.tokenize("This was a great day but the time is running out fast"))
Out[83]:
[('This', 'DT'),
 ('was', 'VBD'),
 ('a', 'DT'),
 ('great', 'JJ'),
 ('day', 'NN'),
 ('but', 'CC'),
 ('the', 'DT'),
 ('time', 'NN'),
 ('is', 'VBZ'),
 ('running', 'VBG'),
 ('out', 'RP'),
 ('fast', 'RB')]
In [51]:
## POS tagging  all reviews
## POS tagging is relatively slow, so this will take a while

reviews_pos_tagged=[pos_tag(tokenizer.tokenize(m)) for m in data.data]

## Reconstructing adjective-and-adverb-only reviews
reviews_adj_adv_only=[" ".join([w for w,tag in m if tag in ["JJ","RB","RBS","RBJ","JJR","JJS"]])
                      for m in reviews_pos_tagged]
In [84]:
print(data.data[1])
b"good films are hard to find these days . \ngreat films are beyond rare . \nproof of life , russell crowe's one-two punch of a deft kidnap and rescue thriller , is one of those rare gems . \na taut drama laced with strong and subtle acting , an intelligent script , and masterful directing , together it delivers something virtually unheard of in the film industry these days , genuine motivation in a story that rings true . \nconsider the strange coincidence of russell crowe's character in proof of life making the moves on a distraught wife played by meg ryan's character in the film -- all while the real russell crowe was hitching up with married woman meg ryan in the outside world . \ni haven't seen this much chemistry between actors since mcqueen and mcgraw teamed up in peckinpah's masterpiece , the getaway . \nbut enough with the gossip , let's get to the review . \nthe film revolves around the kidnapping of peter bowman ( david morse ) , an american engineer working in south america who is kidnapped during a mass ambush of civilians by anti-government soldiers . \nupon discovering his identity , the rebel soldiers decide to ransom him for $6 million . \nthe only problem is that the company peter bowman works for is being auctioned off , and no one will step forward with the money . \nwith no choice available to her , bowman's wife alice ( ryan ) hires terry thorne ( crowe ) , a highly skilled negotiator and rescue operative , to arrange the return of her husband . \nbut when things go wrong -- as they always do in these situations -- terry and his team ( which includes the most surprising casting choice of the year : david caruso ) take matters into their own hands . \nthe film is notable in that it takes this very simple story line and creates a complex and intelligent character-driven vehicle filled with well-written dialogue , shades of motivation , and convincing acting by all the actors . \nthe script is based on both a book ( the long march to freedom ) and a magazine article pertaining to kidnap/ransom situations , and the story has been sharply pieced together by tony gilroy , screenwriter of the devil's advocate and dolores claiborne . \nthe biggest surprise for me was not the chemistry between crowe and ryan , but that between crowe and david caruso . \ndug out from b-movie hell , caruso pulls off a gutsy performance as crowe's right hand gun while providing most of the film's humor . \nryan cries a lot and smokes too many cigarettes , david morse ends up getting everyone at the guerilla camp to hate him , and crowe provides another memorable acting turn as the stoic , gunslinger character of terry thorne . \nthe most memorable pieces of the film lie in its action scenes . \nthe bulk of those scenes , which bookend the movie , work extremely well as establishment and closure devices for all of the story's characters . \nthe scenes are skillfully crafted and executed with amazing accuracy and poise . \ndirector taylor hackford mixes both his old-school style of filmmaking with the dizziness of a lars von trier film . \nproof of life is a thinking man's action movie . \nit is a film about the choices men and women make in the face of love and war , and the sacrifices one makes for those choices -- the sacrifices that help you sleep at night . \n"
In [85]:
## It kind of works:
reviews_adj_adv_only[1]
Out[85]:
"good hard great rare crowe's one-two rare taut strong subtle intelligent masterful together virtually unheard genuine true strange distraught real married outside much enough let's david american south anti-government only forward available bowman's ryan terry highly skilled wrong always most surprising own notable very simple complex intelligent character-driven well-written long sharply together tony biggest not gutsy right most film's ryan too many david memorable gunslinger terry most memorable extremely well skillfully amazing old-school trier man's"
In [86]:
## term doc matrix only for adj/adv
X = vec.fit_transform(reviews_adj_adv_only)
terms = vec.get_feature_names()
In [87]:
len(terms)
Out[87]:
576
In [88]:
pmi_matrix=getcollocations_matrix(X)
pmi_matrix.shape  # n_words by n_words
Out[88]:
(576, 576)
In [89]:
getcollocations("good",pmi_matrix,terms)
Out[89]:
[('good', 0.0012832614349917284),
 ('sean', 0.0009249332576569759),
 ('nicely', 0.0009142510507387667),
 ('fairly', 0.000867719006738669),
 ('pretty', 0.0008609133674701302),
 ('terrific', 0.000831187604463116),
 ('he', 0.0008287103623039518),
 ('sadly', 0.0008245212623420526),
 ('horrible', 0.0008203986560303424),
 ('technical', 0.000817968488099229),
 ('stupid', 0.0008165931732810714),
 ('forward', 0.0008158591569455379),
 ('lovely', 0.0008132714383389111),
 ('robin', 0.0008028131634819533),
 ('sad', 0.0008020759613116301),
 ('total', 0.0007996250034466595),
 ('cool', 0.0007961783439490446),
 ('totally', 0.0007939970334176773),
 ('naturally', 0.0007829087048832272),
 ('thankfully', 0.0007774887114619778),
 ('they', 0.0007756992159419563),
 ('bad', 0.0007720796800988853),
 ('nice', 0.0007720517274657402),
 ('average', 0.0007712639195805711),
 ('fun', 0.0007703436484226743),
 ('climactic', 0.0007673110589637576),
 ('badly', 0.0007654486534808358),
 ('dumb', 0.0007600492426269871),
 ('therefore', 0.0007561876508739784),
 ('mainly', 0.0007555888597477207),
 ('bigger', 0.0007541908292930254),
 ('twice', 0.0007477153143173636),
 ('really', 0.0007464406345154477),
 ('suspenseful', 0.0007449621931686967),
 ('anti', 0.0007446382965629712),
 ('guilty', 0.0007372021703231894),
 ('extra', 0.000736855251654802),
 ('gary', 0.0007360226468506723),
 ('smart', 0.000732211878708694),
 ('aren', 0.0007289455060155697),
 ('ve', 0.0007279344858962693),
 ('violent', 0.0007274069971383735),
 ('boring', 0.00072578337925946),
 ('forever', 0.0007236625698992255),
 ('co', 0.0007234410631438232),
 ('longer', 0.0007230160096402135),
 ('natural', 0.0007226849583537482),
 ('scary', 0.0007221226337134648),
 ('fantastic', 0.0007207509218907141),
 ('though', 0.0007206722287012948),
 ('nevertheless', 0.0007151637054419488),
 ('that', 0.000715019519211013),
 ('slightly', 0.000714275671039496),
 ('particular', 0.0007133757961783439),
 ('probably', 0.0007111342630959992),
 ('looking', 0.0007101544769016765),
 ('terribly', 0.0007101544769016765),
 ('intelligent', 0.0007096530262048105),
 ('witty', 0.0007094402154212624),
 ('able', 0.0007077140835102619),
 ('usual', 0.0007077140835102619),
 ('brilliant', 0.0007048832271762208),
 ('realistic', 0.0007032908704883227),
 ('overall', 0.000703118537513442),
 ('maybe', 0.0007005098084086604),
 ('impressive', 0.0006997398403157801),
 ('somewhere', 0.0006987980005684003),
 ('very', 0.0006972942200561076),
 ('general', 0.0006970316067780315),
 ('plain', 0.0006970316067780315),
 ('disappointing', 0.0006963906581740977),
 ('capable', 0.0006948465547191663),
 ('better', 0.0006924412263889158),
 ('fair', 0.0006912556164518837),
 ('weird', 0.0006907289455060155),
 ('sure', 0.0006904748942965504),
 ('past', 0.0006904264112413089),
 ('right', 0.0006895282417358496),
 ('great', 0.0006882164679575816),
 ('seemingly', 0.0006870005005782542),
 ('actually', 0.000685391918869078),
 ('national', 0.0006848845969454146),
 ('wonderfully', 0.0006844011489946297),
 ('loud', 0.0006830979414751223),
 ('necessary', 0.0006830979414751223),
 ('peter', 0.0006824385805277526),
 ('evil', 0.0006819790259280704),
 ('relatively', 0.0006819790259280704),
 ('biggest', 0.0006811154333917554),
 ('dull', 0.0006802721088435374),
 ('believable', 0.0006794055201698514),
 ('huge', 0.0006784004824181208),
 ('robert', 0.0006760084925690022),
 ('funny', 0.0006758333405353084),
 ('major', 0.000675087264745043),
 ('fake', 0.0006748559296329996),
 ('it', 0.000674409891345073),
 ('there', 0.0006743076615601185),
 ('isn', 0.0006732595820762097),
 ('black', 0.0006723283793347488),
 ('offensive', 0.0006723283793347488),
 ('definitely', 0.0006717286216368588),
 ('danny', 0.0006713720089516268),
 ('well', 0.0006712165858124412),
 ('sometimes', 0.0006709129511677283),
 ('hardly', 0.0006708415850416599),
 ('special', 0.0006705840136359559),
 ('awful', 0.0006682137625701541),
 ('also', 0.0006677363413883081),
 ('brief', 0.0006667411628859836),
 ('musical', 0.0006664698190407897),
 ('responsible', 0.0006664307619721632),
 ('basic', 0.0006652512384996461),
 ('either', 0.0006651259793698213),
 ('anyway', 0.0006640466187830329),
 ('again', 0.0006629841024745483),
 ('as', 0.0006629024930696557),
 ('just', 0.0006621746447228981),
 ('before', 0.0006605331446095777),
 ('usually', 0.0006603252990637598),
 ('occasionally', 0.0006601366661314207),
 ('basically', 0.000660085931918576),
 ('around', 0.0006583525129797142),
 ('interesting', 0.0006582175157064767),
 ('then', 0.000658144236310809),
 ('regular', 0.0006574892130675981),
 ('movie', 0.0006571630775452432),
 ('especially', 0.000655566729988453),
 ('unbelievable', 0.0006549354060959373),
 ('next', 0.0006545539933664034),
 ('terrible', 0.0006535061962626672),
 ('however', 0.0006534727007700416),
 ('supposedly', 0.0006532745386248571),
 ('too', 0.0006526945865931038),
 ('extremely', 0.0006523525785905076),
 ('typical', 0.0006521079769487413),
 ('best', 0.0006519679895476074),
 ('minor', 0.0006490255985362402),
 ('never', 0.0006488495909123945),
 ('social', 0.0006485234510712218),
 ('ahead', 0.0006484191197566994),
 ('frank', 0.0006484191197566994),
 ('together', 0.0006481608436062671),
 ('earlier', 0.0006481171080567661),
 ('even', 0.0006475500993617314),
 ('not', 0.000647194430057688),
 ('professional', 0.000646173728422413),
 ('predictable', 0.0006440198159943383),
 ('personal', 0.000643647334897754),
 ('tough', 0.0006435774946921443),
 ('generally', 0.000643126584626801),
 ('entirely', 0.0006429233575550971),
 ('always', 0.0006416607690493041),
 ('entire', 0.0006411056991798842),
 ('alien', 0.0006396646524035059),
 ('likable', 0.0006396301969953506),
 ('surprising', 0.0006393830685506504),
 ('re', 0.0006393282282497197),
 ('quite', 0.0006392831469314743),
 ('second', 0.0006375118821969114),
 ('little', 0.0006375096023289368),
 ('quiet', 0.0006369426751592356),
 ('instead', 0.0006365668977697612),
 ('nearly', 0.0006362510978789325),
 ('possible', 0.0006361034884989469),
 ('poor', 0.0006359642685921708),
 ('don', 0.0006359458947599255),
 ('common', 0.0006352968284533978),
 ('john', 0.0006348405541191062),
 ('strong', 0.0006345571220687517),
 ('ever', 0.0006344707113487858),
 ('short', 0.0006341305662181353),
 ('straight', 0.0006339096148013345),
 ('we', 0.0006335726080949011),
 ('mean', 0.0006334621140927917),
 ('interested', 0.0006334041047416844),
 ('wild', 0.0006331171936267478),
 ('worse', 0.0006331171936267478),
 ('running', 0.0006329367463846492),
 ('so', 0.0006323964773073481),
 ('dramatic', 0.0006314423066345445),
 ('laughable', 0.0006312044528605038),
 ('funniest', 0.0006308765544434335),
 ('powerful', 0.0006299433051025408),
 ('back', 0.000629555760482866),
 ('largely', 0.000626832473966232),
 ('big', 0.0006259819025465529),
 ('completely', 0.0006258191508398261),
 ('wrong', 0.0006257682422617052),
 ('didn', 0.000625466230561772),
 ('tight', 0.0006249248888354765),
 ('finally', 0.0006248104337276312),
 ('ago', 0.0006243185861020256),
 ('decent', 0.0006241183259949558),
 ('more', 0.000621968372132822),
 ('perfectly', 0.0006215946588903384),
 ('fully', 0.0006202905790766413),
 ('hard', 0.0006202687831393604),
 ('much', 0.0006202111087208874),
 ('important', 0.00062015503875969),
 ('mental', 0.0006195398698270162),
 ('moral', 0.0006194579742725115),
 ('later', 0.0006191509803223855),
 ('many', 0.0006188591291767968),
 ('subtle', 0.0006185532540916463),
 ('here', 0.0006184390993909081),
 ('hot', 0.0006183186203300183),
 ('recently', 0.0006179294609753779),
 ('frankly', 0.0006176413819725922),
 ('small', 0.0006176413819725922),
 ('same', 0.0006169314493496352),
 ('remarkable', 0.0006160102867737209),
 ('certain', 0.0006154967938407428),
 ('still', 0.0006152448508223882),
 ('almost', 0.000614907621277048),
 ('enough', 0.0006135664210955028),
 ('visually', 0.0006133522057088936),
 ('obviously', 0.000612731403881253),
 ('other', 0.0006126687777268868),
 ('present', 0.000611591722914092),
 ('tony', 0.0006112076175770443),
 ('long', 0.0006096304083441553),
 ('quickly', 0.0006094667166229549),
 ('worth', 0.0006093904474805919),
 ('far', 0.0006088127147421085),
 ('positive', 0.0006075453209211171),
 ('unfunny', 0.000607317434454155),
 ('comic', 0.0006067717063359034),
 ('due', 0.0006066120715802245),
 ('spectacular', 0.0006066120715802245),
 ('seriously', 0.000605535245417656),
 ('apparently', 0.0006054510915389225),
 ('obvious', 0.0006047294823925617),
 ('superior', 0.0006046339887381151),
 ('only', 0.0006043969151390608),
 ('immediately', 0.000604379143709377),
 ('likely', 0.000604379143709377),
 ('slowly', 0.0006036040778368514),
 ('effectively', 0.0006034193764666443),
 ('incredible', 0.0006034193764666443),
 ('unfortunately', 0.0006029501089433885),
 ('future', 0.000602698445311965),
 ('main', 0.0006023551447621176),
 ('final', 0.0006023019331768913),
 ('wonderful', 0.0006022374652947902),
 ('exciting', 0.0006015569709837226),
 ('top', 0.0006010365929811415),
 ('away', 0.0006009182398614209),
 ('incredibly', 0.0006000947518029163),
 ('now', 0.0005997020752003287),
 ('highly', 0.0005994754589733983),
 ('most', 0.0005991814703546777),
 ('several', 0.0005989389355912623),
 ('man', 0.0005986565034283526),
 ('often', 0.0005983923958135547),
 ('old', 0.0005982720466243037),
 ('oh', 0.0005976252260753322),
 ('star', 0.0005976252260753322),
 ('mary', 0.0005968833874133718),
 ('absolutely', 0.0005958495993425108),
 ('early', 0.0005956009613700224),
 ('french', 0.00059447983014862),
 ('pure', 0.00059447983014862),
 ('emotional', 0.0005941229995182786),
 ('surprisingly', 0.00059359055590756),
 ('practically', 0.0005928774586387854),
 ('easy', 0.0005917276087127468),
 ('few', 0.0005915726229176027),
 ('willing', 0.0005914467697907188),
 ('yet', 0.000591312086826336),
 ('similar', 0.0005912837020295414),
 ('least', 0.0005911743392196498),
 ('ll', 0.0005911494109321011),
 ('solid', 0.0005910033399138328),
 ('whole', 0.0005908314711500944),
 ('along', 0.0005894513353447312),
 ('happy', 0.0005885547820076039),
 ('unnecessary', 0.0005879470847623714),
 ('soon', 0.0005870488322717622),
 ('about', 0.000586717804716572),
 ('international', 0.000586391669194217),
 ('non', 0.0005850436423684831),
 ('up', 0.0005847487615003539),
 ('third', 0.0005845194097140311),
 ('quick', 0.000583864118895966),
 ('sweet', 0.0005830216021298824),
 ('entertaining', 0.0005830033855511562),
 ('intriguing', 0.000582964482349131),
 ('effective', 0.0005827808830538584),
 ('simple', 0.0005823475887170154),
 ('utterly', 0.0005813365685977151),
 ('first', 0.000581192769237202),
 ('available', 0.0005810705106715834),
 ('middle', 0.0005807418508804796),
 ('double', 0.0005794409058740269),
 ('apart', 0.0005793995674345695),
 ('doesn', 0.0005786797017725769),
 ('last', 0.0005779384843487602),
 ('she', 0.0005770591757852905),
 ('what', 0.0005767341635770193),
 ('deep', 0.0005766355217394896),
 ('apparent', 0.0005766007375125712),
 ('pathetic', 0.0005766007375125712),
 ('honest', 0.0005762814680012133),
 ('else', 0.000576117518792678),
 ('single', 0.0005757673899744503),
 ('secret', 0.0005756074545883464),
 ('michael', 0.0005753841128657395),
 ('exactly', 0.0005752567854478683),
 ('free', 0.0005746851204444231),
 ('otherwise', 0.0005743176159709176),
 ('friendly', 0.0005740347566249902),
 ('rather', 0.0005732484076433122),
 ('light', 0.000571948524632783),
 ('constantly', 0.0005711551688047606),
 ('previous', 0.0005707371641211789),
 ('certainly', 0.0005704842058212914),
 ('such', 0.0005690939798374554),
 ('popular', 0.000569045232629571),
 ('talented', 0.0005689196710160164),
 ('already', 0.0005686201044674146),
 ('appropriate', 0.0005678171135140473),
 ('flat', 0.000567251746325019),
 ('known', 0.0005661712668082095),
 ('normal', 0.0005661712668082095),
 ('out', 0.0005661712668082095),
 ('real', 0.0005661712668082095),
 ('less', 0.0005645254201023716),
 ('enjoyable', 0.0005644129709485567),
 ('impossible', 0.0005626326963906581),
 ('originally', 0.0005621841452109685),
 ('clever', 0.0005617480537862704),
 ('convincing', 0.00056160536949524),
 ('virtually', 0.00056160536949524),
 ('perhaps', 0.0005612799383692617),
 ('excellent', 0.0005611161662117076),
 ('particularly', 0.0005609928710752076),
 ('aside', 0.0005608300284420943),
 ('classic', 0.0005605766890729504),
 ('nasty', 0.0005605095541401274),
 ('you', 0.0005597375024126616),
 ('thoroughly', 0.0005594311326795403),
 ('safe', 0.000559004541911903),
 ('truly', 0.000559004541911903),
 ('fascinating', 0.0005583077769914288),
 ('different', 0.0005582527875521505),
 ('fast', 0.0005578452187669123),
 ('once', 0.0005570083587423185),
 ('easily', 0.0005566899298042442),
 ('emotionally', 0.0005566075629769897),
 ('new', 0.0005565231119904089),
 ('critical', 0.0005564096932425507),
 ('down', 0.0005563655897475252),
 ('key', 0.0005559568367369274),
 ('original', 0.0005557015278771899),
 ('over', 0.0005551008789097249),
 ('suddenly', 0.0005548065151022053),
 ('painful', 0.0005541761128504085),
 ('military', 0.0005538631957906397),
 ('serial', 0.0005533037380171138),
 ('intense', 0.0005525285856803009),
 ('cute', 0.0005520169851380043),
 ('nowhere', 0.000551579223849235),
 ('young', 0.0005511013442752416),
 ('bright', 0.0005506171111266653),
 ('dangerous', 0.0005504442871746481),
 ('silly', 0.0005503408202033747),
 ('humorous', 0.0005500868558193399),
 ('necessarily', 0.000549928648498138),
 ('soft', 0.0005490885130683066),
 ('somewhat', 0.0005486772108113266),
 ('crazy', 0.0005482544545674433),
 ('essentially', 0.0005479076775563317),
 ('close', 0.0005473421765129823),
 ('half', 0.0005471174260983178),
 ('mysterious', 0.0005469790204757277),
 ('potential', 0.0005464211063381557),
 ('slow', 0.0005463207498317021),
 ('familiar', 0.0005452879004095461),
 ('worst', 0.0005451500564069146),
 ('mad', 0.0005449398443029017),
 ('screen', 0.0005445799896841676),
 ('indeed', 0.0005441534953212235),
 ('animated', 0.0005440552016985138),
 ('visual', 0.0005425807973578674),
 ('low', 0.0005424444362627787),
 ('serious', 0.0005416989106494434),
 ('giant', 0.0005416520387180901),
 ('rarely', 0.0005411931226843178),
 ('steve', 0.0005411931226843178),
 ('no', 0.0005411194408432445),
 ('literally', 0.0005408003845691623),
 ('favorite', 0.0005407377919320594),
 ('like', 0.0005404362092260182),
 ('memorable', 0.0005401736065976285),
 ('aware', 0.0005397819281010471),
 ('standard', 0.0005393928960807942),
 ('chris', 0.0005387077352093038),
 ('shallow', 0.0005387077352093038),
 ('true', 0.0005377934893765022),
 ('sci', 0.0005376344086021505),
 ('poorly', 0.0005375615485386457),
 ('computer', 0.0005374203821656051),
 ('older', 0.0005372849776853417),
 ('physical', 0.0005366183710132754),
 ('rare', 0.0005366183710132754),
 ('clear', 0.0005357231027502098),
 ('comedic', 0.0005349215540298343),
 ('simply', 0.0005347540528206044),
 ('can', 0.0005345322842512801),
 ('unique', 0.0005337073180233351),
 ('complex', 0.0005332543326914531),
 ('female', 0.0005330442246013461),
 ('oddly', 0.0005325848357263666),
 ('surely', 0.0005325848357263666),
 ('fi', 0.0005324705961648636),
 ('dead', 0.000531612760912124),
 ('genuinely', 0.0005307855626326964),
 ('successful', 0.000529644088304454),
 ('rich', 0.0005296190009565806),
 ('constant', 0.0005294068988336504),
 ('jean', 0.0005293313556117849),
 ('lucky', 0.0005292470537555002),
 ('psychological', 0.0005290452820994744),
 ('cold', 0.0005289231571497747),
 ('mostly', 0.0005281963647661954),
 ('merely', 0.0005281316348195329),
 ('clearly', 0.0005275686804349225),
 ('overly', 0.0005265392781316348),
 ('life', 0.0005263623496107572),
 ('lee', 0.0005263000508358004),
 ('amazing', 0.0005259159703149652),
 ('ready', 0.0005259159703149652),
 ('late', 0.0005252946775020133),
 ('dark', 0.0005251443634163102),
 ('private', 0.0005248513141063681),
 ('open', 0.0005247766694708168),
 ('eventually', 0.0005237084217975938),
 ('united', 0.0005233792524564262),
 ('attractive', 0.0005230179690331935),
 ('graphic', 0.0005230179690331935),
 ('time', 0.0005229220728159156),
 ('weak', 0.0005226196308998857),
 ('large', 0.0005216863815589931),
 ('strange', 0.0005216863815589931),
 ('various', 0.0005211349160393747),
 ('to', 0.0005209980274352141),
 ('billy', 0.0005185727974747759),
 ('heavily', 0.0005182964905707506),
 ('genuine', 0.0005179533841954225),
 ('own', 0.0005177625711331581),
 ('barely', 0.000517274657402046),
 ('hilarious', 0.0005159235668789809),
 ('lead', 0.0005143386860440776),
 ('ultimate', 0.0005140239132864007),
 ('high', 0.0005112873174747606),
 ('traditional', 0.0005108811040339703),
 ('difficult', 0.0005095541401273885),
 ('successfully', 0.000508161915700811),
 ('possibly', 0.0005075636942675159),
 ('grand', 0.000506484536873609),
 ('greatest', 0.0005064087442006763),
 ('thus', 0.0005060155697098372),
 ('further', 0.0005057372551826141),
 ('dimensional', 0.0005055100596501871),
 ('wide', 0.0005042462845010616),
 ('recent', 0.0005031142773769634),
 ('complete', 0.0005025188758652747),
 ('innocent', 0.0005022487044266374),
 ('year', 0.000499562882477832),
 ('thin', 0.0004989384288747346),
 ('alive', 0.0004978402518485979),
 ('initially', 0.0004965993738529634),
 ('english', 0.0004963966388564935),
 ('david', 0.0004960803527682509),
 ('perfect', 0.0004944686557157225),
 ('human', 0.0004931504004900357),
 ('beautiful', 0.0004920049200492005),
 ('eccentric', 0.0004909766454352442),
 ('political', 0.0004908043124603634),
 ('all', 0.0004905190716743539),
 ('married', 0.0004905190716743539),
 ('chinese', 0.0004899559039686428),
 ('sole', 0.0004887233104995393),
 ('sympathetic', 0.0004875363686404026),
 ('sexual', 0.00048720527433232763),
 ('empty', 0.0004862680638312445),
 ('off', 0.0004861263635698075),
 ('tim', 0.0004852896572641795),
 ('modern', 0.00048489829463735357),
 ('foreign', 0.00048319789150010983),
 ('narrative', 0.00048319789150010983),
 ('outstanding', 0.0004820106730934756),
 ('lame', 0.00048166809265773043),
 ('painfully', 0.00048124557678697803),
 ('hearted', 0.00048071145295036654),
 ('fresh', 0.00048038774153423834),
 ('numerous', 0.00047928359714952384),
 ('ex', 0.0004789413913988051),
 ('full', 0.000478736230452094),
 ('unable', 0.0004786720710287589),
 ('unusual', 0.0004786720710287589),
 ('public', 0.00047845459166890946),
 ('blue', 0.00047770700636942675),
 ('subject', 0.000476979902858971),
 ('worthy', 0.0004768904131961457),
 ('ultimately', 0.00047569136499234053),
 ('equally', 0.00047481181239143023),
 ('film', 0.0004737351416150324),
 ('cheap', 0.00047335630503637184),
 ('former', 0.00047282951741550467),
 ('occasional', 0.0004725703718923361),
 ('alone', 0.0004720200181983621),
 ('sudden', 0.00047095155375410154),
 ('one', 0.0004706472969156914),
 ('near', 0.00047035766780989714),
 ('william', 0.0004702874232358514),
 ('british', 0.000470124355474674),
 ('fine', 0.00046983604175244),
 ('accidentally', 0.00046932618169627895),
 ('famous', 0.0004684393219425067),
 ('frequently', 0.00046834020232296744),
 ('steven', 0.0004639458991900605),
 ('green', 0.0004615526631588664),
 ('deadly', 0.00046123435097737757),
 ('sharp', 0.00046051254448132537),
 ('meanwhile', 0.0004596493532076958),
 ('of', 0.0004579326422713459),
 ('ill', 0.00045780254777070064),
 ('initial', 0.0004566758803028482),
 ('local', 0.00045631714041258675),
 ('ugly', 0.0004549590536851683),
 ('actual', 0.00045453186208546393),
 ('be', 0.0004542536908112378),
 ('ridiculous', 0.0004519259933272672),
 ('desperately', 0.00045158898662083375),
 ('somehow', 0.000449853902587711),
 ('heavy', 0.00044979161751985535),
 ('self', 0.0004482189195564992),
 ('american', 0.00044597422497684366),
 ('limited', 0.00044422668626490285),
 ('cinematic', 0.00044378462078763795),
 ('inevitable', 0.0004428268122535639),
 ('current', 0.0004411724156947087),
 ('younger', 0.00043862719021954696),
 ('bottom', 0.0004376939408786542),
 ('directly', 0.0004361048947036208),
 ('tiny', 0.00043312101910828024),
 ('unfortunate', 0.00043295449814745426),
 ('white', 0.0004315217691013869),
 ('greater', 0.00043136858423482627),
 ('tom', 0.0004311611954924057),
 ('teen', 0.00042832087141142803),
 ('extraordinary', 0.0004246284501061571),
 ('fellow', 0.0004246284501061571),
 ('latest', 0.0004246284501061571),
 ('unlikely', 0.0004246284501061571),
 ('odd', 0.00042015867694714494),
 ('latter', 0.00041755130927105446),
 ('romantic', 0.00041599779055115387),
 ('unexpected', 0.0004119529739835853),
 ('detective', 0.00040998608975766895),
 ('live', 0.00040875448935452505),
 ('red', 0.00040864780951076413),
 ('on', 0.00040208180673768863),
 ('the', 0.00040030077848549186),
 ('creative', 0.00039927749786101345),
 ('ten', 0.0003963198867657466),
 ('two', 0.0003951403632932295),
 ('desperate', 0.00039465467715748723),
 ('central', 0.0003832012842421418),
 ('in', 0.00038137925611386335),
 ('previously', 0.0003809166978893468),
 ('and', 0.00037276543329929824),
 ('bizarre', 0.00034742327735958313),
 ('angry', 0.00033777263076626134)]

We can make this better by combining multiple seet terms

In [90]:
def seed_score(pos_seed,PMI_MATRIX=pmi_matrix,TERMS=terms):
    score=defaultdict(int)
    for seed in pos_seed:
        c=dict(getcollocations(seed,PMI_MATRIX,TERMS))
        for w in c:
            score[w]+=c[w]
    return score
In [91]:
sorted(seed_score(['good','great','perfect','cool']).items(),key=itemgetter(1),reverse=True)
Out[91]:
[('cool', 0.01233842898097543),
 ('perfect', 0.006798784836900034),
 ('great', 0.004248631481147458),
 ('frank', 0.004199495490947327),
 ('eccentric', 0.004084710911853615),
 ('fake', 0.003911762949434756),
 ('looking', 0.0038393777520473217),
 ('lovely', 0.0038028477315611427),
 ('greatest', 0.003730147480106928),
 ('amazing', 0.0036985631973772324),
 ('twice', 0.003661242122664058),
 ('anti', 0.003660653152490723),
 ('generally', 0.0035927347963626934),
 ('known', 0.0035564386989262358),
 ('totally', 0.003546906203489673),
 ('plain', 0.0035118779583992172),
 ('earlier', 0.003485008708798782),
 ('stupid', 0.0033319627079071512),
 ('sad', 0.0032958074094856364),
 ('convincing', 0.0032845107876468384),
 ('overall', 0.003268378795897639),
 ('nicely', 0.0032643270804776636),
 ('good', 0.003262124902614077),
 ('pretty', 0.003249145627494061),
 ('climactic', 0.003243555286391972),
 ('man', 0.003190950013225829),
 ('necessarily', 0.0031860920461633867),
 ('past', 0.0031797773077773686),
 ('fun', 0.003153100558314221),
 ('friendly', 0.0031332047075606456),
 ('terribly', 0.0031123173387858664),
 ('intriguing', 0.0031087143840738247),
 ('necessary', 0.0030837671387598064),
 ('extra', 0.0030693991457936233),
 ('actually', 0.0030683303367765517),
 ('apart', 0.0030613602501711593),
 ('they', 0.0030539757813460434),
 ('black', 0.003053138089032602),
 ('definitely', 0.0030493148349400295),
 ('best', 0.003043422883124742),
 ('bigger', 0.00302455408969759),
 ('musical', 0.003003959069610916),
 ('quiet', 0.0029999963749380303),
 ('john', 0.0029968693911054983),
 ('steven', 0.0029867301907443955),
 ('pure', 0.0029859258177321285),
 ('painful', 0.0029827416960734117),
 ('classic', 0.0029772145231463792),
 ('perfectly', 0.0029687945292047385),
 ('basically', 0.0029613324950677937),
 ('somewhere', 0.0029537299642018867),
 ('he', 0.002948724291562637),
 ('horrible', 0.002945643760366221),
 ('brilliant', 0.0029418273019810766),
 ('technical', 0.0029277317228736336),
 ('really', 0.0029263971284278645),
 ('fully', 0.0029147021133513534),
 ('forward', 0.002905429173729839),
 ('mainly', 0.002901635335665364),
 ('visually', 0.002897170954974414),
 ('shallow', 0.002869477485777347),
 ('nasty', 0.002865858196500654),
 ('maybe', 0.0028622287923194237),
 ('isn', 0.0028566090192177563),
 ('all', 0.00285461092390361),
 ('regular', 0.002851129537218903),
 ('probably', 0.0028510038317324277),
 ('scary', 0.0028496778141188024),
 ('slightly', 0.002839870019252718),
 ('present', 0.002839191031164896),
 ('non', 0.002836054321034321),
 ('green', 0.002833792308872344),
 ('entire', 0.002819008284677596),
 ('aren', 0.0028185457174324195),
 ('interesting', 0.0028106487026091083),
 ('professional', 0.002801892043622532),
 ('especially', 0.0027872535559887897),
 ('excellent', 0.002783733520408009),
 ('sympathetic', 0.00278352694514409),
 ('same', 0.0027832670373233006),
 ('mary', 0.0027826630422002983),
 ('huge', 0.002781314871475908),
 ('nevertheless', 0.002771202856021355),
 ('constantly', 0.002761195678000438),
 ('weird', 0.0027601413120321343),
 ('anyway', 0.002757779963983885),
 ('forever', 0.0027568329027658684),
 ('tony', 0.002756290460877205),
 ('future', 0.002749234391960185),
 ('very', 0.0027481646751586373),
 ('sure', 0.0027459142198264248),
 ('though', 0.0027453667922239028),
 ('soft', 0.002742373141880715),
 ('nice', 0.002741853177318124),
 ('blue', 0.002739395424413957),
 ('wonderful', 0.0027377931144858384),
 ('light', 0.0027373283200756394),
 ('sean', 0.0027337378934548374),
 ('second', 0.0027335395520831232),
 ('entirely', 0.002733523367163213),
 ('realistic', 0.002731294773691186),
 ('memorable', 0.002730908137254546),
 ('badly', 0.0027307312781661053),
 ('still', 0.0027191290459714634),
 ('danny', 0.002718729213354672),
 ('third', 0.00271541531851348),
 ('also', 0.0027056370383890453),
 ('inevitable', 0.002703987742769333),
 ('famous', 0.0027032161393452437),
 ('before', 0.0026925347274034638),
 ('literally', 0.0026894086542412756),
 ('smart', 0.002683582489884059),
 ('deadly', 0.0026766545015301318),
 ('yet', 0.0026766197782430662),
 ('quick', 0.002675617905789821),
 ('dumb', 0.0026710891292985525),
 ('poor', 0.002667166995342299),
 ('out', 0.0026667066493894324),
 ('wonderfully', 0.002665905523637924),
 ('it', 0.0026623466980691805),
 ('exactly', 0.0026622488358886177),
 ('always', 0.002652590008331684),
 ('again', 0.0026501302605318553),
 ('suspenseful', 0.0026485166277748695),
 ('straight', 0.002646189752081472),
 ('that', 0.002642264415676568),
 ('oh', 0.0026408399178469),
 ('didn', 0.002634709735067517),
 ('incredibly', 0.0026306080205473936),
 ('else', 0.0026236977815681873),
 ('not', 0.0026197610436078512),
 ('never', 0.0026185162906170443),
 ('then', 0.002613898264607873),
 ('original', 0.0026121619253186364),
 ('lucky', 0.002611194469540666),
 ('just', 0.0026082660263522482),
 ('final', 0.002605105324093498),
 ('obviously', 0.0025999411138145803),
 ('mental', 0.002598422855348094),
 ('chinese', 0.002594664847599488),
 ('right', 0.0025941719891221897),
 ('next', 0.002590120626479452),
 ('sadly', 0.002589445111141814),
 ('longer', 0.002578784157910541),
 ('alien', 0.0025777906620490405),
 ('likable', 0.0025776081267851907),
 ('together', 0.002577143083714865),
 ('ve', 0.002576658664841161),
 ('wrong', 0.002574267820832323),
 ('similar', 0.002570303153596778),
 ('about', 0.0025682590836591324),
 ('over', 0.0025672992416833455),
 ('easily', 0.0025649029055357614),
 ('comic', 0.0025648804047939894),
 ('certain', 0.002563451398133774),
 ('funny', 0.002562830642541606),
 ('boring', 0.0025616238435000253),
 ('desperately', 0.0025568098595270487),
 ('usual', 0.002556158669214207),
 ('like', 0.0025558699946088698),
 ('witty', 0.002554642026201358),
 ('late', 0.0025543690617334434),
 ('already', 0.0025527149553354602),
 ('originally', 0.0025514339703727),
 ('cold', 0.0025499605916891304),
 ('french', 0.0025444561222301753),
 ('believable', 0.002543143621598955),
 ('later', 0.002542810478900551),
 ('desperate', 0.002542474373414215),
 ('completely', 0.0025416873280540166),
 ('bad', 0.002541534118352169),
 ('long', 0.0025397945691588344),
 ('first', 0.002535085273247567),
 ('co', 0.00253448031405291),
 ('evil', 0.0025334075067660966),
 ('close', 0.0025325295336182225),
 ('terrific', 0.002531797517920264),
 ('wild', 0.00253043821721538),
 ('utterly', 0.0025275094878181724),
 ('beautiful', 0.0025262843425938046),
 ('traditional', 0.002524371276082079),
 ('mean', 0.0025242814452110457),
 ('as', 0.002523913421332458),
 ('computer', 0.0025223013263525763),
 ('strong', 0.0025219961530189706),
 ('incredible', 0.002517568804243094),
 ('ever', 0.0025150040801195025),
 ('nearly', 0.002514305025450897),
 ('therefore', 0.002513451112442391),
 ('little', 0.002506072311279282),
 ('single', 0.002505291173635525),
 ('movie', 0.002503536342636675),
 ('whole', 0.0025007145988450545),
 ('older', 0.002498322493840491),
 ('almost', 0.0024977229944715953),
 ('slowly', 0.0024942831852932394),
 ('general', 0.00249251121682983),
 ('there', 0.0024915178366652618),
 ('only', 0.002490669377909969),
 ('well', 0.0024880154640546824),
 ('major', 0.0024870428132837117),
 ('emotionally', 0.0024865103239952464),
 ('tough', 0.002486159146802659),
 ('merely', 0.002481832205084195),
 ('too', 0.0024813276898331686),
 ('seemingly', 0.0024776359121710554),
 ('occasionally', 0.0024768922155345282),
 ('able', 0.002476601756585323),
 ('re', 0.002472412146757419),
 ('extremely', 0.002471343494995713),
 ('once', 0.002471033221713364),
 ('away', 0.002466459287948913),
 ('important', 0.0024656807279107595),
 ('ll', 0.0024621555414009303),
 ('so', 0.0024614535416055787),
 ('thankfully', 0.002460228506662616),
 ('key', 0.0024601953216293253),
 ('instead', 0.002459485644152606),
 ('effectively', 0.0024586780396513193),
 ('robert', 0.0024573908936458004),
 ('most', 0.0024567708406460402),
 ('intense', 0.0024540340246803705),
 ('solid', 0.002452225134191952),
 ('top', 0.002451460433879865),
 ('last', 0.0024507916818635083),
 ('special', 0.0024474442315980055),
 ('creative', 0.0024472815239837834),
 ('tim', 0.0024459044680487695),
 ('psychological', 0.00244476670365184),
 ('clear', 0.00244394486836231),
 ('disappointing', 0.0024420242374239755),
 ('sometimes', 0.0024397798656203814),
 ('emotional', 0.002437342965428747),
 ('hearted', 0.002435117648234929),
 ('and', 0.002434775894362365),
 ('minor', 0.0024326013810353738),
 ('responsible', 0.0024319820376394546),
 ('here', 0.0024311048832102297),
 ('other', 0.0024299883603013535),
 ('normal', 0.0024298569825335708),
 ('willing', 0.0024261246443667258),
 ('much', 0.0024261229493739082),
 ('hot', 0.0024196044249443906),
 ('remarkable', 0.0024195606507870326),
 ('barely', 0.0024175002418381566),
 ('effective', 0.0024169124922279123),
 ('animated', 0.0024109840734820127),
 ('outstanding', 0.0024107809176646656),
 ('half', 0.0024084639659253332),
 ('subtle', 0.0024083972202186494),
 ('different', 0.002407773631130719),
 ('fascinating', 0.002403685035026676),
 ('least', 0.002403331373138169),
 ('lame', 0.002402847501293744),
 ('seriously', 0.002401916898466411),
 ('total', 0.002401279300347136),
 ('violent', 0.002400535983510144),
 ('far', 0.0024005213011034504),
 ('doesn', 0.002400464865939389),
 ('absolutely', 0.002400431236130183),
 ('unfortunately', 0.002399753462836699),
 ('usually', 0.002399454857408294),
 ('aware', 0.0023960326566863843),
 ('previously', 0.0023958414023801155),
 ('moral', 0.002395235049660093),
 ('ago', 0.002393629653696946),
 ('awful', 0.002393420826940957),
 ('surely', 0.002391676453026381),
 ('biggest', 0.002390865343509711),
 ('greater', 0.0023903319537811043),
 ('truly', 0.002386967231988023),
 ('successful', 0.0023869373994698283),
 ('real', 0.002385190944733403),
 ('powerful', 0.0023847176086883894),
 ('back', 0.0023806187028304715),
 ('short', 0.0023774499236466013),
 ('eventually', 0.0023698523692254974),
 ('quickly', 0.0023688699328822584),
 ('secret', 0.0023676820230258654),
 ('intelligent', 0.002366708277842242),
 ('robin', 0.0023651976511104983),
 ('dull', 0.00235964477869644),
 ('many', 0.0023590119795897794),
 ('michael', 0.0023589064413142794),
 ('off', 0.0023580212371917464),
 ('natural', 0.002356857167653292),
 ('personal', 0.0023566081734952646),
 ('more', 0.00235539519766598),
 ('high', 0.0023548555530815605),
 ('hardly', 0.0023485136786730067),
 ('superior', 0.002347312160007487),
 ('big', 0.0023463307195843493),
 ('alive', 0.002345887108786913),
 ('now', 0.0023368253650724344),
 ('otherwise', 0.0023316118219038357),
 ('enough', 0.002331558244168371),
 ('such', 0.0023313159079823265),
 ('fairly', 0.0023300790747863434),
 ('even', 0.0023295884605102823),
 ('capable', 0.002329072615940134),
 ('tight', 0.0023289164961650087),
 ('mad', 0.0023252643113845844),
 ('several', 0.002324876098041507),
 ('ahead', 0.00232478891174848),
 ('private', 0.002323938810211654),
 ('indeed', 0.0023233119218103422),
 ('no', 0.002321993798982203),
 ('somewhat', 0.0023197875411107224),
 ('serial', 0.002316222892701973),
 ('simple', 0.0023154877313890524),
 ('we', 0.0023152809960851227),
 ('immediately', 0.002314386462682709),
 ('old', 0.002313901722294),
 ('new', 0.0023136153967201574),
 ('occasional', 0.002313367300043446),
 ('honest', 0.002310746954588941),
 ('suddenly', 0.0023091385180307403),
 ('visual', 0.002304223240593406),
 ('main', 0.0023038137423872776),
 ('apparent', 0.002302475481884322),
 ('entertaining', 0.0023023063162522428),
 ('potential', 0.002300250289886611),
 ('soon', 0.0022988242749470353),
 ('silly', 0.002298284087838806),
 ('gary', 0.0022899832288248365),
 ('she', 0.0022875778111765914),
 ('don', 0.0022863504541293096),
 ('relatively', 0.002285570771709969),
 ('pathetic', 0.002280629178427796),
 ('bright', 0.002280352499605687),
 ('attractive', 0.002280010685796501),
 ('initially', 0.002279932158451899),
 ('up', 0.0022772831624123012),
 ('unexpected', 0.0022765307261015653),
 ('difficult', 0.0022753479475171212),
 ('hilarious', 0.0022747276808401406),
 ('surprisingly', 0.002269646495223617),
 ('average', 0.0022684700842012396),
 ('small', 0.002266828621142853),
 ('typical', 0.0022574522552556682),
 ('however', 0.00225704169246753),
 ('better', 0.0022567705132825926),
 ('what', 0.002254185440970014),
 ('brief', 0.002253217282469064),
 ('around', 0.0022473890284868177),
 ('guilty', 0.002245444106373558),
 ('impossible', 0.002245287470222215),
 ('obvious', 0.0022451600892185787),
 ('impressive', 0.0022442582461890724),
 ('perhaps', 0.002244145584435339),
 ('serious', 0.002243385622669684),
 ('decent', 0.0022328653519831255),
 ('certainly', 0.0022321207870243474),
 ('innocent', 0.00223192076106074),
 ('international', 0.0022311108082540463),
 ('ultimate', 0.002227728559313895),
 ('fair', 0.0022262788637288523),
 ('human', 0.002224433969022101),
 ('peter', 0.002223954022722312),
 ('less', 0.0022137925950437287),
 ('deep', 0.00221186169544016),
 ('happy', 0.0022112088452389167),
 ('possible', 0.0022095882316891602),
 ('english', 0.002206408349555691),
 ('rare', 0.0022058513645247073),
 ('tom', 0.0022030707241931058),
 ('common', 0.0022012160069994173),
 ('genuinely', 0.002198948512828223),
 ('basic', 0.0021977498874032183),
 ('standard', 0.0021941975079702502),
 ('actual', 0.002193307780211677),
 ('national', 0.0021906365425489404),
 ('star', 0.002187352805003062),
 ('numerous', 0.002184433922212774),
 ('sharp', 0.002180467631924875),
 ('true', 0.002179804625173467),
 ('lee', 0.002177677031010695),
 ('spectacular', 0.002177609291983217),
 ('worse', 0.0021744261458589323),
 ('quite', 0.0021718602677202976),
 ('lead', 0.0021657650352752524),
 ('favorite', 0.00216567820348113),
 ('finally', 0.002161142727699993),
 ('empty', 0.002158321847172612),
 ('strange', 0.0021572049950062994),
 ('clever', 0.0021557059229457523),
 ('few', 0.0021545920236595304),
 ('hard', 0.0021528800039478093),
 ('red', 0.002151662251742031),
 ('own', 0.002147796474615853),
 ('of', 0.0021453161208263016),
 ('frankly', 0.0021452378161904186),
 ('fast', 0.0021423510935287874),
 ('complete', 0.002138681263627506),
 ('particular', 0.002135804166824032),
 ('full', 0.002134401343947513),
 ('grand', 0.00213226141731942),
 ('dead', 0.0021316555748768394),
 ('fantastic', 0.0021304672573844597),
 ('social', 0.0021272845860293007),
 ('you', 0.002126860652727663),
 ('safe', 0.002125503559035235),
 ('virtually', 0.0021228974792839818),
 ('popular', 0.0021225651112164565),
 ('equally', 0.0021202583856818067),
 ('often', 0.0021192912537311985),
 ('latest', 0.002118743729347657),
 ('unique', 0.002118664223964338),
 ('political', 0.002116365997444799),
 ('unfortunate', 0.002115077048141771),
 ('early', 0.0021141573314535653),
 ('funniest', 0.0021140007773808992),
 ('ready', 0.0021139804915892495),
 ('accidentally', 0.0021139204676988393),
 ('sci', 0.0021135829268120196),
 ('recent', 0.0021111884479918692),
 ('meanwhile', 0.002110182756405934),
 ('sexual', 0.0021044354971384597),
 ('cinematic', 0.002104275709167208),
 ('fi', 0.0021041621438043457),
 ('along', 0.0021024956883385973),
 ('easy', 0.0021016454330641684),
 ('physical', 0.0021010909668950396),
 ('teen', 0.0020943036771299685),
 ('unusual', 0.002093759530768171),
 ('thin', 0.002092294837065608),
 ('rich', 0.002090055886096352),
 ('rather', 0.0020870186335897253),
 ('various', 0.0020820717486230663),
 ('modern', 0.0020804173582543413),
 ('supposedly', 0.002080402822714092),
 ('ultimately', 0.0020762703131997516),
 ('william', 0.002075711512092768),
 ('practically', 0.0020746764726317962),
 ('white', 0.0020720529883677505),
 ('simply', 0.0020681061456906506),
 ('overly', 0.00206257446442149),
 ('female', 0.0020538277325249763),
 ('previous', 0.002052852117674875),
 ('tiny', 0.002048222963157593),
 ('thoroughly', 0.002047877902676961),
 ('due', 0.0020475837898075817),
 ('can', 0.0020473404084645776),
 ('american', 0.0020471900845285005),
 ('successfully', 0.002045906484840632),
 ('running', 0.0020436877607170455),
 ('aside', 0.0020407291382383754),
 ('apparently', 0.0020399939153677798),
 ('worthy', 0.0020367988916105664),
 ('unbelievable', 0.002035088832233652),
 ('british', 0.002032234003322862),
 ('ridiculous', 0.0020301322616863367),
 ('fresh', 0.0020269257906189767),
 ('clearly', 0.0020242662191528346),
 ('young', 0.0020212947952003455),
 ('directly', 0.0020190399267392815),
 ('worst', 0.002018321777439585),
 ('constant', 0.0020156129690804677),
 ('either', 0.002014093341491322),
 ('dangerous', 0.0020136403323678005),
 ('former', 0.0020132344660249227),
 ('complex', 0.0020126673738765162),
 ('david', 0.0020110649362481653),
 ('interested', 0.0020088882476282043),
 ('predictable', 0.002008212956503268),
 ('exciting', 0.002007826204078083),
 ('to', 0.0020072741565426836),
 ('unable', 0.002005718101470375),
 ('graphic', 0.002005376865260159),
 ('comedic', 0.0020048017378319193),
 ('highly', 0.0019973415101339166),
 ('fine', 0.0019967923686048336),
 ('cheap', 0.001994060677197784),
 ('mysterious', 0.0019907877971670116),
 ('open', 0.00199015894799177),
 ('mostly', 0.0019884061604660734),
 ('latter', 0.0019880256460866478),
 ('live', 0.001986795686860045),
 ('rarely', 0.001984649462275682),
 ('ugly', 0.0019837783774960035),
 ('weak', 0.0019707188745212005),
 ('possibly', 0.0019706075424450655),
 ('thus', 0.001968906729120521),
 ('available', 0.001967797578191252),
 ('ex', 0.001966281279950839),
 ('sweet', 0.001964407940363985),
 ('two', 0.0019635062673976737),
 ('particularly', 0.0019613888258522964),
 ('be', 0.0019605016669826157),
 ('younger', 0.0019603510166971935),
 ('the', 0.001954997718634469),
 ('nowhere', 0.0019516959316690215),
 ('middle', 0.0019498844504859775),
 ('slow', 0.0019495347271947126),
 ('heavily', 0.0019485833263928904),
 ('offensive', 0.0019482222698296579),
 ('giant', 0.0019457706206236593),
 ('on', 0.0019406685251326917),
 ('poorly', 0.0019374121631600583),
 ('screen', 0.001929547443973368),
 ('enjoyable', 0.001923709317142371),
 ('talented', 0.0019224756187088095),
 ('dramatic', 0.001921834165809007),
 ('subject', 0.001920647737495943),
 ('time', 0.0019061790328508419),
 ('dark', 0.0019053830824877353),
 ('bizarre', 0.0018987347112403435),
 ('life', 0.0018962494533953567),
 ('ill', 0.0018936014090258438),
 ('unnecessary', 0.0018922404730965067),
 ('likely', 0.0018907534543162957),
 ('terrible', 0.0018903140224032968),
 ('film', 0.001887782153527676),
 ('critical', 0.001879942478999445),
 ('further', 0.0018762931089131496),
 ('loud', 0.0018747595872502252),
 ('billy', 0.001845104604459428),
 ('sudden', 0.0018410687770122414),
 ('dimensional', 0.0018398050325275027),
 ('romantic', 0.0018293731001022885),
 ('central', 0.0018247775451691005),
 ('essentially', 0.0018232053751663139),
 ('large', 0.001822843226009556),
 ('married', 0.0018222599660140423),
 ('detective', 0.0018207588729618088),
 ('surprising', 0.0018139300761870559),
 ('unfunny', 0.0018076086167803196),
 ('initial', 0.0018063052931670369),
 ('public', 0.0018019761571857404),
 ('one', 0.0018005131509604578),
 ('sole', 0.001795965620240488),
 ('double', 0.001788234721973658),
 ('flat', 0.0017738175378711758),
 ('down', 0.0017713157865333839),
 ('largely', 0.0017605355035379654),
 ('wide', 0.0017602308875251382),
 ('in', 0.0017575874535586198),
 ('somehow', 0.0017483060301847226),
 ('worth', 0.0017339160020791128),
 ('painfully', 0.0017203082812571528),
 ('odd', 0.0017070813783993787),
 ('limited', 0.0017021925664718698),
 ('extraordinary', 0.001701478305054484),
 ('frequently', 0.0017001988086572553),
 ('familiar', 0.001692156191506733),
 ('year', 0.001689555348394929),
 ('angry', 0.0016884671186636357),
 ('free', 0.001682755737495471),
 ('foreign', 0.0016816202029281),
 ('chris', 0.001681151841632327),
 ('naturally', 0.0016765910783645183),
 ('low', 0.0016725509847084342),
 ('crazy', 0.001661263296294774),
 ('genuine', 0.001660915690145844),
 ('recently', 0.0016572720876909868),
 ('appropriate', 0.0016503542247951867),
 ('fellow', 0.0016395777540513942),
 ('cute', 0.001631364584010221),
 ('steve', 0.0016290806706737333),
 ('military', 0.0016101588895060573),
 ('humorous', 0.001605772161822982),
 ('positive', 0.0016056163317255743),
 ('local', 0.0016000580609697656),
 ('laughable', 0.0015909884579457525),
 ('bottom', 0.0015875328878313534),
 ('self', 0.0015710444112117714),
 ('ten', 0.0015664543675963226),
 ('alone', 0.0015632633997377963),
 ('unlikely', 0.0015330664386263545),
 ('jean', 0.001500748385125891),
 ('narrative', 0.0014983497303010325),
 ('near', 0.0014588298150256776),
 ('united', 0.0014333462534527634),
 ('current', 0.0013395265610777166),
 ('oddly', 0.001328033675835792),
 ('heavy', 0.0012452460985996902)]
In [92]:
posscores=seed_score(['good','great','perfect','cool'])
negscores=seed_score(['bad','terrible','wrong',"crap","long","boring"])

## sentiment polarity score will be the difference between the words that are close to the positive seed
## and the words that are close to the negative seed
sentscores={}
for w in terms:
    sentscores[w] = posscores[w] - negscores[w]
    
In [93]:
sorted(sentscores.items(),key=itemgetter(1),reverse=False)
Out[93]:
[('terrible', -0.011337935788715956),
 ('boring', -0.009940296206073694),
 ('wrong', -0.0038957762492763384),
 ('bad', -0.0028038772074515574),
 ('laughable', -0.0027406189540628715),
 ('unfunny', -0.0027135011838022864),
 ('worst', -0.0026471332624708405),
 ('frankly', -0.002587338946017292),
 ('terribly', -0.0025576562935382737),
 ('horrible', -0.0024121874515479237),
 ('awful', -0.0021579647379428284),
 ('ugly', -0.0020067310545418757),
 ('oddly', -0.0019930152819700904),
 ('exciting', -0.001958238856113128),
 ('running', -0.0019564731340303175),
 ('total', -0.0018477526013480836),
 ('painfully', -0.0018460471673599388),
 ('successfully', -0.0018190646800686494),
 ('ridiculous', -0.0018031715407181696),
 ('sadly', -0.0017829759978268338),
 ('bottom', -0.001769537540949412),
 ('we', -0.0017453041163577711),
 ('current', -0.001733764414992582),
 ('dull', -0.0016994388711972343),
 ('positive', -0.0016958052468327694),
 ('fair', -0.00168358034258908),
 ('ten', -0.0016615516042460942),
 ('poorly', -0.0016255490883917791),
 ('longer', -0.0016219407642753233),
 ('supposedly', -0.0016197365007933275),
 ('long', -0.0016152178963842242),
 ('foreign', -0.0015749275429113633),
 ('responsible', -0.001571347852518935),
 ('complete', -0.001569859685280305),
 ('pathetic', -0.001562411460165396),
 ('sole', -0.0014762838822532996),
 ('stupid', -0.001403251021265037),
 ('particular', -0.001402573126479277),
 ('low', -0.0013894492381992967),
 ('worse', -0.0013611681182488936),
 ('giant', -0.0013523241357618141),
 ('chinese', -0.001337497643663317),
 ('unbelievable', -0.0013303905386979195),
 ('unnecessary', -0.00131660406790726),
 ('doesn', -0.0013067653462435543),
 ('down', -0.0013009964808874883),
 ('weak', -0.001290743094020337),
 ('seriously', -0.001280361005356888),
 ('guilty', -0.0012759457766346595),
 ('huge', -0.0012214986373703203),
 ('worth', -0.0012163717863471562),
 ('silly', -0.0012130592923207716),
 ('double', -0.001210481211132458),
 ('that', -0.0012070521514938042),
 ('disappointing', -0.0011807670388760487),
 ('nowhere', -0.0011606225325787114),
 ('possible', -0.0011491794774951733),
 ('frequently', -0.0011469801859383087),
 ('desperately', -0.0011424453715553045),
 ('shallow', -0.0011384244435511679),
 ('predictable', -0.0011120428233618845),
 ('to', -0.0011058198988504373),
 ('completely', -0.0011032444587055746),
 ('offensive', -0.0010970960168989908),
 ('poor', -0.0010965921061633229),
 ('public', -0.0010840280896041703),
 ('gary', -0.001067755005125484),
 ('absolutely', -0.001026522206518719),
 ('graphic', -0.001024011269851582),
 ('overly', -0.0010201841232575807),
 ('thankfully', -0.0010149951549212268),
 ('angry', -0.0010110188791198813),
 ('no', -0.0010020614536842705),
 ('you', -0.0009949959520265542),
 ('lame', -0.000994378561571215),
 ('attractive', -0.0009846101524466954),
 ('re', -0.0009818428273642007),
 ('utterly', -0.0009700899144521528),
 ('middle', -0.0009700496756428923),
 ('they', -0.0009699456032204955),
 ('modern', -0.0009602179163986603),
 ('heavy', -0.000959430853419093),
 ('loud', -0.0009593880159746828),
 ('international', -0.000947889571331048),
 ('due', -0.0009449777026130446),
 ('flat', -0.000943498486118813),
 ('slow', -0.000936043860322225),
 ('superior', -0.0009253200744961609),
 ('surprising', -0.000919772111017412),
 ('ex', -0.0009109836158998105),
 ('equally', -0.0008988178254337453),
 ('subject', -0.0008972107221829695),
 ('military', -0.0008941009608490051),
 ('apart', -0.0008921972866551358),
 ('talented', -0.0008917010667096683),
 ('standard', -0.0008872939756282405),
 ('hardly', -0.0008776965462900465),
 ('practically', -0.0008750915480863412),
 ('obvious', -0.0008701220382583389),
 ('female', -0.0008680989032270711),
 ('cheap', -0.0008564492852698932),
 ('physical', -0.0008498830824418741),
 ('possibly', -0.0008469268319057162),
 ('aren', -0.0008397280021778201),
 ('ve', -0.0008365152484233855),
 ('steve', -0.0008355973078291424),
 ('accidentally', -0.0008353770315996422),
 ('basic', -0.0008346153861572892),
 ('alien', -0.0008185638092304041),
 ('essentially', -0.0008082539162401208),
 ('dumb', -0.0008028376104339493),
 ('aside', -0.0008012383951446411),
 ('unique', -0.0008003752649116298),
 ('sweet', -0.0007971894394690261),
 ('anyway', -0.0007960303504455316),
 ('largely', -0.0007877402329115519),
 ('peter', -0.0007858145554954835),
 ('up', -0.0007819843592687314),
 ('rich', -0.0007799646645649497),
 ('didn', -0.0007710518174823778),
 ('fi', -0.0007680731803592316),
 ('recently', -0.0007668801474510802),
 ('hard', -0.0007643704792226866),
 ('sci', -0.0007583426737492777),
 ('even', -0.0007573246173436503),
 ('unfortunately', -0.0007564513391728083),
 ('plain', -0.0007550527694653989),
 ('either', -0.0007528454883448344),
 ('of', -0.0007525540145690811),
 ('can', -0.0007441589482778972),
 ('chris', -0.0007405228629783673),
 ('entertaining', -0.0007355219005145538),
 ('dark', -0.0007340251639409206),
 ('potential', -0.000731803390525919),
 ('near', -0.0007309453448941252),
 ('totally', -0.0007257501605525398),
 ('sudden', -0.0007167056368809609),
 ('there', -0.0007132810375892829),
 ('better', -0.0007121005858083877),
 ('bizarre', -0.0007102499887883572),
 ('fascinating', -0.0007101600372832039),
 ('extremely', -0.000702831348441181),
 ('crazy', -0.0007001875489527892),
 ('odd', -0.0007000656180773442),
 ('half', -0.000697018395206481),
 ('apparently', -0.0006909176358485037),
 ('free', -0.0006857205953191604),
 ('appropriate', -0.0006853377884564126),
 ('complex', -0.000684395886488557),
 ('funny', -0.0006838122964675361),
 ('rather', -0.0006768704066807464),
 ('indeed', -0.0006748263747000886),
 ('safe', -0.0006704534362490105),
 ('easy', -0.0006698501733494017),
 ('least', -0.0006696310548561795),
 ('narrative', -0.0006693801173021394),
 ('brief', -0.0006678116323338111),
 ('now', -0.0006673157481890471),
 ('somehow', -0.0006641988757246794),
 ('twice', -0.0006641287075245849),
 ('too', -0.0006627836222457967),
 ('alone', -0.0006580188413029815),
 ('painful', -0.000657722958170764),
 ('otherwise', -0.0006568516671700813),
 ('wide', -0.0006545396855330964),
 ('dead', -0.000653123385060093),
 ('honest', -0.0006515573730144176),
 ('big', -0.0006496164697680088),
 ('lead', -0.0006494149269442007),
 ('central', -0.0006470915436464454),
 ('though', -0.0006442513977681823),
 ('so', -0.0006428789535567973),
 ('one', -0.0006418320196895418),
 ('interested', -0.0006401133976040372),
 ('time', -0.0006364896936115636),
 ('rarely', -0.0006363730750465154),
 ('merely', -0.0006308553338537993),
 ('tiny', -0.0006298494923142961),
 ('else', -0.0006277954394166021),
 ('critical', -0.0006250500330304016),
 ('be', -0.0006191685537481752),
 ('local', -0.0006158657943370705),
 ('major', -0.0006120423051833371),
 ('directly', -0.0006075876863173642),
 ('such', -0.0006049323327150537),
 ('various', -0.0006040198222932126),
 ('likely', -0.0005997093672459831),
 ('future', -0.000599449654797211),
 ('robin', -0.0005869254763942828),
 ('genuine', -0.0005823523828022148),
 ('only', -0.0005811959336262402),
 ('ultimate', -0.0005785256606300397),
 ('oh', -0.0005775859845771575),
 ('cute', -0.0005756849515812109),
 ('finally', -0.0005751148206935329),
 ('special', -0.0005682017753211362),
 ('former', -0.0005632603785982321),
 ('few', -0.0005574191632068473),
 ('aware', -0.0005555379716066988),
 ('much', -0.0005551280489618786),
 ('mostly', -0.0005543440959590073),
 ('latter', -0.000552853559619679),
 ('early', -0.0005520667304396258),
 ('available', -0.0005509522206528497),
 ('climactic', -0.0005506226915431576),
 ('already', -0.0005462021598010699),
 ('truly', -0.0005428115437271551),
 ('seemingly', -0.0005419132742930663),
 ('mary', -0.0005418292000601045),
 ('whole', -0.0005385437695464993),
 ('just', -0.0005370653451239643),
 ('along', -0.0005340060418226622),
 ('ll', -0.0005337104369835871),
 ('fellow', -0.0005223175816882971),
 ('familiar', -0.0005145307593608793),
 ('somewhere', -0.0005102507291033648),
 ('perhaps', -0.0005082367122989867),
 ('simple', -0.0005067682313010637),
 ('tough', -0.0005045886225163313),
 ('common', -0.000503857413382454),
 ('dramatic', -0.0005035860765712917),
 ('it', -0.000503461169935823),
 ('entirely', -0.0004971688457674106),
 ('main', -0.0004953088674902852),
 ('forever', -0.0004948695741177358),
 ('bright', -0.0004927768773291159),
 ('simply', -0.0004890112391900646),
 ('impressive', -0.0004887583209977186),
 ('large', -0.00048589159722396864),
 ('enough', -0.0004840620655300397),
 ('typical', -0.00048259380175424233),
 ('here', -0.00047975496205869637),
 ('bigger', -0.00047644760254505905),
 ('romantic', -0.0004672569278175954),
 ('therefore', -0.00046688494803137984),
 ('short', -0.00046492991647513904),
 ('average', -0.0004646315755049298),
 ('mainly', -0.00046421322270260136),
 ('deep', -0.00046330482043274637),
 ('ultimately', -0.00045854112467534234),
 ('decent', -0.00045700466155881945),
 ('unlikely', -0.00045562672156462466),
 ('cinematic', -0.00045532681631786295),
 ('straight', -0.00045515277058060053),
 ('incredibly', -0.00045403526773077135),
 ('far', -0.0004522929759844356),
 ('really', -0.0004501960190467589),
 ('single', -0.00044602936593327903),
 ('strange', -0.00044431205594842394),
 ('instead', -0.0004435105420098496),
 ('ever', -0.0004386290841801757),
 ('full', -0.0004378995395570035),
 ('self', -0.00043537616380091284),
 ('thus', -0.000435262083165897),
 ('ahead', -0.0004322469438230391),
 ('important', -0.00043116312351972746),
 ('back', -0.0004307387709850414),
 ('spectacular', -0.0004297835719436231),
 ('relatively', -0.0004260154655909695),
 ('the', -0.0004236841293395956),
 ('occasional', -0.0004234868072372209),
 ('maybe', -0.0004234443406997417),
 ('thin', -0.0004225904058378794),
 ('not', -0.0004213831255542376),
 ('dimensional', -0.0004203062524537585),
 ('happy', -0.00041551703086272753),
 ('away', -0.00041521498971391945),
 ('never', -0.00041427663947311974),
 ('certainly', -0.00041414089937138153),
 ('successful', -0.0004137343964682209),
 ('alive', -0.0004120879048515641),
 ('usual', -0.0004085566481352165),
 ('young', -0.00040793611859985153),
 ('easily', -0.0004066581498735587),
 ('don', -0.00040568204613010764),
 ('empty', -0.0004019144488789264),
 ('then', -0.00040178677514092313),
 ('apparent', -0.00039784352360013606),
 ('screen', -0.00039005452168201543),
 ('previous', -0.00038622574033567647),
 ('often', -0.0003804195026694524),
 ('different', -0.00037876868648600674),
 ('difficult', -0.0003764750831032368),
 ('old', -0.0003725818400590591),
 ('impossible', -0.0003710713261974004),
 ('on', -0.00036269638020895115),
 ('tom', -0.00036243731322769916),
 ('real', -0.0003623537714778427),
 ('however', -0.00036185401254073147),
 ('married', -0.0003600511230321961),
 ('other', -0.00035950949260183697),
 ('unable', -0.00035852239590076053),
 ('funniest', -0.00035829495399516183),
 ('ill', -0.0003565244942728442),
 ('many', -0.00035609966284931224),
 ('previously', -0.0003550544177524568),
 ('as', -0.00035229762209222117),
 ('quite', -0.00034903953306314765),
 ('desperate', -0.0003465632012755802),
 ('film', -0.0003457790164877331),
 ('violent', -0.000339928133520359),
 ('english', -0.00033205306151005455),
 ('small', -0.00032812818234550893),
 ('worthy', -0.00032729125644853033),
 ('fast', -0.0003202755225513534),
 ('likable', -0.00031919523251016345),
 ('barely', -0.0003171112852659013),
 ('naturally', -0.00031484551933644864),
 ('constant', -0.0003128648272295361),
 ('top', -0.0003118794791431177),
 ('jean', -0.00031142759612205057),
 ('eventually', -0.00030643819158205103),
 ('human', -0.0003004701642169835),
 ('evil', -0.00029877898921396723),
 ('believable', -0.0002976062581400152),
 ('white', -0.00029256051223548133),
 ('serious', -0.0002915094823329181),
 ('wild', -0.0002904852948483836),
 ('he', -0.0002895451984794026),
 ('year', -0.0002894236019519831),
 ('quickly', -0.00028486728778277575),
 ('david', -0.0002841836580306811),
 ('two', -0.0002803594837186671),
 ('hilarious', -0.0002792293408196405),
 ('billy', -0.0002784563033864521),
 ('fairly', -0.00027817137110490225),
 ('clear', -0.0002757689850125377),
 ('more', -0.00027483851236456084),
 ('little', -0.0002738519948942077),
 ('united', -0.0002712073575661905),
 ('social', -0.0002692116370561332),
 ('badly', -0.0002656577126927363),
 ('first', -0.0002656045421871446),
 ('soon', -0.00026406576498318787),
 ('limited', -0.00026077833129929196),
 ('pretty', -0.0002556044683908313),
 ('recent', -0.00025431639723114043),
 ('in', -0.0002529341542810658),
 ('comedic', -0.00024923527505396137),
 ('nearly', -0.00024558280477428654),
 ('entire', -0.00024552533171258266),
 ('personal', -0.0002429041811083158),
 ('genuinely', -0.00023638211781148218),
 ('ready', -0.00023297057991365734),
 ('interesting', -0.0002319057727939739),
 ('new', -0.0002288779420982481),
 ('basically', -0.00022794420212937563),
 ('certain', -0.00022774660798796816),
 ('exactly', -0.00022187133498753273),
 ('around', -0.0002212138544179596),
 ('national', -0.00021892498564345803),
 ('next', -0.0002158841794732266),
 ('less', -0.0002053370589299667),
 ('popular', -0.00020410840527040636),
 ('out', -0.00020327091044571987),
 ('immediately', -0.00020113758416102478),
 ('heavily', -0.0001978586638053557),
 ('virtually', -0.00019741273624147545),
 ('like', -0.00019201990770032797),
 ('effectively', -0.00018865752036787429),
 ('initial', -0.00018847850036724087),
 ('right', -0.00018667331121528363),
 ('again', -0.00018442600209907407),
 ('able', -0.00018271056934554336),
 ('obviously', -0.0001757623199258253),
 ('meanwhile', -0.0001700542754715078),
 ('favorite', -0.00016790618953759634),
 ('ago', -0.00016776635504992367),
 ('well', -0.00016722410208565514),
 ('further', -0.00016372379001952783),
 ('most', -0.00015933307977614858),
 ('close', -0.00015749033148161314),
 ('fantastic', -0.00015367714248264988),
 ('fine', -0.00014661878484316512),
 ('famous', -0.00014606101687336627),
 ('about', -0.00014439417455700205),
 ('own', -0.00014057528409486264),
 ('star', -0.00014039861913854286),
 ('suddenly', -0.00013654192505882642),
 ('tight', -0.00013514609804481277),
 ('younger', -0.0001342270569405857),
 ('french', -0.00013271869553648733),
 ('several', -0.00013224135315127988),
 ('over', -0.00013178399190350585),
 ('subtle', -0.0001304908962331229),
 ('extraordinary', -0.00013034796186768647),
 ('third', -0.00012689187862063335),
 ('teen', -0.00012547607896861357),
 ('open', -0.0001251771370265795),
 ('good', -0.00012464300361279563),
 ('mad', -0.00012423836030612785),
 ('what', -0.00011744539314099403),
 ('sure', -0.00011140597894659064),
 ('actual', -0.00010631033966002502),
 ('mental', -0.00010554178290598766),
 ('intense', -0.00010320983741859144),
 ('isn', -0.0001019431777130762),
 ('quick', -9.144819523439858e-05),
 ('cold', -8.453928225732842e-05),
 ('thoroughly', -8.446311268023136e-05),
 ('high', -8.108657336266872e-05),
 ('private', -8.082521935904967e-05),
 ('mysterious', -7.89915855672666e-05),
 ('psychological', -7.079354944579917e-05),
 ('last', -6.040588450787962e-05),
 ('emotional', -5.9231389028841126e-05),
 ('dangerous', -5.840343082568443e-05),
 ('numerous', -5.6507297730663385e-05),
 ('red', -5.518097392980415e-05),
 ('probably', -5.510557060454035e-05),
 ('enjoyable', -5.304302448897163e-05),
 ('almost', -5.268059132795041e-05),
 ('necessarily', -5.181284608931757e-05),
 ('live', -5.161664924643525e-05),
 ('incredible', -5.0838444616687316e-05),
 ('yet', -4.993545030473239e-05),
 ('actually', -4.970290822225044e-05),
 ('technical', -4.964327727627381e-05),
 ('once', -4.108719790097684e-05),
 ('very', -3.4591909186824834e-05),
 ('clearly', -3.0778856868112024e-05),
 ('occasionally', -2.9330789441932553e-05),
 ('life', -2.7919699401356872e-05),
 ('key', -2.5608117934161883e-05),
 ('rare', -2.1669136957346707e-05),
 ('later', -1.826701189417898e-05),
 ('off', -1.7711394330840666e-05),
 ('powerful', -1.5583342484341237e-05),
 ('together', -9.288051209772625e-06),
 ('true', -4.87041375588906e-06),
 ('similar', -3.89795186164733e-06),
 ('michael', -2.5784077662730463e-06),
 ('robert', -2.421454439686249e-07),
 ('regular', 9.074791722952363e-07),
 ('nice', 5.07184371876428e-06),
 ('particularly', 1.608419229506765e-05),
 ('terrific', 1.7873781404614906e-05),
 ('smart', 1.830028423949106e-05),
 ('visual', 1.9897264289668766e-05),
 ('american', 2.1072590232642293e-05),
 ('general', 2.142930413662073e-05),
 ('realistic', 3.266246171006355e-05),
 ('intelligent', 3.9886651389091365e-05),
 ('mean', 5.054165961934035e-05),
 ('humorous', 5.432574384601043e-05),
 ('usually', 5.541394191310462e-05),
 ('innocent', 6.0470827055490034e-05),
 ('capable', 6.280583117186135e-05),
 ('originally', 6.378534225396108e-05),
 ('sexual', 6.454153211762643e-05),
 ('grand', 6.640245105755038e-05),
 ('she', 7.44441831502362e-05),
 ('also', 7.641971683174965e-05),
 ('same', 7.726204521844339e-05),
 ('surprisingly', 7.814986440048783e-05),
 ('strong', 7.969296011480593e-05),
 ('still', 8.722930070878283e-05),
 ('willing', 8.872637216204328e-05),
 ('necessary', 8.891509448837737e-05),
 ('inevitable', 8.992341325145042e-05),
 ('deadly', 9.701581319515075e-05),
 ('literally', 9.852399016292528e-05),
 ('fresh', 0.0001081531161383144),
 ('witty', 0.00011555628335896363),
 ('movie', 0.00011610771682999971),
 ('before', 0.0001294858280637642),
 ('biggest', 0.00013372314741634094),
 ('secret', 0.00014397664031058025),
 ('late', 0.00015327727974057386),
 ('tim', 0.00015454685279179918),
 ('original', 0.000166646316708497),
 ('greater', 0.00016828610277696545),
 ('slightly', 0.00016914106338807022),
 ('soft', 0.00016963763368304206),
 ('lee', 0.00018716734589623767),
 ('beautiful', 0.00018782224248475215),
 ('sympathetic', 0.00018964480188837264),
 ('nevertheless', 0.00019220598277431417),
 ('slowly', 0.00019659574810863443),
 ('non', 0.0001988968383649769),
 ('sometimes', 0.00019992977542792213),
 ('british', 0.00020169386175819472),
 ('clever', 0.00020410422641186548),
 ('blue', 0.00021376094430168493),
 ('and', 0.00021441861464148188),
 ('nasty', 0.00021891481020010596),
 ('minor', 0.00021980135793128047),
 ('william', 0.0002221413294301671),
 ('normal', 0.0002222509905768803),
 ('especially', 0.00024404651952017204),
 ('moral', 0.00024919162284080124),
 ('always', 0.0002532388991124115),
 ('political', 0.0002623995973431354),
 ('weird', 0.0002690142191958027),
 ('latest', 0.0002748801528105005),
 ('highly', 0.0002803689422022479),
 ('wonderful', 0.0002952821816267323),
 ('serial', 0.0002977511634248602),
 ('initially', 0.0003026285852441848),
 ('memorable', 0.0003047299564975484),
 ('unfortunate', 0.00030614891738813574),
 ('final', 0.0003100101631852959),
 ('fun', 0.0003142947725700519),
 ('older', 0.0003187683928988676),
 ('brilliant', 0.0003215879003835055),
 ('co', 0.00032396165017799846),
 ('natural', 0.0003269851375293292),
 ('comic', 0.0003289997380387633),
 ('hot', 0.00033107428643060514),
 ('intriguing', 0.0003346515095666398),
 ('danny', 0.0003500028138064999),
 ('black', 0.00035883217167973543),
 ('unusual', 0.0003729001202466308),
 ('overall', 0.00038221150638625533),
 ('forward', 0.0003894123135382067),
 ('effective', 0.00039077394646117947),
 ('surely', 0.0003991479944951801),
 ('visually', 0.00039954294262994645),
 ('second', 0.0004001165402682627),
 ('extra', 0.00040435182000170397),
 ('fake', 0.0004087926346521024),
 ('lucky', 0.000414792487349712),
 ('detective', 0.0004258121327198735),
 ('scary', 0.00044858555977865663),
 ('all', 0.00046052119261720406),
 ('light', 0.0004666619849259899),
 ('unexpected', 0.0004755937806912272),
 ('present', 0.0004883839287330521),
 ('remarkable', 0.0004909132424787039),
 ('pure', 0.0004930297335705856),
 ('animated', 0.0005004334032993982),
 ('constantly', 0.0005005278442861616),
 ('solid', 0.0005104773753039919),
 ('sean', 0.0005354051081864117),
 ('fully', 0.000538860116644163),
 ('green', 0.0005626488560537576),
 ('sad', 0.0005690037657206867),
 ('classic', 0.0006113368528880467),
 ('best', 0.0006232564824836927),
 ('computer', 0.0006261920366367954),
 ('somewhat', 0.0006418769897289041),
 ('creative', 0.0006670668130810274),
 ('excellent', 0.0006877090591556421),
 ('steven', 0.000700126493842716),
 ('wonderfully', 0.0007011900004185701),
 ('tony', 0.000713347461311362),
 ('sharp', 0.000714782515078788),
 ('anti', 0.0007395501537729543),
 ('definitely', 0.0007444490420821688),
 ('musical', 0.0007462502942548184),
 ('friendly', 0.000761119245625841),
 ('professional', 0.0007843680588404591),
 ('perfectly', 0.0008016637337694504),
 ('emotionally', 0.0008558689372788594),
 ('john', 0.0008935668249893006),
 ('past', 0.0009797392305912716),
 ('nicely', 0.000980917414583356),
 ('outstanding', 0.0009961964716528754),
 ('hearted', 0.0010123986481938667),
 ('traditional', 0.0010483779957726606),
 ('generally', 0.0010503112142917228),
 ('amazing', 0.0010694376494112855),
 ('suspenseful', 0.001114814515979623),
 ('man', 0.0011264816790070354),
 ('earlier', 0.0011453103516269971),
 ('known', 0.0011615678408455577),
 ('lovely', 0.0013584838796366254),
 ('quiet', 0.0013999389580393261),
 ('convincing', 0.001406054680451586),
 ('looking', 0.001440831372975097),
 ('great', 0.001442680743868476),
 ('eccentric', 0.0015794026794116564),
 ('frank', 0.0016222794801935836),
 ('greatest', 0.0016856586307944627),
 ('perfect', 0.004148090426618199),
 ('cool', 0.009074307660516472)]

Now let's apply this methodology to real (and important!) scenario where we don't have any sentiment labels: the Kardashians

In [94]:
## Loading the Kardashian data
with open("kardashian-transcripts.json", "rb") as f:
    transcripts = json.load(f)
In [95]:
msgs = [m['text'].lower() for transcript in transcripts
        for m in transcript ]
In [96]:
#msgs_pos_tagged = [pos_tag(tokenizer.tokenize(m)) for m in msgs]
In [97]:
msgs_adj_adv_only_tokenized=[[w for w,tag in m if tag in ["JJ","RB","RBS","RBJ","JJR","JJS"]]
                      for m in msgs_pos_tagged]
In [98]:
msgs_adj_adv_only=[" ".join([w for w,tag in m if tag in ["JJ","RB","RBS","RBJ","JJR","JJS"]])
                      for m in msgs_pos_tagged]
In [99]:
msgs[23]
Out[99]:
'and then if you could take out the trash, and then if you go to dash, maybe tomorrow or whatever, later today and just...'
In [100]:
msgs_adj_adv_only[23]
Out[100]:
'then then maybe later just'
In [101]:
vec = CountVectorizer(min_df = 10)
X = vec.fit_transform(msgs_adj_adv_only)
terms_kard = vec.get_feature_names()
len(terms_kard)
Out[101]:
358
In [102]:
pmi_matrix_kard=getcollocations_matrix(X)
In [103]:
getcollocations("good",pmi_matrix_kard,terms_kard)
Out[103]:
[('good', 0.0013962375073486185),
 ('positive', 0.0005952380952380953),
 ('horrible', 0.0003968253968253968),
 ('awful', 0.00030525030525030525),
 ('nude', 0.0003006253006253006),
 ('proud', 0.00024366471734892786),
 ('extremely', 0.0002204585537918871),
 ('willing', 0.00018896447467876037),
 ('pretty', 0.0001670843776106934),
 ('bruce', 0.00016534391534391533),
 ('strong', 0.00015873015873015873),
 ('such', 0.00013598378084359391),
 ('anywhere', 0.00013227513227513228),
 ('dramatic', 0.00012025012025012025),
 ('everything', 0.00012025012025012025),
 ('honest', 0.00012025012025012025),
 ('online', 0.00012025012025012025),
 ('though', 0.00011671335200746966),
 ('adrienne', 0.00011337868480725624),
 ('half', 0.00011022927689594356),
 ('kimberly', 0.00011022927689594356),
 ('wish', 0.00010175010175010176),
 ('very', 9.831259831259832e-05),
 ('that', 9.101499927188001e-05),
 ('really', 8.846426043878273e-05),
 ('he', 8.818342151675484e-05),
 ('smart', 8.267195767195767e-05),
 ('all', 7.78089013383131e-05),
 ('fun', 7.78089013383131e-05),
 ('rob', 7.78089013383131e-05),
 ('instead', 7.348618459729571e-05),
 ('super', 7.348618459729571e-05),
 ('too', 7.297938332421091e-05),
 ('black', 7.215007215007215e-05),
 ('like', 6.961849067112225e-05),
 ('big', 6.764069264069264e-05),
 ('actually', 6.705191036988273e-05),
 ('um', 6.421122925977295e-05),
 ('sure', 6.081615277017576e-05),
 ('before', 6.012506012506013e-05),
 ('hard', 5.922767116796967e-05),
 ('it', 5.860290670417253e-05),
 ('about', 5.7510927076144466e-05),
 ('busy', 5.7510927076144466e-05),
 ('sometimes', 5.7510927076144466e-05),
 ('clean', 5.628729032984352e-05),
 ('real', 5.166997354497354e-05),
 ('always', 5.069778588942352e-05),
 ('close', 4.99151442547669e-05),
 ('they', 4.99151442547669e-05),
 ('ve', 4.9680800854509775e-05),
 ('great', 4.95412480431207e-05),
 ('also', 4.8393341076267905e-05),
 ('not', 4.468754468754469e-05),
 ('back', 4.4276194903809964e-05),
 ('uh', 4.409171075837742e-05),
 ('we', 4.166145898429363e-05),
 ('she', 4.101554489151388e-05),
 ('maybe', 3.968253968253968e-05),
 ('single', 3.9485114111979786e-05),
 ('own', 3.834061805076298e-05),
 ('definitely', 3.7578162578162574e-05),
 ('still', 3.6743092298647854e-05),
 ('you', 3.651767455448437e-05),
 ('healthy', 3.575003575003575e-05),
 ('armenian', 3.480924533556112e-05),
 ('so', 3.4052186737495216e-05),
 ('pregnant', 3.348737525952716e-05),
 ('best', 3.094155140938767e-05),
 ('hot', 3.0761658668635414e-05),
 ('ready', 3.0292015024839453e-05),
 ('nervous', 3.0062530062530064e-05),
 ('different', 2.972474882587242e-05),
 ('least', 2.9394473838918284e-05),
 ('whole', 2.8446265005404792e-05),
 ('as', 2.7994736989445982e-05),
 ('again', 2.775002775002775e-05),
 ('gorgeous', 2.755731922398589e-05),
 ('re', 2.6577692729161045e-05),
 ('absolutely', 2.5936300446104367e-05),
 ('far', 2.5936300446104367e-05),
 ('well', 2.5221953188054885e-05),
 ('let', 2.519526329050139e-05),
 ('only', 2.4495394865765236e-05),
 ('just', 2.3795526441029087e-05),
 ('don', 2.2937884209560507e-05),
 ('cool', 2.2806057288815908e-05),
 ('probably', 2.2806057288815908e-05),
 ('then', 2.2675736961451248e-05),
 ('now', 2.2102528529263747e-05),
 ('few', 2.204585537918871e-05),
 ('right', 2.140374308659098e-05),
 ('old', 2.133469875405359e-05),
 ('happy', 2.077619878666999e-05),
 ('ll', 2.058756922570152e-05),
 ('here', 2.047975636318077e-05),
 ('long', 2.035002035002035e-05),
 ('perfect', 2.035002035002035e-05),
 ('next', 2.0194676683226302e-05),
 ('never', 2.010260368922983e-05),
 ('together', 1.9742557055989893e-05),
 ('bad', 1.917030902538149e-05),
 ('better', 1.917030902538149e-05),
 ('comfortable', 1.8630300320441167e-05),
 ('beautiful', 1.8119881133579763e-05),
 ('anymore', 1.740462266778056e-05),
 ('obviously', 1.6958350291683625e-05),
 ('last', 1.643169345032699e-05),
 ('gonna', 1.6380821334381707e-05),
 ('honestly', 1.5936762924714734e-05),
 ('first', 1.5873015873015872e-05),
 ('enough', 1.486237441293621e-05),
 ('already', 1.3778659611992945e-05),
 ('ever', 1.3096547750013096e-05),
 ('new', 1.2270420433685741e-05),
 ('crazy', 1.1403028644407954e-05),
 ('up', 9.44822373393802e-06),
 ('wrong', 9.315150160220583e-06),
 ('else', 8.877525656049147e-06),
 ('there', 8.332291796858726e-06),
 ('little', 6.7401341286691605e-06),
 ('more', 5.249013185521122e-06),
 ('even', 3.3572368597749307e-06),
 ('much', 2.9071457642886215e-06),
 ('able', 0.0),
 ('acceptable', 0.0),
 ('accurate', 0.0),
 ('active', 0.0),
 ('afraid', 0.0),
 ('ago', 0.0),
 ('ahead', 0.0),
 ('alcoholic', 0.0),
 ('almost', 0.0),
 ('alone', 0.0),
 ('along', 0.0),
 ('amazing', 0.0),
 ('american', 0.0),
 ('anal', 0.0),
 ('angry', 0.0),
 ('annoying', 0.0),
 ('anxious', 0.0),
 ('anyway', 0.0),
 ('apart', 0.0),
 ('apparently', 0.0),
 ('around', 0.0),
 ('atm', 0.0),
 ('away', 0.0),
 ('awesome', 0.0),
 ('awkward', 0.0),
 ('barely', 0.0),
 ('basic', 0.0),
 ('basically', 0.0),
 ('belly', 0.0),
 ('bible', 0.0),
 ('bigger', 0.0),
 ('biggest', 0.0),
 ('boring', 0.0),
 ('bright', 0.0),
 ('bunim', 0.0),
 ('can', 0.0),
 ('certain', 0.0),
 ('certainly', 0.0),
 ('clear', 0.0),
 ('clearly', 0.0),
 ('cold', 0.0),
 ('common', 0.0),
 ('complete', 0.0),
 ('completely', 0.0),
 ('constantly', 0.0),
 ('couple', 0.0),
 ('cute', 0.0),
 ('dead', 0.0),
 ('deep', 0.0),
 ('delicious', 0.0),
 ('diaper', 0.0),
 ('didn', 0.0),
 ('difficult', 0.0),
 ('disappointed', 0.0),
 ('doesn', 0.0),
 ('double', 0.0),
 ('down', 0.0),
 ('dry', 0.0),
 ('dumb', 0.0),
 ('early', 0.0),
 ('easier', 0.0),
 ('easy', 0.0),
 ('embarrassing', 0.0),
 ('emotional', 0.0),
 ('entire', 0.0),
 ('eric', 0.0),
 ('especially', 0.0),
 ('everyone', 0.0),
 ('everywhere', 0.0),
 ('exactly', 0.0),
 ('excited', 0.0),
 ('exciting', 0.0),
 ('extra', 0.0),
 ('fabulous', 0.0),
 ('fair', 0.0),
 ('family', 0.0),
 ('fast', 0.0),
 ('fat', 0.0),
 ('favorite', 0.0),
 ('female', 0.0),
 ('finally', 0.0),
 ('fine', 0.0),
 ('forever', 0.0),
 ('forward', 0.0),
 ('free', 0.0),
 ('fresh', 0.0),
 ('full', 0.0),
 ('funny', 0.0),
 ('fur', 0.0),
 ('girlfriend', 0.0),
 ('glad', 0.0),
 ('god', 0.0),
 ('gray', 0.0),
 ('green', 0.0),
 ('gross', 0.0),
 ('grown', 0.0),
 ('guilty', 0.0),
 ('guys', 0.0),
 ('high', 0.0),
 ('hopefully', 0.0),
 ('huge', 0.0),
 ('huh', 0.0),
 ('hundred', 0.0),
 ('hungry', 0.0),
 ('immediately', 0.0),
 ('important', 0.0),
 ('incredible', 0.0),
 ('inside', 0.0),
 ('interested', 0.0),
 ('isn', 0.0),
 ('jealous', 0.0),
 ('kardashian', 0.0),
 ('kelly', 0.0),
 ('khloe', 0.0),
 ('kim', 0.0),
 ('kourtney', 0.0),
 ('kris', 0.0),
 ('laker', 0.0),
 ('lamar', 0.0),
 ('late', 0.0),
 ('lately', 0.0),
 ('later', 0.0),
 ('less', 0.0),
 ('lily', 0.0),
 ('literally', 0.0),
 ('live', 0.0),
 ('love', 0.0),
 ('low', 0.0),
 ('luxurious', 0.0),
 ('mad', 0.0),
 ('major', 0.0),
 ('male', 0.0),
 ('many', 0.0),
 ('married', 0.0),
 ('mean', 0.0),
 ('miserable', 0.0),
 ('miss', 0.0),
 ('moral', 0.0),
 ('most', 0.0),
 ('murray', 0.0),
 ('naked', 0.0),
 ('natural', 0.0),
 ('necessary', 0.0),
 ('nice', 0.0),
 ('normal', 0.0),
 ('normally', 0.0),
 ('off', 0.0),
 ('often', 0.0),
 ('oh', 0.0),
 ('okay', 0.0),
 ('older', 0.0),
 ('once', 0.0),
 ('open', 0.0),
 ('other', 0.0),
 ('out', 0.0),
 ('outside', 0.0),
 ('past', 0.0),
 ('people', 0.0),
 ('personal', 0.0),
 ('poor', 0.0),
 ('possible', 0.0),
 ('possibly', 0.0),
 ('private', 0.0),
 ('professional', 0.0),
 ('public', 0.0),
 ('quiet', 0.0),
 ('rather', 0.0),
 ('red', 0.0),
 ('regular', 0.0),
 ('rich', 0.0),
 ('rid', 0.0),
 ('ridiculous', 0.0),
 ('rude', 0.0),
 ('sad', 0.0),
 ('safe', 0.0),
 ('same', 0.0),
 ('san', 0.0),
 ('scary', 0.0),
 ('scott', 0.0),
 ('second', 0.0),
 ('secret', 0.0),
 ('selfish', 0.0),
 ('sensitive', 0.0),
 ('serious', 0.0),
 ('seriously', 0.0),
 ('sexual', 0.0),
 ('sexy', 0.0),
 ('short', 0.0),
 ('sick', 0.0),
 ('sister', 0.0),
 ('small', 0.0),
 ('somewhere', 0.0),
 ('soon', 0.0),
 ('sorry', 0.0),
 ('special', 0.0),
 ('straight', 0.0),
 ('stupid', 0.0),
 ('sudden', 0.0),
 ('supportive', 0.0),
 ('sweet', 0.0),
 ('tall', 0.0),
 ('ten', 0.0),
 ('thebouncedryer', 0.0),
 ('top', 0.0),
 ('total', 0.0),
 ('totally', 0.0),
 ('touch', 0.0),
 ('tough', 0.0),
 ('true', 0.0),
 ('truly', 0.0),
 ('truthful', 0.0),
 ('tryclearblue', 0.0),
 ('twice', 0.0),
 ('ugly', 0.0),
 ('uncomfortable', 0.0),
 ('upset', 0.0),
 ('usually', 0.0),
 ('wear', 0.0),
 ('weird', 0.0),
 ('welcome', 0.0),
 ('what', 0.0),
 ('white', 0.0),
 ('who', 0.0),
 ('won', 0.0),
 ('wonderful', 0.0),
 ('worried', 0.0),
 ('worse', 0.0),
 ('worst', 0.0),
 ('yeah', 0.0),
 ('year', 0.0),
 ('yes', 0.0),
 ('yet', 0.0),
 ('young', 0.0),
 ('younger', 0.0)]
In [106]:
posscores=seed_score(['good',"great"],pmi_matrix_kard,terms_kard)
negscores=seed_score(['bad'],pmi_matrix_kard,terms_kard)

## sentiment polarity score will be the difference between the words that are close to the positive seed
## and the words that are close to the negative seed
sentscores={}
for w in terms_kard:
    sentscores[w]=posscores[w]-negscores[w]

neglexicon_kard = sorted(sentscores.items(),key=itemgetter(1),reverse=False)[:10]
poslexicon_kard = sorted(sentscores.items(),key=itemgetter(1),reverse=False)[-10:]
In [107]:
sorted(sentscores.items(),key=itemgetter(1),reverse=False)
Out[107]:
[('bad', -0.004933680845053443),
 ('horrible', -0.0005693581780538303),
 ('san', -0.00040257648953301127),
 ('worried', -0.0003716090672612412),
 ('worst', -0.00023004370830457787),
 ('high', -0.00022469385462307607),
 ('rich', -0.00021003990758244065),
 ('ready', -0.00019097139906964003),
 ('normal', -0.00016950589032968896),
 ('busy', -0.0001525289805062962),
 ('can', -0.00014639145073927682),
 ('entire', -0.0001271294177472667),
 ('sorry', -0.00012326077909859053),
 ('able', -0.00011670194865114262),
 ('enough', -9.369757782068481e-05),
 ('kourtney', -8.350765556432388e-05),
 ('seriously', -7.791803023219573e-05),
 ('again', -7.359789968485622e-05),
 ('probably', -6.0485630200772624e-05),
 ('around', -5.891363261458702e-05),
 ('long', -5.397179310222788e-05),
 ('still', -5.271834981979909e-05),
 ('other', -5.242651351769991e-05),
 ('now', -4.557302009966452e-05),
 ('obviously', -4.4976494251856576e-05),
 ('too', -4.3628979161213044e-05),
 ('away', -4.052409175844072e-05),
 ('fast', -3.877141151200752e-05),
 ('especially', -3.5593426961842966e-05),
 ('little', -3.0184078924040153e-05),
 ('never', -2.7248013774370832e-05),
 ('right', -2.11946200471519e-05),
 ('not', -1.594014173398639e-05),
 ('more', -1.3921295839860365e-05),
 ('hard', -1.2875580688689063e-05),
 ('ever', -1.0818887271749949e-05),
 ('just', -8.23636069950847e-06),
 ('nice', -7.982349428942723e-06),
 ('together', -4.291860229563018e-06),
 ('much', -4.250653284081997e-06),
 ('acceptable', 0.0),
 ('accurate', 0.0),
 ('active', 0.0),
 ('afraid', 0.0),
 ('ago', 0.0),
 ('ahead', 0.0),
 ('alcoholic', 0.0),
 ('almost', 0.0),
 ('alone', 0.0),
 ('american', 0.0),
 ('anal', 0.0),
 ('angry', 0.0),
 ('annoying', 0.0),
 ('anxious', 0.0),
 ('anyway', 0.0),
 ('apart', 0.0),
 ('apparently', 0.0),
 ('atm', 0.0),
 ('awesome', 0.0),
 ('awkward', 0.0),
 ('barely', 0.0),
 ('basic', 0.0),
 ('basically', 0.0),
 ('belly', 0.0),
 ('bible', 0.0),
 ('bigger', 0.0),
 ('biggest', 0.0),
 ('boring', 0.0),
 ('bright', 0.0),
 ('bunim', 0.0),
 ('certain', 0.0),
 ('clear', 0.0),
 ('clearly', 0.0),
 ('cold', 0.0),
 ('common', 0.0),
 ('complete', 0.0),
 ('completely', 0.0),
 ('constantly', 0.0),
 ('cute', 0.0),
 ('dead', 0.0),
 ('deep', 0.0),
 ('delicious', 0.0),
 ('diaper', 0.0),
 ('didn', 0.0),
 ('difficult', 0.0),
 ('disappointed', 0.0),
 ('doesn', 0.0),
 ('double', 0.0),
 ('down', 0.0),
 ('dry', 0.0),
 ('dumb', 0.0),
 ('early', 0.0),
 ('easier', 0.0),
 ('embarrassing', 0.0),
 ('emotional', 0.0),
 ('everyone', 0.0),
 ('everywhere', 0.0),
 ('exactly', 0.0),
 ('excited', 0.0),
 ('exciting', 0.0),
 ('extra', 0.0),
 ('fabulous', 0.0),
 ('fair', 0.0),
 ('family', 0.0),
 ('fat', 0.0),
 ('favorite', 0.0),
 ('finally', 0.0),
 ('fine', 0.0),
 ('forever', 0.0),
 ('forward', 0.0),
 ('free', 0.0),
 ('full', 0.0),
 ('funny', 0.0),
 ('fur', 0.0),
 ('girlfriend', 0.0),
 ('glad', 0.0),
 ('god', 0.0),
 ('gray', 0.0),
 ('green', 0.0),
 ('gross', 0.0),
 ('grown', 0.0),
 ('guilty', 0.0),
 ('guys', 0.0),
 ('huge', 0.0),
 ('huh', 0.0),
 ('hundred', 0.0),
 ('hungry', 0.0),
 ('immediately', 0.0),
 ('important', 0.0),
 ('incredible', 0.0),
 ('inside', 0.0),
 ('isn', 0.0),
 ('kardashian', 0.0),
 ('kelly', 0.0),
 ('khloe', 0.0),
 ('kim', 0.0),
 ('kris', 0.0),
 ('laker', 0.0),
 ('lamar', 0.0),
 ('late', 0.0),
 ('later', 0.0),
 ('lily', 0.0),
 ('literally', 0.0),
 ('live', 0.0),
 ('low', 0.0),
 ('luxurious', 0.0),
 ('mad', 0.0),
 ('major', 0.0),
 ('male', 0.0),
 ('mean', 0.0),
 ('miserable', 0.0),
 ('miss', 0.0),
 ('moral', 0.0),
 ('most', 0.0),
 ('murray', 0.0),
 ('natural', 0.0),
 ('necessary', 0.0),
 ('normally', 0.0),
 ('off', 0.0),
 ('often', 0.0),
 ('oh', 0.0),
 ('older', 0.0),
 ('open', 0.0),
 ('outside', 0.0),
 ('past', 0.0),
 ('people', 0.0),
 ('personal', 0.0),
 ('poor', 0.0),
 ('possible', 0.0),
 ('possibly', 0.0),
 ('private', 0.0),
 ('professional', 0.0),
 ('public', 0.0),
 ('quiet', 0.0),
 ('rather', 0.0),
 ('red', 0.0),
 ('regular', 0.0),
 ('rid', 0.0),
 ('ridiculous', 0.0),
 ('rude', 0.0),
 ('sad', 0.0),
 ('safe', 0.0),
 ('scary', 0.0),
 ('scott', 0.0),
 ('second', 0.0),
 ('secret', 0.0),
 ('selfish', 0.0),
 ('sensitive', 0.0),
 ('serious', 0.0),
 ('sexual', 0.0),
 ('sexy', 0.0),
 ('short', 0.0),
 ('sick', 0.0),
 ('sister', 0.0),
 ('somewhere', 0.0),
 ('soon', 0.0),
 ('special', 0.0),
 ('straight', 0.0),
 ('stupid', 0.0),
 ('supportive', 0.0),
 ('sweet', 0.0),
 ('tall', 0.0),
 ('ten', 0.0),
 ('thebouncedryer', 0.0),
 ('top', 0.0),
 ('total', 0.0),
 ('touch', 0.0),
 ('tough', 0.0),
 ('true', 0.0),
 ('truly', 0.0),
 ('truthful', 0.0),
 ('tryclearblue', 0.0),
 ('twice', 0.0),
 ('ugly', 0.0),
 ('uncomfortable', 0.0),
 ('upset', 0.0),
 ('usually', 0.0),
 ('wear', 0.0),
 ('weird', 0.0),
 ('welcome', 0.0),
 ('what', 0.0),
 ('white', 0.0),
 ('who', 0.0),
 ('won', 0.0),
 ('wonderful', 0.0),
 ('worse', 0.0),
 ('yeah', 0.0),
 ('year', 0.0),
 ('yes', 0.0),
 ('yet', 0.0),
 ('young', 0.0),
 ('younger', 0.0),
 ('as', 2.4343249556039977e-06),
 ('only', 2.582367470460253e-06),
 ('so', 4.5872906918408804e-06),
 ('really', 5.177341671014243e-06),
 ('there', 6.622686249872567e-06),
 ('even', 7.352463528271139e-06),
 ('wrong', 9.315150160220583e-06),
 ('last', 9.688839274325689e-06),
 ('crazy', 1.1403028644407954e-05),
 ('already', 1.3778659611992945e-05),
 ('honestly', 1.5936762924714734e-05),
 ('anymore', 1.740462266778056e-05),
 ('beautiful', 1.8119881133579763e-05),
 ('comfortable', 1.8630300320441167e-05),
 ('it', 1.9896932173619897e-05),
 ('next', 2.0194676683226302e-05),
 ('perfect', 2.035002035002035e-05),
 ('then', 2.096979485492291e-05),
 ('old', 2.133469875405359e-05),
 ('few', 2.204585537918871e-05),
 ('whole', 2.2609708433704742e-05),
 ('cool', 2.2806057288815908e-05),
 ('don', 2.2937884209560507e-05),
 ('far', 2.5936300446104367e-05),
 ('first', 2.6511891191910728e-05),
 ('gorgeous', 2.755731922398589e-05),
 ('same', 2.8810141169691734e-05),
 ('least', 2.9394473838918284e-05),
 ('nervous', 3.0062530062530064e-05),
 ('here', 3.1602702288756697e-05),
 ('armenian', 3.480924533556112e-05),
 ('healthy', 3.575003575003575e-05),
 ('up', 3.6200497677223196e-05),
 ('re', 3.647728651862571e-05),
 ('actually', 3.97868532420839e-05),
 ('happy', 4.038519539431358e-05),
 ('uh', 4.409171075837742e-05),
 ('ll', 4.5509892890229304e-05),
 ('new', 4.701363334704312e-05),
 ('easy', 4.801690194948622e-05),
 ('close', 4.99151442547669e-05),
 ('they', 4.99151442547669e-05),
 ('real', 5.166997354497354e-05),
 ('best', 5.2843997912662086e-05),
 ('totally', 5.382384186372808e-05),
 ('clean', 5.628729032984352e-05),
 ('about', 5.7510927076144466e-05),
 ('sometimes', 5.7510927076144466e-05),
 ('back', 5.767585427992636e-05),
 ('you', 5.783005781507255e-05),
 ('definitely', 5.885838048759397e-05),
 ('else', 5.915025521390049e-05),
 ('different', 5.960923005872315e-05),
 ('let', 6.0864961881548294e-05),
 ('big', 6.147251353650963e-05),
 ('um', 6.421122925977295e-05),
 ('like', 6.961849067112225e-05),
 ('black', 7.215007215007215e-05),
 ('better', 7.3450285142192e-05),
 ('instead', 7.348618459729571e-05),
 ('super', 7.348618459729571e-05),
 ('many', 7.416471984277079e-05),
 ('maybe', 7.713572320313893e-05),
 ('all', 7.78089013383131e-05),
 ('fun', 7.78089013383131e-05),
 ('rob', 7.78089013383131e-05),
 ('pregnant', 8.089646832357684e-05),
 ('smart', 8.267195767195767e-05),
 ('ve', 8.484810932455602e-05),
 ('amazing', 8.512087163772558e-05),
 ('naked', 8.917424647761726e-05),
 ('jealous', 9.134922809902256e-05),
 ('single', 9.538538802332195e-05),
 ('gonna', 9.754871131710454e-05),
 ('well', 9.76719213100689e-05),
 ('absolutely', 9.93739151923774e-05),
 ('we', 0.00010064285035531607),
 ('wish', 0.00010175010175010176),
 ('half', 0.00011022927689594356),
 ('kimberly', 0.00011022927689594356),
 ('very', 0.00011096570085334131),
 ('adrienne', 0.00011337868480725624),
 ('pretty', 0.00011623261051178672),
 ('though', 0.00011671335200746966),
 ('dramatic', 0.00012025012025012025),
 ('everything', 0.00012025012025012025),
 ('honest', 0.00012025012025012025),
 ('online', 0.00012025012025012025),
 ('sure', 0.00012539060711603653),
 ('always', 0.00012899712426001428),
 ('fresh', 0.00012914890869172155),
 ('anywhere', 0.00013227513227513228),
 ('he', 0.00013812099954422052),
 ('own', 0.00013903390707904913),
 ('she', 0.00015714944728096892),
 ('strong', 0.00015873015873015873),
 ('bruce', 0.00016534391534391533),
 ('such', 0.0001856981514926353),
 ('less', 0.00018726591760299626),
 ('okay', 0.00018726591760299626),
 ('married', 0.00019206760779794487),
 ('interested', 0.00019712201852946972),
 ('sudden', 0.00019712201852946972),
 ('once', 0.00021100385082027748),
 ('lately', 0.0002203128442388191),
 ('extremely', 0.0002204585537918871),
 ('before', 0.0002303668034005113),
 ('hopefully', 0.00023408239700374532),
 ('proud', 0.00024366471734892786),
 ('along', 0.00024968789013732833),
 ('small', 0.00024968789013732833),
 ('hot', 0.0002920629390449092),
 ('nude', 0.0003006253006253006),
 ('awful', 0.00030525030525030525),
 ('that', 0.00031435967164242603),
 ('out', 0.00032102728731942215),
 ('eric', 0.0003745318352059925),
 ('also', 0.0004005512349072824),
 ('love', 0.0004406256884776382),
 ('female', 0.0005350454788657035),
 ('positive', 0.0005952380952380953),
 ('certainly', 0.0006242197253433209),
 ('willing', 0.0007240099535444639),
 ('couple', 0.0009363295880149813),
 ('good', 0.0014266084463663577),
 ('great', 0.004029259646779759)]

We (roughly) calculate the each sentence's sentiment score by comparing the number of words with positive sentiment score vs negative sentiment score (according to our automatically induced lexicon)

In [108]:
final_message_sentiment = {}

for k, m in enumerate(msgs_adj_adv_only_tokenized):
    m_sent_score = sum([sentscores.get(w,0)>0 for w in m])-sum([sentscores.get(w,0)<0 for w in m])
    final_message_sentiment[msgs[k]]=m_sent_score

sorted(final_message_sentiment.items(), key=itemgetter(1), reverse=False)[-10:]
Out[108]:
[("i know i'm setting myself up here, but, honestly, the warm nuts in new york are so good.",
  5),
 ('this whole experience in new york has really opened my eyes to so many different things.',
  6),
 ('we were like, "oh, yeah, good to see you." and then over text message, it was back before like, "oh, so good to look in your eyes."',
  6),
 ('my first memory of bruce was it was my 11th birthday party and it was at tower lane, and i thought it was so cool that you four: you, casey, brody, everyone was coming, and i was, like, "we have four new, like, brothers and sisters, and they\'re all coming." and that\'s why it was so much fun \'cause we had such a good time.',
  6),
 ("i wouldn't be a good manager or a good mom if i didn't find out who's really single out there and who would be a great match for kim.",
  6),
 ("i definitely feel protective over summer because she's so young and new to the industry, but i think the smart thing to do is let her learn her own lessons and kind of feel her way through on her own.",
  6),
 ("with sex, there's a real fine li from being fun to turning trashy, and we are the ones that have to make sure all the pictures and everything else with carmen looks really, really fun and sexy.",
  6),
 ('once you\'ve broken trust, and you start thinking in your mind, "i want to be a better person, i want to be a better dad," people aren\'t automatically gonna go, "oh, scott, that\'s so great!',
  6),
 ("this is a great time to tell khloe that it's not always all about us and that maybe once in a while it's a great thing to help somebody else out.",
  6),
 ("so, tonight, khloe, i ask you to honor that very same promise to his grandmother, that you will always support lamar and stand by him because you have realized very quickly what the rest of us already know: it's very easy to love lamar.",
  8)]
In [109]:
sorted(final_message_sentiment.items(), key=itemgetter(1))[:10]
Out[109]:
[('now, i do not know what case you have him on, but whatever it is, it is going bad, and it sounds like it is going bad right now.',
  -6),
 ('all you can do right now is just dote on her, take care of her and just realize that this stage of pregnancy, you just got to get through it.',
  -5),
 ('look, i promise i will never, ever, ever lie ever again.', -5),
 ("i need scott to start taking things a little bit more seriously when it comes to the baby, like getting the room together, reading baby books, just being more involved in what i'm going through.",
  -5),
 ("i couldn't be any more sorry, and i'll never excuse the way i acted the other night in vegas, but, like, i don't know what i ever did so bad to, like, deserve you to, like, hate me so much.",
  -5),
 ("do you think it's 'cause you guys are spending way too much time together now that you guys live together?",
  -5),
 ('now too much! too much!', -5),
 ("i'm just saying it's probably not the right thing to do.", -4),
 ("you're going too fast, you're going too fast.", -4),
 ('a little more, a little more!', -4)]

Pretty good considering that we had absolutely no sentiment labels to start with!