Test Hypotheses on Human Trafficking




Notebook Creator: Roy Hyunjin Han
In [4]:
import time
time.time()
Out[4]:
1510524583.9173102
In [20]:
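"""
# Disabled (wrapped in a string); the saved response is loaded from d.json below
import simplejson as json
import time
import webhoseio
webhoseio.config(token='901881ff-1e8a-4631-b6f0-9103d71c00ba')

d = webhoseio.query('filterWebContent', {
    'q': 'sex trafficker',
    'ts': '1507927474389',  # Appears to be a Unix timestamp in milliseconds
    'sort': 'relevancy',
})
# Save the response so that later cells can work from the file
with open('d.json', 'wt') as f:
    json.dump(d, f)
# time.sleep(1)
# d = webhoseio.get_next()
""";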
In [23]:
import simplejson as json
# Load the saved query response and close the file afterwards
with open('d.json') as f:
    d = json.load(f)
In [26]:
d.keys()
Out[26]:
dict_keys(['posts', 'totalResults', 'moreResultsAvailable', 'next', 'requestsLeft'])
In [8]:
len(d['posts'])
Out[8]:
100
In [9]:
d['totalResults']
Out[9]:
2995
In [10]:
d['moreResultsAvailable']
Out[10]:
2895
In [11]:
d['next']
Out[11]:
'/filterWebContent?token=901881ff-1e8a-4631-b6f0-9103d71c00ba&format=json&ts=1507927474389&q=sex+trafficker&sort=relevancy&from=100'
In [24]:
d['requestsLeft']
Out[24]:
998
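The counts above show that one response holds 100 of 2995 posts, with 2895 more reachable through `next`. A minimal sketch of gathering several pages follows, assuming every page has the same keys as `d`. The helper `collect_posts` and its parameters are hypothetical names, not part of `webhoseio`; the page fetcher is passed in as a callable so that the loop can be tried without network access.

```python
import time


def collect_posts(first_page, get_next, max_pages=5, delay_in_seconds=1.0):
    """Gather posts from successive result pages (hypothetical helper).

    first_page: a response dictionary shaped like d above
    get_next: a zero-argument callable that returns the next page,
        for example webhoseio.get_next
    max_pages: upper bound on pages read, including the first one
    """
    posts = list(first_page['posts'])
    page = first_page
    page_count = 1
    # Stop when the API reports nothing more or the page budget is spent
    while page['moreResultsAvailable'] > 0 and page_count < max_pages:
        time.sleep(delay_in_seconds)  # Pause between requests
        page = get_next()
        posts.extend(page['posts'])
        page_count += 1
    return posts
```

With the query cell re-enabled, a call such as `collect_posts(d, webhoseio.get_next, max_pages=3)` would presumably continue from the last query made in the session. The `max_pages` cap is there because each extra page is likely to count against `requestsLeft`.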
In [27]:
d['posts'][0].keys()
Out[27]:
dict_keys(['thread', 'uuid', 'url', 'ord_in_thread', 'author', 'published', 'title', 'text', 'highlightText', 'highlightTitle', 'language', 'external_links', 'entities', 'rating', 'crawled'])
In [29]:
d['posts'][0]['thread']
Out[29]:
{'country': 'US',
 'domain_rank': 1779,
 'main_image': 'http://metrouk2.files.wordpress.com/2017/09/pri_52039396.jpg?quality=80&strip=all&crop=0px%2C713px%2C2500px%2C1313px&resize=1200%2C630',
 'participants_count': 0,
 'performance_score': 10,
 'published': '2017-11-05T14:05:00.000+02:00',
 'replies_count': 0,
 'section_title': '',
 'site': 'metro.co.uk',
 'site_categories': ['world_soccer', 'sports'],
 'site_full': 'metro.co.uk',
 'site_section': '',
 'site_type': 'news',
 'social': {'facebook': {'comments': 202, 'likes': 1052, 'shares': 1052},
  'gplus': {'shares': 0},
  'linkedin': {'shares': 0},
  'pinterest': {'shares': 0},
  'stumbledupon': {'shares': 0},
  'vk': {'shares': 0}},
 'spam_score': 0.171,
 'title': "Sex trafficker attacked in prison after other inmates 'find out what she's in for' | Metro News",
 'title_full': "Sex trafficker attacked in prison after other inmates 'find out what she's in for' | Metro News",
 'url': 'http://omgili.com/ri/2wGaacqxAps2qjtKiioripmAyx3ZbyxG.kljnYbrbbtRr43c_59rX_XBhr0OM_YpuAKzxk3mA09A7oc1cJ2eF.jsKfmpaJJT8sTjzxD7K.eVnIgtdFXkedzlUpFpgX7UmhiRB_z_8bC3h9KgYC8v3Ujqkx0incfd',
 'uuid': '267bf9c7723c41f26c4ff4db32bf3be148c2687e'}
In [31]:
x = d['posts'][0]
x['uuid']
Out[31]:
'267bf9c7723c41f26c4ff4db32bf3be148c2687e'
In [32]:
x['url']
Out[32]:
'http://omgili.com/ri/2wGaacqxAps2qjtKiioripmAyx3ZbyxG.kljnYbrbbtRr43c_59rX_XBhr0OM_YpuAKzxk3mA09A7oc1cJ2eF.jsKfmpaJJT8sTjzxD7K.eVnIgtdFXkedzlUpFpgX7UmhiRB_z_8bC3h9KgYC8v3Ujqkx0incfd'
In [33]:
x['ord_in_thread']
Out[33]:
0
In [34]:
x['author']
Out[34]:
'Oliver Wheaton For Metro.Co.Uk'
In [35]:
x['published']
Out[35]:
'2017-11-05T14:05:00.000+02:00'
In [36]:
x['title']
Out[36]:
"Sex trafficker attacked in prison after other inmates 'find out what she's in for' | Metro News"
In [37]:
x['text']
Out[37]:
"Share this article with Google Plus Share this article with Whatsapp Share this article through email Share this article through sms Carolann Gallon is serving a six-year jail sentence (Picture: Northumbia Police/PA Wire) \nA sex trafficker who provided girls as young as 13 to sex predators has been attacked in jail after the other inmates learned of her crimes. \nCarolann Gallon was jailed for six years in September for her part in a Newcastle child sex grooming gang’s activities. We're drinking a lot less beer - and it's putting pubs at risk \nShe was initially jailed at Low Newton’s women’s prison in Durham, however she had to move HMP Styal in Cheshire 150 miles away after being attacked by inmates. \nNow at her new prison, 23-year-old Gallon has been attacked again after fellow inmates found out who she was. \nShe was assaulted on Sunday and left with minor injuries, with the police looking into the incident. \n17 men were convicted along with Gallon (Picture: Northumbria Police) \nMORE: ‘Serial sex attacker’ wanted after two women dragged into bushes hours apart \nHer father, Jimmy Gallon, told Chronicle Live : ‘People have found out what she’s in for and she’s getting grief. \n‘She got moved from one place to another because she was getting picked on, then she rang me saying she had been assaulted by someone in there. \n‘She cries every time I speak to her.’ \nHe added that he ‘really misses’ his daughter, saying it is ‘heartbreaking’ to see her in prison. \nCarolann was part of a group of 18 who were convicted following Operation Shelter, which was a police investigation into the exploitation of vulnerable girls and young women in Newcastle. \nShe pleaded guilty to three counts of three counts of trafficking and sexual exploitation. \nThe court heard how Gallon allegedly told one victim her family would be ‘chopped to pieces’ if she reported the abuse. "
In [38]:
x['highlightText']
Out[38]:
''
In [39]:
x['highlightTitle']
Out[39]:
''
In [40]:
x['language']
Out[40]:
'english'
In [41]:
x['external_links']
Out[41]:
[]
In [42]:
x['entities']
Out[42]:
{'locations': [], 'organizations': [], 'persons': []}
In [43]:
x['rating']
In [44]:
x['crawled']
Out[44]:
'2017-11-05T14:24:17.379+02:00'
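The fields inspected above are split between the post itself and its nested `thread` dictionary. A minimal sketch of flattening one post into a single row is shown here; `flatten_post` is a hypothetical helper and the choice of fields is an assumption about what later analysis might need.

```python
def flatten_post(post):
    """Pick a few post and thread fields into one flat dictionary."""
    thread = post.get('thread', {})
    return {
        # Fields stored on the post
        'uuid': post.get('uuid'),
        'title': post.get('title'),
        'author': post.get('author'),
        'published': post.get('published'),
        'crawled': post.get('crawled'),
        'language': post.get('language'),
        # Fields stored on the nested thread
        'site': thread.get('site'),
        'site_type': thread.get('site_type'),
        'country': thread.get('country'),
        'spam_score': thread.get('spam_score'),
    }
```

A list of rows for all posts on the page could then be built with `[flatten_post(p) for p in d['posts']]`.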
In [51]:
import arrow
arrow.get(x['crawled'])
Out[51]:
<Arrow [2017-11-05T14:24:17.379000+02:00]>
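Once the timestamps are parsed, they can be compared. The sketch below uses the standard library `datetime` module rather than `arrow`, only so that it runs without third-party packages. It computes the gap between the `published` and `crawled` values shown above and decodes the `ts` value from the query, on the assumption that `ts` is a Unix timestamp in milliseconds.

```python
from datetime import datetime, timezone

# Values copied from the post inspected above
published = datetime.fromisoformat('2017-11-05T14:05:00.000+02:00')
crawled = datetime.fromisoformat('2017-11-05T14:24:17.379+02:00')

# Time between publication and crawl, here a little over 19 minutes
crawl_lag = crawled - published

# Assumption: the query's ts parameter counts milliseconds since the epoch
ts = datetime.fromtimestamp(1507927474389 / 1000, tz=timezone.utc)
print(crawl_lag, ts.date())
```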
In [ ]:
# + Get news articles
# _ Use BeautifulSoup to extract article text
In [52]:
import spacy
nlp = spacy.load('en_core_web_lg')
In [53]:
title = nlp(x['title'])
In [54]:
text = nlp(x['text'])
In [56]:
title
Out[56]:
Sex trafficker attacked in prison after other inmates 'find out what she's in for' | Metro News
In [55]:
title.ents
Out[55]:
(| Metro News,)
In [58]:
list(title.noun_chunks)
Out[58]:
[Sex trafficker, prison, other inmates, what, she, | Metro News]
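Both the entities and the noun chunks pick up the trailing `| Metro News`, which is the site name rather than part of the headline. One possible cleanup is to drop the suffix before passing the title to `nlp`. The sketch below assumes the site name always follows the last `' | '` separator; `strip_site_suffix` is a hypothetical helper.

```python
def strip_site_suffix(title, separator=' | '):
    """Drop a trailing site name such as ' | Metro News'."""
    head, found, _ = title.rpartition(separator)
    # rpartition returns an empty separator when there is no match
    return head if found else title
```

For example, `nlp(strip_site_suffix(x['title']))` would parse only the headline. A headline that legitimately contains the separator would be cut short, so this is a heuristic.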
In [61]:
for x in text.ents:
    print(x, x.label_)
Google Plus Share PRODUCT
Whatsapp Share PRODUCT
sms Carolann Gallon PRODUCT
six-year DATE
13 CARDINAL
six years DATE
September DATE
Newcastle GPE

 PERSON
Newton PERSON
Durham GPE
HMP Styal ORG
Cheshire 150 miles QUANTITY
Gallon PRODUCT
Sunday DATE
17 CARDINAL
Gallon (Picture PRODUCT
Northumbria Police ORG
Serial sex attacker’ GPE
two CARDINAL
Jimmy Gallon PERSON
Chronicle Live WORK_OF_ART
one CARDINAL
misses’ PERSON
Carolann PERSON
18 CARDINAL
Operation Shelter WORK_OF_ART
Newcastle GPE
three CARDINAL
three CARDINAL
Gallon PERSON
one CARDINAL
In [62]:
x.text
Out[62]:
'one'
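The entity list above is noisy: it includes a whitespace-only `PERSON`, share-widget text tagged as `PRODUCT`, and many `CARDINAL` values. A minimal sketch of narrowing it down follows. It works on `(text, label)` pairs so that it runs without a spaCy model; `group_entities` and the set of kept labels are assumptions for illustration.

```python
from collections import defaultdict

# Labels assumed to matter for the fields of interest
KEPT_LABELS = {'PERSON', 'GPE', 'ORG', 'DATE'}


def group_entities(pairs, kept_labels=KEPT_LABELS):
    """Group (text, label) pairs by label, skipping blank text."""
    texts_by_label = defaultdict(list)
    for text, label in pairs:
        text = text.strip()
        # Skip whitespace-only entities and labels outside the kept set
        if not text or label not in kept_labels:
            continue
        texts_by_label[label].append(text)
    return dict(texts_by_label)
```

With a parsed document, the pairs could be supplied as `group_entities((ent.text, ent.label_) for ent in text.ents)`.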
In [ ]:
# Use spaCy NER to extract the fields listed below
# Return the result as JSON

Trafficking Category

  • Labor
  • Sex
  • Organ Removal

Date

  • Conviction Date
  • Incident Start Date
  • Incident End Date

Location

  • Country (Trafficker, Victim)
  • City (Trafficker, Victim)

Age (Trafficker, Victim)

Gender (Trafficker, Victim)
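The fields listed above could be held in a small record type and serialized to JSON. This is a sketch of one possible layout, not the notebook's actual output format; the class names, field names, and the use of ISO date strings are assumptions. The example values are filled in by hand from the sample article, not extracted automatically.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Party:
    """Attributes recorded for a trafficker or a victim."""
    country: Optional[str] = None
    city: Optional[str] = None
    age: Optional[int] = None
    gender: Optional[str] = None


@dataclass
class TraffickingCase:
    """One extracted case; unknown fields stay None."""
    category: Optional[str] = None  # 'labor', 'sex' or 'organ removal'
    conviction_date: Optional[str] = None  # ISO date string
    incident_start_date: Optional[str] = None
    incident_end_date: Optional[str] = None
    trafficker: Party = field(default_factory=Party)
    victim: Party = field(default_factory=Party)


# Hand-filled example based on the sample article
case = TraffickingCase(category='sex', trafficker=Party(age=23))
case_json = json.dumps(asdict(case), indent=2)
print(case_json)
```

Keeping the trafficker and the victim as separate `Party` records mirrors the "(Trafficker, Victim)" pairing in the list and avoids duplicating each field name.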

In [68]:
from collections import Counter
# Pick the most frequent item
Counter('apple').most_common(1)[0][0]
Out[68]:
'p'
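The one-liner above picks the most frequent item, which suggests a way to choose one value when the text mentions several candidates. The sketch below applies the same idea to a list of values; `pick_most_common` is a hypothetical helper, and using it for locations is an assumption about the intended next step.

```python
from collections import Counter


def pick_most_common(values):
    """Return the most frequent value, or None when there are no values."""
    counts = Counter(values)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Place names tagged GPE in the entity output of the sample article
city = pick_most_common(['Newcastle', 'Durham', 'Newcastle'])
print(city)  # Newcastle
```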