Build a Human Trafficking Dataset from Court Cases and News Articles 20171214




Notebook Creator: Roy Hyunjin Han

Guess the Gender of a Name from the USA

We presented this tool and notebook as part of our workshop on Computational Approaches to Fight Human Trafficking.

Thank you to the U.S. Social Security Administration for providing a clean and comprehensive dataset of baby names. These baby names come from Social Security card applications dated 1879 to 2017.

In [ ]:
# CrossCompute
name = 'Jerry Seinfeld'
In [ ]:
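import pandas as pd
# Load the name counts, using the first column (the given name) as the index
t = pd.read_csv('names-usa.csv.xz', compression='xz', index_col=0)
# Preview the first five rows
t[:5]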
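The layout of `names-usa.csv.xz` is not shown in this notebook. The later cells suggest one row per lowercase given name and one column of counts per gender. As a rough sketch only, a table of that shape could be assembled from long-format (name, sex, count) records as follows. The sample records, the `F`/`M` labels, and the name `example_t` are made up for illustration and may not match the real file.

```python
import pandas as pd

# Hypothetical long-format records; the counts are invented for illustration
records = pd.DataFrame({
    'name': ['Jerry', 'Jerry', 'Elaine', 'Jerry'],
    'sex': ['M', 'F', 'F', 'M'],
    'count': [100, 10, 50, 20],
})
# Lowercase the names so that lookups can ignore capitalization
records['name'] = records['name'].str.lower()
# Sum the counts so that there is one row per name and one column per sex
example_t = records.pivot_table(
    index='name', columns='sex', values='count', aggfunc='sum', fill_value=0)
print(example_t)
```

With a table in this shape, a single `.loc` lookup on the given name returns the counts for each gender.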
In [ ]:
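try:
    # Use the first word of the name as the given name
    given_name = name.split()[0].lower()
except IndexError:
    # Empty input: report the error and fall back to an empty string
    # so that the cells below can still run
    print('name.error = required')
    given_name = ''
given_name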
In [ ]:
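try:
    # Look up the counts for this given name
    selected_t = t.loc[given_name]
    # Pick the column with the largest count
    gender = selected_t.idxmax()
    # Share of the total count that belongs to that column
    probability = selected_t.max() / selected_t.sum()
except KeyError:
    # The given name is not in the dataset
    gender = 'unknown'
    probability = 1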
In [ ]:
print('gender = ' + gender)
print('probability = %.02f' % probability)
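The cells above can also be collected into one function, which makes the lookup easier to try on several names. This is a minimal sketch rather than the notebook's own code: `guess_gender`, `toy_t`, the column labels, and the counts are hypothetical, and the table is built inline so that the example does not depend on `names-usa.csv.xz`.

```python
import pandas as pd

def guess_gender(table, name):
    """Return (gender, probability) for the first word of name."""
    words = name.split()
    if not words:
        raise ValueError('name is required')
    given_name = words[0].lower()
    if given_name not in table.index:
        # Mirrors the fallback used in the cells above
        return 'unknown', 1
    counts = table.loc[given_name]
    # The most common gender and its share of all records for this name
    return counts.idxmax(), counts.max() / counts.sum()

# Made-up counts for illustration only
toy_t = pd.DataFrame(
    {'female': [10, 50], 'male': [120, 0]},
    index=['jerry', 'elaine'])
gender, probability = guess_gender(toy_t, 'Jerry Seinfeld')
print('gender = ' + gender)
print('probability = %.02f' % probability)
```

The probability here is simply the share of records for that name under its most common gender, so it reflects the counts in the table and nothing else about the person.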