We presented this tool and notebook as part of our workshop on Computational Approaches to Fight Human Trafficking.
Thank you to the U.S. Social Security Administration for providing a clean and comprehensive dataset of baby names. These baby names come from Social Security card applications dated 1879 to 2017.
# CrossCompute
name = 'Jerry Seinfeld'
import pandas as pd
t = pd.read_csv('names-usa.csv.xz', compression='xz', index_col=0)
t[:5]
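# Use the first whitespace-separated token, lowercased, as the given name
try:
    given_name = name.split()[0].lower()
except IndexError:
    # An empty or whitespace-only name has no tokens
    print('name.error = required')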
given_name
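# Look up the row of counts for this given name
try:
    selected_t = t.loc[given_name]
    # The column with the largest count is the most likely gender
    gender = selected_t.idxmax()
    probability = selected_t.max() / selected_t.sum()
except KeyError:
    # The name does not appear in the dataset
    gender = 'unknown'
    probability = 1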
print('gender = ' + gender)
print('probability = %.02f' % probability)
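The lookup above can also be wrapped in a function and tried without the data file. The following is a minimal sketch, not the notebook's own code: the function name `guess_gender`, the column names `female` and `male`, and the counts in `toy_t` are all hypothetical and chosen only for illustration. It assumes, as the cells above do, a table indexed by lowercase given name with one count column per gender.

```python
import pandas as pd

# Hypothetical counts for illustration only; the real table is loaded
# from names-usa.csv.xz and its column names may differ.
toy_t = pd.DataFrame(
    {'female': [10, 900, 500], 'male': [990, 100, 500]},
    index=['jerry', 'elaine', 'casey'])


def guess_gender(name, t):
    """Return (gender, probability) for the first word of name."""
    parts = name.split()
    if not parts:
        # Mirrors the 'name.error = required' case above
        raise ValueError('name is required')
    given_name = parts[0].lower()
    try:
        counts = t.loc[given_name]
    except KeyError:
        # Same fallback as above for names missing from the table
        return 'unknown', 1.0
    # Share of the most common gender among all recorded uses of the name
    return counts.idxmax(), counts.max() / counts.sum()


gender, probability = guess_gender('Jerry Seinfeld', toy_t)
print('gender = ' + gender)
print('probability = %.02f' % probability)
```

With these toy counts, the call prints `gender = male` and `probability = 0.99`. When counts are tied, as for `casey` here, `idxmax` returns the first column with the maximum value, so the result depends on column order.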