We presented this tool and notebook as part of our workshop on Computational Approaches to Fight Human Trafficking.
Thank you to the U.S. Social Security Administration for providing a clean and comprehensive dataset of baby names. These names come from Social Security card applications dated 1879 to 2017.
# CrossCompute
name = 'Jerry Seinfeld'
import pandas as pd
# Load the name counts, indexed by given name
t = pd.read_csv('names-usa.csv.xz', compression='xz', index_col=0)
t[:5]
try:
    # Take the first word of the name and lowercase it
    given_name = name.split()[0].lower()
except IndexError:
    print('name.error = required')
given_name
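try:
    selected_t = t.loc[given_name]
    # The gender with the most applications for this name
    gender = selected_t.idxmax()
    # Share of applications for this name that match that gender
    probability = selected_t.max() / selected_t.sum()
except KeyError:
    gender = 'unknown'
    probability = 1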
print('gender = ' + gender)
print('probability = %.02f' % probability)
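The lookup above can be tried without the dataset file. The following is a minimal sketch, not the notebook's own code: `toy_t` is a small made-up table, the column labels `female` and `male` are assumptions about how names-usa.csv.xz is laid out, and `guess_gender` is a hypothetical helper that wraps the same steps (take the first word, lowercase it, pick the column with the largest count, and divide by the row total).

```python
import pandas as pd

# Hypothetical counts for illustration only; the real names-usa.csv.xz
# may use different column labels and will have different numbers.
toy_t = pd.DataFrame(
    {'female': [10, 990], 'male': [990, 10]},
    index=['jerry', 'elaine'])


def guess_gender(table, name):
    """Return (gender, probability) for the first word of name."""
    parts = name.split()
    if not parts:
        raise ValueError('name is required')
    given_name = parts[0].lower()
    try:
        counts = table.loc[given_name]
    except KeyError:
        # Mirror the notebook: names missing from the table are 'unknown'
        return 'unknown', 1.0
    # Most common gender and its share of all counts for this name
    return counts.idxmax(), counts.max() / counts.sum()


gender, probability = guess_gender(toy_t, 'Jerry Seinfeld')
print('gender = ' + gender)
print('probability = %.02f' % probability)
```

With the made-up counts, 990 of the 1000 entries for "jerry" fall in the `male` column, so the sketch prints `gender = male` and `probability = 0.99`. A name that is not in the table returns `'unknown'`, matching the `KeyError` branch in the notebook.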