Tool Walkthroughs




Pay Notebook Creator: Roy Hyunjin Han0
Set Container: Numerical CPU with TINY Memory for 10 Minutes 0
Total0

Plot Name Frequency by Year Walkthrough

Prototype Algorithm

Review strings

In [ ]:
# Replace YOUR FULL NAME with your full name then press SHIFT-ENTER
name = 'YOUR FULL NAME'
In [ ]:
# Type name.<TAB> to see what you can do with this string
name
In [ ]:
# Replace "lower" with "title" and press CTRL-ENTER
name.lower()
In [ ]:
# Capitalize the string using a single command
In [ ]:
# Split the string into a list of words using a single command
In [ ]:
# Get the number of characters in the string
len(name)

Review lists

In [ ]:
xs = ['fox', 'rabbit', 'raccoon']
# Get the last item in the list
xs[-1]
In [ ]:
# Get the first item in the list
In [ ]:
# Get the number of items in the list

Explore dataset

In [ ]:
from os.path import expanduser
from pandas import read_csv

name_table_path = expanduser('~/Experiments/Datasets/names-by-year.csv')
name_table = read_csv(name_table_path)
# Show the first two rows the dataset
name_table[:2]
In [ ]:
# Show the first five rows of the dataset
In [ ]:
# Show the unique years in the table
name_table['year'].unique()
In [ ]:
# Count the number of unique years in the table
len(name_table['year'].unique())
In [ ]:
# Count the number of unique names in the table

Filter table

In [ ]:
# Select rows where the name starts with Tim and the year is less than 1915
t = name_table[
    name_table['name'].str.startswith('Tim') & (
    name_table['year'] < 1915)]
t
In [ ]:
# Sum counts
t['count'].sum()
In [ ]:
# Count how many people were born with your name between and including the years 1960 and 1969

Group table

In [ ]:
# Split the large table into smaller tables grouped by year
for year, table in t.groupby('year'):
    print(table)
    print()
In [ ]:
# Sum counts for each year
t.groupby('year')['count'].sum()
In [ ]:
# Count how many people shared the first three letters of your name after the year 2000

Plot table

In [ ]:
# Plot the number of names per year
%matplotlib inline
name_table.groupby('year').sum().plot();
In [ ]:
# Plot the number of babies born with the name Jake by year
name_table[name_table['name'] == 'Jake'].groupby('year')['count'].sum().plot();
In [ ]:
# Plot the number of babies born with your name by year

Save plot as image

In [ ]:
axes = name_table[name_table.name == 'Jake'].groupby('year').sum().plot()
axes.get_figure().savefig('/tmp/plot.png')
In [ ]:
ls /tmp

Assemble Algorithm

In the menu above, choose File > New Notebook > Python 3. Copy and paste each of the following code blocks into the new notebook. Press SHIFT-ENTER on each code block to make sure that the code runs properly.

In [ ]:
name_table_path = '~/Experiments/Datasets/names-by-year.csv'
name = 'jake'
target_folder = '/tmp'
In [ ]:
from pandas import read_csv
name_table = read_csv(name_table_path)
name = name.capitalize().split()[0]
selected_name_table = name_table[name_table.name == name]
selected_count_by_year = selected_name_table.groupby('year')['count'].sum()
selected_count_by_year[:5]
In [ ]:
%matplotlib inline
axes = selected_count_by_year.plot(legend=False, title='Name Frequency by Year of Birth: ' + name)
from os.path import join
target_path = join(target_folder, 'name-by-year.png')
axes.get_figure().savefig(target_path);
print('name_by_year_image_path = ' + target_path)

Preview Tool

Specify arguments

Add a comment with the word CrossCompute to the first code block. Your first code block should look like the following:

In [ ]:
# CrossCompute
name_table_path = '~/Experiments/Datasets/names-by-year.csv'
name = 'jake'
target_folder = '/tmp'

Preview tool

Press the green paper plane to preview your tool!