Plot Name Frequency by Year Walkthrough¶

Prototype Algorithm¶

Review strings¶

In [ ]:

# Replace YOUR FULL NAME with your full name then press SHIFT-ENTER
name = 'YOUR FULL NAME'

In [ ]:

# Type name.<TAB> to see what you can do with this string
name

In [ ]:

# Replace "lower" with "title" and press CTRL-ENTER
name.lower()

In [ ]:

# Capitalize the string using a single command

In [ ]:

# Split the string into a list of words using a single command

In [ ]:

# Get the number of characters in the string
len(name)

Review lists¶

In [ ]:

xs = ['fox', 'rabbit', 'raccoon']
# Get the last item in the list
xs[-1]

In [ ]:

# Get the first item in the list

In [ ]:

# Get the number of items in the list

Explore dataset¶

In [ ]:

from os.path import expanduser
from pandas import read_csv

name_table_path = expanduser('~/Experiments/Datasets/names-by-year.csv')
name_table = read_csv(name_table_path)
# Show the first two rows of the dataset
name_table[:2]
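If the dataset file is not at hand, the same exploration steps can be tried on a small in-memory stand-in. The sketch below is illustrative only: the rows are made up, and the column names `name`, `year`, and `count` are taken from the cells in this walkthrough, so the real file may contain additional columns.

```python
from io import StringIO
from pandas import read_csv

# Hypothetical sample rows; not taken from names-by-year.csv
sample_csv = StringIO(
    'name,year,count\n'
    'Tim,1910,5\n'
    'Timothy,1910,7\n'
    'Tina,1912,3\n'
    'Tim,1920,11\n')
sample_table = read_csv(sample_csv)

# Slicing a table with [:2] selects its first two rows
first_two_rows = sample_table[:2]
# unique() returns the distinct values of a column
unique_year_count = len(sample_table['year'].unique())

print(first_two_rows)
print(unique_year_count)
```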

In [ ]:

# Show the first five rows of the dataset

In [ ]:

# Show the unique years in the table
name_table['year'].unique()

In [ ]:

# Count the number of unique years in the table
len(name_table['year'].unique())

In [ ]:

# Count the number of unique names in the table

Filter table¶

In [ ]:

# Select rows where the name starts with Tim and the year is less than 1915
t = name_table[
    name_table['name'].str.startswith('Tim') &
    (name_table['year'] < 1915)]
t

In [ ]:

# Sum counts
t['count'].sum()
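The filter above combines two boolean conditions row by row. A minimal sketch of the same idea on a made-up table may help show what each condition contributes; the rows and the variable names `starts_with_tim` and `before_1915` are assumptions for illustration.

```python
from pandas import DataFrame

# Hypothetical miniature table with the columns used in this walkthrough
sample_table = DataFrame({
    'name': ['Tim', 'Timothy', 'Tina', 'Tim', 'Timothy'],
    'year': [1910, 1910, 1912, 1920, 1914],
    'count': [5, 7, 3, 11, 2],
})

# Each condition is a boolean Series with one True/False value per row
starts_with_tim = sample_table['name'].str.startswith('Tim')
before_1915 = sample_table['year'] < 1915

# & keeps only the rows where both conditions are True.
# When a comparison is written inline next to &, it needs parentheses
# because & binds more tightly than <.
selected_table = sample_table[starts_with_tim & before_1915]
selected_total = selected_table['count'].sum()

print(selected_table)
print(selected_total)
```

Naming each condition before combining them makes it easier to inspect one mask at a time when a filter returns unexpected rows.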

In [ ]:

# Count how many people were born with your name from 1960 through 1969, inclusive

Group table¶

In [ ]:

# Split the large table into smaller tables grouped by year
for year, table in t.groupby('year'):
    print(table)
    print()

In [ ]:

# Sum counts for each year
t.groupby('year')['count'].sum()
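Grouping splits the table by year, and summing then collapses each group to a single number. The sketch below repeats that on a made-up table so the per-year totals can be checked by hand; the rows are hypothetical.

```python
from pandas import DataFrame

# Hypothetical miniature table with the columns used in this walkthrough
sample_table = DataFrame({
    'name': ['Tim', 'Timothy', 'Tim', 'Timothy', 'Tim'],
    'year': [1910, 1910, 1911, 1911, 1912],
    'count': [5, 7, 6, 8, 9],
})

# The result is a Series indexed by year, with one summed count per year
count_by_year = sample_table.groupby('year')['count'].sum()
print(count_by_year)
```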

In [ ]:

# Count how many people shared the first three letters of your name after the year 2000

Plot table¶

In [ ]:

# Plot the total count of babies per year
%matplotlib inline
name_table.groupby('year')['count'].sum().plot();

In [ ]:

# Plot the number of babies born with the name Jake by year
name_table[name_table['name'] == 'Jake'].groupby('year')['count'].sum().plot();

In [ ]:

# Plot the number of babies born with your name by year

Save plot as image¶

In [ ]:

# Plot the number of babies born with the name Jake by year and keep the axes
axes = name_table[name_table['name'] == 'Jake'].groupby('year')['count'].sum().plot()
axes.get_figure().savefig('/tmp/plot.png')
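Outside a notebook there is no inline display, so a script typically selects a non-interactive backend before plotting. The following is a minimal sketch under that assumption; the sample rows are made up, and a temporary folder stands in for `/tmp`.

```python
import matplotlib
matplotlib.use('Agg')  # Assumption: render to files only, without a display

from os.path import exists, join
from tempfile import mkdtemp
from pandas import DataFrame

# Hypothetical miniature table with the columns used in this walkthrough
sample_table = DataFrame({
    'name': ['Jake', 'Jake', 'Jake'],
    'year': [1990, 1991, 1992],
    'count': [10, 12, 15],
})
count_by_year = sample_table.groupby('year')['count'].sum()

# plot() returns the matplotlib axes; the figure that owns them can be saved
axes = count_by_year.plot()
target_folder = mkdtemp()
target_path = join(target_folder, 'plot.png')
axes.get_figure().savefig(target_path)

print(exists(target_path))
```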

In [ ]:

# List the files in /tmp to confirm that plot.png was saved
!ls /tmp

Assemble Algorithm¶

In the menu above, choose File > New Notebook > Python 3. Copy and paste each of the following code blocks into the new notebook. Press SHIFT-ENTER on each code block to make sure that the code runs properly.

In [ ]:

name_table_path = '~/Experiments/Datasets/names-by-year.csv'
name = 'jake'
target_folder = '/tmp'

In [ ]:

from pandas import read_csv
name_table = read_csv(name_table_path)
name = name.capitalize().split()[0]
selected_name_table = name_table[name_table.name == name]
selected_count_by_year = selected_name_table.groupby('year')['count'].sum()
selected_count_by_year[:5]
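The cell above normalizes the name with `name.capitalize().split()[0]`, which works for input such as `'jake'` or `'jake smith'`. If the input might start with whitespace or be empty, one possible variation is to split first and capitalize afterwards. The helper `normalize_name` below is hypothetical and is not part of the original notebook.

```python
def normalize_name(raw_name):
    """Return the first word of raw_name with only its first letter capitalized."""
    words = raw_name.split()  # split() ignores leading and trailing whitespace
    if not words:
        raise ValueError('name must contain at least one word')
    return words[0].capitalize()


print(normalize_name('jake'))
print(normalize_name('  jake SMITH'))
print(normalize_name('JAKE'))
```

Splitting before capitalizing matters because `capitalize()` only uppercases the very first character of the string, which is a space when the input has leading whitespace.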

In [ ]:

%matplotlib inline
from os.path import join
axes = selected_count_by_year.plot(legend=False, title='Name Frequency by Year of Birth: ' + name)
target_path = join(target_folder, 'name-by-year.png')
axes.get_figure().savefig(target_path)
print('name_by_year_image_path = ' + target_path)

Preview Tool¶

Specify arguments¶

Add a comment with the word CrossCompute to the first code block. Your first code block should look like the following:

In [ ]:

# CrossCompute
name_table_path = '~/Experiments/Datasets/names-by-year.csv'
name = 'jake'
target_folder = '/tmp'

Preview tool¶

Press the green paper plane to preview your tool!

Notebook Creator: Roy Hyunjin Han
