Introduction to Computational Analysis




Pay Notebook Creator: Roy Hyunjin Han0
Set Container: Numerical CPU with TINY Memory for 10 Minutes 0
Total0
In [31]:
import pandas
import numpy as np
import os

Look at spinach prices from the US food prices dataset.

In [32]:
excelFile = pandas.ExcelFile('datasets/USA-SpinachPrices.xls')
excelFile.sheet_names
In [33]:
organicSpinachFrame = excelFile.parse('organic', index_col=0)
conventionalSpinachFrame = excelFile.parse('conventional', index_col=0)
conventionalSpinachFrame

Rearrange DataFrame into a Series with a MultiIndex.

In [34]:
organicSpinachHierarchicalSeries = organicSpinachFrame.stack()
conventionalSpinachHierarchicalSeries = conventionalSpinachFrame.stack()
conventionalSpinachHierarchicalSeries

Combine year and month into a single index.

In [35]:
organicSpinachHierarchicalSeries.index
In [36]:
organicSpinachHierarchicalSeries.index
In [37]:
from pandas import Timestamp
def parse_indexEntry(indexEntry):
    return Timestamp('%d %s' % indexEntry)
parse_indexEntry(organicSpinachHierarchicalSeries.index[0])
In [38]:
def flatten_index(hierarchicalSeries):
    return pandas.Series(hierarchicalSeries.values,
        index=[parse_indexEntry(x) for x in hierarchicalSeries.index])
organicSpinachSeries = flatten_index(organicSpinachHierarchicalSeries)
conventionalSpinachSeries = flatten_index(conventionalSpinachHierarchicalSeries)
conventionalSpinachSeries[:5]

Compare monthly prices of organic vs conventional spinach.

In [39]:
import pylab as pl
pl.figure(figsize=(12,6))
organicSpinachSeries.plot(label='organic')
conventionalSpinachSeries.plot(label='conventional')
pl.legend()
In [40]:
organicSpinachFrame - conventionalSpinachFrame
In [41]:
organicSpinachSeries.describe()
In [42]:
organicSpinachSeries.describe() - conventionalSpinachSeries.describe()

Look at dates when the price of organic spinach was above average.

In [43]:
subIndex = organicSpinachSeries > organicSpinachSeries.mean()
organicSpinachSeries[subIndex]

Look at the price of conventional spinach for dates when the price of organic spinach was above average.

In [44]:
conventionalSpinachSeries[subIndex]

Was the price of conventional spinach above average on dates when the price of organic spinach was above average?

In [45]:
conventionalSpinachSeries[subIndex] > conventionalSpinachSeries.mean()

Assemble prices and their differences into a table and save it to a spreadsheet.

In [46]:
# Create dataFrame
spinachFrame = pandas.DataFrame.from_items([
    ('organic', organicSpinachSeries),
    ('conventional', conventionalSpinachSeries)])
spinachFrame.head()
In [47]:
# Add column of differences
spinachFrame['difference'] = organicSpinachSeries - conventionalSpinachSeries
spinachFrame.head()
In [48]:
# Save it to a spreadsheet
spinachFrame.to_csv(os.path.expandvars('$HOME/Downloads/USA-SpinachPrices.csv'))

Convert spinachFrame from US dollars to Singapore dollars using an exchange rate of 1 USD = 1.28135 SGD.

In [49]:
# Type your solution here and press CTRL-ENTER

Compute the ratio of organic prices to conventional prices.

In [50]:
# Option 1
spinachFrame['organic'] / spinachFrame['conventional']
In [51]:
# Option 2
spinachFrame.apply(lambda row: row['organic'] / row['conventional'], axis=1)

Compute average prices by year.

In [52]:
from pandas.core.datetools import YearEnd
spinachFrame.groupby(YearEnd().rollforward).aggregate(np.mean)

Predict the price of organic spinach given the price of conventional spinach using ordinary least squares.

In [53]:
import statsmodels.api as sm
y = organicSpinachSeries[:-1]      # Exclude last price
X = conventionalSpinachSeries[:-1] # Exclude last price
model = sm.OLS(y, X)
results = model.fit()
results.params
In [54]:
results.predict(conventionalSpinachSeries[-1])