Prepare and Fit Spatial Regression Models 20190222




Notebook Creator: Roy Hyunjin Han

Estimate Graduation Rate from Tree Count

Team Name: YOUR-TEAM-NAME

  • Person 1
  • Person 2
  • Person 3

Hypothesis: Schools surrounded by more trees have higher graduation rates.

We prepared this example tool for our 13 teams in the Tech Incubator at CUNY Queens College who are creating tools to present at the annual NYC Open Data Student Showcase for the NYC Mayor's Office of Data Analytics.

In [ ]:
# form hypothesis
# find datasets
# explore datasets
In [ ]:
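# columns of interest in the tree dataset
# tree_id
# tree_dbh (trunk diameter at breast height)
# status == 'Alive' (keep only living trees)
# latitude
# longitude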
In [ ]:
url = 'https://data.cityofnewyork.us/api/views/nb39-jx2v/rows.csv'
In [ ]:
import pandas as pd
# load the tree dataset from the url into a table
t2 = pd.read_csv(url)
In [ ]:
len(t2)
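The column notes above (tree_id, tree_dbh, status, latitude, longitude) suggest a filtering step before modeling. Here is a minimal sketch of that step, assuming the loaded table has exactly those column names. The function name `select_alive_trees` and the sample rows are hypothetical and exist only so the block runs without downloading anything.

```python
import pandas as pd

# Columns named in the notes above; assumed to exist in the tree table
COLUMNS = ['tree_id', 'tree_dbh', 'status', 'latitude', 'longitude']


def select_alive_trees(tree_table):
    """Keep the columns of interest and only rows for living trees."""
    table = tree_table[COLUMNS]
    table = table[table['status'] == 'Alive']
    # Rows without coordinates cannot be used in a spatial query
    table = table.dropna(subset=['latitude', 'longitude'])
    return table.reset_index(drop=True)


# Hypothetical rows for illustration only; these are not real records
sample_table = pd.DataFrame({
    'tree_id': [1, 2, 3],
    'tree_dbh': [3, 21, 0],
    'status': ['Alive', 'Alive', 'Dead'],
    'latitude': [40.72, 40.71, 40.70],
    'longitude': [-73.84, -73.85, -73.86],
    'extra': ['a', 'b', 'c'],
})
alive_table = select_alive_trees(sample_table)
print(len(alive_table))
```

If the assumption about column names holds, the same call on the notebook's table would be `select_alive_trees(t2)`.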
In [ ]:
# choose variable you want to predict (graduation rates)
# choose feature variables (tree count)
In [ ]:
# alternative example for a different hypothesis
# choose variable you want to predict (disease counts)
# choose feature variables (air pollution)
In [ ]:
# design dataset you will use to train your model (training dataset)
# prepare dataset
In [ ]:
# use kd trees to count the trees near each school
# get nearby rows within a search radius
# compute sum of distances to the nearby trees
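The nearby-count step outlined above could be sketched as follows, using `scipy.spatial.cKDTree` and its `query_ball_point` method. This is one possible approach, not necessarily the one used in the original workshop. The helper names `to_local_meters` and `summarize_nearby`, the search radius, and the sample coordinates are all hypothetical. The sketch also assumes that latitude and longitude are first converted to approximate planar meters with an equirectangular projection, which is only reasonable over a city-sized area.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_IN_METERS = 6371000


def to_local_meters(latitudes, longitudes, reference_latitude):
    """Approximate planar x, y in meters (equirectangular projection)."""
    latitudes = np.radians(np.asarray(latitudes, dtype=float))
    longitudes = np.radians(np.asarray(longitudes, dtype=float))
    scale = np.cos(np.radians(reference_latitude))
    x = EARTH_RADIUS_IN_METERS * longitudes * scale
    y = EARTH_RADIUS_IN_METERS * latitudes
    return np.column_stack([x, y])


def summarize_nearby(school_xys, tree_xys, radius):
    """For each school, count trees within radius and sum their distances."""
    school_xys = np.asarray(school_xys, dtype=float)
    tree_xys = np.asarray(tree_xys, dtype=float)
    kd_tree = cKDTree(tree_xys)
    # One list of tree row indices per school
    index_lists = kd_tree.query_ball_point(school_xys, r=radius)
    counts, distance_sums = [], []
    for school_xy, indices in zip(school_xys, index_lists):
        nearby_xys = tree_xys[np.asarray(indices, dtype=int)]
        distances = np.linalg.norm(nearby_xys - school_xy, axis=1)
        counts.append(len(indices))
        distance_sums.append(distances.sum())
    return np.array(counts), np.array(distance_sums)


# Hypothetical planar coordinates for illustration only
example_school_xys = [[0, 0], [10, 10]]
example_tree_xys = [[0, 1], [1, 0], [0, 3], [10, 11]]
example_counts, example_distance_sums = summarize_nearby(
    example_school_xys, example_tree_xys, radius=2)
print(example_counts, example_distance_sums)
```

With real data, both the school and tree coordinates would go through `to_local_meters` with the same reference latitude, so that the radius can be given in meters.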
In [ ]:
# train model
# compare models and features
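One way to approach the training and comparison steps above is to score each combination of model and feature set by cross-validation. The sketch below uses scikit-learn, which is an assumption, since the notebook does not name a modeling library. The function `compare_models`, the column names, and the data are hypothetical. The data is SYNTHETIC, generated only so the block runs, and says nothing about real graduation rates.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def compare_models(table, target_column, feature_sets, models, fold_count=5):
    """Score each model and feature set by cross-validated mean absolute error."""
    rows = []
    y = table[target_column]
    for model_name, model in models.items():
        for feature_columns in feature_sets:
            X = table[list(feature_columns)]
            scores = cross_val_score(
                model, X, y, cv=fold_count,
                scoring='neg_mean_absolute_error')
            rows.append({
                'model': model_name,
                'features': ', '.join(feature_columns),
                # scikit-learn returns negated errors, so flip the sign
                'mean_absolute_error': -scores.mean(),
            })
    score_table = pd.DataFrame(rows).sort_values('mean_absolute_error')
    return score_table.reset_index(drop=True)


# SYNTHETIC training table for illustration only; not real school data
rng = np.random.default_rng(0)
row_count = 60
tree_count = rng.integers(0, 100, size=row_count)
training_table = pd.DataFrame({
    'nearby_tree_count': tree_count,
    'random_noise': rng.normal(size=row_count),
    'graduation_rate': 50 + 0.3 * tree_count + rng.normal(size=row_count),
})
score_table = compare_models(
    training_table,
    target_column='graduation_rate',
    feature_sets=[
        ['nearby_tree_count'],
        ['random_noise'],
        ['nearby_tree_count', 'random_noise'],
    ],
    models={
        'linear': LinearRegression(),
        'mean baseline': DummyRegressor(strategy='mean'),
    })
print(score_table)
```

The mean baseline is included because a feature is only worth keeping if a model using it beats a prediction of the average. Lower mean absolute error is better, so the first row of `score_table` is the best-scoring combination.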
In [ ]:
# design tool using tool template (INPUT, OUTPUT)
# build tool
# publish tool