Team Name: YOUR-TEAM-NAME
Hypothesis: Schools surrounded by more trees have higher graduation rates.
We prepared this example tool for our 13 teams in the Tech Incubator at CUNY Queens College, who are building tools to present at the annual NYC Open Data Student Showcase for the NYC Mayor's Office of Data Analytics.
# form hypothesis
# find datasets
# explore datasets
# tree_id
# tree_dbh
# status=='Alive'
# latitude
# longitude
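import pandas as pd
# CSV export of the tree dataset from NYC Open Data
url = 'https://data.cityofnewyork.us/api/views/nb39-jx2v/rows.csv'
# download the whole table into a DataFrame (may take a while)
t2 = pd.read_csv(url)
# number of rows loaded
len(t2)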
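The columns noted above (tree_id, tree_dbh, status, latitude, longitude) suggest a filtering step before any counting. A minimal sketch of that step follows, assuming the table has those column names. The `sample` rows and the names `columns` and `alive` are made up for illustration so the snippet runs without downloading anything; in the notebook, `t2` would take the place of `sample`.

```python
import pandas as pd

# Hypothetical rows standing in for the downloaded table (values are made up)
sample = pd.DataFrame({
    'tree_id': [1, 2, 3, 4],
    'tree_dbh': [3, 21, 0, 11],
    'status': ['Alive', 'Alive', 'Stump', 'Dead'],
    'latitude': [40.7231, 40.7941, None, 40.6660],
    'longitude': [-73.8442, -73.8189, -73.9366, -73.9760],
})

# Keep only the columns needed for the nearby-tree features
columns = ['tree_id', 'tree_dbh', 'latitude', 'longitude']

# Keep living trees and drop rows that have no coordinates
alive = sample.loc[sample['status'] == 'Alive', columns]
alive = alive.dropna(subset=['latitude', 'longitude'])

print(len(alive), 'of', len(sample), 'rows kept')
```

Filtering early keeps the later spatial index small and avoids counting stumps or dead trees as tree cover.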
# choose variable you want to predict (e.g. graduation rates, disease counts)
# choose feature variables (e.g. tree count, air pollution)
# design dataset you will use to train your model (training dataset)
# prepare dataset
# use kd trees to get nearby count
# get nearby rows
# compute sum of distances
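The three steps above (nearby count, nearby rows, sum of distances) can be sketched with `scipy.spatial.cKDTree`. This is one possible approach, not necessarily the one the teams used. The coordinates are made up, the 100 m radius is an assumed value, and `to_local_meters` is a hypothetical helper that uses a rough equirectangular projection so that distances come out in meters rather than degrees.

```python
import numpy as np
from scipy.spatial import cKDTree

def to_local_meters(lat, lon, lat0=40.7):
    """Rough equirectangular projection to meters (assumption: city-scale area near lat0)."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    meters_per_degree = 111_320.0  # approximate length of one degree of latitude
    x = lon * meters_per_degree * np.cos(np.radians(lat0))
    y = lat * meters_per_degree
    return np.column_stack([x, y])

# Hypothetical tree and school coordinates, made up for illustration
tree_xy = to_local_meters(
    [40.7000, 40.7001, 40.7002, 40.7300],
    [-73.8000, -73.8001, -73.7999, -73.9000])
school_xy = to_local_meters(
    [40.7001, 40.7600],
    [-73.8000, -73.9500])

# Build the KD-tree once over all tree locations
tree_index = cKDTree(tree_xy)
radius_m = 100.0  # assumed search radius in meters

# get nearby rows: one list of tree row indices per school
nearby_rows = tree_index.query_ball_point(school_xy, r=radius_m)

# nearby count: number of trees within the radius of each school
nearby_count = np.array([len(rows) for rows in nearby_rows])

# sum of distances from each school to its nearby trees (0.0 if none)
distance_sum = np.array([
    np.linalg.norm(tree_xy[rows] - xy, axis=1).sum() if rows else 0.0
    for rows, xy in zip(nearby_rows, school_xy)
])

print(nearby_count, distance_sum)
```

Building the index once and querying it for every school avoids comparing each school against every tree, which matters when the tree table has many rows.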
# train model
# compare models and features
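A minimal sketch of training and comparing models follows. The outline does not say which model to use, so this example substitutes ordinary least squares in NumPy and compares it against a mean-only baseline on held-out rows. All data here is synthetic and carries no real-world meaning; `fit_ols`, `predict`, and `mean_absolute_error` are hypothetical helper names.

```python
import numpy as np

# Synthetic data for illustration only: NOT real graduation rates or tree counts
rng = np.random.default_rng(0)
n = 60
tree_count = rng.integers(0, 40, size=n).astype(float)
graduation_rate = 60 + 0.5 * tree_count + rng.normal(0, 2, size=n)

# Hold out the last 15 rows for testing
train, test = slice(0, 45), slice(45, n)

def fit_ols(X, y):
    """Fit ordinary least squares with an intercept term."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients (intercept first) to new rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

X = tree_count.reshape(-1, 1)

# Model: graduation rate predicted from tree count
coef = fit_ols(X[train], graduation_rate[train])
mae_model = mean_absolute_error(graduation_rate[test], predict(coef, X[test]))

# Baseline: always predict the training mean
baseline_pred = np.full(n - 45, graduation_rate[train].mean())
mae_baseline = mean_absolute_error(graduation_rate[test], baseline_pred)

print('model MAE:', round(mae_model, 2), 'baseline MAE:', round(mae_baseline, 2))
```

Comparing against a baseline on rows the model has not seen shows whether a feature adds predictive value. The same pattern extends to other feature sets by adding columns to `X`.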
# design tool using tool template (INPUT, OUTPUT)
# build tool
# publish tool