KzO9K




Pay Notebook Creator: Rocky Singh0
Set Container: Numerical CPU with TINY Memory for 10 Minutes 0
Total0

Predict Metrics by Zipcode

Here is an dummy tool template that you can use to prototype your tool. This tool template assumes that each row of your training dataset corresponds to a zipcode.

Note that this tool uses a dummy model. Please modify the inputs, outputs and model to fit your chosen hypothesis and training dataset.

Thanks to the following groups for making this work possible:

{zipcode_table : Zipcodes ? Specify the zipcodes for which you would like to predict metrics}

In [1]:
# CrossCompute
zipcode_table_path = 'zipcode.csv'
target_folder = '/tmp'
In [ ]:
 

Load Arguments

In [2]:
import pandas as pd
zipcode_table = pd.read_csv(zipcode_table_path)
zipcode_table[:3]
Out[2]:
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
zipcode Tree Count
0 11365 10
1 10010 20
2 10021 30

Normalize Variables

Here we want to normalize tree count. We can do that by dividing tree count by zipcode area in square meters.

In [3]:
url = 'https://data.cityofnewyork.us/download/i8iw-xf4u/application%2Fzip'
In [4]:
# Use default projection to get zipcode area in square feet
# import geotable
# nyc_zipcode_table = geotable.load(url)
# print(nyc_zipcode_table.iloc[0]['AREA'])
# print(nyc_zipcode_table.iloc[0]['geometry_object'].area)
In [5]:
# Get UTM projection
import geotable
utm_proj4 = geotable.load_utm_proj4(url)
utm_proj4
Out[5]:
'+proj=utm +zone=18 +ellps=WGS84 +datum=WGS84 +units=m +no_defs'
In [11]:
# Get zipcode area in square meters
nyc_zipcode_table = geotable.load(url, target_proj4=utm_proj4)
nyc_zipcode_table.iloc[0]['geometry_object'].area
Out[11]:
2107694.2486345936
In [12]:
nyc_zipcode_table['Area in Square km'] = nyc_zipcode_table[
    'geometry_object'].apply(lambda g: g.area/1000000)
nyc_zipcode_table[['ZIPCODE', 'Area in Square km']][:5]
Out[12]:
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
ZIPCODE Area in Square km
0 11436 2.107694
1 11213 2.751168
2 11212 3.897041
3 11225 2.200345
4 11218 3.423119
In [13]:
# Extract relevant columns
nyc_zipcode_table = nyc_zipcode_table[['ZIPCODE', 'Area in Square km']].copy()
In [14]:
zipcode_table[:3]
Out[14]:
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
zipcode Tree Count
0 11365 10
1 10010 20
2 10021 30
In [15]:
# Merge tables
zipcode_table['zipcode'] = zipcode_table['zipcode'].astype(str)
dataset_table = pd.merge(zipcode_table, nyc_zipcode_table, left_on='zipcode', right_on='ZIPCODE')
dataset_table[:5]
Out[15]:
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
zipcode Tree Count ZIPCODE Area in Square km
0 11365 10 11365 6.442713
1 10010 20 10010 0.906962
2 10021 30 10021 0.974447
In [16]:
dataset_table[:3]
Out[16]:
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
zipcode Tree Count ZIPCODE Area in Square km
0 11365 10 11365 6.442713
1 10010 20 10010 0.906962
2 10021 30 10021 0.974447
In [17]:
# Add normalized column
dataset_table['Tree Count Per Square km'] = dataset_table[
    'Tree Count'] / dataset_table['Area in Square km']
dataset_table[:5]
Out[17]:
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
zipcode Tree Count ZIPCODE Area in Square km Tree Count Per Square km
0 11365 10 11365 6.442713 1.552141
1 10010 20 10010 0.906962 22.051641
2 10021 30 10021 0.974447 30.786692

Run Model to Estimate Target Variable

In [18]:
# Load model
from pickle import load
model = load(open('model.pkl', 'rb'))  # !!! Replace dummy model with your model
model
Out[18]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)
In [19]:
dataset_table[:3]
Out[19]:
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
zipcode Tree Count ZIPCODE Area in Square km Tree Count Per Square km
0 11365 10 11365 6.442713 1.552141
1 10010 20 10010 0.906962 22.051641
2 10021 30 10021 0.974447 30.786692
In [22]:
# Run model
X = dataset_table[['Tree Count Per Square km']].values
y = model.predict(X)
y
Out[22]:
array([466.42380757, 504.44684715, 520.64886082])
In [23]:
# Add column
dataset_table['Predicted Air Pollution'] = y
dataset_table
Out[23]:
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
zipcode Tree Count ZIPCODE Area in Square km Tree Count Per Square km Predicted Air Pollution
0 11365 10 11365 6.442713 1.552141 466.423808
1 10010 20 10010 0.906962 22.051641 504.446847
2 10021 30 10021 0.974447 30.786692 520.648861

Render Table

In [28]:
# Select columns
output_table = dataset_table[[
    'zipcode',
    'Tree Count Per Square km',
    'Predicted Air Pollution',
]].copy()
In [29]:
# Save file to target folder to include it in the result download
target_path = target_folder + '/a.csv'
output_table.to_csv(target_path, index=False)
print(f'a_table_path = {target_path}')  # Print table_path to render table
a_table_path = /tmp/a.csv

Render Map

In [30]:
output_geotable = output_table.copy()  # Prevent SettingwithCopyWarning
output_geotable
Out[30]:
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
zipcode Tree Count Per Square km Predicted Air Pollution
0 11365 1.552141 466.423808
1 10010 22.051641 504.446847
2 10021 30.786692 520.648861
In [31]:
# Define wkt_by_zipcode
import geotable
url = 'https://data.cityofnewyork.us/download/i8iw-xf4u/application%2Fzip'
# Specify target_proj4 to convert from NYC spatial reference to longitude and latitude
nyc_zipcode_table = geotable.load(url, target_proj4=geotable.LONGITUDE_LATITUDE_PROJ4)
In [32]:
wkt_by_zipcode = {}
for index, row in nyc_zipcode_table.iterrows():
    zipcode = row['ZIPCODE']
    geometry = row['geometry_object']
    wkt_by_zipcode[zipcode] = geometry.wkt
In [35]:
# Geocode zipcode polygons
output_geotable['WKT'] = output_geotable['zipcode'].apply(
    lambda zipcode: wkt_by_zipcode.get(str(int(zipcode))))
output_geotable = output_geotable.dropna(subset=['WKT'])
output_geotable[:3]
Out[35]:
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </style>
zipcode Tree Count Per Square km Predicted Air Pollution WKT
0 11365 1.552141 466.423808 POLYGON ((-73.81088634839307 40.72717154257169...
1 10010 22.051641 504.446847 POLYGON ((-73.98827138035784 40.74347939182168...
2 10021 30.786692 520.648861 POLYGON ((-73.9496232806158 40.76752090792029,...
In [36]:
nyc_zipcode_table[nyc_zipcode_table['ZIPCODE'] == '10019'].iloc[0]['geometry_object'].wkt
Out[36]:
'POLYGON ((-73.99309797037012 40.77272824543018, -73.99194592266677 40.77223999357954, -73.9909455774401 40.77182447402802, -73.99062944837497 40.77168799821958, -73.99045325328831 40.77161193356638, -73.99025312369528 40.77152752976274, -73.98826658823141 40.77068965960738, -73.98795725244663 40.77055902860484, -73.9878758607349 40.77052453259513, -73.98779640076761 40.77049117766995, -73.98775904077267 40.77047538968552, -73.98761511649867 40.77041512973537, -73.98723155082492 40.77025779155473, -73.98477434073854 40.76921900013685, -73.98431936781783 40.76984168314652, -73.98220500980263 40.76893884643313, -73.9821910491862 40.76843241910436, -73.98236458127812 40.76833967591926, -73.98242595195842 40.76827465993342, -73.98246501048727 40.76819975299133, -73.98247619465094 40.76810222114855, -73.98245390270159 40.76801175226295, -73.98228729501072 40.76781779520935, -73.98221765385379 40.76771844722319, -73.982140998942 40.76767591628559, -73.98202991720373 40.76763567981872, -73.98186193801092 40.76760173782696, -73.98163627328798 40.76764454984561, -73.9813628410009 40.76783421594484, -73.97924205863474 40.76694561902912, -73.97919771675939 40.76692690486689, -73.97907658113148 40.76687577934278, -73.97894600724852 40.76682067083934, -73.97642224776693 40.76575546586919, -73.97623387896452 40.76567432417963, -73.97606820938987 40.76560612984165, -73.97376650210639 40.76463463416172, -73.97363583789856 40.76456959874577, -73.97351686593359 40.76451062536135, -73.97343945084793 40.76447893230804, -73.97316549383626 40.76436671013569, -73.9730148658829 40.76427857323088, -73.9730929859076 40.76417079575315, -73.97339380779344 40.76375577191267, -73.97346522835808 40.76365879057316, -73.97396798106291 40.76296627791392, -73.97403467321496 40.76287247302172, -73.97405021536295 40.76285996491652, -73.97445730711688 40.76229277236563, -73.97491475562425 40.76166695655948, -73.97537360423978 40.76103959661618, -73.97583011372922 40.76041821004747, -73.97628868970176 40.75978643741984, -73.97674467186306 40.75916442565961, -73.97798851019422 40.75968925008241, -73.97924415591891 40.76021597132856, -73.97980531526082 40.76044844268592, -73.97982320634422 40.76046015598296, -73.97989058572001 40.76049043732654, -73.97997187593019 40.76052439060617, -73.9803629332494 40.75997621628841, -73.98042605575125 40.75989876921924, -73.98050325858345 40.75980316311944, -73.98088573706205 40.75927036253795, -73.98134106854107 40.75864499066453, -73.98345853562783 40.7595425870567, -73.9835687103979 40.7595884601269, -73.98359710507387 40.75959608694587, -73.98403001727731 40.75977264399582, -73.98404977338427 40.75978265584263, -73.98407065834911 40.75979664086892, -73.9841736871245 40.75984128228095, -73.98480813841877 40.76010341725735, -73.98701514294915 40.76103705386799, -73.98826819573529 40.76156454983614, -73.9898628399536 40.7622351906633, -73.99270143380933 40.76343049082325, -73.99325604193466 40.76366312125491, -73.99330058100159 40.76368185434035, -73.99351739310328 40.76377304459236, -73.99554485397877 40.76462575163002, -73.99729250584703 40.76536693335463, -73.99773597315175 40.76555765704952, -73.99779557067069 40.76546711613709, -73.99789737144381 40.76532429181058, -73.99813119677623 40.76499940391125, -73.99818784848034 40.7649248259752, -73.99838224919007 40.76465829678665, -73.99838634980534 40.76464418054277, -73.99839068877941 40.76461751667861, -73.99855740376329 40.76438729151498, -73.99862128650786 40.76429627474713, -73.99867259427324 40.76422943723413, -73.99882040929091 40.76402585827518, -73.99883405811613 40.76400717205291, -73.99889704699042 40.76392376437801, -73.99894124906147 40.76386999208816, -73.99909766419836 40.76367971148704, -73.99925483971241 40.76351291762366, -73.9993997277912 40.76337748263715, -73.99961587365287 40.76321257061978, -73.99969419438845 40.76313698162777, -74.00035603422106 40.76260917071001, -74.00116297246872 40.76202152866602, -74.00170164102903 40.76138042224312, -74.00218297931343 40.76077574965672, -74.00263387431282 40.76015154646123, -74.00321149881574 40.76034782200493, -74.00278647455953 40.76092411007356, -74.00273940603338 40.76098792905837, -74.00233064310348 40.76154817071519, -74.00221276105255 40.761709735244, -74.00184505173537 40.76221369550108, -74.00153363461575 40.76264039774595, -74.0013904662914 40.76283656451891, -74.00105957084233 40.76328994581576, -74.00103750831502 40.76332613818184, -74.00103004601317 40.76334137928273, -74.00101479808363 40.76337251525185, -74.00099451456279 40.76342039613818, -74.00096168101879 40.76350309422045, -74.00095174945966 40.76352810760796, -74.00093918752491 40.76355765235309, -74.00092211452807 40.76359619767229, -74.00089717400483 40.76364307541541, -74.00088472914196 40.76366263266139, -74.00087214211948 40.76367986210375, -74.00085463297314 40.76369977688475, -74.00083656001692 40.76371576186695, -74.00081779771399 40.76372792150946, -74.00079826553205 40.76373631164379, -74.00077785925181 40.76374096558889, -74.0007565480835 40.76374188244403, -74.00071543688516 40.76373538917113, -74.00068130505049 40.76372765404723, -74.00061370331559 40.76369696344404, -74.00054020001379 40.7636633173742, -74.00048966373245 40.76364132638754, -74.00029408022712 40.76355574853807, -74.00008569919143 40.76346393077212, -73.99999874345427 40.76342675618938, -73.99985795947357 40.76336861954992, -73.99983567484851 40.76338316176283, -73.99978757031872 40.76340775355372, -73.99973592102043 40.76343042544138, -73.99971302521951 40.76344661915375, -73.9996965201417 40.76345998079285, -73.99966190743899 40.76350815677068, -73.99962170216938 40.76356260743274, -73.99958549285211 40.76361199906225, -73.99954952981248 40.76366331775999, -73.99945129885303 40.76380348711115, -73.99939836365903 40.76387640246915, -73.99936907781574 40.76391673859468, -73.99930698725736 40.76400226181734, -73.99916107829178 40.76420063019795, -73.99889465220242 40.7645676914232, -73.9988857526872 40.76457995440167, -73.99873291719582 40.76478844382122, -73.99861441066933 40.76495299605825, -73.99859737302778 40.76497526171408, -73.99856861678542 40.76501696280008, -73.99855370627986 40.76503761026999, -73.99854330714228 40.76505358868182, -73.99858246103244 40.76506675994158, -73.99861227869918 40.76507161852361, -73.99863968481776 40.76508476440766, -73.99867056928561 40.76510622197718, -73.998684946099 40.76512120202592, -73.99869186732447 40.7651341576555, -73.99870251783926 40.76516290186296, -73.99870436399176 40.76517766744742, -73.99870329893149 40.76519345597161, -73.99869903728141 40.76520722192779, -73.99868466090311 40.7652290832994, -73.99866708913368 40.7652477055195, -73.99865324475853 40.76525782702307, -73.9986394004039 40.76526673464559, -73.9985920083431 40.76528697650472, -73.99853237227103 40.76530235906964, -73.99850627918129 40.76530600307378, -73.99846250805845 40.76531044018151, -73.99844183688855 40.76532056877228, -73.99842053898483 40.76533878657779, -73.99840829226918 40.76534931239993, -73.998396576034 40.76536469824799, -73.99837687331345 40.76538898906611, -73.9983481190127 40.76543514218602, -73.99833320714055 40.76545659917915, -73.99829327180325 40.76551935111154, -73.99824851697898 40.76558825429869, -73.99821884106574 40.76563394721349, -73.99819120463442 40.76567912506413, -73.99809593410396 40.76584120342449, -73.9980777955185 40.7658720579884, -73.9980703678612 40.76588469283421, -73.99793463958315 40.76607128748956, -73.99776762027085 40.76632171449209, -73.99768104889819 40.76643482416582, -73.99743430815894 40.76677396757121, -73.99738669972716 40.76683854162562, -73.99737684732618 40.76685190847942, -73.99729244082405 40.76696639747328, -73.99721464814426 40.76707191664585, -73.9971688278479 40.76713406739515, -73.9971480390694 40.76716226345464, -73.99709712658164 40.76723092378966, -73.9968547285054 40.76755756973396, -73.99646712004242 40.76808775972454, -73.99673310667809 40.76820878174685, -73.99728252519927 40.76844125020179, -73.99708371948934 40.76873554069729, -73.9970456527063 40.76879189053469, -73.99691234230015 40.7689839707348, -73.9968892073987 40.76901730404921, -73.99678166927853 40.76916792650686, -73.99665857315715 40.76933788492057, -73.99659065211141 40.7694308202859, -73.99656376674376 40.76946760865327, -73.99634767823942 40.76976328049025, -73.99632234554366 40.76979794366077, -73.99596813682567 40.77027480685346, -73.99590433281658 40.77035857302086, -73.99587335967233 40.77040520627897, -73.99586984899437 40.77042294696817, -73.99588144123351 40.77046366363846, -73.99588086331548 40.77048115319222, -73.99587291949058 40.77051278128408, -73.99586440311776 40.77053244256688, -73.99584945068763 40.77054633949594, -73.9958315351561 40.77055304849895, -73.99580349134911 40.77055761933599, -73.99578223893786 40.77054907908763, -73.99576063892897 40.77054923138046, -73.9957490037945 40.77056909245157, -73.99573728723709 40.77058407189028, -73.99571811354363 40.77061281704404, -73.99569574623577 40.7706456098373, -73.99566379338383 40.77069023307727, -73.99564889571806 40.77071681176913, -73.9956217372102 40.77076662477799, -73.9955465623942 40.77086987899908, -73.99539585510955 40.77108253501022, -73.99532743633011 40.77117922558566, -73.99525388565399 40.77128316946069, -73.99513289115802 40.77145498491588, -73.9950154097116 40.77144633326232, -73.99499618785211 40.77144656655999, -73.99497778061922 40.77145077561132, -73.99496157862085 40.77145864169934, -73.99494881613521 40.77146957144758, -73.99494045739782 40.77148273193408, -73.99493713737138 40.77149712633076, -73.99493910843765 40.77151166413984, -73.99494622143827 40.77152524223818, -73.99495793395884 40.77153682952498, -73.99502272735866 40.77160627103803, -73.99497936270458 40.77166017507378, -73.99478349569183 40.77192912953553, -73.99476427281964 40.77195552691109, -73.99454744553579 40.77225329430125, -73.9944494384923 40.7723846009717, -73.9943781719229 40.77248008325292, -73.9943423191085 40.77252461407255, -73.99424956744951 40.77252635375526, -73.99416284972735 40.77248674257174, -73.99383725855455 40.77293952462252, -73.99382939737085 40.77296047800453, -73.99382682259684 40.77298217728574, -73.9938318947051 40.77300752401322, -73.99383806715797 40.77302085899074, -73.99383835149573 40.77304587587762, -73.99378238889445 40.77302271733931, -73.99365828399927 40.77297070923323, -73.99360939717018 40.77295415438785, -73.9935504544968 40.77293094514308, -73.99349586740568 40.77290681580003, -73.99333310131512 40.77283486531718, -73.99328392143083 40.77281312522047, -73.99323626834379 40.77279205876847, -73.99320029521027 40.77277615647051, -73.99309797037012 40.77272824543018))'
In [44]:
# Set color for each geometry using a gradient
output_geotable['FillReds'] = output_geotable['Predicted Air Pollution']
#tput_geotable[:3]
In [ ]:
 
In [46]:
# Save file to target folder to include it in the result download
target_path = target_folder + '/b.csv'
output_geotable.to_csv(target_path, index=False)
print(f'b_geotable_path = {target_path}')  # Print geotable_path to render map
b_geotable_path = /tmp/b.csv

Render Plot

In [48]:
%matplotlib inline
axes = output_table[[
    'Predicted Air Pollution',
]].plot(kind='bar')
In [49]:
# Save file to target folder to include it in the result download
target_path = target_folder + '/c.png'
figure = axes.get_figure()
figure.savefig(target_path)
print(f'c_image_path = {target_path}')
c_image_path = /tmp/c.png

Predicted Metrics by Zipcode

YOUR INTERPRETATION OF THE RESULTS

{a_table : Tree Count vs Air Pollution ? YOUR TABLE DESCRIPTION}

{b_geotable : Tree and Pollution overlap ? YOUR MAP DESCRIPTION}

{c_image : Predicted Air Pollution ? YOUR PLOT DESCRIPTION}