income and FHV trips per region 20190213-2048




Notebook Creator: Ning Wei

Do your turnstile project here. Homework instructions are posted in this CoCalc folder (HW1_INSTR_381_780.docx).

In [41]:
import pandas as pd
import matplotlib.pyplot as plt
In [42]:
df = pd.read_csv('turnstile_180818.txt')
In [43]:
df[:3]
Out[43]:
   C/A   UNIT  SCP       STATION  LINENAME  DIVISION  DATE        TIME      DESC        ENTRIES  EXITS
0  A002  R051  02-00-00  59 ST    NQR456W   BMT       08/11/2018  00:00:00  RECOVR AUD  6720718  2277853.0
1  A002  R051  02-00-00  59 ST    NQR456W   BMT       08/11/2018  04:00:00  REGULAR     6720729  2277855.0
2  A002  R051  02-00-00  59 ST    NQR456W   BMT       08/11/2018  08:00:00  REGULAR     6720738  2277886.0
In [44]:
df.iloc[0]
Out[44]:
C/A                                                                            A002
UNIT                                                                           R051
SCP                                                                        02-00-00
STATION                                                                       59 ST
LINENAME                                                                    NQR456W
DIVISION                                                                        BMT
DATE                                                                     08/11/2018
TIME                                                                       00:00:00
DESC                                                                     RECOVR AUD
ENTRIES                                                                     6720718
EXITS                                                                   2.27785e+06
Name: 0, dtype: object
In [45]:
df_temp = df.copy()
In [46]:
list(df)
Out[46]:
['C/A',
 'UNIT',
 'SCP',
 'STATION',
 'LINENAME',
 'DIVISION',
 'DATE',
 'TIME',
 'DESC',
 'ENTRIES',
 'EXITS                                                               ']
In [47]:
df_temp = df_temp.rename(columns={'EXITS                                                               ': 'EXITS'})
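The rename above has to spell out the padded EXITS header, spaces included. A minimal sketch of a more robust alternative, assuming the only problem with the header is surrounding whitespace: strip every column name at once with `str.strip()`. The two-line CSV is a toy stand-in for `turnstile_180818.txt`, and its padding length is illustrative.

```python
import io

import pandas as pd

# Toy stand-in for turnstile_180818.txt: the EXITS header carries trailing
# spaces, as in the list(df) output above (padding length is illustrative).
raw = (
    "STATION,DATE,TIME,ENTRIES,EXITS          \n"
    "59 ST,08/11/2018,00:00:00,6720718,2277853\n"
)
df = pd.read_csv(io.StringIO(raw))

# Strip leading/trailing whitespace from every column name at once,
# instead of typing the padded name into a rename() mapping.
df.columns = df.columns.str.strip()
print(list(df.columns))  # ['STATION', 'DATE', 'TIME', 'ENTRIES', 'EXITS']
```

This also keeps working if the amount of padding changes between weekly files.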
In [48]:
df_subset1 = df_temp[['STATION','DATE','TIME','ENTRIES','EXITS']]
In [49]:
df_subset1[:3]
Out[49]:
   STATION  DATE        TIME      ENTRIES  EXITS
0  59 ST    08/11/2018  00:00:00  6720718  2277853.0
1  59 ST    08/11/2018  04:00:00  6720729  2277855.0
2  59 ST    08/11/2018  08:00:00  6720738  2277886.0
In [50]:
df_subset1 = df_subset1[df_subset1['STATION'] == '59 ST']
In [51]:
grouped = df_subset1.groupby(['DATE','TIME'])
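Depending on the pandas version, calling `sum()` on the whole groupby may also try to aggregate the non-numeric STATION column: pandas 2.0 and later no longer drop such columns by default and concatenate the strings instead. A small sketch that selects the counter columns before reducing, so the result stays numeric on either version. The first row is the A002/R051/02-00-00 reading from `df[:3]`; the second row's counter values are hypothetical.

```python
import pandas as pd

# Two readings taken at the same (DATE, TIME) at station 59 ST.
# Row 1 comes from df[:3]; row 2's counter values are hypothetical.
rows = pd.DataFrame({
    "STATION": ["59 ST", "59 ST"],
    "DATE": ["08/11/2018", "08/11/2018"],
    "TIME": ["00:00:00", "00:00:00"],
    "ENTRIES": [6720718, 1000],
    "EXITS": [2277853.0, 500.0],
})

grouped = rows.groupby(["DATE", "TIME"])
# Select the numeric counter columns first so STATION is never aggregated.
totals = grouped[["ENTRIES", "EXITS"]].sum()
print(totals)
```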
In [52]:
grouped[['ENTRIES', 'EXITS']].sum()
Out[52]:
                      ENTRIES     EXITS
DATE        TIME
08/11/2018  00:00:00  2137785610  764094566.0
            04:00:00  2137785865  764094874.0
            08:00:00  2137786784  764095371.0
            12:00:00  2137789001  764096584.0
            16:00:00  2137791858  764098006.0
            20:00:00  2137794659  764099423.0
08/12/2018  00:00:00  2137796169  764100305.0
            04:00:00  2137796394  764100503.0
            08:00:00  2137796995  764100823.0
            12:00:00  2137799050  764101837.0
            16:00:00  2137801665  764103233.0
            20:00:00  2137804147  764104617.0
08/13/2018  00:00:00  2137805404  764105500.0
            04:00:00  2137805574  764105638.0
            08:00:00  2137807848  764107106.0
            12:00:00  2131797139  762763711.0
            16:00:00  2137815918  764112160.0
            20:00:00  2137822944  764114851.0
08/14/2018  00:00:00  2137824996  764115782.0
            04:00:00  2137825180  764115982.0
            08:00:00  2137827698  764117130.0
            12:00:00  2137832382  764120558.0
            16:00:00  2137836076  764122471.0
            20:00:00  2137843276  764125127.0
08/15/2018  00:00:00  2137845434  764126208.0
            04:00:00  2137845607  764126364.0
            08:00:00  2137848088  764127894.0
            12:00:00  2137852981  764131348.0
            16:00:00  2137856688  764133178.0
            20:00:00  2137864078  764136015.0
08/16/2018  00:00:00  2137866438  764137164.0
            04:00:00  2137866652  764137318.0
            08:00:00  2137869017  764138881.0
            12:00:00  2137873629  764142328.0
            16:00:00  2137877527  764144207.0
            20:00:00  2137884750  764146943.0
08/17/2018  00:00:00  2137887224  764148001.0
            04:00:00  2137887413  764148186.0
            08:00:00  2137889711  764149620.0
            12:00:00  2137894163  764152889.0
            16:00:00  2137898357  764154780.0
            20:00:00  2137905039  764157137.0
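ENTRIES and EXITS are cumulative register readings for each turnstile, which is why the sums above climb steadily and sit in the billions. The number of people passing through in each 4-hour window is the difference between consecutive readings of the same turnstile (identified here, as an assumption, by C/A, UNIT and SCP). Summing raw counters also makes a total sensitive to any one missing reading, which may explain the dip at 08/13/2018 12:00:00. A minimal sketch: the 02-00-00 readings are the rows from `df[:3]`; the 02-00-01 turnstile and its counts are hypothetical. Treating negative differences (for example, counter resets) as missing is also an assumption.

```python
import pandas as pd

# Readings for two turnstiles. The 02-00-00 rows are from df[:3];
# the 02-00-01 turnstile and its counter values are hypothetical.
readings = pd.DataFrame({
    "C/A": ["A002"] * 6,
    "UNIT": ["R051"] * 6,
    "SCP": ["02-00-00"] * 3 + ["02-00-01"] * 3,
    "DATE": ["08/11/2018"] * 6,
    "TIME": ["00:00:00", "04:00:00", "08:00:00"] * 2,
    "ENTRIES": [6720718, 6720729, 6720738, 1000, 1030, 1090],
    "EXITS": [2277853.0, 2277855.0, 2277886.0, 500.0, 510.0, 525.0],
})

turnstile = ["C/A", "UNIT", "SCP"]
# MM/DD/YYYY strings sort chronologically only within a single year.
readings = readings.sort_values(turnstile + ["DATE", "TIME"])

# Difference between consecutive readings of the same turnstile;
# each turnstile's first reading has no predecessor and becomes NaN.
deltas = readings.groupby(turnstile)[["ENTRIES", "EXITS"]].diff()
# Assumption: negative deltas (counter resets/rollbacks) are treated as missing.
deltas = deltas.where(deltas >= 0)

readings["ENTRY_COUNT"] = deltas["ENTRIES"]
readings["EXIT_COUNT"] = deltas["EXITS"]

# Station-level counts per reading time; sum() skips the NaN deltas.
per_interval = readings.groupby(["DATE", "TIME"])[["ENTRY_COUNT", "EXIT_COUNT"]].sum()
print(per_interval)
```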
In [53]:
grouped = grouped[['ENTRIES', 'EXITS']].sum()
In [54]:
import matplotlib.pyplot as plt
# Line plot of the summed ENTRIES counter for each (DATE, TIME) reading
grouped.plot(y='ENTRIES')
plt.title('59 ST Entries by Date and Time')
plt.xlabel('DATE, TIME')
plt.ylabel('ENTRIES')
Out[54]:
[Figure: line plot of summed ENTRIES by (DATE, TIME) for 59 ST]
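The plot above labels the x axis with the (DATE, TIME) index values rather than placing readings on a time scale. A minimal sketch that merges the two text columns into one datetime index, assuming DATE and TIME always follow the MM/DD/YYYY and HH:MM:SS patterns seen in this file. The three readings are the A002/R051/02-00-00 rows from `df[:3]`.

```python
import pandas as pd

# The three A002/R051/02-00-00 readings from df[:3].
readings = pd.DataFrame({
    "DATE": ["08/11/2018"] * 3,
    "TIME": ["00:00:00", "04:00:00", "08:00:00"],
    "ENTRIES": [6720718, 6720729, 6720738],
})

# Combine the two text columns into a single timestamp column.
readings["DATETIME"] = pd.to_datetime(
    readings["DATE"] + " " + readings["TIME"], format="%m/%d/%Y %H:%M:%S"
)
entries = readings.set_index("DATETIME")["ENTRIES"]
print(entries)
# entries.plot() would then draw ENTRIES against a real time axis.
```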
In [55]:
df_hist = df_subset1[['DATE', 'ENTRIES', 'EXITS']]
In [56]:
df_hist = df_hist.groupby(['DATE'], as_index=False)
In [57]:
df_hist = df_hist.sum()
In [58]:
df_hist
Out[58]:
   DATE        ENTRIES      EXITS
0  08/11/2018  12826733777  4.584579e+09
1  08/12/2018  12826794420  4.584611e+09
2  08/13/2018  12820854827  4.583309e+09
3  08/14/2018  12826989608  4.584717e+09
4  08/15/2018  12827112876  4.584781e+09
5  08/16/2018  12827238013  4.584847e+09
6  08/17/2018  12827361907  4.584911e+09
In [59]:
df_hist.plot(x='DATE', kind='bar')
Out[59]:
[Figure: bar chart of summed ENTRIES and EXITS per DATE for 59 ST]
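Each daily row in `df_hist` adds up cumulative counters across all readings of that day, so every ENTRIES bar sits near 1.28e10 and every EXITS bar near 4.58e9, which hides the day-to-day variation. A minimal sketch of daily entries computed from per-turnstile differences instead, assuming each interval is credited to the date of the reading that closes it (so the interval ending at midnight counts toward the new day). The single turnstile and its four readings are hypothetical.

```python
import pandas as pd

# Hypothetical readings for one turnstile over two days.
readings = pd.DataFrame({
    "C/A": ["A002"] * 4,
    "UNIT": ["R051"] * 4,
    "SCP": ["02-00-00"] * 4,
    "DATE": ["08/11/2018", "08/11/2018", "08/12/2018", "08/12/2018"],
    "TIME": ["00:00:00", "12:00:00", "00:00:00", "12:00:00"],
    "ENTRIES": [1000, 1100, 1250, 1300],
})

turnstile = ["C/A", "UNIT", "SCP"]
# MM/DD/YYYY strings sort chronologically only within a single year.
readings = readings.sort_values(turnstile + ["DATE", "TIME"])

# People counted since the previous reading of the same turnstile;
# negative deltas (counter resets) are treated as missing (assumption).
counts = readings.groupby(turnstile)["ENTRIES"].diff()
readings["ENTRY_COUNT"] = counts.where(counts >= 0)

# Credit each interval to the date of its closing reading, then total per day.
daily = readings.groupby("DATE")["ENTRY_COUNT"].sum()
print(daily)
# daily.plot(kind="bar") would chart these daily totals.
```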