Chunk large CSV files.
from pandas import concat, read_csv

def load(file_path, **kw):
    # Keep the original index only when the caller specifies an index column
    ignore_index = 'index_col' not in kw
    # Read the file in 10,000-row chunks instead of loading it all at once
    chunk_iterator = read_csv(file_path, iterator=True, chunksize=10000, **kw)
    return concat(chunk_iterator, ignore_index=ignore_index)
Limit columns in DataFrames.
from pandas import read_csv

chocolate = read_csv('datasets/UN-Chocolate.csv')
chocolate                     # Show every column
chocolate[['Year', 'Flow']]   # Show only the Year and Flow columns
Consider h5py or numpy.memmap for arrays too large for memory (a minimal memmap sketch follows the caching example below), and lru_cache or dogpile.cache for computationally intensive operations.
from dogpile.cache import make_region
# Configure an in-memory cache region
region = make_region().configure('dogpile.cache.memory')
cache_on_arguments = region.cache_on_arguments

@cache_on_arguments()
def f(x):
    print('Wheee!')
    return x

print(f(1))
print(f(1))  # Cached; 'Wheee!' is not printed again
print(f(2))
print(f(2))  # Cached
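For the numpy.memmap suggestion above, here is a minimal sketch; the file name 'large.dat' and the array shape are arbitrary examples, not from the original.
import numpy as np

# Create a disk-backed array; only the pages that are touched load into memory
values = np.memmap('large.dat', dtype='float32', mode='w+', shape=(1000000, 10))
values[:100] = np.random.rand(100, 10)
values.flush()  # Write pending changes to disk

# Reopen the same file read-only in a later session
values = np.memmap('large.dat', dtype='float32', mode='r', shape=(1000000, 10))
print(values[:3])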
Select features with cross-validation.
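One way to do this is scikit-learn's RFECV; the estimator and the iris dataset here are illustrative assumptions, a minimal sketch rather than a prescribed recipe.
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

iris = load_iris()
# Recursively drop the weakest feature, scoring each subset with 5-fold cross-validation
selector = RFECV(LogisticRegression(max_iter=1000), cv=5)
selector.fit(iris.data, iris.target)
print(selector.support_)  # Boolean mask of the selected features
print(selector.ranking_)  # Rank 1 marks the features kept by cross-validation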
Select models with cross-validation.
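A minimal sketch that compares two arbitrary candidate classifiers by their cross-validated scores on iris; keep whichever scores higher.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
# Prefer the model with the best mean 5-fold cross-validation score
for model in [KNeighborsClassifier(), DecisionTreeClassifier()]:
    scores = cross_val_score(model, iris.data, iris.target, cv=5)
    print(model.__class__.__name__, scores.mean())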
Scale samples.
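A minimal sketch with scikit-learn's StandardScaler on a toy array; the data here is made up for illustration.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1., 10.], [2., 0.], [3., -10.]])
# Center each feature at zero mean and scale it to unit variance
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0))  # Approximately 0 for every column
print(X_scaled.std(axis=0))   # Approximately 1 for every column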
Decorrelate samples.
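One way to decorrelate is PCA with whitening, sketched here on iris; PCA is an assumption, not the only possible transformation.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
# Whitened principal components are uncorrelated and have unit variance
X_decorrelated = PCA(whiten=True).fit_transform(iris.data)
print(np.round(np.cov(X_decorrelated.T), 2))  # Off-diagonal covariances are ~0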
Cross-validate with transformations by pipelining.
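A minimal sketch with scikit-learn's make_pipeline: the scaler is refit on each training fold, so the transformation never sees the test fold. The scaler and classifier choices are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
pipeline = make_pipeline(StandardScaler(), SVC())
# Each fold scales its own training data before fitting the classifier
print(cross_val_score(pipeline, iris.data, iris.target, cv=5).mean())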
Interpolate missing labels.
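Reading this as semi-supervised label propagation (one interpretation, not necessarily the intended one), here is a sketch with scikit-learn's LabelSpreading, which treats samples labeled -1 as missing.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

iris = load_iris()
labels = np.copy(iris.target)
# Hide half the labels; -1 marks a sample as unlabeled
labels[np.random.RandomState(0).rand(len(labels)) < 0.5] = -1
model = LabelSpreading().fit(iris.data, labels)
print(model.transduction_[:10])  # Labels inferred for every sample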