Introduction to Computational Analysis




Notebook Creator: Roy Hyunjin Han

Pandas

Chunk large CSV files.

In [ ]:
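from pandas import concat, read_csv

def load(filePath, **kw):
    # Renumber rows unless the caller chose an index column
    ignore_index = 'index_col' not in kw
    # Read 10,000 rows at a time instead of parsing the file in one pass
    chunkIterator = read_csv(filePath, iterator=True, chunksize=10000, **kw)
    return concat(chunkIterator, ignore_index=ignore_index)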
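
The load function above still holds the whole table in memory once the chunks are concatenated. A minimal sketch of one variation, assuming only a subset of rows is needed: filter each chunk before concatenating. The helper name load_filtered, the keep callback and the in-memory CSV are hypothetical and exist only for illustration.

```python
from io import StringIO
from pandas import concat, read_csv

# Small in-memory CSV standing in for a large file (made-up data)
csv_text = 'Year,Flow,Value\n' + ''.join(
    '%d,%s,%d\n' % (2000 + i % 5, 'Import' if i % 2 else 'Export', i)
    for i in range(25))

def load_filtered(source, keep, **kw):
    # Hypothetical helper: read in chunks and keep only rows where keep(chunk) is True
    chunks = read_csv(source, chunksize=10, **kw)
    return concat((chunk[keep(chunk)] for chunk in chunks), ignore_index=True)

imports = load_filtered(StringIO(csv_text), lambda t: t['Flow'] == 'Import')
print(len(imports))
```

Only the rows that pass the filter are ever concatenated, so peak memory depends on the chunk size and the filtered result rather than on the full file.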

Limit columns in DataFrames.

In [4]:
from pandas import read_csv
chocolate = read_csv('datasets/UN-Chocolate.csv')
chocolate
In [5]:
chocolate[['Year', 'Flow']]
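
Selecting columns after loading still parses every column first. A sketch of an alternative, assuming the needed columns are known up front: pass usecols to read_csv so that the other columns are never stored. The inline CSV and its Quantity column are made up for illustration; only Year and Flow come from the example above.

```python
from io import StringIO
from pandas import read_csv

# Made-up rows; the real dataset lives in datasets/UN-Chocolate.csv
csv_text = 'Year,Flow,Quantity\n2009,Import,10\n2009,Export,4\n2010,Import,12\n'

# Only the listed columns are kept while parsing
table = read_csv(StringIO(csv_text), usecols=['Year', 'Flow'])
print(list(table.columns))
```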
  • Use h5py
  • Use numpy.memmap
  • Use lru_cache or dogpile.cache for computationally intensive operations
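
As a rough illustration of the numpy.memmap item in the list above, the sketch below writes an array to a scratch file and reopens it read-only. The file name, shape and dtype are arbitrary assumptions, not values from the notebook.

```python
import os
import tempfile
import numpy as np

# Hypothetical scratch file for the demonstration
path = os.path.join(tempfile.mkdtemp(), 'values.dat')

# Create a disk-backed array and fill every row with 0..9
values = np.memmap(path, dtype='float64', mode='w+', shape=(1000, 10))
values[:] = np.arange(10)
values.flush()
del values

# Reopen read-only; only the slices that are touched get paged into memory
values = np.memmap(path, dtype='float64', mode='r', shape=(1000, 10))
column_sum = float(values[:, 3].sum())
print(column_sum)
```

The dtype and shape are not stored in the file, so they have to be supplied again when reopening.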
In [2]:
from dogpile.cache import make_region
# Keep cached values in an in-process dictionary
region = make_region().configure('dogpile.cache.memory')
cache_on_arguments = region.cache_on_arguments

@cache_on_arguments()
def f(x):
    print('Wheee!')  # Printed only when the function body actually runs
    return x

print(f(1))
print(f(1))  # Cached
print(f(2))
print(f(2))  # Cached
Wheee!
1
1
Wheee!
2
2
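
The same behavior can be sketched with functools.lru_cache from the standard library, which the list above also mentions. The function g and the calls list are illustrative names, not part of the original notebook.

```python
from functools import lru_cache

calls = []

@lru_cache(maxsize=None)
def g(x):
    calls.append(x)  # Record each time the body really runs
    return x * x

results = [g(3), g(3), g(4)]
print(results, calls)
```

The second g(3) is served from the cache, so calls records each distinct argument only once.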

Scikit-Learn

Select features with cross-validation
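
The notebook gives no code for this step. One possible sketch, assuming recursive feature elimination is an acceptable selection method, uses RFECV on synthetic data. The estimator, fold count and dataset parameters are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, of which 3 carry signal
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0,
    random_state=0)

# Drop one feature per step and score each subset with 5-fold cross-validation
selector = RFECV(LogisticRegression(max_iter=1000), step=1, cv=5)
selector.fit(X, y)

print(selector.n_features_)  # Number of features kept
print(selector.support_)     # Boolean mask over the original columns
```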

Select models with cross-validation
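
No code accompanies this step either. A minimal sketch, assuming that "selecting a model" means comparing mean cross-validated scores: the three candidate classifiers and the iris dataset are illustrative picks.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate models (an arbitrary shortlist)
models = {
    'knn': KNeighborsClassifier(),
    'svc': SVC(),
    'tree': DecisionTreeClassifier(random_state=0),
}

# Mean accuracy over 5 folds for each candidate
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()}
best_name = max(scores, key=scores.get)
print(scores, best_name)
```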

Scale samples
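
As a sketch of what scaling might look like here, assuming standardization to zero mean and unit variance per feature is intended, StandardScaler can be fit on a small made-up matrix.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: the second feature is on a much larger scale than the first
X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])

scaler = StandardScaler().fit(X)  # Learns per-feature mean and standard deviation
X_scaled = scaler.transform(X)
print(X_scaled)
```

Fitting the scaler on training data only and reusing it on test data avoids leaking test statistics into the model.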

Decorrelate samples
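
One way to decorrelate features, assuming PCA whitening is an acceptable reading of this step, is sketched below on two deliberately correlated synthetic columns.

```python
import numpy as np
from sklearn.decomposition import PCA

# Two strongly correlated synthetic features
rng = np.random.RandomState(0)
a = rng.normal(size=500)
X = np.column_stack([a, 2 * a + rng.normal(scale=0.5, size=500)])

# Rotate onto principal axes and rescale each axis to unit variance
X_white = PCA(whiten=True).fit_transform(X)
covariance = np.cov(X_white, rowvar=False)
print(covariance.round(3))
```

After whitening, the covariance matrix of the transformed data is approximately the identity.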

Cross-validate with transformations by pipelining
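
A minimal sketch of this idea: wrapping the transformation and the estimator in a pipeline means the scaler is refit on the training part of every fold, so nothing from the held-out part leaks into it. The scaler, classifier and dataset are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# The scaler is fit inside each training fold, then applied to the held-out fold
pipeline = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(pipeline, X, y, cv=5)
print(scores.mean())
```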

Interpolate missing labels
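
The notebook does not say which method it has in mind. One possible sketch, assuming a semi-supervised approach fits, uses LabelSpreading, where unlabeled samples are marked with -1. The masking rate and the kernel settings are arbitrary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Hide roughly half of the labels; -1 marks a missing label
rng = np.random.RandomState(0)
hidden = rng.rand(len(y)) < 0.5
y_partial = y.copy()
y_partial[hidden] = -1

# Spread known labels to nearby unlabeled samples
model = LabelSpreading(kernel='knn', n_neighbors=7).fit(X, y_partial)
y_filled = model.transduction_

# Compare the filled-in labels with the ones that were hidden
accuracy = (y_filled[hidden] == y[hidden]).mean()
print(accuracy)
```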