Test Relationship Between Labor Trafficking and GDP¶

This exploratory tool and notebook are based on a presentation by Eric Schles at our workshop on Computational Approaches to Fight Human Trafficking.

{ labor_table : Labor Trafficking by Product and Country ? Child Labor is child slavery and Forced Labor is adult slavery }

{ gdp_table : Gross Domestic Product by Country ? USD in millions }

{ selected_year : Selected Year of Analysis }

In [ ]:

# CrossCompute
labor_table_path = 'labor.csv'
gdp_table_path = 'gdp.csv'
selected_year = '2017'
target_folder = '/tmp'

In [ ]:

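# Install python packages (this takes a few seconds)
import subprocess
import sys
# pip.main was removed in pip 10, so call pip through the running interpreter
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'statsmodels'])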

In [ ]:

import numpy as np
import pandas as pd
from statsmodels import api as sm

In [ ]:

from IPython.display import Image
from matplotlib import pyplot as plt

In [ ]:

labor_table = pd.read_csv(labor_table_path)
labor_table.head()

In [ ]:

labor_countries = labor_table['country'].unique()
len(labor_countries)

In [ ]:

gdp_table = pd.read_csv(gdp_table_path)
gdp_table.head()

Understanding Our Data¶

Child Labor refers to child slavery.

Forced Labor refers to adult slavery.

Child Labor¶

In [ ]:

child_labor_rows = []
for country in labor_countries:
    table = labor_table[labor_table['country'] == country]
    child_labor_product_type_count = len(table[table['child labor'] == 'X'])
    child_labor_rows.append({
        'country': country,
        'child labor product type count': child_labor_product_type_count,
    })
# Build the table once; DataFrame.append was removed in pandas 2.0
child_labor_table = pd.DataFrame(child_labor_rows)
child_labor_table.head()

In [ ]:

t = child_labor_table
labels = [
    'child labor',
    'no child labor']
country_count_with_child_labor = len(t[t[
    'child labor product type count'] > 0])
country_count_without_child_labor = len(t[t[
    'child labor product type count'] == 0])
sizes = [
    country_count_with_child_labor,
    country_count_without_child_labor]
figure = plt.figure(figsize=(20, 10))
ax = figure.add_subplot(111)
ax.pie(
    sizes,
    labels=labels,
    autopct='%1.1f%%',
    startangle=90)
ax.axis('equal');  # Draw pie as a circle

In [ ]:

target_path = target_folder + '/percent-of-countries-with-child-labor.png'
# Save through the figure object; plt.savefig can write a blank image here
figure.savefig(target_path)
print('child_labor_percent_image_path = %s' % target_path)

The chart shows that 98.7% of the countries surveyed have at least one product type made with child labor.

In [ ]:

t = child_labor_table
labels = [
    'more than one product type',
    'one or fewer product types']
country_count_with_child_labor_in_more_than_one_product = len(t[t[
    'child labor product type count'] > 1])
country_count_with_child_labor_in_one_or_fewer_products = len(t[t[
    'child labor product type count'] <= 1])
sizes = [
    country_count_with_child_labor_in_more_than_one_product,
    country_count_with_child_labor_in_one_or_fewer_products]
figure = plt.figure(figsize=(20, 10))
ax = figure.add_subplot(111)
ax.pie(
    sizes,
    labels=labels,
    autopct='%1.1f%%',
    startangle=90)
ax.axis('equal')  # Draw pie as a circle

plt.show()

Of the countries surveyed, 71.1% have more than one product type made with child labor.
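The per-country counts behind these percentages can also be computed without an explicit loop. The following is a minimal sketch, assuming the same `country` and `child labor` column names and the `X` marker used in the cells above; `toy_labor_table` is a made-up stand-in for the real data in `labor.csv`.

```python
import pandas as pd

# Hypothetical stand-in for labor_table; the real data comes from labor.csv
toy_labor_table = pd.DataFrame({
    'country': ['A', 'A', 'B', 'C', 'C', 'C'],
    'product': ['Bricks', 'Cotton', 'Gold', 'Rice', 'Tea', 'Coal'],
    'child labor': ['X', 'X', 'X', None, None, None],
})

# Count rows marked 'X' per country; countries with no marks get 0
is_marked = toy_labor_table['child labor'] == 'X'
counts = is_marked.groupby(toy_labor_table['country']).sum()

# Share of countries with at least one and with more than one product type
percent_with_any = 100 * (counts > 0).mean()
percent_with_many = 100 * (counts > 1).mean()
print(counts.to_dict())
print(round(percent_with_any, 1), round(percent_with_many, 1))
```

Grouping a boolean column keeps countries that have zero marked rows, which the pie charts need for the "no child labor" slice.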

In [ ]:

target_path = target_folder + '/percent-of-countries-with-child-labor-in-more-than-one-product.png'
# Save through the figure object; plt.savefig can write a blank image after plt.show()
figure.savefig(target_path)
print('child_labor_product_percent_image_path = %s' % target_path)

Forced Labor¶

In [ ]:

forced_labor_rows = []
for country in labor_countries:
    table = labor_table[labor_table['country'] == country]
    forced_labor_product_type_count = len(table[table['forced labor'] == 'X'])
    forced_labor_rows.append({
        'country': country,
        'forced labor product type count': forced_labor_product_type_count,
    })
# Build the table once; DataFrame.append was removed in pandas 2.0
forced_labor_table = pd.DataFrame(forced_labor_rows)
forced_labor_table.head()

In [ ]:

t = forced_labor_table
labels = [
    'forced labor',
    'no forced labor']
country_count_with_forced_labor = len(t[t[
    'forced labor product type count'] > 0])
country_count_without_forced_labor = len(t[t[
    'forced labor product type count'] == 0])
sizes = [
    country_count_with_forced_labor,
    country_count_without_forced_labor]
figure = plt.figure(figsize=(20, 10))
ax = figure.add_subplot(111)
ax.pie(
    sizes,
    labels=labels,
    autopct='%1.1f%%',
    startangle=90)
ax.axis('equal')  # Draw pie as a circle

plt.show()

The chart shows that 48.7% of the countries surveyed have at least one product type made with forced labor.

In [ ]:

target_path = target_folder + '/percent-of-countries-with-forced-labor.png'
# Save through the figure object; plt.savefig can write a blank image after plt.show()
figure.savefig(target_path)
print('forced_labor_percent_image_path = %s' % target_path)

In [ ]:

t = forced_labor_table
labels = [
    'more than one product type',
    'one or fewer product types']
country_count_with_forced_labor_in_more_than_one_product = len(t[t[
    'forced labor product type count'] > 1])
country_count_with_forced_labor_in_one_or_fewer_products = len(t[t[
    'forced labor product type count'] <= 1])
sizes = [
    country_count_with_forced_labor_in_more_than_one_product,
    country_count_with_forced_labor_in_one_or_fewer_products]
figure = plt.figure(figsize=(20, 10))
ax = figure.add_subplot(111)
ax.pie(
    sizes,
    labels=labels,
    autopct='%1.1f%%',
    startangle=90)
ax.axis('equal')  # Draw pie as a circle

plt.show()

Of the countries surveyed, 22.4% have more than one product type made with forced labor.

In [ ]:

target_path = target_folder + '/percent-of-countries-with-forced-labor-in-more-than-one-product.png'
# Save through the figure object; plt.savefig can write a blank image after plt.show()
figure.savefig(target_path)
print('forced_labor_product_percent_image_path = %s' % target_path)

Slave Labor by Industry¶

In [ ]:

products_by_industry = {
    'gems': [
        'Diamonds', 'Emeralds', 'Gems', 'Jade', 'Rubies', 'Sapphires',
        'Tanzanite (gems)'],
    'minerals': [
        'Zinc', 'Wolframite (tungsten ore)', 'Trona (mineral)', 'Silver',
        'Iron', 'Heterogenite (cobalt ore)', 'Gypsum (mineral)',
        'Granite (crushed)', 'Granite', 'Gold', 'Coltan (tantalum ore)',
        'Copper', 'Cassiterite (tin ore)', 'Fluorspar (mineral)', 'Tin',
        'Zinc', 'Wolframite (tungsten ore)', 'Gravel (crushed stones)'],
    'food': [
        'Alcoholic Beverages', 'Baked Goods', 'Bananas', 'Beans (green beans)',
        'Beans (green, soy, yellow)', 'Beef', 'Blueberries',
        'Brazil Nuts/Chestnuts', 'Broccoli', 'Cashews', 'Chile Peppers',
        'Citrus Fruits', 'Cloves', 'Coca (stimulant plant)', 'Cocoa',
        'Coconuts', 'Coffee', 'Corn', 'Cucumbers', 'Cumin', 'Dried Fish',
        'Eggplants', 'Fish', 'Garlic', 'Grapes', 'Goats', 'Hazelnuts', 'Hogs',
        'Lobsters', 'Meat', 'Melons', 'Miraa (stimulant plant)', 'Olives',
        'Onions', 'Peanuts', 'Pepper', 'Physic Nuts/Castor Beans', 'Potatoes',
        'Poultry', 'Pulses (legumes)', 'Rice', 'Salt', 'Sesame', 'Shellfish',
        'Shrimp', 'Strawberries', 'Sugar Beets', 'Sugarcane', 'Tea',
        'Tomatoes', 'Vanilla', 'Wheat', 'Yerba Mate (stimulant plant)',
        'Nile Perch (fish)', 'Pineapples', 'Tilapia (fish)', 'Cattle',
        'Oil (Palm)', 'Oil (palm)', 'Sisal', 'Manioc/Cassava'],
    'decorations': [
        'Artificial Flowers', 'Flowers', 'Flowers (poppies)', 'Sunflowers'],
    'construction': [
        'Bricks', 'Bricks (clay)', 'Cement', 'Ceramics', 'Glass', 'Nails',
        'Palm Thatch', 'Sand', 'Stones', 'Stones (limestone)', 'Timber',
        'Rubber', 'Bamboo'],
    'cigarettes': [
        'Bidis (hand-rolled cigarettes)', 'Tobacco'],
    'home durables': [
        'Brassware', 'Ceramics', 'Furniture', 'Furniture (steel)', 'Glass',
        'Glass Bangles', 'Locks', 'Matches', 'Soap', 'Stones (pumice)',
        'Teak'],
    'fabric': [
        'Carpets', 'Cotton', 'Embellished Textiles', 'Footwear',
        'Footwear (sandals)', 'Garments', 'Garments ', 'Leather',
        'Leather Goods/Accessories', 'Silk Cocoons', 'Silk Fabric',
        'Silk Thread', 'Textiles', 'Textiles (hand-woven)', 'Textiles (jute)',
        'Thread/Yarn', 'Cottonseed (hybrid)', 'Fashion Accessories'],
    'celebratory': [
        'Christmas Decorations', 'Fireworks', 'Incense (agarbatti)', 'Matches',
        'Pyrotechnics', 'Soccer Balls', 'Toys'],
    'energy': ['Charcoal', 'Coal'],
    'medical': ['Surgical Instruments'],
    'sexploitation': ['Pornography'],
    'technology': ['Electronics'],
}

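def categorize_industry(x):
    # Strip stray whitespace once, before scanning the industries
    product = x['product'].strip()
    # Default to None so that the industry column always exists
    x['industry'] = None
    for industry, products in products_by_industry.items():
        if product in products:
            # A product listed under several industries keeps the last match
            x['industry'] = industry
    return x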

labor_table = labor_table.apply(categorize_industry, axis=1)
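The scan in `categorize_industry` checks every industry list for every row. One alternative, sketched here under assumptions, is to invert the dictionary once and look each product up directly. `toy_products_by_industry`, `industry_by_product` and `get_industry` are hypothetical names, and the short dictionary is an abbreviated stand-in for `products_by_industry`.

```python
# Abbreviated, hypothetical stand-in for products_by_industry
toy_products_by_industry = {
    'construction': ['Bricks', 'Ceramics', 'Glass'],
    'home durables': ['Ceramics', 'Furniture', 'Glass'],
    'energy': ['Charcoal', 'Coal'],
}

# Invert the mapping once; a product listed twice keeps the later industry,
# which mirrors the last-match-wins behavior of the loop
industry_by_product = {}
for industry, products in toy_products_by_industry.items():
    for product in products:
        industry_by_product[product.strip()] = industry


def get_industry(product):
    # Return None for products missing from every list
    return industry_by_product.get(product.strip())


print(get_industry('Bricks'))
print(get_industry('Glass '))
print(get_industry('Unknown'))
```

With pandas, the same dictionary could be applied to a whole column with `labor_table['product'].str.strip().map(industry_by_product)`, which leaves unmatched products as missing values.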

In [ ]:

# Check if we have missed anything
for index in range(len(labor_table)):
    if pd.isnull(labor_table.iloc[index]['industry']):
        print(labor_table.iloc[index]['product'])

In [ ]:

slave_industry_counts = labor_table['industry'].value_counts()
slave_industry_counts

In [ ]:

labels = slave_industry_counts.keys()
xs = range(len(labels))
ys = slave_industry_counts.values
figure = plt.figure(figsize=(20, 10))
ax = figure.add_subplot(111)
ax.bar(xs, ys, align='center')
plt.xticks(xs, labels);

In [ ]:

target_path = target_folder + '/slave-labor-by-industry.png'
# Save through the figure object; plt.savefig can write a blank image here
figure.savefig(target_path)
print('slave_labor_by_industry_image_path = %s' % target_path)

Testing for a Relationship Between Labor Trafficking and GDP¶

In [ ]:

rows = []
for country in labor_countries:
    slave_product_type_count = len(labor_table[labor_table['country'] == country])
    try:
        gdp = gdp_table[gdp_table['country'] == country][selected_year].iloc[0]
    except IndexError:
        print('could not find GDP for %s' % country)
        continue
    rows.append((country, slave_product_type_count, gdp))
rows[:5]
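The country-by-country lookup can also be written as a merge. This is a sketch rather than a drop-in replacement: it assumes the GDP table has exactly one row per country, and the two small tables are made up for illustration.

```python
import pandas as pd

# Made-up stand-ins for labor_table and gdp_table
toy_labor_table = pd.DataFrame({
    'country': ['A', 'A', 'B', 'C'],
    'product': ['Bricks', 'Cotton', 'Gold', 'Rice'],
})
toy_gdp_table = pd.DataFrame({
    'country': ['A', 'B'],
    '2017': [1000.0, 250.0],
})
selected_year = '2017'

# One row per country with the number of listed product types
counts = toy_labor_table.groupby('country').size().reset_index(
    name='slave product type count')
# A left merge keeps countries without a GDP row so they can be reported
merged = counts.merge(
    toy_gdp_table[['country', selected_year]], on='country', how='left')
merged = merged.rename(columns={selected_year: 'gdp'})
missing_countries = merged[merged['gdp'].isnull()]['country'].tolist()
toy_slave_gdp_table = merged.dropna(subset=['gdp']).reset_index(drop=True)
print(missing_countries)
print(toy_slave_gdp_table)
```

Listing `missing_countries` plays the same role as the "could not find GDP" message in the loop: it shows which countries drop out of the analysis because their names do not match between the two tables.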

In [ ]:

slave_gdp_table = pd.DataFrame(rows, columns=[
    'country', 'slave product type count', 'gdp'])
slave_gdp_table.head()

In [ ]:

target_path = target_folder + '/slave-product-type-count-and-gdp-by-country.csv'
slave_gdp_table.to_csv(target_path, index=False)
print('slave_gdp_table_path = %s' % target_path)

Spearman Rank Correlation¶

In [ ]:

from scipy import stats
t = slave_gdp_table
x = stats.spearmanr(t['gdp'], t['slave product type count'])
x
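Spearman's coefficient is the Pearson correlation of the ranks of the two variables, so it measures whether the relationship is monotonic rather than whether it is linear. The sketch below spells that out in plain Python with made-up numbers, using average ranks for ties; the helper names are hypothetical and it is meant as an illustration, not as a replacement for `stats.spearmanr`.

```python
from math import sqrt


def rank_with_ties(values):
    # Assign 1-based ranks; tied values share the average of their positions
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order) and
               values[order[end + 1]] == values[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def spearman(xs, ys):
    return pearson(rank_with_ties(xs), rank_with_ties(ys))


# Made-up values: GDP falls as the product type count rises
toy_counts = [1, 2, 2, 5, 9]
toy_gdps = [900, 700, 650, 300, 100]
print(spearman(toy_counts, toy_gdps))
```

A coefficient near -1 means GDP tends to fall as the count rises, near +1 means it tends to rise, and near 0 means there is no monotonic trend.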

In [ ]:

target_path = target_folder + '/gdp-vs-slave-product-type-count-spearman-rank-correlation.txt'
with open(target_path, 'wt') as target_file:
    target_file.write(str(x))
print('spearman_rank_correlation_text_path = %s' % target_path)

Ordinary Least Squares Regression¶

In [ ]:

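# sm.OLS does not add an intercept on its own, so add a constant column
exog = sm.add_constant(slave_gdp_table['slave product type count'])
model = sm.OLS(slave_gdp_table['gdp'], exog)
result = model.fit()
x = result.summary()
x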

In [ ]:

target_path = target_folder + '/gdp-vs-slave-product-type-count-ols-regression.txt'
with open(target_path, 'wt') as target_file:
    target_file.write(str(x))
print('ols_regression_text_path = %s' % target_path)

Decision Tree Regression¶

In [ ]:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    slave_gdp_table['slave product type count'], 
    slave_gdp_table['gdp'], 
    test_size=0.3,
    random_state=42)
x_test[:5]

In [ ]:

x_train = x_train.values.reshape(-1, 1)
x_test = x_test.values.reshape(-1, 1)
x_test[:5]

In [ ]:

from math import sqrt
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

model = DecisionTreeRegressor()
model.fit(x_train, y_train)
predictions = model.predict(x_test)
print('decision_tree_root_mean_squared_error = %s' % sqrt(mean_squared_error(
    y_test.values, predictions)))

Gradient Boosting Regression¶

In [ ]:

from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(
    learning_rate=0.1,
    max_depth=None,
    max_leaf_nodes=17,
    min_samples_split=5,
    n_estimators=1000,
    random_state=2,
    subsample=1)
model.fit(x_train, y_train)
predictions = model.predict(x_test)
print('gradient_boosting_root_mean_squared_error = %s' % sqrt(mean_squared_error(
    y_test.values, predictions)))

Support Vector Machine Regression¶

In [ ]:

from sklearn.svm import SVR

model = SVR()
model.fit(x_train, y_train)
predictions = model.predict(x_test)
print('support_vector_machine_root_mean_squared_error = %s' % sqrt(mean_squared_error(
    y_test.values, predictions)))
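A root mean squared error is easier to judge next to a baseline. One common choice, sketched here with made-up numbers, is a model that always predicts the mean of the training targets; a regressor whose error is not lower than that has learned little from the product type count. The names `toy_y_train` and `toy_y_test` are hypothetical stand-ins for `y_train` and `y_test`.

```python
from math import sqrt


def root_mean_squared_error(actual_values, predicted_values):
    squared_errors = [
        (actual - predicted) ** 2
        for actual, predicted in zip(actual_values, predicted_values)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Made-up GDP values standing in for y_train and y_test
toy_y_train = [100, 200, 300, 400]
toy_y_test = [150, 350]

# Baseline: always predict the training mean
baseline_prediction = sum(toy_y_train) / len(toy_y_train)
baseline_predictions = [baseline_prediction] * len(toy_y_test)
baseline_error = root_mean_squared_error(toy_y_test, baseline_predictions)
print('baseline_root_mean_squared_error = %s' % baseline_error)
```

In scikit-learn, `sklearn.dummy.DummyRegressor(strategy='mean')` provides the same baseline with the usual `fit` and `predict` interface.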

Testing for a Relationship Between Labor Trafficking and GDP¶

These results are highly dependent on the datasets and thus should be interpreted with care.
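One way to see this dependence is to refit a model with each country left out in turn and watch how much the estimate moves. The sketch below does this for a simple least-squares slope in plain Python; the data are made up, with one deliberately extreme GDP value, and `fit_slope` is a hypothetical helper rather than part of the notebook.

```python
def fit_slope(xs, ys):
    # Least-squares slope of y on x with an intercept
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Made-up data; the last point is a deliberately extreme GDP
toy_counts = [1, 2, 3, 4, 5]
toy_gdps = [50, 40, 30, 20, 500]

full_slope = fit_slope(toy_counts, toy_gdps)
leave_one_out_slopes = []
for index in range(len(toy_counts)):
    xs = toy_counts[:index] + toy_counts[index + 1:]
    ys = toy_gdps[:index] + toy_gdps[index + 1:]
    leave_one_out_slopes.append(fit_slope(xs, ys))
print(full_slope)
print(min(leave_one_out_slopes), max(leave_one_out_slopes))
```

In this toy example the slope is positive with all five points and negative once the extreme point is removed, which is the kind of swing a single very large economy could cause in the real tables.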

Charts¶

{ child_labor_percent_image : Percent of Countries with Child Labor ? Child Slavery }

{ child_labor_product_percent_image : Percent of Countries with Child Labor in More Than One Product }

{ forced_labor_percent_image : Percent of Countries with Forced Labor ? Adult Slavery }

{ forced_labor_product_percent_image : Percent of Countries with Forced Labor in More Than One Product }

{ slave_labor_by_industry_image : Relative Frequency of Slave Labor by Industry ? Industries with More Product Types in More Countries Will Appear Higher }

Models¶

If people who are free to work produce better quality products than people who are forced to work, then one hypothesis is that countries with more product types made by slave labor will have lower GDP. If that relationship is strong enough, it might even be possible to estimate a country's GDP solely from the number of product types made by slave labor.

{ slave_gdp_table : Slave Labor and GDP }

{ spearman_rank_correlation_text : Spearman Rank Correlation Test }

{ ols_regression_text : OLS Regression Test }

{ decision_tree_root_mean_squared_error : Decision Tree Root Mean Squared Error ? Lower is Better }

{ gradient_boosting_root_mean_squared_error : Gradient Boosting Root Mean Squared Error ? Lower is Better }

{ support_vector_machine_root_mean_squared_error : Support Vector Machine Root Mean Squared Error ? Lower is Better }

Notebook Creator: Roy Hyunjin Han

Test Relationship between Labor Trafficking and GDP 20171214
