Test Relationship between Labor Trafficking and GDP 20171214




Notebook Creator: Roy Hyunjin Han

Test Relationship Between Labor Trafficking and GDP

This exploratory tool and notebook are based on a presentation by Eric Schles at our workshop on Computational Approaches to Fight Human Trafficking.

{ labor_table : Labor Trafficking by Product and Country ? Child Labor is child slavery and Forced Labor is adult slavery }

{ gdp_table : Gross Domestic Product by Country ? USD in millions }

{ selected_year : Selected Year of Analysis }

In [ ]:
# CrossCompute
labor_table_path = 'labor.csv'
gdp_table_path = 'gdp.csv'
selected_year = '2017'
target_folder = '/tmp'
In [ ]:
# Install python packages (this takes a few seconds)
import subprocess
import sys
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'statsmodels'])
In [ ]:
import numpy as np
import pandas as pd
from statsmodels import api as sm
In [ ]:
from IPython.display import Image
from matplotlib import pyplot as plt
In [ ]:
labor_table = pd.read_csv(labor_table_path)
labor_table.head()
In [ ]:
labor_countries = labor_table['country'].unique()
len(labor_countries)
In [ ]:
gdp_table = pd.read_csv(gdp_table_path)
gdp_table.head()

Understanding Our Data

Child Labor refers to child slavery.

Forced Labor refers to adult slavery.

Child Labor

In [ ]:
# DataFrame.append was removed in pandas 2.0, so collect the rows in a list
child_labor_rows = []
for country in labor_countries:
    table = labor_table[labor_table['country'] == country]
    child_labor_product_type_count = len(table[table['child labor'] == 'X'])
    child_labor_rows.append({
        'country': country,
        'child labor product type count': child_labor_product_type_count,
    })
child_labor_table = pd.DataFrame(child_labor_rows)
child_labor_table.head()
In [ ]:
t = child_labor_table
labels = [
    'child labor',
    'no child labor']
country_count_with_child_labor = len(t[t[
    'child labor product type count'] > 0])
country_count_without_child_labor = len(t[t[
    'child labor product type count'] == 0])
sizes = [
    country_count_with_child_labor,
    country_count_without_child_labor]
figure = plt.figure(figsize=(20, 10))
ax = figure.add_subplot(111)
ax.pie(
    sizes,
    labels=labels,
    autopct='%1.1f%%',
    startangle=90)
ax.axis('equal');  # Draw pie as a circle
In [ ]:
target_path = target_folder + '/percent-of-countries-with-child-labor.png'
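# Save through the figure object; in a notebook, plt.savefig in a later cell
# may write a new, empty figure instead of the chart drawn above
figure.savefig(target_path)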
print('child_labor_percent_image_path = %s' % target_path)

As the chart shows, 98.7% of the countries in the labor table have at least one product type flagged for child labor.
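
The per-country counts behind this chart can also be computed without an explicit loop. The following is a minimal sketch on a small made-up table (the rows are illustrative and not taken from labor.csv); it assumes the same `country` and `child labor` columns, with `'X'` marking a flagged product.

```python
import pandas as pd

# Hypothetical sample rows; the real table comes from labor.csv
sample_table = pd.DataFrame({
    'country': ['A', 'A', 'B', 'C', 'C', 'C'],
    'product': ['Bricks', 'Cotton', 'Gold', 'Rice', 'Tea', 'Coal'],
    'child labor': ['X', 'X', None, 'X', None, 'X'],
})

# Count flagged product types per country (countries with none get 0)
flagged = sample_table['child labor'] == 'X'
counts = flagged.groupby(sample_table['country']).sum()

# Share of countries with at least one flagged product type
percent_with_child_labor = 100 * (counts > 0).mean()
print(counts.to_dict())
print(round(percent_with_child_labor, 1))
```

Grouping the boolean column keeps countries with zero flagged products in the result, which matters for the "no child labor" slice of the pie.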

In [ ]:
t = child_labor_table
labels = [
    'more than one product type',
    'one or fewer product types']
country_count_with_child_labor_in_more_than_one_product = len(t[t[
    'child labor product type count'] > 1])
country_count_with_child_labor_in_one_or_fewer_products = len(t[t[
    'child labor product type count'] <= 1])
sizes = [
    country_count_with_child_labor_in_more_than_one_product,
    country_count_with_child_labor_in_one_or_fewer_products]
figure = plt.figure(figsize=(20, 10))
ax = figure.add_subplot(111)
ax.pie(
    sizes,
    labels=labels,
    autopct='%1.1f%%',
    startangle=90)
ax.axis('equal')  # Draw pie as a circle

plt.show()

And 71.1% have more than one product type where child labor is used.

In [ ]:
target_path = target_folder + '/percent-of-countries-with-child-labor-in-more-than-one-product.png'
figure.savefig(target_path)
print('child_labor_product_percent_image_path = %s' % target_path)

Forced Labor

In [ ]:
# DataFrame.append was removed in pandas 2.0, so collect the rows in a list
forced_labor_rows = []
for country in labor_countries:
    table = labor_table[labor_table['country'] == country]
    forced_labor_product_type_count = len(table[table['forced labor'] == 'X'])
    forced_labor_rows.append({
        'country': country,
        'forced labor product type count': forced_labor_product_type_count,
    })
forced_labor_table = pd.DataFrame(forced_labor_rows)
forced_labor_table.head()
In [ ]:
t = forced_labor_table
labels = [
    'forced labor',
    'no forced labor']
country_count_with_forced_labor = len(t[t[
    'forced labor product type count'] > 0])
country_count_without_forced_labor = len(t[t[
    'forced labor product type count'] == 0])
sizes = [
    country_count_with_forced_labor,
    country_count_without_forced_labor]
figure = plt.figure(figsize=(20, 10))
ax = figure.add_subplot(111)
ax.pie(
    sizes,
    labels=labels,
    autopct='%1.1f%%',
    startangle=90)
ax.axis('equal')  # Draw pie as a circle

plt.show()

As the chart shows, 48.7% of the countries in the labor table have at least one product type flagged for forced labor.

In [ ]:
target_path = target_folder + '/percent-of-countries-with-forced-labor.png'
figure.savefig(target_path)
print('forced_labor_percent_image_path = %s' % target_path)
In [ ]:
t = forced_labor_table
labels = [
    'more than one product type',
    'one or fewer product types']
country_count_with_forced_labor_in_more_than_one_product = len(t[t[
    'forced labor product type count'] > 1])
country_count_with_forced_labor_in_one_or_fewer_products = len(t[t[
    'forced labor product type count'] <= 1])
sizes = [
    country_count_with_forced_labor_in_more_than_one_product,
    country_count_with_forced_labor_in_one_or_fewer_products]
figure = plt.figure(figsize=(20, 10))
ax = figure.add_subplot(111)
ax.pie(
    sizes,
    labels=labels,
    autopct='%1.1f%%',
    startangle=90)
ax.axis('equal')  # Draw pie as a circle

plt.show()

And 22.4% have more than one product type where forced labor is used.

In [ ]:
target_path = target_folder + '/percent-of-countries-with-forced-labor-in-more-than-one-product.png'
figure.savefig(target_path)
print('forced_labor_product_percent_image_path = %s' % target_path)

Slave Labor by Industry

In [ ]:
products_by_industry = {
    'gems': [
        'Diamonds', 'Emeralds', 'Gems', 'Jade', 'Rubies', 'Sapphires',
        'Tanzanite (gems)'],
    'minerals': [
        'Trona (mineral)', 'Silver',
        'Iron', 'Heterogenite (cobalt ore)', 'Gypsum (mineral)',
        'Granite (crushed)', 'Granite', 'Gold', 'Coltan (tantalum ore)',
        'Copper', 'Cassiterite (tin ore)', 'Fluorspar (mineral)', 'Tin',
        'Zinc', 'Wolframite (tungsten ore)', 'Gravel (crushed stones)'],
    'food': [
        'Alcoholic Beverages', 'Baked Goods', 'Bananas', 'Beans (green beans)',
        'Beans (green, soy, yellow)', 'Beef', 'Blueberries',
        'Brazil Nuts/Chestnuts', 'Broccoli', 'Cashews', 'Chile Peppers',
        'Citrus Fruits', 'Cloves', 'Coca (stimulant plant)', 'Cocoa',
        'Coconuts', 'Coffee', 'Corn', 'Cucumbers', 'Cumin', 'Dried Fish',
        'Eggplants', 'Fish', 'Garlic', 'Grapes', 'Goats', 'Hazelnuts', 'Hogs',
        'Lobsters', 'Meat', 'Melons', 'Miraa (stimulant plant)', 'Olives',
        'Onions', 'Peanuts', 'Pepper', 'Physic Nuts/Castor Beans', 'Potatoes',
        'Poultry', 'Pulses (legumes)', 'Rice', 'Salt', 'Sesame', 'Shellfish',
        'Shrimp', 'Strawberries', 'Sugar Beets', 'Sugarcane', 'Tea',
        'Tomatoes', 'Vanilla', 'Wheat', 'Yerba Mate (stimulant plant)',
        'Nile Perch (fish)', 'Pineapples', 'Tilapia (fish)', 'Cattle',
        'Oil (Palm)', 'Oil (palm)', 'Sisal', 'Manioc/Cassava'],
    'decorations': [
        'Artificial Flowers', 'Flowers', 'Flowers (poppies)', 'Sunflowers'],
    'construction': [
        'Bricks', 'Bricks (clay)', 'Cement', 'Ceramics', 'Glass', 'Nails',
        'Palm Thatch', 'Sand', 'Stones', 'Stones (limestone)', 'Timber',
        'Rubber', 'Bamboo'],
    'cigarettes': [
        'Bidis (hand-rolled cigarettes)', 'Tobacco'],
    'home durables': [
        'Brassware', 'Ceramics', 'Furniture', 'Furniture (steel)', 'Glass',
        'Glass Bangles', 'Locks', 'Matches', 'Soap', 'Stones (pumice)',
        'Teak'],
    'fabric': [
        'Carpets', 'Cotton', 'Embellished Textiles', 'Footwear',
        'Footwear (sandals)', 'Garments', 'Garments ', 'Leather',
        'Leather Goods/Accessories', 'Silk Cocoons', 'Silk Fabric',
        'Silk Thread', 'Textiles', 'Textiles (hand-woven)', 'Textiles (jute)',
        'Thread/Yarn', 'Cottonseed (hybrid)', 'Fashion Accessories'],
    'celebratory': [
        'Christmas Decorations', 'Fireworks', 'Incense (agarbatti)', 'Matches',
        'Pyrotechnics', 'Soccer Balls', 'Toys'],
    'energy': ['Charcoal', 'Coal'],
    'medical': ['Surgical Instruments'],
    'sexploitation': ['Pornography'],
    'technology': ['Electronics'],
}

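def categorize_industry(x):
    # Default to None so that unmatched products show up in the check below
    x['industry'] = None
    product = x['product'].strip()
    # A product listed under several industries keeps the last match
    for industry, products in products_by_industry.items():
        if product in products:
            x['industry'] = industry
    return x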

labor_table = labor_table.apply(categorize_industry, axis=1)
In [ ]:
# Check if we have missed anything
for index in range(len(labor_table)):
    if pd.isnull(labor_table.iloc[index]['industry']):
        print(labor_table.iloc[index]['product'])
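
Because `categorize_industry` does not stop at the first match, a product listed under several industries is assigned to whichever industry comes last in the dictionary. One way to spot such overlaps is to invert the mapping. The sketch below uses a shortened, illustrative copy of the dictionary, and `find_shared_products` is a hypothetical helper name.

```python
from collections import defaultdict

# Shortened, illustrative copy of products_by_industry
sample_products_by_industry = {
    'construction': ['Bricks', 'Ceramics', 'Glass'],
    'home durables': ['Ceramics', 'Furniture', 'Glass', 'Matches'],
    'celebratory': ['Fireworks', 'Matches'],
}

def find_shared_products(products_by_industry):
    """Return {product: [industries]} for products in several industries."""
    industries_by_product = defaultdict(list)
    for industry, products in products_by_industry.items():
        # Strip and deduplicate so 'Garments ' and 'Garments' count once
        for product in set(p.strip() for p in products):
            industries_by_product[product].append(industry)
    return {
        product: industries
        for product, industries in industries_by_product.items()
        if len(industries) > 1}

shared_products = find_shared_products(sample_products_by_industry)
for product in sorted(shared_products):
    print(product, shared_products[product])
```

Running the same helper on the full dictionary would list every product whose industry depends on dictionary order.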
In [ ]:
slave_industry_counts = labor_table['industry'].value_counts()
slave_industry_counts
In [ ]:
labels = slave_industry_counts.keys()
xs = range(len(labels))
ys = slave_industry_counts.values
figure = plt.figure(figsize=(20, 10))
ax = figure.add_subplot(111)
ax.bar(xs, ys, align='center')
plt.xticks(xs, labels);
In [ ]:
target_path = target_folder + '/slave-labor-by-industry.png'
figure.savefig(target_path)
print('slave_labor_by_industry_image_path = %s' % target_path)

Testing for a Relationship Between Labor Trafficking and GDP

In [ ]:
rows = []
for country in labor_countries:
    slave_product_type_count = len(labor_table[labor_table['country'] == country])
    try:
        gdp = gdp_table[gdp_table['country'] == country][selected_year].iloc[0]
    except IndexError:
        print('could not find GDP for %s' % country)
        continue
    rows.append((country, slave_product_type_count, gdp))
rows[:5]
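
The loop above looks up each country by exact name, so a country whose name is spelled differently in the two tables is skipped with a message. An alternative way to see which countries fail to match is a left join with `indicator=True`. The sketch below uses two small made-up tables and assumes the same `country` column and a year column named `'2017'`.

```python
import pandas as pd

# Hypothetical sample tables; the real ones come from labor.csv and gdp.csv
sample_counts = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'slave product type count': [3, 1, 2],
})
sample_gdp = pd.DataFrame({
    'country': ['A', 'C'],
    '2017': [500.0, 120.0],
})

# Left join keeps every labor country; _merge records whether GDP was found
merged = sample_counts.merge(
    sample_gdp[['country', '2017']], on='country', how='left', indicator=True)
missing_countries = merged[merged['_merge'] == 'left_only']['country'].tolist()
matched = merged[merged['_merge'] == 'both'].drop(columns='_merge')
print(missing_countries)
print(matched)
```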
In [ ]:
slave_gdp_table = pd.DataFrame(rows, columns=[
    'country', 'slave product type count', 'gdp'])
slave_gdp_table.head()
In [ ]:
target_path = target_folder + '/slave-product-type-count-and-gdp-by-country.csv'
slave_gdp_table.to_csv(target_path, index=False)
print('slave_gdp_table_path = %s' % target_path)

Spearman Rank Correlation

In [ ]:
from scipy import stats
t = slave_gdp_table
x = stats.spearmanr(t['gdp'], t['slave product type count'])
x
In [ ]:
target_path = target_folder + '/gdp-vs-slave-product-type-count-spearman-rank-correlation.txt'
with open(target_path, 'wt') as target_file:
    target_file.write(str(x))
print('spearman_rank_correlation_text_path = %s' % target_path)
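
Spearman's rank correlation is the Pearson correlation computed on ranks rather than on raw values, so it measures whether two columns move together monotonically without assuming a linear relationship. The following is a minimal pure-Python sketch with made-up numbers. Tied values receive their average rank, and the helper names are hypothetical (they are not part of scipy).

```python
from math import sqrt

def rank_values(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank_values(xs), rank_values(ys))

# Made-up values: GDP falls as the product type count rises
sample_gdps = [900, 500, 450, 100, 20]
sample_counts = [1, 2, 2, 5, 9]
rho = spearman(sample_gdps, sample_counts)
print(round(rho, 3))
```

A coefficient near -1 would mean that countries with more product types tend to rank lower on GDP; a value near 0 would mean no monotonic relationship.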

Ordinary Least Squares Regression

In [ ]:
# sm.OLS does not add an intercept by itself, so add a constant column
model = sm.OLS(
    slave_gdp_table['gdp'],
    sm.add_constant(slave_gdp_table['slave product type count']))
result = model.fit()
x = result.summary()
x
In [ ]:
target_path = target_folder + '/gdp-vs-slave-product-type-count-ols-regression.txt'
open(target_path, 'wt').write(str(x))
print('ols_regression_text_path = %s' % target_path)
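
Note that `sm.OLS` does not add an intercept term by itself; a constant column has to be supplied, for example with `sm.add_constant`. For a single predictor the least-squares fit has a closed form, sketched below in plain Python on made-up numbers to show how much the slope can change when the line is forced through the origin. The function names are hypothetical.

```python
def fit_with_intercept(xs, ys):
    """Least-squares line y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)
    ) / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return intercept, slope

def fit_through_origin(xs, ys):
    """Least-squares line y = slope * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Made-up data that follows y = 100 - 10 * x exactly
sample_counts = [1, 2, 3, 4, 5]
sample_gdps = [90, 80, 70, 60, 50]

intercept, slope = fit_with_intercept(sample_counts, sample_gdps)
slope_without_intercept = fit_through_origin(sample_counts, sample_gdps)
print(intercept, slope)
print(slope_without_intercept)
```

With the intercept the slope is negative, while the fit forced through the origin reports a positive slope for the same data, so the sign of a no-intercept coefficient should not be read as the direction of the relationship.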

Decision Tree Regression

In [ ]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    slave_gdp_table['slave product type count'], 
    slave_gdp_table['gdp'], 
    test_size=0.3,
    random_state=42)
x_test[:5]
In [ ]:
x_train = x_train.values.reshape(-1, 1)
x_test = x_test.values.reshape(-1, 1)
x_test[:5]
In [ ]:
from math import sqrt
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

model = DecisionTreeRegressor()
model.fit(x_train, y_train)
predictions = model.predict(x_test)
print('decision_tree_root_mean_squared_error = %s' % sqrt(mean_squared_error(
    y_test.values, predictions)))

Gradient Boosting Regression

In [ ]:
from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(
    learning_rate=0.1,
    max_depth=None,
    max_leaf_nodes=17,
    min_samples_split=5,
    n_estimators=1000,
    random_state=2,
    subsample=1)
model.fit(x_train, y_train)
predictions = model.predict(x_test)
print('gradient_boosting_root_mean_squared_error = %s' % sqrt(mean_squared_error(
    y_test.values, predictions)))

Support Vector Machine Regression

In [ ]:
from sklearn.svm import SVR

model = SVR()
model.fit(x_train, y_train)
predictions = model.predict(x_test)
print('support_vector_machine_root_mean_squared_error = %s' % sqrt(mean_squared_error(
    y_test.values, predictions)))
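
A root mean squared error is expressed in the units of the target (here, USD in millions), so it is easier to judge against a reference. One common reference, which this notebook does not compute, is the error of always predicting the mean of the training targets. A minimal sketch with made-up numbers:

```python
from math import sqrt

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))

# Made-up training and test targets (illustrative only)
sample_y_train = [100, 200, 300]
sample_y_test = [150, 250]

# Baseline: predict the training mean for every test row
baseline_value = sum(sample_y_train) / len(sample_y_train)
baseline_predictions = [baseline_value] * len(sample_y_test)
baseline_rmse = root_mean_squared_error(sample_y_test, baseline_predictions)

# Hypothetical model predictions to compare against the baseline
model_predictions = [140, 270]
model_rmse = root_mean_squared_error(sample_y_test, model_predictions)
print(baseline_rmse, model_rmse)
```

A model whose error is not clearly below this baseline has learned little from the product type count.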

Testing for a Relationship Between Labor Trafficking and GDP

These results are highly dependent on the datasets and thus should be interpreted with care.

Charts

{ child_labor_percent_image : Percent of Countries with Child Labor ? Child Slavery }

{ child_labor_product_percent_image : Percent of Countries with Child Labor in More Than One Product }

{ forced_labor_percent_image : Percent of Countries with Forced Labor ? Adult Slavery }

{ forced_labor_product_percent_image : Percent of Countries with Forced Labor in More Than One Product }

{ slave_labor_by_industry_image : Relative Frequency of Slave Labor by Industry ? Industries with More Product Types in More Countries Will Appear Higher }

Models

If people who are free to work produce better quality products than people who are forced to work, then one might hypothesize that countries with more product types made by slave labor will have lower GDP. If that relationship is strong enough, it might even be possible to estimate a country's GDP solely from the number of product types made by slave labor.

{ slave_gdp_table : Slave Labor and GDP }

{ spearman_rank_correlation_text : Spearman Rank Correlation Test }

{ ols_regression_text : OLS Regression Test }

{ decision_tree_root_mean_squared_error : Decision Tree Root Mean Squared Error ? Lower is Better }

{ gradient_boosting_root_mean_squared_error : Gradient Boosting Root Mean Squared Error ? Lower is Better }

{ support_vector_machine_root_mean_squared_error : Support Vector Machine Root Mean Squared Error ? Lower is Better }