This exploratory tool and notebook are based on a presentation by Eric Schles at our workshop on Computational Approaches to Fight Human Trafficking.
{ labor_table : Labor Trafficking by Product and Country ? Child Labor is child slavery and Forced Labor is adult slavery }
{ gdp_table : Gross Domestic Product by Country ? USD in millions }
{ selected_year : Selected Year of Analysis }
# CrossCompute
labor_table_path = 'labor.csv'
gdp_table_path = 'gdp.csv'
selected_year = '2017'
target_folder = '/tmp'
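# Install python packages (this takes a few seconds)
import subprocess
import sys
# pip.main was removed in pip 10, so run pip through the current interpreter
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'statsmodels'])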
import numpy as np
import pandas as pd
from statsmodels import api as sm
from IPython.display import Image
from matplotlib import pyplot as plt
labor_table = pd.read_csv(labor_table_path)
labor_table.head()
labor_countries = labor_table['country'].unique()
len(labor_countries)
gdp_table = pd.read_csv(gdp_table_path)
gdp_table.head()
# Count the product types made with child labor in each country
child_labor_rows = []
for country in labor_countries:
    table = labor_table[labor_table['country'] == country]
    child_labor_product_type_count = len(table[table['child labor'] == 'X'])
    child_labor_rows.append({
        'country': country,
        'child labor product type count': child_labor_product_type_count,
    })
# DataFrame.append was removed in pandas 2.0, so build the table from a list
child_labor_table = pd.DataFrame(child_labor_rows)
child_labor_table.head()
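The per-country loop above can also be written as a single grouped count. The following is a minimal sketch on a small hypothetical table: only the column names `country` and `child labor` and the `X` marker come from the notebook, while the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical rows that mimic the layout of labor.csv
labor_table = pd.DataFrame({
    'country': ['A', 'A', 'B', 'C'],
    'product': ['Bricks', 'Cotton', 'Gold', 'Rice'],
    'child labor': ['X', 'X', None, 'X'],
})

# Mark rows flagged with 'X', then sum the flags within each country
child_labor_table = (
    labor_table['child labor'].eq('X')
    .groupby(labor_table['country'])
    .sum()
    .rename('child labor product type count')
    .reset_index())

print(child_labor_table)
```

Grouping keeps countries with a count of zero, which the later pie charts rely on.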
t = child_labor_table
labels = [
    'child labor',
    'no child labor']
country_count_with_child_labor = len(t[t[
    'child labor product type count'] > 0])
country_count_without_child_labor = len(t[t[
    'child labor product type count'] == 0])
sizes = [
    country_count_with_child_labor,
    country_count_without_child_labor]
figure = plt.figure(figsize=(20, 10))
ax = figure.add_subplot(111)
ax.pie(
    sizes,
    labels=labels,
    autopct='%1.1f%%',
    startangle=90)
ax.axis('equal');  # Draw pie as a circle
target_path = target_folder + '/percent-of-countries-with-child-labor.png'
plt.savefig(target_path)
print('child_labor_percent_image_path = %s' % target_path)
As the chart shows, 98.7% of the countries in the labor table have at least one product type made with child labor.
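The percentage shown on the chart is the share of countries whose product type count exceeds a threshold. As a rough sketch of that arithmetic, the helper below (`percent_of_countries` is a hypothetical name, and the counts are made up rather than taken from the dataset) computes the share directly.

```python
def percent_of_countries(counts, threshold=0):
    """Return the percent of counts that are strictly greater than threshold."""
    if not counts:
        raise ValueError('counts must not be empty')
    matching_count = sum(1 for count in counts if count > threshold)
    return 100 * matching_count / len(counts)


# Hypothetical product type counts for five countries
product_type_counts = [3, 0, 1, 5, 2]

# Share of countries with at least one product type
print('%.1f%%' % percent_of_countries(product_type_counts))
# Share of countries with more than one product type
print('%.1f%%' % percent_of_countries(product_type_counts, threshold=1))
```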
t = child_labor_table
labels = [
    'more than one product type',
    'one or fewer product types']
country_count_with_child_labor_in_more_than_one_product = len(t[t[
    'child labor product type count'] > 1])
country_count_with_child_labor_in_one_or_fewer_products = len(t[t[
    'child labor product type count'] <= 1])
sizes = [
    country_count_with_child_labor_in_more_than_one_product,
    country_count_with_child_labor_in_one_or_fewer_products]
figure = plt.figure(figsize=(20, 10))
ax = figure.add_subplot(111)
ax.pie(
    sizes,
    labels=labels,
    autopct='%1.1f%%',
    startangle=90)
ax.axis('equal')  # Draw pie as a circle
plt.show()
And 71.1% of the countries in the table use child labor for more than one product type.
target_path = target_folder + '/percent-of-countries-with-child-labor-in-more-than-one-product.png'
# Save through the figure object; plt.savefig after plt.show() can write a blank image
figure.savefig(target_path)
print('child_labor_product_percent_image_path = %s' % target_path)
# Count the product types made with forced labor in each country
forced_labor_rows = []
for country in labor_countries:
    table = labor_table[labor_table['country'] == country]
    forced_labor_product_type_count = len(table[table['forced labor'] == 'X'])
    forced_labor_rows.append({
        'country': country,
        'forced labor product type count': forced_labor_product_type_count,
    })
# DataFrame.append was removed in pandas 2.0, so build the table from a list
forced_labor_table = pd.DataFrame(forced_labor_rows)
forced_labor_table.head()
t = forced_labor_table
labels = [
    'forced labor',
    'no forced labor']
country_count_with_forced_labor = len(t[t[
    'forced labor product type count'] > 0])
country_count_without_forced_labor = len(t[t[
    'forced labor product type count'] == 0])
sizes = [
    country_count_with_forced_labor,
    country_count_without_forced_labor]
figure = plt.figure(figsize=(20, 10))
ax = figure.add_subplot(111)
ax.pie(
    sizes,
    labels=labels,
    autopct='%1.1f%%',
    startangle=90)
ax.axis('equal')  # Draw pie as a circle
plt.show()
As the chart shows, 48.7% of the countries in the labor table have at least one product type made with forced labor.
target_path = target_folder + '/percent-of-countries-with-forced-labor.png'
# Save through the figure object; plt.savefig after plt.show() can write a blank image
figure.savefig(target_path)
print('forced_labor_percent_image_path = %s' % target_path)
t = forced_labor_table
labels = [
    'more than one product type',
    'one or fewer product types']
country_count_with_forced_labor_in_more_than_one_product = len(t[t[
    'forced labor product type count'] > 1])
country_count_with_forced_labor_in_one_or_fewer_products = len(t[t[
    'forced labor product type count'] <= 1])
sizes = [
    country_count_with_forced_labor_in_more_than_one_product,
    country_count_with_forced_labor_in_one_or_fewer_products]
figure = plt.figure(figsize=(20, 10))
ax = figure.add_subplot(111)
ax.pie(
    sizes,
    labels=labels,
    autopct='%1.1f%%',
    startangle=90)
ax.axis('equal')  # Draw pie as a circle
plt.show()
And 22.4% of the countries in the table use forced labor for more than one product type.
target_path = target_folder + '/percent-of-countries-with-forced-labor-in-more-than-one-product.png'
# Save through the figure object; plt.savefig after plt.show() can write a blank image
figure.savefig(target_path)
print('forced_labor_product_percent_image_path = %s' % target_path)
products_by_industry = {
    'gems': [
        'Diamonds', 'Emeralds', 'Gems', 'Jade', 'Rubies', 'Sapphires',
        'Tanzanite (gems)'],
    'minerals': [
        'Trona (mineral)', 'Silver',
        'Iron', 'Heterogenite (cobalt ore)', 'Gypsum (mineral)',
        'Granite (crushed)', 'Granite', 'Gold', 'Coltan (tantalum ore)',
        'Copper', 'Cassiterite (tin ore)', 'Fluorspar (mineral)', 'Tin',
        'Zinc', 'Wolframite (tungsten ore)', 'Gravel (crushed stones)'],
    'food': [
        'Alcoholic Beverages', 'Baked Goods', 'Bananas', 'Beans (green beans)',
        'Beans (green, soy, yellow)', 'Beef', 'Blueberries',
        'Brazil Nuts/Chestnuts', 'Broccoli', 'Cashews', 'Chile Peppers',
        'Citrus Fruits', 'Cloves', 'Coca (stimulant plant)', 'Cocoa',
        'Coconuts', 'Coffee', 'Corn', 'Cucumbers', 'Cumin', 'Dried Fish',
        'Eggplants', 'Fish', 'Garlic', 'Grapes', 'Goats', 'Hazelnuts', 'Hogs',
        'Lobsters', 'Meat', 'Melons', 'Miraa (stimulant plant)', 'Olives',
        'Onions', 'Peanuts', 'Pepper', 'Physic Nuts/Castor Beans', 'Potatoes',
        'Poultry', 'Pulses (legumes)', 'Rice', 'Salt', 'Sesame', 'Shellfish',
        'Shrimp', 'Strawberries', 'Sugar Beets', 'Sugarcane', 'Tea',
        'Tomatoes', 'Vanilla', 'Wheat', 'Yerba Mate (stimulant plant)',
        'Nile Perch (fish)', 'Pineapples', 'Tilapia (fish)', 'Cattle',
        'Oil (Palm)', 'Oil (palm)', 'Sisal', 'Manioc/Cassava'],
    'decorations': [
        'Artificial Flowers', 'Flowers', 'Flowers (poppies)', 'Sunflowers'],
    'construction': [
        'Bricks', 'Bricks (clay)', 'Cement', 'Ceramics', 'Glass', 'Nails',
        'Palm Thatch', 'Sand', 'Stones', 'Stones (limestone)', 'Timber',
        'Rubber', 'Bamboo'],
    'cigarettes': [
        'Bidis (hand-rolled cigarettes)', 'Tobacco'],
    'home durables': [
        'Brassware', 'Ceramics', 'Furniture', 'Furniture (steel)', 'Glass',
        'Glass Bangles', 'Locks', 'Matches', 'Soap', 'Stones (pumice)',
        'Teak'],
    'fabric': [
        'Carpets', 'Cotton', 'Embellished Textiles', 'Footwear',
        'Footwear (sandals)', 'Garments', 'Garments ', 'Leather',
        'Leather Goods/Accessories', 'Silk Cocoons', 'Silk Fabric',
        'Silk Thread', 'Textiles', 'Textiles (hand-woven)', 'Textiles (jute)',
        'Thread/Yarn', 'Cottonseed (hybrid)', 'Fashion Accessories'],
    'celebratory': [
        'Christmas Decorations', 'Fireworks', 'Incense (agarbatti)', 'Matches',
        'Pyrotechnics', 'Soccer Balls', 'Toys'],
    'energy': ['Charcoal', 'Coal'],
    'medical': ['Surgical Instruments'],
    'sexploitation': ['Pornography'],
    'technology': ['Electronics'],
}
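def categorize_industry(x):
    # Strip stray whitespace so that entries such as 'Garments ' still match
    product = x['product'].strip()
    # Default to None so that unmatched products are easy to find afterwards
    x['industry'] = None
    for industry, products in products_by_industry.items():
        if product in products:
            # No break here, so the last matching industry wins
            x['industry'] = industry
    return x
labor_table = labor_table.apply(categorize_industry, axis=1)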
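Some products, such as Ceramics, Glass and Matches, are listed under more than one industry, and the categorization loop keeps whichever industry matches last. The sketch below inverts a small excerpt of `products_by_industry` to surface such overlaps; the names `industries_by_product`, `ambiguous_products` and `industry_by_product` are hypothetical and do not appear in the notebook.

```python
from collections import defaultdict

# Excerpt of products_by_industry, limited to a few entries from the notebook
products_by_industry = {
    'construction': ['Bricks', 'Ceramics', 'Glass'],
    'home durables': ['Brassware', 'Ceramics', 'Glass', 'Matches'],
    'celebratory': ['Fireworks', 'Matches'],
}

# Collect every industry that lists a given product
industries_by_product = defaultdict(list)
for industry, products in products_by_industry.items():
    for product in products:
        industries_by_product[product.strip()].append(industry)

# Products listed under more than one industry
ambiguous_products = {
    product: industries
    for product, industries in industries_by_product.items()
    if len(industries) > 1}

# Keep the last industry, mirroring a loop that overwrites earlier matches
industry_by_product = {
    product: industries[-1]
    for product, industries in industries_by_product.items()}

print(ambiguous_products)
```

A lookup dictionary like `industry_by_product` also makes the assignment rule explicit, which may help when deciding where an overlapping product should be counted.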
# Check if we have missed anything
for index in range(len(labor_table)):
    if pd.isnull(labor_table.iloc[index]['industry']):
        print(labor_table.iloc[index]['product'])
slave_industry_counts = labor_table['industry'].value_counts()
slave_industry_counts
labels = slave_industry_counts.keys()
xs = range(len(labels))
ys = slave_industry_counts.values
figure = plt.figure(figsize=(20, 10))
ax = figure.add_subplot(111)
ax.bar(xs, ys, align='center')
plt.xticks(xs, labels);
target_path = target_folder + '/slave-labor-by-industry.png'
plt.savefig(target_path)
print('slave_labor_by_industry_image_path = %s' % target_path)
rows = []
for country in labor_countries:
    slave_product_type_count = len(labor_table[labor_table['country'] == country])
    try:
        gdp = gdp_table[gdp_table['country'] == country][selected_year].iloc[0]
    except IndexError:
        print('could not find GDP for %s' % country)
        continue
    rows.append((country, slave_product_type_count, gdp))
rows[:5]
slave_gdp_table = pd.DataFrame(rows, columns=[
    'country', 'slave product type count', 'gdp'])
slave_gdp_table.head()
target_path = target_folder + '/slave-product-type-count-and-gdp-by-country.csv'
slave_gdp_table.to_csv(target_path, index=False)
print('slave_gdp_table_path = %s' % target_path)
from scipy import stats
t = slave_gdp_table
x = stats.spearmanr(t['gdp'], t['slave product type count'])
x
target_path = target_folder + '/gdp-vs-slave-product-type-count-spearman-rank-correlation.txt'
# Use a context manager so that the file is flushed and closed
with open(target_path, 'wt') as target_file:
    target_file.write(str(x))
print('spearman_rank_correlation_text_path = %s' % target_path)
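Spearman's rank correlation is Pearson's correlation computed on the ranks of the values, so it measures whether GDP tends to fall as the product type count rises without assuming a straight-line relationship. The following pure-Python sketch illustrates the idea on made-up numbers; the helper names are hypothetical, and tied values receive the average of their ranks, which is a common convention.

```python
from math import sqrt


def rank(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one
        while (end + 1 < len(order) and
               values[order[end + 1]] == values[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = average_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Return the Pearson correlation of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def spearman(xs, ys):
    """Return the Spearman rank correlation as Pearson on ranks."""
    return pearson(rank(xs), rank(ys))


# Hypothetical GDP values and product type counts for five countries
gdps = [100, 400, 900, 1600, 2500]
counts = [9, 7, 7, 2, 1]
print(spearman(gdps, counts))
```

A value near -1 would indicate that higher counts go with lower GDP, and a value near 0 would indicate no monotonic relationship. The sketch does not compute a p-value and fails with a division by zero if either input is constant.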
# sm.OLS does not add an intercept on its own, so this model is fit through the origin
model = sm.OLS(slave_gdp_table['gdp'], slave_gdp_table['slave product type count'])
result = model.fit()
x = result.summary()
x
target_path = target_folder + '/gdp-vs-slave-product-type-count-ols-regression.txt'
# Use a context manager so that the file is flushed and closed
with open(target_path, 'wt') as target_file:
    target_file.write(str(x))
print('ols_regression_text_path = %s' % target_path)
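The regression above is fit without an intercept term, because `sm.OLS` uses the regressors exactly as given unless a constant column is added (for example with `sm.add_constant`). The sketch below uses closed-form least squares in plain Python on made-up numbers to show how much that choice can matter; the function names and data are hypothetical.

```python
def fit_through_origin(xs, ys):
    """Return the least squares slope for y = slope * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_with_intercept(xs, ys):
    """Return (slope, intercept) for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) /
        sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Hypothetical data in which y falls as x rises
product_type_counts = [1, 2, 3, 4]
gdps = [10, 8, 6, 4]

print('through origin: slope = %s' % fit_through_origin(
    product_type_counts, gdps))
print('with intercept: slope = %s, intercept = %s' % fit_with_intercept(
    product_type_counts, gdps))
```

In this toy case the slope through the origin is positive even though GDP falls as the count rises, so the sign of a no-intercept coefficient should not be read as the direction of the relationship.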
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(
    slave_gdp_table['slave product type count'],
    slave_gdp_table['gdp'],
    test_size=0.3,
    random_state=42)
x_test[:5]
x_train = x_train.values.reshape(-1, 1)
x_test = x_test.values.reshape(-1, 1)
x_test[:5]
from math import sqrt
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor()
model.fit(x_train, y_train)
predictions = model.predict(x_test)
print('decision_tree_root_mean_squared_error = %s' % sqrt(mean_squared_error(
    y_test.values, predictions)))
from sklearn.ensemble import GradientBoostingRegressor
model = GradientBoostingRegressor(
    learning_rate=0.1,
    max_depth=None,
    max_leaf_nodes=17,
    min_samples_split=5,
    n_estimators=1000,
    random_state=2,
    subsample=1)
model.fit(x_train, y_train)
predictions = model.predict(x_test)
print('gradient_boosting_root_mean_squared_error = %s' % sqrt(mean_squared_error(
    y_test.values, predictions)))
from sklearn.svm import SVR
model = SVR()
model.fit(x_train, y_train)
predictions = model.predict(x_test)
print('support_vector_machine_root_mean_squared_error = %s' % sqrt(mean_squared_error(
    y_test.values, predictions)))
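A root mean squared error is easier to judge next to a naive baseline, such as always predicting the mean GDP of the training set. The sketch below shows that comparison on made-up numbers; the function name and the values are hypothetical and are not results from this notebook.

```python
from math import sqrt


def root_mean_squared_error(actual, predicted):
    """Return the square root of the mean squared difference."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical GDP values for training and test countries
y_train = [200.0, 400.0, 600.0]
y_test = [100.0, 500.0]

# Baseline: predict the training mean for every test country
baseline_prediction = sum(y_train) / len(y_train)
baseline_rmse = root_mean_squared_error(
    y_test, [baseline_prediction] * len(y_test))
print('baseline_root_mean_squared_error = %s' % baseline_rmse)
```

A model whose error is not clearly below this baseline has learned little from the product type count.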
These results are highly dependent on the datasets and thus should be interpreted with care.
{ child_labor_percent_image : Percent of Countries with Child Labor ? Child Slavery }
{ child_labor_product_percent_image : Percent of Countries with Child Labor in More Than One Product }
{ forced_labor_percent_image : Percent of Countries with Forced Labor ? Adult Slavery }
{ forced_labor_product_percent_image : Percent of Countries with Forced Labor in More Than One Product }
{ slave_labor_by_industry_image : Relative Frequency of Slave Labor by Industry ? Industries with More Product Types in More Countries Will Appear Higher }
If people who are free to work produce better quality products than people who are forced to work, then one plausible hypothesis is that countries with more product types made by slave labor will have lower GDP. If that relationship is strong enough, it might even be possible to estimate a country's GDP solely from the number of product types made by slave labor.
{ slave_gdp_table : Slave Labor and GDP }
{ spearman_rank_correlation_text : Spearman Rank Correlation Test }
{ ols_regression_text : OLS Regression Test }
{ decision_tree_root_mean_squared_error : Decision Tree Root Mean Squared Error ? Lower is Better }
{ gradient_boosting_root_mean_squared_error : Gradient Boosting Root Mean Squared Error ? Lower is Better }
{ support_vector_machine_root_mean_squared_error : Support Vector Machine Root Mean Squared Error ? Lower is Better }