Satellite Image Recognition Research Log





Satellite Image Recognition Research

Welcome to the research log for our satellite image recognition work. You can find our ongoing and upcoming missions in the Active and Available folders. Archived missions are stored in the Cancelled and Completed folders.

History

This work first started in December 2007 as I sat in a lab of the Modi Research Group at Columbia University, clicking on buildings in satellite images. The goal was to identify where people lived in order to provide better energy and water infrastructure as part of the Millennium Villages Project. Previously, the team had gathered the locations of houses, factories and schools using GPS devices in the field, but the lab was looking for a more cost-effective way to collect the location data, and having graduate students click on buildings in satellite images was the interim solution.

After a few weeks of mindless clicking and complaining on the phone to Cathaleen, I realized that this could be an opportunity to apply some of the innovative work I had seen at the Computational and Biological Learning Laboratory at New York University, where I had volunteered from September 2005 to August 2006. Professor Yann LeCun and his graduate students had created a Lisp-based programming environment called Lush that encapsulated their work on Convolutional Neural Networks, at the time a niche field of research that had garnered modest success in industry but was decidedly unpopular in academic circles, which favored the more theoretically crisp Support Vector Machines. When I had volunteered at CBLL, I was not mature enough to contribute anything particularly useful, but I had learned and seen enough to understand that convolutional neural networks worked on images, sounds and videos.

I proposed the idea of creating an automated building classifier at a lab presentation, and Professor Vijay Modi and Dana Pillai were very supportive of the project. However, in order to create the training sets for the classifier, I still had to click on buildings one by one for hours each day.

Toward the end of 2008, I had developed a prototype that seemed to perform better than expected on the test sets. I constructed the test sets using an early version of GDAL with Python and used subprocess to wrap the Lush Convolutional Neural Network package. The images had been acquired through DigitalGlobe's QuickBird satellite, and I think for the prototype I might have used just the panchromatic band. Eventually, I integrated the red, green, blue and near-infrared multispectral bands as well. However, the addition of the color bands multiplied the size of the dataset fivefold, and what had already been a time-consuming training process slowed to a crawl. It was frustrating to run experiments and tweak parameters when training a new classifier took months. I put the code in a Subversion repository and named it furtherperception.
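For illustration, here is a minimal sketch of the kind of pipeline described above: extract fixed-size patches from a GeoTIFF with GDAL's Python bindings and hand each patch to an external classifier through subprocess. This is not the original furtherperception code; the file paths, patch size and Lush script name are hypothetical.

    import subprocess
    import numpy as np
    from osgeo import gdal

    PATCH_SIZE = 64  # assumed patch size in pixels

    # Open the satellite image and read the panchromatic band
    image = gdal.Open('quickbird-panchromatic.tif')  # hypothetical path
    band = image.GetRasterBand(1)
    for row in range(0, image.RasterYSize - PATCH_SIZE, PATCH_SIZE):
        for column in range(0, image.RasterXSize - PATCH_SIZE, PATCH_SIZE):
            patch = band.ReadAsArray(column, row, PATCH_SIZE, PATCH_SIZE)
            np.save('patch.npy', patch)
            # Wrap the external Lush classifier as a command-line call
            label = subprocess.check_output(
                ['lush', 'classify-patch.lsh', 'patch.npy'])  # hypothetical script
            print(row, column, label.decode().strip())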

Sometime in between, I started working with the team on a separate web-based decision support tool for planning energy infrastructure, which became known as the networkplanner and was finished in 2009 to much success in the field. I used Pylons for the web framework and greatly expanded my knowledge of geospatial algorithms and techniques.

Around 2010, I restarted work on a second version of the furtherperception building classifier with the ambition of releasing a web-based version of the tool. A statistician named Jiehua Chen of the Statistics Department of Columbia University was tasked with evaluating the performance of the classifier, but she did not like the black-box approach of Convolutional Neural Networks, preferring more established techniques from statistics. No amount of empirical results could overcome the skepticism. Furthermore, the leap from a collection of command-line scripts to a web-based tool was nearly insurmountable at the time. Though PyCUDA had recently been released, we did not have the resources to adapt our code to run on a GPU, and training each classifier still took months on state-of-the-art CPUs. Overall, the classifiers from this phase of research performed well in identifying buildings in satellite images in certain areas such as Uganda, but had difficulty in parts of Ethiopia. We also did not have a robust way to recognize duplicate markings of the same building as a single building, resulting in huge variation in the estimated number of buildings for a given image.
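To make the duplicate-marker problem concrete, here is one simple way it could be addressed: group marker coordinates that fall within a pixel radius using single-linkage clustering and keep one centroid per group. This is an illustrative sketch using SciPy, not a method we actually had at the time.

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    def merge_markers(xys, radius_in_pixels=10):
        'Collapse markers closer than the radius into single centroids'
        xys = np.asarray(xys, dtype=float)
        if len(xys) < 2:
            return xys
        cluster_indices = fcluster(
            linkage(xys, method='single'), radius_in_pixels,
            criterion='distance')
        return np.array([
            xys[cluster_indices == i].mean(axis=0)
            for i in np.unique(cluster_indices)])

    # The first two markers collapse into one estimated building center
    print(merge_markers([(10, 10), (12, 11), (80, 75)]))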

With the growing success of Alex Krizhevsky's cuda-convnet in 2012 and 2013, we restarted work on the building classifier in a separate repository named count-buildings at the start of 2014. Thanks to GPUs, training a classifier took days or weeks instead of months. We also made a web-based version of the satellite image recognition tool available for the first time in September 2014, in an early version of the CrossCompute platform. Although our building classifier had become significantly more accurate, we had still not found a way to resolve the duplicate building marker issue. Furthermore, there was some misunderstanding about where and how the building classifier was supposed to be used. Users tried to run a building classifier trained on images of Myanmar over images of Nigeria and Kenya, with disastrous results. In response, we created several versions of a generic building classifier, but by May 2015, we had run out of money to continue the experiments. A fundamental overhaul of the website toward the end of 2015 was incompatible with the GPU worker architecture, so we could not include the count-buildings tool in the subsequent launch at PyCon 2016.

Since then, we have been improving the core architecture of the website to support GPUs, and with the advent of user-friendly deep learning packages such as PyTorch and TensorFlow/Keras, we believe the time is right to make another attempt at a web-based tool that pinpoints where people live using satellite imagery.
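As a starting point for this new attempt, here is a minimal sketch of the kind of patch classifier we have in mind, written in PyTorch. The layer sizes, the 64x64 pixel patches and the four-band input are illustrative assumptions, not a finalized architecture.

    import torch
    import torch.nn as nn

    class BuildingPatchClassifier(nn.Module):

        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                # 4 input bands: red, green, blue, near-infrared
                nn.Conv2d(4, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2))
            # A 64x64 input shrinks to 32 feature maps of 13x13
            self.classifier = nn.Linear(32 * 13 * 13, 2)  # building vs not

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    # Score a batch of eight 64x64 patches with 4 spectral bands
    patches = torch.randn(8, 4, 64, 64)
    print(BuildingPatchClassifier()(patches).shape)  # torch.Size([8, 2])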

Acknowledgments

Thank you to Professor Vijay Modi and Edwin Adkins for providing the motivation for this project.

Thank you to Marc'Aurelio Ranzato, Alex Krizhevsky, Ilya Sutskever and Professor Yann LeCun for their advice at various stages of the project throughout the years.