Learning Deep Learning

As a data scientist, you work in a field that changes quite fast. Keeping up with all the new techniques being developed is therefore not just recommended. It’s a must. Currently, deep learning is number one on the list.

Back in 2014, I played around with some deep learning code that could generate text similar to whatever was fed into it. It was fun to see what it did with Belgian law or Shakespeare. Since R is not the greatest tool for deep learning, I needed to learn Python. The stack goes as follows: Python – NumPy – SciPy – Pandas – TensorFlow – Keras. Depending on your programming background, this can take the better part of a month.

The learning

After figuring out and installing Anaconda and getting the stack operational, you can take your first steps with Jupyter or Spyder as your work environment, whichever suits you better.

Python Basics

A copy of this book is present in our office. Working through it took about a week, skipping the chapters that didn’t seem necessary.

Convolutional Neural Networks for Visual Recognition

This Stanford course is exceptionally good! Softmax, regularization penalty, cross-entropy loss… these are all terms you need to know and understand in order to grasp the building blocks and concepts that make up a network, and how you can start to train one. Apparently, you have convolutional layers and pooling layers… The former are the ones doing the pattern detection (of sorts), while pooling is mainly a way to shrink the feature maps, and with them the amount of computation and the number of parameters in later layers (it’s probably more than that, but that’s the main use).
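To make those terms a bit more concrete, here is a minimal NumPy sketch of softmax, cross-entropy loss with an L2 regularization penalty, and 2×2 max pooling. It is not taken from the course material; the function names and the `reg_strength` parameter are my own choices for illustration.

```python
import numpy as np

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1 per row."""
    # Subtracting the row maximum avoids overflow in exp() without changing the result.
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def cross_entropy_loss(scores, labels, weights=None, reg_strength=0.0):
    """Mean cross-entropy over a batch, plus an optional L2 penalty on the weights.

    scores: (n_samples, n_classes) raw scores
    labels: (n_samples,) integer class indices
    """
    probs = softmax(scores)
    n = scores.shape[0]
    # Pick the predicted probability of the correct class for each sample.
    data_loss = -np.mean(np.log(probs[np.arange(n), labels]))
    reg_loss = 0.0 if weights is None else reg_strength * np.sum(weights ** 2)
    return data_loss + reg_loss

def max_pool_2x2(feature_map):
    """Keep the maximum of every non-overlapping 2x2 block of a 2-D feature map."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Drop a trailing row/column if the size is odd, then group into 2x2 blocks.
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

if __name__ == "__main__":
    scores = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
    labels = np.array([0, 1])
    print("probabilities:\n", softmax(scores))
    print("loss:", cross_entropy_loss(scores, labels))
    print("pooled:\n", max_pool_2x2(np.arange(16.0).reshape(4, 4)))
```

Note that the pooling function has no weights of its own: it simply halves the height and width of whatever comes in, which is why the layers after it need fewer parameters.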

Going back and forth between theory and examples while playing around with code in a language you barely know is quite exhausting and slow. https://www.tensorflow.org/tutorials/ is a great way to start with TensorFlow.

Another awesome link: https://elitedatascience.com/learn-python-for-data-science

Following this step by step really gets you somewhere: “By now, you'll have a basic understanding of programming and a working knowledge of essential libraries. This actually covers most of the Python you'll need to get started with data science. [...] The first option is to participate on Kaggle, a site that hosts data science competitions.”

I decided to go for this one: https://www.kaggle.com/c/statoil-iceberg-classifier-challenge 

And by the end of the week:

I had a model that actually read the files, did the training and made predictions. It was a lousy model, and of course I copy-pasted most of the code from snippets and examples online, but that’s how programming seems to work. Flipping images, rotating them, trying to add some data from the test set, adding pseudo labels: this is what working on a neural network looks like. In the end I did manage to get a ‘decent’ result:
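For readers who want to see what those tricks look like, here is a rough NumPy sketch of flip/rotate augmentation and pseudo-labelling. This is not my competition code: the function names and the `threshold` value are made up for the example, and it assumes square images (so a 90° rotation keeps the shape) and a binary classifier that outputs one probability per test image.

```python
import numpy as np

def augment(images, labels):
    """Return the originals plus horizontally flipped, vertically flipped
    and 90-degree rotated copies, with the labels repeated to match.

    images: (n, h, w) array with h == w (assumed square)
    labels: (n,) array
    """
    flipped_lr = images[:, :, ::-1]                  # mirror left-right
    flipped_ud = images[:, ::-1, :]                  # mirror top-bottom
    rotated = np.rot90(images, k=1, axes=(1, 2))     # rotate each image by 90 degrees
    all_images = np.concatenate([images, flipped_lr, flipped_ud, rotated], axis=0)
    all_labels = np.tile(labels, 4)
    return all_images, all_labels

def select_pseudo_labels(test_probs, threshold=0.95):
    """Pick test samples the model is very sure about and label them.

    test_probs: (n,) predicted probability of the positive class
    threshold:  hypothetical confidence cut-off, to be tuned
    Returns the indices of the confident samples and their 0/1 pseudo labels.
    """
    confident = (test_probs >= threshold) | (test_probs <= 1.0 - threshold)
    indices = np.where(confident)[0]
    pseudo_labels = (test_probs[confident] >= 0.5).astype(int)
    return indices, pseudo_labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(10, 75, 75))
    y = rng.integers(0, 2, size=10)
    x_aug, y_aug = augment(x, y)
    print(x_aug.shape, y_aug.shape)

    idx, pseudo = select_pseudo_labels(np.array([0.99, 0.5, 0.02, 0.7]))
    print(idx, pseudo)
```

The confident test images and their pseudo labels can then be appended to the training set for another round of training. Whether that helps depends a lot on how good the first model already is.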

Finishing around place 2500 out of 3300 entries doesn’t feel all that bad for a first-timer.

Would I recommend that others learn deep learning? Yes, of course. It’s a great tool with many applications, able to solve problems that are hard or infeasible with the classic techniques. Depending on your background, you may be able to get more insight than I did.

The resources

For Python, there was this introductory book lying around in the office, “The Quick Python Book”. And these sites are excellent: