The purpose of this post is to collect together online resources for anyone who wants to learn how to do machine learning (data science) in Python, starting from scratch. Some of these sites I’ve used, and others I’ve only glanced at, but I hope they let you get started no matter what your level. I’ll add new stuff as I come across it, but let me know if you have any useful resources to add!
If you’re new to programming, the first step is to get started! Code Academy has an introduction to Python tutorial which will get you started with some basic concepts. Google’s tutorial is a bit more advanced, but should be do-able once you have an understanding of variables, conditionals and loops:
With the tools in place, the best thing to do is dive in. A great place to start is Kaggle. They have some tutorial tasks to get started on, including one from Data Science London. This is a binary supervised classification task so you’ll want to read up on how that works, but it’s essentially about deciding whether a data example is from one class or another. ‘Class’ in this context can be things like:
- Is an email spam or not?
- Is a credit card transaction fraudulent or not?
- Does some audio contain speech or noise?
You can use sci-kit learn to get started without knowing too much about what’s going on under the hood. Perhaps the most important thing to get to grips with is the use of training/dev/test data, cross-validation and generalisation. But, if you want to really get a good understanding, then Coursera’s Machine Learning course covers a lot of the basics of machine learning, with some practical tasks to complete.
Finally, be aware of common pitfalls.
If you can build a classifier to work on the Data Science London Kaggle challenge, and understand how it works, then you’re well on the way to learning about more advanced stuff. But that’s a topic for another post!