Data science on the command line

There are just a few Linux command-line tools that I use many times a day!

less for checking the contents of files: verifying you’ve got the right input or output format, or quickly examining data

grep for searching within files, especially as you can search with regular expressions
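As a rough illustration of the regular-expression searching described above, here is a minimal sketch. The log lines and the temporary file are made up for the example; only the grep options (-E, -c, -n) are standard.

```shell
# Hypothetical sample data, created only for this illustration
tmp=$(mktemp)
printf '%s\n' 'INFO start' 'ERROR code=42' 'WARN slow' 'ERROR code=7' > "$tmp"

# -E enables extended regular expressions; -c counts the matching lines
error_count=$(grep -Ec '^ERROR code=[0-9]+$' "$tmp")

# -n prefixes each matching line with its line number;
# {2} requires at least two consecutive digits after "code="
matches=$(grep -En 'code=[0-9]{2}' "$tmp")

echo "$error_count"
echo "$matches"

rm -f "$tmp"
```

Counting with -c is often enough for a quick sanity check before you look at the matching lines themselves.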

awk is incredibly useful for doing basic operations on text files, simple transformations from one format to another, or getting simple stats
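To make the awk use cases above concrete, here is a small sketch of one format transformation and one simple stat. The file contents and column names are invented for the example, and the CSV handling is deliberately naive: it assumes no quoted fields or embedded commas.

```shell
# Hypothetical sample data, created only for this illustration
tmp=$(mktemp)
printf '%s\n' 'name,score' 'a,10' 'b,20' 'c,30' > "$tmp"

# Transformation: comma-separated to tab-separated.
# Assigning $1 = $1 forces awk to rebuild the record using OFS.
tsv=$(awk 'BEGIN { FS = ","; OFS = "\t" } { $1 = $1; print }' "$tmp")

# Simple stat: mean of the second column, skipping the header row
mean=$(awk -F, 'NR > 1 { sum += $2; n++ } END { if (n > 0) print sum / n }' "$tmp")

echo "$tsv"
echo "$mean"

rm -f "$tmp"
```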

Combine these with a few useful helpers like paste, diff, sort, wc, head, tail and cut, and you can do some really complex operations on actually quite large files.
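Here is a sketch of how a few of those helpers chain together in pipelines. The CSV below is made-up sample data, and the approach assumes a simple comma-separated file with a single header row.

```shell
# Hypothetical sample data, created only for this illustration
tmp=$(mktemp)
printf '%s\n' 'id,city,amount' '1,Leeds,30' '2,York,12' '3,Leeds,45' '4,Bath,8' > "$tmp"

# tail -n +2 drops the header row; wc -l counts the remaining records
# (tr strips the padding some wc implementations add)
rows=$(tail -n +2 "$tmp" | wc -l | tr -d ' ')

# cut extracts the third column, sort -nr orders it numerically
# from largest to smallest, and head keeps the top value
largest=$(tail -n +2 "$tmp" | cut -d, -f3 | sort -nr | head -n 1)

# sort -u removes duplicate cities; paste -s joins the lines into one
cities=$(tail -n +2 "$tmp" | cut -d, -f2 | sort -u | paste -s -d, -)

echo "$rows"
echo "$largest"
echo "$cities"

rm -f "$tmp"
```

Because each tool streams its input line by line (sort being the exception, as it has to see everything before it can emit output), the same pipelines tend to hold up on files far larger than this toy example.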

