Data science on the command line

There are just a few Linux command-line tools that I use many times a day:

less for checking the contents of files, verifying you’ve got the right output or input format, and quickly examining data

grep for searching within files, especially since you can search with regular expressions
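
As a rough sketch of the kind of search meant here, the snippet below builds a small, made-up log file (app.log and its contents are purely illustrative) and uses grep with an extended regular expression to pull out the lines of interest, then counts matches with -c.

```shell
# Hypothetical sample log, created here so the example is self-contained.
tmpdir=$(mktemp -d)
cat > "$tmpdir/app.log" <<'EOF'
2024-01-05 INFO started
2024-01-05 ERROR disk full
2024-01-06 WARN slow query
2024-01-06 ERROR timeout after 30s
EOF

# -E enables extended regular expressions; keep lines containing ERROR or WARN.
matches=$(grep -E 'ERROR|WARN' "$tmpdir/app.log")
echo "$matches"

# -c prints the number of matching lines instead of the lines themselves.
error_count=$(grep -c 'ERROR' "$tmpdir/app.log")
echo "$error_count"

rm -rf "$tmpdir"
```

Piping the first command into less is a handy way to page through the matches when there are many of them.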

awk is incredibly useful for doing basic operations on text files, simple transformations from one format to another, or getting simple stats
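
To make that concrete, here is a minimal sketch of both uses on an invented file (scores.csv, with made-up names and numbers): first a format conversion from comma-separated to tab-separated, then a simple mean of one column.

```shell
# Hypothetical sample data, created here so the example is self-contained.
tmpdir=$(mktemp -d)
cat > "$tmpdir/scores.csv" <<'EOF'
name,score
alice,10
bob,20
carol,30
EOF

# Format conversion: comma-separated in, tab-separated out.
# Assigning $1 to itself makes awk rebuild the record using OFS.
tsv=$(awk 'BEGIN { FS = ","; OFS = "\t" } { $1 = $1; print }' "$tmpdir/scores.csv")
echo "$tsv"

# Simple stats: skip the header (NR > 1), sum column 2, print the mean.
mean=$(awk -F, 'NR > 1 { sum += $2; n++ } END { if (n > 0) print sum / n }' "$tmpdir/scores.csv")
echo "$mean"

rm -rf "$tmpdir"
```

The same pattern (accumulate in the main block, report in END) extends to minimum, maximum, or per-group totals keyed by another column.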

Combine these with a few useful helpers like paste, diff, sort, wc, head, tail and cut, and you can do some really complex operations on actually quite large files.
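
A small illustration of chaining these helpers, assuming an invented sales.csv (the file name, regions and amounts are made up for the example): drop the header, pick a column, sort it, and keep the largest values, then count the data rows.

```shell
# Hypothetical sample data, created here so the example is self-contained.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sales.csv" <<'EOF'
region,amount
north,120
south,45
east,300
west,80
EOF

# tail -n +2 drops the header, cut takes the second comma-separated field,
# sort -rn orders numerically from largest to smallest, head keeps the top two.
top2=$(tail -n +2 "$tmpdir/sales.csv" | cut -d, -f2 | sort -rn | head -n 2)
echo "$top2"

# Count the data rows; tr strips the padding some wc implementations add.
rows=$(tail -n +2 "$tmpdir/sales.csv" | wc -l | tr -d ' ')
echo "$rows"

rm -rf "$tmpdir"
```

Because each tool reads a stream and writes a stream, the pipeline never needs to hold the whole file in an editor, which is a large part of why this approach copes with big files.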
