- Bash Shell
- Regular Expressions with Bash
- SED, GREP and Find
- Project 1: The ‘US News’ Uni Ranks
- Project 2: Facebook Data Mining
- Project 3: Best Australian Cities - Least Crimes
- Project 4: Mining Shakespeare-era Plays and Poems
- And more...
This book starts with some practical Bash-based flat-file data mining projects involving university ranking data [Previews: Part I, Part II, Part III], Facebook data [Previews: Part I, Part II], crime data, and Shakespeare-era plays and poems data. Sample video lectures are also available.
If you haven’t used Bash before, feel free to skip the projects and go straight to the tutorials. Read the tutorials and then come back to the projects. The tutorial section introduces Bash scripting, regular expressions, AWK, sed, grep, and so on.
Bash may not be the best way to handle all kinds of data. But there often comes a time when you are given a pure Bash environment, such as the one on common Linux-based supercomputers, and you just want an early result or view of the data before you dive into the real programming with Python, R, SQL, SPSS, and so on.
Expertise in these data-intensive languages also comes at the price of spending a lot of time on them. In contrast, Bash scripting is simple, easy to learn, and well suited to mining textual data, particularly if you work with genomics, microarrays, social networks, the life sciences, and so on. It can help you quickly sort, search, match, replace, clean, and optimise various aspects of your data, without a steep learning curve.
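To give a feel for what sorting, searching, replacing, and cleaning look like in practice, here is a minimal sketch. The file name, column layout, and every value in it are made up for illustration and do not come from any of the book's projects.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample data (name,city,score), invented for this sketch.
workdir=$(mktemp -d)
data="$workdir/sample.csv"
cat > "$data" <<'EOF'
name,city,score
Alice , Sydney,82
bob,Melbourne,91
Carol,sydney ,77
EOF

# Clean: drop the header row, then strip stray spaces around commas
# and at the end of each line.
cleaned=$(tail -n +2 "$data" | sed -E 's/ *, */,/g; s/ +$//')

# Search/match: keep rows whose city field is Sydney, ignoring case.
sydney_rows=$(printf '%s\n' "$cleaned" | grep -i ',sydney,')

# Replace: normalise the spelling of the city name.
normalised=$(printf '%s\n' "$sydney_rows" | sed -E 's/,[Ss]ydney,/,Sydney,/')

# Sort: order by the third field (score), numerically, highest first.
top=$(printf '%s\n' "$normalised" | sort -t, -k3,3nr)

printf '%s\n' "$top"
rm -rf "$workdir"
```

Each step is a single standard tool (tail, sed, grep, sort), so you can run the stages one at a time and inspect the intermediate result before adding the next.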
Several of the practical data mining examples follow the same flow: import specific data sources into flat text files, then let Bash run different programs (grep, sort, sed, and so on) on those files to clean and optimise the data and to extract preliminary views of it (cut, csvlook, view, cat, head, etc.). Another part of data mining involves transforming unstructured data into structured data (awk, shell), and a scripting language like Bash can be very useful for that transformation. We strongly believe that learning and using Bash shell scripting should be the first step if you want to say, Hello Big Data!
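The unstructured-to-structured step mentioned above can be sketched with awk. This is only an illustration under stated assumptions: the input layout (blank-line-separated "Key: value" blocks), the field names, and the numbers are all hypothetical and not taken from the book's crime data project.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical unstructured input: one record per blank-line-separated block.
raw='City: Perth
Crimes: 120

City: Hobart
Crimes: 45'

# awk paragraph mode (RS = "") treats each block as one record;
# with FS = "\n", each line of the block becomes one field.
structured=$(printf '%s\n' "$raw" | awk '
  BEGIN { RS = ""; FS = "\n"; print "city,crimes" }
  {
    city = ""; crimes = ""
    for (i = 1; i <= NF; i++) {
      split($i, kv, ": ")              # kv[1] is the key, kv[2] the value
      if (kv[1] == "City")   city = kv[2]
      if (kv[1] == "Crimes") crimes = kv[2]
    }
    print city "," crimes
  }')

printf '%s\n' "$structured"

# Preliminary view: skip the header, sort by crime count, show the lowest.
safest=$(printf '%s\n' "$structured" | tail -n +2 | sort -t, -k2,2n | head -n 1)
printf 'Lowest count: %s\n' "$safest"
```

Once the data is in CSV form, the usual line-oriented tools (sort, cut, head) apply directly, which is why the transformation step tends to come first.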