The second section of Big Data Week Two focused again on the more practical aspects of working with software. I’ve increasingly realised that I’m following along with this course rather than attempting the tasks, which are probably aimed at someone with a little more prior experience, but that’s okay, I’m still learning some things!
I read through it all anyway, and I feel like some of it will stick. The data that the tasks use for analysis was interesting to me in light of having looked at open data in Smart Cities, because it is provided by the Australian Bureau of Meteorology, so kind of what I discussed in this post.
After tasks on Hadoop and MapReduce, which we learned about last week, the course went on to talk about Apache Pig, which sits on top of Hadoop, using HDFS and MapReduce, and makes it quicker to write data-processing jobs. From the course material, it offers:
- Faster development, increases productivity 10x and is very flexible.
- Expresses data transformation tasks in just a few lines of code.
- Doesn’t reinvent the wheel: 10 lines of Pig Latin = ~200 lines of Java!
(The previous couple of tasks provided lines of Java to run, which is why it’s mentioned.)
Other important facts: Apache Pig scripts are written in a language called Pig Latin, which is very unlike other programming languages, including the bits that I am familiar with. From the course material:
A Pig Latin script describes a Directed Acyclic Graph (DAG) where the edges are data flows and the nodes are operators that process the data. Pig Latin has no if statements or for loops, and focuses purely on data flow.
Instead, it relies on commands such as ‘store’ (write the results out to a file) or ‘dump’ (print them to the screen).
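To get my head around the "pure data flow" idea, here is a rough sketch in Python rather than Pig Latin (I'm not confident enough to write real Pig Latin). It only imitates the shape of a Pig script: each step is an operator, and the data flows from one to the next. The function names, the sample rows and the 21.5 degree threshold are all made up for illustration and are not from the course or from the Bureau of Meteorology data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (station, month, max_temp_celsius).
# These values are invented for illustration only.
READINGS = [
    ("Sydney", "Jan", 27.0),
    ("Sydney", "Feb", 26.0),
    ("Hobart", "Jan", 22.0),
    ("Hobart", "Feb", 21.0),
    ("Darwin", "Jan", 32.0),
]

def load(rows):
    """LOAD-like operator: produce a relation (a list of tuples)."""
    return list(rows)

def filter_by(relation, predicate):
    """FILTER-like operator: keep only rows matching the predicate."""
    return [row for row in relation if predicate(row)]

def group_by(relation, key):
    """GROUP-like operator: collect rows under a shared key."""
    groups = defaultdict(list)
    for row in relation:
        groups[key(row)].append(row)
    return dict(groups)

def average(groups, value):
    """Aggregation operator: one averaged value per group."""
    return {k: mean(value(r) for r in rows) for k, rows in groups.items()}

def dump(result):
    """DUMP-like operator: print the result to the screen."""
    for key, val in sorted(result.items()):
        print(key, val)

# The "script" itself: a chain of operators with no branching logic.
# Each name is a node in the graph; each assignment is an edge.
readings = load(READINGS)
warm = filter_by(readings, lambda r: r[2] > 21.5)
by_station = group_by(warm, lambda r: r[0])
avg_max = average(by_station, lambda r: r[2])
dump(avg_max)
```

The bit that matches the course description is the last five lines: nothing there decides anything or loops over anything, it just names each intermediate result and passes it along. The loops live inside the operators, which in Pig's case are provided for you.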
Overall, this might have been a less useful course for me simply because I was less interested in the practical coding aspects and more in the theoretical application, which is partly why I’m putting the conclusion here instead of in a separate post. However, even the coding info gave me a basic overview of how people work with large-scale data sets, and the case studies did link in well to my other courses.
I’m not exactly sure where to go next with online learning and finishing off Blaugust. I do have some ideas about exploring online learning outside of FutureLearn for a little while, which I’ll be talking about tomorrow.