Blaugust 14: FutureLearn Big Data Notes Week One

The first week of this course focused not just on what big data is, but also on some common tools for handling it and how they work, in particular SQL, Hadoop and MapReduce.
From the course notes:

  • SQL is very popular for storing data, but targets structured data by design.
  • Hadoop can deal with unstructured data, such as text, by providing a more general paradigm than SQL.

Hadoop can use multiple sources for its data.

MapReduce, unlike SQL, lets you specify the individual steps used to process the data, which makes your approach to large-scale data sets more flexible.
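To make the "specify the steps" idea concrete for myself, here is a minimal sketch of the classic word-count example in plain Python. It is not Hadoop code and does not run on a cluster; the function names (map_phase, shuffle, reduce_phase, word_count) are my own, and the sketch only assumes that a MapReduce job boils down to a map step, a grouping step and a reduce step.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Grouping step: collect all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in order over a list of text documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big tools"]))
```

In SQL the same count would be a single declarative GROUP BY query, whereas here each step is written out by hand, which is what lets the same pattern work on unstructured text.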

This was a useful article explaining how Hadoop’s data processing works.

It was also interesting to be given examples in the course notes of where Hadoop is used, Amazon being one.

This was also a good article; it helped me make more sense of what the system is capable of and why it is used.

The Hadoop platform was designed to solve problems where you have a lot of data — perhaps a mixture of complex and structured data — and it doesn’t fit nicely into tables.

The next bit talked about the Hadoop Distributed File System. Obviously, when large-scale companies (ecommerce and the like) are using it, the data will be spread over many servers, so, to quote the course material:

Hadoop breaks incoming files into blocks and stores them redundantly across the cluster.

I copied and annotated the diagram used into the back of my diary as I was killing time at a local library and it was the only paper I had, so I figured I’d save it here too!
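To check my understanding of "breaks incoming files into blocks and stores them redundantly", I wrote a toy sketch in Python. Everything in it is an assumption for illustration: the function names, the node names, the tiny 4-byte block size and the simple round-robin placement are mine. Real HDFS uses far larger blocks and a smarter placement policy, so treat this as a picture of the idea rather than how Hadoop actually does it.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin style.

    This placement rule is a simplification chosen for the example.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)]
            for offset in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", 4)
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
    for block_id, holders in place_blocks(len(blocks), nodes, 3).items():
        print(block_id, blocks[block_id], holders)
```

Because every block lives on three different nodes in this sketch, losing any single node still leaves two copies of each block, which is the point of storing them redundantly.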


The next few steps were a practical task involving Cloudera, which I’ll be learning how to use.
