Blaugust 8: Starting with Big Data

Now that my Social Media Analysis course is over (though I do have a retrospective post and a post round-up still to do) I’m embarking on a new short two-week course – Big Data: From Data to Decisions.
This post is just to cover the first few sections of the week’s course material. Much of the rest will depend on certain tasks and software that can’t be done via an iPhone and airport wifi, so I’ll be aiming for a post later on about that.

Social Media Analysis focused on working with Big Data in the specific context of Twitter, but as my career progresses I’ll need to be able to analyse and understand the significance of a whole range of large-scale data sets. Data is becoming an ever more powerful tool and resource for many different reasons. 

What is Big Data?

There were a couple of different explanations and opinions offered, but they all agreed on one thing – this is data that is too large to be dealt with without the aid of additional software, code or algorithms.

But the size of the data isn’t the only problem: it might also have a massive range of data points, or come from a huge variety of sources. So data scientists have to be quite inventive in the forms of modelling that they use to display it.

The course asked us to consider what data forms we use, how we use them and what might change if they were suddenly a hundred or a thousand times larger. I talked briefly in the comments about working with venue box office data, and the need to consider how much data, and what variety of data, is attached to each account.

Some of the case studies shown also tied into my Smart Cities course, such as apps that empower citizens to collect big data on topics like the Great Barrier Reef; details can be found here.