Blaugust 14: FutureLearn Big Data Notes Week One

The first week of this course focused not just on what big data is, but on some common tools used to handle it and how they work, in particular SQL, Hadoop and MapReduce.
From the course notes:

  • SQL is very popular for storing data, but targets structured data by design.
  • Hadoop can deal with unstructured data, such as text, by providing a more general paradigm than SQL.

Hadoop can use multiple sources for its data.

MapReduce, unlike SQL, allows you to specify the steps that you take to get to processing the data, which allows you to be more flexible in your approach to large-scale data sets. 
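To make that idea of "specifying the steps" a bit more concrete, here is a minimal sketch of a word count written in the MapReduce style. It is plain Python rather than Hadoop's actual API, and the function names (map_step, shuffle_step, reduce_step, word_count) are made up for illustration. On a real cluster the framework would handle the grouping stage and run the map and reduce steps across many machines.

```python
from collections import defaultdict

def map_step(line):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_step(pairs):
    """Shuffle: group all values by key (done by the framework in practice)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_step(key, values):
    """Reduce: collapse the list of counts for one word into a total."""
    return key, sum(values)

def word_count(lines):
    """Run the three steps in order over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_step(line)]
    grouped = shuffle_step(pairs)
    return dict(reduce_step(key, values) for key, values in grouped.items())

# Hypothetical input, just to show the shape of the result.
counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

The point of the sketch is that each step is written out explicitly, whereas in SQL you would describe the result you want (a grouped count) and leave the steps to the database.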

This was a useful article in explaining how Hadoop’s data processing functions.

It was also interesting to be given examples in the course notes of where Hadoop is used – Amazon being one. 

This was also a good article, which helped me make more sense of what this system is capable of and why it is used.

The Hadoop platform was designed to solve problems where you have a lot of data — perhaps a mixture of complex and structured data — and it doesn’t fit nicely into tables.

The next bit talked about the Hadoop Distributed File System. Obviously, when large-scale companies (ecommerce and so on) are using it, the data will be spread over many servers, so, to quote the course material:

Hadoop breaks incoming files into blocks and stores them redundantly across the cluster.
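As a rough sketch of what that sentence means, the toy example below splits some data into fixed-size blocks and assigns each block to several nodes. Everything here is an assumption for illustration: the function names, the tiny block size, the node names and the simple round-robin placement are mine, and the real HDFS uses much larger blocks and a smarter placement policy.

```python
def split_into_blocks(data, block_size):
    """Cut the incoming bytes into consecutive blocks of at most block_size."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(num_blocks, nodes, replication):
    """Assign every block to `replication` distinct nodes, round-robin.

    Round-robin is a simplification for this sketch, not HDFS's own policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)]
            for offset in range(replication)
        ]
    return placement

# Hypothetical cluster of four nodes, three copies of each block.
nodes = ["node1", "node2", "node3", "node4"]
blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_blocks(len(blocks), nodes, replication=3)

for block_id, holders in placement.items():
    print(block_id, blocks[block_id], holders)
```

Because every block lives on more than one node, losing a single server does not lose any part of the file, which is the "redundantly" part of the quote.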

I copied and annotated the diagram used into the back of my diary as I was killing time at a local library and it was the only paper I had, so I figured I’d save it here too!


The next few steps were a practical task involving Cloudera, which I’ll be learning how to use.

Blaugust 8: Starting with Big Data

Now that my Social Media Analysis course is over (though I do have a retrospective post and a post round-up still to do) I’m embarking on a new short two-week course – Big Data: From Data to Decisions.
This post is just to cover the first few sections of the week’s course material. Much of the rest will depend on certain tasks and software that can’t be done via an iPhone and airport wifi, so I’ll be aiming for a post later on about that.

Social Media Analysis focused on working with Big Data in the specific context of Twitter, but as my career progresses I’ll need to be able to analyse and understand the significance of a whole range of large-scale data sets. Data is becoming an ever more powerful tool and resource for many different reasons. 

What is Big Data?

There were a couple of different explanations and opinions offered, but they all agreed on one thing – this is data that is too large to be dealt with without the aid of additional software, code or algorithms.

But the size of the data isn’t the only problem: it might have a massive range of points, or come from a huge variety of sources. So data scientists have to be quite inventive in the forms of modelling that they use to display it.

The course asked us to consider what data forms we use, how we use them and what might change if they were suddenly a hundred or a thousand times larger. I talked briefly in the comments about working with venue box office data, and the need to consider how much data, and what variety of data, is attached to each account.

Some of the case studies shown also tied into my Smart Cities course, empowering citizens via apps to collect big data on topics such as the Great Barrier Reef; details can be found here. http://www.gbrmpa.gov.au/eye-on-the-reef/f?p=150:LOGIN:31299941098770:::::

Blaugust 6: More Internet of Things – Thingful and Manchester Smart Car charges

In my Smart Cities FutureLearn course, we were encouraged to check out where smart devices could be found in our home cities, using a site called Thingful.

Thingful pointed out something in Manchester that I knew about but hadn’t even thought of in relation to this course – smart charge points for electric cars. Manchester and the surrounding boroughs have invested heavily in this, and according to http://ev.tfgm.com/charging.html, people who use these can download a smartphone app to track both payments and energy usage.

Satellite image of Manchester overlaid with Smart device points

After a lot of theory, it was really interesting to see some smart technology in action, and to realise that yes, these ideas were already being put into practice in really simple but effective ways.

You can see the full map of Manchester through Thingful in this link.

Blaugust 5 – A Connected Household Future

A little while ago, I posted a blog on smart household devices. Today I wanted to share a few thoughts I had on how these might develop into entire smart connected households.

In my Smart Cities course discussions on The Internet of Things, I posted this comment.

Earlier in the course, we talked a lot about open data and open software. I follow a number of programmers and software devs on Twitter who are interested in IoT, and have hacked their smart devices to create custom home networks. If coding starts to become something more commonly taught in schools across the world (which I know a lot of governments are very keen on), maybe in a few decades, it will be pretty standard for households to build their own efficient networks, adapted to the people that live there. Of course, upcoming smart devices will need to not be too locked in to proprietary software, I guess.

I know that systems already exist for multiple household items, as well as heating, lighting etc., to be integrated, and it’s possible that in the future, households will be able to entirely customise their smart homes to suit their own particular needs. Much of the OU’s smart cities course has focussed on how smart technology needs to prioritise energy conservation and lowering pollution for a more sustainable future, and customisable smart homes would be a good step towards that.

FutureLearn Smart Cities Week 4: Civic Hacking

I’m into week four of my Smart Cities course, and suddenly I’ve hit the point where I’m starting to get seriously inspired. This week, the course materials discussed the idea of ‘civic hacking’.

According to the course notes: “a civic hacker is a person who collaborates with others to build open source solutions using publicly released data, code and technology.” The examples given were Open Data Day, Code for Europe, and Code for America.

In the course comments sections, a lot of people have brought up the issues of getting governments and local authorities interested in solving smart problems on a small scale, and how to ensure that citizens are involved. There’s also the concern of large-scale ‘experiments’ that could easily fail. Civic hacking projects, by contrast, are small-scale and low-risk ideas that could be slowly built on if they prove workable.

I’ve copied my comments on course hackathons below. I’ve decided to start archiving them here, so that I can remind myself and not have to trail back through FutureLearn for them!

What I love about this is that, rather than putting together a large scale and potentially expensive project at government level that might or might not work, people can throw dozens of ideas at the wall and see what starts to stick. People will be less scared to suggest crazy-sounding solutions, and encouraged to think outside the box, rather than within the budget.

The one thing that these hackathons do rely on is easily accessible open data sources from local authorities. Manchester City Council has announced its dedication to becoming a smart city through open data, and despite some quite interesting projects coming out of this, such as the Open Data Infrastructure Map, a trawl through their open data catalogue shows that a lot of the content is either missing or out of date, which is disappointing. Anyway, I’ve signed up to a newsletter from Open Data Manchester to see what interesting projects might be in progress around the city.

 

Blogpril April 8th Google Maps – Past and Future

Google Maps. Google Street View. They’ve become something that we take for granted, so looking back, it’s strange to consider what a ridiculous undertaking it must have seemed to investors: photographing every metre of road across the entire world, and keeping it constantly updated, through the actually fairly basic method of mounting a couple of cameras on a fleet of cars and just, well, driving them everywhere. Then someone has to map that onto a satellite image that covers the entire world in minute detail.

With Google at the forefront of so many technologies, it will be interesting to see both how they progress in mapping, and what other functions can come out of Maps. It’s already possible, for example, to see past versions of some places, mostly using historical imagery. Given the number of times that Google has updated Street View in some places since it started, it would be really interesting in the future to be able to see every version of a place as it develops (perhaps even as a timelapse?). Combined with advances in virtual reality technology, it could be amazing to see.