Why do we need Machine Learning?

I have written this two-part blog series to explain the technical aspects of machine learning in layman’s terms. For part 2 of this series, click here.

It is now the age of data. For many years, we humans have been collecting and collating data for all sorts of purposes. When I was a kid, I loved borrowing books from my neighborhood lending library in Chennai, India. It was a tiny place stacked with aisles of books, and it was where I got my hands on my first Harry Potter book. Every time I borrowed a book, the librarian would pull out a heavy bundle of papers and go through them to find my library number. Once he found my sheet, he would jot down the title of the book and the date. Back then (keep in mind this was the 90s), I used to wonder how the poor librarian worked out which customers owed late fees. It must have been a nightmare to go through all those papers one by one, checking whether each book’s due date had passed.


The 90s are gone, and the 00s bring computers. I visit the same library and see the stacks of paper replaced by a bulky white computer. I borrow a book, the librarian enters the book ID and customer ID into the computer, and then it is magic! All late payments are tracked and everything is perfect. The librarian no longer has to go through stacks and stacks of data on weekends. The librarian is happy.

Say hello to the 2010s. What now? Other libraries are popping up closer to his. Imagine these are not council libraries lending books for free, but privately owned libraries making a good profit by lending books to people. New libraries mean competition for his business. And with the advent of the Kindle, fewer people want to own and read physical books, which means fewer people will visit his library. What use is the computer now that things are becoming more digital? What does the librarian have that will draw the right customers to his library?

The answer lies in data. Imagine he has access to all his previous and current customers’ data. He has collected information on his customers’ profiles (age, gender, address, education, qualifications, etc.). He also has information on each book’s profile. Has a particular book been more popular than others? Does age affect the genre of book a customer chooses?
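For readers curious what "asking the data" might look like in practice, here is a minimal Python sketch of the age-and-genre question. The borrowing records, the age cut-offs in `age_group` and the function names are all hypothetical, invented purely for illustration. It simply counts which genres each age group borrows. That is plain counting rather than machine learning, but it is often the first step before building a model.

```python
from collections import Counter, defaultdict

# Hypothetical borrowing records: (customer_age, book_genre).
# These values are made up for illustration; they are not real library data.
borrowings = [
    (9, "fantasy"), (12, "fantasy"), (14, "mystery"),
    (27, "mystery"), (34, "biography"), (41, "mystery"),
    (58, "biography"), (63, "biography"), (70, "history"),
]


def age_group(age):
    """Bucket an age into a coarse group (the cut-offs are arbitrary)."""
    if age < 18:
        return "under 18"
    if age < 50:
        return "18-49"
    return "50+"


def genre_counts_by_age_group(records):
    """Count how many books of each genre were borrowed within each age group."""
    counts = defaultdict(Counter)
    for age, genre in records:
        counts[age_group(age)][genre] += 1
    return counts


for group, genres in genre_counts_by_age_group(borrowings).items():
    top_genre, top_count = genres.most_common(1)[0]
    total = sum(genres.values())
    print(f"{group}: most borrowed genre is {top_genre} ({top_count} of {total})")
```

With real records, a clear difference between age groups would hint that age and genre are related, although a count like this does not by itself show that age causes the choice.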

He can answer many other questions like these using his data. What makes a customer visit his library regularly? Is it the customer’s location, age or gender? Is it the librarian’s choice of books that keeps them coming back? If only he could build a machine that took into account every factor affecting a customer and somehow learned what makes him or her stay. Better still, the machine could predict whether a new customer will stay or leave.

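The answer lies in machine learning. What is machine learning anyway? In simple terms, it is the process of analysing large amounts of information (or data), learning patterns across many variables (such as each customer’s age, location and borrowing history) and using those patterns to make predictions. In the librarian’s case, that means predicting whether a future customer will stay or leave.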
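To make this a little more concrete, below is a minimal sketch of one common technique for a stay-or-leave question, logistic regression, written in plain Python and trained with gradient descent. The two variables (distance from the library and visits per month), the customer data and the function names are hypothetical and made up for illustration. A real project would use far more data and variables, and would typically rely on an established library rather than hand-written code.

```python
import math

# Hypothetical past customers, invented for illustration only.
# Features: (distance_km from the library, visits_per_month).
# Label: 1 if the customer stayed, 0 if they left.
past_customers = [
    ((0.5, 6.0), 1), ((1.0, 4.0), 1), ((2.0, 5.0), 1), ((1.5, 3.0), 1),
    ((6.0, 1.0), 0), ((8.0, 0.5), 0), ((5.0, 2.0), 0), ((7.0, 1.0), 0),
]


def sigmoid(z):
    """Squash any number into a probability between 0 and 1 (numerically stable)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probability_of_staying(features, weights, bias):
    """Weighted sum of the features, turned into a probability of staying."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(data, learning_rate=0.05, epochs=5000):
    """Fit logistic regression weights with batch gradient descent on log loss."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in data:
            # How far the predicted probability is from the true label.
            error = probability_of_staying(features, weights, bias) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        # Step each parameter against its average gradient.
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias


weights, bias = train(past_customers)

# A hypothetical new customer: lives 1 km away and visits 5 times a month.
p = probability_of_staying((1.0, 5.0), weights, bias)
print(f"Estimated probability of staying: {p:.2f}")
print("Prediction:", "stay" if p >= 0.5 else "leave")
```

The "learning" here is the training loop: it adjusts one weight per variable until the model's predictions match the past customers as closely as it can. Those weights are then reused to score customers the model has never seen.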

Of course, the librarian could simply use some form of marketing to gain more publicity. But imagine a different situation: replace the library with a much bigger organisation. This time there are many more problems to address. There are far more customers to target, scattered all over the country, and many more variables to consider. In this situation we need something more than just advertising. An important point to note is that machine learning does not replace your current operations. It is only a complement, an add-on bonus.

Look at how Amazon has taken the librarian’s problem to the next level, drawing on years of online and offline experience in book sales.



Big Data isn’t just big

Imagine your data is constantly being updated every day. It keeps growing in size. It is messy and unstructured. That, in a nutshell, is big data. Below are the three V’s of big data.


Volume: Your data is huge (e.g. a 5 TB collection of all emails in your company network)

Variety: Your data comes in many forms and is often unstructured (e.g. a collection of Twitter statuses: some with images, some with links, some simply plain text)

Velocity: Your data is continuously flowing (both examples above have high velocity, since new emails and tweets arrive all the time)

My favourite quote on big data goes something like this: “About 80% of the world’s data has been generated in just the last few years.” With the growing demand for data scientists (unicorns who apply statistical modelling to big data), we are heading towards a more tech-savvy future for data and analytics.


Data Visualisation: Open Sourcing Mental Illness

This viz won Tableau Viz of the Day on 31/03/2017, with over 2,000 views. It also won third place in the worldwide #DataForACause competition.

Data For a Cause is an exciting challenge in which participants contribute their data science and visualisation skills to a good cause. A not-for-profit organisation brings a social issue and a relevant dataset, and volunteers spend a week analysing the data to come up with interesting insights or visualisations. You can see their website here.

This time I had the chance to contribute to Open Sourcing Mental Illness (OSMI), working on survey data they had collected about mental illness in tech jobs. I created a data visualisation to raise awareness of this issue among tech organisations. The most significant finding was that, of all the respondents who had mental health issues, nearly half did not seek treatment. The main goal of my visualisation was to make people aware of this issue and encourage them to reach out to OSMI for help.

The interactive version is here.