I have written this two-part blog to explain the technical aspects of machine learning in layman’s terms. For part 1 of this series click here.
In the last blog we went through a short introduction to machine learning and why it is important. I suggest you read it first to get the most out of this one.
This time we will dive into a more technical introduction to the types of machine learning problems. Let’s look closely into the situations you might come across when you are trying to build your own predictive model.
Imagine you own a bakery. Your business seems to be popular with all types of customers: kids, teens, adults. But you want to know whether people truly like your bakery or not. That can depend on many things (e.g., the order they place, their age, their favourite flavour, suggestions from their family or friends). These are the predictor variables that influence the answer. But the answer you are looking for is a simple Yes or No: do people like your bakery or not? This type of machine learning problem is known as classification. Sometimes there are more than two categories, for example how much people like your bakery (Very much, Quite a bit, Not at all). When the categories have a natural order like this, the problem is called ordinal classification. The ordered categories can also be coded as 1, 2 or 3, but remember that this is not the same as regression (see below): the numbers are only labels for ordered categories, not measured amounts.
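To make the Yes/No idea concrete, here is a minimal sketch of one simple classification technique, a nearest-neighbour vote. It is only an illustration: the customer data is made up, the two predictors (age and items per order) are my assumption, and `predict_likes_bakery` is a hypothetical helper name, not part of any library.

```python
from collections import Counter
import math

# Made-up illustration data: (age, items per order) -> does the customer like the bakery?
customers = [
    ((8, 1), "Yes"),
    ((15, 2), "Yes"),
    ((34, 3), "Yes"),
    ((41, 1), "No"),
    ((52, 1), "No"),
    ((29, 4), "Yes"),
    ((63, 1), "No"),
]

def predict_likes_bakery(new_customer, k=3):
    """Classify by majority vote among the k most similar known customers."""
    # Sort known customers by straight-line distance to the new customer.
    by_distance = sorted(customers, key=lambda row: math.dist(row[0], new_customer))
    # Count the Yes/No labels of the k closest ones and return the winner.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# A new 12-year-old customer who usually orders 2 items.
print(predict_likes_bakery((12, 2)))
```

The output is always one of the class labels ("Yes" or "No"), never a number in between. A real model would use far more data and would usually rescale the predictors first so that age does not dominate the distance.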
You are the owner of the same bakery, but this time you want more than a classification answer. You want to go straight to the target and find out how much a customer might spend, based on their historical data. You are now looking for an answer measured on a numerical scale. It can range anywhere from £5 to £15 per visit. Imagine that every time a new customer walks into your bakery, you see the amount they are most likely to spend floating above their head. This is a regression situation.
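Here is a small sketch of the regression idea, using a straight line fitted by least squares. Everything specific in it is assumed for illustration: the data is made up, the single predictor (visits per month) is my choice, and `fit_line` and `predict_spend` are hypothetical helper names.

```python
# Made-up illustration data: visits per month -> average spend per visit (in £).
visits = [1, 2, 3, 4, 5]
spend = [6.0, 7.5, 9.0, 10.5, 12.0]

def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope: how x and y vary together, divided by how much x varies on its own.
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(visits, spend)

def predict_spend(n_visits):
    """Predicted spend per visit (£) for a customer with n_visits per month."""
    return intercept + slope * n_visits

# The "number floating above the head" of a customer who visits 6 times a month.
print(predict_spend(6))
```

Unlike the classification sketch, the answer here is a number on a continuous scale rather than a label.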
You don’t know what you want to know. You just want to find out whether there are groups of customers who are likely to act in a particular way. Do little kids always go for the cupcakes with cartoon characters? Do young teens with their girlfriend or boyfriend go for the heart-shaped ones? You want the data to frame the question and answer it. We are looking for patterns, groups or clusters in the data. This is the clustering problem.
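As a rough sketch of how groups can be found without any labels, here is a bare-bones version of k-means, one common clustering technique. The data is made up, the two measurements (age and spend per visit) and the choice of three groups are my assumptions, and `k_means` is a hypothetical helper written for this post.

```python
import math

# Made-up illustration data: (age, spend per visit in £). Note there are no labels.
customers = [
    (6, 5.0), (7, 5.5), (8, 6.0),
    (16, 9.0), (17, 9.5), (18, 10.0),
    (40, 14.0), (42, 14.5), (45, 15.0),
]

def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly moving k centres."""
    # Start from k points spread evenly through the list (a simplification;
    # real implementations usually pick random or smarter starting centres).
    step = len(points) // k
    centres = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Step 2: move each centre to the mean of its cluster
        # (an empty cluster keeps its old centre).
        centres = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters

groups = k_means(customers, k=3)
for group in groups:
    print(group)
```

Nobody told the algorithm which customers are kids, teens or adults. It only sees the numbers and groups together the customers that sit close to each other, and it is then up to us to interpret what each group means.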
In the first two situations (classification and regression) we have a question framed, and we have a set of predictors that we think might influence the answer to that question. This type of machine learning is known as supervised learning. In the third situation (clustering) we did not have any question in mind; we were simply looking for patterns or groups in the data. This is known as unsupervised learning.
- Classification: supervised machine learning method where the output variable takes the form of class labels.
- Regression: supervised machine learning method where the output variable takes the form of continuous values.
- Clustering: unsupervised machine learning method where we group a set of objects to find out whether there is some relationship between them.