Open In Colab

Basic Concepts of Machine Learning

Concepts

Human Example

Machine learning is very similar to how humans learn; at the end of the day, its fundamental concepts build on things we already know.

Let's say you have a child and two pieces of fruit: an apple and a banana. The child doesn't know what the red fruit is called; for all they know, it could just as easily be the banana.

So, how do you get the child to understand that the red fruit is the apple? It's pretty simple: you show them the red fruit and say "apple". You show them the yellow fruit and say "banana". If you continue this process again and again, they'll eventually figure out which name belongs with each fruit, noticing that the apple's characteristics always come with the word "apple".

The child is able to figure this out because they get rewarded with a "Good job!" when they get the name of the fruit right. When they get it wrong, they are punished, in a sense, with a "No, you got that wrong." The child wants to be rewarded, so they'll keep doing what they did right before and continue to improve.

How does Machine Learning relate to the human example?

Machine learning does the exact same thing, but with a computer instead of a human.

We refer to this period when we are trying to teach the machine learning program what is right and what is wrong as "training". There are three main forms of training: supervised, unsupervised, and semi-supervised learning.

For the sake of this example, we are going to focus on supervised learning, which is the most common.

Understanding the data

In supervised learning, we have a dataset with features and labels.

Features are the characteristics of whatever you're trying to predict. For the child's apple, the features could be: color (red), shape (round), and firmness (hard).

Labels are what you are trying to predict -- the correct answer. In this case, the label of the apple would be "apple", since we are trying to guess its name.

The algorithm, just like the child, wants to associate the features with the labels. It wants to figure out that the red, round, hard fruit is called an apple.
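To make this concrete, here is a tiny, hypothetical dataset in Python. The feature values and variable names are invented for illustration; real datasets usually have many more entries and numeric features.

```python
# Each row of features describes one fruit; the label at the same
# index is the correct answer we want the model to learn.
features = [
    {"color": "red",    "shape": "round", "firmness": "hard"},
    {"color": "yellow", "shape": "long",  "firmness": "soft"},
]
labels = ["apple", "banana"]

# Supervised learning pairs every feature set with its label:
for fruit, name in zip(features, labels):
    print(fruit, "->", name)
```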

Using the data in our ML model

The algorithm will start off by essentially guessing which labels are associated with the features.

If it gets the label right, then that's great! Using mathematics, it will understand that those features are correlated with the correct label. From then on, it will continue to associate those features with that label.

However, if it gets the label wrong, it will use mathematics to determine, "okay, that actually made my program worse." From then on, it will understand that whatever mathematical patterns were in those features are not associated with that label.

The machine learning model will run through thousands of these cycles (each one called an epoch), seeing how the features and labels connect again and again. Each time, it carries over what it learned from the previous cycles and keeps improving how well it matches the data. Much like the child uses the patterns they learned in the past, the machine learning model does the same.
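The cycle above can be sketched with a toy Python "model" that is just a single number: a redness threshold for telling apples from bananas. The data, the starting guess, and the 0.01 nudge are all made up for illustration; real models adjust many weights at once using calculus, but the loop that carries each correction into later epochs is the same idea.

```python
# Toy data: (redness from 0.0 to 1.0, correct label).
data = [(0.9, "apple"), (0.8, "apple"), (0.2, "banana"), (0.1, "banana")]

threshold = 0.95  # a bad initial guess: it calls almost everything "banana"

for epoch in range(100):              # each full pass over the data is one epoch
    for redness, label in data:
        predicted = "apple" if redness > threshold else "banana"
        if predicted != label:
            # Nudge the threshold toward fixing the mistake; the
            # correction carries over into every later epoch.
            threshold += 0.01 if label == "banana" else -0.01

# After training, the threshold cleanly separates the two fruits.
print(threshold)
```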

Conclusion

Just like humans learn different concepts in all different ways, there are countless different ways these machine learning concepts can be applied. But this is what most models boil down to: you feed data to an algorithm, and it mathematically associates certain features with certain labels. As we tell it whether its prediction was right or wrong, it adjusts and continues doing what we said was right to improve itself over time.

Mathematics behind the concepts

I want to preface this section by saying that the mathematics may be very complicated depending on your familiarity with algebra. It is totally possible to get by with simply understanding the concepts behind machine learning and leaving the mathematics until later. In this course, this is essentially the only time we will be going over the fundamental math, as you don't need to be too familiar with it to succeed at a beginner level.

With that said, if you are enthusiastic about how exactly the computer uses math to determine which features are associated with what labels, and how it learns from its past errors, this section will go over some common ways it does so!

The Loss Function

We know our model has an initial hypothesis about which features are correlated with which labels.

In order to proceed, our model needs to understand how far off its prediction was from the real answer. To do so, we use a loss function.

To be clear, there are a lot of different loss functions you can use: cross-entropy, mean squared error, and more. However, to keep it simple, we're going to focus on one of the most common: mean squared error.

Here is the mean squared error loss function:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_i^p\right)^2$$

At first glance, this looks like a pretty intimidating math equation. However, it's not so bad once we break it down. Let's look at each part of it.

$n$: the number of entries in our dataset

$i$: which entry in the dataset we are on

$y_i$: the y value that our function predicted for entry $i$

$y_i^p$: the true y value from the dataset

  1. First, let's look inside the parentheses. With $y_i - y_i^p$, we subtract the true value from whatever our function predicted, which gives us the difference between the right and wrong answer.

  2. Then, we square this difference. Squaring keeps every difference positive, so errors in opposite directions don't cancel each other out, and it penalizes large errors more heavily than small ones. Don't worry too much about this step.

  3. The sigma sign is the symbol that looks like a fancy "E". Basically, it says we are going to compute $(y_i - y_i^p)^2$ again and again for each entry we have. After we find the squared difference for every entry, we add them all up.

  4. At this point, we've taken the difference between every prediction we made and the real answer, squared it, and added them all together. Now, we divide by the total number of predictions we made.

You may remember that we find the mean, or average, of something by adding all its parts together and dividing by the total number of parts present. That's exactly what we are doing here. We are finding the difference between the model's predictions and the real value, squaring this difference, and finding the average for all predictions.

So, what is our output? At this point, we would have a number that represents, on average, how wrong our model's prediction was compared to the real answer. Our model's fundamental goal is to minimize the output of this loss function, meaning it's the least wrong it can be.
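The formula translates almost line-for-line into Python. This sketch is just the equation above written as a function, with a couple of made-up predictions to show it in use.

```python
def mean_squared_error(predicted, actual):
    """MSE: the average of the squared differences between
    each predicted y value and the true y value."""
    n = len(predicted)
    return sum((y - y_true) ** 2 for y, y_true in zip(predicted, actual)) / n

# Example: one prediction is off by 1, the other is exact.
# Squared differences: 1 and 0; their average is 0.5.
print(mean_squared_error([2.0, 4.0], [3.0, 4.0]))  # 0.5
```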

Using the Loss Function to improve accuracy

Now, we have a target goal in mind. We want to decrease how wrong our machine learning model is by finding the minimum of our loss function.

Let's say we have a graph that plots the loss function's output (the y axis) against the model's weights (the x axis):

The y coordinate of this graph represents the output of our loss function. Remember, our goal is to minimize the loss function and find the lowest value it can have. So on this graph, if we start at the point labeled "initial weights", we want the output of our loss function to end up at the global minimum.

So, how do we do this?

Let's say our machine learning model makes a few predictions. It puts these predictions into the loss function and computes the output. Pretend that output is where the "initial weights" point sits on the graph.

Then, it makes a few more predictions. It puts all of these into the loss function and computes the output. However, let's say the loss function's output is actually higher than it was before. The model knows it's going in the wrong direction, since the output should be going down, not up.

So, it makes even more predictions, changing them until it comes up with a lower loss function output than it started with. Now, the model knows it's doing better, since it's being less wrong.

The model repeats this over and over, thousands of times, testing which direction to go and following the path that lowers the loss function's output.

Eventually, it reaches the global minimum, where the loss cannot get any lower no matter what the model does. We say the model has converged, meaning it is the best it can be.
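In practice, this downhill search is usually automated with gradient descent, which uses the slope of the loss curve to decide which way to step instead of guessing. Here is a minimal sketch on a made-up one-dimensional loss, loss(w) = (w - 3)^2, whose global minimum sits at w = 3; the learning rate and step count are illustrative.

```python
def loss(w):
    return (w - 3) ** 2

def gradient(w):
    # Derivative of the loss: d/dw (w - 3)^2 = 2 * (w - 3).
    return 2 * (w - 3)

w = 0.0                   # the "initial weights" point on the graph
learning_rate = 0.1

for step in range(1000):  # each step moves downhill along the loss curve
    w -= learning_rate * gradient(w)

print(w)  # very close to 3.0: the model has converged at the global minimum
```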