K in KNN: Essence of the Technique

swapnil raj
4 min read · Jan 23, 2021

KNN or K-Nearest Neighbour is one of the simplest algorithms for classification as well as regression in machine learning.

KNN is a non-parametric method proposed by Thomas Cover, and it is used extensively. The algorithm itself is very simple, but the main problem is deciding “K”, the number of neighbours we want to take into account.

Working with KNN

KNN is a simple algorithm: it takes a query point and finds the K training points that are closest to it. From those K neighbours, KNN decides the output for the query point, using a majority vote for classification or the average of the neighbours' values for regression.
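To make this concrete, here is a minimal sketch using scikit-learn's KNeighborsClassifier on a tiny made-up dataset (the data and the choice of K=3 are assumptions for illustration, not the data used later in this article):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data (made up for illustration): 2-D points with binary labels.
X_train = np.array([[1, 2], [2, 1], [2, 3], [6, 5], [7, 7], [8, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Fit a KNN classifier with K = 3 neighbours.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# For a query point, the prediction is the majority class among its 3 nearest neighbours.
query = np.array([[3, 2]])
print(knn.predict(query))     # -> [0]
print(knn.kneighbors(query))  # distances and indices of the 3 nearest training points
```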

Why is K important?

Coming to the main point of this article: why is K important? We know that the output for the query point depends on the labels of its neighbours.

The value of K changes the decision curve of our model significantly. Taking a very small K makes the model overfit, which in turn makes it prone to outliers, while taking a very high K makes the model underfit.

Now we will look at some plots with different K values and see how the decision curve changes as K changes.

Here we took some uniformly distributed demo data and applied KNN with different K values to compare the results.
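The exact demo data from the article isn't provided, so the sketch below generates comparable uniformly distributed points (with a little label noise as an assumption) and plots the decision regions for several K values using scikit-learn and matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Uniformly distributed demo data (an assumption; the article's exact dataset isn't given):
# points in [0, 10] x [0, 10], labelled by which half of the plane they fall in,
# with a little label noise so some points land on the "wrong" side.
X = rng.uniform(0, 10, size=(200, 2))
y = (X[:, 0] > 5).astype(int)
noise = rng.random(len(y)) < 0.1
y[noise] = 1 - y[noise]

# Plot the decision regions for a few K values on a grid covering the plane.
xx, yy = np.meshgrid(np.linspace(0, 10, 300), np.linspace(0, 10, 300))
grid = np.c_[xx.ravel(), yy.ravel()]

for k in (1, 5, 15, 30, 50):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    zz = knn.predict(grid).reshape(xx.shape)
    plt.contourf(xx, yy, zz, alpha=0.3, cmap="coolwarm")
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolor="k", s=20)
    plt.title(f"KNN decision regions, K = {k}")
    plt.show()
```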

Given a query point, if it lies in the blue region then the output will be the blue class, and if it lies in the red region then its class label will be red.

One thing we can notice is that a single blue point at x=5 makes a whole region blue, even though there is a high probability that the point is an outlier.

Here we can see that when we increase K and take K=5, the decision plane is much less affected by a single point.

Here the decision plane seems to be more accurate when compared to the previous case.

Now, let us increase our K value even more to see the results.

For K=15 we can observe a bit of underfitting. The decision boundary splits the plane into two parts; in the blue region we can still see some red area, but otherwise there is a nearly straight line between the two regions.

Now let's increase the K value even more.

As we increase K further, we can notice increasing underfitting of the data. We can easily see many blue points getting classified as red and vice versa.

This decision boundary is even worse than the previous one, where we took K=30, as in this case we have even more underfitting.

Now let's increase the K value even more.

Here, when we take K=50, we get a decision plane that fully divides the space into two parts. We can see two completely separate regions.

We can see here that even if there is a bunch of blue points together in the red area, they still don't have much effect on the decision boundary.

At x=6 there is a bunch of blue points, but they are still classified as red, and any point lying there in the future will also be classified as red, even though there is a high chance it should be blue.

Underfitting can be easily observed here, and we can see many points being misclassified.

This is how the decision plane changes as we change the K value, so choosing K carefully is the most important step when using KNN. One common way to pick it is cross-validation, as sketched below.
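As a rough sketch of how one might choose K in practice, the snippet below runs 5-fold cross-validation over a range of K values on the same kind of demo data as above (the data generation and the K range are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Demo data similar to the sketch above (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = (X[:, 0] > 5).astype(int)

# Evaluate a range of K values with 5-fold cross-validation and keep the best one.
scores = {}
for k in range(1, 52, 2):  # odd K values avoid ties in binary voting
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"best K = {best_k}, CV accuracy = {scores[best_k]:.3f}")
```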

Credits to AAIC(https://www.appliedaicourse.com/course/11/applied-machine-learning-online-course).
