K-Nearest Neighbors with a well-defined k value

So let us understand how we can choose the perfect k for our model.

From the last model I had prepared a helper function:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def regression(model):
    # hold out 20% of the data as a test set
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    reg_all = model
    reg_all.fit(x_train, y_train)
    y_predict = reg_all.predict(x_test)
    # root mean squared error on the held-out test set
    rmse_value = np.sqrt(mean_squared_error(y_test, y_predict))
    print("rmse={}".format(rmse_value))

I have also prepared a cross-validated version to get the mean of the RMSE, where k=3 means the RMSE is computed on three folds and the three values are averaged.

Lasso is a way to counteract overfitting (we can also use Ridge) as a check; see the example after the function below.

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Lasso

def regression_cv(model, k=3):
    # cross_val_score returns negative MSE, so negate it before the square root
    scores = cross_val_score(model, x, y, scoring='neg_mean_squared_error', cv=k)
    rmse = np.sqrt(-scores)
    print('reg rmse:', rmse)
    print('reg mean:', rmse.mean())
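As a quick check, the function can be called with Lasso or Ridge (a minimal sketch; the alpha values here are arbitrary illustrative choices, not tuned):

from sklearn.linear_model import Ridge

# alpha sets the regularization strength; 0.1 is just an example value
regression_cv(Lasso(alpha=0.1))
regression_cv(Ridge(alpha=0.1))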

Importing KNN and using KNeighborsRegressor:

from sklearn.neighbors import KNeighborsRegressor

regression(KNeighborsRegressor())

By default the k value (n_neighbors) will be 5; we can change it to 6, 7, and so on and compare the scores, as shown below.
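For example (a minimal sketch; these k values are just illustrative choices to compare):

regression(KNeighborsRegressor(n_neighbors=6))
regression(KNeighborsRegressor(n_neighbors=7))

The same works with the cross-validated version: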

regression_cv(KNeighborsRegressor(n_neighbors=10))

GridSearchCV is used to try k values from 1 to 20 (20 evenly spaced integers) and pick the one with the best cross-validated score.

from sklearn.model_selection import GridSearchCV

# candidate k values: the integers 1 through 20
neighbors = np.linspace(1, 20, 20)
k = neighbors.astype(int)
param_grid = {'n_neighbors': k}

knn = KNeighborsRegressor()

# 5-fold cross-validation over every candidate k
knn_tuned = GridSearchCV(knn, param_grid, cv=5, scoring='neg_mean_squared_error')
knn_tuned.fit(x, y)

Printing the best k and the corresponding score:

k = knn_tuned.best_params_
print("best n_neighbors={}".format(k))

# best_score_ is the negative MSE of the best k, so negate it before the square root
score = knn_tuned.best_score_
rmse = np.sqrt(-score)
print("best score={}".format(rmse))

Here's the output:



