K-Nearest Neighbors with a Well-Defined k Value
So let us understand how we can choose the best k for our model. From the last post, I had prepared a function:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def regression(model):
    # hold out 20% of the data; x and y are the features/target from the earlier post
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    model.fit(x_train, y_train)
    y_predict = model.predict(x_test)
    rmse_value = np.sqrt(mean_squared_error(y_test, y_predict))
    print("rmse = {}".format(rmse_value))
I have also prepared a cross-validation function to get the mean RMSE, where k=3 means the RMSE is averaged over three folds. Lasso is a way to counteract overfitting (we can also use Ridge) to check:
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Lasso

def regression_cv(model, k=3):
    # sklearn returns negative MSE, so negate the scores before taking the square root
    scores = cross_val_score(model, x, y, scoring='neg_mean_squared_error', cv=k)
    rmse = np.sqrt(-scores)
    print('reg rmse:', rmse)
    print('reg mean:', rmse.mean())
Importing KNN and using KNeighborsRegressor:
from sklearn.neighbors import KNeighborsRegressor
regression(KNeighborsRegressor())
By default the k value (n_neighbors) is 5; we can change it to 6, 7, and so on and check the score:
regression_cv(KNeighborsRegressor(n_neighbors=10))
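A quick way to try a handful of k values by hand before automating the search (a sketch reusing the regression_cv helper above):

for k_val in [5, 6, 7, 10]:
    # print the current k so the fold scores below are easy to attribute
    print('n_neighbors =', k_val)
    regression_cv(KNeighborsRegressor(n_neighbors=k_val))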
GridSearchCV is used to try every k value from 1 to 20 (20 evenly spaced integers) and pick the one with the best cross-validated score:
from sklearn.model_selection import GridSearchCV

# candidate k values: the integers 1 through 20
neighbors = np.linspace(1, 20, 20)
k = neighbors.astype(int)
param_grid = {'n_neighbors': k}
knn = KNeighborsRegressor()
knn_tuned = GridSearchCV(knn, param_grid, cv=5, scoring='neg_mean_squared_error')
knn_tuned.fit(x, y)
Printing the best k and its score:
k = knn_tuned.best_params_
print("best n_neighbors = {}".format(k))
score = knn_tuned.best_score_  # best mean neg-MSE across the folds
rmse = np.sqrt(-score)
print("best rmse = {}".format(rmse))
Here's the output: