Posts

Showing posts from September, 2020

Project 3: Movie Recommendation using Python

Basically, there are two types of recommendation system: content-based filtering and collaborative filtering. You can look these up to learn more; I am here mainly to put down the projects and code. All we do is predict a movie for the customer using angular distance. (Similarity can be measured in two ways, Euclidean distance or angular distance, and you have to decide which one suits the problem.) I think this much description is well and good, so let's carry on with the code. Are you ready?

This is the basic code to get the matrix of text relations, that is, the similarity between the word counts of two sentences.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

text = ["London Paris London", "Paris Paris London"]
cv = CountVectorizer()
cv_matrix = cv.fit_transform(text)
# print(cv_matrix.toarray())
similirity_scores = cosine_similarity(cv_matrix)
print (simi
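To make the choice between the two distances concrete, here is a minimal sketch in plain Python (no sklearn) that compares them on the word counts of the two example sentences. The count vectors [2, 1] and [1, 2] assume the vocabulary order (london, paris); the helper names are made up for this illustration and are not part of any library.

```python
import math

def cosine_similarity_score(a, b):
    # cosine of the angle between two count vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # straight-line distance between two count vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# word counts in the assumed order (london, paris)
sentence_1 = [2, 1]   # "London Paris London"
sentence_2 = [1, 2]   # "Paris Paris London"

cos = cosine_similarity_score(sentence_1, sentence_2)
angle = math.degrees(math.acos(cos))
dist = euclidean_distance(sentence_1, sentence_2)
print("cosine similarity:", round(cos, 3))
print("angle in degrees:", round(angle, 2))
print("euclidean distance:", round(dist, 3))

# the same sentence repeated twice: same direction, larger counts
doubled = [4, 2]
print("cosine vs doubled:", round(cosine_similarity_score(sentence_1, doubled), 3))
print("euclidean vs doubled:", round(euclidean_distance(sentence_1, doubled), 3))
```

The last two lines show why angular distance is often preferred for text: repeating a sentence changes its length but not its direction, so the cosine similarity stays at about 1 while the Euclidean distance grows.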

Project 2: PDF extractor using Python

Let us prepare a short project to extract the full text of a PDF. First install the package: pip install PyPDF2

code:

# note: these are the PyPDF2 1.x/2.x names; PyPDF2 3.0 replaced them with
# PdfReader, reader.metadata, len(reader.pages), reader.pages[i] and extract_text()
from PyPDF2 import PdfFileReader

# open the PDF file in binary read mode ('rb')
file = open("Handbook.pdf", 'rb')
# reader is the variable used to read the file
reader = PdfFileReader(file)
# get the info of the PDF document
print("document info:", reader.getDocumentInfo())
print()
# getNumPages() returns the number of pages in the PDF
print("number of pages are:", reader.getNumPages())
# keep the page count in the variable "pages" to loop over every page
pages = reader.getNumPages()
for i in range(0, pages):
    print("page number=", i+1)
    pageObj = reader.getPage(i)
    print(pageObj.extractText())
print()
print(reader.getDocumentInfo().creator)
file.close()

K nearest neighbors with a well-defined k value

So let us understand how we can choose the best k for our model. For the last model I had prepared a function:

def regression(model):
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    reg_all = model
    reg_all.fit(x_train, y_train)
    y_predict = reg_all.predict(x_test)
    rmse_value = np.sqrt(mean_squared_error(y_test, y_predict))
    print("rms error={}".format(rmse_value))

I have also prepared a cross-validation version that reports the mean RMSE. Here k=3 is the number of folds, so the mean is taken over three RMSE values. Lasso is a way to counteract overfitting (we can also use Ridge) to check.

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Lasso

def regression_cv(model, k=3):
    # scores are negative mean squared errors, one per fold
    scores = cross_val_score(model, x, y, scoring='neg_mean_squared_error', cv=k)
    rmse = np.sqrt(-scores)
    print('reg rmse:', rmse)
    print('reg mean:', rmse.mean())

Importing KNN and using the function with the KNN regressor:

from sklearn.neighbors import K
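As a rough illustration of the idea, the sketch below compares several neighbor counts by their mean cross-validated RMSE and keeps the smallest. It is deliberately sklearn-free so it runs on its own: the tiny one-feature KNN regressor, the interleaved 3-fold split, and the synthetic data are all assumptions made for this example, not the post's actual model or data.

```python
import math

def knn_predict(train_x, train_y, query, n_neighbors):
    # average the targets of the n_neighbors training points closest to the query
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)

def cv_rmse(x, y, n_neighbors, folds=3):
    # mean RMSE over `folds` held-out parts (every folds-th point forms one part)
    fold_rmse = []
    for f in range(folds):
        test_idx = [i for i in range(len(x)) if i % folds == f]
        train_idx = [i for i in range(len(x)) if i % folds != f]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        sq_err = [
            (knn_predict(train_x, train_y, x[i], n_neighbors) - y[i]) ** 2
            for i in test_idx
        ]
        fold_rmse.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(fold_rmse) / folds

# synthetic data: a straight line plus a small deterministic wobble
x = [float(i) for i in range(30)]
y = [2.0 * xi + ((i * 7) % 5 - 2) for i, xi in enumerate(x)]

# try several neighbor counts and keep the one with the lowest mean RMSE
scores = {k: cv_rmse(x, y, k) for k in (1, 3, 5, 7)}
best_k = min(scores, key=scores.get)
for k, value in scores.items():
    print("neighbors =", k, "mean rmse =", round(value, 3))
print("best number of neighbors:", best_k)
```

With sklearn, the same loop could call a function like regression_cv once per candidate neighbor count and compare the reported means.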

Packt publication linear regression

code:

'''
step 1: import all the required packages
step 2: read the CSV file and drop the null values
step 3: declare x & y, i.e. the independent and target values
step 4: split the data set into train and test values
step 5: call LinearRegression and fit the x & y train values
step 6: predict y for the x test values and find the RMSE value
The RMSE value is useful for understanding the accuracy of our model.
'''
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

In [20]: df_housing.head()
Out[20]:
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD  TAX  PTRATIO      B  LSTAT   MEDV
0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296     35.3  396.9   4.98  24.00
1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8  396.9   9.14  21.60
2  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8  396.9   9.14  21.61
3  0.02731   0.0   7.07     0  0.4
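The six steps can be sketched end to end on a toy data set. This is a minimal plain-Python version under stated assumptions: a single feature, a hand-written least-squares fit standing in for LinearRegression, a simple 80/20 split by position standing in for train_test_split, and made-up rows rather than the housing data.

```python
import math

def fit_line(xs, ys):
    # ordinary least squares for one feature: returns (slope, intercept)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(actual, predicted):
    # root mean squared error between two equal-length lists
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# step 2: toy rows of (x, y); one row has a missing target and is dropped
rows = [(1.0, 3.0), (2.0, 5.0), (3.0, None), (4.0, 9.0), (5.0, 11.0), (6.0, 13.0)]
clean = [(x, y) for x, y in rows if x is not None and y is not None]

# steps 3 and 4: first 80% of the clean rows for training, the rest for testing
split = int(len(clean) * 0.8)
train, test = clean[:split], clean[split:]

# step 5: fit on the training rows
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

# step 6: predict the test targets and measure the error
predicted = [slope * x + intercept for x, _ in test]
error = rmse([y for _, y in test], predicted)
print("slope:", slope, "intercept:", intercept, "rmse:", error)
```

Because the toy targets follow y = 2x + 1 exactly, the fitted line recovers that relationship and the test error is essentially zero; real data would leave a nonzero RMSE.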

Image editor using Python

Install the packages opencv and numpy.

import cv2
import numpy as np

num_down = 2          # number of downsampling steps
num_bilaterial = 7    # number of bilateral filter passes

img_rgb = cv2.imread("animesh.jpg")
print(img_rgb.shape)
img_rgb = cv2.resize(img_rgb, (400, 400))

# downsampling, bilateral filter
img_color = img_rgb
for _ in range(num_down):
    img_color = cv2.pyrDown(img_color)
for _ in range(num_bilaterial):
    img_color = cv2.bilateralFilter(img_color, d=9, sigmaColor=9, sigmaSpace=7)
for _ in range(num_down):
    img_color = cv2.pyrUp(img_color)

# editing tools
# image to gray scale (cv2.imread returns BGR channel order, so convert from BGR)
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
# blurring the image
img_blur = cv2.medianBlur(img_gray, 9)
# thresholding
img_edge = cv2.adaptiveThreshold(img_blur, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, blockSize=9, C=6)
# converting back to color
img_edge = cv2.cvtColor(img_edge, cv2.COLOR_GRAY2RGB)
# performing bitwise AND
img_cartoon = cv2.bitwise_and(img_color, img_edge)
#cv2.imshow("cartoon",img_ca

Scatterplot / violin plot / histogram / boxplot

Dataset address: https://data.gov.uk/dataset/bb3520e6-dd76-46d9-8bdd-86f0a2178be9/organogram-of-staff-roles-salaries/datafile/cb432dfd-a5eb-4eaa-8523-961601d5601b/preview#organogram (save the dataset as uk_statistic).

import pandas as pd
import matplotlib.pyplot as plt

uk = pd.read_csv("uk_statistic")

boxplot:

x = uk['Salary Cost of Reports (£)']
y = uk['Actual Pay Floor (£)']
plt.boxplot(x)
plt.title("UK Statistics")
plt.xlabel('salary cost of Report')
plt.ylabel('Actual pay floor')
plt.show()

violin plot:

x = uk['Salary Cost of Reports (£)']
#y=uk['Actual Pay Floor (£)']
plt.violinplot(x)
plt.show()

histogram:

title = 'UK Statistics'
plt.figure(figsize=(10,6))
plt.hist(uk['Actual Pay Floor (£

Replacing a null value with a mean/0/median and Heatmap

Replacing a null value with the mean

df_housing["AGE"] = df_housing["AGE"].fillna(df_housing["AGE"].mean())
df_housing["AGE"]

Replacing a null value with 0

df_housing["AGE"] = df_housing["AGE"].fillna(0)

Replacing a null value with the median

df_housing["AGE"] = df_housing["AGE"].fillna(df_housing["AGE"].median())

Correlation

Correlation is a statistical measure between -1 and +1 that indicates how closely two variables are linearly related. A correlation of -1 or +1 means that the variables are completely dependent, and they fall in a perfectly straight line. A correlation of 0 indicates no linear relationship: an increase in one variable tells you nothing about whether the other tends to rise or fall. Visually, this would be points all over the place. Correlations usually fall somewhere in the middle. For instance, a correlation of 0.75 represents a fairly strong relationship, whereas a correlation of 0.25 is a reasonably weak relationship.
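To see where these numbers come from, here is a minimal sketch that computes the Pearson correlation coefficient by hand in plain Python. The function name pearson and the small lists are invented for this illustration; with pandas, a call such as df_housing.corr() would produce the same measure for every pair of columns.

```python
import math

def pearson(xs, ys):
    # Pearson correlation: covariance divided by the product of the spreads
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

xs = [1.0, 2.0, 3.0, 4.0]
rising = [2.0, 4.0, 6.0, 8.0]      # perfectly straight line going up
falling = [8.0, 6.0, 4.0, 2.0]     # perfectly straight line going down
curved = [1.0, -1.0, -1.0, 1.0]    # symmetric U shape, no overall trend

print("rising:", round(pearson(xs, rising), 3))
print("falling:", round(pearson(xs, falling), 3))
print("curved:", round(pearson(xs, curved), 3))
```

The curved case is worth noting: its values are fully determined by xs, yet the coefficient comes out as 0 because the measure only captures straight-line trends.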