Posts

Showing posts from May, 2021

What is a good dataset?

First things first: we need to learn how to identify a good dataset. An acronym makes it easy to remember: R-O-C-C-C.

R is for reliable. Like a good friend, good data sources are reliable. With this data you can trust that you're getting accurate, complete, and unbiased information that's been vetted and proven fit for use.

O is for original. There's a good chance you'll discover data through a second- or third-party source. To make sure you're dealing with good data, be sure to validate it with the original source.

C is for comprehensive. The best data sources contain all critical information needed to answer the question or find the solution. Think about it like this. You wouldn't want to work for a company just because you found one great

Image clustering technique using K-means

The K-means algorithm is widely used to cluster an image, i.e. to group its pixels by color. K-means forms clusters by assigning each pixel to the nearest cluster center and then moving each center to the mean of the pixels assigned to it. The code below will help you get a proper grip on the idea.

Code:

    import numpy as np
    import cv2
    import matplotlib.pyplot as plt

    original_image = cv2.imread("/content/sample_data/ocen.png")
    original_image

These few lines import the essential libraries and load the image that we want to cluster.

    # OpenCV loads images in BGR order, so convert to RGB first.
    img = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)

    # Next, convert the MxNx3 image into an (M*N)x3 matrix in which each row is a vector in the 3-D space of RGB.
    vectorized = img.reshape((-1, 3))

    # Convert the uint8 values to float32, as required by the k-means method of OpenCV.
    vectorized = np.float32(vectorized)

The criteria tuple sets the termination condition: stop after 10 iterations, or as soon as the cluster centers move by less than epsilon = 1.0.

    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    K = 6
    attempts = 10
    ret,label,center=cv2.kmeans(vectorized,K, N
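
To see what K-means does to the pixel matrix, here is a minimal NumPy-only sketch of the same idea on a small synthetic image. It is not the OpenCV call from the post: the function name `kmeans_quantize`, its parameters, the random initialization, and the three-band test image are all assumptions made for illustration. The defaults `max_iter=10` and `eps=1.0` mirror the criteria tuple used above.

```python
import numpy as np


def kmeans_quantize(img, k, max_iter=10, eps=1.0, seed=0):
    """Cluster the pixels of an H x W x 3 uint8 image into at most k colors.

    Hypothetical helper for illustration (plain Lloyd's algorithm).
    Assumes the image contains at least k distinct colors.
    """
    rng = np.random.default_rng(seed)

    # Flatten the image to an (H*W) x 3 float matrix, one row per pixel.
    pixels = img.reshape(-1, 3).astype(np.float32)

    # Initialize the centers with k distinct colors taken from the image.
    unique_colors = np.unique(pixels, axis=0)
    picks = rng.choice(len(unique_colors), size=k, replace=False)
    centers = unique_colors[picks]

    for _ in range(max_iter):
        # Squared distance from every pixel to every center: shape (N, k).
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)

        # Move each center to the mean of its pixels (keep it if empty).
        new_centers = centers.copy()
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)

        # Stop once no center moved by eps or more.
        shift = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if shift < eps:
            break

    # Final assignment against the final centers.
    dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)

    # Replace every pixel with the color of its cluster center.
    quantized = centers[labels].round().astype(np.uint8).reshape(img.shape)
    return quantized, labels.reshape(img.shape[:2]), centers


# Synthetic 30 x 30 test image: three noisy color bands (assumed data).
noise_rng = np.random.default_rng(42)
demo = np.zeros((30, 30, 3), dtype=np.int16)
demo[:10] = (200, 40, 40)
demo[10:20] = (40, 200, 40)
demo[20:] = (40, 40, 200)
demo += noise_rng.integers(-10, 11, size=demo.shape, dtype=np.int16)
demo = np.clip(demo, 0, 255).astype(np.uint8)

quantized, labels, centers = kmeans_quantize(demo, k=3)
print(quantized.shape, labels.shape)
```

The returned `quantized` array can be passed to `plt.imshow` to view the clustered image, and `labels` gives the cluster index of each pixel. Because the starting centers are random, a single run can settle on a poor grouping, so it is common to run K-means several times and keep the best result.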