Naive Bayes is a probabilistic machine learning algorithm. It is used widely to solve the classification problem. In addition to that this algorithm works perfectly in natural language problems (NLP).

About Bayes Theorem

The Naive Bayes algorithm is based on the Bayes theorem. Let’s see what the theorem explains.


The Support Vector Machine is one of the most popular supervised machine learning algorithms. Which is used for classification as well as regression. The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put the new data point in the correct category in the future. This best decision boundary is called a hyperplane.


SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence algorithm is termed as Support Vector Machine.



Hierarchical clustering is another unsupervised machine learning algorithm, which is used to group the unlabeled datasets into a cluster and also known as hierarchical cluster analysis or HCA.

In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-shaped structure is known as the dendrogram. Sometimes the results of K-means clustering and hierarchical clustering may look similar, but they both differ depending on how they work. As there is no requirement to predetermine the number of clusters as we did in the K-Means algorithm.

The hierarchical clustering technique has two approaches:

  • Agglomerative


K-Means Clustering is an unsupervised machine learning algorithm which is used to solve the clustering problems in the machine learning. In real-world scenarios, the unlabelled data that might be exists to solve problems. In such cases, the K-means algorithm plays a vital role to solve the problem. Whereby taking unlabelled data dividing into subgroups. Where groups can be termed as a cluster. The grouping can be done by data having similar size, shape, measure, characteristic. Finally, group the data in a separate cluster. Here come K values, how many clusters need to create. There is a separate technique to…


K-nearest neighbors (KNN) is a type of supervised learning algorithm used for both regression and classification. KNN tries to predict the correct class for the test data by calculating the distance between the test data and all the training points. Then select the K number of points which is closet to the test data. The KNN algorithm calculates the probability of the test data belonging to the classes of ‘K’ training data and class holds the highest probability will be selected. In the case of regression, the value is the mean of the ‘K’ selected training points.

Let see…

Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both Classification and Regression problems in ML. It is based on the concept of ensemble learning, which is a process of combining multiple classifiers to solve a complex problem and to improve the performance of the model.

As the name suggests, “Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset.” …

A decision tree is a type of supervised machine learning model. which can be used for both regression and classification as well. It is one of the most powerful models used for the complex dataset. The decision tree coupled with an ensemble technique. The process of dividing the data set in the tree-structured format based on some rules and condition and finally provide the prediction based on the condition. Let’s see the simple example start with.

Logistic regression is one of the machine learning model used for supervised learning. It is used to predict the categorical dependent variable using a given set of the independent variable. This regression provides the output as 0 and 1, True or False, Yes or No. Linear regression deals with the classification problem.


The regression calculates the probability of given value belongs to the specific class. If the probability is >50%, the value assigned to that class. If the probability is<50%, the value is assigned to another subset of the class.

Where to apply the model?

From the above three…


Generally occurs a high correlation between two or more independent variables. This can be implied widely in the regression model. In realtime, the data might be having collinearity properties. Before applying any model the collinearity needs to be rectified or else it will lead to a false result and lesser accuracy.

Consider daily activities to explain better about multicollinearity.Tom usually likes sweet. He enjoyed sweet while watching television. How can we determine Tom happiness rating? This can be raised in two ways while watching TV and eating sweets. Those two variables are correlated to one another. …

Multiple linear regression refers to a statistical technique that is used to predict the outcome of a variable based on the value of two or more variables. It is sometimes known simply as multiple regression, and it is an extension of linear regression. The variable that we want to predict is known as the dependent variable, while the variables we use to predict the value of the dependent variable are known as independent or explanatory variables.

Multiple linear regression is used to estimate the relationship between two or more independent variables and one dependent variable. …

Antony Christopher

Data Science and Machine Learning enthusiast | Software Architect | Full stack developer

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store