Q1. What are Different Types of Machine Learning algorithms?
There are various types of machine learning algorithms. They can be grouped into broad categories based on criteria such as:
· Whether or not they are trained with human supervision (supervised, unsupervised, and reinforcement learning)
These criteria are not mutually exclusive; we can combine them in any way we like.
Q2. What Are the Differences Between Machine Learning and Deep Learning?
| Machine Learning | Deep Learning |
| --- | --- |
| Enables machines to make decisions on their own, based on past data | Enables machines to make decisions with the help of artificial neural networks |
| Needs only a small amount of data for training | Needs a large amount of training data |
| Works well on low-end systems, so you don't need large machines | Needs high-end machines because it requires a lot of computing power |
| Most features need to be identified in advance and manually coded | The machine learns the features from the data it is provided |
| The problem is divided into parts, solved individually, and then combined | The problem is solved in an end-to-end manner |
Q3. What Are Unsupervised Machine Learning Techniques?
There are two techniques used in unsupervised learning: clustering and association.
Clustering
Clustering problems involve dividing the data into subsets. These subsets, also called clusters, contain data points that are similar to each other.
Unlike classification or regression, there are no predefined labels; different clusterings reveal different details about the objects.
Association
In an association problem, we identify patterns of associations between different variables or items.
For example, an e-commerce website can suggest other items for you to buy, based on the prior purchases that you have made, spending habits,
items in your wishlist, other customers’ purchase habits, and so on.
Q4. What is the Difference Between Inductive Machine Learning and Deductive Machine Learning?
| Inductive Learning | Deductive Learning |
| --- | --- |
| Observes instances and uses them to draw a general conclusion. Example: explaining to a child to keep away from fire by showing a video in which fire causes damage | Draws conclusions from direct experience. Example: allowing the child to play with fire; if he or she gets burned, they learn that it is dangerous and will refrain from making the same mistake again |
Q5. Why do we perform normalization?
To achieve stable and fast training of the model, we use normalization techniques to bring all the features to a certain scale or range of values.
If we do not perform normalization, there is a chance that the gradient will not converge to the global or local minimum and will instead end up oscillating back and forth.
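A minimal sketch of feature scaling with scikit-learn (the feature values below are made up for illustration; MinMaxScaler and StandardScaler are two common choices):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales (illustrative values)
X = np.array([[1.0, 2000.0],
              [2.0, 3000.0],
              [3.0, 5000.0]])

# Min-max normalization: rescales each feature to the [0, 1] range
X_minmax = MinMaxScaler().fit_transform(X)

# Standardization: zero mean and unit variance per feature
X_std = StandardScaler().fit_transform(X)

print(X_minmax)
print(X_std)
```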
Q6. What is Supervised Learning?
Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consists of a set of training examples.
Example: 01
Identifying the gender of a person from their height and weight. Below are some popular supervised learning algorithms:
· Support Vector Machines
· Regression
· Naive Bayes
· Decision Trees
· K-nearest Neighbour Algorithm and Neural Networks.
Example: 02
If you build a T-shirt classifier, the labels will be “this is an S, this is an M and this is L”, based on showing the classifier examples of S, M, and L.
Q7. What is Unsupervised Learning?
Unsupervised learning is a type of machine learning algorithm used to find patterns in a given set of data. Here, we do not have any dependent
variable or label to predict. Unsupervised learning algorithms include:
· Clustering,
· Anomaly Detection,
· Neural Networks and Latent Variable Models.
Example:
In the same example, clustering T-shirts would group them by attributes such as “collar style and V-neck style”, “crew neck style”, and “sleeve type”.
Q8. How is KNN different from k-means clustering?
Answer: K-Nearest Neighbors is a supervised classification algorithm, while k-means clustering is an unsupervised clustering algorithm. While the mechanisms may seem similar at first, what this really means is that in order for K-Nearest Neighbors to work, you need labeled data against which to classify an unlabeled point (hence the "nearest neighbor" part). K-means clustering requires only a set of unlabeled points and a chosen number of clusters k: the algorithm takes the unlabeled points and gradually learns how to
cluster them into groups by repeatedly assigning each point to its nearest centroid and recomputing each centroid as the mean of the points assigned to it.
The critical difference here is that KNN needs labeled points and is thus supervised learning, while k-means doesn’t—and is thus unsupervised learning.
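A minimal sketch contrasting the two with scikit-learn (the toy points, labels, and values of k are made up for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels, needed only by KNN

# KNN: supervised -- requires the labels y to classify a new point
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 2]]))       # predicted class label

# k-means: unsupervised -- needs only X and the number of clusters k
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                  # cluster assignments learned from X alone
```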
Q9. How is machine learning different from general programming?
In general programming, we have the data and the logic, and by using these two we create the answers. In machine learning, we have the data and
the answers, and we let the machine learn the logic from them, so that the same logic can be used to answer the questions that will be faced in the future.
Also, there are times when writing the logic in code is not possible; at those times, machine learning becomes a saviour and learns the logic itself.
Q10. What is a Hypothesis in Machine Learning?
A hypothesis is a term generally used in the supervised machine learning domain. We have independent features and a target variable, and
we try to find an approximate function mapping from the feature space to the target variable; that approximate mapping is known as a hypothesis.
Q11. What is the difference between precision and recall?
Precision is simply the ratio between the true positives(TP) and all the positive examples (TP+FP) predicted by the model. In other words, precision
measures how many of the predicted positive examples are actually true positives. It is a measure of the model’s ability to avoid false positives and
make accurate positive predictions.
Recall, on the other hand, is the ratio of true positives (TP) to the total number of examples (TP + FN) that actually fall in the positive class.
Recall measures how many of the actual positive examples are correctly identified by the model. It is a measure of the model's ability to avoid false negatives
and identify all positive examples correctly.
Q12. Explain the difference between L1 and L2 regularization.
Answer: L2 regularization tends to spread error among all the terms, while L1 is more binary/sparse, with many weights driven to exactly zero. L1 corresponds to placing a Laplace prior on the weights, while L2 corresponds to a Gaussian prior.
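A minimal sketch comparing the two penalties with scikit-learn (synthetic regression data; the alpha values are arbitrary):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

# L1 typically drives uninformative coefficients to exactly zero;
# L2 only shrinks them towards zero
print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))
print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))
```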
Q13. What is Overfitting, and How Can You Avoid It?
Overfitting is a situation that occurs when a model learns the training set too well, picking up random fluctuations in the training data as if they were real concepts.
These fluctuations do not apply to new data and hurt the model's ability to generalize.
When given the training data, such a model shows close to 100 percent accuracy (technically, only a slight loss). But when we use the test data, there may be large errors and low accuracy. This condition is known as overfitting.
There are multiple ways of avoiding overfitting, such as:
· Regularization. It involves adding a cost term for the model's parameters to the objective function
· Making a simpler model. With fewer variables and parameters, the variance can be reduced
· Cross-validation methods like k-fold can also be used
· If some model parameters are likely to cause overfitting, regularization techniques like LASSO can be used to penalize these parameters
Q14. How Do You Handle Missing or Corrupted Data in a Dataset?
One of the easiest ways to handle missing or corrupted data is to drop those rows or columns or replace them entirely with some other value.
There are two useful methods in Pandas:
· isnull() and dropna() will help find the columns/rows with missing data and drop them
· fillna() will replace the missing values with a placeholder value, as in the sketch below
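A minimal sketch of these Pandas calls (the small DataFrame is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31], "salary": [50000, 60000, np.nan]})

print(df.isnull().sum())        # count missing values per column
dropped = df.dropna()           # drop rows containing any missing value
filled = df.fillna(df.mean())   # replace missing values, e.g. with the column mean

print(dropped)
print(filled)
```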
Q15. Explain Machine Learning, Artificial Intelligence, and Deep Learning
It is common to get confused between the three in-demand technologies, Machine Learning, Artificial Intelligence, and Deep Learning.
These three technologies, though a little different from one another,
are interrelated. While Deep Learning is a subset of Machine Learning, Machine Learning is a subset of Artificial Intelligence. Since some terms and
techniques may overlap in these technologies, it is easy to get confused among them.
So, let us learn about these technologies in detail:
· Machine Learning: Machine Learning involves various statistical and Deep Learning techniques that allow machines to use their past experiences
and get better at performing specific tasks without having to be monitored.
· Artificial Intelligence: Artificial Intelligence uses numerous Machine Learning and Deep Learning techniques that enable computer systems to
perform tasks using human-like intelligence, logic, and rules.
· Deep Learning: Deep Learning comprises several algorithms that enable software to train itself and perform various business tasks,
including image and speech recognition. Deep Learning is possible when systems expose their multilayered neural networks to large volumes of data for learning.
Q16. What are Support Vectors in SVM?
A Support Vector Machine (SVM) is an algorithm that tries to fit a line (or plane or hyperplane) between the different classes that maximizes the distance
from the line to the points of the classes.
In this way, it tries to find a robust separation between the classes. The support vectors are the data points that lie closest to the dividing hyperplane and therefore define its position and margin.
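A minimal sketch with scikit-learn showing how to inspect the support vectors of a fitted linear SVM (the toy, linearly separable points are made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The training points closest to the separating hyperplane
print(clf.support_vectors_)
# Their indices in the training set
print(clf.support_)
```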
Q17. What is Cross-Validation?
Cross-validation is a method of splitting your data so that every part of it is used for both training and evaluation. The data is split into k subsets, and the model
is trained on k-1 of those subsets.
The remaining subset is held out for testing. This is repeated for each of the subsets; this procedure is k-fold cross-validation. Finally, the scores from all k folds are
averaged to produce the final score.
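A minimal sketch of k-fold cross-validation with scikit-learn (the Iris dataset and logistic-regression model are just illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: train on 4 folds, score on the held-out fold, repeat 5 times
scores = cross_val_score(model, X, y, cv=5)
print(scores)          # one score per fold
print(scores.mean())   # averaged final score
```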
Q18. How do you measure the effectiveness of the clusters?
There are metrics like Inertia or Sum of Squared Errors (SSE) and the Silhouette score. Of these, Inertia (SSE) and the Silhouette score are the most
common metrics for measuring the effectiveness of clusters.
The Silhouette score is quite expensive in terms of computation cost, but it is high when the clusters formed are dense and well separated.
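A minimal sketch computing both metrics after a k-means fit with scikit-learn (synthetic blobs; the number of clusters is arbitrary):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("Inertia (SSE):", km.inertia_)                    # lower is better
print("Silhouette:", silhouette_score(X, km.labels_))   # closer to 1 is better
```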
Q19. Explain the classification report and the metrics it includes.
A classification report summarizes classification metrics (precision, recall, and F1-score) on a per-class basis.
· Precision can be defined as the ability of a classifier not to label an instance positive that is actually negative.
· Recall is the ability of a classifier to find all positive values. For each class, it is defined as the ratio of true positives to the sum of true positives and false negatives.
· F1-score is a harmonic mean of precision and recall.
· Support is the number of actual samples of each class.
· The overall accuracy score of the model is also included, to give a high-level view of performance. It is the ratio of the total number
of correct predictions to the total number of predictions.
· Macro avg is simply the unweighted average of each metric (precision, recall, F1-score) across the classes.
· The weighted average weights each class's metric by its support, so classes with more samples in the dataset contribute more.
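A minimal sketch producing such a report with scikit-learn (the label vectors are made up for illustration):

```python
from sklearn.metrics import classification_report

y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 0, 2, 2, 2, 1]

# Per-class precision, recall, F1-score, and support,
# plus accuracy, macro avg, and weighted avg
print(classification_report(y_true, y_pred))
```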
Q20. What’s a Fourier transform?
Answer: A Fourier transform is a generic method to decompose generic functions into a superposition of symmetric functions. Or, as a more
intuitive analogy puts it: given a smoothie, it's how we find the recipe. The Fourier transform finds the set of cycle speeds, amplitudes, and
phases that match any time signal. A Fourier transform converts a signal from the time domain to the frequency domain; it's a very common way to extract features from audio signals or other time series such as sensor data.
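A minimal sketch with NumPy's FFT that recovers the frequencies present in a synthetic signal (the sample rate and component frequencies are arbitrary):

```python
import numpy as np

fs = 100                                    # sample rate in Hz
t = np.arange(0, 1, 1 / fs)                 # one second of samples
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

spectrum = np.fft.rfft(signal)              # frequency-domain representation
freqs = np.fft.rfftfreq(len(signal), 1 / fs)

# The two dominant peaks sit at the 5 Hz and 20 Hz components
print(freqs[np.argsort(np.abs(spectrum))[-2:]])
```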
Q21. Which is more important to you: model accuracy or model performance?
Answer: Machine learning interview questions like this test your grasp of the nuances of model performance, and they often look at the details. There are models with higher accuracy that perform worse in predictive power; how does that make sense?
Well, it has everything to do with how model accuracy is only one part of model performance, and at that, a sometimes misleading one.
For example, if you wanted to detect fraud in a massive dataset with a sample of millions, a more accurate model would most likely predict
no fraud at all if only a tiny minority of cases were fraud. However, this would be useless for a predictive model: a model designed to find fraud that asserted there was no fraud at all! Questions like this help you demonstrate that you understand model
accuracy isn't the be-all and end-all of model performance.
Q22. Explain the Difference Between Classification and Regression?
Classification is used to produce discrete results; it is used to classify data into specific categories.
For example, classifying emails into spam and non-spam categories.
Whereas, regression deals with continuous data.
For example, predicting stock prices at a certain point in time.
Classification is used to predict the output into a group of classes.
For example, Is it Hot or Cold tomorrow?
Whereas, regression is used to predict the relationship that data represents.
For example, What is the temperature tomorrow?
Q23. What is a Neural Network?
It is a simplified model of the human brain. Much like the brain, it has neurons that activate when encountering something similar.
The different neurons are connected via connections that help information flow from one neuron to another.
Q24. What is a Decision Tree in Machine Learning?
A decision tree is used to explain the sequence of actions that must be performed to get the desired output. It is a hierarchical diagram that shows the actions.
An algorithm can be created for a decision tree on the basis of the set hierarchy of actions.
For example, a decision tree could encode the sequence of checks for deciding whether a vehicle may be driven, such as whether or not the driver holds a license.
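A minimal sketch of fitting a decision tree and printing its hierarchy of decisions with scikit-learn (the Iris dataset and depth limit are just illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# Print the learned sequence of if/else decisions
print(export_text(tree, feature_names=list(data.feature_names)))
```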
Q25. Explain Logistic Regression
Logistic regression is the appropriate regression analysis to use when the dependent variable is categorical or binary. Like all regression analyses,
logistic regression is a technique for predictive analysis. Logistic regression is used to explain data and the relationship between one dependent
binary variable and one or more independent variables. Logistic regression is also employed to predict the probability of categorical dependent variables.
Logistic regression can be used in the following scenarios:
· To predict whether a citizen is a Senior Citizen (1) or not (0)
· To check whether a person has a disease (Yes) or not (No)
There are three types of logistic regression:
· Binary logistic regression: In this type of logistic regression, there are only two outcomes possible.
Example: To predict whether it will rain (1) or not (0)
· Multinomial logistic regression: In this type of logistic regression, the output consists of three or more unordered categories.
Example: Predicting whether the price of a house is high, medium, or low.
· Ordinal logistic regression: In this type of logistic regression, the output consists of three or more ordered categories.
Example: Rating an Android application from one to five stars.
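A minimal sketch of a binary logistic regression with scikit-learn (the breast-cancer dataset is just an illustrative binary problem):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)

print(clf.predict(X_test[:5]))        # predicted 0/1 classes
print(clf.predict_proba(X_test[:5]))  # predicted class probabilities
```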
Q26. What is meant by Parametric and Non-parametric Models?
Parametric models are models with a limited, fixed number of parameters. In the case of parametric models, only the parameters of the model
need to be known to make predictions regarding new data.
Non-parametric models do not place any restriction on the number of parameters, which makes predictions on new data more flexible. In the case of non-parametric models, both the model parameters and the training data need to be known to make predictions.
Q27. What is one-shot learning?
One-shot learning is a concept in machine learning where the model is trained to recognize patterns from a single example instead
of training on large datasets. This is useful when we do not have large datasets. It is commonly applied to finding the similarity and dissimilarity between two images.
Q28. What is the difference between covariance and correlation?
As the name suggests, covariance provides us with a measure of the extent to which two variables vary together. Correlation, on the other hand,
gives us a measure of the strength of the relationship between the two variables. Covariance can take on any value, while correlation
is always between -1 and 1. These measures are used during exploratory data analysis to gain insights from the data.
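A minimal sketch with NumPy (the two toy series are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

cov = np.cov(x, y)[0, 1]        # unbounded; depends on the units of x and y
corr = np.corrcoef(x, y)[0, 1]  # always between -1 and 1

print(cov, corr)
```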
Q29. What’s the “kernel trick” and how is it useful?
Answer: The kernel trick involves kernel functions that enable learning in higher-dimensional spaces without explicitly calculating the coordinates of points in that space: instead, kernel functions compute the inner products
between the images of all pairs of data in a feature space. This gives them the very useful property of working with higher-dimensional representations
while being computationally cheaper than the explicit calculation of those coordinates. Many algorithms can be expressed purely in terms of inner products, so
using the kernel trick enables us to effectively run algorithms in a high-dimensional space with lower-dimensional data.
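A minimal sketch (assuming NumPy) illustrating the idea for a degree-2 polynomial kernel: the kernel value equals the inner product under an explicit quadratic feature map, without ever constructing that map:

```python
import numpy as np

def phi(v):
    # Explicit degree-2 feature map for a 2-D point
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

explicit = phi(x) @ phi(y)   # inner product in the higher-dimensional space
kernel = (x @ y) ** 2        # polynomial kernel computed in the original space

print(explicit, kernel)      # both equal 121.0
```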
Q30. What is Clustering?
Clustering is the process of grouping a set of objects into a number of groups. Objects should be similar to one another within the same cluster
and dissimilar to those in other clusters.
A few types of clustering are:
· Hierarchical clustering
· K means clustering
· Density-based clustering
· Fuzzy clustering, etc
Q31. What is the Central Limit theorem?
This theorem concerns sampling statistics and their distribution. According to this theorem, the sampling distribution of the sample mean tends
towards a normal distribution as the sample size increases, no matter how the population distribution is shaped. That is, if we repeatedly take samples
from a distribution and calculate each sample's mean, the distribution of those means will follow a normal (Gaussian) distribution regardless of
the distribution from which the samples were taken.
A common rule of thumb is that the sample size must be greater than or equal to 30 for the CLT to hold, and the mean of the sample means
approaches the population mean.
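A minimal simulation sketch with NumPy: means of samples drawn from a heavily skewed exponential population still look approximately normal (the sample size and number of repetitions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 samples of size 50 from a skewed (exponential) population
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

# The distribution of the sample means is approximately normal,
# centered near the population mean (1.0 for this exponential)
print(sample_means.mean())   # close to 1.0
print(sample_means.std())    # close to 1.0 / sqrt(50) ≈ 0.14
```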
Q32. How do you check the Normality of a dataset?
Visually, we can use plots such as histograms or Q-Q plots. A few statistical normality tests are as follows:
· Shapiro-Wilk Test
· Anderson-Darling Test
· Martinez-Iglewicz Test
· Kolmogorov-Smirnov Test
· D’Agostino Skewness Test
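A minimal sketch of two of these tests with SciPy (the sample is drawn from a normal distribution purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=200)

# Shapiro-Wilk: a small p-value suggests the data is not normal
print(stats.shapiro(sample))

# Kolmogorov-Smirnov test against a standard normal distribution
print(stats.kstest(sample, "norm"))
```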
Q33. What is P-value?
P-values are used to make a decision in a hypothesis test. The p-value is the smallest significance level at which you can reject the null hypothesis. The lower the p-value, the stronger the evidence for rejecting the null hypothesis.
Q34. Discuss the main types of ensemble learning techniques
The main types of ensemble learning techniques are:
1. Bagging: Combines multiple models by averaging (for regression) or voting (for classification), trained on random subsets of the training
data (with replacement). Random Forest is an example of bagging.
2. Boosting: Trains a sequence of models iteratively, with each model learning from the errors of its predecessor, aiming to improve the overall performance.
Gradient Boosted Trees and AdaBoost are examples of boosting methods.
3. Stacking: Trains multiple models on the same data and uses the predictions from these models as inputs to another model, called the meta-model,
to make the final prediction.
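A minimal sketch of the three approaches with scikit-learn (the Iris dataset and base models are just illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

bagging = RandomForestClassifier(n_estimators=100, random_state=0)   # bagging
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)      # boosting
stacking = StackingClassifier(                                       # stacking
    estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000))

for name, model in [("bagging", bagging), ("boosting", boosting),
                    ("stacking", stacking)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```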
Q35. Describe the main challenges associated with working with imbalanced datasets
Imbalanced datasets are characterized by having a significantly larger number of samples in one class than in others. Challenges associated with imbalanced datasets include:
1. Poor performance on minority class: Most machine learning algorithms optimize for overall accuracy, so they tend to perform poorly on the minority
class due to their bias towards the majority class.
2. Inappropriate evaluation metrics: Accuracy may not be an appropriate performance metric for imbalanced datasets, as it might produce high accuracy
even with a poor model. Alternative metrics like precision, recall, F1-score, and the area under the ROC curve should be considered.
Q36. What Are the Applications of Supervised Machine Learning in Modern Businesses?
Applications of supervised machine learning include:
· Email Spam Detection
Here we train the model using historical data that consists of emails categorized as spam or not spam. This labeled information is fed as input to the model.
· Healthcare Diagnosis
By providing images regarding a disease, a model can be trained to detect if a person is suffering from the disease or not.
· Sentiment Analysis
This refers to the process of using algorithms to mine documents and determine whether they’re positive, neutral, or negative in sentiment.
· Fraud Detection
Q37. Explain the working principle of SVM.
A data set that is not separable into different classes in one plane may be separable in another plane. This is exactly the idea behind SVM: low-dimensional
data is mapped to a higher-dimensional space so that it becomes separable into the different classes. A hyperplane that can separate the data
into categories is then determined in that higher dimension. The SVM model can even learn non-linear boundaries, with the objective that there should be as much margin as possible between the categories into which the data has been separated.
To perform this mapping, different types of kernels are used, such as the radial basis (Gaussian) kernel, the polynomial kernel, and many others.
Q38. What happens to the mean, median, and mode when your data distribution is right skewed and left skewed?
In the case of a right-skewed distribution, also known as a positively skewed distribution, the mean is greater than the median, which is greater than the mode. In the case of a left-skewed (negatively skewed) distribution, the scenario is completely reversed.
Right-skewed distribution: Mode < Median < Mean
Left-skewed distribution: Mean < Median < Mode
Q39. How does transfer learning work?
Transfer learning leverages a pre-trained model, often on a large dataset, to solve a similar, potentially smaller-scale problem. The pre-trained model's weights are fine-tuned on the target task using a smaller learning rate, allowing it to adapt to the specific domain without overwriting the generalized learned features.
Transfer learning allows for faster convergence and better performance with limited data.
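A minimal fine-tuning sketch, assuming PyTorch and a recent torchvision are available; the number of target classes (10) and the learning rate are made-up illustrative values:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a model pre-trained on a large dataset (ImageNet)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained feature extractor so its generalized features are kept
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer to match the target task (e.g. 10 classes)
model.fc = nn.Linear(model.fc.in_features, 10)

# Fine-tune only the new head, with a small learning rate
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
# ...training loop over the (smaller) target dataset goes here...
```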
Q40. Explain the main differences between reinforcement learning (RL) and supervised learning
In supervised learning, a labeled dataset is provided, and the goal is to learn a mapping from input features to the target labels. In reinforcement learning,
an agent interacts with an environment to learn optimal actions and decisions based on receiving feedback in the form of rewards or penalties. In RL, there
is no explicit guidance or correct action to be taken, and the agent learns through trial and error, refining its policy over time to maximize the cumulative reward.
Q41. Compare K-means and KNN Algorithms.
| K-means | KNN |
| --- | --- |
| K-Means is a clustering algorithm | KNN is a classification algorithm |
| The points in each cluster are similar to each other, and each cluster is different from its neighboring clusters | It classifies an unlabeled observation based on its K (can be any number) surrounding neighbors |
Q42. How Do You Design an Email Spam Filter?
Building a spam filter involves the following process:
· The email spam filter will be fed with thousands of emails
· Each of these emails already has a label: ‘spam’ or ‘not spam.’
· The supervised machine learning algorithm will then determine which types of emails are being marked as spam based on spam words like 'lottery', 'free offer', 'no money', 'full refund', etc.
· The next time an email is about to hit your inbox, the spam filter will use statistical analysis and algorithms like Decision Trees and SVM to determine how likely it is that the email is spam
· If the likelihood is high, it will label it as spam, and the email won’t hit your inbox
· Based on the accuracy of each model, we will use the algorithm with the highest accuracy after testing all the models
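A minimal sketch of such a pipeline with scikit-learn (the tiny labeled corpus is made up for illustration; a real filter would be trained on thousands of emails):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["win a free lottery prize now", "full refund no money needed",
          "meeting agenda for tomorrow", "project report attached"]
labels = ["spam", "spam", "not spam", "not spam"]

# Bag-of-words features + Naive Bayes classifier
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(emails, labels)

print(spam_filter.predict(["claim your free prize"]))         # likely 'spam'
print(spam_filter.predict_proba(["claim your free prize"]))   # spam likelihood
```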
Q43. What is a Random Forest?
A ‘random forest’ is a supervised machine learning algorithm that is generally used for classification problems. It operates by constructing multiple
decision trees during the training phase. The random forest chooses the decision of the majority of the trees as the final decision.
Q44. What is a radial basis function? Explain its use.
RBF (radial basis function) is a real-valued function used in machine learning whose value depends only on the distance between the input and a fixed point called the center. A commonly used form is the Gaussian RBF: φ(x) = exp(-||x - c||^2 / (2σ^2)), where c is the center and σ controls the width.
Machine learning systems frequently use RBFs for a variety of purposes, including:
· Function approximation: by training the network's weights to fit a set of input-output pairs, RBF networks can approximate complex functions.
· Unsupervised learning: by treating the RBF centers as cluster centers, RBF networks can be used to locate groups in the data.
· Classification: RBF networks can be trained to divide inputs into groups based on how far they are from the RBF nodes.
The RBF is also one of the most widely used kernels in the SVM algorithm, where it maps low-dimensional data to a higher-dimensional space so that
a boundary can be determined that separates the classes in different regions of that space with as much margin as possible.
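A minimal sketch (assuming NumPy and scikit-learn) checking the Gaussian RBF formula against scikit-learn's rbf_kernel; here gamma plays the role of 1/(2σ²):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[1.0, 2.0]])
c = np.array([[3.0, 5.0]])
gamma = 0.1   # gamma = 1 / (2 * sigma**2)

manual = np.exp(-gamma * np.sum((x - c) ** 2))
library = rbf_kernel(x, c, gamma=gamma)[0, 0]

print(manual, library)   # both ≈ exp(-0.1 * 13) ≈ 0.2725
```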
Q45. What is F1 score? How would you use it?
Let’s have a look at this table before directly jumping into the F1 score.
| Prediction | Predicted Yes | Predicted No |
| --- | --- | --- |
| Actual Yes | True Positive (TP) | False Negative (FN) |
| Actual No | False Positive (FP) | True Negative (TN) |
In binary classification, we consider the F1 score to be a measure of the model's accuracy. The F1 score is the harmonic mean of the precision and recall scores.
F1 = 2TP / (2TP + FP + FN)
We see scores for F1 between 0 and 1, where 0 is the worst score and 1 is the best score.
The F1 score is typically used in information retrieval to see how well a model retrieves relevant results and, hence, how well the model is performing.
Q46. Define Precision and Recall.
Precision
Precision is the ratio of the number of events you correctly recall to the total number of events you recall (a mix of correct and wrong recalls).
Precision = (True Positive) / (True Positive + False Positive)
Recall
Recall is the ratio of the number of events you can correctly recall to the total number of events that actually occurred.
Recall = (True Positive) / (True Positive + False Negative)
Q47. Briefly Explain Logistic Regression.
Logistic regression is a classification algorithm used to predict a binary outcome for a given set of independent variables.
The output of logistic regression is either a 0 or 1 with a threshold value of generally 0.5. Any value above 0.5 is considered as 1, and any point below 0.5 is considered as 0.
Q48. Explain Correlation and Covariance?
Correlation: Correlation tells us how strongly two random variables are related to each other. It takes values between -1 to +1.
Formula to calculate Correlation: Corr(X, Y) = Cov(X, Y) / (std(X) · std(Y))
Covariance: Covariance tells us the direction of the linear relationship between two random variables. It can take any value between - ∞ and + ∞.
Formula to calculate Covariance: Cov(X, Y) = Σ (xi - mean(X))(yi - mean(Y)) / (n - 1)
Q49. What is Cross-Validation?
Cross-Validation in Machine Learning is a statistical resampling technique that uses different parts of the dataset to train and test a machine
learning algorithm on different iterations. The aim of cross-validation is to test the model’s ability to predict a new set of data that was not used to train the model. Cross-validation avoids the overfitting of data.
K-Fold Cross Validation is the most popular resampling technique that divides the whole dataset into K sets of equal sizes.
Q50. What is KNN Imputer?
We generally impute null values with descriptive statistical measures of the data, such as the mean, mode, or median, but the KNN Imputer is a more sophisticated
method of filling in null values. It uses a distance parameter, also known as the k parameter, and works somewhat like a clustering algorithm:
each missing value is imputed with reference to the neighbouring points of the observation containing it.
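A minimal sketch with scikit-learn's KNNImputer (the small matrix and n_neighbors value are made up for illustration):

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0], [3.0, 4.0], [np.nan, 6.0], [8.0, 8.0]])

# Each missing value is filled using the mean of its k nearest neighbours,
# with distances computed on the non-missing features
imputer = KNNImputer(n_neighbors=2)
print(imputer.fit_transform(X))
```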