
A detailed post about the mathematical foundations of linear regression and how to use it to improve the goodness of fit.

Simple yet effective methods are often undervalued when tackling complex problems. This story aims to show that linear regression is still very relevant, how we can improve its performance, and how doing so makes us better machine learning and data science engineers.

As a newcomer to the field of machine learning, the first thing you learn is simple univariate linear regression. However, for the past decade or so, tree-based algorithms and neural networks have overshadowed linear regression on a commercial scale. …

A machine learning model is only as good as the features that it is trained on. But how do we find the best features for the problem statement? This is where feature selection comes in.

In this article, we will explore the feature selection techniques you need to be familiar with in order to get the best performance out of your model:

  • SelectKBest
  • Linear Regression
  • Random Forest
  • XGBoost
  • Recursive Feature Elimination
  • Boruta


SelectKBest is a method provided by sklearn to rank the features of a dataset by their “importance” with respect to the target variable. This “importance” is calculated using a score function, which can be one of the following:

  • f_classif: ANOVA F-value between label/feature for classification tasks.
  • f_regression: F-value between label/feature for regression tasks.
  • chi2: Chi-squared stats of non-negative features for classification tasks.
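As a minimal sketch of how SelectKBest is used in practice (on a synthetic dataset, since the article does not show one), you pass a score function and the number of features to keep, then fit-transform:

```python
# Keep the 3 highest-scoring features of a synthetic regression dataset,
# scored with f_regression (the regression score function listed above).
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Toy data: 100 samples, 10 features, only 3 of which are informative
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       random_state=42)

selector = SelectKBest(score_func=f_regression, k=3)
X_new = selector.fit_transform(X, y)

print(X_new.shape)             # (100, 3) -- only the top-3 features remain
print(selector.get_support())  # boolean mask over the original 10 columns
```

For classification tasks you would swap in `f_classif` (or `chi2` if all features are non-negative) as the `score_func`.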

Random Forests are one of the most powerful algorithms that every data scientist or machine learning engineer should have in their toolkit. In this article, we will take a code-first approach towards understanding everything that sklearn’s Random Forest has to offer!

Decision Trees

To understand Random Forests, it is essential to understand what they are made of. Decision trees are the foundational building blocks of all tree-based algorithms; every other tree-based algorithm is a sophisticated ensemble of decision trees. Understanding decision trees is therefore a good place to start.
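To make the tree-vs-ensemble relationship concrete, here is a minimal sketch contrasting a single decision tree with a Random Forest in sklearn. The iris dataset is an assumption for illustration; the article's own project data is not shown here.

```python
# A single decision tree vs. an ensemble of 100 trees on the same split.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One decision tree: fast to train and easy to interpret, but prone to overfitting
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A Random Forest: an ensemble of such trees, each trained on a bootstrap sample
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print(len(forest.estimators_))  # the individual decision trees in the ensemble
print(tree.score(X_test, y_test), forest.score(X_test, y_test))
```

The `estimators_` attribute exposes the underlying decision trees, which is exactly the "sophisticated ensemble" relationship described above.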

Every data science aspirant who is new to the field has an arsenal of projects left untouched on their own desktop. How about we get them up on the internet?

All the code and screenshots used in this article are from a personal project that I worked on earlier this year. The GitHub repo with the code is linked here, and the deployed model is linked here.


First, you need to install streamlit on your system, or in the virtual environment where you’re working on this project.

If you do not have streamlit installed, open the command prompt and type:

 pip install streamlit

Once you have streamlit installed, you should check out the official documentation of streamlit to familiarize yourself with the wide range of widgets provided by…

Why, When, and How to start your first data-science/machine learning project

When should I start my first project?

The question that every data science/machine learning aspirant comes across at least once while they are relatively new to the field is:

Is it too early to start my own project? What more do I need to learn before I start working on my own project?

The answer varies from person to person, but a general rule of thumb is that once you feel comfortable with your command over a few fundamental subtopics of machine learning, you’re good to go! It’s never too early. …
