TPOT: Tool to create and optimize ML pipelines.

Aditya Kumar
Mar 19, 2024


A few days back I needed to choose the best algorithm for a regression task on a dataset. I had read a lot about Auto-Sklearn and decided to use it with Python 3.11, but for some unknown reason I was not able to install it on my machine. This led me to look for other options that could do the same task, and I came across a Python library called TPOT. TPOT stands for Tree-based Pipeline Optimization Tool.

Let's see how it works.

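First, install the library. The leading ! runs the command from a Jupyter notebook cell; drop it when running pip in a terminal.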

!pip install tpot

Fig 1.0: Install the TPOT library.

import tpot
print('tpot: %s' % tpot.__version__)

Import tpot to use it. You can also print the installed version of the library.

Fig 2.0: Import library and check the version.

TPOT can be used for both regression and classification, but my requirement was regression, so I used TPOTRegressor.

from tpot import TPOTRegressor

from sklearn.model_selection import train_test_split

Fig 3.0: Import TPOTRegressor and train_test_split.
Fig 4.0: Defining X and y.

After importing, we need to identify the target (dependent) variable y and the feature (independent) variables X. Then we split the data into training and test sets and run the regression search on the training set.
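Figs 4.0 and 5.0 are screenshots, so the author's exact code is not reproduced here. As a minimal sketch of the X/y and train/test step, the example below uses scikit-learn's bundled diabetes dataset as a stand-in for the author's data (an assumption; any numeric regression table would do), and test_size=0.25 with random_state=42 are illustrative choices, not the author's settings.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Stand-in dataset (assumption): 442 rows, 10 numeric features, numeric target.
# X holds the feature (independent) variables, y the target (dependent) variable.
X, y = load_diabetes(return_X_y=True)

# Hold out 25% of the rows for testing; fixing random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)  # (331, 10) (111, 10)
```

With your own data in a pandas DataFrame, X would typically be the feature columns and y the target column, for example df.drop(columns=["target"]) and df["target"], where "target" is a placeholder name.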

Fig 5.0: Best ML algorithm.

Fitting the TPOT regressor on the training data searches for the best-fitting pipeline. This search can take a long time, and in our case the best pipeline it found was a DecisionTreeRegressor. For demo purposes I used a very small dataset, so these results should not be taken as representative.
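TPOT reaches its answer by evolving many candidate pipelines with genetic programming, which is why the search is slow. The sketch below is not TPOT: it is a deliberately simplified stand-in that scores three fixed scikit-learn regressors with 5-fold cross-validation on the training split and keeps the best, using negative mean squared error (the default scoring of TPOTRegressor in the 0.x releases). The candidate list, dataset, and hyperparameters are assumptions chosen for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Stand-in data and split (assumptions, not the author's dataset).
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fixed, hand-picked candidates; TPOT instead generates, mutates and combines pipelines.
candidates = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(max_depth=4, random_state=42),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=42),
}

# Mean 5-fold cross-validated negative MSE on the training split (closer to 0 is better).
cv_scores = {
    name: cross_val_score(
        model, X_train, y_train, cv=5, scoring="neg_mean_squared_error"
    ).mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Refit the winner on the full training split, then evaluate once on the test split.
best_model = candidates[best_name].fit(X_train, y_train)
test_neg_mse = -mean_squared_error(y_test, best_model.predict(X_test))
print(best_name, round(cv_scores[best_name], 1), round(test_neg_mse, 1))
```

Because only three fixed models are compared, a loop like this cannot discover preprocessing steps or model combinations the way TPOT's search can; it only shows what "picking the best pipeline by cross-validated score" means.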

Finally, we can check the score on the test set using:

print(tpot.score(X_test, y_test))
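In the TPOT 0.x releases, score evaluates the fitted pipeline with the scoring function chosen when the regressor was created. Assuming the default 'neg_mean_squared_error', the printed number is the negative of the mean squared error, so values closer to 0 are better. A small plain-Python sketch of that computation follows; the function name neg_mean_squared_error here is a hypothetical helper, not a TPOT function.

```python
def neg_mean_squared_error(y_true, y_pred):
    """Negative mean squared error: 0 is a perfect fit, more negative is worse."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    # Squared difference between each true value and its prediction.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return -sum(squared_errors) / len(squared_errors)


print(neg_mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 6.0]))  # -3.0
```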

TPOT helps data scientists arrive at a suitable algorithm quickly, without having to try candidate models one by one by hand.


Written by Aditya Kumar

Data Scientist with 6 years of experience. To find out more, connect with me on https://www.linkedin.com/in/adityakumar529/
