TPOT: A Tool to Create and Optimize ML Pipelines
A few days ago I needed to choose the best regression algorithm for a dataset. I had read a lot about Auto-Sklearn and decided to use it with Python 3.11, but I could not get it to install on my machine and never found out why. That led me to look for other options that could do the same job, and I came across a Python library called “TPOT”. TPOT stands for Tree-based Pipeline Optimization Tool.
Let's see how it works.
Install the library.
!pip install tpot
import tpot
print('tpot: %s' % tpot.__version__)
Import tpot to use it, and print its version to confirm the installation.
TPOT can be used for both regression and classification. My requirement was regression, so I used TPOTRegressor.
from tpot import TPOTRegressor
from sklearn.model_selection import train_test_split
After importing, we need to identify the target variable and the independent variables (features). Then we split the data into training and test sets and run the regression search on the training set.
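The split step can be sketched as follows. The data here is synthetic (generated with scikit-learn's make_regression) and only stands in for a real dataset; the sample count, feature count, test_size and random_state values are illustrative assumptions, not the values used in this post.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: X holds the features, y holds the target.
# Replace these two lines with your own feature matrix and target column.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

# Hold out 25% of the rows for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)
```

Fixing random_state makes the split reproducible, which helps when comparing pipelines across runs.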
We can now fit the regressor and let TPOT search for the best pipeline. The search takes a long time; in our case the best pipeline it found was a DecisionTreeRegressor. For demo purposes I used a very small dataset, so the results are not representative.
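A minimal sketch of the fitting step is below. It assumes the classic TPOT constructor, where generations, population_size, verbosity and random_state are keyword arguments of TPOTRegressor; newer TPOT releases may name these differently, so check the version you installed. The helper name fit_tpot and the budget values are hypothetical and chosen only to keep the search short.

```python
# Small search budget (illustrative values); larger values search longer
# and usually find better pipelines.
SEARCH_BUDGET = {"generations": 5, "population_size": 20}


def fit_tpot(X_train, y_train, budget=SEARCH_BUDGET, seed=42):
    """Run a TPOT search on the training data and return the fitted regressor.

    fit_tpot is a hypothetical helper for this post, not part of TPOT.
    """
    # Imported inside the function so the module loads even without TPOT.
    from tpot import TPOTRegressor

    # Assumption: classic TPOT API (generations, population_size, verbosity).
    model = TPOTRegressor(verbosity=2, random_state=seed, **budget)
    model.fit(X_train, y_train)
    return model
```

Calling fit_tpot(X_train, y_train) returns the fitted regressor, and its fitted_pipeline_ attribute holds the best scikit-learn pipeline found during the search.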
Finally, we can check the score on the test set. Here tpot is the variable holding the fitted TPOTRegressor, not the imported module.
print(tpot.score(X_test, y_test))
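To help read that number: as far as I know, TPOT's default scoring for regression is neg_mean_squared_error, so the score is zero or negative and values closer to zero are better. The sketch below reproduces that metric with plain scikit-learn, using a DecisionTreeRegressor as a stand-in for the pipeline TPOT selected. The data is synthetic and the parameter values are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data, split the same way as in a real run.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Stand-in for the pipeline TPOT reported in this post.
model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Negative mean squared error: never above 0, closer to 0 is better.
neg_mse = -mean_squared_error(y_test, model.predict(X_test))
print("neg_mean_squared_error:", neg_mse)

# R^2 from scikit-learn's own score method, for comparison (at most 1.0).
print("R^2:", model.score(X_test, y_test))
```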
TPOT helps data scientists arrive at a suitable algorithm in a very short span of time.