[Deployment] 4.4 Third Party Machine Learning Pipeline: Leveraging the power of Scikit-Learn
in Development on API
Third Party Machine Learning Pipeline: Leveraging the power of Scikit-Learn
4.4 Third Party Machine Learning Pipeline: Leveraging the power of Scikit-Learn
Scikit-Learn and sklearn pipeline
: Scikit-Learn is a Python library that provides a solid implementation of a range of machine learning algorithms.
Scikit-Learn Objects
- Transformers - class that have fit and transform method, it transforms data
- Scalers
- Feature selectors
- One hot encoders
- Predictor - class that has fit and predict methods, it fits and predicts
- Any ML algorithm like lasso, decision trees, svm, etc
- Pipeline - class that allows you to list and run transformers and predictors in sequence
- All steps should be transformers except the last one
- Last step should be a predictor
Scikit-Learn and sklearn pipline : 장단점
Advantages
- Can be tested, versioned, tracked and controlled
- Can build future models on top
- Good software developer practice
- Leverages the power of acknowledged API
- Data scientists familiar with Pipeline use, reduced over-head
- Engineering steps can be packaged and re-used in future ML models
Disadvantages
- Requires team of software deveploper to build and maintain
- Overhead for software develpers to familiarise with code for sklearn API => difficulties debugging