[Deployment] 4.4 Third Party Machine Learning Pipeline: Leveraging the power of Scikit-Learn



Scikit-Learn and sklearn pipeline

Scikit-Learn is a Python library that provides solid implementations of a range of machine learning algorithms.

Scikit-Learn Objects

  • Transformers - classes that have fit and transform methods; they transform data
    • Scalers
    • Feature selectors
    • One-hot encoders
  • Predictors - classes that have fit and predict methods; they fit a model and make predictions
    • Any ML algorithm, such as lasso, decision trees, SVM, etc.
  • Pipeline - a class that lets you list and run transformers and predictors in sequence
    • All steps except the last should be transformers
    • The last step should be a predictor
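
The three object types above can be sketched in a few lines. This is a minimal illustration, not code from the course: the synthetic data, the step names "scaler" and "lasso", and the alpha value are all assumptions chosen for the example. It runs a transformer and a predictor by hand, then chains the same two steps in a Pipeline.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, made up purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Transformer: fit learns per-column mean and variance, transform applies them.
scaler = StandardScaler()
X_scaled = scaler.fit(X).transform(X)

# Predictor: fit learns the model, predict returns predictions.
lasso = Lasso(alpha=0.01)  # alpha is an arbitrary illustrative value
lasso.fit(X_scaled, y)
direct_preds = lasso.predict(X_scaled)

# Pipeline: the same two steps run in sequence.
# Every step except the last is a transformer; the last is a predictor.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("lasso", Lasso(alpha=0.01)),
])
pipe.fit(X, y)
pipe_preds = pipe.predict(X)
```

Calling pipe.fit and pipe.predict here should give the same predictions as the manual two-step version, which is the point of the Pipeline: one object carries the whole sequence.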

Scikit-Learn and sklearn pipeline: Pros and Cons

Advantages

  • Can be tested, versioned, tracked and controlled
  • Future models can be built on top of it
  • Good software development practice
  • Leverages the power of a well-established API
  • Data scientists are already familiar with Pipeline, which reduces overhead
  • Engineering steps can be packaged and re-used in future ML models
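
One way an engineering step might be packaged for re-use is as a custom transformer that plugs into a Pipeline. The sketch below is an assumption-laden illustration: LogTransformer, its columns parameter, and the toy data are hypothetical names invented here, not part of Scikit-Learn or of the course code. It relies only on the BaseEstimator and TransformerMixin base classes that Scikit-Learn provides.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class LogTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical feature-engineering step: apply log1p to chosen columns."""

    def __init__(self, columns=None):
        # Column indices to transform; None means all columns.
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing is learned from the data; fit exists to satisfy the API.
        return self

    def transform(self, X):
        # Work on a copy so the caller's array is left untouched.
        X = np.asarray(X, dtype=float).copy()
        cols = range(X.shape[1]) if self.columns is None else self.columns
        for c in cols:
            X[:, c] = np.log1p(X[:, c])
        return X


# Toy data, made up for illustration.
X = np.array([[0.0, 1.0], [1.0, 2.0], [3.0, 4.0], [7.0, 8.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])

# The custom step is used like any built-in transformer.
pipe = Pipeline([
    ("log", LogTransformer(columns=[0])),
    ("model", LinearRegression()),
])
pipe.fit(X, y)
transformed = pipe.named_steps["log"].transform(X)
```

Because the step follows the fit/transform convention, it can be unit-tested on its own and dropped into a different Pipeline for a later model.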

Disadvantages

  • Requires a team of software developers to build and maintain
  • Overhead for software developers to familiarise themselves with code written against the sklearn API => difficulties debugging


Code reference




© 2018. by statssy
