Some cool open-source Python packages for Machine Learning Ep 1
The Python ecosystem of ML libraries is very rich. Here is a list of some “active”, open-source packages that may be useful in day-to-day ML work. Of course, this list is far from exhaustive and should evolve as fast as the Python ecosystem does. We also exclude the following from the current list:
- main ML algorithm frameworks (scikit-learn, LightGBM, PyTorch, …),
- famous user-friendly libraries built on top of deep-learning libraries (fastai, Keras, …),
- specific application-oriented libraries (spaCy, scikit-image, StellarGraph, …),
- packages dealing with the general data/analytics environment (JupyterLab, Pandas, Dask, Conda, …) that are also used in many other domains, even if some of the following tools are more on the data-engineering side than on the ML one.
We hope you will find this list informative!
Data cleaning
- Pyjanitor - a clean API for cleaning data.
Auto-ML
- Featuretools - a library for automated feature engineering.
- TPOT - an automated tool that optimizes ML pipelines using genetic programming.
- Scikit-Optimize - a simple and efficient library to minimize expensive and noisy black-box functions.
- Randopt - a package for ML experiment management, hyper-parameter optimization, and results visualization.
- Optuna - an automatic hyper-parameter optimization software framework, particularly designed for ML.
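To make the idea of hyper-parameter optimization concrete, here is a minimal sketch of the simplest strategy, random search, written in plain Python. It does not use the API of any package listed above; the `objective` function, the parameter names, and their ranges are hypothetical stand-ins for a real validation loss. Tools such as Scikit-Optimize and Optuna automate this loop and use smarter sampling strategies.

```python
import random


def objective(params):
    """Toy stand-in for an expensive validation loss (hypothetical)."""
    lr, depth = params["learning_rate"], params["max_depth"]
    return (lr - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def random_search(objective, n_trials=200, seed=0):
    """Sample hyper-parameters at random and keep the best trial."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        # Hypothetical search space: one continuous and one integer parameter.
        params = {
            "learning_rate": rng.uniform(0.001, 0.5),
            "max_depth": rng.randint(2, 12),
        }
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(objective)
print(best_params, best_loss)
```

In practice the objective would train and evaluate a model, which is why the number of trials matters and why dedicated optimizers try to spend them wisely.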
Dimension reduction and visualization
- UMAP - Uniform Manifold Approximation and Projection is a dimension reduction technique that can be used for visualization similarly to t-SNE, but also for general non-linear dimension reduction.
Model analysis
- ELI5 - a library for visualizing and debugging various ML models through a unified API.
- Yellowbrick - a suite of visual diagnostic tools called “Visualizers” that extend the scikit-learn API to allow human steering of the model selection process.
- SHAP - SHapley Additive exPlanations is a unified approach to explain the output of any ML model.
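As an illustration of the idea behind Shapley-value explanations, the sketch below computes exact Shapley values for a toy three-feature model by averaging each feature's marginal contribution over all feature orderings. This is plain Python and not the SHAP package's API; the `model` function, the input, and the all-zeros baseline are assumptions made for the example. Enumerating every ordering is only feasible for a handful of features.

```python
from itertools import permutations


def model(x):
    """Toy model with an interaction between features 0 and 2 (hypothetical)."""
    return 2.0 * x[0] + 3.0 * x[1] + x[0] * x[2]


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contributions over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)  # start with every feature at its baseline value
        prev = model(current)
        for i in order:
            current[i] = x[i]  # switch feature i to its actual value
            value = model(current)
            phi[i] += value - prev  # marginal contribution of feature i
            prev = value
    return [p / len(orderings) for p in phi]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split evenly between the two features involved.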
Experimentation frameworks and tools
- Guild AI - a toolkit that automates and optimizes ML experiments.
- ModelChimp - an experiment tracker for Deep Learning and ML experiments.
- Sacred - a tool to help you configure, organize, log and reproduce experiments.
- SKLL - SciKit-Learn Laboratory provides command-line utilities to make it easier to run ML experiments with scikit-learn.
- DVC - Data Version Control is a tool for versioning data and models in data science and ML projects.
Model export
- ONNXMLTools - enables you to convert models from different ML toolkits into the ONNX (Open Neural Network Exchange) format.
Workflows
- MLflow - a platform to streamline ML development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models.
- Kubeflow - a cloud-native platform for ML based on Google’s internal ML pipelines.