
There is a very rich ecosystem of Python libraries related to ML. Here is a list of some “active”, open-source packages that may be useful for day-to-day ML work.

This post is a follow-up to these earlier ones:

(☞゚ヮ゚)☞

Feature engineering

  • category_encoders - a set of scikit-learn-style transformers for encoding categorical variables as numeric features using a variety of techniques.
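
As a rough illustration (not the library’s only workflow), a target encoder from category_encoders drops into a scikit-learn pipeline like any other transformer; the column names and data below are made up for the example:

```python
import pandas as pd
import category_encoders as ce

# Toy data: one categorical column and a binary target (made up for the example).
X = pd.DataFrame({"city": ["Rome", "Paris", "Rome", "Berlin", "Paris", "Rome"]})
y = pd.Series([1, 0, 1, 0, 1, 1])

# TargetEncoder replaces each category with a smoothed mean of the target.
encoder = ce.TargetEncoder(cols=["city"])
X_encoded = encoder.fit_transform(X, y)
print(X_encoded.head())
```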

Time series

  • TSFRESH - automatically extracts hundreds of features from time series. These features describe basic characteristics of the time series, such as the number of peaks and the average or maximum value, as well as more complex features such as the time reversal symmetry statistic. To avoid extracting irrelevant features, TSFRESH has a built-in filtering procedure that evaluates the explanatory power and importance of each feature for the regression or classification task at hand (see the sketch after this list).
  • matrixprofile-ts - a library for detecting patterns and anomalies in massive datasets using the Matrix Profile.
  • sktime - a scikit-learn compatible toolbox for learning with time series data.
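
To make the TSFRESH two-step workflow concrete (extract everything, then keep only statistically relevant features), here is a minimal sketch; the long-format DataFrame with id/time/value columns and the random data are assumptions made for the example:

```python
import numpy as np
import pandas as pd
from tsfresh import extract_features, select_features
from tsfresh.utilities.dataframe_functions import impute

# Long-format time series: one row per (series id, timestamp, value).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "id": np.repeat([1, 2, 3, 4], 50),
    "time": np.tile(np.arange(50), 4),
    "value": rng.normal(size=200),
})
# One target label per series id (made up for the example).
y = pd.Series([0, 1, 0, 1], index=[1, 2, 3, 4])

# Extract hundreds of candidate features per series...
X = extract_features(df, column_id="id", column_sort="time")
impute(X)  # replace NaN/inf produced by features that are undefined on this data

# ...then keep only the features with significant explanatory power for y.
X_selected = select_features(X, y)
print(X_selected.shape)
```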

Auto-ML

  • Hyperopt - a library for serial and parallel optimization over awkward search spaces, which may include real-valued, discrete, and conditional dimensions (a minimal example is sketched after this list).
  • ray[tune] - a library for hyperparameter tuning at any scale.
  • MLBox - a powerful Automated Machine Learning library.
  • Chocolate - a completely asynchronous hyperparameter optimisation framework relying solely on a database to share information between workers.
  • ATM - a multi-tenant, multi-data system for automated machine learning (model selection and tuning).
  • HpBandSter - a distributed Hyperband implementation on Steroids.
  • SHERPA - a library for hyperparameter tuning of machine learning models.
  • HyperparameterHunter - automatically saves and learns from experiment results, leading to long-term, persistent optimization that remembers all your tests.
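
To give a flavour of the style most of these libraries share, here is a minimal Hyperopt search over a mixed real-valued/discrete space; the objective is a toy function rather than a real model:

```python
from hyperopt import fmin, tpe, hp, Trials

# Search space mixing a real-valued and a discrete dimension.
space = {
    "x": hp.uniform("x", -5.0, 5.0),
    "kind": hp.choice("kind", ["linear", "quadratic"]),
}

def objective(params):
    # Toy objective to minimize; in practice this would train and score a model.
    x = params["x"]
    return x ** 2 if params["kind"] == "quadratic" else abs(x)

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=100, trials=trials)
print(best)  # note: hp.choice dimensions are reported as the chosen index
```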

EDIT: here is a very nice post that compares Hyperopt with Optuna (mentioned in EP 1): https://neptune.ai/blog/optuna-vs-hyperopt

Experimentation frameworks and tools

  • Ax - an accessible, general-purpose platform for understanding, managing, deploying, and automating adaptive experiments.
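
A minimal sketch using Ax’s Loop API, assuming a toy quadratic objective and made-up parameter names:

```python
from ax import optimize

def evaluate(parameters):
    # Toy objective: minimise a simple quadratic in two parameters.
    x, y = parameters["x"], parameters["y"]
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

best_parameters, values, experiment, model = optimize(
    parameters=[
        {"name": "x", "type": "range", "bounds": [-5.0, 5.0]},
        {"name": "y", "type": "range", "bounds": [-5.0, 5.0]},
    ],
    evaluation_function=evaluate,
    minimize=True,
    total_trials=20,
)
print(best_parameters)
```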

Workflows

  • sagify - a command-line utility to train and deploy Machine Learning/Deep Learning models on AWS SageMaker in a few simple steps!
  • Kale - a package that aims to automatically deploy a general-purpose Jupyter Notebook as a running Kubeflow Pipelines instance, without requiring the use of the KFP DSL.

Computing

  • horovod - a distributed training framework for TensorFlow, Keras, PyTorch, and MXNet. The goal of Horovod is to make distributed Deep Learning fast and easy to use (the usual setup pattern is sketched after this list).
  • jupyterlab-nvdashboard - a JupyterLab extension for displaying dashboards of GPU usage.
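
Horovod’s usual pattern is: initialise, pin one GPU per process, scale the learning rate by the number of workers, wrap the optimizer, and broadcast the initial weights. A hedged, Keras-flavoured sketch (the model and data are placeholders):

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one process per GPU, typically launched with `horovodrun -np 4 python train.py`

# Pin each process to a single GPU.
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], "GPU")

model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="softmax")])

# Scale the learning rate by the number of workers, then wrap the optimizer.
opt = tf.keras.optimizers.Adam(0.001 * hvd.size())
opt = hvd.DistributedOptimizer(opt)

model.compile(loss="sparse_categorical_crossentropy", optimizer=opt)

callbacks = [
    # Broadcast initial weights from rank 0 so all workers start identically.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]
# model.fit(x_train, y_train, callbacks=callbacks, ...)
```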

Model analysis

  • tf-explain - implements interpretability methods as TensorFlow 2.0 callbacks to make it easier to understand what neural networks are doing.
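
As a hedged sketch of the callback style, the snippet below attaches a Grad-CAM callback that periodically writes heatmaps for a chosen class during training; the tiny model, random data, and output directory are invented for the example:

```python
import numpy as np
import tensorflow as tf
from tf_explain.callbacks.grad_cam import GradCAMCallback

# Tiny toy classifier over 32x32 RGB images (architecture made up for the example).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = np.random.rand(16, 32, 32, 3).astype("float32")
y = np.random.randint(0, 2, size=16)

callbacks = [
    # Writes Grad-CAM heatmaps for class 0 to the given directory at each epoch end.
    GradCAMCallback(validation_data=(x[:4], y[:4]), class_index=0,
                    output_dir="explain_logs"),
]
model.fit(x, y, epochs=1, callbacks=callbacks)
```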

App framework

  • Streamlit - lets you create apps for your machine learning projects with deceptively simple scripts. It supports hot-reloading, so your app updates live as you edit and save your file. No need to mess with HTTP requests, HTML, JavaScript, etc. All you need is your favorite editor and a browser (a minimal app is sketched after this list).
  • Voila - turns Jupyter notebooks into standalone web applications and dashboards.
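
A Streamlit app really is just a script; here is a minimal sketch (the widgets and data are arbitrary), which you would launch with `streamlit run app.py`:

```python
# app.py -- run with: streamlit run app.py
import numpy as np
import pandas as pd
import streamlit as st

st.title("Random walk explorer")

# Widgets return plain Python values; the whole script reruns on every interaction.
n_steps = st.slider("Number of steps", min_value=10, max_value=1000, value=200)
n_walks = st.slider("Number of walks", min_value=1, max_value=10, value=3)

walks = np.cumsum(np.random.randn(n_steps, n_walks), axis=0)
st.line_chart(pd.DataFrame(walks, columns=[f"walk {i}" for i in range(n_walks)]))
```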

Data visualization

  • Chartify - a library that makes it easy for data scientists to create charts.
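
A hedged sketch of a Chartify scatter plot; the DataFrame and column names are made up for the example:

```python
import pandas as pd
import chartify

# Made-up data for the example.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 58, 65, 70, 74, 80],
})

ch = chartify.Chart(blank_labels=True, x_axis_type="linear", y_axis_type="linear")
ch.plot.scatter(data_frame=df, x_column="hours_studied", y_column="exam_score")
ch.set_title("Exam score vs. hours studied")
ch.axes.set_xaxis_label("Hours studied")
ch.axes.set_yaxis_label("Exam score")
ch.show()  # renders the chart (Chartify is built on top of Bokeh)
```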

Music

  • LibROSA - a package for music and audio analysis. It provides the building blocks necessary to create music information retrieval systems.
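
For example, loading an audio file and computing a couple of classic MIR features takes only a few lines; the file path below is a placeholder:

```python
import librosa

# Placeholder path -- any wav/mp3 file works.
y, sr = librosa.load("song.wav")

# Beat tracking: estimated tempo (BPM) and beat frame positions.
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

# Classic MIR features: 13 Mel-frequency cepstral coefficients per frame.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

print("tempo:", tempo, "MFCC matrix shape:", mfcc.shape)
```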

Images

  • imagededup - a package that simplifies the task of finding exact and near duplicates in an image collection.
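
A minimal sketch using perceptual hashing; the image directory path is a placeholder:

```python
from imagededup.methods import PHash

phasher = PHash()

# Compute a perceptual hash for every image in the directory (placeholder path).
encodings = phasher.encode_images(image_dir="path/to/images")

# Map each filename to the list of its exact or near duplicates.
duplicates = phasher.find_duplicates(encoding_map=encodings)

for filename, dupes in duplicates.items():
    if dupes:
        print(filename, "->", dupes)
```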