Moebius

The Python ecosystem for ML is very rich. Here is a list of some “active”, open-source packages that may be useful for day-to-day ML work.

This post follows these earlier ones:

(☞゚ヮ゚)☞

Online learning

  • creme - a library for online machine learning, also known as incremental learning. Online learning is a machine learning regime where a model learns one observation at a time. This is in contrast to batch learning where all the data is processed in one go. Incremental learning is desirable when the data is too big to fit in memory, or simply when you want to handle streaming data. In addition to many online machine learning algorithms, creme provides utilities for extracting features from a stream of data.
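
To make the "one observation at a time" idea concrete, here is a minimal sketch in plain Python. It does not use creme's API: the class `OnlineLinearRegression`, its `update` and `predict` methods, and the simulated `stream` generator are all hypothetical names chosen for illustration. The model is a linear regression trained by stochastic gradient descent, and it never holds more than one observation in memory.

```python
class OnlineLinearRegression:
    """Hypothetical minimal model: linear regression trained by SGD,
    one observation at a time (not creme's actual API)."""

    def __init__(self, n_features, learning_rate=0.01):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, y):
        # Gradient step on the squared error of this single observation.
        error = self.predict(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def stream():
    """Simulated data stream following y = 2 * x + 1 (an assumption for the demo)."""
    for i in range(5000):
        x = (i % 10) / 10.0
        yield [x], 2.0 * x + 1.0


model = OnlineLinearRegression(n_features=1, learning_rate=0.1)
for features, target in stream():
    model.update(features, target)  # the observation can be discarded afterwards

prediction = model.predict([0.5])  # should end up close to 2.0
```

A batch learner would need the whole dataset before fitting; here the loop could just as well read from a socket or a file that never ends.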

Auto-ML

  • AutoGluon - enables easy-to-use and easy-to-extend AutoML with a focus on deep learning and real-world applications spanning image, text, or tabular data.

Time series

  • GluonTS - a toolkit for probabilistic time series modeling, built around Apache MXNet. GluonTS provides utilities for loading and iterating over time series datasets, state-of-the-art models ready to be trained, and building blocks to define your own models and quickly experiment with different solutions.
  • ADTK - Anomaly Detection Toolkit (ADTK) is a package for unsupervised / rule-based time series anomaly detection.
  • Pmdarima - a statistical library designed to fill the void in Python’s time series analysis capabilities, including the equivalent of R’s auto.arima function.

General machine learning & statistics

  • pomegranate - a package for building probabilistic models in Python that is implemented in Cython for speed. A primary focus of pomegranate is to merge the easy-to-use API of scikit-learn with the modularity of probabilistic modeling, so that users can specify complicated models without worrying about implementation details. The models are built from the ground up with big data processing in mind, and so natively support features like multi-threaded parallelism and out-of-core processing.
  • Mlxtend - a library of extension and helper modules for Python’s data analysis and machine learning libraries (initiated and maintained by Sebastian Raschka).
  • Pyglmnet - a Python implementation of elastic-net regularized generalized linear models.
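
For readers unfamiliar with the term, an elastic-net regularized GLM is usually written as the penalized objective below. This is the common glmnet-style formulation, given as an assumption: the symbols (n observations, coefficients β, intercept β₀, penalty strength λ, mixing parameter α) are not taken from Pyglmnet's documentation, and the library's exact parameterization may differ.

```latex
\[
\min_{\beta_0,\,\beta}\;
-\frac{1}{n}\sum_{i=1}^{n} \log p\left(y_i \mid \beta_0 + x_i^{\top}\beta\right)
+ \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
+ \alpha\,\lVert\beta\rVert_1\right]
\]
```

With α = 1 the penalty reduces to the lasso (L1 only), and with α = 0 it reduces to ridge (L2 only); values in between mix the two.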

REST API endpoints

  • FastAPI - a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints.

Conversational framework

  • Rasa - a machine learning framework to automate text- and voice-based conversations. With Rasa, you can build contextual assistants on Facebook Messenger, Slack, Microsoft Bot Framework, Rocket.Chat, Mattermost, Telegram, Twilio, or your own custom conversational channels, as well as voice assistants such as Alexa Skills and Google Home Actions.

NLP

  • Flair - a very simple framework for state-of-the-art NLP. Developed by Zalando Research.
  • Snips NLU - Snips NLU (Natural Language Understanding) is a Python library for extracting structured information from sentences written in natural language.
  • FastText - an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices.
  • Kashgari - a simple and powerful NLP transfer learning framework for building a state-of-the-art model in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS), and text classification tasks.

Graphs

  • DGL - a package built to ease deep learning on graphs, on top of existing DL frameworks.
  • Higra - a C++/Python library for efficient sparse graph analysis with a special focus on hierarchical methods.

Data summarization

  • apricot - a package for the greedy selection of diverse subsets of data from massive data sets using submodular selection.
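
The idea behind greedy submodular selection can be sketched in a few lines of plain Python. This is not apricot's API: `similarity`, `greedy_select`, and the toy `points` list are hypothetical, and the objective used here is a facility location function (each point is scored by its best similarity to the selected subset), which is one common choice of submodular function.

```python
def similarity(a, b):
    """Toy similarity between two 1-D points (an assumption for this sketch)."""
    return 1.0 / (1.0 + abs(a - b))


def greedy_select(points, k):
    """Greedily pick k indices maximizing a facility location objective."""
    n = len(points)
    sim = [[similarity(p, q) for q in points] for p in points]
    best = [0.0] * n  # best similarity of each point to the subset so far
    selected = []
    for _ in range(k):
        # Marginal gain of each remaining candidate: how much it would
        # improve every point's best similarity to the selected subset.
        gains = {
            j: sum(max(sim[i][j] - best[i], 0.0) for i in range(n))
            for j in range(n)
            if j not in selected
        }
        chosen = max(gains, key=gains.get)
        selected.append(chosen)
        best = [max(best[i], sim[i][chosen]) for i in range(n)]
    return selected


# Three loose clusters: around 0, around 5, and a single point at 10.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 10.0]
subset = greedy_select(points, 3)
chosen_points = sorted(points[i] for i in subset)
```

On this toy data the greedy pass picks one representative per cluster, which is the "diverse subset" behaviour: once a cluster is covered, adding a second point from it brings almost no gain.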

Music

  • Spleeter - the Deezer source separation library, written in Python on top of TensorFlow and shipped with pretrained models. It makes it easy to train a source separation model (assuming you have a dataset of isolated sources), and provides pretrained state-of-the-art models for several flavours of separation.

Workflows

  • Metaflow - a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.

I like this figure from Metaflow’s documentation about its purpose:

[Figure: Metaflow]

Containers

  • repo2docker - fetches a git repository and builds a container image based on the configuration files found in the repository.

Privacy

  • CrypTen - a framework for Privacy Preserving Machine Learning built on PyTorch. Its goal is to make secure computing techniques accessible to Machine Learning practitioners.
  • PySyft - a library for secure and private Deep Learning. PySyft decouples private data from model training, using Federated Learning, Differential Privacy, and Multi-Party Computation (MPC) within the main Deep Learning frameworks like PyTorch and TensorFlow.
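
To give a feel for what Multi-Party Computation means, here is a minimal sketch of additive secret sharing in plain Python. It is neither CrypTen's nor PySyft's API, and real frameworks use considerably more elaborate protocols. The names `share`, `reconstruct`, `add_shared`, the modulus `PRIME`, and the salary figures are all assumptions made for illustration.

```python
import secrets

PRIME = 2**61 - 1  # modulus for the arithmetic; an arbitrary choice for this sketch


def share(secret, n_parties):
    """Split `secret` into n random-looking shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recover the secret; this requires all of the shares."""
    return sum(shares) % PRIME


def add_shared(a_shares, b_shares):
    """Each party adds its two shares locally, without seeing the secrets."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]


# Two private values, each split across three parties.
alice_shares = share(52000, 3)
bob_shares = share(61000, 3)

# The parties compute shares of the sum; only the total is ever reconstructed.
total = reconstruct(add_shared(alice_shares, bob_shares))
```

Each individual share is a uniformly random number, so a single party learns nothing about the inputs, yet the sum comes out exactly.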

Satellite imagery

  • Raster Vision - a framework for building computer vision models on satellite, aerial, and other large imagery sets (including oblique drone imagery).
  • eo-learn - earth observation processing framework for machine learning in Python.
  • sentinelhub - download and process satellite imagery in Python scripts using Sentinel Hub services.
  • sentinelsat - makes searching, downloading and retrieving the metadata of Sentinel satellite images from the Copernicus Open Access Hub easy.

Scraper

  • twitterscraper - a simple script to scrape for Tweets using the Python package requests to retrieve the content and Beautifulsoup4 to parse the retrieved content.

JupyterLab

  • nbgather - tools for cleaning code, recovering lost code, and comparing versions of code in JupyterLab (this is actually not a Python package but a JupyterLab extension).

Miscellaneous tools

  • Knock Knock - a small library to get a notification when your training is complete or when it crashes during the process with two additional lines of code.
  • perfplot - perfplot extends Python’s timeit by testing snippets with input parameters (e.g., the size of an array) and plotting the results.
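
The kind of measurement perfplot automates can be approximated with the standard library alone. The sketch below is not perfplot's API: `benchmark`, `loop_sum`, and the parameter names are hypothetical, and it only collects timings with `timeit` for several input sizes, leaving out the plotting.

```python
import timeit


def loop_sum(values):
    """Deliberately naive summation, used as a comparison kernel."""
    total = 0
    for value in values:
        total += value
    return total


def benchmark(kernels, sizes, setup, repeat=3, number=10):
    """Time each kernel on inputs of each size; returns seconds per call."""
    results = {name: [] for name in kernels}
    for n in sizes:
        data = setup(n)  # build the input once per size, outside the timing
        for name, kernel in kernels.items():
            best = min(
                timeit.repeat(lambda: kernel(data), repeat=repeat, number=number)
            )
            results[name].append(best / number)
    return results


timings = benchmark(
    kernels={"builtin_sum": sum, "loop_sum": loop_sum},
    sizes=[10, 100, 1000],
    setup=lambda n: list(range(n)),
)
```

`timings` maps each kernel name to a list of per-call times, one per input size, which is the data you would put on a runtime-versus-size plot.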