What is LakeXpress?

LakeXpress is a CLI tool that exports database tables to partitioned Parquet files on local disk or cloud storage (AWS S3, GCS, Azure) and registers them as tables in Snowflake, Databricks, or other catalogs. It uses FastBCP to stream data in parallel without exhausting memory.
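
As a purely illustrative sketch (the bucket name, prefix layout, and file names below are assumptions, not LakeXpress's documented conventions), a sync of a tpch database split into two partitions per table could land files such as:

# Illustrative only: actual prefixes and file names depend on your configuration.
s3://my-datalake/tpch/public/orders/orders_part_1.parquet
s3://my-datalake/tpch/public/orders/orders_part_2.parquet
s3://my-datalake/tpch/public/customer/customer_part_1.parquet
s3://my-datalake/tpch/public/customer/customer_part_2.parquet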

Key features

  • Cross-platform: Native binaries for Windows and Linux
  • Multiple database sources: Oracle, PostgreSQL, SQL Server, MySQL/MariaDB
  • Multiple cloud targets: AWS S3, S3-compatible, GCS, Azure, or local disk
  • Catalog publishing: Snowflake, AWS Glue, Databricks, Microsoft Fabric, BigQuery, MotherDuck, DuckLake
  • Incremental sync: Watermark-based delta exports
  • Parallel exports: Multiple tables at once, with per-table partitioning
  • Schema filtering: Include/exclude schemas and tables via SQL patterns
  • Resume on failure: Pick up where a failed export left off
  • CDM metadata: Generate Common Data Model files
  • Export logging: Track runs, jobs, and file metadata in a dedicated database

Quick example

Create a sync configuration once, then run it.

Step 1: Create a sync configuration

Windows (PowerShell)

.\LakeXpress.exe config create `
  -a .\credentials.json `
  --log_db_auth_id log_db_ms `
  --source_db_auth_id source_pg `
  --source_db_name tpch `
  --source_schema_name public `
  --fastbcp_dir_path .\FastBCP_win-x64\latest\ `
  --target_storage_id s3_01 `
  --n_jobs 4 `
  --fastbcp_p 2 `
  --generate_metadata

Linux

./LakeXpress config create \
  -a ./credentials.json \
  --log_db_auth_id log_db_ms \
  --source_db_auth_id source_pg \
  --source_db_name tpch \
  --source_schema_name public \
  --fastbcp_dir_path ./FastBCP_linux-x64/latest/ \
  --target_storage_id s3_01 \
  --n_jobs 4 \
  --fastbcp_p 2 \
  --generate_metadata

Arguments:

Argument                          Description
-a ./credentials.json             Path to database and storage credentials (a sketch follows this table)
--log_db_auth_id log_db_ms        Credential ID for the logging database
--source_db_auth_id source_pg     Credential ID for the source PostgreSQL database
--source_db_name tpch             Database to export from
--source_schema_name public       Schema to export (supports patterns like sales_%)
--fastbcp_dir_path                Path to the FastBCP binary directory
--target_storage_id s3_01         Credential ID for target storage (S3 here)
--n_jobs 4                        Export 4 tables in parallel
--fastbcp_p 2                     Split large tables into 2 partitions for faster transfer
--generate_metadata               Create CDM metadata files alongside Parquet files
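
The credentials file referenced by -a groups the database and storage credentials under the IDs used above. Its real schema is not shown in this overview, so every field name in the following sketch is an assumption for illustration only; only the credential IDs (log_db_ms, source_pg, s3_01) come from the example.

# NOTE: the JSON structure and field names below are guesses for illustration;
# check the LakeXpress documentation for the actual credentials schema.
cat > credentials.json <<'EOF'
{
  "databases": [
    { "id": "source_pg", "type": "postgresql", "host": "db.example.com",
      "port": 5432, "user": "exporter", "password": "<secret>" },
    { "id": "log_db_ms", "type": "sqlserver", "host": "logs.example.com",
      "port": 1433, "user": "lakexpress", "password": "<secret>" }
  ],
  "storages": [
    { "id": "s3_01", "type": "s3", "bucket": "my-datalake", "region": "eu-west-1",
      "access_key_id": "<key>", "secret_access_key": "<secret>" }
  ]
}
EOF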

Step 2: Run the sync

./LakeXpress sync

On Windows, run .\LakeXpress.exe sync instead.

Supported Databases

Database Type    As Source          As Log Database
SQL Server       ✅ Supported       ✅ Supported
PostgreSQL       ✅ Supported       ✅ Supported
Oracle           ✅ Supported       ❌ Not Supported
MySQL            ✅ Supported       ✅ Supported
MariaDB          ✅ Supported       ❌ Not Supported
SQLite           ❌ Not Supported   ✅ Supported
DuckDB           ❌ Not Supported   ✅ Supported

Supported Storage Backends

Storage Backend                     Support Status
Local Filesystem                    ✅ Supported
AWS S3                              ✅ Supported
S3-Compatible (OVH, MinIO, etc.)    ✅ Supported
Google Cloud Storage (GCS)          ✅ Supported
Azure Storage (ADLS Gen2, Blob)     ✅ Supported

Supported Publishing Targets

After export, LakeXpress can create tables in analytics platforms, either as external tables (the data stays in cloud storage) or as internal tables (the data is loaded into the platform's native storage). The supported publishing targets are:

  • Snowflake
  • Databricks
  • Microsoft Fabric
  • BigQuery
  • MotherDuck
  • AWS Glue
  • DuckLake

Set --publish_method external or --publish_method internal; the default is external. Per-target details are covered in the publishing guides.
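
For example, a configuration that keeps the data in cloud storage and publishes external tables could look like the sketch below, assuming --publish_method is passed to config create alongside the options shown earlier (selecting the catalog credential itself is not shown here and would need additional options):

# Assumption: --publish_method is a config create option; catalog selection flags are not shown.
./LakeXpress config create \
  -a ./credentials.json \
  --log_db_auth_id log_db_ms \
  --source_db_auth_id source_pg \
  --source_db_name tpch \
  --source_schema_name public \
  --fastbcp_dir_path ./FastBCP_linux-x64/latest/ \
  --target_storage_id s3_01 \
  --publish_method external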

Getting started

  1. Download the LakeXpress and FastBCP binaries for your platform
  2. Create a JSON file with your database and storage credentials
  3. (Optional) Run LakeXpress logdb init to set up the logging database
  4. Run LakeXpress config create to define your sync settings
  5. Run LakeXpress sync to export (a condensed sequence for steps 3-5 is sketched below)
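
A condensed Linux sequence for steps 3-5 might look like this; the flags passed to logdb init are borrowed from config create as an assumption, so check each command's --help for its actual options:

# 3. (optional) set up the logging database (flag names here are assumptions)
./LakeXpress logdb init -a ./credentials.json --log_db_auth_id log_db_ms
# 4. define the sync
./LakeXpress config create ...   # options as shown in the Quick example above
# 5. run the export
./LakeXpress sync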

See the Quick Start Guide for a full walkthrough.

Next steps