What is LakeXpress?
LakeXpress is a CLI tool that exports database tables to partitioned Parquet files on local disk or cloud storage (AWS S3, GCS, Azure) and registers them as tables in Snowflake, Databricks, or other catalogs. It uses FastBCP to stream data in parallel without exhausting memory.
Key features
- Cross-platform: Native binaries for Windows and Linux
- Multiple database sources: Oracle, PostgreSQL, SQL Server, MySQL/MariaDB
- Multiple cloud targets: AWS S3, S3-compatible, GCS, Azure, or local disk
- Catalog publishing: Snowflake, AWS Glue, Databricks, Microsoft Fabric, BigQuery, MotherDuck, DuckLake
- Incremental sync: Watermark-based delta exports
- Parallel exports: Multiple tables at once, with per-table partitioning
- Schema filtering: Include/exclude schemas and tables via SQL patterns
- Resume on failure: Pick up where a failed export left off
- CDM metadata: Generate Common Data Model files
- Export logging: Track runs, jobs, and file metadata in a dedicated database
Quick example
Create a sync configuration once, then run it.
Step 1: Create a sync configuration
Windows (PowerShell)
.\LakeXpress.exe config create `
-a .\credentials.json `
--log_db_auth_id log_db_ms `
--source_db_auth_id source_pg `
--source_db_name tpch `
--source_schema_name public `
--fastbcp_dir_path .\FastBCP_win-x64\latest\ `
--target_storage_id s3_01 `
--n_jobs 4 `
--fastbcp_p 2 `
--generate_metadata
Linux
./LakeXpress config create \
-a ./credentials.json \
--log_db_auth_id log_db_ms \
--source_db_auth_id source_pg \
--source_db_name tpch \
--source_schema_name public \
--fastbcp_dir_path ./FastBCP_linux-x64/latest/ \
--target_storage_id s3_01 \
--n_jobs 4 \
--fastbcp_p 2 \
--generate_metadata
Arguments:
| Argument | Description |
|---|---|
| -a ./credentials.json | Path to database and storage credentials |
| --log_db_auth_id log_db_ms | Credential ID for the logging database |
| --source_db_auth_id source_pg | Credential ID for the source PostgreSQL database |
| --source_db_name tpch | Database to export from |
| --source_schema_name public | Schema to export (supports patterns like sales_%) |
| --fastbcp_dir_path | Path to the FastBCP binary directory |
| --target_storage_id s3_01 | Credential ID for target storage (S3 here) |
| --n_jobs 4 | Export 4 tables in parallel |
| --fastbcp_p 2 | Split large tables into 2 partitions for faster transfer |
| --generate_metadata | Create CDM metadata files alongside the Parquet files |
Step 2: Run the sync
./LakeXpress sync
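When the run completes, each table in the source schema ends up as one or more Parquet files on the target storage. The layout below is only an illustration — the bucket name, prefix, and file-naming scheme are hypothetical, not LakeXpress's guaranteed output — but it shows the idea: with --fastbcp_p 2 a large table is split across two partition files, and --generate_metadata places CDM metadata alongside the data.
s3://my-bucket/tpch/public/orders/orders_1.parquet
s3://my-bucket/tpch/public/orders/orders_2.parquet
s3://my-bucket/tpch/public/customer/customer_1.parquet
s3://my-bucket/tpch/public/customer/customer_2.parquet
...plus CDM metadata files for each table when --generate_metadata is set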
Supported Databases
| Database Type | As Source | As Log Database |
|---|---|---|
| SQL Server | ✅ Supported | ✅ Supported |
| PostgreSQL | ✅ Supported | ✅ Supported |
| Oracle | ✅ Supported | ❌ Not Supported |
| MySQL | ✅ Supported | ✅ Supported |
| MariaDB | ✅ Supported | ❌ Not Supported |
| SQLite | ❌ Not Supported | ✅ Supported |
| DuckDB | ❌ Not Supported | ✅ Supported |
Supported Storage Backends
| Storage Backend | Support Status |
|---|---|
| Local Filesystem | ✅ Supported |
| AWS S3 | ✅ Supported |
| S3-Compatible (OVH, MinIO, etc.) | ✅ Supported |
| Google Cloud Storage (GCS) | ✅ Supported |
| Azure Storage (ADLS Gen2, Blob) | ✅ Supported |
Supported Publishing Targets
After export, LakeXpress can register the data as tables in analytics platforms, either as external tables (the data stays in cloud storage and is queried in place) or as internal tables (the data is loaded into the platform's native storage).
| Publishing Target | External | Internal |
|---|---|---|
| Snowflake | ✅ | ✅ |
| Databricks | ✅ | ✅ |
| Microsoft Fabric | ✅ | ✅ |
| BigQuery | ✅ | ✅ |
| MotherDuck | ✅ | ✅ |
| AWS Glue | ✅ | ❌ |
| DuckLake | ✅ | ❌ |
Select the mode with --publish_method external or --publish_method internal; the default is external.
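To make the distinction concrete, here is roughly what the two modes mean on Snowflake. This is plain Snowflake SQL for illustration only, not the DDL LakeXpress actually issues, and the stage, table, and column names are hypothetical.
-- External publish: the table is defined over the Parquet files in cloud storage;
-- queries read the files in place.
CREATE EXTERNAL TABLE tpch.public_orders
  LOCATION = @lake_stage/tpch/public/orders/
  FILE_FORMAT = (TYPE = PARQUET);

-- Internal publish: a native table is created and the Parquet data is loaded
-- into Snowflake's own storage.
CREATE TABLE tpch.public_orders (o_orderkey NUMBER, o_orderdate DATE);
COPY INTO tpch.public_orders
  FROM @lake_stage/tpch/public/orders/
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;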
Getting started
- Download the LakeXpress and FastBCP binaries for your platform
- Create a JSON file with your database and storage credentials
- (Optional) Run LakeXpress logdb init to set up the logging database
- Run LakeXpress config create to define your sync settings
- Run LakeXpress sync to export
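Step 2 above refers to the credentials file passed with -a. Its actual schema may differ from what is shown here; the JSON sketch below is purely illustrative and only conveys the idea of named credential entries (log_db_ms, source_pg, s3_01) that the config create flags point to — every key and placeholder value is hypothetical.
{
  "_note": "illustrative structure only — not the real schema",
  "databases": {
    "log_db_ms": { "type": "sqlserver", "host": "...", "user": "...", "password": "..." },
    "source_pg": { "type": "postgresql", "host": "...", "user": "...", "password": "..." }
  },
  "storage": {
    "s3_01": { "type": "s3", "bucket": "...", "access_key_id": "...", "secret_access_key": "..." }
  }
}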
See the Quick Start Guide for a full walkthrough.