What is LakeXpress?
LakeXpress is a CLI tool that exports database tables to partitioned Parquet files on local disk or cloud storage (AWS S3, GCS, Azure) and registers them as tables in Snowflake, Databricks, or other catalogs. It uses FastBCP to stream data in parallel without exhausting memory.
Key features
- Cross-platform: Native binaries for Windows and Linux
- Multiple database sources: Oracle, PostgreSQL, SQL Server, MySQL/MariaDB
- Multiple cloud targets: AWS S3, S3-compatible, GCS, Azure, or local disk
- Catalog publishing: Snowflake, AWS Glue, Databricks, Microsoft Fabric, BigQuery, MotherDuck, DuckLake
- Incremental sync: Watermark-based delta exports
- Parallel exports: Multiple tables at once, with per-table partitioning
- Schema filtering: Include/exclude schemas and tables via SQL patterns
- Resume on failure: Pick up where a failed export left off
- CDM metadata: Generate Common Data Model files
- Export logging: Track runs, jobs, and file metadata in a dedicated database
Quick example
Create a sync configuration once, then run it.
Step 1: Create a sync configuration
Windows (PowerShell)
.\LakeXpress.exe config create `
-a .\credentials.json `
--log_db_auth_id log_db_ms `
--source_db_auth_id source_pg `
--source_db_name tpch `
--source_schema_name public `
--fastbcp_dir_path .\FastBCP_win-x64\latest\ `
--target_storage_id s3_01 `
--n_jobs 4 `
--fastbcp_p 2 `
--generate_metadata
Linux
./LakeXpress config create \
-a ./credentials.json \
--log_db_auth_id log_db_ms \
--source_db_auth_id source_pg \
--source_db_name tpch \
--source_schema_name public \
--fastbcp_dir_path ./FastBCP_linux-x64/latest/ \
--target_storage_id s3_01 \
--n_jobs 4 \
--fastbcp_p 2 \
--generate_metadata
Arguments:
| Argument | Description |
|---|---|
| -a ./credentials.json | Path to database and storage credentials |
| --log_db_auth_id log_db_ms | Credential ID for the logging database |
| --source_db_auth_id source_pg | Credential ID for the source PostgreSQL database |
| --source_db_name tpch | Database to export from |
| --source_schema_name public | Schema to export (supports patterns like sales_%) |
| --fastbcp_dir_path | Path to the FastBCP binary directory |
| --target_storage_id s3_01 | Credential ID for target storage (S3 here) |
| --n_jobs 4 | Export 4 tables in parallel |
| --fastbcp_p 2 | Split large tables into 2 partitions for faster transfer |
| --generate_metadata | Create CDM metadata files alongside the Parquet files |
Step 2: Run the sync
./LakeXpress sync
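When the run completes, each table in the source schema ends up as one or more Parquet files on the target storage. The layout below is only an illustration — the bucket name, prefix, and file-naming scheme are hypothetical, not LakeXpress's guaranteed output — but it shows the idea: with --fastbcp_p 2 a large table is split across two partition files, and --generate_metadata places CDM metadata alongside the data.
s3://my-bucket/tpch/public/orders/orders_1.parquet
s3://my-bucket/tpch/public/orders/orders_2.parquet
s3://my-bucket/tpch/public/customer/customer_1.parquet
s3://my-bucket/tpch/public/customer/customer_2.parquet
...plus CDM metadata files for each table when --generate_metadata is set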
Supported Databases
| Database Type | As Source | As Log Database |
|---|---|---|
| SQL Server | ✅ Supported | ✅ Supported |
| PostgreSQL | ✅ Supported | ✅ Supported |
| Oracle | ✅ Supported | ❌ Not Supported |
| MySQL | ✅ Supported | ✅ Supported |
| MariaDB | ✅ Supported | ❌ Not Supported |
| SQLite | ❌ Not Supported | ✅ Supported |
| DuckDB | ❌ Not Supported | ✅ Supported |
Supported Storage Backends
| Storage Backend | Support Status |
|---|---|
| Local Filesystem | ✅ Supported |
| AWS S3 | ✅ Supported |
| S3-Compatible (OVH, MinIO, etc.) | ✅ Supported |
| Google Cloud Storage (GCS) | ✅ Supported |
| Azure Storage (ADLS Gen2, Blob) | ✅ Supported |
Supported Publishing Targets
After export, LakeXpress can register the data as tables in analytics platforms, either as external tables (the data stays in cloud storage and is queried in place) or as internal tables (the data is loaded into the platform's native storage).
| Publishing Target | External | Internal |
|---|---|---|
| Snowflake | ✅ | ✅ |
| Databricks | ✅ | ✅ |
| Microsoft Fabric | ✅ | ✅ |
| BigQuery | ✅ | ✅ |
| MotherDuck | ✅ | ✅ |
| AWS Glue | ✅ | ❌ |
| DuckLake | ✅ | ❌ |
Select the mode with --publish_method external or --publish_method internal; the default is external.
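To make the distinction concrete, here is roughly what the two modes mean on Snowflake. This is plain Snowflake SQL for illustration only, not the DDL LakeXpress actually issues, and the stage, table, and column names are hypothetical.
-- External publish: the table is defined over the Parquet files in cloud storage;
-- queries read the files in place.
CREATE EXTERNAL TABLE tpch.public_orders
  LOCATION = @lake_stage/tpch/public/orders/
  FILE_FORMAT = (TYPE = PARQUET);

-- Internal publish: a native table is created and the Parquet data is loaded
-- into Snowflake's own storage.
CREATE TABLE tpch.public_orders (o_orderkey NUMBER, o_orderdate DATE);
COPY INTO tpch.public_orders
  FROM @lake_stage/tpch/public/orders/
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;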
Getting started
- Download the LakeXpress and FastBCP binaries for your platform
- Create a JSON file with your database and storage credentials
- (Optional) Run LakeXpress logdb init to set up the logging database
- Run LakeXpress config create to define your sync settings
- Run LakeXpress sync to export
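Step 2 above refers to the credentials file passed with -a. Its actual schema may differ from what is shown here; the JSON sketch below is purely illustrative and only conveys the idea of named credential entries (log_db_ms, source_pg, s3_01) that the config create flags point to — every key and placeholder value is hypothetical.
{
  "_note": "illustrative structure only — not the real schema",
  "databases": {
    "log_db_ms": { "type": "sqlserver", "host": "...", "user": "...", "password": "..." },
    "source_pg": { "type": "postgresql", "host": "...", "user": "...", "password": "..." }
  },
  "storage": {
    "s3_01": { "type": "s3", "bucket": "...", "access_key_id": "...", "secret_access_key": "..." }
  }
}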
See the Quick Start Guide for a full walkthrough.