Quick start guide

Two steps: create a config, then run a sync.

Prerequisites

  1. LakeXpress and FastBCP binaries (Windows or Linux)
  2. Source database connection details (read access to tables and information_schema) and a logging database
  3. A storage destination: local directory or cloud credentials (S3, GCS, Azure)
  4. Publishing credentials (optional) for Snowflake, Databricks, BigQuery, etc.

Create a credentials file

{
  "log_db_postgres": {
    "ds_type": "postgres",
    "auth_mode": "classic",
    "info": {
      "username": "postgres",
      "password": "${DB_PASSWORD}",
      "server": "localhost",
      "port": 5432,
      "database": "lakexpress_log"
    }
  },
  "source_postgres": {
    "ds_type": "postgres",
    "auth_mode": "classic",
    "info": {
      "username": "postgres",
      "password": "${DB_PASSWORD}",
      "server": "localhost",
      "port": 5432,
      "database": "production_db"
    }
  },
  "s3_01": {
    "ds_type": "s3",
    "auth_mode": "profile",
    "info": {
      "directory": "s3://my-data-lake/exports",
      "profile": "your-aws-profile"
    }
  }
}

Save as credentials.json in a secure location.
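Because this file can hold plain-text secrets, it is worth locking down its permissions. On Linux, for example:

# Restrict credentials.json to the current user only
chmod 600 credentials.json

On Windows, the equivalent is to restrict the file's ACL to your own account.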

Tip: Environment variables – use ${VAR_NAME} in any string value and set the variable before running LakeXpress (examples below). Plain-text passwords also work.

# Linux
export DB_PASSWORD="your_password"

# Windows (cmd)
set DB_PASSWORD=your_password

# Windows (PowerShell)
$env:DB_PASSWORD = "your_password"

For other databases (Oracle, SQL Server, MySQL) and storage backends (GCS, Azure), see Database Configuration and Storage Backends.
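As an illustration, a MySQL source entry would follow the same shape as the entries above. Treat this as a sketch: the exact ds_type value and info fields for each engine are defined in Database Configuration, and the host, port, and names here are placeholders.

{
  "source_mysql": {
    "ds_type": "mysql",
    "auth_mode": "classic",
    "info": {
      "username": "app_reader",
      "password": "${MYSQL_PASSWORD}",
      "server": "mysql.internal",
      "port": 3306,
      "database": "production_db"
    }
  }
}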

Initialize the logging database (optional)

The logging database tracks syncs, runs, and table exports. LakeXpress creates the schema automatically on first sync, so this step is optional.

Use logdb init to verify connectivity or pre-create the schema for audit purposes.

Windows (PowerShell)

.\LakeXpress.exe logdb init `
  -a credentials.json `
  --log_db_auth_id log_db_postgres

Linux

./LakeXpress logdb init \
  -a credentials.json \
  --log_db_auth_id log_db_postgres

Create a sync configuration

The configuration is stored in the logging database and reused for every sync.

Export to local filesystem

Windows (PowerShell)

.\LakeXpress.exe config create `
  -a credentials.json `
  --log_db_auth_id log_db_postgres `
  --source_db_auth_id source_postgres `
  --source_db_name production_db `
  --source_schema_name public `
  --fastbcp_dir_path .\FastBCP_win-x64\latest\ `
  --output_dir .\exports `
  --n_jobs 4 `
  --fastbcp_p 2

Linux

./LakeXpress config create \
  -a credentials.json \
  --log_db_auth_id log_db_postgres \
  --source_db_auth_id source_postgres \
  --source_db_name production_db \
  --source_schema_name public \
  --fastbcp_dir_path ./FastBCP_linux-x64/latest/ \
  --output_dir ./exports \
  --n_jobs 4 \
  --fastbcp_p 2

Exports all tables from the public schema to ./exports/public/<table_name>/, processing 4 tables in parallel with 2-way partitioning per table.
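The on-disk layout then looks roughly like the tree below. The table names are placeholders, and the data file names, count, and format (shown here as Parquet) depend on FastBCP's partitioning and output settings.

exports/
  public/
    customer/
      customer_1.parquet
      customer_2.parquet
    orders/
      orders_1.parquet
      orders_2.parquet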

Run the sync

Windows (PowerShell)

.\LakeXpress.exe sync

Linux

./LakeXpress sync

Loads the config from the logging database, exports tables, and shows real-time progress.


More examples

Export to cloud storage

Export to AWS S3 with CDM metadata:

Windows (PowerShell)

.\LakeXpress.exe config create `
  -a credentials.json `
  --log_db_auth_id log_db_postgres `
  --source_db_auth_id source_postgres `
  --source_db_name tpch `
  --source_schema_name public `
  --fastbcp_dir_path .\FastBCP_win-x64\latest\ `
  --target_storage_id s3_01 `
  --n_jobs 4 `
  --fastbcp_p 2 `
  --generate_metadata

.\LakeXpress.exe sync

Linux

./LakeXpress config create \
  -a credentials.json \
  --log_db_auth_id log_db_postgres \
  --source_db_auth_id source_postgres \
  --source_db_name tpch \
  --source_schema_name public \
  --fastbcp_dir_path ./FastBCP_linux-x64/latest/ \
  --target_storage_id s3_01 \
  --n_jobs 4 \
  --fastbcp_p 2 \
  --generate_metadata

./LakeXpress sync

Exports to S3 and generates CDM metadata files.
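CDM metadata describes each exported table (entity name, attributes and types, and the location of its data files) so downstream tools can discover the export. The exact files LakeXpress writes depend on the CDM flavor it targets; purely as an illustration, a model.json-style entity entry looks like this:

{
  "name": "public",
  "entities": [
    {
      "$type": "LocalEntity",
      "name": "customer",
      "attributes": [
        { "name": "c_custkey", "dataType": "int64" },
        { "name": "c_name", "dataType": "string" }
      ],
      "partitions": [
        { "location": "s3://my-data-lake/exports/public/customer/customer_1.parquet" }
      ]
    }
  ]
}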

Filter tables with patterns

Use include/exclude patterns to select specific tables:

Windows (PowerShell)

.\LakeXpress.exe config create `
  -a credentials.json `
  --log_db_auth_id log_db_postgres `
  --source_db_auth_id source_postgres `
  --source_db_name production_db `
  --source_schema_name public `
  --include "orders%, customer%, product%" `
  --exclude "temp%, test%" `
  --fastbcp_dir_path .\FastBCP_win-x64\latest\ `
  --output_dir .\exports `
  --n_jobs 4

.\LakeXpress.exe sync

Linux

./LakeXpress config create \
  -a credentials.json \
  --log_db_auth_id log_db_postgres \
  --source_db_auth_id source_postgres \
  --source_db_name production_db \
  --source_schema_name public \
  --include "orders%, customer%, product%" \
  --exclude "temp%, test%" \
  --fastbcp_dir_path ./FastBCP_linux-x64/latest/ \
  --output_dir ./exports \
  --n_jobs 4

./LakeXpress sync

Includes tables matching orders%, customer%, or product%; excludes those matching temp% or test%.
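For intuition, the selection behaves like a SQL LIKE filter over the schema's table list (a conceptual sketch, not the exact query LakeXpress runs):

SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
  AND (table_name LIKE 'orders%'
       OR table_name LIKE 'customer%'
       OR table_name LIKE 'product%')
  AND table_name NOT LIKE 'temp%'
  AND table_name NOT LIKE 'test%';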

Incremental sync

Use a watermark column so subsequent syncs only export new rows:

./LakeXpress config create \
  -a credentials.json \
  --log_db_auth_id log_db_postgres \
  --source_db_auth_id source_postgres \
  --source_db_name tpch \
  --source_schema_name tpch_1_incremental \
  --fastbcp_dir_path ./FastBCP_linux-x64/latest/ \
  --target_storage_id s3_01 \
  --incremental_table "tpch_1_incremental.orders:o_orderdate:date" \
  --incremental_table "tpch_1_incremental.lineitem:l_shipdate:date" \
  --generate_metadata

# First sync: exports everything and records high watermarks
./LakeXpress sync

# Later syncs: only exports rows past the watermark
./LakeXpress sync

Tables not configured as incremental are fully exported each sync. See the Incremental Sync guide for details.
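Conceptually, the watermark changes the extraction query for each configured table on later runs. A sketch (the actual queries LakeXpress and FastBCP issue may differ):

-- First sync: full export; the maximum o_orderdate seen is recorded as the watermark
SELECT * FROM tpch_1_incremental.orders;

-- Later syncs: only rows past the recorded watermark (illustrative value)
SELECT * FROM tpch_1_incremental.orders
WHERE o_orderdate > DATE '1998-08-02';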

Resume failed syncs

./LakeXpress sync --run_id 20251208-f7g8h9i0-j1k2-l3m4 --resume

Skips completed tables and retries only the failed ones.

Snowflake publishing

Export to S3 and create Snowflake external tables in one step (this assumes a snowflake_prod publishing entry has been added to credentials.json; see the Snowflake Publishing Guide):

./LakeXpress config create \
  -a credentials.json \
  --log_db_auth_id log_db_postgres \
  --source_db_auth_id source_postgres \
  --source_db_name production_db \
  --source_schema_name public \
  --fastbcp_dir_path ./FastBCP_linux-x64/latest/ \
  --target_storage_id s3_01 \
  --publish_target snowflake_prod \
  --n_jobs 4

./LakeXpress sync

Query the data in Snowflake:

SELECT * FROM PUBLIC.V_CUSTOMER LIMIT 10;

For internal tables with primary key constraints, add --publish_method internal --snowflake_pk_constraints. See the Snowflake Publishing Guide.
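To check what the publish step created, you can list the objects in the target schema with standard Snowflake commands (object names depend on your configuration):

SHOW EXTERNAL TABLES IN SCHEMA PUBLIC;
SHOW VIEWS LIKE 'V_%' IN SCHEMA PUBLIC;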

Reference

List configurations

./LakeXpress config list \
  -a credentials.json \
  --log_db_auth_id log_db_postgres

Check sync status

./LakeXpress status -a credentials.json --log_db_auth_id log_db_postgres --sync_id <your-sync-id>

Manage the logging database

# Initialize the schema
./LakeXpress logdb init -a credentials.json --log_db_auth_id log_db_postgres

# Clear run history (keeps schema)
./LakeXpress logdb truncate -a credentials.json --log_db_auth_id log_db_postgres

# Drop the schema
./LakeXpress logdb drop -a credentials.json --log_db_auth_id log_db_postgres --confirm

Next steps