MotherDuck Publishing

LakeXpress creates MotherDuck tables from exported Parquet files in cloud storage (S3, GCS, Azure).

Prerequisites

1. MotherDuck Account

Sign up at motherduck.com and generate an access token:

  1. Log in to MotherDuck
  2. Go to Settings → Access Tokens
  3. Create a new token
  4. Copy the token for your credentials file
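
To sanity-check a new token before wiring it into LakeXpress, a quick DuckDB connection is enough. A minimal sketch, assuming the duckdb Python package and the my_analytics database used in the examples below:

import duckdb

# Connect to MotherDuck with the token in the connection string
# (substitute your database name and token).
conn = duckdb.connect("md:my_analytics?motherduck_token=your-motherduck-access-token")

# A trivial query confirms the token and database are accepted.
print(conn.execute("SELECT 1").fetchone())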

2. Cloud Storage Setup

Configure credentials for cloud storage and MotherDuck:

{
    "s3_datalake": {
        "ds_type": "s3",
        "auth_mode": "profile",
        "info": {
            "directory": "s3://my-datalake-bucket/lakexpress/",
            "profile": "my-aws-profile"
        }
    },
    "motherduck_prod": {
        "ds_type": "motherduck",
        "auth_mode": "token",
        "info": {
            "database": "my_analytics",
            "token": "your-motherduck-access-token"
        }
    }
}

3. Storage Access from MotherDuck (Required for External Views)

Important: When using external views (--publish_method external), MotherDuck needs credentials to read your Parquet files from cloud storage. Without this, you’ll see errors like:

IO Error: No files found that match the pattern "s3://..."
HTTP Error: Permission error: Missing or invalid credentials

Configure secrets in MotherDuck before querying external views:

For AWS S3 (most common):

CREATE SECRET aws_s3_secret (
    TYPE S3,
    KEY_ID 'your_aws_access_key_id',
    SECRET 'your_aws_secret_access_key',
    REGION 'us-east-1'  -- your bucket's region
);

To find your AWS credentials:

aws configure get aws_access_key_id
aws configure get aws_secret_access_key
aws configure get region

For GCS:

CREATE SECRET gcs_secret (
    TYPE GCS,
    KEY_ID 'your_hmac_access_key',
    SECRET 'your_hmac_secret_key'
);

For Azure:

CREATE SECRET azure_secret (
    TYPE AZURE,
    ACCOUNT_NAME 'your_storage_account',
    ACCOUNT_KEY 'your_account_key'
);

Verify secrets are configured:

SELECT * FROM duckdb_secrets();
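
The same secrets can also be created from a script by executing the SQL above over a DuckDB connection to MotherDuck. A minimal sketch, assuming the duckdb Python package and the S3 secret form shown earlier:

import duckdb

conn = duckdb.connect("md:my_analytics?motherduck_token=your-motherduck-access-token")

# Create the S3 secret MotherDuck will use to read the exported Parquet files.
conn.execute("""
    CREATE SECRET aws_s3_secret (
        TYPE S3,
        KEY_ID 'your_aws_access_key_id',
        SECRET 'your_aws_secret_access_key',
        REGION 'us-east-1'
    )
""")

# Confirm the secret is visible before querying external views.
print(conn.execute("SELECT * FROM duckdb_secrets()").fetchall())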

Token Authentication

Store your token in the credentials file:

{
    "motherduck_prod": {
        "ds_type": "motherduck",
        "auth_mode": "token",
        "info": {
            "database": "my_analytics",
            "token": "your-motherduck-access-token"
        }
    }
}

Field      Required  Description
ds_type    Yes       Must be "motherduck"
auth_mode  Yes       Set to "token"
database   Yes       Target MotherDuck database name
token      Yes       MotherDuck access token
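
As an illustration only (this is not LakeXpress's own validation code), a short script can catch a missing field before a run; the credentials.json filename and motherduck_prod entry match the examples above:

import json

REQUIRED_INFO = {"database", "token"}  # per the field table above

with open("credentials.json") as f:
    creds = json.load(f)

entry = creds["motherduck_prod"]
assert entry["ds_type"] == "motherduck"
assert entry["auth_mode"] == "token"

missing = REQUIRED_INFO - entry["info"].keys()
if missing:
    raise SystemExit(f"motherduck_prod is missing fields: {sorted(missing)}")
print("motherduck_prod credential entry looks complete")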

Environment Variable Authentication

Read the token from the motherduck_token environment variable instead of storing it in the file:

{
    "motherduck_env": {
        "ds_type": "motherduck",
        "auth_mode": "env",
        "info": {
            "database": "my_analytics"
        }
    }
}

export motherduck_token="your-motherduck-access-token"
lakexpress ...

Field      Required  Description
ds_type    Yes       Must be "motherduck"
auth_mode  Yes       Set to "env"
database   Yes       Target MotherDuck database name

Configuration Options

Option                            Description                                     Default
--publish_target ID               Credential ID for MotherDuck target (required)  -
--publish_schema_pattern PATTERN  Schema naming pattern                           {schema}
--publish_table_pattern PATTERN   Table naming pattern                            {table}
--publish_method TYPE             Table type: external or internal                external
--n_jobs N                        Parallel workers for table creation             1

Table Types

External Views (Default)

Queries data directly from cloud storage via read_parquet().

lakexpress ... --publish_target motherduck_prod --publish_method external

Generated SQL:

CREATE OR REPLACE VIEW "my_database"."tpch_1"."orders" AS
SELECT * FROM read_parquet('s3://bucket/exports/tpch_1/orders/*.parquet')

Pros:

  • No data copying, instant publishing
  • No MotherDuck storage costs
  • Always reflects latest cloud storage data
  • Best for large datasets, infrequent queries, exploration

Trade-offs:

  • Query speed depends on cloud storage latency
  • Requires cloud storage credentials in MotherDuck
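
Because the view is only a wrapper around read_parquet(), querying it and querying the files directly return the same rows. A minimal sketch, assuming the view and S3 path from the generated SQL above plus an S3 secret configured in MotherDuck:

import duckdb

conn = duckdb.connect("md:my_database?motherduck_token=your_token")

# Count through the published external view ...
via_view = conn.execute(
    'SELECT count(*) FROM "my_database"."tpch_1"."orders"'
).fetchone()[0]

# ... and directly against the Parquet files the view wraps.
direct = conn.execute(
    "SELECT count(*) FROM read_parquet('s3://bucket/exports/tpch_1/orders/*.parquet')"
).fetchone()[0]

print(via_view == direct)  # both read the same files in cloud storage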

Internal Tables

Loads data into MotherDuck's native columnar storage.

lakexpress ... --publish_target motherduck_prod --publish_method internal

Generated SQL:

CREATE OR REPLACE TABLE "my_database"."tpch_1"."orders" AS
SELECT * FROM read_parquet('s3://bucket/exports/tpch_1/orders/*.parquet')

Pros:

  • Faster queries (optimized columnar format)
  • No storage credentials needed after load
  • Better caching for repeated queries
  • Best for dashboards and production analytics

Trade-offs:

  • Slower publishing (data must load)
  • MotherDuck storage costs apply
  • Data is a point-in-time snapshot
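
Because an internal table is a snapshot, it only changes when it is re-published: re-run LakeXpress with --publish_method internal, or re-issue the generated CREATE OR REPLACE TABLE statement. A minimal sketch of a manual refresh, using the same names as the generated SQL above:

import duckdb

conn = duckdb.connect("md:my_database?motherduck_token=your_token")

# Re-run the generated CTAS to load a fresh snapshot from the
# current Parquet files in cloud storage.
conn.execute("""
    CREATE OR REPLACE TABLE "my_database"."tpch_1"."orders" AS
    SELECT * FROM read_parquet('s3://bucket/exports/tpch_1/orders/*.parquet')
""")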

Dynamic Naming Patterns

Schema and table names support token-based patterns.

Supported Tokens

Token        Description                             Example Output
{schema}     Source schema name                      tpch_1
{table}      Source table name                       customer
{database}   Source database name                    tpch
{date}       Current date (YYYYMMDD)                 20251210
{timestamp}  Current timestamp (YYYYMMDD_HHMMSS)     20251210_143022
{uuid}       UUID4 identifier (consistent per run)   a1b2c3d4
{subpath}    CLI --sub_path parameter                staging
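
The tokens behave like plain string substitutions. For illustration only (this is not LakeXpress's internal implementation), the expansion can be pictured as:

from datetime import datetime
import uuid

def expand(pattern, schema, table, database, subpath, run_uuid):
    """Illustrative expansion of the pattern tokens documented above."""
    now = datetime.now()
    return pattern.format(
        schema=schema,
        table=table,
        database=database,
        date=now.strftime("%Y%m%d"),
        timestamp=now.strftime("%Y%m%d_%H%M%S"),
        uuid=run_uuid,  # one value reused for the whole run
        subpath=subpath,
    )

run_uuid = uuid.uuid4().hex[:8]
print(expand("lx_{schema}", "tpch_1", "customer", "tpch", "staging", run_uuid))       # lx_tpch_1
print(expand("{schema}_{table}", "tpch_1", "customer", "tpch", "staging", run_uuid))  # tpch_1_customer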

Common Patterns

Prefixed Schemas

lakexpress \
  --publish_schema_pattern "lx_{schema}" \
  --publish_table_pattern "{table}" \
  --publish_target motherduck_prod \
  ...

# Results:
# Schema: lx_tpch_1
# Tables: customer, orders, lineitem

Date-Partitioned Schemas

lakexpress \
  --publish_schema_pattern "{schema}_{date}" \
  --publish_table_pattern "{table}" \
  --publish_target motherduck_prod \
  ...

# Results:
# Schema: tpch_1_20251210
# Tables: customer, orders, lineitem

Consolidated Multi-Schema

lakexpress \
  --source_schema_name schema1,schema2 \
  --publish_schema_pattern "consolidated" \
  --publish_table_pattern "{schema}_{table}" \
  --publish_target motherduck_prod \
  ...

# Results:
# Schema: consolidated
# Tables: schema1_customer, schema2_customer

Usage Examples

Example 1: Basic Export with External Views

lakexpress -a credentials.json \
    --log_db_auth_id log_db \
    --source_db_auth_id postgres_prod \
    --source_schema_name public \
    --target_storage_id s3_datalake \
    --fastbcp_dir_path /path/to/FastBCP \
    --publish_target motherduck_prod

Result: Database my_analytics, schema public, external views over S3 Parquet files.

Example 2: Internal Tables with Parallel Execution

lakexpress -a credentials.json \
    --log_db_auth_id log_db \
    --source_db_auth_id postgres_prod \
    --source_schema_name tpch_1 \
    --target_storage_id s3_datalake \
    --fastbcp_dir_path /path/to/FastBCP \
    --publish_target motherduck_prod \
    --publish_method internal \
    --publish_schema_pattern "lx_{schema}" \
    --n_jobs 4

Result: Database my_analytics, schema lx_tpch_1, native tables loaded from S3.

Example 3: Daily Snapshots

lakexpress -a credentials.json \
    --log_db_auth_id log_db \
    --source_db_auth_id postgres_prod \
    --source_schema_name sales \
    --target_storage_id s3_datalake \
    --fastbcp_dir_path /path/to/FastBCP \
    --publish_target motherduck_prod \
    --publish_schema_pattern "sales_{date}" \
    --sub_path "daily/$(date +%Y%m%d)"

Result: Schema sales_20251210, data at s3://bucket/lakexpress/daily/20251210/.

Example 4: Environment Variable Token

export motherduck_token="your-token-here"

lakexpress -a credentials.json \
    --log_db_auth_id log_db \
    --source_db_auth_id postgres_prod \
    --source_schema_name public \
    --target_storage_id s3_datalake \
    --fastbcp_dir_path /path/to/FastBCP \
    --publish_target motherduck_env

Example 5: GCS Storage Backend

{
    "gcs_datalake": {
        "ds_type": "gcs",
        "auth_mode": "profile",
        "info": {
            "directory": "gs://my-datalake-bucket/lakexpress/",
            "profile": "/path/to/service-account.json"
        }
    },
    "motherduck_prod": {
        "ds_type": "motherduck",
        "auth_mode": "token",
        "info": {
            "database": "my_analytics",
            "token": "your-motherduck-token"
        }
    }
}

lakexpress -a credentials.json \
    --log_db_auth_id log_db \
    --source_db_auth_id postgres_prod \
    --source_schema_name public \
    --target_storage_id gcs_datalake \
    --fastbcp_dir_path /path/to/FastBCP \
    --publish_target motherduck_prod

Data Type Mapping

Source types map to DuckDB-compatible types automatically.

PostgreSQL → DuckDB/MotherDuck

PostgreSQL Type   DuckDB Type
INTEGER, INT4     INTEGER
BIGINT, INT8      BIGINT
SMALLINT, INT2    SMALLINT
NUMERIC(p,s)      DECIMAL(p,s)
REAL, FLOAT4      REAL
DOUBLE PRECISION  DOUBLE
VARCHAR(n), TEXT  VARCHAR
DATE              DATE
TIMESTAMP         TIMESTAMP
TIMESTAMPTZ       TIMESTAMPTZ
TIME              TIME
BOOLEAN           BOOLEAN
BYTEA             BLOB
JSON, JSONB       JSON
UUID              UUID

SQL Server → DuckDB/MotherDuck

SQL Server Type          DuckDB Type
INT                      INTEGER
BIGINT                   BIGINT
SMALLINT                 SMALLINT
TINYINT                  TINYINT
DECIMAL(p,s)             DECIMAL(p,s)
MONEY                    DECIMAL(19,4)
FLOAT                    DOUBLE
REAL                     REAL
VARCHAR(n), NVARCHAR(n)  VARCHAR
DATE                     DATE
DATETIME, DATETIME2      TIMESTAMP
DATETIMEOFFSET           TIMESTAMPTZ
TIME                     TIME
BIT                      BOOLEAN
VARBINARY                BLOB
UNIQUEIDENTIFIER         UUID

Oracle → DuckDB/MotherDuck

Oracle Type          DuckDB Type
NUMBER               DECIMAL(38,9)
NUMBER(p,s)          DECIMAL(p,s)
FLOAT                DOUBLE
VARCHAR2, NVARCHAR2  VARCHAR
CLOB, NCLOB          VARCHAR
DATE                 TIMESTAMP
TIMESTAMP            TIMESTAMP
RAW, BLOB            BLOB
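
To check how a published table's columns came out, DESCRIBE over a MotherDuck connection lists each column with its DuckDB type. A minimal sketch, assuming the lx_tpch_1.customer table from the earlier examples:

import duckdb

conn = duckdb.connect("md:my_analytics?motherduck_token=your_token")

# Each row is a column name plus its DuckDB type, so the mapping
# (e.g. NUMERIC(p,s) -> DECIMAL(p,s)) can be verified on the published table.
for name, dtype, *_ in conn.execute("DESCRIBE my_analytics.lx_tpch_1.customer").fetchall():
    print(name, dtype)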

CLI Reference

MotherDuck Publishing Arguments

Option                            Type     Description
--publish_target ID               String   Credential ID for MotherDuck publishing (required)
--publish_schema_pattern PATTERN  String   Dynamic schema naming pattern (default: {schema})
--publish_table_pattern PATTERN   String   Dynamic table naming pattern (default: {table})
--publish_method TYPE             String   Table type: external (default) or internal
--n_jobs N                        Integer  Number of parallel workers for table creation (default: 1)

Querying MotherDuck Tables

Once published, query tables via any of these methods.

MotherDuck Web UI:

Log in to app.motherduck.com and run:

SELECT * FROM my_analytics.lx_tpch_1.customer LIMIT 10;

DuckDB CLI:

duckdb "md:my_analytics?motherduck_token=your_token"
SELECT * FROM lx_tpch_1.customer LIMIT 10;

Python (duckdb):

import duckdb

conn = duckdb.connect("md:my_analytics?motherduck_token=your_token")
df = conn.execute("SELECT * FROM lx_tpch_1.customer LIMIT 10").df()
print(df)

Python with environment variable:

import os
import duckdb

os.environ["motherduck_token"] = "your_token"
conn = duckdb.connect("md:my_analytics")
df = conn.execute("SELECT * FROM lx_tpch_1.customer LIMIT 10").df()
print(df)

Troubleshooting

Common Issues

“Authentication failed”:

  • Verify your token is valid and not expired
  • Check token permissions for the target database
  • For env mode, ensure motherduck_token is set

“Database not found”:

  • Verify the database name matches an existing MotherDuck database
  • Create the database in MotherDuck UI before publishing

“IO Error: No files found that match the pattern” or “Permission error: Missing or invalid credentials”:

  • This means MotherDuck cannot access your private S3/GCS/Azure bucket
  • Create a secret in MotherDuck with your cloud storage credentials (see Storage Access)
  • Verify secrets are configured: SELECT * FROM duckdb_secrets();
  • Test access: SELECT count(*) FROM read_parquet('s3://your-bucket/path/*.parquet');

“Cannot read file” (external views):

  • Verify cloud storage credentials in MotherDuck
  • Check that the S3/GCS/Azure path contains valid Parquet files
  • Ensure the bucket allows access from MotherDuck
  • Verify the files exist: aws s3 ls s3://your-bucket/path/ --recursive | head

“Connection timeout”:

  • Check network connectivity
  • MotherDuck requires outbound HTTPS
  • Verify no firewall is blocking the connection

Verifying Setup

Test MotherDuck connectivity:

duckdb "md:my_analytics?motherduck_token=your_token" -c "SELECT 1"

Test cloud storage access from MotherDuck:

SELECT * FROM read_parquet('s3://your-bucket/path/to/file.parquet') LIMIT 1;
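
Both checks can be combined into one short script run from anywhere the duckdb Python package is installed. A minimal sketch; adjust the database, token, and Parquet path:

import duckdb

conn = duckdb.connect("md:my_analytics?motherduck_token=your_token")

# 1. MotherDuck connectivity.
assert conn.execute("SELECT 1").fetchone()[0] == 1

# 2. Cloud storage access through the secret configured in MotherDuck.
row = conn.execute(
    "SELECT * FROM read_parquet('s3://your-bucket/path/to/file.parquet') LIMIT 1"
).fetchone()
print("storage access OK:", row is not None)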

Configuring Storage Secrets

See Storage Access from MotherDuck in Prerequisites for detailed instructions on configuring S3, GCS, or Azure secrets.

Comparison with Other Targets

Feature          MotherDuck                 BigQuery               Snowflake                  DuckLake
Query Engine     DuckDB                     BigQuery               Snowflake                  DuckDB
External Tables  Views with read_parquet()  External tables        External tables            DuckLake catalog
Native Tables    CREATE TABLE AS SELECT     Load jobs              COPY INTO                  Not supported
Serverless       Yes                        Yes                    No (warehouse required)    N/A (local)
SQL Dialect      DuckDB/PostgreSQL          BigQuery SQL           Snowflake SQL              DuckDB/PostgreSQL
Best For         Analytics, exploration     Large-scale analytics  Enterprise data warehouse  Local development