# MotherDuck Publishing
LakeXpress creates MotherDuck tables from exported Parquet files in cloud storage (S3, GCS, Azure).
## Table of Contents
- Prerequisites
- Authentication
- Configuration Options
- Table Types
- Dynamic Naming Patterns
- Usage Examples
- Data Type Mapping
- CLI Reference
- Querying MotherDuck Tables
- Troubleshooting
- Comparison with Other Targets
## Prerequisites

### 1. MotherDuck Account
Sign up at motherduck.com and generate an access token:
- Log in to MotherDuck
- Go to Settings → Access Tokens
- Create a new token
- Copy the token for your credentials file
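To confirm the token works before wiring it into LakeXpress, you can open a connection from Python (a quick check, assuming the `duckdb` package is installed; the token below is a placeholder):

```python
import duckdb

# "md:" with no database name connects to your MotherDuck account;
# the token is passed in the connection string (placeholder value).
conn = duckdb.connect("md:?motherduck_token=your-motherduck-access-token")
print(conn.execute("SELECT 1").fetchone())  # prints (1,) on success
```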
### 2. Cloud Storage Setup
Configure credentials for cloud storage and MotherDuck:
```json
{
  "s3_datalake": {
    "ds_type": "s3",
    "auth_mode": "profile",
    "info": {
      "directory": "s3://my-datalake-bucket/lakexpress/",
      "profile": "my-aws-profile"
    }
  },
  "motherduck_prod": {
    "ds_type": "motherduck",
    "auth_mode": "token",
    "info": {
      "database": "my_analytics",
      "token": "your-motherduck-access-token"
    }
  }
}
```
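Before running LakeXpress, it can be worth sanity-checking this file against the fields documented in the Authentication section below. A minimal sketch, assuming the `credentials.json` layout shown above:

```python
import json

# Check each MotherDuck entry for the fields this page documents.
with open("credentials.json") as f:
    creds = json.load(f)

for name, entry in creds.items():
    if entry.get("ds_type") != "motherduck":
        continue
    info = entry.get("info", {})
    missing = [k for k in ["database"] if k not in info]
    if entry.get("auth_mode") == "token" and "token" not in info:
        missing.append("token")
    print(name, "OK" if not missing else f"missing: {missing}")
```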
### 3. Storage Access from MotherDuck (Required for External Views)
**Important:** When using external views (`--publish_method external`), MotherDuck needs credentials to read your Parquet files from cloud storage. Without this, you’ll see errors like:

```
IO Error: No files found that match the pattern "s3://..."
HTTP Error: Permission error: Missing or invalid credentials
```
Configure secrets in MotherDuck before querying external views:
**For AWS S3 (most common):**

```sql
CREATE SECRET aws_s3_secret (
    TYPE S3,
    KEY_ID 'your_aws_access_key_id',
    SECRET 'your_aws_secret_access_key',
    REGION 'us-east-1' -- your bucket's region
);
```

To find your AWS credentials:
To find your AWS credentials:
```bash
aws configure get aws_access_key_id
aws configure get aws_secret_access_key
aws configure get region
```
**For GCS:**

```sql
CREATE SECRET gcs_secret (
    TYPE GCS,
    KEY_ID 'your_hmac_access_key',
    SECRET 'your_hmac_secret_key'
);
```
**For Azure:**

```sql
CREATE SECRET azure_secret (
    TYPE AZURE,
    ACCOUNT_NAME 'your_storage_account',
    ACCOUNT_KEY 'your_account_key'
);
```
Verify secrets are configured:

```sql
SELECT * FROM duckdb_secrets();
```
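If you prefer to script this step, the same statements can be issued through a DuckDB connection. A sketch for the S3 case (token, keys, and region are placeholders; `CREATE OR REPLACE SECRET` keeps the script idempotent):

```python
import duckdb

# Connect to MotherDuck and create the storage secret programmatically.
conn = duckdb.connect("md:my_analytics?motherduck_token=your_token")
conn.execute("""
    CREATE OR REPLACE SECRET aws_s3_secret (
        TYPE S3,
        KEY_ID 'your_aws_access_key_id',
        SECRET 'your_aws_secret_access_key',
        REGION 'us-east-1'
    )
""")
# Confirm the secret is registered.
print(conn.execute("SELECT name, type FROM duckdb_secrets()").fetchall())
```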
## Authentication

### Token Authentication (Recommended)
Store your token in the credentials file:
```json
{
  "motherduck_prod": {
    "ds_type": "motherduck",
    "auth_mode": "token",
    "info": {
      "database": "my_analytics",
      "token": "your-motherduck-access-token"
    }
  }
}
```
| Field | Required | Description |
|---|---|---|
| `ds_type` | Yes | Must be `"motherduck"` |
| `auth_mode` | Yes | Set to `"token"` |
| `database` | Yes | Target MotherDuck database name |
| `token` | Yes | MotherDuck access token |
### Environment Variable Authentication
Read the token from an environment variable:
```json
{
  "motherduck_env": {
    "ds_type": "motherduck",
    "auth_mode": "env",
    "info": {
      "database": "my_analytics"
    }
  }
}
```

```bash
export motherduck_token="your-motherduck-access-token"
lakexpress ...
```
| Field | Required | Description |
|---|---|---|
| `ds_type` | Yes | Must be `"motherduck"` |
| `auth_mode` | Yes | Set to `"env"` |
| `database` | Yes | Target MotherDuck database name |
## Configuration Options
| Option | Description | Default |
|---|---|---|
| `--publish_target ID` | Credential ID for MotherDuck target (required) | - |
| `--publish_schema_pattern PATTERN` | Schema naming pattern | `{schema}` |
| `--publish_table_pattern PATTERN` | Table naming pattern | `{table}` |
| `--publish_method TYPE` | Table type: `external` or `internal` | `external` |
| `--n_jobs N` | Parallel workers for table creation | 1 |
## Table Types

### External Views (Default)
Queries data directly from cloud storage via `read_parquet()`.
```bash
lakexpress ... --publish_target motherduck_prod --publish_method external
```
Generated SQL:
```sql
CREATE OR REPLACE VIEW "my_database"."tpch_1"."orders" AS
SELECT * FROM read_parquet('s3://bucket/exports/tpch_1/orders/*.parquet')
```
**Pros:**
- No data copying, instant publishing
- No MotherDuck storage costs
- Always reflects latest cloud storage data
- Best for large datasets, infrequent queries, exploration
**Trade-offs:**
- Query speed depends on cloud storage latency
- Requires cloud storage credentials in MotherDuck
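Before publishing, you can dry-run the same scan an external view will perform, to confirm MotherDuck can actually reach the files. A sketch with placeholder path and token:

```python
import duckdb

# Run the scan the external view would run; a credential or path
# problem surfaces here rather than at first query time.
conn = duckdb.connect("md:my_analytics?motherduck_token=your_token")
n = conn.execute(
    "SELECT count(*) FROM read_parquet('s3://bucket/exports/tpch_1/orders/*.parquet')"
).fetchone()[0]
print(f"{n} rows readable from MotherDuck")
```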
### Internal Tables
Loads data into MotherDuck native columnar storage.
```bash
lakexpress ... --publish_target motherduck_prod --publish_method internal
```
Generated SQL:
```sql
CREATE OR REPLACE TABLE "my_database"."tpch_1"."orders" AS
SELECT * FROM read_parquet('s3://bucket/exports/tpch_1/orders/*.parquet')
```
**Pros:**
- Faster queries (optimized columnar format)
- No storage credentials needed after load
- Better caching for repeated queries
- Best for dashboards and production analytics
**Trade-offs:**
- Slower publishing (data must load)
- MotherDuck storage costs apply
- Data is a point-in-time snapshot
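Because internal tables are snapshots, refreshing them means re-running the load; re-invoking LakeXpress does this for the whole schema. To refresh a single table ad hoc, you could issue the same `CREATE OR REPLACE TABLE` statement yourself (a sketch; names and path are placeholders):

```python
import duckdb

# Re-materialize one internal table from the current Parquet files.
conn = duckdb.connect("md:my_analytics?motherduck_token=your_token")
conn.execute("""
    CREATE OR REPLACE TABLE "my_analytics"."tpch_1"."orders" AS
    SELECT * FROM read_parquet('s3://bucket/exports/tpch_1/orders/*.parquet')
""")
```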
## Dynamic Naming Patterns
Schema and table names support token-based patterns.
### Supported Tokens
| Token | Description | Example Output |
|---|---|---|
| `{schema}` | Source schema name | `tpch_1` |
| `{table}` | Source table name | `customer` |
| `{database}` | Source database name | `tpch` |
| `{date}` | Current date (YYYYMMDD) | `20251210` |
| `{timestamp}` | Current timestamp (YYYYMMDD_HHMMSS) | `20251210_143022` |
| `{uuid}` | UUID4 identifier (consistent per run) | `a1b2c3d4` |
| `{subpath}` | CLI `--sub_path` parameter | `staging` |
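The expansion itself is plain token substitution. An illustrative sketch of how a pattern resolves (not LakeXpress's internal code; `{uuid}` and `{subpath}` are filled in by the run and the CLI respectively):

```python
from datetime import datetime

def expand(pattern: str, **tokens: str) -> str:
    # Substitute {token} placeholders in a naming pattern.
    return pattern.format(**tokens)

now = datetime.now()
tokens = {
    "schema": "tpch_1",
    "table": "customer",
    "database": "tpch",
    "date": now.strftime("%Y%m%d"),
    "timestamp": now.strftime("%Y%m%d_%H%M%S"),
}
print(expand("lx_{schema}", **tokens))       # lx_tpch_1
print(expand("{schema}_{date}", **tokens))   # tpch_1_<today>
print(expand("{schema}_{table}", **tokens))  # tpch_1_customer
```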
### Common Patterns

#### Prefixed Schemas
```bash
lakexpress \
  --publish_schema_pattern "lx_{schema}" \
  --publish_table_pattern "{table}" \
  --publish_target motherduck_prod \
  ...

# Results:
# Schema: lx_tpch_1
# Tables: customer, orders, lineitem
```
#### Date-Partitioned Schemas
```bash
lakexpress \
  --publish_schema_pattern "{schema}_{date}" \
  --publish_table_pattern "{table}" \
  --publish_target motherduck_prod \
  ...

# Results:
# Schema: tpch_1_20251210
# Tables: customer, orders, lineitem
```
#### Consolidated Multi-Schema
```bash
lakexpress \
  --source_schema_name schema1,schema2 \
  --publish_schema_pattern "consolidated" \
  --publish_table_pattern "{schema}_{table}" \
  --publish_target motherduck_prod \
  ...

# Results:
# Schema: consolidated
# Tables: schema1_customer, schema2_customer
```
## Usage Examples

### Example 1: Basic Export with External Views
```bash
lakexpress -a credentials.json \
  --log_db_auth_id log_db \
  --source_db_auth_id postgres_prod \
  --source_schema_name public \
  --target_storage_id s3_datalake \
  --fastbcp_dir_path /path/to/FastBCP \
  --publish_target motherduck_prod
```
**Result:** Database `my_analytics`, schema `public`, external views over S3 Parquet files.
### Example 2: Internal Tables with Parallel Execution
```bash
lakexpress -a credentials.json \
  --log_db_auth_id log_db \
  --source_db_auth_id postgres_prod \
  --source_schema_name tpch_1 \
  --target_storage_id s3_datalake \
  --fastbcp_dir_path /path/to/FastBCP \
  --publish_target motherduck_prod \
  --publish_method internal \
  --publish_schema_pattern "lx_{schema}" \
  --n_jobs 4
```
**Result:** Database `my_analytics`, schema `lx_tpch_1`, native tables loaded from S3.
### Example 3: Daily Snapshots
```bash
lakexpress -a credentials.json \
  --log_db_auth_id log_db \
  --source_db_auth_id postgres_prod \
  --source_schema_name sales \
  --target_storage_id s3_datalake \
  --fastbcp_dir_path /path/to/FastBCP \
  --publish_target motherduck_prod \
  --publish_schema_pattern "sales_{date}" \
  --sub_path "daily/$(date +%Y%m%d)"
```
**Result:** Schema `sales_20251210`, data at `s3://bucket/lakexpress/daily/20251210/`.
### Example 4: Environment Variable Token
```bash
export motherduck_token="your-token-here"
lakexpress -a credentials.json \
  --log_db_auth_id log_db \
  --source_db_auth_id postgres_prod \
  --source_schema_name public \
  --target_storage_id s3_datalake \
  --fastbcp_dir_path /path/to/FastBCP \
  --publish_target motherduck_env
```
### Example 5: GCS Storage Backend
```json
{
  "gcs_datalake": {
    "ds_type": "gcs",
    "auth_mode": "profile",
    "info": {
      "directory": "gs://my-datalake-bucket/lakexpress/",
      "profile": "/path/to/service-account.json"
    }
  },
  "motherduck_prod": {
    "ds_type": "motherduck",
    "auth_mode": "token",
    "info": {
      "database": "my_analytics",
      "token": "your-motherduck-token"
    }
  }
}
```
```bash
lakexpress -a credentials.json \
  --log_db_auth_id log_db \
  --source_db_auth_id postgres_prod \
  --source_schema_name public \
  --target_storage_id gcs_datalake \
  --fastbcp_dir_path /path/to/FastBCP \
  --publish_target motherduck_prod
```
## Data Type Mapping
Source types map to DuckDB-compatible types automatically.
### PostgreSQL → DuckDB/MotherDuck
| PostgreSQL Type | DuckDB Type |
|---|---|
| `INTEGER`, `INT4` | `INTEGER` |
| `BIGINT`, `INT8` | `BIGINT` |
| `SMALLINT`, `INT2` | `SMALLINT` |
| `NUMERIC(p,s)` | `DECIMAL(p,s)` |
| `REAL`, `FLOAT4` | `REAL` |
| `DOUBLE PRECISION` | `DOUBLE` |
| `VARCHAR(n)`, `TEXT` | `VARCHAR` |
| `DATE` | `DATE` |
| `TIMESTAMP` | `TIMESTAMP` |
| `TIMESTAMPTZ` | `TIMESTAMPTZ` |
| `TIME` | `TIME` |
| `BOOLEAN` | `BOOLEAN` |
| `BYTEA` | `BLOB` |
| `JSON`, `JSONB` | `JSON` |
| `UUID` | `UUID` |
### SQL Server → DuckDB/MotherDuck
| SQL Server Type | DuckDB Type |
|---|---|
| `INT` | `INTEGER` |
| `BIGINT` | `BIGINT` |
| `SMALLINT` | `SMALLINT` |
| `TINYINT` | `TINYINT` |
| `DECIMAL(p,s)` | `DECIMAL(p,s)` |
| `MONEY` | `DECIMAL(19,4)` |
| `FLOAT` | `DOUBLE` |
| `REAL` | `REAL` |
| `VARCHAR(n)`, `NVARCHAR(n)` | `VARCHAR` |
| `DATE` | `DATE` |
| `DATETIME`, `DATETIME2` | `TIMESTAMP` |
| `DATETIMEOFFSET` | `TIMESTAMPTZ` |
| `TIME` | `TIME` |
| `BIT` | `BOOLEAN` |
| `VARBINARY` | `BLOB` |
| `UNIQUEIDENTIFIER` | `UUID` |
### Oracle → DuckDB/MotherDuck
| Oracle Type | DuckDB Type |
|---|---|
| `NUMBER` | `DECIMAL(38,9)` |
| `NUMBER(p,s)` | `DECIMAL(p,s)` |
| `FLOAT` | `DOUBLE` |
| `VARCHAR2`, `NVARCHAR2` | `VARCHAR` |
| `CLOB`, `NCLOB` | `VARCHAR` |
| `DATE` | `TIMESTAMP` |
| `TIMESTAMP` | `TIMESTAMP` |
| `RAW`, `BLOB` | `BLOB` |
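To check how the types actually land for your data, you can `DESCRIBE` a scan of the exported Parquet and compare against the tables above (a sketch; path and token are placeholders):

```python
import duckdb

# DESCRIBE reports the column names and DuckDB types inferred from
# the Parquet metadata, i.e. what the published table/view will use.
conn = duckdb.connect("md:my_analytics?motherduck_token=your_token")
schema = conn.execute(
    "DESCRIBE SELECT * FROM read_parquet('s3://bucket/exports/tpch_1/orders/*.parquet')"
).fetchall()
for column_name, column_type, *_ in schema:
    print(column_name, column_type)
```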
## CLI Reference

### MotherDuck Publishing Arguments
| Option | Type | Description |
|---|---|---|
| `--publish_target ID` | String | Credential ID for MotherDuck publishing (required) |
| `--publish_schema_pattern PATTERN` | String | Dynamic schema naming pattern (default: `{schema}`) |
| `--publish_table_pattern PATTERN` | String | Dynamic table naming pattern (default: `{table}`) |
| `--publish_method TYPE` | String | Table type: `external` (default) or `internal` |
| `--n_jobs N` | Integer | Number of parallel workers for table creation (default: 1) |
## Querying MotherDuck Tables
Once published, query tables via any of these methods.
**MotherDuck Web UI:**
Log in to app.motherduck.com and run:

```sql
SELECT * FROM my_analytics.lx_tpch_1.customer LIMIT 10;
```
**DuckDB CLI:**

```bash
duckdb "md:my_analytics?motherduck_token=your_token"
SELECT * FROM lx_tpch_1.customer LIMIT 10;
```
**Python (duckdb):**

```python
import duckdb

conn = duckdb.connect("md:my_analytics?motherduck_token=your_token")
df = conn.execute("SELECT * FROM lx_tpch_1.customer LIMIT 10").df()
print(df)
```
**Python with environment variable:**

```python
import os
import duckdb

os.environ["motherduck_token"] = "your_token"
conn = duckdb.connect("md:my_analytics")
df = conn.execute("SELECT * FROM lx_tpch_1.customer LIMIT 10").df()
print(df)
```
## Troubleshooting

### Common Issues
**“Authentication failed”:**
- Verify your token is valid and not expired
- Check token permissions for the target database
- For `env` mode, ensure `motherduck_token` is set
**“Database not found”:**
- Verify the database name matches an existing MotherDuck database
- Create the database in the MotherDuck UI before publishing
**“IO Error: No files found that match the pattern” or “Permission error: Missing or invalid credentials”:**
- This means MotherDuck cannot access your private S3/GCS/Azure bucket
- Create a secret in MotherDuck with your cloud storage credentials (see Storage Access from MotherDuck)
- Verify secrets are configured: `SELECT * FROM duckdb_secrets();`
- Test access: `SELECT count(*) FROM read_parquet('s3://your-bucket/path/*.parquet');`
**“Cannot read file” (external views):**
- Verify cloud storage credentials in MotherDuck
- Check that the S3/GCS/Azure path contains valid Parquet files
- Ensure the bucket allows access from MotherDuck
- Verify the files exist: `aws s3 ls s3://your-bucket/path/ --recursive | head`
**“Connection timeout”:**
- Check network connectivity
- MotherDuck requires outbound HTTPS
- Verify no firewall is blocking the connection
### Verifying Setup
Test MotherDuck connectivity:
duckdb "md:my_analytics?motherduck_token=your_token" -c "SELECT 1"
Test cloud storage access from MotherDuck:
```sql
SELECT * FROM read_parquet('s3://your-bucket/path/to/file.parquet') LIMIT 1;
```
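The two checks can be combined into one Python smoke test that reports which step fails (a sketch; token, database, and path are placeholders):

```python
import duckdb

# Smoke test: MotherDuck connectivity first, then storage access.
try:
    conn = duckdb.connect("md:my_analytics?motherduck_token=your_token")
    conn.execute("SELECT 1")
    print("MotherDuck connection OK")
    conn.execute(
        "SELECT * FROM read_parquet('s3://your-bucket/path/to/file.parquet') LIMIT 1"
    )
    print("Cloud storage access OK")
except duckdb.Error as e:
    print(f"Setup check failed: {e}")
```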
### Configuring Storage Secrets
See Storage Access from MotherDuck in Prerequisites for detailed instructions on configuring S3, GCS, or Azure secrets.
## Comparison with Other Targets
| Feature | MotherDuck | BigQuery | Snowflake | DuckLake |
|---|---|---|---|---|
| Query Engine | DuckDB | BigQuery | Snowflake | DuckDB |
| External Tables | Views with `read_parquet()` | External tables | External tables | DuckLake catalog |
| Native Tables | CREATE TABLE AS SELECT | Load jobs | COPY INTO | Not supported |
| Serverless | Yes | Yes | No (warehouse required) | N/A (local) |
| SQL Dialect | DuckDB/PostgreSQL | BigQuery SQL | Snowflake SQL | DuckDB/PostgreSQL |
| Best For | Analytics, exploration | Large-scale analytics | Enterprise data warehouse | Local development |