Snowflake Publishing
LakeXpress creates Snowflake tables from exported Parquet files, enabling queries on data stored in S3, Azure, or GCS.
Table of Contents
- Prerequisites
- Authentication Modes
- Table Types
- Dynamic Naming Patterns
- Usage Examples
- Views and Customization
- Primary Key Constraints
- Data Type Mapping
- CLI Reference
Prerequisites
1. Snowflake Setup
Required objects:
- Database (e.g.,
LAKEXPRESS_DB) - Warehouse (e.g.,
SNOWFLAKE_LEARNING_WH) - Stage pointing to S3 bucket (e.g.,
AWSS3_AWDW_STAGE)
Required privileges:
USE ROLE ACCOUNTADMIN;
GRANT CREATE SCHEMA ON DATABASE LAKEXPRESS_DB TO ROLE SYSADMIN;
GRANT USAGE ON WAREHOUSE SNOWFLAKE_LEARNING_WH TO ROLE SYSADMIN;
GRANT USAGE ON STAGE LAKEXPRESS_DB.PUBLIC.AWSS3_AWDW_STAGE TO ROLE SYSADMIN;
2. Cloud Storage Setup
Both S3 and Snowflake credentials must be in your credentials.json:
{
"aws_s3_01": {
"ds_type": "s3",
"auth_mode": "profile",
"info": {
"directory": "s3://aetplakexpress/lakexpress/",
"profile": "lakexpress"
}
},
"snowflake_prod": {
"ds_type": "snowflake",
"auth_mode": "password",
"info": {
"account": "tj36405.eu-west-1",
"user": "FRANCOIS.PACULL",
"password": "your-password",
"warehouse": "SNOWFLAKE_LEARNING_WH",
"database": "LAKEXPRESS_DB",
"stage": "AWSS3_AWDW_STAGE"
}
}
}
Critical: The S3 directory path must exactly match the Snowflake stage location.
Verify the match:
DESC STAGE LAKEXPRESS_DB.PUBLIC.AWSS3_AWDW_STAGE;
-- URL column shows: s3://aetplakexpress/lakexpress/
-- This MUST match your S3 credentials directory
-- "directory": "s3://aetplakexpress/lakexpress/" ✓ Correct
-- "directory": "s3://aetplakexpress/" ✗ Wrong - files won't be found
Authentication Modes
Password Authentication
{
"snowflake_prod": {
"ds_type": "snowflake",
"auth_mode": "password",
"info": {
"account": "tj36405.eu-west-1",
"user": "FRANCOIS.PACULL",
"password": "your-password",
"warehouse": "SNOWFLAKE_LEARNING_WH",
"database": "LAKEXPRESS_DB",
"stage": "AWSS3_AWDW_STAGE"
}
}
}
Programmatic Access Token (PAT) Authentication
Recommended for automation:
{
"snowflake_prod": {
"ds_type": "snowflake",
"auth_mode": "pat",
"info": {
"account": "tj36405.eu-west-1",
"user": "FRANCOIS.PACULL",
"token": "your-personal-access-token",
"warehouse": "SNOWFLAKE_LEARNING_WH",
"database": "LAKEXPRESS_DB",
"stage": "AWSS3_AWDW_STAGE"
}
}
}
PATs require a network policy configured for the user.
PAT benefits:
- Easily rotated, configurable lifetime
- No MFA prompts – suited for CI/CD pipelines
Table Types
External Tables
Data stays in cloud storage; Snowflake stores only metadata. Views are auto-created for cleaner querying.
- No data loading time or Snowflake storage costs
- Immediate availability after export
- Suited for: data lake integration, infrequent access, exploration, ETL staging
./LakeXpress \
-a data/ds_credentials.json \
--source_db_auth_id ds_04_pg \
--target_storage_id aws_s3_01 \
--publish_target snowflake_prod \
--snowflake_table_type external # Default
Internal Tables
Data is loaded into Snowflake storage with full feature support (clustering, time travel, etc.). Can include PRIMARY KEY constraints from source.
- Faster queries with full Snowflake features
- Maintains PRIMARY KEY constraints
- Suited for: frequently queried data, production warehouses, complex joins
./LakeXpress \
-a data/ds_credentials.json \
--source_db_auth_id ds_04_pg \
--target_storage_id aws_s3_01 \
--publish_target snowflake_prod \
--snowflake_table_type internal \
--snowflake_pk_constraints # Optional: include PK constraints
Dynamic Naming Patterns
Schema and table names support token-based patterns for time-partitioned exports, schema consolidation, or unique identifiers.
Supported Tokens
| Token | Description | Example Output |
|---|---|---|
{schema} |
Source schema name | tpch_1 |
{table} |
Source table name | customer |
{database} |
Source database name | tpch |
{date} |
Current date (YYYYMMDD) | 20251120 |
{timestamp} |
Current timestamp (YYYYMMDD_HHMMSS) | 20251120_112511 |
{uuid} |
UUID4 identifier (consistent per run) | a1b2c3d4-... |
{subpath} |
CLI --sub_path parameter |
toto |
Note: All timestamps are in local time. The {table} token is mandatory for --publish_table_pattern.
CLI Arguments
--publish_schema_pattern PATTERN # Schema name pattern (default varies by table type)
--publish_table_pattern PATTERN # Table name pattern (must include {table} token)
Default Patterns
| Table Type | Default Schema Pattern | Default Table Pattern |
|---|---|---|
| External | EXT_{schema} |
{table} |
| Internal | {schema} |
{table} |
Common Patterns
1. Time-Based Partitioning
./LakeXpress \
--publish_schema_pattern "EXT_{schema}_{date}" \
--publish_table_pattern "{table}" \
--publish_target snowflake_prod \
...
# Results:
# Schema: EXT_tpch_1_20251120
# Tables: customer, orders, lineitem
2. Multi-Schema Consolidation
./LakeXpress \
--source_schema_name schema1,schema2,schema3 \
--publish_schema_pattern "{subpath}" \
--publish_table_pattern "{schema}_{table}" \
--sub_path consolidated \
--publish_target snowflake_prod \
...
# Results:
# Schema: consolidated
# Tables: schema1_customer, schema2_customer, schema3_customer
3. Unique Run Identifiers
./LakeXpress \
--publish_schema_pattern "EXT_{schema}_{uuid}" \
--publish_table_pattern "{table}" \
--publish_target snowflake_prod \
...
# Results:
# Schema: EXT_tpch_1_a1b2c3d4-e5f6-7890-abcd-ef1234567890
# Tables: customer, orders, lineitem
Usage Examples
Example 1: External Tables with Custom Patterns
./LakeXpress -a data/ds_credentials.json \
--log_db_auth_id log_db_ms \
--source_db_auth_id ds_04_pg \
--source_db_name tpch \
--source_schema_name tpch_1 \
--fastbcp_dir_path /home/francois/Workspace/FastBCP_exe/FastBCP_0_28_3_full/ \
--fastbcp_p 2 \
--n_jobs 4 \
--target_storage_id aws_s3_01 \
--generate_metadata \
--sub_path toto \
--publish_target snowflake_pass \
--publish_schema_pattern "EXT_{subpath}_{date}" \
--publish_table_pattern "{schema}_{table}" \
--snowflake_table_type external
Result:
- Schema:
EXT_toto_20251124 - Tables:
tpch_1_customer,tpch_1_orders, etc. - Views:
V_tpch_1_customer,V_tpch_1_orders, etc.
Example 2: Internal Tables with Primary Keys
./LakeXpress -a data/ds_credentials.json \
--log_db_auth_id log_db_ms \
--source_db_auth_id ds_04_pg \
--source_db_name tpch \
--source_schema_name tpch_1 \
--fastbcp_dir_path /home/francois/Workspace/FastBCP_exe/FastBCP_0_28_3_full/ \
--fastbcp_p 2 \
--n_jobs 4 \
--target_storage_id aws_s3_01 \
--generate_metadata \
--sub_path toto \
--snowflake_table_type internal \
--publish_target snowflake_pat \
--snowflake_pk_constraints
Result:
- Schema:
tpch_1 - Tables:
customer,orders, etc. (with data loaded) - PRIMARY KEY constraints applied where available
Example 3: Single Table
./LakeXpress \
-a data/ds_credentials.json \
--log_db_auth_id log_db_ms \
--source_db_auth_id ds_04_pg \
--target_storage_id aws_s3_01 \
--source_schema_name tpch_1 \
-i nation \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target snowflake_prod
Example 4: Full Schema
./LakeXpress \
-a data/ds_credentials.json \
--log_db_auth_id log_db_ms \
--source_db_auth_id ds_04_pg \
--target_storage_id aws_s3_01 \
--source_schema_name tpch_1 \
--fastbcp_dir_path /path/to/FastBCP \
--publish_target snowflake_prod
Views and Customization
Automatic Views (External Tables Only)
For external tables, LakeXpress auto-creates views that expose typed columns instead of the raw Parquet VALUE column.
-- External table (raw)
SELECT * FROM EXT_TPCH_1.CUSTOMER;
-- Shows: VALUE column (complex structure)
-- View (typed columns)
SELECT * FROM EXT_TPCH_1.V_CUSTOMER;
-- Shows: C_CUSTKEY, C_NAME, C_ADDRESS, etc.
View Options
| Option | Description | Default |
|---|---|---|
--snowflake_no_views |
Skip view creation | Views created |
--snowflake_view_prefix PREFIX |
Custom view name prefix | V_ |
Custom prefix:
./LakeXpress \
--publish_target snowflake_prod \
--snowflake_table_type external \
--snowflake_view_prefix "VIEW_" \
...
# Tables: customer, orders
# Views: VIEW_customer, VIEW_orders
No views:
./LakeXpress \
--publish_target snowflake_prod \
--snowflake_table_type external \
--snowflake_no_views \
...
# Tables: customer, orders
# Views: None
Refreshing External Tables
External tables must be refreshed after data upload:
ALTER EXTERNAL TABLE EXT_TPCH_1.CUSTOMER REFRESH;
ALTER EXTERNAL TABLE EXT_TPCH_1.LINEITEM REFRESH;
Or via Snowflake CLI:
snow sql -q "ALTER EXTERNAL TABLE EXT_TPCH_1.LINEITEM REFRESH;" -c lakexpress
Primary Key Constraints
Applies to internal tables only. LakeXpress propagates PRIMARY KEY constraints from the source database to Snowflake.
Usage
./LakeXpress \
--publish_target snowflake_prod \
--snowflake_table_type internal \
--snowflake_pk_constraints \
...
How It Works
- Reads PRIMARY KEY metadata from the source database
- Stores PK information in the export logging database
- Adds PRIMARY KEY constraints to the Snowflake DDL
- Snowflake enforces constraints (query planner benefits)
Requirements
--snowflake_table_type internalonly- Source database must have PRIMARY KEY constraints defined
- Export logging database must have
is_primary_keycolumn (auto-created in v0.1.16+)
Benefits
- Query planner optimizations
- Better execution plans
- Documents data integrity
Data Type Mapping
LakeXpress maps source types to Snowflake types automatically.
PostgreSQL to Snowflake
| PostgreSQL Type | Snowflake Type |
|---|---|
INTEGER, INT4 |
NUMBER(38,0) |
BIGINT, INT8 |
NUMBER(38,0) |
SMALLINT, INT2 |
NUMBER(38,0) |
NUMERIC(p,s) |
NUMBER(p,s) |
REAL, FLOAT4 |
FLOAT |
DOUBLE PRECISION |
DOUBLE |
VARCHAR(n) |
VARCHAR(n) |
TEXT |
VARCHAR(16777216) |
DATE |
DATE |
TIMESTAMP |
TIMESTAMP_NTZ |
BOOLEAN |
BOOLEAN |
JSON, JSONB |
VARIANT |
SQL Server to Snowflake
| SQL Server Type | Snowflake Type |
|---|---|
INT |
NUMBER(38,0) |
BIGINT |
NUMBER(38,0) |
DECIMAL(p,s) |
NUMBER(p,s) |
FLOAT |
DOUBLE |
NVARCHAR(n) |
VARCHAR(n) |
VARCHAR(MAX) |
VARCHAR(16777216) |
DATE |
DATE |
DATETIME, DATETIME2 |
TIMESTAMP_NTZ |
BIT |
BOOLEAN |
CLI Reference
Snowflake Publishing Arguments
| Option | Type | Description |
|---|---|---|
--publish_target ID |
String | Credential ID for Snowflake publishing |
--publish_schema_pattern PATTERN |
String | Schema naming pattern (default: EXT_{schema} for external, {schema} for internal) |
--publish_table_pattern PATTERN |
String | Table naming pattern (default: {table}) |
--snowflake_table_type TYPE |
Enum | external or internal (default: external) |
--snowflake_no_views |
Flag | Skip view creation (external tables only) |
--snowflake_view_prefix PREFIX |
String | View name prefix (default: V_) |
--snowflake_pk_constraints |
Flag | Add PRIMARY KEY constraints (internal tables only) |
Deprecated Arguments
| Old Argument | New Argument | Notes |
|---|---|---|
--publish |
--publish_target |
Removed in v0.1.12+ |