Snowflake Publishing

LakeXpress creates Snowflake tables from exported Parquet files, enabling queries on data stored in S3, Azure, or GCS.

Table of Contents

Prerequisites

1. Snowflake Setup

Required objects:

  • Database (e.g., LAKEXPRESS_DB)
  • Warehouse (e.g., SNOWFLAKE_LEARNING_WH)
  • Stage pointing to S3 bucket (e.g., AWSS3_AWDW_STAGE)

Required privileges:

USE ROLE ACCOUNTADMIN;
GRANT CREATE SCHEMA ON DATABASE LAKEXPRESS_DB TO ROLE SYSADMIN;
GRANT USAGE ON WAREHOUSE SNOWFLAKE_LEARNING_WH TO ROLE SYSADMIN;
GRANT USAGE ON STAGE LAKEXPRESS_DB.PUBLIC.AWSS3_AWDW_STAGE TO ROLE SYSADMIN;

2. Cloud Storage Setup

Both S3 and Snowflake credentials must be in your credentials.json:

{
  "aws_s3_01": {
    "ds_type": "s3",
    "auth_mode": "profile",
    "info": {
      "directory": "s3://aetplakexpress/lakexpress/",
      "profile": "lakexpress"
    }
  },
  "snowflake_prod": {
    "ds_type": "snowflake",
    "auth_mode": "password",
    "info": {
      "account": "tj36405.eu-west-1",
      "user": "FRANCOIS.PACULL",
      "password": "your-password",
      "warehouse": "SNOWFLAKE_LEARNING_WH",
      "database": "LAKEXPRESS_DB",
      "stage": "AWSS3_AWDW_STAGE"
    }
  }
}

Critical: The S3 directory path must exactly match the Snowflake stage location.

Verify the match:

DESC STAGE LAKEXPRESS_DB.PUBLIC.AWSS3_AWDW_STAGE;
-- URL column shows: s3://aetplakexpress/lakexpress/

-- This MUST match your S3 credentials directory
-- "directory": "s3://aetplakexpress/lakexpress/"  ✓ Correct
-- "directory": "s3://aetplakexpress/"              ✗ Wrong - files won't be found

Authentication Modes

Password Authentication

{
  "snowflake_prod": {
    "ds_type": "snowflake",
    "auth_mode": "password",
    "info": {
      "account": "tj36405.eu-west-1",
      "user": "FRANCOIS.PACULL",
      "password": "your-password",
      "warehouse": "SNOWFLAKE_LEARNING_WH",
      "database": "LAKEXPRESS_DB",
      "stage": "AWSS3_AWDW_STAGE"
    }
  }
}

Programmatic Access Token (PAT) Authentication

Recommended for automation:

{
  "snowflake_prod": {
    "ds_type": "snowflake",
    "auth_mode": "pat",
    "info": {
      "account": "tj36405.eu-west-1",
      "user": "FRANCOIS.PACULL",
      "token": "your-personal-access-token",
      "warehouse": "SNOWFLAKE_LEARNING_WH",
      "database": "LAKEXPRESS_DB",
      "stage": "AWSS3_AWDW_STAGE"
    }
  }
}

PATs require a network policy configured for the user.

PAT benefits:

  • Easily rotated, configurable lifetime
  • No MFA prompts – suited for CI/CD pipelines

Table Types

External Tables

Data stays in cloud storage; Snowflake stores only metadata. Views are auto-created for cleaner querying.

  • No data loading time or Snowflake storage costs
  • Immediate availability after export
  • Suited for: data lake integration, infrequent access, exploration, ETL staging
./LakeXpress \
  -a data/ds_credentials.json \
  --source_db_auth_id ds_04_pg \
  --target_storage_id aws_s3_01 \
  --publish_target snowflake_prod \
  --snowflake_table_type external  # Default

Internal Tables

Data is loaded into Snowflake storage with full feature support (clustering, time travel, etc.). Can include PRIMARY KEY constraints from source.

  • Faster queries with full Snowflake features
  • Maintains PRIMARY KEY constraints
  • Suited for: frequently queried data, production warehouses, complex joins
./LakeXpress \
  -a data/ds_credentials.json \
  --source_db_auth_id ds_04_pg \
  --target_storage_id aws_s3_01 \
  --publish_target snowflake_prod \
  --snowflake_table_type internal \
  --snowflake_pk_constraints  # Optional: include PK constraints

Dynamic Naming Patterns

Schema and table names support token-based patterns for time-partitioned exports, schema consolidation, or unique identifiers.

Supported Tokens

Token Description Example Output
{schema} Source schema name tpch_1
{table} Source table name customer
{database} Source database name tpch
{date} Current date (YYYYMMDD) 20251120
{timestamp} Current timestamp (YYYYMMDD_HHMMSS) 20251120_112511
{uuid} UUID4 identifier (consistent per run) a1b2c3d4-...
{subpath} CLI --sub_path parameter toto

Note: All timestamps are in local time. The {table} token is mandatory for --publish_table_pattern.

CLI Arguments

--publish_schema_pattern PATTERN    # Schema name pattern (default varies by table type)
--publish_table_pattern PATTERN      # Table name pattern (must include {table} token)

Default Patterns

Table Type Default Schema Pattern Default Table Pattern
External EXT_{schema} {table}
Internal {schema} {table}

Common Patterns

1. Time-Based Partitioning

./LakeXpress \
  --publish_schema_pattern "EXT_{schema}_{date}" \
  --publish_table_pattern "{table}" \
  --publish_target snowflake_prod \
  ...

# Results:
# Schema: EXT_tpch_1_20251120
# Tables: customer, orders, lineitem

2. Multi-Schema Consolidation

./LakeXpress \
  --source_schema_name schema1,schema2,schema3 \
  --publish_schema_pattern "{subpath}" \
  --publish_table_pattern "{schema}_{table}" \
  --sub_path consolidated \
  --publish_target snowflake_prod \
  ...

# Results:
# Schema: consolidated
# Tables: schema1_customer, schema2_customer, schema3_customer

3. Unique Run Identifiers

./LakeXpress \
  --publish_schema_pattern "EXT_{schema}_{uuid}" \
  --publish_table_pattern "{table}" \
  --publish_target snowflake_prod \
  ...

# Results:
# Schema: EXT_tpch_1_a1b2c3d4-e5f6-7890-abcd-ef1234567890
# Tables: customer, orders, lineitem

Usage Examples

Example 1: External Tables with Custom Patterns

./LakeXpress -a data/ds_credentials.json \
       --log_db_auth_id log_db_ms \
       --source_db_auth_id ds_04_pg \
       --source_db_name tpch \
       --source_schema_name tpch_1 \
       --fastbcp_dir_path /home/francois/Workspace/FastBCP_exe/FastBCP_0_28_3_full/ \
       --fastbcp_p 2 \
       --n_jobs 4 \
       --target_storage_id aws_s3_01 \
       --generate_metadata \
       --sub_path toto \
       --publish_target snowflake_pass \
       --publish_schema_pattern "EXT_{subpath}_{date}" \
       --publish_table_pattern "{schema}_{table}" \
       --snowflake_table_type external

Result:

  • Schema: EXT_toto_20251124
  • Tables: tpch_1_customer, tpch_1_orders, etc.
  • Views: V_tpch_1_customer, V_tpch_1_orders, etc.

Example 2: Internal Tables with Primary Keys

./LakeXpress -a data/ds_credentials.json \
       --log_db_auth_id log_db_ms \
       --source_db_auth_id ds_04_pg \
       --source_db_name tpch \
       --source_schema_name tpch_1 \
       --fastbcp_dir_path /home/francois/Workspace/FastBCP_exe/FastBCP_0_28_3_full/ \
       --fastbcp_p 2 \
       --n_jobs 4 \
       --target_storage_id aws_s3_01 \
       --generate_metadata \
       --sub_path toto \
       --snowflake_table_type internal \
       --publish_target snowflake_pat \
       --snowflake_pk_constraints

Result:

  • Schema: tpch_1
  • Tables: customer, orders, etc. (with data loaded)
  • PRIMARY KEY constraints applied where available

Example 3: Single Table

./LakeXpress \
  -a data/ds_credentials.json \
  --log_db_auth_id log_db_ms \
  --source_db_auth_id ds_04_pg \
  --target_storage_id aws_s3_01 \
  --source_schema_name tpch_1 \
  -i nation \
  --fastbcp_dir_path /path/to/FastBCP \
  --publish_target snowflake_prod

Example 4: Full Schema

./LakeXpress \
  -a data/ds_credentials.json \
  --log_db_auth_id log_db_ms \
  --source_db_auth_id ds_04_pg \
  --target_storage_id aws_s3_01 \
  --source_schema_name tpch_1 \
  --fastbcp_dir_path /path/to/FastBCP \
  --publish_target snowflake_prod

Views and Customization

Automatic Views (External Tables Only)

For external tables, LakeXpress auto-creates views that expose typed columns instead of the raw Parquet VALUE column.

-- External table (raw)
SELECT * FROM EXT_TPCH_1.CUSTOMER;
-- Shows: VALUE column (complex structure)

-- View (typed columns)
SELECT * FROM EXT_TPCH_1.V_CUSTOMER;
-- Shows: C_CUSTKEY, C_NAME, C_ADDRESS, etc.

View Options

Option Description Default
--snowflake_no_views Skip view creation Views created
--snowflake_view_prefix PREFIX Custom view name prefix V_

Custom prefix:

./LakeXpress \
  --publish_target snowflake_prod \
  --snowflake_table_type external \
  --snowflake_view_prefix "VIEW_" \
  ...

# Tables: customer, orders
# Views: VIEW_customer, VIEW_orders

No views:

./LakeXpress \
  --publish_target snowflake_prod \
  --snowflake_table_type external \
  --snowflake_no_views \
  ...

# Tables: customer, orders
# Views: None

Refreshing External Tables

External tables must be refreshed after data upload:

ALTER EXTERNAL TABLE EXT_TPCH_1.CUSTOMER REFRESH;
ALTER EXTERNAL TABLE EXT_TPCH_1.LINEITEM REFRESH;

Or via Snowflake CLI:

snow sql -q "ALTER EXTERNAL TABLE EXT_TPCH_1.LINEITEM REFRESH;" -c lakexpress

Primary Key Constraints

Applies to internal tables only. LakeXpress propagates PRIMARY KEY constraints from the source database to Snowflake.

Usage

./LakeXpress \
  --publish_target snowflake_prod \
  --snowflake_table_type internal \
  --snowflake_pk_constraints \
  ...

How It Works

  1. Reads PRIMARY KEY metadata from the source database
  2. Stores PK information in the export logging database
  3. Adds PRIMARY KEY constraints to the Snowflake DDL
  4. Snowflake enforces constraints (query planner benefits)

Requirements

  • --snowflake_table_type internal only
  • Source database must have PRIMARY KEY constraints defined
  • Export logging database must have is_primary_key column (auto-created in v0.1.16+)

Benefits

  • Query planner optimizations
  • Better execution plans
  • Documents data integrity

Data Type Mapping

LakeXpress maps source types to Snowflake types automatically.

PostgreSQL to Snowflake

PostgreSQL Type Snowflake Type
INTEGER, INT4 NUMBER(38,0)
BIGINT, INT8 NUMBER(38,0)
SMALLINT, INT2 NUMBER(38,0)
NUMERIC(p,s) NUMBER(p,s)
REAL, FLOAT4 FLOAT
DOUBLE PRECISION DOUBLE
VARCHAR(n) VARCHAR(n)
TEXT VARCHAR(16777216)
DATE DATE
TIMESTAMP TIMESTAMP_NTZ
BOOLEAN BOOLEAN
JSON, JSONB VARIANT

SQL Server to Snowflake

SQL Server Type Snowflake Type
INT NUMBER(38,0)
BIGINT NUMBER(38,0)
DECIMAL(p,s) NUMBER(p,s)
FLOAT DOUBLE
NVARCHAR(n) VARCHAR(n)
VARCHAR(MAX) VARCHAR(16777216)
DATE DATE
DATETIME, DATETIME2 TIMESTAMP_NTZ
BIT BOOLEAN

CLI Reference

Snowflake Publishing Arguments

Option Type Description
--publish_target ID String Credential ID for Snowflake publishing
--publish_schema_pattern PATTERN String Schema naming pattern (default: EXT_{schema} for external, {schema} for internal)
--publish_table_pattern PATTERN String Table naming pattern (default: {table})
--snowflake_table_type TYPE Enum external or internal (default: external)
--snowflake_no_views Flag Skip view creation (external tables only)
--snowflake_view_prefix PREFIX String View name prefix (default: V_)
--snowflake_pk_constraints Flag Add PRIMARY KEY constraints (internal tables only)

Deprecated Arguments

Old Argument New Argument Notes
--publish --publish_target Removed in v0.1.12+