Storage Backends

LakeXpress exports Parquet files to local or cloud storage.

Overview

Option | Flag | Description
Local Filesystem | --output_dir | Export to local directories
Cloud Storage | --target_storage_id | Export to S3, GCS, Azure, or OneLake

--output_dir and --target_storage_id are mutually exclusive.

Snowflake is a publishing target, not a storage backend. Use --publish_target after exporting to cloud storage. See the Snowflake Publishing Guide.

Local Filesystem

Configuration

./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
        --source_db_auth_id ds_03_pg \
        --output_dir ./exports \
        --fastbcp_dir_path /path/to/fastbcp

Directory Structure

output_dir/
└── schema_name/
    └── table_name/
        ├── part-00000.parquet
        ├── part-00001.parquet
        └── ...

With --sub_path staging/temp:

exports/
└── staging/
    └── temp/
        └── schema_name/
            └── table_name/
                ├── part-00000.parquet
                └── ...

Permissions

The user running LakeXpress needs write access to the output directory, sufficient disk space, and the ability to create subdirectories.
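
As a quick pre-flight check, commands along these lines can confirm the directory is writable and has free space (./exports is the path from the example above):

# Create the output directory (and parents) if it does not exist
mkdir -p ./exports

# Confirm the current user can write and create files there
touch ./exports/.write_test && rm ./exports/.write_test

# Check available disk space on the filesystem holding the exports
df -h ./exports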

S3-Compatible Storage

All S3-compatible providers use ds_type: "s3". Provider differentiation is handled through endpoint_url in the AWS profile.

Supported providers: Amazon S3, OVH S3, MinIO, Wasabi, DigitalOcean Spaces, Backblaze B2, and any S3-compatible object storage.

Method 1: AWS Profile (Amazon S3)

Uses the credentials and configuration in ~/.aws/config and ~/.aws/credentials:

{
  "s3_01": {
    "ds_type": "s3",
    "auth_mode": "profile",
    "info": {
      "directory": "s3://your-bucket-name/lakexpress/exports",
      "profile": "your-aws-profile"
    }
  }
}

~/.aws/config:

[profile your-aws-profile]
region = us-east-1

~/.aws/credentials:

[your-aws-profile]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY

Field | Required | Description
ds_type | Yes | Must be "s3"
auth_mode | Yes | Must be "profile"
directory | Yes | Full S3 URL (e.g., s3://bucket/path)
profile | Yes | AWS profile name from ~/.aws/credentials
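
Before running an export, it can be worth verifying the profile from the AWS CLI (the bucket and profile names below are the placeholders from the example config):

# List the target prefix with the same profile LakeXpress will use
aws s3 ls s3://your-bucket-name/lakexpress/exports/ --profile your-aws-profile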

Method 2: S3-Compatible Providers (OVH, MinIO, etc.)

Use an AWS profile with a custom endpoint_url.

OVH S3:

{
  "s3_02": {
    "ds_type": "s3",
    "auth_mode": "profile",
    "info": {
      "directory": "s3://your-ovh-bucket/lakexpress",
      "profile": "ovh"
    }
  }
}

~/.aws/config:

[profile ovh]
endpoint_url = https://s3.gra.io.cloud.ovh.net
region = gra

MinIO:

{
  "s3_03": {
    "ds_type": "s3",
    "auth_mode": "profile",
    "info": {
      "directory": "s3://data-lake/exports",
      "profile": "minio"
    }
  }
}

~/.aws/config:

[profile minio]
endpoint_url = http://localhost:9000
region = us-east-1

Field | Required | Description
ds_type | Yes | Must be "s3"
auth_mode | Yes | Must be "profile"
directory | Yes | Full S3 URL (e.g., s3://bucket/path)
profile | Yes | AWS profile name (with endpoint_url in ~/.aws/config)
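
The custom-endpoint profiles typically also need access keys in ~/.aws/credentials, which the examples above do not show. A minimal sketch, assuming the profile names from the examples and placeholder key values:

# Append credentials for the OVH and MinIO profiles (placeholder values)
cat >> ~/.aws/credentials <<'EOF'
[ovh]
aws_access_key_id = YOUR_OVH_ACCESS_KEY
aws_secret_access_key = YOUR_OVH_SECRET_KEY

[minio]
aws_access_key_id = YOUR_MINIO_ACCESS_KEY
aws_secret_access_key = YOUR_MINIO_SECRET_KEY
EOF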

Usage

# AWS S3
./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
        --source_db_auth_id ds_03_pg \
        --target_storage_id s3_01 \
        --fastbcp_dir_path /path/to/fastbcp

# OVH S3
./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
        --source_db_auth_id ds_03_pg \
        --target_storage_id s3_02 \
        --fastbcp_dir_path /path/to/fastbcp

# MinIO
./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
        --source_db_auth_id ds_03_pg \
        --target_storage_id s3_03 \
        --fastbcp_dir_path /path/to/fastbcp

S3 Path Structure

s3://bucket-name/base_path/schema_name/table_name/part-00000.parquet

With --sub_path:

s3://bucket-name/base_path/sub_path/schema_name/table_name/part-00000.parquet

Required IAM Permissions (AWS S3)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name/*",
        "arn:aws:s3:::your-bucket-name"
      ]
    }
  ]
}
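
If you manage IAM from the CLI, a policy like the one above can be attached as an inline user policy roughly as follows (the file name, user name, and policy name are hypothetical placeholders):

# Save the policy above as lakexpress-s3-policy.json, then attach it to the export user
aws iam put-user-policy \
    --user-name LakeXpressExport \
    --policy-name LakeXpressS3Write \
    --policy-document file://lakexpress-s3-policy.json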

Backward Compatibility

Custom ds_type values ending in _s3 (e.g., minio_s3, custom_s3) still work, but ds_type: "s3" is preferred.

Google Cloud Storage (GCS)

Configuration

{
  "gcs_storage": {
    "ds_type": "gcs",
    "auth_mode": "profile",
    "info": {
      "directory": "gs://your-bucket-name/path/to/exports",
      "profile": "/path/to/service-account-key.json"
    }
  }
}

Field | Required | Description
ds_type | Yes | Must be "gcs"
auth_mode | Yes | Must be "profile"
directory | Yes | Full GCS URL (e.g., gs://bucket/path)
profile | Yes | Path to GCS service account JSON key file

Usage

./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
        --source_db_auth_id ds_03_pg \
        --target_storage_id gcs_storage \
        --fastbcp_dir_path /path/to/fastbcp

Path Structure

gs://bucket-name/base_path/schema_name/table_name/part-00000.parquet

Service Account Permissions

Required roles:

  • Storage Object Creator
  • Storage Object Viewer (if reading is needed)

Or custom IAM permissions:

  • storage.objects.create
  • storage.objects.delete (if overwriting)
  • storage.buckets.get

Creating a Service Account

gcloud iam service-accounts create lakexpress-export \
    --display-name="LakeXpress Export Service Account"

gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="serviceAccount:lakexpress-export@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/storage.objectCreator"

gcloud iam service-accounts keys create service-account-key.json \
    --iam-account=lakexpress-export@YOUR_PROJECT_ID.iam.gserviceaccount.com
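
Assuming gcloud and gsutil are installed, the generated key can be sanity-checked before wiring it into auth.json (the bucket name is the placeholder from the config example):

# Authenticate as the service account using the downloaded key
gcloud auth activate-service-account --key-file=service-account-key.json

# Write a small test object; this only needs storage.objects.create
# (removing it afterwards requires storage.objects.delete)
echo "lakexpress write test" | gsutil cp - gs://your-bucket-name/lakexpress-write-test.txt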

Azure Storage

Supports Azure Data Lake Storage Gen2 (ADLS Gen2) and Azure Blob Storage via Service Principal authentication.

Configuration

{
  "azure_storage": {
    "ds_type": "azure",
    "auth_mode": "service_principal",
    "info": {
      "directory": "abfss://your-container@your-storage-account.dfs.core.windows.net/path/to/exports",
      "azure_client_id": "your-application-client-id",
      "azure_tenant_id": "your-directory-tenant-id",
      "azure_client_secret": "${AZURE_CLIENT_SECRET}"
    }
  }
}

Use ${VAR_NAME} syntax to reference environment variables. Set the variable before running LakeXpress (e.g., export AZURE_CLIENT_SECRET="your-secret").

Field | Required | Description
ds_type | Yes | Must be "azure"
auth_mode | Yes | Must be "service_principal"
directory | Yes | Azure storage URL (abfss:// for ADLS Gen2, abs:// for Blob)
azure_client_id | Yes | Application (client) ID from Azure AD app registration
azure_tenant_id | Yes | Directory (tenant) ID from Azure AD
azure_client_secret | Yes | Client secret value from Azure AD app

Storage URL Formats

ADLS Gen2:

abfss://container-name@storage-account-name.dfs.core.windows.net/path/to/exports

Azure Blob Storage:

abs://container-name@storage-account-name.blob.core.windows.net/path/to/exports

In both cases the container and storage account are combined as container@storage-account; ADLS Gen2 uses the .dfs.core.windows.net endpoint and Blob Storage uses .blob.core.windows.net.

Azure Authentication Setup

1. Create an Azure AD Application (Service Principal):

In Azure Portal App Registrations:

  • Click “New registration”
  • Note the Application (client) ID and Directory (tenant) ID
  • Go to “Certificates & secrets” -> “New client secret”
  • Copy the client secret value (shown only once)

2. Assign Storage Permissions:

In your Storage Account -> Access Control (IAM), add the “Storage Blob Data Contributor” role to your service principal.

For role details, see the Azure RBAC roles reference.
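
If you prefer the Azure CLI to the portal, the same setup can be sketched roughly as follows (the service principal name, subscription, resource group, and storage account are placeholders):

# Create the service principal and assign the storage role in one step
az ad sp create-for-rbac \
    --name lakexpress-export \
    --role "Storage Blob Data Contributor" \
    --scopes /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>

# The output's appId, tenant, and password map to azure_client_id,
# azure_tenant_id, and azure_client_secret in auth.json.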

Usage

./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
        --source_db_auth_id ds_03_pg \
        --target_storage_id azure_storage \
        --fastbcp_dir_path /path/to/fastbcp

Path Structure

ADLS Gen2:

abfss://container@storage.dfs.core.windows.net/path/to/exports/schema_name/table_name/part-00000.parquet

With --sub_path:

abfss://container@storage.dfs.core.windows.net/path/to/exports/sub_path/schema_name/table_name/part-00000.parquet

Required Permissions

The Service Principal needs Storage Blob Data Contributor, which grants:

  • Microsoft.Storage/storageAccounts/blobServices/containers/read
  • Microsoft.Storage/storageAccounts/blobServices/containers/write
  • Microsoft.Storage/storageAccounts/blobServices/generateUserDelegationKey/action

Troubleshooting

Problem | Solution
Authentication / access denied | Verify client ID, tenant ID, and secret. Confirm “Storage Blob Data Contributor” role. Check network access rules. Verify container exists.
Invalid URL format | Use abfss:// for ADLS Gen2, abs:// for Blob. Format: abfss://container@storageaccount.dfs.core.windows.net/path.
Connection timeout | Check firewall rules, storage account firewall allowlist, and private endpoint restrictions.

Microsoft Fabric OneLake

Exports Parquet files to OneLake via Service Principal authentication.

Configuration

{
  "onelake_storage": {
    "ds_type": "onelake",
    "auth_mode": "service_principal",
    "info": {
      "directory": "onelake://your-workspace-name/your-lakehouse-name/",
      "azure_client_id": "your-application-client-id",
      "azure_tenant_id": "your-directory-tenant-id",
      "azure_client_secret": "your-client-secret"
    }
  }
}

Field | Required | Description
ds_type | Yes | Must be "onelake"
auth_mode | Yes | Must be "service_principal"
directory | Yes | onelake://workspace/lakehouse/path/
azure_client_id | Yes | Application (client) ID from Azure AD
azure_tenant_id | Yes | Directory (tenant) ID from Azure AD
azure_client_secret | Yes | Client secret value from Azure AD app

URL Format

onelake://workspace-name/lakehouse-name/path/to/exports/

Example:

onelake://FABRICPOC/lakexpress_lakehouse/exports/

Usage

./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
        --source_db_auth_id ds_03_pg \
        --target_storage_id onelake_storage \
        --fastbcp_dir_path /path/to/fastbcp

With config creation:

python lxpress.py config create \
       -a auth.json \
       --log_db_auth_id log_db_ms \
       --source_db_auth_id ds_04_pg \
       --source_db_name tpch \
       --source_schema_name tpch_1 \
       --fastbcp_dir_path /path/to/fastbcp \
       --fastbcp_p 2 \
       --n_jobs 4 \
       --target_storage_id onelake_storage \
       --generate_metadata

Path Structure

onelake://workspace/lakehouse/base_path/schema_name/table_name/part-00000.parquet

With --sub_path:

onelake://workspace/lakehouse/base_path/sub_path/schema_name/table_name/part-00000.parquet

Authentication Setup

1. Create an Azure AD Application (Service Principal):

In Azure Portal App Registrations:

  • Click “New registration”
  • Note the Application (client) ID and Directory (tenant) ID
  • Go to “Certificates & secrets” -> “New client secret”
  • Copy the client secret value (shown only once)

2. Grant OneLake Access in Microsoft Fabric:

In the Fabric portal, go to Workspace settings -> “Manage access” and add the service principal with Contributor or Admin permissions. For finer control, grant access at the Lakehouse level.

Troubleshooting

Problem | Solution
Authentication / access denied | Verify client ID, tenant ID, and secret. Confirm Contributor role on workspace. Verify OneLake is enabled.
Invalid URL format | Use onelake:// prefix. Format: onelake://workspace-name/lakehouse-name/path/. Ensure path ends with /.
Connection timeout | Check firewall rules and verify Fabric workspace is network-accessible.

Snowflake Publishing

Snowflake is not a storage backend. LakeXpress first exports to cloud storage (S3/GCS/Azure), then creates Snowflake tables pointing to that data using --publish_target.

For the full guide covering table types, naming patterns, primary key propagation, views, and troubleshooting, see the Snowflake Publishing Guide.
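
A typical two-step invocation might look like the sketch below, using the s3_01 storage config from above and assuming --publish_target takes the Snowflake credential ID defined in auth.json (here snowflake_password); the exact value should follow the Snowflake Publishing Guide:

# Export to S3, then publish the exported data as Snowflake tables
# (publish target value is assumed to be the auth.json credential ID)
./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
        --source_db_auth_id ds_03_pg \
        --target_storage_id s3_01 \
        --publish_target snowflake_password \
        --fastbcp_dir_path /path/to/fastbcp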

Credential Configuration

Password

{
  "snowflake_password": {
    "ds_type": "snowflake",
    "auth_mode": "password",
    "info": {
      "account": "your-account-identifier",
      "user": "your-username",
      "password": "${SNOWFLAKE_PASSWORD}",
      "warehouse": "your-warehouse",
      "database": "your-database",
      "stage": "your-external-stage"
    }
  }
}

Use ${VAR_NAME} to reference environment variables (e.g., export SNOWFLAKE_PASSWORD="your-password").

Key Pair

{
  "snowflake_keypair": {
    "ds_type": "snowflake",
    "auth_mode": "keypair",
    "info": {
      "account": "your-account-identifier",
      "user": "your-username",
      "private_key_path": "/path/to/rsa_key.p8",
      "private_key_passphrase": "optional-passphrase",
      "warehouse": "your-warehouse",
      "database": "your-database",
      "stage": "your-external-stage"
    }
  }
}
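
If you do not yet have a key pair, the standard Snowflake key-pair setup can be generated with openssl roughly as follows (the passphrase you choose corresponds to private_key_passphrase above):

# Generate an encrypted PKCS#8 private key (you will be prompted for a passphrase)
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -v2 aes-256-cbc -out rsa_key.p8

# Derive the public key to register on the Snowflake user
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

# In Snowflake, register the public key on the user, e.g.:
# ALTER USER your_user SET RSA_PUBLIC_KEY='<contents of rsa_key.pub without header/footer>';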

Programmatic Access Token (PAT)

{
  "snowflake_pat": {
    "ds_type": "snowflake",
    "auth_mode": "pat",
    "info": {
      "account": "your-account-identifier",
      "user": "your-username",
      "token": "your-personal-access-token",
      "warehouse": "your-warehouse",
      "database": "your-database",
      "stage": "your-external-stage"
    }
  }
}

OAuth

{
  "snowflake_oauth": {
    "ds_type": "snowflake",
    "auth_mode": "oauth",
    "info": {
      "account": "your-account-identifier",
      "user": "your-username",
      "oauth_token": "your-oauth-token",
      "warehouse": "your-warehouse",
      "database": "your-database",
      "stage": "your-external-stage"
    }
  }
}

Configuration Fields

Field | Required | Description
ds_type | Yes | Must be "snowflake"
auth_mode | Yes | "password", "keypair", "pat", or "oauth"
account | Yes | Account identifier (e.g., khhoube-vt58050)
user | Yes | Snowflake username (usually all caps)
warehouse | Yes | Warehouse name
database | Yes | Target database
stage | Yes | Stage name for external table data
schema | No | Schema (default: PUBLIC)
role | No | Role (default: user’s default role)
password | Conditional | For password auth mode
private_key_path | Conditional | For keypair auth mode
private_key_passphrase | No | Passphrase for encrypted private key
token | Conditional | For pat auth mode
oauth_token | Conditional | For oauth auth mode

Sub-Path Option

All backends support --sub_path to insert an intermediate directory level:

./LakeXpress --target_storage_id s3_01 --sub_path staging/daily/2025-01-15 ...

Produces:

s3://bucket/base_path/staging/daily/2025-01-15/schema_name/table_name/*.parquet

Use cases: date partitioning (staging/2025/01/15), environment separation (dev/exports vs prod/exports), project organization (project-a/datasets).
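
For the date-partitioning case, the sub-path can be built at invocation time with ordinary shell substitution, for example:

# Export under staging/YYYY/MM/DD for today's date
./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
        --source_db_auth_id ds_03_pg \
        --target_storage_id s3_01 \
        --sub_path "staging/$(date +%Y/%m/%d)" \
        --fastbcp_dir_path /path/to/fastbcp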

Troubleshooting

Local Filesystem

Problem | Solution
Permission denied | chmod 755 /path/to/output_dir. Ensure parent directories exist.

AWS S3

Problem | Solution
Access denied / credential errors | Verify credentials: aws s3 ls s3://your-bucket --profile your-profile. Check IAM includes s3:PutObject. Verify region.
Slow uploads | Enable S3 Transfer Acceleration. Use S3 in the same region as your source database.

GCS

Problem | Solution
Authentication errors | Verify key file path. Check Storage Object Creator role. Ensure service account is enabled.
Quota exceeded | Check GCS quotas in Google Cloud Console. Request increase if needed.

S3-Compatible Storage

Problem | Solution
Connection errors | Verify endpoint URL. Check firewall rules. Ensure valid SSL certificates. Verify credential format.

See Also