Storage Backends
LakeXpress exports Parquet files to local or cloud storage.
Overview
| Option | Flag | Description |
|---|---|---|
| Local Filesystem | --output_dir | Export to local directories |
| Cloud Storage | --target_storage_id | Export to S3, GCS, Azure, or OneLake |
--output_dir and --target_storage_id are mutually exclusive.
Snowflake is a publishing target, not a storage backend. Use --publish_target after exporting to cloud storage. See the Snowflake Publishing Guide.
Local Filesystem
Configuration
./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
--source_db_auth_id ds_03_pg \
--output_dir ./exports \
--fastbcp_dir_path /path/to/fastbcp
Directory Structure
output_dir/
└── schema_name/
└── table_name/
├── part-00000.parquet
├── part-00001.parquet
└── ...
With --sub_path staging/temp:
exports/
└── staging/
└── temp/
└── schema_name/
└── table_name/
├── part-00000.parquet
└── ...
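For reference, the layout above corresponds to a run along the following lines; every flag except --sub_path is taken from the configuration example above.

```bash
# Local export with an intermediate staging/temp directory level
./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
  --source_db_auth_id ds_03_pg \
  --output_dir ./exports \
  --sub_path staging/temp \
  --fastbcp_dir_path /path/to/fastbcp
```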
Permissions
The user running LakeXpress needs write access to the output directory, sufficient disk space, and the ability to create subdirectories.
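A quick pre-flight check along these lines can catch permission and disk-space problems before a long export starts (./exports is simply the directory from the example above):

```bash
# Create the output directory if needed and confirm it is writable
mkdir -p ./exports
touch ./exports/.write_test && rm ./exports/.write_test

# Check free space on the filesystem that holds the output directory
df -h ./exports
```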
S3-Compatible Storage
All S3-compatible providers use ds_type: "s3". Provider differentiation is handled through endpoint_url in the AWS profile.
Supported providers: Amazon S3, OVH S3, MinIO, Wasabi, DigitalOcean Spaces, Backblaze B2, and any S3-compatible object storage.
Method 1: AWS Profile (Recommended for AWS)
Uses ~/.aws/config and ~/.aws/credentials:
{
"s3_01": {
"ds_type": "s3",
"auth_mode": "profile",
"info": {
"directory": "s3://your-bucket-name/lakexpress/exports",
"profile": "your-aws-profile"
}
}
}
~/.aws/config:
[profile your-aws-profile]
region = us-east-1
~/.aws/credentials:
[your-aws-profile]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
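Before wiring the profile into auth.json, it is worth confirming with the AWS CLI that it resolves credentials and can reach the bucket; the profile and bucket names below are the placeholders from the example above.

```bash
# Confirm the profile resolves valid credentials
aws sts get-caller-identity --profile your-aws-profile

# Confirm the profile can list the target prefix
aws s3 ls s3://your-bucket-name/lakexpress/exports/ --profile your-aws-profile
```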
| Field | Required | Description |
|---|---|---|
| ds_type | Yes | Must be "s3" |
| auth_mode | Yes | Must be "profile" |
| directory | Yes | Full S3 URL (e.g., s3://bucket/path) |
| profile | Yes | AWS profile name from ~/.aws/credentials |
Method 2: S3-Compatible Providers (OVH, MinIO, etc.)
Use an AWS profile with a custom endpoint_url.
OVH S3:
{
"s3_02": {
"ds_type": "s3",
"auth_mode": "profile",
"info": {
"directory": "s3://your-ovh-bucket/lakexpress",
"profile": "ovh"
}
}
}
[profile ovh]
endpoint_url = https://s3.gra.io.cloud.ovh.net
region = gra
MinIO:
{
"s3_03": {
"ds_type": "s3",
"auth_mode": "profile",
"info": {
"directory": "s3://data-lake/exports",
"profile": "minio"
}
}
}
[profile minio]
endpoint_url = http://localhost:9000
region = us-east-1
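The same sanity check works for S3-compatible providers. If your AWS CLI version does not read endpoint_url from the profile, pass the endpoint explicitly; the values below are the MinIO placeholders from the example.

```bash
# List the MinIO bucket through the custom endpoint
aws s3 ls s3://data-lake/exports/ --profile minio --endpoint-url http://localhost:9000
```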
| Field | Required | Description |
|---|---|---|
| ds_type | Yes | Must be "s3" |
| auth_mode | Yes | Must be "profile" |
| directory | Yes | Full S3 URL (e.g., s3://bucket/path) |
| profile | Yes | AWS profile name (with endpoint_url in ~/.aws/config) |
Usage
# AWS S3
./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
--source_db_auth_id ds_03_pg \
--target_storage_id s3_01 \
--fastbcp_dir_path /path/to/fastbcp
# OVH S3
./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
--source_db_auth_id ds_03_pg \
--target_storage_id s3_02 \
--fastbcp_dir_path /path/to/fastbcp
# MinIO
./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
--source_db_auth_id ds_03_pg \
--target_storage_id s3_03 \
--fastbcp_dir_path /path/to/fastbcp
S3 Path Structure
s3://bucket-name/base_path/schema_name/table_name/part-00000.parquet
With --sub_path:
s3://bucket-name/base_path/sub_path/schema_name/table_name/part-00000.parquet
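After a run completes, the exported objects can be inspected with the AWS CLI, for example (using the placeholder names from the path layout above):

```bash
# List all Parquet parts written for one table
aws s3 ls s3://bucket-name/base_path/schema_name/table_name/ --recursive
```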
Required IAM Permissions (AWS S3)
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:PutObjectAcl",
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::your-bucket-name/*",
"arn:aws:s3:::your-bucket-name"
]
}
]
}
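If you manage IAM from the command line rather than the console, a policy like the one above can be attached as an inline user policy; the user and policy names here are only illustrative.

```bash
# Save the policy JSON above as lakexpress-s3-policy.json, then attach it
aws iam put-user-policy \
  --user-name lakexpress-export \
  --policy-name LakeXpressS3Export \
  --policy-document file://lakexpress-s3-policy.json
```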
Backward Compatibility
Custom storage types ending in _s3 (e.g., minio_s3, custom_s3) still work, but ds_type: "s3" is preferred.
Google Cloud Storage (GCS)
Configuration
{
"gcs_storage": {
"ds_type": "gcs",
"auth_mode": "profile",
"info": {
"directory": "gs://your-bucket-name/path/to/exports",
"profile": "/path/to/service-account-key.json"
}
}
}
| Field | Required | Description |
|---|---|---|
| ds_type | Yes | Must be "gcs" |
| auth_mode | Yes | Must be "profile" |
| directory | Yes | Full GCS URL (e.g., gs://bucket/path) |
| profile | Yes | Path to GCS service account JSON key file |
Usage
./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
--source_db_auth_id ds_03_pg \
--target_storage_id gcs_01 \
--fastbcp_dir_path /path/to/fastbcp
Path Structure
gs://bucket-name/base_path/schema_name/table_name/part-00000.parquet
Service Account Permissions
Required roles:
- Storage Object Creator
- Storage Object Viewer (if reading is needed)
Or custom IAM permissions:
- storage.objects.create
- storage.objects.delete (if overwriting)
- storage.buckets.get
Creating a Service Account
gcloud iam service-accounts create lakexpress-export \
--display-name="LakeXpress Export Service Account"
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:lakexpress-export@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/storage.objectCreator"
gcloud iam service-accounts keys create service-account-key.json \
--iam-account=lakexpress-export@YOUR_PROJECT_ID.iam.gserviceaccount.com
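To confirm the key works before adding it to auth.json, you can authenticate as the service account and upload a small test object (Storage Object Creator allows writes but not listing); the bucket name is the placeholder from the configuration example.

```bash
# Authenticate with the downloaded key
gcloud auth activate-service-account --key-file=service-account-key.json

# Upload a throwaway object to verify write access
echo "lakexpress write test" > /tmp/lakexpress_write_test.txt
gcloud storage cp /tmp/lakexpress_write_test.txt gs://your-bucket-name/lakexpress_write_test.txt
```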
Azure Storage
Supports Azure Data Lake Storage Gen2 (ADLS Gen2) and Azure Blob Storage via Service Principal authentication.
Configuration
{
"azure_storage": {
"ds_type": "azure",
"auth_mode": "service_principal",
"info": {
"directory": "abfss://your-container@your-storage-account.dfs.core.windows.net/path/to/exports",
"azure_client_id": "your-application-client-id",
"azure_tenant_id": "your-directory-tenant-id",
"azure_client_secret": "${AZURE_CLIENT_SECRET}"
}
}
}
Use ${VAR_NAME} syntax to reference environment variables. Set the variable before running LakeXpress (e.g., export AZURE_CLIENT_SECRET="your-secret").
| Field | Required | Description |
|---|---|---|
| ds_type | Yes | Must be "azure" |
| auth_mode | Yes | Must be "service_principal" |
| directory | Yes | Azure storage URL (abfss:// for ADLS Gen2, abs:// for Blob) |
| azure_client_id | Yes | Application (client) ID from Azure AD app registration |
| azure_tenant_id | Yes | Directory (tenant) ID from Azure AD |
| azure_client_secret | Yes | Client secret value from Azure AD app |
Storage URL Formats
ADLS Gen2:
abfss://container-name@storage-account.dfs.core.windows.net/path/to/exports
Azure Blob Storage:
abs://container-name@storage-account.blob.core.windows.net/path/to/exports
In both formats the container name comes before the @ and the storage account name after it. ADLS Gen2 URLs use the .dfs.core.windows.net endpoint, while Blob Storage URLs use .blob.core.windows.net.
Azure Authentication Setup
1. Create an Azure AD Application (Service Principal):
In the Azure Portal, under App registrations:
- Click “New registration”
- Note the Application (client) ID and Directory (tenant) ID
- Go to “Certificates & secrets” -> “New client secret”
- Copy the client secret value (shown only once)
2. Assign Storage Permissions:
In your Storage Account -> Access Control (IAM), add the “Storage Blob Data Contributor” role to your service principal.
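If you prefer the Azure CLI to the portal, the same setup can be sketched roughly as follows; the application name, subscription, resource group, and storage account values are placeholders.

```bash
# Create the app registration / service principal and note the returned appId, tenant, and password
az ad sp create-for-rbac --name lakexpress-export

# Grant it Storage Blob Data Contributor on the storage account
az role assignment create \
  --assignee <application-client-id> \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
```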
Usage
./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
--source_db_auth_id ds_03_pg \
--target_storage_id azure_01 \
--fastbcp_dir_path /path/to/fastbcp
Path Structure
ADLS Gen2:
abfss://container@storage.dfs.core.windows.net/path/to/exports/schema_name/table_name/part-00000.parquet
With --sub_path:
abfss://container@storage.dfs.core.windows.net/path/to/exports/sub_path/schema_name/table_name/part-00000.parquet
Required Permissions
The Service Principal needs Storage Blob Data Contributor, which grants:
- Microsoft.Storage/storageAccounts/blobServices/containers/read
- Microsoft.Storage/storageAccounts/blobServices/containers/write
- Microsoft.Storage/storageAccounts/blobServices/generateUserDelegationKey/action
Troubleshooting
| Problem | Solution |
|---|---|
| Authentication / access denied | Verify client ID, tenant ID, and secret. Confirm “Storage Blob Data Contributor” role. Check network access rules. Verify container exists. |
| Invalid URL format | Use abfss:// for ADLS Gen2, abs:// for Blob. Format: abfss://container@storageaccount.dfs.core.windows.net/path. |
| Connection timeout | Check firewall rules, storage account firewall allowlist, and private endpoint restrictions. |
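A quick way to verify the service principal outside LakeXpress is to log in with it through the Azure CLI and read the container metadata, using the same placeholder values as the configuration example.

```bash
# Log in as the service principal
az login --service-principal \
  --username your-application-client-id \
  --password "$AZURE_CLIENT_SECRET" \
  --tenant your-directory-tenant-id

# Confirm the container is reachable with Azure AD (RBAC) authorization
az storage container show \
  --name your-container \
  --account-name your-storage-account \
  --auth-mode login
```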
Microsoft Fabric OneLake
Exports Parquet files to OneLake via Service Principal authentication.
Configuration
{
"onelake_storage": {
"ds_type": "onelake",
"auth_mode": "service_principal",
"info": {
"directory": "onelake://your-workspace-name/your-lakehouse-name/",
"azure_client_id": "your-application-client-id",
"azure_tenant_id": "your-directory-tenant-id",
"azure_client_secret": "your-client-secret"
}
}
}
| Field | Required | Description |
|---|---|---|
| ds_type | Yes | Must be "onelake" |
| auth_mode | Yes | Must be "service_principal" |
| directory | Yes | onelake://workspace/lakehouse/path/ |
| azure_client_id | Yes | Application (client) ID from Azure AD |
| azure_tenant_id | Yes | Directory (tenant) ID from Azure AD |
| azure_client_secret | Yes | Client secret value from Azure AD app |
URL Format
onelake://workspace-name/lakehouse-name/path/to/exports/
Example:
onelake://FABRICPOC/lakexpress_lakehouse/exports/
Usage
./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
--source_db_auth_id ds_03_pg \
--target_storage_id fabric_onelake \
--fastbcp_dir_path /path/to/fastbcp
With config creation:
python lxpress.py config create \
-a auth.json \
--log_db_auth_id log_db_ms \
--source_db_auth_id ds_04_pg \
--source_db_name tpch \
--source_schema_name tpch_1 \
--fastbcp_dir_path /path/to/fastbcp \
--fastbcp_p 2 \
--n_jobs 4 \
--target_storage_id fabric_onelake \
--generate_metadata
Path Structure
onelake://workspace/lakehouse/base_path/schema_name/table_name/part-00000.parquet
With --sub_path:
onelake://workspace/lakehouse/base_path/sub_path/schema_name/table_name/part-00000.parquet
Authentication Setup
1. Create an Azure AD Application (Service Principal):
In the Azure Portal, under App registrations:
- Click “New registration”
- Note the Application (client) ID and Directory (tenant) ID
- Go to “Certificates & secrets” -> “New client secret”
- Copy the client secret value (shown only once)
2. Grant OneLake Access in Microsoft Fabric:
In the Fabric portal, go to Workspace settings -> “Manage access” and add the service principal with Contributor or Admin permissions. For finer control, grant access at the Lakehouse level.
Troubleshooting
| Problem | Solution |
|---|---|
| Authentication / access denied | Verify client ID, tenant ID, and secret. Confirm Contributor role on workspace. Verify OneLake is enabled. |
| Invalid URL format | Use onelake:// prefix. Format: onelake://workspace-name/lakehouse-name/path/. Ensure path ends with /. |
| Connection timeout | Check firewall rules and verify Fabric workspace is network-accessible. |
Snowflake Publishing
Snowflake is not a storage backend. LakeXpress first exports to cloud storage (S3/GCS/Azure), then creates Snowflake tables pointing to that data using --publish_target.
For the full guide covering table types, naming patterns, primary key propagation, views, and troubleshooting, see the Snowflake Publishing Guide.
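As a minimal sketch, assuming the publish flag is passed alongside the storage id (the exact flag combination and any additional publishing options are documented in the Snowflake Publishing Guide), a run might look like this, reusing the s3_01 storage and the snowflake_password credential defined below:

```bash
# Export to S3, then publish the exported tables to Snowflake
./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
  --source_db_auth_id ds_03_pg \
  --target_storage_id s3_01 \
  --publish_target snowflake_password \
  --fastbcp_dir_path /path/to/fastbcp
```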
Credential Configuration
Password
{
"snowflake_password": {
"ds_type": "snowflake",
"auth_mode": "password",
"info": {
"account": "your-account-identifier",
"user": "your-username",
"password": "${SNOWFLAKE_PASSWORD}",
"warehouse": "your-warehouse",
"database": "your-database",
"stage": "your-external-stage"
}
}
}
Use ${VAR_NAME} to reference environment variables (e.g., export SNOWFLAKE_PASSWORD="your-password").
Key Pair
{
"snowflake_keypair": {
"ds_type": "snowflake",
"auth_mode": "keypair",
"info": {
"account": "your-account-identifier",
"user": "your-username",
"private_key_path": "/path/to/rsa_key.p8",
"private_key_passphrase": "optional-passphrase",
"warehouse": "your-warehouse",
"database": "your-database",
"stage": "your-external-stage"
}
}
}
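Key pair authentication assumes an RSA key registered with the Snowflake user. Following Snowflake's standard key-pair setup, the key files can be generated like this:

```bash
# Generate an encrypted PKCS#8 private key (you are prompted for a passphrase)
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8

# Derive the public key to register with the Snowflake user
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

# In Snowflake, attach the public key to the user (paste the key body without the PEM header/footer):
#   ALTER USER your-username SET RSA_PUBLIC_KEY='MIIBIjANBgkq...';
```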
Programmatic Access Token (PAT)
{
"snowflake_pat": {
"ds_type": "snowflake",
"auth_mode": "pat",
"info": {
"account": "your-account-identifier",
"user": "your-username",
"token": "your-personal-access-token",
"warehouse": "your-warehouse",
"database": "your-database",
"stage": "your-external-stage"
}
}
}
OAuth
{
"snowflake_oauth": {
"ds_type": "snowflake",
"auth_mode": "oauth",
"info": {
"account": "your-account-identifier",
"user": "your-username",
"oauth_token": "your-oauth-token",
"warehouse": "your-warehouse",
"database": "your-database",
"stage": "your-external-stage"
}
}
}
Configuration Fields
| Field | Required | Description |
|---|---|---|
| ds_type | Yes | Must be "snowflake" |
| auth_mode | Yes | "password", "keypair", "pat", or "oauth" |
| account | Yes | Account identifier (e.g., khhoube-vt58050) |
| user | Yes | Snowflake username (usually all caps) |
| warehouse | Yes | Warehouse name |
| database | Yes | Target database |
| stage | Yes | Stage name for external table data |
| schema | No | Schema (default: PUBLIC) |
| role | No | Role (default: user’s default role) |
| password | Conditional | For password auth mode |
| private_key_path | Conditional | For keypair auth mode |
| private_key_passphrase | No | Passphrase for encrypted private key |
| token | Conditional | For pat auth mode |
| oauth_token | Conditional | For oauth auth mode |
Sub-Path Option
All backends support --sub_path to insert an intermediate directory level:
./LakeXpress --target_storage_id s3_01 --sub_path staging/daily/2025-01-15 ...
Produces:
s3://bucket/base_path/staging/daily/2025-01-15/schema_name/table_name/*.parquet
Use cases: date partitioning (staging/2025/01/15), environment separation (dev/exports vs prod/exports), project organization (project-a/datasets).
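Because --sub_path is just a path fragment, date partitioning can be generated from the shell at run time, for example:

```bash
# Export into a date-partitioned sub-path such as staging/2025/01/15
./LakeXpress -a auth.json --log_db_auth_id log_db_ms \
  --source_db_auth_id ds_03_pg \
  --target_storage_id s3_01 \
  --sub_path "staging/$(date +%Y/%m/%d)" \
  --fastbcp_dir_path /path/to/fastbcp
```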
Troubleshooting
Local Filesystem
| Problem | Solution |
|---|---|
| Permission denied | chmod 755 /path/to/output_dir. Ensure parent directories exist. |
AWS S3
| Problem | Solution |
|---|---|
| Access denied / credential errors | Verify credentials: aws s3 ls s3://your-bucket --profile your-profile. Check IAM includes s3:PutObject. Verify region. |
| Slow uploads | Enable S3 Transfer Acceleration. Use S3 in the same region as your source database. |
GCS
| Problem | Solution |
|---|---|
| Authentication errors | Verify key file path. Check Storage Object Creator role. Ensure service account is enabled. |
| Quota exceeded | Check GCS quotas in Google Cloud Console. Request increase if needed. |
S3-Compatible Storage
| Problem | Solution |
|---|---|
| Connection errors | Verify endpoint URL. Check firewall rules. Ensure valid SSL certificates. Verify credential format. |