Database Configuration

LakeXpress supports multiple database types as sources and for export logging.

Supported Databases

Source Databases

| Database   | Support Status | Notes                                             |
|------------|----------------|---------------------------------------------------|
| PostgreSQL | Supported      | ctid-based parallel export                        |
| Oracle     | Supported      | Thin and thick modes; ROWID-based parallel export |
| SQL Server | Supported      | physloc-based parallel export                     |
| MySQL      | Supported      | Partition detection                               |
| MariaDB    | Supported      | Dedicated mariadb ds_type                         |

Log Databases

| Database   | Support Status | Notes                      |
|------------|----------------|----------------------------|
| PostgreSQL | Supported      | Recommended for production |
| SQL Server | Supported      |                            |
| MySQL      | Supported      |                            |
| SQLite     | Supported      | Single-user scenarios      |
| DuckDB     | Supported      | Embedded analytical option |

Authentication File Format

Create a JSON file with your database credentials:

{
  "log_db_postgres": {
    "ds_type": "postgres",
    "auth_mode": "classic",
    "info": {
      "username": "postgres",
      "password": "your_password",
      "server": "localhost",
      "port": 5432,
      "database": "lakexpress_log"
    }
  },
  "source_oracle": {
    "ds_type": "oracle",
    "auth_mode": "classic",
    "info": {
      "username": "oracle_user",
      "password": "oracle_password",
      "server": "oracle-server.com",
      "port": 1521,
      "database": "orclpdb1",
      "lib_dir": "/opt/oracle/instantclient_19_8"
    }
  },
  "source_postgres": {
    "ds_type": "postgres",
    "auth_mode": "classic",
    "info": {
      "username": "postgres",
      "password": "your_password",
      "server": "localhost",
      "port": 5432,
      "database": "production_db"
    }
  },
  "source_mssql": {
    "ds_type": "mssql",
    "auth_mode": "classic",
    "info": {
      "username": "sa",
      "password": "mssql_password",
      "server": "mssql-server.com",
      "port": 1433,
      "database": "sales_db"
    }
  },
  "log_db_sqlite": {
    "ds_type": "sqlite",
    "auth_mode": "filesystem",
    "info": {
      "filepath": "/path/to/lakexpress_log.sqlite"
    }
  }
}
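
LakeXpress parses this file itself, but it can be useful to sanity-check it before a run. The following is a minimal, illustrative sketch; the required-key lists are an assumption for this check, not an official LakeXpress schema:

# sanity_check_auth.py - illustrative only; LakeXpress parses auth.json itself.
import json
import sys

# Minimal required keys per auth_mode (an assumption for this sketch,
# not an official LakeXpress schema).
REQUIRED_INFO = {
    "classic": {"username", "password", "server", "port", "database"},
    "filesystem": {"filepath"},
}

def check_auth_file(path: str) -> None:
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    for auth_id, entry in entries.items():
        mode = entry.get("auth_mode")
        info = entry.get("info", {})
        missing = REQUIRED_INFO.get(mode, set()) - info.keys()
        if missing:
            print(f"{auth_id}: missing keys {sorted(missing)}")
        else:
            print(f"{auth_id}: ok ({entry.get('ds_type')}, {mode})")

if __name__ == "__main__":
    check_auth_file(sys.argv[1] if len(sys.argv) > 1 else "auth.json")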

PostgreSQL Configuration

Connection Parameters

| Parameter | Required | Description          |
|-----------|----------|----------------------|
| ds_type   | Yes      | Must be "postgres"   |
| auth_mode | Yes      | Must be "classic"    |
| username  | Yes      | PostgreSQL username  |
| password  | Yes      | PostgreSQL password  |
| server    | Yes      | Hostname or IP       |
| port      | Yes      | Port (default: 5432) |
| database  | Yes      | Database name        |

Required Permissions

-- Source database user:
GRANT SELECT ON ALL TABLES IN SCHEMA public TO lakexpress_user;
GRANT USAGE ON SCHEMA public TO lakexpress_user;

-- information_schema access is granted by default

Example

{
  "postgres_prod": {
    "ds_type": "postgres",
    "auth_mode": "classic",
    "info": {
      "username": "dataexport",
      "password": "SecureP@ssw0rd",
      "server": "pg-prod.company.com",
      "port": 5432,
      "database": "analytics"
    }
  }
}
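
To confirm the credentials and permissions work before running an export, you can test the connection outside LakeXpress. A minimal sketch with psycopg2 (any PostgreSQL client works; LakeXpress does not require this step):

# pg_check.py - connection/permission smoke test, independent of LakeXpress.
import psycopg2

conn = psycopg2.connect(
    host="pg-prod.company.com",  # "server" in the auth file
    port=5432,
    user="dataexport",
    password="SecureP@ssw0rd",
    dbname="analytics",          # "database" in the auth file
)
with conn, conn.cursor() as cur:
    # Visible tables in the public schema; mirrors what a SELECT-only
    # export user needs to see.
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = 'public' LIMIT 5"
    )
    for (name,) in cur.fetchall():
        print(name)
conn.close()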

Oracle Configuration

Connection Parameters

| Parameter | Required | Description                            |
|-----------|----------|----------------------------------------|
| ds_type   | Yes      | Must be "oracle"                       |
| auth_mode | Yes      | Must be "classic"                      |
| username  | Yes      | Oracle username                        |
| password  | Yes      | Oracle password                        |
| server    | Yes      | Hostname or IP                         |
| port      | Yes      | Port (default: 1521)                   |
| database  | Yes      | Service name or SID                    |
| lib_dir   | No       | Client library directory (thick mode)  |

Thin vs Thick Mode

Thin Mode (Recommended):

  • No client libraries required
  • Oracle 12.1+
  • Omit lib_dir

Thick Mode:

  • Required for Oracle 11.2 and earlier
  • Requires Oracle Instant Client
  • Set lib_dir to the Instant Client directory
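
For reference, this is roughly how the thin/thick split looks in the python-oracledb driver (whether LakeXpress uses this driver internally is an assumption here; the sketch is a standalone illustration of the two modes):

# oracle_mode_check.py - illustrates the thin vs thick split in python-oracledb.
import oracledb

lib_dir = "/opt/oracle/instantclient_19_8"  # "lib_dir" from the auth file, or None

if lib_dir:
    # Thick mode: load the Oracle Instant Client libraries. Required for
    # Oracle 11.2 and earlier servers.
    oracledb.init_oracle_client(lib_dir=lib_dir)
# Thin mode needs no initialization call at all (Oracle 12.1+ servers).

conn = oracledb.connect(
    user="EXPORT_USER",
    password="OracleP@ss",
    dsn="oracle-prod.company.com:1521/PRODPDB",  # host:port/service_name
)
print("thin mode" if conn.thin else "thick mode")
conn.close()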

Required Permissions

For parallel RowID-based exports:

-- As Oracle DBA (e.g., sys as sysdba)
GRANT SELECT_CATALOG_ROLE TO your_username;

This role gives LakeXpress the metadata access it needs for table discovery, ROWID range partitioning, and export method selection.

Example: Thin Mode

{
  "oracle_prod": {
    "ds_type": "oracle",
    "auth_mode": "classic",
    "info": {
      "username": "EXPORT_USER",
      "password": "OracleP@ss",
      "server": "oracle-prod.company.com",
      "port": 1521,
      "database": "PRODPDB"
    }
  }
}

Example: Thick Mode

{
  "oracle_legacy": {
    "ds_type": "oracle",
    "auth_mode": "classic",
    "info": {
      "username": "LEGACY_USER",
      "password": "OracleP@ss",
      "server": "oracle-11g.company.com",
      "port": 1521,
      "database": "LEGACY",
      "lib_dir": "/opt/oracle/instantclient_19_8"
    }
  }
}

SQL Server Configuration

Connection Parameters

| Parameter | Required | Description          |
|-----------|----------|----------------------|
| ds_type   | Yes      | Must be "mssql"      |
| auth_mode | Yes      | Must be "classic"    |
| username  | Yes      | SQL Server username  |
| password  | Yes      | SQL Server password  |
| server    | Yes      | Hostname or IP       |
| port      | Yes      | Port (default: 1433) |
| database  | Yes      | Database name        |

LakeXpress falls back to PyMSSQL when ODBC drivers are unavailable.
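
The docs don't specify the exact driver selection logic, but the fallback pattern looks roughly like this sketch using pyodbc and pymssql (the logic and the ODBC driver name are illustrative; LakeXpress's internals may differ):

# mssql_connect.py - sketch of an ODBC-first, PyMSSQL-fallback connection.
def connect_mssql(server, port, database, username, password):
    """Try pyodbc first; fall back to pymssql if ODBC is unavailable."""
    try:
        import pyodbc
    except ImportError:
        pyodbc = None

    if pyodbc is not None:
        try:
            return pyodbc.connect(
                # Driver name varies by installation; 18 is a common choice.
                "DRIVER={ODBC Driver 18 for SQL Server};"
                f"SERVER={server},{port};DATABASE={database};"
                f"UID={username};PWD={password};TrustServerCertificate=yes"
            )
        except pyodbc.Error:
            pass  # e.g., the named ODBC driver is not installed

    import pymssql
    return pymssql.connect(server=server, port=port, database=database,
                           user=username, password=password)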

Required Permissions

-- Source database user:
GRANT SELECT ON SCHEMA::dbo TO lakexpress_user;
GRANT VIEW DEFINITION ON SCHEMA::dbo TO lakexpress_user;

-- INFORMATION_SCHEMA access is granted by default

Example

{
  "mssql_prod": {
    "ds_type": "mssql",
    "auth_mode": "classic",
    "info": {
      "username": "sa",
      "password": "StrongP@ssw0rd",
      "server": "mssql-prod.company.com",
      "port": 1433,
      "database": "SalesDB"
    }
  }
}

MySQL Configuration

MySQL works as both a source and log database.

Connection Parameters

| Parameter | Required | Description          |
|-----------|----------|----------------------|
| ds_type   | Yes      | Must be "mysql"      |
| auth_mode | Yes      | Must be "classic"    |
| server    | Yes      | Hostname or IP       |
| port      | Yes      | Port (default: 3306) |
| database  | Yes      | Database name        |
| username  | Yes      | MySQL username       |
| password  | Yes      | MySQL password       |

Required Permissions

-- Source database user:
GRANT SELECT ON your_database.* TO 'lakexpress_user'@'%';

-- Log database user:
GRANT ALL PRIVILEGES ON lakexpress_log.* TO 'lakexpress_user'@'%';

-- INFORMATION_SCHEMA access is granted by default
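
The supported-databases table above notes partition detection for MySQL. That metadata lives in INFORMATION_SCHEMA, so plain SELECT access is enough to read it; a sketch with PyMySQL, independent of LakeXpress:

# mysql_partitions.py - list a table's partitions via INFORMATION_SCHEMA.
import pymysql

conn = pymysql.connect(host="localhost", port=3306, user="lakexpress_user",
                       password="your-password", database="your_database")
with conn.cursor() as cur:
    cur.execute(
        "SELECT partition_name, table_rows "
        "FROM information_schema.partitions "
        "WHERE table_schema = %s AND table_name = %s "
        "AND partition_name IS NOT NULL",
        ("your_database", "your_table"),  # placeholders; use your own names
    )
    for name, rows in cur.fetchall():
        print(name, rows)
conn.close()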

Example: Source Database

{
  "source_mysql": {
    "ds_type": "mysql",
    "auth_mode": "classic",
    "info": {
      "server": "localhost",
      "port": 3306,
      "database": "your-database",
      "username": "your-username",
      "password": "your-password"
    }
  }
}

Example: Log Database

{
  "log_db_mysql": {
    "ds_type": "mysql",
    "auth_mode": "classic",
    "info": {
      "server": "localhost",
      "port": 3306,
      "database": "lakexpress_log",
      "username": "your-username",
      "password": "your-password"
    }
  }
}

MariaDB Configuration

MariaDB is supported as a source database with its own mariadb ds_type.

Connection Parameters

| Parameter | Required | Description          |
|-----------|----------|----------------------|
| ds_type   | Yes      | Must be "mariadb"    |
| auth_mode | Yes      | Must be "classic"    |
| server    | Yes      | Hostname or IP       |
| port      | Yes      | Port (default: 3306) |
| database  | Yes      | Database name        |
| username  | Yes      | MariaDB username     |
| password  | Yes      | MariaDB password     |

Required Permissions

-- Source database user:
GRANT SELECT ON your_database.* TO 'lakexpress_user'@'%';

-- INFORMATION_SCHEMA access is granted by default

Example: Source Database

{
  "source_mariadb": {
    "ds_type": "mariadb",
    "auth_mode": "classic",
    "info": {
      "server": "localhost",
      "port": 3306,
      "database": "your-database",
      "username": "your-username",
      "password": "your-password"
    }
  }
}

SQLite Configuration (Log Database Only)

Connection Parameters

| Parameter | Required | Description                  |
|-----------|----------|------------------------------|
| ds_type   | Yes      | Must be "sqlite"             |
| auth_mode | Yes      | Must be "filesystem"         |
| filepath  | Yes      | Path to SQLite database file |

Example

{
  "log_db_local": {
    "ds_type": "sqlite",
    "auth_mode": "filesystem",
    "info": {
      "filepath": "/var/lib/lakexpress/export_log.sqlite"
    }
  }
}

When to Use SQLite

Good for: single-user setups, development, testing, deployments without a database server.

Avoid for: multi-user environments, high-concurrency production, distributed systems.

DuckDB Configuration (Log Database Only)

DuckDB is supported for file-based databases only; in-memory mode (:memory:) is not supported.

Connection Parameters

| Parameter | Required | Description                  |
|-----------|----------|------------------------------|
| ds_type   | Yes      | Must be "duckdb"             |
| auth_mode | Yes      | Must be "filesystem"         |
| filepath  | Yes      | Path to DuckDB database file |

Example

{
  "log_db_duckdb": {
    "ds_type": "duckdb",
    "auth_mode": "filesystem",
    "info": {
      "filepath": "/var/lib/lakexpress/export_log.duckdb"
    }
  }
}
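
Because the log database is just a local DuckDB file, you can inspect export history directly once a run completes, using the duckdb Python package. Note the comment below: always pass a real file path, since in-memory mode is not supported.

# inspect_log.py - open the DuckDB log file and list its tables.
import duckdb

# Opening a path creates the file if it doesn't exist; :memory: is not
# supported by LakeXpress, so always pass a real path.
conn = duckdb.connect("/var/lib/lakexpress/export_log.duckdb")
print(conn.execute("SHOW TABLES").fetchall())
conn.close()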

When to Use DuckDB

Good for: single-user setups, development, testing, fast analytical queries, lightweight embedded needs.

Avoid for: concurrent writes, high-concurrency production, distributed systems.

Connection String Examples

Using the Auth File

# PostgreSQL source, PostgreSQL log
./LakeXpress -a auth.json \
  --log_db_auth_id log_db_postgres \
  --source_db_auth_id source_postgres

# Oracle source, SQL Server log
./LakeXpress -a auth.json \
  --log_db_auth_id log_db_mssql \
  --source_db_auth_id source_oracle

# SQL Server source, SQLite log
./LakeXpress -a auth.json \
  --log_db_auth_id log_db_sqlite \
  --source_db_auth_id source_mssql

# MySQL source, MySQL log
./LakeXpress -a auth.json \
  --log_db_auth_id log_db_mysql \
  --source_db_auth_id source_mysql

# MySQL source, DuckDB log
./LakeXpress -a auth.json \
  --log_db_auth_id log_db_duckdb \
  --source_db_auth_id source_mysql

Troubleshooting

PostgreSQL

Connection refused: Check that PostgreSQL is running (systemctl status postgresql), verify that pg_hba.conf allows your IP, and check the firewall (sudo ufw allow 5432/tcp).

Authentication failed: Verify credentials, check the password authentication method (e.g., md5 vs scram-sha-256) in pg_hba.conf, and confirm the user exists: SELECT * FROM pg_user WHERE usename='your_user';

Oracle

ORA-12154: TNS:could not resolve the connect identifier specified: Verify the service name, check tnsnames.ora, try the IP address instead of the hostname, and confirm the listener is running.

ORA-28000: account is locked:

ALTER USER your_username ACCOUNT UNLOCK;

Thick mode library not found: Verify lib_dir path, confirm Instant Client is installed, check file permissions, set LD_LIBRARY_PATH on Linux.

SQL Server

Login failed for user: Confirm SQL Server authentication is enabled (not Windows-only), verify user exists with correct permissions, check password and IP restrictions.

Cannot open database requested by the login: Verify the database name, check user access, and confirm the database is online: SELECT state_desc FROM sys.databases WHERE name='YourDB';

SQLite

Unable to open database file: Check file path, verify directory exists with write permissions, use absolute paths, check disk space.

MySQL / MariaDB

Access denied for user: Verify credentials, check host access (SELECT user, host FROM mysql.user WHERE user='your_user';), and grant access if needed (e.g., GRANT SELECT ON your_database.* TO 'lakexpress_user'@'%'; for a source user).

Unknown database: Verify the name (SHOW DATABASES;) and create it if needed (CREATE DATABASE your_database;).

Connection refused: Check that MySQL is running (systemctl status mysql), verify the port is listening (netstat -tlnp | grep 3306), confirm bind-address allows remote connections, and check the firewall (sudo ufw allow 3306/tcp).

DuckDB

Unable to open database file: Check file path, verify directory exists with write permissions, use absolute paths, check disk space.

Database is locked: Ensure no other process is using the file, close existing connections, check for stale lock files.

Snowflake Publishing Configuration

LakeXpress can create Snowflake tables after exporting to cloud storage.

Authentication Modes

Snowflake supports four authentication methods:

Password Authentication

{
  "snowflake_password": {
    "ds_type": "snowflake",
    "auth_mode": "password",
    "info": {
      "account": "your-account-identifier",
      "user": "your-username",
      "password": "your-password",
      "warehouse": "your-warehouse",
      "database": "your-database",
      "stage": "your-external-stage"
    }
  }
}
Personal Access Token (PAT) Authentication

{
  "snowflake_pat": {
    "ds_type": "snowflake",
    "auth_mode": "pat",
    "info": {
      "account": "your-account-identifier",
      "user": "your-username",
      "token": "your-personal-access-token",
      "warehouse": "your-warehouse",
      "database": "your-database",
      "stage": "your-external-stage"
    }
  }
}

Key-Pair Authentication (Most Secure)

{
  "snowflake_keypair": {
    "ds_type": "snowflake",
    "auth_mode": "keypair",
    "info": {
      "account": "your-account-identifier",
      "user": "your-username",
      "private_key_path": "/path/to/rsa_key.p8",
      "private_key_passphrase": "optional-passphrase",
      "warehouse": "your-warehouse",
      "database": "your-database",
      "stage": "your-external-stage"
    }
  }
}
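
If you want to confirm the key pair works before pointing LakeXpress at it, the Snowflake Python connector accepts the private key as DER bytes. A minimal check, independent of LakeXpress:

# snowflake_keypair_check.py - verify key-pair auth with the Snowflake connector.
import snowflake.connector
from cryptography.hazmat.primitives import serialization

with open("/path/to/rsa_key.p8", "rb") as f:
    private_key = serialization.load_pem_private_key(
        f.read(), password=b"optional-passphrase"  # use None if unencrypted
    )

conn = snowflake.connector.connect(
    account="your-account-identifier",
    user="your-username",
    private_key=private_key.private_bytes(
        encoding=serialization.Encoding.DER,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    ),
    warehouse="your-warehouse",
    database="your-database",
)
print(conn.cursor().execute("SELECT CURRENT_USER()").fetchone())
conn.close()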

OAuth Authentication

{
  "snowflake_oauth": {
    "ds_type": "snowflake",
    "auth_mode": "oauth",
    "info": {
      "account": "your-account-identifier",
      "user": "your-username",
      "oauth_token": "your-oauth-token",
      "warehouse": "your-warehouse",
      "database": "your-database",
      "stage": "your-external-stage"
    }
  }
}

Configuration Parameters

| Parameter              | Required    | Description                                            |
|------------------------|-------------|--------------------------------------------------------|
| ds_type                | Yes         | Must be "snowflake"                                    |
| auth_mode              | Yes         | One of: password, pat, keypair, oauth                  |
| account                | Yes         | Snowflake account identifier (e.g., tj36405.eu-west-1) |
| user                   | Yes         | Snowflake username (usually all caps)                  |
| warehouse              | Yes         | Snowflake warehouse name                               |
| database               | Yes         | Target database name                                   |
| stage                  | Yes         | Snowflake stage name pointing to cloud storage         |
| password               | Conditional | Required for password auth mode                        |
| token                  | Conditional | Required for pat auth mode                             |
| private_key_path       | Conditional | Required for keypair auth mode                         |
| private_key_passphrase | Optional    | For encrypted private keys                             |
| oauth_token            | Conditional | Required for oauth auth mode                           |

Important: Stage Configuration

The Snowflake stage must point to the same location as your cloud storage configuration.

Example - S3:

{
  "aws_s3": {
    "ds_type": "s3",
    "info": {
      "directory": "s3://your-bucket-name/path/to/exports"
    }
  },
  "snowflake_pat": {
    "ds_type": "snowflake",
    "info": {
      "stage": "your-external-stage"
    }
  }
}

The stage URL must match the S3 directory path.

Verify stage location in Snowflake:

DESC STAGE YOUR_DB.PUBLIC.YOUR_STAGE;
-- URL column must match S3 directory exactly
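
You can automate that check by comparing the stage URL with the configured export directory. A sketch using the Snowflake connector; the DESC STAGE parsing below assumes its usual (parent_property, property, type, value, default) row layout and may need adjusting for your connector version:

# stage_check.py - confirm the stage URL matches the S3 export directory.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your-account-identifier", user="your-username",
    password="your-password", warehouse="your-warehouse",
    database="your-database",
)
rows = conn.cursor().execute("DESC STAGE YOUR_DB.PUBLIC.YOUR_STAGE").fetchall()
stage_url = next(row[3] for row in rows if row[1] == "URL")
directory = "s3://your-bucket-name/path/to/exports"  # from the storage config
# The URL property is reported like ["s3://bucket/path/"]; normalize first.
if stage_url.strip('[]"').rstrip("/") == directory.rstrip("/"):
    print("stage matches export directory")
else:
    print("MISMATCH:", stage_url, "vs", directory)
conn.close()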

Usage

# Export to S3 and publish to Snowflake
./LakeXpress -a credentials.json \
  --log_db_auth_id log_db_postgres \
  --source_db_auth_id source_postgres \
  --target_storage_id aws_s3 \
  --publish_target snowflake_pat \
  --fastbcp_dir_path /path/to/fastbcp

For more details, see the Snowflake Publishing Guide.

Microsoft Fabric Publishing Configuration

LakeXpress can create Fabric Lakehouse tables after exporting to OneLake storage.

Authentication

Fabric uses Service Principal authentication with Azure AD:

{
  "fabric_lakehouse": {
    "ds_type": "fabric",
    "auth_mode": "service_principal",
    "fabric_target": "lakehouse",
    "info": {
      "workspace_id": "your-workspace-id",
      "lakehouse_id": "your-lakehouse-id",
      "lakehouse_name": "your-lakehouse-name",
      "sql_endpoint": "your-sql-endpoint.datawarehouse.fabric.microsoft.com",
      "azure_client_id": "your-application-client-id",
      "azure_tenant_id": "your-directory-tenant-id",
      "azure_client_secret": "your-client-secret"
    }
  }
}
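
To confirm the service principal is set up correctly before a run, you can request a token yourself with the azure-identity package. This check is independent of LakeXpress; OneLake is exposed through the Azure Storage endpoint, so the storage scope is the relevant one:

# fabric_sp_check.py - verify the service principal can obtain a token.
from azure.identity import ClientSecretCredential

cred = ClientSecretCredential(
    tenant_id="your-directory-tenant-id",
    client_id="your-application-client-id",
    client_secret="your-client-secret",
)
# Request a token for the Azure Storage scope used by OneLake data access.
token = cred.get_token("https://storage.azure.com/.default")
print("token acquired, expires at", token.expires_on)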

Configuration Parameters

| Parameter           | Required | Description                                          |
|---------------------|----------|------------------------------------------------------|
| ds_type             | Yes      | Must be "fabric"                                     |
| auth_mode           | Yes      | Must be "service_principal"                          |
| fabric_target       | Yes      | Must be "lakehouse"                                  |
| workspace_id        | Yes      | Fabric workspace GUID                                |
| lakehouse_id        | Yes      | Lakehouse GUID                                       |
| lakehouse_name      | Yes      | Lakehouse name                                       |
| sql_endpoint        | Yes      | SQL analytics endpoint hostname                      |
| azure_client_id     | Yes      | Application (client) ID from Azure AD app registration |
| azure_tenant_id     | Yes      | Directory (tenant) ID from Azure AD                  |
| azure_client_secret | Yes      | Client secret from Azure AD app                      |

Finding Fabric Configuration Values

Workspace ID and Lakehouse ID: From the Fabric portal URL: https://app.fabric.microsoft.com/groups/{workspace_id}/lakehouses/{lakehouse_id}

SQL Endpoint: In the Lakehouse view, click “SQL analytics endpoint” in the bottom pane and copy the connection string hostname.

Usage

# Export to OneLake and publish to Fabric Lakehouse
./LakeXpress -a credentials.json \
  --log_db_auth_id log_db_postgres \
  --source_db_auth_id source_postgres \
  --target_storage_id onelake_storage \
  --publish_target fabric_lakehouse \
  --publish_method internal \
  --fastbcp_dir_path /path/to/fastbcp

Security Best Practices

1. Credential Storage

  • Never commit credentials to version control
  • Restrict auth.json permissions:
    chmod 600 auth.json
    
  • Use environment-specific auth files (dev, staging, prod)

2. Least Privilege

  • Create dedicated database users for LakeXpress
  • Grant only SELECT on source, full privileges on log DB
  • Use read-only credentials for source databases

3. Network Security

  • Use SSL/TLS connections
  • Restrict database access by IP
  • Use VPN or private networks

4. Password Management

  • Use strong passwords (12+ characters, mixed case, numbers, symbols)
  • Rotate passwords regularly
  • Consider secret managers (AWS Secrets Manager, HashiCorp Vault)

5. Environment Variables for Passwords

LakeXpress supports ${VAR_NAME} syntax in any string value in the credentials JSON:

{
  "source_postgres": {
    "ds_type": "postgres",
    "auth_mode": "classic",
    "info": {
      "username": "postgres",
      "password": "${PG_PASSWORD}",
      "server": "localhost",
      "port": 5432,
      "database": "production_db"
    }
  },
  "source_mssql": {
    "ds_type": "mssql",
    "auth_mode": "classic",
    "info": {
      "username": "sa",
      "password": "${MSSQL_PASSWORD}",
      "server": "mssql-server.com",
      "port": 1433,
      "database": "sales_db"
    }
  }
}

Set variables before running LakeXpress:

# Linux
export PG_PASSWORD="SecureP@ssw0rd"
export MSSQL_PASSWORD="StrongP@ssw0rd"

# Windows (cmd)
set PG_PASSWORD=SecureP@ssw0rd

# Windows (PowerShell)
$env:PG_PASSWORD = "SecureP@ssw0rd"

Plain-text passwords still work; values without ${...} placeholders are used as-is.
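
The substitution behaves like ordinary ${VAR} expansion. A sketch of the equivalent logic (an illustration of the semantics, not LakeXpress's actual implementation; whether LakeXpress errors on unset variables is not documented here, so the sketch leaves unresolved placeholders in place):

# env_expand.py - illustrative ${VAR_NAME} expansion over a credentials dict.
import json
import os
import re

_PATTERN = re.compile(r"\$\{(\w+)\}")

def expand(value):
    """Recursively replace ${VAR} placeholders in strings with env values."""
    if isinstance(value, str):
        # Unset variables are left as-is in this sketch.
        return _PATTERN.sub(lambda m: os.environ.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: expand(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand(v) for v in value]
    return value

with open("auth.json", encoding="utf-8") as f:
    creds = expand(json.load(f))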

See Also