FastBCP Configuration

LakeXpress uses FastBCP for high-performance data export with per-table control over export methods.

Configuration Approaches

1. Auto-Detection (Default)

LakeXpress selects the export method based on table size, database type, and available features. No configuration needed.

2. Database Table Configuration (Production)

Store table-specific settings in the partition_columns table of the export logging database. Use this for persistent, shared configuration across runs.

3. CLI Configuration (Development/Testing)

Use --fastbcp_table_config for quick, one-off testing.

Available Export Methods

Methods Without Key Column

Method	Database	Description	Best For
None	All	Sequential export (no parallelism)	Small tables (< 100K rows)
Ctid	PostgreSQL	Tuple ID-based parallel export	Medium to large PostgreSQL tables
Rowid	Oracle	Row ID-based parallel export	Medium to large Oracle tables
Physloc	SQL Server	Physical location-based parallel export	Medium to large SQL Server tables

Methods With Key Column

Method	Description	Key Column Requirements	Best For
Random	Distributes using column values	Integer or bigint column	Tables with sequential IDs
DataDriven	Splits on distinct column values	Column with distinct values	Tables partitioned by year, region, etc.
RangeId	Range partitioning via min/max	Numeric or date column with good distribution	Large tables with sortable columns
Ntile	Even distribution via column values	Numeric column	Tables requiring even distribution

Production: Database Configuration

Creating Table Configuration

Insert into the partition_columns table:

INSERT INTO partition_columns (
    source_db_type,
    source_database,
    source_schema,
    source_table,
    fastbcp_method,
    fastbcp_parallel_degree,
    fastbcp_distribution_key,
    env_name
) VALUES (
    'postgres',           -- Database type: postgres, oracle, mssql
    'tpch',              -- Database name
    'tpch_1',            -- Schema name
    'lineitem',          -- Table name
    'DataDriven',        -- FastBCP method
    8,                   -- Parallel degree (--fastbcp_p equivalent)
    'YEAR(l_shipdate)',  -- Distribution key column/expression
    'production'         -- Environment name (optional)
);

Examples

PostgreSQL - Ctid Method (No Key Column)

INSERT INTO partition_columns (
    source_db_type, source_database, source_schema, source_table,
    fastbcp_method, fastbcp_parallel_degree, fastbcp_distribution_key
) VALUES (
    'postgres', 'analytics', 'public', 'events',
    'Ctid', 4, NULL
);

Oracle - DataDriven Method

INSERT INTO partition_columns (
    source_db_type, source_database, source_schema, source_table,
    fastbcp_method, fastbcp_parallel_degree, fastbcp_distribution_key
) VALUES (
    'oracle', 'PRODDB', 'SALES', 'TRANSACTIONS',
    'DataDriven', 8, 'YEAR(TRANSACTION_DATE)'
);

SQL Server - Physloc Method

INSERT INTO partition_columns (
    source_db_type, source_database, source_schema, source_table,
    fastbcp_method, fastbcp_parallel_degree, fastbcp_distribution_key
) VALUES (
    'mssql', 'SalesDB', 'dbo', 'Orders',
    'Physloc', 4, NULL
);

RangeId Method with Numeric Column

INSERT INTO partition_columns (
    source_db_type, source_database, source_schema, source_table,
    fastbcp_method, fastbcp_parallel_degree, fastbcp_distribution_key
) VALUES (
    'postgres', 'ecommerce', 'public', 'customer',
    'RangeId', 2, 'c_custkey'
);

Development/Testing: CLI Configuration

Format

--fastbcp_table_config "table:method:key_column:parallel_degree"

Empty fields use defaults: :: means default key and parallel degree
Separate multiple tables with semicolons

Single Table Example

./LakeXpress -a auth.json --log_db_auth_id log_db \
        --source_db_auth_id source_pg \
        --source_schema_name tpch_1 \
        --fastbcp_table_config "lineitem:DataDriven:YEAR(l_shipdate):8" \
        --output_dir ./exports \
        --fastbcp_dir_path /opt/fastbcp

Multiple Tables Example

./LakeXpress -a auth.json --log_db_auth_id log_db \
        --source_db_auth_id source_pg \
        --source_schema_name tpch_1 \
        --fastbcp_table_config "lineitem:DataDriven:YEAR(l_shipdate):8;orders:Ctid::4;customer:RangeId:c_custkey:2" \
        --output_dir ./exports \
        --fastbcp_dir_path /opt/fastbcp

This configures:

lineitem: DataDriven, partition by YEAR(l_shipdate), 8 parallel processes
orders: Ctid, no key column, 4 parallel processes
customer: RangeId, partition by c_custkey, 2 parallel processes

Using Defaults

# Use default parallel degree (from --fastbcp_p or 1)
--fastbcp_table_config "orders:Ctid::"

# Use default method and parallel degree (auto-detection)
--fastbcp_table_config "orders:::"

Configuration Priority

From highest to lowest:

partition_columns table - Database configuration
–fastbcp_table_config - CLI configuration
Auto-detection - Automatic method selection

Database table configuration always overrides CLI configuration.

Performance Tuning

Parallel Degree (`--fastbcp_p`)

Controls parallelism within large tables:

--fastbcp_p 4  # Export large tables using 4 parallel processes

Guidelines:

< 100K rows: Use 1 (default)
100K - 10M rows: Use 2-4
10M - 100M rows: Use 4-8
> 100M rows: Use 8-16

Higher parallelism is not always faster. Factor in CPU cores, source database load, and network bandwidth.

Large Table Threshold

--large_table_threshold 500000  # Tables with < 500K rows use sequential export

Default: 100,000 rows. Tables below this threshold export sequentially.

Table-Level Parallelism (`--n_jobs`)

Controls how many tables export simultaneously:

--n_jobs 8  # Export 8 tables in parallel

Example configurations:

# Balanced (medium server)
--n_jobs 4 --fastbcp_p 2

# Aggressive (large server)
--n_jobs 8 --fastbcp_p 4

# Conservative
--n_jobs 2 --fastbcp_p 1

Method Selection Guide

PostgreSQL Tables

Small tables (< 100K rows):

fastbcp_method = 'None'

Medium to large tables:

fastbcp_method = 'Ctid'
fastbcp_parallel_degree = 4

Tables with a good distribution column:

fastbcp_method = 'DataDriven'
fastbcp_distribution_key = 'YEAR(created_date)'
fastbcp_parallel_degree = 8

Oracle Tables

Rowid-based exports require SELECT_CATALOG_ROLE.

Small tables:

fastbcp_method = 'None'

Medium to large tables with SELECT_CATALOG_ROLE:

fastbcp_method = 'Rowid'
fastbcp_parallel_degree = 4

Without SELECT_CATALOG_ROLE, or with a distribution column:

fastbcp_method = 'DataDriven'
fastbcp_distribution_key = 'EXTRACT(YEAR FROM ORDER_DATE)'
fastbcp_parallel_degree = 8

SQL Server Tables