FastBCP Configuration
LakeXpress uses FastBCP for high-performance data export with per-table control over export methods.
Configuration Approaches
1. Auto-Detection (Default)
LakeXpress selects the export method based on table size, database type, and available features. No configuration needed.
2. Database Table Configuration (Production)
Store table-specific settings in the partition_columns table of the export logging database. Use this for persistent, shared configuration across runs.
3. CLI Configuration (Development/Testing)
Use --fastbcp_table_config for quick, one-off testing.
Available Export Methods
Methods Without Key Column
| Method | Database | Description | Best For |
|---|---|---|---|
| None | All | Sequential export (no parallelism) | Small tables (< 100K rows) |
| Ctid | PostgreSQL | Tuple ID-based parallel export | Medium to large PostgreSQL tables |
| Rowid | Oracle | Row ID-based parallel export | Medium to large Oracle tables |
| Physloc | SQL Server | Physical location-based parallel export | Medium to large SQL Server tables |
Methods With Key Column
| Method | Description | Key Column Requirements | Best For |
|---|---|---|---|
| Random | Distributes using column values | Integer or bigint column | Tables with sequential IDs |
| DataDriven | Splits on distinct column values | Column with distinct values | Tables partitioned by year, region, etc. |
| RangeId | Range partitioning via min/max | Numeric or date column with good distribution | Large tables with sortable columns |
| Ntile | Even distribution via column values | Numeric column | Tables requiring even distribution |
Production: Database Configuration
Creating Table Configuration
Insert into the partition_columns table:
INSERT INTO partition_columns (
source_db_type,
source_database,
source_schema,
source_table,
fastbcp_method,
fastbcp_parallel_degree,
fastbcp_distribution_key,
env_name
) VALUES (
'postgres', -- Database type: postgres, oracle, mssql
'tpch', -- Database name
'tpch_1', -- Schema name
'lineitem', -- Table name
'DataDriven', -- FastBCP method
8, -- Parallel degree (--fastbcp_p equivalent)
'YEAR(l_shipdate)', -- Distribution key column/expression
'production' -- Environment name (optional)
);
Examples
PostgreSQL - Ctid Method (No Key Column)
INSERT INTO partition_columns (
source_db_type, source_database, source_schema, source_table,
fastbcp_method, fastbcp_parallel_degree, fastbcp_distribution_key
) VALUES (
'postgres', 'analytics', 'public', 'events',
'Ctid', 4, NULL
);
Oracle - DataDriven Method
INSERT INTO partition_columns (
source_db_type, source_database, source_schema, source_table,
fastbcp_method, fastbcp_parallel_degree, fastbcp_distribution_key
) VALUES (
'oracle', 'PRODDB', 'SALES', 'TRANSACTIONS',
'DataDriven', 8, 'YEAR(TRANSACTION_DATE)'
);
SQL Server - Physloc Method
INSERT INTO partition_columns (
source_db_type, source_database, source_schema, source_table,
fastbcp_method, fastbcp_parallel_degree, fastbcp_distribution_key
) VALUES (
'mssql', 'SalesDB', 'dbo', 'Orders',
'Physloc', 4, NULL
);
RangeId Method with Numeric Column
INSERT INTO partition_columns (
source_db_type, source_database, source_schema, source_table,
fastbcp_method, fastbcp_parallel_degree, fastbcp_distribution_key
) VALUES (
'postgres', 'ecommerce', 'public', 'customer',
'RangeId', 2, 'c_custkey'
);
Development/Testing: CLI Configuration
Format
--fastbcp_table_config "table:method:key_column:parallel_degree"
- Empty fields use defaults:
::means default key and parallel degree - Separate multiple tables with semicolons
Single Table Example
./LakeXpress -a auth.json --log_db_auth_id log_db \
--source_db_auth_id source_pg \
--source_schema_name tpch_1 \
--fastbcp_table_config "lineitem:DataDriven:YEAR(l_shipdate):8" \
--output_dir ./exports \
--fastbcp_dir_path /opt/fastbcp
Multiple Tables Example
./LakeXpress -a auth.json --log_db_auth_id log_db \
--source_db_auth_id source_pg \
--source_schema_name tpch_1 \
--fastbcp_table_config "lineitem:DataDriven:YEAR(l_shipdate):8;orders:Ctid::4;customer:RangeId:c_custkey:2" \
--output_dir ./exports \
--fastbcp_dir_path /opt/fastbcp
This configures:
lineitem: DataDriven, partition by YEAR(l_shipdate), 8 parallel processesorders: Ctid, no key column, 4 parallel processescustomer: RangeId, partition by c_custkey, 2 parallel processes
Using Defaults
# Use default parallel degree (from --fastbcp_p or 1)
--fastbcp_table_config "orders:Ctid::"
# Use default method and parallel degree (auto-detection)
--fastbcp_table_config "orders:::"
Configuration Priority
From highest to lowest:
- partition_columns table - Database configuration
- –fastbcp_table_config - CLI configuration
- Auto-detection - Automatic method selection
Database table configuration always overrides CLI configuration.
Performance Tuning
Parallel Degree (--fastbcp_p)
Controls parallelism within large tables:
--fastbcp_p 4 # Export large tables using 4 parallel processes
Guidelines:
- < 100K rows: Use 1 (default)
- 100K - 10M rows: Use 2-4
- 10M - 100M rows: Use 4-8
- > 100M rows: Use 8-16
Higher parallelism is not always faster. Factor in CPU cores, source database load, and network bandwidth.
Large Table Threshold
--large_table_threshold 500000 # Tables with < 500K rows use sequential export
Default: 100,000 rows. Tables below this threshold export sequentially.
Table-Level Parallelism (--n_jobs)
Controls how many tables export simultaneously:
--n_jobs 8 # Export 8 tables in parallel
Example configurations:
# Balanced (medium server)
--n_jobs 4 --fastbcp_p 2
# Aggressive (large server)
--n_jobs 8 --fastbcp_p 4
# Conservative
--n_jobs 2 --fastbcp_p 1
Method Selection Guide
PostgreSQL Tables
Small tables (< 100K rows):
fastbcp_method = 'None'
Medium to large tables:
fastbcp_method = 'Ctid'
fastbcp_parallel_degree = 4
Tables with a good distribution column:
fastbcp_method = 'DataDriven'
fastbcp_distribution_key = 'YEAR(created_date)'
fastbcp_parallel_degree = 8
Oracle Tables
Rowid-based exports require SELECT_CATALOG_ROLE.
Small tables:
fastbcp_method = 'None'
Medium to large tables with SELECT_CATALOG_ROLE:
fastbcp_method = 'Rowid'
fastbcp_parallel_degree = 4
Without SELECT_CATALOG_ROLE, or with a distribution column:
fastbcp_method = 'DataDriven'
fastbcp_distribution_key = 'EXTRACT(YEAR FROM ORDER_DATE)'
fastbcp_parallel_degree = 8
SQL Server Tables
Small tables:
fastbcp_method = 'None'
Medium to large tables:
fastbcp_method = 'Physloc'
fastbcp_parallel_degree = 4
Tables with a good distribution column:
fastbcp_method = 'DataDriven'
fastbcp_distribution_key = 'YEAR(OrderDate)'
fastbcp_parallel_degree = 8
Distribution Key Examples
Good Distribution Keys
Year-based:
-- PostgreSQL
'YEAR(created_date)'
'EXTRACT(YEAR FROM order_date)'
-- Oracle
'EXTRACT(YEAR FROM ORDER_DATE)'
'TO_CHAR(CREATE_DATE, ''YYYY'')'
-- SQL Server
'YEAR(OrderDate)'
'DATEPART(YEAR, CreatedDate)'
Region/Category:
'region_id'
'category'
'department_code'
Numeric range:
'customer_id'
'order_id'
'account_number'
Poor Distribution Keys
Avoid:
- Boolean columns - Only 2 values
- Low cardinality columns - Too few distinct values
- Highly skewed columns - Most rows share one value
- NULL-heavy columns - Many NULL values
Monitoring and Troubleshooting
Check Export Performance
SELECT
source_table,
status,
started_at,
finished_at,
finished_at - started_at AS duration,
row_count
FROM jobs
WHERE run_id = 'your-run-id'
ORDER BY duration DESC;
Identify Slow Tables
SELECT
source_table,
finished_at - started_at AS duration,
row_count,
row_count / EXTRACT(EPOCH FROM (finished_at - started_at)) AS rows_per_second
FROM jobs
WHERE run_id = 'your-run-id' AND status = 'completed'
ORDER BY duration DESC
LIMIT 10;
Optimize Slow Tables
- Increase parallel degree (
--fastbcp_p) - Try a different FastBCP method
- Verify distribution key cardinality
- Check source DB for locks, slow queries, or resource constraints
Complete Example
./LakeXpress -a auth.json \
--log_db_auth_id log_db_postgres \
--source_db_auth_id source_postgres \
--source_schema_name public \
--fastbcp_dir_path ./FastBCP_linux-x64/latest/ \
--n_jobs 8 \
--fastbcp_p 4 \
--large_table_threshold 500000 \
--fastbcp_table_config "huge_table:DataDriven:YEAR(created):16;medium_table:Ctid::4" \
--output_dir ./exports
- 8 tables exported simultaneously (
--n_jobs 8) - 4-way parallelism for large tables (
--fastbcp_p 4) - Tables under 500K rows export sequentially
huge_table: DataDriven, 16-way parallelismmedium_table: Ctid, 4-way parallelism- All other tables use auto-detection
See Also
- CLI Reference - FastBCP command-line options
- Database Configuration - Required database permissions
- Examples - Real-world FastBCP configurations