# Configuration Guide

This guide covers all configuration options for Hubio Sync, including data source connections, sync settings, transformations, and monitoring.
## Configuration File Location

Hubio Sync reads configuration from a TOML file at:

- **Linux/macOS:** `~/.config/hubio-sync/config.toml`
- **Windows:** `%APPDATA%\hubio-sync\config.toml`

You can override this location with the `--config` flag:

```shell
hubio-sync --config /path/to/custom-config.toml run
```
## Quick Start Configuration

### Create Configuration Directory

```shell
# Linux/macOS
mkdir -p ~/.config/hubio-sync

# Windows (PowerShell)
New-Item -ItemType Directory -Path "$env:APPDATA\hubio-sync" -Force
```
### Minimal Configuration Example

```toml
# config.toml - Minimal working configuration

[source]
type = "mysql"
host = "localhost"
port = 3306
database = "myapp"
username = "readonly_user"
password = "secure_password"

[destination]
type = "s3"
bucket = "my-data-lake"
region = "us-east-1"

[sync]
tables = ["users", "orders"]
```
## Source Configuration

Hubio Sync supports multiple data source types. Configure one source per sync job.
### MySQL

```toml
[source]
type = "mysql"
host = "mysql.example.com"
port = 3306
database = "production_db"
username = "readonly_user"
password = "secure_password"

# Connection pool settings
max_connections = 10
connection_timeout = 30  # seconds
idle_timeout = 300       # seconds

# SSL/TLS (optional)
ssl_mode = "required"  # Options: "disabled", "preferred", "required"
ssl_ca = "/path/to/ca-cert.pem"
ssl_cert = "/path/to/client-cert.pem"
ssl_key = "/path/to/client-key.pem"
```
### PostgreSQL

```toml
[source]
type = "postgres"
host = "postgres.example.com"
port = 5432
database = "production_db"
username = "readonly_user"
password = "secure_password"

# Connection string (alternative to individual fields)
# connection_string = "postgresql://user:pass@host:5432/db"

# Schema selection
schema = "public"

# SSL mode
ssl_mode = "require"  # Options: "disable", "allow", "prefer", "require", "verify-ca", "verify-full"
```
### SQLite

```toml
[source]
type = "sqlite"
path = "/path/to/database.db"

# Read-only mode (recommended for safety)
read_only = true

# In-memory caching
cache_size = 2000  # pages
```
### Microsoft SQL Server

```toml
[source]
type = "mssql"
host = "sqlserver.example.com"
port = 1433
database = "ProductionDB"
username = "readonly_user"
password = "secure_password"

# Windows Authentication (alternative to username/password)
# integrated_security = true

# Encryption
encrypt = true
trust_server_certificate = false
```
### REST API

```toml
[source]
type = "rest_api"
base_url = "https://api.example.com/v1"
auth_type = "bearer"  # Options: "none", "basic", "bearer", "api_key"
auth_token = "your-api-token"

# Rate limiting
rate_limit = 100   # requests per minute
retry_attempts = 3
retry_delay = 1000 # milliseconds

# Headers
[source.headers]
Accept = "application/json"
User-Agent = "HubioSync/1.0"
```
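As a rough illustration of how a client might honor the `rate_limit`, `retry_attempts`, and `retry_delay` settings above (the names and behavior here are assumptions for illustration, not Hubio Sync internals):

```python
import time


def call_with_retry(request_fn, retry_attempts=3, retry_delay_ms=1000):
    """Invoke request_fn, retrying on failure with a fixed delay.

    Mirrors the retry_attempts / retry_delay settings above
    (retry_delay is in milliseconds, matching the config).
    """
    last_error = None
    for _ in range(retry_attempts):
        try:
            return request_fn()
        except Exception as exc:  # real code would catch the client's error type
            last_error = exc
            time.sleep(retry_delay_ms / 1000.0)
    raise last_error


class RateLimiter:
    """Space calls so at most `rate_limit` happen per minute."""

    def __init__(self, rate_limit_per_minute):
        self.min_interval = 60.0 / rate_limit_per_minute
        self.last_call = 0.0

    def wait(self):
        now = time.monotonic()
        remaining = self.min_interval - (now - self.last_call)
        if remaining > 0:
            time.sleep(remaining)
        self.last_call = time.monotonic()
```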
## Destination Configuration

Configure where synced data should be written.
### Amazon S3

```toml
[destination]
type = "s3"
bucket = "my-data-lake"
region = "us-east-1"
prefix = "hubio-sync/"  # Optional folder prefix

# AWS credentials (or use IAM role)
access_key_id = "AKIAIOSFODNN7EXAMPLE"
secret_access_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

# File format
format = "parquet"      # Options: "parquet", "json", "csv", "avro"
compression = "snappy"  # Options: "none", "snappy", "gzip", "zstd"

# Partitioning
partition_by = ["year", "month", "day"]  # Time-based partitioning
```
### Google Cloud Storage

```toml
[destination]
type = "gcs"
bucket = "my-data-lake"
project_id = "my-gcp-project"
prefix = "hubio-sync/"

# Credentials
credentials_file = "/path/to/service-account-key.json"

# File format
format = "parquet"
compression = "snappy"
```
### Azure Blob Storage

```toml
[destination]
type = "azure_blob"
account_name = "mystorageaccount"
container = "data-lake"
prefix = "hubio-sync/"

# Authentication
account_key = "your-account-key"
# Or use a SAS token:
# sas_token = "your-sas-token"

# File format
format = "parquet"
compression = "snappy"
```
### Snowflake

```toml
[destination]
type = "snowflake"
account = "xy12345.us-east-1"
warehouse = "COMPUTE_WH"
database = "ANALYTICS"
schema = "HUBIO_SYNC"
username = "sync_user"
password = "secure_password"

# Role (optional)
role = "ACCOUNTADMIN"

# Stage for data loading
stage = "@~/hubio_stage"
```
### Local Filesystem

```toml
[destination]
type = "filesystem"
path = "/data/exports"

# File format
format = "parquet"
compression = "snappy"

# Directory structure
partition_by = ["table", "date"]
```
## Sync Configuration

Control how data synchronization behaves.
### Basic Sync Settings

```toml
[sync]
# Tables to sync
tables = ["users", "orders", "products"]
# Or sync all tables:
# tables = ["*"]

# Sync mode
mode = "incremental"  # Options: "full", "incremental", "append"

# Incremental column (for incremental mode)
incremental_column = "updated_at"

# Batch size
batch_size = 10000        # rows per batch
max_parallel_batches = 4  # concurrent batches
```
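Conceptually, `batch_size` bounds how many rows are read and written per chunk, and `max_parallel_batches` controls how many such chunks are in flight at once. A minimal sketch of the chunking step:

```python
def batches(rows, batch_size):
    """Yield successive chunks of at most batch_size rows.

    Sketches how batch_size bounds the rows held in memory per write;
    max_parallel_batches would process several such chunks concurrently.
    """
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]
```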
### Scheduling

```toml
[sync]
# Cron expression for scheduling
schedule = "0 2 * * *"  # Daily at 2 AM
# Examples:
#   "*/15 * * * *" - Every 15 minutes
#   "0 */4 * * *"  - Every 4 hours
#   "0 0 * * 0"    - Weekly on Sunday at midnight
#   "0 9 * * 1-5"  - Weekdays at 9 AM

# Timezone
timezone = "America/New_York"
```
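If you generate schedules programmatically, a cheap structural sanity check on the five-field cron format (minute, hour, day-of-month, month, day-of-week) can catch typos before a sync silently never fires. This checks field count only, not value ranges:

```python
def looks_like_cron(expr):
    """Cheap structural check: a standard cron expression has five
    whitespace-separated fields. Does not validate field values."""
    return len(expr.split()) == 5
```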
### Full Refresh vs Incremental

Choose one mode per sync job (repeating the `mode` key within a single `[sync]` table is invalid TOML):

```toml
# Full refresh: replace all data on each sync
[sync]
mode = "full"
```

```toml
# Incremental: only sync new/updated records
[sync]
mode = "incremental"
incremental_column = "updated_at"
incremental_type = "timestamp"  # Options: "timestamp", "integer", "date"
```

```toml
# Append-only: only sync new records (never update)
[sync]
mode = "append"
incremental_column = "created_at"
```
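The three modes differ mainly in the extract query each implies. An illustrative sketch (not Hubio Sync's actual query builder, and real code would use bound parameters rather than string formatting):

```python
def build_extract_query(table, mode, incremental_column=None, last_value=None):
    """Sketch of the per-table extract query implied by each mode.

    full        -> read everything
    incremental -> rows changed since the stored high-water mark
    append      -> same query shape, but the destination never updates rows
    """
    if mode == "full" or last_value is None:
        return f"SELECT * FROM {table}"
    return (f"SELECT * FROM {table} "
            f"WHERE {incremental_column} > '{last_value}' "
            f"ORDER BY {incremental_column}")
```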
### Table-Specific Configuration

Override global settings for specific tables.

```toml
[sync]
tables = ["users", "orders", "large_table"]

# Global incremental column
incremental_column = "updated_at"

# Table-specific overrides
[[sync.table_config]]
name = "large_table"
batch_size = 50000                    # Larger batches for big tables
incremental_column = "last_modified"  # Different column

[[sync.table_config]]
name = "orders"
mode = "append"  # Append-only for orders
incremental_column = "created_at"
# Column selection
include_columns = ["id", "customer_id", "total", "created_at"]
# Or exclude specific columns:
# exclude_columns = ["internal_notes", "debug_data"]
```
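Table-specific keys win over the global `[sync]` defaults. A sketch of that merge (the precedence rule is an assumption inferred from the example above):

```python
def effective_config(global_sync, table_configs, table):
    """Merge the global [sync] settings with a matching
    [[sync.table_config]] entry; table-specific keys win."""
    merged = dict(global_sync)
    for override in table_configs:
        if override.get("name") == table:
            merged.update({k: v for k, v in override.items() if k != "name"})
    return merged
```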
## Transformations

Apply transformations during sync.
### Column Transformations

```toml
[[transformations]]
table = "users"
transform = "anonymize"
columns = ["email", "phone", "ssn"]
method = "hash"  # Options: "hash", "mask", "null"

[[transformations]]
table = "users"
transform = "rename"
columns = { "user_id" = "id", "user_email" = "email" }

[[transformations]]
table = "orders"
transform = "cast"
columns = { "total" = "decimal", "quantity" = "integer" }
```
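To make the `anonymize` methods concrete, here is a sketch of what each might do to a single row; the use of SHA-256 for `method = "hash"` is an assumption for illustration, not Hubio Sync's documented algorithm:

```python
import hashlib


def anonymize(row, columns, method="hash"):
    """Sketch of the three anonymize methods applied to one row (a dict).

    Hashing is deterministic, so the same input still joins across
    tables; mask and null are irreversible and break joins.
    """
    out = dict(row)
    for col in columns:
        if col not in out or out[col] is None:
            continue
        if method == "hash":
            out[col] = hashlib.sha256(str(out[col]).encode()).hexdigest()
        elif method == "mask":
            out[col] = "*" * len(str(out[col]))
        elif method == "null":
            out[col] = None
    return out
```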
### Row Filtering

```toml
[[transformations]]
table = "orders"
transform = "filter"
condition = "created_at >= NOW() - INTERVAL 90 DAY"

[[transformations]]
table = "users"
transform = "filter"
condition = "status = 'active' AND deleted_at IS NULL"
```
### Custom SQL Transformations

```toml
[[transformations]]
table = "orders"
transform = "sql"
query = """
SELECT
  id,
  customer_id,
  total,
  CASE
    WHEN total > 1000 THEN 'high_value'
    WHEN total > 100 THEN 'medium_value'
    ELSE 'low_value'
  END AS value_segment,
  created_at
FROM orders
WHERE status = 'completed'
"""
```
## Monitoring & Logging

Configure observability and debugging.
### Logging

```toml
[logging]
level = "info"    # Options: "debug", "info", "warn", "error"
format = "json"   # Options: "json", "text"
output = "stdout" # Options: "stdout", "file"

# File output settings (apply when output = "file")
log_file = "/var/log/hubio-sync/sync.log"
max_file_size = "100MB"
max_backups = 10
compress = true
```
### Metrics

```toml
[metrics]
enabled = true
port = 9090
path = "/metrics"

# Prometheus exposition format
format = "prometheus"

# Metrics to track
track_row_counts = true
track_sync_duration = true
track_error_rates = true
```
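For reference, Prometheus's text exposition format (what a scraper reads from the `/metrics` endpoint) is one line per sample. A sketch of the line shape; the metric name below is illustrative, not one of Hubio Sync's actual metric names:

```python
def prometheus_line(name, labels, value):
    """Format one sample in Prometheus text exposition format:
    metric_name{label="value",...} sample_value"""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"
```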
### Alerts

```toml
[alerts]
enabled = true

# Microsoft Teams webhook
teams_webhook_url = "https://your-org.webhook.office.com/webhookb2/YOUR/WEBHOOK/URL"

# Alert conditions
alert_on_failure = true
alert_on_slow_sync = true
slow_sync_threshold = 3600  # seconds (1 hour)

# Email alerts
smtp_host = "smtp.gmail.com"
smtp_port = 587
smtp_username = "alerts@example.com"
smtp_password = "app-password"
email_to = ["team@example.com"]
```
## Performance Tuning

Optimize sync performance for your workload.

```toml
[performance]
# Connection pooling
max_connections = 20
min_connections = 5

# Memory limits
max_memory = "2GB"
buffer_size = "100MB"

# Parallelization
max_parallel_tables = 4   # Sync multiple tables concurrently
max_parallel_batches = 8  # Batches per table

# Compression
enable_compression = true
compression_level = 6  # 1-9; higher = better compression, slower

# Caching
enable_metadata_cache = true
cache_ttl = 3600  # seconds
```
## Security

Secure your configuration and credentials.
### Credential Management

Avoid hardcoding secrets in the config file. Reference environment variables for sensitive values instead:

```toml
[source]
type = "mysql"
host = "mysql.example.com"
username = "readonly"
password = "${MYSQL_PASSWORD}"  # Resolved from the environment

[destination]
type = "s3"
bucket = "data-lake"
access_key_id = "${AWS_ACCESS_KEY_ID}"
secret_access_key = "${AWS_SECRET_ACCESS_KEY}"
```
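The `${VAR}` placeholders above suggest simple shell-style interpolation. A sketch of that expansion (raising on unset variables is an assumption here, not Hubio Sync's documented behavior):

```python
import os
import re


def expand_env(value):
    """Replace ${VAR} placeholders with environment values, the way the
    config examples above reference secrets. Raises KeyError on unset
    variables rather than passing the placeholder through silently."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ[m.group(1)], value)
```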
### Encryption at Rest

```toml
[security]
# Encrypt local cache and temporary files
encrypt_cache = true
encryption_key_file = "/secure/path/to/key.pem"
```
### Network Security

```toml
[security]
# Require TLS for all connections
require_tls = true
tls_min_version = "1.2"

# Verify certificates
verify_certificates = true
ca_bundle = "/path/to/ca-bundle.crt"
```
## Environment-Specific Configuration

Manage different configurations for dev, staging, and production.
### Using Environment Variables

```shell
# Set environment
export HUBIO_ENV=production
export HUBIO_CONFIG=/etc/hubio-sync/production.toml

# Run sync
hubio-sync run
```
### Configuration Includes

```toml
# production.toml
include = ["base.toml", "secrets.toml"]

[sync]
# Production-specific overrides
batch_size = 50000
max_parallel_batches = 16
```
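Includes are most useful with deep-merge semantics: later files override earlier ones key by key, and nested tables merge rather than replace. A sketch of that merge (the exact precedence rules are an assumption, not documented behavior):

```python
def deep_merge(base, override):
    """Recursively merge two config dicts; `override` wins on conflicts,
    and nested tables merge key-by-key instead of being replaced."""
    merged = dict(base)
    for key, val in override.items():
        if isinstance(val, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], val)
        else:
            merged[key] = val
    return merged
```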
## Validation

Validate your configuration before running a sync:

```shell
# Validate configuration syntax
hubio-sync validate

# Test source connection
hubio-sync validate --test-source

# Test destination connection
hubio-sync validate --test-destination

# Full validation (syntax + connections)
hubio-sync validate --full
```
## Example Configurations

### Complete Production Example

```toml
# Production configuration for MySQL → S3 sync

[source]
type = "mysql"
host = "${MYSQL_HOST}"
port = 3306
database = "production"
username = "${MYSQL_USER}"
password = "${MYSQL_PASSWORD}"
max_connections = 20
ssl_mode = "required"

[destination]
type = "s3"
bucket = "company-data-lake"
region = "us-east-1"
prefix = "mysql-exports/"
access_key_id = "${AWS_ACCESS_KEY_ID}"
secret_access_key = "${AWS_SECRET_ACCESS_KEY}"
format = "parquet"
compression = "snappy"
partition_by = ["year", "month", "day"]

[sync]
tables = ["users", "orders", "products", "transactions"]
mode = "incremental"
incremental_column = "updated_at"
schedule = "0 2 * * *"  # Daily at 2 AM
timezone = "UTC"
batch_size = 10000
max_parallel_tables = 4

[[transformations]]
table = "users"
transform = "anonymize"
columns = ["email", "phone", "ssn"]

[[transformations]]
table = "orders"
transform = "filter"
condition = "status = 'completed'"

[logging]
level = "info"
format = "json"
output = "file"
log_file = "/var/log/hubio-sync/sync.log"

[metrics]
enabled = true
port = 9090

[alerts]
enabled = true
teams_webhook_url = "${TEAMS_WEBHOOK_URL}"
alert_on_failure = true
```
## Configuration Reference

For a complete list of all configuration options, run:

```shell
hubio-sync config --help
```

Or see the Configuration Schema Documentation.
## Troubleshooting

### Common Configuration Errors

**"Invalid TOML syntax"**

- Check for missing quotes around strings
- Ensure backslashes in Windows paths are escaped
- Validate with: `hubio-sync validate`

**"Connection refused"**

- Verify host, port, and credentials
- Check firewall rules
- Test with: `hubio-sync validate --test-source`

**"Permission denied"**

- Ensure the database user has SELECT permissions
- Check destination write permissions
- Verify IAM roles (for cloud destinations)

For additional help, see the Troubleshooting Guide or contact support@hubio.com.