Configuration Guide

This guide covers all configuration options for Hubio Sync, including data source connections, sync settings, transformations, and monitoring.

Configuration File Location

Hubio Sync reads configuration from a TOML file at:

  • Linux/macOS: ~/.config/hubio-sync/config.toml
  • Windows: %APPDATA%\hubio-sync\config.toml

You can override this location with the --config flag:

hubio-sync --config /path/to/custom-config.toml run

Quick Start Configuration

Create Configuration Directory

# Linux/macOS
mkdir -p ~/.config/hubio-sync

# Windows (PowerShell)
New-Item -ItemType Directory -Path "$env:APPDATA\hubio-sync" -Force

Minimal Configuration Example

# config.toml - Minimal working configuration

[source]
type = "mysql"
host = "localhost"
port = 3306
database = "myapp"
username = "readonly_user"
password = "secure_password"

[destination]
type = "s3"
bucket = "my-data-lake"
region = "us-east-1"

[sync]
tables = ["users", "orders"]

Source Configuration

Hubio Sync supports multiple data source types. Configure one source per sync job.

MySQL

[source]
type = "mysql"
host = "mysql.example.com"
port = 3306
database = "production_db"
username = "readonly_user"
password = "secure_password"

# Connection pool settings
max_connections = 10
connection_timeout = 30  # seconds
idle_timeout = 300       # seconds

# SSL/TLS (optional)
ssl_mode = "required"    # Options: "disabled", "preferred", "required"
ssl_ca = "/path/to/ca-cert.pem"
ssl_cert = "/path/to/client-cert.pem"
ssl_key = "/path/to/client-key.pem"

PostgreSQL

[source]
type = "postgres"
host = "postgres.example.com"
port = 5432
database = "production_db"
username = "readonly_user"
password = "secure_password"

# Connection string (alternative to individual fields)
# connection_string = "postgresql://user:pass@host:5432/db"

# Schema selection
schema = "public"

# SSL mode
ssl_mode = "require"     # Options: "disable", "allow", "prefer", "require", "verify-ca", "verify-full"

SQLite

[source]
type = "sqlite"
path = "/path/to/database.db"

# Read-only mode (recommended for safety)
read_only = true

# In-memory caching
cache_size = 2000  # pages

Microsoft SQL Server

[source]
type = "mssql"
host = "sqlserver.example.com"
port = 1433
database = "ProductionDB"
username = "readonly_user"
password = "secure_password"

# Windows Authentication (alternative to username/password)
# integrated_security = true

# Encryption
encrypt = true
trust_server_certificate = false

REST API

[source]
type = "rest_api"
base_url = "https://api.example.com/v1"
auth_type = "bearer"     # Options: "none", "basic", "bearer", "api_key"
auth_token = "your-api-token"

# Rate limiting
rate_limit = 100         # requests per minute
retry_attempts = 3
retry_delay = 1000       # milliseconds

# Headers
[source.headers]
Accept = "application/json"
User-Agent = "HubioSync/1.0"
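The retry settings above can be read as a simple retry loop. The following Python sketch is one plausible interpretation, not Hubio Sync's actual implementation. It assumes that retry_attempts counts retries after the first try and that retry_delay is a fixed wait rather than an exponential backoff. The name call_with_retry is hypothetical.

```python
import time


def call_with_retry(func, retry_attempts=3, retry_delay_ms=1000, sleep=time.sleep):
    """Call func(); on failure, retry up to retry_attempts more times."""
    for attempt in range(retry_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == retry_attempts:
                raise  # out of retries: surface the last error
            # Fixed delay between tries (assumption: no backoff).
            sleep(retry_delay_ms / 1000)
```

With retry_attempts = 3 and retry_delay = 1000, a request that keeps failing would be tried four times in total under this reading, with roughly three seconds spent waiting.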

Destination Configuration

Configure where synced data should be written.

Amazon S3

[destination]
type = "s3"
bucket = "my-data-lake"
region = "us-east-1"
prefix = "hubio-sync/"   # Optional folder prefix

# AWS credentials (or use IAM role)
access_key_id = "AKIAIOSFODNN7EXAMPLE"
secret_access_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

# File format
format = "parquet"       # Options: "parquet", "json", "csv", "avro"
compression = "snappy"   # Options: "none", "snappy", "gzip", "zstd"

# Partitioning
partition_by = ["year", "month", "day"]  # Time-based partitioning

Google Cloud Storage

[destination]
type = "gcs"
bucket = "my-data-lake"
project_id = "my-gcp-project"
prefix = "hubio-sync/"

# Credentials
credentials_file = "/path/to/service-account-key.json"

# File format
format = "parquet"
compression = "snappy"

Azure Blob Storage

[destination]
type = "azure_blob"
account_name = "mystorageaccount"
container = "data-lake"
prefix = "hubio-sync/"

# Authentication
account_key = "your-account-key"
# Or use SAS token
# sas_token = "your-sas-token"

# File format
format = "parquet"
compression = "snappy"

Snowflake

[destination]
type = "snowflake"
account = "xy12345.us-east-1"
warehouse = "COMPUTE_WH"
database = "ANALYTICS"
schema = "HUBIO_SYNC"
username = "sync_user"
password = "secure_password"

# Role (optional)
role = "ACCOUNTADMIN"    # Example only; prefer a least-privilege role in production

# Stage for data loading
stage = "@~/hubio_stage"

Local Filesystem

[destination]
type = "filesystem"
path = "/data/exports"

# File format
format = "parquet"
compression = "snappy"

# Directory structure
partition_by = ["table", "date"]

Sync Configuration

Control how data synchronization behaves.

Basic Sync Settings

[sync]
# Tables to sync
tables = ["users", "orders", "products"]
# Or sync all tables
# tables = ["*"]

# Sync mode
mode = "incremental"     # Options: "full", "incremental", "append"

# Incremental column (for incremental mode)
incremental_column = "updated_at"

# Batch size
batch_size = 10000       # rows per batch
max_parallel_batches = 4 # concurrent batches

Scheduling

[sync]
# Cron expression for scheduling
schedule = "0 2 * * *"   # Daily at 2 AM

# Examples:
# "*/15 * * * *"    - Every 15 minutes
# "0 */4 * * *"     - Every 4 hours
# "0 0 * * 0"       - Weekly on Sunday at midnight
# "0 9 * * 1-5"     - Weekdays at 9 AM

# Timezone
timezone = "America/New_York"
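To check when a schedule expression fires before deploying it, a small matcher can help. The Python sketch below handles only the five-field syntax used in the examples above (`*`, lists, ranges, and `/step`). It is a simplification and not the scheduler Hubio Sync uses: it does not support names such as MON, treats 7 as out of range, and requires both the day-of-month and day-of-week fields to match. The function names are hypothetical.

```python
from datetime import datetime


def _field_matches(field, value, low, high):
    """Check one cron field (supports '*', lists, ranges, and '/step')."""
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expr, when):
    """Return True if a five-field cron expression fires at the given minute."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    return (
        _field_matches(minute, when.minute, 0, 59)
        and _field_matches(hour, when.hour, 0, 23)
        and _field_matches(dom, when.day, 1, 31)
        and _field_matches(month, when.month, 1, 12)
        and _field_matches(dow, cron_dow, 0, 6)
    )


# Monday 15 January 2024, 09:00 matches the "weekdays at 9 AM" example.
fires = cron_matches("0 9 * * 1-5", datetime(2024, 1, 15, 9, 0))
```

The sketch takes a naive datetime, so convert to the zone named by timezone before calling it.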

Full Refresh vs Incremental

[sync]
# Choose exactly one mode. The alternatives are commented out because
# TOML does not allow a key to be defined twice in the same table.

# Full refresh: Replace all data on each sync
mode = "full"

# Incremental: Only sync new/updated records
# mode = "incremental"
# incremental_column = "updated_at"
# incremental_type = "timestamp"  # Options: "timestamp", "integer", "date"

# Append-only: Only sync new records (never update)
# mode = "append"
# incremental_column = "created_at"
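Incremental mode is commonly implemented with a watermark: each run reads only rows whose incremental column is greater than the largest value seen in the previous run. The Python sketch below illustrates that general technique against an in-memory SQLite table. It is an assumption about how such a sync typically works, not a description of Hubio Sync's internals, and the function name fetch_incremental and the sample data are hypothetical.

```python
import sqlite3


def fetch_incremental(conn, table, column, last_value):
    """Return rows whose incremental column is strictly greater than the watermark."""
    # Table and column names come from trusted config; the value is bound as a parameter.
    query = f"SELECT * FROM {table} WHERE {column} > ? ORDER BY {column}"
    return conn.execute(query, (last_value,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        (1, "2024-01-01T00:00:00"),
        (2, "2024-01-02T00:00:00"),
        (3, "2024-01-03T00:00:00"),
    ],
)

watermark = "2024-01-01T00:00:00"  # value stored after the previous run
rows = fetch_incremental(conn, "users", "updated_at", watermark)

# The next run would start from the largest value seen in this one.
new_watermark = rows[-1][1] if rows else watermark
```

Under this model, rows updated without touching the incremental column are missed, which is why a reliably maintained column such as updated_at matters.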

Table-Specific Configuration

Override global settings for specific tables.

[sync]
tables = ["users", "orders", "large_table"]

# Global incremental column
incremental_column = "updated_at"

# Table-specific overrides
[[sync.table_config]]
name = "large_table"
batch_size = 50000           # Larger batches for big tables
incremental_column = "last_modified"  # Different column

[[sync.table_config]]
name = "orders"
mode = "append"              # Append-only for orders
incremental_column = "created_at"

# Column selection (these keys belong to the "orders" entry above)
include_columns = ["id", "customer_id", "total", "created_at"]
# Or exclude specific columns
# exclude_columns = ["internal_notes", "debug_data"]

Transformations

Apply transformations during sync.

Column Transformations

[[transformations]]
table = "users"
transform = "anonymize"
columns = ["email", "phone", "ssn"]
method = "hash"              # Options: "hash", "mask", "null"

[[transformations]]
table = "users"
transform = "rename"
columns = { "user_id" = "id", "user_email" = "email" }

[[transformations]]
table = "orders"
transform = "cast"
columns = { "total" = "decimal", "quantity" = "integer" }
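The three anonymize methods differ in what they leave behind. The Python sketch below shows one way each could behave on a single value. The details are assumptions, since this guide does not specify them: SHA-256 for "hash", and keeping the last four characters for "mask". The function anonymize is hypothetical.

```python
import hashlib


def anonymize(value, method):
    """Apply one of the three documented methods to a single value."""
    if value is None or method == "null":
        return None
    if method == "hash":
        # SHA-256 is an assumption; the guide does not name the hash function.
        return hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    if method == "mask":
        # Keep the last four characters and replace the rest with '*'.
        text = str(value)
        return "*" * max(len(text) - 4, 0) + text[-4:]
    raise ValueError(f"unknown method: {method}")


masked = anonymize("555-12-3456", "mask")
```

Hashing keeps values joinable across tables because equal inputs give equal outputs, while "null" removes the information entirely.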

Row Filtering

[[transformations]]
table = "orders"
transform = "filter"
condition = "created_at >= NOW() - INTERVAL 90 DAY"

[[transformations]]
table = "users"
transform = "filter"
condition = "status = 'active' AND deleted_at IS NULL"

Custom SQL Transformations

[[transformations]]
table = "orders"
transform = "sql"
query = """
  SELECT
    id,
    customer_id,
    total,
    CASE
      WHEN total > 1000 THEN 'high_value'
      WHEN total > 100 THEN 'medium_value'
      ELSE 'low_value'
    END as value_segment,
    created_at
  FROM orders
  WHERE status = 'completed'
"""

Monitoring & Logging

Configure observability and debugging.

Logging

[logging]
level = "info"               # Options: "debug", "info", "warn", "error"
format = "json"              # Options: "json", "text"
output = "stdout"            # Options: "stdout", "file"

# File output settings (relevant when output = "file")
log_file = "/var/log/hubio-sync/sync.log"
max_file_size = "100MB"
max_backups = 10
compress = true

Metrics

[metrics]
enabled = true
port = 9090
path = "/metrics"

# Prometheus exposition format
format = "prometheus"

# Metrics to track
track_row_counts = true
track_sync_duration = true
track_error_rates = true

Alerts

[alerts]
enabled = true

# Microsoft Teams webhook
teams_webhook_url = "https://your-org.webhook.office.com/webhookb2/YOUR/WEBHOOK/URL"

# Alert conditions
alert_on_failure = true
alert_on_slow_sync = true
slow_sync_threshold = 3600   # seconds (1 hour)

# Email alerts
smtp_host = "smtp.gmail.com"
smtp_port = 587
smtp_username = "alerts@example.com"
smtp_password = "app-password"
email_to = ["team@example.com"]

Performance Tuning

Optimize sync performance for your workload.

[performance]
# Connection pooling
max_connections = 20
min_connections = 5

# Memory limits
max_memory = "2GB"
buffer_size = "100MB"

# Parallelization
max_parallel_tables = 4      # Sync multiple tables concurrently
max_parallel_batches = 8     # Batches per table

# Compression
enable_compression = true
compression_level = 6        # 1-9, higher = better compression, slower

# Caching
enable_metadata_cache = true
cache_ttl = 3600             # seconds

Security

Secure your configuration and credentials.

Credential Management

# Use environment variables for sensitive data
# instead of hardcoding passwords in the config file.

[source]
type = "mysql"
host = "mysql.example.com"
username = "readonly"
# Reference environment variable
password = "${MYSQL_PASSWORD}"

[destination]
type = "s3"
bucket = "data-lake"
access_key_id = "${AWS_ACCESS_KEY_ID}"
secret_access_key = "${AWS_SECRET_ACCESS_KEY}"
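The `${NAME}` placeholders above are resolved from the process environment. The Python sketch below shows how such substitution can work and why an unset variable should fail loudly rather than silently become an empty password. It illustrates the general idea only; whether Hubio Sync reports unset variables in this way is an assumption, and expand_env is a hypothetical helper.

```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Replace ${NAME} placeholders, raising KeyError when a variable is unset."""
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)


# Passing a dict instead of os.environ keeps the example self-contained.
password = expand_env("${MYSQL_PASSWORD}", {"MYSQL_PASSWORD": "s3cret"})
```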

Encryption at Rest

[security]
# Encrypt local cache and temporary files
encrypt_cache = true
encryption_key_file = "/secure/path/to/key.pem"

Network Security

[security]
# Require TLS for all connections
require_tls = true
tls_min_version = "1.2"

# Verify certificates
verify_certificates = true
ca_bundle = "/path/to/ca-bundle.crt"

Environment-Specific Configuration

Manage separate configurations for development, staging, and production.

Using Environment Variables

# Set environment
export HUBIO_ENV=production
export HUBIO_CONFIG=/etc/hubio-sync/production.toml

# Run sync
hubio-sync run

Configuration Includes

# production.toml
include = ["base.toml", "secrets.toml"]

[sync]
# Production-specific overrides
batch_size = 50000
max_parallel_batches = 16

Validation

Validate your configuration before running a sync.

# Validate configuration syntax
hubio-sync validate

# Test source connection
hubio-sync validate --test-source

# Test destination connection
hubio-sync validate --test-destination

# Full validation (syntax + connections)
hubio-sync validate --full

Example Configurations

Complete Production Example

# Production configuration for MySQL → S3 sync

[source]
type = "mysql"
host = "${MYSQL_HOST}"
port = 3306
database = "production"
username = "${MYSQL_USER}"
password = "${MYSQL_PASSWORD}"
max_connections = 20
ssl_mode = "required"

[destination]
type = "s3"
bucket = "company-data-lake"
region = "us-east-1"
prefix = "mysql-exports/"
access_key_id = "${AWS_ACCESS_KEY_ID}"
secret_access_key = "${AWS_SECRET_ACCESS_KEY}"
format = "parquet"
compression = "snappy"
partition_by = ["year", "month", "day"]

[sync]
tables = ["users", "orders", "products", "transactions"]
mode = "incremental"
incremental_column = "updated_at"
schedule = "0 2 * * *"  # Daily at 2 AM
timezone = "UTC"
batch_size = 10000

[performance]
max_parallel_tables = 4

[[transformations]]
table = "users"
transform = "anonymize"
columns = ["email", "phone", "ssn"]

[[transformations]]
table = "orders"
transform = "filter"
condition = "status = 'completed'"

[logging]
level = "info"
format = "json"
output = "file"
log_file = "/var/log/hubio-sync/sync.log"

[metrics]
enabled = true
port = 9090

[alerts]
enabled = true
teams_webhook_url = "${TEAMS_WEBHOOK_URL}"
alert_on_failure = true

Configuration Reference

For a complete list of all configuration options, run:

hubio-sync config --help

Or see the Configuration Schema Documentation.


Troubleshooting

Common Configuration Errors

“Invalid TOML syntax”

  • Check for missing quotes around strings
  • Escape backslashes in Windows paths: write \\ inside double-quoted strings, or use single-quoted literal strings, which take backslashes as-is
  • Validate with: hubio-sync validate

“Connection refused”

  • Verify host, port, and credentials
  • Check firewall rules
  • Test with: hubio-sync validate --test-source

“Permission denied”

  • Ensure database user has SELECT permissions
  • Check destination write permissions
  • Verify IAM roles (for cloud destinations)

For additional help, see the Troubleshooting Guide or contact support@hubio.com.