# MFT Configuration

Configure managed file transfers between storage systems.

Defines a managed file transfer (MFT) operation that moves files between storage systems, with optional processing and scheduling.

## Schema Properties

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `Source` | `MftConfigurationSpecStorage` | Yes | - | Source storage configuration |
| `Destination` | `MftConfigurationSpecStorage` | Yes | - | Destination storage configuration |
| `Command` | `string` | No | - | Python command to execute during transfer (for processing files) |
| `Schedule` | `string` | No | - | Cron expression for scheduled transfers |
| `Requirements` | `Dictionary<string, string>` | No | `{}` | Python package requirements for command execution |
| `Parameters` | `Dictionary<string, string>` | No | `{}` | Additional transfer parameters |
| `AdditionalPackages` | `string` | No | - | Additional Python packages to install |
| `Args` | `string` | No | - | Additional command arguments |
| `Filters` | `string` | No | - | File filtering patterns (glob patterns) |
| `Project` | `string` | No | `""` | Project name for organizing transfers |

### Storage Properties

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `Name` | `string` | Yes | - | Name of the storage system |
| `Path` | `string` | Yes | - | Path within the storage system |

## YAML Examples

### Basic File Transfer

```yaml
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: daily-backup-transfer
spec:
  Source:
    Name: local-storage
    Path: /data/exports
  Destination:
    Name: azure-backup
    Path: /backups/daily
  Schedule: "0 2 * * *"
```

### Transfer with File Filtering

```yaml
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: csv-data-sync
spec:
  Source:
    Name: sftp-server
    Path: /incoming/data
  Destination:
    Name: data-lake
    Path: /raw/customer-data
  Filters: "*.csv"
  Schedule: "*/30 * * * *"
```

### Transfer with Processing

```yaml
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: process-and-transfer
spec:
  Source:
    Name: source-storage
    Path: /raw/files
  Destination:
    Name: processed-storage
    Path: /processed/files
  Command: |
    import pandas as pd

    def process_file(file_path):
        # Read CSV file
        df = pd.read_csv(file_path)

        # Clean and transform data
        df = df.dropna()
        df['processed_date'] = pd.Timestamp.now()

        # Save processed file
        output_path = file_path.replace('.csv', '_processed.csv')
        df.to_csv(output_path, index=False)

        return output_path
  Requirements:
    pandas: ">=2.0.0"
  Schedule: "0 */6 * * *"
```

### Complex Transfer with Parameters

```yaml
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: encrypted-backup-transfer
spec:
  Source:
    Name: production-database-exports
    Path: /exports/encrypted
  Destination:
    Name: offsite-backup
    Path: /backups/${date}
  Command: decrypt_and_compress.py
  Args: "--compression-level 9 --verify-integrity"
  Requirements:
    cryptography: ">=41.0.0"
    python-dateutil: ">=2.8.0"
  Parameters:
    EncryptionKey: ${backup-encryption-key}
    RetryCount: "3"
    Timeout: "3600"
  Filters: "*.enc"
  Schedule: "0 1 * * *"
  Project: production-backups
```

### Multiple MFT Configurations

```yaml
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: orders-sync
spec:
  Source:
    Name: erp-system
    Path: /exports/orders
  Destination:
    Name: data-warehouse
    Path: /staging/orders
  Filters: "orders_*.json"
  Schedule: "0 * * * *"
  Project: data-integration
---
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: customer-sync
spec:
  Source:
    Name: crm-system
    Path: /exports/customers
  Destination:
    Name: data-warehouse
    Path: /staging/customers
  Filters: "customers_*.json"
  Schedule: "30 * * * *"
  Project: data-integration
---
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: products-sync
spec:
  Source:
    Name: pim-system
    Path: /exports/products
  Destination:
    Name: data-warehouse
    Path: /staging/products
  Filters: "products_*.json"
  Schedule: "15 * * * *"
  Project: data-integration
```

### Transfer with Complex Processing

```yaml
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: image-optimization-transfer
spec:
  Source:
    Name: uploads-storage
    Path: /raw/images
  Destination:
    Name: cdn-storage
    Path: /optimized/images
  Command: |
    from PIL import Image
    import os

    def optimize_image(file_path):
        """Optimize and resize images"""
        img = Image.open(file_path)

        # JPEG cannot store an alpha channel, so normalize to RGB first
        if img.mode != 'RGB':
            img = img.convert('RGB')

        # Resize if larger than max dimensions
        max_size = (1920, 1080)
        img.thumbnail(max_size, Image.Resampling.LANCZOS)

        # Save optimized image (handles .jpg, .jpeg, and .png inputs)
        base, _ = os.path.splitext(file_path)
        output_path = base + '_optimized.jpg'
        img.save(output_path, 'JPEG', quality=85, optimize=True)

        return output_path
  Requirements:
    Pillow: ">=10.0.0"
  Filters: "*.jpg,*.jpeg,*.png"
  Schedule: "*/15 * * * *"
  Parameters:
    MaxFileSize: "5242880"
    DeleteSource: "false"
```

## Usage Notes

MFT configurations automate file transfers between storage systems with optional processing, filtering, and scheduling.

### Storage Configuration

The `Source` and `Destination` objects define storage endpoints:

- `Name` references a configured storage system (it must already exist)
- `Path` specifies the location within that storage

Storage systems must be configured separately before creating MFT configurations.

### File Processing

The `Command` field can contain:

- Inline Python code for simple transformations
- The path to a Python script file
- The name of an installed Python package/module

Processing happens during transfer:

1. The file is retrieved from the source
2. The command processes the file
3. The processed file is transferred to the destination
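The three steps can be sketched as a local pipeline. This is an illustrative model only: `fetch_from_source`, `run_command`, and `push_to_destination` are hypothetical names, and the real engine works against configured storage systems rather than local paths.

```python
import shutil
from pathlib import Path


def fetch_from_source(source_path: str, workdir: Path) -> Path:
    """Step 1: copy the file from source storage into a local working directory."""
    local_copy = workdir / Path(source_path).name
    shutil.copy(source_path, local_copy)
    return local_copy


def run_command(process_file, local_path: Path) -> Path:
    """Step 2: apply the configured Command to the local copy."""
    return Path(process_file(str(local_path)))


def push_to_destination(processed: Path, destination_dir: Path) -> Path:
    """Step 3: move the processed file into destination storage."""
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / processed.name
    shutil.move(str(processed), target)
    return target
```

A `Command` like the pandas example above would plug in as the `process_file` callable, returning the path of the file it produced.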

### Scheduling

The `Schedule` field uses cron expressions to automate transfers:

- Run transfers at specific times
- Coordinate multiple related transfers
- Balance load across different schedules

Omit `Schedule` for manual, on-demand transfers.
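To reason about when a schedule fires, here is a minimal cron matcher covering only the subset used in the examples above (`*`, `*/n`, plain numbers, and comma lists). A real scheduler supports the full cron grammar, including ranges and month/weekday names.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field against a value ('*', '*/n', 'n', or a comma list)."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/") and value % int(part[2:]) == 0:
            return True
        if part.isdigit() and int(part) == value:
            return True
    return False


def cron_fires(expr: str, at: datetime) -> bool:
    """True if a five-field cron expression (minute hour day month weekday) fires at 'at'."""
    minute, hour, day, month, weekday = expr.split()
    return (field_matches(minute, at.minute)
            and field_matches(hour, at.hour)
            and field_matches(day, at.day)
            and field_matches(month, at.month)
            and field_matches(weekday, at.isoweekday() % 7))  # cron: 0 = Sunday
```

For instance, `"0 2 * * *"` (the daily-backup example) fires at 02:00 every day, and `"*/30 * * * *"` fires every 30 minutes.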

### File Filtering

The `Filters` field accepts:

- A single pattern: `*.csv`
- Multiple comma-separated patterns: `*.csv,*.json,*.xml`
- Glob patterns: `data_*.csv`, `202?-??-??.json`

Only files matching the filter patterns are transferred.
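A sketch of how a comma-separated `Filters` value can be evaluated, assuming standard glob semantics (here via Python's `fnmatch`); the platform's actual matching engine may differ in details such as case sensitivity.

```python
from fnmatch import fnmatch


def apply_filters(filenames, filters: str):
    """Keep only the files matching at least one comma-separated glob pattern."""
    patterns = [p.strip() for p in filters.split(",")]
    return [name for name in filenames
            if any(fnmatch(name, pattern) for pattern in patterns)]
```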

### Parameters

Common parameters:

- `RetryCount` - Number of retry attempts on failure
- `Timeout` - Transfer timeout in seconds
- `DeleteSource` - Whether to delete source files after a successful transfer
- `OverwriteExisting` - Whether to overwrite existing destination files
- `PreserveTimestamps` - Whether to preserve file timestamps
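Because `Parameters` is a string-to-string map, consumers must coerce the values into typed settings. A hypothetical reader (the default values here are illustrative assumptions, not documented platform behavior):

```python
def read_parameters(params: dict) -> dict:
    """Coerce the string-valued Parameters map into typed settings with defaults."""
    return {
        "retry_count": int(params.get("RetryCount", "0")),
        "timeout": int(params.get("Timeout", "3600")),
        "delete_source": params.get("DeleteSource", "false").lower() == "true",
        "overwrite_existing": params.get("OverwriteExisting", "true").lower() == "true",
        "preserve_timestamps": params.get("PreserveTimestamps", "true").lower() == "true",
    }
```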

### Project Organization

Use the `Project` field to group related MFT configurations:

- Enables batch operations on related transfers
- Improves organization and filtering
- Useful for managing complex transfer workflows
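Once configurations are represented as data, grouping by project for batch operations is a one-liner away; a hypothetical helper:

```python
from collections import defaultdict


def group_by_project(configs):
    """Group MFT configuration names by their Project field."""
    groups = defaultdict(list)
    for config in configs:
        groups[config.get("Project", "")].append(config["name"])
    return dict(groups)
```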

### Best Practices

- Test transfers manually before scheduling them
- Use descriptive names that indicate source, destination, and purpose
- Implement error handling in processing commands
- Choose schedules that avoid conflicts between related transfers
- Monitor transfer success rates and failures
- Use filters to transfer only the files you need
- Consider storage costs when designing transfer patterns
- Log processing results for troubleshooting
- Set appropriate timeouts for large file transfers
- Clean up old files to manage storage usage
- Use variables for sensitive information (credentials, keys)
- Document processing logic for maintainability

### Error Handling

Transfers may fail due to:

- Network connectivity issues
- Authentication failures
- Insufficient permissions
- Storage capacity limits
- Processing errors

Include error handling in processing commands, and configure appropriate retry behavior via `Parameters`.
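Retry behavior of the kind `RetryCount` configures might look like this sketch (the exponential backoff policy and the set of retried exception types are assumptions, not documented behavior):

```python
import time


def transfer_with_retries(transfer, retry_count: int = 3, base_delay: float = 1.0):
    """Attempt a transfer, retrying failed attempts with exponential backoff."""
    for attempt in range(retry_count + 1):
        try:
            return transfer()
        except (ConnectionError, PermissionError, OSError):
            if attempt == retry_count:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
```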

### Monitoring

Monitor MFT operations with the CLI:

```shell
weikio integration mft ls
weikio integration mft status <mft-id>
weikio integration mft history <mft-id>
```