# MFT Configuration
Configure managed file transfers between storage systems
Defines a managed file transfer (MFT) operation that moves files between storage systems with optional processing and scheduling.
## Schema Properties
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| Source | MftConfigurationSpecStorage | Yes | - | Source storage configuration |
| Destination | MftConfigurationSpecStorage | Yes | - | Destination storage configuration |
| Command | string | No | - | Python command to execute during transfer (for processing files) |
| Schedule | string | No | - | Cron expression for scheduled transfers |
| Requirements | Dictionary<string, string> | No | {} | Python package requirements for command execution |
| Parameters | Dictionary<string, string> | No | {} | Additional transfer parameters |
| AdditionalPackages | string | No | - | Additional Python packages to install |
| Args | string | No | - | Additional command arguments |
| Filters | string | No | - | File filtering patterns (glob patterns) |
| Project | string | No | "" | Project name for organizing transfers |
## Storage Properties
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| Name | string | Yes | - | Name of the storage system |
| Path | string | Yes | - | Path within the storage system |
## YAML Examples

### Basic File Transfer

```yaml
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: daily-backup-transfer
spec:
  Source:
    Name: local-storage
    Path: /data/exports
  Destination:
    Name: azure-backup
    Path: /backups/daily
  Schedule: "0 2 * * *"
```
### Transfer with File Filtering

```yaml
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: csv-data-sync
spec:
  Source:
    Name: sftp-server
    Path: /incoming/data
  Destination:
    Name: data-lake
    Path: /raw/customer-data
  Filters: "*.csv"
  Schedule: "*/30 * * * *"
```
### Transfer with Processing

```yaml
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: process-and-transfer
spec:
  Source:
    Name: source-storage
    Path: /raw/files
  Destination:
    Name: processed-storage
    Path: /processed/files
  Command: |
    import pandas as pd

    def process_file(file_path):
        # Read CSV file
        df = pd.read_csv(file_path)
        # Clean and transform data
        df = df.dropna()
        df['processed_date'] = pd.Timestamp.now()
        # Save processed file
        output_path = file_path.replace('.csv', '_processed.csv')
        df.to_csv(output_path, index=False)
        return output_path
  Requirements:
    pandas: ">=2.0.0"
  Schedule: "0 */6 * * *"
```
### Complex Transfer with Parameters

```yaml
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: encrypted-backup-transfer
spec:
  Source:
    Name: production-database-exports
    Path: /exports/encrypted
  Destination:
    Name: offsite-backup
    Path: /backups/${date}
  Command: decrypt_and_compress.py
  Args: "--compression-level 9 --verify-integrity"
  Requirements:
    cryptography: ">=41.0.0"
    python-dateutil: ">=2.8.0"
  Parameters:
    EncryptionKey: ${backup-encryption-key}
    RetryCount: "3"
    Timeout: "3600"
  Filters: "*.enc"
  Schedule: "0 1 * * *"
  Project: production-backups
```
### Multiple MFT Configurations

```yaml
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: orders-sync
spec:
  Source:
    Name: erp-system
    Path: /exports/orders
  Destination:
    Name: data-warehouse
    Path: /staging/orders
  Filters: "orders_*.json"
  Schedule: "0 * * * *"
  Project: data-integration
---
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: customer-sync
spec:
  Source:
    Name: crm-system
    Path: /exports/customers
  Destination:
    Name: data-warehouse
    Path: /staging/customers
  Filters: "customers_*.json"
  Schedule: "30 * * * *"
  Project: data-integration
---
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: products-sync
spec:
  Source:
    Name: pim-system
    Path: /exports/products
  Destination:
    Name: data-warehouse
    Path: /staging/products
  Filters: "products_*.json"
  Schedule: "15 * * * *"
  Project: data-integration
```
### Transfer with Complex Processing

```yaml
apiVersion: weik.io/v1
kind: MFT
metadata:
  name: image-optimization-transfer
spec:
  Source:
    Name: uploads-storage
    Path: /raw/images
  Destination:
    Name: cdn-storage
    Path: /optimized/images
  Command: |
    from PIL import Image

    def optimize_image(file_path):
        """Optimize and resize images"""
        img = Image.open(file_path)
        # Resize if larger than max dimensions
        max_size = (1920, 1080)
        img.thumbnail(max_size, Image.Resampling.LANCZOS)
        # Save optimized image
        output_path = file_path.replace('.jpg', '_optimized.jpg')
        img.save(output_path, 'JPEG', quality=85, optimize=True)
        return output_path
  Requirements:
    Pillow: ">=10.0.0"
  Filters: "*.jpg,*.jpeg,*.png"
  Schedule: "*/15 * * * *"
  Parameters:
    MaxFileSize: "5242880"
    DeleteSource: "false"
```
## Usage Notes
MFT configurations automate file transfers between storage systems with optional processing, filtering, and scheduling.
### Storage Configuration

The `Source` and `Destination` objects define storage endpoints:

- `Name` references a configured storage system (must exist)
- `Path` specifies the location within that storage
Storage systems must be configured separately before creating MFT configurations.
### File Processing

The `Command` field can contain:
- Inline Python code for simple transformations
- Path to a Python script file
- Name of an installed Python package/module
Processing happens during transfer:
- File is retrieved from source
- Command processes the file
- Processed file is transferred to destination
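As a sketch of this retrieve-process-transfer contract, an inline command can define a processing function in the style of the pandas examples on this page, here using only the standard library. The name `process_file` and the convention of returning the output path follow the examples above; the exact hook signature the platform expects is an assumption.

```python
def process_file(file_path):
    """Hypothetical inline Command step: drop blank lines from a
    text file and write the result next to the source file."""
    with open(file_path) as f:
        lines = [line for line in f if line.strip()]
    # Write the processed copy alongside the original, as in the
    # CSV and image examples above
    output_path = file_path.replace(".txt", "_processed.txt")
    with open(output_path, "w") as f:
        f.writelines(lines)
    return output_path
```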
### Scheduling

The `Schedule` field uses cron expressions to automate transfers:
- Run transfers at specific times
- Coordinate multiple related transfers
- Balance load across different schedules
Omit Schedule for manual/on-demand transfers.
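Cron expressions are easy to misread, so it can help to break a `Schedule` value into its five standard fields before deploying. This helper is hypothetical, not part of the platform:

```python
# Standard five-field cron order: minute, hour, day-of-month,
# month, day-of-week
FIELDS = ["minute", "hour", "day_of_month", "month", "day_of_week"]

def explain_cron(expr):
    """Map each field of a cron expression to its name, rejecting
    expressions with the wrong number of fields."""
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))

schedule = explain_cron("0 2 * * *")  # the daily-backup example: 02:00 every day
```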
### File Filtering

The `Filters` field accepts:

- Single pattern: `*.csv`
- Multiple patterns: `*.csv,*.json,*.xml`
- Glob patterns: `data_*.csv,202?-??-??.json`
Only files matching the filter patterns are transferred.
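The comma-separated glob patterns map naturally onto Python's `fnmatch` module. This sketch assumes `Filters` is evaluated as "match any pattern", which mirrors the examples above:

```python
import fnmatch

def matches_filters(filename, filters):
    """Return True if the filename matches any of the comma-separated
    glob patterns (assumed any-match semantics for Filters)."""
    patterns = [p.strip() for p in filters.split(",")]
    return any(fnmatch.fnmatch(filename, p) for p in patterns)

files = ["orders.csv", "orders.json", "readme.md"]
selected = [f for f in files if matches_filters(f, "*.csv,*.json")]
```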
### Parameters

Common parameters:

- `RetryCount` - Number of retry attempts on failure
- `Timeout` - Transfer timeout in seconds
- `DeleteSource` - Whether to delete source files after successful transfer
- `OverwriteExisting` - Whether to overwrite existing destination files
- `PreserveTimestamps` - Whether to preserve file timestamps
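Because `Parameters` is a string-to-string dictionary, processing code has to convert values to usable types itself. A hypothetical helper (the key names follow the list above; the defaults are illustrative assumptions):

```python
def parse_parameters(params):
    """Convert common Parameters entries from strings to usable types.
    Defaults here are illustrative, not documented platform behavior."""
    return {
        "retry_count": int(params.get("RetryCount", "0")),
        "timeout": int(params.get("Timeout", "3600")),
        "delete_source": params.get("DeleteSource", "false").lower() == "true",
    }
```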
### Project Organization

Use the `Project` field to group related MFT configurations:
- Enables batch operations on related transfers
- Improves organization and filtering
- Useful for managing complex transfer workflows
### Best Practices
- Test transfers manually before scheduling
- Use descriptive names indicating source, destination, and purpose
- Implement error handling in processing commands
- Choose appropriate schedules to avoid conflicts
- Monitor transfer success rates and failures
- Use filters to transfer only necessary files
- Consider storage costs when designing transfer patterns
- Log processing results for troubleshooting
- Set appropriate timeouts for large file transfers
- Clean up old files to manage storage usage
- Use variables for sensitive information (credentials, keys)
- Document processing logic for maintenance
### Error Handling
Transfers may fail due to:
- Network connectivity issues
- Authentication failures
- Insufficient permissions
- Storage capacity limits
- Processing errors
Include error handling in processing commands and configure appropriate retry behavior in parameters.
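The retry behavior configured through `RetryCount` can be sketched as a wrapper with exponential backoff; `transfer_with_retry` and its signature are illustrative, not platform API:

```python
import time

def transfer_with_retry(transfer, retry_count=3, base_delay=1.0):
    """Call `transfer` (any callable that raises on failure, e.g. on a
    network or auth error) up to retry_count + 1 times, sleeping with
    exponential backoff between attempts."""
    last_error = None
    for attempt in range(retry_count + 1):
        try:
            return transfer()
        except Exception as exc:
            last_error = exc
            if attempt < retry_count:
                time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise last_error
```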
### Monitoring

Monitor MFT operations with the CLI:

```shell
weikio integration mft ls
weikio integration mft status <mft-id>
weikio integration mft history <mft-id>
```