Best Practices
Follow these security and performance recommendations to run FreeState securely and reliably in production.
Security Best Practices
API Key Management
Key Generation and Storage
- Use descriptive names: "prod-terraform-ci", "dev-team-access"
- Store securely: Use secret management systems (AWS Secrets Manager, HashiCorp Vault)
- Never commit keys: Keep keys out of version control; add any local files that contain them to .gitignore and pass keys through environment variables
- Limit scope: Use workspace-scoped keys when possible
# Good: Environment variables
export TF_HTTP_USERNAME="workspace-id"
export TF_HTTP_PASSWORD="$FREESTATE_API_KEY"
# Bad: Hardcoded in files
terraform {
  backend "http" {
    username = "workspace-123"
    password = "fs_key_abc123..." # Never do this!
  }
}
Key Rotation
- Regular rotation: Rotate keys every 90 days
- Emergency rotation: Immediate rotation if compromised
- Overlap period: Brief overlap when updating systems
- Audit old keys: Remove unused keys promptly
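The 90-day rule above can be checked automatically. The following is a minimal sketch, assuming you keep your own inventory of key names and creation dates; the inventory format and the function name are hypothetical and not part of FreeState.

```python
from datetime import date, timedelta

ROTATION_PERIOD = timedelta(days=90)  # rotation interval from the list above


def keys_due_for_rotation(keys, today):
    """Return the names of keys created at least ROTATION_PERIOD ago.

    `keys` maps a key name to its creation date. This inventory format is
    hypothetical; adapt it to however you track key metadata.
    """
    return sorted(
        name for name, created in keys.items()
        if today - created >= ROTATION_PERIOD
    )


inventory = {
    "prod-terraform-ci": date(2024, 1, 10),
    "dev-team-access": date(2024, 3, 20),
}
print(keys_due_for_rotation(inventory, today=date(2024, 4, 15)))
# prints ['prod-terraform-ci']
```

Running a check like this on a schedule gives you a list to act on before keys age past the rotation window.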
Access Control
Principle of Least Privilege
| Role | Permissions | Use Case |
|---|---|---|
| Read-Only | State viewing, workspace metadata | Monitoring, audit, read-only CI |
| Contributor | State read/write, lock management | Developers, CI/CD pipelines |
| Admin | Full workspace management | DevOps engineers, team leads |
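One way to reason about the table is as a mapping from role to permitted actions, with everything else denied by default. The sketch below is illustrative only: the action names are hypothetical labels for the permissions in the table, not FreeState identifiers.

```python
# Hypothetical action names derived from the table above.
ROLE_PERMISSIONS = {
    "read-only": {"state:read", "workspace:read"},
    "contributor": {"state:read", "state:write", "lock:manage"},
    "admin": {
        "state:read", "state:write", "lock:manage",
        "workspace:read", "workspace:manage",
    },
}


def role_allows(role, action):
    """Deny by default: unknown roles and unlisted actions are rejected."""
    return action in ROLE_PERMISSIONS.get(role, set())


print(role_allows("read-only", "state:write"))    # prints False
print(role_allows("contributor", "state:write"))  # prints True
```

When assigning a role, pick the smallest set that covers what the user or pipeline actually does.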
Multi-Factor Authentication
- Enable MFA: Required for all team members
- Backup codes: Store securely for account recovery
- Regular audits: Review MFA status monthly
Network Security
IP Whitelisting
# Configure IP restrictions for production workspaces
allowed_ips = [
  "203.0.113.0/24", # Office network
  "192.0.2.1/32", # CI/CD server
  "198.51.100.0/24" # VPN range
]
VPN and Private Networks
- Use VPNs: Route traffic through secure connections
- Private endpoints: Consider private connectivity options
- Monitor access: Log and alert on unusual access patterns
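To test an IP restriction list before applying it, or to flag log entries from unexpected sources, you can evaluate addresses against the same CIDR ranges locally. This is a minimal sketch using Python's standard `ipaddress` module; the ranges are the example values from this section, and the function name is hypothetical.

```python
import ipaddress

# Example ranges from the IP restriction list in this section.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # Office network
    ipaddress.ip_network("192.0.2.1/32"),     # CI/CD server
    ipaddress.ip_network("198.51.100.0/24"),  # VPN range
]


def ip_is_allowed(client_ip):
    """Return True if client_ip falls inside any allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


print(ip_is_allowed("203.0.113.42"))  # prints True
print(ip_is_allowed("192.0.2.2"))     # prints False
```

Note that a /32 entry matches exactly one address, so a neighbouring CI/CD runner would be rejected.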
Performance Best Practices
State File Optimization
State Size Management
- Modular architecture: Break large configurations into modules
- Separate state files: Use different workspaces for logical boundaries
- Resource limits: Monitor state file size and resource count
# Good: Modular approach
# Infrastructure layer
terraform workspace select infra-prod
terraform apply
# Application layer
terraform workspace select app-prod
terraform apply
# Bad: Everything in one state
terraform apply # 500+ resources in one state file
State Hygiene
- Remove unused resources: Clean up regularly
- Import existing resources: Don't recreate what exists
- Use data sources: Reference external resources
# Remove unused resources
terraform state rm 'aws_instance.old_server'
# Import existing resources instead of recreating
terraform import aws_instance.existing i-1234567890abcdef0
# Use data sources for external references
data "aws_vpc" "existing" {
  id = "vpc-12345678"
}
Lock Management
Minimize Lock Duration
- Small changes: Make incremental updates
- Pre-validation: Use terraform plan to catch issues early
- Automated releases: Configure auto-unlock timeouts
Coordinate Team Access
- Communication: Announce large changes in advance
- Scheduled windows: Use maintenance windows for major updates
- Monitoring: Set up alerts for long-running locks
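An alert for long-running locks comes down to comparing each lock's age against a threshold. The sketch below assumes you can obtain a mapping of workspace name to lock acquisition time; that data shape is hypothetical. The 30-minute default mirrors the warning threshold for long-running operations used elsewhere in this guide.

```python
from datetime import datetime, timedelta, timezone


def stale_locks(locks, now, max_age=timedelta(minutes=30)):
    """Return workspaces whose lock has been held longer than max_age.

    `locks` maps a workspace name to the time its lock was acquired.
    This data shape is hypothetical; populate it from whatever lock
    metadata your tooling exposes.
    """
    return [
        workspace
        for workspace, acquired_at in sorted(locks.items())
        if now - acquired_at > max_age
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
locks = {
    "app-prod": now - timedelta(minutes=45),
    "infra-prod": now - timedelta(minutes=5),
}
print(stale_locks(locks, now))  # prints ['app-prod']
```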
Spot Instance Optimization
Capacity Strategy
- Mixed capacity: Use a mix such as 70% Spot + 30% On-Demand to balance cost and availability
- Multi-AZ deployment: Distribute tasks across availability zones
- Right-sizing: Choose instance types with good Spot availability
- Base capacity: Ensure at least 1 On-Demand task for service stability
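To see what these ratios mean for a given task count, the arithmetic can be worked through directly. In an ECS capacity provider strategy, `base` tasks are placed on their provider first and the remaining tasks are divided according to `weight`. The sketch below approximates that split with integer division; ECS's exact rounding is not reproduced here, so treat the results as estimates. The function name is hypothetical.

```python
def split_tasks(desired, base_on_demand=1, spot_weight=70, on_demand_weight=30):
    """Estimate how many tasks land on each capacity provider.

    Base tasks go to On-Demand first; the remainder is split by weight.
    Integer division is an approximation of how ECS rounds the split.
    """
    base = min(desired, base_on_demand)
    remaining = desired - base
    spot = remaining * spot_weight // (spot_weight + on_demand_weight)
    return {"FARGATE_SPOT": spot, "FARGATE": base + (remaining - spot)}


print(split_tasks(11))  # prints {'FARGATE_SPOT': 7, 'FARGATE': 4}
print(split_tasks(1))   # prints {'FARGATE_SPOT': 0, 'FARGATE': 1}
```

With a single task, the base guarantee means the service runs entirely on On-Demand, which is the stability property the last bullet describes.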
# Optimal ECS capacity provider strategy
{
  "capacityProviderStrategy": [
    {
      "capacityProvider": "FARGATE_SPOT",
      "weight": 70,
      "base": 0
    },
    {
      "capacityProvider": "FARGATE",
      "weight": 30,
      "base": 1
    }
  ]
}
Application Resilience
- Stateless design: Store state externally (database, S3, cache)
- Graceful shutdown: Handle SIGTERM within 120 seconds
- Health checks: Implement comprehensive readiness/liveness probes
- Circuit breakers: Add fallback mechanisms for service dependencies
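Graceful shutdown usually means catching SIGTERM, finishing the item in progress, and exiting before the deadline. The following is a minimal sketch of that pattern for a worker loop; `run_worker` and `process_next_item` are hypothetical names, and a real service would also need to bound how long a single item can take.

```python
import signal
import threading

shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request; the worker loop exits after the current item."""
    shutdown_requested.set()


def run_worker(process_next_item):
    """Process items until shutdown is requested or no work remains.

    `process_next_item` is a hypothetical callable that handles one unit
    of work and returns False when the queue is empty.
    """
    processed = 0
    while not shutdown_requested.is_set():
        if not process_next_item():
            break
        processed += 1
    return processed


if __name__ == "__main__":
    # Register the handler once at startup, from the main thread.
    signal.signal(signal.SIGTERM, handle_sigterm)
```

Because the handler only sets a flag, the item being processed when the signal arrives is allowed to complete rather than being cut off mid-operation.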
Monitoring and Alerting
- Spot interruption tracking: Monitor interruption rates and patterns
- Placement failure alerts: Get notified when Spot capacity is unavailable
- Service availability metrics: Track healthy task percentage
- Cost optimization reports: Measure savings vs. operational overhead
Learn More: See our comprehensive Spot Instance Support guide for detailed configuration examples and troubleshooting.
CI/CD Best Practices
Pipeline Design
Environment Promotion
# GitLab CI example with proper promotion
stages:
  - validate
  - plan-dev
  - apply-dev
  - plan-staging
  - apply-staging
  - plan-prod
  - apply-prod
dev-plan:
  stage: plan-dev
  script:
    - terraform workspace select dev
    - terraform plan
  only:
    - develop
dev-apply:
  stage: apply-dev
  script:
    - terraform workspace select dev
    - terraform apply -auto-approve
  only:
    - develop
prod-plan:
  stage: plan-prod
  script:
    - terraform workspace select prod
    - terraform plan
  only:
    - main
prod-apply:
  stage: apply-prod
  script:
    - terraform workspace select prod
    # CI runners have no interactive prompt, so approval comes from the manual gate below
    - terraform apply -auto-approve
  when: manual # Require manual approval
  only:
    - main
Error Handling
# GitHub Actions with proper error handling
- name: Terraform Apply
  id: apply
  continue-on-error: true
  run: terraform apply -auto-approve
- name: Handle Failure
  if: steps.apply.outcome == 'failure'
  run: |
    echo "Terraform apply failed"
    terraform show
    exit 1
- name: Notify Success
  if: steps.apply.outcome == 'success'
  run: |
    echo "Deployment successful"
    # Send notification to Slack/Teams
Security in CI/CD
Secret Management
- Use CI/CD secret stores: GitHub Secrets, GitLab Variables
- Scope secrets appropriately: Environment-specific secrets
- Rotate regularly: Automated secret rotation
- Audit access: Log secret usage
Branch Protection
- Protect main branches: Require reviews for production
- Status checks: Require successful builds
- Signed commits: Verify commit authenticity
Monitoring and Alerting
Key Metrics
Performance Metrics
- Operation duration: Track apply/plan times
- State file size: Monitor growth trends
- Lock duration: Identify bottlenecks
- API response times: Monitor backend performance
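State size and resource count can be tracked by inspecting the JSON that `terraform state pull` prints. The sketch below assumes the version 4 state layout, with a top-level `resources` list whose entries each carry an `instances` list; the function name is hypothetical.

```python
import json


def state_stats(state_text):
    """Summarise a Terraform state document given as a JSON string.

    Assumes the version 4 state layout: a top-level "resources" list,
    each entry holding an "instances" list.
    """
    state = json.loads(state_text)
    resources = state.get("resources", [])
    return {
        "bytes": len(state_text.encode("utf-8")),
        "resources": len(resources),
        "instances": sum(len(r.get("instances", [])) for r in resources),
    }


sample = json.dumps({
    "version": 4,
    "resources": [
        {"type": "aws_instance", "name": "web", "instances": [{}, {}]},
        {"type": "aws_vpc", "name": "main", "instances": [{}]},
    ],
})
stats = state_stats(sample)
print(stats["resources"], stats["instances"])  # prints 2 3
```

In practice you would save the output of `terraform state pull` to a file, pass its contents to a function like this, and record the numbers over time to see growth trends.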
Security Metrics
- Failed authentication attempts: Detect brute force attacks
- Unusual access patterns: Geographic or time-based anomalies
- Permission changes: Track access control modifications
- API key usage: Monitor for suspicious activity
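Detecting brute-force attempts typically means counting failed authentications per source within a sliding time window. This is a minimal sketch of that technique; the class name, the threshold of 5 failures, and the 10-minute window are illustrative assumptions, not FreeState defaults.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class FailedAuthTracker:
    """Sliding-window counter of failed authentications per source."""

    def __init__(self, max_failures=5, window=timedelta(minutes=10)):
        self.max_failures = max_failures  # illustrative threshold
        self.window = window              # illustrative window length
        self._failures = defaultdict(deque)

    def record_failure(self, source, when):
        """Record one failure; return True if the source should be flagged."""
        recent = self._failures[source]
        recent.append(when)
        # Drop failures that have aged out of the window.
        while when - recent[0] > self.window:
            recent.popleft()
        return len(recent) >= self.max_failures


tracker = FailedAuthTracker(max_failures=3, window=timedelta(minutes=5))
start = datetime(2024, 1, 1, 12, 0)
print(tracker.record_failure("198.51.100.7", start))                         # prints False
print(tracker.record_failure("198.51.100.7", start + timedelta(minutes=1)))  # prints False
print(tracker.record_failure("198.51.100.7", start + timedelta(minutes=2)))  # prints True
```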
Alerting Strategy
Critical Alerts
- State corruption: Immediate notification
- Failed deployments: Real-time alerts
- Security incidents: Immediate escalation
- Service outages: Automated failover triggers
Warning Alerts
- Long-running operations: 30+ minute threshold
- State file growth: Size increase warnings
- High API usage: Approaching rate limits
- Lock contention: Multiple failed lock attempts
Disaster Recovery
Backup Strategy
Automated Backups
- Pre-change backups: Before every apply operation
- Scheduled backups: Daily snapshots of all workspaces
- Cross-region replication: Geographic redundancy
- Retention policies: 30 days of backup history
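A retention policy can be enforced by selecting backups older than the cutoff and deleting them. The sketch below only computes the list of expired files and leaves deletion to the caller. It assumes backups follow the `verify-YYYYMMDD.tfstate` naming used by the verification script in this section; the function name is hypothetical.

```python
import re
from datetime import date, datetime, timedelta

# Matches the verify-YYYYMMDD.tfstate naming used by the verification script.
BACKUP_NAME = re.compile(r"^verify-(\d{8})\.tfstate$")


def expired_backups(filenames, today, retention_days=30):
    """Return backup filenames dated before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in filenames:
        match = BACKUP_NAME.match(name)
        if not match:
            continue  # ignore files that are not backups
        taken = datetime.strptime(match.group(1), "%Y%m%d").date()
        if taken < cutoff:
            expired.append(name)
    return sorted(expired)


files = ["verify-20240101.tfstate", "verify-20240310.tfstate", "notes.txt"]
print(expired_backups(files, today=date(2024, 3, 15)))
# prints ['verify-20240101.tfstate']
```

Keeping selection separate from deletion makes it easy to review the list, or run in a dry-run mode, before anything is removed.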
Backup Verification
#!/bin/bash
# Regular backup verification script
WORKSPACE="prod-app"
BACKUP_DIR="/backups"
BACKUP_FILE="$BACKUP_DIR/verify-$(date +%Y%m%d).tfstate"
# Create backup
terraform workspace select "$WORKSPACE"
terraform state pull > "$BACKUP_FILE"
# Verify backup integrity: the saved file should match a fresh pull of the live state.
# (terraform state list does not read from stdin, so piping a pulled state into it
# would only compare the live state with itself.)
terraform state pull > /tmp/current-state
if diff -q /tmp/current-state "$BACKUP_FILE" > /dev/null; then
  echo "Backup verification successful"
else
  echo "Backup verification failed!"
  exit 1
fi
Recovery Procedures
State Recovery
- Identify issue: Corruption, accidental deletion, etc.
- Stop operations: Prevent further changes
- Restore from backup: Use most recent valid backup
- Verify integrity: Compare with actual infrastructure
- Resume operations: Gradual return to normal operations
Infrastructure Drift
# Detect and fix infrastructure drift
terraform plan -refresh-only # Show changes made outside Terraform (replaces the deprecated terraform refresh)
terraform plan
# If drift detected
terraform apply # Apply corrections
# Or import changes made outside Terraform
terraform import aws_security_group.web sg-12345678
Cost Optimization
Resource Management
Right-sizing Workspaces
- Monitor usage: Track API calls and storage
- Consolidate when appropriate: Merge low-activity workspaces
- Archive old workspaces: Remove unused environments
Efficient Operations
- Batch operations: Group related changes
- Use targeted applies sparingly: Apply specific resources with -target only in exceptional cases, such as recovering from a failed change
- Minimize plan frequency: Avoid unnecessary planning
Team Collaboration
Workspace Organization
Naming Conventions
# Recommended naming patterns
{project}-{environment}-{component}
myapp-prod-frontend
myapp-prod-backend
myapp-prod-database
# Or team-based
{team}-{project}-{environment}
platform-infrastructure-prod
payments-service-prod
user-management-staging
Documentation Standards
- Workspace descriptions: Clear purpose and scope
- README files: Setup and operation instructions
- Change logs: Document major modifications
- Contact information: Workspace owners and escalation
Code Review Process
Review Checklist
- Security: No hardcoded secrets or overprivileged access
- Performance: Efficient resource configurations
- Standards: Follows team conventions
- Testing: Includes appropriate validation
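The first checklist item can be partly automated with a pre-review scan for literal credentials. The sketch below flags `password = "..."` assignments whose value is a plain string literal, like the hardcoded backend example earlier in this guide. It is a simple pattern match with a hypothetical function name, not a full HCL parser, and it will miss secrets stored under other attribute names.

```python
import re

# Matches `password = "<literal>"`; values containing `$` (interpolations)
# are skipped.
HARDCODED_PASSWORD = re.compile(r'^\s*password\s*=\s*"([^"$]+)"', re.MULTILINE)


def find_hardcoded_passwords(hcl_text):
    """Return 1-based line numbers of literal password assignments."""
    return [
        hcl_text.count("\n", 0, match.start(1)) + 1
        for match in HARDCODED_PASSWORD.finditer(hcl_text)
    ]


config = '''terraform {
  backend "http" {
    username = "workspace-123"
    password = "fs_key_abc123"
  }
}
'''
print(find_hardcoded_passwords(config))  # prints [4]
```

A check like this works best as a CI step or pre-commit hook, so that reviewers only see changes that have already passed it.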
Approval Process
- Development: Peer review required
- Staging: Senior engineer approval
- Production: DevOps team approval