Spot Instance Support

FreeState services are designed to run efficiently on AWS Fargate Spot capacity, reducing costs while maintaining high availability through graceful shutdown handling and automatic recovery.

Overview

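AWS Fargate Spot lets you run containerized applications on spare AWS compute capacity at up to a 70% discount compared to On-Demand Fargate pricing. FreeState services handle Spot interruptions gracefully, so an interruption should not cause data loss or a visible service disruption.

Cost Savings: Tasks running on Fargate Spot can cost 50-70% less than the same tasks on On-Demand capacity, with identical per-task performance. Reliability comes from the mixed capacity strategy and graceful shutdown handling described below, since any individual Spot task can be interrupted.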

How Spot Instance Support Works

Signal Handling

When AWS needs to reclaim Fargate Spot capacity, it gives a two-minute warning: it emits an ECS task state change event to Amazon EventBridge and sends a SIGTERM signal to the running task. FreeState services handle this signal as follows:

Backend API: Gracefully drains in-flight requests and closes database connections
Portal: Completes user operations and saves session state
State Manager: Ensures all Terraform state operations complete before shutdown
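The SIGTERM handling above can be sketched in Python. This is an assumption: the document does not say which language FreeState services are written in, and GracefulShutdown, begin_request, end_request and drain are hypothetical names. The handler only sets a flag; draining happens in ordinary code, because signal handlers should do as little work as possible.

```python
import signal
import threading
import time


class GracefulShutdown:
    """Hypothetical helper: rejects new work after SIGTERM and drains in-flight work."""

    def __init__(self, drain_timeout: float = 100.0) -> None:
        # Assumed budget: leaves headroom under a 120-second stopTimeout
        # for closing database connections and flushing logs afterwards.
        self.drain_timeout = drain_timeout
        self.stopping = False  # plain attribute, safe to set from a signal handler
        self._cond = threading.Condition()
        self._in_flight = 0

    def install(self) -> None:
        # Python runs signal handlers in the main thread; call this from it.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Only flip the flag; the actual draining happens outside the handler.
        self.stopping = True

    def begin_request(self) -> bool:
        """Register a request; returns False once shutdown has started."""
        with self._cond:
            if self.stopping:
                return False
            self._in_flight += 1
            return True

    def end_request(self) -> None:
        """Mark a request finished and wake any thread waiting in drain()."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self) -> bool:
        """Block until in-flight requests finish; False if the budget ran out."""
        deadline = time.monotonic() + self.drain_timeout
        with self._cond:
            while self._in_flight > 0:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False
                self._cond.wait(remaining)
        return True
```

In a service, install() would run at startup, each request would be wrapped in begin_request()/end_request() (answering 503 when begin_request() returns False), and once stopping becomes True the main loop would call drain(), close database connections, and exit with status 0.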

Load Balancer Integration

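All services deregister from their Application Load Balancer target groups during shutdown:

Services stop accepting new requests upon SIGTERM, and their health checks begin to fail
ECS deregisters the stopping task from its target group; the target then drains for the target group's deregistration delay
Traffic is automatically routed to healthy tasks
Graceful shutdown completes within the configured stopTimeout (at most 120 seconds on Fargate); keep the target group's deregistration delay below this value, since its default is 300 seconds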
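The health-check side of this sequence can be sketched with Python's standard http.server module: /health answers 200 while the task is serving and 503 once a draining flag is set, so load balancer health checks (and the container health check's curl -f) report the task as unhealthy. The /health path and port 8080 match the container health check in the task definition below; HealthState, make_handler and start_health_server are hypothetical names, and a real service would set draining from its SIGTERM handler.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthState:
    """Hypothetical shared flag, set to draining by the SIGTERM handler."""

    def __init__(self) -> None:
        self.draining = False


def make_handler(state: HealthState) -> type:
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/health":
                self.send_error(404)
                return
            # 503 while draining makes load balancer health checks and
            # `curl -f` (which fails on HTTP 400 and above) mark the task unhealthy.
            status, body = (503, b"draining") if state.draining else (200, b"ok")
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # silence per-request logging in this sketch

    return HealthHandler


def start_health_server(state: HealthState, host: str = "0.0.0.0",
                        port: int = 8080) -> ThreadingHTTPServer:
    """Serve /health on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```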

Automatic Recovery

The ECS service scheduler automatically launches replacement tasks when Spot tasks are interrupted:

Maintains desired task count across availability zones
Typically replaces interrupted tasks within 30-60 seconds, provided capacity is available
Preserves service availability during interruptions
No manual intervention required

Configuration

ECS Task Definition Settings

FreeState services use optimized ECS task definitions for Spot instances:

{
  "family": "freestate-backend",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "backend",
      "image": "freestate/backend:latest",
      "essential": true,
      "stopTimeout": 120,
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/freestate-backend",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
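Because a misspelled key such as requiresCompatibilites is silently ignored rather than rejected outright, a small pre-deployment check can help. The sketch below is a hypothetical helper, not part of FreeState or the AWS SDK; it checks a task definition dictionary against the settings this page relies on. The 30-second default and 120-second Fargate maximum for stopTimeout come from the ECS task definition reference; treating anything under 120 as a problem reflects this page's recommendation, not an AWS rule.

```python
FARGATE_MAX_STOP_TIMEOUT = 120  # seconds; Fargate upper limit for stopTimeout
DEFAULT_STOP_TIMEOUT = 30       # seconds; used by ECS when stopTimeout is omitted


def spot_readiness_problems(task_def: dict) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if "FARGATE" not in task_def.get("requiresCompatibilities", []):
        problems.append("requiresCompatibilities must include FARGATE")
    if task_def.get("networkMode") != "awsvpc":
        problems.append("Fargate tasks require networkMode awsvpc")
    for container in task_def.get("containerDefinitions", []):
        name = container.get("name", "<unnamed>")
        timeout = container.get("stopTimeout", DEFAULT_STOP_TIMEOUT)
        if timeout < FARGATE_MAX_STOP_TIMEOUT:
            problems.append(f"{name}: stopTimeout {timeout}s uses less than the two-minute Spot warning")
        elif timeout > FARGATE_MAX_STOP_TIMEOUT:
            problems.append(f"{name}: stopTimeout {timeout}s exceeds the Fargate maximum of 120s")
        if "healthCheck" not in container:
            problems.append(f"{name}: no container healthCheck defined")
    return problems
```

For example, spot_readiness_problems(json.load(open("task-def.json"))) returns an empty list for a correct definition; the file name here is illustrative.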

Service Configuration

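ECS services are configured for Spot-friendly behavior. The cluster must have the FARGATE and FARGATE_SPOT capacity providers associated with it for this strategy to work: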

{
  "serviceName": "freestate-backend",
  "cluster": "freestate-cluster",
  "taskDefinition": "freestate-backend:1",
  "desiredCount": 3,
  "capacityProviderStrategy": [
    {
      "capacityProvider": "FARGATE_SPOT",
      "weight": 70,
      "base": 0
    },
    {
      "capacityProvider": "FARGATE",
      "weight": 30,
      "base": 1
    }
  ],
  "deploymentConfiguration": {
    "maximumPercent": 200,
    "minimumHealthyPercent": 50,
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  },
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": ["subnet-xxx", "subnet-yyy"],
      "securityGroups": ["sg-xxx"],
      "assignPublicIp": "DISABLED"
    }
  }
}
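The base and weight values interact in a way that matters at small task counts: the base is satisfied first, and only the remaining tasks are divided by weight. The sketch below computes that proportional ideal for the strategy above. It is an approximation for reasoning about cost and risk, not ECS's placement algorithm; ECS places whole tasks, so real counts are rounded. With desiredCount 3, one task is the FARGATE base and the other two split 70:30, giving an ideal of 1.6 On-Demand and 1.4 Spot tasks, so under half the service runs on Spot at this size.

```python
def ideal_split(desired_count: int, strategy: list[dict]) -> dict[str, float]:
    """Fractional task counts per capacity provider: base first, then by weight.

    Assumes desired_count is at least the sum of all base values. ECS must
    place whole tasks, so actual placements round these numbers.
    """
    split = {item["capacityProvider"]: float(item.get("base", 0)) for item in strategy}
    remaining = desired_count - sum(split.values())
    total_weight = sum(item.get("weight", 0) for item in strategy)
    if remaining > 0 and total_weight > 0:
        for item in strategy:
            share = item.get("weight", 0) / total_weight
            split[item["capacityProvider"]] += remaining * share
    return split


strategy = [
    {"capacityProvider": "FARGATE_SPOT", "weight": 70, "base": 0},
    {"capacityProvider": "FARGATE", "weight": 30, "base": 1},
]
```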

Monitoring and Alerting

EventBridge Integration

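FreeState uses Amazon EventBridge rules to monitor Spot interruptions and task placement failures. The rules are summarized below; with the AWS CLI, each event pattern is passed to aws events put-rule and each target to aws events put-targets: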

{
  "Rules": [
    {
      "Name": "SpotInterruption",
      "EventPattern": {
        "source": ["aws.ecs"],
        "detail-type": ["ECS Task State Change"],
        "detail": {
          "stopCode": ["SpotInterruption"]
        }
      },
      "Targets": [
        {
          "Arn": "arn:aws:sns:us-east-1:123456789012:spot-alerts",
          "Id": "SpotInterruptionAlert"
        }
      ]
    },
    {
      "Name": "TaskPlacementFailure",
      "EventPattern": {
        "source": ["aws.ecs"],
        "detail-type": ["ECS Service Action"],
        "detail": {
          "eventType": ["ERROR"],
          "eventName": ["SERVICE_TASK_PLACEMENT_FAILURE"]
        }
      },
      "Targets": [
        {
          "Arn": "arn:aws:sns:us-east-1:123456789012:placement-alerts",
          "Id": "PlacementFailureAlert"
        }
      ]
    }
  ]
}
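EventBridge patterns match by exact value, so a rule must use the values ECS actually emits. Matching on the stopCode field (SpotInterruption) is reliable, whereas the free-text stoppedReason in Fargate Spot events reads "Your Spot Task was interrupted." and would not equal a paraphrase. The sketch below is a simplified local matcher for checking patterns against sample events; it supports only exact string values (no prefix, numeric, exists or anything-but filters, and no array-valued event fields), and the sample events are trimmed to the fields used here.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge matching: every pattern key must be present in the
    event; a list means "any of these values"; nested dicts are matched recursively."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


SPOT_INTERRUPTION_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
    "detail": {"stopCode": ["SpotInterruption"]},
}

PLACEMENT_FAILURE_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Service Action"],
    "detail": {"eventType": ["ERROR"], "eventName": ["SERVICE_TASK_PLACEMENT_FAILURE"]},
}

# Trimmed samples; real events carry many more fields.
SAMPLE_SPOT_EVENT = {
    "source": "aws.ecs",
    "detail-type": "ECS Task State Change",
    "detail": {"stopCode": "SpotInterruption", "stoppedReason": "Your Spot Task was interrupted."},
}
SAMPLE_PLACEMENT_EVENT = {
    "source": "aws.ecs",
    "detail-type": "ECS Service Action",
    "detail": {"eventType": "ERROR", "eventName": "SERVICE_TASK_PLACEMENT_FAILURE"},
}
```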

CloudWatch Metrics

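Key metrics to monitor for Spot health. These are not built-in ECS metrics; publish them as custom CloudWatch metrics, for example derived from the EventBridge events above:

SpotInterruptions: Count of Spot interruptions per hour
TaskReplacements: Time from an interruption until a replacement task is healthy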
ServiceAvailability: Percentage of healthy tasks
PlacementFailures: Failed task placements due to Spot unavailability
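One way to produce these custom metrics is to aggregate interruption timestamps and task counts, then build the MetricData list accepted by CloudWatch's PutMetricData API. The sketch below only builds the payload; the FreeState/Spot namespace and the function's inputs are assumptions, and sending it would be a separate call such as boto3.client("cloudwatch").put_metric_data(Namespace=NAMESPACE, MetricData=data).

```python
from datetime import datetime, timedelta

NAMESPACE = "FreeState/Spot"  # hypothetical custom namespace


def spot_metric_data(interruptions: list[datetime], replacement_seconds: list[float],
                     healthy: int, desired: int, placement_failures: int,
                     now: datetime) -> list[dict]:
    """Build a MetricData list covering the hour ending at `now`."""
    window_start = now - timedelta(hours=1)
    recent = [t for t in interruptions if window_start < t <= now]
    # Treat a service scaled to zero as fully available.
    availability = 100.0 if desired == 0 else 100.0 * healthy / desired
    data = [
        {"MetricName": "SpotInterruptions", "Value": float(len(recent)), "Unit": "Count"},
        {"MetricName": "ServiceAvailability", "Value": availability, "Unit": "Percent"},
        {"MetricName": "PlacementFailures", "Value": float(placement_failures), "Unit": "Count"},
    ]
    if replacement_seconds:
        # Report the slowest replacement; an average would hide outliers.
        data.append({"MetricName": "TaskReplacements",
                     "Value": max(replacement_seconds), "Unit": "Seconds"})
    return data
```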

Best Practices

Application Design

Stateless Design: Keep application state in external stores (database, cache)
Idempotent Operations: Ensure operations can be safely retried
Circuit Breakers: Implement fallback mechanisms for service dependencies
Health Checks: Comprehensive health checks for proper load balancer integration
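Idempotency matters on Spot because a client may retry a request whose first attempt reached a task that was then interrupted. A common technique is an idempotency key: the client sends a unique key per logical operation, and the server stores the first result under that key. The sketch below uses an in-memory dictionary only to stay self-contained; in keeping with the stateless-design point above, a real store would live in the external database or cache, since task memory is lost on interruption. IdempotencyStore and run_once are hypothetical names.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class IdempotencyStore:
    """In-memory stand-in for an external idempotency-key table."""

    def __init__(self) -> None:
        self._results: dict = {}
        self._lock = threading.Lock()

    def run_once(self, key: str, operation: Callable[[], T]) -> T:
        """Run `operation` the first time `key` is seen; replay its result afterwards.

        Holding the lock during the operation serializes all calls, which keeps
        the sketch simple; a real store would lock per key or use a conditional write.
        """
        with self._lock:
            if key in self._results:
                return self._results[key]
            result = operation()
            self._results[key] = result
            return result
```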

Deployment Strategy

Mixed Capacity: Use a 70/30 Spot/On-Demand weighting with a base of one On-Demand task to balance cost and availability
Multi-AZ Distribution: Spread tasks across multiple availability zones
Rolling Deployments: Minimize impact during deployments
Deployment Circuit Breaker: Automatic rollback on failed deployments

Operational Excellence

Monitoring: Comprehensive logging and metrics collection
Alerting: Proactive alerts for interruptions and placement failures
Testing: Regular chaos engineering to validate resilience
Documentation: Clear runbooks for handling Spot-related incidents

Testing Spot Interruptions

Simulating Interruptions Locally

Test graceful shutdown behavior in your development environment:

# Start your service in a container
docker run -d --name freestate-test freestate/backend:latest

# Send SIGTERM, then SIGKILL if the container is still running after 120 seconds (mirrors stopTimeout)
docker stop --time 120 freestate-test

# Review logs to verify graceful shutdown
docker logs freestate-test

# Verify exit code (should be 0 for clean shutdown)
docker inspect --format='{{.State.ExitCode}}' freestate-test
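Non-zero exit codes from this test usually follow the shell convention of 128 plus the signal number: 143 (128 + 15) means the process was terminated by SIGTERM without handling it, and 137 (128 + 9) means it was killed with SIGKILL, for example after the stop timeout expired or when memory ran out. Note that a process running as PID 1 without a SIGTERM handler may ignore the signal entirely, so the container keeps running until the timeout. The helper below is a hypothetical convenience for reading the result, using Linux signal numbers.

```python
SIGKILL, SIGTERM = 9, 15  # Linux signal numbers


def describe_exit_code(code: int) -> str:
    """Interpret a container exit code after a graceful-shutdown test."""
    if code == 0:
        return "clean shutdown"
    if code > 128:
        # Docker and shells report death-by-signal as 128 + signal number.
        sig = code - 128
        if sig == SIGKILL:
            return "killed with SIGKILL (stop timeout expired or out of memory)"
        if sig == SIGTERM:
            return "terminated by SIGTERM without handling it"
        return f"terminated by signal {sig}"
    return f"application exited with error code {code}"
```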

Simulating Interruptions in ECS

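Test in your staging environment by manually stopping tasks. A manually stopped task is recorded with the stop code UserInitiated rather than SpotInterruption, so this exercises graceful shutdown and replacement but does not trigger the SpotInterruption EventBridge rule:

# Stop a task to simulate a Spot interruption
aws ecs stop-task \
  --cluster freestate-staging \
  --task arn:aws:ecs:us-east-1:123456789012:task/freestate-staging/task-id \
  --reason "Testing Spot interruption"

# Review the most recent service events during replacement
aws ecs describe-services \
  --cluster freestate-staging \
  --services freestate-backend \
  --query 'services[0].events[0:5]'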

Troubleshooting

Common Issues

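High Interruption Rate: If you experience frequent Spot interruptions, shift more capacity to On-Demand by raising the FARGATE weight or base in the capacity provider strategy.

Placement Failures: ECS does not automatically fall back from FARGATE_SPOT to On-Demand FARGATE capacity. If Spot capacity is unavailable, the Spot share of tasks remains unplaced (reported as SERVICE_TASK_PLACEMENT_FAILURE events) while tasks covered by the FARGATE base and weight keep running. If the shortfall persists, temporarily raise the FARGATE weight.

Shutdown Timeout: If a container does not exit within its stopTimeout (120 seconds in the task definition above, the Fargate maximum; 30 seconds if unset), ECS sends SIGKILL and the container is forcibly stopped. Check application logs and shorten the shutdown sequence.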

Debugging Steps

Check Service Events: Review ECS service events for interruption patterns
Monitor CloudWatch Logs: Look for graceful shutdown logs and errors
Verify Health Checks: Ensure health checks properly reflect service state
Review Load Balancer Metrics: Check target deregistration timing
Analyze EventBridge Events: Review Spot interruption event patterns

Cost Optimization

Spot instances provide significant cost savings while maintaining service reliability:

Backend Services: 50-70% lower cost for the tasks that run on Spot, with minimal availability impact; blended savings scale with the share of tasks on Spot
Batch Processing: Ideal for non-time-critical workloads
Development/Staging: Maximum cost savings for non-production environments
Monitoring: Track cost savings vs. interruption overhead
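The blended saving for a mixed service is the Spot share of tasks multiplied by the Spot discount, since On-Demand tasks save nothing. With 70% of tasks on Spot and a 70% discount, the service as a whole costs about 49% less; the 50-70% range applies only when every task runs on Spot. The discount is an input here because Fargate Spot prices vary, and the function name is illustrative.

```python
def blended_savings(spot_share: float, spot_discount: float) -> float:
    """Fraction saved versus running every task On-Demand.

    spot_share:    fraction of tasks running on FARGATE_SPOT (0 to 1)
    spot_discount: assumed Spot discount versus On-Demand price (0 to 1)
    """
    if not (0 <= spot_share <= 1 and 0 <= spot_discount <= 1):
        raise ValueError("spot_share and spot_discount must be between 0 and 1")
    return spot_share * spot_discount
```

For instance, if 1.4 of 3 tasks ran on Spot on average, blended_savings(1.4 / 3, 0.7) gives roughly 0.33, a one-third reduction.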

Security Considerations

Spot instances maintain the same security posture as On-Demand instances:

IAM Roles: Same task and execution roles apply
Network Security: VPC and security group configurations unchanged
Secrets Management: AWS Secrets Manager integration works identically
Encryption: Encryption of data in transit and at rest is unchanged

Ready to Get Started? Contact our support team to enable Spot instance support for your FreeState deployment and start saving on infrastructure costs today.