Spot Instance Support

FreeState services are designed to run efficiently on AWS Fargate Spot instances, providing cost savings while maintaining high availability through graceful shutdown handling and automatic recovery.

Overview

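AWS Fargate Spot lets you run containerized applications on spare compute capacity at up to 70% savings compared to on-demand pricing. In exchange, AWS can reclaim that capacity with a two-minute warning. FreeState services are built to handle these interruptions gracefully: in-flight work is drained before the task stops, and replacement tasks are launched automatically.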

Cost Savings: Spot instances can reduce compute costs by 50-70% for the tasks that run on them. Performance is the same as on-demand; the trade-off is that tasks can be interrupted, which the mechanisms described below are designed to absorb.

How Spot Instance Support Works

Signal Handling

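When AWS needs to reclaim Spot capacity, it issues a two-minute warning: the task's containers receive a SIGTERM signal, and a task state change event is sent to EventBridge. Containers that are still running when the stop timeout expires are sent SIGKILL. FreeState services handle SIGTERM as follows: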

  • Backend API: Gracefully drains in-flight requests and closes database connections
  • Portal: Completes user operations and saves session state
  • State Manager: Ensures all Terraform state operations complete before shutdown
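The drain behavior described above can be sketched in Python. This is a minimal illustration under stated assumptions, not FreeState's actual implementation: the `GracefulShutdown` class and its method names are hypothetical, and a real service would call `begin_request` and `end_request` from its web framework's request hooks.

```python
import signal
import threading
import time


class GracefulShutdown:
    """Hypothetical helper: refuse new work after SIGTERM, wait for in-flight work."""

    def __init__(self):
        self.stop_requested = False
        self._in_flight = 0
        self._lock = threading.Lock()

    def install(self):
        # Must be called from the main thread; Python only allows
        # signal handlers to be registered there.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Keep the handler trivial: set a flag and return.
        self.stop_requested = True

    def begin_request(self):
        """Return True if the request may start, False once shutdown has begun."""
        with self._lock:
            if self.stop_requested:
                return False
            self._in_flight += 1
            return True

    def end_request(self):
        with self._lock:
            self._in_flight -= 1

    def drain(self, timeout_seconds, poll_interval=0.05):
        """Wait until nothing is in flight; return False if the timeout elapses first."""
        deadline = time.monotonic() + timeout_seconds
        while True:
            with self._lock:
                if self._in_flight == 0:
                    return True
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)
```

A service using this pattern would call `drain()` with a budget comfortably below the 120-second stop timeout, then close its database connections and exit with code 0.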

Load Balancer Integration

All services properly deregister from Application Load Balancers during shutdown:

  • Services stop accepting new requests immediately upon SIGTERM
  • Load balancer health checks fail, triggering automatic deregistration
  • Traffic is automatically routed to healthy instances
  • Graceful shutdown completes within the configured timeout (≤ 120 seconds)
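One way to make health checks fail as soon as shutdown begins is a health endpoint that switches from 200 to 503. The sketch below uses only the Python standard library; the `/health` path and port 8080 follow the health check command in the task definition, while `shutting_down`, `health_response`, and `HealthHandler` are hypothetical names chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed to be set to True by the service's SIGTERM handler.
shutting_down = False


def health_response(is_shutting_down):
    """Return (status code, body) for the health endpoint."""
    if is_shutting_down:
        return 503, b"shutting down"
    return 200, b"ok"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_response(shutting_down)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    # Blocks forever; run in a thread alongside the main application.
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Returning 503 rather than closing the listener lets requests that are already in flight finish while the load balancer stops sending new ones.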

Automatic Recovery

The ECS service scheduler automatically launches replacement tasks when Spot tasks are interrupted:

  • Maintains desired task count across availability zones
  • Typically replaces interrupted tasks within 30-60 seconds, provided capacity is available
  • Preserves service availability during interruptions
  • No manual intervention required

Configuration

ECS Task Definition Settings

FreeState services use optimized ECS task definitions for Spot instances:

{
  "family": "freestate-backend",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "backend",
      "image": "freestate/backend:latest",
      "essential": true,
      "stopTimeout": 120,
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/freestate-backend",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
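A small pre-deployment check can catch task definition settings that undermine graceful shutdown. The sketch below is a hypothetical helper (`check_task_definition` is not part of any AWS SDK); it relies on two Fargate facts: `stopTimeout` cannot exceed 120 seconds, and it defaults to 30 seconds when omitted.

```python
FARGATE_MAX_STOP_TIMEOUT = 120  # seconds; upper limit for tasks on Fargate


def check_task_definition(task_def):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if "FARGATE" not in task_def.get("requiresCompatibilities", []):
        problems.append("requiresCompatibilities must include FARGATE")
    if task_def.get("networkMode") != "awsvpc":
        problems.append("Fargate tasks require networkMode awsvpc")
    for container in task_def.get("containerDefinitions", []):
        name = container.get("name", "<unnamed>")
        stop_timeout = container.get("stopTimeout")
        if stop_timeout is None:
            # Without an explicit value the container gets only 30 seconds.
            problems.append(f"{name}: stopTimeout not set (default is 30 seconds)")
        elif stop_timeout > FARGATE_MAX_STOP_TIMEOUT:
            problems.append(f"{name}: stopTimeout {stop_timeout} exceeds 120 seconds")
    return problems
```

Because the check looks up keys by their exact names, it also flags misspelled keys, which would otherwise be rejected or ignored at registration time.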

Service Configuration

ECS services are configured for optimal Spot instance behavior:

{
  "serviceName": "freestate-backend",
  "cluster": "freestate-cluster",
  "taskDefinition": "freestate-backend:1",
  "desiredCount": 3,
  "capacityProviderStrategy": [
    {
      "capacityProvider": "FARGATE_SPOT",
      "weight": 70,
      "base": 0
    },
    {
      "capacityProvider": "FARGATE",
      "weight": 30,
      "base": 1
    }
  ],
  "deploymentConfiguration": {
    "maximumPercent": 200,
    "minimumHealthyPercent": 50,
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  },
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": ["subnet-xxx", "subnet-yyy"],
      "securityGroups": ["sg-xxx"],
      "assignPublicIp": "DISABLED"
    }
  }
}
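To see what the `base` and `weight` values above mean for a given `desiredCount`, the following sketch estimates the share of tasks per capacity provider. It is an approximation under stated assumptions: `base` tasks are assigned first and the remainder is divided in proportion to `weight`. The function name is hypothetical, and the result is left fractional because the scheduler's exact rounding is not modeled.

```python
def expected_task_share(desired_count, strategy):
    """Approximate task share per capacity provider (fractional on purpose)."""
    shares = {}
    remaining = desired_count
    # First satisfy each provider's base, as far as the desired count allows.
    for item in strategy:
        base = min(item.get("base", 0), remaining)
        shares[item["capacityProvider"]] = float(base)
        remaining -= base
    # Then split whatever is left in proportion to the weights.
    total_weight = sum(item.get("weight", 0) for item in strategy)
    if remaining > 0 and total_weight > 0:
        for item in strategy:
            weight = item.get("weight", 0)
            shares[item["capacityProvider"]] += remaining * weight / total_weight
    return shares


strategy = [
    {"capacityProvider": "FARGATE_SPOT", "weight": 70, "base": 0},
    {"capacityProvider": "FARGATE", "weight": 30, "base": 1},
]
shares = expected_task_share(3, strategy)
```

With a desired count of 3, this estimate puts about 1.4 tasks on FARGATE_SPOT and 1.6 on FARGATE, so the Spot share of a small service is well below 70%. The 70/30 split is approached only as the task count grows.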

Monitoring and Alerting

EventBridge Integration

FreeState services integrate with AWS EventBridge to monitor Spot interruptions and placement failures:

{
  "Rules": [
    {
      "Name": "SpotInterruption",
      "EventPattern": {
        "source": ["aws.ecs"],
        "detail-type": ["ECS Task State Change"],
        "detail": {
          "stopCode": ["SpotInterruption"]
        }
      },
      "Targets": [
        {
          "Arn": "arn:aws:sns:us-east-1:123456789012:spot-alerts",
          "Id": "SpotInterruptionAlert"
        }
      ]
    },
    {
      "Name": "TaskPlacementFailure",
      "EventPattern": {
        "source": ["aws.ecs"],
        "detail-type": ["ECS Service Action"],
        "detail": {
          "eventType": ["ERROR"],
          "eventName": ["SERVICE_TASK_PLACEMENT_FAILURE"]
        }
      },
      "Targets": [
        {
          "Arn": "arn:aws:sns:us-east-1:123456789012:placement-alerts",
          "Id": "PlacementFailureAlert"
        }
      ]
    }
  ]
}
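On the receiving side, a rule target such as a Lambda function can classify the events it is handed. The sketch below assumes the event's `detail.stopCode` is `SpotInterruption`, which is how ECS labels Fargate Spot interruptions in task state change events; the function names and the returned summary shape are hypothetical.

```python
def is_spot_interruption(event):
    """True if an EventBridge event reports an ECS task stopped by a Spot interruption."""
    if event.get("source") != "aws.ecs":
        return False
    if event.get("detail-type") != "ECS Task State Change":
        return False
    return event.get("detail", {}).get("stopCode") == "SpotInterruption"


def summarize(event):
    """Pick out the fields an on-call engineer is likely to need."""
    detail = event.get("detail", {})
    return {
        "taskArn": detail.get("taskArn"),
        "clusterArn": detail.get("clusterArn"),
        "stoppedReason": detail.get("stoppedReason"),
    }


def handler(event, context=None):
    """Lambda-style entry point: return a summary for interruptions, else None."""
    if not is_spot_interruption(event):
        return None
    return summarize(event)
```

Filtering again inside the handler is redundant when the rule pattern is correct, but it keeps the function safe to reuse behind broader rules.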

CloudWatch Metrics

Key metrics to monitor for Spot instance health:

  • SpotInterruptions: Count of Spot interruptions per hour
  • TaskReplacements: Time to replace interrupted tasks
  • ServiceAvailability: Percentage of healthy tasks
  • PlacementFailures: Failed task placements due to Spot unavailability
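Metrics like these are custom metrics that have to be published by your own tooling. The sketch below builds a payload for CloudWatch's `put_metric_data` call via boto3; the namespace `FreeState/Spot`, the `ServiceName` dimension, and the helper names are assumptions made for illustration, not names FreeState is known to use.

```python
def service_availability(running_count, desired_count):
    """Percentage of desired tasks that are currently running, capped at 100."""
    if desired_count <= 0:
        return 100.0
    return min(100.0, 100.0 * running_count / desired_count)


def build_metric_data(service_name, running_count, desired_count, interruptions):
    """Build the MetricData list for two of the metrics listed above."""
    dimensions = [{"Name": "ServiceName", "Value": service_name}]
    return [
        {
            "MetricName": "ServiceAvailability",
            "Dimensions": dimensions,
            "Value": service_availability(running_count, desired_count),
            "Unit": "Percent",
        },
        {
            "MetricName": "SpotInterruptions",
            "Dimensions": dimensions,
            "Value": float(interruptions),
            "Unit": "Count",
        },
    ]


def publish(metric_data, namespace="FreeState/Spot"):
    # Imported lazily so the helpers above work without boto3 or AWS credentials.
    import boto3

    boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=metric_data)
```

Keeping payload construction separate from the network call makes the calculation easy to unit test without touching AWS.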

Best Practices

Application Design

  • Stateless Design: Keep application state in external stores (database, cache)
  • Idempotent Operations: Ensure operations can be safely retried
  • Circuit Breakers: Implement fallback mechanisms for service dependencies
  • Health Checks: Comprehensive health checks for proper load balancer integration

Deployment Strategy

  • Mixed Capacity: Use 70% Spot + 30% On-Demand for optimal cost/availability balance
  • Multi-AZ Distribution: Spread tasks across multiple availability zones
  • Rolling Deployments: Minimize impact during deployments
  • Deployment Circuit Breaker: Automatic rollback on failed deployments

Operational Excellence

  • Monitoring: Comprehensive logging and metrics collection
  • Alerting: Proactive alerts for interruptions and placement failures
  • Testing: Regular chaos engineering to validate resilience
  • Documentation: Clear runbooks for handling Spot-related incidents

Testing Spot Interruptions

Simulating Interruptions Locally

Test graceful shutdown behavior in your development environment:

# Start your service in a container
docker run -d --name freestate-test freestate/backend:latest

# Send SIGTERM to test graceful shutdown
docker kill --signal=SIGTERM freestate-test

# Monitor logs to verify graceful shutdown
docker logs -f freestate-test

# Verify exit code (should be 0 for clean shutdown)
docker inspect freestate-test --format='{{.State.ExitCode}}'

Simulating Interruptions in ECS

Test in your staging environment by manually stopping tasks:

# Stop a task to simulate Spot interruption
aws ecs stop-task \
  --cluster freestate-staging \
  --task arn:aws:ecs:us-east-1:123456789012:task/task-id \
  --reason "Testing Spot interruption"

# Monitor service metrics during replacement
aws ecs describe-services \
  --cluster freestate-staging \
  --services freestate-backend \
  --query 'services[0].events[0:5]'

Troubleshooting

Common Issues

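High Interruption Rate: If you experience frequent Spot interruptions, consider increasing the On-Demand share of the capacity provider strategy or spreading tasks across more availability zones. Fargate does not expose instance types, so there is no instance type to change.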

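Placement Failures: If tasks fail to place because Spot capacity is unavailable, ECS does not fall back to On-Demand automatically. The service scheduler keeps retrying on FARGATE_SPOT until capacity returns. The On-Demand base and weight in the capacity provider strategy keep a minimum number of tasks running in the meantime; raise them if placement failures persist.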

Shutdown Timeout: If a service does not shut down within 120 seconds, its containers are forcefully terminated with SIGKILL. Check application logs to find the slow shutdown step and optimize it.
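When a shutdown timeout is suspected, timing each shutdown step helps find the slow one. The following is a minimal sketch; `run_shutdown_steps` and the step names in the usage example are hypothetical, and the 120-second default mirrors the stop timeout used in this document.

```python
import time


def run_shutdown_steps(steps, budget_seconds=120.0):
    """Run (name, callable) steps in order and record how long each takes.

    Returns (timings, within_budget). Steps are not cancelled when the
    budget is exceeded; the goal is only to show which step is slow.
    """
    timings = []
    started = time.monotonic()
    for name, step in steps:
        step_started = time.monotonic()
        step()
        timings.append((name, time.monotonic() - step_started))
    total = time.monotonic() - started
    return timings, total <= budget_seconds


# Usage with stand-in steps; a real service would pass its own callables.
timings, ok = run_shutdown_steps([
    ("drain_requests", lambda: time.sleep(0.01)),
    ("close_db_connections", lambda: None),
])
```

Logging the returned timings on every shutdown makes regressions visible in CloudWatch Logs before they lead to forced terminations.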

Debugging Steps

  1. Check Service Events: Review ECS service events for interruption patterns
  2. Monitor CloudWatch Logs: Look for graceful shutdown logs and errors
  3. Verify Health Checks: Ensure health checks properly reflect service state
  4. Review Load Balancer Metrics: Check target deregistration timing
  5. Analyze EventBridge Events: Review Spot interruption event patterns

Cost Optimization

Spot instances provide significant cost savings while maintaining service reliability:

  • Backend Services: 50-70% cost reduction with minimal availability impact
  • Batch Processing: Ideal for non-time-critical workloads
  • Development/Staging: Maximum cost savings for non-production environments
  • Monitoring: Track cost savings vs. interruption overhead
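When tracking savings, note that a mixed strategy saves less than the Spot discount itself, because only part of the fleet runs on Spot. The arithmetic can be sketched as below; `blended_savings` is a hypothetical helper, and the inputs are the 70% Spot share and the 50-70% discount range quoted in this document.

```python
def blended_savings(spot_share, spot_discount):
    """Fraction saved versus running every task on demand.

    spot_share: fraction of tasks on Spot (0 to 1).
    spot_discount: Spot price discount relative to on-demand (0 to 1).
    """
    if not 0.0 <= spot_share <= 1.0 or not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_share and spot_discount must be between 0 and 1")
    return spot_share * spot_discount


low = blended_savings(0.7, 0.5)   # 70% of tasks on Spot at a 50% discount
high = blended_savings(0.7, 0.7)  # 70% of tasks on Spot at a 70% discount
```

Under these assumptions the overall saving lands between roughly 35% and 49% of the on-demand bill, before accounting for any interruption overhead.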

Security Considerations

Spot instances maintain the same security posture as On-Demand instances:

  • IAM Roles: Same task and execution roles apply
  • Network Security: VPC and security group configurations unchanged
  • Secrets Management: AWS Secrets Manager integration works identically
  • Encryption: Data in transit and at rest encryption maintained

Ready to Get Started? Contact our support team to enable Spot instance support for your FreeState deployment and start saving on infrastructure costs today.