Spot Instance Support
FreeState services are designed to run efficiently on AWS Fargate Spot instances, providing cost savings while maintaining high availability through graceful shutdown handling and automatic recovery.
Overview
AWS Fargate Spot lets you run containerized applications on spare compute capacity at up to 70% savings compared to on-demand pricing. FreeState services handle Spot interruptions gracefully, ensuring no data loss or service disruption.
Cost Savings: Spot capacity can reduce compute costs by 50-70% for the tasks that run on it. Spot tasks perform the same as on-demand tasks while they run, but they can be interrupted with a two-minute warning.
How Spot Instance Support Works
Signal Handling
When AWS needs to reclaim Spot capacity, it issues a two-minute warning: the container receives a SIGTERM signal and, once the configured stopTimeout (at most 120 seconds on Fargate) elapses, a SIGKILL. FreeState services handle SIGTERM as follows:
- Backend API: Gracefully drains in-flight requests and closes database connections
- Portal: Completes user operations and saves session state
- State Manager: Ensures all Terraform state operations complete before shutdown
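The drain behavior described above can be sketched as follows. This is a minimal illustration, not FreeState's actual implementation: the `GracefulShutdown` class, its method names, and the 110-second budget (chosen to leave headroom under the 120-second stop timeout) are all assumptions made for the example.

```python
import signal
import threading
import time

# Assumption: leave some headroom below the 120-second stop timeout.
SHUTDOWN_BUDGET_SECONDS = 110.0


class GracefulShutdown:
    """Hypothetical helper: tracks in-flight requests and a SIGTERM flag."""

    def __init__(self) -> None:
        self.stop_requested = False
        self._in_flight = 0
        self._lock = threading.Lock()

    def install(self) -> None:
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Keep the handler tiny: only flip a flag, do the real work elsewhere.
        self.stop_requested = True

    def begin_request(self) -> bool:
        """Return False once shutdown has started, so new work is refused."""
        with self._lock:
            if self.stop_requested:
                return False
            self._in_flight += 1
            return True

    def end_request(self) -> None:
        with self._lock:
            self._in_flight -= 1

    def drain(self, budget: float = SHUTDOWN_BUDGET_SECONDS) -> bool:
        """Wait for in-flight requests; True if all finished within budget."""
        deadline = time.monotonic() + budget
        while time.monotonic() < deadline:
            with self._lock:
                if self._in_flight == 0:
                    return True
            time.sleep(0.05)
        with self._lock:
            return self._in_flight == 0
```

In this sketch a service would call `install()` at startup, wrap each request in `begin_request()` / `end_request()`, and call `drain()` before closing database connections and exiting with code 0.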
Load Balancer Integration
All services properly deregister from Application Load Balancers during shutdown:
- Services stop accepting new requests immediately upon SIGTERM
- Load balancer health checks fail, triggering automatic deregistration
- Traffic is automatically routed to healthy instances
- Graceful shutdown completes within the configured timeout (≤ 120 seconds)
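One way to make health checks fail as soon as shutdown begins is a health endpoint that switches from 200 to 503 when a shutdown flag is set. The sketch below uses only the Python standard library; the `/health` path and port 8080 come from the task definition in this document, while `shutting_down`, `HealthHandler`, and `start_health_server` are hypothetical names used for illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set this event from the SIGTERM handling path (assumption of this sketch).
shutting_down = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        # Report unhealthy while draining so the load balancer stops routing here.
        if shutting_down.is_set():
            status, body = 503, b"draining"
        else:
            status, body = 200, b"ok"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the sketch quiet


def start_health_server(port: int = 8080, host: str = "0.0.0.0") -> ThreadingHTTPServer:
    """Serve /health on a background thread and return the server object."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `shutting_down.set()` when SIGTERM arrives makes the next health check return 503 while in-flight requests continue to drain.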
Automatic Recovery
The ECS service scheduler automatically launches replacement tasks when Spot tasks are interrupted:
- Maintains desired task count across availability zones
- Replaces interrupted tasks within 30-60 seconds
- Preserves service availability during interruptions
- No manual intervention required
Configuration
ECS Task Definition Settings
FreeState services use optimized ECS task definitions for Spot instances:
{
  "family": "freestate-backend",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "backend",
      "image": "freestate/backend:latest",
      "essential": true,
      "stopTimeout": 120,
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/freestate-backend",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
Service Configuration
ECS services are configured for optimal Spot instance behavior:
{
  "serviceName": "freestate-backend",
  "cluster": "freestate-cluster",
  "taskDefinition": "freestate-backend:1",
  "desiredCount": 3,
  "capacityProviderStrategy": [
    {
      "capacityProvider": "FARGATE_SPOT",
      "weight": 70,
      "base": 0
    },
    {
      "capacityProvider": "FARGATE",
      "weight": 30,
      "base": 1
    }
  ],
  "deploymentConfiguration": {
    "maximumPercent": 200,
    "minimumHealthyPercent": 50,
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  },
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": ["subnet-xxx", "subnet-yyy"],
      "securityGroups": ["sg-xxx"],
      "assignPublicIp": "DISABLED"
    }
  }
}
Monitoring and Alerting
EventBridge Integration
FreeState services integrate with Amazon EventBridge to monitor Spot interruptions and task placement failures:
{
  "Rules": [
    {
      "Name": "SpotInterruption",
      "EventPattern": {
        "source": ["aws.ecs"],
        "detail-type": ["ECS Task State Change"],
        "detail": {
          "stopCode": ["SpotInterruption"]
        }
      },
      "Targets": [
        {
          "Arn": "arn:aws:sns:us-east-1:123456789012:spot-alerts",
          "Id": "SpotInterruptionAlert"
        }
      ]
    },
    {
      "Name": "TaskPlacementFailure",
      "EventPattern": {
        "source": ["aws.ecs"],
        "detail-type": ["ECS Service Action"],
        "detail": {
          "eventType": ["ERROR"],
          "eventName": ["SERVICE_TASK_PLACEMENT_FAILURE"]
        }
      },
      "Targets": [
        {
          "Arn": "arn:aws:sns:us-east-1:123456789012:placement-alerts",
          "Id": "PlacementFailureAlert"
        }
      ]
    }
  ]
}
CloudWatch Metrics
Key metrics to monitor for Spot instance health:
- SpotInterruptions: Count of Spot interruptions per hour
- TaskReplacements: Time to replace interrupted tasks
- ServiceAvailability: Percentage of healthy tasks
- PlacementFailures: Failed task placements due to Spot unavailability
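As an illustration of how an interruption event could feed the SpotInterruptions metric, the sketch below classifies an ECS Task State Change event and builds a payload shaped like the keyword arguments of boto3's CloudWatch `put_metric_data` call. It assumes Spot reclaims are reported with a `stopCode` of `SpotInterruption`; the `FreeState/Spot` namespace and both function names are hypothetical, and no AWS call is made here.

```python
from typing import Optional

# Hypothetical namespace; the metric name follows the list above.
METRIC_NAMESPACE = "FreeState/Spot"


def is_spot_interruption(event: dict) -> bool:
    """True if the EventBridge event is a task stopped by a Spot reclaim."""
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.ecs"
        and event.get("detail-type") == "ECS Task State Change"
        and detail.get("stopCode") == "SpotInterruption"
    )


def interruption_metric(event: dict) -> Optional[dict]:
    """Build a put_metric_data-style payload, or None for unrelated events."""
    if not is_spot_interruption(event):
        return None
    # The cluster name is the last path segment of the cluster ARN.
    cluster_arn = event["detail"].get("clusterArn", "")
    cluster_name = cluster_arn.rsplit("/", 1)[-1] or "unknown"
    return {
        "Namespace": METRIC_NAMESPACE,
        "MetricData": [
            {
                "MetricName": "SpotInterruptions",
                "Dimensions": [{"Name": "Cluster", "Value": cluster_name}],
                "Value": 1,
                "Unit": "Count",
            }
        ],
    }
```

A Lambda target on the SpotInterruption rule could pass the returned dictionary to `put_metric_data(**payload)`; summing the metric over one-hour periods then gives interruptions per hour.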
Best Practices
Application Design
- Stateless Design: Keep application state in external stores (database, cache)
- Idempotent Operations: Ensure operations can be safely retried
- Circuit Breakers: Implement fallback mechanisms for service dependencies
- Health Checks: Comprehensive health checks for proper load balancer integration
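To make the idempotency point concrete, here is a minimal sketch of an idempotency-key wrapper: a retried operation with the same key returns the stored result instead of running again. The `IdempotentExecutor` class is hypothetical, and its in-memory dictionary is only a stand-in; in line with the stateless-design item above, a real service would keep these records in an external store.

```python
from typing import Any, Callable, Dict


class IdempotentExecutor:
    """Hypothetical helper: runs each keyed operation at most once."""

    def __init__(self) -> None:
        # In-memory stand-in for an external store (database, cache).
        self._results: Dict[str, Any] = {}

    def run(self, key: str, operation: Callable[[], Any]) -> Any:
        # A retry after an interruption reuses the recorded result.
        if key in self._results:
            return self._results[key]
        result = operation()
        self._results[key] = result
        return result
```

If a task is interrupted after the operation ran but before the caller received the response, the caller can retry with the same key on a replacement task without the side effect happening twice, provided the result store is shared.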
Deployment Strategy
- Mixed Capacity: Use 70% Spot + 30% On-Demand for optimal cost/availability balance
- Multi-AZ Distribution: Spread tasks across multiple availability zones
- Rolling Deployments: Minimize impact during deployments
- Deployment Circuit Breaker: Automatic rollback on failed deployments
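To see what a 70/30 mix means for a small service, the sketch below estimates how a desired task count splits across capacity providers: `base` tasks are placed first, then the remainder is divided in proportion to `weight`. The rounding rule (largest remainder) is an assumption of this example, and ECS's actual placement may round differently, so treat the result as an estimate. `estimate_task_split` is a hypothetical helper.

```python
from typing import Dict, List


def estimate_task_split(desired_count: int, strategy: List[dict]) -> Dict[str, int]:
    """Estimate tasks per capacity provider from base and weight values."""
    counts = {item["capacityProvider"]: 0 for item in strategy}
    remaining = desired_count

    # Step 1: satisfy each provider's base before applying weights.
    for item in strategy:
        take = min(item.get("base", 0), remaining)
        counts[item["capacityProvider"]] += take
        remaining -= take

    total_weight = sum(item["weight"] for item in strategy)
    if remaining == 0 or total_weight == 0:
        return counts

    # Step 2: split the rest by weight, using integer math to avoid float drift.
    remainders = []
    assigned = 0
    for item in strategy:
        share = remaining * item["weight"]
        whole = share // total_weight
        counts[item["capacityProvider"]] += whole
        assigned += whole
        remainders.append((share % total_weight, item["capacityProvider"]))

    # Step 3 (assumption): leftover tasks go to the largest remainders.
    leftover = remaining - assigned
    for _, name in sorted(remainders, reverse=True)[:leftover]:
        counts[name] += 1
    return counts


strategy = [
    {"capacityProvider": "FARGATE_SPOT", "weight": 70, "base": 0},
    {"capacityProvider": "FARGATE", "weight": 30, "base": 1},
]
```

Under these assumptions, `estimate_task_split(3, strategy)` returns `{"FARGATE_SPOT": 1, "FARGATE": 2}` and `estimate_task_split(10, strategy)` returns `{"FARGATE_SPOT": 6, "FARGATE": 4}`. With a base of 1 on FARGATE, a small service runs well below 70% Spot, and the mix only approaches 70/30 as the task count grows.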
Operational Excellence
- Monitoring: Comprehensive logging and metrics collection
- Alerting: Proactive alerts for interruptions and placement failures
- Testing: Regular chaos engineering to validate resilience
- Documentation: Clear runbooks for handling Spot-related incidents
Testing Spot Interruptions
Simulating Interruptions Locally
Test graceful shutdown behavior in your development environment:
# Start your service in a container
docker run -d --name freestate-test freestate/backend:latest
# Send SIGTERM to test graceful shutdown
docker kill --signal=SIGTERM freestate-test
# Monitor logs to verify graceful shutdown
docker logs -f freestate-test
# Verify exit code (should be 0 for clean shutdown)
docker inspect freestate-test --format='{{.State.ExitCode}}'
Simulating Interruptions in ECS
Test in your staging environment by manually stopping tasks:
# Stop a task to simulate Spot interruption
aws ecs stop-task --cluster freestate-staging --task arn:aws:ecs:us-east-1:123456789012:task/task-id --reason "Testing Spot interruption"
# Monitor service metrics during replacement
aws ecs describe-services --cluster freestate-staging --services freestate-backend --query 'services[0].events[0:5]'
Troubleshooting
Common Issues
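High Interruption Rate: If you see frequent Spot interruptions, consider increasing the On-Demand share of the capacity provider strategy (a higher FARGATE weight or base) or spreading tasks across more availability zones. Fargate has no instance types to change.
Placement Failures: If tasks fail to place because Spot capacity is unavailable, ECS does not automatically fall back to On-Demand capacity; the service scheduler keeps retrying until Spot capacity becomes available. The FARGATE base and weight in the capacity provider strategy keep a share of tasks on On-Demand capacity in the meantime.
Shutdown Timeout: If a service does not shut down within 120 seconds, it is forcefully terminated. Check application logs and optimize shutdown procedures.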
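When investigating a shutdown timeout, it can help to time each shutdown step against the overall budget. The sketch below runs steps in order, records how long each took, and skips steps once the budget is spent. `run_shutdown_steps` and the step names in the usage note are hypothetical; note that the sketch cannot preempt a step that is already running, so an individual slow step still needs its own timeout.

```python
import time
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_shutdown_steps(
    steps: List[Step], budget_seconds: float = 120.0
) -> List[Tuple[str, float, str]]:
    """Run steps in order; return (name, elapsed_seconds, status) per step."""
    deadline = time.monotonic() + budget_seconds
    report = []
    for name, step in steps:
        # Skip remaining steps once the overall budget has been used up.
        if time.monotonic() >= deadline:
            report.append((name, 0.0, "skipped"))
            continue
        started = time.monotonic()
        step()
        report.append((name, time.monotonic() - started, "done"))
    return report
```

For example, passing steps such as `("drain_requests", ...)` and `("close_db", ...)` and logging the returned report shows which step consumes most of the 120 seconds.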
Debugging Steps
- Check Service Events: Review ECS service events for interruption patterns
- Monitor CloudWatch Logs: Look for graceful shutdown logs and errors
- Verify Health Checks: Ensure health checks properly reflect service state
- Review Load Balancer Metrics: Check target deregistration timing
- Analyze EventBridge Events: Review Spot interruption event patterns
Cost Optimization
Spot instances provide significant cost savings while maintaining service reliability:
- Backend Services: 50-70% cost reduction with minimal availability impact
- Batch Processing: Ideal for non-time-critical workloads
- Development/Staging: Maximum cost savings for non-production environments
- Monitoring: Track cost savings vs. interruption overhead
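For a mixed strategy, the overall saving is lower than the Spot discount itself, because only the Spot share of tasks is discounted. The sketch below works through that arithmetic; it assumes the discount applies only to Spot tasks and ignores interruption overhead. `blended_savings` is a hypothetical helper.

```python
def blended_savings(spot_fraction: float, spot_discount: float) -> float:
    """Overall saving versus running every task on On-Demand capacity.

    spot_fraction: share of tasks on Spot (0.0 to 1.0)
    spot_discount: Spot discount relative to On-Demand (0.0 to 1.0)
    """
    if not (0.0 <= spot_fraction <= 1.0 and 0.0 <= spot_discount <= 1.0):
        raise ValueError("fractions must be between 0 and 1")
    # Only the Spot share of the bill is reduced by the discount.
    return spot_fraction * spot_discount


for discount in (0.5, 0.7):
    print(f"70% Spot at {discount:.0%} discount: {blended_savings(0.7, discount):.0%} saved")
```

With 70% of tasks on Spot, a 50-70% Spot discount works out to roughly 35-49% off the total compute bill under these assumptions; only an all-Spot environment, such as development or staging, sees the full discount.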
Security Considerations
Spot instances maintain the same security posture as On-Demand instances:
- IAM Roles: Same task and execution roles apply
- Network Security: VPC and security group configurations unchanged
- Secrets Management: AWS Secrets Manager integration works identically
- Encryption: Data in transit and at rest encryption maintained
Ready to Get Started? Contact our support team to enable Spot instance support for your FreeState deployment and start saving on infrastructure costs today.