
ElastiCache for Redis replication groups should have automatic failover enabled

When a Redis primary node fails and automatic failover is disabled, someone has to promote a replica manually. Until that happens, every write operation fails, and applications that depend on the cache layer degrade or fail outright. With automatic failover, ElastiCache detects the failure and promotes a healthy replica on its own, so write availability returns quickly without operator involvement.

Automatic failover has no separate charge. The cost is the replica nodes, which you already pay for. Enabling failover ensures those replicas take over when needed rather than sitting idle during an incident.

Retrofit consideration

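Enabling automatic failover on an existing replication group requires at least one read replica. Depending on which other settings change in the same apply, the modification may be deferred to the next maintenance window unless apply_immediately = true is set on aws_elasticache_replication_group.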
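A minimal sketch of the retrofit. It assumes a hypothetical existing group named existing-group that was created with a single node. The sketch raises the node count to two and enables failover. apply_immediately asks AWS to apply the change now rather than in the next maintenance window. If your provider version does not add the replica before it flips the failover flag, split the change into two applies: first the replica, then the flag.

```hcl
resource "aws_elasticache_replication_group" "existing" {
  replication_group_id = "existing-group" # hypothetical ID of the group being retrofitted
  description          = "Existing replication group"
  node_type            = "cache.t3.micro"

  # Failover needs a replica: raised from 1 to 2 for a previously single-node group.
  num_cache_clusters = 2

  automatic_failover_enabled = true

  # Apply the modification now instead of waiting for the next maintenance window.
  apply_immediately = true
}
```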

Implementation

Choose the approach that matches how you manage Terraform.

If you use terraform-aws-modules/elasticache/aws, set the module inputs shown below. You can later migrate to the compliance.tf module with minimal changes because it is designed to be compatible.

module "elasticache" {
  source  = "terraform-aws-modules/elasticache/aws"
  version = ">=1.0.0,<2.0.0"

  description          = "Redis cluster"
  engine               = "redis"
  engine_version       = "7.1"
  node_type            = "cache.t3.micro"
  num_cache_clusters   = 2 # one primary plus one replica
  replication_group_id = "abc123"
  subnet_ids           = ["subnet-12345678", "subnet-87654321"] # placeholders; use distinct subnets in different AZs
  vpc_id               = "vpc-12345678"

  # Required by this control; needs at least one replica.
  automatic_failover_enabled = true
}

If you use AWS provider resources directly, set the argument on aws_elasticache_replication_group. See the AWS provider documentation for that resource.

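variable "auth_token" {
  description = "Redis AUTH token (16-128 printable characters)"
  type        = string
  sensitive   = true
}

resource "aws_elasticache_replication_group" "this" {
  at_rest_encryption_enabled = true
  auth_token                 = var.auth_token # AUTH requires transit_encryption_enabled = true
  description                = "pofix example replication group"
  node_type                  = "cache.t3.micro"
  num_cache_clusters         = 2 # one primary plus one replica
  replication_group_id       = "pofix-abc123"
  snapshot_retention_limit   = 15
  subnet_group_name          = "example-subnet-group"
  transit_encryption_enabled = true

  # Required by this control; defaults to false when omitted.
  automatic_failover_enabled = true
}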

What this control checks

In Terraform, aws_elasticache_replication_group must set automatic_failover_enabled = true. The argument defaults to false, so omitting it fails the control.

Automatic failover also requires at least one replica. With cluster mode disabled, num_cache_clusters must be 2 or greater. With cluster mode enabled, replicas_per_node_group must be at least 1. A replication group with automatic_failover_enabled = true but only a single node fails at apply time. Setting multi_az_enabled = true (which itself requires automatic_failover_enabled = true) adds Availability Zone resilience and pairs well with this control, but the policy evaluates only the automatic_failover_enabled flag.
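With cluster mode enabled, the replica count moves to replicas_per_node_group. The minimal sketch below assumes a pre-existing subnet group named example-subnet-group and the default.redis7.cluster.on parameter group. The group ID, node type, and shard count are illustrative only.

```hcl
resource "aws_elasticache_replication_group" "sharded" {
  replication_group_id = "example-sharded" # hypothetical ID
  description          = "Cluster mode enabled example"
  engine               = "redis"
  engine_version       = "7.1"
  node_type            = "cache.t4g.small"
  parameter_group_name = "default.redis7.cluster.on" # cluster-mode parameter group
  subnet_group_name    = "example-subnet-group"      # assumed to exist already

  # Cluster mode: shards and replicas per shard replace num_cache_clusters.
  num_node_groups         = 2
  replicas_per_node_group = 1 # at least 1 so each shard can fail over

  automatic_failover_enabled = true
  multi_az_enabled           = true # optional; requires automatic failover
}
```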

Common pitfalls

  • Single-node groups fail at apply time, not silently

    Setting automatic_failover_enabled = true with num_cache_clusters = 1 causes a Terraform apply error. You need at least two cache clusters (one primary, one replica) for automatic failover to function. If num_cache_clusters is set dynamically, validate that it is always >= 2 whenever failover is enabled.

  • Cluster mode disabled vs enabled syntax differences

    When using cluster mode (sharding), replica count is controlled by replicas_per_node_group inside aws_elasticache_replication_group, not num_cache_clusters. Ensure replicas_per_node_group is at least 1 per shard. Mixing both arguments produces a conflict error.

  • Older node types may not support automatic failover

    Multi-AZ with automatic failover is not available on every combination of node type and engine version, and older cache.t2.* nodes are the usual culprit. If the API rejects automatic_failover_enabled = true for your node type, switch to a current-generation node type such as cache.t4g.*.
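The first pitfall can be caught at plan time rather than at the API call. The following minimal sketch uses a lifecycle precondition and assumes Terraform 1.2 or later. The variable names and group ID are hypothetical. The precondition only enforces the node count when failover is on. A group with failover off still passes this check but fails the control itself.

```hcl
variable "automatic_failover_enabled" {
  type    = bool
  default = true
}

variable "num_cache_clusters" {
  description = "Primary plus replicas (cluster mode disabled)."
  type        = number
  default     = 2
}

resource "aws_elasticache_replication_group" "guarded" {
  replication_group_id = "example-rg" # hypothetical ID
  description          = "Replication group with a guarded node count"
  node_type            = "cache.t4g.micro"
  num_cache_clusters   = var.num_cache_clusters

  automatic_failover_enabled = var.automatic_failover_enabled

  lifecycle {
    # Reject a single-node group with failover enabled before any API call is made.
    precondition {
      condition     = !var.automatic_failover_enabled || var.num_cache_clusters >= 2
      error_message = "Automatic failover requires num_cache_clusters >= 2 (one primary plus at least one replica)."
    }
  }
}
```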

Audit evidence

The primary evidence is AWS Config rule evaluations, or equivalent output from a CSPM tool, showing every AWS::ElastiCache::ReplicationGroup resource as COMPLIANT. Output from the aws elasticache describe-replication-groups CLI command should show AutomaticFailover: enabled for every group. Console screenshots confirming "Auto Failover: Enabled" serve as supplementary evidence.
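Between audits, a Terraform check block can flag a group that has drifted, including one managed outside the current configuration. The following minimal sketch assumes Terraform 1.5 or later and a hypothetical variable that holds the group ID. Failed check assertions surface as warnings and do not block the apply.

```hcl
variable "replication_group_id" {
  description = "ID of an existing replication group to verify (hypothetical)."
  type        = string
}

data "aws_elasticache_replication_group" "existing" {
  replication_group_id = var.replication_group_id
}

check "elasticache_auto_failover" {
  assert {
    # Mirrors the control: the flag must be true on the live group.
    condition     = data.aws_elasticache_replication_group.existing.automatic_failover_enabled
    error_message = "Automatic failover is disabled on this replication group."
  }
}
```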

For continuous assurance, an AWS Config conformance pack or Security Hub findings scoped to this control should show consistently compliant evaluations over time.
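The managed rule listed under Tool mappings can itself be deployed with Terraform to produce those evaluations. The following minimal sketch assumes that an AWS Config configuration recorder and delivery channel already exist in the account and Region. The rule name is arbitrary.

```hcl
# Assumes AWS Config recording is already enabled in this account and Region.
resource "aws_config_config_rule" "elasticache_auto_failover" {
  name = "elasticache-repl-grp-auto-failover-enabled"

  source {
    owner             = "AWS"                                        # AWS managed rule
    source_identifier = "ELASTICACHE_REPL_GRP_AUTO_FAILOVER_ENABLED" # identifier from Tool mappings
  }
}
```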


Tool mappings

Use these identifiers to cross-reference this control across tools, reports, and evidence.

  • Compliance.tf Control: elasticache_replication_group_auto_failover_enabled

  • AWS Config Managed Rule: ELASTICACHE_REPL_GRP_AUTO_FAILOVER_ENABLED

  • Checkov Check: CKV2_AWS_50

  • Powerpipe Control: aws_compliance.control.elasticache_replication_group_auto_failover_enabled

  • Prowler Checks: elasticache_redis_cluster_automatic_failover_enabled, elasticache_redis_cluster_multi_az_enabled

  • AWS Security Hub Control: ElastiCache.3

Last reviewed: 2026-03-09