SageMaker models should be in a VPC

SageMaker models deployed without a VPC configuration run in AWS-managed network space. Inference traffic to downstream resources like S3, DynamoDB, or RDS must traverse the public internet or rely solely on IAM for access control. Placing models inside your VPC lets you apply security groups, use VPC endpoints, and keep data paths entirely within your private network.

This matters most when models call internal APIs or access sensitive data stores during inference. Without VPC placement, network-level segmentation is off the table, and a misconfigured IAM policy becomes the only barrier between your model's execution environment and a data exfiltration path.

Retrofit consideration

Changing VPC configuration on an existing SageMaker model requires creating a new model resource. The target VPC must already have subnets in the correct availability zones, appropriate security group rules, and VPC endpoints for S3 and ECR provisioned before the new model is created. Any endpoints referencing the old model will need updating, which causes downtime unless you stage the cutover with a blue-green endpoint configuration.
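
One way to stage that cutover in Terraform is sketched below. It assumes you can let Terraform generate the model and endpoint configuration names (both `name` arguments are omitted), so the replacement resources can exist alongside the old ones. The endpoint name, variant name, and instance type are hypothetical placeholders. With `create_before_destroy`, a `vpc_config` change should create the new model and endpoint configuration first, update the endpoint in place, and only then remove the old resources. Verify the ordering with `terraform plan` before relying on it.

```hcl
resource "aws_sagemaker_model" "this" {
  # name is omitted so Terraform assigns a unique one and the
  # replacement model never collides with the model it replaces.
  execution_role_arn = "arn:aws:iam::123456789012:role/example-role"

  primary_container {
    image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/sagemaker-prebuilt-image"
  }

  vpc_config {
    security_group_ids = ["sg-abc12345"]
    subnets            = ["subnet-12345678", "subnet-87654321"]
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_sagemaker_endpoint_configuration" "this" {
  # name is omitted here as well, for the same reason.
  production_variants {
    variant_name           = "primary"     # hypothetical variant name
    model_name             = aws_sagemaker_model.this.name
    initial_instance_count = 1
    instance_type          = "ml.m5.large" # hypothetical instance type
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_sagemaker_endpoint" "this" {
  name                 = "example-endpoint" # hypothetical endpoint name
  endpoint_config_name = aws_sagemaker_endpoint_configuration.this.name
}
```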

Implementation

Use AWS provider resources directly. See the provider docs for the resource involved: aws_sagemaker_model.

resource "aws_sagemaker_model" "this" {
  execution_role_arn = "arn:aws:iam::123456789012:role/example-role"
  name               = "pofix-abc123"

  primary_container {
    image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/sagemaker-prebuilt-image"
  }

  vpc_config {
    security_group_ids = ["sg-abc12345"]
    subnets            = ["subnet-12345678", "subnet-87654321"]
  }
}

What this control checks

The policy engine checks that each aws_sagemaker_model resource includes a vpc_config block, with security_group_ids and subnets both containing at least one value. A model without the block, or with empty lists for either argument, fails. The subnets should be private to prevent inference containers from receiving public IP addresses. The referenced security groups control inbound and outbound traffic on the model's elastic network interfaces in those subnets. VPC endpoints or a NAT gateway must exist in the VPC for the model to reach ECR and S3, but those are infrastructure prerequisites, not arguments on the model resource itself.
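
If models are created through a shared module, the same condition can be enforced at plan time with input validation. The following is a minimal sketch under that assumption; the variable names `subnet_ids` and `security_group_ids` are hypothetical and not part of the control itself.

```hcl
variable "subnet_ids" {
  description = "Private subnets for the model's network interfaces."
  type        = list(string)

  validation {
    condition     = length(var.subnet_ids) > 0
    error_message = "Provide at least one subnet ID so vpc_config.subnets is not empty."
  }
}

variable "security_group_ids" {
  description = "Security groups attached to the model's network interfaces."
  type        = list(string)

  validation {
    condition     = length(var.security_group_ids) > 0
    error_message = "Provide at least one security group ID so vpc_config.security_group_ids is not empty."
  }
}

resource "aws_sagemaker_model" "this" {
  execution_role_arn = "arn:aws:iam::123456789012:role/example-role"

  primary_container {
    image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/sagemaker-prebuilt-image"
  }

  # Both lists are guaranteed non-empty by the validations above.
  vpc_config {
    security_group_ids = var.security_group_ids
    subnets            = var.subnet_ids
  }
}
```

This only guards models created through the module. Models declared elsewhere still depend on the policy check.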

Common pitfalls

  • Missing VPC endpoints cause deployment timeouts

    When vpc_config places the model in a private subnet with no NAT gateway or VPC endpoints, SageMaker can't pull container images from ECR or download model artifacts from S3. The CreateModel call succeeds, but endpoint creation and batch transform jobs will fail with timeout errors. You need aws_vpc_endpoint resources for com.amazonaws.<region>.s3 (gateway type) and com.amazonaws.<region>.ecr.dkr plus com.amazonaws.<region>.ecr.api (interface type) before any inference infrastructure can come up.

  • Security groups block SageMaker runtime traffic

    Security groups in security_group_ids must allow outbound HTTPS (port 443) to the VPC endpoint addresses or the NAT gateway. Security groups are allow-only, so a group with no egress rules cuts the model container off from AWS services entirely. DNS resolution also has to work: set enable_dns_support and enable_dns_hostnames to true on the VPC, or the container won't be able to resolve endpoint hostnames.

  • Subnet availability zone mismatch with endpoints

    Endpoint deployments fail if your subnets don't cover the availability zones where SageMaker needs to place instances. Verify that the subnets in vpc_config.subnets span all AZs your production variant instance types support, and that interface VPC endpoint subnet associations include the same AZs.

  • Replacing a model forces endpoint redeployment

    aws_sagemaker_model is immutable with respect to vpc_config. Any change destroys and recreates the model. If aws_sagemaker_endpoint_configuration references that model by name, and aws_sagemaker_endpoint references that configuration, the whole chain needs updating. Plan for downtime, or stage the cutover using a blue-green endpoint configuration before running the Terraform apply.
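
For the first two pitfalls above, the prerequisites might look roughly like the sketch below. It assumes a recent AWS provider that includes the `aws_vpc_security_group_egress_rule` and `aws_vpc_security_group_ingress_rule` resources. All variable names and security group name prefixes are hypothetical. It covers only the S3 and ECR endpoints named above; your workload may need more, for example for logging. Note that Terraform removes the default allow-all egress rule from security groups it creates, so the egress rules here are required rather than optional.

```hcl
# Hypothetical inputs; substitute your own region, VPC, subnets, and route tables.
variable "region" {
  type    = string
  default = "us-east-1"
}

variable "vpc_id" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}

variable "private_route_table_ids" {
  type = list(string)
}

# Attached to the model through vpc_config.security_group_ids.
resource "aws_security_group" "model" {
  name_prefix = "sagemaker-model-"
  vpc_id      = var.vpc_id
}

# Attached to the interface endpoints.
resource "aws_security_group" "vpce" {
  name_prefix = "sagemaker-vpce-"
  vpc_id      = var.vpc_id
}

# Gateway endpoint for S3 (model artifacts and image layers).
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}

# Interface endpoints for the ECR API and the Docker registry.
resource "aws_vpc_endpoint" "ecr" {
  for_each = toset(["ecr.api", "ecr.dkr"])

  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.${each.key}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.vpce.id]
  private_dns_enabled = true
}

# Model to interface endpoints on HTTPS.
resource "aws_vpc_security_group_egress_rule" "model_to_vpce" {
  security_group_id            = aws_security_group.model.id
  referenced_security_group_id = aws_security_group.vpce.id
  ip_protocol                  = "tcp"
  from_port                    = 443
  to_port                      = 443
}

# Model to S3 through the gateway endpoint's prefix list.
resource "aws_vpc_security_group_egress_rule" "model_to_s3" {
  security_group_id = aws_security_group.model.id
  prefix_list_id    = aws_vpc_endpoint.s3.prefix_list_id
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
}

# Interface endpoints accept HTTPS from the model only.
resource "aws_vpc_security_group_ingress_rule" "vpce_from_model" {
  security_group_id            = aws_security_group.vpce.id
  referenced_security_group_id = aws_security_group.model.id
  ip_protocol                  = "tcp"
  from_port                    = 443
  to_port                      = 443
}
```

The model's `vpc_config.security_group_ids` would then reference `aws_security_group.model.id`, and `private_dns_enabled` relies on the VPC DNS settings described above.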

Audit evidence

AWS Config rule evaluations showing all AWS::SageMaker::Model resources as COMPLIANT are the primary evidence, confirming each has a populated VpcConfig. The DescribeModel API output should include VpcConfig.Subnets and VpcConfig.SecurityGroupIds with non-empty arrays. Console screenshots of the model detail page showing VPC, subnets, and security groups under the 'Network' section can supplement the API output.

For continuous compliance, Config conformance pack results or Security Hub findings scoped to this control work well. Reviewing CloudTrail CreateModel events confirms no model was created without VPC parameters during the audit period.
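
If the Config rule is not yet deployed, it can be managed alongside the models. The sketch below uses the managed rule identifier listed under Tool mappings. The rule name is a hypothetical choice, and the sketch assumes an AWS Config configuration recorder already exists in the account and region and records SageMaker model resources.

```hcl
resource "aws_config_config_rule" "sagemaker_model_in_vpc" {
  name = "sagemaker-model-in-vpc" # hypothetical rule name

  source {
    owner             = "AWS"
    source_identifier = "SAGEMAKER_MODEL_IN_VPC"
  }
}
```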

Tool mappings

Use these identifiers to cross-reference this control across tools, reports, and evidence.

  • Compliance.tf Control: sagemaker_model_in_vpc

  • AWS Config Managed Rule: SAGEMAKER_MODEL_IN_VPC

  • Powerpipe Control: aws_compliance.control.sagemaker_model_in_vpc

  • Prowler Check: sagemaker_models_vpc_settings_configured

Last reviewed: 2026-03-09