Terraform

The Production-Grade Infrastructure Checklist

July 10, 2024
Tags: production-grade infrastructure, terraform checklist, infrastructure as code, high availability infrastructure, infrastructure monitoring, cloud infrastructure best practices, infrastructure testing, scalability checklist, cloud security, Terraform Up and Running
Introduction

I'm reading a book called "Terraform Up and Running", and its Chapter 8, on production-grade infrastructure, is especially interesting. That chapter contains a production-grade infrastructure checklist that is so good I decided to copy it into a separate post for future reference.

The Production-Grade Infrastructure Checklist

| # | Task | Description | Example tools |
|---|------|-------------|---------------|
| 1 | Install | Install the software binaries and all dependencies. | Bash, Ansible, Docker, Packer |
| 2 | Configure | Configure the software at runtime. Includes port settings, TLS certs, service discovery, leaders, followers, replication, etc. | Chef, Ansible, Kubernetes |
| 3 | Provision | Provision the infrastructure. Includes servers, load balancers, network configuration, firewall settings, IAM permissions, etc. | Terraform, CloudFormation |
| 4 | Deploy | Deploy the service on top of the infrastructure. Roll out updates with no downtime. Includes blue-green, rolling, and canary deployments. | ASG, Kubernetes, ECS |
| 5 | High availability | Withstand outages of individual processes, servers, services, datacenters, and regions. | Multi-datacenter, multi-region |
| 6 | Scalability | Scale up and down in response to load. Scale horizontally (more servers) and/or vertically (bigger servers). | Auto scaling, replication |
| 7 | Performance | Optimize CPU, memory, disk, network, and GPU usage. Includes query tuning, benchmarking, load testing, and profiling. | Dynatrace, Valgrind, VisualVM |
| 8 | Networking | Configure static and dynamic IPs, ports, service discovery, firewalls, DNS, SSH access, and VPN access. | VPCs, firewalls, Route 53 |
| 9 | Security | Encryption in transit (TLS) and on disk, authentication, authorization, secrets management, server hardening. | ACM, Let's Encrypt, KMS, Vault |
| 10 | Metrics | Availability metrics, business metrics, app metrics, server metrics, events, observability, tracing, and alerting. | CloudWatch, Datadog, Grafana |
| 11 | Logs | Rotate logs on disk. Aggregate log data to a central location. | Elastic Stack, Sumo Logic |
| 12 | Data backup | Make backups of DBs, caches, and other data on a scheduled basis. Replicate to a separate region/account. | AWS Backup, RDS snapshots |
| 13 | Cost optimization | Pick proper instance types, use spot and reserved instances, use auto scaling, and clean up unused resources. | Auto scaling, Infracost |
| 14 | Documentation | Document your code, architecture, and practices. Create playbooks to respond to incidents. | READMEs, wikis, Slack, IaC |
| 15 | Tests | Write automated tests for your infrastructure code. Run tests after every commit and nightly. | Terratest, tflint, OPA, InSpec |

Author's Quote

Every time you're working on a new piece of infrastructure, go through this checklist. Not every single piece of infrastructure needs every single item on the list, but you should consciously and explicitly document which items you've implemented, which ones you've decided to skip, and why.
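
One way to make that "document what you implemented, what you skipped, and why" habit concrete is to keep the decisions in a small, reviewable data structure. The sketch below is my own illustration, not something from the book: the names `Status`, `Decision`, `CHECKLIST`, and `review` are hypothetical, and the example decisions are made up. It reports checklist items that have no recorded decision or that were skipped without a stated reason.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    IMPLEMENTED = "implemented"
    SKIPPED = "skipped"
    UNDECIDED = "undecided"


# The 15 task names from the checklist table above.
CHECKLIST = [
    "Install", "Configure", "Provision", "Deploy", "High availability",
    "Scalability", "Performance", "Networking", "Security", "Metrics",
    "Logs", "Data backup", "Cost optimization", "Documentation", "Tests",
]


@dataclass
class Decision:
    status: Status
    reason: str = ""  # expected to be non-empty when an item is skipped


def review(decisions: dict[str, Decision]) -> list[str]:
    """Return problems: items with no decision and skips without a reason."""
    problems = []
    for task in CHECKLIST:
        decision = decisions.get(task)
        if decision is None or decision.status is Status.UNDECIDED:
            problems.append(f"{task}: no decision recorded")
        elif decision.status is Status.SKIPPED and not decision.reason.strip():
            problems.append(f"{task}: skipped without a reason")
    return problems


if __name__ == "__main__":
    # Hypothetical decisions for an internal nightly batch job.
    decisions = {task: Decision(Status.IMPLEMENTED) for task in CHECKLIST}
    decisions["High availability"] = Decision(
        Status.SKIPPED, "nightly batch job; a rerun is acceptable"
    )
    decisions["Performance"] = Decision(Status.SKIPPED)  # reason missing
    del decisions["Cost optimization"]  # never discussed
    for problem in review(decisions):
        print(problem)
```

A check like this could run in CI next to the infrastructure code, so a skipped item without a justification fails the build rather than going unnoticed. That placement is a design suggestion on my part, not a practice the book prescribes.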

Conclusions

The production-grade infrastructure checklist in this article is thorough, and I'm looking forward to using it in my day job.

I also highly recommend reading the book "Terraform Up and Running", where I found this checklist. It contains much more practical advice about using Terraform and building modern infrastructure.

Meet Vitalii Honchar

Senior Software Engineer specializing in high-load systems, AI/ML infrastructure, and cloud-native architectures. With experience at companies like Pinterest, Revolut, Form3, and Ajax Systems, I focus on building scalable, efficient, and robust systems that solve complex technical challenges.
