DevOps 4 min read

Stop Wrestling YAML: How to Deploy 50 AI Models with Python Loops

Infrastructure as Code shouldn't be a copy-paste nightmare. Learn how to use Pulumi and Python to programmatically deploy scalable AI infrastructure without the YAML fatigue.


Moshiour Rahman


The “YAML Hell” Scenario

Imagine this: You work at HealthAI, a startup analyzing medical records. You have one amazing “Sentiment Analysis” model deployed on AWS ECS. It works great.

Then Sales closes a deal with 50 hospitals.

Each hospital needs:

  1. Its own isolated data bucket (HIPAA compliance).
  2. Its own dedicated inference API (performance isolation).
  3. Its own specific version of the model (some want v1, some want v2).

The “Old Way” (Terraform/YAML/CloudFormation): You start copy-pasting. You create hospital-a.tf, hospital-b.tf… by the 10th file, you’re questioning your career choices. You try for_each loops in HCL, but debugging the state file becomes a nightmare. You make a typo in Hospital #42’s config, and now production is down.

The “New Way” (Python + Pulumi): You write a for loop. You go home early.

Why Software Engineers Hate DevOps (And Why Pulumi Fixes It)

We tell developers to “shift left” and own their infrastructure, but then we hand them a domain-specific language (HCL) or a configuration markup (YAML) that lacks basic programming features.

  • Where are my loops? (Terraform’s count is a hack, not a loop).
  • Where is my abstraction? (Modules are heavy; classes are native).
  • Where is my IDE support? (I want IntelliSense, not a syntax highlighter).

Pulumi treats infrastructure as software. You define resources (EC2, S3, ECS) as objects in Python, TypeScript, or Go.

The Tutorial: Deploying the “HealthAI” Stack

Let’s build that 50-hospital infrastructure right now, in about 50 lines of Python.

Prerequisites

  • Pulumi CLI
  • AWS Credentials configured
  • Python 3.9+

mkdir health-ai-infra && cd health-ai-infra
pulumi new aws-python

1. The “Hospital” Abstraction

Instead of writing raw resources, we create a Python class. This is our “blueprint.”

import pulumi
from pulumi_aws import s3, ecs

class HospitalStack:
    def __init__(self, client_name: str, model_version: str, cluster_arn: str):
        """
        Deploys a dedicated S3 bucket and Fargate Service for a single client.
        """
        self.client_name = client_name
        
        # 1. Dedicated S3 Bucket for Patient Data
        self.bucket = s3.Bucket(f"{client_name}-records",
            acl="private",
            server_side_encryption_configuration={
                "rule": {"apply_server_side_encryption_by_default": {"sse_algorithm": "AES256"}}
            },
            tags={"Client": client_name, "Compliance": "HIPAA"}
        )

        # 2. Fargate Service (The AI Model)
        # Note: We assume a Cluster and Task Definition exist for brevity, 
        # but you could create those here too!
        
        self.service = ecs.Service(f"{client_name}-service",
            cluster=cluster_arn,
            desired_count=1,
            launch_type="FARGATE",
            task_definition=f"ai-model-{model_version}", # Dynamic versioning!
            network_configuration={
                "assign_public_ip": False,
                "subnets": ["subnet-abc", "subnet-xyz"], # Real subnet IDs here
                "security_groups": ["sg-12345"]
            },
            tags={"Client": client_name}
        )

        # Export the bucket name so we can find it later
        pulumi.export(f"{client_name}_bucket", self.bucket.id)

2. The “Business Logic” (The Loop)

Now, here is the magic. We don’t need 50 files. We need a list and a loop.

__main__.py:

import pulumi
import pulumi_aws as aws
from hospital_stack import HospitalStack # Import our class above

# Our "Database" of clients (could come from a JSON file, API, or database)
clients = [
    {"name": "metro-general", "version": "v1.2"},
    {"name": "city-care", "version": "v2.0"},
    {"name": "st-marys", "version": "v1.5"},
    # ... imagine 47 more lines here ...
]

# Shared Infrastructure (The Cluster)
cluster = aws.ecs.Cluster("main-ai-cluster")

# Deployment Loop
for client in clients:
    HospitalStack(
        client_name=client["name"],
        model_version=client["version"],
        cluster_arn=cluster.arn
    )

That’s it.

When you run pulumi up, Pulumi calculates the graph. It sees 50 buckets and 50 services. It creates them in parallel.
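Because clients is an ordinary Python list, it doesn't have to be hardcoded. A minimal sketch of loading it from a JSON file instead (the clients.json filename and load_clients helper are illustrative, not part of the tutorial above):

```python
import json
from pathlib import Path

def load_clients(path: str = "clients.json") -> list:
    """Read the client roster from a JSON file and do a basic sanity check."""
    clients = json.loads(Path(path).read_text())
    for client in clients:
        if not client.get("name") or not client.get("version"):
            raise ValueError(f"Client entry missing name/version: {client}")
    return clients

# clients = load_clients()  # then run the same deployment loop as before
```

Swapping the source later (an API call, a database query) only changes this one function; the deployment loop never notices.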

3. Handling Complexity (Conditionals)

Suddenly, Metro General calls. They want to pay double for “High Availability” (2 replicas instead of 1).

In Terraform, this requires refactoring variables or using messy logic. In Python?

# In __main__.py
clients = [
    {"name": "metro-general", "version": "v1.2", "tier": "premium"},
    {"name": "city-care", "version": "v2.0", "tier": "standard"},
]

# In HospitalStack.__init__, after adding a `tier: str = "standard"` parameter
desired_count = 2 if tier == "premium" else 1

self.service = ecs.Service(..., desired_count=desired_count)

This is just programming. You already know how to do this.

Who Needs This?

  1. Platform Engineers: Building “vending machines” where developers request resources via a simple config, and Python handles the heavy lifting.
  2. AI/ML Engineers: You know Python. Why learn HCL? Define your training jobs and inference endpoints in the language you use for modeling.
  3. Startups: You need to move fast. You don’t have time to manage 5000 lines of YAML.

Real Value: “Infrastructure as Software”

The real power isn’t just loops. It’s the ecosystem.

  • Validation: Use pydantic to validate your client configuration before Pulumi even runs.
  • External Data: Use requests to fetch the list of active clients from your CRM API dynamically during deployment.
  • Testing: Use pytest to verify your HospitalStack class actually sets the “HIPAA” tag correctly.
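The validation bullet can be sketched even without pydantic; here is a dependency-free version of the same idea using stdlib dataclasses (ClientConfig, VALID_TIERS, and the specific rules are illustrative assumptions, not part of the tutorial above):

```python
from dataclasses import dataclass

VALID_TIERS = {"standard", "premium"}

@dataclass(frozen=True)
class ClientConfig:
    name: str
    version: str
    tier: str = "standard"

    def __post_init__(self):
        # Fail fast with a clear message, long before `pulumi up` touches AWS.
        if not self.name or not self.name.replace("-", "").isalnum():
            raise ValueError(f"Invalid client name: {self.name!r}")
        if not self.version.startswith("v"):
            raise ValueError(f"Invalid model version: {self.version!r}")
        if self.tier not in VALID_TIERS:
            raise ValueError(f"Unknown tier: {self.tier!r}")

# validated = [ClientConfig(**c) for c in clients]  # raises before any deploy
```

With pydantic you would get the same behavior plus type coercion and nicer error reports, but the principle is identical: a typo in Hospital #42's config now fails in your terminal, not in production.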

Stop treating infrastructure like a configuration file. Treat it like the mission-critical software it is.



Moshiour Rahman

Software Architect & AI Engineer


Enterprise software architect with deep expertise in financial systems, distributed architecture, and AI-powered applications. Building large-scale systems at Fortune 500 companies. Specializing in LLM orchestration, multi-agent systems, and cloud-native solutions. I share battle-tested patterns from real enterprise projects.
