The Technical Architect (TA) is responsible for the invisible qualities of a system: performance, reliability, security, scalability, cost-efficiency, and operational simplicity. While the Solution Architect focuses on what to build, the TA focuses on how it runs — in production, under load, over time, when things go wrong. In the AI era, the TA gains a particularly powerful set of tools for the work that has historically been the most time-consuming: IaC generation, cost modelling, and non-functional requirements analysis.


The TA’s Core Responsibilities

The Technical Architect typically owns:

  • Infrastructure design: Cloud architecture, network topology, data residency, managed services vs custom
  • Non-functional requirements (NFRs): Performance targets, availability SLAs (e.g. 99.9%), disaster recovery objectives (RTO/RPO), security posture
  • Infrastructure as Code (IaC): Terraform, Bicep, CDK — keeping infra reproducible and version-controlled
  • Cost governance: Cloud cost budgets, right-sizing recommendations, Reserved Instance/Savings Plan strategies
  • Operational observability: Logging, tracing, alerting strategy
  • Capacity planning: Scaling strategies for anticipated growth

Where AI Changes the TA Game

AI Technical Architect Blueprint

1. IaC Generation (AI-accelerated)

Writing Terraform or Bicep from scratch is time-consuming and error-prone. AI can generate initial IaC from a plain-language infrastructure description.

Prompt example:

Generate Terraform code for this infrastructure:
- Azure App Service: .NET 10 application, 2 environments (staging, prod)
- Azure SQL: General Purpose, 4 vCores, geo-redundant backup
- Azure Front Door: Global load balancing with WAF policy
- Azure Key Vault: Secrets storage for connection strings
- App configuration: Separate per environment via variable files

Requirements:
- Tagging strategy: environment, product, cost-centre
- Production environment locked (prevent accidental destroy)
- Outputs: all connection strings, endpoint URLs

Use Terraform best practices: separate modules per resource type, remote state in Azure Blob Storage

The TA reviews every line of generated IaC before it is used. AI IaC commonly requires corrections in:

  • IAM/RBAC scope (AI often over-provisions permissions)
  • Networking (subnet CIDR ranges, NSG rules)
  • High availability configuration (AI often generates single-AZ by default)

2. Cost Modelling and Right-Sizing

AI can analyse cloud bills (exported as CSV) and infrastructure manifests to identify:

  • Over-provisioned resources (CPU/memory utilisation < 20%)
  • Idle resources (dev environments running 24/7)
  • Opportunity for Reserved Instance coverage
  • Data egress costs that can be reduced by CDN configuration

Prompt example:

Analyse this Azure Cost Export:
[paste CSV or JSON]

Identify:
1. Top 5 cost drivers
2. Resources with >50% over-provisioning potential (based on utilisation data)
3. Environments running outside business hours (cost saving opportunity)
4. Estimated annual saving from each recommendation

Our budget target: [£X/month]
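
The mechanical part of such a review is straightforward to sketch with Python's standard library. The column names and figures below are invented for illustration; real Azure cost exports use a different schema:

```python
import csv
import io

# Hypothetical cost-export rows; a real Azure export has different columns.
SAMPLE = """resource,monthly_cost_gbp,avg_cpu_util
sql-prod,1200,0.55
appservice-prod,800,0.62
appservice-dev,400,0.08
vm-batch,300,0.15
frontdoor,250,0.90
"""

def cost_review(csv_text, util_threshold=0.20):
    """Return (top cost drivers, resources under the utilisation threshold)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_cost = sorted(rows, key=lambda r: float(r["monthly_cost_gbp"]), reverse=True)
    top_drivers = [r["resource"] for r in by_cost[:5]]
    over_provisioned = [
        r["resource"] for r in rows if float(r["avg_cpu_util"]) < util_threshold
    ]
    return top_drivers, over_provisioned

top, over = cost_review(SAMPLE)
```

The AI's value is in interpreting the output (is the dev environment idle by design, or by neglect?); the aggregation itself is deterministic and should stay that way.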

3. NFR Analysis and Validation

When a project’s NFRs are specified (or need to be discovered), AI can:

  • Calculate whether the proposed architecture meets stated SLAs
  • Identify NFRs that are missing or inconsistent
  • Suggest monitoring and alerting configurations that validate NFR compliance
  • Model failure scenarios and their impact on RTO/RPO objectives

Prompt example:

Our architecture:
[describe architecture]

NFR targets:
- Availability: 99.9% (allows 8.76 hrs downtime/year)
- RTO: 4 hours (system restored within 4 hours of failure)
- RPO: 1 hour (maximum 1 hour data loss)
- Response time: P95 < 500ms under 1000 concurrent users

Analyse:
1. Does the architecture meet each NFR? Justify.
2. Which NFRs are at risk given the architecture?
3. What additional infrastructure is needed to meet [specific NFR]?
4. Design a DR runbook for the most likely failure scenario
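
Question 1 is partly arithmetic the TA can check directly: availabilities of components in series multiply, and the SLA fixes an annual downtime budget. A minimal sketch (the per-component figures are illustrative, not quoted Azure SLAs):

```python
def serial_availability(*components):
    """Composite availability of components in series: all must be up."""
    a = 1.0
    for c in components:
        a *= c
    return a

def downtime_hours_per_year(availability):
    return (1 - availability) * 24 * 365

# Illustrative figures for Front Door, App Service and Azure SQL in series.
composite = serial_availability(0.9999, 0.9995, 0.9999)
meets_sla = composite >= 0.999            # target: 99.9%
budget = downtime_hours_per_year(0.999)   # annual downtime budget in hours
```

Note that three components each individually above 99.9% still compose to roughly 99.93%; one more serial dependency at 99.9% would push the composite below the target.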

4. Observability Configuration

AI generates initial Prometheus/Grafana alert rules, Azure Monitor queries, and CloudWatch dashboards from the system’s SLA targets and known failure modes.

Prompt example:

Given these SLAs: [paste SLAs]
And these known failure modes: [database unavailable, third-party API timeout, memory leak under sustained load]

Generate:
1. Azure Monitor alert rules (YAML) with appropriate thresholds
2. A 4-panel Grafana dashboard layout: golden signals (latency, traffic, errors, saturation)
3. A runbook entry for each alert: probable cause, investigation steps, remediation
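
One way to derive alert thresholds from an SLA is error-budget burn rate, as in standard SRE practice: a 99.9% SLA leaves a 0.1% error budget, and an alert fires when errors consume it far faster than the budgeted rate. A sketch (the 14.4x figure is the conventional fast-burn threshold for a 30-day window; tune for your context):

```python
def burn_rate(error_ratio, slo=0.999):
    """How many times faster than budget errors are being consumed."""
    budget = 1 - slo          # e.g. 0.001 for a 99.9% SLO
    return error_ratio / budget

def should_page(error_ratio, threshold=14.4):
    """Fast-burn page: at 14.4x, a 30-day budget is gone in about 2 days."""
    return burn_rate(error_ratio) >= threshold

page = should_page(0.02)   # 2% of requests failing over the window
```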

Non-Functional Requirements: The TA’s Checklist

Every system needs NFRs defined before development starts. AI assists in generating a complete NFR checklist from the product context:

  • Performance: Response time targets (P50/P95/P99), throughput (requests/sec)
  • Availability: Uptime SLA (e.g. 99.9%), maintenance windows
  • Recovery: RTO (how fast restored), RPO (how much data loss tolerable)
  • Scalability: Horizontal or vertical? Auto-scale triggers? Maximum scale target?
  • Security: Data classification, encryption at rest/transit, authentication standard
  • Compliance: GDPR, ISO 27001, PCI-DSS, HIPAA — as applicable
  • Observability: Log retention, tracing coverage, alert response SLA
  • Cost: Monthly budget ceiling, showback/chargeback strategy

Rule: Every NFR must have a measurable acceptance criterion. “The system should be fast” is not an NFR.
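
The rule can be made concrete by expressing an NFR as an executable check, for example the P95 target run against latencies collected by a load-test tool. A sketch (the sample latencies are invented):

```python
import math

def p95(samples_ms):
    """Nearest-rank 95th percentile of a latency sample."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered)) - 1
    return ordered[rank]

# NFR: P95 < 500 ms. Invented sample of 20 request latencies in ms.
latencies = [120, 95, 300, 180, 210, 450, 130, 160, 220, 90,
             110, 140, 480, 200, 170, 95, 105, 260, 310, 700]
nfr_met = p95(latencies) < 500
```

An NFR that can be asserted in CI is one that cannot silently drift.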


The Human-Irreplaceable TA Work

Platform judgment: Knowing when managed services are worth the cost, when vendor lock-in is an acceptable risk, when “build it yourself” is justified — this comes from experience, not documentation. AI can list trade-offs; TAs must make calls.

Incident command support: When a production system fails in a way that wasn’t anticipated, the TA needs to diagnose the failure, propose and execute a recovery, and then lead the post-incident review. AI assists with diagnosis; humans drive the recovery.

Cost vs reliability trade-offs: Deciding whether an extra £2,000/month for geo-redundant storage is worth the RPO improvement requires understanding the business risk appetite. This is not a technical calculation; it is a business judgment that the TA must own.

Regulatory interpretation: GDPR Article 32 says “appropriate technical and organisational measures” — but what is “appropriate” for this product, this data, this team? A TA must interpret regulatory requirements in the specific context of their system. AI can summarise the regulation; humans must judge the implementation.


IaC Governance in AI Teams

In an AI-augmented team, IaC generation is fast — which means mistakes can be made quickly. Governance controls:

  1. No IaC runs without a plan review — terraform plan output must be reviewed by a human before apply
  2. Production IaC changes require two approvals — TA + Tech Lead
  3. Destructive changes require explicit flag — AI-generated IaC that destroys resources must be explicitly confirmed
  4. State is protected — Remote state with state locking, no local state in production
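
Rules 1 and 3 can be partially automated: terraform show -json emits a resource_changes array, and any change whose actions include "delete" is destructive (a plain delete or a replace). A minimal gate script — the sample plan below is abbreviated and invented:

```python
import json

def destructive_changes(plan_json):
    """Return addresses of resources the plan would delete or replace."""
    plan = json.loads(plan_json)
    return [
        change["address"]
        for change in plan.get("resource_changes", [])
        if "delete" in change["change"]["actions"]
    ]

# Abbreviated, invented example of terraform show -json output.
SAMPLE_PLAN = json.dumps({
    "resource_changes": [
        {"address": "azurerm_app_service.staging",
         "change": {"actions": ["update"]}},
        {"address": "azurerm_sql_database.prod",
         "change": {"actions": ["delete", "create"]}},  # a replacement
    ]
})

flagged = destructive_changes(SAMPLE_PLAN)
# A CI gate would fail here and require explicit human confirmation.
```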

Tools for the AI Technical Architect

  • Terraform + AI: IaC generation, plan analysis
  • Azure Cost Management: Cost export analysis
  • Infracost: Cost estimation before terraform apply
  • Checkov / tfsec: IaC security scanning (AI-assisted)
  • Azure Monitor / Prometheus: Observability configuration
  • k6 / Gatling: Performance testing against NFR targets
  • Claude: NFR analysis, DR planning, runbook generation

Previous: Part 7 — The AI Quality Engineer ←
Next: Part 9 — The AI Security Engineer →

This is Part 8 of the AI-Powered Software Teams series.
