The Technical Architect (TA) is responsible for the invisible qualities of a system: performance, reliability, security, scalability, cost-efficiency, and operational simplicity. While the Solution Architect focuses on what to build, the TA focuses on how it runs — in production, under load, over time, when things go wrong. In the AI era, the TA gains a particularly powerful set of tools for the work that has historically been the most time-consuming: IaC generation, cost modelling, and non-functional requirements analysis.


The TA’s Core Responsibilities

The Technical Architect typically owns:

  • Infrastructure design: Cloud architecture, network topology, data residency, managed services vs custom
  • Non-functional requirements (NFRs): Performance targets, availability SLAs (e.g. 99.9%), disaster recovery objectives (RTO/RPO), security posture
  • Infrastructure as Code (IaC): Terraform, Bicep, CDK — keeping infra reproducible and version-controlled
  • Cost governance: Cloud cost budgets, right-sizing recommendations, Reserved Instance/Savings Plan strategies
  • Operational observability: Logging, tracing, alerting strategy
  • Capacity planning: Scaling strategies for anticipated growth

Where AI Changes the TA Game

AI Technical Architect Blueprint

1. IaC Generation (AI-accelerated)

Writing Terraform or Bicep from scratch is time-consuming and error-prone. AI can generate initial IaC from a plain-language infrastructure description.

Prompt example:

Generate Terraform code for this infrastructure:
- Azure App Service: .NET 10 application, 2 environments (staging, prod)
- Azure SQL: General Purpose, 4 vCores, geo-redundant backup
- Azure Front Door: Global load balancing with WAF policy
- Azure Key Vault: Secrets storage for connection strings
- App configuration: Separate per environment via variable files

Requirements:
- Tagging strategy: environment, product, cost-centre
- Production environment locked (prevent accidental destroy)
- Outputs: all connection strings, endpoint URLs

Use Terraform best practices: separate modules per resource type, remote state in Azure Blob Storage

The TA reviews every line of generated IaC before it is used. AI IaC commonly requires corrections in:

  • IAM/RBAC scope (AI often over-provisions permissions)
  • Networking (subnet CIDR ranges, NSG rules)
  • High availability configuration (AI often generates single-AZ by default)

2. Cost Modelling and Right-Sizing

AI can analyse cloud bills (exported as CSV) and infrastructure manifests to identify:

  • Over-provisioned resources (CPU/memory utilisation < 20%)
  • Idle resources (dev environments running 24/7)
  • Opportunity for Reserved Instance coverage
  • Data egress costs that can be reduced by CDN configuration

Prompt example:

Analyse this Azure Cost Export:
[paste CSV or JSON]

Identify:
1. Top 5 cost drivers
2. Resources with >50% over-provisioning potential (based on utilisation data)
3. Environments running outside business hours (cost saving opportunity)
4. Estimated annual saving from each recommendation

Our budget target: [£X/month]
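
The mechanical part of such a review is straightforward to sketch with Python's standard library. The column names and figures below are invented for illustration; real Azure cost exports use a different schema:

```python
import csv
import io

# Hypothetical cost-export rows; a real Azure export has different columns.
SAMPLE = """resource,monthly_cost_gbp,avg_cpu_util
sql-prod,1200,0.55
appservice-prod,800,0.62
appservice-dev,400,0.08
vm-batch,300,0.15
frontdoor,250,0.90
"""

def cost_review(csv_text, util_threshold=0.20):
    """Return (top cost drivers, resources under the utilisation threshold)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_cost = sorted(rows, key=lambda r: float(r["monthly_cost_gbp"]), reverse=True)
    top_drivers = [r["resource"] for r in by_cost[:5]]
    over_provisioned = [
        r["resource"] for r in rows if float(r["avg_cpu_util"]) < util_threshold
    ]
    return top_drivers, over_provisioned

top, over = cost_review(SAMPLE)
```

The AI's value is in interpreting the output (is the dev environment idle by design, or by neglect?); the aggregation itself is deterministic and should stay that way.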

3. NFR Analysis and Validation

When a project’s NFRs are specified (or need to be discovered), AI can:

  • Calculate whether the proposed architecture meets stated SLAs
  • Identify NFRs that are missing or inconsistent
  • Suggest monitoring and alerting configurations that validate NFR compliance
  • Model failure scenarios and their impact on RTO/RPO objectives

Prompt example:

Our architecture:
[describe architecture]

NFR targets:
- Availability: 99.9% (allows 8.76 hrs downtime/year)
- RTO: 4 hours (system restored within 4 hours of failure)
- RPO: 1 hour (maximum 1 hour data loss)
- Response time: P95 < 500ms under 1000 concurrent users

Analyse:
1. Does the architecture meet each NFR? Justify.
2. Which NFRs are at risk given the architecture?
3. What additional infrastructure is needed to meet [specific NFR]?
4. Design a DR runbook for the most likely failure scenario
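
Question 1 is partly arithmetic the TA can check directly: availabilities of components in series multiply, and the SLA fixes an annual downtime budget. A minimal sketch (the per-component figures are illustrative, not quoted Azure SLAs):

```python
def serial_availability(*components):
    """Composite availability of components in series: all must be up."""
    a = 1.0
    for c in components:
        a *= c
    return a

def downtime_hours_per_year(availability):
    return (1 - availability) * 24 * 365

# Illustrative figures for Front Door, App Service and Azure SQL in series.
composite = serial_availability(0.9999, 0.9995, 0.9999)
meets_sla = composite >= 0.999            # target: 99.9%
budget = downtime_hours_per_year(0.999)   # annual downtime budget in hours
```

Note that three components each individually above 99.9% still compose to roughly 99.93%; one more serial dependency at 99.9% would push the composite below the target.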

4. Observability Configuration

AI generates initial Prometheus/Grafana alert rules, Azure Monitor queries, and CloudWatch dashboards from the system’s SLA targets and known failure modes.

Prompt example:

Given these SLAs: [paste SLAs]
And these known failure modes: [database unavailable, third-party API timeout, memory leak under sustained load]

Generate:
1. Azure Monitor alert rules (YAML) with appropriate thresholds
2. A 4-panel Grafana dashboard layout: golden signals (latency, traffic, errors, saturation)
3. A runbook entry for each alert: probable cause, investigation steps, remediation
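
One way to derive alert thresholds from an SLA is error-budget burn rate, as in standard SRE practice: a 99.9% SLA leaves a 0.1% error budget, and an alert fires when errors consume it far faster than the budgeted rate. A sketch (the 14.4x figure is the conventional fast-burn threshold for a 30-day window; tune for your context):

```python
def burn_rate(error_ratio, slo=0.999):
    """How many times faster than budget errors are being consumed."""
    budget = 1 - slo          # e.g. 0.001 for a 99.9% SLO
    return error_ratio / budget

def should_page(error_ratio, threshold=14.4):
    """Fast-burn page: at 14.4x, a 30-day budget is gone in about 2 days."""
    return burn_rate(error_ratio) >= threshold

page = should_page(0.02)   # 2% of requests failing over the window
```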

Non-Functional Requirements: The TA’s Checklist

Every system needs NFRs defined before development starts. AI assists in generating a complete NFR checklist from the product context:

  • Performance: Response time targets (P50/P95/P99), throughput (requests/sec)
  • Availability: Uptime SLA (e.g. 99.9%), maintenance windows
  • Recovery: RTO (how fast restored), RPO (how much data loss tolerable)
  • Scalability: Horizontal or vertical? Auto-scale triggers? Maximum scale target?
  • Security: Data classification, encryption at rest/transit, authentication standard
  • Compliance: GDPR, ISO 27001, PCI-DSS, HIPAA — as applicable
  • Observability: Log retention, tracing coverage, alert response SLA
  • Cost: Monthly budget ceiling, showback/chargeback strategy

Rule: Every NFR must have a measurable acceptance criterion. “The system should be fast” is not an NFR.
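
The rule can be made concrete by expressing an NFR as an executable check, for example the P95 target run against latencies collected by a load-test tool. A sketch (the sample latencies are invented):

```python
import math

def p95(samples_ms):
    """Nearest-rank 95th percentile of a latency sample."""
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered)) - 1
    return ordered[rank]

# NFR: P95 < 500 ms. Invented sample of 20 request latencies in ms.
latencies = [120, 95, 300, 180, 210, 450, 130, 160, 220, 90,
             110, 140, 480, 200, 170, 95, 105, 260, 310, 700]
nfr_met = p95(latencies) < 500
```

An NFR that can be asserted in CI is one that cannot silently drift.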


The Human-Irreplaceable TA Work

Platform judgment: Knowing when managed services are worth the cost, when vendor lock-in is an acceptable risk, when “build it yourself” is justified — this comes from experience, not documentation. AI can list trade-offs; TAs must make calls.

Incident command support: When a production system fails in a way that wasn’t anticipated, the TA needs to diagnose the failure, propose and execute a recovery, and then lead the post-incident review. AI assists with diagnosis; humans drive the recovery.

Cost vs reliability trade-offs: Deciding whether an extra £2,000/month for geo-redundant storage is worth the RPO improvement requires understanding the business risk appetite. This is not a technical calculation; it is a business judgment that the TA must own.

Regulatory interpretation: GDPR Article 32 says “appropriate technical and organisational measures” — but what is “appropriate” for this product, this data, this team? A TA must interpret regulatory requirements in the specific context of their system. AI can summarise the regulation; humans must judge the implementation.


IaC Governance in AI Teams

In an AI-augmented team, IaC generation is fast — which means mistakes can be made quickly. Governance controls:

  1. No IaC runs without a plan review — terraform plan output must be reviewed by a human before apply
  2. Production IaC changes require two approvals — TA + Tech Lead
  3. Destructive changes require explicit flag — AI-generated IaC that destroys resources must be explicitly confirmed
  4. State is protected — Remote state with state locking, no local state in production
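
Rules 1 and 3 can be partially automated: terraform show -json emits a resource_changes array, and any change whose actions include "delete" is destructive (a plain delete or a replace). A minimal gate script — the sample plan below is abbreviated and invented:

```python
import json

def destructive_changes(plan_json):
    """Return addresses of resources the plan would delete or replace."""
    plan = json.loads(plan_json)
    return [
        change["address"]
        for change in plan.get("resource_changes", [])
        if "delete" in change["change"]["actions"]
    ]

# Abbreviated, invented example of terraform show -json output.
SAMPLE_PLAN = json.dumps({
    "resource_changes": [
        {"address": "azurerm_app_service.staging",
         "change": {"actions": ["update"]}},
        {"address": "azurerm_sql_database.prod",
         "change": {"actions": ["delete", "create"]}},  # a replacement
    ]
})

flagged = destructive_changes(SAMPLE_PLAN)
# A CI gate would fail here and require explicit human confirmation.
```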

Tools for the AI Technical Architect

  • Terraform + AI: IaC generation, plan analysis
  • Azure Cost Management: Cost export analysis
  • Infracost: Cost estimation before terraform apply
  • Checkov / tfsec: IaC security scanning (AI-assisted)
  • Azure Monitor / Prometheus: Observability configuration
  • k6 / Gatling: Performance testing against NFR targets
  • Claude: NFR analysis, DR planning, runbook generation

Previous: Part 7 — The AI Quality Engineer ←
Next: Part 9 — The AI Security Engineer →

This is Part 8 of the AI-Powered Software Teams series.
