The backend of Kids Learn handles two fundamentally different workloads. Fast, stateless request-response operations — “get lesson X,” “submit answer Y,” “check progress for child Z” — complete in under 500ms and happen thousands of times per minute during school hours. Then there’s the adaptive learning engine — a compute-intensive process that analyzes a child’s learning history, runs vector similarity searches against our curriculum database, calls Amazon Bedrock for content generation, and produces a personalized lesson plan. This takes 5-15 seconds and requires sustained compute.
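The engine's 5-15 second runs are called through API Gateway, and HTTP API integrations time out after 30 seconds. That makes it worth giving each run an explicit deadline well inside that limit. Below is a minimal sketch. The `withDeadline` helper and `DeadlineExceededError` class are hypothetical names, not part of the Kids Learn codebase.

```typescript
// Hypothetical deadline helper: rejects if `work` has not settled within `ms`.
// It stops *waiting*; it does not cancel the underlying request.
class DeadlineExceededError extends Error {
  constructor(readonly deadlineMs: number) {
    super(`Deadline of ${deadlineMs}ms exceeded`);
    this.name = 'DeadlineExceededError';
  }
}

function withDeadline<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new DeadlineExceededError(ms)), ms);
  });
  // Whichever settles first wins; always clear the timer so it cannot keep
  // the event loop alive after the work finishes
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}
```

A deadline of, say, 25 seconds leaves headroom before API Gateway gives up. To actually cancel the Bedrock call, pass an `AbortSignal` as well. AWS SDK v3 clients accept an `abortSignal` option on `send()`.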

Trying to serve both workloads with the same compute model is a mistake. Lambda is perfect for the first: it scales instantly, bills per request, and has no servers to manage. But a 15-second AI inference call on Lambda means paying for idle time while the function waits on Bedrock (Lambda bills for the full wall-clock duration), hitting memory limits on complex prompts, and dealing with cold starts that make the first request even slower.
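The "paying for idle time" point is simple arithmetic: Lambda duration charges accrue in GB-seconds whether the function is computing or just awaiting Bedrock. Here is a rough sketch. The default prices are an assumption (us-east-1 arm64 first-tier list prices at the time of writing), so check current pricing before relying on the numbers.

```typescript
// Rough Lambda cost: billed GB-seconds x price, plus a per-request fee.
interface LambdaPricing {
  pricePerGbSecond: number;        // USD per GB-second
  pricePerMillionRequests: number; // USD per 1M invocations
}

// Assumed us-east-1 arm64 first-tier list prices; verify before use
const ASSUMED_ARM64_PRICING: LambdaPricing = {
  pricePerGbSecond: 0.0000133334,
  pricePerMillionRequests: 0.2,
};

function lambdaCostUsd(
  memoryMb: number,
  durationMs: number,
  invocations: number,
  pricing: LambdaPricing = ASSUMED_ARM64_PRICING,
): number {
  // Wall-clock duration is billed, including time spent awaiting an external
  // call such as Bedrock; memory is converted from MB to GB
  const gbSeconds = (memoryMb / 1024) * (durationMs / 1000) * invocations;
  const requestFee = (invocations / 1_000_000) * pricing.pricePerMillionRequests;
  return gbSeconds * pricing.pricePerGbSecond + requestFee;
}
```

At 1,024 MB, a million 15-second calls come to roughly $200 under these assumed prices. A million 200ms CRUD calls come to roughly $2.87, even though most of the 15 seconds is spent waiting rather than computing.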

The answer: API Gateway + Lambda for the lessons API, auth, and progress tracking. ECS Fargate for the adaptive learning engine and batch processing. Let the architecture match the workload.

This is Part 4. We’re building on the CDK infrastructure from Part 2 and deploying alongside the frontend from Part 3.

Backend architecture — API Gateway routing to Lambda functions and ECS Fargate services

API Gateway — The Front Door

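AWS offers three API Gateway types: REST, HTTP, and WebSocket. For request-response traffic, the choice is between the first two. REST API has more features (request validation, API keys, usage plans, caching). HTTP API is simpler, faster, and roughly 70% cheaper per request. For Kids Learn, HTTP API gives us everything we need.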

Why HTTP API Over REST API

┌──────────────────────────┬────────────────────┬──────────────┐
│ Feature                  │ REST API           │ HTTP API     │
├──────────────────────────┼────────────────────┼──────────────┤
│ Latency overhead         │ ~30ms              │ ~10ms        │
│ Cost per million requests│ $3.50              │ $1.00        │
│ JWT authorization        │ Cognito or Lambda  │ Built-in     │
│ CORS                     │ Manual             │ Built-in     │
│ OpenAPI import           │ ✅                 │ ✅           │
│ Request validation       │ ✅                 │ ❌           │
│ API caching              │ ✅                 │ ❌           │
│ Usage plans              │ ✅                 │ ❌           │
└──────────────────────────┴────────────────────┴──────────────┘

We don’t need API-level caching (we cache at the application layer with ElastiCache), we don’t need request validation (we validate in our Lambda functions with Zod), and we don’t need usage plans yet. At first-tier pricing, HTTP API saves us $2.50 per million requests and cuts roughly 20ms of gateway overhead from each request.
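To see what the per-million difference means at a given volume, here is a back-of-the-envelope sketch. It assumes flat first-tier pricing from the table ($3.50 REST, $1.00 HTTP). Real bills are tiered, and HTTP API meters requests in 512 KB increments. The 50 million requests/month figure in the usage note is hypothetical.

```typescript
// Monthly API Gateway request cost at an assumed flat first-tier rate (USD)
const PRICE_PER_MILLION_USD = { rest: 3.5, http: 1.0 } as const;

type ApiType = keyof typeof PRICE_PER_MILLION_USD;

function monthlyRequestCostUsd(api: ApiType, requestsPerMonth: number): number {
  return (requestsPerMonth / 1_000_000) * PRICE_PER_MILLION_USD[api];
}

// How much choosing HTTP API over REST API saves per month
function httpApiSavingsUsd(requestsPerMonth: number): number {
  return (
    monthlyRequestCostUsd('rest', requestsPerMonth) -
    monthlyRequestCostUsd('http', requestsPerMonth)
  );
}
```

At a hypothetical 50 million requests a month, that is $175 for REST API versus $50 for HTTP API, a $125 difference before any latency benefit.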

API Gateway CDK Configuration

// lib/stacks/compute-stack.ts
import * as cdk from 'aws-cdk-lib';
import * as apigw from 'aws-cdk-lib/aws-apigatewayv2';
import * as apigwIntegrations from 'aws-cdk-lib/aws-apigatewayv2-integrations';
import * as apigwAuth from 'aws-cdk-lib/aws-apigatewayv2-authorizers';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as cognito from 'aws-cdk-lib/aws-cognito';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecsPatterns from 'aws-cdk-lib/aws-ecs-patterns';
import { Construct } from 'constructs';
import { EnvironmentConfig } from '../config/environments';
import { DatabaseStack } from './database-stack';

export interface ComputeStackProps extends cdk.StackProps {
  config: EnvironmentConfig;
  vpc: ec2.Vpc;
  database: DatabaseStack;
  userPool: cognito.UserPool;
  // The app client the frontend signs in with; its ID is the JWT audience
  userPoolClient: cognito.UserPoolClient;
}

export class ComputeStack extends cdk.Stack {
  public readonly api: apigw.HttpApi;
  public readonly lessonsFunction: lambda.Function;
  public readonly progressFunction: lambda.Function;
  public readonly adaptiveService: ecsPatterns.ApplicationLoadBalancedFargateService;

  constructor(scope: Construct, id: string, props: ComputeStackProps) {
    super(scope, id, props);

    const { config, vpc, database, userPool, userPoolClient } = props;

    // =========================================
    // JWT Authorizer — Cognito
    // =========================================
    // A UserPool has no client ID of its own; the audience must come from
    // the UserPoolClient that issues the tokens
    const jwtAuthorizer = new apigwAuth.HttpJwtAuthorizer(
      'CognitoAuthorizer',
      `https://cognito-idp.${config.region}.amazonaws.com/${userPool.userPoolId}`,
      {
        jwtAudience: [userPoolClient.userPoolClientId],
      }
    );

    // =========================================
    // HTTP API
    // =========================================
    this.api = new apigw.HttpApi(this, 'KidsLearnApi', {
      apiName: `kidslearn-api-${config.envName}`,
      description: 'Kids Learn API',
      corsPreflight: {
        allowHeaders: ['Content-Type', 'Authorization', 'X-Request-Id'],
        allowMethods: [
          apigw.CorsHttpMethod.GET,
          apigw.CorsHttpMethod.POST,
          apigw.CorsHttpMethod.PUT,
          apigw.CorsHttpMethod.DELETE,
          apigw.CorsHttpMethod.OPTIONS,
        ],
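        allowOrigins: [
          config.envName === 'production'
            ? 'https://kidslearn.app'
            : `https://${config.envName}.kidslearn.app`,
        ],
        maxAge: cdk.Duration.hours(1),
      },
      // Disabling the default execute-api endpoint in production assumes a
      // custom domain is mapped to this API (not shown here); otherwise the
      // API is unreachable
      disableExecuteApiEndpoint: config.envName === 'production',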
    });

    // =========================================
    // Lambda Functions
    // =========================================
    
    // Shared Lambda configuration. `bundling` (esbuild) is a NodejsFunction
    // option: NodejsFunction compiles the TypeScript entry at synth time,
    // whereas lambda.Function would silently ignore it and upload raw .ts files.
    const lambdaDefaults = {
      runtime: lambda.Runtime.NODEJS_20_X,
      architecture: lambda.Architecture.ARM_64,
      memorySize: config.lambdaMemoryMB,
      timeout: cdk.Duration.seconds(30),
      vpc,
      vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
      tracing: lambda.Tracing.ACTIVE,
      environment: {
        NODE_ENV: config.envName,
        DB_SECRET_ARN: database.dbSecret.secretArn,
        SESSION_TABLE: database.sessionEventsTable.tableName,
        REDIS_ENDPOINT: database.redisCluster.attrPrimaryEndPointAddress,
        REDIS_PORT: database.redisCluster.attrPrimaryEndPointPort,
      },
      bundling: {
        minify: true,
        sourceMap: true,
        externalModules: ['@aws-sdk/*'], // SDK v3 ships with the Node.js 20 runtime
      },
    };

    // Lessons API
    this.lessonsFunction = new cdk.aws_lambda_nodejs.NodejsFunction(this, 'LessonsFunction', {
      ...lambdaDefaults,
      functionName: `kidslearn-lessons-${config.envName}`,
      entry: 'src/lambda/lessons/index.ts',
      handler: 'handler',
      description: 'Handles lesson CRUD operations',
    });

    // Progress API
    this.progressFunction = new cdk.aws_lambda_nodejs.NodejsFunction(this, 'ProgressFunction', {
      ...lambdaDefaults,
      functionName: `kidslearn-progress-${config.envName}`,
      entry: 'src/lambda/progress/index.ts',
      handler: 'handler',
      description: 'Handles learning progress tracking',
    });

    // Grant permissions
    database.dbSecret.grantRead(this.lessonsFunction);
    database.dbSecret.grantRead(this.progressFunction);
    database.sessionEventsTable.grantReadWriteData(this.progressFunction);

    // Allow Lambda to connect to Aurora. Passing remoteRule = true (4th arg)
    // creates each rule in this stack instead of DatabaseStack, which avoids a
    // cyclic cross-stack reference. Rule descriptions only allow a limited
    // ASCII character set, so no arrows.
    database.dbSecurityGroup.addIngressRule(
      this.lessonsFunction.connections.securityGroups[0],
      ec2.Port.tcp(5432),
      'Lambda lessons to Aurora',
      true
    );
    database.dbSecurityGroup.addIngressRule(
      this.progressFunction.connections.securityGroups[0],
      ec2.Port.tcp(5432),
      'Lambda progress to Aurora',
      true
    );

    // Allow Lambda to connect to Redis
    database.redisSecurityGroup.addIngressRule(
      this.lessonsFunction.connections.securityGroups[0],
      ec2.Port.tcp(6379),
      'Lambda lessons to Redis',
      true
    );

    // =========================================
    // API Routes
    // =========================================
    
    // Public routes (no auth)
    this.api.addRoutes({
      path: '/health',
      methods: [apigw.HttpMethod.GET],
      integration: new apigwIntegrations.HttpLambdaIntegration(
        'HealthCheck', this.lessonsFunction
      ),
    });

    // Authenticated routes
    this.api.addRoutes({
      path: '/api/lessons',
      methods: [apigw.HttpMethod.GET],
      integration: new apigwIntegrations.HttpLambdaIntegration(
        'GetLessons', this.lessonsFunction
      ),
      authorizer: jwtAuthorizer,
    });

    this.api.addRoutes({
      path: '/api/lessons/{lessonId}',
      methods: [apigw.HttpMethod.GET],
      integration: new apigwIntegrations.HttpLambdaIntegration(
        'GetLesson', this.lessonsFunction
      ),
      authorizer: jwtAuthorizer,
    });

    this.api.addRoutes({
      path: '/api/progress',
      methods: [apigw.HttpMethod.GET, apigw.HttpMethod.POST],
      integration: new apigwIntegrations.HttpLambdaIntegration(
        'Progress', this.progressFunction
      ),
      authorizer: jwtAuthorizer,
    });

    this.api.addRoutes({
      path: '/api/progress/{childId}',
      methods: [apigw.HttpMethod.GET],
      integration: new apigwIntegrations.HttpLambdaIntegration(
        'ChildProgress', this.progressFunction
      ),
      authorizer: jwtAuthorizer,
    });

    // =========================================
    // ECS Fargate — Adaptive Learning Engine
    // =========================================
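    const cluster = new ecs.Cluster(this, 'KidsLearnCluster', {
      vpc,
      clusterName: `kidslearn-${config.envName}`,
      containerInsights: true,
      // Required for the FARGATE / FARGATE_SPOT capacity provider strategies below
      enableFargateCapacityProviders: true,
    });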

    this.adaptiveService = new ecsPatterns.ApplicationLoadBalancedFargateService(
      this, 'AdaptiveEngine', {
        cluster,
        serviceName: `adaptive-engine-${config.envName}`,
        
        taskImageOptions: {
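          // Build for ARM64 to match runtimePlatform below, even on x86 build hosts
          image: ecs.ContainerImage.fromAsset('src/fargate/adaptive-engine', {
            platform: cdk.aws_ecr_assets.Platform.LINUX_ARM64,
          }),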
          containerPort: 3000,
          environment: {
            NODE_ENV: config.envName,
            DB_SECRET_ARN: database.dbSecret.secretArn,
            BEDROCK_REGION: config.region,
          },
          secrets: {
            DB_CREDENTIALS: ecs.Secret.fromSecretsManager(database.dbSecret),
          },
        },
        
        cpu: config.fargateCpu,
        memoryLimitMiB: config.fargateMemory,
        desiredCount: config.fargateDesiredCount,
        
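        // Networking: tasks in private subnets behind an internal ALB.
        // A public ALB (the pattern's default) would let clients bypass
        // API Gateway and its JWT authorizer.
        taskSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
        publicLoadBalancer: false,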
        
        // Health check
        healthCheck: {
          command: ['CMD-SHELL', 'curl -f http://localhost:3000/health || exit 1'],
          interval: cdk.Duration.seconds(30),
          timeout: cdk.Duration.seconds(10),
          retries: 3,
          startPeriod: cdk.Duration.seconds(60),
        },
        
        // Use Fargate Spot for cost savings in non-prod
        capacityProviderStrategies: config.envName !== 'production' ? [
          {
            capacityProvider: 'FARGATE_SPOT',
            weight: 1,
          },
        ] : [
          {
            capacityProvider: 'FARGATE',
            weight: 1,
          },
        ],
        
        // Deployment configuration
        circuitBreaker: { rollback: true },
        enableExecuteCommand: config.envName !== 'production',
        runtimePlatform: {
          operatingSystemFamily: ecs.OperatingSystemFamily.LINUX,
          cpuArchitecture: ecs.CpuArchitecture.ARM64,
        },
      }
    );

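    // ALB target health checks default to "/", which the Express app does not
    // serve; point them at the /health endpoint instead
    this.adaptiveService.targetGroup.configureHealthCheck({ path: '/health' });

    // Auto-scaling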
    const scaling = this.adaptiveService.service.autoScaleTaskCount({
      minCapacity: config.fargateDesiredCount,
      maxCapacity: config.fargateDesiredCount * 4,
    });

    scaling.scaleOnCpuUtilization('CpuScaling', {
      targetUtilizationPercent: 70,
      scaleInCooldown: cdk.Duration.minutes(5),
      scaleOutCooldown: cdk.Duration.minutes(2),
    });

    scaling.scaleOnRequestCount('RequestScaling', {
      requestsPerTarget: 100,
      targetGroup: this.adaptiveService.targetGroup,
      scaleInCooldown: cdk.Duration.minutes(5),
      scaleOutCooldown: cdk.Duration.minutes(1),
    });

    // Route adaptive learning requests through API Gateway to Fargate ALB
    this.api.addRoutes({
      path: '/api/adaptive/{proxy+}',
      methods: [apigw.HttpMethod.ANY],
      integration: new apigwIntegrations.HttpAlbIntegration(
        'AdaptiveIntegration',
        this.adaptiveService.listener,
        { vpcLink: new apigw.VpcLink(this, 'VpcLink', { vpc }) }
      ),
      authorizer: jwtAuthorizer,
    });

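    // Allow the adaptive engine to reach Aurora for curriculum vector search
    database.dbSecurityGroup.addIngressRule(
      this.adaptiveService.service.connections.securityGroups[0],
      ec2.Port.tcp(5432),
      'Fargate adaptive engine to Aurora',
      true
    );

    // Grant Bedrock access to the Fargate task. Least privilege: invoke-only
    // on foundation models in this region, instead of AmazonBedrockFullAccess
    this.adaptiveService.taskDefinition.taskRole.addToPrincipalPolicy(
      new cdk.aws_iam.PolicyStatement({
        actions: ['bedrock:InvokeModel', 'bedrock:InvokeModelWithResponseStream'],
        resources: [`arn:aws:bedrock:${config.region}::foundation-model/*`],
      })
    );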

    // =========================================
    // Outputs
    // =========================================
    new cdk.CfnOutput(this, 'ApiUrl', {
      value: this.api.apiEndpoint,
      description: 'API Gateway endpoint URL',
    });
  }
}
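With the JWT authorizer on a route, API Gateway verifies the token before Lambda runs. It checks the signature, expiry, and the `aud` claim (or `client_id` when `aud` is absent) against the configured audience. It then passes the validated claims in the event. Here is a sketch of reading the caller's identity. `JwtAuthorizedEvent` is a hand-written subset of the payload v2.0 shape; the `@types/aws-lambda` package's `APIGatewayProxyEventV2WithJWTAuthorizer` covers the full type. `getCallerSub` is a hypothetical helper name.

```typescript
// Minimal subset of the HTTP API (payload v2.0) event when a JWT authorizer
// is attached; only the fields used below are typed
interface JwtAuthorizedEvent {
  requestContext: {
    authorizer?: {
      jwt?: {
        claims: Record<string, string | number | boolean | string[]>;
        scopes: string[] | null;
      };
    };
  };
}

// Returns the Cognito user ID (`sub` claim) that API Gateway already
// verified, or null when the route has no authorizer (e.g. /health)
function getCallerSub(event: JwtAuthorizedEvent): string | null {
  const sub = event.requestContext.authorizer?.jwt?.claims?.sub;
  return typeof sub === 'string' ? sub : null;
}
```

Because the gateway has already rejected invalid tokens, a handler can use this value directly. For example, a progress handler could check that a `childId` belongs to the caller before returning data.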

Lambda Function Implementation

The Lessons Handler

// src/lambda/lessons/index.ts
import { APIGatewayProxyHandlerV2 } from 'aws-lambda';
import { Logger } from '@aws-lambda-powertools/logger';
import { Tracer } from '@aws-lambda-powertools/tracer';
import { Metrics, MetricUnit } from '@aws-lambda-powertools/metrics';
import { getLessonById, getLessons } from './db';
import { getCachedLesson, cacheLesson } from './cache';

const logger = new Logger({ serviceName: 'lessons' });
const tracer = new Tracer({ serviceName: 'lessons' });
const metrics = new Metrics({ namespace: 'KidsLearn', serviceName: 'lessons' });

export const handler: APIGatewayProxyHandlerV2 = async (event) => {
  const segment = tracer.getSegment();
  const handlerSegment = segment?.addNewSubsegment('handler');
  
  try {
    const { routeKey, pathParameters } = event;
    
    // Route: GET /health
    if (routeKey === 'GET /health') {
      return { statusCode: 200, body: JSON.stringify({ status: 'healthy' }) };
    }

    // Route: GET /api/lessons
    if (routeKey === 'GET /api/lessons') {
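      const { subject, grade, page, limit } = event.queryStringParameters || {};
      // Parse numeric params defensively: parseInt('abc') is NaN, which would
      // otherwise flow straight into the query
      const toInt = (value?: string) => {
        const n = Number.parseInt(value ?? '', 10);
        return Number.isNaN(n) ? undefined : n;
      };
      
      metrics.addMetric('LessonListRequests', MetricUnit.Count, 1);
      
      const lessons = await getLessons({
        subject: subject || undefined,
        grade: toInt(grade),
        page: Math.max(toInt(page) ?? 1, 1),
        limit: Math.min(Math.max(toInt(limit) ?? 20, 1), 50),
      });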
      
      return {
        statusCode: 200,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(lessons),
      };
    }

    // Route: GET /api/lessons/{lessonId}
    if (routeKey === 'GET /api/lessons/{lessonId}') {
      const lessonId = pathParameters?.lessonId;
      if (!lessonId) {
        return { statusCode: 400, body: JSON.stringify({ error: 'lessonId required' }) };
      }
      
      // Check cache first
      const cached = await getCachedLesson(lessonId);
      if (cached) {
        metrics.addMetric('LessonCacheHit', MetricUnit.Count, 1);
        return {
          statusCode: 200,
          headers: { 'Content-Type': 'application/json', 'X-Cache': 'HIT' },
          body: JSON.stringify(cached),
        };
      }
      
      metrics.addMetric('LessonCacheMiss', MetricUnit.Count, 1);
      
      const lesson = await getLessonById(lessonId);
      if (!lesson) {
        return { statusCode: 404, body: JSON.stringify({ error: 'Lesson not found' }) };
      }
      
      // Cache for 30 minutes
      await cacheLesson(lessonId, lesson, 1800);
      
      return {
        statusCode: 200,
        headers: { 'Content-Type': 'application/json', 'X-Cache': 'MISS' },
        body: JSON.stringify(lesson),
      };
    }

    return { statusCode: 404, body: JSON.stringify({ error: 'Not found' }) };
    
  } catch (error) {
    logger.error('Handler error', { error });
    metrics.addMetric('LessonErrors', MetricUnit.Count, 1);
    return {
      statusCode: 500,
      body: JSON.stringify({ error: 'Internal server error' }),
    };
  } finally {
    handlerSegment?.close();
    metrics.publishStoredMetrics();
  }
};
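The lessons route follows the cache-aside pattern: read the cache, fall back to the database on a miss, then populate the cache with a TTL. The post does not show the real `./cache` module. The sketch below uses an in-memory `Map` as a stand-in for Redis so it runs anywhere. `TtlCache` and `getOrLoad` are hypothetical names.

```typescript
// In-memory stand-in for Redis with per-key TTL; the clock is injectable for tests
type Clock = () => number;

class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private readonly now: Clock = Date.now) {}

  get(key: string): V | null {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (this.now() >= entry.expiresAt) {
      // Expired: evict lazily, like a Redis key whose TTL has elapsed
      this.entries.delete(key);
      return null;
    }
    return entry.value;
  }

  set(key: string, value: V, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
}

// Cache-aside read: try the cache, fall back to the loader, then populate
async function getOrLoad<V>(
  cache: TtlCache<V>,
  key: string,
  ttlSeconds: number,
  load: () => Promise<V | null>,
): Promise<{ value: V | null; hit: boolean }> {
  const cached = cache.get(key);
  if (cached !== null) return { value: cached, hit: true };
  const value = await load();
  // Don't cache misses (the handler's 404 case)
  if (value !== null) cache.set(key, value, ttlSeconds);
  return { value, hit: false };
}
```

Against real Redis, `get` and `set` would map to `GET` and `SET key value EX 1800`, matching the handler's 30-minute TTL.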

Cold Start Optimization

Lambda cold starts are the most common performance complaint. Here’s how we minimize them:

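1. ARM64 architecture. Graviton (ARM64) functions are billed about 20% less per GB-second than x86. Cold-start differences depend on the bundle, so measure your own rather than assuming a fixed speedup.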

2. Provisioned concurrency for critical paths.

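// In the CDK stack. Provisioned concurrency only applies to invocations of the
// alias, so routes must integrate with lessonsAlias, not the bare function.
const lessonsAlias = this.lessonsFunction.addAlias('live', {
  provisionedConcurrentExecutions: config.envName === 'production' ? 5 : undefined,
});
// e.g. integration: new apigwIntegrations.HttpLambdaIntegration('GetLesson', lessonsAlias)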

3. Minimal dependencies. Our Lambda bundles exclude @aws-sdk (it’s included in the runtime) and use tree-shaking to remove unused code.

4. Lazy initialization. Database connections initialize on first use, not on import:

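// src/lambda/lessons/db.ts
import { SecretsManagerClient, GetSecretValueCommand } from '@aws-sdk/client-secrets-manager';
import { Pool } from 'pg';

// Cache the initialization *promise*, not just the pool, so concurrent callers
// (e.g. two queries started with Promise.all) share one setup instead of racing
let poolPromise: Promise<Pool> | null = null;

async function createPool(): Promise<Pool> {
  const client = new SecretsManagerClient({});
  const response = await client.send(
    new GetSecretValueCommand({ SecretId: process.env.DB_SECRET_ARN })
  );
  if (!response.SecretString) {
    throw new Error('Database secret has no SecretString');
  }

  const credentials = JSON.parse(response.SecretString);

  return new Pool({
    host: credentials.host,
    port: Number(credentials.port),
    database: credentials.dbname,
    user: credentials.username,
    password: credentials.password,
    max: 3,               // Lambda only needs a few connections
    idleTimeoutMillis: 0, // 0 = never close idle connections; reuse them while warm
    // rejectUnauthorized: false skips certificate verification. For production,
    // bundle the Amazon RDS CA certificate and pass it as `ca` instead.
    ssl: { rejectUnauthorized: false },
  });
}

function getPool(): Promise<Pool> {
  if (!poolPromise) {
    poolPromise = createPool().catch((err) => {
      poolPromise = null; // let the next call retry instead of caching the failure
      throw err;
    });
  }
  return poolPromise;
}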
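To confirm these optimizations work, it helps to record which invocations were cold. Module scope runs once per execution environment, so a module-level flag separates the first invocation from warm ones. The sketch below uses hypothetical names. Powertools Metrics can emit an equivalent `ColdStart` metric via `captureColdStartMetric()`. Lambda also sets `AWS_LAMBDA_INITIALIZATION_TYPE` (`on-demand`, `provisioned-concurrency`, or `snap-start`) if you want to tag how the environment was created.

```typescript
// Module scope executes during the init phase, once per execution environment
let isColdStart = true;
const initStartedAt = Date.now();

interface InvocationInfo {
  coldStart: boolean;  // true only for the first invocation in this environment
  msSinceInit: number; // large values on a "cold" call suggest pre-initialization
}

function recordInvocation(): InvocationInfo {
  const info = { coldStart: isColdStart, msSinceInit: Date.now() - initStartedAt };
  isColdStart = false; // every later invocation in this environment is warm
  return info;
}
```

Calling `recordInvocation()` at the top of the handler and logging the result lets you compare cold-start rates before and after enabling provisioned concurrency.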

ECS Fargate — The Adaptive Learning Engine

The Dockerfile

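# src/fargate/adaptive-engine/Dockerfile
FROM node:20-slim AS builder

WORKDIR /app
COPY package*.json ./
# Install all dependencies, including devDependencies needed for the build
RUN npm ci
COPY . .
# Compile, then strip devDependencies so only runtime packages are copied forward
RUN npm run build && npm prune --omit=dev

FROM node:20-slim AS runner
WORKDIR /app

# node:20-slim does not include curl, which both health checks call
RUN apt-get update \
  && apt-get install -y --no-install-recommends curl \
  && rm -rf /var/lib/apt/lists/*

# Security: run as non-root. groupadd/useradd are always present in Debian
# slim images; the addgroup/adduser wrappers may not be.
RUN groupadd --system --gid 1001 nodejs \
  && useradd --system --uid 1001 --gid nodejs appuser

COPY --from=builder --chown=appuser:nodejs /app/dist ./dist
COPY --from=builder --chown=appuser:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:nodejs /app/package.json ./

USER appuser
EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1

CMD ["node", "dist/server.js"]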

The Adaptive Engine Server

// src/fargate/adaptive-engine/src/server.ts
import express from 'express';
import { Logger } from '@aws-lambda-powertools/logger';
import { adaptiveLearningRouter } from './routes/adaptive';

const app = express();
const logger = new Logger({ serviceName: 'adaptive-engine' });

app.use(express.json({ limit: '10mb' }));

// Health check
app.get('/health', (req, res) => {
  res.json({ status: 'healthy', uptime: process.uptime() });
});

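// Adaptive learning routes. API Gateway forwards the original request path
// (/api/adaptive/...) through the VPC link, so mount the router at that prefix.
app.use('/api/adaptive', adaptiveLearningRouter);

// Start server
const PORT = Number(process.env.PORT) || 3000;
const server = app.listen(PORT, () => {
  logger.info(`Adaptive engine running on port ${PORT}`);
});

// ECS sends SIGTERM before stopping a task (deployments, scale-in, Spot
// interruptions); stop accepting connections and exit once in-flight requests finish
process.on('SIGTERM', () => {
  logger.info('SIGTERM received, draining connections');
  server.close(() => process.exit(0));
});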
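The engine's retrieval step, finding curriculum items close to a learner's profile, is a nearest-neighbour search over embeddings. In production, this runs inside Aurora with pgvector's cosine distance operator `<=>` (distance = 1 minus cosine similarity), e.g. `ORDER BY embedding <=> $1 LIMIT 5`. The in-memory sketch below only illustrates the ranking logic. `CurriculumItem`, `cosineSimilarity`, and `topK` are hypothetical names.

```typescript
interface CurriculumItem {
  id: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Embedding dimensions differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Undefined for zero vectors; treat them as unrelated
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k items most similar to the learner's profile embedding.
// Sorting by similarity DESC gives the same order as pgvector's distance ASC.
function topK(profile: number[], items: CurriculumItem[], k: number): CurriculumItem[] {
  return items
    .map((item) => ({ item, score: cosineSimilarity(profile, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ item }) => item);
}
```

Doing this in the database avoids shipping every embedding to the container. With a pgvector index, Aurora can also use approximate search instead of scanning every row.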

The Bottom Line

The split architecture, Lambda for fast CRUD and Fargate for heavy computation, lets us optimize each workload independently. Lambda’s on-demand cost tracks traffic and falls away during quiet hours, while Fargate keeps warm containers running so adaptive-engine requests never wait on a cold start.

Key takeaways:

  • HTTP API costs roughly 70% less per request than REST API, with lower latency
  • Lambda Powertools gives you structured logging, tracing, and metrics with minimal code
  • Provisioned concurrency removes cold starts on critical paths, up to the provisioned level
  • Fargate Spot saves up to 70% on non-production workloads
  • Auto-scaling on both CPU and request count prevents over/under-provisioning

In Part 5, we deep-dive into the data layer — Aurora Serverless v2 with pgvector for vector search, DynamoDB for session events, and ElastiCache Redis for caching.

See you in Part 5.


This is Part 4 of a 10-part series: AWS Full-Stack Mastery for Technical Leads.

Series outline:

  1. Why AWS & Getting Started (Part 1)
  2. Infrastructure as Code (CDK) (Part 2)
  3. Frontend (Amplify + CloudFront) (Part 3)
  4. Backend (API Gateway + Lambda + Fargate) (this post)
  5. Database (Aurora + DynamoDB + ElastiCache) (Part 5)
  6. AI/ML (Bedrock + SageMaker) (Part 6)
  7. DevOps (CodePipeline + CodeBuild) (Part 7)
  8. Security (IAM + Cognito + WAF) (Part 8)
  9. Observability (CloudWatch + X-Ray) (Part 9)
  10. Production (Multi-Region + DR) (Part 10)
