Broken Stack

Overview

A pre-deployed CloudFormation stack named broken-app-stack has three deliberate misconfigurations that cause it to fail. In this lab, you'll use your AI coding agent to inspect stack events, identify root causes, and apply fixes, just as you would in a real production incident.

What You'll Learn

How to use your agent to inspect CloudFormation stack events and diagnose failures
How to identify IAM permission gaps, invalid resource configurations, and capacity issues
How to iteratively fix infrastructure problems and validate corrections through stack updates

Instructions

Explore

Use your agent to diagnose and fix the broken stack. Here are some hints if you get stuck:

Start by asking your agent to describe the broken-app-stack and check its status and recent events for error messages
The stack has three separate issues — an IAM permission gap, a network configuration error, and a capacity problem
Once you've identified all three issues, ask your agent to update the stack with all corrections applied in a single update

Step-by-step Walkthrough

Start by inspecting the broken stack's status and events:

Check the CloudFormation stack named "broken-app-stack" — what's its current status and what do the stack events show? Look for any error messages.

Your agent should identify that the stack is in a failed state. Look at the events to find the failure reasons. The invalid CIDR block error is the most visible, because CloudFormation rejects it during validation.

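
If you want to see roughly what the agent is doing when it reads the events, the following is a minimal sketch rather than the workshop's own tooling. It assumes the event shape returned by boto3's `describe_stack_events` (a `StackEvents` list, newest first). The logical resource IDs and the failure message in `sample` are hypothetical placeholders, not real lab output.

```python
from typing import Iterable


def failed_events(events: Iterable[dict]) -> list[dict]:
    """Return the events whose ResourceStatus ends in _FAILED, oldest first."""
    failures = [e for e in events if e.get("ResourceStatus", "").endswith("_FAILED")]
    # describe_stack_events lists the newest event first; reverse the
    # filtered list so the failures read in the order they happened.
    return list(reversed(failures))


def fetch_stack_events(stack_name: str) -> list[dict]:
    """Fetch the first page of stack events. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so failed_events works without boto3

    client = boto3.client("cloudformation")
    return client.describe_stack_events(StackName=stack_name)["StackEvents"]


# Hypothetical sample shaped like describe_stack_events output.
sample = [
    {"LogicalResourceId": "broken-app-stack", "ResourceStatus": "ROLLBACK_IN_PROGRESS"},
    {
        "LogicalResourceId": "AppSecurityGroup",  # hypothetical logical ID
        "ResourceStatus": "CREATE_FAILED",
        "ResourceStatusReason": "Invalid CIDR (hypothetical message)",
    },
    {"LogicalResourceId": "AppTable", "ResourceStatus": "CREATE_COMPLETE"},
]

for event in failed_events(sample):
    reason = event.get("ResourceStatusReason", "no reason given")
    print(event["LogicalResourceId"], "-", reason)
```

To run this against the lab, replace `sample` with `fetch_stack_events("broken-app-stack")`. Only the first page of events is read here, which is usually enough for a small stack.
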
Ask your agent to dig deeper into each resource:

Look at the resources in the broken-app-stack template. I know there are three issues — can you identify all of them by examining the Lambda role permissions, the security group configuration, and the DynamoDB table settings?

Your agent should identify these three issues:

Issue 1 (missing IAM permission): The Lambda execution role (broken-app-role-{participant-id}) has s3:ListBucket but is missing s3:GetObject, so the function gets AccessDenied when reading objects
Issue 2 (invalid CIDR block): The security group has an ingress rule with 10.999.0.0/16, which is invalid (999 exceeds the maximum octet value of 255)
Issue 3 (insufficient DynamoDB throughput): The table is provisioned with 1 RCU and 1 WCU, but needs at least 5 of each for the workshop workload

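
The CIDR problem in Issue 2 is easy to confirm locally. This is a small sketch using Python's standard `ipaddress` module; the helper name `is_valid_ipv4_cidr` is made up for illustration. It checks only that the octets and prefix length parse, not whether the range suits your VPC.

```python
import ipaddress


def is_valid_ipv4_cidr(cidr: str) -> bool:
    """Return True if cidr parses as an IPv4 network such as 10.0.0.0/16."""
    try:
        # strict=False tolerates host bits being set; this sketch only
        # checks the octet range (0-255) and the prefix length (0-32).
        ipaddress.IPv4Network(cidr, strict=False)
    except ValueError:
        return False
    return True


print(is_valid_ipv4_cidr("10.999.0.0/16"))  # False: 999 is not a valid octet
print(is_valid_ipv4_cidr("10.0.0.0/16"))    # True
```
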
Ask your agent to fix all three issues:

Update the broken-app-stack to fix all three issues: add s3:GetObject permission to the Lambda role for the workshop bucket objects, change the security group CIDR from 10.999.0.0/16 to 10.0.0.0/16, and increase the DynamoDB table to 5 RCU and 5 WCU.

Wait for the stack update to complete, then verify:

Check the broken-app-stack status — did it reach UPDATE_COMPLETE? Also verify the Lambda role now has s3:GetObject and the security group has a valid CIDR.

The stack should now be in UPDATE_COMPLETE state with all three issues resolved.
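
To make the role check concrete, here is a deliberately simplified sketch of testing whether a policy document grants an action. It is not a full IAM evaluation: it ignores Deny statements, NotAction, Resource, and Condition. The bucket name in the sample policies is a hypothetical placeholder.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM accepts either a single item or a list for Statement and Action."""
    return value if isinstance(value, list) else [value]


def policy_allows_action(policy_document: dict, action: str) -> bool:
    """Return True if any Allow statement has an Action pattern matching action."""
    wanted = action.lower()  # IAM action names are case-insensitive
    for statement in _as_list(policy_document.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        for pattern in _as_list(statement.get("Action", [])):
            # Patterns may contain wildcards, for example "s3:Get*" or "s3:*".
            if fnmatchcase(wanted, pattern.lower()):
                return True
    return False


# Hypothetical policies; the bucket ARN is a placeholder, not the lab's bucket.
before_fix = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:ListBucket",
         "Resource": "arn:aws:s3:::example-workshop-bucket"},
    ],
}
after_fix = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:ListBucket",
         "Resource": "arn:aws:s3:::example-workshop-bucket"},
        {"Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": "arn:aws:s3:::example-workshop-bucket/*"},
    ],
}

print(policy_allows_action(before_fix, "s3:GetObject"))  # False
print(policy_allows_action(after_fix, "s3:GetObject"))   # True
```

Note that in the corrected policy s3:GetObject is granted on the objects (`/*`), while s3:ListBucket applies to the bucket itself.
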

Validation

Open the CloudWatch Dashboard in the AWS Console. The Module 5 widget checks:

✅ Stack broken-app-stack has status UPDATE_COMPLETE
✅ Lambda execution role includes s3:GetObject permission
✅ No security group has an invalid CIDR block

You can also ask your agent to run the validation directly:

Check if the broken-app-stack is in UPDATE_COMPLETE status, and verify the Lambda role has the s3:GetObject permission.
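
For a scripted check, the sketch below evaluates responses you have already fetched. It assumes the response shapes of boto3's CloudFormation `describe_stacks` and DynamoDB `describe_table` calls. The function names and sample responses are illustrative. The table check is not part of the dashboard widget; it covers the 5 RCU / 5 WCU target from the walkthrough.

```python
def stack_update_succeeded(describe_stacks_response: dict) -> bool:
    """True if the first stack in the response reports UPDATE_COMPLETE."""
    stacks = describe_stacks_response.get("Stacks", [])
    return bool(stacks) and stacks[0].get("StackStatus") == "UPDATE_COMPLETE"


def table_meets_capacity(describe_table_response: dict, minimum: int = 5) -> bool:
    """True if provisioned read and write capacity are both at least minimum."""
    table = describe_table_response.get("Table", {})
    throughput = table.get("ProvisionedThroughput", {})
    return (
        throughput.get("ReadCapacityUnits", 0) >= minimum
        and throughput.get("WriteCapacityUnits", 0) >= minimum
    )


# Hypothetical responses, trimmed to the fields the checks read.
stack_response = {"Stacks": [{"StackName": "broken-app-stack",
                              "StackStatus": "UPDATE_COMPLETE"}]}
table_response = {"Table": {"ProvisionedThroughput": {"ReadCapacityUnits": 5,
                                                      "WriteCapacityUnits": 5}}}

print(stack_update_succeeded(stack_response))  # True
print(table_meets_capacity(table_response))    # True
```
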

Info

All three issues must be fixed in a single stack update. If you fix only one or two, the stack will roll back to its previous failed state because the remaining invalid configuration still prevents successful deployment.

Agent-Specific Tips

Claude Code is strong at reading CloudFormation templates and correlating errors across resources. Try asking it to explain the chain of failures:

Describe the broken-app-stack, get its template, and analyze all resources for misconfigurations. Explain how each issue would manifest at runtime.

Kiro can inspect both the stack events and the actual resource state. Ask it to cross-reference what the template defines versus what AWS actually created:

Show me the broken-app-stack events and describe each resource that failed. What needs to change in the template?

Cursor can retrieve the CloudFormation template and identify issues through code analysis. Ask it to get the template first, then propose a corrected version:

Get the template for broken-app-stack and identify all misconfigurations. Then show me what the corrected template should look like.

Codex can use the MCP Server to inspect stack events and resources. Be explicit about wanting it to check all three resource types:

Check the broken-app-stack events for errors. Then inspect the Lambda role policy, security group rules, and DynamoDB table settings to find all issues.