Broken Stack
Overview
A pre-deployed CloudFormation stack named broken-app-stack has three deliberate misconfigurations causing it to fail. In this lab, you'll use your AI coding agent to inspect stack events, identify root causes, and apply fixes — just like you would in a real production incident.
What You'll Learn
- How to use your agent to inspect CloudFormation stack events and diagnose failures
- How to identify IAM permission gaps, invalid resource configurations, and capacity issues
- How to iteratively fix infrastructure problems and validate corrections through stack updates
Instructions
Explore
Use your agent to diagnose and fix the broken stack. Here are some hints if you get stuck:
- Start by asking your agent to describe the broken-app-stack and check its status and recent events for error messages
- The stack has three separate issues: an IAM permission gap, a network configuration error, and a capacity problem
- Once you've identified all three issues, ask your agent to update the stack with corrections applied simultaneously
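If you want to see what "check recent events for error messages" amounts to, here is a minimal Python sketch. It assumes the events have already been fetched as a list of dicts in the shape returned by CloudFormation's DescribeStackEvents (newest first). The function name, the sample data, the logical ID AppSecurityGroup, and the reason strings are all hypothetical and only illustrate the idea.

```python
# Resource statuses that CloudFormation reports for failed operations.
FAILED_STATUSES = {"CREATE_FAILED", "UPDATE_FAILED", "DELETE_FAILED"}


def failed_events(events):
    """Return (logical_id, reason) pairs for failed events, oldest first."""
    failures = [
        (e["LogicalResourceId"], e.get("ResourceStatusReason", ""))
        for e in events
        if e.get("ResourceStatus") in FAILED_STATUSES
    ]
    # Events arrive newest first; reverse so the earliest failure leads.
    return list(reversed(failures))


# Hypothetical sample in the shape of a StackEvents list (not real output).
sample_events = [
    {"LogicalResourceId": "broken-app-stack",
     "ResourceStatus": "ROLLBACK_IN_PROGRESS",
     "ResourceStatusReason": "Resource creation failed (hypothetical message)"},
    {"LogicalResourceId": "AppSecurityGroup",
     "ResourceStatus": "CREATE_FAILED",
     "ResourceStatusReason": "Invalid CIDR block (hypothetical message)"},
    {"LogicalResourceId": "AppSecurityGroup",
     "ResourceStatus": "CREATE_IN_PROGRESS"},
]

for logical_id, reason in failed_events(sample_events):
    print(f"{logical_id}: {reason}")
```

The earliest failed event is usually the one worth reading first, since later failures are often just consequences of it.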
Step-by-step Walkthrough
- Start by inspecting the broken stack's status and events:
Check the CloudFormation stack named "broken-app-stack" — what's its current status and what do the stack events show? Look for any error messages.
- Your agent should identify that the stack is in a failed state. Look at the events to find the failure reasons. The invalid CIDR block error is the most visible because CloudFormation rejects it during validation.
- Ask your agent to dig deeper into each resource:
Look at the resources in the broken-app-stack template. I know there are three issues — can you identify all of them by examining the Lambda role permissions, the security group configuration, and the DynamoDB table settings?
- Your agent should identify these three issues:
  - Issue 1: Missing IAM permission. The Lambda execution role (broken-app-role-{participant-id}) has s3:ListBucket but is missing s3:GetObject, so the function gets AccessDenied when reading objects
  - Issue 2: Invalid CIDR block. The security group has an ingress rule with 10.999.0.0/16, which is invalid (999 exceeds the maximum octet value of 255)
  - Issue 3: Insufficient DynamoDB throughput. The table is provisioned with 1 RCU and 1 WCU, but needs at least 5 of each for the workshop workload
- Ask your agent to fix all three issues:
Update the broken-app-stack to fix all three issues: add s3:GetObject permission to the Lambda role for the workshop bucket objects, change the security group CIDR from 10.999.0.0/16 to 10.0.0.0/16, and increase the DynamoDB table to 5 RCU and 5 WCU.
- Wait for the stack update to complete, then verify:
Check the broken-app-stack status — did it reach UPDATE_COMPLETE? Also verify the Lambda role now has s3:GetObject and the security group has a valid CIDR.
- The stack should now be in UPDATE_COMPLETE state with all three issues resolved.
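To make the combined fix concrete, the sketch below applies all three corrections to a template held as a Python dict. This is only an illustration under stated assumptions: the logical IDs AppRole, AppSecurityGroup, and AppTable, the policy name, and the bucket name example-workshop-bucket are hypothetical, and the real template's layout may differ. The property names (Policies, SecurityGroupIngress, CidrIp, ProvisionedThroughput) follow the standard CloudFormation resource types.

```python
import copy


def apply_fixes(template, bucket_name):
    """Return a corrected copy of the template; the input is left unchanged."""
    fixed = copy.deepcopy(template)
    resources = fixed["Resources"]

    # Fix 1: grant s3:GetObject on the bucket's objects (note the /* suffix).
    policy = resources["AppRole"]["Properties"]["Policies"][0]
    policy["PolicyDocument"]["Statement"].append({
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket_name}/*",
    })

    # Fix 2: replace the invalid CIDR with the valid range from the walkthrough.
    sg_props = resources["AppSecurityGroup"]["Properties"]
    for rule in sg_props["SecurityGroupIngress"]:
        if rule.get("CidrIp") == "10.999.0.0/16":
            rule["CidrIp"] = "10.0.0.0/16"

    # Fix 3: raise provisioned throughput to 5 RCU / 5 WCU.
    resources["AppTable"]["Properties"]["ProvisionedThroughput"] = {
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 5,
    }
    return fixed


# Hypothetical broken template fragment (logical IDs are assumptions).
broken = {"Resources": {
    "AppRole": {"Type": "AWS::IAM::Role", "Properties": {"Policies": [{
        "PolicyName": "app-policy",
        "PolicyDocument": {"Version": "2012-10-17", "Statement": [
            {"Effect": "Allow", "Action": "s3:ListBucket",
             "Resource": "arn:aws:s3:::example-workshop-bucket"},
        ]},
    }]}},
    "AppSecurityGroup": {"Type": "AWS::EC2::SecurityGroup", "Properties": {
        "SecurityGroupIngress": [
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "CidrIp": "10.999.0.0/16"},
        ],
    }},
    "AppTable": {"Type": "AWS::DynamoDB::Table", "Properties": {
        "ProvisionedThroughput": {"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
    }},
}}

fixed = apply_fixes(broken, "example-workshop-bucket")
```

The s3:GetObject statement targets the object ARN (bucket name plus /*), while s3:ListBucket applies to the bucket ARN itself, which is why the sketch adds a separate statement instead of extending the existing one.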
Validation
Open the CloudWatch Dashboard in the AWS Console. The Module 5 widget checks:
- ✅ Stack broken-app-stack has status UPDATE_COMPLETE
- ✅ Lambda execution role includes s3:GetObject permission
- ✅ No security group has an invalid CIDR block
You can also run validation directly:
Check if the broken-app-stack is in UPDATE_COMPLETE status, and verify the Lambda role has the s3:GetObject permission.
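For reference, the three checks can be approximated offline with the Python standard library. This is a rough sketch, not how the dashboard widget is implemented: the function names and sample policy are hypothetical, the inputs are assumed to have been retrieved already, and the action check matches exact names only (it does not expand wildcards such as s3:*).

```python
import ipaddress


def is_valid_cidr(cidr):
    """True if cidr parses as an IPv4 network such as 10.0.0.0/16."""
    try:
        # strict=False tolerates host bits; malformed octets still fail.
        ipaddress.IPv4Network(cidr, strict=False)
        return True
    except ValueError:
        return False


def allowed_actions(policy_document):
    """Collect actions named in the Allow statements of an IAM policy document."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be a bare object
        statements = [statements]
    actions = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        action = stmt.get("Action", [])
        actions.update([action] if isinstance(action, str) else action)
    return actions


def validate(stack_status, policy_document, ingress_cidrs):
    """Mirror the three checklist items as a dict of booleans."""
    return {
        "stack_update_complete": stack_status == "UPDATE_COMPLETE",
        "role_has_get_object": "s3:GetObject" in allowed_actions(policy_document),
        "all_cidrs_valid": all(is_valid_cidr(c) for c in ingress_cidrs),
    }


# Hypothetical post-fix inputs.
sample_policy = {"Version": "2012-10-17", "Statement": [
    {"Effect": "Allow", "Action": ["s3:ListBucket", "s3:GetObject"],
     "Resource": "*"},
]}
print(validate("UPDATE_COMPLETE", sample_policy, ["10.0.0.0/16"]))
```

Passing 10.999.0.0/16 to is_valid_cidr returns False because 999 is outside the 0 to 255 range allowed for an IPv4 octet.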
Info
All three issues must be fixed in a single stack update. If you fix only one or two, the stack will roll back to its previous failed state because the remaining invalid configuration still prevents successful deployment.
Agent-Specific Tips
Claude Code is strong at reading CloudFormation templates and correlating errors across resources. Try asking it to explain the chain of failures.
Kiro can inspect both the stack events and the actual resource state. Ask it to cross-reference what the template defines versus what AWS actually created.
Cursor can retrieve the CloudFormation template and identify issues through code analysis. Ask it to get the template first, then propose a corrected version.