
AWS IAM Drift Detector

What happens when you let engineers ship fast but still want to sleep at night.

AWS · IAM · Lambda · EventBridge · CloudTrail · n8n

So here's the thing. Every org I've worked with has the same problem: devs need to move fast, security wants to review everything, and nobody has time for the back-and-forth. IAM changes especially. Someone adds a policy, it ships, and three weeks later you find out it grants admin to half the account.

I got tired of finding these things manually. Scrolling through CloudTrail logs at 2am because something felt off. There had to be a better way.

The idea was simple: catch IAM changes the moment they happen, figure out if they're risky, and do something about it before anyone has to ask. No dashboards to check. No weekly reviews. Just automated guardrails that actually work.

I wired up EventBridge to listen for IAM events from CloudTrail. Every time someone touches a policy, role, or user, it fires. A Lambda picks it up and starts asking questions. Who made this change? What did they actually modify? Does this grant wildcard permissions? Can this role assume into other accounts? Is this touching sensitive services like KMS or Secrets Manager?
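For the curious, here's roughly the shape of that EventBridge rule as a boto3 sketch. The event names are an illustrative subset, not the full list the detector watches, and the rule name is made up.

```python
# Rough sketch of the EventBridge side, not the exact rule from this project.
# The pattern matches IAM write calls delivered via CloudTrail; the event names
# here are an illustrative subset.
import json
import boto3

events = boto3.client("events")

IAM_EVENT_PATTERN = {
    "source": ["aws.iam"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["iam.amazonaws.com"],
        "eventName": [
            "PutRolePolicy", "PutUserPolicy", "AttachRolePolicy",
            "AttachUserPolicy", "CreatePolicy", "CreatePolicyVersion",
            "UpdateAssumeRolePolicy", "CreateUser", "CreateAccessKey",
        ],
    },
}

events.put_rule(
    Name="iam-drift-detector",          # hypothetical rule name
    EventPattern=json.dumps(IAM_EVENT_PATTERN),
    State="ENABLED",
)
```

One gotcha worth knowing: IAM is a global service, so its CloudTrail events show up in us-east-1, which is where this rule has to live.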

The scoring part took the longest to get right. First version flagged everything. Useless. Teams ignored it after day two. So I started tuning. Wildcards on s3:GetObject? Probably fine. Wildcards on iam:*? That's a problem. Context matters. A change from a CI pipeline service role hits different than a change from some random IAM user at 3am.
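To give a flavor of the scoring, here's a stripped-down sketch. The weights, the CI role naming check, and the function name are all illustrative, not the production rule set.

```python
# Illustrative scoring sketch. Weights and naming conventions are made up;
# the real rules carry more context than this.
from datetime import timezone

SENSITIVE_PREFIXES = ("kms:", "secretsmanager:", "iam:", "sts:")

def score_change(statements, actor_arn, event_time):
    """Rough risk score for one IAM policy change.

    statements: the new policy document's Statement list
    actor_arn:  userIdentity.arn from the CloudTrail event
    event_time: the CloudTrail eventTime as a datetime
    """
    score = 0
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        for action in actions:
            if action == "*" or action.lower() == "iam:*":
                score += 50          # blanket admin or full IAM control
            elif action.endswith("*") and action.lower().startswith(SENSITIVE_PREFIXES):
                score += 25          # wildcard on a sensitive service
            elif action.endswith("*"):
                score += 5           # wildcard on something mundane, e.g. s3:Get*
        if stmt.get("Resource") == "*":
            score += 10              # resource wildcard compounds the above

    # Context: who did it, and when.
    if ":assumed-role/ci-" not in actor_arn:      # hypothetical CI role naming convention
        score += 10                               # human or unknown principal
    if event_time.astimezone(timezone.utc).hour < 6:
        score += 10                               # the 3am factor
    return score
```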

Once I had signal I could trust, I piped it into n8n for the response workflows. High severity stuff creates a ticket, posts to Slack with the full context, and optionally triggers a rollback. Medium severity gets logged and queued for review. Low severity just gets tracked so we can spot patterns later.
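The hand-off to n8n is just a webhook POST. Something like this, with a placeholder URL and made-up severity cutoffs; the actual branching lives inside the n8n workflow.

```python
# Sketch of the hand-off to n8n. Webhook URL, cutoffs, and field names
# are placeholders.
import json
import urllib.request

N8N_WEBHOOK = "https://n8n.example.internal/webhook/iam-drift"   # hypothetical

def severity(score):
    if score >= 60:
        return "high"      # ticket + Slack + optional rollback
    if score >= 30:
        return "medium"    # logged and queued for review
    return "low"           # tracked for pattern spotting

def notify(finding, score):
    payload = {
        "severity": severity(score),
        "score": score,
        "actor": finding["actor_arn"],
        "event_name": finding["event_name"],
        "account": finding["account_id"],
        "region": finding["region"],
        "policy_before": finding.get("policy_before"),
        "policy_after": finding.get("policy_after"),
    }
    req = urllib.request.Request(
        N8N_WEBHOOK,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=10)
```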

The rollback piece was tricky. You can't just revert IAM changes blindly. Sometimes the "risky" change is intentional and the team knows what they're doing. So I added approval gates. Flag it, pause it, let a human confirm before anything gets reverted. Automation with a kill switch.
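The gate itself is boring on purpose. Here's a sketch of the idea, assuming pending rollbacks get parked in a DynamoDB table and only executed once someone confirms; the table name, fields, and the inline-policy restore path are illustrative.

```python
# Approval-gate sketch. Assumes pending rollbacks sit in a DynamoDB table
# until a human approves (e.g. via a Slack button that hits a webhook).
import boto3

dynamodb = boto3.resource("dynamodb")
pending = dynamodb.Table("iam-drift-pending-rollbacks")   # hypothetical table
iam = boto3.client("iam")

def queue_rollback(finding):
    """Park the rollback instead of firing it; a human has to confirm first."""
    pending.put_item(Item={
        "finding_id": finding["finding_id"],
        "role_name": finding["role_name"],
        "policy_name": finding["policy_name"],
        "previous_document": finding["policy_before"],
        "status": "awaiting_approval",
    })

def execute_rollback(finding_id):
    """Called only after the approval comes back."""
    item = pending.get_item(Key={"finding_id": finding_id})["Item"]
    if item["status"] != "awaiting_approval":
        return
    # Restore the previous inline policy document. Attached managed policies
    # would need a different path (detach / re-attach).
    iam.put_role_policy(
        RoleName=item["role_name"],
        PolicyName=item["policy_name"],
        PolicyDocument=item["previous_document"],
    )
    pending.update_item(
        Key={"finding_id": finding_id},
        UpdateExpression="SET #s = :done",
        ExpressionAttributeNames={"#s": "status"},
        ExpressionAttributeValues={":done": "rolled_back"},
    )
```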

What actually made this useful was the summaries. Nobody wants to read raw JSON diffs. So every alert includes: who made the change, what account and region, what the policy looked like before and after, and why the system thinks it's risky. One glance and you know if you need to care.
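The rendering is deliberately simple. Something along these lines, with field names mirroring the payload sketch above (placeholders, not the exact schema):

```python
# Summary sketch; field names are placeholders that mirror the earlier payload.
def summarize(finding, score, reasons):
    lines = [
        f"IAM change by {finding['actor_arn']}",
        f"Account {finding['account_id']} ({finding['region']}), event {finding['event_name']}",
        f"Risk score: {score} ({', '.join(reasons)})",
        "",
        "Before:",
        finding.get("policy_before") or "(no previous version)",
        "After:",
        finding["policy_after"],
    ]
    return "\n".join(lines)
```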

Been running this for a few months now. It's caught a handful of real issues that would've slipped through manual review. More importantly, it let teams ship faster because they knew the guardrails were there. They didn't have to wait for security to approve every change. They just shipped, and if something was off, the system caught it.

Still tweaking the rules. Still dealing with edge cases. But the core loop works. Change happens, system evaluates, response fires if needed. Simple when you say it like that. Getting there was the hard part.