Alex Arvanitidis

Machine Learning Engineer

The 5 worst bugs I've seen in production - #4 The $60 OTP DDoS

Published about 1 month ago

This one surfaced early in the morning as user complaints about failing OTP codes. We sent the codes by SMS through AWS SNS (Amazon's notification service with SMS support), and our SNS budget had been drained. The endpoint that triggered the SMS was public by design, and we had IP-based rate limiting in front of it. The attacker simply sent requests from many different IPs, so the limit did nothing.

What is it?

An OTP is a one‑time password used to verify a user or action. A DDoS is a distributed denial‑of‑service attack, where many sources flood a service to exhaust resources or budget. See: One‑time password (Wikipedia) and Denial‑of‑service attack (Wikipedia).

Problem

We were sending OTP SMS through AWS SNS. The endpoint was unauthenticated because it needed to be. Our only protection was a rate limit keyed on the caller's IP address, which did nothing against traffic coming from hundreds of different machines.
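To make the failure mode concrete, here is a minimal sketch of why a per-IP limit cannot bound total spend. Everything in it is hypothetical: the `FixedWindowLimiter` class, the `simulate` helper, the limits, and the IP addresses are illustrations, not our production code. The global cap in the last call is one possible extra safeguard shown for contrast, not something described in this post as deployed.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` events per key within one window.

    Window expiry is left out: the sketch only models a single burst.
    """

    def __init__(self, limit):
        self.limit = limit
        self.counts = defaultdict(int)

    def allow(self, key):
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


def simulate(source_ips, per_ip_limit, global_limit=None):
    """Return how many SMS sends get through for a list of request IPs."""
    per_ip = FixedWindowLimiter(per_ip_limit)
    overall = FixedWindowLimiter(global_limit) if global_limit is not None else None
    sent = 0
    for ip in source_ips:
        if not per_ip.allow(ip):
            continue  # blocked by the per-IP limit
        if overall is not None and not overall.allow("all"):
            continue  # blocked by the (assumed) global cap
        sent += 1
    return sent


# Hypothetical bursts of 500 requests each.
single_source = ["10.0.0.1"] * 500
distributed = [f"10.0.{i // 256}.{i % 256}" for i in range(500)]  # 500 distinct IPs

blocked_burst = simulate(single_source, per_ip_limit=5)
unblocked_burst = simulate(distributed, per_ip_limit=5)
capped_burst = simulate(distributed, per_ip_limit=5, global_limit=50)

print(blocked_burst, unblocked_burst, capped_burst)  # 5 500 50
```

With one source, the per-IP limit holds the burst to 5 sends. With 500 distinct sources, every request is the first from its IP, so all 500 go through. Only a limit keyed on something the attacker cannot multiply cheaply, here an overall cap, bounds the total.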

Impact

Around 6 a.m. EU time, the $30 SMS budget was exhausted within minutes. We jumped on a call, found the cause, and then made a mistake: we closed the meeting without action points. Two days later, when the budget reset, the same thing happened again. Another $30 gone, $60 in total, plus downtime while AWS re-approved the spend, which took about two days.

Solution

The main lesson was ownership and follow‑through. Incidents now end with clear action items, owners, and deadlines. We also treated the initial spend as a warning and made sure there were concrete next steps before closing the incident.

Lesson learned: problems don’t fix themselves. Always leave incidents with explicit actions and owners, especially when money is involved.
