It’s 2 AM. The payment service is down. Your on-call phone is ringing. Your manager — who is in a different timezone and speaks English as a first language — needs an update.

This is the worst moment to be searching for the right English words.

Incident communication is a unique English skill. It needs to be fast, clear, and calm — even when you’re neither calm nor certain what’s happening. The language you use during an incident directly affects how stakeholders perceive your team’s competence, even if the technical response is excellent.

Here’s what works.

The Three Stages of Incident Communication

Every incident has three communication moments: the initial alert, status updates during the incident, and the post-incident report. Each has its own language pattern.

Stage 1: Initial Alert

When you first discover a problem, send a short, factual message. Do not wait until you understand everything — acknowledge first, then investigate.

Template:

“🔴 [SERVICE NAME]: [WHAT IS BROKEN]
Impact: [Who is affected, how many users, what they can’t do]
Status: Investigating
Next update: [Time, or in X minutes]”

Example:

“🔴 Payment Service: checkout failing for all users
Impact: ~100% of checkout attempts returning 500 errors since 02:17 UTC
Status: Investigating
Next update: 02:30 UTC”

Notice what’s included: what, who, since when, and when to expect the next update. Notice what’s not included: guesses, blame, apologies, or technical jargon.
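If your team posts alerts from a script or chat bot, the template can be filled in mechanically so nobody has to compose it at 2 AM. The following is a minimal sketch, not a reference implementation: the class name `InitialAlert`, its field names, and the 15-minute default cadence are assumptions made for illustration, and the separator after the service name is an arbitrary choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class InitialAlert:
    """Hypothetical container for the four lines of the initial alert."""
    service: str                   # e.g. "Payment Service"
    what_is_broken: str            # one factual phrase, no guesses
    impact: str                    # who is affected and what they cannot do
    sent_at: datetime              # when the alert goes out (UTC)
    next_update_minutes: int = 15  # assumed default update cadence

    def render(self) -> str:
        # The promised next update is computed, not typed, so it is never forgotten.
        next_update = self.sent_at + timedelta(minutes=self.next_update_minutes)
        return "\n".join([
            f"🔴 {self.service}: {self.what_is_broken}",
            f"Impact: {self.impact}",
            "Status: Investigating",
            f"Next update: {next_update:%H:%M} UTC",
        ])


alert = InitialAlert(
    service="Payment Service",
    what_is_broken="checkout failing for all users",
    impact="~100% of checkout attempts returning 500 errors since 02:17 UTC",
    sent_at=datetime(2024, 1, 1, 2, 17, tzinfo=timezone.utc),  # illustrative date
    next_update_minutes=13,
)
print(alert.render())
```

Because the next update time is derived from the send time, the message always carries a concrete commitment, which is the part people most often leave out under pressure.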

Verbal version (for a call):

“We have an active incident. The payment service has been returning errors since about 2:17 AM. All checkout attempts are failing. We’re actively investigating — I’ll have an update in 15 minutes.”

Stage 2: Status Updates

During a long incident, send regular updates even if nothing has changed. Silence is more stressful than “still working on it.”

Template:

“Update [time]: Still investigating [X]. We’ve ruled out [Y] and [Z]. Current theory: [short explanation]. ETA for resolution: [time or ‘unknown’].”

Example:

“Update 02:30 UTC: Still investigating. We’ve ruled out a database issue and a recent deployment (rolled back at 02:28, no change). Current theory is a rate limit from our payment processor. Reaching out to them now. ETA: unknown. Next update at 03:00.”

Key phrase: “We’ve ruled out…” — this shows methodical progress even before you have a solution.
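The status update template can be assembled the same way. This is a rough sketch under stated assumptions: the function name `format_status_update` and its parameters are hypothetical, and the values in the usage example are illustrative. The point it shows is that “ETA: unknown” is a valid, explicit value rather than a missing field.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def format_status_update(
    sent_at: datetime,
    ruled_out: Sequence[str],
    theory: Optional[str],
    eta: Optional[datetime],
    next_update: datetime,
) -> str:
    """Build a one-paragraph status update; an eta of None is reported as 'unknown'."""
    parts = [f"Update {sent_at:%H:%M} UTC: Still investigating."]
    if ruled_out:
        # Listing eliminated causes shows progress even without a fix.
        parts.append(f"We've ruled out {' and '.join(ruled_out)}.")
    if theory:
        parts.append(f"Current theory: {theory}.")
    eta_text = f"{eta:%H:%M} UTC" if eta else "unknown"
    parts.append(f"ETA: {eta_text}. Next update at {next_update:%H:%M} UTC.")
    return " ".join(parts)


day = datetime(2024, 1, 1, tzinfo=timezone.utc)  # illustrative date
print(format_status_update(
    sent_at=day.replace(hour=2, minute=30),
    ruled_out=["a database issue", "a recent deployment"],
    theory="a rate limit from our payment processor",
    eta=None,
    next_update=day.replace(hour=3, minute=0),
))
```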

Stage 3: Post-Incident (Postmortem) Summary

Once resolved, send a summary. This is the most formal communication and the one most likely to be read by senior stakeholders.

Template:

Incident Summary — [Service] — [Date]
Duration: [X hours Y minutes]
Impact: [Quantified: X% of users, $Y revenue at risk, N tickets created]
Root cause: [One clear sentence]
What we did: [Numbered list of actions taken]
Immediate fix: [What resolved it]
Prevention: [What we’re doing so this doesn’t happen again]
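The Duration line is simple arithmetic on two timestamps, and it is easy to get wrong when you are tired. A minimal sketch, assuming the 02:17 UTC start time from the example alert and a hypothetical resolution time of 03:40 UTC (the resolution time is invented for illustration and is not part of the incident described above); the helper name `format_duration` is also an assumption:

```python
from datetime import datetime, timezone


def format_duration(started_at: datetime, resolved_at: datetime) -> str:
    """Return the incident duration in the 'X hours Y minutes' form used by the template."""
    total_minutes = int((resolved_at - started_at).total_seconds() // 60)
    if total_minutes < 0:
        raise ValueError("resolved_at must not be earlier than started_at")
    hours, minutes = divmod(total_minutes, 60)
    hour_word = "hour" if hours == 1 else "hours"
    minute_word = "minute" if minutes == 1 else "minutes"
    return f"{hours} {hour_word} {minutes} {minute_word}"


started = datetime(2024, 1, 1, 2, 17, tzinfo=timezone.utc)   # start time from the example alert
resolved = datetime(2024, 1, 1, 3, 40, tzinfo=timezone.utc)  # hypothetical resolution time
print(f"Duration: {format_duration(started, resolved)}")  # Duration: 1 hour 23 minutes
```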


Bug Report English: Writing Tickets That Get Fixed

Clear bug reports in English dramatically increase how quickly issues get addressed — especially in international teams where the developer reading your ticket may not know your system well.

Weak bug report:

“Login is broken. It doesn’t work sometimes.”

Strong bug report:

Login failure on mobile Safari — intermittent, affects ~20% of iOS users
Steps to reproduce:

  1. Open the app on Safari iOS 17+ on a device with Face ID enabled
  2. Enter valid credentials and tap Sign In
  3. Face ID prompt appears but after authentication, app returns to login screen

Expected: User is logged in and redirected to dashboard
Actual: User stays on login screen. No error message shown.

Frequency: ~3 out of 5 attempts on affected devices
Environment: iOS 17.4, Safari 17, iPhone 15 Pro
Logs: [link]

The structure: Title with impact → Steps → Expected vs Actual → Frequency and environment. Every developer on your team, in any timezone, can now reproduce and fix this without asking you five follow-up questions.
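One way to keep tickets in this shape is to treat the report as a small data structure, so a missing section is caught before the ticket is filed. The sketch below is only an illustration: `BugReport` and its fields are hypothetical names, not the schema of any real issue tracker.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BugReport:
    """Hypothetical structure mirroring the sections of the strong bug report."""
    title: str          # what fails, where, and how widely
    steps: List[str]    # reproduction steps, in order
    expected: str
    actual: str
    frequency: str
    environment: str
    logs: str = "[link]"  # placeholder until a real log link is attached

    def render(self) -> str:
        if not self.steps:
            raise ValueError("a bug report needs at least one reproduction step")
        numbered = [f"  {i}. {step}" for i, step in enumerate(self.steps, start=1)]
        return "\n".join([
            self.title,
            "Steps to reproduce:",
            *numbered,
            f"Expected: {self.expected}",
            f"Actual: {self.actual}",
            f"Frequency: {self.frequency}",
            f"Environment: {self.environment}",
            f"Logs: {self.logs}",
        ])


report = BugReport(
    title="Login failure on mobile Safari (intermittent, affects ~20% of iOS users)",
    steps=[
        "Open the app on Safari iOS 17+ on a device with Face ID enabled",
        "Enter valid credentials and tap Sign In",
        "Face ID prompt appears but after authentication, app returns to login screen",
    ],
    expected="User is logged in and redirected to dashboard",
    actual="User stays on login screen. No error message shown.",
    frequency="~3 out of 5 attempts on affected devices",
    environment="iOS 17.4, Safari 17, iPhone 15 Pro",
)
print(report.render())
```

Every field except the log link is required, so the weak report (“Login is broken”) cannot be created without first answering the questions a developer would otherwise have to ask.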


🗣️ Key Phrases to Say Out Loud

  1. “We have an ACTive INcident” /wiː hæv ən ˈæktɪv ˈɪnsɪdənt/: The opener for any emergency call. Calm, declarative.

  2. “We’ve RULED out [X]” /wiːv ruːld aʊt/: Shows methodical investigation even without a solution.

  3. “The IMpact is…” /ðə ˈɪmpækt ɪz/: Always quantify: users affected, revenue at risk, features unavailable.

  4. “ETA is unKNOWN. I’ll upDATE at [time]” /iːtiːeɪ ɪz ʌnˈnoʊn/: Honest and structured. Much better than “I don’t know.”

  5. “We’re inVEStigating. No FINdings yet” /wɪər ɪnˈvestɪɡeɪtɪŋ noʊ ˈfaɪndɪŋz jɛt/: Status update when you’re still looking.

  6. “The ROOT cause was…” /ðə ruːt kɔːz wɒz/: Open the postmortem section with this phrase.

  7. “As a preVENtive MEAsure, we’re…” /æz ə prɪˈvɛntɪv ˈmeʒər wɪər/: Closes the incident with action, not just explanation.


📚 Vocabulary

1. Rollback /ˈroʊlbæk/

  • Meaning: Reverting code or config to a previous working version
  • Example: “We initiated a rollback at 02:28 but the error persisted.”

2. Postmortem /poʊstˈmɔːrtəm/

  • Meaning: A meeting or document analyzing what went wrong after an incident (no blame — just learning)
  • Example: “Can you write the postmortem by EOD?”

3. Intermittent /ˌɪntərˈmɪtənt/

  • Meaning: Occurring at irregular intervals, not constant
  • Example: “The error is intermittent — hard to reproduce locally.”

4. Escalate /ˈɛskəleɪt/

  • Meaning: To bring a problem to a higher level of authority or urgency
  • Example: “If we don’t have a fix by 3 AM, I’ll escalate to the CTO.”

5. Mitigation /ˌmɪtɪˈɡeɪʃən/

  • Meaning: Action taken to reduce the impact of a problem (not the same as fixing it)
  • Example: “As a mitigation, we’ve enabled the fallback payment processor.”

6. Reproduce /ˌriːprəˈdjuːs/

  • Meaning: To make a bug happen again deliberately
  • Example: “Can you reliably reproduce this? We need steps.”

7. SLA /ɛs ɛl eɪ/ (Service Level Agreement)

  • Meaning: A commitment about uptime, response time, or performance
  • Example: “We’re approaching the SLA breach threshold — this needs to be resolved in the next 20 minutes.”

🎯 Practice Now

Exercise 1: Write the Initial Alert

The scenario: your team’s API gateway has been returning timeout errors for the last 8 minutes, affecting approximately 30% of API calls. Your product is a B2B SaaS platform.

Write the initial alert message following the template above. Then read it aloud — it should take under 20 seconds.

Check: Does it include what broke, impact scope, start time, and next update time?

Exercise 2: Convert Technical to Plain English

Translate this technical status update for a non-technical stakeholder:

“The Redis connection pool is exhausted due to a memory leak in the session handler introduced in commit a3f8b2. We’ve increased pool size as a mitigation but the underlying leak will require a patch.”

A stakeholder-friendly version might sound like:

“We found the cause: a recent code change is using more memory than expected, which is causing the slowdown. We’ve applied a temporary fix that’s helping right now. We’re writing a proper fix that we’ll deploy by [time].”

Practice saying your version aloud until it sounds natural — no hesitation on the key words.

Exercise 3: The Postmortem Opening

Every postmortem starts with a brief verbal summary before the document is shared. Practice this structure:

“We had an incident on [date] affecting [service] for [duration]. [X users / $Y revenue] was impacted. The root cause was [one sentence]. We’ve fixed it by [action] and we’re putting [prevention measure] in place to prevent recurrence. The full postmortem is in Confluence — I’ll walk through the key points now.”

Record yourself saying this. Check: is your tone calm? Is “root cause” clearly stressed? Does “prevention” sound confident or apologetic?


Incidents happen to every team. How your team communicates during and after them is what separates teams that build trust from teams that erode it.

Clear English isn’t just about language — it’s about giving stakeholders the information they need to make decisions and stay calm. Your words during an incident are part of the incident response.

Next time the alert fires, you’ll have the phrases ready.
