30-Day Failure Pre-Mortem: Field-Ready Prompt with 8 Risk Lenses

TL;DR

  • A practical AI prompt for teams to run a structured 30-day failure pre-mortem workshop.
  • Outputs: failure headlines, early warning indicators (EWIs), counter-moves, kill criteria, rollback path, and assumption tests.
  • Provides eight risk lenses to surface governance, observability, sequencing, integration, and human-factor issues.
  • Helps decision-owners move from abstract risks to concrete counter-moves with owners and dates.
  • Includes a VMware lifecycle lens for teams running vSphere, NSX, Aria, HCX, or SRM.

Introduction

Every initiative carries hidden risks, and too often teams wait until failure is visible before addressing them. A 30-Day Failure Pre-Mortem flips the script. Instead of reacting after something breaks, you simulate failure in advance and design counter-moves before launch.

This article introduces a field-ready prompt you can paste directly into your AI tool. It helps facilitators run a fast but disciplined session that surfaces real risks across governance, resourcing, observability, and more. By the end, you walk away with failure headlines, EWIs, counter-moves, and clear ownership.


The Custom Prompt

Comprehensive 30-Day Failure Pre-Mortem Prompt

TITLE: 30-Day Failure Pre-Mortem — Field-Ready Prompt (with 8 Risk Lenses)

HOW TO USE (2 minutes prep)
- Timebox: 20–40 minutes. Attendees do a 5-minute silent write, then share round-robin.
- Roles: Facilitator (keeps time), Scribe (fills template), Decision-Owner (can approve kill criteria).
- Inputs ready: Scope/goal, dates, dependencies, owners, current version/builds (if applicable).
- Outputs by end: Top 3 failure headlines, EWIs with thresholds, counter-moves w/ owners & dates, kill criteria, rollback path, 3 assumption tests for this week.
- Norms: No blame. Be concrete (“NSX Edge T0 failover drains half traffic”), not vague (“network issue”).
- Scoring: For each risk, rate Impact (H/M/L) × Likelihood (H/M/L) and set a red/amber/green gate to proceed.

QUICK VERSION (60-second run)
Imagine it’s 30 days from now and this initiative has failed.
1) What are the top three causes? 
2) What early warning indicators (EWIs) would tip us off (with thresholds)? 
3) What counter-moves do we put in place now (Prevent / Detect / Respond / Recover)? 
Assign owner + date per counter-move. Define kill criteria and a tested rollback path.

FULL WORKSHOP SCRIPT

A) SET THE SCENE (2 min)
“It’s Day 30 and the initiative failed. Tell the short headline of what went wrong.”

B) SILENT WRITE (5 min) → ROUND-ROBIN (10–20 min) → CONSOLIDATE (5 min) → COMMIT (5 min)

C) CAPTURE THE FAILURE (Top 3 Headlines)
1) …
2) …
3) …

D) ROOT CAUSES (tag each): People / Process / Technology / Data / Dependencies / Vendors / Security & Compliance / Budget & Time / Governance / Change Mgmt / Observability
- Cause #1 → tags: … | Why now? …
- Cause #2 → tags: … | Why now? …
- Cause #3 → tags: … | Why now? …

E) EARLY WARNING INDICATORS (EWIs)
List the earliest observable signals in weeks 1–4. Add thresholds and owners.
- EWI → threshold (e.g., error rate > 2% for 5m) → owner → alert path/channel.

F) COUNTER-MOVES (build now)
For each cause, propose at least one in each category:
- Prevent: guardrails, checklists, approvals/gates, interop checks, feature flags, capacity reservations.
- Detect: synthetic tests, health probes, telemetry, drift checks, canary dashboards.
- Respond: comms tree, incident roles, rollback button, isolation/drain steps.
- Recover: tested backups/restores, DR pattern, vendor escalation path.
→ Assign Owner | Date | Success metric (how we’ll know it worked).

G) KILL CRITERIA & ROLLBACK (binding)
- Kill criteria (objective tripwires): …
- Rollback path (tested dry-run): exact steps/tools, data/state implications, time estimate.

H) ASSUMPTIONS TO TEST THIS WEEK
Assumption → Test/Experiment → When → Owner → Success/Fail signal.

I) DECISIONS & COMMITMENTS
- Decision log (who decided, what, when). 
- Parking lot (defer, with owner/date).

J) EIGHT RISK LENSES (ask these explicitly)
1. Governance & Decision Traps
   - Who can pause/kill by Day 15? What objective criteria force a stop? Where is this decision tree documented?

2. Resourcing Reality
   - If a key SME is unavailable, which runbooks/bench cover? Which cross-team dependencies are “assumed free” but actually gated?

3. Observability Gaps
   - Which failure would be invisible for 48 hours? What’s the earliest symptom per cause, and how do we automate detection?

4. Time & Sequencing
   - What takes longer than planned (CAB, firewalls, PKI, procurement)? What’s the latest safe start date to still finish on time?

5. Economic & Contract Risks
   - What could double run-rate (egress, snapshots, log retention, rehydration)? Any vendor SLA/EoGS boundary inside 30–90 days?

6. Integration / Interop Hotspots
   - Which API/contract/version mismatch would hard-fail us? What pre-flight interop check blocks the change if mismatched?

7. Second-Order Impacts
   - If we succeed technically but fail operationally (handover, monitoring, cost), what breaks first? Who downstream is impacted?

8. Human Factors
   - Misaligned incentives? Quiet resistance? What training/dry-run is required so hand-off isn’t “learn it in prod”?

K) (OPTIONAL) VMWARE INTEROP & LIFECYCLE LENS
- Version truth: List vCenter/ESXi/NSX/vSAN/Aria/HCX/SRM builds. Interop matrix check before/after + rollback path validated.
- vLCM/vSAN: What if remediation drifts hosts out of compliance or resyncs spike? EWI thresholds; throttle/evac plan.
- NSX: Edge/T0 failover continuity, parallel change freeze with upstream firewalls, synthetic east-west checks per segment.
- DR: SRM/HCX cutover fail path, RTO breach threshold, manual runbook validated.

L) RISK REGISTER (paste into your doc)

| Cause | Impact | Likelihood | Tags | EWI (threshold) | Counter-Moves (P/D/R/R) | Owner | Date | Success Metric |
|---|---|---|---|---|---|---|---|---|
| … | H/M/L | H/M/L | … | … | … | … | … | … |
| … | H/M/L | H/M/L | … | … | … | … | … | … |
| … | H/M/L | H/M/L | … | … | … | … | … | … |

M) “WHAT DO YOU WISH I WOULD ASK?”
Capture 3–5 meta-questions. Promote the best one to a standing checkpoint in status reviews.

FACILITATION TIPS (use as needed)
- Anchor discipline: silent write first prevents groupthink.
- Force ownership: no counter-move without an owner and a date.
- Gate to green: do not exit the meeting without kill criteria, rollback steps tested, and three assumption tests scheduled this week.

What This Prompt Does

The 30-Day Failure Pre-Mortem prompt gives teams a ready-to-use facilitation script for surfacing risks before launch. It forces concrete failure headlines, early warning indicators, counter-moves, and tested rollback paths. With the eight risk lenses, teams look beyond technology into governance, economics, sequencing, and human factors.

Example: A team preparing a cloud migration uses the prompt. Within 30 minutes, they identify “firewall change delays” as a top cause of failure, set an EWI (ticket not approved by Day 10), and assign a counter-move (pre-file CAB request with escalation path).
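
The prompt's EWI template pairs a threshold with a duration, for example "error rate > 2% for 5m". A minimal Python sketch of that kind of check follows. The function name `ewi_breached`, the sample format, and the rule that every sample in the window must exceed the threshold are assumptions for illustration, not part of the prompt.

```python
from datetime import datetime, timedelta


def ewi_breached(samples, threshold=0.02, window=timedelta(minutes=5)):
    """Return True if every sample in the trailing window exceeds the threshold.

    samples: list of (timestamp, error_rate) tuples sorted by time (assumed format).
    """
    if not samples:
        return False
    latest = samples[-1][0]
    # Require data spanning the full window, so a single spike cannot trip the EWI.
    if latest - samples[0][0] < window:
        return False
    recent = [rate for ts, rate in samples if latest - ts <= window]
    return all(rate > threshold for rate in recent)
```

A real alerting tool would normally own this logic; the sketch only shows why an EWI needs both a threshold and a duration before it can be assigned to an owner.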


Step by Step Usage

  1. Copy the full prompt into ChatGPT or Claude.
  2. Bring scope, dates, dependencies, and owners to the session.
  3. Timebox 20–40 minutes, with silent write followed by round-robin.
  4. Capture top three failure headlines, EWIs, counter-moves, kill criteria, rollback path, and assumption tests.
  5. Apply the eight risk lenses explicitly before closing the session.
  6. Save the risk register and integrate it into project tracking.
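
For step 6, one way to carry the risk register into project tracking is to render it in the same table layout the prompt uses. The sketch below is an assumption about how a team might do that; `RiskRow` and `to_markdown` are hypothetical names, and only the column headers come from the prompt.

```python
from dataclasses import dataclass, astuple

# Column headers copied from the prompt's risk register table.
HEADERS = ["Cause", "Impact", "Likelihood", "Tags", "EWI (threshold)",
           "Counter-Moves (P/D/R/R)", "Owner", "Date", "Success Metric"]


@dataclass
class RiskRow:
    cause: str
    impact: str          # H/M/L
    likelihood: str      # H/M/L
    tags: str
    ewi: str
    counter_moves: str
    owner: str
    date: str
    success_metric: str


def to_markdown(rows):
    """Render rows as a Markdown table with the prompt's column order."""
    lines = ["| " + " | ".join(HEADERS) + " |",
             "|" + "---|" * len(HEADERS)]
    for row in rows:
        # Escape pipes so free-text cells cannot break the table layout.
        cells = [str(cell).replace("|", "\\|") for cell in astuple(row)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

The output can be pasted into a wiki page or ticket description; a team that tracks work in a spreadsheet could swap the renderer for CSV without changing the row structure.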

Quality and Safety Checks

  • Risks are captured without blame and made concrete.
  • Each risk has both an EWI and a counter-move with owner/date.
  • Kill criteria and rollback path are binding and tested.
  • Three assumption tests are scheduled in the same week.
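
Most of these checks are mechanical enough to automate before the meeting closes. The following is a rough sketch under assumed field names (`cause`, `ewi`, `counter_move`, `owner`, `date`); the function `session_gaps` is hypothetical, and the "no blame, be concrete" check still needs human judgment.

```python
def session_gaps(risks, kill_criteria, rollback_tested, assumption_tests):
    """Return a list of gaps; an empty list means the mechanical checks pass.

    risks: list of dicts with keys "cause", "ewi", "counter_move", "owner", "date"
           (hypothetical field names).
    kill_criteria: list of objective tripwires.
    rollback_tested: True once the rollback path has been dry-run.
    assumption_tests: list of tests scheduled for this week.
    """
    gaps = []
    for risk in risks:
        name = risk.get("cause", "<unnamed risk>")
        for field in ("ewi", "counter_move", "owner", "date"):
            if not risk.get(field):
                gaps.append(f"{name}: missing {field}")
    if not kill_criteria:
        gaps.append("no kill criteria defined")
    if not rollback_tested:
        gaps.append("rollback path not tested")
    if len(assumption_tests) < 3:
        gaps.append(f"only {len(assumption_tests)} of 3 assumption tests scheduled")
    return gaps
```

A facilitator could run this against the scribe's notes and keep the session open until the list comes back empty.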

FAQ

Q1: Who should use this prompt?
Teams launching new initiatives, projects, or product builds within a 30-day horizon.

Q2: What if we only have 5 minutes?
Use the Quick Version, a 60-second run that captures the top three causes, EWIs, counter-moves, kill criteria, and a rollback path.

Q3: Does this replace risk registers?
No. It populates and strengthens them with tested thresholds and ownership.

Q4: What if my initiative involves VMware or hybrid cloud?
Use the optional VMware lens for vSphere, NSX, Aria, HCX, or SRM lifecycle risks.

Q5: How do we avoid groupthink?
The silent write step ensures individual thinking before group consolidation.


Conclusion

The 30-Day Failure Pre-Mortem prompt enables teams to move from vague concerns to actionable risk management in under an hour. By combining structured facilitation with eight risk lenses, it helps teams set clear thresholds, assign counter-moves, and validate rollback paths. This proactive discipline reduces surprises, builds resilience, and gives decision-owners confidence to proceed.


Field Drill Walkthrough

Scenario: Cloud Migration with Tight Deadlines

  • Failure headline: “Firewall approvals missed, blocking cutover.”
  • EWI: CAB ticket not approved by Day 10.
  • Counter-moves: Prevent (submit ticket Day 1), Detect (track SLA daily), Respond (escalate at Day 7 if still unapproved), Recover (roll back the cutover).
  • Kill criteria: No firewall approval by Day 12 → stop cutover.
  • Rollback path: Tested dry-run with isolation in staging, 1-hour recovery window.
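
The day-based tripwires in this scenario can be written down as a small decision function so nobody has to argue about them on the day. This is a sketch only: the function name `firewall_gate` and the returned labels are hypothetical, and it assumes "by Day N" means the tripwire is active from Day N onward.

```python
def firewall_gate(day: int, approved: bool) -> str:
    """Map the current project day and approval status to the drill's next action."""
    if approved:
        return "proceed"
    if day >= 12:
        return "kill"      # kill criterion: no approval by Day 12, stop cutover
    if day >= 10:
        return "ewi"       # early warning indicator: ticket not approved by Day 10
    if day >= 7:
        return "escalate"  # Respond counter-move: escalate at Day 7
    return "track"         # Detect counter-move: track SLA daily
```

For example, `firewall_gate(8, False)` returns `"escalate"` and `firewall_gate(12, False)` returns `"kill"`, while any day with approval in hand returns `"proceed"`.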

Top Risks:

  1. Dependency approvals slip past deadlines.
  2. Integration mismatches between APIs.
  3. Human factors: hand-off not trained before launch.
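
The prompt asks for an Impact × Likelihood rating and a red/amber/green gate for each risk, but leaves the mapping to the team. One possible mapping is sketched below; the numeric weights, the cut-offs, and the function name `rag_gate` are assumptions, not something the prompt prescribes.

```python
# Assumed numeric weights for the prompt's H/M/L ratings.
LEVELS = {"L": 1, "M": 2, "H": 3}


def rag_gate(impact: str, likelihood: str) -> str:
    """Turn two H/M/L ratings into a red/amber/green gate (assumed cut-offs)."""
    score = LEVELS[impact] * LEVELS[likelihood]  # possible values: 1, 2, 3, 4, 6, 9
    if score >= 6:
        return "red"
    if score >= 3:
        return "amber"
    return "green"
```

Under these assumed cut-offs, a High-impact, Medium-likelihood risk lands on red and a Medium-impact, Low-likelihood risk on green. Teams with a lower risk appetite can tighten the thresholds without changing the rating scale.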

Checkpoints:

  • Run assumption tests weekly.
  • Confirm rollback steps are validated in staging before go-live.
