How to document a service outage or downtime
Your internet goes down during a critical workday. A hosting provider's servers become unreachable, taking your website offline. A SaaS tool your team depends on throws errors for three hours. The immediate priority is getting the service back. But once it's restored, there's a second question: what record do you have of what happened, how long it lasted, and what it cost you?
Service outages are common. Documented service outages - with timestamps, communication records, and impact assessments - are not. Most people move on once the service is restored. The ones who document the outage are the ones who successfully claim SLA credits, negotiate contract adjustments, or hold providers accountable for repeated failures.
Capturing the outage in real time
Start documenting the moment you notice the outage. The first entry in your record should be the time you first observed the problem. Not when the provider later says it started - when you experienced it.
Note what you observed: "At 9:15 AM, [service] became unreachable. Error message: 502 Bad Gateway. Attempted to access from two different devices and networks. Same result." Specificity matters. "The site was down" is less useful than a description of the error, the time, and what you tried.
Screenshot everything you can:
- Error messages. The specific error your browser, application, or device displays when the service fails. Screenshots capture the exact wording and the timestamp.
- Status pages. Most major providers maintain a public status page (status.aws.amazon.com, status.slack.com, etc.). Check it as soon as you notice the outage and screenshot what it shows. Status pages are often updated retroactively, so the initial display - which may say "all systems operational" while the service is down - is informative in itself.
- Provider communications. If the provider sends an email, posts on social media, or updates a status page acknowledging the outage, save those communications with timestamps. The gap between when the outage began and when the provider acknowledged it is relevant data.
If the outage continues, take periodic screenshots of the status page and any error messages. A single screenshot shows one moment. Multiple screenshots at intervals show the duration and progression of the outage, including any claims of partial resolution that didn't match your experience.
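A real-time record like this can live in a notebook or a spreadsheet, but a small script keeps the timestamps consistent. The following is a minimal sketch, not a prescribed tool: the `Observation` fields and the `record` and `format_log` helpers are names invented for illustration, and the sample URL, date, and file path are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Observation:
    observed_at: datetime  # when YOU saw it (UTC), not when the provider says it began
    target: str            # what you checked: a URL, an app, the status page
    result: str            # exact error text or what the page displayed
    evidence: str = ""     # path to the screenshot or saved email, if any


def record(log: List[Observation], target: str, result: str,
           evidence: str = "",
           observed_at: Optional[datetime] = None) -> Observation:
    """Append one timestamped observation; defaults to the current UTC time."""
    entry = Observation(observed_at or datetime.now(timezone.utc),
                        target, result, evidence)
    log.append(entry)
    return entry


def format_log(log: List[Observation]) -> str:
    """Render the log as plain text, oldest entry first."""
    lines = []
    for o in sorted(log, key=lambda item: item.observed_at):
        stamp = o.observed_at.isoformat(timespec="seconds")
        suffix = f" [evidence: {o.evidence}]" if o.evidence else ""
        lines.append(f"{stamp} | {o.target} | {o.result}{suffix}")
    return "\n".join(lines)


if __name__ == "__main__":
    log: List[Observation] = []
    # Placeholder entries; in practice, omit observed_at to stamp the current time.
    record(log, "https://example.com", "502 Bad Gateway, two devices, two networks",
           observed_at=datetime(2024, 3, 5, 9, 15, tzinfo=timezone.utc))
    record(log, "provider status page", "all systems operational",
           evidence="screenshots/status-0916.png",
           observed_at=datetime(2024, 3, 5, 9, 16, tzinfo=timezone.utc))
    print(format_log(log))
```

Storing times in UTC avoids ambiguity when your records are later compared with a provider's incident report, which may use a different time zone.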
Support ticket documentation
Most people contact the provider's support team during an outage. That support interaction generates documentation, but you need to capture it yourself.
If you submit a support ticket, save the ticket number, the date and time of submission, and the text of your report. If the ticket system sends a confirmation email, save it. If responses come through a portal, screenshot them - support portal content can be archived or deleted after a ticket is closed.
If you contact support by phone, note the date, time, the representative's name, and what you were told. For example: "Called at 10:30 AM. Spoke with [name]. Was told the engineering team was aware of the issue and estimated resolution within two hours. No ticket number was provided."
If the provider gives you estimated restoration times, document those alongside the actual restoration time. Being told the service would be restored by noon when it wasn't restored until 5 PM is relevant to both an SLA claim and a broader evaluation of the provider's communication during incidents.
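The gap between a promised and an actual restoration time is simple arithmetic, sketched below with the noon and 5 PM figures from the example above. The `eta_slip` helper and the calendar date are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def eta_slip(promised: datetime, actual: datetime) -> timedelta:
    """How far actual restoration overran the provider's estimate (negative if early)."""
    return actual - promised


promised = datetime(2024, 3, 5, 12, 0)  # "restored by noon" (placeholder date)
actual = datetime(2024, 3, 5, 17, 0)    # service actually returned at 5 PM
slip = eta_slip(promised, actual)
print(f"Restoration overran the estimate by {slip.total_seconds() / 3600:.1f} hours")
# prints: Restoration overran the estimate by 5.0 hours
```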
Checking against your SLA
Service Level Agreements define what uptime the provider has committed to and what remedies are available when that commitment isn't met. If you have an SLA - and many paid services include one - review it during or shortly after the outage.
Key SLA elements to document:
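- Guaranteed uptime percentage. Over a full year, 99.9% uptime allows roughly 8.8 hours of downtime (0.1% of 8,760 hours), and 99.99% allows about 53 minutes. If the SLA measures uptime per month rather than per year, the allowance shrinks proportionally. Calculate whether this outage, combined with any previous outages, pushes the provider below their guarantee.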
- How downtime is measured. Some SLAs define downtime narrowly - only counting periods where the service is completely unavailable, not degraded performance. Your documentation of the outage's nature (complete failure vs. intermittent errors vs. degraded speed) matters here.
- Credit or remedy process. Most SLAs require you to request credits within a specific timeframe, often within 30 days of the outage. Note the deadline and the submission process.
- Exclusions. SLAs typically exclude certain causes: scheduled maintenance, force majeure, problems caused by your own systems. Review whether the provider might invoke an exclusion.
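The uptime arithmetic in the list above can be checked with a few lines of code. This is a rough sketch under stated assumptions: the function names are invented here, the 30-day month is an assumed measurement period, and it counts every minute of every outage as downtime, which your SLA's definition and exclusions may not.

```python
from typing import Iterable


def allowed_downtime_minutes(uptime_percent: float, period_hours: float) -> float:
    """Downtime the guarantee permits over the measurement period."""
    return (1 - uptime_percent / 100) * period_hours * 60


def remaining_budget_minutes(uptime_percent: float, period_hours: float,
                             outages_minutes: Iterable[float]) -> float:
    """Allowance minus recorded downtime; negative means the guarantee was missed."""
    return allowed_downtime_minutes(uptime_percent, period_hours) - sum(outages_minutes)


def achieved_uptime_percent(period_hours: float,
                            outages_minutes: Iterable[float]) -> float:
    """Uptime actually delivered over the period, as a percentage."""
    total_minutes = period_hours * 60
    return 100 * (total_minutes - sum(outages_minutes)) / total_minutes


YEAR_HOURS = 365 * 24   # 8,760 hours
MONTH_HOURS = 30 * 24   # assumed 30-day measurement period

print(f"99.9% per year:   {allowed_downtime_minutes(99.9, YEAR_HOURS) / 60:.2f} hours")
print(f"99.99% per year:  {allowed_downtime_minutes(99.99, YEAR_HOURS):.2f} minutes")
print(f"99.9% per month:  {allowed_downtime_minutes(99.9, MONTH_HOURS):.1f} minutes")

# One 4.5-hour outage (270 minutes) in a 30-day month with a 99.9% guarantee.
outages = [270]
print(f"Remaining budget: {remaining_budget_minutes(99.9, MONTH_HOURS, outages):.1f} minutes")
print(f"Achieved uptime:  {achieved_uptime_percent(MONTH_HOURS, outages):.3f}%")
```

A negative remaining budget is the signal that the guarantee was missed for that period and a credit request may be worth filing.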
Save a copy of the relevant SLA document. SLA terms can change with contract renewals, and having the version that was in effect at the time of the outage prevents disputes about what was promised.
Measuring and documenting impact
SLA credit claims often require you to describe the impact of the outage. Even if you're not filing a formal claim, documenting the impact while it's fresh is useful.
Business impact. Revenue lost during the outage. Customers who couldn't access your service. Orders that couldn't be processed. Deadlines missed. Quantify what you can. "Our e-commerce site was unreachable for 4.5 hours during peak business hours. Based on average hourly revenue, estimated lost sales are approximately $X."
Productivity impact. Staff who couldn't work because a tool was unavailable. Hours spent on workarounds. Time spent communicating with your own customers about the outage. "12 team members were unable to access the project management platform for three hours. Total productivity impact: approximately 36 person-hours."
Customer impact. If the outage affected your customers, document any communications you received - support tickets, complaints, social media mentions. These demonstrate downstream consequences of the provider's failure.
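The business and productivity estimates above reduce to two multiplications, sketched here. The function names are illustrative, and the hourly revenue figure is a placeholder, not data from any real business; substitute your own average.

```python
def estimated_lost_revenue(avg_hourly_revenue: float, outage_hours: float) -> float:
    """Rough estimate: average hourly revenue times hours unreachable."""
    return avg_hourly_revenue * outage_hours


def lost_person_hours(staff_affected: int, hours_blocked: float) -> float:
    """Staff unable to work times the hours they were blocked."""
    return staff_affected * hours_blocked


# 12 team members blocked for three hours, as in the productivity example.
print(f"Productivity impact: {lost_person_hours(12, 3):.0f} person-hours")
# prints: Productivity impact: 36 person-hours

# PLACEHOLDER revenue figure for a 4.5-hour outage; replace with your own average.
placeholder_hourly_revenue = 800.0
print(f"Estimated lost sales: ${estimated_lost_revenue(placeholder_hourly_revenue, 4.5):,.2f}")
```

An average is only an approximation. If the outage fell during peak hours, an average taken over the same weekday and time window in previous weeks is easier to defend than an all-day average.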
Filing for SLA credits
Once the outage is resolved and you've assembled your documentation, submit a credit request within the SLA's required timeframe. A well-documented request includes:
- The date and time the outage began (from your perspective)
- The date and time service was restored
- The total duration of downtime
- Screenshots of error messages and status page updates
- Your support ticket number and correspondence
- A description of the impact
- The specific SLA provision that was breached
- The credit amount you're requesting, calculated per the SLA's formula
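Credit formulas differ from one SLA to the next, but many take the form of tiers keyed to achieved uptime. The sketch below assumes that form. The tier table is entirely hypothetical and must be replaced with the terms in your own agreement, and the function names and fee are likewise illustrative.

```python
from typing import List, Tuple

# HYPOTHETICAL tiers: (minimum achieved uptime %, credit as % of the monthly fee).
# Replace with the schedule from your own SLA.
EXAMPLE_TIERS: List[Tuple[float, float]] = [
    (99.9, 0.0),   # guarantee met: no credit
    (99.0, 10.0),
    (95.0, 25.0),
    (0.0, 50.0),
]


def credit_percent(achieved_uptime: float, tiers: List[Tuple[float, float]]) -> float:
    """Return the credit for the highest tier whose floor the achieved uptime reaches."""
    for floor, credit in sorted(tiers, reverse=True):
        if achieved_uptime >= floor:
            return credit
    return 0.0


def credit_amount(monthly_fee: float, achieved_uptime: float,
                  tiers: List[Tuple[float, float]]) -> float:
    """Credit in currency units for the billing period."""
    return monthly_fee * credit_percent(achieved_uptime, tiers) / 100


# A 4.5-hour outage in a 30-day month works out to 99.375% achieved uptime.
achieved = 99.375
fee = 500.0  # placeholder monthly fee
print(f"Credit tier: {credit_percent(achieved, EXAMPLE_TIERS):.0f}% of the monthly fee")
print(f"Requested credit: ${credit_amount(fee, achieved, EXAMPLE_TIERS):.2f}")
```

Showing the calculation in the request, with the SLA clause it comes from, leaves the provider less room to dispute the amount.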
Keep a copy of your credit request and any response from the provider. If the credit is granted, save the confirmation. If it's denied, save the denial and the stated reason - this may be relevant if the provider's reliability becomes a factor in contract renewal discussions.
Building an outage history
A single outage is an event. Multiple outages are a pattern. Maintaining a simple log of service outages over time - even brief ones - gives you a complete picture of a provider's reliability.
Each entry needs only a few fields: date, provider, duration, cause (if disclosed), impact, and resolution. Over a year, this log shows whether outages are isolated incidents or recurring problems. That information is leverage in contract negotiations, migration decisions, or conversations with leadership about infrastructure reliability.
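A spreadsheet is enough for such a log, but the same fields also map onto a small script. The following is one possible sketch: the `OutageEntry` class, the helper names, and the sample providers and entries are all invented for illustration.

```python
import csv
import io
from collections import defaultdict
from dataclasses import asdict, dataclass, fields
from typing import Dict, Iterable


@dataclass
class OutageEntry:
    date: str              # ISO date, e.g. "2024-03-05"
    provider: str
    duration_minutes: int
    cause: str             # "not disclosed" if the provider never said
    impact: str
    resolution: str


def to_csv(entries: Iterable[OutageEntry]) -> str:
    """Serialize the log as CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(OutageEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


def summarize(entries: Iterable[OutageEntry]) -> Dict[str, Dict[str, int]]:
    """Count outages and total downtime per provider."""
    summary: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"outages": 0, "total_minutes": 0})
    for entry in entries:
        summary[entry.provider]["outages"] += 1
        summary[entry.provider]["total_minutes"] += entry.duration_minutes
    return dict(summary)


# Invented sample entries.
history = [
    OutageEntry("2024-03-05", "HostCo", 270, "not disclosed",
                "site unreachable", "restored by provider"),
    OutageEntry("2024-06-18", "HostCo", 15, "network maintenance overrun",
                "intermittent errors", "restored by provider"),
    OutageEntry("2024-07-02", "ChatTool", 180, "not disclosed",
                "12 staff blocked", "restored by provider"),
]

print(to_csv(history))
for provider, stats in summarize(history).items():
    print(f"{provider}: {stats['outages']} outages, {stats['total_minutes']} minutes")
```

Recording brief outages alongside long ones matters here, because the per-provider count is what distinguishes an isolated incident from a recurring problem.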
Your outage documentation is your account of what happened to your service, your business, and your operations. The provider will have their own incident report, written from their perspective. Your records ensure you have a parallel account grounded in your experience and your data.