Incident Management
Incident classification
Classification | Type | Example | Response time | Update frequency |
---|---|---|---|---|
P1 | Critical | Ongoing unauthorised access or severe security incident | 20 minutes (Live Services Team) | 1 hour |
P2 | Major | Substantial degradation of service, such as significant numbers of events not being ingested or sent outbound, or no events being processed | 60 minutes (office hours only) | 2 hours |
P3 | Significant | Users experiencing intermittent or degraded service due to platform issue | 2 hours (office hours only) | Once after 2 business days |
P4 | Minor | Component failure that does not immediately impact a service, or an unsuccessful DoS attempt | 1 business day (office hours only) | Once after 5 business days |
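As an illustration only, the classification table above could be held as a small lookup structure, for example to label alerts or pre-fill an incident report. This is a minimal sketch: the names `Classification`, `CLASSIFICATIONS` and `classification_for` are hypothetical and are not part of any GDX tooling. The values are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    """One row of the incident classification table (hypothetical structure)."""
    type: str
    response_time: str
    response_note: str
    update_frequency: str


# Values copied verbatim from the classification table.
CLASSIFICATIONS = {
    "P1": Classification("Critical", "20 minutes", "Live Services Team", "1 hour"),
    "P2": Classification("Major", "60 minutes", "office hours only", "2 hours"),
    "P3": Classification("Significant", "2 hours", "office hours only",
                         "Once after 2 business days"),
    "P4": Classification("Minor", "1 business day", "office hours only",
                         "Once after 5 business days"),
}


def classification_for(priority: str) -> Classification:
    """Look up a table row by priority label, ignoring case and whitespace."""
    key = priority.strip().upper()
    if key not in CLASSIFICATIONS:
        raise ValueError(f"Unknown priority: {priority!r}")
    return CLASSIFICATIONS[key]
```

For example, `classification_for("p2").response_time` gives the P2 response time from the table.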
Incident workflow
Incidents may include the following scenarios:
- system access problems
- wider technical failures with possible reputational impact to GDS
- denial of service (DoS)
- data breach or leak
- defacement
- unauthorised use of systems
- suspicious activity, such as traffic from an unknown source
A priority level will be assigned to incidents based on their complexity, urgency and resolution time. Incident severity also determines response times and support level, in line with Incident classification.
Notifying us of an incident
The team can be notified of an incident via email on di-life-events-platform@digital.cabinet-office.gov.uk or on Slack through #di-data-life-events-platform.
In addition, the team will be automatically notified of certain events.
What happens when an incident occurs?
The GDX Platform team follows a prepared workflow to manage an incident and minimise its impact on clients and service users.
The GDX Platform team will take the following key steps when an incident is identified.
An incident lead will be established
When an incident occurs, we will assign an incident lead. This is often the person who notices the problem, and they will coordinate with anyone else investigating and attempting to fix the issue. The problem can be handed over to another person or team if needed, but the incident lead must be recorded in the incident report.
The team will be kept informed
The GDX Platform team will use the #di-data-life-events-platform Slack channel. If the incident involves a data or security breach, we will notify the Cyber Security team, who will help us manage the incident using the #cyber-security-help Slack channel.
The incident is prioritised
We will prioritise the incident and start tracking actions, updates and communications. We do this by creating a new incident report, copied from the incident report template, and using it to track updates and progress.
Incident response team setup (if needed)
If necessary we form a team with both an incident lead and a communications lead. The communications lead will make sure relevant parties are updated according to the incident priority table.
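As a sketch of how a communications lead might work out when the next update is due, the update frequencies from the incident priority table can be turned into a small calculation. The function names here are hypothetical. The sketch assumes a business day is Monday to Friday and ignores public holidays, which the table does not specify.

```python
from datetime import datetime, timedelta

# Update frequencies copied from the incident priority table.
# P1 and P2 repeat at a fixed interval; P3 and P4 get one update
# after a number of business days.
HOURLY_UPDATES = {"P1": timedelta(hours=1), "P2": timedelta(hours=2)}
BUSINESS_DAY_UPDATES = {"P3": 2, "P4": 5}


def add_business_days(start: datetime, days: int) -> datetime:
    """Advance by whole business days, skipping Saturdays and Sundays.

    Assumption: public holidays are not accounted for in this sketch.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


def next_update_due(priority: str, last_update: datetime) -> datetime:
    """Return when the next stakeholder update is due for a priority level."""
    if priority in HOURLY_UPDATES:
        return last_update + HOURLY_UPDATES[priority]
    if priority in BUSINESS_DAY_UPDATES:
        return add_business_days(last_update, BUSINESS_DAY_UPDATES[priority])
    raise ValueError(f"Unknown priority: {priority!r}")
```

For a P3 incident opened on a Friday morning, this gives an update due at the same time on the following Tuesday.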
Investigation takes place
We keep the incident report up to date while the investigation progresses. If the incident involves a breach of personal data, we will immediately inform the Security Incident team within Cabinet Office.
Communicate results to key stakeholders
If the incident is serious (P1 or P2), we inform a wider GDS audience and our service users (suppliers and acquirers).
See Google Doc for contact details
External and internal communications
We will ensure that internal and external parties, such as our service users, are fully informed at every stage of our incident management process. We will post regular updates on the status of an incident in the #incident Slack channel.
Incident escalations
We will notify escalation contacts of all high priority incidents (P1/P2).
Report cyber security incidents
The incident lead will inform the National Cyber Security Centre (NCSC) of any P1 incidents. The NCSC defines security incidents in its categorisation system prioritisation framework. Report a Cyber Incident - NCSC
Resolve the incident
We hold an incident and lessons-learned review, following a blameless post-mortem culture, so that our service can improve. We then add a row to the central GDS incidents summary spreadsheet linking to our incident report document.