Incident Response - For Sysadmins
- 1 Scope
- 2 Background
- 3 Goals (of this policy)
- 4 Goals (of the procedures in this policy)
- 5 Roles and Definitions
- 6 Procedures
Scope
The policy and procedures in this document apply to all individuals authorized to use SDSC's IT resources. The actions and processes described in this document apply to situations involving any SDSC Host (as defined in the SDSC Security Policy) as well as any of the IT resources provided by SDSC.
Note that this document does not authorize all affected individuals to act upon the procedures described here. The policy in this document authorizes actions for only specific roles. For those not included in those roles, this document is purely informational.
Background
After identification, incident response focuses upon two general activities: containment and eradication. Containment attempts to restrict the influence of an attacker while trying to learn the attacker's goals and methods. Eradication attempts to remove the attacker's influence. Both activities require the oversight of security personnel as well as cooperation between security personnel, service administrators, and host administrators.
Incident response involves both activities. Depending on the nature of the incident and attack, one may have priority over the other. We must strive to understand an attack to reduce the chance of future attacks, but must also balance that effort against the security needs (confidentiality, integrity, availability) of the affected system. This policy outlines the criteria for striking that balance, but leaves the final judgement in the hands of security personnel.
Goals (of this policy)
- Establish roles, timelines, and key procedures for post-detection incident response efforts.
- Explain the reasoning behind the stipulations of this policy.
Goals (of the procedures in this policy)
- Acquire and preserve information that may help understand an attack.
- Determine the scope of an attack.
- Address the security needs of the affected service or host.
- Remove the influence of the attacker.
- Restore the affected service or host to a secure state.
Roles and Definitions
(Move these to their own page and replace with hyperlinks... some day.)
Security personnel are members of SDSC's security group (sometimes known as "Security Technologies"), or individuals designated by SDSC's CISO as members of the incident response team. Incident response team members may oversee a response effort until relieved by a member of SDSC's security group.
The preferred responder is a member of SDSC's security group designated by the CISO as the preferred person to respond to security incidents.
Service administrators are personnel responsible for the maintenance, configuration and administration of a service or group of services running on a host operating system. In some cases, a service administrator may also serve as a system administrator for the same host; though in most cases, service administrators have restricted administrative privileges and do not maintain the underlying host operating system.
System administrators are personnel responsible for the maintenance, configuration, and administration of a host operating system and its core services (e.g. ssh). System administrators have full administrative privileges on the host they manage, and bear ultimate responsibility for the proper operation of their hosts.
A threat is a possible violation of security, which may impact the confidentiality, integrity, or availability of data or resources.
An attack is an action that violates security and may impact the confidentiality, integrity, or availability of data or resources. (An instance of a threat.)
An attacker is an entity that executes an attack.
An incident is an investigative event arising from an attack, set of related attacks, evidence of an attack, or discovery of a new threat. (When we get stuck at work late and don't get enough sleep.)
An N-Class service (or host) has low sensitivity to all of the following threats: disclosure, deception, and disruption. In response to an incident, security personnel, service administrators, or system administrators may suspend an N-Class service or host at any time and without notification. All services running on personal hosts and personal hosts themselves are N-Class, unless designated otherwise. "N" means "no-care".
An A-Class service (or host) has low sensitivity to disclosure and deception, and at least a mild sensitivity to disruption. In response to an incident, only security personnel may effect the suspension of an A-Class service or host, and only after performing all other containment tasks. In other words, the host or service must remain available as long as possible, even during the course of incident response. "A" means "availability emphasized".
An I-Class service (or host) has low sensitivity to disclosure and at least a mild sensitivity to deception. In response to an incident, security personnel, service administrators, or system administrators may effect the suspension of an I-Class service or host as soon as initial containment tasks are complete, in order to protect the integrity of the resource or its dependencies. "I" means "integrity emphasized".
A C-Class service (or host) has at least a mild sensitivity to the threat of disclosure. In response to an incident, security personnel, service administrators, or system administrators must effect the suspension of a C-Class service or host immediately after capturing relevant portions of its running state. Unless excepted, all SDSC hosts and personal hosts used for storing, viewing or otherwise handling unencrypted PII or PHI are C-Class. "C" means "confidentiality emphasized".
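The four class definitions can be read as a small lookup table. The following Python sketch is one possible encoding, not part of the policy itself: the names `HostClass`, `CLASSES`, and `may_suspend` are hypothetical, and the table reflects only the class definitions, not the additional conditions in the procedures.

```python
from dataclasses import dataclass

SECURITY = "security personnel"
SERVICE_ADMIN = "service administrator"
SYSTEM_ADMIN = "system administrator"
ANY_ROLE = frozenset({SECURITY, SERVICE_ADMIN, SYSTEM_ADMIN})


@dataclass(frozen=True)
class HostClass:
    emphasis: str           # what the class letter stands for
    sensitive_to: str       # threat with at least a mild sensitivity
    may_suspend: frozenset  # roles allowed to effect a suspension
    suspend_when: str       # earliest point at which suspension is allowed


# Hypothetical encoding of the N, A, I, and C definitions.
CLASSES = {
    "N": HostClass("no-care", "none", ANY_ROLE,
                   "at any time, without notification"),
    "A": HostClass("availability", "disruption", frozenset({SECURITY}),
                   "only after all other containment tasks"),
    "I": HostClass("integrity", "deception", ANY_ROLE,
                   "after initial containment tasks"),
    "C": HostClass("confidentiality", "disclosure", ANY_ROLE,
                   "immediately after capturing relevant running state"),
}


def may_suspend(role: str, class_name: str) -> bool:
    """Return True if the class definition lets this role effect a suspension."""
    return role in CLASSES[class_name].may_suspend
```

For example, `may_suspend(SYSTEM_ADMIN, "A")` is false because only security personnel may suspend an A-Class service or host.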
Procedures
This process begins when someone reports an incident according to Incident_Response_-_For_Users, or when a service administrator or system administrator notifies security personnel of an incident. Note that the recipient of an incident report, as discussed here, must have the role of security personnel.
The preferred responder should verify all incident reports; however, that is not always possible. In those cases, other security personnel may take over. Regardless, the security group must know about all reported incidents.
- Upon receipt of an incident report, the recipient must attempt to contact the preferred responder through all reasonable and persistent means to hand off the report.
- If the recipient cannot confirm hand-off to the preferred responder within 30 minutes of receipt of the report, the recipient may:
- Investigate the claims if working within their expertise and with the assistance of qualified individuals.
- Contact and hand off the report to other security personnel more qualified to perform the investigation.
- Whoever performs the investigation becomes the responder.
- The preferred responder or another member of the security group may ask the responder to hand off the incident at any time. Once the hand-off is confirmed, the person making the request becomes the new responder.
- The responder shall send an email to firstname.lastname@example.org with their findings. The sender must encrypt the email using GPG and the predetermined shared secret.
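The hand-off rule in the steps above amounts to a 30-minute timer. The following Python sketch shows one way to express it. The function name `recipient_options` and the returned strings are hypothetical. Treating exactly 30 minutes as "window elapsed" is an assumption, since the policy does not define the boundary.

```python
from datetime import datetime, timedelta

# Window for confirming hand-off to the preferred responder.
HANDOFF_WINDOW = timedelta(minutes=30)


def recipient_options(received_at: datetime, now: datetime,
                      handoff_confirmed: bool) -> list:
    """List what the recipient of an incident report may do right now."""
    if handoff_confirmed:
        # The preferred responder has the report and verifies it.
        return ["preferred responder verifies the report"]
    if now - received_at < HANDOFF_WINDOW:
        # Still inside the window: keep trying to reach the preferred responder.
        return ["keep contacting the preferred responder"]
    # Window elapsed without a confirmed hand-off (assumed to include 30:00).
    return [
        "investigate within own expertise, with qualified assistance",
        "hand off to more qualified security personnel",
    ]
```

Whoever ends up investigating becomes the responder, so a caller would record that person alongside the chosen option.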
This is the meat of the incident response. The information gathered here is crucial to understanding the attack, detecting related attacks, preventing future attacks, and guiding recovery and follow-up efforts.
Gathering information may often conflict with protecting the security needs of the affected host or service. The different classes (N, A, C, I) help decide when and where to draw the line.
Note that other documents describe investigative techniques in more detail.
- If the responder cannot account for the claims in the incident report as false positives, we must assume an attack is happening, or has taken place.
- The responder shall create an incident page on incidenterator with the initial information.
- The responder, at their discretion and depending on the scope of the incident, may use the #incident IRC channel to communicate with involved individuals.
- The #incident IRC channel shall have a channel key (+k) set to a shared secret that is not the same as the one used for GPG-encrypted communication.
- Any authorized channel operator in the #incident IRC channel may kick any member out of the channel at any time for any reason or no reason at all.
- This is to ensure that everyone is confident with sharing potentially sensitive information within the channel.
- Frivolous or abusive use of this privilege may result in unexpected and unpleasant consequences.
- If the responder is dealing with an N-Class host or service and the corresponding service administrator or system administrator is not immediately reachable, the responder may effect the suspension of the affected service or host at this point and wait for the service or system administrator. The responder may temporarily restore the host or service as needed for further investigation.
- The responder shall update the incidenterator page with their action and current state of the investigation.
- The responder shall, in as timely and non-intrusive a manner as possible, collect details of the running state of the affected system and place these details, when practical, on the incidenterator page. (Pasting in chunks of logs is okay. Pasting in a binary file is not.) Small, frequent updates with remarks are easier to follow than long updates with no remarks.
- If the responder is dealing with a C-Class host or service, the responder shall effect a suspension of the service. If the responder is unable to verify that the attacker's influence does not extend into the host operating system beyond the scope normally used by the affected service, the responder shall effect the suspension of the host instead.
- Suspension of the host may consist of "pausing" its virtual machine (if applicable), a shutdown that disrupts file access times the least (e.g. sync and yank the power), or removal of network connectivity.
- The responder shall decide the most appropriate method.
- The responder shall attempt to discover the extent of the attacker's influence, goals and methods using any passive methods available to the responder. The responder must document their findings and attempts on the incidenterator page. The responder must not perform destructive analysis at this time.
- If the responder is dealing with an I-Class host or service, the responder shall suspend the affected service. If the responder is unable to verify that the attacker's influence does not extend into the host operating system beyond the scope normally used by the affected service, the responder shall effect the suspension of the host instead.
- The responder may now perform more intrusive, possibly destructive analysis; however, when possible, the responder must use non-destructive methods.
- Work on an image or snapshot.
- Work on disks mounted read-only (Unix mount flags: ro,noatime,loop).
- Use ZFS snapshots where available. Beware of symbolic links pointing outside of the snapshot.
- When working with an A-Class host or service, the responder should employ session transcription and carefully examine live data, rather than suspend the host or service to make a copy for analysis.
- The Unix program "script" may be useful.
- The responder should provide a copy of the session transcript for the security group.
- The responder must provide the access times of relevant or possibly relevant files and directories on the incidenterator page, as the access times will change as a result of the investigation.
- When the responder feels comfortable with their understanding of the incident, the responder shall, in their best judgement, take the measures of the narrowest scope to limit the attacker's ability to do further damage.
- Understanding the incident may require recursive analysis of other data, leading to the discovery of additional attacks to identify and contain.
- The responder should not proceed with containment or eradication methods until they are confident that they understand the scope of the incident.
- Failure to heed this step may result in an attacker hiding their tracks or doing more damage than originally intended.
- The responder must document these measures and actions on the incidenterator page for the incident.
- The responder shall, with the help of service or system administrators as necessary, eradicate the attacker's presence on the affected host or service.
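Because the procedure asks the responder to record access times before the investigation changes them, a small helper can capture them in a form suitable for pasting into the incident page. The following Python sketch is only an illustration. The function name `record_times` and the output format are hypothetical. It uses `os.lstat`, which reads metadata without opening the file or following symbolic links, so the call itself should not update the access time of the path being recorded.

```python
import os
from datetime import datetime, timezone


def _iso(timestamp: float) -> str:
    """Format a POSIX timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def record_times(paths):
    """Return one tab-separated line of atime/mtime/ctime per path."""
    rows = []
    for path in paths:
        try:
            # lstat reads metadata only and does not follow symlinks.
            st = os.lstat(path)
        except OSError as exc:
            # Record the failure rather than aborting the whole capture.
            rows.append(f"{path}\tERROR: {exc.strerror}")
            continue
        rows.append(
            f"{path}\tatime={_iso(st.st_atime)}"
            f"\tmtime={_iso(st.st_mtime)}\tctime={_iso(st.st_ctime)}"
        )
    return rows
```

Pass explicit paths rather than walking a directory tree, since listing a directory's contents can update that directory's own access time.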
Sometimes exceptions arise that require immediate action to protect against an imminent attack. In certain cases (defined by classes), protecting against an imminent attack is more important than understanding an existing one.
In the case of an N-Class host or service, the role of the affected resource is insignificant compared to the overhead of following a more stringent procedure.
- When dealing with an N-Class, C-Class, or I-Class service, in the absence of a response from security personnel within 30 minutes, a system or service administrator may effect the suspension of the corresponding affected host or service, provided that such action is within their authorized scope of duty, such action is minimally disruptive to other resources, and they believe that the affected resource is under attack or at risk of immediate attack.
- A system administrator has the authority to effect the suspension of any service running on a host they are responsible for.
- This policy does not grant a service administrator the authority to shut down the system their service runs on. (Though that authority may have been granted elsewhere.)
- A responder may effect the suspension of a host or service at any point during their investigation at their discretion, particularly if continued operation of the host or service will likely result in the attacker significantly spreading their influence or causing further damage.
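The first exception above combines several conditions, all of which must hold. The following Python sketch spells them out as a single check. The function and parameter names are hypothetical. It covers only the 30-minute exception for system and service administrators, not the separate N-Class definition or a responder's own discretion.

```python
# Classes to which the 30-minute exception applies (A-Class is excluded).
EXCEPTION_CLASSES = {"N", "C", "I"}
RESPONSE_WAIT_MINUTES = 30


def admin_may_suspend(service_class: str,
                      minutes_without_security_response: float,
                      within_authorized_scope: bool,
                      minimally_disruptive: bool,
                      believes_under_attack: bool) -> bool:
    """Return True if an administrator may suspend under the exception rule."""
    return (
        service_class in EXCEPTION_CLASSES
        and minutes_without_security_response >= RESPONSE_WAIT_MINUTES
        and within_authorized_scope
        and minimally_disruptive
        and believes_under_attack
    )
```

An A-Class resource never passes this check, which matches the rule that only security personnel may suspend it.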