This article explains what a problem is in the realm of cyber security, the benefits of ServiceNow problems, and the problem life cycle in detail.
A problem is defined as the underlying cause of one or more incidents. The root cause is not known at the time a problem record is created, and the problem management process is responsible for further investigation. A problem can be:
- The occurrence of the same or similar incidents multiple times
- An incident that impacts one or more services with many users
- A diagnostic result revealing that systems are not operating as expected and that an incident will follow, although no service impact has occurred yet
Problem management relies heavily on:
- The CMDB for problem assignment and impact analysis
- The incident management process for providing details of the individual related incidents
- The change management process for controlling changes needed to solve problems
- The knowledge management process for sharing information about known errors
Managing a problem is more than finding and fixing incidents. It involves identifying and analyzing the factors behind an incident caused by the failure of an asset, detecting the underlying root cause, and following best practice to eliminate that cause permanently so that similar incidents do not recur. Unless the root cause and the other contributing factors behind an incident are addressed, the problem persists.
Benefits of ServiceNow Problems
Problem management has the potential to provide the following benefits.
- Reduces resolution time: Every support team wants to resolve incidents faster to meet its SLAs. Documenting the best processes and methods while solving today's incidents reduces the time needed to resolve a future service disruption.
- Avoids costly incidents: A costly incident can cause significant damage to medium and large enterprises. Preventing such incidents saves an organization time, human resources, and money.
- Improves productivity: Following best practices for solving one or more incidents increases the productivity of the support team. Team members respond to incidents less often, which saves their time.
- Boosts support team morale: When an organization puts problem management into practice, it helps the support team investigate and learn from the patterns used to solve both complex and simple incidents.
- Enables consistent and continuous service improvement: Proper problem management decreases the occurrence of incidents across the organization and prevents service disruption.
- Improves customer satisfaction: Customers are happier when fewer incidents are in triage. Good problem management reduces incidents and earns customer trust.
Required User Permissions
- A service desk agent, incident manager, problem coordinator, or other IT user can manually trigger Problems.
Problem Life Cycle
States in any ServiceNow application serve a specific purpose. They are designed to make it clear where in a process a record currently resides and to display progress. States should represent a unique phase in a process where a specific set of related activities are grouped together and designed to achieve a particular outcome to move to the next phase of the process.
Our recommended Problem Management process consists of the following state model:
- New
- Assess
- Root Cause Analysis
- Fix in Progress
- Resolved
- Closed
State 1: New
When a problem ticket is first created, it is in the New state by default. This is where the basic information that may suggest a problem exists is added. All known information about the symptoms experienced is captured.
The following field is mandatory to create a problem manually:
- Problem Statement
If other fields such as CIs are known at this point, they can still be added, and they are populated automatically when the problem comes from an existing incident record; however, they do not need to be mandatory to progress.
Problem Assignment
In New state, it is necessary to identify the appropriate assignment group to assess the problem in the next phase of the lifecycle. This assignment is best achieved by automatically updating the Assignment Group field rather than letting the user try to pick the correct group manually since this approach is prone to error.
At this point, the assignment group (problem management group) will need to choose an individual problem coordinator to take responsibility for the problem on behalf of the group. This is done by populating the Assigned to field.
Once the mandatory fields are populated, the problem coordinator is expected to click the Assess button. This moves the problem into the lifecycle, where it is considered live and something that requires attention.
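The automatic assignment described above can be pictured as a simple lookup from the affected CI to its support group. This is only a minimal sketch: the function name, the `support_groups` mapping, and the group and CI names are hypothetical and do not represent ServiceNow's own assignment rules.

```python
from typing import Dict, Optional


def pick_assignment_group(
    ci_name: Optional[str],
    support_groups: Dict[str, str],
    default_group: str,
) -> str:
    """Return the support group for a CI, or a default group if none is known.

    `support_groups` is a hypothetical CI-name -> group-name mapping that
    would normally be derived from CMDB data.
    """
    if not ci_name:
        # No CI recorded yet, so route to the default problem management group.
        return default_group
    return support_groups.get(ci_name, default_group)


# Illustrative data only.
groups = {"online-banking": "Banking Platform Support"}
print(pick_assignment_group("online-banking", groups, "Problem Management"))
```

Routing unknown or missing CIs to a default group keeps every new problem assigned, so a problem coordinator can still be chosen in the Assigned to field.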
State 2: Assess
At Assess state, the problem coordinator is primarily assessing the problem to determine whether it is genuine or not.
The newly assigned individual, the Problem Coordinator (PC), now conducts an initial review of the problem, primarily to check that it is indeed a real problem. If the PC is comfortable proceeding with the investigation, the PC updates the Priority, Business Service, and CI fields before clicking Confirm.
Establishing priority
Problem prioritization typically drives the criticality associated with the handling of the problem and the order in which problems will be focused on. Priority is calculated through a combination of impact and urgency.
- Impact is the effect that a problem has on the business.
- Urgency is the extent to which the problem’s resolution can bear delay.
Priority is generated from urgency and impact according to the following table.
| Impact (rows) / Urgency (columns) | 1 - High | 2 - Medium | 3 - Low |
| --- | --- | --- | --- |
| 1 - High | Priority 1 - Critical | Priority 2 - High | Priority 3 - Medium |
| 2 - Medium | Priority 2 - High | Priority 3 - Medium | Priority 4 - Low |
| 3 - Low | Priority 3 - Medium | Priority 4 - Low | Priority 4 - Low |
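The priority table can be expressed as a small lookup. The sketch below transcribes the values from the table; the names `PRIORITY_MATRIX` and `calculate_priority` are illustrative and are not ServiceNow identifiers.

```python
# Priority matrix transcribed from the table above.
# Keys are (impact, urgency) with 1 = High, 2 = Medium, 3 = Low.
PRIORITY_MATRIX = {
    (1, 1): "1 - Critical",
    (1, 2): "2 - High",
    (1, 3): "3 - Medium",
    (2, 1): "2 - High",
    (2, 2): "3 - Medium",
    (2, 3): "4 - Low",
    (3, 1): "3 - Medium",
    (3, 2): "4 - Low",
    (3, 3): "4 - Low",
}


def calculate_priority(impact: int, urgency: int) -> str:
    """Return the priority label for an impact/urgency pair."""
    try:
        return PRIORITY_MATRIX[(impact, urgency)]
    except KeyError:
        raise ValueError(
            f"impact and urgency must each be 1, 2, or 3; got {impact}, {urgency}"
        ) from None


print(calculate_priority(1, 2))  # prints "2 - High"
```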
It is possible to establish the priority of the problem automatically, based on the CI that is identified in the problem record. With this technique, the business criticality value of the CI is used to determine the priority of the problem. For example, an online banking service would be considered critical to a financial organization. If this CI is related to the problem, the priority can be automatically set to Critical as a result. This gives a more accurate and consistent prioritization, because determining impact and urgency can be a subjective call. If this automated method is used, it can run in the New state when the problem is first raised, since it is helpful for the problem coordinator to see the priority immediately.
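As a rough illustration of CI-driven prioritization, the sketch below maps a CI's business criticality to a priority and otherwise falls back to a priority determined another way (for example from impact and urgency). The mapping and function names are hypothetical. Only the "critical" entry reflects the example in the text, and the criticality labels your CMDB actually uses may differ.

```python
from typing import Optional

# Hypothetical mapping from a CI's business criticality label to a priority.
# Only the "critical" entry comes from the example in the text; an
# organization would extend this to match its own criticality values.
CRITICALITY_TO_PRIORITY = {"critical": "1 - Critical"}


def priority_from_ci(business_criticality: Optional[str], fallback_priority: str) -> str:
    """Use the CI's business criticality when it maps to a priority.

    Falls back to `fallback_priority` when the CI has no criticality or the
    value is not in the mapping.
    """
    if business_criticality is None:
        return fallback_priority
    key = business_criticality.strip().lower()
    return CRITICALITY_TO_PRIORITY.get(key, fallback_priority)


print(priority_from_ci("Critical", "3 - Medium"))  # prints "1 - Critical"
```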
Business services and CIs
The Business service and CI fields are used to identify what is being impacted by the problem. These fields use CMDB data, which is also used across other ITSM processes, creating a valuable link and traceability, particularly for business services. This also makes the dependency views feature available, which displays the relationship map from the service or CI in question to other related CMDB components and shows any ongoing incidents or changes that may aid root cause analysis.
If the ServiceNow Agile Development application is in use, existing defects should be searched to look for possible causes or relationships to the problem. Once this data is entered, the problem coordinator will click Confirm to move to the next state.
State 3: Root Cause Analysis
The problem coordinator may now need to engage one or more technical support teams to investigate and potentially help fix the problem. This is achieved using problem tasks.
Problem Tasks
The parent problem record remains assigned to the problem coordinator throughout the entire process. The coordinator creates and assigns individual tasks to the various technical support teams to aid in the investigation and diagnosis. Each team will capture its own investigation and discoveries in its individual tasks and the problem coordinator will review and coordinate them.
There are two types of problem tasks: Root Cause Analysis and General.
The RCA type should be selected for specific tasks required to investigate the root cause and should be created by the problem coordinator at this point.
During the Root Cause Analysis state, four main activities are intended to occur:
- Discover a workaround (if possible): The problem coordinator should enter it into the Workaround field and communicate it to all open related incidents using the Communicate Workaround related link. This copies the text from the Workaround field into the Activity Log of all related open incidents, explaining that it is a workaround from the problem record.
- Discover the root cause and document it using the Cause notes field on the problem: If the problem coordinator needs help to discover the root cause, they can assign a Root Cause Analysis problem task to the relevant team, who will attempt to document the cause code, cause notes, and proposed fix, and to provide a workaround, on the problem task.
- Discover a permanent fix for the problem to prevent it from happening in the future: Enter it into the Fix notes field and communicate it to all open incidents using the Communicate Fix related link. As with the root cause, if the problem coordinator needs help to discover a permanent fix, they can use a problem task to gather that information.
- Communicate that the problem is known and currently being worked on, which can help to deflect incidents: Since it could take time to implement a permanent fix, this might be helpful. The problem coordinator should create and publish a known error knowledge article. Known error articles show up when an end user goes to create an incident via the Service Portal. Click the Create Known Error article related link to create a known error article from a problem; this creates a link between the two records, displayed in the Primary Known Error article field in the problem record. The Problem statement, Description, Workaround, and Cause notes are copied over when creating a known error article. The known error article will be in the draft state and can then be published. If it is not deemed necessary to publish the known error in the Knowledge Base, service desk analysts and other technical support users can simply search all problems to find this information and use it as they require. Most commonly this will be used by the service desk when trying to resolve an incident, find a workaround, or recognize a pattern of similar incidents.
These activities can all happen in parallel, and their results may be discovered at different times during root cause analysis. It is possible that none, some, or all four are discovered.
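To make the data flow of two of these activities concrete, the following sketch models communicating a workaround to open related incidents and drafting a known error article from a problem. It is an in-memory illustration only: the `Incident` and `Problem` classes, their attribute names, and both functions are hypothetical stand-ins, not the platform's own implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Incident:
    number: str
    is_open: bool
    activity_log: List[str] = field(default_factory=list)


@dataclass
class Problem:
    number: str
    problem_statement: str
    description: str = ""
    workaround: str = ""
    cause_notes: str = ""
    incidents: List[Incident] = field(default_factory=list)


def communicate_workaround(problem: Problem) -> List[str]:
    """Append the workaround to the activity log of each open related incident.

    Returns the numbers of the incidents that were updated.
    """
    if not problem.workaround:
        return []
    note = f"Workaround from problem {problem.number}: {problem.workaround}"
    updated = []
    for incident in problem.incidents:
        if incident.is_open:
            incident.activity_log.append(note)
            updated.append(incident.number)
    return updated


def create_known_error_draft(problem: Problem) -> Dict[str, str]:
    """Copy the four fields named in the text into a draft known error article."""
    return {
        "state": "draft",
        "source_problem": problem.number,
        "problem_statement": problem.problem_statement,
        "description": problem.description,
        "workaround": problem.workaround,
        "cause_notes": problem.cause_notes,
    }
```

Closed incidents are skipped on purpose, mirroring the statement that the workaround is communicated to open related incidents only.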
State 4: Fix in Progress
The Fix in Progress state represents a problem that has been investigated and now needs a fix. In this state it becomes mandatory to enter Cause notes and Fix notes.
The problem coordinator can create or relate to one or more change requests to show the clear path to resolution.
Risk Accepted
From the Root Cause Analysis or Fix in Progress states, the problem coordinator can accept the risk of not fixing the problem for now. This applies when:
- The business determines that, due to cost implications, the problem is not worth fixing
- A fix cannot be determined
In either case, the resolution code is set to Risk Accepted.
Risk Accepted problems in these situations can be reviewed by the problem coordinator at appropriate intervals. The coordinator establishes whether the impact has increased, or whether the situation has changed to the point where the problem should now be fixed or a fix is no longer required because some other change has resolved it. The problem coordinator can then choose to re-analyze the problem so that the process can be worked through again to find a solution.
ServiceNow recommends that Risk Accepted problems are not set to Closed since the goal of the process is not achieved yet and the problem still exists. Setting a problem to Closed suggests that it is no longer a problem and can be misleading since closed records are considered inactive as a standard across the platform.
Risk Accepted problems should be set to Resolved and can remain in Resolved indefinitely. It is important to remember that this is a key difference between Incident and Problem Management. Incident Management is concerned with the restoration of service as quickly as possible using whatever means possible. Problem management is concerned with permanently resolving the issue and ensuring it will not reoccur, therefore it is acceptable for this to take as much time as is required. There should be no driver to close out unresolved problems purely because they are not going to be fixed at that time and are sitting on a list of active problems.
State 5: Resolved
Once the problem is moved to the Resolved state, the individual in the Assigned to field must populate the mandatory Fix notes field with text describing exactly what was done to solve the issue.
At the Resolved state it is also possible for an organization to conduct a review of the problem if the process requires it. Additional fields can be added here to capture that information. Alternatively, problem tasks can be assigned by the problem coordinator as required.
A set period can also be observed before setting the problem to Closed state to confirm that the known error is solved. If evidence suggests that the issue persists, the state can be set back to Root Cause Analysis using the Re-analyze button. The process can then be worked through again to continue to find the solution.
If the problem is confidently considered solved, the individual in the Assigned to field closes the problem using the Complete button.
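The mandatory fields named so far (Problem Statement at creation, Cause notes and Fix notes at Fix in Progress, and Fix notes at Resolved) can be summarized as a per-state check. This is a minimal sketch under the assumption that a problem record is a plain dictionary; the key names and the `missing_fields` helper are hypothetical.

```python
from typing import Dict, List, Optional

# Mandatory fields per state, as described in the text.
# The snake_case key names are illustrative, not actual column names.
MANDATORY_FIELDS: Dict[str, List[str]] = {
    "New": ["problem_statement"],
    "Fix in Progress": ["cause_notes", "fix_notes"],
    "Resolved": ["fix_notes"],
}


def missing_fields(state: str, record: Dict[str, Optional[str]]) -> List[str]:
    """Return the mandatory fields that are empty for the given state."""
    required = MANDATORY_FIELDS.get(state, [])
    return [name for name in required if not (record.get(name) or "").strip()]


print(missing_fields("Fix in Progress", {"cause_notes": "Memory leak"}))
# prints ['fix_notes']
```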
State 6: Closed
At Closed state, several resolution codes are displayed. These are automatically determined by certain actions that occurred during the process:
- Duplicate: The problem was marked as a duplicate of another problem. Any related incidents will be moved over to the problem that this one duplicates. This is configurable in the problem properties.
- Canceled: The problem was canceled. There are very few scenarios where a problem is genuinely canceled. This only occurs when a problem was raised in error, usually prematurely, before realizing there is no real problem.
- Fix Applied: A permanent fix was applied, and the problem was resolved.
- Risk Accepted: The problem has not been resolved, but it has been accepted that the solution is not to be applied at this time. If that decision changes, the problem can be reverted.
Note: The Problem management properties determine whether a closed problem can be re-analyzed, for example when additional incidents are added to the problem after the fix is applied. Set the role to Nobody if you do not want closed problems to be re-analyzed. In that case, should the problem reoccur, create a new problem ticket and set the first reported field to refer back to this problem, so you can trace it back later.
Creating a Problem Manually
Roles required: A service desk agent, incident manager, problem coordinator, or other IT user
To create a problem ticket in ServiceNow:
- From the ServiceNow main web screen, search for Problem using the Filter navigator at the top left of the page
- Click Create New
- In the Problem record page, fill in the necessary fields and enter the detailed information in the Problem Statement (mandatory field)
- Assign a responsible team in the Assignment Group field
- Then, assign an individual in the Assigned to field
- Click Submit, either in the top right corner or the bottom left corner
The problem enters the New state. If you have filled in the mandatory fields that are necessary to move a problem record to the Assess state, the problem record moves directly to the Assess state.
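The same three fields can in principle be supplied programmatically. The sketch below only builds, and does not send, a request for the ServiceNow REST Table API against the `problem` table. It assumes that Problem Statement maps to the `short_description` column and that the group and user are passed in `assignment_group` and `assigned_to`; verify these against your instance. The instance URL and the values are placeholders, and authentication is deliberately omitted.

```python
import json
import urllib.request


def build_create_problem_request(
    instance_url: str,
    problem_statement: str,
    assignment_group: str,
    assigned_to: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that would create a problem."""
    body = {
        # Assumption: "Problem Statement" is stored in short_description.
        "short_description": problem_statement,
        "assignment_group": assignment_group,
        "assigned_to": assigned_to,
    }
    return urllib.request.Request(
        url=f"{instance_url.rstrip('/')}/api/now/table/problem",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Placeholder instance and values, for illustration only.
request = build_create_problem_request(
    "https://example.service-now.com",
    "Intermittent login failures",
    "Problem Management",
    "pc.user",
)
print(request.get_method(), request.full_url)
```

Sending the request would additionally require credentials and a call such as `urllib.request.urlopen(request)`. These are left out so the sketch stays side-effect free.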
Assessing a Problem
Roles required: A service desk agent, incident manager, problem coordinator, or other IT user
At Assess state, the Problem Coordinator (PC) is primarily assessing the problem to determine whether it is genuine and warrants a thorough investigation.
To assess a problem:
- In the Problem record screen, click Assess and fill in the mandatory fields
- After assessing the problem, perform one of the following:
  - If the problem is valid and requires a resolution, click Confirm
  - If a problem already exists for a similar issue, click Mark Duplicate
  - If the problem is not valid, click Cancel
After the PC confirms that the problem needs investigation and a resolution, the problem enters the Root Cause Analysis state.
Investigating Root Cause Analysis
Roles required: A service desk agent, incident manager, problem coordinator, or other IT user
The problem coordinator may now need to engage one or more technical support teams to investigate and potentially help fix the problem.
The problem coordinator performs the following tasks to identify the root cause of the problem.
- If the PC can resolve a problem, then click Fix. The problem state changes to Fix in Progress. You can then create a new Change Request or link to an existing Change Request to apply a permanent fix to the problem.
- If you acknowledge the problem but there is no permanent resolution to the problem, then click Accept Risk. The problem directly enters the Closed or Resolved state with the Resolution code as Risk accepted.
- If you decide to reanalyze the problem, then click Re-analyze. The problem opens for reanalysis and the state is changed to Root Cause Analysis.
Fixing a Problem
Roles required: A service desk agent, incident manager, problem coordinator, or other IT user
The Fix in Progress state represents a problem that has been investigated and now needs a fix. In this state it becomes mandatory to enter Cause notes and Fix notes.
The problem coordinator can create or relate to one or more change requests to show the clear path to resolution.
Once the changes are implemented and the problem is considered resolved, the State field is updated to Resolved using the Resolve button.
Resolving and Completing a Problem
Resolve the issue and add a detailed note of the resolution for future reference.
Roles required: A service desk agent, incident manager, problem coordinator, or other IT user
To resolve and complete a problem:
1. Click Resolve
The problem enters the Resolved state with the Resolution code as Fix Applied.
2. Click Complete
The problem enters the Closed state with the Resolution code as Fix Applied.
PCs can reanalyze the problem even after it is closed by clicking Re-analyze. The state of the problem changes from Closed to Root Cause Analysis.
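The buttons described in the procedures above can be summarized as a transition table. This is a sketch, not the platform's state model definition. It assumes that Mark Duplicate and Cancel move the problem to Closed (the text lists Duplicate and Canceled as resolution codes shown at Closed) and that Accept Risk lands in Resolved, as recommended earlier. The names `TRANSITIONS` and `apply_action` are illustrative.

```python
# (current state, button) -> next state, following the procedures above.
TRANSITIONS = {
    ("New", "Assess"): "Assess",
    ("Assess", "Confirm"): "Root Cause Analysis",
    ("Assess", "Mark Duplicate"): "Closed",   # assumption: closes as Duplicate
    ("Assess", "Cancel"): "Closed",           # assumption: closes as Canceled
    ("Root Cause Analysis", "Fix"): "Fix in Progress",
    ("Root Cause Analysis", "Accept Risk"): "Resolved",
    ("Fix in Progress", "Accept Risk"): "Resolved",
    ("Fix in Progress", "Resolve"): "Resolved",
    ("Resolved", "Complete"): "Closed",
    ("Resolved", "Re-analyze"): "Root Cause Analysis",
    # Subject to the Problem management properties described in the note on Closed.
    ("Closed", "Re-analyze"): "Root Cause Analysis",
}


def apply_action(state: str, action: str) -> str:
    """Return the state reached by clicking `action` while in `state`."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"'{action}' is not available in state '{state}'") from None


# Walk the path New -> Assess -> Root Cause Analysis -> Fix in Progress
# -> Resolved -> Closed.
state = "New"
for button in ["Assess", "Confirm", "Fix", "Resolve", "Complete"]:
    state = apply_action(state, button)
print(state)  # prints "Closed"
```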
Verifying Problem details in Resolution Intelligence
To verify the successful integration of Resolution Intelligence with ServiceNow, you will need to verify the problem details that are created in ServiceNow.
To verify the problem details:
- Log in to Resolution Intelligence using your valid credentials.
- Navigate to Resolutions > ActOns.
- Search for the problem using the Netenrich ID generated from the ServiceNow UI.
- Confirm that a new conversation associated with the problem appears in the ActOns tab.
- If the problem is not found in the ActOns tab, refer to Troubleshooting.