Failing SQL Agent Jobs – Part 14 min read

SQL Server Agent Jobs are one of the core features of SQL Server. Agent Jobs perform many actions, from maintenance tasks (server and database), data movement flows, replication, log shipping, data cleanup, health checks and many more. Since Agent Jobs are a critical component in a data organization, it is important to know when Agent Jobs do not succeed. There are several ways to accomplish the monitoring of failed Agent Jobs, from easy to more complex. This four part series is going to cover how you, the DBA, can be notified of failing Agent Jobs in your SQL Server environment.

  • Part 1: Checking remote servers for failed jobs from a Central Console
  • Part 2: Building an e-mail report that notifies you of failed jobs
  • Part 3: Setting up SQL Server Agent Alerts/Operators to notify you when a job fails
  • Part 4: Collecting Agent Job information centrally and reporting off that data

Let’s get started with a basic example of checking our Agent Jobs for failure using our trusty friend dbatools.

First step:

Load the $servers variable with all the SQL Servers we want to monitor for failing SQL Server Agent Jobs.

Second step: 

Now run the Get-DbaAgentJob command for all the servers and store the results in $failedJobs .

Third Step: 

As you can see from the results we get a lot of extra output that does not relate to what we are trying to accomplish.  We get Agent Jobs that have never been executed (Unknown) and Replication jobs that have a Cancelled status.

Next, let’s adjust our filters a bit to remove the Replication jobs from the mix.

This makes the list more manageable.  We now have Agent Job failures and Agent Jobs never executed.  Now we have the ability to track down the Job Owner to see if jobs that have been deployed and never executed are still needed.  You should always keep an eye on things that can be cleaned up.

You can also determine from the list if a Job has a schedule or is enabled.  This will help you determine technical debt as well.  A Job with a last run outcome in 2018 that is disabled is probably not something that is expected to run and might be a candidate for archive and deletion.

We are now to the meat of why we are here.  A list of SQL Agent Jobs that have Failed on their last execution. We can work to identify the failures and fix them or notify the appropriate teams. Again keeping in our mind possible cleanup candidates.  A couple of these failures have no schedule or are disabled.

Fourth Step:

Here are a couple extra examples of filters that might help identify jobs in your echo system that might need attention.

Conclusion:

You learned how to get a list of all SQL Agent Jobs and populate a variable.  Then you learned how to filter that output to help you identify Failed Agent Jobs.  Also, you have the tools to filter the information further to provide Agent Jobs that are technical debt and no longer needed and can be candidates for removal.

Stay tuned for Part 2 of our series where you will learn how to take the output from the command and send an HTML e-mail for review. The easiest way to do that is by subscribing to this blog!

 

If you found this blog handy, please use the Subscribe section at the top of the screen to join my newly created mailing list.  I hope to start delivering content via the mailing list soon. 👆 

Leave a Reply

Your email address will not be published.