Introduction:

A couple of weeks ago I had a customer who was interested in monitoring the duration of some of his SQL-Agent jobs. The jobs were running in a data warehouse environment and were executing some ETL processes for loading data. The goal was to get informed when one of these specific jobs were running significantly longer than usual. Therefore I wrote a short stored procedure and a script which I would like to show you in this blog post!

First of all, I would like to start with explaining the parameter’s of the stored procedure to get a better understanding of the logic:

Input parameter’s:

We have the following input parameter’s for influencing the behavior of the stored procedure:

@SensitivityPercentage: This “float” parameter defines the percentage increase of the last job runtime compared to the calculated average value, which must be exceeded to set the @ThresholdExceeded output variable as true. I set the default value for this Input parameter to “30%” or “0.3“ in decimal, but the value can be overwritten to a less or more aggressive behavior when executing the stored procedure.


@SensivityTotalMinutes: This “integer” parameter defines the increase in minutes of the last job runtime compared to the calculated average value, which must be exceeded to set the @ThresholdExceeded output variable as true. The default value in the stored procedure is 20 minutes, but it can be changed here as well!

Note: both values provided in the sensitivity variables must be exceeded to set the @ThresholdExceeded variable as true. I implemented this logic, for not getting notified when a job is increasing from 1 minute to 2 minutes – the increase would be 100% but as it’s only 1 minute I don’t care😉


@granularity: This “integer” parameter defines how many of the latest entries of the job history should be used to calculate a average value for comparing the last run duration. I set this with a default value of 5, but here as well you can change it.

Note: If in the job history are less than 5 entries, the stored procedure takes automatically the amount of entries which are available to calculate a average value.


@JobId -> This is quite obvious: The ID of the job which should be checked.

Output parameter’s:

To get some interesting values out of the stored procedure, I defined the following output parameters:

@AverageJobRunDuration: This is the average job run duration in minutes.


@RunDurationLastExecution: This is the duration of the last execution in minutes.


@Factor: This is the factor of the duration from the last execution compared to the calculated average value.


@IncreasionPercentageString: This is the increase or decrease of the last duration compared to the average value in a percentage value.


@IncreasionTotalMinutes: This is the increase or decrease of the last duration compared to the average value in minutes.


@OutputMessage: This is a generated output message.


@ThresholdExceeded: This defines if the threshold is exceeded based on the above sensitivity values.

The stored procedure:

See below the stored procedure:

CREATE PROCEDURE [dbo].[dbi_jobruntimeinfo]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--Parameters
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
(

--Input
@SensitivityPercentage float = 0.3,
@SensivityTotalMinutes int = 20,
@granularity int = 5,
@JobId uniqueidentifier,

--Output
@AverageJobRunDuration float OUTPUT,
@RunDurationLastExecution float OUTPUT,
@Factor float OUTPUT,
@IncreasionPercentageString NVARCHAR(50) OUTPUT,
@IncreasionTotalMinutes int OUTPUT,
@OutputMessage NVARCHAR(max) OUTPUT,
@ThresholdExceeded BIT OUTPUT

)

AS

Begin

   SET NOCOUNT ON;

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--Gather average runtime information
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
		declare @IncreasionPercentage float
       --@StepCount selects how many stps a specific Job has
       declare @StepCount Int
       Set @StepCount = (select count(*) from msdb.dbo.sysjobsteps where job_id = @JobId)

       --@RowCount dynamically sets how many rows of informations must be queried from job history --> its calculated between granularity and the amount of job steps for a specific job
       declare @RowCount Int
       Set @RowCount = @granularity * @StepCount

	   declare @TotalAvailableRows int
	   set @TotalAvailableRows = (
	   select count(*)
	   From msdb.dbo.sysjobs j 
	   INNER JOIN msdb.dbo.sysjobhistory h 
	   ON j.job_id = h.job_id 
	   where j.enabled = 1  --Only Enabled Job
	   and j.job_id = @JobId
	   and h.step_id != 0)

	   if @TotalAvailableRows <= @RowCount
		Begin 
			if @TotalAvailableRows <= @StepCount
				Begin
				Declare @MessageReturn NVARCHAR(max)
				Set @MessageReturn= 'Not enough History Data for Job with ID: ' +cast(@JobId as nvarchar(50))+ ' There is only: ' + cast((@TotalAvailableRows/@StepCount) as nvarchar (1)) + ' Runtime entrie in the history.'
				Print @MessageReturn
				set @AverageJobRunDuration = NULL
				set @RunDurationLastExecution = NULL
				set @Factor = NULL
				set @IncreasionPercentageString = NULL
				set @IncreasionTotalMinutes = NULL
				set @OutputMessage = @MessageReturn
				set @ThresholdExceeded = NULL
				Return
				End

			Set @RowCount = @TotalAvailableRows
			set @granularity = (@RowCount - @StepCount) / @StepCount
		End
			else
				Begin
					set @RowCount = @RowCount + @StepCount
				End
			print cast(@RowCount as NVARCHAR(max)) + ' Rowcount'
			print cast(@granularity as nvarchar(max)) + ' Granularity'

       --@SumRunDuration is for summing the total run Duration of all job executions defined in @granularity
       declare @SumRunDuration int

       Set @SumRunDuration = (Select SUM(RunDurationMinutes) from
       (select Top (@RowCount) j.name as 'JobName', h.step_id as 'StepID',
       msdb.dbo.agent_datetime(run_date, run_time) as 'RunDateTime',
       ((run_duration/10000*3600 + (run_duration/100)%100*60 + run_duration%100 + 31 ) / 60)
                     as 'RunDurationMinutes'
       From msdb.dbo.sysjobs j 
       INNER JOIN msdb.dbo.sysjobhistory h 
       ON j.job_id = h.job_id 
       where j.enabled = 1  --Only Enabled Job
       and j.job_id = @JobId
       and h.step_id != 0
       Order by run_date desc, run_time desc) SubSelect)

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--Gather last runtime information
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

       declare @RowCountLastExecution int
       set @RowCountLastExecution = 1 * @StepCount

       set @RunDurationLastExecution = (Select SUM(RunDurationMinutes) from
       (select Top (@RowCountLastExecution) j.name as 'JobName', h.step_id as 'StepID',
       msdb.dbo.agent_datetime(run_date, run_time) as 'RunDateTime',
       ((run_duration/10000*3600 + (run_duration/100)%100*60 + run_duration%100 + 31 ) / 60)
                     as 'RunDurationMinutes'
       From msdb.dbo.sysjobs j 
       INNER JOIN msdb.dbo.sysjobhistory h 
       ON j.job_id = h.job_id 
       where j.enabled = 1  --Only Enabled Job
       and j.job_id = @JobId
       and h.step_id != 0
       Order by run_date desc, run_time desc) SubSelect)

       --@AverageJobRunDuration calculates the avarage runtime for a single job execution. Calculateion is made through @SumRunDuration and the @granularity
       set @AverageJobRunDuration = (@SumRunDuration - @RunDurationLastExecution) / @granularity
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--Calculate Factor
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       
       Begin try
              set @Factor = (1/@AverageJobRunDuration)*@RunDurationLastExecution
       End try
       Begin catch
              set @Factor = NULL
       End catch

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--Check if sensitivity threshold is exceeded
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

       if @Factor != 0 and @Factor !< 0 and @Factor Is Not NULL
       Begin
              set @IncreasionPercentage = @Factor -1
              set @IncreasionTotalMinutes = @RunDurationLastExecution - @AverageJobRunDuration

				if @IncreasionPercentage > 0
				Begin
					set @IncreasionPercentageString = '+' + convert(nvarchar(max), @IncreasionPercentage*100) + '%'
				End
					Else
					Begin
					set @IncreasionPercentageString = convert(nvarchar(max), @IncreasionPercentage*100) + '%'
					End

       End 
              Else 
              Begin
              set @IncreasionPercentage = NULL
			  set @IncreasionPercentageString = NULL
              set @IncreasionTotalMinutes = NULL
              End

       If @IncreasionPercentage Is Not Null and @IncreasionTotalMinutes is not null and @IncreasionPercentage >= @SensitivityPercentage and @IncreasionTotalMinutes >= @SensivityTotalMinutes
       Begin
              Set @OutputMessage = 'Job with ID: ' + cast(@JobId as nvarchar(200)) + ' has exceed the defined thresholds. ' + 'The duration has increased by factor: ' +  cast(@Factor as nvarchar(200)) + ' Which is in total ' + cast(@IncreasionTotalMinutes as nvarchar(200)) + ' minutes.'
              Set @ThresholdExceeded = 1
       End
       Else
              Begin
              Set @OutputMessage = 'Job with ID: ' + cast(@JobId as nvarchar(200)) + ' has not exceeded the defined threshold'
              Set @ThresholdExceeded = 0
              End

End
GO



Execution Script:

In case of my customer I implemented a daily running job which is executing the stored procedure through an execution script. In this case a daily schedule was appropriate, as the ETL jobs were also running on a daily base.

In the script below, you can put every job name which should be monitored in the @Jobs variable and divide the names by semicolons. The STRING_SPLIT function will separate the string by semicolons and load the cursor with the job names. Note: that this functions is available from compatibility level 130 and above.

After the cursor is loaded with each job, the script will execute the stored procedure for each of them. If the threshold is exceeded for any of the provided jobs, the execution script will use SQL-Server mail to inform. To get informed you must put in the appropriate mail profile and recipients mail addresses in the @profile_name and @recipients input parameter!

See below the execution script:

declare @Jobs NVARCHAR(max)
Set @Jobs = ''

Declare @Job_Id uniqueidentifier
Declare @JobName Nvarchar(max)

--Load Jobs i cursor
declare jobs_cursor cursor
       for select value from STRING_SPLIT(@Jobs, ';')

open jobs_cursor
fetch next from jobs_cursor into @JobName

While @@FETCH_STATUS = 0
Begin

       --set JobId from Jobname
       set @Job_Id = (select j.job_id from msdb.dbo.sysjobs j where j.name = @JobName)

       --Exec SP
       declare @AverageJobRunDurationOut float
       declare @RunDurationLastExecutionOut int
       declare @FactorOut float
	   declare @IncreasionPercentageStringOut NVARCHAR(50)
       declare @IncreasionTotalMinutesOut int
       declare @OutputMessageOut NVARCHAR(max)
       declare @ThresholdExceededOut BIT

       exec dbo.dbi_jobruntimeinfo @SensivityTotalMinutes = 1,
       @JobId = @Job_Id,
       @AverageJobRunDuration = @AverageJobRunDurationOut Output,
       @RunDurationLastExecution = @RunDurationLastExecutionOut Output,
       @Factor = @FactorOut Output,
       @IncreasionTotalMinutes = @IncreasionTotalMinutesOut Output,
       @OutputMessage = @OutputMessageOut Output,
       @ThresholdExceeded = @ThresholdExceededOut Output,
	   @IncreasionPercentageString = @IncreasionPercentageStringOut Output;

       Select 
	   @JobName as 'Job',
	   @AverageJobRunDurationOut as 'AverageDuration',
	   @RunDurationLastExecutionOut as 'DurationLastExecution',
	   @IncreasionPercentageStringOut as 'IncreasionInPercent',
	   @IncreasionTotalMinutesOut as 'IncreasionMinutes',
       @ThresholdExceededOut as 'Threshold exceeded',
	   @FactorOut as 'Factor',
	   @OutputMessageOut as 'Message'

	   

	   if @ThresholdExceededOut = 1
	   Begin

		declare @bodycontent nvarchar(max)
		set @bodycontent = 'Job Name = '+@JobName+'; Average duration in minutes = '+cast(@AverageJobRunDurationOut as nvarchar(max))+'; Duration of last execution in minutes = '
		+cast(@RunDurationLastExecutionOut as nvarchar(max))+'; Slow down in percent = '+
		@IncreasionPercentageStringOut+'; Slow down in minutes = '+cast(@IncreasionTotalMinutesOut as nvarchar(max))
		
		declare @subjectcontent nvarchar(100)
		set @subjectcontent = 'SQL Agent Job: '+@JobName+' has slowed down'

		exec msdb.dbo.sp_send_dbmail
			@profile_name = '',
			@recipients  = '',
			@subject = @subjectcontent,
			@body = @bodycontent
       End

       fetch next from jobs_cursor into @JobName

End
CLOSE jobs_cursor;
DEALLOCATE jobs_cursor;

Let’s see how it works!

For testing I have two jobs as shown below:

The only thing which these jobs are actually doing is waiting:

The job Test_dbi_jobruntimeinfo has one step and the job Test_dbi_jobruntimeinfo_2 has two steps:

For demonstration purpose I made sure, that the history for both jobs is empty:

Let’s start and see how the stored procedure is going to behave when no data is in the history. Therefore I’m putting the name of both jobs in the @Jobs variable of my execution script:

In this demonstration I’m also selecting the output values from the stored procedure for showing the returned values:

I’m now executing the script and these are the values which are returned:

We can see that every value is “NULL” and the message says, that there is no history data available for the jobs. The stored procedure is even telling us the amount of available history data which is 0.

Let’s execute the jobs to generate a first entry in the history. Note: that each job step is running 35 second which would be rounded by the stored procedure to 1 minute. So the stored procedure will consider the job Test_dbi_jobruntimeinfo with one step as running one minute while the job Test_dbi_jobruntimeinfo_2 would be considered by the stored procedure as running two minutes.

The jobs are executed for the first time:

Now let’s execute again the script and see if we get any values:

We can see, that we still didn’t get any values. As we have only one entry in the history we obviously can’t compare the last run duration with the previous. But the message is now telling us, that we have “only 1” entry in the history.

I’m now executing both jobs again and after that the script and we can see from the script the following data:

We can see, that we have for the first time real data. As mentioned previously: In case that there is not enough history data to calculate the average according to the @granularity input parameter (which is 5 per default), the stored procedure is just taking the amount of history data which is available. So in this case were we have 2x entries in the history, the stored procedure is just taking the first duration as the average and compares the second execution to it.

I’m now executing the jobs again but now I’m changing the waiting duration within the job steps. For the first job Test_dbi_jobruntimeinfo, with only one step, I’m changing the wait duration to 1 minute and 35 seconds which will be considered by the stored procedure as 2 minutes. So we can expect for this job an increase of 100% returned by the stored procedure. For the other job Test_dbi_jobruntimeinfo_2 I’m doing the same but only for one step. So we can expect there an increase of 50% returned by the stored procedure:

After executing the jobs with the changed duration and executing the script, we got the following values back from the stored procedure:

We can see, that we have the increase exactly as expected in the output. But the threshold value is still at “0 = false”, as we didn’t reached the default sensitivity value in minutes of 20 minutes.

I’m changing now the waiting duration back to 35 seconds and execute the jobs and the script again. We can expect now a decrease, as the previous increase will be considered when the stored procedure is calculating the average value. After doing so, the following values are returned as expected:

The stored procedure is telling us, that the decrease was 0 minutes because its working with rounded values. But for the percentage value we can see a small decrease.

I’m now changing the wait time for the job Test_dbi_jobruntimeinfo_2 back to 1 minute and 35 seconds but this time for both steps and execute this job again. We can see that we have now again an increase for this job:

In my execution script, I send a mail over the SQL-Server Mail in case when the threshold is exceeded to inform interested people:

To demonstrate the generated information mail, I overwrite the default sensitivity value in minutes, which is 20 minutes, as I’m too impatient to wait 20 minutes 😉:

After executing the script again, we can see, that the threshold value for the job Test_dbi_jobruntimeinfo_2 is now set to “1 = true”:

And when I check the interface of the test SMTP Server I can see the following mail:

Summary:

For some SQL Agent jobs it can be interesting or necessary to keep an eye on it. Feel free to use the stored procedure and script from this blog or get inspired by them when you are facing a similar situation!