Trending of microbiological indicator test results

Trending microbiological indicator test results can make it easier to spot patterns in your data and to manage the risks associated with environmental resource usage. Depending on the nature of the results, technical managers commonly run into a number of problems when trending their microbiological test data. This page summarises these difficulties and explains the best ways to avoid them.

Arithmetic means

In microbiology, an arithmetic mean is also known simply as an average. (When the log of the arithmetic mean is taken, the result is sometimes called a log mean or log average.) If you collected five samples and the results were:

  • 230 cfu/100ml
  • 1,000 cfu/100ml
  • 2,650 cfu/100ml
  • 150,000 cfu/100ml
  • 25 cfu/100ml 

Then you would calculate the arithmetic mean by first adding the counts together: 230+1000+2650+150000+25 = 153,905 cfu/500ml
and then dividing this result by the number of samples: 153,905/5 = 30,781 cfu/100ml.
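
The two steps above can be sketched in a few lines of Python. This is only an illustration using the five example counts; the names `counts` and `arithmetic_mean` are chosen here for the sketch and do not come from any particular laboratory software.

```python
# Replicate counts from the example above, in cfu/100ml.
counts = [230, 1000, 2650, 150000, 25]


def arithmetic_mean(values):
    """Add the counts together, then divide by the number of samples."""
    return sum(values) / len(values)


total = sum(counts)             # 153905
mean = arithmetic_mean(counts)  # 30781.0

print(f"Sum of counts: {total}")
print(f"Arithmetic mean: {mean:.0f} cfu/100ml")
```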

Results can be plotted over time:

Figure 1 A trend of water test samples where the replicate results have been averaged as an arithmetic mean

The main drawback of using arithmetic means for indicator counts is that they give disproportionate weight to high values. In the above example, 4 out of 5 of the counts are below 3,000 cfu/100ml, yet the arithmetic mean is over 30,000 cfu/100ml because of the influence of the single high value of 150,000 cfu/100ml. Arithmetic means are therefore useful for process control only if samples give consistent test results that fall within a very narrow range of values.

It is common to take the log (base 10) of the arithmetic mean to make trends easier to plot on a graph. In the above example the log of 30,781 is 4.49. If the log is taken, the result is called a "log mean" or "log arithmetic mean". Taking the log of the arithmetic mean is optional, though, and largely down to personal preference.
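
To see how much the single high count dominates, the short Python sketch below recomputes the arithmetic mean with and without the 150,000 cfu/100ml result, and also takes the base-10 log of the arithmetic mean. Dropping the high value is done here purely to illustrate its weighting; it is not a suggestion that high results should be excluded from a trend. The variable names are illustrative.

```python
import math

# Replicate counts from the arithmetic mean example, in cfu/100ml.
counts = [230, 1000, 2650, 150000, 25]

mean_all = sum(counts) / len(counts)

# Optional step: take the base-10 log of the arithmetic mean ("log mean").
log_mean = math.log10(mean_all)

# For illustration only: the same calculation without the single high count.
without_high = [c for c in counts if c != 150000]
mean_without_high = sum(without_high) / len(without_high)

print(f"Arithmetic mean, all five counts: {mean_all:.0f} cfu/100ml")
print(f"Log of the arithmetic mean: {log_mean:.2f}")
print(f"Arithmetic mean without the high count: {mean_without_high:.2f} cfu/100ml")
```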

Geometric means

A geometric mean is also known as a mean log. You calculate a geometric mean by taking logs before you take any averages. For example, if you sampled a water source five times and the total aerobic count results were:

  • 230 cfu/100ml
  • 1,000 cfu/100ml
  • 2,650 cfu/100ml
  • 150,000 cfu/100ml
  • and 25 cfu/100ml

Then you would calculate the geometric mean by first using a calculator to find the log (base 10) of each count:

  • Log 230 = 2.36
  • Log 1000 = 3.00
  • Log 2650 = 3.42
  • Log 150000 = 5.18
  • and log 25 = 1.40

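Then you would add all these values together: 2.36+3.00+3.42+5.18+1.40 = 15.36

Divide this result by the number of samples taken: 15.36 divided by 5 = 3.07. This final answer is a log value (log cfu/100ml), not a count. If you need the geometric mean in the original units, take the antilog: 10 to the power of 3.07 is about 1,200 cfu/100ml.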
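
The same calculation can be sketched in Python as a check on the hand arithmetic. Working from unrounded logs, the mean log comes out at about 3.07, which corresponds to a geometric mean of roughly 1,180 cfu/100ml. The function names are illustrative, and raising an error for zero or negative counts is an assumption made for this sketch, since such values have no log; how your laboratory reports and handles them is a separate decision.

```python
import math

# Replicate counts from the example above, in cfu/100ml.
counts = [230, 1000, 2650, 150000, 25]


def mean_log(values):
    """Take the base-10 log of each count, then average the logs."""
    if any(v <= 0 for v in values):
        # Assumption for this sketch: counts must be positive to be logged.
        raise ValueError("counts must be greater than zero to take logs")
    logs = [math.log10(v) for v in values]
    return sum(logs) / len(logs)


def geometric_mean(values):
    """Convert the mean log back to the original units (antilog)."""
    return 10 ** mean_log(values)


print(f"Mean log: {mean_log(counts):.2f}")
print(f"Geometric mean: {geometric_mean(counts):.0f} cfu/100ml")
```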

The main benefit of using geometric means for process control is that they help smooth out the effects of occasional very high or very low values. If you have not done so already, have a look at the arithmetic mean section above. That example uses the same test results as this one, but the geometric mean is much lower: a mean log of about 3.1, compared with 4.49 for the log of the arithmetic mean. Geometric means suit samples whose counts span a wide range, typically where the higher values are 10 to 100 times the lower ones.

A further example of how geometric means reduce the uneven weighting of an occasional high count is shown in the graph below, which plots the same data as Figure 1 calculated as a geometric mean. The geometric mean is never higher than the arithmetic mean, and is lower whenever the counts differ from one another, because averaging the logs reduces the influence of the highest values. In addition, some of the 'spikes' on the blue line (arithmetic mean, Figure 1) are smoothed out on the red line (geometric mean).

Figure 2 A trend of water test samples where the replicate results have been averaged as a geometric mean.

Rolling geometric means

A rolling geometric mean is calculated in a similar way to a geometric mean: you start by taking the logs of your indicator counts, adding them together and then dividing by the number of samples that were tested. The main difference is that a rolling geometric mean uses all of the results from several weeks. It is common in microbiology to use a six-week rolling geometric mean. To calculate one, you would begin taking samples and saving the results until you had six weeks' worth of data (NB: the six weeks that are used don't have to be consecutive). Then you would log all of the data, add the log values together and divide by the total number of samples that were taken during that entire six-week period.

When the seventh week’s results came back from the lab, you would discard the first week’s data and take all of the data from week two to week seven, log it and add up the log values as before, and divide by the total number of samples taken between week two and week seven. For week eight, you would use all the data from week three to week eight. For week nine, you would use all the data from week four to week nine, and so on.
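
The sliding six-week window can be sketched in Python as follows. The function name `rolling_mean_logs` and the weekly counts are hypothetical, made up purely to show the mechanics; each inner list holds the replicate counts for one sampled week, and weeks may hold different numbers of samples.

```python
import math


def rolling_mean_logs(weekly_counts, window=6):
    """Return one rolling mean log per week, starting once `window` weeks exist.

    weekly_counts is a list of lists: one inner list of counts (cfu/100ml)
    per sampled week, oldest week first.
    """
    results = []
    for end in range(window, len(weekly_counts) + 1):
        # Pool every sample from the most recent `window` weeks.
        block = weekly_counts[end - window:end]
        logs = [math.log10(c) for week in block for c in week]
        # Divide by the total number of samples in the window, not by 6.
        results.append(sum(logs) / len(logs))
    return results


# Hypothetical counts for seven sampled weeks (cfu/100ml).
weekly_counts = [
    [10, 100],    # week 1
    [100],        # week 2
    [1000, 10],   # week 3
    [100, 100],   # week 4
    [10],         # week 5
    [1000],       # week 6
    [100000],     # week 7
]

rolling = rolling_mean_logs(weekly_counts)
for week, value in enumerate(rolling, start=6):
    print(f"Week {week}: rolling mean log = {value:.2f}")
```

The first value covers weeks one to six and the second covers weeks two to seven, so the week one samples drop out of the second calculation. Raising each result to the power of 10 would give the rolling geometric mean in cfu/100ml.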

Rolling geometric means are useful for samples whose results jump around from one week to the next. An example is shown in the graph below. Both lines are plotted from the same set of test results: the red line is a geometric mean and the purple line is a six-week rolling geometric mean. The purple line makes the trend easier to see because it removes the “noise” from the results.

Figure 3 A trend of water test results where the same replicate results have been averaged as a geometric mean (red line) and a six week rolling geometric mean (purple line)

Cumulative sums

Cusums, or cumulative sums, are a running total of how your test results compare with a baseline average. The baseline average is also calculated from your own test results, so a cusum shows how your samples are performing against their historical selves. Cusums tend to exaggerate hygiene trends in data, so this type of comparison is especially useful if your bacterial indicator counts normally fall within a narrow range of values.

To calculate a cusum trend, you would look back over your laboratory reports and choose a block of time that you think is representative of operating conditions in your plant. For guidance, an appropriate length of time to use is 13 weeks (or three months) of test data. Next you would take all of the test results within that time period, log them, add up all the log values and finally divide by the number of samples to give the baseline geometric mean over your 13-week period.

The next stage is to go back through all of your data and calculate a separate geometric mean for every week that you have results. Then, for every week, subtract the baseline average from the weekly geometric mean to find out how your results for that week compared with the baseline. A positive difference means that week's counts were higher than the baseline; a negative difference means they were lower. Finally, you add this difference to a running tally (the cumulative sum) of how your plant’s results compare with your plant’s baseline average. The explanation may seem a bit complicated, but if you look at the table below and then re-read the above paragraphs, it should become clearer.

Date        Weekly Geometric Mean   Baseline Average   Difference   Running Tally
10/10/08    1.9                     1.8                 0.1          0.1
17/10/08    2.2                     1.8                 0.4          0.5
24/10/08    2.3                     1.8                 0.5          1.0
31/10/08    2.4                     1.8                 0.6          1.6
07/11/08    0.8                     1.8                -1.0          0.6
14/11/08    1.1                     1.8                -0.7         -0.1
21/11/08    1.3                     1.8                -0.5         -0.6
28/11/08    2.4                     1.8                 0.6          0.0
05/12/08    0.5                     1.8                -1.3         -1.3
12/12/08    1.6                     1.8                -0.2         -1.5
19/12/08    1.4                     1.8                -0.4         -1.9
26/12/08    1.9                     1.8                 0.1         -1.8
02/01/09    1.8                     1.8                 0.0         -1.8
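
The Difference and Running Tally columns in the table above can be reproduced with a short Python sketch. It assumes the weekly geometric means and the baseline of 1.8 exactly as tabulated, and rounds to one decimal place to match the table; the variable names are illustrative.

```python
from itertools import accumulate

# Baseline average and weekly geometric means (log values) from the table.
baseline = 1.8
weekly_geometric_means = [1.9, 2.2, 2.3, 2.4, 0.8, 1.1, 1.3,
                          2.4, 0.5, 1.6, 1.4, 1.9, 1.8]

# Difference = weekly geometric mean minus baseline average.
differences = [round(w - baseline, 1) for w in weekly_geometric_means]

# Running tally = cumulative sum of the differences so far.
# Adding 0.0 turns a possible "-0.0" from rounding into plain 0.0.
running_tally = [round(t, 1) + 0.0 for t in accumulate(differences)]

for w, d, t in zip(weekly_geometric_means, differences, running_tally):
    print(f"{w:4.1f}  {d:5.1f}  {t:5.1f}")
```

Plotting `running_tally` against the sample dates gives the cusum trend line.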

If you were to plot out your cumulative sum (i.e. the running tally) on a graph it would look like this:

Figure 4 An example of a cumulative sum being used to trend water test results

When the line on the graph is going up, your test results are worse (counts are higher) than during the period used to calculate the baseline. When the line is going down, hygiene is better than during the baseline period. When the line is level, hygiene is the same. In the example graph above, hygiene was getting worse throughout October. Through November and the first part of December it got progressively better, apart from a brief worsening at the end of November. In the final two weeks of the table (late December and early January), the line is roughly level, so plant hygiene was about the same as during the baseline period.