Guidelines for Consumer Testing - guidance from ESN members
In this series ESN members give their solutions to the most frequently asked questions from product developers and marketers.
How can I measure the sensory variation in my product?
Product standardisation is a basic requirement in sensory quality control systems. It involves measuring the sensory variability of a product and defining its tolerance limits. Standardisation procedures can clarify such questions as:
- Are the characteristics of a product maintained and reproduced during production?
- What is a product’s typical variability within one production line?
- What is the variability between production lines?
- What are the most variable and therefore most critical product attributes?
The many sensory properties of a product may vary in appearance, flavour or texture. Such differences may be induced by variations within the production process, or they may be due to the heterogeneous nature of the basic materials. Methods that test for overall difference, such as the “triangle test”, work well for homogeneous (food) products in which the variability between production lots is caused by treatment differences. However, Fiorella Sinesio, from the National Research Institute on Food and Nutrition (INRAN) in Italy, warns: “with products made from more heterogeneous basic materials, this technique becomes ineffective, as it gives a high frequency of false statistically significant results”.
She recommends, “The best approach to measure variability in highly variable products is to establish a variability baseline by measuring within-lot differences from one control lot and between-lot differences by comparing a product from a second control lot with one from the first control lot. A test product’s variability is measured against the variation within the ratings of the control lot against itself, as well as the between lot variation” (Model 1).
A t-test can be used to compare the mean scores of the test product and the control product.
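As a minimal sketch of such a comparison, the Python below computes a paired t statistic from degree-of-difference ratings. It assumes that every panellist rates both the test-versus-control pair and the control-versus-control pair, so the ratings can be paired. The function name and the ratings are hypothetical and are not taken from the cited studies.

```python
import math
import statistics


def paired_t_statistic(test_ratings, baseline_ratings):
    """Return (t, degrees of freedom) for paired ratings from the same panellists."""
    if len(test_ratings) != len(baseline_ratings):
        raise ValueError("each panellist must rate both pairs")
    # Per-panellist difference between the two degree-of-difference ratings.
    diffs = [t - b for t, b in zip(test_ratings, baseline_ratings)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical ratings from 10 panellists (0 = not at all different).
test_vs_control = [4, 5, 3, 6, 4, 5, 4, 6, 5, 4]
control_vs_control = [2, 3, 2, 3, 1, 3, 2, 4, 3, 2]

t_value, dof = paired_t_statistic(test_vs_control, control_vs_control)
print(f"t = {t_value:.2f} with {dof} degrees of freedom")
```

The resulting t value is then compared with the critical value of the t distribution for the chosen significance level and the reported degrees of freedom.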
When the within-lot variability is not statistically significant, the within-lot differences do not need to be determined. “In this case a more appropriate approach is to pair one test product to two control lots to determine whether the variability in the intensity of the attributes falls within or outside the variability between the two lots.” (Model 2). Respondents rate the perceived degree of difference within each pair on a multiple-point category scale or a line-marking scale, typically anchored from 'not at all different' at one end to 'extremely different' at the other. Sinesio states that, “An overall difference scale works well for a single characteristic of variation, but for complex products with a variety of characteristics, descriptive scales, with anchors ‘little’ and ‘much’, are the most appropriate measurement tools”.
She continues, “An extension of this method reduces the risk of false results. In considering two test lots, the method assumes that heterogeneous products can have variable test results. The model incorporates the between-control lots variability, along with the between-test lot variability. By doing this it is possible to compare each test lot against each control lot” (Model 3).
Fiorella Sinesio affirms that the number of panellists may vary depending on the desired strength and effectiveness of the test and the number of product pairs to be presented to the panellists. A panel may often have 10-12 assessors. However, it may also have as many as 20, 30, or even more if the panel members have different sensitivities or training. “If the number of product pairs is high, an incomplete block design will limit the number of product pairs to be presented to the panellists.”
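One simple way to limit the number of pairs per panellist is a cyclic assignment, sketched below in Python. This is only an illustration under stated assumptions: the function name is hypothetical, and a cyclic rotation equalises how often each pair is rated only when the panel size is a multiple of the number of pairs. It is not a fully balanced incomplete block design in the statistical sense.

```python
from collections import Counter


def cyclic_incomplete_blocks(pairs, n_panellists, pairs_per_panellist):
    """Give each panellist a rotating subset of the product pairs."""
    if not 0 < pairs_per_panellist <= len(pairs):
        raise ValueError("pairs_per_panellist must be between 1 and len(pairs)")
    plan = []
    for i in range(n_panellists):
        # Start one position further along the list for each new panellist.
        block = [pairs[(i + j) % len(pairs)] for j in range(pairs_per_panellist)]
        plan.append(block)
    return plan


# Six product pairs, 12 panellists, three pairs each (illustrative numbers).
pairs = ["C1-C2", "T1-T2", "T1-C1", "T1-C2", "T2-C1", "T2-C2"]
plan = cyclic_incomplete_blocks(pairs, n_panellists=12, pairs_per_panellist=3)

# Count how many panellists rate each pair.
counts = Counter(pair for block in plan for pair in block)
for pair, count in counts.items():
    print(pair, count)
```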
Model 1: Panellists compare the degree of difference between three pairs of samples:
- control lot sample (C1) against the same control lot sample (C1) (within-lot difference)
- control lot sample (C1) against control lot sample (C2) (between-lot difference)
- test lot sample (T) against control lot sample (C1)
Model 2: Panellists compare the degree of difference between three pairs of samples:
- test product sample (T) against the first control lot sample (C1)
- test product sample (T) against the second control lot sample (C2)
- control lot sample (C1) against control lot sample (C2) (baseline difference)
Model 3: Panellists compare the degree of difference between six pairs of samples:
- baseline difference: C1 against C2; T1 against T2
- test product to control lot difference: T1 against C1; T1 against C2
- test product to control lot difference: T2 against C1; T2 against C2
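The six pairs of the extended scheme (Model 3 in the text) are all pairwise combinations of the two control lots and the two test lots. A short Python sketch can enumerate and label them. The labels and the helper name are illustrative assumptions, not terminology from the cited papers.

```python
from itertools import combinations

LOTS = ["C1", "C2", "T1", "T2"]  # two control lots, two test lots


def classify(a, b):
    """Label a pair by the kinds of lots it contains (C = control, T = test)."""
    kinds = {a[0], b[0]}
    if kinds == {"C"}:
        return "control baseline"
    if kinds == {"T"}:
        return "test baseline"
    return "test vs control"


# All unordered pairs of distinct lots, each with its label.
model3_pairs = [(a, b, classify(a, b)) for a, b in combinations(LOTS, 2)]
for a, b, label in model3_pairs:
    print(f"{a} against {b}: {label}")
```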
Example: A manufacturer wants to measure the variability of its production. The manufacturer knows that the variability within a single lot is slight, and therefore applies Model 2 to measure the variability between different production runs (or manufacturing sites).
To obtain robust results, the manufacturer runs the test with 30 panellists.
To avoid serving-position effects, the presentation order of the three pairs is balanced both within and across pairs. The overall mean rating of the perceived degree of difference is calculated for each pair, and three statistical contrasts are applied to compare the test-control differences (C1-T, C2-T) with the normal control lot variation of the sensory attributes (baseline difference C1-C2).
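The text does not specify the three contrasts, so the sketch below assumes one plausible set: each test-control mean minus the baseline mean, and the average of the two test-control means minus the baseline mean. The ratings are hypothetical and shortened to five panellists for readability (the example uses 30). A real analysis would also test each contrast for significance.

```python
import statistics


def difference_contrasts(ratings):
    """ratings maps a pair label to the list of per-panellist difference ratings."""
    means = {pair: statistics.mean(values) for pair, values in ratings.items()}
    baseline = means["C1-C2"]  # normal control lot variation
    return {
        "C1-T vs baseline": means["C1-T"] - baseline,
        "C2-T vs baseline": means["C2-T"] - baseline,
        "mean test-control vs baseline": (means["C1-T"] + means["C2-T"]) / 2 - baseline,
    }


# Hypothetical degree-of-difference ratings for the three Model 2 pairs.
ratings = {
    "C1-C2": [2, 3, 2, 1, 2],
    "C1-T": [4, 5, 3, 4, 4],
    "C2-T": [3, 4, 4, 3, 5],
}

contrasts = difference_contrasts(ratings)
for name, value in contrasts.items():
    print(f"{name}: {value:.2f}")
```

A contrast close to zero suggests that the test product differs from a control lot by no more than the two control lots differ from each other.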
For more information please contact:
INRAN - Istituto Nazionale di Ricerca per gli Alimenti e la Nutrizione
Via Ardeatina, 546
Aust L.B., Gacula M.C. Jr., Beard S.A. & Washam R.W. II (1985). Degree of difference test method in sensory evaluation of heterogeneous product types. Journal of Food Science, 50, 511-513.
Lawless H.T. & Heymann H. (1998). Sensory evaluation of food. New York: Chapman & Hall.
Muñoz, Civille & Carr (1992). Sensory evaluation in quality control.
Pecore S., Stoer N., Hooge S., Holschuh N., Hulting F. & Case F. (2006). Degree of difference testing: a new approach incorporating control lot variability. Food Quality and Preference, 17, 552-555.
Young T.A., Pecore S., Stoer N., Hulting F., Holschuh N. & Case F. (2008). Incorporating test and control product variability in degree of difference tests. Food Quality and Preference, 19, 734-736.