Test market hands-on analysis

Experimental testing using a discrete choice framework is a cost-effective way to test a new product concept. Testing costs are lower, testing is more targeted to specific customer segments, and the new product concept is more secure from competitive discovery. More importantly, the effect of specific parameters, the attributes and their levels, can be isolated because of the use of experimental design principles. This, in fact, is the purpose of experimental designs: to allow you to isolate effects. Despite these advantages, hands-on actual market testing is still preferred by many researchers because they believe that only by observing actual market behavior can one tell whether a product will sell and whether it has any issues. In this section, I will discuss some ways to conduct actual market tests.

Live trial tests with customers

The development of a discrete choice model for “market” testing was certainly an advance in market research. Discrete choice studies are sometimes referred to as experimental choice studies simply because they are experimental. Sometimes, however, actual market tests are conducted in which a new product is placed in a market for a period of time and its performance tracked to see how it fares against the competition. It is believed that such market tests are superior to experimental tests because they involve actual market conditions.

In-field studies: advantages and disadvantages

There are some advantages and disadvantages to live market testing. A major advantage is that actual consumer purchases are observed under the same conditions that would exist when the product is finally launched. In addition, if a prototype is in the market for a sufficient period of time, enough data could be collected to allow you to build a model to forecast sales once the product is actually marketed. I comment on new product forecasting in Chapter 6.

There are disadvantages that include:

  1. Signaling to the competition what the new product will be, how it will be offered, and the likely price points. Just as you monitor your competition to determine their tactical and strategic plans, so they are doing the same regarding your plans. They will become fully aware of the test offering, which will allow them to develop their own new product.
  2. Test marketing is expensive. Placing any new product into the market is not cost-free, so why should it be any different for a test product? Actual products have to be developed, which means that manufacturing must be able to produce sufficient quantities for sale. This may require a retooling of current manufacturing operations, which is typically not a trivial issue.
  3. Adverse publicity could be created for the product and company if consumers do not like it or something goes dramatically wrong simply because the product is not yet ready for market introduction. The public relations damage could be devastating.
  4. Customer and service representatives must be trained regarding the test product so that they are prepared for customer complaints, questions, and product failures that have to be addressed.
  5. Some advertising or promotional material has to be developed for the test markets just to get the "word" out for the test product. This is also costly.

In my opinion, the experimental approach to testing a new product is a more cost-effective way to test than an actual market test.

A consumer lab setting: clinics

Some industries use a compromise between actual market tests and experimental tests. The automobile industry, for example, uses car clinics to test new car designs.2 The concept of a clinic is not restricted to cars. It could be applied to any durable good for which consumers have to make a major purchase decision.3 Food and beverage taste testing are also forms of clinics since consumers are brought into a controlled environment and are asked to evaluate a new product (usually several), so clinics are not restricted to durable goods.4

There are some issues associated with running a clinic, security being a major one. Since customers would be shown actual prototypes of a new product, they could conceivably divulge properties of the prototype. Customers should be asked to sign a nondisclosure agreement (NDA) prior to taking part in the clinic. The NDA may be difficult to enforce, however, since a specific person divulging clinic information may be difficult to identify.

An important aspect of a clinic is the data collected. Data collection could be in the form of routine survey questions and/or in the form of a stated preference discrete choice experiment. For the former, the questions might cover:

  • likelihood to purchase;
  • attribute importance ratings;
  • performance ratings if the consumers could use the product;5
  • attribute liking;
  • measures of emotion stimulated by the product (e.g., power, confidence, prestige); and
  • Just-About-Right measures, to mention a few.

The analyses conducted for these measures could be just stand-alone analyses (e.g., simply tabulate the number of responses regardless of the characteristics of the respondents) or they could be by major customer characteristics (e.g., income level, education, socioeconomic status, marketing segment membership).
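Both analysis modes can be sketched in a few lines of Python. This is a minimal illustration with made-up responses and an assumed income-segment label; the data and segment names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical clinic responses: (segment label, likelihood-to-purchase rating 1-5)
responses = [
    ("high_income", 5), ("high_income", 4), ("low_income", 2),
    ("low_income", 3), ("high_income", 5), ("low_income", 4),
]

# Stand-alone tabulation: counts per rating, ignoring respondent characteristics
overall = Counter(rating for _, rating in responses)

# Tabulation by a customer characteristic (here, the assumed income segment)
by_segment = defaultdict(Counter)
for segment, rating in responses:
    by_segment[segment][rating] += 1

print(overall)           # counts of each rating across all respondents
print(dict(by_segment))  # the same counts split by segment
```

In practice the same split would be repeated for education, socioeconomic status, or segment membership.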

Data from the first five question types can be analyzed via the techniques I described earlier in Chapter 3 on the design of a new product. The sixth question type, Just-About-Right measures, is more complicated. It is typical for managers in the sensory areas (e.g., beverages, foods, and fragrances) to want to understand consumers’ reactions to the sensory experience of their product. This also holds for personal items such as jewelry and automobiles. Automobiles are in this category because of the emotions attached to them as discussed above. These products appeal to a subjective state - the senses - so objective measures are hard to determine. Consequently, the measures commonly used are more subjective regarding “getting it right.” Basically, managers need to know if customers believe some attribute is

  • too little or insufficient;
  • just-about-right (JAR); or
  • too much or too intense.

Penalty analysis is a methodology for determining how much a product’s Overall Liking score drops when an attribute is off its optimal level (i.e., above or below the JAR level).

An Overall Liking scale is part of the measurement because how customers “like” a product is a predictor of their potential to purchase it. If they say they do not like the product at all, then it is a safe bet they will not buy it. If they say they really like it, the probability of buying should be high. The probability of purchase should vary directly with the degree of liking.

A JAR scale is used because a measure is needed for a sensory evaluation of an aspect of the attribute. Sensory evaluations are, by their nature, highly subjective. It is believed that customers (primarily consumers) cannot exactly express their sensory evaluation but can only say that the attribute is “about right.” This is a less definitive response to a question regarding whether or not the attribute is correct.

The amount by which an attribute is not correct (i.e., not JAR) indicates the amount of improvement that has to be made. The improvement could be an increase (reflecting “Too Little”) or a decrease (reflecting “Too Much”) in the attribute. Making the indicated improvement would increase the product’s Overall Liking. In general, the Overall Liking will rise as the customers’ sensory subjective evaluation rises to an optimal JAR level, but then fall as the sensory subjective evaluation rises past the optimal JAR level: Too Little of the attribute has a negative impact on Overall Liking while Too Much also has a negative impact on Overall Liking. This is illustrated in Figure 4.4.


FIGURE 4.4 This illustrates the impact on Overall Liking of not having the setting of an attribute “just about right.” A setting that is too much or too intense or too overpowering will make the product unappealing to customers so their Overall Liking will fall. The same holds if the attribute is too little or insufficient or not enough. The goal is to get the attribute setting “just about right.”
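The inverted-U relationship in Figure 4.4 can be sketched with a stylized function. This is purely illustrative: the quadratic shape, the peak liking value, and the curvature are assumptions of mine, not estimates from any data.

```python
# Stylized inverted-U: Overall Liking peaks when the attribute level hits the
# JAR point and falls off on either side (Too Little or Too Much).
def overall_liking(level: float, jar: float = 3.0,
                   peak: float = 5.8, curvature: float = 0.5) -> float:
    """Assumed quadratic liking curve; all parameter values are illustrative."""
    return peak - curvature * (level - jar) ** 2

print(overall_liking(3.0))  # 5.8 -- maximum liking at the JAR level
print(overall_liking(1.0))  # 3.8 -- Too Little lowers liking
print(overall_liking(5.0))  # 3.8 -- Too Much lowers liking by the same amount here
```

Real liking curves need not be symmetric; the point is only that liking falls on both sides of the JAR level.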

Data collection for a penalty analysis is quite simple. Consumers are first asked to rate their Overall Liking for a product. Typically, a 7-point hedonic scale is used. An example set of points might be:

  • 1 = Do Not Like the Product at All
  • 4 = Neither Like nor Dislike
  • 7 = Like the Product Very Much.

After providing an overall product rating, they are then asked to rate specific sensory attributes of the product. This rating is typically on a 5-point scale. The scale must have an odd number of points so as to have an unambiguous middle - the JAR value. An example might be:

  • 1 = Too Weak/Little/Small/etc. depending on the product
  • 3 = Just About Right (JAR)
  • 5 = Too Strong/Big/Much/etc. depending on the product.

The sensory attributes evaluated depend on the product. Some typical sensory attributes are shown in Table 4.1.

There are several steps in the penalty calculations. For illustrative purposes, consider a food company that developed a new set of flavors for a popular young-adult product they sell. The product managers wanted to assess consumer reactions to the flavors to see if they “got it right.” Since getting it right is personal to the consumer and sensory, a penalty analysis was conducted. First assume that the sensory attribute is flavor. A 5-point JAR scale is used with the left, middle, and right points being “Too Mild”, “Just About Right”, and “Too Spicy.” The five points are recoded so that

TABLE 4.1 These are some sensory attributes that might be used in three diverse consumer studies. The Food/Beverage attributes are common. The automotive attributes vary but are representative. The personal care products could be items such as deodorants, perfumes, aftershave lotions, sunscreen lotions, general skin lotions, and hair care products such as shampoos, conditioners, and colorings.



Food/Beverage        Automotive                  Personal Care

Better than others   Length of hood              Strength of fragrance
                     Angle of front windshield   Fragrance intensity
                     Length of trunk             Application amount
                     Angle of rear windshield    Strength of applicator
                     Overall shape
                     Overall length


  • Bottom-two boxes is Too Little
  • Middle box is JAR
  • Top-two boxes is Too Much.

In addition to the sensory attribute question, an Overall Liking question is asked on a 7-point scale. The average Overall Liking for the product is calculated for each level of the recoded assessment. The drops in the average Overall Liking rating for Too Little and Too Much from the JAR level are determined. These are sometimes called (appropriately) Mean Drops. The weighted average of the Mean Drops is the Penalty for not “getting it right.” Getting it right is the JAR level.
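The recoding of the 5-point JAR scale into the three boxes can be sketched in Python. The function name is mine, purely for illustration; the box boundaries follow the bottom-two/middle/top-two rule just described.

```python
def recode_jar(rating: int) -> str:
    """Recode a 5-point JAR rating into three boxes:
    bottom-two boxes -> Too Little, middle box -> JAR, top-two boxes -> Too Much."""
    if rating in (1, 2):
        return "Too Little"
    if rating == 3:
        return "JAR"
    if rating in (4, 5):
        return "Too Much"
    raise ValueError("JAR rating must be on the 1-5 scale")

print([recode_jar(r) for r in (1, 2, 3, 4, 5)])
# ['Too Little', 'Too Little', 'JAR', 'Too Much', 'Too Much']
```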

There is disagreement as to exactly what is the “penalty.” Some view it as the Mean Drops and others view it as the weighted average of the Mean Drops. I view the penalty as the weighted average, but a best practice is to report both Mean Drops and the weighted average labeled as “Penalty.” Penalties are usually analyzed by plotting the mean drops against the percent of respondents for that mean drop.

Extending the example just mentioned, 107 young-adult consumers were asked to taste the flavors. They were asked to rate them on overall liking using a 7-point hedonic scale and then to indicate their opinion on flavor as mentioned above plus sweetness, sensation, and sourness. A penalty analysis is shown in Table 4.2. For each attribute, the responses on the 5-point JAR scale were recoded as described

TABLE 4.2 This is an example penalty analysis table summarizing the calculations for four sensory attributes for the new flavors for a popular young-adult product. The Percent column is the percent of the respondents in each JAR category for each attribute. These percents sum to 100% for each attribute (within rounding). The Liking scores sum to 585 for each attribute because the liking rating was for the product regardless of the individual attributes. The table shows the sum of the Liking scores by attribute by JAR category.


JAR Level      Percent    Liking    Mean Liking    Mean Drop

Flavor
  Too Weak       9.3%        52         5.2           -0.6
  JAR           68.2%                   5.8            0.0
  Too Strong    22.4%                   4.4           -1.4

Sweetness
  Too Weak
  JAR
  Too Strong

Sensation
  Too Weak
  JAR
  Too Strong

Sourness
  Too Weak
  JAR
  Too Strong

above and the percent of young adults in each JAR group was calculated. For Flavor/Too Weak, there were 10 respondents, which is 9.3% (= 10/107 × 100). The Liking score from the 7-point liking scale was calculated by simple summation. The 10 respondents for Flavor/Too Weak assigned a total of 52 points to this category. The Mean Liking is just the Liking points divided by the 10 respondents. The Mean Drop is the drop in the mean liking for each category from the JAR value. For the Flavor attribute, the Too Weak category is 0.6 points below the JAR value (= 5.2 - 5.8) while the Too Strong category is 1.4 points below (= 4.4 - 5.8). Obviously, the JAR category has a zero mean drop. The Penalty for not getting Flavor just right is a drop in the overall liking of the product by 1.2 points. This is calculated as the weighted average of the mean drops, the weights being the proportions of respondents in the Too Weak and Too Strong categories. For the Flavor attribute, the weights are 0.093/(0.093 + 0.224) = 0.293 for Too Weak and 0.224/(0.093 + 0.224) = 0.707 for Too Strong. The proportion for the JAR category is not included since the respondents in this group do not contribute to either mean drop. The Penalty is then

0.293 × (-0.6) + 0.707 × (-1.4) = -1.2.
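Using the Flavor figures from the worked example, the weighted-average penalty can be reproduced in a few lines of Python:

```python
# Flavor attribute values taken from the worked example in the text
shares = {"Too Weak": 0.093, "Too Strong": 0.224}   # proportion of respondents
mean_drop = {"Too Weak": -0.6, "Too Strong": -1.4}  # drop from the JAR mean liking

# Reweight the two off-JAR shares so they sum to 1 (the JAR share is excluded
# because respondents at JAR contribute no mean drop)
total = sum(shares.values())
weights = {k: v / total for k, v in shares.items()}

# Penalty = weighted average of the mean drops
penalty = sum(weights[k] * mean_drop[k] for k in shares)
print(round(penalty, 1))  # -1.2
```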

If a stated preference discrete choice experiment is included, take rates could be estimated as described in the previous section and then used for volumetric forecasting.6 The advantage of this approach is that the consumers would have firsthand experience with the new product as opposed to just reading a description of it. Personal experience always dominates.
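As a minimal sketch of the volumetric step, assume a take rate estimated from the choice experiment plus an assumed addressable market size and purchase frequency. All three numbers below are illustrative only, not from any study.

```python
# Hypothetical inputs for a simple volumetric forecast
take_rate = 0.12           # assumed share choosing the new product in the choice experiment
market_size = 500_000      # assumed addressable households
purchase_frequency = 4     # assumed purchases per household per year

# Volume = take rate x market size x purchase frequency
annual_volume = take_rate * market_size * purchase_frequency
print(annual_volume)  # 240000.0 units per year
```

Real volumetric forecasts adjust the take rate for awareness and distribution, but the multiplication above is the core of the calculation.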
