# Discrete choice background – high-level view

A discrete choice model was outlined in the preceding paragraphs. In the remainder of this section I describe the model and some of its key outputs. See Paczkowski [2016] and Paczkowski [2018] for more details.

*The basic model*

A fundamental concept in the microeconomic theory of consumer demand is utility maximization. A consumer buys the combination of products that maximizes the utility he or she receives from that combination. In the standard theory, these products are assumed to be continuous, so they can be infinitely divided. The products considered in this section are not continuous: you either buy a product or you do not; it is not infinitely divisible. A box of cereal is purchased or not; you cannot buy half or a quarter of a box. The unit of measure is discrete. The utility maximization concept can still be applied to the discrete product case, but with modifications.

Suppose there are $C$ consumers in a market consisting of $J$ products. Define a measure of utility, $U$, for consumer $i$, $i = 1, \ldots, C$, for product $j$, $j = 1, \ldots, J$, as

$$
U_{ij} = V_j + \varepsilon_{ij}. \tag{4.1}
$$

The term $V_j$ is a measure of the average utility received by all consumers for product $j$. This is sometimes called systematic utility. Each consumer’s utility for product $j$ will differ from this average by an unobserved and unobservable random factor characteristic of the individual. The term $\varepsilon_{ij}$ is a measure of this random variation. Total utility is thus composed of two parts: a systematic part, $V_j$, and a random part, $\varepsilon_{ij}$. Because of the random part, total utility $U_{ij}$ is itself a random variable.

A specific product is purchased if the total utility received from it is greater than that received from any other product. That is, consumer $i$ maximizes utility by choosing product $j$ when

$$
U_{ij} > U_{ik} \quad \text{for all } k \neq j.
$$

Unfortunately, because of the random part of utility, you cannot make definitive statements about what an individual consumer will purchase. You can, however, make a probabilistic statement. That is, you can determine the probability that a consumer, selected at random in the market, would buy product $j$, $j = 1, \ldots, J$. This is expressed as

$$
\Pr(j) = \Pr\left(U_{ij} > U_{ik}\ \ \forall\, k \neq j\right). \tag{4.3}
$$

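Substituting (4.1) into (4.3) and rearranging terms yields

$$
\Pr(j) = \Pr\left(\varepsilon_{ik} - \varepsilon_{ij} < V_j - V_k\ \ \forall\, k \neq j\right).
$$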

This has the form of a cumulative probability function. To complete the specification, a probability distribution is needed for the random components. As noted in Paczkowski [2018], three possibilities are the following distributions:

- Linear;
- Normal; and
- Extreme Value Type I (*Gumbel*).

The Linear distribution results in a linear model; the Normal results in a *Probit* model; the Extreme Value Type I distribution results in a *logit* model. The linear model is just *OLS.* The Probit model is popular in academic research but is more difficult to estimate. The logit model is popular in market research and other practical applications because of its simplicity. Assuming the Extreme Value Type I distribution, the logit model is

$$
\Pr(j) = \frac{e^{V_j}}{\sum_{k=1}^{J} e^{V_k}}. \tag{4.6}
$$

This is sometimes called a *multinomial logit choice model* (*MNL*) although this is not a good name since there is another multinomial logit model that is more complicated. A better name is a *logistic model* or *conditional logit model (CLOGIT).* The latter name emphasizes that the model is conditioned on the systematic utility. I often refer to the probability as a *take rate.* For a discussion of the derivation of (4.6), see Paczkowski [2018] and Train [2009]. Also see McFadden [1974] for the original discussion and derivation of the conditional logit model.
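The link between the Extreme Value Type I assumption and the logit take rates, $\Pr(j) = e^{V_j} / \sum_k e^{V_k}$, can also be checked numerically. The Python sketch below (NumPy assumed available) simulates many consumers with $U_{ij} = V_j + \varepsilon_{ij}$ and Gumbel-distributed $\varepsilon_{ij}$, lets each one pick the utility-maximizing product, and compares the simulated shares with the closed-form logit probabilities. The systematic utilities are hypothetical values chosen for illustration.

```python
import numpy as np

def logit_probs(V):
    """Logit take rates, Pr(j) = exp(V_j) / sum_k exp(V_k)."""
    expV = np.exp(V - V.max())  # subtracting the max avoids overflow; it cancels in the ratio
    return expV / expV.sum()

def simulated_probs(V, n_draws=200_000, seed=42):
    """Share of simulated consumers for whom product j has the highest U_ij = V_j + e_ij."""
    rng = np.random.default_rng(seed)
    eps = rng.gumbel(size=(n_draws, V.size))   # Extreme Value Type I random parts
    choices = np.argmax(V + eps, axis=1)       # each consumer picks the utility-maximizing product
    return np.bincount(choices, minlength=V.size) / n_draws

V = np.array([1.0, 0.5, -0.2])  # hypothetical systematic utilities for J = 3 products
print(logit_probs(V))
print(simulated_probs(V))
```

With enough draws the two vectors agree to within simulation error, which is what the derivation in the references establishes analytically.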

As I noted above, each product is defined by a set of attributes, with each attribute defined by different levels. These attributes and their levels define the systematic utility. In particular, the systematic utility is specified as a linear function of the attributes and their levels:

$$
V_j = \beta_1 X_{j1} + \beta_2 X_{j2} + \cdots + \beta_p X_{jp}, \tag{4.7}
$$

where $X_{jm}$, $m = 1, \ldots, p$, is the measure for attribute $m$ for product $j$. The parameters $\beta_1, \beta_2, \ldots, \beta_p$ are sometimes called part-worth utilities, as in conjoint analysis. They are the same for each alternative product in the choice sets, so there is just one set of part-worth utilities. This is the sense in which the choice probability is “conditional”: the probability is conditioned on the parameters being the same for all products in a choice set. See Paczkowski [2018] for some comments about this conditionality.

If the attributes are discrete, then the measures are appropriately coded dummy or effects variables. See Paczkowski [2018] for a thorough treatment of both types of coding for discrete choice as well as conjoint studies. Notice that a constant term is omitted from the systematic utility specification in (4.7). This is due to a property of this class of models called the *Equivalent Differences Property*, which states that a constant added to the systematic utility of every alternative has no effect on the choice probability: it contributes the same factor, $e^{c}$, to the numerator and to every term of the denominator, so it cancels. Only differences in utility matter. The intercept is certainly such a constant. Sometimes you may want to have a constant in the model, so this property is a potential drawback. There are ways around it, which are discussed in Paczkowski [2018].
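A quick numerical check of the Equivalent Differences Property, as a minimal Python sketch with hypothetical utilities: adding the same constant `c` to every alternative leaves the logit take rates unchanged, while adding it to only one alternative changes a utility difference and therefore the shares.

```python
import numpy as np

def logit_probs(V):
    """Logit take rates for a vector of systematic utilities."""
    expV = np.exp(V)
    return expV / expV.sum()

V = np.array([0.8, 0.3, -0.5])   # hypothetical systematic utilities
c = 2.5                          # e.g., an intercept shared by every alternative

# exp(V_j + c) = exp(c) * exp(V_j), so exp(c) cancels from numerator and denominator
same = np.allclose(logit_probs(V), logit_probs(V + c))

# Adding c to only one alternative changes a utility difference, so the shares move
changed = not np.allclose(logit_probs(V), logit_probs(V + np.array([c, 0.0, 0.0])))
print(same, changed)  # True True
```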

There is another property called the *Independence of Irrelevant Alternatives* (*IIA*). Consider a case of just two products. The *IIA* property states that the ratio of the choice probabilities for these two products depends only on the features of these two products, not on the features of any other products. This could be a problem or a benefit. The issues are too complex to develop here, but see Paczkowski [2018] for an extensive discussion.
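The *IIA* property can be seen directly in the logit formula. In the Python sketch below, with hypothetical utilities, the ratio of the take rates for products A and B is $e^{V_A - V_B}$ whether or not a third product C is in the choice set.

```python
import numpy as np

def logit_probs(V):
    expV = np.exp(V)
    return expV / expV.sum()

# Hypothetical systematic utilities for products A and B
two_products = logit_probs(np.array([1.0, 0.4]))
# The same A and B after a third product C enters the choice set
three_products = logit_probs(np.array([1.0, 0.4, 0.9]))

ratio_before = two_products[0] / two_products[1]
ratio_after = three_products[0] / three_products[1]
# Both ratios equal exp(1.0 - 0.4): product C is "irrelevant" to the A-versus-B comparison
print(ratio_before, ratio_after, np.exp(0.6))
```

A consequence is that C draws share from A and B in proportion to their existing shares, which is the behavior the references discuss as both a convenience and a limitation.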

The discrete choice model (4.6) is estimated using a maximum likelihood procedure. McFadden [1974] presents the details of estimation. Also see Train [2009] for an excellent discussion of estimation.
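To make the estimation step concrete, here is a minimal maximum likelihood sketch in Python with NumPy. It is not the procedure of any particular package: it simulates hypothetical choice data from the logit model with known part-worths, then maximizes the conditional logit log-likelihood with Newton-Raphson, one standard choice because that log-likelihood is concave. The data, the part-worth values, and the function names are illustrative assumptions.

```python
import numpy as np

def simulate_choices(beta, n_sets=4000, n_alts=3, seed=0):
    """Simulate hypothetical choice data from the conditional logit model."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_sets, n_alts, beta.size))  # attribute measures X_jm per choice set
    U = X @ beta + rng.gumbel(size=(n_sets, n_alts))  # total utility = systematic + random part
    return X, U.argmax(axis=1)                        # index of the chosen alternative

def choice_probs(X, beta):
    """Logit probabilities for every alternative in every choice set."""
    V = X @ beta
    expV = np.exp(V - V.max(axis=1, keepdims=True))
    return expV / expV.sum(axis=1, keepdims=True)

def log_likelihood(X, y, beta):
    P = choice_probs(X, beta)
    return np.log(P[np.arange(len(y)), y]).sum()

def fit_clogit(X, y, max_iter=50, tol=1e-10):
    """Maximize the conditional logit log-likelihood with Newton-Raphson."""
    n_sets, _, p = X.shape
    beta = np.zeros(p)
    chosen = X[np.arange(n_sets), y]                   # attributes of the chosen alternatives
    for _ in range(max_iter):
        P = choice_probs(X, beta)
        xbar = np.einsum("nj,njp->np", P, X)           # probability-weighted mean attributes
        grad = (chosen - xbar).sum(axis=0)             # score vector
        D = X - xbar[:, None, :]
        hess = -np.einsum("nj,njp,njq->pq", P, D, D)   # negative definite Hessian
        step = np.linalg.solve(hess, grad)
        beta = beta - step                             # Newton update
        if np.abs(step).max() < tol:
            break
    return beta

true_beta = np.array([-1.0, 0.5])   # hypothetical part-worths
X, y = simulate_choices(true_beta)
beta_hat = fit_clogit(X, y)
print(beta_hat)
```

The estimates should land close to `true_beta`; commercial and open source packages wrap the same likelihood with standard errors and diagnostics.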

*Main outputs and analyses*

The estimated part-worth utilities are of some interest in a discrete choice problem just as they are in a conjoint problem because they show the importance of each level of each attribute. An attribute importance analysis could be done as for conjoint. This helps the product designers finalize their design because they would know the weight placed on each attribute. A better way to finalize the design is to recognize that the estimated part-worth utilities used in conjunction with different settings of the attributes’ levels produce an estimate of the choice probability for each possible product defined by the attributes. If there are *J* products, then there are *J* probabilities. These can be interpreted as estimates of market share given the settings of the attributes. A simulator could be built that allows the product designers and the product managers to try different levels of the attributes, basically different sets of assumptions about the attributes, to determine which settings give the maximum estimated market share. Since competitive products are usually included in choice sets, a competitive scenario analysis can also be done to gauge competitive reactions and possible counter-moves. The results of these simulations would help finalize the design for launch. See Paczkowski [2018] for some mention of simulators.
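A simulator of the kind described above can be very small. The Python sketch below assumes hypothetical part-worths (not estimates from any study), treats price and flight time as numeric measures and GPS as a 0/1 dummy, and computes the estimated market shares for a base scenario, a competitive counter-move, and a possible response.

```python
import numpy as np

# Hypothetical part-worths (not estimates from any study): utility per dollar
# of price, per minute of flight time, and for a 0/1 integrated-GPS dummy.
ATTRIBUTES = ["price", "flight_time", "gps"]
BETA = np.array([-0.04, 0.30, 0.60])

def take_rates(products, beta=BETA):
    """Estimated market shares for the products in one what-if scenario."""
    X = np.array([[p[a] for a in ATTRIBUTES] for p in products], dtype=float)
    V = X @ beta                          # systematic utility of each product
    expV = np.exp(V - V.max())
    return expV / expV.sum()              # logit choice probabilities

ours = {"price": 99.99, "flight_time": 7, "gps": 1}
rival = {"price": 89.99, "flight_time": 7, "gps": 0}

base = take_rates([ours, rival])
# Competitive scenario: the rival counters by adding GPS
counter = take_rates([ours, {**rival, "gps": 1}])
# Our response: extend flight time to 9 minutes
response = take_rates([{**ours, "flight_time": 9}, {**rival, "gps": 1}])
print(base.round(3), counter.round(3), response.round(3))
```

In practice the settings would be driven by a spreadsheet or user interface, but each scenario reduces to this one logit calculation.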

**Pricing and willingness-to-pay analysis**

One attribute included in the choice study must certainly be the price, both for the new product and the competition’s product. Marketing and pricing management could then use the simulator to test different price points, along with the settings of the other attributes, to determine the best price point for the product’s launch.
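Testing price points amounts to holding the other attributes fixed and varying the price setting. The Python sketch below uses hypothetical part-worths for price and an integrated camera, one competitor priced at $99.99, and the three price levels used in the drone case study. It reports the take rate and the expected revenue per addressable customer (price times take rate) at each price point.

```python
import numpy as np

# Hypothetical part-worths: utility per dollar of price and for a 0/1 camera dummy.
BETA_PRICE, BETA_CAMERA = -0.05, 0.80
PRICE_POINTS = [79.99, 99.99, 119.99]   # price levels used in the drone case study

def our_share(our_price, rival_price=99.99, our_camera=1, rival_camera=1):
    """Take rate for our product against one competitor at the given settings."""
    V = np.array([
        BETA_PRICE * our_price + BETA_CAMERA * our_camera,
        BETA_PRICE * rival_price + BETA_CAMERA * rival_camera,
    ])
    expV = np.exp(V - V.max())
    return float(expV[0] / expV.sum())

results = {price: our_share(price) for price in PRICE_POINTS}
for price, share in results.items():
    # share * price is expected revenue per addressable customer
    print(f"${price:.2f}: share={share:.3f}, revenue per customer=${share * price:.2f}")
```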

There is more, however, that could be gained by including prices in a choice study. The willingness-to-pay (*WTP*) for different settings of the attributes (excluding price, of course) can be estimated. The estimated *WTPs* show the value consumers attach to each attribute level of the product. Product managers could then determine which attributes and levels should receive the most emphasis in the final design: those the consumers are willing to pay the most to get.

Since the part-worth utilities show the importance of each attribute and its levels, it stands to reason that the *WTP* should be a function of these part-worths. And it is. The actual formulation depends on the coding used for the discrete attributes. Without loss of generality, assume the price is designated as attribute 1, so $\beta_1$ is its part-worth. You should expect $\beta_1 < 0$: the higher the price, the lower the utility received. If dummy coding is used, it can be shown that the willingness-to-pay for level $m$ of attribute $k$ is given by

$$
WTP_{km} = -\frac{\beta_{km}}{\beta_1}.
$$

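The negative sign offsets the negative value of $\beta_1$, so $WTP_{km}$ has the same sign as $\beta_{km}$ and is positive for any level that adds utility. The *WTPs* are just the part-worths rescaled by the price part-worth. If effects coding is used, then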

A full discussion of these results and their derivations is in Paczkowski [2018].
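For dummy coding, the calculation $WTP_{km} = -\beta_{km} / \beta_1$ is a one-line rescaling. The Python sketch below applies it to hypothetical part-worths; the level names are illustrative, and each non-price level is measured against its attribute's omitted base level.

```python
# Hypothetical dummy-coded part-worths; "price" is beta_1, the utility per dollar.
# Each non-price level is measured against its attribute's omitted base level.
part_worths = {
    "price": -0.05,
    "gps_yes": 0.60,
    "camera_yes": 0.90,
    "flight_time_9min": 0.45,
}

def willingness_to_pay(part_worths, price_key="price"):
    """Dummy-coding WTP in dollars: WTP_km = -beta_km / beta_1."""
    beta_price = part_worths[price_key]
    if beta_price >= 0:
        raise ValueError("expected a negative price part-worth")
    return {level: -beta / beta_price
            for level, beta in part_worths.items() if level != price_key}

wtp = willingness_to_pay(part_worths)
for level, dollars in wtp.items():
    print(f"{level}: ${dollars:.2f}")
```

With these illustrative numbers, a consumer would pay about $12 more for GPS and $18 more for a camera, relative to the base levels.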

**Volumetric estimation**

The estimated choice probabilities could be used to estimate total sales volume and expected revenue. To estimate volume, an estimate of the potential market size is needed. I call this the *addressable market:* the number of customers in the segments targeted for the new product. If $N$ is the size of the addressable market and $\Pr(j)$ is the estimated choice probability for product $j$, that is, the estimated market share for product $j$, then the expected sales volume is simply $N \times \Pr(j)$. Given the price point used to determine the choice probability, the expected revenue is $N \times \Pr(j) \times P_j$, where $P_j$ is the price for product $j$. Paczkowski [2018] outlines this estimation procedure in more detail.
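The volumetric arithmetic, expected volume $N \times \Pr(j)$ and expected revenue $N \times \Pr(j) \times P_j$, is sketched below in Python. The market size, take rate, and price are hypothetical inputs, not figures from any study.

```python
def volume_and_revenue(market_size, take_rate, price):
    """Expected volume N * Pr(j) and expected revenue N * Pr(j) * P_j for product j."""
    volume = market_size * take_rate
    revenue = volume * price
    return volume, revenue

# Hypothetical inputs: an addressable market of 250,000 hobbyists,
# an estimated take rate of 22%, and a $99.99 price point.
volume, revenue = volume_and_revenue(250_000, 0.22, 99.99)
print(f"expected volume: {volume:,.0f} units; expected revenue: ${revenue:,.2f}")
```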

112 **Deep Data Analytics for New Product Development**

*Software*

The commercial software products that can handle discrete choice problems include JMP, SAS, Stata, and Limdep. JMP is especially good because it has platforms to handle all aspects of design, estimation, and reporting. The open source software R will also handle these problems but this will require some programming.

*Case study*

As a case study, consider an example used in Paczkowski [2018] of a drone manufacturer, FlyDrone Inc., that developed a new drone model targeted to hobbyists. FlyDrone’s marketing team identified five attributes and their levels, plus a brand attribute covering FlyDrone and two competitive brands, Birdie Inc. and SkyFly Inc., for a total of six attributes. The attributes and their levels are:

- Price: \$79.99, \$99.99, \$119.99
- Maximum Flight Time (in minutes): 5, 7, 9
- Integrated GPS: Yes, No
- Integrated Camera: Yes, No
- Video Resolution: 640x480, 720x576, 1280x720
- Brand: FlyDrone Inc, Birdie Inc, SkyFly Inc.

A panel of hobbyists was recruited to complete a survey of their drone use: how many drones they own; when they started using drones as a hobby; how much they paid for their drone; how often they fly it; and any plans to buy another one and at what price range.

As part of the survey, they were asked to complete a choice exercise for the new drone product. Each hobbyist was shown 12 choice sets, each set consisting of two alternative products identified by the six attributes listed above. In addition to the two products, a Neither option was included, so there were really three options in each choice set. The Neither option lets the hobbyist decline both products if neither is satisfactory. For each choice set, the hobbyist had to select one product or the Neither option. An example choice “card” (which actually appeared on a computer monitor) is shown in Figure 4.1.

A choice model was estimated using a maximum likelihood procedure. The dependent variable was the choice (including the Neither) for each choice set, for each hobbyist. The independent variables were the six attributes, each one effects coded. The estimated part-worth utilities are shown in Figure 4.2.
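Effects coding, as used for the drone attributes, gives each non-base level its own column and codes the base level as -1 in every column. The Python sketch below applies it to the three brand levels; the choice of the last level as base and the part-worth values passed to `all_level_part_worths` are hypothetical. Because the base level's part-worth is minus the sum of the others, all levels of an effects-coded attribute sum to zero.

```python
import numpy as np

def effects_code(values, levels):
    """Effects-code a categorical attribute.

    One column per non-base level; the last level in `levels` is the base
    and is coded -1 in every column.
    """
    base, columns = levels[-1], levels[:-1]
    coded = np.zeros((len(values), len(columns)), dtype=int)
    for row, value in enumerate(values):
        if value == base:
            coded[row, :] = -1
        else:
            coded[row, columns.index(value)] = 1   # raises ValueError for an unknown level
    return coded

def all_level_part_worths(estimated):
    """Append the base level's part-worth: effects-coded part-worths sum to zero."""
    return list(estimated) + [-sum(estimated)]

brand_levels = ["FlyDrone Inc", "Birdie Inc", "SkyFly Inc."]
shown = ["Birdie Inc", "SkyFly Inc.", "FlyDrone Inc"]
print(effects_code(shown, brand_levels))
print(all_level_part_worths([0.3, -0.1]))   # hypothetical brand estimates
```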

Take rates for different settings of the independent variables can be calculated in a specially written program, a spreadsheet, or with a simple hand-held calculator. The *WTP* calculations for the drone study are shown in Figure 4.3. See Paczkowski [2018] for an interpretation of these results.
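A specially written program for take rates on a card with a Neither option can be as short as the Python sketch below. It assumes, hypothetically, that Neither has a single systematic utility normalized to 0, and it uses made-up utilities for the two drones on the card rather than the estimates in Figure 4.2.

```python
import numpy as np

def take_rates_with_neither(V_products, v_neither=0.0):
    """Take rates for the products on one choice card plus a Neither option.

    Assumes (hypothetically) that Neither's systematic utility is a single
    value, normalized here to 0.
    """
    V = np.append(np.asarray(V_products, dtype=float), v_neither)
    expV = np.exp(V - V.max())
    return expV / expV.sum()

# Hypothetical systematic utilities for the two drones on one card
rates = take_rates_with_neither([0.7, 0.2])
print(dict(zip(["Product A", "Product B", "Neither"], rates.round(3))))
```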

**FIGURE 4.1** An example of a choice card presented to the drone hobbyists. Each card had two alternative products and a *Neither* option. Each hobbyist had to select one of the products or Neither. See Paczkowski [2018]. Permission to reproduce from Routledge.

**FIGURE 4.2** Estimated choice parameters for the drone example. See Paczkowski [2018]. Permission to reproduce from Routledge.

**FIGURE 4.3** The *WTP* was calculated for each attribute for the drone example. See Paczkowski [2018]. Permission to reproduce from Routledge.