Decision Theory

Franco Taroni, Silvia Bozza, and Alex Biedermann

Introduction

Forensic scientists, lawyers and other participants in the legal process are routinely faced with problems of making decisions under circumstances of uncertainty. Uncertainty relates to propositions of interest that are not completely known by the decision-maker at the time when a decision needs to be made. Propositions may relate to the source or nature of forensic traces, marks and objects. For example, with friction ridge marks, propositions of interest may be 'Does this fingermark come from the person of interest (POI) or from some unknown person?'. In forensic document examination, a scientist may ask 'Is this a genuine document or has it been modified (e.g., page substitution)?'. In forensic anthropology the question 'Are these human remains?' may arise, and so on. Replying in one way or another to such questions may be perceived as uncomfortable since knowledge about the relevant underlying truth-state of the world is incomplete to some extent. For example, in typical real-world applications of forensic science it is not known with certainty, when deciding to consider a POI as the source of a particular fingermark, whether the POI is in fact the source of the fingermark. Similarly, at an advanced stage of the legal process, the decision of whether to convict or acquit a POI (i.e., the verdict) needs to be made in the presence of incomplete knowledge about whether or not the POI truly is the offender. There are analogies between the above questions, in terms of their logical underpinnings, that can be studied, analysed and described using formal methods such as decision theory; such formal study is the main aim of this chapter.

Around the middle of the past century, discussions on decision-making intensified and several fields of study emerged, concerning, for example, contexts where decisions have monetary consequences. These developments gravitated around questions such as how decisions should be made in order to be considered rational (Pratt et al., 1964). Though an important area, economics was not the only branch with strong interests in decision-making and decision analysis. Entire fields of study developed and interacted with each other in various ways, including psychology, mathematics and statistics, the law and philosophy of science, among others.

This chapter will primarily rely on statistical decision theory[1] as developed by Savage (1954) and in subsequent treatises (e.g., Lindley, 1985; Luce and Raiffa, 1958; Raiffa, 1968) as the framework for studying the formal structure of decision problems arising in forensic science and the law. Before proceeding with this presentation, an important preliminary needs to be considered. It deals with the question of how to understand decision analysis and the notion of a theory of decision. On this point, the field of judgment and decision making, a branch of applied psychology, has contributed considerably by crystallizing three main perspectives and approaches, known as the descriptive, the normative and the prescriptive view (Baron, 2008; French et al., 2009). For a review of the history of these terms, see Baron (2006). Broadly speaking, the descriptive approach focuses on people's observable decision behaviour and extends to the development of psychological theories intended to explain how individuals make decisions. Such research is valuable in that it allows one to better understand the conditions under which decision behaviour departs towards incoherence or, worse, logical error. However, revealing such departures requires reference points against which observable decision behaviour can be compared. The provision of such reference points, also called normative standards, is the object of study of the normative approach. Decision theory and decision criteria (or, norms) derived from it fall into this category of study. It is mainly pursued by mathematicians, statisticians and philosophers of science. The third perspective, the prescriptive approach, addresses the question of what recommendations ought to be derived from normative insights in order to improve practical decision making. For example, some strict normativists, such as Lindley (1985), consider that the normative concept of probability - that is, a standard for reasoning under uncertainty - and decision theory as its extension, are also prescriptive in the sense that they provide direct prescriptions on how to arrange one's reasoning and acting. Properly distinguishing the different intentions and goals of these kinds of decision science research is important for an informed discourse about notions of decision and decision analysis in forensic science applications (Biedermann et al., 2014).

This chapter is structured as follows. Section 5.2 outlines standard elements of statistical decision theory that will be exemplified in Section 5.3 for decision problems arising in the law in general (Section 5.3.1) and forensic science in particular (Section 5.3.2). This exposition will include examples such as decisions following forensic inference of source (i.e., identification/individualization; Section 5.3.2.1). Discussion and conclusions will be presented in Section 5.4. Further readings on applications of decision theory in forensic science and treatments of decision theory in general are given in Section 5.5.

Concepts of Statistical Decision Theory

Preliminaries: Basic Elements of Decision Problems

Decision theory is a mathematical theory of how to make decisions when there is uncertainty about the true state of nature. The presence of uncertainty implies that a choice among the alternative courses of action leads to uncertainty regarding which consequences will effectively take place. In statistical terms, the states of nature may also be referred to as parameters, commonly denoted by θ, which may be discrete or continuous. The collection of all possible states of nature is denoted by Θ, the parameter space, and represents a first element of the formalization of decision problems. A second basic element is the feasible decisions (or courses of action), denoted by d. The space of all decisions, called the decision space, is denoted by D. The third basic element is the consequences c. They are defined as the outcome following the combination of a decision d taken when the actual state of nature is θ, formally written c(d, θ). The space of all consequences is denoted by C. Before proceeding, in Section 5.2.2, with the presentation of a formal approach to qualifying and quantifying the relative merit of rival courses of action, given the basic elements of the decision problem, it is useful to devote a few more comments to the description of the decision space and the parameter space.

Regarding the decision space, it is important for the decision maker to draw up an exhaustive list of m decisions that are available, say d₁, d₂, ..., dₘ ∈ D. As noted by Lindley: "(...) it would not be a properly defined decision problem in which the only decision was whether to go to the cinema, because if the decision were not made (that is, one did not go to the cinema) one would have to decide whether to stay at home and read, or go to the public-house, or indulge in other activities. All the possible decisions, or actions, must be included (...)" (Lindley, 1965, p. 63). Further, it is convenient to make the requirement of exclusivity, meaning that only one of the decisions can be selected. As noted by Lindley: "Hence, the decisions are both exclusive and exhaustive: one of them has to be taken, and at most one of them can be taken" (Lindley, 1985, p. 6).

The second task for a decision maker is to draw up a list of n exclusive and exhaustive events or states of nature, say Θ = {θ₁, θ₂, ..., θₙ}. Regarding the latter list, the decision maker may distinguish between situations of certainty and uncertainty. In the former case, certainty, the decision maker has complete knowledge about the states of nature. Hence, each alternative course of action leads to one and only one foreseeable consequence, and a choice among alternatives is equivalent to a choice among related consequences. In the latter case, uncertainty, the decision maker does not know which state of nature actually holds, or what the future will be. Consequently, each available course of action will have one of several consequences. It is possible, however, to measure uncertainty about the states of nature using a suitable probability distribution Pr over Θ. Note that in some fields, such as business decision analysis and operations research, this situation is called 'decision making under risk' and the expression 'decision making under uncertainty' is reserved for situations in which the decision maker is unable to provide a list of all possible outcomes and/or a probability distribution for the various outcomes. In this chapter, however, this interpretation will not be pursued.
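To fix ideas, the short Python sketch below represents these basic elements for a toy problem: a list of exclusive and exhaustive states of nature, a list of available decisions, a consequence function c(d, θ), and a probability distribution over the states. The particular states, decisions and probability values are illustrative assumptions, not taken from the chapter.

states = ["theta1", "theta2"]      # parameter space: exclusive, exhaustive states of nature
decisions = ["d1", "d2"]           # decision space: the available courses of action

def consequence(d, theta):
    """Consequence c(d, theta) of taking decision d when state theta holds."""
    return f"C({d},{theta})"       # here just a label; in practice a concrete outcome

# Uncertainty about the states of nature, expressed as a probability
# distribution over the parameter space (values are assumptions).
prob = {"theta1": 0.7, "theta2": 0.3}
assert abs(sum(prob.values()) - 1.0) < 1e-9   # the states are exhaustive

for d in decisions:
    for theta in states:
        print(d, theta, consequence(d, theta), prob[theta])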

Utility Theory

The principal issue in decision making under uncertainty is the selection of a member from the list of available decisions without knowing which state of nature is truly the case. The aim, therefore, is to create a framework that allows decision makers to assess the consequences of alternative courses of action in order to compare them and avoid irrational choices or behaviour.

The formulation of such a decision framework involves, first, the assumption that the decision maker can express preferences amongst possible consequences. It is in fact assumed that the space of consequences has a partial pre-ordering, denoted by ≼, meaning that the decision maker must be able to specify, at any point, which of two consequences is preferable or whether they are equivalent (Piccinato, 1996). When comparing any pair of consequences (c₁, c₂) ∈ C, c₁ ≺ c₂ indicates that the consequence c₂ is strictly preferred to consequence c₁, c₁ ~ c₂ indicates that c₁ and c₂ are equivalent (or equally preferred), while c₁ ≼ c₂ indicates that c₁ is not preferred to c₂, that is, either c₁ ≺ c₂ or c₁ ~ c₂ holds. The measurement of preferences among decision outcomes is operated by a function, called a utility function, denoted by U(·), that associates a utility value U(d, θ) to each one of the possible consequences c(d, θ), also denoted U(c); it specifies the desirability of each consequence on some numerical scale.

Second, the decision maker's uncertainty about the states of nature, when they are discrete, is expressed in terms of a probability mass function Pr(θ | I), where I denotes the relevant information available at the time when the probability assessment is made. Combining the utilities U(d, θ) for decision consequences and the probabilities for states of nature leads to a measure of the desirability of alternative courses of action d in terms of their expected utility (EU)*:

    EU(d) = Σ_{θ∈Θ} U(d, θ) Pr(θ | I).

* The same idea can be applied when θ is continuous and takes values in Θc, θ ∈ Θc. The probability mass function Pr(θ | I) is then replaced by a probability density function f(θ | I) and the expected utility of decision d is EU(d) = ∫_{Θc} U(d, θ) f(θ | I) dθ.

A standard decision rule, based on EU, instructs one to select the action with the maximum expected utility (see also Section 5.2.3). Hereafter, information I will be omitted to simplify the notation, though it is important to keep in mind that it conditions all probability assignments.
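As a minimal illustration of this rule, the following Python sketch computes EU(d) over a discrete parameter space and selects the decision with maximum expected utility. The utility values and state probabilities are invented for the example.

# Minimal sketch of expected utility and the MEU rule for discrete states.
# Utility values U(d, theta) and probabilities Pr(theta) are assumptions.

utility = {
    ("d1", "theta1"): 1.0, ("d1", "theta2"): 0.2,
    ("d2", "theta1"): 0.4, ("d2", "theta2"): 0.9,
}
prob = {"theta1": 0.7, "theta2": 0.3}

def expected_utility(d):
    """EU(d) = sum over theta of U(d, theta) * Pr(theta)."""
    return sum(utility[(d, theta)] * p for theta, p in prob.items())

decisions = ["d1", "d2"]
eu = {d: expected_utility(d) for d in decisions}
best = max(decisions, key=eu.get)   # maximise expected utility (MEU)
print(eu, "->", best)               # here EU(d1) = 0.76 > EU(d2) = 0.55, so d1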

Some further conditions (axioms) must be imposed on the preference system in order for there to exist a function U, the utility function, such that for any pair (c₁, c₂) ∈ C, the relationship c₁ ≼ c₂ holds if and only if U(c₁) ≤ U(c₂).

A.1 The first axiom requires that the preference system is complete. This amounts to assuming that for any pair of consequences (c₁, c₂) of the space of consequences C, it must always be possible to express a preference or indifference among them (one of the following relations must hold: c₁ ≺ c₂, c₂ ≺ c₁, or c₁ ~ c₂).

A.2 The second axiom requires that the preference system is transitive. This means that for any (c₁, c₂, c₃) ∈ C, if one prefers c₂ to c₁ (c₁ ≺ c₂) and c₃ to c₂ (c₂ ≺ c₃), then one prefers c₃ to c₁ (c₁ ≺ c₃). In the same way, if one is indifferent between c₁ and c₂ (c₁ ~ c₂), and is indifferent between c₂ and c₃ (c₂ ~ c₃), then one is indifferent between c₁ and c₃ (c₁ ~ c₃). Not all the consequences are equivalent to each other; that is, for at least one pair of consequences (c₁, c₂), either c₁ ≺ c₂ or c₂ ≺ c₁ holds.

A.3 The third axiom requires that the ordering of preferences is invariant with respect to compound gambles. For any pair of consequences (c₁, c₂) ∈ C such that c₁ ≼ c₂, then, for any other consequence c₃ ∈ C and any probability α, the gamble that offers probability α of winning c₂ and probability (1 - α) of winning c₃ is preferred (or equivalent) to the gamble that offers probability α of winning c₁ and probability (1 - α) of winning c₃. Denote by (cᵢ, cⱼ; α, 1 - α) the gamble offering cᵢ with probability α, and cⱼ with probability (1 - α), i ≠ j. This axiom can then be formulated as follows: c₁ ≼ c₂ if and only if (c₁, c₃; α, 1 - α) ≼ (c₂, c₃; α, 1 - α), for any α ∈ [0, 1] and any c₃ ∈ C.

A.4 The fourth axiom requires that there are no (i) infinitely desirable or (ii) infinitely undesirable consequences. Let (c₁, c₂, c₃) ∈ C be any three consequences such that c₁ is preferred to c₂ and c₂ is preferred to c₃ (c₃ ≺ c₂ ≺ c₁). Then there exist probabilities α and β such that (i) c₂ is preferred to the gamble (c₁, c₃; α, 1 - α); (ii) the gamble (c₁, c₃; β, 1 - β) is preferred to c₂.

If (i) does not hold, then one will always prefer the possibility of obtaining the best consequence c₁, no matter how small the probability of obtaining it, to c₂; that is, one believes that c₁ is infinitely better than c₂ (and c₃). If (ii) does not hold, then one will always prefer c₂, no matter how small the probability of obtaining the worst consequence c₃ is; that is, one believes that c₃ is infinitely worse than c₂ (and c₁).

If these four conditions are satisfied, then one can prove the expected utility theorem, according to which there exists a function U on the space of consequences C such that for any dⱼ and dₖ belonging to the decision space D, dⱼ is preferred (or equivalent) to dₖ if and only if the expected utility of dⱼ, EU(dⱼ), is greater than (or equal to) the expected utility of dₖ, EU(dₖ), that is, assuming θ discrete, if

    Σ_{θ∈Θ} U(dⱼ, θ) Pr(θ) ≥ Σ_{θ∈Θ} U(dₖ, θ) Pr(θ).

Consider, next, any consequence c and a pair (c₁, c₂) ∈ C such that c₁ ≺ c₂ and c₁ ≼ c ≼ c₂. Following the stated conditions, it may be proved (see De Groot, 1970) that there exists a unique number α ∈ [0, 1] such that

    c ~ (c₁, c₂; α, 1 - α),   (5.1)

and that

    U(c) = α U(c₁) + (1 - α) U(c₂).   (5.2)

It can also be proved that the utility function is invariant under linear transformations. This means that if U(c) is a utility function, then for any a > 0, aU(c) + b is also a utility function preserving the same pattern of preferences.
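A small numerical check of this invariance property is sketched below, using the same assumed utility and probability values as before: rescaling the utilities as a·U + b with a > 0 leaves the ranking of decisions by expected utility unchanged.

# Sketch: a positive linear transformation a*U + b (a > 0) leaves the ranking
# of decisions by expected utility unchanged. Values are assumptions.

utility = {
    ("d1", "theta1"): 1.0, ("d1", "theta2"): 0.2,
    ("d2", "theta1"): 0.4, ("d2", "theta2"): 0.9,
}
prob = {"theta1": 0.7, "theta2": 0.3}

def eu(d, u):
    return sum(u[(d, t)] * p for t, p in prob.items())

a, b = 5.0, 2.0                               # any a > 0 and any b
rescaled = {k: a * v + b for k, v in utility.items()}

order_original = sorted(["d1", "d2"], key=lambda d: eu(d, utility))
order_rescaled = sorted(["d1", "d2"], key=lambda d: eu(d, rescaled))
assert order_original == order_rescaled       # same pattern of preferences
print(order_original)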

Utility functions can be constructed in different ways. One possibility starts with a pair of non-equivalent consequences (c₁, c₂) ∈ C and assigns them a utility value. This will fix the origin and the scale of the utility function. The desirability of each consequence c ∈ C of interest will then be compared with those of c₁ and c₂. Given that utility functions are invariant under linear transformation, the choice of c₁ and c₂, and the choice of the scale of the utility, are not relevant. They are, however, generally identified with the worst and the best consequence, respectively. It is assumed, for example, that the utility of the worst consequence is zero, U(c₁) = 0, and the utility of the best consequence is one, U(c₂) = 1. The utilities of the remaining intermediate consequences are computed using Equation (5.2). This will be discussed further in Section 5.3.
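The following sketch illustrates this construction under the zero-one convention just described: elicited indifference probabilities α (hypothetical values) are converted into utilities through Equation (5.2).

# Sketch of utility construction on a zero-one scale (Equation (5.2)).
# alpha[c] is the elicited probability for which the decision maker is
# indifferent between consequence c and the gamble (c1, c2; alpha, 1 - alpha),
# i.e. worst consequence c1 with probability alpha, best consequence c2
# with probability 1 - alpha. The alpha values below are assumptions.

U_c1, U_c2 = 0.0, 1.0          # utilities anchoring the scale (worst, best)

alpha = {"c_intermediate_A": 0.4, "c_intermediate_B": 0.85}

def utility_from_alpha(a):
    """Equation (5.2): U(c) = alpha * U(c1) + (1 - alpha) * U(c2)."""
    return a * U_c1 + (1 - a) * U_c2      # equals 1 - alpha on the zero-one scale

utilities = {c: utility_from_alpha(a) for c, a in alpha.items()}
print(utilities)   # -> A: 0.6, B: 0.15 (up to floating-point rounding)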

Implications of the Expected Utility Maximisation Principle

Consider taking a decision d when the true state of nature is θ, so that the consequence is c(d, θ). It is possible to show, using relation (5.1), that there exists some α such that the consequence c(d, θ) is equivalent to a hypothetical gamble offering the worst consequence c₁ with probability α and the best consequence c₂ with probability (1 - α):

    c(d, θ) ~ (c₁, c₂; α, 1 - α).

The utility U(d, θ) of the consequence c(d, θ) can then be calculated using Equation (5.2) as follows:

    U(d, θ) = α U(c₁) + (1 - α) U(c₂) = 1 - α,

taking, as above, U(c₁) = 0 and U(c₂) = 1.

According to this, for any d and any θ, selecting decision d is equivalent to assigning a probability U(d, θ) = 1 - α to the occurrence of the most favorable consequence. This hypothetical gamble can always be played. It can be played, in particular, after decision d has been taken and it is known which state of nature θ holds. The term U(d, θ) can be understood as the conditional probability of obtaining the consequence c₂, given that decision d has been taken and the state of nature θ occurred: Pr(c₂ | d, θ) = U(d, θ). Note that the probability Pr(c₂ | d) can be written in extended form as

    Pr(c₂ | d) = Σ_{θ∈Θ} Pr(c₂ | d, θ) Pr(θ).   (5.3)

Therefore, (5.3) can be rewritten as

    Pr(c₂ | d) = Σ_{θ∈Θ} U(d, θ) Pr(θ) = EU(d),

namely, the expected utility, which quantifies the probability of obtaining the best consequence once decision d is taken (Lindley, 1985). The decision rule which instructs decision makers to select the decision that maximizes the expected utility (MEU criterion) is optimal because the selected decision is the one with the highest probability of leading to the most favorable consequence.
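The interpretation of U(d, θ) as Pr(c₂ | d, θ) can be illustrated by simulating the hypothetical gamble: the sketch below draws a state of nature, then yields the best consequence with probability U(d, θ), and compares the relative frequency of the best consequence with EU(d). All numerical values are assumptions chosen for the example.

# Sketch: simulating the hypothetical gamble behind the MEU rule. A state theta
# is drawn with Pr(theta), then the best consequence c2 is obtained with
# probability U(d, theta) (read as Pr(c2 | d, theta)). The relative frequency
# of c2 approximates EU(d). Utilities and probabilities are assumptions.

import random

utility = {("d1", "theta1"): 0.95, ("d1", "theta2"): 0.10,
           ("d2", "theta1"): 0.30, ("d2", "theta2"): 0.80}
prob = {"theta1": 0.6, "theta2": 0.4}

def eu(d):
    return sum(utility[(d, t)] * p for t, p in prob.items())

def simulate_best_consequence_rate(d, n=100_000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        theta = "theta1" if rng.random() < prob["theta1"] else "theta2"
        if rng.random() < utility[(d, theta)]:   # obtain c2 with prob U(d, theta)
            hits += 1
    return hits / n

for d in ("d1", "d2"):
    print(d, round(eu(d), 3), round(simulate_best_consequence_rate(d), 3))
# The decision with the higher EU also has the higher chance of yielding c2.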

The Loss Function

An alternative way to express preferences among decision consequences c(d, θ) is the use of non-negative loss functions. When a utility function is available, the loss function can be derived as follows (Lindley, 1985):

    L(d, θ) = max_{d′∈D} U(d′, θ) - U(d, θ).   (5.5)

The loss L(d, θ) for a given consequence c(d, θ) is thus defined as the difference between the utility of the best consequence under the state of nature at hand and the utility for the consequence of interest. That is, the loss measures the penalty for choosing a non-optimal action, also called opportunity loss (Press, 1989, pp. 26-27): the difference between the utility of the best consequence that could have been obtained and the utility of the actual one received.

Note that, following Equation (5.5), losses cannot, by definition, be negative, because U(d, θ) will be smaller than, or at best equal to, max_{d′∈D} U(d′, θ). The expected loss, EL(d), thus characterises the undesirability of each possible decision, and can be quantified as follows:

    EL(d) = Σ_{θ∈Θ} L(d, θ) Pr(θ).

When using losses instead of utilities, the decision rule of maximising expected utility becomes the rule instructing the selection of the decision that minimizes the expected loss EL(d). It might be objected that assuming a non-negative loss function is too restrictive. Note, however, that the loss function represents the error due to a non-optimal choice. It thus makes sense to consider that even the most favorable decision will induce at best a zero loss.
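A compact numerical illustration of this equivalence is sketched below: losses are derived from assumed utilities via Equation (5.5), and the decision minimising expected loss coincides with the decision maximising expected utility.

# Sketch: deriving the loss function from a utility function (Equation (5.5))
# and checking that minimising expected loss selects the same decision as
# maximising expected utility. Utilities and probabilities are assumptions.

utility = {("d1", "theta1"): 1.0, ("d1", "theta2"): 0.2,
           ("d2", "theta1"): 0.4, ("d2", "theta2"): 0.9}
prob = {"theta1": 0.7, "theta2": 0.3}
decisions, states = ("d1", "d2"), ("theta1", "theta2")

# L(d, theta) = max over d' of U(d', theta) - U(d, theta)  (opportunity loss)
loss = {(d, t): max(utility[(dd, t)] for dd in decisions) - utility[(d, t)]
        for d in decisions for t in states}

def eu(d):
    return sum(utility[(d, t)] * prob[t] for t in states)

def el(d):
    return sum(loss[(d, t)] * prob[t] for t in states)

best_by_eu = max(decisions, key=eu)
best_by_el = min(decisions, key=el)
assert best_by_eu == best_by_el
print({d: (round(eu(d), 3), round(el(d), 3)) for d in decisions}, "->", best_by_eu)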

Particular Forms of the Expected Utility Maximisation Principle

For the remainder of this chapter, it will be important to anticipate two particular forms in which the MEU principle may be formulated. Consider, first, the utility-based perspective of a two-action decision problem involving two states of nature, θ₁ and θ₂. The decision maker's probabilities for these states of nature are Pr(θ₁ | ·) and Pr(θ₂ | ·), respectively, such that Pr(θ₁ | ·) + Pr(θ₂ | ·) = 1. Note that '| ·' is shorthand notation for the conditioning on any relevant evidence E or background information I. The two possible decisions are d₁ and d₂, representing the decision maker's acceptance of, respectively, θ₁ and θ₂ as the true state of nature. Hereafter, we write Cᵢⱼ to denote the consequence c(dᵢ, θⱼ) of taking decision dᵢ when θⱼ is the actual state of nature, and denote the corresponding utility by U(Cᵢⱼ). The decision problem is summarized in Table 5.1.

TABLE 5.1
A Simple Decision Matrix with Two Decisions d₁ and d₂, Two States of Nature θ₁ and θ₂, and Corresponding Decision Consequences Cᵢⱼ (for i, j = {1, 2})

                     States of Nature
Decisions        θ₁              θ₂
d₁               C₁₁             C₁₂
d₂               C₂₁             C₂₂

According to the principle of maximization of expected utility, the decision maker should select decision d₁ rather than d₂ if EU(d₁) > EU(d₂). This will be the case if

    U(C₁₁) Pr(θ₁ | ·) + U(C₁₂) Pr(θ₂ | ·) > U(C₂₁) Pr(θ₁ | ·) + U(C₂₂) Pr(θ₂ | ·),   (5.6)

which can be rearranged to give

    Pr(θ₁ | ·) / Pr(θ₂ | ·) > [U(C₂₂) - U(C₁₂)] / [U(C₁₁) - U(C₂₁)].   (5.7)

The term U(C₂₂) - U(C₁₂) in the numerator on the right-hand side of (5.7) is the additional utility involved in making the correct decision when θ₂ turns out to be the correct state of nature. An alternative way to look at this term is to consider it as the potential regret: it is the potential loss in utility when erroneously deciding d₁ instead of d₂. The term U(C₁₁) - U(C₂₁) similarly deals with the potential regret of deciding d₂ when the true state of nature is θ₁. Relation (5.7) thus states that decision d₁ should only be taken if the odds in favour of θ₁ are sufficient to outweigh any extra potential regret associated with incorrectly deciding d₁ (Spiegelhalter et al., 2004).
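The following sketch checks, for one set of assumed utilities and probabilities, that the direct comparison of expected utilities and the odds-versus-regret-ratio form in (5.7) give the same answer.

# Sketch of the two-action criterion: compare the posterior odds for theta1
# with the regret ratio in (5.7). Utility values and the probability for
# theta1 are illustrative assumptions.

U = {"C11": 1.0, "C12": 0.0, "C21": 0.3, "C22": 1.0}
p_theta1 = 0.6                      # Pr(theta1 | .), assumed
p_theta2 = 1.0 - p_theta1

eu_d1 = U["C11"] * p_theta1 + U["C12"] * p_theta2
eu_d2 = U["C21"] * p_theta1 + U["C22"] * p_theta2

odds = p_theta1 / p_theta2
regret_ratio = (U["C22"] - U["C12"]) / (U["C11"] - U["C21"])   # right-hand side of (5.7)
# (the rearrangement leading to (5.7) assumes U(C11) > U(C21), as here)

# The two formulations agree: EU(d1) > EU(d2) exactly when odds > regret ratio.
assert (eu_d1 > eu_d2) == (odds > regret_ratio)
print(eu_d1, eu_d2, odds, regret_ratio)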

Consider now the loss-based account. Recall, from Section 5.2.4, that the loss L(dᵢ, θⱼ) = L(Cᵢⱼ) for a decision consequence Cᵢⱼ is the difference between the utility of the outcome of the best decision under the state of nature at hand, and the utility of the outcome of the actual decision dᵢ under the same state of nature. Therefore, the decision that minimizes the expected loss is the same as the decision that maximizes the expected utility. Continuing the example introduced above, assume that there is a positive loss incurred when falsely choosing a proposition that is not actually the case, that is L(Cᵢⱼ) > 0 if i ≠ j, and that there is no loss when accepting a proposition that is actually the case, that is L(Cᵢⱼ) = 0 if i = j. The loss can be symmetric, L(Cᵢⱼ) = L(Cⱼᵢ), or asymmetric, L(Cᵢⱼ) ≠ L(Cⱼᵢ), i ≠ j. The decision criterion depicted in Equation (5.6) will then become to select d₁ rather than d₂ if EL(d₁) < EL(d₂), that is if

    L(C₁₁) Pr(θ₁ | ·) + L(C₁₂) Pr(θ₂ | ·) < L(C₂₁) Pr(θ₁ | ·) + L(C₂₂) Pr(θ₂ | ·),

and the expected loss of deciding dᵢ will be:

    EL(dᵢ) = L(Cᵢⱼ) Pr(θⱼ | ·),   i ≠ j.

Considering the principle of minimizing expected loss and given that

    EL(d₁) < EL(d₂) if and only if L(C₁₂) Pr(θ₂ | ·) < L(C₂₁) Pr(θ₁ | ·),

the decision problem involves a comparison of odds with the ratio of losses associated with erroneous decisions. Specifically, deciding d₁ rather than d₂ is optimal if and only if:

    Pr(θ₁ | ·) / Pr(θ₂ | ·) > L(C₁₂) / L(C₂₁),   (5.9)

or, equivalently,

    Pr(θ₁ | ·) / (1 - Pr(θ₁ | ·)) > L(C₁₂) / L(C₂₁).   (5.10)

The loss ratio on the right-hand side in Equations (5.9) and (5.10) fixes a threshold for the odds. Relation (5.9) specifies that if the odds in favour of θ₁ exceed the loss incurred from incorrectly choosing decision d₁ divided by the loss incurred from incorrectly choosing decision d₂, then the decision maker should take decision d₁.
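A minimal sketch of this criterion, with assumed loss values and an assumed posterior probability, is given below; it simply compares the posterior odds with the loss ratio in (5.9).

# Sketch of the loss-based criterion (5.9): decide d1 when the posterior odds
# for theta1 exceed the ratio of losses for the two possible errors.
# Loss values and the posterior probability are illustrative assumptions.

loss_d1_wrong = 10.0    # L(C12): loss of deciding d1 when theta2 is true
loss_d2_wrong = 1.0     # L(C21): loss of deciding d2 when theta1 is true
p_theta1 = 0.95         # Pr(theta1 | .), assumed

posterior_odds = p_theta1 / (1.0 - p_theta1)
threshold = loss_d1_wrong / loss_d2_wrong      # right-hand side of (5.9)

decision = "d1" if posterior_odds > threshold else "d2"
print(posterior_odds, threshold, "->", decision)
# With these numbers: odds = 19 > 10, so d1 is the expected-loss-minimising choice.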

Likelihood Ratios in the Decision Framework

So far it has been considered that the decision maker's probabilities for the states of nature are conditional probabilities, written Pr(θ₁ | ·) and Pr(θ₂ | ·), incorporating all relevant evidence E and background information I available at the time when the decision needs to be made. The odds in Equation (5.9) can therefore be interpreted as posterior odds. It is useful to emphasize that likelihood ratios, commonly used in forensic science for quantifying the value of forensic results (e.g., Aitken and Taroni, 2004), play an important role in the inference process preceding the decision. Recalling that the posterior odds can be written as the product of the prior odds and the likelihood ratio for the forensic results E, relation (5.9) can thus be rewritten as:

    [Pr(E | θ₁, I) / Pr(E | θ₂, I)] × [Pr(θ₁ | I) / Pr(θ₂ | I)] > L(C₁₂) / L(C₂₁).   (5.11)

Relation (5.11) defines the conditions under which decision d₁ is preferable to d₂, that is, when the ratio of losses on the right is smaller than the product on the left, containing the likelihood ratio. Thus, it is now possible to reformulate the decision criterion, minimizing expected loss (Section 5.2.5), with an emphasis on the likelihood ratio, as follows:

The decision d₁ is to be preferred to decision d₂ if the product of the likelihood ratio and the prior odds is larger than the ratio of the losses associated with adverse decision consequences.

A more intuitive form of (5.11) can be obtained when working with logarithms (e.g., Good, 1950):

    log [Pr(E | θ₁, I) / Pr(E | θ₂, I)] + log [Pr(θ₁ | I) / Pr(θ₂ | I)] > log [L(C₁₂) / L(C₂₁)].

By re-arranging the terms one can isolate the log-likelihood ratio as follows:

    log [Pr(E | θ₁, I) / Pr(E | θ₂, I)] > log [L(C₁₂) / L(C₂₁)] - log [Pr(θ₁ | I) / Pr(θ₂ | I)].

Note that following Good (1950), the logarithm of the likelihood ratio, the term on the left, is commonly referred to as the weight of evidence. The decision criterion minimizing expected loss (Section 5.2.5) thus becomes:

The decision d₁ is to be preferred to decision d₂ if and only if the weight of evidence is greater than the difference between the logarithm of the ratio of the losses associated with adverse consequences and the logarithm of the prior odds in favour of proposition θ₁.
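The sketch below puts the pieces together for assumed numerical values of the likelihood ratio, the prior odds and the loss ratio, and checks that the odds form and the logarithmic (weight of evidence) form of the criterion agree.

# Sketch combining a likelihood ratio with prior odds and a loss ratio,
# as in the criterion above. All numerical values are illustrative assumptions.

import math

likelihood_ratio = 10_000.0   # Pr(E | theta1, I) / Pr(E | theta2, I), assumed
prior_odds = 1 / 1_000.0      # Pr(theta1 | I) / Pr(theta2 | I), assumed
loss_ratio = 100.0            # L(C12) / L(C21), assumed (deciding d1 wrongly is worse)

posterior_odds = likelihood_ratio * prior_odds

# Decision rule (5.11): prefer d1 if LR * prior odds exceeds the loss ratio.
decide_d1 = posterior_odds > loss_ratio

# Equivalent log form: the weight of evidence log10(LR) must exceed
# log10(loss ratio) - log10(prior odds).
weight_of_evidence = math.log10(likelihood_ratio)
log_threshold = math.log10(loss_ratio) - math.log10(prior_odds)

assert decide_d1 == (weight_of_evidence > log_threshold)
print(posterior_odds, decide_d1, weight_of_evidence, log_threshold)
# Here posterior odds = 10 < 100, so d1 is not supported despite a large LR.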

  • [1] In later parts of this chapter, the discussion will be extended to the notion of Bayesian statistical decision theory, emphasizing the idea of using Bayesian inference procedures to inform decision makers, for example based on experimental information (Parmigiani, 2001).
 