# Bayesian Networks

## General Concepts

A Bayesian network consists of two components (Pearl, 1988): first, a directed acyclic graph (DAG) in which nodes represent stochastic domain variables and directed arcs represent conditional dependencies between the variables (see Definitions 1–3); second, a probability distribution for each node, conditioned on its parents in the directed acyclic graph (see Definition 4). Bayesian networks are powerful representation and visualization tools that enable users to conceptualize the associations between variables. However, as explained later, Bayesian networks can also be used for making predictions. To formalize these ideas, the following definitions are relevant:

**Definition 1.** *A* directed acyclic graph *(DAG) is a directed graph that contains no directed cycles.* ■

**Definition 2.** *A* directed graph G *can be defined as an ordered pair that consists of a finite set* V *of vertices or nodes and an adjacency relation* E *on* V. *The graph* G *is denoted as* (V, E). *For each* (a, b) ∈ E *(where* a *and* b *are nodes), there is a directed edge from node* a *to node* b. *In this representation,* a *is called a* parent *of* b *and* b *is called a* child *of* a. *In a graph, this is represented by an arrow which is drawn from node* a *to node* b. *For any* a ∈ V, (a, a) ∉ E, *which means that an arc cannot have a node as both its start and end point. Each node in a network corresponds to a particular variable of interest.* ■
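Definitions 1 and 2 can be sketched in code. Below is a minimal illustration, assuming a directed graph stored as an adjacency mapping `{node: [children]}` (the representation and node names are illustrative, not from the text): acyclicity is checked with Kahn's topological sort, since a directed graph is a DAG exactly when a complete topological order exists.

```python
from collections import deque

def is_dag(graph):
    """Return True if the directed graph {node: [children]} has no directed cycles."""
    # Count each node's in-degree (number of parents).
    indegree = {v: 0 for v in graph}
    for children in graph.values():
        for c in children:
            indegree[c] = indegree.get(c, 0) + 1
    # Repeatedly remove nodes without parents (Kahn's algorithm).
    queue = deque(v for v, d in indegree.items() if d == 0)
    visited = 0
    while queue:
        v = queue.popleft()
        visited += 1
        for c in graph.get(v, []):
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # All nodes removable <=> no directed cycle.
    return visited == len(indegree)

# Arcs as in Figure 7.1: three parent nodes pointing to 'mode choice'.
network = {
    "gender": ["mode choice"],
    "driving license": ["mode choice"],
    "number of cars": ["mode choice"],
    "mode choice": [],
}
print(is_dag(network))                   # True: directed and acyclic
print(is_dag({"a": ["b"], "b": ["a"]}))  # False: directed cycle a -> b -> a
print(is_dag({"a": ["a"]}))              # False: (a, a) is a self-loop
```

Note that the self-loop case is ruled out twice: Definition 2 forbids arcs (a, a) outright, and such an arc would also form a directed cycle of length one.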

**Definition 3.** *Edges in a Bayesian network represent direct conditional dependencies between the variables. The absence of edges between variables denotes statements of independence. We say that variables* B *and* C *are* independent *given a set of variables* A *if* P(b | c, a) = P(b | a) *for all values* a, b *and* c *of variables* A, B *and* C. *Variables* B *and* C *are also said to be* independent conditional *on* A. ■
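The equality in Definition 3 can be verified numerically. The sketch below uses hypothetical probability values: a joint distribution P(a, b, c) is built as P(a)·P(b|a)·P(c|a), which makes B and C independent given A by construction, and the code then checks that P(b | c, a) = P(b | a) for every assignment.

```python
from itertools import product

# Hypothetical binary variables A, B, C with illustrative probabilities.
p_a = {0: 0.4, 1: 0.6}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_a = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}

# Joint distribution factorized so that B and C are independent given A.
joint = {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]
         for a, b, c in product((0, 1), repeat=3)}

def cond_prob(joint, b, a, c=None):
    """P(B=b | A=a) when c is None, otherwise P(B=b | A=a, C=c)."""
    match = lambda key: key[0] == a and (c is None or key[2] == c)
    denom = sum(p for key, p in joint.items() if match(key))
    num = sum(p for key, p in joint.items() if match(key) and key[1] == b)
    return num / denom

# Definition 3 holds: P(b | c, a) equals P(b | a) for all a, b, c.
independent = all(
    abs(cond_prob(joint, b, a, c) - cond_prob(joint, b, a)) < 1e-12
    for a, b, c in product((0, 1), repeat=3)
)
print(independent)  # True
```

Conversely, if an arc were drawn directly from C to B, the factorization would contain a term P(b | a, c) and the equality would in general fail.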

**Definition 4.** *A Bayesian network also represents distributions, in addition to representing statements of independence. A distribution is represented by a set of conditional probability tables (CPT). Each node X has an associated CPT that describes the conditional distribution of X given different assignments of values for its parents.* ■

The definitions mentioned above are graphically illustrated in Figure 7.1 by means of a simple hypothetical example. First, the network introduced here is clearly directed and acyclic. Second, the variables 'gender', 'driving license' and 'number of cars' are parents of the mode choice variable. Finally, dependent and independent relationships, as well as examples of CPTs, are shown in this figure. In the upper CPT, for instance, the probability that mode choice equals bike is 0.2, given that gender = male, driving license = yes and number of cars = 1. Learning Bayesian networks has traditionally been divided into two categories (Cheng, Bell, & Liu, 1997):

Figure 7.1: A small Bayesian network with its CPT.

structural learning and parameter learning. Since both learning phases are relevant for the new integrated BNT classifier, the following sections elaborate on them in detail.
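Before turning to learning, the CPT of Figure 7.1 can be sketched as a simple lookup structure. In the sketch below, a CPT is a mapping from parent assignments to a distribution over the child variable; only the 0.2 probability for 'bike' given gender = male, driving license = yes and number of cars = 1 comes from the text, and all other numbers are hypothetical placeholders.

```python
# CPT for the 'mode choice' node: keys are assignments of the three parents,
# values are distributions over the child's values (each row sums to 1).
cpt_mode_choice = {
    # (gender, driving license, number of cars)
    ("male", "yes", 1): {"bike": 0.2, "car": 0.6, "public transport": 0.2},
    ("male", "no", 0): {"bike": 0.5, "car": 0.0, "public transport": 0.5},
}

def p_mode(mode, gender, license_, cars):
    """Look up P(mode choice = mode | gender, driving license, number of cars)."""
    return cpt_mode_choice[(gender, license_, cars)][mode]

# The entry cited in the text for the upper CPT:
print(p_mode("bike", "male", "yes", 1))  # 0.2
```

A full CPT would contain one row per combination of parent values; parameter learning, discussed next, amounts to estimating these row entries from data, while structural learning determines which arcs (and hence which parent sets) exist at all.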