# Statistical Models for Multiple Binary and Continuous Outcomes

A single binary or continuous outcome, or even a model restricted to multiple binary outcomes, does not fully characterize a developmental toxicity experiment. Using the results of ten studies from the NTP, Ryan et al. (1991) found a direct correlation between the incidence of malformation and fetal weight: normal fetuses consistently had higher weight and malformed fetuses had lower weight. This suggests that joint modeling of fetal weight and malformation would better characterize the outcomes of developmental toxicity studies. The difficulty is that fetal weight is a continuous outcome while the incidence of malformation is binary, and therefore models for the joint consideration of discrete and continuous outcomes are needed. Although some authors had discussed joint modeling of discrete and continuous outcomes, their models were not applicable to developmental studies because they did not account for the intralitter correlation. Moreover, those models were developed with a focus on deriving a single p-value for the experimental results rather than on exploring the relationship between outcomes.

Perhaps the first attempt to jointly model fetal weight and malformation is due to Catalano and Ryan (1992), who define an unobserved latent variable for the discrete outcome. Let the random variable $Y_{ij}$ be the value of the latent variable for the $j$th fetus in the $i$th litter, and suppose that there is a threshold above which the fetus becomes malformed. Without loss of generality the threshold can be taken to be 0, since otherwise a shift in the data makes the adjustment. Thus, if we define $Y_{ij}^{*}$ as the binary malformation indicator, we have

$$Y_{ij}^{*} = \begin{cases} 1 & \text{if } Y_{ij} > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Now assume a linear model for $Y_{ij}$,

$$Y_{ij} = \beta_0 + \beta_1 d_i + \delta_{ij},$$

where $d_i$ is the dosage administered to the $i$th pregnant animal. Note that while $Y_{ij}^{*}$ is observed from the experiment, $Y_{ij}$ is not observable.
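Let $W_{ij}$ be the fetal weight of the $j$th fetus in the $i$th litter and consider a linear model

$$W_{ij} = \gamma_0 + \gamma_1 d_i + \varepsilon_{ij}$$

for $W_{ij}$. The simplest approach for joint modeling of malformation and fetal weight is to use the normal linear regression model and assume a bivariate normal distribution for $\delta_{ij}$ and $\varepsilon_{ij}$, with zero means, variances $\sigma_\delta^2$ and $\sigma_\varepsilon^2$, and correlation $\rho$. This approach assumes a constant correlation $\rho$ between $W_{ij}$ and $Y_{ij}$. However, it treats littermates as independent and ignores the intralitter correlation. Note that by the properties of the bivariate normal distribution we have

$$P(Y_{ij}^{*} = 1) = P(Y_{ij} > 0) = \Phi\!\left(\frac{\beta_0 + \beta_1 d_i}{\sigma_\delta}\right),$$

where $\beta_0 + \beta_1 d_i$ is the mean of the latent variable and $\Phi$ is the standard normal distribution function; this is the probit model.

Now, the joint distribution of $W_{ij}$ and $Y_{ij}^{*}$ can be expressed as the product of marginal and conditional distributions,

$$h(w, y^{*}) = f(w)\, g(y^{*} \mid w),$$

where $f(w)$ is the marginal distribution of $W_{ij}$ and $g(y^{*} \mid w)$ is the conditional distribution of $Y_{ij}^{*}$ given $W_{ij}$. Further, by the standard properties of the bivariate normal distribution, the conditional distribution of $Y_{ij}$ given $W_{ij}$ is normal with mean

$$\mu_{y \mid w} = \beta_0 + \beta_1 d_i + \rho\,\frac{\sigma_\delta}{\sigma_\varepsilon}\, e_{ij} \tag{5.19}$$

and variance

$$\sigma_{y \mid w}^{2} = \sigma_\delta^{2}\,(1 - \rho^{2}),$$

where $e_{ij} = W_{ij} - (\gamma_0 + \gamma_1 d_i)$ is the residual from the weight model. Further, the conditional distribution of $Y_{ij}^{*}$ given $W_{ij}$ is the probit model given by

$$P(Y_{ij}^{*} = 1 \mid W_{ij}) = \Phi\!\left(\frac{\mu_{y \mid w}}{\sigma_\delta \sqrt{1 - \rho^{2}}}\right),$$

where, as noted by Catalano and Ryan (1992), if there is perfect correlation between fetal weight and malformation ($\rho^{2} = 1$), the conditional distribution becomes degenerate. Substituting for $\mu_{y \mid w}$ from (5.19), we have

$$P(Y_{ij}^{*} = 1 \mid W_{ij}) = \Phi\!\left(\beta_0^{*} + \beta_1^{*} d_i + \beta_2^{*} e_{ij}\right), \tag{5.20}$$

where

$$\beta_0^{*} = \frac{\beta_0}{\sigma_\delta \sqrt{1 - \rho^{2}}}, \qquad \beta_1^{*} = \frac{\beta_1}{\sigma_\delta \sqrt{1 - \rho^{2}}}, \qquad \beta_2^{*} = \frac{\rho}{\sigma_\varepsilon \sqrt{1 - \rho^{2}}}.$$

Equation (5.20) defines a generalized linear regression model for malformation with the probit link in which the residual from the weight model is a covariate in addition to the administered dose. Thus, in practice, the parameters $\beta_0^{*}$, $\beta_1^{*}$, and $\beta_2^{*}$ can be estimated using the usual regression modeling. It is interesting to note in (5.20) that when $\rho = 0$, the coefficient of the residual vanishes and the equation collapses into the unconditional probit model.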
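As a minimal sketch of the conditional probit model for malformation given fetal weight, the snippet below computes the coefficients on dose and on the weight residual from the parameters of an assumed bivariate normal distribution for the latent variable and the weight. The function names, argument names, and numerical values are hypothetical and serve only as illustration; they are not estimates from any data set.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def conditional_probit_coefs(beta0, beta1, sigma_delta, sigma_eps, rho):
    """Coefficients of the probit model for malformation given the weight residual.

    Assumes the latent variable (mean beta0 + beta1 * dose, s.d. sigma_delta)
    and the weight error (s.d. sigma_eps) are bivariate normal with
    correlation rho, where |rho| < 1. All names are illustrative.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    root = math.sqrt(1.0 - rho ** 2)
    cond_sd = sigma_delta * root  # conditional s.d. of the latent variable
    return beta0 / cond_sd, beta1 / cond_sd, rho / (sigma_eps * root)


def malformation_prob(dose, weight_residual, coefs):
    """P(malformed | weight) = Phi(b0 + b1 * dose + b2 * residual)."""
    b0, b1, b2 = coefs
    return std_normal_cdf(b0 + b1 * dose + b2 * weight_residual)


# Hypothetical parameter values, chosen only to show the direction of the effect.
coefs = conditional_probit_coefs(
    beta0=-1.5, beta1=0.02, sigma_delta=1.0, sigma_eps=0.1, rho=-0.4
)
p_light = malformation_prob(dose=45, weight_residual=-0.1, coefs=coefs)
p_heavy = malformation_prob(dose=45, weight_residual=0.1, coefs=coefs)
print(f"lighter than expected: {p_light:.3f}, heavier than expected: {p_heavy:.3f}")
```

With a negative correlation, a fetus lighter than the weight model predicts receives a higher malformation probability than a heavier one at the same dose, and setting `rho=0` makes the residual coefficient zero so that only the dose term remains.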

Since the above modeling structure ignores the important issue of litter effect, Catalano and Ryan (1992) consider an extension and derive a model similar to (5.20) with a slightly different covariate structure. In the extended model, both the individual residuals and the litter averages of the residuals are used as covariates. Thus, in practice, we have a pair of regression models. The first models fetal weight as a linear function of the administered dose, and the second regresses the malformation outcome, conditional on the fetal weight outcome, on dose, using the residuals from the first model as covariates. The regression coefficients in the second model are directly related to the variance and correlation structure of the defined latent variable.

To illustrate the modeling procedure, a data set from a developmental toxicity experiment in mice conducted through the NTP is used. The experiment is the study of the developmental effects of ethylene glycol (EG) described by Price et al. (1985). Table 5.7 provides a summary of the data over all litters (no litter effect), and the information is retrieved from Table 5 of Price et al. (1985). See also Table 5.8. The experiment consisted of exposing pregnant CD-1 mice during gestation days 6 through 15. Several other variables were also measured.
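The first step of the two-step procedure, and the construction of the covariates for the second step, can be sketched as follows. This is an illustration only: the records are made up, the weight model is a plain least-squares line in dose, and the second-step probit fit itself is left to whatever regression software is at hand. All variable and function names are hypothetical.

```python
from statistics import mean


def fit_simple_linear(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical records: (litter id, dose, fetal weight in grams).
records = [
    (1, 0, 0.95), (1, 0, 1.02), (1, 0, 0.98),
    (2, 0, 0.88), (2, 0, 0.91),
    (3, 30, 0.80), (3, 30, 0.84), (3, 30, 0.78),
    (4, 60, 0.66), (4, 60, 0.70),
]

doses = [d for _, d, _ in records]
weights = [w for _, _, w in records]

# Step 1: regress fetal weight on dose and compute individual residuals.
intercept, slope = fit_simple_linear(doses, weights)
residuals = [w - (intercept + slope * d) for _, d, w in records]

# Litter averages of the residuals.
by_litter = {}
for (litter, _, _), e in zip(records, residuals):
    by_litter.setdefault(litter, []).append(e)
litter_mean = {litter: mean(es) for litter, es in by_litter.items()}

# Covariates for step 2: dose, individual residual, litter-average residual.
covariates = [
    (d, e, litter_mean[litter])
    for (litter, d, _), e in zip(records, residuals)
]
for row in covariates[:3]:
    print(row)
```

Each row of `covariates` would accompany the binary malformation indicator of the same fetus in the second-step probit regression.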

TABLE 5.7

Average Fetal Weight per Litter (Grams) in a Single Replicate

| Litter | Dose 0 | Dose 15 | Dose 45 | Dose 60 |
|---|---|---|---|---|
| 1 | 0.54 | 0.65 | 0.73 | 0.57 |
| 2 | 0.69 | 0.79 | 0.80 | 0.40 |
| 3 | 0.78 | 1.06 | 0.87 | 0.56 |
| 4 | 0.88 | 1.03 | 0.67 | 0.60 |
| 5 | 0.83 | 0.96 | 0.77 | 0.40 |
| 6 | 1.03 | 0.87 | 0.78 | 0.63 |
| 7 | 0.76 | 0.96 | 0.71 | 0.98 |
| 8 | 1.01 | 0.77 | 0.55 | 0.71 |
| 9 | 1.01 | 0.85 | 0.62 | 0.62 |
| 10 | 0.82 | 0.80 | 0.68 | 0.88 |
| 11 | 0.75 | 1.04 | 0.90 | 0.93 |
| 12 | | 1.13 | 0.67 | 0.75 |
| 13 | | 1.04 | 0.90 | 0.95 |
| 14 | | | 0.83 | 0.77 |
| 15 | | | 0.94 | 0.66 |
| 16 | | | | 0.80 |
| Average | 0.83 | 0.93 | 0.76 | 0.70 |

TABLE 5.8

Average Fetal Weight for All Litters (Grams)

| Dose | Number of Litters | Average Fetal Weight | Standard Deviation |
|---|---|---|---|
| 0 | 71 | 82.34 | 15 |
| 15 | 13 | 93.57 | 12 |
| 30 | 54 | 74.15 | 16 |
| 45 | 74 | 71.19 | 13 |
| 60 | 69 | 67.58 | 14 |
| 75 | 37 | 59.91 | 16 |
| 90 | 19 | 56.27 | 10 |

To apply the two-step modeling procedure proposed by Catalano and Ryan (1992) in the context of risk assessment with malformation and fetal weight as outcomes, Catalano et al. (1993) define an affected fetus as a fetus which is either dead/resorbed, or has a malformation, or has a below-normal fetal weight. Thus if $\theta_1(d)$ and $\theta_2(d)$ respectively denote the probability of death/resorption and the probability of malformation and/or low fetal weight, the probability that a fetus is abnormal is given by

$$P(d) = \theta_1(d) + \left[1 - \theta_1(d)\right]\theta_2(d). \tag{5.21}$$

Note that this is similar to how we defined the probability of being abnormal when we considered the joint modeling of trinomial responses, except that now $\theta_2(d)$ is defined differently to include low fetal weight as well. For parametric models, Catalano et al. (1993) suggest using a modified probit model for the binary variable of death/resorption and, for fetal weight, a regression model in a power of dose with litter size as a covariate. After fitting the regression model, according to the modeling procedure, individual and average residuals are calculated and the conditional distribution of the malformation indicator variable is modeled using another modified probit model. Finally, to model the probability of low fetal weight, Catalano et al. (1993) suggest choosing a threshold response, such as three standard deviations below the mean, and calling fetuses with fetal weight below this threshold abnormal. Thus another modified probit model is used for parameterization of this probability. Combining all models and using (5.21), a dose-response function is derived for risk assessment in which $w_0$ is the fetal weight threshold below which a fetus is classified as abnormal and $\sigma_w$ is the standard deviation of fetal weight in control animals.

By applying the methodology to an experimental data set regarding the developmental effects of exposure to diethylene glycol dimethyl ether (Price et al., 1987), the authors show that there is a strong correlation between different fetal outcomes and that a combined analysis considering all possible outcomes, including death/resorption, incidence of malformation, and fetal weight, provides a "useful and sensitive summary" of the overall toxicity effects of the chemical. Furthermore, the model provides a better understanding of the underlying relationship between different outcomes.
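The way the separate probabilities combine into an overall risk can be sketched as below. Two assumptions are made here that the text does not spell out: the probability of malformation and/or low weight is taken to apply to fetuses that are not dead or resorbed, and the low-weight probability is computed from a normal weight distribution whose standard deviation is that of the controls. Function names and numbers are hypothetical.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_low_weight(mean_weight_at_dose, control_mean, control_sd, n_sd=3.0):
    """P(W < w0), with w0 set n_sd control standard deviations below the control mean.

    Assumes normally distributed fetal weight with s.d. control_sd at every dose.
    """
    w0 = control_mean - n_sd * control_sd
    return std_normal_cdf((w0 - mean_weight_at_dose) / control_sd)


def prob_abnormal(theta1, theta2):
    """Overall probability of an affected fetus.

    theta1: probability of death/resorption.
    theta2: probability of malformation and/or low weight, assumed here to
            apply to fetuses that are not dead or resorbed.
    """
    return theta1 + (1.0 - theta1) * theta2


# Hypothetical values: control mean 1.00 g, control s.d. 0.10 g.
for mean_w in (1.00, 0.85, 0.75):
    p_low = prob_low_weight(mean_w, control_mean=1.00, control_sd=0.10)
    print(f"mean weight {mean_w:.2f} g: P(low weight) = {p_low:.4f}, "
          f"P(abnormal) = {prob_abnormal(0.10, p_low):.4f}")
```

Because the threshold is fixed from the control group, even a modest downward shift in mean weight moves a much larger share of fetuses below it, which is what makes the combined measure sensitive to weight effects.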

Fitzmaurice and Laird (1995) argue that in the model proposed by Catalano and Ryan (1992), the regression parameters for both the binary malformation status and the conditional fetal weight outcomes have no specific marginal interpretation. This is due to the fact that the link function relating the conditional mean of the binary response to the covariates is nonlinear. They propose instead to describe the marginal distribution of $Y_{ij}$ given the covariates by a logistic model, in which the parameter $\theta_{ij}$ is a linear function of a set of covariates $S_{ij}$ predicting $Y_{ij}$ and is related to the mean response $\mu_{ij} = \mu_{ij}(\beta) = E(Y_{ij}) = P(Y_{ij} = 1)$ through a logistic link, that is,

$$\theta_{ij} = \log\frac{\mu_{ij}}{1 - \mu_{ij}}.$$

The joint distribution of $W_{ij}$ and $Y_{ij}$ is written as the product of the marginal distribution of $Y_{ij}$ and the conditional distribution of $W_{ij}$ given $Y_{ij}$, and the conditional distribution of $W_{ij}$ is assumed to be normal. Its mean involves $\mu_w$, a linear function of another set of covariates $T_{ij}$ predicting the weight, and a parameter $\alpha$ for the regression of $W_{ij}$ on $Y_{ij}$. It thus becomes apparent that the weight variable $W_{ij}$ has a conditional mean that depends on the binary malformation variable $Y_{ij}$, inducing the correlation between the two variables. This modeling structure can be extended to allow for clustering by using the GEE methodology. Fitzmaurice and Laird (1995) illustrate their methodology by using the experimental data on the developmental effects of exposure to diethylene glycol dimethyl ether (Price et al., 1987).
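A small numerical sketch may help show how a conditional mean for weight that depends on the malformation indicator can still leave the weight parameters with a marginal interpretation. The centered form of the conditional mean used below, `mu_w + alpha * (y - mu_y)`, is an assumption made for this illustration and is not taken from the text; all names and values are hypothetical.

```python
import math


def logistic_mean(theta: float) -> float:
    """Inverse of the logistic link: mu = exp(theta) / (1 + exp(theta))."""
    return 1.0 / (1.0 + math.exp(-theta))


def conditional_weight_mean(y, mu_w, mu_y, alpha):
    """Assumed centered form: E(W | Y = y) = mu_w + alpha * (y - mu_y)."""
    return mu_w + alpha * (y - mu_y)


def marginal_weight_mean(mu_w, mu_y, alpha):
    """Average the conditional mean over the Bernoulli(mu_y) distribution of Y."""
    return ((1.0 - mu_y) * conditional_weight_mean(0, mu_w, mu_y, alpha)
            + mu_y * conditional_weight_mean(1, mu_w, mu_y, alpha))


# Hypothetical values: linear predictor -1.0 for malformation, mean weight 0.90 g.
mu_y = logistic_mean(-1.0)
mu_w, alpha = 0.90, -0.20

print("P(malformed)       :", round(mu_y, 4))
print("E(W | normal)      :", round(conditional_weight_mean(0, mu_w, mu_y, alpha), 4))
print("E(W | malformed)   :", round(conditional_weight_mean(1, mu_w, mu_y, alpha), 4))
print("E(W), marginal mean:", round(marginal_weight_mean(mu_w, mu_y, alpha), 4))
```

Under this assumed form, malformed fetuses have a lower expected weight when `alpha` is negative, yet averaging over the malformation status returns `mu_w`, so the covariate effects in `mu_w` describe the marginal mean weight.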

Both Catalano and Ryan (1992) and Fitzmaurice and Laird (1995) rely on factorization of the joint distribution of the binary variable for malformation and the continuous variable for fetal weight as the product of a marginal distribution and a conditional distribution. In fact, a couple of other models were developed based on this conditioning structure; see Chen (1993) and Ahn and Chen (1997). But, as noted in Regan and Catalano (1999), this conditioning is largely a matter of statistical convenience and is not so much based on biological principles. In addition, conditional models do not provide adequate interpretation of the marginal dose-response relationships, fail to provide an estimate of the direct correlation between fetal weight and malformation status, and are often difficult to apply for risk assessment.

Noting these drawbacks, Regan and Catalano (1999) propose a likelihood-based approach that utilizes the joint distribution of a sequence of latent variables induced by the binary malformation outcomes in each litter, together with an extension of the so-called correlated probit model. More specifically, let $Y_{jk}$ be the latent variable corresponding to the malformation status of the $k$th fetus in the $j$th litter, i.e. $Y_{jk} > 0$ if the $k$th pup in the $j$th litter is malformed and $Y_{jk} \le 0$ otherwise. Assume that the vector $\mathbf{Y}_j = (Y_{j1}, \ldots, Y_{jm_j})^{T}$ has a multivariate normal distribution with mean $\mu_y \mathbf{1}_{m_j}$ and covariance matrix

$$\Sigma_y = \sigma_y^{2}\left[(1 - \rho_y)\, I_{m_j} + \rho_y\, J_{m_j}\right],$$

where $\mathbf{1}_{m_j}$ is the $m_j$-dimensional vector of ones and $I_{m_j}$ and $J_{m_j}$ respectively represent the $m_j \times m_j$ identity matrix and the matrix of ones. Note that this model assumes equal correlation across all litters. Assuming a multivariate normal distribution for the fetal weight vector $\mathbf{W}_j = (W_{j1}, \ldots, W_{jm_j})^{T}$, the joint distribution of the $2m_j$-dimensional random vector $(\mathbf{Y}_j, \mathbf{W}_j)$ is expressed as a multivariate normal with $2m_j \times 2m_j$ joint covariance matrix $\Sigma_{yw}$, where $\mu_w$, $\sigma_w^{2}$, and $\rho_w$ respectively represent the mean, variance, and common correlation coefficient of the components of $\mathbf{W}_j$. Also, $\rho$ represents the correlation coefficient between weight and the latent variable induced by malformation status for each fetus and is assumed to be the same for all fetuses within a litter. In this formulation, it is assumed that all outcomes are exchangeable.

Regan and Catalano (1999) assume quadratic dose-response models for the mean and the inverse of the coefficient of variation and a linear dose-response model for the variance of the weight variables. They also model all three correlation coefficients linearly in dose and use maximum likelihood to determine the parameter estimates. The methodology is applied to the data from the developmental toxicity study of diethylene glycol dimethyl ether by Price et al. (1987), summarized in Tables 5.9 and 5.10.

TABLE 5.9

R Output for Poisson Regression of Diethylhexyl Phthalate (DEHP) Data

TABLE 5.10

R Output for Poisson-Gamma Regression of Diethylhexyl Phthalate (DEHP) Data

The advantage of this approach in comparison to the conditional modeling approach is that it allows for estimation of the parameters that characterize the latent variables, which are not directly observable. In addition, since the weight variable and the latent variable induced by malformation status are modeled as linear functions of the dose, they can vary at different dose levels.
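The exchangeable covariance structure assumed for the latent vector within a litter, a common variance on the diagonal and a common covariance everywhere else, can be built as in the sketch below. Only this within-litter block is shown; the full joint covariance of the latent variables and the weights is not reconstructed here. The function name and the numerical values are hypothetical.

```python
def exchangeable_cov(m: int, sigma2: float, rho: float):
    """Return sigma2 * [(1 - rho) * I_m + rho * J_m] as a list of rows.

    The matrix is positive definite only when -1 / (m - 1) < rho < 1.
    """
    if m < 1:
        raise ValueError("litter size m must be at least 1")
    if m > 1 and not (-1.0 / (m - 1) < rho < 1.0):
        raise ValueError("rho is outside the range giving a positive definite matrix")
    return [
        [sigma2 if row == col else sigma2 * rho for col in range(m)]
        for row in range(m)
    ]


# Hypothetical litter of four fetuses with unit latent variance.
m, sigma2, rho = 4, 1.0, 0.3
cov = exchangeable_cov(m, sigma2, rho)
for row in cov:
    print(row)

# For this structure the eigenvalues are available in closed form:
# sigma2 * (1 + (m - 1) * rho) along the vector of ones,
# and sigma2 * (1 - rho) with multiplicity m - 1.
print("eigenvalues:", sigma2 * (1 + (m - 1) * rho), "and", sigma2 * (1 - rho))
```

The closed-form eigenvalues explain the admissible range of the correlation: both must be positive, which for larger litters rules out all but slightly negative values of `rho`.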

There are also other approaches for joint modeling of discrete and continuous outcomes in developmental toxicity studies. For example, Faes et al. (2004) propose a likelihood-based model using an extension of the Plackett-Dale approach to modeling.