# The Gini Concentration Index for the Study of Survival

Marco Bonetti, Chiara Gigliarano, and Ugofilippo Basellini

## Introduction

Originally proposed to measure income or wealth inequality (Gini, 1912, 1914), the Gini concentration index has become one of the most common statistical indices for measuring concentration in the distribution of a positive random variable. In the analysis of survival times, the Gini index measures the degree of inequality within a given population in terms of the subjects' ages at death; one can think of a person's length of life as the equivalent of income, and of the total number of years lived as the quantity that is distributed across the population. The Lorenz curve and the Gini index can be constructed from the life-table age-at-death distribution or from the individual ages at death.

Here, we review some developments that have occurred mostly over the past 20 years in the use of the Gini concentration index to study survival distributions. We first describe methods to estimate the concentration index from incomplete data, in both the parametric and the nonparametric settings. We then illustrate work in the demographic domain, where survival distributions are typically estimated from life tables. Lastly, we consider some recent developments that have focused on the study of a class of survival distributions for which the measures of life expectancy at birth and of concentration can move in the same or opposite directions across groups (typically, birth cohorts) as a result of changes in mortality over time. In all cases we also refer to the software packages that can be used to compute the Gini index and implement the different approaches.

## Estimation of the Gini Concentration Index from Incomplete Data

The Gini concentration index is defined, for a nonnegative random variable X with cumulative distribution function F(x), as

$$G = \frac{1}{2\mu} \int_0^\infty \int_0^\infty |x - y| \, dF(x) \, dF(y),$$

where $\mu$ is the expected value of X (Gini, 1912, 1914). The Gini index varies between zero (in the case of perfect equality) and one (perfect inequality). For length-of-life distributions, if a small group of individuals lives much longer than the rest of the population, then the index will tend to be large. The index is equal to zero if all individuals die at the same age, and equal to one if all people but one die at birth and that one individual dies at any positive age (see, e.g., Gigliarano et al., 2017).
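As an illustration of the two extreme cases just described, the following minimal sketch computes a plug-in version of the index from a complete sample of ages at death. It assumes the mean-difference form of the definition (the expected absolute difference between two independent draws from F, divided by twice the mean); the function name `gini_mean_difference` and the toy data are hypothetical and are not taken from any of the software packages cited in this chapter.

```python
import numpy as np

def gini_mean_difference(x):
    """Plug-in Gini index: mean absolute difference divided by twice the mean."""
    x = np.asarray(x, dtype=float)
    if x.size == 0 or np.any(x < 0):
        raise ValueError("x must be a non-empty array of nonnegative values")
    mu = x.mean()
    if mu == 0:
        raise ValueError("the Gini index is undefined when the mean is zero")
    # Average of |x_i - x_j| over all n^2 ordered pairs (i, j), pairs with i == j included
    mean_abs_diff = np.abs(x[:, None] - x[None, :]).mean()
    return mean_abs_diff / (2.0 * mu)

# Hypothetical toy populations (ages at death, in years)
equal_ages = [80.0, 80.0, 80.0, 80.0]   # everyone dies at the same age
one_survivor = [0.0] * 99 + [75.0]      # all but one individual die at birth

print(gini_mean_difference(equal_ages))
print(gini_mean_difference(one_survivor))
```

With n individuals of whom all but one die at birth, this plug-in version equals 1 - 1/n, so it approaches the upper bound of one only as the population grows. The pairwise computation uses memory of order n squared and is meant for small illustrative samples.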

An alternative expression of the Gini index that will be considered throughout this chapter is given by

$$G = 1 - \frac{\int_0^\infty [S(u)]^2 \, du}{\int_0^\infty S(u) \, du},$$

where S(u) = P(X > u); see Michetti and Dall'Aglio (1957) and Hanada (1983).
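The survival-function form of the index (one minus the ratio of the integral of the squared survival function to the integral of the survival function) lends itself to direct computation. The sketch below is an illustration only: it assumes a complete (uncensored) sample, replaces S by the empirical survival function, which is a step function, and evaluates both integrals exactly. The function name `gini_from_survival` and the simulated data are hypothetical. For an exponential distribution with rate λ, S(u) = exp(-λu), the two integrals are 1/(2λ) and 1/λ, so the index is 1/2 whatever the rate; the simulated sample should give a value close to that.

```python
import numpy as np

def gini_from_survival(x):
    """Gini index as 1 - int S(u)^2 du / int S(u) du, with S the empirical survival function."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n == 0 or x[0] < 0 or x[-1] == 0:
        raise ValueError("x must contain nonnegative values, at least one of them positive")
    # The empirical survival function equals (n - i) / n on [x_(i), x_(i+1)), with x_(0) = 0
    widths = np.diff(np.concatenate(([0.0], x)))
    heights = (n - np.arange(n)) / n
    integral_s = np.sum(heights * widths)        # equals the sample mean
    integral_s2 = np.sum(heights ** 2 * widths)
    return 1.0 - integral_s2 / integral_s

# Hypothetical simulated lifetimes from an exponential distribution (true index: 0.5)
rng = np.random.default_rng(0)
sample = rng.exponential(scale=10.0, size=20000)
print(gini_from_survival(sample))
```

Because the integral of the squared empirical survival function is the average of min(x_i, x_j) over all ordered pairs, this computation returns the same value as the plug-in mean-difference form, at the cost of a sort rather than a pairwise comparison.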

Most research on the estimation of the Gini index has focused on completely observed data. We now review some recent developments on the calculation of the Gini concentration index when data are incomplete, in particular when they are left or right censored or truncated. The right-censored case is the most common when dealing with survival data, while the other cases are more typical of income or wealth data. In parametric settings, the index can be estimated together with a confidence interval. In nonparametric settings, a restricted version of the Gini index has been defined, and nonparametric bounds for the index can be constructed. Software is now available in both R and Stata to implement the methods (R Development Core Team, 2018; StataCorp, 2017).

### Some Types of Incomplete Survival (or Income) Data

For both survival and income/wealth data, observations are often incompletely measured because of left or right censoring or truncation. Let us review those situations.

Consider the lifetime (or income) random variable X > 0, with distribution $f_\theta(x)$ indexed by the parameter $\theta$. Also let C > 0 be another random variable, independent of X. The observation of X is left censored if one observes the larger of X and C, and knows which one it is. The observation of X is right censored if one observes the smaller of X and C, and knows which one it is. Right (or left) censoring can occur with a varying or a constant censoring time C. The variable X is left truncated if one observes X only when X > C, and it is right truncated if one observes X only when X < C. Truncation often occurs with respect to a constant.

Note that with censoring one does observe the individual (or, more generally, the statistical unit), but the value that is collected might not correspond to the true underlying value that one is interested in. In contrast, for truncated variables one does not even observe the individuals/units with values of the variable that are outside some range.
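The four observation schemes defined in this section can be made concrete with a small simulation. The sketch below is purely illustrative: the Weibull lifetimes, the uniform distribution for C, the sample size, and all variable names are assumptions chosen for the example, not part of the methods reviewed here. It shows that censoring retains every unit (with a possibly altered value and an indicator of which variable was observed), whereas truncation removes units from the sample altogether.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Hypothetical lifetimes X > 0 and independent times C > 0 (assumed distributions)
x = rng.weibull(2.0, size=n) * 80.0
c = rng.uniform(20.0, 100.0, size=n)

# Right censoring: observe the smaller of X and C, and know which one it is
right_obs = np.minimum(x, c)
right_is_exact = x <= c          # True when the observed value is X itself

# Left censoring: observe the larger of X and C, and know which one it is
left_obs = np.maximum(x, c)
left_is_exact = x >= c           # True when the observed value is X itself

# Left truncation: X is observed only when X > C; other units never enter the sample
left_truncated_sample = x[x > c]

# Right truncation: X is observed only when X < C
right_truncated_sample = x[x < c]

print("units under right censoring:", right_obs.size, "of which exact:", int(right_is_exact.sum()))
print("units under left censoring: ", left_obs.size, "of which exact:", int(left_is_exact.sum()))
print("units under left truncation: ", left_truncated_sample.size)
print("units under right truncation:", right_truncated_sample.size)
```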