Parametric and Nonparametric Estimation

The Stata package Giniinc allows one to estimate the Gini concentration index under several conditions, both in the parametric and nonparametric setting. In particular, maximum likelihood estimation for three commonly used parametric models is available to estimate the Gini index, both from censored and truncated data (Hong et al., 2018).

The two cases of left censoring and left truncation can be handled easily from the observed data when a parametric model describes the population. Observed left-censored data can be written as the pairs $(y_i, \delta_i)$, $i = 1, \dots, n$, where $\delta_i$ equals 0 if observation $i$ is censored at the threshold $k$, and 1 otherwise. Denoting by $x_i$ the true underlying survival time of individual $i$, we have $y_i = \max(x_i, k)$ and $\delta_i = 1(k < x_i)$.

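If the times $x_1, \dots, x_n$ are an i.i.d. sample from the model density $f_\theta(x)$, with distribution function $F_\theta(x)$, then the likelihood function for the observed data is

$$L(\theta) = \prod_{i=1}^{n} f_\theta(y_i)^{\delta_i} \, F_\theta(y_i)^{1 - \delta_i},$$

where $y_i = k$ whenever $\delta_i = 0$.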
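As an illustration of how the left-censored likelihood can be maximized, the following is a minimal sketch under assumptions that are not in the text: a Pareto model with known scale $x_m = 1$ and unknown shape $\alpha$ (the three models available in Giniinc are not named here), a fixed threshold $k > x_m$, and hypothetical function names. It is not the Giniinc implementation. The score function is decreasing in $\alpha$, so a simple bisection suffices.

```python
import math
import random

def censor_left(x, k):
    """Return the pairs (y_i, delta_i) with y_i = max(x_i, k) and delta_i = 1(k < x_i)."""
    return [(max(xi, k), 1 if xi > k else 0) for xi in x]

def score(alpha, data, k, xm=1.0):
    """Derivative in alpha of the left-censored Pareto log-likelihood (assumed model)."""
    d = sum(delta for _, delta in data)            # number of uncensored observations
    c = len(data) - d                              # number of censored observations
    sum_log = sum(math.log(y / xm) for y, delta in data if delta == 1)
    r = (xm / k) ** alpha                          # P(X > k) under the Pareto model
    # Each censored observation contributes log F(k) = log(1 - r) to the log-likelihood.
    cens = c * (-r * math.log(xm / k)) / (1.0 - r)
    return d / alpha - sum_log + cens

def mle_alpha(data, k, xm=1.0, lo=1e-6, hi=100.0, tol=1e-10):
    """Solve score(alpha) = 0 by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, data, k, xm) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def pareto_gini(alpha):
    """Gini index of a Pareto distribution with shape alpha > 1."""
    return 1.0 / (2.0 * alpha - 1.0)

# Simulated example: true shape 3 (true Gini 0.2), values below k = 1.5 are censored.
random.seed(1)
alpha_true, k = 3.0, 1.5
x = [random.paretovariate(alpha_true) for _ in range(5000)]
data = censor_left(x, k)
alpha_hat = mle_alpha(data, k)
gini_hat = pareto_gini(alpha_hat)
```

Any other parametric family could be substituted by replacing the density and distribution function in `score`; with more than one parameter a general-purpose optimizer would be needed instead of bisection.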

In the case of left truncation, only the cases such that $x_i > k$ are observed, and one does not know how many observations are missing, i.e., how many cases have $x_i \le k$. Denote by $m$ the number of available observations $y_j = x_j$, $j = 1, \dots, m$. The left-truncated incomes are therefore an i.i.d. sample from $f_\theta(x \mid X > k)$, and, with $F_\theta$ the distribution function of the model, the observed data likelihood is

$$L(\theta) = \prod_{j=1}^{m} \frac{f_\theta(y_j)}{1 - F_\theta(k)}.$$

Using maximum likelihood estimation, the estimate of $\theta$ and its estimated variance-covariance matrix can be obtained in both cases. Since the Gini index is a function of $\theta$ under the parametric model, it can then be estimated consistently, and confidence intervals can be constructed through the delta method.
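The left-truncated case and the delta method step can be sketched in a few lines. The example again assumes a Pareto model with known scale 1 (an assumption made for illustration, with hypothetical function names). Under this model the truncated sample is itself Pareto with scale $k$, so the maximum likelihood estimate of the shape has a closed form and the Gini index $1/(2\alpha - 1)$ has a simple derivative.

```python
import math
import random

def truncated_pareto_mle(y, k):
    """Shape MLE from left-truncated Pareto data (all y_j > k) and its estimated variance."""
    m = len(y)
    alpha_hat = m / sum(math.log(yj / k) for yj in y)
    var_alpha = alpha_hat ** 2 / m        # inverse of the observed information m / alpha^2
    return alpha_hat, var_alpha

def gini_delta_ci(alpha_hat, var_alpha, z=1.959964):
    """Gini estimate and approximate 95% interval via the delta method."""
    gini = 1.0 / (2.0 * alpha_hat - 1.0)
    grad = -2.0 / (2.0 * alpha_hat - 1.0) ** 2    # dG/d(alpha)
    se = abs(grad) * math.sqrt(var_alpha)
    return gini, (gini - z * se, gini + z * se)

# Simulated example: only incomes above k = 1.5 are observed; the number of
# discarded observations is not used anywhere in the estimation.
random.seed(2)
alpha_true, k = 3.0, 1.5
population = (random.paretovariate(alpha_true) for _ in range(20000))
y = [x for x in population if x > k]
alpha_hat, var_alpha = truncated_pareto_mle(y, k)
gini_hat, (ci_low, ci_high) = gini_delta_ci(alpha_hat, var_alpha)
```

For a model with several parameters the same step uses the gradient of the Gini index with respect to $\theta$ and the full estimated variance-covariance matrix.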

In the nonparametric setting, one of the first attempts to estimate the Gini index when data are observed after random left truncation and right censoring traces back to Tse (2006). The author considers a random left-truncated and right-censored model, where $X$ is the lifetime variable with continuous distribution function $F$, subject to right censoring by $S$ and left truncation by $T$, with $X$, $S$ and $T$ independent. The author proposes to estimate the Gini index using the Gini statistic $G_n$ defined by

$$G_n = 1 - 2 \int_0^1 L_n(y)\,dy,$$

where $L_n(y)$ is the Lorenz statistic for estimating the Lorenz curve, defined as $L_n(y) = \bar{X}^{-1} \int_0^y Q_n(u)\,du$, $0 < y < 1$. Here $\bar{X}$ is the sample mean and $Q_n$ is the empirical quantile function of the product-limit (PL) estimator of $F$. Following the approach proposed by Csorgo et al. (1986), the author obtains a central limit theorem for the proposed Gini estimator, based on the PL quantile process. Unfortunately, an explicit formula for the asymptotic variance of the estimator is not provided.
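The plug-in idea behind a Lorenz-based Gini statistic can be sketched as follows. The code assumes that the jumps of an estimated distribution function (for instance, the masses of a PL estimator) are already available as input. Computing the PL estimator under truncation is not shown, the function names are hypothetical, and the mean used is that of the supplied discrete distribution. Because the quantile function of a discrete distribution is a step function, its Lorenz curve is piecewise linear and the integral is exact.

```python
def lorenz_points(values, masses):
    """Vertices (p, L(p)) of the Lorenz curve of a discrete distribution."""
    pairs = sorted(zip(values, masses))
    total = sum(m for _, m in pairs)
    mean = sum(v * m for v, m in pairs) / total
    p, l = 0.0, 0.0
    points = [(0.0, 0.0)]
    for v, m in pairs:
        p += m / total                    # cumulative probability
        l += v * m / (total * mean)       # cumulative share of the total
        points.append((p, l))
    return points

def gini_from_lorenz(points):
    """Gini = 1 - 2 * (area under the piecewise-linear Lorenz curve)."""
    area = sum((p1 - p0) * (l0 + l1) / 2.0
               for (p0, l0), (p1, l1) in zip(points, points[1:]))
    return 1.0 - 2.0 * area

# With equal masses 1/n this reduces to the complete-data Gini index.
gini = gini_from_lorenz(lorenz_points([1.0, 2.0, 3.0], [1 / 3, 1 / 3, 1 / 3]))
```

For the three equally weighted values 1, 2, 3 the result is 2/9, which agrees with the mean-difference formula $\sum_{i,j} |x_i - x_j| / (2 n^2 \bar{x})$.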

Also in the nonparametric setting, a restricted version of the Gini concentration index has been introduced and exploited to construct a test for differences in survival distributions.

The Restricted Gini Index and Test

A nonparametric test has been proposed for testing the equality of two survival distributions from the point of view of concentration, when data consist of two independent i.i.d. right-censored samples (Bonetti et al., 2009). This test is based on a restricted version of the Gini index, useful for survival data in which subjects have finite follow-up:

$$G_t = 1 - \frac{\int_0^t S^2(u)\,du}{\int_0^t S(u)\,du}. \qquad (6.1)$$

Note that the traditional Gini index $G$ can be written as in Equation (6.1) but with the integrals running from zero to infinity. The restricted Gini index can be estimated nonparametrically from right-censored data with

$$\hat{G}_t = 1 - \frac{\int_0^t \hat{S}^2(u)\,du}{\int_0^t \hat{S}(u)\,du},$$

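where $\hat{S}(u)$ is the Kaplan-Meier estimator of the survival function $S(u)$ and $t$ indicates the longest follow-up time in the data (Kaplan and Meier, 1958). Under regularity conditions, as $n \to \infty$, $\sqrt{n}\,(\hat{G}_t - G_t)$ converges in distribution to a normal random variable with mean zero. Its asymptotic variance is expressed in terms of the quantities $\mu_t(v) = \mu_t - \mu_v = \int_v^t S(u)\,du$, $\nu_t(v) = \nu_t - \nu_v = \int_v^t S^2(u)\,du$, $W_t = \int_0^t S(u)\,du$, and $V_t = \int_0^t S^2(u)\,du$. The derived two-sample test statistic is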

$$T = \frac{(\hat{G}_{1,t} - \hat{G}_{2,t})^2}{\widehat{\mathrm{Var}}(\hat{G}_{1,t}) + \widehat{\mathrm{Var}}(\hat{G}_{2,t})},$$

where $\hat{G}_{j,t}$ is the estimator of the restricted Gini index for censored data for group $j$ and $\widehat{\mathrm{Var}}(\hat{G}_{j,t})$ is the estimator of the sampling variance of $\hat{G}_{j,t}$ for group $j$, $j = 1, 2$. Under the null hypothesis of equality of the two survival distributions, the statistic $T$ has an approximate chi-squared distribution with one degree of freedom, while under any alternative hypothesis $T$ is distributed approximately as a noncentral chi-squared distribution with noncentrality parameter $\eta = (G_{1,t} - G_{2,t})^2 / (\tau_1^2/n_1 + \tau_2^2/n_2)$, where $\tau_j^2$ indicates the asymptotic variance of $\sqrt{n_j}\,\hat{G}_{j,t}$ and $n_j$ is the sample size of group $j$, $j = 1, 2$.
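The plug-in estimation of the restricted Gini index and the chi-squared reference distribution can be sketched as follows. This is an illustrative sketch with hypothetical function names, not the code of survgini or Giniinc. The Kaplan-Meier curve is a step function, so both integrals are finite sums. The variance estimates are taken as inputs, since their formula is not reproduced in the text.

```python
import math

def kaplan_meier(times, events):
    """Return [(t_j, S(t_j))] at the distinct event times (events: 1 = event, 0 = censored)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, steps, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # group tied observations
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= removed
    return steps

def restricted_gini(times, events, t_max=None):
    """Plug-in estimate 1 - int_0^t S^2 / int_0^t S, with S the Kaplan-Meier step function."""
    if t_max is None:
        t_max = max(times)                # longest follow-up time in the data
    w = v = 0.0                           # integrals of S and S^2 over [0, t_max]
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= t_max:
            break
        w += prev_s * (t - prev_t)
        v += prev_s ** 2 * (t - prev_t)
        prev_t, prev_s = t, s
    w += prev_s * (t_max - prev_t)
    v += prev_s ** 2 * (t_max - prev_t)
    return 1.0 - v / w

def gini_test(g1, var1, g2, var2):
    """Two-sample statistic T and its chi-squared (1 df) p-value."""
    stat = (g1 - g2) ** 2 / (var1 + var2)
    p_value = math.erfc(math.sqrt(stat / 2.0))     # P(chi-squared_1 > stat)
    return stat, p_value

g_complete = restricted_gini([1, 2, 3], [1, 1, 1])     # no censoring
g_censored = restricted_gini([1, 2, 3], [1, 0, 1])     # second subject censored
```

Without censoring and with $t$ equal to the largest observation, the estimate coincides with the ordinary sample Gini index, which gives a quick check of the implementation.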

Note that the test compares the two distributions from the point of view of their shape. In their work the authors compared the Gini test with other tests for the difference between two survival distributions, such as the log-rank, Wilcoxon, and Gray-Tsiatis tests (see, among others, Ewell and Ibrahim, 1997; Gray and Tsiatis, 1989; Harrington and Fleming, 1982). The comparison focuses on cure rate survival models, in which a fraction of the population never experiences the event of interest. For such models, the restricted Gini index is shown to converge, as the follow-up time tends to infinity, to the proportion of noncured subjects in the population.
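The limit can be checked directly from Equation (6.1). The following is a sketch under an assumed mixture cure model; the cured fraction $p$, the survival function $S_0$ of the noncured subjects, and its finite mean $\mu_0$ are notation introduced here for illustration.

```latex
\begin{aligned}
S(u) &= p + (1-p)\,S_0(u), \qquad 0 < p < 1, \\
W_t = \int_0^t S(u)\,du &= p\,t + (1-p)\int_0^t S_0(u)\,du, \\
V_t = \int_0^t S^2(u)\,du &= p^2\,t + 2p(1-p)\int_0^t S_0(u)\,du + (1-p)^2\int_0^t S_0^2(u)\,du, \\
0 \le \int_0^t S_0^2(u)\,du &\le \int_0^t S_0(u)\,du \le \mu_0 < \infty, \\
G_t = 1 - \frac{V_t}{W_t} &= 1 - \frac{p^2 + O(1/t)}{p + O(1/t)} \;\longrightarrow\; 1 - p \quad (t \to \infty).
\end{aligned}
```

The terms involving $S_0$ stay bounded while the terms in $p$ grow linearly in $t$, so the ratio $V_t / W_t$ tends to $p$ and the restricted Gini index tends to $1 - p$, the proportion of noncured subjects.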

In Gigliarano and Bonetti (2013) the behavior of the Gini test for the cases of small-sized and unbalanced groups is studied, and a version of the test based on permutation inference is introduced. The restricted Gini test is implemented in the R function survgini (Gigliarano and Bonetti, 2011), and it has also been made available in the Stata package Giniinc.
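A permutation version of the test can be sketched as follows. This is an illustration only and not the procedure implemented in survgini or Giniinc. The statistic is the unstudentized absolute difference of the two restricted Gini estimates, the restriction time is the smaller of the two longest follow-up times, and all function names are hypothetical. A compact restricted Gini estimator is included so that the block is self-contained.

```python
import random

def restricted_gini(times, events, t_max):
    """Kaplan-Meier plug-in estimate of 1 - int_0^t S^2 / int_0^t S on [0, t_max]."""
    # At tied times, events are processed before censored observations.
    data = sorted(zip(times, events), key=lambda p: (p[0], -p[1]))
    at_risk, surv = len(data), 1.0
    w = v = 0.0
    prev_t = 0.0
    for t, e in data:
        t_clip = min(t, t_max)
        w += surv * (t_clip - prev_t)
        v += surv ** 2 * (t_clip - prev_t)
        prev_t = t_clip
        if e == 1:
            surv *= 1.0 - 1.0 / at_risk
        at_risk -= 1
    w += surv * (t_max - prev_t)          # curve held constant after the last observation
    v += surv ** 2 * (t_max - prev_t)
    return 1.0 - v / w

def permutation_gini_test(t1, e1, t2, e2, n_perm=999, seed=0):
    """Permutation p-value for |G_1 - G_2| obtained by reshuffling the group labels."""
    rng = random.Random(seed)
    t_max = min(max(t1), max(t2))         # assumed common restriction time

    def stat(a, b):
        ga = restricted_gini([t for t, _ in a], [e for _, e in a], t_max)
        gb = restricted_gini([t for t, _ in b], [e for _, e in b], t_max)
        return abs(ga - gb)

    g1, g2 = list(zip(t1, e1)), list(zip(t2, e2))
    observed = stat(g1, g2)
    pooled = g1 + g2
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:len(g1)], pooled[len(g1):]) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

def censor(ts, c):
    """Administrative right censoring at time c."""
    return [min(t, c) for t in ts], [1 if t <= c else 0 for t in ts]

# Simulated example: two small groups with different shapes, censored at time 2.
rng = random.Random(42)
t1, e1 = censor([rng.expovariate(1.0) for _ in range(30)], 2.0)
t2, e2 = censor([rng.weibullvariate(1.0, 3.0) for _ in range(30)], 2.0)
stat_obs, p_value = permutation_gini_test(t1, e1, t2, e2)
```

Reshuffling labels is justified when the (time, indicator) pairs are exchangeable under the null hypothesis, which also requires a common censoring mechanism in the two groups.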

Nonparametric bounds for the Gini index can be produced with the Giniinc package when data are left censored by a fixed threshold. The bounds are based on decomposition results for the index when the value of the variable (say, income) is recorded only above the threshold, since in that case the threshold defines a partition of the support of the variable (see, e.g., Yitzhaki and Schechtman, 2013).

Estimation with Dependent Censoring

So far, we have focused on Gini index estimation under independent censoring and/or truncation. However, the assumption of independent truncation and censoring mechanisms may not be satisfied in empirical applications, especially when data are obtained from an observational study. One of the few attempts to estimate the Gini index under dependent censoring has been proposed by Lv et al. (2017). The authors propose an estimator of the Gini index based on the inverse probability weighting method, both under independent censoring and under covariate-dependent censoring. The authors focus on the following expression of the Gini index (see David, 1968):

$$G = \frac{2\eta}{\mu} - 1, \qquad \eta = E[T F(T)], \quad \mu = E[T],$$

where $F$ is the distribution function of the lifetime variable $T$. The proposed estimator is given by

$$\hat{G} = \frac{2\hat{\eta}}{\hat{\mu}} - 1,$$

with

$$\hat{\eta} = \frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i F_n(T_i)\,T_i}{K_c(T_i)}, \qquad \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i T_i}{K_c(T_i)}, \qquad \text{and} \qquad F_n(t) = \frac{1}{n}\sum_{i=1}^{n} \frac{\delta_i I(T_i \le t)}{K_c(T_i)},$$

where $\delta_i$ is the censoring indicator, $K_c(t)$ is the Kaplan-Meier estimator of the survival function of the censoring variable, and $T_i$ is the minimum between the censoring variable and the lifetime variable. According to Lv et al. (2017), using the weight $\delta_i / K_c(T_i)$ makes $\hat{\eta}$ and $\hat{\mu}$ consistent and asymptotically normal. The delta method then yields the consistency and asymptotic normality of $\hat{G}$, with an explicit formula for the asymptotic variance composed of 11 terms. In the case of dependent censoring, the authors assume that a covariate vector $z_i$ can explain all the dependence between the lifetime variable $T_i$ and the censoring variable $C_i$. The proposed estimator is based on the local Kaplan-Meier estimator, for estimating the conditional survival function of the censoring variable given the covariates, and on the linear hazard model of Aalen (1980, 1989, 1993), which allows the coefficients to vary with time. The authors show that the Gini estimator remains consistent and asymptotically normal even under covariate-dependent censoring.
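The inverse probability weighting idea under independent censoring can be sketched as follows. This is a minimal illustration with a hypothetical function name, not the authors' code. The Kaplan-Meier estimate of the censoring survival function is evaluated just before each uncensored time, events are assumed to precede censored observations at tied times, and the covariate-dependent case is not covered.

```python
def ipw_gini(times, deltas):
    """IPW plug-in Gini estimate 2 * eta / mu - 1 from right-censored data.

    times  : observed times T_i (minimum of lifetime and censoring time)
    deltas : 1 if the lifetime is observed, 0 if it is censored
    """
    n = len(times)
    # Sort by time; at ties, uncensored observations come first.
    order = sorted(range(n), key=lambda i: (times[i], -deltas[i]))
    kc, at_risk = 1.0, n                  # running Kaplan-Meier value for censoring
    weights = [0.0] * n
    for i in order:
        if deltas[i] == 1:
            weights[i] = 1.0 / kc         # weight delta_i / K_c(T_i)
        else:
            kc *= 1.0 - 1.0 / at_risk     # a censoring time makes K_c drop
        at_risk -= 1

    def F(t):
        """Weighted estimate of the lifetime distribution function at t."""
        return sum(w for s, w in zip(times, weights) if s <= t) / n

    mu = sum(w * t for t, w in zip(times, weights)) / n
    eta = sum(w * F(t) * t for t, w in zip(times, weights) if w > 0) / n
    return 2.0 * eta / mu - 1.0

g_hat = ipw_gini([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
```

Censored observations receive weight zero, and their mass is redistributed to the later uncensored times through the factor $1 / K_c(T_i)$. With no censoring all weights equal 1 and the estimate reduces to the plug-in $2 \sum_i T_i F_n(T_i) / (n \hat{\mu}) - 1$.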