# Exploring Fixed-Accuracy Estimation for Population Gini Inequality Index Under Big Data: A Passage to Practical Distribution-Free Strategies

## Introduction

Economic inequality exists, and measuring it is essential for grasping the impact of current or past economic policies. Among the numerous available measures or indices of income inequality, the Gini inequality index, or Gini coefficient, stands out as the leading measure of inequity in income or wealth distribution.

Suppose $X$ is a non-negative and nondegenerate random variable, with distribution function (d.f.) $F$, that represents the income or wealth of a person or a household. If $X_1$ and $X_2$ are two independent and identically distributed (i.i.d.) copies of $X$, then the population Gini index is defined as follows:

$$G_F = \frac{\Delta}{2\mu}, \quad \text{where } \Delta = E_F|X_1 - X_2| \text{ and } \mu = E_F[X]. \tag{11.1}$$
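As a quick sanity check on the usual form of the index, $E|X_1 - X_2| / (2E[X])$, consider the hypothetical case where $X$ is exponential with rate $\lambda$. This distribution is an illustrative assumption and is not a model taken from the source. In that case $X_1 - X_2$ follows a Laplace distribution with scale $1/\lambda$, so that

$$E|X_1 - X_2| = \frac{1}{\lambda}, \qquad E[X] = \frac{1}{\lambda}, \qquad \frac{E|X_1 - X_2|}{2E[X]} = \frac{1}{2}.$$

The value does not depend on $\lambda$, which reflects the scale invariance of the index: multiplying every income by the same positive constant leaves it unchanged.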

A population Gini index (Gini 1914, 1921) is often based on census data. But, for continuous monitoring or updating of economic policies, it is often crucial to estimate $G_F$ using sample data drawn from a population. In a large region, estimating $G_F$ is carried out by implementing stratified sampling or another appropriate complex sampling design. In a smaller or a localized region, simple random sampling may be used to produce nearly i.i.d. observations. One may refer to miscellaneous sources including Beach and Davidson (1983), Chattopadhyay and De (2014, 2016), Davidson (2009), Davidson and Duclos (2000), De and Chattopadhyay (2017), Gastwirth (1972), Xu (2007), and Arnold and Sarabia (2018).

### Recent Developments in Sequential Estimation Strategies

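We consider $n$ i.i.d. observations $X_1, \ldots, X_n$ from the common d.f. $F$ with support $(0, \infty)$. A customary estimator of $G_F$ is given by:

$$G_n = \frac{\Delta_n}{2\bar{X}_n}, \tag{11.2}$$

where $\bar{X}_n$ is the sample mean and $\Delta_n$ is the sample Gini mean difference (GMD) defined as follows:

$$\Delta_n = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} |X_i - X_j|. \tag{11.3}$$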
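The estimator is easy to compute directly. The sketch below is a minimal illustration, not code from the source: it evaluates the sample GMD as the average absolute difference over all pairs, and again through the sorted-data identity $\sum_{i<j} |X_i - X_j| = \sum_{k=1}^{n} (2k - n - 1) X_{(k)}$, which avoids the quadratic pair loop. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def gini_mean_difference(xs):
    """Sample GMD: average of |x_i - x_j| over all pairs i < j."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(abs(a - b) for a, b in combinations(xs, 2))
    return total / (n * (n - 1) / 2)


def gini_mean_difference_sorted(xs):
    """Same quantity in O(n log n) time via the order statistics."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    ordered = sorted(xs)
    # sum over pairs of |x_i - x_j| equals sum_k (2k - n - 1) * x_(k), k = 1..n
    total = sum((2 * k - n - 1) * x for k, x in enumerate(ordered, start=1))
    return total / (n * (n - 1) / 2)


def sample_gini(xs):
    """Sample Gini index: GMD divided by twice the sample mean."""
    mean = sum(xs) / len(xs)
    return gini_mean_difference_sorted(xs) / (2 * mean)


incomes = [1.0, 2.0, 3.0, 4.0]  # toy data, purely illustrative
print(gini_mean_difference(incomes), sample_gini(incomes))
```

For large samples the sorted version is the practical choice, since the pairwise loop grows quadratically in $n$.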

Clearly, $\bar{X}_n$ and $\Delta_n$ are both U-statistics (Hoeffding, 1948, 1961; Sen, 1981; Lee, 1990; Jurečková and Sen, 1996) of degree 1 and 2 respectively. Exploiting a series of crucial properties of U-statistics, Chattopadhyay and De (2014, 2016) and De and Chattopadhyay (2017) developed novel purely sequential methodologies to estimate the Gini index, $G_F$. Now, we briefly touch upon two fundamental problems when estimating $G_F$.

#### Fixed-Width Confidence Interval (FWCI) Strategy

With a prefixed half-width $d\ (> 0)$, Chattopadhyay and De (2016) constructed an FWCI estimation methodology to come up with $[G_N \pm d]$ for $G_F$ based on the finally accrued sequential data $\{N, X_1, \ldots, X_N\}$, associated with the following stopping variable:

Their statistic, $V_n^2$, used in the definition of the boundary condition in (11.4), was a consistent estimator of $V_F[n^{1/2} G_n]$. They concluded asymptotically (as $d \to 0$):

$$P_F\{G_F \in [G_N - d,\, G_N + d]\} \approx 1 - \alpha, \tag{11.5}$$

with some pre-specified level $\alpha \in (0,1)$ in the spirit of Chow and Robbins (1965). Obviously, the two preassigned numbers $d$ and $\alpha$ were fixed before data started to arrive. One may also refer to Starr (1966a) and Ghosh and Mukhopadhyay (1976) in the context of population mean estimation.

The stopping variable, that is, the terminal sample size $N$ obtained from the sequential procedure (11.4), was shown to satisfy a number of desirable asymptotic properties including: (a) the *first-order asymptotic efficiency* property (Ghosh and Mukhopadhyay, 1981), and (b) the *asymptotic consistency* property (Chow and Robbins, 1965).
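The boundary-crossing idea behind such a procedure can be sketched in a few lines. The heuristic is that if $n^{1/2}(G_n - G_F)$ is approximately normal with variance $\xi^2$, then a fixed sample of size about $(z_{\alpha/2}\,\xi/d)^2$ would suffice, and a purely sequential rule replaces the unknown $\xi^2$ by an estimate that is updated after each new observation. The code below is a generic Chow-Robbins-type sketch and is not claimed to reproduce the exact boundary in (11.4). In particular, the jackknife variance estimate is a stand-in for the authors' statistic, and the pilot size, the added $1/n$ term, the `max_n` cap, and all function names are assumptions made for illustration.

```python
import random
from statistics import NormalDist


def sample_gini(xs):
    """Sample Gini index: GMD over twice the mean, via the sorted-data formula."""
    n = len(xs)
    ordered = sorted(xs)
    pair_sum = sum((2 * k - n - 1) * x for k, x in enumerate(ordered, start=1))
    gmd = pair_sum / (n * (n - 1) / 2)
    return gmd / (2 * sum(xs) / n)


def jackknife_scaled_variance(xs):
    """n times the jackknife variance of G_n (stand-in estimate of xi^2)."""
    n = len(xs)
    leave_one_out = [sample_gini(xs[:i] + xs[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    var_gn = (n - 1) / n * sum((g - center) ** 2 for g in leave_one_out)
    return n * var_gn


def sequential_fwci(draw, d, alpha, pilot=10, max_n=5000):
    """Add one observation at a time until n >= (z/d)^2 * (V_n^2 + 1/n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    xs = [draw() for _ in range(pilot)]  # pilot sample (size is an assumption)
    while len(xs) < max_n:
        n = len(xs)
        v2 = jackknife_scaled_variance(xs)
        # the 1/n term guards against stopping too early (assumed device)
        if n >= (z / d) ** 2 * (v2 + 1 / n):
            break
        xs.append(draw())
    g = sample_gini(xs)
    return len(xs), (g - d, g + d)


rng = random.Random(7)  # simulated exponential incomes, illustrative only
n_final, interval = sequential_fwci(lambda: rng.expovariate(1.0), d=0.1, alpha=0.05)
print(n_final, interval)
```

The interval always has width $2d$ by construction; what the data determine is how many observations are needed before the rule allows sampling to stop.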

#### Minimum Risk Point Estimation (MRPE) Strategy

In a sequel, De and Chattopadhyay (2017) developed a remarkable purely sequential methodology to construct an MRPE of $G_F$ in the spirit of Robbins (1959). The loss function under consideration combined the squared error loss (SEL) due to estimation error with a linear cost of sampling. It had the following form:

$$L_n = A(G_n - G_F)^2 + cn, \tag{11.6}$$

where $A\ (> 0)$ is a known weight and $c$ is a known positive constant representing the cost for sampling one observation. The two preassigned numbers $A$ and $c$ were fixed before data began arriving. One may also refer to Starr (1966b), Mukhopadhyay (1978) and Ghosh and Mukhopadhyay (1979) in the context of population mean estimation.
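The trade-off encoded in a loss of this kind, squared estimation error weighted by $A$ plus a cost $c$ per observation, can be made explicit with a heuristic calculation. Suppose, as an assumption for illustration, that $n\,E_F[(G_n - G_F)^2] \to \xi^2$ for some $\xi > 0$; the symbol $\xi$ is introduced here and is not taken from the source. The fixed-sample-size risk, its minimizer, and the minimum risk are then approximately

$$R_n(c) \approx \frac{A\xi^2}{n} + cn, \qquad n^* = \left(\frac{A}{c}\right)^{1/2} \xi, \qquad R_{n^*}(c) \approx 2cn^*.$$

Because $\xi$ is unknown, $n^*$ cannot be used directly, which is what motivates estimating it sequentially as the data accrue.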

De and Chattopadhyay (2017) constructed an MRPE methodology to come up with $G_N$ for $G_F$ based on the finally accrued sequential data $\{N, X_1, \ldots, X_N\}$, associated with the following stopping variable:

Again, the statistic $V_n$ used in the definition of the boundary condition was a consistent estimator of $V_F^{1/2}[n^{1/2} G_n]$.

The associated sequential risk became asymptotically (as $c \to 0$) close to the minimum fixed-sample-size risk, much in the spirit of Robbins (1959). They developed technical details along the lines of Ghosh and Mukhopadhyay (1979) and Sen and Ghosh (1981) and proved two elegant results, among others of substantial importance: the associated stopping variable $N$ from (11.7) enjoyed (a) the *first-order asymptotic efficiency* property and (b) the *first-order asymptotic risk efficiency* property in the spirit of Ghosh and Mukhopadhyay (1981).
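A Robbins-type stopping rule of this flavor can be sketched as follows. The idea is that the risk-minimizing fixed sample size is proportional to $(A/c)^{1/2}$ times the unknown standard deviation scale of $n^{1/2} G_n$, so sampling continues until $n$ exceeds $(A/c)^{1/2}$ times a current estimate of that scale. This is a hedged illustration and not the exact rule (11.7): the jackknife estimate replaces the authors' statistic, and the pilot size, the $1/n$ correction, the `max_n` cap, and the names `weight` and `cost` (standing for $A$ and $c$) are assumptions.

```python
import math
import random


def sample_gini(xs):
    """Sample Gini index: GMD over twice the mean, via the sorted-data formula."""
    n = len(xs)
    ordered = sorted(xs)
    pair_sum = sum((2 * k - n - 1) * x for k, x in enumerate(ordered, start=1))
    gmd = pair_sum / (n * (n - 1) / 2)
    return gmd / (2 * sum(xs) / n)


def jackknife_scale(xs):
    """Square root of n times the jackknife variance of G_n (stand-in for V_n)."""
    n = len(xs)
    leave_one_out = [sample_gini(xs[:i] + xs[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    var_gn = (n - 1) / n * sum((g - center) ** 2 for g in leave_one_out)
    return math.sqrt(n * var_gn)


def sequential_mrpe(draw, weight, cost, pilot=10, max_n=5000):
    """Add one observation at a time until n >= sqrt(A/c) * (V_n + 1/n)."""
    ratio = math.sqrt(weight / cost)
    xs = [draw() for _ in range(pilot)]  # pilot sample (size is an assumption)
    while len(xs) < max_n:
        n = len(xs)
        # the 1/n term guards against stopping too early (assumed device)
        if n >= ratio * (jackknife_scale(xs) + 1 / n):
            break
        xs.append(draw())
    return len(xs), sample_gini(xs)


rng = random.Random(11)  # simulated exponential incomes, illustrative only
n_final, estimate = sequential_mrpe(lambda: rng.expovariate(1.0), weight=400.0, cost=0.01)
print(n_final, estimate)
```

Lowering `cost` relative to `weight` raises the boundary, so the rule keeps sampling longer before reporting the point estimate.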

In a more recent paper, Mukhopadhyay et al. (2020) followed up on De and Chattopadhyay's (2017) strategies by broadening their analysis under a weighted version of the original loss function from (11.6). They came up with updated sequential FWCI and MRPE methodologies, both with the customarily desired asymptotic characteristics. Going back to the original FWCI and MRPE problems for $G_F$ and their preceding and/or subsequent sequential sampling strategies, it is readily seen that the researchers restricted themselves to gathering one observation at a time. They were possibly over-conscious about keeping the terminal sample size rather small within reason.