# Simulation Studies

In Sections 8.2 and 8.3, we established that the proposed purely sequential procedures for relative-accuracy confidence set estimation and for MRRPE enjoy a number of asymptotic optimality properties as $d$ or $c$ approaches 0, respectively. In this section, we assess the finite-sample performance of these procedures for moderate to small values of $d$ and $c$ via Monte Carlo simulations.

## Confidence Set Estimation

Let us first consider the sequential procedure (8.19) for relative-accuracy confidence set estimation of the Gini index from Section 8.2. For preassigned values of $d$, we implemented the sequential procedure corresponding to the stopping rule (8.19) with pilot sample size $m = \max\{4, \lceil \eta(d)^{1/(1+\gamma)} \rceil\}$, fixing $\gamma = 0.2$ for the Pareto case and $\gamma = 1$ for the rest. We evaluated the performance by drawing independent samples from several distributions that are specifically relevant to income or wealth data, namely gamma, log-normal, Pareto, and exponential. Tables 8.1 and 8.2 correspond to the cases $a = 1$ and $a = 2$, respectively, with $a$ given in (8.6), and summarize a limited number of scenarios.

TABLE 8.1

Performance of the Purely Sequential Procedure for Relative-Accuracy Confidence Set Estimation of $G$ for Different Distributions with $a = 1$ (10,000 replications)

Each distribution occupies two rows; the second row gives the quantities shown in parentheses in the column headers.

| Distribution (parameters) | $d$ ($1-\alpha$) | $\lceil n_d \rceil$ | $\bar{N}_d$ ($s(\bar{N}_d)$) | $\bar{N}_d / n_d$ | CP ($s(\mathrm{CP})$) | $G$ ($\hat{G}_{N_d}$) |
|---|---|---|---|---|---|---|
| Gamma | 0.03 | 461 | 463.98 | 1.0066 | 0.8981 | 0.4244 |
| shape = 1.5, rate = 1 | 0.90 | | 0.6133 | | 0.0030 | 0.4240 |
| Log-normal | 0.025 | 301 | 314.33 | 1.0466 | 0.9492 | 0.1403 |
| meanlog = 0.75, sdlog = 0.25 | 0.95 | | 0.4452 | | 0.0022 | 0.1400 |
| Pareto | 0.09 | 721 | 693.92 | 0.9635 | 0.8817 | 0.1424 |
| scale = 5, shape = 4.01 | 0.90 | | 2.6246 | | 0.0032 | 0.1401 |
| Exponential | 0.05 | ≈ 800 | 799.7 | 0.9992 | 0.9513 | 0.5000 |
| rate = 0.4 | 0.95 | | 0.9167 | | 0.0021 | 0.4998 |
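As a rough illustration of the quantity being simulated, the sketch below computes a sample Gini index from a single gamma sample matching the first scenario of Table 8.1 (shape = 1.5, rate = 1, 461 observations). It assumes the estimator is Gini's mean difference (a U-statistic) divided by twice the sample mean; that form, the function name `sample_gini`, and the seed are illustrative assumptions and are not taken from the text.

```python
import random


def sample_gini(xs):
    """Sample Gini index: Gini's mean difference divided by twice the mean.

    Assumed form of the estimator (hypothetical helper, not from the text).
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    ordered = sorted(xs)
    # Sum over pairs i < j of |x_i - x_j| equals
    # sum_i (2i - n - 1) * x_(i) for the order statistics, i = 1..n.
    pair_sum = sum((2 * i - n - 1) * x for i, x in enumerate(ordered, start=1))
    mean_difference = 2.0 * pair_sum / (n * (n - 1))
    sample_mean = sum(ordered) / n
    return mean_difference / (2.0 * sample_mean)


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
# random.gammavariate takes (shape, scale); rate = 1 corresponds to scale = 1.
sample = [rng.gammavariate(1.5, 1.0) for _ in range(461)]
print(round(sample_gini(sample), 4))
```

A single run like this should land near the population value 0.4244, with sampling noise; the tabulated figures average over 10,000 such runs with a random, data-driven sample size.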

These tables present the estimated expected sample size $\bar{N}_d$ (an estimator of $E_F[N_d]$), its standard error $s(\bar{N}_d)$, the estimated coverage probability CP (an estimator of the probability in part (ii) of Theorem 8.1), its standard error $s(\mathrm{CP})$, and the final estimator $\hat{G}_{N_d}$ of the Gini index, all based on 10,000 replications. For the different values of $d$, $\alpha$, and the given parameters of the respective distributions, we also report the population Gini index $G$ and the optimal oracle sample size $\lceil n_d \rceil$. The fifth columns of Tables 8.1 and 8.2 show that the ratios of the average sample sizes to the optimal sample sizes are close to 1 in all scenarios, which supports the asymptotic first-order efficiency property of $N_d$ given in Theorem 8.2.

TABLE 8.2

Performance of the Purely Sequential Procedure for Relative-Accuracy Confidence Set Estimation of $G$ for Different Distributions with $a = 2$ (10,000 replications)

Each distribution occupies two rows; the second row gives the quantities shown in parentheses in the column headers.

| Distribution (parameters) | $d$ ($1-\alpha$) | $\lceil n_d \rceil$ | $\bar{N}_d$ ($s(\bar{N}_d)$) | $\bar{N}_d / n_d$ | CP ($s(\mathrm{CP})$) | $G$ ($\hat{G}_{N_d}$) |
|---|---|---|---|---|---|---|
| Gamma | 0.04 | 305 | 304.25 | 0.9987 | 0.8929 | 0.3750 |
| shape = 2, rate = 1.5 | 0.90 | | 0.6249 | | 0.0031 | 0.3744 |
| Log-normal | 0.03 | 587 | 582.86 | 0.9932 | 0.8952 | 0.1680 |
| meanlog = 0.6, sdlog = 0.3 | 0.90 | | 0.8005 | | 0.0031 | 0.1676 |
| Pareto | 0.09 | 1311 | 1268.54 | 0.9680 | 0.8805 | 0.1424 |
| scale = 2.25, shape = 4.01 | 0.90 | | 4.2764 | | 0.0032 | 0.1409 |
| Exponential | 0.07 | 1046 | 1036.86 | 0.9919 | 0.9439 | 0.5000 |
| rate = 0.5 | 0.95 | | 1.5595 | | 0.0023 | 0.4998 |
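The population values of $G$ reported for the untruncated distributions can be checked against the standard closed-form Gini indices of these families. The sketch below is only a sanity check under that assumption; the function names are illustrative and do not come from the text.

```python
import math


def gini_gamma(shape):
    # Gamma: G = Gamma(k + 1/2) / (sqrt(pi) * Gamma(k + 1)); does not depend on the rate.
    log_ratio = math.lgamma(shape + 0.5) - math.lgamma(shape + 1.0)
    return math.exp(log_ratio) / math.sqrt(math.pi)


def gini_lognormal(sdlog):
    # Log-normal: G = 2 * Phi(sdlog / sqrt(2)) - 1 = erf(sdlog / 2); does not depend on meanlog.
    return math.erf(sdlog / 2.0)


def gini_pareto(shape):
    # Pareto (type I) with shape > 1: G = 1 / (2 * shape - 1); does not depend on the scale.
    return 1.0 / (2.0 * shape - 1.0)


def gini_exponential():
    # Exponential is gamma with shape 1, so G = 1/2 for every rate.
    return 0.5


scenarios = [
    ("Gamma, shape = 1.5", gini_gamma(1.5)),
    ("Gamma, shape = 2", gini_gamma(2.0)),
    ("Log-normal, sdlog = 0.25", gini_lognormal(0.25)),
    ("Log-normal, sdlog = 0.3", gini_lognormal(0.3)),
    ("Pareto, shape = 4.01", gini_pareto(4.01)),
    ("Exponential", gini_exponential()),
]
for label, value in scenarios:
    print(f"{label}: {value:.5f}")
```

These closed forms agree with the $G$ entries of Tables 8.1 and 8.2 to within 0.0001. They do not apply directly to the truncated exponential and gamma cases used for point estimation, where truncation shifts $G$ slightly.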

The standard errors of $\bar{N}_d$ are small in all scenarios. The estimated coverage probabilities in column six of Tables 8.1 and 8.2 are very close to the target level $1 - \alpha$ for all chosen distributions, which supports the asymptotic consistency property of the sequential procedure. In the Pareto case, the average final sample sizes are slightly smaller than the corresponding optimal sample sizes, which leads to a small loss in the attained coverage probabilities. This may be because, for the given shape and scale parameters of the Pareto distributions, the values of $\xi^2$ and its estimator $V$ are close to zero, leading to early stopping of the purely sequential procedure. One way to deal with this problem may be to choose a smaller value of $\gamma$ to prevent undersampling. However, if the chosen value of $\gamma$ is too small, it may lead to oversampling. The optimal selection of $\gamma$, for a given scenario or in general, is an open question and is beyond the scope of this article. The last columns of Tables 8.1 and 8.2 show that the final estimator $\hat{G}_{N_d}$ estimates the population Gini index $G$ very accurately. Overall, the simulation results support the theoretical properties, and the finite-sample performance of the purely sequential procedure of Section 8.2 is very encouraging.
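The text does not say how the standard error of the estimated coverage probability was computed. A minimal sketch, assuming the usual binomial standard error $\sqrt{\mathrm{CP}(1-\mathrm{CP})/B}$ over $B = 10{,}000$ independent replications (the function name is hypothetical):

```python
import math


def coverage_standard_error(cp, replications):
    # Binomial standard error of a proportion estimated from independent replications.
    # Assumed formula; the text does not state how s(CP) was obtained.
    return math.sqrt(cp * (1.0 - cp) / replications)


# Estimated coverage probabilities from the first three rows of Table 8.1.
for cp in (0.8981, 0.9492, 0.8817):
    print(cp, round(coverage_standard_error(cp, 10_000), 4))
```

Under this assumption the computed values agree with the tabulated standard errors for these rows to within 0.0001, which also shows that the reported accuracy of CP is driven almost entirely by the number of replications.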

## Point Estimation

We implemented the purely sequential procedure corresponding to (8.26) for MRRPE with $A$ = \$500000, $c$ = \$0.5, $\gamma = 1$, and pilot sample size $m = \max\{4, \lceil (A/c)^{1/(2(1+\gamma))} \rceil\}$. The performance of the sequential procedure was evaluated for small to moderate sample sizes by drawing random samples from four different income distributions, namely exponential, gamma, log-normal, and Pareto. Tables 8.3 and 8.4 correspond to $a = 1$ and $a = 2$, respectively, where $a$ is given in (8.8).
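To make the pilot sample size concrete, the sketch below evaluates it for the settings used here. It assumes the pilot-size formula reads $m = \max\{4, \lceil (A/c)^{1/(2(1+\gamma))} \rceil\}$; the function name is illustrative.

```python
import math


def pilot_sample_size(A, c, gamma):
    # Assumed reading of the pilot-size formula:
    # m = max{4, ceil((A / c) ** (1 / (2 * (1 + gamma))))}.
    return max(4, math.ceil((A / c) ** (1.0 / (2.0 * (1.0 + gamma)))))


print(pilot_sample_size(500_000, 0.5, 1.0))
```

With $A/c = 10^6$ and $\gamma = 1$ the exponent is $1/4$, so $(A/c)^{1/4} \approx 31.6$ and the pilot size is 32 under this reading.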

Since negative moments do not exist for the exponential and gamma distributions under consideration, we assumed truncated support in such cases, that is, we assumed that the data came from truncated exponential or truncated gamma distributions having support $(t, \infty)$ with $t = 0.001$. In the cases of log-normal and Pareto, since all negative moments exist, we assumed full support $(0, \infty)$. Tables 8.3 and 8.4 summarize the estimated expected sample size $\bar{N}_c$ (which estimates $E_F[N_c]$), its standard error $s(\bar{N}_c)$, the average risk $\bar{R}_{N_c}$ (which estimates $R_{N_c}(G)$), its standard error $s(\bar{R}_{N_c})$, and the final estimator $\hat{G}_{N_c}$ of the Gini index $G$ from 10,000 replications.
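The text does not describe how the truncated samples were generated. One standard way to draw from an exponential or gamma distribution restricted to $(t, \infty)$ is sketched below; the function names, the seed, and the choice of rejection sampling for the gamma case are assumptions made for illustration only.

```python
import random


def truncated_exponential(rng, rate, t):
    # Memoryless property: X given X > t has the same law as t + Exp(rate).
    return t + rng.expovariate(rate)


def truncated_gamma(rng, shape, rate, t):
    # Simple rejection sampling: redraw until the value exceeds t.
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    while True:
        x = rng.gammavariate(shape, 1.0 / rate)
        if x > t:
            return x


rng = random.Random(7)  # arbitrary seed, for reproducibility only
t = 0.001
exp_sample = [truncated_exponential(rng, 0.5, t) for _ in range(409)]
gamma_sample = [truncated_gamma(rng, 7.5, 1.5, t) for _ in range(314)]
print(min(exp_sample) >= t, min(gamma_sample) > t)
```

Because $t = 0.001$ is tiny relative to the scale of these distributions, rejection almost never triggers for the gamma case, so the loop costs essentially one draw per observation.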

For the given distributions, we also provide the values of the population Gini index $G$ and the optimal sample size $\lceil n_c \rceil$ that minimizes the expected cost. The ratios of the estimated expected sample sizes to the optimal sample sizes are close to 1 in the given scenarios, which supports the asymptotic first-order efficiency property stated in Theorem 8.3. The last columns of Tables 8.3 and 8.4 show that, on average, the overall risk of estimating $G$ using the final sample size $N_c$ and the accrued data is approximately equal to the minimum possible risk $R^*(G)$ given in (8.25). This supports the asymptotic first-order risk efficiency property stated in Theorem 8.4. Moreover, the final estimator $\hat{G}_{N_c}$ accurately estimated the population Gini index $G$ under all four scenarios. Based on our broad-ranging set of simulations, we feel comfortable concluding that the finite-sample performance of the proposed purely sequential procedure (8.26) is very encouraging.

TABLE 8.3

Performance of the Purely Sequential Procedure for Minimum Relative Risk Point Estimation of $G$ with $a = 1$, $A$ = \$500000, $\gamma = 1$, and $c$ = \$0.5 (10,000 replications)

Each distribution occupies two rows; the second row gives the quantities shown in parentheses in the column headers.

| Distribution (parameters) | $G$ ($\hat{G}_{N_c}$) | $\lceil n_c \rceil$ | $\bar{N}_c$ ($s(\bar{N}_c)$) | $\bar{N}_c / n_c$ | $\bar{R}_{N_c}$ ($s(\bar{R}_{N_c})$) | $\bar{R}_{N_c} / R^*(G)$ |
|---|---|---|---|---|---|---|
| Exponential | 0.4998 | 409 | 411.12 | 1.0073 | 408 | 0.9996 |
| rate = 0.5, t = 0.001 | 0.4996 | | 0.2665 | | 0.2680 | |
| Gamma | 0.2026 | 314 | 315.82 | 1.0075 | 312.01 | 0.9953 |
| shape = 7.5, rate = 1.5, t = 0.001 | 0.2023 | | 0.1986 | | 0.2006 | |
| Log-normal | 0.1403 | 280 | 280.92 | 1.005 | 276.71 | 0.9902 |
| meanlog = 2, sdlog = 0.25 | 0.1401 | | 0.2271 | | 0.2299 | |
| Pareto | 0.0907 | 913 | 884.56 | 0.9692 | 882.69 | 0.9672 |
| scale = 45, shape = 6.01 | 0.0903 | | 1.2633 | | 1.2650 | |

TABLE 8.4

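Performance of the Purely Sequential Procedure for Minimum Relative Risk Point Estimation of $G$ with $a = 2$, $A$ = \$500000, $\gamma = 1$, and $c$ = \$0.5 (10,000 replications)

Each distribution occupies two rows; the second row gives the quantities shown in parentheses in the column headers.

| Distribution (parameters) | $G$ ($\hat{G}_{N_c}$) | $\lceil n_c \rceil$ | $\bar{N}_c$ ($s(\bar{N}_c)$) | $\bar{N}_c / n_c$ | $\bar{R}_{N_c}$ ($s(\bar{R}_{N_c})$) | $\bar{R}_{N_c} / R^*(G)$ |
|---|---|---|---|---|---|---|
| Exponential | 0.4992 | 61 | 66.70 | 1.0959 | 613.31 | 1.0077 |
| rate = 1.5, t = 0.001 | 0.4982 | | 0.0979 | | 1.0388 | |
| Gamma | 0.5194 | 77 | 80.99 | 1.0618 | 763.63 | 1.0012 |
| shape = 0.9, rate = 1.1, t = 0.001 | 0.5182 | | 0.1195 | | 1.2478 | |
| Log-normal | 0.1955 | 226 | 223.19 | 0.9912 | 2211.01 | 0.9819 |
| meanlog = 1.55, sdlog = 0.35 | 0.1949 | | 0.2445 | | 2.4597 | |
| Pareto | 0.0907 | 754 | 726.37 | 0.9635 | 7252.22 | 0.9619 |
| scale = 16, shape = 6.01 | 0.0902 | | 1.1460 | | 11.46651 | |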