JRR Variance Estimates for Longitudinal Fuzzy Measures of Multi-Dimensional Poverty

Gianni Betti, Francesca Gagliardi and Vijay Verma

Introduction

As highlighted in Chapter 4, one of the most important goals of the 2030 UN Agenda for Sustainable Development is to ‘eradicate poverty, in all its forms and dimensions’ (United Nations, 2015). The need to reduce poverty and to launch anti-poverty programmes and policies has also been expressed by the European Commission (2010) in its Europe 2020 Strategy. The Strategy consists of a series of policy objectives, called ‘headline targets’, to be reached by 2020. They include targets for reducing the cross-sectional at-risk-of-poverty rate (ARPR), known in the literature as the headcount ratio or as FGT(0) in the family of measures proposed by Foster et al. (1984), and the at-persistent-risk-of-poverty rate, which monitors poverty over time in a longitudinal context. Moreover, poverty measures are most useful to policymakers and researchers when they are finely disaggregated, at least to geographic units smaller than entire countries. This is the aim of the European Commission's DG Regional Policy, which uses sub-national/regional-level data (NUTS 2)1 on social indicators to monitor the ‘headline targets’ at the regional level.

In the framework of the Europe 2020 Strategy, purely monetary measures of poverty are presented together with new multi-dimensional measures of poverty. Measures of multi-dimensional poverty have also been developed in the literature; applications can be found in Whelan and Maitre (2007), Betti et al. (2015) and Goedeme (2013). Betti and Verma (2008) elaborated a fuzzy formulation of multi-dimensional measures of deprivation; one approach to modelling fuzzy measures of multi-dimensional poverty is described in the next section. The basic concept of the fuzzy approach is to treat poverty and deprivation as a matter of degree, replacing the conventional poor/non-poor dichotomy. An individual's degree of monetary poverty is determined by that person's position in the income distribution. This methodology facilitates the inclusion of other dimensions of deprivation in the analysis: by weighting indicators of deprivation appropriately to reflect their dispersion and correlation, we can construct measures of non-monetary deprivation in its various dimensions. Verma et al. (2017) extended the methodology to longitudinal fuzzy multi-dimensional measures of deprivation. The extension from cross-sectional to longitudinal fuzzy measures is by no means straightforward: fuzzy measures of multi-dimensional poverty in the cross-sectional context (e.g. those developed in the next section) need to be extended on the basis of fuzzy-set operations appropriate for analysing multi-dimensional poverty over time. This is developed in detail in the third section, on fuzzy-set operations. The methodology is fairly general and can yield a variety of longitudinal measures. It provides the basis for the primary objective of this chapter: the variance estimation of longitudinal fuzzy measures of multi-dimensional poverty.

In population-based sample surveys of households or persons, sampling and non-sampling errors affect the accuracy of the estimates, and these errors are particularly relevant when the concern is the analysis of income poverty and inequality. Poverty and inequality indicators are complex statistics, based on large sample surveys that normally have complex sampling designs. Unfortunately, information on sampling errors and design effects is often not obtained, or at least not reported and used in the analysis of the substantive results of the surveys. It appears to be common practice to publish poverty statistics without confidence intervals, and even when standard errors are published, it is not always clear whether they have been calculated correctly. These shortcomings exist even in the most developed settings: standard errors are not presented in most official Eurostat publications on poverty indicators, for instance, but as Goedeme (2013) notes, ‘this is not a feature unique to Eurostat publications’.

In the literature, several works are devoted to the calculation of standard errors of traditional measures of monetary poverty at the national level. In the present context, by traditional measures we mean indicators officially adopted by international statistical institutions such as Eurostat, for example the ARPR, defined as the proportion of the population with an income below a certain threshold. Applications of different methods for the variance estimation of poverty measures are found in Binder (1983), Binder and Patak (1994), Binder and Kovacevic (1995), Preston (1995), Kovacevic and Yung (1997), Deville (1999), Zheng (2001), Berger and Skinner (2003), Demnati and Rao (2004), Osier (2009), Goedeme and Rottiers (2011), Muennich and Zins (2011), Verma and Betti (2011), Graf and Tille (2014), Alper and Berger (2015) and Berger and Priam (2016). Furthermore, in recent years, the literature on the estimation of the standard errors of traditional measures of poverty at the local (sub-national, regional) level has expanded: see Verma et al. (2017) for an analysis covering all member countries of the Organisation for Economic Co-operation and Development (OECD); Verma et al. (2010a, 2010b) on survey errors and on the robustness of indicators at the regional level; Betti et al. (2012) on a methodology for constructing sub-national indicators of deprivation; Giusti et al. (2012) on small area method applications; Betti et al. (2016a) on the use of a cumulation methodology; and Elbers et al. (2003) on poverty mapping.

By contrast, relatively little attention has been paid to the calculation of standard errors for multi-dimensional poverty measures. Because the aim is to estimate these measures at the regional NUTS 2 level, where sample sizes are much smaller than those of the corresponding country-level samples, it is essential to develop an adequate methodology for estimating standard errors. Betti et al. (2018) extended variance estimation beyond measures of monetary poverty, specifically to fuzzy formulations of those measures and, as a corollary, to multi-dimensional measures of deprivation, which by their very nature are a matter of degree.

In this chapter, we further extend variance estimation to longitudinal multi-dimensional fuzzy poverty measures. The measures considered are based on a fuzzy representation of individuals’ propensity for deprivation in monetary and diverse non-monetary dimensions and are derived from sample surveys with complex designs and fairly large samples. In particular, we describe and adopt a new longitudinal measure based on the fuzzy-set approach to multi-dimensional poverty, as proposed by Verma et al. (2017): the ‘fuzzy at-persistent-risk-of-poverty rate’. The fourth section on data and variance presents a practical methodology for variance estimation - in particular, the jack-knife repeated replication (JRR) method - for multi-dimensional measures of poverty and deprivation of households and individuals in a longitudinal context. We describe and quantitatively illustrate calculation procedures and difficulties in producing reliable and robust estimates of sampling errors in such measures. Some of the problems encountered are identified, and solutions are provided in the context of actual conditions. The fifth section describes the micro-dataset used, which comes from the Spanish EU-SILC (EU Statistics on Income and Living Conditions) survey. Finally, the sixth section presents an empirical analysis, and the last section offers concluding remarks, indicating fruitful directions for further research.

Modelling Fuzzy Measures of Multi-Dimensional Poverty

This section describes the basic cross-sectional fuzzy measures of monetary and non-monetary deprivation. The fuzzy-set approach (Zadeh, 1965) treats poverty as a matter of degree, replacing the simple {0,1} (poor/non-poor) dichotomy into which individuals or households are divided in the traditional approach. In a fuzzy conceptualisation, all individuals are subject to poverty, but to varying degrees; each individual has a certain propensity for poverty anywhere in the range [0,1]. Treating poverty as a matter of degree that applies to all members of the population, rather than simply as a ‘yes-no’ state, has several advantages. These are summarised by Verma et al. (2017) as follows:

  1. Non-monetary poverty depends on forced non-access to various facilities or possessions that determine basic living conditions. An individual might have access to some of them but not others. Clearly, therefore, non-monetary poverty is inherently a matter of degree, and a quantitative approach such as the present one is essential.
  2. The fuzzy approach provides more robust indicators of poverty in a longitudinal context. The conventional approach measures income mobility simply in terms of movement across a designated poverty line and does not reflect the actual magnitude of the changes affecting individuals at all points in the income distribution. Consequently, the degree of mobility of people near this line tends to be over-emphasised, while that of people far from the line is largely ignored.

Apart from the various methodological choices involved in constructing conventional poverty measures, the introduction of fuzzy measures adds further factors on which choices must be made. The fundamental factor is the choice of ‘membership functions’: a quantitative specification of each person's propensity for poverty, given the level and distribution of income in the population.

Cross-Sectional Fuzzy Membership Function

Betti and Verma (2008) proposed two fuzzy membership functions (MF), based on the seminal contributions of Cerioli and Zani (1990) and Cheli and Lemmi (1995), which are further elaborated in Betti et al. (2015).2

In the generalised form, these MF are defined for any individual i as follows:

where X is equivalised income in the case of monetary poverty, or the overall score s in the case of non-monetary poverty; $w_\gamma$ is the sample weight of the individual of rank γ in the ascending distribution; and $\alpha_K$ (K = 1, 2) are parameters corresponding, respectively, to the monetary and non-monetary dimensions of poverty. The $\alpha_K$ are estimated so that the mean of the corresponding MF equals the ARPR calculated on the basis of the official poverty line. Betti and Verma (2008) call the monetary indicator fuzzy monetary (FM) and the non-monetary indicator fuzzy supplementary (FS).
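To make the calibration step concrete, the Python sketch below computes membership values from weighted ranks and finds α by bisection so that the weighted mean of the propensities matches a target ARPR. It rests on an explicit assumption: the code uses the form $\mu_i = (1 - F_i)^{\alpha - 1}(1 - L_i)$, where $1 - F_i$ is the weighted share of individuals ranked above i and $1 - L_i$ is their share of total weighted income. This is our reading of Betti and Verma (2008) and should be checked against Equation (7.1); the function names are illustrative.

```python
# Sketch: fuzzy membership function with alpha calibrated to a target ARPR.
# ASSUMPTION: mu_i = (1 - F_i) ** (alpha - 1) * (1 - L_i), where 1 - F_i is the
# weighted share of people ranked above i and 1 - L_i is their share of total
# weighted income; check this form against Equation (7.1) before relying on it.


def membership(x, w, alpha):
    """Propensities for values x (higher x = less poor) with sample weights w.

    Shares are normalised by totals over ranks 2..n, so the poorest person gets 1
    and the richest gets 0. Ties are ignored for brevity; needs n >= 2 and x > 0.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])   # ascending distribution
    n = len(order)
    w_above = [0.0] * n    # sum of w over ranks strictly above rank r
    wx_above = [0.0] * n   # sum of w * x over ranks strictly above rank r
    for r in range(n - 2, -1, -1):
        j = order[r + 1]
        w_above[r] = w_above[r + 1] + w[j]
        wx_above[r] = wx_above[r + 1] + w[j] * x[j]
    mu = [0.0] * n
    for r, i in enumerate(order):
        one_minus_f = w_above[r] / w_above[0]
        one_minus_l = wx_above[r] / wx_above[0]
        mu[i] = one_minus_f ** (alpha - 1) * one_minus_l
    return mu


def weighted_mean(values, w):
    return sum(v * wi for v, wi in zip(values, w)) / sum(w)


def calibrate_alpha(x, w, target, lo=1.0, hi=50.0, tol=1e-10):
    """Bisection for alpha so that the weighted mean propensity equals target."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if weighted_mean(membership(x, w, mid), w) > target:
            lo = mid        # mean too high: a larger alpha lowers it
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is adequate here because, with $0 \le 1 - F_i \le 1$, each propensity, and hence the weighted mean, is non-increasing in α; the bracket [1, 50] is an arbitrary choice that must contain the solution.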

Construction of the FS Measure

Betti et al. (2015) proposed a step-by-step procedure for measuring FS, as follows. First, we identify the items to be included in the index or indices, which should be the most meaningful and useful ones (see Eurostat 2000, 2002). It is desirable to avoid items for which issues of choice in terms of possession versus non-possession cannot be satisfactorily resolved, items whose possession is relatively rare (e.g. possession of a boat), and items whose degree of comparability among regions or countries is insufficient. Then, for each item, we determine a quantitative deprivation indicator in the range [0,1]; these indicators are used in an initial exploratory factor analysis to identify the underlying ‘dimensions’. By dimension, we mean a distinct group of items of non-monetary poverty, ideally independent of other dimensions, which describes a particular facet of living conditions. After this exploratory factor analysis, we rearrange some items among the dimensions identified to create more meaningful groups; a confirmatory factor analysis is then needed to test the goodness of fit of this final grouping. The weights assigned to the items are determined within each dimension and are based on two elements, namely the dispersion of the item (prevalence weights) and its correlation with other items in the same dimension (correlation weights); for a detailed description of the weight construction, see Betti and Verma (2008).

The score within each dimension h, $s_{ih}$, is calculated as a weighted mean of the items in this dimension; the overall score $s_i$ is defined as a simple average of the dimension scores $s_{ih}$, thus giving equal importance to all the dimensions, each of which represents a different facet of FS poverty. Finally, as explained above, FS is defined in Equation (7.1). The positive results achieved by applying this methodology to fields other than poverty indicate its applicability and robustness (see e.g. Aassve et al., 2007; Betti et al., 2011, 2016b; Belhadj, 2015; Betti, 2017).
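The aggregation described above (a weighted mean within each dimension, then an unweighted mean across dimensions) can be sketched in a few lines of Python. The item names, dimensions and weights below are hypothetical; in the method, each item weight combines prevalence and correlation weights as in Betti and Verma (2008), which this sketch simply takes as given.

```python
# Sketch of the FS overall score. Item weights are HYPOTHETICAL inputs; in the
# method they combine prevalence and correlation weights (Betti and Verma, 2008).


def dimension_score(indicators, weights):
    """Weighted mean of one person's item deprivation indicators (each in [0, 1])."""
    return sum(d * w for d, w in zip(indicators, weights)) / sum(weights)


def overall_score(person, dimensions):
    """person: item -> indicator; dimensions: dimension -> {item: weight}.

    The overall score is the simple average of the dimension scores, so every
    dimension counts equally.
    """
    scores = [
        dimension_score([person[item] for item in items], list(items.values()))
        for items in dimensions.values()
    ]
    return sum(scores) / len(scores)


# Hypothetical example: two dimensions with two items each.
dimensions = {
    "housing": {"leaking_roof": 2.0, "no_bath": 1.0},
    "durables": {"no_car": 1.0, "no_computer": 1.0},
}
person = {"leaking_roof": 1.0, "no_bath": 0.0, "no_car": 1.0, "no_computer": 0.0}
print(overall_score(person, dimensions))   # about 0.5833, i.e. (2/3 + 1/2) / 2
```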

Fuzzy-Set Operations Appropriate for an Analysis of Poverty and Deprivation

In this section, we describe the rules for the manipulation of multiple fuzzy sets, which are appropriate for an analysis of deprivation longitudinally and in multiple dimensions.

Let $\mu_t$ be the propensity for poverty of an individual at time t. In a conventional (non-fuzzy, ‘crisp’) analysis, it is a {0,1} dichotomy: an individual or household can be in one of two possible states, ‘poverty’ ($\mu_t = 1$) and ‘non-poverty’ ($\mu_t = 0$, or its complement $\bar\mu_t = 1$); the two states are exhaustive and mutually exclusive, meaning $\mu_t + \bar\mu_t = 1$.

Fuzzy-set operations are a generalisation of the corresponding crisp-set operations in the sense that the former reduce to (exactly reproduce) the latter when the fuzzy membership functions, which take values over the whole range [0,1], are reduced to a {0,1} dichotomy. Again, at any given time t there are two possible states: ‘poverty’ with propensity $0 \le \mu_t \le 1$ and ‘non-poverty’ with propensity $0 \le \bar\mu_t \le 1$. In the crisp formulation, only one of these two quantities is non-zero, and an individual belongs uniquely to one or the other of the two states. In a fuzzy formulation, both quantities are generally non-zero, specifying the degree to which an individual belongs to each of the two states. As in the crisp formulation, these propensities satisfy the constraint $\mu_t + \bar\mu_t = 1$ covering the two possible states.

Operation for Two Fuzzy Sets

We start with the simple case of two fuzzy sets: they could refer to the propensity for deprivation at two points in time (e.g. for a panel of individuals) or in two dimensions (e.g. monetary and non-monetary). The formal treatment of the two situations is identical.3

Let $(\mu_1, \mu_2)$ be the propensities for poverty of an individual at times t = 1, 2. The fuzzy operations of intersection of the propensities at these two time points, defining a set of four sequences or trajectories, are specified in Table 7.1.

The fourth column in Table 7.1 identifies the type of fuzzy-set operation involved, following the terminology explained by Klir and Yuan (1995).4 Intersection (1) corresponds to the propensity for being in poverty (in the state ‘poor’) at both times t = 1, 2; this is the individual's propensity for being in ‘continuous poverty’. Intersection (2) corresponds to the propensity for being in poverty at t = 1 and in non-poverty at t = 2; intersection (3) is the reverse: non-poverty at t = 1 and poverty at t = 2. Together, (2) and (3) give the individual's propensity for being in ‘transient poverty’. Intersection (4) corresponds to the propensity for being in non-poverty (in the state ‘non-poor’) at both times (‘never poor’). Its complement, $\max(\mu_1, \mu_2)$, corresponds to the propensity for being in poverty at least at one of the two times (‘any-time poverty’); this is given by the standard fuzzy union, operation (5) in Table 7.1. The fifth column of the table expresses the propensities using the notation of an ordered set, $\mu_{(1)} = \max(\mu_1, \mu_2) \ge \min(\mu_1, \mu_2) = \mu_{(2)}$. It clearly shows that the composite operation, involving the standard and bounded operations in the table, satisfies the basic requirement that the propensities for the four mutually exclusive and exhaustive sequences or trajectories (1) to (4) of an individual add up to 1.

Table 7.1 Four sequences defined by intersection of propensities at the two points in time

| Intersection | Area in Fig. 1 | Propensity | Type of fuzzy operation | Propensity using ordered set $\mu_{(1)} \ge \mu_{(2)}$ |
|---|---|---|---|---|
| (1) poor at t = 1 and at t = 2 | A | $\min(\mu_1, \mu_2)$ | Standard | $\mu_{(2)}$ |
| (2) poor at t = 1, non-poor at t = 2 | max(0, B) | $\max(0, \mu_1 + \bar\mu_2 - 1) = \max(0, \mu_1 - \mu_2)$ | Bounded | (2) + (3) = $\mu_{(1)} - \mu_{(2)}$ |
| (3) non-poor at t = 1, poor at t = 2 | max(0, B) | $\max(0, \bar\mu_1 + \mu_2 - 1) = \max(0, \mu_2 - \mu_1)$ | Bounded | |
| (4) non-poor at t = 1 and at t = 2 | C | $\min(\bar\mu_1, \bar\mu_2)$ | Standard | $1 - \mu_{(1)}$ |
| (5) poor at t = 1 or at t = 2 (union) | A + B | $\max(\mu_1, \mu_2)$ | Standard | $\mu_{(1)}$ |

Note: $\mu_t$ = propensity for poverty at time t = 1, 2; $\bar\mu_t = 1 - \mu_t$. Sum of propensities (1) to (4) = 1.
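A minimal Python sketch of the operations in Table 7.1 for a single individual follows. It writes each bounded intersection as $\max(0, a + b - 1)$ and each standard intersection or union as a minimum or maximum; the function name and dictionary keys are illustrative.

```python
def two_period_propensities(mu1, mu2):
    """Propensities of the four trajectories of Table 7.1 plus any-time poverty."""
    nb1, nb2 = 1.0 - mu1, 1.0 - mu2          # complements: propensity for non-poverty
    return {
        "pp": min(mu1, mu2),                  # (1) standard intersection
        "pn": max(0.0, mu1 + nb2 - 1.0),      # (2) bounded: max(0, mu1 - mu2)
        "np": max(0.0, nb1 + mu2 - 1.0),      # (3) bounded: max(0, mu2 - mu1)
        "nn": min(nb1, nb2),                  # (4) standard intersection
        "any": max(mu1, mu2),                 # (5) standard union
    }


p = two_period_propensities(0.7, 0.2)
print(round(p["pp"] + p["pn"] + p["np"] + p["nn"], 12))   # 1.0
```

With crisp inputs such as (1, 0), the same function returns 1 for exactly one trajectory, which is the sense in which the fuzzy operations reproduce the crisp ones.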

As noted, our framework for the analysis of fuzzy sets applies equally to multi-dimensional deprivation. Consider, for example, two dimensions of deprivation: monetary and non-monetary. The analysis of deprivation in both dimensions simultaneously parallels that of continuous poverty (poverty at both times); similarly, deprivation in either dimension parallels any-time poverty.

Longitudinal Analysis of Deprivation Conceptualised as a Sequence of Fuzzy States

Now we extend the procedure presented above to an analysis of poverty over any number of time periods (or equivalently, for any number of dimensions of deprivation) in a consistent and substantively meaningful way.

Consider an individual with propensities for poverty $\mu_t$ over T points in time, t = 1, ..., T. The corresponding propensities for non-poverty are their complements, $\bar\mu_t = 1 - \mu_t$. It is also useful to order the $\mu_t$ values as $\mu_{(1)} \ge \mu_{(2)} \ge \dots \ge \mu_{(T)}$. Consider first the sequence in which the state is ‘poor’ at all T times. The fuzzy operation involved is the standard intersection:

$$\mu = \min(\mu_1, \mu_2, \dots, \mu_T) = \mu_{(T)}$$

This intersection is the individual's propensity for being in a state of continuous poverty during the T periods. More generally, we consider the two possible states, being ‘poor’ or being ‘non-poor’, at each point in time. This defines $2^T$ distinct sequences, each of length T. For a particular sequence, let $X_P$ be the set of time points at which the individual's state of being poor is considered, and $X_N = \{1, \dots, T\} \setminus X_P$ the set of remaining time points, at which the state considered is that of being non-poor. The individual's propensity to follow that particular sequence is the intersection of the T quantities $\{\mu_t, t \in X_P;\ \bar\mu_t = 1 - \mu_t, t \in X_N\}$.

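The result can be seen more clearly by ordering the cross-sections according to the magnitude of the propensity. With $m_P = \min(\mu_t, t \in X_P)$ and $\bar m_N = 1 - M_N = \min(\bar\mu_t, t \in X_N)$, where $M_N = \max(\mu_t, t \in X_N)$, the overlap equals, algebraically:

$$\mu = \max(0,\ m_P + \bar m_N - 1) = \max(0,\ m_P - M_N) \qquad (7.2)$$

which is simply the bounded intersection between the propensities $m_P$ and $\bar m_N$.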
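As a check on Equation (7.2), the following Python sketch enumerates all $2^T$ poor/non-poor sequences for one individual and computes each propensity as $\max(0, m_P - M_N)$, taking $m_P = 1$ when there are no poor time points and $M_N = 0$ when there are no non-poor ones. The function names and the example propensities are illustrative.

```python
from itertools import product


def sequence_propensity(mu, pattern):
    """Equation (7.2): propensity to follow a pattern such as "pnp" ('p' = poor).

    m_P is the minimum of mu_t over poor time points (1 if there are none) and
    M_N the maximum over non-poor time points (0 if there are none).
    """
    poor = [m for m, s in zip(mu, pattern) if s == "p"]
    non_poor = [m for m, s in zip(mu, pattern) if s == "n"]
    m_p = min(poor, default=1.0)
    big_m_n = max(non_poor, default=0.0)
    return max(0.0, m_p - big_m_n)   # bounded intersection of m_P and 1 - M_N


def all_sequences(mu):
    """Propensities for all 2**T poor/non-poor sequences of one individual."""
    return {"".join(s): sequence_propensity(mu, s) for s in product("pn", repeat=len(mu))}


props = all_sequences([0.6, 0.1, 0.8])
print({s: round(v, 3) for s, v in props.items() if v > 0})
# {'ppp': 0.1, 'pnp': 0.5, 'nnp': 0.2, 'nnn': 0.2}
```

Only one sequence per value of P has a non-zero propensity (the one whose poor time points carry the largest $\mu_t$), and the non-zero values sum to 1.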

Equation (7.2) has a non-zero value only when $m_P > M_N$. This requirement has a noteworthy consequence. Consider the sequences with P time points at which the individual's state of being poor is considered (and the state of being non-poor at the remaining $N = T - P$ time points). Of all such sequences, at most one can have a non-zero propensity for a given individual: the sequence in which the P poor time points are those with the highest $\mu_t$ values. Using the notation of ordered propensities $\mu_{(t)}$ introduced earlier, this means $m_P = \mu_{(P)}$ and $M_N = \mu_{(P+1)}$.

From Equation (7.2), an individual's propensity for following that particular sequence is:

$$\mu = \mu_{(P)} - \mu_{(P+1)}, \qquad \text{with } \mu_{(0)} = 1 \text{ and } \mu_{(T+1)} = 0 \qquad (7.3)$$

This is the individual's propensity for being in the poor state exactly P times during the T periods. There are T + 1 such sets, corresponding to $0 \le P \le T$. The sum of the propensities in Equation (7.2) over these sets is 1 for any individual.5 Summing Equation (7.3) over the values P, P + 1, ..., T gives the propensity for being in the poor state at least P times:

$$\mu = \mu_{(P)} \qquad (7.4)$$
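Equations (7.3) and (7.4) need only the ordered propensities, so they reduce to a sort. The Python sketch below pads the ordered list with $\mu_{(0)} = 1$ and $\mu_{(T+1)} = 0$, the conventions used above; the function names are illustrative.

```python
def ordered(mu):
    """[mu_(0), mu_(1), ..., mu_(T), mu_(T+1)] with mu_(0) = 1 and mu_(T+1) = 0."""
    return [1.0] + sorted(mu, reverse=True) + [0.0]


def exactly(mu, p):
    """Equation (7.3): propensity for being poor exactly p times (0 <= p <= T)."""
    m = ordered(mu)
    return m[p] - m[p + 1]


def at_least(mu, p):
    """Equation (7.4): propensity for being poor at least p times."""
    return ordered(mu)[p]


mu = [0.6, 0.1, 0.8]
print([round(exactly(mu, p), 3) for p in range(len(mu) + 1)])   # [0.2, 0.2, 0.5, 0.1]
print(at_least(mu, len(mu)), at_least(mu, 1))   # 0.1 0.8 (continuous vs any-time poverty)
```

The telescoping sum of Equation (7.3) is visible directly: the values printed on the first line add up to 1, and the tail from P onwards adds up to $\mu_{(P)}$.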

Longitudinal Fuzzy Measures

Table 7.1 and Equations (7.2) to (7.4) provide the basis for constructing individual propensities, say $\mu^L$, for diverse longitudinal measures of poverty. Table 7.2 shows examples of movements between the states of poverty and non-poverty.

Here are examples of the measures of persistent poverty for any T time periods.

Table 7.2 Examples of movements between the states of poverty and non-poverty

| T | Pattern⁶ | Description |
|---|---|---|
| 2 | pp | Continuous poverty: poor at both times |
| 2 | pn + np | Transient poverty: poor at one of the two times |
| 2 | nn | Never poor |
| 3 | pnp | Temporary exit from poverty: poor at time 1, non-poor at 2, again poor at 3 |
| 3 | npn | Temporary fall into poverty: non-poor at time 1, poor at 2, again non-poor at 3 |
| 4 | npnp | Constant movement between states of poverty and non-poverty |
| etc. | | |

For T = 4 time points (say, years), persistent poverty could be defined as being in the state ‘poor’ in at least three of the four years, in which case, from Equation (7.4):

$$\mu^L = \mu_{(3)} \qquad (7.8)$$
Eurostat has adopted a somewhat different definition of persistent poverty in its ‘at-persistent-risk-of-poverty’ rate; this rate is defined as the proportion of people who are poor in the current year (year 4) and were also poor in at least two of the preceding three years (years 1, 2 and 3). Two cases need to be considered:

  • (1) $\mu_4 \ge \mu_{(3)}$, in which case the required propensity of EU-persistent poverty is $\mu^L = \mu_{(3)}$, as in Equation (7.8).
  • (2) $\mu_{(3)} > \mu_4$. In this case, the propensity being considered is non-zero only for the sequence with the state of poverty at all four points, and therefore the required propensity of EU-persistent poverty is $\mu^L = \mu_{(4)} = \mu_4$.

Both cases are covered by:

$$\mu^L = \min\left(\mu_4,\ \mu_{(3)}\right) \qquad (7.9)$$
Conventional measures are usually expressed as the proportion of people with a specific pattern of poverty. Fuzzy equivalents of the conventional measures are given by the mean value of the individual propensities for the pattern concerned:

$$\bar\mu^L = \frac{\sum_i w_i\, \mu_i^L}{\sum_i w_i}$$

where $w_i$ are the appropriate sample weights.
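The following Python sketch computes the EU-persistent propensity of Equation (7.9) for T = 4, cross-checks it against a brute-force sum of Equation (7.2) over the qualifying sequences (poor in year 4 and poor at least three times in total), and forms the weighted fuzzy rate. The function names and example values are hypothetical.

```python
from itertools import product


def eu_persistent(mu):
    """Equation (7.9): min(mu_4, mu_(3)), mu_(3) being the third largest of the four."""
    if len(mu) != 4:
        raise ValueError("expects propensities for exactly four years")
    return min(mu[3], sorted(mu, reverse=True)[2])


def brute_force_eu(mu):
    """Sum Equation (7.2) over sequences poor in year 4 and poor >= 3 times (check only)."""
    total = 0.0
    for s in product("pn", repeat=4):
        if s[3] == "p" and s.count("p") >= 3:
            m_p = min(m for m, x in zip(mu, s) if x == "p")
            m_n = max((m for m, x in zip(mu, s) if x == "n"), default=0.0)
            total += max(0.0, m_p - m_n)
    return total


def fuzzy_rate(propensities, weights):
    """Weighted mean of individual propensities: the fuzzy analogue of a headcount rate."""
    return sum(p * w for p, w in zip(propensities, weights)) / sum(weights)


people = [[0.9, 0.5, 0.7, 0.3], [0.2, 0.6, 0.4, 0.8]]   # hypothetical mu_t, t = 1..4
mu_l = [eu_persistent(m) for m in people]                 # case (2), then case (1)
print(fuzzy_rate(mu_l, [1.0, 3.0]))
```

The brute-force function is not needed in practice; it only confirms that the closed form of Equation (7.9) agrees with summing the sequence propensities.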

In the section below, we present estimates of the measures in Equations (7.6), (7.7) and (7.9), and of their standard errors, for T = 4 waves of EU-SILC.

Data and Variance Estimation Procedure

Data: EU-SILC Rotational Panel Design

The data used in this chapter come from the EU-SILC survey, which is the most important source of comparable statistics on income and living conditions in Europe. The survey is conducted every year in each participating country. The EU-SILC’s total sample size is about 250,000 households in 31 countries, 500,000 adults (age 16+) and 600,000 people of all ages per year. The EU-SILC collects various types of data from different sources: cross-sectional and longitudinal; at the level of households and individuals; on income and social conditions; and from registers and interview surveys depending on the country.

Nearly all EU member countries adopted a standard integrated design. The design recommended by Eurostat is developed and described in Verma and Betti (2006). It involves a rotational panel in which a new sample of households and individuals is introduced each year to replace one-quarter of the existing sample. Each person enumerated in each new sample is followed up in the survey for four years, along with that person’s entire household. Every year the design yields a cross-sectional sample as well as longitudinal samples of various durations.

For each year, the cross-sectional sample is made up of four sub-samples or panels (Figure 7.1). A new panel is added each year; it remains in the survey for four years and is then dropped and replaced by another new panel. Movers from the original sample are followed up in their new location for as long as their panel remains in the survey. This scheme provides both cross-sectional and longitudinal data from the same common set of people or households. The cross-sectional sample for year T consists of sub-samples 1 to 4, one introduced in each year from (T − 3) to T. A longitudinal sample consists of those who have remained in the survey since they were first included in it. Three overlapping longitudinal samples of different durations are formed: two-year duration from sub-samples (2 + 3 + 4); three-year duration from sub-samples (3 + 4); and four-year duration from sub-sample (4). In this chapter, we analyse longitudinal measures over the four-year panel.
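The rotation scheme can be expressed as simple arithmetic on the year in which each panel was introduced. The Python sketch below assumes, as in the text, that each panel stays for four years; the function names are illustrative, and a panel's sub-sample number in the text corresponds to the number of years it has been in the survey.

```python
def panels_in_year(year, duration=4):
    """Introduction years of the panels (sub-samples) present in a given year."""
    return list(range(year - duration + 1, year + 1))


def longitudinal_panels(year, d, duration=4):
    """Panels observed in every one of the d years ending in `year` (1 <= d <= duration).

    A panel introduced in year y has been in the survey year - y + 1 years.
    """
    return [y for y in panels_in_year(year, duration) if year - y + 1 >= d]


print(panels_in_year(2011))          # [2008, 2009, 2010, 2011]
print(longitudinal_panels(2011, 2))  # [2008, 2009, 2010]
print(longitudinal_panels(2011, 4))  # [2008]
```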

All micro-level data have been weighted for variations in selection probabilities, non-response, other shortcomings in implementation, calibration on the basis of external data and population size. For the most part, the weighting has been implemented centrally by Eurostat; the weighting procedure used is developed and described in Verma et al. (2007).

The EU-SILC survey collects comparable multi-dimensional micro-data on income, economic hardship, social exclusion, housing, labour, education and health. Information on social exclusion and housing conditions is collected mainly at the household level, whereas labour, education and health information is obtained for each individual aged 16 and over. The core variable, income at a very detailed component level, is collected mainly at the individual level and then aggregated to the household level to construct household income. The reference income is household disposable income, which is converted into equivalised household income using an equivalence scale. The equivalised household income is then ascribed equally to each household member.
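As an illustration of the equivalisation step, the sketch below assumes the modified OECD scale (1.0 for the first household member, 0.5 for each further member aged 14 or over, 0.3 for each child under 14). The scale is not specified in the text, so this choice and the function names are assumptions.

```python
def equivalence_scale(ages):
    """ASSUMED modified OECD scale: 1.0 for the first (oldest) member, then 0.5 per
    further member aged 14+ and 0.3 per member under 14."""
    rest = sorted(ages, reverse=True)[1:]
    return 1.0 + sum(0.5 if a >= 14 else 0.3 for a in rest)


def equivalised_income(disposable_income, ages):
    """Household disposable income divided by the scale, ascribed equally to each member."""
    e = disposable_income / equivalence_scale(ages)
    return [e] * len(ages)


# Two adults and two children: scale 2.1, so about 10,000 per member.
print(equivalised_income(21000.0, [40, 38, 10, 5]))
```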

A fundamental feature of the EU-SILC is that both cross-sectional and longitudinal data are collected. The cross-sectional component covers information pertaining to the current and recent periods, such as the preceding calendar year. It aims to provide estimates of cross-sectional levels and of net changes from one period (year) to another. The longitudinal component covers information compiled or collected through repeated enumeration of individuals and then linked over time at the micro level. It aims to measure gross (micro-level) change. Both cross-sectional and longitudinal micro-data sets are updated on an annual basis.

Figure 7.1 Illustration of a simple rotational design.

Data Subset Used in the Present Research

The subset of EU-SILC data used here is that for Spain. For the longitudinal analysis, we constructed a four-year panel for the years 2008–2011, selecting only people who are present in all four years.

Thanks to research co-operation with the OECD (Piacentini, 2014; Verma et al., 2017), we have access to the public cross-sectional and longitudinal 2011 EU-SILC data (User Data Base) for Spain, and to the non-public variables describing the sample structure, such as the codes for regions at the NUTS 2 level (DB040), sampling strata (DB050) and primary sampling units (PSUs) (DB060).8

The EU-SILC national surveys are mainly designed with a focus on the production of reliable estimates at the national level,9 and the EU-SILC survey for Spain has a large sample: 13,109 households and 34,756 individuals (2011 cross-sectional sample). This permits direct analysis not only at the national but also at the regional (NUTS 2) level. For our purposes, we constructed a four-year longitudinal balanced panel; this means that we consider only individuals who are present in all four years 2008-2011 of the sample. The final sample size of the panel totals 8,309.

The Spain EU-SILC 2011 Intermediate Quality Report (INE, 2012) gives a detailed description of the sample design, which is important for understanding whether regions form independent sampling domains and for constructing the ‘computational’ PSUs and strata. The sample is a two-stage design. The first-stage units are census sections, which are stratified: in each autonomous community (self-ruling region), they are stratified according to the size of the municipality to which the census section belongs.

An independent sample is designed in each autonomous community to represent it, because one of the INE’s survey objectives is to provide data at this level of disaggregation. The sample is distributed across autonomous communities by allocating one part (40%) uniformly and the remainder in proportion to autonomous community size. Each section is made up of around 400 addresses. The second stage comprises principal family addresses selected for the sample in the census section; all households that usually reside in those addresses are surveyed.

The regions are treated as sampling domains. DB040, DB050 and DB060 define, respectively, the region, stratum and PSU for each individual or household in the micro-dataset. The first two variables together define unique strata within each region separately. Using the sample description and the three variables mentioned above makes it possible to define the computational strata and PSUs for variance estimation. Sample weights have been computed at the level of individuals in the sample.

JRR Variance Estimation Procedure

Practical procedures for estimating the variance of complex statistics from complex surveys require that certain basic assumptions about the sample design be met. (1) The survey is based on a probability sample. (2) Sample selection is independent between strata, with two or more primary selections (PS) drawn at random, independently and with replacement, from each stratum.10 (3) The total number of PS is large enough (say > 30) for valid use of the approximations in the variance estimation methods.11 (4) Subsampling within PSUs may be of any complexity and may differ from one PSU to another. (5) The PS in the same stratum do not differ greatly in size, i.e. in the number (more precisely, the sum of sample weights) of final units selected (say cv < 0.1).
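Some of these assumptions can be checked directly from the design variables before any replication is done. The Python sketch below flags strata with fewer than two PSUs (assumption 2), tests whether the total number of PSUs exceeds 30 (assumption 3), and flags strata where the coefficient of variation of PSU weight totals exceeds 0.1 (assumption 5). The function name is illustrative, the thresholds are the rules of thumb quoted above, and using the population standard deviation across PSUs is one possible choice.

```python
from collections import defaultdict
from statistics import mean, pstdev


def check_design(stratum, psu, weight, min_total=30, max_cv=0.1):
    """Return (strata with < 2 PSUs, total PSUs > min_total?, strata with cv > max_cv)."""
    totals = defaultdict(float)            # (stratum, psu) -> sum of sample weights
    for h, i, w in zip(stratum, psu, weight):
        totals[(h, i)] += w
    by_stratum = defaultdict(list)         # stratum -> PSU weight totals
    for (h, _), t in totals.items():
        by_stratum[h].append(t)
    single = sorted(h for h, ts in by_stratum.items() if len(ts) < 2)
    enough = len(totals) > min_total
    uneven = sorted(
        h for h, ts in by_stratum.items()
        if len(ts) >= 2 and pstdev(ts) / mean(ts) > max_cv
    )
    return single, enough, uneven


print(check_design(["A", "A", "A", "A", "B", "B"], [1, 1, 2, 2, 3, 3],
                   [10, 10, 10, 12, 5, 5]))   # (['B'], False, [])
```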

In addition, practical calculation procedures for estimating sampling errors for complex surveys: (6) must take into account the actual complex structure of the design; (7) should be flexible enough to apply to diverse designs; (8) should be suitable and convenient for large-scale application, producing results routinely for diverse statistics and sub-classes; (9) should be robust to the departure of the actual sample design from the ideal model assumed in the calculation method; (10) should have desirable statistical properties, such as small mean-square error of the variance estimator; (11) should be economical in terms of effort and cost; and (12) suitable computer software should be available to apply the method.

These assumptions are usually met, or reasonably approximated, in most large-scale population-based surveys (for more details, see Verma, 1991). However, another requirement must be added, which, unlike those mentioned earlier, is often not met in practice: (13) all essential information on the sample structure is available. Sampling error calculations must take variations in the sample design into account through the definition of the sample structure. To apply the calculation procedures, it is necessary to have full access to the variables that define the structure of the sample, at least stratification, PSUs and the weights of sample units, together with a detailed description of the sample design. Even when this type of information is collected in the surveys (as it is in the EU-SILC surveys used in this chapter for quantitative illustration), not all of it is included in the micro-data in the public domain available to researchers. Without this information, it is impossible to calculate valid estimates of the sampling error for complex samples.

The calculation method used in this chapter to estimate the standard errors of complex statistics from complex samples is JRR, a practical method based on the observed variability among replications of the full sample. Once the set of replications is appropriately defined for a complex design, the same variance estimation algorithm can be applied to statistics of any level of complexity. The variance estimates take into account the effect on variance of those aspects of the estimation procedure that are repeated for each replication. In principle, this can include complex effects, such as those of imputation and weighting, though fully repeating these procedures for each replication is often not feasible. We extended and applied JRR to estimate variances for subpopulations (including regions and other geographic divisions), longitudinal measures such as persistent poverty rates, and measures of net changes and averages over cross-sections in rotational panel designs. We use JRR in the version proposed by Verma and Betti (2011);12 moreover, Betti et al. (2018) show that this JRR variant is particularly suitable for estimating the variance of fuzzy poverty measures.

The basic equation for JRR variance estimation is:

$$\operatorname{var}(z) = \sum_{h} (1 - f_h)\,\frac{a_h - 1}{a_h} \sum_{i=1}^{a_h} \left( z_{(hi)} - z_{(h)} \right)^2$$

where $z$ is a full-sample estimate of any complexity;

subscript $i$ indicates a sample PSU and $h$ its stratum; $a_h \geq 2$ is the number of PSUs in stratum $h$;

$z_{(hi)}$ is the estimate produced using the same procedure after PSU $i$ in stratum $h$ is eliminated and the weight of the remaining $(a_h - 1)$ PSUs in the stratum is increased by an appropriate factor $g_{(hi)}$; $z_{(h)}$ is the simple average of the $z_{(hi)}$ over the $a_h$ sample PSUs in $h$;

$(1 - f_h)$ is the finite population correction, usually $\approx 1$ for samples in typical social surveys; $g_{(hi)} = w_h/(w_h - w_{hi})$, where $w_h = \sum_i w_{hi}$ and $w_{hi} = \sum_j w_{hij}$ is the sum of the sample weights of the final units $j$ in PSU $i$.
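As a minimal sketch, assuming each final unit carries a stratum code, a PSU code and a weight, the delete-one-PSU replication and the variance equation above can be written in Python as follows. The function name `jrr_variance`, the `estimator` callback (which recomputes the statistic from a full weight vector) and the toy data are illustrative assumptions, not the authors' implementation; the sketch follows the equation as written and does not model any refinement specific to the Verma and Betti (2011) variant.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def jrr_variance(
    strata: Sequence[str],
    psus: Sequence[str],
    weights: Sequence[float],
    estimator: Callable[[List[float]], float],
    fpc: Optional[Dict[str, float]] = None,
) -> Tuple[float, float]:
    """Return (z, var(z)) by delete-one-PSU jackknife repeated replication.

    `estimator` recomputes the statistic from a full vector of final-unit
    weights in the same order as `weights`; deleted units get weight 0.
    `fpc` maps a stratum to its sampling fraction f_h (default 0).
    """
    z = estimator(list(weights))

    # Weight totals per stratum (w_h) and per PSU (w_hi).
    w_h: Dict[str, float] = defaultdict(float)
    w_hi: Dict[Tuple[str, str], float] = defaultdict(float)
    for h, i, w in zip(strata, psus, weights):
        w_h[h] += w
        w_hi[(h, i)] += w

    # The formula needs a_h >= 2 PSUs in every stratum.
    a: Dict[str, int] = defaultdict(int)
    for h, _ in w_hi:
        a[h] += 1
    for h, a_h in a.items():
        if a_h < 2:
            raise ValueError(f"stratum {h!r} has only {a_h} PSU")

    # One replicate per PSU: drop it, inflate the rest of its stratum by g_(hi).
    replicates: Dict[str, List[float]] = defaultdict(list)
    for (h, i), w_del in w_hi.items():
        g = w_h[h] / (w_h[h] - w_del)
        rep_w = [
            0.0 if (s, p) == (h, i) else (w * g if s == h else w)
            for s, p, w in zip(strata, psus, weights)
        ]
        replicates[h].append(estimator(rep_w))

    var = 0.0
    for h, reps in replicates.items():
        a_h = len(reps)
        z_h = sum(reps) / a_h  # simple average of the replicate estimates
        f_h = (fpc or {}).get(h, 0.0)
        var += (1 - f_h) * (a_h - 1) / a_h * sum((r - z_h) ** 2 for r in reps)
    return z, var


# Toy example: two strata, two PSUs each, two people per PSU.
poor = [1, 0, 1, 0, 0, 1, 0, 0]


def poverty_rate(w: List[float]) -> float:
    return sum(wi * yi for wi, yi in zip(w, poor)) / sum(w)


z, v = jrr_variance(
    strata=["A"] * 4 + ["B"] * 4,
    psus=["1", "1", "2", "2", "3", "3", "4", "4"],
    weights=[1.0] * 8,
    estimator=poverty_rate,
)
print(z, v)  # 0.375 0.015625
```

The callback design mirrors the point made in the text: once the replicate weights exist, any statistic, however complex, is simply recomputed on each replicate with the same code.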

The replications are calculated by deleting one PSU at a time and appropriately increasing the weight of the remaining PSUs in the same stratum. In going from a cross-sectional to a longitudinal (panel) sample, the important point to note is that the sample structure of each final unit in the four-year panel (region, stratum and PSU) is defined by the structure at the time the unit first entered the sample, and remains fixed for the duration of the panel. A unit may move to a different location over time, but this does not affect the sample structure used in selecting it. This means that the replications, once constructed, remain unchanged over the duration of the panel. Hence, each replication is itself a panel sample, and all the longitudinal measures defined above can be constructed in each replication from the cross-sectional measures in the same way as in the full sample, the cross-sectional measures being estimates for the panel sample.
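The panel point can be sketched as follows, under stated assumptions: replicate weights are built once from the stratum and PSU codes recorded at panel entry (`strata0` and `psus0` are hypothetical names), and each replicate is then treated as a complete panel sample on which the longitudinal measure is recomputed. For illustration only, the measure is the crisp share of people poor in all four waves (`continuous_rate`, also hypothetical); the chapter's fuzzy measures would take its place, and the finite population correction is taken as 1.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Replicate = Tuple[str, List[float]]  # (stratum of the deleted PSU, weights)


def panel_replicates(strata0: Sequence[str], psus0: Sequence[str],
                     weights: Sequence[float]) -> List[Replicate]:
    """Delete-one-PSU replicate weights, built once from the codes at
    panel entry; later moves of a unit do not change them."""
    w_h: Dict[str, float] = defaultdict(float)
    w_hi: Dict[Tuple[str, str], float] = defaultdict(float)
    for h, i, w in zip(strata0, psus0, weights):
        w_h[h] += w
        w_hi[(h, i)] += w
    reps = []
    for (h, i), w_del in w_hi.items():
        g = w_h[h] / (w_h[h] - w_del)
        reps.append((h, [0.0 if (s, p) == (h, i) else (w * g if s == h else w)
                         for s, p, w in zip(strata0, psus0, weights)]))
    return reps


def continuous_rate(w: Sequence[float], waves: Sequence[Sequence[int]]) -> float:
    """Weighted share of people poor in every wave (crisp, illustrative)."""
    flags = [1.0 if all(seq) else 0.0 for seq in waves]
    return sum(wi * f for wi, f in zip(w, flags)) / sum(w)


def jrr_from_replicates(reps: List[Replicate],
                        measure: Callable[[Sequence[float]], float]) -> float:
    """JRR variance with (1 - f_h) taken as 1."""
    by_stratum: Dict[str, List[float]] = defaultdict(list)
    for h, w in reps:
        by_stratum[h].append(measure(w))  # each replicate is a full panel
    var = 0.0
    for zs in by_stratum.values():
        a = len(zs)
        z_h = sum(zs) / a
        var += (a - 1) / a * sum((z - z_h) ** 2 for z in zs)
    return var


# Poverty status (1 = poor) in four waves; strata/PSUs are entry codes.
waves = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 1, 1], [0, 0, 0, 0],
         [1, 1, 1, 1], [1, 1, 1, 1], [0, 1, 0, 0], [0, 0, 0, 0]]
strata0 = ["A"] * 4 + ["B"] * 4
psus0 = ["1", "1", "2", "2", "3", "3", "4", "4"]
weights = [1.0] * 8


def measure(w: Sequence[float]) -> float:
    return continuous_rate(w, waves)


reps = panel_replicates(strata0, psus0, weights)
print(measure(weights), jrr_from_replicates(reps, measure))  # 0.375 0.078125
```

Because the replicates depend only on the entry codes, they are computed once and reused for every wave and every longitudinal measure.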

Construction of Computational Strata and PSUs

The procedure described below enables the construction of 'computational' PSUs and strata for variance estimation. In our analysis, we imposed on each individual in the four-year panel the sample structure (region, stratum and PSU) defined at the beginning of the panel, which remains fixed for the duration of the panel.

  • (1) The strata are confined within regions. In most cases, the regional and PSU codes in combination directly identify unique computational strata for estimating variances in each region. In a few cases (about 10, i.e. fewer than one per region on average), a stratum contains only one PSU. Because the calculation procedure requires at least two PSUs per stratum, any stratum with a single PSU is merged with another stratum to form the final computational stratum.
  • (2) By design, each PSU is confined to a single stratum. However, we found that, in some cases, a PSU extended across more than one stratum and occasionally even across different regions. These cases reflect a misspecification of the PSU code in the rotating panel design of the survey: for each unit in the sample, the code should specify the original PSU in which the unit was selected. These errors appear to concern people who have changed their residence since they first participated in the survey, for whom the code erroneously identifies their current rather than their original PSU. In most such cases, the majority of the sample units in a problematic PSU remained in the same stratum, and only a very small proportion (movers) were coded as being in a different stratum. There is no way to identify the original PSU of these movers from the information available, so in each stratum they were grouped to form a new, separate PSU. In this way, the resulting computational PSUs did not cut across computational strata, as required both by a design with independent sampling within regions and by the variance estimation procedure.
  • (3) Finally, we found some PSUs with very few sample units. When a PSU contained fewer than six adults, we combined it with others so that the resulting computational PSU had at least six people. In practice, this procedure was long and demanding, though not complex, and yielded appropriate computational PSUs and strata.
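The three steps can be sketched in Python as below, assuming person-level records with hypothetical `region`, `stratum` and `psu` fields and counting every record as one adult. The order of the steps (2, then 3, then 1) and the merging rules (fold the smallest PSU of a stratum into the next smallest; merge a single-PSU stratum into the stratum of the same region with the fewest PSUs) are our own assumptions, since the text does not say which units or strata were combined. Step (1) is applied last because merging PSUs can itself leave a stratum with a single PSU.

```python
from collections import Counter, defaultdict

MIN_ADULTS = 6  # minimum PSU size quoted in step (3)


def build_computational_design(units):
    """Return one (computational stratum, computational PSU) pair per unit.

    `units` is a list of dicts with hypothetical keys 'region', 'stratum'
    and 'psu' holding the codes found in the data file.
    """
    strat = [(u["region"], u["stratum"]) for u in units]

    # Step (2): a PSU code spread over several strata stays in its modal
    # stratum; units coded elsewhere ("movers") form one new PSU per stratum.
    spread = defaultdict(Counter)
    for s, u in zip(strat, units):
        spread[u["psu"]][s] += 1
    home = {p: c.most_common(1)[0][0] for p, c in spread.items()}
    label = [(s, u["psu"]) if home[u["psu"]] == s else (s, "movers")
             for s, u in zip(strat, units)]

    # Step (3): fold the smallest PSU of a stratum into the next smallest
    # while it has fewer than MIN_ADULTS people.
    while True:
        sizes = defaultdict(Counter)
        for s, p in zip(strat, label):
            sizes[s][p] += 1
        merge = None
        for s, c in sizes.items():
            ranked = sorted(c.items(), key=lambda kv: kv[1])
            if len(ranked) > 1 and ranked[0][1] < MIN_ADULTS:
                merge = (ranked[0][0], ranked[1][0])
                break
        if merge is None:
            break
        small, target = merge
        label = [target if p == small else p for p in label]

    # Step (1): merge a stratum left with one PSU into another stratum of
    # the same region (assumed rule: the one with the fewest PSUs).
    comp = list(strat)
    while True:
        psus_in = defaultdict(set)
        for s, p in zip(comp, label):
            psus_in[s].add(p)
        todo = [s for s, ps in psus_in.items() if len(ps) == 1
                and any(t != s and t[0] == s[0] for t in psus_in)]
        if not todo:
            break
        s1 = todo[0]
        target = min((t for t in psus_in if t != s1 and t[0] == s1[0]),
                     key=lambda t: len(psus_in[t]))
        comp = [target if s == s1 else s for s in comp]
    return list(zip(comp, label))


units = ([{"region": "R1", "stratum": "S1", "psu": "P1"}] * 7
         + [{"region": "R1", "stratum": "S1", "psu": "P2"}] * 3
         + [{"region": "R1", "stratum": "S2", "psu": "P3"}] * 8
         + [{"region": "R1", "stratum": "S2", "psu": "P1"}]  # a mover
         + [{"region": "R1", "stratum": "S3", "psu": "P4"}] * 6
         + [{"region": "R1", "stratum": "S3", "psu": "P5"}] * 6)
design = build_computational_design(units)

summary = defaultdict(dict)
for (s, p), n in Counter(design).items():
    summary[s[1]][p[1]] = n  # short codes only, for display
print(dict(summary))  # {'S2': {'P1': 10, 'P3': 9}, 'S3': {'P4': 6, 'P5': 6}}
```

In the toy data, the mover is absorbed into PSU P3 of its coded stratum, the small PSU P2 is folded into P1, and the two strata left with a single PSU (S1 and S2) are merged into one computational stratum.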

Empirical Analysis

Using the quantitative data described above, this section analyses the estimates of sampling errors for longitudinal poverty and deprivation-related variables. The results for each of the 19 regions in Spain enable analysis from a comparative perspective. Because the regions form independent sample domains, we can treat them as domains in a comparative analysis, in a way that is similar to a multi-country comparative analysis.

Tables 7.3, 7.4 and 7.5 report the three longitudinal measures (any-time, continuous and EU persistent poverty) for each of three measures of poverty: (1) conventional poverty; (2) fuzzy monetary poverty (FM); and (3) fuzzy supplementary (non-monetary) deprivation (FS).
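For the conventional (crisp) measures, the three longitudinal indicators can be sketched as below for a person observed in four waves, with 1 meaning poor in that wave. The any-time and continuous definitions (poor in at least one wave; poor in every wave) are our reading of the labels, and the function names are hypothetical; the persistent indicator follows the EU at-persistent-risk-of-poverty definition (poor in the final wave and in at least two of the three preceding waves). The fuzzy FM and FS versions rely on the fuzzy-set operations defined earlier in the chapter and are not reproduced here.

```python
from typing import Sequence


def any_time(poor: Sequence[int]) -> int:
    """Poor in at least one wave."""
    return int(any(poor))


def continuous(poor: Sequence[int]) -> int:
    """Poor in every wave."""
    return int(all(poor))


def persistent(poor: Sequence[int]) -> int:
    """EU at-persistent-risk: poor in the last wave and in at least two of
    the three preceding waves (expects exactly four waves)."""
    if len(poor) != 4:
        raise ValueError("expects a four-wave sequence")
    return int(bool(poor[3]) and sum(poor[:3]) >= 2)


for seq in ([1, 1, 1, 1], [0, 1, 1, 1], [1, 0, 0, 1], [1, 1, 1, 0]):
    print(seq, any_time(seq), continuous(seq), persistent(seq))
```

With these crisp definitions, being continuously poor implies being persistently poor, which in turn implies being poor at any time, so for each person continuous ≤ persistent ≤ any-time.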

Table 7.3 Any-time poverty by region: conventional, FS and FM measures

| Region | n | Conv. est | Conv. CV | FS est | FS CV | FM est | FM CV | Ratio est FS/conv | Ratio est FM/conv | Ratio CV FS/conv | Ratio CV FM/conv |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ES11 | 643 | 0.349 | 0.107 | 0.465 | 0.081 | 0.354 | 0.079 | 1.34 | 1.01 | 0.76 | 0.74 |
| ES12 | 375 | 0.221 | 0.213 | 0.254 | 0.155 | 0.233 | 0.137 | 1.15 | 1.05 | 0.73 | 0.64 |
| ES13 | 286 | 0.356 | 0.127 | 0.354 | 0.131 | 0.295 | 0.171 | 1.00 | 0.83 | 1.03 | 1.34 |
| ES21 | 398 | 0.276 | 0.190 | 0.248 | 0.109 | 0.251 | 0.162 | 0.90 | 0.91 | 0.57 | 0.85 |
| ES22 | 311 | 0.120 | 0.304 | 0.243 | 0.170 | 0.162 | 0.192 | 2.03 | 1.36 | 0.56 | 0.63 |
| ES23 | 311 | 0.365 | 0.205 | 0.291 | 0.129 | 0.340 | 0.149 | 0.80 | 0.93 | 0.63 | 0.73 |
| ES24 | 349 | 0.179 | 0.225 | 0.244 | 0.146 | 0.213 | 0.158 | 1.37 | 1.19 | 0.65 | 0.70 |
| ES30 | 660 | 0.289 | 0.160 | 0.290 | 0.087 | 0.280 | 0.100 | 1.00 | 0.97 | 0.54 | 0.63 |
| ES41 | 512 | 0.327 | 0.197 | 0.277 | 0.128 | 0.326 | 0.129 | 0.85 | 1.00 | 0.65 | 0.65 |
| ES42 | 445 | 0.500 | 0.098 | 0.310 | 0.126 | 0.421 | 0.080 | 0.62 | 0.84 | 1.29 | 0.81 |
| ES43 | 365 | 0.559 | 0.140 | 0.389 | 0.103 | 0.462 | 0.085 | 0.69 | 0.83 | 0.74 | 0.61 |
| ES51 | 841 | 0.301 | 0.121 | 0.318 | 0.099 | 0.296 | 0.079 | 1.06 | 0.98 | 0.82 | 0.66 |
| ES52 | 651 | 0.358 | 0.126 | 0.392 | 0.090 | 0.343 | 0.087 | 1.09 | 0.96 | 0.71 | 0.69 |
| ES53 | 221 | 0.171 | 0.191 | 0.305 | 0.121 | 0.208 | 0.158 | 1.78 | 1.21 | 0.64 | 0.83 |
| ES61 | 980 | 0.469 | 0.070 | 0.456 | 0.062 | 0.440 | 0.047 | 0.97 | 0.94 | 0.88 | 0.66 |
| ES62 | 377 | 0.358 | 0.206 | 0.420 | 0.134 | 0.362 | 0.162 | 1.17 | 1.01 | 0.65 | 0.79 |
| ES63 | 132 | 0.379 | 0.475 | 0.339 | 0.394 | 0.315 | 0.423 | 0.89 | 0.83 | 0.83 | 0.89 |
| ES64 | 45 | 0.287 | 1.213 | 0.190 | 0.298 | 0.281 | 0.920 | 0.66 | 0.98 | 0.25 | 0.76 |
| ES70 | 407 | 0.403 | 0.210 | 0.434 | 0.080 | 0.354 | 0.148 | 1.08 | 0.88 | 0.38 | 0.71 |
| Mean | | 0.330 | 0.241 | 0.327 | 0.139 | 0.312 | 0.183 | 0.99 | 0.95 | 0.58 | 0.76 |
| St. Dev./Mean | | 0.338 | 1.044 | 0.244 | 0.572 | 0.255 | 1.070 | 0.72 | 0.75 | 0.55 | 1.02 |

Notes: n is the number of people present in all four years of the panel. For the last four columns, the mean and the ratio st. dev./mean are calculated as the ratio of the corresponding values across regions. The official regional codes are used; the names of the regions are given on the Eurostat website.

Table 7.4 Continuous poverty by region: conventional, FS and FM measures

| Region | n | Conv. est | Conv. CV | FS est | FS CV | FM est | FM CV | Ratio est FS/conv | Ratio est FM/conv | Ratio CV FS/conv | Ratio CV FM/conv |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ES11 | 643 | 0.051 | 0.311 | 0.136 | 0.129 | 0.079 | 0.144 | 2.65 | 1.55 | 0.42 | 0.46 |
| ES12 | 375 | 0.005 | 0.660 | 0.043 | 0.214 | 0.051 | 0.202 | 9.23 | 10.87 | 0.32 | 0.31 |
| ES13 | 286 | 0.029 | 0.483 | 0.092 | 0.281 | 0.070 | 0.208 | 3.15 | 2.40 | 0.58 | 0.43 |
| ES21 | 398 | 0.014 | 0.723 | 0.052 | 0.322 | 0.044 | 0.264 | 3.81 | 3.23 | 0.45 | 0.37 |
| ES22 | 311 | 0.005 | 0.766 | 0.056 | 0.376 | 0.021 | 0.357 | 12.00 | 4.53 | 0.49 | 0.47 |
| ES23 | 311 | 0.072 | 0.544 | 0.059 | 0.219 | 0.112 | 0.275 | 0.82 | 1.57 | 0.40 | 0.51 |
| ES24 | 349 | 0.006 | 0.565 | 0.040 | 0.294 | 0.048 | 0.214 | 6.37 | 7.69 | 0.52 | 0.38 |
| ES30 | 660 | 0.027 | 0.387 | 0.054 | 0.152 | 0.056 | 0.156 | 1.97 | 2.04 | 0.39 | 0.40 |
| ES41 | 512 | 0.037 | 0.336 | 0.060 | 0.201 | 0.085 | 0.244 | 1.63 | 2.32 | 0.60 | 0.72 |
| ES42 | 445 | 0.113 | 0.243 | 0.059 | 0.225 | 0.136 | 0.144 | 0.52 | 1.21 | 0.92 | 0.59 |
| ES43 | 365 | 0.168 | 0.179 | 0.087 | 0.186 | 0.195 | 0.100 | 0.52 | 1.16 | 1.04 | 0.56 |
| ES51 | 841 | 0.025 | 0.321 | 0.066 | 0.166 | 0.062 | 0.149 | 2.64 | 2.48 | 0.52 | 0.46 |
| ES52 | 651 | 0.037 | 0.373 | 0.081 | 0.158 | 0.079 | 0.149 | 2.18 | 2.11 | 0.43 | 0.40 |
| ES53 | 221 | 0.037 | 0.804 | 0.086 | 0.399 | 0.060 | 0.310 | 2.30 | 1.62 | 0.50 | 0.38 |
| ES61 | 980 | 0.087 | 0.209 | 0.143 | 0.123 | 0.138 | 0.114 | 1.64 | 1.59 | 0.59 | 0.55 |
| ES62 | 377 | 0.101 | 0.474 | 0.079 | 0.198 | 0.111 | 0.267 | 0.78 | 1.09 | 0.42 | 0.56 |
| ES63 | 132 | 0.155 | 0.616 | 0.167 | 0.527 | 0.144 | 0.523 | 1.08 | 0.93 | 0.86 | 0.85 |
| ES64 | 45 | 0.056 | 1.213 | 0.012 | 0.808 | 0.058 | 1.043 | 0.21 | 1.03 | 0.67 | 0.86 |
| ES70 | 407 | 0.052 | 0.502 | 0.098 | 0.194 | 0.092 | 0.221 | 1.88 | 1.77 | 0.39 | 0.44 |
| Mean | | 0.057 | 0.511 | 0.077 | 0.272 | 0.086 | 0.268 | 1.36 | 1.52 | 0.53 | 0.52 |
| St. Dev./Mean | | 0.853 | 0.493 | 0.493 | 0.610 | 0.501 | 0.795 | 0.58 | 0.59 | 1.24 | 1.61 |

Notes: n is the number of people present in all four years of the panel. For the last four columns, the mean and the ratio st. dev./mean are calculated as the ratio of the corresponding values across regions. The official regional codes are used; the names of the regions are given on the Eurostat website.

Table 7.5 Persistent poverty by region: conventional, FS and FM measures

| Region | n | Conv. est | Conv. CV | FS est | FS CV | FM est | FM CV | Ratio est FS/conv | Ratio est FM/conv | Ratio CV FS/conv | Ratio CV FM/conv |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ES11 | 643 | 0.072 | 0.235 | 0.199 | 0.119 | 0.123 | 0.101 | 2.75 | 1.71 | 0.50 | 0.43 |
| ES12 | 375 | 0.074 | 0.397 | 0.079 | 0.173 | 0.102 | 0.223 | 1.06 | 1.37 | 0.44 | 0.56 |
| ES13 | 286 | 0.092 | 0.393 | 0.132 | 0.233 | 0.109 | 0.213 | 1.43 | 1.18 | 0.59 | 0.54 |
| ES21 | 398 | 0.034 | 0.471 | 0.081 | 0.212 | 0.074 | 0.233 | 2.35 | 2.15 | 0.45 | 0.49 |
| ES22 | 311 | 0.019 | 0.714 | 0.104 | 0.277 | 0.039 | 0.359 | 5.40 | 2.02 | 0.39 | 0.50 |
| ES23 | 311 | 0.164 | 0.343 | 0.091 | 0.175 | 0.173 | 0.213 | 0.55 | 1.06 | 0.51 | 0.62 |
| ES24 | 349 | 0.061 | 0.380 | 0.068 | 0.204 | 0.081 | 0.243 | 1.11 | 1.32 | 0.54 | 0.64 |
| ES30 | 660 | 0.048 | 0.353 | 0.090 | 0.147 | 0.082 | 0.140 | 1.86 | 1.71 | 0.42 | 0.40 |
| ES41 | 512 | 0.081 | 0.313 | 0.090 | 0.183 | 0.138 | 0.208 | 1.11 | 1.69 | 0.59 | 0.67 |
| ES42 | 445 | 0.198 | 0.234 | 0.117 | 0.184 | 0.198 | 0.140 | 0.59 | 1.00 | 0.79 | 0.60 |
| ES43 | 365 | 0.210 | 0.150 | 0.128 | 0.145 | 0.250 | 0.100 | 0.61 | 1.19 | 0.96 | 0.66 |
| ES51 | 841 | 0.055 | 0.250 | 0.110 | 0.145 | 0.091 | 0.140 | 2.02 | 1.66 | 0.58 | 0.56 |
| ES52 | 651 | 0.074 | 0.248 | 0.139 | 0.128 | 0.122 | 0.127 | 1.88 | 1.64 | 0.52 | 0.51 |
| ES53 | 221 | 0.043 | 0.719 | 0.128 | 0.340 | 0.080 | 0.191 | 2.97 | 1.86 | 0.47 | 0.27 |
| ES61 | 980 | 0.165 | 0.161 | 0.210 | 0.095 | 0.201 | 0.089 | 1.27 | 1.22 | 0.59 | 0.56 |
| ES62 | 377 | 0.145 | 0.370 | 0.144 | 0.224 | 0.153 | 0.210 | 0.99 | 1.05 | 0.61 | 0.57 |
| ES63 | 132 | 0.155 | 0.616 | 0.188 | 0.500 | 0.171 | 0.510 | 1.21 | 1.10 | 0.81 | 0.83 |
| ES64 | 45 | 0.056 | 1.213 | 0.053 | 0.125 | 0.070 | 0.891 | 0.94 | 1.25 | 0.10 | 0.74 |
| ES70 | 407 | 0.082 | 0.341 | 0.179 | 0.158 | 0.126 | 0.197 | 2.19 | 1.54 | 0.46 | 0.58 |
| Mean | | 0.096 | 0.416 | 0.122 | 0.198 | 0.125 | 0.238 | 1.27 | 1.30 | 0.48 | 0.57 |
| St. Dev./Mean | | 0.599 | 0.606 | 0.370 | 0.473 | 0.433 | 0.781 | 0.62 | 0.78 | 1.65 | 1.43 |

Notes: n is the number of people present in all four years of the panel. For the last four columns, the mean and the ratio st. dev./mean are calculated as the ratio of the corresponding values across regions. The official regional codes are used; the names of the regions are given on the Eurostat website.

For each measure, standard errors are estimated using JRR. The tables present the estimates, their coefficients of variation (CV), and the ratios of the estimates and of the CVs of the fuzzy measures to those of the conventional measures.
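Because the tables report CVs rather than standard errors, the standard error of an estimate has to be recovered from the usual definition CV = SE / estimate, i.e. SE = CV × estimate. The sketch below applies this to the rounded published values for continuous poverty in region ES12 (Table 7.4), so the results are only approximate; the helper name `standard_error` is ours. It illustrates that a fuzzy measure can have a much smaller CV yet a larger standard error when its estimate is several times larger than a very small conventional estimate.

```python
def standard_error(estimate: float, cv: float) -> float:
    """CV = SE / estimate, so SE = CV * estimate."""
    return cv * estimate


# Continuous poverty, region ES12 (Table 7.4): (estimate, CV), rounded values.
conventional = (0.005, 0.660)
fuzzy_fs = (0.043, 0.214)

se_conv = standard_error(*conventional)
se_fs = standard_error(*fuzzy_fs)
cv_ratio = fuzzy_fs[1] / conventional[1]

print(f"SE conventional ~ {se_conv:.4f}, SE FS ~ {se_fs:.4f}")  # 0.0033, 0.0092
print(f"CV ratio FS/conv ~ {cv_ratio:.2f}")  # 0.32, as in the table
```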

The empirical analysis indicates that, in general, the fuzzy measures have smaller standard errors than the conventional measures, which confirms the findings in Betti et al. (2018). This is particularly true for the any-time and persistent poverty measures. For continuous poverty, it holds in terms of the number of regions in which the fuzzy measures have smaller standard errors than the conventional one; in the regions where the opposite occurs (smaller standard errors for the conventional measure), the values of the ratios are very large. The CV ratio offers further evidence of the stability of the fuzzy longitudinal poverty measures. For any-time poverty, the CV ratio shows an average reduction across regions of 25% for FM and 30% for FS; the FS ratio is more variable than the FM ratio. The CV ratio shows a greater reduction for continuous poverty than for any-time poverty, of about 45-50%. Again, the FS ratio is more variable than the FM ratio, but the variation is smaller than for any-time poverty.
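Two different averages of the CV ratio can be formed from the tables: the ratio of the cross-regional mean CVs (the "Mean" row of the last columns, per the table notes) and the mean of the regional CV ratios. The reductions quoted above (about 30% for FS in any-time poverty) appear to correspond to the second; the sketch below recomputes both from the rounded Table 7.3 values, listed in table order from ES11 to ES70.

```python
from statistics import mean

# Any-time poverty, Table 7.3, regions ES11 ... ES70 in table order.
cv_conv = [0.107, 0.213, 0.127, 0.190, 0.304, 0.205, 0.225, 0.160, 0.197, 0.098,
           0.140, 0.121, 0.126, 0.191, 0.070, 0.206, 0.475, 1.213, 0.210]
cv_fs = [0.081, 0.155, 0.131, 0.109, 0.170, 0.129, 0.146, 0.087, 0.128, 0.126,
         0.103, 0.099, 0.090, 0.121, 0.062, 0.134, 0.394, 0.298, 0.080]
# Published regional CV ratios FS/conv (rounded in the table).
cv_ratio_fs = [0.76, 0.73, 1.03, 0.57, 0.56, 0.63, 0.65, 0.54, 0.65, 1.29,
               0.74, 0.82, 0.71, 0.64, 0.88, 0.65, 0.83, 0.25, 0.38]

ratio_of_means = mean(cv_fs) / mean(cv_conv)  # as in the "Mean" row
mean_of_ratios = mean(cv_ratio_fs)            # average regional ratio

print(f"ratio of means: {ratio_of_means:.2f}")  # 0.58
print(f"mean of ratios: {mean_of_ratios:.2f}")  # 0.70
```

The gap between the two (0.58 against 0.70) comes mainly from regions such as ES64, whose very large conventional CV dominates the mean CV but contributes only one value to the average of ratios.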

The CV ratio again shows a greater reduction for persistent poverty than for any-time poverty, in line with the ratio for continuous poverty: the reduction is about 45%. The variation in the CV ratio for FM and FS is very similar to that for continuous poverty.

Notes

  • 1 NUTS is the abbreviation for Nomenclature of Territorial Units for Statistics, Eurostat's hierarchical classification of regions, ranging from member states (NUTS 0) to smaller areas.
  • 2 See also Lemmi and Betti (2006) for further contributions on the philosophy, mathematics and economics of the fuzzy-set approach to poverty measurement, as well as the contributions of Cheli and Betti (1999), Belhadj (2011, 2012), Alkire and Foster (2011), Belhadj and Limam (2012) and the book by Betti and Lemmi (2013).
  • 3 We express the following arbitrarily in terms of longitudinal poverty at two points in time.
  • 4 Fuzzy-set operations are generalisations of the corresponding 'crisp' operations; however, there is more than one way to formulate them (e.g. standard, algebraic and bounded operations), each representing an equally valid generalisation of the corresponding crisp-set operations. Klir and Yuan (1995) provide an excellent discussion of different types of fuzzy operations. The choice among alternative formulations should be made primarily on substantive grounds: some options are more appropriate (meaningful, useful, illuminating, convenient) than others, depending on the context and objectives of the application. Following Verma et al. (2017), in this chapter we present a formulation suited specifically to the study of poverty and deprivation.
  • 5 plOI = 1 and n(Ttl = 0 by definition.
  • 6 p and n are used in sequence to specify the poor and non-poor pattern referred to.
  • 7 This is given by the sum of rows (2) and (3) in Table 7.1; at most, only one of those rows is non-zero.
  • 8 Full information on the sample structure of other EU-SILC surveys is not available in the public-access data files.
  • 9 Analysis at the regional (sub-national) level requires special procedures; see Verma et al. (2010b) on the robustness of EU-SILC-based indicators at the regional level.
  • 10 The term ‘primary selection’ refers to a set of final units drawn independently from a primary sample unit (PSU).
  • 11 This is to ensure that, although the income distribution on which the measures analysed here are based is highly skewed, the sampling distribution of the measures constructed from large enough samples tends towards a normal distribution.
  • 12 Verma and Betti (2011) demonstrate that a variant of the JRR method can perform better for 'complex measures'.
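Note 4 mentions standard, algebraic and bounded fuzzy-set operations. As a sketch of these textbook operator pairs (intersection, union) for membership degrees a and b in [0, 1], in the sense of Klir and Yuan (1995), consider one person's degrees of membership in the poor set at two points in time. This is not the formulation of Verma et al. (2017) used in the chapter, and the function names are ours; it only shows that the three families agree on crisp (0/1) inputs but diverge for intermediate degrees.

```python
from typing import Tuple


def standard_ops(a: float, b: float) -> Tuple[float, float]:
    """Standard intersection and union: min and max."""
    return min(a, b), max(a, b)


def algebraic_ops(a: float, b: float) -> Tuple[float, float]:
    """Algebraic product and algebraic sum."""
    return a * b, a + b - a * b


def bounded_ops(a: float, b: float) -> Tuple[float, float]:
    """Bounded difference (intersection) and bounded sum (union)."""
    return max(0.0, a + b - 1.0), min(1.0, a + b)


# Membership in the poor set at two points in time.
a, b = 0.6, 0.7
for ops in (standard_ops, algebraic_ops, bounded_ops):
    inter, union = ops(a, b)
    print(f"{ops.__name__:14s} poor at both: {inter:.2f}  poor at either: {union:.2f}")
# standard 0.60/0.70, algebraic 0.42/0.88, bounded 0.30/1.00
```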

References

Aassve A., Betti G., Mazzuco S., Mencarini L. (2007), Marital disruption and economic well-being; a comparative analysis, Journal of the Royal Statistical Society, Series A, 170(3), pp. 781-799.

Alkire S., Foster J. (2011), Counting and multidimensional poverty measurement, Journal of Public Economics, 95(7-8), pp. 476-487.

Alper M.O., Berger Y.G. (2015), Variance estimation of change in poverty rates: An application to Turkish EU-SILC survey, Journal of Official Statistics, 31(2), pp. 155-175.

Belhadj B. (2011), A new fuzzy unidimensional poverty index from an information theory perspective, Empirical Economics, 40(3), pp. 687-704.

Belhadj B. (2012), New weighting scheme for the dimensions in multidimensional poverty indices, Economics Letters, 116(3), pp. 304-307.

Belhadj B. (2015), Employment measure in development countries via minimum wage and poverty: new Fuzzy approach, Opsearch, 52(2), pp 329-339.

Belhadj B., Limam M. (2012), Unidimensional and multidimensional fuzzy poverty measures: New approach, Economic Modelling, 29(4), pp. 995-1002.

Berger Y.G., Priam R. (2016), A simple variance estimator of change for rotating repeated surveys: An application to the EU-SILC household surveys, Journal of the Royal Statistical Society, Series A, 179(1), pp. 251-272.

Berger Y.G., Skinner C.J. (2003), Variance estimation of a low-income proportion, Journal of the Royal Statistical Society, Series C, 52, pp. 457-468.

Betti G., D’Agostino A., Neri L. (2011), Educational mismatch of graduates: A multidimensional and fuzzy indicator, Social Indicators Research, 103(3), pp. 465-480.

Betti G., Gagliardi F., Lemmi A., Verma V. (2012), Sub-national indicators of poverty and deprivation in Europe: methodology and applications, Cambridge Journal of Regions, Economy and Society, 5(1), pp. 149-162.

Betti G., Gagliardi F., Lemmi A., Verma V. (2015), Comparative measures of multidimensional deprivation in the European Union, Empirical Economics, 49(3), pp. 1071-1100.

Betti G., Gagliardi F., Verma V. (2016a), Variance estimation for cumulative and longitudinal poverty indicators from panel data at regional level, in Pratesi M. (ed.), Analysis of Poverty Data by Small Area Estimation, John Wiley, United Kingdom, pp. 129-147.

Betti G., Gagliardi F., Verma V. (2018), Simplified Jack-knife variance estimates for fuzzy measures of multidimensional poverty, International Statistical Review, 86(1), pp. 68-86.

Betti G., Lemmi A. eds. (2013), Poverty and Social Exclusion: New Methods of Analysis. London: Routledge.

Betti G., Soldi R., Talev I. (2016b), Fuzzy multidimensional indicators of quality of life: The empirical case of Macedonia, Social Indicators Research, 127(1), pp. 39-53.

Betti G., Verma V. (2008), Fuzzy measures of the incidence of relative poverty and deprivation: A multi-dimensional perspective, Statistical Methods and Applications, 17, pp. 225-250.

Binder D.A. (1983), On the variance of asymptotically normal estimators from complex surveys, International Statistical Review, 51, pp. 279-292.

Binder D.A., Kovacevic M.S. (1995), Estimating some measures of income inequality from survey data: An application of the estimation equation approach, Survey Methodology, 21, pp. 137-145.

Binder D.A., Patak Z. (1994), Use of estimation functions for interval estimation from complex surveys, Journal of the American Statistical Association, 89, pp. 1035-1043.

Cerioli A., Zani S. (1990), A fuzzy approach to the measurement of poverty, in Dagum C., Zenga M. (eds.), Income and Wealth Distribution, Inequality and Poverty, Springer, Berlin, pp. 272-284.

Cheli B., Betti G. (1999), Fuzzy analysis of poverty dynamics on an Italian pseudo panel, 1985-1994, Metron, 57, pp. 83-104.

Cheli B., Lemmi A. (1995), A totally fuzzy and relative approach to the multidimensional analysis of poverty, Economic Notes, 24, pp. 115-134.

Demnati A., Rao J.N.K. (2004), Linearization variance estimators for survey data, Survey Methodology, 30, pp. 17-26.

Deville J.C. (1999), Variance estimation for complex statistics and estimators: Linearization and residual techniques, Survey Methodology, 25, pp. 193-203.

Elbers C., Lanjouw J.O., Lanjouw P. (2003), Micro-level estimation of poverty and inequality, Econometrica, 71(1), pp. 355-364.

European Commission (2010), Europe 2020 - A European strategy for smart, sustainable and inclusive growth.

Eurostat (2000), European Social Statistics: Income, Poverty and Social Exclusion, Detailed Tables, Luxembourg.

Eurostat (2002), European Social Statistics: Income, Poverty and Social Exclusion, 2nd Report, Luxembourg.

Foster J., Greer J., Thorbecke E. (1984), A class of decomposable poverty measures, Econometrica, 52, pp. 761-766.

Giusti C., Marchetti S., Pratesi M., Salvati N. (2012), Robust small area estimation and oversampling in the estimation of poverty indicators, Survey Research Methods, 6(3), pp. 155-163.

Goedeme T., Rottiers S. (2011), Poverty in the Enlarged European Union. A discussion about definitions and reference groups, Sociology Compass, 5(1), pp. 77-91.

Goedeme T. (2013), How much confidence can we have in EU-SILC? Complex sample design and standard error of the Europe 2020 poverty indicators, Social Indicators Research, 110, pp. 89-110.

Graf E., Tille Y. (2014), Variance estimation using linearization for poverty and social exclusion indicators, Survey Methodology, 40(1), pp. 61-79.

Instituto Nacional De Estadistica (INE) (2012), Intermediate Quality Report, Survey on Income and Living Conditions Spain (Spanish ECV 2011).

Klir G.J., Yuan B. (1995), Fuzzy Sets and Fuzzy Logic. Englewood Cliffs, NJ: Prentice-Hall.

Kovacevic M.S., Yung W. (1997), Variance estimation for measures of income inequality and polarization - an empirical study, Survey Methodology, 23(1), pp. 41-52.

Lemmi A., Betti G. eds. (2006), Fuzzy Set Approach to Multidimensional Poverty Measurement. New York: Springer.

Muennich R., Zins S. (2011), Variance Estimation for Indicators of Poverty and Social Exclusion. Work package of the European project on Advanced Methodology for European Laeken Indicators (AMELI).

Osier G. (2009), Variance estimation for complex indicators of poverty and inequality using linearization techniques, Survey Research Methods, 3, pp. 167-195.

Piacentini M. (2014), Measuring Income Inequality and Poverty at the Regional Level in OECD Countries. OECD Statistics Working Papers, 2014/03, OECD.

Preston I. (1995), Sampling distributions of relative poverty statistics, Applied Statistics, 44, pp. 91-99.

United Nations General Assembly (2015), Transforming Our World: The 2030 Agenda for Sustainable Development.

Verma V. (1991), Sampling Methods, Training Handbook. Tokyo: Statistical Institute for Asia and the Pacific.

Verma V., Betti G. (2006), EU statistics on income and living conditions (EU-SILC): Choosing the survey structure and sample design, Statistics in Transition, 7(5), pp. 935-970.

Verma V., Betti G. (2011), Taylor linearization sampling errors and design effects for poverty measures and other complex statistics, Journal of Applied Statistics, 38(8), pp. 1549-1576.

Verma V., Betti G., Gagliardi F. (2010a), An Assessment of Survey Errors in EU-SILC. Eurostat Methodologies and Working Papers, Luxembourg: Publications Office of the European Union.

Verma V., Betti G., Gagliardi F. (2010b), Robustness of Some EU-SILC Based Indicators at Regional Level. Eurostat Methodologies and Working Papers, Luxembourg: Publications Office of the European Union.

Verma V., Betti G., Gagliardi F. (2017), Fuzzy measures of longitudinal poverty in a comparative perspective, Social Indicators Research, 130(2), pp. 435-454.

Verma V., Betti G., Ghellini G. (2007), Cross-sectional and longitudinal weighting in a rotational household panel: applications to EU-SILC, Statistics in Transition, 8(1), pp. 5-50.

Verma V., Lemmi A., Betti G., Gagliardi F., Piacentini M. (2017), How precise are poverty measures estimated at the regional level?, Regional Science and Urban Economics, 66, pp. 175-184.

Whelan C.T., Maitre B. (2007), Measuring material deprivation in EU-Silc: Lessons from the Irish survey, European Societies, 9(2), pp. 147-173.

Zadeh L.A. (1965), Fuzzy sets, Information and Control, 8(3), pp. 338-353.

Zheng B. (2001), Statistical inference for poverty measures with relative poverty lines, Journal of Econometrics, 101, pp. 337-356.

 