Study One: Dimensionality Analysis

Three psychometric issues were considered in evaluating the internal test structure of the Summative ELPAC. First, the number of dimensions (also known as latent factors or language skills) was analyzed to determine whether it reflects the number of skills that the items measure. Goodness-of-fit indices were used to evaluate the number of dimensions. In this study, two dimensions representing oral and written language skills were expected based on the hierarchical score-reporting structure. Two dimensions representing receptive (listening and reading) and productive (speaking and writing) language skills were also evaluated to identify the best way of representing the dimensionality of the Summative ELPAC.

TABLE 8.2 Demographic Information for Summative ELPAC Field Test Sample.

Subgroup          Category                               ELPAC FT Sample   CALPADS Total Data
Total ELs                                                41,942            91,639
Gender            Male                                   53.7%             54.8%
                  Female                                 46.3%             45.2%
Ethnicity         Hispanic or Latino                     83.3%             85.1%
                  Asian                                  9.2%              7.9%
                  White                                  4.5%              4.1%
                  Black or African American              0.3%              0.3%
                  Native Hawaiian or Pacific Islander    0.3%              0.3%
                  American Indian or Alaska Native       0.1%              0.1%
                  Two or more races                      0.3%              0.3%
Other Subgroups   Students with disabilities             9.9%              16.6%
                  Economically disadvantaged             84.4%             85.8%
Home Language     Spanish                                84.0%             85.6%
                  Chinese                                2.6%              2.1%
                  Vietnamese                             1.8%              1.7%
                  Arabic                                 1.6%              1.5%
                  Hmong                                  0.6%              0.6%

Note. Students with limited English proficiency (LEP) status of Yes in CALPADS data obtained on January 4, 2017, when the Summative ELPAC field test sample roster was selected.

TABLE 8.3 Number of Items, Score Points by Grade/Grade Span and by Four Language Skills.

Number of Items (Points) by Grade/Grade Span

Skill        K           1           2           3-5         6-8         9-10        11-12
             N=5,338     N=6,368     N=6,338     N=6,013     N=5,M6      N=6,015     N=5,142
Total        139 (194)   152 (209)   175 (243)   182 (260)   173 (254)   174 (255)   174 (255)
Listening    48 (48)     41 (41)     48 (48)     53 (53)     52 (52)     55 (55)     55 (55)
Speaking     30 (65)     29 (61)     35 (73)     33 (72)     36 (78)     34 (73)     34 (73)
Reading      37 (45)     52 (52)     64 (64)     72 (72)     61 (61)     60 (60)     60 (60)
Writing      24 (36)     30 (55)     28 (58)     24 (63)     24 (63)     25 (67)     25 (67)

The second issue was how the dimensions correlate with each other. The correlation between dimensions was considered an important criterion in the dimensionality evaluation because of its implications for the meaningfulness of each domain score and of the total score (the combined domain scores) on the multidimensional scale. For example, a high correlation between dimensions (e.g., higher than 0.9) indicates that the dimensions are not distinct from each other (Bagozzi & Yi, 1988). In this study, we expected that the two dimensions representing oral and written language skills would be distinct but moderately correlated (around 0.7 to 0.8).

The third issue was how to interpret the meaning of the identified dimensions. The hypothesized dimensions underlying the observed item responses were evaluated through the significance of the item parameter estimates (i.e., factor loading estimates, in factor analytic terms). The meaning of each dimension was important because a clear understanding of what the scores represent is necessary for the proper interpretation of scale scores and, ultimately, for the choice of scoring methods in the psychometric analysis.

Analysis

In this study, the MIRT-based item-level factor analytic approach was used to evaluate four models that represent the hypothesized factor structure of the Summative ELPAC. These four models were:

  • 1. Model 1: A correlated four-factor model in which listening, speaking, reading, and writing items are considered unique language skills (shown in Figure 8.3a)
  • 2. Model 2: A correlated two-factor model in which listening and speaking items are considered oral language skills and reading and writing items are considered written language skills (shown in Figure 8.3b)

FIGURE 8.3a Correlated four-factor model (listening, speaking, reading, writing).

Note. Squares and ellipse(s) represent observed items and latent factor(s) respectively.

  • 3. Model 3: A correlated two-factor model in which listening and reading items are considered receptive language skills and speaking and writing items are considered productive language skills (shown in Figure 8.3c)
  • 4. Model 4: A single-factor model in which all four language skills are psychometrically indistinguishable from one another (shown in Figure 8.3d; a schematic encoding of the four item-to-factor mappings is sketched after this list)
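
To make the four hypothesized structures concrete, the sketch below encodes each model as a mapping from reporting domains to latent factors and groups items accordingly. The item identifiers are hypothetical placeholders rather than actual ELPAC item IDs, and each item is assigned to exactly one factor, consistent with the simple-structure constraint described in the next paragraph.

from typing import Dict, List

# Each (hypothetical) item is tagged with the single reporting domain it belongs to.
item_domains: Dict[str, str] = {
    "L01": "listening", "L02": "listening",
    "S01": "speaking",  "S02": "speaking",
    "R01": "reading",   "R02": "reading",
    "W01": "writing",   "W02": "writing",
}

# Domain-to-factor mappings corresponding to Models 1-4.
factor_structures: Dict[str, Dict[str, str]] = {
    "Model 1 (4F)":  {"listening": "listening", "speaking": "speaking",
                      "reading": "reading", "writing": "writing"},
    "Model 2 (O+W)": {"listening": "oral", "speaking": "oral",
                      "reading": "written", "writing": "written"},
    "Model 3 (R+P)": {"listening": "receptive", "reading": "receptive",
                      "speaking": "productive", "writing": "productive"},
    "Model 4 (1F)":  {"listening": "ELP", "speaking": "ELP",
                      "reading": "ELP", "writing": "ELP"},
}

def loading_pattern(model: str) -> Dict[str, List[str]]:
    """Group items by the latent factor they load on under a given model."""
    grouped: Dict[str, List[str]] = {}
    for item, domain in item_domains.items():
        grouped.setdefault(factor_structures[model][domain], []).append(item)
    return grouped

for model in factor_structures:
    print(model, loading_pattern(model))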

As illustrated in Figure 8.3, the current study used simplified factor structure models for practical reasons (i.e., score interpretation). These models made the strong assumption that each ELPAC test item was associated with a single language skill (listening, speaking, reading, or writing), even though items from integrated task types measure multiple language skills. It is best practice in score reporting to associate each test item with one reporting category, and the factor structures tested in this study followed this principle. When a test item assessed integrated skills, the item was mapped to the language skill that the student used to provide a response. For example, the task type called Summarize an Academic Presentation assessed a student’s ability to


FIGURE 8.3b Correlated two-factor model (oral and written).

listen to a presentation while viewing supporting images and then provide a spoken summary of the main points and details of the presentation with the assistance of the images. While the item assessed a student’s ELP in the skills of listening and speaking, the item was mapped to the speaking domain because the student’s response was spoken (e.g., Sawaki et al., 2008).

All analyses were conducted using the flexMIRT software (Cai, 2013). Because the ELPAC contains mixed response types (i.e., multiple-choice and constructed-response items), both unidimensional and multidimensional versions of the 2PLM and GPCM were used. The unidimensional model, which was the most parsimonious, served as the baseline for comparison with the results from the multidimensional models. In total, 28 factor models (4 factor models × 7 grades/grade spans) were evaluated to identify the best-fitting test structure of the Summative ELPAC.
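
For reference, one standard slope-intercept parameterization of the multidimensional two-parameter logistic model (2PLM, for dichotomous items) and generalized partial credit model (GPCM, for polytomous items) is shown below in LaTeX notation. This is a conventional textbook form and is not necessarily the exact parameterization estimated by flexMIRT.

% Multidimensional 2PL for dichotomous item j and examinee i, with slope
% vector a_j, intercept d_j, and latent trait vector theta_i:
P(X_{ij}=1 \mid \boldsymbol{\theta}_i)
  = \frac{1}{1 + \exp\left[-\left(\mathbf{a}_j^{\top}\boldsymbol{\theta}_i + d_j\right)\right]}

% Multidimensional GPCM for polytomous item j with score categories
% k = 0, 1, ..., K_j (the empty sum for k = 0 is defined as zero):
P(X_{ij}=k \mid \boldsymbol{\theta}_i)
  = \frac{\exp\left[\sum_{v=1}^{k}\left(\mathbf{a}_j^{\top}\boldsymbol{\theta}_i + d_{jv}\right)\right]}
         {\sum_{m=0}^{K_j}\exp\left[\sum_{v=1}^{m}\left(\mathbf{a}_j^{\top}\boldsymbol{\theta}_i + d_{jv}\right)\right]}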

Three goodness-of-fit indices were used to evaluate the factor models: the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the -2 log-likelihood (-2LL). AIC and BIC are comparative measures of fit that can be used to compare either nested or non-nested models, so


FIGURE 8.3c Correlated two-factor model (receptive and productive).

they are useful in comparing multiple competing models. The model with the lowest AIC and BIC is considered the best fitting. The difference in -2LL between models and the ratio of -2LL to the degrees of freedom were also evaluated.
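
To make these definitions concrete, the short sketch below recomputes AIC and BIC from a model's -2LL, its number of estimated parameters, and the sample size, using the standard formulas AIC = -2LL + 2k and BIC = -2LL + k ln(N). Checking the kindergarten single-factor model against Table 8.4 suggests that the table's df column corresponds to the number of estimated parameters; the sketch is only an illustration of the indices, not flexMIRT's own output routine.

import math

def aic(neg2ll: float, n_params: int) -> float:
    """Akaike information criterion: -2LL + 2k."""
    return neg2ll + 2 * n_params

def bic(neg2ll: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2LL + k * ln(N)."""
    return neg2ll + n_params * math.log(n_obs)

# Kindergarten single-factor model from Table 8.4:
# -2LL = 367,779; df = 333; N = 5,338 kindergarten examinees (Table 8.3).
print(round(aic(367_779, 333)))         # 368445, matching the tabled AIC
print(round(bic(367_779, 333, 5_338)))  # 370637, matching the tabled BIC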

Despite considerable advances in the estimation of confirmatory factor models, there is no rule of thumb or perfect fit index for accepting a factor model. In general, a more complex factor model tends to fit better than a model with fewer factors. Thus, model parsimony, the reasonableness (statistical significance) of individual parameter estimates, and the correlations among the latent factors were all considered in the evaluation of competing models to select the most useful structure.

Results

Table 8.4 presents a summary of the model fit indices for all grade levels. Across all grades, the best-fitting model was the correlated four-factor model and the worst-fitting model was the single-factor model. The oral and written


FIGURE 8.3d Unidimensional model (ELP: English language proficiency).

factor model also fit reasonably well across all grades, while the receptive and productive factor model only fit well for Grades 6-12.

Factor loadings, which are the regression slopes of the item responses on the latent factor(s), are reported in Tables 8.5 through 8.7 and indicate how distinctly each hypothesized factor contributes to the test structure.

Values higher than 0.3 are generally considered reasonable factor loadings. The results show that reasonable factor loadings were obtained from all three multifactor models. Tables 8.5 and 8.6 also provide the correlations between the two pairs of latent factors (oral and written; receptive and productive), respectively. Correlations between receptive and productive language skills for the kindergarten and Grades 1, 2, and 3-5 tests ranged from 0.80 to 0.84 (Table 8.6), while these correlations were lower for Grade 6 and up. Thus, receptive and productive language skills were not as distinct at kindergarten through Grade 5 as they were at Grades 6-12. Table 8.8 shows the correlations among the four language skills of listening, speaking, reading, and writing. Reading and writing were more highly correlated in the lower grades than in the higher grades (0.68 to 0.87). The correlation between listening and speaking

TABLE 8.4 Summary of Model Fit Statistics (Kindergarten through Grade 12).

Grade/Grade Span   Model           df    -2 Log Likelihood   AIC       BIC       Order of Lowest Fit Value
K                  1F              333   367,779             368,445   370,637   4
                   2F (O+W)        334   353,507             354,175   356,374   2
                   2F (R+P)        334   364,232             364,900   367,099   3
                   4F (L+S+R+W)    339   347,407             348,085   350,316   1
1                  1F              361   336,648             337,370   339,810   4
                   2F (O+W)        362   328,357             329,081   331,528   2
                   2F (R+P)        362   335,013             335,737   338,184   3
                   4F (L+S+R+W)    367   324,944             325,678   328,158   1
2                  1F              418   408,755             409,591   412,414   4
                   2F (O+W)        419   400,133             400,971   403,801   2
                   2F (R+P)        419   406,056             406,894   409,724   3
                   4F (L+S+R+W)    424   394,676             395,524   398,387   1
3-5                1F              442   455,579             456,463   459,426   4
                   2F (O+W)        443   452,188             453,074   456,043   2
                   2F (R+P)        443   453,443             454,329   457,298   3
                   4F (L+S+R+W)    448   448,248             449,144   452,147   1
6-8                1F              427   466,630             467,484   470,319   4
                   2F (O+W)        428   463,530             464,386   467,228   3
                   2F (R+P)        428   463,336             464,192   467,033   2
                   4F (L+S+R+W)    433   458,802             459,668   462,543   1
9-10               1F              429   505,500             506,358   509,233   4
                   2F (O+W)        430   500,863             501,723   504,605   3
                   2F (R+P)        430   499,019             499,879   502,761   2
                   4F (L+S+R+W)    435   492,963             493,833   496,749   1
11-12              1F              429   429,173             430,031   432,839   4
                   2F (O+W)        430   426,355             427,215   430,029   3
                   2F (R+P)        430   424,588             425,448   428,263   2
                   4F (L+S+R+W)    435   420,489             421,359   424,206   1

Note. 1F denotes the single-factor model; 2F (O+W) denotes the correlated two-factor model with oral and written language skills; 2F (R+P) denotes the correlated two-factor model with receptive and productive skills; 4F (L+S+R+W) denotes the correlated four-factor model. AIC and BIC denote the Akaike and Bayesian information criteria, respectively.

language skills was also moderately high (0.59 to 0.69) across all grades/grade spans, with somewhat higher values in the lower grades.

Test length as well as ease of score scale maintenance over future administrations were two practical issues considered during the factor model evaluation. Although the correlated four-factor model had a slightly better model fit, it would require a greater number of items to implement than either the receptive and productive

TABLE 8.5 Correlations and Factor Loadings from Oral and Written Language Skills.

Grade/Grade Span   Mean (and SD) of Non-zero Factor Loadings   Correlations across Latent Factors
                   Oral          Written
K                  .55 (.14)     .61 (.23)                     .62
1                  .54 (.14)     .66 (.09)                     .62
2                  .51 (.11)     .67 (.13)                     .58
3-5                .47 (.12)     .45 (.15)                     .70
6-8                .47 (.18)     .39 (.12)                     .72
9-10               .50 (.19)     .46 (.13)                     .75
11-12              .48 (.18)     .43 (.14)                     .77

TABLE 8.6 Correlations and Factor Loadings from Receptive and Productive Language Skills.

Grade/Grade Span   Mean (and SD) of Non-zero Factor Loadings   Correlations across Latent Factors
                   Receptive     Productive
K                  .53 (.12)     .64 (.18)                     .82
1                  .59 (.15)     .58 (.11)                     .82
2                  .61 (.18)     .51 (.14)                     .84
3-5                .44 (.15)     .47 (.10)                     .80
6-8                .39 (.14)     .53 (.12)                     .71
9-10               .45 (.14)     .59 (.13)                     .69
11-12              .43 (.15)     .57 (.12)                     .69

TABLE 8.7 Factor Loadings from Four Language Skills (Listening, Speaking, Reading, and Writing).

Grade/Grade Span   Mean (and SD) of Non-zero Factor Loadings
                   Listening     Speaking      Reading       Writing
K                  .57 (.12)     .70 (.07)     .54 (.14)     .85 (.06)
1                  .56 (.14)     .66 (.11)     .70 (.09)     .67 (.10)
2                  .58 (.15)     .59 (.08)     .71 (.14)     .67 (.10)
3-5                .48 (.11)     .60 (.11)     .46 (.18)     .52 (.10)
6-8                .43 (.15)     .66 (.09)     .40 (.13)     .51 (.09)
9-10               .48 (.14)     .72 (.09)     .48 (.15)     .56 (.09)
11-12              .44 (.15)     .70 (.07)     .46 (.16)     .55 (.10)

TABLE 8.8 Correlations across Four Language Skills (Listening, Speaking, Reading, and Writing).

Correlations among Latent Factors

Grade/Grade Span   Domain      Listening   Speaking   Reading   Writing
K                  Listening   1
                   Speaking    .69         1
                   Reading     .79         .68        1
                   Writing     .57         .52        .81       1
1                  Listening   1
                   Speaking    .69         1
                   Reading     .65         .55        1
                   Writing     .59         .55        .87       1
2                  Listening   1
                   Speaking    .61         1
                   Reading     .71         .48        1
                   Writing     .57         .43        .84       1
3-5                Listening   1
                   Speaking    .60         1
                   Reading     .74         .54        1
                   Writing     .59         .54        .78       1
6-8                Listening   1
                   Speaking    .59         1
                   Reading     .75         .49        1
                   Writing     .61         .60        .68       1
9-10               Listening   1
                   Speaking    .61         1
                   Reading     .78         .49        1
                   Writing     .65         .69        .70       1
11-12              Listening   1
                   Speaking    .61         1
                   Reading     .78         .48        1
                   Writing     .69         .68        .68       1

factor model or the oral and written factor model. Test length was an important consideration in a situation in which the Summative ELPAC would be administered to over 1.2 million students per year. Not only do longer tests increase the burden of test administration and scoring, but they also reduce the amount of instructional time, since paper-based tests were administered by local test examiners, who were often ESL classroom teachers. At kindergarten and Grade 1, all domains were administered one-on-one between a test examiner and a student, and the speaking domain was delivered as a one-on-one administration at all grades/grade spans. The small advantage that the correlated four-factor model had in terms of model fit was outweighed by the negative consequences of increased test length.

A comparison of the oral and written factor model with the receptive and productive factor model revealed mixed results. While the oral and written factor model fit better at Grades K-5, the empirical data supported the receptive and productive factor model at Grades 6-12. Since the empirical data did not lead to a clear choice between these two factor models, further practical considerations were made. It is important to note that all receptive language skill items were multiple-choice (objectively scored) items, and all productive language skill items were constructed-response (human-scored) items. The results for the receptive and productive factor model could therefore reflect a distinction between two item types rather than between latent language skills. One practical issue regarding scoring weights for kindergarten made the oral and written language model more viable than the receptive and productive factor model.

While one purpose of the ELPAC is to provide information about whether a student should be designated an EL, it is difficult to collect such information using assessments of reading and writing at kindergarten, when native speakers of English are still developing those skills. As a result, educators in California wanted to place less weight on reading and writing scores than on listening and speaking scores when producing overall proficiency scores at kindergarten. In this case, the oral and written language model was suitable because different weights could easily be applied to kindergarten reading and writing scores while maintaining continuous scales between kindergarten and Grade 1. Since the written language skill represented the combined score of the reading and writing items, applying the weights to kindergarten students’ scores was simpler under the oral and written factor model than under the receptive and productive factor model.
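
As a purely illustrative example of this point, the sketch below applies different weights to oral and written composite scores when forming an overall score. The weights and score values shown are hypothetical placeholders, not the operational ELPAC weights or scales; the point is only that down-weighting the written composite at kindergarten requires changing two numbers while the same two composites are reported at every grade.

def overall_score(oral: float, written: float,
                  oral_weight: float = 0.5, written_weight: float = 0.5) -> float:
    """Combine oral and written composite scores into an overall score.

    The weights are hypothetical; down-weighting the written composite at
    kindergarten only changes the two weight arguments, leaving the oral and
    written composites themselves untouched.
    """
    return oral_weight * oral + written_weight * written

# Kindergarten: place less weight on the written composite (hypothetical weights).
print(overall_score(oral=520.0, written=480.0, oral_weight=0.7, written_weight=0.3))
# Grade 1 and above: equal weights (also hypothetical).
print(overall_score(oral=520.0, written=480.0))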

The dimensionality analysis yielded empirical evidence regarding the model fit of four different factor models. Based on both the statistical analyses and practical considerations, the oral and written factor model was selected to represent the internal structure of the Summative ELPAC. The statistical results across grades/grade spans under the oral and written language model were considered as a whole to facilitate the anticipated development of continuous scales across grades/grade spans.

 