Strengths and Weaknesses of Each Assessment Method

Three sorts of data can be used to investigate the strengths and weaknesses of each assessment method. The first is evaluation by academic experts, who are interested primarily in validity but also in other factors. There is no complete consensus, but clear trends can be seen. Judged on the two sets of criteria, these reviews lead to the following conclusions. First, assessment centres and peer ratings are arguably the best selection methods; the former are expensive and the latter low cost. Second, many well-known methods (interviews, references) are of very limited validity. Third, surprisingly little is known about the potential bias of these tests. Fourth, although this table was published over 15 years ago, few would disagree with the overall trends.

Schmitt (1989) argued for the importance not only of validity but also of fairness in employment selection. Subgroup mean difference refers to the fact that these tests yield different average results for different groups of people (male vs. female, Black vs. White, old vs. young). This is an important area of bias (see Table 10.1). The larger the subgroup mean difference, the greater the potential bias in a test, because the test then differentiates between groups on the basis of gender, age, race, etc.
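To make the idea of a subgroup mean difference concrete, the following is a minimal sketch that assumes the difference is expressed in pooled standard deviation units, a common convention that the text itself does not specify. The function name and the scores are hypothetical and invented purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Difference between two subgroup means in pooled standard deviation units.

    Assumption: this standardized form is one common way to express a
    subgroup mean difference; the source does not state which form it uses.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical test scores for two applicant subgroups (not real data).
group_a = [52, 55, 58, 61, 64]
group_b = [50, 53, 56, 59, 62]

d = standardized_mean_difference(group_a, group_b)
print(round(d, 2))  # 0.42
```

A value near zero would mean the two subgroups score alike on average; the further the value is from zero, the more the test separates the groups.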

Anderson and Cunningham-Snell (2000) make an interesting and important distinction between validity (i.e. predictive accuracy; see Table 10.2) and popularity (see Table 10.3). Cook (2009, pp. 283-287) lists six criteria for judging selection tests:

  • 1 Validity is the most important criterion. Unless a test can predict productivity, there is little point in using it.
  • 2 Cost tends to be accorded far too much weight. Cost is not an important consideration if the test has validity. A valid test, even the most elaborate and expensive, is almost always worth using.

Table 10.1 Level of validity and subgroup mean difference for various predictors.

| Predictor | Validity | Subgroup Mean Difference |
|---|---|---|
| Cognitive ability and special aptitude | Moderate | Moderate |
| Personality | Low | Small |
| Interest | Low | ? [a] |
| Physical ability | Moderate-high | Large [b] |
| Biographical information | Moderate | ? |
| Interviews | Low | Small (?) |
| Work samples | High | Small |
| Seniority | Low | Large (?) |
| Peer evaluations | High | ? |
| Reference checks | Low | ? |
| Academic performance | Low | ? |
| Self-assessments | Moderate | Small |
| Assessment centres | High | Small |

[a] = a lack of data or inconsistent data; [b] = mean differences largely between male and female subgroups.

Table 10.2 Predictive accuracy.

| Method | Predictive accuracy (range 0-1) |
|---|---|
| Perfect prediction | 1 |
| Assessment centres - promotion | 0.68 |
| Work samples | 0.54 |
| Ability tests | 0.54 |
| Structured interviews | 0.44 |
| Integrity tests | 0.41 |
| Assessment centres - performance | 0.41 |
| Biodata | 0.37 |
| Personality tests | 0.38 |
| Unstructured interviews | 0.33 |
| Self-assessment | 0.15 |
| Reference | 0.13 |
| Astrology | 0 |
| Graphology | 0 |

Table 10.3 Popularity of assessment methods.

| Method | Popularity |
|---|---|
| Interviews | 97% |
| References | 96% |
| Application forms | 93% |
| Ability tests | 91% |
| Personality tests | 80% |
| Assessment centres | 59% |
| Biodata | 19% |
| Graphology | 2.6% |
| Astrology | 0% |
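The distinction between validity and popularity can be illustrated by placing the figures from Tables 10.2 and 10.3 side by side. The sketch below is illustrative only and rests on several assumptions: method labels are matched across the two tables by name, the "performance" figure is used for assessment centres, and interviews are left out because Table 10.2 separates structured from unstructured interviews while Table 10.3 does not.

```python
# Figures copied from Tables 10.2 and 10.3. The pairing of labels across
# the two tables is an assumption made for this illustration.
validity = {
    "Ability tests": 0.54,
    "Assessment centres": 0.41,  # assumption: the "performance" figure
    "Personality tests": 0.38,
    "Biodata": 0.37,
    "References": 0.13,
    "Graphology": 0.0,
    "Astrology": 0.0,
}
popularity = {  # percentage of use reported in Table 10.3
    "Ability tests": 91,
    "Assessment centres": 59,
    "Personality tests": 80,
    "Biodata": 19,
    "References": 96,
    "Graphology": 2.6,
    "Astrology": 0,
}

# The most valid and the most popular method need not coincide.
most_valid = max(validity, key=validity.get)
most_popular = max(popularity, key=popularity.get)

# List the methods from most to least valid, with popularity alongside.
for method in sorted(validity, key=validity.get, reverse=True):
    print(f"{method:<20} validity={validity[method]:.2f} popularity={popularity[method]}%")

print("Most valid:", most_valid)
print("Most popular:", most_popular)
```

Among the methods compared here, references are the most widely used yet sit near the bottom on predictive accuracy, whereas biodata show moderate accuracy but are rarely used.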

Table 10.4 Summary of 12 selection tests by six criteria.

| Selection Test | VAL | COST | PRAC | GEN | ACC | LEGAL |
|---|---|---|---|---|---|---|
| Interview | Low | Medium/Low | High | High | High | Uncertain |
| Structured interview | High | High | ?Limited | High | Untested | No problems |
| References | Moderate | Very low | High | High | Medium | Some doubts |
| Peer rating | High | Very low | Very limited | Very limited | Low | Untested |
| Biodata | High | High/Low | High | High | Low | Some doubts |
| Ability | High | Low | High | High | Low | Major problems |
| Psychomotor test | High | Low | Moderate | Limited | Untested | Untested |
| Job knowledge | High | Low | High | Limited | Untested | Some doubts |
| Personality | Variable | Low | High | High | Low | Some doubts |
| Assessment centre | High | Very high | Fair | Fair | High | No problems |
| Work sample | High | High | Limited | Limited | High | No problems |
| Education | Moderate | Nil | High | High | Untested | Major doubts |

VAL = validity, COST = cost, PRAC = practicality, GEN = generality, ACC = acceptability, LEGAL = legality. Source: Adapted from Cook (2009, p. 386).

  • 3 Practicality is a negative criterion - a reason for not using a test.
  • 4 Generality simply means how many types of employees the test can be used for.
  • 5 Acceptability on the part of candidates is important, especially in periods of full employment.
  • 6 Legality is a negative criterion - a reason for not using something. It is often hard to evaluate, as the legal position on many tests is obscure or confused.

This implies, first, that many organizations have to make a trade-off - cost for validity, practicality for generality. Second, while some methods perform well on some criteria and poorly on others, very few succeed on all criteria. Assessment centres are probably the most successful (see Table 10.4).

The six criteria raise some interesting issues for those using these methods to consider. A key criterion is cost. Cook notes that interview costs are generally graded as low to medium because interviews vary widely and because the costs are taken for granted as part of the process. In contrast, structured interview costs are high because the system has to be tailor-made and requires a full job analysis. Biodata costs are viewed as low or high, as their categorization depends on how they are used - the cost is high if the inventory has to be specially written for the employer, but it might be low if ‘ready-made’ consortium biodata could be used. The cost of using educational qualifications is given as zero because the information is routinely collected from application forms, and limited analysis is used, save to confirm that the data supplied match the requirements of the role. A further check of qualification certificates may be made at the interview or on appointment, but even with this additional administration the costs remain low.

A second criterion is practicality. This means that the test is not difficult to introduce because it fits easily into the selection process. Ability and personality tests are very practical because they can be given when candidates come for interview, and they generally permit group testing. References are very practical because everyone is used to giving them. Employers may consider assessment centres as only fairly practical, because they need detailed organizing and do not fit into the conventional timetable of selection procedures.

Peer assessments are highly impractical because they require applicants to spend a long time with each other and may require briefings or pre-training to explain the process. Structured interviews may be seen as having limited practicality because managers may resist the loss of autonomy, preferring to use their own questions and questioning style. Finally, work-sample and psychomotor tests are seen as being of limited practicality because candidates have to be tested individually, rather than in groups.

The third criterion is generality. Most selection tests can be used for any category of worker, but Cook notes that true work samples and job knowledge tests can only be used where there is a specific body of knowledge or skill to test. This means they are restricted to skilled manual work. He notes that psychomotor tests are only useful for jobs that require dexterity or good motor control. Peer ratings can probably only be used in uniformed disciplined services, due to issues of attendance and the possible need for training, or at least an understanding of the competences required. Assessment centres too tend to be restricted to managers, probably on grounds of cost, although they have been used for more junior posts.

The fourth criterion reviewed is legality. While this varies between countries or states, much of the legislation has common origins relating to a desire to prevent discrimination on the grounds of gender, colour or ethnicity. Assessment centres, work samples and structured interviews do not usually cause legal problems, but educational qualifications and mental ability tests most certainly do. Cook notes that in some areas, such as biodata, the position remains uncertain.

Cook notes that:

Taking validity as the overriding consideration, there are seven classes of test with high validity, namely peer ratings, biodata, structured interviews, ability tests, assessment centres, work-sample tests and job-knowledge tests. Three of these tests have very limited generality, which leaves biodata, structured interviews, ability tests and assessment centres.

  • Biodata do not achieve such good validity as ability tests and are not as transportable, which makes them more expensive.
  • Structured interviews have excellent validity but limited transportability, and are expensive to set up.
  • Ability tests have excellent validity, can be used for all types of jobs, are readily transportable and are cheap and easy to use, but fall foul of the law in the US.
  • Assessment centres have excellent validity, can be used for most grades of staff and are legally fairly safe, but are difficult to install and are expensive.
  • Work samples have excellent validity, are easy to use and are generally quite safe legally, but are expensive, because they are specific to the job.
  • Job-knowledge tests have good validity, are easy to use and are inexpensive because they are commercially available, but they are more likely to give rise to legal problems because they are usually paper-and-pencil tests.
  • Personality inventories achieve poor validity for predicting job proficiency, but can prove more useful for predicting how well the individual will conform to the job’s norms and rules.
  • References have only moderate validity, but are cheap to use. However, legal cautions are tending to limit their value (Cook, 2009, pp. 386-387).
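The elimination reasoning in the quotation (start from the tests with high validity, then set aside those with limited generality) can be sketched against the ratings in Table 10.4. This is an illustrative reading of the table, not Cook's own procedure. The `Rating` type and the variable names are hypothetical, and treating "Limited" and "Very limited" as the cut-off for generality is an assumption.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    """One row of Table 10.4 (hypothetical container for this sketch)."""
    test: str
    validity: str
    cost: str
    practicality: str
    generality: str
    acceptability: str
    legality: str


# Ratings transcribed from Table 10.4.
TABLE_10_4 = [
    Rating("Interview", "Low", "Medium/Low", "High", "High", "High", "Uncertain"),
    Rating("Structured interview", "High", "High", "?Limited", "High", "Untested", "No problems"),
    Rating("References", "Moderate", "Very low", "High", "High", "Medium", "Some doubts"),
    Rating("Peer rating", "High", "Very low", "Very limited", "Very limited", "Low", "Untested"),
    Rating("Biodata", "High", "High/Low", "High", "High", "Low", "Some doubts"),
    Rating("Ability", "High", "Low", "High", "High", "Low", "Major problems"),
    Rating("Psychomotor test", "High", "Low", "Moderate", "Limited", "Untested", "Untested"),
    Rating("Job knowledge", "High", "Low", "High", "Limited", "Untested", "Some doubts"),
    Rating("Personality", "Variable", "Low", "High", "High", "Low", "Some doubts"),
    Rating("Assessment centre", "High", "Very high", "Fair", "Fair", "High", "No problems"),
    Rating("Work sample", "High", "High", "Limited", "Limited", "High", "No problems"),
    Rating("Education", "Moderate", "Nil", "High", "High", "Untested", "Major doubts"),
]

# Step 1: keep only the tests rated High on validity.
high_validity = [row for row in TABLE_10_4 if row.validity == "High"]

# Step 2: set aside tests whose generality is limited (assumed cut-off).
LIMITED = {"Limited", "Very limited"}
shortlist = [row.test for row in high_validity if row.generality not in LIMITED]

print(shortlist)
# ['Structured interview', 'Biodata', 'Ability', 'Assessment centre']
```

On the table's ratings, psychomotor tests are also rated High on validity, so the first step returns eight rows rather than the seven classes named in the quotation. The final shortlist is nonetheless the same four methods: biodata, structured interviews, ability tests and assessment centres.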

Arnold, Silvester, Patterson, Robertson, Cooper and Burnes (2005) provided a similar analysis of the literature. This is summarized in Table 10.5.

What stands out from Tables 10.1-10.5 is their similarity, despite the fact that they may be based on different databases. Occasionally an individual technique, such as the structured interview, is judged as fair to average (in terms of validity) by one reviewer and as good to excellent by another, but overall the results are robust. Assessment centres, work-sample tests and cognitive ability tests are usually judged most valid in all reviews. This is not surprising, as many base their assessments on the same data. What we can say, therefore, is that among academic reviewers there remains good consensus as to the efficacy of different assessment methods.

Table 10.5 A summary of studies on the validity of selection procedures.

| Selection Method | Evidence for Criterion-Related Validity | Applicant Reactions | Extent of Use |
|---|---|---|---|
| Structured interviews | High | Moderate to positive | High |
| Cognitive ability | High | Negative to moderate | Moderate |
| Personality tests | Moderate | Negative to moderate | Moderate |
| Biodata | Can be high | Moderate | Moderate |
| Work sample tests | High | Positive | Low |
| Assessment centres | Can be high | Positive | Moderate |
| Handwriting | Low | Negative to moderate | Low |
| References | Low | Positive | High |

 