Establishing Criterion-related Validity:
An Examination of the Concurrent Validity
of the CPT Reading Comprehension Test
Anthony R. Napoli, Lanette A. Raymond, Cheryl A. Coffey, and Diane M. Bosco
Suffolk County Community College
Abstract
The establishment of valid placement test standards must conform to accepted and established psychometric practices in order to serve as indices of academic preparedness for exemption from remedial course work and entry into traditional academic programs of study. Thus, in addition to the selection of an assessment instrument with empirically substantiated construct validity, test users must empirically establish the criterion-related validity of the decision points (cutoffs) on a placement test's score distribution. The present article reports and discusses the results of an ongoing research effort to empirically establish the cutoff points for the CPT Reading Comprehension Test (CPT-R; College Entrance Examination Board, 1990) at a two-year community college.
Introduction
To provide access and excellence in postsecondary education, an institution requires an
effective placement program which assesses basic skills and places students in appropriate
coursework levels at the beginning of their college careers (Smittle, 1993). The ability of college
placement tests to serve as reliable and valid indices of academic preparedness and to assist the
decision-making processes regarding placement in or exemption from remedial coursework, and
selection of appropriate classroom curriculum, ultimately depends on the meaning and usefulness
of the information the tests convey. The present study assesses and improves the usefulness of a
widely used test, the College Board's Computerized Placement Test in Reading Comprehension
(CPT-R), by examining its criterion-related validity in terms of both cut-off scores and their
correspondence to grade-level reading ability. The study provides information to assist
administrators and developmental-reading instructors in coordinating course content with student
skills proficiencies.
Norm-referenced tests, such as the Scholastic Aptitude Test (SAT), American College Test
(ACT), and Graduate Record Examination (GRE), as well as the College Board's Computerized
Placement Tests (CPTs), have high levels of statistical reliability and of content and construct
validity. A test or measure which is statistically reliable has been found to have a statistically
acceptable degree of accuracy or consistency in the assessment of a subject's performance when
the subject is reexamined with the same test or with sets of equivalent items (Aron & Aron, 1994;
Anastasi & Urbina, 1997). CPT-R scores have been found to be accurate in repeated tests, and
consistent across items (College Entrance Examination Board, 1986).
Content and construct validity result when a measure contains a representative sample of the
domain, in this case subject area, it claims to measure (Aron & Aron, 1994; Anastasi & Urbina,
1997). Ideally, content and construct validity are built into a measure as it is constructed through
selection of appropriate items. Items on the CPT-R were selected by reading specialists from a
larger group of items assembled by an advisory committee of experts in the field of reading.
These specialists defined the chosen set of items as representative of college-level skills in the
subject area of reading; thus, the CPT-R is said to have content and construct validity.
Nevertheless, norm-referenced tests, including the CPT-R, reveal little more about the test-takers
than their relative position on the score distribution.
The CPT-R is used to guide placement decisions related to reading skills at 350 colleges and
universities nationwide. Suffolk County Community College (SCCC) administers the CPT-R to
approximately 5,000 students each year. Yet, despite its widespread use, the criterion-related
validity of the CPT-R has not been thoroughly examined.
Criterion-related validity represents performance in relation to particular tasks, or discrete
cognitive or behavioral objectives. Measures of concurrent validity, the degree to which scores on
two or more measures directly measure the same thing, and predictive validity, the degree to
which scores predict performance, or both, determine criterion-related validity (Aron & Aron,
1994; Anastasi, 1982; Anastasi & Urbina, 1997). Thus, the CPT-R's concurrent validity is
measured by the degree to which scores on the test correlate with other tests purported to measure
reading skills. Similarly, the predictive validity of the CPT-R is measured by the degree to which
scores on the test accurately predict future reading performance. The test's concurrent validity,
its predictive validity, or, most comprehensively, both together determine its level of
criterion-related validity. Only when sufficient criterion-related validity has been
established, and points on the CPT-R's score continuum can be reliably equated to some relevant
cognitive or behavioral skill, can the test user appropriately deploy the assessment tool to make
placement and curriculum decisions. Further research is required to establish the criterion-related
validity of college placement tests, including the CPT-R.
Evidence has been accumulating in support of the CPT-R's criterion-related validity. Sammon's
(1988) study of the concurrent validity of the New Jersey College Basic Skills Placement Test in
Reading (New Jersey Basic Skills Council, 1987), a paper and pencil predecessor to the College
Entrance Examination Board's (CEEB) CPT-R (CPT-R; CEEB, 1990; Smittle, 1993), reported
significant correlations (r = .77 and r = .653) between the Degrees of Reading Power test (DRP)
and the New Jersey test employing two independent samples of college students. This established
the criterion-related validity of the New Jersey College Basic Skills Placement Test in Reading.
Later, efforts by Napoli (1991) and Napoli and Wortman (1995) examined the predictive
validity of the CPT-R by employing overall college grade point average and performance in
introductory psychology classes as targeted criterion variables. In addition to finding significant
correlations between CPT-R scores and course grades in Introductory Psychology (r = .52), and
between CPT-R scores and overall grade point average (r = .41), the study was also successful in
identifying specific points (cutoffs) on the CPT-R distribution as predictive of successful and
unsuccessful academic outcomes.
Smittle (1993) studied both the predictive and the concurrent validity of the CPT assessment
battery, which consists of four subtests including Reading Comprehension, Sentence Skills,
Arithmetic, and Algebra, against the ACT to establish the criterion-related validity of each
subtest of the CPT. Although the ACT is more widely used for the assessment of basic skills and
college placement than the CPT, the CPT tests were better predictors of overall academic
performance in college than the ACT tests, and the CPT-R was found to be more discriminating
among levels of reading competency than the ACT's composite reading placement test. This
information is useful to the degree that it establishes the CPT-R's concurrent validity with another
norm-referenced test. Most importantly, Smittle was able to suggest a cut-off score on the CPT-R
which represented college level reading ability and placed the same percentage of students at
each course level that had been previously placed at those levels with traditional paper and pencil
tests.
Murphy's (1995) analysis examined the construct and predictive validity of the CPT-R with
regard to the three sub-scores of the Nelson Denny Reading tests. Significant correlations were
observed between the CPT-R and the Nelson Denny Vocabulary Test (r = .67), the Nelson Denny
Comprehension Test (r = .60), and the Nelson Denny Total Test (r = .69), employing a sample of
663 college students. The Nelson Denny Reading tests offer a set of grade-level reading ability
equivalents that could be applied to the CPT-R, but those grade-level score assignments
themselves require validation.
The present study extends the assessment of the criterion-related validity of the CPT by
examining the concurrent validity of the CPT-R and the Degrees of Reading Power (DRP; Koslin
et al., 1987, 1989) tests. The study builds upon the Murphy (1995) analyses to create a grade-level
equivalency table, which would allow for the conversion of CPT-R scores to valid and
representative reading grade levels. The goal of the study is to provide both practitioners and
researchers with information concerning the utility and meaning of student scores on the CPT-R,
with regard to actual, not just relative, reading level and skills.
SCCC currently uses the CPT to assess the basic skills of incoming freshmen students. The
DRP is then used for further placement and monitoring of students identified by the CPT-R as
requiring reading remediation. Extensive psychometric studies on the DRP demonstrate that the
instrument possesses sufficient levels of reliability and validity (Koslin et al, 1987, 1989). The
test and its scoring rubric were designed to provide objective-referenced information concerning
the process of reading comprehension. In other words, DRP scores can be, and have been, converted
into grade-specific readability levels. Specifically, the test can discriminate among reading
abilities from fourth grade through twelfth grade, and first year college levels. It was expected,
therefore, that a sufficiently high correlation between CPT-R and DRP scores would allow
statistical procedures, such as regression analysis, to identify CPT-R scores which could serve as
reliable estimates of DRP-established performance standards (i.e. reading grade levels).
Establishing this type of criterion-related concurrent validity for the CPT-R would enhance
the use of the instrument by increasing its capacity to support more precise placement and
curriculum decisions across both remedial and college-level coursework.
Method
Sample and Procedure
The sample consisted of 1,154 SCCC entering full-time freshmen students who, as part of the
admission process, were administered the CPT-R, received a score of 75 or less, and were placed
into a developmental reading course. Demographically, 45.4 percent of the students were men,
54.5 percent were women; 77 percent of the students were non-minority whites, 6.6
percent were Hispanic, 4.9 percent were Black, 2.1 percent were Asian, and 9.4 percent did not
report any ethnic/racial affiliation. The mean age of the sample was 19.7 years (SD = 4.1). During
the first week of the semester the DRP was administered to each student in the reading class. Each
student received both tests within a two-month interval.
Measures
Computerized Placement Test.
The Computerized Placement Test in Reading Comprehension (CPT-R) is a computerized
adaptive testing technique developed by the College Entrance Examination Board and the
Educational Testing Service (CEEB, 1990). Internal reliability (coefficient alpha = .90; CEEB,
1986) and test-retest reliability (rxx = .90; CEEB, 1986) are both high. Napoli and Wortman
(1995), Nold and Kuechenmeister (1991), and Ward, Kline, and Flauger (1986) provide evidence
for the construct and predictive validity of the test. Descriptively, the CEEB (1990) reports that:
The adaptive testing technique customizes tests according to each student's ability, presenting a student with a series of test questions at the appropriate level of difficulty for his or her abilities, knowledge, and background. Questions that are too difficult or too easy are avoided, and accurate results are obtained with fewer questions administered with no time limit.
Each student is presented with a series of 17 questions of two primary types. The first type consists of a reading passage followed by a question based on the text. Both short and long narratives are provided. The reading passage can also be classified according to the kind of information processing required, including explicit statements related to the main idea, explicit statements related to a secondary idea, application and inference. The second type of question, sentence relationships, presents two sentences followed by a question regarding the relationship between the two sentences. It may ask, for example, if the statement in the second sentence supports that in the first, if it contradicts it, or if it repeats the same information. Both reading passages and sentence relationship questions are also varied according to content categories to help prevent bias because of a student's particular knowledge. (CEEB, 1990, p. 3).
Degrees of Reading Power Test.
The Degrees of Reading Power test (Koslin, Zeno, and Koslin, 1987) is based on the Degrees of
Reading Power Program (Koslin, Zeno, and Koslin, 1989) and is a criterion-referenced holistic
measure of how well the messages within text are understood. The psychometric properties of the
test have been extensively studied (Koslin, Zeno, and Koslin, 1987, 1989) and demonstrate that
the test possesses high levels of reliability (Kuder-Richardson KR-20 = .95; test-retest rxx =
.95), construct validity, and criterion-related validity. According to the authors, the goal of the
test is to:
Assess current levels of reading achievement. Determine the most difficult prose a student can read with a specific degree or level of comprehension. Match the difficulty of materials with student ability, relative to the purpose of instruction. Set appropriate standards for achievement. Document growth in the ability to read with comprehension. And, indicate the extent of compensatory or remedial help, if any, that a student may need in order to achieve various personal goals, or to satisfy school-determined expectations in reading.
Each test consists of a number of nonfiction paragraphs and passages on a variety of non biased topics. Each paragraph contains a sentence with a blank space, and each passage has seven sentences that contain a blank space, to indicate that a word is missing. For each blank, four or five single-word response options are provided. Students must select the most appropriate response to complete the sentence. It is not possible to answer DRP test items correctly by relying only on the information in the sentence containing the blank. A paragraph, or at least several sentences, must be understood to respond successfully.
Results
Data analyses were run in three phases. Sample means and standard deviations for the CPT-R
and DRP appear in Table 1.
The first phase of analysis was designed to establish sufficient correlation between CPT-R and
DRP scores. Pearson's product-moment correlation coefficient was computed to assess the
strength of the overall relationship between the CPT-R and the DRP. The resulting correlation
coefficient (r = .783, p < .0001) indicates that there is a significant and moderately strong
relationship between the two reading tests. Then the DRP scores were organized into intervals
corresponding to their reading grade levels and group means were computed. The DRP reading
grade levels, ranging from fourth grade or below to college level, are shown in Table 2. In
addition, this table reports CPT-R means (n and standard error of the mean) for the students
falling in each DRP reading level. From the table, we see that the mean CPT-R score for students
falling in fourth grade or below equals 35.33 and that group means increase significantly, in a
linear fashion (F for linearity (1, 1152) = 2045.27, p < .0001), with each successive increase in
DRP reading level.
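The first phase can be illustrated with a minimal Python sketch. The paired scores below are hypothetical values invented for the illustration, not data from the study, and the two-band split is a simplification that borrows only the college-level threshold (DRP = 77 or higher) from Table 2. The helper names `pearson_r` and `mean_by_band` are likewise assumptions of this sketch.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


def mean_by_band(pairs, band_of):
    """Mean CPT-R score within each band assigned from the paired DRP score."""
    groups = {}
    for cpt_r, drp in pairs:
        groups.setdefault(band_of(drp), []).append(cpt_r)
    return {band: sum(scores) / len(scores) for band, scores in groups.items()}


# Hypothetical (CPT-R, DRP) pairs, invented for illustration only.
pairs = [(35, 40), (41, 45), (48, 51), (53, 56), (59, 60),
         (64, 65), (68, 69), (71, 72), (74, 78), (80, 77)]

r = pearson_r([c for c, _ in pairs], [d for _, d in pairs])


def band(drp):
    # Simplified two-band split; Table 2 uses ten grade-level bands.
    return "College" if drp >= 77 else "Below college"


band_means = mean_by_band(pairs, band)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
print(band_means)
```

Squaring r gives the proportion of variance the two tests share, which is the quantity a simple regression reports as R squared.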
The second phase consisted of a simple regression analysis of DRP scores on CPT-R scores.
Results are presented in Table 3. Employing the general linear regression equation (Y' = a+bX),
and applying the regression constants (see Table 3) to the raw CPT-R scores produced predicted
DRP scores. Selected CPT-R scores, predicted DRP test scores, and their corresponding grade
level performance standards appear in Table 4. Thus, it can be seen that a CPT-R score of 83 or
above is predictive of a basic reading proficiency level sufficient to work with first-year college
texts. CPT-R scores below this level are associated with reading proficiency below that required
to work with college texts.
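As a rough illustration of this conversion, the sketch below applies the regression constants reported in Table 3 (a = 24.419, b = .634) to a CPT-R score and looks up the DRP grade-level band from Table 2. The function names are hypothetical. The sketch also assumes that the unrounded predicted score is compared against each band's lower bound; the rounding rule behind Table 4 is not stated, so scores at a band edge may land one band away from the published table.

```python
# Regression constants reported in Table 3 (DRP regressed on CPT-R).
INTERCEPT_A = 24.419
SLOPE_B = 0.634

# Lower bounds of the DRP grade-level bands from Table 2, highest first.
DRP_BANDS = [
    (77, "College"),
    (75, "12th Grade"),
    (71, "11th Grade"),
    (68, "10th Grade"),
    (63, "9th Grade"),
    (59, "8th Grade"),
    (54, "7th Grade"),
    (49, "6th Grade"),
    (41, "5th Grade"),
]


def predict_drp(cpt_r):
    """Predicted DRP score from a CPT-R score, using Y' = a + bX."""
    return INTERCEPT_A + SLOPE_B * cpt_r


def drp_grade_level(drp_score):
    """Grade-level band for a DRP score (assumption: no rounding before lookup)."""
    for lower_bound, label in DRP_BANDS:
        if drp_score >= lower_bound:
            return label
    return "4th Grade and below"


for score in (25, 50, 65, 83):
    predicted = predict_drp(score)
    print(score, round(predicted, 2), drp_grade_level(predicted))
```

For a CPT-R score of 83 the equation gives 24.419 + .634 × 83 = 77.04, which falls in the college-level band.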
Finally, the accuracy of prediction can be determined through an examination of the standard
errors of the estimates. Specifically, the standard error of the estimate for the DRP is
approximately ±5 points (±5.15), and the standard error of the estimate for grade level is
approximately ±1 year (±1.27).(1)
Summary
The objectives of the current research effort were to a) assess the concurrent validity of the
CPT Reading Comprehension test, and if sufficient concordance was observed, to b) empirically
identify points on the CPT-R score continuum which would serve as reliable indices of
college-level basic reading comprehension preparedness. Results from analyses indicated that
substantial correlation does, indeed, exist between the two measures, and that the CPT-R can be
employed, with a high degree of reliability and validity, to identify basic reading proficiency
skills commensurate to the demands of first year college-level texts.
Although this study affirmed the CPT-R college level cut point suggested by Smittle (1993)
which was based on an examination of the score distribution and replicated student placement
percentages found previously with traditional paper and pencil tests, it went further to equate
specific CPT-R scores with expected grade-level performance. This study also replicated the
CPT-R grade-level equivalencies produced in Murphy's (1995) analysis of the Nelson Denny
Reading Tests. Table 4 shows that a CPT-R score of 83 predicts reading ability commensurate
with the demand of college level work, as both the DRP and the Nelson Denny grade-level
assignments indicate. Thus, it is now possible to ascribe meaning to the CPT-R's norm-referenced
scores and to assess the reading ability reflected in a CPT-R score as "at a particular grade level."
In this light, scores can be used as multidimensional indicators of proficiency.
The DRP/CPT-R conversion information resulting from this study has practical value for
the instructor, the classroom, and the institution. The CPT-R score now has enhanced meaning in
that it identifies student reading ability in terms of the level of difficulty the student can handle
and the type of scaffolding (support and intervention) required of the instructor and environment.
The practitioner can reference this information to select curriculum and materials with
an appropriate level of difficulty. Further administrative application can be made in an effort to
create homogeneous learning groups within courses and individual classrooms.
The generalizations which can be drawn from the results of this study are somewhat limited by
the nature of the sample. The scores used in the present study were drawn from a population of
community college students; further studies are therefore recommended
before assuming that these results can be uniformly applied to scores drawn from other student
groups or types of institutions. This study may provide a useful framework for such additional
research efforts.
Table 1. DRP and CPT-R Means and Standard Deviations.
| Measure | Mean | Standard Deviation | Standard Error of Mean |
| CPT-R | 61.12 | 11.28 | .33 |
| DRP* | 63.16 | 8.81 | .26 |
*DRP scores used for the present study represent p = .90 conversion scores for independent levels.
Table 2. DRP Reading Grade Levels and Actual CPT-R Means.
| DRP Reading Grade Level | N | Mean CPT-R | Standard Error of the Mean |
| Fourth Grade or below (DRP = 40 or less) | 12 | 35.33 | 1.56 |
| Fifth Grade (DRP = 41 - 48) | 60 | 40.72 | 0.97 |
| Sixth Grade (DRP = 49 - 53) | 82 | 47.38 | 0.80 |
| Seventh Grade (DRP = 54 - 58) | 172 | 53.30 | 0.58 |
| Eighth Grade (DRP = 59 - 62) | 173 | 59.10 | 0.55 |
| Ninth Grade (DRP = 63 - 67) | 292 | 64.40 | 0.39 |
| Tenth Grade (DRP = 68 - 70) | 151 | 68.31 | 0.50 |
| Eleventh Grade (DRP = 71 - 74) | 97 | 70.60 | 0.51 |
| Twelfth Grade (DRP = 75 - 76) | 44 | 71.11 | 0.67 |
| College Level (DRP = 77 or higher) | 71 | 74.46 | 0.56 |
Grade reading level is based on DRP-75 scores achieved by the end of the term.
Table 3. Regression of DRP on CPT-R.
| Variable | R² | b | a | t |
| CPT-R | .659 | .634 | 24.419 | 29.24* |
*p < .0001, df = 1,154
Table 4. CPT Reading Comprehension Test Scores, Corresponding Predicted DRP¹ Test Scores
and DRP Grade-Level Performance Standards², and Corresponding Predicted Nelson-Denny
Comprehension Scores³ and Nelson-Denny Grade-Level Performance Standards³.
| CPT-R Scores | Predicted DRP Scores¹ | DRP Grade-Level Performance Standards² | Predicted Nelson-Denny Comprehension Scores³ | Nelson-Denny Grade-Level Performance Standards³ |
| 83 | 77 | College | 41.55 | 12th & College |
| 80 - 82 | 75 - 76 | 12th Grade | 40 - 41 | 12th Grade |
| 73 - 79 | 71 - 74 | 11th Grade | 38 - 39 | 11th Grade |
| 69 - 72 | 68 - 70 | 10th Grade | 36 - 37 | 10/11th Grade |
| 61 - 68 | 63 - 67 | 9th Grade | 33 - 35 | 9/10th Grade |
| 55 - 60 | 59 - 62 | 8th Grade | 30 - 32 | 8/9th Grade |
| 47 - 54 | 54 - 58 | 7th Grade | 27 - 29 | 7/8th Grade |
| 39 - 46 | 49 - 53 | 6th Grade | 24 - 26 | 6/7th Grade |
| 26 - 38 | 41 - 48 | 5th Grade | 18 - 23 | 5/6th Grade |
| 25 | 40 | 4th Grade and below | 17 | Below 5th Grade |
¹ The standard error of estimate predicting DRP from CPT-R is equal to ±5.15.
² The standard error of estimate predicting DRP grade-level performance from CPT-R is equal to
±1.27. Source for DRP grade levels: Koslin, Zeno, & Koslin (1987), p. 143.
³ Source: Murphy, S. (1995).
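The standard error of estimate in the first note to Table 4 can be approximated from figures reported elsewhere in the tables. The sketch below uses the textbook relationship between the standard deviation of the criterion, R², and sample size. It assumes the DRP standard deviation in Table 1 was computed with an n - 1 denominator; that detail and the function name are assumptions of this illustration, not something the tables state.

```python
from math import sqrt


def standard_error_of_estimate(sd_y, r_squared, n):
    """SEE = SD_y * sqrt(1 - R^2) * sqrt((n - 1) / (n - 2)) for simple regression."""
    return sd_y * sqrt(1.0 - r_squared) * sqrt((n - 1) / (n - 2))


# Inputs as reported: DRP SD from Table 1, R squared from Table 3, n from the Method section.
see = standard_error_of_estimate(sd_y=8.81, r_squared=0.659, n=1154)
print(round(see, 2))  # about 5.15 with these rounded inputs
```

Because the published inputs are themselves rounded, the result should be read as an approximate check rather than an exact reproduction.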
References
Anastasi, A. (1982). Psychological testing (5th ed.). New York: Macmillan.
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ:
Simon & Schuster.
Aron, A. & Aron, E. N. (1994). Statistics for psychology. Englewood Cliffs, NJ: Prentice Hall.
College Entrance Examination Board. (1986). Coordinator's notebook for computerized
placement tests. Princeton, NJ: Author.
College Entrance Examination Board. (1990). Coordinator's guide for computerized placement
tests, 3. Princeton, NJ: Author.
Koslin, B. L., Zeno, S., & Koslin, S. (1987). The DRP: An effective measure in reading.
Brewster, NY: TASA DRP Services.
Koslin, B. L., Koslin, S., Zeno, S., & Ivens, S. (1989). The Degrees of Reading Power Test:
Primary and standard forms. Brewster, NY: TASA DRP Services.
Murphy, S. (1995). An analysis of the construct and predictive validity of the CPT-R and
Nelson Denny tests. Unpublished manuscript, Rose State College, Midwest City, Oklahoma.
Napoli, A. (1991). Validating CPT Reading Comprehension Test standards employing
relevant college-level performance criteria. Background readings for College Placement Tests,
Educational Testing Service/The College Board. Princeton, New Jersey.
Napoli, A., & Wortman, P. M. (1995). Validating college-level reading placement test
standards. Journal of Applied Research in the Community College, 2(2), 143-151.
New Jersey Basic Skills Council. (1987). New Jersey College Basic Skills Placement Testing.
Trenton, NJ: New Jersey Department of Higher Education.
Nold, D., & Kuechenmeister, M. (1991). Aims Community College computerized placement
tests pilot study: Summary report. Computerized Placement Tests: Background Readings. New
York: College Entrance
Sammon, S. F. (1988). A correlational study: The New Jersey College Basic Skills Test and
the Degrees of Reading Power Test. Unpublished master's thesis, William Paterson College,
New Jersey.
Smittle, P. (1993). Computer adaptive testing: A new era. Journal of Developmental Education,
17(1), 8-12.
Ward, W. C., Kline, R. G., & Flauger, J. (1986). Summary of pilot testing results. In College
Entrance Examination Board, Coordinator's notebook for computerized placement tests.
Princeton, NJ: Author.
Address correspondence to:
Anthony R. Napoli, Ph.D.
Office of Institutional Research
Suffolk Community College
533 College Road, r102b
Selden, NY 11784
e-mail: napolia@sunysuffolk.edu
1. The standard error of estimate indicates the degree to which scores may vary from the values predicted by the regression equation (Aron & Aron, 1994).