Establishing Criterion-related Validity:

An Examination of the Concurrent Validity

of the CPT Reading Comprehension Test

Anthony R. Napoli, Lanette A. Raymond, Cheryl A. Coffey, and Diane M. Bosco

Suffolk County Community College

Abstract

The establishment of valid placement test standards must conform to accepted psychometric practices if test scores are to serve as indices of academic preparedness for exemption from remedial coursework and entry into traditional academic programs of study. Thus, in addition to selecting an assessment instrument with empirically substantiated construct validity, test users must empirically establish the criterion-related validity of the decision points (cutoffs) on a placement test's score distribution. The present article reports and discusses the results of an ongoing research effort to empirically establish the cutoff points for the CPT Reading Comprehension Test (CPT-R; College Entrance Examination Board, 1990) at a two-year community college.

Introduction

    To provide access and excellence in postsecondary education, an institution requires an effective placement program that assesses basic skills and places students in appropriate coursework levels at the beginning of their college careers (Smittle, 1993). The ability of college placement tests to serve as reliable and valid indices of academic preparedness, to assist decision making regarding placement in or exemption from remedial coursework, and to guide selection of appropriate classroom curriculum ultimately depends on the meaning and usefulness of the information the tests convey. The present study assesses and improves the usefulness of a widely used test, the College Board's Computerized Placement Test in Reading Comprehension (CPT-R), by examining its criterion-related validity in terms of both cut-off scores and their correspondence to grade-level reading ability. The study provides information to assist administrators and developmental-reading instructors in coordinating course content with student skill proficiencies.
    Norm-referenced tests such as the Scholastic Aptitude Test (SAT), American College Test (ACT), and Graduate Record Exam (GRE), as well as the College Board's Computerized Placement Tests (CPTs), have high levels of statistical reliability and of content and construct validity. A test or measure is statistically reliable when it assesses a subject's performance with an acceptable degree of accuracy or consistency upon reexamination with the same test or with sets of equivalent items (Aron & Aron, 1994; Anastasi & Urbina, 1997). CPT-R scores have been found to be accurate on repeated testing and consistent across items (College Entrance Examination Board, 1986).
    Content and construct validity result when a measure contains a representative sample of the domain, in this case the subject area, it claims to measure (Aron & Aron, 1994; Anastasi & Urbina, 1997). Ideally, content and construct validity are built into a measure as it is constructed, through the selection of appropriate items. Items on the CPT-R were selected by reading specialists from a larger group of items assembled by an advisory committee of experts in the field of reading. These specialists defined the chosen set of items as representative of college-level skills in the subject area of reading; thus, the CPT-R is said to have content and construct validity. Nevertheless, norm-referenced tests, including the CPT-R, reveal little more about test-takers than their relative position on the score distribution.
    The CPT-R is used to guide placement decisions related to reading skills at 350 colleges and universities nationwide. Suffolk County Community College (SCCC) administers the CPT-R to approximately 5,000 students each year. Yet, despite its widespread use, the criterion-related validity of the CPT-R has not been thoroughly examined.
    Criterion-related validity represents performance in relation to particular tasks, or discrete cognitive or behavioral objectives. It is determined by measures of concurrent validity (the degree to which scores on two or more measures assess the same thing), predictive validity (the degree to which scores predict future performance), or both (Aron & Aron, 1994; Anastasi, 1982; Anastasi & Urbina, 1997). Thus, the CPT-R's concurrent validity is measured by the degree to which scores on the test correlate with other tests purported to measure reading skills. Similarly, the predictive validity of the CPT-R is measured by the degree to which scores on the test accurately predict future reading performance. Together, and most comprehensively in combination, these two forms of evidence determine the test's level of criterion-related validity. Only when sufficient criterion-related validity has been established, and points on the CPT-R's score continuum can be reliably equated to some relevant cognitive or behavioral skill, can the test user appropriately deploy the assessment tool to make placement and curriculum decisions. Further research is required to establish the criterion-related validity of college placement tests, including the CPT-R.
    Evidence has been accumulating in support of the CPT-R's criterion-related validity. Sammon's (1988) study of the concurrent validity of the New Jersey College Basic Skills Placement Test in Reading (New Jersey Basic Skills Council, 1987), a paper-and-pencil predecessor to the College Entrance Examination Board's (CEEB) CPT-R (CEEB, 1990; Smittle, 1993), reported significant correlations (r = .77 and r = .653) between the Degrees of Reading Power test (DRP) and the New Jersey test in two independent samples of college students. These findings established the criterion-related validity of the New Jersey College Basic Skills Placement Test in Reading.
    Later, Napoli (1991) and Napoli and Wortman (1995) examined the predictive validity of the CPT-R, employing overall college grade point average and performance in introductory psychology classes as the targeted criterion variables. In addition to finding significant correlations between CPT-R scores and course grades in Introductory Psychology (r = .52), and between CPT-R scores and overall grade point average (r = .41), these studies identified specific points (cutoffs) on the CPT-R distribution that were predictive of successful and unsuccessful academic outcomes.
    Smittle (1993) studied both the predictive and the concurrent validity of the CPT assessment battery, which consists of four subtests (Reading Comprehension, Sentence Skills, Arithmetic, and Algebra), against the ACT to establish the criterion-related validity of each CPT subtest. Although the ACT is more widely used for the assessment of basic skills and college placement than the CPT, the CPT tests were better predictors of overall academic performance in college than the ACT tests, and the CPT-R discriminated among levels of reading competency better than the ACT's composite reading placement test. This information is useful to the degree that it establishes the CPT-R's concurrent validity with another norm-referenced test. Most importantly, Smittle was able to suggest a cut-off score on the CPT-R that represented college-level reading ability and placed the same percentage of students at each course level as had previously been placed at those levels with traditional paper-and-pencil tests.
    Murphy's (1995) analysis examined the construct and predictive validity of the CPT-R with regard to the three sub-scores of the Nelson-Denny Reading Tests. Significant correlations were observed between the CPT-R and the Nelson-Denny Vocabulary Test (r = .67), the Nelson-Denny Comprehension Test (r = .60), and the Nelson-Denny Total Test (r = .69), in a sample of 663 college students. The Nelson-Denny Reading Tests offer a set of grade-level reading ability equivalents that could be applied to the CPT-R, but those grade-level score assignments themselves require validation.
    The present study extends the assessment of the criterion-related validity of the CPT by examining the concurrent validity of the CPT-R and the Degrees of Reading Power (DRP; Koslin et al., 1987, 1989) tests. The study builds upon the Murphy (1995) analyses to create a grade-level equivalency table that allows for the conversion of CPT-R scores to valid and representative reading grade levels. The goal of the study is to provide both practitioners and researchers with information concerning the utility and meaning of student scores on the CPT-R with regard to actual, not merely relative, reading level and skills.
    SCCC currently uses the CPT to assess the basic skills of incoming freshmen. The DRP is then used for further placement and monitoring of students identified by the CPT-R as requiring reading remediation. Extensive psychometric studies on the DRP demonstrate that the instrument possesses sufficient levels of reliability and validity (Koslin et al., 1987, 1989). The test and its scoring rubric were designed to provide objective-referenced information concerning the process of reading comprehension. In other words, DRP scores can be, and have been, converted into grade-specific readability levels. Specifically, the test can discriminate among reading abilities from the fourth grade through the twelfth grade and first-year college levels. It was expected, therefore, that a sufficiently high correlation between CPT-R and DRP scores would allow statistical procedures, such as regression analysis, to identify CPT-R scores that could serve as reliable estimates of DRP-established performance standards (i.e., reading grade levels). Adding this type of criterion-related (concurrent) validity evidence to the CPT-R would enhance the use of the instrument by increasing its capacity to assist in making more precise placement and curriculum decisions across both remedial and college-level coursework.

Method

Sample and Procedure
The sample consisted of 1,154 entering full-time SCCC freshmen who, as part of the admission process, were administered the CPT-R, received a score of 75 or less, and were placed into a developmental reading course. Demographically, 45.4 percent of the students were men and 54.5 percent were women; 77 percent of the students were non-minority whites, 6.6 percent were Hispanic, 4.9 percent were Black, 2.1 percent were Asian, and 9.4 percent did not report any ethnic/racial affiliation. The mean age of the sample was 19.7 years (SD = 4.1). During the first week of the semester, the DRP was administered to each student in the reading class. Each student received both tests within a two-month interval.
Measures
Computerized Placement Test.
The Computerized Placement Test in Reading Comprehension (CPT-R) is a computerized adaptive test developed by the College Entrance Examination Board and the Educational Testing Service (CEEB, 1990). Internal consistency reliability (coefficient alpha = .90; CEEB, 1986) and test-retest reliability (rxx = .90; CEEB, 1986) are both high. Napoli and Wortman (1995), Nold and Kuechenmeister (1991), and Ward, Kline, and Flauger (1986) provide evidence for the construct and predictive validity of the test. Descriptively, the CEEB (1990) reports that:

The adaptive testing technique customizes tests according to each student's ability, presenting a student with a series of test questions at the appropriate level of difficulty for his or her abilities, knowledge, and background. Questions that are too difficult or too easy are avoided, and accurate results are obtained with fewer questions administered with no time limit.
Each student is presented with a series of 17 questions of two primary types. The first type consists of a reading passage followed by a question based on the text. Both short and long narratives are provided. The reading passage can also be classified according to the kind of information processing required, including explicit statements related to the main idea, explicit statements related to a secondary idea, application and inference. The second type of question, sentence relationships, presents two sentences followed by a question regarding the relationship between the two sentences. It may ask, for example, if the statement in the second sentence supports that in the first, if it contradicts it, or if it repeats the same information. Both reading passages and sentence relationship questions are also varied according to content categories to help prevent bias because of a student's particular knowledge. (CEEB, 1990, p 3).
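The passage above describes the general logic of computerized adaptive testing rather than the CPT's proprietary item-selection rules. A minimal, purely illustrative sketch of that logic (hypothetical item pool, simplified up/down ability adjustment, and invented function names; this is not the CEEB's actual algorithm) might look like this:

```python
import random

# Hypothetical item pool: (item_id, difficulty on an arbitrary 20-120 scale).
# Illustrative only; these are not actual CPT-R items or difficulty values.
ITEM_POOL = [(i, random.uniform(20, 120)) for i in range(200)]

def administer_adaptive_test(answer_item, num_items=17, start_ability=70.0, step=10.0):
    """Simplified adaptive loop: present the unused item whose difficulty is
    closest to the current ability estimate, then nudge the estimate up after
    a correct response and down after an incorrect one."""
    ability = start_ability
    used = set()
    for _ in range(num_items):
        # Select the unused item best matched to the current ability estimate,
        # so questions that are far too easy or too hard are avoided.
        item_id, difficulty = min(
            (item for item in ITEM_POOL if item[0] not in used),
            key=lambda item: abs(item[1] - ability),
        )
        used.add(item_id)
        correct = answer_item(item_id, difficulty)  # examinee's response
        ability += step if correct else -step
        step = max(step * 0.8, 2.0)  # smaller adjustments as the test proceeds
    return ability

# Example: a simulated examinee who answers correctly whenever the item's
# difficulty does not exceed 65.
estimate = administer_adaptive_test(lambda _id, difficulty: difficulty <= 65)
print(round(estimate, 1))
```

Operational systems such as the CPTs rely on psychometrically grounded item-response models rather than this simple step rule; the sketch is meant only to make the quoted description concrete.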

Degrees of Reading Power Test.
The Degrees of Reading Power test (Koslin, Zeno, & Koslin, 1987) is based on the Degrees of Reading Power Program (Koslin, Koslin, Zeno, & Ivens, 1989) and is a criterion-referenced, holistic measure of how well the messages within text are understood. The psychometric properties of the test have been extensively studied (Koslin et al., 1987, 1989), and the results demonstrate that the test possesses high levels of reliability (Kuder-Richardson KR-20 = .95; test-retest rxx = .95), construct validity, and criterion-related validity. According to the authors, the goal of the test is to:

Assess current levels of reading achievement. Determine the most difficult prose a student can read with a specific degree or level of comprehension. Match the difficulty of materials with student ability, relative to the purpose of instruction. Set appropriate standards for achievement. Document growth in the ability to read with comprehension. And, indicate the extent of compensatory or remedial help, if any, that a student may need in order to achieve various personal goals, or to satisfy school-determined expectations in reading.
Each test consists of a number of nonfiction paragraphs and passages on a variety of unbiased topics. Each paragraph contains a sentence with a blank space, and each passage has seven sentences that contain a blank space, indicating that a word is missing. For each blank, four or five single-word response options are provided. Students must select the most appropriate response to complete the sentence. It is not possible to answer DRP test items correctly by relying only on the information in the sentence containing the blank; a paragraph, or at least several sentences, must be understood to respond successfully.

Results

    Data analyses were run in three phases. Sample means and standard deviations for the CPT-R and DRP appear in Table 1.
The first phase of analysis was designed to establish whether there was sufficient correlation between CPT-R and DRP scores. The Pearson product-moment correlation coefficient was computed to assess the strength of the overall relationship between the CPT-R and the DRP. The resulting coefficient (r = .783, p < .0001) indicates a significant and moderately strong relationship between the two reading tests. The DRP scores were then organized into intervals corresponding to their reading grade levels, and group means were computed. The DRP reading grade levels, ranging from fourth grade or below to college level, are shown in Table 2, which also reports the CPT-R mean, n, and standard error of the mean for the students falling at each DRP reading level. The mean CPT-R score for students at the fourth-grade-or-below level is 35.33 and, in an essentially linear fashion, significant increases in group means (F for linearity (1, 1152) = 2045.27, p < .0001) are observed with each successive increase in DRP reading level.
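As a minimal sketch of this first phase (hypothetical score lists stand in for the 1,154 paired observations, the helper names are invented for this illustration, and the DRP grade-level bands are those shown in Table 2), the computations might be organized as follows:

```python
import math
from statistics import mean

# Hypothetical paired scores; in the study these were the students' CPT-R
# scores and DRP (P = .90, independent-level) conversion scores.
cpt_r = [52, 61, 70, 44, 66, 75, 58]
drp = [55, 63, 70, 47, 66, 78, 60]

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# DRP grade-level bands from Table 2 (inclusive upper bound, label).
DRP_BANDS = [
    (40, "Fourth Grade or below"), (48, "Fifth Grade"), (53, "Sixth Grade"),
    (58, "Seventh Grade"), (62, "Eighth Grade"), (67, "Ninth Grade"),
    (70, "Tenth Grade"), (74, "Eleventh Grade"), (76, "Twelfth Grade"),
    (float("inf"), "College Level"),
]

def drp_grade_level(score):
    """Map a DRP score to its Table 2 reading grade level."""
    for upper, label in DRP_BANDS:
        if score <= upper:
            return label

# Mean CPT-R score within each DRP grade-level band, as reported in Table 2.
by_level = {}
for c, d in zip(cpt_r, drp):
    by_level.setdefault(drp_grade_level(d), []).append(c)
group_means = {level: round(mean(scores), 2) for level, scores in by_level.items()}

print(round(pearson_r(cpt_r, drp), 3), group_means)
```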
    The second phase consisted of a simple regression analysis of DRP scores on CPT-R scores. Results are presented in Table 3. Employing the general linear regression equation (Y' = a + bX) and applying the regression constants (see Table 3) to the raw CPT-R scores produced predicted DRP scores. Selected CPT-R scores, predicted DRP test scores, and their corresponding grade-level performance standards appear in Table 4. It can be seen that a CPT-R score of 83 or above is predictive of a basic reading proficiency level sufficient to work with first-year college texts, whereas CPT-R scores below this level are associated with reading proficiency below that required to work with college texts.
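As a hedged illustration of this conversion step (the constants b = .634 and a = 24.419 are taken from Table 3, the grade-level bands follow Tables 2 and 4, and the function names are invented for this sketch), predicted DRP scores and grade levels could be generated from raw CPT-R scores like this:

```python
# Regression constants reported in Table 3 (Y' = a + bX, with X = CPT-R score).
A, B = 24.419, 0.634

def predicted_drp(cpt_r_score):
    """Predicted DRP score for a raw CPT-R score via the Table 3 equation."""
    return A + B * cpt_r_score

# DRP grade-level performance standards, as laid out in Tables 2 and 4
# (inclusive upper bound of each DRP band, label).
GRADE_BANDS = [
    (40, "4th Grade and below"), (48, "5th Grade"), (53, "6th Grade"),
    (58, "7th Grade"), (62, "8th Grade"), (67, "9th Grade"),
    (70, "10th Grade"), (74, "11th Grade"), (76, "12th Grade"),
    (float("inf"), "College"),
]

def predicted_grade_level(cpt_r_score):
    """Grade-level performance standard implied by a raw CPT-R score."""
    drp = predicted_drp(cpt_r_score)
    for upper, label in GRADE_BANDS:
        if drp <= upper:
            return label

# Example: a CPT-R score of 83 predicts a DRP score of about 77, the
# college-level standard shown in Table 4.
print(round(predicted_drp(83)), predicted_grade_level(83))
```

Rounding the predicted DRP score and reading off the corresponding band reproduces the correspondences listed in Table 4 (allowing for rounding at the band edges).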
    Finally, the accuracy of prediction can be determined by examining the standard errors of the estimates. CPT-R scores predict DRP scores and grade levels with reasonable accuracy: the standard error of the estimate for the DRP is ±5 points, and the standard error of the estimate for grade level is ±1 year.(1)

Summary

    The objectives of the current research effort were to (a) assess the concurrent validity of the CPT Reading Comprehension test and, if sufficient concordance was observed, (b) empirically identify points on the CPT-R score continuum that would serve as reliable indices of college-level basic reading comprehension preparedness. The analyses indicated that a substantial correlation does, indeed, exist between the two measures and that the CPT-R can be employed, with a high degree of reliability and validity, to identify basic reading proficiency skills commensurate with the demands of first-year college-level texts.
    Although this study affirmed the CPT-R college-level cut point suggested by Smittle (1993), which was based on an examination of the score distribution and on replication of the student placement percentages previously obtained with traditional paper-and-pencil tests, it went further by equating specific CPT-R scores with expected grade-level performance. This study also replicated the CPT-R grade-level equivalencies produced in Murphy's (1995) analysis of the Nelson-Denny Reading Tests. Table 4 shows that a CPT-R score of 83 predicts reading ability commensurate with the demands of college-level work, as both the DRP and the Nelson-Denny grade-level assignments indicate. Thus, it is now possible to ascribe meaning to the CPT-R's norm-referenced scores and to describe the reading ability reflected in a CPT-R score as being "at a particular grade level." In this light, scores can be used as multidimensional indicators of proficiency.
    The DRP/CPT-R conversion information resulting from this study has practical value for the instructor, the classroom, and the institution. The CPT-R score now has enhanced meaning in that it identifies student reading ability in terms of the level of difficulty the student can handle and the type of scaffolding (support and intervention) required of the instructor and the learning environment. The practitioner can reference this information to select curriculum and materials with an appropriate level of difficulty. Administrators can also apply it to create homogeneous learning groups within courses and individual classrooms.
    The generalizations that can be drawn from the results of this study are somewhat limited by the nature of the sample. The scores used in the present study were drawn from a population of community college students; thus, further studies are recommended before assuming that these results apply uniformly to scores drawn from other student groups or types of institutions. This study may provide a useful framework for such additional research efforts.

Table 1. DRP and CPT-R Means and Standard Deviations.

Measure | Mean | Standard Deviation | Standard Error of the Mean
CPT-R | 61.12 | 11.28 | .33
DRP* | 63.16 | 8.81 | .26

*DRP scores used for the present study represent P = .90 conversion scores for independent levels.



Table 2. DRP Reading Grade Levels and Actual CPT-R Means.

DRP Reading Grade Level | N | Mean CPT-R | Standard Error of the Mean
Fourth Grade or below (DRP = 40 or less) | 12 | 35.33 | 1.56
Fifth Grade (DRP = 41 - 48) | 60 | 40.72 | 0.97
Sixth Grade (DRP = 49 - 53) | 82 | 47.38 | 0.80
Seventh Grade (DRP = 54 - 58) | 172 | 53.30 | 0.58
Eighth Grade (DRP = 59 - 62) | 173 | 59.10 | 0.55
Ninth Grade (DRP = 63 - 67) | 292 | 64.40 | 0.39
Tenth Grade (DRP = 68 - 70) | 151 | 68.31 | 0.50
Eleventh Grade (DRP = 71 - 74) | 97 | 70.60 | 0.51
Twelfth Grade (DRP = 75 - 76) | 44 | 71.11 | 0.67
College Level (DRP = 77 or higher) | 71 | 74.46 | 0.56

Grade reading level is based on DRP-75 scores achieved by the end of the term.
 

Table 3. Regression of DRP on CPT-R.

Variable | r² | b | a | t
CPT-R | .659 | .634 | 24.419 | 29.24*

*p < .0001, df = 1152





Table 4. CPT Reading Comprehension Test Scores, Corresponding Predicted DRP Test Scores(1) and DRP Grade-Level Performance Standards(2), and Corresponding Predicted Nelson-Denny Comprehension Scores(3) and Nelson-Denny Grade-Level Performance Standards(3).

CPT-R Scores | Predicted DRP Scores(1) | DRP Grade-Level Performance Standards(2) | Predicted Nelson-Denny Comprehension Scores(3) | Nelson-Denny Grade-Level Performance Standards(3)
83 | 77 | College | 41.55 | 12th & College
80 - 82 | 75 - 76 | 12th Grade | 40 - 41 | 12th Grade
73 - 79 | 71 - 74 | 11th Grade | 38 - 39 | 11th Grade
69 - 72 | 68 - 70 | 10th Grade | 36 - 37 | 10/11th Grade
61 - 68 | 63 - 67 | 9th Grade | 33 - 35 | 9/10th Grade
55 - 60 | 59 - 62 | 8th Grade | 30 - 32 | 8/9th Grade
47 - 54 | 54 - 58 | 7th Grade | 27 - 29 | 7/8th Grade
39 - 46 | 49 - 53 | 6th Grade | 24 - 26 | 6/7th Grade
26 - 38 | 41 - 48 | 5th Grade | 18 - 23 | 5/6th Grade
25 | 40 | 4th Grade and below | 17 | Below 5th Grade

(1) The standard error of estimate predicting DRP scores from CPT-R scores is equal to ±5.15.
(2) The standard error of estimate predicting DRP grade-level performance from CPT-R scores is equal to ±1.27. Source for DRP grade levels: Koslin, Zeno, & Koslin (1987), p. 143.
(3) Source: Murphy, S. (1995).
References

    Anastasi, A. (1982). Psychological testing (5th ed.). New York: Macmillan.
    Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Simon & Schuster.
    Aron, A., & Aron, E. N. (1994). Statistics for psychology. Englewood Cliffs, NJ: Prentice Hall.
    College Entrance Examination Board. (1986). Coordinator's notebook for computerized placement tests. Princeton, NJ: Author.
    College Entrance Examination Board. (1990). Coordinator's guide for computerized placement tests, 3. Princeton, NJ: Author.
    Koslin, B. L., Zeno, S., & Koslin, S. (1987). The DRP: An effective measure in reading. Brewster, NY: TASA DRP Services.
    Koslin, B. L., Koslin, S., Zeno, S., & Ivens, S. (1989). The Degrees of Reading Power Test: Primary and standard forms. Brewster, NY: TASA DRP Services.
    Murphy, S. (1995). An analysis of the construct and predictive validity of the CPT-R and Nelson Denny tests. Unpublished manuscript, Rose State College, Midwest City, Oklahoma.
    Napoli, A. (1991). Validating CPT Reading Comprehension Test standards employing relevant college-level performance criteria. In Background readings for College Placement Tests. Princeton, NJ: Educational Testing Service/The College Board.
    Napoli, A., & Wortman, P. M. (1995). Validating college-level reading placement test standards. Journal of Applied Research in the Community College, 2(2), 143-151.
    New Jersey Basic Skills Council. (1987). New Jersey College Basic Skills Placement Testing. Trenton, NJ: New Jersey Department of Higher Education.
    Nold, D., & Kuechenmeister, M. (1991). Aims Community College computerized placement tests pilot study: Summary report. In Computerized Placement Tests: Background Readings. New York: College Entrance Examination Board.
    Sammon, S. F. (1988). A correlational study: The New Jersey College Basic Skills Test and the Degrees of Reading Power Test. Unpublished master's thesis, William Patterson College, New Jersey.
    Smittle, P. (1993). Computer adaptive testing: A new era. Journal of Developmental Education, 17(1), 8-12.
    Ward, W. C., Kline, R. G., & Flauger, J. (1986). Summary of pilot testing results. In Coordinator's notebook for computerized placement tests. Princeton, NJ: College Entrance Examination Board.

Address correspondence to:
Anthony R. Napoli, Ph.D.
Office of Institutional Research
Suffolk Community College
533 College Road, r102b
Selden, NY 11784
e-mail: napolia@sunysuffolk.edu

1. The standard error of estimate indicates the degree to which scores may vary from the values predicted by the regression equation (Aron & Aron, 1994).
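As a point of reference, one common formulation of this statistic for a simple regression (some texts divide by N rather than N - 2) is:

$$ s_{\mathrm{est}} = \sqrt{\frac{\sum (Y - Y')^{2}}{N - 2}} $$

where Y is an observed DRP score, Y' is the DRP score predicted from the CPT-R via the regression equation, and N is the number of students.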