**A Psychometric Analysis of the Tri-Campus Mathematics Tests**

prepared by the SCCC Office of Institutional Research and Assessment

October 1998

__Goal__

The goal of this study was to assess the reliability and validity of the Tri-Campus Mathematics Tests for MA01, MA07, and MA27 level courses. This was accomplished by examining the following psychometric properties of the tests: 1) Inter-Item Reliability - a score referring to the degree of consistency among item scores within a test of a unitary factor, 2) Inter-Rater Reliability - a score allowing for the statistical examination of the adequacy of a scoring rubric, and 3) Construct and Concurrent Validity - a measure of the representativeness of the test in terms of the domain it was designed to assess and the comparability of the test scores with scores on other tests purporting to assess similar skills.

Ultimately this study attempted to examine the appropriateness of the use of the Tri-Campus Mathematics Tests as assessment tools in the evaluation of the Academic Systems Mediated Learning (ASML) program in Interactive Mathematics.

__The Developmental Math Program__

The Developmental Math program at Suffolk County Community College (SCCC) consists of a sequence of two single-semester math courses. The first course (Developmental Mathematical Skills, MA01) was designed for students with low-level basic arithmetic abilities; these students attend two semesters of developmental math instruction. Students with assessed math skills below college level, but above the MA01 level, need attend only the second developmental course (Algebra 1, MA07).

The Developmental Mathematical Skills (MA01) course description appearing in the SCCC catalog (SCCC, 1998) states that the objectives and goals are "for the student to learn or strengthen basic arithmetic skills and introductory plane geometry concepts, and to prepare the student mathematically for entry into beginning algebra" (p. 173). The course description for beginning algebra (Algebra 1, MA07) characterizes it as "equivalent to first-year high school algebra" with topics including "the language of algebra, the order of operations, signed numbers, linear equations, simultaneous equations, solving quadratic equations by factoring," and "application of algebra to selected verbal problems" (p. 173). The college limits class size in developmental education courses, thus providing individualized attention and follow-up. Algebra 2 (MA27) is not considered part of the developmental math program. However, this course provides "a continuation of the study of the basic concepts of algebra" (p. 173), and corresponds to an additional level of instruction provided in ASML courses.

ASML courses in Interactive Mathematics (MALA, MAL1, and MAL2) offer students an opportunity to work autonomously, spending as much time working on a given exercise problem or skill area as they need to meet a predetermined level of proficiency. ASML provides multimedia software and workbooks used to step the students through the topics of the developmental math program. The system assigns exercises and homework adaptively, and tracks the time a student spends on the various units as well as the results of the on-line pre- and post-unit tests. The ASML learning context includes a multimedia computer station for each student, an instructor, a professional assistant, a faculty-student group discussion area, and a projection system for multimedia-computer classroom discussion.

Most students take a placement exam, which includes the College Placement Test in Mathematics (CPT-M), before registering for courses at SCCC. The CPT-M tests basic arithmetic and algebra skills, and provides both separate arithmetic (CPT-Arithmetic) and algebra (CPT-Algebra) scores as well as a composite (CPT-M) score. Students with CPT-M scores below 101 are advised to register for MA01, those with scores between 101 and 149 to register in MA07, and those with scores above 149 to register in MA27. A student who wishes to complete MA01, 07, or 27 can register for MALA; those who wish to complete only MA01 can register for MAL1; and those who wish to complete either MA07 or 27 can register for MAL2. In addition, MA07 and MAL2 are open to students who have passed MA01 or MAL1. Similarly, MA27 is open to students who have passed MA07 or MAL2. At the end of the semester the ASML student receives a single course grade reflecting his or her highest level of achievement. For example, a student registered for MAL2 or MALA intending to complete the MA07 requirement may be able to accomplish the MA27 curriculum and would therefore receive MA27 credit.

__The Sample__

At the end of the Spring 1998 semester, students within randomly selected MA01, MA07, and MA27 classes and all MAL1, MAL2, and MALA students completed the Tri-Campus Mathematics Tests. The sample included 458 traditional classroom students and 184 mediated learning students (117 MA01, 120 MAL1, 129 MA07, 57 MAL2, 212 MA27 and 7 MALA).

Demographically, more than half (54%) of the sample of students in MA01, MA07, and MA27 were female. Of those who reported ethnicity, the sample was predominantly White (54%), with each of the remaining groups representing less than 15%: Black (5%), Hispanic (12%), Asian (2%), and Native Alaskan/American Indian (1%). A majority of the students in the sample (60%) were between 18 and 22 years of age. The largest proportion of students (49%) attended classes at the Selden Campus. Most of the students (52%) entered SCCC within the 3 semesters prior to taking the test (Spring and Fall 1997, Spring 1998) and had accumulated 15 or fewer total credits. In this sample the greatest number of students (50%) had earned a cumulative grade point average of "C" (2.0-2.9). Similarly, the highest proportion of students (40%) had brought with them to SCCC a high school grade point average of "C" (70-79) as well. For more detailed and course-specific demographic information see Table 1.

Based on statistical evidence the sample appears to be demographically representative of all students enrolled in MA01/MAL1, MA07/MAL2, and MA27/MALA for the Spring 1998 semester (the population in the present studies). However, students in the sample put off taking math until later in their college careers than the population of students enrolled in these classes. Thus the highest proportion of the population (39%) consisted of students who had fewer than 5 total college credits, as opposed to a much smaller proportion in the sample (14%). Moreover, 26% of students in the population had no cumulative college grade point average (as compared to less than 10% of the sample), indicating that Spring 1998 was their first college semester. We address the potential impact of this difference below.

Table 1. Distribution of the sample's demographic characteristics (%) according to math course.

| Student Demographic Characteristic | MA01/ASML equivalent | MA07/ASML equivalent | MA27/ASML equivalent |
|---|---|---|---|
| **Gender** * | | | |
| Male | 35.8 | 39.1 | 38.9 |
| Female | 55.1 | 54.0 | 52.2 |
| **Ethnicity** ** | | | |
| White | 48.2 | 59.9 | 55.1 |
| Black | 6.2 | 4.8 | 4.0 |
| Hispanic | 10.3 | 9.5 | 14.2 |
| Asian | 1.2 | 2.6 | 2.4 |
| Native Alaskan/American Indian | 1.2 | .5 | .8 |
| No record | 15.6 | 11.6 | 8.9 |
| **Age (years)** * | | | |
| 18-19 | 19.3 | 18.5 | 25.9 |
| 20-22 | 40.4 | 36.6 | 37.4 |
| 23-25 | 9.9 | 10.0 | 9.7 |
| 26-30 | 7.0 | 11.7 | 7.3 |
| More than 30 | 14.3 | 16.3 | 10.8 |
| **Campus** * | | | |
| Eastern | 19.8 | 14.8 | 8.9 |
| Selden | 54.3 | 55.6 | 38.5 |
| Western | 16.8 | 22.7 | 43.7 |
| **Year Entered SCCC** * | | | |
| 1998 | 23.9 | 13.2 | 8.5 |
| 1997 | 35.4 | 38.1 | 36.0 |
| 1996 | 15.7 | 13.8 | 19.5 |
| 1995 | 6.2 | 7.9 | 10.2 |
| Before 1995 | 9.7 | 20.1 | 16.9 |
| **Total Credits** * | | | |
| 0 | 7.0 | 4.8 | 0.0 |
| 1-5 | 25.9 | 11.7 | 2.9 |
| 6-10 | 27.3 | 17.5 | 5.0 |
| 11-15 | 11.2 | 17.0 | 14.4 |
| 16-25 | 11.1 | 26.0 | 26.9 |
| More than 25 | 8.4 | 16.1 | 41.9 |
| **Cumulative College GPA** * | | | |
| 0 | 7.4 | 5.3 | .4 |
| 0.1-0.9 | .4 | 2.1 | 1.2 |
| 1.0-1.9 | 15.6 | 12.2 | 7.6 |
| 2.0-2.9 | 34.1 | 40.8 | 47.6 |
| 3.0-3.9 | 26.4 | 27.9 | 30.7 |
| 4.0 | 7.0 | 4.8 | 3.6 |
| **High School GPA** *** | | | |
| 0 | 7.0 | 8.5 | 5.8 |
| 1-69 | 10.7 | 7.9 | 6.4 |
| 70-79 | 34.3 | 46.1 | 39.4 |
| 80-89 | 11.4 | 12.6 | 21.8 |
| 90-100 | 0.0 | 0.0 | 1.3 |

* Data missing from the set: MA01=9.1%, MA07=6.9%, MA27=8.9%.

** Data missing from the set: MA01=17.3%, MA07=11.1%, MA27=14.6%.

*** Data missing from the set: MA01=36.6%, MA07=24.9%, MA27=25.3%.

__The Test__

A tri-campus committee of mathematics faculty developed the three tests to measure student proficiency with the basic math skills taught in MA01/MAL1, MA07/MAL2, and MA27/MALA. The committee designed the tests around a set of agreed-upon, curriculum-based objectives. As such, the tests provide a criterion-related measure of student proficiency. Instructors score these tests on the basis of a subjective scoring rubric (i.e., a set of rules).

__Inter-Item Reliability__

Before a test is released for general use it should be checked for reliability. Information on the reliability of a test, and on the number and nature of the people on whom reliability was established, allows users to predict whether the test will be more, less, or equally reliable for the group with which they expect to use it. One definition of reliability refers to "internal consistency" (Aron & Aron, 1994; Anastasi & Urbina, 1997), the degree of consistency among item scores within a test of a designated construct (i.e., inter-item reliability). Internal consistency indicates how similarly the different items measure a single construct. Using this definition, if the tri-campus committee's math test is a reliable measure of the domain "math proficiency," a student's performance on any one item should be highly related to his or her performance on other items. In other words, the information provided by one item should be consistent with the information provided by the other items in the test. Cronbach's (1951) alpha (α), a statistic which reflects the homogeneity of a test, measures internal consistency. A high alpha (>.8) indicates high consistency or homogeneity across items, and allows for the claim that these tests are a reliable measure of math proficiency. The Tri-Campus Mathematics Tests showed high internal consistency at all three levels, MA01, MA07, and MA27 (αs = .84, .88, and .88, respectively).
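Coefficient alpha can be computed directly from an examinee-by-item score matrix. The following is a minimal illustrative sketch, not the committee's actual computation; the function name and NumPy dependency are our own:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's (1951) coefficient alpha.

    scores: 2-D array-like; rows are examinees, columns are items.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_var = scores.var(axis=0, ddof=1)        # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    # alpha = k/(k-1) * (1 - sum of item variances / total-score variance)
    return (k / (k - 1)) * (1.0 - item_var.sum() / total_var)
```

Perfectly parallel items drive alpha toward 1.0, while heterogeneous items drive it toward 0, which is why values above .8 are read as evidence of a homogeneous test.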

__Inter-Rater Reliability__

Subjectively graded tests, like the math tests, depend upon the judgment of those who score the tests. Standardized procedures and a well-defined scoring rubric can make the influence of scorer bias negligible (Anastasi & Urbina, 1997). In such circumstances it is important to evaluate the reliability of the different judges, that is, to establish the interchangeability of judges. In general, inter-rater reliability provides a measure of the degree of agreement or consistency between observers, judges, or scorers (Aron & Aron, 1994). To assess inter-rater reliability across all performance levels on these tests, five example forms (representing A, B, C, D, and F grades) were developed for each of the three tests and then separately scored by participating instructors.
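Agreement percentages of this kind can be obtained by comparing every pair of judges item by item. This sketch uses our own conventions (function name and score layout are assumptions, not the committee's procedure):

```python
from itertools import combinations

def percent_agreement(judge_scores):
    """Pairwise percent agreement among judges.

    judge_scores: list of per-judge score lists, one score per item.
    """
    n_items = len(judge_scores[0])
    pairs = list(combinations(range(len(judge_scores)), 2))
    # Count, over all items and all judge pairs, how often the two
    # judges assigned the identical score to the same item.
    agreements = sum(
        judge_scores[a][i] == judge_scores[b][i]
        for i in range(n_items)
        for a, b in pairs
    )
    return 100.0 * agreements / (n_items * len(pairs))
```

When every judge assigns identical item scores the function returns 100; each item on which any pair of judges differs lowers the figure proportionally.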

The inter-rater reliability of the MA01 test across all examples of performance levels (A-F) was very high, with a reliability coefficient (α) of .91, based on 29 of the 33 items. Moreover, the remaining 4 items showed perfect agreement among judges' scores. Judges' scores agreed highly on A, B, C, and D-graded tests, with agreement levels of 99%, 98%, 100%, and 99.7% respectively. The slight suppression of the overall alpha appears to stem from the judges' scoring of the F-graded test, which produced 94% agreement in judges' scores (perfect agreement on 20 of the 33 items).

The inter-rater reliability of the MA07 test across all form performance levels was also very high (α = .95), based on all 26 items. A- and F-graded tests showed 98% agreement between judges' scores (perfect agreement on 21 and 23 of the 26 items respectively). Among B, C, and D-graded tests, judges' scores demonstrated 97% agreement, with perfect agreement on 21 of the 26 items for each form.

The inter-rater reliability of the MA27 test across all performance levels was also high, with a reliability coefficient of .85, based on 22 of the 24 items. The remaining 2 items showed perfect agreement among judges' scores. A, B, and F-graded tests showed agreement ratings of 97%, 93%, and 96%, respectively, with perfect agreement on 19 of the 26 items for each of the graded levels. C-graded tests showed 99.5% agreement among judges' scores (25 of the 26 items with perfect agreement). Judges' scoring on the D-graded test showed 95% agreement, with perfect agreement on 18 of the 24 items.

__Content, Construct, and Concurrent Validity__

Content and construct validity result when a measure contains a representative sample of the domain it claims to measure (Aron & Aron, 1994; Anastasi & Urbina, 1997), in this case certain portions of the math subject area. Ideally, content and construct validity are built into a measure during its construction through the selection of appropriate items. A tri-campus committee of Math Department faculty, all math instructors with advanced degrees in Mathematics, selected the items on the Tri-Campus Mathematics Tests using domain^{(1)} sampling from the MA01, MA07, and MA27 curricula. These specialists defined the chosen set of items as representative of course-appropriate level skills in the subject area of basic mathematics; thus the Tri-Campus Mathematics Tests have content and construct validity.

Criterion-related validity represents performance in relation to particular tasks, or discrete cognitive or behavioral objectives. It is determined by measures of concurrent validity (the degree to which scores on two or more measures directly assess the same thing), predictive validity (the degree to which scores predict future performance), or both (Aron & Aron, 1994; Anastasi, 1982; Anastasi & Urbina, 1997). Thus, the Tri-Campus Mathematics Tests' concurrent validity is measured by the degree to which scores on these tests correlate with scores on other tests purported to measure arithmetic and algebra skills. Similarly, their predictive validity can be measured by the degree to which scores on the tests accurately predict future arithmetic and algebra performance. Concurrent validity, predictive validity, or (most comprehensively) both together determine the level of criterion-related validity of these tests. Only when sufficient criterion-related validity has been established, and points on the Tri-Campus Mathematics Tests' score continuums can be reliably equated to some relevant cognitive or behavioral skill, can the test user appropriately deploy the assessment tool to make evaluative decisions. Studies addressing the predictive validity of the Tri-Campus Mathematics Tests are recommended in future semesters, both for their own value and for the support they will lend to the criterion-related validity of the tests. The present study establishes concurrent validity between the Tri-Campus Mathematics Tests and the CPT-M. This is sufficient for the present need: establishing the appropriateness of these tests for comparisons between regular developmental and ASML math courses.

To examine concurrent validity, we conducted a multiple regression analysis for each math course (MA01, MA07 and MA27) in which score on the appropriate level of the Tri-Campus Mathematics Test was the dependent variable, and the two independent variables were the scores on the CPT Arithmetic and CPT Algebra tests.
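Such an analysis amounts to an ordinary least-squares fit with two predictors, from which R² is obtained. The sketch below is one way to reproduce that computation; the variable names and NumPy-based approach are our own, not the report's:

```python
import numpy as np

def r_squared(test_scores, cpt_arith, cpt_alg):
    """R^2 from regressing Tri-Campus test scores on CPT Arithmetic
    and CPT Algebra scores (ordinary least squares with an intercept)."""
    y = np.asarray(test_scores, dtype=float)
    # Design matrix: intercept column plus the two CPT predictors.
    X = np.column_stack([np.ones(len(y)), cpt_arith, cpt_alg])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # fitted coefficients
    resid = y - X @ beta                           # residuals
    ss_res = resid @ resid                         # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()           # total sum of squares
    return 1.0 - ss_res / ss_tot
```

R² here is the proportion of variance in the Tri-Campus test scores jointly accounted for by the two CPT subscores; values near 1 indicate that the tests measure largely the same skills.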

In the MA01 group, the overall __R^{2}__

In the MA07 group, the overall __R^{2}__

In the MA27 group, the overall __R^{2}__

Note that the above correlations between CPT Arithmetic and Algebra scores and scores on the Tri-Campus Mathematics Tests may, in fact, underestimate their relationship. Students take the CPTs upon initial entry to SCCC; however, many of the students in the sample put off taking developmental math (and therefore taking the math test) until well beyond their first semester. Napoli and Wortman (1995) suggest that the validity coefficient diminishes over time. Concurrent validity presupposes an established time frame for the overall testing situation, in this case a short period of time (1 to 2 semesters between taking the CPT at entry and the Tri-Campus Mathematics Tests following completion of coursework). Extending this interval increases the likelihood of outside experience moderating inter-test association, thus reducing the relationship between performance on the first test and performance on the second test. Because students in the present sample were found to have put off taking math until later in their college careers than the population of students enrolled in these classes, it is possible to infer that correlations between the tests may be stronger in the population of students enrolled in developmental math than they are in our current sample.

__Summary__

Overall, the study establishes that scores on the Tri-Campus Mathematics Tests' continuum can be reliably equated to some relevant cognitive or behavioral skill, specifically arithmetic and algebra skills. This allows the test user (instructor, department, or administration) to appropriately deploy the tests as evaluative tools for designated courses. To accomplish this goal we assessed the Tri-Campus Mathematics Tests in terms of: 1) Inter-Item Reliability - a score referring to the degree of consistency among item scores within a test of a unitary factor, 2) Inter-Rater Reliability - a score allowing for the statistical examination of the adequacy of a scoring rubric, and 3) Construct and Concurrent Validity - a measure of the representativeness of the test in terms of the domain it was designed to assess and the comparability of the test scores with scores on other tests purporting to assess similar skills.

We found high inter-item reliability. This indicates high consistency or homogeneity across items, and allows for the claim that the tests are a reliable measure of math proficiency. Additionally, we found high inter-rater reliability. Thus we established an interchangeability of judges. Moreover, we confirmed that the scoring rubric was sufficiently defined to eliminate the subjective influence of scorer bias. Finally, we determined that the Tri-Campus Mathematics Tests have content, construct, and concurrent validity. Therefore we can claim that the tests do measure the arithmetic and algebra skills they purport to measure.

References

Anastasi, A., & Urbina, S. (1997). __Psychological testing__ (7^{th} ed.). Upper Saddle River,
NJ: Simon & Schuster.

Aron, A., & Aron, E. (1994). __Statistics for psychology__. Englewood Cliffs, NJ: Prentice
Hall.

Course offerings in mathematics. (1998). __Suffolk County Community College Catalog__, 37,
173.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. __Psychometrika,
16,__ 297-334.

Napoli, A., & Wortman, P. M. (1995). Validating college-level reading placement test
standards. __Journal of Applied Research in the Community College, 1(2),__ 143-151.

1. The domain is the full curriculum of the courses (MA01, MA07, and MA27) as well as the cognitive and behavioral skills to be attained through these courses.

2. These findings can also be interpreted as evidence for the convergent and discriminant validity of this test.