Feature: Taking the SATs to the Next Level

New research by MSU scientists suggests how college entrance exams can be tweaked to better predict success in college.
In March, as MSU’s men’s and women’s basketball teams made their exciting runs to the NCAA Final Four, another type of “March Madness” unfolded outside the glare of television.
On one Saturday morning, about 3,300,000 teenagers in America took the dreaded SAT test. Many others took a similar test known as the ACT.
Taking these tests would be their first step into the meritocratic universe of college and university life. These tests are used by the vast majority of college admissions offices, and for good reason. “They are excellent predictors of academic success in college,” says Neal Schmitt, chairperson of the MSU Dept. of Psychology and co-chair of a major national study about college entrance exams. “They have a very high correlation with your grade point average in your first year of college.”
You can expect your children to take one of these tests. And your grandchildren. But by then, thanks to Schmitt’s research at MSU, the exam might be radically different from what it is today.
Already three years into a long-term research study funded by the College Board, the organization that develops and administers the SAT, Schmitt is trying to ascertain how the test can be tweaked and improved to do a better job of helping admissions officers.
“The SATs measure one aspect of student ability that most colleges consider valuable, and that’s academic potential,” explains Schmitt. “But there are many other student characteristics that colleges consider valuable that are not measured by the current SAT and ACT, things like leadership, social responsibility, ethics and integrity.”
In addition, notes Schmitt, most institutions want a diverse student body, and the SATs are less likely to yield that mix since members of different racial groups tend to perform differently in the current tests.
“If the SAT and ACT scores were the only criterion used by MSU for admissions, you would not get the racial or cultural mix we currently have,” notes Schmitt.
University admission officers are interested in many questions dealing with the probability of success of prospective students. How can we predict who will do well in college? Who will graduate? Who will have difficulty adjusting to the independence and new social and cultural atmosphere usually encountered in college? Who will go to class and otherwise benefit from the myriad of opportunities available on most university campuses?
To help colleges predict academic success, high school students in the U.S. hoping to attend college are usually required to take one of two standardized tests. The SAT, published by the College Board in New York, is required by the majority of colleges and universities on the east and west coasts. The ACT, published by ACT, Inc. in Iowa City, is required by higher-education institutions in the middle of the country, including MSU. The SAT-I consists of quantitative and verbal reasoning sections, while the ACT consists of English, Math, Reading, and Science sections.
“The major difference between the two is that the ACT is supposed to be more closely linked to actual high school curricula,” says Schmitt. “In reality, scores on these two tests are often very highly related; in our research we have found them to be nearly interchangeable as indices of student potential. There is also an SAT-II, consisting of 22 different subject-matter tests ranging from Biology and Chemistry to Chinese Reading.”
These standardized tests predict student performance very well, as reflected in first-year college GPA, says Schmitt. Study habits, persistence, and degree attainment, as well as GPA in later college years, are also predicted by the SAT, though not as well. Most large universities use a weighted combination of high school GPA and standardized test scores to make admissions decisions. This combination usually predicts about one quarter to one third of the variability in college grades. Prediction using these indices is often supplemented with judgments of student potential based on materials such as letters of reference, applicant essays, and applicant interviews.
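To make the “one quarter to one third of the variability” figure concrete: in statistical terms it corresponds to a multiple R² of roughly .25 to .33 when first-year college GPA is regressed on high school GPA and a test score. The sketch below illustrates that calculation on synthetic data; the sample size, weights, and noise level are invented for illustration and are not drawn from any real admissions data.

```python
# Illustrative only: synthetic data showing how a weighted combination of
# high school GPA and a standardized test score predicts first-year college GPA.
# The coefficients and noise level here are made up; the article reports that
# real admissions composites explain roughly 25-33% of the variance in grades.
import numpy as np

rng = np.random.default_rng(0)
n = 5000

hs_gpa = np.clip(rng.normal(3.2, 0.5, n), 0.0, 4.0)   # high school GPA
test_z = rng.normal(0.0, 1.0, n)                       # standardized test score (z-scored)

# Assume college GPA depends weakly on both predictors plus substantial noise,
# so the combination explains roughly a quarter to a third of the variance.
college_gpa = 0.8 + 0.45 * hs_gpa + 0.25 * test_z + rng.normal(0.0, 0.55, n)

# Fit the weighted combination by ordinary least squares.
X = np.column_stack([np.ones(n), hs_gpa, test_z])
beta, *_ = np.linalg.lstsq(X, college_gpa, rcond=None)
pred = X @ beta

ss_res = np.sum((college_gpa - pred) ** 2)
ss_tot = np.sum((college_gpa - college_gpa.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"weights (intercept, HS GPA, test): {np.round(beta, 2)}")
print(f"variance in college GPA explained (R^2): {r_squared:.2f}")
```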
“We know much less about the utility of these informational sources or even how judgments based on these materials are made by admissions personnel,” explains Schmitt. “What is used varies across institutions often as a function of the capability of the institution to evaluate data from these latter sources.”
While standardized tests provide an efficient and relatively valid source of information, they have been criticized for a number of reasons, says Schmitt.
“Scores on these tests often indicate substantial differences between white students and students from various minority groups,” he explains. “Efforts to overcome these differences to allow for the recruitment of an ethnically diverse group of students have been the frequent subject of litigation, perhaps the most highly publicized being the University of Michigan case decided by the Supreme Court in the summer of 2003.”
Another critic of the SAT-I has been Richard Atkinson, president of the University of California system, who favors a test (such as the SAT-II and the ACT) more directly related to the high school curriculum to which a student has access. Data collected at the University of California indicated that the SAT-II was a better predictor of student grades than the SAT-I with little impact on the relative differences in ethnic subgroup average scores.
Still other critics of both the SAT and ACT point to the fact that neither test evaluates students’ actual writing skills, even though both include multiple-choice items on vocabulary, grammar, and sentence structure. Perhaps in partial response to Atkinson’s concerns, both the SAT and ACT introduced a writing test in 2005-06. The College Board is also introducing a revised version of the SAT-I in 2006 that is designed to align more directly with high school curricula.
“Another criticism of the sole use of high school GPA and standardized test scores is that they do not evaluate other characteristics of students that may impact their success such as motivation, interests, and non-academic experiences,” writes Schmitt in a recent essay. “Further, while standardized tests do relate to college GPA, examination of most universities’ mission statements reveals that they profess to develop students in areas such as leadership, social responsibility, integrity/ethics, the motivation to continue to develop their knowledge, and multicultural appreciation.
“Standardized test scores likely reveal little about individual differences in motivation or interests as most applicants will be highly motivated at least during the test taking session. Scores on these tests are also unlikely to predict development along the ‘non-academic’ dimensions listed above. Addressing these two limitations has guided the College Board-funded research we have conducted over the last three years.”
The Schmitt research team (which includes professor Fred Oswald and a group of graduate and undergraduate psychology students) began by examining the mission statements of a wide variety of public and private universities located in various parts of the country. These statements were sorted and grouped into 12 major dimensions that reflect the goals of most universities of different types. These include factors such as student interest in their disciplines, motivation, ethics and integrity, leadership, and other qualities that are not detected by the standard SAT.
The 12 student performance dimensions were used as the blueprint to develop two new sets of measures designed to predict outcomes along these dimensions. These measures included “biodata” (background experiences, interests, and motivation) as well as “situational judgment measures” (which require students to indicate how they would respond to hypothetical situations they may face in college). The team relied heavily on similar existing measures as well as numerous meetings with students to ensure that items were relevant and the response scales appropriate.
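For readers unfamiliar with these instrument types, the sketch below shows one way a situational judgment item and a biodata item might be represented and scored. The scenario, response options, scoring key, and rating scale are invented for illustration; they are not items from the MSU instruments.

```python
# Hypothetical sketch of a situational judgment item and a biodata item.
# All item content and scoring values are invented for illustration only.
from dataclasses import dataclass

@dataclass
class SituationalJudgmentItem:
    scenario: str
    options: list[str]
    effectiveness: list[float]   # expert-rated effectiveness of each option

    def score(self, chosen: int) -> float:
        """Score a response as the rated effectiveness of the chosen option."""
        return self.effectiveness[chosen]

sji = SituationalJudgmentItem(
    scenario=("You are in a study group and one member repeatedly misses "
              "meetings before a major project deadline. What do you do?"),
    options=[
        "Complete the work without them and say nothing.",
        "Talk with them privately to find out what is going on and reassign tasks.",
        "Report them to the instructor immediately.",
    ],
    effectiveness=[0.3, 1.0, 0.5],
)

# Biodata items typically ask about past behavior on a rating scale.
biodata_item = {
    "prompt": "In the past year, how often did you take the lead in organizing a group activity?",
    "scale": ["Never", "Once or twice", "Several times", "Monthly", "Weekly or more"],
}

print("SJ item score for option 1:", sji.score(1))
print("Biodata response options:", biodata_item["scale"])
```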
“We now have collected data on about 150 situational judgment items and 275 biodata items that could be used as alternative forms of these instruments,” says Schmitt. “Our attempt to validate—that is, assess the degree to which these items related to student performance—also involved the development of rating scales designed to assess actual student performance on each dimension.”
The first version of these instruments was given to approximately 650 MSU freshmen late in their first term or early in their second term. Self- and peer ratings on these dimensions, self-reports of class attendance, and GPA data for the freshman year were collected at the end of these students’ first year at MSU. Analyses relating the biodata and situational judgment predictors to student outcomes showed a significant increase in the accuracy of prediction of GPA over standardized test scores and high school GPA alone. The instruments were also highly related to self ratings and, more importantly, to peer ratings on the twelve performance dimensions; as expected, standardized test scores related only minimally to these alternative outcomes. Both measures also predicted class attendance, which, in turn, was related to GPA.

An additional attractive outcome was that ethnic and gender differences on the biodata and situational judgment measures were very small. The dimensions being measured are likely among those targeted by universities that evaluate reference letters, essays, or interviews, but the new approach may be efficient enough for use by universities with applicant pools so large that they cannot afford to apply those techniques systematically.
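The gain in prediction over standardized test scores and high school GPA alone is what measurement researchers call incremental validity, typically checked by comparing a baseline regression with one that adds the new predictors. The sketch below, building on the same kind of regression shown earlier, illustrates that comparison on synthetic data; the coefficients and resulting R² values are invented, not the study’s estimates.

```python
# Illustrative incremental-validity check on synthetic data: does adding
# biodata and situational judgment (SJ) scores raise R^2 for first-year GPA
# beyond high school GPA and standardized test scores alone?
# All coefficients below are invented; they are not the MSU study's estimates.
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(1)
n = 650                               # roughly the size of the MSU freshman sample

hs_gpa  = rng.normal(3.3, 0.4, n)
test    = rng.normal(0.0, 1.0, n)
biodata = rng.normal(0.0, 1.0, n)
sj      = rng.normal(0.0, 1.0, n)

# Synthetic outcome: GPA depends on all four predictors plus noise.
gpa = 0.9 + 0.4 * hs_gpa + 0.2 * test + 0.12 * biodata + 0.10 * sj + rng.normal(0, 0.5, n)

baseline  = r_squared(np.column_stack([hs_gpa, test]), gpa)
augmented = r_squared(np.column_stack([hs_gpa, test, biodata, sj]), gpa)

print(f"R^2, HS GPA + test scores only:   {baseline:.3f}")
print(f"R^2, plus biodata and SJ scores:  {augmented:.3f}")
print(f"incremental validity (delta R^2): {augmented - baseline:.3f}")
```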
As might be expected, these measures are not without liabilities, says Schmitt. One major concern is that the new instruments may be easily coached or faked. Indeed, he notes, “our research has shown this to be true, especially for the biodata.” But the team also found that if respondents are asked to provide specific examples of the behavior they are claiming, if those examples are easily verifiable, and if students are warned that their responses will be verified, then the answers are “faked” less frequently.
Currently, a large multi-institutional data collection effort is underway to determine whether these results generalize to other student groups. At this point, the College Board is weighing two possible uses of these measures. One would be as a supplement to the current SAT-I and SAT-II; the other would be as a counseling tool for high school students, helping them better understand the broad array of demands and expectations they are likely to encounter in college.
“Referring to these instruments as SAT-III, as a recent New York Times article did, may be premature,” says Schmitt, “but we believe universities will be making increasing efforts to consider a broader set of dimensions to assess student potential and outcomes, and we hope the instruments we are developing will aid in those efforts.”
Neal Schmitt, University Distinguished Professor of Psychology and Management at MSU, was editor of the Journal of Applied Psychology from 1988 to 1994, has served on ten editorial boards, co-authored three textbooks, and published roughly 150 articles. Over the past three years, he has worked on the development and validation of noncognitive measures for college admissions.