The SAT was originally called the Scholastic Aptitude Test. Its name later changed to the Scholastic Assessment Test, then to SAT I: Reasoning Test, and then to SAT Reasoning Test. Today it is simply called the SAT. The SAT is a paper-based standardized test developed and administered by Educational Testing Service (ETS), a nonprofit organization. ETS provides its services to the College Board, which owns the test. The College Board sponsors the testing program and decides how the test will be constructed and administered (ETS, n.d.). The SAT was first introduced in 1926 and today it is widely used in the United States for college admissions. The test is offered worldwide, and seven times annually in the United States. Fluency in English is presumed of test takers. According to the College Board, the test is designed to measure the essential ingredients for college and career readiness and success, to provide a stronger connection to classroom learning, and to inspire productive practice (College Board, n.d.a).
The SAT’s current revision, introduced in 2016, takes 3 hours to complete. An additional 50 minutes are given for the Essay portion, which is now optional. The SAT has three components:
1) Evidence-Based Reading and Writing
・ Reading Test
・ Writing and Language Test
2) Math
3) Essay (optional)
Postsecondary institutions now decide whether the Essay portion is required. A major change in the scoring system is that there is no longer a penalty for an incorrect answer, so test takers can attempt every item to the best of their ability. The overall scoring scale now ranges from 400 to 1600; each of components 1 and 2 is scored on a scale of 200 to 800. The Essay is scored on a scale of 0 to 24 and its score is reported separately. In the new revision, subscores for every test are also given to examinees to facilitate their learning process (College Board, n.d.b). In past versions, students received their scaled scores, raw scores (marks gained and lost), and percentile scores, the percentile being the percentage of test takers who scored lower than they did.
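As an illustration of the scoring arithmetic described above, the following minimal Python sketch combines two section scores into the 400 to 1600 total and computes a percentile as the percentage of test takers who scored lower. The function names and the sample scores are hypothetical and are not taken from the College Board; actual score reporting involves scaling steps that are not shown here.

```python
from typing import Sequence

# Section score bounds as stated in the text (components 1 and 2).
SECTION_MIN, SECTION_MAX = 200, 800


def total_score(reading_writing: int, math_score: int) -> int:
    """Add the two section scores; the result lies between 400 and 1600."""
    for score in (reading_writing, math_score):
        if not SECTION_MIN <= score <= SECTION_MAX:
            raise ValueError(f"section score out of range: {score}")
    return reading_writing + math_score


def percentile_below(score: int, all_scores: Sequence[int]) -> float:
    """Percentage of test takers whose score is strictly lower than `score`."""
    if not all_scores:
        raise ValueError("all_scores must not be empty")
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


if __name__ == "__main__":
    # Hypothetical totals for a small group of test takers.
    group = [1000, 1100, 1200, 1300]
    print(total_score(600, 600))
    print(percentile_below(1200, group))
```

The essay score is left out of the total on purpose, since the text states that it is reported separately.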
The SAT is taken by high school juniors and seniors before they apply for college admission. Many colleges, though not all, require it for undergraduate admission, because the SAT score is one of the criteria used to decide whether an applicant is admitted to a program. As mentioned above, the test checks the applicant’s readiness for college and whether they possess the skills needed to succeed in an undergraduate degree.
Except for the Essay portion, all items are objective. In most sections of the SAT, these objective items are arranged by difficulty, from easy to hard. Only in the Critical Reading section are the items arranged in chronological order. Multiple-choice questions predominate in the SAT; each question has five choices, one of which is correct. The other objective format is the grid-in response in the Math section, where the test taker must supply the answer and fill in the grid. Since the test is administered under a restricted time limit, it is both a power test and a speed test.
The need for a standardized test like the SAT arose from the great variability in high school grades, curricula, and teacher recommendations. The test was developed to provide a single platform on which students could be compared and judged fairly. Researchers have shown that the combined use of high school grades and SAT scores is a far better predictor of college success than either one alone (Camara & Echternacht, 2000; Ragosta, Braun & Kaplan, 1991).
Although this holds true, various researchers have questioned these results and formulated contradictory hypotheses. One such study was conducted by Geiser and Santelices in 2007. They challenged the overdependence on standardized tests and reemphasized the importance of high school grade point average (HSGPA). Their results show that high school grades in college-preparatory subjects are the best predictor of college performance. HSGPA has high predictive validity not only for first-year college results but also for long-term college outcomes; indeed, its predictive weight increases after the first year of college. The data also showed that standardized tests yielded a small but statistically significant improvement in predicting long-term college results, so the combination of HSGPA and SAT scores helps in predicting college results. Beyond predictive validity, however, the researchers question the overemphasis on the SAT. SAT scores have a strong positive correlation with family income, parents’ education, and school rank, while HSGPA has a considerably weaker relationship with these factors. Thus, the SAT puts disadvantaged students at a further disadvantage. Moreover, an SAT score is merely an individual score from a single 4-hour test sitting, whereas a high school result is obtained after years of rigorous work at both the individual and group levels, which is far more important at the college level.
It is rather clear that the SAT has its own merit. But the fact that it negatively impacts the applications of disadvantaged students is quite troubling. This raises the question of at what other levels discrepancies in SAT scores exist, especially with respect to race. One of the FAQs on ETS’s site tackles the question of the fairness of standardized tests to women and minority students. It claims:
“ETS conducts extensive research and applies rigorous quality standards to ensure that the tests we develop are fair to people worldwide. Every question on every test that we produce undergoes a careful review process to ensure that it does not favor―or penalize―any particular group of students. Groups of students (such as male, female, Black, Hispanic, etc.) may have different average scores on the same test. This does not necessarily mean that the test is biased. If the groups actually have different knowledge and skills because of different educational backgrounds and opportunities, the scores will reflect those differences.” (ETS, n.d.)
The above statement shows that the developers give careful consideration to ensuring minimal bias in the standardized test, and that much research is carried out to this end. But how far they have succeeded can only be seen from research results. One study compared mean SAT scores by race/ethnicity. It found that mean scores for underrepresented minority applicants (African American, American Indian, and Chicano/Latino students) fall below those for Asian American, White, and other students on all three SAT composites (Geiser & Studley, 2001). Much debate has focused on whether these differences reflect circumstantial differences, about which ETS clearly warns, or actual bias in the test. The SAT has been revised multiple times to improve its predictive validity and make it unbiased. In the upcoming section, we will see how far these revisions have succeeded.
2. Literature Review
Looking back at the history of the SAT’s development, we see that the years 1926-1936 were occupied with defining the sections of the test: what to include and what to exclude, what kinds of items should be integrated, and whether to increase or decrease time limits. Later, scores were scaled so that cross-year comparisons could be made. In the 1960s and 1970s average scores declined; one of the reasons cited was demographic change among test takers. In 1994 the most crucial change was the introduction of calculators for the math section. In 2005, following criticism from the University of California, some major modifications were made to the test items. Ambiguous questions, especially analogies, were removed from the test. The most recent revision of the test was released in spring 2016. The details of the new revision were discussed earlier (“SAT”, 2016).
In an article published in 2003, Jay Mathews reported that for the high school class of 2002 the average score on the 1600-point test was 1060 for a non-Hispanic white student, 857 for a black student (203 points lower), 1070 for Asian students, and slightly over 900 for Hispanic students. The fact that the gap between blacks and whites had widened further (by 16 points) since 1992 was quite alarming. Mathews went on to mention Nicholas Lemann’s book “The Big Test: The Secret History of the American Meritocracy” (1999), in which Lemann goes into the details of testing history. As cited, the 1960s and 1970s were the time when colleges and universities with more applicants than seats started giving preference to some minority students with lower SAT scores, based on their high school grades and personal achievements. This idea acquired further support from the Supreme Court case “Regents of the University of California v. Bakke” (1978). Although the Court ruled 5-4 against the quota system used by the university’s medical school, Justice Lewis F. Powell wrote that, in his opinion, race could be considered in admission decisions. This was all made possible by the executive order on “affirmative action” signed by United States President John F. Kennedy on 6 March 1961, to give preference to applicants from minority or disadvantaged groups (UCI: Office of Equal Opportunity and Diversity, 2016).
Soon after this, the College Board established a fairness-review process, and SAT questions were scrutinized to remove any form of bias, including racial bias. The College Board and ETS started conducting studies to show that the SAT was a good predictor of college grades, which it was. And since educational institutions needed a standard method to sift through applications, they continued using the test. The idea of adjusting scores was also introduced but did not fare well. Later, Yale psychologist Robert Sternberg conducted research, with support from the College Board, to develop an alternate test that could eliminate the racial gap in scores.
In 1999 another court case came into focus, this time against the University of Michigan. The case “Gratz v. Bollinger” (1999) was filed to review the point system used by the university during its admission process. The university allocated 20 bonus points to the scores of applicants from underrepresented ethnic minorities, whereas a perfect SAT score was worth 12 points. The Supreme Court ruled against the point system, finding that it violated the Equal Protection Clause and racially discriminated against White applicants. Again, the greater issue we see in this case is the overemphasis on the test. Instead of reviewing each application individually and giving careful consideration to minority applicants, as affirmative action emphasized, the university sought the quick and easy way out by allocating points and sifting through the applications quickly.
In 2003 a controversial article by Roy Freedle (2003) was published in the Harvard Educational Review. In his research paper Freedle stated that the SAT was both culturally and statistically biased, but that through his strenuous work he had found a way to reduce disparities in scores. Freedle used a technique called differential item functioning (DIF), which helped him show that white students on average did better on easier items, whereas black students on average did better on hard items in the verbal section. Seeing this discrepancy, Freedle suggested a corrective scoring method called the Revised-SAT (R-SAT), which scored only the “hard” items on the test and was shown to reduce the mean-score difference between African American and White American test takers by one-third. He also argued that low-income White test takers benefited from the revised score as well. This suggestion rested on the view that white students score higher on easier items because they are helped by their English-speaking family environment. Scoring higher on hard items requires rigorous learning on the part of the individual, which gives equal opportunity to minority groups as well. The hard items were also more appropriate for assessing college-level progress.
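To make the idea of differential item functioning concrete, the sketch below computes the Mantel-Haenszel statistic, one widely used DIF index, for a single item. This is an illustrative assumption: the text above does not state which DIF statistic Freedle used, and the function names and counts are hypothetical. Test takers are first grouped into strata of similar total score, and within each stratum the odds of answering the item correctly are compared between a reference group and a focal group.

```python
import math
from typing import List, Tuple

# One stratum of test takers with similar total scores:
# (reference correct, reference incorrect, focal correct, focal incorrect)
Stratum = Tuple[int, int, int, int]


def mh_odds_ratio(strata: List[Stratum]) -> float:
    """Mantel-Haenszel common odds ratio for one item across score strata."""
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


def mh_d_dif(strata: List[Stratum]) -> float:
    """Odds ratio rescaled as -2.35 * ln(ratio).

    A value near 0 indicates no DIF; a negative value indicates that the
    item favors the reference group among test takers of matched ability.
    """
    return -2.35 * math.log(mh_odds_ratio(strata))


if __name__ == "__main__":
    # Hypothetical counts for one item in two score strata.
    item = [(30, 10, 20, 20), (10, 10, 10, 10)]
    print(mh_odds_ratio(item), mh_d_dif(item))
```

Because the comparison is made within score strata, a difference in the groups’ overall average scores does not by itself produce DIF; only a difference on the item among test takers of similar total score does.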
Freedle’s article received heavy criticism from the College Board and ETS, and the data he used was questioned by Drew H. Gitomer, Senior Vice President for Research & Development at ETS. Nevertheless, more recent research by Santelices and Wilson (2010) confirmed Freedle’s analysis. They found that the relationship between item difficulty and DIF estimates for verbal items, comparing African American and White American test takers, was even stronger in their data than in the data Freedle analyzed. These findings, however, did not apply to Hispanic students or to sections other than verbal.
The College Board, as usual, disagreed with the findings. Its spokeswoman Kathleen Fineout Steinberg questioned whether such a small sample could be used to draw broad conclusions and said that the study was “presenting inconsistent findings as conclusive fact”. The organization Fair Test, on the other hand, seized this opportunity to point again to the biased methods of the test. The study further evoked distrust in the test and served as a warning to the colleges and universities that continued using the SAT (Jaschik, 2010).
Much of the research, and the articles and court cases cited above, show that some doubt always existed in the minds of scholars that the test was biased against minorities. Compensatory methods were tried, and on many occasions they failed. It is important to mention, though, that the College Board and ETS always defended the SAT and on various occasions claimed that it was not a biased test. In their opinion, the large differences in test scores between groups were mostly due to economic disparity and the class system. Research also supports the view that socioeconomic status (SES) affects SAT scores (DeBold, Friedman, Molla, & Zumbrun, 2015). On this view, the overall test result only reflected inequities that already existed in society, and blaming the test in light of these studies was therefore imprudent.
Another theory that emerged during the constant debate about the fairness of the SAT was “stereotype threat”, proposed by Steele and Aronson (1995). Stereotype threat “is a situational predicament in which people are or feel themselves to be at risk of conforming to stereotypes about their social group”. The researchers found that when stereotype threat was induced in its subtlest form, by labeling the test as a measure of intelligence rather than just a challenge, Black participants’ scores were negatively affected, controlling for SAT scores. The mere presence of the stereotype depressed the Black participants’ scores. Their study thus found a considerable interaction between race and the condition in which participants were placed, even when SAT scores were controlled for. As the concept of stereotype threat became firmly established, the argument emerged that the difference in average SAT scores at the national level persisted because of stereotypes in the social environment and not because of bias in the test itself. This is one of the views defenders of the SAT take when arguing that actual racial discrimination in the test does not exist.
“The National Center for Fair & Open Testing (Fair Test) advances quality education and equal opportunity by promoting fair, open, valid and educationally beneficial evaluations of students, teachers and schools. Fair Test also works to end the misuses and flaws of testing practices that impede those goals. We place special emphasis on eliminating the racial, class, gender, and cultural barriers to equal opportunity posed by standardized tests, and preventing their damage to the quality of education”. (Fair Test: The National Center for Fair & Open Testing, n.d.)
Thus, decades of work and research have focused on evaluating the authenticity of the SAT. Many debates have been held over whether the test contains racial bias and how it could be further improved. But serious accusations against the SAT persist, and discord over its use continues among educational institutions.
Psychometric testing is a hallmark of the field of psychology. Education is a domain in which a great deal of testing is used even today, and the SAT is one example of the many tests produced by the field of psychology. Strengthening our basics is always crucial; therefore, any questions raised about racial bias in the SAT need to be explored and resolved.
The number of students taking the SAT increases annually, and in 2015 a new record of 1.7 million was set (College Board, 2015). This group was also the most diverse ever to take the SAT. The more students rely on the test, the greater the responsibility on test developers to ensure that no form of racial bias exists in it. Therefore, one of the reasons for undertaking this study was to explore whether the claims of non-bias are true. If they are not, serious changes need to be demanded from the developers to improve the test. The study also aimed to confront the question of whether a standardized test without any bias is even possible.
In Pakistan today, more and more students are taking the SAT in order to apply for undergraduate degrees abroad. Although the applicants mostly belong to the affluent class, this still raises the question of whether being non-White places a Pakistani student at an inherent disadvantage.
Another reason for studying the SAT was to see how successful standardized testing has been in other countries, and to examine whether a standardized test like the SAT could be prepared for university admission in Pakistan. At present, a few private universities in Pakistan consider SAT scores for admission, while others prepare separate entry tests to select their students. Like the United States, Pakistan has a diverse population, and the question is whether, in developing a more sophisticated education system in the future, standardized testing could be used instead of a separate entrance test for each university. It should be mentioned here that a few private universities in Pakistan today exempt applicants from the entrance exam if they have achieved a respectable SAT score.
My personal interest in this study was also to explore why the racial gap, if it does exist, persists even today; whether the monopoly of the test owners is holding back improvement of the SAT; and whether its use should even be continued in the future.
Many controversies surround the SAT, but its customer base has stayed strong and still relies heavily upon it. Therefore, it is important to study whether the SAT is even worth so much attention.
The objective of this study was to explore whether racial bias exists in the SAT today and, if it does, what solutions scholars have suggested and how they can be used to improve the test.
Another important question for the study was whether the SAT should continue to be used, or whether it has become obsolete and other methods should instead be explored to measure the college readiness of applicants.
The research arguments were developed after reviewing the websites of different organizations, primarily the College Board, Educational Testing Service (ETS), SAT, and Fair Test. Articles and research papers from different journals and websites were also reviewed; one example is the Harvard Educational Review. Interviews given by known members of the scholarly community were read to gain perspective on their personal points of view and their organizations’ standpoints. Court cases relating to the SAT were also reviewed to see the changes that occurred in the test due to judicial rulings. The analysis of this qualitative data helped in presenting multiple angles on the question of racial bias in the SAT.
6. Analysis and Discussion
The greatest issue facing the United States’ education system today is overdependence on the SAT. The concern is that if racial bias does exist and minority students are disadvantaged by this test, then their whole future could be in jeopardy. This is because colleges and universities weigh SAT scores heavily even when ETS itself discourages them from doing so. Some future employers may even ask for SAT scores, which can further set back the progress of minority groups.
This problem was voiced once again by Monty Neill, deputy director of Fair Test, in an interview with CNN: “In a technical sense, it’s probably not a biased test. The problems become in how it gets used in admissions process”, he said. “Most colleges will use the SAT as one piece of evidence, but a lot of them will use it to weed out a whole lot of kids who never then get a chance” (Prois, 2011). Thus, colleges also lose potential students if they are too focused on the SAT score and fail to see the overall picture of an applicant’s achievements. When the use of the test is questioned, supporters of the SAT respond as J. Grayer did in his interview with PBS: “the fact of the matter is, we sort through how teenagers should be admitted to colleges and universities, the first big step in their adult life. And the SAT plays a valid role. If there’s something better out there, we should use it. If there’s something affordable that we can create, we should all by all means do it. But at the end of the day, that’s a very serious proposition, deciding if a child gets in or not, and the SAT helps” (FRONTLINE, n.d.). In this interview, Grayer clearly accepted that biases exist in the test, but argued that since the SAT continues to be a decent predictor of college success and no other test exists so far, its use should continue. Here Grayer overlooks an important point: this complacent attitude toward SAT usage has helped maintain the achievement gap between classes and races in American society (Jencks & Phillips, 1998).
The idea that racial bias exists in the SAT is too presumptuous for the College Board and ETS. They continue to stress that differences in test scores reflect inequities present in society rather than race. To counter this issue, for the new revision of the test the College Board has partnered with the nonprofit Khan Academy so that study materials are available to all students for free. The verbal section, on which much of the disparity arose, has been reduced, and the essay section has been made optional. But this still cannot counter economic inequity, because affluent students have advantages in schooling and test coaching, and if they do poorly they can retake the test, which lower-income students cannot do because of the high fees. Thus, unless the educational inequities present in society are fully addressed, the gap will persist. But the socioeconomic gap is a separate debate, and it still does not account for the racial bias that exists in the test. Studies have shown that even when socioeconomic status is controlled for, racial biases continue to emerge, suggesting that there is something inherently wrong with the test and that it is biased against non-White students (Jencks & Phillips, 1998; Anonymous, 1998).
The reduction of the verbal section this year is somewhat surprising, because this is the section around which much of the debate regarding racial bias has centered. It appears to have been a conscious decision on the part of the SAT’s developers, even though they are reluctant to acknowledge that racial score differences exist. But such changes are only superficial and still leave a question mark over the authenticity of the SAT. Here it is important to mention that the College Board previously awarded a grant to Robert Sternberg to develop an alternate test. Phase I of the study was published in 2006. The research team drew on the triarchic theory of successful intelligence, pioneered by Sternberg himself, and developed a test based on this concept called the “Sternberg Triarchic Abilities Test (STAT)”. This test measured analytical, practical, and creative skills, and its scores were meant to supplement SAT scores to enhance the prediction of college success. Sternberg’s initial study showed good predictive validity with reduced ethnic group differences (Sternberg & The Rainbow Project Collaborators, 2003). The group continues to work on improving the test for future use. One study that tried to replicate these results, however, was unable to obtain similar empirical evidence (Chooi, Long, & Thompson, 2014). Further empirical studies are required to see whether the STAT is a successful measure for the future of alternate testing.
Another concept that has garnered support in scholarly circles is stereotype threat. The claim is that society’s stigma is partially responsible for the continued existence of the score gap between different groups. Many studies have supported this claim (Aronson, Fried, & Good, 2002; Good, Aronson, & Inzlicht, 2003; Aronson & Inzlicht, 2004) and show that already existing beliefs, for example that African Americans will score lower than White Americans, seem to act as a self-fulfilling prophecy. So even though bias might exist in the test, this phenomenon is also responsible for producing the score gap. Too much stigma is already attached to the SAT, with constant accusations against it throughout its history. Therefore, one possibility that I think could be used is a complete rebranding of the test. Only when the test is no longer seen as related to the past test, and is seen as a different entity, can this threat be reduced. And if an alternative to the SAT is introduced in the future, proper marketing and branding will be required so that users view the test with more trust and confidence.
However, one might argue that racial bias in the SAT has still not been completely eliminated and continues to exist (Jencks & Phillips, 1998; Geiser & Studley, 2001; Jaschik, 2015). Most shocking is how readily the College Board and ETS reject and ignore various findings. Most notable is the method developed by Freedle, which has evidential support from another independent study as well (Santelices & Wilson, 2010). Freedle’s research on establishing the corrective scoring method R-SAT showed that on easier items in the verbal section White Americans were getting an unfair advantage due to their family background. The test developers could either have looked into this corrective method or devised plans to develop only hard items, which are more relevant to college studies and can be learned by individual effort rather than through a privileged upbringing. Instead, the organizations paid little heed and continued to claim that the data used was flawed (College Board, 2010). The essential question here is: if the data was flawed in both studies, why doesn’t the College Board use its vast data bank to produce a study negating these findings? It could commission an independent researcher to conduct the study to establish even greater credibility. No such response or responsible action has been seen on the part of the organization.
It is worth recalling the stakeholders of the test. Big companies, organizations, universities, and millions of students are involved in the testing process. There is a huge responsibility on the College Board and ETS, and perhaps it would not be wrong to say that the future is at stake here. Cosmetic changes to the SAT can no longer be welcomed. If the score gap persists in the newest revision of the test, then the developers need to give serious consideration to making substantial changes.
An alternative that could be considered is the complete elimination of the SAT. In one study, Geiser and Studley (2001) showed that the SAT Subject Tests (SAT II) were a superior predictor of college success. They also showed that the SAT II was much less affected by socioeconomic status and slightly better than the SAT at reducing racial/ethnic bias. But even the SAT II requires much work to eliminate racial bias. The study also showed that high school grades combined with SAT Subject Tests were the best predictor of college success in most ethnic groups. Based on this research, it is reasonable to conclude that working on the SAT Subject Tests to eliminate racial bias would be an even better option, since too many controversies already surround the SAT. It is also essential to recognize the importance of high school grades in predicting college success. If subjectivity is reduced at the initial stages, a better combination of predictors can emerge. High school assessment reports can be made more objective and uniform nationwide. Such a reduction in subjectivity would strengthen colleges’ and universities’ trust in high school grades.
Whatever direction future research takes from among the many possibilities, it should aim to eliminate or at least reduce racial bias in testing. The responsibility falls on all scholars, researchers, and organizations to make the right decision and not delay any further.
The complete picture that I arrived at was that many factors come into play in producing score gaps between ethnic groups. Socioeconomic status and stereotype threat are some of the factors affecting applicants’ scores. But racial bias also continues to exist in the test, and denial is not the way to deal with this grave issue. There are plenty of approaches that can be taken to remove this bias. Those I prefer include rebranding the standardized tests, unifying and maintaining objectivity in high school assessments, and using the SAT Subject Tests. It is also important to research whether the racial bias in the SAT can be eliminated. Improvement is possible; postponing it any further would only bring the ethical and moral codes of society under scrutiny.
This paper is an output of my M.A. Psychology course “Testing Theory”, and I would like to acknowledge my course supervisor, Dr. Anila Amber Malik.
I have no competing interests.
http://diversity.umich.edu/admissions/legal/gratz/gratsumj.html
Camara, W. J., & Echternacht, G. (2000). The SAT I and High School Grades: Utility in Predicting Success in College (Research Notes RN-10). The College Board, Office of Research and Development. /researchnote-2000-10-sat-high-school-grades-predicting-success.pdf
Chooi, W.-T., Long, H. E., & Thompson, L. A. (2014). The Sternberg Triarchic Abilities Test (Level-H) Is a Measure of g. Journal of Intelligence, 2, 56-67.
College Board (2010). College Board Response to Harvard Educational Review Article by Santelices and Wilson. misc2010-1-response-harvard-ed-review-santelices-wilson.pdf
College Board (2015). Annual Results Reveal Largest and Most Diverse Group of Students Take PSAT/NMSQT®, SAT®, and AP®; Need to Improve Readiness Remains.