As application numbers surge at elite schools, admissions offices face the daunting task of reading through countless personal statements and essays. AI offers a solution: automating essay evaluation. This month, Jed Applerouth, PhD, takes a deep dive into the revolution in AI-led essay reading that’s already underway.
As applications to the most selective schools in the country continue to climb to record levels, the reading burden on admissions offices increases as well. One of the most time-intensive aspects of reviewing an application is reading the personal statements and supplemental essays. As AI begins to reshape work processes in so many domains, it’s only a matter of time until college admissions offices across the country begin to leverage artificial intelligence to expedite the review process and simplify that very task: reading student-generated admissions essays.
AI systems can relatively easily evaluate the writing level of an essay: the sophistication of its vocabulary, its grammatical correctness, the length of its sentences and the variety of its sentence structures. Anyone who has used ChatGPT, Claude or another Large Language Model knows that one of the main use cases of these tools is producing brief summaries of the content and tone of longer passages. These tools can almost instantly review an essay for content and writing level, which can obviously expedite an application review. Additionally, they can be trained to look for evidence of traits or qualities that admissions offices value, such as community involvement, leadership, determination or “grit.” To see this for yourself, run any of these sample essays through GPT-4, Claude or Gemini and ask the model to “evaluate this essay for tone, content and the personal qualities of the author.” The output is impressive.
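For readers curious what that looks like when automated at scale rather than pasted into a chat window, here is a minimal sketch of the same request made through OpenAI’s Python SDK. The model name, file name and prompt wording are illustrative assumptions, not a recommendation of any particular vendor or workflow.

```python
# Sketch: ask an LLM to evaluate a single essay, as described above.
# Assumes the OPENAI_API_KEY environment variable is set and that the
# essay lives in a local text file (hypothetical file name).
from openai import OpenAI

client = OpenAI()

essay_text = open("sample_essay.txt", encoding="utf-8").read()

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {"role": "system", "content": "You are an experienced admissions reader."},
        {"role": "user", "content": (
            "Evaluate this essay for tone, content and the personal "
            "qualities of the author.\n\n" + essay_text
        )},
    ],
)

print(response.choices[0].message.content)
```

Looping the same call over a folder of essays is all it would take to turn this into a batch pre-read of an entire applicant pool.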
Slate, a platform used by hundreds of admissions offices across the country, has promised to deliver AI essay evaluation through its Pre-Reader: “the Slate Pre-Reader will summarize what a reviewer needs to know about a letter of recommendation, college essay, etc.” This function is still in development, but once rolled out, it’s poised to have a very meaningful impact on the admissions world.
A research group based at the University of Pennsylvania and the University of Colorado Boulder developed a tool to analyze and evaluate student admissions essays across seven variables, including teamwork, prosocial purpose, intrinsic motivation and leadership. Rather than building a model from scratch, the researchers modified RoBERTa, a language model released by Facebook. Admissions officers scored 3,000 student-submitted essays on these variables, and that data was used to train the model. Once trained and fine-tuned on the human ratings, the model was applied to 300,000 previously submitted essays (from the 2008-2009 admissions year) and scored them in close agreement with the human evaluators.
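To make the approach concrete, here is a rough sketch of what fine-tuning RoBERTa on human-scored essays could look like using the Hugging Face transformers library. This is not the research team’s actual pipeline: the file name, the three unnamed trait labels and all of the training settings are placeholder assumptions.

```python
# Sketch: fine-tune RoBERTa to predict admissions-officer ratings on seven
# traits, treated as a multi-output regression problem.
import torch
import pandas as pd
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Four traits are named in the study; the last three names are placeholders.
TRAITS = ["teamwork", "prosocial_purpose", "intrinsic_motivation",
          "leadership", "trait_5", "trait_6", "trait_7"]

df = pd.read_csv("scored_essays.csv")   # 3,000 human-scored essays (hypothetical file)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(TRAITS), problem_type="regression")

def collate(rows):
    """Tokenize a batch of essays and attach the seven human scores as targets."""
    enc = tokenizer([r["essay"] for r in rows], truncation=True,
                    max_length=512, padding=True, return_tensors="pt")
    enc["labels"] = torch.tensor([[r[t] for t in TRAITS] for r in rows],
                                 dtype=torch.float)
    return enc

loader = DataLoader(df.to_dict("records"), batch_size=16,
                    shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        loss = model(**batch).loss   # MSE between predicted and human scores
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Once fine-tuned, the same model can score new, unlabeled essays in bulk.
```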
By analyzing the model’s essay scores alongside these applicants’ subsequent academic outcomes, the researchers found that the AI scoring yielded meaningful insights. Students whose essays scored positively for leadership, for example, were more likely to graduate from college within six years than those whose essays did not, even after controlling for differences in test scores, demographics and other factors. This research offers evidence that AI systems can effectively evaluate student essays for traits that colleges value.
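The kind of analysis behind that finding can be illustrated with a simple logistic regression: predict six-year graduation from the essay-derived leadership score while holding other applicant characteristics constant. The data file and column names below are hypothetical, and the study’s actual statistical model may differ.

```python
# Sketch: does the AI-scored leadership trait predict graduation after
# controlling for test scores and demographics?
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("applicants.csv")   # one row per applicant (hypothetical file)

# graduated_6yr is coded 0/1; C(...) marks categorical covariates.
model = smf.logit(
    "graduated_6yr ~ leadership_score + test_score"
    " + C(gender) + C(race_ethnicity) + family_income",
    data=df,
).fit()

print(model.summary())   # a positive leadership_score coefficient would mirror the finding
```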
One of the biggest costs to test-makers is scoring student essays. Sixteen years ago, the GMAT shifted from having multiple human readers grade the Analytical Writing Assessment to having a single human reader and a machine grade it. GMAC, the nonprofit that administers the GMAT, partnered with Vantage Learning, using its IntelliMetric essay-grading platform. ACT, Inc. provides the human scoring, and IntelliMetric provides the automated essay scoring, evaluating “more than 50 structural and linguistic features, including organization of ideas, syntactic variety, and topical analysis.” In the event that the human and machine scores are more than a point apart (on the 6-point scale), another human reader is brought in to resolve the discrepancy. This happens rarely: inter-rater reliability is high, with discrepancies beyond one point occurring in under 5% of cases. In other words, machines have been evaluating student writing in the context of higher-education admissions, and scoring it much as humans do, for more than a decade.
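The adjudication logic described above is simple enough to express in a few lines. The sketch below is illustrative only: the function name and the exact rule for combining scores are assumptions for the example, not GMAC’s published procedure.

```python
# Sketch of a hybrid human-plus-machine scoring rule on a 6-point scale.
from typing import Optional

def final_awa_score(human: float, machine: float,
                    second_human: Optional[float] = None) -> float:
    """Combine one human and one machine score; escalate large disagreements."""
    if abs(human - machine) <= 1:
        return (human + machine) / 2          # scores agree: average them
    if second_human is None:
        raise ValueError("Scores differ by more than 1 point; "
                         "a second human read is required.")
    return (human + second_human) / 2         # hypothetical resolution rule

print(final_awa_score(5.0, 4.5))        # close agreement -> 4.75
print(final_awa_score(5.0, 3.0, 4.5))   # discrepancy resolved by a second reader -> 4.75
```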
Most of the human readers who have historically scored the state-mandated essays on the State of Texas Assessments of Academic Readiness (STAAR) exams for students in grades 3-8 are being replaced by an “automated scoring engine.” Approximately four thousand human graders will be replaced by AI, with annual savings estimated at $15-20 million. Vice reported in 2019 that 21 states were already using AI essay-scoring engines in some capacity. That was well before the 2023 revolution in Large Language Models, spearheaded by OpenAI’s GPT-4, Anthropic’s Claude, and Google’s Gemini. The technology has advanced so far, so quickly, that today’s models are vastly superior to the scoring engines that came before.
As applications have skyrocketed at the most selective institutions, the reading burden on each admissions office has grown in lockstep. Admissions directors do not want to overburden or burn out their staff, and AI will provide a release valve to reduce the pressure and the workload. Rick Clark of Georgia Tech has seen the burden the application explosion has placed on his team. He offered: “If you can have an AI model run through and then a human just sort of spot checks it, and it can go ahead and make those decisions, that’s just going to let your team focus on what’s viable or important.” Cutting-edge admissions offices will be the early adopters within the next several years, but it’s only a matter of time until AI becomes broadly embedded in the admissions process. Reading and evaluating essays may be a natural first step toward a more automated and efficient review process.