How Artificial Intelligence Is Being Misused To Harm Students
I hate to be the bearer of bad news, but today’s AI is not as advanced as robo-grader vendors want you to think.
If someone you love depends on the SAT or GRE, this should trouble you. A recent NPR article discussed the use of computers to robo-grade essays in exams such as the SAT and GRE.
AI is still missing the I
Developers interviewed in the article claim that “computers are already doing jobs as complicated and as fraught as driving cars, detecting cancer, and carrying on conversations, they can certainly handle grading students’ essays.” Um, okay.
- The most advanced self-driving car systems fail to generalize beyond simple driving conditions and certainly don’t do well in adverse weather.
- Cancer detection still requires humans in the loop because the best algorithms still have a false positive rate of 30% (i.e., cancer predicted where there is none).
- If by carrying on conversations you mean looping over the same set of nonsensical questions, then… sure.
According to Zack Lipton, an AI researcher and assistant professor at Carnegie Mellon University, “machine learning can find complex patterns, but all it’s doing is discovering associations in the data. For a model to output reasonably correlated predictions, it will use whatever associations it can discover.” Those associations come from the training data, which is annotated by human graders. This process can introduce bias: if, for example, a grader prefers more complex words, the system will learn to assign high scores to essays riddled with these abstruse words.
Below is an adversarial example designed to fool these systems. I generated it using a program developed by Professor Les Perelman at MIT, a longtime robo-grading critic.
“Activist by ruminations has not, and undoubtedly never will be contrived, pusillanimous, and ensconced…”
The full example bloats the essay with unnecessarily complex words, strings together confusing sentences, and adds markers designed to fool the system, such as “in conclusion.” These systems award higher scores to essays dense with complex words, regardless of whether they make sense.
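To make the failure mode concrete, here is a minimal toy sketch (all data, feature choices, and function names are my own illustrative assumptions, not the internals of any real grading product). It fits a one-feature least-squares “grader” on a tiny set of human-scored essays where the annotators happened to reward complex vocabulary, then shows that padding a plain essay with abstruse words inflates its score:

```python
# Toy sketch of a naive robo-grader. Everything here is hypothetical:
# the feature, the training data, and the scores are invented to
# illustrate how a model can latch onto a spurious association.

def complex_word_ratio(essay: str) -> float:
    """Fraction of words with 8+ letters -- a crude proxy for 'abstruse' vocabulary."""
    words = essay.split()
    return sum(len(w.strip(".,;")) >= 8 for w in words) / len(words)

# Hypothetical human-annotated training set: (essay, score out of 6).
# These annotators consistently rewarded longer words.
training = [
    ("The plan was good and the city grew fast.", 2.0),
    ("Municipal infrastructure flourished, demonstrating remarkable foresight.", 5.0),
    ("Dogs run in the park every day.", 1.5),
    ("Contemporary urbanization necessitates comprehensive infrastructural deliberation.", 5.5),
]

# Ordinary least squares for score = a * ratio + b (closed form, one feature).
xs = [complex_word_ratio(e) for e, _ in training]
ys = [s for _, s in training]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

def robo_grade(essay: str) -> float:
    """Predicted score: purely a function of word length, not meaning."""
    return a * complex_word_ratio(essay) + b

# Gaming the grader: pad a plain essay with abstruse filler words.
plain = "The park is nice and people walk there."
padded = plain + " Activist ruminations undoubtedly contrived pusillanimous ensconced considerations."
print(robo_grade(plain))   # low score
print(robo_grade(padded))  # higher score -- the padding alone raised it
```

The model never looks at coherence or argument quality; any feature correlated with the annotators’ habits will do, which is exactly why nonsense stuffed with long words can outscore clear writing.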
But a score alone isn’t meaningful. Lipton points out that “the machine can’t provide guidance to tell you what you could have done differently to make your essay actually better.” That’s because machine learning systems are not sophisticated enough to dream up an improved version of a poorly written sentence.
Grades matter less than teaching students to improve
Assistant Dean Justin Snider, who’s been teaching writing at Columbia University for 10 years, emphasizes that “the main point of assessing a student’s work is not so much to arrive at a final score, but rather to help the student understand what could have been done better.”
As a former SAT prep instructor, Snider highlights the gap between “teaching people about what will get a high score” and what he, “as a writing teacher, would consider good writing.” The ability to manipulate these systems has quickly rendered any signal from standardized essay scores useless to admissions officers. As of July 2018, all eight Ivy League universities no longer require applicants to take — or submit scores from — the essay portion of the SAT or ACT.
Blame the AI hype
With more money being poured into AI, companies and entrepreneurs feel more pressure to conjure up AI solutions that don’t exist. They oversell system capabilities and harm users along the way. Consumers lose trust in AI and fears of an AI apocalypse get overblown.
The truly scary threat is an army of eager, unqualified entrepreneurs naively deploying AI systems without understanding their limitations.
Don’t oversell AI. Protect consumers. Be transparent.