What does an “audit” for bias in automated hiring tech really mean?

In conversations with automated hiring vendors, we learned that processes for bias audits vary widely.


April 1, 2022


In our previous story on bias in automated machine-learning hiring platforms, we discussed why employers may want vendors to procure audits of AI-powered hiring tools to ensure they aren’t resulting in unintended bias.

But this seemingly simple solution is deceptively complex: To date, there is no accepted industry standard for what constitutes an AI audit.

Pymetrics, HireVue, and Humanly, three AI-powered HR tech vendors that specialize in streamlining aspects of recruiting, each conduct audits in markedly different ways. Pymetrics contracted a team of computer scientists to test its engineering processes. HireVue hired consultancies to assess bias and validity, and Humanly regularly works with an external vendor to vet its chatbot’s lexicon for potentially problematic language.

All three consider their methods an AI audit.

“There isn’t a vocabulary, there isn’t even a nomenclature that’s common among all this stuff,” Kevin Parker, HireVue’s former CEO, who is now an advisor to the company, told HR Brew.

An audit is an audit is an…audit?

Christo Wilson, an associate professor of computer science at Northeastern University, was hired by pymetrics to audit the platform for bias in 2020. Wilson and his team spent three months analyzing the machine-learning input datasets, verifying that decisions passed the four-fifths rule (a rule of thumb established by the EEOC for organizations to evaluate employment processes for potential discrimination), assessing pymetrics’s quality-assurance processes, and putting the AI through its paces in a series of test attacks.
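For readers unfamiliar with the four-fifths rule, the arithmetic can be sketched in a few lines of Python. This is a generic illustration of the rule of thumb, not the method Wilson’s team used; the function names, group labels, and applicant counts are all hypothetical. The idea is to compute each group’s selection rate, divide it by the highest group’s rate, and flag any group whose ratio falls below 0.8.

```python
from typing import Dict, Tuple


def selection_rates(outcomes: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Map each group to its selection rate: selected / applicants."""
    rates = {}
    for group, (selected, applicants) in outcomes.items():
        if applicants <= 0:
            raise ValueError(f"group {group!r} has no applicants")
        rates[group] = selected / applicants
    return rates


def four_fifths_check(
    outcomes: Dict[str, Tuple[int, int]], threshold: float = 0.8
) -> Dict[str, Tuple[float, bool]]:
    """Return (impact ratio, passes) for each group.

    The impact ratio is the group's selection rate divided by the
    highest selection rate among all groups.
    """
    rates = selection_rates(outcomes)
    top_rate = max(rates.values())
    if top_rate == 0:
        raise ValueError("no group has any selected applicants")
    return {
        group: (rate / top_rate, rate / top_rate >= threshold)
        for group, rate in rates.items()
    }


# Hypothetical counts: (selected, applicants) per group.
example = {"group_a": (48, 100), "group_b": (30, 100)}
result = four_fifths_check(example)

for group, (ratio, passes) in result.items():
    status = "passes" if passes else "falls below four-fifths"
    print(f"{group}: impact ratio {ratio:.3f} ({status})")
```

In this made-up example, group_b is selected at 30% against group_a’s 48%, an impact ratio of 0.625, so it would be flagged for further review. A ratio below 0.8 is a screening signal under the rule of thumb, not proof of discrimination.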

Wilson told HR Brew that “overall, they passed the audit” (interested parties can read the peer-reviewed results on the pymetrics website), but emphasized that his audit only tested whether the technology produced unbiased choices; he didn’t test the validity of the platform (whether it selected people who would be any good at the job).

“I don’t feel entirely qualified to do that work,” Wilson explained. “Their product has this gamified assessment thing…[that is] drawn from the cognitive science literature. I’m not a cognitive scientist.”

Frida Polli, CEO of pymetrics, told HR Brew that a validation study is currently under review for publication.

HireVue’s audit was, according to Lindsey Zuloaga, HireVue’s chief data scientist, “very different” from pymetrics’s.

HireVue made headlines in 2019 when AI researchers dubbed its methods of evaluating candidates using facial recognition AI “digital snake oil.” Though the company said it discontinued facial analysis in new assessments in 2020, skepticism around the tech stuck.

Parker said HireVue pursued independent evaluations of the bias and validity of the platform in part to build trust.

“People have concerns around what we're doing. We want to ease their concerns. We want to be very transparent and clear and communicate how we're doing things,” Parker said.

HireVue hired O’Neil Risk Consulting and Algorithmic Auditing (ORCAA) to perform an algorithmic audit (assessing bias), and Landers Workforce Science LLC to perform an industrial organizational psychology audit (assessing validity). Unlike Wilson’s team, which audited pymetrics’s code, ORCAA and Landers performed their audits by reviewing documents, using the game-based platform as though they were candidates, and interviewing stakeholders.

“The approach [ORCAA] takes is pretty holistic. They interview many stakeholders that interact with the system. And then basically make a list of any possible concerns or problems that could arise, and we address all those individually,” Zuloaga explained.

Overall, ORCAA said HireVue passed its audit, though the ORCAA audit, and critics, specifically note that it was “not a comprehensive audit of HireVue’s use of algorithms.” At the time of ORCAA’s audit, HireVue was still testing “the relationship between accents and competency scores assessed by video at the time of the report.” As with pymetrics, the full reports are available for download on the company’s website.

“HireVue did not limit the information ORCAA had access to in any way,” said HireVue Chief Data Scientist Lindsey Zuloaga in an email. “The ORCAA team asked for multiple, specific analyses of the data and models. Our science team provided everything that was asked for and went into extensive detail about our algorithms and the function of our optimization and bias mitigation.”

Prem Kumar, Humanly’s CEO, told HR Brew that his company performs internal and third-party audits of its chatbot recruiting tech. Kumar said that before new words are approved, Humanly will run the language by TalVista, a third-party vendor, to check for “ageist or gender-biased” language.

What’s in a word?

Wilson said one of the reasons audits vary so wildly from company to company is that there isn’t “a sustained ecosystem” to support and direct AI audits. This leads to questions about terminology, scope, and what results to share.

“[At] one of the AI conferences… someone did a session on the 26 different definitions of bias in machine learning, and all sorts of different ways of defining the concept of measuring it, and then all different ways of trying to mitigate it…none of that is easy stuff,” MacCarthy recalled.

Ryan Carrier, executive director of ForHumanity, which launched an independent AI audit certification course, said that this lack of standards means that the reports from companies like pymetrics and HireVue aren’t audits, they are “a consulting gig.” In Carrier’s opinion, until AI audits look more like the audits in other industries—which have independent standards, certified practitioners, and “no feedback loop” back to the company—the words “AI audits” are meaningless.

There are some efforts underway to create such standards. In addition to ForHumanity, nonprofit organization The Data & Trust Alliance brought more than 200 experts together to create an “Algorithmic Bias Safeguards for Workforce” document that includes a scorecard to “grade and compare vendors and document issues” in AI hiring tech. Last month, the National Institute of Standards and Technology published a document called “Towards a Standard for Identifying and Managing Bias in Artificial Intelligence” as “part of its effort to establish a voluntary framework for developing trustworthy and responsible artificial intelligence,” according to JD Supra.

Time will tell.

MacCarthy told HR Brew that goals for long-term audit perfection shouldn’t get in the way of good short-term intentions. Vendors that want to jump on the auditing train have, in his opinion, a great place to start: Measuring tech against the four-fifths rule.

“Give us a decade or two, and then we'll be able to give you a better answer to what a good audit is. I think it's a mistake to wait that long [to start] when you've got something that's good enough for right now and something that could be put in place pretty easily and pretty quickly,” MacCarthy said. “Again, what I'm talking about is the standard disparate-impact assessment, we’re using the 80% rule of thumb."—SV

Do you work in HR or have information about your HR department we should know? Email [email protected] or DM @SusannaVogel1 on Twitter. For completely confidential conversations, ask Susanna for her number on Signal.
