Abstract
While prior research has shown that facial images signal personal information, publications in this field tend to assess the predictability of a single variable or a small set of variables at a time. This practice is problematic for two reasons. First, reported prediction quality is hard to compare and generalize across studies because study conditions differ. Second, it invites selection bias: researchers may choose to study variables intuitively expected to be predictable and underreport unpredictable ones (the ‘file drawer’ problem). Policy makers are thus left with an incomplete picture for a risk-benefit analysis of facial analysis technology. To address these limitations, we perform a megastudy: a survey-based study that reports the predictability of numerous personal attributes (349 binary variables) from 2646 distinct facial images of 969 individuals. Using deep learning, we find that 82 of the 349 personal attributes (23%) are predictable better than chance from facial image pixels. Adding facial images substantially improves prediction quality relative to a demographics-only benchmark model. Our unexpected finding that the iPhone-versus-Galaxy preference variable is strongly predictable shows how testing many hypotheses simultaneously can facilitate knowledge discovery. Our proposed L1-regularized image decomposition method, together with other techniques, points to smartphone camera artifacts, BMI, skin properties, and facial hair as the top candidate non-demographic signals in facial images.