The path toward equal performance in medical machine learning

Institut for Fødevare- og Ressourceøkonomi (IFRO)

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Standard

The path toward equal performance in medical machine learning. / Petersen, Eike; Holm, Sune; Ganz, Melanie; Feragen, Aasa.

In: Patterns, Vol. 4, No. 7, 100790, 2023.

Harvard

Petersen, E, Holm, S, Ganz, M & Feragen, A 2023, 'The path toward equal performance in medical machine learning', Patterns, vol. 4, no. 7, 100790. https://doi.org/10.1016/j.patter.2023.100790

APA

Petersen, E., Holm, S., Ganz, M., & Feragen, A. (2023). The path toward equal performance in medical machine learning. Patterns, 4(7), [100790]. https://doi.org/10.1016/j.patter.2023.100790

Vancouver

Petersen E, Holm S, Ganz M, Feragen A. The path toward equal performance in medical machine learning. Patterns. 2023;4(7):100790. https://doi.org/10.1016/j.patter.2023.100790

Author

Petersen, Eike ; Holm, Sune ; Ganz, Melanie ; Feragen, Aasa. / The path toward equal performance in medical machine learning. In: Patterns. 2023 ; Vol. 4, No. 7.

Bibtex

@article{faa1d7ffe01549649d5480a70a0aabfa,

title = "The path toward equal performance in medical machine learning",

abstract = "To ensure equitable quality of care, differences in machine learning model performance between patient groups must be addressed. Here, we argue that two separate mechanisms can cause performance differences between groups. First, model performance may be worse than theoretically achievable in a given group. This can occur due to a combination of group underrepresentation, modeling choices, and the characteristics of the prediction task at hand. We examine scenarios in which underrepresentation leads to underperformance, scenarios in which it does not, and the differences between them. Second, the optimal achievable performance may also differ between groups due to differences in the intrinsic difficulty of the prediction task. We discuss several possible causes of such differences in task difficulty. In addition, challenges such as label biases and selection biases may confound both learning and performance evaluation. We highlight consequences for the path toward equal performance, and we emphasize that leveling up model performance may require gathering not only more data from underperforming groups but also better data. Throughout, we ground our discussion in real-world medical phenomena and case studies while also referencing relevant statistical theory.",

author = "Eike Petersen and Sune Holm and Melanie Ganz and Aasa Feragen",

year = "2023",

doi = "10.1016/j.patter.2023.100790",

language = "English",

volume = "4",

journal = "Patterns",

issn = "2666-3899",

publisher = "Cell Press",

number = "7",

}

RIS

TY - JOUR

T1 - The path toward equal performance in medical machine learning

AU - Petersen, Eike

AU - Holm, Sune

AU - Ganz, Melanie

AU - Feragen, Aasa

PY - 2023

Y1 - 2023

AB - To ensure equitable quality of care, differences in machine learning model performance between patient groups must be addressed. Here, we argue that two separate mechanisms can cause performance differences between groups. First, model performance may be worse than theoretically achievable in a given group. This can occur due to a combination of group underrepresentation, modeling choices, and the characteristics of the prediction task at hand. We examine scenarios in which underrepresentation leads to underperformance, scenarios in which it does not, and the differences between them. Second, the optimal achievable performance may also differ between groups due to differences in the intrinsic difficulty of the prediction task. We discuss several possible causes of such differences in task difficulty. In addition, challenges such as label biases and selection biases may confound both learning and performance evaluation. We highlight consequences for the path toward equal performance, and we emphasize that leveling up model performance may require gathering not only more data from underperforming groups but also better data. Throughout, we ground our discussion in real-world medical phenomena and case studies while also referencing relevant statistical theory.

U2 - 10.1016/j.patter.2023.100790

DO - 10.1016/j.patter.2023.100790

M3 - Journal article

C2 - 37521051

VL - 4

JO - Patterns

JF - Patterns

SN - 2666-3899

IS - 7

M1 - 100790

ER -

ID: 359977749