Jun 10, 2024

Historical Organization and Proposal of Dataset Terminology in Deep Learning for Medicine

We examined the historical evolution of dataset terminology and the causes of the confusion around it, focusing on how "validation" is interpreted in medicine and in deep learning, and proposed a convention that both fields can share.

Paper

Data set terminology of deep learning in medicine: a historical review and recommendation
Japanese Journal of Radiology
https://doi.org/10.1007/s11604-024-01608-1

Author's Comments

This work grew out of the 2023 annual meeting of the Japanese Society of Medical Radiology, where we received considerable feedback on how differently deep learning dataset terminology, particularly the word "validation," is used in medicine and in engineering. Attendees repeatedly asked why the terms and their meanings are so inconsistent, and officials of the society strongly encouraged us to publish a paper, which led to this article. Moving between the medical and AI technology communities, we have ourselves repeatedly seen people use the same words and still misunderstand each other. In response to these voices from the field, we organized the topic from its history and from case examples, and the result was published as an Invited Review in the June 2024 issue of the Japanese Journal of Radiology.

Paper Overview

This paper organizes and contrasts the dataset handling practices developed in engineering, where deep learning originated, with the concept of validation traditionally emphasized in medicine. In particular, "validation" in medicine often refers to the final stage of confirming accuracy, whereas in deep learning it frequently denotes an intermediate stage used for tuning model parameters. Because misunderstandings arising from this difference could affect how research is reported and how clinical applications are evaluated, we organized the terminology across both disciplines on the basis of its historical background.

Paper Details

We first reviewed the historical meaning of "validation" in the medical literature and explored the background that led medicine to emphasize "verification" as the final confirmation of diagnostic accuracy. Deep learning, in contrast, established a three-part division into "training," "validation," and "test" early on, and we explained that validation there is not a final evaluation but an intermediate step that helps prevent the model from overfitting.

We also introduced the distinction between internal and external data for the test sets used in final evaluation, described the significance of external datasets that are separated temporally or geographically, and indicated which division methods are appropriate.

In conclusion, we proposed that medicine also standardize on the three divisions of "training," "validation (or tuning)," and "test," and that papers define them clearly, as an important measure for connecting research in the two fields smoothly. We expect that organizing the terms in this way will reduce unintended misunderstandings between medical practitioners and AI researchers and will further improve the reproducibility of results and the generalizability of models. Terminology standardization will become increasingly important as deep learning continues to spread in the medical field, and we hope this paper will serve as a foundation for it.
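The division described above can be illustrated with a short Python sketch. It is not taken from the paper: the function name `split_records`, the record keys `site` and `exam_date`, the default fractions, and the simplification that each record is an independent case are all assumptions made for this example. Records from a different site or from a later period are set aside as an external test set, and the remaining internal data are divided into training, validation (tuning), and internal test sets.

```python
import random
from datetime import date


def split_records(records, external_site, cutoff,
                  val_frac=0.15, test_frac=0.15, seed=0):
    """Divide records into training / validation (tuning) / test sets.

    Assumption for this sketch: each record is a dict with the
    hypothetical keys 'site' and 'exam_date', and every record is an
    independent case.
    """
    # External test set: data separated geographically (another site)
    # or temporally (acquired on or after the cutoff date).
    external, internal = [], []
    for record in records:
        is_external = (record["site"] == external_site
                       or record["exam_date"] >= cutoff)
        (external if is_external else internal).append(record)

    # Shuffle a copy of the internal data with a fixed seed so the
    # division is reproducible.
    shuffled = internal[:]
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)

    return {
        # Used to fit the model.
        "training": shuffled[n_test + n_val:],
        # Used for tuning during development, not for final evaluation.
        "validation": shuffled[n_test:n_test + n_val],
        # Held out for final evaluation on data from the same source.
        "internal_test": shuffled[:n_test],
        # Held out for final evaluation on a different site or period.
        "external_test": external,
    }


# Synthetic example data (made up for illustration only).
example = [
    {"site": "B" if i % 4 == 0 else "A",
     "exam_date": date(2020 + i % 4, 1, 1)}
    for i in range(100)
]
splits = split_records(example, external_site="B", cutoff=date(2023, 1, 1))
for name, subset in splits.items():
    print(name, len(subset))
```

Naming the sets explicitly in the returned dictionary mirrors the recommendation to define each division clearly. In a real study the internal data would normally be divided per patient rather than per record, which this sketch does not do.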