Immunocto: a massive immune cell database auto-generated for histopathology

Bibliographic details
Title: Immunocto: a massive immune cell database auto-generated for histopathology
Authors: Simard, Mikaël; Shen, Zhuoyan; Hawkins, Maria A.; Collins-Fekete, Charles-Antoine
Publication year: 2024
Collection: Computer Science; Quantitative Biology
Subject terms: Quantitative Biology - Quantitative Methods, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Image and Video Processing
Description: With the advent of novel cancer treatment options such as immunotherapy, studying the tumour immune micro-environment is crucial to inform prognosis and understand response to therapeutic agents. A key approach to characterising the tumour immune micro-environment may be through combining (1) digitised microscopic high-resolution optical images of hematoxylin and eosin (H&E) stained tissue sections obtained in routine histopathology examinations with (2) automated immune cell detection and classification methods. However, current individual immune cell classification models for digital pathology perform relatively poorly. This is mainly due to the limited size of currently available datasets of individual immune cells, a consequence of the time-consuming and difficult problem of manually annotating immune cells on digitised H&E whole slide images. In that context, we introduce Immunocto, a massive, automatically generated database of 6,848,454 human cells, including 2,282,818 immune cells distributed across 4 subtypes: CD4$^+$ T cell lymphocytes, CD8$^+$ T cell lymphocytes, B cell lymphocytes, and macrophages. For each cell, we provide a 64$\times$64 pixel H&E image at $40\times$ magnification, along with a binary mask of the nucleus and a label. To create Immunocto, we combined open-source models and data to automatically generate the majority of contours and labels. The cells are obtained from a matched H&E and immunofluorescence colorectal dataset from the Orion platform, while contours are obtained using the Segment Anything Model. A classifier trained on H&E images from Immunocto produces an average F1 score of 0.74 to differentiate the 4 immune cell subtypes and other cells. Immunocto can be downloaded at: https://zenodo.org/uploads/11073373.
Document type: Working Paper
Access URL: http://arxiv.org/abs/2406.02618
Accession number: edsarx.2406.02618
Database: arXiv
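
Note on the metric: the description reports an average F1 score over five classes (the 4 immune cell subtypes plus other cells) but does not say how the average is taken. The sketch below assumes an unweighted (macro) average of per-class F1 scores, which may not match the authors' evaluation. The class label strings and the toy predictions are hypothetical and are not taken from Immunocto.

```python
# Hypothetical label strings for the five classes named in the description.
CLASSES = ["CD4", "CD8", "B", "macrophage", "other"]


def per_class_f1(y_true, y_pred, cls):
    """F1 for one class, treating it as positive and all others as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention: F1 is 0.0 when the class never appears in truth or prediction.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    return sum(per_class_f1(y_true, y_pred, c) for c in classes) / len(classes)


# Toy data for illustration only.
y_true = ["CD4", "CD4", "CD8", "B", "macrophage", "other", "other", "CD8"]
y_pred = ["CD4", "CD8", "CD8", "B", "macrophage", "other", "CD4", "CD8"]

score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

A macro average weights each class equally, so performance on a rare subtype counts as much as performance on a common one. A support-weighted average would give a different number on imbalanced data.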