Series: Machine Learning Library
Description: (In progress) Introductory concept to multiple instance learning.

What is Multiple Instance Learning?

In traditional supervised learning, there exists one class or label for one instance/object. For example - let’s consider a situation where we have a collection of images taken in nature, with each image showing an animal. In traditional supervised learning, each image would have a label or class (e.g. dog, bear, etc.).

But what if we didn’t have access to the label of each image, and instead we only had access to the region a group of images was taken? This scenario is exactly what multiple instance learning is trying to solve! This might sound like a very specific problem situation - but this problem is actually extremely common in medical imaging!

Multiple instance learning is useful for a variety of reasons:

  1. Inferring characteristics about instances without explicit supervised labels
  2. Predicting characteristics about a group of instances using instance data

Multiple Instance Learning in Medical Imaging

We work with single cell data - features are calculated for each cell visible in an image, such as whole slide images (WSI) or tissue microarrays (TMA). We often don’t have labels for these cells - and obtaining such labels is incredibly difficult. For example, some cell labels of interest might be specific phenotypes, but this can vary from project to project. For example, in one project we might be interested in the identity of the cell (e.g. immune cell, epithelial etc), and in another we might be interested if the cell displays a phenotype which contributes to aggressive tumor growth. As labels can vary from project to project, this makes it extra difficult to obtain labels for each cell.

We do, however, have access to grouped labels - such as treatment outcomes for a patient, diagnosis of a biopsy, etc. With multiple instance learning, we can train classifiers to predict characteristics of a group of cells or of the patient using abundant cell data.