Expected number of recovered singlets
First, we assume that 50% of loaded cells are captured into droplets. The number of cells in a given GEM follows a Poisson distribution with mean λ, the average number of captured cells per GEM. However, we care about cells, not empty droplets. We therefore ignore all empty droplets and use the zero-truncated Poisson distribution to calculate the expected recovery of singlets.
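The calculation described above can be sketched in a few lines of Python. This is a minimal sketch, not the original analysis code: the function names are illustrative, `lam` stands for the Poisson mean (captured cells per GEM), and `n_gems` (the number of GEMs generated per lane) is a hypothetical parameter that the text does not specify.

```python
import math


def singlet_fraction_of_gems(lam: float) -> float:
    """P(k = 1 | k >= 1) for a Poisson(lam) cell count per GEM.

    This is the zero-truncated Poisson probability of exactly one cell,
    i.e. the share of non-empty GEMs that are singlets.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # -expm1(-lam) == 1 - exp(-lam), computed accurately for small lam
    return lam * math.exp(-lam) / (-math.expm1(-lam))


def expected_singlets(loaded_cells: float, n_gems: float,
                      capture_rate: float = 0.5) -> float:
    """Expected number of singlet GEMs in one lane.

    capture_rate = 0.5 follows the 50% capture assumption in the text;
    n_gems is an assumed input, not a value taken from the text.
    """
    lam = capture_rate * loaded_cells / n_gems
    # n_gems * P(k = 1) under an (untruncated) Poisson(lam)
    return n_gems * lam * math.exp(-lam)


if __name__ == "__main__":
    # Example with an assumed 100,000 GEMs per lane
    print(singlet_fraction_of_gems(0.05))
    print(expected_singlets(10_000, n_gems=100_000))
```

Truncating at zero changes only the denominator (non-empty GEMs rather than all GEMs), so the absolute number of singlets is the same either way; only the fractions differ.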
Note that as the proportion of multiplet GEMs increases, the proportion of cells in multiplet droplets increases more quickly. This is because each multiplet contains two or more cells, so the share of cells must be weighted by the number of cells per droplet (that is, normalized by the expected value of the distribution) rather than read directly from the droplet probabilities.
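To make the difference between the two proportions concrete, the sketch below compares them for a Poisson mean `lam` (captured cells per GEM). The function names are illustrative. Under the zero-truncated Poisson, the share of non-empty GEMs that are multiplets is 1 - lam / (exp(lam) - 1), while the share of cells sitting in multiplets is 1 - exp(-lam), because singlet cells make up P(k = 1) / E[k] = exp(-lam) of all captured cells.

```python
import math


def multiplet_gem_fraction(lam: float) -> float:
    """Share of non-empty GEMs that contain two or more cells."""
    # P(k >= 2 | k >= 1) = 1 - lam / (exp(lam) - 1)
    return 1.0 - lam / math.expm1(lam)


def multiplet_cell_fraction(lam: float) -> float:
    """Share of captured cells that sit in a multiplet GEM."""
    # 1 - P(k = 1) / E[k] = 1 - exp(-lam)
    return -math.expm1(-lam)


if __name__ == "__main__":
    print(f"{'lam':>6} {'GEMs':>8} {'cells':>8}")
    for lam in (0.01, 0.04, 0.1, 0.5, 1.0):
        print(f"{lam:>6.2f} "
              f"{multiplet_gem_fraction(lam):>8.4f} "
              f"{multiplet_cell_fraction(lam):>8.4f}")
```

For small `lam` the GEM-level fraction is roughly lam / 2 while the cell-level fraction is roughly lam, so the cell-level proportion is about twice as large at low loading.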
Without hashing, we do not have a foolproof way of detecting multiplets, so we simply cap the acceptable multiplet rate at around 2%, which typically corresponds to 10,000 loaded cells. With hashing, we can detect most multiplets and can therefore load many more cells while maintaining this constraint.
Costs
With the ability to detect most multiplets, we can load more cells into a single lane. However, as we load more cells, the proportion of discarded reads grows in the same way as the proportion of cells in multiplets shown in the first figure. Because the sequencing cost per read is constant, there is a point at which sequencing cost dominates and loading more cells yields no further savings.
We assume that a lane of Chromium costs $1,500.00 and that sequencing costs $3.00 per million reads.
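The trade-off can be illustrated with a rough cost-per-singlet model. This is a sketch under stated assumptions, not the original analysis: the lane and sequencing prices come from the text, but `n_gems` (GEMs per lane) and `reads_per_cell` (sequencing depth) are hypothetical values chosen only for illustration, every captured cell is assumed to receive the same number of reads, and the extra reads spent on hashing tags are ignored.

```python
import math

LANE_COST = 1500.00          # dollars per Chromium lane (from the text)
COST_PER_READ = 3.00 / 1e6   # dollars per read, i.e. $3.00 per million reads


def cost_per_singlet(loaded_cells: float,
                     n_gems: float = 100_000,         # assumed, not from the text
                     capture_rate: float = 0.5,       # 50% capture, from the text
                     reads_per_cell: float = 20_000,  # assumed depth
                     ) -> float:
    """Dollars spent per recovered singlet cell in one lane."""
    if loaded_cells <= 0:
        raise ValueError("loaded_cells must be positive")
    lam = capture_rate * loaded_cells / n_gems   # mean cells per GEM
    captured = n_gems * lam                      # all captured cells are sequenced
    singlets = captured * math.exp(-lam)         # cells that end up in singlet GEMs
    total = LANE_COST + COST_PER_READ * reads_per_cell * captured
    return total / singlets


if __name__ == "__main__":
    for loaded in (10_000, 20_000, 40_000, 80_000, 160_000, 400_000):
        print(f"{loaded:>8,d} loaded cells: "
              f"${cost_per_singlet(loaded):.3f} per singlet")
```

In this model the lane cost per singlet falls as more cells are loaded, while the sequencing cost per singlet grows as exp(lam) because reads spent on multiplets are discarded, so the total cost per singlet reaches a minimum and then rises.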
At higher cell loads, we must turn to scifi-RNA-Seq, which avoids discarding reads by hashing the actual polyA+ RNA molecules instead of introducing separate hashing reads.