We’ll be doing a lot of multiplexed amplicon-based Illumina sequencing, which means we’ll eventually have a lot of different indices (some people call these barcodes) used to multiplex the samples. I’m making all of the indices 10 nt long, so in theory there are 4^10, or 1,048,576, unique nucleotide combinations available at that length. I don’t intend to have anywhere close to a million different primers, so I think we’re pretty safe.
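For reference, here is the arithmetic behind the size of the index space, plus a helper that draws one index uniformly at random from it. This is a minimal sketch. The names `INDEX_LENGTH`, `ALPHABET`, and `random_index` are illustrative only and are not part of any existing lab pipeline.

```python
import random

INDEX_LENGTH = 10   # all of our indices are 10 nt
ALPHABET = "ACGT"   # four possible bases at each position

# Each of the 10 positions independently takes one of 4 bases: 4^10 combinations.
n_possible = len(ALPHABET) ** INDEX_LENGTH
print(n_possible)  # 1048576


def random_index(length: int = INDEX_LENGTH) -> str:
    """Draw one index uniformly at random from the 4^length sequence space (hypothetical helper)."""
    return "".join(random.choice(ALPHABET) for _ in range(length))
```

Note that the size of the space only protects against exact duplicates. Near-duplicates are a separate concern, and that is what the distance check is for.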
That said, I’d like to make sure our indices are far enough apart from each other that sequencing errors in a read don’t turn one index into another. Anh has come up with a way to make sure our randomly generated indices don’t overlap with previous ones, but it’s still useful for me to keep track and make sure things are running smoothly. So I generated a pairwise percent-identity matrix comparing all of the indices we have in the lab right now.
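A pairwise identity matrix like the one described above can be computed in a few lines. The sketch below makes two assumptions. First, identity is ungapped and position by position between equal-length indices, with no alignment. Second, new indices are generated by a simple rejection sampler: it draws random candidates and keeps the first one that is no more than `max_identity` percent identical to every existing index. The sampler is a hypothetical stand-in for illustration and is not Anh’s method, whose details aren’t described here. `percent_identity`, `identity_matrix`, and `new_index` are illustrative names.

```python
import random
from typing import List

ALPHABET = "ACGT"


def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-by-position identity between two equal-length indices (assumption)."""
    if len(a) != len(b):
        raise ValueError("indices must be the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def identity_matrix(indices: List[str]) -> List[List[float]]:
    """All-vs-all percent identity; the diagonal (self-comparison) is always 100."""
    return [[percent_identity(a, b) for b in indices] for a in indices]


def new_index(existing: List[str], max_identity: float = 70.0,
              length: int = 10, max_tries: int = 100_000) -> str:
    """Hypothetical rejection sampler: keep drawing random indices until one is
    at most max_identity percent identical to every existing index."""
    for _ in range(max_tries):
        candidate = "".join(random.choice(ALPHABET) for _ in range(length))
        if all(percent_identity(candidate, e) <= max_identity for e in existing):
            return candidate
    raise RuntimeError("no acceptable index found; relax max_identity")


# Usage with made-up example indices (not real lab sequences):
lab_indices = ["ACGTACGTAC", "ACGTACGTTT", "TTTTGGGGCC"]
matrix = identity_matrix(lab_indices)
print(matrix[0][1])  # 80.0 (the first 8 of 10 positions match)
```

Rejection sampling is the simplest approach that works while the set is small relative to the 4^10 space. As the set grows, each new draw is rejected more often.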
The diagonal, where each index is compared with itself, is a perfect (100%) match and serves as a good positive control for what close matches look like. By eye, the closest match between any two distinct indices is about 70% identity, i.e. 7 of 10 bases shared, which I can live with.
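One way to see why 70% identity is tolerable is to convert it to a Hamming distance. On a 10-nt index, 70% identity means 3 mismatched positions. By the standard result for codes, a set with minimum pairwise distance d can detect up to d − 1 errors and correct up to ⌊(d − 1)/2⌋. So d = 3 detects 2 substitutions and corrects 1. This sketch assumes substitution-only errors and equal-length, ungapped indices; insertions or deletions in the index read would break the Hamming reasoning. The function names are illustrative.

```python
from itertools import combinations
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length indices."""
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indices: List[str]) -> int:
    """Smallest Hamming distance between any two distinct indices
    (the off-diagonal minimum of the matrix; needs at least 2 indices)."""
    return min(hamming(a, b) for a, b in combinations(indices, 2))


def error_tolerance(d: int) -> Tuple[int, int]:
    """(substitutions detectable, substitutions correctable) for minimum distance d."""
    return d - 1, (d - 1) // 2


# 70% identity on a 10-nt index = 7 matches = Hamming distance 3.
d = round(10 * (1 - 0.70))
print(d, error_tolerance(d))  # 3 (2, 1)
```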