Illumina sequencing of barcoded amplicons is going to be a large factor in the work we’ll be doing. Having moved to a completely new place, I now need to explain to people how it works. To keep this post simpler, I’m skipping all of the landing pad details (hopefully you already understand them). Let’s just start with what was genomically integrated after the recombination reaction. In the case of the PTEN library, it looks like this:
As you can see, the barcode is the blue “NNNNNNNNNNNNNNNNNN” region at the bottom of the above image. We can’t sequence it using primers that directly flank it, since those flanking sequences will also be present in the unexpressed plasmid likely contaminating our genomic DNA. Thus, we first have to create an amplicon that contains our barcode of interest but also spans the recombination junction (“Recomb jxn” above).
For the PTEN library, the barcode is located behind the EGFP-PTEN ORF, so we have to amplify across the whole thing. We did this by using a forward primer located in the landing pad sequence that is present prior to recombination (such as KAM499, found in the Tet-inducible promoter and shown in red as the “Forward Primer”), as well as a reverse primer located behind the barcode associated with the PTEN coding region (KAM501 in this case).
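To make the logic of the junction-spanning amplicon concrete, here is a minimal Python sketch. Every sequence in it is a short made-up placeholder (not the real landing pad, plasmid, KAM499, or KAM501 sequences), and `can_amplify` is a hypothetical helper that only looks for exact primer-site matches, which is a big simplification of real PCR.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def can_amplify(template, fwd_primer, rev_primer):
    """True if the forward primer site sits upstream of the reverse primer site.

    Simplifying assumption: primers only bind exact matches on a linear template.
    """
    fwd_pos = template.find(fwd_primer)
    rev_pos = template.find(revcomp(rev_primer))
    return fwd_pos != -1 and rev_pos != -1 and fwd_pos < rev_pos


# Toy placeholder sequences, invented for illustration only.
PROMOTER = "TTGACAGCTAGCTCAG"    # landing pad promoter: only in the genome
JUNCTION = "GGTACCAAGCTT"        # stand-in for the recombination junction
ORF = "ATGGTGAGCAAGGGCGAG"       # stand-in for the EGFP-PTEN ORF
UP_FLANK = "CCGGAATTCG"          # sequence just before the barcode
BARCODE = "ACGTTGCAACGTTGCAAC"   # 18 nt barcode
DOWN_FLANK = "GCGGCCGCTA"        # sequence just after the barcode

cassette = ORF + UP_FLANK + BARCODE + DOWN_FLANK
plasmid = "AAAAGGGGCCCC" + JUNCTION + cassette   # contaminating plasmid: no promoter
genome = PROMOTER + JUNCTION + cassette          # recombined landing pad

flank_fwd = UP_FLANK              # primer directly flanking the barcode
rev_primer = revcomp(DOWN_FLANK)  # reverse primer behind the barcode
landing_pad_fwd = PROMOTER[:12]   # forward primer in the landing pad promoter

# Directly flanking primers amplify from both templates.
print(can_amplify(plasmid, flank_fwd, rev_primer), can_amplify(genome, flank_fwd, rev_primer))
# The landing pad forward primer only amplifies from the recombined genome.
print(can_amplify(plasmid, landing_pad_fwd, rev_primer), can_amplify(genome, landing_pad_fwd, rev_primer))
```

In this toy setup the flanking primer pair cannot tell plasmid from genome, while the pair with the forward primer in the landing pad only gives a product from the recombined template.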
The above is a simplistic representation. The forward primer / KAM499 is indeed just the sequence needed for hybridization, since we aren’t going to do anything else with this end of the amplicon. On the other hand, we’ll add some more sequence to the reverse primer to help us with the next steps. In this case, this is a nucleotide sequence that wasn’t present before, so that we can specifically amplify this amplicon in the next step. The actual amplicon will thus look something like below:
OK, so the above amplicon is huge; too big for efficient cluster generation. Thus, we’ll now use that amplicon to make a much smaller, Illumina-compatible second amplicon. We’ll also include an index sequence, shown as “nnnnnnnn”, that can be used to distinguish different amplicons from each other when we mix multiple samples together before doing the actual sequencing step. This second amplicon looks something like this:
This time, the forward primer has both the hybridizing portion (in blue-purple) as well as one of the cluster generators, shown in light blue. At the complete other end, you can see the “nnnnnnnn” index sequence in indigo, followed by the other cluster generator sequence in orange. Please note: Here, the index is a KNOWN sequence, like GTAGCTAC or GATCGAGC. It’s just that each sample will have a different KNOWN sequence, so it was simpler just to denote it with “n”s for the purpose of this explanation.
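As a rough illustration of why the index has to be a known sequence, here is a small Python sketch of sorting pooled reads back into samples. The two index sequences are the examples from above; the sample names, the barcode reads, and the `count_barcodes_per_sample` helper are all hypothetical, and the sketch assumes exact index matches with the index read already in the same orientation as the lookup table.

```python
from collections import Counter, defaultdict

# Known index -> sample lookup. Sample names are made up for illustration.
INDEX_TO_SAMPLE = {
    "GTAGCTAC": "sample_A",
    "GATCGAGC": "sample_B",
}


def count_barcodes_per_sample(reads, index_to_sample):
    """Tally barcode counts per sample from (index_read, barcode_read) pairs.

    Reads whose index is not in the lookup table go to "undetermined".
    """
    per_sample = defaultdict(Counter)
    for index_read, barcode_read in reads:
        sample = index_to_sample.get(index_read, "undetermined")
        per_sample[sample][barcode_read] += 1
    return per_sample


# Toy pooled reads: (index read, barcode read). Barcodes are invented 18-mers.
pooled_reads = [
    ("GTAGCTAC", "ACGTTGCAACGTTGCAAC"),
    ("GATCGAGC", "ACGTTGCAACGTTGCAAC"),
    ("GTAGCTAC", "TTGGCCAATTGGCCAATT"),
    ("GTAGCTAC", "ACGTTGCAACGTTGCAAC"),
    ("TTTTTTTT", "ACGTTGCAACGTTGCAAC"),  # unknown index
]

counts = count_barcodes_per_sample(pooled_reads, INDEX_TO_SAMPLE)
for sample, barcode_counts in counts.items():
    print(sample, dict(barcode_counts))
```

The same barcode can show up in several samples, which is exactly why the index is needed: the barcode says which variant was seen, and the index says which sample it was seen in.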
There’s more in the above map though. We’ll likely want to do paired sequencing of the barcode, so we’ll give the Illumina sequencer two read primers: read 1 (in red) which reads through the barcode in the forward direction, as well as read 2 (in green), which goes through the barcode in the reverse direction. We will also need to sequence the index, though we’ve tended to only do that with one primer. This primer is Index 1, and is colored in yellow.
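Since read 2 covers the same barcode from the opposite strand, one simple way to use the pair is to reverse complement read 2 and keep only barcodes where both reads agree. The Python sketch below assumes both reads have already been trimmed to exactly the 18 nt barcode and requires a perfect match; the `paired_barcode` helper and the example barcode are hypothetical, and this is not necessarily how our actual pipeline handles disagreements.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def paired_barcode(read1, read2):
    """Return the barcode if read 1 and reverse-complemented read 2 agree, else None.

    Assumes both reads are already trimmed to just the barcode.
    """
    read2_as_forward = revcomp(read2)
    return read1 if read1 == read2_as_forward else None


barcode = "ACGTTGCAACGTTGCAAC"       # invented 18 nt barcode
read1 = barcode                      # read 1: forward direction
read2 = revcomp(barcode)             # read 2: reverse direction
read2_with_error = "A" + read2[1:]   # simulate a miscall at the first base of read 2

print(paired_barcode(read1, read2))             # reads agree: barcode is kept
print(paired_barcode(read1, read2_with_error))  # reads disagree: None
```

Requiring agreement is a conservative choice: it throws away some reads, but a sequencing error would have to occur at the same position in both reads, and produce matching bases, to slip through.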
For the actual primer sequences and everything, look at Supplementary Table 7 of my 2018 Nature Genetics paper.