Miniprep efficiency

The SARS-CoV-2 pandemic-caused research ramp-down period was a weird time for me / the lab. I sent Sarah to work from home for 10 or so weeks, meaning I had to do the lab work myself if I wanted to make any progress on any of the existing grant-work, or on any of the SARS-CoV-2 research I was trying to boot up. This resulted in some VERY long weeks over the last few months, as I was really trying to do everything at that point. Cognizant of this, I even started timing myself doing some of the more routine / mundane tasks, to see if I could try to maximize my efficiency. Perhaps the most consistent / predictable of those tasks were minipreps. In particular, I was curious whether doing more minipreps simultaneously saved me time in the long run.

So the short answer was yes. 24 is a very comfortable / logical number for me (I just fill up my mini-centrifuge, and the result is divisible by three, so it's easy to process as complete 8-strip PCR tubes for Sanger sequencing later on), and I consistently processed those in about an hour. Doing fewer was somewhat less efficient, though sometimes you have to do that if you're in a rush to get some particular recombinant DNA plasmid clone. Then again, doing more than 24, while somewhat exhausting, does save me some time overall. Thus, I found that planning for larger batches was a worthwhile strategy during that period.

That said, I'm very glad to have Sarah back in the lab helping me with some of the wet-lab work again. Not only does it save me time, but it also saves me focus; I've gotten pretty good at multi-tasking, but I still hit a limit in terms of the number of DIFFERENT things I can do / think about at the same time.

Modeling bacterial growth

I do a lot of molecular cloning, which means a lot of transformations of chemically competent E. coli. Using 50 uL of purchased competent bacteria would cost about $10 per transformation, which would be an AWFUL waste of money, especially with this being a highly recurring expense in the lab. I had never made my own competent cells before, so I had to figure this out shortly after starting my lab. It took a couple of days of dedicated effort, but it ended up being quite simple (I'll link to my protocol a bit later on). Though my frozen stocks ended up working fine, I became quite used to creating fresh cells every time I need to do a transformation. The critical step here is taking a saturated overnight starter culture and diluting it so you can harvest a larger volume of log-phase bacteria some short time later. A range of ODs [optical density, here defined as absorbance at 600 nm] work, though I like to use bacteria at an OD around 0.2. I had gotten pretty good at eyeballing when a culture was ready for harvesting (for LB in a 250 mL flask, I found this was right when I started seeing turbidity), but I figured there was a better way to know when it's worth sampling and harvesting.

I started keeping good notes about 1) the starting density of my prep culture (OD of the overnight culture divided by the dilution factor), 2) the amount of time I left the prep culture growing, and 3) the final OD of the prep culture. I converted everything into cell density, which is a bit more intuitive than OD (I found 1 OD[A600] of my bacteria roughly corresponded to 5e8 bacteria per mL), and worked in those units from there on out. Knowing bacteria exhibit exponential growth, I log base-10 transformed the counts. Much like the increasing number of COVID-19 deaths experienced by the US from early March through early April, exponential growth becomes linear in log-transformed space. I figured I could thus estimate the growth of my prep culture of competent cells by making a multivariable linear model, where the final density of the bacteria was dependent on the starting bacterial density and how long I left it growing. I figured the lag phase from taking the saturated culture and sticking it into cold LB would end up being a constant in the model. Here's my dataset, and here's my R Markdown analysis script. My linear model seemed to perform pretty well, as you can see in the below plot. As of writing this, the Pearson's r was 0.98.
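In case it's useful for trainees, here's a minimal sketch of what that kind of model fit looks like in R. The column names (starting_od, minutes_grown, final_od) and the handful of numbers below are made up for illustration; the real data and fit live in the linked dataset and R Markdown script.

# Minimal sketch of the growth model fit, using made-up example values
library(tidyverse)

growth_data <- tibble(
  starting_od   = c(0.004, 0.005, 0.003, 0.006, 0.004),
  minutes_grown = c(150, 135, 165, 120, 140),
  final_od      = c(0.22, 0.19, 0.21, 0.18, 0.20)
)

# ~5e8 bacteria per mL per OD600 unit, per my rough calibration
od_to_cells <- 5e8

growth_data <- growth_data %>%
  mutate(log_start = log10(starting_od * od_to_cells),
         log_final = log10(final_od * od_to_cells))

# Log-transformed final density as a function of starting density and growth time;
# the intercept should absorb the lag phase from diluting into cold LB
growth_model <- lm(log_final ~ log_start + minutes_grown, data = growth_data)
summary(growth_model)

# Pearson's r between the observed and fitted values
cor(growth_data$log_final, predict(growth_model))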

The aforementioned analysis script has a final chunk that allows you to input the starting OD of your starter culture and, assuming a 1000-fold dilution, tells you how long you likely need to wait to hit the right OD for your prep culture. Then again, I don't think anyone really wants to enter this info into a computer every time they want to set up a culture, so I made a handy little "look-up plot", shown below, where a lab member can just find their starter culture OD on the x-axis, choose the dilution they want to do (staying within 2x of 1000-fold, since I don't know whether smaller dilutions can affect bacterial competency), and figure out when they need to be back to harvest (or at least stick the culture on ice). I've now printed this plot out and left it by my bacterial shaker-incubator.

Note: The above data was collected when diluting starter culture bacteria into *COLD* LB that was stored in the fridge. We've since shifted to diluting the bacteria into room-temp LB (~25°C), which has somewhat expectedly resulted in slightly faster times to reach the desired OD. If you're doing that too, I would suggest subtracting ~30 min of incubation time from the above times to make sure you don't overshoot your desired OD.
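For anyone curious, here's a rough sketch of the look-up logic, continuing from the hypothetical model fit sketched above. The function name, column names, and values are illustrative assumptions; the actual calculation lives in the linked R Markdown script.

# Solve the fitted line for the incubation time needed to hit a target OD,
# given the starter culture OD and the dilution factor
coefs <- coef(growth_model)

predict_minutes <- function(starter_od, dilution = 1000, target_od = 0.2) {
  log_start  <- log10((starter_od / dilution) * od_to_cells)
  log_target <- log10(target_od * od_to_cells)
  unname((log_target - coefs["(Intercept)"] - coefs["log_start"] * log_start) /
           coefs["minutes_grown"])
}

# E.g. an overnight culture at OD 4, diluted 1000-fold into fresh LB
predict_minutes(starter_od = 4, dilution = 1000)

# The look-up plot is essentially this function evaluated over a grid of
# starter ODs and a couple of dilution factors
lookup <- expand_grid(starter_od = seq(2, 6, by = 0.25),
                      dilution = c(500, 1000)) %>%
  mutate(minutes = predict_minutes(starter_od, dilution))

ggplot(lookup, aes(x = starter_od, y = minutes, color = factor(dilution))) +
  geom_line() +
  theme_bw() +
  labs(x = "Starter culture OD600", y = "Minutes until harvest", color = "Dilution factor")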

I'm still much more of a wet-lab scientist than a computational one. That said, god damn, the moderate amount of computational work I can do is still empowering.

Gibson / IVA success rates

I only learned about Gibson assembly when I started my postdoc, and it completely changed how I approached science. In some experiments with Ethan when I was in the lab, I was blown away when we realized that you don't even need Gibson mix to piece a plasmid back together; this is something we were exploring to try to figure out if we could come up with an easier & more economical library generation workflow. I was disappointed but equally blown away when I realized numerous people had repeatedly "discovered" this fact in the literature already; the most memorable of the names given to it was IVA, or In Vivo Assembly. Ethan had tried some experiments, and had said it worked roughly as well as with Gibson. Of course, I can't recall exactly what his experiment was at this point (although probably a 1-piece DNA recircularization reaction, since this was in the context of inverse PCR-based library building, after all). So the takeaway I had was that it was a possible avenue for molecular cloning in the future.

We've done a fair amount of molecular cloning in the lab already, creating ~60 constructs in the first 4 months since Sarah joined. I forget the exact circumstances, but something came up where it made sense to try some cloning where we didn't add in Gibson mix. I was still able to get a number of the intended constructs on that first try, so I stuck to not adding Gibson mix for a few more panels of constructs. I've been trying to keep very organized with my molecular cloning pipelines and inventories, which includes keeping track of how often each set of mol cloning reactions yielded correctly pieced-together constructs. I've taken this data and broken it down based on two variables: whether it was a 1- or 2-part DNA combination (I hardly ever try more than 2 in a single reaction, for simplicity's sake, and also because properly combined cloning intermediates may still be useful down the line, anyway), and whether Gibson mix was added or not. Here are the current results:

Note: This is a *stacked* smoothed histogram. Essentially, the only real way to look at this data is to consider the thickness of a given color across the range of the x-axis, relative to its thickness in other portions.
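For anyone wanting to make a similar plot, here's a minimal sketch of the general approach in ggplot2 (a stacked density plot). The data frame and its columns (success_rate, parts, condition) are hypothetical stand-ins, not my actual cloning-inventory records.

# Sketch of a stacked, smoothed histogram of cloning success rates
library(tidyverse)

cloning <- tibble(
  success_rate = runif(60),
  parts        = rep(c("1-part", "2-part"), each = 30),
  condition    = rep(c("Gibson mix", "No Gibson mix"), 30)
)

ggplot(cloning, aes(x = success_rate, fill = condition)) +
  geom_density(position = "stack", alpha = 0.8) +
  geom_vline(xintercept = 0.25, linetype = "dotted", color = "red") +
  facet_wrap(~ parts) +
  theme_bw() +
  labs(x = "Fraction of screened colonies that were correct", y = "Stacked density")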

So this was extremely informative. Some points:
1) I'm willing to screen at least 4 colonies for a construct I really want. Thus, I'm counting a success rate > 0.25 as being a "successful" attempt at cloning a construct. In the above plot, that means any area above the dotted red line. By that measure, 1-part DNA recircularizations have pretty decent success rates, since the area of the colored curve above the red dotted line >> the area below it. Sure, Gibson mix helps, but it's not a night-and-day difference.
2) 2-part DNA combinations are a completely different story. Lack of Gibson mix means that I have just as many failed attempts at cloning something as successful attempts. Those are not great odds. Adding Gibson mix makes a big difference here, since it definitely pushes things in favor of a good outcome. Thus, I will ALWAYS be adding Gibson mix before attempting any 2-part DNA combinations.

Other notes: I'm using home-grown NEB 10-beta cells, which give me pretty decent transformation rates (high-efficiency 1-part recircularization reactions can definitely yield many hundreds of colonies on the plate from a successful attempt), so there have been relatively few plates where I literally have ZERO colonies; instead, I'm more likely to have a few colonies that are just hard-to-remove residual template DNA.

Plasmid Lineages

Recombinant DNA work is integral to what we’re doing here, so I’ve become extremely organized with keeping track of the constructs we are building. This includes having a record of how sequences from two constructs were stitched together to create a new construct. Here’s a network map showing how one or more different plasmid sequences were combined to create each new construct.

[The series of letters and numbers prefixed with G (for Gibson) are unique identifiers I started giving new constructs when it became clear partway through my postdoc that I was going to need a better way of tracking everything I was building. Those prefixed with A are constructs obtained through Addgene. Those prefixed with R are important constructs I had built before this tracking system, where I had to start giving them identifiers retroactively.]

Edit 9/1/2020: Even if some of my code / scripts are kind of haggard, I figure I'll still publicly post them in case they're useful for trainees. Thus, you can find the script + data files to recreate the above plot at this page of the lab GitHub.
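That script and data are the real thing; purely for illustration, here's a minimal sketch of the general idea, drawing an edge list of parent -> child constructs as a directed network with the igraph package. The identifiers below are made up.

# Sketch of a plasmid lineage network: each edge points from a parent
# construct to the new construct built from it
library(igraph)

lineage_edges <- data.frame(
  parent = c("A001", "R012", "G045", "G045", "A003"),
  child  = c("G045", "G045", "G101", "G102", "G102")
)

lineage_graph <- graph_from_data_frame(lineage_edges, directed = TRUE)

plot(lineage_graph,
     vertex.size = 20,
     vertex.label.cex = 0.8,
     edge.arrow.size = 0.4)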

HEK293Ts with melanin

I think synthetic biology is really cool, and I like playing around with recombinant DNA elements so I can see how well they work in my own hands. If they work OK, then I just let that knowledge stew in the back of my brain until I can eventually figure out a use for it. Reading this paper by Martin Fussenegger made me realize just how easy it is to make cultured cells express melanin. Here was my first foray into creating melanin in HEK cells by overexpressing tyrosinase:

Cells pelleted in the tubes on the left are expressing tyrosinase. The cells pelleted in the tubes on the right are not.

It doesn't quite work well enough to use as a general reporter (it's really hard to see in a cell monolayer, and only becomes noticeable in colonies of cells or in a pellet, like above), but it's still kind of fun to see. Let's see if I find an eventual use for this in some future work.

Primer Inventory Google Sheet To Benchling

Here's a Python script that converts the MatreyekLab primer Google Sheet into a CSV file that is easily imported into Benchling.

1) Go to the lab primer inventory Google Sheet -> "https://docs.google.com/spreadsheets/d/15RDWrPxZXN34KhymHYkKeOglDYzfkhz5-x2PMkPo0/edit?usp=sharing"

2) Go to file -> download -> Microsoft Excel (.xlsx)

3) Then take the above file (MatreyekLab_Primer_Inventory.xlsx) and put it in the same directory as the Google_sheet_to_benchling.py file.

4) Open terminal, go to the right directory, and then enter:

python3 Google_sheet_to_benchling.py

5) It should make a new file called "Matreyeklab_primers_benchling.csv". The text in this file can be copy-pasted into Benchling and imported into the "Primer" folder.

Uploading the list to Benchling

6) Next, log onto Benchling, go into the "MatreyekLab" project and into the "O_Primers" folder. Make a new folder named with the date (e.g. "20200130" for January 30th, 2020). Once in the folder, select "Import Oligos", and select the csv for importing.

Using the new primer list

7) Once it finishes uploading, you can go to whatever plasmid map you want to annotate with our current primers. Go to the right-hand side, two icons down to "Primers". Hit attach existing, add the new folder as the new location, and hit "find binding sites". Select all of the primers (top check box), and then hit the "Attach Selected Primers" button in the top right.

8) Now click on the sequence map tab and voilà, you can see the plasmid map now annotated. Find the primer you want (sequencing or otherwise) and go do some science.

Basic data analysis in R Studio

So as I bring trainees into the lab, I'll want them to learn how to do (at the very least) some basic data analyses. I realize they may not know exactly where to start, so as I go about making my own basic analysis scripts for data relevant to me, I'll post them here and make sure they're reasonably well commented to explain what's going on.

OK, the specific backstory here. Back at UW, we had next-day IDT orders. It became very clear that was not going to be the case at CWRU, especially after talking to the IDT rep (who did not seem to really care about having a more thriving business here). So I priced out my options, and Thermo ended up handily winning the oligo price battle ($0.12 a nucleotide, which is a slight improvement over the $0.15 we seemed to be paying at UW. *Shrug*). Thermo also doesn't charge for shipping (another bonus), with the downside being that they only deliver on Tuesdays and Thursdays. I wanted to figure out how long it takes to receive primers after ordering them, so I've been keeping track of when I made each order and how long it took to arrive. Now that I have a decent number of data points, I decided to start analyzing them to figure out if any patterns were emerging.

Here’s a link to the data. Here’s a link to the R Markdown file. You’ll want both the data and the R Markdown file in the same directory. I’m also copy-pasting the code below, for ease:

Primer_wait_analysis

This is the first chunk, where I'm setting up my workspace by 1) starting the workspace fresh, 2) importing the packages I'll need, and 3) importing the data I'll need.

# I always like to start by clearing the memory of existing variables / dataframes / etc.

rm(list = ls())

#Next, let's import the packages we'll need.
#1) readxl, so that I can import the Excel spreadsheet with my data
#2) tidyverse, for doing the data wrangling (dplyr) and then for plotting the data (ggplot2)

library(readxl)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# Lastly, use readxl to import the data I want

primer_data <- read_excel("How_long_it_takes_to_receive_items_after_ordering.xlsx")

The idea is to make a bar-graph (or equivalent) showing how many days it takes for the primers to arrive depending on what day I order the primers. I first have to take the raw data and do some wrangling to get it in the format I need for plotting.

# I care about how long it took me to receive primers in a *normal* week, and not about what I encountered during the holidays. Thus, I'm going to filter out the holiday datapoints.

primer_data_normal <- primer_data %>% filter(holiday == "no")

# Next, I want to group data-points based on the day of the week

primer_data_grouped <- primer_data_normal %>% group_by(day_of_week) %>% summarize(average_duration = mean(days_after_ordering), standard_deviation = sd(days_after_ordering), n = n())

# Now let's set the "day_of_week" factor to actually follow the days of the week

primer_data_grouped$day_of_week <- factor(primer_data_grouped$day_of_week, levels = c("M","T","W","R","F","Sa","Su"))

# Since I'll want standard error rather than standard deviation, let's get the standard error
primer_data_grouped$standard_error <- primer_data_grouped$standard_deviation / sqrt(primer_data_grouped$n)

Now that the data is ready, time to plot it in ggplot2.

Primer_plot <- ggplot() + geom_point(data = primer_data_grouped, aes(x = day_of_week, y = average_duration)) +
  geom_errorbar(data = primer_data_grouped, 
                aes(x = day_of_week, ymin = average_duration - standard_error, 
                    ymax = average_duration + standard_error), width = 0.5) + 
  geom_text(data = primer_data_grouped, aes(x = day_of_week, y = 0.2, label = paste("n=", n))) + geom_hline(yintercept = 0) + scale_y_continuous(limits = c(0,7), expand = c(0,0.01)) +
  theme_bw() + xlab("Day primer was ordered") + ylab("Days until primer arrived") + theme(panel.grid.major.x = element_blank())

ggsave(file = "Primer_plot.pdf", Primer_plot, height = 4, width = 6)
Primer_plot

Hmm, probably should have made that figure less tall. Oh well.

So the n values are still rather small, but it looks like I'll get my primers soonest if I order on Monday or maybe Tuesday. In contrast, ordering on Thursday, Friday, or Saturday gives me the longest wait (though, well, some of that is the weekend, which doesn't matter as much). Thus, if I have any projects coming up where I have to design primers, it's probably worth me taking the time to do that on a Sunday or Monday night instead of late in the workweek.

UPDATE 2/22/2020: Uhhh, so the plot above was my first draft attempt, and I have since homed in on the right representation of the data:

And that's because the average numbers are only somewhat meaningful, and the distribution of frequencies is much more relevant / accurate. Here's a link to the updated script for generating the plot, with the data file at this other link.
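For the sake of illustration, here's a rough sketch of that distribution-style representation, continuing from the primer_data_normal data frame created in the chunks above. The linked updated script is the real version; the plotting choices below are just one reasonable way to do it.

# Count how often each wait time occurred for each ordering day, rather than averaging
primer_counts <- primer_data_normal %>%
  count(day_of_week, days_after_ordering) %>%
  mutate(day_of_week = factor(day_of_week, levels = c("M","T","W","R","F","Sa","Su")))

Primer_frequency_plot <- ggplot(primer_counts, aes(x = days_after_ordering, y = n)) +
  geom_col() +
  facet_wrap(~ day_of_week, nrow = 1) +
  theme_bw() +
  xlab("Days until primer arrived") +
  ylab("Number of orders")

Primer_frequency_plot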