Well, something came up this week that made me realize it’s worth having a decision tree to codify the recommended procedure for screening bacterial colonies for successful molecular cloning:

Plasmidsaurus whole-plasmid nanopore sequencing is a fantastic service. As of today (3/25/25), we’ve sent 1,357 samples to them(!!). And they’ve essentially been worth every penny. But there are a bunch of different reasons to use them, and I wanted to make sure everyone in the lab was on the same page in terms of which reasons are well justified vs. which are more arguable, so I made the following decision tree (it also codifies our lab policy of sequencing plasmids from unverified sources before working with them). For anyone in the lab: let me know if you think we should make any specific changes to it (I tried to remember what we discussed in lab meeting, but I may have forgotten something).
As mentioned in a previous post, starting January 20th, 2023, I began keeping a detailed accounting of how many minutes I spent each day performing different types of work. This was largely motivated by my involvement on a departmental “committee” with around 3 other members (so, theoretically, everybody could be doing ~25% of the necessary work), but it became clear that nobody was really going to do anything, so I had to do >80% of the necessary work to keep the whole thing from failing outright. As of today (March 21, 2025), this time-stamped spreadsheet of my activity log has 5,075 rows. Well, aside from allowing me to keep “receipts” showing how much of my actual effort was going toward this piece of departmental service (rather than the assumed ~25%), it had the secondary benefit of letting me actually quantify how long I was spending on any given work-related item (whether it was service-related or not). Here are some activities I was able to assess.
Manuscripts: So for the last 4 manuscripts from the lab, it’s taken me, on average, ~150 hours to go from “OK, time to start putting together the manuscript” to having it done. Actually, the number is probably closer to 200 hours on average, since two of these are still ongoing, and at least one of these I’ve been working on since like 2021. But ya, it really is still a big lift for me to get a paper published from the lab. Which, in one sense, is probably what it should be. But it is also exhausting.
Presentations: There were four presentations during that span, and on average, it took me about 20 to 25 hours to prep for each. I suppose this is because each presentation was on a different topic (necessitating starting from “scratch”).
Grant writing: This is going to be a completely anomalous number, since all of my previous major grant applications were made before I started collecting this data. Two were letters of intent (LOIs), and thus just short ~2-page descriptions, and one was a full application in an LOI-like condensed format (this was also a joint grant application, so I didn’t need to pull all of the weight on it). Next time I need to write something R01-sized, I imagine it will take me double that number, if not more.
Rotations: Each PhD student rotation seems to take about 15-20 hours of my time. Probably a somewhat moot point for next year, though, since it’s unclear whether I could / should take another student at that point.
Teaching, hiring, and editing fellowship applications: These seem to take between 15 and 22 hours of my time per (new) lecture, per new staff hire, and per application, respectively. But n = 1 for each, so we’ll see what happens in the future (maybe).
RPPRs: 9 hours on average. Doesn’t seem unreasonable.
Reviewing manuscripts: 5 hours on average. Again, seems pretty reasonable.
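For anyone curious, summarizing this kind of minute-level log into per-activity hours is a trivial aggregation. A minimal sketch in Python (the rows below are made up for illustration, not my actual log):

```python
from collections import defaultdict

# Hypothetical (activity, minutes) rows from an activity log -- not real data.
log = [
    ("manuscript", 480), ("manuscript", 300),
    ("review", 120), ("review", 180),
]

def hours_by_activity(rows):
    """Sum logged minutes per activity and convert to hours."""
    totals = defaultdict(int)
    for activity, minutes in rows:
        totals[activity] += minutes
    return {activity: mins / 60 for activity, mins in totals.items()}

print(hours_by_activity(log))  # {'manuscript': 13.0, 'review': 5.0}
```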
I was trying to streamline our existing attB vector. I was prompted to do this for a few reasons: 1) I recently identified a previously unappreciated T7 (bacteriophage) promoter and potential cryptic bacterial promoter in our standard plasmid, 2) There are presumably some weak cryptic eukaryotic promoters hidden somewhere in the plasmid too, and 3) I was trying to “domesticate” the plasmid to get rid of some Type II and Type IIS restriction enzyme sites.
As part of the most recent plan, I decided to delete two different sections of the plasmid: one was the segment of DNA between the attB site and the Amp promoter driving the AmpR gene, and the second was the segment between the origin of replication and the SV40 polyA signal. The first one worked fine (which I knew it would, since I eventually remembered that I had previously done this back in like… 2015, but never used it again for some reason). The second one proved very problematic. Here’s the section in question:
So I had seen those annotations for the lac promoter and lac operon, but assuming the directionality of the map was true, it seemed like they weren’t really pointed at enough bacterial sequence to matter, so I just assumed they were vestiges of something. Well, here’s what happened to my plasmid yields in this lineage of plasmid.
I’m not going to read into the slightly higher concentration of L036 too much, but man, what really smacks you in the face is just how bad the yields became with L048 (a derivative of L036). Well, I then tested the panel for their ability to recombine into landing pad cells, and the phenotype there was obvious as well: all plasmids up to and including L036 recombined at high rates, whereas L048 and one of its sibling plasmids with the same deletion had nonzero but *severely* diminished recombination. So not only is the DNA yield bad, but the “quality” in some sense seems to be much worse, in that the DNA that is there is not resulting in good recombination.
I’ve learned my lesson, and I’m now just trying to take out that last BspQI/SapI site with a nucleotide substitution.
But still, this raises the question: what in the world is in that DNA section, and why is it so important for plasmid propagation? I’m sure some bacteriologists and perhaps some old-school molecular biologists would know, but I’ve always lamented how much of a black box the bacterial portions of plasmids are (my expertise is in eukaryotic / mammalian cell biology). Will I ever figure this out?
Well, I will first try with pLannotate. That’s what told me there was a T7 primer site in this plasmid after all.
Well, so that didn’t really uncover anything new. Hmmm…
I end up having to Google the relevant commands every time I need to make publication-quality figures in PyMOL, so I’m just going to note them here to save myself some time.
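Here’s the sort of minimal command set I usually end up re-finding (the exact resolution, DPI, and filename below are just placeholders, and settings like ray_trace_mode are a matter of taste):

```
bg_color white
set ray_opaque_background, off
set antialias, 2
set ray_trace_mode, 1
ray 2400, 1800
png my_figure.png, dpi=300
```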
See above for an example of what a resulting image looks like.
iCasp9 as a negative selection cassette is amazing. Targeted protein dimerization with AP1903 / Rimiducid is super clean and potent, and the speed of its effect is a cell culturist’s dream (cells floating off the plate in 2 hrs!). It really works.
But when there are enough datapoints, sometimes it doesn’t. I have three recorded instances of email discussions with people who mentioned it not working in their cells. First was Jeff in Nov 2020 with MEFs. Then Ben in June 2021 with K562s. And Vahid in July 2021 with different MEFs. It’s very well possible there are one or two more in there that I missed with my search terms.
Reading those emails, it’s clear that I had already put some thought into this (even if I can’t remember doing so), so I may as well copy-paste what some of them were:
1) Could iCasp9 not work in murine cells, due to potential species-based sequence differences in downstream targets? The answer seems to be no, as a quick Google search yields a paper that says: “Moreover, recent studies demonstrated that iPSCs of various origin including murine, non-human primate and human cells, effectively undergo apoptosis upon the induction of iCasp9 by [Rimiducid].” (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7177583/)
Separately, after the K562s (human-derived cells) came into the picture:
This is actually the second time this subject has come up for me this week; earlier, I had a collaborator working in MEF cells note that they were seeing slightly increased but still quite incomplete cell death. That really made me start thinking about the mechanism of iCasp9-based killing: chemical dimerization and activation of caspase 9, which presumably then cleaves caspases 3 and 7, which in turn start cleaving the actual targets that cause apoptosis. So this is really starting to make me think / realize that perhaps those downstream switches aren’t always available to be turned on, depending on the cellular context. In their case, I wondered whether the human caspase 9 may not recognize the binding / substrate motif in murine caspase 3 or 7. In yours, perhaps K562s are deficient in one (or both?) of those downstream caspases?
Now for the most recent time, which happened in the lab rather than by email: It was recently brought up that there is a particular landing pad line (HEK293T G417A1), which we sometimes use, that apparently has poor negative selection. John and another student each separately noticed it. Just so I could see it in a controlled, side-by-side manner, I asked John if he’d be willing to do that experiment, and the effect was convincing.
So after enough attempts and inadvertently collecting datapoints, we see the cases where things did not go the way we expected. Perhaps all of these cases share a common underlying mechanism, or perhaps they each have unique ones; we probably won’t ever know. But there are also some potentially interesting perspective shifts (e.g., a tool existing only for a singular practical purpose morphing into a potential biological readout), along with the practical implications (i.e., if you are having issues with negative selection, you are not alone).
This is the post I will refer people to when they ask about this phenomenon (or what cell types they may wish to avoid if they want to use this feature).
So many of my experimental readouts over my scientific career have been fluorescence-based (and for good reason). But especially as we keep doing pseudotyped virus infection assays, it’s becoming really prohibitive to continue reading out infection by flow cytometry (b/c of cost, availability of the instruments, etc.), so we’ve recently shifted over to luminescence. As part of this, I wanted to create a recombination vector construct encoding firefly luciferase, so we can use it as a control when needed. (I also just remembered the other reason we made this plasmid: to have a luminescence-based version of testing recombination efficiencies.)
Anyway, I recently recombined and selected cells with two independently generated constructs (clones G1402C and G1402D), and determined how many recombined (fLuc-expressing) cells need to be in the well to be detectable. Here’s the resulting plot.
The datapoints on the y-axis / left edge of the plot are media only, where there is no luciferase enzyme (thus helping to establish the background of the assay). You can tell from the plot that we start getting detectable values around 10 cells per well, and we’re clearly in the linear range by ~100 cells per well, which continues at least through 250,000 fLuc-expressing cells (it’ll be nice to see if we can ever max out the linear range, but that’s probably best done with high-MOI pseudotyped virus transductions). The variability between G1402C and G1402D may be error in the cell counts, since we know there’s possible (slight) error in those estimates.
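For what it’s worth, one common way to formalize “detectable” from those media-only wells is the mean of the blanks plus three standard deviations. A minimal sketch with made-up RLU values (not the actual assay numbers):

```python
import statistics

# Hypothetical luminescence readings (RLU) -- not the actual assay values.
blank = [102, 98, 110, 95]  # media-only wells (no luciferase)
signal = {10: 160, 100: 1_500, 1_000: 15_200}  # cells per well -> RLU

# A common detection threshold: mean of blanks + 3 standard deviations.
lod = statistics.mean(blank) + 3 * statistics.stdev(blank)

detectable = {cells: rlu > lod for cells, rlu in signal.items()}
print(f"LOD = {lod:.0f} RLU; detectable wells: {detectable}")
```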
Perhaps one day we’ll also start playing around with renilla luciferase and Nanoluc, but for now, we’ll keep playing around with fLuc only just to keep things simple. But ya, now with this plasmid in hand, we can start more comprehensively testing recombination protocols for efficiency without having to book a ton of time at the flow cytometer…
We have this CryoPlus 2 LN2 storage unit from ThermoFisher.
To fill it, we regularly have to order “NI 230LT22”, or a 230 L tank of LN2 at 22 psi, from AirGas.
So how often do we have to order a LN2 tank to fill the CryoPlus? Well, here’s a graph for that.
So roughly every 12 to 13 days. Although once in a while, we get a lemon of a tank that empties in roughly a week.
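Computing that refill interval from a list of order dates is nearly a one-liner. A sketch with hypothetical dates (the real log lives in our ordering records):

```python
from datetime import date

# Hypothetical LN2 tank order dates -- placeholders, not our actual order history.
orders = [date(2024, 1, 3), date(2024, 1, 16), date(2024, 1, 28), date(2024, 2, 9)]

# Days elapsed between consecutive orders.
intervals = [(later - earlier).days for earlier, later in zip(orders, orders[1:])]
avg = sum(intervals) / len(intervals)
print(f"Refill intervals: {intervals}; mean = {avg:.1f} days")
```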
The video for Kenny’s talk at the Mutational Scanning Symposium held in Boston in May 2024 can now be found on the CMAP_CEGS YouTube channel, at this link: https://youtu.be/C4bFAmKl4Q4?si=OhuPfVPEFtEkREYe
I’ve been thinking about budgets, partially b/c I just went through a protracted experience of getting the school to give me access to the remaining part of my startup (from almost 5 years ago! And while I was repeatedly told the remaining amount was non-expiring, it was just given to me in an account that expires 5 years from now…). Regardless, the goal is to spend down these remaining institutional funds while bolstering my research group, largely through personnel additions and management.
For that reason, I created a simulation of how personnel salary + other operational costs would exhaust my current funding portfolio (currently one R21 that ends in a year, an R35 that ends in 2 years, and the aforementioned remaining startup account). The black line in the plot below shows this happening, with the line terminating around the 3 year mark. This is presumably around when I would expect to run out of money, if zero future action were taken.
Now, keep in mind that there are some MAJOR assumptions going on here:
1. This simulation assumes that I DO NOT receive another grant in the next 5 years. Of course that will not be the case.
2. Aside from Nisha’s intended graduation date, the rest of the end dates are very roughly estimated. In the case of research staff, this is assumed to be indefinite for two of the individuals. Of course, if money ends up getting tight, A) my salary will end up going down some, which will help alleviate costs, and B) I’ll let go of research staff as necessary, and well before there are impacts on students (to which there is a larger commitment).
3. Everything is modeled as a daily recurring expenditure. In real life, I think everything behaves more like discrete sums that are added into the account yearly (like NIH budgets) or monthly / bimonthly (like salaries).
Note: the line goes into negative values because the simulation allows the startup funds not spent by the end of the R35 (in two years) to subsequently be exhausted (that’s when the line ends).
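The daily-burn model described above can be sketched roughly like this (all numbers are made up for illustration, not my actual award or salary figures):

```python
# Rough sketch of the daily-burn runway simulation. Every figure below is a
# placeholder -- not the lab's actual awards or costs.
funding_pools = {"R21": 150_000.0, "R35": 1_200_000.0, "startup": 300_000.0}
daily_burn = 1_500.0  # total personnel + supply costs, modeled per day

def runway_days(pools: dict[str, float], burn_per_day: float) -> int:
    """Count days until the combined pools are exhausted at a constant daily burn."""
    balance = sum(pools.values())
    days = 0
    while balance > 0:
        balance -= burn_per_day
        days += 1
    return days

days = runway_days(funding_pools, daily_burn)
print(f"Funds last ~{days} days (~{days / 365:.1f} years)")
```

A real version would additionally truncate each pool at its grant end date (which is what produces the negative dip noted above), but the constant-burn loop captures the basic shape.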
Now, I’m actively trying to recruit more personnel to the lab, at this point largely in the form of PhD students. That’s what the additional colors on the plot are. A singular PhD student would be the orange line, and two new PhD students would be the red line.
Especially in light of the fact that I *have* to spend that startup (or it presumably goes *poof* into the administrative ether), it looks like I’m going to be good for at least a couple of years even in the worst-case scenario. Regardless, this really does help me frame what I need to be doing at any given time. If the financial situation were dire or particularly worrisome, I would be focusing on writing grant applications right now. Based on the above plot, I think it makes a lot more sense for me to devote focus to publishing existing projects and further developing preliminary evidence to increase the success rates of grant applications I could just as well submit in a year.
1/25/25 Update:
Well, with the future now a bit clearer in terms of personnel (and a recent grant award), this is what the current projection looks like:
Not terrible, but almost any projection can still be anxiety-provoking in the context of the current administration (eg. destabilization of the NIH).
2/8/25 Update:
That previous analysis ran the simulation from scratch, using total award amounts and monthly personnel and supply costs to perform the calculation. I realized at some point that a more directly relevant analysis would simply start from my existing unspent funds and see when those are used up (based on the same monthly personnel and supply costs as above). Anyway, this is what that plot looks like.
Well, this gives me almost exactly the same answer, which gives me some confidence in the accuracy of the estimate, at least.