Installing Enrich2

Posted on December 27, 2019December 28, 2019 by kmatreyek

Installing Enrich2 / using a Conda environment

I have a couple of new Macs in the lab that need Enrich2 installed. The goal today is to go through the steps of making a Conda environment specifically for running Enrich2 (and installing Enrich2), developed by Alan Rubin.

The Enrich2 repository

The Enrich2 docs

Alan instructions for making a Conda environment specifically for Enrih2

Alan’s instrucitons for doing this is pretty good, if I remember correctly. But, Alan is a seasoned programmer / computational scientist, while people like me are novices and far less familiar with these steps. Furthermore, depending on the specifics of your computer / system, you may get different errors in the installation process. Thus, here’s my interpretation of this process for the benefit of others like me.

0) This supposes that you have already installed an updated version of Anaconda on your Mac. Go back and do this now, if you haven’t done so already.

1) Download the Anaconda Enrich2 environment file and put it somewhere you can access using Terminal.

2A) Update to the newest version of Anaconda, just in case. Hit “y” if prompted.

conda update conda

2B) OK, so now was installing the right version of Pandas it needs. To do this, I first went into Anaconda Navigator and made a new environent called python2, that uses Python 2. Then in terminal, I called:

conda activate python2

2C) This activated python2, so that now the prompt didn’t say “base”, but now it said “(python2)”. Cool. Envinroment activated. I then installed pandas 0.19.2 by typing in:

pip install pandas==0.19.2

2D) OK, this seemed to work too. I got an error when trying to re-run the “enrich2_env.yml file”, so I removed the “=0.19” part form the .yml file and ran “conda env create -f enrich2_env.yml”. This actually seemed to work giving me a list of packages being extracted. Now to actually test it out.

conda activate enrich2

3) Cool, that worked, and the terminal prompt now says “(enrich2)”. Now we’re in business. Next is actually installing Enrich2 now that we’re in the enviroment. Go to the Enrich2 repository, download the file, unzip it, and then move to its directory in terminal. Then run:

python setup.py install

That seemed to work since it didn’t throw any errors.

4) Next is actually trying to run the application. Type in the following:

enrich_gui

Oh Jesus christ. It literally crashes the Finder due to the following errors:

CGSTrackingRegionSetIsEnabled returned CG error 268435459
CGSTrackingRegionSetIsEnabled returned CG error 268435459
CGSTrackingRegionSetIsEnabled returned CG error 268435459
HIToolbox: received notification of WindowServer event port death.
port matched the WindowServer port created in BindCGSToRunLoop

Looks like this may have been a problem with the MacOS operating system. Updating my OS to Catalina and then trying again.

5) OK, MacOS has been updated to Catalina. Now let’s try running Enrich again. (You’ll likely want to run this command after you’ve navigated to the directory with your raw files to simplify the file locating process).

enrich_gui

Awesome. It worked!

PS. I followed these instructions for my second Mac and it worked like a charm. I even went straight for the Cataline update early on and didn’t run into the error in Step4.

Downloading JoVE videos

Posted on December 22, 2019April 29, 2023 by kmatreyek

4/29/23 update: I think that at some point JoVE changed how their videos are accessed, and the approach below is no longer applicable.

Part of my goal for this holiday break is to work on an exploratory research grant proposal for a high-throughput investigation studying how protein coding variants in inflammasome components lead to various autoinflammatory diseases. I heard there were supposed to be some cool-looking videos of ASC speck formation (like this video from Kuri et al, 2017, J Cell Biol), so I did a google search for such videos. This lead me to some videos at JoVE, the Journal of Visual Experiments. CWRU has institutional access to tons of journals including JoVE, but always having to log in to watch the video is kind of clunky, so I wanted to be able to download the relevant videos. Thus, I just looked under the hood at the html used to organize the webpage, and found where the video lived so I could download it. Here are some instructions for doing just that:

1) Using Google Chrome (though I’m sure other browsers like FireFox should do this as well), log into your institutional access service to get to the login-protected JoVE page with the full video.

2) Right click on the area with the video and hit “inspect”.

3) In the top inspector pane, go to the area that says something along the lines of…:

<video class=”fp-engine” playsinline=”” webkit-playsinline=”” preload=”none” autoplay=”” crossorigin=”anonymous” src=”blob:https://www.jove.com/d4697446-c6f4-4902-ab6b-37580284d671″ style=”display: block;”><source type=”video/mp4″ src=”https://cloudflare2.jove.com/CDNSource/protected/57463_Fink_051418_P_Web.mp4?verify=1576994641-Y2wt%2BLiPU3Iw9mTSO%2BNHPlX%2BjGEwXuULht5jH92%2FuzY%3D”><track kind=”subtitles” label=”English” srclang=”en” src=”/files/vtt/57463/57463.vtt” id=”en-English”></video>

… and click on the triangle to open up that section and display all of the sub-sections of it.

4) The first subsection should say something like …:

<source type=”video/mp4″ src=”https://cloudflare2.jove.com/CDNSource/protected/57463_Fink_051418_P_Web.mp4?verify=1576994641-Y2wt%2BLiPU3Iw9mTSO%2BNHPlX%2BjGEwXuULht5jH92%2FuzY%3D”>

… and right click on the “src” link and open in a new window.

5) Right click again and download the video file to your hard drive. It will likely be a .mp4 file format. Now you can rewatch it without having to be logged into the JoVE website.

PS-1. Once you’re at the first part of step 3, you can just look in the “src” section and copy-paste the text starting at “https://…” up through “…mp4″ (and not ?verify…” and copy-paste that to a new window as well, and skip to step 5. Though I suppose this is actually the same amount of effort as actually doing step 4.

PS-2. Yes, I find it kind of funny that I just made a tutorial of saving a video from a visual tutorial.

PS-3. For the record, I’m not supporting / condoning bypassing the gatekeeping code this journal has for accessing the full-content. I’m mostly just trying to streamline science so people can get more / better work done without impediments. As far as I can tell, you still do need institutaional access to be able to access the full file (doing the above steps at the non-logged in site only links you to the “teaser” video).

Kenny joins the CWRU Center for AIDS Research

Posted on December 17, 2019February 17, 2020 by kmatreyek

Kenny gives a talk for a CFAR symposium between the Cleveland and Pittsburgh CFAR labs, and joins shortly thereafter.

Simulating sampling during routine lab procedures

Posted on November 24, 2019November 28, 2019 by kmatreyek

TL;DR: Statistics is everywhere, and simulating bottlenecks that happen during routine lab procedures such as dilutions of cells can potentially help you increase reproducibility, and at the least, help you better conceptualize what is happening with each step of an experiment.

I’m still working on getting a cell counter for the lab. In the meantime, we’ve been using an old school hemacytometer to count cells before an experiment. Sarah had used a hemacytometer more recently than me, and knew to dilute the cells from a T75 flask 10-fold to get them into a countable range for the hemacytometer. She said she had performed the dilution by putting 10 ul cells in 90 ul media (and then putting 10 ul of the dilution into the hemacytometer). But as she said this, she asked whether it was OK to perform the dilution as described; a grad student in her previous lab had taught her to do it that way, but a postdoc there said it was a bad idea. My immediate response was that if the cells are sufficiently mixed, then it should be fine. And while that was my gut reaction, I realized that it was something I could simulate and answer myself using available data. Would the accuracy of the count be increased if we diluted 100 ul of cells into 900ul of media, or 1ml of cells into 9ml of media?

Here are the methods (skip if you don’t want to dive in and want to save yourself a paragraph of reading): To me, it would seem the answer to whether the dilution matters depends on how the cells are dispersed in the media / how variable the count is when the same volume is sampled numerous times. Sarah’s standard practice is to count four squares of the hemacytometer, so I had four replicate counts for each volume pipetted. She had repeated this process three times by the time I had performed the analysis, giving me a reasonable dataset I could roll with. I got the mean and standard deviations for each of the three instances, all corresponding to a volume of 0.1 ul. They were all quite similar, so I created a hypothetical normal distribution from the average mean and standard deviation. Next was seeing how different ways of performing the same dilution impacted the accuracy of individual readings. I recreated the 10 ul cells by sampling from this distribution 100 times, 100 ul cells by sampling 1,000 times, and 1 ml by sampling 10,000 times, and taking the mean. I repeated this process 5,000 times for each condition, and looked at how wide each distribution was.

I then turned the counts into concentration (cells / ml):

Instead of stopping there, I thought about the number of cells I was actually trying to plate, which was 250,000. The number the distributions were converging to was ~ 27.3 (black line), so I used that as the “truth”, and saw how many “true” cells would be plated if I had determined the volume needed to be plated based on each of the repeat concentrations calculated by each of the conditions of dilutions. The resulting plot looked like this:

So as you can tell based on the plot, there are slight differences in cells plated depending on imprecision propagated by the manner in which the same 10-fold dilution as performed: while all distributions are centered around 250k, the 10 ul dilution distribution was quite wide, while the 1 ml in 9 ml dilution resulted in cell counts very close to 250k each time. To phrase it another way, ~15% of the time, a 10 ul in 90 ul dilution would cause the “wrong” number of cells to be plated (less than 24k, or more than 26k). In contrast, due to the increased precision, a 100 ul in 900 ul dilution would never result in the “wrong” number of cells being plated. So speaking solely about the dilution, the way the dilution was being performed could have some light impacts on the accuracy of how many cells would be actually plated.

I was going to call this exercise complete, but I ran this analysis by Anna, and she mentioned that I wasn’t REALLY recreating the entire process; sure I had recreated the dilution step, but we would have also counted cells from the dilution in the hemacytometer to actually get the cell counts in real life. Thus, I modified the code such that each dilution step was followed by a random sampling of four counts (using the coefficient of variation determined from the initial hemacytometer readings), and taking the mean of those counts; this represented how we would have ACTUALLY followed up each dilution in real life. The results were VERY different:

In effect, the imprecision imparted by the hemacytometer counts seemed to almost completely drown out the imprecision caused by the suboptimal dilution step. This was pretty mind-blowing for me; especially considering that I would have totally missed this effect had I not run this post by Anna. Now fully modeling the counting process, a 10 ul in 90 ul dilution would cause the “wrong” number of cells (less than 24k, or more than 26k) to be plated ~ 42.5% of the time, and a 100 ul in 900 ul dilution would still cause a “wrong” cell number to be plated ~ 42.2 % of the time; almost identical! Thus, while a 100 ul in 900 ul dilution does impart some slightly increased accuracy, it’s quite minor / negligible over a 10 ul in 90 ul dilution. So while in a sense this wasn’t the initial question asked, it’s still effectively the real answer.

At the end of the day, I think the more impactful aspect of this exercise is the idea that even routine aspects of wet-lab work are deeply rooted in stats (in this case, propagation of errors caused by poor sampling), and that the power of modern computational simulations can be used to optimize these procedures. There’s something truly empowering to having a new tool / capability that gives you new perspectives on procedures you’ve done a bunch of times, and allows you to fully rationalize it rather than relying on advice given to you by others.

Here’s the code if you want to try running it yourself.
Acknowledgements: Thanks to Sarah for bringing this question to my attention. Also, BIG THANKS to Anna for pointing out where I was being myopic in my analysis, which got me to a qualitatively different (and more real-life relevant) answer. It really is worth having smart people look over your work before you finalize it!

Sarah joins the lab!

Posted on November 1, 2019November 3, 2019 by kmatreyek

Sarah Roelle joins the lab as an RA2, and will be using her years of experience in the CWRU Department of Biomedical Engineering to help Kenny finish setting up the lab, and work with him to get the first sets of independent research projects moving. Welcome Sarah! We are very happy to have you here!

Software to download

Posted on November 1, 2019June 13, 2023 by kmatreyek

I recently bought myself a new computer for the office, which meant that I had to download and install all of the key software I used for work. I decided to write down what these were so future me (or future employees) could have it as a reference. All of the following links / commands are for use with a mac.

Everyone should install these:

Google Chrome

Google Drive & Sync or Google Drive File Stream

Microsoft Office

Anaconda

R (requisite for R Studio)

R Studio
packages worth installing: tidyverse, shiny, ggrepel

XQuartz (requisite for Inkscape on older macs)

Optional (but useful):

Box Sync Installer

Dropbox desktop app

MacDown

A plasmid Editor (ApE)

OBS (Open Broadcast Software)

Spyder (Pymol IDE)

Forticlient VPN

Cyberduck

Useful installations from the command line

Samtools

First go to the Samtools directory

$ ./configure

$ make

$ make install

$ export PATH=bin:$PATH

Homebrew

$ xcode-select –install

$ ruby -e “$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)”

$ brew doctor

Gifski

$ brew install gifski

FFmpeg

$ brew install ffmpeg

PyPDF2

$ conda install -c conda-forge pypdf2

A note about editing your .bash_profile

Packages to install in RStudio

You can install these as you need them, but some of these packages are so useful you may as well do it up front when setting up your computer.

install.packages(“tidyverse”)
install.packages(“ggrepel”)
install.packages(“patchwork”)
install.packages(“googlesheets4”)
install.packages(“MASS”)
install.packages(“ggbeeswarm”)

Adding printers in WRB 5th floor

Note: You will not be able to set up the printer when on the “CaseGuest” wireless network. The printer can be setup while on the “CaseWireless” wireless network, though it may be hard to access it from the lab-space. In that case, a direct link to the ethernet is probably the best way to go. When connected to the land-line, you may first encounter an error saying the identity of any websites you go to cannot be confirmed. This may be because you are on a new computer, in which case you have to first register your computer with CWRU IT by typing in “setup.case.edu” into a browser.

The most convenient printer is the black-and-white one near my office. To add this printer on a mac: 1) Go to Systems Preferences (such as through the apple icon op) 2) Go to Printers and Scanners 3) Press the “+” sign to add a printer 4) Type in the IP address “129.22.253.166”. 5) The protocol should be “Line Printer Daemon – LPD” 5) You can rename it to something like “Nearby Black and White” to make it easier to remember. 6) It’s fine to leave Use as “Generic PostScript Printer” 7) Click on Duplex Printing (Since it’s useful). Voila!

5/31/22 edit: On my newest Mac, the process was somewhat different. Here, after going to “System Preferences” > “Printers & Scanners”, go to the second “IP” tab with the globe on it, add in the “129.22.253.166” to the address field, set the protocol to “Line Printer Daemon – LPD” with all other settings kept the default except for the location (I write something like “WRB 5-east mailroom”). Note: I click on the option of duplex printing, which allows you to print on the front and back sides.

6/13/23 edit: If, for some reason, the printer on our side of the floor is down, you can always use the printer on the other side of the floor. Same instructions as above, except the IP address is “129.22.253.139”.

Make a movie of a pymol structure spinning for powerpoint

Posted on September 24, 2019May 18, 2020 by kmatreyek

UPDATE 5/18/2020: Well, for whatever reason, (at least my version of) Pymol stopped turning the camera (turn command) or molecule (rotate command) in script form, so the below no longer works. Buuutttt. I just told used the drop-down menu to do the movie > program > camera loop > y roll > 4 (or whatever) seconds. And then went to export movie in the file menu and made my movie that way. *shrug*

I’m sure there are many ways to do this, but this is the way I’ve been doing it most recently:

1) Set up your pymol session. That means importing your structure, turning the background to white, and enabling any other setting to make it the desired quality.

2) I use a custom python script to make a series of commands that make the structure turn slightly, ray-trace the structure (for a high quality image), and export the image to a PNG file. The script that I’ve linked to here makes pymol export 360 images, making the structure spin around completely.

Note: Depending on your settings, this process can take 10 minutes or 3 hours. An easy parameter to change would be the resolution (default setting in the script is 2000). Obviously the settings that are toggled during step 1 can drastically change how long it takes as well.

3) Use ffmpeg (at least, on a mac) to turn the 360 images into a video.

$ ffmpeg -framerate 20 -pattern_type glob -i '*.png' -c:v libx264 -preset slow  -profile:v high -level:v 4.0 -pix_fmt yuv420p -vf pad="width=ceil(iw/2)*2:height=ceil(ih/2)*2" -crf 22 -codec:a aac Output.mp4

I used homebrew to install ffmpeg last time I had to do it (brew install ffmpeg). I followed these instructions last time I had to install homebrew.

4) Drag into your powerpoint presentation, and voila!

First Day!

Posted on September 3, 2019December 19, 2020 by kmatreyek

After a successful move across 3/4 of the country, the lab opens its doors! Time to equip this place up!

Directions to the office & lab

Posted on September 1, 2019September 13, 2022 by kmatreyek

1) Enter Wolstein Research Building, and take an elevator from the lobby elevator bank up to the 5th floor. (or take the stairs if you want the exercise). Both the elevator bank and second floor of Wolstein requires keycard access. If you do not already have access to WRB, I suggest talking to the security desk right behind the elevator bank and they should be able to let you through.

2) Take a 45-degree right turn out of the elevator (or 90-degree left turn off of the stairs) through the double doors (see image below)

3A) My office is the second door on the left (Room 5133; see image below). If we are meeting, then this is where you want to go.

3B): If looking for the lab, turn right through the double doors next to the portrait of Mark A Smith PhD (see image below).

4) Go straight past the service elevator and turn left once you reach Jim Anderson’s office (see image below).

5) Our lab benches will be directly to your right after the turn. If looking for the TC room, keep going straight until you see room 5103 on the right (see image below).

VAMP-seq Tips

Posted on July 20, 2019July 20, 2019 by kmatreyek

VAMP-seq is a valuable tool for identifying loss-of-function variants for a protein when you don’t have a specific assay for characterizing that protein’s activity. This is because almost all proteins can be “broken” by mutations that cause the protein to mis-fold, mis-traffick, or any other perturbations that may cause the protein to disappear from its normal cellular compartment. In the case of PTEN, we identified 1,138 variants that were lower than WT abundance. Notably, ~ 60 % of variants characterized as pathogenic in people were loss-of-abundance variants, confirming the importance of this property.

VAMP-seq uses a genetic fusion between your protein of interest and a fluorescent moeity (eg. EGFP) to assay its steady-state abundance. Fusions with unstable variants of your protein results in cells expressing unstable fusion proteins that don’t really fluoresce. In contrast, fusions with stable variants (such the WT protein) may fluoresce brightly. Unlike western blots which are ensemble measurements, this assay can be performed at a single-cell level. Correspondingly, this single-cell fluorescence readout makes it possible to test a large library of variants in parallel, in a multiplexed format.

Two VAMP-seq orientation discussed here

Still, when thinking about using VAMP-seq, you must first consider its limitations. While strong loss-of-abundance variants will be non-functional, variants of intermediate abundance may still be sufficiently functional in many contexts. Furthermore, while low abundance correlates with inactivity (and pathogenicity, in many clinical genetics contexts), WT-like abundance in no way indicates activity (or benignity when observed in people). For example, active site mutants often destroy protein function while having little to no effect on protein folding and abundance.

There are also proteins that are inherently incompatible with VAMP-seq. Secreted proteins won’t work because you lose the single-cell, genotype-phenotype link needed for the single-cell assay to work. Marginally stable or intrinsically disordered proteins likely won’t work due to a lack of destabilizing effect. Obligate heterodimers won’t work, though you may be able to get around it by overexpressing the protein partner, such as what I did with MLH1 for assessing PMS2 (See Fig 6). Proteins that cannot be tagged are problematic; this likely includes proteins that normally exist in crowded complexes, or that have key trafficking motifs on their termini. Proteins that are toxic to cells when overexpressed also poses problems, though one of my new landing pad platforms may help with that.

If your protein of interest passes those criteria, the rest is empirically confirming that there is enough signal over background to run the assay. A good literature search is a great place to start. You should look for 1) evidence that an N- or C-terminal tag works well (both good expression and normal protein activity), and 2) known destabilized variants that could serve as controls when performing the preliminary experiments. Ideally, there will be a clear difference in fluorescence distributions that are separable by thresholds used in FACS sorting, and that these differences are physiologically meaningful (as far as we know). Comparing /correlating the results of western blotting with the fluorescence distributions of EGFP-tagged protein is helpful (see below for PTEN). If the initial EGFP distributions between WT and the destabilized variants don’t seem super crisp, see what it looks like when you take the EGFP:mCherry ratio. As you can tell in the below figure, the EGFP:mCherry ratio is quite handy for increasing the precision of each distribution, as it divides out much of the heterogeneity in transcription / translation between cells.

Western and flow results

Regardless, I recommend to most people that they clone both N- and C- terminal fusions, and minimally look at the MFI values of the cells expressing each fusion, as low MFI will likely mean low dynamic range of the assay (from too little signal over background). Ideally, both WT and controls will be tested in both contexts. If the signal is relatively high but there’s concern that the large GFP fusion is causing problems, you could try 1-10/11 split EGFP. This only requires fusion with the ~ 15aa beta-strand 11 of EGFP (and separately co-expressing the larger fragment in the cells), though it’s not completely free of steric hindrance as it requires the spontaneous complex formation of the two subunits for fluorescence. This format also worked with PTEN (See SFig 1d), though I noticed a ~ 10-fold hit in overall fluorescence using the splitGFP format.

Fluorescence levels from different formats

Once it passes all those tests, then follow the steps in the Nature Genetics paper. Good luck!