I try to be as deliberate as I can about designing my synthetic protein-coding constructs. I’ve largely viewed splicing as an unnecessary complication and have left it out of my constructs (though, who knows; maybe transgenes would express better with splicing, as supposedly happens with the chicken β-actin intron present in the pCAGGS promoter/5’UTR combo). Still, there’s a very real possibility that some of my constructs encode cryptic splice sites that could be affecting expression. In a recent conversation with Melissa Chiasson (perhaps my favorite person to talk syn-bio shop with), she noted that there is actually a way to use SpliceAI to predict splicing signals in synthetic constructs, with a nice correspondence here showing what the outputs mean. Below is my attempt to get this to work in my hands:
First up is installing SpliceAI, following the instructions here.
I started by making a new virtual environment in anaconda:
$ conda create --name spliceai
I then activated the environment with:
$ conda activate spliceai
Cool, next is installing spliceai. Installing tensorflow the way I have written below got me around some errors that came up.
$ conda install -c bioconda spliceai
$ conda install tensorflow -c anaconda
OK, while trying to run the SpliceAI custom sequence script, I got an error along the lines of “Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized…”. I followed the instructions here, and typed this into my conda environment:
$ conda install nomkl
Alright, so that was fixed, but then there was another complaint, this time about a missing training configuration (“UserWarning: No training configuration found in save file: the model was not compiled. Compile it manually….”). I got around that by adding a new flag to the load_model() call, based on this response here.
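For anyone following along, here is a minimal sketch of what such a change might look like, adapted from the custom-sequence example in the SpliceAI README. It assumes the flag in question is Keras’s compile=False argument to load_model(), which is the usual way to quiet that warning when a model is only used for prediction. The helper names pad_with_context and predict_splice_sites are made up for illustration, and the import paths may need adjusting for your Keras/TensorFlow versions.

```python
def pad_with_context(sequence, context=10000):
    """Flank the construct with N's so each base has context on both sides."""
    flank = "N" * (context // 2)
    return flank + sequence.upper() + flank


def predict_splice_sites(sequence, context=10000):
    """Return (acceptor_prob, donor_prob), one value per base of `sequence`.

    Hypothetical helper; needs spliceai and its TensorFlow/Keras backend.
    """
    # Imported lazily so pad_with_context works without TensorFlow installed.
    import numpy as np
    from keras.models import load_model
    from pkg_resources import resource_filename
    from spliceai.utils import one_hot_encode

    # SpliceAI ships five models whose predictions are averaged.
    paths = ["models/spliceai{}.h5".format(i) for i in range(1, 6)]
    # Assumption: compile=False is the flag that avoids the
    # "No training configuration found" warning.
    models = [load_model(resource_filename("spliceai", p), compile=False)
              for p in paths]

    x = one_hot_encode(pad_with_context(sequence, context))[None, :]
    y = np.mean([m.predict(x) for m in models], axis=0)
    return y[0, :, 1], y[0, :, 2]  # channel 1 = acceptor, channel 2 = donor
```

Calling predict_splice_sites() on a construct sequence should give two arrays that can be plotted against position.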
OK, so after that (not so uncommon) struggling with dependencies and flags, I’ve gotten things to work. Here’s the result when I feed it a construct from the days when I was messing around with consensus splicing signals early during my postdoc. In this case, it’s a transcript that encodes the human beta actin cDNA, with its third intron added back in. It’s also fused to mCherry, found on the C-terminal end.
And well, that checks out! The known intron is clearly observed in the plot. The rest of actin looks pretty clean, while there seem to be some low-level splicing signals within mCherry. That said, the fact that they’re in the wrong order probably means it isn’t really splicing, and I’m guessing the signals are weak enough and far enough away that there isn’t much cross-splicing with the actin intron.
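The “wrong order” reasoning can be sketched in a few lines: an intron needs a donor site upstream of an acceptor site, so only donor-then-acceptor pairs are candidates. The snippet below is a rough illustration rather than anything SpliceAI provides. The function names, the 0.5 threshold, and the 70-base minimum intron length are all arbitrary assumptions, and the probabilities are toy values, not real output.

```python
def call_sites(probs, threshold=0.5):
    """Return 0-based positions whose probability meets the threshold."""
    return [i for i, p in enumerate(probs) if p >= threshold]


def candidate_introns(donor_prob, acceptor_prob, threshold=0.5, min_len=70):
    """Pair each called donor with every called acceptor downstream of it.

    threshold and min_len are illustrative defaults, not SpliceAI settings.
    """
    donors = call_sites(donor_prob, threshold)
    acceptors = call_sites(acceptor_prob, threshold)
    return [(d, a) for d in donors for a in acceptors if a - d >= min_len]


# Toy per-base probabilities for a 300-base construct (made-up numbers).
donor = [0.0] * 300
acceptor = [0.0] * 300
donor[50] = 0.9       # strong donor ...
acceptor[200] = 0.8   # ... with a strong acceptor downstream
acceptor[220] = 0.3   # weak acceptor, below the default threshold
donor[280] = 0.6      # donor with no acceptor downstream: the "wrong order"

print(candidate_introns(donor, acceptor))  # [(50, 200)]
```

Lowering the threshold pulls in the weaker signals too, which is a quick way to see whether low-level peaks could plausibly pair up with a real intron’s sites.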
Oh, and now for good measure, here’s the intron found in transcripts made from the pCAGGS vectors. This particular transcript comes from this plasmid from Addgene encoding codon-optimized T7 polymerase.
Nice. Now to start incorporating it into analyzing some of the constructs I commonly use in my research…