If you use this website, or the underlying code, in any publication, please cite our Bioinformatics paper Lancaster et al. (2014), which also describes additional extensions and implementation details of the algorithm. Code for the algorithm, as well as for powering this website, is available on GitHub under the open-source MIT license.
The user should paste sequences into the text box or upload a FASTA file. The user can then specify two optional parameters:
Because the prion-like AA frequencies were derived from S. cerevisiae, the default option is to use background frequencies from S. cerevisiae as well (i.e., to treat the per-AA likelihood ratios as species-independent); in this case the slider is set to α = 100%.
The other extreme is to use the background frequencies of the species being scored (i.e., to treat the prion-like AA frequencies as species-independent). This is done by setting the slider to α = 0%. One can also interpolate linearly between these two extremes with any value of α between 0% and 100%; e.g., setting the slider to α = 50% uses an average of the background frequencies of S. cerevisiae and of the species being scored.
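As a rough illustration of the slider, the Python sketch below mixes two background tables linearly with weight α and then computes per-AA log-likelihood ratios (natural logarithms) for the prion-like versus background state. The function names and the two-letter toy frequencies are hypothetical and for illustration only; they are not PLAAC's code or its actual frequency tables.

```python
import math


def interpolate_background(alpha, bg_scer, bg_species):
    """Linearly mix two background AA frequency tables.

    alpha = 1.0 (slider at 100%) returns the S. cerevisiae table unchanged;
    alpha = 0.0 (slider at 0%) returns the table for the species being scored.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return {aa: alpha * bg_scer[aa] + (1.0 - alpha) * bg_species[aa]
            for aa in bg_scer}


def per_aa_llr(prion_like, background):
    """Natural-log likelihood ratio of each AA: prion-like vs. background."""
    return {aa: math.log(prion_like[aa] / background[aa]) for aa in prion_like}


# Hypothetical toy tables over a two-letter alphabet (not real frequencies).
bg_scer = {"Q": 0.5, "N": 0.5}
bg_species = {"Q": 0.3, "N": 0.7}
prion_like = {"Q": 0.6, "N": 0.4}

mixed = interpolate_background(0.5, bg_scer, bg_species)  # slider at 50%
llr = per_aa_llr(prion_like, mixed)
print(mixed, llr)
```

Because the mixture is a convex combination, the interpolated table still sums to 1 whenever both input tables do.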
The background AA frequencies for S. cerevisiae are built into the program. For any other species, PLAAC needs to know which frequencies to use. If α < 100%, the default is to compute background AA frequencies from the input sequences and then interpolate between these and the S. cerevisiae background frequencies, with the degree of interpolation given by the slider α.
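A minimal sketch of that default estimate, under stated assumptions: the background frequencies are taken to be the pooled AA composition of all input sequences, counting only the 20 standard amino acids. Skipping other letters (X, B, Z, U, '*', and so on) and using no pseudocounts are assumptions of this sketch, not documented PLAAC behavior, and the function name is hypothetical.

```python
from collections import Counter

STANDARD_AAS = "ACDEFGHIKLMNPQRSTVWY"


def background_from_sequences(sequences):
    """Pooled AA frequencies over all input sequences (standard AAs only)."""
    counts = Counter()
    for seq in sequences:
        # Assumption: case is ignored and non-standard letters are skipped.
        counts.update(aa for aa in seq.upper() if aa in STANDARD_AAS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found in the input")
    return {aa: counts[aa] / total for aa in STANDARD_AAS}


freqs = background_from_sequences(["MNNQQ", "maqx"])
print(freqs["Q"], freqs["N"])  # 0.375 0.25
```

In this sketch, pooling residues (rather than averaging per-protein compositions) weights each protein by its length.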
However, if the input set of sequences is not an entire proteome, it may not give a good estimate of background frequencies. In particular, if the input sequences were pre-selected to consist of proteins with prion-like domains (e.g., candidates for plotting), this is a bad idea, because the background AA frequencies can be strongly skewed. To avoid this problem, upload the entire proteome instead and select the individual sequences to plot on the next page.
Alternatively, one can paste or upload only a subset of proteins and select background frequencies from the precomputed sets we provide for several species. In both cases the frequencies will still be interpolated with the S. cerevisiae background AA frequencies (unless α = 0%).
Finer control over PLAAC, including the option to use customized background frequencies, is provided by the command-line (CLI) version of PLAAC, available from GitHub: https://github.com/whitehead/PLAAC/blob/master/cli/README.md
The tables in the online output display four output variables: COREscore, LLR, PAPAprop and PAPAfi. Definitions of these columns, along with other columns available only in the TSV summary file, are given below. Note that in the descriptions below, 'c' denotes the core length in residues entered by the user (as noted above, the default is 60 AA residues).
The original version of the PLAAC algorithm is described in Alberti et al. (2009); updated AA frequencies for the prion-like state, and adjustments for background frequencies of other species (with α = 50%), were used in Couthouis et al. (2011).
The default hidden Markov model (HMM) for PLAAC, which uses the updated AA frequencies for the prion-like state and AA frequencies from S. cerevisiae for the background state, is shown below; background AA frequencies may differ at runtime if α < 100% is used. For a detailed introduction to HMMs, see e.g. Chapter 3.2 of Durbin et al. (1998).
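To make the two-state structure concrete, the Python sketch below computes a Viterbi parse (the single most probable state path) for a two-state HMM with a background state and a prion-like state, working in log space. The transition and initial probabilities (`p_stay`, `p_start_prion`), the function name, and the toy emission tables are hypothetical placeholders, not the parameters of the PLAAC model; the sketch only illustrates the standard Viterbi recursion (see e.g. Durbin et al., 1998).

```python
import math


def viterbi_two_state(seq, prion_like, background, p_stay=0.99, p_start_prion=0.5):
    """Most probable state path for a two-state HMM.

    States: 0 = background, 1 = prion-like. prion_like and background map each
    amino acid to its emission probability. p_stay and p_start_prion are
    hypothetical illustration values, not PLAAC's parameters.
    """
    if not seq:
        return []
    emit = (background, prion_like)
    log_trans = [[math.log(p_stay), math.log(1.0 - p_stay)],
                 [math.log(1.0 - p_stay), math.log(p_stay)]]
    log_init = [math.log(1.0 - p_start_prion), math.log(p_start_prion)]

    # score[s]: best log-probability of any path that ends in state s
    score = [log_init[s] + math.log(emit[s][seq[0]]) for s in (0, 1)]
    backpointers = []
    for aa in seq[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[prev] + log_trans[prev][s] + math.log(emit[s][aa]))
            pointers.append(prev)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical toy emission tables over a two-letter alphabet.
prion_like = {"Q": 0.9, "A": 0.1}
background = {"Q": 0.1, "A": 0.9}
print(viterbi_two_state("AAAAQQQQQQAAAA", prion_like, background))
```

With these toy values the run of Q residues is parsed as prion-like (1) and the flanking A residues as background (0): `[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]`. A high `p_stay` penalizes switching states, so short isolated Q residues would not be enough to enter the prion-like state.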
In addition to HMM parses and profiles of per-residue posterior probabilities for each state, PLAAC produces plots of sliding averages of per-residue log-likelihood ratios for the prion-like versus background state. To illustrate that regions with prion-like composition are often predicted to be disordered by FoldIndex (Prilusky et al., 2005), a sliding-window version of the approach described in Uversky et al. (2000), these sliding averages are rescaled (by using base-4 rather than natural logarithms) and reversed in sign, giving a track called -PLAAC that is plotted alongside a re-implementation of FoldIndex. (There are also many regions that are predicted to be disordered by FoldIndex but do not score highly in PLAAC, because non-compensating charged residues favor disorder in FoldIndex but are not favorable for PLAAC.)

Plots also include a re-implementation of PAPA (Toombs et al., 2010; Toombs et al., 2012), multiplied by -4 so that lower scores are more predictive of prion propensity and so that its range is more comparable to the other tracks. A dashed green line shows the PAPA > 0.05 cutoff, rescaled in the same way.

Because the PAPA cutoffs are defined not for sliding averages but for sliding averages of sliding averages, by default we show "doubly-smoothed" versions of the plots, as in Toombs et al. (2012) and Kim et al. (2013); "singly-smoothed" versions can be produced with the command-line version of PLAAC.
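The sign flips and rescalings behind the -PLAAC and PAPA tracks can be sketched as follows. Dividing a natural-log ratio by ln 4 converts it to base 4, and because averaging is linear, rescaling before or after taking the sliding average gives the same result. The function names and toy frequencies are hypothetical, and the sliding averages themselves are omitted here.

```python
import math


def minus_plaac(seq, prion_like, background):
    """Per-residue -PLAAC values: base-4 log-likelihood ratio, sign reversed."""
    return [-math.log(prion_like[aa] / background[aa], 4) for aa in seq]


def rescale_papa(papa_scores):
    """Multiply PAPA scores by -4 so that lower values mean higher prion propensity."""
    return [-4.0 * s for s in papa_scores]


# The PAPA > 0.05 cutoff on the rescaled axis (the dashed line): -4 * 0.05 = -0.2.
PAPA_CUTOFF_RESCALED = -4.0 * 0.05

# Hypothetical toy emission tables.
prion_like = {"Q": 0.4, "A": 0.05}
background = {"Q": 0.1, "A": 0.2}
print(minus_plaac("QA", prion_like, background))  # approximately [-1.0, 1.0]
```

With the sign reversed, residues enriched in the prion-like state get negative values, matching the rescaled PAPA track, in which lower scores are more predictive of prion propensity.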
When computing sliding averages centered at each residue of a protein, say for window size w = 41, there is the question of what to do when part of the window falls outside the sequence. One option is not to score such windows at all (or to give them a score of NA), which shortens the profile by (w - 1)/2 = 20 positions at each end for singly-smoothed profiles, and by another 20 at each end for doubly-smoothed profiles. Another option is to ignore just those positions that fall outside the sequence when computing each average; in the extreme case of windows centered at terminal residues, the scores then represent averages of (w - 1)/2 + 1 = 21 residues rather than w = 41. For consistency, we use the same approach as PAPA for all three tracks: the second option for the first round of smoothing (so singly-smoothed profiles have the same length as the protein sequences, with no NA values), and a modified version of the first option for the second round of smoothing (giving NA values for the first and last (w - 1)/2 positions). The modification is that, rather than computing simple averages of singly-smoothed scores in the second round, we compute a weighted average, with weights proportional to the number of positions that contributed to each singly-smoothed score in the first round (ranging from (w - 1)/2 + 1 up to w). Users may also ignore the first and last w - 1 positions of the doubly-smoothed profiles, which then amounts to two applications of the unmodified first option above.
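The two-round scheme above can be sketched in Python as follows, with `None` standing in for NA. This is an illustrative reimplementation that assumes the per-residue scores arrive as a plain list and that `w` is odd; the function names are hypothetical, and this is not PLAAC's or PAPA's own code.

```python
def smooth_once(values, w):
    """First round: windows truncated at the sequence ends, so no NA values.

    Returns the averages and, for each position, how many residues contributed.
    """
    if w % 2 == 0:
        raise ValueError("window size w must be odd")
    h = (w - 1) // 2
    n = len(values)
    averages, counts = [], []
    for i in range(n):
        lo, hi = max(0, i - h), min(n, i + h + 1)
        window = values[lo:hi]
        averages.append(sum(window) / len(window))
        counts.append(len(window))
    return averages, counts


def smooth_twice(values, w):
    """Second round: weighted averages of first-round scores.

    Weights are the first-round residue counts; the first and last (w - 1)/2
    positions are NA (None).
    """
    h = (w - 1) // 2
    single, counts = smooth_once(values, w)
    n = len(values)
    double = [None] * n
    for i in range(h, n - h):
        weights = counts[i - h:i + h + 1]
        scores = single[i - h:i + h + 1]
        double[i] = sum(c * s for c, s in zip(weights, scores)) / sum(weights)
    return double


print(smooth_twice([1, 2, 3, 4, 5], 3))  # [None, 2.25, 3.0, 3.75, None]
```

Beyond the first and last w - 1 positions, every window in both rounds is complete, so all weights equal w and the result matches plain (unweighted) double smoothing.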
Alberti S, Halfmann R, King O, Kapila A, Lindquist S (2009)
A systematic survey identifies prions and illuminates sequence features of prionogenic proteins.
Cell 137:146-158.
Couthouis J, Hart MP, Shorter J, Dejesus-Hernandez M, Erion R, Oristano R, Liu AX, Ramos D, Jethava N, Hosangadi D, Epstein J, Chiang A, Diaz Z, Nakaya T, Ibrahim F, Kim HJ, Solski JA, Williams KL, Mojsilovic-Petrovic J, Ingre C, Boylan K, Graff-Radford NR, Dickson DW, Clay-Falcone D, Elman L, McCluskey L, Greene R, Kalb RG, Lee VM, Trojanowski JQ, Ludolph A, Robberecht W, Andersen PM, Nicholson GA, Blair IP, King OD, Bonini NM, Van Deerlin V, Rademakers R, Mourelatos Z, Gitler AD (2011)
A yeast functional screen predicts new candidate ALS disease genes.
Proc Natl Acad Sci USA 108(52):20881-20890.
Durbin R, Eddy S, Krogh A, Mitchison G (1998)
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
Cambridge University Press
Kim HJ, Kim NC, Wang YD, Scarborough EA, Moore J, Diaz Z, MacLea KS, Freibaum B, Li S, Molliex A, Kanagaraj AP, Carter R, Boylan KB, Wojtas AM, Rademakers R, Pinkus JL, Greenberg SA, Trojanowski JQ, Traynor BJ, Smith BN, Topp S, Gkazi AS, Miller J, Shaw CE, Kottlors M, Kirschner J, Pestronk A, Li YR, Ford AF, Gitler AD, Benatar M, King OD, Kimonis VE, Ross ED, Weihl CC, Shorter J, Taylor JP (2013)
Mutations in prion-like domains in hnRNPA2B1 and hnRNPA1 cause multisystem proteinopathy and ALS.
Nature 495(7442):467-473.
Lancaster AK, Nutter-Upham A, Lindquist S, King OD (2014)
PLAAC: a web and command-line application to identify proteins with Prion-Like Amino Acid Composition
Bioinformatics doi:10.1093/bioinformatics/btu310
Michelitsch MD, Weissman JS (2000)
A census of glutamine/asparagine-rich regions: implications for their conserved function and the prediction of novel prions.
Proc Natl Acad Sci USA 97(22):11910-11915.
Prilusky J, Felder C, Zeev-Ben-Mordehai T, Rydberg EH, Man O, Beckmann JS, Silman I, Sussman JL (2005)
FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically unfolded.
Bioinformatics 21:3435-3438
Toombs JA, McCarty BR, Ross ED (2010)
Compositional determinants of prion formation in yeast.
Mol Cell Biol 30:319-332.
Toombs JA, Petri M, Paul KR, Kan GY, Ben-Hur A, Ross ED (2012)
De novo design of synthetic prion domains.
Proc Natl Acad Sci USA 109:6519-6524.
Uversky VN, Gillespie JR, Fink AL (2000)
Why are 'natively unfolded' proteins unstructured under physiologic conditions?
Proteins 41:415-427.