Rankpep: Prediction of MHC peptide binding Help page by Pedro A. Reche


Rankpep:Overview

T cell immune responses are driven by antigenic epitopes, and hence their identification is important for understanding disease pathogenesis and etiology, and for vaccine design. There are two types of T cell epitopes, named CD8 and CD4, which are recognized only in the context of MHCI and MHCII molecules, respectively, by the corresponding T-cell types. Engaging both sets of T cells is desirable for mounting a strong defensive immune response against cancer cells and pathogens. Therefore, we have developed this site for the prediction of peptide binding to both MHCI and MHCII molecules, and the subsequent anticipation of CD8- and CD4-T cell epitopes. Appropriate processing of antigen peptides must occur prior to their binding to the relevant MHC molecules. In particular, the C-terminus of most MHCI-restricted epitopes (CD8-T cell epitopes) results from cleavage by the proteasome, and thus proteasome specificity is important for determining T-cell epitopes. Consequently, this site can also determine whether the C-terminus of the predicted MHCI ligands is the result of proteasomal cleavage. Also implemented in the RANKPEP web server is a variability masking feature that focuses the prediction on conserved epitopes, which could thus help to avoid immune evasion resulting from mutation.

Rankpep: MHC-peptide binding predictions

Peptides that bind to a given MHC molecule share sequence similarity. Not surprisingly, sequence patterns have traditionally been used for the prediction of peptides binding to MHC molecules. Such sequence patterns, however, have proven to be too simple, as the complexity of the binding motif cannot be precisely represented by the few residues present in the pattern [1]. To overcome this limitation, RANKPEP uses Position Specific Scoring Matrices (PSSMs), or profiles, derived from sets of aligned peptides known to bind to a given MHC molecule as the predictor of MHC-peptide binding.

PSSMs for the prediction of MHC-peptide binding

For a profile to be a good descriptor of the binding motif, peptides must be aligned by structural and/or sequence similarity [2]. MHCI and MHCII molecules bind peptides in similar yet different modes (Fig.1)(3,4), and alignments of MHCI- and MHCII-ligands were obtained to be consistent with the binding mode of the peptides to their MHC class.

Fig.1A MHCI molecule: HLA-A*0201 in complex with the peptide LLFGYPVYV from the HTLV-1 TAX protein (PDB: 1HHK)
Fig.1B MHCII molecule: HLA-DR1 in complex with the peptide PKYVKQNTLKLAT from Influenza A virus (PDB: 1FYT)

MHCI ligands are short (8-11 residues), as they are constrained within the MHCI peptide binding groove, with their N- and C-terminal ends connected by a network of hydrogen bonds to conserved residues of the MHCI molecule (Fig.1A). Thus, peptides bound to the same MHCI can differ in length by one or two amino acids, and as discussed in (5), proper structural alignment of these peptides is better guaranteed if the peptides are of the same length. Accordingly, we have separated the peptides bound to a given MHCI molecule into subsets containing only peptides of the same length, and created separate PSSMs from ungapped block alignments.

The peptide binding groove of MHCII molecules is open, binding peptides in such a manner that both the N- and C-termini can extend beyond the binding groove (Fig.1B), and thus peptides bound to MHCII molecules display great variability in length (9-22 residues). Yet only a peptide core of 9 residues fits into the MHCII binding groove, providing the binding energy. Poor amino acid sequence similarity between MHCII ligands, together with their great variability in length, makes them difficult to align. Thus, for the alignment of the MHCII ligands, we have used the motif discovery program MEME (6), including a priori information consistent with the MHCII-peptide binding mode: A) there is only one binding core per MHCII ligand; B) all the peptide sequences define the same motif; and C) the length of the motif is 9.

PSSMs were obtained from these alignments of MHC ligands using PROFILEWEIGHT (7) (for MHCI ligands), which uses branch-proportional sequence weights, or BLK2PSSM with position-based weights (8,9) (for MHCII ligands).

Scoring MHCI-peptide binding using PSSMs

The binding potential (score) of any peptide sequence (query) to a given MHCI is obtained by aligning the relevant PSSM with the protein segments, and adding up the profile scores that match the residue type and position in the profile. To search protein sequences for MHC ligands using PSSMs, we use a dynamic algorithm written in Python that scores all protein segments whose length equals the PSSM width, and sorts them by score. Scoring starts at the beginning of each sequence, and the PSSM is slid over the sequence one residue at a time until the end of the sequence is reached. Furthermore, to narrow down the potential binders from the list of ranked peptides, we defined a binding threshold as the score value that includes 90% of the peptides within the PSSM. This binding threshold is built into each of our matrices, delineating the range of putative binders among the top scoring peptides.
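The sliding-window scoring described above can be sketched as follows. This is a minimal illustration, not the server's code: the toy three-column matrix, the function names, the default score of 0 for residues missing from a column, and the 1-based positions are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy PSSM of width 3: one residue -> score table per position.
# Real matrices are wider and list all 20 amino acids in every column.
TOY_PSSM: List[Dict[str, float]] = [
    {"A": 2.0, "K": -1.0},
    {"L": 3.0, "A": 0.5},
    {"V": 4.0, "K": 1.0},
]


def score_peptide(peptide: str, pssm: List[Dict[str, float]],
                  default: float = 0.0) -> float:
    """Add up the profile scores matching each residue type and position."""
    return sum(column.get(residue, default)
               for residue, column in zip(peptide, pssm))


def rank_segments(protein: str,
                  pssm: List[Dict[str, float]]) -> List[Tuple[int, str, float]]:
    """Score every segment of PSSM width, sliding one residue at a time,
    and return (1-based position, segment, score) sorted by descending score."""
    width = len(pssm)
    hits = []
    for start in range(len(protein) - width + 1):
        segment = protein[start:start + width]
        hits.append((start + 1, segment, score_peptide(segment, pssm)))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    for position, segment, score in rank_segments("KALVA", TOY_PSSM):
        print(position, segment, score)
```

Keeping the per-peptide scoring separate from the window loop means the same scoring function can be reused on the aligned training peptides when a binding threshold is derived.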

Performance of PSSMs predicting MHC-peptide binding

If PSSMs are good predictors of MHC-peptide binding, MHC-restricted T cell epitopes should be expected among the high scoring peptides from within their protein sources. Under this assumption, we found that ~80% of MHCI-restricted epitopes are predicted at a 2% threshold of top scoring peptides. In contrast, ~80% of MHCII-restricted epitopes are found among the ~5% top scoring peptides. Thus, using these PSSMs, a larger number of predicted peptides is required for the correct identification of MHCII-restricted epitopes than of MHCI-restricted epitopes.

Rankpep: Cleavage Predictions

Anticipation of T-cell epitopes is heavily predicated on the prediction of MHC-peptide binding. Yet prior to MHC binding, correct peptide processing must occur. Processing of MHCII-restricted epitopes occurs in the endosomal compartment, and it is mediated by several endopeptidases in combination with amino- and carboxy-peptidases (10, 11). This complexity makes it difficult to identify any pattern related to the processing of class II restricted peptides. On the other hand, there is experimental evidence that the C-terminus of MHCI-restricted epitopes results solely from the proteolysis of cytosolic proteins mediated by the proteasome (12). The proteasome thus plays a vital role in determining CTL epitopes, and its specificity can be modeled from MHCI-restricted peptides and their C-terminal flanking regions using statistical language models.

Methods and Implementation

Cleavage by the proteasome occurs at preferential sites within the protein, and the sequence signals from antigenic peptides processed by the proteasome are especially conserved at position P1 of the cleavage site (the C-terminus of the antigenic peptide) and its immediately flanking P1' residue (13). Prediction of proteasomal cleavage resembles the problem of language tagging (modeling the location of grammatical tags such as punctuation signs), and thus we have used the SRI Language Modeling toolkit (SRILM) (14) for statistical modeling of proteasomal cleavage sites. Training sets for statistical modeling of proteasomal cleavage were obtained from a database containing the C-terminus and flanking regions of 332 antigens restricted by human MHCI molecules. Language Models for Proteasomal Cleavage Prediction (LMPCP) were created from training sets of peptide fragments of variable fragment length derived from the above database. LMPCP were tested at different cutpoint probabilities (0.35 to 0.70) using HIDDEN-NGRAM over testing files of peptide fragments not included in the training sets, as well as over their entire protein sources. For each probability threshold, 23 different LMPCP^N_i (N = 10, 6, 4; i = 1 N-2) were tested, where N is the size of the fragments in the training and testing sets and i is the order of the LMPCP tested. Best results (Predicted Cleavage Sites, PCS > 75%) were obtained from LMPCPs under soft cutting probabilities (0.35-0.5), indicating that the nature of the proteasome specificity is much less rigid than that of grammar tagging.

Models

The language models for proteasomal cleavage (LMPCP) available in RANKPEP are the following:

Model	Frag. Size (N)	Order(i)	Prob	PCS(%)(a)	ECS(%)(b)	a-b	Mean Size
ONE: LMPCP₁₀²	10	2	0.1	87.6	39.2	48.4	2.84
TWO: LMPCP₄²	4	2	0.45	71.4	36.2	35.2	3.92
THREE: LMPCP₄²	4	2	0.7	47.9	20.5	27.4	7.77

LMPCPs were obtained using N-GRAM-COUNT and tested using HIDDEN-NGRAM at different probability thresholds (Prob) as indicated.

Frag. Size (N): The size of the fragments in the training and testing sets.
Order (i): The order with which the models were built with N-GRAM-COUNT and tested with HIDDEN-NGRAM.
Prob: Probability above which a cutpoint is predicted with the relevant model.
PCS: Predicted Cleavage Sites.
ECS: Expected Cleavage Sites. Calculated using the equation 100*C/(N-1), where C is the average number of cutpoints per fragment yielded by a given LMPCP when tested on a file of fragments of size N.
Mean Size: The average size of the fragments yielded by HIDDEN-NGRAM when tested over the full-length source proteins of the peptide fragments in the test file.
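As a worked instance of the ECS formula in the legend, here is a minimal Python sketch. The function name and the sample value of C are illustrative assumptions and do not come from the server; the only thing taken from the text is the expression 100*C/(N-1), in which N-1 is the number of possible cut positions inside a fragment of size N.

```python
def expected_cleavage_sites(mean_cutpoints: float, fragment_size: int) -> float:
    """ECS (%) = 100 * C / (N - 1).

    mean_cutpoints -- C, average number of cutpoints per fragment
    fragment_size  -- N, number of residues per fragment
    """
    if fragment_size < 2:
        raise ValueError("a fragment needs at least 2 residues to contain a cutpoint")
    return 100.0 * mean_cutpoints / (fragment_size - 1)


if __name__ == "__main__":
    # Hypothetical model: 1.5 cutpoints on average in fragments of 4 residues,
    # i.e. half of the 3 possible cut positions are used.
    print(expected_cleavage_sites(1.5, 4))
```

The difference between PCS and ECS reported in the table (column a-b) then measures how much better a model does at the true C-terminal site than at an arbitrary position.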

Rankpep: immunodominance filter

T-cell epitope immunogenicity is contingent on several factors: 1) appropriate and effective processing of a peptide from its protein source, 2) stable peptide binding to the MHC molecule, and 3) the ability of the TCR to recognize the MHC-bound peptide. Appropriate computational modeling of these three processes is required for accurate prediction of T-cell epitopes. Until now, only peptide-MHC binding, and processing of peptides for MHC class I restricted epitopes, have been considered in epitope prediction algorithms. However, in this version of RANKPEP we have implemented a model for immunodominant recognition of peptides by the TCR.

Methods and Implementation
Modeling of potential immunodominance features in peptides that may make them more readily recognized by TCRs was approached by comparing a set of immunogenic peptides with a set of non-immunogenic peptides. To minimize any contribution due to MHC binding and processing, only high affinity peptide binders to MHC class I molecules were chosen, and flanking regions were not considered. The immunogenic peptide set consisted of 101 peptides with high MHC binding and high T-cell activity. The non-immunogenic set consisted of ~400 peptides with high MHC binding and no T-cell activity. All peptides considered were 9mers. Modeling of the differences between the two sets was carried out using Support Vector Machines (SVMs) trained on these two sets on the basis of their residue composition, their amino acid sequence, and both.

Results
SVM classifiers trained on the sequence [1], on physico-chemical properties [2], or on the combination of both [1+2] are able to distinguish immunogenic from non-immunogenic peptides, but with low accuracy. The classifier based on physico-chemical properties outperforms the classifier based on sequence alone, suggesting that the properties of peptides may play a critical role in deciding immunodominance.
SVM-based immunodominance classifiers trained on residue properties and amino acid sequence were able to discriminate immunogenic from non-immunogenic peptides with an accuracy of 60.0% at a threshold of 0.5. This low accuracy may be due to the fact that we are pooling together all peptides regardless of MHCI restriction. Unfortunately, at this time there are not enough data on immunogenic and non-immunogenic peptides for a single MHCI restriction element.

Models
In RANKPEP we have chosen an immunodominance filter based on an SVM classifier trained on both residue properties and amino acid sequence. If this filter is set ON, only those peptides that are classified as immunodominant are returned by the server. Since immunodominance classification is threshold dependent, we provide three optional thresholds to choose from.

Threshold 1.0: 49.5% sensitivity, 76.0% specificity
Threshold 0.5: 59.4% sensitivity, 69.4% specificity (Default)
Threshold 0.0: 68.3% sensitivity, 60.9% specificity

Due to the low accuracy of the classification method, we recommend using this immunodominance filter only for large, genome-scale epitope predictions, as a means to limit the number of potential epitopes.
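The trade-off between sensitivity and specificity across the three thresholds can be illustrated with a short Python sketch. The decision values and labels below are invented toy data, and the rule that a peptide is called immunodominant when its classifier output is greater than or equal to the threshold is an assumption; the sketch only shows how the two figures are computed, not how the server's SVM works.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(scores: Sequence[float],
                            is_immunogenic: Sequence[bool],
                            threshold: float) -> Tuple[float, float]:
    """Call a peptide immunodominant when score >= threshold and
    return (sensitivity, specificity) against the known labels."""
    tp = fn = tn = fp = 0
    for score, positive in zip(scores, is_immunogenic):
        predicted = score >= threshold
        if positive and predicted:
            tp += 1
        elif positive:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical classifier outputs: four immunogenic, four non-immunogenic.
    toy_scores = [1.2, 0.7, 0.2, -0.3, 0.8, 0.1, -0.5, -1.0]
    toy_labels = [True, True, True, True, False, False, False, False]
    for threshold in (1.0, 0.5, 0.0):
        print(threshold, sensitivity_specificity(toy_scores, toy_labels, threshold))
```

Lowering the threshold recovers more of the immunogenic peptides at the cost of letting more non-immunogenic peptides through, which is the same pattern as in the three settings listed above.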

Rankpep: Sequence variability masking

We have noted that mutation offers a means of immune evasion exploited by some pathogenic organisms such as HIV. To address this problem, RANKPEP predictions can be obtained from multiple sequence alignments. In this case, the server first creates a consensus sequence in which the variable positions are masked, and binding predictions are restricted to segments with no variable positions. Sequence variability is calculated from multiple amino acid sequence alignments as indicated by Reche and Reinherz (15), using a variability metric (V) formally identical to the Shannon entropy equation (16):

V = -\sum_{i=1}^{M} P_i \log_2 P_i

where P_i is the fraction of residues of amino acid type i, and M is equal to 20, the number of amino acid types. V ranges from 0 (total conservation, only one amino acid type is present at that position) to 4.322 (all 20 amino acids are equally represented at that position). Note that in order to achieve the maximum value V = 4.322, at least 20 sequences are required. Gap symbols (-) are considered for deriving the consensus sequence but are not counted in the variability calculations. Given a sequence variability threshold Vt, the consensus sequence takes the most common amino acid at those positions of the alignment with V <= Vt, whereas variable positions (V > Vt) are masked and represented in the consensus sequence with a "." symbol. Segments containing a masked position are not considered in the RANKPEP predictions of MHC-peptide binding.
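The variability calculation and the masking rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than the server's implementation: the function names and the toy alignment are hypothetical, and ties for the most common residue are broken by first occurrence in the column, which the text does not specify.

```python
import math
from collections import Counter
from typing import List


def variability(column: str) -> float:
    """Shannon entropy in bits of one alignment column; gaps are not counted."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    # p * log2(1/p) is the same as -p * log2(p) and avoids returning -0.0.
    return sum((n / total) * math.log2(total / n)
               for n in Counter(residues).values())


def masked_consensus(alignment: List[str], vt: float = 1.0) -> str:
    """Most common symbol where V <= vt, '.' where V > vt.
    Gap symbols take part in the consensus vote, as stated in the text."""
    consensus = []
    for column in zip(*alignment):
        col = "".join(column)
        if variability(col) > vt:
            consensus.append(".")
        else:
            consensus.append(Counter(col).most_common(1)[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "AGF", "ACG"]  # hypothetical, equal-length rows
    print(masked_consensus(toy_alignment, vt=1.0))
    print(variability("ACDEFGHIKLMNPQRSTVWY"))  # 20 different residues
```

In the toy alignment the first column is fully conserved (V = 0), the second has V of about 0.81 and is kept at the default threshold, and the third has V = 2 and is masked.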

References

1. Ruppert J, Sidney J, Celis E, Kubo RT, Grey HM, Sette A: Prominent role of secondary anchor residues in peptide binding to HLA-A2.1 molecules. Cell 74:929, 1993.
2. Gribskov M, McLachlan AD, Eisenberg D: Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci USA 84:4355, 1987.
3. Madden DR: The three-dimensional structure of peptide-MHC complexes. Annu. Rev. Immunol. 13:587, 1995.
4. Stern LJ, Wiley DC: Antigen peptide binding by class I and class II histocompatibility proteins. Structure 2:245, 1994.
5. Reche PA, Glutting JP and Reinherz EL. Prediction of MHC Class I Binding Peptides Using Profile Motifs. Human Immunology 63, 701-709 (2002).
6. Bailey, T. L. and C. Elkan (1995). "The value of prior knowledge in discovering motifs with MEME." Proc Int Conf Intell Syst Mol Biol 3: 21-9.
7. Thompson, J. D., D. G. Higgins, et al. (1994). "Improved sensitivity of profile searches through the use of sequence weights and gap excision." Comput Appl Biosci 10(1): 19-29.
8. Henikoff, S., J. G. Henikoff, et al. (1999). "Blocks+: a non-redundant database of protein alignment blocks derived from multiple compilations." Bioinformatics 15(6): 471-9.
9. Henikoff, S. and J. G. Henikoff (1994). "Position-based sequence weights." J Mol Biol 243(4): 574-8.
10. Pieters, J. (2000). "MHC class II-restricted antigen processing and presentation." Adv Immunol. 75: 159-208.
11. Watts, C. (2001). "Antigen processing in the endocytic compartment." Curr Opin Immunol 13(1): 26-31.
12. Craiu, A., T. Akopian, et al. (1997). "Two distinct proteolytic processes in the generation of a major histocompatibility complex class I-presented peptide." Proc Natl Acad Sci U S A 94(20): 10850-5.
13. Altuvia, Y. and H. Margalit (2000). "Sequence signals for generation of antigenic peptides by the proteasome: implications for proteasomal cleavage mechanism." J Mol Biol 295(4): 879-90.
14. Stolcke, A. (2002). SRILM -- An Extensible Language Modeling Toolkit. Proceedings of the International Conference of Spoken Language Processing. T. M. N. J. J. Ohala, B. L.
15. Reche, P. A. and E. L. Reinherz (2003). "Sequence variability analysis of human class I and class II MHC molecules: functional and structural correlates of amino acid polymorphisms." J Mol Biol 331(3): 623-41.
16. Shannon, C. E. (1948). "The mathematical theory of communication." The Bell System Technical Journal 27: 379-423, 623-656.

RANKPEP methods are described in full detail in the following publications:

Reche PA, Glutting JP and Reinherz EL.
Prediction of MHC Class I Binding Peptides Using Profile Motifs. Human Immunology 63, 701-709 (2002).

Reche PA, Glutting JP and Reinherz EL. Enhancements to the RANKPEP server for the prediction of MHC-binding peptides using profiles.
In preparation.

Rankpep: Usage help

PSSM: Position-specific scoring matrix or profile.

Profiles basically consist of a table listing the observed sequence-weighted frequency of all amino acids in every column of a sequence alignment. Peptide alignments and PSSMs for the prediction of MHCI and MHCII were obtained differently as indicated here.

RANKPEP includes a selection of 102 and 80 PSSMs for the prediction of peptide binding to MHCI and MHCII molecules, respectively. Several PSSMs for the prediction of peptide binders of different sizes are usually available for each MHCI molecule. By default, we suggest using the PSSMs for the prediction of peptides of 9 residues. MHCII-specific PSSMs are always for the prediction of peptide binders of 9 residues.

Users can also upload their own PSSMs, and use them as the predictors of peptide-MHC binding. Profiles must be in text format, in the form of a table, with the amino acid types arranged in columns and specified in a header line. An example of a PSSM for the prediction of peptide binding to the mouse MHC molecule K^b is shown here.

SEQUENCE INPUT

The input query for RANKPEP can be either one or more protein sequences in FASTA format or a Multiple Sequence Alignment (MSA) in CLUSTALW format. If an MSA is provided, MHC-peptide binding predictions are obtained from a consensus sequence with the variable positions masked.

The input query can be pasted or uploaded from a local file. If a local file is uploaded, it will be processed by default. Files uploaded onto the server must contain only ASCII characters. In other words, make sure you save your files as TEXT before you make any attempt at uploading them onto this server. Otherwise, your computer might, literally, explode in front of your nose. Please note that the MIF will not be liable for any personal injury derived from such misuse of the server.

Example of sequences in FASTA format:

>HLA
MNTTVTTGLLLNGSYSENRTELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNL
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNCVQRTYVACHIRSVIIWLETISKKECKNTSGTKSGNKRAPGP
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFV
>A56881  PIR2 release 71.00
MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT
NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW
KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG
YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI
VIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAPGVKSYPDGWNLPG
GGVQRGNILNLNGAGDPLTPGYPANEYAYRRGIAEAVGLPSIPVHPIGYY
DAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTN
EVTRIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAVVHEIVR
SFGTLKKEGWRPRRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYI
NADSSIEGNYTLRVDCTPLMYSLVHNLTKELKSPDEGFEGKSLYESWTKK
SPSPEFSGMPRISKLGSGNDFEVFFQRLGIASGRARYTKNWETNKFSGYP
LYHSVYETYELVEKFYDPMFKYHLTVAQVRGGMVFELANSIVLPFDCRDY
AVVLRKYADKIYSISMKHPQEMKTYSVSFDSLFSAVKNFTEIASKFSERL
QDFDKSNPIVLRMMNDQLMFLERAFIDPLGLPDRPFYRHVIYAPSSHNKY
AGESFPGIYDALFDIESKVDPSKAWGEVKRQIYVAAFTVQAAAETLSEVA
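
Before uploading, a file like the one above can be checked locally with a few lines of Python. This is a hypothetical helper, not part of RANKPEP: it assumes the usual FASTA convention of a ">" header line followed by one or more sequence lines, and it rejects non-ASCII input in line with the requirement stated above.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Return {header: sequence} for FASTA text; sequence lines are joined."""
    if not text.isascii():
        raise ValueError("input must contain only ASCII characters")
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


if __name__ == "__main__":
    example = ">HLA\nMNTTVTTGLL\nLNGSYSENRT\n>second entry\nMWNLLHETDS\n"
    for name, sequence in parse_fasta(example).items():
        print(name, len(sequence))
```

Records that share the same header line would overwrite each other in this sketch, so headers are assumed to be unique.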

Example of MSA in ClustalW format:

CLUSTAL W (1.81) multiple sequence alignment


hla_a68w_1HSB       VSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWD
hla_a0201_1DUY      VSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWD
hla_b3501_1A1N      MSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRPPWIEQEGPEYWD
hla_b5301_1A1M      MSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRPPWIEQEGPEYWD
hla_b5101_1E27      MSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRAPWIEQEGPEYWD
hla_b2701_1HSA      VSRPGRGEPRFITVGYVDDTLFVRFDSDAASPREEPRAPWIEQEGPEYWD
hla_cw3_1EFX        VSRPGRGEPHFIAVGYVDDTQFVRFDSDAASPRGEPRAPWVEQEGPEYWD
hla-cw4_1IM9        VSWPGRGEPRFIAVGYVDDTQFVRFDSDAASPRGEPREPWVEQEGPEYWD
mkb_2vaa            VSRPGLGEPRYMEVGYVDDTEFVRFDSDAENPRYEPRARWMEQEGPEYWE
db-1BZ9             VSRPGLEEPRYISVGYVDNKEFVRFDSDAENPRYEPRAPWMEQEGPEYWE
                    :* **  **::: *****:. ******** . * ***  *:********:

hla_a68w_1HSB       QTDRVDLGTLRGYYNQSEAGSHTIQMMYGCDVGSDGRFLRGYRQDAYDGK
hla_a0201_1DUY      QTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGSDWRFLRGYHQYAYDGK
hla_b3501_1A1N      QTYRESLRNLRGYYNQSEAGSHIIQRMYGCDLGPDGRLLRGHDQSAYDGK
hla_b5301_1A1M      QTYRENLRIALRYYNQSEAGSHIIQRMYGCDLGPDGRLLRGHDQSAYDGK
hla_b5101_1E27      QTYRENLRIALRYYNQSEAGSHTWQTMYGCDVGPDGRLLRGHNQYAYDGK
hla_b2701_1HSA      QTDREDLRTLLRYYNQSEAGSHTLQNMYGCDVGPDGRLLRGYHQDAYDGK
hla_cw3_1EFX        QTDRVSLRNLRGYYNQSEAGSHIIQRMYGCDVGPDGRLLRGYDQYAYDGK
hla-cw4_1IM9        QADRVNLRKLRGYYNQSEDGSHTLQRMFGCDLGPDGRLLRGYNQFAYDGK
mkb_2vaa            QSFRVDLRTLLGYYNQSKGGSHTIQVISGCEVGSDGRLLRGYQQYAYDGC
db-1BZ9             QWFRVSLRNLLGYYNQSAGGSHTLQQMSGCDLGSDWRLLRGYLQFAYEGR
                    *  * .*     *****  ***  * : **::*.* *:***: * **:* 

hla_a68w_1HSB       RSWTAADMAAQTTKHKWEAAHVAEQWRAYLEGTCVEWLRRYLENGKETLQ
hla_a0201_1DUY      RSWTAADMAAQTTKHKWEAAHVAEQLRAYLEGTCVEWLRRYLENGKETLQ
hla_b3501_1A1N      SSWTAADTAAQITQRKWEAARVAEQLRAYLEGLCVEWLRRYLENGKETLQ
hla_b5301_1A1M      SSWTAADTAAQITQRKWEAARVAEQLRAYLEGLCVEWLRRYLENGKETLQ
hla_b5101_1E27      SSWTAADTAAQITQRKWEAAREAEQLRAYLEGLCVEWLRRHLENGKETLQ
hla_b2701_1HSA      SSWTAADTAAQITQRKWEAARVAEQLRAYLEGECVEWLRRYLENGKETLQ
hla_cw3_1EFX        RSWTAADTAAQITQRKWEAAREAEQLRAYLEGLCVEWLRRYLKNGKETLQ
hla-cw4_1IM9        RSWTAADTAAQITQRKWEAAREAEQRRAYLEGTCVEWLRRYLENGKETLQ
mkb_2vaa            KTWTAADMAALITKHKWEQAGEAERLRAYLEGTCVEWLRRYLKNGNATLL
db-1BZ9             KTWTAADMAAQITRRKWEQSGAAEHYKAYLEGECVEWLHRYLKNGNATLL
                     :***** **  *::*** :  **: :***** *****:*:*:**: **

The user can also provide

BINDING THRESHOLD

The threshold of peptides predicted to bind to a given MHC molecule can be set as a given percentage of top scoring peptides or as a fixed number of top scoring peptides. In addition, PSSMs in RANKPEP are associated with a PSSM-specific binding threshold (PSBT) above which sorted peptides are highlighted in the results page. The PSBT is obtained by scoring the peptides included in the PSSM alignment, so that 85% of these peptides score above the PSBT. In general, we found that ~80% of MHCI-restricted epitopes are predicted at a 2% threshold of top scoring peptides. On the other hand, ~80% of MHCII-restricted epitopes are found among the ~5% top scoring peptides. Thus, we suggest using a 2-3% binding threshold of top scoring peptides for the prediction of peptide binders to MHCI, and a 4-6% threshold for the prediction of MHCII-peptide binders.
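Both kinds of threshold can be sketched in a few lines of Python. The function names and sample numbers are hypothetical, and two details are assumptions because the text does not fix them: counts are rounded up, and a peptide scoring exactly at the threshold is counted as included.

```python
import math
from typing import List, Sequence


def pssm_binding_threshold(training_scores: Sequence[float],
                           percent_included: float = 85.0) -> float:
    """Score reached by the top `percent_included` % of the peptides
    in the alignment the PSSM was built from."""
    ranked = sorted(training_scores, reverse=True)
    k = max(1, math.ceil(len(ranked) * percent_included / 100))
    return ranked[k - 1]


def top_percent(ranked_hits: Sequence, percent: float) -> List:
    """Keep the top `percent` % of an already score-sorted hit list
    (always at least one hit)."""
    n = max(1, math.ceil(len(ranked_hits) * percent / 100))
    return list(ranked_hits[:n])


if __name__ == "__main__":
    # Hypothetical scores of 20 training peptides against their own PSSM.
    scores = [float(s) for s in range(1, 21)]
    print(pssm_binding_threshold(scores))
    # A protein yielding 200 candidate segments, cut at the suggested 2%.
    print(len(top_percent(list(range(200)), 2)))
```

The percentage cut depends only on the length of the query, whereas the PSSM-specific threshold is a fixed score, so the two can select different numbers of peptides for the same protein.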

PROTEASOME CLEAVAGE

The C-terminus of MHCI-restricted peptides is generated by the proteasome, and thus RANKPEP also determines whether the C-terminus of each predicted MHCI-peptide binder is the result of proteasomal cleavage. These sequences are highlighted in purple in the output results. Proteasomal cleavage predictions are carried out using three optional models obtained by applying statistical language models to a set of known epitopes restricted by human MHCI molecules, as indicated here.

Variable parameters in language models for the prediction of proteasomal cleavage include the length of the fragments used to create the training set, the window size used to train the model and for determining the cutpoint probabilities in the tested peptide, and the cutpoint insertion threshold. Three of the best models are available for prediction of C-terminus cleavage. These were constructed as follows:

ONE: This model was built using peptide fragments of ten amino acids in length, and training and prediction windows of two amino acids in length. The cutpoint insertion threshold is set at .35.
TWO: This model was built using peptide fragments of six amino acids in length, and training and prediction windows of 6 amino acids in length. The cutpoint insertion threshold is set at .40.
THREE: This model was built using peptide fragments of four amino acids in length, and training and prediction windows of two amino acids in length. The cutpoint insertion threshold is set at .50.

RESTRICT RESULTS BY MW

This feature restricts the output to those peptides that fall within a molecular weight (MW) window set by the user. By default, all predicted peptides are shown.
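A molecular weight window of this kind can be sketched in Python as below. The mass table holds standard average residue masses and is an assumption: the values used by the server are not given on this page and may differ, so results need not agree exactly with the MW column of the output. The function names are hypothetical.

```python
from typing import List

# Standard average residue masses in Da (assumed, not taken from the server).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water for the free N- and C-termini


def molecular_weight(peptide: str) -> float:
    """Average molecular weight of an unmodified linear peptide, in Da."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


def within_mw_window(peptides: List[str], low: float, high: float) -> List[str]:
    """Keep only the peptides whose weight lies inside [low, high]."""
    return [p for p in peptides if low <= molecular_weight(p) <= high]


if __name__ == "__main__":
    candidates = ["GFTGNFSTQK", "GG"]
    print(within_mw_window(candidates, 1000.0, 1200.0))
```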

VARIABILITY MASKING

If the user inputs a multiple sequence alignment, the server first creates a consensus sequence in which the variable positions are masked, and binding predictions are restricted to segments with no variable positions. Variable positions are obtained using the Shannon entropy equation as the variability measure.

The default variability threshold for masking is 1.0 (positions with a variability above 1.0 will be masked). However, this threshold can be set to any value between 0 and 4.3, the limit values of the Shannon entropy equation. If a variability threshold of 4.3 is chosen, binding predictions will be carried out from a consensus sequence without the variable positions masked. On the other hand, if the variability threshold is set to 0, all positions in the multiple sequence alignment that are not 100% conserved will be masked in the consensus sequence.

Rankpep: Results help

The output of RANKPEP consists of a list of peptides ordered by their binding potential (score) to the selected MHC molecule. In addition, RANKPEP's output also includes the following information:

RANK, Relative rank of the predicted peptide
POS., Position of the peptide in the input protein
N, Amino acid sequence of the three residues preceding the N-terminus of the predicted peptide
SEQUENCE, Amino acid sequence of predicted peptide
C, Amino acid sequence of the three residues following the C-terminus of the predicted peptide
MW(Da), Molecular weight in Daltons of the predicted peptide
SCORE, Score of the peptide
% OPT., Percentile score of the predicted peptide relative to that of the consensus.
The consensus is the sequence that yields the maximum score, namely optimal score, with the selected profile.
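
The consensus, the optimal score and the % OPT. value follow directly from the profile, as the short Python sketch below shows. The toy matrix and function names are hypothetical. The percentage is computed as 100 * score / optimal score rounded to two decimals, which reproduces the figures in the example output on this page (for instance, a score of 91.0 against an optimal score of 162.0 gives 56.17 %).

```python
from typing import Dict, List, Tuple

# Hypothetical toy PSSM: one residue -> score table per position.
TOY_PSSM: List[Dict[str, float]] = [
    {"A": 2.0, "K": -1.0},
    {"L": 3.0, "A": 0.5},
    {"V": 4.0, "K": 1.0},
]


def consensus_and_optimal(pssm: List[Dict[str, float]]) -> Tuple[str, float]:
    """Pick the best-scoring residue in every column; their sum is the
    optimal score."""
    consensus = "".join(max(column, key=column.get) for column in pssm)
    optimal = sum(max(column.values()) for column in pssm)
    return consensus, optimal


def percent_optimal(score: float, optimal: float) -> float:
    """Score of a peptide as a percentage of the optimal score."""
    return round(100.0 * score / optimal, 2)


if __name__ == "__main__":
    print(consensus_and_optimal(TOY_PSSM))
    print(percent_optimal(91.0, 162.0))
```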

The following is an example of the RANKPEP output:

Matrix: 10mer_HLA_A11_A_1101_.pwp
Consensus: ATYYGSVVYK
Optimal Score: 162.0
Binding Threshold: 71.00
Protein 1 of 1:

>A56881 PIR2 release 71.00

RANK	POS.	N	SEQUENCE	C	MW (Da)	SCORE	% OPT.
1	491	FEG	KSLYESWTKK	SPS	1246.455	91.0	56.17 %
2	601	RDY	AVVLRKYADK	IYS	1162.395	90.0	55.56 %
3	577	LTV	AQVRGGMVFE	LAN	1093.265	78.0	48.15 %
4	690	YRH	VIYAPSSHNK	YAG	1115.255	77.0	47.53 %
5	207	RYG	KVFRGNKVKN	AQL	1189.405	76.0	46.91 %
6	63	AFL	DELKAENIKK	FLY	1187.355	76.0	46.91 %
7	398	VHE	IVRSFGTLKK	EGW	1148.405	73.0	45.06 %
8	530	RLG	IASGRARYTK	NWE	1122.295	70.0	43.21 %
9	454	NAD	SSIEGNYTLR	VDC	1139.235	70.0	43.21 %
10	731	VKR	QIYVAAFTVQ	AAA	1139.315	67.0	41.36 %
11	614	IYS	ISMKHPQEMK	TYS	1228.485	67.0	41.36 %
12	332	VGP	GFTGNFSTQK	VKM	1086.155	67.0	41.36 %
13	201	GKI	VIARYGKVFR	GNK	1208.475	66.0	40.74 %
14	85	HLA	GTEQNFQLAK	QIQ	1135.235	65.0	40.12 %
15	206	ARY	GKVFRGNKVK	NAQ	1132.355	64.0	39.51 %

The PSSM-specific binding threshold reported by RANKPEP is obtained by scoring all the peptide sequences included in the alignment from which a profile is derived, and is defined as the score value that includes 85% of the peptides within the set. Peptides whose score is above the binding threshold will appear highlighted in red. Peptides produced by the cleavage prediction model are highlighted in violet.

Related Resources

MHC Databases: gene sequence, polymorphisms, etc

IMGT: ImMunoGeneTics database
IMGT/HLA database
dbMHC Database at NCBI
Allele Frequencies
HLA Informatics group
IHWG: International Histocompatibility Working Group
Genetics and Molecular Genetics of the MHC
The Tumor Gene Database

Peptide Databases

MHCPEP
SYFPEITHI
HIV Molecular Immunology Database
MHCPEP HLA Ligand/Motif DATABASE
MHCBN Database: Comprehensive Database of MHC Binding and Non-binding Peptides
HLA Ligand/Motif DATABASE
JenPep Database: MHC and TAP ligands, plus T and B cell epitopes
FIMM Database: T and B cell epitopes
MPID: MHC-peptide interaction Database

Online Resources for the prediction of MHC-peptide binding
Server	MHC Class	Method
BIMAS	I	Quantitative matrices (QM)
MHCPred	I	QM
Epitope binding prediction	I	QM
ProPred	II	QM
Epipredict	II	QM
MMPRED	I	QM. Predict mutated binders
LPPEP	I	Linear Programming (only HLA-A2)
SVMHC	I	Support vector machines (SVM).
nHLAPred	I	Artificial Neural Networks (ANN)
NetMHC	I	ANN, only human HLA-A2, and mouse H2K
SYFPEITHI	I and II	Motif matrices (MM). Only a few MHCII molecules
RANKPEP	I and II	Position Specific Scoring Matrices (PSSM)
PREDEP	I	MM
PROPRED1	I	QM, also proteasomal cleavage and promiscuous peptides
CTLPRED	I	Combination of ANN, and SVM
MHC-Thread	II	Structural Peptide Threading

Prediction of Proteasomal cleavage and TAP binding

PAProC. Proteasomal cleavage. Artificial Neural Network
NETCHOP. Immunoproteasomal cleavage. Artificial Neural Network
RANKPEP. Immunoproteasomal cleavage. N-gram statistics
Epitope. TAP binding predictions

Companies offering T cell epitope predictions

Epivax . Based on EpiMatrix algorithm. Motif matrices. Class I MHC molecules
Vaccinome . Based on TEPITOPE algorithm. Quantitative and virtual matrices. Class II MHC molecules
Biovation. Based on peptide-MHC threading. Class I MHC molecules