RANKPEP: Prediction of MHC-restricted ligands


T cell immune responses are driven by antigenic epitopes, and hence their identification is important for understanding disease pathogenesis and etiology, and for vaccine design. There are two types of T cell epitopes, CD8 and CD4, which are recognized only in the context of MHCI and MHCII molecules, respectively, by the corresponding T-cell types. Engaging both sets of T cells is desirable for mounting a strong defensive immune response against cancer cells and pathogens. Therefore, we have developed this site for the prediction of peptide binding to both MHCI and MHCII molecules, and the subsequent anticipation of CD8 and CD4 T-cell epitopes. Appropriate processing of antigen peptides must occur prior to their binding to the relevant MHC molecules. Notably, the C-terminus of most MHCI-restricted epitopes (CD8 T-cell epitopes) results from cleavage by the proteasome, and thus proteasome specificity is important for determining T-cell epitopes. Consequently, this site can also determine whether the C-terminus of the predicted MHCI ligands is the result of proteasomal cleavage. The RANKPEP web server also implements a variability masking feature that focuses the prediction on conserved epitopes, which could help to avoid immune evasion resulting from mutation.

Rankpep: MHC-peptide binding predictions

Peptides that bind to a given MHC molecule share sequence similarity. Not surprisingly, sequence patterns have traditionally been used for the prediction of peptides binding to MHC molecules. Such sequence patterns, however, have proven to be too simple, as the complexity of the binding motif cannot be precisely represented by the few residues present in the pattern (1). To overcome this limitation, RANKPEP uses Position Specific Scoring Matrices (PSSMs), or profiles, derived from sets of aligned peptides known to bind to a given MHC molecule as the predictor of MHC-peptide binding.

PSSMs for the prediction of MHC-peptide binding
For a profile to be a good descriptor of the binding motif, peptides must be aligned by structural and/or sequence similarity (2). MHCI and MHCII molecules bind peptides in similar yet distinct modes (Fig. 1) (3, 4), and the alignments of MHCI and MHCII ligands were built to be consistent with the binding mode of the peptides to their MHC class.

Fig. 1A. MHCI molecule: HLA-A*0201 in complex with the peptide LLFGYPVYV from HTLV-1 TAX protein (PDB: 1HHK).

Fig. 1B. MHCII molecule: HLA-DR1 in complex with the peptide PKYVKQNTLKLAT from Influenza A virus (PDB: 1FYT).

MHCI ligands are short (8-11 residues), as they are constrained within the MHCI peptide binding groove, with their N- and C-terminal ends connected by a network of hydrogen bonds to conserved residues of the MHCI molecule (Fig. 1A). Thus, peptides bound to the same MHCI molecule can differ in length by one or two amino acids, and as discussed in (5), proper structural alignment of these peptides is better guaranteed if the peptides are of the same length. Accordingly, we have separated the peptides bound to a given MHCI molecule into subsets containing only peptides of the same length, and created separate PSSMs from ungapped block alignments.

The peptide binding groove of MHCII molecules is open, so that both the N- and C-terminus of a bound peptide can extend beyond the groove (Fig. 1B), and thus peptides bound to MHCII molecules display great variability in length (9-22 residues). Yet only a peptide core of 9 residues fits into the MHCII binding groove, providing the binding energy. Poor amino acid sequence similarity among MHCII ligands, together with their great variability in length, makes them difficult to align. Thus, for the alignment of the MHCII ligands, we have used the motif discovery program MEME (6), including a priori information consistent with the MHCII-peptide binding mode: A) there is only one binding core per MHCII ligand; B) all the peptide sequences define the same motif; and C) the length of the motif is 9.

PSSMs were obtained from these alignments of MHC ligands using PROFILEWEIGHT (7), which uses branch-proportional sequence weights, for MHCI ligands, or BLK2PSSM with position-based weights (8, 9) for MHCII ligands.

Scoring MHCI-peptide binding using PSSMs
The binding potential (score) of any peptide sequence (query) to a given MHCI molecule is obtained by aligning the relevant PSSM with the protein segments, and adding up the profile scores that match the residue type and position in the profile. To search protein sequences for MHC ligands using PSSMs, we use a dynamic algorithm written in Python that scores all protein segments whose length equals the PSSM width, and sorts them accordingly. Scoring starts at the beginning of each sequence, and the PSSM is slid over the sequence one residue at a time until the end of the sequence. Furthermore, to narrow down the potential binders from the list of ranked peptides, we defined a binding threshold as the score value that includes 90% of the peptides within the PSSM. This binding threshold is built into each of our matrices, delineating the range of putative binders among the top scoring peptides.
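The sliding-window scoring described above can be sketched as follows. This is a minimal illustration, not the RANKPEP source: the function name `score_segments`, the list-of-dicts PSSM layout, the default score of 0.0 for residues missing from a column, and the toy matrix values are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Assumed layout: one dict per PSSM column, mapping residue -> score.
Pssm = List[Dict[str, float]]


def score_segments(sequence: str, pssm: Pssm,
                   unknown: float = 0.0) -> List[Tuple[int, str, float]]:
    """Slide the PSSM over the sequence one residue at a time and score
    every segment whose length equals the PSSM width."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        segment = sequence[start:start + width]
        # Add up the profile scores matching residue type and position.
        score = sum(col.get(res, unknown) for col, res in zip(pssm, segment))
        hits.append((start + 1, segment, score))  # 1-based position
    # Highest score first; equal scores keep their sequence order.
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits


# Toy 3-column profile (hypothetical values, not a RANKPEP matrix).
toy_pssm = [
    {"A": 2.0, "L": 1.0},
    {"L": 3.0, "K": -1.0},
    {"K": 4.0, "V": 1.0},
]
ranked = score_segments("GALKV", toy_pssm)
for position, segment, score in ranked:
    print(position, segment, score)
```

With this toy profile, the segment ALK at position 2 ranks first with a score of 9.0, because every residue matches the highest-scoring entry of its column.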

Performance of PSSMs predicting MHC-peptide binding
If PSSMs are good predictors of MHC-peptide binding, MHC-restricted T cell epitopes should be found among the high scoring peptides from their protein sources. Under this assumption, we found that ~80% of MHCI-restricted epitopes are predicted at a 2% threshold of top scoring peptides, whereas ~80% of MHCII-restricted epitopes are found among the ~5% top scoring peptides. Thus, using these PSSMs, a larger number of predicted peptides is required for the correct identification of MHCII-restricted epitopes than of MHCI-restricted epitopes.

Rankpep: Cleavage Predictions

Anticipation of T-cell epitopes is heavily predicated on the prediction of MHC-peptide binding. Yet prior to MHC binding, correct peptide processing must occur. Processing of MHCII-restricted epitopes occurs in the endosomal compartment, and it is mediated by several endopeptidases in combination with amino- and carboxy-peptidases (10, 11). This complexity makes it difficult to identify any pattern related to the processing of class II restricted peptides. On the other hand, there is experimental evidence that the C-terminus of MHCI-restricted epitopes results solely from the proteolysis of cytosolic proteins mediated by the proteasome (12). The proteasome thus plays a vital role in determining CTL epitopes, and its specificity can be modeled from MHCI-restricted peptides and their C-terminal flanking regions using statistical language models.

Methods and Implementation
Cleavage by the proteasome occurs at preferential sites within the protein, and the sequence signals from antigenic peptides processed by the proteasome are especially conserved at position P1 of the cleavage site (the C-terminus of the antigenic peptide) and its immediate flanking P1' residue (13). Prediction of proteasomal cleavage resembles the problem of language tagging (modeling the location of grammatical tags such as punctuation signs), and thus we have used the SRI Language Modeling toolkit (SRILM) (14) for statistical modeling of proteasomal cleavage sites. Training sets for statistical modeling of proteasomal cleavage were obtained from a database containing the C-terminus and flanking regions of 332 antigens restricted by human MHCI molecules. Language Models for Proteasomal Cleavage Prediction (LMPCP) were created from training sets of peptide fragments of variable length derived from the above database. LMPCPs were tested at different cutpoint probabilities (0.35 to 0.70) using HIDDEN-NGRAM over testing files of peptide fragments not included in the training sets, as well as over their entire protein sources. For each probability threshold, 23 different LMPCPNi (N = 10, 6, 4; i = 1 N-2) were tested, where N is the size of the fragments in the training and testing sets and i is the order of the LMPCP tested. The best results (Predicted Cleavage Sites, PCS > 75%) were obtained from LMPCPs under soft cutting probabilities (0.35-0.5), indicating that the nature of the proteasome specificity is much less rigid than that of grammar tagging.


The language models for proteasomal cleavage (LMPCP) available in RANKPEP were obtained using N-GRAM-COUNT and tested using HIDDEN-NGRAM at different probability thresholds (Prob), as indicated. The following parameters describe each model:

Frag. Size (N): The size of the fragments in the training and testing sets.
Order (i): The order with which the models were built with N-GRAM-COUNT and tested with HIDDEN-NGRAM.
Prob: Probability above which a cutpoint is predicted with the relevant model.
PCS: Predicted Cleavage Sites.
ECS: Expected Cleavage Sites, calculated using the equation 100*C/(N-1), where C is the average number of cutpoints per fragment yielded by a given LMPCP when tested on a file of fragments of size N.
Mean size: The average size of the fragments yielded by HIDDEN-NGRAM when tested over the full-length source proteins of the peptide fragments in the test file.

Rankpep: Immunodominance filter

T-cell epitope immunogenicity is contingent on several factors: 1) appropriate and effective processing of a peptide from its protein source, 2) stable peptide binding to the MHC molecule, and 3) the ability of the TCR to recognize the MHC-bound peptide. Appropriate computational modeling of these three processes is required for accurate prediction of T-cell epitopes. Until now, only peptide-MHC binding, and processing of peptides for MHC class I restricted epitopes, have been considered in epitope prediction algorithms. However, in this version of RANKPEP we have implemented a model for immunodominant recognition of peptides by the TCR.

Methods and Implementation
Modeling of potential immunodominance features in peptides, which may make them more readily recognized by TCRs, was approached by comparing a set of immunogenic peptides with a set of non-immunogenic peptides. To minimize any contribution due to MHC binding and processing, only high affinity peptide binders to MHC class I molecules were chosen, and flanking regions were not considered. The immunogenic peptide set consisted of 101 peptides with high MHC binding and high T-cell activity. The non-immunogenic set consisted of ~400 peptides with high MHC binding and no T-cell activity. All peptides considered were 9mers. The differences between the two sets were modeled using Support Vector Machines (SVM) trained on these two sets on the basis of their residue composition, amino acid sequence, and both.

SVM classifiers trained on the sequence [1], physico-chemical properties [2], or on the combination of both [1+2] are able to distinguish immunogenic from non-immunogenic peptides, but with low accuracy. The classifier based on physico-chemical properties outperforms the classifier based on sequence alone, suggesting that the properties of peptides may play a critical role in determining immunodominance.
SVM-based immunodominance classifiers trained on residue properties and amino acid sequence were able to discriminate immunogenic peptides from non-immunogenic peptides with an accuracy of 60.0% at a threshold of 0.5. This low accuracy may be due to the fact that we are pooling together all peptides regardless of MHCI restriction. Unfortunately, at this time there is not enough data on immunogenic and non-immunogenic peptides for a single MHCI restriction element.

In RANKPEP we have chosen an immunodominance filter based on an SVM classifier trained on both residue properties and amino acid sequence. If this filter is set ON, only those peptides that are classified as immunodominant are returned by the server. Since immunodominance classification is threshold dependent, we provide three optional thresholds to choose from.
  • Threshold 1.0: 49.5% sensitivity, 76.0% specificity
  • Threshold 0.5: 59.4% sensitivity, 69.4% specificity (Default)
  • Threshold 0.0: 68.3% sensitivity, 60.9% specificity
Due to the low accuracy of the classification method, we recommend using this immunodominance filter only for large, genome-scale epitope predictions, as a means of limiting the number of potential epitopes.
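The trade-off between sensitivity and specificity listed above follows from how a decision threshold is applied to classifier scores. The sketch below illustrates this with made-up scores and labels; it does not reproduce the RANKPEP classifier or its data, and the rule "score >= threshold means immunodominant" is an assumption.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(scores: Sequence[float],
                            labels: Sequence[bool],
                            threshold: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when peptides scoring at or above
    the threshold are called immunodominant (assumed decision rule).
    Assumes at least one immunogenic and one non-immunogenic peptide."""
    tp = fn = tn = fp = 0
    for score, immunogenic in zip(scores, labels):
        predicted = score >= threshold
        if immunogenic and predicted:
            tp += 1
        elif immunogenic:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical classifier outputs: True = immunogenic peptide.
scores = [1.2, 0.7, 0.1, -0.3, 0.6, -0.8]
labels = [True, True, True, False, False, False]

for threshold in (1.0, 0.5, 0.0):
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

On this toy data, lowering the threshold from 1.0 to 0.0 raises sensitivity and lowers specificity, the same direction of change as in the three RANKPEP thresholds.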

Rankpep: Sequence variability masking

We have noted that mutation offers a means of immune evasion exploited by some pathogenic organisms such as HIV. To address this problem, RANKPEP predictions can be obtained from multiple sequence alignments. In this case, the server first creates a consensus sequence in which the variable positions are masked, and binding predictions are restricted to segments with no variable positions. Sequence variability is calculated from multiple amino acid sequence alignments as indicated by Reche and Reinherz (15), using a variability metric (V) formally identical to the Shannon entropy equation (16):

$$V = -\sum_{i=1}^{M} P_i \log_2 P_i$$

where P_i is the fraction of residues of amino acid type i, and M is equal to 20, the number of amino acid types. V ranges from 0 (total conservation, only one amino acid type is present at that position) to 4.322 (all 20 amino acids are equally represented at that position). Note that at least 20 sequences are required to reach the maximum value V = 4.3. Gap symbols (-) are considered for deriving the consensus sequence but are not included in the variability calculations. Given a sequence variability threshold Vt, the consensus sequence takes the most common amino acid at each position of the alignment with V <= Vt, whereas variable positions (V > Vt) are masked and represented in the consensus sequence with a "." symbol. Segments containing a masked position are not considered in the RANKPEP predictions of MHC-peptide binding.
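The masking procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the server's code: the function names are hypothetical, the alignment is passed as a list of equal-length strings, and ties for the most common residue are resolved by first occurrence.

```python
import math
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def variability(column: Iterable[str]) -> float:
    """Shannon entropy (base 2) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    # p * log2(1/p) is the same quantity as -p * log2(p).
    return sum((n / total) * math.log2(total / n)
               for n in Counter(residues).values())


def masked_consensus(alignment: List[str], vt: float = 1.0) -> str:
    """Most common symbol where V <= vt, '.' where V > vt.
    Gaps count toward the consensus symbol but not toward V."""
    consensus = []
    for column in zip(*alignment):
        if variability(column) <= vt:
            consensus.append(Counter(column).most_common(1)[0][0])
        else:
            consensus.append(".")
    return "".join(consensus)


def unmasked_segments(consensus: str, width: int) -> Iterator[Tuple[int, str]]:
    """Yield (1-based position, segment) for segments with no masked position."""
    for start in range(len(consensus) - width + 1):
        segment = consensus[start:start + width]
        if "." not in segment:
            yield start + 1, segment


# Toy alignment (hypothetical sequences).
alignment = ["ACDE", "ACDF", "ACDG", "ACEH"]
consensus = masked_consensus(alignment, vt=1.0)
print(consensus)                              # ACD.
print(list(unmasked_segments(consensus, 3)))  # [(1, 'ACD')]
```

In the toy alignment the last column holds four different residues (V = 2.0), so it is masked, and only the segment that avoids it remains available for binding prediction.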


  • 1. Ruppert J, Sidney J, Celis E, Kubo RT, Grey HM, Sette A: Prominent role of secondary anchor residues in peptide binding to HLA-A2.1 molecules. Cell 74:929, 1993.

  • 2. Gribskov M, McLachlan AD, Eisenberg D: Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci USA 84:4355, 1987.

  • 3. Madden DR: The three-dimensional structure of peptide-MHC complexes. Annu. Rev. Immunol. 13:587, 1995.

  • 4. Stern LJ, Wiley DC: Antigen peptide binding by class I and class II histocompatibility proteins. Structure 2:245, 1994.

  • 5. Reche PA, Glutting JP and Reinherz EL. Prediction of MHC Class I Binding Peptides Using Profile Motifs. Human Immunology 63, 701-709 (2002).

  • 6. Bailey, T. L. and C. Elkan (1995). "The value of prior knowledge in discovering motifs with MEME." Proc Int Conf Intell Syst Mol Biol 3: 21-9.

  • 7. Thompson, J. D., D. G. Higgins, et al. (1994). "Improved sensitivity of profile searches through the use of sequence weights and gap excision." Comput Appl Biosci 10(1): 19-29.

  • 8. Henikoff, S., J. G. Henikoff, et al. (1999). "Blocks+: a non-redundant database of protein alignment blocks derived from multiple compilations." Bioinformatics 15(6): 471-9.

  • 9. Henikoff, S. and J. G. Henikoff (1994). "Position-based sequence weights." J Mol Biol 243(4): 574-8.

  • 10. Pieters, J. (2000). "MHC class II-restricted antigen processing and presentation." Adv Immunol. 75: 159-208.

  • 11. Watts, C. (2001). "Antigen processing in the endocytic compartment." Curr Opin Immunol 13(1): 26-31.

  • 12. Craiu, A., T. Akopian, et al. (1997). "Two distinct proteolytic processes in the generation of a major histocompatibility complex class I-presented peptide." Proc Natl Acad Sci U S A 94(20): 10850-5.

  • 13. Altuvia, Y. and H. Margalit (2000). "Sequence signals for generation of antigenic peptides by the proteasome: implications for proteasomal cleavage mechanism." J Mol Biol 295(4): 879-90.

  • 14. Stolcke, A. (2002). SRILM -- An Extensible Language Modeling Toolkit. Proceedings of the International Conference of Spoken Language Processing. T. M. N. J. J. Ohala, B. L.

  • 15. Reche, P. A. and E. L. Reinherz (2003). "Sequence variability analysis of human class I and class II MHC molecules: functional and structural correlates of amino acid polymorphisms." J Mol Biol 331(3): 623-41.

  • 16. Shannon, C. E. (1948). "The mathematical theory of communication." The Bell System Technical Journal 27: 379-423, 623-656.

    RANKPEP methods are described in full detail in the following publications:

    Reche PA, Glutting JP and Reinherz EL.
    Prediction of MHC Class I Binding Peptides Using Profile Motifs. Human Immunology 63, 701-709 (2002).

    Reche PA, Glutting JP and Reinherz EL. Enhancements to the RANKPEP server for the prediction of MHC-binding peptides using profiles.
    In preparation.

Rankpep: Usage help

PSSM: Position-specific scoring matrix or profile.

Profiles basically consist of a table listing the observed sequence-weighted frequency of all amino acids in every column of a sequence alignment. Peptide alignments and PSSMs for the prediction of MHCI and MHCII were obtained differently as indicated here.

RANKPEP includes a selection of 102 and 80 PSSMs for the prediction of peptide binding to MHCI and MHCII molecules, respectively. Several PSSMs for the prediction of peptide binders of different sizes are usually available for each MHCI molecule. By default, we suggest using PSSMs for the prediction of peptides of 9 residues. MHCII-specific PSSMs always predict peptide binders of 9 residues.

Users can also upload their own PSSMs and use them as predictors of peptide-MHC binding. Profiles must be in text format, in the form of a table, with the amino acid types arranged in columns and specified in a header line. An example of a PSSM for the prediction of peptide binding to the mouse MHC molecule Kb is shown here.


The input query for RANKPEP can be either one or more protein sequences in FASTA format or a Multiple Sequence Alignment (MSA) in CLUSTALW format. If an MSA is provided, MHC-peptide binding predictions are obtained from a consensus sequence with the variable positions masked.

The input query can be pasted or uploaded from a local file. If a local file is uploaded, it will be processed by default. Files uploaded onto the server must contain only ASCII characters. In other words, make sure you save your files as TEXT before you attempt to upload them onto this server. Otherwise, your computer might, literally, explode in front of your nose. Please note that the MIF will not be liable for any personal injury derived from such misuse of the server.

  • Example of sequences in FASTA format:
  • >HLA
    >A56881  PIR2 release 71.00
  • Example of MSA in ClustalW format:
  • CLUSTAL W (1.81) multiple sequence alignment
                        :* **  **::: *****:. ******** . * ***  *:********:
                        *  * .*     *****  ***  * : **::*.* *:***: * **:* 
                         :***** **  *::*** :  **: :***** *****:*:*:**: ** 

    The user can also provide


    The threshold of peptides predicted to bind to a given MHC molecule can be set as a percentage of top scoring peptides or as a fixed number of top scoring peptides. In addition, PSSMs in RANKPEP are associated with a PSSM-specific binding threshold (PSBT) above which sorted peptides are highlighted in the results page. The PSBT is obtained by scoring the peptides included in the PSSM alignment, so that 85% of these peptides score above the PSBT. In general, we found that ~80% of MHCI-restricted epitopes are predicted at a 2% threshold of top scoring peptides. On the other hand, ~80% of MHCII-restricted epitopes are found among the ~5% top scoring peptides. Thus, we suggest using a 2-3% binding threshold of top scoring peptides for the prediction of peptide binders to MHCI, and a 4-6% threshold for the prediction of MHCII-peptide binders.
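The three ways of thresholding described above (top percentage, fixed number, and PSBT) can be sketched as below. The function names are hypothetical, and two details are assumptions because the text does not specify them: the number of peptides kept for a percentage is rounded up, and a peptide scoring exactly at the PSBT counts as a binder.

```python
import math
from typing import List, Sequence, Tuple

Hit = Tuple[str, float]  # (peptide, score)


def top_percent(hits: Sequence[Hit], percent: float) -> List[Hit]:
    """Keep the given percentage of top scoring peptides (rounded up)."""
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return ranked[:math.ceil(len(ranked) * percent / 100)]


def top_n(hits: Sequence[Hit], n: int) -> List[Hit]:
    """Keep a fixed number of top scoring peptides."""
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:n]


def psbt(alignment_scores: Sequence[float], percent_above: float = 85) -> float:
    """Score such that percent_above % of the peptides used to build the
    PSSM score at or above it."""
    if not alignment_scores:
        raise ValueError("at least one score is required")
    ranked = sorted(alignment_scores, reverse=True)
    keep = math.ceil(len(ranked) * percent_above / 100)
    return ranked[keep - 1]


# Hypothetical data: 200 scored segments and 20 alignment peptides.
hits = [(f"pep{i}", float(i)) for i in range(200)]
print(len(top_percent(hits, 2)))      # 4 peptides at a 2% threshold
print(psbt(list(range(10, 210, 10)))) # 40: 17 of the 20 scores are >= 40
```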


    The C-terminus of MHCI-restricted peptides is generated by the proteasome, and thus RANKPEP also determines whether the C-terminus of each predicted MHCI-peptide binder is the result of proteasomal cleavage. These sequences are highlighted in purple in the output results. Proteasomal cleavage predictions are carried out using three optional models obtained by applying statistical language models to a set of known epitopes restricted by human MHCI molecules, as indicated here.

    Variable parameters in language models for the prediction of proteasomal cleavage include the length of the fragments used to create the training set, the window size used to train the model and for determining the cutpoint probabilities in the tested peptide, and the cutpoint insertion threshold. Three of the best models are available for prediction of C-terminus cleavage. These were constructed as follows:
    • ONE: This model was built using peptide fragments of ten amino acids in length, and training and prediction windows of two amino acids in length. The cutpoint insertion threshold is set at .35.
    • TWO: This model was built using peptide fragments of six amino acids in length, and training and prediction windows of 6 amino acids in length. The cutpoint insertion threshold is set at .40.
    • THREE: This model was built using peptide fragments of four amino acids in length, and training and prediction windows of two amino acids in length. The cutpoint insertion threshold is set at .50.

    The molecular weight filter restricts the output to those peptides whose molecular weight falls within a window set by the user. By default, all predicted peptides are shown.
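A molecular weight window of this kind can be sketched as follows. This is not the server's implementation: the residue masses are standard average values rounded to two decimals, the water mass of 18.015 Da is the usual correction for the free termini, and the function names are hypothetical. The mass table actually used by RANKPEP is not stated in this document.

```python
from typing import List

# Average residue masses in Da (standard values, rounded; assumed table).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.015  # one water molecule accounts for the free N- and C-termini


def molecular_weight(peptide: str) -> float:
    """Sum of residue masses plus one water; raises KeyError on unknown residues."""
    return sum(RESIDUE_MASS[res] for res in peptide) + WATER


def within_mw_window(peptides: List[str], low: float, high: float) -> List[str]:
    """Keep only peptides whose molecular weight lies in [low, high]."""
    return [p for p in peptides if low <= molecular_weight(p) <= high]


print(round(molecular_weight("AVVLRKYADK"), 3))
print(within_mw_window(["AVVLRKYADK", "GGG"], 1000.0, 1300.0))
```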

    If the user inputs a multiple sequence alignment, the server first creates a consensus sequence in which the variable positions are masked, and binding predictions are restricted to segments with no variable positions. Variable positions are identified using the Shannon entropy equation as the variability measure.

    The default variability threshold for masking is 1.0 (positions with a variability above 1.0 will be masked). However, this value can be set to any value between 0 and 4.3, the limit values of the Shannon entropy equation. If a variability threshold of 4.3 is chosen, binding prediction will be carried out from a consensus sequence without the variable positions masked. On the other hand, if the variability threshold is set to 0, all positions in the multiple sequence alignment that are not 100% conserved will be masked in the consensus sequence.

    Rankpep: Results help

    The output of RANKPEP consists of a list of peptides ordered by their binding potential (score) to the selected MHC molecule. In addition, RANKPEP's output also includes the following information:
    • RANK, Relative rank of the predicted peptide
    • POS., Position of the peptide in the input protein
    • N, Amino acid sequence of the three residues preceding the N-terminus of the predicted peptide
    • SEQUENCE, Amino acid sequence of predicted peptide
    • C , Amino acid sequence of the three residues following the C-terminus of the predicted peptide
    • MW(Da), Molecular weight in Daltons of the predicted peptide
    • SCORE, Score of the peptide
    • % OPT., Percentile score of the predicted peptide relative to that of the consensus.
      The consensus is the sequence that yields the maximum score, namely optimal score, with the selected profile.
    The following is an example of the RANKPEP output:
    Matrix: 10mer_HLA_A11_A_1101_.pwp
    Consensus: ATYYGSVVYK
    Optimal Score: 162.0
    Binding Threshold: 71.00
    Protein 1 of 1:

    >A56881 PIR2 release 71.00
    RANK  POS.  N    SEQUENCE    C    MW(Da)    SCORE  % OPT.
    1     491   FEG  KSLYESWTKK  SPS  1246.455  91.0   56.17 %
    2     601   RDY  AVVLRKYADK  IYS  1162.395  90.0   55.56 %
    3     577   LTV  AQVRGGMVFE  LAN  1093.265  78.0   48.15 %
    4     690   YRH  VIYAPSSHNK  YAG  1115.255  77.0   47.53 %
    5     207   RYG  KVFRGNKVKN  AQL  1189.405  76.0   46.91 %
    6     63    AFL  DELKAENIKK  FLY  1187.355  76.0   46.91 %
    7     398   VHE  IVRSFGTLKK  EGW  1148.405  73.0   45.06 %
    8     530   RLG  IASGRARYTK  NWE  1122.295  70.0   43.21 %
    9     454   NAD  SSIEGNYTLR  VDC  1139.235  70.0   43.21 %
    10    731   VKR  QIYVAAFTVQ  AAA  1139.315  67.0   41.36 %
    11    614   IYS  ISMKHPQEMK  TYS  1228.485  67.0   41.36 %
    12    332   VGP  GFTGNFSTQK  VKM  1086.155  67.0   41.36 %
    13    201   GKI  VIARYGKVFR  GNK  1208.475  66.0   40.74 %
    14    85    HLA  GTEQNFQLAK  QIQ  1135.235  65.0   40.12 %
    15    206   ARY  GKVFRGNKVK  NAQ  1132.355  64.0   39.51 %

    The PSSM-specific binding threshold reported by RANKPEP is obtained by scoring all the peptide sequences included in the alignment from which a profile is derived, and is defined as the score value that includes 85% of the peptides within the set. Peptides whose score is above the binding threshold will appear highlighted in red. Peptides produced by the cleavage prediction model are highlighted in violet.
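The consensus sequence, the optimal score, and the % OPT. value described above can be derived from a profile as sketched below. The list-of-dicts PSSM layout, the function names, and the toy matrix are assumptions made for illustration. The percentage formula is consistent with the example output, where a score of 91.0 against an optimal score of 162.0 is reported as 56.17 %.

```python
from typing import Dict, List, Tuple

# Assumed layout: one dict per PSSM column, mapping residue -> score.
Pssm = List[Dict[str, float]]


def consensus_and_optimal(pssm: Pssm) -> Tuple[str, float]:
    """Consensus = best-scoring residue in every column;
    optimal score = sum of those column maxima."""
    consensus = "".join(max(col, key=col.get) for col in pssm)
    optimal = sum(max(col.values()) for col in pssm)
    return consensus, optimal


def percent_optimal(score: float, optimal: float) -> float:
    """Score of a peptide as a percentage of the optimal score."""
    return 100.0 * score / optimal


# Toy 3-column profile (hypothetical values, not a RANKPEP matrix).
toy_pssm = [
    {"A": 2.0, "L": 1.0},
    {"L": 3.0, "K": -1.0},
    {"K": 4.0, "V": 1.0},
]
print(consensus_and_optimal(toy_pssm))         # ('ALK', 9.0)
print(round(percent_optimal(91.0, 162.0), 2))  # 56.17
```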

    Related Resources

    MHC Databases: gene sequence, polymorphisms, etc

    Peptide Databases

    Online Resources for the prediction of MHC-peptide binding
    Server                      MHC class  Method
    -                           I          Quantitative matrices (QM)
    MHCPred                     I          QM
    Epitope binding prediction  I          QM
    ProPred                     II         QM
    Epipredict                  II         QM
    -                           I          QM. Predicts mutated binders
    -                           I          Linear programming (only HLA-A2)
    -                           I          Support vector machines (SVM)
    nHLAPred                    I          Artificial Neural Networks (ANN)
    NetMHC                      I          ANN, only human HLA-A2 and mouse H2K
    -                           I and II   Motif matrices (MM). Only a few MHCII molecules
    -                           I and II   Position Specific Scoring Matrices (PSSM)
    -                           I          MM
    -                           I          QM, also proteasomal cleavage and promiscuous peptides
    -                           I          Combination of ANN and SVM
    MHC-Thread                  II         Structural Peptide Threading

    Prediction of proteasomal cleavage and TAP binding

    • PRAPOC. Proteasomal cleavage. Artificial Neural Network
    • NETCHOP. Immunoproteasomal cleavage. Artificial Neural Network
    • RANKPEP. Immunoproteasomal cleavage. N-gram statistics
    • Epitope. Tap binding predictions

    Companies offering T cell epitope predictions

    • Epivax. Based on EpiMatrix algorithm. Motif matrices. Class I MHC molecules
    • Vaccinome. Based on TEPITOPE algorithm. Quantitative and virtual matrices. Class II MHC molecules
    • Biovation. Based on peptide-MHC threading. Class I MHC molecules