PEPVAC DOCUMENTATION
GENOME-WIDE PREDICTION OF PROMISCUOUS EPITOPES FOR VACCINE DESIGN


Overview of the development of multi-epitope T-cell vaccines

Cellular adaptive immune responses are mediated by a type of leukocyte named T cells, which are mostly responsible for intracellular surveillance. T cells carry out this mission by recognizing, via their T cell receptors (TCRs), peptide antigens (epitopes) displayed in the context of major histocompatibility complex molecules (pMHC). Since T cells recognizing self-peptides are eliminated during thymic selection, pMHC incorporating foreign peptides are the primary focus of T cell-mediated immune responses (von Boehmer, 1991). According to the co-receptor present on their cell surface, CD8 or CD4, T cells are divided into CD8- and CD4-T cells, respectively. CD8-T cells and CD4-T cells engage different types of pMHC complexes: CD8-T cells recognize epitopes in the context of MHC class I molecules (pMHCI), whereas CD4-T cells recognize peptide antigens in the context of MHC class II molecules (pMHCII). CD4- and CD8-T cell-based immune responses also differ. The CD4-T cell-mediated immune response is more complex, and it works by providing help, via cytokine production, to other compartments of the immune system (B cells and/or CD8-T cells). The CD8-T cell-based immune response, on the other hand, is simpler and better understood, as these are cytotoxic T lymphocytes (CTLs) that directly destroy cells displaying pMHCI complexes with foreign peptides. Therefore, CTL-mediated immune responses play a central role in protective immunity against many viral and intracellular bacterial infections (Zinkernagel, 1996; Schaible et al., 1999). Moreover, CTLs also play a major role in the recognition and subsequent elimination of tumor cells, as these usually display an altered repertoire of peptides due to the expression of "new" and/or mutated proteins (Wang and Rosenberg, 1996).
Thus, identification of CTL epitopes that can induce protective cellular immunity is critical in the development of T cell-based vaccination strategies against human infectious diseases and cancer, and it has therefore received extensive attention.

The sufficient conditions for a peptide to be a CTL epitope are not well known; a necessary condition, however, is that the peptide binds to MHCI molecules (HLA molecules in humans, for Human Leukocyte Antigens). Therefore, most strategies for the anticipation of CTL epitopes are based on the identification of peptides that can bind to HLA molecules. In PEPVAC, identification of peptide-HLA binding is based on specific motif profiles (Reche et al., 2002). HLA class I molecules bind and display peptide fragments for recognition by the TCRs of T cells only if these fragments have been processed adequately from their protein sources. Class I restricted peptides result from proteolytic degradation of cytosolic proteins, and experimental evidence indicates that most CTL epitopes, and in particular their C-terminus, result from proteasomal cleavage (Craiu et al., 1997). Thus, since the proteasome plays a vital role in determining CTL epitopes, we have implemented in PEPVAC a probabilistic model that indicates whether the C-terminus of a predicted HLA peptide binder is the result of proteasomal cleavage.

HLA molecules are extremely polymorphic, can bind distinct sets of peptides (Reche and Reinherz, 2003), and are expressed at vastly variable frequencies in different ethnic groups (HLA, 1998). Thus, it would appear that an impractically large number of peptides would have to be selected in order to develop a multi-epitope vaccine that is broadly protective. In PEPVAC, we have surmounted this formidable obstacle by targeting for peptide binding predictions 5 groups of HLA alleles (supertypes), each sharing largely overlapping sets of predicted peptide binders, and together covering about 95% of the population (see Supertypes/Coverage). Only the peptides predicted to bind to all HLA alleles included in a supertype (promiscuous epitopes) are selected as potential T-cell epitopes. Thus, identification of these promiscuous peptide binders minimizes the total number of predicted epitopes without compromising the population coverage required in the design of multi-epitope vaccines.
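
The selection of promiscuous epitopes described above amounts to a set intersection over the alleles of one supertype. The following is a minimal sketch of that idea, not PEPVAC's own code; the function name `promiscuous_binders` and the peptide strings are hypothetical placeholders, not real binding data.

```python
def promiscuous_binders(binders_by_allele):
    """Return the peptides predicted to bind every allele of a supertype.

    binders_by_allele maps an allele name to an iterable of predicted binders.
    """
    sets = [set(peptides) for peptides in binders_by_allele.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical predictions for three alleles of one supertype (placeholder peptides).
a2_predictions = {
    "A*0201": {"ACDEFGHIK", "LMNPQRSTV", "WYACDEFGH"},
    "A*0202": {"ACDEFGHIK", "LMNPQRSTV"},
    "A*0203": {"LMNPQRSTV", "ACDEFGHIK", "KLMNPQRST"},
}
print(sorted(promiscuous_binders(a2_predictions)))
# ['ACDEFGHIK', 'LMNPQRSTV']
```

Only the two peptides shared by all three alleles survive, which is how the number of candidates shrinks while every allele of the supertype stays covered.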

An additional problem in the development of multi-epitope vaccines is the high mutation rate of many pathogens. To overcome this problem, for pathogens such as HIV-1 that are known to use this strategy to evade the immune system, we have considered multiple amino acid sequence alignments for each of the gene products, and generated consensus sequences in which the variable residues are masked (positions with a Shannon entropy > 1). Therefore, in PEPVAC the prediction of CTL epitopes for such genomes is restricted to the conserved regions.
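
As an illustration of the masking step, the sketch below computes the Shannon entropy of each alignment column and replaces variable columns with a mask character. It is a minimal sketch under stated assumptions: entropy is taken in bits (log base 2), gap characters are counted like any other symbol, and the mask symbol "X", the function names, and the toy alignment are all hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def masked_consensus(alignment, threshold=1.0, mask="X"):
    """Consensus sequence with columns of entropy > threshold masked."""
    consensus = []
    for column in zip(*alignment):
        if column_entropy(column) > threshold:
            consensus.append(mask)
        else:
            # Most frequent residue in a conserved column.
            consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)


# Toy alignment of four equally long sequences (not real data).
alignment = ["MKTAL", "MKSAL", "MRGAL", "MKDAL"]
print(masked_consensus(alignment))  # MKXAL
```

The third column holds four different residues (entropy of 2 bits) and is masked, while the second column (entropy of about 0.81 bits) is kept as conserved.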
HLA-peptide binding predictions

Anticipation of CTL epitopes relies fundamentally on the prediction of HLA peptide binders, which is accomplished using specific profile motifs of known HLA peptide binders (Reche et al., 2002). For the prediction of peptide-HLA binding, these profiles, or Position Specific Scoring Matrices (PSSMs), are used in combination with a modified version of the RANKPEP algorithm that scores and ranks all peptides within a query of one or more proteins. In RANKPEP, scoring and ranking of peptides is applied locally and independently for each of the proteins entered in the query (scores are not compared between peptides from different proteins). In contrast, in the modified version of RANKPEP used by PEPVAC, peptide scoring occurs globally (independently of protein source), and all peptides are ranked and sorted together. Therefore, a new field has been added to the output to indicate the protein source of each peptide.
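
The global ranking described above can be sketched as follows. This is an illustrative sketch rather than the RANKPEP implementation: it assumes a peptide score is the sum of position-specific scores, and the toy PSSM, the protein names, and the function names are hypothetical.

```python
def kmers(sequence, k=9):
    """All overlapping peptides of length k in a protein sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def score_peptide(peptide, pssm):
    """Sum of position-specific scores; pssm[i] maps residue -> score."""
    return sum(pssm[i].get(res, 0.0) for i, res in enumerate(peptide))


def rank_globally(proteins, pssm, k=9):
    """Score every peptide of every protein and sort them in one list.

    Each hit keeps the name of its source protein as a separate field.
    """
    hits = []
    for name, seq in proteins.items():
        for pep in kmers(seq, k):
            hits.append((score_peptide(pep, pssm), pep, name))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


# Toy 9-position PSSM with scores only at positions 2 and 9 (hypothetical values).
pssm = [{} for _ in range(9)]
pssm[1] = {"L": 2.0, "M": 1.5}
pssm[8] = {"V": 2.0, "L": 1.0}

proteins = {"protA": "ALAAAAAAVAA", "protB": "GMGGGGGGLG"}
for score, peptide, source in rank_globally(proteins, pssm)[:2]:
    print(score, peptide, source)
# 4.0 ALAAAAAAV protA
# 2.5 GMGGGGGGL protB
```

Because all peptides share one sorted list, a high-scoring peptide from one protein can outrank every peptide of another, which is why the source protein must be reported alongside each hit.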

We have reported previously that approximately 80% of all known MHCI-restricted epitopes are found among the 2% of top scoring peptides from their protein sources (Reche et al., 2002). Thus, in this version of PEPVAC we have considered as true binders only the 2% of top scoring peptides. Also, as the vast majority of known peptides binding to HLA molecules are 9-mers, we have only considered profiles for the prediction of HLA-peptide binders 9 residues long.
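
To make the 2% cutoff concrete, the sketch below counts the 9-mers in a protein and keeps the top-scoring fraction of a ranked list. Rounding the cutoff up and always keeping at least one peptide are assumptions of this sketch, and the function names are hypothetical.

```python
import math


def count_kmers(protein_length, k=9):
    """Number of overlapping peptides of length k in a protein."""
    return max(0, protein_length - k + 1)


def top_fraction(ranked_hits, fraction=0.02):
    """Keep the best-scoring fraction of an already sorted list of hits."""
    n_keep = max(1, math.ceil(len(ranked_hits) * fraction))
    return ranked_hits[:n_keep]


# A 1000-residue protein contains 992 overlapping 9-mers.
n_peptides = count_kmers(1000)
ranked = list(range(n_peptides))  # stand-in for a list sorted by score
print(n_peptides, len(top_fraction(ranked)))  # 992 20
```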

Profiles or PSSMs basically consist of a table containing, for every column of the alignment, the sequence-weighted frequency of each of the 20 amino acids observed in that column divided by the expected frequency of that amino acid in the background (usually its frequency in the SWISSPROT database). An example PSSM can be found here. Profiles were derived as indicated elsewhere (Reche et al., 2002), although sequences were obtained from the EPIMHC database (Reche et al., 2003b).
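
The table described above can be sketched as a ratio of weighted column frequencies to background frequencies. This is a simplified illustration, not the published profile derivation: it assumes equal sequence weights by default and a uniform background of 1/20 per amino acid instead of SWISSPROT frequencies, and it omits pseudocounts and any log transformation. The function name and the toy peptides are hypothetical.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(peptides, weights=None, background=None):
    """Per-position ratio of weighted residue frequency to background frequency."""
    if weights is None:
        weights = [1.0] * len(peptides)  # assumption: equal sequence weights
    if background is None:
        # Assumption: uniform background instead of database frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    total = sum(weights)
    profile = []
    for pos in range(len(peptides[0])):
        freq = defaultdict(float)
        for peptide, weight in zip(peptides, weights):
            freq[peptide[pos]] += weight / total
        profile.append({aa: freq[aa] / background[aa] for aa in AMINO_ACIDS})
    return profile


# Toy alignment of four 3-residue peptides (not real binders).
profile = build_profile(["ALA", "ALV", "AMV", "ALV"])
print(round(profile[1]["L"], 2), round(profile[1]["M"], 2))  # 15.0 5.0
```

A ratio above 1 means the residue is enriched at that position relative to the background, and a ratio of 0 means it was never observed there.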
Supertypes/Coverage
Supertype is a term first coined by Sette and Sidney (1998), and it designates a group of HLA alleles that bind a largely overlapping set of peptides. The concept of supertype is linked to that of the supermotif. Each HLA allele binds peptides that fit a sequence binding motif specific to that allele. Thus, if a set of peptides can bind to several distinct HLA alleles, that implies the existence of a shared peptide binding supermotif. Sette and Sidney (1998) defined several HLA supertypes by visual comparison of the reported peptide binding motifs of individual HLA alleles. We have defined HLA supertypes in a different way, by comparing the overlap between the peptides predicted to bind from a random protein 1000 aa long by a group of HLA-specific profile matrices (the 2% top scoring peptides were considered binders). We have thus generated a distance matrix whose coefficients (dij) decrease with the number of identical peptide binders (nij) shared by any two HLA profiles (dij = 200 - nij). Finally, using a Fitch-Margoliash clustering algorithm we derived a dendrogram to determine the kinship among the HLA-specific peptide binding sets, and defined the HLA supertypes accordingly. The dendrogram from which the HLA supertypes were drawn can be seen here. The following supertypes have been included in this version of PEPVAC:

  • A2: A*0201, A*0202, A*0203, A*0205, A*0206
  • A3: A*0301, A*1101, A*3101, A*3301, A*6801
  • A24: A*2301, A*2402, A*2403, A*2405, A*2407
  • B7: B*0702, B*3501, B*5101, B*5301, B*5401
  • B15: A*0101, B*1501_B62, B*1502
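
The distance matrix used to define the supertypes listed above can be sketched as follows, using the relation dij = 200 - nij from the text. Only the pairwise distances are computed; the Fitch-Margoliash clustering step is not shown. The function name, the allele selection, and the placeholder binder sets are hypothetical.

```python
from itertools import combinations


def binder_distances(binders_by_profile, offset=200):
    """Pairwise distances d_ij = offset - n_ij, where n_ij is the number
    of predicted binders shared by profiles i and j."""
    distances = {}
    for a, b in combinations(sorted(binders_by_profile), 2):
        shared = len(set(binders_by_profile[a]) & set(binders_by_profile[b]))
        distances[(a, b)] = offset - shared
    return distances


# Placeholder binder sets; "P1".."P7" stand in for predicted 9-mers.
binders = {
    "A*0201": {"P1", "P2", "P3"},
    "A*0202": {"P2", "P3", "P4"},
    "B*0702": {"P5", "P6", "P7"},
}
for pair, distance in binder_distances(binders).items():
    print(pair, distance)
# ('A*0201', 'A*0202') 198
# ('A*0201', 'B*0702') 200
# ('A*0202', 'B*0702') 200
```

Profiles that share more predicted binders end up closer together, so alleles of the same supertype would group on the same branch of the resulting dendrogram.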

There is virtually no overlap between the peptide binders of two different supertypes, and only the promiscuous epitopes of each supertype (those binding to all alleles included in that supertype) are selected as potential epitopes. This minimizes the number of peptides without compromising the population coverage required in the design of multi-epitope vaccines. Indeed, these supertypes were selected on the basis that providing epitope predictions for all the included HLA alleles will result in a broadly efficacious peptide-based vaccine (population coverage of about 95%). Population coverage for any combination of selected HLA alleles was obtained from HLA allele and haplotype gene frequencies for 5 major American ethnic groups (Black, Caucasian, Hispanic, Native American, and Asian) (Cao et al., 2001), and it was computed using a modified version of the Schipper et al. (1996) algorithm. This new algorithm takes into account linkage disequilibrium between alleles of different loci, as captured by the haplotype frequencies. Only haplotype frequencies between the HLA-A and -B loci and between the HLA-B and -C loci were included in these calculations; no linkage disequilibrium was considered between the HLA-A and -C genes. For any combination of HLA alleles, the population coverage reported by PEPVAC corresponds to that of the ethnic group with the lowest coverage.
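
The sketch below illustrates the idea of computing coverage from haplotype frequencies and reporting the lowest value across groups. It is a deliberate simplification and not the modified Schipper et al. algorithm used by PEPVAC: it assumes Hardy-Weinberg proportions, uses only two-locus HLA-A/HLA-B haplotypes, and all frequencies, group names, and function names are hypothetical illustration values.

```python
def phenotype_coverage(haplotype_freqs, selected_alleles):
    """Fraction of individuals carrying at least one selected allele.

    haplotype_freqs maps (A allele, B allele) -> haplotype frequency.
    Assumes Hardy-Weinberg proportions: an individual is missed only if
    both of its haplotypes lack every selected allele.
    """
    covered = sum(
        freq
        for (a_allele, b_allele), freq in haplotype_freqs.items()
        if a_allele in selected_alleles or b_allele in selected_alleles
    )
    return 1.0 - (1.0 - covered) ** 2


def minimum_coverage(freqs_by_group, selected_alleles):
    """Return the group with the lowest coverage and that coverage."""
    per_group = {
        group: phenotype_coverage(freqs, selected_alleles)
        for group, freqs in freqs_by_group.items()
    }
    worst = min(per_group, key=per_group.get)
    return worst, per_group[worst]


# Hypothetical haplotype frequencies for two groups (not real data).
freqs_by_group = {
    "group1": {
        ("A*0201", "B*0702"): 0.3,
        ("A*0301", "B*3501"): 0.2,
        ("A*0101", "B*0801"): 0.5,
    },
    "group2": {
        ("A*0201", "B*0702"): 0.1,
        ("A*2402", "B*4001"): 0.9,
    },
}
group, coverage = minimum_coverage(freqs_by_group, {"A*0201", "B*3501"})
print(group, round(coverage, 2))  # group2 0.19
```

Summing over haplotypes rather than over single alleles avoids double-counting a haplotype that carries two selected alleles, which is one way linkage between loci enters the calculation.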
Proteasome cleavage
Class I restricted peptides result from the processing of cytosolic proteins, which involves cleavage by the proteasome, trimming by cytosolic N-terminal exopeptidases, TAP-mediated transport of peptides to the Endoplasmic Reticulum (ER), and finally trimming by ER N-terminal exopeptidases (Serwold et al., 2002). Thus, the N-terminus of any class I restricted peptide is highly variable, since it is dictated by the progressive catalytic action of several aminopeptidases. On the other hand, the C-terminus of class I restricted epitopes is the direct result of the activity of the proteasome. Thus, proteasome cleavage predictions help to refine and reduce the number of predicted epitopes, and hence we have modeled the probability that the C-terminus of a given peptide is the result of proteasomal cleavage.

Probabilistic models for proteasomal cleavage were generated using the SRILM statistical language modeling toolkit (Stolcke, 2002) from a training set of protein fragments containing the C-terminal end of 332 selected class I restricted epitopes. The length of the fragments in the training set varies for each model implemented in PEPVAC (10, 6, and 4 for models 1, 2, and 3, respectively); the C-terminal end of the class I restricted epitope is indicated by a symbol tag ("|") and is flanked by the same number of residues on both sides. Training uses a fixed window size (order), which is the segment of the protein fragment that is processed by the training algorithm; the training window is smaller than the fragment (2, 4, and 2 for models 1, 2, and 3, respectively). The model thus created is then applied to a longer test peptide or complete protein using a testing window (order), which is the segment of the peptide that is processed by the algorithm to determine the probability that cleavage takes place at the index point of the window. The model is given a cutpoint threshold; cutpoint probabilities above this threshold result in the prediction of a cutpoint. Thus, the variable parameters in each cleavage prediction are the length of the fragments used to create the training set, the window size used to train the model and to determine the cutpoint probabilities in the tested peptide, and the cutpoint insertion threshold. The sensitivity of the three models is above 80%, as tested on a set of 932 natural MHCI-restricted peptides.
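
Two of the steps above lend themselves to a short illustration: building a tagged training fragment around an epitope's C-terminus, and turning cutpoint probabilities into predictions with a threshold. The sketch does not use or reproduce SRILM. It assumes the fragment length counts residues only (so a length of 6 means 3 residues on each side of the tag), and the function names, the toy sequence, the probabilities, and the default threshold of 0.5 are hypothetical.

```python
def tagged_fragment(protein, cterm_index, n_residues, tag="|"):
    """Fragment with the cleavage tag placed after the epitope's C-terminus.

    cterm_index is the 0-based index of the epitope's C-terminal residue.
    Returns None if the protein is too short to flank the tag evenly.
    """
    half = n_residues // 2
    start = cterm_index + 1 - half
    end = cterm_index + 1 + half
    if start < 0 or end > len(protein):
        return None
    return protein[start:cterm_index + 1] + tag + protein[cterm_index + 1:end]


def predict_cutpoints(cut_probabilities, threshold=0.5):
    """Indices whose cutpoint probability exceeds the threshold."""
    return [i for i, p in enumerate(cut_probabilities) if p > threshold]


protein = "ACDEFGHIKLMNPQ"  # toy sequence, epitope C-terminus at 'H' (index 6)
for length in (10, 6, 4):   # fragment lengths of models 1, 2 and 3
    print(tagged_fragment(protein, 6, length))
# DEFGH|IKLMN
# FGH|IKL
# GH|IK

print(predict_cutpoints([0.1, 0.7, 0.5, 0.9]))  # [1, 3]
```

Raising the threshold trades sensitivity for fewer predicted cutpoints, which is why it is listed among the tunable parameters of each model.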
Team
Dr. Pedro A. Reche
JP Glutting, MPH

For questions, please contact Pedro A. Reche.
References
  • von Boehmer H. Positive and negative selection of the alpha beta T-cell repertoire in vivo. Curr Opin Immunol. 1991 Apr;3(2):210-5. Review.
  • Zinkernagel RM. Immunology taught by viruses. Science. 1996 Jan 12;271(5246):173-8. Review.
  • Schaible UE, Collins HL, Kaufmann SH. Confrontation between intracellular bacteria and the immune system. Adv Immunol. 1999;71:267-377. Review.
  • Wang RF, Rosenberg SA. Human tumor antigens recognized by T lymphocytes: implications for cancer therapy. J Leukoc Biol. 1996 Sep;60(3):296-309. Review.
  • Reche PA, Glutting JP, Reinherz EL. Prediction of MHC class I binding peptides using profile motifs. Hum Immunol. 2002 Sep;63(9):701-9.
  • Craiu A, Akopian T, Goldberg A, Rock KL. Two distinct proteolytic processes in the generation of a major histocompatibility complex class I-presented peptide. Proc Natl Acad Sci U S A. 1997 Sep 30;94(20):10850-5.
  • Reche PA, Reinherz EL. Sequence variability analysis of human class I and class II MHC molecules: functional and structural correlates of polymorphisms. J Mol Biol. In press.
  • HLA 1998. David W. Gjertson and Paul I. Terasaki, Editors
  • Reche PA, Glutting JP, Reinherz EL. EPIMHC database. 2003. In preparation.
  • Cao K, Hollenbach J, Shi X, Shi W, Chopek M, Fernandez-Vina MA. Analysis of the frequencies of HLA-A, B, and C alleles and haplotypes in the five major ethnic groups of the United States reveals high levels of diversity in these loci and contrasting distribution patterns in these populations. Hum Immunol. 2001 Sep;62(9):1009-30.
  • Schipper RF, van Els CA, D'Amaro J, Oudshoorn M. Minimal phenotype panels. A method for achieving maximum population coverage with a minimum of HLA antigens. Hum Immunol. 1996 Dec;51(2):95-8.
  • Serwold T, Gonzalez F, Kim J, Jacob R, Shastri N. ERAAP customizes peptides for MHC class I molecules in the endoplasmic reticulum. Nature. 2002 Oct 3;419(6906):480-3.
  • Stolcke A. SRILM - An Extensible Language Modeling Toolkit. In: Proc. Intl. Conf. Spoken Language Processing, Denver, Colorado, September 2002.