Overview
to development of multi-epitope T-cell vaccines
|
Cellular adaptive immune responses are mediated by a type of leukocyte
named T cells, which are mostly responsible for intracellular
surveillance. T cells carry out this mission by recognizing
peptide-antigens (epitopes) displayed in the context of major
histocompatibility complexes (pMHC) via their
T cell receptors. Since T cells recognizing self-peptides are
eliminated during the process of thymic selection, those pMHC
incorporating foreign peptides are the primary focus of T cell-mediated
immune responses (von Boehmer, 1991). According to the presence on
their cell surface of either of the co-receptors CD8 or CD4, T cells
can be divided into CD8- and CD4-T cells, respectively. CD8-T cells and
CD4-T cells engage different types of pMHC complexes. Thus, CD8-T cells
recognize epitopes in the context of MHC molecules of class I
(pMHCI), whereas CD4-T cells recognize peptide-antigens in the context
of MHC molecules of class II (pMHCII). CD4- and CD8-T cell based immune
responses also differ. The CD4-T cell mediated immune response is more
complex, and it works by providing help via cytokine production to other
compartments of the immune system (B cells and/or CD8-T cells). On the
other hand, the CD8-T cell based immune response is simpler and better
understood, as these are cytotoxic T lymphocytes (CTLs) that directly
destroy cells displaying pMHCI complexes that contain foreign
peptides. Therefore, CTL-mediated immune
responses play a central role in protective immunity against many viral
and intracellular bacterial infections (Zinkernagel, 1996; Schaible et
al., 1999). Moreover, CTLs also play a major role in the recognition and
subsequent elimination of tumor cells, as these usually display an
altered repertoire of peptides, due to the expression of
"new" and/or mutated proteins (Wang and Rosenberg, 1996).
Thus, identification of CTL epitopes that can induce protective
cellular immunity is critical in the development of T cell-based
vaccination strategies against human infectious diseases and cancer,
and therefore it has received extensive attention.
Sufficient conditions for a peptide to be a CTL epitope are not well
known; however, a necessary condition is that it binds to MHCI molecules
(HLA molecules in humans, for Human Leukocyte Antigens). Therefore, most
strategies for the anticipation of CTL epitopes are based on the
identification of peptides that can bind to HLA molecules. In this
regard, in PEPVAC the identification of peptide-HLA binding is
based on specific motif profiles (Reche et al., 2002). HLAI
molecules bind and display peptide fragments for recognition by the
TCRs of T cells only if they have been processed adequately from their
protein sources. Class I restricted peptides result from proteolytic
degradation of cytosolic proteins, with experimental evidence
indicating that most CTL epitopes result from proteasomal cleavage, in
particular at their C-terminus (Craiu et al., 1997). Thus, since the
proteasome plays a vital role in determining CTL epitopes, we have
implemented in PEPVAC a probabilistic model that indicates whether the
C-terminus of a predicted HLA peptide binder is the result of proteasomal cleavage.
HLA molecules are extremely polymorphic, can bind distinct sets of
peptides (Reche and Reinherz, 2003), and they are expressed at vastly
variable frequencies in different ethnic groups (HLA, 1998). Thus, it
would appear that an extremely large and impractical number of peptides
would have to be selected in order to develop a multi-epitope vaccine
that is broadly protective. In PEPVAC, we have surmounted this formidable
obstacle by targeting for peptide binding predictions 5 groups of HLA alleles
(Supertypes) sharing largely overlapping sets of
predicted peptide binders, and covering most of the population.
Only the peptides
predicted to bind to all HLA alleles included in each Supertype
(promiscuous epitopes) are selected as potential T-cell epitopes.
Thus, identification of these promiscuous peptide binders
minimizes the total number of predicted epitopes without compromising
the population coverage required in the design of multi-epitope
vaccines.
An additional problem for the development of multi-epitope vaccines is
the high mutation rate of many pathogens.
To overcome this problem, for
pathogens such as HIV-1 that are known to use this strategy to evade the
immune system, we have considered multiple amino acid sequence
alignments for each of the gene products, and generated consensus
sequences with the variable residues masked (we have masked those with
a Shannon entropy value > 1). Therefore, in PEPVAC the
prediction of CTL epitopes for such genomes is restricted to the
conserved regions. |
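The masking of variable residues described above can be illustrated with a minimal sketch. It is not the PEPVAC implementation: it assumes entropy is measured in bits (log base 2), that gaps are treated like any other symbol, and that masked positions are written as "."; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(column):
    """Shannon entropy (in bits, an assumption) of one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def masked_consensus(alignment, threshold=1.0, mask="."):
    """Consensus sequence in which columns with entropy > threshold are masked."""
    consensus = []
    for column in zip(*alignment):
        if shannon_entropy(column) > threshold:
            consensus.append(mask)  # variable position: excluded from prediction
        else:
            consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)

# Toy alignment (made-up sequences, for illustration only)
alignment = ["MKVLA", "MKILA", "MRTLA", "MKGLA"]
print(masked_consensus(alignment))  # prints MK.LA
```

In this toy case only the third column (four different residues, entropy of 2 bits) exceeds the threshold, so any 9-mer overlapping a masked position would be skipped by a downstream predictor.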
HLA-peptide binding predictions |
Anticipation of CTL epitopes lies fundamentally in the prediction of
HLA peptide binders, and this is accomplished using specific profile
motifs of known HLA peptide binders (Reche et al., 2002). For the
prediction of peptide-HLA binding these profiles or Position Specific
Scoring Matrices (PSSMs) are used in combination with a modified version
of the RANKPEP
algorithm that scores and ranks all peptides within a query of one or
more proteins accordingly. In RANKPEP, scoring and ranking of peptides is
applied locally and independently for each of the proteins entered in
the query (scores are not compared between peptides from different
proteins). In contrast, in the modified version of RANKPEP used by
PEPVAC, peptide scoring occurs globally (independently of protein
source), and all peptides are ranked and sorted accordingly. Therefore, a
new field has been added to the output to indicate the protein
source of the peptide.
We have reported previously that approximately 80% of all known
MHCI-restricted epitopes are found among the 2% of top scoring peptides
from their protein sources (Reche et al., 2002), and thus, in this
version of PEPVAC we have considered as true binders only the 2% of top
scoring peptides. Also, as the vast majority of known peptides binding
to HLA molecules are 9-mers, we have only considered profiles for the
prediction of HLA-peptide binders 9 residues long.
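The global scoring and ranking described above can be sketched as follows. This is an illustration under stated assumptions, not the RANKPEP code: it assumes a peptide score is the sum of per-position matrix values, represents the PSSM as a list of 9 dictionaries, and uses made-up matrix values and protein sequences; all names are hypothetical.

```python
def score_peptide(peptide, pssm):
    """Sum the matrix value of each residue at its position (assumed additive)."""
    return sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(peptide))

def rank_globally(proteins, pssm, top_fraction=0.02, length=9):
    """Score the peptides of all proteins together and keep the top fraction.

    Each result is (score, peptide, protein name, 1-based start position),
    so the protein source of every peptide is kept in the output.
    """
    scored = []
    for name, seq in proteins.items():
        for start in range(len(seq) - length + 1):
            peptide = seq[start:start + length]
            scored.append((score_peptide(peptide, pssm), peptide, name, start + 1))
    scored.sort(key=lambda item: item[0], reverse=True)
    n_keep = max(1, round(len(scored) * top_fraction))
    return scored[:n_keep]

# Toy matrix: only positions 2 and 9 carry (made-up) preferences
pssm = [{} for _ in range(9)]
pssm[1] = {"L": 2.0, "M": 1.0}
pssm[8] = {"V": 2.0, "L": 1.0}

proteins = {"protA": "ALAAAAAAVG", "protB": "GGGGGGGGGG"}
print(rank_globally(proteins, pssm))
```

Because all peptides share one ranking, a protein with no good binders (here the hypothetical protB) contributes nothing to the selected set, which would not be the case if the 2% cutoff were applied per protein.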
Profiles or PSSMs basically consist of a table containing the
sequence-weighted frequency of each one of the 20 amino acids observed
in every column of the alignment divided by the corresponding expected
frequency of that amino acid in the background (usually the frequency
of the amino acid in the SWISSPROT database). An example PSSM can be found
here. Profiles were derived as indicated elsewhere (Reche et al., 2002),
although sequences were obtained from the EPIMHC database (Reche et
al., 2003b). |
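As a rough illustration of the kind of table described above, the following sketch builds a position-specific matrix from a set of aligned 9-mers. It departs from the description in several labeled ways: sequences are not weighted, a pseudocount is added to avoid zero frequencies, the background is uniform unless one is supplied, and the ratio is stored as a log2 value so that position scores can be summed. The function name and the toy peptides are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(peptides, background=None, pseudocount=1.0):
    """Per-position log2 ratio of observed to background residue frequency."""
    if background is None:
        # Assumption: uniform background; the text uses database frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    length = len(peptides[0])
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)  # unweighted counts
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background[aa])
        pssm.append(column)
    return pssm

# Made-up binders, for illustration only
peptides = ["ALAAAAAAV", "MLKKKKKKV", "ALGGGGGGL"]
pssm = build_pssm(peptides)
print(round(pssm[1]["L"], 2), round(pssm[1]["W"], 2))
```

Residues that are over-represented at a position relative to the background get positive values, and residues that are never observed there get negative ones.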
Supertypes/Coverage |
Supertype is a term first coined by Sette and Sidney (1998) and serves to
indicate a group of HLA alleles that bind a largely overlapping set of
peptides. The concept of supertype is linked to that of the supermotif. Each
HLA allele binds peptides that fit a sequence binding motif specific
to that HLA allele. Thus, if a set of peptides can bind
to several distinct HLA alleles, that implies the existence of a peptide
binding supermotif. Sette and Sidney (1998) defined several HLA
supertypes by visual comparison of the reported peptide binding motifs
of individual HLA alleles. We have defined HLA supertypes in a different
way, by comparing the overlap between the peptide binders predicted
from a random protein 1000 aa long by a group of HLA-specific profile
matrices (the 2% top scoring peptides were considered binders). We have
thus generated a distance matrix whose coefficients (dij) decrease
with the number of identical peptide binders (nij) between
any two HLA profiles (dij = 200 - nij). Finally, using a
Fitch-Margoliash clustering algorithm we derived a dendrogram to
determine the kinship among the HLA specific peptide binding sets, and
defined the HLA supertypes accordingly. The dendrogram from which the
HLA Supertypes were drawn can be seen here. The
following supertypes have been included in this version of PEPVAC:
-
A2: A*0201, A*0202, A*0203, A*0205, A*0206
-
A3: A*0301, A*1101, A*3101, A*3301, A*6801
-
A24: A*2301, A*2402, A*2403, A*2405, A*2407
-
B7: B*0702, B*3501, B*5101, B*5301, B*5401
-
B15: A*0101, B*1501_B62, B*1502
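The overlap-based distance used to define these supertypes, and the selection of promiscuous binders within one supertype, can be sketched as below. The sketch covers only the distance matrix and the set intersection; the Fitch-Margoliash clustering step is not reproduced. Allele names, peptide labels, and function names are hypothetical, and the constant is passed as a parameter (200 in the text, 3 in the toy data).

```python
def overlap_distance_matrix(binder_sets, max_binders=200):
    """d_ij = max_binders - n_ij, where n_ij is the number of shared binders."""
    names = sorted(binder_sets)
    matrix = {}
    for a in names:
        for b in names:
            shared = len(binder_sets[a] & binder_sets[b])
            matrix[(a, b)] = max_binders - shared
    return names, matrix

def promiscuous_binders(binder_sets, alleles):
    """Peptides predicted to bind every allele in the given group."""
    return set.intersection(*(binder_sets[a] for a in alleles))

# Toy predicted binder sets (made-up labels, 3 binders per profile)
binder_sets = {
    "alleleA": {"p1", "p2", "p3"},
    "alleleB": {"p1", "p2", "p4"},
    "alleleC": {"p5", "p6", "p7"},
}
names, matrix = overlap_distance_matrix(binder_sets, max_binders=3)
print(matrix[("alleleA", "alleleB")], matrix[("alleleA", "alleleC")])  # prints 1 3
print(sorted(promiscuous_binders(binder_sets, ["alleleA", "alleleB"])))
```

Profiles that share most of their binders end up at a small distance and would fall into the same cluster, while profiles with no shared binders sit at the maximum distance.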
There is virtually no overlap between the peptide binders from two
different supertypes, and only the promiscuous epitopes of each
supertype are selected as potential epitopes (those binding to all
alleles included in that supertype). This minimizes the number
of peptides without compromising the population coverage required in
the design of multi-epitope vaccines. Indeed, these supertypes were
selected on the basis that providing epitope prediction for all the
included HLA alleles will result in a broadly efficacious peptide-based
vaccine (population coverage about 95%). Population
coverage for any combination of selected HLA alleles was obtained from
HLA allele and haplotype gene frequencies for 5 major American ethnicities
(Black, Caucasian, Hispanic, Native American, and Asian) (Cao et al., 2001), and
it was computed using a modified version of the Schipper et al. (1996)
algorithm. This new algorithm takes into account linkage disequilibrium between alleles
of different loci, using the haplotype frequencies. Only haplotype
frequencies between the HLA-A and -B loci and between the HLA-B and -C
loci were included in these calculations. No linkage disequilibrium was
considered between the HLA-A and -C genes. For any combination of HLA
alleles the population coverage reported by PEPVAC corresponds to that
of the ethnic group with the lowest coverage. |
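The idea of reporting the coverage of the least covered group can be illustrated with a simplified calculation. This is not the modified Schipper et al. algorithm: it assumes a single table of two-locus haplotype frequencies per group and random pairing of haplotypes (Hardy-Weinberg proportions), so that an individual is covered when at least one of the two haplotypes carries a selected allele. All frequencies, group names, and placeholder alleles ("other-A", "other-B") are made up.

```python
def group_coverage(haplotype_freqs, selected):
    """Fraction of individuals carrying at least one selected allele.

    haplotype_freqs maps (A allele, B allele) tuples to frequencies summing to 1.
    Assumes the two haplotypes of an individual are drawn independently.
    """
    covered = sum(f for hap, f in haplotype_freqs.items() if set(hap) & selected)
    return 1.0 - (1.0 - covered) ** 2

def reported_coverage(freqs_by_group, selected):
    """Coverage of the group with the lowest coverage."""
    return min(group_coverage(h, selected) for h in freqs_by_group.values())

# Hypothetical haplotype frequencies for two hypothetical groups
freqs_by_group = {
    "group1": {("A*0201", "B*0702"): 0.3,
               ("A*0301", "B*1501"): 0.2,
               ("other-A", "other-B"): 0.5},
    "group2": {("A*0201", "B*0702"): 0.1,
               ("other-A", "other-B"): 0.9},
}
selected = {"A*0201"}
print(round(reported_coverage(freqs_by_group, selected), 2))  # prints 0.19
```

Working from haplotype rather than allele frequencies avoids counting twice the individuals whose selected HLA-A and HLA-B alleles travel together on the same haplotype.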
Proteasome cleavage |
Class I restricted peptides result from the processing of cytosolic
proteins, which involves cleavage by the proteasome, trimming by cytosolic
N-terminal exopeptidases, TAP mediated transport of peptides to the
Endoplasmic Reticulum (ER), and finally trimming by ER N-terminal exopeptidases
(Serwold et al., 2002). Thus, the N-terminus of
any class I restricted peptide is highly variable, since it is dictated
by the progressive catalytic action of several aminopeptidases. On the
other hand, the C-terminus of class I restricted epitopes is the direct
result of the activity of the proteasome. Thus, proteasome cleavage
predictions help to refine and reduce the number of predicted epitopes,
and hence we have modeled the probability of the C-terminus of a given
peptide to be the result of proteasomal cleavage.
Probabilistic models for proteasomal cleavage were generated using the
SRILM statistical language model toolkit
(Stolcke, 2002) from a training set of protein fragments containing the
C-terminal end of 332 selected class I restricted epitopes. The length
of the fragment in the training set varies for each model implemented
in PEPVAC (10, 6, and 4, for models 1, 2 and 3 respectively), and the
C-terminal end of the class I restricted epitope is indicated by a
symbol tag ("|") and flanked by the same number of residues
on both sides. Training involves using a fixed window size (order),
which is the segment of the protein fragment that is processed by the
training algorithm; the training window is smaller than the size of the
fragment (2, 4, and 2 for models 1, 2 and 3, respectively). The model thus created is then applied to a longer test
peptide or complete protein, using a testing window (order), which is
the segment of the peptide that is processed by the algorithm to
determine the probability that cleavage will take place at the index
point of the window. The model is given a cutpoint threshold; cutpoint
probabilities above this threshold result in the prediction of a
cutpoint. Thus, the variable parameters in each cleavage prediction are
the length of the fragments used to create the training set, the window
size used to train the model and for determining the cutpoint
probabilities in the tested peptide, and the cutpoint insertion
threshold. Sensitivity of the three models used is above 80%, as tested
in a set of 932 MHCI naturally restricted peptides. |
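The general idea of training on fragments tagged with "|" and then sliding a window over a test sequence can be illustrated with a small count-based model. This is a swapped-in simplification, not SRILM and not the PEPVAC models: it conditions only on the residues preceding a position, uses add-alpha smoothing, and is trained on three made-up fragments. All names and parameter values are hypothetical.

```python
from collections import defaultdict

def train_cut_model(fragments, order=2):
    """Count how often each context of `order` residues is followed by a cut."""
    cut_counts = defaultdict(int)
    context_counts = defaultdict(int)
    for fragment in fragments:
        residues = fragment.replace("|", "")
        cut_index = fragment.index("|")  # number of residues before the cut tag
        for end in range(order, len(residues) + 1):
            context = residues[end - order:end]
            context_counts[context] += 1
            if end == cut_index:
                cut_counts[context] += 1
    return cut_counts, context_counts

def cut_probability(context, model, alpha=1.0):
    """Smoothed estimate of P(cut | preceding residues)."""
    cut_counts, context_counts = model
    return (cut_counts[context] + alpha) / (context_counts[context] + 2 * alpha)

def predict_cuts(sequence, model, order=2, threshold=0.5):
    """1-based positions after which a cut is predicted."""
    return [end for end in range(order, len(sequence) + 1)
            if cut_probability(sequence[end - order:end], model) > threshold]

# Made-up training fragments; "|" marks the C-terminal end of an epitope
fragments = ["AAL|KR", "GVL|SA", "TVL|KK"]
model = train_cut_model(fragments, order=2)
print(predict_cuts("MKVLKRAA", model))  # prints [4]
```

The three tunable quantities named in the text map onto this sketch as the length of the training fragments, the `order` argument, and the `threshold` argument; raising the threshold trades sensitivity for fewer predicted cut points.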
Team |
Dr. Pedro A. Reche
JP Glutting, MPH
For questions, please contact Pedro
A. Reche |
References |
-
von Boehmer H. Positive and negative selection of the alpha beta T-cell
repertoire in vivo. Curr Opin Immunol. 1991 Apr;3(2):210-5. Review.
-
Zinkernagel RM. Immunology taught by viruses. Science. 1996 Jan
12;271(5246):173-8. Review.
-
Schaible UE, Collins HL, Kaufmann SH. Confrontation between
intracellular bacteria and the immune system. Adv Immunol.
1999;71:267-377. Review.
-
Wang RF, Rosenberg SA. Human tumor antigens recognized by T
lymphocytes: implications for cancer therapy. J Leukoc Biol. 1996
Sep;60(3):296-309. Review.
-
Reche PA, Glutting JP, Reinherz EL. Prediction of MHC class I binding
peptides using profile motifs. Hum Immunol. 2002 Sep;63(9):701-9.
-
Craiu A, Akopian T, Goldberg A, Rock KL. Two distinct proteolytic
processes in the generation of a major histocompatibility complex class
I-presented peptide. Proc Natl Acad Sci U S A. 1997 Sep
30;94(20):10850-5.
-
Reche PA, Reinherz EL. Sequence variability analysis of human class I
and class II MHC molecules: functional and structural correlates of
polymorphisms. J. Mol. Biol. In press
-
HLA 1998. David W. Gjertson and Paul I. Terasaki, Editors
-
Reche PA, Glutting JP, Reinherz EL. 2003. EPIMHC database. In
preparation
-
Cao K, Hollenbach J, Shi X, Shi W, Chopek M, Fernandez-Vina MA.
Analysis of the frequencies of HLA-A, B, and C alleles and haplotypes
in the five major ethnic groups of the United States reveals high
levels of diversity in these loci and contrasting distribution patterns
in these populations. Hum Immunol. 2001 Sep;62(9):1009-30.
-
Schipper RF, van Els CA, D'Amaro J, Oudshoorn M. Minimal phenotype
panels. A method for achieving maximum population coverage with a
minimum of HLA antigens. Hum Immunol. 1996 Dec;51(2):95-8.
-
Serwold T, Gonzalez F, Kim J, Jacob R, Shastri N. ERAAP customizes
peptides for MHC class I molecules in the endoplasmic reticulum.
Nature. 2002 Oct 3;419(6906):480-3.
-
Stolcke A. SRILM - An Extensible Language Modeling Toolkit. In Proc. Intl.
Conf. Spoken Language Processing, Denver, Colorado, September 2002