MHCLIG HELP
The major histocompatibility complex class I (MHC I) protein family encompasses a large number of diverse glycoproteins that
can be divided into classical MHC I molecules (MHC Ia) and non-classical and MHC I-like molecules (MHC Ib) (1).
While all MHC Ia molecules bind peptides, MHC Ib molecules may bind peptides (e.g. HLA-E, HLA-G),
lipids (e.g. CD1 antigens, EPCR) or have no bound ligand (e.g. MICA, MICB, HFE) (2).
Currently, there is a plethora of methods that predict peptide binding to MHC Ia molecules (3) but, until now, no method has been
available to predict whether a given member of the MHC I family binds any ligand at all and, if so, the nature of that ligand
(peptide or lipid). This is exactly what MHCLIG does.
Models to predict the ligand-type specificity of MHC I protein family members were obtained using machine learning (ML) through a classification process. In this approach, the ligand-type specificity of MHCI molecules is learned from examples consisting of MHCI molecules of known ligand-type specificity that were collected expressly for this purpose.
The input of MHCLIG consists of one or more protein sequences in FASTA format, which can be pasted or uploaded to the server.
An example set of protein sequences in FASTA format follows:
>H2Q1 GENE ID: 15006 emb|CAA34448.1|
SHSLRYFETSVSRPGFGKPRFISVGYVDDTQFVRFDSDAKNPRYEPRAPWMEQEGPEYWE
RNTRRVKGSEKRFQESLSTLLSYYNQSKGGIHTFQKLSGCDLGSDGRLQSGYLQFAYDGL
DYIALNEDLETWTAADVAAQETRHKWEQAGAAEKHRTYLEGKCLMWLHRYLELGKEMLL
>H2Q2 GENE ID: 15013 emb|CAA41475.1|
SHSMRYFETVVSRPGLGEPRYVSVGYVDDTEFVRFDSDAEKPRYEPRARWMEQEGPEYWE
RITQIAKGHEQWFRVSLRKLLGYYNQSAGGSHTLQEMYGCDVGSDGRLLRGYRQSAYDGC
DYIALNEDLKTWTAKDVAALITRRKWEQDGAAEYYKAYMEGECVQSLRRYLELGKETLL
>MMr1 GENE ID: 15064 ref|NP_032235.1|
THSLRYFRLAVSDPGPVVPEFISVGYVDSHPITTYDSVTRQKEPKAPWMAENLAPDHWER
YTQLLRGWQQTFKAELRHLQRHYNHSGLHTYQRMIGCELLEDGSTTGFLQYAYDGQDFII
FNKDTLSWLAMDYVAHITKQAWEANLHELQYQKNWLEEECIAWLKRFLEYGRDTLE
>H2M1 GENE ID: 224756 gb|AAO50317.1|
SHTLRYVYTLLSWPGPLEPQLIFLGYVDDTQIMGFNSISENLGVESRAPWMYETEEFWEK
TTDNVVREHYILKEIMRSVLHIYNYSIIGYHTIQKTYGCQVMHRRYFSHGFFKLAFNLHD
YITLNEDLKTWRGVGKAGEMLKEMWEKIKYANQVKSFLQITCVNLLHRFLAFGKKSLL
>H2M2 GENE ID: 14990 gb|AAQ81303.1|
SHSLRYFDIAVSRPGLEETHYMTVGYVDDTEFVHFDNEAENPRFEPRVPWMEQMGQKYWD
DQTRIAKAAEQQIRVYFQKLRDYYNQSQNSSHTIQRMTGCYIGPDGHLLHAYRQFGYDGQ
DYLTLNEDLSTWTAADAAAEITRREWEATNVAEFWRVYLEGPCMVWLFKYLTVGNETLL
>H2M9 GENE ID: 14997 gb|EDL23297.1|
SHTLRFVSTFLSWPRHLELQFIFLIYVDETQIMGFNSISESQRMESRVPWLNELNAEFWE
LATQDVLKEKSFVTGIMNKLLHIYNDSMTGYHIIQETYGCQVKQRTYFSHAFMELLFDTH
DYITLNEDLQTWRAVGKAAEIVKEEWEKINLVKSSKSFLLGACVEGLLQYLNFGKKYLL
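For readers preparing input programmatically, the following is a minimal sketch of how FASTA text like the example above could be split into (header, sequence) records before submission. It is not part of MHCLIG; the function name `read_fasta` is hypothetical and the sketch assumes plain multi-line FASTA with `>` header lines.

```python
from typing import Iterator, List, Optional, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Sequence lines belonging to one record are concatenated; blank lines
    and any text before the first '>' header are ignored.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)


# Toy input (shortened, illustrative sequences only).
example = ">H2Q1 GENE ID: 15006\nSHSL\nRYFE\n>H2Q2\nSHSM\n"
records = list(read_fasta(example))
```

Each record keeps the full header line, so identifiers such as the gene ID remain available for matching against the output table.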
The MHCI556 training dataset contains 556 MHCI proteins of known ligand-type specificity. The proteins included in the MHCI556 dataset are shown next.
MHCI | Species | Seqs. | Ligand |
---|---|---|---|
HLA-[ABC] | Human | 111 | P |
DLA-88 | Dog | 22 | P |
SLA-[123] | Swine | 51 | P |
BoLA-N | Cattle | 39 | P |
OLA-N | Sheep | 12 | P |
ONMY-UBA | Rainbow trout | 29 | P |
SASA-UBA | Atlantic Salmon | 27 | P |
RT1-A | Rat | 21 | P |
H2-X | Mouse | 26 | P |
HLA-E | Human and Primates | 6 | P |
HLA-G | Human and primates | 1 | P |
H2-T23(Qa1) | Mouse and Rat | 4 | P |
H2-Q9 | Mouse | 2 | P |
H2-M3 | Mouse and Rat | 4 | P |
CD1[A-E] | Vertebrates | 71 | L |
ZAG | Vertebrates | 6 | L |
EPCR | Vertebrates | 7 | L |
MICA&B | Vertebrates | 38 | N |
HFE | Vertebrates | 6 | N |
MILL1&2 | Mouse and Rat | 4 | N |
FcRN | Vertebrates | 9 | N |
ULPB | Vertebrates | 45 | N |
H2-T3(TLA) | Mouse and Rat | 15 | N |
We only considered the MHCI α1α2 domain.
Models built upon the MHCI556 dataset
Algorithm | SE (%) | SP (%) | ACC (%) | Parameters |
---|---|---|---|---|
kNN | 100 ± 0 | 100 ± 1 | 99.94 ± 0.42 | K = 4 |
SVM-Pk | 100 ± 0 | 100 ± 2 | 99.46 ± 0.87 | E = 3, C = 1 |
SVM-RBFk | 100 ± 0 | 100 ± 0 | 100.0 ± 0.0 | G = 4, C = 4 |

Models built upon the MHCI500 dataset
Algorithm | SE (%) | SP (%) | ACC (%) | Parameters |
---|---|---|---|---|
kNN | 100 ± 0 | 100 ± 1 | 100.0 ± 0.0 | K = 1 |
SVM-Pk | 100 ± 0 | 100 ± 2 | 99.42 ± 0.89 | E = 5, C = 1 |
SVM-RBFk | 100 ± 0 | 100 ± 0 | 100.0 ± 0.0 | G = 2, C = 1 |
In addition to the ML-based models, MHCLIG also provides a BLAST method to predict the ligand-type specificity of MHCI proteins. BLAST predictions are obtained from BLAST searches against a database consisting of MHCI molecules with their known ligand-type specificity (P, L, N). Subsequently, the ligand-type specificity of the query is assigned to that of the closest hit. The BLAST-formatted database was built from the MHCI556 dataset.
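The closest-hit assignment can be illustrated with a short sketch. It is not the server's implementation: it assumes BLAST was run with the standard 12-column tabular output (`-outfmt 6`), it interprets "closest hit" as the hit with the highest bit score, and the function name, subject identifiers and scores below are all hypothetical.

```python
from typing import Dict, Iterable, Tuple


def assign_by_top_hit(blast_lines: Iterable[str],
                      ligand_of: Dict[str, str]) -> Dict[str, str]:
    """Map each query to the ligand type (P, L or N) of its best hit.

    blast_lines: tab-separated BLAST hits in the standard 12-column layout
                 (qseqid, sseqid, pident, ..., evalue, bitscore).
    ligand_of:   known ligand type for each database sequence ID.
    """
    best: Dict[str, Tuple[float, str]] = {}  # query -> (bit score, subject)
    for raw_line in blast_lines:
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if subject not in ligand_of:
            continue  # hit without an annotated ligand type
        # Assumption: the "closest" hit is the one with the highest bit score.
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: ligand_of[subject] for query, (_, subject) in best.items()}


# Made-up hits for illustration only.
hits = [
    "q1\tHLA-A\t85.0\t180\t27\t0\t1\t180\t1\t180\t1e-100\t350",
    "q1\tCD1A\t30.0\t170\t119\t3\t1\t170\t1\t170\t1e-10\t60",
    "q2\tMICA\t70.0\t175\t52\t1\t1\t175\t1\t175\t1e-80\t200",
]
known = {"HLA-A": "P", "CD1A": "L", "MICA": "N"}
predictions = assign_by_top_hit(hits, known)
```

Because the label is copied from a single database entry, a query with no close relative in the database can still receive a confident-looking label, which is one reason to compare the BLAST column against the ML-based predictions.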
The sequences entered in MHCLIG are subjected to a domain search engine to identify and isolate the amino acid sequence of the MHCI α1α2 domain. Any sequence lacking such a domain is discarded from further analysis. Subsequently, the system uses each of the selected models to predict the ligand-type specificity of the MHCI molecules. MHCI α1α2 domain sequences of any size are passed to the predictive models, but the server will show a warning message for sequences that have more than 190 residues or fewer than 170 residues.
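The length check described above amounts to a simple range test on the isolated domain. The sketch below is illustrative only: the function name and message wording are hypothetical, and the domain search step itself is not reproduced because the engine is not specified here.

```python
from typing import Optional


def length_warning(domain_seq: str,
                   min_len: int = 170,
                   max_len: int = 190) -> Optional[str]:
    """Return a warning for an unusually short or long α1α2 domain.

    Sequences with 170 to 190 residues (inclusive) give no warning. The
    sequence would still be passed to the models in either case.
    """
    n = len(domain_seq)
    if n < min_len:
        return f"warning: domain has {n} residues (fewer than {min_len})"
    if n > max_len:
        return f"warning: domain has {n} residues (more than {max_len})"
    return None


# Placeholder sequences of a given length, for illustration only.
ok = length_warning("A" * 180)
short = length_warning("A" * 169)
long_ = length_warning("A" * 191)
```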
The output of MHCLIG consists of a table indicating whether the MHCI protein sequences entered into the server bind peptides (P), bind lipids (L) or have no ligand (N), as judged by each of the models selected on the web server front page. A consensus prediction is also reported, consisting of the most common prediction among the ML-based models. The BLAST-based prediction is not considered for the consensus. A representative output of MHCLIG is shown below:
Seq # | SEQ. Id | SVM-RBFk | kNN | SVM-Pk | BLAST | CONSENSUS |
---|---|---|---|---|---|---|
1 | H2Q1 | P | P | P | P | P |
2 | H2Q2 | P | P | P | P | P |
3 | mMr1 | N | N | N | P | N |
4 | hMr1 | N | N | N | P | N |
5 | H2M1 | L | P | N | P | N |
6 | H2M2 | P | P | P | P | P |
7 | H2M9 | L | L | L | P | L |
8 | H2M10.1 | N | N | N | P | N |
9 | H2M10.2 | N | N | N | P | N |
10 | H2-M10.3 | N | N | N | P | N |
11 | H2M10.4 | N | P | P | P | N |
12 | H2M10.5 | N | P | P | P | N |
13 | H2M10.6 | N | P | N | P | N |
14 | H2-M11 | L | L | L | P | L |
15 | H2T3 | N | N | N | N | N |
16 | H2T9 | N * | N * | N * | P * | N * |
17 | H2T10 | N * | P * | N * | P * | N * |
18 | H2T22 | N * | N * | N * | P * | N * |
19 | H2T18_TLA | N | N | N | N | N |
20 | H2T24 | N | N | N | P | N |
21 | gi|56541372|HLAF|1 | P | P | P | P | P |
22 | ul18|1 | P | P | P | P | P |
CONTACT: For any questions: Pedro Reche