MHCLIG Help Page


MHCLIG HELP

HELP INDEX

USER GUIDE

Input
Prediction Models
Processing method and Output

BACKGROUND

The major histocompatibility complex class I (MHC I) protein family encompasses a large number of diverse glycoproteins that can be divided into classical MHC I molecules (MHC Ia) and non-classical and MHC I-like molecules (MHC Ib) (1). While all MHC Ia molecules bind peptides, known MHC Ib molecules bind peptides (e.g. HLA-E, HLA-G), bind lipids (e.g. CD1 antigens, EPCR) or have no bound ligand (e.g. MICA, MICB, HFE) (2). Many methods can predict peptide binding to MHC Ia molecules (3) but, until now, no method has been available to predict whether a given member of the MHC I family binds any ligand at all and, if so, the nature of that ligand (peptide or lipid). This is exactly what MHCLIG does.

Models to predict the ligand-type specificity of MHC I protein family members were obtained using machine learning (ML) through a classification process. In this approach, the ligand-type specificity of MHCI molecules is learned from examples of MHCI molecules of known ligand-type specificity that were collected expressly for this purpose.

References

1. Maenaka, K. and Jones, E.Y. (1999) MHC superfamily structure and the immune system. Curr Opin Struct Biol., 9, 745-752.
2. Braud, V.M., Allan, D.S. and McMichael, A.J. (1999) Functions of nonclassical MHC and non-MHC-encoded class I molecules. Curr Opin Immunol., 11, 100-108.
3. Lafuente, E.M. and Reche, P.A. (2009) Prediction of MHC-peptide binding: a systematic and comprehensive overview. Current Pharmaceutical Design, 15, 3209-3220.
4. Frank, E., Hall, M., Trigg, L., Holmes, G. and Witten, I.H. (2004) Data mining in bioinformatics using Weka. Bioinformatics, 20, 2479-2481.
5. Gewehr, J.E., Szugat, M. and Zimmer, R. (2007) BioWeka--extending the Weka framework for bioinformatics. Bioinformatics, 23, 651-653.

USER GUIDE

Input

The input of MHCLIG consists of one or more protein sequences in FASTA format, which can be pasted or uploaded to the server.
An example set of protein sequences in FASTA format follows:

>H2Q1  GENE ID: 15006 emb|CAA34448.1|
SHSLRYFETSVSRPGFGKPRFISVGYVDDTQFVRFDSDAKNPRYEPRAPWMEQEGPEYWE
RNTRRVKGSEKRFQESLSTLLSYYNQSKGGIHTFQKLSGCDLGSDGRLQSGYLQFAYDGL
DYIALNEDLETWTAADVAAQETRHKWEQAGAAEKHRTYLEGKCLMWLHRYLELGKEMLL
>H2Q2 GENE ID: 15013  emb|CAA41475.1|
SHSMRYFETVVSRPGLGEPRYVSVGYVDDTEFVRFDSDAEKPRYEPRARWMEQEGPEYWE
RITQIAKGHEQWFRVSLRKLLGYYNQSAGGSHTLQEMYGCDVGSDGRLLRGYRQSAYDGC
DYIALNEDLKTWTAKDVAALITRRKWEQDGAAEYYKAYMEGECVQSLRRYLELGKETLL
>MMr1  GENE ID: 15064 ref|NP_032235.1|
THSLRYFRLAVSDPGPVVPEFISVGYVDSHPITTYDSVTRQKEPKAPWMAENLAPDHWER
YTQLLRGWQQTFKAELRHLQRHYNHSGLHTYQRMIGCELLEDGSTTGFLQYAYDGQDFII
FNKDTLSWLAMDYVAHITKQAWEANLHELQYQKNWLEEECIAWLKRFLEYGRDTLE
>H2M1 GENE ID: 224756 gb|AAO50317.1|
SHTLRYVYTLLSWPGPLEPQLIFLGYVDDTQIMGFNSISENLGVESRAPWMYETEEFWEK
TTDNVVREHYILKEIMRSVLHIYNYSIIGYHTIQKTYGCQVMHRRYFSHGFFKLAFNLHD
YITLNEDLKTWRGVGKAGEMLKEMWEKIKYANQVKSFLQITCVNLLHRFLAFGKKSLL
>H2M2 GENE ID: 14990 gb|AAQ81303.1|
SHSLRYFDIAVSRPGLEETHYMTVGYVDDTEFVHFDNEAENPRFEPRVPWMEQMGQKYWD
DQTRIAKAAEQQIRVYFQKLRDYYNQSQNSSHTIQRMTGCYIGPDGHLLHAYRQFGYDGQ
DYLTLNEDLSTWTAADAAAEITRREWEATNVAEFWRVYLEGPCMVWLFKYLTVGNETLL
>H2M9 GENE ID: 14997 gb|EDL23297.1|
SHTLRFVSTFLSWPRHLELQFIFLIYVDETQIMGFNSISESQRMESRVPWLNELNAEFWE
LATQDVLKEKSFVTGIMNKLLHIYNDSMTGYHIIQETYGCQVKQRTYFSHAFMELLFDTH
DYITLNEDLQTWRAVGKAAEIVKEEWEKINLVKSSKSFLLGACVEGLLQYLNFGKKYLL
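
Before submitting, it can help to check that a file parses as FASTA. The following Python sketch is only an illustration, not the server's own parser: the function name `parse_fasta` is hypothetical, and taking the first token of the header line as the sequence identifier is an assumption. The example uses the first two lines of two of the sequences shown above.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (identifier, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            # Assumption: the identifier is the first whitespace-delimited token.
            fields = line[1:].split()
            header = fields[0] if fields else ""
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated without line breaks.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>H2Q1  GENE ID: 15006 emb|CAA34448.1|
SHSLRYFETSVSRPGFGKPRFISVGYVDDTQFVRFDSDAKNPRYEPRAPWMEQEGPEYWE
RNTRRVKGSEKRFQESLSTLLSYYNQSKGGIHTFQKLSGCDLGSDGRLQSGYLQFAYDGL
>H2Q2 GENE ID: 15013  emb|CAA41475.1|
SHSMRYFETVVSRPGLGEPRYVSVGYVDDTEFVRFDSDAEKPRYEPRARWMEQEGPEYWE
RITQIAKGHEQWFRVSLRKLLGYYNQSAGGSHTLQEMYGCDVGSDGRLLRGYRQSAYDGC
"""

records = parse_fasta(example)
for name, seq in records:
    print(name, len(seq))
```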

Prediction Models

MHCLIG ML-models to predict the ligand-type specificity of classical and non-classical MHCI molecules were built upon two distinct datasets, MHCI⁵⁵⁶ and MHCI⁵⁰⁰.

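The MHCI⁵⁵⁶ training dataset contains 556 MHCI proteins of known ligand-type specificity. The proteins included in the MHCI⁵⁵⁶ dataset are shown next. The Ligand column indicates whether the proteins bind peptides (P), bind lipids (L) or have no ligand (N).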

MHCI	Species	Seqs.	Ligand
HLA-[ABC]	Human	111	P
DLA-88	Dog	22	P
SLA-[123]	Swine	51	P
BoLA-N	Cattle	39	P
OLA-N	Sheep	12	P
ONMY-UBA	Rainbow trout	29	P
SASA-UBA	Atlantic Salmon	27	P
RT1-A	Rat	21	P
H2-X	Mouse	26	P
HLA-E	Human and Primates	6	P
HLA-G	Human and primates	1	P
H2-T23(Qa1)	Mouse and Rat	4	P
H2-Q9	Mouse	2	P
H2-M3	Mouse and Rat	4	P
CD1[A-E]	Vertebrates	71	L
ZAG	Vertebrates	6	L
EPCR	Vertebrates	7	L
MICA&B	Vertebrates	38	N
HFE	Vertebrates	6	N
MILL1&2	Mouse and Rat	4	N
FcRn	Vertebrates	9	N
ULBP	Vertebrates	45	N
H2-T3(TLA)	Mouse and Rat	15	N

We considered only the MHCI α1α2 domain of each protein.

The MHCI⁵⁰⁰ dataset was derived from the MHCI⁵⁵⁶ dataset by removing all classical MHCI molecules from fish (ONMY-UBA and SASA-UBA).
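
The relation between the two datasets can be checked with a short tally. The Python sketch below copies the sequence counts from the table above and removes the two fish families; the variable and function names are illustrative only.

```python
from collections import Counter

# (family, number of sequences, ligand type), copied from the table above.
MHCI_556 = [
    ("HLA-[ABC]", 111, "P"), ("DLA-88", 22, "P"), ("SLA-[123]", 51, "P"),
    ("BoLA-N", 39, "P"), ("OLA-N", 12, "P"), ("ONMY-UBA", 29, "P"),
    ("SASA-UBA", 27, "P"), ("RT1-A", 21, "P"), ("H2-X", 26, "P"),
    ("HLA-E", 6, "P"), ("HLA-G", 1, "P"), ("H2-T23(Qa1)", 4, "P"),
    ("H2-Q9", 2, "P"), ("H2-M3", 4, "P"), ("CD1[A-E]", 71, "L"),
    ("ZAG", 6, "L"), ("EPCR", 7, "L"), ("MICA&B", 38, "N"),
    ("HFE", 6, "N"), ("MILL1&2", 4, "N"), ("FcRn", 9, "N"),
    ("ULBP", 45, "N"), ("H2-T3(TLA)", 15, "N"),
]

# Classical MHCI molecules from fish, removed to obtain the second dataset.
FISH = {"ONMY-UBA", "SASA-UBA"}
mhci_500 = [row for row in MHCI_556 if row[0] not in FISH]


def total(rows):
    """Total number of sequences in a dataset."""
    return sum(count for _, count, _ in rows)


def by_ligand(rows):
    """Number of sequences per ligand type (P, L, N)."""
    counts = Counter()
    for _, count, ligand in rows:
        counts[ligand] += count
    return dict(counts)


print(total(MHCI_556), by_ligand(MHCI_556))
print(total(mhci_500), by_ligand(mhci_500))
```

The totals come out at 556 and 500 sequences, which matches the dataset names.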

ML-models were built upon these two datasets using the k-nearest neighbor (kNN) algorithm and support vector machines (SVMs) with polynomial (SVM-Pk) and RBF (SVM-RBFk) kernels. We built and evaluated the models using 10-fold cross-validation. The performance of the relevant models trained on the MHCI⁵⁵⁶ and MHCI⁵⁰⁰ datasets, in terms of sensitivity (SE), specificity (SP) and accuracy (ACC), is shown below.

Models built upon the MHCI⁵⁵⁶ dataset
Algorithm	SE (%)	SP (%)	ACC (%)	Parameters
kNN	100 ± 0	100 ± 1	99.94 ± 0.42	K = 4
SVM-Pk	100 ± 0	100 ± 2	99.46 ± 0.87	E = 3, C = 1
SVM-RBFk	100 ± 0	100 ± 0	100.0 ± 0.0	G = 4, C = 4

Models built upon the MHCI⁵⁰⁰ dataset
Algorithm	SE (%)	SP (%)	ACC (%)	Parameters
kNN	100 ± 0	100 ± 1	100.0 ± 0.0	K = 1
SVM-Pk	100 ± 0	100 ± 2	99.42 ± 0.89	E = 5, C = 1
SVM-RBFk	100 ± 0	100 ± 0	100.0 ± 0.0	G = 2, C = 1
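
To illustrate how a kNN classifier can be evaluated by 10-fold cross-validation, here is a minimal Python sketch. It is not the code used to build the MHCLIG models. This page does not describe how sequences are encoded as features, so the two-dimensional toy points, the Euclidean distance and the function names are all assumptions made for illustration.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training points closest to `query`."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_validate(data, k, folds=10, seed=0):
    """Mean accuracy of kNN over `folds` cross-validation folds."""
    items = list(data)
    random.Random(seed).shuffle(items)
    accuracies = []
    for i in range(folds):
        # Every `folds`-th item is held out; the rest form the training set.
        test = items[i::folds]
        train = [x for j, x in enumerate(items) if j % folds != i]
        hits = sum(knn_predict(train, feats, k) == label for feats, label in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / folds


# Toy data (assumption): three well-separated clusters labelled P, L and N.
rng = random.Random(1)
centers = {"P": (0.0, 0.0), "L": (10.0, 0.0), "N": (0.0, 10.0)}
toy = [
    ((cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)), label)
    for label, (cx, cy) in centers.items()
    for _ in range(20)
]

accuracy = cross_validate(toy, k=4)
print(f"mean 10-fold accuracy: {accuracy:.2%}")
```

In each fold the model is trained on nine tenths of the data and tested on the held-out tenth, so every example is used for testing exactly once.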

In addition to the ML-based models, MHCLIG also provides a BLAST method to predict the ligand-type specificity of MHCI proteins. BLAST predictions are obtained from BLAST searches against a database of MHCI molecules with known ligand-type specificity (P, L, N). The ligand-type specificity of the query is then assigned as that of the closest hit. The BLAST-formatted database was built from the MHCI⁵⁵⁶ dataset.
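
The closest-hit assignment can be sketched in Python as follows. The sketch assumes BLAST results in the standard 12-column tabular format (query ID, subject ID, ..., E-value, bit score) and takes the closest hit to be the one with the highest bit score. The server may rank hits differently. The sequence identifiers, scores and function names are hypothetical.

```python
def best_hits(tabular_text):
    """Return {query_id: subject_id} for the highest-bit-score hit of each query."""
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.split("\t")
        # Columns 0, 1 and 11 hold the query ID, subject ID and bit score.
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def assign_ligand(tabular_text, ligand_of):
    """Give each query the ligand type (P, L or N) of its closest database hit."""
    return {q: ligand_of.get(s) for q, s in best_hits(tabular_text).items()}


# Hypothetical hits; all identifiers and scores are made up for illustration.
rows = [
    ("query1", "HLA-A_ref", "92.1", "180", "14", "0", "1", "180", "1", "180", "1e-100", "350"),
    ("query1", "CD1A_ref", "30.0", "170", "110", "5", "3", "172", "2", "171", "1e-10", "60.5"),
    ("query2", "HLA-A_ref", "28.0", "165", "112", "6", "5", "169", "4", "168", "1e-08", "55.1"),
    ("query2", "CD1A_ref", "88.0", "178", "21", "0", "1", "178", "1", "178", "1e-95", "330"),
]
tabular = "\n".join("\t".join(row) for row in rows)

# Hypothetical annotation of database entries with their ligand type.
ligand_of = {"HLA-A_ref": "P", "CD1A_ref": "L"}

predictions = assign_ligand(tabular, ligand_of)
print(predictions)
```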

Processing Method and Output

The sequences entered in MHCLIG are subjected to a domain search engine to identify and isolate the amino acid sequence of the MHCI α1α2 domain. Any sequence lacking this domain is excluded from further analysis. Subsequently, the system uses each of the selected models to predict the ligand-type specificity of the MHCI molecules. MHCI α1α2 domain sequences of any size are subjected to the predictive models, but the server shows warning messages for sequences that have more than 190 residues or fewer than 170 residues.
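
The length check can be written in a few lines of Python. Only the two thresholds (170 and 190 residues) come from this page. The function name and the wording of the messages are assumptions.

```python
MIN_LEN = 170  # domains with fewer residues trigger a warning
MAX_LEN = 190  # domains with more residues trigger a warning


def length_warning(domain_seq):
    """Return a warning string for an unusual α1α2 domain length, else None."""
    n = len(domain_seq)
    if n < MIN_LEN:
        return f"warning: domain has {n} residues (fewer than {MIN_LEN})"
    if n > MAX_LEN:
        return f"warning: domain has {n} residues (more than {MAX_LEN})"
    return None


# Sequences are still predicted whatever their length; only a warning is attached.
for length in (150, 180, 200):
    print(length, length_warning("A" * length))
```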

The output of MHCLIG consists of a table indicating whether the MHCI protein sequences entered into the server bind peptides (P), bind lipids (L) or have no ligand (N), as judged by each of the models selected on the web server front page. The server also reports a consensus prediction, which is the most common prediction among the ML-based models. The BLAST-based prediction is not considered for the consensus prediction. A representative output of MHCLIG is shown below:

Seq #	SEQ. Id	SVM-RBFk	kNN	SVM-Pk	BLAST	CONSENSUS
1	H2Q1	P	P	P	P	P
2	H2Q2	P	P	P	P	P
3	mMr1	N	N	N	P	N
4	hMr1	N	N	N	P	N
5	H2M1	L	P	N	P	N
6	H2M2	P	P	P	P	P
7	H2M9	L	L	L	P	L
8	H2M10.1	N	N	N	P	N
9	H2M10.2	N	N	N	P	N
10	H2-M10.3	N	N	N	P	N
11	H2M10.4	N	P	P	P	N
12	H2M10.5	N	P	P	P	N
13	H2M10.6	N	P	N	P	N
14	H2-M11	L	L	L	P	L
15	H2T3	N	N	N	N	N
16	H2T9	N *	N *	N *	P *	N *
17	H2T10	N *	P *	N *	P *	N *
18	H2T22	N *	N *	N *	P *	N *
19	H2T18_TLA	N	N	N	N	N
20	H2T24	N	N	N	P	N
21	gi|56541372|HLAF|1	P	P	P	P	P
22	ul18|1	P	P	P	P	P

P: Peptide, L: Lipid, N: No ligand

Predictions marked with * correspond to sequences that are too short for our prediction methods.
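
A plain majority vote over the ML-based models, with BLAST left out, can be sketched in Python as follows. This page does not say how ties between models are resolved, and the server may apply further rules, so the sketch simply returns None when there is no single most common prediction. The function name and the dictionary layout are assumptions.

```python
from collections import Counter

# Only the ML-based models vote; the BLAST prediction is ignored.
ML_MODELS = ("SVM-RBFk", "kNN", "SVM-Pk")


def consensus(predictions):
    """Most common label among the selected ML models, or None on a tie."""
    votes = Counter(predictions[m] for m in ML_MODELS if m in predictions)
    if not votes:
        return None
    ranked = votes.most_common()
    # Assumption: a tie for first place yields no consensus.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Two rows taken from the representative output above.
mmr1 = {"SVM-RBFk": "N", "kNN": "N", "SVM-Pk": "N", "BLAST": "P"}
h2m10_6 = {"SVM-RBFk": "N", "kNN": "P", "SVM-Pk": "N", "BLAST": "P"}

print(consensus(mmr1))
print(consensus(h2m10_6))
```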

CONTACT: For any questions: Pedro Reche

Last change: November 2009
