MHCLIG HELP

HELP INDEX

  • BACKGROUND
  • USER GUIDE
  • CONTACT

  • BACKGROUND

           The major histocompatibility complex class I (MHC I) protein family encompasses a large number of diverse glycoproteins that can be divided into classical MHC I molecules (MHC Ia) and non-classical and MHC I-like molecules (MHC Ib) (1). While all MHC Ia molecules bind peptides, MHC Ib molecules are more diverse: some bind peptides (e.g. HLA-E, HLA-G), some bind lipids (e.g. CD1 antigens, EPCR) and others have no bound ligand at all (e.g. MICA, MICB, HFE) (2). Numerous methods can predict peptide binding to MHC Ia molecules (3). Until now, however, no method was available to predict whether a given member of the MHC I family binds any ligand at all and, if so, whether that ligand is a peptide or a lipid. This is exactly what MHCLIG does.

           Models to predict the ligand-type specificity of MHC I protein family members were obtained by machine learning (ML), treating the task as a classification problem. In this approach, the ligand-type specificity of MHC I molecules is learned from examples. These examples are MHC I molecules of known ligand-type specificity that were collected expressly (ex professo) for this purpose.

    References

    USER GUIDE

  • Input
    The input of MHCLIG consists of one or more protein sequences in FASTA format, which can be pasted or uploaded to the server.
    An example set of protein sequences in FASTA format follows:

    >H2Q1  GENE ID: 15006 emb|CAA34448.1|
    SHSLRYFETSVSRPGFGKPRFISVGYVDDTQFVRFDSDAKNPRYEPRAPWMEQEGPEYWE
    RNTRRVKGSEKRFQESLSTLLSYYNQSKGGIHTFQKLSGCDLGSDGRLQSGYLQFAYDGL
    DYIALNEDLETWTAADVAAQETRHKWEQAGAAEKHRTYLEGKCLMWLHRYLELGKEMLL
    >H2Q2 GENE ID: 15013  emb|CAA41475.1|
    SHSMRYFETVVSRPGLGEPRYVSVGYVDDTEFVRFDSDAEKPRYEPRARWMEQEGPEYWE
    RITQIAKGHEQWFRVSLRKLLGYYNQSAGGSHTLQEMYGCDVGSDGRLLRGYRQSAYDGC
    DYIALNEDLKTWTAKDVAALITRRKWEQDGAAEYYKAYMEGECVQSLRRYLELGKETLL
    >MMr1  GENE ID: 15064 ref|NP_032235.1|
    THSLRYFRLAVSDPGPVVPEFISVGYVDSHPITTYDSVTRQKEPKAPWMAENLAPDHWER
    YTQLLRGWQQTFKAELRHLQRHYNHSGLHTYQRMIGCELLEDGSTTGFLQYAYDGQDFII
    FNKDTLSWLAMDYVAHITKQAWEANLHELQYQKNWLEEECIAWLKRFLEYGRDTLE
    >H2M1 GENE ID: 224756 gb|AAO50317.1|
    SHTLRYVYTLLSWPGPLEPQLIFLGYVDDTQIMGFNSISENLGVESRAPWMYETEEFWEK
    TTDNVVREHYILKEIMRSVLHIYNYSIIGYHTIQKTYGCQVMHRRYFSHGFFKLAFNLHD
    YITLNEDLKTWRGVGKAGEMLKEMWEKIKYANQVKSFLQITCVNLLHRFLAFGKKSLL
    >H2M2 GENE ID: 14990 gb|AAQ81303.1|
    SHSLRYFDIAVSRPGLEETHYMTVGYVDDTEFVHFDNEAENPRFEPRVPWMEQMGQKYWD
    DQTRIAKAAEQQIRVYFQKLRDYYNQSQNSSHTIQRMTGCYIGPDGHLLHAYRQFGYDGQ
    DYLTLNEDLSTWTAADAAAEITRREWEATNVAEFWRVYLEGPCMVWLFKYLTVGNETLL
    >H2M9 GENE ID: 14997 gb|EDL23297.1|
    SHTLRFVSTFLSWPRHLELQFIFLIYVDETQIMGFNSISESQRMESRVPWLNELNAEFWE
    LATQDVLKEKSFVTGIMNKLLHIYNDSMTGYHIIQETYGCQVKQRTYFSHAFMELLFDTH
    DYITLNEDLQTWRAVGKAAEIVKEEWEKINLVKSSKSFLLGACVEGLLQYLNFGKKYLL
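    The sketch below shows how input like this is typically split into records. It is a minimal Python parser that turns FASTA text into (identifier, sequence) pairs. It assumes that the identifier is the first whitespace-delimited token of each header line. The server's actual parsing rules are not described on this page, and the name `parse_fasta` is hypothetical.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into a list of (identifier, sequence) pairs.

    Assumption: the identifier is the first whitespace-delimited token of
    the header line (e.g. "H2Q1" for ">H2Q1  GENE ID: 15006 ...").
    """
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            tokens = line[1:].split()
            header, chunks = (tokens[0] if tokens else ""), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


EXAMPLE = """\
>H2Q1  GENE ID: 15006 emb|CAA34448.1|
SHSLRYFETSVSRPGFGKPRFISVGYVDDTQFVRFDSDAKNPRYEPRAPWMEQEGPEYWE
RNTRRVKGSEKRFQESLSTLLSYYNQSKGGIHTFQKLSGCDLGSDGRLQSGYLQFAYDGL
DYIALNEDLETWTAADVAAQETRHKWEQAGAAEKHRTYLEGKCLMWLHRYLELGKEMLL
>H2Q2 GENE ID: 15013  emb|CAA41475.1|
SHSMRYFETVVSRPGLGEPRYVSVGYVDDTEFVRFDSDAEKPRYEPRARWMEQEGPEYWE
RITQIAKGHEQWFRVSLRKLLGYYNQSAGGSHTLQEMYGCDVGSDGRLLRGYRQSAYDGC
DYIALNEDLKTWTAKDVAALITRRKWEQDGAAEYYKAYMEGECVQSLRRYLELGKETLL
"""

records = parse_fasta(EXAMPLE)
for seq_id, seq in records:
    print(seq_id, len(seq))  # H2Q1 179, then H2Q2 179
```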
    


  • Prediction Models
           MHCLIG ML-models to predict the ligand-type specificity of classical and non-classical MHCI molecules were built on two distinct datasets, MHCI556 and MHCI500.

           The MHCI556 training dataset contains 556 MHCI proteins of known ligand-type specificity. The proteins included in the MHCI556 dataset are listed next.

    MHCI          Species              Seqs.  Ligand
    HLA-[ABC]     Human                  111  P
    DLA-88        Dog                     22  P
    SLA-[123]     Swine                   51  P
    BoLA-N        Cattle                  39  P
    OLA-N         Sheep                   12  P
    ONMY-UBA      Rainbow trout           29  P
    SASA-UBA      Atlantic salmon         27  P
    RT1-A         Rat                     21  P
    H2-X          Mouse                   26  P
    HLA-E         Human and primates       6  P
    HLA-G         Human and primates       1  P
    H2-T23(Qa1)   Mouse and rat            4  P
    H2-Q9         Mouse                    2  P
    H2-M3         Mouse and rat            4  P
    CD1[A-E]      Vertebrates             71  L
    ZAG           Vertebrates              6  L
    EPCR          Vertebrates              7  L
    MICA&B        Vertebrates             38  N
    HFE           Vertebrates              6  N
    MILL1&2       Mouse and rat            4  N
    FcRn          Vertebrates              9  N
    ULBP          Vertebrates             45  N
    H2-T3(TLA)    Mouse and rat           15  N

    Ligand: P, peptide; L, lipid; N, no ligand.

    We only considered the MHCI α1α2 domain of each protein.

          The MHCI500 dataset was derived from the MHCI556 dataset by removing all classical MHCI molecules from fish (ONMY-UBA and SASA-UBA).
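    The dataset sizes can be cross-checked against the table with a few lines of arithmetic. The sketch below transcribes the per-group counts and sums them by ligand type. Removing the two fish groups (29 + 27 = 56 sequences) is what takes 556 down to 500. The variable and function names are illustrative only.

```python
# Per-group sequence counts transcribed from the MHCI556 table:
# (group, number of sequences, ligand type).
MHCI556 = [
    ("HLA-[ABC]", 111, "P"), ("DLA-88", 22, "P"), ("SLA-[123]", 51, "P"),
    ("BoLA-N", 39, "P"), ("OLA-N", 12, "P"), ("ONMY-UBA", 29, "P"),
    ("SASA-UBA", 27, "P"), ("RT1-A", 21, "P"), ("H2-X", 26, "P"),
    ("HLA-E", 6, "P"), ("HLA-G", 1, "P"), ("H2-T23(Qa1)", 4, "P"),
    ("H2-Q9", 2, "P"), ("H2-M3", 4, "P"),
    ("CD1[A-E]", 71, "L"), ("ZAG", 6, "L"), ("EPCR", 7, "L"),
    ("MICA&B", 38, "N"), ("HFE", 6, "N"), ("MILL1&2", 4, "N"),
    ("FcRn", 9, "N"), ("ULBP", 45, "N"), ("H2-T3(TLA)", 15, "N"),
]
FISH_CLASSICAL = {"ONMY-UBA", "SASA-UBA"}  # removed to obtain MHCI500


def totals_by_ligand(rows):
    """Sum sequence counts per ligand type (P, L, N)."""
    totals = {"P": 0, "L": 0, "N": 0}
    for _group, n, ligand in rows:
        totals[ligand] += n
    return totals


MHCI500 = [row for row in MHCI556 if row[0] not in FISH_CLASSICAL]

for name, rows in (("MHCI556", MHCI556), ("MHCI500", MHCI500)):
    print(name, totals_by_ligand(rows), sum(n for _g, n, _l in rows))
# MHCI556 {'P': 355, 'L': 84, 'N': 117} 556
# MHCI500 {'P': 299, 'L': 84, 'N': 117} 500
```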

    ML-models were built on these two datasets using the k-nearest neighbor algorithm (kNN) and support vector machines (SVMs) with polynomial (SVM-Pk) and radial basis function kernels (SVM-RBFk). The models were built and evaluated using 10-fold cross-validation. The performance of the relevant models trained on the MHCI556 and MHCI500 datasets is shown below (SE: sensitivity; SP: specificity; ACC: accuracy).

    Models built on the MHCI556 dataset
    Algorithm   SE (%)    SP (%)    ACC (%)         Parameters
    kNN         100 ± 0   100 ± 1   99.94 ± 0.42    K = 4
    SVM-Pk      100 ± 0   100 ± 2   99.46 ± 0.87    E = 3, C = 1
    SVM-RBFk    100 ± 0   100 ± 0   100.0 ± 0.0     G = 4, C = 4

    Models built on the MHCI500 dataset
    Algorithm   SE (%)    SP (%)    ACC (%)         Parameters
    kNN         100 ± 0   100 ± 1   100.0 ± 0.0     K = 1
    SVM-Pk      100 ± 0   100 ± 2   99.42 ± 0.89    E = 5, C = 1
    SVM-RBFk    100 ± 0   100 ± 0   100.0 ± 0.0     G = 2, C = 1

    K: number of nearest neighbors; E: exponent of the polynomial kernel; G: gamma of the RBF kernel; C: SVM cost parameter.
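    The evaluation procedure can be sketched in plain Python for the kNN case. Several details here are assumptions, not statements about how MHCLIG was built:

    - The page does not say how α1α2 sequences were turned into feature vectors, so a hypothetical amino-acid-composition encoding is used.
    - The SVM models are omitted.
    - SE and SP are computed one-vs-rest for a chosen class.
    - The ± values are taken to be standard deviations across folds.

    The toy data are made-up sequences, not MHC I proteins.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Hypothetical encoding: frequency of each of the 20 standard residues."""
    counts = Counter(seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[aa] / total for aa in AMINO_ACIDS]


def knn_predict(train, x, k):
    """Majority label among the k training vectors closest to x (Euclidean)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    top = max(votes.values())
    # On a tie, prefer the tied label whose member is nearest to x.
    for _, label in neighbours:
        if votes[label] == top:
            return label


def cross_validate(data, k, folds=10, seed=0):
    """Return mean and SD (%) of per-fold accuracy under k-fold cross-validation."""
    if len(data) < folds:
        raise ValueError("need at least one example per fold")
    items = list(data)
    random.Random(seed).shuffle(items)
    accuracies = []
    for f in range(folds):
        test = items[f::folds]  # every folds-th example, starting at index f
        train = [item for i, item in enumerate(items) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        accuracies.append(100.0 * correct / len(test))
    mean = sum(accuracies) / folds
    sd = math.sqrt(sum((a - mean) ** 2 for a in accuracies) / (folds - 1))
    return mean, sd


def one_vs_rest(y_true, y_pred, positive):
    """Sensitivity, specificity and accuracy with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    se = tp / (tp + fn) if tp + fn else float("nan")
    sp = tn / (tn + fp) if tn + fp else float("nan")
    return se, sp, (tp + tn) / len(pairs)


# Toy usage with made-up sequences: two well-separated classes.
toy = [(composition("A" * 10 + "L" * i), "P") for i in range(10)]
toy += [(composition("G" * 10 + "W" * i), "N") for i in range(10)]
print(cross_validate(toy, k=1))  # (100.0, 0.0)
```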

          In addition to the ML-based models, MHCLIG also provides a BLAST-based method to predict the ligand-type specificity of MHCI proteins. BLAST predictions are obtained by searching the query against a database of MHCI molecules annotated with their known ligand-type specificity (P, L, N). The ligand-type specificity of the query is then assigned as that of the closest hit. The BLAST-formatted database was built from the MHCI556 dataset.
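    The closest-hit rule can be sketched as a small post-processing step over BLAST+ tabular output, for example from `blastp -outfmt 6`. This sketch rests on stated assumptions. The help page does not say which BLAST program, output format or score MHCLIG uses to rank hits. Here "closest hit" is taken to mean the hit with the highest bit score. The `subject_labels` mapping and the hit lines are hypothetical.

```python
def assign_by_top_hit(blast_tab_lines, subject_labels):
    """Give each query the ligand type (P, L or N) of its best BLAST hit.

    blast_tab_lines: lines of BLAST+ tabular output (-outfmt 6), whose 12
        default columns are qseqid, sseqid, pident, length, mismatch,
        gapopen, qstart, qend, sstart, send, evalue and bitscore.
    subject_labels: hypothetical dict mapping database sequence IDs to
        their known ligand type.
    """
    best = {}  # query ID -> (bit score, subject ID) of the best hit so far
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    # "?" marks a hit whose label is unknown (an illustrative choice)
    return {q: subject_labels.get(s, "?") for q, (_, s) in best.items()}


# Made-up hit lines for illustration only.
hits = [
    "q1\tdbA\t95.0\t180\t9\t0\t1\t180\t1\t180\t1e-100\t350",
    "q1\tdbB\t60.0\t175\t70\t2\t1\t175\t3\t177\t1e-50\t180",
    "q2\tdbB\t88.0\t178\t21\t0\t1\t178\t1\t178\t1e-90\t300",
]
print(assign_by_top_hit(hits, {"dbA": "P", "dbB": "N"}))  # {'q1': 'P', 'q2': 'N'}
```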



    PROCESSING METHOD AND OUTPUT

           Sequences submitted to MHCLIG are first passed through a domain search engine that identifies and isolates the amino acid sequence of the MHCI α1α2 domain. Any sequence lacking this domain is discarded from further analysis. The system then uses each of the selected models to predict the ligand-type specificity of each remaining MHCI molecule. α1α2 domain sequences of any length are submitted to the predictive models. However, the server will show a warning for sequences with more than 190 or fewer than 170 residues.
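    The processing steps in the previous paragraph can be sketched as a filter over (identifier, sequence) pairs. The domain search engine itself is not described, so it is represented by a hypothetical `extract_domain` callable that returns the α1α2 domain or `None`. The 170 and 190 residue limits come from the text above. As the text describes, out-of-range domains are kept and only flagged.

```python
def screen_sequences(records, extract_domain, min_len=170, max_len=190):
    """Keep sequences with an α1α2 domain and flag unusual domain lengths.

    records: iterable of (identifier, full protein sequence) pairs.
    extract_domain: hypothetical stand-in for the server's domain search
        engine; returns the α1α2 domain sequence, or None if none is found.
    Returns (identifier, domain, warning) triples; warning is None when the
    domain length lies within [min_len, max_len].
    """
    kept = []
    for seq_id, seq in records:
        domain = extract_domain(seq)
        if domain is None:
            continue  # no α1α2 domain: discarded from further analysis
        n = len(domain)
        warning = None
        if n < min_len:
            warning = f"{seq_id}: domain has {n} residues (fewer than {min_len})"
        elif n > max_len:
            warning = f"{seq_id}: domain has {n} residues (more than {max_len})"
        kept.append((seq_id, domain, warning))  # still predicted, just flagged
    return kept


# Toy usage: pretend the whole input is the domain, or None if it is empty.
demo = [("a", "X" * 180), ("b", "X" * 150), ("c", "X" * 200), ("d", "")]
for seq_id, _, warning in screen_sequences(demo, lambda s: s or None):
    print(seq_id, warning)
```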

           The output of MHCLIG consists of a table showing, for each MHCI protein sequence submitted to the server, whether it binds peptides (P), binds lipids (L) or has no ligand (N). A prediction is given for each of the models selected on the web server front page. The server also reports a consensus prediction, defined as the most common prediction among the ML-based models. The BLAST-based prediction is not considered for the consensus. A representative output of MHCLIG is shown below:

    Seq #  Seq. Id              SVM-RBFk  kNN  SVM-Pk  BLAST  CONSENSUS
    1      H2Q1                 P         P    P       P      P
    2      H2Q2                 P         P    P       P      P
    3      mMr1                 N         N    N       P      N
    4      hMr1                 N         N    N       P      N
    5      H2M1                 L         P    N       P      N
    6      H2M2                 P         P    P       P      P
    7      H2M9                 L         L    L       P      L
    8      H2M10.1              N         N    N       P      N
    9      H2M10.2              N         N    N       P      N
    10     H2-M10.3             N         N    N       P      N
    11     H2M10.4              N         P    P       P      N
    12     H2M10.5              N         P    P       P      N
    13     H2M10.6              N         P    N       P      N
    14     H2-M11               L         L    L       P      L
    15     H2T3                 N         N    N       N      N
    16     H2T9                 N *       N *  N *     P *    N *
    17     H2T10                N *       P *  N *     P *    N *
    18     H2T22                N *       N *  N *     P *    N *
    19     H2T18_TLA            N         N    N       N      N
    20     H2T24                N         N    N       P      N
    21     gi|56541372|HLAF|1   P         P    P       P      P
    22     ul18|1               P         P    P       P      P
    22ul18|1PPPPP


    P: Peptide, L: Lipid, N: Null (no ligand)

    Predictions marked with * were made on α1α2 domain sequences that are too short for our prediction methods and should be interpreted with caution.
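    The consensus rule described above can be sketched as follows. It takes the most common label among the selected ML-based models and leaves BLAST out. The page does not say how ties are resolved, for example when three models give three different labels. This hypothetical `consensus` function therefore simply returns `None` in that case. The model names are taken from the output table, and the first example call mirrors row 7 (H2M9) of the sample output.

```python
from collections import Counter

ML_MODELS = ("SVM-RBFk", "kNN", "SVM-Pk")  # BLAST deliberately not included


def consensus(predictions):
    """Most common label among the ML-based models that were run.

    predictions: dict mapping model name to 'P', 'L' or 'N'; any 'BLAST'
    entry is ignored. Returns None when no ML model was run or when the
    top labels are tied (tie handling is an assumption of this sketch).
    """
    votes = Counter(predictions[m] for m in ML_MODELS if m in predictions)
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between labels
    return ranked[0][0]


print(consensus({"SVM-RBFk": "L", "kNN": "L", "SVM-Pk": "L", "BLAST": "P"}))  # L
print(consensus({"SVM-RBFk": "N", "kNN": "N", "SVM-Pk": "N", "BLAST": "P"}))  # N
```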

    CONTACT: For any questions, please contact Pedro Reche.



    Last change: November 2009