HELP

VARIABILITY METHODS

Shannon Entropy

Shannon entropy analysis (Shannon, 1948) is possibly the most sensitive tool to estimate the diversity of a system. For a multiple protein sequence alignment, the Shannon entropy (H) for every position is as follows:

$$H = -\sum_{i=1}^{M} P_i \log_2 P_i$$

where P_i is the fraction of residues of amino acid type i, and M is the number of amino acid types (20).
H ranges from 0 (only one residue is present at that position) to 4.322 (all 20 residues are equally represented at that position). Typically, positions with H > 2.0 are considered variable, whereas those with H < 2.0 are considered conserved. Highly conserved positions are those with H < 1.0 (Litwin and Jores, 1992). However, a minimum number of sequences (~100) is required for H to describe the diversity of a protein family.
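
As a rough illustration of the per-position calculation, the following Python sketch computes H (in bits) for each column of an alignment. The function names are hypothetical, and ignoring gap characters ('-') is an assumption of this sketch rather than documented server behaviour.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Return H (in bits) for one alignment column.

    `column` is any iterable of one-letter residue codes.
    Gap characters ('-') are skipped (an assumption of this sketch).
    """
    counts = Counter(r for r in column if r != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0  # column contains only gaps
    h = 0.0
    for n in counts.values():
        p = n / total            # P_i: fraction of residues of type i
        h -= p * math.log2(p)    # accumulate -P_i * log2(P_i)
    return h


def entropy_per_position(sequences):
    """Compute H for every column of equal-length aligned sequences."""
    return [shannon_entropy(col) for col in zip(*sequences)]
```

A fully conserved column gives H = 0, and a column in which all 20 amino acids are equally frequent gives log2(20), about 4.322, matching the range stated above.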

Simpson

The Simpson index is another diversity index calculated from genotype proportions. Below is the formula used to compute it:

This index is based on the chance that two genotypes sampled at random and with replacement from a community will be from the same species. Its value ranges between 0 and 1; the greater the value, the greater the sample diversity.
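
The sketch below shows one common way to compute a Simpson-type index for a single alignment column. It assumes the complement form 1 - sum(p_i^2), chosen only because that form increases with diversity as described here; the formula actually used by the server may differ. The function name and the skipping of gap characters are likewise assumptions.

```python
from collections import Counter


def simpson_diversity(column):
    """Return 1 - sum(p_i^2) for one alignment column.

    ASSUMPTION: complement (Gini-Simpson) form; not confirmed by the source.
    Gap characters ('-') are skipped.
    """
    counts = Counter(r for r in column if r != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0  # column contains only gaps
    # Chance that two residues drawn with replacement are of the same type.
    same_type = sum((n / total) ** 2 for n in counts.values())
    return 1.0 - same_type
```

Under this assumption a fully conserved column scores 0, and the score approaches 1 as more residue types share the column evenly.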

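Wu-Kabat

The Wu-Kabat variability coefficient is a well-established descriptor of the susceptibility of an amino acid position to evolutionary replacements (Kabat et al., 1977). It highlights stretches of accentuated amino acid variation. The variability coefficient is computed using the following formula:

$$\text{Variability} = \frac{N \cdot k}{n}$$

where N is the number of sequences in the alignment, k is the number of different amino acids at a given position, and n is the number of times the most common amino acid at that position occurs.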
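
A minimal sketch of the coefficient for one alignment column follows, assuming the usual Wu-Kabat form N * k / n (N sequences, k distinct residue types at the position, n occurrences of the most common type). The function name is hypothetical, and counting only non-gap residues towards N is an assumption of this sketch.

```python
from collections import Counter


def wu_kabat(column):
    """Return the Wu-Kabat variability N * k / n for one alignment column.

    Gap characters ('-') are skipped (an assumption of this sketch),
    so N is the number of non-gap residues in the column.
    """
    counts = Counter(r for r in column if r != "-")
    if not counts:
        return 0.0  # column contains only gaps
    n_sequences = sum(counts.values())    # N
    n_types = len(counts)                 # k
    most_common = max(counts.values())    # n
    return n_sequences * n_types / most_common
```

With this definition a fully conserved column scores 1, and the score grows as more residue types appear and the most common one becomes less dominant.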



USER GUIDE

Input

  • Protein Alignment
  • When this option is selected, a multiple sequence alignment in Clustal format must be provided. Only the standard 20 amino acids should be included in the alignment. If other sequence characters are included (e.g. X), the server will return an error message.

    A typical example of a Clustal alignment is the following:
    CLUSTAL W (1.81) multiple sequence alignment
    
    
    hla_a68w_1HSB       SHSMRYFYTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWI
    hla_a0201_1DUY      SHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWI
    hla_b3501_1A1N      SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRPPWI
    hla_b5301_1A1M      SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRPPWI
    hla_b5101_1E27      SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRAPWI
    hla_b2701_1HSA      SHSMRYFHTSVSRPGRGEPRFITVGYVDDTLFVRFDSDAASPREEPRAPWI
    hla_cw3_1EFX        SHSMRYFYTAVSRPGRGEPHFIAVGYVDDTQFVRFDSDAASPRGEPRAPWV
    hla-cw4_1IM9        SHSMRYFSTSVSWPGRGEPRFIAVGYVDDTQFVRFDSDAASPRGEPREPWV
    mkb_2vaa            PHSLRYFVTAVSRPGLGEPRYMEVGYVDDTEFVRFDSDAENPRYEPRARWM
    db-1BZ9             PHSMRYFETAVSRPGLEEPRYISVGYVDNKEFVRFDSDAENPRYEPRAPWM
                        .**:*** *::* **  **::: *****:. ******** . * ***  *:
    
    hla_a68w_1HSB       RNTRNVKAQSQTDRVDLGTLRGYYNQSEAGSHTIQMMYGCDVGS
    hla_a0201_1DUY      GETRKVKAHSQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGS
    hla_b3501_1A1N      RNTQIFKTNTQTYRESLRNLRGYYNQSEAGSHIIQRMYGCDLGP
    hla_b5301_1A1M      RNTQIFKTNTQTYRENLRIALRYYNQSEAGSHIIQRMYGCDLGP
    hla_b5101_1E27      RNTQIFKTNTQTYRENLRIALRYYNQSEAGSHTWQTMYGCDVGP
    hla_b2701_1HSA      RETQICKAKAQTDREDLRTLLRYYNQSEAGSHTLQNMYGCDVGP
    hla_cw3_1EFX        RETQKYKRQAQTDRVSLRNLRGYYNQSEAGSHIIQRMYGCDVGP
    hla-cw4_1IM9        RETQKYKRQAQADRVNLRKLRGYYNQSEDGSHTLQRMFGCDLGP
    mkb_2vaa            RETQKAKGNEQSFRVDLRTLLGYYNQSKGGSHTIQVISGCEVGS
    db-1BZ9             RETQKAKGQEQWFRVSLRNLLGYYNQSAGGSHTLQQMSGCDLGS
                         :*:  * : *  * .*     *****  ***  * : **::*. 
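
For readers who want to check an alignment before submitting it, the following Python sketch reads Clustal-formatted text like the example above into a dictionary of full-length sequences. The function name is hypothetical and the parsing rules (skip the header, blank lines and the conservation line, join blocks by sequence name) are assumptions about the format, not the server's own parser.

```python
def parse_clustal(text):
    """Parse Clustal-formatted text into {sequence name: aligned sequence}."""
    conservation_chars = set("*:. ")
    sequences = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("CLUSTAL"):
            continue  # blank separator or header line
        if set(stripped) <= conservation_chars:
            continue  # conservation line made only of '*', ':', '.' and spaces
        parts = stripped.split()
        if len(parts) < 2:
            continue  # not a "name  residues" line
        name, chunk = parts[0], parts[1]
        # Each block contributes one chunk per sequence; append in order.
        sequences[name] = sequences.get(name, "") + chunk
    return sequences
```

All returned sequences should have the same length; a mismatch usually means a block was truncated or a sequence name changed between blocks.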
    
    

  • PDB File
  • When this option is selected, the program will generate a multiple sequence alignment from the sequence in the PDB file. Additionally, if a Chain identifier is given, the program will select that chain from the PDB file. When no chain is provided, it will select the first chain by default.
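
The chain selection rule described above can be sketched as follows. This is a simplified illustration, not the server's code: the function name is hypothetical, only ATOM records are read, the fixed column positions follow the standard PDB layout, and files with several models are not handled.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequence(pdb_text, chain_id=None):
    """Return the one-letter sequence of one chain from PDB-format text.

    If chain_id is None, the first chain found in the ATOM records is used.
    Residues outside the 20 standard amino acids are written as 'X'.
    """
    residues = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain = line[21]                 # column 22: chain identifier
        if chain_id is None:
            chain_id = chain             # default to the first chain
        if chain != chain_id:
            continue
        key = line[22:27]                # residue number + insertion code
        if key in seen:
            continue                     # already recorded this residue
        seen.add(key)
        residues.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(residues)
```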

    Options

  • Plot variability
  • This option plots the sequence variability against the selected output sequence. When several variability methods have been selected, their graphs can be displayed by clicking on the method name.

  • Mask variability in sequence
  • This option masks, in the selected reference sequence, those residues with a variability greater than or equal to the selected variability threshold. The variability-masked sequence is returned in FASTA format. When the user clicks on the 'Run Epitope Prediction' button, the returned FASTA sequence is sent to the RANKPEP algorithm for the prediction of conserved T-cell epitopes.
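
The masking rule can be illustrated with a short Python sketch. The function names are hypothetical, and the mask character ('.') and the 60-column FASTA line width are assumptions; the characters and layout the server actually uses may differ.

```python
def mask_variable(sequence, variability, threshold, mask_char="."):
    """Replace residues whose variability is >= threshold with mask_char.

    `variability` holds one value per residue of `sequence`.
    The default mask character is an assumption of this sketch.
    """
    if len(sequence) != len(variability):
        raise ValueError("sequence and variability must have the same length")
    return "".join(
        mask_char if value >= threshold else residue
        for residue, value in zip(sequence, variability)
    )


def to_fasta(name, sequence, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

For example, `mask_variable("SHSM", [0.0, 1.0, 2.0, 0.5], 1.0)` masks the second and third residues, because a value equal to the threshold is also masked.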

  • Conserved Fragments
  • This option identifies those fragments (with a minimum length selected by the user) in the selected reference sequence that consist of consecutive residues whose variability is under the variability threshold. These fragments are returned in a table sorted by their position in the sequence alignment. Since sequence variability provides a means by which some pathogens escape the immune system, this option and the sequence variability masking option are relevant for vaccine design. It is important to note, however, that relevant antigenic regions can be composed of both conserved and variable regions. Unfortunately, such regions will not appear in the conserved fragments output if they do not contain the minimum number of consecutive conserved residues selected by the user.
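
The fragment search described above amounts to finding runs of consecutive positions below the threshold. The sketch below is one way to write it, with a hypothetical function name; reporting 1-based, inclusive start and end positions is an assumption about the output layout.

```python
def conserved_fragments(sequence, variability, threshold, min_length):
    """Return (start, end, fragment) for each conserved run.

    A run is a stretch of consecutive residues whose variability is
    strictly below `threshold`; runs shorter than `min_length` are dropped.
    Positions are 1-based and inclusive (an assumption of this sketch).
    """
    fragments = []
    start = None
    # A trailing infinite value closes any run that reaches the last residue.
    for i, value in enumerate(list(variability) + [float("inf")]):
        if value < threshold:
            if start is None:
                start = i            # a new conserved run begins here
        else:
            if start is not None and i - start >= min_length:
                fragments.append((start + 1, i, sequence[start:i]))
            start = None
    return fragments
```

Because runs are collected from left to right, the result is already sorted by position. A region that mixes conserved and variable residues is reported only for those conserved stretches that reach the minimum length.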

  • Map structural variability
  • This option maps the sequence variability onto a representative 3D structure, using the PDB file provided by the user. This is done using a Jmol applet; for correct visualization, JavaScript must be enabled in the browser. By default, the 3D structure is shown as 'wireframe', although other display options (for instance, 'trace') can be selected by the user. The 'Back to original mapping' button restores the sequence variability mapping after the 'Conserved Fragments' option has been selected and the user has clicked on a fragment to locate it on the PDB file.

    Variable Parameters


    References

    Shannon, C. E. (1948) A mathematical theory of communication. The Bell System Technical Journal, 27, 379-423 & 623-656.

    Kabat, E. A., Wu, T. T., and Bilofsky, H. (1977) Unusual distribution of amino acids in complementarity-determining (hypervariable) segments of heavy and light chains of immunoglobulins and their possible roles in specificity of antibody combining sites. J. Biol. Chem., 252, 6609-6616.

    Litwin, S. and Jores, R. (1992) In Theoretical and Experimental Insights into Immunology (edited by Perelson, A. S. and Weisbuch, G.), Springer-Verlag, Berlin.



    Last change: November 2007