Protein Variability Server Help

HELP

VARIABILITY METHODS

Shannon Entropy
Simpson
Wu-Kabat

USER GUIDE

Input

Protein Alignment
PDB File

Options

Plot Variability
Mask Variability
Return conserved fragments
Map structural variability

Variable Parameters

Variability threshold
Fragment Length
Reference Sequence

VARIABILITY METHODS

Shannon Entropy

Shannon entropy analysis (Shannon, 1948) is possibly the most sensitive tool to estimate the diversity of a system. For a multiple protein sequence alignment, the Shannon entropy (H) for every position is as follows:

$$H = -\sum_{i=1}^{M} P_i \log_2 P_i$$

where P_i is the fraction of residues of amino acid type i, and M is the number of amino acid types (20).
H ranges from 0 (only one residue is present at that position) to 4.322 (all 20 residues are equally represented at that position; 4.322 = log2 20). Typically, positions with H > 2.0 are considered variable, whereas those with H < 2.0 are considered conserved. Highly conserved positions are those with H < 1.0 (Litwin and Jores, 1992). However, a minimum number of sequences (~100) is required for H to describe the diversity of a protein family.
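
As an illustration, the per-position entropy H = -sum(P_i * log2(P_i)) can be sketched in a few lines of Python. This is a minimal sketch, not the server's own code: the function names are hypothetical, and ignoring gap characters ("-") is an assumption.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (in bits) of one alignment column."""
    # Assumption: gap characters are ignored when computing fractions.
    residues = [aa for aa in column if aa != "-"]
    total = len(residues)
    if total == 0:
        return 0.0
    h = 0.0
    for count in Counter(residues).values():
        p = count / total          # P_i, fraction of amino acid type i
        h -= p * math.log2(p)
    return h


def alignment_entropy(sequences):
    """Entropy for every position of a list of equal-length aligned sequences."""
    return [shannon_entropy(column) for column in zip(*sequences)]


if __name__ == "__main__":
    print(alignment_entropy(["ACD", "ACE", "AGF", "AGH"]))
```

A column holding a single residue type gives H = 0, and a column with all 20 residue types equally represented gives log2(20), the upper bound quoted above.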

Simpson

The Simpson index is another diversity index calculated from genotype proportions. Below is the formula used to compute it:

This index describes the chance that two genotypes sampled at random and with replacement from a community will be from the same species. The value of this index ranges between 0 and 1; the greater the value, the greater the sample diversity.
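
The sketch below illustrates the idea for one alignment column. It assumes sampling with replacement, so the chance of drawing the same residue type twice is the sum of squared proportions; its complement grows with diversity. Which of the two forms the server reports is not stated here, so both are shown, and the function names are hypothetical.

```python
from collections import Counter


def same_type_probability(column):
    """Chance that two residues drawn with replacement are the same type."""
    # Assumption: gap characters are ignored.
    residues = [aa for aa in column if aa != "-"]
    total = len(residues)
    if total == 0:
        raise ValueError("column contains no residues")
    return sum((count / total) ** 2 for count in Counter(residues).values())


def simpson_diversity(column):
    """Complement of the same-type probability; larger means more diverse."""
    return 1.0 - same_type_probability(column)


if __name__ == "__main__":
    for column in ("AAAA", "AACC", "ACDE"):
        print(column, same_type_probability(column), simpson_diversity(column))
```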

Wu-Kabat

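The Wu-Kabat variability coefficient is a well-established descriptor of the susceptibility of an amino acid position to evolutionary replacements (Kabat et al., 1977). It highlights stretches of accentuated amino acid variation. The variability coefficient is computed using the following formula:

$$\text{Variability} = \frac{N \times k}{n}$$

where N is the number of sequences in the alignment, k is the number of different amino acids at a given position, and n is the number of times the most common amino acid occurs at that position.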
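
A minimal sketch of this coefficient for one alignment column, using the usual Wu-Kabat definition N * k / n (N sequences, k distinct amino acids at the position, n occurrences of the most common one). The function name is hypothetical, and counting only non-gap residues towards N is an assumption.

```python
from collections import Counter


def wu_kabat(column):
    """Wu-Kabat variability N * k / n for one alignment column."""
    # Assumption: gap characters are ignored, so N counts non-gap residues.
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        raise ValueError("column contains no residues")
    counts = Counter(residues)
    n_sequences = len(residues)          # N
    n_types = len(counts)                # k
    most_common = max(counts.values())   # n
    return n_sequences * n_types / most_common


if __name__ == "__main__":
    print(wu_kabat("AAAA"))   # fully conserved column
    print(wu_kabat("AACD"))   # three residue types, most common seen twice
```

Under this definition a fully conserved column scores 1, and the score rises as more residue types appear and the most common one becomes rarer.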

USER GUIDE

Input

Protein Alignment

When this option is selected, a multiple sequence alignment in Clustal format must be provided. Only the 20 standard amino acids should be included in the alignment. If other sequence characters are included (e.g. X), the server will return an error message.

A typical example of a Clustal alignment is the following:

CLUSTAL W (1.81) multiple sequence alignment


hla_a68w_1HSB       SHSMRYFYTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWI
hla_a0201_1DUY      SHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWI
hla_b3501_1A1N      SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRPPWI
hla_b5301_1A1M      SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRPPWI
hla_b5101_1E27      SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRAPWI
hla_b2701_1HSA      SHSMRYFHTSVSRPGRGEPRFITVGYVDDTLFVRFDSDAASPREEPRAPWI
hla_cw3_1EFX        SHSMRYFYTAVSRPGRGEPHFIAVGYVDDTQFVRFDSDAASPRGEPRAPWV
hla-cw4_1IM9        SHSMRYFSTSVSWPGRGEPRFIAVGYVDDTQFVRFDSDAASPRGEPREPWV
mkb_2vaa            PHSLRYFVTAVSRPGLGEPRYMEVGYVDDTEFVRFDSDAENPRYEPRARWM
db-1BZ9             PHSMRYFETAVSRPGLEEPRYISVGYVDNKEFVRFDSDAENPRYEPRAPWM
                    .**:*** *::* **  **::: *****:. ******** . * ***  *:

hla_a68w_1HSB       RNTRNVKAQSQTDRVDLGTLRGYYNQSEAGSHTIQMMYGCDVGS
hla_a0201_1DUY      GETRKVKAHSQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGS
hla_b3501_1A1N      RNTQIFKTNTQTYRESLRNLRGYYNQSEAGSHIIQRMYGCDLGP
hla_b5301_1A1M      RNTQIFKTNTQTYRENLRIALRYYNQSEAGSHIIQRMYGCDLGP
hla_b5101_1E27      RNTQIFKTNTQTYRENLRIALRYYNQSEAGSHTWQTMYGCDVGP
hla_b2701_1HSA      RETQICKAKAQTDREDLRTLLRYYNQSEAGSHTLQNMYGCDVGP
hla_cw3_1EFX        RETQKYKRQAQTDRVSLRNLRGYYNQSEAGSHIIQRMYGCDVGP
hla-cw4_1IM9        RETQKYKRQAQADRVNLRKLRGYYNQSEDGSHTLQRMFGCDLGP
mkb_2vaa            RETQKAKGNEQSFRVDLRTLLGYYNQSKGGSHTIQVISGCEVGS
db-1BZ9             RETQKAKGQEQWFRVSLRNLLGYYNQSAGGSHTLQQMSGCDLGS
                     :*:  * : *  * .*     *****  ***  * : **::*.
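
Before submitting, an alignment like the one above can be checked locally. The following is a rough sketch, not the server's parser: `parse_clustal` and `STANDARD_AA` are hypothetical names, and accepting the gap character "-" is an assumption.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_clustal(text):
    """Return {name: sequence} from Clustal-formatted text, validating residues."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("missing CLUSTAL header line")
    sequences = {}
    for line in lines[1:]:
        # Skip blank lines and conservation lines (they start with whitespace).
        if not line.strip() or line[0].isspace():
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name, chunk = parts[0], parts[1]
        # Assumption: gaps ("-") are allowed alongside the 20 standard residues.
        bad = set(chunk) - STANDARD_AA - {"-"}
        if bad:
            raise ValueError(f"non-standard characters in {name}: {sorted(bad)}")
        sequences[name] = sequences.get(name, "") + chunk
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences differ in length")
    return sequences


if __name__ == "__main__":
    example = (
        "CLUSTAL W (1.81) multiple sequence alignment\n\n\n"
        "seq1    ACD\nseq2    ACE\n        **.\n\n"
        "seq1    FG\nseq2    FG\n        **\n"
    )
    print(parse_clustal(example))
```

The sequence blocks are concatenated per name, so each returned string spans the full alignment length.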

PDB File

When this option is selected, the program will generate a multiple sequence alignment from the sequence in the PDB file. Additionally, if a chain identifier is given, the program will select that chain from the PDB file. When no chain is provided, it will select the first chain by default.
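
The chain selection step can be pictured with the sketch below, which reads the sequence of one chain from the C-alpha ATOM records of a PDB file and falls back to the first chain when none is given. It covers only that step, not how the server builds the alignment; the function name is hypothetical, and using ATOM records rather than SEQRES is an assumption.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequence(pdb_text, chain_id=None):
    """One-letter sequence of a chain; the first chain seen if chain_id is None."""
    sequence = []
    seen = set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # only the first model is read
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[12:16].strip() != "CA":   # one C-alpha atom per residue
            continue
        chain = line[21]
        if chain_id is None:
            chain_id = chain              # default: first chain in the file
        if chain != chain_id:
            continue
        residue_key = line[22:27]         # residue number plus insertion code
        if residue_key in seen:           # skip alternate locations
            continue
        seen.add(residue_key)
        # Non-standard residue names are reported as "X".
        sequence.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(sequence)
```

For example, `chain_sequence(open("1HSA.pdb").read(), "A")` would return the chain A sequence of a local file; the file name here is only a placeholder.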

Options

Variable Parameters

Variability Threshold:
This parameter must lie within the range of Shannon entropy values (0 to 4.3), and only one decimal place is allowed. Positions with a value of H above the threshold are filtered out. The default value is 1.3; positions with H < 1.3 are considered of low variability (highly conserved).
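
The filtering can be pictured as follows. This is a sketch under stated assumptions: the function name and the mask character "." are hypothetical, and the per-position H values are taken as already computed.

```python
def mask_variable_positions(sequence, entropies, threshold=1.3, mask_char="."):
    """Replace residues whose entropy exceeds the threshold with mask_char."""
    # The threshold must lie in the entropy range and carry one decimal at most.
    if not 0 <= threshold <= 4.3 or round(threshold, 1) != threshold:
        raise ValueError("threshold must be between 0 and 4.3 with one decimal")
    if len(sequence) != len(entropies):
        raise ValueError("sequence and entropies must have the same length")
    return "".join(
        mask_char if h > threshold else residue
        for residue, h in zip(sequence, entropies)
    )


if __name__ == "__main__":
    print(mask_variable_positions("ACDE", [0.0, 1.3, 1.4, 3.0]))
```

Positions with H exactly equal to the threshold are kept here, since only values above it are filtered out.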

Fragment Length:
This parameter sets the minimum length of the fragment. Each of the fragment residues has an H that is under the threshold value. Only the longest stretch of residues with H under the threshold is listed.
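
One way to picture the fragment search: scan the per-position H values, collect each maximal run of positions that are not above the threshold, and keep runs at least as long as the fragment length. The sketch below assumes exactly that reading; the function name and the 1-based (start, end, fragment) output are hypothetical.

```python
def conserved_fragments(sequence, entropies, threshold, min_length):
    """Return (start, end, fragment) for each maximal conserved run.

    Positions are 1-based and inclusive. A run is a stretch of consecutive
    positions whose entropy does not exceed the threshold.
    """
    fragments = []
    start = None
    # A trailing sentinel above any threshold closes a run at the sequence end.
    for i, h in enumerate(list(entropies) + [float("inf")]):
        if h <= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                fragments.append((start + 1, i, sequence[start:i]))
            start = None
    return fragments


if __name__ == "__main__":
    h_values = [0, 0, 0, 2, 0, 0, 0, 0, 2]
    print(conserved_fragments("ACDEFGHIK", h_values, threshold=1.3, min_length=3))
```

Because only maximal runs are collected, shorter sub-fragments of a listed fragment are never reported separately.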

Reference Sequence:
The reference sequence can either be a consensus sequence or the first sequence in the alignment. The second choice is particularly useful if the user has additional information on a given sequence, and wants to set it as the standard. By default, the consensus sequence is selected.
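
The two choices can be sketched as below. How the server builds its consensus is not described here, so taking the most frequent residue in each column (ties resolved in favour of the residue seen first) is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def consensus_sequence(sequences):
    """Most frequent character in each column of equal-length aligned sequences."""
    consensus = []
    for column in zip(*sequences):
        # most_common(1) returns the first-seen residue when counts are tied.
        residue, _count = Counter(column).most_common(1)[0]
        consensus.append(residue)
    return "".join(consensus)


def reference_sequence(sequences, use_consensus=True):
    """Consensus by default, otherwise the first sequence in the alignment."""
    if use_consensus:
        return consensus_sequence(sequences)
    return sequences[0]


if __name__ == "__main__":
    alignment = ["ACD", "ACE", "ADE"]
    print(reference_sequence(alignment))
    print(reference_sequence(alignment, use_consensus=False))
```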

References

Shannon, C. E. (1948) A mathematical theory of communication. The Bell System Technical Journal, 27, 379-423 & 623-656.

Kabat, E. A., Wu, T. T., and Bilofsky, H. (1977) Unusual distribution of amino acids in complementarity-determining (hypervariable) segments of heavy and light chains of immunoglobulins and their possible roles in specificity of antibody combining sites. J. Biol. Chem., 252, 6609-6616.

Litwin, S. and Jores, R. (1992) In: Theoretical and Experimental Insights into Immunology (edited by Perelson, A. S. and Weisbuch, G.), Springer-Verlag, Berlin.

Last change: November 2007