SIAS  
Sequence Identities And Similarities

HELP


Input: Multiple Sequence Alignment

The input for this program is a multiple sequence alignment (MSA) of protein or DNA sequences. However, the percentage similarity and normalized similarity scores calculated by the server apply only to proteins. The MSA can either be pasted or uploaded from a file.

Currently, the server accepts MSAs in three formats: ClustalW, FASTA, and GCG/PileUp (MSF). An example MSA in each of these formats follows.

  • ClustalW alignment:
    CLUSTAL W (1.81) multiple sequence alignment
    
    
    hla_a68w_1HSB       SHSMRYFYTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWI
    hla_a0201_1DUY      SHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWI
    hla_b3501_1A1N      SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRPPWI
    hla_b5301_1A1M      SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRPPWI
    hla_b5101_1E27      SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRAPWI
    hla_b2701_1HSA      SHSMRYFHTSVSRPGRGEPRFITVGYVDDTLFVRFDSDAASPREEPRAPWI
    hla_cw3_1EFX        SHSMRYFYTAVSRPGRGEPHFIAVGYVDDTQFVRFDSDAASPRGEPRAPWV
    hla-cw4_1IM9        SHSMRYFSTSVSWPGRGEPRFIAVGYVDDTQFVRFDSDAASPRGEPREPWV
    mkb_2vaa            PHSLRYFVTAVSRPGLGEPRYMEVGYVDDTEFVRFDSDAENPRYEPRARWM
    db-1BZ9             PHSMRYFETAVSRPGLEEPRYISVGYVDNKEFVRFDSDAENPRYEPRAPWM
                        .**:*** *::* **  **::: *****:. ******** . * ***  *:
    
    hla_a68w_1HSB       RNTRNVKAQSQTDRVDLGTLRGYYNQSEAGSHTIQMMYGCDVGS
    hla_a0201_1DUY      GETRKVKAHSQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGS
    hla_b3501_1A1N      RNTQIFKTNTQTYRESLRNLRGYYNQSEAGSHIIQRMYGCDLGP
    hla_b5301_1A1M      RNTQIFKTNTQTYRENLRIALRYYNQSEAGSHIIQRMYGCDLGP
    hla_b5101_1E27      RNTQIFKTNTQTYRENLRIALRYYNQSEAGSHTWQTMYGCDVGP
    hla_b2701_1HSA      RETQICKAKAQTDREDLRTLLRYYNQSEAGSHTLQNMYGCDVGP
    hla_cw3_1EFX        RETQKYKRQAQTDRVSLRNLRGYYNQSEAGSHIIQRMYGCDVGP
    hla-cw4_1IM9        RETQKYKRQAQADRVNLRKLRGYYNQSEDGSHTLQRMFGCDLGP
    mkb_2vaa            RETQKAKGNEQSFRVDLRTLLGYYNQSKGGSHTIQVISGCEVGS
    db-1BZ9             RETQKAKGQEQWFRVSLRNLLGYYNQSAGGSHTLQQMSGCDLGS
                         :*:  * : *  * .*     *****  ***  * : **::*. 
    
    

  • FASTA alignment:
    >hla_b5101_1E27
    SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRAPWIRNTQIFKTN
    TQTYRENLRIALRYYNQSEAGSHTWQTMYGCDVGP
    >hla_a0201_1DUY
    SHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIGETRKVKAH
    SQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGS
    >hla-cw4_1IM9
    SHSMRYFSTSVSWPGRGEPRFIAVGYVDDTQFVRFDSDAASPRGEPREPWVRETQKYKRQ
    AQADRVNLRKLRGYYNQSEDGSHTLQRMFGCDLGP
    >mkb_2vaa
    PHSLRYFVTAVSRPGLGEPRYMEVGYVDDTEFVRFDSDAENPRYEPRARWMRETQKAKGN
    EQSFRVDLRTLLGYYNQSKGGSHTIQVISGCEVGS
    >hla_b2701_1HSA
    SHSMRYFHTSVSRPGRGEPRFITVGYVDDTLFVRFDSDAASPREEPRAPWIRETQICKAK
    AQTDREDLRTLLRYYNQSEAGSHTLQNMYGCDVGP
    >hla_cw3_1EFX
    SHSMRYFYTAVSRPGRGEPHFIAVGYVDDTQFVRFDSDAASPRGEPRAPWVRETQKYKRQ
    AQTDRVSLRNLRGYYNQSEAGSHIIQRMYGCDVGP
    >hla_b3501_1A1N
    SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRPPWIRNTQIFKTN
    TQTYRESLRNLRGYYNQSEAGSHIIQRMYGCDLGP
    >db-1BZ9
    PHSMRYFETAVSRPGLEEPRYISVGYVDNKEFVRFDSDAENPRYEPRAPWMRETQKAKGQ
    EQWFRVSLRNLLGYYNQSAGGSHTLQQMSGCDLGS
    >hla_a68w_1HSB
    SHSMRYFYTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIRNTRNVKAQ
    SQTDRVDLGTLRGYYNQSEAGSHTIQMMYGCDVGS
    

  • GCG/PileUp alignment:
    PileUp
    
    
    
      MSF:  95 Type: P  Check: 5477  ..
    
    Name: hla_a68w_1HSB oo Len:  95 Check: 5515 Weight: 0.0
    Name: hla_a0201_1DUY oo Len:  95 Check: 4661 Weight: 10.0
    Name: hla_b3501_1A1N oo Len:  95 Check: 4585 Weight: 10.0
    Name: hla_b5301_1A1M oo Len:  95 Check: 4402 Weight: 10.0
    Name: hla_b5101_1E27 oo Len:  95 Check: 4791 Weight: 10.0
    Name: hla_b2701_1HSA oo Len:  95 Check: 3347 Weight: 10.0
    Name: hla_cw3_1EFX oo Len:  95 Check: 4868 Weight: 10.0
    Name: hla-cw4_1IM9 oo Len:  95 Check: 4736 Weight: 10.0
    Name: mkb_2vaa oo Len:  95 Check: 4517 Weight: 10.0
    Name: db-1BZ9 oo Len:  95 Check: 4055 Weight: 10.0
    
    //
    
    
    
    hla_a68w_1HSB    SHSMRYFYTS VSRPGRGEPR FIAVGYVDDT QFVRFDSDAA SQRMEPRAPW
    hla_a0201_1DUY   SHSMRYFFTS VSRPGRGEPR FIAVGYVDDT QFVRFDSDAA SQRMEPRAPW
    hla_b3501_1A1N   SHSMRYFYTA MSRPGRGEPR FIAVGYVDDT QFVRFDSDAA SPRTEPRPPW
    hla_b5301_1A1M   SHSMRYFYTA MSRPGRGEPR FIAVGYVDDT QFVRFDSDAA SPRTEPRPPW
    hla_b5101_1E27   SHSMRYFYTA MSRPGRGEPR FIAVGYVDDT QFVRFDSDAA SPRTEPRAPW
    hla_b2701_1HSA   SHSMRYFHTS VSRPGRGEPR FITVGYVDDT LFVRFDSDAA SPREEPRAPW
    hla_cw3_1EFX     SHSMRYFYTA VSRPGRGEPH FIAVGYVDDT QFVRFDSDAA SPRGEPRAPW
    hla-cw4_1IM9     SHSMRYFSTS VSWPGRGEPR FIAVGYVDDT QFVRFDSDAA SPRGEPREPW
    mkb_2vaa         PHSLRYFVTA VSRPGLGEPR YMEVGYVDDT EFVRFDSDAE NPRYEPRARW
    db-1BZ9          PHSMRYFETA VSRPGLEEPR YISVGYVDNK EFVRFDSDAE NPRYEPRAPW
    
    
    hla_a68w_1HSB       IRNTRNVKAQ SQTDRVDLGT LRGYYNQSEA GSHTIQMMYG CDVGS
    hla_a0201_1DUY      IGETRKVKAH SQTHRVDLGT LRGYYNQSEA GSHTVQRMYG CDVGS
    hla_b3501_1A1N      IRNTQIFKTN TQTYRESLRN LRGYYNQSEA GSHIIQRMYG CDLGP
    hla_b5301_1A1M      IRNTQIFKTN TQTYRENLRI ALRYYNQSEA GSHIIQRMYG CDLGP
    hla_b5101_1E27      IRNTQIFKTN TQTYRENLRI ALRYYNQSEA GSHTWQTMYG CDVGP
    hla_b2701_1HSA      IRETQICKAK AQTDREDLRT LLRYYNQSEA GSHTLQNMYG CDVGP
    hla_cw3_1EFX        VRETQKYKRQ AQTDRVSLRN LRGYYNQSEA GSHIIQRMYG CDVGP
    hla-cw4_1IM9        VRETQKYKRQ AQADRVNLRK LRGYYNQSED GSHTLQRMFG CDLGP
    mkb_2vaa            MRETQKAKGN EQSFRVDLRT LLGYYNQSKG GSHTIQVISG CEVGS
    db-1BZ9             MRETQKAKGQ EQWFRVSLRN LLGYYNQSAG GSHTLQQMSG CDLGS
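
As an illustration of the FASTA layout shown above, the following Python sketch reads an aligned FASTA text into a name-to-sequence mapping and checks that all rows have the same length, as an MSA requires. The function name parse_fasta_alignment and its error handling are assumptions made for this example; they do not describe how the server itself reads its input.

```python
def parse_fasta_alignment(text):
    """Parse aligned FASTA text into a dict {name: aligned sequence}."""
    parts = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            if not name or name in parts:
                raise ValueError("missing or duplicate sequence name")
            parts[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            parts[name].append(line)
    sequences = {n: "".join(chunks) for n, chunks in parts.items()}
    # Every row of an alignment must span the same number of columns.
    if len({len(s) for s in sequences.values()}) > 1:
        raise ValueError("sequences differ in length; input is not aligned")
    return sequences


# Two records taken from the FASTA example above.
example = """>mkb_2vaa
PHSLRYFVTAVSRPGLGEPRYMEVGYVDDTEFVRFDSDAENPRYEPRARWMRETQKAKGN
EQSFRVDLRTLLGYYNQSKGGSHTIQVISGCEVGS
>db-1BZ9
PHSMRYFETAVSRPGLEEPRYISVGYVDNKEFVRFDSDAENPRYEPRAPWMRETQKAKGQ
EQWFRVSLRNLLGYYNQSAGGSHTLQQMSGCDLGS
"""
alignment = parse_fasta_alignment(example)
for seq_name, seq in alignment.items():
    print(seq_name, len(seq))
```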
    
    


    Sequence Identity & Similarity Calculation

    This server computes sequence identity and similarity percentages between each pair of sequences in the MSA using the following equation:

    Eq. 1


    For each sequence pair, the number of identical residues is the count of exactly matching aligned residues, while matching similar residues are tallied using groups of similar amino acids that users can customize (Fig. 1).
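
Going by the description on this page, Eq. 1 presumably has the general form below. This is a reading of the text rather than a copy of the server's equation. The symbols P, n and L are labels chosen here: n is the number of identical residues (for identity) or of matching similar residues (for similarity), and L is the sequence length used as the denominator.

```latex
P = 100 \times \frac{n}{L}
```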


    Figure 1.
    Amino acid similarity groups provided by the server

    Computing percentages of identity and similarity requires dividing the number of identities or similarities by a sequence length (Eq. 1). However, for any given pair of aligned sequences there is no consensus on which length should be used for these calculations, so we provide four options to select from:
    • The length of the shortest sequence
    • The length of the longest sequence
    • The mean length of the two sequences being compared
    • The length of the MSA
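
The four length options can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the server's code. The similarity groups in EXAMPLE_GROUPS are hypothetical stand-ins for the groups of Fig. 1, columns where either sequence has a gap are not counted as matches, and sequence lengths are taken without gap characters. All function names are invented for this example.

```python
GAP = "-"

# Hypothetical similarity groups, for illustration only; the server's
# default groups are the ones shown in its Figure 1.
EXAMPLE_GROUPS = ["FYW", "ILVM", "KRH", "DE", "ST", "NQ"]


def count_matches(seq_a, seq_b, groups=EXAMPLE_GROUPS):
    """Return (identical, similar) counts for two aligned rows."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == GAP or y == GAP:
            continue  # assumption: gapped columns never count as matches
        if x == y:
            identical += 1
        elif any(x in g and y in g for g in groups):
            similar += 1
    return identical, similar


def reference_length(seq_a, seq_b, method):
    """Denominator for the percentage, one of the four options above."""
    len_a = len(seq_a) - seq_a.count(GAP)
    len_b = len(seq_b) - seq_b.count(GAP)
    if method == "shortest":
        return min(len_a, len_b)
    if method == "longest":
        return max(len_a, len_b)
    if method == "mean":
        return (len_a + len_b) / 2
    if method == "msa":
        return len(seq_a)  # number of alignment columns
    raise ValueError("unknown length option: " + method)


def percentage(count, seq_a, seq_b, method="shortest"):
    return 100.0 * count / reference_length(seq_a, seq_b, method)


row_a = "ACDE-FGH"
row_b = "ACEEKF--"
n_identical, n_similar = count_matches(row_a, row_b)
for option in ("shortest", "longest", "mean", "msa"):
    print(option, round(percentage(n_identical, row_a, row_b, option), 2))
```

The sketch returns identical and similar counts separately. Whether identical residues are also counted towards the similarity percentage is left to the caller, since the text above does not settle it.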


    Normalized Global Similarity Score

    For every pair of aligned sequences in the provided MSA, the server calculates a normalized similarity score, S, that penalizes the presence of gaps, using Equation 2:

    Eq. 2

    where:
    • "Mij" are similarity scores for each pair of aligned amino acids (ij) obtained from substitution matrices
    • "o" is the number of gaps and "Po" is the penalty for opening a gap
    • "e" is the total extension of the gaps and "Pe" the penalty for extending a gap.

    Gap penalties

    Sequence alignments often contain gaps, which lower the global sequence similarity (Eq. 2). In general, opening a gap is more costly than extending one; therefore, the default penalty for opening a gap, Po, is 10, while the default penalty for extending a gap, Pe, is 0.2. Nonetheless, users can set both Po and Pe to any value from 0 to 100.
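
The gap-penalized sum described by the terms of Eq. 2 can be sketched in Python with the default penalties. The normalization applied by the server is not recoverable from the text on this page, so the sketch stops at the un-normalized sum. Several points are assumptions made for illustration: "o" is counted as the number of gap runs, "e" as the total number of gapped columns, and columns where both sequences have a gap are ignored. EXAMPLE_SCORES holds only a handful of BLOSUM62 entries; a real calculation needs the full matrix.

```python
GAP = "-"

# A few BLOSUM62 entries, enough for the toy example below.
EXAMPLE_SCORES = {
    ("A", "A"): 4, ("C", "C"): 9, ("D", "E"): 2,
    ("E", "E"): 5, ("F", "F"): 6,
}


def pair_score(x, y, scores):
    """Look up Mij, accepting either order of the residue pair."""
    if (x, y) in scores:
        return scores[(x, y)]
    return scores[(y, x)]


def penalized_score(seq_a, seq_b, scores, p_open=10.0, p_ext=0.2):
    """Sum of Mij minus o*Po minus e*Pe for two aligned rows."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    for penalty in (p_open, p_ext):
        if not 0 <= penalty <= 100:
            raise ValueError("penalties must lie between 0 and 100")
    total = 0
    opens = 0    # "o": number of gap runs (assumed interpretation)
    extent = 0   # "e": total gapped columns (assumed interpretation)
    previous = None
    for x, y in zip(seq_a, seq_b):
        if x == GAP and y == GAP:
            continue  # assumption: columns gapped in both rows are skipped
        if x == GAP:
            current = "a"
        elif y == GAP:
            current = "b"
        else:
            current = None
            total += pair_score(x, y, scores)
        if current is not None:
            extent += 1
            if current != previous:
                opens += 1
        previous = current
    return total - opens * p_open - extent * p_ext


print(penalized_score("ACDE-FGH", "ACEEKF--", EXAMPLE_SCORES))
```

In the toy pair, the matched columns sum to 26, there are two gap runs covering three columns, and the default penalties therefore subtract 20 and 0.6.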

    Substitution matrices

    Substitution matrices are important in all sequence comparison analyses. In SIAS, users can select among three matrices (BLOSUM62, PAM250 and GONNET) to obtain the Mij scores required to calculate normalized similarity scores according to Eq. 2. The three matrices are listed below:

    BLOSUM62 Matrix:
    #  BLOSUM Clustered Scoring Matrix in 1/2 Bit Units
    #  * column uses minimum score
    #  Cluster Percentage: >= 62
    #  Entropy =   0.6979, Expected =  -0.5209
    
       A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
    A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
    R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
    N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
    D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4
    C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
    Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4
    E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
    G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4
    H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4
    I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4
    L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4
    K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4
    M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4
    F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4
    P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4
    S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4
    T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4
    W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4
    Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4
    V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4
    B -2 -1  3  4 -3  0  1 -1  0 -3 -4  0 -3 -3 -2  0 -1 -4 -3 -3  4  1 -1 -4
    Z -1  0  0  1 -3  3  4 -2  0 -3 -3  1 -1 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
    X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1 -1 -1 -4
    * -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4  1
    
    

    PAM250 Matrix
    # This matrix was produced by "pam" Version 1.0.6 [28-Jul-93]
    # PAM 250 substitution matrix, scale = ln(2)/3 = 0.231049
    # Expected score = -0.844, Entropy = 0.354 bits
    # Lowest score = -8, Highest score = 17
    
       A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
    A  2 -2  0  0 -2  0  0  1 -1 -1 -2 -1 -1 -3  1  1  1 -6 -3  0  0  0  0 -8
    R -2  6  0 -1 -4  1 -1 -3  2 -2 -3  3  0 -4  0  0 -1  2 -4 -2 -1  0 -1 -8
    N  0  0  2  2 -4  1  1  0  2 -2 -3  1 -2 -3  0  1  0 -4 -2 -2  2  1  0 -8
    D  0 -1  2  4 -5  2  3  1  1 -2 -4  0 -3 -6 -1  0  0 -7 -4 -2  3  3 -1 -8
    C -2 -4 -4 -5 12 -5 -5 -3 -3 -2 -6 -5 -5 -4 -3  0 -2 -8  0 -2 -4 -5 -3 -8
    Q  0  1  1  2 -5  4  2 -1  3 -2 -2  1 -1 -5  0 -1 -1 -5 -4 -2  1  3 -1 -8
    E  0 -1  1  3 -5  2  4  0  1 -2 -3  0 -2 -5 -1  0  0 -7 -4 -2  3  3 -1 -8
    G  1 -3  0  1 -3 -1  0  5 -2 -3 -4 -2 -3 -5  0  1  0 -7 -5 -1  0  0 -1 -8
    H -1  2  2  1 -3  3  1 -2  6 -2 -2  0 -2 -2  0 -1 -1 -3  0 -2  1  2 -1 -8
    I -1 -2 -2 -2 -2 -2 -2 -3 -2  5  2 -2  2  1 -2 -1  0 -5 -1  4 -2 -2 -1 -8
    L -2 -3 -3 -4 -6 -2 -3 -4 -2  2  6 -3  4  2 -3 -3 -2 -2 -1  2 -3 -3 -1 -8
    K -1  3  1  0 -5  1  0 -2  0 -2 -3  5  0 -5 -1  0  0 -3 -4 -2  1  0 -1 -8
    M -1  0 -2 -3 -5 -1 -2 -3 -2  2  4  0  6  0 -2 -2 -1 -4 -2  2 -2 -2 -1 -8
    F -3 -4 -3 -6 -4 -5 -5 -5 -2  1  2 -5  0  9 -5 -3 -3  0  7 -1 -4 -5 -2 -8
    P  1  0  0 -1 -3  0 -1  0  0 -2 -3 -1 -2 -5  6  1  0 -6 -5 -1 -1  0 -1 -8
    S  1  0  1  0  0 -1  0  1 -1 -1 -3  0 -2 -3  1  2  1 -2 -3 -1  0  0  0 -8
    T  1 -1  0  0 -2 -1  0  0 -1  0 -2  0 -1 -3  0  1  3 -5 -3  0  0 -1  0 -8
    W -6  2 -4 -7 -8 -5 -7 -7 -3 -5 -2 -3 -4  0 -6 -2 -5 17  0 -6 -5 -6 -4 -8
    Y -3 -4 -2 -4  0 -4 -4 -5  0 -1 -1 -4 -2  7 -5 -3 -3  0 10 -2 -3 -4 -2 -8
    V  0 -2 -2 -2 -2 -2 -2 -1 -2  4  2 -2  2 -1 -1 -1  0 -6 -2  4 -2 -2 -1 -8
    B  0 -1  2  3 -4  1  3  0  1 -2 -3  1 -2 -4 -1  0  0 -5 -3 -2  3  2 -1 -8
    Z  0  0  1  3 -5  3  3  0  2 -2 -3  0 -2 -5  0  0 -1 -6 -4 -2  2  3 -1 -8
    X  0 -1  0 -1 -3 -1 -1 -1 -1 -1 -1 -1 -1 -2 -1  0  0 -4 -2 -1 -1 -1 -1 -8
    * -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8  1
    
    

    GONNET Matrix
    # GONNET PAM 250 matrix recommended by Gonnet, Cohen & Benner
    # Science June 5, 1992.
    # Values rounded to nearest integer
    
       C  S  T  P  A  G  N  D  E  Q  H  R  K  M  I  L  V  F  Y  W  X  *
    C 12  0  0 -3  0 -2 -2 -3 -3 -2 -1 -2 -3 -1 -1 -2  0 -1  0 -1 -3 -8
    S  0  2  2  0  1  0  1  0  0  0  0  0  0 -1 -2 -2 -1 -3 -2 -3  0 -8
    T  0  2  2  0  1 -1  0  0  0  0  0  0  0 -1 -1 -1  0 -2 -2 -4  0 -8
    P -3  0  0  8  0 -2 -1 -1  0  0 -1 -1 -1 -2 -3 -2 -2 -4 -3 -5 -1 -8
    A  0  1  1  0  2  0  0  0  0  0 -1 -1  0 -1 -1 -1  0 -2 -2 -4  0 -8
    G -2  0 -1 -2  0  7  0  0 -1 -1 -1 -1 -1 -4 -4 -4 -3 -5 -4 -4 -1 -8
    N -2  1  0 -1  0  0  4  2  1  1  1  0  1 -2 -3 -3 -2 -3 -1 -4  0 -8
    D -3  0  0 -1  0  0  2  5  3  1  0  0  0 -3 -4 -4 -3 -4 -3 -5 -1 -8
    E -3  0  0  0  0 -1  1  3  4  2  0  0  1 -2 -3 -3 -2 -4 -3 -4 -1 -8
    Q -2  0  0  0  0 -1  1  1  2  3  1  2  2 -1 -2 -2 -2 -3 -2 -3 -1 -8
    H -1  0  0 -1 -1 -1  1  0  0  1  6  1  1 -1 -2 -2 -2  0  2 -1 -1 -8
    R -2  0  0 -1 -1 -1  0  0  0  2  1  5  3 -2 -2 -2 -2 -3 -2 -2 -1 -8
    K -3  0  0 -1  0 -1  1  0  1  2  1  3  3 -1 -2 -2 -2 -3 -2 -4 -1 -8
    M -1 -1 -1 -2 -1 -4 -2 -3 -2 -1 -1 -2 -1  4  2  3  2  2  0 -1 -1 -8
    I -1 -2 -1 -3 -1 -4 -3 -4 -3 -2 -2 -2 -2  2  4  3  3  1 -1 -2 -1 -8
    L -2 -2 -1 -2 -1 -4 -3 -4 -3 -2 -2 -2 -2  3  3  4  2  2  0 -1 -1 -8
    V  0 -1  0 -2  0 -3 -2 -3 -2 -2 -2 -2 -2  2  3  2  3  0 -1 -3 -1 -8
    F -1 -3 -2 -4 -2 -5 -3 -4 -4 -3  0 -3 -3  2  1  2  0  7  5  4 -2 -8
    Y  0 -2 -2 -3 -2 -4 -1 -3 -3 -2  2 -2 -2  0 -1  0 -1  5  8  4 -2 -8
    W -1 -3 -4 -5 -4 -4 -4 -5 -4 -3 -1 -2 -4 -1 -2 -1 -3  4  4 14 -4 -8
    X -3  0  0 -1  0 -1  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2 -2 -4 -1 -8
    * -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8  1
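
Matrix text in the labeled layout used for PAM250 and GONNET above (comment lines starting with "#", one header row of column letters, then rows that begin with their own letter) can be turned into an Mij lookup table with a few lines of Python. The function name parse_matrix is an assumption for this sketch, which illustrates the file layout rather than the server's own loader.

```python
def parse_matrix(text):
    """Parse a labeled substitution matrix into {(row, column): score}."""
    rows = [
        line.split()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    if not rows:
        raise ValueError("no matrix rows found")
    columns = rows[0]
    scores = {}
    for row in rows[1:]:
        label, values = row[0], row[1:]
        if len(values) != len(columns):
            raise ValueError("row %r has the wrong number of scores" % label)
        for column, value in zip(columns, values):
            scores[(label, column)] = int(value)
    return scores


# Top-left corner of the PAM250 matrix listed above.
excerpt = """
# PAM250 excerpt
   A  R  N
A  2 -2  0
R -2  6  0
N  0  0  2
"""
pam_excerpt = parse_matrix(excerpt)
print(pam_excerpt[("A", "R")], pam_excerpt[("R", "R")])
```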
    


    Last change: March 2008