SIAS
Input: Multiple Sequence Alignment

The input for this program is a multiple sequence alignment (MSA). The MSA can be of proteins or DNA. However, the percentage similarity and normalized similarity scores calculated by the server apply only to proteins. The MSA can either be pasted or uploaded from a file. Currently, the server accepts MSAs in three formats: ClustalW, FASTA, and GCG/PileUp (MSF). An example MSA in each of these formats follows.

CLUSTAL W (1.81) multiple sequence alignment

hla_a68w_1HSB   SHSMRYFYTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWI
hla_a0201_1DUY  SHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWI
hla_b3501_1A1N  SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRPPWI
hla_b5301_1A1M  SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRPPWI
hla_b5101_1E27  SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRAPWI
hla_b2701_1HSA  SHSMRYFHTSVSRPGRGEPRFITVGYVDDTLFVRFDSDAASPREEPRAPWI
hla_cw3_1EFX    SHSMRYFYTAVSRPGRGEPHFIAVGYVDDTQFVRFDSDAASPRGEPRAPWV
hla-cw4_1IM9    SHSMRYFSTSVSWPGRGEPRFIAVGYVDDTQFVRFDSDAASPRGEPREPWV
mkb_2vaa        PHSLRYFVTAVSRPGLGEPRYMEVGYVDDTEFVRFDSDAENPRYEPRARWM
db-1BZ9         PHSMRYFETAVSRPGLEEPRYISVGYVDNKEFVRFDSDAENPRYEPRAPWM
                .**:*** *::* ** **::: *****:. ******** . * *** *:

hla_a68w_1HSB   RNTRNVKAQSQTDRVDLGTLRGYYNQSEAGSHTIQMMYGCDVGS
hla_a0201_1DUY  GETRKVKAHSQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGS
hla_b3501_1A1N  RNTQIFKTNTQTYRESLRNLRGYYNQSEAGSHIIQRMYGCDLGP
hla_b5301_1A1M  RNTQIFKTNTQTYRENLRIALRYYNQSEAGSHIIQRMYGCDLGP
hla_b5101_1E27  RNTQIFKTNTQTYRENLRIALRYYNQSEAGSHTWQTMYGCDVGP
hla_b2701_1HSA  RETQICKAKAQTDREDLRTLLRYYNQSEAGSHTLQNMYGCDVGP
hla_cw3_1EFX    RETQKYKRQAQTDRVSLRNLRGYYNQSEAGSHIIQRMYGCDVGP
hla-cw4_1IM9    RETQKYKRQAQADRVNLRKLRGYYNQSEDGSHTLQRMFGCDLGP
mkb_2vaa        RETQKAKGNEQSFRVDLRTLLGYYNQSKGGSHTIQVISGCEVGS
db-1BZ9         RETQKAKGQEQWFRVSLRNLLGYYNQSAGGSHTLQQMSGCDLGS
                :*: * : * * .* ***** *** * : **::*.

>hla_b5101_1E27
SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRAPWIRNTQIFKTN
TQTYRENLRIALRYYNQSEAGSHTWQTMYGCDVGP
>hla_a0201_1DUY
SHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIGETRKVKAH
SQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGS
>hla-cw4_1IM9
SHSMRYFSTSVSWPGRGEPRFIAVGYVDDTQFVRFDSDAASPRGEPREPWVRETQKYKRQ
AQADRVNLRKLRGYYNQSEDGSHTLQRMFGCDLGP
>mkb_2vaa
PHSLRYFVTAVSRPGLGEPRYMEVGYVDDTEFVRFDSDAENPRYEPRARWMRETQKAKGN
EQSFRVDLRTLLGYYNQSKGGSHTIQVISGCEVGS
>hla_b2701_1HSA
SHSMRYFHTSVSRPGRGEPRFITVGYVDDTLFVRFDSDAASPREEPRAPWIRETQICKAK
AQTDREDLRTLLRYYNQSEAGSHTLQNMYGCDVGP
>hla_cw3_1EFX
SHSMRYFYTAVSRPGRGEPHFIAVGYVDDTQFVRFDSDAASPRGEPRAPWVRETQKYKRQ
AQTDRVSLRNLRGYYNQSEAGSHIIQRMYGCDVGP
>hla_b3501_1A1N
SHSMRYFYTAMSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRTEPRPPWIRNTQIFKTN
TQTYRESLRNLRGYYNQSEAGSHIIQRMYGCDLGP
>db-1BZ9
PHSMRYFETAVSRPGLEEPRYISVGYVDNKEFVRFDSDAENPRYEPRAPWMRETQKAKGQ
EQWFRVSLRNLLGYYNQSAGGSHTLQQMSGCDLGS
>hla_a68w_1HSB
SHSMRYFYTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIRNTRNVKAQ
SQTDRVDLGTLRGYYNQSEAGSHTIQMMYGCDVGS

PileUp

MSF: 95 Type: P Check: 5477 ..

Name: hla_a68w_1HSB  oo Len: 95 Check: 5515 Weight: 0.0
Name: hla_a0201_1DUY oo Len: 95 Check: 4661 Weight: 10.0
Name: hla_b3501_1A1N oo Len: 95 Check: 4585 Weight: 10.0
Name: hla_b5301_1A1M oo Len: 95 Check: 4402 Weight: 10.0
Name: hla_b5101_1E27 oo Len: 95 Check: 4791 Weight: 10.0
Name: hla_b2701_1HSA oo Len: 95 Check: 3347 Weight: 10.0
Name: hla_cw3_1EFX   oo Len: 95 Check: 4868 Weight: 10.0
Name: hla-cw4_1IM9   oo Len: 95 Check: 4736 Weight: 10.0
Name: mkb_2vaa       oo Len: 95 Check: 4517 Weight: 10.0
Name: db-1BZ9        oo Len: 95 Check: 4055 Weight: 10.0

//

hla_a68w_1HSB   SHSMRYFYTS VSRPGRGEPR FIAVGYVDDT QFVRFDSDAA SQRMEPRAPW
hla_a0201_1DUY  SHSMRYFFTS VSRPGRGEPR FIAVGYVDDT QFVRFDSDAA SQRMEPRAPW
hla_b3501_1A1N  SHSMRYFYTA MSRPGRGEPR FIAVGYVDDT QFVRFDSDAA SPRTEPRPPW
hla_b5301_1A1M  SHSMRYFYTA MSRPGRGEPR FIAVGYVDDT QFVRFDSDAA SPRTEPRPPW
hla_b5101_1E27  SHSMRYFYTA MSRPGRGEPR FIAVGYVDDT QFVRFDSDAA SPRTEPRAPW
hla_b2701_1HSA  SHSMRYFHTS VSRPGRGEPR FITVGYVDDT LFVRFDSDAA SPREEPRAPW
hla_cw3_1EFX    SHSMRYFYTA VSRPGRGEPH FIAVGYVDDT QFVRFDSDAA SPRGEPRAPW
hla-cw4_1IM9    SHSMRYFSTS VSWPGRGEPR FIAVGYVDDT QFVRFDSDAA SPRGEPREPW
mkb_2vaa        PHSLRYFVTA VSRPGLGEPR YMEVGYVDDT EFVRFDSDAE NPRYEPRARW
db-1BZ9         PHSMRYFETA VSRPGLEEPR YISVGYVDNK EFVRFDSDAE NPRYEPRAPW

hla_a68w_1HSB   IRNTRNVKAQ SQTDRVDLGT LRGYYNQSEA GSHTIQMMYG CDVGS
hla_a0201_1DUY  IGETRKVKAH SQTHRVDLGT LRGYYNQSEA GSHTVQRMYG CDVGS
hla_b3501_1A1N  IRNTQIFKTN TQTYRESLRN LRGYYNQSEA GSHIIQRMYG CDLGP
hla_b5301_1A1M  IRNTQIFKTN TQTYRENLRI ALRYYNQSEA GSHIIQRMYG CDLGP
hla_b5101_1E27  IRNTQIFKTN TQTYRENLRI ALRYYNQSEA GSHTWQTMYG CDVGP
hla_b2701_1HSA  IRETQICKAK AQTDREDLRT LLRYYNQSEA GSHTLQNMYG CDVGP
hla_cw3_1EFX    VRETQKYKRQ AQTDRVSLRN LRGYYNQSEA GSHIIQRMYG CDVGP
hla-cw4_1IM9    VRETQKYKRQ AQADRVNLRK LRGYYNQSED GSHTLQRMFG CDLGP
mkb_2vaa        MRETQKAKGN EQSFRVDLRT LLGYYNQSKG GSHTIQVISG CEVGS
db-1BZ9         MRETQKAKGQ EQWFRVSLRN LLGYYNQSAG GSHTLQQMSG CDLGS

Sequence Identity & Similarity Calculation

This server computes sequence identity and similarity percentages between each pair of sequences in the MSA using the following equation:
For each sequence pair, identical residues are counted as the number of exactly matching residues, while similar residues are tallied using groups of similar amino acids that users can customize (Fig. 1).
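The counting step described above can be sketched in a few lines of Python. This is only an illustration, not the server's code: the function name `count_matches` and the groups in `SIMILARITY_GROUPS` are hypothetical placeholders (the server's default groups are those shown in Fig. 1), and the sketch assumes that columns containing a gap in either sequence are skipped and that similar residues are counted separately from identical ones.

```python
# Hypothetical similarity groups, for illustration only; the server's
# actual default groups are the ones shown in Fig. 1 and may differ.
SIMILARITY_GROUPS = ["GAVLIM", "FYW", "ST", "KRH", "DE", "NQ", "C", "P"]


def count_matches(seq_a, seq_b, groups=SIMILARITY_GROUPS, gap="-"):
    """Return (identical, similar) counts for two aligned sequences.

    Assumptions of this sketch: columns with a gap in either sequence
    are skipped, and 'similar' counts only non-identical residues that
    share a group.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    # Map every residue to the index of the group that contains it.
    group_of = {res: idx for idx, grp in enumerate(groups) for res in grp}

    identical = 0
    similar = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        if a == b:
            identical += 1
        elif a in group_of and group_of[a] == group_of.get(b):
            similar += 1
    return identical, similar


# First ten columns of hla_a68w_1HSB and mkb_2vaa from the example MSA.
identical, similar = count_matches("SHSMRYFYTS", "PHSLRYFVTA")
print(identical, similar)
```

Changing the `groups` argument mimics the customization offered to users: a residue pair is tallied as similar only if both residues fall in the same group.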
Computing percentages of identity and similarity requires dividing the number of identities or similarities by a sequence length (Eq. 1). However, for any given pair of aligned sequences, there is no consensus on which length should be used for these calculations, so the server provides four length options to select from.
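To illustrate why the choice of length matters, the following Python sketch computes a percentage of identity with several candidate denominators. The function name `percent_identity` and the denominator names (`shortest`, `longest`, `mean`, `aligned`) are hypothetical choices made for this example and are not necessarily the four options offered by the server.

```python
def percent_identity(seq_a, seq_b, denominator="shortest", gap="-"):
    """Percentage of identity for two aligned sequences.

    The denominators are illustrative assumptions:
      shortest - ungapped length of the shorter sequence
      longest  - ungapped length of the longer sequence
      mean     - mean of the two ungapped lengths
      aligned  - number of columns with no gap in either sequence
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    columns = list(zip(seq_a, seq_b))
    identical = sum(1 for a, b in columns if a == b and a != gap)

    len_a = len(seq_a) - seq_a.count(gap)
    len_b = len(seq_b) - seq_b.count(gap)
    aligned = sum(1 for a, b in columns if a != gap and b != gap)

    lengths = {
        "shortest": min(len_a, len_b),
        "longest": max(len_a, len_b),
        "mean": (len_a + len_b) / 2,
        "aligned": aligned,
    }
    length = lengths[denominator]
    if length == 0:
        raise ValueError("selected length is zero")
    return 100.0 * identical / length


a = "ACDEFG-H"
b = "ACD-FG-H"
for name in ("shortest", "longest", "mean", "aligned"):
    print(name, round(percent_identity(a, b, name), 2))
```

For the same pair of sequences, the result differs depending on the denominator, which is why the length has to be chosen explicitly.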
Normalized Global Similarity Score

For every pair of aligned sequences in the provided MSA, the server calculates a normalized similarity score, S, that penalizes the presence of gaps, using Eq. 2
where:
Gap penalties

Sequence alignments often contain gaps, which lower the global sequence similarity score (Eq. 2). In general, opening a gap is more costly than extending one. Therefore, the default penalty for opening a gap, Po, is 10, while the default penalty for extending a gap, Pe, is 0.2. Nonetheless, both Po and Pe can be adjusted by users between 0 and 100.

Substitution matrices

Substitution matrices are important in all sequence comparison analyses. In SIAS, users can select among three different matrices (BLOSUM62, PAM250 and GONNET) to obtain the Mij scores required to calculate normalized similarity scores according to Eq. 2. The matrices used in this work are listed below.

BLOSUM62 Matrix:

# BLOSUM Clustered Scoring Matrix in 1/2 Bit Units
# * column uses minimum score
# Cluster Percentage: >= 62
# Entropy = 0.6979, Expected = -0.5209
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4
B -2 -1  3  4 -3  0  1 -1  0 -3 -4  0 -3 -3 -2  0 -1 -4 -3 -3  4  1 -1 -4
Z -1  0  0  1 -3  3  4 -2  0 -3 -3  1 -1 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1 -1 -1 -4
* -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4  1

PAM250 Matrix:

# This matrix was produced by "pam" Version 1.0.6 [28-Jul-93]
# PAM 250 substitution matrix, scale = ln(2)/3 = 0.231049
# Expected score = -0.844, Entropy = 0.354 bits
# Lowest score = -8, Highest score = 17
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  2 -2  0  0 -2  0  0  1 -1 -1 -2 -1 -1 -3  1  1  1 -6 -3  0  0  0  0 -8
R -2  6  0 -1 -4  1 -1 -3  2 -2 -3  3  0 -4  0  0 -1  2 -4 -2 -1  0 -1 -8
N  0  0  2  2 -4  1  1  0  2 -2 -3  1 -2 -3  0  1  0 -4 -2 -2  2  1  0 -8
D  0 -1  2  4 -5  2  3  1  1 -2 -4  0 -3 -6 -1  0  0 -7 -4 -2  3  3 -1 -8
C -2 -4 -4 -5 12 -5 -5 -3 -3 -2 -6 -5 -5 -4 -3  0 -2 -8  0 -2 -4 -5 -3 -8
Q  0  1  1  2 -5  4  2 -1  3 -2 -2  1 -1 -5  0 -1 -1 -5 -4 -2  1  3 -1 -8
E  0 -1  1  3 -5  2  4  0  1 -2 -3  0 -2 -5 -1  0  0 -7 -4 -2  3  3 -1 -8
G  1 -3  0  1 -3 -1  0  5 -2 -3 -4 -2 -3 -5  0  1  0 -7 -5 -1  0  0 -1 -8
H -1  2  2  1 -3  3  1 -2  6 -2 -2  0 -2 -2  0 -1 -1 -3  0 -2  1  2 -1 -8
I -1 -2 -2 -2 -2 -2 -2 -3 -2  5  2 -2  2  1 -2 -1  0 -5 -1  4 -2 -2 -1 -8
L -2 -3 -3 -4 -6 -2 -3 -4 -2  2  6 -3  4  2 -3 -3 -2 -2 -1  2 -3 -3 -1 -8
K -1  3  1  0 -5  1  0 -2  0 -2 -3  5  0 -5 -1  0  0 -3 -4 -2  1  0 -1 -8
M -1  0 -2 -3 -5 -1 -2 -3 -2  2  4  0  6  0 -2 -2 -1 -4 -2  2 -2 -2 -1 -8
F -3 -4 -3 -6 -4 -5 -5 -5 -2  1  2 -5  0  9 -5 -3 -3  0  7 -1 -4 -5 -2 -8
P  1  0  0 -1 -3  0 -1  0  0 -2 -3 -1 -2 -5  6  1  0 -6 -5 -1 -1  0 -1 -8
S  1  0  1  0  0 -1  0  1 -1 -1 -3  0 -2 -3  1  2  1 -2 -3 -1  0  0  0 -8
T  1 -1  0  0 -2 -1  0  0 -1  0 -2  0 -1 -3  0  1  3 -5 -3  0  0 -1  0 -8
W -6  2 -4 -7 -8 -5 -7 -7 -3 -5 -2 -3 -4  0 -6 -2 -5 17  0 -6 -5 -6 -4 -8
Y -3 -4 -2 -4  0 -4 -4 -5  0 -1 -1 -4 -2  7 -5 -3 -3  0 10 -2 -3 -4 -2 -8
V  0 -2 -2 -2 -2 -2 -2 -1 -2  4  2 -2  2 -1 -1 -1  0 -6 -2  4 -2 -2 -1 -8
B  0 -1  2  3 -4  1  3  0  1 -2 -3  1 -2 -4 -1  0  0 -5 -3 -2  3  2 -1 -8
Z  0  0  1  3 -5  3  3  0  2 -2 -3  0 -2 -5  0  0 -1 -6 -4 -2  2  3 -1 -8
X  0 -1  0 -1 -3 -1 -1 -1 -1 -1 -1 -1 -1 -2 -1  0  0 -4 -2 -1 -1 -1 -1 -8
* -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8  1

GONNET Matrix:

# GONNET PAM 250 matrix recommended by Gonnet, Cohen & Benner
# Science June 5, 1992.
# Values rounded to nearest integer
   C  S  T  P  A  G  N  D  E  Q  H  R  K  M  I  L  V  F  Y  W  X  *
C 12  0  0 -3  0 -2 -2 -3 -3 -2 -1 -2 -3 -1 -1 -2  0 -1  0 -1 -3 -8
S  0  2  2  0  1  0  1  0  0  0  0  0  0 -1 -2 -2 -1 -3 -2 -3  0 -8
T  0  2  2  0  1 -1  0  0  0  0  0  0  0 -1 -1 -1  0 -2 -2 -4  0 -8
P -3  0  0  8  0 -2 -1 -1  0  0 -1 -1 -1 -2 -3 -2 -2 -4 -3 -5 -1 -8
A  0  1  1  0  2  0  0  0  0  0 -1 -1  0 -1 -1 -1  0 -2 -2 -4  0 -8
G -2  0 -1 -2  0  7  0  0 -1 -1 -1 -1 -1 -4 -4 -4 -3 -5 -4 -4 -1 -8
N -2  1  0 -1  0  0  4  2  1  1  1  0  1 -2 -3 -3 -2 -3 -1 -4  0 -8
D -3  0  0 -1  0  0  2  5  3  1  0  0  0 -3 -4 -4 -3 -4 -3 -5 -1 -8
E -3  0  0  0  0 -1  1  3  4  2  0  0  1 -2 -3 -3 -2 -4 -3 -4 -1 -8
Q -2  0  0  0  0 -1  1  1  2  3  1  2  2 -1 -2 -2 -2 -3 -2 -3 -1 -8
H -1  0  0 -1 -1 -1  1  0  0  1  6  1  1 -1 -2 -2 -2  0  2 -1 -1 -8
R -2  0  0 -1 -1 -1  0  0  0  2  1  5  3 -2 -2 -2 -2 -3 -2 -2 -1 -8
K -3  0  0 -1  0 -1  1  0  1  2  1  3  3 -1 -2 -2 -2 -3 -2 -4 -1 -8
M -1 -1 -1 -2 -1 -4 -2 -3 -2 -1 -1 -2 -1  4  2  3  2  2  0 -1 -1 -8
I -1 -2 -1 -3 -1 -4 -3 -4 -3 -2 -2 -2 -2  2  4  3  3  1 -1 -2 -1 -8
L -2 -2 -1 -2 -1 -4 -3 -4 -3 -2 -2 -2 -2  3  3  4  2  2  0 -1 -1 -8
V  0 -1  0 -2  0 -3 -2 -3 -2 -2 -2 -2 -2  2  3  2  3  0 -1 -3 -1 -8
F -1 -3 -2 -4 -2 -5 -3 -4 -4 -3  0 -3 -3  2  1  2  0  7  5  4 -2 -8
Y  0 -2 -2 -3 -2 -4 -1 -3 -3 -2  2 -2 -2  0 -1  0 -1  5  8  4 -2 -8
W -1 -3 -4 -5 -4 -4 -4 -5 -4 -3 -1 -2 -4 -1 -2 -1 -3  4  4 14 -4 -8
X -3  0  0 -1  0 -1  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2 -2 -4 -1 -8
* -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8  1
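
The interplay between substitution scores (Mij) and the gap penalties Po and Pe can be illustrated with a short Python sketch. It is not the server's implementation and does not reproduce Eq. 2: it only sums the matrix scores over aligned residue pairs and subtracts an affine gap penalty, leaving out the normalization step. The names `gap_penalized_score`, `pair_score` and `BLOSUM62_EXCERPT` are hypothetical, and the convention that the first column of a gap costs Po and every following column costs Pe is an assumption of this example.

```python
# A few BLOSUM62 entries copied from the matrix listed above; a real
# calculation would load the full matrix.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("A", "S"): 1, ("A", "T"): 0,
    ("S", "S"): 4, ("S", "T"): 1, ("T", "T"): 5,
}


def pair_score(a, b, matrix):
    """Look up a symmetric substitution score Mij."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def gap_penalized_score(seq_a, seq_b, matrix, p_open=10.0, p_extend=0.2,
                        gap="-"):
    """Raw (non-normalized) score of two aligned sequences.

    Assumed convention: the first column of a gap costs p_open and each
    following column of the same gap costs p_extend. Columns that are
    gaps in both sequences are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    score = 0.0
    gap_in = None  # which sequence carried the gap in the previous column
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue
        if a == gap or b == gap:
            current = "a" if a == gap else "b"
            score -= p_extend if current == gap_in else p_open
            gap_in = current
        else:
            score += pair_score(a, b, matrix)
            gap_in = None
    return score


# A/A (4), gap opening (-10), gap extension (-0.2), T/T (5), A/S (1)
print(gap_penalized_score("ASTTA", "A--TS", BLOSUM62_EXCERPT))
```

With the default penalties, a single two-column gap outweighs the three matching columns in this toy alignment; lowering `p_open` shows how strongly the opening penalty drives the result.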