Help for ENDscript

Help & Examples for ENDscript 1.1

Programs used by ENDscript are: SPDB to check residue numbering and chainIDs from the input PDB¹ file, DSSP² to calculate secondary structure elements, accessibility and disulphide bridges, CNS³ to calculate intermolecular contacts, BLAST⁴ to search for matching sequences in databases, MULTALIN⁵ or CLUSTALW⁶ to perform multiple sequence alignments, ESPript⁷ to gather all this information in PostScript figures, BOBSCRIPT⁸ and MOLSCRIPT⁹ to display sequence conservation in 3D, DINO to generate molecular surfaces colored according to sequence homology, PROFIT to superimpose 3D structures of homologous sequences and PHYLODENDRON to build a phylogenetic tree.

These programs are executed in two phases.

To run the first phase you should connect to the interface, upload your PDB file in the first box and click on SUBMIT in the buttons frame. An ESPript figure is generated, giving information on each monomeric sequence contained in your PDB file.

The second phase is launched by clicking on 'enable the BLAST search' in the second box of the interface and clicking again on SUBMIT. A new ESPript figure is created, resulting from the alignment of the sequence of the first monomer contained in your PDB file against matching sequences. A third PostScript figure is generated with BOBSCRIPT, showing your 3D structure coloured according to sequence similarity. Finally, the program checks whether an aligned sequence has a known 3D structure, and each detected 3D structure is superimposed onto the query by the program PROFIT. A fourth PostScript figure is generated with BOBSCRIPT: the query is rendered as a tube, its radius being proportional to the calculated rms deviation.

You can skip the first phase and run the second directly. ESPript and BOBSCRIPT figures can be downloaded from the resulting frame of the interface.

Tracing files between programs

First phase
	programs	input files	output files
1a	SPDB	`file.pdb`	`file.spdb`
1b	DSSP	`file.spdb`	`file.dssp`
1c	CNS	`file.spdb`	`file.ctct`
1d	ESPript	`file.spdb file.ctct` `file.dssp`	`file1.ps`
Second phase, part a
	programs	input files	output files
2a	SPDB	`file.pdb`	`file.seq`
2b	BLAST and MULTALIN or CLUSTALW	`file.seq`	`file.aln`
1a 1b 1c	SPDB DSSP CNS	`file.pdb` `file.spdb` `file.spdb`	`file.spdb` `file.dssp` `file.ctct`
2c	ESPript	`file.aln file.ctct file.spdb` `file.dssp`	`file1_bcol.pdb file1.bob` `file2.ps`
2d	BOBSCRIPT	`file1.bob file1_bcol.pdb`	`file3.ps`
Second phase, part b
	programs	input files	output files
2e	PROFIT	`file.aln file.spdb file_seq1.spdb file_seq2.spdb file_seq3.spdb...`	`All rmsd`
2f	BOBSCRIPT	`file2.bob file2_bcol.pdb`	`file4.ps`

First phase: an ESPript figure showing all protein sequences contained in the input PDB file

1a. SPDB

in -> out	`file.pdb` -> `file.spdb`
main role	checks input file

SPDB (and by extension ENDscript) supports structure files downloaded from the Protein Data Bank or resulting directly from the refinement programs CNS and REFMAC¹⁰.

If necessary, the program renumbers protein residues and reassigns chainIDs from A to Z.

First model is kept for NMR structures.

First conformers are kept for alternate residues. They are flagged by a * in column 68 of the output file.spdb.
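As an illustration of the column 68 flag, the sketch below lists the residues marked with a * in a file.spdb. It assumes that file.spdb keeps the standard fixed-column PDB layout for ATOM records (residue name in columns 18-20, chainID in column 22, residue number in columns 23-26); the function name `flagged_residues` is hypothetical and not part of SPDB.

```python
def flagged_residues(spdb_lines):
    """List residues whose ATOM records carry '*' in column 68."""
    flagged = []
    for line in spdb_lines:
        # Column 68 (1-based) is index 67; shorter lines cannot carry the flag.
        if not line.startswith("ATOM") or len(line) < 68 or line[67] != "*":
            continue
        # Assumed standard PDB columns: name 18-20, chainID 22, number 23-26.
        residue = (line[21], int(line[22:26]), line[17:20].strip())
        if residue not in flagged:
            flagged.append(residue)
    return flagged
```

Each residue is reported once, even if several of its atoms are flagged.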

Second oxygen atom of C-terminus main chain is removed (atom OXT).

Non-protein atoms are kept if they belong to:

nucleotide groups named ADE, GUA, CYT, THY or URI,

porphyrin groups named HEM, BCL, BPH or MQ7,

sugar groups named GLC, GAL, MAN, NAG, FUC, SIA or XYL,

other ligands named NAD, NAH, NDP, NAP or FMN,

The user can keep other hetero-compounds contained in the PDB file by typing their names in the first box of the interface (up to 10 names of 2-3 characters per column and one name per line; see the second example). They will be renamed 0Ax, 0Bx, 0Cx or 0Dx by SPDB, where x is either a letter between A and J or a digit between 0 and 9. An extra topology file needed by CNS is generated in turn:

a group named 0A[A-J] is given the chainID * a group named 0A[0-9] is given the chainID 1
a group named 0B[A-J] is given the chainID : a group named 0B[0-9] is given the chainID 2
a group named 0C[A-J] is given the chainID " a group named 0C[0-9] is given the chainID 3
a group named 0D[A-J] is given the chainID ^ a group named 0D[0-9] is given the chainID 4
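The naming rule above can be written as a small lookup. This is only a sketch of the mapping as listed in this help page, not code taken from SPDB; the names `HETERO_CHAIN_IDS` and `hetero_chain_id` are hypothetical.

```python
# Second character of the renamed group -> (chainID for A-J, chainID for 0-9).
HETERO_CHAIN_IDS = {
    "A": ("*", "1"),
    "B": (":", "2"),
    "C": ('"', "3"),
    "D": ("^", "4"),
}


def hetero_chain_id(group_name):
    """Return the chainID given to a renamed hetero-group such as '0AC' or '0B7'."""
    if len(group_name) != 3 or group_name[0] != "0" or group_name[1] not in HETERO_CHAIN_IDS:
        raise ValueError("not a renamed hetero-group: %r" % group_name)
    letter_id, digit_id = HETERO_CHAIN_IDS[group_name[1]]
    suffix = group_name[2]
    if suffix in "ABCDEFGHIJ":
        return letter_id
    if suffix in "0123456789":
        return digit_id
    raise ValueError("unexpected suffix in %r" % group_name)
```

For example, `hetero_chain_id("0AC")` gives `*` and `hetero_chain_id("0B7")` gives `2`.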

Contacts between protein residues and kept hetero-compounds are shown in turn in the PostScript figure, using the chainID characters *, :, ", ^, 1, 2, 3, 4, coloured in red or black. By default, the mark corresponding to a protein-ligand contact is shown in red if the distance is less than 3.2 Å, and in black if it is in the range 3.2-4 Å.

Remark: you can check the list of supported and unsupported hetero-compounds included in your PDB file, by clicking on OUT in the grey button panel when running the interface.

1b. DSSP

in -> out	`file.spdb` -> `file.dssp`
main role	calculates secondary structure elements

The program identifies alpha helices (shown by medium squiggles), 3₁₀ helices (small squiggles), pi helices (large squiggles), beta strands (arrows), strict alpha turns (TTT letters) and beta turns (TT letters) from the 3D structure.

Accessibility is calculated for each residue. Only co-ordinates of protein residues are taken into account.

Cysteines involved in disulphide bridges are extracted.

Residues with alternate conformations, flagged by a * in column 68 of the input file.spdb, have this symbol in column 138 of the output file.dssp.

If the option 'Display all known structures' is activated via the interface, an automatic search is performed to check if a sequence name can be related to a known 3D structure. This option has no effect in phase 1 but can be used in phase 2, when a BLAST search is performed. Known secondary structure elements of each matching sequence are displayed in turn in the ESPript figure.

1c. CNS

in -> out	`file.spdb` -> `file.ctct`
main role	calculates intermolecular contacts

CNS calculates both crystallographic and non-crystallographic contacts between each protein molecule contained in file.spdb (remember that protein residues are identified by A-Z chainIDs). Contacts between protein residues and hetero-compounds are also calculated, if the latter are identified by the chainIDs *, :, ", ^, 1, 2, 3 or 4.

Cell parameters and space group are extracted from the header of file.spdb for crystallographic structures.

Hydrogen atoms are deleted and thus excluded from distance calculation.

Main chain atoms (N, CA, C, O) can also be excluded from distance calculation, by clicking on a button in the first box of the interface.

Upper limit for calculation of intermolecular contacts is 4 Å by default. The shortest intermolecular distance is taken for each residue.

Command lines included in the script to list intermolecular distances are:

Crystallographic contacts

Addition to CNS command file:
       delete selection=(hydrogen) end flags exclude * include pvdw end
       parameter nbond wmin=4.0 end end energy end
generates in CNS log file:
       %atoms "A -62 -ASN -OD1 " and "C -112 -THR -C "(XSYM# 4) only 3.64 A apart
Non-crystallographic contacts
Addition to CNS command file:
       flags exclude * include vdw end parameter nbond wmin=0 end end
       for $1 in (A B C D E F G H I J K L M N O P Q R S T U V W X Y Z) loop main
          distance disposition=print cuton=0.0 cutoff=4
          from =(segid $1) to =(not segid $1) end
       end loop main
generates in CNS log file:
       atoms "A -90 -ALA -CB " and "B -181 -HIS -CE1 " 3.6958 A apart

The line above shows a non-crystallographic vdw contact between carbon CB of Ala90 chainA and carbon CE1 of His181 chainB.
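The two kinds of log lines shown above can be read back with a regular expression. The following is a minimal sketch that assumes every contact line follows exactly the two layouts quoted in this section, with fields separated by '-' inside the quotes; `parse_contact` is a hypothetical helper, not part of CNS or ENDscript. It would need adapting for hetero-compounds whose chainID is the " character and for negative residue numbers.

```python
import re

# '%atoms' marks a crystallographic contact, 'atoms' a non-crystallographic one.
CONTACT_RE = re.compile(
    r'^\s*(?P<sym>%?)atoms\s+"(?P<first>[^"]*)"\s+and\s+"(?P<second>[^"]*)"'
    r'.*?(?P<dist>\d+(?:\.\d+)?)\s+A\s+apart'
)


def _atom(text):
    """Split 'A -90 -ALA -CB ' into (chainID, number, residue name, atom name)."""
    chain, number, residue, atom = [part.strip() for part in text.split("-")]
    return chain, int(number), residue, atom


def parse_contact(line):
    """Return a dict describing one contact line, or None if the line does not match."""
    match = CONTACT_RE.match(line)
    if match is None:
        return None
    return {
        "crystallographic": match.group("sym") == "%",
        "first": _atom(match.group("first")),
        "second": _atom(match.group("second")),
        "distance": float(match.group("dist")),
    }
```

Lines of the log that are not contact lines simply return `None`, so the function can be applied to the whole file.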

Contacts can be further analysed by looking at the figure produced by ESPript, as explained in the next paragraph.

1d. ESPript

in -> out	`file.spdb file.ctct file.dssp` -> `file1.ps`
main role	generates the first PostScript figure

The protein sequence of each chainID contained in file.spdb is displayed.

Alpha, 3₁₀ and pi helices are read from file.dssp. They are shown above the sequence in the PostScript as medium, small and large squiggles with alpha, eta and pi labels respectively. Beta strands are shown as arrows labelled beta. Strict alpha and beta turns are marked by TTT and TT letters.

Residues modelled in file.pdb with an alternate conformation are highlighted by a grey star on the top of sequences.

Relative accessibility is calculated from file.dssp. It is shown by a blue-coloured bar below the sequence: white is buried (<0.1), cyan is intermediate (0.1-0.4), blue is accessible (>0.4) and blue with red borders is highly exposed (>1). A red box means that relative accessibility is not calculated for the residue, because it is truncated.
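The colour classes of the accessibility bar can be summarised as a threshold function. This is a sketch of the rule as stated here, with the boundary values 0.1 and 0.4 assigned to the intermediate class by assumption; `accessibility_class` is a hypothetical name and `None` stands for a residue whose relative accessibility is not calculated.

```python
def accessibility_class(relative_accessibility):
    """Return the colour used in the accessibility bar for one residue."""
    if relative_accessibility is None:
        return "red box"  # truncated residue, value not calculated
    if relative_accessibility > 1.0:
        return "blue with red borders"  # highly exposed
    if relative_accessibility > 0.4:
        return "blue"  # accessible
    if relative_accessibility >= 0.1:
        return "cyan"  # intermediate
    return "white"  # buried
```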

Hydropathy is calculated from the sequence according to the algorithm of Kyte and Doolittle¹¹. It is shown by a second bar below accessibility: pink is hydrophobic, grey is intermediate and cyan is hydrophilic.
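For readers unfamiliar with the method, a Kyte and Doolittle profile is a sliding-window average of per-residue hydropathy indices. The sketch below uses the published scale; the window length of 9 is an assumption, since this page states neither the window used by ESPript nor the cut-offs behind the pink, grey and cyan colours.

```python
# Kyte and Doolittle (1982) hydropathy indices, one-letter code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Average the hydropathy index over each window of consecutive residues."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if window < 1 or len(values) < window:
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

Positive averages indicate hydrophobic stretches and negative averages hydrophilic ones.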

Disulphide bridges are extracted from file.dssp. They are shown by green digits (1 1, 2 2 ...) below the bar of hydropathy.

Intermolecular contacts are extracted from file.ctct and are displayed along with disulphide bridges below the bar of hydropathy. The shortest intermolecular distance is taken for each residue. Corresponding chainIDs are written in red if the distance is less than 3.2 Å and in black if the distance is in the range 3.2-4 Å.

Further information is given according to the font:

a to z in italic are crystallographic contacts between residues
A to Z in italic are non-crystallographic contacts between residues
# identifies a crystallographic contact between two residues having same names, numbers and chainIDs (crystallographic identity)
a to z are crystallographic contacts between residues having same names and numbers but different chainIDs.
A to Z are non-crystallographic contacts between residues having same names, numbers and different chainIDs (for example along a non-crystallographic 2-fold axis).
*, :, ", ^, 1, 2, 3, 4 are contacts between protein residues and supported hetero-compounds.

Thus, if in file.ctct the line:
atoms "A -90 -ALA -CB " and "B -181 -HIS -CE1 " 3.6958 A apart
corresponds to the shortest distance between Ala90 and His181, a black B is written below Ala90 in the sequence of chainA.
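The selection of the mark written below each residue can be sketched as follows: keep the shortest contact per residue, then colour it by distance. The 3.2 Å and 4 Å values are the defaults quoted in this help; the function `contact_marks` and its input layout are hypothetical and only illustrate the rule.

```python
def contact_marks(contacts, red_below=3.2, upper_limit=4.0):
    """Keep the shortest contact per residue and choose the colour of its mark.

    contacts: iterable of (residue, partner_chainID, distance_in_angstrom).
    """
    shortest = {}
    for residue, partner, distance in contacts:
        if distance > upper_limit:
            continue  # beyond the upper limit for intermolecular contacts
        if residue not in shortest or distance < shortest[residue][1]:
            shortest[residue] = (partner, distance)
    return {
        residue: (partner, "red" if distance < red_below else "black")
        for residue, (partner, distance) in shortest.items()
    }
```

With the line above as the only contact of Ala90, the function returns a black B for that residue.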

Remark: only molecules located in the crystallographic asymmetric unit are taken into account by DSSP in its calculation of accessibility. Thus, you can find 'highly accessible' residues involved in contacts with crystallographic neighbours according to the ESPript figure. These residues are in fact buried in the crystal lattice.

Output Layout

FontSize [default: 6]: Size in points for the fonts (Courier for sequence names and residues).
ColumnNb [default: 90]: Number of residue columns per line. A suggested value for the column number is given in the log file, at the end of the section concerning ESPript (click on OUT in the grey buttons panel and look for the sentence 'suggestion columns per line'). Replace the default, 90, by this value and click on SUBMIT so as to obtain a justified figure.
Vgap [default: 7]: Vertical gap between two blocks of sequences. The unit for the distance is the height of a line.
Vshift [default: 0]: Vertical shift for the whole display. The unit for the distance is the height of a line.
Hshift [default: 0, centred]: Horizontal shift for the whole display. The unit for the distance is the width of a residue.
PrinterOpt [default: C]: C coloured output ; T coloured with all letters in bold, ideal for thermal printers and others before reduction of your figure in an article ; S light yellow background, ideal for slides ; B black & white, a grey scale is used ; F flashy colours, similar residues are written with black bold characters and boxed in yellow, ideal for overheads.
Paper [default: P]: P: Portrait(A4) ; L: Landscape(A4) ; P3: Portrait(A3) ; L3: Landscape(A3).

Second phase, part a: a multiple alignment colour-coded with ESPript and a 3D representation obtained with BOBSCRIPT

2a. SPDB

in -> out	`file.pdb` -> `file.seq`
main role	checks input file

SPDB (and by extension ENDscript) supports structure files downloaded from the Protein Data Bank or resulting directly from the refinement programs CNS and REFMAC¹⁰.

The sequence of the first molecule is extracted and written in a one character code in FASTA format.
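The conversion to a one-character code can be illustrated with the standard three-letter to one-letter table for the 20 common amino acids. This is a sketch, not SPDB's own code: the function name `to_fasta`, the 60-character line width and the use of X for unrecognised residues are assumptions.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_fasta(name, residues, width=60):
    """Write a list of three-letter residue names as a FASTA record."""
    letters = "".join(THREE_TO_ONE.get(res.upper(), "X") for res in residues)
    rows = [letters[i:i + width] for i in range(0, len(letters), width)]
    return "\n".join([">" + name] + rows)
```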

2b. BLAST and MULTALIN or CLUSTALW

in -> out	`file.seq` -> `(file.blast)` -> `file.aln`
main role	finds homologous sequences for multiple alignment

The BLAST search can be performed against sequences extracted from the PDB (named pdbaa) or against the SWISSPROT or the TrEMBL¹² databases. It can also be performed against SEQUENCED GENOMES downloaded from the ExPASy server.

The comparison matrix is BLOSUM62¹³ and the threshold for the evalue is set to 10^-6.

Multiple alignments are performed by MULTALIN or CLUSTALW. A FAST method is used with CLUSTALW. Output sequences are in the same order as they have been aligned from the guide tree. They can also be in the same order as they have been entered, from the lowest to the highest evalue, if the option input is used in the interface.

You can edit outputs from BLAST with a short description of each sequence by clicking in the Results frame of the interface, section Tracing files.

If activated, the option 'Show strictly conserved side chains' displays the side chains of strictly conserved residues in blue in the BOBSCRIPT figure.

If needed, BLAST and CLUSTALW searches can be cross-checked using the NPS@¹⁴ server, to have a better control on defaults or to use other sequence databases. Resulting alignment files in MPSA format can be uploaded in ESPript.

2c. ESPript

in -> out	`file.aln file.spdb file.ctct file.dssp` -> `file1_bcol.pdb file1.bob file2.ps`
main role	generates a second figure in PostScript with a multiple alignment

Similarities between the PDB sequence of the chosen chainID (chainA by default) and homologous sequences aligned with CLUSTALW are rendered by a boxing in colour. A score is calculated for each column of residues, according to a matrix based on physico-chemical properties. Residue names are written in black if score is below 0.7 (low similarity); they are in red and framed in blue if score is in the range 0.7-1 (high similarity); they are in white on a red background in case of strict identity.

You can switch to other scoring matrices using the html form and re-run phase 2. A Risler matrix¹⁵ usually gives an excellent rendering when showing similarities on the 3D structure using BOBSCRIPT.

- Risler, Blosum62, Pam250 and Identity are four possibilities of scoring matrix (check appendix in the ESPript manual).
- A percentage of Equivalent residues can also be calculated considering either physico-chemical properties (HKR are polar positive ; DE are polar negative ; STNQ are polar neutral ; AVLIM are non polar aliphatic ; FYW are non polar aromatic ; PG ; C) or similarities used in MULTALIN (IV ; LM ; FY ; NDQEBZ).

Secondary structure elements, relative accessibility, hydropathy and intermolecular contacts are displayed as in file1.ps.

Sequences can be removed or their order can be changed by using the box 'Defining group' (e.g. 1-15 18-47 removes sequences 16 and 17 as shown in the second example).
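To make the 'Defining group' syntax concrete, the sketch below expands a string such as 1-15 18-47 into the list of kept sequence numbers, in the order given. It assumes that entries are separated by spaces and that a single number is also accepted; `parse_groups` is a hypothetical helper and does not reproduce ESPript's own parser.

```python
def parse_groups(text):
    """Expand '1-15 18-47' into the ordered list of sequence numbers to keep."""
    kept = []
    for token in text.split():
        first, _, last = token.partition("-")
        start = int(first)
        end = int(last) if last else start  # a lone number is a range of one
        kept.extend(range(start, end + 1))
    return kept
```

Writing the ranges in a different order, for instance 18-47 1-15, yields the reordered list.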

Similarity scores by residues are written in the bfactor column of an output file named file1_bcol.pdb. This PDB file includes the co-ordinates of the selected protein chain and of its bound hetero-compounds, according to the list of interactions read in file.ctct. In the meantime, the command file file1.bob is created for BOBSCRIPT. file1.bob contains a script for representation of secondary structure elements and reads file1_bcol.pdb.

2d. Similarities on 3D structure with BOBSCRIPT

in -> out	`file1.bob` `file1_bcol.pdb` -> `file3.ps`
main role	generates a PostScript figure of the 3D structure

Secondary structure elements previously determined by DSSP (helices and beta strands) are shown. They are colour-coded from white to red according to similarity scores (low to high).

By default, side chains of strictly conserved residues are shown in blue, if the button 'Show strictly conserved side chains' has been activated in the section 'Start a BLAST request' of the interface.

A green dashed line links CA atoms of two cysteines connected by a disulphide bridge.

Hetero-compounds in interaction with the selected protein chain are represented by grey ball-and-sticks. Note that only hetero-compounds located in the asymmetric unit are displayed for crystallographic structures.

Second phase, part b: a 3D representation with rmsd information

2e. PROFIT

in -> out	`file.aln file.spdb file_seq1.spdb file_seq2.spdb file_seq3.spdb...` -> `All rmsd`
main role	superimposes homologous structures onto the query

Information on zones of equivalent residues is extracted by PROFIT from file.aln, so as to superimpose each known structure of aligned sequence onto the query. Thus, each mobile structure (file_seq1.spdb, file_seq2.spdb...) is fitted onto the reference structure (file.spdb) by using CA pairs. Fitted structures are written in turn in a tar file named All.

An rmsd per residue is calculated using all fitted CA pairs.
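One plausible reading of this calculation is sketched below: for each residue of the query, the root mean square of the CA-CA distances to every fitted structure that has an equivalent residue. The exact formula used by ENDscript is not given in this help, so the function `rmsd_per_residue` and its input layout are assumptions.

```python
import math


def rmsd_per_residue(reference, fitted_structures):
    """Root mean square CA-CA distance per residue over all fitted structures.

    reference: dict residue -> (x, y, z) for the query CA atoms.
    fitted_structures: list of dicts with the same keys for equivalent residues.
    """
    result = {}
    for residue, ref_xyz in reference.items():
        squared = [
            sum((a - b) ** 2 for a, b in zip(ref_xyz, mobile[residue]))
            for mobile in fitted_structures
            if residue in mobile
        ]
        if squared:  # residues without any fitted pair get no value
            result[residue] = math.sqrt(sum(squared) / len(squared))
    return result
```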

2f. RMSD on 3D structure with BOBSCRIPT

in -> out	`file2.bob` `file2_bcol.pdb` -> `file4.ps`
main role	generates a PostScript figure with rmsd information

A new file named file2_bcol.pdb is created, which contains an rmsd score per residue in the occupancy column and a similarity score in the temperature factor column. This information is used in the BOBSCRIPT command file file2.bob, in order to generate a ribbon representation of the query: the protein is represented as a tube, its radius being proportional to the calculated rmsd. A colour ramping from white to red is still used to visualize variations in sequence similarity (from low to high).

Critical command line in file2.bob allowing such representation is:

colour ss from rgb .2 .2 .2 via white to red by b-factor from -100 to 100; set coilradius from .2 to 1.2 by occupancy from .2 to 5;
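The effect of the coilradius setting can be pictured as a mapping from the rmsd stored in the occupancy column (0.2 to 5) to a tube radius (0.2 to 1.2). The sketch below assumes a linear interpolation with values outside the range clamped to its ends; this is an illustration of the idea, not BOBSCRIPT's documented behaviour, and `coil_radius` is a hypothetical name.

```python
def coil_radius(occupancy, occ_min=0.2, occ_max=5.0, r_min=0.2, r_max=1.2):
    """Map an occupancy (rmsd) value to a tube radius by linear interpolation."""
    clamped = min(max(occupancy, occ_min), occ_max)  # assumed clamping
    fraction = (clamped - occ_min) / (occ_max - occ_min)
    return r_min + fraction * (r_max - r_min)
```

Under these assumptions an rmsd of 2.6 Å, halfway through the range, gives a radius of about 0.7.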

Appendix

Three supplementary links in phase 2

1. MOLSCRIPT

in -> out	`file1.bob` `file1_bcol.pdb` -> `file3.vrml`
main role	rotates a 3D structure with colour-coded similarities

A MOLSCRIPT figure in VRML is also created in phase 2. It can be displayed and rotated on your screen, if your web browser has the appropriate plug-in from CORTONA or COSMO.
For information, the command files used to generate MOLSCRIPT and BOBSCRIPT figures are similar. However, the line in file1.bob defining the colour code is specific to each program.

!!COLOUR IN MOLSCRIPT!!
!set colourparts on, residuecolour amino-acids b-factor 0 100 from blue to red;
!!COLOUR IN BOBSCRIPT!!
colour ss from white to red by b-factor from 0 to 100;

The above line is uncommented in MOLSCRIPT and the VRML file is generated using the command
molscript -vrml < file1.bob > file3.vrml

2. PHYLODENDRON

Clicking on the tree icon in the Results frame lets you generate and view phylogenetic trees, if you have used CLUSTALW as alignment program. Just press the button 'Submit' once PHYLODENDRON's interface appears on your screen.

3. ESPript

You can have access to the full ESPript interface by clicking on the ESPript button of the interface. This form allows you to have a better grip on the layout, to highlight important zones and residues and so on...

Thus, you can find the following script in the box 'Special Characters' of the interface:
X B query_A ! secondary structure elements are coloured in blue
Y B query_A ! residue numbering is in accord with the sequence query_A and is coloured in blue
P B query_A ! hydropathy is calculated, 'hyd' is in blue
T B query_A ! sequence name is in blue

You can also obtain a figure with full sequences, by replacing the range typed in the first box of the interface (e.g. 4-4500) by the string all.

Examples

1. One monomer in the asymmetric unit and one supported hetero-compound

The first example refers to the structure of a psychrophilic alpha amylase from bacteria, solved by Nushin Aghajari¹⁶ in the group of Richard Haser, Laboratoire de BioCristallographie, IBCP Lyon. It is deposited with the PDB under the code 1G94. The protein is made of 448 residues and catalyses the hydrolysis of polysaccharides. It crystallizes in space group C2221 with one molecule in the asymmetric unit. Two bound glucoses named GLC are included in the PDB file, in addition to other sugar groups which are unsupported by ENDscript by default.

To start with the interface, either type the code 1G94 in the first box named 'pdb data file' or click on the PDB icon of this box, type 1G94 and click on 'retrieve file'. Click on SUBMIT in the bottom left command panel to generate the first PostScript output.

Excerpt from the first generated PostScript (gif)

- The sequence is displayed along with secondary structure elements, hydropathy and crystallographic contacts.
- Seven alternate conformations have been detected and are marked by grey stars above sequence.
- Accessibility of Glu118 is in red, because this residue is disordered and the co-ordinates of its side chain beyond CB atom are not entered in the PDB file.
- Contacts between the protein and the two GLC molecules are marked by a " on the same line, as well as crystallographic contacts. Four disulphide bridges are shown with green digits.

Remark: the list of contacts can be checked by: (i) clicking on the link 'CNS' in the section Tracing text files of the Results frame; (ii) searching in the CNS file for the string %atoms at the beginning of each line of crystallographic contacts and the string atoms at the beginning of each line of non-crystallographic contacts.

Change the threshold from 10^-6 to 10^-12, switch from MULTALIN to CLUSTALW, click on BLAST to start the search and to extract highly homologous sequences from the PDB.

Excerpt from the second generated PostScript (gif)

- Fourteen similar sequences have been detected, including 1G94 itself. According to sequence similarity, one can notice that the alignment is certainly false at the second disulphide bridge of 1G94 (2 2) and should be manually corrected.

The third PostScript figure has been generated, showing similarities on the structure file.

Excerpt from the third generated PostScript (gif)

- CA atoms of the four disulphide bonds are connected by green dashes and two glucose molecules are visible at the surface of the protein.
- The command file for BOBSCRIPT is similar to the one for MOLSCRIPT. It can be copied and pasted after a click on the IN of the buttons frame, but the line defining the colour code must be changed.

!!COLOUR IN MOLSCRIPT!!
set colourparts on, residuecolour amino-acids b-factor 0 100 from blue to red;
!!COLOUR IN BOBSCRIPT!!
!colour ss from white to red by b-factor from 0 to 100;

You can rotate the molecule using your web browser by clicking on the VRML file generated by MOLSCRIPT in the RESULTS FRAME, or you can retrieve the PDB file created by ESPript with similarity scores on the bfactor column (hyperlink BCOL in the Results frame) to display the protein on a Silicon Graphics workstation.

molscript -gl < file1.bob

The protein can also be rotated and a new transformation matrix can be chosen.

Finally, homologous structures are superimposed to the query by the program PROFIT and the fourth PostScript figure is generated, showing rmsd information.

Excerpt from the fourth generated PostScript (gif)

- As can be expected, zones at the protein surface exhibit higher rmsd values (larger radius) and lower similarity scores (white colour).

2. Two monomers in the asymmetric unit and keeping unsupported hetero-compounds

The second example refers to the structure of alpha-amylase from barley, isoenzyme 1, solved by Xavier Robert in the same laboratory. This alpha amylase, known as amy1, can crystallize in space group P212121 with two molecules in the asymmetric unit. Each monomer binds molecules of acarbose, an inhibitor for the hydrolysis reaction. Input file is query.pdb. It contains 2x405 residues (protein is truncated and the first 24 residues are missing), 2x2 molecules of inhibitor named ACR and ACE, calcium ions named CA and water molecules named HOH. These hetero-compounds are not supported by ENDscript by default.

Save query.pdb on your disk, click on the button EXIT then EXECUTE of the html interface if you are still dealing with the first example. Click on 'Browse' in the first box to upload the PDB file.
We want to keep acarbose during analysis by ENDscript. The names ACR, ACE are typed in the columns with symbols 1, 2 contained in the box 'Keeping hetero-compounds'.

In the box 'Start a BLAST request': switch from the default database PDBAA to the SWISSPROT, from MULTALIN to CLUSTALW, release button Display all known structures (just for convenience for the example), click on button enable the BLAST search and click on SUBMIT. The three PostScript figures will be obtained in one go after a few minutes of calculation.

Excerpt from the first PostScript (gif)

- The sequences of both chains A and B are displayed on the figure.
- The line query_A corresponds to the contacts made by molecule A: i.e. crystallographic contacts with residues of molecules A and B (a, b), molecular contacts with molecule B (B), crystallographic contacts of Pro268 and Trp330 against their own symmetric (b), both crystallographic and non-crystallographic contacts with acarbose molecules (1 2).

Excerpt from the second PostScript (gif)

- Sequence of chainA has been extracted. It appears in position 9 after the multiple alignment performed by CLUSTALW. Sequence 10 named AMY1_HORVU corresponds to the sequence of amy1 deposited with the SWISSPROT. Therefore, it starts at residue 25.
It is clear from the PostScript figure that sequences 16 and 17 are fragments.
Type in the box 'Defining group' of the interface:

1-15 18-47

and click on SUBMIT in the button panel. The three figures are re-created without sequences 16 and 17.
Two residues are now strictly conserved in all sequences. One of them, Asp180, is an essential catalytic residue and binds acarbose molecule 1.

The third figure was obtained with BOBSCRIPT.

Excerpt from the generated PostScript (gif)

The image shows that molecule 1 is bound in a well conserved region, i.e. the active site of the protein, while molecule 2 is bound away in a non-conserved region. However, the multiple alignments figure suggests that this peripheral binding site is conserved in plant alpha amylases.

Finally, we want to generate an ESPript figure with intermolecular contacts made by monomers A and B.
- Click on the ESPript button of the interface for connection. The script concerning the second ENDscript figure is transferred.

- Add in the box 'Special characters' of ESPript, under T B query_A

R A all
R B all

Note that this request is relevant only if A and B have the same sequence. Click on the SUBMIT button:

Excerpt from the generated PostScript (gif)

- Contacts with acarbose are conserved in the two monomers.

3. Superposing mesophilic and psychrophilic alpha-amylases

This section concerns users who are already well practised in ESPript. DSSP and CNS output files have been saved on the disk and we want to compare the two sequences of alpha-amylases. Similarities are poor between the two proteins, 15% of identity in sequence, but they share the same fold and 13 matching segments of 3-33 residues were detected after structural superposition by the program TOP¹⁷. The two sequences were aligned manually using both this information and the editor SEAVIEW¹⁸. The following figure was obtained in turn with ESPript:

Excerpt from the generated PostScript (gif)

The main frame has been duplicated by clicking on +1 in the button panel; files concerning barley alpha-amylase have been entered in the upper part [01] and files concerning 1G94 in the bottom part [02].
A vertical shift of -1 in [01] and a bottom shift of -1 in both [01] and [02] allow all information on secondary structure elements, hydropathy and intermolecular contacts to be displayed in the resulting PostScript. The essential Asp180 is highlighted by a blue box. 81 columns have been used instead of 70, in order to obtain a justified figure as suggested at the end of the OUT file.

Structural similarities are obvious on the figure despite the lack of sequence conservation, but there is no striking feature explaining the psychrophilic specificity of 1G94.

References

1. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N. and Bourne, P.E. (2000)
The Protein Data Bank. Nucleic Acids Res. 28 235-242

2. Kabsch, W. and Sander, C. (1983)
Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 22 2577-2637

3. Brünger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Grosse-Kunstleve, R.W., Jiang, J.S., Kuszewski, J., Nilges, M., Pannu, N.S., Read, R.J., Rice, L.M., Simonson, T. and Warren, G.L. (1998)
Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr. D. 54 905-921

4. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997)
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25 3389-3402

5. Corpet, F. (1988)
Multiple sequence alignment with hierarchical clustering. Nucl. Acids Res. 16 10881-10890

6. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994)
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucl. Acids Res. 22 4673-4680

7. Gouet, P., Courcelle, E., Stuart, D.I. and Metoz, F. (1999)
ESPript: multiple sequence alignments in PostScript. Bioinformatics. 15 305-308

8. Esnouf, R.M. (1997)
An extensively modified version of MolScript that includes greatly enhanced coloring capabilities. J. Mol. Graphics. 15 132-134

9. Kraulis, P.J. (1991) MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Cryst. 24 946-950

10. Murshudov, G.N., Vagin, A.A. and Dodson, E.J. (1997) Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr. D. 53 240-255

11. Kyte, J. and Doolittle, R. (1982)
A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157 105-132

12. Bairoch, A. and Apweiler, R. (1997)
The SWISS-PROT protein sequence data bank and its supplement TrEMBL. Nucleic Acids Res. 25 31-36

13. Henikoff, J.G. and Henikoff, S. (1996)
Blocks database and applications. Meth. in Enzym. 266 88-105

14. Combet, C., Blanchet, C., Geourjon, C. and Deleage, G. (2000)
NPS@: Network Protein Sequence Analysis. TIBS. 25 147-150

15. Risler, J.L., Delorme, M.O., Delacroix, H. and Henaut, A. (1988)
Amino acid substitutions in structurally related proteins. A pattern recognition approach. Determination of a new and efficient scoring matrix. J. Mol. Biol. 204 1019-1029

16. Aghajari, N., Roth, M. and Haser, R. (2001)
Enzymatic synthesis in the crystalline state of a novel hepta-saccharide: its 3D-structure in complex with a cold active alpha-amylase submitted

17. Lu, G. (2000)
TOP: a new method for protein structure comparisons and similarity searches. J. Appl. Cryst. 33 176-183

18. Galtier, N., Gouy, M. and Gauthier, C. (1996)
SeaView and Phylo_win, two graphic tools for sequence alignment and molecular phylogeny. Comput. Applic. Biosci. 12 543-548
