SCAT: Structural Correlation Analysis Tool
SCAT analyzes the structural properties of the tryptophan environment in proteins. The SCAT module calculates the structural properties of tryptophan residues and assigns each residue to one of five spectral-structural classes. The module works directly with the PDB: after the user provides the protein PDB code, the program downloads and parses the PDB file.
- Input of data
- Tips: structural calculations
- SCAT calculations
- Output file: 16-parameters.txt
- Output file: List of all atoms.txt
- Output file: List of polar atoms.txt
- Output file: Result.txt
- Output file: Result-table.txt
- Output file: Summary.txt
Input of data
There are 2 steps in data input.
| Step 1. | Provide the exact 4-character PDB code of the file. |
| Step 2. | Select the chains, protein residues, atoms and hetatoms to be used for the calculation. A link to the selected file in the PDB is provided. For each chain, a subset of the residues can be selected: enter the range of residues (inclusive) in the text box. If you leave the box blank, all residues in the chain will be used. |
Examples:
- 1-400 (select residues #1 to #400)
- 10-100,105,110-200 (select residues #10 to #100, #105, and #110 to #200)
- Do not enter spaces!
- Do not include chains that contain only Cα atoms!
- Do not include chains that contain hydrogen atoms!
A separate line is reserved for comments. The comments will appear in all output files together with other information about the selected PDB file.
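The range syntax shown in the examples above can be checked before submitting a selection. The following is a minimal sketch, not the parser used by SCAT; the function name `parse_residue_ranges` is hypothetical, and negative residue numbers are not handled.

```python
def parse_residue_ranges(text):
    """Expand a selection such as '10-100,105,110-200' into residue numbers.

    Hypothetical helper for illustration only; SCAT's own parser may differ.
    """
    if text == "":
        return None  # blank box: every residue in the chain is used
    if " " in text:
        raise ValueError("spaces are not allowed in a residue selection")
    selected = set()
    for part in text.split(","):
        if "-" in part:
            start, end = part.split("-")
            start, end = int(start), int(end)
            if start > end:
                raise ValueError(f"range {part!r} is reversed")
            selected.update(range(start, end + 1))  # ranges are inclusive
        else:
            selected.add(int(part))
    return sorted(selected)
```

For example, `parse_residue_ranges("10-100,105,110-200")` returns residues 10 to 100, residue 105 and residues 110 to 200, while a string containing a space raises `ValueError`.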
Tips: structural calculations
- For the correct assignment of spectral components to tryptophan residues, the protein whose structure is selected from the PDB should have the same source (origin) as the protein used in the fluorescence measurements.
- Some structures in the PDB have several different residues with the same number. You can select which residue to include in the analysis.
- In the majority of cases there are several chains in a PDB file. If the protein is a monomer, each chain needs to be analyzed separately.
- It is recommended to exclude all hetatoms except water molecules from the analysis, unless the fluorescence measurements were performed in the presence of these hetatoms.
- We recommend including in the analysis all water atoms present in the PDB file.
SCAT calculations
The calculation of the structural parameters of the environment of tryptophan residues is expected to complete in 1-20 minutes (depending on the number of tryptophan residues in the protein). If the result does not appear within 30 minutes, please contact the PFAST administrator.
Output file: 16-parameters.txt
The results of the structural-correlation analysis are presented as six ASCII files. The 16-parameters.txt file contains summary information about the structural parameters of the environment of tryptophan residues.
Line 1-10
- Date of calculation
- 4 character PDB code
- Name of protein (from PDB file)
- Source of protein (from PDB file)
- Authors (from PDB file)
- Comment line (user comments from the previous step)
- Resolution
- [number of residues in protein] number of tryptophan residues {the position of tryptophan residues in sequence of protein}
Next Lines
The next lines contain 16 structural parameters of the environment of each tryptophan residue in the protein. Detailed information about the calculation of the structural parameters can be found in the Background section or in the papers.
- Accessibility to solvent (%) of the Nε1 atom of the indole ring, calculated taking into account all hetatoms included in the analysis
- Accessibility to solvent (%) of the Nε1 atom of the indole ring, calculated with all hetatoms excluded from the analysis
- Accessibility to solvent (%) of the CZ2 atom of the indole ring, calculated taking into account all hetatoms included in the analysis
- Accessibility to solvent (%) of the CZ2 atom of the indole ring, calculated with all hetatoms excluded from the analysis
- Averaged accessibility to solvent (%) of the nine atoms of the indole ring of the tryptophan residue, calculated taking into account all hetatoms included in the analysis
- Averaged accessibility to solvent (%) of the nine atoms of the indole ring of the tryptophan residue, calculated with all hetatoms excluded from the analysis
- Packing density (Den1), the number of neighbor atoms at distance < 5.5 Å from the indole ring
- Packing density (Den2), the number of neighbor atoms at distance < 7.5 Å from the indole ring
- Parameter B1: crystallographic B-factors of the polar atoms at distance < 5.5 Å from the indole ring, normalized to the mean B-factor value of all the Cα atoms in the crystal structure
- Parameter B2: crystallographic B-factors of the polar atoms at distance between 5.5 and 7.5 Å from the indole ring, normalized to the mean B-factor value of all the Cα atoms in the crystal structure
- Parameter R1 [ R1 = Acc*B1 ], “dynamic accessibility”, a dynamic characteristic of the microenvironment
- Parameter R2 [ R2 = Acc*B2 ], “dynamic accessibility”, a dynamic characteristic of the microenvironment
- Parameter A1, relative polarity of environment: portion of polar atoms amongst all atoms around the tryptophan residue at distance < 5.5 Å
- Parameter A2, relative polarity of environment: portion of polar atoms amongst all atoms around the tryptophan residue at distance between 5.5 and 7.5 Å
- Charge difference: the sum of all charges near the pyrrole ring minus the sum of all charges near the benzene ring, within a sphere of 5.5 Å
- Charge difference: the sum of all charges near the pyrrole ring minus the sum of all charges near the benzene ring, within a sphere of 7.5 Å
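The packing density, relative polarity, normalized B-factor and dynamic accessibility parameters listed above can be illustrated with a short sketch. It is not the SCAT implementation: the function name, the tuple layout of `neighbors`, and the use of a simple mean of polar-atom B-factors before normalization are assumptions made for illustration, and the accessibility `acc` is taken as an input because its calculation is described in the Background section.

```python
POLAR_ELEMENTS = {"N", "O", "S"}

def environment_parameters(neighbors, acc, mean_ca_b):
    """Sketch of Den, A, B and R parameters (hypothetical helper).

    neighbors : list of (distance_in_A, element, b_factor) tuples for atoms
                around the indole ring (assumed layout)
    acc       : accessibility to solvent in %, supplied by the caller
    mean_ca_b : mean B-factor of all C-alpha atoms in the structure
    """
    near = [n for n in neighbors if n[0] < 5.5]
    far = [n for n in neighbors if 5.5 <= n[0] < 7.5]

    def polar(shell):
        return [n for n in shell if n[1] in POLAR_ELEMENTS]

    def fraction_polar(shell):
        return len(polar(shell)) / len(shell) if shell else 0.0

    def normalized_b(shell):
        atoms = polar(shell)
        if not atoms:
            return 0.0
        # Assumption: average the polar-atom B-factors, then normalize.
        return (sum(n[2] for n in atoms) / len(atoms)) / mean_ca_b

    b1, b2 = normalized_b(near), normalized_b(far)
    return {
        "Den1": len(near),             # neighbors closer than 5.5 A
        "Den2": len(near) + len(far),  # neighbors closer than 7.5 A
        "A1": fraction_polar(near),
        "A2": fraction_polar(far),
        "B1": b1,
        "B2": b2,
        "R1": acc * b1,
        "R2": acc * b2,
    }
```

Atoms farther than 7.5 Å are ignored, so the same neighbor list can be passed in unfiltered.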
Output file: List of all atoms.txt
The file contains a list of all protein and solvent atoms within 7.5 Å of the atoms of the indole ring of tryptophan residues.
Line 1-10
- Date of calculation
- 4 character PDB code
- Name of protein (from PDB file)
- Source of protein (from PDB file)
- Authors (from PDB file)
- Comment line (user comments from the previous step)
- Resolution
- [number of residues in protein] number of tryptophan residues {the position of tryptophan residues in sequence of protein}
Next Lines
List of all protein and solvent atoms within 7.5 Å of the atoms of the indole ring of tryptophan residues. The following information is presented in the file: the neighbor atom name, the residue name and number, the atom's B-factor, and the distance (in Å) to the nearest of the nine atoms of the indole ring of the tryptophan residue.
Output file: List of polar atoms.txt
The file contains a list of polar (O, N and S) protein and solvent atoms within 7.5 Å of the atoms of the indole ring of tryptophan residues.
Line 1-10
- Date of calculation
- 4 character PDB code
- Name of protein (from PDB file)
- Source of protein (from PDB file)
- Authors (from PDB file)
- Comment line (user comments from the previous step)
- Resolution
- [number of residues in protein] number of tryptophan residues {the position of tryptophan residues in sequence of protein}
Next Lines
List of polar (O, N and S) protein and solvent atoms within 7.5 Å of the atoms of the indole ring of tryptophan residues. The following information is presented in the file: the neighbor atom name, the residue name and number, the atom's B-factor, and the distance (in Å) to the nearest of the nine atoms of the indole ring of the tryptophan residue.
Output file: Result.txt
The file contains information about the orientation of protein and solvent atoms located within 7.5 Å of the atoms of the indole ring of tryptophan residues.
Line 1-10
- Date of calculation
- 4 character PDB code
- Name of protein (from PDB file)
- Source of protein (from PDB file)
- Authors (from PDB file)
- Comment line (user comments from the previous step)
- Resolution
- [number of residues in protein] number of tryptophan residues {the position of tryptophan residues in sequence of protein}
Table
To describe the surroundings of each of the nine atoms of the indole ring independently, a new coordinate system is introduced, centered at each atom of the indole ring in turn.
First, three atoms of the indole ring are listed in the table. The new coordinate system is centered at the first of the three listed atoms.
The table has 3 parts, which contain information about the neighbors of the first of the three indole atoms:
- list of polar (O, N and S) protein and solvent atoms within 5.5 Å of the first of the three atoms of the indole ring of the tryptophan residue.
- list of carbon protein atoms within 5.5 Å of the first of the three atoms of the indole ring of the tryptophan residue.
- list of polar (O, N and S) protein and solvent atoms at distances between 5.5 and 7.5 Å from the first of the three atoms of the indole ring of the tryptophan residue.
Each table row contains the X, Y, Z coordinates of the atom as given in the PDB file, the recalculated X, Y, Z coordinates centered at the atom of the tryptophan residue, and spherical coordinates, which give the distance ( ρ ) and orientation ( φ is the azimuth and θ is the elevation) of the atom. The last column contains the B-factor values.
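The conversion from PDB coordinates to a local frame and then to spherical coordinates can be sketched as follows. The axis convention is an assumption made for illustration, since this page does not state it: the origin is the first of the three listed atoms, the x axis points to the second atom, and the z axis is normal to the plane of the three atoms. The function name `local_spherical` is hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    length = math.sqrt(_dot(a, a))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return (a[0] / length, a[1] / length, a[2] / length)

def local_spherical(first, second, third, neighbor):
    """Return ((x, y, z), (rho, phi, theta)) of `neighbor` in the local frame.

    Assumed convention: origin at `first`, x axis toward `second`,
    z axis normal to the plane through the three indole atoms.
    phi is the azimuth and theta the elevation, both in degrees.
    """
    x_axis = _unit(_sub(second, first))
    z_axis = _unit(_cross(x_axis, _sub(third, first)))
    y_axis = _cross(z_axis, x_axis)
    v = _sub(neighbor, first)
    x, y, z = _dot(v, x_axis), _dot(v, y_axis), _dot(v, z_axis)
    rho = math.sqrt(x * x + y * y + z * z)
    phi = math.degrees(math.atan2(y, x))
    # Clamp to [-1, 1] to guard against rounding before asin.
    theta = math.degrees(math.asin(max(-1.0, min(1.0, z / rho)))) if rho else 0.0
    return (x, y, z), (rho, phi, theta)
```

With this convention an atom lying in the plane of the three reference atoms has θ = 0, and an atom directly above the plane has θ = 90. Three collinear reference atoms do not define a plane, and the sketch raises `ValueError` in that case.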
Output file: Result-table.txt
The file contains information about the orientation and distances of potential hydrogen-bond partners of the atoms of the tryptophan residue, selected from all polar protein and solvent atoms.
Line 1-10
- Date of calculation
- 4 character PDB code
- Name of protein (from PDB file)
- Source of protein (from PDB file)
- Authors (from PDB file)
- Comment line (user comments from the previous step)
- Resolution
- [number of residues in protein] number of tryptophan residues {the position of tryptophan residues in sequence of protein}
Next Lines
The data in this part of the table are presented only for tryptophan pairs in which the distance between the centers of the indole rings is ≤ 12 Å.
The data include
- Cos Qda - cosine of the angle (Qda) between the emission dipole of the donor and the absorption dipole of the acceptor
- k^2 – orientation factor
- Ro – Förster distance for the donor-acceptor pair
- E, % - efficiency of energy transfer
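The quantities listed above are related by the standard Förster expressions, k^2 = (cos Qda - 3 cos Qd cos Qa)^2 and E = 1 / (1 + (R/Ro)^6), where Qd and Qa are the angles between each dipole and the line joining donor and acceptor. The sketch below applies these textbook formulas. It is not taken from the SCAT source, the function names are hypothetical, and Ro is supplied by the caller because this page does not describe how SCAT derives it.

```python
import math

def _unit(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orientation_factor(donor_dipole, acceptor_dipole, separation):
    """Return (cos Qda, k^2) for two transition dipoles.

    separation is the vector from the donor to the acceptor ring center.
    """
    d, a, r = _unit(donor_dipole), _unit(acceptor_dipole), _unit(separation)
    cos_da = _dot(d, a)
    kappa2 = (cos_da - 3.0 * _dot(d, r) * _dot(a, r)) ** 2
    return cos_da, kappa2

def transfer_efficiency(distance, r0):
    """Return the transfer efficiency as a fraction (multiply by 100 for %)."""
    return 1.0 / (1.0 + (distance / r0) ** 6)
```

The orientation factor ranges from 0 to 4: it equals 4 for collinear dipoles aligned with the separation vector and 1 for parallel dipoles perpendicular to it. At a distance equal to Ro the efficiency is 0.5 by definition.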
Next Lines
The distances between centers of indole ring of all tryptophan pairs
Table
The first part of the table contains the following parameters calculated for each of nine atoms of indole ring of tryptophan residue separately:
- Accessibility to solvent (%) calculated not taking into account water molecules
- Accessibility to solvent (%) calculated taking into account water molecules
- Number of all neighbor atoms at distances 5.5 and 7.5 Å
- Number of all neighbor polar (N, O, S) atoms at distances 5.5 and 7.5 Å
- Number of all polar protein atoms at distances 5.5 and 7.5 Å
- Number of all water molecules at distances 5.5 and 7.5 Å
- Number of all other hetatoms at distances 5.5 and 7.5 Å
- Fraction of polar atoms among all atoms at distances 5.5 and 7.5 Å
- Fraction of protein polar atoms among all atoms at distances 5.5 and 7.5 Å
- The sum of positive charges at distances 5.5 and 7.5 Å
- The sum of negative charges at distances 5.5 and 7.5 Å
The second part of the table contains the averaged information for the whole tryptophan residue at distances less than 5.5 Å, from 5.5 Å to 7.5 Å and at distance less than 7.5 Å.
- Accessibility to solvent (%) calculated not taking into account water molecules
- Accessibility to solvent (%) calculated taking into account water molecules
- Averaged B-factor of polar atoms
- Parameters B1 and B2: crystallographic B-factors of polar atoms at distance < 5.5 Å and in the range 5.5-7.5 Å from the indole ring, normalized to the mean B-factor value of all the Cα atoms in the crystal structure
- Parameters R1 and R2 [ R1 = Acc*B1 and R2 = Acc*B2 ]: dynamic accessibilities, dynamic characteristics of the microenvironment
- Number of all neighbor atoms
- Number of all polar neighbor atoms
- Number of all water molecules
- Fraction of polar atoms among all neighbor atoms
- Parameters A1 and A2 - relative polarity of environment: portion of polar atoms amongst all atoms
- Sum of all positive charges around tryptophan
- Sum of all negative charges around tryptophan
- Difference between positive and negative charges around tryptophan
- Charge difference: the sum of all charges near the pyrrole ring minus the sum of all charges near the benzene ring
The next parts of the table contain information (atom name, residue name and number, distance R, orientation (cos(THETA) and cos(FI)) and B-factors) about possible hydrogen-bond partners located within 5.5 Å and in the range 5.5-7.5 Å from each of the nine atoms of the indole ring. According to the geometric criteria for a hydrogen bond:
cos(THETA) must be close to 1 for possible donors (the following atoms are considered potential donors: main-chain nitrogen atoms; Sγ of Cys; Nε2 and Nδ1 of His; Nζ of Lys; Nδ2 of Asn; Nε2 of Gln; Nε, Nη1 and Nη2 of Arg; Oγ of Ser; Oγ1 of Thr; Oη of Tyr); and cos(THETA) and cos(FI) must be close to 0 and 1, respectively, for possible acceptors (the following atoms are considered potential acceptors: main-chain carbonyl oxygen; Oδ1 and Oδ2 of Asp and Asn; Sγ of Cys; Oε1 and Oε2 of Glu and Gln; Nδ1 of His; Sδ of Met; Oγ of Ser; Oγ1 of Thr; Oη of Tyr).
All atoms listed above are considered potential hydrogen-bonding partners if they are located within a cone with angles differing by less than ca. 20° from the ideal geometry of H-bonds, i.e. |cos(THETA)| > 0.9 for possible donors, and |cos(THETA)| < 0.35 and cos(FI) > 0.9 for possible acceptors.
The last part of the table contains information about the sulphur atoms of Cys and Met, which are considered good quenchers of the fluorescence of tryptophan residues.
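The thresholds quoted above translate directly into a filter. The sketch below only encodes the stated cut-offs; the function name and the `role` argument are hypothetical, and the computation of cos(THETA) and cos(FI) themselves is not shown because it depends on the local coordinate system used by SCAT.

```python
def is_possible_hbond_partner(role, cos_theta, cos_fi):
    """Apply the geometric cut-offs quoted in the text (hypothetical helper).

    role is "donor" or "acceptor"; cos_fi is ignored for donors.
    """
    if role == "donor":
        return abs(cos_theta) > 0.9
    if role == "acceptor":
        return abs(cos_theta) < 0.35 and cos_fi > 0.9
    raise ValueError("role must be 'donor' or 'acceptor'")
```

An atom that appears in both lists (for example Oγ of Ser) would be tested once with each role.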
Last Line
The averaged B-factor of all Cα atoms in the protein
Output file: Summary.txt
The Summary.txt file contains information about the assignment of tryptophan residues to the spectral-structural classes. Detailed information about the assignment procedure can be found in the Background section.
Line 1-10
- Date of calculation
- 4 character PDB code
- Name of protein (from PDB file)
- Source of protein (from PDB file)
- Authors (from PDB file)
- Comment line (user comments from the previous step)
- Resolution
- [number of residues in protein] number of tryptophan residues {the position of tryptophan residues in sequence of protein}
Next Lines
These lines contain the classification scores, Mahalanobis distances and probabilities (the five probabilities sum to 1) of assignment of each tryptophan residue to one of the five spectral-structural classes.
The classification score is used to determine the most probable class to which a tryptophan residue belongs. A tryptophan residue is assigned to the class for which it has the highest classification score.
The Mahalanobis distance is a measure of the distance between a tryptophan residue and a class centroid that takes into account the correlations within the class. Class centroids are calculated from the training set.
The posterior probability that a tryptophan residue belongs to a particular class is derived from its Mahalanobis distance to the centroid of that class, taking into account the a priori probability of the class: the smaller the distance, the higher the probability. A tryptophan residue is assigned to the class for which it has the highest posterior probability.
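One common way to turn Mahalanobis distances into posterior probabilities is the discriminant-analysis formula in which each class weight is its a priori probability multiplied by exp(-D^2/2), normalized so that the weights sum to 1. The sketch below uses that formula under the assumption that all classes share one covariance matrix; whether SCAT uses exactly this form is described in the Background section, not here, and the function name is hypothetical.

```python
import math

def posterior_probabilities(mahalanobis_distances, priors=None):
    """Posterior class probabilities from Mahalanobis distances (sketch).

    Assumes a covariance matrix shared by all classes. Equal a priori
    probabilities are used when `priors` is not given.
    """
    n = len(mahalanobis_distances)
    if priors is None:
        priors = [1.0 / n] * n
    half_sq = [0.5 * d * d for d in mahalanobis_distances]
    shift = min(half_sq)  # subtract the smallest exponent for numerical stability
    weights = [p * math.exp(shift - h) for p, h in zip(priors, half_sq)]
    total = sum(weights)
    return [w / total for w in weights]
```

With five equal distances every class receives probability 0.2, and the class with the smallest distance always receives the highest probability when the priors are equal.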
More information about the calculation of the classification score, Mahalanobis distance and probability of assignment can be found in the Background section.