mol.alg
– Algorithms for Structures¶
-
LocalDistDiffTest
(model, distance_list, tolerance_list, sequence_separation=0, local_ldt_property_string="")¶ This function counts the number of conserved local contacts between a model and a reference structure which is needed to compute the Local Distance Difference Test score.
The Local Distance Difference Test score is a number between zero and one, which measures the agreement of local contacts between a model and a reference structure. One means complete agreement, and zero means no agreement at all. The calculation of this score does not require any superposition between the model and the reference structures.
All distances between atoms in the reference structure that are shorter than a certain predefined length (inclusion radius) are compared with the corresponding distances in the model structure. If the difference between a reference distance and the corresponding model distance is smaller than a threshold value (tolerance), that distance is considered conserved. The final lDDT score is the fraction of conserved distances. Missing atoms in the model structure lead to non-conserved distances (and thus lower the final lDDT score).
This function takes as an input a list of distances to be checked for conservation. Any number of threshold values can be specified when the function is called. All thresholds are then applied in sequence and the return counts are averaged over all threshold values. A sequence separation parameter can be passed to the function. If this happens, only distances between residues whose separation in sequence is higher than the provided parameter are considered when the score is computed.
If a string is passed as the last parameter, residue-based counts and the value of the residue-based Local Distance Difference Test score are saved in each ResidueHandle as int and float properties. Specifically, the local residue-based lddt score is stored in a float property named as the provided string, while the residue-based number of conserved and total distances are saved in two int properties named <string>_conserved and <string>_total.
Parameters: - model (
EntityView
) – the model structure - distance_list (
GlobalRDMap
) – the list of distances to check for conservation - tolerance_list – a list of thresholds used to determine distance conservation
- sequence_separation – sequence separation parameter used when computing the score
- local_ldt_property_string – the base name for the ResidueHandle properties that store the local scores
Returns: a tuple containing the counts of the conserved distances in the model and of all the checked distances
- model (
-
LocalDistDiffTest
(alignment, tolerance, radius, ref_index=0, mdl_index=1) Calculates the Local Distance Difference Test score (see previous function) starting from an alignment between a reference structure and a model. The AlignmentHandle parameter used to provide the alignment to the function needs to have the two structures attached to it. By default the first structure in the alignment is considered to be the reference structure, and the second structure is taken as the model. This can however be changed by passing the indexes of the two structures in the AlignmentHandle as parameters to the function.
BEWARE: This function uses the old implementation of the Local Distance Difference Test algorithm and will give slightly different results from the new one.
Parameters: - alignment (
AlignmentHandle
) – an alignment containing the sequences of the reference and of the model structures, with the structures themselves attached - tolerance – a list of thresholds used to determine distance conservation
- radius – the inclusion radius in Angstroms (to determine which distances are checked for conservation)
- ref_index – index of the reference structure in the alignment
- mdl_index – index of the model in the alignment
Returns: the Local Distance Difference Test score
- alignment (
-
LDDTHA
(model, distance_list, sequence_separation=0)¶ This function calculates the Local Distance Difference Test, using the same threshold values as the GDT-HA test (the default set of thresholds used for the lDTT score) (See previous functions). The thresholds are 0.5, 1, 2, and 4 Angstroms.
The function only compares the input distance list to the first chain of the model structure
The local residue-based lDDT score values are stored in the ResidueHandles of the model passed to the function in a float property called “locallddt”
A sequence separation parameter can be passed to the function. If this happens, only distances between residues whose separation is higher than the provided parameter are considered when computing the score.
Parameters: - model (
EntityView
) – the model structure - distance_list (
GlobalRDMap
) – the list of distances to check for conservation - sequence_separation – sequence separation parameter used when computing the score
Returns: the Local Distance Difference Test score
- model (
-
DistanceRMSDTest
(model, distance_list, cap_difference, sequence_separation=0, local_drmsd_property_string="")¶ This function performs a Distance RMSD Test on a provided model, and calculates the two values that are necessary to determine the Distance RMSD Score, namely the sum of squared distance deviations and the number of distances on which the sum was computed.
The Distance RMSD Test (or DRMSD Test) computes the deviation in the length of local contacts between a model and a reference structure and expresses it in the form of a score value. The score has an an RMSD-like form, with the deviations in the RMSD formula computed as contact distance differences. The score is open-ended, with a value of zero meaning complete agreement of local contact distances, and a positive value revealing a disagreement of magnitude proportional to the score value itself. This score does not require any superposition between the model and the reference.
This function processes a list of distances provided by the user, together with their length in the reference structure. For each distance that is found in the model, its difference with the reference length is computed and used as deviation term in the RMSD-like formula. When a distance is not present in the model because one or both the atoms are missing, a default deviation value provided by the user is used.
The function only processes distances between atoms that do not belong to the same residue, and considers only standard residues in the first chain of the model. For residues with symmetric sidechains (GLU, ASP, ARG, VAL, PHE, TYR), the naming of the atoms is ambiguous. For these residues, the function computes the Distance RMSD Test score that each naming convention would generate when considering all non-ambiguous surrounding atoms. The solution that gives the lower score is then picked to compute the final Distance RMSD Score for the whole model.
A sequence separation parameter can be passed to the function. If this happens, only distances between residues whose separation is higher than the provided parameter are considered when computing the score
If a string is passed as last parameter to the function, the function computes the Distance RMSD Score for each residue and saves it as a float property in the ResidueHandle, with the passed string as property name. Additionally, the actual sum of squared deviations and the number of distances on which it was computed are stored as properties in the ResidueHandle. The property names are respectively <passed string>_sum (a float property) and <passed string>_count (an integer property)
Parameters: - model (
EntityView
) – the model structure - distance_list (
GlobalRDMap
. This class stores two length values for each distance. Only the first is used by this function as reference length. The second is ignored.) – the list of distances to check. - cap_difference – a default deviation value to be used when a distance is not found in the model
- sequence_separation – sequence separation parameter used when computing the score
- local_ldt_property_string – the base name for the ResidueHandle properties that store the local scores
Returns: a tuple containing the sum of squared distance deviations, and the number of distances on which it was computed.
- model (
-
DRMSD
(model, distance_list, cap_difference, sequence_separation=0)¶ This function calculates the Distance RMSD Test score (see previous function).
- The function only considder distances between atoms not belonging to the same residue, and only compares the input distance list to the first
- chain of the model structure. It requires, in addition to the model and the list themselves, a default deviation value to be used in the DRMSD Test when a distance is not found in the model.
The local Local Distance Difference Test score values are stored in the ResidueHandles of the model passed to the function in a float property called “localdrmsd”
A sequence separation parameter can be passed to the function. If this happens, only distances between residues whose separation is higher than the provided parameter are considered when computing the score
Parameters: - model (
EntityView
) – the model structure - distance_list (
GlobalRDMap
. This class stores two length values for each distance. Only the first is used by this function as reference length. The second is ignored.) – the list of distances to check for conservation - sequence_separation – sequence separation parameter used when computing the score
- cap_difference – a default deviation value to be used when a distance is not found in the model
Returns: the Distance RMSD Test score
-
CreateDistanceList
(reference, radius)¶
-
CreateDistanceListFromMultipleReferences
(reference_list, tolerance_list, sequence_separation, radius)¶ Both these functions create lists of distances to be checked during a Local Distance Difference Test (see description of the functions above).
Both functions process only standard residues present in the first chain of the reference structures.
The only difference between the two functions is that one takes a single reference structure and the other a list of reference structures. The structures in the list have to be properly prepared before being passed to the function. Corresponding residues in the structures must have the same residue number, the same chain name, etc. Gaps are allowed and automatically dealt with: if information about a distance is present in at least one of the structures, it will be considered.
If a distance between two atoms is shorter than the inclusion radius in all structures in which the two atoms are present, it is included in the list. However, if the distance is longer than the inclusion radius in at least one of the structures, it is not considered to be a local interaction and is excluded from the list.
The multiple-reference function takes care of residues with ambiguous symmetric sidechains. To decide which naming convention to use, the function computes a Local Distance Difference Test score for each reference against the first reference structure in the list, using only non ambiguously-named atoms. It picks then the naming convention that gives the highest score, guaranteeing that all references are processed with the correct atom names.
The cutoff list that will later be used to compute the Local Distance Difference Test score and the sequence separation parameter must be passed to the multi-reference function. These parameters do not influence the output distance list, which always includes all distances within the provided radius (to make it consistent with the single-reference corresponding function). However, the parameters are used when dealing with the naming convention of residues with ambiguous nomenclature.
Parameters: - reference (
EntityView
) – a reference structure from which distances are derived - reference_list (list of
EntityView
) – a list of of reference structures from which distances are derived - tolerance_list – a list of thresholds used to determine distance conservation when computing the LDDT score
- sequence_separation – sequence separation parameter used when computing the LDDT score
- radius – inclusion radius (in Angstroms) used to determine the distances included in the list
Returns: - reference (
-
class
UniqueAtomIdentifier
¶ Object containing enough information to uniquely identify an atom in a structure
-
UniqueAtomIdentifier
(chain, residue_number, residue_name, atom_name)¶ Creates an UniqueAtomIdentifier object starting from relevant atom information
Parameters: - chain – a string containing the name of the chain to which the atom belongs
- residue_number (
ResNum
) – the number of the residue to which the atom belongs - residue_name – a string containing the name of the residue to which the atom belongs
- atom_name – a string containing the name of the atom
-
GetChainName
()¶ Returns the name of the chain to which the atom belongs, as a String
-
GetResNum
()¶ Returns the number of the residue the atom belongs to, as a
ResNum
object
-
GetResidueName
()¶ Returns the name of the residue to which the atom belongs, as a String
-
GetAtomName
()¶ Returns the name of the atom, as a String
-
GetQualifiedAtomName
()¶ Returns the qualified name of the atom (the chain name, followed by a unique residue identifier and the atom name. For example: “A.GLY2.CA”)
-
-
class
ResidueRDMap
¶ Dictionary-like object containing the a list of distances that originate from the a single residue residue, to check during a run of the Local Distance Difference Test algorithm
-
class
GlobalRDMap
¶ Dictionary-like object containing all the
ResidueRDMap
objects related to residues of a single structure
-
PrintResidueRDMap
(residue_distance_list)¶ Prints to standard output all the distances contained in a
ResidueRDMap
object
-
PrintGlobalRDMap
(global_distance_list)¶ Prints to standard output all the distances contained in each of the
ResidueRDMap
objects that make up aGlobalRDMap
object
-
Swappable
(residue_name, atom_name)¶ - This function checks if an atom in a residue has a symmetry equivalent. It returns true if the atom belongs to a residue with a symmetric side-chain
- and a symmetry equivalent atom exists, otherwise it returns false.
Parameters: - residue_name – a string containing the name of the residue to which the atom belongs
- atom_name – a string containing the name of the atom
Returns: True if the atom has a symmetry equivalent, false otherwise
-
SwappedName
(atom_name)¶ - If the atom does belongs to a residue with a symmetric side-chain and if the atom has a symmetry equivalent, the function returns the name of the symmetry equivalent atom, otherwise it returns the name of the original atom
Parameters: atom_name – a string containing the name of the atom Returns: A string containing the same of the symmetry equivalent atom if the input atom has one, otherwise the name of the input atom itself.
Steric Clashes¶
The following function detects steric clashes in atomic structures. Two atoms are clashing if their euclidian distance is smaller than a threshold value (minus a tolerance offset).
-
FilterClashes
(entity, clashing_distances, always_remove_bb=False)¶ This function filters out residues with non-bonded clashing atoms. If the clashing atom is a backbone atom, the complete residue is removed from the structure, if the atom is part of the sidechain, only the sidechain atoms are removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True the whole residue is removed even if a clash is just detected in the side-chain.
The function returns a view containing all elements (residues, atoms) that have not been removed from the input structure, plus a
ClashingInfo
object containing information about the detected clashes.Two atoms are defined as clashing if their distance is shorter than the reference distance minus a tolerance threshold. The information about the clashing distances and the tolerance thresholds for all possible pairs of atoms is passed to the function as a parameter
Hydrogen and deuterium atoms are ignored by this function.
Parameters: - entity (
EntityView
orEntityHandle
) – The input entity - clashing_distances (
ClashingDistances
) – information about the clashing distances - always_remove_bb – if set to True, the whole residue is removed even if the clash happens in the side-chain
Returns: A tuple of two elements: The filtered
EntityView
, and aClashingInfo
object- entity (
-
CheckStereoChemistry
(entity, bond_stats, angle_stats, bond_tolerance, angle_tolerance, always_remove_bb=False)¶ This function filters out residues with severe stereo-chemical violations. If the violation involves a backbone atom, the complete residue is removed from the structure, if it involves an atom that is part of the sidechain, only the sidechain is removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True the whole residue is removed even if a violation is just detected in the side-chain
The function returns a view containing all elements (residues, atoms) that have not been removed from the input structure, plus a
StereoChemistryInfo
object containing information about the detected stereo-chemical violations.A violation is defined as a bond length that lies outside of the range: [mean_length-std_dev*bond_tolerance <-> meanlength+std_dev*bond_tolerance] or an angle width lying outside of the range [mean_width-std_dev*angle_tolerance <-> mean_width+std_dev*angle_tolerance ]. The information about the mean lengths and widths and the corresponding standard deviations is passed to the function using two parameters.
Hydrogen and deuterium atoms are ignored by this function.
Parameters: - entity (
EntityView
orEntityHandle
) – The input entity - bond_stats (
StereoChemicalParams
) – statistics about bond lengths - angle_stats (
StereoChemicalParams
) – statistics about angle widths - bond_tolerance – tolerance for bond lengths (in standard deviations)
- angle_tolerance – tolerance for angle widths (in standard deviations)£
- always_remove_bb – if set to True, the whole residue is removed even if a violation in just detected in the side-chain
Returns: A tuple of two elements: The filtered
EntityView
, and aStereoChemistryInfo
object- entity (
-
class
ClashingInfo
¶ This object is returned by the FilterClashes function, and contains information about the clashes detected by the function.
-
GetClashCount
()¶ This method returns the number of clashes between non-bonded atoms detected in the input structure
-
GetAverageOffset
()¶ This methods returns a value in Angstroms representing the average offset by which clashing atoms lie closer than the minimum acceptable distance (which of course differs for each possible pair of elements)
Returns: the average offset, in Angstroms
-
GetClashList
()¶ Returns the list of detected inter-atomic clashes
Returns: a list of ClashEvent
objects
-
-
class
ClashEvent
¶ This object contains all the information relative to a single clash detected by the FilterClashes function
-
GetFirstAtom
()¶
-
GetSecondAtom
()¶ These two methods return the two atoms which clash
Returns: UniqueAtomIdentifier
-
GetModelDistance
()¶ This method returns the distance between the two clashing atoms as observed in the model
Returns: the distance in Angstroms between the two atoms
-
GetAdjustedReferenceDistance
()¶ This method returns the minimum acceptable distance between the two atoms involved in the clash, as defined in the
ClashingDistances
classReturns: the minimum acceptable distance in Angstroms
-
-
class
StereoChemistryInfo
¶ This object is returned by the CheckStereoChemistry function, and contains information about bond lengths and planar angle widths in the structure that diverge from the parameters tabulated by Engh and Huber in the International Tables of Crystallography. Only elements that diverge from the tabulated value by a minimum number of standard deviations (defined when the CheckStereoChemistry function is called) are reported.
-
GetBadBondCount
()¶ This method returns the number of bonds where a serious violation was detected
-
GetBondCount
()¶ This method returns the total number of bonds in the structure checked by the CheckStereoChemistry function
-
GetAvgZscoreBonds
()¶ This method returns the average z-score of all the bond lengths in the structure, computed using Engh and Huber’s mean and standard deviation values.
Returns: The average z-score of bond lengths
-
GetBadAngleCount
()¶ This method returns the number of planar angles where a serious violation was detected
-
GetAngleCount
()¶ This method returns the total number of planar angles in the structure checked by the CheckStereoChemistry function
-
GetAvgZscoreAngles
()¶ This method returns the average z-score of all the planar angle widths, computed using Engh and Huber’s mean and standard deviation values.
Returns: The average z-score of planar angle widths
-
GetBondViolationList
()¶ Returns the list of bond length violations detected in the structure
Returns: a list of StereoChemicalBondViolation
objects
-
GetAngleViolationList
()¶ Returns the list of angle width violations detected in the structure
Returns: a list of StereoChemicalAngleViolation
objects
-
-
class
StereoChemicalBondViolation
¶ This object contains all the information relative to a single detected violation of stereo-chemical parameters in a bond length
-
GetFirstAtom
()¶ Returns the first atom of the bond
Returns: UniqueAtomIdentifier
-
GetSecondAtom
()¶ Returns the first atom of the bond
Returns: UniqueAtomIdentifier
-
GetBondLength
()¶ Returns the length of the bond as observed in the model
Returns: the bond length in Angstroms
-
GetAllowedRange
()¶ Returns the allowed range of bond lengths, according to the Engh and Huber’s tabulated parameters and the tolerance threshold used when CheckStereoChemistry function was called
Returns: a tuple containing the minimum and maximum allowed bond lengths in Angstroms
-
-
class
StereoChemicalAngleViolation
¶ This object contains all the information relative to a single detected violation of stereo-chemical parameters in a planar angle width
-
GetFirstAtom
()¶ Returns the first atom that defines the planar angle
Returns: UniqueAtomIndentifier
-
GetSecondAtom
()¶ Returns the vertex atom of the planar angle
Returns: UniqueAtomIdentifier
-
GetThirdAtom
()¶ Returns the third atom that defines the planar angle
Returns: UniqueAtomIdentifier
-
GetAngleWidth
()¶ Returns the width of the planar angle as observed in the model
Returns: the angle width in degrees
-
GetAllowedRange
()¶ Returns the allowed range of angle widths, according to the Engh and Huber’s tabulated parameters and the tolerance threshold used when the CheckStereoChemistry function was called
Returns: a tuple containing the minimum and maximum allowed angle widths in degrees
-
-
class
ClashingDistances
¶ Object containing information about clashing distances between non-bonded atoms
-
ClashingDistances
()¶ Creates an empty distance list
-
SetClashingDistance
(ele1, ele2, clash_distance, tolerance)¶ Adds or replaces an entry in the list
Parameters: - ele1 – string containing the first element’s name
- ele2 – string containing the second element’s name
- clash_distance – minimum clashing distance (in Angstroms)
- tolerance – tolerance threshold (in Angstroms)
-
GetClashingDistance
()¶ - Recovers a reference distance and a tolerance threshold from the list
Parameters: - ele1 – string containing the first element’s name
- ele2 – string containing the second element’s name
Returns: a tuple containing the minimum clashing distance and the tolerance threshold
-
GetAdjustedClashingDistance
()¶ - Recovers a reference distance from the list, already adjusted by the tolerance threshold
Parameters: - ele1 – string containing the first element’s name
- ele2 – string containing the second element’s name
Returns: the adjusted minimum clashing distance
-
GetMaxAdjustedDistance
()¶ Returns the longest clashing distance in the list, after adjustment with tolerance threshold
-
IsEmpty
()¶ Returns True if the list is empty (i.e. in an invalid, useless state)
-
PrintAllDistances
()¶ Prints all distances in the list to standard output
-
-
class
StereoChemicalParams
¶ Object containing stereo-chemical information about bonds and angles. For each item (bond or angle in a specific residue), stores the mean and standard deviation
-
StereoChemicalParams
()¶ Creates an empty parameter list
-
SetParam
(item, residue, mean, standard_dev)¶ Adds or replaces an entry in the list
Parameters: - item – string defining a bond (format: X-Y) or an angle (format:X-Y-Z), where X,Y an Z are atom names
- residue – string containing the residue type the information pertains to
- mean – mean bond length or angle width
- standard_dev – standard deviation of the bond length or of the angle width
-
IsEmpty
()¶ Returns True if the list is empty (i.e. in an invalid, useless state)
-
PrintAllParameters
()¶ Prints all distances in the list to standard output
-
-
FillClashingDistances
(file_content)¶
-
FillBondStereoChemicalParams
(file_content)¶
-
FillAngleStereoChemicalParams
(file_content)¶ These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from a the content of parameter file. The content of the file is passed to the function as a list of strings, each containing a line from the parameter file
Returns: ClashingDistances
andStereoChemicalParams
respectively
-
FillClashingDistancesFromFile
(filename)¶
-
FillBondStereoChemicalParamsFromFile
(filename)¶
-
FillAngleStereoChemicalParamsFromFile
(filename)¶ These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from a file. The filename passed to the function can be a full path.
Returns: ClashingDistances
andStereoChemicalParams
respectively
-
DefaultClashingDistances
()¶
-
DefaultBondStereoChemicalParams
()¶
-
DefaultAngleStereoChemicalParams
()¶ These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, using the default parameter file distributed with OpenStructure.
Returns: ClashingDistances
andStereoChemicalParams
respectively
-
ResidueNamesMatch
(probe, reference)¶ The function requires a reference structure and a probe structure. The function checks that all the residues in the reference structure that appear in the probe structure (i.e., that have the same ResNum) are of the same residue type. Chains are comapred by order, not by chain name (i.e.: the first chain of the reference will be compared with the first chain of the probe structure, etc.)
Parameters: - probe (
EntityView
) – the structure to test - reference (
EntityView
) – the reference structure
Returns: true if the residue names are the same, false otherwise
- probe (
-
ParseAtomNames
(atoms)¶ Parses different representations of a list of atom names and returns a
set
, understandable byMatchResidueByNum()
. In essence, this function translates- None to
None
- ‘all’ to
None
- ‘backbone’ to
set(['N', 'CA', 'C', 'O'])
- ‘aname1, aname2’ to
set(['aname1', 'aname2'])
['aname1', 'aname2']
toset(['aname1', 'aname2'])
Parameters: atoms ( str
,list
,set
) – Identifier or list of atomsReturns: A set
of atoms.- None to
-
MatchResidueByNum
(ent_a, ent_b, atoms='all')¶ Returns a tuple of views containing exactly the same number of atoms. Residues are matched by residue number. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in the order they occur in the entities. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.
Parameters: - ent_a (
EntityView
orEntityHandle
) – The first entity - ent_b (
EntityView
orEntityHandle
) – The second entity - atoms (
str
,list
,set
) – The subset of atoms to be included in the two views.
Returns: Two
EntityView
instances with the same amount of residues matched by number. Each residue will have the same number & type of atoms.- ent_a (
-
MatchResidueByIdx
(ent_a, ent_b, atoms='all')¶ Returns a tuple of views containing exactly the same number of atoms. Residues are matched by position in the chains of an entity. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count. The number of residues per chain is supposed to be the same.
Parameters: - ent_a (
EntityView
orEntityHandle
) – The first entity - ent_b (
EntityView
orEntityHandle
) – The second entity - atoms (
str
,list
,set
) – The subset of atoms to be included in the two views.
Returns: Two
EntityView
instances with the same amount of residues matched by position. Each residue will have the same number & type of atoms.- ent_a (
-
MatchResidueByLocalAln
(ent_a, ent_b, atoms='all')¶ Match residues by local alignment. Takes ent_a and ent_b, extracts the sequences chain-wise and aligns them in Smith/Waterman manner using the BLOSUM62 matrix for scoring. The residues of the entities are then matched based on this alignment. Only atoms present in both residues are included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.
Parameters: - ent_a (
EntityView
orEntityHandle
) – The first entity - ent_b (
EntityView
orEntityHandle
) – The second entity - atoms (
str
,list
,set
) – The subset of atoms to be included in the two views.
Returns: Two
EntityView
instances with the same number of residues. Each residue will have the same number & type of atoms.- ent_a (
-
MatchResidueByGlobalAln
(ent_a, ent_b, atoms='all')¶ Match residues by global alignment. Takes ent_a and ent_b, extracts the sequences chain-wise and aligns them in Needleman/Wunsch manner using the BLOSUM62 matrix for scoring. The residues of the entities are then matched based on this alignment. Only atoms present in both residues are included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.
Parameters: - ent_a (
EntityView
orEntityHandle
) – The first entity - ent_b (
EntityView
orEntityHandle
) – The second entity - atoms (
str
,list
,set
) – The subset of atoms to be included in the two views.
Returns: Two
EntityView
instances with the same number of residues. Each residue will have the same number & type of atoms.- ent_a (
-
Superpose
(ent_a, ent_b, match='number', atoms='all', iterative=False, max_iterations=5, distance_threshold=3.0)¶ Superposes the model entity onto the reference. To do so, two views are created, returned with the result. atoms describes what goes into these views and match the selection method. For superposition,
SuperposeSVD()
is called. For matching, the following methods are recognised:number
- select residues by residue number, includes atoms, callsMatchResidueByNum()
index
- select residues by index in chain, includes atoms, callsMatchResidueByIdx()
local-aln
- select residues from a Smith/Waterman alignment, includes atoms, callsMatchResidueByLocalAln()
global-aln
- select residues from a Needleman/Wunsch alignment, includes atoms, callsMatchResidueByGlobalAln()
There is also an option to use iterative matching which allows for an iterative approach to superposing two structures. iterative takes two additional parameters, max_iteration and distance_threshold.
Parameters: - ent_a (
EntityView
orEntityHandle
) – The model entity - ent_b (
EntityView
orEntityHandle
) – The reference entity - match (
str
) – Method to gather residues/ atoms - atoms (
str
,list
,set
) – The subset of atoms to be used in the superposition - max_iterations (
int
) – They number of iterations that will be run during iterative superposition - distance_threshold (
float
) – The distance threshold between which two atoms that will be used in the next superposition iteration
Returns: An instance of
SuperpositionResult
, containing membersrmsd
- RMSD of the superposed entitiesview1
- FirstEntityView
usedview2
- SecondEntityView
used
Trajectory Analysis¶
This is a set of functions used for basic trajectory analysis such as extracting
positions, distances, angles and RMSDs. The organization is such that most
functions have their counterpart at the individual frame level
so that they can also be called on one frame instead
of the whole trajectory.
All these functions have a “stride” argument that defaults to stride=1, which is used to skip frames in the analysis.
-
SuperposeFrames
(frames, sel, from=0, to=-1, ref=-1)¶ This function superposes the frames of the given coord group and returns them as a new coord group.
Parameters: - frames (
CoordGroupHandle
) – The source coord group. - sel (
ost.mol.EntityView
) – An entity view containing the selection of atoms to be used for superposition. If set to an invalid view, all atoms in the coord group are used. - from – index of the first frame
- to – index of the last frame plus one. If set to -1, the value is set to the number of frames in the coord group
- ref – The index of the reference frame to use for superposition. If set to -1, the each frame is superposed to the previous frame.
Returns: A newly created coord group containing the superposed frames.
- frames (
-
SuperposeFrames
(frames, sel, ref_view, from=0, to=-1) Same as SuperposeFrames above, but the superposition is done on a reference view and not on another frame of the trajectory.
Parameters: - frames (
CoordGroupHandle
) – The source coord group. - sel (
ost.mol.EntityView
) – An entity view containing the selection of atoms of the frames to be used for superposition. - ref_view (
ost.mol.EntityView
) – The reference view on which the frames will be superposed. The number of atoms in this reference view should be equal to the number of atoms in sel. - from – index of the first frame
- to – index of the last frame plus one. If set to -1, the value is set to the number of frames in the coord group
Returns: A newly created coord group containing the superposed frames.
- frames (
-
AnalyzeAtomPos
(traj, atom1, stride=1)¶ This function extracts the position of an atom from a trajectory. It returns a vector containing the position of the atom for each analyzed frame.
Parameters: - traj (
CoordGroupHandle
) – The trajectory to be analyzed. - atom1 – The
AtomHandle
. - stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- traj (
-
AnalyzeCenterOfMassPos
(traj, sele, stride=1)¶ This function extracts the position of the center-of-mass of a selection (
EntityView
) from a trajectory and returns it as a vector.Parameters: - traj (
CoordGroupHandle
) – The trajectory to be analyzed. - sele (
EntityView
.) – The selection from which the center of mass is computed - stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- traj (
-
AnalyzeDistanceBetwAtoms
(traj, atom1, atom2, stride=1)¶ This function extracts the distance between two atoms from a trajectory and returns it as a vector.
Parameters: - traj (
CoordGroupHandle
) – The trajectory to be analyzed. - atom1 – The first
AtomHandle
. - atom2 – The second
AtomHandle
. - stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- traj (
-
AnalyzeAngle
(traj, atom1, atom2, atom3, stride=1)¶ This function extracts the angle between three atoms from a trajectory and returns it as a vector. The second atom is taken as being the central atom, so that the angle is between the vectors (atom1.pos-atom2.pos) and (atom3.pos-atom2.pos).
Parameters: - traj (
CoordGroupHandle
) – The trajectory to be analyzed. - atom1 – The first
AtomHandle
. - atom2 – The second
AtomHandle
. - atom3 – The third
AtomHandle
. - stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- traj (
-
AnalyzeDihedralAngle
(traj, atom1, atom2, atom3, atom4, stride=1)¶ This function extracts the dihedral angle between four atoms from a trajectory and returns it as a vector. The angle is between the planes containing the first three and the last three atoms.
Parameters: - traj (
CoordGroupHandle
) – The trajectory to be analyzed. - atom1 – The first
AtomHandle
. - atom2 – The second
AtomHandle
. - atom3 – The third
AtomHandle
. - atom4 – The fourth
AtomHandle
. - stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- traj (
-
AnalyzeDistanceBetwCenterOfMass
(traj, sele1, sele2, stride=1)¶ This function extracts the distance between the center-of-mass of two selections (
EntityView
) from a trajectory and returns it as a vector.Parameters: - traj (
CoordGroupHandle
) – The trajectory to be analyzed. - sele1 (
EntityView
.) – The selection from which the first center of mass is computed - sele2 (
EntityView
.) – The selection from which the second center of mass is computed - stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- traj (
-
AnalyzeRMSD
(traj, reference_view, sele_view, stride=1)¶ This function extracts the rmsd between two
EntityView
and returns it as a vector. The views don’t have to be from the same entity. The reference positions are taken directly from the reference_view, evaluated only once. The positions from the sele_view are evaluated for each frame. If you want to compare to frame i of the trajectory t, first use t.CopyFrame(i) for example:eh=io.LoadPDB(...) t=io.LoadCHARMMTraj(eh,...) sele=eh.Select(...) t.CopyFrame(0) mol.alg.AnalyzeRMSD(t,sele,sele)
Parameters: - traj (
CoordGroupHandle
) – The trajectory to be analyzed. - reference_view (
EntityView
.) – The selection used as reference structure - sele_view (
EntityView
.) – The selection compared to the reference_view - stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- traj (
-
AnalyzeMinDistance
(traj, view1, view2, stride=1)¶ This function extracts the minimal distance between two sets of atoms (view1 and view2) for each frame in a trajectory and returns it as a vector.
Parameters: - traj (
CoordGroupHandle
) – The trajectory to be analyzed. - view1 (
EntityView
.) – The first group of atoms - view2 (
EntityView
.) – The second group of atoms - stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- traj (
-
AnalyzeMinDistanceBetwCenterOfMassAndView
(traj, view_cm, view_atoms, stride=1)¶ This function extracts the minimal distance between a set of atoms (view_atoms) and the center of mass of a second set of atoms (view_cm) for each frame in a trajectory and returns it as a vector.
Parameters: - traj (
CoordGroupHandle
) – The trajectory to be analyzed. - view_cm (
EntityView
.) – The group of atoms from which the center of mass is taken - view_atoms (
EntityView
.) – The second group of atoms - stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- traj (
-
AnalyzeAromaticRingInteraction
(traj, view_ring1, view_ring2, stride=1)¶ This function is a crude analysis of aromatic ring interactions. For each frame in a trajectory, it calculates the minimal distance between the atoms in one view and the center of mass of the other and vice versa, and returns the minimum between these two minimal distances. Concretely, if the two views are the heavy atoms of two rings, then it returns the minimal center of mass - heavy atom distance betweent he two rings
Parameters: - traj (
CoordGroupHandle
) – The trajectory to be analyzed. - view_ring1 (
EntityView
.) – First group of atoms - view_ring2 (
EntityView
.) – Second group of atoms - stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- traj (
mol.alg.helix_kinks
– Algorithms to calculate Helix Kinks¶
-
CalculateHelixKink
(sele, proline=False)¶ This function calculates the bend,wobble and face-shift angles in an alpha-helix of an EntityView. The determination is more stable if there are at least 4 residues on each side (8 is even better) of the prolin around which the helix is kinked. The selection should contain all residues in the correct order and with no gaps and no missing C-alphas.
Parameters: - sele (
EntityView
) – A selection containing the alpha helix to be analyzed - proline=False – A selection containing only the proline (or another residue) around which the helix is kinked. If False, the proline will be serached for automatically
Returns: A tuple (bend_angle, face_shift, wobble_angle).
Return type: (float, float, float)
- sele (
-
AnalyzeHelixKink
(t, sele, proline=False)¶ This function calculates the bend,wobble and face-shift angles in an alpha-helix over a trajectory. The determination is more stable if there are at least 4 residues on each side (8 is even better) of the proline around which the helix is kinked. The selection should contain all residues in the correct order and with no gaps and no missing C-alphas.
Parameters: - t (~ost.mol.CoordGroup) – The trajectory to be analyzed
- sele (
EntityView
) – A selection containing the alpha helix to be analyzed - proline=False – A selection containing only the proline (or another residue) around which the helix is kinked. If False, the proline will be serached for automatically
Returns: A tuple (bend_angle, face_shift, wobble_angle).
Return type: (FloatList, FLoatList, FloatList)
mol.alg.trajectory_analysis
– DRMSD, pairwise distances and more¶
This Module requires numpy
This module contains functions to analyze trajectories, mainly similiraty measures baed on RMSDS and pairwise distances.
Author: Niklaus Johner (niklaus.johner@unibas.ch)
-
AverageDistanceMatrixFromTraj
(t, sele, first=0, last=-1)¶ This function calcultes the distance between each pair of atoms in sele, averaged over the trajectory t.
Parameters: - t (
CoordGroupHandle
) – the trajectory - sele (
EntityView
) – the selection used to determine the atom pairs - first (
int
) – the first frame of t to be used - last (
int
) – the last frame of t to be used
Returns: a numpy NpairsxNpairs matrix, where Npairs is the number of atom pairs in sele.
- t (
-
DistRMSDFromTraj
(t, sele, ref_sele, radius=7.0, average=False, seq_sep=4, first=0, last=-1)¶ This function calculates the distance RMSD from a trajectory. The distances selected for the calculation are all the distances between pair of atoms from residues that are at least seq_sep apart in the sequence and that are smaller than radius in ref_sel. The number and order of atoms in ref_sele and sele should be the same.
Parameters: - t (
CoordGroupHandle
) – the trajectory - sele (
EntityView
) – the selection used to calculate the distance RMSD - ref_sele (
EntityView
) – the reference selection used to determine the atom pairs and reference distances - radius (
float
) – the upper limit of distances in ref_sele considered for the calculation - seq_sep (
int
) – the minimal sequence separation between atom pairs considered for the calculation - average (
bool
) – use the average distance in the trajectory as reference instead of the distance obtained from ref_sele - first (
int
) – the first frame of t to be used - last (
int
) – the last frame of t to be used
Returns: a numpy vecor dist_rmsd(Nframes).
- t (
-
DistanceMatrixFromPairwiseDistances
(distances, p=2)¶ This function calculates an distance matrix M(NframesxNframes) from the pairwise distances matrix D(NpairsxNframes), where Nframes is the number of frames in the trajectory and Npairs the number of atom pairs. M[i,j] is the distance between frame i and frame j calculated as a p-norm of the differences in distances from the two frames (distance-RMSD for p=2).
Parameters: - distances – a pairwise distance matrix as obtained from
PairwiseDistancesFromTraj()
- p – exponent used for the p-norm.
Returns: a numpy NframesxNframes matrix, where Nframes is the number of frames.
- distances – a pairwise distance matrix as obtained from
-
PairwiseDistancesFromTraj
(t, sele, first=0, last=-1, seq_sep=1)¶ This function calculates the distances between any pair of atoms in sele with sequence separation larger than seq_sep from a trajectory t. It return a matrix containing one line for each atom pair and Nframes columns, where Nframes is the number of frames in the trajectory.
Parameters: - t (
CoordGroupHandle
) – the trajectory - sele (
EntityView
) – the selection used to determine the atom pairs - first (
int
) – the first frame of t to be used - last (
int
) – the last frame of t to be used - seq_sep (
int
) – The minimal sequence separation between atom pairs
Returns: a numpy NpairsxNframes matrix.
- t (
-
RMSD_Matrix_From_Traj
(t, sele, first=0, last=-1, align=True, align_sele=None)¶ This function calculates a matrix M such that M[i,j] is the RMSD (calculated on sele) between frames i and j of the trajectory t aligned on sele.
Parameters: - t (
CoordGroupHandle
) – the trajectory - sele (
EntityView
) – the selection used for alignment and RMSD calculation - first (
int
) – the first frame of t to be used - last (
int
) – the last frame of t to be used
Returns: Returns a numpy NframesxNframes matrix, where Nframes is the number of frames.
- t (
mol.alg.structure_analysis
– Functions to analyze structures¶
Some functions for analyzing structures
Author: Niklaus Johner (Niklaus.Johner@unibas.ch)
-
CalculateBestFitLine
(sele1)¶ This function calculates the best fit line to the atoms in sele1.
Parameters: sele1 ( EntityView
) –Returns: Line3
-
CalculateBestFitPlane
(sele1)¶ This function calculates the best fit plane to the atoms in sele1.
Parameters: sele1 ( EntityView
) –Returns: Plane
-
CalculateDistanceDifferenceMatrix
(sele1, sele2)¶ This function calculates the pairwise distance differences between two selections (
EntityView
). The two selections should have the same number of atoms It returns an NxN DistanceDifferenceMatrix M (where N is the number of atoms in sele1) where M[i,j]=||(sele2.atoms[i].pos-sele2.atoms[j].pos)||-||(sele1.atoms[i].pos-sele1.atoms[j].pos)||Parameters: - sele1 (
EntityView
) – - sele2 (
EntityView
) –
Returns: NxN numpy matrix
- sele1 (
-
CalculateHelixAxis
(sele1)¶ This function calculates the best fit cylinder to the CA atoms in sele1, and returns its axis. Residues should be ordered correctly in sele1.
Parameters: sele1 ( EntityView
) –Returns: Line3
-
GetAlphaHelixContent
(sele1)¶ This function calculates the content of alpha helix in a view. All residues in the view have to ordered and adjacent (no gaps allowed)
Parameters: sele1 ( EntityView
) –Returns: float
-
GetDistanceBetwCenterOfMass
(sele1, sele2)¶ This function calculates the distance between the centers of mass of sele1 and sele2, two selections from the same Entity.
Parameters: - sele1 (
EntityView
) – - sele2 (
EntityView
) –
Returns: float
- sele1 (
-
GetFrameFromEntity
(eh)¶ This function returns a CoordFrame from an EntityHandle
Parameters: eh ( EntityHandle
) –Returns: ost.mol.CoordFrame
-
GetMinDistBetwCenterOfMassAndView
(sele1, sele2)¶ This function calculates the minimal distance between sele2 and the center of mass of sele1, two selections from the same Entity.
Parameters: - sele1 (
EntityView
) – The selection from which the center of mass is taken - sele2 (
EntityView
) –
Returns: distance (
float
)- sele1 (
-
GetMinDistanceBetweenViews
(sele1, sele2)¶ This function calculates the minimal distance between sele1 and sele2, two selections from the same Entity.
Parameters: - sele1 (
EntityView
) – - sele2 (
EntityView
) –
Returns: float
- sele1 (