mol.alg – Algorithms for Structures¶
- LocalDistDiffTest(model, reference, tolerance, radius, local_ldt_property_string="")¶
This function calculates the agreement of local contacts between a model and a reference structure (Local Distance Difference Tests). The overlap is a number between zero and one, where one indicates complete agreement, zero indicates no agreement at all. This score is similar to the GDT, but does not require any superposition between the model and the reference.
The distance of atom pairs in the reference structure, when shorter than a certain predefined distance (inclusion radius), is compared with the same distance in the model. If the difference between these two distances is smaller than a threshold value (tolerance), the distance is considered conserverd in the model. Missing atoms in the model lead to non-conserved distances and thus lower the score.
The function only processes standard residues in the first chains of the model and of the reference For residues with symmetric sidechains (GLU, ASP, ARG, VAL, PHE, TYR), the naming of the atoms is ambigous. For these residues, the function computes the Local Distance Difference Test score that each naming convention would generate when considering all non-ambigous surrounding atoms. The solution that gives higher score is then picked to compute the final Local Difference Distance Score for the whole model.
If a string is passed as last parameter to the function, the function computes the Local Difference Distance Test score for each residue and saves it as a float property in the ResidueHandle, with the passed string as property name. Additionally, the actual residue-based counts of the total checked distances and of the distances conserved in the model are stored as integer properties in the ResidueHandle. The property names are respectively <passed string>_total and <passed string>_conserved.
Parameters: - model (EntityView) – the model structure
- reference (EntityView) – the reference structure
- tolerance – the tolerance threshold used to determine distance conservation
- radius – the inclusion radius in Angstroms
- local_ldt_property_string – the base name for the ResidueHandle properties that store the local scores
Returns: the Local Distance Difference Test score
- LocalDistDiffTest(model, distance_list, tolerance_list, sequence_separation=0, local_ldt_property_string="")
This function counts the conserved local contacts between the model and the reference structure (these are the values needed to compute the Local Distance Difference Test score, see description of the previous function). It shares the same properties as the previous function, with some differences: the thresholds can be more than one (the return counts are then the average over all thresholds), and the input is not the reference structure, but already a list of distances to be checked for conservation
A sequence separation parameter can be passed to the function. If this happens, only distances between residues whose separation is higher than the provided parameter are considered when computing the score
If a string is passed as the last parameter, residue-based counts and the value of the residue-based Local Distance Difference Test score are saved in each ResidueHandle as int and float properties, as detailed in the description of the previous function.
Parameters: - model (EntityView) – the model structure
- distance_list (GlobalRDMap) – the list of distances to check for conservation
- tolerance_list – a list of thresholds used to determine distance conservation
- sequence_separation – sequence separation parameter used when computing the score
- local_ldt_property_string – the base name for the ResidueHandle properties that store the local scores
Returns: a tuple containing the counts of the conserved distances in the model and of all the checked distances
- LocalDistDiffTest(alignment, tolerance, radius, ref_index=0, mdl_index=1);
Calculates the Local Distance Difference Test score (see previous functions) starting from an alignment between a reference structure and a model. The AlignmentHandle parameter used to provide the alignment to the function needs to have the two structures attached to it. By default the first structure in the alignment is considered to be the reference structure, and the second structure is taken as the model. This can however be changed by passing the indexes of the two structures in the AlignmentHandle as parameters to the function.
BEWARE: This function uses the old implementation of the Local Distance Difference Test algorithm and will give slightly different results from the new one.
Parameters: - alignment (AlignmentHandle) – an alignment containing the sequences of the reference and of the model structures, with the structures themselves attached
- tolerance – a list of thresholds used to determine distance conservation
- radius – the inclusion radius in Angstroms (to determine which distances are checked for conservation)
- ref_index – index of the reference structure in the alignment
- mdl_index – index of the model in the alignment
Returns: the Local Distance Difference Test score
- LDDTHA(model, distance_list, sequence_separation=0);
This function calculates the Local Distance Difference Test - High Accuracy score (see previous functions). The High Accuracy name comes from the fact that the tolerance levels used by this function are the same as the thresholds used by GDT-HA (0.5, 1, 2, and 4 Angstrom).
The function only compares the input distance list to the first chain of the model structure
The local Local Distance Difference Test score values are stored in the ResidueHandles of the model passed to the function in a float property called “locallddt”
A sequence separation parameter can be passed to the function. If this happens, only distances between residues whose separation is higher than the provided parameter are considered when computing the score
Parameters: - model (EntityView) – the model structure
- distance_list (GlobalRDMap) – the list of distances to check for conservation
- sequence_separation – sequence separation parameter used when computing the score
Returns: the Local Distance Difference Test score
- CreateDistanceList(reference, radius);
- CreateDistanceListFromMultipleReferences(reference_list, tolerance_list, sequence_separation, radius);
Both these functions create lists of distances to be checked during a Local Distance Difference Test (see description of the functions above).
Both functions process only standard residues present in the first chain of the reference structures.
The only difference between the two functions is that one takes a single reference structure and the other a list of reference structures. The structures in the list have to be properly prepared before being passed to the function. Corresponding residues in the structures must have the same residue number, the same chain name, etc. Gaps are allowed and automatically dealt with: if information about a distance is present in at least one of the structures, it will be considered.
If a distance between two atoms is shorter than the inclusion radius in all structures in which the two atoms are present, it is included in the list. However, if the distance is longer than the inclusion radius in at least one of the structures, it is not considered to be a local interaction and is excluded from the list.
The multiple-reference function takes care of residues with ambigous symmetric sidechains. To decide which naming convention to use, the function computes a Local Distance Difference Test score for each reference against the first reference structure in the list, using only non ambigously-named atoms. It picks then the naming convention that gives the highest score, guaranteeing that all references are processed with the correct atom names.
The cutoff list that will later be used to compute the Local Distance Difference Test score and the sequence separation parameter must be passed to the multi-reference function. These parameters do not influence the output distance list, which always includes all distances within the provided radius (to make it consistent with the single-reference corresponding function). However, the parameters are used when dealing with the naming convention of residues with ambiguous nomenclature.
Parameters: - reference (EntityView) – a reference structure from which distances are derived
- reference_list (list of EntityView) – a list of of reference structures from which distances are derived
- tolerance_list – a list of thresholds used to determine distance conservation when computing the LDDT score
- sequence_separation – sequence separation parameter used when computing the LDDT score
- radius – inclusion radius (in Angstroms) used to determine the distances included in the list
Returns: class ~ost.mol.alg.GlobalRDMap
- class UniqueAtomIdentifier¶
Object containing enough information to uniquely identify an atom in a structure
- UniqueAtomIdentifier(chain, residue_number, residue_name, atom_name)¶
Creates an UniqueAtomIdentifier object starting from relevant atom information
Parameters: - chain – a string containing the name of the chain to which the atom belongs
- residue_number (ResNum) – the number of the residue to which the atom belongs
- residue_name – a string containing the name of the residue to which the atom belongs
- atom_name – a string containing the name of the atom
- GetChainName()¶
Returns the name of the chain to which the atom belongs, as a String
- GetResNum()¶
Returns the number of the residue the atom belongs to, as a ResNum object
- GetResidueName()¶
Returns the name of the residue to which the atom belongs, as a String
- GetAtomName()¶
Returns the name of the atom, as a String
- class ResidueRDMap¶
Dictionary-like object containing the a list of distances that originate from the a single residue residue, to check during a run of the Local Distance Difference Test algorithm
- class GlobalRDMap¶
Dictionary-like object containing all the ResidueRDMap objects related to residues of a single structure
Steric Clashes¶
The following function detects steric clashes in atomic structures. Two atoms are clashing if their euclidian distance is smaller than a threshold value (minus a tolerance offset).
- FilterClashes(entity, clashing_distances, always_remove_bb=False)¶
This function filters out residues with non-bonded clashing atoms. If the clashing atom is a backbone atom, the complete residue is removed from the structure, if the atom is part of the sidechain, only the sidechain atoms are removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True the whole residue is removed even if a clash is just detected in the side-chain.
The function performs the filtering directly on the the entity which is passed as an argument. The entity gets altered by the function.
Two atoms are defined as clashing if their distance is shorter than the reference distance minus a tolerance threshold. The information about the clashing distances and the tolerance thresholds for all possible pairs of atoms is passed to the function as a parameter
Hydrogen and deuterium atoms are ignored by this function.
Parameters: - entity (EntityView or EntityHandle) – The input entity
- clashing_distances (ClashingDistances) – information about the clashing distances
- always_remove_bb – if set to True, the whole residue is removed even if the clash happens in the side-chain
Returns: The filtered EntityView
- CheckStereoChemistry(entity, bond_stats, angle_stats, bond_tolerance, angle_tolerance, always_remove_bb=False)¶
This function filters out residues with severe stereo-chemical violations. If the violation involves a backbone atom, the complete residue is removed from the structure, if it involves an atom that is part of the sidechain, only the sidechain is removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True the whole residue is removed even if a violation is just detected in the side-chain
The function performs the filtering directly on the the entity which is passed as an argument. The entity gets altered by the function.
A violation is defined as a bond length that lies outside of the range: [mean_length-std_dev*bond_tolerance <-> meanlength+std_dev*bond_tolerance] or an angle width lying outside of the range [mean_width-std_dev*angle_tolerance <-> mean_width+std_dev*angle_tolerance ]. The information about the mean lengths and widths and the corresponding standard deviations is passed to the function using two parameters.
Hydrogen and deuterium atoms are ignored by this function.
Parameters: - entity (EntityView or EntityHandle) – The input entity
- bond_stats (StereoChemicalParams) – statistics about bond lengths
- angle_stats (StereoChemicalParams) – statistics about angle widths
- bond_tolerance – tolerance for bond lengths (in standard deviations)
- angle_tolerance – tolerance for angle widths (in standard deviations)£
- always_remove_bb – if set to True, the whole residue is removed even if a violation in just detected in the side-chain
Returns: The filtered EntityView
- class ClashingDistances¶
Object containing information about clashing distances between non-bonded atoms
- ClashingDistances()¶
Creates an empty distance list
- SetClashingDistance(ele1, ele2, clash_distance, tolerance)¶
Adds or replaces an entry in the list
Parameters: - ele1 – string containing the first element’s name
- ele2 – string containing the second element’s name
- clash_distance – minimum clashing distance (in Angstroms)
- tolerance – tolerance threshold (in Angstroms)
- GetClashingDistance()¶
- Recovers a reference distance and a tolerance threshold from the list
Parameters: - ele1 – string containing the first element’s name
- ele2 – string containing the second element’s name
Returns: a tuple containing the minimum clashing distance and the tolerance threshold
- GetAdjustedClashingDistance()¶
- Recovers a reference distance from the list, already adjusted by the tolerance threshold
Parameters: - ele1 – string containing the first element’s name
- ele2 – string containing the second element’s name
Returns: the adjusted minimum clashing distance
- GetMaxAdjustedDistance()¶
Returns the longest clashing distance in the list, after adjustment with tolerance threshold
- IsEmpty()¶
Returns True if the list is empty (i.e. in an invalid, useless state)
- PrintAllDistances()¶
Prints all distances in the list to standard output
- class StereoChemicalParams¶
Object containing stereo-chemical information about bonds and angles. For each item (bond or angle in a specific residue), stores the mean and standard deviation
- StereoChemicalParams()¶
Creates an empty parameter list
- SetParam(item, residue, mean, standard_dev)¶
Adds or replaces an entry in the list
Parameters: - item – string defining a bond (format: X-Y) or an angle (format:X-Y-Z), where X,Y an Z are atom names
- residue – string containing the residue type the information pertains to
- mean – mean bond length or angle width
- standard_dev – standard deviation of the bond length or of the angle width
- IsEmpty()¶
Returns True if the list is empty (i.e. in an invalid, useless state)
- PrintAllParameters()¶
Prints all distances in the list to standard output
- FillClashingDistances(file_content)¶
- FillBondStereoChemicalParams(file_content)¶
- FillAngleStereoChemicalParams(file_content)¶
These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from a the content of parameter file. The content of the file is passed to the function as a list of strings, each containing a line from the parameter file
Returns: ClashingDistances and StereoChemicalParams respectively
- FillClashingDistancesFromFile(filename)¶
- FillBondStereoChemicalParamsFromFile(filename)¶
- FillAngleStereoChemicalParamsFromFile(filename)¶
These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from a file. The filename passed to the function can be a full path.
Returns: ClashingDistances and StereoChemicalParams respectively
- DefaultClashingDistances()¶
- DefaultBondStereoChemicalParams()¶
- DefaultAngleStereoChemicalParams()¶
These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, using the default parameter file distributed with OpenStructure.
Returns: ClashingDistances and StereoChemicalParams respectively
- ResidueNamesMatch(probe, reference)¶
The function requires a reference structure and a probe structure. The function checks that all the residues in the reference structure that appear in the probe structure (i.e., that have the same ResNum) are of the same residue type. Chains are comapred by order, not by chain name (i.e.: the first chain of the reference will be compared with the first chain of the probe structure, etc.)
Parameters: - probe (EntityView) – the structure to test
- reference (EntityView) – the reference structure
Returns: true if the residue names are the same, false otherwise
Trajectory Analysis¶
This is a set of functions used for basic trajectory analysis such as extracting positions, distances, angles and RMSDs. The organization is such that most functions have their counterpart at the individual frame level so that they can also be called on one frame instead of the whole trajectory.
All these functions have a “stride” argument that defaults to stride=1, which is used to skip frames in the analysis.
- AnalyzeAtomPos(traj, atom1, stride=1)¶
This function extracts the position of an atom from a trajectory. It returns a vector containing the position of the atom for each analyzed frame.
Parameters: - traj (CoordGroupHandle) – The trajectory to be analyzed.
- atom1 – The AtomHandle.
- stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeCenterOfMassPos(traj, sele, stride=1)¶
This function extracts the position of the center-of-mass of a selection (EntityView) from a trajectory and returns it as a vector.
Parameters: - traj (CoordGroupHandle) – The trajectory to be analyzed.
- sele (EntityView.) – The selection from which the center of mass is computed
- stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeDistanceBetwAtoms(traj, atom1, atom2, stride=1)¶
This function extracts the distance between two atoms from a trajectory and returns it as a vector.
Parameters: - traj (CoordGroupHandle) – The trajectory to be analyzed.
- atom1 – The first AtomHandle.
- atom2 – The second AtomHandle.
- stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeAngle(traj, atom1, atom2, atom3, stride=1)¶
This function extracts the angle between three atoms from a trajectory and returns it as a vector. The second atom is taken as being the central atom, so that the angle is between the vectors (atom1.pos-atom2.pos) and (atom3.pos-atom2.pos).
Parameters: - traj (CoordGroupHandle) – The trajectory to be analyzed.
- atom1 – The first AtomHandle.
- atom2 – The second AtomHandle.
- atom3 – The third AtomHandle.
- stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeDihedralAngle(traj, atom1, atom2, atom3, atom4, stride=1)¶
This function extracts the dihedral angle between four atoms from a trajectory and returns it as a vector. The angle is between the planes containing the first three and the last three atoms.
Parameters: - traj (CoordGroupHandle) – The trajectory to be analyzed.
- atom1 – The first AtomHandle.
- atom2 – The second AtomHandle.
- atom3 – The third AtomHandle.
- atom4 – The fourth AtomHandle.
- stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeDistanceBetwCenterOfMass(traj, sele1, sele2, stride=1)¶
This function extracts the distance between the center-of-mass of two selections (EntityView) from a trajectory and returns it as a vector.
Parameters: - traj (CoordGroupHandle) – The trajectory to be analyzed.
- sele1 (EntityView.) – The selection from which the first center of mass is computed
- sele2 (EntityView.) – The selection from which the second center of mass is computed
- stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeRMSD(traj, reference_view, sele_view, stride=1)¶
This function extracts the rmsd between two EntityView and returns it as a vector. The views don’t have to be from the same entity. The reference positions are taken directly from the reference_view, evaluated only once. The positions from the sele_view are evaluated for each frame. If you want to compare to frame i of the trajectory t, first use t.CopyFrame(i) for example:
eh=io.LoadPDB(...) t=io.LoadCHARMMTraj(eh,...) sele=eh.Select(...) t.CopyFrame(0) mol.alg.AnalyzeRMSD(t,sele,sele)
Parameters: - traj (CoordGroupHandle) – The trajectory to be analyzed.
- reference_view (EntityView.) – The selection used as reference structure
- sele_view (EntityView.) – The selection compared to the reference_view
- stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeMinDistance(traj, view1, view2, stride=1)¶
This function extracts the minimal distance between two sets of atoms (view1 and view2) for each frame in a trajectory and returns it as a vector.
Parameters: - traj (CoordGroupHandle) – The trajectory to be analyzed.
- view1 (EntityView.) – The first group of atoms
- view2 (EntityView.) – The second group of atoms
- stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeMinDistanceBetwCenterOfMassAndView(traj, view_cm, view_atoms, stride=1)¶
This function extracts the minimal distance between a set of atoms (view_atoms) and the center of mass of a second set of atoms (view_cm) for each frame in a trajectory and returns it as a vector.
Parameters: - traj (CoordGroupHandle) – The trajectory to be analyzed.
- view_cm (EntityView.) – The group of atoms from which the center of mass is taken
- view_atoms (EntityView.) – The second group of atoms
- stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeAromaticRingInteraction(traj, view_ring1, view_ring2, stride=1)¶
This function is a crude analysis of aromatic ring interactions. For each frame in a trajectory, it calculates the minimal distance between the atoms in one view and the center of mass of the other and vice versa, and returns the minimum between these two minimal distances. Concretely, if the two views are the heavy atoms of two rings, then it returns the minimal center of mass - heavy atom distance betweent he two rings
Parameters: - traj (CoordGroupHandle) – The trajectory to be analyzed.
- view_ring1 (EntityView.) – First group of atoms
- view_ring2 (EntityView.) – Second group of atoms
- stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- SuperposeFrames(frames, sel, from=0, to=-1, ref=-1)¶
This function superposes the frames of the given coord group and returns them as a new coord group.
Parameters: - frames (CoordGroupHandle) – The source coord group.
- sel (ost.mol.EntityView) – An entity view containing the selection of atoms to be used for superposition. If set to an invalid view, all atoms in the coord group are used.
- from – index of the first frame
- to – index of the last frame plus one. If set to -1, the value is set to the number of frames in the coord group
- ref – The index of the reference frame to use for superposition. If set to -1, the each frame is superposed to the previous frame.
Returns: A newly created coord group containing the superposed frames.
- SuperposeFrames(frames, sel, ref_view, from=0, to=-1)
Same as SuperposeFrames above, but the superposition is done on a reference view and not on another frame of the trajectory.
Parameters: - frames (CoordGroupHandle) – The source coord group.
- sel (ost.mol.EntityView) – An entity view containing the selection of atoms of the frames to be used for superposition.
- ref_view (ost.mol.EntityView) – The reference view on which the frames will be superposed. The number of atoms in this reference view should be equal to the number of atoms in sel.
- from – index of the first frame
- to – index of the last frame plus one. If set to -1, the value is set to the number of frames in the coord group
Returns: A newly created coord group containing the superposed frames.
- ParseAtomNames(atoms)¶
Parses different representations of a list of atom names and returns a set, understandable by MatchResidueByNum(). In essence, this function translates
- None to None
- ‘all’ to None
- ‘backbone’ to set(['N', 'CA', 'C', 'O'])
- ‘aname1, aname2’ to set(['aname1', 'aname2'])
- ['aname1', 'aname2'] to set(['aname1', 'aname2'])
Parameters: atoms (str, list, set) – Identifier or list of atoms Returns: A set of atoms.
- MatchResidueByNum(ent_a, ent_b, atoms='all')¶
Returns a tuple of views containing exactly the same number of atoms. Residues are matched by residue number. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in the order they occur in the entities. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.
Parameters: - ent_a (EntityView or EntityHandle) – The first entity
- ent_b (EntityView or EntityHandle) – The second entity
- atoms (str, list, set) – The subset of atoms to be included in the two views.
Returns: Two EntityView instances with the same amount of residues matched by number. Each residue will have the same number & type of atoms.
- MatchResidueByIdx(ent_a, ent_b, atoms='all')¶
Returns a tuple of views containing exactly the same number of atoms. Residues are matched by position in the chains of an entity. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count. The number of residues per chain is supposed to be the same.
Parameters: - ent_a (EntityView or EntityHandle) – The first entity
- ent_b (EntityView or EntityHandle) – The second entity
- atoms (str, list, set) – The subset of atoms to be included in the two views.
Returns: Two EntityView instances with the same amount of residues matched by position. Each residue will have the same number & type of atoms.
- MatchResidueByLocalAln(ent_a, ent_b, atoms='all')¶
Match residues by local alignment. Takes ent_a and ent_b, extracts the sequences chain-wise and aligns them in Smith/Waterman manner using the BLOSUM62 matrix for scoring. The residues of the entities are then matched based on this alignment. Only atoms present in both residues are included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.
Parameters: - ent_a (EntityView or EntityHandle) – The first entity
- ent_b (EntityView or EntityHandle) – The second entity
- atoms (str, list, set) – The subset of atoms to be included in the two views.
Returns: Two EntityView instances with the same number of residues. Each residue will have the same number & type of atoms.
- MatchResidueByGlobalAln(ent_a, ent_b, atoms='all')¶
Match residues by global alignment. Takes ent_a and ent_b, extracts the sequences chain-wise and aligns them in Needleman/Wunsch manner using the BLOSUM62 matrix for scoring. The residues of the entities are then matched based on this alignment. Only atoms present in both residues are included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.
Parameters: - ent_a (EntityView or EntityHandle) – The first entity
- ent_b (EntityView or EntityHandle) – The second entity
- atoms (str, list, set) – The subset of atoms to be included in the two views.
Returns: Two EntityView instances with the same number of residues. Each residue will have the same number & type of atoms.
- Superpose(ent_a, ent_b, match='number', atoms='all')¶
Superposes the model entity onto the reference. To do so, two views are created, returned with the result. atoms describes what goes in to these views and match the selection method. For superposition, SuperposeSVD() is called. For matching, following methods are recognised:
- number - select residues by residue number, includes atoms, calls MatchResidueByNum()
- index - select residues by index in chain, includes atoms, calls MatchResidueByIdx()
- local-aln - select residues from a Smith/Waterman alignment, includes atoms, calls MatchResidueByLocalAln()
- global-aln - select residues from a Needleman/Wunsch alignment, includes atoms, calls MatchResidueByGlobalAln()
Parameters: - ent_a (EntityView or EntityHandle) – The model entity
- ent_b (EntityView or EntityHandle) – The reference entity
- match (str) – Method to gather residues/ atoms
- atoms (str, list, set) – The subset of atoms to be used in the superposition.
Returns: An instance of SuperpositionResult, containing members
- rmsd - RMSD of the superposed entities
- view1 - First EntityView used
- view2 - Second EntityView used