alg – Algorithms for Structures

Submodules

Local Distance Test scores (lDDT, DRMSD)

Note

This is a new implementation of lDDT, introduced in OpenStructure 2.4 with focus on supporting quaternary structure and compounds beyond the 20 standard proteinogenic amino acids. The previous lDDT code that comes with Mariani et al. is considered deprecated.

Note

lddt.lDDTScorer provides the raw Python API to compute lDDT but stereochemistry checks as described in Mariani et al. must be done seperately. You may want to check out the compare-structures action (Comparing two structures) to compute lDDT with pre-processing and support for quaternary structures.

class lDDTScorer(target, compound_lib=None, custom_compounds=None, inclusion_radius=15, sequence_separation=0, symmetry_settings=None, seqres_mapping={}, bb_only=False)

lDDT scorer object for a specific target

Sets up everything to score models of that target. lDDT (local distance difference test) is defined as fraction of pairwise distances which exhibit a difference < threshold when considering target and model. In case of multiple thresholds, the average is returned. See

V. Mariani, M. Biasini, A. Barbato, T. Schwede, lDDT : A local superposition-free score for comparing protein structures and models using distance difference tests, Bioinformatics, 2013

Parameters:
  • target (ost.mol.EntityHandle/ost.mol.EntityView) – The target

  • compound_lib (ost.conop.CompoundLib) – Compound library from which a compound for each residue is extracted based on its name. Uses ost.conop.GetDefaultLib() if not given, raises if this returns no valid compound library. Atoms defined in the compound are searched in the residue and build the reference for scoring. If the residue has atoms with names [“A”, “B”, “C”] but the corresponding compound only has [“A”, “B”], “A” and “B” are considered for scoring. If the residue has atoms [“A”, “B”] but the compound has [“A”, “B”, “C”], “C” is considered missing and does not influence scoring, even if present in the model.

  • custom_compounds (dict with residue names (str) as key and CustomCompound as value.) – Custom compounds defining reference atoms. If given, custom_compounds take precedent over compound_lib.

  • inclusion_radius (float) – All pairwise distances < inclusion_radius are considered for scoring

  • sequence_separation (int) – Only pairwise distances between atoms of residues which are further apart than this threshold are considered. Residue distance is based on resnum. The default (0) considers all pairwise distances except intra-residue distances.

  • symmetry_settings (SymmetrySettings) – Define residues exhibiting internal symmetry, uses GetDefaultSymmetrySettings() if not given.

  • seqres_mapping (dict (key: str, value: ost.seq.AlignmentHandle)) – Mapping of model residues at the scoring stage happens with residue numbers defining their location in a reference sequence (SEQRES) using one based indexing. If the residue numbers in target don’t correspond to that SEQRES, you can specify the mapping manually. You can provide a dictionary to specify a reference sequence (SEQRES) for one or more chain(s). Key: chain name, value: alignment (seq1: SEQRES, seq2: sequence of residues in chain). Example: The residues in a chain with name “A” have sequence “YEAH” and residue numbers [42,43,44,45]. You can provide an alignment with seq1 “HELLYEAH” and seq2 “----YEAH”. “Y” gets assigned residue number 5, “E” gets assigned 6 and so on no matter what the original residue numbers were.

  • bb_only (bool) – Only consider atoms with name “CA” in case of amino acids and “C3’” for Nucleotides. this invalidates compound_lib. Raises if any residue in target is not r.chem_class.IsPeptideLinking() or r.chem_class.IsNucleotideLinking()

Raises:

RuntimeError if target contains compound which is not in compound_lib, RuntimeError if symmetry_settings specifies symmetric atoms that are not present in the according compound in compound_lib, RuntimeError if seqres_mapping is not provided and target contains residue numbers with insertion codes or the residue numbers for each chain are not monotonically increasing, RuntimeError if seqres_mapping is provided but an alignment is invalid (seq1 contains gaps, mismatch in seq1/seq2, seq2 does not match residues in corresponding chains).

GetNChainContacts(target_chain, no_interchain=False)

Returns number of contacts expected for a certain chain in target

Parameters:
  • target_chain (str) – Chain in target for which you want the number of expected contacts

  • no_interchain (bool) – Whether to exclude interchain contacts

Raises:

RuntimeError if specified chain doesnt exist

lDDT(model, thresholds=[0.5, 1.0, 2.0, 4.0], local_lddt_prop=None, local_contact_prop=None, chain_mapping=None, no_interchain=False, no_intrachain=False, penalize_extra_chains=False, residue_mapping=None, return_dist_test=False, check_resnames=True, add_mdl_contacts=False, interaction_data=None)

Computes lDDT of model - globally and per-residue

Parameters:
  • model (ost.mol.EntityHandle/ost.mol.EntityView) – Model to be scored - models are preferably scored upon performing stereo-chemistry checks in order to punish for non-sensical irregularities. This must be done separately as a pre-processing step. Target contacts that are not covered by model are considered not conserved, thus decreasing lDDT score. This also includes missing model chains or model chains for which no mapping is provided in chain_mapping.

  • thresholds (list of floats) – Thresholds of distance differences to be considered as correct - see docs in constructor for more info. default: [0.5, 1.0, 2.0, 4.0]

  • local_lddt_prop (str) – If set, per-residue scores will be assigned as generic float property of that name

  • local_contact_prop (str) – If set, number of expected contacts as well as number of conserved contacts will be assigned as generic int property. Excected contacts will be set as <local_contact_prop>_exp, conserved contacts as <local_contact_prop>_cons. Values are summed over all thresholds.

  • chain_mapping (dict with str as keys/values) – Mapping of model chains (key) onto target chains (value). This is required if target or model have more than one chain.

  • no_interchain (bool) – Whether to exclude interchain contacts

  • no_intrachain (bool) – Whether to exclude intrachain contacts (i.e. only consider interface related contacts)

  • penalize_extra_chains (bool) – Whether to include a fixed penalty for additional chains in the model that are not mapped to the target. ONLY AFFECTS RETURNED GLOBAL SCORE. In detail: adds the number of intra-chain contacts of each extra chain to the expected contacts, thus adding a penalty.

  • residue_mapping (dict with key: str, value: ost.seq.AlignmentHandle) – By default, residue mapping is based on residue numbers. That means, a model chain and the respective target chain map to the same underlying reference sequence (SEQRES). Alternatively, you can specify one or several alignment(s) between model and target chains by providing a dictionary. key: Name of chain in model (respective target chain is extracted from chain_mapping), value: Alignment with first sequence corresponding to target chain and second sequence to model chain. There is NO reference sequence involved, so the two sequences MUST exactly match the actual residues observed in the respective target/model chains (ATOMSEQ).

  • return_dist_test – Whether to additionally return the underlying per-residue data for the distance difference test. Adds five objects to the return tuple. First: Number of total contacts summed over all thresholds Second: Number of conserved contacts summed over all thresholds Third: list with length of scored residues. Contains indices referring to model.residues. Fourth: numpy array of size len(scored_residues) containing the number of total contacts, Fifth: numpy matrix of shape (len(scored_residues), len(thresholds)) specifying how many for each threshold are conserved.

  • check_resnames (bool) – On by default. Enforces residue name matches between mapped model and target residues.

  • add_mdl_contacts (bool) – Adds model contacts - Only using contacts that are within a certain distance threshold in the target does not penalize for added model contacts. If set to True, this flag will also consider target contacts that are within the specified distance threshold in the model but not necessarily in the target. No contact will be added if the respective atom pair is not resolved in the target.

  • interaction_data (tuple) – Pro param - don’t use

Returns:

global and per-residue lDDT scores as a tuple - first element is global lDDT score (None if target has no contacts) and second element a list of per-residue scores with length len(model.residues). None is assigned to residues that are not covered by target. If a residue is covered but has no contacts in target, 0.0 is assigned.

class SymmetrySettings

Container for symmetric compounds

lDDT considers symmetries and selects the one resulting in the highest possible score.

A symmetry is defined as a renaming operation on one or more atoms that leads to a chemically equivalent residue. Example would be OD1 and OD2 in ASP => renaming OD1 to OD2 and vice versa gives a chemically equivalent residue.

Use AddSymmetricCompound() to define a symmetry which can then directly be accessed through the symmetric_compounds member.

AddSymmetricCompound(name, symmetric_atoms)

Adds symmetry for compound with name

Parameters:
  • name (str) – Name of compound with symmetry

  • symmetric_atoms (list of tuple) – Pairs of atom names that define renaming operation, i.e. after applying all switches defined in the tuples, the resulting residue should be chemically equivalent. Atom names must refer to the PDB component dictionary.

GetDefaultSymmetrySettings()

Constructs and returns SymmetrySettings object for natural amino acids

class CustomCompound(atom_names)

Defines atoms for custom compounds

lDDT requires the reference atoms of a compound which are typically extracted from a ost.conop.CompoundLib. This lightweight container allows to handle arbitrary compounds which are not necessarily in the compound library.

Parameters:

atom_names (list of str) – Names of atoms of custom compound

static FromResidue(res)

Construct custom compound from residue

Parameters:

res (ost.mol.ResidueView/ost.mol.ResidueHandle) – Residue from which reference atom names are extracted, hydrogen/deuterium atoms are filtered out

Returns:

CustomCompound

class lDDTSettings(radius=15, sequence_separation=0, cutoffs=(0.5, 1.0, 2.0, 4.0), label='locallddt')

Object containing the settings used for lDDT calculations.

Parameters:
radius

Distance inclusion radius.

Type:

float

sequence_separation

Sequence separation.

Type:

int

cutoffs

List of thresholds used to determine distance conservation.

Type:

list of float

label

The base name for the ResidueHandle properties that store the local scores.

Type:

str

PrintParameters()

Print settings.

ToString()
Returns:

String representation of the lDDTSettings object.

Return type:

str

GDT - Global Distance Test

Implements the GDT score, i.e. identifies the largest number of positions that can be superposed within a given distance threshold. The final GDT score is then the returned number divided by the total number of reference positioons. The algorithm is similar to what is described for the LGA tool but simpler. Therefore, the fractions reported by OpenStructure tend to be systematically lower. For benchmarking we computed the full GDT_TS, i.e. average GDT for distance thresholds [1, 2, 4, 8], on all CASP15 TS models. 96.5% of differences to the LGA results from the predictioncenter are within 2 GDT points and 99.2% are within 3 GDT points. The max difference is 7.39 GDT points.

The algorithm expects two position lists of same length and applies a sliding window with specified length to define a subset of position pairs as starting point for iterative superposition. Each iterative superposition applies the following steps:

  • Compute minimal RMSD superposition on subset of position pairs

  • Apply superposition on all model positions

  • Compute pairwise distances of all model positions and reference positions

  • Define new subset of position pairs: pairs within distance threshold

  • Stop if subset doesn’t change anymore

The subset in any of the iterations which is largest is stored.

This is done for each sliding window position and the largest subset ever observed is reported. To avoid long runtimes for large problem sizes, the sliding window is not applied on each possible position but is capped. If the number of positions is larger than this threshold, the sliding window is only applied on N equidistant locations.

GDT(mdl_pos, ref_pos, window_size, max_windows, distance_thresh)

Returns number of positions that can be superposed within distance_thresh and the respective transformation matrix.

Parameters:
  • mdl_pos (ost.geom.Vec3List) – Positions representing the model, typically alpha-carbon positions

  • ref_pos (ost.geom.Vec3List) – Positions representing the reference, typically alpha-carbon positions

  • window_size (int) – Size of the sliding window that is used to serve as starting point for iterative superposition. The described benchmark was done with a value of 7.

  • max_windows (int) – Cap for number of starting points. The described benchmark was done with a value of 1000.

  • distance_thresh (float) – Distance threshold for GDT algorithm

Returns:

tuple with first element being the number of superposable positions (int) and the second element the transformation matrix (ost.geom.Mat4)

Steric Clashes

The following function detects steric clashes in atomic structures. Two atoms are clashing if their euclidian distance is smaller than a threshold value (minus a tolerance offset).

FilterClashes(entity, clashing_distances, always_remove_bb=False)

This function filters out residues with non-bonded clashing atoms. If the clashing atom is a backbone atom, the complete residue is removed from the structure, if the atom is part of the sidechain, only the sidechain atoms are removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True the whole residue is removed even if a clash is just detected in the side-chain.

The function returns a view containing all elements (residues, atoms) that have not been removed from the input structure, plus a ClashingInfo object containing information about the detected clashes.

Two atoms are defined as clashing if their distance is shorter than the reference distance minus a tolerance threshold. The information about the clashing distances and the tolerance thresholds for all possible pairs of atoms is passed to the function as a parameter.

Hydrogen and deuterium atoms are ignored by this function.

Parameters:
  • entity (EntityView or EntityHandle) – The input entity

  • clashing_distances (ClashingDistances) – information about the clashing distances

  • always_remove_bb (bool) – if set to True, the whole residue is removed even if the clash happens in the side-chain

Returns:

A tuple of two elements: The filtered EntityView, and a ClashingInfo object

CheckStereoChemistry(entity, bond_stats, angle_stats, bond_tolerance, angle_tolerance, always_remove_bb=False)

This function filters out residues with severe stereo-chemical violations. If the violation involves a backbone atom, the complete residue is removed from the structure, if it involves an atom that is part of the sidechain, only the sidechain is removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True the whole residue is removed even if a violation is just detected in the side-chain.

The function returns a view containing all elements (residues, atoms) that have not been removed from the input structure, plus a StereoChemistryInfo object containing information about the detected stereo-chemical violations.

A violation is defined as a bond length that lies outside of the range: [mean_length-std_dev*bond_tolerance, mean_length+std_dev*bond_tolerance] or an angle width outside of the range [mean_width-std_dev*angle_tolerance, mean_width+std_dev*angle_tolerance ]. The information about the mean lengths and widths and the corresponding standard deviations is passed to the function using two parameters.

Hydrogen and deuterium atoms are ignored by this function.

Parameters:
  • entity (EntityView or EntityHandle) – The input entity

  • bond_stats (StereoChemicalParams) – statistics about bond lengths

  • angle_stats (StereoChemicalParams) – statistics about angle widths

  • bond_tolerance (float) – tolerance for bond lengths (in standard deviations)

  • angle_tolerance (float) – tolerance for angle widths (in standard deviations)

  • always_remove_bb (bool) – if set to True, the whole residue is removed even if the clash happens in the side-chain

Returns:

A tuple of two elements: The filtered EntityView, and a StereoChemistryInfo object

class ClashingInfo

This object is returned by the FilterClashes() function, and contains information about the clashes detected by the function.

GetClashCount()
Returns:

number of clashes between non-bonded atoms detected in the input structure

GetAverageOffset()
Returns:

a value in Angstroms representing the average offset by which clashing atoms lie closer than the minimum acceptable distance (which of course differs for each possible pair of elements)

GetClashList()
Returns:

list of detected inter-atomic clashes

Return type:

list of ClashEvent

class ClashEvent

This object contains all the information relative to a single clash detected by the FilterClashes() function

GetFirstAtom()
GetSecondAtom()
Returns:

atoms which clash

Return type:

UniqueAtomIdentifier

GetModelDistance()
Returns:

distance (in Angstroms) between the two clashing atoms as observed in the model

GetAdjustedReferenceDistance()
Returns:

minimum acceptable distance (in Angstroms) between the two atoms involved in the clash, as defined in ClashingDistances

class StereoChemistryInfo

This object is returned by the CheckStereoChemistry() function, and contains information about bond lengths and planar angle widths in the structure that diverge from the parameters tabulated by Engh and Huber in the International Tables of Crystallography. Only elements that diverge from the tabulated value by a minimumnumber of standard deviations (defined when the CheckStereoChemistry function is called) are reported.

GetBadBondCount()
Returns:

number of bonds where a serious violation was detected

GetBondCount()
Returns:

total number of bonds in the structure checked by the CheckStereoChemistry function

GetAvgZscoreBonds()
Returns:

average z-score of all the bond lengths in the structure, computed using Engh and Huber’s mean and standard deviation values

GetBadAngleCount()
Returns:

number of planar angles where a serious violation was detected

GetAngleCount()
Returns:

total number of planar angles in the structure checked by the CheckStereoChemistry function

GetAvgZscoreAngles()
Returns:

average z-score of all the planar angle widths, computed using Engh and Huber’s mean and standard deviation values.

GetBondViolationList()
Returns:

list of bond length violations detected in the structure

Return type:

list of StereoChemicalBondViolation

GetAngleViolationList()
Returns:

list of angle width violations detected in the structure

Return type:

list of StereoChemicalAngleViolation

class StereoChemicalBondViolation

This object contains all the information relative to a single detected violation of stereo-chemical parameters in a bond length

GetFirstAtom()
GetSecondAtom()
Returns:

first / second atom of the bond

Return type:

UniqueAtomIdentifier

GetBondLength()
Returns:

length of the bond (in Angstroms) as observed in the model

GetAllowedRange()
Returns:

allowed range of bond lengths (in Angstroms), according to Engh and Huber’s tabulated parameters and the tolerance threshold used when the CheckStereoChemistry() function was called

Return type:

tuple (minimum and maximum)

class StereoChemicalAngleViolation

This object contains all the information relative to a single detected violation of stereo-chemical parameters in a planar angle width

GetFirstAtom()
GetSecondAtom()
GetThirdAtom()
Returns:

first / second (vertex) / third atom that defines the planar angle

Return type:

UniqueAtomIdentifier

GetAngleWidth()
Returns:

width of the planar angle (in degrees) as observed in the model

GetAllowedRange()
Returns:

allowed range of angle widths (in degrees), according to Engh and Huber’s tabulated parameters and the tolerance threshold used when the CheckStereoChemistry() function was called

Return type:

tuple (minimum and maximum)

class ClashingDistances

Object containing information about clashing distances between non-bonded atoms

ClashingDistances()

Creates an empty distance list

SetClashingDistance(ele1, ele2, clash_distance, tolerance)

Adds or replaces an entry in the list

Parameters:
  • ele1 – string containing the first element’s name

  • ele2 – string containing the second element’s name

  • clash_distance – minimum clashing distance (in Angstroms)

  • tolerance – tolerance threshold (in Angstroms)

GetClashingDistance(ele1, ele2)
Returns:

reference distance and a tolerance threshold (both in Angstroms) for two elements

Return type:

tuple (minimum clashing distance, tolerance threshold)

Parameters:
  • ele1 – string containing the first element’s name

  • ele2 – string containing the second element’s name

GetAdjustedClashingDistance(ele1, ele2)
Returns:

reference distance (in Angstroms) for two elements, already adjusted by the tolerance threshold

Parameters:
  • ele1 – string containing the first element’s name

  • ele2 – string containing the second element’s name

GetMaxAdjustedDistance()
Returns:

longest clashing distance (in Angstroms) in the list, after adjustment with tolerance threshold

IsEmpty()
Returns:

True if the list is empty (i.e. in an invalid, useless state)

PrintAllDistances()

Prints all distances in the list to standard output

class StereoChemicalParams

Object containing stereo-chemical information about bonds and angles. For each item (bond or angle in a specific residue), stores the mean and standard deviation

StereoChemicalParams()

Creates an empty parameter list

SetParam(item, residue, mean, standard_dev)

Adds or replaces an entry in the list

Parameters:
  • item – string defining a bond (format: X-Y) or an angle (format: X-Y-Z), where X,Y an Z are atom names

  • residue – string containing the residue type for this entry

  • mean – mean bond length (in Angstroms) or angle width (in degrees)

  • standard_dev – standard deviation of the bond length (in Angstroms) or of the angle width (in degrees)

IsEmpty()
Returns:

True if the list is empty (i.e. in an invalid, useless state)

PrintAllParameters()

Prints all entries in the list to standard output

FillClashingDistances(file_content)
FillBondStereoChemicalParams(file_content)
FillAngleStereoChemicalParams(file_content)

These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from the content of a parameter file.

Parameters:

file_content (list of str) – list of lines from the parameter file

Return type:

ClashingDistances or StereoChemicalParams

FillClashingDistancesFromFile(filename)
FillBondStereoChemicalParamsFromFile(filename)
FillAngleStereoChemicalParamsFromFile(filename)

These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from a file path.

Parameters:

filename (str) – path to parameter file

Return type:

ClashingDistances or StereoChemicalParams

DefaultClashingDistances()
DefaultBondStereoChemicalParams()
DefaultAngleStereoChemicalParams()

These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, using the default parameter files distributed with OpenStructure.

Return type:

ClashingDistances or StereoChemicalParams

ResidueNamesMatch(probe, reference)

The function requires a reference structure and a probe structure. The function checks that all the residues in the reference structure that appear in the probe structure (i.e., that have the same ResNum) are of the same residue type. Chains are compared by order, not by chain name (i.e.: the first chain of the reference will be compared with the first chain of the probe structure, etc.)

Parameters:
Returns:

True if the residue names are the same, False otherwise

Superposing structures

Superpose(ent_a, ent_b, match='number', atoms='all', iterative=False, max_iterations=5, distance_threshold=3.0)

Superposes the model entity onto the reference. To do so, two views are created, returned with the result. atoms describes what goes into these views and match the selection method. For superposition, SuperposeSVD() or IterativeSuperposeSVD() are called (depending on iterative). For matching, the following methods are recognised:

Parameters:
  • ent_a (EntityView or EntityHandle) – The model entity (superposition transform is applied on full entity handle here)

  • ent_b (EntityView or EntityHandle) – The reference entity

  • match (str) – Method to gather residues/ atoms

  • atoms (str, list, set) – The subset of atoms to be used in the superposition

  • iterative (bool) – Whether or not to use iterative superpositon.

  • max_iterations (int) – Max. number of iterations for IterativeSuperposeSVD() (only if iterative = True)

  • distance_threshold (float) – Distance threshold for IterativeSuperposeSVD() (only if iterative = True)

Returns:

An instance of SuperpositionResult.

ParseAtomNames(atoms)

Parses different representations of a list of atom names and returns a set, understandable by MatchResidueByNum(). In essence, this function translates

  • None to None

  • ‘all’ to None

  • ‘backbone’ to set(['N', 'CA', 'C', 'O'])

  • ‘aname1, aname2’ to set(['aname1', 'aname2'])

  • ['aname1', 'aname2'] to set(['aname1', 'aname2'])

Parameters:

atoms (str, list, set) – Identifier or list of atoms

Returns:

A set of atoms.

MatchResidueByNum(ent_a, ent_b, atoms='all')

Returns a tuple of views containing exactly the same number of atoms. Residues are matched by residue number. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in the order they occur in the entities. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.

Parameters:
Returns:

Two EntityView instances with the same amount of residues matched by number. Each residue will have the same number & type of atoms.

MatchResidueByIdx(ent_a, ent_b, atoms='all')

Returns a tuple of views containing exactly the same number of atoms. Residues are matched by position in the chains of an entity. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count. The number of residues per chain is supposed to be the same.

Parameters:
Returns:

Two EntityView instances with the same amount of residues matched by position. Each residue will have the same number & type of atoms.

MatchResidueByLocalAln(ent_a, ent_b, atoms='all')

Match residues by local alignment. Takes ent_a and ent_b, extracts the sequences chain-wise and aligns them in Smith/Waterman manner using the BLOSUM62 matrix for scoring. Only residues which are marked as peptide linking are considered for alignment. The residues of the entities are then matched based on this alignment. Only atoms present in both residues are included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.

Parameters:
Returns:

Two EntityView instances with the same number of residues. Each residue will have the same number & type of atoms.

MatchResidueByGlobalAln(ent_a, ent_b, atoms='all')

Match residues by global alignment. Same as MatchResidueByLocalAln() but performs a global Needleman/Wunsch alignment of the sequences using the BLOSUM62 matrix for scoring.

Parameters:
Returns:

Two EntityView instances with the same number of residues. Each residue will have the same number & type of atoms.

class SuperpositionResult
rmsd

RMSD of the superposed entities.

view1
view2

Two EntityView used in superposition (not set if methods with Vec3List used).

transformation

Transformation (Mat4) used to map view1 onto view2.

fraction_superposed
rmsd_superposed_atoms
ncycles

For iterative superposition (IterativeSuperposeSVD()): fraction and RMSD of atoms that were superposed with a distance below the given threshold and the number of iteration cycles performed.

SuperposeSVD(view1, view2, apply_transform=True)
SuperposeSVD(list1, list2)

Superposition of two sets of atoms minimizing RMSD using a classic SVD based algorithm.

Note that the atom positions in the view are taken blindly in the order in which the atoms appear.

Parameters:
  • view1 (EntityView) – View on the model entity

  • view2 (EntityView) – View on the reference entity

  • list1 (Vec3List) – List of atom positions for model entity

  • list2 (Vec3List) – List of atom positions for reference entity

  • apply_transform (bool) – If True, the superposition transform is applied to the (full!) entity handle linked to view1.

Returns:

An instance of SuperpositionResult.

IterativeSuperposeSVD(view1, view2, max_iterations=5, distance_threshold=3.0, apply_transform=True)
IterativeSuperposeSVD(list1, list2, max_iterations=5, distance_threshold=3.0)

Iterative superposition of two sets of atoms. In each iteration cycle, we keep a fraction of atoms with distances below distance_threshold and get the superposition considering only those atoms.

Note that the atom positions in the view are taken blindly in the order in which the atoms appear.

Parameters:
  • view1 (EntityView) – View on the model entity

  • view2 (EntityView) – View on the reference entity

  • list1 (Vec3List) – List of atom positions for model entity

  • list2 (Vec3List) – List of atom positions for reference entity

  • max_iterations (int) – Max. number of iterations to be performed

  • distance_threshold (float) – Distance threshold defining superposed atoms

  • apply_transform (bool) – If True, the superposition transform is applied to the (full!) entity handle linked to view1.

Returns:

An instance of SuperpositionResult.

Raises:

Exception if atom counts do not match or if less than 3 atoms.

CalculateRMSD(view1, view2, transformation=geom.Mat4())
Returns:

RMSD of atom positions (taken blindly in the order in which the atoms appear) in the two given views.

Return type:

float

Parameters:
  • view1 (EntityView) – View on the model entity

  • view2 (EntityView) – View on the reference entity

  • transformation (Mat4) – Optional transformation to apply on each atom position of view1.

Algorithms on Structures

Accessibility(ent, probe_radius=1.4, include_hydrogens=False, include_hetatm=False, include_water=False, oligo_mode=False, selection='', asa_abs='asaAbs', asa_rel='asaRel', asa_atom='asaAtom', algorithm=NACCESS)

Calculates the accesssible surface area for ever atom in ent. The algorithm mimics the behaviour of the bindings available for the NACCESS and DSSP tools and has been tested to reproduce the numbers accordingly.

Parameters:
  • ent (EntityView / EntityHandle) – Entity on which to calculate the surface

  • probe_radius (float) – Radius of probe to determine accessible surface area

  • include_hydrogens (bool) – Whether to include hydrogens in the solvent accessibility calculations. By default, every atom with ele=H,D is simply neglected.

  • include_hetatms (bool) – Whether to include atoms flagged as hetatms , i.e. ligands, in the solvent accessibility calculations. They are neglected by default.

  • include_water (bool) – Whether to include water in the solvent accessibility calculations. By default, every residue with name “HOH” is neglected.

  • oligo_mode (bool) – A typical used case of accessibility calculations is to determine the solvent accessibility of a full complex and then the accessibility of each chain individually. Lots of calculations can be cached because only the values of the atoms close to an interface change. This is exactly what happens when you activate the oligo mode. It returns exactly the same value but adds, additionally to the values estimated in full complex, the values from each individual chain as float properties on every residue and atom. Example for atom accessible surface if the according property name is set to “asaAtom”: Accessibility in the full complex is stored as “asaAtom”, the accessibility when only considering that particular chain is stored as “asaAtom_single_chain”. The other properties follow the same naming scheme.

  • selection (str) – Selection statement, that gets applied on ent before doing anything. Everything that is not selected is neglected. The default value of “” results in no selection at all.

  • asa_abs (str) – Float property name to assign the summed solvent accessible surface from each atom to a residue.

  • asa_rel (str) –

    Float property name to assign the relative solvent accessibility to a residue. This is the absolute accessibility divided by the maximum solvent accessibility of that particular residue. This maximum solvent accessibility is dependent on the chosen AccessibilityAlgorithm. Only residues of the 20 standarad amino acids can be handled.

    • In case of the NACCESS algorithm you can expect a value in range [0.0, 100.0] and a value of -99.9 for non standard residues.

    • In case of the DSSP algorithm you can expect a value in range [0.0, 1.0], no float property is assigned in case of a non standard residue.

  • asa_atom (str) – Float property name to assign the solvent accessible area to each atom.

  • algorithm (AccessibilityAlgorithm) – Specifies the used algorithm for solvent accessibility calculations

Returns:

The summed solvent accessibilty of each atom in ent.

class AccessibilityAlgorithm

The accessibility algorithm enum specifies the algorithm used by the respective tools. Available are:

NACCESS, DSSP

AssignSecStruct(ent)

Assigns secondary structures to all residues based on hydrogen bond patterns as described by DSSP.

Parameters:

ent (EntityView / EntityHandle) – Entity on which to assign secondary structures

class FindMemParam

Result object for the membrane detection algorithm described below

axis

initial search axis from which optimal membrane slab could be found

tilt_axis

Axis around which we tilt the membrane starting from the initial axis

tilt

Angle to tilt around tilt axis

angle

After the tilt operation we perform a rotation around the initial axis with this angle to get the final membrane axis

membrane_axis

The result of applying the tilt and rotation procedure described above. The membrane_axis is orthogonal to the membrane plane and has unit length.

pos

Real number that describes the membrane center point. To get the actual position you can do: pos * membrane_axis

width

Total width of the membrane in A

energy

Pseudo energy of the implicit solvation model

membrane_asa

Membrane accessible surface area

membrane_representation

Dummy atoms that represent the membrane. This entity is only valid if the according flag has been set to True when calling FindMembrane.

FindMembrane(ent, assign_membrane_representation=True, fast=False)

Estimates the optimal membrane position of a protein by using an implicit solvation model. The original algorithm and the used energy function are described in: Lomize AL, Pogozheva ID, Lomize MA, Mosberg HI (2006) Positioning of proteins in membranes: A computational approach.

There are some modifications in this implementation and the procedure is as follows:

  • Initial axis are constructed that build the starting point for initial parameter grid searches.

  • For every axis, the protein is rotated so that the axis builds the z-axis

    • In order to exclude internal hydrophilic pores, only the outermost atoms with respect the the z-axis enter an initial grid search

    • The width and position of the membrane is optimized for different combinations of tilt and rotation angles (further described in FindMemParam). The top 20 parametrizations (only top parametrization if fast is True) are stored for further processing.

  • The 20 best membrane parametrizations from the initial grid search (only the best if fast is set to True) enter a final minimization step using a Levenberg-Marquardt minimizer.

Parameters:
  • ent (ost.mol.EntityHandle / ost.mol.EntityView) –

    Entity of a transmembrane protein, you’ll get weird results if this is not the case. The energy term of the result is typically a good indicator whether ent is an actual transmembrane protein. The following float properties will be set on the atoms:

    • ’asaAtom’ on all atoms that are selected with ent.Select(‘peptide=true and ele!=H’) as a result of envoking Accessibility().

    • ’membrane_e’ the contribution of the potentially membrane facing atoms to the energy function.

  • assign_membrane_representation (bool) – Whether to construct a membrane representation using dummy atoms

  • fast – If set to false, the 20 best results of the initial grid search undergo a Levenberg-Marquardt minimization and the parametrization with optimal minimized energy is returned. If set to yes, only the best result of the initial grid search is selected and returned after Levenberg-Marquardt minimization.

Returns:

The results object

Return type:

ost.mol.alg.FindMemParam

Trajectory Analysis

This is a set of functions used for basic trajectory analysis such as extracting positions, distances, angles and RMSDs. The organization is such that most functions have their counterpart at the individual frame level so that they can also be called on one frame instead of the whole trajectory.

All these functions have a “stride” argument that defaults to stride=1, which is used to skip frames in the analysis.

SuperposeFrames(frames, sel, from=0, to=-1, ref=-1)

This function superposes the frames of the given coord group and returns them as a new coord group.

Parameters:
  • frames (CoordGroupHandle) – The source coord group.

  • sel (ost.mol.EntityView) – An entity view containing the selection of atoms to be used for superposition. If set to an invalid view, all atoms in the coord group are used.

  • from – index of the first frame

  • to – index of the last frame plus one. If set to -1, the value is set to the number of frames in the coord group

  • ref – The index of the reference frame to use for superposition. If set to -1, the each frame is superposed to the previous frame.

Returns:

A newly created coord group containing the superposed frames.

SuperposeFrames(frames, sel, ref_view, from=0, to=-1)

Same as SuperposeFrames above, but the superposition is done on a reference view and not on another frame of the trajectory.

Parameters:
  • frames (CoordGroupHandle) – The source coord group.

  • sel (ost.mol.EntityView) – An entity view containing the selection of atoms of the frames to be used for superposition.

  • ref_view (ost.mol.EntityView) – The reference view on which the frames will be superposed. The number of atoms in this reference view should be equal to the number of atoms in sel.

  • from – index of the first frame

  • to – index of the last frame plus one. If set to -1, the value is set to the number of frames in the coord group

Returns:

A newly created coord group containing the superposed frames.

AnalyzeAtomPos(traj, atom1, stride=1)

This function extracts the position of an atom from a trajectory. It returns a vector containing the position of the atom for each analyzed frame.

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • atom1 – The AtomHandle.

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeCenterOfMassPos(traj, sele, stride=1)

This function extracts the position of the center-of-mass of a selection (EntityView) from a trajectory and returns it as a vector.

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • sele (EntityView.) – The selection from which the center of mass is computed

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeDistanceBetwAtoms(traj, atom1, atom2, stride=1)

This function extracts the distance between two atoms from a trajectory and returns it as a vector.

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • atom1 – The first AtomHandle.

  • atom2 – The second AtomHandle.

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeAngle(traj, atom1, atom2, atom3, stride=1)

This function extracts the angle between three atoms from a trajectory and returns it as a vector. The second atom is taken as being the central atom, so that the angle is between the vectors (atom1.pos-atom2.pos) and (atom3.pos-atom2.pos).

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • atom1 – The first AtomHandle.

  • atom2 – The second AtomHandle.

  • atom3 – The third AtomHandle.

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeDihedralAngle(traj, atom1, atom2, atom3, atom4, stride=1)

This function extracts the dihedral angle between four atoms from a trajectory and returns it as a vector. The angle is between the planes containing the first three and the last three atoms.

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • atom1 – The first AtomHandle.

  • atom2 – The second AtomHandle.

  • atom3 – The third AtomHandle.

  • atom4 – The fourth AtomHandle.

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeDistanceBetwCenterOfMass(traj, sele1, sele2, stride=1)

This function extracts the distance between the center-of-mass of two selections (EntityView) from a trajectory and returns it as a vector.

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • sele1 (EntityView.) – The selection from which the first center of mass is computed

  • sele2 (EntityView.) – The selection from which the second center of mass is computed

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeRMSD(traj, reference_view, sele_view, stride=1)

This function extracts the rmsd between two EntityView and returns it as a vector. The views don’t have to be from the same entity. The reference positions are taken directly from the reference_view, evaluated only once. The positions from the sele_view are evaluated for each frame. If you want to compare to frame i of the trajectory t, first use t.CopyFrame(i) for example:

eh = io.LoadPDB(...)
t = io.LoadCHARMMTraj(eh, ...)
sele = eh.Select(...)
t.CopyFrame(0)
mol.alg.AnalyzeRMSD(t, sele, sele)
Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • reference_view (EntityView.) – The selection used as reference structure

  • sele_view (EntityView.) – The selection compared to the reference_view

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeMinDistance(traj, view1, view2, stride=1)

This function extracts the minimal distance between two sets of atoms (view1 and view2) for each frame in a trajectory and returns it as a vector.

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • view1 (EntityView.) – The first group of atoms

  • view2 (EntityView.) – The second group of atoms

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeMinDistanceBetwCenterOfMassAndView(traj, view_cm, view_atoms, stride=1)

This function extracts the minimal distance between a set of atoms (view_atoms) and the center of mass of a second set of atoms (view_cm) for each frame in a trajectory and returns it as a vector.

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • view_cm (EntityView.) – The group of atoms from which the center of mass is taken

  • view_atoms (EntityView.) – The second group of atoms

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeAromaticRingInteraction(traj, view_ring1, view_ring2, stride=1)

This function is a crude analysis of aromatic ring interactions. For each frame in a trajectory, it calculates the minimal distance between the atoms in one view and the center of mass of the other and vice versa, and returns the minimum between these two minimal distances. Concretely, if the two views are the heavy atoms of two rings, then it returns the minimal center of mass - heavy atom distance betweent he two rings

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • view_ring1 (EntityView.) – First group of atoms

  • view_ring2 (EntityView.) – Second group of atoms

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

Mapping functions

The following functions help to convert one residue into another by reusing as much as possible from the present atoms. They are mainly meant to map from standard amino acid to other standard amino acids or from modified amino acids to standard amino acids.

CopyResidue(src_res, dst_res, editor)

Copies the atoms of src_res to dst_res using the residue names as guide to decide which of the atoms should be copied. If src_res and dst_res have the same name, or src_res is a modified version of dst_res (i.e. have the same single letter code), CopyConserved() will be called, otherwise CopyNonConserved().

If a CBeta atom wasn’t already copied from src_res, a new one at a reconstructed position will be added to dst_res if it is not GLY and all backbone positions are available to do it.

Parameters:
  • src_res (ResidueHandle) – The source residue

  • dst_res (ResidueHandle) – The destination residue (expected to be a standard amino acid)

  • editor (XCSEditor) – Editor used to modify dst_res.

Returns:

True if the residue could be copied as a conserved residue, False if it had to fallback to CopyNonConserved().

CopyConserved(src_res, dst_res, editor)

Copies the atoms of src_res to dst_res assuming that the parent amino acid of src_res (or src_res itself) are identical to dst_res.

If src_res and dst_res are identical, all heavy atoms are copied to dst_res. If src_res is a modified version of dst_res and the modification is a pure addition (e.g. the phosphate group of phosphoserine), the modification is stripped off and all other heavy atoms are copied to dst_res. If the modification is not a pure addition, it falls back to CopyNonConserved().

Additionally, the selenium atom of MSE is converted to sulphur to map MSE to MET.

Parameters:
  • src_res (ResidueHandle) – The source residue

  • dst_res (ResidueHandle) – The destination residue (expected to be a standard amino acid)

  • editor (XCSEditor) – Editor used to modify dst_res.

Returns:

A tuple of bools stating whether the residue could be copied without falling back to CopyNonConserved() and whether the CBeta atom was copied from src_res to dst_res.

CopyNonConserved(src_res, dst_res, editor)

Copies the heavy backbone atoms and CBeta (except for GLY) of src_res to dst_res.

Parameters:
  • src_res (ResidueHandle) – The source residue

  • dst_res (ResidueHandle) – The destination residue (expected to be a standard amino acid)

  • editor (XCSEditor) – Editor used to modify dst_res.

Returns:

A tuple of bools as in CopyConserved() with the first bool always being False.

Molecular Checker (Molck)

Programmatic usage

Molecular Checker (Molck) could be called directly from the code using Molck function:

#! /bin/env python

"""Run Molck with Python API.


This is an exemplary procedure on how to run Molck using Python API which is
equivalent to the command line:

molck <PDB PATH> --rm=hyd,oxt,nonstd,unk \
                 --fix-ele --out=<OUTPUT PATH> \
                 --complib=<PATH TO compounds.chemlib>
"""

from ost.io import LoadPDB, SavePDB
from ost.mol.alg import MolckSettings, Molck

from ost.conop import CompoundLib


pdbid = "<PDB PATH>"
lib = CompoundLib.Load("<PATH TO compounds.chemlib>")

# Using Molck function
ent = LoadPDB(pdbid)
ms = MolckSettings(rm_unk_atoms=True,
                   rm_non_std=True,
                   rm_hyd_atoms=True,
                   rm_oxt_atoms=True,
                   rm_zero_occ_atoms=False,
                   colored=False,
                   map_nonstd_res=False,
                   assign_elem=True)
Molck(ent, lib, ms)
SavePDB(ent, "<OUTPUT PATH>")

It can also be split into subsequent commands for greater controll:

#! /bin/env python

"""Run Molck with Python API.


This is an exemplary procedure on how to run Molck using Python API which is
equivalent to the command line:

molck <PDB PATH> --rm=hyd,oxt,nonstd,unk \
                 --fix-ele --out=<OUTPUT PATH> \
                 --complib=<PATH TO compounds.chemlib>
"""

from ost.io import LoadPDB, SavePDB
from ost.mol.alg import (RemoveAtoms, MapNonStandardResidues,
                         CleanUpElementColumn)
from ost.conop import CompoundLib


pdbid = "<PDB PATH>"
lib = CompoundLib.Load("<PATH TO compounds.chemlib>")
map_nonstd = False

# Using function chain
ent = LoadPDB(pdbid)
if map_nonstd:
    MapNonStandardResidues(lib=lib, ent=ent)

RemoveAtoms(lib=lib,
            ent=ent,
            rm_unk_atoms=True,
            rm_non_std=True,
            rm_hyd_atoms=True,
            rm_oxt_atoms=True,
            rm_zero_occ_atoms=False,
            colored=False)

CleanUpElementColumn(lib=lib, ent=ent)
SavePDB(ent, "<OUTPUT PATH>")

API

class MolckSettings(rm_unk_atoms=False, rm_non_std=False, rm_hyd_atoms=True, rm_oxt_atoms=False, rm_zero_occ_atoms=False, colored=False, map_nonstd_res=True, assign_elem=True)

Stores settings used for Molecular Checker.

Parameters:
rm_unk_atoms

Remove unknown and atoms not following the nomenclature.

Type:

bool

rm_non_std

Remove all residues not one of the 20 standard amino acids

Type:

bool

rm_hyd_atoms

Remove hydrogen atoms

Type:

bool

rm_oxt_atoms

Remove terminal oxygens

Type:

bool

rm_zero_occ_atoms

Remove atoms with zero occupancy

Type:

bool

colored

Whether output should be colored

Type:

bool

map_nonstd_res

Maps modified residues back to the parent amino acid, for example MSE -> MET, SEP -> SER

Type:

bool

assign_elem

Clean up element column

Type:

bool

ToString()
Returns:

String representation of the MolckSettings.

Return type:

str

Warning

The API here is set such that the functions modify the passed structure ent in-place. If this is not ok, please work on a copy of the structure.

Molck(ent, lib, settings[, prune=True])

Runs Molck on provided entity. Reprocesses ent with ost.conop.HeuristicProcessor and given lib once done.

Parameters:
  • ent (EntityHandle) – Structure to check

  • lib (CompoundLib) – Compound library

  • settings (MolckSettings) – Molck settings

  • prune (bool) – Whether to remove residues/chains that don’t contain atoms anymore after Molck cleanup

MapNonStandardResidues(ent, lib, reprocess=True)

Maps modified residues back to the parent amino acid, for example MSE -> MET.

Parameters:
  • ent (EntityHandle) – Structure to check

  • lib (CompoundLib) – Compound library

  • reprocess – The function generates a deep copy of ent. Highly recommended to enable reprocess that runs ost.conop.HeuristicProcessor with given lib. If set to False, you’ll have no connectivity etc. after calling this function.

RemoveAtoms(ent, lib, rm_unk_atoms=False, rm_non_std=False,                         rm_hyd_atoms=True, rm_oxt_atoms=False,                         rm_zero_occ_atoms=False, colored=False,
reprocess=True)

Removes atoms and residues according to some criteria.

Parameters:
CleanUpElementColumn(ent, lib)

Clean up element column.

Parameters:

Biounits

Biological assemblies, i.e. biounits, are an integral part of mmCIF files and their construction is fully defined in ost.io.MMCifInfoBioUnit. ost.io.MMCifInfoBioUnit.PDBize() provides one possibility to construct such biounits with compatibility with the PDB format in mind. That is single character chain names, dumping all ligands in one chain etc. Here we provide a more mmCIF-style way of constructing biounits. This can either be done starting from a ost.io.MMCifInfoBioUnit or the derived ost.mol.alg.BUInfo. The latter is a minimalistic representation of ost.io.MMCifInfoBioUnit and can be serialized to a byte string.

BUInfo(mmcif_buinfo):

Preprocesses data from ost.io.MMCifInfoBioUnit that are required to construct a biounit from an assymetric unit. Can be serialized.

Parameters:

mmcif_buinfo (ost.io.MMCifInfoBioUnit) – Biounit definition

ToBytes()
Returns:

A byte string from which the object can be reconstructed.

static FromBytes(byte_string)
Parameters:

byte_string – Can be created with ToBytes()

Returns:

A BUInfo object

CreateBU(asu, bu_info)

Constructs a biounit given an assymetric unit and transformation information. The following properties are copied from the assymetric unit and are expected to be set (this is the case if you use ost.io.LoadMMCIF() for the assymetric unit):

Each chain in the returned biounit can be referenced back to the assymetric unit as they follow a standardised naming scheme: <idx>.<asu_cname>, where asu_cname is the chain name in the assymetric unit and idx is the nth occurence of that chain in the biounit with one based indexing. There is a quirk though. An index of 1, for example 1.A, is reserved for the original AU chain with identity transform (read: no transform) applied. If a certain AU chain only occurs with an actual transform applied, numbering starts at 2.

Warning

There is the (rare) possibility that a AU chain that has only identity transform applied is not named 1.<au_cname>. As of january 2024, there are 3 pdb entries (8qn6, 8x1h, 2c0x) where the same AU chain with identity transform occurs several times in the same biounit. This is likely an error in the respective mmCIF files as the resulting chains sit on top of each other. OST just names the FIRST occurence as 1.<au_cname>.

Parameters:
  • asu (ost.mol.EntityHandle) – The assymetric unit

  • bu_info (MMCifInfoBioUnit/BUInfo) – Info object

Returns:

A ost.mol.EntityHandle of the requested biounit

Search

Enter search terms or a module, class or function name.

Contents

Documentation is available for the following OpenStructure versions:

(Currently viewing dev) / 2.8 / 2.7 / 2.6 / 2.5 / 2.4 / 2.3.1 / 2.3 / 2.2 / 2.1 / 2.0 / 1.9 / 1.8 / 1.7.1 / 1.7 / 1.6 / 1.5 / 1.4 / 1.3 / 1.2 / 1.11 / 1.10 / 1.1

This documentation is still under heavy development!
If something is missing or if you need the C++ API description in doxygen style, check our old documentation for further information.