mol.alg – Algorithms for Structures

Local Distance Test scores (lDDT, DRMSD)

Note

This is a new implementation of lDDT, introduced in OpenStructure 2.4 with focus on supporting quaternary structure and compounds beyond the 20 standard proteinogenic amino acids. The previous lDDT code that comes with Mariani et al. is considered deprecated.

Note

lddt.lDDTScorer provides the raw Python API to compute lDDT but stereochemistry checks as described in Mariani et al. must be done seperately. You may want to check out the compare-structures action (Comparing two structures) to compute lDDT with pre-processing and support for quaternary structures.

class lDDTScorer(target, compound_lib=None, custom_compounds=None, inclusion_radius=15, sequence_separation=0, symmetry_settings=None, seqres_mapping={}, bb_only=False)

lDDT scorer object for a specific target

Sets up everything to score models of that target. lDDT (local distance difference test) is defined as fraction of pairwise distances which exhibit a difference < threshold when considering target and model. In case of multiple thresholds, the average is returned. See

V. Mariani, M. Biasini, A. Barbato, T. Schwede, lDDT : A local superposition-free score for comparing protein structures and models using distance difference tests, Bioinformatics, 2013

Parameters:
  • target (ost.mol.EntityHandle/ost.mol.EntityView) – The target

  • compound_lib (ost.conop.CompoundLib) – Compound library from which a compound for each residue is extracted based on its name. Uses ost.conop.GetDefaultLib() if not given, raises if this returns no valid compound library. Atoms defined in the compound are searched in the residue and build the reference for scoring. If the residue has atoms with names [“A”, “B”, “C”] but the corresponding compound only has [“A”, “B”], “A” and “B” are considered for scoring. If the residue has atoms [“A”, “B”] but the compound has [“A”, “B”, “C”], “C” is considered missing and does not influence scoring, even if present in the model.

  • custom_compounds (dict with residue names (str) as key and CustomCompound as value.) – Custom compounds defining reference atoms. If given, custom_compounds take precedent over compound_lib.

  • inclusion_radius (float) – All pairwise distances < inclusion_radius are considered for scoring

  • sequence_separation (int) – Only pairwise distances between atoms of residues which are further apart than this threshold are considered. Residue distance is based on resnum. The default (0) considers all pairwise distances except intra-residue distances.

  • symmetry_settings (SymmetrySettings) – Define residues exhibiting internal symmetry, uses GetDefaultSymmetrySettings() if not given.

  • seqres_mapping (dict (key: str, value: ost.seq.AlignmentHandle)) – Mapping of model residues at the scoring stage happens with residue numbers defining their location in a reference sequence (SEQRES) using one based indexing. If the residue numbers in target don’t correspond to that SEQRES, you can specify the mapping manually. You can provide a dictionary to specify a reference sequence (SEQRES) for one or more chain(s). Key: chain name, value: alignment (seq1: SEQRES, seq2: sequence of residues in chain). Example: The residues in a chain with name “A” have sequence “YEAH” and residue numbers [42,43,44,45]. You can provide an alignment with seq1 “HELLYEAH” and seq2 “----YEAH”. “Y” gets assigned residue number 5, “E” gets assigned 6 and so on no matter what the original residue numbers were.

  • bb_only (bool) – Only consider atoms with name “CA” in case of amino acids and “C3’” for Nucleotides. this invalidates compound_lib. Raises if any residue in target is not r.chem_class.IsPeptideLinking() or r.chem_class.IsNucleotideLinking()

Raises:

RuntimeError if target contains compound which is not in compound_lib, RuntimeError if symmetry_settings specifies symmetric atoms that are not present in the according compound in compound_lib, RuntimeError if seqres_mapping is not provided and target contains residue numbers with insertion codes or the residue numbers for each chain are not monotonically increasing, RuntimeError if seqres_mapping is provided but an alignment is invalid (seq1 contains gaps, mismatch in seq1/seq2, seq2 does not match residues in corresponding chains).

GetNChainContacts(target_chain, no_interchain=False)

Returns number of contacts expected for a certain chain in target

Parameters:
  • target_chain (str) – Chain in target for which you want the number of expected contacts

  • no_interchain (bool) – Whether to exclude interchain contacts

Raises:

RuntimeError if specified chain doesnt exist

lDDT(model, thresholds=[0.5, 1.0, 2.0, 4.0], local_lddt_prop=None, local_contact_prop=None, chain_mapping=None, no_interchain=False, no_intrachain=False, penalize_extra_chains=False, residue_mapping=None, return_dist_test=False, check_resnames=True)

Computes lDDT of model - globally and per-residue

Parameters:
  • model (ost.mol.EntityHandle/ost.mol.EntityView) – Model to be scored - models are preferably scored upon performing stereo-chemistry checks in order to punish for non-sensical irregularities. This must be done separately as a pre-processing step. Target contacts that are not covered by model are considered not conserved, thus decreasing lDDT score. This also includes missing model chains or model chains for which no mapping is provided in chain_mapping.

  • thresholds (list of floats) – Thresholds of distance differences to be considered as correct - see docs in constructor for more info. default: [0.5, 1.0, 2.0, 4.0]

  • local_lddt_prop (str) – If set, per-residue scores will be assigned as generic float property of that name

  • local_contact_prop (str) – If set, number of expected contacts as well as number of conserved contacts will be assigned as generic int property. Excected contacts will be set as <local_contact_prop>_exp, conserved contacts as <local_contact_prop>_cons. Values are summed over all thresholds.

  • chain_mapping (dict with str as keys/values) – Mapping of model chains (key) onto target chains (value). This is required if target or model have more than one chain.

  • no_interchain (bool) – Whether to exclude interchain contacts

  • no_intrachain (bool) – Whether to exclude intrachain contacts (i.e. only consider interface related contacts)

  • penalize_extra_chains – Whether to include a fixed penalty for additional chains in the model that are not mapped to the target. ONLY AFFECTS RETURNED GLOBAL SCORE. In detail: adds the number of intra-chain contacts of each extra chain to the expected contacts, thus adding a penalty.

  • penalize_extra_chainsbool

  • residue_mapping (dict with key: str, value: ost.seq.AlignmentHandle) – By default, residue mapping is based on residue numbers. That means, a model chain and the respective target chain map to the same underlying reference sequence (SEQRES). Alternatively, you can specify one or several alignment(s) between model and target chains by providing a dictionary. key: Name of chain in model (respective target chain is extracted from chain_mapping), value: Alignment with first sequence corresponding to target chain and second sequence to model chain. There is NO reference sequence involved, so the two sequences MUST exactly match the actual residues observed in the respective target/model chains (ATOMSEQ).

  • return_dist_test – Whether to additionally return the underlying per-residue data for the distance difference test. Adds five objects to the return tuple. First: Number of total contacts summed over all thresholds Second: Number of conserved contacts summed over all thresholds Third: list with length of scored residues. Contains indices referring to model.residues. Fourth: numpy array of size len(scored_residues) containing the number of total contacts, Fifth: numpy matrix of shape (len(scored_residues), len(thresholds)) specifying how many for each threshold are conserved.

  • check_resnames (bool) – On by default. Enforces residue name matches between mapped model and target residues.

Returns:

global and per-residue lDDT scores as a tuple - first element is global lDDT score and second element a list of per-residue scores with length len(model.residues) None is assigned to residues that are not covered by target

class SymmetrySettings

Container for symmetric compounds

lDDT considers symmetries and selects the one resulting in the highest possible score.

A symmetry is defined as a renaming operation on one or more atoms that leads to a chemically equivalent residue. Example would be OD1 and OD2 in ASP => renaming OD1 to OD2 and vice versa gives a chemically equivalent residue.

Use AddSymmetricCompound() to define a symmetry which can then directly be accessed through the symmetric_compounds member.

AddSymmetricCompound(name, symmetric_atoms)

Adds symmetry for compound with name

Parameters:
  • name (str) – Name of compound with symmetry

  • symmetric_atoms (list of tuple) – Pairs of atom names that define renaming operation, i.e. after applying all switches defined in the tuples, the resulting residue should be chemically equivalent. Atom names must refer to the PDB component dictionary.

GetDefaultSymmetrySettings()

Constructs and returns SymmetrySettings object for natural amino acids

class CustomCompound(atom_names)

Defines atoms for custom compounds

lDDT requires the reference atoms of a compound which are typically extracted from a ost.conop.CompoundLib. This lightweight container allows to handle arbitrary compounds which are not necessarily in the compound library.

Parameters:

atom_names (list of str) – Names of atoms of custom compound

static FromResidue(res)

Construct custom compound from residue

Parameters:

res (ost.mol.ResidueView/ost.mol.ResidueHandle) – Residue from which reference atom names are extracted, hydrogen/deuterium atoms are filtered out

Returns:

CustomCompound

class lDDTSettings(radius=15, sequence_separation=0, cutoffs=(0.5, 1.0, 2.0, 4.0), label='locallddt')

Object containing the settings used for lDDT calculations.

Parameters:
radius

Distance inclusion radius.

Type:

float

sequence_separation

Sequence separation.

Type:

int

cutoffs

List of thresholds used to determine distance conservation.

Type:

list of float

label

The base name for the ResidueHandle properties that store the local scores.

Type:

str

PrintParameters()

Print settings.

ToString()
Returns:

String representation of the lDDTSettings object.

Return type:

str

stereochemistry – Stereochemistry Checks

Warning

Stereochemistry checks described in Mariani et al. are considered deprecated. They have been re-implemented and now support nucleotides. The old code is still available and documented here.

StereoDataFromMON_LIB(mon_lib_path, compounds=None)

Parses stereochemistry parameters from CCP4 MON_LIB

CCP4 MON_LIB contains data on ideal bond lengths/angles for compounds.

Original data (several updates in the meantime) come from:

  • Amino acid bond lengths and angles: Engh and Huber, Acta Cryst. A47, 392-400 (1991).

  • Purine and pyrimidine bond lengths and angles: O. Kennard & R. Taylor (1982), J. Am. Soc. Chem. vol. 104, pp. 3209-3212.

  • Sugar-phosphate backbone bond lengths and bond angles: W. Saenger’s Principles of Nucleic Acid Structure (1983), Springer-Verlag, pp. 70,86.

This function adds a dependency to the gemmi library to read cif files.

Parameters:
  • mon_lib_path (str) – Path to CCP4 MON_LIB

  • compounds (list) – Compounds to parse - parses proteinogenic amino acids and nucleotides if not given.

Returns:

dict with stereochemistry parameters

GetBondParam(a1, a2, stereo_data=None, stereo_link_data=None)

Returns mean and standard deviation for bond

Parameters:
Returns:

tuple with mean and standard deviation. Values are None if respective bond is not found in stereo_data

GetAngleParam(a1, a2, a3, stereo_data=None, stereo_link_data=None)

Returns mean and standard deviation for angle

Parameters:
Returns:

tuple with mean and standard deviation. Values are None if respective angle is not found in stereo_data

class ClashInfo(a1, a2, dist, tolerated_dist)

Object to hold info on clash

Constructor arguments are available as attributes:

ToJSON()

Return JSON serializable dict

Atoms are represented by a string in format: <chain_name>.<resnum>.<resnum_inscode>.<atom_name>

class BondViolationInfo(a1, a2, length, exp_length, std)

Object to hold info on bond violation

Constructor arguments are available as attributes:

ToJSON()

Return JSON serializable dict

Atoms are represented by a string in format: <chain_name>.<resnum>.<resnum_inscode>.<atom_name>

class AngleViolationInfo(a1, a2, a3, angle, exp_angle, std)

Object to hold info on angle violation

Constructor arguments are available as attributes:

ToJSON()

Return JSON serializable dict

Atoms are represented by a string in format: <chain_name>.<resnum>.<resnum_inscode>.<atom_name>

GetClashes(ent, vdw_radii=None, tolerance=1.5, disulfid_dist=2.03, disulfid_tolerance=1.0)

Identifies clashing atoms

A clash between two non-bonded atoms is defined as their distance d being below the sum of their vdw radii with some subtracted tolerance value.

The default values are not very sensitive.

Parameters:
  • ent (ost.mol.EntityHandle/ost.mol.EntityView) – Entity for which you want to identify clashing atoms

  • vdw_radii (dict) – Element based van der Waals radii. Only atoms of these elements will be considered. If not given, default values for all elements occuring in proteins/nucleotides are used. Must be provided as dict, where they key are elements (capitalized) and value the respective radii in Angstrom.

  • tolerance – Tolerance value

  • disulfid_dist (float) – Summed vdw radius that is used if two Sulfurs that can potentially build a disulfid bond interact

  • disulfid_tolerance – The respective tolerance

Returns:

A list of ClashInfo

GetBadBonds(ent, stereo_data=None, stereo_link_data=None, tolerance=12)

Identify unrealistic bonds

Parameters:
  • ent (ost.mol.EntityHandle/ost.mol.EntityView) – Entity for which you want to identify unrealistic bonds

  • stereo_data (dict) – Stereochemistry data, use return value of GetDefaultStereoData() if not given.

  • stereo_link_data (dict) – Stereochemistry data, use return value of GetDefaultStereoLinkData() if not given.

  • tolerance (int) – Bonds that devaiate more than tolerance times standard deviation from expected mean are considered bad

Returns:

list BondViolationInfo

GetBadAngles(ent, stereo_data=None, stereo_link_data=None, tolerance=12)

Identify unrealistic angles

Parameters:
  • ent (ost.mol.EntityHandle/ost.mol.EntityView) – Entity for which you want to identify unrealistic angles

  • stereo_data (dict) – Stereochemistry data, use return value of GetDefaultStereoData() if not given.

  • stereo_link_data (dict) – Stereochemistry data, use return value of GetDefaultStereoLinkData() if not given.

  • tolerance (int) – Angles that devaiate more than tolerance times standard deviation from expected mean are considered bad

Returns:

list of AngleViolationInfo

StereoCheck(ent, stereo_data=None, stereo_link_data=None)

Remove atoms with stereochemical problems

Selects for peptide/nucleotides and calls GetClashes(), GetBadBonds() and GetBadAngles() with default parameters.

  • Amino acids: Remove full residue if backbone atom is involved in stereochemistry issue (“N”, “CA”, “C”, “O”). Remove sidechain if any of the sidechain atoms is involved in stereochemistry issues.

  • Nucleotides: Remove full residue if backbone atom is involved in stereochemistry issue (“P”, “OP1”, “OP2”, “OP3”, “O5’”, “C5’”, “C4’”, “C3’”, “C2’”, “C1’”, “O4’”, “O3’”, “O2’”). Remove sidechain (base) if any of the sidechain atoms is involved in stereochemistry issues.

Parameters:
Returns:

Tuple with four elements: 1) ost.mol.EntityView of ent processed as described above 2) Return value of GetClashes() 3) return value of GetBadBonds() 4) return value of GetBadAngles()

GetDefaultStereoData()

Get default stereo data derived from CCP4 MON_LIB

Used as default if not provided in GetBadBonds(), GetBadAngles() and StereoCheck().

MON_LIB is licensed under GNU LESSER GENERAL PUBLIC LICENSE Version 3. Consult the latest CCP4 for the full license text.

GetDefaultStereoLinkData()

Get default stereo data for links between compounds

Hardcoded from arbitrary sources, see comments in the code.

Returns:

Data for peptide bonds, nucleotide links and disulfid bonds that are used as default if not provided in GetBadBonds(), GetBadAngles() and StereoCheck().

scoring – Specialized scoring functions

class lDDTBSScorer(reference, model, residue_number_alignment=False)

Scorer specific for a reference/model pair

Finds best possible binding site representation of reference in model given lDDT score. Uses ost.mol.alg.chain_mapping.ChainMapper to deal with chain mapping.

Parameters:
ScoreBS(ligand, radius=4.0, lddt_radius=10.0)

Computes binding site lDDT score given ligand. Best possible binding site representation is selected by lDDT but other scores such as CA based RMSD and GDT are computed too and returned.

Parameters:
  • ligand (r'((Residue)|(Chain)|(Entity))((View)|(Handle))') – Defines the scored binding site, i.e. provides positions to perform proximity search

  • radius (float) – Reference residues with any atom position within radius of ligand consitute the scored binding site

  • lddt_radius (float) – Passed as inclusion_radius to ost.mol.alg.lddt.lDDTScorer

Returns:

Object of type ost.mol.alg.chain_mapping.ReprResult containing all atom lDDT score and mapping information. None if no representation could be found.

class Scorer(model, target, resnum_alignments=False, molck_settings=None, cad_score_exec=None, custom_mapping=None, usalign_exec=None, lddt_no_stereochecks=False, n_max_naive=12)

Helper class to access the various scores available from ost.mol.alg

Deals with structure cleanup, chain mapping, interface identification etc. Intermediate results are available as attributes.

Parameters:
  • model (ost.mol.EntityHandle/ost.mol.EntityView) – Model structure - a deep copy is available as model. Additionally, ost.mol.alg.Molck() using molck_settings is applied.

  • target (ost.mol.EntityHandle/ost.mol.EntityView) – Target structure - a deep copy is available as target. Additionally, ost.mol.alg.Molck() using molck_settings is applied.

  • resnum_alignments (bool) – Whether alignments between chemically equivalent chains in model and target can be computed based on residue numbers. This can be assumed in benchmarking setups such as CAMEO/CASP.

  • molck_settings (ost.mol.alg.MolckSettings) – Settings used for Molck on model and target, if set to None, a default object is constructed by setting everything except rm_zero_occ_atoms and colored to True in ost.mol.alg.MolckSettings constructor.

  • cad_score_exec (str) – Explicit path to voronota-cadscore executable from voronota installation from https://github.com/kliment-olechnovic/voronota. If not given, voronota-cadscore must be in PATH if any of the CAD score related attributes is requested.

  • custom_mapping (dict) – Provide custom chain mapping between model and target. Dictionary with target chain names as key and model chain names as value.

  • usalign_exec (str) – Explicit path to USalign executable used to compute TM-score. If not given, TM-score will be computed with OpenStructure internal copy of USalign code.

  • lddt_no_stereochecks (bool) – Whether to compute lDDT without stereochemistry checks

  • n_max_naive (int) – Parameter for chain mapping. If model and target have less or equal that number of chains, the full mapping solution space is enumerated to find the the optimum. A heuristic is used otherwise.

property model

Model with Molck cleanup

Type:

ost.mol.EntityHandle

property target

Target with Molck cleanup

Type:

ost.mol.EntityHandle

property aln

Alignments of model/target chains

Alignments for each pair of chains mapped in mapping. First sequence is target sequence, second sequence the model sequence.

Type:

list of ost.seq.AlignmentHandle

property stereochecked_aln

Stereochecked equivalent of aln

The alignments may differ, as stereochecks potentially remove residues

Type:

:class:``

property stereochecked_model

View of model that has stereochemistry checks applied

First, a selection for peptide/nucleotide residues is performed, secondly peptide sidechains with stereochemical irregularities are removed (full residue if backbone atoms are involved). Irregularities are clashes or bond lengths/angles more than 12 standard deviations from expected values.

Type:

ost.mol.EntityView

property model_clashes

Clashing model atoms

Type:

list of ost.mol.alg.stereochemistry.ClashInfo

property model_bad_bonds

Model bonds with unexpected stereochemistry

Type:

list of ost.mol.alg.stereochemistry.BondViolationInfo

property model_bad_angles

Model angles with unexpected stereochemistry

Type:

list of ost.mol.alg.stereochemistry.AngleViolationInfo

property stereochecked_target

Same as stereochecked_model for target

Type:

ost.mol.EntityView

property target_clashes

Clashing target atoms

Type:

list of ost.mol.alg.stereochemistry.ClashInfo

property target_bad_bonds

Target bonds with unexpected stereochemistry

Type:

list of ost.mol.alg.stereochemistry.BondViolationInfo

property target_bad_angles

Target angles with unexpected stereochemistry

Type:

list of ost.mol.alg.stereochemistry.AngleViolationInfo

property chain_mapper

Chain mapper object for given target

Type:

ost.mol.alg.chain_mapping.ChainMapper

property mapping

Full chain mapping result for target/model

Type:

ost.mol.alg.chain_mapping.MappingResult

property model_interface_residues

Interface residues in model

Thats all residues having a contact with at least one residue from another chain (CB-CB distance <= 8A, CA in case of Glycine)

Type:

dict with chain names as key and and list with residue numbers of the respective interface residues.

property target_interface_residues

Same as model_interface_residues for target

Type:

dict with chain names as key and and list with residue numbers of the respective interface residues.

property lddt_scorer

lDDT scorer for stereochecked_target (default parameters)

Type:

ost.mol.alg.lddt.lDDTScorer

property bb_lddt_scorer

Backbone only lDDT scorer for target

No stereochecks applied for bb only lDDT which considers CA atoms for peptides and C3’ atoms for nucleotides.

Type:

ost.mol.alg.lddt.lDDTScorer

property qs_scorer

QS scorer constructed from mapping

The scorer object is constructed with default parameters and relates to model and target (no stereochecks).

Type:

ost.mol.alg.qsscore.QSScorer

property lddt

Global lDDT score in range [0.0, 1.0]

Computed based on stereochecked_model. In case of oligomers, mapping is used.

Type:

float

property local_lddt

Per residue lDDT scores in range [0.0, 1.0]

Computed based on stereochecked_model but scores for all residues in model are reported. If a residue has been removed by stereochemistry checks, the respective score is set to 0.0. If a residue is not covered by the target or is in a chain skipped by the chain mapping procedure (happens for super short chains), the respective score is set to None. In case of oligomers, mapping is used.

Type:

dict

property bb_lddt

Backbone only global lDDT score in range [0.0, 1.0]

Computed based on model on backbone atoms only. This is CA for peptides and C3’ for nucleotides. No stereochecks are performed. In case of oligomers, mapping is used.

Type:

float

property bb_local_lddt

Backbone only per residue lDDT scores in range [0.0, 1.0]

Computed based on model on backbone atoms only. This is CA for peptides and C3’ for nucleotides. No stereochecks are performed. If a residue is not covered by the target or is in a chain skipped by the chain mapping procedure (happens for super short chains), the respective score is set to None. In case of oligomers, mapping is used.

Type:

dict

property qs_global

Global QS-score

Computed based on model using mapping

Type:

float

property qs_best

Global QS-score - only computed on aligned residues

Computed based on model using mapping. The QS-score computation only considers contacts between residues with a mapping between target and model. As a result, the score won’t be lowered in case of additional chains/residues in any of the structures.

Type:

float

property interfaces

Interfaces with nonzero native_contacts

Type:

list of tuple with 4 elements each: (trg_ch1, trg_ch2, mdl_ch1, mdl_ch2)

property interface_qs_global

QS-score for each interface in interfaces

Type:

list of float

property interface_qs_best

QS-score for each interface in interfaces

Only computed on aligned residues

Type:

list of float

property native_contacts

N native contacts for interfaces in interfaces

A contact is a pair or residues from distinct chains that have a minimal heavy atom distance < 5A

Type:

list of int

property model_contacts

N model contacts for interfaces in interfaces

A contact is a pair or residues from distinct chains that have a minimal heavy atom distance < 5A

Type:

list of int

property dockq_scores

DockQ scores for interfaces in interfaces

list of float

property fnat

fnat scores for interfaces in interfaces

fnat: Fraction of native contacts that are also present in model

list of float

property fnonnat

fnonnat scores for interfaces in interfaces

fnat: Fraction of model contacts that are not present in target

list of float

property irmsd

irmsd scores for interfaces in interfaces

irmsd: RMSD of interface (RMSD computed on N, CA, C, O atoms) which consists of each residue that has at least one heavy atom within 10A of other chain.

list of float

property lrmsd

lrmsd scores for interfaces in interfaces

lrmsd: The interfaces are superposed based on the receptor (rigid min RMSD superposition) and RMSD for the ligand is reported. Superposition and RMSD are based on N, CA, C and O positions, receptor is the chain contributing to the interface with more residues in total.

list of float

property nonmapped_interfaces

Interfaces present in target that are not mapped

At least one of the chains is not present in target

Type:

list of tuple with two elements each: (trg_ch1, trg_ch2)

property nonmapped_interfaces_contacts

Number of native contacts in nonmapped_interfaces

Type:

list of int

property dockq_ave

Average of DockQ scores in dockq_scores

In its original implementation, DockQ only operates on single interfaces. Thus the requirement to combine scores for higher order oligomers.

Type:

float

property dockq_wave

Same as dockq_ave, weighted by native_contacts

Type:

float

property dockq_ave_full

Same as dockq_ave but penalizing for missing interfaces

Interfaces in nonmapped_interfaces are added as 0.0 in average computation.

Type:

float

property dockq_wave_full

Same as dockq_ave_full, but weighted

Interfaces in nonmapped_interfaces are added as 0.0 in average computations and the respective weights are derived from nonmapped_interfaces_contacts

property mapped_target_pos

Mapped representative positions in target

Thats CA positions for peptide residues and C3’ positions for nucleotides. Has same length as mapped_model_pos and mapping is based on mapping.

Type:

ost.geom.Vec3List

property mapped_model_pos

Mapped representative positions in model

Thats CA positions for peptide residues and C3’ positions for nucleotides. Has same length as mapped_target_pos and mapping is based on mapping.

Type:

ost.geom.Vec3List

property transformed_mapped_model_pos

mapped_model_pos with transform applied

Type:

ost.geom.Vec3List

property n_target_not_mapped

Number of target residues which have no mapping to model

Type:

int

property transform

Transform: mapped_model_pos onto mapped_target_pos

Computed using Kabsch minimal rmsd algorithm

Type:

ost.geom.Mat4

property gdtts

GDT with thresholds: 8.0A, 4.0A, 2.0A and 1.0A

Computed on transformed_mapped_model_pos and mapped_target_pos

Type:

float

property gdtha

GDT with thresholds: 4.0A, 2.0A, 1.0A and 0.5A

Computed on transformed_mapped_model_pos and mapped_target_pos

Type:

float

property rmsd

RMSD

Computed on transformed_mapped_model_pos and mapped_target_pos

Type:

float

property cad_score

The global CAD atom-atom (AA) score

Computed based on model. In case of oligomers, mapping is used.

Type:

float

property local_cad_score

The per-residue CAD atom-atom (AA) scores

Computed based on model. In case of oligomers, mapping is used.

Type:

dict

property patch_qs

Patch QS-scores for each residue in model_interface_residues

Representative patches for each residue r in chain c are computed as follows:

  • mdl_patch_one: All residues in c with CB (CA for GLY) positions within 8A of r and within 12A of residues from any other chain.

  • mdl_patch_two: Closest residue x to r in any other chain gets identified. Patch is then constructed by selecting all residues from any other chain within 8A of x and within 12A from any residue in c.

  • trg_patch_one: Chain name and residue number based mapping from mdl_patch_one

  • trg_patch_two: Chain name and residue number based mapping from mdl_patch_two

Results are stored in the same manner as model_interface_residues, with corresponding scores instead of residue numbers. Scores for residues which are not mol.ChemType.AMINOACIDS are set to None. Additionally, interface patches are derived from model. If they contain residues which are not covered by target, the score is set to None too.

Type:

dict with chain names as key and and list with scores of the respective interface residues.

property patch_dockq

Same as patch_qs but for DockQ scores

property tm_score

TM-score computed with USalign

USalign executable can be specified with usalign_exec kwarg at Scorer construction, an OpenStructure internal copy of the USalign code is used otherwise.

Type:

float

property usalign_mapping

Mapping computed with USalign

Dictionary with target chain names as key and model chain names as values. No guarantee that all chains are mapped. USalign executable can be specified with usalign_exec kwarg at Scorer construction, an OpenStructure internal copy of the USalign code is used otherwise.

Type:

dict

ligand_scoring – Ligand scoring functions

class LigandScorer(model, target, model_ligands=None, target_ligands=None, resnum_alignments=False, check_resnames=True, rename_ligand_chain=False, chain_mapper=None, substructure_match=False, radius=4.0, lddt_pli_radius=6.0, lddt_lp_radius=10.0, binding_sites_topn=100000, global_chain_mapping=False, rmsd_assignment=False, n_max_naive=12, custom_mapping=None)

Scorer class to compute various small molecule ligand (non polymer) scores.

Note

Extra requirements:

  • Python modules numpy and networkx must be available (e.g. use pip install numpy networkx)

At the moment, two scores are available:

  • lDDT-PLI, that looks at the conservation of protein-ligand contacts with lDDT.

  • Binding-site superposed, symmetry-corrected RMSD that assesses the accuracy of the ligand pose (BiSyRMSD, hereinafter referred to as RMSD).

Both scores involve local chain mapping of the reference binding site onto the model, symmetry-correction, and finally assignment (mapping) of model and target ligands, as described in (Manuscript in preparation).

Results are available as matrices ((lddt_pli|rmsd)_matrix), where every target-model score is reported in a matrix; as (lddt_pli|rmsd) where a model-target assignment has been determined (see below) and reported in a dictionary; and as ((lddt_pli|rmsd)_details) methods, which report additional details about different aspects of the scoring such as chain mapping.

The behavior of chain mapping and ligand assignment can be controlled with the global_chain_mapping and rmsd_assignment arguments.

By default, chain mapping is performed locally, ie. only within the binding site. As a result, different ligand scores can correspond to different chain mappings. This tends to produce more favorable scores, especially in large, partially regular oligomeric complexes. Setting global_chain_mapping=True enforces a single global chain mapping, as per ost.mol.alg.chain_mapping.ChainMapper.GetMapping(). Note that this global chain mapping currently ignores non polymer entities such as small ligands, and may result in overly pessimistic scores.

By defaults, target-model ligand assignments are computed independently for the RMSD and lDDT-PLI scores. For RMSD, each model ligand is uniquely assigned to a target ligand, starting from the “best” possible mapping (lowest RMSD) and using each target and model ligand in a single assignment. Ties are resolved by best (highest) lDDT-PLI. Similarly, for lDDT-PLI, the assignment is based on the highest lDDT-PLI, and ties broken by lowest RMSD. Setting rmsd_assignment=True forces a single ligand assignment, based on RMSD only. Ties are broken arbitrarily.

Assumptions:

The class generally assumes that the is_ligand property is properly set on all the ligand atoms, and only ligand atoms. This is typically the case for entities loaded from mmCIF (tested with mmCIF files from the PDB and SWISS-MODEL), but it will most likely not work for most entities loaded from PDB files.

The class doesn’t perform any cleanup of the provided structures. It is up to the caller to ensure that the data is clean and suitable for scoring. Molck should be used with extra care, as many of the options (such as rm_non_std or map_nonstd_res) can cause ligands to be removed from the structure. If cleanup with Molck is needed, ligands should be kept and passed separately. Non-ligand residues should be valid compounds with atom names following the naming conventions of the component dictionary. Non-standard residues are acceptable, and if the model contains a standard residue at that position, only atoms with matching names will be considered.

Unlike most of OpenStructure, this class does not assume that the ligands (either in the model or the target) are part of the PDB component dictionary. They may have arbitrary residue names. Residue names do not have to match between the model and the target. Matching is based on the calculation of isomorphisms which depend on the atom element name and atom connectivity (bond order is ignored). It is up to the caller to ensure that the connectivity of atoms is properly set before passing any ligands to this class. Ligands with improper connectivity will lead to bogus results.

Note, however, that atom names should be unique within a residue (ie two distinct atoms cannot have the same atom name).

This only applies to the ligand. The rest of the model and target structures (protein, nucleic acids) must still follow the usual rules and contain only residues from the compound library.

Although it isn’t a requirement, hydrogen atoms should be removed from the structures. Here is an example code snippet that will perform a reasonable cleanup. Keep in mind that this is most likely not going to work as expected with entities loaded from PDB files, as the is_ligand flag is probably not set properly.

Here is a snippet example of how to use this code:

from ost.mol.alg.ligand_scoring import LigandScorer
from ost.mol.alg import Molck, MolckSettings

# Load data
# Structure model in PDB format, containing the receptor only
model = io.LoadPDB("path_to_model.pdb")
# Ligand model as SDF file
model_ligand = io.LoadEntity("path_to_ligand.sdf", format="sdf")
# Target loaded from mmCIF, containing the ligand
target = io.LoadMMCIF("path_to_target.cif")

# Cleanup a copy of the structures
cleaned_model = model.Copy()
cleaned_target = target.Copy()
molck_settings = MolckSettings(rm_unk_atoms=True,
                               rm_non_std=False,
                               rm_hyd_atoms=True,
                               rm_oxt_atoms=False,
                               rm_zero_occ_atoms=False,
                               colored=False,
                               map_nonstd_res=False,
                               assign_elem=True)
Molck(cleaned_model, conop.GetDefaultLib(), molck_settings)
Molck(cleaned_target, conop.GetDefaultLib(), molck_settings)

# Setup scorer object and compute lDDT-PLI
model_ligands = [model_ligand.Select("ele != H")]
ls = LigandScorer(model=cleaned_model, target=cleaned_target, model_ligands=model_ligands)
print("lDDT-PLI:", ls.lddt_pli)
print("RMSD:", ls.rmsd)
Parameters:
  • model (ost.mol.EntityHandle/ost.mol.EntityView) – Model structure - a deep copy is available as model. No additional processing (ie. Molck), checks, stereochemistry checks or sanitization is performed on the input. Hydrogen atoms are kept.

  • target (ost.mol.EntityHandle/ost.mol.EntityView) – Target structure - a deep copy is available as target. No additional processing (ie. Molck), checks or sanitization is performed on the input. Hydrogen atoms are kept.

  • model_ligands (list) – Model ligands, as a list of ResidueHandle belonging to the model entity. Can be instantiated with either a :class:list of ResidueHandle/ost.mol.ResidueView or of ost.mol.EntityHandle/ost.mol.EntityView. If None, ligands will be extracted from the model entity, from chains with ChainType CHAINTYPE_NON_POLY (this is normally set properly in entities loaded from mmCIF).

  • target_ligands (list) – Target ligands, as a list of ResidueHandle belonging to the target entity. Can be instantiated either a :class:list of ResidueHandle/ost.mol.ResidueView or of ost.mol.EntityHandle/ost.mol.EntityView containing a single residue each. If None, ligands will be extracted from the target entity, from chains with ChainType CHAINTYPE_NON_POLY (this is normally set properly in entities loaded from mmCIF).

  • resnum_alignments (bool) – Whether alignments between chemically equivalent chains in model and target can be computed based on residue numbers. This can be assumed in benchmarking setups such as CAMEO/CASP.

  • check_resnames (bool) – On by default. Enforces residue name matches between mapped model and target residues.

  • rename_ligand_chain (bool) – If a residue with the same chain name and residue number than an explicitly passed model or target ligand exits in the structure, and rename_ligand_chain is False, a RuntimeError will be raised. If rename_ligand_chain is True, the ligand will be moved to a new chain instead, and the move will be logged to the console with SCRIPT level.

  • chain_mapper (ost.mol.alg.chain_mapping.ChainMapper) – a chain mapper initialized for the target structure. If None (default), a chain mapper will be initialized lazily as required.

  • substructure_match (bool) – Set this to True to allow partial target ligand.

  • radius (float) – Inclusion radius for the binding site. Any residue with atoms within this distance of the ligand will be included in the binding site.

  • lddt_pli_radius (float) – lDDT inclusion radius for lDDT-PLI.

  • lddt_lp_radius (float) – lDDT inclusion radius for lDDT-LP.

  • binding_sites_topn (int) – maximum number of target binding site representations to assess, per target ligand. Ignored if global_chain_mapping is True.

  • global_chain_mapping (bool) – set to True to use a global chain mapping for the polymer (protein, nucleotide) chains. Defaults to False, in which case only local chain mappings are allowed (where different ligand may be scored against different chain mappings).

  • custom_mapping (dict) – Provide custom chain mapping between model and target that is used as global chain mapping. Dictionary with target chain names as key and model chain names as value. Only has an effect if global_chain_mapping is True.

  • rmsd_assignment (bool) – assign ligands based on RMSD only. The default (False) is to use a combination of lDDT-PLI and RMSD for the assignment.

  • n_max_naive (int) – Parameter for global chain mapping. If model and target have less or equal that number of chains, the full mapping solution space is enumerated to find the the optimum. A heuristic is used otherwise.

property chain_mapper

Chain mapper object for the given target.

Type:

ost.mol.alg.chain_mapping.ChainMapper

property rmsd_matrix

Get the matrix of RMSD values.

Target ligands are in rows, model ligands in columns.

NaN values indicate that no RMSD could be computed (i.e. different ligands).

Return type:

ndarray

property lddt_pli_matrix

Get the matrix of lDDT-PLI values.

Target ligands are in rows, model ligands in columns.

NaN values indicate that no lDDT-PLI could be computed (i.e. different ligands).

Return type:

ndarray

property rmsd

Get a dictionary of RMSD score values, keyed by model ligand (chain name, ResNum).

Return type:

dict

property rmsd_details

Get a dictionary of RMSD score details (dictionaries), keyed by model ligand (chain name, ResNum).

Each sub-dictionary contains the following information:

  • rmsd: the RMSD score value.

  • lddt_lp: the lDDT score of the ligand pocket (lDDT-LP).

  • bs_ref_res: a list of residues (ResidueHandle) that define the binding site in the reference.

  • bs_ref_res_mapped: a list of residues (ResidueHandle) in the reference binding site that could be mapped to the model.

  • bs_mdl_res_mapped: a list of residues (ResidueHandle) in the model that were mapped to the reference binding site. The residues are in the same order as bs_ref_res_mapped.

  • bb_rmsd: the RMSD of the binding site backbone after superposition

  • target_ligand: residue handle of the target ligand.

  • model_ligand: residue handle of the model ligand.

  • chain_mapping: local chain mapping as a dictionary, with target chain name as key and model chain name as value.

  • transform: transformation to superpose the model onto the target.

  • substructure_match: whether the score is the result of a partial (substructure) match. A value of True indicates that the target ligand covers only part of the model, while False indicates a perfect match.

  • inconsistent_residues: a list of tuples of mapped residues views (ResidueView) with residue names that differ between the reference and the model, respectively. The list is empty if all residue names match, which is guaranteed if check_resnames=True. Note: more binding site mappings may be explored during scoring, but only inconsistencies in the selected mapping are reported.

Return type:

dict

property lddt_pli

Get a dictionary of lDDT-PLI score values, keyed by model ligand (chain name, ResNum).

Return type:

dict

property lddt_pli_details

Get a dictionary of lDDT-PLI score details (dictionaries), keyed by model ligand (chain name, ResNum).

Each sub-dictionary contains the following information:

  • lddt_pli: the lDDT-PLI score value.

  • rmsd: the RMSD score value corresponding to the lDDT-PLI chain mapping and assignment. This may differ from the RMSD-based assignment. Note that a different isomorphism than lddt_pli may be used.

  • lddt_lp: the lDDT score of the ligand pocket (lDDT-LP).

  • lddt_pli_n_contacts: number of total contacts used in lDDT-PLI, summed over all thresholds. Can be divided by 8 to obtain the number of atomic contacts.

  • bs_ref_res: a list of residues (ResidueHandle) that define the binding site in the reference.

  • bs_ref_res_mapped: a list of residues (ResidueHandle) in the reference binding site that could be mapped to the model.

  • bs_mdl_res_mapped: a list of residues (ResidueHandle) in the model that were mapped to the reference binding site. The residues are in the same order as bs_ref_res_mapped.

  • bb_rmsd: the RMSD of the binding site backbone after superposition. Note: not used for lDDT-PLI computation.

  • target_ligand: residue handle of the target ligand.

  • model_ligand: residue handle of the model ligand.

  • chain_mapping: local chain mapping as a dictionary, with target chain name as key and model chain name as value.

  • transform: transformation to superpose the model onto the target (for RMSD only).

  • substructure_match: whether the score is the result of a partial (substructure) match. A value of True indicates that the target ligand covers only part of the model, while False indicates a perfect match.

  • inconsistent_residues: a list of tuples of mapped residues views (ResidueView) with residue names that differ between the reference and the model, respectively. The list is empty if all residue names match, which is guaranteed if check_resnames=True. Note: more binding site mappings may be explored during scoring, but only inconsistencies in the selected mapping are reported.

Return type:

dict

SCRMSD(model_ligand, target_ligand, transformation=geom.Mat4(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1), substructure_match=False)

Calculate symmetry-corrected RMSD.

Binding site superposition must be computed separately and passed as transformation.

Parameters:
Return type:

float

Raises:

NoSymmetryError when no symmetry can be found.

exception NoSymmetryError

Exception to be raised when no symmetry can be found.

chain_mapping – Chain Mapping

Chain mapping aims to identify a one-to-one relationship between chains in a reference structure and a model.

class ChainMapper(target, resnum_alignments=False, pep_seqid_thr=95.0, pep_gap_thr=1.0, nuc_seqid_thr=95.0, nuc_gap_thr=1.0, pep_subst_mat=<ost.seq.alg._ost_seq_alg.SubstWeightMatrix object>, pep_gap_open=-11, pep_gap_ext=-1, nuc_subst_mat=<ost.seq.alg._ost_seq_alg.SubstWeightMatrix object>, nuc_gap_open=-4, nuc_gap_ext=-4, min_pep_length=10, min_nuc_length=4, n_max_naive=100000000.0)

Class to compute chain mappings

All algorithms are performed on processed structures which fulfill criteria as given in constructor arguments (min_pep_length, “min_nuc_length”) and only contain residues which have all required backbone atoms. for peptide residues thats N, CA, C and CB (no CB for GLY), for nucleotide residues thats O5’, C5’, C4’, C3’ and O3’.

Chain mapping is a three step process:

  • Group chemically identical chains in target using pairwise alignments that are either computed with Needleman-Wunsch (NW) or simply derived from residue numbers (resnum_alignments flag). In case of NW, pep_subst_mat, pep_gap_open and pep_gap_ext and their nucleotide equivalents are relevant. Two chains are considered identical if they fulfill the thresholds given by pep_seqid_thr, pep_gap_thr, their nucleotide equivalents respectively. The grouping information is available as attributes of this class.

  • Map chains in an input model to these groups. Generating alignments and the similarity criteria are the same as above. You can either get the group mapping with GetChemMapping() or directly call one of the full fletched one-to-one chain mapping functions which execute that step internally.

  • Obtain one-to-one mapping for chains in an input model and target with one of the available mapping functions. Just to get an idea of complexity. If target and model are octamers, there are 8! = 40320 possible chain mappings.

Parameters:
  • target (ost.mol.EntityView/ost.mol.EntityHandle) – Target structure onto which models are mapped. Computations happen on a selection only containing polypeptides and polynucleotides.

  • resnum_alignments (bool) – Use residue numbers instead of Needleman-Wunsch to compute pairwise alignments. Relevant for chem_groups and related attributes.

  • pep_seqid_thr (float) – Threshold used to decide when two chains are identical. 95 percent tolerates the few mutations crystallographers like to do.

  • pep_gap_thr (float) – Additional threshold to avoid gappy alignments with high seqid. By default this is disabled (set to 1.0). This threshold checks for a maximum allowed fraction of gaps in any of the two sequences after stripping terminal gaps. The reason for not just normalizing seqid by the longer sequence is that one sequence might be a perfect subsequence of the other but only cover half of it.

  • nuc_seqid_thr (float) – Nucleotide equivalent for pep_seqid_thr

  • nuc_gap_thr (float) – Nucleotide equivalent for nuc_gap_thr

  • pep_subst_mat (ost.seq.alg.SubstWeightMatrix) – Substitution matrix to align peptide sequences, irrelevant if resnum_alignments is True, defaults to seq.alg.BLOSUM62

  • pep_gap_open (int) – Gap open penalty to align peptide sequences, irrelevant if resnum_alignments is True

  • pep_gap_ext (int) – Gap extension penalty to align peptide sequences, irrelevant if resnum_alignments is True

  • nuc_subst_mat (ost.seq.alg.SubstWeightMatrix) – Nucleotide equivalent for pep_subst_mat, defaults to seq.alg.NUC44

  • nuc_gap_open (int) – Nucleotide equivalent for pep_gap_open

  • nuc_gap_ext (int) – Nucleotide equivalent for pep_gap_ext

  • min_pep_length (int) – Minimal number of residues for a peptide chain to be considered in target and in models.

  • min_nuc_length (int) – Minimal number of residues for a nucleotide chain to be considered in target and in models.

  • n_max_naive (int) – Max possible chain mappings that are enumerated in GetNaivelDDTMapping() / GetDecomposerlDDTMapping(). A RuntimeError is raised in case of bigger complexity.

property target

Target structure that only contains peptides/nucleotides

Contains only residues that have the backbone representatives (CA for peptide and C3’ for nucleotides) to avoid ATOMSEQ alignment inconsistencies when switching between all atom and backbone only representations.

Type:

ost.mol.EntityView

property polypep_seqs

Sequences of peptide chains in target

Respective EntityView from target for each sequence s are available as s.GetAttachedView()

Type:

ost.seq.SequenceList

property polynuc_seqs

Sequences of nucleotide chains in target

Respective EntityView from target for each sequence s are available as s.GetAttachedView()

Type:

ost.seq.SequenceList

property chem_groups

Groups of chemically equivalent chains in target

First chain in group is the one with longest sequence.

Getter:

Computed on first use (cached)

Type:

list of list of str (chain names)

property chem_group_alignments

MSA for each group in chem_groups

Sequences in MSAs exhibit same order as in chem_groups and have the respective ost.mol.EntityView from target attached.

Getter:

Computed on first use (cached)

Type:

ost.seq.AlignmentList

property chem_group_ref_seqs

Reference (longest) sequence for each group in chem_groups

Respective EntityView from target for each sequence s are available as s.GetAttachedView()

Getter:

Computed on first use (cached)

Type:

ost.seq.SequenceList

property chem_group_types

ChemType of each group in chem_groups

Specifying if groups are poly-peptides/nucleotides, i.e. ost.mol.ChemType.AMINOACIDS or ost.mol.ChemType.NUCLEOTIDES

Getter:

Computed on first use (cached)

Type:

list of ost.mol.ChemType

GetChemMapping(model)

Maps sequences in model to chem_groups of target

Parameters:

model (ost.mol.EntityView/ost.mol.EntityHandle) – Model from which to extract sequences, a selection that only includes peptides and nucleotides is performed and returned along other results.

Returns:

Tuple with two lists of length len(self.chem_groups) and an ost.mol.EntityView representing model: 1) Each element is a list with mdl chain names that map to the chem group at that position. 2) Each element is a ost.seq.AlignmentList aligning these mdl chain sequences to the chem group ref sequences. 3) A selection of model that only contains polypeptides and polynucleotides whose ATOMSEQ exactly matches the sequence info in the returned alignments.

GetlDDTMapping(model, inclusion_radius=15.0, thresholds=[0.5, 1.0, 2.0, 4.0], strategy='naive', steep_opt_rate=None, full_n_mdl_chains=None, block_seed_size=5, block_blocks_per_chem_group=5, chem_mapping_result=None)

Identify chain mapping by optimizing lDDT score

Maps model chain sequences to chem_groups and find mapping based on backbone only lDDT score (CA for amino acids C3’ for Nucleotides).

Either performs a naive search, i.e. enumerate all possible mappings or executes a greedy strategy that tries to identify a (close to) optimal mapping in an iterative way by starting from a start mapping (seed). In each iteration, the one-to-one mapping that leads to highest increase in number of conserved contacts is added with the additional requirement that this added mapping must have non-zero interface counts towards the already mapped chains. So basically we’re “growing” the mapped structure by only adding connected stuff.

The available strategies:

  • naive: Enumerates all possible mappings and returns best

  • greedy_fast: perform all vs. all single chain lDDTs within the respective ref/mdl chem groups. The mapping with highest number of conserved contacts is selected as seed for greedy extension

  • greedy_full: try multiple seeds for greedy extension, i.e. try all ref/mdl chain combinations within the respective chem groups and retain the mapping leading to the best lDDT. Optionally, you can reduce the number of mdl chains per ref chain to the full_n_mdl_chains best scoring ones.

  • greedy_block: try multiple seeds for greedy extension, i.e. try all ref/mdl chain combinations within the respective chem groups and compute single chain lDDTs. The block_blocks_per_chem_group best scoring ones are extend by block_seed_size chains and the best scoring one is exhaustively extended.

Sets MappingResult.opt_score in case of no trivial one-to-one mapping.

Parameters:
  • model (ost.mol.EntityView/ost.mol.EntityHandle) – Model to map

  • inclusion_radius (float) – Inclusion radius for lDDT

  • thresholds (list of float) – Thresholds for lDDT

  • strategy (str) – Strategy to find mapping. Must be in [“naive”, “greedy_fast”, “greedy_full”, “greedy_block”]

  • steep_opt_rate (int) – Only relevant for greedy strategies. If set, every steep_opt_rate mappings, a simple optimization is executed with the goal of avoiding local minima. The optimization iteratively checks all possible swaps of mappings within their respective chem groups and accepts swaps that improve lDDT score. Iteration stops as soon as no improvement can be achieved anymore.

  • full_n_mdl_chains (int) – Param for greedy_full strategy - Max number of mdl chains that are tried per ref chain. The default (None) tries all of them.

  • block_seed_size (int) – Param for greedy_block strategy - Initial seeds are extended by that number of chains.

  • block_blocks_per_chem_group (int) – Param for greedy_block strategy - Number of blocks per chem group that are extended in an initial search for high scoring local solutions.

  • chem_mapping_result (tuple) – Pro param. The result of GetChemMapping() where you provided model. If set, model parameter is not used.

Returns:

A MappingResult

GetQSScoreMapping(model, contact_d=12.0, strategy='naive', full_n_mdl_chains=None, block_seed_size=5, block_blocks_per_chem_group=5, steep_opt_rate=None, chem_mapping_result=None)

Identify chain mapping based on QSScore

Scoring is based on CA/C3’ positions which are present in all chains of a chem_groups as well as the model chains which are mapped to that respective chem group. QS score is not defined for single chains. The greedy strategies that require to identify starting seeds thus often rely on single chain lDDTs.

The following strategies are available:

  • naive: Naively iterate all possible mappings and return best based

    on QS score.

  • greedy_fast: perform all vs. all single chain lDDTs within the respective ref/mdl chem groups. The mapping with highest number of conserved contacts is selected as seed for greedy extension. Extension is based on QS score.

  • greedy_full: try multiple seeds for greedy extension, i.e. try all ref/mdl chain combinations within the respective chem groups and retain the mapping leading to the best QS score. Optionally, you can reduce the number of mdl chains per ref chain to the full_n_mdl_chains best scoring with respect to single chain lDDT.

  • greedy_block: try multiple seeds for greedy extension, i.e. try all ref/mdl chain combinations within the respective chem groups and compute single chain lDDTs. The block_blocks_per_chem_group best scoring ones are extend by block_seed_size chains and the block with with best QS score is exhaustively extended.

Sets MappingResult.opt_score in case of no trivial one-to-one mapping.

Parameters:
  • model (ost.mol.EntityView/ost.mol.EntityHandle) – Model to map

  • contact_d (float) – Max distance between two residues to be considered as contact in qs scoring

  • strategy (str) – Strategy for sampling, must be in [“naive”]

  • chem_mapping_result (tuple) – Pro param. The result of GetChemMapping() where you provided model. If set, model parameter is not used.

Returns:

A MappingResult

GetRigidMapping(model, strategy='greedy_single_gdtts', single_chain_gdtts_thresh=0.4, subsampling=None, first_complete=False, iterative_superposition=False, chem_mapping_result=None)

Identify chain mapping based on rigid superposition

Superposition and scoring is based on CA/C3’ positions which are present in all chains of a chem_groups as well as the model chains which are mapped to that respective chem group.

Transformations to superpose model onto ChainMapper.target are estimated using all possible combinations of target and model chains within the same chem groups and build the basis for further extension.

There are four extension strategies:

  • greedy_single_gdtts: Iteratively add the model/target chain pair that adds the most conserved contacts based on the GDT-TS metric (Number of CA/C3’ atoms within [8, 4, 2, 1] Angstrom). The mapping with highest GDT-TS score is returned. However, that mapping is not guaranteed to be complete (see single_chain_gdtts_thresh).

  • greedy_iterative_gdtts: Same as greedy_single_gdtts except that the transformation gets updated with each added chain pair.

  • greedy_single_rmsd: Conceptually similar to greedy_single_gdtts but the added chain pairs are the ones with lowest RMSD. The mapping with lowest overall RMSD gets returned. single_chain_gdtts_thresh is only applied to derive the initial transformations. After that, the minimal RMSD chain pair gets iteratively added without applying any threshold.

  • greedy_iterative_rmsd: Same as greedy_single_rmsd exept that the transformation gets updated with each added chain pair. single_chain_gdtts_thresh is only applied to derive the initial transformations. After that, the minimal RMSD chain pair gets iteratively added without applying any threshold.

Parameters:
  • model (ost.mol.EntityView/ost.mol.EntityHandle) – Model to map

  • strategy (str) – Strategy to extend mappings from initial transforms, see description above. Must be in [“greedy_single”, “greedy_iterative”, “greedy_iterative_rmsd”]

  • single_chain_gdtts_thresh (float) – Minimal GDT-TS score for model/target chain pair to be added to mapping. Mapping extension for a given transform stops when no pair fulfills this threshold, potentially leading to an incomplete mapping.

  • subsampling (int) – If given, only use an equally distributed subset of all CA/C3’ positions for superposition/scoring.

  • first_complete (bool) – Avoid full enumeration and return first found mapping that covers all model chains or all target chains. Has no effect on greedy_iterative_rmsd strategy.

  • iterative_superposition (bool) – Whether to compute inital transformations with ost.mol.alg.IterativeSuperposeSVD() as oposed to ost.mol.alg.SuperposeSVD()

  • chem_mapping_result (tuple) – Pro param. The result of GetChemMapping() where you provided model. If set, model parameter is not used.

Returns:

A MappingResult

GetMapping(model, n_max_naive=12)

Convenience function to get mapping with currently preferred method

If number of chains in model and target are <= n_max_naive, a naive QS-score mapping is performed. For anything else, a QS-score mapping with the greedy_block strategy is performed (steep_opt_rate = 3, block_seed_size = 5, block_blocks_per_chem_group = 6).

GetRepr(substructure, model, topn=1, inclusion_radius=15.0, thresholds=[0.5, 1.0, 2.0, 4.0], bb_only=False, only_interchain=False, chem_mapping_result=None, global_mapping=None)

Identify topn representations of substructure in model

substructure defines a subset of target for which one wants the topn representations in model. Representations are scored and sorted by lDDT.

Parameters:
  • substructure (ost.mol.EntityView) – A ost.mol.EntityView which is a subset of target. Should be selected with the OpenStructure query language. Example: if you’re interested in residues with number 42,43 and 85 in chain A: substructure=mapper.target.Select("cname=A and rnum=42,43,85") A RuntimeError is raised if substructure does not refer to the same underlying ost.mol.EntityHandle as target.

  • model (ost.mol.EntityView/ost.mol.EntityHandle) – Structure in which one wants to find representations for substructure

  • topn (int) – Max number of representations that are returned

  • inclusion_radius (float) – Inclusion radius for lDDT

  • thresholds (list of float) – Thresholds for lDDT

  • bb_only (bool) – Only consider backbone atoms in lDDT computation

  • only_interchain (bool) – Only score interchain contacts in lDDT. Useful if you want to identify interface patches.

  • chem_mapping_result (tuple) – Pro param. The result of GetChemMapping() where you provided model. If set, model parameter is not used.

  • global_mapping (MappingResult) – Pro param. Specify a global mapping result. This fully defines the desired representation in the model but extracts it and enriches it with all the nice attributes of ReprResult. The target attribute in global_mapping must be of the same entity as self.target and the model attribute of global_mapping must be of the same entity as model.

Returns:

list of ReprResult

GetNMappings(model)

Returns number of possible mappings

Parameters:

model (ost.mol.EntityView/ost.mol.EntityHandle) – Model with chains that are mapped onto chem_groups

ProcessStructure(ent)

Entity processing for chain mapping

  • Selects view containing peptide and nucleotide residues which have required backbone atoms present - for peptide residues thats N, CA, C and CB (no CB for GLY), for nucleotide residues thats O5’, C5’, C4’, C3’ and O3’.

  • filters view by chain lengths, see min_pep_length and min_nuc_length in constructor

  • Extracts atom sequences for each chain in that view

  • Attaches corresponding ost.mol.EntityView to each sequence

  • If residue number alignments are used, strictly increasing residue numbers without insertion codes are ensured in each chain

Parameters:

ent (ost.mol.EntityView/ost.mol.EntityHandle) – Entity to process

Returns:

Tuple with 3 elements: 1) ost.mol.EntityView containing peptide and nucleotide residues 2) ost.seq.SequenceList containing ATOMSEQ sequences for each polypeptide chain in returned view, sequences have ost.mol.EntityView of according chains attached 3) same for polynucleotide chains

Align(s1, s2, stype)

Access to internal sequence alignment functionality

Alignment parameterization is setup at ChainMapper construction

Parameters:
  • s1 (ost.seq.SequenceHandle) – First sequence to align - must have view attached in case of resnum_alignments

  • s2 (ost.seq.SequenceHandle) – Second sequence to align - must have view attached in case of resnum_alignments

  • stype – Type of sequences to align, must be in [ost.mol.ChemType.AMINOACIDS, ost.mol.ChemType.NUCLEOTIDES]

Returns:

Pairwise alignment of s1 and s2

class ReprResult(lDDT, substructure, ref_view, mdl_view)

Result object for ChainMapper.GetRepr()

Constructor is directly called within the function, no need to construct such objects yourself.

Parameters:
  • lDDT (float) – lDDT for this mapping. Depends on how you call ChainMapper.GetRepr() whether this is backbone only or full atom lDDT.

  • substructure (ost.mol.EntityView) – The full substructure for which we searched for a representation

  • ref_view (mol.EntityView) – View pointing to the same underlying entity as substructure but only contains the stuff that is mapped

  • mdl_view (mol.EntityView) – The matching counterpart in model

property lDDT

lDDT of representation result

Depends on how you call ChainMapper.GetRepr() whether this is backbone only or full atom lDDT.

Type:

float

property substructure

The full substructure for which we searched for a representation

Type:

ost.mol.EntityView

property ref_view

View which contains the mapped subset of substructure

Type:

ost.mol.EntityView

property mdl_view

The ref_view representation in the model

Type:

ost.mol.EntityView

property ref_residues

The reference residues

Type:

class:mol.ResidueViewList

property mdl_residues

The model residues

Type:

mol.ResidueViewList

property inconsistent_residues

A list of mapped residue whose names do not match (eg. ALA in the reference and LEU in the model).

The mismatches are reported as a tuple of ResidueView (reference, model), or as an empty list if all the residue names match.

Type:

list

property ref_bb_pos

Representative backbone positions for reference residues.

Thats CA positions for peptides and C3’ positions for Nucleotides.

Type:

geom.Vec3List

property mdl_bb_pos

Representative backbone positions for model residues.

Thats CA positions for peptides and C3’ positions for Nucleotides.

Type:

geom.Vec3List

property ref_full_bb_pos

Representative backbone positions for reference residues.

Thats N, CA and C positions for peptides and O5’, C5’, C4’, C3’, O3’ positions for Nucleotides.

Type:

geom.Vec3List

property mdl_full_bb_pos

Representative backbone positions for reference residues.

Thats N, CA and C positions for peptides and O5’, C5’, C4’, C3’, O3’ positions for Nucleotides.

Type:

geom.Vec3List

property transform

Transformation to superpose mdl residues onto ref residues

Superposition computed as minimal RMSD superposition on ref_bb_pos and mdl_bb_pos. If number of positions is smaller 3, the full_bb_pos equivalents are used instead.

Type:

ost.geom.Mat4

property superposed_mdl_bb_pos

mdl_bb_pos with transform applied

Type:

geom.Vec3List

property bb_rmsd

RMSD between ref_bb_pos and superposed_mdl_bb_pos

Type:

float

property gdt_8

GDT with one single threshold: 8.0

Type:

float

property gdt_4

GDT with one single threshold: 4.0

Type:

float

property gdt_2

GDT with one single threshold: 2.0

Type:

float

property gdt_1

GDT with one single threshold: 1.0

Type:

float

property ost_query

query for mdl residues in OpenStructure query language

Repr can be selected as full_mdl.Select(ost_query)

Returns invalid query if residue numbers have insertion codes.

Type:

str

JSONSummary()

Returns JSON serializable summary of results

GetFlatChainMapping(mdl_as_key=False)

Returns flat mapping of all chains in the representation

Parameters:

mdl_as_key – Default is target chain name as key and model chain name as value. This can be reversed with this flag.

Returns:

dict with str as key/value that describe one-to-one mapping

class MappingResult(target, model, chem_groups, chem_mapping, mapping, alns, opt_score=None)

Result object for the chain mapping functions in ChainMapper

Constructor is directly called within the functions, no need to construct such objects yourself.

property target

Target/reference structure, i.e. ChainMapper.target

Type:

ost.mol.EntityView

property model

Model structure that gets mapped onto target

Underwent same processing as ChainMapper.target, i.e. only contains peptide/nucleotide chains of sufficient size.

Type:

ost.mol.EntityView

property chem_groups

Groups of chemically equivalent chains in target

Same as ChainMapper.chem_group

list of list of str (chain names)

property chem_mapping

Assigns chains in model to chem_groups.

list of list of str (chain names)

property mapping

Mapping of model chains onto target

Exact same shape as chem_groups but containing the names of the mapped chains in model. May contain None for target chains that are not covered. No guarantee that all chains in model are mapped.

list of list of str (chain names)

property alns

Alignments of mapped chains in target and model

Each alignment is accessible with alns[(t_chain,m_chain)]. First sequence is the sequence of target chain, second sequence the one from model. The respective ost.mol.EntityView are attached with ost.seq.ConstSequenceHandle.AttachView().

Type:

dict with key: tuple of str, value: ost.seq.AlignmentHandle

property opt_score

Placeholder property without any guarantee of being set

Different scores get optimized in the various chain mapping algorithms. Some of them may set their final optimal score in that property. Consult the documentation of the respective chain mapping algorithm for more information. Won’t be in the return dict of JSONSummary().

GetFlatMapping(mdl_as_key=False)

Returns flat mapping as dict for all mapable chains

Parameters:

mdl_as_key – Default is target chain name as key and model chain name as value. This can be reversed with this flag.

Returns:

dict with str as key/value that describe one-to-one mapping

JSONSummary()

Returns JSON serializable summary of results

qsscore – New QS score implementation

Note

This is a new implementation of QS Score, introduced in OpenStructure 2.4 and tightly integrated with the chain mapping algorithms. The previous qsscoring code that comes with Bertoni et al. is considered deprecated.

class QSEntity(ent, contact_d=12.0)

Helper object for QS-score computation

Holds structural information and getters for interacting chains, i.e. interfaces. Peptide residues are represented by their CB position (CA for GLY) and nucleotides by C3’.

Parameters:
property view

Processed structure

View that only contains representative atoms. That’s CB for peptide residues (CA for GLY) and C3’ for nucleotides.

Type:

ost.mol.EntityView

property contact_d

Pairwise distance of residues to be considered as contacts

Given at QSEntity construction

Type:

float

property chain_names

Chain names in view

Names are sorted

Type:

list of str

property interacting_chains

Pairs of chains in view with at least one contact

Type:

list of tuples

GetChain(chain_name)

Get chain by name

Parameters:

chain_name (str) – Chain in view

GetSequence(chain_name)

Get sequence of chain

Returns sequence of specified chain as raw str

Parameters:

chain_name (str) – Chain in view

GetPos(chain_name)

Get representative positions of chain

That’s CB positions for peptide residues (CA for GLY) and C3’ for nucleotides. Returns positions as a numpy array of shape (n_residues, 3).

Parameters:

chain_name (str) – Chain in view

PairDist(chain_name_one, chain_name_two)

Get pairwise distances between two chains

Returns distances as numpy array of shape (a, b). Where a is the number of residues of the chain that comes BEFORE the other in chain_names

class QSScorer(target, chem_groups, model, alns, contact_d=12.0)

Helper object to compute QS-score

Tightly integrated into the mechanisms from the chain_mapping module. The prefered way to derive an object of type QSScorer is through the static constructor: FromMappingResult(). Example score computation including mapping:

from ost.mol.alg.qsscore import QSScorer
from ost.mol.alg.chain_mapping import ChainMapper

ent_1 = io.LoadPDB("path_to_assembly_1.pdb")
ent_2 = io.LoadPDB("path_to_assembly_2.pdb")

chain_mapper = ChainMapper(ent_1)
mapping_result = chain_mapper.GetlDDTMapping(ent_2)
qs_scorer = QSScorer.FromMappingResult(mapping_result)
score_result = qs_scorer.Score(mapping_result.mapping)
print("score:", score_result.QS_global)

QS-score computation in QSScorer.Score() implements caching. Repeated computations with alternative chain mappings thus become faster.

Parameters:
static FromMappingResult(mapping_result)

The preferred way to get a QSScorer

Static constructor that derives an object of type QSScorer using a ost.mol.alg.chain_mapping.MappingResult

Parameters:

mapping_result (ost.mol.alg.chain_mapping.MappingResult) – Data source

property qsent1

Represents target

Type:

QSEntity

property chem_groups

Groups of chemically equivalent chains in target

Provided at object construction

Type:

list of list of str

property qsent2

Represents model

Type:

QSEntity

property alns

Alignments between chains in qsent1 and qsent2

Provided at object construction. Each alignment is accessible with alns[(t_chain,m_chain)]. First sequence is the sequence of the respective chain in qsent1, second sequence the one from qsent2.

Type:

dict with key: tuple of str, value: ost.seq.AlignmentHandle

Score(mapping, check=True)

Computes QS-score given chain mapping

Again, the preferred way is to get mapping is from an object of type ost.mol.alg.chain_mapping.MappingResult.

Parameters:
Returns:

Result object of type QSScorerResult

ScoreInterface(trg_ch1, trg_ch2, mdl_ch1, mdl_ch2)

Computes QS-score only considering one interface

This only works for interfaces that are computed in Score(), i.e. interfaces for which the alignments are set up correctly.

Parameters:
  • trg_ch1 (str) – Name of first interface chain in target

  • trg_ch2 (str) – Name of second interface chain in target

  • mdl_ch1 (str) – Name of first interface chain in model

  • mdl_ch2 (str) – Name of second interface chain in model

Returns:

Result object of type QSScorerResult

Raises:

RuntimeError if no aln for trg_ch1/mdl_ch1 or trg_ch2/mdl_ch2 is available.

FromFlatMapping(flat_mapping)

Same as Score() but with flat mapping

Parameters:

flat_mapping (dict with str as key and value) – Dictionary with target chain names as keys and the mapped model chain names as value

Returns:

Result object of type QSScorerResult

DockQ – DockQ implementation

DockQ(mdl, ref, mdl_ch1, mdl_ch2, ref_ch1, ref_ch2, ch1_aln=None, ch2_aln=None)

Computes DockQ for specified interface

DockQ is described in: Sankar Basu and Bjoern Wallner (2016), “DockQ: A Quality Measure for Protein-Protein Docking Models”, PLOS one

Residues are mapped based on residue numbers by default. If you provide ch1_aln and ch2_aln you can enforce an arbitrary mapping.

Parameters:
  • mdl (ost.mol.EntityView/ost.mol.EntityHandle) – Model structure

  • ref (ost.mol.EntityView/ost.mol.EntityHandle) – Reference structure, i.e. native structure

  • mdl_ch1 (str) – Specifies chain in model constituting first part of interface

  • mdl_ch2 (str) – Specifies chain in model constituting second part of interface

  • ref_ch1 (str) – ref equivalent of mdl_ch1

  • ref_ch2 (str) – ref equivalent of mdl_ch2

  • ch1_aln (ost.seq.AlignmentHandle) – Alignment with two sequences to map ref_ch1 and mdl_ch1. The first sequence must match the sequence in ref_ch1 and the second to mdl_ch1.

  • ch2_aln (ost.seq.AlignmentHandle) – Alignment with two sequences to map ref_ch2 and mdl_ch2. The first sequence must match the sequence in ref_ch2 and the second to mdl_ch2.

Returns:

dict with keys nnat, nmdl, fnat, fnonnat, irmsd, lrmsd, DockQ which corresponds to the equivalent values in the original DockQ implementation.

Steric Clashes

The following function detects steric clashes in atomic structures. Two atoms are clashing if their euclidian distance is smaller than a threshold value (minus a tolerance offset).

FilterClashes(entity, clashing_distances, always_remove_bb=False)

This function filters out residues with non-bonded clashing atoms. If the clashing atom is a backbone atom, the complete residue is removed from the structure, if the atom is part of the sidechain, only the sidechain atoms are removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True the whole residue is removed even if a clash is just detected in the side-chain.

The function returns a view containing all elements (residues, atoms) that have not been removed from the input structure, plus a ClashingInfo object containing information about the detected clashes.

Two atoms are defined as clashing if their distance is shorter than the reference distance minus a tolerance threshold. The information about the clashing distances and the tolerance thresholds for all possible pairs of atoms is passed to the function as a parameter.

Hydrogen and deuterium atoms are ignored by this function.

Parameters:
  • entity (EntityView or EntityHandle) – The input entity

  • clashing_distances (ClashingDistances) – information about the clashing distances

  • always_remove_bb (bool) – if set to True, the whole residue is removed even if the clash happens in the side-chain

Returns:

A tuple of two elements: The filtered EntityView, and a ClashingInfo object

CheckStereoChemistry(entity, bond_stats, angle_stats, bond_tolerance, angle_tolerance, always_remove_bb=False)

This function filters out residues with severe stereo-chemical violations. If the violation involves a backbone atom, the complete residue is removed from the structure, if it involves an atom that is part of the sidechain, only the sidechain is removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True the whole residue is removed even if a violation is just detected in the side-chain.

The function returns a view containing all elements (residues, atoms) that have not been removed from the input structure, plus a StereoChemistryInfo object containing information about the detected stereo-chemical violations.

A violation is defined as a bond length that lies outside of the range: [mean_length-std_dev*bond_tolerance, mean_length+std_dev*bond_tolerance] or an angle width outside of the range [mean_width-std_dev*angle_tolerance, mean_width+std_dev*angle_tolerance ]. The information about the mean lengths and widths and the corresponding standard deviations is passed to the function using two parameters.

Hydrogen and deuterium atoms are ignored by this function.

Parameters:
  • entity (EntityView or EntityHandle) – The input entity

  • bond_stats (StereoChemicalParams) – statistics about bond lengths

  • angle_stats (StereoChemicalParams) – statistics about angle widths

  • bond_tolerance (float) – tolerance for bond lengths (in standard deviations)

  • angle_tolerance (float) – tolerance for angle widths (in standard deviations)

  • always_remove_bb (bool) – if set to True, the whole residue is removed even if the clash happens in the side-chain

Returns:

A tuple of two elements: The filtered EntityView, and a StereoChemistryInfo object

class ClashingInfo

This object is returned by the FilterClashes() function, and contains information about the clashes detected by the function.

GetClashCount()
Returns:

number of clashes between non-bonded atoms detected in the input structure

GetAverageOffset()
Returns:

a value in Angstroms representing the average offset by which clashing atoms lie closer than the minimum acceptable distance (which of course differs for each possible pair of elements)

GetClashList()
Returns:

list of detected inter-atomic clashes

Return type:

list of ClashEvent

class ClashEvent

This object contains all the information relative to a single clash detected by the FilterClashes() function

GetFirstAtom()
GetSecondAtom()
Returns:

atoms which clash

Return type:

UniqueAtomIdentifier

GetModelDistance()
Returns:

distance (in Angstroms) between the two clashing atoms as observed in the model

GetAdjustedReferenceDistance()
Returns:

minimum acceptable distance (in Angstroms) between the two atoms involved in the clash, as defined in ClashingDistances

class StereoChemistryInfo

This object is returned by the CheckStereoChemistry() function, and contains information about bond lengths and planar angle widths in the structure that diverge from the parameters tabulated by Engh and Huber in the International Tables of Crystallography. Only elements that diverge from the tabulated value by a minimumnumber of standard deviations (defined when the CheckStereoChemistry function is called) are reported.

GetBadBondCount()
Returns:

number of bonds where a serious violation was detected

GetBondCount()
Returns:

total number of bonds in the structure checked by the CheckStereoChemistry function

GetAvgZscoreBonds()
Returns:

average z-score of all the bond lengths in the structure, computed using Engh and Huber’s mean and standard deviation values

GetBadAngleCount()
Returns:

number of planar angles where a serious violation was detected

GetAngleCount()
Returns:

total number of planar angles in the structure checked by the CheckStereoChemistry function

GetAvgZscoreAngles()
Returns:

average z-score of all the planar angle widths, computed using Engh and Huber’s mean and standard deviation values.

GetBondViolationList()
Returns:

list of bond length violations detected in the structure

Return type:

list of StereoChemicalBondViolation

GetAngleViolationList()
Returns:

list of angle width violations detected in the structure

Return type:

list of StereoChemicalAngleViolation

class StereoChemicalBondViolation

This object contains all the information relative to a single detected violation of stereo-chemical parameters in a bond length

GetFirstAtom()
GetSecondAtom()
Returns:

first / second atom of the bond

Return type:

UniqueAtomIdentifier

GetBondLength()
Returns:

length of the bond (in Angstroms) as observed in the model

GetAllowedRange()
Returns:

allowed range of bond lengths (in Angstroms), according to Engh and Huber’s tabulated parameters and the tolerance threshold used when the CheckStereoChemistry() function was called

Return type:

tuple (minimum and maximum)

class StereoChemicalAngleViolation

This object contains all the information relative to a single detected violation of stereo-chemical parameters in a planar angle width

GetFirstAtom()
GetSecondAtom()
GetThirdAtom()
Returns:

first / second (vertex) / third atom that defines the planar angle

Return type:

UniqueAtomIdentifier

GetAngleWidth()
Returns:

width of the planar angle (in degrees) as observed in the model

GetAllowedRange()
Returns:

allowed range of angle widths (in degrees), according to Engh and Huber’s tabulated parameters and the tolerance threshold used when the CheckStereoChemistry() function was called

Return type:

tuple (minimum and maximum)

class ClashingDistances

Object containing information about clashing distances between non-bonded atoms

ClashingDistances()

Creates an empty distance list

SetClashingDistance(ele1, ele2, clash_distance, tolerance)

Adds or replaces an entry in the list

Parameters:
  • ele1 – string containing the first element’s name

  • ele2 – string containing the second element’s name

  • clash_distance – minimum clashing distance (in Angstroms)

  • tolerance – tolerance threshold (in Angstroms)

GetClashingDistance(ele1, ele2)
Returns:

reference distance and a tolerance threshold (both in Angstroms) for two elements

Return type:

tuple (minimum clashing distance, tolerance threshold)

Parameters:
  • ele1 – string containing the first element’s name

  • ele2 – string containing the second element’s name

GetAdjustedClashingDistance(ele1, ele2)
Returns:

reference distance (in Angstroms) for two elements, already adjusted by the tolerance threshold

Parameters:
  • ele1 – string containing the first element’s name

  • ele2 – string containing the second element’s name

GetMaxAdjustedDistance()
Returns:

longest clashing distance (in Angstroms) in the list, after adjustment with tolerance threshold

IsEmpty()
Returns:

True if the list is empty (i.e. in an invalid, useless state)

PrintAllDistances()

Prints all distances in the list to standard output

class StereoChemicalParams

Object containing stereo-chemical information about bonds and angles. For each item (bond or angle in a specific residue), stores the mean and standard deviation

StereoChemicalParams()

Creates an empty parameter list

SetParam(item, residue, mean, standard_dev)

Adds or replaces an entry in the list

Parameters:
  • item – string defining a bond (format: X-Y) or an angle (format: X-Y-Z), where X,Y an Z are atom names

  • residue – string containing the residue type for this entry

  • mean – mean bond length (in Angstroms) or angle width (in degrees)

  • standard_dev – standard deviation of the bond length (in Angstroms) or of the angle width (in degrees)

IsEmpty()
Returns:

True if the list is empty (i.e. in an invalid, useless state)

PrintAllParameters()

Prints all entries in the list to standard output

FillClashingDistances(file_content)
FillBondStereoChemicalParams(file_content)
FillAngleStereoChemicalParams(file_content)

These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from the content of a parameter file.

Parameters:

file_content (list of str) – list of lines from the parameter file

Return type:

ClashingDistances or StereoChemicalParams

FillClashingDistancesFromFile(filename)
FillBondStereoChemicalParamsFromFile(filename)
FillAngleStereoChemicalParamsFromFile(filename)

These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from a file path.

Parameters:

filename (str) – path to parameter file

Return type:

ClashingDistances or StereoChemicalParams

DefaultClashingDistances()
DefaultBondStereoChemicalParams()
DefaultAngleStereoChemicalParams()

These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, using the default parameter files distributed with OpenStructure.

Return type:

ClashingDistances or StereoChemicalParams

ResidueNamesMatch(probe, reference)

The function requires a reference structure and a probe structure. The function checks that all the residues in the reference structure that appear in the probe structure (i.e., that have the same ResNum) are of the same residue type. Chains are compared by order, not by chain name (i.e.: the first chain of the reference will be compared with the first chain of the probe structure, etc.)

Parameters:
Returns:

True if the residue names are the same, False otherwise

Superposing structures

Superpose(ent_a, ent_b, match='number', atoms='all', iterative=False, max_iterations=5, distance_threshold=3.0)

Superposes the model entity onto the reference. To do so, two views are created, returned with the result. atoms describes what goes into these views and match the selection method. For superposition, SuperposeSVD() or IterativeSuperposeSVD() are called (depending on iterative). For matching, the following methods are recognised:

Parameters:
  • ent_a (EntityView or EntityHandle) – The model entity (superposition transform is applied on full entity handle here)

  • ent_b (EntityView or EntityHandle) – The reference entity

  • match (str) – Method to gather residues/ atoms

  • atoms (str, list, set) – The subset of atoms to be used in the superposition

  • iterative (bool) – Whether or not to use iterative superpositon.

  • max_iterations (int) – Max. number of iterations for IterativeSuperposeSVD() (only if iterative = True)

  • distance_threshold (float) – Distance threshold for IterativeSuperposeSVD() (only if iterative = True)

Returns:

An instance of SuperpositionResult.

ParseAtomNames(atoms)

Parses different representations of a list of atom names and returns a set, understandable by MatchResidueByNum(). In essence, this function translates

  • None to None

  • ‘all’ to None

  • ‘backbone’ to set(['N', 'CA', 'C', 'O'])

  • ‘aname1, aname2’ to set(['aname1', 'aname2'])

  • ['aname1', 'aname2'] to set(['aname1', 'aname2'])

Parameters:

atoms (str, list, set) – Identifier or list of atoms

Returns:

A set of atoms.

MatchResidueByNum(ent_a, ent_b, atoms='all')

Returns a tuple of views containing exactly the same number of atoms. Residues are matched by residue number. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in the order they occur in the entities. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.

Parameters:
Returns:

Two EntityView instances with the same amount of residues matched by number. Each residue will have the same number & type of atoms.

MatchResidueByIdx(ent_a, ent_b, atoms='all')

Returns a tuple of views containing exactly the same number of atoms. Residues are matched by position in the chains of an entity. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count. The number of residues per chain is supposed to be the same.

Parameters:
Returns:

Two EntityView instances with the same amount of residues matched by position. Each residue will have the same number & type of atoms.

MatchResidueByLocalAln(ent_a, ent_b, atoms='all')

Match residues by local alignment. Takes ent_a and ent_b, extracts the sequences chain-wise and aligns them in Smith/Waterman manner using the BLOSUM62 matrix for scoring. Only residues which are marked as peptide linking are considered for alignment. The residues of the entities are then matched based on this alignment. Only atoms present in both residues are included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.

Parameters:
Returns:

Two EntityView instances with the same number of residues. Each residue will have the same number & type of atoms.

MatchResidueByGlobalAln(ent_a, ent_b, atoms='all')

Match residues by global alignment. Same as MatchResidueByLocalAln() but performs a global Needleman/Wunsch alignment of the sequences using the BLOSUM62 matrix for scoring.

Parameters:
Returns:

Two EntityView instances with the same number of residues. Each residue will have the same number & type of atoms.

class SuperpositionResult
rmsd

RMSD of the superposed entities.

view1
view2

Two EntityView used in superposition (not set if methods with Vec3List used).

transformation

Transformation (Mat4) used to map view1 onto view2.

fraction_superposed
rmsd_superposed_atoms
ncycles

For iterative superposition (IterativeSuperposeSVD()): fraction and RMSD of atoms that were superposed with a distance below the given threshold and the number of iteration cycles performed.

SuperposeSVD(view1, view2, apply_transform=True)
SuperposeSVD(list1, list2)

Superposition of two sets of atoms minimizing RMSD using a classic SVD based algorithm.

Note that the atom positions in the view are taken blindly in the order in which the atoms appear.

Parameters:
  • view1 (EntityView) – View on the model entity

  • view2 (EntityView) – View on the reference entity

  • list1 (Vec3List) – List of atom positions for model entity

  • list2 (Vec3List) – List of atom positions for reference entity

  • apply_transform (bool) – If True, the superposition transform is applied to the (full!) entity handle linked to view1.

Returns:

An instance of SuperpositionResult.

IterativeSuperposeSVD(view1, view2, max_iterations=5, distance_threshold=3.0, apply_transform=True)
IterativeSuperposeSVD(list1, list2, max_iterations=5, distance_threshold=3.0)

Iterative superposition of two sets of atoms. In each iteration cycle, we keep a fraction of atoms with distances below distance_threshold and get the superposition considering only those atoms.

Note that the atom positions in the view are taken blindly in the order in which the atoms appear.

Parameters:
  • view1 (EntityView) – View on the model entity

  • view2 (EntityView) – View on the reference entity

  • list1 (Vec3List) – List of atom positions for model entity

  • list2 (Vec3List) – List of atom positions for reference entity

  • max_iterations (int) – Max. number of iterations to be performed

  • distance_threshold (float) – Distance threshold defining superposed atoms

  • apply_transform (bool) – If True, the superposition transform is applied to the (full!) entity handle linked to view1.

Returns:

An instance of SuperpositionResult.

Raises:

Exception if atom counts do not match or if less than 3 atoms.

CalculateRMSD(view1, view2, transformation=geom.Mat4())
Returns:

RMSD of atom positions (taken blindly in the order in which the atoms appear) in the two given views.

Return type:

float

Parameters:
  • view1 (EntityView) – View on the model entity

  • view2 (EntityView) – View on the reference entity

  • transformation (Mat4) – Optional transformation to apply on each atom position of view1.

Algorithms on Structures

Accessibility(ent, probe_radius=1.4, include_hydrogens=False, include_hetatm=False, include_water=False, oligo_mode=False, selection='', asa_abs='asaAbs', asa_rel='asaRel', asa_atom='asaAtom', algorithm=NACCESS)

Calculates the accesssible surface area for ever atom in ent. The algorithm mimics the behaviour of the bindings available for the NACCESS and DSSP tools and has been tested to reproduce the numbers accordingly.

Parameters:
  • ent (EntityView / EntityHandle) – Entity on which to calculate the surface

  • probe_radius (float) – Radius of probe to determine accessible surface area

  • include_hydrogens (bool) – Whether to include hydrogens in the solvent accessibility calculations. By default, every atom with ele=H,D is simply neglected.

  • include_hetatms (bool) – Whether to include atoms flagged as hetatms , i.e. ligands, in the solvent accessibility calculations. They are neglected by default.

  • include_water (bool) – Whether to include water in the solvent accessibility calculations. By default, every residue with name “HOH” is neglected.

  • oligo_mode (bool) – A typical used case of accessibility calculations is to determine the solvent accessibility of a full complex and then the accessibility of each chain individually. Lots of calculations can be cached because only the values of the atoms close to an interface change. This is exactly what happens when you activate the oligo mode. It returns exactly the same value but adds, additionally to the values estimated in full complex, the values from each individual chain as float properties on every residue and atom. Example for atom accessible surface if the according property name is set to “asaAtom”: Accessibility in the full complex is stored as “asaAtom”, the accessibility when only considering that particular chain is stored as “asaAtom_single_chain”. The other properties follow the same naming scheme.

  • selection (str) – Selection statement, that gets applied on ent before doing anything. Everything that is not selected is neglected. The default value of “” results in no selection at all.

  • asa_abs (str) – Float property name to assign the summed solvent accessible surface from each atom to a residue.

  • asa_rel (str) –

    Float property name to assign the relative solvent accessibility to a residue. This is the absolute accessibility divided by the maximum solvent accessibility of that particular residue. This maximum solvent accessibility is dependent on the chosen AccessibilityAlgorithm. Only residues of the 20 standarad amino acids can be handled.

    • In case of the NACCESS algorithm you can expect a value in range [0.0, 100.0] and a value of -99.9 for non standard residues.

    • In case of the DSSP algorithm you can expect a value in range [0.0, 1.0], no float property is assigned in case of a non standard residue.

  • asa_atom (str) – Float property name to assign the solvent accessible area to each atom.

  • algorithm (AccessibilityAlgorithm) – Specifies the used algorithm for solvent accessibility calculations

Returns:

The summed solvent accessibilty of each atom in ent.

class AccessibilityAlgorithm

The accessibility algorithm enum specifies the algorithm used by the respective tools. Available are:

NACCESS, DSSP

AssignSecStruct(ent)

Assigns secondary structures to all residues based on hydrogen bond patterns as described by DSSP.

Parameters:

ent (EntityView / EntityHandle) – Entity on which to assign secondary structures

class FindMemParam

Result object for the membrane detection algorithm described below

axis

initial search axis from which optimal membrane slab could be found

tilt_axis

Axis around which we tilt the membrane starting from the initial axis

tilt

Angle to tilt around tilt axis

angle

After the tilt operation we perform a rotation around the initial axis with this angle to get the final membrane axis

membrane_axis

The result of applying the tilt and rotation procedure described above. The membrane_axis is orthogonal to the membrane plane and has unit length.

pos

Real number that describes the membrane center point. To get the actual position you can do: pos * membrane_axis

width

Total width of the membrane in A

energy

Pseudo energy of the implicit solvation model

membrane_asa

Membrane accessible surface area

membrane_representation

Dummy atoms that represent the membrane. This entity is only valid if the according flag has been set to True when calling FindMembrane.

FindMembrane(ent, assign_membrane_representation=True, fast=False)

Estimates the optimal membrane position of a protein by using an implicit solvation model. The original algorithm and the used energy function are described in: Lomize AL, Pogozheva ID, Lomize MA, Mosberg HI (2006) Positioning of proteins in membranes: A computational approach.

There are some modifications in this implementation and the procedure is as follows:

  • Initial axis are constructed that build the starting point for initial parameter grid searches.

  • For every axis, the protein is rotated so that the axis builds the z-axis

    • In order to exclude internal hydrophilic pores, only the outermost atoms with respect the the z-axis enter an initial grid search

    • The width and position of the membrane is optimized for different combinations of tilt and rotation angles (further described in FindMemParam). The top 20 parametrizations (only top parametrization if fast is True) are stored for further processing.

  • The 20 best membrane parametrizations from the initial grid search (only the best if fast is set to True) enter a final minimization step using a Levenberg-Marquardt minimizer.

Parameters:
  • ent (ost.mol.EntityHandle / ost.mol.EntityView) –

    Entity of a transmembrane protein, you’ll get weird results if this is not the case. The energy term of the result is typically a good indicator whether ent is an actual transmembrane protein. The following float properties will be set on the atoms:

    • ’asaAtom’ on all atoms that are selected with ent.Select(‘peptide=true and ele!=H’) as a result of envoking Accessibility().

    • ’membrane_e’ the contribution of the potentially membrane facing atoms to the energy function.

  • assign_membrane_representation (bool) – Whether to construct a membrane representation using dummy atoms

  • fast – If set to false, the 20 best results of the initial grid search undergo a Levenberg-Marquardt minimization and the parametrization with optimal minimized energy is returned. If set to yes, only the best result of the initial grid search is selected and returned after Levenberg-Marquardt minimization.

Returns:

The results object

Return type:

ost.mol.alg.FindMemParam

Trajectory Analysis

This is a set of functions used for basic trajectory analysis such as extracting positions, distances, angles and RMSDs. The organization is such that most functions have their counterpart at the individual frame level so that they can also be called on one frame instead of the whole trajectory.

All these functions have a “stride” argument that defaults to stride=1, which is used to skip frames in the analysis.

SuperposeFrames(frames, sel, from=0, to=-1, ref=-1)

This function superposes the frames of the given coord group and returns them as a new coord group.

Parameters:
  • frames (CoordGroupHandle) – The source coord group.

  • sel (ost.mol.EntityView) – An entity view containing the selection of atoms to be used for superposition. If set to an invalid view, all atoms in the coord group are used.

  • from – index of the first frame

  • to – index of the last frame plus one. If set to -1, the value is set to the number of frames in the coord group

  • ref – The index of the reference frame to use for superposition. If set to -1, the each frame is superposed to the previous frame.

Returns:

A newly created coord group containing the superposed frames.

SuperposeFrames(frames, sel, ref_view, from=0, to=-1)

Same as SuperposeFrames above, but the superposition is done on a reference view and not on another frame of the trajectory.

Parameters:
  • frames (CoordGroupHandle) – The source coord group.

  • sel (ost.mol.EntityView) – An entity view containing the selection of atoms of the frames to be used for superposition.

  • ref_view (ost.mol.EntityView) – The reference view on which the frames will be superposed. The number of atoms in this reference view should be equal to the number of atoms in sel.

  • from – index of the first frame

  • to – index of the last frame plus one. If set to -1, the value is set to the number of frames in the coord group

Returns:

A newly created coord group containing the superposed frames.

AnalyzeAtomPos(traj, atom1, stride=1)

This function extracts the position of an atom from a trajectory. It returns a vector containing the position of the atom for each analyzed frame.

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • atom1 – The AtomHandle.

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeCenterOfMassPos(traj, sele, stride=1)

This function extracts the position of the center-of-mass of a selection (EntityView) from a trajectory and returns it as a vector.

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • sele (EntityView.) – The selection from which the center of mass is computed

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeDistanceBetwAtoms(traj, atom1, atom2, stride=1)

This function extracts the distance between two atoms from a trajectory and returns it as a vector.

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • atom1 – The first AtomHandle.

  • atom2 – The second AtomHandle.

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeAngle(traj, atom1, atom2, atom3, stride=1)

This function extracts the angle between three atoms from a trajectory and returns it as a vector. The second atom is taken as being the central atom, so that the angle is between the vectors (atom1.pos-atom2.pos) and (atom3.pos-atom2.pos).

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • atom1 – The first AtomHandle.

  • atom2 – The second AtomHandle.

  • atom3 – The third AtomHandle.

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeDihedralAngle(traj, atom1, atom2, atom3, atom4, stride=1)

This function extracts the dihedral angle between four atoms from a trajectory and returns it as a vector. The angle is between the planes containing the first three and the last three atoms.

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • atom1 – The first AtomHandle.

  • atom2 – The second AtomHandle.

  • atom3 – The third AtomHandle.

  • atom4 – The fourth AtomHandle.

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeDistanceBetwCenterOfMass(traj, sele1, sele2, stride=1)

This function extracts the distance between the center-of-mass of two selections (EntityView) from a trajectory and returns it as a vector.

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • sele1 (EntityView.) – The selection from which the first center of mass is computed

  • sele2 (EntityView.) – The selection from which the second center of mass is computed

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeRMSD(traj, reference_view, sele_view, stride=1)

This function extracts the rmsd between two EntityView and returns it as a vector. The views don’t have to be from the same entity. The reference positions are taken directly from the reference_view, evaluated only once. The positions from the sele_view are evaluated for each frame. If you want to compare to frame i of the trajectory t, first use t.CopyFrame(i) for example:

eh = io.LoadPDB(...)
t = io.LoadCHARMMTraj(eh, ...)
sele = eh.Select(...)
t.CopyFrame(0)
mol.alg.AnalyzeRMSD(t, sele, sele)
Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • reference_view (EntityView.) – The selection used as reference structure

  • sele_view (EntityView.) – The selection compared to the reference_view

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeMinDistance(traj, view1, view2, stride=1)

This function extracts the minimal distance between two sets of atoms (view1 and view2) for each frame in a trajectory and returns it as a vector.

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • view1 (EntityView.) – The first group of atoms

  • view2 (EntityView.) – The second group of atoms

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeMinDistanceBetwCenterOfMassAndView(traj, view_cm, view_atoms, stride=1)

This function extracts the minimal distance between a set of atoms (view_atoms) and the center of mass of a second set of atoms (view_cm) for each frame in a trajectory and returns it as a vector.

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • view_cm (EntityView.) – The group of atoms from which the center of mass is taken

  • view_atoms (EntityView.) – The second group of atoms

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

AnalyzeAromaticRingInteraction(traj, view_ring1, view_ring2, stride=1)

This function is a crude analysis of aromatic ring interactions. For each frame in a trajectory, it calculates the minimal distance between the atoms in one view and the center of mass of the other and vice versa, and returns the minimum between these two minimal distances. Concretely, if the two views are the heavy atoms of two rings, then it returns the minimal center of mass - heavy atom distance betweent he two rings

Parameters:
  • traj (CoordGroupHandle) – The trajectory to be analyzed.

  • view_ring1 (EntityView.) – First group of atoms

  • view_ring2 (EntityView.) – Second group of atoms

  • stride – Size of the increment of the frame’s index between two consecutive frames analyzed.

helix_kinks – Algorithms to calculate Helix Kinks

Functions to calculate helix kinks: bend, face shift and wobble angles

Author: Niklaus Johner

AnalyzeHelixKink(t, sele, proline=False)

This function calculates the bend, wobble and face-shift angles in an alpha- helix over a trajectory. The determination is more stable if there are at least 4 residues on each side (8 is even better) of the proline around which the helix is kinked. The selection should contain all residues in the correct order and with no gaps and no missing C-alphas.

Parameters:
  • t (CoordGroup) – The trajectory to be analyzed

  • sele (EntityView) – A selection containing the alpha helix to be analyzed

  • proline (ost.mol.EntityView) – A selection containing only the proline (or another residue) around which the helix is kinked. If False, the proline will be searched for automatically

Returns:

A tuple (bend_angle, face_shift, wobble_angle).

Return type:

(FloatList, FLoatList, FloatList)

CalculateHelixKink(sele, proline=False)

This function calculates the bend, wobble and face-shift angles in an alpha- helix of an EntityView. The determination is more stable if there are at least 4 residues on each side (8 is even better) of the proline around which the helix is kinked. The selection should contain all residues in the correct order and with no gaps and no missing C-alphas.

Parameters:
  • sele (EntityView) – A selection containing the alpha helix to be analyzed

  • proline (ost.mol.EntityView) – A selection containing only the proline (or another residue) around which the helix is kinked. If False, the proline will be searched for automatically

Returns:

A tuple (bend_angle, face_shift, wobble_angle).

Return type:

(float, float, float)

trajectory_analysis – DRMSD, pairwise distances and more

This Module requires numpy

This module contains functions to analyze trajectories, mainly similiraty measures baed on RMSDS and pairwise distances.

Author: Niklaus Johner (niklaus.johner@unibas.ch)

AverageDistanceMatrixFromTraj(t, sele, first=0, last=-1)

This function calcultes the distance between each pair of atoms in sele, averaged over the trajectory t.

Parameters:
  • t (CoordGroupHandle) – the trajectory

  • sele (EntityView) – the selection used to determine the atom pairs

  • first (int) – the first frame of t to be used

  • last (int) – the last frame of t to be used

Returns:

a numpy NpairsxNpairs matrix, where Npairs is the number of atom pairs in sele.

DistRMSDFromTraj(t, sele, ref_sele, radius=7.0, average=False, seq_sep=4, first=0, last=-1)

This function calculates the distance RMSD from a trajectory. The distances selected for the calculation are all the distances between pair of atoms from residues that are at least seq_sep apart in the sequence and that are smaller than radius in ref_sel. The number and order of atoms in ref_sele and sele should be the same.

Parameters:
  • t (CoordGroupHandle) – the trajectory

  • sele (EntityView) – the selection used to calculate the distance RMSD

  • ref_sele (EntityView) – the reference selection used to determine the atom pairs and reference distances

  • radius (float) – the upper limit of distances in ref_sele considered for the calculation

  • seq_sep (int) – the minimal sequence separation between atom pairs considered for the calculation

  • average (bool) – use the average distance in the trajectory as reference instead of the distance obtained from ref_sele

  • first (int) – the first frame of t to be used

  • last (int) – the last frame of t to be used

Returns:

a numpy vecor dist_rmsd(Nframes).

DistanceMatrixFromPairwiseDistances(distances, p=2)

This function calculates an distance matrix M(NframesxNframes) from the pairwise distances matrix D(NpairsxNframes), where Nframes is the number of frames in the trajectory and Npairs the number of atom pairs. M[i,j] is the distance between frame i and frame j calculated as a p-norm of the differences in distances from the two frames (distance-RMSD for p=2).

Parameters:
  • distances – a pairwise distance matrix as obtained from PairwiseDistancesFromTraj()

  • p – exponent used for the p-norm.

Returns:

a numpy NframesxNframes matrix, where Nframes is the number of frames.

PairwiseDistancesFromTraj(t, sele, first=0, last=-1, seq_sep=1)

This function calculates the distances between any pair of atoms in sele with sequence separation larger than seq_sep from a trajectory t. It return a matrix containing one line for each atom pair and Nframes columns, where Nframes is the number of frames in the trajectory.

Parameters:
  • t (CoordGroupHandle) – the trajectory

  • sele (EntityView) – the selection used to determine the atom pairs

  • first (int) – the first frame of t to be used

  • last (int) – the last frame of t to be used

  • seq_sep (int) – The minimal sequence separation between atom pairs

Returns:

a numpy NpairsxNframes matrix.

RMSD_Matrix_From_Traj(t, sele, first=0, last=-1, align=True, align_sele=None)

This function calculates a matrix M such that M[i,j] is the RMSD (calculated on sele) between frames i and j of the trajectory t aligned on sele.

Parameters:
  • t (CoordGroupHandle) – the trajectory

  • sele (EntityView) – the selection used for alignment and RMSD calculation

  • first (int) – the first frame of t to be used

  • last (int) – the last frame of t to be used

Returns:

Returns a numpy NframesxNframes matrix, where Nframes is the number of frames.

structure_analysis – Functions to analyze structures

Some functions for analyzing structures

Author: Niklaus Johner (Niklaus.Johner@unibas.ch)

CalculateBestFitLine(sele1)

This function calculates the best fit line to the atoms in sele1.

Parameters:

sele1 (EntityView) –

Returns:

Line3

CalculateBestFitPlane(sele1)

This function calculates the best fit plane to the atoms in sele1.

Parameters:

sele1 (EntityView) –

Returns:

Plane

CalculateDistanceDifferenceMatrix(sele1, sele2)

This function calculates the pairwise distance differences between two selections (EntityView). The two selections should have the same number of atoms It returns an NxN DistanceDifferenceMatrix M (where N is the number of atoms in sele1) where M[i,j]=||(sele2.atoms[i].pos-sele2.atoms[j].pos)||-||(sele1.atoms[i].pos-sele1.atoms[j].pos)||

Parameters:
Returns:

NxN numpy matrix

CalculateHelixAxis(sele1)

This function calculates the best fit cylinder to the CA atoms in sele1, and returns its axis. Residues should be ordered correctly in sele1.

Parameters:

sele1 (EntityView) –

Returns:

Line3

GetAlphaHelixContent(sele1)

This function calculates the content of alpha helix in a view. All residues in the view have to ordered and adjacent (no gaps allowed)

Parameters:

sele1 (EntityView) –

Returns:

float

GetDistanceBetwCenterOfMass(sele1, sele2)

This function calculates the distance between the centers of mass of sele1 and sele2, two selections from the same Entity.

Parameters:
Returns:

float

GetFrameFromEntity(eh)

This function returns a CoordFrame from an EntityHandle

Parameters:

eh (EntityHandle) –

Returns:

ost.mol.CoordFrame

GetMinDistBetwCenterOfMassAndView(sele1, sele2)

This function calculates the minimal distance between sele2 and the center of mass of sele1, two selections from the same Entity.

Parameters:
  • sele1 (EntityView) – The selection from which the center of mass is taken

  • sele2 (EntityView) –

Returns:

distance (float)

GetMinDistanceBetweenViews(sele1, sele2)

This function calculates the minimal distance between sele1 and sele2, two selections from the same Entity.

Parameters:
Returns:

float

Mapping functions

The following functions help to convert one residue into another by reusing as much as possible from the present atoms. They are mainly meant to map from standard amino acid to other standard amino acids or from modified amino acids to standard amino acids.

CopyResidue(src_res, dst_res, editor)

Copies the atoms of src_res to dst_res using the residue names as guide to decide which of the atoms should be copied. If src_res and dst_res have the same name, or src_res is a modified version of dst_res (i.e. have the same single letter code), CopyConserved() will be called, otherwise CopyNonConserved().

If a CBeta atom wasn’t already copied from src_res, a new one at a reconstructed position will be added to dst_res if it is not GLY and all backbone positions are available to do it.

Parameters:
  • src_res (ResidueHandle) – The source residue

  • dst_res (ResidueHandle) – The destination residue (expected to be a standard amino acid)

  • editor (XCSEditor) – Editor used to modify dst_res.

Returns:

True if the residue could be copied as a conserved residue, False if it had to fallback to CopyNonConserved().

CopyConserved(src_res, dst_res, editor)

Copies the atoms of src_res to dst_res assuming that the parent amino acid of src_res (or src_res itself) are identical to dst_res.

If src_res and dst_res are identical, all heavy atoms are copied to dst_res. If src_res is a modified version of dst_res and the modification is a pure addition (e.g. the phosphate group of phosphoserine), the modification is stripped off and all other heavy atoms are copied to dst_res. If the modification is not a pure addition, it falls back to CopyNonConserved().

Additionally, the selenium atom of MSE is converted to sulphur to map MSE to MET.

Parameters:
  • src_res (ResidueHandle) – The source residue

  • dst_res (ResidueHandle) – The destination residue (expected to be a standard amino acid)

  • editor (XCSEditor) – Editor used to modify dst_res.

Returns:

A tuple of bools stating whether the residue could be copied without falling back to CopyNonConserved() and whether the CBeta atom was copied from src_res to dst_res.

CopyNonConserved(src_res, dst_res, editor)

Copies the heavy backbone atoms and CBeta (except for GLY) of src_res to dst_res.

Parameters:
  • src_res (ResidueHandle) – The source residue

  • dst_res (ResidueHandle) – The destination residue (expected to be a standard amino acid)

  • editor (XCSEditor) – Editor used to modify dst_res.

Returns:

A tuple of bools as in CopyConserved() with the first bool always being False.

Molecular Checker (Molck)

Programmatic usage

Molecular Checker (Molck) could be called directly from the code using Molck function:

#! /bin/env python

"""Run Molck with Python API.


This is an exemplary procedure on how to run Molck using Python API which is
equivalent to the command line:

molck <PDB PATH> --rm=hyd,oxt,nonstd,unk \
                 --fix-ele --out=<OUTPUT PATH> \
                 --complib=<PATH TO compounds.chemlib>
"""

from ost.io import LoadPDB, SavePDB
from ost.mol.alg import MolckSettings, Molck

from ost.conop import CompoundLib


pdbid = "<PDB PATH>"
lib = CompoundLib.Load("<PATH TO compounds.chemlib>")

# Using Molck function
ent = LoadPDB(pdbid)
ms = MolckSettings(rm_unk_atoms=True,
                   rm_non_std=True,
                   rm_hyd_atoms=True,
                   rm_oxt_atoms=True,
                   rm_zero_occ_atoms=False,
                   colored=False,
                   map_nonstd_res=False,
                   assign_elem=True)
Molck(ent, lib, ms)
SavePDB(ent, "<OUTPUT PATH>")

It can also be split into subsequent commands for greater controll:

#! /bin/env python

"""Run Molck with Python API.


This is an exemplary procedure on how to run Molck using Python API which is
equivalent to the command line:

molck <PDB PATH> --rm=hyd,oxt,nonstd,unk \
                 --fix-ele --out=<OUTPUT PATH> \
                 --complib=<PATH TO compounds.chemlib>
"""

from ost.io import LoadPDB, SavePDB
from ost.mol.alg import (RemoveAtoms, MapNonStandardResidues,
                         CleanUpElementColumn)
from ost.conop import CompoundLib


pdbid = "<PDB PATH>"
lib = CompoundLib.Load("<PATH TO compounds.chemlib>")
map_nonstd = False

# Using function chain
ent = LoadPDB(pdbid)
if map_nonstd:
    MapNonStandardResidues(lib=lib, ent=ent)

RemoveAtoms(lib=lib,
            ent=ent,
            rm_unk_atoms=True,
            rm_non_std=True,
            rm_hyd_atoms=True,
            rm_oxt_atoms=True,
            rm_zero_occ_atoms=False,
            colored=False)

CleanUpElementColumn(lib=lib, ent=ent)
SavePDB(ent, "<OUTPUT PATH>")

API

class MolckSettings(rm_unk_atoms=False, rm_non_std=False, rm_hyd_atoms=True, rm_oxt_atoms=False, rm_zero_occ_atoms=False, colored=False, map_nonstd_res=True, assign_elem=True)

Stores settings used for Molecular Checker.

Parameters:
rm_unk_atoms

Remove unknown and atoms not following the nomenclature.

Type:

bool

rm_non_std

Remove all residues not one of the 20 standard amino acids

Type:

bool

rm_hyd_atoms

Remove hydrogen atoms

Type:

bool

rm_oxt_atoms

Remove terminal oxygens

Type:

bool

rm_zero_occ_atoms

Remove atoms with zero occupancy

Type:

bool

colored

Whether output should be colored

Type:

bool

map_nonstd_res

Maps modified residues back to the parent amino acid, for example MSE -> MET, SEP -> SER

Type:

bool

assign_elem

Clean up element column

Type:

bool

ToString()
Returns:

String representation of the MolckSettings.

Return type:

str

Warning

The API here is set such that the functions modify the passed structure ent in-place. If this is not ok, please work on a copy of the structure.

Molck(ent, lib, settings[, prune=True])

Runs Molck on provided entity. Reprocesses ent with ost.conop.HeuristicProcessor and given lib once done.

Parameters:
  • ent (EntityHandle) – Structure to check

  • lib (CompoundLib) – Compound library

  • settings (MolckSettings) – Molck settings

  • prune (bool) – Whether to remove residues/chains that don’t contain atoms anymore after Molck cleanup

MapNonStandardResidues(ent, lib, reprocess=True)

Maps modified residues back to the parent amino acid, for example MSE -> MET.

Parameters:
  • ent (EntityHandle) – Structure to check

  • lib (CompoundLib) – Compound library

  • reprocess – The function generates a deep copy of ent. Highly recommended to enable reprocess that runs ost.conop.HeuristicProcessor with given lib. If set to False, you’ll have no connectivity etc. after calling this function.

RemoveAtoms(ent, lib, rm_unk_atoms=False, rm_non_std=False,                         rm_hyd_atoms=True, rm_oxt_atoms=False,                         rm_zero_occ_atoms=False, colored=False,
reprocess=True)

Removes atoms and residues according to some criteria.

Parameters:
CleanUpElementColumn(ent, lib)

Clean up element column.

Parameters: