You are reading the documentation for the development version of OpenStructure.

# mol.alg – Algorithms for Structures

## Local Distance Test scores (lDDT, DRMSD)

LocalDistDiffTest(model, distance_list, tolerance_list, sequence_separation=0, local_lddt_property_string="")

This function counts the number of conserved local contacts between a model and a reference structure. These counts are needed to compute the Local Distance Difference Test score.

The Local Distance Difference Test score is a number between zero and one, which measures the agreement of local contacts between a model and a reference structure. One means complete agreement, and zero means no agreement at all. The calculation of this score does not require any superposition between the model and the reference structures.

All distances between atoms in the reference structure that are shorter than a certain predefined length (inclusion radius) are compared with the corresponding distances in the model structure. If the difference between a reference distance and the corresponding model distance is smaller than a threshold value (tolerance), that distance is considered conserved. The final lDDT score is the fraction of conserved distances. Missing atoms in the model structure lead to non-conserved distances (and thus lower the final lDDT score).

This function takes as input a list of distances to be checked for conservation. Any number of threshold values can be specified when the function is called. All thresholds are then applied in sequence, and the returned counts are averaged over all threshold values. A sequence separation parameter can be passed to the function. If it is, only distances between residues whose separation in sequence is higher than the provided parameter are considered when the score is computed.

If a string is passed as the last parameter, residue-based counts and the value of the residue-based Local Distance Difference Test score are saved in each ResidueHandle as int and float properties. Specifically, the local residue-based lddt score is stored in a float property named as the provided string, while the residue-based number of conserved and total distances are saved in two int properties named <string>_conserved and <string>_total.

Parameters:
• model (EntityView) – the model structure
• distance_list (GlobalRDMap) – the list of distances to check for conservation
• tolerance_list – a list of thresholds used to determine distance conservation
• sequence_separation – sequence separation parameter used when computing the score
• local_lddt_property_string – the base name for the ResidueHandle properties that store the local scores

Returns: a tuple containing the counts of the conserved distances in the model and of all the checked distances

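As an illustration of the counting described above, here is a minimal pure-Python sketch. It is not the OpenStructure implementation: it assumes hypothetical inputs in the form of plain dictionaries mapping (residue_number, atom_name) to (x, y, z) coordinates, selects the reference distances itself using the inclusion radius, skips intra-residue pairs, and returns the conserved count already averaged over the thresholds together with the total count.

```python
import math
from itertools import combinations


def lddt_counts(ref_atoms, mdl_atoms, thresholds, radius=15.0, sequence_separation=0):
    """Toy lDDT counting (illustration only, not the OpenStructure code).

    ref_atoms / mdl_atoms: dict mapping (residue_number, atom_name) -> (x, y, z).
    Returns (conserved, total), with conserved averaged over all thresholds.
    """
    conserved = 0.0
    total = 0
    for a, b in combinations(sorted(ref_atoms), 2):
        # keep only pairs whose residues are more than sequence_separation apart
        # (with the default of 0 this drops intra-residue pairs)
        if abs(a[0] - b[0]) <= sequence_separation:
            continue
        d_ref = math.dist(ref_atoms[a], ref_atoms[b])
        if d_ref >= radius:
            continue  # not a local contact in the reference
        total += 1
        if a not in mdl_atoms or b not in mdl_atoms:
            continue  # missing model atoms make the distance non-conserved
        d_mdl = math.dist(mdl_atoms[a], mdl_atoms[b])
        # fraction of thresholds under which this distance counts as conserved
        conserved += sum(abs(d_ref - d_mdl) < t for t in thresholds) / len(thresholds)
    return conserved, total


ref = {(1, "CA"): (0.0, 0.0, 0.0), (2, "CA"): (3.8, 0.0, 0.0)}
mdl = {(1, "CA"): (0.0, 0.0, 0.0), (2, "CA"): (4.5, 0.0, 0.0)}
conserved, total = lddt_counts(ref, mdl, thresholds=[0.5, 1.0, 2.0, 4.0])
print(conserved / total)  # the ~0.7 A deviation passes 3 of 4 thresholds: 0.75
```

Averaging per distance over the thresholds gives the same result as running each threshold separately and averaging the counts afterwards, since every threshold sees the same total.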
LocalDistDiffTest(model, reference_list, distance_list, settings)

Wrapper around LocalDistDiffTest() above.

Parameters:
• model (EntityView) – the model structure
• reference_list (list of EntityView) – the list of reference structures from which distances were derived
• distance_list (GlobalRDMap) – a residue distance map prepared with PreparelDDTGlobalRDMap() with reference_list and settings as parameters
• settings (lDDTSettings) – lDDT settings

Returns: the Local Distance Difference Test score (conserved distances divided by all the checked distances)
Return type: float

LocalDistDiffTest(model, target, cutoff, max_dist, local_lddt_property_string="")

Wrapper around LocalDistDiffTest() above using: distance_list = CreateDistanceList() with target and max_dist as parameters and tolerance_list = [cutoff].

Parameters:
• model (EntityView) – the model structure
• target (EntityView) – the target structure from which distances are derived
• cutoff (float) – single distance threshold to determine distance conservation
• max_dist (float) – the inclusion radius in Angstroms (to determine which distances are checked for conservation)
• local_lddt_property_string – the base name for the ResidueHandle properties that store the local scores

Returns: the Local Distance Difference Test score (conserved distances divided by all the checked distances)
Return type: float

LocalDistDiffTest(alignment, tolerance, radius, ref_index=0, mdl_index=1)

Calculates the Local Distance Difference Test score (see previous function) starting from an alignment between a reference structure and a model. The AlignmentHandle parameter used to provide the alignment to the function needs to have the two structures attached to it. By default the first structure in the alignment is considered to be the reference structure, and the second structure is taken as the model. This can however be changed by passing the indexes of the two structures in the AlignmentHandle as parameters to the function.

Note

This function uses the old implementation of the Local Distance Difference Test algorithm and will give slightly different results from the new one.

Parameters:
• alignment (AlignmentHandle) – an alignment containing the sequences of the reference and of the model structures, with the structures themselves attached
• tolerance – a list of thresholds used to determine distance conservation
• radius – the inclusion radius in Angstroms (to determine which distances are checked for conservation)
• ref_index – index of the reference structure in the alignment
• mdl_index – index of the model in the alignment

Returns: the Local Distance Difference Test score

LDDTHA(model, distance_list, sequence_separation=0)

This function calculates the Local Distance Difference Test score, using the same threshold values as the GDT-HA test (the default set of thresholds used for the lDDT score; see previous functions). The thresholds are 0.5, 1, 2, and 4 Angstroms.

The function only compares the input distance list to the first chain of the model structure.

The local residue-based lDDT score values are stored in the ResidueHandles of the model passed to the function in a float property called “locallddt”.

A sequence separation parameter can be passed to the function. If this happens, only distances between residues whose separation is higher than the provided parameter are considered when computing the score.

Parameters:
• model (EntityView) – the model structure
• distance_list (GlobalRDMap) – the list of distances to check for conservation
• sequence_separation – sequence separation parameter

Returns: the Local Distance Difference Test score

DistanceRMSDTest(model, distance_list, cap_difference, sequence_separation=0, local_drmsd_property_string="")

This function performs a Distance RMSD Test on a provided model, and calculates the two values that are necessary to determine the Distance RMSD Score, namely the sum of squared distance deviations and the number of distances on which the sum was computed.

The Distance RMSD Test (or DRMSD Test) computes the deviation in the length of local contacts between a model and a reference structure and expresses it in the form of a score value. The score has an RMSD-like form, with the deviations in the RMSD formula computed as contact distance differences. The score is open-ended, with a value of zero meaning complete agreement of local contact distances, and a positive value revealing a disagreement of magnitude proportional to the score value itself. This score does not require any superposition between the model and the reference.

This function processes a list of distances provided by the user, together with their length in the reference structure. For each distance that is found in the model, its difference with the reference length is computed and used as a deviation term in the RMSD-like formula. When a distance is not present in the model because one or both of the atoms are missing, a default deviation value provided by the user is used.
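The bookkeeping in the paragraph above can be sketched in plain Python. This is an assumption-laden illustration, not the OpenStructure code: distances are given as hypothetical dictionaries keyed by atom pair, and the final score is assumed to be the square root of the mean squared deviation, which is what the RMSD-like form suggests.

```python
import math


def drmsd_terms(ref_distances, mdl_distances, cap_difference):
    """Return (sum_of_squared_deviations, count) as described above.

    ref_distances / mdl_distances: hypothetical dicts mapping an atom pair to a
    distance. Pairs absent from the model use cap_difference as their deviation.
    """
    sum_sq = 0.0
    for pair, d_ref in ref_distances.items():
        d_mdl = mdl_distances.get(pair)
        deviation = cap_difference if d_mdl is None else d_mdl - d_ref
        sum_sq += deviation ** 2
    return sum_sq, len(ref_distances)


def drmsd_score(sum_sq, count):
    # Assumed RMSD-like form: square root of the mean squared deviation
    # (returns 0.0 for an empty distance list, an arbitrary choice of this sketch)
    return math.sqrt(sum_sq / count) if count else 0.0


ref = {("A1.CA", "A5.CA"): 6.0, ("A1.CA", "A9.CA"): 9.0}
mdl = {("A1.CA", "A5.CA"): 7.0}  # the second pair is missing in the model
sum_sq, count = drmsd_terms(ref, mdl, cap_difference=3.0)
print(sum_sq, count, drmsd_score(sum_sq, count))
```

Returning the sum and the count separately, as DistanceRMSDTest() does, lets partial results (for example per residue) be combined before taking the square root.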

The function only processes distances between atoms that do not belong to the same residue, and considers only standard residues in the first chain of the model. For residues with symmetric sidechains (GLU, ASP, ARG, VAL, PHE, TYR), the naming of the atoms is ambiguous. For these residues, the function computes the Distance RMSD Test score that each naming convention would generate when considering all non-ambiguous surrounding atoms. The solution that gives the lower score is then picked to compute the final Distance RMSD Score for the whole model.
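The naming-convention choice can be sketched as follows. The table of interchangeable atom names is our own assumption of the standard equivalent atoms for the residues listed above (it is not read from OpenStructure), and `score` is any hypothetical callable where lower is better, such as a DRMSD computed over the surrounding non-ambiguous atoms.

```python
# Interchangeable atom names for the residues listed above (our assumption)
AMBIGUOUS_PAIRS = {
    "ASP": [("OD1", "OD2")],
    "GLU": [("OE1", "OE2")],
    "ARG": [("NH1", "NH2")],
    "VAL": [("CG1", "CG2")],
    "PHE": [("CD1", "CD2"), ("CE1", "CE2")],
    "TYR": [("CD1", "CD2"), ("CE1", "CE2")],
}


def swap_names(resname, atoms):
    """Return a copy of atoms (name -> coordinates) using the alternative naming."""
    swapped = dict(atoms)
    for name_1, name_2 in AMBIGUOUS_PAIRS.get(resname, []):
        if name_1 in atoms and name_2 in atoms:
            swapped[name_1], swapped[name_2] = atoms[name_2], atoms[name_1]
    return swapped


def pick_naming(resname, atoms, score):
    """Keep whichever naming gives the lower score (lower is better, as for DRMSD)."""
    alternative = swap_names(resname, atoms)
    return alternative if score(alternative) < score(atoms) else atoms


def ref_score(atoms):
    # hypothetical score: distance of OD1 from where the reference places it
    return abs(atoms["OD1"][0] - (-1.0))


asp = {"CG": (0.0, 0.0, 0.0), "OD1": (1.0, 0.0, 0.0), "OD2": (-1.0, 0.0, 0.0)}
print(pick_naming("ASP", asp, ref_score)["OD1"])  # (-1.0, 0.0, 0.0)
```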

A sequence separation parameter can be passed to the function. If this happens, only distances between residues whose separation is higher than the provided parameter are considered when computing the score.

If a string is passed as last parameter to the function, the function computes the Distance RMSD Score for each residue and saves it as a float property in the ResidueHandle, with the passed string as property name. Additionally, the actual sum of squared deviations and the number of distances on which it was computed are stored as properties in the ResidueHandle. The property names are respectively <passed string>_sum (a float property) and <passed string>_count (an integer property).

Parameters:
• model (EntityView) – the model structure
• distance_list (GlobalRDMap) – the list of distances to check (here we only use the first of the two distance values stored, the second is ignored)
• cap_difference – a default deviation value to be used when a distance is not found in the model
• sequence_separation – sequence separation parameter
• local_drmsd_property_string – the base name for the ResidueHandle properties that store the local scores

Returns: a tuple containing the sum of squared distance deviations and the number of distances on which it was computed

DRMSD(model, distance_list, cap_difference, sequence_separation=0)

This function calculates the Distance RMSD Test score (see DistanceRMSDTest()).

The function only considers distances between atoms not belonging to the same residue, and only compares the input distance list to the first chain of the model structure. It requires, in addition to the model and the list themselves, a default deviation value to be used in the DRMSD Test when a distance is not found in the model.

The local, residue-based Distance RMSD Test score values are stored in the ResidueHandles of the model passed to the function in a float property called “localdrmsd”.

A sequence separation parameter can be passed to the function. If this happens, only distances between residues whose separation is higher than the provided parameter are considered when computing the score.

Parameters:
• model (EntityView) – the model structure
• distance_list (GlobalRDMap) – the list of distances as in DistanceRMSDTest()
• cap_difference – a default deviation value to be used when a distance is not found in the model
• sequence_separation – sequence separation parameter

Returns: the Distance RMSD Test score

CreateDistanceList(reference, radius)
CreateDistanceListFromMultipleReferences(reference_list, tolerance_list, sequence_separation, radius)

Both these functions create lists of distances to be checked during a Local Distance Difference Test (see description of the functions above).

Note

These functions process only standard residues present in the first chain of the reference structures.

The only difference between the two functions is that one takes a single reference structure and the other a list of reference structures. The structures in the list have to be properly prepared before being passed to the function. Corresponding residues in the structures must have the same residue number, the same chain name, etc. Gaps are allowed and automatically dealt with: if information about a distance is present in at least one of the structures, it will be considered.

If a distance between two atoms is shorter than the inclusion radius in all structures in which the two atoms are present, it is included in the list. However, if the distance is longer than the inclusion radius in at least one of the structures, it is not considered to be a local interaction and is excluded from the list.
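The inclusion rule above can be sketched in pure Python. This is an illustration under assumptions, not the OpenStructure implementation: each reference is a hypothetical dictionary mapping an atom identifier to coordinates, and, in line with the ResidueRDMap description, each kept pair stores the minimum and maximum distance observed across the references. Residue-level filtering and the naming checks are left out.

```python
import math
from itertools import combinations


def multi_reference_distances(references, radius):
    """Map each atom pair to (min_distance, max_distance) over the references.

    references: list of dicts mapping an atom identifier to (x, y, z).
    A pair is kept only if it is shorter than radius in every reference in which
    both atoms are present; references lacking one of the atoms are ignored (gaps).
    """
    observed = {}
    for ref in references:
        for a, b in combinations(sorted(ref), 2):
            observed.setdefault((a, b), []).append(math.dist(ref[a], ref[b]))
    return {
        pair: (min(dists), max(dists))
        for pair, dists in observed.items()
        if max(dists) < radius
    }


ref_1 = {"A1.CA": (0.0, 0.0, 0.0), "A2.CA": (4.0, 0.0, 0.0), "A3.CA": (0.0, 20.0, 0.0)}
ref_2 = {"A1.CA": (0.0, 0.0, 0.0), "A2.CA": (5.0, 0.0, 0.0)}  # A3 missing (gap)
print(multi_reference_distances([ref_1, ref_2], radius=15.0))
```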

The multiple-reference function takes care of residues with ambiguous symmetric sidechains. To decide which naming convention to use, the function computes a Local Distance Difference Test score for each reference against the first reference structure in the list, using only non-ambiguously named atoms. It then picks the naming convention that gives the highest score, guaranteeing that all references are processed with the correct atom names.

The cutoff list that will later be used to compute the Local Distance Difference Test score and the sequence separation parameter must be passed to the multi-reference function. These parameters do not influence the output distance list, which always includes all distances within the provided radius (to make it consistent with the single-reference corresponding function). However, the parameters are used when dealing with the naming convention of residues with ambiguous nomenclature.

Parameters:
• reference (EntityView) – a reference structure from which distances are derived
• reference_list (list of EntityView) – a list of reference structures from which distances are derived
• tolerance_list – a list of thresholds used to determine distance conservation when computing the lDDT score
• sequence_separation – sequence separation parameter used when computing the lDDT score
• radius – inclusion radius (in Angstroms) used to determine the distances included in the list

Return type: GlobalRDMap

PreparelDDTGlobalRDMap(reference_list, cutoff_list, sequence_separation, max_dist)

A wrapper around CreateDistanceList() and CreateDistanceListFromMultipleReferences(). Depending on the length of the reference_list it calls one or the other.

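Parameters:
• reference_list (list of EntityView) – a list of reference structures from which distances are derived
• cutoff_list (list of float) – a list of thresholds used to determine distance conservation when computing the lDDT score
• sequence_separation (int) – sequence separation parameter (only distances between residues whose separation in sequence is higher than this value are considered)
• max_dist (float) – the inclusion radius in Angstroms (to determine which distances are checked for conservation)

Return type: GlobalRDMap
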
CleanlDDTReferences(reference_list)

Prepares references to be used in lDDT calculation. It checks if all references have the same chain name and selects this chain for further calculations.

Warning

This function modifies the passed reference_list list.

Parameters: reference_list (list of EntityView) – A list of reference structures from which distances are derived

CheckStructure(ent, bond_table, angle_table, nonbonded_table, bond_tolerance, angle_tolerance)

Performs structural checks and filters the structure.

Parameters:
• ent (EntityView) – Structure to check
• bond_table (StereoChemicalParams) – List of bond stereo chemical parameters obtained from StereoChemicalParamsReader or FillStereoChemicalParams()
• angle_table (StereoChemicalParams) – List of angle stereo chemical parameters obtained from StereoChemicalParamsReader or FillStereoChemicalParams()
• nonbonded_table (ClashingDistances) – Information about the clashing distances obtained from StereoChemicalParamsReader or FillClashingDistances()
• bond_tolerance (float) – Tolerance in stddev for bonds
• angle_tolerance (float) – Tolerance in stddev for angles

GetlDDTPerResidueStats(model, distance_list, structural_checks, label)

Get the per-residue statistics from the lDDT calculation.

Parameters:
• model (EntityHandle) – The model structure
• distance_list (GlobalRDMap) – The list of distances to check for conservation
• structural_checks (bool) – Were structural checks performed on the model?
• label (str) – Label used for ResidueHandle properties that store the local scores.

Returns: Per-residue local lDDT scores
Return type: list of lDDTLocalScore

PrintlDDTPerResidueStats(scores, structural_checks, cutoffs_length)

Print per-residue statistics from lDDT calculation.

Parameters:
• scores (list of lDDTLocalScore) – Local lDDT scores
• structural_checks (bool) – Were structural checks performed on the model?
• cutoffs_length (int) – Length of the cutoffs list used to calculate lDDT

class lDDTLocalScore(cname, rname, rnum, is_assessed, quality_problems, local_lddt, conserved_dist, total_dist)

Object containing per-residue information about calculated lDDT.

Parameters:
• cname – Sets cname
• rname – Sets rname
• rnum – Sets rnum
• is_assessed – Sets is_assessed
• quality_problems – Sets quality_problems
• local_lddt – Sets local_lddt
• conserved_dist – Sets conserved_dist
• total_dist – Sets total_dist

cname

Chain name.

Type: str
rname

Residue name.

Type: str
rnum

Residue number.

Type: int
is_assessed

Is the residue taken into account? Yes or No.

Type: str
quality_problems

Does the residue have quality problems? No if there are no problems, NA if the problems were not assessed, Yes if there are sidechain problems and Yes+ if there are backbone problems.

Type: str
local_lddt

Local lDDT score for residue.

Type: float
conserved_dist

Number of conserved distances.

Type: int
total_dist

Total number of distances.

Type: int
ToString(structural_checks)
Parameters: structural_checks (bool) – Were structural checks applied during calculations?
Returns: String representation of the lDDTLocalScore object.
Return type: str

GetHeader(structural_checks, cutoffs_length)

Get the names of the fields as printed by the ToString() method.

Parameters:
• structural_checks (bool) – Were structural checks applied during calculations?
• cutoffs_length (int) – Length of the cutoffs list used for calculations

class StereoChemicalProps(bond_table, angle_table, nonbonded_table)

Object containing the stereo-chemical properties read from the stereo_chemical_props.txt file.

Parameters:
• bond_table – Sets bond_table
• angle_table – Sets angle_table
• nonbonded_table – Sets nonbonded_table

bond_table

Object containing bond parameters

angle_table

Object containing angle parameters

nonbonded_table

Object containing clashing distances parameters

class lDDTSettings(radius=15, sequence_separation=0, cutoffs=(0.5, 1.0, 2.0, 4.0), label="locallddt")

Object containing the settings used for lDDT calculations.

Parameters:
• radius – Sets radius.
• sequence_separation – Sets sequence_separation.
• cutoffs – Sets cutoffs.
• label – Sets label.

radius

Inclusion radius in Angstroms (to determine which distances are checked for conservation).

Type: float
sequence_separation

Sequence separation.

Type: int
cutoffs

List of thresholds used to determine distance conservation.

Type: list of float
label

The base name for the ResidueHandle properties that store the local scores.

Type: str
PrintParameters()

Print settings.

ToString()
Returns: String representation of the lDDTSettings object.
Return type: str
class lDDTScorer(references, model, settings)

Object to compute lDDT scores using LocalDistDiffTest() as in Mariani et al.

Example usage.

```python
#!/usr/bin/env python
"""Run lDDT from within script."""
from ost.io import LoadPDB
from ost.mol.alg import (CleanlDDTReferences,
                         lDDTSettings, lDDTScorer)

# Hypothetical input file: any structure containing chains A and C
ent_full = LoadPDB('example.pdb')
model_view = ent_full.Select('cname=A')
references = [ent_full.Select('cname=C')]

# Initialize settings with default parameters and print them
settings = lDDTSettings()
settings.PrintParameters()

# Clean up references (modifies the list in place)
CleanlDDTReferences(references)

# Calculate lDDT
scorer = lDDTScorer(references=references, model=model_view, settings=settings)
print("Global score:", scorer.global_score)
scorer.PrintPerResidueStats()
```

Parameters:
• references – Sets references
• model – Sets model
• settings – Sets settings

references

A list of reference structures.

Type: list(EntityView)
model

A model structure.

Type: EntityView
settings

Settings used to calculate lDDT.

global_dist_list

Global map of residue properties.

global_score

Global lDDT score. It is calculated as conserved_contacts divided by total_contacts.

Type: float
conserved_contacts

Number of conserved distances.

Type: int
total_contacts

Number of total distances.

local_scores

Local scores. For each residue, lDDT is calculated as the residue's conserved contacts divided by the residue's total contacts.

Type: list(lDDTLocalScore)
is_valid

Is the calculated score valid?

Type: bool
PrintPerResidueStats()

Print per-residue statistics.

class UniqueAtomIdentifier(chain, residue_number, residue_name, atom_name)

Object containing enough information to uniquely identify an atom in a structure.

Parameters:
• chain – A string containing the name of the chain to which the atom belongs
• residue_number (ResNum) – The number of the residue to which the atom belongs
• residue_name – A string containing the name of the residue to which the atom belongs
• atom_name – A string containing the name of the atom

GetChainName()

Returns the name of the chain to which the atom belongs, as a String

GetResNum()

Returns the number of the residue the atom belongs to, as a ResNum object

GetResidueName()

Returns the name of the residue to which the atom belongs, as a String

GetAtomName()

Returns the name of the atom, as a String

GetQualifiedAtomName()

Returns the qualified name of the atom: the chain name, followed by a unique residue identifier and the atom name (for example, “A.GLY2.CA”).

class ResidueRDMap

Dictionary-like object containing the list of interatomic distances that originate from a single residue to be checked during a run of the Local Distance Difference Test algorithm (key = pair of UniqueAtomIdentifier, value = pair of floats representing min and max distance observed in the structures used to build the map).

class GlobalRDMap

Dictionary-like object containing all the ResidueRDMap objects related to all the residues (key = ResNum, value = ResidueRDMap).

PrintResidueRDMap(residue_distance_list)

Prints to standard output all the distances contained in a ResidueRDMap object.

PrintGlobalRDMap(global_distance_list)

Prints to standard output all the distances contained in each of the ResidueRDMap objects that make up a GlobalRDMap object.

## qsscoring – Quaternary Structure (QS) scores

Scoring of quaternary structures (QS). The QS scoring is according to the paper by Bertoni et al.

Note

Requirements for use:

exception QSscoreError

Exception to be raised for “acceptable” exceptions in QS scoring.

Those are cases we might want to capture for default behavior.

class QSscorer(ent_1, ent_2, res_num_alignment=False)

Object to compute QS scores.

Simple usage without any precomputed contacts, symmetries and mappings:

```python
import ost
from ost.io import LoadPDB
from ost.mol.alg import qsscoring

# Hypothetical input file containing both biounits (chains A,D and B,C)
ent_full = LoadPDB('example.pdb')
# select the two biounits to compare
ent_1 = ent_full.Select('cname=A,D')
ent_2 = ent_full.Select('cname=B,C')
# get score
ost.PushVerbosityLevel(3)
try:
    qs_scorer = qsscoring.QSscorer(ent_1, ent_2)
    ost.LogScript('QSscore:', str(qs_scorer.global_score))
    ost.LogScript('Chain mapping used:', str(qs_scorer.chain_mapping))
    # commonly you want the QS global score as output
    qs_score = qs_scorer.global_score
except qsscoring.QSscoreError as ex:
    # default handling: report failure and set score to 0
    ost.LogError('QSscore failed:', str(ex))
    qs_score = 0
```


For maximal performance when computing QS scores of the same entity with many others, it is advisable to construct and reuse QSscoreEntity objects.

Any known / precomputed information can be filled into the appropriate attribute here (no checks done!). Otherwise most quantities are computed on first access and cached (lazy evaluation). Setters are provided to set values with extra checks (e.g. SetSymmetries()).

All necessary seq. alignments are done by global BLOSUM62-based alignment. A multiple sequence alignment is performed with ClustalW unless chain_mapping is provided manually. You will need to have an executable clustalw or clustalw2 in your PATH or you must set clustalw_bin accordingly. Otherwise an exception (ost.settings.FileNotFound) is thrown.

Formulas for QS scores:

- QS_best = weighted_scores / (weight_sum + weight_extra_mapped)
- QS_global = weighted_scores / (weight_sum + weight_extra_all)
-> weighted_scores = sum(w(min(d1,d2)) * (1 - abs(d1-d2)/12)) for shared
-> weight_sum = sum(w(min(d1,d2))) for shared
-> weight_extra_mapped = sum(w(d)) for all mapped but non-shared
-> weight_extra_all = sum(w(d)) for all non-shared
-> w(d) = 1 if d <= 5, exp(-2 * ((d-5.0)/4.28)^2) else


In the formulas above:

• “d”: CA/CB-CA/CB distance of an “inter-chain contact” (“d1”, “d2” for “shared” contacts).
• “mapped”: we could map chains of two structures and align residues in alignments.
• “shared”: pairs of residues which are “mapped” and have “inter-chain contact” in both structures.
• “inter-chain contact”: CB-CB pairs (CA for GLY) with distance <= 12 A (fallback to CA-CA if calpha_only is True).
• “w(d)”: weighting function (prob. of 2 res. to interact given CB distance) from Xu et al. 2009.
Parameters:
• ent_1 (QSscoreEntity, EntityHandle or EntityView) – First structure to be scored.
• ent_2 (QSscoreEntity, EntityHandle or EntityView) – Second structure to be scored.
• res_num_alignment – Sets res_num_alignment

Raises: QSscoreError if input structures are invalid or are monomers or have issues that make it impossible for a QS score to be computed.
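To make the QS score formulas above concrete, here is a minimal pure-Python transcription. It is a sketch, not the QSscorer code: it assumes the inter-chain contacts have already been found and split into shared pairs (d1, d2) and lists of non-shared distances, which in practice requires the chain mapping and alignments described below. With no contacts at all, the divisions raise ZeroDivisionError.

```python
import math


def w(d):
    """Contact weight w(d) from the formulas above."""
    return 1.0 if d <= 5.0 else math.exp(-2.0 * ((d - 5.0) / 4.28) ** 2)


def qs_scores(shared, extra_mapped, extra_all):
    """Return (QS_best, QS_global).

    shared: list of (d1, d2) distances for contacts present in both structures.
    extra_mapped: distances d of mapped but non-shared contacts.
    extra_all: distances d of all non-shared contacts (mapped or not).
    """
    weighted_scores = sum(w(min(d1, d2)) * (1.0 - abs(d1 - d2) / 12.0) for d1, d2 in shared)
    weight_sum = sum(w(min(d1, d2)) for d1, d2 in shared)
    weight_extra_mapped = sum(w(d) for d in extra_mapped)
    weight_extra_all = sum(w(d) for d in extra_all)
    qs_best = weighted_scores / (weight_sum + weight_extra_mapped)
    qs_global = weighted_scores / (weight_sum + weight_extra_all)
    return qs_best, qs_global


# one identical shared contact plus one non-shared contact outside the mapping
print(qs_scores(shared=[(4.0, 4.0)], extra_mapped=[], extra_all=[3.0]))  # (1.0, 0.5)
```

The example shows the difference between the two scores: QS_best ignores the unmapped contact, while QS_global counts it as a penalty.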
qs_ent_1

QSscoreEntity object for ent_1 given at construction. If entity names (original_name) are not unique, we set it to ‘pdb_1’ using SetName().

qs_ent_2

QSscoreEntity object for ent_2 given at construction. If entity names (original_name) are not unique, we set it to ‘pdb_2’ using SetName().

calpha_only

True if any of the two structures is CA-only (after cleanup).

Type: bool
max_ca_per_chain_for_cm

Maximal number of CA atoms to use in each chain to determine chain mappings. Setting this to -1 disables the limit. Limiting it speeds up determination of symmetries and chain mappings. By default it is set to 100.

Type: int
max_mappings_extensive

Maximal number of chain mappings to test for ‘extensive’ chain_mapping_scheme. The extensive chain mapping search must in the worst case check O(N^2) * O(N!) possible mappings for complexes with N chains. Two octamers without symmetry would require 322560 mappings to be checked. To limit computations, a QSscoreError is thrown if we try more than the maximal number of chain mappings. The value must be set before the first use of chain_mapping. By default it is set to 100000.

Type: int
res_num_alignment

Forces each alignment in alignments to be based on residue numbers instead of using a global BLOSUM62-based alignment.

Type: bool
GetOligoLDDTScorer(settings, penalize_extra_chains=True)
Parameters:
• settings – Passed to OligoLDDTScorer constructor.
• penalize_extra_chains – Passed to OligoLDDTScorer constructor.

Returns: OligoLDDTScorer object, set up for this QS scoring problem. The scorer is set up with qs_ent_1 as the reference and qs_ent_2 as the model.
SetSymmetries(symm_1, symm_2)

Set user-provided symmetry groups.

These groups are restricted to chain names appearing in ent_to_cm_1 and ent_to_cm_2 respectively. They are only valid if they cover all chains and both symm_1 and symm_2 have the same lengths of symmetry group tuples. Otherwise, the trivial symmetry group is used (see symm_1).

Parameters:
• symm_1 – Value to set for symm_1.
• symm_2 – Value to set for symm_2.
alignments

List of successful sequence alignments using chain_mapping.

There will be one alignment for each mapped chain and they are ordered by their chain names in qs_ent_1.

The first sequence of each alignment belongs to qs_ent_1 and the second one to qs_ent_2. The sequences are named according to the mapped chain names and have views attached into QSscoreEntity.ent of qs_ent_1 and qs_ent_2.

If res_num_alignment is False, each alignment is performed using a global BLOSUM62-based alignment. Otherwise, the positions in the alignment sequences are simply given by the residue number so that residues with matching numbers are aligned.

Getter: Computed on first use (cached)
Type: list of AlignmentHandle
best_score

QS-score without penalties.

Like global_score, but neglecting additional residues or chains in one of the biounits (i.e. the score is calculated considering only mapped chains and residues).

Getter: Computed on first use (cached)
Type: float
Raises: QSscoreError if only one chain is mapped
chain_mapping

Mapping from ent_to_cm_1 to ent_to_cm_2.

Properties:

Details on algorithms used to find mapping:

• We try all pairs of chem. mapped chains within symmetry group and get superpose-transformation for them
• First option: check for “sufficient overlap” of other chain-pairs
  • For each chain-pair defined above: apply superposition to full oligomer and map chains based on structural overlap
  • Structural overlap = X% of residues in second oligomer covered within Y Angstrom of a (chem. mapped) chain in first oligomer. We successively try (X,Y) = (80,4), (40,6) and (20,8) to be less and less strict in mapping (warning shown for most permissive one).
  • If multiple possible mappings are found, we choose the one which leads to the lowest multi-chain-RMSD given the superposition
• Fallback option: try all mappings to find minimal multi-chain-RMSD (warning shown)
  • For each chain-pair defined above: apply superposition, try all (!) possible chain mappings (within symmetry group) and keep mapping with lowest multi-chain-RMSD
  • Repeat procedure above to resolve symmetry. Within the symmetry group we can use the chain mapping computed before and we just need to find which symmetry group in first oligomer maps to which in the second one. We again try all possible combinations...
  • Limitations:
    • Trying all possible mappings is a combinatorial nightmare (factorial). We throw an exception if there are too many combinations (e.g. octamer vs. octamer with no usable symmetry)
    • The mapping is forced: the “best” mapping will be chosen independently of how badly they fit in terms of multi-chain-RMSD
    • As a result, such a forced mapping can lead to a large range of resulting QS scores. An extreme example was observed between 1on3.1 and 3u9r.1, where global_score can range from 0.12 to 0.43 for mappings with very similar multi-chain-RMSD.
Getter: Computed on first use (cached)
Type: dict with key / value = str (chain names, key for ent_to_cm_1, value for ent_to_cm_2)
Raises: QSscoreError if there are too many combinations to check to find a chain mapping (see max_mappings_extensive).
chain_mapping_scheme

Mapping scheme used to get chain_mapping.

Possible values:

• ‘strict’: 80% overlap needed within 4 Angstrom (overlap based mapping).
• ‘tolerant’: 40% overlap needed within 6 Angstrom (overlap based mapping).
• ‘permissive’: 20% overlap needed within 8 Angstrom (overlap based mapping). It’s best if you check mapping manually!
• ‘extensive’: Extensive search used for mapping detection (fallback). This approach has known limitations and may be removed in future versions. Mapping should be checked manually!
• ‘user’: chain_mapping was set by user before first use of this attribute.
Getter: Computed with chain_mapping on first use (cached)
Type: str
Raises: QSscoreError as in chain_mapping.
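The overlap thresholds behind the first three schemes can be illustrated with a small sketch. This is not the QSscorer implementation: we assume the two chains are already superposed, reduce each residue to one hypothetical coordinate, call a residue of the second chain covered if it lies within the distance cutoff of any residue of the first chain, and treat the percentages as inclusive lower bounds.

```python
import math

# (scheme, minimal fraction of covered residues, distance cutoff in Angstrom), as listed above
OVERLAP_SCHEMES = [("strict", 0.8, 4.0), ("tolerant", 0.4, 6.0), ("permissive", 0.2, 8.0)]


def covered_fraction(chain_2, chain_1, cutoff):
    """Fraction of positions in chain_2 lying within cutoff of any position in chain_1."""
    covered = sum(1 for p in chain_2 if any(math.dist(p, q) <= cutoff for q in chain_1))
    return covered / len(chain_2)


def first_matching_scheme(chain_2, chain_1):
    """Try the schemes from strict to permissive; fall back to 'extensive'."""
    for name, min_fraction, cutoff in OVERLAP_SCHEMES:
        if covered_fraction(chain_2, chain_1, cutoff) >= min_fraction:
            return name
    return "extensive"


chain_1 = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
chain_2 = [(3.0, 0.0, 0.0), (10.0, 5.0, 0.0)]  # 3 A and 5 A away from chain_1
print(first_matching_scheme(chain_2, chain_1))  # tolerant
```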
chem_mapping

Inter-complex mapping of chemical groups.

Each group (see QSscoreEntity.chem_groups) is mapped according to highest sequence identity. Alignment is between longest sequences in groups.

Limitations:

• If the complexes have different numbers of groups, we map only the groups of the complex with fewer groups (the rest is considered unmapped and shown as a warning)
• The mapping is forced: the “best” mapping will be chosen independently of how low the seq. identity may be
Getter: Computed on first use (cached)
Type: dict with key = tuple of chain names in qs_ent_1 and value = tuple of chain names in qs_ent_2.
Raises: QSscoreError if we end up having no chains for either entity in the mapping (can happen if chains do not have CA atoms).
clustalw_bin

Full path to clustalw or clustalw2 executable to use for multiple sequence alignments (unless chain_mapping is provided manually).

Getter: Located in path on first use (cached)
Type: str
ent_to_cm_1

Subset of qs_ent_1 used to compute chain mapping and symmetries.

Properties:

This entity might be transformed (i.e. all positions rotated/translated by same transformation matrix) if this can speed up computations. So do not assume fixed global positions (but relative distances will remain fixed).

Getter: Computed on first use (cached)
Type: EntityHandle
Raises: QSscoreError if any chain ends up having fewer than 5 residues.
ent_to_cm_2

Subset of qs_ent_2 used to compute chain mapping and symmetries (see ent_to_cm_1 for details).

global_score

QS-score with penalties.

The range of the score is between 0 (i.e. no interface residues are shared between biounits) and 1 (i.e. the interfaces are identical).

The global QS-score is computed applying penalties when interface residues or entire chains are missing (i.e. anything that is not mapped in mapped_residues / chain_mapping) in one of the biounits.

Getter: Computed on first use (cached)
Type: float
Raises: QSscoreError if only one chain is mapped.
mapped_residues

Mapping of shared residues in alignments.

Getter: Computed on first use (cached)
Type: dict with mapped_residues[c1][r1] = r2, where:

• c1 = chain name in first entity (= first sequence in the alignment)
• r1 = residue number in first entity
• r2 = residue number in second entity
superposition

Superposition result based on shared CA atoms in alignments.

The superposition can be used to map QSscoreEntity.ent of qs_ent_1 onto the one of qs_ent_2. Use ost.geom.Invert() if you need the opposite transformation.

Getter: Computed on first use (cached)
Type: ost.mol.alg.SuperpositionResult
symm_1

Symmetry groups for qs_ent_1 used to speed up chain mapping.

This is a list of chain-lists where each chain-list can be used to reconstruct the others via cyclic C or dihedral D symmetry. The first chain-list is used as a representative symmetry group. For heteromers, the group members must contain all different seqres in the oligomer.

Example: symm. groups [(A,B,C), (D,E,F), (G,H,I)] means that there are symmetry transformations to get (D,E,F) and (G,H,I) from (A,B,C).

Properties:

• All symmetry group tuples have the same length (num. of chains)
• All chains in ent_to_cm_1 appear (w/o duplicates)
• For heteros: symmetry group tuples have all different chem. groups
• Trivial symmetry group = one tuple with all chains (used if inconsistent data provided or if no symmetry is found)
• Either compatible with symm_2 or trivial symmetry groups are used. Compatibility requires the same lengths of symmetry group tuples, and it must be possible to get an overlap (80% of residues covered within 6 Angstrom of a (chem. mapped) chain) of all chains in the representative symmetry groups by superposing one pair of chains.
Getter: Computed on first use (cached)
Type: list of tuple of str (chain names)
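The structural properties listed above can be checked mechanically. The following is a minimal pure-Python sketch; `check_symm_groups` is a hypothetical helper (not part of OpenStructure), chains are plain strings, and the superposition-based compatibility test against symm_2 is deliberately left out.

```python
def check_symm_groups(symm_groups, all_chains, chem_group_of=None):
    """Check the documented properties of a list of symmetry groups.

    symm_groups:   list of tuples of chain names, e.g. [("A", "B"), ("C", "D")]
    all_chains:    chain names of the entity the groups refer to
    chem_group_of: optional {chain_name: chem_group_id}, for heteromers only
    """
    if not symm_groups:
        return False
    # All symmetry group tuples must have the same length.
    if len({len(group) for group in symm_groups}) != 1:
        return False
    # Every chain must appear exactly once (no duplicates, none missing).
    flat = [ch for group in symm_groups for ch in group]
    if len(flat) != len(set(flat)) or set(flat) != set(all_chains):
        return False
    # For heteromers, chains in one tuple must come from different chem. groups.
    if chem_group_of is not None:
        for group in symm_groups:
            groups = [chem_group_of[ch] for ch in group]
            if len(groups) != len(set(groups)):
                return False
    return True


# The example from the text: (D,E,F) and (G,H,I) are symmetry copies of (A,B,C).
print(check_symm_groups([("A", "B", "C"), ("D", "E", "F"), ("G", "H", "I")],
                        "ABCDEFGHI"))  # True
# The trivial symmetry group: one tuple holding every chain.
print(check_symm_groups([tuple("ABCDEFGHI")], "ABCDEFGHI"))  # True
```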
symm_2

Symmetry groups for qs_ent_2 (see symm_1 for details).

class QSscoreEntity(ent)

Entity with cached entries for QS scoring.

Any known / precomputed information can be filled into the appropriate attribute here as long as they are labelled as read/write. Otherwise the quantities are computed on first access and cached (lazy evaluation). The heaviest load is expected when computing contacts and contacts_ca.

Parameters: ent (EntityHandle or EntityView) – Entity to be used for QS scoring. A copy of it will be processed.
is_valid

True, if successfully initialized. False, if input structure has no protein chains with >= 20 residues.

Type: bool
original_name

Name set for ent when object was created.

Type: str
ent

Cleaned version of ent passed at construction. Hydrogens are removed, the entity is processed with a RuleBasedProcessor and chains listed in removed_chains have been removed. The name of this entity might change during scoring (see GetName()). Otherwise, this will be fixed.

removed_chains

Chains removed from ent passed at construction. These are ligand and water chains as well as small (< 20 res.) peptides or chains with no amino acids (determined by chem. type, which is set by rule based processor).

Type: list of str
calpha_only

Whether entity is CA-only (i.e. it has 0 CB atoms)

Type: bool
GetAlignment(c1, c2)

Get sequence alignment of chain c1 with chain c2. Computed on first use based on ca_chains (cached).

Parameters:

• c1 (str) – Chain name for first chain to align.
• c2 (str) – Chain name for second chain to align.
Returns: AlignmentHandle or None if it failed.
GetAngles(c1, c2)

Get Euler angles from superposition of chain c1 with chain c2. Computed on first use based on ca_chains (cached).

Parameters:

• c1 (str) – Chain name for first chain to superpose.
• c2 (str) – Chain name for second chain to superpose.
Returns: 3 Euler angles (may contain nan if something fails).
Return type: numpy.array
GetAxis(c1, c2)

Get axis of symmetry from superposition of chain c1 with chain c2. Computed on first use based on ca_chains (cached).

Parameters:

• c1 (str) – Chain name for first chain to superpose.
• c2 (str) – Chain name for second chain to superpose.
Returns: Rotational axis (may contain nan if something fails).
Return type: numpy.array
GetName()

Wrapper to GetName() of ent. This is used to uniquely identify the entity while scoring. The name may therefore change while original_name remains fixed.

SetName(new_name)

Wrapper to SetName() of ent. Use this to change unique identifier while scoring (see GetName()).

ca_chains

Map of chain names in ent to sequences with attached view to CA-only chains (into ca_entity). Useful for alignments and superpositions.

Getter: Computed on first use (cached)
Type: dict (key = str, value = SequenceHandle)
ca_entity

Reduced representation of ent with only CA atoms. This guarantees that each included residue has exactly one atom.

Getter: Computed on first use (cached)
Type: EntityHandle
chem_groups

Intra-complex group of chemically identical (seq. id. > 95%) polypeptide chains as extracted from ca_chains. First chain in group is the one with the longest sequence.

Getter: Computed on first use (cached)
Type: list of list of str (chain names)
contacts

Connectivity dictionary (read/write). As given by GetContacts() with calpha_only = False on ent.

Getter: Computed on first use (cached). Uses FilterContacts() to ensure that we only keep contacts for chains in the cleaned entity.
Type: See return type of GetContacts()
contacts_ca

CA-only connectivity dictionary (read/write). Like contacts but with calpha_only = True in GetContacts().

FilterContacts(contacts, chain_names)

Filter contacts to contain only contacts for chains in chain_names.

Parameters:

• contacts (dict) – Connectivity dictionary as produced by GetContacts().
• chain_names (list or (better) set) – Chain names to keep.
Returns: New connectivity dictionary (format as in GetContacts())
Return type: dict
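FilterContacts() works on the nested connectivity dictionary described under GetContacts(). As a rough illustration, the hypothetical pure-Python helper `filter_contacts` below (not the OpenStructure implementation) keeps an entry only when both of its chains are listed. It is a sketch: the inner residue dictionaries are shared with the input rather than copied.

```python
def filter_contacts(contacts, chain_names):
    """Keep only contacts where both chains are in chain_names.

    contacts follows the GetContacts() layout:
    contacts[ch_name1][ch_name2][res_num1][res_num2] = dist
    """
    keep = set(chain_names)  # set membership tests are O(1), hence "(better) set"
    filtered = {}
    for ch1, partners in contacts.items():
        if ch1 not in keep:
            continue
        for ch2, residue_pairs in partners.items():
            if ch2 in keep:
                # Inner residue dicts are shared with the input, not copied.
                filtered.setdefault(ch1, {})[ch2] = residue_pairs
    return filtered


contacts = {
    "A": {"B": {1: {5: 4.2}}, "C": {2: {7: 8.0}}},
    "B": {"C": {3: {9: 6.1}}},
}
print(filter_contacts(contacts, ["A", "B"]))  # {'A': {'B': {1: {5: 4.2}}}}
```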
GetContacts(entity, calpha_only, dist_thr=12.0)

Get inter-chain contacts of a macromolecular entity.

Contacts are pairs of residues within a given distance belonging to different chains. They are stored once per pair and include the CA/CB-CA/CB distance.

Parameters:

• entity (EntityHandle or EntityView) – An entity to check connectivity for.
• calpha_only (bool) – If True, we only consider CA-CA distances. Else, we use CB unless the residue is a GLY.
• dist_thr (float) – Maximal CA/CB-CA/CB distance to be considered in contact.
Returns: A connectivity dictionary. A pair of residues with chain names ch_name1 & ch_name2 (ch_name1 < ch_name2), residue numbers res_num1 & res_num2 and distance dist (<= dist_thr) are stored as result[ch_name1][ch_name2][res_num1][res_num2] = dist.
Return type: dict
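The layout of the returned dictionary is easiest to see by building one. The sketch below is a hypothetical, brute-force pure-Python stand-in for GetContacts() (`get_contacts` is not an OpenStructure function): it assumes one representative position per residue has already been picked (CA, or CB for non-GLY residues) and checks every inter-chain residue pair, which is fine for illustration but quadratic in the number of residues.

```python
import math
from itertools import combinations


def get_contacts(rep_positions, dist_thr=12.0):
    """Build a GetContacts()-style connectivity dictionary.

    rep_positions: {chain_name: {res_num: (x, y, z)}}, one representative
    position (CA, or CB for non-GLY residues) per residue.
    """
    result = {}
    # sorted() + combinations() yields each chain pair once, with ch1 < ch2.
    for ch1, ch2 in combinations(sorted(rep_positions), 2):
        for rn1, p1 in rep_positions[ch1].items():
            for rn2, p2 in rep_positions[ch2].items():
                dist = math.dist(p1, p2)
                if dist <= dist_thr:
                    per_res = result.setdefault(ch1, {}).setdefault(ch2, {})
                    per_res.setdefault(rn1, {})[rn2] = dist
    return result


positions = {
    "B": {10: (3.0, 4.0, 0.0)},
    "A": {1: (0.0, 0.0, 0.0), 2: (20.0, 0.0, 0.0)},
}
print(get_contacts(positions))  # {'A': {'B': {1: {10: 5.0}}}}
```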
class OligoLDDTScorer(ref, mdl, alignments, calpha_only, settings, penalize_extra_chains=False, chem_mapping=None)

Helper class to calculate oligomeric lDDT scores.

This class can be used independently, but commonly it will be created by calling QSscorer.GetOligoLDDTScorer().

Note

By construction, lDDT scores are not symmetric and hence it matters which structure is the reference (ref) and which one is the model (mdl). Extra residues in the model are generally not considered. Extra chains in both model and reference can be considered by setting the penalize_extra_chains flag to True.

Parameters:

• ref – Sets ref
• mdl – Sets mdl
• alignments – Sets alignments
• calpha_only – Sets calpha_only
• settings – Sets settings
• penalize_extra_chains – Sets penalize_extra_chains
• chem_mapping – Sets chem_mapping. Must be given if penalize_extra_chains is True.
ref
mdl

Full reference/model entity to be scored. The entity must contain all chains mapped in alignments and may also contain additional ones which are considered if penalize_extra_chains is True.

alignments

One alignment for each mapped chain of ref/mdl as defined in QSscorer.alignments. The first sequence of each alignment belongs to ref and the second one to mdl. Sequences must have sequence naming and attached views as defined in QSscorer.alignments.

Type: list of AlignmentHandle
calpha_only

If True, restricts lDDT score to CA only.

Type: bool
settings

Settings to use for lDDT scoring.

penalize_extra_chains

If True, extra chains in both ref and mdl will penalize the lDDT scores.

Type: bool
chem_mapping

Inter-complex mapping of chemical groups as defined in QSscorer.chem_mapping. Used to find “chem-mapped” chains in ref for unmapped chains in mdl when penalizing scores. Each unmapped model chain can add extra reference-contacts according to the average total contacts of each single “chem-mapped” reference chain. If there is no “chem-mapped” reference chain, a warning is shown and the model chain is ignored.

Only relevant if penalize_extra_chains is True.

Type: dict with key = tuple of chain names in ref and value = tuple of chain names in mdl.
lddt_mdl

The model entity used for oligomeric lDDT scoring (oligo_lddt / oligo_lddt_scorer).

Like lddt_ref, this is a single chain X containing all chains of mdl. The residue numbers match the ones in lddt_ref where aligned and have unique numbers for additional residues.

Getter: Computed on first use (cached)
Type: EntityHandle
lddt_ref

The reference entity used for oligomeric lDDT scoring (oligo_lddt / oligo_lddt_scorer).

Since the lDDT computation requires a single chain with mapped residue numbering, all chains of ref are appended into a single chain X with unique residue numbers according to the column-index in the alignment. The alignments are in the same order as they appear in alignments. Additional residues are appended at the end of the chain with unique residue numbers. Unmapped chains are only added if penalize_extra_chains is True. Only CA atoms are considered if calpha_only is True.

Getter: Computed on first use (cached)
Type: EntityHandle
mapped_lddt_scorers

List of scorer objects for each chain mapped in alignments.

Getter: Computed on first use (cached)
Type: list of MappedLDDTScorer
oligo_lddt

Oligomeric lDDT score.

The score is computed as conserved contacts divided by the total contacts in the reference using the oligo_lddt_scorer, which uses the full complex as reference/model structure. If penalize_extra_chains is True, the reference/model complexes contain all chains (otherwise only the mapped ones) and additional contacts are added to the reference’s total contacts for unmapped model chains according to the chem_mapping.

The main difference with weighted_lddt is that the lDDT scorer “sees” the full complex here (incl. inter-chain contacts), while the weighted single chain score looks at each chain separately.

Getter: Computed on first use (cached)
Type: float
oligo_lddt_scorer

lDDT Scorer object for lddt_ref and lddt_mdl.

Getter: Computed on first use (cached)
Type: lDDTScorer
sc_lddt

List of global scores extracted from sc_lddt_scorers.

If scoring for a mapped chain fails, an error is displayed and a score of 0 is assigned.

Getter: Computed on first use (cached)
Type: list of float
sc_lddt_scorers

List of lDDT scorer objects extracted from mapped_lddt_scorers.

Type: list of lDDTScorer
weighted_lddt

Weighted average of single chain lDDT scores.

The score is computed as a weighted average of single chain lDDT scores (see sc_lddt_scorers) using the total contacts of each single reference chain as weights. If penalize_extra_chains is True, unmapped chains are added with a 0 score and total contacts taken from the actual reference chains or (for unmapped model chains) using the chem_mapping.

See oligo_lddt for a comparison of the two scores.

Getter: Computed on first use (cached)
Type: float
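To make the difference between oligo_lddt and weighted_lddt concrete, here is a small arithmetic sketch with made-up contact counts. The helpers `weighted_lddt` and `oligo_lddt` are hypothetical and only model the formulas described above; they do not run the actual lDDT scorer. Note that when the weights are the total contacts, the weighted average of single chain scores reduces to pooled intra-chain counts, so the gap between the two scores comes from the inter-chain contacts that only the oligomeric score sees.

```python
def weighted_lddt(chain_counts, unmapped_totals=()):
    """Weighted average of single chain scores, weights = total contacts.

    chain_counts:    [(conserved, total), ...] for each mapped reference chain
    unmapped_totals: total contacts of unmapped chains, which score 0
    """
    scores = [conserved / total for conserved, total in chain_counts]
    weights = [total for _, total in chain_counts]
    scores += [0.0] * len(unmapped_totals)
    weights += list(unmapped_totals)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def oligo_lddt(conserved, total, extra_ref_contacts=0):
    """Conserved over total contacts of the full complex (incl. inter-chain)."""
    return conserved / (total + extra_ref_contacts)


# Toy numbers: chain A conserves 80/100 contacts, chain B 30/50, and the
# A-B interface has 10 conserved out of 40 contacts.
print(round(weighted_lddt([(80, 100), (30, 50)]), 4))     # 0.7333
print(round(oligo_lddt(80 + 30 + 10, 100 + 50 + 40), 4))  # 0.6316
```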
class MappedLDDTScorer(alignment, calpha_only, settings)

A simple class to calculate a single-chain lDDT score on a given chain to chain mapping as extracted from OligoLDDTScorer.

Parameters:

• alignment – Sets alignment
• calpha_only – Sets calpha_only
• settings – Sets settings
alignment

Alignment with two sequences named according to the mapped chains and with views attached to both sequences (e.g. one of the items of QSscorer.alignments).

The first sequence is assumed to be the reference and the second one the model. Since the lDDT score is not symmetric (extra residues in model are ignored), the order is important.

calpha_only

If True, restricts lDDT score to CA only.

Type: bool
settings

Settings to use for lDDT scoring.

lddt_scorer

lDDT Scorer object for the given chains.

Type: lDDTScorer
reference_chain_name

Chain name of the reference.

Type: str
model_chain_name

Chain name of the model.

Type: str
GetPerResidueScores()
Returns: Scores for each residue
Return type: list of dict with one item for each residue existing in model and reference:

• “residue_number”: Residue number in reference chain
• “residue_name”: Residue name in reference chain
• “lddt”: local lDDT
• “conserved_contacts”: number of conserved contacts
• “total_contacts”: total number of contacts

## Steric Clashes¶

The following function detects steric clashes in atomic structures. Two atoms are clashing if their Euclidean distance is smaller than a threshold value (minus a tolerance offset).

FilterClashes(entity, clashing_distances, always_remove_bb=False)

This function filters out residues with non-bonded clashing atoms. If the clashing atom is a backbone atom, the complete residue is removed from the structure; if the atom is part of the side chain, only the side-chain atoms are removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True, the whole residue is removed even if a clash is only detected in the side chain.

The function returns a view containing all elements (residues, atoms) that have not been removed from the input structure, plus a ClashingInfo object containing information about the detected clashes.

Two atoms are defined as clashing if their distance is shorter than the reference distance minus a tolerance threshold. The information about the clashing distances and the tolerance thresholds for all possible pairs of atoms is passed to the function as a parameter.

Hydrogen and deuterium atoms are ignored by this function.

Parameters:

• entity (EntityView or EntityHandle) – The input entity
• clashing_distances (ClashingDistances) – information about the clashing distances
• always_remove_bb (bool) – if set to True, the whole residue is removed even if the clash happens in the side chain
Returns: A tuple of two elements: the filtered EntityView and a ClashingInfo object
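The clash criterion itself is simple: a non-bonded pair clashes when its distance is below the element-pair reference distance minus the tolerance. The following pure-Python sketch illustrates only that detection step; `find_clashes`, its input layout and the numeric parameters are hypothetical and are not OpenStructure's defaults, and the residue/side-chain removal done by FilterClashes() is not modelled.

```python
import math


def find_clashes(atoms, clashing_distances, bonded_pairs=frozenset()):
    """Report non-bonded atom pairs closer than (reference distance - tolerance).

    atoms:              [(atom_id, element, (x, y, z)), ...]
    clashing_distances: {frozenset({ele1, ele2}): (clash_distance, tolerance)}
    bonded_pairs:       set of frozenset({atom_id1, atom_id2}) to skip
    """
    clashes = []
    for i, (id1, ele1, pos1) in enumerate(atoms):
        for id2, ele2, pos2 in atoms[i + 1:]:
            if ele1 in ("H", "D") or ele2 in ("H", "D"):
                continue  # hydrogen and deuterium atoms are ignored
            if frozenset((id1, id2)) in bonded_pairs:
                continue  # only non-bonded atoms can clash
            entry = clashing_distances.get(frozenset((ele1, ele2)))
            if entry is None:
                continue  # no reference distance for this element pair
            clash_distance, tolerance = entry
            dist = math.dist(pos1, pos2)
            if dist < clash_distance - tolerance:
                clashes.append((id1, id2, dist))
    return clashes


# Hypothetical parameters, not OpenStructure's defaults.
params = {frozenset(("C",)): (3.4, 0.5), frozenset(("C", "N")): (3.25, 0.5)}
atoms = [("A1", "C", (0.0, 0.0, 0.0)), ("A2", "C", (2.0, 0.0, 0.0)),
         ("A3", "H", (0.5, 0.0, 0.0)), ("A4", "N", (10.0, 0.0, 0.0))]
print(find_clashes(atoms, params))  # [('A1', 'A2', 2.0)]
```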
CheckStereoChemistry(entity, bond_stats, angle_stats, bond_tolerance, angle_tolerance, always_remove_bb=False)

This function filters out residues with severe stereo-chemical violations. If the violation involves a backbone atom, the complete residue is removed from the structure; if it involves an atom that is part of the side chain, only the side chain is removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True, the whole residue is removed even if a violation is only detected in the side chain.

The function returns a view containing all elements (residues, atoms) that have not been removed from the input structure, plus a StereoChemistryInfo object containing information about the detected stereo-chemical violations.

A violation is defined as a bond length that lies outside of the range [mean_length - std_dev*bond_tolerance, mean_length + std_dev*bond_tolerance] or an angle width outside of the range [mean_width - std_dev*angle_tolerance, mean_width + std_dev*angle_tolerance]. The information about the mean lengths and widths and the corresponding standard deviations is passed to the function using two parameters.

Hydrogen and deuterium atoms are ignored by this function.

Parameters:

• entity (EntityView or EntityHandle) – The input entity
• bond_stats (StereoChemicalParams) – statistics about bond lengths
• angle_stats (StereoChemicalParams) – statistics about angle widths
• bond_tolerance (float) – tolerance for bond lengths (in standard deviations)
• angle_tolerance (float) – tolerance for angle widths (in standard deviations)
• always_remove_bb (bool) – if set to True, the whole residue is removed even if the violation happens in the side chain
Returns: A tuple of two elements: the filtered EntityView and a StereoChemistryInfo object
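The allowed range used above is just the mean widened by a tolerance expressed in standard deviations. A minimal pure-Python sketch of that check follows; `allowed_range` and `is_violation` are hypothetical helpers and the bond statistics in the example are made-up numbers, not Engh and Huber values.

```python
def allowed_range(mean, std_dev, tolerance):
    """Closed interval [mean - std_dev*tolerance, mean + std_dev*tolerance]."""
    return mean - std_dev * tolerance, mean + std_dev * tolerance


def is_violation(value, mean, std_dev, tolerance):
    """A value is a violation if it lies outside the allowed range."""
    low, high = allowed_range(mean, std_dev, tolerance)
    return value < low or value > high


# Hypothetical bond statistics: mean 1.50 A, std. dev. 0.02 A, tolerance 12.
print(allowed_range(1.50, 0.02, 12))       # approx. (1.26, 1.74)
print(is_violation(1.80, 1.50, 0.02, 12))  # True
print(is_violation(1.55, 1.50, 0.02, 12))  # False
```

The same check applies to angle widths, with degrees instead of Angstroms.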
class ClashingInfo

This object is returned by the FilterClashes() function, and contains information about the clashes detected by the function.

GetClashCount()
Returns: number of clashes between non-bonded atoms detected in the input structure
GetAverageOffset()
Returns: a value in Angstroms representing the average offset by which clashing atoms lie closer than the minimum acceptable distance (which of course differs for each possible pair of elements)
GetClashList()
Returns: list of detected inter-atomic clashes
Return type: list of ClashEvent
class ClashEvent

This object contains all the information about a single clash detected by the FilterClashes() function.

GetFirstAtom()
GetSecondAtom()
Returns: atoms which clash
Return type: UniqueAtomIdentifier
GetModelDistance()
Returns: distance (in Angstroms) between the two clashing atoms as observed in the model
GetAdjustedReferenceDistance()
Returns: minimum acceptable distance (in Angstroms) between the two atoms involved in the clash, as defined in ClashingDistances
class StereoChemistryInfo

This object is returned by the CheckStereoChemistry() function, and contains information about bond lengths and planar angle widths in the structure that diverge from the parameters tabulated by Engh and Huber in the International Tables for Crystallography. Only elements that diverge from the tabulated value by a minimum number of standard deviations (defined when the CheckStereoChemistry() function is called) are reported.

GetBadBondCount()
Returns: number of bonds where a serious violation was detected
GetBondCount()
Returns: total number of bonds in the structure checked by the CheckStereoChemistry function
GetAvgZscoreBonds()
Returns: average z-score of all the bond lengths in the structure, computed using Engh and Huber’s mean and standard deviation values
GetBadAngleCount()
Returns: number of planar angles where a serious violation was detected
GetAngleCount()
Returns: total number of planar angles in the structure checked by the CheckStereoChemistry function
GetAvgZscoreAngles()
Returns: average z-score of all the planar angle widths, computed using Engh and Huber’s mean and standard deviation values.
GetBondViolationList()
Returns: list of bond length violations detected in the structure
Return type: list of StereoChemicalBondViolation
GetAngleViolationList()
Returns: list of angle width violations detected in the structure
Return type: list of StereoChemicalAngleViolation
class StereoChemicalBondViolation

This object contains all the information about a single detected violation of stereo-chemical parameters in a bond length.

GetFirstAtom()
GetSecondAtom()
Returns: first / second atom of the bond
Return type: UniqueAtomIdentifier
GetBondLength()
Returns: length of the bond (in Angstroms) as observed in the model
GetAllowedRange()
Returns: allowed range of bond lengths (in Angstroms), according to Engh and Huber’s tabulated parameters and the tolerance threshold used when the CheckStereoChemistry() function was called
Return type: tuple (minimum and maximum)
class StereoChemicalAngleViolation

This object contains all the information about a single detected violation of stereo-chemical parameters in a planar angle width.

GetFirstAtom()
GetSecondAtom()
GetThirdAtom()
Returns: first / second (vertex) / third atom that defines the planar angle
Return type: UniqueAtomIdentifier
GetAngleWidth()
Returns: width of the planar angle (in degrees) as observed in the model
GetAllowedRange()
Returns: allowed range of angle widths (in degrees), according to Engh and Huber’s tabulated parameters and the tolerance threshold used when the CheckStereoChemistry() function was called
Return type: tuple (minimum and maximum)
class ClashingDistances

Object containing information about clashing distances between non-bonded atoms

ClashingDistances()

Creates an empty distance list

SetClashingDistance(ele1, ele2, clash_distance, tolerance)

Adds or replaces an entry in the list

Parameters:

• ele1 – string containing the first element’s name
• ele2 – string containing the second element’s name
• clash_distance – minimum clashing distance (in Angstroms)
• tolerance – tolerance threshold (in Angstroms)
GetClashingDistance(ele1, ele2)
Parameters:

• ele1 – string containing the first element’s name
• ele2 – string containing the second element’s name
Returns: reference distance and a tolerance threshold (both in Angstroms) for two elements
Return type: tuple (minimum clashing distance, tolerance threshold)
GetAdjustedClashingDistance(ele1, ele2)
Parameters:

• ele1 – string containing the first element’s name
• ele2 – string containing the second element’s name
Returns: reference distance (in Angstroms) for two elements, already adjusted by the tolerance threshold
GetMaxAdjustedDistance()
Returns: longest clashing distance (in Angstroms) in the list, after adjustment with tolerance threshold
IsEmpty()
Returns: True if the list is empty (i.e. in an invalid, useless state)
PrintAllDistances()

Prints all distances in the list to standard output
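The relationship between the stored pair (clash distance, tolerance) and the adjusted distance can be modelled with a small table. The class below is a hypothetical pure-Python stand-in written only to illustrate the documented methods; it is not the OpenStructure ClashingDistances class, its method names are made up, it assumes lookups are symmetric in the two element names, and the numbers are illustrative only.

```python
class ToyClashingDistances:
    """Hypothetical pure-Python model of the documented ClashingDistances methods."""

    def __init__(self):
        self._table = {}  # sorted (ele1, ele2) -> (clash_distance, tolerance)

    @staticmethod
    def _key(ele1, ele2):
        # Assumption: lookups are symmetric in the two elements.
        return tuple(sorted((ele1, ele2)))

    def set_clashing_distance(self, ele1, ele2, clash_distance, tolerance):
        self._table[self._key(ele1, ele2)] = (clash_distance, tolerance)

    def get_clashing_distance(self, ele1, ele2):
        return self._table[self._key(ele1, ele2)]

    def get_adjusted_clashing_distance(self, ele1, ele2):
        clash_distance, tolerance = self.get_clashing_distance(ele1, ele2)
        return clash_distance - tolerance

    def get_max_adjusted_distance(self):
        return max(d - tol for d, tol in self._table.values())

    def is_empty(self):
        return not self._table


dists = ToyClashingDistances()
dists.set_clashing_distance("C", "C", 3.4, 0.5)  # hypothetical values
dists.set_clashing_distance("N", "O", 3.1, 0.4)
print(dists.get_clashing_distance("O", "N"))         # (3.1, 0.4)
print(round(dists.get_max_adjusted_distance(), 2))   # 2.9
```

A value like the maximum adjusted distance is a natural candidate for a neighbour-search radius, since no pair farther apart than that can clash; whether OpenStructure uses it this way internally is not stated here.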

class StereoChemicalParams

Object containing stereo-chemical information about bonds and angles. For each item (bond or angle in a specific residue), stores the mean and standard deviation

StereoChemicalParams()

Creates an empty parameter list

SetParam(item, residue, mean, standard_dev)

Adds or replaces an entry in the list

Parameters:

• item – string defining a bond (format: X-Y) or an angle (format: X-Y-Z), where X, Y and Z are atom names
• residue – string containing the residue type for this entry
• mean – mean bond length (in Angstroms) or angle width (in degrees)
• standard_dev – standard deviation of the bond length (in Angstroms) or of the angle width (in degrees)
IsEmpty()
Returns: True if the list is empty (i.e. in an invalid, useless state)
PrintAllParameters()

Prints all entries in the list to standard output

FillClashingDistances(file_content)
FillBondStereoChemicalParams(file_content)
FillAngleStereoChemicalParams(file_content)

These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from the content of a parameter file.

Parameters: file_content (list of str) – list of lines from the parameter file
Return type: ClashingDistances or StereoChemicalParams
FillClashingDistancesFromFile(filename)
FillBondStereoChemicalParamsFromFile(filename)
FillAngleStereoChemicalParamsFromFile(filename)

These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from a file path.

Parameters: filename (str) – path to parameter file
Return type: ClashingDistances or StereoChemicalParams
DefaultClashingDistances()
DefaultBondStereoChemicalParams()
DefaultAngleStereoChemicalParams()

These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, using the default parameter files distributed with OpenStructure.

Return type: ClashingDistances or StereoChemicalParams
ResidueNamesMatch(probe, reference)

The function requires a reference structure and a probe structure. The function checks that all the residues in the reference structure that appear in the probe structure (i.e., that have the same ResNum) are of the same residue type. Chains are compared by order, not by chain name (i.e.: the first chain of the reference will be compared with the first chain of the probe structure, etc.)

Parameters:

• probe (EntityView) – the structure to test
• reference (EntityView) – the reference structure
Returns: True if the residue names are the same, False otherwise
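The matching rule (pair chains by order, compare only residue numbers present in both) can be sketched with plain dictionaries. `residue_names_match` is a hypothetical helper operating on `{res_num: res_name}` dicts rather than entity views, and pairing chains with `zip()` (which stops at the shorter chain list) is an assumption about how unequal chain counts are handled.

```python
def residue_names_match(probe, reference):
    """probe/reference: list of chains, each a {res_num: res_name} dict.

    Chains are paired by order, not by name.
    """
    # Assumption: zip() stops at the shorter list of chains.
    for probe_chain, ref_chain in zip(probe, reference):
        for res_num, ref_name in ref_chain.items():
            # Only residues present in both structures are compared.
            if res_num in probe_chain and probe_chain[res_num] != ref_name:
                return False
    return True


reference = [{1: "MET", 2: "ALA", 3: "GLY"}]
print(residue_names_match([{2: "ALA", 3: "GLY"}], reference))  # True
print(residue_names_match([{2: "SER"}], reference))            # False
```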

## Superposing structures¶

Superpose(ent_a, ent_b, match='number', atoms='all', iterative=False, max_iterations=5, distance_threshold=3.0)

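Superposes the model entity onto the reference. To do so, two views are created, returned with the result. atoms describes what goes into these views and match the selection method. For superposition, SuperposeSVD() or IterativeSuperposeSVD() are called (depending on iterative). For matching, the following methods are recognised:

• ‘number’ – select residues by residue number, includes atoms, calls MatchResidueByNum()
• ‘index’ – select residues by index in chain, includes atoms, calls MatchResidueByIdx()
• ‘local-aln’ – select residues from a Smith/Waterman alignment, includes atoms, calls MatchResidueByLocalAln()
• ‘global-aln’ – select residues from a Needleman/Wunsch alignment, includes atoms, calls MatchResidueByGlobalAln()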

Parameters:

• ent_a (EntityView or EntityHandle) – The model entity (superposition transform is applied on full entity handle here)
• ent_b (EntityView or EntityHandle) – The reference entity
• match (str) – Method to gather residues/atoms
• atoms (str, list, set) – The subset of atoms to be used in the superposition
• iterative (bool) – Whether or not to use iterative superposition.
• max_iterations (int) – Max. number of iterations for IterativeSuperposeSVD() (only if iterative = True)
• distance_threshold (float) – Distance threshold for IterativeSuperposeSVD() (only if iterative = True)
Returns: An instance of SuperpositionResult.
ParseAtomNames(atoms)

Parses different representations of a list of atom names and returns a set, understandable by MatchResidueByNum(). In essence, this function translates

• None to None
• ‘all’ to None
• ‘backbone’ to set(['N', 'CA', 'C', 'O'])
• ‘aname1, aname2’ to set(['aname1', 'aname2'])
• ['aname1', 'aname2'] to set(['aname1', 'aname2'])
Parameters: atoms (str, list, set) – Identifier or list of atoms
Returns: A set of atoms.
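The translation rules listed above can be written down directly. The function below is a pure-Python sketch that mirrors those rules for illustration; `parse_atom_names` is a hypothetical name, and its case-insensitive handling of ‘all’ and ‘backbone’ is an assumption not stated in the text.

```python
def parse_atom_names(atoms):
    """Normalise an atom-name specification into a set (or None = all atoms)."""
    if atoms is None:
        return None
    if isinstance(atoms, str):
        keyword = atoms.strip().lower()  # assumption: keywords are case-insensitive
        if keyword == "all":
            return None
        if keyword == "backbone":
            return {"N", "CA", "C", "O"}
        # Comma-separated names; surrounding whitespace is dropped.
        return {name.strip() for name in atoms.split(",")}
    return set(atoms)  # list, tuple or set of names


print(parse_atom_names("all"))               # None
print(sorted(parse_atom_names("backbone")))  # ['C', 'CA', 'N', 'O']
print(sorted(parse_atom_names("CA, CB")))    # ['CA', 'CB']
```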
MatchResidueByNum(ent_a, ent_b, atoms='all')

Returns a tuple of views containing exactly the same number of atoms. Residues are matched by residue number. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in the order they occur in the entities. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.

Parameters:

• ent_a (EntityView or EntityHandle) – The first entity
• ent_b (EntityView or EntityHandle) – The second entity
• atoms (str, list, set) – The subset of atoms to be included in the two views.
Returns: Two EntityView instances with the same number of residues matched by number. Each residue will have the same number & type of atoms.
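The core of matching by residue number is a per-chain intersection of residue numbers followed by a per-residue intersection of atom names. The sketch below models this with nested dictionaries instead of entity views; `match_residues_by_num` is a hypothetical helper, and sorting residue numbers and atom names (rather than keeping the order of appearance in the chain) is a simplification made for deterministic output.

```python
def match_residues_by_num(chains_a, chains_b, atoms=None):
    """Pair atoms of residues with equal residue numbers.

    chains_a/chains_b: list of chains, each {res_num: {atom_name: (x, y, z)}}
    atoms:             set of atom names to keep, or None for all atoms
    Returns two lists of (res_num, atom_name, position) of equal length.
    """
    matched_a, matched_b = [], []
    # zip() pairs chains by order and stops at the lower chain count.
    for chain_a, chain_b in zip(chains_a, chains_b):
        for res_num in sorted(chain_a.keys() & chain_b.keys()):
            # Keep only atoms present in both residues (and requested).
            names = chain_a[res_num].keys() & chain_b[res_num].keys()
            if atoms is not None:
                names &= atoms
            for name in sorted(names):
                matched_a.append((res_num, name, chain_a[res_num][name]))
                matched_b.append((res_num, name, chain_b[res_num][name]))
    return matched_a, matched_b


a = [{1: {"N": (0, 0, 0), "CA": (1, 0, 0)}, 2: {"CA": (2, 0, 0)}}]
b = [{1: {"CA": (1, 1, 0), "CB": (1, 2, 0)}, 3: {"CA": (5, 0, 0)}}]
va, vb = match_residues_by_num(a, b)
print([(r, n) for r, n, _ in va])  # [(1, 'CA')]
```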
MatchResidueByIdx(ent_a, ent_b, atoms='all')

Returns a tuple of views containing exactly the same number of atoms. Residues are matched by position in the chains of an entity. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count. The number of residues per chain is supposed to be the same.

Parameters:

• ent_a (EntityView or EntityHandle) – The first entity
• ent_b (EntityView or EntityHandle) – The second entity
• atoms (str, list, set) – The subset of atoms to be included in the two views.
Returns: Two EntityView instances with the same number of residues matched by position. Each residue will have the same number & type of atoms.
MatchResidueByLocalAln(ent_a, ent_b, atoms='all')

Match residues by local alignment. Takes ent_a and ent_b, extracts the sequences chain-wise and aligns them in Smith/Waterman manner using the BLOSUM62 matrix for scoring. Only residues which are marked as peptide linking are considered for alignment. The residues of the entities are then matched based on this alignment. Only atoms present in both residues are included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.

Parameters:

• ent_a (EntityView or EntityHandle) – The first entity
• ent_b (EntityView or EntityHandle) – The second entity
• atoms (str, list, set) – The subset of atoms to be included in the two views.
Returns: Two EntityView instances with the same number of residues. Each residue will have the same number & type of atoms.
MatchResidueByGlobalAln(ent_a, ent_b, atoms='all')

Match residues by global alignment. Same as MatchResidueByLocalAln() but performs a global Needleman/Wunsch alignment of the sequences using the BLOSUM62 matrix for scoring.

Parameters:

• ent_a (EntityView or EntityHandle) – The first entity
• ent_b (EntityView or EntityHandle) – The second entity
• atoms (str, list, set) – The subset of atoms to be included in the two views.
Returns: Two EntityView instances with the same number of residues. Each residue will have the same number & type of atoms.
class SuperpositionResult
rmsd

RMSD of the superposed entities.

view1
view2

Two EntityView instances used in the superposition (not set if the Vec3List variants of the superposition functions are used).

transformation

Transformation (Mat4) used to map view1 onto view2.

fraction_superposed
rmsd_superposed_atoms
ncycles

For iterative superposition (IterativeSuperposeSVD()): fraction and RMSD of atoms that were superposed with a distance below the given threshold and the number of iteration cycles performed.

SuperposeSVD(view1, view2, apply_transform=True)
SuperposeSVD(list1, list2)

Superposition of two sets of atoms minimizing RMSD using a classic SVD based algorithm.

Note that the atom positions in the view are taken blindly in the order in which the atoms appear.

Parameters:

• view1 (EntityView) – View on the model entity
• view2 (EntityView) – View on the reference entity
• list1 (Vec3List) – List of atom positions for model entity
• list2 (Vec3List) – List of atom positions for reference entity
• apply_transform (bool) – If True, the superposition transform is applied to the (full!) entity handle linked to view1.
Returns: An instance of SuperpositionResult.
IterativeSuperposeSVD(view1, view2, max_iterations=5, distance_threshold=3.0, apply_transform=True)
IterativeSuperposeSVD(list1, list2, max_iterations=5, distance_threshold=3.0)

Iterative superposition of two sets of atoms. In each iteration cycle, we keep a fraction of atoms with distances below distance_threshold and get the superposition considering only those atoms.

Note that the atom positions in the view are taken blindly in the order in which the atoms appear.

Parameters:

• view1 (EntityView) – View on the model entity
• view2 (EntityView) – View on the reference entity
• list1 (Vec3List) – List of atom positions for model entity
• list2 (Vec3List) – List of atom positions for reference entity
• max_iterations (int) – Max. number of iterations to be performed
• distance_threshold (float) – Distance threshold defining superposed atoms
• apply_transform (bool) – If True, the superposition transform is applied to the (full!) entity handle linked to view1.
Returns: An instance of SuperpositionResult.
Raises: Exception if atom counts do not match or if there are fewer than 3 atoms.
CalculateRMSD(view1, view2, transformation=geom.Mat4())
Parameters:

• view1 (EntityView) – View on the model entity
• view2 (EntityView) – View on the reference entity
• transformation (Mat4) – Optional transformation to apply on each atom position of view1.
Returns: RMSD of atom positions (taken blindly in the order in which the atoms appear) in the two given views.
Return type: float
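The RMSD computation pairs positions strictly by order and optionally transforms the first set beforehand. The following pure-Python sketch shows that arithmetic; `calculate_rmsd` and `transform` are hypothetical helpers, and the nested-tuple 4x4 matrix (row-major, homogeneous coordinates) is a stand-in for geom.Mat4 whose layout is an assumption of this sketch.

```python
import math

IDENTITY = ((1.0, 0.0, 0.0, 0.0),
            (0.0, 1.0, 0.0, 0.0),
            (0.0, 0.0, 1.0, 0.0),
            (0.0, 0.0, 0.0, 1.0))


def transform(mat4, pos):
    """Apply a row-major 4x4 homogeneous transformation to a 3D point."""
    x, y, z = pos
    return tuple(mat4[r][0] * x + mat4[r][1] * y + mat4[r][2] * z + mat4[r][3]
                 for r in range(3))


def calculate_rmsd(pos1, pos2, transformation=IDENTITY):
    """RMSD of positions paired blindly by order; transformation acts on pos1."""
    if len(pos1) != len(pos2) or not pos1:
        raise ValueError("position lists must be non-empty and of equal length")
    sq_sum = sum(math.dist(transform(transformation, p), q) ** 2
                 for p, q in zip(pos1, pos2))
    return math.sqrt(sq_sum / len(pos1))


model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shift_x = ((1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 0.0),
           (0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 0.0, 1.0))
print(calculate_rmsd(model, reference))           # 1.0
print(calculate_rmsd(model, reference, shift_x))  # 0.0
```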

## Algorithms on Structures¶

Accessibility(ent, probe_radius=1.4, include_hydrogens=False, include_hetatm=False, include_water=False, oligo_mode=False, selection="", asa_abs="asaAbs", asa_rel="asaRel", asa_atom="asaAtom", algorithm=NACCESS)

Calculates the accessible surface area for every atom in ent. The algorithm mimics the behaviour of the bindings available for the NACCESS and DSSP tools and has been tested to reproduce their numbers.

Parameters: ent (EntityView / EntityHandle) – Entity on which to calculate the surface. probe_radius (float) – Radius of the probe used to determine the accessible surface area. include_hydrogens (bool) – Whether to include hydrogens in the solvent accessibility calculations. By default, every atom with ele=H,D is simply neglected. include_hetatms (bool) – Whether to include atoms flagged as hetatms, e.g. ligands, in the solvent accessibility calculations. They are neglected by default. include_water (bool) – Whether to include water in the solvent accessibility calculations. By default, every residue with name “HOH” is neglected. oligo_mode (bool) – A typical use case of accessibility calculations is to determine the solvent accessibility of a full complex and then the accessibility of each chain individually. Many calculations can be cached because only the values of atoms close to an interface change. This is exactly what happens when you activate the oligo mode. It returns exactly the same value, but in addition to the values computed for the full complex it stores the values from each individual chain as float properties on every residue and atom. For example, if the atom accessibility property name is set to “asaAtom”, the accessibility in the full complex is stored as “asaAtom” and the accessibility when only considering that particular chain is stored as “asaAtom_single_chain”. The other properties follow the same naming scheme. selection (str) – Selection statement that is applied to ent before doing anything. Everything that is not selected is neglected. The default value of “” results in no selection at all. asa_abs (str) – Float property name to assign the summed solvent accessible surface of a residue's atoms to that residue. asa_rel (str) – Float property name to assign the relative solvent accessibility to a residue. This is the absolute accessibility divided by the maximum solvent accessibility of that particular residue.
This maximum solvent accessibility depends on the chosen AccessibilityAlgorithm. Only residues of the 20 standard amino acids can be handled. With the NACCESS algorithm you can expect a value in the range [0.0, 100.0] and a value of -99.9 for non-standard residues. With the DSSP algorithm you can expect a value in the range [0.0, 1.0]; no float property is assigned for a non-standard residue. asa_atom (str) – Float property name to assign the solvent accessible area to each atom. algorithm (AccessibilityAlgorithm) – Specifies the algorithm used for the solvent accessibility calculations. Returns: The summed solvent accessibility of all atoms in ent.
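As a plain illustration of the asa_rel conventions above, the sketch below reproduces only the final division and scaling step. It is not the OST implementation: the helper name relative_asa and its max_asa argument are hypothetical, and the per-residue maximum accessibilities that NACCESS and DSSP actually use are not reproduced here, so max_asa has to be supplied by the caller.

```python
def relative_asa(abs_asa, max_asa, algorithm):
    """Hypothetical helper mimicking the asa_rel conventions described above.

    abs_asa   -- summed atomic accessibility of one residue
    max_asa   -- maximum accessibility of that residue type, or None for a
                 non-standard residue (assumed input, not an OST call)
    algorithm -- "NACCESS" or "DSSP"
    """
    if max_asa is None:
        # NACCESS flags non-standard residues with -99.9; DSSP assigns nothing
        return -99.9 if algorithm == "NACCESS" else None
    ratio = abs_asa / max_asa
    # NACCESS reports a percentage, DSSP a fraction
    return 100.0 * ratio if algorithm == "NACCESS" else ratio


print(relative_asa(50.0, 200.0, "NACCESS"))  # 25.0
print(relative_asa(50.0, 200.0, "DSSP"))     # 0.25
```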
class AccessibilityAlgorithm

The accessibility algorithm enum specifies the algorithm used by the respective tools. Available are:

NACCESS, DSSP
AssignSecStruct(ent)

Assigns secondary structures to all residues based on hydrogen bond patterns as described by DSSP.

Parameters: ent (EntityView / EntityHandle) – Entity on which to assign secondary structures
class FindMemParam

Result object for the membrane detection algorithm described below

axis

Initial search axis from which the optimal membrane slab was found

tilt_axis

Axis around which we tilt the membrane starting from the initial axis

tilt

Angle by which to tilt around tilt_axis

angle

After the tilt operation we perform a rotation around the initial axis with this angle to get the final membrane axis

membrane_axis

The result of applying the tilt and rotation procedure described above. The membrane_axis is orthogonal to the membrane plane and has unit length.

pos

Real number that describes the membrane center point. To get the actual position, compute pos * membrane_axis.

width

Total width of the membrane in Å

energy

Pseudo energy of the implicit solvation model

membrane_asa

Membrane accessible surface area

membrane_representation

Dummy atoms that represent the membrane. This entity is only valid if assign_membrane_representation has been set to True when calling FindMembrane.
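The relation between axis, tilt_axis, tilt, angle and membrane_axis can be sketched with plain vector rotations. This is a minimal pure-Python sketch under assumptions not stated above: angles are taken in radians, rotations follow the right-hand rule (Rodrigues' rotation formula), and all axes have unit length. The helper names are hypothetical and the exact conventions inside OST may differ.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rotate(v, k, theta):
    """Rotate v around the unit axis k by theta radians (Rodrigues' formula)."""
    c, s = math.cos(theta), math.sin(theta)
    kxv = cross(k, v)
    kdv = dot(k, v)
    return tuple(v[i] * c + kxv[i] * s + k[i] * kdv * (1.0 - c)
                 for i in range(3))


def membrane_axis(axis, tilt_axis, tilt, angle):
    """Hypothetical helper: tilt around tilt_axis, then rotate around axis."""
    tilted = rotate(axis, tilt_axis, tilt)
    return rotate(tilted, axis, angle)


axis = (0.0, 0.0, 1.0)
m_axis = membrane_axis(axis, (1.0, 0.0, 0.0), math.pi / 2, math.pi / 2)
pos = 3.0
center = tuple(pos * x for x in m_axis)  # membrane center: pos * membrane_axis
print(m_axis, center)
```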

FindMembrane(ent, assign_membrane_representation=True, fast=False)

Estimates the optimal membrane position of a protein by using an implicit solvation model. The original algorithm and the energy function used are described in: Lomize AL, Pogozheva ID, Lomize MA, Mosberg HI (2006) Positioning of proteins in membranes: A computational approach.

There are some modifications in this implementation and the procedure is as follows:

• Initial axes are constructed that serve as starting points for the initial parameter grid searches.
• For every axis, the protein is rotated so that the axis becomes the z-axis.
• In order to exclude internal hydrophilic pores, only the outermost atoms with respect to the z-axis enter an initial grid search.
• The width and position of the membrane are optimized for different combinations of tilt and rotation angles (further described in FindMemParam). The top 20 parametrizations (only the top parametrization if fast is True) are stored for further processing.
• The 20 best membrane parametrizations from the initial grid search (only the best if fast is set to True) enter a final minimization step using a Levenberg-Marquardt minimizer.
Parameters: ent (ost.mol.EntityHandle / ost.mol.EntityView) – Entity of a transmembrane protein; you’ll get weird results if this is not the case. The energy term of the result is typically a good indicator of whether ent is an actual transmembrane protein. The following float properties will be set on the atoms: ‘asaAtom’ on all atoms that are selected with ent.Select(‘peptide=true and ele!=H’), as a result of invoking Accessibility(); ‘membrane_e’, the contribution of the potentially membrane-facing atoms to the energy function. assign_membrane_representation (bool) – Whether to construct a membrane representation using dummy atoms. fast (bool) – If set to False, the 20 best results of the initial grid search undergo a Levenberg-Marquardt minimization and the parametrization with the optimal minimized energy is returned. If set to True, only the best result of the initial grid search is selected and returned after Levenberg-Marquardt minimization. Returns: The result object (ost.mol.alg.FindMemParam).
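The last two steps of the FindMembrane procedure (keeping the best grid-search results and minimizing them) can be sketched as follows. The function and its minimize callable are hypothetical stand-ins: the real minimization is a Levenberg-Marquardt fit inside OST, which is replaced here by any callable returning an (energy, parameters) tuple.

```python
def find_best_parametrization(grid_results, minimize, fast=False):
    """Hypothetical sketch of the final selection stage.

    grid_results -- list of (energy, params) tuples from the initial grid search
    minimize     -- stand-in for the Levenberg-Marquardt step; takes params and
                    returns an (energy, params) tuple
    fast         -- if True, only the best grid-search result is minimized
    """
    n_keep = 1 if fast else 20
    # Lowest energies first; keep the top n_keep candidates
    candidates = sorted(grid_results, key=lambda r: r[0])[:n_keep]
    minimized = [minimize(params) for _, params in candidates]
    # The parametrization with the lowest minimized energy is returned
    return min(minimized, key=lambda r: r[0])


def toy_minimize(params):
    """Fake minimizer: later grid entries end up with lower energies."""
    return (-int(params[1:]), params)


grid = [(float(e), "p%d" % e) for e in range(30)]
print(find_best_parametrization(grid, toy_minimize))             # (-19, 'p19')
print(find_best_parametrization(grid, toy_minimize, fast=True))  # (0, 'p0')
```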

## Trajectory Analysis

This is a set of functions used for basic trajectory analysis such as extracting positions, distances, angles and RMSDs. The organization is such that most functions have their counterpart at the individual frame level so that they can also be called on one frame instead of the whole trajectory.

Most of these functions have a “stride” argument that defaults to stride=1 and can be used to skip frames in the analysis.
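For a trajectory with n frames, the stride determines which frame indices are visited. The sketch below assumes, without confirmation from this page, that the analysis starts at frame 0; the helper name is hypothetical.

```python
def analyzed_frames(n_frames, stride=1):
    """Hypothetical helper: frame indices visited with the given stride.

    Assumes the analysis starts at frame 0.
    """
    return list(range(0, n_frames, stride))


print(analyzed_frames(10, stride=3))  # [0, 3, 6, 9]
```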

SuperposeFrames(frames, sel, from=0, to=-1, ref=-1)

This function superposes the frames of the given coord group and returns them as a new coord group.

Parameters: frames (CoordGroupHandle) – The source coord group. sel (ost.mol.EntityView) – An entity view containing the selection of atoms to be used for superposition. If set to an invalid view, all atoms in the coord group are used. from – Index of the first frame. to – Index of the last frame plus one. If set to -1, the value is set to the number of frames in the coord group. ref – The index of the reference frame to use for superposition. If set to -1, each frame is superposed to the previous frame. Returns: A newly created coord group containing the superposed frames.
SuperposeFrames(frames, sel, ref_view, from=0, to=-1)

Same as SuperposeFrames above, but the superposition is done on a reference view and not on another frame of the trajectory.

Parameters: frames (CoordGroupHandle) – The source coord group. sel (ost.mol.EntityView) – An entity view containing the selection of atoms of the frames to be used for superposition. ref_view (ost.mol.EntityView) – The reference view on which the frames will be superposed. The number of atoms in this reference view should be equal to the number of atoms in sel. from – Index of the first frame. to – Index of the last frame plus one. If set to -1, the value is set to the number of frames in the coord group. Returns: A newly created coord group containing the superposed frames.
AnalyzeAtomPos(traj, atom1, stride=1)

This function extracts the position of an atom from a trajectory. It returns a vector containing the position of the atom for each analyzed frame.

Parameters: traj (CoordGroupHandle) – The trajectory to be analyzed. atom1 – The AtomHandle. stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
AnalyzeCenterOfMassPos(traj, sele, stride=1)

This function extracts the position of the center-of-mass of a selection (EntityView) from a trajectory and returns it as a vector.

Parameters: traj (CoordGroupHandle) – The trajectory to be analyzed. sele (EntityView) – The selection from which the center of mass is computed. stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
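The center of mass is the mass-weighted average of the atom positions. A minimal pure-Python sketch of that computation per analyzed frame follows; the data layout (one list of positions per frame plus one mass per atom) and the helper names are hypothetical illustrations, not the OST data structures.

```python
def center_of_mass(masses, positions):
    """Hypothetical helper: mass-weighted mean of (x, y, z) positions."""
    total = sum(masses)
    return tuple(sum(m * p[i] for m, p in zip(masses, positions)) / total
                 for i in range(3))


def center_of_mass_trajectory(frames, masses, stride=1):
    """One center of mass per analyzed frame; frames is a list of position lists."""
    return [center_of_mass(masses, frames[i])
            for i in range(0, len(frames), stride)]


frames = [[(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
          [(0.0, 0.0, 0.0), (0.0, 8.0, 0.0)]]
masses = [1.0, 3.0]
print(center_of_mass_trajectory(frames, masses))  # [(3.0, 0.0, 0.0), (0.0, 6.0, 0.0)]
```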
AnalyzeDistanceBetwAtoms(traj, atom1, atom2, stride=1)

This function extracts the distance between two atoms from a trajectory and returns it as a vector.

Parameters: traj (CoordGroupHandle) – The trajectory to be analyzed. atom1 – The first AtomHandle. atom2 – The second AtomHandle. stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
AnalyzeAngle(traj, atom1, atom2, atom3, stride=1)

This function extracts the angle between three atoms from a trajectory and returns it as a vector. The second atom is taken as being the central atom, so that the angle is between the vectors (atom1.pos-atom2.pos) and (atom3.pos-atom2.pos).

Parameters: traj (CoordGroupHandle) – The trajectory to be analyzed. atom1 – The first AtomHandle. atom2 – The second AtomHandle. atom3 – The third AtomHandle. stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
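The angle definition above can be written out in a few lines of plain Python for a single frame. This sketch returns radians, which is an assumption about the unit AnalyzeAngle uses; the helper name is hypothetical.

```python
import math


def angle(p1, p2, p3):
    """Hypothetical helper: angle at p2 between (p1 - p2) and (p3 - p2), in radians."""
    v1 = [a - b for a, b in zip(p1, p2)]
    v2 = [a - b for a, b in zip(p3, p2)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    # Clamp against rounding errors slightly outside [-1, 1]
    return math.acos(max(-1.0, min(1.0, dot / norm)))


print(math.degrees(angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))))
```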
AnalyzeDihedralAngle(traj, atom1, atom2, atom3, atom4, stride=1)

This function extracts the dihedral angle between four atoms from a trajectory and returns it as a vector. The angle is between the planes containing the first three and the last three atoms.

Parameters: traj (CoordGroupHandle) – The trajectory to be analyzed. atom1 – The first AtomHandle. atom2 – The second AtomHandle. atom3 – The third AtomHandle. atom4 – The fourth AtomHandle. stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
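For reference, a common way to compute such a dihedral from four positions is the atan2 formulation below. It is a sketch, not the OST implementation; the sign convention, the radian unit and the [-pi, pi] range are assumptions, and the helper names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dihedral(p1, p2, p3, p4):
    """Hypothetical helper: angle between planes (p1, p2, p3) and (p2, p3, p4)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)  # radians in [-pi, pi]


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```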
AnalyzeDistanceBetwCenterOfMass(traj, sele1, sele2, stride=1)

This function extracts the distance between the center-of-mass of two selections (EntityView) from a trajectory and returns it as a vector.

Parameters: traj (CoordGroupHandle) – The trajectory to be analyzed. sele1 (EntityView) – The selection from which the first center of mass is computed. sele2 (EntityView) – The selection from which the second center of mass is computed. stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
AnalyzeRMSD(traj, reference_view, sele_view, stride=1)

This function extracts the RMSD between two EntityViews and returns it as a vector. The views don’t have to be from the same entity. The reference positions are taken directly from the reference_view, evaluated only once. The positions from the sele_view are evaluated for each frame. If you want to compare to frame i of the trajectory t, first use t.CopyFrame(i), for example:

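from ost import io
from ost.mol import alg

eh = io.LoadPDB(...)
t = ...  # CoordGroupHandle attached to eh, e.g. loaded with io.LoadCHARMMTraj
sele = eh.Select(...)
t.CopyFrame(0)  # copy the positions of frame 0 into eh (and thus into sele)
rmsds = alg.AnalyzeRMSD(t, sele, sele)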

Parameters: traj (CoordGroupHandle) – The trajectory to be analyzed. reference_view (EntityView) – The selection used as reference structure. sele_view (EntityView) – The selection compared to the reference_view. stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
AnalyzeMinDistance(traj, view1, view2, stride=1)

This function extracts the minimal distance between two sets of atoms (view1 and view2) for each frame in a trajectory and returns it as a vector.

Parameters: traj (CoordGroupHandle) – The trajectory to be analyzed. view1 (EntityView) – The first group of atoms. view2 (EntityView) – The second group of atoms. stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
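The per-frame quantity is simply the smallest of all pairwise distances between the two atom sets. A brute-force pure-Python sketch, with positions given as (x, y, z) tuples and hypothetical helper names:

```python
import math


def min_distance(set1, set2):
    """Hypothetical helper: smallest distance between any atom of set1 and any of set2."""
    return min(math.dist(a, b) for a in set1 for b in set2)


def min_distance_trajectory(frames1, frames2, stride=1):
    """frames1[i] and frames2[i] hold the positions of the two views in frame i."""
    return [min_distance(frames1[i], frames2[i])
            for i in range(0, len(frames1), stride)]


view1 = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
view2 = [(8.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
print(min_distance_trajectory([view1], [view2]))  # [3.0]
```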
AnalyzeMinDistanceBetwCenterOfMassAndView(traj, view_cm, view_atoms, stride=1)

This function extracts the minimal distance between a set of atoms (view_atoms) and the center of mass of a second set of atoms (view_cm) for each frame in a trajectory and returns it as a vector.

Parameters: traj (CoordGroupHandle) – The trajectory to be analyzed. view_cm (EntityView) – The group of atoms from which the center of mass is taken. view_atoms (EntityView) – The second group of atoms. stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
AnalyzeAromaticRingInteraction(traj, view_ring1, view_ring2, stride=1)

This function is a crude analysis of aromatic ring interactions. For each frame in a trajectory, it calculates the minimal distance between the atoms in one view and the center of mass of the other and vice versa, and returns the minimum of these two minimal distances. Concretely, if the two views are the heavy atoms of two rings, it returns the minimal center of mass to heavy atom distance between the two rings.

Parameters: traj (CoordGroupHandle) – The trajectory to be analyzed. view_ring1 (EntityView) – First group of atoms. view_ring2 (EntityView) – Second group of atoms. stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
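The crude ring measure described for AnalyzeAromaticRingInteraction can be sketched for a single frame as follows, assuming each ring is given as a list of (mass, (x, y, z)) tuples for its heavy atoms. The helper names are hypothetical; this only illustrates the definition.

```python
import math


def center_of_mass(atoms):
    """Hypothetical helper: mass-weighted mean of (mass, position) tuples."""
    total = sum(m for m, _ in atoms)
    return tuple(sum(m * pos[i] for m, pos in atoms) / total for i in range(3))


def ring_interaction_distance(ring1, ring2):
    """Minimum of the two center-of-mass to heavy-atom minimal distances."""
    com1, com2 = center_of_mass(ring1), center_of_mass(ring2)
    d12 = min(math.dist(com1, pos) for _, pos in ring2)  # COM of ring1 to ring2 atoms
    d21 = min(math.dist(com2, pos) for _, pos in ring1)  # COM of ring2 to ring1 atoms
    return min(d12, d21)


ring1 = [(12.0, (1.0, 0.0, 0.0)), (12.0, (-1.0, 0.0, 0.0))]
ring2 = [(12.0, (0.0, 0.0, 3.0)), (12.0, (0.0, 0.0, 5.0))]
print(ring_interaction_distance(ring1, ring2))
```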

## helix_kinks – Algorithms to calculate Helix Kinks

Functions to calculate helix kinks: bend, face shift and wobble angles

Author: Niklaus Johner

AnalyzeHelixKink(t, sele, proline=False)

This function calculates the bend, wobble and face-shift angles in an alpha-helix over a trajectory. The determination is more stable if there are at least 4 residues on each side (8 is even better) of the proline around which the helix is kinked. The selection should contain all residues in the correct order, with no gaps and no missing C-alphas.

Parameters: t (CoordGroup) – The trajectory to be analyzed. sele (EntityView) – A selection containing the alpha helix to be analyzed. proline (ost.mol.EntityView) – A selection containing only the proline (or another residue) around which the helix is kinked. If False, the proline will be searched for automatically. Returns: A tuple (bend_angle, face_shift, wobble_angle). Return type: (FloatList, FloatList, FloatList)
CalculateHelixKink(sele, proline=False)

This function calculates the bend, wobble and face-shift angles in an alpha-helix of an EntityView. The determination is more stable if there are at least 4 residues on each side (8 is even better) of the proline around which the helix is kinked. The selection should contain all residues in the correct order, with no gaps and no missing C-alphas.

Parameters: sele (EntityView) – A selection containing the alpha helix to be analyzed. proline (ost.mol.EntityView) – A selection containing only the proline (or another residue) around which the helix is kinked. If False, the proline will be searched for automatically. Returns: A tuple (bend_angle, face_shift, wobble_angle). Return type: (float, float, float)

## trajectory_analysis – DRMSD, pairwise distances and more

This module requires numpy.

This module contains functions to analyze trajectories, mainly similarity measures based on RMSDs and pairwise distances.

Author: Niklaus Johner (niklaus.johner@unibas.ch)

AverageDistanceMatrixFromTraj(t, sele, first=0, last=-1)

This function calculates the distance between each pair of atoms in sele, averaged over the trajectory t.

Parameters: t (CoordGroupHandle) – the trajectory. sele (EntityView) – the selection used to determine the atom pairs. first (int) – the first frame of t to be used. last (int) – the last frame of t to be used. Returns: a numpy NatomsxNatoms matrix, where Natoms is the number of atoms in sele.
DistRMSDFromTraj(t, sele, ref_sele, radius=7.0, average=False, seq_sep=4, first=0, last=-1)

This function calculates the distance RMSD from a trajectory. The distances selected for the calculation are all the distances between pairs of atoms from residues that are at least seq_sep apart in the sequence and that are smaller than radius in ref_sele. The number and order of atoms in ref_sele and sele should be the same.

Parameters: t (CoordGroupHandle) – the trajectory. sele (EntityView) – the selection used to calculate the distance RMSD. ref_sele (EntityView) – the reference selection used to determine the atom pairs and reference distances. radius (float) – the upper limit of distances in ref_sele considered for the calculation. seq_sep (int) – the minimal sequence separation between atom pairs considered for the calculation. average (bool) – use the average distance in the trajectory as reference instead of the distance obtained from ref_sele. first (int) – the first frame of t to be used. last (int) – the last frame of t to be used. Returns: a numpy vector dist_rmsd(Nframes).
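The pair selection and the per-frame distance RMSD of DistRMSDFromTraj can be sketched in plain Python. This is a sketch under assumptions: positions are (x, y, z) tuples, each atom carries a residue number used for the sequence separation, the seq_sep test is taken as inclusive ("at least"), and the RMSD is averaged over the selected pairs. Helper names are hypothetical and the average=True variant is not covered.

```python
import math
from itertools import combinations


def select_pairs(ref_pos, res_nums, radius=7.0, seq_sep=4):
    """Hypothetical helper: atom pairs >= seq_sep residues apart and < radius in the reference."""
    return [(i, j) for i, j in combinations(range(len(ref_pos)), 2)
            if abs(res_nums[i] - res_nums[j]) >= seq_sep
            and math.dist(ref_pos[i], ref_pos[j]) < radius]


def dist_rmsd(frame_pos, ref_pos, pairs):
    """Root-mean-square deviation of the selected pair distances for one frame."""
    sq = [(math.dist(frame_pos[i], frame_pos[j])
           - math.dist(ref_pos[i], ref_pos[j])) ** 2 for i, j in pairs]
    return math.sqrt(sum(sq) / len(sq))


ref = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
res_nums = [1, 5, 9, 13]
pairs = select_pairs(ref, res_nums)  # [(0, 1), (0, 2), (1, 2)]
frame = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print([dist_rmsd(f, ref, pairs) for f in (ref, frame)])
```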
DistanceMatrixFromPairwiseDistances(distances, p=2)

This function calculates a distance matrix M(NframesxNframes) from the pairwise distances matrix D(NpairsxNframes), where Nframes is the number of frames in the trajectory and Npairs the number of atom pairs. M[i,j] is the distance between frame i and frame j, calculated as a p-norm of the differences in distances between the two frames (distance-RMSD for p=2).

Parameters: distances – a pairwise distance matrix as obtained from PairwiseDistancesFromTraj(). p – exponent used for the p-norm. Returns: a numpy NframesxNframes matrix, where Nframes is the number of frames.
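The p-norm between two frames can be sketched as below, assuming the sum of |differences|^p is divided by the number of atom pairs before taking the p-th root, so that p=2 yields the distance RMSD. The function name is hypothetical and plain nested lists stand in for the numpy arrays.

```python
def distance_matrix_from_pairwise(distances, p=2):
    """Hypothetical helper; distances[k][f] is the distance of atom pair k in frame f."""
    n_pairs, n_frames = len(distances), len(distances[0])
    m = [[0.0] * n_frames for _ in range(n_frames)]
    for i in range(n_frames):
        for j in range(i + 1, n_frames):
            total = sum(abs(row[i] - row[j]) ** p for row in distances)
            # Assumed normalization by the number of pairs (p=2 gives the distance RMSD)
            m[i][j] = m[j][i] = (total / n_pairs) ** (1.0 / p)
    return m


d = [[1.0, 2.0, 1.0],
     [3.0, 3.0, 5.0]]
for row in distance_matrix_from_pairwise(d):
    print(row)
```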
PairwiseDistancesFromTraj(t, sele, first=0, last=-1, seq_sep=1)

This function calculates the distances between any pair of atoms in sele with sequence separation larger than seq_sep from a trajectory t. It returns a matrix containing one row for each atom pair and Nframes columns, where Nframes is the number of frames in the trajectory.

Parameters: t (CoordGroupHandle) – the trajectory. sele (EntityView) – the selection used to determine the atom pairs. first (int) – the first frame of t to be used. last (int) – the last frame of t to be used. seq_sep (int) – the minimal sequence separation between atom pairs. Returns: a numpy NpairsxNframes matrix.
RMSD_Matrix_From_Traj(t, sele, first=0, last=-1, align=True, align_sele=None)

This function calculates a matrix M such that M[i,j] is the RMSD (calculated on sele) between frames i and j of the trajectory t aligned on sele.

Parameters: t (CoordGroupHandle) – the trajectory. sele (EntityView) – the selection used for alignment and RMSD calculation. first (int) – the first frame of t to be used. last (int) – the last frame of t to be used. Returns: a numpy NframesxNframes matrix, where Nframes is the number of frames.

## structure_analysis – Functions to analyze structures

Some functions for analyzing structures

Author: Niklaus Johner (Niklaus.Johner@unibas.ch)

CalculateBestFitLine(sele1)

This function calculates the best fit line to the atoms in sele1.

Parameters: sele1 (EntityView) – the selection of atoms to fit. Return type: Line3
CalculateBestFitPlane(sele1)

This function calculates the best fit plane to the atoms in sele1.

Parameters: sele1 (EntityView) – the selection of atoms to fit. Return type: Plane
CalculateDistanceDifferenceMatrix(sele1, sele2)

This function calculates the pairwise distance differences between two selections (EntityView). The two selections should have the same number of atoms. It returns an NxN distance difference matrix M (where N is the number of atoms in sele1), where M[i,j] = ||sele2.atoms[i].pos - sele2.atoms[j].pos|| - ||sele1.atoms[i].pos - sele1.atoms[j].pos||.

Parameters: sele1 (EntityView) – the first selection. sele2 (EntityView) – the second selection. Returns: NxN numpy matrix
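The distance difference matrix definition translates directly into plain Python, with each selection given as a list of (x, y, z) positions in matching order. The position lists stand in for EntityView atoms, and the helper name is hypothetical.

```python
import math


def distance_difference_matrix(pos1, pos2):
    """Hypothetical helper: M[i][j] = |pos2[i] - pos2[j]| - |pos1[i] - pos1[j]|."""
    if len(pos1) != len(pos2):
        raise ValueError("both selections need the same number of atoms")
    n = len(pos1)
    return [[math.dist(pos2[i], pos2[j]) - math.dist(pos1[i], pos1[j])
             for j in range(n)] for i in range(n)]


sele1_pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
sele2_pos = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(distance_difference_matrix(sele1_pos, sele2_pos))  # [[0.0, 2.0], [2.0, 0.0]]
```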
CalculateHelixAxis(sele1)

This function calculates the best fit cylinder to the CA atoms in sele1, and returns its axis. Residues should be ordered correctly in sele1.

Parameters: sele1 (EntityView) – the selection containing the helix. Return type: Line3
GetAlphaHelixContent(sele1)

This function calculates the alpha-helix content of a view. All residues in the view have to be ordered and adjacent (no gaps allowed).

Parameters: sele1 (EntityView) – the view to analyze. Return type: float
GetDistanceBetwCenterOfMass(sele1, sele2)

This function calculates the distance between the centers of mass of sele1 and sele2, two selections from the same Entity.

Parameters: sele1 (EntityView) – the first selection. sele2 (EntityView) – the second selection. Return type: float
GetFrameFromEntity(eh)

This function returns a CoordFrame from an EntityHandle

Parameters: eh (EntityHandle) – the entity. Return type: ost.mol.CoordFrame
GetMinDistBetwCenterOfMassAndView(sele1, sele2)

This function calculates the minimal distance between sele2 and the center of mass of sele1, two selections from the same Entity.

Parameters: sele1 (EntityView) – The selection from which the center of mass is taken. sele2 (EntityView) – the second selection. Returns: the distance. Return type: float
GetMinDistanceBetweenViews(sele1, sele2)

This function calculates the minimal distance between sele1 and sele2, two selections from the same Entity.

Parameters: sele1 (EntityView) – the first selection. sele2 (EntityView) – the second selection. Return type: float

## Mapping functions

The following functions help to convert one residue into another by reusing as much as possible from the present atoms. They are mainly meant to map from standard amino acids to other standard amino acids or from modified amino acids to standard amino acids.

CopyResidue(src_res, dst_res, editor)

Copies the atoms of src_res to dst_res using the residue names as a guide to decide which of the atoms should be copied. If src_res and dst_res have the same name, or src_res is a modified version of dst_res (i.e. they have the same one-letter code), CopyConserved() will be called, otherwise CopyNonConserved().

If a CBeta atom wasn’t already copied from src_res, a new one at a reconstructed position will be added to dst_res if it is not GLY and all backbone positions are available to do it.

Parameters: src_res (ResidueHandle) – The source residue. dst_res (ResidueHandle) – The destination residue (expected to be a standard amino acid). editor (XCSEditor) – Editor used to modify dst_res. Returns: True if the residue could be copied as a conserved residue, False if it had to fall back to CopyNonConserved().
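The decision CopyResidue makes between the two copy strategies can be sketched as below. The residue-name-to-one-letter table is a hypothetical, deliberately tiny stand-in for the knowledge OST uses about modified residues, and the helper only reports which function would be called.

```python
# Hypothetical, deliberately tiny table; OST relies on its own residue knowledge.
ONE_LETTER = {"ALA": "A", "MET": "M", "MSE": "M", "SER": "S", "SEP": "S"}


def copy_strategy(src_name, dst_name, one_letter=ONE_LETTER):
    """Hypothetical helper: name of the copy function CopyResidue would dispatch to."""
    src_code = one_letter.get(src_name)
    same_parent = src_code is not None and src_code == one_letter.get(dst_name)
    if src_name == dst_name or same_parent:
        return "CopyConserved"
    return "CopyNonConserved"


print(copy_strategy("MSE", "MET"))  # CopyConserved
print(copy_strategy("ALA", "SER"))  # CopyNonConserved
```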
CopyConserved(src_res, dst_res, editor)

Copies the atoms of src_res to dst_res assuming that the parent amino acid of src_res (or src_res itself) is identical to dst_res.

If src_res and dst_res are identical, all heavy atoms are copied to dst_res. If src_res is a modified version of dst_res and the modification is a pure addition (e.g. the phosphate group of phosphoserine), the modification is stripped off and all other heavy atoms are copied to dst_res. If the modification is not a pure addition, it falls back to CopyNonConserved().

Additionally, the selenium atom of MSE is converted to sulphur to map MSE to MET.

Parameters: src_res (ResidueHandle) – The source residue. dst_res (ResidueHandle) – The destination residue (expected to be a standard amino acid). editor (XCSEditor) – Editor used to modify dst_res. Returns: A tuple of bools stating whether the residue could be copied without falling back to CopyNonConserved() and whether the CBeta atom was copied from src_res to dst_res.
CopyNonConserved(src_res, dst_res, editor)

Copies the heavy backbone atoms and CBeta (except for GLY) of src_res to dst_res.

Parameters: src_res (ResidueHandle) – The source residue. dst_res (ResidueHandle) – The destination residue (expected to be a standard amino acid). editor (XCSEditor) – Editor used to modify dst_res. Returns: A tuple of bools as in CopyConserved(), with the first bool always being False.

## Molecular Checker (Molck)

### Programmatic usage

Molecular Checker (Molck) can be called directly from Python code using the Molck function:

#!/usr/bin/env python

"""Run Molck with Python API.

This is an exemplary procedure on how to run Molck using Python API which is
equivalent to the command line:

molck <PDB PATH> --rm=hyd,oxt,nonstd,unk \
--fix-ele --out=<OUTPUT PATH> \
--complib=<PATH TO compounds.chemlib>
"""

from ost import io
from ost.mol.alg import MolckSettings, Molck
from ost.conop import CompoundLib

pdbid = "<PDB PATH>"

# Load the compound library and the structure to clean up
lib = CompoundLib.Load("<PATH TO compounds.chemlib>")
ent = io.LoadPDB(pdbid)

# Using Molck function
ms = MolckSettings(rm_unk_atoms=True,
                   rm_non_std=True,
                   rm_hyd_atoms=True,
                   rm_oxt_atoms=True,
                   rm_zero_occ_atoms=False,
                   colored=False,
                   map_nonstd_res=False,
                   assign_elem=True)
Molck(ent, lib, ms)  # modifies ent in place
io.SavePDB(ent, "<OUTPUT PATH>")


It can also be split into subsequent commands for greater control:

#!/usr/bin/env python

"""Run Molck with Python API.

This is an exemplary procedure on how to run Molck using Python API which is
equivalent to the command line:

molck <PDB PATH> --rm=hyd,oxt,nonstd,unk \
--fix-ele --out=<OUTPUT PATH> \
--complib=<PATH TO compounds.chemlib>
"""

from ost import io
from ost.mol.alg import (RemoveAtoms, MapNonStandardResidues,
                         CleanUpElementColumn)
from ost.conop import CompoundLib

pdbid = "<PDB PATH>"
map_nonstd = False

# Load the compound library and the structure to clean up
lib = CompoundLib.Load("<PATH TO compounds.chemlib>")
ent = io.LoadPDB(pdbid)

# Using function chain (each call modifies ent in place)
if map_nonstd:
    MapNonStandardResidues(lib=lib, ent=ent)

RemoveAtoms(lib=lib,
            ent=ent,
            rm_unk_atoms=True,
            rm_non_std=True,
            rm_hyd_atoms=True,
            rm_oxt_atoms=True,
            rm_zero_occ_atoms=False,
            colored=False)

CleanUpElementColumn(lib=lib, ent=ent)
io.SavePDB(ent, "<OUTPUT PATH>")


### API

class MolckSettings(rm_unk_atoms=False, rm_non_std=False, rm_hyd_atoms=True, rm_oxt_atoms=False, rm_zero_occ_atoms=False, colored=False, map_nonstd_res=True, assign_elem=True)

Stores settings used for Molecular Checker.

Parameters: rm_unk_atoms – Sets rm_unk_atoms. rm_non_std – Sets rm_non_std. rm_hyd_atoms – Sets rm_hyd_atoms. rm_oxt_atoms – Sets rm_oxt_atoms. rm_zero_occ_atoms – Sets rm_zero_occ_atoms. colored – Sets colored. map_nonstd_res – Sets map_nonstd_res. assign_elem – Sets assign_elem.
rm_unk_atoms

Remove unknown atoms and atoms not following the nomenclature.

Type: bool
rm_non_std

Remove all residues that are not one of the 20 standard amino acids

Type: bool
rm_hyd_atoms

Remove hydrogen atoms

Type: bool
rm_oxt_atoms

Remove terminal oxygens

Type: bool
rm_zero_occ_atoms

Remove atoms with zero occupancy

Type: bool
colored

Whether output should be colored

Type: bool
map_nonstd_res

Maps modified residues back to the parent amino acid, for example MSE -> MET, SEP -> SER

Type: bool
assign_elem

Clean up element column

Type: bool
ToString()
Returns: String representation of the MolckSettings. Return type: str

Warning

The functions here modify the passed structure ent in place. If this is not acceptable, work on a copy of the structure.

Molck(ent, lib, settings[, prune=True])

Runs Molck on provided entity.

Parameters: ent (EntityHandle) – Structure to check lib (CompoundLib) – Compound library settings (MolckSettings) – Molck settings prune (bool) – Whether to remove residues/chains that don’t contain atoms anymore after Molck cleanup
MapNonStandardResidues(ent, lib)

Maps modified residues back to the parent amino acid, for example MSE -> MET.

Parameters: ent (EntityHandle) – Structure to check lib (CompoundLib) – Compound library
RemoveAtoms(ent, lib, rm_unk_atoms=False, rm_non_std=False, rm_hyd_atoms=True, rm_oxt_atoms=False, rm_zero_occ_atoms=False, colored=False)

Removes atoms and residues according to some criteria.

Parameters: ent (EntityHandle) – Structure to check lib (CompoundLib) – Compound library rm_unk_atoms – See MolckSettings.rm_unk_atoms rm_non_std – See MolckSettings.rm_non_std rm_hyd_atoms – See MolckSettings.rm_hyd_atoms rm_oxt_atoms – See MolckSettings.rm_oxt_atoms rm_zero_occ_atoms – See MolckSettings.rm_zero_occ_atoms colored – See MolckSettings.colored
CleanUpElementColumn(ent, lib)

Clean up element column.

Parameters: ent (EntityHandle) – Structure to check lib (CompoundLib) – Compound library
