mol.alg
– Algorithms for Structures¶
Local Distance Test scores (lDDT, DRMSD)¶
Note
This is a new implementation of lDDT, introduced in OpenStructure 2.4 with focus on supporting quaternary structure and compounds beyond the 20 standard proteinogenic amino acids. The previous lDDT code that comes with Mariani et al. is considered deprecated.
Note
lddt.lDDTScorer
provides the raw Python API to compute lDDT but
stereochemistry checks as described in
Mariani et al.
must be done seperately. You may want to check out the
compare-structures
action (Comparing two structures) to
compute lDDT with pre-processing and support for quaternary structures.
- class lDDTScorer(target, compound_lib=None, custom_compounds=None, inclusion_radius=15, sequence_separation=0, symmetry_settings=None, seqres_mapping={}, bb_only=False)¶
lDDT scorer object for a specific target
Sets up everything to score models of that target. lDDT (local distance difference test) is defined as fraction of pairwise distances which exhibit a difference < threshold when considering target and model. In case of multiple thresholds, the average is returned. See
V. Mariani, M. Biasini, A. Barbato, T. Schwede, lDDT : A local superposition-free score for comparing protein structures and models using distance difference tests, Bioinformatics, 2013
- Parameters:
target (
ost.mol.EntityHandle
/ost.mol.EntityView
) – The targetcompound_lib (
ost.conop.CompoundLib
) – Compound library from which a compound for each residue is extracted based on its name. Usesost.conop.GetDefaultLib()
if not given, raises if this returns no valid compound library. Atoms defined in the compound are searched in the residue and build the reference for scoring. If the residue has atoms with names [“A”, “B”, “C”] but the corresponding compound only has [“A”, “B”], “A” and “B” are considered for scoring. If the residue has atoms [“A”, “B”] but the compound has [“A”, “B”, “C”], “C” is considered missing and does not influence scoring, even if present in the model.custom_compounds (
dict
with residue names (str
) as key andCustomCompound
as value.) – Custom compounds defining reference atoms. If given, custom_compounds take precedent over compound_lib.inclusion_radius (
float
) – All pairwise distances < inclusion_radius are considered for scoringsequence_separation (
int
) – Only pairwise distances between atoms of residues which are further apart than this threshold are considered. Residue distance is based on resnum. The default (0) considers all pairwise distances except intra-residue distances.symmetry_settings (
SymmetrySettings
) – Define residues exhibiting internal symmetry, usesGetDefaultSymmetrySettings()
if not given.seqres_mapping (
dict
(key:str
, value:ost.seq.AlignmentHandle
)) – Mapping of model residues at the scoring stage happens with residue numbers defining their location in a reference sequence (SEQRES) using one based indexing. If the residue numbers in target don’t correspond to that SEQRES, you can specify the mapping manually. You can provide a dictionary to specify a reference sequence (SEQRES) for one or more chain(s). Key: chain name, value: alignment (seq1: SEQRES, seq2: sequence of residues in chain). Example: The residues in a chain with name “A” have sequence “YEAH” and residue numbers [42,43,44,45]. You can provide an alignment with seq1 “HELLYEAH
” and seq2 “----YEAH
”. “Y” gets assigned residue number 5, “E” gets assigned 6 and so on no matter what the original residue numbers were.bb_only (
bool
) – Only consider atoms with name “CA” in case of amino acids and “C3’” for Nucleotides. this invalidates compound_lib. Raises if any residue in target is not r.chem_class.IsPeptideLinking() or r.chem_class.IsNucleotideLinking()
- Raises:
RuntimeError
if target contains compound which is not in compound_lib,RuntimeError
if symmetry_settings specifies symmetric atoms that are not present in the according compound in compound_lib,RuntimeError
if seqres_mapping is not provided and target contains residue numbers with insertion codes or the residue numbers for each chain are not monotonically increasing,RuntimeError
if seqres_mapping is provided but an alignment is invalid (seq1 contains gaps, mismatch in seq1/seq2, seq2 does not match residues in corresponding chains).
- GetNChainContacts(target_chain, no_interchain=False)¶
Returns number of contacts expected for a certain chain in target
- Parameters:
target_chain (
str
) – Chain in target for which you want the number of expected contactsno_interchain (
bool
) – Whether to exclude interchain contacts
- Raises:
RuntimeError
if specified chain doesnt exist
- lDDT(model, thresholds=[0.5, 1.0, 2.0, 4.0], local_lddt_prop=None, local_contact_prop=None, chain_mapping=None, no_interchain=False, no_intrachain=False, penalize_extra_chains=False, residue_mapping=None, return_dist_test=False, check_resnames=True)¶
Computes lDDT of model - globally and per-residue
- Parameters:
model (
ost.mol.EntityHandle
/ost.mol.EntityView
) – Model to be scored - models are preferably scored upon performing stereo-chemistry checks in order to punish for non-sensical irregularities. This must be done separately as a pre-processing step. Target contacts that are not covered by model are considered not conserved, thus decreasing lDDT score. This also includes missing model chains or model chains for which no mapping is provided in chain_mapping.thresholds (
list
offloats
) – Thresholds of distance differences to be considered as correct - see docs in constructor for more info. default: [0.5, 1.0, 2.0, 4.0]local_lddt_prop (
str
) – If set, per-residue scores will be assigned as generic float property of that namelocal_contact_prop (
str
) – If set, number of expected contacts as well as number of conserved contacts will be assigned as generic int property. Excected contacts will be set as <local_contact_prop>_exp, conserved contacts as <local_contact_prop>_cons. Values are summed over all thresholds.chain_mapping (
dict
withstr
as keys/values) – Mapping of model chains (key) onto target chains (value). This is required if target or model have more than one chain.no_interchain (
bool
) – Whether to exclude interchain contactsno_intrachain (
bool
) – Whether to exclude intrachain contacts (i.e. only consider interface related contacts)penalize_extra_chains – Whether to include a fixed penalty for additional chains in the model that are not mapped to the target. ONLY AFFECTS RETURNED GLOBAL SCORE. In detail: adds the number of intra-chain contacts of each extra chain to the expected contacts, thus adding a penalty.
penalize_extra_chains –
bool
residue_mapping (
dict
with key:str
, value:ost.seq.AlignmentHandle
) – By default, residue mapping is based on residue numbers. That means, a model chain and the respective target chain map to the same underlying reference sequence (SEQRES). Alternatively, you can specify one or several alignment(s) between model and target chains by providing a dictionary. key: Name of chain in model (respective target chain is extracted from chain_mapping), value: Alignment with first sequence corresponding to target chain and second sequence to model chain. There is NO reference sequence involved, so the two sequences MUST exactly match the actual residues observed in the respective target/model chains (ATOMSEQ).return_dist_test – Whether to additionally return the underlying per-residue data for the distance difference test. Adds five objects to the return tuple. First: Number of total contacts summed over all thresholds Second: Number of conserved contacts summed over all thresholds Third: list with length of scored residues. Contains indices referring to model.residues. Fourth: numpy array of size len(scored_residues) containing the number of total contacts, Fifth: numpy matrix of shape (len(scored_residues), len(thresholds)) specifying how many for each threshold are conserved.
check_resnames (
bool
) – On by default. Enforces residue name matches between mapped model and target residues.
- Returns:
global and per-residue lDDT scores as a tuple - first element is global lDDT score and second element a list of per-residue scores with length len(model.residues) None is assigned to residues that are not covered by target
- class SymmetrySettings¶
Container for symmetric compounds
lDDT considers symmetries and selects the one resulting in the highest possible score.
A symmetry is defined as a renaming operation on one or more atoms that leads to a chemically equivalent residue. Example would be OD1 and OD2 in ASP => renaming OD1 to OD2 and vice versa gives a chemically equivalent residue.
Use
AddSymmetricCompound()
to define a symmetry which can then directly be accessed through the symmetric_compounds member.- AddSymmetricCompound(name, symmetric_atoms)¶
Adds symmetry for compound with name
- Parameters:
name (
str
) – Name of compound with symmetrysymmetric_atoms (
list
oftuple
) – Pairs of atom names that define renaming operation, i.e. after applying all switches defined in the tuples, the resulting residue should be chemically equivalent. Atom names must refer to the PDB component dictionary.
- GetDefaultSymmetrySettings()¶
Constructs and returns
SymmetrySettings
object for natural amino acids
- class CustomCompound(atom_names)¶
Defines atoms for custom compounds
lDDT requires the reference atoms of a compound which are typically extracted from a
ost.conop.CompoundLib
. This lightweight container allows to handle arbitrary compounds which are not necessarily in the compound library.- Parameters:
atom_names (
list
ofstr
) – Names of atoms of custom compound
- static FromResidue(res)¶
Construct custom compound from residue
- Parameters:
res (
ost.mol.ResidueView
/ost.mol.ResidueHandle
) – Residue from which reference atom names are extracted, hydrogen/deuterium atoms are filtered out- Returns:
- class lDDTSettings(radius=15, sequence_separation=0, cutoffs=(0.5, 1.0, 2.0, 4.0), label='locallddt')¶
Object containing the settings used for lDDT calculations.
- Parameters:
radius – Sets
radius
.sequence_separation – Sets
sequence_separation
.cutoffs – Sets
cutoffs
.label – Sets
label
.
- radius¶
Distance inclusion radius.
- Type:
float
- sequence_separation¶
Sequence separation.
- Type:
int
- cutoffs¶
List of thresholds used to determine distance conservation.
- Type:
list
offloat
- label¶
The base name for the ResidueHandle properties that store the local scores.
- Type:
str
- PrintParameters()¶
Print settings.
- ToString()¶
- Returns:
String representation of the lDDTSettings object.
- Return type:
str
stereochemistry
– Stereochemistry Checks¶
Warning
Stereochemistry checks described in Mariani et al. are considered deprecated. They have been re-implemented and now support nucleotides. The old code is still available and documented here.
- StereoDataFromMON_LIB(mon_lib_path, compounds=None)¶
Parses stereochemistry parameters from CCP4 MON_LIB
CCP4 MON_LIB contains data on ideal bond lengths/angles for compounds.
Original data (several updates in the meantime) come from:
Amino acid bond lengths and angles: Engh and Huber, Acta Cryst. A47, 392-400 (1991).
Purine and pyrimidine bond lengths and angles: O. Kennard & R. Taylor (1982), J. Am. Soc. Chem. vol. 104, pp. 3209-3212.
Sugar-phosphate backbone bond lengths and bond angles: W. Saenger’s Principles of Nucleic Acid Structure (1983), Springer-Verlag, pp. 70,86.
This function adds a dependency to the gemmi library to read cif files.
- Parameters:
mon_lib_path (
str
) – Path to CCP4 MON_LIBcompounds (
list
) – Compounds to parse - parses proteinogenic amino acids and nucleotides if not given.
- Returns:
dict
with stereochemistry parameters
- GetBondParam(a1, a2, stereo_data=None, stereo_link_data=None)¶
Returns mean and standard deviation for bond
- Parameters:
a1 (
ost.mol.AtomView
/ost.mol.AtomHandle
) – First atom that defines bonda2 (
ost.mol.AtomView
/ost.mol.AtomHandle
) – Second atom that defines bondstereo_data (
dict
) – Stereochemistry data, use return value ofGetDefaultStereoData()
if not given. If you call this function repeatedly, you really should provide stereo_data!stereo_link_data (
dict
) – Stereochemistry data, use return value ofGetDefaultStereoLinkData()
if not given. If you call this function repeatedly, you really should provide stereo_link_data!
- Returns:
tuple
with mean and standard deviation. Values are None if respective bond is not found in stereo_data
- GetAngleParam(a1, a2, a3, stereo_data=None, stereo_link_data=None)¶
Returns mean and standard deviation for angle
- Parameters:
a1 (
ost.mol.AtomView
/ost.mol.AtomHandle
) – First atom that defines anglea2 (
ost.mol.AtomView
/ost.mol.AtomHandle
) – Second atom that defines anglea3 (
ost.mol.AtomView
/ost.mol.AtomHandle
) – Third atom that defines anglestereo_data (
dict
) – Stereochemistry data, use return value ofGetDefaultStereoData()
if not given. If you call this function repeatedly, you really should provide stereo_data!stereo_link_data (
dict
) – Stereochemistry data, use return value ofGetDefaultStereoLinkData()
if not given. If you call this function repeatedly, you really should provide stereo_link_data!
- Returns:
tuple
with mean and standard deviation. Values are None if respective angle is not found in stereo_data
- class ClashInfo(a1, a2, dist, tolerated_dist)¶
Object to hold info on clash
Constructor arguments are available as attributes:
a1 (
ost.mol.AtomHandle
)a2 (
ost.mol.AtomHandle
)dist (
float
)tolerated_dist (
float
)
- ToJSON(decimals=3)¶
Return JSON serializable dict
Atoms are represented by a string in format: <chain_name>.<resnum>.<resnum_inscode>.<atom_name>
- class BondViolationInfo(a1, a2, length, exp_length, std)¶
Object to hold info on bond violation
Constructor arguments are available as attributes:
a1 (
ost.mol.AtomHandle
)a2 (
ost.mol.AtomHandle
)length (
float
)exp_length (
float
)std (
float
)
- ToJSON(decimals=3)¶
Return JSON serializable dict
Atoms are represented by a string in format: <chain_name>.<resnum>.<resnum_inscode>.<atom_name>
- class AngleViolationInfo(a1, a2, a3, angle, exp_angle, std)¶
Object to hold info on angle violation
Constructor arguments are available as attributes:
a1 (
ost.mol.AtomHandle
)a2 (
ost.mol.AtomHandle
)a3 (
ost.mol.AtomHandle
)angle (
float
)exp_angle (
float
)std (
float
)
- ToJSON(decimals=3)¶
Return JSON serializable dict
Atoms are represented by a string in format: <chain_name>.<resnum>.<resnum_inscode>.<atom_name>
- GetClashes(ent, vdw_radii=None, tolerance=1.5, disulfid_dist=2.03, disulfid_tolerance=1.0)¶
Identifies clashing atoms
A clash between two non-bonded atoms is defined as their distance d being below the sum of their vdw radii with some subtracted tolerance value.
The default values are not very sensitive.
- Parameters:
ent (
ost.mol.EntityHandle
/ost.mol.EntityView
) – Entity for which you want to identify clashing atomsvdw_radii (
dict
) – Element based van der Waals radii. Only atoms of these elements will be considered. If not given, default values for all elements occuring in proteins/nucleotides are used. Must be provided asdict
, where they key are elements (capitalized) and value the respective radii in Angstrom.tolerance – Tolerance value
disulfid_dist (
float
) – Summed vdw radius that is used if two Sulfurs that can potentially build a disulfid bond interactdisulfid_tolerance – The respective tolerance
- Returns:
A
list
ofClashInfo
- GetBadBonds(ent, stereo_data=None, stereo_link_data=None, tolerance=12)¶
Identify unrealistic bonds
- Parameters:
ent (
ost.mol.EntityHandle
/ost.mol.EntityView
) – Entity for which you want to identify unrealistic bondsstereo_data (
dict
) – Stereochemistry data, use return value ofGetDefaultStereoData()
if not given.stereo_link_data (
dict
) – Stereochemistry data, use return value ofGetDefaultStereoLinkData()
if not given.tolerance (
int
) – Bonds that devaiate more than tolerance times standard deviation from expected mean are considered bad
- Returns:
list
BondViolationInfo
- GetBadAngles(ent, stereo_data=None, stereo_link_data=None, tolerance=12)¶
Identify unrealistic angles
- Parameters:
ent (
ost.mol.EntityHandle
/ost.mol.EntityView
) – Entity for which you want to identify unrealistic anglesstereo_data (
dict
) – Stereochemistry data, use return value ofGetDefaultStereoData()
if not given.stereo_link_data (
dict
) – Stereochemistry data, use return value ofGetDefaultStereoLinkData()
if not given.tolerance (
int
) – Angles that devaiate more than tolerance times standard deviation from expected mean are considered bad
- Returns:
list
ofAngleViolationInfo
- StereoCheck(ent, stereo_data=None, stereo_link_data=None)¶
Remove atoms with stereochemical problems
Selects for peptide/nucleotides and calls
GetClashes()
,GetBadBonds()
andGetBadAngles()
with default parameters.Amino acids: Remove full residue if backbone atom is involved in stereochemistry issue (“N”, “CA”, “C”, “O”). Remove sidechain if any of the sidechain atoms is involved in stereochemistry issues.
Nucleotides: Remove full residue if backbone atom is involved in stereochemistry issue (“P”, “OP1”, “OP2”, “OP3”, “O5’”, “C5’”, “C4’”, “C3’”, “C2’”, “C1’”, “O4’”, “O3’”, “O2’”). Remove sidechain (base) if any of the sidechain atoms is involved in stereochemistry issues.
- Parameters:
ent (
ost.mol.EntityHandle
/ost.mol.EntityView
) – Entity to be stereocheckedstereo_data (
dict
) – Stereochemistry data, use return value ofGetDefaultStereoData()
if not given.stereo_link_data (
dict
) – Stereochemistry data, use return value ofGetDefaultStereoLinkData()
if not given.
- Returns:
Tuple with four elements: 1)
ost.mol.EntityView
of ent processed as described above 2) Return value ofGetClashes()
3) return value ofGetBadBonds()
4) return value ofGetBadAngles()
- GetDefaultStereoData()¶
Get default stereo data derived from CCP4 MON_LIB
Used as default if not provided in
GetBadBonds()
,GetBadAngles()
andStereoCheck()
.MON_LIB is licensed under GNU LESSER GENERAL PUBLIC LICENSE Version 3. Consult the latest CCP4 for the full license text.
- GetDefaultStereoLinkData()¶
Get default stereo data for links between compounds
Hardcoded from arbitrary sources, see comments in the code.
- Returns:
Data for peptide bonds, nucleotide links and disulfid bonds that are used as default if not provided in
GetBadBonds()
,GetBadAngles()
andStereoCheck()
.
scoring
– Specialized scoring functions¶
- class lDDTBSScorer(reference, model, residue_number_alignment=False)¶
Scorer specific for a reference/model pair
Finds best possible binding site representation of reference in model given lDDT score. Uses
ost.mol.alg.chain_mapping.ChainMapper
to deal with chain mapping.- Parameters:
reference (
ost.mol.EntityView
/ost.mol.EntityHandle
) – Reference structuremodel (
ost.mol.EntityView
/ost.mol.EntityHandle
) – Model structureresidue_number_alignment (
bool
) – Passed to ChainMapper constructor
- ScoreBS(ligand, radius=4.0, lddt_radius=10.0)¶
Computes binding site lDDT score given ligand. Best possible binding site representation is selected by lDDT but other scores such as CA based RMSD and GDT are computed too and returned.
- Parameters:
ligand (r'((Residue)|(Chain)|(Entity))((View)|(Handle))') – Defines the scored binding site, i.e. provides positions to perform proximity search
radius (
float
) – Reference residues with any atom position within radius of ligand consitute the scored binding sitelddt_radius (
float
) – Passed as inclusion_radius toost.mol.alg.lddt.lDDTScorer
- Returns:
Object of type
ost.mol.alg.chain_mapping.ReprResult
containing all atom lDDT score and mapping information. None if no representation could be found.
- class Scorer(model, target, resnum_alignments=False, molck_settings=None, cad_score_exec=None, custom_mapping=None, usalign_exec=None, lddt_no_stereochecks=False, n_max_naive=40320, oum=False, min_pep_length=6, min_nuc_length=4)¶
Helper class to access the various scores available from ost.mol.alg
Deals with structure cleanup, chain mapping, interface identification etc. Intermediate results are available as attributes.
- Parameters:
model (
ost.mol.EntityHandle
/ost.mol.EntityView
) – Model structure - a deep copy is available asmodel
. Additionally,ost.mol.alg.Molck()
using molck_settings is applied.target (
ost.mol.EntityHandle
/ost.mol.EntityView
) – Target structure - a deep copy is available astarget
. Additionally,ost.mol.alg.Molck()
using molck_settings is applied.resnum_alignments (
bool
) – Whether alignments between chemically equivalent chains in model and target can be computed based on residue numbers. This can be assumed in benchmarking setups such as CAMEO/CASP.molck_settings (
ost.mol.alg.MolckSettings
) – Settings used for Molck on model and target, if set to None, a default object is constructed by setting everything except rm_zero_occ_atoms and colored to True inost.mol.alg.MolckSettings
constructor.cad_score_exec (
str
) – Explicit path to voronota-cadscore executable from voronota installation from https://github.com/kliment-olechnovic/voronota. If not given, voronota-cadscore must be in PATH if any of the CAD score related attributes is requested.custom_mapping (
dict
) – Provide custom chain mapping between model and target. Dictionary with target chain names as key and model chain names as value.usalign_exec (
str
) – Explicit path to USalign executable used to compute TM-score. If not given, TM-score will be computed with OpenStructure internal copy of USalign code.lddt_no_stereochecks (
bool
) – Whether to compute lDDT without stereochemistry checksn_max_naive (
int
) – Parameter for chain mapping. If the number of possible mappings is <= n_max_naive, the full mapping solution space is enumerated to find the the optimum. A heuristic is used otherwise. The default of 40320 corresponds to an octamer (8! = 40320). A structure with stoichiometry A6B2 would be 6!*2! = 1440 etc.oum (
bool
) – Override USalign Mapping. Inject mapping ofScorer
object into USalign to compute TM-score. Experimental feature with limitations.min_pep_length (
int
) – Relevant parameter if short peptides are involved in scoring. Minimum peptide length for a chain in the target structure to be considered in chain mapping. The chain mapping algorithm first performs an all vs. all pairwise sequence alignment to identify “equal” chains within the target structure. We go for simple sequence identity there. Short sequences can be problematic as they may produce high sequence identity alignments by pure chance.min_nuc_length (
int
) – Relevant parameter if short nucleotides are involved in scoring. Minimum nucleotide length for a chain in the target structure to be considered in chain mapping. The chain mapping algorithm first performs an all vs. all pairwise sequence alignment to identify “equal” chains within the target structure. We go for simple sequence identity there. Short sequences can be problematic as they may produce high sequence identity alignments by pure chance.
- property model¶
Model with Molck cleanup
- Type:
- property model_orig¶
The original model passed at object construction
- property target¶
Target with Molck cleanup
- Type:
- property target_orig¶
The original target passed at object construction
- property aln¶
Alignments of
model
/target
chainsAlignments for each pair of chains mapped in
mapping
. First sequence is target sequence, second sequence the model sequence.- Type:
list
ofost.seq.AlignmentHandle
- property stereochecked_aln¶
Stereochecked equivalent of
aln
The alignments may differ, as stereochecks potentially remove residues
- Type:
list
ofost.seq.AlignmentHandle
- property pepnuc_aln¶
Alignments of
model_orig
/target_orig
chainsSelects for peptide and nucleotide residues before sequence extraction. Includes residues that would be removed by molck in structure preprocessing.
- Type:
list
ofost.seq.AlignmentHandle
- property stereochecked_model¶
View of
model
that has stereochemistry checks appliedFirst, a selection for peptide/nucleotide residues is performed, secondly peptide sidechains with stereochemical irregularities are removed (full residue if backbone atoms are involved). Irregularities are clashes or bond lengths/angles more than 12 standard deviations from expected values.
- Type:
- property model_clashes¶
Clashing model atoms
- Type:
- property model_bad_bonds¶
Model bonds with unexpected stereochemistry
- Type:
- property model_bad_angles¶
Model angles with unexpected stereochemistry
- Type:
- property stereochecked_target¶
Same as
stereochecked_model
fortarget
- Type:
- property target_clashes¶
Clashing target atoms
- Type:
- property target_bad_bonds¶
Target bonds with unexpected stereochemistry
- Type:
- property target_bad_angles¶
Target angles with unexpected stereochemistry
- Type:
- property mapping¶
- property model_interface_residues¶
Interface residues in
model
Thats all residues having a contact with at least one residue from another chain (CB-CB distance <= 8A, CA in case of Glycine)
- Type:
dict
with chain names as key and andlist
with residue numbers of the respective interface residues.
- property target_interface_residues¶
Same as
model_interface_residues
fortarget
- Type:
dict
with chain names as key and andlist
with residue numbers of the respective interface residues.
- property lddt_scorer¶
lDDT scorer for
stereochecked_target
(default parameters)
- property bb_lddt_scorer¶
Backbone only lDDT scorer for
target
No stereochecks applied for bb only lDDT which considers CA atoms for peptides and C3’ atoms for nucleotides.
- property qs_scorer¶
QS scorer constructed from
mapping
The scorer object is constructed with default parameters and relates to
model
andtarget
(no stereochecks).
- property lddt¶
Global lDDT score in range [0.0, 1.0]
Computed based on
stereochecked_model
. In case of oligomers,mapping
is used.- Type:
float
- property local_lddt¶
Per residue lDDT scores in range [0.0, 1.0]
Computed based on
stereochecked_model
but scores for all residues inmodel
are reported. If a residue has been removed by stereochemistry checks, the respective score is set to 0.0. If a residue is not covered by the target or is in a chain skipped by the chain mapping procedure (happens for super short chains), the respective score is set to None. In case of oligomers,mapping
is used.- Type:
dict
- property bb_lddt¶
Backbone only global lDDT score in range [0.0, 1.0]
Computed based on
model
on backbone atoms only. This is CA for peptides and C3’ for nucleotides. No stereochecks are performed. In case of oligomers,mapping
is used.- Type:
float
- property bb_local_lddt¶
Backbone only per residue lDDT scores in range [0.0, 1.0]
Computed based on
model
on backbone atoms only. This is CA for peptides and C3’ for nucleotides. No stereochecks are performed. If a residue is not covered by the target or is in a chain skipped by the chain mapping procedure (happens for super short chains), the respective score is set to None. In case of oligomers,mapping
is used.- Type:
dict
- property qs_best¶
Global QS-score - only computed on aligned residues
Computed based on
model
usingmapping
. The QS-score computation only considers contacts between residues with a mapping between target and model. As a result, the score won’t be lowered in case of additional chains/residues in any of the structures.- Type:
float
- property qs_target_interfaces¶
Interfaces in
target
with non-zero contribution toqs_global
/qs_best
Chain names are lexicographically sorted.
- Type:
list
oftuple
with 2 elements each: (trg_ch1, trg_ch2)
- property qs_model_interfaces¶
Interfaces in
model
with non-zero contribution toqs_global
/qs_best
Chain names are lexicographically sorted.
- Type:
list
oftuple
with 2 elements each: (mdl_ch1, mdl_ch2)
- property qs_interfaces¶
Interfaces in
qs_target_interfaces
that can be mapped tomodel
.Target chain names are lexicographically sorted.
- Type:
list
oftuple
with 4 elements each: (trg_ch1, trg_ch2, mdl_ch1, mdl_ch2)
- property per_interface_qs_global¶
QS-score for each interface in
qs_interfaces
- Type:
list
offloat
- property per_interface_qs_best¶
QS-score for each interface in
qs_interfaces
Only computed on aligned residues
- Type:
list
offloat
- property native_contacts¶
Native contacts
A contact is a pair or residues from distinct chains that have a minimal heavy atom distance < 5A. Contacts are specified as
tuple
with two strings in format: <cname>.<rnum>.<ins_code>- Type:
list
oftuple
- property model_contacts¶
Same for model
- property contact_target_interfaces¶
Interfaces in
target
which have at least one contactContact as defined in
native_contacts
, chain names are lexicographically sorted.- Type:
list
oftuple
with 2 elements each (trg_ch1, trg_ch2)
- property contact_model_interfaces¶
Interfaces in
model
which have at least one contactContact as defined in
native_contacts
, chain names are lexicographically sorted.- Type:
list
oftuple
with 2 elements each (mdl_ch1, mdl_ch2)
- property ics_precision¶
Fraction of model contacts that are also present in target
- Type:
float
- property ics_recall¶
Fraction of target contacts that are correctly reproduced in model
- Type:
float
- property ics¶
ICS (Interface Contact Similarity) score
Combination of
ics_precision
andics_recall
using the F1-measure- Type:
float
- property per_interface_ics_precision¶
Per-interface ICS precision
ics_precision
for each interface incontact_target_interfaces
- Type:
list
offloat
- property per_interface_ics_recall¶
Per-interface ICS recall
ics_recall
for each interface incontact_target_interfaces
- Type:
list
offloat
- property per_interface_ics¶
Per-interface ICS (Interface Contact Similarity) score
ics
for each interface incontact_target_interfaces
- Type:
float
- property ips_precision¶
Fraction of model interface residues that are also interface residues in target
- Type:
float
- property ips_recall¶
Fraction of target interface residues that are also interface residues in model
- Type:
float
- property ips¶
IPS (Interface Patch Similarity) score
Jaccard coefficient of interface residues in target and their mapped counterparts in model
- Type:
float
- property per_interface_ips_precision¶
Per-interface IPS precision
ips_precision
for each interface incontact_target_interfaces
- Type:
list
offloat
- property per_interface_ips_recall¶
Per-interface IPS recall
ips_recall
for each interface incontact_target_interfaces
- Type:
list
offloat
- property per_interface_ips¶
Per-interface IPS (Interface Patch Similarity) score
ips
for each interface incontact_target_interfaces
- Type:
list
offloat
- property dockq_target_interfaces¶
Interfaces in
target
that are relevant for DockQIn principle a subset of
contact_target_interfaces
that only contains peptide sequences. Chain names are lexicographically sorted.- Type:
list
oftuple
with 2 elements each: (trg_ch1, trg_ch2)
- property dockq_interfaces¶
Interfaces in
dockq_target_interfaces
that can be mapped to modelTarget chain names are lexicographically sorted
- Type:
list
oftuple
with 4 elements each: (trg_ch1, trg_ch2, mdl_ch1, mdl_ch2)
- property dockq_scores¶
DockQ scores for interfaces in
dockq_interfaces
list
offloat
- property fnat¶
fnat scores for interfaces in
dockq_interfaces
fnat: Fraction of native contacts that are also present in model
list
offloat
- property nnat¶
N native contacts for interfaces in
dockq_interfaces
list
ofint
- property nmdl¶
N model contacts for interfaces in
dockq_interfaces
list
ofint
- property fnonnat¶
fnonnat scores for interfaces in
dockq_interfaces
fnat: Fraction of model contacts that are not present in target
list
offloat
- property irmsd¶
irmsd scores for interfaces in
dockq_interfaces
irmsd: RMSD of interface (RMSD computed on N, CA, C, O atoms) which consists of each residue that has at least one heavy atom within 10A of other chain.
list
offloat
- property lrmsd¶
lrmsd scores for interfaces in
dockq_interfaces
lrmsd: The interfaces are superposed based on the receptor (rigid min RMSD superposition) and RMSD for the ligand is reported. Superposition and RMSD are based on N, CA, C and O positions, receptor is the chain contributing to the interface with more residues in total.
list
offloat
- property dockq_ave¶
Average of DockQ scores in
dockq_scores
In its original implementation, DockQ only operates on single interfaces. Thus the requirement to combine scores for higher order oligomers.
- Type:
float
- property dockq_ave_full¶
Same as
dockq_ave
but penalizing for missing interfacesInterfaces that are not covered in model are added as 0.0 in average computation.
- Type:
float
- property dockq_wave_full¶
Same as
dockq_ave_full
, but weightedInterfaces that are not covered in model are added as 0.0 in average computations and the respective weights are derived from number of contacts in respective target interface.
- property mapped_target_pos¶
Mapped representative positions in target
Thats CA positions for peptide residues and C3’ positions for nucleotides. Has same length as
mapped_model_pos
and mapping is based onmapping
.- Type:
ost.geom.Vec3List
- property mapped_model_pos¶
Mapped representative positions in model
Thats CA positions for peptide residues and C3’ positions for nucleotides. Has same length as
mapped_target_pos
and mapping is based onmapping
.- Type:
ost.geom.Vec3List
- property transformed_mapped_model_pos¶
mapped_model_pos
withtransform
applied- Type:
ost.geom.Vec3List
- property n_target_not_mapped¶
Number of target residues which have no mapping to model
- Type:
int
- property transform¶
Transform:
mapped_model_pos
ontomapped_target_pos
Computed using Kabsch minimal rmsd algorithm
- Type:
- property gdtts¶
GDT with thresholds: 8.0A, 4.0A, 2.0A and 1.0A
Computed on
transformed_mapped_model_pos
andmapped_target_pos
- Type:
float
- property gdtha¶
GDT with thresholds: 4.0A, 2.0A, 1.0A and 0.5A
Computed on
transformed_mapped_model_pos
andmapped_target_pos
- Type:
float
- property rmsd¶
RMSD
Computed on
transformed_mapped_model_pos
andmapped_target_pos
- Type:
float
- property cad_score¶
The global CAD atom-atom (AA) score
Computed based on
model
. In case of oligomers,mapping
is used.- Type:
float
- property local_cad_score¶
The per-residue CAD atom-atom (AA) scores
Computed based on
model
. In case of oligomers,mapping
is used.- Type:
dict
- property patch_qs¶
Patch QS-scores for each residue in
model_interface_residues
Representative patches for each residue r in chain c are computed as follows:
mdl_patch_one: All residues in c with CB (CA for GLY) positions within 8A of r and within 12A of residues from any other chain.
mdl_patch_two: Closest residue x to r in any other chain gets identified. Patch is then constructed by selecting all residues from any other chain within 8A of x and within 12A from any residue in c.
trg_patch_one: Chain name and residue number based mapping from mdl_patch_one
trg_patch_two: Chain name and residue number based mapping from mdl_patch_two
Results are stored in the same manner as
model_interface_residues
, with corresponding scores instead of residue numbers. Scores for residues which are notmol.ChemType.AMINOACIDS
are set to None. Additionally, interface patches are derived frommodel
. If they contain residues which are not covered bytarget
, the score is set to None too.- Type:
dict
with chain names as key and andlist
with scores of the respective interface residues.
- property tm_score¶
TM-score computed with USalign
USalign executable can be specified with usalign_exec kwarg at Scorer construction, an OpenStructure internal copy of the USalign code is used otherwise.
- Type:
float
- property usalign_mapping¶
Mapping computed with USalign
Dictionary with target chain names as key and model chain names as values. No guarantee that all chains are mapped. USalign executable can be specified with usalign_exec kwarg at Scorer construction, an OpenStructure internal copy of the USalign code is used otherwise.
- Type:
dict
ligand_scoring
– Ligand scoring functions¶
- class LigandScorer(model, target, model_ligands=None, target_ligands=None, resnum_alignments=False, check_resnames=True, rename_ligand_chain=False, chain_mapper=None, substructure_match=False, coverage_delta=0.2, radius=4.0, lddt_pli_radius=6.0, lddt_lp_radius=10.0, binding_sites_topn=100000, global_chain_mapping=False, rmsd_assignment=False, n_max_naive=12, max_symmetries=100000.0, custom_mapping=None, unassigned=False)¶
Scorer class to compute various small molecule ligand (non polymer) scores.
Note
Extra requirements:
Python modules numpy and networkx must be available (e.g. use
pip install numpy networkx
)
At the moment, two scores are available:
lDDT-PLI, that looks at the conservation of protein-ligand contacts with
lDDT
.Binding-site superposed, symmetry-corrected RMSD that assesses the accuracy of the ligand pose (BiSyRMSD, hereinafter referred to as RMSD).
Both scores involve local chain mapping of the reference binding site onto the model, symmetry-correction, and finally assignment (mapping) of model and target ligands, as described in (Manuscript in preparation).
The binding site is defined based on a radius around the target ligand in the reference structure only. It only contains protein and nucleic acid chains that pass the criteria for the
chain mapping
. This means ignoring other ligands, waters, short polymers as well as any incorrectly connected chains that may be in proximity.Results are available as matrices ((lddt_pli|rmsd)_matrix), where every target-model score is reported in a matrix; as (lddt_pli|rmsd) where a model-target assignment has been determined (see below) and reported in a dictionary; and as ((lddt_pli|rmsd)_details) methods, which report additional details about different aspects of the scoring such as chain mapping.
The behavior of chain mapping and ligand assignment can be controlled with the global_chain_mapping and rmsd_assignment arguments.
By default, chain mapping is performed locally, ie. only within the binding site. As a result, different ligand scores can correspond to different chain mappings. This tends to produce more favorable scores, especially in large, partially regular oligomeric complexes. Setting global_chain_mapping=True enforces a single global chain mapping, as per
ost.mol.alg.chain_mapping.ChainMapper.GetMapping()
. Note that this global chain mapping currently ignores non polymer entities such as small ligands, and may result in overly pessimistic scores.By default, target-model ligand assignments are computed independently for the RMSD and lDDT-PLI scores. For RMSD, each model ligand is uniquely assigned to a target ligand, starting from the “best” possible mapping (lowest RMSD) and using each target and model ligand in a single assignment. Ties are resolved by best (highest) lDDT-PLI. Similarly, for lDDT-PLI, the assignment is based on the highest lDDT-PLI, and ties broken by lowest RMSD. Setting rmsd_assignment=True forces a single ligand assignment, based on RMSD only. Ties are broken arbitrarily.
By default, only exact matches between target and model ligands are considered. This is a problem when the target only contains a subset of the expected atoms (for instance if atoms are missing in an experimental structure, which often happens in the PDB). With substructure_match=True, complete model ligands can be scored against partial target ligands. One problem with this approach is that it is very easy to find good matches to small, irrelevant ligands like EDO, CO2 or GOL. To counter that, the assignment algorithm considers the coverage, expressed as the fraction of atoms of the model ligand atoms covered in the target. Higher coverage matches are prioritized, but a match with a better score will be preferred if it falls within a window of coverage_delta (by default 0.2) of a worse-scoring match. As a result, for instance, with a delta of 0.2, a low-score match with coverage 0.96 would be preferred to a high-score match with coverage 0.90.
Assumptions:
The class generally assumes that the
is_ligand
property is properly set on all the ligand atoms, and only ligand atoms. This is typically the case for entities loaded from mmCIF (tested with mmCIF files from the PDB and SWISS-MODEL). Legacy PDB files must contain HET headers (which is usually the case for files downloaded from the PDB but not elsewhere).The class doesn’t perform any cleanup of the provided structures. It is up to the caller to ensure that the data is clean and suitable for scoring. Molck should be used with extra care, as many of the options (such as rm_non_std or map_nonstd_res) can cause ligands to be removed from the structure. If cleanup with Molck is needed, ligands should be kept aside and passed separately. Non-ligand residues should be valid compounds with atom names following the naming conventions of the component dictionary. Non-standard residues are acceptable, and if the model contains a standard residue at that position, only atoms with matching names will be considered.
Unlike most of OpenStructure, this class does not assume that the ligands (either in the model or the target) are part of the PDB component dictionary. They may have arbitrary residue names. Residue names do not have to match between the model and the target. Matching is based on the calculation of isomorphisms which depend on the atom element name and atom connectivity (bond order is ignored). It is up to the caller to ensure that the connectivity of atoms is properly set before passing any ligands to this class. Ligands with improper connectivity will lead to bogus results.
Note, however, that atom names should be unique within a residue (ie two distinct atoms cannot have the same atom name).
This only applies to the ligand. The rest of the model and target structures (protein, nucleic acids) must still follow the usual rules and contain only residues from the compound library.
Although it isn’t a requirement, hydrogen atoms should be removed from the structures. Here is an example code snippet that will perform a reasonable cleanup. Keep in mind that this is most likely not going to work as expected with entities loaded from PDB files, as the is_ligand flag is probably not set properly.
Here is a snippet example of how to use this code:
from ost.mol.alg.ligand_scoring import LigandScorer from ost.mol.alg import Molck, MolckSettings # Load data # Structure model in PDB format, containing the receptor only model = io.LoadPDB("path_to_model.pdb") # Ligand model as SDF file model_ligand = io.LoadEntity("path_to_ligand.sdf", format="sdf") # Target loaded from mmCIF, containing the ligand target = io.LoadMMCIF("path_to_target.cif") # Cleanup a copy of the structures cleaned_model = model.Copy() cleaned_target = target.Copy() molck_settings = MolckSettings(rm_unk_atoms=True, rm_non_std=False, rm_hyd_atoms=True, rm_oxt_atoms=False, rm_zero_occ_atoms=False, colored=False, map_nonstd_res=False, assign_elem=True) Molck(cleaned_model, conop.GetDefaultLib(), molck_settings) Molck(cleaned_target, conop.GetDefaultLib(), molck_settings) # Setup scorer object and compute lDDT-PLI model_ligands = [model_ligand.Select("ele != H")] ls = LigandScorer(model=cleaned_model, target=cleaned_target, model_ligands=model_ligands) print("lDDT-PLI:", ls.lddt_pli) print("RMSD:", ls.rmsd)
- Parameters:
model (
ost.mol.EntityHandle
/ost.mol.EntityView
) – Model structure - a deep copy is available asmodel
. No additional processing (ie. Molck), checks, stereochemistry checks or sanitization is performed on the input. Hydrogen atoms are kept.target (
ost.mol.EntityHandle
/ost.mol.EntityView
) – Target structure - a deep copy is available astarget
. No additional processing (ie. Molck), checks or sanitization is performed on the input. Hydrogen atoms are kept.model_ligands (
list
) – Model ligands, as a list ofResidueHandle
belonging to the model entity. Can be instantiated with either a :class:list ofResidueHandle
/ost.mol.ResidueView
or ofost.mol.EntityHandle
/ost.mol.EntityView
. If None, ligands will be extracted based on theis_ligand
flag (this is normally set properly in entities loaded from mmCIF).target_ligands (
list
) – Target ligands, as a list ofResidueHandle
belonging to the target entity. Can be instantiated either a :class:list ofResidueHandle
/ost.mol.ResidueView
or ofost.mol.EntityHandle
/ost.mol.EntityView
containing a single residue each. If None, ligands will be extracted based on theis_ligand
flag (this is normally set properly in entities loaded from mmCIF).resnum_alignments (
bool
) – Whether alignments between chemically equivalent chains in model and target can be computed based on residue numbers. This can be assumed in benchmarking setups such as CAMEO/CASP.check_resnames (
bool
) – On by default. Enforces residue name matches between mapped model and target residues.rename_ligand_chain (
bool
) – If a residue with the same chain name and residue number than an explicitly passed model or target ligand exits in the structure, and rename_ligand_chain is False, a RuntimeError will be raised. If rename_ligand_chain is True, the ligand will be moved to a new chain instead, and the move will be logged to the console with SCRIPT level.chain_mapper (
ost.mol.alg.chain_mapping.ChainMapper
) – a chain mapper initialized for the target structure. If None (default), a chain mapper will be initialized lazily as required.substructure_match (
bool
) – Set this to True to allow partial target ligand.coverage_delta (
float
) – the coverage delta for partial ligand assignment.radius (
float
) – Inclusion radius for the binding site. Residues with atoms within this distance of the ligand will be considered for inclusion in the binding site.lddt_pli_radius (
float
) – lDDT inclusion radius for lDDT-PLI.lddt_lp_radius (
float
) – lDDT inclusion radius for lDDT-LP.binding_sites_topn (
int
) – maximum number of target binding site representations to assess, per target ligand. Ignored if global_chain_mapping is True.global_chain_mapping (
bool
) – set to True to use a global chain mapping for the polymer (protein, nucleotide) chains. Defaults to False, in which case only local chain mappings are allowed (where different ligand may be scored against different chain mappings).custom_mapping (
dict
) – Provide custom chain mapping between model and target that is used as global chain mapping. Dictionary with target chain names as key and model chain names as value. Only has an effect if global_chain_mapping is True.rmsd_assignment (
bool
) – assign ligands based on RMSD only. The default (False) is to use a combination of lDDT-PLI and RMSD for the assignment.n_max_naive (
int
) – Parameter for global chain mapping. If model and target have less or equal that number of chains, the full mapping solution space is enumerated to find the the optimum. A heuristic is used otherwise.max_symmetries (
int
) – If more than that many isomorphisms exist for a target-ligand pair, it will be ignored and reported as unassigned.unassigned (
bool
) – If True, unassigned model ligands are reported in the output together with assigned ligands, with a score of None, and reason for not being assigned in the *_details matrix. Defaults to False.
- property chain_mapper¶
Chain mapper object for the given
target
.
- property rmsd_matrix¶
Get the matrix of RMSD values.
Target ligands are in rows, model ligands in columns.
NaN values indicate that no RMSD could be computed (i.e. different ligands).
- Return type:
ndarray
- property lddt_pli_matrix¶
Get the matrix of lDDT-PLI values.
Target ligands are in rows, model ligands in columns.
NaN values indicate that no lDDT-PLI could be computed (i.e. different ligands).
- Return type:
ndarray
- property coverage_matrix¶
Get the matrix of model ligand atom coverage in the target.
Target ligands are in rows, model ligands in columns.
A value of 0 indicates that there was no isomorphism between the model and target ligands. If substructure_match=False, only full match isomorphisms are considered, and therefore only values of 1.0 and 0.0 are reported.
- Return type:
ndarray
- property rmsd¶
Get a dictionary of RMSD score values, keyed by model ligand (chain name,
ResNum
).If the scoring object was instantiated with unassigned=True, some scores may be None.
- Return type:
dict
- property rmsd_details¶
Get a dictionary of RMSD score details (dictionaries), keyed by model ligand (chain name,
ResNum
).The value is a dictionary. For ligands that were assigned (mapped) to the target, the dictionary contain the following information:
rmsd: the RMSD score value.
lddt_lp: the lDDT score of the ligand pocket (lDDT-LP).
bs_ref_res: a list of residues (
ResidueHandle
) that define the binding site in the reference.bs_ref_res_mapped: a list of residues (
ResidueHandle
) in the reference binding site that could be mapped to the model.bs_mdl_res_mapped: a list of residues (
ResidueHandle
) in the model that were mapped to the reference binding site. The residues are in the same order as bs_ref_res_mapped.bb_rmsd: the RMSD of the binding site backbone after superposition
target_ligand: residue handle of the target ligand.
model_ligand: residue handle of the model ligand.
chain_mapping: local chain mapping as a dictionary, with target chain name as key and model chain name as value.
transform: transformation to superpose the model onto the target.
substructure_match: whether the score is the result of a partial (substructure) match. A value of True indicates that the target ligand covers only part of the model, while False indicates a perfect match.
coverage: the fraction of model atoms covered by the assigned target ligand, in the interval (0, 1]. If substructure_match is False, this will always be 1.
inconsistent_residues: a list of tuples of mapped residues views (
ResidueView
) with residue names that differ between the reference and the model, respectively. The list is empty if all residue names match, which is guaranteed if check_resnames=True. Note: more binding site mappings may be explored during scoring, but only inconsistencies in the selected mapping are reported.unassigned: only if the scorer was instantiated with unassigned=True: False
If the scoring object was instantiated with unassigned=True, in addition the unassigned ligands will be reported with a score of None and the following information:
unassigned: True,
reason_short: a short token of the reason, see
unassigned_model_ligands
for details and meaning.reason_long: a human-readable text of the reason, see
unassigned_model_ligands
for details and meaning.rmsd: None
- Return type:
dict
- property lddt_pli¶
Get a dictionary of lDDT-PLI score values, keyed by model ligand (chain name,
ResNum
).If the scoring object was instantiated with unassigned=True, some scores may be None.
- Return type:
dict
- property lddt_pli_details¶
Get a dictionary of lDDT-PLI score details (dictionaries), keyed by model ligand (chain name,
ResNum
).Each sub-dictionary contains the following information:
lddt_pli: the lDDT-PLI score value.
rmsd: the RMSD score value corresponding to the lDDT-PLI chain mapping and assignment. This may differ from the RMSD-based assignment. Note that a different isomorphism than lddt_pli may be used.
lddt_lp: the lDDT score of the ligand pocket (lDDT-LP).
lddt_pli_n_contacts: number of total contacts used in lDDT-PLI, summed over all thresholds. Can be divided by 8 to obtain the number of atomic contacts.
bs_ref_res: a list of residues (
ResidueHandle
) that define the binding site in the reference.bs_ref_res_mapped: a list of residues (
ResidueHandle
) in the reference binding site that could be mapped to the model.bs_mdl_res_mapped: a list of residues (
ResidueHandle
) in the model that were mapped to the reference binding site. The residues are in the same order as bs_ref_res_mapped.bb_rmsd: the RMSD of the binding site backbone after superposition. Note: not used for lDDT-PLI computation.
target_ligand: residue handle of the target ligand.
model_ligand: residue handle of the model ligand.
chain_mapping: local chain mapping as a dictionary, with target chain name as key and model chain name as value.
transform: transformation to superpose the model onto the target (for RMSD only).
substructure_match: whether the score is the result of a partial (substructure) match. A value of True indicates that the target ligand covers only part of the model, while False indicates a perfect match.
inconsistent_residues: a list of tuples of mapped residues views (
ResidueView
) with residue names that differ between the reference and the model, respectively. The list is empty if all residue names match, which is guaranteed if check_resnames=True. Note: more binding site mappings may be explored during scoring, but only inconsistencies in the selected mapping are reported.unassigned: only if the scorer was instantiated with unassigned=True: False
If the scoring object was instantiated with unassigned=True, in addition the unmapped ligands will be reported with a score of None and the following information:
unassigned: True,
reason_short: a short token of the reason, see
unassigned_model_ligands
for details and meaning.reason_long: a human-readable text of the reason, see
unassigned_model_ligands
for details and meaning.lddt_pli: None
- Return type:
dict
- property unassigned_target_ligands¶
Get a dictionary of target ligands not assigned to any model ligand, keyed by target ligand (chain name,
ResNum
).The assignment for the lDDT-PLI score is used (and is controlled by the rmsd_assignment argument).
Each item contains a string from a controlled dictionary about the reason for the absence of assignment. A human-readable description can be obtained from the
unassigned_target_ligand_descriptions
property.Currently, the following reasons are reported:
no_ligand: there was no ligand in the model.
disconnected: the ligand graph was disconnected.
binding_site: no residues were in proximity of the ligand.
model_representation: no representation of the reference binding site was found in the model. (I.e. the binding site was not modeled. Remember: the binding site is defined in the target structure, the position of the model ligand itself is ignored at this point.)
identity: the ligand was not found in the model (by graph isomorphism). Check your ligand connectivity, and enable the substructure_match option if the target ligand is incomplete.
stoichiometry: there was a possible assignment in the model, but the model ligand was already assigned to a different target ligand. This indicates different stoichiometries.
symmetries: too many symmetries were found (by graph isomorphisms). Increase max_symmetries.
Some of these reasons can be overlapping, but a single reason will be reported.
- Return type:
dict
- property unassigned_target_ligand_descriptions¶
Get a human-readable description of why target ligands were unassigned, as a dictionary keyed by the controlled dictionary from
unassigned_target_ligands
.
- property unassigned_model_ligands¶
Get a dictionary of model ligands not assigned to any target ligand, keyed by model ligand (chain name,
ResNum
).The assignment for the lDDT-PLI score is used (and is controlled by the rmsd_assignment argument).
Each item contains a string from a controlled dictionary about the reason for the absence of assignment. A human-readable description can be obtained from the
unassigned_model_ligand_descriptions
property. Currently, the following reasons are reported:no_ligand: there was no ligand in the target.
disconnected: the ligand graph is disconnected.
binding_site: a potential assignment was found in the target, but there were no polymer residues in proximity of the ligand in the target.
model_representation: a potential assignment was found in the target, but no representation of the binding site was found in the model. (I.e. the binding site was not modeled. Remember: the binding site is defined in the target structure, the position of the model ligand itself is ignored at this point.)
identity: the ligand was not found in the target (by graph isomorphism). Check your ligand connectivity, and enable the substructure_match option if the target ligand is incomplete.
stoichiometry: there was a possible assignment in the target, but the model target was already assigned to a different model ligand. This indicates different stoichiometries.
symmetries: too many symmetries were found (by graph isomorphisms). Increase max_symmetries.
Some of these reasons can be overlapping, but a single reason will be reported.
- Return type:
dict
- property unassigned_model_ligand_descriptions¶
Get a human-readable description of why model ligands were unassigned, as a dictionary keyed by the controlled dictionary from
unassigned_model_ligands
.
- SCRMSD(model_ligand, target_ligand, transformation=geom.Mat4(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1), substructure_match=False, max_symmetries=1000000.0)¶
Calculate symmetry-corrected RMSD.
Binding site superposition must be computed separately and passed as transformation.
- Parameters:
model_ligand (
ost.mol.ResidueHandle
orost.mol.ResidueView
) – The model ligandtarget_ligand (
ost.mol.ResidueHandle
orost.mol.ResidueView
) – The target ligandtransformation (
ost.geom.Mat4
) – Optional transformation to apply on each atom position of model_ligand.substructure_match (
bool
) – Set this to True to allow partial target ligand.max_symmetries (
int
) – If more than that many isomorphisms exist, raise aTooManySymmetriesError
. This can only be assessed by generating at least that many isomorphisms and can take some time.
- Return type:
float
- Raises:
NoSymmetryError
when no symmetry can be found,DisconnectedGraphError
when ligand graph is disconnected,TooManySymmetriesError
when more than max_symmetries isomorphisms are found.
- exception NoSymmetryError¶
Exception raised when no symmetry can be found.
- exception TooManySymmetriesError¶
Exception raised when too many symmetries are found.
- exception DisconnectedGraphError¶
Exception raised when the ligand graph is disconnected.
chain_mapping
– Chain Mapping¶
Chain mapping aims to identify a one-to-one relationship between chains in a reference structure and a model.
- class ChainMapper(target, resnum_alignments=False, pep_seqid_thr=95.0, pep_gap_thr=1.0, nuc_seqid_thr=95.0, nuc_gap_thr=1.0, pep_subst_mat=<ost.seq.alg._ost_seq_alg.SubstWeightMatrix object>, pep_gap_open=-11, pep_gap_ext=-1, nuc_subst_mat=<ost.seq.alg._ost_seq_alg.SubstWeightMatrix object>, nuc_gap_open=-4, nuc_gap_ext=-4, min_pep_length=6, min_nuc_length=4, n_max_naive=100000000.0)¶
Class to compute chain mappings
All algorithms are performed on processed structures which fulfill criteria as given in constructor arguments (min_pep_length, “min_nuc_length”) and only contain residues which have all required backbone atoms. for peptide residues thats N, CA, C and CB (no CB for GLY), for nucleotide residues thats O5’, C5’, C4’, C3’ and O3’.
Chain mapping is a three step process:
Group chemically identical chains in target using pairwise alignments that are either computed with Needleman-Wunsch (NW) or simply derived from residue numbers (resnum_alignments flag). In case of NW, pep_subst_mat, pep_gap_open and pep_gap_ext and their nucleotide equivalents are relevant. Two chains are considered identical if they fulfill the thresholds given by pep_seqid_thr, pep_gap_thr, their nucleotide equivalents respectively. The grouping information is available as attributes of this class.
Map chains in an input model to these groups. Generating alignments and the similarity criteria are the same as above. You can either get the group mapping with
GetChemMapping()
or directly call one of the full fletched one-to-one chain mapping functions which execute that step internally.Obtain one-to-one mapping for chains in an input model and target with one of the available mapping functions. Just to get an idea of complexity. If target and model are octamers, there are
8! = 40320
possible chain mappings.
- Parameters:
target (
ost.mol.EntityView
/ost.mol.EntityHandle
) – Target structure onto which models are mapped. Computations happen on a selection only containing polypeptides and polynucleotides.resnum_alignments (
bool
) – Use residue numbers instead of Needleman-Wunsch to compute pairwise alignments. Relevant forchem_groups
and related attributes.pep_seqid_thr (
float
) – Threshold used to decide when two chains are identical. 95 percent tolerates the few mutations crystallographers like to do.pep_gap_thr (
float
) – Additional threshold to avoid gappy alignments with high seqid. By default this is disabled (set to 1.0). This threshold checks for a maximum allowed fraction of gaps in any of the two sequences after stripping terminal gaps. The reason for not just normalizing seqid by the longer sequence is that one sequence might be a perfect subsequence of the other but only cover half of it.nuc_seqid_thr (
float
) – Nucleotide equivalent for pep_seqid_thrnuc_gap_thr (
float
) – Nucleotide equivalent for nuc_gap_thrpep_subst_mat (
ost.seq.alg.SubstWeightMatrix
) – Substitution matrix to align peptide sequences, irrelevant if resnum_alignments is True, defaults to seq.alg.BLOSUM62pep_gap_open (
int
) – Gap open penalty to align peptide sequences, irrelevant if resnum_alignments is Truepep_gap_ext (
int
) – Gap extension penalty to align peptide sequences, irrelevant if resnum_alignments is Truenuc_subst_mat (
ost.seq.alg.SubstWeightMatrix
) – Nucleotide equivalent for pep_subst_mat, defaults to seq.alg.NUC44nuc_gap_open (
int
) – Nucleotide equivalent for pep_gap_opennuc_gap_ext (
int
) – Nucleotide equivalent for pep_gap_extmin_pep_length (
int
) – Minimal number of residues for a peptide chain to be considered in target and in models.min_nuc_length (
int
) – Minimal number of residues for a nucleotide chain to be considered in target and in models.n_max_naive (
int
) – Max possible chain mappings that are enumerated inGetNaivelDDTMapping()
/GetDecomposerlDDTMapping()
. ARuntimeError
is raised in case of bigger complexity.
- property target¶
Target structure that only contains peptides/nucleotides
Contains only residues that have the backbone representatives (CA for peptide and C3’ for nucleotides) to avoid ATOMSEQ alignment inconsistencies when switching between all atom and backbone only representations.
- Type:
- property polypep_seqs¶
Sequences of peptide chains in
target
Respective
EntityView
from target for each sequence s are available ass.GetAttachedView()
- Type:
- property polynuc_seqs¶
Sequences of nucleotide chains in
target
Respective
EntityView
from target for each sequence s are available ass.GetAttachedView()
- Type:
- property chem_groups¶
Groups of chemically equivalent chains in
target
First chain in group is the one with longest sequence.
- Getter:
Computed on first use (cached)
- Type:
list
oflist
ofstr
(chain names)
- property chem_group_alignments¶
MSA for each group in
chem_groups
Sequences in MSAs exhibit same order as in
chem_groups
and have the respectiveost.mol.EntityView
from target attached.- Getter:
Computed on first use (cached)
- Type:
ost.seq.AlignmentList
- property chem_group_ref_seqs¶
Reference (longest) sequence for each group in
chem_groups
Respective
EntityView
from target for each sequence s are available ass.GetAttachedView()
- Getter:
Computed on first use (cached)
- Type:
- property chem_group_types¶
ChemType of each group in
chem_groups
Specifying if groups are poly-peptides/nucleotides, i.e.
ost.mol.ChemType.AMINOACIDS
orost.mol.ChemType.NUCLEOTIDES
- Getter:
Computed on first use (cached)
- Type:
list
ofost.mol.ChemType
- GetChemMapping(model)¶
Maps sequences in model to chem_groups of target
- Parameters:
model (
ost.mol.EntityView
/ost.mol.EntityHandle
) – Model from which to extract sequences, a selection that only includes peptides and nucleotides is performed and returned along other results.- Returns:
Tuple with two lists of length len(self.chem_groups) and an
ost.mol.EntityView
representing model: 1) Each element is alist
with mdl chain names that map to the chem group at that position. 2) Each element is aost.seq.AlignmentList
aligning these mdl chain sequences to the chem group ref sequences. 3) A selection of model that only contains polypeptides and polynucleotides whose ATOMSEQ exactly matches the sequence info in the returned alignments.
- GetlDDTMapping(model, inclusion_radius=15.0, thresholds=[0.5, 1.0, 2.0, 4.0], strategy='naive', steep_opt_rate=None, block_seed_size=5, block_blocks_per_chem_group=5, chem_mapping_result=None)¶
Identify chain mapping by optimizing lDDT score
Maps model chain sequences to
chem_groups
and find mapping based on backbone only lDDT score (CA for amino acids C3’ for Nucleotides).Either performs a naive search, i.e. enumerate all possible mappings or executes a greedy strategy that tries to identify a (close to) optimal mapping in an iterative way by starting from a start mapping (seed). In each iteration, the one-to-one mapping that leads to highest increase in number of conserved contacts is added with the additional requirement that this added mapping must have non-zero interface counts towards the already mapped chains. So basically we’re “growing” the mapped structure by only adding connected stuff.
The available strategies:
naive: Enumerates all possible mappings and returns best
greedy_fast: perform all vs. all single chain lDDTs within the respective ref/mdl chem groups. The mapping with highest number of conserved contacts is selected as seed for greedy extension
greedy_full: try multiple seeds for greedy extension, i.e. try all ref/mdl chain combinations within the respective chem groups and retain the mapping leading to the best lDDT.
greedy_block: try multiple seeds for greedy extension, i.e. try all ref/mdl chain combinations within the respective chem groups and extend them to block_seed_size. block_blocks_per_chem_group for each chem group are selected for exhaustive extension.
Sets
MappingResult.opt_score
in case of no trivial one-to-one mapping.- Parameters:
model (
ost.mol.EntityView
/ost.mol.EntityHandle
) – Model to mapinclusion_radius (
float
) – Inclusion radius for lDDTthresholds (
list
offloat
) – Thresholds for lDDTstrategy (
str
) – Strategy to find mapping. Must be in [“naive”, “greedy_fast”, “greedy_full”, “greedy_block”]steep_opt_rate (
int
) – Only relevant for greedy strategies. If set, every steep_opt_rate mappings, a simple optimization is executed with the goal of avoiding local minima. The optimization iteratively checks all possible swaps of mappings within their respective chem groups and accepts swaps that improve lDDT score. Iteration stops as soon as no improvement can be achieved anymore.block_seed_size (
int
) – Param for greedy_block strategy - Initial seeds are extended by that number of chains.block_blocks_per_chem_group (
int
) – Param for greedy_block strategy - Number of blocks per chem group that are extended in an initial search for high scoring local solutions.chem_mapping_result (
tuple
) – Pro param. The result ofGetChemMapping()
where you provided model. If set, model parameter is not used.
- Returns:
- GetQSScoreMapping(model, contact_d=12.0, strategy='naive', block_seed_size=5, block_blocks_per_chem_group=5, steep_opt_rate=None, chem_mapping_result=None, greedy_prune_contact_map=False)¶
Identify chain mapping based on QSScore
Scoring is based on CA/C3’ positions which are present in all chains of a
chem_groups
as well as the model chains which are mapped to that respective chem group.The following strategies are available:
- naive: Naively iterate all possible mappings and return best based
on QS score.
greedy_fast: perform all vs. all single chain lDDTs within the respective ref/mdl chem groups. The mapping with highest number of conserved contacts is selected as seed for greedy extension. Extension is based on QS-score.
greedy_full: try multiple seeds for greedy extension, i.e. try all ref/mdl chain combinations within the respective chem groups and retain the mapping leading to the best QS-score.
greedy_block: try multiple seeds for greedy extension, i.e. try all ref/mdl chain combinations within the respective chem groups and extend them to block_seed_size. block_blocks_per_chem_group for each chem group are selected for exhaustive extension.
Sets
MappingResult.opt_score
in case of no trivial one-to-one mapping.- Parameters:
model (
ost.mol.EntityView
/ost.mol.EntityHandle
) – Model to mapcontact_d (
float
) – Max distance between two residues to be considered as contact in qs scoringstrategy (
str
) – Strategy for sampling, must be in [“naive”]chem_mapping_result (
tuple
) – Pro param. The result ofGetChemMapping()
where you provided model. If set, model parameter is not used.greedy_prune_contact_map – Relevant for all strategies that use greedy extensions. If True, only chains with at least 3 contacts (8A CB distance) towards already mapped chains in trg/mdl are considered for extension. All chains that give a potential non-zero QS-score increase are used otherwise (at least one contact within 12A). The consequence is reduced runtime and usually no real reduction in accuracy.
- Returns:
- GetRigidMapping(model, strategy='greedy_single_gdtts', single_chain_gdtts_thresh=0.4, subsampling=None, first_complete=False, iterative_superposition=False, chem_mapping_result=None)¶
Identify chain mapping based on rigid superposition
Superposition and scoring is based on CA/C3’ positions which are present in all chains of a
chem_groups
as well as the model chains which are mapped to that respective chem group.Transformations to superpose model onto
ChainMapper.target
are estimated using all possible combinations of target and model chains within the same chem groups and build the basis for further extension.There are four extension strategies:
greedy_single_gdtts: Iteratively add the model/target chain pair that adds the most conserved contacts based on the GDT-TS metric (Number of CA/C3’ atoms within [8, 4, 2, 1] Angstrom). The mapping with highest GDT-TS score is returned. However, that mapping is not guaranteed to be complete (see single_chain_gdtts_thresh).
greedy_iterative_gdtts: Same as greedy_single_gdtts except that the transformation gets updated with each added chain pair.
greedy_single_rmsd: Conceptually similar to greedy_single_gdtts but the added chain pairs are the ones with lowest RMSD. The mapping with lowest overall RMSD gets returned. single_chain_gdtts_thresh is only applied to derive the initial transformations. After that, the minimal RMSD chain pair gets iteratively added without applying any threshold.
greedy_iterative_rmsd: Same as greedy_single_rmsd exept that the transformation gets updated with each added chain pair. single_chain_gdtts_thresh is only applied to derive the initial transformations. After that, the minimal RMSD chain pair gets iteratively added without applying any threshold.
- Parameters:
model (
ost.mol.EntityView
/ost.mol.EntityHandle
) – Model to mapstrategy (
str
) – Strategy to extend mappings from initial transforms, see description above. Must be in [“greedy_single”, “greedy_iterative”, “greedy_iterative_rmsd”]single_chain_gdtts_thresh (
float
) – Minimal GDT-TS score for model/target chain pair to be added to mapping. Mapping extension for a given transform stops when no pair fulfills this threshold, potentially leading to an incomplete mapping.subsampling (
int
) – If given, only use an equally distributed subset of all CA/C3’ positions for superposition/scoring.first_complete (
bool
) – Avoid full enumeration and return first found mapping that covers all model chains or all target chains. Has no effect on greedy_iterative_rmsd strategy.iterative_superposition (
bool
) – Whether to compute inital transformations withost.mol.alg.IterativeSuperposeSVD()
as oposed toost.mol.alg.SuperposeSVD()
chem_mapping_result (
tuple
) – Pro param. The result ofGetChemMapping()
where you provided model. If set, model parameter is not used.
- Returns:
- GetMapping(model, n_max_naive=40320)¶
Convenience function to get mapping with currently preferred method
If number of possible chain mappings is <= n_max_naive, a naive QS-score mapping is performed and optimal QS-score is guaranteed. For anything else, a QS-score mapping with the greedy_full strategy is performed (greedy_prune_contact_map = True). The default for n_max_naive of 40320 corresponds to an octamer (8!=40320). A structure with stoichiometry A6B2 would be 6!*2!=1440 etc.
- GetRepr(substructure, model, topn=1, inclusion_radius=15.0, thresholds=[0.5, 1.0, 2.0, 4.0], bb_only=False, only_interchain=False, chem_mapping_result=None, global_mapping=None)¶
Identify topn representations of substructure in model
substructure defines a subset of
target
for which one wants the topn representations in model. Representations are scored and sorted by lDDT.- Parameters:
substructure (
ost.mol.EntityView
) – Aost.mol.EntityView
which is a subset oftarget
. Should be selected with the OpenStructure query language. Example: if you’re interested in residues with number 42,43 and 85 in chain A:substructure=mapper.target.Select("cname=A and rnum=42,43,85")
ARuntimeError
is raised if substructure does not refer to the same underlyingost.mol.EntityHandle
astarget
.model (
ost.mol.EntityView
/ost.mol.EntityHandle
) – Structure in which one wants to find representations for substructuretopn (
int
) – Max number of representations that are returnedinclusion_radius (
float
) – Inclusion radius for lDDTthresholds (
list
offloat
) – Thresholds for lDDTbb_only (
bool
) – Only consider backbone atoms in lDDT computationonly_interchain (
bool
) – Only score interchain contacts in lDDT. Useful if you want to identify interface patches.chem_mapping_result (
tuple
) – Pro param. The result ofGetChemMapping()
where you provided model. If set, model parameter is not used.global_mapping (
MappingResult
) – Pro param. Specify a global mapping result. This fully defines the desired representation in the model but extracts it and enriches it with all the nice attributes ofReprResult
. The target attribute in global_mapping must be of the same entity as self.target and the model attribute of global_mapping must be of the same entity as model.
- Returns:
list
ofReprResult
- GetNMappings(model)¶
Returns number of possible mappings
- Parameters:
model (
ost.mol.EntityView
/ost.mol.EntityHandle
) – Model with chains that are mapped ontochem_groups
- ProcessStructure(ent)¶
Entity processing for chain mapping
Selects view containing peptide and nucleotide residues which have required backbone atoms present - for peptide residues thats N, CA, C and CB (no CB for GLY), for nucleotide residues thats O5’, C5’, C4’, C3’ and O3’.
filters view by chain lengths, see min_pep_length and min_nuc_length in constructor
Extracts atom sequences for each chain in that view
Attaches corresponding
ost.mol.EntityView
to each sequenceIf residue number alignments are used, strictly increasing residue numbers without insertion codes are ensured in each chain
- Parameters:
ent (
ost.mol.EntityView
/ost.mol.EntityHandle
) – Entity to process- Returns:
Tuple with 3 elements: 1)
ost.mol.EntityView
containing peptide and nucleotide residues 2)ost.seq.SequenceList
containing ATOMSEQ sequences for each polypeptide chain in returned view, sequences haveost.mol.EntityView
of according chains attached 3) same for polynucleotide chains
- Align(s1, s2, stype)¶
Access to internal sequence alignment functionality
Alignment parameterization is setup at ChainMapper construction
- Parameters:
s1 (
ost.seq.SequenceHandle
) – First sequence to align - must have view attached in case of resnum_alignmentss2 (
ost.seq.SequenceHandle
) – Second sequence to align - must have view attached in case of resnum_alignmentsstype – Type of sequences to align, must be in [
ost.mol.ChemType.AMINOACIDS
,ost.mol.ChemType.NUCLEOTIDES
]
- Returns:
Pairwise alignment of s1 and s2
- class ReprResult(lDDT, substructure, ref_view, mdl_view)¶
Result object for
ChainMapper.GetRepr()
Constructor is directly called within the function, no need to construct such objects yourself.
- Parameters:
lDDT (
float
) – lDDT for this mapping. Depends on how you callChainMapper.GetRepr()
whether this is backbone only or full atom lDDT.substructure (
ost.mol.EntityView
) – The full substructure for which we searched for a representationref_view (
mol.EntityView
) – View pointing to the same underlying entity as substructure but only contains the stuff that is mappedmdl_view (
mol.EntityView
) – The matching counterpart in model
- property lDDT¶
lDDT of representation result
Depends on how you call
ChainMapper.GetRepr()
whether this is backbone only or full atom lDDT.- Type:
float
- property substructure¶
The full substructure for which we searched for a representation
- Type:
- property ref_view¶
View which contains the mapped subset of
substructure
- Type:
- property ref_residues¶
The reference residues
- Type:
class:mol.ResidueViewList
- property mdl_residues¶
The model residues
- Type:
mol.ResidueViewList
- property inconsistent_residues¶
A list of mapped residue whose names do not match (eg. ALA in the reference and LEU in the model).
The mismatches are reported as a tuple of
ResidueView
(reference, model), or as an empty list if all the residue names match.- Type:
list
- property ref_bb_pos¶
Representative backbone positions for reference residues.
Thats CA positions for peptides and C3’ positions for Nucleotides.
- Type:
geom.Vec3List
- property mdl_bb_pos¶
Representative backbone positions for model residues.
Thats CA positions for peptides and C3’ positions for Nucleotides.
- Type:
geom.Vec3List
- property ref_full_bb_pos¶
Representative backbone positions for reference residues.
Thats N, CA and C positions for peptides and O5’, C5’, C4’, C3’, O3’ positions for Nucleotides.
- Type:
geom.Vec3List
- property mdl_full_bb_pos¶
Representative backbone positions for reference residues.
Thats N, CA and C positions for peptides and O5’, C5’, C4’, C3’, O3’ positions for Nucleotides.
- Type:
geom.Vec3List
- property transform¶
Transformation to superpose mdl residues onto ref residues
Superposition computed as minimal RMSD superposition on
ref_bb_pos
andmdl_bb_pos
. If number of positions is smaller 3, the full_bb_pos equivalents are used instead.- Type:
- property superposed_mdl_bb_pos¶
mdl_bb_pos
withtransform applied
- Type:
geom.Vec3List
- property bb_rmsd¶
RMSD between
ref_bb_pos
andsuperposed_mdl_bb_pos
- Type:
float
- property gdt_8¶
GDT with one single threshold: 8.0
- Type:
float
- property gdt_4¶
GDT with one single threshold: 4.0
- Type:
float
- property gdt_2¶
GDT with one single threshold: 2.0
- Type:
float
- property gdt_1¶
GDT with one single threshold: 1.0
- Type:
float
- property ost_query¶
query for mdl residues in OpenStructure query language
Repr can be selected as
full_mdl.Select(ost_query)
Returns invalid query if residue numbers have insertion codes.
- Type:
str
- JSONSummary()¶
Returns JSON serializable summary of results
- GetFlatChainMapping(mdl_as_key=False)¶
Returns flat mapping of all chains in the representation
- Parameters:
mdl_as_key – Default is target chain name as key and model chain name as value. This can be reversed with this flag.
- Returns:
dict
withstr
as key/value that describe one-to-one mapping
- class MappingResult(target, model, chem_groups, chem_mapping, mapping, alns, opt_score=None)¶
Result object for the chain mapping functions in
ChainMapper
Constructor is directly called within the functions, no need to construct such objects yourself.
- property target¶
Target/reference structure, i.e.
ChainMapper.target
- Type:
- property model¶
Model structure that gets mapped onto
target
Underwent same processing as
ChainMapper.target
, i.e. only contains peptide/nucleotide chains of sufficient size.- Type:
- property chem_groups¶
Groups of chemically equivalent chains in
target
Same as
ChainMapper.chem_group
list
oflist
ofstr
(chain names)
- property chem_mapping¶
Assigns chains in
model
tochem_groups
.list
oflist
ofstr
(chain names)
- property mapping¶
Mapping of
model
chains ontotarget
Exact same shape as
chem_groups
but containing the names of the mapped chains inmodel
. May contain None fortarget
chains that are not covered. No guarantee that all chains inmodel
are mapped.list
oflist
ofstr
(chain names)
- property alns¶
Alignments of mapped chains in
target
andmodel
Each alignment is accessible with
alns[(t_chain,m_chain)]
. First sequence is the sequence oftarget
chain, second sequence the one frommodel
. The respectiveost.mol.EntityView
are attached withost.seq.ConstSequenceHandle.AttachView()
.- Type:
dict
with key:tuple
ofstr
, value:ost.seq.AlignmentHandle
- property opt_score¶
Placeholder property without any guarantee of being set
Different scores get optimized in the various chain mapping algorithms. Some of them may set their final optimal score in that property. Consult the documentation of the respective chain mapping algorithm for more information. Won’t be in the return dict of
JSONSummary()
.
- GetFlatMapping(mdl_as_key=False)¶
Returns flat mapping as
dict
for all mapable chains- Parameters:
mdl_as_key – Default is target chain name as key and model chain name as value. This can be reversed with this flag.
- Returns:
dict
withstr
as key/value that describe one-to-one mapping
- JSONSummary()¶
Returns JSON serializable summary of results
qsscore
– New QS score implementation¶
Note
This is a new implementation of QS Score, introduced in OpenStructure 2.4 and tightly integrated with the chain mapping algorithms. The previous qsscoring code that comes with Bertoni et al. is considered deprecated.
- class QSEntity(ent, contact_d=12.0)¶
Helper object for QS-score computation
Holds structural information and getters for interacting chains, i.e. interfaces. Peptide residues are represented by their CB position (CA for GLY) and nucleotides by C3’.
- Parameters:
ent (
ost.mol.EntityView
/ost.mol.EntityHandle
) – Structure for QS score computationcontact_d (
float
) – Pairwise distance of residues to be considered as contacts
- property view¶
Processed structure
View that only contains representative atoms. That’s CB for peptide residues (CA for GLY) and C3’ for nucleotides.
- Type:
- property contact_d¶
Pairwise distance of residues to be considered as contacts
Given at
QSEntity
construction- Type:
float
- GetSequence(chain_name)¶
Get sequence of chain
Returns sequence of specified chain as raw
str
- Parameters:
chain_name (
str
) – Chain inview
- GetPos(chain_name)¶
Get representative positions of chain
That’s CB positions for peptide residues (CA for GLY) and C3’ for nucleotides. Returns positions as a numpy array of shape (n_residues, 3).
- Parameters:
chain_name (
str
) – Chain inview
- PairDist(chain_name_one, chain_name_two)¶
Get pairwise distances between two chains
Returns distances as numpy array of shape (a, b). Where a is the number of residues of the chain that comes BEFORE the other in
chain_names
- GetMinPos(chain_name)¶
Get min x,y,z cooridnates for given chain
Based on positions that are extracted with GetPos
- Parameters:
chain_name (
str
) – Chain inview
- GetMaxPos(chain_name)¶
Get max x,y,z cooridnates for given chain
Based on positions that are extracted with GetPos
- Parameters:
chain_name (
str
) – Chain inview
- PotentialInteraction(chain_name_one, chain_name_two)¶
Returns True if chains potentially interact
Based on crude collision detection. There is no guarantee that they actually interact if True is returned. However, if False is returned, they don’t interact for sure.
- class QSScorer(target, chem_groups, model, alns, contact_d=12.0)¶
Helper object to compute QS-score
Tightly integrated into the mechanisms from the chain_mapping module. The prefered way to derive an object of type
QSScorer
is through the static constructor:FromMappingResult()
. Example score computation including mapping:from ost.mol.alg.qsscore import QSScorer from ost.mol.alg.chain_mapping import ChainMapper ent_1 = io.LoadPDB("path_to_assembly_1.pdb") ent_2 = io.LoadPDB("path_to_assembly_2.pdb") chain_mapper = ChainMapper(ent_1) mapping_result = chain_mapper.GetlDDTMapping(ent_2) qs_scorer = QSScorer.FromMappingResult(mapping_result) score_result = qs_scorer.Score(mapping_result.mapping) print("score:", score_result.QS_global)
QS-score computation in
QSScorer.Score()
implements caching. Repeated computations with alternative chain mappings thus become faster.- Parameters:
target (
ost.mol.EntityView
/ost.mol.EntityHandle
) – Structure designated as “target”. Can be fetched fromost.mol.alg.chain_mapping.MappingResult
chem_groups (
list
oflist
ofstr
) – Groups of chemically equivalent chains in target. Can be fetched fromost.mol.alg.chain_mapping.MappingResult
model (
ost.mol.EntityView
/ost.mol.EntityHandle
) – Structure designated as “model”. Can be fetched fromost.mol.alg.chain_mapping.MappingResult
alns (
dict
with key:tuple
ofstr
, value:ost.seq.AlignmentHandle
) – Each alignment is accessible withalns[(t_chain,m_chain)]
. First sequence is the sequence of the respective chain inqsent1
, second sequence the one fromqsent2
. Can be fetched fromost.mol.alg.chain_mapping.MappingResult
- static FromMappingResult(mapping_result)¶
The preferred way to get a
QSScorer
Static constructor that derives an object of type
QSScorer
using aost.mol.alg.chain_mapping.MappingResult
- Parameters:
mapping_result (
ost.mol.alg.chain_mapping.MappingResult
) – Data source
- property chem_groups¶
Groups of chemically equivalent chains in target
Provided at object construction
- Type:
list
oflist
ofstr
- property alns¶
Alignments between chains in
qsent1
andqsent2
Provided at object construction. Each alignment is accessible with
alns[(t_chain,m_chain)]
. First sequence is the sequence of the respective chain inqsent1
, second sequence the one fromqsent2
.- Type:
dict
with key:tuple
ofstr
, value:ost.seq.AlignmentHandle
- Score(mapping, check=True)¶
Computes QS-score given chain mapping
Again, the preferred way is to get mapping is from an object of type
ost.mol.alg.chain_mapping.MappingResult
.- Parameters:
mapping (
list
oflist
ofstr
) – seeost.mol.alg.chain_mapping.MappingResult.mapping
check (
bool
) – Perform input checks, can be disabled for speed purposes if you know what you’re doing.
- Returns:
Result object of type
QSScorerResult
- ScoreInterface(trg_ch1, trg_ch2, mdl_ch1, mdl_ch2)¶
Computes QS-score only considering one interface
This only works for interfaces that are computed in
Score()
, i.e. interfaces for which the alignments are set up correctly.- Parameters:
trg_ch1 (
str
) – Name of first interface chain in targettrg_ch2 (
str
) – Name of second interface chain in targetmdl_ch1 (
str
) – Name of first interface chain in modelmdl_ch2 (
str
) – Name of second interface chain in model
- Returns:
Result object of type
QSScorerResult
- Raises:
RuntimeError
if no aln for trg_ch1/mdl_ch1 or trg_ch2/mdl_ch2 is available.
DockQ
– DockQ implementation¶
- DockQ(mdl, ref, mdl_ch1, mdl_ch2, ref_ch1, ref_ch2, ch1_aln=None, ch2_aln=None)¶
Computes DockQ for specified interface
DockQ is described in: Sankar Basu and Bjoern Wallner (2016), “DockQ: A Quality Measure for Protein-Protein Docking Models”, PLOS one
Residues are mapped based on residue numbers by default. If you provide ch1_aln and ch2_aln you can enforce an arbitrary mapping.
- Parameters:
mdl (
ost.mol.EntityView
/ost.mol.EntityHandle
) – Model structureref (
ost.mol.EntityView
/ost.mol.EntityHandle
) – Reference structure, i.e. native structuremdl_ch1 (
str
) – Specifies chain in model constituting first part of interfacemdl_ch2 (
str
) – Specifies chain in model constituting second part of interfaceref_ch1 (
str
) – ref equivalent of mdl_ch1ref_ch2 (
str
) – ref equivalent of mdl_ch2ch1_aln (
ost.seq.AlignmentHandle
) – Alignment with two sequences to map ref_ch1 and mdl_ch1. The first sequence must match the sequence in ref_ch1 and the second to mdl_ch1.ch2_aln (
ost.seq.AlignmentHandle
) – Alignment with two sequences to map ref_ch2 and mdl_ch2. The first sequence must match the sequence in ref_ch2 and the second to mdl_ch2.
- Returns:
dict
with keys nnat, nmdl, fnat, fnonnat, irmsd, lrmsd, DockQ which corresponds to the equivalent values in the original DockQ implementation.
Contact Scores
– Contact based scores¶
- class ContactEntity(ent, contact_d=5.0, contact_mode='aa')¶
Helper object for Contact-score computation
- property view¶
The structure depending on contact_mode
Full view in case of “aa”, view that only contains representative atoms in case of “repr”.
- Type:
- property contact_mode¶
The contact mode
Can either be “aa”, meaning that all atoms are considered to identify contacts, or “repr” which only considers distances between representative atoms. For peptides thats CB (CA for GLY), for nucleotides thats C3’.
- Type:
str
- property contact_d¶
Pairwise distance of residues to be considered as contacts
Given at
ContactScorer
construction- Type:
float
- property contacts¶
Interchain contacts
Organized as
dict
with key (cname1, cname2) and values being a set of tuples with the respective residue indices. cname1 < cname2 evaluates to True.
- property hr_contacts¶
Human readable interchain contacts
Human readable version of
contacts
. Simple list with tuples containing two strings specifying the residues in contact. Format: <cname>.<rnum>.<ins_code>
- property interface_residues¶
Interface residues
Residues in each chain that are in contact with any other chain. Organized as
dict
with key cname and values the respective residue indices in aset
.
- property hr_interface_residues¶
Human readable interface residues
Human readable version of
interface_residues
.list
of strings specifying the interface residues in format: <cname>.<rnum>.<ins_code>
- class ContactScorerResultICS(n_trg_contacts, n_mdl_contacts, n_union, n_intersection)¶
Holds data relevant to compute ics
- property n_trg_contacts¶
Number of contacts in target
- Type:
int
- property n_mdl_contacts¶
Number of contacts in model
- Type:
int
- property precision¶
Precision of model contacts
The fraction of model contacts that are also present in target
- Type:
int
- property recall¶
Recall of model contacts
The fraction of target contacts that are also present in model
- Type:
int
- class ContactScorerResultIPS(n_trg_int_res, n_mdl_int_res, n_union, n_intersection)¶
Holds data relevant to compute ips
- property n_trg_int_res¶
Number of interface residues in target
- Type:
int
- property n_mdl_int_res¶
Number of interface residues in model
- Type:
int
- property precision¶
Precision of model interface residues
The fraction of model interface residues that are also interface residues in target
- Type:
int
- property recall¶
Recall of model interface residues
The fraction of target interface residues that are also interface residues in model
- Type:
int
- property ips¶
The Interface Patch Similarity score (IPS)
Jaccard coefficient of interface residues in model/target. Technically thats
intersection
/union
- Type:
float
- class ContactScorer(target, chem_groups, model, alns, contact_mode='aa', contact_d=5.0)¶
Helper object to compute Contact scores
Tightly integrated into the mechanisms from the chain_mapping module. The prefered way to derive an object of type
ContactScorer
is through the static constructor:FromMappingResult()
.Usage is the same as for
ost.mol.alg.QSScorer
- static FromMappingResult(mapping_result, contact_mode='aa', contact_d=5.0)¶
The preferred way to get a
ContactScorer
Static constructor that derives an object of type
ContactScorer
using aost.mol.alg.chain_mapping.MappingResult
- Parameters:
mapping_result (
ost.mol.alg.chain_mapping.MappingResult
) – Data source
- property cent1¶
Represents target
- Type:
- property chem_groups¶
Groups of chemically equivalent chains in target
Provided at object construction
- Type:
list
oflist
ofstr
- property cent2¶
Represents model
- Type:
- property alns¶
Alignments between chains in
cent1
andcent2
Provided at object construction. Each alignment is accessible with
alns[(t_chain,m_chain)]
. First sequence is the sequence of the respective chain incent1
, second sequence the one fromcent2
.- Type:
dict
with key:tuple
ofstr
, value:ost.seq.AlignmentHandle
- ScoreICS(mapping, check=True)¶
Computes ICS given chain mapping
Again, the preferred way is to get mapping is from an object of type
ost.mol.alg.chain_mapping.MappingResult
.- Parameters:
mapping (
list
oflist
ofstr
) – seeost.mol.alg.chain_mapping.MappingResult.mapping
check (
bool
) – Perform input checks, can be disabled for speed purposes if you know what you’re doing.
- Returns:
Result object of type
ContactScorerResultICS
- ScoreICSInterface(trg_ch1, trg_ch2, mdl_ch1, mdl_ch2)¶
Computes ICS scores only considering one interface
This only works for interfaces that are computed in
Score()
, i.e. interfaces for which the alignments are set up correctly.- Parameters:
trg_ch1 (
str
) – Name of first interface chain in targettrg_ch2 (
str
) – Name of second interface chain in targetmdl_ch1 (
str
) – Name of first interface chain in modelmdl_ch2 (
str
) – Name of second interface chain in model
- Returns:
Result object of type
ContactScorerResultICS
- Raises:
RuntimeError
if no aln for trg_ch1/mdl_ch1 or trg_ch2/mdl_ch2 is available.
- ICSFromFlatMapping(flat_mapping)¶
Same as
ScoreICS()
but with flat mapping- Parameters:
flat_mapping (
dict
withstr
as key and value) – Dictionary with target chain names as keys and the mapped model chain names as value- Returns:
Result object of type
ContactScorerResultICS
- ScoreIPS(mapping, check=True)¶
Computes IPS given chain mapping
Again, the preferred way is to get mapping is from an object of type
ost.mol.alg.chain_mapping.MappingResult
.- Parameters:
mapping (
list
oflist
ofstr
) – seeost.mol.alg.chain_mapping.MappingResult.mapping
check (
bool
) – Perform input checks, can be disabled for speed purposes if you know what you’re doing.
- Returns:
Result object of type
ContactScorerResultIPS
- ScoreIPSInterface(trg_ch1, trg_ch2, mdl_ch1, mdl_ch2)¶
Computes IPS scores only considering one interface
This only works for interfaces that are computed in
Score()
, i.e. interfaces for which the alignments are set up correctly.- Parameters:
trg_ch1 (
str
) – Name of first interface chain in targettrg_ch2 (
str
) – Name of second interface chain in targetmdl_ch1 (
str
) – Name of first interface chain in modelmdl_ch2 (
str
) – Name of second interface chain in model
- Returns:
Result object of type
ContactScorerResultIPS
- Raises:
RuntimeError
if no aln for trg_ch1/mdl_ch1 or trg_ch2/mdl_ch2 is available.
- IPSFromFlatMapping(flat_mapping)¶
Same as
ScoreIPS()
but with flat mapping- Parameters:
flat_mapping (
dict
withstr
as key and value) – Dictionary with target chain names as keys and the mapped model chain names as value- Returns:
Result object of type
ContactScorerResultIPS
Steric Clashes¶
The following function detects steric clashes in atomic structures. Two atoms are clashing if their euclidian distance is smaller than a threshold value (minus a tolerance offset).
- FilterClashes(entity, clashing_distances, always_remove_bb=False)¶
This function filters out residues with non-bonded clashing atoms. If the clashing atom is a backbone atom, the complete residue is removed from the structure, if the atom is part of the sidechain, only the sidechain atoms are removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True the whole residue is removed even if a clash is just detected in the side-chain.
The function returns a view containing all elements (residues, atoms) that have not been removed from the input structure, plus a
ClashingInfo
object containing information about the detected clashes.Two atoms are defined as clashing if their distance is shorter than the reference distance minus a tolerance threshold. The information about the clashing distances and the tolerance thresholds for all possible pairs of atoms is passed to the function as a parameter.
Hydrogen and deuterium atoms are ignored by this function.
- Parameters:
entity (
EntityView
orEntityHandle
) – The input entityclashing_distances (
ClashingDistances
) – information about the clashing distancesalways_remove_bb (
bool
) – if set to True, the whole residue is removed even if the clash happens in the side-chain
- Returns:
A tuple of two elements: The filtered
EntityView
, and aClashingInfo
object
- CheckStereoChemistry(entity, bond_stats, angle_stats, bond_tolerance, angle_tolerance, always_remove_bb=False)¶
This function filters out residues with severe stereo-chemical violations. If the violation involves a backbone atom, the complete residue is removed from the structure, if it involves an atom that is part of the sidechain, only the sidechain is removed. This behavior is changed by the always_remove_bb flag: when the flag is set to True the whole residue is removed even if a violation is just detected in the side-chain.
The function returns a view containing all elements (residues, atoms) that have not been removed from the input structure, plus a
StereoChemistryInfo
object containing information about the detected stereo-chemical violations.A violation is defined as a bond length that lies outside of the range: [mean_length-std_dev*bond_tolerance, mean_length+std_dev*bond_tolerance] or an angle width outside of the range [mean_width-std_dev*angle_tolerance, mean_width+std_dev*angle_tolerance ]. The information about the mean lengths and widths and the corresponding standard deviations is passed to the function using two parameters.
Hydrogen and deuterium atoms are ignored by this function.
- Parameters:
entity (
EntityView
orEntityHandle
) – The input entitybond_stats (
StereoChemicalParams
) – statistics about bond lengthsangle_stats (
StereoChemicalParams
) – statistics about angle widthsbond_tolerance (
float
) – tolerance for bond lengths (in standard deviations)angle_tolerance (
float
) – tolerance for angle widths (in standard deviations)always_remove_bb (
bool
) – if set to True, the whole residue is removed even if the clash happens in the side-chain
- Returns:
A tuple of two elements: The filtered
EntityView
, and aStereoChemistryInfo
object
- class ClashingInfo¶
This object is returned by the
FilterClashes()
function, and contains information about the clashes detected by the function.- GetClashCount()¶
- Returns:
number of clashes between non-bonded atoms detected in the input structure
- GetAverageOffset()¶
- Returns:
a value in Angstroms representing the average offset by which clashing atoms lie closer than the minimum acceptable distance (which of course differs for each possible pair of elements)
- GetClashList()¶
- Returns:
list of detected inter-atomic clashes
- Return type:
list
ofClashEvent
- class ClashEvent¶
This object contains all the information relative to a single clash detected by the
FilterClashes()
function- GetModelDistance()¶
- Returns:
distance (in Angstroms) between the two clashing atoms as observed in the model
- GetAdjustedReferenceDistance()¶
- Returns:
minimum acceptable distance (in Angstroms) between the two atoms involved in the clash, as defined in
ClashingDistances
- class StereoChemistryInfo¶
This object is returned by the
CheckStereoChemistry()
function, and contains information about bond lengths and planar angle widths in the structure that diverge from the parameters tabulated by Engh and Huber in the International Tables of Crystallography. Only elements that diverge from the tabulated value by a minimumnumber of standard deviations (defined when the CheckStereoChemistry function is called) are reported.- GetBadBondCount()¶
- Returns:
number of bonds where a serious violation was detected
- GetBondCount()¶
- Returns:
total number of bonds in the structure checked by the CheckStereoChemistry function
- GetAvgZscoreBonds()¶
- Returns:
average z-score of all the bond lengths in the structure, computed using Engh and Huber’s mean and standard deviation values
- GetBadAngleCount()¶
- Returns:
number of planar angles where a serious violation was detected
- GetAngleCount()¶
- Returns:
total number of planar angles in the structure checked by the CheckStereoChemistry function
- GetAvgZscoreAngles()¶
- Returns:
average z-score of all the planar angle widths, computed using Engh and Huber’s mean and standard deviation values.
- GetBondViolationList()¶
- Returns:
list of bond length violations detected in the structure
- Return type:
list
ofStereoChemicalBondViolation
- GetAngleViolationList()¶
- Returns:
list of angle width violations detected in the structure
- Return type:
list
ofStereoChemicalAngleViolation
- class StereoChemicalBondViolation¶
This object contains all the information relative to a single detected violation of stereo-chemical parameters in a bond length
- GetFirstAtom()¶
- GetSecondAtom()¶
- Returns:
first / second atom of the bond
- Return type:
UniqueAtomIdentifier
- GetBondLength()¶
- Returns:
length of the bond (in Angstroms) as observed in the model
- GetAllowedRange()¶
- Returns:
allowed range of bond lengths (in Angstroms), according to Engh and Huber’s tabulated parameters and the tolerance threshold used when the
CheckStereoChemistry()
function was called- Return type:
tuple
(minimum and maximum)
- class StereoChemicalAngleViolation¶
This object contains all the information relative to a single detected violation of stereo-chemical parameters in a planar angle width
- GetFirstAtom()¶
- GetSecondAtom()¶
- GetThirdAtom()¶
- Returns:
first / second (vertex) / third atom that defines the planar angle
- Return type:
- GetAngleWidth()¶
- Returns:
width of the planar angle (in degrees) as observed in the model
- GetAllowedRange()¶
- Returns:
allowed range of angle widths (in degrees), according to Engh and Huber’s tabulated parameters and the tolerance threshold used when the
CheckStereoChemistry()
function was called- Return type:
tuple
(minimum and maximum)
- class ClashingDistances¶
Object containing information about clashing distances between non-bonded atoms
- ClashingDistances()¶
Creates an empty distance list
- SetClashingDistance(ele1, ele2, clash_distance, tolerance)¶
Adds or replaces an entry in the list
- Parameters:
ele1 – string containing the first element’s name
ele2 – string containing the second element’s name
clash_distance – minimum clashing distance (in Angstroms)
tolerance – tolerance threshold (in Angstroms)
- GetClashingDistance(ele1, ele2)¶
- Returns:
reference distance and a tolerance threshold (both in Angstroms) for two elements
- Return type:
tuple
(minimum clashing distance, tolerance threshold)- Parameters:
ele1 – string containing the first element’s name
ele2 – string containing the second element’s name
- GetAdjustedClashingDistance(ele1, ele2)¶
- Returns:
reference distance (in Angstroms) for two elements, already adjusted by the tolerance threshold
- Parameters:
ele1 – string containing the first element’s name
ele2 – string containing the second element’s name
- GetMaxAdjustedDistance()¶
- Returns:
longest clashing distance (in Angstroms) in the list, after adjustment with tolerance threshold
- IsEmpty()¶
- Returns:
True if the list is empty (i.e. in an invalid, useless state)
- PrintAllDistances()¶
Prints all distances in the list to standard output
- class StereoChemicalParams¶
Object containing stereo-chemical information about bonds and angles. For each item (bond or angle in a specific residue), stores the mean and standard deviation
- StereoChemicalParams()¶
Creates an empty parameter list
- SetParam(item, residue, mean, standard_dev)¶
Adds or replaces an entry in the list
- Parameters:
item – string defining a bond (format: X-Y) or an angle (format: X-Y-Z), where X,Y an Z are atom names
residue – string containing the residue type for this entry
mean – mean bond length (in Angstroms) or angle width (in degrees)
standard_dev – standard deviation of the bond length (in Angstroms) or of the angle width (in degrees)
- IsEmpty()¶
- Returns:
True if the list is empty (i.e. in an invalid, useless state)
- PrintAllParameters()¶
Prints all entries in the list to standard output
- FillClashingDistances(file_content)¶
- FillBondStereoChemicalParams(file_content)¶
- FillAngleStereoChemicalParams(file_content)¶
These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from the content of a parameter file.
- Parameters:
file_content (
list
ofstr
) – list of lines from the parameter file- Return type:
- FillClashingDistancesFromFile(filename)¶
- FillBondStereoChemicalParamsFromFile(filename)¶
- FillAngleStereoChemicalParamsFromFile(filename)¶
These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, starting from a file path.
- Parameters:
filename (
str
) – path to parameter file- Return type:
- DefaultClashingDistances()¶
- DefaultBondStereoChemicalParams()¶
- DefaultAngleStereoChemicalParams()¶
These three functions fill a list of reference clashing distances, a list of stereo-chemical parameters for bonds and a list of stereo-chemical parameters for angles, respectively, using the default parameter files distributed with OpenStructure.
- Return type:
- ResidueNamesMatch(probe, reference)¶
The function requires a reference structure and a probe structure. The function checks that all the residues in the reference structure that appear in the probe structure (i.e., that have the same ResNum) are of the same residue type. Chains are compared by order, not by chain name (i.e.: the first chain of the reference will be compared with the first chain of the probe structure, etc.)
- Parameters:
probe (
EntityView
) – the structure to testreference (
EntityView
) – the reference structure
- Returns:
True if the residue names are the same, False otherwise
Superposing structures¶
- Superpose(ent_a, ent_b, match='number', atoms='all', iterative=False, max_iterations=5, distance_threshold=3.0)¶
Superposes the model entity onto the reference. To do so, two views are created, returned with the result. atoms describes what goes into these views and match the selection method. For superposition,
SuperposeSVD()
orIterativeSuperposeSVD()
are called (depending on iterative). For matching, the following methods are recognised:number
- select residues by residue number, includes atoms, callsMatchResidueByNum()
index
- select residues by index in chain, includes atoms, callsMatchResidueByIdx()
local-aln
- select residues from a Smith/Waterman alignment, includes atoms, callsMatchResidueByLocalAln()
global-aln
- select residues from a Needleman/Wunsch alignment, includes atoms, callsMatchResidueByGlobalAln()
- Parameters:
ent_a (
EntityView
orEntityHandle
) – The model entity (superposition transform is applied on full entity handle here)ent_b (
EntityView
orEntityHandle
) – The reference entitymatch (
str
) – Method to gather residues/ atomsatoms (
str
,list
,set
) – The subset of atoms to be used in the superpositioniterative (
bool
) – Whether or not to use iterative superpositon.max_iterations (
int
) – Max. number of iterations forIterativeSuperposeSVD()
(only if iterative = True)distance_threshold (
float
) – Distance threshold forIterativeSuperposeSVD()
(only if iterative = True)
- Returns:
An instance of
SuperpositionResult
.
- ParseAtomNames(atoms)¶
Parses different representations of a list of atom names and returns a
set
, understandable byMatchResidueByNum()
. In essence, this function translatesNone to
None
‘all’ to
None
‘backbone’ to
set(['N', 'CA', 'C', 'O'])
‘aname1, aname2’ to
set(['aname1', 'aname2'])
['aname1', 'aname2']
toset(['aname1', 'aname2'])
- Parameters:
atoms (
str
,list
,set
) – Identifier or list of atoms- Returns:
A
set
of atoms.
- MatchResidueByNum(ent_a, ent_b, atoms='all')¶
Returns a tuple of views containing exactly the same number of atoms. Residues are matched by residue number. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in the order they occur in the entities. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.
- Parameters:
ent_a (
EntityView
orEntityHandle
) – The first entityent_b (
EntityView
orEntityHandle
) – The second entityatoms (
str
,list
,set
) – The subset of atoms to be included in the two views.
- Returns:
Two
EntityView
instances with the same amount of residues matched by number. Each residue will have the same number & type of atoms.
- MatchResidueByIdx(ent_a, ent_b, atoms='all')¶
Returns a tuple of views containing exactly the same number of atoms. Residues are matched by position in the chains of an entity. A subset of atoms to be included in the views can be specified in the atoms argument. Regardless of what the list of atoms says, only those present in two matched residues will be included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count. The number of residues per chain is supposed to be the same.
- Parameters:
ent_a (
EntityView
orEntityHandle
) – The first entityent_b (
EntityView
orEntityHandle
) – The second entityatoms (
str
,list
,set
) – The subset of atoms to be included in the two views.
- Returns:
Two
EntityView
instances with the same amount of residues matched by position. Each residue will have the same number & type of atoms.
- MatchResidueByLocalAln(ent_a, ent_b, atoms='all')¶
Match residues by local alignment. Takes ent_a and ent_b, extracts the sequences chain-wise and aligns them in Smith/Waterman manner using the BLOSUM62 matrix for scoring. Only residues which are marked as
peptide linking
are considered for alignment. The residues of the entities are then matched based on this alignment. Only atoms present in both residues are included in the views. Chains are processed in order of appearance. If ent_a and ent_b contain a different number of chains, processing stops with the lower count.- Parameters:
ent_a (
EntityView
orEntityHandle
) – The first entityent_b (
EntityView
orEntityHandle
) – The second entityatoms (
str
,list
,set
) – The subset of atoms to be included in the two views.
- Returns:
Two
EntityView
instances with the same number of residues. Each residue will have the same number & type of atoms.
- MatchResidueByGlobalAln(ent_a, ent_b, atoms='all')¶
Match residues by global alignment. Same as
MatchResidueByLocalAln()
but performs a global Needleman/Wunsch alignment of the sequences using the BLOSUM62 matrix for scoring.- Parameters:
ent_a (
EntityView
orEntityHandle
) – The first entityent_b (
EntityView
orEntityHandle
) – The second entityatoms (
str
,list
,set
) – The subset of atoms to be included in the two views.
- Returns:
Two
EntityView
instances with the same number of residues. Each residue will have the same number & type of atoms.
- class SuperpositionResult¶
- rmsd¶
RMSD of the superposed entities.
- view1¶
- view2¶
Two
EntityView
used in superposition (not set if methods withVec3List
used).
- fraction_superposed¶
- rmsd_superposed_atoms¶
- ncycles¶
For iterative superposition (
IterativeSuperposeSVD()
): fraction and RMSD of atoms that were superposed with a distance below the given threshold and the number of iteration cycles performed.
- SuperposeSVD(view1, view2, apply_transform=True)¶
- SuperposeSVD(list1, list2)
Superposition of two sets of atoms minimizing RMSD using a classic SVD based algorithm.
Note that the atom positions in the view are taken blindly in the order in which the atoms appear.
- Parameters:
view1 (
EntityView
) – View on the model entityview2 (
EntityView
) – View on the reference entitylist1 (
Vec3List
) – List of atom positions for model entitylist2 (
Vec3List
) – List of atom positions for reference entityapply_transform (
bool
) – If True, the superposition transform is applied to the (full!) entity handle linked to view1.
- Returns:
An instance of
SuperpositionResult
.
- IterativeSuperposeSVD(view1, view2, max_iterations=5, distance_threshold=3.0, apply_transform=True)¶
- IterativeSuperposeSVD(list1, list2, max_iterations=5, distance_threshold=3.0)
Iterative superposition of two sets of atoms. In each iteration cycle, we keep a fraction of atoms with distances below distance_threshold and get the superposition considering only those atoms.
Note that the atom positions in the view are taken blindly in the order in which the atoms appear.
- Parameters:
view1 (
EntityView
) – View on the model entityview2 (
EntityView
) – View on the reference entitylist1 (
Vec3List
) – List of atom positions for model entitylist2 (
Vec3List
) – List of atom positions for reference entitymax_iterations (
int
) – Max. number of iterations to be performeddistance_threshold (
float
) – Distance threshold defining superposed atomsapply_transform (
bool
) – If True, the superposition transform is applied to the (full!) entity handle linked to view1.
- Returns:
An instance of
SuperpositionResult
.- Raises:
Exception if atom counts do not match or if less than 3 atoms.
- CalculateRMSD(view1, view2, transformation=geom.Mat4())¶
- Returns:
RMSD of atom positions (taken blindly in the order in which the atoms appear) in the two given views.
- Return type:
float
- Parameters:
view1 (
EntityView
) – View on the model entityview2 (
EntityView
) – View on the reference entitytransformation (
Mat4
) – Optional transformation to apply on each atom position of view1.
Algorithms on Structures¶
- Accessibility(ent, probe_radius=1.4, include_hydrogens=False, include_hetatm=False, include_water=False, oligo_mode=False, selection='', asa_abs='asaAbs', asa_rel='asaRel', asa_atom='asaAtom', algorithm=NACCESS)¶
Calculates the accesssible surface area for ever atom in ent. The algorithm mimics the behaviour of the bindings available for the NACCESS and DSSP tools and has been tested to reproduce the numbers accordingly.
- Parameters:
ent (
EntityView
/EntityHandle
) – Entity on which to calculate the surfaceprobe_radius (
float
) – Radius of probe to determine accessible surface areainclude_hydrogens (
bool
) – Whether to include hydrogens in the solvent accessibility calculations. By default, every atom with ele=H,D is simply neglected.include_hetatms (
bool
) – Whether to include atoms flagged as hetatms , i.e. ligands, in the solvent accessibility calculations. They are neglected by default.include_water (
bool
) – Whether to include water in the solvent accessibility calculations. By default, every residue with name “HOH” is neglected.oligo_mode (
bool
) – A typical used case of accessibility calculations is to determine the solvent accessibility of a full complex and then the accessibility of each chain individually. Lots of calculations can be cached because only the values of the atoms close to an interface change. This is exactly what happens when you activate the oligo mode. It returns exactly the same value but adds, additionally to the values estimated in full complex, the values from each individual chain as float properties on every residue and atom. Example for atom accessible surface if the according property name is set to “asaAtom”: Accessibility in the full complex is stored as “asaAtom”, the accessibility when only considering that particular chain is stored as “asaAtom_single_chain”. The other properties follow the same naming scheme.selection (
str
) – Selection statement, that gets applied on ent before doing anything. Everything that is not selected is neglected. The default value of “” results in no selection at all.asa_abs (
str
) – Float property name to assign the summed solvent accessible surface from each atom to a residue.asa_rel (
str
) –Float property name to assign the relative solvent accessibility to a residue. This is the absolute accessibility divided by the maximum solvent accessibility of that particular residue. This maximum solvent accessibility is dependent on the chosen
AccessibilityAlgorithm
. Only residues of the 20 standarad amino acids can be handled.In case of the NACCESS algorithm you can expect a value in range [0.0, 100.0] and a value of -99.9 for non standard residues.
In case of the DSSP algorithm you can expect a value in range [0.0, 1.0], no float property is assigned in case of a non standard residue.
asa_atom (
str
) – Float property name to assign the solvent accessible area to each atom.algorithm (
AccessibilityAlgorithm
) – Specifies the used algorithm for solvent accessibility calculations
- Returns:
The summed solvent accessibilty of each atom in ent.
- class AccessibilityAlgorithm¶
The accessibility algorithm enum specifies the algorithm used by the respective tools. Available are:
NACCESS, DSSP
- AssignSecStruct(ent)¶
Assigns secondary structures to all residues based on hydrogen bond patterns as described by DSSP.
- Parameters:
ent (
EntityView
/EntityHandle
) – Entity on which to assign secondary structures
- class FindMemParam¶
Result object for the membrane detection algorithm described below
- axis¶
initial search axis from which optimal membrane slab could be found
- tilt_axis¶
Axis around which we tilt the membrane starting from the initial axis
- tilt¶
Angle to tilt around tilt axis
- angle¶
After the tilt operation we perform a rotation around the initial axis with this angle to get the final membrane axis
- membrane_axis¶
The result of applying the tilt and rotation procedure described above. The membrane_axis is orthogonal to the membrane plane and has unit length.
- pos¶
Real number that describes the membrane center point. To get the actual position you can do: pos * membrane_axis
- width¶
Total width of the membrane in A
- energy¶
Pseudo energy of the implicit solvation model
- membrane_asa¶
Membrane accessible surface area
- membrane_representation¶
Dummy atoms that represent the membrane. This entity is only valid if the according flag has been set to True when calling FindMembrane.
- FindMembrane(ent, assign_membrane_representation=True, fast=False)¶
Estimates the optimal membrane position of a protein by using an implicit solvation model. The original algorithm and the used energy function are described in: Lomize AL, Pogozheva ID, Lomize MA, Mosberg HI (2006) Positioning of proteins in membranes: A computational approach.
There are some modifications in this implementation and the procedure is as follows:
Initial axis are constructed that build the starting point for initial parameter grid searches.
For every axis, the protein is rotated so that the axis builds the z-axis
In order to exclude internal hydrophilic pores, only the outermost atoms with respect the the z-axis enter an initial grid search
The width and position of the membrane is optimized for different combinations of tilt and rotation angles (further described in
FindMemParam
). The top 20 parametrizations (only top parametrization if fast is True) are stored for further processing.
The 20 best membrane parametrizations from the initial grid search (only the best if fast is set to True) enter a final minimization step using a Levenberg-Marquardt minimizer.
- Parameters:
ent (
ost.mol.EntityHandle
/ost.mol.EntityView
) –Entity of a transmembrane protein, you’ll get weird results if this is not the case. The energy term of the result is typically a good indicator whether ent is an actual transmembrane protein. The following float properties will be set on the atoms:
’asaAtom’ on all atoms that are selected with ent.Select(‘peptide=true and ele!=H’) as a result of envoking
Accessibility()
.’membrane_e’ the contribution of the potentially membrane facing atoms to the energy function.
assign_membrane_representation (
bool
) – Whether to construct a membrane representation using dummy atomsfast – If set to false, the 20 best results of the initial grid search undergo a Levenberg-Marquardt minimization and the parametrization with optimal minimized energy is returned. If set to yes, only the best result of the initial grid search is selected and returned after Levenberg-Marquardt minimization.
- Returns:
The results object
- Return type:
Trajectory Analysis¶
This is a set of functions used for basic trajectory analysis such as extracting
positions, distances, angles and RMSDs. The organization is such that most
functions have their counterpart at the individual frame level
so that they can also be called on one frame instead of
the whole trajectory.
All these functions have a “stride” argument that defaults to stride=1, which is used to skip frames in the analysis.
- SuperposeFrames(frames, sel, from=0, to=-1, ref=-1)¶
This function superposes the frames of the given coord group and returns them as a new coord group.
- Parameters:
frames (
CoordGroupHandle
) – The source coord group.sel (
ost.mol.EntityView
) – An entity view containing the selection of atoms to be used for superposition. If set to an invalid view, all atoms in the coord group are used.from – index of the first frame
to – index of the last frame plus one. If set to -1, the value is set to the number of frames in the coord group
ref – The index of the reference frame to use for superposition. If set to -1, the each frame is superposed to the previous frame.
- Returns:
A newly created coord group containing the superposed frames.
- SuperposeFrames(frames, sel, ref_view, from=0, to=-1)
Same as SuperposeFrames above, but the superposition is done on a reference view and not on another frame of the trajectory.
- Parameters:
frames (
CoordGroupHandle
) – The source coord group.sel (
ost.mol.EntityView
) – An entity view containing the selection of atoms of the frames to be used for superposition.ref_view (
ost.mol.EntityView
) – The reference view on which the frames will be superposed. The number of atoms in this reference view should be equal to the number of atoms in sel.from – index of the first frame
to – index of the last frame plus one. If set to -1, the value is set to the number of frames in the coord group
- Returns:
A newly created coord group containing the superposed frames.
- AnalyzeAtomPos(traj, atom1, stride=1)¶
This function extracts the position of an atom from a trajectory. It returns a vector containing the position of the atom for each analyzed frame.
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.atom1 – The
AtomHandle
.stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeCenterOfMassPos(traj, sele, stride=1)¶
This function extracts the position of the center-of-mass of a selection (
EntityView
) from a trajectory and returns it as a vector.- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.sele (
EntityView
.) – The selection from which the center of mass is computedstride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeDistanceBetwAtoms(traj, atom1, atom2, stride=1)¶
This function extracts the distance between two atoms from a trajectory and returns it as a vector.
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.atom1 – The first
AtomHandle
.atom2 – The second
AtomHandle
.stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeAngle(traj, atom1, atom2, atom3, stride=1)¶
This function extracts the angle between three atoms from a trajectory and returns it as a vector. The second atom is taken as being the central atom, so that the angle is between the vectors (atom1.pos-atom2.pos) and (atom3.pos-atom2.pos).
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.atom1 – The first
AtomHandle
.atom2 – The second
AtomHandle
.atom3 – The third
AtomHandle
.stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeDihedralAngle(traj, atom1, atom2, atom3, atom4, stride=1)¶
This function extracts the dihedral angle between four atoms from a trajectory and returns it as a vector. The angle is between the planes containing the first three and the last three atoms.
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.atom1 – The first
AtomHandle
.atom2 – The second
AtomHandle
.atom3 – The third
AtomHandle
.atom4 – The fourth
AtomHandle
.stride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeDistanceBetwCenterOfMass(traj, sele1, sele2, stride=1)¶
This function extracts the distance between the center-of-mass of two selections (
EntityView
) from a trajectory and returns it as a vector.- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.sele1 (
EntityView
.) – The selection from which the first center of mass is computedsele2 (
EntityView
.) – The selection from which the second center of mass is computedstride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeRMSD(traj, reference_view, sele_view, stride=1)¶
This function extracts the rmsd between two
EntityView
and returns it as a vector. The views don’t have to be from the same entity. The reference positions are taken directly from the reference_view, evaluated only once. The positions from the sele_view are evaluated for each frame. If you want to compare to frame i of the trajectory t, first use t.CopyFrame(i) for example:eh = io.LoadPDB(...) t = io.LoadCHARMMTraj(eh, ...) sele = eh.Select(...) t.CopyFrame(0) mol.alg.AnalyzeRMSD(t, sele, sele)
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.reference_view (
EntityView
.) – The selection used as reference structuresele_view (
EntityView
.) – The selection compared to the reference_viewstride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeMinDistance(traj, view1, view2, stride=1)¶
This function extracts the minimal distance between two sets of atoms (view1 and view2) for each frame in a trajectory and returns it as a vector.
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.view1 (
EntityView
.) – The first group of atomsview2 (
EntityView
.) – The second group of atomsstride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeMinDistanceBetwCenterOfMassAndView(traj, view_cm, view_atoms, stride=1)¶
This function extracts the minimal distance between a set of atoms (view_atoms) and the center of mass of a second set of atoms (view_cm) for each frame in a trajectory and returns it as a vector.
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.view_cm (
EntityView
.) – The group of atoms from which the center of mass is takenview_atoms (
EntityView
.) – The second group of atomsstride – Size of the increment of the frame’s index between two consecutive frames analyzed.
- AnalyzeAromaticRingInteraction(traj, view_ring1, view_ring2, stride=1)¶
This function is a crude analysis of aromatic ring interactions. For each frame in a trajectory, it calculates the minimal distance between the atoms in one view and the center of mass of the other and vice versa, and returns the minimum between these two minimal distances. Concretely, if the two views are the heavy atoms of two rings, then it returns the minimal center of mass - heavy atom distance betweent he two rings
- Parameters:
traj (
CoordGroupHandle
) – The trajectory to be analyzed.view_ring1 (
EntityView
.) – First group of atomsview_ring2 (
EntityView
.) – Second group of atomsstride – Size of the increment of the frame’s index between two consecutive frames analyzed.
helix_kinks
– Algorithms to calculate Helix Kinks¶
Functions to calculate helix kinks: bend, face shift and wobble angles
Author: Niklaus Johner
- AnalyzeHelixKink(t, sele, proline=False)¶
This function calculates the bend, wobble and face-shift angles in an alpha- helix over a trajectory. The determination is more stable if there are at least 4 residues on each side (8 is even better) of the proline around which the helix is kinked. The selection should contain all residues in the correct order and with no gaps and no missing C-alphas.
- Parameters:
t (
CoordGroup
) – The trajectory to be analyzedsele (
EntityView
) – A selection containing the alpha helix to be analyzedproline (
ost.mol.EntityView
) – A selection containing only the proline (or another residue) around which the helix is kinked. If False, the proline will be searched for automatically
- Returns:
A tuple (bend_angle, face_shift, wobble_angle).
- Return type:
(FloatList, FLoatList, FloatList)
- CalculateHelixKink(sele, proline=False)¶
This function calculates the bend, wobble and face-shift angles in an alpha- helix of an EntityView. The determination is more stable if there are at least 4 residues on each side (8 is even better) of the proline around which the helix is kinked. The selection should contain all residues in the correct order and with no gaps and no missing C-alphas.
- Parameters:
sele (
EntityView
) – A selection containing the alpha helix to be analyzedproline (
ost.mol.EntityView
) – A selection containing only the proline (or another residue) around which the helix is kinked. If False, the proline will be searched for automatically
- Returns:
A tuple (bend_angle, face_shift, wobble_angle).
- Return type:
(float, float, float)
trajectory_analysis
– DRMSD, pairwise distances and more¶
This Module requires numpy
This module contains functions to analyze trajectories, mainly similiraty measures baed on RMSDS and pairwise distances.
Author: Niklaus Johner (niklaus.johner@unibas.ch)
- AverageDistanceMatrixFromTraj(t, sele, first=0, last=-1)¶
This function calcultes the distance between each pair of atoms in sele, averaged over the trajectory t.
- Parameters:
t (
CoordGroupHandle
) – the trajectorysele (
EntityView
) – the selection used to determine the atom pairsfirst (
int
) – the first frame of t to be usedlast (
int
) – the last frame of t to be used
- Returns:
a numpy NpairsxNpairs matrix, where Npairs is the number of atom pairs in sele.
- DistRMSDFromTraj(t, sele, ref_sele, radius=7.0, average=False, seq_sep=4, first=0, last=-1)¶
This function calculates the distance RMSD from a trajectory. The distances selected for the calculation are all the distances between pair of atoms from residues that are at least seq_sep apart in the sequence and that are smaller than radius in ref_sel. The number and order of atoms in ref_sele and sele should be the same.
- Parameters:
t (
CoordGroupHandle
) – the trajectorysele (
EntityView
) – the selection used to calculate the distance RMSDref_sele (
EntityView
) – the reference selection used to determine the atom pairs and reference distancesradius (
float
) – the upper limit of distances in ref_sele considered for the calculationseq_sep (
int
) – the minimal sequence separation between atom pairs considered for the calculationaverage (
bool
) – use the average distance in the trajectory as reference instead of the distance obtained from ref_selefirst (
int
) – the first frame of t to be usedlast (
int
) – the last frame of t to be used
- Returns:
a numpy vecor dist_rmsd(Nframes).
- DistanceMatrixFromPairwiseDistances(distances, p=2)¶
This function calculates an distance matrix M(NframesxNframes) from the pairwise distances matrix D(NpairsxNframes), where Nframes is the number of frames in the trajectory and Npairs the number of atom pairs. M[i,j] is the distance between frame i and frame j calculated as a p-norm of the differences in distances from the two frames (distance-RMSD for p=2).
- Parameters:
distances – a pairwise distance matrix as obtained from
PairwiseDistancesFromTraj()
p – exponent used for the p-norm.
- Returns:
a numpy NframesxNframes matrix, where Nframes is the number of frames.
- PairwiseDistancesFromTraj(t, sele, first=0, last=-1, seq_sep=1)¶
This function calculates the distances between any pair of atoms in sele with sequence separation larger than seq_sep from a trajectory t. It return a matrix containing one line for each atom pair and Nframes columns, where Nframes is the number of frames in the trajectory.
- Parameters:
t (
CoordGroupHandle
) – the trajectorysele (
EntityView
) – the selection used to determine the atom pairsfirst (
int
) – the first frame of t to be usedlast (
int
) – the last frame of t to be usedseq_sep (
int
) – The minimal sequence separation between atom pairs
- Returns:
a numpy NpairsxNframes matrix.
- RMSD_Matrix_From_Traj(t, sele, first=0, last=-1, align=True, align_sele=None)¶
This function calculates a matrix M such that M[i,j] is the RMSD (calculated on sele) between frames i and j of the trajectory t aligned on sele.
- Parameters:
t (
CoordGroupHandle
) – the trajectorysele (
EntityView
) – the selection used for alignment and RMSD calculationfirst (
int
) – the first frame of t to be usedlast (
int
) – the last frame of t to be used
- Returns:
Returns a numpy NframesxNframes matrix, where Nframes is the number of frames.
structure_analysis
– Functions to analyze structures¶
Some functions for analyzing structures
Author: Niklaus Johner (Niklaus.Johner@unibas.ch)
- CalculateBestFitLine(sele1)¶
This function calculates the best fit line to the atoms in sele1.
- Parameters:
sele1 (
EntityView
) –- Returns:
- CalculateBestFitPlane(sele1)¶
This function calculates the best fit plane to the atoms in sele1.
- Parameters:
sele1 (
EntityView
) –- Returns:
- CalculateDistanceDifferenceMatrix(sele1, sele2)¶
This function calculates the pairwise distance differences between two selections (
EntityView
). The two selections should have the same number of atoms It returns an NxN DistanceDifferenceMatrix M (where N is the number of atoms in sele1) where M[i,j]=||(sele2.atoms[i].pos-sele2.atoms[j].pos)||-||(sele1.atoms[i].pos-sele1.atoms[j].pos)||- Parameters:
sele1 (
EntityView
) –sele2 (
EntityView
) –
- Returns:
NxN numpy matrix
- CalculateHelixAxis(sele1)¶
This function calculates the best fit cylinder to the CA atoms in sele1, and returns its axis. Residues should be ordered correctly in sele1.
- Parameters:
sele1 (
EntityView
) –- Returns:
- GetAlphaHelixContent(sele1)¶
This function calculates the content of alpha helix in a view. All residues in the view have to ordered and adjacent (no gaps allowed)
- Parameters:
sele1 (
EntityView
) –- Returns:
float
- GetDistanceBetwCenterOfMass(sele1, sele2)¶
This function calculates the distance between the centers of mass of sele1 and sele2, two selections from the same Entity.
- Parameters:
sele1 (
EntityView
) –sele2 (
EntityView
) –
- Returns:
float
- GetFrameFromEntity(eh)¶
This function returns a CoordFrame from an EntityHandle
- Parameters:
eh (
EntityHandle
) –- Returns:
- GetMinDistBetwCenterOfMassAndView(sele1, sele2)¶
This function calculates the minimal distance between sele2 and the center of mass of sele1, two selections from the same Entity.
- Parameters:
sele1 (
EntityView
) – The selection from which the center of mass is takensele2 (
EntityView
) –
- Returns:
distance (
float
)
- GetMinDistanceBetweenViews(sele1, sele2)¶
This function calculates the minimal distance between sele1 and sele2, two selections from the same Entity.
- Parameters:
sele1 (
EntityView
) –sele2 (
EntityView
) –
- Returns:
float
Mapping functions¶
The following functions help to convert one residue into another by reusing as much as possible from the present atoms. They are mainly meant to map from standard amino acid to other standard amino acids or from modified amino acids to standard amino acids.
- CopyResidue(src_res, dst_res, editor)¶
Copies the atoms of
src_res
todst_res
using the residue names as guide to decide which of the atoms should be copied. Ifsrc_res
anddst_res
have the same name, orsrc_res
is a modified version ofdst_res
(i.e. have the same single letter code),CopyConserved()
will be called, otherwiseCopyNonConserved()
.If a CBeta atom wasn’t already copied from
src_res
, a new one at a reconstructed position will be added todst_res
if it is notGLY
and all backbone positions are available to do it.- Parameters:
src_res (
ResidueHandle
) – The source residuedst_res (
ResidueHandle
) – The destination residue (expected to be a standard amino acid)editor (
XCSEditor
) – Editor used to modify dst_res.
- Returns:
True if the residue could be copied as a conserved residue, False if it had to fallback to
CopyNonConserved()
.
- CopyConserved(src_res, dst_res, editor)¶
Copies the atoms of
src_res
todst_res
assuming that the parent amino acid ofsrc_res
(orsrc_res
itself) are identical todst_res
.If
src_res
anddst_res
are identical, all heavy atoms are copied todst_res
. Ifsrc_res
is a modified version ofdst_res
and the modification is a pure addition (e.g. the phosphate group of phosphoserine), the modification is stripped off and all other heavy atoms are copied todst_res
. If the modification is not a pure addition, it falls back toCopyNonConserved()
.Additionally, the selenium atom of
MSE
is converted to sulphur to mapMSE
toMET
.- Parameters:
src_res (
ResidueHandle
) – The source residuedst_res (
ResidueHandle
) – The destination residue (expected to be a standard amino acid)editor (
XCSEditor
) – Editor used to modify dst_res.
- Returns:
A tuple of bools stating whether the residue could be copied without falling back to
CopyNonConserved()
and whether the CBeta atom was copied fromsrc_res
todst_res
.
- CopyNonConserved(src_res, dst_res, editor)¶
Copies the heavy backbone atoms and CBeta (except for
GLY
) ofsrc_res
todst_res
.- Parameters:
src_res (
ResidueHandle
) – The source residuedst_res (
ResidueHandle
) – The destination residue (expected to be a standard amino acid)editor (
XCSEditor
) – Editor used to modify dst_res.
- Returns:
A tuple of bools as in
CopyConserved()
with the first bool always being False.
Molecular Checker (Molck)¶
Programmatic usage¶
Molecular Checker (Molck) could be called directly from the code using Molck function:
#! /bin/env python
"""Run Molck with Python API.
This is an exemplary procedure on how to run Molck using Python API which is
equivalent to the command line:
molck <PDB PATH> --rm=hyd,oxt,nonstd,unk \
--fix-ele --out=<OUTPUT PATH> \
--complib=<PATH TO compounds.chemlib>
"""
from ost.io import LoadPDB, SavePDB
from ost.mol.alg import MolckSettings, Molck
from ost.conop import CompoundLib
pdbid = "<PDB PATH>"
lib = CompoundLib.Load("<PATH TO compounds.chemlib>")
# Using Molck function
ent = LoadPDB(pdbid)
ms = MolckSettings(rm_unk_atoms=True,
rm_non_std=True,
rm_hyd_atoms=True,
rm_oxt_atoms=True,
rm_zero_occ_atoms=False,
colored=False,
map_nonstd_res=False,
assign_elem=True)
Molck(ent, lib, ms)
SavePDB(ent, "<OUTPUT PATH>")
It can also be split into subsequent commands for greater controll:
#! /bin/env python
"""Run Molck with Python API.
This is an exemplary procedure on how to run Molck using Python API which is
equivalent to the command line:
molck <PDB PATH> --rm=hyd,oxt,nonstd,unk \
--fix-ele --out=<OUTPUT PATH> \
--complib=<PATH TO compounds.chemlib>
"""
from ost.io import LoadPDB, SavePDB
from ost.mol.alg import (RemoveAtoms, MapNonStandardResidues,
CleanUpElementColumn)
from ost.conop import CompoundLib
pdbid = "<PDB PATH>"
lib = CompoundLib.Load("<PATH TO compounds.chemlib>")
map_nonstd = False
# Using function chain
ent = LoadPDB(pdbid)
if map_nonstd:
MapNonStandardResidues(lib=lib, ent=ent)
RemoveAtoms(lib=lib,
ent=ent,
rm_unk_atoms=True,
rm_non_std=True,
rm_hyd_atoms=True,
rm_oxt_atoms=True,
rm_zero_occ_atoms=False,
colored=False)
CleanUpElementColumn(lib=lib, ent=ent)
SavePDB(ent, "<OUTPUT PATH>")
API¶
- class MolckSettings(rm_unk_atoms=False, rm_non_std=False, rm_hyd_atoms=True, rm_oxt_atoms=False, rm_zero_occ_atoms=False, colored=False, map_nonstd_res=True, assign_elem=True)¶
Stores settings used for Molecular Checker.
- Parameters:
rm_unk_atoms – Sets
rm_unk_atoms
.rm_non_std – Sets
rm_non_std
.rm_hyd_atoms – Sets
rm_hyd_atoms
.rm_oxt_atoms – Sets
rm_oxt_atoms
.rm_zero_occ_atoms – Sets
rm_zero_occ_atoms
.colored – Sets
colored
.map_nonstd_res – Sets
map_nonstd_res
.assign_elem – Sets
assign_elem
.
- rm_unk_atoms¶
Remove unknown and atoms not following the nomenclature.
- Type:
bool
- rm_non_std¶
Remove all residues not one of the 20 standard amino acids
- Type:
bool
- rm_hyd_atoms¶
Remove hydrogen atoms
- Type:
bool
- rm_oxt_atoms¶
Remove terminal oxygens
- Type:
bool
- rm_zero_occ_atoms¶
Remove atoms with zero occupancy
- Type:
bool
- colored¶
Whether output should be colored
- Type:
bool
- map_nonstd_res¶
Maps modified residues back to the parent amino acid, for example MSE -> MET, SEP -> SER
- Type:
bool
- assign_elem¶
Clean up element column
- Type:
bool
- ToString()¶
- Returns:
String representation of the MolckSettings.
- Return type:
str
Warning
The API here is set such that the functions modify the passed structure ent in-place. If this is not ok, please work on a copy of the structure.
- Molck(ent, lib, settings[, prune=True])¶
Runs Molck on provided entity. Reprocesses ent with
ost.conop.HeuristicProcessor
and given lib once done.- Parameters:
ent (
EntityHandle
) – Structure to checklib (
CompoundLib
) – Compound librarysettings (
MolckSettings
) – Molck settingsprune (
bool
) – Whether to remove residues/chains that don’t contain atoms anymore after Molck cleanup
- MapNonStandardResidues(ent, lib, reprocess=True)¶
Maps modified residues back to the parent amino acid, for example MSE -> MET.
- Parameters:
ent (
EntityHandle
) – Structure to checklib (
CompoundLib
) – Compound libraryreprocess – The function generates a deep copy of ent. Highly recommended to enable reprocess that runs
ost.conop.HeuristicProcessor
with given lib. If set to False, you’ll have no connectivity etc. after calling this function.
- RemoveAtoms(ent, lib, rm_unk_atoms=False, rm_non_std=False, rm_hyd_atoms=True, rm_oxt_atoms=False, rm_zero_occ_atoms=False, colored=False,
- reprocess=True)
Removes atoms and residues according to some criteria.
- Parameters:
ent (
EntityHandle
) – Structure to checklib (
CompoundLib
) – Compound libraryrm_unk_atoms – See
MolckSettings.rm_unk_atoms
rm_non_std – See
MolckSettings.rm_non_std
rm_hyd_atoms – See
MolckSettings.rm_hyd_atoms
rm_oxt_atoms – See
MolckSettings.rm_oxt_atoms
rm_zero_occ_atoms – See
MolckSettings.rm_zero_occ_atoms
colored – See
MolckSettings.colored
reprocess – Removing atoms may impact certain annotations on the structure (chem class etc.) which are set by
ost.conop.Processor
. If set to True, aost.conop.HeuristicProcessor
with given lib reprocesses ent.
- CleanUpElementColumn(ent, lib)¶
Clean up element column.
- Parameters:
ent (
EntityHandle
) – Structure to checklib (
CompoundLib
) – Compound library
Biounits¶
Biological assemblies, i.e. biounits, are an integral part of mmCIF files and
their construction is fully defined in ost.io.MMCifInfoBioUnit
.
ost.io.MMCifInfoBioUnit.PDBize()
provides one possibility to construct
such biounits with compatibility with the PDB format in mind. That is single
character chain names, dumping all ligands in one chain etc. Here we provide a
more mmCIF-style way of constructing biounits. This can either be done starting
from a ost.io.MMCifInfoBioUnit
or the derived
ost.mol.alg.BUInfo
. The latter is a minimalistic representation of
ost.io.MMCifInfoBioUnit
and can be serialized to a byte string.
- BUInfo(mmcif_buinfo):
Preprocesses data from
ost.io.MMCifInfoBioUnit
that are required to construct a biounit from an assymetric unit. Can be serialized.- Parameters:
mmcif_buinfo (
ost.io.MMCifInfoBioUnit
) – Biounit definition
- ToBytes()¶
- Returns:
A byte string from which the object can be reconstructed.
- CreateBU(asu, bu_info)¶
Constructs a biounit given an assymetric unit and transformation information. The following properties are copied from the assymetric unit and are expected to be set (this is the case if you use
ost.io.LoadMMCIF()
for the assymetric unit):Chain level:
Chain type (see
ost.mol.ChainHandle.type
)
Residue level:
Chem type (see
ost.mol.ResidueHandle.chem_type
)Chem class (
ost.mol.ResidueHandle.chem_class
)One letter code (see
ost.mol.ResidueHandle.one_letter_code
)Secondary structure (see
ost.mol.ResidueHandle.sec_structure
)IsProtein and IsLigand properties (see
ost.mol.ResidueHandle.is_protein
/ost.mol.ResidueHandle.is_ligand
)
Each chain in the returned biounit can be referenced back to the assymetric unit as they follow a standardised naming scheme: <idx>.<asu_cname>, where asu_cname is the chain name in the assymetric unit and idx is the nth occurence of that chain in the biounit with one based indexing. There is a quirk though. An index of 1, for example 1.A, is reserved for the original AU chain with identity transform (read: no transform) applied. If a certain AU chain only occurs with an actual transform applied, numbering starts at 2.
Warning
There is the (rare) possibility that a AU chain that has only identity transform applied is not named 1.<au_cname>. As of january 2024, there are 3 pdb entries (8qn6, 8x1h, 2c0x) where the same AU chain with identity transform occurs several times in the same biounit. This is likely an error in the respective mmCIF files as the resulting chains sit on top of each other. OST just names the FIRST occurence as 1.<au_cname>.
- Parameters:
asu (
ost.mol.EntityHandle
) – The assymetric unitbu_info (
MMCifInfoBioUnit
/BUInfo
) – Info object
- Returns:
A
ost.mol.EntityHandle
of the requested biounit