Local Distance Difference Test (LDDT)
Note
This is a new implementation of LDDT, introduced in OpenStructure 2.4 with focus on supporting quaternary structure and compounds beyond the 20 standard proteinogenic amino acids. The previous LDDT code that comes with Mariani et al. is considered deprecated.
Note
lddt.lDDTScorer provides the raw Python API to compute LDDT, but stereochemistry checks as described in Mariani et al. must be done separately. You may want to check out the compare-structures action (Comparing two structures) to compute LDDT with pre-processing and support for quaternary structures.
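A minimal usage sketch of the raw API. It assumes an OpenStructure installation in which the scorer is importable as ost.mol.alg.lddt.lDDTScorer, structures are loaded with ost.io.LoadPDB, and a default compound library is configured. The helper name score_model and its path arguments are hypothetical, and both structures are assumed to contain a single chain so that no chain_mapping is needed.

```python
def score_model(target_path, model_path):
    """Return (global LDDT, per-residue LDDT list) for a single-chain model.

    Hypothetical helper. Imports are done lazily so this snippet can be
    loaded without OpenStructure being installed.
    """
    from ost import io
    from ost.mol.alg.lddt import lDDTScorer

    target = io.LoadPDB(target_path)
    model = io.LoadPDB(model_path)

    # No stereochemistry checks are performed here; they would have to be
    # run separately before scoring.
    scorer = lDDTScorer(target)
    global_score, per_residue = scorer.lDDT(model)
    return global_score, per_residue
```

The scorer is built once per target and can then be reused for several models of that target.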
- class lDDTScorer(target, compound_lib=None, custom_compounds=None, inclusion_radius=15, sequence_separation=0, symmetry_settings=None, seqres_mapping={}, bb_only=False)
LDDT scorer object for a specific target
Sets up everything to score models of that target. LDDT (local distance difference test) is defined as the fraction of pairwise distances that exhibit a difference < threshold when comparing target and model. In case of multiple thresholds, the average is returned. See
V. Mariani, M. Biasini, A. Barbato, T. Schwede, lDDT : A local superposition-free score for comparing protein structures and models using distance difference tests, Bioinformatics, 2013
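The definition above can be illustrated with a toy sketch in plain Python. It is not the OpenStructure implementation: the function name lddt_sketch and the dictionary-based structure representation are assumptions made for illustration, and symmetries, chains and sequence separation are ignored.

```python
import math
from itertools import combinations


def lddt_sketch(target, model, thresholds=(0.5, 1.0, 2.0, 4.0),
                inclusion_radius=15.0):
    """Toy LDDT on dicts mapping (residue_number, atom_name) -> (x, y, z)."""
    conserved = 0
    total = 0
    for a, b in combinations(sorted(target), 2):
        if a[0] == b[0]:
            continue  # intra-residue distances are not scored
        d_target = math.dist(target[a], target[b])
        if d_target >= inclusion_radius:
            continue  # only distances < inclusion_radius are considered
        for t in thresholds:
            total += 1
            # Atom pairs that are missing in the model count as not conserved.
            if a in model and b in model:
                d_model = math.dist(model[a], model[b])
                if abs(d_target - d_model) < t:
                    conserved += 1
    # Every threshold sees the same number of target distances, so this ratio
    # equals the average of the per-threshold fractions.
    return conserved / total if total else None


target = {(1, "CA"): (0.0, 0.0, 0.0),
          (2, "CA"): (3.8, 0.0, 0.0),
          (3, "CA"): (7.6, 0.0, 0.0)}
model = dict(target)
model[(3, "CA")] = (8.9, 0.0, 0.0)  # third residue displaced by 1.3 A
score = lddt_sketch(target, model)
```

In this example the two distances involving the displaced atom are only conserved at the 2.0 and 4.0 thresholds, so 8 of 12 distance tests pass.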
- Parameters:
target (ost.mol.EntityHandle/ost.mol.EntityView) – The target
compound_lib (ost.conop.CompoundLib) – Compound library from which a compound for each residue is extracted based on its name. Uses ost.conop.GetDefaultLib() if not given and raises if that returns no valid compound library. Atoms defined in the compound are searched in the residue and build the reference for scoring. If the residue has atoms with names [“A”, “B”, “C”] but the corresponding compound only has [“A”, “B”], “A” and “B” are considered for scoring. If the residue has atoms [“A”, “B”] but the compound has [“A”, “B”, “C”], “C” is considered missing and does not influence scoring, even if present in the model.
custom_compounds (dict with residue names (str) as key and CustomCompound as value) – Custom compounds defining reference atoms. If given, custom_compounds take precedence over compound_lib.
inclusion_radius (float) – All pairwise distances < inclusion_radius are considered for scoring
sequence_separation (int) – Only pairwise distances between atoms of residues which are further apart than this threshold are considered. Residue distance is based on resnum. The default (0) considers all pairwise distances except intra-residue distances.
symmetry_settings (SymmetrySettings) – Defines residues exhibiting internal symmetry. Uses GetDefaultSymmetrySettings() if not given.
seqres_mapping (dict (key: str, value: ost.seq.AlignmentHandle)) – Mapping of model residues at the scoring stage happens with residue numbers defining their location in a reference sequence (SEQRES) using one-based indexing. If the residue numbers in target don’t correspond to that SEQRES, you can specify the mapping manually. You can provide a dictionary to specify a reference sequence (SEQRES) for one or more chain(s). Key: chain name, value: alignment (seq1: SEQRES, seq2: sequence of residues in chain). Example: The residues in a chain with name “A” have sequence “YEAH” and residue numbers [42,43,44,45]. You can provide an alignment with seq1 “HELLYEAH” and seq2 “----YEAH”. “Y” gets assigned residue number 5, “E” gets assigned 6 and so on, no matter what the original residue numbers were.
bb_only (bool) – Only consider atoms with name “CA” in case of amino acids and “C3’” for nucleotides. This invalidates compound_lib. Raises if any residue in target is not r.chem_class.IsPeptideLinking() or r.chem_class.IsNucleotideLinking()
- Raises:
RuntimeError if target contains a compound which is not in compound_lib,
RuntimeError if symmetry_settings specifies symmetric atoms that are not present in the corresponding compound in compound_lib,
RuntimeError if seqres_mapping is not provided and target contains residue numbers with insertion codes or the residue numbers for each chain are not monotonically increasing,
RuntimeError if seqres_mapping is provided but an alignment is invalid (seq1 contains gaps, mismatch in seq1/seq2, seq2 does not match residues in corresponding chains).
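The residue renumbering described for seqres_mapping can be sketched on plain strings. This is an illustration only: the function name seqres_residue_numbers is hypothetical and the real parameter expects ost.seq.AlignmentHandle objects, not strings. The validity checks mirror the conditions listed under Raises.

```python
def seqres_residue_numbers(seqres, atomseq):
    """Map aligned chain residues to one-based SEQRES positions.

    seqres:  seq1 of the alignment (must not contain gaps)
    atomseq: seq2 of the alignment, '-' where the chain has no residue
    Returns one residue number per residue observed in the chain.
    """
    if len(seqres) != len(atomseq):
        raise ValueError("aligned sequences must have equal length")
    if "-" in seqres:
        raise ValueError("seq1 (SEQRES) must not contain gaps")
    numbers = []
    for position, (ref, res) in enumerate(zip(seqres, atomseq), start=1):
        if res == "-":
            continue  # SEQRES position not resolved in the chain
        if res != ref:
            raise ValueError(f"mismatch at alignment column {position}")
        numbers.append(position)
    return numbers


numbers = seqres_residue_numbers("HELLYEAH", "----YEAH")
```

For the example from the parameter description, the four residues of chain “A” end up with the numbers 5 to 8, regardless of their original numbering.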
- DRMSD(model, dist_cap=5, chain_mapping=None, no_interchain=False, no_intrachain=False, residue_mapping=None, check_resnames=True, add_mdl_contacts=False, interaction_data=None)
DRMSD of model - globally and per-residue
Very similar to LDDT as we operate on distance differences for all interatomic distances within the same inclusion radius as in LDDT. DRMSD is the distance rmsd, i.e. the RMSD of distance differences. Distance differences are capped at dist_cap which is also the default value for missing distances.
- Parameters:
model (ost.mol.EntityHandle/ost.mol.EntityView) – Model to be scored - models are preferably scored after performing stereochemistry checks in order to punish nonsensical irregularities. This must be done separately as a pre-processing step. Target contacts that are not covered by model are considered not conserved, thus increasing the DRMSD score. This also includes missing model chains or model chains for which no mapping is provided in chain_mapping.
dist_cap (float) – Cap for distance differences.
chain_mapping (dict with str as keys/values) – Mapping of model chains (key) onto target chains (value). This is required if target or model have more than one chain.
no_interchain (bool) – Whether to exclude interchain contacts
no_intrachain (bool) – Whether to exclude intrachain contacts (i.e. only consider interface related contacts)
residue_mapping (dict with key: str, value: ost.seq.AlignmentHandle) – By default, residue mapping is based on residue numbers. That means, a model chain and the respective target chain map to the same underlying reference sequence (SEQRES). Alternatively, you can specify one or several alignment(s) between model and target chains by providing a dictionary. Key: name of chain in model (respective target chain is extracted from chain_mapping), value: alignment with first sequence corresponding to target chain and second sequence to model chain. There is NO reference sequence involved, so the two sequences MUST exactly match the actual residues observed in the respective target/model chains (ATOMSEQ).
check_resnames (bool) – On by default. Enforces residue name matches between mapped model and target residues.
add_mdl_contacts (bool) – Adds model contacts - Only using contacts that are within a certain distance threshold in the target does not penalize for added model contacts. If set to True, this flag will also consider target contacts that are within the specified distance threshold in the model but not necessarily in the target. No contact will be added if the respective atom pair is not resolved in the target.
interaction_data (tuple) – Pro param - don’t use
- Returns:
global and per-residue DRMSD scores as a tuple - first element is global DRMSD score (None if target has no contacts) and second element a list of per-residue scores with length len(model.residues). None is assigned to residues that are not covered by target. If a residue is covered but has no contacts in target, None is assigned.
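The capping behaviour can be sketched on precomputed distances. This is a simplified illustration, not the library code: the function name drmsd_sketch and the dictionary inputs are assumptions, and the selection of target contacts by inclusion radius is taken as already done.

```python
import math


def drmsd_sketch(target_dists, model_dists, dist_cap=5.0):
    """Toy DRMSD from dicts mapping an atom-pair key to a distance.

    target_dists holds the target contacts. Pairs absent from model_dists
    are treated as missing and contribute dist_cap.
    """
    if not target_dists:
        return None  # no contacts in target
    squared_sum = 0.0
    for pair, d_target in target_dists.items():
        if pair in model_dists:
            diff = min(abs(d_target - model_dists[pair]), dist_cap)
        else:
            diff = dist_cap
        squared_sum += diff * diff
    return math.sqrt(squared_sum / len(target_dists))


target_dists = {("A1", "A2"): 4.0, ("A1", "A3"): 6.0, ("A2", "A3"): 10.0}
model_dists = {("A1", "A2"): 4.0, ("A1", "A3"): 9.0}  # third pair missing
value = drmsd_sketch(target_dists, model_dists)
```

Here the distance differences are 0, 3 and the cap of 5 for the missing pair, so the result is the square root of 34/3.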
- GetNChainContacts(target_chain, no_interchain=False)
Returns number of contacts expected for a certain chain in target
- Parameters:
target_chain (str) – Chain in target for which you want the number of expected contacts
no_interchain (bool) – Whether to exclude interchain contacts
- Raises:
RuntimeError if specified chain doesn’t exist
- lDDT(model, thresholds=[0.5, 1.0, 2.0, 4.0], local_lddt_prop=None, local_contact_prop=None, chain_mapping=None, no_interchain=False, no_intrachain=False, penalize_extra_chains=False, residue_mapping=None, return_dist_test=False, check_resnames=True, add_mdl_contacts=False, interaction_data=None, set_atom_props=False)
Computes LDDT of model - globally and per-residue
- Parameters:
model (ost.mol.EntityHandle/ost.mol.EntityView) – Model to be scored - models are preferably scored after performing stereochemistry checks in order to punish nonsensical irregularities. This must be done separately as a pre-processing step. Target contacts that are not covered by model are considered not conserved, thus decreasing the LDDT score. This also includes missing model chains or model chains for which no mapping is provided in chain_mapping.
thresholds (list of floats) – Thresholds of distance differences to be considered as correct - see docs in constructor for more info. Default: [0.5, 1.0, 2.0, 4.0]
local_lddt_prop (str) – If set, per-residue scores will be assigned as generic float property of that name
local_contact_prop (str) – If set, number of expected contacts as well as number of conserved contacts will be assigned as generic int property. Expected contacts will be set as <local_contact_prop>_exp, conserved contacts as <local_contact_prop>_cons. Values are summed over all thresholds.
chain_mapping (dict with str as keys/values) – Mapping of model chains (key) onto target chains (value). This is required if target or model have more than one chain.
no_interchain (bool) – Whether to exclude interchain contacts
no_intrachain (bool) – Whether to exclude intrachain contacts (i.e. only consider interface related contacts)
penalize_extra_chains (bool) – Whether to include a fixed penalty for additional chains in the model that are not mapped to the target. ONLY AFFECTS RETURNED GLOBAL SCORE. In detail: adds the number of intra-chain contacts of each extra chain to the expected contacts, thus adding a penalty.
residue_mapping (dict with key: str, value: ost.seq.AlignmentHandle) – By default, residue mapping is based on residue numbers. That means, a model chain and the respective target chain map to the same underlying reference sequence (SEQRES). Alternatively, you can specify one or several alignment(s) between model and target chains by providing a dictionary. Key: name of chain in model (respective target chain is extracted from chain_mapping), value: alignment with first sequence corresponding to target chain and second sequence to model chain. There is NO reference sequence involved, so the two sequences MUST exactly match the actual residues observed in the respective target/model chains (ATOMSEQ).
return_dist_test – Whether to additionally return the underlying per-residue data for the distance difference test. Adds five objects to the return tuple. First: number of total contacts summed over all thresholds. Second: number of conserved contacts summed over all thresholds. Third: list with length of scored residues, containing indices referring to model.residues. Fourth: numpy array of size len(scored_residues) containing the number of total contacts. Fifth: numpy matrix of shape (len(scored_residues), len(thresholds)) specifying how many contacts are conserved for each threshold.
check_resnames (bool) – On by default. Enforces residue name matches between mapped model and target residues.
add_mdl_contacts (bool) – Adds model contacts - Only using contacts that are within a certain distance threshold in the target does not penalize for added model contacts. If set to True, this flag will also consider target contacts that are within the specified distance threshold in the model but not necessarily in the target. No contact will be added if the respective atom pair is not resolved in the target.
interaction_data (tuple) – Pro param - don’t use
set_atom_props (bool) – If True, sets generic properties on a per-atom level if local_lddt_prop/local_contact_prop are set as well. In other words: this is the only way you can get per-atom LDDT values.
- Returns:
global and per-residue LDDT scores as a tuple - first element is global LDDT score (None if target has no contacts) and second element a list of per-residue scores with length len(model.residues). None is assigned to residues that are not covered by target. If a residue is covered but has no contacts in target, 0.0 is assigned.
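The per-residue data returned with return_dist_test can be turned back into scores along the following lines. This is a sketch under the assumption that a per-residue score is the mean over thresholds of conserved contacts divided by total contacts; the function name per_residue_lddt is hypothetical and plain lists stand in for the numpy arrays.

```python
def per_residue_lddt(total_contacts, conserved_per_threshold):
    """Per-residue LDDT from distance-test counts.

    total_contacts:          one expected-contact count per scored residue
    conserved_per_threshold: one row per scored residue holding one
                             conserved-contact count per threshold
    """
    scores = []
    for n_total, conserved in zip(total_contacts, conserved_per_threshold):
        if n_total == 0:
            scores.append(0.0)  # covered residue without contacts in target
            continue
        fractions = [n / n_total for n in conserved]
        scores.append(sum(fractions) / len(fractions))
    return scores


local_scores = per_residue_lddt(
    [10, 0, 4],
    [[5, 8, 10, 10], [0, 0, 0, 0], [1, 2, 3, 4]],
)
```

The residue with 10 expected contacts scores (0.5 + 0.8 + 1.0 + 1.0) / 4, and the residue without target contacts receives 0.0 as described in the return value.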
- class SymmetrySettings
Container for symmetric compounds
LDDT considers symmetries and selects the one resulting in the highest possible score.
A symmetry is defined as a renaming operation on one or more atoms that leads to a chemically equivalent residue. An example is OD1 and OD2 in ASP: renaming OD1 to OD2 and vice versa gives a chemically equivalent residue.
Use AddSymmetricCompound() to define a symmetry which can then directly be accessed through the symmetric_compounds member.
- AddSymmetricCompound(name, symmetric_atoms)
Adds symmetry for compound with name
- Parameters:
name (str) – Name of compound with symmetry
symmetric_atoms (list of tuple) – Pairs of atom names that define the renaming operation, i.e. after applying all switches defined in the tuples, the resulting residue should be chemically equivalent. Atom names must refer to the PDB component dictionary.
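The renaming operation can be illustrated without OpenStructure. In this sketch, apply_symmetry and best_symmetry are hypothetical helpers, residues are plain dictionaries, and score_fn stands in for whatever scoring is applied to a naming; the scorer's internal handling may differ.

```python
def apply_symmetry(atom_positions, symmetric_atoms):
    """Return a copy of atom_positions with the listed atom names switched.

    atom_positions:  dict mapping atom name -> coordinate tuple
    symmetric_atoms: list of (name_a, name_b) pairs, in the same form as the
                     symmetric_atoms argument of AddSymmetricCompound()
    """
    rename = {}
    for name_a, name_b in symmetric_atoms:
        rename[name_a] = name_b
        rename[name_b] = name_a
    return {rename.get(name, name): pos for name, pos in atom_positions.items()}


def best_symmetry(atom_positions, symmetric_atoms, score_fn):
    """Score the original and the renamed residue, keep the better one."""
    candidates = [atom_positions,
                  apply_symmetry(atom_positions, symmetric_atoms)]
    return max(candidates, key=score_fn)


asp = {"CG": (0.0, 0.0, 0.0), "OD1": (1.0, 0.0, 0.0), "OD2": (-1.0, 0.0, 0.0)}
swapped = apply_symmetry(asp, [("OD1", "OD2")])

# Hypothetical score: prefer the naming in which OD1 sits at x = -1.
best = best_symmetry(asp, [("OD1", "OD2")],
                     lambda atoms: -abs(atoms["OD1"][0] + 1.0))
```

Because both namings describe the same chemistry, picking the higher-scoring one avoids penalizing a model for an arbitrary labeling of equivalent atoms.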
- GetDefaultSymmetrySettings()
Constructs and returns SymmetrySettings object for natural amino acids
- class CustomCompound(atom_names)
Defines atoms for custom compounds
LDDT requires the reference atoms of a compound, which are typically extracted from an ost.conop.CompoundLib. This lightweight container makes it possible to handle arbitrary compounds which are not necessarily in the compound library.
- Parameters:
atom_names (list of str) – Names of atoms of custom compound
- static FromResidue(res)
Construct custom compound from residue
- Parameters:
res (ost.mol.ResidueView/ost.mol.ResidueHandle) – Residue from which reference atom names are extracted, hydrogen/deuterium atoms are filtered out
- Returns:
CustomCompound
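How reference atoms follow from a compound definition can be sketched on plain name lists. The helpers reference_atoms and heavy_atom_names are hypothetical and only mirror the rules stated in this document: compound atoms found in the target residue are scored, compound atoms absent from it are considered missing, and hydrogen/deuterium atoms are dropped when deriving a compound from a residue.

```python
def reference_atoms(residue_atoms, compound_atoms):
    """Split compound atoms into those present in and absent from a residue.

    residue_atoms:  atom names present in the target residue
    compound_atoms: atom names defined by the (custom) compound
    Returns (reference, missing).
    """
    present = set(residue_atoms)
    reference = [name for name in compound_atoms if name in present]
    missing = [name for name in compound_atoms if name not in present]
    return reference, missing


def heavy_atom_names(atoms):
    """Drop hydrogen/deuterium from a list of (name, element) pairs."""
    return [name for name, element in atoms
            if element.upper() not in ("H", "D")]


extra_in_residue = reference_atoms(["A", "B", "C"], ["A", "B"])
missing_in_residue = reference_atoms(["A", "B"], ["A", "B", "C"])
custom_names = heavy_atom_names(
    [("N", "N"), ("H", "H"), ("CA", "C"), ("D1", "D")]
)
```

In the first case only “A” and “B” are scored; in the second, “C” is reported as missing and would not influence scoring.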
- class lDDTSettings(radius=15, sequence_separation=0, cutoffs=(0.5, 1.0, 2.0, 4.0), label='locallddt')
Object containing the settings used for LDDT calculations.
- Parameters:
radius – Sets radius.
sequence_separation – Sets sequence_separation.
cutoffs – Sets cutoffs.
label – Sets label.
- radius
Distance inclusion radius.
- Type:
float
- sequence_separation
Sequence separation.
- Type:
int
- cutoffs
List of thresholds used to determine distance conservation.
- Type:
list of float
- label
The base name for the ResidueHandle properties that store the local scores.
- Type:
str
- PrintParameters()
Print settings.
- ToString()
- Returns:
String representation of the lDDTSettings object.
- Return type:
str