Quaternary Structure (QS) scores (deprecated)

qsscoring – Quaternary Structure (QS) scores

Scoring of quaternary structures (QS). The QS scoring is according to the paper by Bertoni et al..

Warning

The qsscoring module is deprecated. Consider using the newer implementation in qsscore instead.

Note

Requirements for use:

  • A default compound library must be defined and accessible via GetDefaultLib(). This is set by default when executing scripts with ost. Otherwise, you must set this with SetDefaultLib().

  • ClustalW must be installed (unless you provide chain mappings)

  • Python modules numpy and scipy must be installed and available (e.g. use pip install scipy numpy)

FilterContacts(contacts, chain_names)

Filter contacts to contain only contacts for chains in chain_names.

Parameters:
  • contacts (dict) – Connectivity dictionary as produced by GetContacts().

  • chain_names (list or (better) set) – Chain names to keep.

Returns:

New connectivity dictionary (format as in GetContacts())

Return type:

dict

GetContacts(entity, calpha_only, dist_thr=12.0)

Get inter-chain contacts of a macromolecular entity.

Contacts are pairs of residues within a given distance belonging to different chains. They are stored once per pair and include the CA/CB-CA/CB distance.

Parameters:
  • entity (EntityHandle or EntityView) – An entity to check connectivity for.

  • calpha_only (bool) – If True, we only consider CA-CA distances. Else, we use CB unless the residue is a GLY.

  • dist_thr (float) – Maximal CA/CB-CA/CB distance to be considered in contact.

Returns:

A connectivity dictionary. A pair of residues with chain names ch_name1 & ch_name2 (ch_name1 < ch_name2), residue numbers res_num1 & res_num2 and distance dist (<= dist_thr) are stored as result[ch_name1][ch_name2][res_num1][res_num2] = dist.

Return type:

dict

class MappedLDDTScorer(alignment, calpha_only, settings)

A simple class to calculate a single-chain lDDT score on a given chain to chain mapping as extracted from OligoLDDTScorer.

Parameters:
alignment

Alignment with two sequences named according to the mapped chains and with views attached to both sequences (e.g. one of the items of QSscorer.alignments).

The first sequence is assumed to be the reference and the second one the model. Since the lDDT score is not symmetric (extra residues in model are ignored), the order is important.

Type:

AlignmentHandle

calpha_only

If True, restricts lDDT score to CA only.

Type:

bool

settings

Settings to use for lDDT scoring.

Type:

lDDTSettings

lddt_scorer

lDDT Scorer object for the given chains.

Type:

lDDTScorer

reference_chain_name

Chain name of the reference.

Type:

str

model_chain_name

Chain name of the model.

Type:

str

GetPerResidueScores()
Returns:

Scores for each residue

Return type:

list of dict with one item for each residue existing in model and reference:

  • ”residue_number”: Residue number in reference chain

  • ”residue_name”: Residue name in reference chain

  • ”lddt”: local lDDT

  • ”conserved_contacts”: number of conserved contacts

  • ”total_contacts”: total number of contacts

class OligoLDDTScorer(ref, mdl, alignments, calpha_only, settings, penalize_extra_chains=False, chem_mapping=None)

Helper class to calculate oligomeric lDDT scores.

This class can be used independently, but commonly it will be created by calling QSscorer.GetOligoLDDTScorer().

Note

By construction, lDDT scores are not symmetric and hence it matters which structure is the reference (ref) and which one is the model (mdl). Extra residues in the model are generally not considered. Extra chains in both model and reference can be considered by setting the penalize_extra_chains flag to True.

Parameters:
ref
mdl

Full reference/model entity to be scored. The entity must contain all chains mapped in alignments and may also contain additional ones which are considered if penalize_extra_chains is True.

Type:

EntityHandle

alignments

One alignment for each mapped chain of ref/mdl as defined in QSscorer.alignments. The first sequence of each alignment belongs to ref and the second one to mdl. Sequences must have sequence naming and attached views as defined in QSscorer.alignments.

Type:

list of AlignmentHandle

calpha_only

If True, restricts lDDT score to CA only.

Type:

bool

settings

Settings to use for lDDT scoring.

Type:

lDDTSettings

penalize_extra_chains

If True, extra chains in both ref and mdl will penalize the lDDT scores.

Type:

bool

chem_mapping

Inter-complex mapping of chemical groups as defined in QSscorer.chem_mapping. Used to find “chem-mapped” chains in ref for unmapped chains in mdl when penalizing scores. Each unmapped model chain can add extra reference-contacts according to the average total contacts of each single “chem-mapped” reference chain. If there is no “chem-mapped” reference chain, a warning is shown and the model chain is ignored.

Only relevant if penalize_extra_chains is True.

Type:

dict with key = tuple of chain names in ref and value = tuple of chain names in mdl.

property lddt_mdl

The model entity used for oligomeric lDDT scoring (oligo_lddt / oligo_lddt_scorer).

Like lddt_ref, this is a single chain X containing all chains of mdl. The residue numbers match the ones in lddt_ref where aligned and have unique numbers for additional residues.

Getter:

Computed on first use (cached)

Type:

EntityHandle

property lddt_ref

The reference entity used for oligomeric lDDT scoring (oligo_lddt / oligo_lddt_scorer).

Since the lDDT computation requires a single chain with mapped residue numbering, all chains of ref are appended into a single chain X with unique residue numbers according to the column-index in the alignment. The alignments are in the same order as they appear in alignments. Additional residues are appended at the end of the chain with unique residue numbers. Unmapped chains are only added if penalize_extra_chains is True. Only CA atoms are considered if calpha_only is True.

Getter:

Computed on first use (cached)

Type:

EntityHandle

property mapped_lddt_scorers

List of scorer objects for each chain mapped in alignments.

Getter:

Computed on first use (cached)

Type:

list of MappedLDDTScorer

property oligo_lddt

Oligomeric lDDT score.

The score is computed as conserved contacts divided by the total contacts in the reference using the oligo_lddt_scorer, which uses the full complex as reference/model structure. If penalize_extra_chains is True, the reference/model complexes contain all chains (otherwise only the mapped ones) and additional contacts are added to the reference’s total contacts for unmapped model chains according to the chem_mapping.

The main difference with weighted_lddt is that the lDDT scorer “sees” the full complex here (incl. inter-chain contacts), while the weighted single chain score looks at each chain separately.

Getter:

Computed on first use (cached)

Type:

float

property oligo_lddt_scorer

lDDT Scorer object for lddt_ref and lddt_mdl.

Getter:

Computed on first use (cached)

Type:

lDDTScorer

property sc_lddt

List of global scores extracted from sc_lddt_scorers.

If scoring for a mapped chain fails, an error is displayed and a score of 0 is assigned.

Getter:

Computed on first use (cached)

Type:

list of float

property sc_lddt_scorers

List of lDDT scorer objects extracted from mapped_lddt_scorers.

Type:

list of lDDTScorer

property weighted_lddt

Weighted average of single chain lDDT scores.

The score is computed as a weighted average of single chain lDDT scores (see sc_lddt_scorers) using the total contacts of each single reference chain as weights. If penalize_extra_chains is True, unmapped chains are added with a 0 score and total contacts taken from the actual reference chains or (for unmapped model chains) using the chem_mapping.

See oligo_lddt for a comparison of the two scores.

Getter:

Computed on first use (cached)

Type:

float

class QSscoreEntity(ent)

Entity with cached entries for QS scoring.

Any known / precomputed information can be filled into the appropriate attribute here as long as they are labelled as read/write. Otherwise the quantities are computed on first access and cached (lazy evaluation). The heaviest load is expected when computing contacts and contacts_ca.

Parameters:

ent (EntityHandle or EntityView) – Entity to be used for QS scoring. A copy of it will be processed.

is_valid

True, if successfully initialized. False, if input structure has no protein chains with >= 20 residues.

Type:

bool

original_name

Name set for ent when object was created.

Type:

str

ent

Cleaned version of ent passed at construction. Hydrogens are removed, the entity is processed with a RuleBasedProcessor and chains listed in removed_chains have been removed. The name of this entity might change during scoring (see GetName()). Otherwise, this will be fixed.

Type:

EntityHandle

removed_chains

Chains removed from ent passed at construction. These are ligand and water chains as well as small (< 20 res.) peptides or chains with no amino acids (determined by chem. type, which is set by rule based processor).

Type:

list of str

calpha_only

Whether entity is CA-only (i.e. it has 0 CB atoms)

Type:

bool

GetAlignment(c1, c2)

Get sequence alignment of chain c1 with chain c2. Computed on first use based on ca_chains (cached).

Parameters:
  • c1 (str) – Chain name for first chain to align.

  • c2 (str) – Chain name for second chain to align.

Return type:

AlignmentHandle or None if it failed.

GetAngles(c1, c2)

Get Euler angles from superposition of chain c1 with chain c2. Computed on first use based on ca_chains (cached).

Parameters:
  • c1 (str) – Chain name for first chain to superpose.

  • c2 (str) – Chain name for second chain to superpose.

Returns:

3 Euler angles (may contain nan if something fails).

Return type:

numpy.array

GetAxis(c1, c2)

Get axis of symmetry from superposition of chain c1 with chain c2. Computed on first use based on ca_chains (cached).

Parameters:
  • c1 (str) – Chain name for first chain to superpose.

  • c2 (str) – Chain name for second chain to superpose.

Returns:

Rotational axis (may contain nan if something fails).

Return type:

numpy.array

GetName()

Wrapper to GetName() of ent. This is used to uniquely identify the entity while scoring. The name may therefore change while original_name remains fixed.

SetName(new_name)

Wrapper to SetName() of ent. Use this to change unique identifier while scoring (see GetName()).

property ca_chains

Map of chain names in ent to sequences with attached view to CA-only chains (into ca_entity). Useful for alignments and superpositions.

Getter:

Computed on first use (cached)

Type:

dict (key = str, value = SequenceHandle)

property ca_entity

Reduced representation of ent with only CA atoms. This guarantees that each included residue has exactly one atom.

Getter:

Computed on first use (cached)

Type:

EntityHandle

property chem_groups

Intra-complex group of chemically identical (seq. id. > 95%) polypeptide chains as extracted from ca_chains. First chain in group is the one with the longest sequence.

Getter:

Computed on first use (cached)

Type:

list of list of str (chain names)

property contacts

Connectivity dictionary (read/write). As given by GetContacts() with calpha_only = False on ent.

Getter:

Computed on first use (cached)

Setter:

Uses FilterContacts() to ensure that we only keep contacts for chains in the cleaned entity.

Type:

See return type of GetContacts()

property contacts_ca

CA-only connectivity dictionary (read/write). Like contacts but with calpha_only = True in GetContacts().

exception QSscoreError

Exception to be raised for “acceptable” exceptions in QS scoring.

Those are cases we might want to capture for default behavior.

class QSscorer(ent_1, ent_2, res_num_alignment=False)

Object to compute QS scores.

Simple usage without any precomputed contacts, symmetries and mappings:

import ost
from ost.mol.alg import qsscoring

# load two biounits to compare
ent_full = ost.io.LoadPDB('3ia3', remote=True)
ent_1 = ent_full.Select('cname=A,D')
ent_2 = ent_full.Select('cname=B,C')
# get score
ost.PushVerbosityLevel(3)
try:
  qs_scorer = qsscoring.QSscorer(ent_1, ent_2)
  ost.LogScript('QSscore:', str(qs_scorer.global_score))
  ost.LogScript('Chain mapping used:', str(qs_scorer.chain_mapping))
  # commonly you want the QS global score as output
  qs_score = qs_scorer.global_score
except qsscoring.QSscoreError as ex:
  # default handling: report failure and set score to 0
  ost.LogError('QSscore failed:', str(ex))
  qs_score = 0

For maximal performance when computing QS scores of the same entity with many others, it is advisable to construct and reuse QSscoreEntity objects.

Any known / precomputed information can be filled into the appropriate attribute here (no checks done!). Otherwise most quantities are computed on first access and cached (lazy evaluation). Setters are provided to set values with extra checks (e.g. SetSymmetries()).

All necessary seq. alignments are done by global BLOSUM62-based alignment. A multiple sequence alignment is performed with ClustalW unless chain_mapping is provided manually. You will need to have an executable clustalw or clustalw2 in your PATH or you must set clustalw_bin accordingly. Otherwise an exception (ost.settings.FileNotFound) is thrown.

Formulas for QS scores:

- QS_best = weighted_scores / (weight_sum + weight_extra_mapped)
- QS_global = weighted_scores / (weight_sum + weight_extra_all)
-> weighted_scores = sum(w(min(d1,d2)) * (1 - abs(d1-d2)/12)) for shared
-> weight_sum = sum(w(min(d1,d2))) for shared
-> weight_extra_mapped = sum(w(d)) for all mapped but non-shared
-> weight_extra_all = sum(w(d)) for all non-shared
-> w(d) = 1 if d <= 5, exp(-2 * ((d-5.0)/4.28)^2) else

In the formulas above:

  • “d”: CA/CB-CA/CB distance of an “inter-chain contact” (“d1”, “d2” for “shared” contacts).

  • “mapped”: we could map chains of two structures and align residues in alignments.

  • “shared”: pairs of residues which are “mapped” and have “inter-chain contact” in both structures.

  • “inter-chain contact”: CB-CB pairs (CA for GLY) with distance <= 12 A (fallback to CA-CA if calpha_only is True).

  • “w(d)”: weighting function (prob. of 2 res. to interact given CB distance) from Xu et al. 2009.

Parameters:
Raises:

QSscoreError if input structures are invalid or are monomers or have issues that make it impossible for a QS score to be computed.

qs_ent_1

QSscoreEntity object for ent_1 given at construction. If entity names (original_name) are not unique, we set it to ‘pdb_1’ using SetName().

qs_ent_2

QSscoreEntity object for ent_2 given at construction. If entity names (original_name) are not unique, we set it to ‘pdb_2’ using SetName().

calpha_only

True if any of the two structures is CA-only (after cleanup).

Type:

bool

max_ca_per_chain_for_cm

Maximal number of CA atoms to use in each chain to determine chain mappings. Setting this to -1 disables the limit. Limiting it speeds up determination of symmetries and chain mappings. By default it is set to 100.

Type:

int

max_mappings_extensive

Maximal number of chain mappings to test for ‘extensive’ chain_mapping_scheme. The extensive chain mapping search must in the worst case check O(N^2) * O(N!) possible mappings for complexes with N chains. Two octamers without symmetry would require 322560 mappings to be checked. To limit computations, a QSscoreError is thrown if we try more than the maximal number of chain mappings. The value must be set before the first use of chain_mapping. By default it is set to 100000.

Type:

int

res_num_alignment

Forces each alignment in alignments to be based on residue numbers instead of using a global BLOSUM62-based alignment.

Type:

bool

GetOligoLDDTScorer(settings, penalize_extra_chains=True)
Returns:

OligoLDDTScorer object, setup for this QS scoring problem. The scorer is set up with qs_ent_1 as the reference and qs_ent_2 as the model.

Parameters:
SetSymmetries(symm_1, symm_2)

Set user-provided symmetry groups.

These groups are restricted to chain names appearing in ent_to_cm_1 and ent_to_cm_2 respectively. They are only valid if they cover all chains and both symm_1 and symm_2 have same lengths of symmetry group tuples. Otherwise trivial symmetry group used (see symm_1).

Parameters:
  • symm_1 – Value to set for symm_1.

  • symm_2 – Value to set for symm_2.

property alignments

List of successful sequence alignments using chain_mapping.

There will be one alignment for each mapped chain and they are ordered by their chain names in qs_ent_1.

The first sequence of each alignment belongs to qs_ent_1 and the second one to qs_ent_2. The sequences are named according to the mapped chain names and have views attached into QSscoreEntity.ent of qs_ent_1 and qs_ent_2.

If res_num_alignment is False, each alignment is performed using a global BLOSUM62-based alignment. Otherwise, the positions in the alignment sequences are simply given by the residue number so that residues with matching numbers are aligned.

Getter:

Computed on first use (cached)

Type:

list of AlignmentHandle

property best_score

QS-score without penalties.

Like global_score, but neglecting additional residues or chains in one of the biounits (i.e. the score is calculated considering only mapped chains and residues).

Getter:

Computed on first use (cached)

Type:

float

Raises:

QSscoreError if only one chain is mapped

property chain_mapping

Mapping from ent_to_cm_1 to ent_to_cm_2.

Properties:

  • Mapping is between chains of same chem. group (see chem_mapping)

  • Each chain can appear only once in mapping

  • All chains of complex with less chains are mapped

  • Symmetry (symm_1, symm_2) is taken into account

Details on algorithms used to find mapping:

  • We try all pairs of chem. mapped chains within symmetry group and get superpose-transformation for them

  • First option: check for “sufficient overlap” of other chain-pairs

    • For each chain-pair defined above: apply superposition to full oligomer and map chains based on structural overlap

    • Structural overlap = X% of residues in second oligomer covered within Y Angstrom of a (chem. mapped) chain in first oligomer. We successively try (X,Y) = (80,4), (40,6) and (20,8) to be less and less strict in mapping (warning shown for most permissive one).

    • If multiple possible mappings are found, we choose the one which leads to the lowest multi-chain-RMSD given the superposition

  • Fallback option: try all mappings to find minimal multi-chain-RMSD (warning shown)

    • For each chain-pair defined above: apply superposition, try all (!) possible chain mappings (within symmetry group) and keep mapping with lowest multi-chain-RMSD

    • Repeat procedure above to resolve symmetry. Within the symmetry group we can use the chain mapping computed before and we just need to find which symmetry group in first oligomer maps to which in the second one. We again try all possible combinations…

    • Limitations:

      • Trying all possible mappings is a combinatorial nightmare (factorial). We throw an exception if too many combinations (e.g. octomer vs octomer with no usable symmetry)

      • The mapping is forced: the “best” mapping will be chosen independently of how badly they fit in terms of multi-chain-RMSD

      • As a result, such a forced mapping can lead to a large range of resulting QS scores. An extreme example was observed between 1on3.1 and 3u9r.1, where global_score can range from 0.12 to 0.43 for mappings with very similar multi-chain-RMSD.

Getter:

Computed on first use (cached)

Type:

dict with key / value = str (chain names, key for ent_to_cm_1, value for ent_to_cm_2)

Raises:

QSscoreError if there are too many combinations to check to find a chain mapping (see max_mappings_extensive).

property chain_mapping_scheme

Mapping scheme used to get chain_mapping.

Possible values:

  • ‘strict’: 80% overlap needed within 4 Angstrom (overlap based mapping).

  • ‘tolerant’: 40% overlap needed within 6 Angstrom (overlap based mapping).

  • ‘permissive’: 20% overlap needed within 8 Angstrom (overlap based mapping). It’s best if you check mapping manually!

  • ‘extensive’: Extensive search used for mapping detection (fallback). This approach has known limitations and may be removed in future versions. Mapping should be checked manually!

  • ‘user’: chain_mapping was set by user before first use of this attribute.

Getter:

Computed with chain_mapping on first use (cached)

Type:

str

Raises:

QSscoreError as in chain_mapping.

property chem_mapping

Inter-complex mapping of chemical groups.

Each group (see QSscoreEntity.chem_groups) is mapped according to highest sequence identity. Alignment is between longest sequences in groups.

Limitations:

  • If different numbers of groups, we map only the groups for the complex with less groups (rest considered unmapped and shown as warning)

  • The mapping is forced: the “best” mapping will be chosen independently of how low the seq. identity may be

Getter:

Computed on first use (cached)

Type:

dict with key = tuple of chain names in qs_ent_1 and value = tuple of chain names in qs_ent_2.

Raises:

QSscoreError if we end up having no chains for either entity in the mapping (can happen if chains do not have CA atoms).

property clustalw_bin

Full path to clustalw or clustalw2 executable to use for multiple sequence alignments (unless chain_mapping is provided manually).

Getter:

Located in path on first use (cached)

Type:

str

property ent_to_cm_1

Subset of qs_ent_1 used to compute chain mapping and symmetries.

Properties:

  • Includes only residues aligned according to chem_mapping

  • Includes only 1 CA atom per residue

  • Has at least 5 and at most max_ca_per_chain_for_cm atoms per chain

  • All chains of the same chemical group have the same number of atoms (also in ent_to_cm_2 according to chem_mapping)

  • All chains appearing in chem_mapping appear in this entity (so the two can be safely used together)

This entity might be transformed (i.e. all positions rotated/translated by same transformation matrix) if this can speed up computations. So do not assume fixed global positions (but relative distances will remain fixed).

Getter:

Computed on first use (cached)

Type:

EntityHandle

Raises:

QSscoreError if any chain ends up having less than 5 res.

property ent_to_cm_2

Subset of qs_ent_1 used to compute chain mapping and symmetries (see ent_to_cm_1 for details).

property global_score

QS-score with penalties.

The range of the score is between 0 (i.e. no interface residues are shared between biounits) and 1 (i.e. the interfaces are identical).

The global QS-score is computed applying penalties when interface residues or entire chains are missing (i.e. anything that is not mapped in mapped_residues / chain_mapping) in one of the biounits.

Getter:

Computed on first use (cached)

Type:

float

Raises:

QSscoreError if only one chain is mapped

property mapped_residues

Mapping of shared residues in alignments.

Getter:

Computed on first use (cached)

Type:

dict mapped_residues[c1][r1] = r2 with: c1 = Chain name in first entity (= first sequence in aln), r1 = Residue number in first entity, r2 = Residue number in second entity

property superposition

Superposition result based on shared CA atoms in alignments.

The superposition can be used to map QSscoreEntity.ent of qs_ent_1 onto the one of qs_ent_2. Use ost.geom.Invert() if you need the opposite transformation.

Getter:

Computed on first use (cached)

Type:

ost.mol.alg.SuperpositionResult

property symm_1

Symmetry groups for qs_ent_1 used to speed up chain mapping.

This is a list of chain-lists where each chain-list can be used reconstruct the others via cyclic C or dihedral D symmetry. The first chain-list is used as a representative symmetry group. For heteromers, the group-members must contain all different seqres in oligomer.

Example: symm. groups [(A,B,C), (D,E,F), (G,H,I)] means that there are symmetry transformations to get (D,E,F) and (G,H,I) from (A,B,C).

Properties:

  • All symmetry group tuples have the same length (num. of chains)

  • All chains in ent_to_cm_1 appear (w/o duplicates)

  • For heteros: symmetry group tuples have all different chem. groups

  • Trivial symmetry group = one tuple with all chains (used if inconsistent data provided or if no symmetry is found)

  • Either compatible to symm_2 or trivial symmetry groups used. Compatibility requires same lengths of symmetry group tuples and it must be possible to get an overlap (80% of residues covered within 6 A of a (chem. mapped) chain) of all chains in representative symmetry groups by superposing one pair of chains.

Getter:

Computed on first use (cached)

Type:

list of tuple of str (chain names)

property symm_2

Symmetry groups for qs_ent_2 (see symm_1 for details).