OpenStructure
QSscorer Class Reference

Public Member Functions

 __init__ (self, ent_1, ent_2, res_num_alignment=False)
 
 chem_mapping (self)
 
 chem_mapping (self, chem_mapping)
 
 ent_to_cm_1 (self)
 
 ent_to_cm_1 (self, ent_to_cm_1)
 
 ent_to_cm_2 (self)
 
 ent_to_cm_2 (self, ent_to_cm_2)
 
 symm_1 (self)
 
 symm_2 (self)
 
 SetSymmetries (self, symm_1, symm_2)
 
 chain_mapping (self)
 
 chain_mapping (self, chain_mapping)
 
 chain_mapping_scheme (self)
 
 alignments (self)
 
 alignments (self, alignments)
 
 mapped_residues (self)
 
 mapped_residues (self, mapped_residues)
 
 global_score (self)
 
 best_score (self)
 
 superposition (self)
 
 clustalw_bin (self)
 
 clustalw_bin (self, clustalw_bin)
 
 GetOligoLDDTScorer (self, settings, penalize_extra_chains=True)
 

Data Fields

 qs_ent_1
 
 qs_ent_2
 
 res_num_alignment
 
 calpha_only
 
 max_ca_per_chain_for_cm
 
 max_mappings_extensive
 
 ent_to_cm_1
 
 ent_to_cm_2
 
 symm_1
 
 symm_2
 
 chem_mapping
 
 chain_mapping
 
 alignments
 
 clustalw_bin
 

Protected Member Functions

 _ComputeAlignedEntities (self)
 
 _ComputeSymmetry (self)
 
 _ComputeScores (self)
 

Protected Attributes

 _chem_mapping
 
 _ent_to_cm_1
 
 _ent_to_cm_2
 
 _symm_1
 
 _symm_2
 
 _chain_mapping
 
 _chain_mapping_scheme
 
 _alignments
 
 _mapped_residues
 
 _global_score
 
 _best_score
 
 _superposition
 
 _clustalw_bin
 

Detailed Description

Object to compute QS scores.

Simple usage without any precomputed contacts, symmetries and mappings:

.. code-block:: python

  import ost
  from ost.mol.alg import qsscoring

  # load two biounits to compare
  ent_full = ost.io.LoadPDB('3ia3', remote=True)
  ent_1 = ent_full.Select('cname=A,D')
  ent_2 = ent_full.Select('cname=B,C')
  # get score
  ost.PushVerbosityLevel(3)
  try:
    qs_scorer = qsscoring.QSscorer(ent_1, ent_2)
    ost.LogScript('QSscore:', str(qs_scorer.global_score))
    ost.LogScript('Chain mapping used:', str(qs_scorer.chain_mapping))
    # commonly you want the QS global score as output
    qs_score = qs_scorer.global_score
  except qsscoring.QSscoreError as ex:
    # default handling: report failure and set score to 0
    ost.LogError('QSscore failed:', str(ex))
    qs_score = 0

For maximal performance when computing QS scores of the same entity with many
others, it is advisable to construct and reuse :class:`QSscoreEntity` objects.

Any known / precomputed information can be filled into the appropriate
attribute here (no checks done!). Otherwise most quantities are computed on
first access and cached (lazy evaluation). Setters are provided to set values
with extra checks (e.g. :func:`SetSymmetries`).
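
The lazy-evaluation behaviour described above can be pictured with a minimal sketch. The class ``LazyScorer`` and its ``_compute_chain_mapping`` helper are hypothetical names made up for illustration and are not part of OpenStructure. The sketch only shows how a property can compute a value on first access, cache it, and still accept a precomputed value through its setter.

```python
class LazyScorer:
    """Hypothetical stand-in illustrating compute-on-first-access caching."""

    def __init__(self):
        self._chain_mapping = None   # cache; None means "not computed yet"
        self.num_computations = 0    # counts how often the expensive step ran

    def _compute_chain_mapping(self):
        # placeholder for an expensive computation (assumption for this sketch)
        self.num_computations += 1
        return {'A': 'B', 'D': 'C'}

    @property
    def chain_mapping(self):
        # compute only if nothing is cached yet
        if self._chain_mapping is None:
            self._chain_mapping = self._compute_chain_mapping()
        return self._chain_mapping

    @chain_mapping.setter
    def chain_mapping(self, value):
        # a precomputed value is stored as-is (no checks done)
        self._chain_mapping = value


scorer = LazyScorer()
first = scorer.chain_mapping    # triggers the computation
second = scorer.chain_mapping   # served from the cache

prefilled = LazyScorer()
prefilled.chain_mapping = {'A': 'C'}   # nothing is computed for this object
```

Filling in a known value before the first access skips the computation entirely, which is the reason precomputed information is worth providing.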

All necessary sequence alignments are done by global BLOSUM62-based alignment.
A multiple sequence alignment is performed with ClustalW unless
:attr:`chain_mapping` is provided manually. You need an executable
``clustalw`` or ``clustalw2`` in your ``PATH``, or you must set
:attr:`clustalw_bin` accordingly. Otherwise an exception
(:class:`ost.settings.FileNotFound`) is raised.

Formulas for QS scores:

::

  - QS_best = weighted_scores / (weight_sum + weight_extra_mapped)
  - QS_global = weighted_scores / (weight_sum + weight_extra_all)
  -> weighted_scores = sum(w(min(d1,d2)) * (1 - abs(d1-d2)/12)) for shared
  -> weight_sum = sum(w(min(d1,d2))) for shared
  -> weight_extra_mapped = sum(w(d)) for all mapped but non-shared
  -> weight_extra_all = sum(w(d)) for all non-shared
  -> w(d) = 1 if d <= 5, exp(-2 * ((d-5.0)/4.28)^2) else

In the formulas above:

* "d": CA/CB-CA/CB distance of an "inter-chain contact" ("d1", "d2" for
  "shared" contacts).
* "mapped": we could map chains of two structures and align residues in
  :attr:`alignments`.
* "shared": pairs of residues which are "mapped" and have
  "inter-chain contact" in both structures.
* "inter-chain contact": CB-CB pairs (CA for GLY) with distance <= 12 A
  (fallback to CA-CA if :attr:`calpha_only` is True).
* "w(d)": weighting function (probability that two residues interact, given
  their CB distance) from
  `Xu et al. 2009 <https://dx.doi.org/10.1016%2Fj.jmb.2008.06.002>`_.
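
As a worked illustration of the formulas above, the following pure-Python sketch evaluates both scores from lists of contact distances. The function names and the way the input is split into three lists are assumptions made for this example. The actual implementation works on contacts extracted from the structures, and the distances used here are made up.

```python
import math


def w(d):
    """Weight of a contact at distance d (in Angstrom), as in the formulas above."""
    if d <= 5.0:
        return 1.0
    return math.exp(-2.0 * ((d - 5.0) / 4.28) ** 2)


def qs_scores(shared, extra_mapped, extra_unmapped):
    """Return (QS_best, QS_global) for lists of contact distances.

    shared:         (d1, d2) pairs for contacts present in both structures
    extra_mapped:   distances of contacts that are mapped but non-shared
    extra_unmapped: distances of non-shared contacts with unmapped residues
    """
    weighted_scores = sum(w(min(d1, d2)) * (1.0 - abs(d1 - d2) / 12.0)
                          for d1, d2 in shared)
    weight_sum = sum(w(min(d1, d2)) for d1, d2 in shared)
    weight_extra_mapped = sum(w(d) for d in extra_mapped)
    # "all non-shared" = mapped-but-non-shared plus everything unmapped
    weight_extra_all = weight_extra_mapped + sum(w(d) for d in extra_unmapped)
    if weight_sum + weight_extra_mapped == 0:
        # choice made for this sketch: refuse to score without mapped contacts
        raise ValueError('no mapped contacts to score')
    qs_best = weighted_scores / (weight_sum + weight_extra_mapped)
    qs_global = weighted_scores / (weight_sum + weight_extra_all)
    return qs_best, qs_global


# toy input (made-up distances, not taken from any real structure)
best, glob = qs_scores(shared=[(4.0, 4.0), (6.0, 8.0)],
                       extra_mapped=[10.0],
                       extra_unmapped=[5.0])
```

Because ``weight_extra_all`` includes everything counted in ``weight_extra_mapped``, the global score can never exceed the best score.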

:param ent_1: First structure to be scored.
:type ent_1:  :class:`QSscoreEntity`, :class:`~ost.mol.EntityHandle` or
              :class:`~ost.mol.EntityView`
:param ent_2: Second structure to be scored.
:type ent_2:  :class:`QSscoreEntity`, :class:`~ost.mol.EntityHandle` or
              :class:`~ost.mol.EntityView`
:param res_num_alignment: Sets :attr:`res_num_alignment`

:raises: :class:`QSscoreError` if input structures are invalid or are monomers
         or have issues that make it impossible for a QS score to be computed.

.. attribute:: qs_ent_1

  :class:`QSscoreEntity` object for *ent_1* given at construction.
  If entity names (:attr:`~QSscoreEntity.original_name`) are not unique, we
  set it to 'pdb_1' using :func:`~QSscoreEntity.SetName`.

.. attribute:: qs_ent_2

  :class:`QSscoreEntity` object for *ent_2* given at construction.
  If entity names (:attr:`~QSscoreEntity.original_name`) are not unique, we
  set it to 'pdb_2' using :func:`~QSscoreEntity.SetName`.

.. attribute:: calpha_only

  True if either of the two structures is CA-only (after cleanup).

  :type: :class:`bool`

.. attribute:: max_ca_per_chain_for_cm

  Maximal number of CA atoms to use in each chain to determine chain mappings.
  Setting this to -1 disables the limit. Limiting it speeds up determination
  of symmetries and chain mappings. By default it is set to 100.

  :type: :class:`int`

.. attribute:: max_mappings_extensive

  Maximal number of chain mappings to test for 'extensive'
  :attr:`chain_mapping_scheme`. The extensive chain mapping search must in the
  worst case check O(N^2) * O(N!) possible mappings for complexes with N
  chains. Two octamers without symmetry would require 322560 mappings to be
  checked. To limit computations, a :class:`QSscoreError` is thrown if we try
  more than the maximal number of chain mappings.
  The value must be set before the first use of :attr:`chain_mapping`.
  By default it is set to 100000.

  :type: :class:`int`
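
The octamer figure quoted above can be reproduced with a short calculation. The decomposition used here (N^2 candidate chain pairs for the initial superposition, times (N-1)! assignments of the remaining chains) is an assumption made for illustration. The text itself only states the worst-case bound and the number 322560.

```python
import math


def worst_case_mappings(n_chains):
    """Count under the assumed decomposition: N^2 seed pairs x (N-1)! assignments."""
    return n_chains ** 2 * math.factorial(n_chains - 1)


def exceeds_limit(n_chains, max_mappings_extensive=100000):
    """True if the worst-case count is above the given limit."""
    return worst_case_mappings(n_chains) > max_mappings_extensive


octamer_count = worst_case_mappings(8)
```

Under this assumption an octamer pair without usable symmetry is above the default limit of 100000, while a heptamer pair is still below it.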

.. attribute:: res_num_alignment

  Forces each alignment in :attr:`alignments` to be based on residue numbers
  instead of using a global BLOSUM62-based alignment.

  :type: :class:`bool`

Definition at line 46 of file qsscoring.py.

Constructor & Destructor Documentation

◆ __init__()

__init__ (   self,
  ent_1,
  ent_2,
  res_num_alignment = False 
)

Definition at line 170 of file qsscoring.py.

Member Function Documentation

◆ _ComputeAlignedEntities()

_ComputeAlignedEntities (   self)
protected

Class-internal helpers (anything that doesn't easily work without this class).

Fills cached ent_to_cm_1 and ent_to_cm_2.

Definition at line 579 of file qsscoring.py.

◆ _ComputeScores()

_ComputeScores (   self)
protected
Fills cached global_score and best_score.

Definition at line 604 of file qsscoring.py.

◆ _ComputeSymmetry()

_ComputeSymmetry (   self)
protected
Fills cached symm_1 and symm_2.

Definition at line 593 of file qsscoring.py.

◆ alignments() [1/2]

alignments (   self)
List of successful sequence alignments using :attr:`chain_mapping`.

There will be one alignment for each mapped chain and they are ordered by
their chain names in :attr:`qs_ent_1`.

The first sequence of each alignment belongs to :attr:`qs_ent_1` and the
second one to :attr:`qs_ent_2`. The sequences are named according to the
mapped chain names and have views attached into :attr:`QSscoreEntity.ent`
of :attr:`qs_ent_1` and :attr:`qs_ent_2`.

If :attr:`res_num_alignment` is False, each alignment is performed using a
global BLOSUM62-based alignment. Otherwise, the positions in the alignment
sequences are simply given by the residue number so that residues with
matching numbers are aligned.

:getter: Computed on first use (cached)
:type: :class:`list` of :class:`~ost.seq.AlignmentHandle`
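
The residue-number-based variant can be illustrated with a small pure-Python sketch. The function name and the input format (a dict from residue number to one-letter code) are hypothetical, and the sketch assumes positive residue numbers starting at 1 and non-empty chains. It produces plain gapped strings rather than :class:`~ost.seq.AlignmentHandle` objects.

```python
def align_by_residue_number(res_1, res_2):
    """Place each residue at the column given by its residue number.

    res_1, res_2: dicts mapping residue number (>= 1) to one-letter code.
    Returns two gapped strings of equal length.
    """
    last = max(max(res_1), max(res_2))
    columns = range(1, last + 1)
    # residues missing in a chain become gaps in that sequence
    seq_1 = ''.join(res_1.get(num, '-') for num in columns)
    seq_2 = ''.join(res_2.get(num, '-') for num in columns)
    return seq_1, seq_2


aln_1, aln_2 = align_by_residue_number({1: 'A', 2: 'C', 4: 'D'},
                                       {2: 'C', 3: 'E', 4: 'D'})
```

Residues 2 and 4 end up in the same columns because their numbers match, regardless of residue identity.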

Definition at line 438 of file qsscoring.py.

◆ alignments() [2/2]

alignments (   self,
  alignments 
)

Definition at line 465 of file qsscoring.py.

◆ best_score()

best_score (   self)
QS-score without penalties.

Like :attr:`global_score`, but neglecting additional residues or chains in
one of the biounits (i.e. the score is calculated considering only mapped
chains and residues).

:getter: Computed on first use (cached)
:type: :class:`float`
:raises: :class:`QSscoreError` if only one chain is mapped

Definition at line 506 of file qsscoring.py.

◆ chain_mapping() [1/2]

chain_mapping (   self)
Mapping from :attr:`ent_to_cm_1` to :attr:`ent_to_cm_2`.

Properties:

- Mapping is between chains of same chem. group (see :attr:`chem_mapping`)
- Each chain can appear only once in mapping
- All chains of the complex with fewer chains are mapped
- Symmetry (:attr:`symm_1`, :attr:`symm_2`) is taken into account

Details on algorithms used to find mapping:

- We try all pairs of chem. mapped chains within symmetry group and get
  superpose-transformation for them
- First option: check for "sufficient overlap" of other chain-pairs

  - For each chain-pair defined above: apply superposition to full oligomer
    and map chains based on structural overlap
  - Structural overlap = X% of residues in second oligomer covered within Y
    Angstrom of a (chem. mapped) chain in first oligomer. We successively
    try (X,Y) = (80,4), (40,6) and (20,8) to be less and less strict in
    mapping (warning shown for most permissive one).
  - If multiple possible mappings are found, we choose the one which leads
    to the lowest multi-chain-RMSD given the superposition

- Fallback option: try all mappings to find minimal multi-chain-RMSD
  (warning shown)

  - For each chain-pair defined above: apply superposition, try all (!)
    possible chain mappings (within symmetry group) and keep mapping with
    lowest multi-chain-RMSD
  - Repeat procedure above to resolve symmetry. Within the symmetry group we
    can use the chain mapping computed before and we just need to find which
    symmetry group in first oligomer maps to which in the second one. We
    again try all possible combinations...
  - Limitations:
    
    - Trying all possible mappings is a combinatorial nightmare (factorial).
      We throw an exception if there are too many combinations (e.g. octamer
      vs octamer with no usable symmetry).
    - The mapping is forced: the "best" mapping will be chosen independently
      of how badly they fit in terms of multi-chain-RMSD
    - As a result, such a forced mapping can lead to a large range of
      resulting QS scores. An extreme example was observed between 1on3.1
      and 3u9r.1, where :attr:`global_score` can range from 0.12 to 0.43
      for mappings with very similar multi-chain-RMSD.

:getter: Computed on first use (cached)
:type: :class:`dict` with key / value = :class:`str` (chain names, key
       for :attr:`ent_to_cm_1`, value for :attr:`ent_to_cm_2`)
:raises: :class:`QSscoreError` if there are too many combinations to check
         to find a chain mapping (see :attr:`max_mappings_extensive`).
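
The "sufficient overlap" test with its successively relaxed thresholds can be sketched for a single pair of chains as follows. This is a simplified illustration, not the library's code: it assumes both chains are given as lists of already superposed (x, y, z) positions, and the function names are hypothetical. The threshold values and scheme names are the ones listed in this documentation.

```python
import math

# (minimal percent covered, distance cutoff in Angstrom, scheme name)
THRESHOLDS = [(80, 4.0, 'strict'), (40, 6.0, 'tolerant'), (20, 8.0, 'permissive')]


def covered_percent(chain_ref, chain_mdl, cutoff):
    """Percent of positions in chain_mdl within cutoff of any position in chain_ref."""
    covered = sum(1 for p in chain_mdl
                  if any(math.dist(p, q) <= cutoff for q in chain_ref))
    return 100.0 * covered / len(chain_mdl)


def first_passing_scheme(chain_ref, chain_mdl):
    """Try the thresholds from strict to permissive; None if none passes."""
    for min_percent, cutoff, name in THRESHOLDS:
        if covered_percent(chain_ref, chain_mdl, cutoff) >= min_percent:
            return name
    return None


# toy chains: three positions along x, second chain shifted by 5 A along y
ref_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
mdl_chain = [(x, y + 5.0, z) for x, y, z in ref_chain]
scheme = first_passing_scheme(ref_chain, mdl_chain)
```

A result of ``None`` corresponds to the situation in which the overlap-based search fails and the extensive fallback would be needed.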

Definition at line 343 of file qsscoring.py.

◆ chain_mapping() [2/2]

chain_mapping (   self,
  chain_mapping 
)

Definition at line 405 of file qsscoring.py.

◆ chain_mapping_scheme()

chain_mapping_scheme (   self)
Mapping scheme used to get :attr:`chain_mapping`.

Possible values:

- 'strict': 80% overlap needed within 4 Angstrom (overlap based mapping).
- 'tolerant': 40% overlap needed within 6 Angstrom (overlap based mapping).
- 'permissive': 20% overlap needed within 8 Angstrom (overlap based
  mapping). It's best if you check mapping manually!
- 'extensive': Extensive search used for mapping detection (fallback). This
  approach has known limitations and may be removed in future versions.
  Mapping should be checked manually!
- 'user': :attr:`chain_mapping` was set by user before first use of this
  attribute.

:getter: Computed with :attr:`chain_mapping` on first use (cached)
:type: :class:`str`
:raises: :class:`QSscoreError` as in :attr:`chain_mapping`.

Definition at line 409 of file qsscoring.py.

◆ chem_mapping() [1/2]

chem_mapping (   self)
Inter-complex mapping of chemical groups.

Each group (see :attr:`QSscoreEntity.chem_groups`) is mapped according to
highest sequence identity. Alignment is between longest sequences in groups.

Limitations:

- If the numbers of groups differ, we map only the groups of the complex
  with fewer groups (the rest is considered unmapped and shown as a warning)
- The mapping is forced: the "best" mapping will be chosen independently of
  how low the seq. identity may be

:getter: Computed on first use (cached)
:type: :class:`dict` with key = :class:`tuple` of chain names in
       :attr:`qs_ent_1` and value = :class:`tuple` of chain names in
       :attr:`qs_ent_2`.

:raises: :class:`QSscoreError` if we end up having no chains for either
         entity in the mapping (can happen if chains do not have CA atoms).
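
The idea of a forced mapping by highest similarity can be sketched as below. Several things here are assumptions for illustration only: the function names are hypothetical, the greedy pairing strategy is a guess at one simple way to do it, and ``difflib.SequenceMatcher`` is used as a crude stand-in for the sequence identity of a real alignment.

```python
from difflib import SequenceMatcher


def similarity(seq_a, seq_b):
    # stand-in for sequence identity (assumption): difflib ratio, no real alignment
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def map_groups(groups_1, groups_2):
    """Greedily pair chemical groups by decreasing similarity.

    groups_1, groups_2: dicts with key = tuple of chain names and
    value = longest sequence of that group.
    """
    pairs = sorted(((similarity(s1, s2), g1, g2)
                    for g1, s1 in groups_1.items()
                    for g2, s2 in groups_2.items()), reverse=True)
    mapping, used_2 = {}, set()
    for _, g1, g2 in pairs:
        # forced mapping: no minimal similarity is required
        if g1 not in mapping and g2 not in used_2:
            mapping[g1] = g2
            used_2.add(g2)
    return mapping


# toy sequences (made up)
groups_1 = {('A', 'B'): 'MKTAYIAKQR', ('C',): 'GSHMLEDPVV'}
groups_2 = {('X',): 'GSHMLEDPVA', ('Y', 'Z'): 'MKTAYIAKQR', ('W',): 'PPPPPPPPPP'}
chem_map = map_groups(groups_1, groups_2)
```

The second complex has one group more than the first, so group ``('W',)`` stays unmapped, matching the first limitation listed above.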

Definition at line 211 of file qsscoring.py.

◆ chem_mapping() [2/2]

chem_mapping (   self,
  chem_mapping 
)

Definition at line 237 of file qsscoring.py.

◆ clustalw_bin() [1/2]

clustalw_bin (   self)
Full path to ``clustalw`` or ``clustalw2`` executable to use for multiple
sequence alignments (unless :attr:`chain_mapping` is provided manually).

:getter: Located in path on first use (cached)
:type: :class:`str`
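
Locating the executable in ``PATH`` can be sketched with the standard library alone. This is not how the attribute is implemented internally. The helper name is hypothetical, and it returns ``None`` when nothing is found instead of raising an exception.

```python
import shutil


def find_clustalw(candidates=('clustalw', 'clustalw2')):
    """Return the full path of the first candidate found in PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


clustalw_path = find_clustalw()   # None if neither executable is installed
```

A path found this way could then be assigned to :attr:`clustalw_bin` before the first alignment is computed.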

Definition at line 542 of file qsscoring.py.

◆ clustalw_bin() [2/2]

clustalw_bin (   self,
  clustalw_bin 
)

Definition at line 555 of file qsscoring.py.

◆ ent_to_cm_1() [1/2]

ent_to_cm_1 (   self)
Subset of :attr:`qs_ent_1` used to compute chain mapping and symmetries.

Properties:

- Includes only residues aligned according to :attr:`chem_mapping`
- Includes only 1 CA atom per residue
- Has at least 5 and at most :attr:`max_ca_per_chain_for_cm` atoms per chain
- All chains of the same chemical group have the same number of atoms
  (also in :attr:`ent_to_cm_2` according to :attr:`chem_mapping`)
- All chains appearing in :attr:`chem_mapping` appear in this entity
  (so the two can be safely used together)

This entity might be transformed (i.e. all positions rotated/translated by
same transformation matrix) if this can speed up computations. So do not
assume fixed global positions (but relative distances will remain fixed).

:getter: Computed on first use (cached)
:type: :class:`~ost.mol.EntityHandle`

:raises: :class:`QSscoreError` if any chain ends up having fewer than 5 residues.

Definition at line 241 of file qsscoring.py.

◆ ent_to_cm_1() [2/2]

ent_to_cm_1 (   self,
  ent_to_cm_1 
)

Definition at line 268 of file qsscoring.py.

◆ ent_to_cm_2() [1/2]

ent_to_cm_2 (   self)
Subset of :attr:`qs_ent_2` used to compute chain mapping and symmetries
(see :attr:`ent_to_cm_1` for details).

Definition at line 272 of file qsscoring.py.

◆ ent_to_cm_2() [2/2]

ent_to_cm_2 (   self,
  ent_to_cm_2 
)

Definition at line 281 of file qsscoring.py.

◆ GetOligoLDDTScorer()

GetOligoLDDTScorer (   self,
  settings,
  penalize_extra_chains = True 
)
:return: :class:`OligoLDDTScorer` object, setup for this QS scoring problem.
         The scorer is set up with :attr:`qs_ent_1` as the reference and
         :attr:`qs_ent_2` as the model.
:param settings: Passed to :class:`OligoLDDTScorer` constructor.
:param penalize_extra_chains: Passed to :class:`OligoLDDTScorer` constructor.

Definition at line 558 of file qsscoring.py.

◆ global_score()

global_score (   self)
QS-score with penalties.

The range of the score is between 0 (i.e. no interface residues are shared
between biounits) and 1 (i.e. the interfaces are identical).

The global QS-score is computed applying penalties when interface residues
or entire chains are missing (i.e. anything that is not mapped in
:attr:`mapped_residues` / :attr:`chain_mapping`) in one of the biounits.

:getter: Computed on first use (cached)
:type: :class:`float`
:raises: :class:`QSscoreError` if only one chain is mapped

Definition at line 487 of file qsscoring.py.

◆ mapped_residues() [1/2]

mapped_residues (   self)
Mapping of shared residues in :attr:`alignments`.

:getter: Computed on first use (cached)
:type: :class:`dict` *mapped_residues[c1][r1] = r2* with:
       *c1* = Chain name in first entity (= first sequence in aln),
       *r1* = Residue number in first entity,
       *r2* = Residue number in second entity
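
The layout of this dictionary can be illustrated by deriving it from a pair of gapped sequences. The helper below is hypothetical and assumes consecutive residue numbering from a given start. The real attribute takes residue numbers from the views attached to the alignments.

```python
def map_residues(aln_1, aln_2, start_1=1, start_2=1):
    """Map residue numbers for columns where both sequences have a residue."""
    mapping = {}
    num_1, num_2 = start_1, start_2
    for c1, c2 in zip(aln_1, aln_2):
        if c1 != '-' and c2 != '-':
            mapping[num_1] = num_2
        # advance the residue counters only for non-gap characters
        if c1 != '-':
            num_1 += 1
        if c2 != '-':
            num_2 += 1
    return mapping


# one entry per chain of the first entity (toy alignment, made up)
mapped_residues = {'A': map_residues('MK-TAY', 'MKQT-Y')}
r2 = mapped_residues['A'][3]   # residue 3 of chain A in the first entity
```

Residues facing a gap (residue 4 of the first sequence here) do not appear as keys.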

Definition at line 469 of file qsscoring.py.

◆ mapped_residues() [2/2]

mapped_residues (   self,
  mapped_residues 
)

Definition at line 483 of file qsscoring.py.

◆ SetSymmetries()

SetSymmetries (   self,
  symm_1,
  symm_2 
)
Set user-provided symmetry groups.

These groups are restricted to chain names appearing in :attr:`ent_to_cm_1`
and :attr:`ent_to_cm_2`, respectively. They are only valid if they cover all
chains and the symmetry group tuples of *symm_1* and *symm_2* have the same
length. Otherwise the trivial symmetry group is used (see :attr:`symm_1`).

:param symm_1: Value to set for :attr:`symm_1`.
:param symm_2: Value to set for :attr:`symm_2`.
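
The validity rules stated above can be sketched as a standalone check. The function names are hypothetical and the chain names are passed in explicitly. The sketch does not check the structural compatibility of the two sets of groups.

```python
def groups_are_valid(symm, chain_names):
    """True if all tuples have the same length and cover each chain exactly once."""
    if not symm:
        return False
    flat = [chain for group in symm for chain in group]
    same_length = len({len(group) for group in symm}) == 1
    return same_length and sorted(flat) == sorted(chain_names)


def checked_symmetries(symm_1, symm_2, chains_1, chains_2):
    """Return the given groups if valid, else trivial symmetry groups."""
    if (groups_are_valid(symm_1, chains_1)
            and groups_are_valid(symm_2, chains_2)
            and len(symm_1[0]) == len(symm_2[0])):
        return symm_1, symm_2
    # trivial symmetry group: one tuple holding all chains
    return [tuple(chains_1)], [tuple(chains_2)]


good = checked_symmetries([('A', 'B'), ('C', 'D')], [('E', 'F'), ('G', 'H')],
                          ['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H'])
# chain D is missing from the first set of groups, so the fallback is used
bad = checked_symmetries([('A', 'B'), ('C',)], [('E', 'F'), ('G', 'H')],
                         ['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H'])
```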

Definition at line 323 of file qsscoring.py.

◆ superposition()

superposition (   self)
Superposition result based on shared CA atoms in :attr:`alignments`.

The superposition can be used to map :attr:`QSscoreEntity.ent` of
:attr:`qs_ent_1` onto the one of :attr:`qs_ent_2`. Use
:func:`ost.geom.Invert` if you need the opposite transformation.

:getter: Computed on first use (cached)
:type: :class:`ost.mol.alg.SuperpositionResult`

Definition at line 522 of file qsscoring.py.

◆ symm_1()

symm_1 (   self)
Symmetry groups for :attr:`qs_ent_1` used to speed up chain mapping.

This is a list of chain-lists where each chain-list can be used to reconstruct
the others via cyclic C or dihedral D symmetry. The first chain-list is used
as a representative symmetry group. For heteromers, the group members must
contain all distinct seqres of the oligomer.

Example: symm. groups [(A,B,C), (D,E,F), (G,H,I)] means that there are
symmetry transformations to get (D,E,F) and (G,H,I) from (A,B,C).

Properties:

- All symmetry group tuples have the same length (num. of chains)
- All chains in :attr:`ent_to_cm_1` appear (w/o duplicates)
- For heteromers: symmetry group tuples have all different chem. groups
- Trivial symmetry group = one tuple with all chains (used if inconsistent
  data provided or if no symmetry is found)
- Either compatible with :attr:`symm_2` or trivial symmetry groups are used.
  Compatibility requires same lengths of symmetry group tuples and it must
  be possible to get an overlap (80% of residues covered within 6 A of a
  (chem. mapped) chain) of all chains in representative symmetry groups by
  superposing one pair of chains.

:getter: Computed on first use (cached)
:type: :class:`list` of :class:`tuple` of :class:`str` (chain names)

Definition at line 285 of file qsscoring.py.

◆ symm_2()

symm_2 (   self)
Symmetry groups for :attr:`qs_ent_2` (see :attr:`symm_1` for details).

Definition at line 317 of file qsscoring.py.

Field Documentation

◆ _alignments

_alignments
protected

Definition at line 203 of file qsscoring.py.

◆ _best_score

_best_score
protected

Definition at line 206 of file qsscoring.py.

◆ _chain_mapping

_chain_mapping
protected

Definition at line 201 of file qsscoring.py.

◆ _chain_mapping_scheme

_chain_mapping_scheme
protected

Definition at line 202 of file qsscoring.py.

◆ _chem_mapping

_chem_mapping
protected

Definition at line 196 of file qsscoring.py.

◆ _clustalw_bin

_clustalw_bin
protected

Definition at line 208 of file qsscoring.py.

◆ _ent_to_cm_1

_ent_to_cm_1
protected

Definition at line 197 of file qsscoring.py.

◆ _ent_to_cm_2

_ent_to_cm_2
protected

Definition at line 198 of file qsscoring.py.

◆ _global_score

_global_score
protected

Definition at line 205 of file qsscoring.py.

◆ _mapped_residues

_mapped_residues
protected

Definition at line 204 of file qsscoring.py.

◆ _superposition

_superposition
protected

Definition at line 207 of file qsscoring.py.

◆ _symm_1

_symm_1
protected

Definition at line 199 of file qsscoring.py.

◆ _symm_2

_symm_2
protected

Definition at line 200 of file qsscoring.py.

◆ alignments

alignments

Definition at line 568 of file qsscoring.py.

◆ calpha_only

calpha_only

Definition at line 192 of file qsscoring.py.

◆ chain_mapping

chain_mapping

Definition at line 460 of file qsscoring.py.

◆ chem_mapping

chem_mapping

Definition at line 399 of file qsscoring.py.

◆ clustalw_bin

clustalw_bin

Definition at line 585 of file qsscoring.py.

◆ ent_to_cm_1

ent_to_cm_1

Definition at line 398 of file qsscoring.py.

◆ ent_to_cm_2

ent_to_cm_2

Definition at line 398 of file qsscoring.py.

◆ max_ca_per_chain_for_cm

max_ca_per_chain_for_cm

Definition at line 193 of file qsscoring.py.

◆ max_mappings_extensive

max_mappings_extensive

Definition at line 194 of file qsscoring.py.

◆ qs_ent_1

qs_ent_1

Definition at line 173 of file qsscoring.py.

◆ qs_ent_2

qs_ent_2

Definition at line 177 of file qsscoring.py.

◆ res_num_alignment

res_num_alignment

Definition at line 191 of file qsscoring.py.

◆ symm_1

symm_1

Definition at line 398 of file qsscoring.py.

◆ symm_2

symm_2

Definition at line 399 of file qsscoring.py.


The documentation for this class was generated from the following file: