2 Scoring of quaternary structures (QS). The QS scoring is according to the paper
3 by `Bertoni et al. <https://dx.doi.org/10.1038/s41598-017-09654-8>`_.
7 The `qsscoring` module is deprecated. Consider using the newer implementation
8 in :mod:`~ost.mol.alg.qsscore` instead.
14 - A default :class:`compound library <ost.conop.CompoundLib>` must be defined
15 and accessible via :func:`~ost.conop.GetDefaultLib`. This is set by default
16 when executing scripts with ``ost``. Otherwise, you must set this with
17 :func:`~ost.conop.SetDefaultLib`.
18 - ClustalW must be installed (unless you provide chain mappings)
19 - Python modules `numpy` and `scipy` must be installed and available
20 (e.g. use ``pip install scipy numpy``)
24 from ost
import mol, geom, conop, seq, settings, PushVerbosityLevel
25 from ost
import LogError, LogWarning, LogScript, LogInfo, LogVerbose, LogDebug
30 from scipy.special
import factorial
31 from scipy.special
import binom
32 from scipy.cluster.hierarchy
import fclusterdata
40 """Exception to be raised for "acceptable" exceptions in QS scoring.
42 Those are cases we might want to capture for default behavior.
47 """Object to compute QS scores.
49 Simple usage without any precomputed contacts, symmetries and mappings:
51 .. code-block:: python
54 from ost.mol.alg import qsscoring
56 # load two biounits to compare
57 ent_full = ost.io.LoadPDB('3ia3', remote=True)
58 ent_1 = ent_full.Select('cname=A,D')
59 ent_2 = ent_full.Select('cname=B,C')
61 ost.PushVerbosityLevel(3)
63 qs_scorer = qsscoring.QSscorer(ent_1, ent_2)
64 ost.LogScript('QSscore:', str(qs_scorer.global_score))
65 ost.LogScript('Chain mapping used:', str(qs_scorer.chain_mapping))
66 # commonly you want the QS global score as output
67 qs_score = qs_scorer.global_score
68 except qsscoring.QSscoreError as ex:
69 # default handling: report failure and set score to 0
70 ost.LogError('QSscore failed:', str(ex))
73 For maximal performance when computing QS scores of the same entity with many
74 others, it is advisable to construct and reuse :class:`QSscoreEntity` objects.
76 Any known / precomputed information can be filled into the appropriate
77 attribute here (no checks done!). Otherwise most quantities are computed on
78 first access and cached (lazy evaluation). Setters are provided to set values
79 with extra checks (e.g. :func:`SetSymmetries`).
81 All necessary seq. alignments are done by global BLOSUM62-based alignment. A
82 multiple sequence alignment is performed with ClustalW unless
83 :attr:`chain_mapping` is provided manually. You will need to have an
84 executable ``clustalw`` or ``clustalw2`` in your ``PATH`` or you must set
85 :attr:`clustalw_bin` accordingly. Otherwise an exception
86 (:class:`ost.settings.FileNotFound`) is thrown.
88 Formulas for QS scores:
92 - QS_best = weighted_scores / (weight_sum + weight_extra_mapped)
93 - QS_global = weighted_scores / (weight_sum + weight_extra_all)
94 -> weighted_scores = sum(w(min(d1,d2)) * (1 - abs(d1-d2)/12)) for shared
95 -> weight_sum = sum(w(min(d1,d2))) for shared
96 -> weight_extra_mapped = sum(w(d)) for all mapped but non-shared
97 -> weight_extra_all = sum(w(d)) for all non-shared
98 -> w(d) = 1 if d <= 5, exp(-2 * ((d-5.0)/4.28)^2) else
100 In the formulas above:
102 * "d": CA/CB-CA/CB distance of an "inter-chain contact" ("d1", "d2" for
104 * "mapped": we could map chains of two structures and align residues in
106 * "shared": pairs of residues which are "mapped" and have
107 "inter-chain contact" in both structures.
108 * "inter-chain contact": CB-CB pairs (CA for GLY) with distance <= 12 A
109 (fallback to CA-CA if :attr:`calpha_only` is True).
110 * "w(d)": weighting function (prob. of 2 res. to interact given CB distance)
111 from `Xu et al. 2009 <https://dx.doi.org/10.1016%2Fj.jmb.2008.06.002>`_.
113 :param ent_1: First structure to be scored.
114 :type ent_1: :class:`QSscoreEntity`, :class:`~ost.mol.EntityHandle` or
115 :class:`~ost.mol.EntityView`
116 :param ent_2: Second structure to be scored.
117 :type ent_2: :class:`QSscoreEntity`, :class:`~ost.mol.EntityHandle` or
118 :class:`~ost.mol.EntityView`
119 :param res_num_alignment: Sets :attr:`res_num_alignment`
121 :raises: :class:`QSscoreError` if input structures are invalid or are monomers
122 or have issues that make it impossible for a QS score to be computed.
124 .. attribute:: qs_ent_1
126 :class:`QSscoreEntity` object for *ent_1* given at construction.
127 If entity names (:attr:`~QSscoreEntity.original_name`) are not unique, we
128 set it to 'pdb_1' using :func:`~QSscoreEntity.SetName`.
130 .. attribute:: qs_ent_2
132 :class:`QSscoreEntity` object for *ent_2* given at construction.
133 If entity names (:attr:`~QSscoreEntity.original_name`) are not unique, we
134 set it to 'pdb_2' using :func:`~QSscoreEntity.SetName`.
136 .. attribute:: calpha_only
138 True if any of the two structures is CA-only (after cleanup).
142 .. attribute:: max_ca_per_chain_for_cm
144 Maximal number of CA atoms to use in each chain to determine chain mappings.
145 Setting this to -1 disables the limit. Limiting it speeds up determination
146 of symmetries and chain mappings. By default it is set to 100.
150 .. attribute:: max_mappings_extensive
152 Maximal number of chain mappings to test for 'extensive'
153 :attr:`chain_mapping_scheme`. The extensive chain mapping search must in the
154 worst case check O(N^2) * O(N!) possible mappings for complexes with N
155 chains. Two octamers without symmetry would require 322560 mappings to be
156 checked. To limit computations, a :class:`QSscoreError` is thrown if we try
157 more than the maximal number of chain mappings.
158 The value must be set before the first use of :attr:`chain_mapping`.
159 By default it is set to 100000.
163 .. attribute:: res_num_alignment
165 Forces each alignment in :attr:`alignments` to be based on residue numbers
166 instead of using a global BLOSUM62-based alignment.
170 def __init__(self, ent_1, ent_2, res_num_alignment=False):
172 if isinstance(ent_1, QSscoreEntity):
176 if isinstance(ent_2, QSscoreEntity):
181 if not self.qs_ent_1.is_valid
or not self.qs_ent_2.is_valid:
184 if self.qs_ent_1.original_name == self.qs_ent_2.original_name:
185 self.qs_ent_1.SetName(
'pdb_1')
186 self.qs_ent_2.SetName(
'pdb_2')
188 self.qs_ent_1.SetName(self.qs_ent_1.original_name)
189 self.qs_ent_2.SetName(self.qs_ent_2.original_name)
192 self.
calpha_only = self.qs_ent_1.calpha_only
or self.qs_ent_2.calpha_only
212 """Inter-complex mapping of chemical groups.
214 Each group (see :attr:`QSscoreEntity.chem_groups`) is mapped according to
215 highest sequence identity. Alignment is between longest sequences in groups.
219 - If different numbers of groups, we map only the groups for the complex
220 with less groups (rest considered unmapped and shown as warning)
221 - The mapping is forced: the "best" mapping will be chosen independently of
222 how low the seq. identity may be
224 :getter: Computed on first use (cached)
225 :type: :class:`dict` with key = :class:`tuple` of chain names in
226 :attr:`qs_ent_1` and value = :class:`tuple` of chain names in
229 :raises: :class:`QSscoreError` if we end up having no chains for either
230 entity in the mapping (can happen if chains do not have CA atoms).
242 """Subset of :attr:`qs_ent_1` used to compute chain mapping and symmetries.
246 - Includes only residues aligned according to :attr:`chem_mapping`
247 - Includes only 1 CA atom per residue
248 - Has at least 5 and at most :attr:`max_ca_per_chain_for_cm` atoms per chain
249 - All chains of the same chemical group have the same number of atoms
250 (also in :attr:`ent_to_cm_2` according to :attr:`chem_mapping`)
251 - All chains appearing in :attr:`chem_mapping` appear in this entity
252 (so the two can be safely used together)
254 This entity might be transformed (i.e. all positions rotated/translated by
255 same transformation matrix) if this can speed up computations. So do not
256 assume fixed global positions (but relative distances will remain fixed).
258 :getter: Computed on first use (cached)
259 :type: :class:`~ost.mol.EntityHandle`
261 :raises: :class:`QSscoreError` if any chain ends up having less than 5 res.
273 """Subset of :attr:`qs_ent_1` used to compute chain mapping and symmetries
274 (see :attr:`ent_to_cm_1` for details).
286 """Symmetry groups for :attr:`qs_ent_1` used to speed up chain mapping.
288 This is a list of chain-lists where each chain-list can be used reconstruct
289 the others via cyclic C or dihedral D symmetry. The first chain-list is used
290 as a representative symmetry group. For heteromers, the group-members must
291 contain all different seqres in oligomer.
293 Example: symm. groups [(A,B,C), (D,E,F), (G,H,I)] means that there are
294 symmetry transformations to get (D,E,F) and (G,H,I) from (A,B,C).
298 - All symmetry group tuples have the same length (num. of chains)
299 - All chains in :attr:`ent_to_cm_1` appear (w/o duplicates)
300 - For heteros: symmetry group tuples have all different chem. groups
301 - Trivial symmetry group = one tuple with all chains (used if inconsistent
302 data provided or if no symmetry is found)
303 - Either compatible to :attr:`symm_2` or trivial symmetry groups used.
304 Compatibility requires same lengths of symmetry group tuples and it must
305 be possible to get an overlap (80% of residues covered within 6 A of a
306 (chem. mapped) chain) of all chains in representative symmetry groups by
307 superposing one pair of chains.
309 :getter: Computed on first use (cached)
310 :type: :class:`list` of :class:`tuple` of :class:`str` (chain names)
318 """Symmetry groups for :attr:`qs_ent_2` (see :attr:`symm_1` for details)."""
324 """Set user-provided symmetry groups.
326 These groups are restricted to chain names appearing in :attr:`ent_to_cm_1`
327 and :attr:`ent_to_cm_2` respectively. They are only valid if they cover all
328 chains and both *symm_1* and *symm_2* have same lengths of symmetry group
329 tuples. Otherwise trivial symmetry group used (see :attr:`symm_1`).
331 :param symm_1: Value to set for :attr:`symm_1`.
332 :param symm_2: Value to set for :attr:`symm_2`.
339 self.
_symm_1 = [tuple(ch.name
for ch
in self.ent_to_cm_1.chains)]
340 self.
_symm_2 = [tuple(ch.name
for ch
in self.ent_to_cm_2.chains)]
344 """Mapping from :attr:`ent_to_cm_1` to :attr:`ent_to_cm_2`.
348 - Mapping is between chains of same chem. group (see :attr:`chem_mapping`)
349 - Each chain can appear only once in mapping
350 - All chains of complex with less chains are mapped
351 - Symmetry (:attr:`symm_1`, :attr:`symm_2`) is taken into account
353 Details on algorithms used to find mapping:
355 - We try all pairs of chem. mapped chains within symmetry group and get
356 superpose-transformation for them
357 - First option: check for "sufficient overlap" of other chain-pairs
359 - For each chain-pair defined above: apply superposition to full oligomer
360 and map chains based on structural overlap
361 - Structural overlap = X% of residues in second oligomer covered within Y
362 Angstrom of a (chem. mapped) chain in first oligomer. We successively
363 try (X,Y) = (80,4), (40,6) and (20,8) to be less and less strict in
364 mapping (warning shown for most permissive one).
365 - If multiple possible mappings are found, we choose the one which leads
366 to the lowest multi-chain-RMSD given the superposition
368 - Fallback option: try all mappings to find minimal multi-chain-RMSD
371 - For each chain-pair defined above: apply superposition, try all (!)
372 possible chain mappings (within symmetry group) and keep mapping with
373 lowest multi-chain-RMSD
374 - Repeat procedure above to resolve symmetry. Within the symmetry group we
375 can use the chain mapping computed before and we just need to find which
376 symmetry group in first oligomer maps to which in the second one. We
377 again try all possible combinations...
380 - Trying all possible mappings is a combinatorial nightmare (factorial).
381 We throw an exception if too many combinations (e.g. octomer vs
382 octomer with no usable symmetry)
383 - The mapping is forced: the "best" mapping will be chosen independently
384 of how badly they fit in terms of multi-chain-RMSD
385 - As a result, such a forced mapping can lead to a large range of
386 resulting QS scores. An extreme example was observed between 1on3.1
387 and 3u9r.1, where :attr:`global_score` can range from 0.12 to 0.43
388 for mappings with very similar multi-chain-RMSD.
390 :getter: Computed on first use (cached)
391 :type: :class:`dict` with key / value = :class:`str` (chain names, key
392 for :attr:`ent_to_cm_1`, value for :attr:`ent_to_cm_2`)
393 :raises: :class:`QSscoreError` if there are too many combinations to check
394 to find a chain mapping (see :attr:`max_mappings_extensive`).
404 @chain_mapping.setter
410 """Mapping scheme used to get :attr:`chain_mapping`.
414 - 'strict': 80% overlap needed within 4 Angstrom (overlap based mapping).
415 - 'tolerant': 40% overlap needed within 6 Angstrom (overlap based mapping).
416 - 'permissive': 20% overlap needed within 8 Angstrom (overlap based
417 mapping). It's best if you check mapping manually!
418 - 'extensive': Extensive search used for mapping detection (fallback). This
419 approach has known limitations and may be removed in future versions.
420 Mapping should be checked manually!
421 - 'user': :attr:`chain_mapping` was set by user before first use of this
424 :getter: Computed with :attr:`chain_mapping` on first use (cached)
426 :raises: :class:`QSscoreError` as in :attr:`chain_mapping`.
439 """List of successful sequence alignments using :attr:`chain_mapping`.
441 There will be one alignment for each mapped chain and they are ordered by
442 their chain names in :attr:`qs_ent_1`.
444 The first sequence of each alignment belongs to :attr:`qs_ent_1` and the
445 second one to :attr:`qs_ent_2`. The sequences are named according to the
446 mapped chain names and have views attached into :attr:`QSscoreEntity.ent`
447 of :attr:`qs_ent_1` and :attr:`qs_ent_2`.
449 If :attr:`res_num_alignment` is False, each alignment is performed using a
450 global BLOSUM62-based alignment. Otherwise, the positions in the alignment
451 sequences are simply given by the residue number so that residues with
452 matching numbers are aligned.
454 :getter: Computed on first use (cached)
455 :type: :class:`list` of :class:`~ost.seq.AlignmentHandle`
458 self.
_alignments = _GetMappedAlignments(self.qs_ent_1.ent,
470 """Mapping of shared residues in :attr:`alignments`.
472 :getter: Computed on first use (cached)
473 :type: :class:`dict` *mapped_residues[c1][r1] = r2* with:
474 *c1* = Chain name in first entity (= first sequence in aln),
475 *r1* = Residue number in first entity,
476 *r2* = Residue number in second entity
482 @mapped_residues.setter
488 """QS-score with penalties.
490 The range of the score is between 0 (i.e. no interface residues are shared
491 between biounits) and 1 (i.e. the interfaces are identical).
493 The global QS-score is computed applying penalties when interface residues
494 or entire chains are missing (i.e. anything that is not mapped in
495 :attr:`mapped_residues` / :attr:`chain_mapping`) in one of the biounits.
497 :getter: Computed on first use (cached)
498 :type: :class:`float`
499 :raises: :class:`QSscoreError` if only one chain is mapped
507 """QS-score without penalties.
509 Like :attr:`global_score`, but neglecting additional residues or chains in
510 one of the biounits (i.e. the score is calculated considering only mapped
511 chains and residues).
513 :getter: Computed on first use (cached)
514 :type: :class:`float`
515 :raises: :class:`QSscoreError` if only one chain is mapped
523 """Superposition result based on shared CA atoms in :attr:`alignments`.
525 The superposition can be used to map :attr:`QSscoreEntity.ent` of
526 :attr:`qs_ent_1` onto the one of :attr:`qs_ent_2`. Use
527 :func:`ost.geom.Invert` if you need the opposite transformation.
529 :getter: Computed on first use (cached)
530 :type: :class:`ost.mol.alg.SuperpositionResult`
535 sup_rmsd = self._superposition.rmsd
536 cmp_view = self._superposition.view1
537 LogInfo(
'CA RMSD for %s aligned residues on %s chains: %.2f' \
538 % (cmp_view.residue_count, cmp_view.chain_count, sup_rmsd))
544 Full path to ``clustalw`` or ``clustalw2`` executable to use for multiple
545 sequence alignments (unless :attr:`chain_mapping` is provided manually).
547 :getter: Located in path on first use (cached)
551 self.
_clustalw_bin = settings.Locate((
'clustalw',
'clustalw2'))
560 :return: :class:`OligoLDDTScorer` object, setup for this QS scoring problem.
561 The scorer is set up with :attr:`qs_ent_1` as the reference and
562 :attr:`qs_ent_2` as the model.
563 :param settings: Passed to :class:`OligoLDDTScorer` constructor.
564 :param penalize_extra_chains: Passed to :class:`OligoLDDTScorer` constructor.
566 if penalize_extra_chains:
579 def _ComputeAlignedEntities(self):
580 """Fills cached ent_to_cm_1 and ent_to_cm_2."""
590 self._ent_to_cm_1.SetName(self.qs_ent_1.GetName())
591 self._ent_to_cm_2.SetName(self.qs_ent_2.GetName())
593 def _ComputeSymmetry(self):
594 """Fills cached symm_1 and symm_2."""
601 self.
_symm_1 = [tuple(ch.name
for ch
in self.ent_to_cm_1.chains)]
602 self.
_symm_2 = [tuple(ch.name
for ch
in self.ent_to_cm_2.chains)]
604 def _ComputeScores(self):
605 """Fills cached global_score and best_score."""
607 raise QSscoreError(
"QS-score is not defined for monomers")
610 contacts_1 = self.qs_ent_1.contacts_ca
611 contacts_2 = self.qs_ent_2.contacts_ca
613 contacts_1 = self.qs_ent_1.contacts
614 contacts_2 = self.qs_ent_2.contacts
621 LogInfo(
'QSscore %s, %s: best: %.2f, global: %.2f' \
622 % (self.qs_ent_1.GetName(), self.qs_ent_2.GetName(),
631 """Entity with cached entries for QS scoring.
633 Any known / precomputed information can be filled into the appropriate
634 attribute here as long as they are labelled as read/write. Otherwise the
635 quantities are computed on first access and cached (lazy evaluation). The
636 heaviest load is expected when computing :attr:`contacts` and
639 :param ent: Entity to be used for QS scoring. A copy of it will be processed.
640 :type ent: :class:`~ost.mol.EntityHandle` or :class:`~ost.mol.EntityView`
642 .. attribute:: is_valid
644 True, if successfully initialized. False, if input structure has no protein
645 chains with >= 20 residues.
649 .. attribute:: original_name
651 Name set for *ent* when object was created.
657 Cleaned version of *ent* passed at construction. Hydrogens are removed, the
658 entity is processed with a :class:`~ost.conop.RuleBasedProcessor` and chains
659 listed in :attr:`removed_chains` have been removed. The name of this entity
660 might change during scoring (see :func:`GetName`). Otherwise, this will be
663 :type: :class:`~ost.mol.EntityHandle`
665 .. attribute:: removed_chains
667 Chains removed from *ent* passed at construction. These are ligand and water
668 chains as well as small (< 20 res.) peptides or chains with no amino acids
669 (determined by chem. type, which is set by rule based processor).
671 :type: :class:`list` of :class:`str`
673 .. attribute:: calpha_only
675 Whether entity is CA-only (i.e. it has 0 CB atoms)
682 ent = mol.CreateEntityFromView(ent.Select(
'ele!=H and aname!=HN'),
True)
683 if not conop.GetDefaultLib():
684 raise RuntimeError(
"QSscore computation requires a compound library!")
687 self.ent, self.removed_chains, self.
calpha_only = _CleanInputEntity(ent)
689 if self.ent.chain_count == 0:
690 LogError(
'Bad input file: ' + ent.GetName() +
'. No chains left after '
691 'removing water, ligands and small peptides.')
693 elif self.ent.chain_count == 1:
694 LogWarning(
'Structure ' + ent.GetName() +
' is a monomer.')
709 """Wrapper to :func:`~ost.mol.EntityHandle.GetName` of :attr:`ent`.
710 This is used to uniquely identify the entity while scoring. The name may
711 therefore change while :attr:`original_name` remains fixed.
714 return self.ent.GetName()
717 """Wrapper to :func:`~ost.mol.EntityHandle.SetName` of :attr:`ent`.
718 Use this to change unique identifier while scoring (see :func:`GetName`).
721 self.ent.SetName(new_name)
726 Reduced representation of :attr:`ent` with only CA atoms.
727 This guarantees that each included residue has exactly one atom.
729 :getter: Computed on first use (cached)
730 :type: :class:`~ost.mol.EntityHandle`
739 Map of chain names in :attr:`ent` to sequences with attached view to CA-only
740 chains (into :attr:`ca_entity`). Useful for alignments and superpositions.
742 :getter: Computed on first use (cached)
743 :type: :class:`dict` (key = :class:`str`,
744 value = :class:`~ost.seq.SequenceHandle`)
749 for ch
in ca_entity.chains:
750 self.
_ca_chains[ch.name] = seq.SequenceFromChain(ch.name, ch)
754 """Get sequence alignment of chain *c1* with chain *c2*.
755 Computed on first use based on :attr:`ca_chains` (cached).
757 :param c1: Chain name for first chain to align.
758 :type c1: :class:`str`
759 :param c2: Chain name for second chain to align.
760 :type c2: :class:`str`
761 :rtype: :class:`~ost.seq.AlignmentHandle` or None if it failed.
765 self.
_alignments[(c1,c2)] = _AlignAtomSeqs(ca_chains[c1], ca_chains[c2])
771 Intra-complex group of chemically identical (seq. id. > 95%) polypeptide
772 chains as extracted from :attr:`ca_chains`. First chain in group is the one
773 with the longest sequence.
775 :getter: Computed on first use (cached)
776 :type: :class:`list` of :class:`list` of :class:`str` (chain names)
780 LogInfo(
'Chemically equivalent chain-groups in %s: %s' \
785 """Get Euler angles from superposition of chain *c1* with chain *c2*.
786 Computed on first use based on :attr:`ca_chains` (cached).
788 :param c1: Chain name for first chain to superpose.
789 :type c1: :class:`str`
790 :param c2: Chain name for second chain to superpose.
791 :type c2: :class:`str`
792 :return: 3 Euler angles (may contain nan if something fails).
793 :rtype: :class:`numpy.array`
795 if (c1,c2)
not in self.
_angles:
800 """Get axis of symmetry from superposition of chain *c1* with chain *c2*.
801 Computed on first use based on :attr:`ca_chains` (cached).
803 :param c1: Chain name for first chain to superpose.
804 :type c1: :class:`str`
805 :param c2: Chain name for second chain to superpose.
806 :type c2: :class:`str`
807 :return: Rotational axis (may contain nan if something fails).
808 :rtype: :class:`numpy.array`
810 if (c1,c2)
not in self.
_axis:
812 return self.
_axis[(c1,c2)]
817 Connectivity dictionary (**read/write**).
818 As given by :func:`GetContacts` with *calpha_only* = False on :attr:`ent`.
820 :getter: Computed on first use (cached)
821 :setter: Uses :func:`FilterContacts` to ensure that we only keep contacts
822 for chains in the cleaned entity.
823 :type: See return type of :func:`GetContacts`
831 chain_names = set([ch.name
for ch
in self.ent.chains])
837 CA-only connectivity dictionary (**read/write**).
838 Like :attr:`contacts` but with *calpha_only* = True in :func:`GetContacts`.
846 chain_names = set([ch.name
for ch
in self.ent.chains])
853 def _GetSuperposeData(self, c1, c2):
854 """Fill _angles and _axis from superposition of CA chains of c1 and c2."""
859 self.
_angles[(c1,c2)] = np.empty(3) * np.nan
860 self.
_axis[(c1,c2)] = np.empty(3) * np.nan
862 v1, v2 = seq.ViewsFromAlignment(aln)
863 if v1.atom_count < 3:
865 self.
_angles[(c1,c2)] = np.empty(3) * np.nan
866 self.
_axis[(c1,c2)] = np.empty(3) * np.nan
869 sup_res = mol.alg.SuperposeSVD(v1, v2, apply_transform=
False)
870 Rt = sup_res.transformation
872 a,b,c = _GetAngles(Rt)
873 self.
_angles[(c1,c2)] = np.asarray([a,b,c])
876 self.
_axis[(c1,c2)] = np.asarray(R3.GetRotationAxis().data)
883 """Filter contacts to contain only contacts for chains in *chain_names*.
885 :param contacts: Connectivity dictionary as produced by :func:`GetContacts`.
886 :type contacts: :class:`dict`
887 :param chain_names: Chain names to keep.
888 :type chain_names: :class:`list` or (better) :class:`set`
889 :return: New connectivity dictionary (format as in :func:`GetContacts`)
890 :rtype: :class:`dict`
893 filtered_contacts = dict()
895 if c1
in chain_names:
896 new_contacts = dict()
897 for c2
in contacts[c1]:
898 if c2
in chain_names:
899 new_contacts[c2] = contacts[c1][c2]
902 filtered_contacts[c1] = new_contacts
903 return filtered_contacts
906 """Get inter-chain contacts of a macromolecular entity.
908 Contacts are pairs of residues within a given distance belonging to different
909 chains. They are stored once per pair and include the CA/CB-CA/CB distance.
911 :param entity: An entity to check connectivity for.
912 :type entity: :class:`~ost.mol.EntityHandle` or :class:`~ost.mol.EntityView`
913 :param calpha_only: If True, we only consider CA-CA distances. Else, we use CB
914 unless the residue is a GLY.
915 :type calpha_only: :class:`bool`
916 :param dist_thr: Maximal CA/CB-CA/CB distance to be considered in contact.
917 :type dist_thr: :class:`float`
918 :return: A connectivity dictionary. A pair of residues with chain names
919 *ch_name1* & *ch_name2* (*ch_name1* < *ch_name2*), residue numbers
920 *res_num1* & *res_num2* and distance *dist* (<= *dist_thr*) are
921 stored as *result[ch_name1][ch_name2][res_num1][res_num2]* = *dist*.
922 :rtype: :class:`dict`
926 ev = entity.Select(
"aname=CA")
928 ev = entity.Select(
"(rname=GLY and aname=CA) or aname=CB")
929 ent = mol.CreateEntityFromView(ev,
True)
932 for atom
in ent.atoms:
933 ch_name1 = atom.chain.name
934 res_num1 = atom.residue.number.num
935 close_atoms = ent.FindWithin(atom.pos, dist_thr)
936 for close_atom
in close_atoms:
937 ch_name2 = close_atom.chain.name
938 if ch_name2 > ch_name1:
939 res_num2 = close_atom.residue.number.num
942 if ch_name1
not in contacts:
943 contacts[ch_name1] = dict()
944 if ch_name2
not in contacts[ch_name1]:
945 contacts[ch_name1][ch_name2] = dict()
946 if res_num1
not in contacts[ch_name1][ch_name2]:
947 contacts[ch_name1][ch_name2][res_num1] = dict()
948 contacts[ch_name1][ch_name2][res_num1][res_num2] = round(dist, 3)
957 """Helper class to calculate oligomeric lDDT scores.
959 This class can be used independently, but commonly it will be created by
960 calling :func:`QSscorer.GetOligoLDDTScorer`.
964 By construction, lDDT scores are not symmetric and hence it matters which
965 structure is the reference (:attr:`ref`) and which one is the model
966 (:attr:`mdl`). Extra residues in the model are generally not considered.
967 Extra chains in both model and reference can be considered by setting the
968 :attr:`penalize_extra_chains` flag to True.
970 :param ref: Sets :attr:`ref`
971 :param mdl: Sets :attr:`mdl`
972 :param alignments: Sets :attr:`alignments`
973 :param calpha_only: Sets :attr:`calpha_only`
974 :param settings: Sets :attr:`settings`
975 :param penalize_extra_chains: Sets :attr:`penalize_extra_chains`
976 :param chem_mapping: Sets :attr:`chem_mapping`. Must be given if
977 *penalize_extra_chains* is True.
982 Full reference/model entity to be scored. The entity must contain all chains
983 mapped in :attr:`alignments` and may also contain additional ones which are
984 considered if :attr:`penalize_extra_chains` is True.
986 :type: :class:`~ost.mol.EntityHandle`
988 .. attribute:: alignments
990 One alignment for each mapped chain of :attr:`ref`/:attr:`mdl` as defined in
991 :attr:`QSscorer.alignments`. The first sequence of each alignment belongs to
992 :attr:`ref` and the second one to :attr:`mdl`. Sequences must have sequence
993 naming and attached views as defined in :attr:`QSscorer.alignments`.
995 :type: :class:`list` of :class:`~ost.seq.AlignmentHandle`
997 .. attribute:: calpha_only
999 If True, restricts lDDT score to CA only.
1001 :type: :class:`bool`
1003 .. attribute:: settings
1005 Settings to use for lDDT scoring.
1007 :type: :class:`~ost.mol.alg.lDDTSettings`
1009 .. attribute:: penalize_extra_chains
1011 If True, extra chains in both :attr:`ref` and :attr:`mdl` will penalize the
1014 :type: :class:`bool`
1016 .. attribute:: chem_mapping
1018 Inter-complex mapping of chemical groups as defined in
1019 :attr:`QSscorer.chem_mapping`. Used to find "chem-mapped" chains in
1020 :attr:`ref` for unmapped chains in :attr:`mdl` when penalizing scores.
1021 Each unmapped model chain can add extra reference-contacts according to the
1022 average total contacts of each single "chem-mapped" reference chain. If
1023 there is no "chem-mapped" reference chain, a warning is shown and the model
1027 Only relevant if :attr:`penalize_extra_chains` is True.
1029 :type: :class:`dict` with key = :class:`tuple` of chain names in
1030 :attr:`ref` and value = :class:`tuple` of chain names in
1037 def __init__(self, ref, mdl, alignments, calpha_only, settings,
1038 penalize_extra_chains=
False, chem_mapping=
None):
1040 if chem_mapping
is None and penalize_extra_chains:
1041 raise RuntimeError(
"Must provide chem_mapping when requesting penalty "
1042 "for extra chains!")
1043 if not penalize_extra_chains:
1046 if unmapped_mdl_chains:
1047 LogWarning(
'MODEL contains chains unmapped to REFERENCE, '
1048 'lDDT is not considering MODEL chains %s' \
1049 % str(list(unmapped_mdl_chains)))
1051 ref_chains = set(ch.name
for ch
in ref.chains)
1052 mapped_ref_chains = set(aln.GetSequence(0).GetName()
for aln
in alignments)
1053 unmapped_ref_chains = (ref_chains - mapped_ref_chains)
1054 if unmapped_ref_chains:
1055 LogWarning(
'REFERENCE contains chains unmapped to MODEL, '
1056 'lDDT is not considering REFERENCE chains %s' \
1057 % str(list(unmapped_ref_chains)))
1078 """Oligomeric lDDT score.
1080 The score is computed as conserved contacts divided by the total contacts
1081 in the reference using the :attr:`oligo_lddt_scorer`, which uses the full
1082 complex as reference/model structure. If :attr:`penalize_extra_chains` is
1083 True, the reference/model complexes contain all chains (otherwise only the
1084 mapped ones) and additional contacts are added to the reference's total
1085 contacts for unmapped model chains according to the :attr:`chem_mapping`.
1087 The main difference with :attr:`weighted_lddt` is that the lDDT scorer
1088 "sees" the full complex here (incl. inter-chain contacts), while the
1089 weighted single chain score looks at each chain separately.
1091 :getter: Computed on first use (cached)
1092 :type: :class:`float`
1095 LogInfo(
'Reference %s has: %s chains' \
1096 % (self.ref.GetName(), self.ref.chain_count))
1097 LogInfo(
'Model %s has: %s chains' \
1098 % (self.mdl.GetName(), self.mdl.chain_count))
1102 denominator = self.oligo_lddt_scorer.total_contacts
1105 oligo_lddt = self.oligo_lddt_scorer.conserved_contacts \
1106 / float(denominator)
1110 oligo_lddt = self.oligo_lddt_scorer.global_score
1116 """Weighted average of single chain lDDT scores.
1118 The score is computed as a weighted average of single chain lDDT scores
1119 (see :attr:`sc_lddt_scorers`) using the total contacts of each single
1120 reference chain as weights. If :attr:`penalize_extra_chains` is True,
1121 unmapped chains are added with a 0 score and total contacts taken from
1122 the actual reference chains or (for unmapped model chains) using the
1123 :attr:`chem_mapping`.
1125 See :attr:`oligo_lddt` for a comparison of the two scores.
1127 :getter: Computed on first use (cached)
1128 :type: :class:`float`
1133 nominator = sum([s * w
for s, w
in zip(scores, weights)])
1136 denominator = sum(s.total_contacts
for s
in list(ref_scorers.values()))
1139 denominator = sum(weights)
1148 """The reference entity used for oligomeric lDDT scoring
1149 (:attr:`oligo_lddt` / :attr:`oligo_lddt_scorer`).
1151 Since the lDDT computation requires a single chain with mapped residue
1152 numbering, all chains of :attr:`ref` are appended into a single chain X with
1153 unique residue numbers according to the column-index in the alignment. The
1154 alignments are in the same order as they appear in :attr:`alignments`.
1155 Additional residues are appended at the end of the chain with unique residue
1156 numbers. Unmapped chains are only added if :attr:`penalize_extra_chains` is
1157 True. Only CA atoms are considered if :attr:`calpha_only` is True.
1159 :getter: Computed on first use (cached)
1160 :type: :class:`~ost.mol.EntityHandle`
1168 """The model entity used for oligomeric lDDT scoring
1169 (:attr:`oligo_lddt` / :attr:`oligo_lddt_scorer`).
1171 Like :attr:`lddt_ref`, this is a single chain X containing all chains of
1172 :attr:`mdl`. The residue numbers match the ones in :attr:`lddt_ref` where
1173 aligned and have unique numbers for additional residues.
1175 :getter: Computed on first use (cached)
1176 :type: :class:`~ost.mol.EntityHandle`
1184 """lDDT Scorer object for :attr:`lddt_ref` and :attr:`lddt_mdl`.
1186 :getter: Computed on first use (cached)
1187 :type: :class:`~ost.mol.alg.lDDTScorer`
1191 references=[self.lddt_ref.Select(
"")],
1192 model=self.lddt_mdl.Select(
""),
1198 """List of scorer objects for each chain mapped in :attr:`alignments`.
1200 :getter: Computed on first use (cached)
1201 :type: :class:`list` of :class:`MappedLDDTScorer`
1208 self._mapped_lddt_scorers.append(mapped_lddt_scorer)
1213 """List of lDDT scorer objects extracted from :attr:`mapped_lddt_scorers`.
1215 :type: :class:`list` of :class:`~ost.mol.alg.lDDTScorer`
1221 """List of global scores extracted from :attr:`sc_lddt_scorers`.
1223 If scoring for a mapped chain fails, an error is displayed and a score of 0
1226 :getter: Computed on first use (cached)
1227 :type: :class:`list` of :class:`float`
1233 self._sc_lddt.append(lddt_scorer.global_score)
1234 except Exception
as ex:
1235 LogError(
'Single chain lDDT failed:', str(ex))
1236 self._sc_lddt.append(0.0)
1243 def _PrepareOligoEntities(self):
1250 def _GetUnmappedMdlChains(mdl, alignments):
1252 mdl_chains = set(ch.name
for ch
in mdl.chains)
1253 mapped_mdl_chains = set(aln.GetSequence(1).GetName()
for aln
in alignments)
1254 return (mdl_chains - mapped_mdl_chains)
1256 def _GetRefScorers(self):
1260 ref_scorers = dict()
1262 ref_ch_name = mapped_lddt_scorer.reference_chain_name
1263 ref_scorers[ref_ch_name] = mapped_lddt_scorer.lddt_scorer
1265 for ch
in self.ref.chains:
1266 if ch.name
not in ref_scorers:
1268 ref_chain = ch.Select(
'aname=CA')
1270 ref_chain = ch.Select(
'')
1272 references=[ref_chain],
1280 def _GetModelPenalty(self):
1286 raise RuntimeError(
"Must provide chem_mapping when requesting penalty "
1287 "for extra model chains!")
1294 for ch_name_mdl
in sorted(unmapped_mdl_chains):
1297 for cm_ref, cm_mdl
in self.chem_mapping.items():
1298 if ch_name_mdl
in cm_mdl:
1301 for ch_name_ref
in cm_ref:
1303 cur_penalty += ref_scorers[ch_name_ref].total_contacts
1304 cur_penalty /= float(len(cm_ref))
1307 if cur_penalty
is None:
1308 LogWarning(
'Extra MODEL chain %s could not be chemically mapped to '
1309 'any chain in REFERENCE, lDDT cannot consider it!' \
1312 LogScript(
'Extra MODEL chain %s added to lDDT score by considering '
1313 'chemically mapped chains in REFERENCE.' % ch_name_mdl)
1314 model_penalty += cur_penalty
1322 """A simple class to calculate a single-chain lDDT score on a given chain to
1323 chain mapping as extracted from :class:`OligoLDDTScorer`.
1325 :param alignment: Sets :attr:`alignment`
1326 :param calpha_only: Sets :attr:`calpha_only`
1327 :param settings: Sets :attr:`settings`
1329 .. attribute:: alignment
1331 Alignment with two sequences named according to the mapped chains and with
1332 views attached to both sequences (e.g. one of the items of
1333 :attr:`QSscorer.alignments`).
1335 The first sequence is assumed to be the reference and the second one the
1336 model. Since the lDDT score is not symmetric (extra residues in model are
1337 ignored), the order is important.
1339 :type: :class:`~ost.seq.AlignmentHandle`
1341 .. attribute:: calpha_only
1343 If True, restricts lDDT score to CA only.
1345 :type: :class:`bool`
1347 .. attribute:: settings
1349 Settings to use for lDDT scoring.
1351 :type: :class:`~ost.mol.alg.lDDTSettings`
1353 .. attribute:: lddt_scorer
1355 lDDT Scorer object for the given chains.
1357 :type: :class:`~ost.mol.alg.lDDTScorer`
1359 .. attribute:: reference_chain_name
1361 Chain name of the reference.
1365 .. attribute:: model_chain_name
1367 Chain name of the model.
1386 :return: Scores for each residue
1387 :rtype: :class:`list` of :class:`dict` with one item for each residue
1388 existing in model and reference:
1390 - "residue_number": Residue number in reference chain
1391 - "residue_name": Residue name in reference chain
1392 - "lddt": local lDDT
1393 - "conserved_contacts": number of conserved contacts
1394 - "total_contacts": total number of contacts
1397 assigned_residues = list()
1399 self.lddt_scorer.global_score
1401 if col[0] !=
"-" and col.GetResidue(3).IsValid():
1402 ref_res = col.GetResidue(0)
1403 mdl_res = col.GetResidue(1)
1404 ref_res_renum = col.GetResidue(2)
1405 mdl_res_renum = col.GetResidue(3)
1406 if ref_res.one_letter_code != ref_res_renum.one_letter_code:
1407 raise RuntimeError(
"Reference residue name mapping inconsistent: %s != %s" %
1408 (ref_res.one_letter_code,
1409 ref_res_renum.one_letter_code))
1410 if mdl_res.one_letter_code != mdl_res_renum.one_letter_code:
1411 raise RuntimeError(
"Model residue name mapping inconsistent: %s != %s" %
1412 (mdl_res.one_letter_code,
1413 mdl_res_renum.one_letter_code))
1415 raise RuntimeError(
"Reference residue number mapping inconsistent: %s != %s" %
1416 (ref_res.GetNumber().num,
1419 raise RuntimeError(
"Model residue number mapping inconsistent: %s != %s" %
1420 (mdl_res.GetNumber().num,
1422 if ref_res.qualified_name
in assigned_residues:
1423 raise RuntimeError(
"Duplicated residue in reference: " %
1424 (ref_res.qualified_name))
1426 assigned_residues.append(ref_res.qualified_name)
1428 if mdl_res_renum.HasProp(self.settings.label):
1430 "residue_number": ref_res.GetNumber().num,
1431 "residue_name": ref_res.name,
1432 "lddt": mdl_res_renum.GetFloatProp(self.settings.label),
1433 "conserved_contacts": mdl_res_renum.GetFloatProp(self.settings.label +
"_conserved"),
1434 "total_contacts": mdl_res_renum.GetFloatProp(self.settings.label +
"_total")})
1441 def _InitScorer(self):
1443 aln = self.alignment.Copy()
1448 refseq = seq.CreateSequence(
1449 "reference_renumbered",
1450 aln.GetSequence(0).GetString())
1451 refseq.AttachView(reference)
1452 aln.AddSequence(refseq)
1456 modelseq = seq.CreateSequence(
1458 aln.GetSequence(1).GetString())
1459 modelseq.AttachView(model)
1460 aln.AddSequence(modelseq)
1464 references=[reference.Select(
'aname=CA')],
1465 model=model.Select(
'aname=CA'),
1469 references=[reference],
1481 def _AlignAtomSeqs(seq_1, seq_2):
1483 :type seq_1: :class:`ost.seq.SequenceHandle`
1484 :type seq_2: :class:`ost.seq.SequenceHandle`
1485 :return: Alignment of two sequences using a global alignment. Views attached
1486 to the input sequences will remain attached in the aln.
1487 :rtype: :class:`~ost.seq.AlignmentHandle` or None if it failed.
1492 alns = seq.alg.GlobalAlign(seq_1, seq_2, seq.alg.BLOSUM62)
1493 if alns: aln = alns[0]
1495 LogWarning(
'Failed to align %s to %s' % (seq_1.name, seq_2.name))
1496 LogWarning(
'%s: %s' % (seq_1.name, seq_1.string))
1497 LogWarning(
'%s: %s' % (seq_2.name, seq_2.string))
1500 def _FixSelectChainNames(ch_names):
1502 :return: String to be used with Select(cname=<RETURN>). Takes care of joining
1503 and putting quotation marks where needed.
1504 :rtype: :class:`str`
1505 :param ch_names: Some iterable list of chain names (:class:`str` items).
1507 return ','.join(mol.QueryQuoteName(ch_name)
for ch_name
in ch_names)
1511 def _CleanInputEntity(ent):
1513 :param ent: The OST entity to be cleaned.
1514 :type ent: :class:`EntityHandle` or :class:`EntityView`
1515 :return: A tuple of 3 items: :attr:`QSscoreEntity.ent`,
1516 :attr:`QSscoreEntity.removed_chains`,
1517 :attr:`QSscoreEntity.calpha_only`
1521 for ch
in ent.chains:
1524 if ch.name
in [
'-',
'_'] \
1525 or ch.residue_count < 20 \
1526 or not any(r.chem_type.IsAminoAcid()
for r
in ch.residues) \
1527 or not (set(r.one_letter_code
for r
in ch.residues) - {
'?',
'X'}):
1528 removed_chains.append(ch.name)
1532 view = ent.Select(
'cname!=%s' % _FixSelectChainNames(set(removed_chains)))
1533 ent_new = mol.CreateEntityFromView(view,
True)
1534 ent_new.SetName(ent.GetName())
1540 if ent_new.atom_count > 0
and ent_new.Select(
'aname=CB').atom_count == 0:
1541 LogInfo(
'Structure %s is a CA only structure!' % ent_new.GetName())
1546 LogInfo(
'Chains removed from %s: %s' \
1547 % (ent_new.GetName(),
''.join(removed_chains)))
1548 LogInfo(
'Chains in %s: %s' \
1549 % (ent_new.GetName(),
''.join([c.name
for c
in ent_new.chains])))
1550 return ent_new, removed_chains, calpha_only
1552 def _GetCAOnlyEntity(ent):
1554 :param ent: Entity to process
1555 :type ent: :class:`EntityHandle` or :class:`EntityView`
1556 :return: New entity with only CA and only one atom per residue
1557 (see :attr:`QSscoreEntity.ca_entity`)
1560 ca_view = ent.CreateEmptyView()
1562 for res
in ent.residues:
1563 ca_atom = res.FindAtom(
"CA")
1564 if ca_atom.IsValid():
1565 ca_view.AddAtom(ca_atom)
1567 return mol.CreateEntityFromView(ca_view,
False)
1569 def _GetChemGroups(qs_ent, seqid_thr=95.):
1571 :return: Intra-complex group of chemically identical polypeptide chains
1572 (see :attr:`QSscoreEntity.chem_groups`)
1574 :param qs_ent: Entity to process
1575 :type qs_ent: :class:`QSscoreEntity`
1576 :param seqid_thr: Threshold used to decide when two chains are identical.
1577 95 percent tolerates the few mutations crystallographers
1579 :type seqid_thr: :class:`float`
1582 ca_chains = qs_ent.ca_chains
1583 chain_names = sorted(ca_chains.keys())
1588 for ch_1, ch_2
in itertools.combinations(chain_names, 2):
1589 aln = qs_ent.GetAlignment(ch_1, ch_2)
1590 if aln
and seq.alg.SequenceIdentity(aln) > seqid_thr:
1591 id_seqs.append((ch_1, ch_2))
1594 return [[name]
for name
in chain_names]
1598 for ch_1, ch_2
in id_seqs:
1601 if ch_1
in g
or ch_2
in g:
1606 groups.append(set([ch_1, ch_2]))
1610 ranked_g = sorted([(-ca_chains[ch].length, ch)
for ch
in g])
1611 chem_groups.append([ch
for _,ch
in ranked_g])
1613 for ch
in chain_names:
1614 if not any(ch
in g
for g
in chem_groups):
1615 chem_groups.append([ch])
1620 """Computes the Euler angles given a transformation matrix.
1622 :param Rt: Rt operator.
1623 :type Rt: :class:`ost.geom.Mat4`
1624 :return: A :class:`tuple` of angles for each axis (x,y,z)
1626 rot = np.asarray(Rt.ExtractRotation().data).reshape(3,3)
1627 tx = np.arctan2(rot[2,1], rot[2,2])
1630 ty = np.arctan2(rot[2,0], np.sqrt(rot[2,1]**2 + rot[2,2]**2))
1633 tz = np.arctan2(rot[1,0], rot[0,0])
1640 def _GetChemGroupsMapping(qs_ent_1, qs_ent_2):
1642 :return: Inter-complex mapping of chemical groups
1643 (see :attr:`QSscorer.chem_mapping`)
1645 :param qs_ent_1: See :attr:`QSscorer.qs_ent_1`
1646 :param qs_ent_2: See :attr:`QSscorer.qs_ent_2`
1649 chem_groups_1 = qs_ent_1.chem_groups
1650 chem_groups_2 = qs_ent_2.chem_groups
1651 repr_chains_1 = {x[0]: tuple(x)
for x
in chem_groups_1}
1652 repr_chains_2 = {x[0]: tuple(x)
for x
in chem_groups_2}
1657 if len(repr_chains_2) < len(repr_chains_1):
1658 repr_chains_1, repr_chains_2 = repr_chains_2, repr_chains_1
1659 qs_ent_1, qs_ent_2 = qs_ent_2, qs_ent_1
1669 ca_chains_1 = qs_ent_1.ca_chains
1670 ca_chains_2 = qs_ent_2.ca_chains
1671 for ch_1
in list(repr_chains_1.keys()):
1672 for ch_2
in list(repr_chains_2.keys()):
1673 aln = _AlignAtomSeqs(ca_chains_1[ch_1], ca_chains_2[ch_2])
1675 chains_seqid = seq.alg.SequenceIdentity(aln)
1676 LogVerbose(
'Sequence identity', ch_1, ch_2,
'seqid=%.2f' % chains_seqid)
1677 chain_pairs.append((chains_seqid, ch_1, ch_2))
1680 chain_pairs = sorted(chain_pairs, reverse=
True)
1682 for _, c1, c2
in chain_pairs:
1684 for a,b
in chem_mapping.items():
1685 if repr_chains_1[c1] == a
or repr_chains_2[c2] == b:
1689 chem_mapping[repr_chains_1[c1]] = repr_chains_2[c2]
1691 chem_mapping = {y: x
for x, y
in chem_mapping.items()}
1692 qs_ent_1, qs_ent_2 = qs_ent_2, qs_ent_1
1695 mapped_1 = set([i
for s
in list(chem_mapping.keys())
for i
in s])
1696 chains_1 = set([c.name
for c
in qs_ent_1.ent.chains])
1697 if chains_1 - mapped_1:
1698 LogWarning(
'Unmapped Chains in %s: %s'
1699 % (qs_ent_1.GetName(),
','.join(list(chains_1 - mapped_1))))
1701 mapped_2 = set([i
for s
in list(chem_mapping.values())
for i
in s])
1702 chains_2 = set([c.name
for c
in qs_ent_2.ent.chains])
1703 if chains_2 - mapped_2:
1704 LogWarning(
'Unmapped Chains in %s: %s'
1705 % (qs_ent_2.GetName(),
','.join(list(chains_2 - mapped_2))))
1708 LogInfo(
'Chemical chain-groups mapping: ' + str(chem_mapping))
1709 if len(mapped_1) < 1
or len(mapped_2) < 1:
1710 raise QSscoreError(
'Less than 1 chains left in chem_mapping.')
1713 def _SelectFew(l, max_elements):
1714 """Return l or copy of l with at most *max_elements* entries."""
1716 if n_el <= max_elements:
1720 d_el = ((n_el - 1) // max_elements) + 1
1722 for i
in range(0, n_el, d_el):
1726 def _GetAlignedResidues(qs_ent_1, qs_ent_2, chem_mapping, max_ca_per_chain,
1729 :return: Tuple of two :class:`~ost.mol.EntityView` objects containing subsets
1730 of *qs_ent_1* and *qs_ent_2*. Two entities are later created from
1731 those views (see :attr:`QSscorer.ent_to_cm_1` and
1732 :attr:`QSscorer.ent_to_cm_2`)
1734 :param qs_ent_1: See :attr:`QSscorer.qs_ent_1`
1735 :param qs_ent_2: See :attr:`QSscorer.qs_ent_2`
1736 :param chem_mapping: See :attr:`QSscorer.chem_mapping`
1737 :param max_ca_per_chain: See :attr:`QSscorer.max_ca_per_chain_for_cm`
1740 def _FixName(seq_name, seq_names):
1742 seq_name = seq_name.replace(
' ',
'-')
1743 while seq_name
in seq_names:
1747 ent_view_1 = qs_ent_1.ca_entity.CreateEmptyView()
1748 ent_view_2 = qs_ent_2.ca_entity.CreateEmptyView()
1749 ca_chains_1 = qs_ent_1.ca_chains
1750 ca_chains_2 = qs_ent_2.ca_chains
1752 for group_1, group_2
in chem_mapping.items():
1754 seq_list = seq.CreateSequenceList()
1756 seq_to_empty_view = dict()
1758 sequence = ca_chains_1[ch].Copy()
1759 sequence.name = _FixName(qs_ent_1.GetName() +
'.' + ch, seq_to_empty_view)
1760 seq_to_empty_view[sequence.name] = ent_view_1
1761 seq_list.AddSequence(sequence)
1763 sequence = ca_chains_2[ch].Copy()
1764 sequence.name = _FixName(qs_ent_2.GetName() +
'.' + ch, seq_to_empty_view)
1765 seq_to_empty_view[sequence.name] = ent_view_2
1766 seq_list.AddSequence(sequence)
1767 alnc =
ClustalW(seq_list, clustalw=clustalw_bin)
1770 residue_lists = list()
1771 sequence_count = alnc.sequence_count
1773 residues = [col.GetResidue(ir)
for ir
in range(sequence_count)]
1774 if all([r.IsValid()
for r
in residues]):
1775 residue_lists.append(residues)
1777 if len(residue_lists) < 5:
1779 elif max_ca_per_chain > 0:
1780 residue_lists = _SelectFew(residue_lists, max_ca_per_chain)
1782 res_views = [seq_to_empty_view[alnc.sequences[ir].name] \
1783 for ir
in range(sequence_count)]
1785 for res_list
in residue_lists:
1786 for res, view
in zip(res_list, res_views):
1787 view.AddResidue(res, mol.ViewAddFlag.INCLUDE_ATOMS)
1789 return ent_view_1, ent_view_2
1792 def _FindSymmetry(qs_ent_1, qs_ent_2, ent_to_cm_1, ent_to_cm_2, chem_mapping):
1794 :return: A pair of comparable symmetry groups (for :attr:`QSscorer.symm_1`
1795 and :attr:`QSscorer.symm_2`) between the two structures.
1796 Empty lists if no symmetry identified.
1798 :param qs_ent_1: See :attr:`QSscorer.qs_ent_1`
1799 :param qs_ent_2: See :attr:`QSscorer.qs_ent_2`
1800 :param ent_to_cm_1: See :attr:`QSscorer.ent_to_cm_1`
1801 :param ent_to_cm_2: See :attr:`QSscorer.ent_to_cm_2`
1802 :param chem_mapping: See :attr:`QSscorer.chem_mapping`
1804 LogInfo(
'Identifying Symmetry Groups...')
1807 symm_subg_1 = _GetSymmetrySubgroups(qs_ent_1, ent_to_cm_1,
1808 list(chem_mapping.keys()))
1809 symm_subg_2 = _GetSymmetrySubgroups(qs_ent_2, ent_to_cm_2,
1810 list(chem_mapping.values()))
1813 LogInfo(
'Selecting Symmetry Groups...')
1819 for symm_1, symm_2
in itertools.product(symm_subg_1, symm_subg_2):
1822 if len(s1) != len(s2):
1823 LogDebug(
'Discarded', str(symm_1), str(symm_2),
1824 ': different size of symmetry groups')
1826 count = _CountSuperpositionsAndMappings(symm_1, symm_2, chem_mapping)
1827 nr_mapp = count[
'intra'][
'mappings'] + count[
'inter'][
'mappings']
1828 LogDebug(
'OK', str(symm_1), str(symm_2),
': %s mappings' % nr_mapp)
1829 best_symm.append((nr_mapp, symm_1, symm_2))
1831 for _, symm_1, symm_2
in sorted(best_symm):
1834 group_1 = ent_to_cm_1.Select(
'cname=%s' % _FixSelectChainNames(s1))
1835 group_2 = ent_to_cm_2.Select(
'cname=%s' % _FixSelectChainNames(s2))
1839 closed_symm = _CheckClosedSymmetry(group_1, group_2, symm_1, symm_2,
1840 chem_mapping, 6, 0.8, find_best=
False)
1843 return symm_1, symm_2
1846 LogInfo(
'No coherent symmetry identified between structures')
1850 def _GetChainMapping(ent_1, ent_2, symm_1, symm_2, chem_mapping,
1851 max_mappings_extensive):
1853 :return: Tuple with mapping from *ent_1* to *ent_2* (see
1854 :attr:`QSscorer.chain_mapping`) and scheme used (see
1855 :attr:`QSscorer.chain_mapping_scheme`)
1857 :param ent_1: See :attr:`QSscorer.ent_to_cm_1`
1858 :param ent_2: See :attr:`QSscorer.ent_to_cm_2`
1859 :param symm_1: See :attr:`QSscorer.symm_1`
1860 :param symm_2: See :attr:`QSscorer.symm_2`
1861 :param chem_mapping: See :attr:`QSscorer.chem_mapping`
1862 :param max_mappings_extensive: See :attr:`QSscorer.max_mappings_extensive`
1864 LogInfo(
'Symmetry-groups used in %s: %s' % (ent_1.GetName(), str(symm_1)))
1865 LogInfo(
'Symmetry-groups used in %s: %s' % (ent_2.GetName(), str(symm_2)))
1868 thresholds = [(
'strict', 4, 0.8),
1869 (
'tolerant', 6, 0.4),
1870 (
'permissive', 8, 0.2)]
1871 for scheme, sup_thr, sup_fract
in thresholds:
1875 mapping = _CheckClosedSymmetry(ent_1, ent_2, symm_1, symm_2,
1876 chem_mapping, sup_thr, sup_fract)
1878 LogInfo(
'Closed Symmetry with %s parameters' % scheme)
1879 if scheme ==
'permissive':
1880 LogWarning(
'Permissive thresholds used for overlap based mapping ' + \
1881 'detection: check mapping manually: %s' % mapping)
1882 return mapping, scheme
1896 count = _CountSuperpositionsAndMappings(symm_1, symm_2, chem_mapping)
1897 LogInfo(
'Intra Symmetry-group mappings to check: %s' \
1898 % count[
'intra'][
'mappings'])
1899 LogInfo(
'Inter Symmetry-group mappings to check: %s' \
1900 % count[
'inter'][
'mappings'])
1901 nr_mapp = count[
'intra'][
'mappings'] + count[
'inter'][
'mappings']
1902 if nr_mapp > max_mappings_extensive:
1903 raise QSscoreError(
'Too many possible mappings: %s' % nr_mapp)
1912 ref_symm_1 = symm_1[0]
1913 ref_symm_2 = symm_2[0]
1914 asu_chem_mapping = _LimitChemMapping(chem_mapping, ref_symm_1, ref_symm_2)
1916 for g1, g2
in asu_chem_mapping.items():
1918 for c1, c2
in itertools.product(g1, g2):
1920 LogVerbose(
'Superposing chains: %s, %s' % (c1, c2))
1921 res = cached_rmsd.GetSuperposition(c1, c2)
1924 for mapping
in _ListPossibleMappings(c1, c2, asu_chem_mapping):
1926 chains_rmsd = cached_rmsd.GetMappedRMSD(mapping, res.transformation)
1927 cur_mappings.append((chains_rmsd, mapping))
1928 LogVerbose(str(mapping), str(chains_rmsd))
1929 best_rmsd, best_mapp = min(cur_mappings)
1930 intra_mappings.append((best_rmsd, c1, c2, best_mapp))
1932 _, intra_asu_c1, intra_asu_c2, intra_asu_mapp = min(intra_mappings)
1935 if len(symm_1) == 1
or len(symm_2) == 1:
1936 mapping = intra_asu_mapp
1941 index_asu_c1 = ref_symm_1.index(intra_asu_c1)
1942 index_asu_c2 = ref_symm_2.index(intra_asu_c2)
1943 index_mapp = {ref_symm_1.index(s1): ref_symm_2.index(s2) \
1944 for s1, s2
in intra_asu_mapp.items()}
1945 LogInfo(
'Intra symmetry-group mapping: %s' % str(intra_asu_mapp))
1949 if len(symm_1) < len(symm_2):
1950 symm_combinations = list(itertools.product(symm_1, [symm_2[0]]))
1952 symm_combinations = list(itertools.product([symm_1[0]], symm_2))
1954 full_asu_mapp = {tuple(symm_1): tuple(symm_2)}
1956 for s1, s2
in symm_combinations:
1958 LogVerbose(
'Superposing symmetry-group: %s, %s in %s, %s' \
1959 % (s1[index_asu_c1], s2[index_asu_c2], s1, s2))
1960 res = cached_rmsd.GetSuperposition(s1[index_asu_c1], s2[index_asu_c2])
1962 for asu_mapp
in _ListPossibleMappings(s1, s2, full_asu_mapp):
1966 for a, b
in asu_mapp.items():
1967 for id_a, id_b
in index_mapp.items():
1968 mapping[a[id_a]] = b[id_b]
1969 chains_rmsd = cached_rmsd.GetMappedRMSD(mapping, res.transformation)
1970 full_mappings.append((chains_rmsd, mapping))
1971 LogVerbose(str(mapping), str(chains_rmsd))
1973 if check != nr_mapp:
1974 LogWarning(
'Mapping number estimation was wrong')
1977 mapping = min(full_mappings, key=
lambda x: x[0])[1]
1979 LogWarning(
'Extensive search used for mapping detection (fallback). This ' + \
1980 'approach has known limitations. Check mapping manually: %s' \
1982 return mapping,
'extensive'
1985 def _GetSymmetrySubgroups(qs_ent, ent, chem_groups):
1987 Identify the symmetry (either cyclic C or dihedral D) of the protein and find
1988 all possible symmetry subgroups. This is done testing all combination of chain
1989 superposition and clustering them by the angles (D) or the axis (C) of the Rt
1992 Groups of superposition which can fully reconstruct the structure are possible
1995 :param qs_ent: Entity with cached angles and axis.
1996 :type qs_ent: :class:`QSscoreEntity`
1997 :param ent: Entity from which to extract chains (perfect alignment assumed
1998 for superposition as in :attr:`QSscorer.ent_to_cm_1`).
1999 :type ent: :class:`EntityHandle` or :class:`EntityView`
2000 :param chem_groups: List of tuples/lists of chain names in *ent*. Each list
2001 contains all chains belonging to a chem. group (could be
2002 from :attr:`QSscoreEntity.chem_groups` or from "keys()"
2003 or "values()" of :attr:`QSscorer.chem_mapping`)
2005 :return: A list of possible symmetry subgroups (each in same format as
2006 :attr:`QSscorer.symm_1`). If no symmetry is found, we return a list
2007 with a single symmetry subgroup with a single group with all chains.
2012 for cnames
in chem_groups:
2013 for c1, c2
in itertools.combinations(cnames, 2):
2014 cur_angles = qs_ent.GetAngles(c1, c2)
2015 if not np.any(np.isnan(cur_angles)):
2016 angles[(c1,c2)] = cur_angles
2017 cur_axis = qs_ent.GetAxis(c1, c2)
2018 if not np.any(np.isnan(cur_axis)):
2019 axis[(c1,c2)] = cur_axis
2023 LogVerbose(
'Possible symmetry-groups for: %s' % ent.GetName())
2024 for angle_thr
in np.arange(0.1, 1, 0.1):
2025 d_symm_groups = _GetDihedralSubgroups(ent, chem_groups, angles, angle_thr)
2027 for group
in d_symm_groups:
2028 if group
not in symm_groups:
2029 symm_groups.append(group)
2030 LogVerbose(
'Dihedral: %s' % group)
2031 LogInfo(
'Symmetry threshold %.1f used for angles of %s' \
2032 % (angle_thr, ent.GetName()))
2036 for axis_thr
in np.arange(0.1, 1, 0.1):
2037 c_symm_groups = _GetCyclicSubgroups(ent, chem_groups, axis, axis_thr)
2039 for group
in c_symm_groups:
2040 if group
not in symm_groups:
2041 symm_groups.append(group)
2042 LogVerbose(
'Cyclic: %s' % group)
2043 LogInfo(
'Symmetry threshold %.1f used for axis of %s' \
2044 % (axis_thr, ent.GetName()))
2049 LogInfo(
'No symmetry-group detected in %s' % ent.GetName())
2050 symm_groups = [[tuple([c
for g
in chem_groups
for c
in g])]]
2054 def _GetDihedralSubgroups(ent, chem_groups, angles, angle_thr):
2056 :return: Dihedral subgroups for :func:`_GetSymmetrySubgroups`
2057 (same return type as there). Empty list if fail.
2059 :param ent: See :func:`_GetSymmetrySubgroups`
2060 :param chem_groups: See :func:`_GetSymmetrySubgroups`
2061 :param angles: :class:`dict` (key = chain-pair-tuple, value = Euler angles)
2062 :param angle_thr: Euler angles distance threshold for clustering (float).
2065 if len(angles) == 0:
2069 same_angles = _ClusterData(angles, angle_thr, _AngleArrayDistance)
2073 for clst
in list(same_angles.values()):
2074 group = list(clst.keys())
2075 if _ValidChainGroup(group, ent):
2076 if len(chem_groups) > 1:
2078 symm_groups.append(list(zip(*group)))
2081 symm_groups.append(group)
2082 symm_groups.append(list(zip(*group)))
2085 def _GetCyclicSubgroups(ent, chem_groups, axis, axis_thr):
2087 :return: Cyclic subgroups for :func:`_GetSymmetrySubgroups`
2088 (same return type as there). Empty list if fail.
2090 :param ent: See :func:`_GetSymmetrySubgroups`
2091 :param chem_groups: See :func:`_GetSymmetrySubgroups`
2092 :param angles: :class:`dict` (key = chain-pair-tuple, value = rotation axis)
2093 :param angle_thr: Axis distance threshold for clustering (float).
2100 same_axis = _ClusterData(axis, axis_thr, _AxisDistance)
2104 for clst
in list(same_axis.values()):
2105 all_chain = [item
for sublist
in list(clst.keys())
for item
in sublist]
2106 if len(set(all_chain)) == ent.chain_count:
2108 if len(chem_groups) > 1:
2109 ref_chem_group = chem_groups[0]
2111 cyclic_group_closest = []
2112 cyclic_group_iface = []
2113 for c1
in ref_chem_group:
2114 group_closest = [c1]
2116 for chains
in chem_groups[1:]:
2118 closest = _GetClosestChain(ent, c1, chains)
2120 closest_iface = _GetClosestChainInterface(ent, c1, chains)
2121 group_closest.append(closest)
2122 group_iface.append(closest_iface)
2123 cyclic_group_closest.append(tuple(group_closest))
2124 cyclic_group_iface.append(tuple(group_iface))
2125 if _ValidChainGroup(cyclic_group_closest, ent):
2126 symm_groups.append(cyclic_group_closest)
2127 if _ValidChainGroup(cyclic_group_iface, ent):
2128 symm_groups.append(cyclic_group_iface)
2131 symm_groups.append(chem_groups)
2134 def _ClusterData(data, thr, metric):
2135 """Wrapper for fclusterdata to get dict of clusters.
2137 :param data: :class:`dict` (keys for ID, values used for clustering)
2138 :return: :class:`dict` {cluster_idx: {data-key: data-value}}
2142 return {0: {list(data.keys())[0]: list(data.values())[0]} }
2144 cluster_indices = fclusterdata(np.asarray(list(data.values())), thr,
2145 method=
'complete', criterion=
'distance',
2149 for cluster_idx, data_key
in zip(cluster_indices, list(data.keys())):
2150 if not cluster_idx
in res:
2151 res[cluster_idx] = {}
2152 res[cluster_idx][data_key] = data[data_key]
2155 def _AngleArrayDistance(u, v):
2157 :return: Average angular distance of two arrays of angles.
2158 :param u: Euler angles.
2159 :param v: Euler angles.
2162 for a,b
in zip(u,v):
2165 delta = abs(2*np.pi - delta)
2167 return np.mean(dist)
2169 def _AxisDistance(u, v):
2171 :return: Euclidean distance between two rotation axes. Axes can point in
2172 either direction so we ensure to use the closer one.
2173 :param u: Rotation axis.
2174 :param v: Rotation axis.
2182 d1 = np.dot(dv1, dv1)
2183 d2 = np.dot(dv2, dv2)
2184 return np.sqrt(min(d1, d2))
2186 def _GetClosestChain(ent, ref_chain, chains):
2188 :return: Chain closest to *ref_chain* based on center of atoms distance.
2189 :rtype: :class:`str`
2190 :param ent: See :func:`_GetSymmetrySubgroups`
2191 :param ref_chain: We look for chains closest to this one
2192 :type ref_chain: :class:`str`
2193 :param chains: We only consider these chains
2194 :type chains: :class:`list` of :class:`str`
2197 ref_center = ent.FindChain(ref_chain).center_of_atoms
2199 center = ent.FindChain(ch).center_of_atoms
2201 closest_chain = min(contacts)[1]
2202 return closest_chain
2204 def _GetClosestChainInterface(ent, ref_chain, chains):
2206 :return: Chain with biggest interface (within 10 A) to *ref_chain*.
2207 :rtype: :class:`str`
2208 :param ent: See :func:`_GetSymmetrySubgroups`
2209 :param ref_chain: We look for chains closest to this one
2210 :type ref_chain: :class:`str`
2211 :param chains: We only consider these chains
2212 :type chains: :class:`list` of :class:`str`
2218 iface_view = ent.Select(
'cname="%s" and 10 <> [cname="%s"]' \
2220 nr_res = iface_view.residue_count
2221 closest.append((nr_res, ch))
2222 closest_chain = max(closest)[1]
2223 return closest_chain
2225 def _ValidChainGroup(group, ent):
2227 :return: True, if *group* has unique chain names and as many chains as *ent*
2228 :rtype: :class:`bool`
2229 :param group: Symmetry groups to check
2230 :type group: Same as :attr:`QSscorer.symm_1`
2231 :param ent: See :func:`_GetSymmetrySubgroups`
2233 all_chain = [item
for sublist
in group
for item
in sublist]
2234 if len(all_chain) == len(set(all_chain))
and\
2235 len(all_chain) == ent.chain_count:
2239 def _LimitChemMapping(chem_mapping, limit_1, limit_2):
2241 :return: Chem. mapping containing only chains in *limit_1* and *limit_2*
2242 :rtype: Same as :attr:`QSscorer.chem_mapping`
2243 :param chem_mapping: See :attr:`QSscorer.chem_mapping`
2244 :param limit_1: Limits chain names in chem_mapping.keys()
2245 :type limit_1: List/tuple of strings
2246 :param limit_2: Limits chain names in chem_mapping.values()
2247 :type limit_2: List/tuple of strings
2250 limit_1_set = set(limit_1)
2251 limit_2_set = set(limit_2)
2252 asu_chem_mapping = dict()
2253 for key, value
in chem_mapping.items():
2254 new_key = tuple(set(key) & limit_1_set)
2256 asu_chem_mapping[new_key] = tuple(set(value) & limit_2_set)
2257 return asu_chem_mapping
2260 def _CountSuperpositionsAndMappings(symm_1, symm_2, chem_mapping):
2262 :return: Dictionary of number of mappings and superpositions to be performed.
2263 Returned as *result[X][Y] = number* with X = "intra" or "inter" and
2264 Y = "mappings" or "superpositions". The idea is that for each
2265 pairwise superposition we check all possible mappings.
2266 We can check the combinations within (intra) a symmetry group and
2267 once established, we check the combinations between different (inter)
2269 :param symm_1: See :attr:`QSscorer.symm_1`
2270 :param symm_2: See :attr:`QSscorer.symm_2`
2271 :param chem_mapping: See :attr:`QSscorer.chem_mapping`
2277 c[
'intra'][
'mappings'] = 0
2278 c[
'intra'][
'superpositions'] = 0
2279 c[
'inter'][
'mappings'] = 0
2280 c[
'inter'][
'superpositions'] = 0
2282 ref_symm_1 = symm_1[0]
2283 ref_symm_2 = symm_2[0]
2284 asu_chem_mapping = _LimitChemMapping(chem_mapping, ref_symm_1, ref_symm_2)
2285 for g1, g2
in asu_chem_mapping.items():
2286 chain_superpositions = len(g1) * len(g2)
2287 c[
'intra'][
'superpositions'] += chain_superpositions
2288 map_per_sup = _PermutationOrCombinations(g1[0], g2[0], asu_chem_mapping)
2289 c[
'intra'][
'mappings'] += chain_superpositions * map_per_sup
2290 if len(symm_1) == 1
or len(symm_2) == 1:
2293 asu_superposition = min(len(symm_1), len(symm_2))
2294 c[
'inter'][
'superpositions'] = asu_superposition
2295 asu = {tuple(symm_1): tuple(symm_2)}
2296 map_per_sup = _PermutationOrCombinations(ref_symm_1, ref_symm_2, asu)
2297 c[
'inter'][
'mappings'] = asu_superposition * map_per_sup
2300 def _PermutationOrCombinations(sup1, sup2, chem_mapping):
2301 """Should match len(_ListPossibleMappings(sup1, sup2, chem_mapping))."""
2304 for a,b
in chem_mapping.items():
2305 new_a = tuple([x
for x
in a
if x != sup1])
2306 new_b = tuple([x
for x
in b
if x != sup2])
2307 chem_map[new_a] = new_b
2309 for a, b
in chem_map.items():
2310 if len(a) == len(b):
2311 mapp_nr *= factorial(len(a))
2312 elif len(a) < len(b):
2313 mapp_nr *= binom(len(b), len(a))
2314 elif len(a) > len(b):
2315 mapp_nr *= binom(len(a), len(b))
2318 def _ListPossibleMappings(sup1, sup2, chem_mapping):
2320 Return a flat list of all possible mappings given *chem_mapping* and keeping
2321 mapping of *sup1* and *sup2* fixed. For instance if elements are chain names
2322 this is all the mappings to check for a given superposition.
2324 Elements in first complex are defined by *chem_mapping.keys()* (list of list
2325 of elements) and elements in second complex by *chem_mapping.values()*. If
2326 complexes don't have same number of elements, we map only elements for the one
2327 with less. Also mapping is only between elements of mapped groups according to
2330 :rtype: :class:`list` of :class:`dict` (key = element in chem_mapping-key,
2331 value = element in chem_mapping-value)
2332 :param sup1: Element for which mapping is fixed.
2333 :type sup1: Like element in chem_mapping-key
2334 :param sup2: Element for which mapping is fixed.
2335 :type sup2: Like element in chem_mapping-value
2336 :param chem_mapping: Defines mapping between groups of elements (e.g. result
2337 of :func:`_LimitChemMapping`).
2338 :type chem_mapping: :class:`dict` with key / value = :class:`tuple`
2340 :raises: :class:`QSscoreError` if reference complex (first one or one with
2341 less elements) has more elements for any given mapped group.
2344 chain1 = [i
for s
in list(chem_mapping.keys())
for i
in s]
2345 chain2 = [i
for s
in list(chem_mapping.values())
for i
in s]
2347 if len(chain1) > len(chain2):
2351 for a, b
in chem_mapping.items():
2352 new_a = tuple([x
for x
in a
if x != sup1])
2353 new_b = tuple([x
for x
in b
if x != sup2])
2355 chem_map[new_b] = new_a
2357 chem_map[new_a] = new_b
2362 for a, b
in chem_map.items():
2363 if len(a) == len(b):
2364 chem_perm.append(list(itertools.permutations(b)))
2366 elif len(a) < len(b):
2367 chem_perm.append(list(itertools.combinations(b, len(a))))
2370 raise QSscoreError(
'Impossible to define reference group: %s' \
2374 flat_ref = [i
for s
in chem_ref
for i
in s]
2375 for perm
in itertools.product(*chem_perm):
2376 flat_perm = [i
for s
in perm
for i
in s]
2377 d = {c1: c2
for c1, c2
in zip(flat_ref, flat_perm)}
2379 d = {v: k
for k, v
in list(d.items())}
2380 d.update({sup1: sup2})
2385 def _CheckClosedSymmetry(ent_1, ent_2, symm_1, symm_2, chem_mapping,
2386 sup_thr=4, sup_fract=0.8, find_best=
True):
2388 Quick check if we can superpose two chains and get a mapping for all other
2389 chains using the same transformation. The mapping is defined by sufficient
2390 overlap of the transformed chain of *ent_1* onto another chain in *ent_2*.
2392 :param ent_1: Entity to map to *ent_2* (perfect alignment assumed between
2393 chains of same chem. group as in :attr:`QSscorer.ent_to_cm_1`).
2394 Views are ok but only to select full chains!
2395 :param ent_2: Entity to map to (perfect alignment assumed between
2396 chains of same chem. group as in :attr:`QSscorer.ent_to_cm_2`).
2397 Views are ok but only to select full chains!
2398 :param symm_1: Symmetry groups to use. We only superpose chains within
2399 reference symmetry group of *symm_1* and *symm_2*.
2400 See :attr:`QSscorer.symm_1`
2401 :param symm_2: See :attr:`QSscorer.symm_2`
2402 :param chem_mapping: See :attr:`QSscorer.chem_mapping`.
2403 All chains in *ent_1* / *ent_2* must be listed here!
2404 :param sup_thr: Distance around transformed chain in *ent_1* to check for
2406 :type sup_thr: :class:`float`
2407 :param sup_fract: Fraction of atoms in chain of *ent_2* that must be
2408 overlapped for overlap to be sufficient.
2409 :type sup_fract: :class:`float`
2410 :param find_best: If True, we look for best mapping according to
2411 :func:`_GetMappedRMSD`. Otherwise, we return first suitable
2413 :type find_best: :class:`bool`
2415 :return: Mapping from *ent_1* to *ent_2* or None if none found. Mapping, if
2416 found, is as in :attr:`QSscorer.chain_mapping`.
2417 :rtype: :class:`dict` (:class:`str` / :class:`str`)
2420 ref_symm_1 = symm_1[0]
2421 ref_symm_2 = symm_2[0]
2422 asu_chem_mapping = _LimitChemMapping(chem_mapping, ref_symm_1, ref_symm_2)
2425 for g1, g2
in asu_chem_mapping.items():
2429 for c1, c2
in itertools.product(g1, g2):
2431 chain_1 = ent_1.Select(
'cname="%s"' % c1)
2432 chain_2 = ent_2.Select(
'cname="%s"' % c2)
2433 res = mol.alg.SuperposeSVD(chain_1, chain_2, apply_transform=
False)
2435 mapping = _GetSuperpositionMapping(ent_1, ent_2, chem_mapping,
2436 res.transformation, sup_thr,
2445 rmsd = _GetMappedRMSD(ent_1, ent_2, mapping, res.transformation)
2446 rmsd_mappings.append((rmsd, mapping))
2449 return min(rmsd_mappings, key=
lambda x: x[0])[1]
2453 def _GetSuperpositionMapping(ent_1, ent_2, chem_mapping, transformation,
2454 sup_thr, sup_fract):
2456 :return: Dict with chain mapping from *ent_1* to *ent_2* or None if failed
2457 (see :func:`_CheckClosedSymmetry`).
2458 :param ent_1: See :func:`_CheckClosedSymmetry`
2459 :param ent_2: See :func:`_CheckClosedSymmetry`
2460 :param chem_mapping: See :func:`_CheckClosedSymmetry`
2461 :param transformation: Superposition transformation to be applied to *ent_1*.
2462 :param sup_thr: See :func:`_CheckClosedSymmetry`
2463 :param sup_fract: See :func:`_CheckClosedSymmetry`
2470 if ent_1.chain_count > ent_2.chain_count:
2472 ent_1, ent_2 = ent_2, ent_1
2474 chem_pairs = list(zip(list(chem_mapping.values()), list(chem_mapping.keys())))
2477 chem_pairs = iter(chem_mapping.items())
2479 if ent_1.chain_count == 0:
2482 chem_partners = dict()
2483 for cg1, cg2
in chem_pairs:
2485 chem_partners[ch] = set(cg2)
2488 mapped_chains = set()
2489 for ch_1
in ent_1.chains:
2491 ch_atoms = {ch_2.name: set()
for ch_2
in ent_2.chains}
2492 for a_1
in ch_1.handle.atoms:
2494 a_within = ent_2.FindWithin(transformed_pos, sup_thr)
2495 for a_2
in a_within:
2496 ch_atoms[a_2.chain.name].add(a_2.hash_code)
2498 max_num, max_name = max((len(atoms), name)
2499 for name, atoms
in ch_atoms.items())
2501 ch_2 = ent_2.FindChain(max_name)
2502 if max_num == 0
or max_num / float(ch_2.handle.atom_count) < sup_fract:
2505 ch_1_name = ch_1.name
2506 if ch_1_name
not in chem_partners:
2507 raise RuntimeError(
"Chem. mapping doesn't contain all chains!")
2508 if max_name
not in chem_partners[ch_1_name]:
2511 if max_name
in mapped_chains:
2514 mapping[ch_1_name] = max_name
2515 mapped_chains.add(max_name)
2518 mapping = {v: k
for k, v
in mapping.items()}
2521 def _GetMappedRMSD(ent_1, ent_2, chain_mapping, transformation):
2523 :return: RMSD between complexes considering chain mapping.
2524 :param ent_1: Entity mapped to *ent_2* (perfect alignment assumed between
2525 mapped chains as in :attr:`QSscorer.ent_to_cm_1`).
2526 :param ent_2: Entity which was mapped to (perfect alignment assumed between
2527 mapped chains as in :attr:`QSscorer.ent_to_cm_2`).
2528 :param chain_mapping: See :attr:`QSscorer.chain_mapping`
2529 :param transformation: Superposition transformation to be applied to *ent_1*.
2534 for c1, c2
in chain_mapping.items():
2536 chain_1 = ent_1.Select(
'cname="%s"' % c1)
2537 chain_2 = ent_2.Select(
'cname="%s"' % c2)
2538 atom_count = chain_1.atom_count
2539 if atom_count != chain_2.atom_count:
2540 raise RuntimeError(
'Chains in _GetMappedRMSD must be perfectly aligned!')
2542 rmsd = mol.alg.CalculateRMSD(chain_1, chain_2, transformation)
2545 atoms.append(float(atom_count))
2547 rmsd = np.sqrt( sum([a * r**2
for a, r
in zip(atoms, rmsds)]) / sum(atoms) )
2551 """Helper object for repetitive RMSD calculations.
2552 Meant to speed up :func:`_GetChainMapping` but could also be used to replace
2553 :func:`_GetMappedRMSD` in :func:`_CheckClosedSymmetry`.
2555 :param ent_1: See :attr:`QSscorer.ent_to_cm_1`
2556 :param ent_2: See :attr:`QSscorer.ent_to_cm_2`
2567 """Get cached view on chain *cname* for :attr:`ent_1`."""
2569 self.
_chain_views_1[cname] = self.ent_1.Select(
'cname="%s"' % cname)
2573 """Get cached view on chain *cname* for :attr:`ent_2`."""
2575 self.
_chain_views_2[cname] = self.ent_2.Select(
'cname="%s"' % cname)
2579 """Get superposition result (no change in entities!) for *c1* to *c2*.
2580 This invalidates cached RMSD results used in :func:`GetMappedRMSD`.
2582 :param c1: chain name for :attr:`ent_1`.
2583 :param c2: chain name for :attr:`ent_2`.
2590 if chain_1.atom_count != chain_2.atom_count:
2591 raise RuntimeError(
'Chains in GetSuperposition not perfectly aligned!')
2592 return mol.alg.SuperposeSVD(chain_1, chain_2, apply_transform=
False)
2596 :return: RMSD between complexes considering chain mapping.
2597 :param chain_mapping: See :attr:`QSscorer.chain_mapping`.
2598 :param transformation: Superposition transformation (e.g. res.transformation
2599 for res = :func:`GetSuperposition`).
2604 for c1, c2
in chain_mapping.items():
2612 atom_count = chain_1.atom_count
2613 if atom_count != chain_2.atom_count:
2614 raise RuntimeError(
'Chains in GetMappedRMSD not perfectly aligned!')
2616 rmsd = mol.alg.CalculateRMSD(chain_1, chain_2, transformation)
2617 self.
_pair_rmsd[(c1, c2)] = (atom_count, rmsd)
2620 atoms.append(float(atom_count))
2622 rmsd = np.sqrt( sum([a * r**2
for a, r
in zip(atoms, rmsds)]) / sum(atoms) )
2626 def _CleanUserSymmetry(symm, ent):
2627 """Clean-up user provided symmetry.
2629 :param symm: Symmetry group as in :attr:`QSscorer.symm_1`
2630 :param ent: Entity from which to extract chain names
2631 :type ent: :class:`~ost.mol.EntityHandle` or :class:`~ost.mol.EntityView`
2632 :return: New symmetry group limited to chains in sub-structure *ent*
2633 (see :attr:`QSscorer.symm_1`). Empty list if invalid symmetry.
2636 chain_names = set([ch.name
for ch
in ent.chains])
2638 for symm_group
in symm:
2639 new_group = tuple(ch
for ch
in symm_group
if ch
in chain_names)
2641 new_symm.append(new_group)
2643 if new_symm != symm:
2644 LogInfo(
"Cleaned user symmetry for %s: %s" % (ent.GetName(), new_symm))
2646 lengths = [len(symm_group)
for symm_group
in new_symm]
2647 if lengths[1:] != lengths[:-1]:
2648 LogWarning(
'User symmetry group of different sizes for %s. Ignoring it!' \
2652 s_chain_names = set([ch
for symm_group
in new_symm
for ch
in symm_group])
2653 if len(s_chain_names) != sum(lengths):
2654 LogWarning(
'User symmetry group for %s has duplicate chains. Ignoring it!' \
2657 if s_chain_names != chain_names:
2658 LogWarning(
'User symmetry group for %s is missing chains. Ignoring it!' \
2664 def _AreValidSymmetries(symm_1, symm_2):
2665 """Check symmetry pair for major problems.
2667 :return: False if any of the two is empty or if they're compatible in size.
2668 :rtype: :class:`bool`
2670 if not symm_1
or not symm_2:
2672 if len(symm_1) != 1
or len(symm_2) != 1:
2673 if not len(symm_1[0]) == len(symm_2[0]):
2674 LogWarning(
'Symmetry groups of different sizes. Ignoring them!')
2678 def _GetMappedAlignments(ent_1, ent_2, chain_mapping, res_num_alignment):
2680 :return: Alignments of 2 structures given chain mapping
2681 (see :attr:`QSscorer.alignments`).
2682 :param ent_1: Entity containing all chains in *chain_mapping.keys()*.
2683 Views to this entity attached to first sequence of each aln.
2684 :param ent_2: Entity containing all chains in *chain_mapping.values()*.
2685 Views to this entity attached to second sequence of each aln.
2686 :param chain_mapping: See :attr:`QSscorer.chain_mapping`
2687 :param res_num_alignment: See :attr:`QSscorer.res_num_alignment`
2690 for ch_1_name
in sorted(chain_mapping):
2692 ch_1 = ent_1.FindChain(ch_1_name)
2693 ch_2 = ent_2.FindChain(chain_mapping[ch_1_name])
2694 if res_num_alignment:
2695 max_res_num = max([r.number.GetNum()
for r
in ch_1.residues] +
2696 [r.number.GetNum()
for r
in ch_2.residues])
2697 ch1_aln = [
"-"] * max_res_num
2698 ch2_aln = [
"-"] * max_res_num
2699 for res
in ch_1.residues:
2700 ch1_aln[res.number.GetNum() - 1] = res.GetOneLetterCode()
2701 ch1_aln =
"".join(ch1_aln)
2702 seq_1 = seq.CreateSequence(ch_1.name, str(ch1_aln))
2703 seq_1.AttachView(ch_1.Select(
""))
2704 for res
in ch_2.residues:
2705 ch2_aln[res.number.GetNum() - 1] = res.GetOneLetterCode()
2706 ch2_aln =
"".join(ch2_aln)
2707 seq_2 = seq.CreateSequence(ch_2.name, str(ch2_aln))
2708 seq_2.AttachView(ch_2.Select(
""))
2710 aln = seq.CreateAlignment()
2711 aln.AddSequence(seq_1)
2712 aln.AddSequence(seq_2)
2714 seq_1 = seq.SequenceFromChain(ch_1.name, ch_1)
2715 seq_2 = seq.SequenceFromChain(ch_2.name, ch_2)
2717 aln = _AlignAtomSeqs(seq_1, seq_2)
2722 def _GetMappedResidues(alns):
2724 :return: Mapping of shared residues in *alns* (with views attached)
2725 (see :attr:`QSscorer.mapped_residues`).
2726 :param alns: See :attr:`QSscorer.alignments`
2728 mapped_residues = dict()
2731 c1 = aln.GetSequence(0).name
2732 mapped_residues[c1] = dict()
2734 v1, v2 = seq.ViewsFromAlignment(aln)
2735 for res_1, res_2
in zip(v1.residues, v2.residues):
2736 r1 = res_1.number.num
2737 r2 = res_2.number.num
2738 mapped_residues[c1][r1] = r2
2740 return mapped_residues
2742 def _GetExtraWeights(contacts, done, mapped_residues):
2743 """Return sum of extra weights for contacts of chains in set and not in done.
2744 :return: Tuple (weight_extra_mapped, weight_extra_all).
2745 weight_extra_mapped only sums if both cX,rX in mapped_residues
2746 weight_extra_all sums all
2747 :param contacts: See :func:`GetContacts` for first entity
2748 :param done: List of (c1, c2, r1, r2) tuples to ignore
2749 :param mapped_residues: See :func:`_GetMappedResidues`
2751 weight_extra_mapped = 0
2752 weight_extra_non_mapped = 0
2754 for c2
in contacts[c1]:
2755 for r1
in contacts[c1][c2]:
2756 for r2
in contacts[c1][c2][r1]:
2757 if (c1, c2, r1, r2)
not in done:
2758 weight = _weight(contacts[c1][c2][r1][r2])
2759 if c1
in mapped_residues
and r1
in mapped_residues[c1] \
2760 and c2
in mapped_residues
and r2
in mapped_residues[c2]:
2761 weight_extra_mapped += weight
2763 weight_extra_non_mapped += weight
2764 return weight_extra_mapped, weight_extra_mapped + weight_extra_non_mapped
2766 def _GetScores(contacts_1, contacts_2, mapped_residues, chain_mapping):
2767 """Get QS scores (see :class:`QSscorer`).
2769 Note that if some chains are not to be considered at all, they must be removed
2770 from *contacts_1* / *contacts_2* prior to calling this.
2772 :param contacts_1: See :func:`GetContacts` for first entity
2773 :param contacts_2: See :func:`GetContacts` for second entity
2774 :param mapped_residues: See :func:`_GetMappedResidues`
2775 :param chain_mapping: Maps any chain name in *mapped_residues* to chain name
2777 :type chain_mapping: :class:`dict` (:class:`str` / :class:`str`)
2778 :return: Tuple (QS_best, QS_global) (see :attr:`QSscorer.best_score` and
2779 see :attr:`QSscorer.global_score`)
2788 for c11
in contacts_1:
2790 if c11
not in mapped_residues:
continue
2791 c1T = chain_mapping[c11]
2793 for c21
in contacts_1[c11]:
2795 if c21
not in mapped_residues:
continue
2796 c2T = chain_mapping[c21]
2798 flipped_chains = (c1T > c2T)
2804 if c12
not in contacts_2
or c22
not in contacts_2[c12]:
continue
2806 for r11
in contacts_1[c11][c21]:
2808 if r11
not in mapped_residues[c11]:
continue
2809 r1T = mapped_residues[c11][r11]
2811 for r21
in contacts_1[c11][c21][r11]:
2813 if r21
not in mapped_residues[c21]:
continue
2814 r2T = mapped_residues[c21][r21]
2821 if r12
not in contacts_2[c12][c22]:
continue
2822 if r22
not in contacts_2[c12][c22][r12]:
continue
2824 d1 = contacts_1[c11][c21][r11][r21]
2825 d2 = contacts_2[c12][c22][r12][r22]
2826 weight = _weight(min(d1, d2))
2827 weighted_scores += weight * (1 - abs(d1 - d2) / 12)
2828 weight_sum += weight
2830 done_1.add((c11, c21, r11, r21))
2831 done_2.add((c12, c22, r12, r22))
2833 LogVerbose(
"Shared contacts: %d" % len(done_1))
2836 weights_extra_1 = _GetExtraWeights(contacts_1, done_1, mapped_residues)
2837 mapped_residues_2 = dict()
2838 for c1
in mapped_residues:
2839 c2 = chain_mapping[c1]
2840 mapped_residues_2[c2] = set()
2841 for r1
in mapped_residues[c1]:
2842 mapped_residues_2[c2].add(mapped_residues[c1][r1])
2843 weights_extra_2 = _GetExtraWeights(contacts_2, done_2, mapped_residues_2)
2844 weight_extra_mapped = weights_extra_1[0] + weights_extra_2[0]
2845 weight_extra_all = weights_extra_1[1] + weights_extra_2[1]
2848 denom_best = weight_sum + weight_extra_mapped
2849 denom_all = weight_sum + weight_extra_all
2853 QS_best = weighted_scores / denom_best
2857 QS_global = weighted_scores / denom_all
2858 return QS_best, QS_global
2862 This weight expresses the probability of two residues to interact given the CB
2863 distance (from Xu et al. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2573399/)
2867 x = (dist-5.0)/4.28;
2868 return np.exp(-2*x*x)
2871 def _GetQsSuperposition(alns):
2873 :return: Superposition result based on shared CA atoms in *alns*
2874 (with views attached) (see :attr:`QSscorer.superposition`).
2875 :param alns: See :attr:`QSscorer.alignments`
2879 raise QSscoreError(
'No successful alignments for superposition!')
2881 view_1 = alns[0].GetSequence(0).attached_view.CreateEmptyView()
2882 view_2 = alns[0].GetSequence(1).attached_view.CreateEmptyView()
2886 res_1 = col.GetResidue(0)
2887 res_2 = col.GetResidue(1)
2888 if res_1.IsValid()
and res_2.IsValid():
2889 ca_1 = res_1.FindAtom(
'CA')
2890 ca_2 = res_2.FindAtom(
'CA')
2891 if ca_1.IsValid()
and ca_2.IsValid():
2892 view_1.AddAtom(ca_1)
2893 view_2.AddAtom(ca_2)
2895 res = mol.alg.SuperposeSVD(view_1, view_2, apply_transform=
False)
2899 def _AddResidue(edi, res, rnum, chain, calpha_only):
2901 Add residue *res* with res. num. *run* to given *chain* using editor *edi*.
2902 Either all atoms added or (if *calpha_only*) only CA.
2905 ca_atom = res.FindAtom(
'CA')
2906 if ca_atom.IsValid():
2907 new_res = edi.AppendResidue(chain, res.name, rnum)
2908 edi.InsertAtom(new_res, ca_atom.name, ca_atom.pos)
2910 new_res = edi.AppendResidue(chain, res.name, rnum)
2911 for atom
in res.atoms:
2912 edi.InsertAtom(new_res, atom.name, atom.pos)
2914 def _MergeAlignedChains(alns, ent_1, ent_2, calpha_only, penalize_extra_chains):
2916 Create two new entities (based on the alignments attached views) where all
2917 residues have same numbering (when they're aligned) and they are all pushed to
2918 a single chain X. Also append extra chains contained in *ent_1* and *ent_2*
2919 but not contained in *alns*.
2921 Used for :attr:`QSscorer.lddt_ref` and :attr:`QSscorer.lddt_mdl`
2923 :param alns: List of alignments with attached views (first sequence: *ent_1*,
2924 second: *ent_2*). Residue number in single chain is column index
2925 of current alignment + sum of lengths of all previous alignments
2926 (order of alignments as in input list).
2927 :type alns: See :attr:`QSscorer.alignments`
2928 :param ent_1: First entity to process.
2929 :type ent_1: :class:`~ost.mol.EntityHandle`
2930 :param ent_2: Second entity to process.
2931 :type ent_2: :class:`~ost.mol.EntityHandle`
2932 :param calpha_only: If True, we only include CA atoms instead of all.
2933 :type calpha_only: :class:`bool`
2934 :param penalize_extra_chains: If True, extra chains are added to model and
2935 reference. Otherwise, only mapped ones.
2936 :type penalize_extra_chains: :class:`bool`
2938 :return: Tuple of two single chain entities (from *ent_1* and from *ent_2*)
2939 :rtype: :class:`tuple` of :class:`~ost.mol.EntityHandle`
2942 ent_ren_1 = mol.CreateEntity()
2943 ed_1 = ent_ren_1.EditXCS()
2944 new_chain_1 = ed_1.InsertChain(
'X')
2946 ent_ren_2 = mol.CreateEntity()
2947 ed_2 = ent_ren_2.EditXCS()
2948 new_chain_2 = ed_2.InsertChain(
'X')
2951 chain_done_1 = set()
2952 chain_done_2 = set()
2954 chain_done_1.add(aln.GetSequence(0).name)
2955 chain_done_2.add(aln.GetSequence(1).name)
2959 res_1 = col.GetResidue(0)
2961 _AddResidue(ed_1, res_1, rnum, new_chain_1, calpha_only)
2962 res_2 = col.GetResidue(1)
2964 _AddResidue(ed_2, res_2, rnum, new_chain_2, calpha_only)
2966 if penalize_extra_chains:
2967 for chain
in ent_1.chains:
2968 if chain.name
in chain_done_1:
2970 for res
in chain.residues:
2972 _AddResidue(ed_1, res, rnum, new_chain_1, calpha_only)
2973 for chain
in ent_2.chains:
2974 if chain.name
in chain_done_2:
2976 for res
in chain.residues:
2978 _AddResidue(ed_2, res, rnum, new_chain_2, calpha_only)
2980 ent_ren_1.SetName(aln.GetSequence(0).GetAttachedView().GetName())
2981 ent_ren_2.SetName(aln.GetSequence(1).GetAttachedView().GetName())
2983 if not conop.GetDefaultLib():
2984 raise RuntimeError(
"QSscore computation requires a compound library!")
2986 pr.Process(ent_ren_1)
2988 pr.Process(ent_ren_2)
2990 return ent_ren_1, ent_ren_2
2994 __all__ = (
'QSscoreError',
'QSscorer',
'QSscoreEntity',
'FilterContacts',
2995 'GetContacts',
'OligoLDDTScorer',
'MappedLDDTScorer')
def _ComputeAlignedEntities
Class internal helpers (anything that doesnt easily work without this class)
def _GetSuperposeData
Class internal helpers (anything that doesnt easily work without this class)
Mat4 DLLEXPORT_OST_GEOM Invert(const Mat4 &m)
def _InitScorer
Class internal helpers (anything that doesnt easily work without this class)
Real Distance(const Vec3 &p1, const Vec3 &p2)
returns the distance between two points
Three dimensional vector class, using Real precision.
def _GetUnmappedMdlChains
def _PrepareOligoEntities
Class internal helpers.