4 import numpy.ma
as np_ma
9 from ost
import LogError, LogWarning, LogScript, LogInfo, LogVerbose, LogDebug
14 """ Scorer class to compute various small molecule ligand (non polymer) scores.
19 - Python modules `numpy` and `networkx` must be available
20 (e.g. use ``pip install numpy networkx``)
22 At the moment, two scores are available:
24 * lDDT-PLI, that looks at the conservation of protein-ligand contacts
25 with :class:`lDDT <ost.mol.alg.lddt.lDDTScorer>`.
26 * Binding-site superposed, symmetry-corrected RMSD that assesses the
27 accuracy of the ligand pose (BiSyRMSD, hereinafter referred to as RMSD).
29 Both scores involve local chain mapping of the reference binding site
30 onto the model, symmetry-correction, and finally assignment (mapping)
31 of model and target ligands, as described in (Manuscript in preparation).
33 The binding site is defined based on a radius around the target ligand
34 in the reference structure only. It only contains protein and nucleic
35 acid chains that pass the criteria for the
36 :class:`chain mapping <ost.mol.alg.chain_mapping>`. This means ignoring
37 other ligands, waters, short polymers as well as any incorrectly connected
38 chains that may be in proximity.
40 Results are available as matrices (`(lddt_pli|rmsd)_matrix`), where every
41 target-model score is reported in a matrix; as `(lddt_pli|rmsd)` where
42 a model-target assignment has been determined (see below) and reported in
43 a dictionary; and as (`(lddt_pli|rmsd)_details`) methods, which report
44 additional details about different aspects of the scoring such as chain
47 The behavior of chain mapping and ligand assignment can be controlled
48 with the `global_chain_mapping` and `rmsd_assignment` arguments.
50 By default, chain mapping is performed locally, ie. only within the
51 binding site. As a result, different ligand scores can correspond to
52 different chain mappings. This tends to produce more favorable scores,
53 especially in large, partially regular oligomeric complexes.
54 Setting `global_chain_mapping=True` enforces a single global chain mapping,
55 as per :meth:`ost.mol.alg.chain_mapping.ChainMapper.GetMapping`.
56 Note that this global chain mapping currently ignores non polymer entities
57 such as small ligands, and may result in overly pessimistic scores.
59 By default, target-model ligand assignments are computed independently
60 for the RMSD and lDDT-PLI scores. For RMSD, each model ligand is uniquely
61 assigned to a target ligand, starting from the "best" possible mapping
62 (lowest RMSD) and using each target and model ligand in a single
63 assignment. Ties are resolved by best (highest) lDDT-PLI. Similarly,
64 for lDDT-PLI, the assignment is based on the highest lDDT-PLI, and ties
65 broken by lowest RMSD. Setting `rmsd_assignment=True` forces a single
66 ligand assignment, based on RMSD only. Ties are broken arbitrarily.
68 By default, only exact matches between target and model ligands are
69 considered. This is a problem when the target only contains a subset
70 of the expected atoms (for instance if atoms are missing in an
71 experimental structure, which often happens in the PDB). With
72 `substructure_match=True`, complete model ligands can be scored against
73 partial target ligands. One problem with this approach is that it is
74 very easy to find good matches to small, irrelevant ligands like EDO, CO2
75 or GOL. To counter that, the assignment algorithm considers the coverage,
76 expressed as the fraction of atoms of the model ligand atoms covered in the
77 target. Higher coverage matches are prioritized, but a match with a better
78 score will be preferred if it falls within a window of `coverage_delta`
79 (by default 0.2) of a worse-scoring match. As a result, for instance,
80 with a delta of 0.2, a low-score match with coverage 0.96 would be
81 preferred to a high-score match with coverage 0.90.
85 The class generally assumes that the
86 :attr:`~ost.mol.ResidueHandle.is_ligand` property is properly set on all
87 the ligand atoms, and only ligand atoms. This is typically the case for
88 entities loaded from mmCIF (tested with mmCIF files from the PDB and
89 SWISS-MODEL), but it will most likely not work for most entities loaded
92 The class doesn't perform any cleanup of the provided structures.
93 It is up to the caller to ensure that the data is clean and suitable for
94 scoring. :ref:`Molck <molck>` should be used with extra
95 care, as many of the options (such as `rm_non_std` or `map_nonstd_res`) can
96 cause ligands to be removed from the structure. If cleanup with Molck is
97 needed, ligands should be kept aside and passed separately. Non-ligand residues
98 should be valid compounds with atom names following the naming conventions
99 of the component dictionary. Non-standard residues are acceptable, and if
100 the model contains a standard residue at that position, only atoms with
101 matching names will be considered.
103 Unlike most of OpenStructure, this class does not assume that the ligands
104 (either in the model or the target) are part of the PDB component
105 dictionary. They may have arbitrary residue names. Residue names do not
106 have to match between the model and the target. Matching is based on
107 the calculation of isomorphisms which depend on the atom element name and
108 atom connectivity (bond order is ignored).
109 It is up to the caller to ensure that the connectivity of atoms is properly
110 set before passing any ligands to this class. Ligands with improper
111 connectivity will lead to bogus results.
113 Note, however, that atom names should be unique within a residue (ie two
114 distinct atoms cannot have the same atom name).
116 This only applies to the ligand. The rest of the model and target
117 structures (protein, nucleic acids) must still follow the usual rules and
118 contain only residues from the compound library.
120 Although it isn't a requirement, hydrogen atoms should be removed from the
121 structures. Here is an example code snippet that will perform a reasonable
122 cleanup. Keep in mind that this is most likely not going to work as
123 expected with entities loaded from PDB files, as the `is_ligand` flag is
124 probably not set properly.
126 Here is a snippet example of how to use this code::
128 from ost.mol.alg.ligand_scoring import LigandScorer
129 from ost.mol.alg import Molck, MolckSettings
132 # Structure model in PDB format, containing the receptor only
133 model = io.LoadPDB("path_to_model.pdb")
134 # Ligand model as SDF file
135 model_ligand = io.LoadEntity("path_to_ligand.sdf", format="sdf")
136 # Target loaded from mmCIF, containing the ligand
137 target = io.LoadMMCIF("path_to_target.cif")
139 # Cleanup a copy of the structures
140 cleaned_model = model.Copy()
141 cleaned_target = target.Copy()
142 molck_settings = MolckSettings(rm_unk_atoms=True,
146 rm_zero_occ_atoms=False,
148 map_nonstd_res=False,
150 Molck(cleaned_model, conop.GetDefaultLib(), molck_settings)
151 Molck(cleaned_target, conop.GetDefaultLib(), molck_settings)
153 # Setup scorer object and compute lDDT-PLI
154 model_ligands = [model_ligand.Select("ele != H")]
155 ls = LigandScorer(model=cleaned_model, target=cleaned_target, model_ligands=model_ligands)
156 print("lDDT-PLI:", ls.lddt_pli)
157 print("RMSD:", ls.rmsd)
159 :param model: Model structure - a deep copy is available as :attr:`model`.
160 No additional processing (ie. Molck), checks,
161 stereochemistry checks or sanitization is performed on the
162 input. Hydrogen atoms are kept.
163 :type model: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
164 :param target: Target structure - a deep copy is available as :attr:`target`.
165 No additional processing (ie. Molck), checks or sanitization
166 is performed on the input. Hydrogen atoms are kept.
167 :type target: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
168 :param model_ligands: Model ligands, as a list of
169 :class:`~ost.mol.ResidueHandle` belonging to the model
170 entity. Can be instantiated with either a :class:list of
171 :class:`~ost.mol.ResidueHandle`/:class:`ost.mol.ResidueView`
172 or of :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`.
173 If `None`, ligands will be extracted from the `model` entity,
174 from chains with :class:`~ost.mol.ChainType`
175 `CHAINTYPE_NON_POLY` (this is normally set properly in
176 entities loaded from mmCIF).
177 :type model_ligands: :class:`list`
178 :param target_ligands: Target ligands, as a list of
179 :class:`~ost.mol.ResidueHandle` belonging to the target
180 entity. Can be instantiated either a :class:list of
181 :class:`~ost.mol.ResidueHandle`/:class:`ost.mol.ResidueView`
182 or of :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
183 containing a single residue each. If `None`, ligands will be
184 extracted from the `target` entity, from chains with
185 :class:`~ost.mol.ChainType` `CHAINTYPE_NON_POLY` (this is
186 normally set properly in entities loaded from mmCIF).
187 :type target_ligands: :class:`list`
188 :param resnum_alignments: Whether alignments between chemically equivalent
189 chains in *model* and *target* can be computed
190 based on residue numbers. This can be assumed in
191 benchmarking setups such as CAMEO/CASP.
192 :type resnum_alignments: :class:`bool`
193 :param check_resnames: On by default. Enforces residue name matches
194 between mapped model and target residues.
195 :type check_resnames: :class:`bool`
196 :param rename_ligand_chain: If a residue with the same chain name and
197 residue number than an explicitly passed model
198 or target ligand exits in the structure,
199 and `rename_ligand_chain` is False, a
200 RuntimeError will be raised. If
201 `rename_ligand_chain` is True, the ligand will
202 be moved to a new chain instead, and the move
203 will be logged to the console with SCRIPT
205 :type rename_ligand_chain: :class:`bool`
206 :param chain_mapper: a chain mapper initialized for the target structure.
207 If None (default), a chain mapper will be initialized
209 :type chain_mapper: :class:`ost.mol.alg.chain_mapping.ChainMapper`
210 :param substructure_match: Set this to True to allow partial target ligand.
211 :type substructure_match: :class:`bool`
212 :param coverage_delta: the coverage delta for partial ligand assignment.
213 :type coverage_delta: :class:`float`
214 :param radius: Inclusion radius for the binding site. Residues with
215 atoms within this distance of the ligand will be considered
216 for inclusion in the binding site.
217 :type radius: :class:`float`
218 :param lddt_pli_radius: lDDT inclusion radius for lDDT-PLI.
219 :type lddt_pli_radius: :class:`float`
220 :param lddt_lp_radius: lDDT inclusion radius for lDDT-LP.
221 :type lddt_lp_radius: :class:`float`
222 :param binding_sites_topn: maximum number of target binding site
223 representations to assess, per target ligand.
224 Ignored if `global_chain_mapping` is True.
225 :type binding_sites_topn: :class:`int`
226 :param global_chain_mapping: set to True to use a global chain mapping for
227 the polymer (protein, nucleotide) chains.
228 Defaults to False, in which case only local
229 chain mappings are allowed (where different
230 ligand may be scored against different chain
232 :type global_chain_mapping: :class:`bool`
233 :param custom_mapping: Provide custom chain mapping between *model* and
234 *target* that is used as global chain mapping.
235 Dictionary with target chain names as key and model
236 chain names as value. Only has an effect if
237 *global_chain_mapping* is True.
238 :type custom_mapping: :class:`dict`
239 :param rmsd_assignment: assign ligands based on RMSD only. The default
240 (False) is to use a combination of lDDT-PLI and
241 RMSD for the assignment.
242 :type rmsd_assignment: :class:`bool`
243 :param n_max_naive: Parameter for global chain mapping. If *model* and
244 *target* have less or equal that number of chains,
246 mapping solution space is enumerated to find the
247 the optimum. A heuristic is used otherwise.
248 :type n_max_naive: :class:`int`
249 :param unassigned: If True, unassigned model ligands are reported in
250 the output together with assigned ligands, with a score
251 of None, and reason for not being assigned in the
252 \*_details matrix. Defaults to False.
253 :type unassigned: :class:`bool`
255 def __init__(self, model, target, model_ligands=None, target_ligands=None,
256 resnum_alignments=
False, check_resnames=
True,
257 rename_ligand_chain=
False,
258 chain_mapper=
None, substructure_match=
False,
260 radius=4.0, lddt_pli_radius=6.0, lddt_lp_radius=10.0,
261 binding_sites_topn=100000, global_chain_mapping=
False,
262 rmsd_assignment=
False, n_max_naive=12,
263 custom_mapping=
None, unassigned=
False):
266 self.
model = mol.CreateEntityFromView(model,
False)
268 self.
model = model.Copy()
270 raise RuntimeError(
"model must be of type EntityView/EntityHandle")
273 self.
target = mol.CreateEntityFromView(target,
False)
275 self.
target = target.Copy()
277 raise RuntimeError(
"target must be of type EntityView/EntityHandle")
280 if target_ligands
is None:
287 LogWarning(
"No ligands in the target")
290 if model_ligands
is None:
297 LogWarning(
"No ligands in the model")
299 raise ValueError(
"No ligand in the model and in the target")
348 if custom_mapping
is not None:
353 """ Chain mapper object for the given :attr:`target`.
355 :type: :class:`ost.mol.alg.chain_mapping.ChainMapper`
364 def _model_mapping(self):
365 """Get the global chain mapping for the model."""
372 def _extract_ligands(entity):
373 """Extract ligands from entity. Return a list of residues.
375 Assumes that ligands are contained in one or more chain with chain type
376 `mol.ChainType.CHAINTYPE_NON_POLY`. This is typically the case
377 for entities loaded from mmCIF (tested with mmCIF files from the PDB
378 and SWISS-MODEL), but it will most likely not work for most entities
379 loaded from PDB files.
381 As a deviation from the mmCIF semantics, we allow a chain, set as
382 `CHAINTYPE_NON_POLY`, to contain more than one ligand. This function
383 performs basic checks to ensure that the residues in this chain are
384 not forming polymer bonds (ie peptide/nucleotide ligands) and will
385 raise a RuntimeError if this assumption is broken.
387 Note: This will not extract ligands based on the HET record in the old
388 PDB style, as this is not a reliable indicator and depends on how the
391 :param entity: the entity to extract ligands from
392 :type entity: :class:`~ost.mol.EntityHandle`
393 :rtype: :class:`list` of :class:`~ost.mol.ResidueHandle`
396 extracted_ligands = []
397 for chain
in entity.chains:
398 if chain.chain_type == mol.ChainType.CHAINTYPE_NON_POLY:
399 for residue
in chain.residues:
400 if mol.InSequence(residue, residue.next):
401 raise RuntimeError(
"Connected residues in non polymer "
402 "chain %s" % (chain.name))
403 residue.SetIsLigand(
True)
404 extracted_ligands.append(residue)
405 LogVerbose(
"Detected residue %s as ligand" % residue)
406 return extracted_ligands
409 def _prepare_ligands(new_entity, old_entity, ligands, rename_chain):
410 """Prepare the ligands given into a list of ResidueHandles which are
411 part of the copied entity, suitable for the model_ligands and
412 target_ligands properties.
414 This function takes a list of ligands as (Entity|Residue)(Handle|View).
415 Entities can contain multiple ligands, which will be considered as
418 Ligands which are part of the entity are simply fetched in the new
419 copied entity. Otherwise, they are copied over to the copied entity.
421 extracted_ligands = []
426 def _copy_residue(residue, rename_chain):
427 """ Copy the residue into the new chain.
428 Return the new residue handle."""
429 nonlocal next_chain_num, new_editor
432 if new_editor
is None:
433 new_editor = new_entity.EditXCS()
435 new_chain = new_entity.FindChain(residue.chain.name)
436 if not new_chain.IsValid():
437 new_chain = new_editor.InsertChain(residue.chain.name)
440 already_exists = new_chain.FindResidue(residue.number).IsValid()
445 new_chain_name = residue.chain.name +
"_" + str(chain_ext)
446 new_chain = new_entity.FindChain(new_chain_name)
447 if new_chain.IsValid():
451 new_chain = new_editor.InsertChain(new_chain_name)
453 LogScript(
"Moved ligand residue %s to new chain %s" % (
454 residue.qualified_name, new_chain.name))
456 msg =
"A residue number %s already exists in chain %s" % (
457 residue.number, residue.chain.name)
458 raise RuntimeError(msg)
461 new_res = new_editor.AppendResidue(new_chain, residue.name, residue.number)
463 for old_atom
in residue.atoms:
464 new_editor.InsertAtom(new_res, old_atom.name, old_atom.pos,
465 element=old_atom.element, occupancy=old_atom.occupancy,
466 b_factor=old_atom.b_factor, is_hetatm=old_atom.is_hetatom)
468 for old_atom
in residue.atoms:
469 for old_bond
in old_atom.bonds:
470 new_first = new_res.FindAtom(old_bond.first.name)
471 new_second = new_res.FindAtom(old_bond.second.name)
472 new_editor.Connect(new_first, new_second)
475 def _process_ligand_residue(res, rename_chain):
476 """Copy or fetch the residue. Return the residue handle."""
478 if res.entity.handle == old_entity.handle:
482 new_res = new_entity.FindResidue(res.chain.name, res.number)
483 if new_res
and new_res.valid:
484 LogVerbose(
"Ligand residue %s already in entity" % res.handle.qualified_name)
487 new_res = _copy_residue(res, rename_chain)
488 LogVerbose(
"Copied ligand residue %s" % res.handle.qualified_name)
489 new_res.SetIsLigand(
True)
492 for ligand
in ligands:
494 for residue
in ligand.residues:
495 new_residue = _process_ligand_residue(residue, rename_chain)
496 extracted_ligands.append(new_residue)
498 new_residue = _process_ligand_residue(ligand, rename_chain)
499 extracted_ligands.append(new_residue)
501 raise RuntimeError(
"Ligands should be given as Entity or Residue")
503 if new_editor
is not None:
504 new_editor.UpdateICS()
505 return extracted_ligands
507 def _get_binding_sites(self, ligand):
508 """Find representations of the binding site of *ligand* in the model.
510 Only consider protein and nucleic acid chains that pass the criteria
511 for the :class:`ost.mol.alg.chain_mapping`. This means ignoring other
512 ligands, waters, short polymers as well as any incorrectly connected
513 chain that may be in proximity.
515 :param ligand: Defines the binding site to identify.
516 :type ligand: :class:`~ost.mol.ResidueHandle`
521 ref_residues_hashes = set()
522 ignored_residue_hashes = {ligand.hash_code}
523 for ligand_at
in ligand.atoms:
524 close_atoms = self.target.FindWithin(ligand_at.GetPos(), self.
radius)
525 for close_at
in close_atoms:
527 ref_res = close_at.GetResidue()
528 h = ref_res.handle.GetHashCode()
529 if h
not in ref_residues_hashes
and \
530 h
not in ignored_residue_hashes:
531 if self.chain_mapper.target.ViewForHandle(ref_res).IsValid():
532 h = ref_res.handle.GetHashCode()
533 ref_residues_hashes.add(h)
534 elif ref_res.is_ligand:
535 LogWarning(
"Ignoring ligand %s in binding site of %s" % (
536 ref_res.qualified_name, ligand.qualified_name))
537 ignored_residue_hashes.add(h)
538 elif ref_res.chem_type == mol.ChemType.WATERS:
541 LogWarning(
"Ignoring residue %s in binding site of %s" % (
542 ref_res.qualified_name, ligand.qualified_name))
543 ignored_residue_hashes.add(h)
545 if ref_residues_hashes:
549 ref_bs = self.target.CreateEmptyView()
550 for ch
in self.target.chains:
551 for r
in ch.residues:
552 if r.handle.GetHashCode()
in ref_residues_hashes:
553 ref_bs.AddResidue(r, mol.ViewAddFlag.INCLUDE_ALL)
554 if len(ref_bs.residues) == 0:
555 raise RuntimeError(
"Failed to add proximity residues to "
556 "the reference binding site entity")
560 self.
_binding_sites[ligand.hash_code] = self.chain_mapper.GetRepr(
564 self.
_binding_sites[ligand.hash_code] = self.chain_mapper.GetRepr(
571 "model_representation",
572 "No representation of the reference binding site was "
573 "found in the model")
578 "No residue in proximity of the target ligand")
584 def _build_binding_site_entity(ligand, residues, extra_residues=[]):
585 """ Build an entity with all the binding site residues in chain A
586 and the ligand in chain _. Residues are renumbered consecutively from
587 1. The ligand is assigned residue number 1 and residue name LIG.
588 Residues in extra_residues not in `residues` in the model are added
589 at the end of chain A.
591 :param ligand: the Residue Handle of the ligand
592 :type ligand: :class:`~ost.mol.ResidueHandle`
593 :param residues: a list of binding site residues
594 :type residues: :class:`list` of :class:`~ost.mol.ResidueHandle`
595 :param extra_residues: an optional list with addition binding site
596 residues. Residues in this list which are not
597 in `residues` will be added at the end of chain
598 A. This allows for instance adding unmapped
599 residues missing from the model into the
600 reference binding site.
601 :type extra_residues: :class:`list` of :class:`~ost.mol.ResidueHandle`
602 :rtype: :class:`~ost.mol.EntityHandle`
604 bs_ent = mol.CreateEntity()
605 ed = bs_ent.EditXCS()
606 bs_chain = ed.InsertChain(
"A")
608 for resnum, old_res
in enumerate(residues, 1):
609 seen_res_qn.append(old_res.qualified_name)
610 new_res = ed.AppendResidue(bs_chain, old_res.handle,
612 ed.SetResidueNumber(new_res,
mol.ResNum(resnum))
615 for extra_res
in extra_residues:
616 if extra_res.qualified_name
not in seen_res_qn:
618 seen_res_qn.append(extra_res.qualified_name)
619 new_res = ed.AppendResidue(bs_chain,
622 ed.SetResidueNumber(new_res,
mol.ResNum(resnum))
624 ligand_chain = ed.InsertChain(
"_")
625 ligand_res = ed.AppendResidue(ligand_chain, ligand,
627 ed.RenameResidue(ligand_res,
"LIG")
628 ed.SetResidueNumber(ligand_res,
mol.ResNum(1))
633 def _compute_scores(self):
635 Compute the RMSD and lDDT-PLI scores for every possible target-model
636 ligand pair and store the result in internal matrices.
639 rmsd_full_matrix = np.empty(
641 lddt_pli_full_matrix = np.empty(
649 LogVerbose(
"Analyzing target ligand %s" % target_ligand)
652 LogVerbose(
"Found binding site with chain mapping %s" % (binding_site.GetFlatChainMapping()))
655 target_ligand, binding_site.ref_residues,
656 binding_site.substructure.residues)
657 ref_bs_ent_ligand = ref_bs_ent.FindResidue(
"_", 1)
660 ref_bs_ent_ligand.name:
661 mol.alg.lddt.CustomCompound.FromResidue(
665 custom_compounds=custom_compounds,
670 symmetries = _ComputeSymmetries(
671 model_ligand, target_ligand,
674 LogVerbose(
"Ligands %s and %s symmetry match" % (
675 str(model_ligand), str(target_ligand)))
676 except NoSymmetryError:
678 LogVerbose(
"No symmetry between %s and %s" % (
679 str(model_ligand), str(target_ligand)))
682 except DisconnectedGraphError:
685 substructure_match = len(symmetries[0][0]) != len(
687 coverage = len(symmetries[0][0]) / len(model_ligand.atoms)
691 rmsd =
SCRMSD(model_ligand, target_ligand,
692 transformation=binding_site.transform,
694 LogDebug(
"RMSD: %.4f" % rmsd)
697 if not rmsd_full_matrix[target_i, model_i]
or \
698 rmsd_full_matrix[target_i, model_i][
"rmsd"] > rmsd:
699 rmsd_full_matrix[target_i, model_i] = {
701 "lddt_lp": binding_site.lDDT,
702 "bs_ref_res": binding_site.substructure.residues,
703 "bs_ref_res_mapped": binding_site.ref_residues,
704 "bs_mdl_res_mapped": binding_site.mdl_residues,
705 "bb_rmsd": binding_site.bb_rmsd,
706 "target_ligand": target_ligand,
707 "model_ligand": model_ligand,
708 "chain_mapping": binding_site.GetFlatChainMapping(),
709 "transform": binding_site.transform,
710 "substructure_match": substructure_match,
711 "coverage": coverage,
712 "inconsistent_residues": binding_site.inconsistent_residues,
715 rmsd_full_matrix[target_i, model_i][
716 "unassigned"] =
False
717 LogDebug(
"Saved RMSD")
720 model_ligand, binding_site.mdl_residues, [])
721 mdl_bs_ent_ligand = mdl_bs_ent.FindResidue(
"_", 1)
725 mdl_editor = mdl_bs_ent.EditXCS()
726 for i, (trg_sym, mdl_sym)
in enumerate(symmetries):
728 for mdl_anum, trg_anum
in zip(mdl_sym, trg_sym):
730 trg_atom = ref_bs_ent_ligand.atoms[trg_anum]
731 mdl_atom = mdl_bs_ent_ligand.atoms[mdl_anum]
732 mdl_editor.RenameAtom(mdl_atom, trg_atom.name)
733 mdl_editor.UpdateICS()
735 global_lddt, local_lddt, lddt_tot, lddt_cons, n_res, \
736 n_cont, n_cons = lddt_scorer.lDDT(
737 mdl_bs_ent, chain_mapping={
"A":
"A",
"_":
"_"},
739 return_dist_test=
True,
741 LogDebug(
"lDDT-PLI for symmetry %d: %.4f" % (i, global_lddt))
744 if not lddt_pli_full_matrix[target_i, model_i]:
748 last_best_lddt = lddt_pli_full_matrix[
749 target_i, model_i][
"lddt_pli"]
750 last_best_rmsd = lddt_pli_full_matrix[
751 target_i, model_i][
"rmsd"]
752 if global_lddt > last_best_lddt:
755 elif global_lddt == last_best_lddt
and \
756 rmsd < last_best_rmsd:
762 lddt_pli_full_matrix[target_i, model_i] = {
763 "lddt_pli": global_lddt,
765 "lddt_lp": binding_site.lDDT,
766 "lddt_pli_n_contacts": lddt_tot,
767 "bs_ref_res": binding_site.substructure.residues,
768 "bs_ref_res_mapped": binding_site.ref_residues,
769 "bs_mdl_res_mapped": binding_site.mdl_residues,
770 "bb_rmsd": binding_site.bb_rmsd,
771 "target_ligand": target_ligand,
772 "model_ligand": model_ligand,
773 "chain_mapping": binding_site.GetFlatChainMapping(),
774 "transform": binding_site.transform,
775 "substructure_match": substructure_match,
776 "coverage": coverage,
777 "inconsistent_residues": binding_site.inconsistent_residues,
780 lddt_pli_full_matrix[target_i, model_i][
781 "unassigned"] =
False
782 LogDebug(
"Saved lDDT-PLI")
788 def _find_ligand_assignment(mat1, mat2=None, coverage=None, coverage_delta=None):
789 """ Find the ligand assignment based on mat1. If mat2 is provided, it
790 will be used to break ties in mat1. If mat2 is not provided, ties will
791 be resolved by taking the first match arbitrarily.
793 Both mat1 and mat2 should "look" like RMSD - ie be between inf (bad)
800 mat2[~np.isnan(mat2)] = np.inf
804 coverage = np.copy(mat1)
807 coverage = np.copy(coverage)
812 LogDebug(
"No model or target ligand, returning no assignment.")
815 def _get_best_match(mat1_val, coverage_val):
816 """ Extract the row/column indices of the prediction matching the
818 mat1_match_idx = np.argwhere((mat1 == mat1_val) & (coverage >= coverage_val))
820 if len(mat1_match_idx) > 1:
822 best_mat2_match = [mat2[tuple(x)]
for x
in mat1_match_idx]
825 best_mat2_idx = np.array(best_mat2_match).argmin()
827 return mat1_match_idx[best_mat2_idx]
829 return mat1_match_idx[0]
832 min_coverage = np.max(coverage)
833 while min_coverage > 0:
834 LogVerbose(
"Looking for matches with coverage >= %s" % min_coverage)
835 min_mat1 = LigandScorer._nanmin_nowarn(mat1, coverage < min_coverage)
836 while not np.isnan(min_mat1):
837 max_i_trg, max_i_mdl = _get_best_match(min_mat1, min_coverage)
841 alternative_matches = (mat1[:, max_i_mdl] < min_mat1) & (
842 coverage[:, max_i_mdl] > (min_coverage - coverage_delta))
843 if np.any(alternative_matches):
845 LogVerbose(
"Found match with lower coverage but better score")
846 min_mat1 = np.min(mat1[alternative_matches])
847 max_i_trg, max_i_mdl = _get_best_match(min_mat1, min_coverage - coverage_delta)
850 mat1[max_i_trg, :] = np.nan
851 mat1[:, max_i_mdl] = np.nan
852 mat2[max_i_trg, :] = np.nan
853 mat2[:, max_i_mdl] = np.nan
854 coverage[max_i_trg, :] = -np.inf
855 coverage[:, max_i_mdl] = -np.inf
858 assignments.append((max_i_trg, max_i_mdl))
861 min_mat1 = LigandScorer._nanmin_nowarn(mat1, coverage < min_coverage)
863 min_coverage = np.max(coverage)
867 def _nanmin_nowarn(array, mask):
868 """Compute np.nanmin but ignore the RuntimeWarning."""
869 masked_array = np_ma.masked_array(array, mask=mask)
870 with warnings.catch_warnings():
871 warnings.simplefilter(
"ignore")
872 min = np.nanmin(masked_array, )
873 if np_ma.is_masked(min):
882 def _reverse_lddt(lddt):
883 """Reverse lDDT means turning it from a number between 0 and 1 to a
884 number between infinity and 0 (0 being better).
886 In practice, this is 1/lDDT. If lDDT is 0, the result is infinity.
888 with warnings.catch_warnings():
889 warnings.simplefilter(
"ignore")
890 return np.float64(1) / lddt
892 def _assign_ligands_rmsd(self):
893 """Assign (map) ligands between model and target.
895 Sets self._rmsd and self._rmsd_details.
903 self.
_rmsd = mat_tuple[0]
910 def _assign_matrices(self, mat1, mat2, data, main_key):
912 Perform the ligand assignment, ie find the mapping between model and
915 The algorithm starts by assigning the "best" mapping, and then discards
916 the target and model ligands (row, column) so that every model ligand
917 can be assigned to a single target ligand, and every target ligand
918 is only assigned to a single model ligand. Repeat until there is
919 nothing left to assign.
921 In case of a tie in values in `mat1`, it uses `mat2` to break the tie.
923 This algorithm doesn't guarantee a globally optimal assignment.
925 Both `mat1` and `mat2` should contain values between 0 and infinity,
926 with lower values representing better scores. Use the
927 :meth:`_reverse_lddt` method to convert lDDT values to such a score.
929 :param mat1: the main ligand assignment criteria (RMSD or lDDT-PLI)
930 :param mat2: the secondary ligand assignment criteria (lDDT-PLI or RMSD)
931 :param data: the data (either self._rmsd_full_matrix or self._lddt_pli_matrix)
932 :param main_key: the key of data (dictionnaries within `data`) to
933 assign into out_main.
934 :return: a tuple with 2 dictionaries of matrices containing the main
935 data, and details, respectively.
944 for assignment
in assignments:
945 trg_idx, mdl_idx = assignment
946 assigned_mdl[mdl_idx] =
True
947 assigned_trg[trg_idx] =
True
949 mdl_cname = mdl_lig.chain.name
950 mdl_resnum = mdl_lig.number
951 if mdl_cname
not in out_main:
952 out_main[mdl_cname] = {}
953 out_details[mdl_cname] = {}
954 out_main[mdl_cname][mdl_resnum] = data[
955 trg_idx, mdl_idx][main_key]
956 out_details[mdl_cname][mdl_resnum] = data[
961 assigned_trg, assigned_mdl, [out_main], [out_details], [main_key])
962 return out_main, out_details, unassigned_trg, unassigned_mdl
964 def _assign_unassigned(self, assigned_trg, assigned_mdl,
965 out_main, out_details, main_key):
969 unassigned_trg_idx = [i
for i, x
in enumerate(assigned_trg)
if not x]
970 unassigned_mdl_idx = [i
for i, x
in enumerate(assigned_mdl)
if not x]
972 for mdl_idx
in unassigned_mdl_idx:
975 mdl_cname = mdl_lig.chain.name
976 mdl_resnum = mdl_lig.number
977 if mdl_cname
not in unassigned_mdl:
978 unassigned_mdl[mdl_cname] = {}
979 unassigned_mdl[mdl_cname][mdl_resnum] = reason
981 for i, _
in enumerate(out_main):
982 if mdl_cname
not in out_main[i]:
983 out_main[i][mdl_cname] = {}
984 out_details[i][mdl_cname] = {}
985 out_main[i][mdl_cname][mdl_resnum] =
None
986 out_details[i][mdl_cname][mdl_resnum] = {
988 "reason_short": reason[0],
989 "reason_long": reason[1],
992 LogInfo(
"Model ligand %s is unassigned: %s" % (
993 mdl_lig.qualified_name, reason[1]))
995 for trg_idx
in unassigned_trg_idx:
998 trg_cname = trg_lig.chain.name
999 trg_resnum = trg_lig.number
1000 if trg_cname
not in unassigned_trg:
1001 unassigned_trg[trg_cname] = {}
1002 unassigned_trg[trg_cname][trg_resnum] = reason
1003 LogInfo(
"Target ligand %s is unassigned: %s" % (
1004 trg_lig.qualified_name, reason[1]))
1006 return unassigned_trg, unassigned_mdl
1009 def _assign_matrix(self, mat, data1, main_key1, data2, main_key2):
1011 Perform the ligand assignment, ie find the mapping between model and
1012 target ligands, based on a single matrix
1014 The algorithm starts by assigning the "best" mapping, and then discards
1015 the target and model ligands (row, column) so that every model ligand
1016 can be assigned to a single target ligand, and every target ligand
1017 is only assigned to a single model ligand. Repeat until there is
1018 nothing left to assign.
1020 This algorithm doesn't guarantee a globally optimal assignment.
1022 `mat` should contain values between 0 and infinity,
1023 with lower values representing better scores. Use the
1024 :meth:`_reverse_lddt` method to convert lDDT values to such a score.
1026 :param mat: the ligand assignment criteria (RMSD or lDDT-PLI)
1027 :param data1: the first data (either self._rmsd_full_matrix or self._lddt_pli_matrix)
1028 :param main_key1: the first key of data (dictionnaries within `data`) to
1029 assign into out_main.
1030 :param data2: the second data (either self._rmsd_full_matrix or self._lddt_pli_matrix)
1031 :param main_key2: the second key of data (dictionnaries within `data`) to
1032 assign into out_main.
1033 :return: a tuple with 4 dictionaries of matrices containing the main
1034 data1, details1, main data2 and details2, respectively.
1045 for assignment
in assignments:
1046 trg_idx, mdl_idx = assignment
1047 assigned_mdl[mdl_idx] =
True
1048 assigned_trg[trg_idx] =
True
1050 mdl_cname = mdl_lig.chain.name
1051 mdl_resnum = mdl_lig.number
1053 if mdl_cname
not in out_main1:
1054 out_main1[mdl_cname] = {}
1055 out_details1[mdl_cname] = {}
1056 out_main1[mdl_cname][mdl_resnum] = data1[
1057 trg_idx, mdl_idx][main_key1]
1058 out_details1[mdl_cname][mdl_resnum] = data1[
1061 if mdl_cname
not in out_main2:
1062 out_main2[mdl_cname] = {}
1063 out_details2[mdl_cname] = {}
1064 out_main2[mdl_cname][mdl_resnum] = data2[
1065 trg_idx, mdl_idx][main_key2]
1066 out_details2[mdl_cname][mdl_resnum] = data2[
1070 assigned_trg, assigned_mdl,
1071 [out_main1, out_main2], [out_details1, out_details2],
1072 [main_key1, main_key2])
1074 return out_main1, out_details1, out_main2, out_details2, \
1075 unassigned_trg, unassigned_mdl
1077 def _assign_ligands_lddt_pli(self):
1078 """ Assign ligands based on lDDT-PLI.
1080 Sets self._lddt_pli and self._lddt_pli_details.
1093 def _assign_ligands_rmsd_only(self):
1094 """Assign (map) ligands between model and target based on RMSD only.
1096 Sets self._rmsd, self._rmsd_details, self._lddt_pli and
1097 self._lddt_pli_details.
1104 self.
_rmsd = mat_tuple[0]
1113 """ Get the matrix of RMSD values.
1115 Target ligands are in rows, model ligands in columns.
1117 NaN values indicate that no RMSD could be computed (i.e. different
1120 :rtype: :class:`~numpy.ndarray`
1126 shape = self._rmsd_full_matrix.shape
1128 for i, j
in np.ndindex(shape):
1136 """ Get the matrix of lDDT-PLI values.
1138 Target ligands are in rows, model ligands in columns.
1140 NaN values indicate that no lDDT-PLI could be computed (i.e. different
1143 :rtype: :class:`~numpy.ndarray`
1149 shape = self._lddt_pli_full_matrix.shape
1151 for i, j
in np.ndindex(shape):
1159 """ Get the matrix of model ligand atom coverage in the target.
1161 Target ligands are in rows, model ligands in columns.
1163 A value of 0 indicates that there was no isomorphism between the model
1164 and target ligands. If `substructure_match=False`, only full match
1165 isomorphisms are considered, and therefore only values of 1.0 and 0.0
1168 :rtype: :class:`~numpy.ndarray`
1176 """Get a dictionary of RMSD score values, keyed by model ligand
1177 (chain name, :class:`~ost.mol.ResNum`).
1179 If the scoring object was instantiated with `unassigned=True`, some
1180 scores may be `None`.
1182 :rtype: :class:`dict`
1184 if self.
_rmsd is None:
1193 """Get a dictionary of RMSD score details (dictionaries), keyed by
1194 model ligand (chain name, :class:`~ost.mol.ResNum`).
1196 The value is a dictionary. For ligands that were assigned (mapped) to
1197 the target, the dictionary contain the following information:
1199 * `rmsd`: the RMSD score value.
1200 * `lddt_lp`: the lDDT score of the ligand pocket (lDDT-LP).
1201 * `bs_ref_res`: a list of residues (:class:`~ost.mol.ResidueHandle`)
1202 that define the binding site in the reference.
1203 * `bs_ref_res_mapped`: a list of residues
1204 (:class:`~ost.mol.ResidueHandle`) in the reference binding site
1205 that could be mapped to the model.
1206 * `bs_mdl_res_mapped`: a list of residues
1207 (:class:`~ost.mol.ResidueHandle`) in the model that were mapped to
1208 the reference binding site. The residues are in the same order as
1209 `bs_ref_res_mapped`.
1210 * `bb_rmsd`: the RMSD of the binding site backbone after superposition
1211 * `target_ligand`: residue handle of the target ligand.
1212 * `model_ligand`: residue handle of the model ligand.
1213 * `chain_mapping`: local chain mapping as a dictionary, with target
1214 chain name as key and model chain name as value.
1215 * `transform`: transformation to superpose the model onto the target.
1216 * `substructure_match`: whether the score is the result of a partial
1217 (substructure) match. A value of `True` indicates that the target
1218 ligand covers only part of the model, while `False` indicates a
1220 * `coverage`: the fraction of model atoms covered by the assigned
1221 target ligand, in the interval (0, 1]. If `substructure_match`
1222 is `False`, this will always be 1.
1223 * `inconsistent_residues`: a list of tuples of mapped residues views
1224 (:class:`~ost.mol.ResidueView`) with residue names that differ
1225 between the reference and the model, respectively.
1226 The list is empty if all residue names match, which is guaranteed
1227 if `check_resnames=True`.
1228 Note: more binding site mappings may be explored during scoring,
1229 but only inconsistencies in the selected mapping are reported.
1230 * `unassigned`: only if the scorer was instantiated with
1231 `unassigned=True`: `False`
1233 If the scoring object was instantiated with `unassigned=True`, in
1234 addition the unassigned ligands will be reported with a score of `None`
1235 and the following information:
1237 * `unassigned`: `True`,
1238 * `reason_short`: a short token of the reason, see
1239 :attr:`unassigned_model_ligands` for details and meaning.
1240 * `reason_long`: a human-readable text of the reason, see
1241 :attr:`unassigned_model_ligands` for details and meaning.
1244 :rtype: :class:`dict`
1255 """Get a dictionary of lDDT-PLI score values, keyed by model ligand
1256 (chain name, :class:`~ost.mol.ResNum`).
1258 If the scoring object was instantiated with `unassigned=True`, some
1259 scores may be `None`.
1261 :rtype: :class:`dict`
1272 """Get a dictionary of lDDT-PLI score details (dictionaries), keyed by
1273 model ligand (chain name, :class:`~ost.mol.ResNum`).
1275 Each sub-dictionary contains the following information:
1277 * `lddt_pli`: the lDDT-PLI score value.
1278 * `rmsd`: the RMSD score value corresponding to the lDDT-PLI
1279 chain mapping and assignment. This may differ from the RMSD-based
1280 assignment. Note that a different isomorphism than `lddt_pli` may
1282 * `lddt_lp`: the lDDT score of the ligand pocket (lDDT-LP).
1283 * `lddt_pli_n_contacts`: number of total contacts used in lDDT-PLI,
1284 summed over all thresholds. Can be divided by 8 to obtain the number
1286 * `bs_ref_res`: a list of residues (:class:`~ost.mol.ResidueHandle`)
1287 that define the binding site in the reference.
1288 * `bs_ref_res_mapped`: a list of residues
1289 (:class:`~ost.mol.ResidueHandle`) in the reference binding site
1290 that could be mapped to the model.
1291 * `bs_mdl_res_mapped`: a list of residues
1292 (:class:`~ost.mol.ResidueHandle`) in the model that were mapped to
1293 the reference binding site. The residues are in the same order as
1294 `bs_ref_res_mapped`.
1295 * `bb_rmsd`: the RMSD of the binding site backbone after superposition.
1296 Note: not used for lDDT-PLI computation.
1297 * `target_ligand`: residue handle of the target ligand.
1298 * `model_ligand`: residue handle of the model ligand.
1299 * `chain_mapping`: local chain mapping as a dictionary, with target
1300 chain name as key and model chain name as value.
1301 * `transform`: transformation to superpose the model onto the target
1303 * `substructure_match`: whether the score is the result of a partial
1304 (substructure) match. A value of `True` indicates that the target
1305 ligand covers only part of the model, while `False` indicates a
1307 * `inconsistent_residues`: a list of tuples of mapped residues views
1308 (:class:`~ost.mol.ResidueView`) with residue names that differ
1309 between the reference and the model, respectively.
1310 The list is empty if all residue names match, which is guaranteed
1311 if `check_resnames=True`.
1312 Note: more binding site mappings may be explored during scoring,
1313 but only inconsistencies in the selected mapping are reported.
1314 * `unassigned`: only if the scorer was instantiated with
1315 `unassigned=True`: `False`
1317 If the scoring object was instantiated with `unassigned=True`, in
1318 addition the unmapped ligands will be reported with a score of `None`
1319 and the following information:
1321 * `unassigned`: `True`,
1322 * `reason_short`: a short token of the reason, see
1323 :attr:`unassigned_model_ligands` for details and meaning.
1324 * `reason_long`: a human-readable text of the reason, see
1325 :attr:`unassigned_model_ligands` for details and meaning.
1326 * `lddt_pli`: `None`
1328 :rtype: :class:`dict`
1339 """Get a dictionary of target ligands not assigned to any model ligand,
1340 keyed by target ligand (chain name, :class:`~ost.mol.ResNum`).
1342 The assignment for the lDDT-PLI score is used (and is controlled
1343 by the `rmsd_assignment` argument).
1345 Each item contains a string from a controlled dictionary
1346 about the reason for the absence of assignment.
1347 A human-readable description can be obtained from the
1348 :attr:`unassigned_target_ligand_descriptions` property.
1350 Currently, the following reasons are reported:
1352 * `no_ligand`: there was no ligand in the model.
1353 * `disconnected`: the ligand graph was disconnected.
1354 * `binding_site`: no residues were in proximity of the ligand.
1355 * `model_representation`: no representation of the reference binding
1356 site was found in the model. (I.e. the binding site was not modeled.
1357 Remember: the binding site is defined in the target structure,
1358 the position of the model ligand itself is ignored at this point.)
1359 * `identity`: the ligand was not found in the model (by graph
1360 isomorphism). Check your ligand connectivity, and enable the
1361 `substructure_match` option if the target ligand is incomplete.
1362 * `stoichiometry`: there was a possible assignment in the model, but
1363 the model ligand was already assigned to a different target ligand.
1364 This indicates different stoichiometries.
1366 Some of these reasons can be overlapping, but a single reason will be
1369 :rtype: :class:`dict`
1378 for cname, res
in self._unassigned_target_ligands.items():
1380 for resnum, val
in res.items():
1387 """Get a human-readable description of why target ligands were
1388 unassigned, as a dictionary keyed by the controlled dictionary
1389 from :attr:`unassigned_target_ligands`.
1397 """Get a dictionary of model ligands not assigned to any target ligand,
1398 keyed by model ligand (chain name, :class:`~ost.mol.ResNum`).
1400 The assignment for the lDDT-PLI score is used (and is controlled
1401 by the `rmsd_assignment` argument).
1403 Each item contains a string from a controlled dictionary
1404 about the reason for the absence of assignment.
1405 A human-readable description can be obtained from the
1406 :attr:`unassigned_model_ligand_descriptions` property.
1407 Currently, the following reasons are reported:
1409 * `no_ligand`: there was no ligand in the target.
1410 * `disconnected`: the ligand graph is disconnected.
1411 * `binding_site`: a potential assignment was found in the target, but
1412 there were no polymer residues in proximity of the ligand in the
1414 * `model_representation`: a potential assignment was found in the target,
1415 but no representation of the binding site was found in the model.
1416 (I.e. the binding site was not modeled. Remember: the binding site
1417 is defined in the target structure, the position of the model ligand
1418 itself is ignored at this point.)
1419 * `identity`: the ligand was not found in the target (by graph
1420 isomorphism). Check your ligand connectivity, and enable the
1421 `substructure_match` option if the target ligand is incomplete.
1422 * `stoichiometry`: there was a possible assignment in the target, but
1423 the model target was already assigned to a different model ligand.
1424 This indicates different stoichiometries.
1426 Some of these reasons can be overlapping, but a single reason will be
1429 :rtype: :class:`dict`
1438 for cname, res
in self._unassigned_model_ligands.items():
1440 for resnum, val
in res.items():
1447 """Get a human-readable description of why model ligands were
1448 unassigned, as a dictionary keyed by the controlled dictionary
1449 from :attr:`unassigned_model_ligands`.
1456 def _set_custom_mapping(self, mapping):
1457 """ sets self.__model_mapping with a full blown MappingResult object
1459 :param mapping: mapping with trg chains as key and mdl ch as values
1460 :type mapping: :class:`dict`
1463 chem_mapping, chem_group_alns, mdl = \
1464 chain_mapper.GetChemMapping(self.
model)
1469 if len(mapping) != len(set(mapping.keys())):
1470 raise RuntimeError(f
"Expect unique trg chain names in mapping. Got "
1471 f
"{mapping.keys()}")
1472 if len(mapping) != len(set(mapping.values())):
1473 raise RuntimeError(f
"Expect unique mdl chain names in mapping. Got "
1474 f
"{mapping.values()}")
1476 trg_chains = set([ch.GetName()
for ch
in chain_mapper.target.chains])
1477 mdl_chains = set([ch.GetName()
for ch
in mdl.chains])
1478 for k,v
in mapping.items():
1479 if k
not in trg_chains:
1480 raise RuntimeError(f
"Target chain \"{k}\" is not available "
1481 f
"in target processed for chain mapping "
1483 if v
not in mdl_chains:
1484 raise RuntimeError(f
"Model chain \"{v}\" is not available "
1485 f
"in model processed for chain mapping "
1488 for trg_ch, mdl_ch
in mapping.items():
1489 trg_group_idx =
None
1490 mdl_group_idx =
None
1491 for idx, group
in enumerate(chain_mapper.chem_groups):
1495 for idx, group
in enumerate(chem_mapping):
1499 if trg_group_idx
is None or mdl_group_idx
is None:
1500 raise RuntimeError(
"Could not establish a valid chem grouping "
1501 "of chain names provided in custom mapping.")
1503 if trg_group_idx != mdl_group_idx:
1504 raise RuntimeError(f
"Chem group mismatch in custom mapping: "
1505 f
"target chain \"{trg_ch}\" groups with the "
1506 f
"following chemically equivalent target "
1508 f
"{chain_mapper.chem_groups[trg_group_idx]} "
1509 f
"but model chain \"{mdl_ch}\" maps to the "
1510 f
"following target chains: "
1511 f
"{chain_mapper.chem_groups[mdl_group_idx]}")
1513 pairs = set([(trg_ch, mdl_ch)
for trg_ch, mdl_ch
in mapping.items()])
1515 chain_mapping._GetRefMdlAlns(chain_mapper.chem_groups,
1516 chain_mapper.chem_group_alignments,
1522 final_mapping = list()
1523 for ref_chains
in chain_mapper.chem_groups:
1524 mapped_mdl_chains = list()
1525 for ref_ch
in ref_chains:
1526 if ref_ch
in mapping:
1527 mapped_mdl_chains.append(mapping[ref_ch])
1529 mapped_mdl_chains.append(
None)
1530 final_mapping.append(mapped_mdl_chains)
1533 for ref_group, mdl_group
in zip(chain_mapper.chem_groups,
1535 for ref_ch, mdl_ch
in zip(ref_group, mdl_group):
1536 if ref_ch
is not None and mdl_ch
is not None:
1537 aln = ref_mdl_alns[(ref_ch, mdl_ch)]
1538 trg_view = chain_mapper.target.Select(f
"cname={ref_ch}")
1539 mdl_view = mdl.Select(f
"cname={mdl_ch}")
1540 aln.AttachView(0, trg_view)
1541 aln.AttachView(1, mdl_view)
1542 alns[(ref_ch, mdl_ch)] = aln
1545 chain_mapper.chem_groups,
1547 final_mapping, alns)
1549 def _find_unassigned_model_ligand_reason(self, ligand, assignment="lddt_pli", check=True):
1552 ligand_idx = self.model_ligands.index(ligand)
1555 raise ValueError(
"Ligand %s is not in self.model_ligands" % ligand)
1559 details = getattr(self, assignment +
"_details")
1560 if ligand.chain.name
in details
and ligand.number
in details[ligand.chain.name]:
1561 ligand_details = details[ligand.chain.name][ligand.number]
1562 if not (
"unassigned" in ligand_details
and ligand_details[
"unassigned"]):
1563 raise RuntimeError(
"Ligand %s is mapped to %s" % (ligand, ligand_details[
"target_ligand"]))
1567 return (
"no_ligand",
"No ligand in the target")
1570 graph = _ResidueToGraph(ligand)
1571 if not networkx.is_connected(graph):
1572 return (
"disconnected",
"Ligand graph is disconnected")
1576 if np.isnan(assigned):
1583 return_symmetries=
False)
1584 except (NoSymmetryError, DisconnectedGraphError):
1592 assignment_matrix = getattr(self, assignment +
"_matrix")
1593 all_nan = np.all(np.isnan(assignment_matrix[:, ligand_idx]))
1601 return (
"stoichiometry",
1602 "Ligand was already assigned to an other "
1603 "model ligand (different stoichiometry)")
1607 iso =
"subgraph isomorphism"
1609 iso =
"full graph isomorphism"
1610 return (
"identity",
"Ligand was not found in the target (by %s)" % iso)
1612 def _find_unassigned_target_ligand_reason(self, ligand, assignment="lddt_pli", check=True):
1615 ligand_idx = self.target_ligands.index(ligand)
1618 raise ValueError(
"Ligand %s is not in self.target_ligands" % ligand)
1622 details = getattr(self, assignment +
"_details")
1623 for cname, chain_ligands
in details.items():
1624 for rnum, details
in chain_ligands.items():
1625 if "unassigned" in details
and details[
"unassigned"]:
1627 if details[
'target_ligand'] == ligand:
1628 raise RuntimeError(
"Ligand %s is mapped to %s.%s" % (
1629 ligand, cname, rnum))
1633 return (
"no_ligand",
"No ligand in the model")
1636 graph = _ResidueToGraph(ligand)
1637 if not networkx.is_connected(graph):
1638 return (
"disconnected",
"Ligand graph is disconnected")
1644 for model_lig_idx, assigned
in enumerate(
1646 if np.isnan(assigned):
1653 return_symmetries=
False)
1654 except (NoSymmetryError, DisconnectedGraphError):
1661 return (
"stoichiometry",
1662 "Ligand was already assigned to an other "
1663 "target ligand (different stoichiometry)")
1667 iso =
"subgraph isomorphism"
1669 iso =
"full graph isomorphism"
1670 return (
"identity",
"Ligand was not found in the model (by %s)" % iso)
1673 def _ResidueToGraph(residue, by_atom_index=False):
1674 """Return a NetworkX graph representation of the residue.
1676 :param residue: the residue from which to derive the graph
1677 :type residue: :class:`ost.mol.ResidueHandle` or
1678 :class:`ost.mol.ResidueView`
1679 :param by_atom_index: Set this parameter to True if you need the nodes to
1680 be labeled by atom index (within the residue).
1681 Otherwise, if False, the nodes will be labeled by
1683 :type by_atom_index: :class:`bool`
1684 :rtype: :class:`~networkx.classes.graph.Graph`
1686 Nodes are labeled with the Atom's uppercase :attr:`~ost.mol.AtomHandle.element`.
1688 nxg = networkx.Graph()
1690 for atom
in residue.atoms:
1691 nxg.add_node(atom.name, element=atom.element.upper())
1695 nxg.add_edges_from([(
1697 b.second.name)
for a
in residue.atoms
for b
in a.GetBondList()])
1700 nxg = networkx.relabel_nodes(nxg,
1701 {a: b
for a, b
in zip(
1702 [a.name
for a
in residue.atoms],
1703 range(len(residue.atoms)))},
1708 def SCRMSD(model_ligand, target_ligand, transformation=geom.Mat4(),
1709 substructure_match=
False):
1710 """Calculate symmetry-corrected RMSD.
1712 Binding site superposition must be computed separately and passed as
1715 :param model_ligand: The model ligand
1716 :type model_ligand: :class:`ost.mol.ResidueHandle` or
1717 :class:`ost.mol.ResidueView`
1718 :param target_ligand: The target ligand
1719 :type target_ligand: :class:`ost.mol.ResidueHandle` or
1720 :class:`ost.mol.ResidueView`
1721 :param transformation: Optional transformation to apply on each atom
1722 position of model_ligand.
1723 :type transformation: :class:`ost.geom.Mat4`
1724 :param substructure_match: Set this to True to allow partial target
1726 :type substructure_match: :class:`bool`
1727 :rtype: :class:`float`
1728 :raises: :class:`NoSymmetryError` when no symmetry can be found,
1729 :class:`DisconnectedGraphError` when ligand graph is disconnected.
1732 symmetries = _ComputeSymmetries(model_ligand, target_ligand,
1733 substructure_match=substructure_match,
1737 for i, (trg_sym, mdl_sym)
in enumerate(symmetries):
1739 trg_lig_rmsd_ent = mol.CreateEntity()
1740 trg_lig_rmsd_editor = trg_lig_rmsd_ent.EditXCS()
1741 trg_lig_rmsd_chain = trg_lig_rmsd_editor.InsertChain(
"_")
1742 trg_lig_rmsd_res = trg_lig_rmsd_editor.AppendResidue(trg_lig_rmsd_chain,
"LIG")
1744 mdl_lig_rmsd_ent = mol.CreateEntity()
1745 mdl_lig_rmsd_editor = mdl_lig_rmsd_ent.EditXCS()
1746 mdl_lig_rmsd_chain = mdl_lig_rmsd_editor.InsertChain(
"_")
1747 mdl_lig_rmsd_res = mdl_lig_rmsd_editor.AppendResidue(mdl_lig_rmsd_chain,
"LIG")
1749 for mdl_anum, trg_anum
in zip(mdl_sym, trg_sym):
1751 trg_atom = target_ligand.atoms[trg_anum]
1752 mdl_atom = model_ligand.atoms[mdl_anum]
1754 trg_lig_rmsd_editor.InsertAtom(trg_lig_rmsd_res, trg_atom.name, trg_atom.pos)
1755 mdl_lig_rmsd_editor.InsertAtom(mdl_lig_rmsd_res, mdl_atom.name, mdl_atom.pos)
1757 trg_lig_rmsd_editor.UpdateICS()
1758 mdl_lig_rmsd_editor.UpdateICS()
1760 rmsd = mol.alg.CalculateRMSD(mdl_lig_rmsd_ent.CreateFullView(),
1761 trg_lig_rmsd_ent.CreateFullView(),
1763 if rmsd < best_rmsd:
1769 def _ComputeSymmetries(model_ligand, target_ligand, substructure_match=False,
1770 by_atom_index=
False, return_symmetries=
True):
1771 """Return a list of symmetries (isomorphisms) of the model onto the target
1774 :param model_ligand: The model ligand
1775 :type model_ligand: :class:`ost.mol.ResidueHandle` or
1776 :class:`ost.mol.ResidueView`
1777 :param target_ligand: The target ligand
1778 :type target_ligand: :class:`ost.mol.ResidueHandle` or
1779 :class:`ost.mol.ResidueView`
1780 :param substructure_match: Set this to True to allow partial ligands
1782 :type substructure_match: :class:`bool`
1783 :param by_atom_index: Set this parameter to True if you need the symmetries
1784 to refer to atom index (within the residue).
1785 Otherwise, if False, the symmetries refer to atom
1787 :type by_atom_index: :class:`bool`
1788 :type return_symmetries: If Truthy, return the mappings, otherwise simply
1789 return True if a mapping is found (and raise if
1790 no mapping is found). This is useful to quickly
1791 find out if a mapping exist without the expensive
1792 step to find all the mappings.
1793 :type return_symmetries: :class:`bool`
1794 :raises: :class:`NoSymmetryError` when no symmetry can be found.
1799 model_graph = _ResidueToGraph(model_ligand, by_atom_index=by_atom_index)
1800 target_graph = _ResidueToGraph(target_ligand, by_atom_index=by_atom_index)
1802 if not networkx.is_connected(model_graph):
1804 if not networkx.is_connected(target_graph):
1811 gm = networkx.algorithms.isomorphism.GraphMatcher(
1812 model_graph, target_graph, node_match=
lambda x, y:
1813 x[
"element"] == y[
"element"])
1814 if gm.is_isomorphic():
1815 if not return_symmetries:
1818 (list(isomorphism.values()), list(isomorphism.keys()))
1819 for isomorphism
in gm.isomorphisms_iter()]
1820 assert len(symmetries) > 0
1821 LogDebug(
"Found %s isomorphic mappings (symmetries)" % len(symmetries))
1822 elif gm.subgraph_is_isomorphic()
and substructure_match:
1823 if not return_symmetries:
1825 symmetries = [(list(isomorphism.values()), list(isomorphism.keys()))
for isomorphism
in
1826 gm.subgraph_isomorphisms_iter()]
1827 assert len(symmetries) > 0
1829 assert len(symmetries[0][0]) == len(target_ligand.atoms)
1830 LogDebug(
"Found %s subgraph isomorphisms (symmetries)" % len(symmetries))
1831 elif gm.subgraph_is_isomorphic():
1832 LogDebug(
"Found subgraph isomorphisms (symmetries), but"
1833 " ignoring because substructure_match=False")
1835 str(model_ligand), str(target_ligand)))
1837 LogDebug(
"Found no isomorphic mappings (symmetries)")
1839 str(model_ligand), str(target_ligand)))
1845 """Exception to be raised when no symmetry can be found.
1849 class DisconnectedGraphError(Exception):
1850 """Exception to be raised when the ligand graph is disconnected.
1855 __all__ = [
"LigandScorer",
"SCRMSD",
"NoSymmetryError",
"DisconnectedGraphError"]
def unassigned_model_ligand_descriptions
def _assign_ligands_rmsd_only
_unassigned_target_ligand_short
def _find_unassigned_target_ligand_reason
def unassigned_model_ligands
def _build_binding_site_entity
_unassigned_target_ligands
_unassigned_target_ligands_reason
def _find_ligand_assignment
def _find_unassigned_model_ligand_reason
def unassigned_target_ligands
_unassigned_model_ligand_short
_unassigned_model_ligands
def unassigned_target_ligand_descriptions
_unassigned_model_ligand_descriptions
_unassigned_target_ligand_descriptions
_assignment_match_coverage
def _assign_ligands_lddt_pli