OpenStructure
 All Data Structures Namespaces Files Functions Variables Typedefs Enumerations Enumerator Properties Friends Macros Groups Pages
ligand_scoring.py
Go to the documentation of this file.
1 import warnings
2 
3 import numpy as np
4 import numpy.ma as np_ma
5 import networkx
6 
7 from ost import mol
8 from ost import geom
9 from ost import LogError, LogWarning, LogScript, LogInfo, LogVerbose, LogDebug
10 from ost.mol.alg import chain_mapping
11 
12 
14  """ Scorer class to compute various small molecule ligand (non polymer) scores.
15 
16  .. note ::
17  Extra requirements:
18 
19  - Python modules `numpy` and `networkx` must be available
20  (e.g. use ``pip install numpy networkx``)
21 
22  At the moment, two scores are available:
23 
24  * lDDT-PLI, that looks at the conservation of protein-ligand contacts
25  with :class:`lDDT <ost.mol.alg.lddt.lDDTScorer>`.
26  * Binding-site superposed, symmetry-corrected RMSD that assesses the
27  accuracy of the ligand pose (BiSyRMSD, hereinafter referred to as RMSD).
28 
29  Both scores involve local chain mapping of the reference binding site
30  onto the model, symmetry-correction, and finally assignment (mapping)
31  of model and target ligands, as described in (Manuscript in preparation).
32 
33  The binding site is defined based on a radius around the target ligand
34  in the reference structure only. It only contains protein and nucleic
35  acid chains that pass the criteria for the
36  :class:`chain mapping <ost.mol.alg.chain_mapping>`. This means ignoring
37  other ligands, waters, short polymers as well as any incorrectly connected
38  chains that may be in proximity.
39 
40  Results are available as matrices (`(lddt_pli|rmsd)_matrix`), where every
41  target-model score is reported in a matrix; as `(lddt_pli|rmsd)` where
42  a model-target assignment has been determined (see below) and reported in
43  a dictionary; and as (`(lddt_pli|rmsd)_details`) methods, which report
44  additional details about different aspects of the scoring such as chain
45  mapping.
46 
47  The behavior of chain mapping and ligand assignment can be controlled
48  with the `global_chain_mapping` and `rmsd_assignment` arguments.
49 
50  By default, chain mapping is performed locally, ie. only within the
51  binding site. As a result, different ligand scores can correspond to
52  different chain mappings. This tends to produce more favorable scores,
53  especially in large, partially regular oligomeric complexes.
54  Setting `global_chain_mapping=True` enforces a single global chain mapping,
55  as per :meth:`ost.mol.alg.chain_mapping.ChainMapper.GetMapping`.
56  Note that this global chain mapping currently ignores non polymer entities
57  such as small ligands, and may result in overly pessimistic scores.
58 
59  By default, target-model ligand assignments are computed independently
60  for the RMSD and lDDT-PLI scores. For RMSD, each model ligand is uniquely
61  assigned to a target ligand, starting from the "best" possible mapping
62  (lowest RMSD) and using each target and model ligand in a single
63  assignment. Ties are resolved by best (highest) lDDT-PLI. Similarly,
64  for lDDT-PLI, the assignment is based on the highest lDDT-PLI, and ties
65  broken by lowest RMSD. Setting `rmsd_assignment=True` forces a single
66  ligand assignment, based on RMSD only. Ties are broken arbitrarily.
67 
68  By default, only exact matches between target and model ligands are
69  considered. This is a problem when the target only contains a subset
70  of the expected atoms (for instance if atoms are missing in an
71  experimental structure, which often happens in the PDB). With
72  `substructure_match=True`, complete model ligands can be scored against
73  partial target ligands. One problem with this approach is that it is
74  very easy to find good matches to small, irrelevant ligands like EDO, CO2
75  or GOL. To counter that, the assignment algorithm considers the coverage,
76  expressed as the fraction of atoms of the model ligand atoms covered in the
77  target. Higher coverage matches are prioritized, but a match with a better
78  score will be preferred if it falls within a window of `coverage_delta`
79  (by default 0.2) of a worse-scoring match. As a result, for instance,
80  with a delta of 0.2, a low-score match with coverage 0.96 would be
81  preferred to a high-score match with coverage 0.90.
82 
83  Assumptions:
84 
85  The class generally assumes that the
86  :attr:`~ost.mol.ResidueHandle.is_ligand` property is properly set on all
87  the ligand atoms, and only ligand atoms. This is typically the case for
88  entities loaded from mmCIF (tested with mmCIF files from the PDB and
89  SWISS-MODEL). Legacy PDB files must contain `HET` headers (which is usually
90  the case for files downloaded from the PDB but not elsewhere).
91 
92  The class doesn't perform any cleanup of the provided structures.
93  It is up to the caller to ensure that the data is clean and suitable for
94  scoring. :ref:`Molck <molck>` should be used with extra
95  care, as many of the options (such as `rm_non_std` or `map_nonstd_res`) can
96  cause ligands to be removed from the structure. If cleanup with Molck is
97  needed, ligands should be kept aside and passed separately. Non-ligand residues
98  should be valid compounds with atom names following the naming conventions
99  of the component dictionary. Non-standard residues are acceptable, and if
100  the model contains a standard residue at that position, only atoms with
101  matching names will be considered.
102 
103  Unlike most of OpenStructure, this class does not assume that the ligands
104  (either in the model or the target) are part of the PDB component
105  dictionary. They may have arbitrary residue names. Residue names do not
106  have to match between the model and the target. Matching is based on
107  the calculation of isomorphisms which depend on the atom element name and
108  atom connectivity (bond order is ignored).
109  It is up to the caller to ensure that the connectivity of atoms is properly
110  set before passing any ligands to this class. Ligands with improper
111  connectivity will lead to bogus results.
112 
113  Note, however, that atom names should be unique within a residue (ie two
114  distinct atoms cannot have the same atom name).
115 
116  This only applies to the ligand. The rest of the model and target
117  structures (protein, nucleic acids) must still follow the usual rules and
118  contain only residues from the compound library.
119 
120  Although it isn't a requirement, hydrogen atoms should be removed from the
121  structures. Here is an example code snippet that will perform a reasonable
122  cleanup. Keep in mind that this is most likely not going to work as
123  expected with entities loaded from PDB files, as the `is_ligand` flag is
124  probably not set properly.
125 
126  Here is a snippet example of how to use this code::
127 
128  from ost.mol.alg.ligand_scoring import LigandScorer
129  from ost.mol.alg import Molck, MolckSettings
130 
131  # Load data
132  # Structure model in PDB format, containing the receptor only
133  model = io.LoadPDB("path_to_model.pdb")
134  # Ligand model as SDF file
135  model_ligand = io.LoadEntity("path_to_ligand.sdf", format="sdf")
136  # Target loaded from mmCIF, containing the ligand
137  target = io.LoadMMCIF("path_to_target.cif")
138 
139  # Cleanup a copy of the structures
140  cleaned_model = model.Copy()
141  cleaned_target = target.Copy()
142  molck_settings = MolckSettings(rm_unk_atoms=True,
143  rm_non_std=False,
144  rm_hyd_atoms=True,
145  rm_oxt_atoms=False,
146  rm_zero_occ_atoms=False,
147  colored=False,
148  map_nonstd_res=False,
149  assign_elem=True)
150  Molck(cleaned_model, conop.GetDefaultLib(), molck_settings)
151  Molck(cleaned_target, conop.GetDefaultLib(), molck_settings)
152 
153  # Setup scorer object and compute lDDT-PLI
154  model_ligands = [model_ligand.Select("ele != H")]
155  ls = LigandScorer(model=cleaned_model, target=cleaned_target, model_ligands=model_ligands)
156  print("lDDT-PLI:", ls.lddt_pli)
157  print("RMSD:", ls.rmsd)
158 
159  :param model: Model structure - a deep copy is available as :attr:`model`.
160  No additional processing (ie. Molck), checks,
161  stereochemistry checks or sanitization is performed on the
162  input. Hydrogen atoms are kept.
163  :type model: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
164  :param target: Target structure - a deep copy is available as :attr:`target`.
165  No additional processing (ie. Molck), checks or sanitization
166  is performed on the input. Hydrogen atoms are kept.
167  :type target: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
168  :param model_ligands: Model ligands, as a list of
169  :class:`~ost.mol.ResidueHandle` belonging to the model
170  entity. Can be instantiated with either a :class:list of
171  :class:`~ost.mol.ResidueHandle`/:class:`ost.mol.ResidueView`
172  or of :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`.
173  If `None`, ligands will be extracted based on the
174  :attr:`~ost.mol.ResidueHandle.is_ligand` flag (this is
175  normally set properly in entities loaded from mmCIF).
176  :type model_ligands: :class:`list`
177  :param target_ligands: Target ligands, as a list of
178  :class:`~ost.mol.ResidueHandle` belonging to the target
179  entity. Can be instantiated either a :class:list of
180  :class:`~ost.mol.ResidueHandle`/:class:`ost.mol.ResidueView`
181  or of :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
182  containing a single residue each.
183  If `None`, ligands will be extracted based on the
184  :attr:`~ost.mol.ResidueHandle.is_ligand` flag (this is
185  normally set properly in entities loaded from mmCIF).
186  :type target_ligands: :class:`list`
187  :param resnum_alignments: Whether alignments between chemically equivalent
188  chains in *model* and *target* can be computed
189  based on residue numbers. This can be assumed in
190  benchmarking setups such as CAMEO/CASP.
191  :type resnum_alignments: :class:`bool`
192  :param check_resnames: On by default. Enforces residue name matches
193  between mapped model and target residues.
194  :type check_resnames: :class:`bool`
195  :param rename_ligand_chain: If a residue with the same chain name and
196  residue number than an explicitly passed model
197  or target ligand exits in the structure,
198  and `rename_ligand_chain` is False, a
199  RuntimeError will be raised. If
200  `rename_ligand_chain` is True, the ligand will
201  be moved to a new chain instead, and the move
202  will be logged to the console with SCRIPT
203  level.
204  :type rename_ligand_chain: :class:`bool`
205  :param chain_mapper: a chain mapper initialized for the target structure.
206  If None (default), a chain mapper will be initialized
207  lazily as required.
208  :type chain_mapper: :class:`ost.mol.alg.chain_mapping.ChainMapper`
209  :param substructure_match: Set this to True to allow partial target ligand.
210  :type substructure_match: :class:`bool`
211  :param coverage_delta: the coverage delta for partial ligand assignment.
212  :type coverage_delta: :class:`float`
213  :param radius: Inclusion radius for the binding site. Residues with
214  atoms within this distance of the ligand will be considered
215  for inclusion in the binding site.
216  :type radius: :class:`float`
217  :param lddt_pli_radius: lDDT inclusion radius for lDDT-PLI.
218  :type lddt_pli_radius: :class:`float`
219  :param lddt_lp_radius: lDDT inclusion radius for lDDT-LP.
220  :type lddt_lp_radius: :class:`float`
221  :param binding_sites_topn: maximum number of target binding site
222  representations to assess, per target ligand.
223  Ignored if `global_chain_mapping` is True.
224  :type binding_sites_topn: :class:`int`
225  :param global_chain_mapping: set to True to use a global chain mapping for
226  the polymer (protein, nucleotide) chains.
227  Defaults to False, in which case only local
228  chain mappings are allowed (where different
229  ligand may be scored against different chain
230  mappings).
231  :type global_chain_mapping: :class:`bool`
232  :param custom_mapping: Provide custom chain mapping between *model* and
233  *target* that is used as global chain mapping.
234  Dictionary with target chain names as key and model
235  chain names as value. Only has an effect if
236  *global_chain_mapping* is True.
237  :type custom_mapping: :class:`dict`
238  :param rmsd_assignment: assign ligands based on RMSD only. The default
239  (False) is to use a combination of lDDT-PLI and
240  RMSD for the assignment.
241  :type rmsd_assignment: :class:`bool`
242  :param n_max_naive: Parameter for global chain mapping. If *model* and
243  *target* have less or equal that number of chains,
244  the full
245  mapping solution space is enumerated to find the
246  the optimum. A heuristic is used otherwise.
247  :type n_max_naive: :class:`int`
248  :param max_symmetries: If more than that many isomorphisms exist for
249  a target-ligand pair, it will be ignored and reported
250  as unassigned.
251  :type max_symmetries: :class:`int`
252  :param unassigned: If True, unassigned model ligands are reported in
253  the output together with assigned ligands, with a score
254  of None, and reason for not being assigned in the
255  \\*_details matrix. Defaults to False.
256  :type unassigned: :class:`bool`
257  """
258  def __init__(self, model, target, model_ligands=None, target_ligands=None,
259  resnum_alignments=False, check_resnames=True,
260  rename_ligand_chain=False,
261  chain_mapper=None, substructure_match=False,
262  coverage_delta=0.2,
263  radius=4.0, lddt_pli_radius=6.0, lddt_lp_radius=10.0,
264  binding_sites_topn=100000, global_chain_mapping=False,
265  rmsd_assignment=False, n_max_naive=12, max_symmetries=1e5,
266  custom_mapping=None, unassigned=False):
267 
268  if isinstance(model, mol.EntityView):
269  self.model = mol.CreateEntityFromView(model, False)
270  elif isinstance(model, mol.EntityHandle):
271  self.model = model.Copy()
272  else:
273  raise RuntimeError("model must be of type EntityView/EntityHandle")
274 
275  if isinstance(target, mol.EntityView):
276  self.target = mol.CreateEntityFromView(target, False)
277  elif isinstance(target, mol.EntityHandle):
278  self.target = target.Copy()
279  else:
280  raise RuntimeError("target must be of type EntityView/EntityHandle")
281 
282  # Extract ligands from target
283  if target_ligands is None:
285  else:
286  self.target_ligands = self._prepare_ligands(self.target, target,
287  target_ligands,
288  rename_ligand_chain)
289  if len(self.target_ligands) == 0:
290  LogWarning("No ligands in the target")
291 
292  # Extract ligands from model
293  if model_ligands is None:
295  else:
296  self.model_ligands = self._prepare_ligands(self.model, model,
297  model_ligands,
298  rename_ligand_chain)
299  if len(self.model_ligands) == 0:
300  LogWarning("No ligands in the model")
301  if len(self.target_ligands) == 0:
302  raise ValueError("No ligand in the model and in the target")
303 
304  self._chain_mapper = chain_mapper
305  self.resnum_alignments = resnum_alignments
306  self.check_resnames = check_resnames
307  self.rename_ligand_chain = rename_ligand_chain
308  self.substructure_match = substructure_match
309  self.radius = radius
310  self.lddt_pli_radius = lddt_pli_radius
311  self.lddt_lp_radius = lddt_lp_radius
312  self.binding_sites_topn = binding_sites_topn
313  self.global_chain_mapping = global_chain_mapping
314  self.rmsd_assignment = rmsd_assignment
315  self.n_max_naive = n_max_naive
316  self.max_symmetries = max_symmetries
317  self.unassigned = unassigned
318  self.coverage_delta = coverage_delta
319 
320  # scoring matrices
321  self._rmsd_matrix = None
322  self._rmsd_full_matrix = None
323  self._lddt_pli_matrix = None
324  self._lddt_pli_full_matrix = None
325 
326  # lazily computed scores
327  self._rmsd = None
328  self._rmsd_details = None
329  self._lddt_pli = None
330  self._lddt_pli_details = None
331 
332  # lazily precomputed variables
333  self._binding_sites = {}
334  self.__model_mapping = None
335 
336  # Bookkeeping of unassigned ligands
337  self._unassigned_target_ligands = None
338  self._unassigned_model_ligands = None
344  # Keep track of symmetries/isomorphisms (regardless of scoring)
345  # 0.0: no isomorphism
346  # 1.0: isomorphic
347  # np.nan: not assessed yet - that's why we can't use a boolean
348  self._assignment_isomorphisms = None
349  # Keep track of match coverage (only in case there was a score)
350  self._assignment_match_coverage = None
351 
352  if custom_mapping is not None:
353  self._set_custom_mapping(custom_mapping)
354 
355  @property
356  def chain_mapper(self):
357  """ Chain mapper object for the given :attr:`target`.
358 
359  :type: :class:`ost.mol.alg.chain_mapping.ChainMapper`
360  """
361  if self._chain_mapper is None:
363  n_max_naive=1e9,
364  resnum_alignments=self.resnum_alignments)
365  return self._chain_mapper
366 
367  @property
368  def _model_mapping(self):
369  """Get the global chain mapping for the model."""
370  if self.__model_mapping is None:
371  self.__model_mapping = self.chain_mapper.GetMapping(self.model,
372  n_max_naive=self.n_max_naive)
373  return self.__model_mapping
374 
375  @staticmethod
376  def _extract_ligands(entity):
377  """Extract ligands from entity. Return a list of residues.
378 
379  Assumes that ligands have the :attr:`~ost.mol.ResidueHandle.is_ligand`
380  flag set. This is typically the case for entities loaded from mmCIF
381  (tested with mmCIF files from the PDB and SWISS-MODEL).
382  Legacy PDB files must contain `HET` headers (which is usually the
383  case for files downloaded from the PDB but not elsewhere).
384 
385  This function performs basic checks to ensure that the residues in this
386  chain are not forming polymer bonds (ie peptide/nucleotide ligands) and
387  will raise a RuntimeError if this assumption is broken.
388 
389  :param entity: the entity to extract ligands from
390  :type entity: :class:`~ost.mol.EntityHandle`
391  :rtype: :class:`list` of :class:`~ost.mol.ResidueHandle`
392 
393  """
394  extracted_ligands = []
395  for residue in entity.residues:
396  if residue.is_ligand:
397  if mol.InSequence(residue, residue.next):
398  raise RuntimeError("Residue %s connected in polymer sequen"
399  "ce %s" % (residue.qualified_name))
400  extracted_ligands.append(residue)
401  LogVerbose("Detected residue %s as ligand" % residue)
402  return extracted_ligands
403 
404  @staticmethod
405  def _prepare_ligands(new_entity, old_entity, ligands, rename_chain):
406  """Prepare the ligands given into a list of ResidueHandles which are
407  part of the copied entity, suitable for the model_ligands and
408  target_ligands properties.
409 
410  This function takes a list of ligands as (Entity|Residue)(Handle|View).
411  Entities can contain multiple ligands, which will be considered as
412  separate ligands.
413 
414  Ligands which are part of the entity are simply fetched in the new
415  copied entity. Otherwise, they are copied over to the copied entity.
416  """
417  extracted_ligands = []
418 
419  next_chain_num = 1
420  new_editor = None
421 
422  def _copy_residue(residue, rename_chain):
423  """ Copy the residue into the new chain.
424  Return the new residue handle."""
425  nonlocal next_chain_num, new_editor
426 
427  # Instantiate the editor
428  if new_editor is None:
429  new_editor = new_entity.EditXCS()
430 
431  new_chain = new_entity.FindChain(residue.chain.name)
432  if not new_chain.IsValid():
433  new_chain = new_editor.InsertChain(residue.chain.name)
434  else:
435  # Does a residue with the same name already exist?
436  already_exists = new_chain.FindResidue(residue.number).IsValid()
437  if already_exists:
438  if rename_chain:
439  chain_ext = 2 # Extend the chain name by this
440  while True:
441  new_chain_name = residue.chain.name + "_" + str(chain_ext)
442  new_chain = new_entity.FindChain(new_chain_name)
443  if new_chain.IsValid():
444  chain_ext += 1
445  continue
446  else:
447  new_chain = new_editor.InsertChain(new_chain_name)
448  break
449  LogScript("Moved ligand residue %s to new chain %s" % (
450  residue.qualified_name, new_chain.name))
451  else:
452  msg = "A residue number %s already exists in chain %s" % (
453  residue.number, residue.chain.name)
454  raise RuntimeError(msg)
455 
456  # Add the residue with its original residue number
457  new_res = new_editor.AppendResidue(new_chain, residue.name, residue.number)
458  # Add atoms
459  for old_atom in residue.atoms:
460  new_editor.InsertAtom(new_res, old_atom.name, old_atom.pos,
461  element=old_atom.element, occupancy=old_atom.occupancy,
462  b_factor=old_atom.b_factor, is_hetatm=old_atom.is_hetatom)
463  # Add bonds
464  for old_atom in residue.atoms:
465  for old_bond in old_atom.bonds:
466  new_first = new_res.FindAtom(old_bond.first.name)
467  new_second = new_res.FindAtom(old_bond.second.name)
468  new_editor.Connect(new_first, new_second)
469  return new_res
470 
471  def _process_ligand_residue(res, rename_chain):
472  """Copy or fetch the residue. Return the residue handle."""
473  new_res = None
474  if res.entity.handle == old_entity.handle:
475  # Residue is part of the old_entity handle.
476  # However, it may not be in the copied one, for instance it may have been a view
477  # We try to grab it first, otherwise we copy it
478  new_res = new_entity.FindResidue(res.chain.name, res.number)
479  if new_res and new_res.valid:
480  LogVerbose("Ligand residue %s already in entity" % res.handle.qualified_name)
481  else:
482  # Residue is not part of the entity, need to copy it first
483  new_res = _copy_residue(res, rename_chain)
484  LogVerbose("Copied ligand residue %s" % res.handle.qualified_name)
485  new_res.SetIsLigand(True)
486  return new_res
487 
488  for ligand in ligands:
489  if isinstance(ligand, mol.EntityHandle) or isinstance(ligand, mol.EntityView):
490  for residue in ligand.residues:
491  new_residue = _process_ligand_residue(residue, rename_chain)
492  extracted_ligands.append(new_residue)
493  elif isinstance(ligand, mol.ResidueHandle) or isinstance(ligand, mol.ResidueView):
494  new_residue = _process_ligand_residue(ligand, rename_chain)
495  extracted_ligands.append(new_residue)
496  else:
497  raise RuntimeError("Ligands should be given as Entity or Residue")
498 
499  if new_editor is not None:
500  new_editor.UpdateICS()
501  return extracted_ligands
502 
503  def _get_binding_sites(self, ligand):
504  """Find representations of the binding site of *ligand* in the model.
505 
506  Only consider protein and nucleic acid chains that pass the criteria
507  for the :class:`ost.mol.alg.chain_mapping`. This means ignoring other
508  ligands, waters, short polymers as well as any incorrectly connected
509  chain that may be in proximity.
510 
511  :param ligand: Defines the binding site to identify.
512  :type ligand: :class:`~ost.mol.ResidueHandle`
513  """
514  if ligand.hash_code not in self._binding_sites:
515 
516  # create view of reference binding site
517  ref_residues_hashes = set() # helper to keep track of added residues
518  ignored_residue_hashes = {ligand.hash_code}
519  for ligand_at in ligand.atoms:
520  close_atoms = self.target.FindWithin(ligand_at.GetPos(), self.radius)
521  for close_at in close_atoms:
522  # Skip any residue not in the chain mapping target
523  ref_res = close_at.GetResidue()
524  h = ref_res.handle.GetHashCode()
525  if h not in ref_residues_hashes and \
526  h not in ignored_residue_hashes:
527  if self.chain_mapper.target.ViewForHandle(ref_res).IsValid():
528  h = ref_res.handle.GetHashCode()
529  ref_residues_hashes.add(h)
530  elif ref_res.is_ligand:
531  LogWarning("Ignoring ligand %s in binding site of %s" % (
532  ref_res.qualified_name, ligand.qualified_name))
533  ignored_residue_hashes.add(h)
534  elif ref_res.chem_type == mol.ChemType.WATERS:
535  pass # That's ok, no need to warn
536  else:
537  LogWarning("Ignoring residue %s in binding site of %s" % (
538  ref_res.qualified_name, ligand.qualified_name))
539  ignored_residue_hashes.add(h)
540 
541  if ref_residues_hashes:
542  # reason for doing that separately is to guarantee same ordering of
543  # residues as in underlying entity. (Reorder by ResNum seems only
544  # available on ChainHandles)
545  ref_bs = self.target.CreateEmptyView()
546  for ch in self.target.chains:
547  for r in ch.residues:
548  if r.handle.GetHashCode() in ref_residues_hashes:
549  ref_bs.AddResidue(r, mol.ViewAddFlag.INCLUDE_ALL)
550  if len(ref_bs.residues) == 0:
551  raise RuntimeError("Failed to add proximity residues to "
552  "the reference binding site entity")
553 
554  # Find the representations
555  if self.global_chain_mapping:
556  self._binding_sites[ligand.hash_code] = self.chain_mapper.GetRepr(
557  ref_bs, self.model, inclusion_radius=self.lddt_lp_radius,
558  global_mapping = self._model_mapping)
559  else:
560  self._binding_sites[ligand.hash_code] = self.chain_mapper.GetRepr(
561  ref_bs, self.model, inclusion_radius=self.lddt_lp_radius,
562  topn=self.binding_sites_topn)
563 
564  # Flag empty representation
565  if not self._binding_sites[ligand.hash_code]:
566  self._unassigned_target_ligands_reason[ligand] = (
567  "model_representation",
568  "No representation of the reference binding site was "
569  "found in the model")
570 
571  else: # if ref_residues_hashes
572  # Flag missing binding site
573  self._unassigned_target_ligands_reason[ligand] = ("binding_site",
574  "No residue in proximity of the target ligand")
575  self._binding_sites[ligand.hash_code] = []
576 
577  return self._binding_sites[ligand.hash_code]
578 
579  @staticmethod
580  def _build_binding_site_entity(ligand, residues, extra_residues=[]):
581  """ Build an entity with all the binding site residues in chain A
582  and the ligand in chain _. Residues are renumbered consecutively from
583  1. The ligand is assigned residue number 1 and residue name LIG.
584  Residues in extra_residues not in `residues` in the model are added
585  at the end of chain A.
586 
587  :param ligand: the Residue Handle of the ligand
588  :type ligand: :class:`~ost.mol.ResidueHandle`
589  :param residues: a list of binding site residues
590  :type residues: :class:`list` of :class:`~ost.mol.ResidueHandle`
591  :param extra_residues: an optional list with addition binding site
592  residues. Residues in this list which are not
593  in `residues` will be added at the end of chain
594  A. This allows for instance adding unmapped
595  residues missing from the model into the
596  reference binding site.
597  :type extra_residues: :class:`list` of :class:`~ost.mol.ResidueHandle`
598  :rtype: :class:`~ost.mol.EntityHandle`
599  """
600  bs_ent = mol.CreateEntity()
601  ed = bs_ent.EditXCS()
602  bs_chain = ed.InsertChain("A")
603  seen_res_qn = []
604  for resnum, old_res in enumerate(residues, 1):
605  seen_res_qn.append(old_res.qualified_name)
606  new_res = ed.AppendResidue(bs_chain, old_res.handle,
607  deep=True)
608  ed.SetResidueNumber(new_res, mol.ResNum(resnum))
609 
610  # Add extra residues at the end.
611  for extra_res in extra_residues:
612  if extra_res.qualified_name not in seen_res_qn:
613  resnum += 1
614  seen_res_qn.append(extra_res.qualified_name)
615  new_res = ed.AppendResidue(bs_chain,
616  extra_res.handle,
617  deep=True)
618  ed.SetResidueNumber(new_res, mol.ResNum(resnum))
619  # Add the ligand in chain _
620  ligand_chain = ed.InsertChain("_")
621  ligand_res = ed.AppendResidue(ligand_chain, ligand.handle,
622  deep=True)
623  ed.RenameResidue(ligand_res, "LIG")
624  ed.SetResidueNumber(ligand_res, mol.ResNum(1))
625  ed.UpdateICS()
626 
627  return bs_ent
628 
629  def _compute_scores(self):
630  """
631  Compute the RMSD and lDDT-PLI scores for every possible target-model
632  ligand pair and store the result in internal matrices.
633  """
634  # Create the result matrices
635  rmsd_full_matrix = np.empty(
636  (len(self.target_ligands), len(self.model_ligands)), dtype=dict)
637  lddt_pli_full_matrix = np.empty(
638  (len(self.target_ligands), len(self.model_ligands)), dtype=dict)
639  self._assignment_isomorphisms = np.full(
640  (len(self.target_ligands), len(self.model_ligands)), fill_value=np.nan)
641  self._assignment_match_coverage = np.zeros(
642  (len(self.target_ligands), len(self.model_ligands)))
643 
644  for target_i, target_ligand in enumerate(self.target_ligands):
645  LogVerbose("Analyzing target ligand %s" % target_ligand)
646 
647  for binding_site in self._get_binding_sites(target_ligand):
648  LogVerbose("Found binding site with chain mapping %s" % (binding_site.GetFlatChainMapping()))
649 
650  ref_bs_ent = self._build_binding_site_entity(
651  target_ligand, binding_site.ref_residues,
652  binding_site.substructure.residues)
653  ref_bs_ent_ligand = ref_bs_ent.FindResidue("_", 1) # by definition
654 
655  custom_compounds = {
656  ref_bs_ent_ligand.name:
657  mol.alg.lddt.CustomCompound.FromResidue(
658  ref_bs_ent_ligand)}
659  lddt_scorer = mol.alg.lddt.lDDTScorer(
660  ref_bs_ent,
661  custom_compounds=custom_compounds,
662  inclusion_radius=self.lddt_pli_radius)
663 
664  for model_i, model_ligand in enumerate(self.model_ligands):
665  try:
666  symmetries = _ComputeSymmetries(
667  model_ligand, target_ligand,
668  substructure_match=self.substructure_match,
669  by_atom_index=True,
670  max_symmetries=self.max_symmetries)
671  LogVerbose("Ligands %s and %s symmetry match" % (
672  str(model_ligand), str(target_ligand)))
673  except NoSymmetryError:
674  # Ligands are different - skip
675  LogVerbose("No symmetry between %s and %s" % (
676  str(model_ligand), str(target_ligand)))
677  self._assignment_isomorphisms[target_i, model_i] = 0.
678  continue
679  except TooManySymmetriesError:
680  # Ligands are too symmetrical - skip
681  LogVerbose("Too many symmetries between %s and %s" % (
682  str(model_ligand), str(target_ligand)))
683  self._assignment_isomorphisms[target_i, model_i] = -1.
684  continue
685  except DisconnectedGraphError:
686  # Disconnected graph is handled elsewhere
687  continue
688  substructure_match = len(symmetries[0][0]) != len(
689  model_ligand.atoms)
690  coverage = len(symmetries[0][0]) / len(model_ligand.atoms)
691  self._assignment_match_coverage[target_i, model_i] = coverage
692  self._assignment_isomorphisms[target_i, model_i] = 1.
693 
694  rmsd = _SCRMSD_symmetries(symmetries, model_ligand,
695  target_ligand, transformation=binding_site.transform)
696  LogDebug("RMSD: %.4f" % rmsd)
697 
698  # Save results?
699  if not rmsd_full_matrix[target_i, model_i] or \
700  rmsd_full_matrix[target_i, model_i]["rmsd"] > rmsd:
701  rmsd_full_matrix[target_i, model_i] = {
702  "rmsd": rmsd,
703  "lddt_lp": binding_site.lDDT,
704  "bs_ref_res": binding_site.substructure.residues,
705  "bs_ref_res_mapped": binding_site.ref_residues,
706  "bs_mdl_res_mapped": binding_site.mdl_residues,
707  "bb_rmsd": binding_site.bb_rmsd,
708  "target_ligand": target_ligand,
709  "model_ligand": model_ligand,
710  "chain_mapping": binding_site.GetFlatChainMapping(),
711  "transform": binding_site.transform,
712  "substructure_match": substructure_match,
713  "coverage": coverage,
714  "inconsistent_residues": binding_site.inconsistent_residues,
715  }
716  if self.unassigned:
717  rmsd_full_matrix[target_i, model_i][
718  "unassigned"] = False
719  LogDebug("Saved RMSD")
720 
721  mdl_bs_ent = self._build_binding_site_entity(
722  model_ligand, binding_site.mdl_residues, [])
723  mdl_bs_ent_ligand = mdl_bs_ent.FindResidue("_", 1) # by definition
724 
725  # Now for each symmetry, loop and rename atoms according
726  # to ref.
727  mdl_editor = mdl_bs_ent.EditXCS()
728  for i, (trg_sym, mdl_sym) in enumerate(symmetries):
729  # Prepare Entities for RMSD
730  for mdl_anum, trg_anum in zip(mdl_sym, trg_sym):
731  # Rename model atoms according to symmetry
732  trg_atom = ref_bs_ent_ligand.atoms[trg_anum]
733  mdl_atom = mdl_bs_ent_ligand.atoms[mdl_anum]
734  mdl_editor.RenameAtom(mdl_atom, trg_atom.name)
735  mdl_editor.UpdateICS()
736 
737  global_lddt, local_lddt, lddt_tot, lddt_cons, n_res, \
738  n_cont, n_cons = lddt_scorer.lDDT(
739  mdl_bs_ent, chain_mapping={"A": "A", "_": "_"},
740  no_intrachain=True,
741  return_dist_test=True,
742  check_resnames=self.check_resnames)
743  LogDebug("lDDT-PLI for symmetry %d: %.4f" % (i, global_lddt))
744 
745  # Save results?
746  if not lddt_pli_full_matrix[target_i, model_i]:
747  # First iteration
748  save_lddt = True
749  else:
750  last_best_lddt = lddt_pli_full_matrix[
751  target_i, model_i]["lddt_pli"]
752  last_best_rmsd = lddt_pli_full_matrix[
753  target_i, model_i]["rmsd"]
754  if global_lddt > last_best_lddt:
755  # Better lDDT-PLI
756  save_lddt = True
757  elif global_lddt == last_best_lddt and \
758  rmsd < last_best_rmsd:
759  # Same lDDT-PLI, better RMSD
760  save_lddt = True
761  else:
762  save_lddt = False
763  if save_lddt:
764  lddt_pli_full_matrix[target_i, model_i] = {
765  "lddt_pli": global_lddt,
766  "rmsd": rmsd,
767  "lddt_lp": binding_site.lDDT,
768  "lddt_pli_n_contacts": lddt_tot,
769  "bs_ref_res": binding_site.substructure.residues,
770  "bs_ref_res_mapped": binding_site.ref_residues,
771  "bs_mdl_res_mapped": binding_site.mdl_residues,
772  "bb_rmsd": binding_site.bb_rmsd,
773  "target_ligand": target_ligand,
774  "model_ligand": model_ligand,
775  "chain_mapping": binding_site.GetFlatChainMapping(),
776  "transform": binding_site.transform,
777  "substructure_match": substructure_match,
778  "coverage": coverage,
779  "inconsistent_residues": binding_site.inconsistent_residues,
780  }
781  if self.unassigned:
782  lddt_pli_full_matrix[target_i, model_i][
783  "unassigned"] = False
784  LogDebug("Saved lDDT-PLI")
785 
786  self._rmsd_full_matrix = rmsd_full_matrix
787  self._lddt_pli_full_matrix = lddt_pli_full_matrix
788 
789  @staticmethod
790  def _find_ligand_assignment(mat1, mat2=None, coverage=None, coverage_delta=None):
791  """ Find the ligand assignment based on mat1. If mat2 is provided, it
792  will be used to break ties in mat1. If mat2 is not provided, ties will
793  be resolved by taking the first match arbitrarily.
794 
795  Both mat1 and mat2 should "look" like RMSD - ie be between inf (bad)
796  and 0 (good).
797  """
798  # We will modify mat1 and mat2, so make copies of it first
799  mat1 = np.copy(mat1)
800  if mat2 is None:
801  mat2 = np.copy(mat1)
802  mat2[~np.isnan(mat2)] = np.inf
803  else:
804  mat2 = np.copy(mat2)
805  if coverage is None:
806  coverage = np.copy(mat1)
807  coverage[:] = 1 # Assume full coverage by default
808  else:
809  coverage = np.copy(coverage)
810 
811  assignments = []
812  if 0 in mat1.shape:
813  # No model or target ligand
814  LogDebug("No model or target ligand, returning no assignment.")
815  return assignments
816 
817  def _get_best_match(mat1_val, coverage_val):
818  """ Extract the row/column indices of the prediction matching the
819  given values."""
820  mat1_match_idx = np.argwhere((mat1 == mat1_val) & (coverage >= coverage_val))
821  # Multiple "best" - use mat2 to disambiguate
822  if len(mat1_match_idx) > 1:
823  # Get the values of mat2 at these positions
824  best_mat2_match = [mat2[tuple(x)] for x in mat1_match_idx]
825  # Find the index of the best mat2
826  # Note: argmin returns the first value which is min.
827  best_mat2_idx = np.array(best_mat2_match).argmin()
828  # Now get the original indices
829  return mat1_match_idx[best_mat2_idx]
830  else:
831  return mat1_match_idx[0]
832 
833  # First only consider top coverage matches.
834  min_coverage = np.max(coverage)
835  while min_coverage > 0:
836  LogVerbose("Looking for matches with coverage >= %s" % min_coverage)
837  min_mat1 = LigandScorer._nanmin_nowarn(mat1, coverage < min_coverage)
838  while not np.isnan(min_mat1):
839  max_i_trg, max_i_mdl = _get_best_match(min_mat1, min_coverage)
840 
841  # Would we have a match for this model ligand with higher score
842  # but lower coverage?
843  alternative_matches = (mat1[:, max_i_mdl] < min_mat1) & (
844  coverage[:, max_i_mdl] > (min_coverage - coverage_delta))
845  if np.any(alternative_matches):
846  # Get the scores of these matches
847  LogVerbose("Found match with lower coverage but better score")
848  min_mat1 = np.nanmin(mat1[alternative_matches])
849  max_i_trg, max_i_mdl = _get_best_match(min_mat1, min_coverage - coverage_delta)
850 
851  # Disable row and column
852  mat1[max_i_trg, :] = np.nan
853  mat1[:, max_i_mdl] = np.nan
854  mat2[max_i_trg, :] = np.nan
855  mat2[:, max_i_mdl] = np.nan
856  coverage[max_i_trg, :] = -np.inf
857  coverage[:, max_i_mdl] = -np.inf
858 
859  # Save
860  assignments.append((max_i_trg, max_i_mdl))
861 
862  # Recompute min
863  min_mat1 = LigandScorer._nanmin_nowarn(mat1, coverage < min_coverage)
864  # Recompute min_coverage
865  min_coverage = np.max(coverage)
866  return assignments
867 
868  @staticmethod
869  def _nanmin_nowarn(array, mask):
870  """Compute np.nanmin but ignore the RuntimeWarning."""
871  masked_array = np_ma.masked_array(array, mask=mask)
872  with warnings.catch_warnings(): # RuntimeWarning: All-NaN slice encountered
873  warnings.simplefilter("ignore")
874  min = np.nanmin(masked_array, )
875  if np_ma.is_masked(min):
876  return np.nan # Everything was masked
877  else:
878  return min
879 
880  @staticmethod
881  def _reverse_lddt(lddt):
882  """Reverse lDDT means turning it from a number between 0 and 1 to a
883  number between infinity and 0 (0 being better).
884 
885  In practice, this is 1/lDDT. If lDDT is 0, the result is infinity.
886  """
887  with warnings.catch_warnings(): # RuntimeWarning: divide by zero
888  warnings.simplefilter("ignore")
889  return np.float64(1) / lddt
890 
891  def _assign_ligands_rmsd(self):
892  """Assign (map) ligands between model and target.
893 
894  Sets self._rmsd and self._rmsd_details.
895  """
896  mat2 = self._reverse_lddt(self.lddt_pli_matrix)
897 
898  mat_tuple = self._assign_matrices(self.rmsd_matrix,
899  mat2,
900  self._rmsd_full_matrix,
901  "rmsd")
902  self._rmsd = mat_tuple[0]
903  self._rmsd_details = mat_tuple[1]
904  # Ignore unassigned ligands - they are dealt with in lddt_pli.
905  # So the following lines should stay commented out:
906  # self._unassigned_target_ligands = mat_tuple[2]
907  # self._unassigned_model_ligands = mat_tuple[3]
908 
909  def _assign_matrices(self, mat1, mat2, data, main_key):
910  """
911  Perform the ligand assignment, ie find the mapping between model and
912  target ligands.
913 
914  The algorithm starts by assigning the "best" mapping, and then discards
915  the target and model ligands (row, column) so that every model ligand
916  can be assigned to a single target ligand, and every target ligand
917  is only assigned to a single model ligand. Repeat until there is
918  nothing left to assign.
919 
920  In case of a tie in values in `mat1`, it uses `mat2` to break the tie.
921 
922  This algorithm doesn't guarantee a globally optimal assignment.
923 
924  Both `mat1` and `mat2` should contain values between 0 and infinity,
925  with lower values representing better scores. Use the
926  :meth:`_reverse_lddt` method to convert lDDT values to such a score.
927 
928  :param mat1: the main ligand assignment criteria (RMSD or lDDT-PLI)
929  :param mat2: the secondary ligand assignment criteria (lDDT-PLI or RMSD)
930  :param data: the data (either self._rmsd_full_matrix or self._lddt_pli_matrix)
931  :param main_key: the key of data (dictionnaries within `data`) to
932  assign into out_main.
933  :return: a tuple with 2 dictionaries of matrices containing the main
934  data, and details, respectively.
935  """
936  assignments = self._find_ligand_assignment(mat1, mat2,
938  self.coverage_delta)
939  out_main = {}
940  out_details = {}
941  assigned_trg = [False] * len(self.target_ligands)
942  assigned_mdl = [False] * len(self.model_ligands)
943  for assignment in assignments:
944  trg_idx, mdl_idx = assignment
945  assigned_mdl[mdl_idx] = True
946  assigned_trg[trg_idx] = True
947  mdl_lig = self.model_ligands[mdl_idx]
948  mdl_cname = mdl_lig.chain.name
949  mdl_resnum = mdl_lig.number
950  if mdl_cname not in out_main:
951  out_main[mdl_cname] = {}
952  out_details[mdl_cname] = {}
953  out_main[mdl_cname][mdl_resnum] = data[
954  trg_idx, mdl_idx][main_key]
955  out_details[mdl_cname][mdl_resnum] = data[
956  trg_idx, mdl_idx]
957 
958  unassigned_trg, unassigned_mdl = self._assign_unassigned(
959  assigned_trg, assigned_mdl, [out_main], [out_details], [main_key])
960  return out_main, out_details, unassigned_trg, unassigned_mdl
961 
962  def _assign_unassigned(self, assigned_trg, assigned_mdl,
963  out_main, out_details, main_key):
964  unassigned_trg = {}
965  unassigned_mdl = {}
966 
967  unassigned_trg_idx = [i for i, x in enumerate(assigned_trg) if not x]
968  unassigned_mdl_idx = [i for i, x in enumerate(assigned_mdl) if not x]
969 
970  for mdl_idx in unassigned_mdl_idx:
971  mdl_lig = self.model_ligands[mdl_idx]
972  reason = self._find_unassigned_model_ligand_reason(mdl_lig, check=False)
973  mdl_cname = mdl_lig.chain.name
974  mdl_resnum = mdl_lig.number
975  if mdl_cname not in unassigned_mdl:
976  unassigned_mdl[mdl_cname] = {}
977  unassigned_mdl[mdl_cname][mdl_resnum] = reason
978  if self.unassigned:
979  for i, _ in enumerate(out_main):
980  if mdl_cname not in out_main[i]:
981  out_main[i][mdl_cname] = {}
982  out_details[i][mdl_cname] = {}
983  out_main[i][mdl_cname][mdl_resnum] = None
984  out_details[i][mdl_cname][mdl_resnum] = {
985  "unassigned": True,
986  "reason_short": reason[0],
987  "reason_long": reason[1],
988  main_key[i]: None,
989  }
990  LogInfo("Model ligand %s is unassigned: %s" % (
991  mdl_lig.qualified_name, reason[1]))
992 
993  for trg_idx in unassigned_trg_idx:
994  trg_lig = self.target_ligands[trg_idx]
995  reason = self._find_unassigned_target_ligand_reason(trg_lig, check=False)
996  trg_cname = trg_lig.chain.name
997  trg_resnum = trg_lig.number
998  if trg_cname not in unassigned_trg:
999  unassigned_trg[trg_cname] = {}
1000  unassigned_trg[trg_cname][trg_resnum] = reason
1001  LogInfo("Target ligand %s is unassigned: %s" % (
1002  trg_lig.qualified_name, reason[1]))
1003 
1004  return unassigned_trg, unassigned_mdl
1005 
1006 
1007  def _assign_matrix(self, mat, data1, main_key1, data2, main_key2):
1008  """
1009  Perform the ligand assignment, ie find the mapping between model and
1010  target ligands, based on a single matrix
1011 
1012  The algorithm starts by assigning the "best" mapping, and then discards
1013  the target and model ligands (row, column) so that every model ligand
1014  can be assigned to a single target ligand, and every target ligand
1015  is only assigned to a single model ligand. Repeat until there is
1016  nothing left to assign.
1017 
1018  This algorithm doesn't guarantee a globally optimal assignment.
1019 
1020  `mat` should contain values between 0 and infinity,
1021  with lower values representing better scores. Use the
1022  :meth:`_reverse_lddt` method to convert lDDT values to such a score.
1023 
1024  :param mat: the ligand assignment criteria (RMSD or lDDT-PLI)
1025  :param data1: the first data (either self._rmsd_full_matrix or self._lddt_pli_matrix)
1026  :param main_key1: the first key of data (dictionnaries within `data`) to
1027  assign into out_main.
1028  :param data2: the second data (either self._rmsd_full_matrix or self._lddt_pli_matrix)
1029  :param main_key2: the second key of data (dictionnaries within `data`) to
1030  assign into out_main.
1031  :return: a tuple with 4 dictionaries of matrices containing the main
1032  data1, details1, main data2 and details2, respectively.
1033  """
1034  assignments = self._find_ligand_assignment(mat,
1035  coverage=self._assignment_match_coverage,
1036  coverage_delta=self.coverage_delta)
1037  out_main1 = {}
1038  out_details1 = {}
1039  out_main2 = {}
1040  out_details2 = {}
1041  assigned_trg = [False] * len(self.target_ligands)
1042  assigned_mdl = [False] * len(self.model_ligands)
1043  for assignment in assignments:
1044  trg_idx, mdl_idx = assignment
1045  assigned_mdl[mdl_idx] = True
1046  assigned_trg[trg_idx] = True
1047  mdl_lig = self.model_ligands[mdl_idx]
1048  mdl_cname = mdl_lig.chain.name
1049  mdl_resnum = mdl_lig.number
1050  # Data 1
1051  if mdl_cname not in out_main1:
1052  out_main1[mdl_cname] = {}
1053  out_details1[mdl_cname] = {}
1054  out_main1[mdl_cname][mdl_resnum] = data1[
1055  trg_idx, mdl_idx][main_key1]
1056  out_details1[mdl_cname][mdl_resnum] = data1[
1057  trg_idx, mdl_idx]
1058  # Data2
1059  if mdl_cname not in out_main2:
1060  out_main2[mdl_cname] = {}
1061  out_details2[mdl_cname] = {}
1062  out_main2[mdl_cname][mdl_resnum] = data2[
1063  trg_idx, mdl_idx][main_key2]
1064  out_details2[mdl_cname][mdl_resnum] = data2[
1065  trg_idx, mdl_idx]
1066 
1067  unassigned_trg, unassigned_mdl = self._assign_unassigned(
1068  assigned_trg, assigned_mdl,
1069  [out_main1, out_main2], [out_details1, out_details2],
1070  [main_key1, main_key2])
1071 
1072  return out_main1, out_details1, out_main2, out_details2, \
1073  unassigned_trg, unassigned_mdl
1074 
1075  def _assign_ligands_lddt_pli(self):
1076  """ Assign ligands based on lDDT-PLI.
1077 
1078  Sets self._lddt_pli and self._lddt_pli_details.
1079  """
1080  mat1 = self._reverse_lddt(self.lddt_pli_matrix)
1081 
1082  mat_tuple = self._assign_matrices(mat1,
1083  self.rmsd_matrix,
1084  self._lddt_pli_full_matrix,
1085  "lddt_pli")
1086  self._lddt_pli = mat_tuple[0]
1087  self._lddt_pli_details = mat_tuple[1]
1088  self._unassigned_target_ligands = mat_tuple[2]
1089  self._unassigned_model_ligands = mat_tuple[3]
1090 
1091  def _assign_ligands_rmsd_only(self):
1092  """Assign (map) ligands between model and target based on RMSD only.
1093 
1094  Sets self._rmsd, self._rmsd_details, self._lddt_pli and
1095  self._lddt_pli_details.
1096  """
1097  mat_tuple = self._assign_matrix(self.rmsd_matrix,
1098  self._rmsd_full_matrix,
1099  "rmsd",
1100  self._lddt_pli_full_matrix,
1101  "lddt_pli")
1102  self._rmsd = mat_tuple[0]
1103  self._rmsd_details = mat_tuple[1]
1104  self._lddt_pli = mat_tuple[2]
1105  self._lddt_pli_details = mat_tuple[3]
1106  self._unassigned_target_ligands = mat_tuple[4]
1107  self._unassigned_model_ligands = mat_tuple[5]
1108 
1109  @property
1110  def rmsd_matrix(self):
1111  """ Get the matrix of RMSD values.
1112 
1113  Target ligands are in rows, model ligands in columns.
1114 
1115  NaN values indicate that no RMSD could be computed (i.e. different
1116  ligands).
1117 
1118  :rtype: :class:`~numpy.ndarray`
1119  """
1120  if self._rmsd_full_matrix is None:
1121  self._compute_scores()
1122  if self._rmsd_matrix is None:
1123  # convert
1124  shape = self._rmsd_full_matrix.shape
1125  self._rmsd_matrix = np.full(shape, np.nan)
1126  for i, j in np.ndindex(shape):
1127  if self._rmsd_full_matrix[i, j] is not None:
1128  self._rmsd_matrix[i, j] = self._rmsd_full_matrix[
1129  i, j]["rmsd"]
1130  return self._rmsd_matrix
1131 
1132  @property
1133  def lddt_pli_matrix(self):
1134  """ Get the matrix of lDDT-PLI values.
1135 
1136  Target ligands are in rows, model ligands in columns.
1137 
1138  NaN values indicate that no lDDT-PLI could be computed (i.e. different
1139  ligands).
1140 
1141  :rtype: :class:`~numpy.ndarray`
1142  """
1143  if self._lddt_pli_full_matrix is None:
1144  self._compute_scores()
1145  if self._lddt_pli_matrix is None:
1146  # convert
1147  shape = self._lddt_pli_full_matrix.shape
1148  self._lddt_pli_matrix = np.full(shape, np.nan)
1149  for i, j in np.ndindex(shape):
1150  if self._lddt_pli_full_matrix[i, j] is not None:
1151  self._lddt_pli_matrix[i, j] = self._lddt_pli_full_matrix[
1152  i, j]["lddt_pli"]
1153  return self._lddt_pli_matrix
1154 
1155  @property
1156  def coverage_matrix(self):
1157  """ Get the matrix of model ligand atom coverage in the target.
1158 
1159  Target ligands are in rows, model ligands in columns.
1160 
1161  A value of 0 indicates that there was no isomorphism between the model
1162  and target ligands. If `substructure_match=False`, only full match
1163  isomorphisms are considered, and therefore only values of 1.0 and 0.0
1164  are reported.
1165 
1166  :rtype: :class:`~numpy.ndarray`
1167  """
1168  if self._assignment_match_coverage is None:
1169  self._compute_scores()
1170  return self._assignment_match_coverage
1171 
1172  @property
1173  def rmsd(self):
1174  """Get a dictionary of RMSD score values, keyed by model ligand
1175  (chain name, :class:`~ost.mol.ResNum`).
1176 
1177  If the scoring object was instantiated with `unassigned=True`, some
1178  scores may be `None`.
1179 
1180  :rtype: :class:`dict`
1181  """
1182  if self._rmsd is None:
1183  if self.rmsd_assignment:
1185  else:
1186  self._assign_ligands_rmsd()
1187  return self._rmsd
1188 
1189  @property
1190  def rmsd_details(self):
1191  """Get a dictionary of RMSD score details (dictionaries), keyed by
1192  model ligand (chain name, :class:`~ost.mol.ResNum`).
1193 
1194  The value is a dictionary. For ligands that were assigned (mapped) to
1195  the target, the dictionary contain the following information:
1196 
1197  * `rmsd`: the RMSD score value.
1198  * `lddt_lp`: the lDDT score of the ligand pocket (lDDT-LP).
1199  * `bs_ref_res`: a list of residues (:class:`~ost.mol.ResidueHandle`)
1200  that define the binding site in the reference.
1201  * `bs_ref_res_mapped`: a list of residues
1202  (:class:`~ost.mol.ResidueHandle`) in the reference binding site
1203  that could be mapped to the model.
1204  * `bs_mdl_res_mapped`: a list of residues
1205  (:class:`~ost.mol.ResidueHandle`) in the model that were mapped to
1206  the reference binding site. The residues are in the same order as
1207  `bs_ref_res_mapped`.
1208  * `bb_rmsd`: the RMSD of the binding site backbone after superposition
1209  * `target_ligand`: residue handle of the target ligand.
1210  * `model_ligand`: residue handle of the model ligand.
1211  * `chain_mapping`: local chain mapping as a dictionary, with target
1212  chain name as key and model chain name as value.
1213  * `transform`: transformation to superpose the model onto the target.
1214  * `substructure_match`: whether the score is the result of a partial
1215  (substructure) match. A value of `True` indicates that the target
1216  ligand covers only part of the model, while `False` indicates a
1217  perfect match.
1218  * `coverage`: the fraction of model atoms covered by the assigned
1219  target ligand, in the interval (0, 1]. If `substructure_match`
1220  is `False`, this will always be 1.
1221  * `inconsistent_residues`: a list of tuples of mapped residues views
1222  (:class:`~ost.mol.ResidueView`) with residue names that differ
1223  between the reference and the model, respectively.
1224  The list is empty if all residue names match, which is guaranteed
1225  if `check_resnames=True`.
1226  Note: more binding site mappings may be explored during scoring,
1227  but only inconsistencies in the selected mapping are reported.
1228  * `unassigned`: only if the scorer was instantiated with
1229  `unassigned=True`: `False`
1230 
1231  If the scoring object was instantiated with `unassigned=True`, in
1232  addition the unassigned ligands will be reported with a score of `None`
1233  and the following information:
1234 
1235  * `unassigned`: `True`,
1236  * `reason_short`: a short token of the reason, see
1237  :attr:`unassigned_model_ligands` for details and meaning.
1238  * `reason_long`: a human-readable text of the reason, see
1239  :attr:`unassigned_model_ligands` for details and meaning.
1240  * `rmsd`: `None`
1241 
1242  :rtype: :class:`dict`
1243  """
1244  if self._rmsd_details is None:
1245  if self.rmsd_assignment:
1247  else:
1248  self._assign_ligands_rmsd()
1249  return self._rmsd_details
1250 
1251  @property
1252  def lddt_pli(self):
1253  """Get a dictionary of lDDT-PLI score values, keyed by model ligand
1254  (chain name, :class:`~ost.mol.ResNum`).
1255 
1256  If the scoring object was instantiated with `unassigned=True`, some
1257  scores may be `None`.
1258 
1259  :rtype: :class:`dict`
1260  """
1261  if self._lddt_pli is None:
1262  if self.rmsd_assignment:
1264  else:
1266  return self._lddt_pli
1267 
1268  @property
1269  def lddt_pli_details(self):
1270  """Get a dictionary of lDDT-PLI score details (dictionaries), keyed by
1271  model ligand (chain name, :class:`~ost.mol.ResNum`).
1272 
1273  Each sub-dictionary contains the following information:
1274 
1275  * `lddt_pli`: the lDDT-PLI score value.
1276  * `rmsd`: the RMSD score value corresponding to the lDDT-PLI
1277  chain mapping and assignment. This may differ from the RMSD-based
1278  assignment. Note that a different isomorphism than `lddt_pli` may
1279  be used.
1280  * `lddt_lp`: the lDDT score of the ligand pocket (lDDT-LP).
1281  * `lddt_pli_n_contacts`: number of total contacts used in lDDT-PLI,
1282  summed over all thresholds. Can be divided by 8 to obtain the number
1283  of atomic contacts.
1284  * `bs_ref_res`: a list of residues (:class:`~ost.mol.ResidueHandle`)
1285  that define the binding site in the reference.
1286  * `bs_ref_res_mapped`: a list of residues
1287  (:class:`~ost.mol.ResidueHandle`) in the reference binding site
1288  that could be mapped to the model.
1289  * `bs_mdl_res_mapped`: a list of residues
1290  (:class:`~ost.mol.ResidueHandle`) in the model that were mapped to
1291  the reference binding site. The residues are in the same order as
1292  `bs_ref_res_mapped`.
1293  * `bb_rmsd`: the RMSD of the binding site backbone after superposition.
1294  Note: not used for lDDT-PLI computation.
1295  * `target_ligand`: residue handle of the target ligand.
1296  * `model_ligand`: residue handle of the model ligand.
1297  * `chain_mapping`: local chain mapping as a dictionary, with target
1298  chain name as key and model chain name as value.
1299  * `transform`: transformation to superpose the model onto the target
1300  (for RMSD only).
1301  * `substructure_match`: whether the score is the result of a partial
1302  (substructure) match. A value of `True` indicates that the target
1303  ligand covers only part of the model, while `False` indicates a
1304  perfect match.
1305  * `inconsistent_residues`: a list of tuples of mapped residues views
1306  (:class:`~ost.mol.ResidueView`) with residue names that differ
1307  between the reference and the model, respectively.
1308  The list is empty if all residue names match, which is guaranteed
1309  if `check_resnames=True`.
1310  Note: more binding site mappings may be explored during scoring,
1311  but only inconsistencies in the selected mapping are reported.
1312  * `unassigned`: only if the scorer was instantiated with
1313  `unassigned=True`: `False`
1314 
1315  If the scoring object was instantiated with `unassigned=True`, in
1316  addition the unmapped ligands will be reported with a score of `None`
1317  and the following information:
1318 
1319  * `unassigned`: `True`,
1320  * `reason_short`: a short token of the reason, see
1321  :attr:`unassigned_model_ligands` for details and meaning.
1322  * `reason_long`: a human-readable text of the reason, see
1323  :attr:`unassigned_model_ligands` for details and meaning.
1324  * `lddt_pli`: `None`
1325 
1326  :rtype: :class:`dict`
1327  """
1328  if self._lddt_pli_details is None:
1329  if self.rmsd_assignment:
1331  else:
1333  return self._lddt_pli_details
1334 
1335  @property
1337  """Get a dictionary of target ligands not assigned to any model ligand,
1338  keyed by target ligand (chain name, :class:`~ost.mol.ResNum`).
1339 
1340  The assignment for the lDDT-PLI score is used (and is controlled
1341  by the `rmsd_assignment` argument).
1342 
1343  Each item contains a string from a controlled dictionary
1344  about the reason for the absence of assignment.
1345  A human-readable description can be obtained from the
1346  :attr:`unassigned_target_ligand_descriptions` property.
1347 
1348  Currently, the following reasons are reported:
1349 
1350  * `no_ligand`: there was no ligand in the model.
1351  * `disconnected`: the ligand graph was disconnected.
1352  * `binding_site`: no residues were in proximity of the ligand.
1353  * `model_representation`: no representation of the reference binding
1354  site was found in the model. (I.e. the binding site was not modeled.
1355  Remember: the binding site is defined in the target structure,
1356  the position of the model ligand itself is ignored at this point.)
1357  * `identity`: the ligand was not found in the model (by graph
1358  isomorphism). Check your ligand connectivity, and enable the
1359  `substructure_match` option if the target ligand is incomplete.
1360  * `stoichiometry`: there was a possible assignment in the model, but
1361  the model ligand was already assigned to a different target ligand.
1362  This indicates different stoichiometries.
1363  * `symmetries`: too many symmetries were found (by graph isomorphisms).
1364  Increase `max_symmetries`.
1365 
1366  Some of these reasons can be overlapping, but a single reason will be
1367  reported.
1368 
1369  :rtype: :class:`dict`
1370  """
1371  if self._unassigned_target_ligand_short is None:
1372  if self.rmsd_assignment:
1374  else:
1378  for cname, res in self._unassigned_target_ligands.items():
1379  self._unassigned_target_ligand_short[cname] = {}
1380  for resnum, val in res.items():
1381  self._unassigned_target_ligand_short[cname][resnum] = val[0]
1382  self._unassigned_target_ligand_descriptions[val[0]] = val[1]
1384 
1385  @property
1387  """Get a human-readable description of why target ligands were
1388  unassigned, as a dictionary keyed by the controlled dictionary
1389  from :attr:`unassigned_target_ligands`.
1390  """
1392  _ = self.unassigned_target_ligands # assigned there
1394 
1395  @property
1397  """Get a dictionary of model ligands not assigned to any target ligand,
1398  keyed by model ligand (chain name, :class:`~ost.mol.ResNum`).
1399 
1400  The assignment for the lDDT-PLI score is used (and is controlled
1401  by the `rmsd_assignment` argument).
1402 
1403  Each item contains a string from a controlled dictionary
1404  about the reason for the absence of assignment.
1405  A human-readable description can be obtained from the
1406  :attr:`unassigned_model_ligand_descriptions` property.
1407  Currently, the following reasons are reported:
1408 
1409  * `no_ligand`: there was no ligand in the target.
1410  * `disconnected`: the ligand graph is disconnected.
1411  * `binding_site`: a potential assignment was found in the target, but
1412  there were no polymer residues in proximity of the ligand in the
1413  target.
1414  * `model_representation`: a potential assignment was found in the target,
1415  but no representation of the binding site was found in the model.
1416  (I.e. the binding site was not modeled. Remember: the binding site
1417  is defined in the target structure, the position of the model ligand
1418  itself is ignored at this point.)
1419  * `identity`: the ligand was not found in the target (by graph
1420  isomorphism). Check your ligand connectivity, and enable the
1421  `substructure_match` option if the target ligand is incomplete.
1422  * `stoichiometry`: there was a possible assignment in the target, but
1423  the model target was already assigned to a different model ligand.
1424  This indicates different stoichiometries.
1425  * `symmetries`: too many symmetries were found (by graph isomorphisms).
1426  Increase `max_symmetries`.
1427 
1428  Some of these reasons can be overlapping, but a single reason will be
1429  reported.
1430 
1431  :rtype: :class:`dict`
1432  """
1433  if self._unassigned_model_ligand_short is None:
1434  if self.rmsd_assignment:
1436  else:
1440  for cname, res in self._unassigned_model_ligands.items():
1441  self._unassigned_model_ligand_short[cname] = {}
1442  for resnum, val in res.items():
1443  self._unassigned_model_ligand_short[cname][resnum] = val[0]
1444  self._unassigned_model_ligand_descriptions[val[0]] = val[1]
1445  return self._unassigned_model_ligand_short
1446 
1447  @property
1449  """Get a human-readable description of why model ligands were
1450  unassigned, as a dictionary keyed by the controlled dictionary
1451  from :attr:`unassigned_model_ligands`.
1452  """
1453  if self._unassigned_model_ligand_descriptions is None:
1454  _ = self.unassigned_model_ligands # assigned there
1456 
1457 
1458  def _set_custom_mapping(self, mapping):
1459  """ sets self.__model_mapping with a full blown MappingResult object
1460 
1461  :param mapping: mapping with trg chains as key and mdl ch as values
1462  :type mapping: :class:`dict`
1463  """
1464  chain_mapper = self.chain_mapper
1465  chem_mapping, chem_group_alns, mdl = \
1466  chain_mapper.GetChemMapping(self.model)
1467 
1468  # now that we have a chem mapping, lets do consistency checks
1469  # - check whether chain names are unique and available in structures
1470  # - check whether the mapped chains actually map to the same chem groups
1471  if len(mapping) != len(set(mapping.keys())):
1472  raise RuntimeError(f"Expect unique trg chain names in mapping. Got "
1473  f"{mapping.keys()}")
1474  if len(mapping) != len(set(mapping.values())):
1475  raise RuntimeError(f"Expect unique mdl chain names in mapping. Got "
1476  f"{mapping.values()}")
1477 
1478  trg_chains = set([ch.GetName() for ch in chain_mapper.target.chains])
1479  mdl_chains = set([ch.GetName() for ch in mdl.chains])
1480  for k,v in mapping.items():
1481  if k not in trg_chains:
1482  raise RuntimeError(f"Target chain \"{k}\" is not available "
1483  f"in target processed for chain mapping "
1484  f"({trg_chains})")
1485  if v not in mdl_chains:
1486  raise RuntimeError(f"Model chain \"{v}\" is not available "
1487  f"in model processed for chain mapping "
1488  f"({mdl_chains})")
1489 
1490  for trg_ch, mdl_ch in mapping.items():
1491  trg_group_idx = None
1492  mdl_group_idx = None
1493  for idx, group in enumerate(chain_mapper.chem_groups):
1494  if trg_ch in group:
1495  trg_group_idx = idx
1496  break
1497  for idx, group in enumerate(chem_mapping):
1498  if mdl_ch in group:
1499  mdl_group_idx = idx
1500  break
1501  if trg_group_idx is None or mdl_group_idx is None:
1502  raise RuntimeError("Could not establish a valid chem grouping "
1503  "of chain names provided in custom mapping.")
1504 
1505  if trg_group_idx != mdl_group_idx:
1506  raise RuntimeError(f"Chem group mismatch in custom mapping: "
1507  f"target chain \"{trg_ch}\" groups with the "
1508  f"following chemically equivalent target "
1509  f"chains: "
1510  f"{chain_mapper.chem_groups[trg_group_idx]} "
1511  f"but model chain \"{mdl_ch}\" maps to the "
1512  f"following target chains: "
1513  f"{chain_mapper.chem_groups[mdl_group_idx]}")
1514 
1515  pairs = set([(trg_ch, mdl_ch) for trg_ch, mdl_ch in mapping.items()])
1516  ref_mdl_alns = \
1517  chain_mapping._GetRefMdlAlns(chain_mapper.chem_groups,
1518  chain_mapper.chem_group_alignments,
1519  chem_mapping,
1520  chem_group_alns,
1521  pairs = pairs)
1522 
1523  # translate mapping format
1524  final_mapping = list()
1525  for ref_chains in chain_mapper.chem_groups:
1526  mapped_mdl_chains = list()
1527  for ref_ch in ref_chains:
1528  if ref_ch in mapping:
1529  mapped_mdl_chains.append(mapping[ref_ch])
1530  else:
1531  mapped_mdl_chains.append(None)
1532  final_mapping.append(mapped_mdl_chains)
1533 
1534  alns = dict()
1535  for ref_group, mdl_group in zip(chain_mapper.chem_groups,
1536  final_mapping):
1537  for ref_ch, mdl_ch in zip(ref_group, mdl_group):
1538  if ref_ch is not None and mdl_ch is not None:
1539  aln = ref_mdl_alns[(ref_ch, mdl_ch)]
1540  trg_view = chain_mapper.target.Select(f"cname={ref_ch}")
1541  mdl_view = mdl.Select(f"cname={mdl_ch}")
1542  aln.AttachView(0, trg_view)
1543  aln.AttachView(1, mdl_view)
1544  alns[(ref_ch, mdl_ch)] = aln
1545 
1546  self.__model_mapping = chain_mapping.MappingResult(chain_mapper.target, mdl,
1547  chain_mapper.chem_groups,
1548  chem_mapping,
1549  final_mapping, alns)
1550 
1551  def _find_unassigned_model_ligand_reason(self, ligand, assignment="lddt_pli", check=True):
1552  # Is this a model ligand?
1553  try:
1554  ligand_idx = self.model_ligands.index(ligand)
1555  except ValueError:
1556  # Raise with a better error message
1557  raise ValueError("Ligand %s is not in self.model_ligands" % ligand)
1558 
1559  # Ensure we are unassigned
1560  if check:
1561  details = getattr(self, assignment + "_details")
1562  if ligand.chain.name in details and ligand.number in details[ligand.chain.name]:
1563  ligand_details = details[ligand.chain.name][ligand.number]
1564  if not ("unassigned" in ligand_details and ligand_details["unassigned"]):
1565  raise RuntimeError("Ligand %s is mapped to %s" % (ligand, ligand_details["target_ligand"]))
1566 
1567  # Were there any ligands in the target?
1568  if len(self.target_ligands) == 0:
1569  return ("no_ligand", "No ligand in the target")
1570 
1571  # Is the ligand disconnected?
1572  graph = _ResidueToGraph(ligand)
1573  if not networkx.is_connected(graph):
1574  return ("disconnected", "Ligand graph is disconnected")
1575 
1576  # Do we have isomorphisms with the target?
1577  for trg_lig_idx, assigned in enumerate(self._assignment_isomorphisms[:, ligand_idx]):
1578  if np.isnan(assigned):
1579  try:
1580  _ComputeSymmetries(
1581  self.model_ligands[ligand_idx],
1582  self.target_ligands[trg_lig_idx],
1583  substructure_match=self.substructure_match,
1584  by_atom_index=True,
1585  return_symmetries=False)
1586  except (NoSymmetryError, DisconnectedGraphError):
1587  assigned = 0.
1588  except TooManySymmetriesError:
1589  assigned = -1.
1590  else:
1591  assigned = 1.
1592  self._assignment_isomorphisms[trg_lig_idx,ligand_idx] = assigned
1593  if assigned == 1.:
1594  # Could have been assigned
1595  # So what's up with this target ligand?
1596  assignment_matrix = getattr(self, assignment + "_matrix")
1597  all_nan = np.all(np.isnan(assignment_matrix[:, ligand_idx]))
1598  if all_nan:
1599  # The assignment matrix is all nans so we have a problem
1600  # with the binding site or the representation
1601  trg_ligand = self.target_ligands[trg_lig_idx]
1602  return self._unassigned_target_ligands_reason[trg_ligand]
1603  else:
1604  # Ligand was already assigned
1605  return ("stoichiometry",
1606  "Ligand was already assigned to an other "
1607  "model ligand (different stoichiometry)")
1608  elif assigned == -1:
1609  # Symmetries / isomorphisms exceeded limit
1610  return ("symmetries",
1611  "Too many symmetries were found.")
1612 
1613  # Could not be assigned to any ligand - must be different
1614  if self.substructure_match:
1615  iso = "subgraph isomorphism"
1616  else:
1617  iso = "full graph isomorphism"
1618  return ("identity", "Ligand was not found in the target (by %s)" % iso)
1619 
1620  def _find_unassigned_target_ligand_reason(self, ligand, assignment="lddt_pli", check=True):
1621  # Is this a target ligand?
1622  try:
1623  ligand_idx = self.target_ligands.index(ligand)
1624  except ValueError:
1625  # Raise with a better error message
1626  raise ValueError("Ligand %s is not in self.target_ligands" % ligand)
1627 
1628  # Ensure we are unassigned
1629  if check:
1630  details = getattr(self, assignment + "_details")
1631  for cname, chain_ligands in details.items():
1632  for rnum, details in chain_ligands.items():
1633  if "unassigned" in details and details["unassigned"]:
1634  continue
1635  if details['target_ligand'] == ligand:
1636  raise RuntimeError("Ligand %s is mapped to %s.%s" % (
1637  ligand, cname, rnum))
1638 
1639  # Were there any ligands in the model?
1640  if len(self.model_ligands) == 0:
1641  return ("no_ligand", "No ligand in the model")
1642 
1643  # Is the ligand disconnected?
1644  graph = _ResidueToGraph(ligand)
1645  if not networkx.is_connected(graph):
1646  return ("disconnected", "Ligand graph is disconnected")
1647 
1648  # Is it because there was no valid binding site or no representation?
1649  if ligand in self._unassigned_target_ligands_reason:
1650  return self._unassigned_target_ligands_reason[ligand]
1651  # Or because no symmetry?
1652  for model_lig_idx, assigned in enumerate(
1653  self._assignment_isomorphisms[ligand_idx, :]):
1654  if np.isnan(assigned):
1655  try:
1656  _ComputeSymmetries(
1657  self.model_ligands[model_lig_idx],
1658  self.target_ligands[ligand_idx],
1659  substructure_match=self.substructure_match,
1660  by_atom_index=True,
1661  return_symmetries=False)
1662  except (NoSymmetryError, DisconnectedGraphError):
1663  assigned = 0.
1664  except TooManySymmetriesError:
1665  assigned = -1.
1666  else:
1667  assigned = 1.
1668  self._assignment_isomorphisms[ligand_idx,model_lig_idx] = assigned
1669  if assigned == 1:
1670  # Could have been assigned but was assigned to a different ligand
1671  return ("stoichiometry",
1672  "Ligand was already assigned to an other "
1673  "target ligand (different stoichiometry)")
1674  elif assigned == -1:
1675  # Symmetries / isomorphisms exceeded limit
1676  return ("symmetries",
1677  "Too many symmetries were found.")
1678 
1679  # Could not be assigned to any ligand - must be different
1680  if self.substructure_match:
1681  iso = "subgraph isomorphism"
1682  else:
1683  iso = "full graph isomorphism"
1684  return ("identity", "Ligand was not found in the model (by %s)" % iso)
1685 
1686 
1687 def _ResidueToGraph(residue, by_atom_index=False):
1688  """Return a NetworkX graph representation of the residue.
1689 
1690  :param residue: the residue from which to derive the graph
1691  :type residue: :class:`ost.mol.ResidueHandle` or
1692  :class:`ost.mol.ResidueView`
1693  :param by_atom_index: Set this parameter to True if you need the nodes to
1694  be labeled by atom index (within the residue).
1695  Otherwise, if False, the nodes will be labeled by
1696  atom names.
1697  :type by_atom_index: :class:`bool`
1698  :rtype: :class:`~networkx.classes.graph.Graph`
1699 
1700  Nodes are labeled with the Atom's uppercase :attr:`~ost.mol.AtomHandle.element`.
1701  """
1702  nxg = networkx.Graph()
1703 
1704  for atom in residue.atoms:
1705  nxg.add_node(atom.name, element=atom.element.upper())
1706 
1707  # This will list all edges twice - once for every atom of the pair.
1708  # But as of NetworkX 3.0 adding the same edge twice has no effect, so we're good.
1709  nxg.add_edges_from([(
1710  b.first.name,
1711  b.second.name) for a in residue.atoms for b in a.GetBondList()])
1712 
1713  if by_atom_index:
1714  nxg = networkx.relabel_nodes(nxg,
1715  {a: b for a, b in zip(
1716  [a.name for a in residue.atoms],
1717  range(len(residue.atoms)))},
1718  True)
1719  return nxg
1720 
1721 
1722 def SCRMSD(model_ligand, target_ligand, transformation=geom.Mat4(),
1723  substructure_match=False, max_symmetries=1e6):
1724  """Calculate symmetry-corrected RMSD.
1725 
1726  Binding site superposition must be computed separately and passed as
1727  `transformation`.
1728 
1729  :param model_ligand: The model ligand
1730  :type model_ligand: :class:`ost.mol.ResidueHandle` or
1731  :class:`ost.mol.ResidueView`
1732  :param target_ligand: The target ligand
1733  :type target_ligand: :class:`ost.mol.ResidueHandle` or
1734  :class:`ost.mol.ResidueView`
1735  :param transformation: Optional transformation to apply on each atom
1736  position of model_ligand.
1737  :type transformation: :class:`ost.geom.Mat4`
1738  :param substructure_match: Set this to True to allow partial target
1739  ligand.
1740  :type substructure_match: :class:`bool`
1741  :param max_symmetries: If more than that many isomorphisms exist, raise
1742  a :class:`TooManySymmetriesError`. This can only be assessed by
1743  generating at least that many isomorphisms and can take some time.
1744  :type max_symmetries: :class:`int`
1745  :rtype: :class:`float`
1746  :raises: :class:`NoSymmetryError` when no symmetry can be found,
1747  :class:`DisconnectedGraphError` when ligand graph is disconnected,
1748  :class:`TooManySymmetriesError` when more than `max_symmetries`
1749  isomorphisms are found.
1750  """
1751 
1752  symmetries = _ComputeSymmetries(model_ligand, target_ligand,
1753  substructure_match=substructure_match,
1754  by_atom_index=True,
1755  max_symmetries=max_symmetries)
1756  return _SCRMSD_symmetries(symmetries, model_ligand, target_ligand,
1757  transformation)
1758 
1759 
1760 def _SCRMSD_symmetries(symmetries, model_ligand, target_ligand,
1761  transformation):
1762  """Compute SCRMSD with pre-computed symmetries. Internal. """
1763 
1764  best_rmsd = np.inf
1765  for i, (trg_sym, mdl_sym) in enumerate(symmetries):
1766  # Prepare Entities for RMSD
1767  trg_lig_rmsd_ent = mol.CreateEntity()
1768  trg_lig_rmsd_editor = trg_lig_rmsd_ent.EditXCS()
1769  trg_lig_rmsd_chain = trg_lig_rmsd_editor.InsertChain("_")
1770  trg_lig_rmsd_res = trg_lig_rmsd_editor.AppendResidue(trg_lig_rmsd_chain, "LIG")
1771 
1772  mdl_lig_rmsd_ent = mol.CreateEntity()
1773  mdl_lig_rmsd_editor = mdl_lig_rmsd_ent.EditXCS()
1774  mdl_lig_rmsd_chain = mdl_lig_rmsd_editor.InsertChain("_")
1775  mdl_lig_rmsd_res = mdl_lig_rmsd_editor.AppendResidue(mdl_lig_rmsd_chain, "LIG")
1776 
1777  for mdl_anum, trg_anum in zip(mdl_sym, trg_sym):
1778  # Rename model atoms according to symmetry
1779  trg_atom = target_ligand.atoms[trg_anum]
1780  mdl_atom = model_ligand.atoms[mdl_anum]
1781  # Add atoms in the correct order to the RMSD entities
1782  trg_lig_rmsd_editor.InsertAtom(trg_lig_rmsd_res, trg_atom.name, trg_atom.pos)
1783  mdl_lig_rmsd_editor.InsertAtom(mdl_lig_rmsd_res, mdl_atom.name, mdl_atom.pos)
1784 
1785  trg_lig_rmsd_editor.UpdateICS()
1786  mdl_lig_rmsd_editor.UpdateICS()
1787 
1788  rmsd = mol.alg.CalculateRMSD(mdl_lig_rmsd_ent.CreateFullView(),
1789  trg_lig_rmsd_ent.CreateFullView(),
1790  transformation)
1791  if rmsd < best_rmsd:
1792  best_rmsd = rmsd
1793 
1794  return best_rmsd
1795 
1796 
1797 def _ComputeSymmetries(model_ligand, target_ligand, substructure_match=False,
1798  by_atom_index=False, return_symmetries=True,
1799  max_symmetries=1e6):
1800  """Return a list of symmetries (isomorphisms) of the model onto the target
1801  residues.
1802 
1803  :param model_ligand: The model ligand
1804  :type model_ligand: :class:`ost.mol.ResidueHandle` or
1805  :class:`ost.mol.ResidueView`
1806  :param target_ligand: The target ligand
1807  :type target_ligand: :class:`ost.mol.ResidueHandle` or
1808  :class:`ost.mol.ResidueView`
1809  :param substructure_match: Set this to True to allow partial ligands
1810  in the reference.
1811  :type substructure_match: :class:`bool`
1812  :param by_atom_index: Set this parameter to True if you need the symmetries
1813  to refer to atom index (within the residue).
1814  Otherwise, if False, the symmetries refer to atom
1815  names.
1816  :type by_atom_index: :class:`bool`
1817  :type return_symmetries: If Truthy, return the mappings, otherwise simply
1818  return True if a mapping is found (and raise if
1819  no mapping is found). This is useful to quickly
1820  find out if a mapping exist without the expensive
1821  step to find all the mappings.
1822  :type return_symmetries: :class:`bool`
1823  :param max_symmetries: If more than that many isomorphisms exist, raise
1824  a :class:`TooManySymmetriesError`. This can only be assessed by
1825  generating at least that many isomorphisms and can take some time.
1826  :type max_symmetries: :class:`int`
1827  :raises: :class:`NoSymmetryError` when no symmetry can be found;
1828  :class:`TooManySymmetriesError` when more than `max_symmetries`
1829  isomorphisms are found.
1830 
1831  """
1832 
1833  # Get the Graphs of the ligands
1834  model_graph = _ResidueToGraph(model_ligand, by_atom_index=by_atom_index)
1835  target_graph = _ResidueToGraph(target_ligand, by_atom_index=by_atom_index)
1836 
1837  if not networkx.is_connected(model_graph):
1838  raise DisconnectedGraphError("Disconnected graph for model ligand %s" % model_ligand)
1839  if not networkx.is_connected(target_graph):
1840  raise DisconnectedGraphError("Disconnected graph for target ligand %s" % target_ligand)
1841 
1842  # Note the argument order (model, target) which differs from spyrmsd.
1843  # This is because a subgraph of model is isomorphic to target - but not the opposite
1844  # as we only consider partial ligands in the reference.
1845  # Make sure to generate the symmetries correctly in the end
1846  gm = networkx.algorithms.isomorphism.GraphMatcher(
1847  model_graph, target_graph, node_match=lambda x, y:
1848  x["element"] == y["element"])
1849  if gm.is_isomorphic():
1850  if not return_symmetries:
1851  return True
1852  symmetries = []
1853  for i, isomorphism in enumerate(gm.isomorphisms_iter()):
1854  if i >= max_symmetries:
1855  raise TooManySymmetriesError(
1856  "Too many symmetries between %s and %s" % (
1857  str(model_ligand), str(target_ligand)))
1858  symmetries.append((list(isomorphism.values()), list(isomorphism.keys())))
1859  assert len(symmetries) > 0
1860  LogDebug("Found %s isomorphic mappings (symmetries)" % len(symmetries))
1861  elif gm.subgraph_is_isomorphic() and substructure_match:
1862  if not return_symmetries:
1863  return True
1864  symmetries = []
1865  for i, isomorphism in enumerate(gm.subgraph_isomorphisms_iter()):
1866  if i >= max_symmetries:
1867  raise TooManySymmetriesError(
1868  "Too many symmetries between %s and %s" % (
1869  str(model_ligand), str(target_ligand)))
1870  symmetries.append((list(isomorphism.values()), list(isomorphism.keys())))
1871  assert len(symmetries) > 0
1872  # Assert that all the atoms in the target are part of the substructure
1873  assert len(symmetries[0][0]) == len(target_ligand.atoms)
1874  LogDebug("Found %s subgraph isomorphisms (symmetries)" % len(symmetries))
1875  elif gm.subgraph_is_isomorphic():
1876  LogDebug("Found subgraph isomorphisms (symmetries), but"
1877  " ignoring because substructure_match=False")
1878  raise NoSymmetryError("No symmetry between %s and %s" % (
1879  str(model_ligand), str(target_ligand)))
1880  else:
1881  LogDebug("Found no isomorphic mappings (symmetries)")
1882  raise NoSymmetryError("No symmetry between %s and %s" % (
1883  str(model_ligand), str(target_ligand)))
1884 
1885  return symmetries
1886 
1887 
1888 class NoSymmetryError(ValueError):
1889  """Exception raised when no symmetry can be found.
1890  """
1891  pass
1892 
1893 
1894 class TooManySymmetriesError(ValueError):
1895  """Exception raised when too many symmetries are found.
1896  """
1897  pass
1898 
1899 class DisconnectedGraphError(Exception):
1900  """Exception raised when the ligand graph is disconnected.
1901  """
1902  pass
1903 
1904 
1905 __all__ = ["LigandScorer", "SCRMSD", "NoSymmetryError",
1906  "TooManySymmetriesError", "DisconnectedGraphError"]
Protein or molecule.
definition of EntityView
Definition: entity_view.hh:86