OpenStructure
 All Data Structures Namespaces Files Functions Variables Typedefs Enumerations Enumerator Properties Friends Macros Groups Pages
ligand_scoring.py
Go to the documentation of this file.
1 import warnings
2 
3 import numpy as np
4 import numpy.ma as np_ma
5 import networkx
6 
7 from ost import mol
8 from ost import geom
9 from ost import LogError, LogWarning, LogScript, LogInfo, LogVerbose, LogDebug
10 from ost.mol.alg import chain_mapping
11 
12 
14  """ Scorer class to compute various small molecule ligand (non polymer) scores.
15 
16  .. note ::
17  Extra requirements:
18 
19  - Python modules `numpy` and `networkx` must be available
20  (e.g. use ``pip install numpy networkx``)
21 
22  At the moment, two scores are available:
23 
24  * lDDT-PLI, that looks at the conservation of protein-ligand contacts
25  with :class:`lDDT <ost.mol.alg.lddt.lDDTScorer>`.
26  * Binding-site superposed, symmetry-corrected RMSD that assesses the
27  accuracy of the ligand pose (BiSyRMSD, hereinafter referred to as RMSD).
28 
29  Both scores involve local chain mapping of the reference binding site
30  onto the model, symmetry-correction, and finally assignment (mapping)
31  of model and target ligands, as described in (Manuscript in preparation).
32 
33  The binding site is defined based on a radius around the target ligand
34  in the reference structure only. It only contains protein and nucleic
35  acid chains that pass the criteria for the
36  :class:`chain mapping <ost.mol.alg.chain_mapping>`. This means ignoring
37  other ligands, waters, short polymers as well as any incorrectly connected
38  chains that may be in proximity.
39 
40  Results are available as matrices (`(lddt_pli|rmsd)_matrix`), where every
41  target-model score is reported in a matrix; as `(lddt_pli|rmsd)` where
42  a model-target assignment has been determined (see below) and reported in
43  a dictionary; and as (`(lddt_pli|rmsd)_details`) methods, which report
44  additional details about different aspects of the scoring such as chain
45  mapping.
46 
47  The behavior of chain mapping and ligand assignment can be controlled
48  with the `global_chain_mapping` and `rmsd_assignment` arguments.
49 
50  By default, chain mapping is performed locally, ie. only within the
51  binding site. As a result, different ligand scores can correspond to
52  different chain mappings. This tends to produce more favorable scores,
53  especially in large, partially regular oligomeric complexes.
54  Setting `global_chain_mapping=True` enforces a single global chain mapping,
55  as per :meth:`ost.mol.alg.chain_mapping.ChainMapper.GetMapping`.
56  Note that this global chain mapping currently ignores non polymer entities
57  such as small ligands, and may result in overly pessimistic scores.
58 
59  By default, target-model ligand assignments are computed independently
60  for the RMSD and lDDT-PLI scores. For RMSD, each model ligand is uniquely
61  assigned to a target ligand, starting from the "best" possible mapping
62  (lowest RMSD) and using each target and model ligand in a single
63  assignment. Ties are resolved by best (highest) lDDT-PLI. Similarly,
64  for lDDT-PLI, the assignment is based on the highest lDDT-PLI, and ties
65  broken by lowest RMSD. Setting `rmsd_assignment=True` forces a single
66  ligand assignment, based on RMSD only. Ties are broken arbitrarily.
67 
68  By default, only exact matches between target and model ligands are
69  considered. This is a problem when the target only contains a subset
70  of the expected atoms (for instance if atoms are missing in an
71  experimental structure, which often happens in the PDB). With
72  `substructure_match=True`, complete model ligands can be scored against
73  partial target ligands. One problem with this approach is that it is
74  very easy to find good matches to small, irrelevant ligands like EDO, CO2
75  or GOL. To counter that, the assignment algorithm considers the coverage,
76  expressed as the fraction of atoms of the model ligand atoms covered in the
77  target. Higher coverage matches are prioritized, but a match with a better
78  score will be preferred if it falls within a window of `coverage_delta`
79  (by default 0.2) of a worse-scoring match. As a result, for instance,
80  with a delta of 0.2, a low-score match with coverage 0.96 would be
81  preferred to a high-score match with coverage 0.90.
82 
83  Assumptions:
84 
85  The class generally assumes that the
86  :attr:`~ost.mol.ResidueHandle.is_ligand` property is properly set on all
87  the ligand atoms, and only ligand atoms. This is typically the case for
88  entities loaded from mmCIF (tested with mmCIF files from the PDB and
89  SWISS-MODEL), but it will most likely not work for most entities loaded
90  from PDB files.
91 
92  The class doesn't perform any cleanup of the provided structures.
93  It is up to the caller to ensure that the data is clean and suitable for
94  scoring. :ref:`Molck <molck>` should be used with extra
95  care, as many of the options (such as `rm_non_std` or `map_nonstd_res`) can
96  cause ligands to be removed from the structure. If cleanup with Molck is
97  needed, ligands should be kept aside and passed separately. Non-ligand residues
98  should be valid compounds with atom names following the naming conventions
99  of the component dictionary. Non-standard residues are acceptable, and if
100  the model contains a standard residue at that position, only atoms with
101  matching names will be considered.
102 
103  Unlike most of OpenStructure, this class does not assume that the ligands
104  (either in the model or the target) are part of the PDB component
105  dictionary. They may have arbitrary residue names. Residue names do not
106  have to match between the model and the target. Matching is based on
107  the calculation of isomorphisms which depend on the atom element name and
108  atom connectivity (bond order is ignored).
109  It is up to the caller to ensure that the connectivity of atoms is properly
110  set before passing any ligands to this class. Ligands with improper
111  connectivity will lead to bogus results.
112 
113  Note, however, that atom names should be unique within a residue (ie two
114  distinct atoms cannot have the same atom name).
115 
116  This only applies to the ligand. The rest of the model and target
117  structures (protein, nucleic acids) must still follow the usual rules and
118  contain only residues from the compound library.
119 
120  Although it isn't a requirement, hydrogen atoms should be removed from the
121  structures. Here is an example code snippet that will perform a reasonable
122  cleanup. Keep in mind that this is most likely not going to work as
123  expected with entities loaded from PDB files, as the `is_ligand` flag is
124  probably not set properly.
125 
126  Here is a snippet example of how to use this code::
127 
128  from ost.mol.alg.ligand_scoring import LigandScorer
129  from ost.mol.alg import Molck, MolckSettings
130 
131  # Load data
132  # Structure model in PDB format, containing the receptor only
133  model = io.LoadPDB("path_to_model.pdb")
134  # Ligand model as SDF file
135  model_ligand = io.LoadEntity("path_to_ligand.sdf", format="sdf")
136  # Target loaded from mmCIF, containing the ligand
137  target = io.LoadMMCIF("path_to_target.cif")
138 
139  # Cleanup a copy of the structures
140  cleaned_model = model.Copy()
141  cleaned_target = target.Copy()
142  molck_settings = MolckSettings(rm_unk_atoms=True,
143  rm_non_std=False,
144  rm_hyd_atoms=True,
145  rm_oxt_atoms=False,
146  rm_zero_occ_atoms=False,
147  colored=False,
148  map_nonstd_res=False,
149  assign_elem=True)
150  Molck(cleaned_model, conop.GetDefaultLib(), molck_settings)
151  Molck(cleaned_target, conop.GetDefaultLib(), molck_settings)
152 
153  # Setup scorer object and compute lDDT-PLI
154  model_ligands = [model_ligand.Select("ele != H")]
155  ls = LigandScorer(model=cleaned_model, target=cleaned_target, model_ligands=model_ligands)
156  print("lDDT-PLI:", ls.lddt_pli)
157  print("RMSD:", ls.rmsd)
158 
159  :param model: Model structure - a deep copy is available as :attr:`model`.
160  No additional processing (ie. Molck), checks,
161  stereochemistry checks or sanitization is performed on the
162  input. Hydrogen atoms are kept.
163  :type model: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
164  :param target: Target structure - a deep copy is available as :attr:`target`.
165  No additional processing (ie. Molck), checks or sanitization
166  is performed on the input. Hydrogen atoms are kept.
167  :type target: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
168  :param model_ligands: Model ligands, as a list of
169  :class:`~ost.mol.ResidueHandle` belonging to the model
170  entity. Can be instantiated with either a :class:list of
171  :class:`~ost.mol.ResidueHandle`/:class:`ost.mol.ResidueView`
172  or of :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`.
173  If `None`, ligands will be extracted from the `model` entity,
174  from chains with :class:`~ost.mol.ChainType`
175  `CHAINTYPE_NON_POLY` (this is normally set properly in
176  entities loaded from mmCIF).
177  :type model_ligands: :class:`list`
178  :param target_ligands: Target ligands, as a list of
179  :class:`~ost.mol.ResidueHandle` belonging to the target
180  entity. Can be instantiated either a :class:list of
181  :class:`~ost.mol.ResidueHandle`/:class:`ost.mol.ResidueView`
182  or of :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
183  containing a single residue each. If `None`, ligands will be
184  extracted from the `target` entity, from chains with
185  :class:`~ost.mol.ChainType` `CHAINTYPE_NON_POLY` (this is
186  normally set properly in entities loaded from mmCIF).
187  :type target_ligands: :class:`list`
188  :param resnum_alignments: Whether alignments between chemically equivalent
189  chains in *model* and *target* can be computed
190  based on residue numbers. This can be assumed in
191  benchmarking setups such as CAMEO/CASP.
192  :type resnum_alignments: :class:`bool`
193  :param check_resnames: On by default. Enforces residue name matches
194  between mapped model and target residues.
195  :type check_resnames: :class:`bool`
196  :param rename_ligand_chain: If a residue with the same chain name and
197  residue number than an explicitly passed model
198  or target ligand exits in the structure,
199  and `rename_ligand_chain` is False, a
200  RuntimeError will be raised. If
201  `rename_ligand_chain` is True, the ligand will
202  be moved to a new chain instead, and the move
203  will be logged to the console with SCRIPT
204  level.
205  :type rename_ligand_chain: :class:`bool`
206  :param chain_mapper: a chain mapper initialized for the target structure.
207  If None (default), a chain mapper will be initialized
208  lazily as required.
209  :type chain_mapper: :class:`ost.mol.alg.chain_mapping.ChainMapper`
210  :param substructure_match: Set this to True to allow partial target ligand.
211  :type substructure_match: :class:`bool`
212  :param coverage_delta: the coverage delta for partial ligand assignment.
213  :type coverage_delta: :class:`float`
214  :param radius: Inclusion radius for the binding site. Residues with
215  atoms within this distance of the ligand will be considered
216  for inclusion in the binding site.
217  :type radius: :class:`float`
218  :param lddt_pli_radius: lDDT inclusion radius for lDDT-PLI.
219  :type lddt_pli_radius: :class:`float`
220  :param lddt_lp_radius: lDDT inclusion radius for lDDT-LP.
221  :type lddt_lp_radius: :class:`float`
222  :param binding_sites_topn: maximum number of target binding site
223  representations to assess, per target ligand.
224  Ignored if `global_chain_mapping` is True.
225  :type binding_sites_topn: :class:`int`
226  :param global_chain_mapping: set to True to use a global chain mapping for
227  the polymer (protein, nucleotide) chains.
228  Defaults to False, in which case only local
229  chain mappings are allowed (where different
230  ligand may be scored against different chain
231  mappings).
232  :type global_chain_mapping: :class:`bool`
233  :param custom_mapping: Provide custom chain mapping between *model* and
234  *target* that is used as global chain mapping.
235  Dictionary with target chain names as key and model
236  chain names as value. Only has an effect if
237  *global_chain_mapping* is True.
238  :type custom_mapping: :class:`dict`
239  :param rmsd_assignment: assign ligands based on RMSD only. The default
240  (False) is to use a combination of lDDT-PLI and
241  RMSD for the assignment.
242  :type rmsd_assignment: :class:`bool`
243  :param n_max_naive: Parameter for global chain mapping. If *model* and
244  *target* have less or equal that number of chains,
245  the full
246  mapping solution space is enumerated to find the
247  the optimum. A heuristic is used otherwise.
248  :type n_max_naive: :class:`int`
249  :param unassigned: If True, unassigned model ligands are reported in
250  the output together with assigned ligands, with a score
251  of None, and reason for not being assigned in the
252  \*_details matrix. Defaults to False.
253  :type unassigned: :class:`bool`
254  """
255  def __init__(self, model, target, model_ligands=None, target_ligands=None,
256  resnum_alignments=False, check_resnames=True,
257  rename_ligand_chain=False,
258  chain_mapper=None, substructure_match=False,
259  coverage_delta=0.2,
260  radius=4.0, lddt_pli_radius=6.0, lddt_lp_radius=10.0,
261  binding_sites_topn=100000, global_chain_mapping=False,
262  rmsd_assignment=False, n_max_naive=12,
263  custom_mapping=None, unassigned=False):
264 
265  if isinstance(model, mol.EntityView):
266  self.model = mol.CreateEntityFromView(model, False)
267  elif isinstance(model, mol.EntityHandle):
268  self.model = model.Copy()
269  else:
270  raise RuntimeError("model must be of type EntityView/EntityHandle")
271 
272  if isinstance(target, mol.EntityView):
273  self.target = mol.CreateEntityFromView(target, False)
274  elif isinstance(target, mol.EntityHandle):
275  self.target = target.Copy()
276  else:
277  raise RuntimeError("target must be of type EntityView/EntityHandle")
278 
279  # Extract ligands from target
280  if target_ligands is None:
282  else:
283  self.target_ligands = self._prepare_ligands(self.target, target,
284  target_ligands,
285  rename_ligand_chain)
286  if len(self.target_ligands) == 0:
287  LogWarning("No ligands in the target")
288 
289  # Extract ligands from model
290  if model_ligands is None:
292  else:
293  self.model_ligands = self._prepare_ligands(self.model, model,
294  model_ligands,
295  rename_ligand_chain)
296  if len(self.model_ligands) == 0:
297  LogWarning("No ligands in the model")
298  if len(self.target_ligands) == 0:
299  raise ValueError("No ligand in the model and in the target")
300 
301  self._chain_mapper = chain_mapper
302  self.resnum_alignments = resnum_alignments
303  self.check_resnames = check_resnames
304  self.rename_ligand_chain = rename_ligand_chain
305  self.substructure_match = substructure_match
306  self.radius = radius
307  self.lddt_pli_radius = lddt_pli_radius
308  self.lddt_lp_radius = lddt_lp_radius
309  self.binding_sites_topn = binding_sites_topn
310  self.global_chain_mapping = global_chain_mapping
311  self.rmsd_assignment = rmsd_assignment
312  self.n_max_naive = n_max_naive
313  self.unassigned = unassigned
314  self.coverage_delta = coverage_delta
315 
316  # scoring matrices
317  self._rmsd_matrix = None
318  self._rmsd_full_matrix = None
319  self._lddt_pli_matrix = None
320  self._lddt_pli_full_matrix = None
321 
322  # lazily computed scores
323  self._rmsd = None
324  self._rmsd_details = None
325  self._lddt_pli = None
326  self._lddt_pli_details = None
327 
328  # lazily precomputed variables
329  self._binding_sites = {}
330  self.__model_mapping = None
331 
332  # Bookkeeping of unassigned ligands
333  self._unassigned_target_ligands = None
334  self._unassigned_model_ligands = None
340  # Keep track of symmetries/isomorphisms (regardless of scoring)
341  # 0.0: no isomorphism
342  # 1.0: isomorphic
343  # np.nan: not assessed yet - that's why we can't use a boolean
344  self._assignment_isomorphisms = None
345  # Keep track of match coverage (only in case there was a score)
346  self._assignment_match_coverage = None
347 
348  if custom_mapping is not None:
349  self._set_custom_mapping(custom_mapping)
350 
351  @property
352  def chain_mapper(self):
353  """ Chain mapper object for the given :attr:`target`.
354 
355  :type: :class:`ost.mol.alg.chain_mapping.ChainMapper`
356  """
357  if self._chain_mapper is None:
359  n_max_naive=1e9,
360  resnum_alignments=self.resnum_alignments)
361  return self._chain_mapper
362 
363  @property
364  def _model_mapping(self):
365  """Get the global chain mapping for the model."""
366  if self.__model_mapping is None:
367  self.__model_mapping = self.chain_mapper.GetMapping(self.model,
368  n_max_naive=self.n_max_naive)
369  return self.__model_mapping
370 
371  @staticmethod
372  def _extract_ligands(entity):
373  """Extract ligands from entity. Return a list of residues.
374 
375  Assumes that ligands are contained in one or more chain with chain type
376  `mol.ChainType.CHAINTYPE_NON_POLY`. This is typically the case
377  for entities loaded from mmCIF (tested with mmCIF files from the PDB
378  and SWISS-MODEL), but it will most likely not work for most entities
379  loaded from PDB files.
380 
381  As a deviation from the mmCIF semantics, we allow a chain, set as
382  `CHAINTYPE_NON_POLY`, to contain more than one ligand. This function
383  performs basic checks to ensure that the residues in this chain are
384  not forming polymer bonds (ie peptide/nucleotide ligands) and will
385  raise a RuntimeError if this assumption is broken.
386 
387  Note: This will not extract ligands based on the HET record in the old
388  PDB style, as this is not a reliable indicator and depends on how the
389  entity was loaded.
390 
391  :param entity: the entity to extract ligands from
392  :type entity: :class:`~ost.mol.EntityHandle`
393  :rtype: :class:`list` of :class:`~ost.mol.ResidueHandle`
394 
395  """
396  extracted_ligands = []
397  for chain in entity.chains:
398  if chain.chain_type == mol.ChainType.CHAINTYPE_NON_POLY:
399  for residue in chain.residues:
400  if mol.InSequence(residue, residue.next):
401  raise RuntimeError("Connected residues in non polymer "
402  "chain %s" % (chain.name))
403  residue.SetIsLigand(True) # just in case
404  extracted_ligands.append(residue)
405  LogVerbose("Detected residue %s as ligand" % residue)
406  return extracted_ligands
407 
408  @staticmethod
409  def _prepare_ligands(new_entity, old_entity, ligands, rename_chain):
410  """Prepare the ligands given into a list of ResidueHandles which are
411  part of the copied entity, suitable for the model_ligands and
412  target_ligands properties.
413 
414  This function takes a list of ligands as (Entity|Residue)(Handle|View).
415  Entities can contain multiple ligands, which will be considered as
416  separate ligands.
417 
418  Ligands which are part of the entity are simply fetched in the new
419  copied entity. Otherwise, they are copied over to the copied entity.
420  """
421  extracted_ligands = []
422 
423  next_chain_num = 1
424  new_editor = None
425 
426  def _copy_residue(residue, rename_chain):
427  """ Copy the residue into the new chain.
428  Return the new residue handle."""
429  nonlocal next_chain_num, new_editor
430 
431  # Instantiate the editor
432  if new_editor is None:
433  new_editor = new_entity.EditXCS()
434 
435  new_chain = new_entity.FindChain(residue.chain.name)
436  if not new_chain.IsValid():
437  new_chain = new_editor.InsertChain(residue.chain.name)
438  else:
439  # Does a residue with the same name already exist?
440  already_exists = new_chain.FindResidue(residue.number).IsValid()
441  if already_exists:
442  if rename_chain:
443  chain_ext = 2 # Extend the chain name by this
444  while True:
445  new_chain_name = residue.chain.name + "_" + str(chain_ext)
446  new_chain = new_entity.FindChain(new_chain_name)
447  if new_chain.IsValid():
448  chain_ext += 1
449  continue
450  else:
451  new_chain = new_editor.InsertChain(new_chain_name)
452  break
453  LogScript("Moved ligand residue %s to new chain %s" % (
454  residue.qualified_name, new_chain.name))
455  else:
456  msg = "A residue number %s already exists in chain %s" % (
457  residue.number, residue.chain.name)
458  raise RuntimeError(msg)
459 
460  # Add the residue with its original residue number
461  new_res = new_editor.AppendResidue(new_chain, residue.name, residue.number)
462  # Add atoms
463  for old_atom in residue.atoms:
464  new_editor.InsertAtom(new_res, old_atom.name, old_atom.pos,
465  element=old_atom.element, occupancy=old_atom.occupancy,
466  b_factor=old_atom.b_factor, is_hetatm=old_atom.is_hetatom)
467  # Add bonds
468  for old_atom in residue.atoms:
469  for old_bond in old_atom.bonds:
470  new_first = new_res.FindAtom(old_bond.first.name)
471  new_second = new_res.FindAtom(old_bond.second.name)
472  new_editor.Connect(new_first, new_second)
473  return new_res
474 
475  def _process_ligand_residue(res, rename_chain):
476  """Copy or fetch the residue. Return the residue handle."""
477  new_res = None
478  if res.entity.handle == old_entity.handle:
479  # Residue is part of the old_entity handle.
480  # However, it may not be in the copied one, for instance it may have been a view
481  # We try to grab it first, otherwise we copy it
482  new_res = new_entity.FindResidue(res.chain.name, res.number)
483  if new_res and new_res.valid:
484  LogVerbose("Ligand residue %s already in entity" % res.handle.qualified_name)
485  else:
486  # Residue is not part of the entity, need to copy it first
487  new_res = _copy_residue(res, rename_chain)
488  LogVerbose("Copied ligand residue %s" % res.handle.qualified_name)
489  new_res.SetIsLigand(True)
490  return new_res
491 
492  for ligand in ligands:
493  if isinstance(ligand, mol.EntityHandle) or isinstance(ligand, mol.EntityView):
494  for residue in ligand.residues:
495  new_residue = _process_ligand_residue(residue, rename_chain)
496  extracted_ligands.append(new_residue)
497  elif isinstance(ligand, mol.ResidueHandle) or isinstance(ligand, mol.ResidueView):
498  new_residue = _process_ligand_residue(ligand, rename_chain)
499  extracted_ligands.append(new_residue)
500  else:
501  raise RuntimeError("Ligands should be given as Entity or Residue")
502 
503  if new_editor is not None:
504  new_editor.UpdateICS()
505  return extracted_ligands
506 
507  def _get_binding_sites(self, ligand):
508  """Find representations of the binding site of *ligand* in the model.
509 
510  Only consider protein and nucleic acid chains that pass the criteria
511  for the :class:`ost.mol.alg.chain_mapping`. This means ignoring other
512  ligands, waters, short polymers as well as any incorrectly connected
513  chain that may be in proximity.
514 
515  :param ligand: Defines the binding site to identify.
516  :type ligand: :class:`~ost.mol.ResidueHandle`
517  """
518  if ligand.hash_code not in self._binding_sites:
519 
520  # create view of reference binding site
521  ref_residues_hashes = set() # helper to keep track of added residues
522  ignored_residue_hashes = {ligand.hash_code}
523  for ligand_at in ligand.atoms:
524  close_atoms = self.target.FindWithin(ligand_at.GetPos(), self.radius)
525  for close_at in close_atoms:
526  # Skip any residue not in the chain mapping target
527  ref_res = close_at.GetResidue()
528  h = ref_res.handle.GetHashCode()
529  if h not in ref_residues_hashes and \
530  h not in ignored_residue_hashes:
531  if self.chain_mapper.target.ViewForHandle(ref_res).IsValid():
532  h = ref_res.handle.GetHashCode()
533  ref_residues_hashes.add(h)
534  elif ref_res.is_ligand:
535  LogWarning("Ignoring ligand %s in binding site of %s" % (
536  ref_res.qualified_name, ligand.qualified_name))
537  ignored_residue_hashes.add(h)
538  elif ref_res.chem_type == mol.ChemType.WATERS:
539  pass # That's ok, no need to warn
540  else:
541  LogWarning("Ignoring residue %s in binding site of %s" % (
542  ref_res.qualified_name, ligand.qualified_name))
543  ignored_residue_hashes.add(h)
544 
545  if ref_residues_hashes:
546  # reason for doing that separately is to guarantee same ordering of
547  # residues as in underlying entity. (Reorder by ResNum seems only
548  # available on ChainHandles)
549  ref_bs = self.target.CreateEmptyView()
550  for ch in self.target.chains:
551  for r in ch.residues:
552  if r.handle.GetHashCode() in ref_residues_hashes:
553  ref_bs.AddResidue(r, mol.ViewAddFlag.INCLUDE_ALL)
554  if len(ref_bs.residues) == 0:
555  raise RuntimeError("Failed to add proximity residues to "
556  "the reference binding site entity")
557 
558  # Find the representations
559  if self.global_chain_mapping:
560  self._binding_sites[ligand.hash_code] = self.chain_mapper.GetRepr(
561  ref_bs, self.model, inclusion_radius=self.lddt_lp_radius,
562  global_mapping = self._model_mapping)
563  else:
564  self._binding_sites[ligand.hash_code] = self.chain_mapper.GetRepr(
565  ref_bs, self.model, inclusion_radius=self.lddt_lp_radius,
566  topn=self.binding_sites_topn)
567 
568  # Flag empty representation
569  if not self._binding_sites[ligand.hash_code]:
570  self._unassigned_target_ligands_reason[ligand] = (
571  "model_representation",
572  "No representation of the reference binding site was "
573  "found in the model")
574 
575  else: # if ref_residues_hashes
576  # Flag missing binding site
577  self._unassigned_target_ligands_reason[ligand] = ("binding_site",
578  "No residue in proximity of the target ligand")
579  self._binding_sites[ligand.hash_code] = []
580 
581  return self._binding_sites[ligand.hash_code]
582 
583  @staticmethod
584  def _build_binding_site_entity(ligand, residues, extra_residues=[]):
585  """ Build an entity with all the binding site residues in chain A
586  and the ligand in chain _. Residues are renumbered consecutively from
587  1. The ligand is assigned residue number 1 and residue name LIG.
588  Residues in extra_residues not in `residues` in the model are added
589  at the end of chain A.
590 
591  :param ligand: the Residue Handle of the ligand
592  :type ligand: :class:`~ost.mol.ResidueHandle`
593  :param residues: a list of binding site residues
594  :type residues: :class:`list` of :class:`~ost.mol.ResidueHandle`
595  :param extra_residues: an optional list with addition binding site
596  residues. Residues in this list which are not
597  in `residues` will be added at the end of chain
598  A. This allows for instance adding unmapped
599  residues missing from the model into the
600  reference binding site.
601  :type extra_residues: :class:`list` of :class:`~ost.mol.ResidueHandle`
602  :rtype: :class:`~ost.mol.EntityHandle`
603  """
604  bs_ent = mol.CreateEntity()
605  ed = bs_ent.EditXCS()
606  bs_chain = ed.InsertChain("A")
607  seen_res_qn = []
608  for resnum, old_res in enumerate(residues, 1):
609  seen_res_qn.append(old_res.qualified_name)
610  new_res = ed.AppendResidue(bs_chain, old_res.handle,
611  deep=True)
612  ed.SetResidueNumber(new_res, mol.ResNum(resnum))
613 
614  # Add extra residues at the end.
615  for extra_res in extra_residues:
616  if extra_res.qualified_name not in seen_res_qn:
617  resnum += 1
618  seen_res_qn.append(extra_res.qualified_name)
619  new_res = ed.AppendResidue(bs_chain,
620  extra_res.handle,
621  deep=True)
622  ed.SetResidueNumber(new_res, mol.ResNum(resnum))
623  # Add the ligand in chain _
624  ligand_chain = ed.InsertChain("_")
625  ligand_res = ed.AppendResidue(ligand_chain, ligand,
626  deep=True)
627  ed.RenameResidue(ligand_res, "LIG")
628  ed.SetResidueNumber(ligand_res, mol.ResNum(1))
629  ed.UpdateICS()
630 
631  return bs_ent
632 
633  def _compute_scores(self):
634  """
635  Compute the RMSD and lDDT-PLI scores for every possible target-model
636  ligand pair and store the result in internal matrices.
637  """
638  # Create the result matrices
639  rmsd_full_matrix = np.empty(
640  (len(self.target_ligands), len(self.model_ligands)), dtype=dict)
641  lddt_pli_full_matrix = np.empty(
642  (len(self.target_ligands), len(self.model_ligands)), dtype=dict)
643  self._assignment_isomorphisms = np.full(
644  (len(self.target_ligands), len(self.model_ligands)), fill_value=np.nan)
645  self._assignment_match_coverage = np.zeros(
646  (len(self.target_ligands), len(self.model_ligands)))
647 
648  for target_i, target_ligand in enumerate(self.target_ligands):
649  LogVerbose("Analyzing target ligand %s" % target_ligand)
650 
651  for binding_site in self._get_binding_sites(target_ligand):
652  LogVerbose("Found binding site with chain mapping %s" % (binding_site.GetFlatChainMapping()))
653 
654  ref_bs_ent = self._build_binding_site_entity(
655  target_ligand, binding_site.ref_residues,
656  binding_site.substructure.residues)
657  ref_bs_ent_ligand = ref_bs_ent.FindResidue("_", 1) # by definition
658 
659  custom_compounds = {
660  ref_bs_ent_ligand.name:
661  mol.alg.lddt.CustomCompound.FromResidue(
662  ref_bs_ent_ligand)}
663  lddt_scorer = mol.alg.lddt.lDDTScorer(
664  ref_bs_ent,
665  custom_compounds=custom_compounds,
666  inclusion_radius=self.lddt_pli_radius)
667 
668  for model_i, model_ligand in enumerate(self.model_ligands):
669  try:
670  symmetries = _ComputeSymmetries(
671  model_ligand, target_ligand,
672  substructure_match=self.substructure_match,
673  by_atom_index=True)
674  LogVerbose("Ligands %s and %s symmetry match" % (
675  str(model_ligand), str(target_ligand)))
676  except NoSymmetryError:
677  # Ligands are different - skip
678  LogVerbose("No symmetry between %s and %s" % (
679  str(model_ligand), str(target_ligand)))
680  self._assignment_isomorphisms[target_i, model_i] = 0
681  continue
682  except DisconnectedGraphError:
683  # Disconnected graph is handled elsewhere
684  continue
685  substructure_match = len(symmetries[0][0]) != len(
686  model_ligand.atoms)
687  coverage = len(symmetries[0][0]) / len(model_ligand.atoms)
688  self._assignment_match_coverage[target_i, model_i] = coverage
689  self._assignment_isomorphisms[target_i, model_i] = 1
690 
691  rmsd = SCRMSD(model_ligand, target_ligand,
692  transformation=binding_site.transform,
693  substructure_match=self.substructure_match)
694  LogDebug("RMSD: %.4f" % rmsd)
695 
696  # Save results?
697  if not rmsd_full_matrix[target_i, model_i] or \
698  rmsd_full_matrix[target_i, model_i]["rmsd"] > rmsd:
699  rmsd_full_matrix[target_i, model_i] = {
700  "rmsd": rmsd,
701  "lddt_lp": binding_site.lDDT,
702  "bs_ref_res": binding_site.substructure.residues,
703  "bs_ref_res_mapped": binding_site.ref_residues,
704  "bs_mdl_res_mapped": binding_site.mdl_residues,
705  "bb_rmsd": binding_site.bb_rmsd,
706  "target_ligand": target_ligand,
707  "model_ligand": model_ligand,
708  "chain_mapping": binding_site.GetFlatChainMapping(),
709  "transform": binding_site.transform,
710  "substructure_match": substructure_match,
711  "coverage": coverage,
712  "inconsistent_residues": binding_site.inconsistent_residues,
713  }
714  if self.unassigned:
715  rmsd_full_matrix[target_i, model_i][
716  "unassigned"] = False
717  LogDebug("Saved RMSD")
718 
719  mdl_bs_ent = self._build_binding_site_entity(
720  model_ligand, binding_site.mdl_residues, [])
721  mdl_bs_ent_ligand = mdl_bs_ent.FindResidue("_", 1) # by definition
722 
723  # Now for each symmetry, loop and rename atoms according
724  # to ref.
725  mdl_editor = mdl_bs_ent.EditXCS()
726  for i, (trg_sym, mdl_sym) in enumerate(symmetries):
727  # Prepare Entities for RMSD
728  for mdl_anum, trg_anum in zip(mdl_sym, trg_sym):
729  # Rename model atoms according to symmetry
730  trg_atom = ref_bs_ent_ligand.atoms[trg_anum]
731  mdl_atom = mdl_bs_ent_ligand.atoms[mdl_anum]
732  mdl_editor.RenameAtom(mdl_atom, trg_atom.name)
733  mdl_editor.UpdateICS()
734 
735  global_lddt, local_lddt, lddt_tot, lddt_cons, n_res, \
736  n_cont, n_cons = lddt_scorer.lDDT(
737  mdl_bs_ent, chain_mapping={"A": "A", "_": "_"},
738  no_intrachain=True,
739  return_dist_test=True,
740  check_resnames=self.check_resnames)
741  LogDebug("lDDT-PLI for symmetry %d: %.4f" % (i, global_lddt))
742 
743  # Save results?
744  if not lddt_pli_full_matrix[target_i, model_i]:
745  # First iteration
746  save_lddt = True
747  else:
748  last_best_lddt = lddt_pli_full_matrix[
749  target_i, model_i]["lddt_pli"]
750  last_best_rmsd = lddt_pli_full_matrix[
751  target_i, model_i]["rmsd"]
752  if global_lddt > last_best_lddt:
753  # Better lDDT-PLI
754  save_lddt = True
755  elif global_lddt == last_best_lddt and \
756  rmsd < last_best_rmsd:
757  # Same lDDT-PLI, better RMSD
758  save_lddt = True
759  else:
760  save_lddt = False
761  if save_lddt:
762  lddt_pli_full_matrix[target_i, model_i] = {
763  "lddt_pli": global_lddt,
764  "rmsd": rmsd,
765  "lddt_lp": binding_site.lDDT,
766  "lddt_pli_n_contacts": lddt_tot,
767  "bs_ref_res": binding_site.substructure.residues,
768  "bs_ref_res_mapped": binding_site.ref_residues,
769  "bs_mdl_res_mapped": binding_site.mdl_residues,
770  "bb_rmsd": binding_site.bb_rmsd,
771  "target_ligand": target_ligand,
772  "model_ligand": model_ligand,
773  "chain_mapping": binding_site.GetFlatChainMapping(),
774  "transform": binding_site.transform,
775  "substructure_match": substructure_match,
776  "coverage": coverage,
777  "inconsistent_residues": binding_site.inconsistent_residues,
778  }
779  if self.unassigned:
780  lddt_pli_full_matrix[target_i, model_i][
781  "unassigned"] = False
782  LogDebug("Saved lDDT-PLI")
783 
784  self._rmsd_full_matrix = rmsd_full_matrix
785  self._lddt_pli_full_matrix = lddt_pli_full_matrix
786 
787  @staticmethod
788  def _find_ligand_assignment(mat1, mat2=None, coverage=None, coverage_delta=None):
789  """ Find the ligand assignment based on mat1. If mat2 is provided, it
790  will be used to break ties in mat1. If mat2 is not provided, ties will
791  be resolved by taking the first match arbitrarily.
792 
793  Both mat1 and mat2 should "look" like RMSD - ie be between inf (bad)
794  and 0 (good).
795  """
796  # We will modify mat1 and mat2, so make copies of it first
797  mat1 = np.copy(mat1)
798  if mat2 is None:
799  mat2 = np.copy(mat1)
800  mat2[~np.isnan(mat2)] = np.inf
801  else:
802  mat2 = np.copy(mat2)
803  if coverage is None:
804  coverage = np.copy(mat1)
805  coverage[:] = 1 # Assume full coverage by default
806  else:
807  coverage = np.copy(coverage)
808 
809  assignments = []
810  if 0 in mat1.shape:
811  # No model or target ligand
812  LogDebug("No model or target ligand, returning no assignment.")
813  return assignments
814 
815  def _get_best_match(mat1_val, coverage_val):
816  """ Extract the row/column indices of the prediction matching the
817  given values."""
818  mat1_match_idx = np.argwhere((mat1 == mat1_val) & (coverage >= coverage_val))
819  # Multiple "best" - use mat2 to disambiguate
820  if len(mat1_match_idx) > 1:
821  # Get the values of mat2 at these positions
822  best_mat2_match = [mat2[tuple(x)] for x in mat1_match_idx]
823  # Find the index of the best mat2
824  # Note: argmin returns the first value which is min.
825  best_mat2_idx = np.array(best_mat2_match).argmin()
826  # Now get the original indices
827  return mat1_match_idx[best_mat2_idx]
828  else:
829  return mat1_match_idx[0]
830 
831  # First only consider top coverage matches.
832  min_coverage = np.max(coverage)
833  while min_coverage > 0:
834  LogVerbose("Looking for matches with coverage >= %s" % min_coverage)
835  min_mat1 = LigandScorer._nanmin_nowarn(mat1, coverage < min_coverage)
836  while not np.isnan(min_mat1):
837  max_i_trg, max_i_mdl = _get_best_match(min_mat1, min_coverage)
838 
839  # Would we have a match for this model ligand with higher score
840  # but lower coverage?
841  alternative_matches = (mat1[:, max_i_mdl] < min_mat1) & (
842  coverage[:, max_i_mdl] > (min_coverage - coverage_delta))
843  if np.any(alternative_matches):
844  # Get the scores of these matches
845  LogVerbose("Found match with lower coverage but better score")
846  min_mat1 = np.min(mat1[alternative_matches])
847  max_i_trg, max_i_mdl = _get_best_match(min_mat1, min_coverage - coverage_delta)
848 
849  # Disable row and column
850  mat1[max_i_trg, :] = np.nan
851  mat1[:, max_i_mdl] = np.nan
852  mat2[max_i_trg, :] = np.nan
853  mat2[:, max_i_mdl] = np.nan
854  coverage[max_i_trg, :] = -np.inf
855  coverage[:, max_i_mdl] = -np.inf
856 
857  # Save
858  assignments.append((max_i_trg, max_i_mdl))
859 
860  # Recompute min
861  min_mat1 = LigandScorer._nanmin_nowarn(mat1, coverage < min_coverage)
862  # Recompute min_coverage
863  min_coverage = np.max(coverage)
864  return assignments
865 
866  @staticmethod
867  def _nanmin_nowarn(array, mask):
868  """Compute np.nanmin but ignore the RuntimeWarning."""
869  masked_array = np_ma.masked_array(array, mask=mask)
870  with warnings.catch_warnings(): # RuntimeWarning: All-NaN slice encountered
871  warnings.simplefilter("ignore")
872  min = np.nanmin(masked_array, )
873  if np_ma.is_masked(min):
874  return np.nan # Everything was masked
875  else:
876  return min
877 
878 
879 
880 
881  @staticmethod
882  def _reverse_lddt(lddt):
883  """Reverse lDDT means turning it from a number between 0 and 1 to a
884  number between infinity and 0 (0 being better).
885 
886  In practice, this is 1/lDDT. If lDDT is 0, the result is infinity.
887  """
888  with warnings.catch_warnings(): # RuntimeWarning: divide by zero
889  warnings.simplefilter("ignore")
890  return np.float64(1) / lddt
891 
892  def _assign_ligands_rmsd(self):
893  """Assign (map) ligands between model and target.
894 
895  Sets self._rmsd and self._rmsd_details.
896  """
897  mat2 = self._reverse_lddt(self.lddt_pli_matrix)
898 
899  mat_tuple = self._assign_matrices(self.rmsd_matrix,
900  mat2,
901  self._rmsd_full_matrix,
902  "rmsd")
903  self._rmsd = mat_tuple[0]
904  self._rmsd_details = mat_tuple[1]
905  # Ignore unassigned ligands - they are dealt with in lddt_pli.
906  # So the following lines should stay commented out:
907  # self._unassigned_target_ligands = mat_tuple[2]
908  # self._unassigned_model_ligands = mat_tuple[3]
909 
910  def _assign_matrices(self, mat1, mat2, data, main_key):
911  """
912  Perform the ligand assignment, ie find the mapping between model and
913  target ligands.
914 
915  The algorithm starts by assigning the "best" mapping, and then discards
916  the target and model ligands (row, column) so that every model ligand
917  can be assigned to a single target ligand, and every target ligand
918  is only assigned to a single model ligand. Repeat until there is
919  nothing left to assign.
920 
921  In case of a tie in values in `mat1`, it uses `mat2` to break the tie.
922 
923  This algorithm doesn't guarantee a globally optimal assignment.
924 
925  Both `mat1` and `mat2` should contain values between 0 and infinity,
926  with lower values representing better scores. Use the
927  :meth:`_reverse_lddt` method to convert lDDT values to such a score.
928 
929  :param mat1: the main ligand assignment criteria (RMSD or lDDT-PLI)
930  :param mat2: the secondary ligand assignment criteria (lDDT-PLI or RMSD)
931  :param data: the data (either self._rmsd_full_matrix or self._lddt_pli_matrix)
932  :param main_key: the key of data (dictionnaries within `data`) to
933  assign into out_main.
934  :return: a tuple with 2 dictionaries of matrices containing the main
935  data, and details, respectively.
936  """
937  assignments = self._find_ligand_assignment(mat1, mat2,
939  self.coverage_delta)
940  out_main = {}
941  out_details = {}
942  assigned_trg = [False] * len(self.target_ligands)
943  assigned_mdl = [False] * len(self.model_ligands)
944  for assignment in assignments:
945  trg_idx, mdl_idx = assignment
946  assigned_mdl[mdl_idx] = True
947  assigned_trg[trg_idx] = True
948  mdl_lig = self.model_ligands[mdl_idx]
949  mdl_cname = mdl_lig.chain.name
950  mdl_resnum = mdl_lig.number
951  if mdl_cname not in out_main:
952  out_main[mdl_cname] = {}
953  out_details[mdl_cname] = {}
954  out_main[mdl_cname][mdl_resnum] = data[
955  trg_idx, mdl_idx][main_key]
956  out_details[mdl_cname][mdl_resnum] = data[
957  trg_idx, mdl_idx]
958 
959 
960  unassigned_trg, unassigned_mdl = self._assign_unassigned(
961  assigned_trg, assigned_mdl, [out_main], [out_details], [main_key])
962  return out_main, out_details, unassigned_trg, unassigned_mdl
963 
964  def _assign_unassigned(self, assigned_trg, assigned_mdl,
965  out_main, out_details, main_key):
966  unassigned_trg = {}
967  unassigned_mdl = {}
968 
969  unassigned_trg_idx = [i for i, x in enumerate(assigned_trg) if not x]
970  unassigned_mdl_idx = [i for i, x in enumerate(assigned_mdl) if not x]
971 
972  for mdl_idx in unassigned_mdl_idx:
973  mdl_lig = self.model_ligands[mdl_idx]
974  reason = self._find_unassigned_model_ligand_reason(mdl_lig, check=False)
975  mdl_cname = mdl_lig.chain.name
976  mdl_resnum = mdl_lig.number
977  if mdl_cname not in unassigned_mdl:
978  unassigned_mdl[mdl_cname] = {}
979  unassigned_mdl[mdl_cname][mdl_resnum] = reason
980  if self.unassigned:
981  for i, _ in enumerate(out_main):
982  if mdl_cname not in out_main[i]:
983  out_main[i][mdl_cname] = {}
984  out_details[i][mdl_cname] = {}
985  out_main[i][mdl_cname][mdl_resnum] = None
986  out_details[i][mdl_cname][mdl_resnum] = {
987  "unassigned": True,
988  "reason_short": reason[0],
989  "reason_long": reason[1],
990  main_key[i]: None,
991  }
992  LogInfo("Model ligand %s is unassigned: %s" % (
993  mdl_lig.qualified_name, reason[1]))
994 
995  for trg_idx in unassigned_trg_idx:
996  trg_lig = self.target_ligands[trg_idx]
997  reason = self._find_unassigned_target_ligand_reason(trg_lig, check=False)
998  trg_cname = trg_lig.chain.name
999  trg_resnum = trg_lig.number
1000  if trg_cname not in unassigned_trg:
1001  unassigned_trg[trg_cname] = {}
1002  unassigned_trg[trg_cname][trg_resnum] = reason
1003  LogInfo("Target ligand %s is unassigned: %s" % (
1004  trg_lig.qualified_name, reason[1]))
1005 
1006  return unassigned_trg, unassigned_mdl
1007 
1008 
1009  def _assign_matrix(self, mat, data1, main_key1, data2, main_key2):
1010  """
1011  Perform the ligand assignment, ie find the mapping between model and
1012  target ligands, based on a single matrix
1013 
1014  The algorithm starts by assigning the "best" mapping, and then discards
1015  the target and model ligands (row, column) so that every model ligand
1016  can be assigned to a single target ligand, and every target ligand
1017  is only assigned to a single model ligand. Repeat until there is
1018  nothing left to assign.
1019 
1020  This algorithm doesn't guarantee a globally optimal assignment.
1021 
1022  `mat` should contain values between 0 and infinity,
1023  with lower values representing better scores. Use the
1024  :meth:`_reverse_lddt` method to convert lDDT values to such a score.
1025 
1026  :param mat: the ligand assignment criteria (RMSD or lDDT-PLI)
1027  :param data1: the first data (either self._rmsd_full_matrix or self._lddt_pli_matrix)
1028  :param main_key1: the first key of data (dictionnaries within `data`) to
1029  assign into out_main.
1030  :param data2: the second data (either self._rmsd_full_matrix or self._lddt_pli_matrix)
1031  :param main_key2: the second key of data (dictionnaries within `data`) to
1032  assign into out_main.
1033  :return: a tuple with 4 dictionaries of matrices containing the main
1034  data1, details1, main data2 and details2, respectively.
1035  """
1036  assignments = self._find_ligand_assignment(mat,
1037  coverage=self._assignment_match_coverage,
1038  coverage_delta=self.coverage_delta)
1039  out_main1 = {}
1040  out_details1 = {}
1041  out_main2 = {}
1042  out_details2 = {}
1043  assigned_trg = [False] * len(self.target_ligands)
1044  assigned_mdl = [False] * len(self.model_ligands)
1045  for assignment in assignments:
1046  trg_idx, mdl_idx = assignment
1047  assigned_mdl[mdl_idx] = True
1048  assigned_trg[trg_idx] = True
1049  mdl_lig = self.model_ligands[mdl_idx]
1050  mdl_cname = mdl_lig.chain.name
1051  mdl_resnum = mdl_lig.number
1052  # Data 1
1053  if mdl_cname not in out_main1:
1054  out_main1[mdl_cname] = {}
1055  out_details1[mdl_cname] = {}
1056  out_main1[mdl_cname][mdl_resnum] = data1[
1057  trg_idx, mdl_idx][main_key1]
1058  out_details1[mdl_cname][mdl_resnum] = data1[
1059  trg_idx, mdl_idx]
1060  # Data2
1061  if mdl_cname not in out_main2:
1062  out_main2[mdl_cname] = {}
1063  out_details2[mdl_cname] = {}
1064  out_main2[mdl_cname][mdl_resnum] = data2[
1065  trg_idx, mdl_idx][main_key2]
1066  out_details2[mdl_cname][mdl_resnum] = data2[
1067  trg_idx, mdl_idx]
1068 
1069  unassigned_trg, unassigned_mdl = self._assign_unassigned(
1070  assigned_trg, assigned_mdl,
1071  [out_main1, out_main2], [out_details1, out_details2],
1072  [main_key1, main_key2])
1073 
1074  return out_main1, out_details1, out_main2, out_details2, \
1075  unassigned_trg, unassigned_mdl
1076 
1077  def _assign_ligands_lddt_pli(self):
1078  """ Assign ligands based on lDDT-PLI.
1079 
1080  Sets self._lddt_pli and self._lddt_pli_details.
1081  """
1082  mat1 = self._reverse_lddt(self.lddt_pli_matrix)
1083 
1084  mat_tuple = self._assign_matrices(mat1,
1085  self.rmsd_matrix,
1086  self._lddt_pli_full_matrix,
1087  "lddt_pli")
1088  self._lddt_pli = mat_tuple[0]
1089  self._lddt_pli_details = mat_tuple[1]
1090  self._unassigned_target_ligands = mat_tuple[2]
1091  self._unassigned_model_ligands = mat_tuple[3]
1092 
1093  def _assign_ligands_rmsd_only(self):
1094  """Assign (map) ligands between model and target based on RMSD only.
1095 
1096  Sets self._rmsd, self._rmsd_details, self._lddt_pli and
1097  self._lddt_pli_details.
1098  """
1099  mat_tuple = self._assign_matrix(self.rmsd_matrix,
1100  self._rmsd_full_matrix,
1101  "rmsd",
1102  self._lddt_pli_full_matrix,
1103  "lddt_pli")
1104  self._rmsd = mat_tuple[0]
1105  self._rmsd_details = mat_tuple[1]
1106  self._lddt_pli = mat_tuple[2]
1107  self._lddt_pli_details = mat_tuple[3]
1108  self._unassigned_target_ligands = mat_tuple[4]
1109  self._unassigned_model_ligands = mat_tuple[5]
1110 
1111  @property
1112  def rmsd_matrix(self):
1113  """ Get the matrix of RMSD values.
1114 
1115  Target ligands are in rows, model ligands in columns.
1116 
1117  NaN values indicate that no RMSD could be computed (i.e. different
1118  ligands).
1119 
1120  :rtype: :class:`~numpy.ndarray`
1121  """
1122  if self._rmsd_full_matrix is None:
1123  self._compute_scores()
1124  if self._rmsd_matrix is None:
1125  # convert
1126  shape = self._rmsd_full_matrix.shape
1127  self._rmsd_matrix = np.full(shape, np.nan)
1128  for i, j in np.ndindex(shape):
1129  if self._rmsd_full_matrix[i, j] is not None:
1130  self._rmsd_matrix[i, j] = self._rmsd_full_matrix[
1131  i, j]["rmsd"]
1132  return self._rmsd_matrix
1133 
1134  @property
1135  def lddt_pli_matrix(self):
1136  """ Get the matrix of lDDT-PLI values.
1137 
1138  Target ligands are in rows, model ligands in columns.
1139 
1140  NaN values indicate that no lDDT-PLI could be computed (i.e. different
1141  ligands).
1142 
1143  :rtype: :class:`~numpy.ndarray`
1144  """
1145  if self._lddt_pli_full_matrix is None:
1146  self._compute_scores()
1147  if self._lddt_pli_matrix is None:
1148  # convert
1149  shape = self._lddt_pli_full_matrix.shape
1150  self._lddt_pli_matrix = np.full(shape, np.nan)
1151  for i, j in np.ndindex(shape):
1152  if self._lddt_pli_full_matrix[i, j] is not None:
1153  self._lddt_pli_matrix[i, j] = self._lddt_pli_full_matrix[
1154  i, j]["lddt_pli"]
1155  return self._lddt_pli_matrix
1156 
1157  @property
1158  def coverage_matrix(self):
1159  """ Get the matrix of model ligand atom coverage in the target.
1160 
1161  Target ligands are in rows, model ligands in columns.
1162 
1163  A value of 0 indicates that there was no isomorphism between the model
1164  and target ligands. If `substructure_match=False`, only full match
1165  isomorphisms are considered, and therefore only values of 1.0 and 0.0
1166  are reported.
1167 
1168  :rtype: :class:`~numpy.ndarray`
1169  """
1170  if self._assignment_match_coverage is None:
1171  self._compute_scores()
1172  return self._assignment_match_coverage
1173 
1174  @property
1175  def rmsd(self):
1176  """Get a dictionary of RMSD score values, keyed by model ligand
1177  (chain name, :class:`~ost.mol.ResNum`).
1178 
1179  If the scoring object was instantiated with `unassigned=True`, some
1180  scores may be `None`.
1181 
1182  :rtype: :class:`dict`
1183  """
1184  if self._rmsd is None:
1185  if self.rmsd_assignment:
1187  else:
1188  self._assign_ligands_rmsd()
1189  return self._rmsd
1190 
1191  @property
1192  def rmsd_details(self):
1193  """Get a dictionary of RMSD score details (dictionaries), keyed by
1194  model ligand (chain name, :class:`~ost.mol.ResNum`).
1195 
1196  The value is a dictionary. For ligands that were assigned (mapped) to
1197  the target, the dictionary contain the following information:
1198 
1199  * `rmsd`: the RMSD score value.
1200  * `lddt_lp`: the lDDT score of the ligand pocket (lDDT-LP).
1201  * `bs_ref_res`: a list of residues (:class:`~ost.mol.ResidueHandle`)
1202  that define the binding site in the reference.
1203  * `bs_ref_res_mapped`: a list of residues
1204  (:class:`~ost.mol.ResidueHandle`) in the reference binding site
1205  that could be mapped to the model.
1206  * `bs_mdl_res_mapped`: a list of residues
1207  (:class:`~ost.mol.ResidueHandle`) in the model that were mapped to
1208  the reference binding site. The residues are in the same order as
1209  `bs_ref_res_mapped`.
1210  * `bb_rmsd`: the RMSD of the binding site backbone after superposition
1211  * `target_ligand`: residue handle of the target ligand.
1212  * `model_ligand`: residue handle of the model ligand.
1213  * `chain_mapping`: local chain mapping as a dictionary, with target
1214  chain name as key and model chain name as value.
1215  * `transform`: transformation to superpose the model onto the target.
1216  * `substructure_match`: whether the score is the result of a partial
1217  (substructure) match. A value of `True` indicates that the target
1218  ligand covers only part of the model, while `False` indicates a
1219  perfect match.
1220  * `coverage`: the fraction of model atoms covered by the assigned
1221  target ligand, in the interval (0, 1]. If `substructure_match`
1222  is `False`, this will always be 1.
1223  * `inconsistent_residues`: a list of tuples of mapped residues views
1224  (:class:`~ost.mol.ResidueView`) with residue names that differ
1225  between the reference and the model, respectively.
1226  The list is empty if all residue names match, which is guaranteed
1227  if `check_resnames=True`.
1228  Note: more binding site mappings may be explored during scoring,
1229  but only inconsistencies in the selected mapping are reported.
1230  * `unassigned`: only if the scorer was instantiated with
1231  `unassigned=True`: `False`
1232 
1233  If the scoring object was instantiated with `unassigned=True`, in
1234  addition the unassigned ligands will be reported with a score of `None`
1235  and the following information:
1236 
1237  * `unassigned`: `True`,
1238  * `reason_short`: a short token of the reason, see
1239  :attr:`unassigned_model_ligands` for details and meaning.
1240  * `reason_long`: a human-readable text of the reason, see
1241  :attr:`unassigned_model_ligands` for details and meaning.
1242  * `rmsd`: `None`
1243 
1244  :rtype: :class:`dict`
1245  """
1246  if self._rmsd_details is None:
1247  if self.rmsd_assignment:
1249  else:
1250  self._assign_ligands_rmsd()
1251  return self._rmsd_details
1252 
1253  @property
1254  def lddt_pli(self):
1255  """Get a dictionary of lDDT-PLI score values, keyed by model ligand
1256  (chain name, :class:`~ost.mol.ResNum`).
1257 
1258  If the scoring object was instantiated with `unassigned=True`, some
1259  scores may be `None`.
1260 
1261  :rtype: :class:`dict`
1262  """
1263  if self._lddt_pli is None:
1264  if self.rmsd_assignment:
1266  else:
1268  return self._lddt_pli
1269 
1270  @property
1271  def lddt_pli_details(self):
1272  """Get a dictionary of lDDT-PLI score details (dictionaries), keyed by
1273  model ligand (chain name, :class:`~ost.mol.ResNum`).
1274 
1275  Each sub-dictionary contains the following information:
1276 
1277  * `lddt_pli`: the lDDT-PLI score value.
1278  * `rmsd`: the RMSD score value corresponding to the lDDT-PLI
1279  chain mapping and assignment. This may differ from the RMSD-based
1280  assignment. Note that a different isomorphism than `lddt_pli` may
1281  be used.
1282  * `lddt_lp`: the lDDT score of the ligand pocket (lDDT-LP).
1283  * `lddt_pli_n_contacts`: number of total contacts used in lDDT-PLI,
1284  summed over all thresholds. Can be divided by 8 to obtain the number
1285  of atomic contacts.
1286  * `bs_ref_res`: a list of residues (:class:`~ost.mol.ResidueHandle`)
1287  that define the binding site in the reference.
1288  * `bs_ref_res_mapped`: a list of residues
1289  (:class:`~ost.mol.ResidueHandle`) in the reference binding site
1290  that could be mapped to the model.
1291  * `bs_mdl_res_mapped`: a list of residues
1292  (:class:`~ost.mol.ResidueHandle`) in the model that were mapped to
1293  the reference binding site. The residues are in the same order as
1294  `bs_ref_res_mapped`.
1295  * `bb_rmsd`: the RMSD of the binding site backbone after superposition.
1296  Note: not used for lDDT-PLI computation.
1297  * `target_ligand`: residue handle of the target ligand.
1298  * `model_ligand`: residue handle of the model ligand.
1299  * `chain_mapping`: local chain mapping as a dictionary, with target
1300  chain name as key and model chain name as value.
1301  * `transform`: transformation to superpose the model onto the target
1302  (for RMSD only).
1303  * `substructure_match`: whether the score is the result of a partial
1304  (substructure) match. A value of `True` indicates that the target
1305  ligand covers only part of the model, while `False` indicates a
1306  perfect match.
1307  * `inconsistent_residues`: a list of tuples of mapped residues views
1308  (:class:`~ost.mol.ResidueView`) with residue names that differ
1309  between the reference and the model, respectively.
1310  The list is empty if all residue names match, which is guaranteed
1311  if `check_resnames=True`.
1312  Note: more binding site mappings may be explored during scoring,
1313  but only inconsistencies in the selected mapping are reported.
1314  * `unassigned`: only if the scorer was instantiated with
1315  `unassigned=True`: `False`
1316 
1317  If the scoring object was instantiated with `unassigned=True`, in
1318  addition the unmapped ligands will be reported with a score of `None`
1319  and the following information:
1320 
1321  * `unassigned`: `True`,
1322  * `reason_short`: a short token of the reason, see
1323  :attr:`unassigned_model_ligands` for details and meaning.
1324  * `reason_long`: a human-readable text of the reason, see
1325  :attr:`unassigned_model_ligands` for details and meaning.
1326  * `lddt_pli`: `None`
1327 
1328  :rtype: :class:`dict`
1329  """
1330  if self._lddt_pli_details is None:
1331  if self.rmsd_assignment:
1333  else:
1335  return self._lddt_pli_details
1336 
1337  @property
1339  """Get a dictionary of target ligands not assigned to any model ligand,
1340  keyed by target ligand (chain name, :class:`~ost.mol.ResNum`).
1341 
1342  The assignment for the lDDT-PLI score is used (and is controlled
1343  by the `rmsd_assignment` argument).
1344 
1345  Each item contains a string from a controlled dictionary
1346  about the reason for the absence of assignment.
1347  A human-readable description can be obtained from the
1348  :attr:`unassigned_target_ligand_descriptions` property.
1349 
1350  Currently, the following reasons are reported:
1351 
1352  * `no_ligand`: there was no ligand in the model.
1353  * `disconnected`: the ligand graph was disconnected.
1354  * `binding_site`: no residues were in proximity of the ligand.
1355  * `model_representation`: no representation of the reference binding
1356  site was found in the model. (I.e. the binding site was not modeled.
1357  Remember: the binding site is defined in the target structure,
1358  the position of the model ligand itself is ignored at this point.)
1359  * `identity`: the ligand was not found in the model (by graph
1360  isomorphism). Check your ligand connectivity, and enable the
1361  `substructure_match` option if the target ligand is incomplete.
1362  * `stoichiometry`: there was a possible assignment in the model, but
1363  the model ligand was already assigned to a different target ligand.
1364  This indicates different stoichiometries.
1365 
1366  Some of these reasons can be overlapping, but a single reason will be
1367  reported.
1368 
1369  :rtype: :class:`dict`
1370  """
1371  if self._unassigned_target_ligand_short is None:
1372  if self.rmsd_assignment:
1374  else:
1378  for cname, res in self._unassigned_target_ligands.items():
1379  self._unassigned_target_ligand_short[cname] = {}
1380  for resnum, val in res.items():
1381  self._unassigned_target_ligand_short[cname][resnum] = val[0]
1382  self._unassigned_target_ligand_descriptions[val[0]] = val[1]
1384 
1385  @property
1387  """Get a human-readable description of why target ligands were
1388  unassigned, as a dictionary keyed by the controlled dictionary
1389  from :attr:`unassigned_target_ligands`.
1390  """
1392  _ = self.unassigned_target_ligands # assigned there
1394 
1395  @property
1397  """Get a dictionary of model ligands not assigned to any target ligand,
1398  keyed by model ligand (chain name, :class:`~ost.mol.ResNum`).
1399 
1400  The assignment for the lDDT-PLI score is used (and is controlled
1401  by the `rmsd_assignment` argument).
1402 
1403  Each item contains a string from a controlled dictionary
1404  about the reason for the absence of assignment.
1405  A human-readable description can be obtained from the
1406  :attr:`unassigned_model_ligand_descriptions` property.
1407  Currently, the following reasons are reported:
1408 
1409  * `no_ligand`: there was no ligand in the target.
1410  * `disconnected`: the ligand graph is disconnected.
1411  * `binding_site`: a potential assignment was found in the target, but
1412  there were no polymer residues in proximity of the ligand in the
1413  target.
1414  * `model_representation`: a potential assignment was found in the target,
1415  but no representation of the binding site was found in the model.
1416  (I.e. the binding site was not modeled. Remember: the binding site
1417  is defined in the target structure, the position of the model ligand
1418  itself is ignored at this point.)
1419  * `identity`: the ligand was not found in the target (by graph
1420  isomorphism). Check your ligand connectivity, and enable the
1421  `substructure_match` option if the target ligand is incomplete.
1422  * `stoichiometry`: there was a possible assignment in the target, but
1423  the model target was already assigned to a different model ligand.
1424  This indicates different stoichiometries.
1425 
1426  Some of these reasons can be overlapping, but a single reason will be
1427  reported.
1428 
1429  :rtype: :class:`dict`
1430  """
1431  if self._unassigned_model_ligand_short is None:
1432  if self.rmsd_assignment:
1434  else:
1438  for cname, res in self._unassigned_model_ligands.items():
1439  self._unassigned_model_ligand_short[cname] = {}
1440  for resnum, val in res.items():
1441  self._unassigned_model_ligand_short[cname][resnum] = val[0]
1442  self._unassigned_model_ligand_descriptions[val[0]] = val[1]
1443  return self._unassigned_model_ligand_short
1444 
1445  @property
1447  """Get a human-readable description of why model ligands were
1448  unassigned, as a dictionary keyed by the controlled dictionary
1449  from :attr:`unassigned_model_ligands`.
1450  """
1451  if self._unassigned_model_ligand_descriptions is None:
1452  _ = self.unassigned_model_ligands # assigned there
1454 
1455 
1456  def _set_custom_mapping(self, mapping):
1457  """ sets self.__model_mapping with a full blown MappingResult object
1458 
1459  :param mapping: mapping with trg chains as key and mdl ch as values
1460  :type mapping: :class:`dict`
1461  """
1462  chain_mapper = self.chain_mapper
1463  chem_mapping, chem_group_alns, mdl = \
1464  chain_mapper.GetChemMapping(self.model)
1465 
1466  # now that we have a chem mapping, lets do consistency checks
1467  # - check whether chain names are unique and available in structures
1468  # - check whether the mapped chains actually map to the same chem groups
1469  if len(mapping) != len(set(mapping.keys())):
1470  raise RuntimeError(f"Expect unique trg chain names in mapping. Got "
1471  f"{mapping.keys()}")
1472  if len(mapping) != len(set(mapping.values())):
1473  raise RuntimeError(f"Expect unique mdl chain names in mapping. Got "
1474  f"{mapping.values()}")
1475 
1476  trg_chains = set([ch.GetName() for ch in chain_mapper.target.chains])
1477  mdl_chains = set([ch.GetName() for ch in mdl.chains])
1478  for k,v in mapping.items():
1479  if k not in trg_chains:
1480  raise RuntimeError(f"Target chain \"{k}\" is not available "
1481  f"in target processed for chain mapping "
1482  f"({trg_chains})")
1483  if v not in mdl_chains:
1484  raise RuntimeError(f"Model chain \"{v}\" is not available "
1485  f"in model processed for chain mapping "
1486  f"({mdl_chains})")
1487 
1488  for trg_ch, mdl_ch in mapping.items():
1489  trg_group_idx = None
1490  mdl_group_idx = None
1491  for idx, group in enumerate(chain_mapper.chem_groups):
1492  if trg_ch in group:
1493  trg_group_idx = idx
1494  break
1495  for idx, group in enumerate(chem_mapping):
1496  if mdl_ch in group:
1497  mdl_group_idx = idx
1498  break
1499  if trg_group_idx is None or mdl_group_idx is None:
1500  raise RuntimeError("Could not establish a valid chem grouping "
1501  "of chain names provided in custom mapping.")
1502 
1503  if trg_group_idx != mdl_group_idx:
1504  raise RuntimeError(f"Chem group mismatch in custom mapping: "
1505  f"target chain \"{trg_ch}\" groups with the "
1506  f"following chemically equivalent target "
1507  f"chains: "
1508  f"{chain_mapper.chem_groups[trg_group_idx]} "
1509  f"but model chain \"{mdl_ch}\" maps to the "
1510  f"following target chains: "
1511  f"{chain_mapper.chem_groups[mdl_group_idx]}")
1512 
1513  pairs = set([(trg_ch, mdl_ch) for trg_ch, mdl_ch in mapping.items()])
1514  ref_mdl_alns = \
1515  chain_mapping._GetRefMdlAlns(chain_mapper.chem_groups,
1516  chain_mapper.chem_group_alignments,
1517  chem_mapping,
1518  chem_group_alns,
1519  pairs = pairs)
1520 
1521  # translate mapping format
1522  final_mapping = list()
1523  for ref_chains in chain_mapper.chem_groups:
1524  mapped_mdl_chains = list()
1525  for ref_ch in ref_chains:
1526  if ref_ch in mapping:
1527  mapped_mdl_chains.append(mapping[ref_ch])
1528  else:
1529  mapped_mdl_chains.append(None)
1530  final_mapping.append(mapped_mdl_chains)
1531 
1532  alns = dict()
1533  for ref_group, mdl_group in zip(chain_mapper.chem_groups,
1534  final_mapping):
1535  for ref_ch, mdl_ch in zip(ref_group, mdl_group):
1536  if ref_ch is not None and mdl_ch is not None:
1537  aln = ref_mdl_alns[(ref_ch, mdl_ch)]
1538  trg_view = chain_mapper.target.Select(f"cname={ref_ch}")
1539  mdl_view = mdl.Select(f"cname={mdl_ch}")
1540  aln.AttachView(0, trg_view)
1541  aln.AttachView(1, mdl_view)
1542  alns[(ref_ch, mdl_ch)] = aln
1543 
1544  self.__model_mapping = chain_mapping.MappingResult(chain_mapper.target, mdl,
1545  chain_mapper.chem_groups,
1546  chem_mapping,
1547  final_mapping, alns)
1548 
1549  def _find_unassigned_model_ligand_reason(self, ligand, assignment="lddt_pli", check=True):
1550  # Is this a model ligand?
1551  try:
1552  ligand_idx = self.model_ligands.index(ligand)
1553  except ValueError:
1554  # Raise with a better error message
1555  raise ValueError("Ligand %s is not in self.model_ligands" % ligand)
1556 
1557  # Ensure we are unassigned
1558  if check:
1559  details = getattr(self, assignment + "_details")
1560  if ligand.chain.name in details and ligand.number in details[ligand.chain.name]:
1561  ligand_details = details[ligand.chain.name][ligand.number]
1562  if not ("unassigned" in ligand_details and ligand_details["unassigned"]):
1563  raise RuntimeError("Ligand %s is mapped to %s" % (ligand, ligand_details["target_ligand"]))
1564 
1565  # Were there any ligands in the target?
1566  if len(self.target_ligands) == 0:
1567  return ("no_ligand", "No ligand in the target")
1568 
1569  # Is the ligand disconnected?
1570  graph = _ResidueToGraph(ligand)
1571  if not networkx.is_connected(graph):
1572  return ("disconnected", "Ligand graph is disconnected")
1573 
1574  # Do we have isomorphisms with the target?
1575  for trg_lig_idx, assigned in enumerate(self._assignment_isomorphisms[:, ligand_idx]):
1576  if np.isnan(assigned):
1577  try:
1578  _ComputeSymmetries(
1579  self.model_ligands[ligand_idx],
1580  self.target_ligands[trg_lig_idx],
1581  substructure_match=self.substructure_match,
1582  by_atom_index=True,
1583  return_symmetries=False)
1584  except (NoSymmetryError, DisconnectedGraphError):
1585  assigned = 0.
1586  else:
1587  assigned = 1.
1588  self._assignment_isomorphisms[trg_lig_idx,ligand_idx] = assigned
1589  if assigned == 1.:
1590  # Could have been assigned
1591  # So what's up with this target ligand?
1592  assignment_matrix = getattr(self, assignment + "_matrix")
1593  all_nan = np.all(np.isnan(assignment_matrix[:, ligand_idx]))
1594  if all_nan:
1595  # The assignment matrix is all nans so we have a problem
1596  # with the binding site or the representation
1597  trg_ligand = self.target_ligands[trg_lig_idx]
1598  return self._unassigned_target_ligands_reason[trg_ligand]
1599  else:
1600  # Ligand was already assigned
1601  return ("stoichiometry",
1602  "Ligand was already assigned to an other "
1603  "model ligand (different stoichiometry)")
1604 
1605  # Could not be assigned to any ligand - must be different
1606  if self.substructure_match:
1607  iso = "subgraph isomorphism"
1608  else:
1609  iso = "full graph isomorphism"
1610  return ("identity", "Ligand was not found in the target (by %s)" % iso)
1611 
1612  def _find_unassigned_target_ligand_reason(self, ligand, assignment="lddt_pli", check=True):
1613  # Is this a target ligand?
1614  try:
1615  ligand_idx = self.target_ligands.index(ligand)
1616  except ValueError:
1617  # Raise with a better error message
1618  raise ValueError("Ligand %s is not in self.target_ligands" % ligand)
1619 
1620  # Ensure we are unassigned
1621  if check:
1622  details = getattr(self, assignment + "_details")
1623  for cname, chain_ligands in details.items():
1624  for rnum, details in chain_ligands.items():
1625  if "unassigned" in details and details["unassigned"]:
1626  continue
1627  if details['target_ligand'] == ligand:
1628  raise RuntimeError("Ligand %s is mapped to %s.%s" % (
1629  ligand, cname, rnum))
1630 
1631  # Were there any ligands in the model?
1632  if len(self.model_ligands) == 0:
1633  return ("no_ligand", "No ligand in the model")
1634 
1635  # Is the ligand disconnected?
1636  graph = _ResidueToGraph(ligand)
1637  if not networkx.is_connected(graph):
1638  return ("disconnected", "Ligand graph is disconnected")
1639 
1640  # Is it because there was no valid binding site or no representation?
1641  if ligand in self._unassigned_target_ligands_reason:
1642  return self._unassigned_target_ligands_reason[ligand]
1643  # Or because no symmetry?
1644  for model_lig_idx, assigned in enumerate(
1645  self._assignment_isomorphisms[ligand_idx, :]):
1646  if np.isnan(assigned):
1647  try:
1648  _ComputeSymmetries(
1649  self.model_ligands[model_lig_idx],
1650  self.target_ligands[ligand_idx],
1651  substructure_match=self.substructure_match,
1652  by_atom_index=True,
1653  return_symmetries=False)
1654  except (NoSymmetryError, DisconnectedGraphError):
1655  assigned = 0.
1656  else:
1657  assigned = 1.
1658  self._assignment_isomorphisms[ligand_idx,model_lig_idx] = assigned
1659  if assigned:
1660  # Could have been assigned but was assigned to a different ligand
1661  return ("stoichiometry",
1662  "Ligand was already assigned to an other "
1663  "target ligand (different stoichiometry)")
1664 
1665  # Could not be assigned to any ligand - must be different
1666  if self.substructure_match:
1667  iso = "subgraph isomorphism"
1668  else:
1669  iso = "full graph isomorphism"
1670  return ("identity", "Ligand was not found in the model (by %s)" % iso)
1671 
1672 
1673 def _ResidueToGraph(residue, by_atom_index=False):
1674  """Return a NetworkX graph representation of the residue.
1675 
1676  :param residue: the residue from which to derive the graph
1677  :type residue: :class:`ost.mol.ResidueHandle` or
1678  :class:`ost.mol.ResidueView`
1679  :param by_atom_index: Set this parameter to True if you need the nodes to
1680  be labeled by atom index (within the residue).
1681  Otherwise, if False, the nodes will be labeled by
1682  atom names.
1683  :type by_atom_index: :class:`bool`
1684  :rtype: :class:`~networkx.classes.graph.Graph`
1685 
1686  Nodes are labeled with the Atom's uppercase :attr:`~ost.mol.AtomHandle.element`.
1687  """
1688  nxg = networkx.Graph()
1689 
1690  for atom in residue.atoms:
1691  nxg.add_node(atom.name, element=atom.element.upper())
1692 
1693  # This will list all edges twice - once for every atom of the pair.
1694  # But as of NetworkX 3.0 adding the same edge twice has no effect, so we're good.
1695  nxg.add_edges_from([(
1696  b.first.name,
1697  b.second.name) for a in residue.atoms for b in a.GetBondList()])
1698 
1699  if by_atom_index:
1700  nxg = networkx.relabel_nodes(nxg,
1701  {a: b for a, b in zip(
1702  [a.name for a in residue.atoms],
1703  range(len(residue.atoms)))},
1704  True)
1705  return nxg
1706 
1707 
1708 def SCRMSD(model_ligand, target_ligand, transformation=geom.Mat4(),
1709  substructure_match=False):
1710  """Calculate symmetry-corrected RMSD.
1711 
1712  Binding site superposition must be computed separately and passed as
1713  `transformation`.
1714 
1715  :param model_ligand: The model ligand
1716  :type model_ligand: :class:`ost.mol.ResidueHandle` or
1717  :class:`ost.mol.ResidueView`
1718  :param target_ligand: The target ligand
1719  :type target_ligand: :class:`ost.mol.ResidueHandle` or
1720  :class:`ost.mol.ResidueView`
1721  :param transformation: Optional transformation to apply on each atom
1722  position of model_ligand.
1723  :type transformation: :class:`ost.geom.Mat4`
1724  :param substructure_match: Set this to True to allow partial target
1725  ligand.
1726  :type substructure_match: :class:`bool`
1727  :rtype: :class:`float`
1728  :raises: :class:`NoSymmetryError` when no symmetry can be found,
1729  :class:`DisconnectedGraphError` when ligand graph is disconnected.
1730  """
1731 
1732  symmetries = _ComputeSymmetries(model_ligand, target_ligand,
1733  substructure_match=substructure_match,
1734  by_atom_index=True)
1735 
1736  best_rmsd = np.inf
1737  for i, (trg_sym, mdl_sym) in enumerate(symmetries):
1738  # Prepare Entities for RMSD
1739  trg_lig_rmsd_ent = mol.CreateEntity()
1740  trg_lig_rmsd_editor = trg_lig_rmsd_ent.EditXCS()
1741  trg_lig_rmsd_chain = trg_lig_rmsd_editor.InsertChain("_")
1742  trg_lig_rmsd_res = trg_lig_rmsd_editor.AppendResidue(trg_lig_rmsd_chain, "LIG")
1743 
1744  mdl_lig_rmsd_ent = mol.CreateEntity()
1745  mdl_lig_rmsd_editor = mdl_lig_rmsd_ent.EditXCS()
1746  mdl_lig_rmsd_chain = mdl_lig_rmsd_editor.InsertChain("_")
1747  mdl_lig_rmsd_res = mdl_lig_rmsd_editor.AppendResidue(mdl_lig_rmsd_chain, "LIG")
1748 
1749  for mdl_anum, trg_anum in zip(mdl_sym, trg_sym):
1750  # Rename model atoms according to symmetry
1751  trg_atom = target_ligand.atoms[trg_anum]
1752  mdl_atom = model_ligand.atoms[mdl_anum]
1753  # Add atoms in the correct order to the RMSD entities
1754  trg_lig_rmsd_editor.InsertAtom(trg_lig_rmsd_res, trg_atom.name, trg_atom.pos)
1755  mdl_lig_rmsd_editor.InsertAtom(mdl_lig_rmsd_res, mdl_atom.name, mdl_atom.pos)
1756 
1757  trg_lig_rmsd_editor.UpdateICS()
1758  mdl_lig_rmsd_editor.UpdateICS()
1759 
1760  rmsd = mol.alg.CalculateRMSD(mdl_lig_rmsd_ent.CreateFullView(),
1761  trg_lig_rmsd_ent.CreateFullView(),
1762  transformation)
1763  if rmsd < best_rmsd:
1764  best_rmsd = rmsd
1765 
1766  return best_rmsd
1767 
1768 
1769 def _ComputeSymmetries(model_ligand, target_ligand, substructure_match=False,
1770  by_atom_index=False, return_symmetries=True):
1771  """Return a list of symmetries (isomorphisms) of the model onto the target
1772  residues.
1773 
1774  :param model_ligand: The model ligand
1775  :type model_ligand: :class:`ost.mol.ResidueHandle` or
1776  :class:`ost.mol.ResidueView`
1777  :param target_ligand: The target ligand
1778  :type target_ligand: :class:`ost.mol.ResidueHandle` or
1779  :class:`ost.mol.ResidueView`
1780  :param substructure_match: Set this to True to allow partial ligands
1781  in the reference.
1782  :type substructure_match: :class:`bool`
1783  :param by_atom_index: Set this parameter to True if you need the symmetries
1784  to refer to atom index (within the residue).
1785  Otherwise, if False, the symmetries refer to atom
1786  names.
1787  :type by_atom_index: :class:`bool`
1788  :type return_symmetries: If Truthy, return the mappings, otherwise simply
1789  return True if a mapping is found (and raise if
1790  no mapping is found). This is useful to quickly
1791  find out if a mapping exist without the expensive
1792  step to find all the mappings.
1793  :type return_symmetries: :class:`bool`
1794  :raises: :class:`NoSymmetryError` when no symmetry can be found.
1795 
1796  """
1797 
1798  # Get the Graphs of the ligands
1799  model_graph = _ResidueToGraph(model_ligand, by_atom_index=by_atom_index)
1800  target_graph = _ResidueToGraph(target_ligand, by_atom_index=by_atom_index)
1801 
1802  if not networkx.is_connected(model_graph):
1803  raise DisconnectedGraphError("Disconnected graph for model ligand %s" % model_ligand)
1804  if not networkx.is_connected(target_graph):
1805  raise DisconnectedGraphError("Disconnected graph for target ligand %s" % target_ligand)
1806 
1807  # Note the argument order (model, target) which differs from spyrmsd.
1808  # This is because a subgraph of model is isomorphic to target - but not the opposite
1809  # as we only consider partial ligands in the reference.
1810  # Make sure to generate the symmetries correctly in the end
1811  gm = networkx.algorithms.isomorphism.GraphMatcher(
1812  model_graph, target_graph, node_match=lambda x, y:
1813  x["element"] == y["element"])
1814  if gm.is_isomorphic():
1815  if not return_symmetries:
1816  return True
1817  symmetries = [
1818  (list(isomorphism.values()), list(isomorphism.keys()))
1819  for isomorphism in gm.isomorphisms_iter()]
1820  assert len(symmetries) > 0
1821  LogDebug("Found %s isomorphic mappings (symmetries)" % len(symmetries))
1822  elif gm.subgraph_is_isomorphic() and substructure_match:
1823  if not return_symmetries:
1824  return True
1825  symmetries = [(list(isomorphism.values()), list(isomorphism.keys())) for isomorphism in
1826  gm.subgraph_isomorphisms_iter()]
1827  assert len(symmetries) > 0
1828  # Assert that all the atoms in the target are part of the substructure
1829  assert len(symmetries[0][0]) == len(target_ligand.atoms)
1830  LogDebug("Found %s subgraph isomorphisms (symmetries)" % len(symmetries))
1831  elif gm.subgraph_is_isomorphic():
1832  LogDebug("Found subgraph isomorphisms (symmetries), but"
1833  " ignoring because substructure_match=False")
1834  raise NoSymmetryError("No symmetry between %s and %s" % (
1835  str(model_ligand), str(target_ligand)))
1836  else:
1837  LogDebug("Found no isomorphic mappings (symmetries)")
1838  raise NoSymmetryError("No symmetry between %s and %s" % (
1839  str(model_ligand), str(target_ligand)))
1840 
1841  return symmetries
1842 
1843 
1844 class NoSymmetryError(Exception):
1845  """Exception to be raised when no symmetry can be found.
1846  """
1847  pass
1848 
1849 class DisconnectedGraphError(Exception):
1850  """Exception to be raised when the ligand graph is disconnected.
1851  """
1852  pass
1853 
1854 
1855 __all__ = ["LigandScorer", "SCRMSD", "NoSymmetryError", "DisconnectedGraphError"]
Protein or molecule.
definition of EntityView
Definition: entity_view.hh:86