OpenStructure
LigandScorer Class Reference

Inherited by LDDTPLIScorer, and SCRMSDScorer.

Public Member Functions

def __init__ (self, model, target, model_ligands=None, target_ligands=None, resnum_alignments=False, rename_ligand_chain=False, substructure_match=False, coverage_delta=0.2, max_symmetries=1e5)
 
def state_matrix (self)
 
def model_ligand_states (self)
 
def target_ligand_states (self)
 
def score_matrix (self)
 
def coverage_matrix (self)
 
def aux_matrix (self)
 
def assignment (self)
 
def score (self)
 
def aux (self)
 
def unassigned_target_ligands (self)
 
def unassigned_model_ligands (self)
 
def get_target_ligand_state_report (self, trg_lig_idx)
 
def get_model_ligand_state_report (self, mdl_lig_idx)
 
def guess_target_ligand_unassigned_reason (self, trg_lig_idx)
 
def guess_model_ligand_unassigned_reason (self, mdl_lig_idx)
 
def unassigned_model_ligands_reasons (self)
 
def unassigned_target_ligands_reasons (self)
 

Data Fields

 model
 
 target
 
 target_ligands
 
 model_ligands
 
 resnum_alignments
 
 rename_ligand_chain
 
 substructure_match
 
 coverage_delta
 
 max_symmetries
 
 state_decoding
 

Detailed Description

 Scorer to compute various small molecule ligand (non polymer) scores.

.. note ::
  Extra requirements:

  - Python modules `numpy` and `networkx` must be available
    (e.g. use ``pip install numpy networkx``)

:class:`LigandScorer` is an abstract base class dealing with all the setup,
data storage, enumerating ligand symmetries and target/model ligand
matching/assignment. But actual score computation is delegated to child
classes.

At the moment, two such classes are available:

* :class:`ost.mol.alg.ligand_scoring_lddtpli.LDDTPLIScorer`
  that assesses the conservation of protein-ligand
  contacts
* :class:`ost.mol.alg.ligand_scoring_scrmsd.SCRMSDScorer`
  that computes a binding-site superposed, symmetry-corrected RMSD.

All versus all scores are available through the lazily computed
:attr:`score_matrix`. However, many things can go wrong, even something
as simple as two ligands not matching. Scoring issues are therefore
encoded as error states. An issue with a particular ligand is indicated
by a non-zero state in
:attr:`model_ligand_states`/:attr:`target_ligand_states`.
This invalidates all pairwise scores involving that ligand.
This and other issues in pairwise score computation are reported in
:attr:`state_matrix`, which has the same shape as :attr:`score_matrix`.
A valid pairwise score can only be expected if the respective location
is 0. The states and their meaning can be explored with the following
code::

  for state_code, (short_desc, desc) in scorer_obj.state_decoding.items():
      print(state_code)
      print(short_desc)
      print(desc)
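
As a minimal sketch of how the state matrix gates the score matrix, the
helper below lists only the pairs whose state is 0. The name `valid_pairs`
is hypothetical and not part of OpenStructure. It accepts anything that can
be indexed as ``m[i][j]``, so nested lists stand in for the NumPy arrays, and
the non-zero state code in the toy data is an arbitrary placeholder.

```python
def valid_pairs(score_matrix, state_matrix):
    """Return (trg_idx, mdl_idx, score) for every pair with state 0.

    Hypothetical helper, not part of OpenStructure. Rows are target
    ligands and columns are model ligands, as in the scorer matrices.
    """
    pairs = []
    for trg_idx, state_row in enumerate(state_matrix):
        for mdl_idx, state in enumerate(state_row):
            if state == 0:  # only state 0 promises a valid score
                score = score_matrix[trg_idx][mdl_idx]
                pairs.append((trg_idx, mdl_idx, score))
    return pairs


# Toy data: the pair (target 1, model 0) failed with a non-zero state
nan = float("nan")
scores = [[0.8, 0.1], [nan, 0.6]]
states = [[0, 0], [3, 0]]
print(valid_pairs(scores, states))
```

With a real scorer, ``sc.score_matrix`` and ``sc.state_matrix`` could
presumably be passed in place of the toy lists.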

A common use case is to derive a one-to-one mapping between ligands in
the model and the target for which :class:`LigandScorer` provides an
automated assignment procedure.
By default, only exact matches between target and model ligands are
considered. This is a problem when the target only contains a subset
of the expected atoms (for instance if atoms are missing in an
experimental structure, which often happens in the PDB). With
`substructure_match=True`, complete model ligands can be scored against
partial target ligands. One problem with this approach is that it is
very easy to find good matches to small, irrelevant ligands like EDO, CO2
or GOL. The assignment algorithm therefore considers the coverage,
expressed as the fraction of model ligand atoms covered in the
target. Higher coverage matches are prioritized, but a match with a better
score will be preferred if its coverage falls within a window of
`coverage_delta` (by default 0.2) of a worse-scoring match. For instance,
with a delta of 0.2, a worse-scoring match with coverage 0.96 would be
preferred over a better-scoring match with coverage 0.70, because the
coverage difference (0.26) exceeds the delta.

Assumptions:

:class:`LigandScorer` generally assumes that the
:attr:`~ost.mol.ResidueHandle.is_ligand` property is properly set on all
the ligand atoms, and only ligand atoms. This is typically the case for
entities loaded from mmCIF (tested with mmCIF files from the PDB and
SWISS-MODEL). Legacy PDB files must contain `HET` headers (which is usually
the case for files downloaded from the PDB but not elsewhere).

The class doesn't perform any cleanup of the provided structures.
It is up to the caller to ensure that the data is clean and suitable for
scoring. :ref:`Molck <molck>` should be used with extra
care, as many of the options (such as `rm_non_std` or `map_nonstd_res`) can
cause ligands to be removed from the structure. If cleanup with Molck is
needed, ligands should be kept aside and passed separately. Non-ligand
residues should be valid compounds with atom names following the naming
conventions of the component dictionary. Non-standard residues are
acceptable, and if the model contains a standard residue at that position,
only atoms with matching names will be considered.

Unlike most of OpenStructure, this class does not assume that the ligands
(either in the model or the target) are part of the PDB component
dictionary. They may have arbitrary residue names. Residue names do not
have to match between the model and the target. Matching is based on
the calculation of isomorphisms which depend on the atom element name and
atom connectivity (bond order is ignored).
It is up to the caller to ensure that the connectivity of atoms is properly
set before passing any ligands to this class. Ligands with improper
connectivity will lead to bogus results.
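
The matching described above can be illustrated with a small pure-Python
sketch that enumerates all atom mappings preserving element and
connectivity, ignoring bond order. Everything here is hypothetical:
`enumerate_isomorphisms`, `TooManySymmetries` and the list-based ligand
representation are illustrative stand-ins, not the data structures or the
algorithm used by the class (which requires `networkx`, as noted above).

```python
class TooManySymmetries(RuntimeError):
    """Raised when the number of mappings exceeds the allowed maximum."""


def enumerate_isomorphisms(elems_a, bonds_a, elems_b, bonds_b,
                           max_symmetries=1e5):
    """List all mappings i -> mapping[i] from atoms of A to atoms of B.

    Atoms are given as element lists, bonds as index pairs. A mapping is
    kept only if elements agree and bonded/non-bonded pairs are preserved.
    """
    if sorted(elems_a) != sorted(elems_b) or len(bonds_a) != len(bonds_b):
        return []  # different composition: no full match possible
    adj_a = {frozenset(b) for b in bonds_a}
    adj_b = {frozenset(b) for b in bonds_b}
    n = len(elems_a)
    found = []

    def extend(mapping, used):
        i = len(mapping)  # next atom of A to place
        if i == n:
            found.append(tuple(mapping))
            if len(found) > max_symmetries:
                raise TooManySymmetries("more than %s mappings" % max_symmetries)
            return
        for j in range(n):
            if j in used or elems_a[i] != elems_b[j]:
                continue
            # connectivity to all previously placed atoms must agree
            if all((frozenset((k, i)) in adj_a) ==
                   (frozenset((mapping[k], j)) in adj_b) for k in range(i)):
                mapping.append(j)
                used.add(j)
                extend(mapping, used)
                mapping.pop()
                used.remove(j)

    extend([], set())
    return found


# CO2 matched onto itself: the two oxygens can be swapped
co2_elems, co2_bonds = ["O", "C", "O"], [(0, 1), (1, 2)]
print(enumerate_isomorphisms(co2_elems, co2_bonds, co2_elems, co2_bonds))
```

The cap mirrors the role of `max_symmetries`: highly symmetric ligands can
produce a very large number of equivalent mappings, so enumeration is
abandoned once the limit is exceeded.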

Note, however, that atom names should be unique within a residue (i.e. two
distinct atoms cannot have the same atom name).

This only applies to the ligand. The rest of the model and target
structures (protein, nucleic acids) must still follow the usual rules and
contain only residues from the compound library.

Although it isn't a requirement, hydrogen atoms should be removed from the
structures. The example below performs a reasonable cleanup. Keep in mind
that it will most likely not work as expected with entities loaded from
PDB files, as the `is_ligand` flag is probably not set properly.

Here is an example of how to set up a scorer::

    from ost import io, conop
    from ost.mol.alg.ligand_scoring_scrmsd import SCRMSDScorer
    from ost.mol.alg import Molck, MolckSettings

    # Load data
    # Structure model in PDB format, containing the receptor only
    model = io.LoadPDB("path_to_model.pdb")
    # Ligand model as SDF file
    model_ligand = io.LoadEntity("path_to_ligand.sdf", format="sdf")
    # Target loaded from mmCIF, containing the ligand
    target = io.LoadMMCIF("path_to_target.cif")

    # Cleanup a copy of the structures
    cleaned_model = model.Copy()
    cleaned_target = target.Copy()
    molck_settings = MolckSettings(rm_unk_atoms=True,
                                   rm_non_std=False,
                                   rm_hyd_atoms=True,
                                   rm_oxt_atoms=False,
                                   rm_zero_occ_atoms=False,
                                   colored=False,
                                   map_nonstd_res=False,
                                   assign_elem=True)
    Molck(cleaned_model, conop.GetDefaultLib(), molck_settings)
    Molck(cleaned_target, conop.GetDefaultLib(), molck_settings)

    # Set up scorer object to compute symmetry-corrected RMSD
    model_ligands = [model_ligand.Select("ele != H")]
    sc = SCRMSDScorer(cleaned_model, cleaned_target, model_ligands)

    # Perform assignment and read respective scores
    for lig_pair in sc.assignment:
        trg_lig = sc.target_ligands[lig_pair[0]]
        mdl_lig = sc.model_ligands[lig_pair[1]]
        score = sc.score_matrix[lig_pair[0], lig_pair[1]]
        print(f"Score for {trg_lig} and {mdl_lig}: {score}")

:param model: Model structure - a deep copy is available as :attr:`model`.
              No additional processing (i.e. Molck), checks,
              stereochemistry checks or sanitization is performed on the
              input. Hydrogen atoms are kept.
:type model: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
:param target: Target structure - a deep copy is available as
               :attr:`target`. No additional processing (i.e. Molck), checks
               or sanitization is performed on the input. Hydrogen atoms are
               kept.
:type target: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
:param model_ligands: Model ligands, as a list of
                      :class:`~ost.mol.ResidueHandle` belonging to the model
                      entity. Can be instantiated with either a :class:`list`
                      of :class:`~ost.mol.ResidueHandle`/
                      :class:`ost.mol.ResidueView` or of
                      :class:`ost.mol.EntityHandle`/
                      :class:`ost.mol.EntityView`.
                      If `None`, ligands will be extracted based on the
                      :attr:`~ost.mol.ResidueHandle.is_ligand` flag (this is
                      normally set properly in entities loaded from mmCIF).
:type model_ligands: :class:`list`
:param target_ligands: Target ligands, as a list of
                       :class:`~ost.mol.ResidueHandle` belonging to the
                       target entity. Can be instantiated with either a
                       :class:`list` of :class:`~ost.mol.ResidueHandle`/
                       :class:`ost.mol.ResidueView` or of
                       :class:`ost.mol.EntityHandle`/
                       :class:`ost.mol.EntityView` containing a single
                       residue each. If `None`, ligands will be extracted
                       based on the :attr:`~ost.mol.ResidueHandle.is_ligand`
                       flag (this is normally set properly in entities
                       loaded from mmCIF).
:type target_ligands: :class:`list`
:param resnum_alignments: Whether alignments between chemically equivalent
                          chains in *model* and *target* can be computed
                          based on residue numbers. This can be assumed in
                          benchmarking setups such as CAMEO/CASP.
:type resnum_alignments: :class:`bool`
:param rename_ligand_chain: If a residue with the same chain name and
                            residue number as an explicitly passed model
                            or target ligand exists in the structure,
                            and `rename_ligand_chain` is False, a
                            RuntimeError will be raised. If
                            `rename_ligand_chain` is True, the ligand will
                            be moved to a new chain instead, and the move
                            will be logged to the console with SCRIPT
                            level.
:type rename_ligand_chain: :class:`bool`
:param substructure_match: Set this to True to allow incomplete (i.e.
                           partially resolved) target ligands.
:type substructure_match: :class:`bool`
:param coverage_delta: Size of the coverage window used to select candidate
                       pairs during ligand assignment (see
                       :attr:`assignment`).
:type coverage_delta: :class:`float`
:param max_symmetries: If more than that many isomorphisms exist for
                       a target-model ligand pair, it will be ignored and
                       reported as unassigned.
:type max_symmetries: :class:`int`

Definition at line 8 of file ligand_scoring_base.py.

Constructor & Destructor Documentation

◆ __init__()

def __init__ (   self,
  model,
  target,
  model_ligands = None,
  target_ligands = None,
  resnum_alignments = False,
  rename_ligand_chain = False,
  substructure_match = False,
  coverage_delta = 0.2,
  max_symmetries = 1e5 
)

Definition at line 204 of file ligand_scoring_base.py.

Member Function Documentation

◆ assignment()

def assignment (   self)
 Ligand assignment based on computed scores

Implements a greedy algorithm to assign target and model ligands
to each other. Starts from all valid ligand pairs, as indicated
by a state of 0 in :attr:`state_matrix`. Each iteration first selects
high coverage pairs. Given max_coverage defined as the highest
coverage observed in the available pairs, all pairs with coverage
in [max_coverage-*coverage_delta*, max_coverage] are selected.
The best scoring pair among those is added to the assignment
and the whole process is repeated until there are no ligands to
assign anymore.

:rtype: :class:`list` of :class:`tuple` (trg_lig_idx, mdl_lig_idx)
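
The procedure described above can be sketched in plain Python as follows.
This is an illustrative re-implementation under stated assumptions, not the
code used by the class: `greedy_assignment` and its `higher_is_better` flag
are hypothetical (whether a high or a low score is better depends on the
child class, e.g. an RMSD is better when lower), and nested lists stand in
for the NumPy matrices.

```python
def greedy_assignment(score, coverage, state, coverage_delta=0.2,
                      higher_is_better=True):
    """Greedy one-to-one assignment of target (rows) to model (columns).

    Hypothetical sketch: only pairs with state 0 are considered.
    """
    n_trg = len(state)
    n_mdl = len(state[0]) if n_trg else 0
    available = {(t, m) for t in range(n_trg) for m in range(n_mdl)
                 if state[t][m] == 0}
    assignment = []
    while available:
        # 1) select the high coverage pairs
        max_cov = max(coverage[t][m] for t, m in available)
        window = sorted((t, m) for t, m in available
                        if coverage[t][m] >= max_cov - coverage_delta)
        # 2) pick the best scoring pair among them
        pick = max if higher_is_better else min
        best = pick(window, key=lambda p: score[p[0]][p[1]])
        assignment.append(best)
        # 3) both ligands of the chosen pair are no longer available
        available = {(t, m) for t, m in available
                     if t != best[0] and m != best[1]}
    return assignment


# Two target ligands competing for one model ligand
score = [[0.3], [0.9]]
coverage = [[0.96], [0.70]]
state = [[0], [0]]
print(greedy_assignment(score, coverage, state, coverage_delta=0.2))
print(greedy_assignment(score, coverage, state, coverage_delta=0.3))
```

With a delta of 0.2 the pair with coverage 0.70 falls outside the window,
so the higher coverage pair is assigned despite its worse score. Widening
the delta to 0.3 puts both pairs in the window and the better score wins.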

Definition at line 392 of file ligand_scoring_base.py.

◆ aux()

def aux (   self)
 Get a dictionary of score details, keyed by model ligand

Extract dict with something like:
``scorer.aux[lig.GetChain().GetName()][lig.GetNumber()]``.
The returned info dicts are based on :attr:`~assignment`. The content is
documented in the respective child class.

:rtype: :class:`dict`

Definition at line 467 of file ligand_scoring_base.py.

◆ aux_matrix()

def aux_matrix (   self)
 Get the matrix of scorer specific auxiliary data.

Target ligands are in rows, model ligands in columns.

Auxiliary data consists of arbitrary data dicts which allow a child
class to provide additional information for a scored ligand pair.
Empty dictionaries indicate that the child class simply didn't return
anything or that no value could be computed (e.g. different ligands).
In other words: values are only valid if the respective location in the
:attr:`~state_matrix` is 0.

:rtype: :class:`~numpy.ndarray`

Definition at line 373 of file ligand_scoring_base.py.

◆ coverage_matrix()

def coverage_matrix (   self)
 Get the matrix of model ligand atom coverage in the target.

Target ligands are in rows, model ligands in columns.

NaN values indicate that no value could be computed (e.g. different
ligands). In other words: values are only valid if the respective
location in :attr:`~state_matrix` is 0. If `substructure_match=False`,
only full match isomorphisms are considered, and therefore only values
of 1.0 can be observed.

:rtype: :class:`~numpy.ndarray`

Definition at line 355 of file ligand_scoring_base.py.

◆ get_model_ligand_state_report()

def get_model_ligand_state_report (   self,
  mdl_lig_idx 
)
 Get summary of states observed with respect to all target ligands

Mainly for debug purposes 

:param mdl_lig_idx: Index of model ligand for which report should be
                    generated
:type mdl_lig_idx: :class:`int`

Definition at line 521 of file ligand_scoring_base.py.

◆ get_target_ligand_state_report()

def get_target_ligand_state_report (   self,
  trg_lig_idx 
)
 Get summary of states observed with respect to all model ligands

Mainly for debug purposes 

:param trg_lig_idx: Index of target ligand for which report should be
                    generated
:type trg_lig_idx: :class:`int`

Definition at line 509 of file ligand_scoring_base.py.

◆ guess_model_ligand_unassigned_reason()

def guess_model_ligand_unassigned_reason (   self,
  mdl_lig_idx 
)
 Makes an educated guess why model ligand is not assigned

This either returns actual error states or custom states that are
derived from them.

:param mdl_lig_idx: Index of model ligand
:type mdl_lig_idx: :class:`int`
:returns: :class:`tuple` with two elements: 1) keyword 2) human readable
          sentence describing the issue, ("unknown", "unknown") if
          nothing obvious can be found.
:raises: :class:`RuntimeError` if specified model ligand is assigned
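
A possible way to collect the guessed reasons for all unassigned model
ligands is sketched below. The helper `report_unassigned_model_ligands` is
hypothetical. It assumes that `unassigned_model_ligands` is exposed as a
property (like `assignment` in the class example) and only relies on the
documented return value of `guess_model_ligand_unassigned_reason`. The stub
class merely stands in for a real scorer so that the snippet runs on its own.

```python
def report_unassigned_model_ligands(scorer):
    """Map each unassigned model ligand index to (keyword, sentence).

    Hypothetical helper: only unassigned indices are queried, because the
    guess function raises a RuntimeError for assigned ligands.
    """
    report = {}
    for mdl_lig_idx in scorer.unassigned_model_ligands:
        keyword, sentence = scorer.guess_model_ligand_unassigned_reason(
            mdl_lig_idx)
        report[mdl_lig_idx] = (keyword, sentence)
    return report


class StubScorer:
    """Stand-in for a real scorer object, for demonstration only."""
    unassigned_model_ligands = [1]

    def guess_model_ligand_unassigned_reason(self, mdl_lig_idx):
        return ("unknown", "unknown")


print(report_unassigned_model_ligands(StubScorer()))
```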

Definition at line 616 of file ligand_scoring_base.py.

◆ guess_target_ligand_unassigned_reason()

def guess_target_ligand_unassigned_reason (   self,
  trg_lig_idx 
)
 Makes an educated guess why target ligand is not assigned

This either returns actual error states or custom states that are
derived from them.

:param trg_lig_idx: Index of target ligand
:type trg_lig_idx: :class:`int`
:returns: :class:`tuple` with two elements: 1) keyword 2) human readable
          sentence describing the issue, ("unknown", "unknown") if
          nothing obvious can be found.
:raises: :class:`RuntimeError` if specified target ligand is assigned

Definition at line 552 of file ligand_scoring_base.py.

◆ model_ligand_states()

def model_ligand_states (   self)
 Encodes states of model ligands

Non-zero state in any of the model ligands invalidates the full
respective column in :attr:`~state_matrix`.

:rtype: :class:`~numpy.ndarray`

Definition at line 313 of file ligand_scoring_base.py.

◆ score()

def score (   self)
 Get a dictionary of score values, keyed by model ligand

Extract score with something like:
``scorer.score[lig.GetChain().GetName()][lig.GetNumber()]``.
The returned scores are based on :attr:`~assignment`.

:rtype: :class:`dict`
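
To make the nested layout concrete, the sketch below builds such a
dictionary from an assignment and a score matrix. It is only an
illustration of the documented access pattern: `build_score_dict` and
`model_ligand_ids` are hypothetical, plain integers stand in for residue
numbers, and how the class fills the dictionary internally is not shown here.

```python
def build_score_dict(assignment, score_matrix, model_ligand_ids):
    """Build {chain_name: {residue_number: score}} for assigned ligands.

    Hypothetical helper. model_ligand_ids[i] is the
    (chain_name, residue_number) pair of model ligand i.
    """
    result = {}
    for trg_idx, mdl_idx in assignment:
        chain_name, resnum = model_ligand_ids[mdl_idx]
        result.setdefault(chain_name, {})[resnum] = \
            score_matrix[trg_idx][mdl_idx]
    return result


# One target ligand, two model ligands; only model ligand 1 is assigned
assignment = [(0, 1)]
score_matrix = [[0.1, 0.7]]
model_ligand_ids = [("L", 1), ("L", 2)]
print(build_score_dict(assignment, score_matrix, model_ligand_ids))
```

Unassigned model ligands do not appear in the result of this sketch.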

Definition at line 445 of file ligand_scoring_base.py.

◆ score_matrix()

def score_matrix (   self)
 Get the matrix of scores.

Target ligands are in rows, model ligands in columns.

NaN values indicate that no value could be computed (e.g. different
ligands). In other words: values are only valid if the respective
location in :attr:`~state_matrix` is 0. 

:rtype: :class:`~numpy.ndarray`

Definition at line 339 of file ligand_scoring_base.py.

◆ state_matrix()

def state_matrix (   self)
 Encodes states of ligand pairs

Ligand pairs can be matched and a valid score can be expected if the
respective location in this matrix is 0.
Target ligands are in rows, model ligands in columns. States are encoded
as integers <= 9. Larger numbers encode errors for child classes.
Use something like ``self.state_decoding[3]`` to get a description.

:rtype: :class:`~numpy.ndarray`

Definition at line 297 of file ligand_scoring_base.py.

◆ target_ligand_states()

def target_ligand_states (   self)
 Encodes states of target ligands

Non-zero state in any of the target ligands invalidates the full
respective row in :attr:`~state_matrix`.

:rtype: :class:`~numpy.ndarray`
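
The row invalidation described above (and the analogous column
invalidation for model ligands) can be sketched as follows. The function
`propagate_ligand_states` is hypothetical, the state codes in the toy data
are arbitrary placeholders, and giving the target state precedence over the
model state is an assumption made for this illustration only.

```python
def propagate_ligand_states(pair_states, target_states, model_states):
    """Overlay per-ligand states onto a pairwise state matrix.

    Hypothetical sketch: a non-zero target state fills its row, a
    non-zero model state fills its column. Returns a new nested list.
    """
    out = [list(row) for row in pair_states]
    for t, t_state in enumerate(target_states):
        for m, m_state in enumerate(model_states):
            if t_state != 0:
                out[t][m] = t_state  # whole row becomes invalid
            elif m_state != 0:
                out[t][m] = m_state  # whole column becomes invalid
    return out


# Target ligand 1 and model ligand 0 each carry a non-zero state
pair_states = [[0, 0], [0, 0]]
print(propagate_ligand_states(pair_states, [0, 2], [4, 0]))
```

Only the pair (target 0, model 1) keeps state 0 in this toy example, so it
is the only pair for which a valid score could be expected.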

Definition at line 326 of file ligand_scoring_base.py.

◆ unassigned_model_ligands()

def unassigned_model_ligands (   self)
 Get indices of model ligands which are not assigned 

:rtype: :class:`list` of :class:`int`

Definition at line 500 of file ligand_scoring_base.py.

◆ unassigned_model_ligands_reasons()

def unassigned_model_ligands_reasons (   self)

Definition at line 680 of file ligand_scoring_base.py.

◆ unassigned_target_ligands()

def unassigned_target_ligands (   self)
 Get indices of target ligands which are not assigned 

:rtype: :class:`list` of :class:`int`
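
Conceptually, the unassigned indices are those that never appear in
:attr:`~assignment`. A minimal sketch, assuming the assignment is a list of
(trg_lig_idx, mdl_lig_idx) tuples as documented; the helper name
`unassigned_indices` is hypothetical and not part of the class:

```python
def unassigned_indices(assignment, n_target, n_model):
    """Return (unassigned target indices, unassigned model indices).

    Hypothetical helper operating on plain index tuples.
    """
    assigned_trg = {trg_idx for trg_idx, _ in assignment}
    assigned_mdl = {mdl_idx for _, mdl_idx in assignment}
    unassigned_trg = [i for i in range(n_target) if i not in assigned_trg]
    unassigned_mdl = [i for i in range(n_model) if i not in assigned_mdl]
    return unassigned_trg, unassigned_mdl


# Two target ligands, three model ligands, a single assigned pair
print(unassigned_indices([(0, 1)], n_target=2, n_model=3))
```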

Definition at line 490 of file ligand_scoring_base.py.

◆ unassigned_target_ligands_reasons()

def unassigned_target_ligands_reasons (   self)

Definition at line 693 of file ligand_scoring_base.py.

Field Documentation

◆ coverage_delta

coverage_delta

Definition at line 248 of file ligand_scoring_base.py.

◆ max_symmetries

max_symmetries

Definition at line 249 of file ligand_scoring_base.py.

◆ model

model

Definition at line 210 of file ligand_scoring_base.py.

◆ model_ligands

model_ligands

Definition at line 235 of file ligand_scoring_base.py.

◆ rename_ligand_chain

rename_ligand_chain

Definition at line 246 of file ligand_scoring_base.py.

◆ resnum_alignments

resnum_alignments

Definition at line 245 of file ligand_scoring_base.py.

◆ state_decoding

state_decoding

Definition at line 282 of file ligand_scoring_base.py.

◆ substructure_match

substructure_match

Definition at line 247 of file ligand_scoring_base.py.

◆ target

target

Definition at line 217 of file ligand_scoring_base.py.

◆ target_ligands

target_ligands

Definition at line 225 of file ligand_scoring_base.py.


The documentation for this class was generated from the following file: