OpenStructure
 All Data Structures Namespaces Files Functions Variables Typedefs Enumerations Enumerator Properties Friends Macros Groups Pages
Public Member Functions | Data Fields
LigandScorer Class Reference

Public Member Functions

def __init__
 
def chain_mapper
 
def rmsd_matrix
 
def lddt_pli_matrix
 
def coverage_matrix
 
def rmsd
 
def rmsd_details
 
def lddt_pli
 
def lddt_pli_details
 
def unassigned_target_ligands
 
def unassigned_target_ligand_descriptions
 
def unassigned_model_ligands
 
def unassigned_model_ligand_descriptions
 

Data Fields

 model
 
 target
 
 target_ligands
 
 model_ligands
 
 resnum_alignments
 
 check_resnames
 
 rename_ligand_chain
 
 substructure_match
 
 radius
 
 lddt_pli_radius
 
 lddt_lp_radius
 
 binding_sites_topn
 
 global_chain_mapping
 
 rmsd_assignment
 
 n_max_naive
 
 max_symmetries
 
 unassigned
 
 coverage_delta
 

Detailed Description

Scorer class to compute various small molecule ligand (non polymer) scores.

.. note ::
  Extra requirements:

  - Python modules `numpy` and `networkx` must be available
    (e.g. use ``pip install numpy networkx``)

At the moment, two scores are available:

* lDDT-PLI, that looks at the conservation of protein-ligand contacts
  with :class:`lDDT <ost.mol.alg.lddt.lDDTScorer>`.
* Binding-site superposed, symmetry-corrected RMSD that assesses the
  accuracy of the ligand pose (BiSyRMSD, hereinafter referred to as RMSD).

Both scores involve local chain mapping of the reference binding site
onto the model, symmetry-correction, and finally assignment (mapping)
of model and target ligands, as described in (Manuscript in preparation).

The binding site is defined based on a radius around the target ligand
in the reference structure only. It only contains protein and nucleic
acid chains that pass the criteria for the
:class:`chain mapping <ost.mol.alg.chain_mapping>`. This means ignoring
other ligands, waters, short polymers as well as any incorrectly connected
chains that may be in proximity.

Results are available as matrices (`(lddt_pli|rmsd)_matrix`), where every
target-model score is reported in a matrix; as `(lddt_pli|rmsd)` where
a model-target assignment has been determined (see below) and reported in
a dictionary; and as (`(lddt_pli|rmsd)_details`) methods, which report
additional details about different aspects of the scoring such as chain
mapping.

The behavior of chain mapping and ligand assignment can be controlled
with the `global_chain_mapping` and `rmsd_assignment` arguments.

By default, chain mapping is performed locally, ie. only within the
binding site. As a result, different ligand scores can correspond to
different chain mappings. This tends to produce more favorable scores,
especially in large, partially regular oligomeric complexes.
Setting `global_chain_mapping=True` enforces a single global chain mapping,
as per :meth:`ost.mol.alg.chain_mapping.ChainMapper.GetMapping`.
Note that this global chain mapping currently ignores non polymer entities
such as small ligands, and may result in overly pessimistic scores.

By default, target-model ligand assignments are computed independently
for the RMSD and lDDT-PLI scores. For RMSD, each model ligand is uniquely
assigned to a target ligand, starting from the "best" possible mapping
(lowest RMSD) and using each target and model ligand in a single
assignment. Ties are resolved by best (highest) lDDT-PLI. Similarly,
for lDDT-PLI, the assignment is based on the highest lDDT-PLI, and ties
broken by lowest RMSD. Setting `rmsd_assignment=True` forces a single
ligand assignment, based on RMSD only. Ties are broken arbitrarily.

By default, only exact matches between target and model ligands are
considered. This is a problem when the target only contains a subset
of the expected atoms (for instance if atoms are missing in an
experimental structure, which often happens in the PDB). With
`substructure_match=True`, complete model ligands can be scored against
partial target ligands. One problem with this approach is that it is
very easy to find good matches to small, irrelevant ligands like EDO, CO2
or GOL. To counter that, the assignment algorithm considers the coverage,
expressed as the fraction of atoms of the model ligand atoms covered in the
target. Higher coverage matches are prioritized, but a match with a better
score will be preferred if it falls within a window of `coverage_delta`
(by default 0.2) of a worse-scoring match. As a result, for instance,
with a delta of 0.2, a low-score match with coverage 0.96 would be
preferred to a high-score match with coverage 0.90.

Assumptions:

The class generally assumes that the
:attr:`~ost.mol.ResidueHandle.is_ligand` property is properly set on all
the ligand atoms, and only ligand atoms. This is typically the case for
entities loaded from mmCIF (tested with mmCIF files from the PDB and
SWISS-MODEL). Legacy PDB files must contain `HET` headers (which is usually
the case for files downloaded from the PDB but not elsewhere).

The class doesn't perform any cleanup of the provided structures.
It is up to the caller to ensure that the data is clean and suitable for
scoring. :ref:`Molck <molck>` should be used with extra
care, as many of the options (such as `rm_non_std` or `map_nonstd_res`) can
cause ligands to be removed from the structure. If cleanup with Molck is
needed, ligands should be kept aside and passed separately. Non-ligand residues
should be valid compounds with atom names following the naming conventions
of the component dictionary. Non-standard residues are acceptable, and if
the model contains a standard residue at that position, only atoms with
matching names will be considered.

Unlike most of OpenStructure, this class does not assume that the ligands
(either in the model or the target) are part of the PDB component
dictionary. They may have arbitrary residue names. Residue names do not
have to match between the model and the target. Matching is based on
the calculation of isomorphisms which depend on the atom element name and
atom connectivity (bond order is ignored).
It is up to the caller to ensure that the connectivity of atoms is properly
set before passing any ligands to this class. Ligands with improper
connectivity will lead to bogus results.

Note, however, that atom names should be unique within a residue (ie two
distinct atoms cannot have the same atom name).

This only applies to the ligand. The rest of the model and target
structures (protein, nucleic acids) must still follow the usual rules and
contain only residues from the compound library.

Although it isn't a requirement, hydrogen atoms should be removed from the
structures. Here is an example code snippet that will perform a reasonable
cleanup. Keep in mind that this is most likely not going to work as
expected with entities loaded from PDB files, as the `is_ligand` flag is
probably not set properly.

Here is a snippet example of how to use this code::

    from ost.mol.alg.ligand_scoring import LigandScorer
    from ost.mol.alg import Molck, MolckSettings

    # Load data
    # Structure model in PDB format, containing the receptor only
    model = io.LoadPDB("path_to_model.pdb")
    # Ligand model as SDF file
    model_ligand = io.LoadEntity("path_to_ligand.sdf", format="sdf")
    # Target loaded from mmCIF, containing the ligand
    target = io.LoadMMCIF("path_to_target.cif")

    # Cleanup a copy of the structures
    cleaned_model = model.Copy()
    cleaned_target = target.Copy()
    molck_settings = MolckSettings(rm_unk_atoms=True,
                                   rm_non_std=False,
                                   rm_hyd_atoms=True,
                                   rm_oxt_atoms=False,
                                   rm_zero_occ_atoms=False,
                                   colored=False,
                                   map_nonstd_res=False,
                                   assign_elem=True)
    Molck(cleaned_model, conop.GetDefaultLib(), molck_settings)
    Molck(cleaned_target, conop.GetDefaultLib(), molck_settings)

    # Setup scorer object and compute lDDT-PLI
    model_ligands = [model_ligand.Select("ele != H")]
    ls = LigandScorer(model=cleaned_model, target=cleaned_target, model_ligands=model_ligands)
    print("lDDT-PLI:", ls.lddt_pli)
    print("RMSD:", ls.rmsd)

:param model: Model structure - a deep copy is available as :attr:`model`.
              No additional processing (ie. Molck), checks,
              stereochemistry checks or sanitization is performed on the
              input. Hydrogen atoms are kept.
:type model: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
:param target: Target structure - a deep copy is available as :attr:`target`.
              No additional processing (ie. Molck), checks or sanitization
              is performed on the input. Hydrogen atoms are kept.
:type target: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
:param model_ligands: Model ligands, as a list of
              :class:`~ost.mol.ResidueHandle` belonging to the model
              entity. Can be instantiated with either a :class:list of
              :class:`~ost.mol.ResidueHandle`/:class:`ost.mol.ResidueView`
              or of :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`.
              If `None`, ligands will be extracted based on the
              :attr:`~ost.mol.ResidueHandle.is_ligand` flag (this is
              normally set properly in entities loaded from mmCIF).
:type model_ligands: :class:`list`
:param target_ligands: Target ligands, as a list of
              :class:`~ost.mol.ResidueHandle` belonging to the target
              entity. Can be instantiated either a :class:list of
              :class:`~ost.mol.ResidueHandle`/:class:`ost.mol.ResidueView`
              or of :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
              containing a single residue each.
              If `None`, ligands will be extracted based on the
              :attr:`~ost.mol.ResidueHandle.is_ligand` flag (this is
              normally set properly in entities loaded from mmCIF).
:type target_ligands: :class:`list`
:param resnum_alignments: Whether alignments between chemically equivalent
                          chains in *model* and *target* can be computed
                          based on residue numbers. This can be assumed in
                          benchmarking setups such as CAMEO/CASP.
:type resnum_alignments: :class:`bool`
:param check_resnames:  On by default. Enforces residue name matches
                        between mapped model and target residues.
:type check_resnames: :class:`bool`
:param rename_ligand_chain: If a residue with the same chain name and
                            residue number than an explicitly passed model
                            or target ligand exits in the structure,
                            and `rename_ligand_chain` is False, a
                            RuntimeError will be raised. If
                            `rename_ligand_chain` is True, the ligand will
                            be moved to a new chain instead, and the move
                            will be logged to the console with SCRIPT
                            level.
:type rename_ligand_chain: :class:`bool`
:param chain_mapper: a chain mapper initialized for the target structure.
                     If None (default), a chain mapper will be initialized
                     lazily as required.
:type chain_mapper:  :class:`ost.mol.alg.chain_mapping.ChainMapper`
:param substructure_match: Set this to True to allow partial target ligand.
:type substructure_match: :class:`bool`
:param coverage_delta: the coverage delta for partial ligand assignment.
:type coverage_delta: :class:`float`
:param radius: Inclusion radius for the binding site. Residues with
               atoms within this distance of the ligand will be considered
               for inclusion in the binding site.
:type radius: :class:`float`
:param lddt_pli_radius: lDDT inclusion radius for lDDT-PLI.
:type lddt_pli_radius: :class:`float`
:param lddt_lp_radius: lDDT inclusion radius for lDDT-LP.
:type lddt_lp_radius: :class:`float`
:param binding_sites_topn: maximum number of target binding site
                           representations to assess, per target ligand.
                           Ignored if `global_chain_mapping` is True.
:type binding_sites_topn: :class:`int`
:param global_chain_mapping: set to True to use a global chain mapping for
                             the polymer (protein, nucleotide) chains.
                             Defaults to False, in which case only local
                             chain mappings are allowed (where different
                             ligand may be scored against different chain
                             mappings).
:type global_chain_mapping: :class:`bool`
:param custom_mapping: Provide custom chain mapping between *model* and
                       *target* that is used as global chain mapping.
                       Dictionary with target chain names as key and model
                       chain names as value. Only has an effect if
                       *global_chain_mapping* is True.
:type custom_mapping: :class:`dict`
:param rmsd_assignment: assign ligands based on RMSD only. The default
                        (False) is to use a combination of lDDT-PLI and
                        RMSD for the assignment.
:type rmsd_assignment: :class:`bool`
:param n_max_naive: Parameter for global chain mapping. If *model* and
                    *target* have less or equal that number of chains,
                    the full
                    mapping solution space is enumerated to find the
                    the optimum. A heuristic is used otherwise.
:type n_max_naive: :class:`int`
:param max_symmetries: If more than that many isomorphisms exist for
                   a target-ligand pair, it will be ignored and reported
                   as unassigned.
:type max_symmetries: :class:`int`
:param unassigned: If True, unassigned model ligands are reported in
                   the output together with assigned ligands, with a score
                   of None, and reason for not being assigned in the
                   \\*_details matrix. Defaults to False.
:type unassigned: :class:`bool`

Definition at line 13 of file ligand_scoring.py.

Constructor & Destructor Documentation

def __init__ (   self,
  model,
  target,
  model_ligands = None,
  target_ligands = None,
  resnum_alignments = False,
  check_resnames = True,
  rename_ligand_chain = False,
  chain_mapper = None,
  substructure_match = False,
  coverage_delta = 0.2,
  radius = 4.0,
  lddt_pli_radius = 6.0,
  lddt_lp_radius = 10.0,
  binding_sites_topn = 100000,
  global_chain_mapping = False,
  rmsd_assignment = False,
  n_max_naive = 12,
  max_symmetries = 1e5,
  custom_mapping = None,
  unassigned = False 
)

Definition at line 266 of file ligand_scoring.py.

Member Function Documentation

def chain_mapper (   self)
Chain mapper object for the given :attr:`target`.

:type: :class:`ost.mol.alg.chain_mapping.ChainMapper`

Definition at line 356 of file ligand_scoring.py.

def coverage_matrix (   self)
Get the matrix of model ligand atom coverage in the target.

Target ligands are in rows, model ligands in columns.

A value of 0 indicates that there was no isomorphism between the model
and target ligands. If `substructure_match=False`, only full match
isomorphisms are considered, and therefore only values of 1.0 and 0.0
are reported.

:rtype: :class:`~numpy.ndarray`

Definition at line 1156 of file ligand_scoring.py.

def lddt_pli (   self)
Get a dictionary of lDDT-PLI score values, keyed by model ligand
(chain name, :class:`~ost.mol.ResNum`).

If the scoring object was instantiated with `unassigned=True`, some
scores may be `None`.

:rtype: :class:`dict`

Definition at line 1252 of file ligand_scoring.py.

def lddt_pli_details (   self)
Get a dictionary of lDDT-PLI score details (dictionaries), keyed by
model ligand (chain name, :class:`~ost.mol.ResNum`).

Each sub-dictionary contains the following information:

* `lddt_pli`: the lDDT-PLI score value.
* `rmsd`: the RMSD score value corresponding to the lDDT-PLI
  chain mapping and assignment. This may differ from the RMSD-based
  assignment. Note that a different isomorphism than `lddt_pli` may
  be used.
* `lddt_lp`: the lDDT score of the ligand pocket (lDDT-LP).
* `lddt_pli_n_contacts`: number of total contacts used in lDDT-PLI,
  summed over all thresholds. Can be divided by 8 to obtain the number
  of atomic contacts.
* `bs_ref_res`: a list of residues (:class:`~ost.mol.ResidueHandle`)
  that define the binding site in the reference.
* `bs_ref_res_mapped`: a list of residues
  (:class:`~ost.mol.ResidueHandle`) in the reference binding site
  that could be mapped to the model.
* `bs_mdl_res_mapped`: a list of residues
  (:class:`~ost.mol.ResidueHandle`) in the model that were mapped to
  the reference binding site. The residues are in the same order as
  `bs_ref_res_mapped`.
* `bb_rmsd`: the RMSD of the binding site backbone after superposition.
  Note: not used for lDDT-PLI computation.
* `target_ligand`: residue handle of the target ligand.
* `model_ligand`: residue handle of the model ligand.
* `chain_mapping`: local chain mapping as a dictionary, with target
  chain name as key and model chain name as value.
* `transform`: transformation to superpose the model onto the target
  (for RMSD only).
* `substructure_match`: whether the score is the result of a partial
  (substructure) match. A value of `True` indicates that the target
  ligand covers only part of the model, while `False` indicates a
  perfect match.
* `inconsistent_residues`: a list of tuples of mapped residues views
  (:class:`~ost.mol.ResidueView`) with residue names that differ
  between the reference and the model, respectively.
  The list is empty if all residue names match, which is guaranteed
  if `check_resnames=True`.
  Note: more binding site mappings may be explored during scoring,
  but only inconsistencies in the selected mapping are reported.
* `unassigned`: only if the scorer was instantiated with
  `unassigned=True`: `False`

If the scoring object was instantiated with `unassigned=True`, in
addition the unmapped ligands will be reported with a score of `None`
and the following information:

* `unassigned`: `True`,
* `reason_short`: a short token of the reason, see
  :attr:`unassigned_model_ligands` for details and meaning.
* `reason_long`: a human-readable text of the reason, see
  :attr:`unassigned_model_ligands` for details and meaning.
* `lddt_pli`: `None`

:rtype: :class:`dict`

Definition at line 1269 of file ligand_scoring.py.

def lddt_pli_matrix (   self)
Get the matrix of lDDT-PLI values.

Target ligands are in rows, model ligands in columns.

NaN values indicate that no lDDT-PLI could be computed (i.e. different
ligands).

:rtype: :class:`~numpy.ndarray`

Definition at line 1133 of file ligand_scoring.py.

def rmsd (   self)
Get a dictionary of RMSD score values, keyed by model ligand
(chain name, :class:`~ost.mol.ResNum`).

If the scoring object was instantiated with `unassigned=True`, some
scores may be `None`.

:rtype: :class:`dict`

Definition at line 1173 of file ligand_scoring.py.

def rmsd_details (   self)
Get a dictionary of RMSD score details (dictionaries), keyed by
model ligand (chain name, :class:`~ost.mol.ResNum`).

The value is a dictionary. For ligands that were assigned (mapped) to
the target, the dictionary contain the following information:

* `rmsd`: the RMSD score value.
* `lddt_lp`: the lDDT score of the ligand pocket (lDDT-LP).
* `bs_ref_res`: a list of residues (:class:`~ost.mol.ResidueHandle`)
  that define the binding site in the reference.
* `bs_ref_res_mapped`: a list of residues
  (:class:`~ost.mol.ResidueHandle`) in the reference binding site
  that could be mapped to the model.
* `bs_mdl_res_mapped`: a list of residues
  (:class:`~ost.mol.ResidueHandle`) in the model that were mapped to
  the reference binding site. The residues are in the same order as
  `bs_ref_res_mapped`.
* `bb_rmsd`: the RMSD of the binding site backbone after superposition
* `target_ligand`: residue handle of the target ligand.
* `model_ligand`: residue handle of the model ligand.
* `chain_mapping`: local chain mapping as a dictionary, with target
  chain name as key and model chain name as value.
* `transform`: transformation to superpose the model onto the target.
* `substructure_match`: whether the score is the result of a partial
  (substructure) match. A value of `True` indicates that the target
  ligand covers only part of the model, while `False` indicates a
  perfect match.
* `coverage`: the fraction of model atoms covered by the assigned
  target ligand, in the interval (0, 1]. If `substructure_match`
  is `False`, this will always be 1.
* `inconsistent_residues`: a list of tuples of mapped residues views
  (:class:`~ost.mol.ResidueView`) with residue names that differ
  between the reference and the model, respectively.
  The list is empty if all residue names match, which is guaranteed
  if `check_resnames=True`.
  Note: more binding site mappings may be explored during scoring,
  but only inconsistencies in the selected mapping are reported.
* `unassigned`: only if the scorer was instantiated with
  `unassigned=True`: `False`

If the scoring object was instantiated with `unassigned=True`, in
addition the unassigned ligands will be reported with a score of `None`
and the following information:

* `unassigned`: `True`,
* `reason_short`: a short token of the reason, see
  :attr:`unassigned_model_ligands` for details and meaning.
* `reason_long`: a human-readable text of the reason, see
  :attr:`unassigned_model_ligands` for details and meaning.
* `rmsd`: `None`

:rtype: :class:`dict`

Definition at line 1190 of file ligand_scoring.py.

def rmsd_matrix (   self)
Get the matrix of RMSD values.

Target ligands are in rows, model ligands in columns.

NaN values indicate that no RMSD could be computed (i.e. different
ligands).

:rtype: :class:`~numpy.ndarray`

Definition at line 1110 of file ligand_scoring.py.

def unassigned_model_ligand_descriptions (   self)
Get a human-readable description of why model ligands were
unassigned, as a dictionary keyed by the controlled dictionary
from :attr:`unassigned_model_ligands`.

Definition at line 1448 of file ligand_scoring.py.

def unassigned_model_ligands (   self)
Get a dictionary of model ligands not assigned to any target ligand,
keyed by model ligand (chain name, :class:`~ost.mol.ResNum`).

The assignment for the lDDT-PLI score is used (and is controlled
by the `rmsd_assignment` argument).

Each item contains a string from a controlled dictionary
about the reason for the absence of assignment.
A human-readable description can be obtained from the
:attr:`unassigned_model_ligand_descriptions` property.
Currently, the following reasons are reported:

* `no_ligand`: there was no ligand in the target.
* `disconnected`: the ligand graph is disconnected.
* `binding_site`: a potential assignment was found in the target, but
  there were no polymer residues in proximity of the ligand in the
  target.
* `model_representation`: a potential assignment was found in the target,
  but no representation of the binding site was found in the model.
  (I.e. the binding site was not modeled. Remember: the binding site
  is defined in the target structure, the position of the model ligand
  itself is ignored at this point.)
* `identity`: the ligand was not found in the target (by graph
  isomorphism). Check your ligand connectivity, and enable the
  `substructure_match` option if the target ligand is incomplete.
* `stoichiometry`: there was a possible assignment in the target, but
  the model target was already assigned to a different model ligand.
  This indicates different stoichiometries.
* `symmetries`: too many symmetries were found (by graph isomorphisms).
  Increase `max_symmetries`.

Some of these reasons can be overlapping, but a single reason will be
reported.

:rtype: :class:`dict`

Definition at line 1396 of file ligand_scoring.py.

def unassigned_target_ligand_descriptions (   self)
Get a human-readable description of why target ligands were
unassigned, as a dictionary keyed by the controlled dictionary
from :attr:`unassigned_target_ligands`.

Definition at line 1386 of file ligand_scoring.py.

def unassigned_target_ligands (   self)
Get a dictionary of target ligands not assigned to any model ligand,
keyed by target ligand (chain name, :class:`~ost.mol.ResNum`).

The assignment for the lDDT-PLI score is used (and is controlled
by the `rmsd_assignment` argument).

Each item contains a string from a controlled dictionary
about the reason for the absence of assignment.
A human-readable description can be obtained from the
:attr:`unassigned_target_ligand_descriptions` property.

Currently, the following reasons are reported:

* `no_ligand`: there was no ligand in the model.
* `disconnected`: the ligand graph was disconnected.
* `binding_site`: no residues were in proximity of the ligand.
* `model_representation`: no representation of the reference binding
  site was found in the model. (I.e. the binding site was not modeled.
  Remember: the binding site is defined in the target structure,
  the position of the model ligand itself is ignored at this point.)
* `identity`: the ligand was not found in the model (by graph
  isomorphism). Check your ligand connectivity, and enable the
  `substructure_match` option if the target ligand is incomplete.
* `stoichiometry`: there was a possible assignment in the model, but
  the model ligand was already assigned to a different target ligand.
  This indicates different stoichiometries.
* `symmetries`: too many symmetries were found (by graph isomorphisms).
  Increase `max_symmetries`.

Some of these reasons can be overlapping, but a single reason will be
reported.

:rtype: :class:`dict`

Definition at line 1336 of file ligand_scoring.py.

Field Documentation

binding_sites_topn

Definition at line 312 of file ligand_scoring.py.

check_resnames

Definition at line 306 of file ligand_scoring.py.

coverage_delta

Definition at line 318 of file ligand_scoring.py.

global_chain_mapping

Definition at line 313 of file ligand_scoring.py.

lddt_lp_radius

Definition at line 311 of file ligand_scoring.py.

lddt_pli_radius

Definition at line 310 of file ligand_scoring.py.

max_symmetries

Definition at line 316 of file ligand_scoring.py.

model

Definition at line 269 of file ligand_scoring.py.

model_ligands

Definition at line 294 of file ligand_scoring.py.

n_max_naive

Definition at line 315 of file ligand_scoring.py.

radius

Definition at line 309 of file ligand_scoring.py.

rename_ligand_chain

Definition at line 307 of file ligand_scoring.py.

resnum_alignments

Definition at line 305 of file ligand_scoring.py.

rmsd_assignment

Definition at line 314 of file ligand_scoring.py.

substructure_match

Definition at line 308 of file ligand_scoring.py.

target

Definition at line 276 of file ligand_scoring.py.

target_ligands

Definition at line 284 of file ligand_scoring.py.

unassigned

Definition at line 317 of file ligand_scoring.py.


The documentation for this class was generated from the following file: