OpenStructure
Inherited by LDDTPLIScorer and SCRMSDScorer.
Public Member Functions | |
def | __init__ (self, model, target, model_ligands=None, target_ligands=None, resnum_alignments=False, rename_ligand_chain=False, substructure_match=False, coverage_delta=0.2, max_symmetries=1e5) |
def | state_matrix (self) |
def | model_ligand_states (self) |
def | target_ligand_states (self) |
def | score_matrix (self) |
def | coverage_matrix (self) |
def | aux_matrix (self) |
def | assignment (self) |
def | score (self) |
def | aux (self) |
def | unassigned_target_ligands (self) |
def | unassigned_model_ligands (self) |
def | get_target_ligand_state_report (self, trg_lig_idx) |
def | get_model_ligand_state_report (self, mdl_lig_idx) |
def | guess_target_ligand_unassigned_reason (self, trg_lig_idx) |
def | guess_model_ligand_unassigned_reason (self, mdl_lig_idx) |
def | unassigned_model_ligands_reasons (self) |
def | unassigned_target_ligands_reasons (self) |
Data Fields | |
model | |
target | |
target_ligands | |
model_ligands | |
resnum_alignments | |
rename_ligand_chain | |
substructure_match | |
coverage_delta | |
max_symmetries | |
state_decoding | |
Scorer to compute various small molecule ligand (non polymer) scores.

.. note ::
  Extra requirements:

  - Python modules `numpy` and `networkx` must be available
    (e.g. use ``pip install numpy networkx``)

:class:`LigandScorer` is an abstract base class dealing with all the setup, data storage, enumerating ligand symmetries and target/model ligand matching/assignment. But actual score computation is delegated to child classes.

At the moment, two such classes are available:

* :class:`ost.mol.alg.ligand_scoring_lddtpli.LDDTPLIScorer`
  that assesses the conservation of protein-ligand contacts
* :class:`ost.mol.alg.ligand_scoring_scrmsd.SCRMSDScorer`
  that computes a binding-site superposed, symmetry-corrected RMSD.

All versus all scores are available through the lazily computed :attr:`score_matrix`. However, many things can go wrong, even something as simple as two ligands not matching. Error states therefore encode scoring issues. An issue for a particular ligand is indicated by a non-zero state in :attr:`model_ligand_states`/:attr:`target_ligand_states`. This invalidates pairwise scores of such a ligand with all other ligands. This and other issues in pairwise score computation are reported in :attr:`state_matrix`, which has the same size as :attr:`score_matrix`. A valid pairwise score can only be expected if the respective location is 0. The states and their meaning can be explored with code::

  for state_code, (short_desc, desc) in scorer_obj.state_decoding.items():
      print(state_code)
      print(short_desc)
      print(desc)

A common use case is to derive a one-to-one mapping between ligands in the model and the target for which :class:`LigandScorer` provides an automated assignment procedure. By default, only exact matches between target and model ligands are considered. This is a problem when the target only contains a subset of the expected atoms (for instance if atoms are missing in an experimental structure, which often happens in the PDB).
With `substructure_match=True`, complete model ligands can be scored against partial target ligands. One problem with this approach is that it is very easy to find good matches to small, irrelevant ligands like EDO, CO2 or GOL. The assignment algorithm therefore considers the coverage, expressed as the fraction of model ligand atoms covered in the target. Higher coverage matches are prioritized, but a match with a better score will be preferred if it falls within a window of `coverage_delta` (by default 0.2) of a worse-scoring match. For instance, with a delta of 0.2, a worse-scoring match with coverage 0.96 would be preferred over a better-scoring match with coverage 0.70, because 0.70 lies outside the window [0.76, 0.96].

Assumptions:

:class:`LigandScorer` generally assumes that the :attr:`~ost.mol.ResidueHandle.is_ligand` property is properly set on all the ligand atoms, and only ligand atoms. This is typically the case for entities loaded from mmCIF (tested with mmCIF files from the PDB and SWISS-MODEL). Legacy PDB files must contain `HET` headers (which is usually the case for files downloaded from the PDB but not elsewhere).

The class doesn't perform any cleanup of the provided structures. It is up to the caller to ensure that the data is clean and suitable for scoring. :ref:`Molck <molck>` should be used with extra care, as many of the options (such as `rm_non_std` or `map_nonstd_res`) can cause ligands to be removed from the structure. If cleanup with Molck is needed, ligands should be kept aside and passed separately. Non-ligand residues should be valid compounds with atom names following the naming conventions of the component dictionary. Non-standard residues are acceptable, and if the model contains a standard residue at that position, only atoms with matching names will be considered.

Unlike most of OpenStructure, this class does not assume that the ligands (either in the model or the target) are part of the PDB component dictionary. They may have arbitrary residue names.
Residue names do not have to match between the model and the target. Matching is based on the calculation of isomorphisms which depend on the atom element name and atom connectivity (bond order is ignored). It is up to the caller to ensure that the connectivity of atoms is properly set before passing any ligands to this class. Ligands with improper connectivity will lead to bogus results.

Note, however, that atom names should be unique within a residue (i.e. two distinct atoms cannot have the same atom name).

This leniency only applies to the ligands. The rest of the model and target structures (protein, nucleic acids) must still follow the usual rules and contain only residues from the compound library.

Although it isn't a requirement, hydrogen atoms should be removed from the structures. Here is an example code snippet that will perform a reasonable cleanup. Keep in mind that this is most likely not going to work as expected with entities loaded from PDB files, as the `is_ligand` flag is probably not set properly.
Here is an example of how to set up a scorer::

  from ost import io, conop
  from ost.mol.alg.ligand_scoring_scrmsd import SCRMSDScorer
  from ost.mol.alg import Molck, MolckSettings

  # Load data
  # Structure model in PDB format, containing the receptor only
  model = io.LoadPDB("path_to_model.pdb")
  # Ligand model as SDF file
  model_ligand = io.LoadEntity("path_to_ligand.sdf", format="sdf")
  # Target loaded from mmCIF, containing the ligand
  target = io.LoadMMCIF("path_to_target.cif")

  # Cleanup a copy of the structures
  cleaned_model = model.Copy()
  cleaned_target = target.Copy()
  molck_settings = MolckSettings(rm_unk_atoms=True,
                                 rm_non_std=False,
                                 rm_hyd_atoms=True,
                                 rm_oxt_atoms=False,
                                 rm_zero_occ_atoms=False,
                                 colored=False,
                                 map_nonstd_res=False,
                                 assign_elem=True)
  Molck(cleaned_model, conop.GetDefaultLib(), molck_settings)
  Molck(cleaned_target, conop.GetDefaultLib(), molck_settings)

  # Set up the scorer object (symmetry-corrected RMSD)
  model_ligands = [model_ligand.Select("ele != H")]
  sc = SCRMSDScorer(cleaned_model, cleaned_target, model_ligands)

  # Perform assignment and read respective scores
  for lig_pair in sc.assignment:
      trg_lig = sc.target_ligands[lig_pair[0]]
      mdl_lig = sc.model_ligands[lig_pair[1]]
      score = sc.score_matrix[lig_pair[0], lig_pair[1]]
      print(f"Score for {trg_lig} and {mdl_lig}: {score}")

:param model: Model structure - a deep copy is available as :attr:`model`. No additional processing (i.e. Molck), checks, stereochemistry checks or sanitization is performed on the input. Hydrogen atoms are kept.
:type model: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
:param target: Target structure - a deep copy is available as :attr:`target`. No additional processing (i.e. Molck), checks or sanitization is performed on the input. Hydrogen atoms are kept.
:type target: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
:param model_ligands: Model ligands, as a list of :class:`~ost.mol.ResidueHandle` belonging to the model entity.
  Can be instantiated with either a :class:`list` of :class:`~ost.mol.ResidueHandle`/:class:`ost.mol.ResidueView` or of :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`. If `None`, ligands will be extracted based on the :attr:`~ost.mol.ResidueHandle.is_ligand` flag (this is normally set properly in entities loaded from mmCIF).
:type model_ligands: :class:`list`
:param target_ligands: Target ligands, as a list of :class:`~ost.mol.ResidueHandle` belonging to the target entity. Can be instantiated with either a :class:`list` of :class:`~ost.mol.ResidueHandle`/:class:`ost.mol.ResidueView` or of :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView` containing a single residue each. If `None`, ligands will be extracted based on the :attr:`~ost.mol.ResidueHandle.is_ligand` flag (this is normally set properly in entities loaded from mmCIF).
:type target_ligands: :class:`list`
:param resnum_alignments: Whether alignments between chemically equivalent chains in *model* and *target* can be computed based on residue numbers. This can be assumed in benchmarking setups such as CAMEO/CASP.
:type resnum_alignments: :class:`bool`
:param rename_ligand_chain: If a residue with the same chain name and residue number as an explicitly passed model or target ligand exists in the structure, and `rename_ligand_chain` is False, a RuntimeError will be raised. If `rename_ligand_chain` is True, the ligand will be moved to a new chain instead, and the move will be logged to the console with SCRIPT level.
:type rename_ligand_chain: :class:`bool`
:param substructure_match: Set this to True to allow incomplete (i.e. partially resolved) target ligands.
:type substructure_match: :class:`bool`
:param coverage_delta: The coverage delta for partial ligand assignment.
:type coverage_delta: :class:`float`
:param max_symmetries: If more than that many isomorphisms exist for a target-ligand pair, it will be ignored and reported as unassigned.
:type max_symmetries: :class:`int`
Definition at line 8 of file ligand_scoring_base.py.
def __init__ | ( | self, model, target, model_ligands = None, target_ligands = None, resnum_alignments = False, rename_ligand_chain = False, substructure_match = False, coverage_delta = 0.2, max_symmetries = 1e5 | )
Definition at line 204 of file ligand_scoring_base.py.
def assignment | ( | self | ) |
Ligand assignment based on computed scores.

Implements a greedy algorithm to assign target and model ligands to each other. Starts from all valid ligand pairs, as indicated by a state of 0 in :attr:`state_matrix`. Each iteration first selects high coverage pairs: given max_coverage, defined as the highest coverage observed in the available pairs, all pairs with coverage in [max_coverage-*coverage_delta*, max_coverage] are selected. The best scoring pair among those is added to the assignment, and the whole process is repeated until there are no ligands left to assign.

:rtype: :class:`list` of :class:`tuple` (trg_lig_idx, mdl_lig_idx)
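The greedy procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the OpenStructure implementation: `greedy_assign`, its arguments and the `lower_is_better` flag are hypothetical names, and the matrices are plain nested lists with target ligands in rows and model ligands in columns. Whether a low or a high score counts as "best" depends on the child class, hence the flag.

```python
def greedy_assign(scores, coverage, states, coverage_delta=0.2,
                  lower_is_better=True):
    """Greedy one-to-one assignment sketch (hypothetical helper).

    All three inputs are nested lists indexed as [trg_idx][mdl_idx].
    Returns a list of (trg_idx, mdl_idx) tuples.
    """
    # Start from every pair whose state is 0, i.e. a valid score exists.
    candidates = [(t, m)
                  for t, row in enumerate(states)
                  for m, state in enumerate(row) if state == 0]
    assignment = []
    while candidates:
        # Select the pairs whose coverage lies in the window
        # [max_coverage - coverage_delta, max_coverage].
        max_cov = max(coverage[t][m] for t, m in candidates)
        window = [(t, m) for t, m in candidates
                  if coverage[t][m] >= max_cov - coverage_delta]
        # Pick the best-scoring pair among them.
        pick = min if lower_is_better else max
        best = pick(window, key=lambda p: scores[p[0]][p[1]])
        assignment.append(best)
        # Both ligands are now taken; drop every pair involving them.
        candidates = [(t, m) for t, m in candidates
                      if t != best[0] and m != best[1]]
    return assignment


# Example from the class description: two target ligands, one model ligand.
scores = [[0.5], [3.0]]        # illustrative RMSD-like values, lower is better
coverage = [[0.70], [0.96]]
states = [[0], [0]]
print(greedy_assign(scores, coverage, states))  # [(1, 0)]
```

With the default delta of 0.2, the pair with coverage 0.70 falls outside the window [0.76, 0.96], so the higher-coverage pair wins despite its worse score. Widening the delta to 0.3 would bring both pairs into the window and let the score decide.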
Definition at line 392 of file ligand_scoring_base.py.
def aux | ( | self | ) |
Get a dictionary of score details, keyed by model ligand.

Extract dict with something like: ``scorer.aux[lig.GetChain().GetName()][lig.GetNumber()]``. The returned info dicts are based on :attr:`~assignment`. The content is documented in the respective child class.

:rtype: :class:`dict`
Definition at line 467 of file ligand_scoring_base.py.
def aux_matrix | ( | self | ) |
Get the matrix of scorer specific auxiliary data.

Target ligands are in rows, model ligands in columns. Auxiliary data consists of arbitrary data dicts which allow a child class to provide additional information for a scored ligand pair. Empty dictionaries indicate that the child class simply didn't return anything or that no value could be computed (e.g. different ligands). In other words: values are only valid if the respective location in the :attr:`~state_matrix` is 0.

:rtype: :class:`~numpy.ndarray`
Definition at line 373 of file ligand_scoring_base.py.
def coverage_matrix | ( | self | ) |
Get the matrix of model ligand atom coverage in the target.

Target ligands are in rows, model ligands in columns. NaN values indicate that no value could be computed (e.g. different ligands). In other words: values are only valid if the respective location in :attr:`~state_matrix` is 0. If `substructure_match=False`, only full match isomorphisms are considered, and therefore only values of 1.0 can be observed.

:rtype: :class:`~numpy.ndarray`
Definition at line 355 of file ligand_scoring_base.py.
def get_model_ligand_state_report | ( | self, mdl_lig_idx | )
Get summary of states observed with respect to all target ligands. Mainly for debug purposes.

:param mdl_lig_idx: Index of model ligand for which the report should be generated
:type mdl_lig_idx: :class:`int`
Definition at line 521 of file ligand_scoring_base.py.
def get_target_ligand_state_report | ( | self, trg_lig_idx | )
Get summary of states observed with respect to all model ligands. Mainly for debug purposes.

:param trg_lig_idx: Index of target ligand for which the report should be generated
:type trg_lig_idx: :class:`int`
Definition at line 509 of file ligand_scoring_base.py.
def guess_model_ligand_unassigned_reason | ( | self, mdl_lig_idx | )
Makes an educated guess why the model ligand is not assigned.

This either returns actual error states or custom states that are derived from them.

:param mdl_lig_idx: Index of model ligand
:type mdl_lig_idx: :class:`int`
:returns: :class:`tuple` with two elements: 1) keyword 2) human readable sentence describing the issue, ("unknown","unknown") if nothing obvious can be found.
:raises: :class:`RuntimeError` if the specified model ligand is assigned
Definition at line 616 of file ligand_scoring_base.py.
def guess_target_ligand_unassigned_reason | ( | self, trg_lig_idx | )
Makes an educated guess why the target ligand is not assigned.

This either returns actual error states or custom states that are derived from them.

:param trg_lig_idx: Index of target ligand
:type trg_lig_idx: :class:`int`
:returns: :class:`tuple` with two elements: 1) keyword 2) human readable sentence describing the issue, ("unknown","unknown") if nothing obvious can be found.
:raises: :class:`RuntimeError` if the specified target ligand is assigned
Definition at line 552 of file ligand_scoring_base.py.
def model_ligand_states | ( | self | ) |
Encodes states of model ligands.

A non-zero state in any of the model ligands invalidates the full respective column in :attr:`~state_matrix`.

:rtype: :class:`~numpy.ndarray`
Definition at line 313 of file ligand_scoring_base.py.
def score | ( | self | ) |
Get a dictionary of score values, keyed by model ligand.

Extract score with something like: ``scorer.score[lig.GetChain().GetName()][lig.GetNumber()]``. The returned scores are based on :attr:`~assignment`.

:rtype: :class:`dict`
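The nested layout implied by the access pattern above (chain name first, then residue number) can be illustrated with a small sketch. It only mirrors the documented dictionary shape and is not the library's code: `build_score_dict` and `model_ligand_ids` are hypothetical, and plain integers stand in for the residue numbers returned by `GetNumber()`.

```python
def build_score_dict(assignment, score_matrix, model_ligand_ids):
    """Build {chain_name: {residue_number: score}} from an assignment.

    assignment        list of (trg_idx, mdl_idx) tuples
    score_matrix      nested list indexed as [trg_idx][mdl_idx]
    model_ligand_ids  list of (chain_name, residue_number) per model ligand
    """
    result = {}
    for trg_idx, mdl_idx in assignment:
        chain_name, resnum = model_ligand_ids[mdl_idx]
        # Create the per-chain dict on first use, then store the score.
        result.setdefault(chain_name, {})[resnum] = score_matrix[trg_idx][mdl_idx]
    return result


nan = float("nan")
assignment = [(0, 1), (1, 0)]
score_matrix = [[nan, 0.9], [1.7, nan]]
model_ligand_ids = [("L_1", 1), ("L_2", 1)]   # made-up chain names
scores = build_score_dict(assignment, score_matrix, model_ligand_ids)
print(scores["L_2"][1])  # 0.9
```

Model ligands that do not appear in the assignment get no entry in this sketch.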
Definition at line 445 of file ligand_scoring_base.py.
def score_matrix | ( | self | ) |
Get the matrix of scores.

Target ligands are in rows, model ligands in columns. NaN values indicate that no value could be computed (e.g. different ligands). In other words: values are only valid if the respective location in :attr:`~state_matrix` is 0.

:rtype: :class:`~numpy.ndarray`
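The rule that a score is only meaningful where the state is 0 can be sketched as a small filter. To keep the snippet free of dependencies it works on nested lists (a NumPy array can be converted with `ndarray.tolist()`); `valid_scores` is a hypothetical helper, and the non-zero state code used in the example is arbitrary.

```python
import math


def valid_scores(score_matrix, state_matrix):
    """Return {(trg_idx, mdl_idx): score} for pairs whose state is 0.

    Both arguments are nested lists indexed as [trg_idx][mdl_idx].
    """
    result = {}
    for t, (score_row, state_row) in enumerate(zip(score_matrix, state_matrix)):
        for m, (score, state) in enumerate(zip(score_row, state_row)):
            if state == 0:
                result[(t, m)] = score
    return result


nan = math.nan
scores = [[1.2, nan], [nan, 0.8]]
states = [[0, 3], [3, 0]]   # 3 is an arbitrary non-zero state for illustration
print(valid_scores(scores, states))  # {(0, 0): 1.2, (1, 1): 0.8}
```

Filtering on the state rather than on NaN keeps the reason for a missing score available through `state_decoding`.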
Definition at line 339 of file ligand_scoring_base.py.
def state_matrix | ( | self | ) |
Encodes states of ligand pairs.

Ligand pairs can be matched and a valid score can be expected if the respective location in this matrix is 0. Target ligands are in rows, model ligands in columns. States are encoded as integers <= 9. Larger numbers encode errors for child classes. Use something like ``self.state_decoding[3]`` to get a description.

:rtype: :class:`~numpy.ndarray`
Definition at line 297 of file ligand_scoring_base.py.
def target_ligand_states | ( | self | ) |
Encodes states of target ligands.

A non-zero state in any of the target ligands invalidates the full respective row in :attr:`~state_matrix`.

:rtype: :class:`~numpy.ndarray`
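How per-ligand states invalidate whole rows and columns can be sketched as follows. This is an illustration of the rule only, with a hypothetical helper `pair_validity_mask` and arbitrary non-zero state codes. A `True` entry means the pair is not ruled out by a per-ligand state; pairwise issues recorded in `state_matrix` can still invalidate it.

```python
def pair_validity_mask(target_states, model_states):
    """Return a nested list [trg_idx][mdl_idx] of booleans.

    An entry is True only if both the target ligand (row) and the model
    ligand (column) have state 0.
    """
    return [[t_state == 0 and m_state == 0 for m_state in model_states]
            for t_state in target_states]


target_states = [0, 4]      # second target ligand has an issue
model_states = [0, 0, 2]    # third model ligand has an issue
for row in pair_validity_mask(target_states, model_states):
    print(row)
# [True, True, False]
# [False, False, False]
```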
Definition at line 326 of file ligand_scoring_base.py.
def unassigned_model_ligands | ( | self | ) |
Get indices of model ligands which are not assigned.

:rtype: :class:`list` of :class:`int`
Definition at line 500 of file ligand_scoring_base.py.
def unassigned_model_ligands_reasons | ( | self | ) |
Definition at line 680 of file ligand_scoring_base.py.
def unassigned_target_ligands | ( | self | ) |
Get indices of target ligands which are not assigned.

:rtype: :class:`list` of :class:`int`
Definition at line 490 of file ligand_scoring_base.py.
def unassigned_target_ligands_reasons | ( | self | ) |
Definition at line 693 of file ligand_scoring_base.py.
coverage_delta |
Definition at line 248 of file ligand_scoring_base.py.
max_symmetries |
Definition at line 249 of file ligand_scoring_base.py.
model |
Definition at line 210 of file ligand_scoring_base.py.
model_ligands |
Definition at line 235 of file ligand_scoring_base.py.
rename_ligand_chain |
Definition at line 246 of file ligand_scoring_base.py.
resnum_alignments |
Definition at line 245 of file ligand_scoring_base.py.
state_decoding |
Definition at line 282 of file ligand_scoring_base.py.
substructure_match |
Definition at line 247 of file ligand_scoring_base.py.
target |
Definition at line 217 of file ligand_scoring_base.py.
target_ligands |
Definition at line 225 of file ligand_scoring_base.py.