OpenStructure
Inherited by LDDTPLIScorer and SCRMSDScorer.
Data Fields
state_decoding
Scorer to compute various small molecule ligand (non polymer) scores.

:class:`LigandScorer` is an abstract base class dealing with all the setup, data storage, enumerating ligand symmetries and target/model ligand matching/assignment. The actual score computation is delegated to child classes.

At the moment, two such classes are available:

* :class:`ost.mol.alg.ligand_scoring_lddtpli.LDDTPLIScorer` that assesses the conservation of protein-ligand contacts (LDDT-PLI);
* :class:`ost.mol.alg.ligand_scoring_scrmsd.SCRMSDScorer` that computes a binding-site superposed, symmetry-corrected RMSD (BiSyRMSD) and ligand pocket LDDT (LDDT-LP).

All versus all scores are available through the lazily computed :attr:`score_matrix`. However, many things can go wrong, even something as simple as two ligands not matching. Error states therefore encode scoring issues. An issue for a particular ligand is indicated by a non-zero state in :attr:`model_ligand_states`/:attr:`target_ligand_states`. This invalidates pairwise scores of such a ligand with all other ligands. This and other issues in pairwise score computation are reported in :attr:`state_matrix`, which has the same size as :attr:`score_matrix`. A valid pairwise score can only be expected if the respective location is 0. The states and their meaning can be explored with code::

  for state_code, (short_desc, desc) in scorer_obj.state_decoding.items():
      print(state_code)
      print(short_desc)
      print(desc)

A common use case is to derive a one-to-one mapping between ligands in the model and the target for which :class:`LigandScorer` provides an automated :attr:`assignment` procedure.

By default, only exact matches between target and model ligands are considered. This is a problem when the target only contains a subset of the expected atoms (for instance if atoms are missing in an experimental structure, which often happens in the PDB).
With `substructure_match=True`, complete model ligands can be scored against partial target ligands. One problem with this approach is that it is very easy to find good matches to small, irrelevant ligands like EDO, CO2 or GOL. The assignment algorithm therefore considers the coverage, expressed as the fraction of model ligand atoms covered in the target. Higher coverage matches are prioritized, but a match with a better score will be preferred if it falls within a window of `coverage_delta` (by default 0.2) of a worse-scoring match. As a result, for instance, with a delta of 0.2, a low-score match with coverage 0.96 would be preferred over a high-score match with coverage 0.70.

Assumptions:

Unlike most of OpenStructure, this class does not assume that the ligands (either for the model or the target) are part of the PDB component dictionary. They may have arbitrary residue names. Residue names do not have to match between the model and the target. Matching is based on the calculation of isomorphisms which depend on the atom element name and atom connectivity (bond order is ignored). It is up to the caller to ensure that the connectivity of atoms is properly set before passing any ligands to this class. Ligands with improper connectivity will lead to bogus results.

This only applies to the ligand. The rest of the model and target structures (protein, nucleic acids) must still follow the usual rules and contain only residues from the compound library. Structures are cleaned up according to the constructor documentation. We advise using :func:`MMCIFPrep` and :func:`PDBPrep` for loading, which already clean hydrogens and, in the case of MMCIF, optionally extract ligands ready to be used by the :class:`LigandScorer` based on "non-polymer" entity types. In the case of the PDB file format, ligands must be loaded separately as SDF files.

Only polymers (protein and nucleic acids) of model and target are considered for ligand binding sites.
The :class:`ost.mol.alg.chain_mapping.ChainMapper` is used to enumerate possible mappings of these chains. In short: identical chains in the target are grouped based on pairwise sequence identity (see pep_seqid_thr/nuc_seqid_thr param). Each model chain is assigned to one of these groups (see mdl_map_pep_seqid_thr/mdl_map_nuc_seqid_thr param). To avoid spurious matches, only polymers of a certain length are considered in this matching procedure (see min_pep_length/min_nuc_length param). Shorter polymers are never mapped and do not contribute to scoring.

Here is an example of how to set up a scorer::

  from ost import io
  from ost.mol.alg.ligand_scoring_scrmsd import SCRMSDScorer
  from ost.mol.alg.scoring_base import MMCIFPrep, PDBPrep

  # Load data
  # Structure model in PDB format, containing the receptor only
  model = PDBPrep("path_to_model.pdb")
  # Ligand model as SDF file
  model_ligand = io.LoadEntity("path_to_ligand.sdf", format="sdf")
  # Target loaded from mmCIF, containing the ligand
  target, target_ligands = MMCIFPrep("path_to_target.cif",
                                     extract_nonpoly=True)

  # Set up scorer object and compute SCRMSD
  model_ligands = [model_ligand.Select("ele != H")]
  sc = SCRMSDScorer(model, target, model_ligands, target_ligands)

  # Perform assignment and read respective scores
  for lig_pair in sc.assignment:
      trg_lig = sc.target_ligands[lig_pair[0]]
      mdl_lig = sc.model_ligands[lig_pair[1]]
      score = sc.score_matrix[lig_pair[0], lig_pair[1]]
      print(f"Score for {trg_lig} and {mdl_lig}: {score}")

  # Check cleanup in model and target structure:
  print("model cleanup:", sc.model_cleanup_log)
  print("target cleanup:", sc.target_cleanup_log)

:param model: Model structure - a deep copy is available as :attr:`model`.
  The model undergoes the following cleanup steps, which depend on the :class:`ost.conop.CompoundLib` returned by :func:`ost.conop.GetDefaultLib`: 1) removal of hydrogens, 2) removal of residues for which there is no entry in :class:`ost.conop.CompoundLib`, 3) removal of residues that are not peptide linking or nucleotide linking according to :class:`ost.conop.CompoundLib`, 4) removal of atoms that are not defined for respective residues in :class:`ost.conop.CompoundLib`. Except for step 1), every cleanup is logged with :class:`ost.LogLevel` Warning and a report is available as :attr:`model_cleanup_log`.
:type model: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
:param target: Target structure - same processing as *model*.
:type target: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
:param model_ligands: Model ligands, as a list of :class:`ost.mol.ResidueHandle`/:class:`ost.mol.ResidueView`/:class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`. For :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`, each residue is considered to be an individual ligand. All ligands are copied into a separate :class:`ost.mol.EntityHandle` available as :attr:`model_ligand_ent` and the respective list of ligands is available as :attr:`model_ligands`.
:type model_ligands: :class:`list`
:param target_ligands: Target ligands, same processing as model ligands.
:type target_ligands: :class:`list`
:param resnum_alignments: Whether alignments between chemically equivalent chains in *model* and *target* can be computed based on residue numbers. This can be assumed in benchmarking setups such as CAMEO/CASP.
:type resnum_alignments: :class:`bool`
:param substructure_match: Set this to True to allow incomplete (i.e. partially resolved) target ligands.
:type substructure_match: :class:`bool`
:param coverage_delta: The coverage delta for partial ligand assignment.
:type coverage_delta: :class:`float`
:param max_symmetries: If more than this many isomorphisms exist for a target/model ligand pair, the pair is ignored and reported as unassigned.
:type max_symmetries: :class:`int`
:param min_pep_length: Relevant parameter if short peptides are involved in the polymer binding site. Minimum peptide length for a chain to be considered in chain mapping. The chain mapping algorithm first performs an all vs. all pairwise sequence alignment to identify "equal" chains within the target structure. We go for simple sequence identity there. Short sequences can be problematic as they may produce high sequence identity alignments by pure chance.
:type min_pep_length: :class:`int`
:param min_nuc_length: Same for nucleotides.
:type min_nuc_length: :class:`int`
:param pep_seqid_thr: Parameter that affects identification of identical chains in target - see :class:`ost.mol.alg.chain_mapping.ChainMapper`
:type pep_seqid_thr: :class:`float`
:param nuc_seqid_thr: Parameter that affects identification of identical chains in target - see :class:`ost.mol.alg.chain_mapping.ChainMapper`
:type nuc_seqid_thr: :class:`float`
:param mdl_map_pep_seqid_thr: Parameter that affects mapping of model chains to target chains - see :class:`ost.mol.alg.chain_mapping.ChainMapper`
:type mdl_map_pep_seqid_thr: :class:`float`
:param mdl_map_nuc_seqid_thr: Parameter that affects mapping of model chains to target chains - see :class:`ost.mol.alg.chain_mapping.ChainMapper`
:type mdl_map_nuc_seqid_thr: :class:`float`
Definition at line 216 of file ligand_scoring_base.py.
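The matching rule in the class description (isomorphisms based on element names and connectivity, with bond order ignored, capped by `max_symmetries`) can be illustrated with a small, self-contained sketch. This is not how OpenStructure implements it: the function name `enumerate_symmetries`, the list-based graph representation and the brute-force search are assumptions made for illustration, and returning `None` only mimics the documented "ignored and reported as unassigned" behaviour.

```python
from itertools import permutations


def enumerate_symmetries(elements_a, bonds_a, elements_b, bonds_b,
                         max_symmetries=1e5):
    """Return all element-preserving isomorphisms between two small graphs.

    Atoms are list indices, bonds are index pairs; bond order is ignored.
    Each result is a tuple m where atom i of A maps to atom m[i] of B.
    Returns None if more than max_symmetries mappings exist.
    """
    edges_a = {frozenset(bond) for bond in bonds_a}
    edges_b = {frozenset(bond) for bond in bonds_b}
    if len(elements_a) != len(elements_b) or len(edges_a) != len(edges_b):
        return []
    mappings = []
    # Brute force over all atom permutations: fine for a demo, O(n!) in general.
    for perm in permutations(range(len(elements_b))):
        # Elements must agree atom by atom.
        if any(elements_a[i] != elements_b[j] for i, j in enumerate(perm)):
            continue
        # Connectivity must be preserved under the mapping.
        mapped = {frozenset(perm[i] for i in edge) for edge in edges_a}
        if mapped == edges_b:
            mappings.append(perm)
            if len(mappings) > max_symmetries:
                return None
    return mappings


# Carbon dioxide with two different atom orderings (toy input).
target = (["O", "C", "O"], [(0, 1), (1, 2)])
model = (["C", "O", "O"], [(0, 1), (0, 2)])
symmetries = enumerate_symmetries(*target, *model)
print(symmetries)
```

Both oxygen atoms are interchangeable, so two mappings are found. Residue names play no role in this check, which mirrors the statement that they do not have to match between model and target.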
def __init__(self, model, target, model_ligands, target_ligands,
             resnum_alignments=False, substructure_match=False,
             coverage_delta=0.2, max_symmetries=1e5,
             rename_ligand_chain=False, min_pep_length=6, min_nuc_length=4,
             pep_seqid_thr=95., nuc_seqid_thr=95.,
             mdl_map_pep_seqid_thr=0., mdl_map_nuc_seqid_thr=0.)
Definition at line 402 of file ligand_scoring_base.py.
def assignment(self)
Ligand assignment based on computed scores.

Implements a greedy algorithm to assign target and model ligands to each other. Starts from each valid ligand pair as indicated by a state of 0 in :attr:`state_matrix`. Each iteration first selects high coverage pairs. Given max_coverage defined as the highest coverage observed in the available pairs, all pairs with coverage in [max_coverage-*coverage_delta*, max_coverage] are selected. The best scoring pair among those is added to the assignment and the whole process is repeated until there are no ligands left to assign.

:rtype: :class:`list` of :class:`tuple` (trg_lig_idx, mdl_lig_idx)
Definition at line 760 of file ligand_scoring_base.py.
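The greedy procedure described for `assignment` can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library code: matrices are nested lists indexed `[target][model]`, the function name `greedy_assignment` is made up, and the `higher_is_better` flag is a hypothetical stand-in for the fact that "best scoring" means the lowest value for an RMSD-like score and the highest for an LDDT-like score.

```python
def greedy_assignment(score, coverage, state, coverage_delta=0.2,
                      higher_is_better=True):
    """Greedy one-to-one assignment over nested lists indexed [target][model]."""
    # Only pairs with state 0 are valid candidates.
    candidates = [(t, m)
                  for t, row in enumerate(state)
                  for m, code in enumerate(row) if code == 0]
    choose = max if higher_is_better else min
    assignment = []
    while candidates:
        # Select the high coverage pairs first.
        max_cov = max(coverage[t][m] for t, m in candidates)
        window = [(t, m) for t, m in candidates
                  if coverage[t][m] >= max_cov - coverage_delta]
        # Pick the best scoring pair within the coverage window.
        best = choose(window, key=lambda pair: score[pair[0]][pair[1]])
        assignment.append(best)
        # Both ligands of the chosen pair are no longer available.
        candidates = [(t, m) for t, m in candidates
                      if t != best[0] and m != best[1]]
    return assignment


# One target ligand, two model ligands; higher score is better here.
ok = [[0, 0]]
outside_window = greedy_assignment([[0.3, 0.9]], [[0.96, 0.70]], ok)
inside_window = greedy_assignment([[0.3, 0.9]], [[0.96, 0.80]], ok)
print(outside_window, inside_window)
```

In the first call the better scoring pair has coverage 0.70, which lies outside the window [0.76, 0.96], so the pair with coverage 0.96 wins. In the second call both pairs fall inside the window and the better score decides.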
def aux(self)
Get a dictionary of score details, keyed by model ligand.

Extract dict with something like: ``scorer.aux[lig.GetChain().GetName()][lig.GetNumber()]``. The returned info dicts are based on :attr:`~assignment`. The content is documented in the respective child class.

:rtype: :class:`dict`
Definition at line 836 of file ligand_scoring_base.py.
def aux_matrix(self)
Get the matrix of scorer specific auxiliary data.

Target ligands are in rows, model ligands in columns. Auxiliary data consists of arbitrary data dicts which allow a child class to provide additional information for a scored ligand pair. Empty dictionaries indicate that the child class simply didn't return anything or that no value could be computed (e.g. different ligands). In other words: values are only valid if the respective location in :attr:`~state_matrix` is 0.

:rtype: :class:`~numpy.ndarray`
Definition at line 741 of file ligand_scoring_base.py.
def coverage_delta(self)
Given at :class:`LigandScorer` construction
Definition at line 653 of file ligand_scoring_base.py.
def coverage_matrix(self)
Get the matrix of model ligand atom coverage in the target.

Target ligands are in rows, model ligands in columns. NaN values indicate that no value could be computed (e.g. different ligands). In other words: values are only valid if the respective location in :attr:`~state_matrix` is 0. If `substructure_match=False`, only full match isomorphisms are considered, and therefore only values of 1.0 can be observed.

:rtype: :class:`~numpy.ndarray`
Definition at line 723 of file ligand_scoring_base.py.
def get_model_ligand_state_report(self, mdl_lig_idx)
Get summary of states observed with respect to all target ligands.

Mainly for debug purposes.

:param mdl_lig_idx: Index of model ligand for which report should be generated
:type mdl_lig_idx: :class:`int`
Definition at line 891 of file ligand_scoring_base.py.
def get_target_ligand_state_report(self, trg_lig_idx)
Get summary of states observed with respect to all model ligands.

Mainly for debug purposes.

:param trg_lig_idx: Index of target ligand for which report should be generated
:type trg_lig_idx: :class:`int`
Definition at line 879 of file ligand_scoring_base.py.
def guess_model_ligand_unassigned_reason(self, mdl_lig_idx)
Makes an educated guess why model ligand is not assigned.

This either returns actual error states or custom states that are derived from them. Currently, the following reasons are reported:

* `no_ligand`: there was no ligand in the target.
* `disconnected`: the ligand graph is disconnected.
* `identity`: the ligand was not found in the target (by graph or subgraph isomorphism). Check your ligand connectivity.
* `no_iso`: no full isomorphic match could be found. Try enabling `substructure_match=True` if the target ligand is incomplete.
* `symmetries`: too many symmetries were found (by graph isomorphisms). Try to increase `max_symmetries`.
* `stoichiometry`: there was a possible assignment in the target, but the target ligand was already assigned to a different model ligand. This indicates different stoichiometries.
* `no_contact` (LDDT-PLI only): there were no LDDT contacts between the binding site and the ligand, and LDDT-PLI is undefined.
* `target_binding_site` (SCRMSD only): a potential assignment was found in the target, but there were no polymer residues in proximity of the ligand in the target.
* `model_binding_site` (SCRMSD only): a potential assignment was found in the target, but no binding site was found in the model. Either the binding site was not modeled or the model ligand was positioned too far in combination with `full_bs_search=False`.

:param mdl_lig_idx: Index of model ligand
:type mdl_lig_idx: :class:`int`
:returns: :class:`tuple` with two elements: 1) keyword 2) human readable sentence describing the issue, ("unknown","unknown") if nothing obvious can be found.
:raises: :class:`RuntimeError` if specified model ligand is assigned
Definition at line 1006 of file ligand_scoring_base.py.
def guess_target_ligand_unassigned_reason(self, trg_lig_idx)
Makes an educated guess why target ligand is not assigned.

This either returns actual error states or custom states that are derived from them. Currently, the following reasons are reported:

* `no_ligand`: there was no ligand in the model.
* `disconnected`: the ligand graph was disconnected.
* `identity`: the ligand was not found in the model (by graph isomorphism). Check your ligand connectivity.
* `no_iso`: no full isomorphic match could be found. Try enabling `substructure_match=True` if the target ligand is incomplete.
* `symmetries`: too many symmetries were found (by graph isomorphisms). Try to increase `max_symmetries`.
* `stoichiometry`: there was a possible assignment in the model, but the model ligand was already assigned to a different target ligand. This indicates different stoichiometries.
* `no_contact` (LDDT-PLI only): there were no LDDT contacts between the binding site and the ligand, and LDDT-PLI is undefined.
* `target_binding_site` (SCRMSD only): no polymer residues were in proximity of the target ligand.
* `model_binding_site` (SCRMSD only): the binding site was not found in the model. Either the binding site was not modeled or the model ligand was positioned too far in combination with `full_bs_search=False`.

:param trg_lig_idx: Index of target ligand
:type trg_lig_idx: :class:`int`
:returns: :class:`tuple` with two elements: 1) keyword 2) human readable sentence describing the issue, ("unknown","unknown") if nothing obvious can be found.
:raises: :class:`RuntimeError` if specified target ligand is assigned
Definition at line 922 of file ligand_scoring_base.py.
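A typical way to use the reason keywords is to tally them over all unassigned target ligands. The sketch below relies only on the documented `unassigned_target_ligands` and `guess_target_ligand_unassigned_reason` members; the helper name `count_unassigned_target_reasons` and the `StubScorer` class (including its made-up sentences) are hypothetical and exist only so the example runs without OpenStructure installed.

```python
from collections import Counter


def count_unassigned_target_reasons(scorer):
    """Tally reason keywords for all unassigned target ligands."""
    counts = Counter()
    for trg_lig_idx in scorer.unassigned_target_ligands:
        # The method returns (keyword, human readable sentence).
        keyword, _sentence = scorer.guess_target_ligand_unassigned_reason(
            trg_lig_idx)
        counts[keyword] += 1
    return counts


class StubScorer:
    """Hypothetical stand-in for a scorer object; not part of OpenStructure."""
    unassigned_target_ligands = [0, 2, 3]
    _reasons = {
        0: ("no_iso", "illustrative sentence"),
        2: ("symmetries", "illustrative sentence"),
        3: ("no_iso", "illustrative sentence"),
    }

    def guess_target_ligand_unassigned_reason(self, trg_lig_idx):
        return self._reasons[trg_lig_idx]


reason_counts = dict(count_unassigned_target_reasons(StubScorer()))
print(reason_counts)
```

With a real scorer, the same helper could be called on an `SCRMSDScorer` or `LDDTPLIScorer` instance. Iterating over `unassigned_target_ligands` avoids the `RuntimeError` raised for assigned ligands.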
def max_symmetries(self)
Given at :class:`LigandScorer` construction
Definition at line 659 of file ligand_scoring_base.py.
def mdl_map_nuc_seqid_thr(self)
Given at :class:`LigandScorer` construction
Definition at line 641 of file ligand_scoring_base.py.
def mdl_map_pep_seqid_thr(self)
Given at :class:`LigandScorer` construction
Definition at line 635 of file ligand_scoring_base.py.
def min_nuc_length(self)
Given at :class:`LigandScorer` construction
Definition at line 617 of file ligand_scoring_base.py.
def min_pep_length(self)
Given at :class:`LigandScorer` construction
Definition at line 611 of file ligand_scoring_base.py.
def model(self)
Model receptor structure.

Processed according to docs in :class:`LigandScorer` constructor.
Definition at line 541 of file ligand_scoring_base.py.
def model_cleanup_log(self)
Reports residues/atoms that were removed in model during cleanup.

Residues and atoms are described as :class:`str` in format <chain_name>.<resnum>.<ins_code> (residue) and <chain_name>.<resnum>.<ins_code>.<aname> (atom).

:class:`dict` with keys:

* 'cleaned_residues': another :class:`dict` with keys:

  * 'no_clib': residues that have been removed because no entry could be found in :class:`ost.conop.CompoundLib`
  * 'not_linking': residues that have been removed because they're not peptide or nucleotide linking according to :class:`ost.conop.CompoundLib`

* 'cleaned_atoms': another :class:`dict` with keys:

  * 'unknown_atoms': atoms that have been removed as they're not part of their respective residue according to :class:`ost.conop.CompoundLib`
Definition at line 557 of file ligand_scoring_base.py.
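The nested layout of the cleanup log can be summarized with a few lines of Python. This is a sketch under assumptions: the innermost values are taken to be lists of descriptor strings, the helper name `summarize_cleanup_log` is invented, and the descriptors in `example_log` are made-up values (an empty insertion code is assumed to render as an empty field).

```python
def summarize_cleanup_log(cleanup_log):
    """Count removed residues/atoms per category in a cleanup log dict."""
    summary = {}
    for group in ("cleaned_residues", "cleaned_atoms"):
        for reason, removed in cleanup_log.get(group, {}).items():
            summary[f"{group}/{reason}"] = len(removed)
    return summary


# Made-up log following the documented key structure.
example_log = {
    "cleaned_residues": {
        "no_clib": ["A.301."],
        "not_linking": ["A.401.", "B.401."],
    },
    "cleaned_atoms": {
        "unknown_atoms": ["A.12..XX1"],
    },
}
cleanup_summary = summarize_cleanup_log(example_log)
print(cleanup_summary)
```

A non-empty 'not_linking' count is worth checking: a ligand left inside the receptor structure would be removed in that step and must be passed separately as a ligand.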
def model_ligand_states(self)
Encodes states of model ligands.

Non-zero state in any of the model ligands invalidates the full respective column in :attr:`~state_matrix`.

:rtype: :class:`~numpy.ndarray`
Definition at line 681 of file ligand_scoring_base.py.
def model_ligands(self)
Residues representing model ligands.

:class:`list` of :class:`ost.mol.ResidueHandle`
Definition at line 589 of file ligand_scoring_base.py.
def nuc_seqid_thr(self)
Given at :class:`LigandScorer` construction
Definition at line 629 of file ligand_scoring_base.py.
def pep_seqid_thr(self)
Given at :class:`LigandScorer` construction
Definition at line 623 of file ligand_scoring_base.py.
def resnum_alignments(self)
Given at :class:`LigandScorer` construction
Definition at line 605 of file ligand_scoring_base.py.
def score(self)
Get a dictionary of score values, keyed by model ligand.

Extract score with something like: ``scorer.score[lig.GetChain().GetName()][lig.GetNumber()]``. The returned scores are based on :attr:`~assignment`.

:rtype: :class:`dict`
Definition at line 814 of file ligand_scoring_base.py.
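How a dictionary keyed by chain name and residue number relates to `assignment` and `score_matrix` can be sketched as follows. This is an illustration, not the library code: the helper name `build_score_dict` is invented, model ligands are represented by hypothetical `(chain_name, residue_number)` tuples instead of residue handles, and the chain names and score values are made up.

```python
def build_score_dict(assignment, score_matrix, model_ligand_keys):
    """Collect scores of assigned pairs as scores[chain_name][resnum]."""
    scores = {}
    for trg_idx, mdl_idx in assignment:
        chain_name, resnum = model_ligand_keys[mdl_idx]
        # Rows are target ligands, columns are model ligands.
        scores.setdefault(chain_name, {})[resnum] = score_matrix[trg_idx][mdl_idx]
    return scores


nan = float("nan")
assignment = [(0, 1), (1, 0)]                 # (trg_lig_idx, mdl_lig_idx)
score_matrix = [[nan, 0.8], [1.7, nan]]       # toy values
model_ligand_keys = [("L_1", 1), ("L_2", 1)]  # hypothetical chain names
scores = build_score_dict(assignment, score_matrix, model_ligand_keys)
print(scores["L_2"][1])
```

Model ligands that do not appear in the assignment get no entry in such a dictionary, so lookups for them should be guarded accordingly.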
def score_matrix(self)
Get the matrix of scores.

Target ligands are in rows, model ligands in columns. NaN values indicate that no value could be computed (e.g. different ligands). In other words: values are only valid if the respective location in :attr:`~state_matrix` is 0.

:rtype: :class:`~numpy.ndarray`
Definition at line 707 of file ligand_scoring_base.py.
def state_matrix(self)
Encodes states of ligand pairs.

Ligand pairs can be matched and a valid score can be expected if the respective location in this matrix is 0. Target ligands are in rows, model ligands in columns. States defined by this base class are encoded as integers <= 9; larger numbers encode errors specific to child classes. Use something like ``self.state_decoding[3]`` to get a description.

:rtype: :class:`~numpy.ndarray`
Definition at line 665 of file ligand_scoring_base.py.
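The relation between the state matrix, the score matrix and `state_decoding` can be sketched with nested lists. The helper names are invented for illustration, and the `toy_decoding` entries are placeholders in the documented `(short_desc, desc)` shape; the real codes and descriptions should be read from `state_decoding` of an actual scorer.

```python
from collections import Counter


def valid_scores(score_matrix, state_matrix):
    """Return {(trg_idx, mdl_idx): score} for all pairs with state 0."""
    return {(t, m): score_matrix[t][m]
            for t, row in enumerate(state_matrix)
            for m, code in enumerate(row) if code == 0}


def tally_states(state_matrix, state_decoding):
    """Count how often each state (by short description) occurs."""
    return Counter(state_decoding[code][0]
                   for row in state_matrix for code in row)


nan = float("nan")
# Placeholder decoding, only the (short_desc, desc) shape is documented.
toy_decoding = {0: ("OK", "illustrative"), 1: ("identity", "illustrative")}
toy_states = [[0, 1], [1, 0]]
toy_scores = [[0.9, nan], [nan, 0.4]]

usable = valid_scores(toy_scores, toy_states)
state_counts = dict(tally_states(toy_states, toy_decoding))
print(usable, state_counts)
```

Filtering on the state rather than testing for NaN keeps the check independent of how a child class fills in invalid entries.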
def substructure_match(self)
Given at :class:`LigandScorer` construction
Definition at line 647 of file ligand_scoring_base.py.
def target(self)
Target receptor structure.

Processed according to docs in :class:`LigandScorer` constructor.
Definition at line 549 of file ligand_scoring_base.py.
def target_cleanup_log(self)
Same as :attr:`model_cleanup_log`, but for the target.
Definition at line 583 of file ligand_scoring_base.py.
def target_ligand_states(self)
Encodes states of target ligands.

Non-zero state in any of the target ligands invalidates the full respective row in :attr:`~state_matrix`.

:rtype: :class:`~numpy.ndarray`
Definition at line 694 of file ligand_scoring_base.py.
def target_ligands(self)
Residues representing target ligands.

:class:`list` of :class:`ost.mol.ResidueHandle`
Definition at line 597 of file ligand_scoring_base.py.
def unassigned_model_ligands(self)
Get indices of model ligands which are not assigned.

:rtype: :class:`list` of :class:`int`
Definition at line 870 of file ligand_scoring_base.py.
def unassigned_model_ligands_reasons(self)
Definition at line 1091 of file ligand_scoring_base.py.
def unassigned_target_ligands(self)
Get indices of target ligands which are not assigned.

:rtype: :class:`list` of :class:`int`
Definition at line 860 of file ligand_scoring_base.py.
def unassigned_target_ligands_reasons(self)
Definition at line 1104 of file ligand_scoring_base.py.
state_decoding
Definition at line 525 of file ligand_scoring_base.py.