OpenStructure
|
Public Member Functions | |
def | __init__ |
def | chem_mapping |
def | chem_mapping |
def | ent_to_cm_1 |
def | ent_to_cm_1 |
def | ent_to_cm_2 |
def | ent_to_cm_2 |
def | symm_1 |
def | symm_2 |
def | SetSymmetries |
def | chain_mapping |
def | chain_mapping |
def | chain_mapping_scheme |
def | alignments |
def | alignments |
def | mapped_residues |
def | mapped_residues |
def | global_score |
def | best_score |
def | superposition |
def | clustalw_bin |
def | clustalw_bin |
def | GetOligoLDDTScorer |
Data Fields | |
qs_ent_1 | |
qs_ent_2 | |
res_num_alignment | |
calpha_only | |
max_ca_per_chain_for_cm | |
max_mappings_extensive | |
Object to compute QS scores.

Simple usage without any precomputed contacts, symmetries and mappings:

.. code-block:: python

  import ost
  from ost.mol.alg import qsscoring

  # load two biounits to compare
  ent_full = ost.io.LoadPDB('3ia3', remote=True)
  ent_1 = ent_full.Select('cname=A,D')
  ent_2 = ent_full.Select('cname=B,C')
  # get score
  ost.PushVerbosityLevel(3)
  try:
    qs_scorer = qsscoring.QSscorer(ent_1, ent_2)
    ost.LogScript('QSscore:', str(qs_scorer.global_score))
    ost.LogScript('Chain mapping used:', str(qs_scorer.chain_mapping))
    # commonly you want the QS global score as output
    qs_score = qs_scorer.global_score
  except qsscoring.QSscoreError as ex:
    # default handling: report failure and set score to 0
    ost.LogError('QSscore failed:', str(ex))
    qs_score = 0

For maximal performance when computing QS scores of the same entity with many others, it is advisable to construct and reuse :class:`QSscoreEntity` objects.

Any known / precomputed information can be filled into the appropriate attribute here (no checks done!). Otherwise most quantities are computed on first access and cached (lazy evaluation). Setters are provided to set values with extra checks (e.g. :func:`SetSymmetries`).

All necessary seq. alignments are done by global BLOSUM62-based alignment. A multiple sequence alignment is performed with ClustalW unless :attr:`chain_mapping` is provided manually. You will need to have an executable ``clustalw`` or ``clustalw2`` in your ``PATH`` or you must set :attr:`clustalw_bin` accordingly. Otherwise an exception (:class:`ost.settings.FileNotFound`) is thrown.

Formulas for QS scores:

::

  - QS_best = weighted_scores / (weight_sum + weight_extra_mapped)
  - QS_global = weighted_scores / (weight_sum + weight_extra_all)
  -> weighted_scores = sum(w(min(d1,d2)) * (1 - abs(d1-d2)/12)) for shared
  -> weight_sum = sum(w(min(d1,d2))) for shared
  -> weight_extra_mapped = sum(w(d)) for all mapped but non-shared
  -> weight_extra_all = sum(w(d)) for all non-shared
  -> w(d) = 1 if d <= 5, exp(-2 * ((d-5.0)/4.28)^2) else

In the formulas above:

* "d": CA/CB-CA/CB distance of an "inter-chain contact" ("d1", "d2" for "shared" contacts).
* "mapped": we could map chains of two structures and align residues in :attr:`alignments`.
* "shared": pairs of residues which are "mapped" and have "inter-chain contact" in both structures.
* "inter-chain contact": CB-CB pairs (CA for GLY) with distance <= 12 A (fallback to CA-CA if :attr:`calpha_only` is True).
* "w(d)": weighting function (prob. of 2 res. to interact given CB distance) from `Xu et al. 2009 <https://dx.doi.org/10.1016%2Fj.jmb.2008.06.002>`_.

:param ent_1: First structure to be scored.
:type ent_1: :class:`QSscoreEntity`, :class:`~ost.mol.EntityHandle` or :class:`~ost.mol.EntityView`
:param ent_2: Second structure to be scored.
:type ent_2: :class:`QSscoreEntity`, :class:`~ost.mol.EntityHandle` or :class:`~ost.mol.EntityView`
:param res_num_alignment: Sets :attr:`res_num_alignment`

:raises: :class:`QSscoreError` if input structures are invalid or are monomers or have issues that make it impossible for a QS score to be computed.

.. attribute:: qs_ent_1

  :class:`QSscoreEntity` object for *ent_1* given at construction. If entity names (:attr:`~QSscoreEntity.original_name`) are not unique, we set it to 'pdb_1' using :func:`~QSscoreEntity.SetName`.

.. attribute:: qs_ent_2

  :class:`QSscoreEntity` object for *ent_2* given at construction. If entity names (:attr:`~QSscoreEntity.original_name`) are not unique, we set it to 'pdb_2' using :func:`~QSscoreEntity.SetName`.

.. attribute:: calpha_only

  True if any of the two structures is CA-only (after cleanup).

  :type: :class:`bool`

.. attribute:: max_ca_per_chain_for_cm

  Maximal number of CA atoms to use in each chain to determine chain mappings. Setting this to -1 disables the limit. Limiting it speeds up determination of symmetries and chain mappings. By default it is set to 100.

  :type: :class:`int`

.. attribute:: max_mappings_extensive

  Maximal number of chain mappings to test for 'extensive' :attr:`chain_mapping_scheme`. The extensive chain mapping search must in the worst case check O(N^2) * O(N!) possible mappings for complexes with N chains. Two octamers without symmetry would require 322560 mappings to be checked. To limit computations, a :class:`QSscoreError` is thrown if we try more than the maximal number of chain mappings. The value must be set before the first use of :attr:`chain_mapping`. By default it is set to 100000.

  :type: :class:`int`

.. attribute:: res_num_alignment

  Forces each alignment in :attr:`alignments` to be based on residue numbers instead of using a global BLOSUM62-based alignment.

  :type: :class:`bool`
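The QS score formulas in the class description can be sketched in plain Python, independent of OpenStructure. The function names (``w``, ``qs_scores``), the way contacts are passed in as lists of distances, and the handling of an empty denominator are assumptions made for illustration; this is not the library's implementation.

```python
import math


def w(d):
    """Weight of a contact at distance d (in Angstrom), as in the formulas."""
    if d <= 5.0:
        return 1.0
    return math.exp(-2.0 * ((d - 5.0) / 4.28) ** 2)


def qs_scores(shared, mapped_nonshared, unmapped_nonshared):
    """Return (QS_best, QS_global) for hypothetical contact lists.

    shared: list of (d1, d2) distances of contacts found in both structures
    mapped_nonshared: distances of contacts between mapped residues that
        exist in only one structure
    unmapped_nonshared: distances of all remaining non-shared contacts
    """
    weighted_scores = sum(
        w(min(d1, d2)) * (1.0 - abs(d1 - d2) / 12.0) for d1, d2 in shared
    )
    weight_sum = sum(w(min(d1, d2)) for d1, d2 in shared)
    weight_extra_mapped = sum(w(d) for d in mapped_nonshared)
    # "all non-shared" = mapped non-shared + everything that is not mapped
    weight_extra_all = weight_extra_mapped + sum(w(d) for d in unmapped_nonshared)

    denom_best = weight_sum + weight_extra_mapped
    denom_global = weight_sum + weight_extra_all
    # Assumption: return 0 when there are no contacts at all
    qs_best = weighted_scores / denom_best if denom_best > 0 else 0.0
    qs_global = weighted_scores / denom_global if denom_global > 0 else 0.0
    return qs_best, qs_global
```

Because ``weight_extra_all`` includes everything in ``weight_extra_mapped``, ``QS_global`` can never exceed ``QS_best`` in this sketch.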
Definition at line 46 of file qsscoring.py.
def __init__ | ( | self, | |
ent_1, | |||
ent_2, | |||
res_num_alignment = False |
|||
) |
Definition at line 170 of file qsscoring.py.
def alignments | ( | self | ) |
List of successful sequence alignments using :attr:`chain_mapping`. There will be one alignment for each mapped chain and they are ordered by their chain names in :attr:`qs_ent_1`. The first sequence of each alignment belongs to :attr:`qs_ent_1` and the second one to :attr:`qs_ent_2`. The sequences are named according to the mapped chain names and have views attached into :attr:`QSscoreEntity.ent` of :attr:`qs_ent_1` and :attr:`qs_ent_2`. If :attr:`res_num_alignment` is False, each alignment is performed using a global BLOSUM62-based alignment. Otherwise, the positions in the alignment sequences are simply given by the residue number so that residues with matching numbers are aligned. :getter: Computed on first use (cached) :type: :class:`list` of :class:`~ost.seq.AlignmentHandle`
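To illustrate what alignment by residue number means, here is a minimal sketch in plain Python. The helper ``align_by_resnum`` and its input format (a dict from residue number to one-letter code) are hypothetical and only mimic the idea that the alignment column is determined by the residue number; the library works on :class:`~ost.seq.AlignmentHandle` objects instead.

```python
def align_by_resnum(res_1, res_2):
    """Return two gapped strings where column i corresponds to residue number.

    res_1, res_2: dict {residue number: one-letter code} (hypothetical input)
    """
    if not res_1 or not res_2:
        return "", ""
    lo = min(min(res_1), min(res_2))
    hi = max(max(res_1), max(res_2))
    # A residue missing in one chain becomes a gap in that sequence
    s1 = "".join(res_1.get(i, "-") for i in range(lo, hi + 1))
    s2 = "".join(res_2.get(i, "-") for i in range(lo, hi + 1))
    return s1, s2


s1, s2 = align_by_resnum({1: "M", 2: "K", 4: "L"}, {2: "K", 3: "T", 4: "L"})
```

In this example residues 2 and 4 end up in the same columns of both sequences, regardless of sequence similarity.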
Definition at line 438 of file qsscoring.py.
def alignments | ( | self, | |
alignments | |||
) |
Definition at line 465 of file qsscoring.py.
def best_score | ( | self | ) |
QS-score without penalties. Like :attr:`global_score`, but neglecting additional residues or chains in one of the biounits (i.e. the score is calculated considering only mapped chains and residues). :getter: Computed on first use (cached) :type: :class:`float` :raises: :class:`QSscoreError` if only one chain is mapped
Definition at line 506 of file qsscoring.py.
def chain_mapping | ( | self | ) |
Mapping from :attr:`ent_to_cm_1` to :attr:`ent_to_cm_2`.

Properties:

- Mapping is between chains of same chem. group (see :attr:`chem_mapping`)
- Each chain can appear only once in mapping
- All chains of the complex with fewer chains are mapped
- Symmetry (:attr:`symm_1`, :attr:`symm_2`) is taken into account

Details on algorithms used to find mapping:

- We try all pairs of chem. mapped chains within symmetry group and get superpose-transformation for them
- First option: check for "sufficient overlap" of other chain-pairs

  - For each chain-pair defined above: apply superposition to full oligomer and map chains based on structural overlap
  - Structural overlap = X% of residues in second oligomer covered within Y Angstrom of a (chem. mapped) chain in first oligomer. We successively try (X,Y) = (80,4), (40,6) and (20,8) to be less and less strict in mapping (warning shown for most permissive one).
  - If multiple possible mappings are found, we choose the one which leads to the lowest multi-chain-RMSD given the superposition

- Fallback option: try all mappings to find minimal multi-chain-RMSD (warning shown)

  - For each chain-pair defined above: apply superposition, try all (!) possible chain mappings (within symmetry group) and keep mapping with lowest multi-chain-RMSD
  - Repeat procedure above to resolve symmetry. Within the symmetry group we can use the chain mapping computed before and we just need to find which symmetry group in first oligomer maps to which in the second one. We again try all possible combinations.

- Limitations:

  - Trying all possible mappings scales factorially with the number of chains. We throw an exception if there are too many combinations (e.g. octamer vs octamer with no usable symmetry)
  - The mapping is forced: the "best" mapping will be chosen independently of how badly the chains fit in terms of multi-chain-RMSD
  - As a result, such a forced mapping can lead to a large range of resulting QS scores. An extreme example was observed between 1on3.1 and 3u9r.1, where :attr:`global_score` can range from 0.12 to 0.43 for mappings with very similar multi-chain-RMSD.

:getter: Computed on first use (cached)
:type: :class:`dict` with key / value = :class:`str` (chain names, key for :attr:`ent_to_cm_1`, value for :attr:`ent_to_cm_2`)
:raises: :class:`QSscoreError` if there are too many combinations to check to find a chain mapping (see :attr:`max_mappings_extensive`).
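The size of the fallback search can be estimated with a few lines of Python. The count used here, N seed chain pairs times N! chain permutations, is an assumption that happens to reproduce the figure of 322560 mappings quoted for two octamers in the class description; it is not necessarily how the library counts. ``RuntimeError`` stands in for :class:`QSscoreError`, and both function names are hypothetical.

```python
import math

# Default of max_mappings_extensive as quoted in the class description
MAX_MAPPINGS_EXTENSIVE = 100000


def count_extensive_mappings(n_chains):
    """Worst-case count, assuming n seed pairs times n! chain permutations."""
    return n_chains * math.factorial(n_chains)


def check_mapping_budget(n_chains, limit=MAX_MAPPINGS_EXTENSIVE):
    """Return the number of mappings or raise if it exceeds the limit."""
    n_mappings = count_extensive_mappings(n_chains)
    if n_mappings > limit:
        # Stand-in for the QSscoreError raised by the library
        raise RuntimeError(
            "Too many chain mappings to test: %d > %d" % (n_mappings, limit)
        )
    return n_mappings
```

Under this assumption a pair of hexamers (4320 mappings) stays well below the default limit, while a pair of octamers without usable symmetry exceeds it.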
Definition at line 343 of file qsscoring.py.
def chain_mapping | ( | self, | |
chain_mapping | |||
) |
Definition at line 405 of file qsscoring.py.
def chain_mapping_scheme | ( | self | ) |
Mapping scheme used to get :attr:`chain_mapping`.

Possible values:

- 'strict': 80% overlap needed within 4 Angstrom (overlap based mapping).
- 'tolerant': 40% overlap needed within 6 Angstrom (overlap based mapping).
- 'permissive': 20% overlap needed within 8 Angstrom (overlap based mapping). It's best if you check mapping manually!
- 'extensive': Extensive search used for mapping detection (fallback). This approach has known limitations and may be removed in future versions. Mapping should be checked manually!
- 'user': :attr:`chain_mapping` was set by user before first use of this attribute.

:getter: Computed with :attr:`chain_mapping` on first use (cached)
:type: :class:`str`
:raises: :class:`QSscoreError` as in :attr:`chain_mapping`.
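The cascade of overlap thresholds can be sketched as follows. The function ``pick_scheme``, its input (per-residue distances to the closest residue of the superposed partner chain) and the use of inclusive comparisons are assumptions for illustration only; the library computes overlap on superposed entities and may treat boundaries differently.

```python
# (scheme name, required fraction of covered residues, distance cutoff in A)
SCHEMES = [
    ("strict", 0.80, 4.0),
    ("tolerant", 0.40, 6.0),
    ("permissive", 0.20, 8.0),
]


def pick_scheme(min_dists):
    """Return the first scheme whose overlap criterion is fulfilled.

    min_dists: hypothetical list with, for each residue of a chain in the
        second oligomer, the distance to the closest residue of the chain
        in the first oligomer after superposition.
    """
    if not min_dists:
        return "extensive"
    for name, fraction, cutoff in SCHEMES:
        covered = sum(1 for d in min_dists if d <= cutoff)
        if covered >= fraction * len(min_dists):
            return name
    # No overlap-based scheme worked: fall back to the extensive search
    return "extensive"
```

Trying the strictest criterion first means a looser scheme is only reported when all stricter ones have failed.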
Definition at line 409 of file qsscoring.py.
def chem_mapping | ( | self | ) |
Inter-complex mapping of chemical groups.

Each group (see :attr:`QSscoreEntity.chem_groups`) is mapped according to highest sequence identity. Alignment is between longest sequences in groups.

Limitations:

- If the numbers of groups differ, we map only the groups for the complex with fewer groups (the rest is considered unmapped and shown as a warning)
- The mapping is forced: the "best" mapping will be chosen independently of how low the seq. identity may be

:getter: Computed on first use (cached)
:type: :class:`dict` with key = :class:`tuple` of chain names in :attr:`qs_ent_1` and value = :class:`tuple` of chain names in :attr:`qs_ent_2`.
:raises: :class:`QSscoreError` if we end up having no chains for either entity in the mapping (can happen if chains do not have CA atoms).
Definition at line 211 of file qsscoring.py.
def chem_mapping | ( | self, | |
chem_mapping | |||
) |
Definition at line 237 of file qsscoring.py.
def clustalw_bin | ( | self | ) |
Full path to ``clustalw`` or ``clustalw2`` executable to use for multiple sequence alignments (unless :attr:`chain_mapping` is provided manually). :getter: Located in path on first use (cached) :type: :class:`str`
Definition at line 542 of file qsscoring.py.
def clustalw_bin | ( | self, | |
clustalw_bin | |||
) |
Definition at line 555 of file qsscoring.py.
def ent_to_cm_1 | ( | self | ) |
Subset of :attr:`qs_ent_1` used to compute chain mapping and symmetries.

Properties:

- Includes only residues aligned according to :attr:`chem_mapping`
- Includes only 1 CA atom per residue
- Has at least 5 and at most :attr:`max_ca_per_chain_for_cm` atoms per chain
- All chains of the same chemical group have the same number of atoms (also in :attr:`ent_to_cm_2` according to :attr:`chem_mapping`)
- All chains appearing in :attr:`chem_mapping` appear in this entity (so the two can be safely used together)

This entity might be transformed (i.e. all positions rotated/translated by same transformation matrix) if this can speed up computations. So do not assume fixed global positions (but relative distances will remain fixed).

:getter: Computed on first use (cached)
:type: :class:`~ost.mol.EntityHandle`
:raises: :class:`QSscoreError` if any chain ends up having fewer than 5 residues.
Definition at line 241 of file qsscoring.py.
def ent_to_cm_1 | ( | self, | |
ent_to_cm_1 | |||
) |
Definition at line 268 of file qsscoring.py.
def ent_to_cm_2 | ( | self | ) |
Subset of :attr:`qs_ent_2` used to compute chain mapping and symmetries (see :attr:`ent_to_cm_1` for details).
Definition at line 272 of file qsscoring.py.
def ent_to_cm_2 | ( | self, | |
ent_to_cm_2 | |||
) |
Definition at line 281 of file qsscoring.py.
def GetOligoLDDTScorer | ( | self, | |
settings, | |||
penalize_extra_chains = True |
|||
) |
:return: :class:`OligoLDDTScorer` object, set up for this QS scoring problem. The scorer is set up with :attr:`qs_ent_1` as the reference and :attr:`qs_ent_2` as the model.
:param settings: Passed to :class:`OligoLDDTScorer` constructor.
:param penalize_extra_chains: Passed to :class:`OligoLDDTScorer` constructor.
Definition at line 558 of file qsscoring.py.
def global_score | ( | self | ) |
QS-score with penalties. The range of the score is between 0 (i.e. no interface residues are shared between biounits) and 1 (i.e. the interfaces are identical). The global QS-score is computed applying penalties when interface residues or entire chains are missing (i.e. anything that is not mapped in :attr:`mapped_residues` / :attr:`chain_mapping`) in one of the biounits. :getter: Computed on first use (cached) :type: :class:`float` :raises: :class:`QSscoreError` if only one chain is mapped
Definition at line 487 of file qsscoring.py.
def mapped_residues | ( | self | ) |
Mapping of shared residues in :attr:`alignments`.

:getter: Computed on first use (cached)
:type: :class:`dict` *mapped_residues[c1][r1] = r2* with:
       *c1* = Chain name in first entity (= first sequence in aln),
       *r1* = Residue number in first entity,
       *r2* = Residue number in second entity
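The nested dictionary layout *mapped_residues[c1][r1] = r2* can be illustrated with a small, self-contained sketch. The helper ``map_residues`` is hypothetical and assumes contiguous residue numbering starting at a given offset; the library instead reads residue numbers from the views attached to each alignment.

```python
def map_residues(seq_1, seq_2, start_1=1, start_2=1):
    """Return {resnum in seq_1: resnum in seq_2} for columns without gaps.

    seq_1, seq_2: gapped strings of equal length ('-' marks a gap)
    start_1, start_2: residue number of the first residue (assumed contiguous)
    """
    if len(seq_1) != len(seq_2):
        raise ValueError("Aligned sequences must have the same length")
    mapping = {}
    r1, r2 = start_1, start_2
    for a, b in zip(seq_1, seq_2):
        if a != "-" and b != "-":
            mapping[r1] = r2
        # Advance the residue counter only where a residue is present
        if a != "-":
            r1 += 1
        if b != "-":
            r2 += 1
    return mapping


# One entry per mapped chain of the first entity, keyed by its chain name
mapped_residues = {"A": map_residues("MK-LV", "MKTL-")}
```

Here residue 3 of chain A maps to residue 4 in the second entity because of the inserted residue, and residue 4 of chain A has no partner.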
Definition at line 469 of file qsscoring.py.
def mapped_residues | ( | self, | |
mapped_residues | |||
) |
Definition at line 483 of file qsscoring.py.
def SetSymmetries | ( | self, | |
symm_1, | |||
symm_2 | |||
) |
Set user-provided symmetry groups.

These groups are restricted to chain names appearing in :attr:`ent_to_cm_1` and :attr:`ent_to_cm_2` respectively. They are only valid if they cover all chains and both *symm_1* and *symm_2* have the same lengths of symmetry group tuples. Otherwise the trivial symmetry group is used (see :attr:`symm_1`).

:param symm_1: Value to set for :attr:`symm_1`.
:param symm_2: Value to set for :attr:`symm_2`.
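The validity rules stated above can be written down as a small check. This is a sketch under assumptions: the function names are hypothetical, chain names are passed in explicitly, and the restriction of the groups to chains present in :attr:`ent_to_cm_1` / :attr:`ent_to_cm_2` is left out.

```python
def are_valid_symmetries(symm_1, symm_2, chains_1, chains_2):
    """Check coverage of all chains and equal tuple lengths in both sets."""

    def covers_all(symm, chains):
        flat = [c for group in symm for c in group]
        # Every chain must appear exactly once
        return len(flat) == len(set(flat)) and set(flat) == set(chains)

    lengths = {len(g) for g in symm_1} | {len(g) for g in symm_2}
    return (
        len(lengths) == 1
        and covers_all(symm_1, chains_1)
        and covers_all(symm_2, chains_2)
    )


def symmetries_or_trivial(symm_1, symm_2, chains_1, chains_2):
    """Return the given groups if valid, else one tuple with all chains each."""
    if are_valid_symmetries(symm_1, symm_2, chains_1, chains_2):
        return symm_1, symm_2
    return [tuple(chains_1)], [tuple(chains_2)]
```

For example, ``[('A', 'B'), ('C', 'D')]`` and ``[('E', 'F'), ('G', 'H')]`` pass the check for chains ``ABCD`` and ``EFGH``, while dropping a chain from either side falls back to the trivial groups.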
Definition at line 323 of file qsscoring.py.
def superposition | ( | self | ) |
Superposition result based on shared CA atoms in :attr:`alignments`. The superposition can be used to map :attr:`QSscoreEntity.ent` of :attr:`qs_ent_1` onto the one of :attr:`qs_ent_2`. Use :func:`ost.geom.Invert` if you need the opposite transformation. :getter: Computed on first use (cached) :type: :class:`ost.mol.alg.SuperpositionResult`
Definition at line 522 of file qsscoring.py.
def symm_1 | ( | self | ) |
Symmetry groups for :attr:`qs_ent_1` used to speed up chain mapping.

This is a list of chain-lists where each chain-list can be used to reconstruct the others via cyclic C or dihedral D symmetry. The first chain-list is used as a representative symmetry group. For heteromers, the group-members must contain all different seqres in oligomer.

Example: symm. groups [(A,B,C), (D,E,F), (G,H,I)] means that there are symmetry transformations to get (D,E,F) and (G,H,I) from (A,B,C).

Properties:

- All symmetry group tuples have the same length (num. of chains)
- All chains in :attr:`ent_to_cm_1` appear (w/o duplicates)
- For heteros: symmetry group tuples have all different chem. groups
- Trivial symmetry group = one tuple with all chains (used if inconsistent data provided or if no symmetry is found)
- Either compatible with :attr:`symm_2` or trivial symmetry groups used. Compatibility requires same lengths of symmetry group tuples and it must be possible to get an overlap (80% of residues covered within 6 A of a (chem. mapped) chain) of all chains in representative symmetry groups by superposing one pair of chains.

:getter: Computed on first use (cached)
:type: :class:`list` of :class:`tuple` of :class:`str` (chain names)
Definition at line 285 of file qsscoring.py.
def symm_2 | ( | self | ) |
Symmetry groups for :attr:`qs_ent_2` (see :attr:`symm_1` for details).
Definition at line 317 of file qsscoring.py.
calpha_only |
Definition at line 192 of file qsscoring.py.
max_ca_per_chain_for_cm |
Definition at line 193 of file qsscoring.py.
max_mappings_extensive |
Definition at line 194 of file qsscoring.py.
qs_ent_1 |
Definition at line 173 of file qsscoring.py.
qs_ent_2 |
Definition at line 177 of file qsscoring.py.
res_num_alignment |
Definition at line 191 of file qsscoring.py.