Class to compute chain mappings
All algorithms are performed on processed structures which fulfill
criteria as given in constructor arguments (*min_pep_length*,
"min_nuc_length") and only contain residues which have all required backbone
atoms. for peptide residues thats N, CA, C and CB (no CB for GLY), for
nucleotide residues thats O5', C5', C4', C3' and O3'.
Chain mapping is a three step process:
* Group chemically identical chains in *target* using pairwise
alignments that are either computed with Needleman-Wunsch (NW) or
simply derived from residue numbers (*resnum_alignments* flag).
In case of NW, *pep_subst_mat*, *pep_gap_open* and *pep_gap_ext*
and their nucleotide equivalents are relevant. Two chains are
considered identical if they fulfill the thresholds given by
*pep_seqid_thr*, *pep_gap_thr*, their nucleotide equivalents
respectively. The grouping information is available as
attributes of this class.
* Map chains in an input model to these groups. Generating alignments
and the similarity criteria are the same as above. You can either
get the group mapping with :func:`GetChemMapping` or directly call
one of the full fletched one-to-one chain mapping functions which
execute that step internally.
* Obtain one-to-one mapping for chains in an input model and
*target* with one of the available mapping functions. Just to get an
idea of complexity. If *target* and *model* are octamers, there are
``8! = 40320`` possible chain mappings.
:param target: Target structure onto which models are mapped.
Computations happen on a selection only containing
polypeptides and polynucleotides.
:type target: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param resnum_alignments: Use residue numbers instead of
Needleman-Wunsch to compute pairwise
alignments. Relevant for :attr:`~chem_groups`
and related attributes.
:type resnum_alignments: :class:`bool`
:param pep_seqid_thr: Threshold used to decide when two chains are
identical. 95 percent tolerates the few mutations
crystallographers like to do.
:type pep_seqid_thr: :class:`float`
:param pep_gap_thr: Additional threshold to avoid gappy alignments with
high seqid. By default this is disabled (set to 1.0).
This threshold checks for a maximum allowed fraction
of gaps in any of the two sequences after stripping
terminal gaps. The reason for not just normalizing
seqid by the longer sequence is that one sequence
might be a perfect subsequence of the other but only
cover half of it.
:type pep_gap_thr: :class:`float`
:param nuc_seqid_thr: Nucleotide equivalent for *pep_seqid_thr*
:type nuc_seqid_thr: :class:`float`
:param nuc_gap_thr: Nucleotide equivalent for *nuc_gap_thr*
:type nuc_gap_thr: :class:`float`
:param pep_subst_mat: Substitution matrix to align peptide sequences,
irrelevant if *resnum_alignments* is True,
defaults to seq.alg.BLOSUM62
:type pep_subst_mat: :class:`ost.seq.alg.SubstWeightMatrix`
:param pep_gap_open: Gap open penalty to align peptide sequences,
irrelevant if *resnum_alignments* is True
:type pep_gap_open: :class:`int`
:param pep_gap_ext: Gap extension penalty to align peptide sequences,
irrelevant if *resnum_alignments* is True
:type pep_gap_ext: :class:`int`
:param nuc_subst_mat: Nucleotide equivalent for *pep_subst_mat*,
defaults to seq.alg.NUC44
:type nuc_subst_mat: :class:`ost.seq.alg.SubstWeightMatrix`
:param nuc_gap_open: Nucleotide equivalent for *pep_gap_open*
:type nuc_gap_open: :class:`int`
:param nuc_gap_ext: Nucleotide equivalent for *pep_gap_ext*
:type nuc_gap_ext: :class:`int`
:param min_pep_length: Minimal number of residues for a peptide chain to be
considered in target and in models.
:type min_pep_length: :class:`int`
:param min_nuc_length: Minimal number of residues for a nucleotide chain to be
considered in target and in models.
:type min_nuc_length: :class:`int`
:param n_max_naive: Max possible chain mappings that are enumerated in
:func:`~GetNaivelDDTMapping` /
:func:`~GetDecomposerlDDTMapping`. A
:class:`RuntimeError` is raised in case of bigger
complexity.
:type n_max_naive: :class:`int`
Definition at line 508 of file chain_mapping.py.
def GetlDDTMapping |
( |
|
self, |
|
|
|
model, |
|
|
|
inclusion_radius = 15.0 , |
|
|
|
thresholds = [0.5 , |
|
|
|
strategy = "naive" , |
|
|
|
steep_opt_rate = None , |
|
|
|
block_seed_size = 5 , |
|
|
|
block_blocks_per_chem_group = 5 , |
|
|
|
chem_mapping_result = None |
|
) |
| |
Identify chain mapping by optimizing lDDT score
Maps *model* chain sequences to :attr:`~chem_groups` and find mapping
based on backbone only lDDT score (CA for amino acids C3' for
Nucleotides).
Either performs a naive search, i.e. enumerate all possible mappings or
executes a greedy strategy that tries to identify a (close to) optimal
mapping in an iterative way by starting from a start mapping (seed). In
each iteration, the one-to-one mapping that leads to highest increase
in number of conserved contacts is added with the additional requirement
that this added mapping must have non-zero interface counts towards the
already mapped chains. So basically we're "growing" the mapped structure
by only adding connected stuff.
The available strategies:
* **naive**: Enumerates all possible mappings and returns best
* **greedy_fast**: perform all vs. all single chain lDDTs within the
respective ref/mdl chem groups. The mapping with highest number of
conserved contacts is selected as seed for greedy extension
* **greedy_full**: try multiple seeds for greedy extension, i.e. try
all ref/mdl chain combinations within the respective chem groups and
retain the mapping leading to the best lDDT.
* **greedy_block**: try multiple seeds for greedy extension, i.e. try
all ref/mdl chain combinations within the respective chem groups and
extend them to *block_seed_size*. *block_blocks_per_chem_group*
for each chem group are selected for exhaustive extension.
Sets :attr:`MappingResult.opt_score` in case of no trivial one-to-one
mapping.
:param model: Model to map
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param inclusion_radius: Inclusion radius for lDDT
:type inclusion_radius: :class:`float`
:param thresholds: Thresholds for lDDT
:type thresholds: :class:`list` of :class:`float`
:param strategy: Strategy to find mapping. Must be in ["naive",
"greedy_fast", "greedy_full", "greedy_block"]
:type strategy: :class:`str`
:param steep_opt_rate: Only relevant for greedy strategies.
If set, every *steep_opt_rate* mappings, a simple
optimization is executed with the goal of
avoiding local minima. The optimization
iteratively checks all possible swaps of mappings
within their respective chem groups and accepts
swaps that improve lDDT score. Iteration stops as
soon as no improvement can be achieved anymore.
:type steep_opt_rate: :class:`int`
:param block_seed_size: Param for *greedy_block* strategy - Initial seeds
are extended by that number of chains.
:type block_seed_size: :class:`int`
:param block_blocks_per_chem_group: Param for *greedy_block* strategy -
Number of blocks per chem group that
are extended in an initial search
for high scoring local solutions.
:type block_blocks_per_chem_group: :class:`int`
:param chem_mapping_result: Pro param. The result of
:func:`~GetChemMapping` where you provided
*model*. If set, *model* parameter is not
used.
:type chem_mapping_result: :class:`tuple`
:returns: A :class:`MappingResult`
Definition at line 792 of file chain_mapping.py.
def GetQSScoreMapping |
( |
|
self, |
|
|
|
model, |
|
|
|
contact_d = 12.0 , |
|
|
|
strategy = "naive" , |
|
|
|
block_seed_size = 5 , |
|
|
|
block_blocks_per_chem_group = 5 , |
|
|
|
steep_opt_rate = None , |
|
|
|
chem_mapping_result = None , |
|
|
|
greedy_prune_contact_map = False |
|
) |
| |
Identify chain mapping based on QSScore
Scoring is based on CA/C3' positions which are present in all chains of
a :attr:`chem_groups` as well as the *model* chains which are mapped to
that respective chem group.
The following strategies are available:
* **naive**: Naively iterate all possible mappings and return best based
on QS score.
* **greedy_fast**: perform all vs. all single chain lDDTs within the
respective ref/mdl chem groups. The mapping with highest number of
conserved contacts is selected as seed for greedy extension.
Extension is based on QS-score.
* **greedy_full**: try multiple seeds for greedy extension, i.e. try
all ref/mdl chain combinations within the respective chem groups and
retain the mapping leading to the best QS-score.
* **greedy_block**: try multiple seeds for greedy extension, i.e. try
all ref/mdl chain combinations within the respective chem groups and
extend them to *block_seed_size*. *block_blocks_per_chem_group*
for each chem group are selected for exhaustive extension.
Sets :attr:`MappingResult.opt_score` in case of no trivial one-to-one
mapping.
:param model: Model to map
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param contact_d: Max distance between two residues to be considered as
contact in qs scoring
:type contact_d: :class:`float`
:param strategy: Strategy for sampling, must be in ["naive"]
:type strategy: :class:`str`
:param chem_mapping_result: Pro param. The result of
:func:`~GetChemMapping` where you provided
*model*. If set, *model* parameter is not
used.
:type chem_mapping_result: :class:`tuple`
:param greedy_prune_contact_map: Relevant for all strategies that use
greedy extensions. If True, only chains
with at least 3 contacts (8A CB
distance) towards already mapped chains
in trg/mdl are considered for
extension. All chains that give a
potential non-zero QS-score increase
are used otherwise (at least one
contact within 12A). The consequence
is reduced runtime and usually no
real reduction in accuracy.
:returns: A :class:`MappingResult`
Definition at line 931 of file chain_mapping.py.
def GetRigidMapping |
( |
|
self, |
|
|
|
model, |
|
|
|
strategy = "greedy_single_gdtts" , |
|
|
|
single_chain_gdtts_thresh = 0.4 , |
|
|
|
subsampling = None , |
|
|
|
first_complete = False , |
|
|
|
iterative_superposition = False , |
|
|
|
chem_mapping_result = None |
|
) |
| |
Identify chain mapping based on rigid superposition
Superposition and scoring is based on CA/C3' positions which are present
in all chains of a :attr:`chem_groups` as well as the *model*
chains which are mapped to that respective chem group.
Transformations to superpose *model* onto :attr:`ChainMapper.target`
are estimated using all possible combinations of target and model chains
within the same chem groups and build the basis for further extension.
There are four extension strategies:
* **greedy_single_gdtts**: Iteratively add the model/target chain pair
that adds the most conserved contacts based on the GDT-TS metric
(Number of CA/C3' atoms within [8, 4, 2, 1] Angstrom). The mapping
with highest GDT-TS score is returned. However, that mapping is not
guaranteed to be complete (see *single_chain_gdtts_thresh*).
* **greedy_iterative_gdtts**: Same as greedy_single_gdtts except that
the transformation gets updated with each added chain pair.
* **greedy_single_rmsd**: Conceptually similar to greedy_single_gdtts
but the added chain pairs are the ones with lowest RMSD.
The mapping with lowest overall RMSD gets returned.
*single_chain_gdtts_thresh* is only applied to derive the initial
transformations. After that, the minimal RMSD chain pair gets
iteratively added without applying any threshold.
* **greedy_iterative_rmsd**: Same as greedy_single_rmsd exept that
the transformation gets updated with each added chain pair.
*single_chain_gdtts_thresh* is only applied to derive the initial
transformations. After that, the minimal RMSD chain pair gets
iteratively added without applying any threshold.
:param model: Model to map
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param strategy: Strategy to extend mappings from initial transforms,
see description above. Must be in ["greedy_single",
"greedy_iterative", "greedy_iterative_rmsd"]
:type strategy: :class:`str`
:param single_chain_gdtts_thresh: Minimal GDT-TS score for model/target
chain pair to be added to mapping.
Mapping extension for a given
transform stops when no pair fulfills
this threshold, potentially leading to
an incomplete mapping.
:type single_chain_gdtts_thresh: :class:`float`
:param subsampling: If given, only use an equally distributed subset
of all CA/C3' positions for superposition/scoring.
:type subsampling: :class:`int`
:param first_complete: Avoid full enumeration and return first found
mapping that covers all model chains or all
target chains. Has no effect on
greedy_iterative_rmsd strategy.
:type first_complete: :class:`bool`
:param iterative_superposition: Whether to compute inital
transformations with
:func:`ost.mol.alg.IterativeSuperposeSVD`
as oposed to
:func:`ost.mol.alg.SuperposeSVD`
:type iterative_superposition: :class:`bool`
:param chem_mapping_result: Pro param. The result of
:func:`~GetChemMapping` where you provided
*model*. If set, *model* parameter is not
used.
:type chem_mapping_result: :class:`tuple`
:returns: A :class:`MappingResult`
Definition at line 1053 of file chain_mapping.py.
def ProcessStructure |
( |
|
self, |
|
|
|
ent |
|
) |
| |
Entity processing for chain mapping
* Selects view containing peptide and nucleotide residues which have
required backbone atoms present - for peptide residues thats
N, CA, C and CB (no CB for GLY), for nucleotide residues thats
O5', C5', C4', C3' and O3'.
* filters view by chain lengths, see *min_pep_length* and
*min_nuc_length* in constructor
* Extracts atom sequences for each chain in that view
* Attaches corresponding :class:`ost.mol.EntityView` to each sequence
* If residue number alignments are used, strictly increasing residue
numbers without insertion codes are ensured in each chain
:param ent: Entity to process
:type ent: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:returns: Tuple with 3 elements: 1) :class:`ost.mol.EntityView`
containing peptide and nucleotide residues 2)
:class:`ost.seq.SequenceList` containing ATOMSEQ sequences
for each polypeptide chain in returned view, sequences have
:class:`ost.mol.EntityView` of according chains attached
3) same for polynucleotide chains
Definition at line 1514 of file chain_mapping.py.