OpenStructure
Loading...
Searching...
No Matches
Public Member Functions | Data Fields | Protected Member Functions | Protected Attributes
ChainMapper Class Reference

Public Member Functions

 __init__ (self, target, resnum_alignments=False, pep_seqid_thr=95., nuc_seqid_thr=95., pep_subst_mat=seq.alg.BLOSUM62, pep_gap_open=-11, pep_gap_ext=-1, nuc_subst_mat=seq.alg.NUC44, nuc_gap_open=-4, nuc_gap_ext=-4, min_pep_length=6, min_nuc_length=4, n_max_naive=1e8, mdl_map_pep_seqid_thr=0., mdl_map_nuc_seqid_thr=0., seqres=None, trg_seqres_mapping=None)
 
 target (self)
 
 polypep_seqs (self)
 
 polynuc_seqs (self)
 
 chem_groups (self)
 
 chem_group_alignments (self)
 
 chem_group_ref_seqs (self)
 
 chem_group_types (self)
 
 GetChemMapping (self, model)
 
 GetlDDTMapping (self, model, inclusion_radius=15.0, thresholds=[0.5, 1.0, 2.0, 4.0], strategy="heuristic", steep_opt_rate=None, block_seed_size=5, block_blocks_per_chem_group=5, chem_mapping_result=None, heuristic_n_max_naive=40320)
 
 GetQSScoreMapping (self, model, contact_d=12.0, strategy="heuristic", block_seed_size=5, block_blocks_per_chem_group=5, heuristic_n_max_naive=40320, steep_opt_rate=None, chem_mapping_result=None, greedy_prune_contact_map=True)
 
 GetRMSDMapping (self, model, strategy="heuristic", subsampling=50, chem_mapping_result=None, heuristic_n_max_naive=120)
 
 GetAFMMapping (self, model, chem_mapping_result=None)
 
 GetMapping (self, model, n_max_naive=40320)
 
 GetRepr (self, substructure, model, topn=1, inclusion_radius=15.0, thresholds=[0.5, 1.0, 2.0, 4.0], bb_only=False, only_interchain=False, chem_mapping_result=None, global_mapping=None)
 
 GetNMappings (self, model)
 
 ProcessStructure (self, ent)
 
 NWAlign (self, s1, s2, stype)
 
 SEQRESAlign (self, seqres, s, s_ent)
 
 ResNumAlign (self, s1, s2, s1_ent, s2_ent)
 

Data Fields

 resnum_alignments
 
 pep_seqid_thr
 
 nuc_seqid_thr
 
 min_pep_length
 
 min_nuc_length
 
 n_max_naive
 
 mdl_map_pep_seqid_thr
 
 mdl_map_nuc_seqid_thr
 
 pep_subst_mat
 
 pep_gap_open
 
 pep_gap_ext
 
 nuc_subst_mat
 
 nuc_gap_open
 
 nuc_gap_ext
 
 seqres
 
 trg_seqres_mapping
 
 seqres_set
 
 chem_group_types
 
 chem_group_alignments
 
 chem_groups
 
 target
 
 polynuc_seqs
 

Protected Member Functions

 _ComputeChemGroups (self)
 
 _ChemGroupAlignmentsFromSEQRES (self)
 
 _GroupSequencesSEQRES (self, poly_seqs)
 
 _ChemGroupAlignmentsFromATOMSEQ (self)
 
 _GroupSequences (self, seqs, seqid_thr, min_length, chem_type)
 
 _MapSequence (self, ref_seqs, ref_types, s, s_type, s_ent, seq_id_thr=0.0, min_aln_length=0)
 

Protected Attributes

 _chem_groups
 
 _chem_group_alignments
 
 _chem_group_ref_seqs
 
 _chem_group_types
 
 _target
 
 _polypep_seqs
 
 _polynuc_seqs
 

Detailed Description

 Class to compute chain mappings

All algorithms are performed on processed structures which fulfill
criteria as given in constructor arguments (*min_pep_length*,
"min_nuc_length") and only contain residues which have all required backbone
atoms. for peptide residues thats N, CA, C and CB (no CB for GLY), for
nucleotide residues thats O5', C5', C4', C3' and O3'.

Chain mapping is a three step process:

* Group chemically identical chains in *target* into so called chem
  groups. There are two options to achieve this:
  1) using pairwise alignments that are either computed with
  Needleman-Wunsch (NW) or simply derived from residue numbers
  (*resnum_alignments* flag). In case of NW, *pep_subst_mat*, *pep_gap_open*
  and *pep_gap_ext* and their nucleotide equivalents are relevant. Two
  chains are considered identical if they fulfill the *pep_seqid_thr* and
  have at least *min_pep_length* aligned residues. Same logic is applied
  for nucleotides with respective thresholds.
  2) specify seqres sequences and provide the mapping manually. This can
  be useful if you're loading data from an mmCIF file where you have mmCIF
  entity information. Alignments are performed based on residue numbers
  in this case and don't consider the *resnum_alignments* flag. Any
  mismatch of one letter code in the structure and the respective SEQRES
  raises an error.

* Map chains in an input model to these groups. Parameters for generating
  alignments are the same as above. Model chains are mapped to the chem
  group with highest sequence identity. Optionally, you can set a minimum
  sequence identity threshold with *mdl_map_pep_seqid_thr*/
  *mdl_map_nuc_seqid_thr* to avoid nonsensical mappings. You can either
  get the group mapping with :func:`GetChemMapping` or directly call
  one of the full fletched one-to-one chain mapping functions which
  execute that step internally.

* Obtain one-to-one mapping for chains in an input model and
  *target* with one of the available mapping functions. Just to get an
  idea of complexity. If *target* and *model* are octamers, there are
  ``8! = 40320`` possible chain mappings.

:param target: Target structure onto which models are mapped.
               Computations happen on a selection only containing
               polypeptides and polynucleotides.
:type target: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param resnum_alignments: Use residue numbers instead of
                          Needleman-Wunsch to compute pairwise
                          alignments. Relevant for :attr:`~chem_groups` 
                          if *seqres* is not specified explicitely.
:type resnum_alignments: :class:`bool`
:param pep_seqid_thr: Sequence identity threshold (in percent of aligned
                      columns) used to decide when two chains are
                      identical. 95 percent tolerates the few mutations
                      crystallographers like to do. Range: [0-100].
:type pep_seqid_thr:  :class:`float`
:param nuc_seqid_thr: Nucleotide equivalent for *pep_seqid_thr*
:type nuc_seqid_thr:  :class:`float`
:param pep_subst_mat: Substitution matrix to align peptide sequences,
                      irrelevant if *resnum_alignments* is True,
                      defaults to seq.alg.BLOSUM62
:type pep_subst_mat: :class:`ost.seq.alg.SubstWeightMatrix`
:param pep_gap_open: Gap open penalty to align peptide sequences,
                     irrelevant if *resnum_alignments* is True
:type pep_gap_open: :class:`int`
:param pep_gap_ext: Gap extension penalty to align peptide sequences,
                    irrelevant if *resnum_alignments* is True
:type pep_gap_ext: :class:`int`
:param nuc_subst_mat: Nucleotide equivalent for *pep_subst_mat*,
                      defaults to seq.alg.NUC44
:type nuc_subst_mat: :class:`ost.seq.alg.SubstWeightMatrix`
:param nuc_gap_open: Nucleotide equivalent for *pep_gap_open*
:type nuc_gap_open: :class:`int`
:param nuc_gap_ext: Nucleotide equivalent for *pep_gap_ext*
:type nuc_gap_ext: :class:`int`
:param min_pep_length: Minimal number of residues for a peptide chain to be
                       considered in target and in models.
:type min_pep_length: :class:`int`
:param min_nuc_length: Minimal number of residues for a nucleotide chain to be
                       considered in target and in models.
:type min_nuc_length: :class:`int` 
:param n_max_naive: Max possible chain mappings that are enumerated in
                    :func:`~GetNaivelDDTMapping` /
                    :func:`~GetDecomposerlDDTMapping`. A
                    :class:`RuntimeError` is raised in case of bigger
                    complexity.
:type n_max_naive: :class:`int`
:param mdl_map_pep_seqid_thr: Sequence identity threshold (in percent of
                              aligned columns) to avoid non-sensical
                              mapping of model chains to reference chem
                              groups. If a value larger than 0.0 is
                              provided, minimal criteria for assignment
                              becomes a sequence identity of
                              *mdl_map_pep_seqid_thr* and at least
                              *min_pep_length* aligned residues. If set to
                              zero, it simply assigns a model chain to the
                              chem group with highest sequence identity.
                              Range: [0-100].
:type mdl_map_pep_seqid_thr: :class:`float`
:param mdl_map_nuc_seqid_thr: Nucleotide equivalent of
                              *mdl_map_pep_seqid_thr*
:type mdl_map_nuc_seqid_thr: :class:`float`
:param seqres: SEQRES sequences. All polymer chains in *target* will be
               aligned to these sequences using a residue number based
               alignment, effectively ignoring *resnum_alignments* in
               the chem grouping step. Additionally, you need to
               manually specify a mapping of the polymer chains using
               *trg_seqres_mapping* and an error is raised otherwise.
               The one letter codes in
               the structure must exactly match the respective characters
               in seqres and an error is raised if not. These seqres
               sequences are set as :attr:`~chem_group_ref_seqs` and will
               thus influence the mapping of model chains.
:type seqres: :class:`ost.seq.SequenceList`
:param trg_seqres_mapping: Maps each polymer chain in *target* to a sequence
                           in *seqres*. It's a :class:`dict` with chain name
                           as key and seqres sequence name as value.
:type trg_seqres_mapping: :class:`dict`

Definition at line 480 of file chain_mapping.py.

Constructor & Destructor Documentation

◆ __init__()

__init__ (   self,
  target,
  resnum_alignments = False,
  pep_seqid_thr = 95.,
  nuc_seqid_thr = 95.,
  pep_subst_mat = seq.alg.BLOSUM62,
  pep_gap_open = -11,
  pep_gap_ext = -1,
  nuc_subst_mat = seq.alg.NUC44,
  nuc_gap_open = -4,
  nuc_gap_ext = -4,
  min_pep_length = 6,
  min_nuc_length = 4,
  n_max_naive = 1e8,
  mdl_map_pep_seqid_thr = 0.,
  mdl_map_nuc_seqid_thr = 0.,
  seqres = None,
  trg_seqres_mapping = None 
)

Definition at line 598 of file chain_mapping.py.

Member Function Documentation

◆ _ChemGroupAlignmentsFromATOMSEQ()

_ChemGroupAlignmentsFromATOMSEQ (   self)
protected
 Groups target sequences based on ATOMSEQ

returns tuple that can be set as self._chem_group_alignments and
self._chem_group_types

Definition at line 2089 of file chain_mapping.py.

◆ _ChemGroupAlignmentsFromSEQRES()

_ChemGroupAlignmentsFromSEQRES (   self)
protected
 Groups target sequences based on SEQRES

returns tuple that can be set as self._chem_group_alignments and
self._chem_group_types

Definition at line 2026 of file chain_mapping.py.

◆ _ComputeChemGroups()

_ComputeChemGroups (   self)
protected
 Sets properties :attr:`~chem_groups`,
:attr:`~chem_group_alignments`, :attr:`~chem_group_ref_seqs`,
:attr:`~chem_group_types`

Definition at line 1998 of file chain_mapping.py.

◆ _GroupSequences()

_GroupSequences (   self,
  seqs,
  seqid_thr,
  min_length,
  chem_type 
)
protected
Get list of alignments representing groups of equivalent sequences

:param seqid_thr: Threshold used to decide when two chains are identical.
:type seqid_thr:  :class:`float`
:param gap_thr: Additional threshold to avoid gappy alignments with high
                seqid. Number of aligned columns must be at least this
                number.
:type gap_thr: :class:`int`
:param aligner: Helper class to generate pairwise alignments
:type aligner: :class:`_Aligner`
:param chem_type: ChemType of seqs, must be in
                  [:class:`ost.mol.ChemType.AMINOACIDS`,
                  :class:`ost.mol.ChemType.NUCLEOTIDES`]
:type chem_type: :class:`ost.mol.ChemType` 
:returns: A list of alignments, one alignment for each group
          with longest sequence (reference) as first sequence.
:rtype: :class:`ost.seq.AlignmentList`

Definition at line 2145 of file chain_mapping.py.

◆ _GroupSequencesSEQRES()

_GroupSequencesSEQRES (   self,
  poly_seqs 
)
protected

Definition at line 2042 of file chain_mapping.py.

◆ _MapSequence()

_MapSequence (   self,
  ref_seqs,
  ref_types,
  s,
  s_type,
  s_ent,
  seq_id_thr = 0.0,
  min_aln_length = 0 
)
protected
Tries top map *s* onto any of the sequences in *ref_seqs*

Computes alignments of *s* to each of the reference sequences of equal type
and sorts them by seqid*fraction_covered (seqid: sequence identity of
aligned columns in alignment, fraction_covered: Fraction of non-gap
characters in reference sequence that are covered by non-gap characters in
*s*). Best scoring mapping is returned. Optionally, *seq_id*/
*min_aln_length* thresholds can be enabled to avoid non-sensical mappings.
However, *min_aln_length* only has an effect if *seq_id_thr* > 0!!!

:param ref_seqs: Reference sequences 
:type ref_seqs: :class:`ost.seq.SequenceList`
:param ref_types: Types of reference sequences, e.g.
                  ost.mol.ChemType.AminoAcids
:type ref_types: :class:`list` of :class:`ost.mol.ChemType`
:param s: Sequence to map
:type s: :class:`ost.seq.SequenceHandle`
:param s_type: Type of *s*, only try mapping to sequences in *ref_seqs*
               with equal type as defined in *ref_types*
:param s_ent: Entity which represents *s*. Only relevant in case of
              residue number alignments.
:type s_ent: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView`
:param seq_id_thr: Minimum sequence identity to be considered as match
:type seq_id_thr: :class:`float`
:param min_aln_length: Minimum number of aligned columns to be considered
                       as match. Only has an effect if *seq_id_thr* > 0!
:type min_aln_length: :class:`int`    
:returns: Tuple with two elements. 1) index of sequence in *ref_seqs* to
          which *s* can be mapped 2) Pairwise sequence alignment with 
          sequence from *ref_seqs* as first sequence. Both elements are
          None if no mapping can be found or if thresholds are not 
          fulfilled for any alignment.
:raises: :class:`RuntimeError` if mapping is ambiguous, i.e. *s*
         successfully maps to more than one sequence in *ref_seqs* 

Definition at line 2221 of file chain_mapping.py.

◆ chem_group_alignments()

chem_group_alignments (   self)
MSA for each group in :attr:`~chem_groups`

The first sequence is the reference sequence.
The subsequent sequences represent the ATOMSEQ sequences in
:attr:`~target` in same order as in :attr:`~chem_groups`.

:getter: Computed on first use (cached)
:type: :class:`ost.seq.AlignmentList`

Definition at line 723 of file chain_mapping.py.

◆ chem_group_ref_seqs()

chem_group_ref_seqs (   self)
Reference sequence for each group in :attr:`~chem_groups`

:getter: Computed on first use (cached)
:type: :class:`ost.seq.SequenceList`

Definition at line 738 of file chain_mapping.py.

◆ chem_group_types()

chem_group_types (   self)
ChemType of each group in :attr:`~chem_groups`

Specifying if groups are poly-peptides/nucleotides, i.e. 
:class:`ost.mol.ChemType.AMINOACIDS` or
:class:`ost.mol.ChemType.NUCLEOTIDES`

:getter: Computed on first use (cached)
:type: :class:`list` of :class:`ost.mol.ChemType`

Definition at line 749 of file chain_mapping.py.

◆ chem_groups()

chem_groups (   self)
Groups of chemically equivalent chains in :attr:`~target`

First chain in group is the one with longest sequence.

:getter: Computed on first use (cached)
:type: :class:`list` of :class:`list` of :class:`str` (chain names)

Definition at line 710 of file chain_mapping.py.

◆ GetAFMMapping()

GetAFMMapping (   self,
  model,
  chem_mapping_result = None 
)
Identify chain mapping as in AlphaFold-Multimer (AFM)

Described in section 7.3 of

Richard Evans, Michael O’Neill, Alexander Pritzel, Natasha Antropova,
Andrew Senior, Tim Green, Augustin Žídek, Russ Bates, Sam Blackwell,
Jason Yim, et al. Protein complex prediction with AlphaFold-Multimer.
bioRxiv preprint bioRxiv:10.1101/2021.10.04.463034, 2021.

To summarize (largely follows the description in the AF-Multimer paper):
One anchor chain in the ground truth (reference) is selected. If the
reference has stoichiometry A3B2, an arbitary chain in B is selected
as it has smaller ambiguity. In a tie, for example A2B2, the longest
chain in A, B is selected.

Given a model chain with same sequence as the reference anchor chain,
a CA-RMSD (C3' for nucleotides) based superposition is performed. Chains
are then greedily assigned by minimum distance of their geometric
centers. This procedure is repeated for every model chain with same
sequence and the assignment with minimal RMSD of the geometric centers
is returned.

A modification of this algorithm allows to deal with different
stoichiometries in reference and model. In the anchor selection, this
function ensures that there is at least one model chain that can be
mapped to this anchor. If the logic above does not identify a mappable
chain pair for the smallest group, it continues with the second smallest
group etc.

:param model: Model to map
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param chem_mapping_result: Pro param. The result of
                            :func:`~GetChemMapping` where you provided
                            *model*. If set, *model* parameter is not
                            used.
:type chem_mapping_result: :class:`tuple`
:returns: A :class:`MappingResult`

Definition at line 1253 of file chain_mapping.py.

◆ GetChemMapping()

GetChemMapping (   self,
  model 
)
Maps sequences in *model* to chem_groups of target

:param model: Model from which to extract sequences, a
              selection that only includes peptides and nucleotides
              is performed and returned along other results.
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:returns: Tuple with two lists of length `len(self.chem_groups)`, a list
          and an :class:`ost.mol.EntityView` representing *model*:
          1) Each element is a :class:`list` with mdl chain names that
          map to the chem group at that position.
          2) Each element is a :class:`ost.seq.AlignmentList` aligning
          these mdl chain sequences to the chem group ref sequences.
          3) The model chains that could not be mapped to the reference
          4) A selection of *model* that only contains polypeptides and
          polynucleotides whose ATOMSEQ exactly matches the sequence
          info in the returned alignments.

Definition at line 763 of file chain_mapping.py.

◆ GetlDDTMapping()

GetlDDTMapping (   self,
  model,
  inclusion_radius = 15.0,
  thresholds = [0.5, 1.0, 2.0, 4.0],
  strategy = "heuristic",
  steep_opt_rate = None,
  block_seed_size = 5,
  block_blocks_per_chem_group = 5,
  chem_mapping_result = None,
  heuristic_n_max_naive = 40320 
)
 Identify chain mapping by optimizing lDDT score

Maps *model* chain sequences to :attr:`~chem_groups` and find mapping
based on backbone only lDDT score (CA for amino acids C3' for
Nucleotides).

Either performs a naive search, i.e. enumerate all possible mappings or
executes a greedy strategy that tries to identify a (close to) optimal
mapping in an iterative way by starting from a start mapping (seed). In
each iteration, the one-to-one mapping that leads to highest increase
in number of conserved contacts is added with the additional requirement
that this added mapping must have non-zero interface counts towards the
already mapped chains. So basically we're "growing" the mapped structure
by only adding connected stuff.

The available strategies:

* **naive**: Enumerates all possible mappings and returns best        

* **greedy_full**: try multiple seeds for greedy extension, i.e. try
  all ref/mdl chain combinations within the respective chem groups and
  retain the mapping leading to the best lDDT.

* **greedy_block**: try multiple seeds for greedy extension, i.e. try
  all ref/mdl chain combinations within the respective chem groups and
  extend them to *block_seed_size*. *block_blocks_per_chem_group*
  for each chem group are selected for exhaustive extension.

* **heuristic**: Uses *naive* strategy if number of possible mappings
  is within *heuristic_n_max_naive*. The default of 40320 corresponds
  to an octamer (8!=40320). A structure with stoichiometry A6B2 would be
  6!*2!=1440 etc. If the number of possible mappings is larger,
  *greedy_full* is used.

Sets :attr:`MappingResult.opt_score` in case of no trivial one-to-one
mapping. 

:param model: Model to map
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param inclusion_radius: Inclusion radius for lDDT
:type inclusion_radius: :class:`float`
:param thresholds: Thresholds for lDDT
:type thresholds: :class:`list` of :class:`float`
:param strategy: Strategy to find mapping. Must be in ["naive",
                 "greedy_full", "greedy_block"]
:type strategy: :class:`str`
:param steep_opt_rate: Only relevant for greedy strategies.
                       If set, every *steep_opt_rate* mappings, a simple
                       optimization is executed with the goal of
                       avoiding local minima. The optimization
                       iteratively checks all possible swaps of mappings
                       within their respective chem groups and accepts
                       swaps that improve lDDT score. Iteration stops as
                       soon as no improvement can be achieved anymore.
:type steep_opt_rate: :class:`int`
:param block_seed_size: Param for *greedy_block* strategy - Initial seeds
                        are extended by that number of chains.
:type block_seed_size: :class:`int`
:param block_blocks_per_chem_group: Param for *greedy_block* strategy -
                                    Number of blocks per chem group that
                                    are extended in an initial search
                                    for high scoring local solutions.
:type block_blocks_per_chem_group: :class:`int`
:param chem_mapping_result: Pro param. The result of
                            :func:`~GetChemMapping` where you provided
                            *model*. If set, *model* parameter is not
                            used.
:type chem_mapping_result: :class:`tuple`
:returns: A :class:`MappingResult`

Definition at line 817 of file chain_mapping.py.

◆ GetMapping()

GetMapping (   self,
  model,
  n_max_naive = 40320 
)
 Convenience function to get mapping with currently preferred method

If number of possible chain mappings is <= *n_max_naive*, a naive
QS-score mapping is performed and optimal QS-score is guaranteed.
For anything else, a QS-score mapping with the greedy_full strategy is
performed (greedy_prune_contact_map = True). The default for
*n_max_naive* of 40320 corresponds to an octamer (8!=40320). A
structure with stoichiometry A6B2 would be 6!*2!=1440 etc.

If :attr:`~target` has nucleotide sequences, the QS-score target
function is replaced with a backbone only lDDT score that has
an inclusion radius of 30A. 

Definition at line 1471 of file chain_mapping.py.

◆ GetNMappings()

GetNMappings (   self,
  model 
)
 Returns number of possible mappings

:param model: Model with chains that are mapped onto
              :attr:`chem_groups`
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`

Definition at line 1754 of file chain_mapping.py.

◆ GetQSScoreMapping()

GetQSScoreMapping (   self,
  model,
  contact_d = 12.0,
  strategy = "heuristic",
  block_seed_size = 5,
  block_blocks_per_chem_group = 5,
  heuristic_n_max_naive = 40320,
  steep_opt_rate = None,
  chem_mapping_result = None,
  greedy_prune_contact_map = True 
)
 Identify chain mapping based on QSScore

Scoring is based on CA/C3' positions which are present in all chains of
a :attr:`chem_groups` as well as the *model* chains which are mapped to
that respective chem group.

The following strategies are available:

* **naive**: Naively iterate all possible mappings and return best based
             on QS score.

* **greedy_full**: try multiple seeds for greedy extension, i.e. try
  all ref/mdl chain combinations within the respective chem groups and
  retain the mapping leading to the best QS-score. 

* **greedy_block**: try multiple seeds for greedy extension, i.e. try
  all ref/mdl chain combinations within the respective chem groups and
  extend them to *block_seed_size*. *block_blocks_per_chem_group*
  for each chem group are selected for exhaustive extension.

* **heuristic**: Uses *naive* strategy if number of possible mappings
  is within *heuristic_n_max_naive*. The default of 40320 corresponds
  to an octamer (8!=40320). A structure with stoichiometry A6B2 would be
  6!*2!=1440 etc. If the number of possible mappings is larger,
  *greedy_full* is used.

Sets :attr:`MappingResult.opt_score` in case of no trivial one-to-one
mapping.

:param model: Model to map
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param contact_d: Max distance between two residues to be considered as 
                  contact in qs scoring
:type contact_d: :class:`float` 
:param strategy: Strategy for sampling, must be in ["naive",
                 greedy_full", "greedy_block"]
:type strategy: :class:`str`
:param chem_mapping_result: Pro param. The result of
                            :func:`~GetChemMapping` where you provided
                            *model*. If set, *model* parameter is not
                            used.
:type chem_mapping_result: :class:`tuple`
:param greedy_prune_contact_map: Relevant for all strategies that use
                                 greedy extensions. If True, only chains
                                 with at least 3 contacts (8A CB
                                 distance) towards already mapped chains
                                 in trg/mdl are considered for
                                 extension. All chains that give a
                                 potential non-zero QS-score increase
                                 are used otherwise (at least one
                                 contact within 12A). The consequence
                                 is reduced runtime and usually no
                                 real reduction in accuracy.
:returns: A :class:`MappingResult`

Definition at line 964 of file chain_mapping.py.

◆ GetRepr()

GetRepr (   self,
  substructure,
  model,
  topn = 1,
  inclusion_radius = 15.0,
  thresholds = [0.5, 1.0, 2.0, 4.0],
  bb_only = False,
  only_interchain = False,
  chem_mapping_result = None,
  global_mapping = None 
)
 Identify *topn* representations of *substructure* in *model*

*substructure* defines a subset of :attr:`~target` for which one
wants the *topn* representations in *model*. Representations are scored
and sorted by lDDT.

:param substructure: A :class:`ost.mol.EntityView` which is a subset of
                     :attr:`~target`. Should be selected with the
                     OpenStructure query language. Example: if you're
                     interested in residues with number 42,43 and 85 in
                     chain A:
                     ``substructure=mapper.target.Select("cname=A and rnum=42,43,85")``
                     A :class:`RuntimeError` is raised if *substructure*
                     does not refer to the same underlying
                     :class:`ost.mol.EntityHandle` as :attr:`~target`.
:type substructure: :class:`ost.mol.EntityView`
:param model: Structure in which one wants to find representations for
              *substructure*
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param topn: Max number of representations that are returned
:type topn: :class:`int`
:param inclusion_radius: Inclusion radius for lDDT
:type inclusion_radius: :class:`float`
:param thresholds: Thresholds for lDDT
:type thresholds: :class:`list` of :class:`float`
:param bb_only: Only consider backbone atoms in lDDT computation
:type bb_only: :class:`bool`
:param only_interchain: Only score interchain contacts in lDDT. Useful
                        if you want to identify interface patches.
:type only_interchain: :class:`bool`
:param chem_mapping_result: Pro param. The result of
                            :func:`~GetChemMapping` where you provided
                            *model*. If set, *model* parameter is not
                            used.
:type chem_mapping_result: :class:`tuple`
:param global_mapping: Pro param. Specify a global mapping result. This
                       fully defines the desired representation in the
                       model but extracts it and enriches it with all
                       the nice attributes of :class:`ReprResult`.
                       The target attribute in *global_mapping* must be
                       of the same entity as self.target and the model
                       attribute of *global_mapping* must be of the same
                       entity as *model*.
:type global_mapping: :class:`MappingResult`
:returns: :class:`list` of :class:`ReprResult`

Definition at line 1494 of file chain_mapping.py.

◆ GetRMSDMapping()

GetRMSDMapping (   self,
  model,
  strategy = "heuristic",
  subsampling = 50,
  chem_mapping_result = None,
  heuristic_n_max_naive = 120 
)
Identify chain mapping based on minimal RMSD superposition

Superposition and scoring is based on CA/C3' positions which are present
in all chains of a :attr:`chem_groups` as well as the *model*
chains which are mapped to that respective chem group.

The following strategies are available:

* **naive**: Naively iterate all possible mappings and return the one
  with lowes RMSD.

* **greedy_single**: perform all vs. all single chain superpositions
  within the respective ref/mdl chem groups to use as starting points.
  For each starting point, iteratively add the model/target chain pair
  with lowest RMSD until a full mapping is achieved. The mapping with
  lowest RMSD is returned.

* **greedy_single_centroid**: Same as *greedy_single* but iteratively
  adds model/target chain pairs with lowest centroid distance. The
  mapping with the lowest centroid RMSD is returned. This is mainly for
  evaluation purposes. One known failure mode is that intertwined
  structures may have multiple centroids sitting on top of each other.
  RMSD and thus *greedy_single* are better able to disentangle this
  situation.

* **greedy_iterative**: Same as greedy_single_rmsd exept that the
  transformation gets updated with each added chain pair.

* **heuristic**: Uses *naive* strategy if number of possible mappings
  is within *heuristic_n_max_naive*. The default of 120 corresponds
  to a homo-pentamer (5!=120). If the number of possible mappings is
  larger, *greedy_iterative* is used.

:param model: Model to map
:type model: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param strategy: Strategy for sampling. Must be in ["naive",
                 "greedy_single", "greedy_iterative"]
:type strategy: :class:`str`
:param subsampling: If given, only an equally distributed subset
                    of CA/C3' positions in each chain are used for
                    superposition/scoring.
:type subsampling: :class:`int`
:param chem_mapping_result: Pro param. The result of
                            :func:`~GetChemMapping` where you provided
                            *model*. If set, *model* parameter is not
                            used.
:type chem_mapping_result: :class:`tuple`
:returns: A :class:`MappingResult`

Definition at line 1093 of file chain_mapping.py.

◆ NWAlign()

NWAlign (   self,
  s1,
  s2,
  stype 
)
 Access to internal sequence alignment functionality

Performs Needleman-Wunsch alignment with parameterization
setup at ChainMapper construction

:param s1: First sequence to align
:type s1: :class:`ost.seq.SequenceHandle`
:param s2: Second sequence to align
:type s2: :class:`ost.seq.SequenceHandle`
:param stype: Type of sequences to align, must be in
              [:class:`ost.mol.ChemType.AMINOACIDS`,
              :class:`ost.mol.ChemType.NUCLEOTIDES`]
:returns: Pairwise alignment of s1 and s2

Definition at line 1875 of file chain_mapping.py.

◆ polynuc_seqs()

polynuc_seqs (   self)
Sequences of nucleotide chains in :attr:`~target`

:type: :class:`ost.seq.SequenceList`

Definition at line 702 of file chain_mapping.py.

◆ polypep_seqs()

polypep_seqs (   self)
Sequences of peptide chains in :attr:`~target`

:type: :class:`ost.seq.SequenceList`

Definition at line 694 of file chain_mapping.py.

◆ ProcessStructure()

ProcessStructure (   self,
  ent 
)
 Entity processing for chain mapping

* Selects view containing peptide and nucleotide residues which have 
  required backbone atoms present - for peptide residues thats
  N, CA, C and CB (no CB for GLY), for nucleotide residues thats
  O5', C5', C4', C3' and O3'.
* filters view by chain lengths, see *min_pep_length* and
  *min_nuc_length* in constructor
* Extracts atom sequences for each chain in that view
* If residue number alignments are used, strictly increasing residue
  numbers without insertion codes are ensured in each chain

:param ent: Entity to process
:type ent: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:returns: Tuple with 3 elements: 1) :class:`ost.mol.EntityView`
          containing peptide and nucleotide residues 2)
          :class:`ost.seq.SequenceList` containing ATOMSEQ sequences
          for each polypeptide chain in returned view
          3) same for polynucleotide chains

Definition at line 1765 of file chain_mapping.py.

◆ ResNumAlign()

ResNumAlign (   self,
  s1,
  s2,
  s1_ent,
  s2_ent 
)
 Access to internal sequence alignment functionality

Performs residue number based alignment. Residue numbers are extracted
from *s1_ent*/*s2_ent*.

:param s1: First sequence to align
:type s1: :class:`ost.seq.SequenceHandle`
:param s2: Second sequence to align
:type s2: :class:`ost.seq.SequenceHandle`
:param s1_ent: Structure as source for residue numbers in *s1*. Must
               contain a chain named after sequence name in *s1*. This
               chain must have the exact same number of residues as
               characters in s1.
:type s1_ent: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`
:param s2_ent: Same for *s2*.
:type s2_ent: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`

:returns: Pairwise alignment of s1 and s2

Definition at line 1942 of file chain_mapping.py.

◆ SEQRESAlign()

SEQRESAlign (   self,
  seqres,
  s,
  s_ent 
)
 Access to internal sequence alignment functionality

Performs alignment on SEQRES. Residue numbers for *s* are
extracted from *s_ent*. Residue numbers start from 1, i.e.
1 aligns to the first element in *seqres*.

:param seqres: SEQRES sequence
:type seqres: :class:`ost.seq.SequenceHandle`
:param s: Sequence to align
:type s: :class:`ost.seq.SequenceHandle`
:param s_ent: Structure which must contain a chain of same name as *s*.
              This chain must have the exact same number of residues
              as characters in *s*.
:type s_ent: :class:`ost.mol.EntityView`/:class:`ost.mol.EntityHandle`

Definition at line 1902 of file chain_mapping.py.

◆ target()

target (   self)
Target structure that only contains peptides/nucleotides

Contains only residues that have the backbone representatives
(CA for peptide and C3' for nucleotides) to avoid ATOMSEQ alignment
inconsistencies when switching between all atom and backbone only
representations.

:type: :class:`ost.mol.EntityView`

Definition at line 681 of file chain_mapping.py.

Field Documentation

◆ _chem_group_alignments

_chem_group_alignments
protected

Definition at line 637 of file chain_mapping.py.

◆ _chem_group_ref_seqs

_chem_group_ref_seqs
protected

Definition at line 638 of file chain_mapping.py.

◆ _chem_group_types

_chem_group_types
protected

Definition at line 639 of file chain_mapping.py.

◆ _chem_groups

_chem_groups
protected

Definition at line 636 of file chain_mapping.py.

◆ _polynuc_seqs

_polynuc_seqs
protected

Definition at line 642 of file chain_mapping.py.

◆ _polypep_seqs

_polypep_seqs
protected

Definition at line 642 of file chain_mapping.py.

◆ _target

_target
protected

Definition at line 642 of file chain_mapping.py.

◆ chem_group_alignments

chem_group_alignments

Definition at line 906 of file chain_mapping.py.

◆ chem_group_types

chem_group_types

Definition at line 788 of file chain_mapping.py.

◆ chem_groups

chem_groups

Definition at line 914 of file chain_mapping.py.

◆ mdl_map_nuc_seqid_thr

mdl_map_nuc_seqid_thr

Definition at line 624 of file chain_mapping.py.

◆ mdl_map_pep_seqid_thr

mdl_map_pep_seqid_thr

Definition at line 623 of file chain_mapping.py.

◆ min_nuc_length

min_nuc_length

Definition at line 621 of file chain_mapping.py.

◆ min_pep_length

min_pep_length

Definition at line 620 of file chain_mapping.py.

◆ n_max_naive

n_max_naive

Definition at line 622 of file chain_mapping.py.

◆ nuc_gap_ext

nuc_gap_ext

Definition at line 630 of file chain_mapping.py.

◆ nuc_gap_open

nuc_gap_open

Definition at line 629 of file chain_mapping.py.

◆ nuc_seqid_thr

nuc_seqid_thr

Definition at line 619 of file chain_mapping.py.

◆ nuc_subst_mat

nuc_subst_mat

Definition at line 628 of file chain_mapping.py.

◆ pep_gap_ext

pep_gap_ext

Definition at line 627 of file chain_mapping.py.

◆ pep_gap_open

pep_gap_open

Definition at line 626 of file chain_mapping.py.

◆ pep_seqid_thr

pep_seqid_thr

Definition at line 618 of file chain_mapping.py.

◆ pep_subst_mat

pep_subst_mat

Definition at line 625 of file chain_mapping.py.

◆ polynuc_seqs

polynuc_seqs

Definition at line 1485 of file chain_mapping.py.

◆ resnum_alignments

resnum_alignments

Definition at line 617 of file chain_mapping.py.

◆ seqres

seqres

Definition at line 631 of file chain_mapping.py.

◆ seqres_set

seqres_set

Definition at line 633 of file chain_mapping.py.

◆ target

target

Definition at line 919 of file chain_mapping.py.

◆ trg_seqres_mapping

trg_seqres_mapping

Definition at line 632 of file chain_mapping.py.


The documentation for this class was generated from the following file: