Handling Loop Candidates
For convenience, we provide the LoopCandidates class as a container of
loop candidates that all share the same length and sequence. It can either be
filled manually or generated by static filling functions that use the
functionality of the FragDB or Monte Carlo algorithms.
Once it contains candidates, you can apply closing, scoring or clustering
algorithms to all loop candidates. Example:
from ost import io
from promod3 import loop, scoring, modelling
# let's load a crambin structure from the pdb
crambin = io.LoadPDB('data/1CRN.pdb')
SEQRES = ''.join([r.one_letter_code for r in crambin.residues])
# this is the sequence we want to remodel
loop_seq = SEQRES[23:31]
# let's define the stem residues
n_stem = crambin.residues[23]
c_stem = crambin.residues[30]
# we use the StructureDB as source for structural information
structure_db = loop.LoadStructureDB()
# the FragDB allows access to the StructureDB based on geometric
# features of the loop stem residues
frag_db = loop.LoadFragDB()
# the LoopCandidates class lets us handle several loops at once;
# here we directly search for potential loop candidates in the
# previously loaded databases
loop_candidates = modelling.LoopCandidates.FillFromDatabase(\
    n_stem, c_stem, loop_seq, frag_db, structure_db)
# candidates usually don't match exactly the required stem coords.
# CCD (Cyclic Coordinate Descent) is one way to enforce this match.
loop_candidates.ApplyCCD(n_stem, c_stem)
# setup backbone scorer with clash and cbeta scoring
score_env = scoring.BackboneScoreEnv(SEQRES)
score_env.SetInitialEnvironment(crambin)
scorer = scoring.BackboneOverallScorer()
scorer["cbeta"] = scoring.LoadCBetaScorer()
scorer["clash"] = scoring.ClashScorer()
scorer.AttachEnvironment(score_env)
# the scorer can then be used on the LoopCandidates object to
# calculate desired scores (here: cbeta & clash, start resnum = 24)
bb_scores = modelling.ScoreContainer()
loop_candidates.CalculateBackboneScores(bb_scores, scorer, score_env,
                                        ["cbeta", "clash"], 24)
# we finally perform a weighted linear combination of the scores,
# pick the best one, insert it into our structure and save it
weights = {"cbeta": 1, "clash": 1}
scores = bb_scores.LinearCombine(weights)
min_score = min(scores)
min_candidate = scores.index(min_score)
bb_list = loop_candidates[min_candidate]
bb_list.InsertInto(crambin.chains[0], n_stem.GetNumber())
io.SavePDB(crambin, "modified_crambin.pdb")
The LoopCandidates class
-
class
promod3.modelling.LoopCandidates (seq)
Initializes an empty container of loop candidates.
Candidates can be accessed and iterated as if it was a list of
BackboneList .
Parameters: | seq (str ) – The sequence being enforced for all candidates |
-
static
FillFromDatabase (n_stem, c_stem, seq, frag_db, structural_db, extended_search=False)
Searches for loop candidates matching the length (number of residues in
seq) and geometry (of n_stem and c_stem) of the loop to be modelled in
a fragment database.
This call also adds fragment infos (HasFragmentInfos() will be True).
Parameters: |
- n_stem (
ost.mol.ResidueHandle ) – The residue at the N-terminal end of the loop
- c_stem (
ost.mol.ResidueHandle ) – The residue at the C-terminal end of the loop
- seq (
str ) – The sequence of residues to be added including the
n_stem and c_stem
- frag_db (
FragDB ) – The fragment database
- structural_db (
StructureDB ) – The corresponding structural database
- extended_search (
bool ) – Whether search should be extended to fragments
matching the geometry of the n_stem and c_stem
a bit less precisely.
|
Returns: | A list of loop candidates
|
Return type: | LoopCandidates
|
Raises: | RuntimeError if stems do not contain N, CA and C
atoms.
|
-
static
FillFromMonteCarloSampler (seq, num_loops, steps, sampler, closer, scorer, cooler, random_seed=0)
-
static
FillFromMonteCarloSampler (initial_bb, seq, num_loops, steps, sampler, closer, scorer, cooler, random_seed=0)
Uses Monte Carlo simulated annealing to sample the loop to be modelled.
If initial_bb is given, every Monte Carlo run starts from that configuration.
Warning
The BackboneScoreEnv
attached to scorer will be altered in the specified stretch.
You might consider the Stash / Pop mechanism of the
BackboneScoreEnv to restore the
original state once the sampling is done.
Parameters: |
- initial_bb (
BackboneList ) – Initial configuration used for the sampling
- seq (
str ) – The sequence of residues to be sampled
- num_loops (
int ) – Number of loop candidates to return
- steps (
int ) – Number of MC steps to perform for each loop candidate
generated
- sampler (Sampler Object) – Used to generate a new configuration at each MC step
- closer (Closer Object) – Used to close the loop after each MC step
- scorer (Scorer Object) – Used to score the generated configurations at each MC step
- cooler (Cooler Object) – Controls the temperature profile of the simulated annealing
- random_seed (
int ) – Seed to feed the random number generator for
accepting/rejecting proposed Monte Carlo steps.
For every Monte Carlo run, the random number generator
gets refreshed and this seed gets increased by 1.
|
Returns: | The resulting candidates
|
Return type: | LoopCandidates
|
Raises: | RuntimeError , if initial_bb is not given and
the Monte Carlo sampler failed to initialize (i.e. stems are too
far apart) or if initial_bb is given and it is inconsistent with
seq.
|
-
GetSequence ()
Returns: | The Sequence |
Return type: | str |
-
ApplyCCD (n_stem, c_stem, max_iterations=1000, rmsd_cutoff=0.1, keep_non_converged=False, random_seed=0)
-
ApplyCCD (n_stem, c_stem, torsion_sampler, max_iterations=1000, rmsd_cutoff=0.1, keep_non_converged=False, random_seed=0)
-
ApplyCCD (n_stem, c_stem, torsion_samplers, max_iterations=1000, rmsd_cutoff=0.1, keep_non_converged=False, random_seed=0)
Closes all loop candidates in this container using the CCD algorithm (i.e.
modifies the candidates so that their stem residues match those of n_stem
and c_stem). CCD (cyclic coordinate descent, see CCD ) is an
iterative minimization algorithm.
If torsion_sampler is given, it is used at each closing step to
calculate the probability of the proposed move, which is then accepted or
rejected according to a Metropolis criterion.
Parameters: |
- n_stem (
ost.mol.ResidueHandle ) – Residue defining the n-stem positions every candidate
should match. See CCD() .
- c_stem (
ost.mol.ResidueHandle ) – Residue defining the c-stem positions every candidate
should match. See CCD() .
- torsion_sampler (
TorsionSampler / list
of TorsionSampler ) – A torsion sampler (used for all residues) or a list
of samplers (one per residue).
- max_iterations (
int ) – Maximum number of iterations
- rmsd_cutoff (
float ) – Cutoff in stem residue RMSD used to determine
convergence
- keep_non_converged (
bool ) – Whether to keep loop candidates for which the
closing did not converge
- random_seed (
int ) – seed for random number generator used to
accept/reject moves in CCD algorithm
|
Returns: | An index for each kept candidate corresponding to the candidate
index before this function was called. This is useful to keep track
of scores and other data extracted before.
|
Return type: | list of int
|
-
ApplyKIC (n_stem, c_stem, pivot_one, pivot_two, pivot_three)
Closes all loop candidates in this container (i.e. modifies the candidates
so that their stem residues match those of n_stem and c_stem, which are
the stem residues of the loop being modelled), using the KIC algorithm. This
algorithm finds analytical solutions for closing the loop by modifying the
torsion angles of just three pivot residues. Due to the underlying
mathematical formalism in KIC, up to 16 solutions can be found for every
candidate, so the number of loop candidates can increase. A short usage
sketch follows after the parameter list below.
Parameters: |
- n_stem (
ost.mol.ResidueHandle ) – Residue defining the n-stem positions every candidate
should match
- c_stem (
ost.mol.ResidueHandle ) – Residue defining the c-stem positions every candidate
should match
- pivot_one (
int ) – First pivot residue
- pivot_two (
int ) – Second pivot residue
- pivot_three (
int ) – Third pivot residue
|
Returns: | An index for each added candidate corresponding to the original
candidate index before this function was called (each candidate can
lead to multiple or no candidates after KIC is applied). This is
useful to keep track of scores and other data extracted before.
|
Return type: | list of int
|
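For illustration, here is a minimal sketch of closing the candidates from the
first example with KIC instead of CCD. The choice of pivots is arbitrary, and
it is an assumption here that the pivot arguments are given as indices into
the loop residues:
# hypothetical pivot choice: spread the three pivots over the 8-residue loop
orig_indices = loop_candidates.ApplyKIC(n_stem, c_stem, 1, 3, 6)
# each original candidate may yield several (or no) closed candidates
print("Candidates after KIC: %d" % len(loop_candidates))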
-
CalculateBackboneScores (score_container, scorer, scorer_env, start_resnum, chain_idx=0)
-
CalculateBackboneScores (score_container, scorer, scorer_env, keys, start_resnum, chain_idx=0)
-
CalculateBackboneScores (score_container, mhandle, start_resnum, chain_idx=0)
-
CalculateBackboneScores (score_container, mhandle, keys, start_resnum, chain_idx=0)
-
CalculateAllAtomScores (score_container, mhandle, start_resnum, chain_idx=0)
-
CalculateAllAtomScores (score_container, mhandle, keys, start_resnum, chain_idx=0)
Calculate backbone / all-atom scores for each loop candidate.
Note that (unless otherwise noted) a lower “score” is better!
The computed scores are in the same order as the candidates in here.
Raises: | RuntimeError if IsAllAtomScoringSetUp()
is False, if keys has a key for which no scorer exists or if
anything is raised when calculating the scores.
|
-
CalculateSequenceProfileScores (score_container, structure_db, prof, offset=0)
-
CalculateStructureProfileScores (score_container, structure_db, prof, offset=0)
Calculates a score comparing the given profile prof, starting at offset,
with the sequence / structure profile of each candidate as extracted from
structure_db (see ost.seq.ProfileHandle.GetAverageScore() for
details; prof.null_model is used for weighting).
Note that for profile scores a higher “score” is better! So take care when
combining these with other scores, where it is commonly the other way around.
This requires that each candidate has a fragment info connected to the
given structure_db (e.g. FillFromDatabase() must have been called
with this DB). A short usage sketch follows below.
The computed scores are in the same order as the candidates in here.
Raises: | RuntimeError if HasFragmentInfos() is
False, if structure_db is incompatible with the stored fragment
infos or if prof has less than offset+len elements (len =
length of loops stored in here).
|
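Continuing the first example above, profile scores could be added as sketched
below. The profile file is the one used in the example at the bottom of this
page; any ost.seq.ProfileHandle covering the target sequence works:
from ost import io
# load a sequence profile for the target sequence
prof = io.LoadSequenceProfile("data/1CRNA.hhm")
prof_scores = modelling.ScoreContainer()
# the loop starts at index 23 of SEQRES, hence offset = 23
loop_candidates.CalculateSequenceProfileScores(prof_scores, structure_db,
                                               prof, 23)
loop_candidates.CalculateStructureProfileScores(prof_scores, structure_db,
                                                prof, 23)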
-
CalculateStemRMSDs (score_container, n_stem, c_stem)
Calculates RMSD between the given stems and the first and last residue of
the loop candidates. This first superposes the first loop residue with
n_stem and then computes the RMSD between the last residue’s N, CA, C
positions and the corresponding atoms in c_stem.
Note that this score is only useful before calling ApplyCCD() or
ApplyKIC() .
The computed scores are in the same order as the candidates in here.
Raises: | RuntimeError if stems do not contain N, CA and C
atoms.
|
-
CalculateSequenceProfileScores (structure_db, prof, offset=0)
-
CalculateStructureProfileScores (structure_db, prof, offset=0)
-
CalculateStemRMSDs (n_stem, c_stem)
Same as the score_container variant above, but here we directly return the
score vector instead of storing it in a container.
Returns: | Score for each candidate (same order as candidates in here). |
Return type: | list of float |
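For example, the stem RMSDs can be inspected directly (as noted above, this is
only meaningful before the candidates have been closed):
# direct variant: returns a list of floats instead of filling a container
stem_rmsds = loop_candidates.CalculateStemRMSDs(n_stem, c_stem)
print("Best achievable stem match: %.3f" % min(stem_rmsds))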
-
Add (bb_list)
Parameters: | bb_list (BackboneList ) – The loop candidate to be added. |
Raises: | RuntimeError if sequence of bb_list is not
consistent with internal sequence |
-
AddFragmentInfo (fragment)
Adds a fragment info for the last candidate added with Add() . Note
that these infos are added automatically by FillFromDatabase() and are
updated whenever the candidates are copied, removed, clustered etc., so you
will probably never need this function.
Parameters: | fragment (FragmentInfo ) – The fragment info to add. |
-
Remove (index)
Remove a loop candidate from the list of candidates.
Parameters: | index (int ) – The index of the candidate that will be removed |
Raises: | RuntimeError if index is out of bounds. |
-
HasFragmentInfos ()
Returns: | True, if each loop candidate has a connected fragment info. |
Return type: | bool |
-
GetFragmentInfo (index)
Returns: | Fragment info connected to loop candidate at given index. |
Return type: | FragmentInfo |
Parameters: | index (int ) – The index of the candidate. |
Raises: | RuntimeError if HasFragmentInfos() is
False or if index is out of bounds. |
-
Extract (indices)
Returns: | Container with candidates with given indices. |
Return type: | LoopCandidates |
Parameters: | indices (list of int ) – Candidate indices to extract. |
Raises: | RuntimeError if any index is out of bounds. |
-
GetClusters (max_dist, superposed_rmsd=False)
Clusters the loop candidates according to their pairwise CA RMSD using an
agglomerative hierarchical clustering algorithm. Initially, every candidate
is assigned its own cluster. At every clustering step, the two clusters with
the shortest distance get merged, the distance being defined as the average
CA RMSD between any of the members of the two clusters.
Parameters: |
- max_dist (
float ) – Maximal distance two clusters can have to be merged
- superposed_rmsd (
bool ) – Whether to calculate the RMSD based on a minimal
RMSD superposition or simply consider the current
positions
|
Returns: | Lists of loop candidate indices into this container
|
Return type: | list of list of int
|
-
GetClusteredCandidates (max_dist, neglect_size_one=True, superposed_rmsd=False)
Returns: | List of loop candidates clustered as in GetClusters() .
|
Return type: | list of LoopCandidates
|
Parameters: |
- max_dist (
float ) – Maximal distance two clusters can have to be merged
- neglect_size_one (
bool ) – Whether to neglect (i.e. not add to the returned list)
clusters that contain only a single loop candidate
- superposed_rmsd (
bool ) – Whether to calculate the RMSD based on a minimal
RMSD superposition or simply consider the current
positions
|
-
GetLargestCluster (max_dist, superposed_rmsd=False)
Instead of performing a full clustering, the algorithm simply searches for
the candidate with the most other candidates having a CA RMSD below
max_dist. This candidate then serves as centroid of the returned
cluster.
Parameters: |
- max_dist (
float ) – Maximal CA RMSD to cluster centroid
- superposed_rmsd (
bool ) – Whether to calculate the RMSD based on a minimal
RMSD superposition or simply consider the current
positions
|
Returns: | Largest possible cluster with all members having a
CA RMSD below max_dist to cluster centroid.
|
Return type: | LoopCandidates
|
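As a brief illustration, the closed candidates from the first example could be
reduced to the members of the largest cluster. This is only a sketch; the
1.0 Å threshold is an arbitrary choice:
# cluster all candidates with a maximal merging distance of 1.0 A CA RMSD
clusters = loop_candidates.GetClusters(1.0)
print("Number of clusters: %d" % len(clusters))
# keep only the candidates belonging to the largest cluster
largest = max(clusters, key=len)
loop_candidates = loop_candidates.Extract(list(largest))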
Keeping track of loop candidate scores
Two helper classes are used to keep track and combine different scores computed
on loop candidates.
-
class
promod3.modelling.ScoreContainer
Container keeping one vector of scores per scorer, with one value per loop
candidate in each vector. All score vectors are guaranteed to have the same
number of values.
-
IsEmpty ()
Returns: | True, if no loop candidates have been scored with any scorer yet. |
Return type: | bool |
-
Contains (key)
Returns: | True, if a score vector for this key was already added. |
Return type: | bool |
Parameters: | key (str ) – Key for desired scorer. |
-
Get (key)
Returns: | Score vector for the given key. |
Return type: | list of float (one value per candidate, see GetNumCandidates() ) |
Parameters: | key (str ) – Key for desired score vector. |
Raises: | RuntimeError if there are no scores for that
key. |
-
Set (key, scores)
Parameters: |
- key (
str ) – Set scores for that key.
- scores (
list of float ) – Score vector to set.
|
Raises: | RuntimeError if this container contains other
score vectors with a different number of entries.
|
-
GetNumCandidates ()
Returns: | Number of loop candidates that are scored here. This is the length
of each score vector in this container. |
Return type: | int |
-
LinearCombine (linear_weights)
Returns: | Weighted, linear combination of scores. |
Return type: | list of float (one value per candidate, see GetNumCandidates() ) |
Parameters: | linear_weights (dict (keys: str ,
values: float )) – Weights for each scorer. |
Raises: | RuntimeError if linear_weights has a key for
which no scores were added. |
-
Copy ()
-
Extract (indices)
Returns: | Container with scores for a subset of loop candidates. |
Return type: | ScoreContainer |
Parameters: | indices (list of int ) – List of loop candidate indices to pick
(in [0, GetNumCandidates() -1]) |
Raises: | RuntimeError if any index is out of bounds. |
-
Extend (other)
Extend each score vector with the score vector of other (must have
matching keys).
Parameters: | other (ScoreContainer ) – Score container to be added to this one. |
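A minimal, self-contained sketch of how a ScoreContainer combines scores; the
keys and values below are made up for illustration:
from promod3 import modelling
sc = modelling.ScoreContainer()
sc.Set("cbeta", [0.5, 0.2, 0.9])   # one value per loop candidate
sc.Set("clash", [1.0, 0.0, 0.3])
combined = sc.LinearCombine({"cbeta": 1.0, "clash": 2.0})
# combined[i] = 1.0 * cbeta[i] + 2.0 * clash[i]  ->  [2.5, 0.2, 1.5]
best_idx = combined.index(min(combined))  # lower combined score is better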
-
class
promod3.modelling.ScoringWeights
Globally accessible set of weights to be used in scoring. This also defines
a consistent naming of keys used for backbone and all atom scorers as set up
by SetupDefaultBackboneScoring() and SetupDefaultAllAtomScoring() .
Different sets of weights are available. You can get general weights
that have been optimized on a non-redundant set of loops with lengths [3,14]
(including stem residues). The alternative is to get weights that have been
optimized only on a length-specific subset of loops. By default, you get
different weights for the following lengths: [0,1,2,3,4], [5,6], [7,8], [9,10],
[11,12], [13,14].
If you choose to modify the weights, please make sure to set consistently named
keys here and to use consistently named scorers and scoring computations!
-
static
GetWeights (with_db=False, with_aa=False, length_dependent=False, loop_length=-1)
Returns: | Named weights to be used when scoring loop candidates. The default
weights were optimized to give the best performance when choosing
the loop candidate with the lowest combined score. Each set of
weights includes (different) backbone scoring weights that are
optionally trained only on a subset of loops with the corresponding
loop length.
|
Return type: | dict (keys: str , values: float )
|
Parameters: |
- with_db (
bool ) – True to choose a set of weights including DB specific scores
(stem RMSD and profile scores)
- with_aa (
bool ) – True to choose a set of weights including all atom scores
- length_dependent (
bool ) – Whether to use general weights or their
length-dependent counterparts.
- loop_length (
int ) – Length of full loop. If no weights are available for
this length or when loop_length is -1, the general
weights are returned.
|
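For instance, default weight sets can be retrieved as sketched below; the loop
length of 6 is an arbitrary example:
from promod3 import modelling
# general weights including DB-specific and all atom scores
weights = modelling.ScoringWeights.GetWeights(with_db=True, with_aa=True)
# weights optimized specifically for loops of length 6
weights_6 = modelling.ScoringWeights.GetWeights(length_dependent=True,
                                                loop_length=6)
Both dictionaries can be passed directly to ScoreContainer.LinearCombine() as
shown in the example at the bottom of this page.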
-
static
SetWeights (with_db, with_aa, weights, length_dependent=False, loop_length=-1)
Overwrite a set of weights as returned by GetWeights() .
-
static
GetBackboneWeights (with_db=False, with_aa=False, length_dependent=False, loop_length=-1)
Returns: | Subset of GetWeights() for backbone scoring as in
scoring.BackboneOverallScorer.CalculateLinearCombination() .
|
Return type: | dict (keys: str , values: float )
|
-
static
GetAllAtomWeights (with_db=False, length_dependent=False, loop_length=-1)
Returns: | Subset of GetWeights() for all atom scoring as in
scoring.AllAtomOverallScorer.CalculateLinearCombination() .
|
Return type: | dict (keys: str , values: float )
|
-
static
GetStemRMSDsKey ()
-
static
GetSequenceProfileScoresKey ()
-
static
GetStructureProfileScoresKey ()
Returns: | Default key for stem RMSD / sequence profile / structure profile
scores. |
Return type: | str |
-
static
SetStemRMSDsKey (key)
-
static
SetSequenceProfileScoresKey (key)
-
static
SetStructureProfileScoresKey (key)
Parameters: | key (str ) – New default key for stem RMSD / sequence profile / structure
profile scores. |
-
static
GetBackboneScoringKeys ()
-
static
GetAllAtomScoringKeys ()
Returns: | List of backbone / all-atom scorers to be computed for any set of
weights defined in here. |
Return type: | list of str |
-
static
SetBackboneScoringKeys (keys)
-
static
SetAllAtomScoringKeys (keys)
Parameters: | keys (list of str ) – New list of backbone / all-atom scorers to be computed for any
set of weights defined in here. |
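If you customize any of these keys or scorer selections, remember that keys,
scorers and weights must stay consistent. A short sketch (the chosen key and
scorer names are arbitrary examples):
from promod3 import modelling
# use a custom key to store stem RMSD scores
modelling.ScoringWeights.SetStemRMSDsKey("stem_rmsd")
# restrict backbone scoring to the cbeta and clash scorers
modelling.ScoringWeights.SetBackboneScoringKeys(["cbeta", "clash"])
Any weights you subsequently set or retrieve should then use the same key
names.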
Example: loop scoring in modelling
In the example below, we show how to find and choose a loop candidate to close
a gap in a model. This shows the combined usage of ModellingHandle to
keep a consistent modelling environment, LoopCandidates with its
scoring routines, ScoreContainer for keeping track of scores and
ScoringWeights to combine scores:
from ost import io, seq
from promod3 import modelling, loop
# setup raw model
tpl = io.LoadPDB('data/1crn_cut.pdb')
seq_trg = 'TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN'
seq_tpl = 'TTCCPSIVARSNFNVCRLPGTPEA------GCIIIPGATCPGDYAN'
aln = seq.CreateAlignment(seq.CreateSequence('trg', seq_trg),
                          seq.CreateSequence('tpl', seq_tpl))
aln.AttachView(1, tpl.CreateFullView())
mhandle = modelling.BuildRawModel(aln)
print(("Number of gaps in raw model: %d" % len(mhandle.gaps)))
# setup default scorers for modelling handle
modelling.SetupDefaultBackboneScoring(mhandle)
modelling.SetupDefaultAllAtomScoring(mhandle)
# setup databases
frag_db = loop.LoadFragDB()
structure_db = loop.LoadStructureDB()
torsion_sampler = loop.LoadTorsionSamplerCoil()
# get data for gap to close
gap = mhandle.gaps[0].Copy()
print(("Gap to close: %s" % str(gap)))
n_stem = gap.before
c_stem = gap.after
start_resnum = n_stem.GetNumber().GetNum()
start_idx = start_resnum - 1 # res. num. starts at 1
# get loop candidates from FragDB
candidates = modelling.LoopCandidates.FillFromDatabase(\
                n_stem, c_stem, gap.full_seq, frag_db, structure_db)
print(("Number of loop candidates: %d" % len(candidates)))
# all scores will be kept in a score container which we update
all_scores = modelling.ScoreContainer()
# the keys used to identify scores are globally defined
print(("Stem RMSD key = '%s'" \
% modelling.ScoringWeights.GetStemRMSDsKey()))
print(("Profile keys = ['%s', '%s']" \
% (modelling.ScoringWeights.GetSequenceProfileScoresKey(),
modelling.ScoringWeights.GetStructureProfileScoresKey())))
print(("Backbone scoring keys = %s" \
% str(modelling.ScoringWeights.GetBackboneScoringKeys())))
print(("All atom scoring keys = %s" \
% str(modelling.ScoringWeights.GetAllAtomScoringKeys())))
# get stem RMSDs for each candidate (i.e. how well does it fit?)
# -> this must be done before CCD to be meaningful
candidates.CalculateStemRMSDs(all_scores, n_stem, c_stem)
# close the candidates with CCD
orig_indices = candidates.ApplyCCD(n_stem, c_stem, torsion_sampler)
print(("Number of closed loop candidates: %d" % len(candidates)))
# get subset of previously computed scores
all_scores = all_scores.Extract(orig_indices)
# add profile scores (needs profile for target sequence)
prof = io.LoadSequenceProfile("data/1CRNA.hhm")
candidates.CalculateSequenceProfileScores(all_scores, structure_db,
                                          prof, start_idx)
candidates.CalculateStructureProfileScores(all_scores, structure_db,
                                           prof, start_idx)
# add backbone scores
candidates.CalculateBackboneScores(all_scores, mhandle, start_resnum)
# add all atom scores
candidates.CalculateAllAtomScores(all_scores, mhandle, start_resnum)
# use default weights to combine scores
weights = modelling.ScoringWeights.GetWeights(with_db=True,
                                              with_aa=True)
scores = all_scores.LinearCombine(weights)
# rank them (best = lowest "score")
arg_sorted_scores = sorted([(v,i) for i,v in enumerate(scores)])
print("Ranked candidates: score, index")
for v,i in arg_sorted_scores:
    print(("%g, %d" % (v,i)))
# insert best into model, update scorers and clear gaps
best_candidate = candidates[arg_sorted_scores[0][1]]
modelling.InsertLoopClearGaps(mhandle, best_candidate, gap)
print(("Number of gaps in closed model: %d" % len(mhandle.gaps)))
io.SavePDB(mhandle.model, "model.pdb")