You are reading the documentation for version 1.3 of ProMod3.
You may also want to read the documentation for:
2.0
2.1
3.0
3.1
3.2
Handling Loop Candidates
For convenience, we provide the LoopCandidates class as a container of
loop candidates with consistent length and sequence among them. It can either be
filled manually or generated using static filling functions using the
functionality from the FragDB or Monte Carlo algorithms.
Once it contains candidates you can apply closing, scoring or clustering
algorithms on all loop candidates. Example:
from ost import io
from promod3 import loop, scoring, modelling
# let's load a crambin structure from the pdb
crambin = io.LoadPDB('data/1CRN.pdb')
SEQRES = ''.join([r.one_letter_code for r in crambin.residues])
# this is the sequence we want to remodel
loop_seq = SEQRES[23:31]
# let's define the stem residues
n_stem = crambin.residues[23]
c_stem = crambin.residues[30]
# we use the StructureDB as source for structural information
structure_db = loop.LoadStructureDB()
# the FragDB allows to access the StructureDB based on geometric
# features of the loop stem residue
frag_db = loop.LoadFragDB()
# the LoopCandidates allow to handle several loops at once
# we directly want to find potential loop candidates from the
# previously loaded databases
loop_candidates = modelling.LoopCandidates.FillFromDatabase(\
n_stem, c_stem, loop_seq, frag_db, structure_db)
# candidates usually don't match exactly the required stem coords.
# CCD (Cyclic Coordinate Descent) is one way to enforce this match.
loop_candidates.ApplyCCD(n_stem, c_stem)
# setup backbone scorer with clash and cbeta scoring
score_env = scoring.BackboneScoreEnv(SEQRES)
score_env.SetInitialEnvironment(crambin)
scorer = scoring.BackboneOverallScorer()
scorer["cbeta"] = scoring.LoadCBetaScorer()
scorer["clash"] = scoring.ClashScorer()
scorer.AttachEnvironment(score_env)
# the scorer can then be used on the LoopCandidates object to
# calculate desired scores (here: cbeta & clash, start resnum = 24)
bb_scores = modelling.ScoreContainer()
loop_candidates.CalculateBackboneScores(bb_scores, scorer, score_env,
["cbeta", "clash"], 24)
# we finally perform a weighted linear combination of the scores,
# pick the best one, insert it into our structure and save it
weights = {"cbeta": 1, "clash": 1}
scores = bb_scores.LinearCombine(weights)
min_score = min(scores)
min_candidate = scores.index(min_score)
bb_list = loop_candidates[min_candidate]
bb_list.InsertInto(crambin.chains[0], n_stem.GetNumber())
io.SavePDB(crambin, "modified_crambin.pdb")
The LoopCandidates class

class
promod3.modelling. LoopCandidates (seq)
Initializes an empty container of loop candidates.
Candidates can be accessed and iterated as if it was a list of
BackboneList .
Parameters:  seq (str ) – The sequence being enforced for all candidates 

static
FillFromDatabase (n_stem, c_stem, seq, frag_db, structural_db, extended_search=False)
Searches for loop candidates matching the length (number of residues in
seq) and geometry (of n_stem and c_stem) of the loop to be modelled in
a fragment database.
This call also adds fragment infos (HasFragmentInfos() will be True).
Parameters: 
 n_stem (
ost.mol.ResidueHandle ) – The residue at the Nterminal end of the loop
 c_stem (
ost.mol.ResidueHandle ) – The residue at the Cterminal end of the loop
 seq (
str ) – The sequence of residues to be added including the
n_stem and c_stem
 frag_db (
FragDB ) – The fragment database
 structural_db (
StructureDB ) – The according structural database
 extended_search (
bool ) – Whether search should be extended to fragments
matching the geometry of the n_stem and c_stem
a bit less precisely.

Returns:  A list of loop candidates

Return type:  LoopCandidates

Raises:  RuntimeError if stems do no contain N, CA and C
atoms.


static
FillFromMonteCarloSampler (seq, num_loops, steps, sampler, closer, scorer, cooler, random_seed=0)

static
FillFromMonteCarloSampler (initial_bb, seq, num_loops, steps, sampler, closer, scorer, cooler, random_seed=0)
Uses Monte Carlo simulated annealing to sample the loop to be modelled.
If initial_bb is given, every Monte Carlo run starts from that configuration.
Warning
The BackboneScoreEnv
attached to scorer will be altered in the specified stretch.
You might consider the Stash / Pop mechanism of the
BackboneScoreEnv to restore to the
original state once the sampling is done.
Parameters: 
 initial_bb (
BackboneList ) – Initial configuration used for the sampling
 seq (
str ) – The sequence of residues to be sampled
 num_loops (
int ) – Number of loop candidates to return
 steps (
int ) – Number of MC steps to perform for each loop candidate
generated
 sampler (Sampler Object) – Used to generate a new configuration at each MC step
 closer (Closer Object) – Used to close the loop after each MC step
 scorer (Scorer Object) – Used to score the generated configurations at each MC step
 cooler (Cooler Object) – Controls the temperature profile of the simulated annealing
 random_seed (
int ) – Seed to feed the random number generator for
accepting/rejecting proposed monte carlo steps.
For every monte carlo run, the random number generator
gets refreshed and this seed gets increased by 1.

Returns:  The resulting candidates

Return type:  LoopCandidates

Raises:  RuntimeError , if initial_bb is not given and
the Monte Carlo sampler failed to initialize (i.e. stems are too
far apart) or if initial_bb is given and it is inconsistent with
seq.


GetSequence ()
Returns:  The Sequence 
Return type:  str 

ApplyCCD (n_stem, c_stem, max_iterations=1000, rmsd_cutoff=0.1, keep_non_converged=false, random_seed=0)

ApplyCCD (n_stem, c_stem, torsion_sampler, max_iterations=1000, rmsd_cutoff=0.1, keep_non_converged=false, random_seed=0)

ApplyCCD (n_stem, c_stem, torsion_samplers, max_iterations=1000, rmsd_cutoff=0.1, keep_non_converged=false, random_seed=0)
Closes all loop candidates in this container using the CCD algorithm (i.e.
modifies the candidates so that its stem residues match those of n_stem
and c_stem). CCD (cyclic coordinate descent, see CCD ) is an
iterative minimization algorithm.
If torsion_sampler is given, it is used at each step of the closing to
calculate the probability of the proposed move, which is then accepted or
not depending on a metropolis criterium.
Parameters: 
 n_stem (
ost.mol.ResidueHandle ) – Residue defining the nstem positions every candidate
should match. See CCD() .
 c_stem (
ost.mol.ResidueHandle ) – Residue defining the cstem positions every candidate
should match. See CCD() .
 torsion_sampler (
TorsionSampler / list
of TorsionSampler ) – A torsion sampler (used for all residues) or a list
of samplers (one per residue).
 max_iterations (
int ) – Maximum number of iterations
 rmsd_cutoff (
float ) – Cutoff in stem residue RMSD used to determine
convergence
 keep_non_converged (
bool ) – Whether to keep loop candidates for which the
closing did not converge
 random_seed (
int ) – seed for random number generator used to
accept/reject moves in CCD algorithm

Returns:  An index for each kept candidate corresponding to the candidate
index before this function was called. This is useful to keep track
of scores and other data extracted before.

Return type:  list of int


ApplyKIC (n_stem, c_stem, pivot_one, pivot_two, pivot_three)
Closes all loop candidates in this container (i.e. modifies the candidates
so that its stem residues match those of n_stem and c_stem, which are
the stem residues of the loop being modelled), using the KIC algorithm. This
algorithm finds analytical solutions for closing the loop by modifying the
torsion angles of just three pivot residues. Due to the underlying
mathematical formalism in KIC, up to 16 solutions can be found for every
candidate. This leads to an increase in number of loops.
Parameters: 
 n_stem (
ost.mol.ResidueHandle ) – Residue defining the nstem positions every candidate
should match
 c_stem (
ost.mol.ResidueHandle ) – Residue defining the cstem positions every candidate
should match
 pivot_one (
int ) – First pivot residue
 pivot_two (
int ) – Second pivot residue
 pivot_three (
int ) – Third pivot residue

Returns:  An index for each added candidate corresponding to the original
candidate index before this function was called (each candidate can
lead to multiple or no candidates after KIC is applied). This is
useful to keep track of scores and other data extracted before.

Return type:  list of int


CalculateBackboneScores (score_container, scorer, scorer_env, start_resnum, chain_idx=0)

CalculateBackboneScores (score_container, scorer, scorer_env, keys, start_resnum, chain_idx=0)

CalculateBackboneScores (score_container, mhandle, start_resnum, chain_idx=0)

CalculateBackboneScores (score_container, mhandle, keys, start_resnum, chain_idx=0)

CalculateAllAtomScores (score_container, mhandle, start_resnum, chain_idx=0)

CalculateAllAtomScores (score_container, mhandle, keys, start_resnum, chain_idx=0)
Calculate backbone / allatom scores for each loop candidate.
Note that (unless otherwise noted) a lower “score” is better!
The computed scores are in the same same order as the candidates in here.

CalculateSequenceProfileScores (score_container, structure_db, prof, offset=0)

CalculateStructureProfileScores (score_container, structure_db, prof, offset=0)
Calculates a score comparing the given profile prof starting at offset
with the sequence / structure profile of each candidate as extracted from
structure_db (see ost.seq.ProfileHandle.GetAverageScore() for
details, prof.null_model is used for weighting).
Note that for profile scores a higher “score” is better! So take care when
combining this to other scores, where it is commonly the other way around.
This requires that each candidate has a connected fragment info into the
given structure_db (e.g. FillFromDatabase() must have been called
with this DB).
The computed scores are in the same same order as the candidates in here.
Parameters: 

Raises:  RuntimeError if HasFragmentInfos() is
False, if structure_db is incompatible with the stored fragment
infos or if prof has less than offset+len elements (len =
length of loops stored in here).


CalculateStemRMSDs (score_container, n_stem, c_stem)
Calculates RMSD between the given stems and the first and last residue of
the loop candidates. This first superposes the first loop residue with
n_stem and then computes the RMSD between the last residue’s N, CA, C
positions and the corresponding atoms in c_stem.
Note that this score is only useful before calling ApplyCCD() or
ApplyKIC() .
The computed scores are in the same same order as the candidates in here.
Parameters: 

Raises:  RuntimeError if stems do no contain N, CA and C
atoms.


CalculateSequenceProfileScores (structure_db, prof, offset=0)

CalculateStructureProfileScores (structure_db, prof, offset=0)

CalculateStemRMSDs (n_stem, c_stem)
Same as the score_container variant above, but here we directly return the
score vector instead of storing it in a container.
Returns:  Score for each candidate (same order as candidates in here). 
Return type:  list of float 

Add (bb_list)
Parameters:  bb_list (BackboneList ) – The loop candidate to be added. 
Raises:  RuntimeError if sequence of bb_list is not
consistent with internal sequence 

AddFragmentInfo (fragment)
Adds a fragment info for the last candidate added with Add() . Note
that these infos are added automatically with FillFromDatabase() and
updated whenever the candidates are copied, removed, clustered etc.. So you
will probably never need this function.
Parameters:  fragment (FragmentInfo ) – The fragment info to add. 

Remove (index)
Remove a loop candidate from the list of candidates.
Parameters:  index (int ) – The index of the candidate that will be removed 
Raises:  RuntimeError if index is out of bounds. 

HasFragmentInfos ()
Returns:  True, if each loop candidate has a connected fragment info. 
Return type:  bool 

GetFragmentInfo (index)


Extract (indices)
Returns:  Container with candidates with given indices. 
Return type:  LoopCandidates 
Parameters:  indices (list of int ) – Candidate indices to extract. 
Raises:  RuntimeError if any index is out of bounds. 

GetClusters (max_dist, superposed_rmsd = False)
Clusters the loop candidates according to their pairwise CA RMSD using an
agglomerative hierarchical clustering algorithm. Every candidate gets
assigned a unique cluster. At every clustering step, the two clusters with
shortest distance get merged, with the distance definition being the average
CA RMSD between any of the members of the two clusters.
Parameters: 
 max_dist (
float ) – Maximal distance two clusters can have to be merged
 superposed_rmsd (
bool ) – Whether to calculate the RMSD based on a minimal
RMSD superposition or simply consider the current
positions

Returns:  Lists of loop candidate indices into this container

Return type:  list of list of int


GetClusteredCandidates (max_dist, neglect_size_one=True, superposed_rmsd=False)
Returns:  List of loop candidates clustered as in GetClusters() .

Return type:  list of LoopCandidates

Parameters: 
 max_dist (
float ) – Maximal distance two clusters can have to be merged
 neglect_size_one (
bool ) – Whether clusters should be added to the returned
list if they only contain one loop candidate
 superposed_rmsd (
bool ) – Whether to calculate the RMSD based on a minimal
RMSD superposition or simply consider the current
positions


GetLargestCluster (max_dist, superposed_rmsd = False)
Instead of performing a full clustering, the algorithm simply searches for
the candidate with the most other candidates having a CA RMSD below
max_dist. This candidate then serves as centroid for the return
cluster.
Parameters: 
 max_dist (
float ) – Maximal CA RMSD to cluster centroid
 superposed_rmsd (
bool ) – Whether to calculate the RMSD based on a minimal
RMSD superposition or simply consider the current
positions

Returns:  Largest possible cluster with all members having a
CA RMSD below max_dist to cluster centroid.

Return type:  LoopCandidates

Keeping track of loop candidate scores
Two helper classes are used to keep track and combine different scores computed
on loop candidates.

class
promod3.modelling. ScoreContainer
Container to keep vectors of scores (one for each loop candidate) for each
scorer (one vector for each single scorer). Each score vector is guaranteed
to have the same number of values.

IsEmpty ()
Returns:  True, if no loop candidates have been scored with any scorer yet. 
Return type:  bool 

Contains (key)
Returns:  True, if a score vector for this key was already added. 
Return type:  bool 
Parameters:  key (str ) – Key for desired scorer. 

Get (key)
Returns:  Score vector for the given key. 
Return type:  list of GetNumCandidates() float 
Parameters:  key (str ) – Key for desired score vector. 
Raises:  RuntimeError if there are no scores for that
key. 

Set (key, scores)
Parameters: 
 key (
str ) – Set scores for that key.
 scores (
list of float ) – Score vector to set.

Raises:  RuntimeError if this container contains other
score vectors with a different number of entries.


GetNumCandidates ()
Returns:  Number of loop candidates that are scored here. This is the length
of each score vector in this container. 
Return type:  int 

LinearCombine (linear_weights)
Returns:  Weighted, linear combination of scores. 
Return type:  list of GetNumCandidates() float 
Parameters:  linear_weights (dict (keys: str ,
values: float )) – Weights for each scorer. 
Raises:  RuntimeError if linear_weights has a key for
which no scores were added. 

Copy ()

Returns:  Container with scores for a subset of loop candidates. 
Return type:  ScoreContainer 
Parameters:  indices (list of int ) – List of loop candidate indices to pick
(in [0, GetNumCandidates() 1]) 
Raises:  RuntimeError if any index is out of bounds. 

Extend (other)
Extend each score vector with the score vector of other (must have
matching keys).
Parameters:  other (ScoreContainer ) – Score container to be added to this one. 

class
promod3.modelling. ScoringWeights
Globally accessible set of weights to be used in scoring. This also defines
a consistent naming of keys used for backbone and all atom scorers as set up
by SetupDefaultBackboneScoring() and SetupDefaultAllAtomScoring() .
Different sets of weights are available. You can get general weights
that have been optimized for a non redundant set of loops with lengths [3,14]
(including stem residues). The alternative is to get weights that have only
been optimized on a length specific subset of loops. By default you get
different weights for following lengths: [0,1,2,3,4], [5,6], [7,8], [9,10],
[11,12], [13,14].
If you choose to modify the weights, please ensure to set consistently named
keys in here and to use consistently named scorers and scoring computations!

static
GetWeights (with_db=False, with_aa=False, length_dependent=False, loop_length=1)
Returns:  Named weights to be used when scoring loop candidates. The default
weights were optimized to give the best performance when choosing
the loop candidate with the lowest combined score. Each set of
weights includes (different) backbone scoring weights, that are
optionally only trained on a subset of loops with corresponding
loop length.

Return type:  dict (keys: str , values: float )

Parameters: 
 with_db (
bool ) – True to choose a set of weights including DB specific scores
(stem RMSD and profile scores)
 with_aa (
bool ) – True to choose a set of weights including all atom scores
 length_dependent (
bool ) – Whether to use general weights or their length
length dependent counterparts.
 loop_length (
int ) – Length of full loop. If no weights are available for
this length or when loop_length is 1, the general
weights are returned.


static
SetWeights (with_db, with_aa, weights, length_dependent=False, loop_length=1)
Overwrite a set of weights as returned by GetWeights() .

static
GetBackboneWeights (with_db=False, with_aa=False, length_dependent=False, loop_length=1)
Returns:  Subset of GetWeights() for backbone scoring as in
scoring.BackboneOverallScorer.CalculateLinearCombination() .

Return type:  dict (keys: str , values: float )

Parameters: 


static
GetAllAtomWeights (with_db=False, length_dependent=False, loop_length=1)
Returns:  Subset of GetWeights() for all atom scoring as in
scoring.AllAtomOverallScorer.CalculateLinearCombination() .

Return type:  dict (keys: str , values: float )

Parameters: 


static
GetStemRMSDsKey ()

static
GetSequenceProfileScoresKey ()

static
GetStructureProfileScoresKey ()
Returns:  Default key for stem RMSD / sequence profile / structure profile
scores. 
Return type:  str 

static
SetStemRMSDsKey (key)

static
SetSequenceProfileScoresKey (key)

static
SetStructureProfileScoresKey (key)
Parameters:  key (str ) – New default key for stem RMSD / sequence profile / structure
profile scores. 

static
GetBackboneScoringKeys ()

static
GetAllAtomScoringKeys ()
Returns:  List of backbone / allatom scorers to be computed for any set of
weights defined in here. 
Return type:  list of str 

static
SetBackboneScoringKeys (keys)

static
SetAllAtomScoringKeys (keys)
Parameters:  keys (list of str ) – New list of backbone / allatom scorers to be computed for any
set of weights defined in here. 
Example: loop scoring in modelling
In the example below, we show how we find and choose a loop candidate to close a
gap for a model. This shows the combined usage of ModellingHandle to
keep a consistent modelling environment, LoopCandidates with its
scoring routines, ScoreContainer for keeping track of scores and
ScoringWeights to combine scores:
from ost import io, seq
from promod3 import modelling, loop
# setup raw model
tpl = io.LoadPDB('data/1crn_cut.pdb')
seq_trg = 'TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN'
seq_tpl = 'TTCCPSIVARSNFNVCRLPGTPEAGCIIIPGATCPGDYAN'
aln = seq.CreateAlignment(seq.CreateSequence('trg', seq_trg),
seq.CreateSequence('tpl', seq_tpl))
aln.AttachView(1, tpl.CreateFullView())
mhandle = modelling.BuildRawModel(aln)
print("Number of gaps in raw model: %d" % len(mhandle.gaps))
# setup default scorers for modelling handle
modelling.SetupDefaultBackboneScoring(mhandle)
modelling.SetupDefaultAllAtomScoring(mhandle)
# setup databases
frag_db = loop.LoadFragDB()
structure_db = loop.LoadStructureDB()
torsion_sampler = loop.LoadTorsionSamplerCoil()
# get data for gap to close
gap = mhandle.gaps[0].Copy()
print("Gap to close: %s" % str(gap))
n_stem = gap.before
c_stem = gap.after
start_resnum = n_stem.GetNumber().GetNum()
start_idx = start_resnum  1 # res. num. starts at 1
# get loop candidates from FragDB
candidates = modelling.LoopCandidates.FillFromDatabase(\
n_stem, c_stem, gap.full_seq, frag_db, structure_db)
print("Number of loop candidates: %d" % len(candidates))
# all scores will be kept in a score container which we update
all_scores = modelling.ScoreContainer()
# the keys used to identify scores are globally defined
print("Stem RMSD key = '%s'" \
% modelling.ScoringWeights.GetStemRMSDsKey())
print("Profile keys = ['%s', '%s']" \
% (modelling.ScoringWeights.GetSequenceProfileScoresKey(),
modelling.ScoringWeights.GetStructureProfileScoresKey()))
print("Backbone scoring keys = %s" \
% str(modelling.ScoringWeights.GetBackboneScoringKeys()))
print("All atom scoring keys = %s" \
% str(modelling.ScoringWeights.GetAllAtomScoringKeys()))
# get stem RMSDs for each candidate (i.e. how well does it fit?)
# > this must be done before CCD to be meaningful
candidates.CalculateStemRMSDs(all_scores, n_stem, c_stem)
# close the candidates with CCD
orig_indices = candidates.ApplyCCD(n_stem, c_stem, torsion_sampler)
print("Number of closed loop candidates: %d" % len(candidates))
# get subset of previously computed scores
all_scores = all_scores.Extract(orig_indices)
# add profile scores (needs profile for target sequence)
prof = io.LoadSequenceProfile("data/1CRNA.hhm")
candidates.CalculateSequenceProfileScores(all_scores, structure_db,
prof, start_idx)
candidates.CalculateStructureProfileScores(all_scores, structure_db,
prof, start_idx)
# add backbone scores
candidates.CalculateBackboneScores(all_scores, mhandle, start_resnum)
# add all atom scores
candidates.CalculateAllAtomScores(all_scores, mhandle, start_resnum)
# use default weights to combine scores
weights = modelling.ScoringWeights.GetWeights(with_db=True,
with_aa=True)
scores = all_scores.LinearCombine(weights)
# rank them (best = lowest "score")
arg_sorted_scores = sorted([(v,i) for i,v in enumerate(scores)])
print("Ranked candidates: score, index")
for v,i in arg_sorted_scores:
print("%g, %d" % (v,i))
# insert best into model, update scorers and clear gaps
best_candidate = candidates[arg_sorted_scores[0][1]]
modelling.InsertLoopClearGaps(mhandle, best_candidate, gap)
print("Number of gaps in closed model: %d" % len(mhandle.gaps))
io.SavePDB(mhandle.model, "model.pdb")

Contents
Search
Enter search terms or a module, class or function name.
Previous topic
Handling Gaps
Next topic
Fitting Loops Into Gaps
You are here
