Handling Loop Candidates

For convenience, we provide the LoopCandidates class as a container of loop candidates that all share the same length and sequence. It can either be filled manually or generated with static filling functions that use the FragDB or Monte Carlo algorithms. Once it contains candidates, you can apply closing, scoring or clustering algorithms to all of them at once. Example:

from ost import io
from promod3 import loop, scoring, modelling

# let's load a crambin structure from the pdb
crambin = io.LoadPDB('data/1CRN.pdb')
SEQRES = ''.join([r.one_letter_code for r in crambin.residues])

# this is the sequence we want to remodel
loop_seq = SEQRES[23:31]

# let's define the stem residues
n_stem = crambin.residues[23]
c_stem = crambin.residues[30]

# we use the StructureDB as source for structural information
structure_db = loop.LoadStructureDB()

# the FragDB provides access to the StructureDB based on geometric
# features of the loop stem residues
frag_db = loop.LoadFragDB()

# LoopCandidates allows us to handle several loops at once;
# here we directly search for potential loop candidates in the
# previously loaded databases
loop_candidates = modelling.LoopCandidates.FillFromDatabase(\
                    n_stem, c_stem, loop_seq, frag_db, structure_db)

# candidates usually don't match exactly the required stem coords.
# CCD (Cyclic Coordinate Descent) is one way to enforce this match.
loop_candidates.ApplyCCD(n_stem, c_stem)

# setup backbone scorer with clash and cbeta scoring
score_env = scoring.BackboneScoreEnv(SEQRES)
score_env.SetInitialEnvironment(crambin)
scorer = scoring.BackboneOverallScorer()
scorer["cbeta"] = scoring.LoadCBetaScorer()
scorer["clash"] = scoring.ClashScorer()
scorer.AttachEnvironment(score_env)

# the scorer can then be used on the LoopCandidates object to
# calculate desired scores (here: cbeta & clash, start resnum = 24)
bb_scores = modelling.ScoreContainer()
loop_candidates.CalculateBackboneScores(bb_scores, scorer, score_env,
                                        ["cbeta", "clash"], 24)
# we finally perform a weighted linear combination of the scores,
# pick the best one, insert it into our structure and save it
weights = {"cbeta": 1, "clash": 1}
scores = bb_scores.LinearCombine(weights)
min_score = min(scores)
min_candidate = scores.index(min_score)
bb_list = loop_candidates[min_candidate]
bb_list.InsertInto(crambin.chains[0], n_stem.GetNumber())
io.SavePDB(crambin, "modified_crambin.pdb")

The LoopCandidates class

class promod3.modelling.LoopCandidates(seq)

Initializes an empty container of loop candidates.

Candidates can be accessed and iterated over as if the container were a list of BackboneList objects.

Parameters:

seq (str) – The sequence being enforced for all candidates

static FillFromDatabase(n_stem, c_stem, seq, frag_db, structural_db, extended_search=False)

Searches a fragment database for loop candidates matching the length (number of residues in seq) and geometry (of n_stem and c_stem) of the loop to be modelled.

This call also adds fragment infos (HasFragmentInfos() will be True).

Parameters:
  • n_stem (ost.mol.ResidueHandle) – The residue at the N-terminal end of the loop

  • c_stem (ost.mol.ResidueHandle) – The residue at the C-terminal end of the loop

  • seq (str) – The sequence of residues to be added including the n_stem and c_stem

  • frag_db (FragDB) – The fragment database

  • structural_db (StructureDB) – The corresponding structural database

  • extended_search (bool) – Whether the search should be extended to fragments that match the geometry of n_stem and c_stem a bit less precisely (see the sketch after this method's description).

Returns:

A list of loop candidates

Return type:

LoopCandidates

Raises:

RuntimeError if stems do not contain N, CA and C atoms.
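
A minimal usage sketch, reusing n_stem, c_stem, loop_seq, frag_db and structure_db from the introductory example above; the use of extended_search here is purely illustrative:

# if the regular search yields too few candidates, the extended
# search relaxes the matching of the stem geometry
loop_candidates = modelling.LoopCandidates.FillFromDatabase(\
                    n_stem, c_stem, loop_seq, frag_db, structure_db,
                    extended_search=True)
print("Number of candidates found: %d" % len(loop_candidates))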

static FillFromMonteCarloSampler(seq, num_loops, steps, sampler, closer, scorer, cooler, random_seed=0)
static FillFromMonteCarloSampler(initial_bb, seq, num_loops, steps, sampler, closer, scorer, cooler, random_seed=0)

Uses Monte Carlo simulated annealing to sample the loop to be modelled. If initial_bb is given, every Monte Carlo run starts from that configuration.

Warning

The BackboneScoreEnv attached to scorer will be altered in the specified stretch. You may want to use the Stash / Pop mechanism of the BackboneScoreEnv to restore the original state once the sampling is done (see the sketch after this method's description).

Parameters:
  • initial_bb (BackboneList) – Initial configuration used for the sampling

  • seq (str) – The sequence of residues to be sampled

  • num_loops (int) – Number of loop candidates to return

  • steps (int) – Number of MC steps to perform for each loop candidate generated

  • sampler (Sampler Object) – Used to generate a new configuration at each MC step

  • closer (Closer Object) – Used to close the loop after each MC step

  • scorer (Scorer Object) – Used to score the generated configurations at each MC step

  • cooler (Cooler Object) – Controls the temperature profile of the simulated annealing

  • random_seed (int) – Seed for the random number generator used to accept/reject proposed Monte Carlo steps. For every Monte Carlo run, the random number generator is reset and this seed is increased by 1.

Returns:

The resulting candidates

Return type:

LoopCandidates

Raises:

RuntimeError, if initial_bb is not given and the Monte Carlo sampler failed to initialize (i.e. stems are too far apart) or if initial_bb is given and it is inconsistent with seq.
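
A minimal sketch of the Stash / Pop mechanism mentioned in the warning above. Here mc_sampler, mc_closer, mc_scorer and mc_cooler are placeholders assumed to be set up as described in the Monte Carlo sampling documentation, score_env is assumed to be the BackboneScoreEnv used by mc_scorer, and the Stash() arguments shown (start residue number, number of residues, chain index) reflect our reading of that interface; num_loops and steps are purely illustrative:

# save the stretch that will be altered by the sampling
score_env.Stash(24, len(loop_seq), 0)
loop_candidates = modelling.LoopCandidates.FillFromMonteCarloSampler(\
                    loop_seq, 20, 5000, mc_sampler, mc_closer,
                    mc_scorer, mc_cooler)
# restore the original environment once the sampling is done
score_env.Pop()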

GetSequence()
Returns:

The Sequence

Return type:

str

ApplyCCD(n_stem, c_stem, max_iterations=1000, rmsd_cutoff=0.1, keep_non_converged=False, random_seed=0)
ApplyCCD(n_stem, c_stem, torsion_sampler, max_iterations=1000, rmsd_cutoff=0.1, keep_non_converged=False, random_seed=0)
ApplyCCD(n_stem, c_stem, torsion_samplers, max_iterations=1000, rmsd_cutoff=0.1, keep_non_converged=False, random_seed=0)

Closes all loop candidates in this container using the CCD algorithm (i.e. modifies the candidates so that their stem residues match those of n_stem and c_stem). CCD (cyclic coordinate descent, see CCD) is an iterative minimization algorithm.

If torsion_sampler is given, it is used at each step of the closing to calculate the probability of the proposed move, which is then accepted or rejected according to a Metropolis criterion.

Parameters:
  • n_stem (ost.mol.ResidueHandle) – Residue defining the n-stem positions every candidate should match. See CCD().

  • c_stem (ost.mol.ResidueHandle) – Residue defining the c-stem positions every candidate should match. See CCD().

  • torsion_sampler (TorsionSampler / list of TorsionSampler) – A torsion sampler (used for all residues) or a list of samplers (one per residue).

  • max_iterations (int) – Maximum number of iterations

  • rmsd_cutoff (float) – Cutoff in stem residue RMSD used to determine convergence

  • keep_non_converged (bool) – Whether to keep loop candidates for which the closing did not converge

  • random_seed (int) – Seed for the random number generator used to accept/reject moves in the CCD algorithm

Returns:

An index for each kept candidate corresponding to its index before this function was called. This is useful to keep track of scores and other data extracted before (see the sketch below).

Return type:

list of int
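
A minimal sketch of using the returned indices to keep previously computed scores in sync, assuming a ScoreContainer (here called all_scores) was filled before closing:

# non-converged candidates are dropped by default, so extract the
# score entries of the surviving candidates only
orig_indices = loop_candidates.ApplyCCD(n_stem, c_stem)
all_scores = all_scores.Extract(orig_indices)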

ApplyKIC(n_stem, c_stem, pivot_one, pivot_two, pivot_three)

Closes all loop candidates in this container using the KIC algorithm (i.e. modifies the candidates so that their stem residues match those of n_stem and c_stem, which are the stem residues of the loop being modelled). This algorithm finds analytical solutions for closing the loop by modifying the torsion angles of just three pivot residues. Due to the underlying mathematical formalism in KIC, up to 16 solutions can be found for every candidate, so the number of candidates in the container may increase.

Parameters:
  • n_stem (ost.mol.ResidueHandle) – Residue defining the n-stem positions every candidate should match

  • c_stem (ost.mol.ResidueHandle) – Residue defining the c-stem positions every candidate should match

  • pivot_one (int) – First pivot residue

  • pivot_two (int) – Second pivot residue

  • pivot_three (int) – Third pivot residue

Returns:

An index for each added candidate corresponding to the original candidate index before this function was called (each candidate can lead to multiple or no candidates after KIC is applied). This is useful to keep track of scores and other data extracted before.

Return type:

list of int

CalculateBackboneScores(score_container, scorer, scorer_env, start_resnum, chain_idx=0)
CalculateBackboneScores(score_container, scorer, scorer_env, keys, start_resnum, chain_idx=0)
CalculateBackboneScores(score_container, mhandle, start_resnum, chain_idx=0)
CalculateBackboneScores(score_container, mhandle, keys, start_resnum, chain_idx=0)
CalculateAllAtomScores(score_container, mhandle, start_resnum, chain_idx=0)
CalculateAllAtomScores(score_container, mhandle, keys, start_resnum, chain_idx=0)

Calculate backbone / all-atom scores for each loop candidate. Note that (unless otherwise noted) a lower “score” is better!

The computed scores are in the same order as the candidates in here.

Parameters:
Raises:

RuntimeError if all-atom scores are requested while IsAllAtomScoringSetUp() is False, if keys contains a key for which no scorer exists, or if anything is raised while calculating the scores.

CalculateSequenceProfileScores(score_container, structure_db, prof, offset=0)
CalculateStructureProfileScores(score_container, structure_db, prof, offset=0)

Calculates a score comparing the given profile prof starting at offset with the sequence / structure profile of each candidate as extracted from structure_db (see ost.seq.ProfileHandle.GetAverageScore() for details, prof.null_model is used for weighting).

Note that for profile scores a higher “score” is better! So take care when combining these with other scores, where it is commonly the other way around.

This requires that each candidate has a fragment info connected to the given structure_db (e.g. FillFromDatabase() must have been called with this DB).

The computed scores are in the same order as the candidates in here.

Parameters:
Raises:

RuntimeError if HasFragmentInfos() is False, if structure_db is incompatible with the stored fragment infos, or if prof has fewer than offset + len elements (len = length of the loops stored in here).

CalculateStemRMSDs(score_container, n_stem, c_stem)

Calculates RMSD between the given stems and the first and last residue of the loop candidates. This first superposes the first loop residue with n_stem and then computes the RMSD between the last residue’s N, CA, C positions and the corresponding atoms in c_stem.

Note that this score is only useful before calling ApplyCCD() or ApplyKIC().

The computed scores are in the same order as the candidates in here.

Parameters:
Raises:

RuntimeError if stems do not contain N, CA and C atoms.

CalculateSequenceProfileScores(structure_db, prof, offset=0)
CalculateStructureProfileScores(structure_db, prof, offset=0)
CalculateStemRMSDs(n_stem, c_stem)

Same as the score_container variants above, but here we directly return the score vector instead of storing it in a container (see the sketch below).

Returns:

Score for each candidate (same order as candidates in here).

Return type:

list of float
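
A minimal sketch of the direct-return variant, computing stem RMSDs before any closing is applied (variables as in the introductory example):

# only meaningful before calling ApplyCCD() or ApplyKIC()
stem_rmsds = loop_candidates.CalculateStemRMSDs(n_stem, c_stem)
best_fit = list(stem_rmsds).index(min(stem_rmsds))
print("Best fitting candidate: %d" % best_fit)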

Add(bb_list)
Parameters:

bb_list (BackboneList) – The loop candidate to be added.

Raises:

RuntimeError if the sequence of bb_list is not consistent with the internal sequence

AddFragmentInfo(fragment)

Adds a fragment info for the last candidate added with Add(). Note that these infos are added automatically with FillFromDatabase() and updated whenever the candidates are copied, removed, clustered, etc., so you will probably never need this function.

Parameters:

fragment (FragmentInfo) – The fragment info to add.

Remove(index)

Remove a loop candidate from the list of candidates.

Parameters:

index (int) – The index of the candidate that will be removed

Raises:

RuntimeError if index is out of bounds.

HasFragmentInfos()
Returns:

True, if each loop candidate has a connected fragment info.

Return type:

bool

GetFragmentInfo(index)
Returns:

Fragment info connected to loop candidate at given index.

Return type:

FragmentInfo

Parameters:

index (int) – The index of the candidate.

Raises:

RuntimeError if HasFragmentInfos() is False or if index is out of bounds.

Extract(indices)
Returns:

Container with candidates with given indices.

Return type:

LoopCandidates

Parameters:

indices (list of int) – Candidate indices to extract.

Raises:

RuntimeError if any index is out of bounds.

GetClusters(max_dist, superposed_rmsd=False)

Clusters the loop candidates according to their pairwise CA RMSD using an agglomerative hierarchical clustering algorithm. Initially, every candidate is assigned to its own cluster. At every clustering step, the two clusters with the shortest distance are merged, where the distance between two clusters is defined as the average CA RMSD between any members of the two clusters.

Parameters:
  • max_dist (float) – Maximal distance two clusters can have to be merged

  • superposed_rmsd (bool) – Whether to calculate the RMSD based on a minimal RMSD superposition or simply consider the current positions

Returns:

Lists of loop candidate indices into this container

Return type:

list of list of int

GetClusteredCandidates(max_dist, neglect_size_one=True, superposed_rmsd=False)
Returns:

List of loop candidates clustered as in GetClusters().

Return type:

list of LoopCandidates

Parameters:
  • max_dist (float) – Maximal distance two clusters can have to be merged

  • neglect_size_one (bool) – Whether to skip clusters containing only a single loop candidate when building the returned list

  • superposed_rmsd (bool) – Whether to calculate the RMSD based on a minimal RMSD superposition or simply consider the current positions

GetLargestCluster(max_dist, superposed_rmsd=False)

Instead of performing a full clustering, the algorithm simply searches for the candidate with the most other candidates having a CA RMSD below max_dist. This candidate then serves as the centroid of the returned cluster (see the sketch below).

Parameters:
  • max_dist (float) – Maximal CA RMSD to cluster centroid

  • superposed_rmsd (bool) – Whether to calculate the RMSD based on a minimal RMSD superposition or simply consider the current positions

Returns:

Largest possible cluster with all members having a CA RMSD below max_dist to cluster centroid.

Return type:

LoopCandidates
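
A minimal sketch of the clustering functions, reusing loop_candidates from the introductory example; the 2.0 Å threshold is purely illustrative:

# cluster by CA RMSD (max_dist=2.0, superposed_rmsd=True)
clusters = loop_candidates.GetClusters(2.0, True)
print("Number of clusters: %d" % len(clusters))
# alternatively, directly extract the largest cluster as a new container
largest_cluster = loop_candidates.GetLargestCluster(2.0, True)
print("Largest cluster size: %d" % len(largest_cluster))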

Keeping track of loop candidate scores

Two helper classes are used to keep track of and combine the different scores computed on loop candidates.

class promod3.modelling.ScoreContainer

Container that keeps one vector of scores per scorer, where each vector holds one value per loop candidate. All score vectors are guaranteed to have the same number of values. A short usage sketch follows the method descriptions of this class.

IsEmpty()
Returns:

True, if no loop candidates have been scored with any scorer yet.

Return type:

bool

Contains(key)
Returns:

True, if a score vector for this key was already added.

Return type:

bool

Parameters:

key (str) – Key for desired scorer.

Get(key)
Returns:

Score vector for the given key.

Return type:

list of float (GetNumCandidates() entries)

Parameters:

key (str) – Key for desired score vector.

Raises:

RuntimeError if there are no scores for that key.

Set(key, scores)
Parameters:
  • key (str) – Set scores for that key.

  • scores (list of float) – Score vector to set.

Raises:

RuntimeError if this container contains other score vectors with a different number of entries.

GetNumCandidates()
Returns:

Number of loop candidates that are scored here. This is the length of each score vector in this container.

Return type:

int

LinearCombine(linear_weights)
Returns:

Weighted, linear combination of scores.

Return type:

list of float (GetNumCandidates() entries)

Parameters:

linear_weights (dict (keys: str, values: float)) – Weights for each scorer.

Raises:

RuntimeError if linear_weights has a key for which no scores were added.

Copy()
Returns:

Full copy of this container.

Return type:

ScoreContainer

Extract(indices)
Returns:

Container with scores for a subset of loop candidates.

Return type:

ScoreContainer

Parameters:

indices (list of int) – List of loop candidate indices to pick (in [0, GetNumCandidates()-1])

Raises:

RuntimeError if any index is out of bounds.

Extend(other)

Extend each score vector with the score vector of other (must have matching keys).

Parameters:

other (ScoreContainer) – Score container to be added to this one.
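
A minimal sketch of inspecting and combining scores by hand, assuming bb_scores was filled as in the introductory example (keys "cbeta" and "clash"); the weights are purely illustrative:

# combine only if both score vectors are present
if bb_scores.Contains("cbeta") and bb_scores.Contains("clash"):
  print("Scored candidates: %d" % bb_scores.GetNumCandidates())
  combined = bb_scores.LinearCombine({"cbeta": 1.0, "clash": 1.0})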

class promod3.modelling.ScoringWeights

Globally accessible set of weights to be used in scoring. This also defines a consistent naming of keys used for backbone and all atom scorers as set up by SetupDefaultBackboneScoring() and SetupDefaultAllAtomScoring().

Different sets of weights are available. You can get general weights that have been optimized on a non-redundant set of loops with lengths in the range [3,14] (including stem residues). Alternatively, you can get weights that have been optimized only on a length-specific subset of loops. By default, different weights are provided for the following lengths: [0,1,2,3,4], [5,6], [7,8], [9,10], [11,12], [13,14].

If you choose to modify the weights, please make sure to set consistently named keys here and to use consistently named scorers and scoring computations!

static GetWeights(with_db=False, with_aa=False, length_dependent=False, loop_length=-1)
Returns:

Named weights to be used when scoring loop candidates. The default weights were optimized to give the best performance when choosing the loop candidate with the lowest combined score. Each set of weights includes (different) backbone scoring weights, which are optionally trained only on a subset of loops of the corresponding length.

Return type:

dict (keys: str, values: float)

Parameters:
  • with_db (bool) – True to choose a set of weights including DB specific scores (stem RMSD and profile scores)

  • with_aa (bool) – True to choose a set of weights including all atom scores

  • length_dependent (bool) – Whether to use the general weights or their length-dependent counterparts.

  • loop_length (int) – Length of the full loop. If no weights are available for this length, or if loop_length is -1, the general weights are returned (see the sketch below).
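
A minimal sketch of fetching length-dependent weights; the loop length of 9 is purely illustrative and includes the stem residues:

# falls back to the general weights if no length-specific set exists
weights = modelling.ScoringWeights.GetWeights(with_db=True,
                                              length_dependent=True,
                                              loop_length=9)
print("Number of weights: %d" % len(weights))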

static SetWeights(with_db, with_aa, weights, length_dependent=False, loop_length=-1)

Overwrite a set of weights as returned by GetWeights().

static GetBackboneWeights(with_db=False, with_aa=False, length_dependent=False, loop_length=-1)
Returns:

Subset of GetWeights() for backbone scoring as in scoring.BackboneOverallScorer.CalculateLinearCombination().

Return type:

dict (keys: str, values: float)

Parameters:
static GetAllAtomWeights(with_db=False, length_dependent=False, loop_length=-1)
Returns:

Subset of GetWeights() for all atom scoring as in scoring.AllAtomOverallScorer.CalculateLinearCombination().

Return type:

dict (keys: str, values: float)

Parameters:
static GetStemRMSDsKey()
static GetSequenceProfileScoresKey()
static GetStructureProfileScoresKey()
Returns:

Default key for stem RMSD / sequence profile / structure profile scores.

Return type:

str

static SetStemRMSDsKey(key)
static SetSequenceProfileScoresKey(key)
static SetStructureProfileScoresKey(key)
Parameters:

key (str) – New default key for stem RMSD / sequence profile / structure profile scores.

static GetBackboneScoringKeys()
static GetAllAtomScoringKeys()
Returns:

List of backbone / all-atom scorers to be computed for any set of weights defined in here.

Return type:

list of str

static SetBackboneScoringKeys(keys)
static SetAllAtomScoringKeys(keys)
Parameters:

keys (list of str) – New list of backbone / all-atom scorers to be computed for any set of weights defined in here.

Example: loop scoring in modelling

In the example below, we show how to find and choose a loop candidate to close a gap in a model. It demonstrates the combined usage of ModellingHandle to keep a consistent modelling environment, LoopCandidates with its scoring routines, ScoreContainer for keeping track of scores and ScoringWeights to combine scores:

from ost import io, seq
from promod3 import modelling, loop

# setup raw model
tpl = io.LoadPDB('data/1crn_cut.pdb')
seq_trg = 'TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN'
seq_tpl = 'TTCCPSIVARSNFNVCRLPGTPEA------GCIIIPGATCPGDYAN'
aln = seq.CreateAlignment(seq.CreateSequence('trg', seq_trg),
                          seq.CreateSequence('tpl', seq_tpl))
aln.AttachView(1, tpl.CreateFullView())
mhandle = modelling.BuildRawModel(aln)
print(("Number of gaps in raw model: %d" % len(mhandle.gaps)))

# setup default scorers for modelling handle
modelling.SetupDefaultBackboneScoring(mhandle)
modelling.SetupDefaultAllAtomScoring(mhandle)

# setup databases
frag_db = loop.LoadFragDB()
structure_db = loop.LoadStructureDB()
torsion_sampler = loop.LoadTorsionSamplerCoil()

# get data for gap to close
gap = mhandle.gaps[0].Copy()
print(("Gap to close: %s" % str(gap)))
n_stem = gap.before
c_stem = gap.after
start_resnum = n_stem.GetNumber().GetNum()
start_idx = start_resnum - 1   # res. num. starts at 1

# get loop candidates from FragDB
candidates = modelling.LoopCandidates.FillFromDatabase(\
                n_stem, c_stem, gap.full_seq, frag_db, structure_db)
print(("Number of loop candidates: %d" % len(candidates)))

# all scores will be kept in a score container which we update
all_scores = modelling.ScoreContainer()
# the keys used to identify scores are globally defined
print(("Stem RMSD key = '%s'" \
      % modelling.ScoringWeights.GetStemRMSDsKey()))
print(("Profile keys = ['%s', '%s']" \
      % (modelling.ScoringWeights.GetSequenceProfileScoresKey(),
         modelling.ScoringWeights.GetStructureProfileScoresKey())))
print(("Backbone scoring keys = %s" \
      % str(modelling.ScoringWeights.GetBackboneScoringKeys())))
print(("All atom scoring keys = %s" \
      % str(modelling.ScoringWeights.GetAllAtomScoringKeys())))

# get stem RMSDs for each candidate (i.e. how well does it fit?)
# -> this must be done before CCD to be meaningful
candidates.CalculateStemRMSDs(all_scores, n_stem, c_stem)

# close the candidates with CCD
orig_indices = candidates.ApplyCCD(n_stem, c_stem, torsion_sampler)
print(("Number of closed loop candidates: %d" % len(candidates)))

# get subset of previously computed scores
all_scores = all_scores.Extract(orig_indices)

# add profile scores (needs profile for target sequence)
prof = io.LoadSequenceProfile("data/1CRNA.hhm")
candidates.CalculateSequenceProfileScores(all_scores, structure_db,
                                          prof, start_idx)
candidates.CalculateStructureProfileScores(all_scores, structure_db,
                                           prof, start_idx)
# add backbone scores
candidates.CalculateBackboneScores(all_scores, mhandle, start_resnum)
# add all atom scores
candidates.CalculateAllAtomScores(all_scores, mhandle, start_resnum)

# use default weights to combine scores
weights = modelling.ScoringWeights.GetWeights(with_db=True,
                                              with_aa=True)
scores = all_scores.LinearCombine(weights)

# rank them (best = lowest "score")
arg_sorted_scores = sorted([(v,i) for i,v in enumerate(scores)])
print("Ranked candidates: score, index")
for v,i in arg_sorted_scores:
  print(("%g, %d" % (v,i)))

# insert best into model, update scorers and clear gaps
best_candidate = candidates[arg_sorted_scores[0][1]]
modelling.InsertLoopClearGaps(mhandle, best_candidate, gap)
print(("Number of gaps in closed model: %d" % len(mhandle.gaps)))
io.SavePDB(mhandle.model, "model.pdb")
