Handling Loop Candidates¶
For convenience, we provide the LoopCandidates
class as a container of
loop candidates with consistent length and sequence among them. It can either be
filled manually or generated using static filling functions using the
functionality from the FragDB
or Monte Carlo algorithms.
Once it contains candidates you can apply closing, scoring or clustering
algorithms on all loop candidates. Example:
from ost import io
from promod3 import loop, scoring, modelling
# let's load a crambin structure from the pdb
crambin = io.LoadPDB('data/1CRN.pdb')
SEQRES = ''.join([r.one_letter_code for r in crambin.residues])
# this is the sequence we want to remodel
loop_seq = SEQRES[23:31]
# let's define the stem residues
n_stem = crambin.residues[23]
c_stem = crambin.residues[30]
# we use the StructureDB as source for structural information
structure_db = loop.LoadStructureDB()
# the FragDB allows to access the StructureDB based on geometric
# features of the loop stem residue
frag_db = loop.LoadFragDB()
# the LoopCandidates allow to handle several loops at once
# we directly want to find potential loop candidates from the
# previously loaded databases
loop_candidates = modelling.LoopCandidates.FillFromDatabase(\
n_stem, c_stem, loop_seq, frag_db, structure_db)
# candidates usually don't match exactly the required stem coords.
# CCD (Cyclic Coordinate Descent) is one way to enforce this match.
loop_candidates.ApplyCCD(n_stem, c_stem)
# setup backbone scorer with clash and cbeta scoring
score_env = scoring.BackboneScoreEnv(SEQRES)
score_env.SetInitialEnvironment(crambin)
scorer = scoring.BackboneOverallScorer()
scorer["cbeta"] = scoring.LoadCBetaScorer()
scorer["clash"] = scoring.ClashScorer()
scorer.AttachEnvironment(score_env)
# the scorer can then be used on the LoopCandidates object to
# calculate desired scores (here: cbeta & clash, start resnum = 24)
bb_scores = modelling.ScoreContainer()
loop_candidates.CalculateBackboneScores(bb_scores, scorer, score_env,
["cbeta", "clash"], 24)
# we finally perform a weighted linear combination of the scores,
# pick the best one, insert it into our structure and save it
weights = {"cbeta": 1, "clash": 1}
scores = bb_scores.LinearCombine(weights)
min_score = min(scores)
min_candidate = scores.index(min_score)
bb_list = loop_candidates[min_candidate]
bb_list.InsertInto(crambin.chains[0], n_stem.GetNumber())
io.SavePDB(crambin, "modified_crambin.pdb")
The LoopCandidates class¶
- class promod3.modelling.LoopCandidates(seq)¶
Initializes an empty container of loop candidates.
Candidates can be accessed and iterated as if it was a
list
ofBackboneList
.- Parameters:
seq (
str
) – The sequence being enforced for all candidates
- static FillFromDatabase(n_stem, c_stem, seq, frag_db, structural_db, extended_search=False)¶
Searches for loop candidates matching the length (number of residues in seq) and geometry (of n_stem and c_stem) of the loop to be modelled in a fragment database.
This call also adds fragment infos (
HasFragmentInfos()
will be True).- Parameters:
n_stem (
ost.mol.ResidueHandle
) – The residue at the N-terminal end of the loopc_stem (
ost.mol.ResidueHandle
) – The residue at the C-terminal end of the loopseq (
str
) – The sequence of residues to be added including the n_stem and c_stemfrag_db (
FragDB
) – The fragment databasestructural_db (
StructureDB
) – The according structural databaseextended_search (
bool
) – Whether search should be extended to fragments matching the geometry of the n_stem and c_stem a bit less precisely.
- Returns:
A list of loop candidates
- Return type:
- Raises:
RuntimeError
if stems do no contain N, CA and C atoms.
- static FillFromMonteCarloSampler(seq, num_loops, steps, sampler, closer, scorer, cooler, random_seed=0)¶
- static FillFromMonteCarloSampler(initial_bb, seq, num_loops, steps, sampler, closer, scorer, cooler, random_seed=0)
Uses Monte Carlo simulated annealing to sample the loop to be modelled. If initial_bb is given, every Monte Carlo run starts from that configuration.
Warning
The
BackboneScoreEnv
attached to scorer will be altered in the specified stretch. You might consider the Stash / Pop mechanism of theBackboneScoreEnv
to restore to the original state once the sampling is done.- Parameters:
initial_bb (
BackboneList
) – Initial configuration used for the samplingseq (
str
) – The sequence of residues to be samplednum_loops (
int
) – Number of loop candidates to returnsteps (
int
) – Number of MC steps to perform for each loop candidate generatedsampler (Sampler Object) – Used to generate a new configuration at each MC step
closer (Closer Object) – Used to close the loop after each MC step
scorer (Scorer Object) – Used to score the generated configurations at each MC step
cooler (Cooler Object) – Controls the temperature profile of the simulated annealing
random_seed (
int
) – Seed to feed the random number generator for accepting/rejecting proposed monte carlo steps. For every monte carlo run, the random number generator gets refreshed and this seed gets increased by 1.
- Returns:
The resulting candidates
- Return type:
- Raises:
RuntimeError
, if initial_bb is not given and the Monte Carlo sampler failed to initialize (i.e. stems are too far apart) or if initial_bb is given and it is inconsistent with seq.
- ApplyCCD(n_stem, c_stem, max_iterations=1000, rmsd_cutoff=0.1, keep_non_converged=false, random_seed=0)¶
- ApplyCCD(n_stem, c_stem, torsion_sampler, max_iterations=1000, rmsd_cutoff=0.1, keep_non_converged=false, random_seed=0)
- ApplyCCD(n_stem, c_stem, torsion_samplers, max_iterations=1000, rmsd_cutoff=0.1, keep_non_converged=false, random_seed=0)
Closes all loop candidates in this container using the CCD algorithm (i.e. modifies the candidates so that its stem residues match those of n_stem and c_stem). CCD (cyclic coordinate descent, see
CCD
) is an iterative minimization algorithm.If torsion_sampler is given, it is used at each step of the closing to calculate the probability of the proposed move, which is then accepted or not depending on a metropolis criterium.
- Parameters:
n_stem (
ost.mol.ResidueHandle
) – Residue defining the n-stem positions every candidate should match. SeeCCD()
.c_stem (
ost.mol.ResidueHandle
) – Residue defining the c-stem positions every candidate should match. SeeCCD()
.torsion_sampler (
TorsionSampler
/list
ofTorsionSampler
) – A torsion sampler (used for all residues) or a list of samplers (one per residue).max_iterations (
int
) – Maximum number of iterationsrmsd_cutoff (
float
) – Cutoff in stem residue RMSD used to determine convergencekeep_non_converged (
bool
) – Whether to keep loop candidates for which the closing did not convergerandom_seed (
int
) – seed for random number generator used to accept/reject moves in CCD algorithm
- Returns:
An index for each kept candidate corresponding to the candidate index before this function was called. This is useful to keep track of scores and other data extracted before.
- Return type:
- ApplyKIC(n_stem, c_stem, pivot_one, pivot_two, pivot_three)¶
Closes all loop candidates in this container (i.e. modifies the candidates so that its stem residues match those of n_stem and c_stem, which are the stem residues of the loop being modelled), using the KIC algorithm. This algorithm finds analytical solutions for closing the loop by modifying the torsion angles of just three pivot residues. Due to the underlying mathematical formalism in KIC, up to 16 solutions can be found for every candidate. This leads to an increase in number of loops.
- Parameters:
n_stem (
ost.mol.ResidueHandle
) – Residue defining the n-stem positions every candidate should matchc_stem (
ost.mol.ResidueHandle
) – Residue defining the c-stem positions every candidate should matchpivot_one (
int
) – First pivot residuepivot_two (
int
) – Second pivot residuepivot_three (
int
) – Third pivot residue
- Returns:
An index for each added candidate corresponding to the original candidate index before this function was called (each candidate can lead to multiple or no candidates after KIC is applied). This is useful to keep track of scores and other data extracted before.
- Return type:
- CalculateBackboneScores(score_container, scorer, scorer_env, start_resnum, chain_idx=0)¶
- CalculateBackboneScores(score_container, scorer, scorer_env, keys, start_resnum, chain_idx=0)
- CalculateBackboneScores(score_container, mhandle, start_resnum, chain_idx=0)
- CalculateBackboneScores(score_container, mhandle, keys, start_resnum, chain_idx=0)
- CalculateAllAtomScores(score_container, mhandle, start_resnum, chain_idx=0)¶
- CalculateAllAtomScores(score_container, mhandle, keys, start_resnum, chain_idx=0)
Calculate backbone / all-atom scores for each loop candidate. Note that (unless otherwise noted) a lower “score” is better!
The computed scores are in the same same order as the candidates in here.
- Parameters:
score_container (
ScoreContainer
) – Add scores to this score container using the given key names (or the ones fromScoringWeights
)scorer (
BackboneOverallScorer
) – Backbone scoring object with set environment for the particular loop modelling problem.scorer_env (
BackboneScoreEnv
) – The scoring environment attached to scorermhandle (
ModellingHandle
) – Modelling handle set up scoring (seeSetupDefaultBackboneScoring()
SetupDefaultAllAtomScoring()
).keys (
list
ofstr
) – Keys of the desired scorers. If not given, we use the set of keys given byScoringWeights.GetBackboneScoringKeys()
/ScoringWeights.GetAllAtomScoringKeys()
.start_resnum (
int
) – Res. number defining the position in the SEQRES.chain_idx (
int
) – Index of chain the loops belong to.
- Raises:
RuntimeError
ifIsAllAtomScoringSetUp()
is False, if keys has a key for which no scorer exists or if anything is raised when calculating the scores.
- CalculateSequenceProfileScores(score_container, structure_db, prof, offset=0)¶
- CalculateStructureProfileScores(score_container, structure_db, prof, offset=0)¶
Calculates a score comparing the given profile prof starting at offset with the sequence / structure profile of each candidate as extracted from structure_db (see
ost.seq.ProfileHandle.GetAverageScore()
for details, prof.null_model is used for weighting).Note that for profile scores a higher “score” is better! So take care when combining this to other scores, where it is commonly the other way around.
This requires that each candidate has a connected fragment info into the given structure_db (e.g.
FillFromDatabase()
must have been called with this DB).The computed scores are in the same same order as the candidates in here.
- Parameters:
score_container (
ScoreContainer
) – Add scores to this score container using the default key name defined inScoringWeights
structural_db (
StructureDB
) – Structural database used inFillFromDatabase()
prof (
ost.seq.ProfileHandle
) – Profile information for target.offset (
int
) – Loop starts at index offset in prof.
- Raises:
RuntimeError
ifHasFragmentInfos()
is False, if structure_db is incompatible with the stored fragment infos or if prof has less than offset+len elements (len = length of loops stored in here).
- CalculateStemRMSDs(score_container, n_stem, c_stem)¶
Calculates RMSD between the given stems and the first and last residue of the loop candidates. This first superposes the first loop residue with n_stem and then computes the RMSD between the last residue’s N, CA, C positions and the corresponding atoms in c_stem.
Note that this score is only useful before calling
ApplyCCD()
orApplyKIC()
.The computed scores are in the same same order as the candidates in here.
- Parameters:
score_container (
ScoreContainer
) – Add scores to this score container using the default key name defined inScoringWeights
n_stem (
ost.mol.ResidueHandle
) – The residue at the N-terminal end of the loop.c_stem (
ost.mol.ResidueHandle
) – The residue at the C-terminal end of the loop.
- Raises:
RuntimeError
if stems do no contain N, CA and C atoms.
- CalculateSequenceProfileScores(structure_db, prof, offset=0)¶
- CalculateStructureProfileScores(structure_db, prof, offset=0)¶
- CalculateStemRMSDs(n_stem, c_stem)¶
Same as the score_container variant above, but here we directly return the score vector instead of storing it in a container.
- Add(bb_list)¶
- Parameters:
bb_list (
BackboneList
) – The loop candidate to be added.- Raises:
RuntimeError
if sequence of bb_list is not consistent with internal sequence
- AddFragmentInfo(fragment)¶
Adds a fragment info for the last candidate added with
Add()
. Note that these infos are added automatically withFillFromDatabase()
and updated whenever the candidates are copied, removed, clustered etc.. So you will probably never need this function.- Parameters:
fragment (
FragmentInfo
) – The fragment info to add.
- Remove(index)¶
Remove a loop candidate from the list of candidates.
- Parameters:
index (
int
) – The index of the candidate that will be removed- Raises:
RuntimeError
if index is out of bounds.
- HasFragmentInfos()¶
- Returns:
True, if each loop candidate has a connected fragment info.
- Return type:
- GetFragmentInfo(index)¶
- Returns:
Fragment info connected to loop candidate at given index.
- Return type:
- Parameters:
index (
int
) – The index of the candidate.- Raises:
RuntimeError
ifHasFragmentInfos()
is False or if index is out of bounds.
- Extract(indices)¶
- Returns:
Container with candidates with given indices.
- Return type:
- Parameters:
- Raises:
RuntimeError
if any index is out of bounds.
- GetClusters(max_dist, superposed_rmsd=False)¶
Clusters the loop candidates according to their pairwise CA RMSD using an agglomerative hierarchical clustering algorithm. Every candidate gets assigned a unique cluster. At every clustering step, the two clusters with shortest distance get merged, with the distance definition being the average CA RMSD between any of the members of the two clusters.
- Parameters:
- Returns:
Lists of loop candidate indices into this container
- Return type:
- GetClusteredCandidates(max_dist, neglect_size_one=True, superposed_rmsd=False)¶
- Returns:
List of loop candidates clustered as in
GetClusters()
.- Return type:
- Parameters:
max_dist (
float
) – Maximal distance two clusters can have to be mergedneglect_size_one (
bool
) – Whether clusters should be added to the returned list if they only contain one loop candidatesuperposed_rmsd (
bool
) – Whether to calculate the RMSD based on a minimal RMSD superposition or simply consider the current positions
- GetLargestCluster(max_dist, superposed_rmsd=False)¶
Instead of performing a full clustering, the algorithm simply searches for the candidate with the most other candidates having a CA RMSD below max_dist. This candidate then serves as centroid for the return cluster.
- Parameters:
- Returns:
Largest possible cluster with all members having a CA RMSD below max_dist to cluster centroid.
- Return type:
Keeping track of loop candidate scores¶
Two helper classes are used to keep track and combine different scores computed on loop candidates.
- class promod3.modelling.ScoreContainer¶
Container to keep vectors of scores (one for each loop candidate) for each scorer (one vector for each single scorer). Each score vector is guaranteed to have the same number of values.
- IsEmpty()¶
- Returns:
True, if no loop candidates have been scored with any scorer yet.
- Return type:
- Contains(key)¶
- Get(key)¶
- Returns:
Score vector for the given key.
- Return type:
- Parameters:
key (
str
) – Key for desired score vector.- Raises:
RuntimeError
if there are no scores for that key.
- Set(key, scores)¶
- GetNumCandidates()¶
- Returns:
Number of loop candidates that are scored here. This is the length of each score vector in this container.
- Return type:
- LinearCombine(linear_weights)¶
- Copy()¶
- Returns:
Full copy of this container.
- Return type:
- Extract(indices)¶
- Returns:
Container with scores for a subset of loop candidates.
- Return type:
- Parameters:
indices (
list
ofint
) – List of loop candidate indices to pick (in [0,GetNumCandidates()
-1])- Raises:
RuntimeError
if any index is out of bounds.
- Extend(other)¶
Extend each score vector with the score vector of other (must have matching keys).
- Parameters:
other (
ScoreContainer
) – Score container to be added to this one.
- class promod3.modelling.ScoringWeights¶
Globally accessible set of weights to be used in scoring. This also defines a consistent naming of keys used for backbone and all atom scorers as set up by
SetupDefaultBackboneScoring()
andSetupDefaultAllAtomScoring()
.Different sets of weights are available. You can get general weights that have been optimized for a non redundant set of loops with lengths [3,14] (including stem residues). The alternative is to get weights that have only been optimized on a length specific subset of loops. By default you get different weights for following lengths: [0,1,2,3,4], [5,6], [7,8], [9,10], [11,12], [13,14].
If you choose to modify the weights, please ensure to set consistently named keys in here and to use consistently named scorers and scoring computations!
- static GetWeights(with_db=False, with_aa=False, length_dependent=False, loop_length=-1)¶
- Returns:
Named weights to be used when scoring loop candidates. The default weights were optimized to give the best performance when choosing the loop candidate with the lowest combined score. Each set of weights includes (different) backbone scoring weights, that are optionally only trained on a subset of loops with corresponding loop length.
- Return type:
- Parameters:
with_db (
bool
) – True to choose a set of weights including DB specific scores (stem RMSD and profile scores)with_aa (
bool
) – True to choose a set of weights including all atom scoreslength_dependent (
bool
) – Whether to use general weights or their length length dependent counterparts.loop_length (
int
) – Length of full loop. If no weights are available for this length or when loop_length is -1, the general weights are returned.
- static SetWeights(with_db, with_aa, weights, length_dependent=False, loop_length=-1)¶
Overwrite a set of weights as returned by
GetWeights()
.
- static GetBackboneWeights(with_db=False, with_aa=False, length_dependent=False, loop_length=-1)¶
- Returns:
Subset of
GetWeights()
for backbone scoring as inscoring.BackboneOverallScorer.CalculateLinearCombination()
.- Return type:
- Parameters:
with_db (
bool
) – As inGetWeights()
with_aa (
bool
) – As inGetWeights()
length_dependent (
bool
) – As inGetWeights()
loop_length (
int
) – As inGetWeights()
- static GetAllAtomWeights(with_db=False, length_dependent=False, loop_length=-1)¶
- Returns:
Subset of
GetWeights()
for all atom scoring as inscoring.AllAtomOverallScorer.CalculateLinearCombination()
.- Return type:
- Parameters:
with_db (
bool
) – As inGetWeights()
length_dependent (
bool
) – As inGetWeights()
loop_length (
int
) – As inGetWeights()
- static GetStemRMSDsKey()¶
- static GetSequenceProfileScoresKey()¶
- static GetStructureProfileScoresKey()¶
- Returns:
Default key for stem RMSD / sequence profile / structure profile scores.
- Return type:
- static SetStemRMSDsKey(key)¶
- static SetSequenceProfileScoresKey(key)¶
- static SetStructureProfileScoresKey(key)¶
- Parameters:
key (
str
) – New default key for stem RMSD / sequence profile / structure profile scores.
Example: loop scoring in modelling¶
In the example below, we show how we find and choose a loop candidate to close a
gap for a model. This shows the combined usage of ModellingHandle
to
keep a consistent modelling environment, LoopCandidates
with its
scoring routines, ScoreContainer
for keeping track of scores and
ScoringWeights
to combine scores:
from ost import io, seq
from promod3 import modelling, loop
# setup raw model
tpl = io.LoadPDB('data/1crn_cut.pdb')
seq_trg = 'TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN'
seq_tpl = 'TTCCPSIVARSNFNVCRLPGTPEA------GCIIIPGATCPGDYAN'
aln = seq.CreateAlignment(seq.CreateSequence('trg', seq_trg),
seq.CreateSequence('tpl', seq_tpl))
aln.AttachView(1, tpl.CreateFullView())
mhandle = modelling.BuildRawModel(aln)
print(("Number of gaps in raw model: %d" % len(mhandle.gaps)))
# setup default scorers for modelling handle
modelling.SetupDefaultBackboneScoring(mhandle)
modelling.SetupDefaultAllAtomScoring(mhandle)
# setup databases
frag_db = loop.LoadFragDB()
structure_db = loop.LoadStructureDB()
torsion_sampler = loop.LoadTorsionSamplerCoil()
# get data for gap to close
gap = mhandle.gaps[0].Copy()
print(("Gap to close: %s" % str(gap)))
n_stem = gap.before
c_stem = gap.after
start_resnum = n_stem.GetNumber().GetNum()
start_idx = start_resnum - 1 # res. num. starts at 1
# get loop candidates from FragDB
candidates = modelling.LoopCandidates.FillFromDatabase(\
n_stem, c_stem, gap.full_seq, frag_db, structure_db)
print(("Number of loop candidates: %d" % len(candidates)))
# all scores will be kept in a score container which we update
all_scores = modelling.ScoreContainer()
# the keys used to identify scores are globally defined
print(("Stem RMSD key = '%s'" \
% modelling.ScoringWeights.GetStemRMSDsKey()))
print(("Profile keys = ['%s', '%s']" \
% (modelling.ScoringWeights.GetSequenceProfileScoresKey(),
modelling.ScoringWeights.GetStructureProfileScoresKey())))
print(("Backbone scoring keys = %s" \
% str(modelling.ScoringWeights.GetBackboneScoringKeys())))
print(("All atom scoring keys = %s" \
% str(modelling.ScoringWeights.GetAllAtomScoringKeys())))
# get stem RMSDs for each candidate (i.e. how well does it fit?)
# -> this must be done before CCD to be meaningful
candidates.CalculateStemRMSDs(all_scores, n_stem, c_stem)
# close the candidates with CCD
orig_indices = candidates.ApplyCCD(n_stem, c_stem, torsion_sampler)
print(("Number of closed loop candidates: %d" % len(candidates)))
# get subset of previously computed scores
all_scores = all_scores.Extract(orig_indices)
# add profile scores (needs profile for target sequence)
prof = io.LoadSequenceProfile("data/1CRNA.hhm")
candidates.CalculateSequenceProfileScores(all_scores, structure_db,
prof, start_idx)
candidates.CalculateStructureProfileScores(all_scores, structure_db,
prof, start_idx)
# add backbone scores
candidates.CalculateBackboneScores(all_scores, mhandle, start_resnum)
# add all atom scores
candidates.CalculateAllAtomScores(all_scores, mhandle, start_resnum)
# use default weights to combine scores
weights = modelling.ScoringWeights.GetWeights(with_db=True,
with_aa=True)
scores = all_scores.LinearCombine(weights)
# rank them (best = lowest "score")
arg_sorted_scores = sorted([(v,i) for i,v in enumerate(scores)])
print("Ranked candidates: score, index")
for v,i in arg_sorted_scores:
print(("%g, %d" % (v,i)))
# insert best into model, update scorers and clear gaps
best_candidate = candidates[arg_sorted_scores[0][1]]
modelling.InsertLoopClearGaps(mhandle, best_candidate, gap)
print(("Number of gaps in closed model: %d" % len(mhandle.gaps)))
io.SavePDB(mhandle.model, "model.pdb")