Modelling Pipeline¶

A protein homology modelling pipeline has the following main steps:

Build a raw model from the template (see BuildRawModel() function)
Perform loop modelling to close (or remove) all gaps (see functions CloseSmallDeletions(), RemoveTerminalGaps(), MergeGapsByDistance(), FillLoopsByDatabase(), FillLoopsByMonteCarlo(), CloseLargeDeletions() or CloseGaps() that calls all these functions using predefined heuristics)
Build sidechains (see BuildSidechains() function)
Minimize energy of final model using molecular mechanics (see MinimizeModelEnergy() function)

The last steps to go from a raw model to a final model can easily be executed with the BuildFromRawModel() function. If you want to run and tweak the internal steps, you can start with the following code and adapt it to your purposes:

from ost import io
from promod3 import modelling, loop

# setup
merge_distance = 4
fragment_db = loop.LoadFragDB()
structure_db = loop.LoadStructureDB()
torsion_sampler = loop.LoadTorsionSamplerCoil()

# get raw model
tpl = io.LoadPDB('data/1crn_cut.pdb')
aln = io.LoadAlignment('data/1crn.fasta')
aln.AttachView(1, tpl.CreateFullView())
mhandle = modelling.BuildRawModel(aln)

# we're not modelling termini
modelling.RemoveTerminalGaps(mhandle)

# perform loop modelling to close all gaps
modelling.CloseGaps(mhandle, merge_distance, fragment_db,
                    structure_db, torsion_sampler)

# build sidechains
modelling.BuildSidechains(mhandle, merge_distance, fragment_db,
                          structure_db, torsion_sampler)

# minimize energy of final model using molecular mechanics
modelling.MinimizeModelEnergy(mhandle)

# check final model and report issues
modelling.CheckFinalModel(mhandle)

# extract final model
final_model = mhandle.model
io.SavePDB(final_model, 'model.pdb')

Build Raw Modelling Handle¶

class promod3.modelling.ModellingHandle¶

Handles the result for structure model building and provides high-level methods to turn an initial raw model (see BuildRawModel()) into a complete protein model by removing any existing gaps.

model¶

The resulting model. This includes one chain per target chain (in the same order as the sequences in seqres) and (if they were included) a chain named ‘_’ for ligands. You can therefore access model.chains items and seqres items with the same indexing and the optional ligand chain follows afterwards.

Type:: EntityHandle

gaps¶

List of gaps in the model that could not be copied from the template. These gaps may be the result of insertions/deletions in the alignment or due to missing or incomplete backbone coordinates in the template structure. Gaps of different chains are appended one after another.

Type:: StructuralGapList

seqres¶

List of sequences with one SequenceHandle for each chain of the target protein.

Type:: SequenceList

profiles¶

List of profiles with one ost.seq.ProfileHandle for each chain of the target protein (same order as in seqres). Please note, that this attribute won’t be set by simply calling BuildFromRawModel(). You have to fill it manually or even better by the convenient function SetSequenceProfiles(), to ensure consistency with the seqres.

Type:: list of ost.seq.ProfileHandle

psipred_predictions¶

List of predictions with one promod3.loop.PsipredPrediction for each chain of the target protein (same order as in seqres). Please note, that this attribute won’t be set by simply calling BuildFromRawModel(). You have to fill it manually or even better by the convenient function SetPsipredPredictions(), to ensure consistency with the seqres.

Type:: list of PsipredPrediction

backbone_scorer_env¶

Backbone score environment attached to this handle. A default environment is set with SetupDefaultBackboneScoring() when needed. Additional information can be added to the environment before running the pipeline steps.

Type:: BackboneScoreEnv

backbone_scorer¶

Backbone scorer container attached to this handle. A default set of scorers is initialized with SetupDefaultBackboneScoring() when needed.

Type:: BackboneOverallScorer

all_atom_scorer_env¶

All atom environment attached to this handle for scoring. A default environment is set with SetupDefaultAllAtomScoring() when needed. This environment is for temporary work only and is only updated to score loops. It is not to be updated when loops are chosen and added to the final model.

Type:: AllAtomEnv

all_atom_scorer¶

All atom scorer container attached to this handle. A default set of scorers is initialized with SetupDefaultAllAtomScoring() when needed.

Type:: AllAtomOverallScorer

all_atom_sidechain_env¶

All atom environment attached to this handle for sidechain reconstruction. A default environment is set with SetupDefaultAllAtomScoring() when needed.

Type:: AllAtomEnv

sidechain_reconstructor¶

A sidechain reconstructor to add sidechains to loops prior to all atom scoring. A default one is set with SetupDefaultAllAtomScoring() when needed.

Type:: SidechainReconstructor

fragger_handles¶

Optional attribute which is set in SetFraggerHandles(). Use hasattr() to check if it’s available. If it’s set, it is used in BuildFromRawModel().

Type:: list of FraggerHandle

modelling_issues¶

Optional attribute which is set in AddModellingIssue(). Use hasattr() to check if it’s available. If it’s set, it can be used to check issues which occurred in BuildFromRawModel() (see MinimizeModelEnergy() and CheckFinalModel() for details).

Type:: list of ModellingIssue

Copy()¶

Generates a deep copy. Everything will be copied over to the returned ModellingHandle, except the potentially set scoring members backbone_scorer, backbone_scorer_env, all_atom_scorer_env, all_atom_scorer, all_atom_sidechain_env and sidechain_reconstructor.

Returns:: A deep copy of the current handle
Return type:: ModellingHandle

promod3.modelling.BuildRawModel(aln, chain_names=None, include_ligands=False, spdbv_style=False, aln_preprocessing='default')¶

Builds a raw (pseudo) model from the alignment. Can either take a single alignment handle or an alignment handle list. Every list item is treated as a single chain in the final raw model.

Each alignment handle must contain exactly two sequences and the second sequence is considered the template sequence, which must have a EntityView attached.

Before extracting the coordinates, the alignments are pre-processed according to aln_preprocessing.

This is a basic protein core modelling algorithm that copies backbone coordinates based on the sequence alignment. For matching residues, the side chain coordinates are also copied. Gaps are ignored. Hydrogen an deuterium atoms are not copied into the model.

The function tries to reuse as much as possible from the template. Modified residues are treated as follows:

Selenium methionine residues are converted to methionine

Side chains which contain all atoms of the parent amino acid, e.g. phosphoserine are copied as a whole with the modifications stripped off.

Residues with missing backbone atoms and D-peptides are generally skipped and treated as gaps. Missing Cbeta atoms in backbone are ok and reconstructed. If all residues are skipped (e.g. Calpha traces), we report an error and return an empty model.

Residue numbers are set such that missing residue in gaps are honoured and subsequent loop modelling can insert new residues without having to renumber. The numbering of residues starts for every chain with the value 1.

The returned ModellingHandle stores the obtained raw model as well as information about insertions and deletions in the gaps list.

Parameters:

aln (AlignmentHandle / AlignmentList) – Single alignment handle for raw model with single chain or list of alignment handles for raw model with multiple chains.
include_ligands (bool) – True, if we wish to include ligands in the model. This searches for ligands in all OST handles of the views attached to the alignments. Ligands are identified with the ligand property in the handle (set by OST based on HET records) or by the chain name ‘_’ (as set in SMTL). All ligands are added to a new chain named ‘_’.
chain_names (str / list) – If set, this overrides the default chain naming (chains are consecutively named according to characters in ‘ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz’). If aln is of type ost.seq.AlignmentHandle, chain_names is expected to be a str. If aln is of type ost.seq.AlignmentList, chain_names is expected to be a list of str of same size as aln or a str. For the latter case, chains will consecutively named according to characters in chain_names.
spdbv_style (bool) – True, if we need a model in the old SPDBV style.
aln_preprocessing – Calls promod3.modelling.PullTerminalDeletions() if set to ‘default’. Can be disabled when set to False.

Returns:

Raw (pseudo) model from the alignment.

Return type:

ModellingHandle

Raises:

A RuntimeError when:

the alignments do not have two sequences
the second sequence does not have an attached structure
the residues of the template structure do not match with the alignment sequence (note that you can set an “offset” (see SetSequenceOffset()) for the template sequence (but not for the target))
the target sequence has a non-zero offset (cannot be honored as the resulting model will always start its residue numbering at 1)

The Default Pipeline¶

promod3.modelling.BuildFromRawModel(mhandle, use_amber_ff=False, extra_force_fields=[], model_termini=False)¶

Build a model starting with a raw model (see BuildRawModel()).

This function implements a recommended pipeline to generate complete models from a raw model. The steps are shown in detail in the code example above. If you wish to use your own pipeline, you can use that code as a starting point for your own custom modelling pipeline. For reproducibility, we recommend that you keep copies of custom pipelines.

To adapt the scoring used during loop closing, you can call SetupDefaultBackboneScoring() and SetupDefaultAllAtomScoring() and adapt the default scoring members. Alternatively, you can setup the scoring manually, but you must ensure consistency yourself!

By default, a simple backbone dihedral sampling is performed when entering Monte Carlo. If mhandle has a list of FraggerHandle objects attached as “fragger_handles” attribute, the sampling will be performed with structural fragments. To ensure consistency, the fragger handles should be attached using SetFraggerHandles(). But be aware of increased runtime due to the fragment search step.

If the function fails to close all gaps, it will produce a warning and return an incomplete model.

Parameters:

mhandle (ModellingHandle) – The prepared template coordinates loaded with the input alignment.
use_amber_ff (bool) – if True, use the AMBER force field instead of the def. CHARMM one (see ost.mol.mm.LoadAMBERForcefield() and ost.mol.mm.LoadCHARMMForcefield()). Both do a similarly good job without ligands (CHARMM slightly better), but you will want to be consistent with the optional force fields in extra_force_fields.
extra_force_fields (list of ost.mol.mm.Forcefield) – Additional list of force fields to use if a (ligand) residue cannot be parametrized with the default force field. The force fields are tried in the order as given and ligands without an existing parametrization are skipped.
model_termini (bool) – The default modelling pipeline in ProMod3 is optimized to generate a gap-free model of the region in the target sequence(s) that is covered with template information. Terminal extensions without template coverage are negelected. You can activate this flag to enforce a model of the full target sequence(s). The terminal parts will be modelled with a crude Monte Carlo approach. Be aware that the accuracy of those termini is likely to be limited.

Returns:

Delivers the model as an OST entity.

Return type:

Entity

Modelling Steps¶

promod3.modelling.SetupDefaultBackboneScoring(mhandle)¶

Setup scorers and environment for meddling with backbones. This one is already tailored towards a certain modelling job. The scorers added (with their respective keys) are:

“cb_packing”: CBPackingScorer
“cbeta”: CBetaScorer
“reduced”: ReducedScorer
“clash”: ClashScorer
“hbond”: HBondScorer
“torsion”: TorsionScorer
“pairwise”: PairwiseScorer

Parameters:: mhandle (ModellingHandle) – The modelling handle. This will set the properties backbone_scorer and backbone_scorer_env of mhandle.

promod3.modelling.IsBackboneScoringSetUp(mhandle)¶

Returns:: True, if backbone_scorer and backbone_scorer_env of mhandle are set.
Return type:: bool
Parameters:: mhandle (ModellingHandle) – Modelling handle to check.

promod3.modelling.SetupDefaultAllAtomScoring(mhandle)¶

Setup scorers and environment to perform all atom scoring. This one is already tailored towards a certain modelling job, where we reconstruct sidechains for loop candidates and score them. The scorers added (with their respective keys) are:

“aa_interaction”: AllAtomInteractionScorer
“aa_packing”: AllAtomPackingScorer
“aa_clash”: AllAtomClashScorer

Parameters:: mhandle (ModellingHandle) – The modelling handle. This will set the properties all_atom_scorer_env, all_atom_scorer, all_atom_sidechain_env and sidechain_reconstructor.

promod3.modelling.IsAllAtomScoringSetUp(mhandle)¶

Returns:: True, if all_atom_scorer_env, all_atom_scorer, all_atom_sidechain_env and sidechain_reconstructor of mhandle are set.
Return type:: bool
Parameters:: mhandle (ModellingHandle) – Modelling handle to check.

promod3.modelling.InsertLoop(mhandle, bb_list, start_resnum, chain_idx)¶

Insert loop into model and ensure consistent updating of scoring environments. Note that we do not update all_atom_scorer_env as that one is meant to be updated only while scoring. To clear a gap while inserting a loop, use the simpler InsertLoopClearGaps().

Parameters:

mhandle (ModellingHandle) – Modelling handle on which to apply change.
bb_list (BackboneList) – Loop to insert (backbone only).
start_resnum (int) – Res. number defining the start position in the SEQRES.
chain_idx (int) – Index of chain the loop belongs to.

promod3.modelling.RemoveTerminalGaps(mhandle)¶

Removes terminal gaps without modelling them (just removes them from the list of gaps). This is useful for pipelines which lack the possibility to properly model loops at the termini.

Parameters:: mhandle (ModellingHandle) – Modelling handle on which to apply change.
Returns:: Number of gaps which were removed.
Return type:: int

promod3.modelling.ReorderGaps(mhandle)¶: Reorders all gaps to ensure sequential order by performing lexicographical comparison on the sequence formed by chain index of the gap and start residue number.

promod3.modelling.MergeMHandle(source_mhandle, target_mhandle, source_chain_idx, target_chain_idx, start_resnum, end_resnum, transform)¶

Merges the specified stretch of source_mhandle into target_mhandle by replacing all structural information and gaps in the stretch start_resnum and end_resnum (inclusive). The residues specified by start_resnum and end_resnum must be valid in the source_mhandle, i.e. not be enclosed by a gap. If a gap encloses start_resnum or end_resnum in the target_mhandle, the gap gets replaced by a shortened version not including the part overlapping with the defined stretch. If there is any scoring set up (backbone or all atom), the according environments get updated in target_mhandle.

Parameters:

source_mhandle (ModellingHandle) – Source of structural information and gaps
target_mhandle (ModellingHandle) – Structural information and gaps will be copied in here
source_chain_idx (int) – This is the chain where the info comes from
target_chain_idx (int) – This is the chain where the info goes to
start_resnum (int) – First residue of the copied stretch
end_resnum (int) – Last residue of the copied stretch
transform (ost.geom.Mat4) – Transformation to be applied to all atom positions when they’re copied over

Raises:

A RuntimeError when:

the chain indices are invalid
the SEQRES of the specified chains do not match
the start and end residue numbers are invalid or when the residues at the specified positions in the source_mhandle do not exist
a gap in the source_mhandle encloses the residues specified by start_resnum and end_resnum

promod3.modelling.SetSequenceProfiles(mhandle, profiles)¶

Sets the sequence profiles of mhandle while ensuring consistency with the seqres.

Parameters:

mhandle (ModellingHandle) – Will have the profiles attached afterwards
profiles (list of ost.seq.ProfileHandle) – The sequence profiles to attach

Raises:

ValueError when the given profiles are not consistent with seqres in mhandle

promod3.modelling.SetPsipredPredictions(mhandle, predictions)¶

Sets the predictions of mhandle while ensuring consistency with the seqres.

Parameters:

mhandle (ModellingHandle) – Will have the predictions attached afterwards
predictions (list of PsipredPrediction) – The predictions to attach

Raises:

ValueError when the given predictions are not consistent with seqres in mhandle

promod3.modelling.SetFraggerHandles(mhandle, fragger_handles)¶

Sets fragger_handles in mhandle while ensuring consistency with the ModellingHandle.seqres.

Parameters:

mhandle (ModellingHandle) – Will have the fragger handles attached afterwards
fragger_handles (list of FraggerHandle) – The fragger handles to attach

Raises:

ValueError when the given fragger_handles are not consistent with seqres in mhandle

promod3.modelling.CloseGaps(mhandle, merge_distance=4, fragment_db=None, structure_db=None, torsion_sampler=None, fragger_handles=None, chain_idx=None, resnum_range=None, length_dep_weights=False)¶

Tries to close all gaps in a model, except termini. It will go through following steps:

Try to close small deletions by relaxing them (see CloseSmallDeletions())
Iteratively merge gaps up to a distance merge_distance (see MergeGapsByDistance()) and try to fill them with a database approach (see FillLoopsByDatabase())
Try to fill remaining gaps using a Monte Carlo approach (see FillLoopsByMonteCarlo())
Large deletions get closed using a last resort approach (see CloseLargeDeletions())

Parameters:

mhandle (ModellingHandle) – Modelling handle on which to apply change.
merge_distance (int) – Max. merge distance when performing the database approach
fragment_db (FragDB) – Database for searching fragments in database approach, must be consistent with provided structure_db. A default is loaded if None.
structure_db (StructureDB) – Structure db from which the fragment_db gets it’s structural information. A default is loaded if None.
torsion_sampler (promod3.loop.TorsionSampler) – Used as parameter for FillLoopsByDatabase() and FillLoopsByMonteCarlo() A default one is loaded if None.
fragger_handles (list) – A list of promod3.modelling.FraggerHandle objects for each chain in mhandle. If provided, fragments will be used for sampling when the FillLoopsByMonteCarlo() gets executed.
chain_idx (int) – If not None, only gaps from chain with given index get processed
resnum_range (tuple containing two int) – If not None, only gaps within this resnum range get processed.
length_dep_weights (bool) – ScoringWeights provides different sets of weights that have been trained on different loop subsets. If this flag is true, the length dependent weights are used to close loops with database / Monte Carlo.

promod3.modelling.CloseSmallDeletions(mhandle, max_extension=9, clash_thresh=1.0, e_thresh=200, use_scoring_extender=True, use_full_extender=True, chain_idx=None, resnum_range=None, ff_lookup=None)¶

Close small deletions by relaxing neighbouring residues.

Small deletions in the template from the target-template alignment have a good chance to be bridged just by relaxing neighbours around a tiny gap. Before diving into the more demanding tasks in modeling, those may be closed already in the raw-model. After closure some checks are done to see if the solution is stereochemically sensible.

Closed gaps are removed from mhandle.gaps.

from ost import io, seq
from promod3 import modelling

# setup
tpl = io.LoadPDB('data/gly.pdb')
aln = seq.CreateAlignment(seq.CreateSequence('trg', 'GGGG-GGGG'),
                          seq.CreateSequence('tpl', 'GGGGAGGGG'))
aln.AttachView(1, tpl.CreateFullView())
mhandle = modelling.BuildRawModel(aln)
# close small deletion
print('Number of gaps before: %d' % len(mhandle.gaps))
modelling.CloseSmallDeletions(mhandle)
print('Number of gaps after: %d' % len(mhandle.gaps))

Parameters:

mhandle (ModellingHandle) – Modelling handle on which to apply change.
max_extension (int) – Maximal number of gap extension steps to perform (see GapExtender)
clash_thresh (float) – Threshold for the backbone clash score. Acceptance means being lower than this.
e_thresh (float) – Potential energy should be lower than this.
use_scoring_extender (bool) – True = use ScoringGapExtender instead of GapExtender. The gap is penalized according as 0.8*length + sum(helices) + sum(sheets). For the scondary-structure-penalty to work, the model-template must have the appropriate information before BuildRawModel() is called (e.g. with ost.mol.alg.AssignSecStruct()).
use_full_extender (bool) – True = use FullGapExtender instead of of GapExtender. Also works in combination with use_scoring_extender. This allows the gap extender to skip neighboring gaps and to correctly handle gaps close to termini.
chain_idx (int) – If not None, only gaps from chain with given index get processed
resnum_range (tuple containing two int) – If not None, only gaps within this resnum range get processed.
ff_lookup (promod3.loop.ForcefieldLookup) – Forcefield to parametrize promod3.loop.BackboneList in promod3.modelling.BackboneRelaxer. If set to None, the one returned by promod3.loop.ForcefieldLookup.GetDefault() gets used.

promod3.modelling.MergeGapsByDistance(mhandle, distance, chain_idx=None, resnum_range=None)¶

Merge 2 neighbouring gaps by deleting residues in-between.

Check if two neighbouring gaps are at max. distance residues apart from each other. Then delete the residues and store a new gap spanning the whole stretch of original gaps and the deleted region. Original gaps will be removed. Stem residues count to the gap, so A-A-A has a distance of 0.

IMPORTANT: we assume here that mhandle stores gaps sequentially. Non-sequential gaps are ignored!

from ost import io, seq
from promod3 import modelling

# setup
tpl = io.LoadPDB('data/1crn_cut.pdb')
seq_trg = 'TTCCPSIVARSNFNVCRLPGTPEAICATGYTCIIIPGATCPGDYAN'
seq_tpl = 'TTCCPSIVARSNFNVCRLPGTPEA----G--CIIIPGATCPGDYAN'
aln = seq.CreateAlignment(seq.CreateSequence('trg', seq_trg),
                          seq.CreateSequence('tpl', seq_tpl))
aln.AttachView(1, tpl.CreateFullView())
mhandle = modelling.BuildRawModel(aln)
# merge gaps
print('Number of gaps before: %d' % len(mhandle.gaps))
modelling.MergeGapsByDistance(mhandle, 0)
print('Number of gaps after: %d' % len(mhandle.gaps))

Parameters:

mhandle (ModellingHandle) – Modelling handle on which to apply change.
distance (int) – The max. no. of residues between two gaps up to which merge happens.
chain_idx (int) – If not None, only gaps from chain with given index get processed
resnum_range (tuple containing two int) – If not None, two gaps only get merged if they’re both in this resnum range.

promod3.modelling.FillLoopsByDatabase(mhandle, fragment_db, structure_db, torsion_sampler=None, max_loops_to_search=40, min_loops_required=4, max_res_extension=-1, extended_search=True, use_scoring_extender=True, use_full_extender=True, score_variant=0, ring_punch_detection=1, chain_idx=None, resnum_range=None, max_num_all_atom=0, clash_thresh=-1, length_dep_weights=False)¶

Try to fill up loops from a structural database.

Usually this will extend the gaps a bit to match candidates from the database. Do not expect a gap being filled in between its actual stem residues. This function cannot fill gaps at C- or N-terminal.

from ost import io, seq
from promod3 import modelling, loop

# setup
tpl = io.LoadPDB('data/1crn_cut.pdb')
seq_trg = 'TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN'
seq_tpl = 'TTCCPSIVARSNFNVCRLPGTPEA------GCIIIPGATCPGDYAN'
aln = seq.CreateAlignment(seq.CreateSequence('trg', seq_trg),
                          seq.CreateSequence('tpl', seq_tpl))
aln.AttachView(1, tpl.CreateFullView())
mhandle = modelling.BuildRawModel(aln)
# close gaps
print('Number of gaps before: %d' % len(mhandle.gaps))
modelling.FillLoopsByDatabase(mhandle, loop.LoadFragDB(),
                              loop.LoadStructureDB(),
                              loop.LoadTorsionSamplerCoil())
print('Number of gaps after: %d' % len(mhandle.gaps))

Parameters:

mhandle (ModellingHandle) – Modelling handle on which to apply change.
fragment_db (FragDB) – A fragment database coupled to the structure_db.
structure_db (StructureDB) – Backbone/profile data.
torsion_sampler (TorsionSampler) – A sampler for torsion angles. A default one is loaded if None.
max_loops_to_search (int) – Define how many candidates are ‘enough’ to be evaluated per loop. The actual found candidates may be more (if we found ‘enough’) or less (if not enough candidates exist) of this number.
min_loops_required (int) – Define how many candidates we require to close the loop. If we did not find at least this number of candidates for a gap, we skip it without closing. Can be set to max_loops_to_search (or equivalently to -1) to enforce that we only close gaps for which we found enough candidates.
max_res_extension (int) – Only allow this number of residues to be added to the gaps when extending. If set to -1, any number of residues can be added (as long as the fragment_db allows it).
extended_search (bool) – True = more loop candidates are considered. The candidate search is done less precisely (see FillFromDatabase()). The candidates are still scored and evaluated the same though (only more of them considered).
use_scoring_extender (bool) – True = use ScoringGapExtender instead of GapExtender. See CloseSmallDeletions().
use_full_extender (bool) – True = use FullGapExtender instead of GapExtender. See CloseSmallDeletions().
score_variant (int) –
How to score loop candidates. Options:
- 0: put frame of backbone residues enclosing all candidates and score frame. This will also “score” non-modelled residues!
- 1: score candidates directly
- 2: like 1 but penalize length of candidate
ring_punch_detection (int) –
How to deal with ring punchings. Options:
- 0: not at all (fastest)
- 1: check for punchings with existing rings
- 2: check incl. sidechain for loop cand.
chain_idx (int) – If not None, only gaps from chain with given index get processed
resnum_range (tuple containing two int) – If not None, only gaps within this resnum range get processed
max_num_all_atom (int) – If > 0, we prefilter loop candidates based on non-all-atom-scores and apply all atom scoring to the best max_num_all_atom candidates. If desired, 5 is a good value here (larger values give only numerical improvement). With 5, this will be approx. 2x slower than without and will give a slight improvement in loop selection.
clash_thresh (float) – If > 0, we only keep loop candidates which have a backbone clash score lower than this.
length_dep_weights (bool) – ScoringWeights provides different sets of weights that have been trained on different loop subsets. If this flag is true, the length dependent weights are used to select the final loops.

promod3.modelling.FillLoopsByMonteCarlo(mhandle, torsion_sampler=None, max_loops_to_search=6, max_extension=30, mc_num_loops=2, mc_steps=5000, use_scoring_extender=True, use_full_extender=True, score_variant=0, ring_punch_detection=1, fragger_handles=None, chain_idx=None, resnum_range=None, length_dep_weights=False)¶

Try to fill up loops with Monte Carlo sampling.

This is meant as a “last-resort” approach when it is not possible to fill the loops from the database with FillLoopsByDatabase(). This will extend the gaps (up to max_extension times) a bit to allow for more loop candidates to be found.

The loops are modelled by either sampling the dihedral angles or (if fragger_handles is given) Fragger lists. The latter is only used if the gap length is >= the length of fragments stored.

This function cannot fill gaps at C- or N-terminal.

from ost import io, seq
from promod3 import modelling, loop

# setup
tpl = io.LoadPDB('data/1crn_cut.pdb')
seq_trg = 'TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN'
seq_tpl = 'TTCCPSIVARSNFNVCRLPGTPEA------GCIIIPGATCPGDYAN'
aln = seq.CreateAlignment(seq.CreateSequence('trg', seq_trg),
                          seq.CreateSequence('tpl', seq_tpl))
aln.AttachView(1, tpl.CreateFullView())
mhandle = modelling.BuildRawModel(aln)
# close gaps
print('Number of gaps before: %d' % len(mhandle.gaps))
modelling.FillLoopsByMonteCarlo(mhandle,
                              	loop.LoadTorsionSamplerCoil())
print('Number of gaps after: %d' % len(mhandle.gaps))

Parameters:

mhandle (ModellingHandle) – Modelling handle on which to apply change.
torsion_sampler (TorsionSampler) – A sampler for torsion angles. A default one is loaded if None.
max_loops_to_search (int) – Define how many candidates are ‘enough’ to be evaluated per loop.
max_extension (int) – Maximal number of gap extension steps to perform (see GapExtender)
mc_num_loops (int) – Number of loop candidates to consider for each extended gap (see FillFromMonteCarloSampler())
mc_steps (int) – Number of MC steps to perform for each loop candidate (see FillFromMonteCarloSampler())
use_scoring_extender (bool) – True = use ScoringGapExtender instead of GapExtender. See CloseSmallDeletions().
use_full_extender (bool) – True = use FullGapExtender instead of GapExtender. See CloseSmallDeletions().
score_variant (int) – How to score loop candidates (AllAtom not supported). See FillLoopsByDatabase().
ring_punch_detection (int) – How to deal with ring punchings. See FillLoopsByDatabase().
fragger_handles (list of FraggerHandle) – Either None (no fragger sampling used) or one fragger handle for each chain in mhandle.
chain_idx (int) – If not None, only gaps from chain with given index get processed
resnum_range (tuple containing two int) – If not None, only gaps within this resnum range get processed
length_dep_weights (bool) – ScoringWeights provides different sets of weights that have been trained on different loop subsets. If this flag is true, the length dependent weights are used to select the final loops.

promod3.modelling.CloseLargeDeletions(mhandle, structure_db, linker_length=8, num_fragments=500, use_scoring_extender=True, use_full_extender=True, chain_idx=None, resnum_range=None)¶

Try to close large deletions.

This is meant as a “last-resort” approach. In some cases you cannot close very large deletions simply because the two parts separated by a deletion are too far apart. The idea is to sample a linker region and always move the whole chain towards the n-terminus.

Parameters:

mhandle (ModellingHandle) – Modelling handle on which to apply change.
structure_db (StructureDB) – The database from which to extract fragments for the linker region.
linker_length (int) – Desired length (in residues w/o stems) for the linker. This may be shorter if extender cannot extend further.
num_fragments (int) – Number of fragments to sample the linker.
use_scoring_extender (bool) – True = use ScoringGapExtender instead of GapExtender. See CloseSmallDeletions().
use_full_extender (bool) – True = use FullGapExtender instead of GapExtender. See CloseSmallDeletions().
chain_idx (int) – If not None, only gaps from chain with given index get processed
resnum_range (tuple containing two int) – If not None, only gaps within this resnum range get processed

promod3.modelling.ModelTermini(mhandle, torsion_sampler, fragger_handles=None, mc_num_loops=20, mc_steps=5000)¶

Try to model termini with Monte Carlo sampling.

Use with care! This is an experimental feature which will increase coverage but we do not assume that the resulting termini are of high quality!

The termini are modelled by either sampling the dihedral angles or (if fragger_handles is given) Fragger lists. The latter is only used if the gap length is >= the length of fragments stored.

from ost import io, seq
from promod3 import modelling, loop

# setup
tpl = io.LoadPDB('data/gly.pdb')
seq_trg = 'AAAAGGGGGGGGGGGGGGGGGGGGAAAAAA'
seq_tpl = '----GGGGGGGGGGGGGGGGGGGG------'
aln = seq.CreateAlignment(seq.CreateSequence('trg', seq_trg),
                          seq.CreateSequence('tpl', seq_tpl))
aln.AttachView(1, tpl.CreateFullView())
mhandle = modelling.BuildRawModel(aln)
# close gaps
print('Number of gaps before: %d' % len(mhandle.gaps))
modelling.ModelTermini(mhandle,
                       loop.LoadTorsionSamplerCoil())
print('Number of gaps after: %d' % len(mhandle.gaps))

Parameters:

mhandle (ModellingHandle) – Modelling handle on which to apply change.
torsion_sampler (TorsionSampler) – A sampler for torsion angles.
fragger_handles (list of FraggerHandle) – Either None (no fragger sampling used) or one fragger handle for each chain in mhandle.
mc_num_loops (int) – Number of loop candidates to consider for each terminal gap (see FillFromMonteCarloSampler())
mc_steps (int) – Number of MC steps to perform for each loop candidate (see FillFromMonteCarloSampler())

promod3.modelling.BuildSidechains(mhandle, merge_distance=4, fragment_db=None, structure_db=None, torsion_sampler=None, rotamer_library=None)¶

Build sidechains for model.

This is a wrapper for promod3.modelling.ReconstructSidechains(), followed by a check for ring punches. If ring punches are found it introduces gaps for the residues with punched rings and tries to fill them with FillLoopsByDatabase() with ring_punch_detection=2.

Parameters:

mhandle (ModellingHandle) – Modelling handle on which to apply change.
merge_distance (int) – Used as parameter for MergeGapsByDistance() if ring punches are found.
fragment_db (FragDB) – Used as parameter for FillLoopsByDatabase() if ring punches are found. A default one is loaded if None.
structure_db (StructureDB) – Used as parameter for FillLoopsByDatabase() if ring punches are found. A default one is loaded if None.
torsion_sampler (TorsionSampler) – Used as parameter for FillLoopsByDatabase() if ring punches are found. A default one is loaded if None.
rotamer_library (RotamerLib or BBDepRotamerLib) – Used as parameter for modelling.ReconstructSidechains(), a default one is loaded if None.

promod3.modelling.MinimizeModelEnergy(mhandle, max_iterations=12, max_iter_sd=20, max_iter_lbfgs=10, use_amber_ff=False, extra_force_fields=[], tolerance_sd=1.0, tolerance_lbfgs=1.0)¶

Minimize energy of final model using molecular mechanics.

Uses ost.mol.mm to perform energy minimization. It will iteratively (at most max_iterations times):

run up to max_iter_sd minimization iter. of a steepest descend method
run up to max_iter_lbfgs minimization iter. of a Limited-memory Broyden-Fletcher-Goldfarb-Shanno method
abort if no stereochemical problems found

The idea is that we don’t want to minimize “too much”. So, we iteratively minimize until there are no stereochemical problems and not more.

To speed things up, this can run on multiple CPU threads by setting the env. variable PM3_OPENMM_CPU_THREADS to the number of desired threads. If the variable is not set, 1 thread will be used by default.

If the starting model is so bad that the energy is NaN or Inf from the start (happens if atoms are on top of each other or almost), the energy minimization is aborted. This issue is logged and added as a major issue to modelling_issues of mhandle.

Parameters:

mhandle (ModellingHandle) – Modelling handle on which to apply change.
max_iterations (int) – Max. number of iterations for SD+LBFGS
max_iter_sd (int) – Max. number of iterations within SD method
max_iter_lbfgs (int) – Max. number of iterations within LBFGS method
tolerance_sd (float) – Tolerance parameter passed to ApplySD of ost.mol.mm.Simulation object.
tolerance_lbfgs (float) – Tolerance parameter passed to ApplyLBFGS of ost.mol.mm.Simulation object.
use_amber_ff (bool) – if True, use the AMBER force field instead of the def. CHARMM one (see BuildFromRawModel()).
extra_force_fields (list of ost.mol.mm.Forcefield) – Additional list of force fields to use (see BuildFromRawModel()).

Returns:

The model including all oxygens as used in the minimizer.

Return type:

Entity

promod3.modelling.CheckFinalModel(mhandle)¶

Performs samity checks on final models and reports problems.

Issues are logged and tracked in modelling_issues of mhandle. Major issues:

Chains with less than 3 residues (usually due to bad templates).
Incomplete models (i.e. some gaps couldn’t be closed). One issue is created per unclosed gap and the stems of the gap are added to the issue.
Complete models with sequence mismatches (should never happen).
Residues with rings which have been punched by another bond.

Minor issues:

Remaining stereo-chemical problems after energy minimization. The affected residues will have the boolean property “stereo_chemical_problem_backbone” set to True, if the problem affects backbone atoms.
Residues that contain non-planar rings.

Parameters:: mhandle (ModellingHandle) – Modelling handle for which to perform checks.

class promod3.modelling.ModellingIssue(text, severity, residue_list=[])¶

Parameters:

text – Sets text.
severity – Sets severity.
residue_list – Sets residue_list.

text¶

Description of issue.

Type:: str

severity¶

Severity of issue.

Type:: Severity

residue_list¶

List of residues affected by issue (or empty list if global issue).

Type:: list of ResidueHandle / ResidueView

class Severity¶

Enumerates severities.

MAJOR = 10¶: Major issues like MM-failures, incomplete models and ring punches.

MINOR = 0¶: Minor issues like remaining stereo-chemistry problems after MM.

is_major()¶

Returns:: True if this is a major issue.
Return type:: bool

promod3.modelling.AddModellingIssue(mhandle, text, severity, residue_list=[])¶

Adds a new ModellingIssue to modelling_issues in mhandle.

If mhandle doesn’t contain the modelling_issues attribute yet, it is added.

Parameters:

mhandle (ModellingHandle) – Will have the issue added to.
text – Sets ModellingIssue.text.
severity – Sets ModellingIssue.severity.
residue_list – Sets ModellingIssue.residue_list.

Alignment Fiddling¶

promod3.modelling.DeleteGapCols(aln)¶

Deletes alignment columns that only contain gaps.

Columns that only contain gaps (‘-’) are removed. If no such column can be identified, the input alignment gets returned. A new alignment gets constructed otherwise. The sequences of the new alignment retain name, offset and the potentially attached ost.mol.EntityView from the original sequences.

Parameters:: aln (ost.seq.AlignmentHandle) – Input alignment
Returns:: The processed alignment
Return type:: ost.seq.AlignmentHandle

promod3.modelling.PullTerminalDeletions(aln, min_terminal_anchor_size=4)¶

Fixes deletions close to termini.

Some alignment tools may produce alignments with deletions close to the termini. For example:

SEQRES:  A-----BCDE...
ATOMSEQ: ABCDEFGHIJ...

where A is the anchor residue. The default loop modelling pipeline would keep the position of A fixed and start to omit structural information from B, C, … until it is able to resolve the deletion. If the anchor is very short, a shift typically results in a better model:

SEQRES:  -----ABCDE...
ATOMSEQ: ABCDEFGHIJ... 

This function checks whether the gap closest to any termini is a deletion (one or several ‘-’ in the first sequence) and estimates the anchor size (number of aligned residues towards the respective termini from that gap). If the anchor size is smaller than min_terminal_anchor_size, a shift is applied and the deletion removed.

This is done iteratively, until no deletion can be removed anymore.

SEQRES:  A-B--CDEF...        
ATOMSEQ: ABCDEFGHI...

becomes

SEQRES:  ---ABCDEF...
ATOMSEQ: ABCDEFGHI...

given a min_terminal_anchor_size>2. If no shift can be performed, the input alignment gets returned. A new alignment gets constructed otherwise. The sequences of the new alignment retain name, offset and the potentially attached ost.mol.EntityView from the original sequences.

Parameters:: aln (ost.seq.AlignmentHandle) – Input alignment
Returns:: The processed alignment
Return type:: ost.seq.AlignmentHandle

Modelling Pipeline¶

Build Raw Modelling Handle¶

The Default Pipeline¶

Modelling Steps¶

Alignment Fiddling¶

Search

Contents