Modelling Pipeline¶
A protein homology modelling pipeline has the following main steps:
Build a raw model from the template (see
BuildRawModel()
function)Perform loop modelling to close (or remove) all gaps (see functions
CloseSmallDeletions()
,RemoveTerminalGaps()
,MergeGapsByDistance()
,FillLoopsByDatabase()
,FillLoopsByMonteCarlo()
,CloseLargeDeletions()
orCloseGaps()
that calls all these functions using predefined heuristics)Build sidechains (see
BuildSidechains()
function)Minimize energy of final model using molecular mechanics (see
MinimizeModelEnergy()
function)
The last steps to go from a raw model to a final model can easily be executed
with the BuildFromRawModel()
function. If you want to run and tweak the
internal steps, you can start with the following code and adapt it to your
purposes:
from ost import io
from promod3 import modelling, loop
# setup
merge_distance = 4
fragment_db = loop.LoadFragDB()
structure_db = loop.LoadStructureDB()
torsion_sampler = loop.LoadTorsionSamplerCoil()
# get raw model
tpl = io.LoadPDB('data/1crn_cut.pdb')
aln = io.LoadAlignment('data/1crn.fasta')
aln.AttachView(1, tpl.CreateFullView())
mhandle = modelling.BuildRawModel(aln)
# we're not modelling termini
modelling.RemoveTerminalGaps(mhandle)
# perform loop modelling to close all gaps
modelling.CloseGaps(mhandle, merge_distance, fragment_db,
structure_db, torsion_sampler)
# build sidechains
modelling.BuildSidechains(mhandle, merge_distance, fragment_db,
structure_db, torsion_sampler)
# minimize energy of final model using molecular mechanics
modelling.MinimizeModelEnergy(mhandle)
# check final model and report issues
modelling.CheckFinalModel(mhandle)
# extract final model
final_model = mhandle.model
io.SavePDB(final_model, 'model.pdb')
Build Raw Modelling Handle¶
- class promod3.modelling.ModellingHandle¶
Handles the result for structure model building and provides high-level methods to turn an initial raw model (see
BuildRawModel()
) into a complete protein model by removing any existing gaps.- model¶
The resulting model. This includes one chain per target chain (in the same order as the sequences in
seqres
) and (if they were included) a chain named ‘_’ for ligands. You can therefore access model.chains items andseqres
items with the same indexing and the optional ligand chain follows afterwards.- Type:
- gaps¶
List of gaps in the model that could not be copied from the template. These gaps may be the result of insertions/deletions in the alignment or due to missing or incomplete backbone coordinates in the template structure. Gaps of different chains are appended one after another.
- Type:
- seqres¶
List of sequences with one
SequenceHandle
for each chain of the target protein.- Type:
- profiles¶
List of profiles with one
ost.seq.ProfileHandle
for each chain of the target protein (same order as inseqres
). Please note, that this attribute won’t be set by simply callingBuildFromRawModel()
. You have to fill it manually or even better by the convenient functionSetSequenceProfiles()
, to ensure consistency with the seqres.- Type:
- psipred_predictions¶
List of predictions with one
promod3.loop.PsipredPrediction
for each chain of the target protein (same order as inseqres
). Please note, that this attribute won’t be set by simply callingBuildFromRawModel()
. You have to fill it manually or even better by the convenient functionSetPsipredPredictions()
, to ensure consistency with the seqres.- Type:
- backbone_scorer_env¶
Backbone score environment attached to this handle. A default environment is set with
SetupDefaultBackboneScoring()
when needed. Additional information can be added to the environment before running the pipeline steps.- Type:
- backbone_scorer¶
Backbone scorer container attached to this handle. A default set of scorers is initialized with
SetupDefaultBackboneScoring()
when needed.- Type:
- all_atom_scorer_env¶
All atom environment attached to this handle for scoring. A default environment is set with
SetupDefaultAllAtomScoring()
when needed. This environment is for temporary work only and is only updated to score loops. It is not to be updated when loops are chosen and added to the final model.- Type:
- all_atom_scorer¶
All atom scorer container attached to this handle. A default set of scorers is initialized with
SetupDefaultAllAtomScoring()
when needed.- Type:
- all_atom_sidechain_env¶
All atom environment attached to this handle for sidechain reconstruction. A default environment is set with
SetupDefaultAllAtomScoring()
when needed.- Type:
- sidechain_reconstructor¶
A sidechain reconstructor to add sidechains to loops prior to all atom scoring. A default one is set with
SetupDefaultAllAtomScoring()
when needed.- Type:
- fragger_handles¶
Optional attribute which is set in
SetFraggerHandles()
. Usehasattr()
to check if it’s available. If it’s set, it is used inBuildFromRawModel()
.- Type:
- modelling_issues¶
Optional attribute which is set in
AddModellingIssue()
. Usehasattr()
to check if it’s available. If it’s set, it can be used to check issues which occurred inBuildFromRawModel()
(seeMinimizeModelEnergy()
andCheckFinalModel()
for details).- Type:
- Copy()¶
Generates a deep copy. Everything will be copied over to the returned
ModellingHandle
, except the potentially set scoring membersbackbone_scorer
,backbone_scorer_env
,all_atom_scorer_env
,all_atom_scorer
,all_atom_sidechain_env
andsidechain_reconstructor
.- Returns:
A deep copy of the current handle
- Return type:
- promod3.modelling.BuildRawModel(aln, chain_names=None, include_ligands=False, spdbv_style=False, aln_preprocessing='default')¶
Builds a raw (pseudo) model from the alignment. Can either take a single alignment handle or an alignment handle list. Every list item is treated as a single chain in the final raw model.
Each alignment handle must contain exactly two sequences and the second sequence is considered the template sequence, which must have a
EntityView
attached.Before extracting the coordinates, the alignments are pre-processed according to aln_preprocessing.
This is a basic protein core modelling algorithm that copies backbone coordinates based on the sequence alignment. For matching residues, the side chain coordinates are also copied. Gaps are ignored. Hydrogen an deuterium atoms are not copied into the model.
The function tries to reuse as much as possible from the template. Modified residues are treated as follows:
Selenium methionine residues are converted to methionine
Side chains which contain all atoms of the parent amino acid, e.g. phosphoserine are copied as a whole with the modifications stripped off.
Residues with missing backbone atoms and D-peptides are generally skipped and treated as gaps. Missing Cbeta atoms in backbone are ok and reconstructed. If all residues are skipped (e.g. Calpha traces), we report an error and return an empty model.
Residue numbers are set such that missing residue in gaps are honoured and subsequent loop modelling can insert new residues without having to renumber. The numbering of residues starts for every chain with the value 1.
The returned
ModellingHandle
stores the obtained raw model as well as information about insertions and deletions in the gaps list.- Parameters:
aln (
AlignmentHandle
/AlignmentList
) – Single alignment handle for raw model with single chain or list of alignment handles for raw model with multiple chains.include_ligands (
bool
) – True, if we wish to include ligands in the model. This searches for ligands in all OST handles of the views attached to the alignments. Ligands are identified with the ligand property in the handle (set by OST based on HET records) or by the chain name ‘_’ (as set in SMTL). All ligands are added to a new chain named ‘_’.chain_names (
str
/list
) – If set, this overrides the default chain naming (chains are consecutively named according to characters in ‘ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz’). If aln is of typeost.seq.AlignmentHandle
, chain_names is expected to be astr
. If aln is of typeost.seq.AlignmentList
, chain_names is expected to be alist
ofstr
of same size as aln or astr
. For the latter case, chains will consecutively named according to characters in chain_names.spdbv_style (
bool
) – True, if we need a model in the old SPDBV style.aln_preprocessing – Calls
promod3.modelling.PullTerminalDeletions()
if set to ‘default’. Can be disabled when set to False.
- Returns:
Raw (pseudo) model from the alignment.
- Return type:
- Raises:
A
RuntimeError
when:the alignments do not have two sequences
the second sequence does not have an attached structure
the residues of the template structure do not match with the alignment sequence (note that you can set an “offset” (see
SetSequenceOffset()
) for the template sequence (but not for the target))the target sequence has a non-zero offset (cannot be honored as the resulting model will always start its residue numbering at 1)
The Default Pipeline¶
- promod3.modelling.BuildFromRawModel(mhandle, use_amber_ff=False, extra_force_fields=[], model_termini=False)¶
Build a model starting with a raw model (see
BuildRawModel()
).This function implements a recommended pipeline to generate complete models from a raw model. The steps are shown in detail in the code example above. If you wish to use your own pipeline, you can use that code as a starting point for your own custom modelling pipeline. For reproducibility, we recommend that you keep copies of custom pipelines.
To adapt the scoring used during loop closing, you can call
SetupDefaultBackboneScoring()
andSetupDefaultAllAtomScoring()
and adapt the default scoring members. Alternatively, you can setup the scoring manually, but you must ensure consistency yourself!By default, a simple backbone dihedral sampling is performed when entering Monte Carlo. If mhandle has a list of
FraggerHandle
objects attached as “fragger_handles” attribute, the sampling will be performed with structural fragments. To ensure consistency, the fragger handles should be attached usingSetFraggerHandles()
. But be aware of increased runtime due to the fragment search step.If the function fails to close all gaps, it will produce a warning and return an incomplete model.
- Parameters:
mhandle (
ModellingHandle
) – The prepared template coordinates loaded with the input alignment.use_amber_ff (
bool
) – if True, use the AMBER force field instead of the def. CHARMM one (seeost.mol.mm.LoadAMBERForcefield()
andost.mol.mm.LoadCHARMMForcefield()
). Both do a similarly good job without ligands (CHARMM slightly better), but you will want to be consistent with the optional force fields in extra_force_fields.extra_force_fields (
list
ofost.mol.mm.Forcefield
) – Additional list of force fields to use if a (ligand) residue cannot be parametrized with the default force field. The force fields are tried in the order as given and ligands without an existing parametrization are skipped.model_termini (
bool
) – The default modelling pipeline in ProMod3 is optimized to generate a gap-free model of the region in the target sequence(s) that is covered with template information. Terminal extensions without template coverage are negelected. You can activate this flag to enforce a model of the full target sequence(s). The terminal parts will be modelled with a crude Monte Carlo approach. Be aware that the accuracy of those termini is likely to be limited.
- Returns:
Delivers the model as an OST entity.
- Return type:
Modelling Steps¶
- promod3.modelling.SetupDefaultBackboneScoring(mhandle)¶
Setup scorers and environment for meddling with backbones. This one is already tailored towards a certain modelling job. The scorers added (with their respective keys) are:
“cb_packing”:
CBPackingScorer
“cbeta”:
CBetaScorer
“reduced”:
ReducedScorer
“clash”:
ClashScorer
“hbond”:
HBondScorer
“torsion”:
TorsionScorer
“pairwise”:
PairwiseScorer
- Parameters:
mhandle (
ModellingHandle
) – The modelling handle. This will set the propertiesbackbone_scorer
andbackbone_scorer_env
of mhandle.
- promod3.modelling.IsBackboneScoringSetUp(mhandle)¶
- Returns:
True, if
backbone_scorer
andbackbone_scorer_env
of mhandle are set.- Return type:
- Parameters:
mhandle (
ModellingHandle
) – Modelling handle to check.
- promod3.modelling.SetupDefaultAllAtomScoring(mhandle)¶
Setup scorers and environment to perform all atom scoring. This one is already tailored towards a certain modelling job, where we reconstruct sidechains for loop candidates and score them. The scorers added (with their respective keys) are:
“aa_interaction”:
AllAtomInteractionScorer
“aa_packing”:
AllAtomPackingScorer
“aa_clash”:
AllAtomClashScorer
- Parameters:
mhandle (
ModellingHandle
) – The modelling handle. This will set the propertiesall_atom_scorer_env
,all_atom_scorer
,all_atom_sidechain_env
andsidechain_reconstructor
.
- promod3.modelling.IsAllAtomScoringSetUp(mhandle)¶
- Returns:
True, if
all_atom_scorer_env
,all_atom_scorer
,all_atom_sidechain_env
andsidechain_reconstructor
of mhandle are set.- Return type:
- Parameters:
mhandle (
ModellingHandle
) – Modelling handle to check.
- promod3.modelling.InsertLoop(mhandle, bb_list, start_resnum, chain_idx)¶
Insert loop into model and ensure consistent updating of scoring environments. Note that we do not update
all_atom_scorer_env
as that one is meant to be updated only while scoring. To clear a gap while inserting a loop, use the simplerInsertLoopClearGaps()
.- Parameters:
mhandle (
ModellingHandle
) – Modelling handle on which to apply change.bb_list (
BackboneList
) – Loop to insert (backbone only).start_resnum (
int
) – Res. number defining the start position in the SEQRES.chain_idx (
int
) – Index of chain the loop belongs to.
- promod3.modelling.RemoveTerminalGaps(mhandle)¶
Removes terminal gaps without modelling them (just removes them from the list of gaps). This is useful for pipelines which lack the possibility to properly model loops at the termini.
- Parameters:
mhandle (
ModellingHandle
) – Modelling handle on which to apply change.- Returns:
Number of gaps which were removed.
- Return type:
- promod3.modelling.ReorderGaps(mhandle)¶
Reorders all gaps to ensure sequential order by performing lexicographical comparison on the sequence formed by chain index of the gap and start residue number.
- promod3.modelling.MergeMHandle(source_mhandle, target_mhandle, source_chain_idx, target_chain_idx, start_resnum, end_resnum, transform)¶
Merges the specified stretch of source_mhandle into target_mhandle by replacing all structural information and gaps in the stretch start_resnum and end_resnum (inclusive). The residues specified by start_resnum and end_resnum must be valid in the source_mhandle, i.e. not be enclosed by a gap. If a gap encloses start_resnum or end_resnum in the target_mhandle, the gap gets replaced by a shortened version not including the part overlapping with the defined stretch. If there is any scoring set up (backbone or all atom), the according environments get updated in target_mhandle.
- Parameters:
source_mhandle (
ModellingHandle
) – Source of structural information and gapstarget_mhandle (
ModellingHandle
) – Structural information and gaps will be copied in heresource_chain_idx (
int
) – This is the chain where the info comes fromtarget_chain_idx (
int
) – This is the chain where the info goes tostart_resnum (
int
) – First residue of the copied stretchend_resnum (
int
) – Last residue of the copied stretchtransform (
ost.geom.Mat4
) – Transformation to be applied to all atom positions when they’re copied over
- Raises:
A
RuntimeError
when:the chain indices are invalid
the SEQRES of the specified chains do not match
the start and end residue numbers are invalid or when the residues at the specified positions in the source_mhandle do not exist
a gap in the source_mhandle encloses the residues specified by start_resnum and end_resnum
- promod3.modelling.SetSequenceProfiles(mhandle, profiles)¶
Sets the
sequence profiles
of mhandle while ensuring consistency with theseqres
.- Parameters:
mhandle (
ModellingHandle
) – Will have the profiles attached afterwardsprofiles (
list
ofost.seq.ProfileHandle
) – The sequence profiles to attach
- Raises:
ValueError
when the given profiles are not consistent with seqres in mhandle
- promod3.modelling.SetPsipredPredictions(mhandle, predictions)¶
Sets the
predictions
of mhandle while ensuring consistency with theseqres
.- Parameters:
mhandle (
ModellingHandle
) – Will have the predictions attached afterwardspredictions (
list
ofPsipredPrediction
) – The predictions to attach
- Raises:
ValueError
when the given predictions are not consistent with seqres in mhandle
- promod3.modelling.SetFraggerHandles(mhandle, fragger_handles)¶
Sets
fragger_handles
in mhandle while ensuring consistency with theModellingHandle.seqres
.- Parameters:
mhandle (
ModellingHandle
) – Will have the fragger handles attached afterwardsfragger_handles (
list
ofFraggerHandle
) – The fragger handles to attach
- Raises:
ValueError
when the given fragger_handles are not consistent with seqres in mhandle
- promod3.modelling.CloseGaps(mhandle, merge_distance=4, fragment_db=None, structure_db=None, torsion_sampler=None, fragger_handles=None, chain_idx=None, resnum_range=None, length_dep_weights=False)¶
Tries to close all gaps in a model, except termini. It will go through following steps:
Try to close small deletions by relaxing them (see
CloseSmallDeletions()
)Iteratively merge gaps up to a distance merge_distance (see
MergeGapsByDistance()
) and try to fill them with a database approach (seeFillLoopsByDatabase()
)Try to fill remaining gaps using a Monte Carlo approach (see
FillLoopsByMonteCarlo()
)Large deletions get closed using a last resort approach (see
CloseLargeDeletions()
)
- Parameters:
mhandle (
ModellingHandle
) – Modelling handle on which to apply change.merge_distance (
int
) – Max. merge distance when performing the database approachfragment_db (
FragDB
) – Database for searching fragments in database approach, must be consistent with provided structure_db. A default is loaded if None.structure_db (
StructureDB
) – Structure db from which the fragment_db gets it’s structural information. A default is loaded if None.torsion_sampler (
promod3.loop.TorsionSampler
) – Used as parameter forFillLoopsByDatabase()
andFillLoopsByMonteCarlo()
A default one is loaded if None.fragger_handles (
list
) – A list ofpromod3.modelling.FraggerHandle
objects for each chain in mhandle. If provided, fragments will be used for sampling when theFillLoopsByMonteCarlo()
gets executed.chain_idx (
int
) – If not None, only gaps from chain with given index get processedresnum_range (
tuple
containing twoint
) – If not None, only gaps within this resnum range get processed.length_dep_weights (
bool
) –ScoringWeights
provides different sets of weights that have been trained on different loop subsets. If this flag is true, the length dependent weights are used to close loops with database / Monte Carlo.
- promod3.modelling.CloseSmallDeletions(mhandle, max_extension=9, clash_thresh=1.0, e_thresh=200, use_scoring_extender=True, use_full_extender=True, chain_idx=None, resnum_range=None, ff_lookup=None)¶
Close small deletions by relaxing neighbouring residues.
Small deletions in the template from the target-template alignment have a good chance to be bridged just by relaxing neighbours around a tiny gap. Before diving into the more demanding tasks in modeling, those may be closed already in the raw-model. After closure some checks are done to see if the solution is stereochemically sensible.
Closed gaps are removed from
mhandle.gaps
.from ost import io, seq from promod3 import modelling # setup tpl = io.LoadPDB('data/gly.pdb') aln = seq.CreateAlignment(seq.CreateSequence('trg', 'GGGG-GGGG'), seq.CreateSequence('tpl', 'GGGGAGGGG')) aln.AttachView(1, tpl.CreateFullView()) mhandle = modelling.BuildRawModel(aln) # close small deletion print('Number of gaps before: %d' % len(mhandle.gaps)) modelling.CloseSmallDeletions(mhandle) print('Number of gaps after: %d' % len(mhandle.gaps))
- Parameters:
mhandle (
ModellingHandle
) – Modelling handle on which to apply change.max_extension (
int
) – Maximal number of gap extension steps to perform (seeGapExtender
)clash_thresh (
float
) – Threshold for the backbone clash score. Acceptance means being lower than this.e_thresh (
float
) – Potential energy should be lower than this.use_scoring_extender (
bool
) – True = useScoringGapExtender
instead ofGapExtender
. The gap is penalized according as 0.8*length + sum(helices) + sum(sheets). For the scondary-structure-penalty to work, the model-template must have the appropriate information beforeBuildRawModel()
is called (e.g. withost.mol.alg.AssignSecStruct()
).use_full_extender (
bool
) – True = useFullGapExtender
instead of ofGapExtender
. Also works in combination with use_scoring_extender. This allows the gap extender to skip neighboring gaps and to correctly handle gaps close to termini.chain_idx (
int
) – If not None, only gaps from chain with given index get processedresnum_range (
tuple
containing twoint
) – If not None, only gaps within this resnum range get processed.ff_lookup (
promod3.loop.ForcefieldLookup
) – Forcefield to parametrizepromod3.loop.BackboneList
inpromod3.modelling.BackboneRelaxer
. If set to None, the one returned bypromod3.loop.ForcefieldLookup.GetDefault()
gets used.
- promod3.modelling.MergeGapsByDistance(mhandle, distance, chain_idx=None, resnum_range=None)¶
Merge 2 neighbouring gaps by deleting residues in-between.
Check if two neighbouring gaps are at max. distance residues apart from each other. Then delete the residues and store a new gap spanning the whole stretch of original gaps and the deleted region. Original gaps will be removed. Stem residues count to the gap, so A-A-A has a distance of 0.
IMPORTANT: we assume here that mhandle stores gaps sequentially. Non-sequential gaps are ignored!
from ost import io, seq from promod3 import modelling # setup tpl = io.LoadPDB('data/1crn_cut.pdb') seq_trg = 'TTCCPSIVARSNFNVCRLPGTPEAICATGYTCIIIPGATCPGDYAN' seq_tpl = 'TTCCPSIVARSNFNVCRLPGTPEA----G--CIIIPGATCPGDYAN' aln = seq.CreateAlignment(seq.CreateSequence('trg', seq_trg), seq.CreateSequence('tpl', seq_tpl)) aln.AttachView(1, tpl.CreateFullView()) mhandle = modelling.BuildRawModel(aln) # merge gaps print('Number of gaps before: %d' % len(mhandle.gaps)) modelling.MergeGapsByDistance(mhandle, 0) print('Number of gaps after: %d' % len(mhandle.gaps))
- Parameters:
mhandle (
ModellingHandle
) – Modelling handle on which to apply change.distance (
int
) – The max. no. of residues between two gaps up to which merge happens.chain_idx (
int
) – If not None, only gaps from chain with given index get processedresnum_range (
tuple
containing twoint
) – If not None, two gaps only get merged if they’re both in this resnum range.
- promod3.modelling.FillLoopsByDatabase(mhandle, fragment_db, structure_db, torsion_sampler=None, max_loops_to_search=40, min_loops_required=4, max_res_extension=-1, extended_search=True, use_scoring_extender=True, use_full_extender=True, score_variant=0, ring_punch_detection=1, chain_idx=None, resnum_range=None, max_num_all_atom=0, clash_thresh=-1, length_dep_weights=False)¶
Try to fill up loops from a structural database.
Usually this will extend the gaps a bit to match candidates from the database. Do not expect a gap being filled in between its actual stem residues. This function cannot fill gaps at C- or N-terminal.
from ost import io, seq from promod3 import modelling, loop # setup tpl = io.LoadPDB('data/1crn_cut.pdb') seq_trg = 'TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN' seq_tpl = 'TTCCPSIVARSNFNVCRLPGTPEA------GCIIIPGATCPGDYAN' aln = seq.CreateAlignment(seq.CreateSequence('trg', seq_trg), seq.CreateSequence('tpl', seq_tpl)) aln.AttachView(1, tpl.CreateFullView()) mhandle = modelling.BuildRawModel(aln) # close gaps print('Number of gaps before: %d' % len(mhandle.gaps)) modelling.FillLoopsByDatabase(mhandle, loop.LoadFragDB(), loop.LoadStructureDB(), loop.LoadTorsionSamplerCoil()) print('Number of gaps after: %d' % len(mhandle.gaps))
- Parameters:
mhandle (
ModellingHandle
) – Modelling handle on which to apply change.fragment_db (
FragDB
) – A fragment database coupled to the structure_db.structure_db (
StructureDB
) – Backbone/profile data.torsion_sampler (
TorsionSampler
) – A sampler for torsion angles. A default one is loaded if None.max_loops_to_search (
int
) – Define how many candidates are ‘enough’ to be evaluated per loop. The actual found candidates may be more (if we found ‘enough’) or less (if not enough candidates exist) of this number.min_loops_required (
int
) – Define how many candidates we require to close the loop. If we did not find at least this number of candidates for a gap, we skip it without closing. Can be set tomax_loops_to_search
(or equivalently to -1) to enforce that we only close gaps for which we found enough candidates.max_res_extension (
int
) – Only allow this number of residues to be added to the gaps when extending. If set to -1, any number of residues can be added (as long as the fragment_db allows it).extended_search (
bool
) – True = more loop candidates are considered. The candidate search is done less precisely (seeFillFromDatabase()
). The candidates are still scored and evaluated the same though (only more of them considered).use_scoring_extender (
bool
) – True = useScoringGapExtender
instead ofGapExtender
. SeeCloseSmallDeletions()
.use_full_extender (
bool
) – True = useFullGapExtender
instead ofGapExtender
. SeeCloseSmallDeletions()
.score_variant (
int
) –How to score loop candidates. Options:
0: put frame of backbone residues enclosing all candidates and score frame. This will also “score” non-modelled residues!
1: score candidates directly
2: like 1 but penalize length of candidate
ring_punch_detection (
int
) –How to deal with ring punchings. Options:
0: not at all (fastest)
1: check for punchings with existing rings
2: check incl. sidechain for loop cand.
chain_idx (
int
) – If not None, only gaps from chain with given index get processedresnum_range (
tuple
containing twoint
) – If not None, only gaps within this resnum range get processedmax_num_all_atom (
int
) – If > 0, we prefilter loop candidates based on non-all-atom-scores and apply all atom scoring to the best max_num_all_atom candidates. If desired, 5 is a good value here (larger values give only numerical improvement). With 5, this will be approx. 2x slower than without and will give a slight improvement in loop selection.clash_thresh (
float
) – If > 0, we only keep loop candidates which have a backbone clash score lower than this.length_dep_weights (
bool
) –ScoringWeights
provides different sets of weights that have been trained on different loop subsets. If this flag is true, the length dependent weights are used to select the final loops.
- promod3.modelling.FillLoopsByMonteCarlo(mhandle, torsion_sampler=None, max_loops_to_search=6, max_extension=30, mc_num_loops=2, mc_steps=5000, use_scoring_extender=True, use_full_extender=True, score_variant=0, ring_punch_detection=1, fragger_handles=None, chain_idx=None, resnum_range=None, length_dep_weights=False)¶
Try to fill up loops with Monte Carlo sampling.
This is meant as a “last-resort” approach when it is not possible to fill the loops from the database with
FillLoopsByDatabase()
. This will extend the gaps (up to max_extension times) a bit to allow for more loop candidates to be found.The loops are modelled by either sampling the dihedral angles or (if fragger_handles is given)
Fragger
lists. The latter is only used if the gap length is >= the length of fragments stored.This function cannot fill gaps at C- or N-terminal.
from ost import io, seq from promod3 import modelling, loop # setup tpl = io.LoadPDB('data/1crn_cut.pdb') seq_trg = 'TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN' seq_tpl = 'TTCCPSIVARSNFNVCRLPGTPEA------GCIIIPGATCPGDYAN' aln = seq.CreateAlignment(seq.CreateSequence('trg', seq_trg), seq.CreateSequence('tpl', seq_tpl)) aln.AttachView(1, tpl.CreateFullView()) mhandle = modelling.BuildRawModel(aln) # close gaps print('Number of gaps before: %d' % len(mhandle.gaps)) modelling.FillLoopsByMonteCarlo(mhandle, loop.LoadTorsionSamplerCoil()) print('Number of gaps after: %d' % len(mhandle.gaps))
- Parameters:
mhandle (
ModellingHandle
) – Modelling handle on which to apply change.torsion_sampler (
TorsionSampler
) – A sampler for torsion angles. A default one is loaded if None.max_loops_to_search (
int
) – Define how many candidates are ‘enough’ to be evaluated per loop.max_extension (
int
) – Maximal number of gap extension steps to perform (seeGapExtender
)mc_num_loops (
int
) – Number of loop candidates to consider for each extended gap (seeFillFromMonteCarloSampler()
)mc_steps (
int
) – Number of MC steps to perform for each loop candidate (seeFillFromMonteCarloSampler()
)use_scoring_extender (
bool
) – True = useScoringGapExtender
instead ofGapExtender
. SeeCloseSmallDeletions()
.use_full_extender (
bool
) – True = useFullGapExtender
instead ofGapExtender
. SeeCloseSmallDeletions()
.score_variant (
int
) – How to score loop candidates (AllAtom not supported). SeeFillLoopsByDatabase()
.ring_punch_detection (
int
) – How to deal with ring punchings. SeeFillLoopsByDatabase()
.fragger_handles (
list
ofFraggerHandle
) – Either None (no fragger sampling used) or one fragger handle for each chain in mhandle.chain_idx (
int
) – If not None, only gaps from chain with given index get processedresnum_range (
tuple
containing twoint
) – If not None, only gaps within this resnum range get processedlength_dep_weights (
bool
) –ScoringWeights
provides different sets of weights that have been trained on different loop subsets. If this flag is true, the length dependent weights are used to select the final loops.
- promod3.modelling.CloseLargeDeletions(mhandle, structure_db, linker_length=8, num_fragments=500, use_scoring_extender=True, use_full_extender=True, chain_idx=None, resnum_range=None)¶
Try to close large deletions.
This is meant as a “last-resort” approach. In some cases you cannot close very large deletions simply because the two parts separated by a deletion are too far apart. The idea is to sample a linker region and always move the whole chain towards the n-terminus.
- Parameters:
mhandle (
ModellingHandle
) – Modelling handle on which to apply change.structure_db (
StructureDB
) – The database from which to extract fragments for the linker region.linker_length (
int
) – Desired length (in residues w/o stems) for the linker. This may be shorter if extender cannot extend further.num_fragments (
int
) – Number of fragments to sample the linker.use_scoring_extender (
bool
) – True = useScoringGapExtender
instead ofGapExtender
. SeeCloseSmallDeletions()
.use_full_extender (
bool
) – True = useFullGapExtender
instead ofGapExtender
. SeeCloseSmallDeletions()
.chain_idx (
int
) – If not None, only gaps from chain with given index get processedresnum_range (
tuple
containing twoint
) – If not None, only gaps within this resnum range get processed
- promod3.modelling.ModelTermini(mhandle, torsion_sampler, fragger_handles=None, mc_num_loops=20, mc_steps=5000)¶
Try to model termini with Monte Carlo sampling.
Use with care! This is an experimental feature which will increase coverage but we do not assume that the resulting termini are of high quality!
The termini are modelled by either sampling the dihedral angles or (if fragger_handles is given)
Fragger
lists. The latter is only used if the gap length is >= the length of fragments stored.from ost import io, seq from promod3 import modelling, loop # setup tpl = io.LoadPDB('data/gly.pdb') seq_trg = 'AAAAGGGGGGGGGGGGGGGGGGGGAAAAAA' seq_tpl = '----GGGGGGGGGGGGGGGGGGGG------' aln = seq.CreateAlignment(seq.CreateSequence('trg', seq_trg), seq.CreateSequence('tpl', seq_tpl)) aln.AttachView(1, tpl.CreateFullView()) mhandle = modelling.BuildRawModel(aln) # close gaps print('Number of gaps before: %d' % len(mhandle.gaps)) modelling.ModelTermini(mhandle, loop.LoadTorsionSamplerCoil()) print('Number of gaps after: %d' % len(mhandle.gaps))
- Parameters:
mhandle (
ModellingHandle
) – Modelling handle on which to apply change.torsion_sampler (
TorsionSampler
) – A sampler for torsion angles.fragger_handles (
list
ofFraggerHandle
) – Either None (no fragger sampling used) or one fragger handle for each chain in mhandle.mc_num_loops (
int
) – Number of loop candidates to consider for each terminal gap (seeFillFromMonteCarloSampler()
)mc_steps (
int
) – Number of MC steps to perform for each loop candidate (seeFillFromMonteCarloSampler()
)
- promod3.modelling.BuildSidechains(mhandle, merge_distance=4, fragment_db=None, structure_db=None, torsion_sampler=None, rotamer_library=None)¶
Build sidechains for model.
This is a wrapper for
promod3.modelling.ReconstructSidechains()
, followed by a check for ring punches. If ring punches are found it introduces gaps for the residues with punched rings and tries to fill them withFillLoopsByDatabase()
with ring_punch_detection=2.- Parameters:
mhandle (
ModellingHandle
) – Modelling handle on which to apply change.merge_distance (
int
) – Used as parameter forMergeGapsByDistance()
if ring punches are found.fragment_db (
FragDB
) – Used as parameter forFillLoopsByDatabase()
if ring punches are found. A default one is loaded if None.structure_db (
StructureDB
) – Used as parameter forFillLoopsByDatabase()
if ring punches are found. A default one is loaded if None.torsion_sampler (
TorsionSampler
) – Used as parameter forFillLoopsByDatabase()
if ring punches are found. A default one is loaded if None.rotamer_library (
RotamerLib
orBBDepRotamerLib
) – Used as parameter formodelling.ReconstructSidechains()
, a default one is loaded if None.
- promod3.modelling.MinimizeModelEnergy(mhandle, max_iterations=12, max_iter_sd=20, max_iter_lbfgs=10, use_amber_ff=False, extra_force_fields=[], tolerance_sd=1.0, tolerance_lbfgs=1.0)¶
Minimize energy of final model using molecular mechanics.
Uses
ost.mol.mm
to perform energy minimization. It will iteratively (at most max_iterations times):run up to max_iter_sd minimization iter. of a steepest descend method
run up to max_iter_lbfgs minimization iter. of a Limited-memory Broyden-Fletcher-Goldfarb-Shanno method
abort if no stereochemical problems found
The idea is that we don’t want to minimize “too much”. So, we iteratively minimize until there are no stereochemical problems and not more.
To speed things up, this can run on multiple CPU threads by setting the env. variable
PM3_OPENMM_CPU_THREADS
to the number of desired threads. If the variable is not set, 1 thread will be used by default.If the starting model is so bad that the energy is NaN or Inf from the start (happens if atoms are on top of each other or almost), the energy minimization is aborted. This issue is logged and added as a major issue to
modelling_issues
of mhandle.- Parameters:
mhandle (
ModellingHandle
) – Modelling handle on which to apply change.max_iterations (
int
) – Max. number of iterations for SD+LBFGSmax_iter_sd (
int
) – Max. number of iterations within SD methodmax_iter_lbfgs (
int
) – Max. number of iterations within LBFGS methodtolerance_sd (
float
) – Tolerance parameter passed to ApplySD ofost.mol.mm.Simulation
object.tolerance_lbfgs (
float
) – Tolerance parameter passed to ApplyLBFGS ofost.mol.mm.Simulation
object.use_amber_ff (
bool
) – if True, use the AMBER force field instead of the def. CHARMM one (seeBuildFromRawModel()
).extra_force_fields (
list
ofost.mol.mm.Forcefield
) – Additional list of force fields to use (seeBuildFromRawModel()
).
- Returns:
The model including all oxygens as used in the minimizer.
- Return type:
- promod3.modelling.CheckFinalModel(mhandle)¶
Performs samity checks on final models and reports problems.
Issues are logged and tracked in
modelling_issues
of mhandle. Major issues:Chains with less than 3 residues (usually due to bad templates).
Incomplete models (i.e. some gaps couldn’t be closed). One issue is created per unclosed gap and the stems of the gap are added to the issue.
Complete models with sequence mismatches (should never happen).
Residues with rings which have been punched by another bond.
Minor issues:
Remaining stereo-chemical problems after energy minimization. The affected residues will have the boolean property “stereo_chemical_problem_backbone” set to True, if the problem affects backbone atoms.
Residues that contain non-planar rings.
- Parameters:
mhandle (
ModellingHandle
) – Modelling handle for which to perform checks.
- class promod3.modelling.ModellingIssue(text, severity, residue_list=[])¶
- Parameters:
text – Sets
text
.severity – Sets
severity
.residue_list – Sets
residue_list
.
- residue_list¶
List of residues affected by issue (or empty list if global issue).
- Type:
list
ofResidueHandle
/ResidueView
- promod3.modelling.AddModellingIssue(mhandle, text, severity, residue_list=[])¶
Adds a new
ModellingIssue
tomodelling_issues
in mhandle.If mhandle doesn’t contain the
modelling_issues
attribute yet, it is added.- Parameters:
mhandle (
ModellingHandle
) – Will have the issue added to.text – Sets
ModellingIssue.text
.severity – Sets
ModellingIssue.severity
.residue_list – Sets
ModellingIssue.residue_list
.
Alignment Fiddling¶
- promod3.modelling.DeleteGapCols(aln)¶
Deletes alignment columns that only contain gaps.
Columns that only contain gaps (‘-’) are removed. If no such column can be identified, the input alignment gets returned. A new alignment gets constructed otherwise. The sequences of the new alignment retain name, offset and the potentially attached
ost.mol.EntityView
from the original sequences.- Parameters:
aln (
ost.seq.AlignmentHandle
) – Input alignment- Returns:
The processed alignment
- Return type:
- promod3.modelling.PullTerminalDeletions(aln, min_terminal_anchor_size=4)¶
Fixes deletions close to termini.
Some alignment tools may produce alignments with deletions close to the termini. For example:
SEQRES: A-----BCDE... ATOMSEQ: ABCDEFGHIJ...
where A is the anchor residue. The default loop modelling pipeline would keep the position of A fixed and start to omit structural information from B, C, … until it is able to resolve the deletion. If the anchor is very short, a shift typically results in a better model:
SEQRES: -----ABCDE... ATOMSEQ: ABCDEFGHIJ...
This function checks whether the gap closest to any termini is a deletion (one or several ‘-’ in the first sequence) and estimates the anchor size (number of aligned residues towards the respective termini from that gap). If the anchor size is smaller than min_terminal_anchor_size, a shift is applied and the deletion removed.
This is done iteratively, until no deletion can be removed anymore.
SEQRES: A-B--CDEF... ATOMSEQ: ABCDEFGHI...
becomes
SEQRES: ---ABCDEF... ATOMSEQ: ABCDEFGHI...
given a min_terminal_anchor_size>2. If no shift can be performed, the input alignment gets returned. A new alignment gets constructed otherwise. The sequences of the new alignment retain name, offset and the potentially attached
ost.mol.EntityView
from the original sequences.- Parameters:
aln (
ost.seq.AlignmentHandle
) – Input alignment- Returns:
The processed alignment
- Return type: