ProMod3 Actions
ProMod3 provides a pure command line interface through actions.
Run pm help for a list of available actions. For any action, run pm <ACTION> -h to get a description of its usage.
Here we list the most prominent actions with simple examples.
Building models
You can run a full protein homology modelling pipeline from the command line with
$ pm build-model [-h] (-f <FILE> | -c <FILE> | -j <OBJECT>|<FILE>)
(-p <FILE> | -e <FILE>) [-s <FILE>] [-o <FILENAME>]
[-r] [-t]
Example usage:
$ pm build-model -f aln.fasta -p tpl.pdb
This reads a target-template alignment from aln.fasta and a matching structure from tpl.pdb and produces a gap-less model which is stored as model.pdb. The output filename can be controlled with the -o flag.
Target-template alignments can be provided in FASTA (-f), CLUSTAL (-c) or as JSON files/objects (-j). Files can be plain or gzipped.
At least one alignment must be given and you cannot mix file formats.
Multiple alignment files can be given and target chains will be appended in the given order. The chains of the target model are named with default chain names (A, B, C, …, see BuildRawModel()).
Notes on the input formats:
Leading/trailing whitespace in sequence names is always removed.
FASTA input example:
>target
HGFHVHEFGDNTNGCMSSGPHFNPYGKEHGAPVDENRHLG
>2jlp-1.A|55
RAIHVHQFGDLSQGCESTGPHYNPLAVPH------PQHPG
The target sequence is the one named “trg” or “target”; if no sequence has such a name, the first sequence is used. Template sequence names can encode an identifier for the chain to attach to it and optionally an offset (here: 55, see below for details). Leading whitespace in FASTA headers is removed.
CLUSTAL input follows the same logic as FASTA input
JSON input: filenames are not allowed to start with ‘{’. JSON objects contain an entry with key ‘alignmentlist’. That in turn is an array of objects with keys ‘target’ and ‘template’. Those in turn are objects with keys ‘name’ (string id. for sequence), ‘seqres’ (string for aligned sequence) and optionally for templates ‘offset’ (number of residues to skip in structure file attached to it). Example:
{
  "alignmentlist": [
    {
      "target": {
        "name": "mytrg",
        "seqres": "HGFHVHEFGDNTNGCMSSGPHFNPYGKEHGAPVDENRHLG"
      },
      "template": {
        "name": "2jlp-1.A",
        "offset": 55,
        "seqres": "RAIHVHQFGDLSQGCESTGPHYNPLAVPH------PQHPG"
      }
    }
  ]
}
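The FASTA and JSON examples above carry the same kind of information, so one can be derived from the other. The following Python sketch illustrates this under stated assumptions: the helper names parse_fasta and fasta_to_alignment_json are hypothetical and not part of ProMod3, the input holds exactly one target and one template, and the template name is split at the last '|' to obtain the offset. It illustrates the rules listed above and is not ProMod3's own parser.

```python
import json


def parse_fasta(text):
    """Return a list of (name, sequence) tuples from FASTA-formatted text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: strip surrounding whitespace from the name.
            records.append([line[1:].strip(), ""])
        elif records:
            # Sequence line: append to the most recent record.
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def fasta_to_alignment_json(text):
    """Build an 'alignmentlist' object for one target-template pair."""
    records = parse_fasta(text)
    if len(records) != 2:
        raise ValueError("expected exactly one target and one template")
    # The target is the sequence named "trg" or "target", else the first one.
    target_idx = 0
    for i, (name, _) in enumerate(records):
        if name in ("trg", "target"):
            target_idx = i
            break
    trg_name, trg_seq = records[target_idx]
    tpl_name, tpl_seq = records[1 - target_idx]
    template = {"name": tpl_name, "seqres": tpl_seq}
    if "|" in tpl_name:
        # Assumption: "<CHAINID>|<OFFSET>" is split at the last '|'.
        chain_id, offset = tpl_name.rsplit("|", 1)
        template["name"] = chain_id.strip()
        template["offset"] = int(offset)
    target = {"name": trg_name, "seqres": trg_seq}
    return {"alignmentlist": [{"target": target, "template": template}]}


FASTA_EXAMPLE = """\
>target
HGFHVHEFGDNTNGCMSSGPHFNPYGKEHGAPVDENRHLG
>2jlp-1.A|55
RAIHVHQFGDLSQGCESTGPHYNPLAVPH------PQHPG
"""

if __name__ == "__main__":
    print(json.dumps(fasta_to_alignment_json(FASTA_EXAMPLE), indent=2))
```

The printed object could be written to a file and passed with -j. Note that the sketch keeps the target name from the FASTA header ("target"), whereas the JSON example above uses "mytrg".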
Structures can be provided in PDB (-p) or in any format readable by the ost.io.LoadEntity() method (-e). In the latter case, the format is chosen by file ending. Recognized file extensions: .ent, .pdb, .ent.gz, .pdb.gz, .cif, .cif.gz. At least one structure must be given and you cannot mix file formats. Multiple structures can be given and each structure may have multiple chains, but care must be taken to identify which chain to attach to which template sequence. Chains for each sequence are identified based on the sequence name of the templates in the alignments. Valid sequence names are:
anything, if only one structure with one chain
“<FILE>.<CHAIN>”, where <FILE> is the base file name of an imported structure with no extensions and <CHAIN> is the identifier of the chain in the imported structure.
“<FILE>” if only one chain in file
“<CHAIN>” if only one file imported
“<CHAINID>|<OFFSET>”, where <CHAINID> identifies the chain as above and <OFFSET> is the number of residues to skip for that chain to reach the first residue in the aligned sequence. Leading/trailing whitespaces of <CHAINID> and <OFFSET> are ignored.
Example: ... -p data/2jlp.pdb.gz, where the pdb file has chains A, B, C and the template sequence is named 2jlp.A|55.
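The naming rules above can be sketched as a small lookup in Python. This is an illustration only, not ProMod3's implementation: the functions base_name, split_offset and resolve_chain are hypothetical, the order in which the rules are tried is an assumption, and base_name assumes that everything after the first dot of the file name counts as extension.

```python
import os


def base_name(path):
    """Strip directories and extensions: 'data/2jlp.pdb.gz' -> '2jlp'."""
    # Assumption: everything after the first dot is treated as extension.
    return os.path.basename(path).split(".", 1)[0]


def split_offset(seq_name):
    """Split '<CHAINID>|<OFFSET>' into (chain_id, offset); offset defaults to 0."""
    if "|" in seq_name:
        chain_id, offset = seq_name.rsplit("|", 1)
        return chain_id.strip(), int(offset)
    return seq_name.strip(), 0


def resolve_chain(seq_name, structures):
    """Return (file_base, chain, offset) for a template sequence name.

    structures maps base file names to the list of chain names they contain.
    """
    chain_id, offset = split_offset(seq_name)
    all_chains = [(f, c) for f, chains in structures.items() for c in chains]
    # Only one structure with one chain: any name is accepted.
    if len(all_chains) == 1:
        return all_chains[0] + (offset,)
    # "<FILE>.<CHAIN>"
    for f, c in all_chains:
        if chain_id == f"{f}.{c}":
            return f, c, offset
    # "<FILE>" if that file contains only one chain
    if chain_id in structures and len(structures[chain_id]) == 1:
        return chain_id, structures[chain_id][0], offset
    # "<CHAIN>" if only one file was imported
    if len(structures) == 1:
        (f, chains), = structures.items()
        if chain_id in chains:
            return f, chain_id, offset
    raise ValueError(f"cannot resolve template sequence name {seq_name!r}")


if __name__ == "__main__":
    structures = {base_name("data/2jlp.pdb.gz"): ["A", "B", "C"]}
    print(resolve_chain("2jlp.A|55", structures))
```

For the example above, the name 2jlp.A|55 resolves to chain A of 2jlp with the first 55 residues skipped.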
You can optionally specify sequence profiles to be added (-s) and linked to the corresponding target sequences. This has an impact on loop scoring with the database approach.
The profiles can be provided as plain files or gzipped. The following file extensions are understood: .hhm, .hhm.gz, .pssm, .pssm.gz.
Consider using ost.bindings.hhblits.HHblits.A3MToProfile() if you have a file in a3m format at hand.
Profiles are mapped to target chains by exact match against the gapless target sequences from the provided alignment files, i.e. one profile is mapped to several chains in the case of homo-oligomers.
Every profile must have a unique sequence to avoid ambiguities.
All or nothing: you cannot provide profiles for only a subset of target sequences.
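The three profile rules above can be expressed in a few lines of Python. This is a minimal sketch on plain strings, assuming profiles are represented only by their sequences; the function map_profiles is hypothetical and does not reflect how ProMod3 stores or loads profiles.

```python
def map_profiles(target_seqs, profile_seqs):
    """Return, for each target chain, the index of its matching profile.

    target_seqs: aligned target sequences (may contain '-' gaps), one per chain.
    profile_seqs: sequences of the provided profiles.
    """
    # Every profile must have a unique sequence to avoid ambiguities.
    if len(set(profile_seqs)) != len(profile_seqs):
        raise ValueError("profile sequences must be unique")
    index = {seq: i for i, seq in enumerate(profile_seqs)}
    mapping = []
    for aligned in target_seqs:
        # Matching is done on the gapless target sequence.
        gapless = aligned.replace("-", "")
        # All or nothing: each target needs an exactly matching profile.
        if gapless not in index:
            raise ValueError(f"no profile matches target sequence {gapless}")
        mapping.append(index[gapless])
    return mapping


if __name__ == "__main__":
    # Homo-dimer: both chains share the single profile at index 0.
    print(map_profiles(["AC-DE", "ACDE"], ["ACDE"]))
```

In the homo-dimer case, both chains map to the same profile, which matches the homo-oligomer behaviour described above.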
Example usage:
$ pm build-model -f aln.fasta -p tpl.pdb -s prof.hhm
Where Monte Carlo sampling is used, a fast torsion-angle-based sampling is performed. You can enforce the usage of structural fragments with -r, but this increases runtime because the required fragments must be searched for. The corresponding promod3.modelling.FraggerHandle objects are set up in the PM3ArgumentParser class, as described in detail in its documentation.
The default modelling pipeline in ProMod3 is optimized to generate a gap-free model of the region in the target sequence(s) that is covered with template information. Terminal extensions without template coverage are neglected. You can enforce a model of the full target sequence(s) by adding -t. The terminal parts will be modelled with a crude Monte Carlo approach. Be aware that the accuracy of those termini is likely to be limited. Termini of length 1 won’t be modelled.
Possible exit codes of the action:
0: all went well
1: an unhandled exception was raised
2: arguments cannot be parsed or required arguments are missing
3: failed to perform modelling (internal error)
4: failed to write results to file
other non-zero: failure in argument checking (see promod3.core.pm3argparse.PM3ArgumentParser)
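When the action is called from a script, the exit codes listed above can be turned into readable messages. The following Python sketch assumes that the pm executable is on the PATH; the names EXIT_CODES, describe_exit_code and run_build_model are hypothetical helpers and not part of ProMod3.

```python
import subprocess

# Exit codes as documented for the build-model action.
EXIT_CODES = {
    0: "all went well",
    1: "an unhandled exception was raised",
    2: "arguments cannot be parsed or required arguments are missing",
    3: "failed to perform modelling (internal error)",
    4: "failed to write results to file",
}


def describe_exit_code(code):
    """Translate an exit code of 'pm build-model' into a short message."""
    if code < 0:
        # subprocess reports termination by a signal as a negative value.
        return "terminated by signal"
    return EXIT_CODES.get(code, "failure in argument checking")


def run_build_model(aln_fasta, tpl_pdb, out_pdb="model.pdb"):
    """Run 'pm build-model' and return (exit_code, description).

    Assumption: 'pm' is installed and available on the PATH.
    """
    cmd = ["pm", "build-model", "-f", aln_fasta, "-p", tpl_pdb, "-o", out_pdb]
    result = subprocess.run(cmd)
    return result.returncode, describe_exit_code(result.returncode)
```

A call such as run_build_model("aln.fasta", "tpl.pdb") would then return the exit code together with its description.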
Sidechain Modelling
You can (re-)construct the sidechains in a model from the command line.
$ pm build-sidechains [-h] (-p <FILE> | -e <FILE>) [-o <FILENAME>] [-k] [-n]
[-r] [-i] [-s]
Example usage:
$ pm build-sidechains -p input.pdb
This reads a structure stored in input.pdb, strips all sidechains, detects and models disulfide bonds and reconstructs all sidechains with the flexible rotamer model. The result is stored as out.pdb.
The output filename can be controlled with the -o flag.
A structure can be provided in PDB (-p) or in any format readable by the ost.io.LoadEntity() method (-e). In the latter case, the format is chosen by file ending. Recognized file extensions: .ent, .pdb, .ent.gz, .pdb.gz, .cif, .cif.gz.
Several flags control the modelling behaviour:
- -k, --keep-sidechains
Keep existing sidechains.
- -n, --no-disulfids
Do not build disulfide bonds before sidechain optimization.
- -r, --rigid-rotamers
Do not use rotamers with subrotamers.
- -i, --backbone-independent
Use the backbone independent rotamer library (from promod3.sidechain.LoadLib()) instead of the default backbone dependent one (from promod3.sidechain.LoadBBDepLib()).
- -s, --no-subrotamer-optimization
Do not perform subrotamer optimization if the flexible rotamer model is used.
- -f, --energy_function
The energy function to be used. Default is SCWRL4; can be any function supported by promod3.modelling.ReconstructSidechains().
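For scripted use, the flags above can be assembled into an argument list in Python. This is a sketch under assumptions: build_sidechains_cmd is a hypothetical helper, the choice of -p for .pdb and .pdb.gz files and -e for everything else is a convention of this example, and the value given to -f is passed through unchecked.

```python
def build_sidechains_cmd(structure, output=None, keep_sidechains=False,
                         no_disulfids=False, rigid_rotamers=False,
                         backbone_independent=False,
                         no_subrotamer_optimization=False,
                         energy_function=None):
    """Assemble the argument list for 'pm build-sidechains'."""
    # Assumption of this sketch: use -p for PDB files and -e otherwise.
    is_pdb = structure.endswith((".pdb", ".pdb.gz"))
    cmd = ["pm", "build-sidechains", "-p" if is_pdb else "-e", structure]
    if output is not None:
        cmd += ["-o", output]
    # Boolean switches, in the order they are documented.
    switches = [
        ("-k", keep_sidechains),
        ("-n", no_disulfids),
        ("-r", rigid_rotamers),
        ("-i", backbone_independent),
        ("-s", no_subrotamer_optimization),
    ]
    cmd += [flag for flag, enabled in switches if enabled]
    if energy_function is not None:
        cmd += ["-f", energy_function]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_sidechains_cmd("input.pdb", output="out.pdb",
                                        keep_sidechains=True)))
```

The resulting list can be handed to subprocess.run() as is, which avoids quoting problems with unusual file names.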