Generate ost.mol.mm systems

To simplify the creation of ost.mol.mm / OpenMM simulations for loops in proteins, we define a system creator for loops (MmSystemCreator) and a specialized forcefield lookup for amino acids (ForcefieldLookup).

The example below showcases the creation and use of an MM system:

from ost import io, geom
from promod3 import loop

# setup system creator
ff_lookup = loop.ForcefieldLookup.GetDefault()
mm_sys = loop.MmSystemCreator(ff_lookup)

# load example (has res. numbering starting at 1)
ent = io.LoadPDB("data/1CRN.pdb")
res_list = ent.residues
num_residues = len(res_list)

# get all atom positions for full protein
all_atoms = loop.AllAtomPositions(res_list)
# here full structure in res_indices but in practice this could
# be just a subset of residues relevant to the loop of interest
res_indices = list(range(num_residues))
# define two loops (indices into res_indices list)
loop_start_indices = [10, 20]
loop_lengths = [6, 4]
# define which of the res_indices is terminal
is_n_ter = [True] + [False] * (num_residues - 1)
is_c_ter = [False] * (num_residues - 1) + [True]
# get disulfid bridges
disulfid_bridges = mm_sys.GetDisulfidBridges(all_atoms, res_indices)
# setup MM system
mm_sys.SetupSystem(all_atoms, res_indices, loop_start_indices,
                   loop_lengths, is_n_ter, is_c_ter, disulfid_bridges)

# run simulation
sim = mm_sys.GetSimulation()
print("Potential energy before: %g" % sim.GetPotentialEnergy())
sim.ApplySD(0.01, 100)
print("Potential energy after: %g" % sim.GetPotentialEnergy())

# extract new loop positions and store it
mm_sys.ExtractLoopPositions(all_atoms, res_indices)
io.SavePDB(all_atoms.ToEntity(), "mm_sys_output.pdb")

Note

To simplify use when sidechains may be missing and the region of interest for the MM system has to be determined, it might be better to use the simpler promod3.modelling.SidechainReconstructor and promod3.modelling.AllAtomRelaxer classes. Even if all the sidechains are available, those classes will be helpful since the overhead to check sidechains without reconstructing them is minimal.

Create MM systems for loops

class promod3.loop.MmSystemCreator(ff_lookup, fix_surrounding_hydrogens=True, kill_electrostatics=False, nonbonded_cutoff=8, inaccurate_pot_energy=False)

Setup a system creator for a specific forcefield. The constructor only stores the settings. Most setup work is done by SetupSystem().

The idea is to have a set of movable loop residues and a set of fixed surrounding residues which interact with the loop.

Parameters:
  • ff_lookup (ForcefieldLookup) – Forcefield to use with this system creator.

  • fix_surrounding_hydrogens (bool) – If False, the hydrogens of the surrounding residues can move to improve H-bond building (True by default as it only has a minor impact on the result and a big one on performance).

  • kill_electrostatics (bool) – If True, all charges are removed from the system. This is good for H-bond building, but may be bad if residues in the surrounding are missing (gaps).

  • nonbonded_cutoff (float) – Defines cutoff to set for non bonded interactions. Recommended values: 5 if kill_electrostatics = True, 8 otherwise. Negative value means no cutoff.

  • inaccurate_pot_energy (bool) – If True, we do not set correct non-bonded interactions between fixed atoms. This leads to inaccurate pot. energies but it is faster and has no effect on simulation runs (e.g. ApplySD) otherwise.

GetDisulfidBridges(all_pos, res_indices)
Returns:

Pairs of indices (i,j), where res_indices[i] and res_indices[j] are assumed to have a disulfid bridge (CYS-SG pairs within 2.5 A).

Return type:

list of tuple with two int

Parameters:
  • all_pos (AllAtomPositions) – Provides positions for each res_indices[i].

  • res_indices (list of int) – Residue indices into all_pos.

SetupSystem(all_pos, res_indices, loop_length, is_n_ter, is_c_ter, disulfid_bridges)
SetupSystem(all_pos, res_indices, loop_start_indices, loop_lengths, is_n_ter, is_c_ter, disulfid_bridges)

Setup a new system for the loop / surrounding defined by all_pos and res_indices. Positions are taken from all_pos[res_indices[i]] and each of them must have all heavy atoms defined. Residue all_pos[i] is assumed to be peptide bound to all_pos[i-1] unless res. i is labelled as N-terminal. If all_pos[i-1] has an unset C (could happen for gaps while modelling), res. i will still be treated as if peptide-bound and the hydrogen attached to N is built assuming a helical structure as an approximation. Similarly, we do not generate OXT atoms for residues followed by a gap unless they are specifically labelled as C-terminal.

Each loop is defined by an entry in loop_start_indices and loop_lengths. For loop i_loop, res_indices[loop_start_indices[i_loop]] is the N-stem and res_indices[loop_start_indices[i_loop] + loop_lengths[i_loop] - 1] is the C-stem of the loop. The loop indices are expected to be contiguous in res_indices between the stems. The stems of the loop are kept fixed (N, CA, CB for N-stem / CA, CB, C, O for C-stem) unless they are terminal residues. Overlapping loops are merged and 0-length loops are removed.

If loop_length is given, a single loop with start index 0 and the given loop length is defined.

If a system was setup previously, it is overwritten completely.

If possible, this uses the “CPU” platform in OpenMM using the env. variable PM3_OPENMM_CPU_THREADS to define the number of desired threads (1 thread is used if variable is undefined).

Parameters:
  • all_pos (AllAtomPositions) – Provides positions for each res_indices[i].

  • res_indices (list of int) – Residue indices into all_pos.

  • loop_length (int) – Length of single loop (incl. stems).

  • loop_start_indices (list of int) – Start indices of loops.

  • loop_lengths (list of int) – Lengths of loops (incl. stems).

  • is_n_ter (list of bool) – For each res_indices[i], is_n_ter[i] defines whether that residue is N-terminal.

  • is_c_ter (list of bool) – For each res_indices[i], is_c_ter[i] defines whether that residue is C-terminal.

  • disulfid_bridges (list of tuple with two int) – Pairs of indices (i,j), where res_indices[i] and res_indices[j] form a disulfid bridge (see GetDisulfidBridges())

Raises:

RuntimeError if loops out of bounds with respect to res_indices, loop_start_indices / loop_lengths or res_indices / is_n_ter / is_c_ter have inconsistent lengths, or if any all_pos[res_indices[i]] is invalid or has unset heavy atoms.

UpdatePositions(all_pos, res_indices)

Updates the positions in the system. Even though SetupSystem() is already very fast, this can speed up resetting a simulation. The data must be consistent with the data used in the last SetupSystem() call.

Parameters:
  • all_pos (AllAtomPositions) – Provides positions for each res_indices[i].

  • res_indices (list of int) – Residue indices into all_pos.

Raises:

RuntimeError if data is inconsistent with last SetupSystem() call (same number of residues, same AA).

ExtractLoopPositions(loop_pos)
ExtractLoopPositions(out_pos, res_indices)

Extracts loop positions from the current simulation. If the simulation was run (via GetSimulation()), we internally have new positions for the residues corresponding to all_pos[res_indices[i]] passed in SetupSystem(). This function then extracts these positions back into a compatible output storage.

If loop_pos is passed, only the loops are stored. The loop residues are stored contiguously and the loops are stored in the order given by GetLoopStartIndices() / GetLoopLengths(). The loop_pos storage must have at least GetNumLoopResidues() residues and must have matching amino acid types with respect to the loop residues passed as all_pos of SetupSystem().

If out_pos and res_indices are passed, residues must have matching amino acid types for out_pos[res_indices[i]] and all_pos[res_indices[i]] of SetupSystem(). Only loop residues with i in the ranges defined by GetLoopStartIndices() / GetLoopLengths() are touched.

Parameters:
  • loop_pos (AllAtomPositions) – Reduced storage only covering loop positions.

  • out_pos (AllAtomPositions) – Storage for loop positions linked to res_indices.

  • res_indices (list of int) – Residue indices into out_pos.

Raises:

RuntimeError if data is inconsistent with last SetupSystem() call (big enough and matching AA).

GetSimulation()
Returns:

Simulation object setup by SetupSystem(). Use this to run MM simulations.

Return type:

ost.mol.mm.Simulation

GetNumResidues()
Returns:

Number of residues of current simulation setup.

Return type:

int

GetNumLoopResidues()
Returns:

Number of loop residues (incl. stems) of current simulation setup.

Return type:

int

GetLoopStartIndices()
Returns:

Start indices of loops (see SetupSystem()).

Return type:

list of int

GetLoopLengths()
Returns:

Lengths of loops (see SetupSystem()).

Return type:

list of int

GetForcefieldAminoAcids()
Returns:

Forcefield-specific amino acid type for each residue of last SetupSystem() call.

Return type:

list of ForcefieldAminoAcid

GetIndexing()

The atoms of residue i are stored from idx[i] to idx[i+1]-1, where idx is the list returned by this function. The atoms themselves are ordered according to the indexing defined by ForcefieldLookup.

Returns:

Indexing to positions vector used by the simulation object. The last item of the list contains the number of atoms in the system.

Return type:

list of int

GetCpuPlatformSupport()
Returns:

True, if we will use OpenMM’s “CPU” platform (enabled by default if platform is available). False, if we use “Reference” platform.

Return type:

bool

SetCpuPlatformSupport(cpu_platform_support)

Override “CPU” platform support setting. Useful to force the use of the OpenMM’s “Reference” platform for testing (by default we use “CPU” if it is available).

Parameters:

cpu_platform_support (bool) – True, if “CPU” platform desired (ignored if platform not available). False, if “Reference” platform desired.

Forcefield lookup for amino acids

The ForcefieldLookup class and its helpers define a fast way to extract FF specific data for amino acids in a protein. We distinguish amino acid types (and a few variants thereof) which may all be N- and/or C-terminal.

class promod3.loop.ForcefieldLookup

This class provides all functionality to generate ost.mol.mm.Simulation objects. Specifically, we can:

  • get a consistent indexing of each atom of each residue in [0, N-1], where N = GetNumAtoms() (note that only OXT indexing depends on whether a residue is terminal)

  • extract masses, charges and LJ-parameters for each atom (list of length N)

  • extract connectivities (ForcefieldConnectivity), which include all possible bonds, angles, dihedrals, impropers and LJ pairs

There is functionality to adapt the lookup and store it as needed or you can load a predefined one with GetDefault() or LoadCHARMM().

The atom indexing and GetAA() are independent of the loaded file.

static GetDefault()
Returns:

A default singleton instance (shared throughout the Python process) of this class with all data defined. Using this instance has the advantage that the object is only loaded once!

Return type:

ForcefieldLookup

static SetDefault(new_default)

Sets default singleton instance for all future GetDefault() calls.

Parameters:

new_default (ForcefieldLookup) – Lookup object to use as the new default.

static LoadCHARMM()
Returns:

Predefined lookup object extracted from a CHARMM forcefield (loaded from disk)

Return type:

ForcefieldLookup

static Load(filename)
static LoadPortable(filename)

Loads raw binary file generated with Save() (optimized for fast reading) / portable file generated with SavePortable() (slower but less machine-dependent).

Parameters:

filename (str) – Path to the file from which to load.

Returns:

The loaded lookup object

Return type:

ForcefieldLookup

Raises:

RuntimeError if file cannot be opened or if file cannot be parsed (see here for details).

Save(filename)
SavePortable(filename)

Saves a raw / portable binary representation. Use portable files for distribution and convert locally to raw files. See here for details.

Parameters:

filename (str) – Path to the file where it will be saved.

Raises:

RuntimeError if file cannot be opened.

GetAA(ff_aa)
Returns:

Amino acid type for given ff_aa

Return type:

ost.conop.AminoAcid

Parameters:

ff_aa (ForcefieldAminoAcid) – Forcefield-specific amino acid type

GetNumAtoms(ff_aa, is_nter, is_cter)
Returns:

Number of atoms for given input.

Return type:

int

Parameters:
  • ff_aa (ForcefieldAminoAcid) – Forcefield-specific amino acid type

  • is_nter (bool) – True if N-terminal variant desired

  • is_cter (bool) – True if C-terminal variant desired

GetHeavyIndex(ff_aa, atom_idx)
GetHeavyIndex(ff_aa, atom_name)
Returns:

Internal index for given heavy atom in [0, GetNumAtoms()]

Return type:

int

Parameters:
GetHydrogenIndex(ff_aa, atom_idx)
GetHydrogenIndex(ff_aa, atom_name)
Returns:

Internal index for given hydrogen atom in [0, GetNumAtoms()]

Return type:

int

Parameters:
GetOXTIndex(ff_aa, is_nter)
Returns:

Internal index of OXT atom for C-terminal residue

Return type:

int

Parameters:
  • ff_aa (ForcefieldAminoAcid) – Forcefield-specific amino acid type

  • is_nter (bool) – True if N-terminal variant desired

GetFudgeLJ()
Returns:

Dampening factor for LJ 1,4 interactions (see ost.mol.mm.Topology.SetFudgeLJ())

Return type:

float

GetFudgeQQ()
Returns:

Dampening factor for electrostatic 1,4 interactions (see ost.mol.mm.Topology.SetFudgeQQ())

Return type:

float

GetMasses(ff_aa, is_nter, is_cter)
Returns:

Mass for each atom (see ost.mol.mm.Topology.SetMasses())

Return type:

list of float (length = GetNumAtoms())

Parameters:
  • ff_aa (ForcefieldAminoAcid) – Forcefield-specific amino acid type

  • is_nter (bool) – True if N-terminal variant desired

  • is_cter (bool) – True if C-terminal variant desired

GetCharges(ff_aa, is_nter, is_cter)
Returns:

Charge for each atom (see ost.mol.mm.Topology.SetCharges())

Return type:

list of float (length = GetNumAtoms())

Parameters:
  • ff_aa (ForcefieldAminoAcid) – Forcefield-specific amino acid type

  • is_nter (bool) – True if N-terminal variant desired

  • is_cter (bool) – True if C-terminal variant desired

GetSigmas(ff_aa, is_nter, is_cter)
Returns:

Sigma in nm for each atom (see ost.mol.mm.Topology.SetSigmas())

Return type:

list of float (length = GetNumAtoms())

Parameters:
  • ff_aa (ForcefieldAminoAcid) – Forcefield-specific amino acid type

  • is_nter (bool) – True if N-terminal variant desired

  • is_cter (bool) – True if C-terminal variant desired

GetEpsilons(ff_aa, is_nter, is_cter)
Returns:

Epsilon in kJ/mol for each atom (see ost.mol.mm.Topology.SetEpsilons())

Return type:

list of float (length = GetNumAtoms())

Parameters:
  • ff_aa (ForcefieldAminoAcid) – Forcefield-specific amino acid type

  • is_nter (bool) – True if N-terminal variant desired

  • is_cter (bool) – True if C-terminal variant desired

GetInternalConnectivity(ff_aa, is_nter, is_cter)
Returns:

Internal connectivity of a residue

Return type:

ForcefieldConnectivity

Parameters:
  • ff_aa (ForcefieldAminoAcid) – Forcefield-specific amino acid type

  • is_nter (bool) – True if N-terminal variant desired

  • is_cter (bool) – True if C-terminal variant desired

GetPeptideBoundConnectivity(ff_aa_one, ff_aa_two, is_nter, is_cter)
Returns:

All connectivity which include peptide bond between two residues (additional to GetInternalConnectivity())

Return type:

ForcefieldConnectivity

Parameters:
  • ff_aa_one (ForcefieldAminoAcid) – Forcefield-specific amino acid type of first residue

  • ff_aa_two (ForcefieldAminoAcid) – Forcefield-specific amino acid type of second residue

  • is_nter (bool) – True if first residue is N-terminal

  • is_cter (bool) – True if second residue is C-terminal

GetDisulfidConnectivity()
Returns:

All connectivity which include disulfid bridge between two cysteins (additional to GetInternalConnectivity())

Return type:

ForcefieldConnectivity

SetFudgeLJ(fudge)

Set value for future GetFudgeLJ() calls to fudge.

SetFudgeQQ(fudge)

Set value for future GetFudgeQQ() calls to fudge.

SetMasses(ff_aa, is_nter, is_cter, masses)

Set value for future GetMasses() calls to masses.

SetCharges(ff_aa, is_nter, is_cter, charges)

Set value for future GetCharges() calls to charges.

SetSigmas(ff_aa, is_nter, is_cter, sigmas)

Set value for future GetSigmas() calls to sigmas.

SetEpsilons(ff_aa, is_nter, is_cter, epsilons)

Set value for future GetEpsilons() calls to epsilons.

SetInternalConnectivity(ff_aa, is_nter, is_cter, connectivity)

Set value for future GetInternalConnectivity() calls to connectivity.

SetPeptideBoundConnectivity(ff_aa_one, ff_aa_two, is_nter, is_cter, connectivity)

Set value for future GetPeptideBoundConnectivity() calls to connectivity.

SetDisulfidConnectivity(connectivity)

Set value for future GetDisulfidConnectivity() calls to connectivity.

class promod3.loop.ForcefieldAminoAcid

Enumerates the amino acid types for forcefields. The first 20 values correspond to the 20 values of ost.conop.AminoAcid. Additionally, there are values for disulfid bridges (FF_CYS2), d-protonated histidine (FF_HISD, default for ost.conop.HIS is FF_HISE) and FF_XXX for unknown types. The full list of values is:

FF_ALA, FF_ARG, FF_ASN, FF_ASP, FF_GLN, FF_GLU, FF_LYS, FF_SER, FF_CYS, FF_MET, FF_TRP, FF_TYR, FF_THR, FF_VAL, FF_ILE, FF_LEU, FF_GLY, FF_PRO FF_HISE, FF_PHE, FF_CYS2, FF_HISD, FF_XXX

class promod3.loop.ForcefieldConnectivity

Contains lists of bonds, angles, dihedrals, impropers and LJ pairs (exclusions are the combination of all bonds and 1,3 pairs of angles and are not stored separately). Each type of connectivity has it’s own class (see below) storing indices and parameters to be used for methods of ost.mol.mm.Topology. The indexing of atoms for internal connectivities is in [0, N-1], where N = ForcefieldLookup.GetNumAtoms(). For connectivities of pairs of residues, atoms of the first residue are in [0, N1-1] and atoms of the second one are in [N1, N1+N2-1], where N1 and N2 are the number of atoms of the two residues. For disulfid bridges, N1 = N2 = GetNumAtoms(FF_CYS2, False, False).

harmonic_bonds

List of harmonic bonds.

Type:

list of ForcefieldBondInfo

harmonic_angles

List of harmonic angles.

Type:

list of ForcefieldHarmonicAngleInfo

urey_bradley_angles

List of Urey-Bradley angles.

Type:

list of ForcefieldUreyBradleyAngleInfo

periodic_dihedrals

List of periodic dihedrals.

Type:

list of ForcefieldPeriodicDihedralInfo

periodic_impropers

List of periodic impropers.

Type:

list of ForcefieldPeriodicDihedralInfo

harmonic_impropers

List of harmonic impropers.

Type:

list of ForcefieldHarmonicImproperInfo

lj_pairs

List of LJ pairs.

Type:

list of ForcefieldLJPairInfo

class promod3.loop.ForcefieldBondInfo

Define harmonic bond (see ost.mol.mm.Topology.AddHarmonicBond())

index_one

Index of particle 1

Type:

int

index_two

Index of particle 2

Type:

int

bond_length

Bond length in nm

Type:

float

force_constant

Force constant in kJ/mol/nm^2

Type:

float

class promod3.loop.ForcefieldHarmonicAngleInfo

Define harmonic angle (see ost.mol.mm.Topology.AddHarmonicAngle())

index_one

Index of particle 1

Type:

int

index_two

Index of particle 2

Type:

int

index_three

Index of particle 3

Type:

int

angle

Angle in radians

Type:

float

force_constant

Force constant in kJ/mol/radian^2

Type:

float

class promod3.loop.ForcefieldUreyBradleyAngleInfo

Define Urey-Bradley angle (see ost.mol.mm.Topology.AddUreyBradleyAngle())

index_one

Index of particle 1

Type:

int

index_two

Index of particle 2

Type:

int

index_three

Index of particle 3

Type:

int

angle

Angle in radians

Type:

float

angle_force_constant

Angle force constant kJ/mol/radian^2

Type:

float

bond_length

Bond length in nm

Type:

float

bond_force_constant

Bond force constant kJ/mol/nm^2

Type:

float

class promod3.loop.ForcefieldPeriodicDihedralInfo

Define periodic dihedral or improper (see ost.mol.mm.Topology.AddPeriodicDihedral() and ost.mol.mm.Topology.AddPeriodicImproper())

index_one

Index of particle 1

Type:

int

index_two

Index of particle 2

Type:

int

index_three

Index of particle 3

Type:

int

index_four

Index of particle 4

Type:

int

multiplicity

Multiplicity

Type:

int

phase

Phase in radians

Type:

float

force_constant

Force constant in kJ/mol/radian^2

Type:

float

class promod3.loop.ForcefieldHarmonicImproperInfo

Define harmonic improper (see ost.mol.mm.Topology.AddHarmonicImproper())

index_one

Index of particle 1

Type:

int

index_two

Index of particle 2

Type:

int

index_three

Index of particle 3

Type:

int

index_four

Index of particle 4

Type:

int

angle

Angle in radians

Type:

float

force_constant

Force constant kJ/mol/radian^2

Type:

float

class promod3.loop.ForcefieldLJPairInfo

Define LJ pair (see ost.mol.mm.Topology.AddLJPair())

index_one

Index of particle 1

Type:

int

index_two

Index of particle 2

Type:

int

sigma

Sigma in nm

Type:

float

epsilon

Epsilon in kJ/mol

Type:

float

Search

Enter search terms or a module, class or function name.

Contents