This document is for OpenStructure version 1.1; the latest version is 2.8.

conop – Connectivity and Topology of Molecules

The main task of the conop module is to connect atoms with bonds. While the bond class itself is part of the base module, the conop module is responsible for setting up the correct bonds between atoms.

Motivation

Traditionally, the connectivity between atoms has not been reliably described in PDB files. Different programs have adopted various ways of finding out whether two atoms are connected. One way is to rely on proper naming of the atoms. For example, the backbone atoms of the standard amino acids are named N, CA, C and O, and if atoms with these names appear in the same residue they are shown as connected. Another way is to apply additional heuristics to find out whether a peptide bond is formed between two consecutive residues. Breaks in the backbone are indicated, e.g., by introducing a discontinuity in the residue numbering.

Loader heuristics are great if you are the one who implemented them, but they are problematic if you are just a user of software that relies on them. As time goes on, these heuristics become buried in thousands of lines of code and are often hard, if not impossible, to trace back.

Different clients of the framework have different requirements. Visualisation software wants to read in a PDB file as is, without making any changes. A script in an automated pipeline, however, wants to either strictly reject files that are incomplete or fill in missing structural features. All these aspects are implemented in the conop module, separated from the loading of the PDB file, giving clients fine-grained control over the loading process.

The Builder interface

The conop module defines a Builder interface to run connectivity algorithms, that is, to connect the atoms with bonds and perform basic clean-up of erroneous structures. Clients of the conop module can specify how the Builder should treat unknown amino acids, missing atoms and chemically infeasible bonds.

The exact behaviour of a builder is implementation-specific. So far, two classes implement the Builder interface: a heuristic and a rule-based builder. The builders mainly differ in the source of their connectivity information. The HeuristicBuilder uses a hard-coded heuristic connectivity table for the 20 standard amino acids as well as nucleotides. For other compounds such as ligands, the HeuristicBuilder runs a distance-based connectivity algorithm that connects two atoms if they are closer than a certain threshold. The RuleBasedBuilder uses a connectivity library containing all molecular components present in the PDB files on PDB.org. The library can easily be extended with custom connectivity information, if required.

By default, the heuristic builder is used; however, the builder may be switched by setting the RuleBasedBuilder as the default. To do so, one first has to create a new instance of a RuleBasedBuilder and register it in the builder registry of the conop module. In Python, this can be achieved with:

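from ost import conop
# Load the compound library; '...' stands for the path to the library file
compound_lib=conop.CompoundLib.Load('...')
# Create a rule-based builder and register it under the name 'rbb'
rbb=conop.RuleBasedBuilder(compound_lib)
conop.Conopology.Instance().RegisterBuilder(rbb,'rbb')
# Make the newly registered builder the default
conop.Conopology.Instance().SetDefaultBuilder('rbb')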

All subsequent calls to ost.io.LoadEntity() will make use of the RuleBasedBuilder instead of the heuristic builder. See the section "Convert MM CIF dictionary" below for more information on how to create the necessary files to use the rule-based builder.

class ost.conop.Builder
CompleteAtoms(residue)

Add any missing atoms to the residue based on its key, with coordinates set to zero.

Parameters: residue (mol.ResidueHandle) – must be a valid residue
CheckResidueCompleteness(residue)

Verify that the given residue has all the atoms it is supposed to have based on its key.

Parameters: residue (mol.ResidueHandle) – must be a valid residue
IsResidueComplete(residue)

Check whether the residue has all atoms it is supposed to have. Hydrogen atoms are not required for a residue to be complete.

Parameters: residue (mol.ResidueHandle) – must be a valid residue
IdentifyResidue(residue)

Attempt to identify the residue based on its atoms and return a suggestion for the proper residue key.

Parameters: residue (mol.ResidueHandle) – must be a valid residue
ConnectAtomsOfResidue(residue)

Connects the atoms of residue based on residue and atom names. This method does not establish inter-residue bonds. To connect atoms that belong to different residues, use ConnectResidueToPrev() or ConnectResidueToNext().

Parameters: residue (mol.ResidueHandle) – must be a valid residue
ConnectResidueToPrev(residue, prev)

Connects the atoms of residue to the previous residue, prev. The order of the parameters is important. In the case of a polypeptide chain, the residues are assumed to be ordered from N- to C-terminus.

Parameters:
  • residue (mol.ResidueHandle) – must be a valid residue
  • prev (mol.ResidueHandle) – valid or invalid residue
DoesPeptideBondExist(n, c)

Check whether a peptide bond should be formed between the n and c atoms. This method is called by ConnectResidueToNext() after making sure that both residues participating in the peptide bond are peptide-linking components.

By default, IsBondFeasible() is used to check whether the two atoms form a peptide bond.

Parameters:
  • n (mol.AtomHandle) – backbone nitrogen atom (IUPAC name N). Must be valid.
  • c (mol.AtomHandle) – backbone C-atom (IUPAC name C). Must be valid.
IsBondFeasible(atom_a, atom_b)

Overloadable hook to check whether a bond between two atoms is feasible. The default implementation uses a distance-based check to decide whether the two atoms should be connected: the atoms are connected if their distance is in the range of 0.8 to 1.2 times their van der Waals radius.

Parameters:
  • atom_a – a valid atom
  • atom_b – a valid atom
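The distance window described for IsBondFeasible(), and the way DoesPeptideBondExist() delegates to it by default, can be sketched in plain Python without OpenStructure. This is one possible reading, not the library's implementation: it assumes the window is applied to the sum of the two atoms' radii, and the function names, the tuple-based coordinates and the radii in the usage lines are hypothetical placeholders.

```python
import math

def is_bond_feasible(pos_a, pos_b, radius_a, radius_b, lower=0.8, upper=1.2):
    """Return True if the inter-atomic distance lies within
    [lower, upper] times the sum of the two radii (assumed reading)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(pos_a, pos_b)))
    ref = radius_a + radius_b
    return lower * ref <= dist <= upper * ref

def does_peptide_bond_exist(n_pos, c_pos, n_radius, c_radius):
    """Mirror the documented default: delegate to the feasibility check."""
    return is_bond_feasible(n_pos, c_pos, n_radius, c_radius)

# Placeholder radii in Angstrom, chosen for illustration only;
# they are not taken from OpenStructure.
N_RADIUS, C_RADIUS = 0.65, 0.70

# A C-N pair at a typical peptide bond distance, and one across a chain break
bonded = does_peptide_bond_exist((0.0, 0.0, 0.0), (1.33, 0.0, 0.0),
                                 N_RADIUS, C_RADIUS)
broken = does_peptide_bond_exist((0.0, 0.0, 0.0), (3.80, 0.0, 0.0),
                                 N_RADIUS, C_RADIUS)
print(bonded, broken)  # True False
```

Keeping the window bounds as keyword arguments makes it easy to see how a subclass that overrides the hook could tighten or relax the check.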
GuessAtomElement(atom_name, hetatm)

Guess the element of an atom based on its name and hetatm flag.

Parameters:
  • atom_name (string) – IUPAC atom name, e.g. CA, CB or N.
  • hetatm (bool) – Whether the atom is a hetatm or not
AssignBackboneTorsionsToResidue(residue)

For peptide-linking residues, assigns the phi, psi and omega torsions to the amino acid.

Parameters: residue (mol.ResidueHandle) – must be a valid residue
GuessChemClass(residue)

Guesses the chemical class of the residue based on its atoms and connectivity.

So far, the method only guesses whether the residue is a peptide. A residue is a peptide if all the backbone atoms N, CA, C and O are present, have the right element and are in a suitable orientation to form bonds.
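As a rough illustration of the peptide check described for GuessChemClass(), the following sketch tests only the first two criteria: the presence of the backbone atoms and their elements. The geometric criterion (a suitable orientation to form bonds) is deliberately left out. The dictionary-based residue representation and the function name are assumptions made for this example and are not part of the conop API.

```python
# Expected element symbol for each backbone atom name
BACKBONE_ELEMENTS = {"N": "N", "CA": "C", "C": "C", "O": "O"}

def looks_like_peptide(atoms):
    """atoms maps atom name -> element symbol for a single residue.

    Returns True if N, CA, C and O are all present with the expected
    element. No geometry is checked in this sketch.
    """
    return all(atoms.get(name) == element
               for name, element in BACKBONE_ELEMENTS.items())

glycine = {"N": "N", "CA": "C", "C": "C", "O": "O"}
water = {"O": "O"}
calcium_ion = {"CA": "CA"}  # atom named CA, but the element is calcium

print(looks_like_peptide(glycine))      # True
print(looks_like_peptide(water))        # False
print(looks_like_peptide(calcium_ion))  # False
```

The calcium case shows why the element has to be checked in addition to the name: an atom called CA is not necessarily an alpha carbon.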

class ost.conop.RuleBasedBuilder

The RuleBasedBuilder implements the Builder interface. Refer to the documentation of Builder for a basic description of the methods.

CheckResidueCompleteness(residue)

Verifies the completeness of the residue using the description of the chemical compound. The method distinguishes between required atoms and atoms that are optional, such as OXT, which is only present if no peptide bond is formed. Whenever an unknown atom is encountered, OnUnknownAtom() is invoked. Subclasses of the RuleBasedBuilder may implement additional logic to deal with unknown atoms. Likewise, whenever a required atom is missing, OnMissingAtom() is invoked. Hydrogen atoms are not considered required by default.

Parameters: residue (mol.ResidueHandle) – must be a valid residue
IdentifyResidue(residue)

Looks up the residue in the database of chemical compounds and returns the name of the residue, or “UNK” if the residue has not been found in the library.

Parameters: residue (mol.ResidueHandle) – must be a valid residue
OnUnknownAtom(atom)

Invoked whenever an unknown atom has been encountered during a residue completeness check.

The default implementation guesses the atom properties based on the name and returns false, meaning that it should be treated as an unknown atom.

Custom implementations of this method may delete the atom, or modify it.

Parameters: atom (mol.AtomHandle) – the unknown atom
OnMissingAtom(atom)

Invoked whenever an atom is missing. It is up to the overloaded method to deal with the missing atom, either by ignoring it or by inserting a dummy atom.

Parameters: atom (string) – the missing atom’s name
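The interplay between the completeness check and the two hooks can be illustrated with a small stand-alone mock. The class below is not the RuleBasedBuilder: it is a hypothetical stand-in that receives the required and optional atom names as plain collections instead of reading them from a compound library, treats a residue as a list of atom names, and assumes that hydrogens can be recognised by a leading H in the name.

```python
class SketchBuilder(object):
    """Hypothetical stand-in showing how the hooks are driven."""

    def __init__(self, required, optional=()):
        self.required = set(required)
        self.optional = set(optional)

    def check_residue_completeness(self, atom_names):
        known = self.required | self.optional
        present = set(atom_names)
        for name in atom_names:
            # Assumption: hydrogens start with 'H' and are skipped
            if name not in known and not name.startswith("H"):
                self.on_unknown_atom(name)
        for name in sorted(self.required - present):
            self.on_missing_atom(name)

    def on_unknown_atom(self, name):
        # Default: leave the atom alone and report it as unknown
        return False

    def on_missing_atom(self, name):
        # Default: ignore the missing atom
        pass


class RecordingBuilder(SketchBuilder):
    """Subclass that collects the problems instead of ignoring them."""

    def __init__(self, required, optional=()):
        SketchBuilder.__init__(self, required, optional)
        self.unknown, self.missing = [], []

    def on_unknown_atom(self, name):
        self.unknown.append(name)
        return False

    def on_missing_atom(self, name):
        self.missing.append(name)


# Glycine heavy atoms; OXT is optional because it only occurs
# when no peptide bond is formed.
builder = RecordingBuilder(required=["N", "CA", "C", "O"], optional=["OXT"])
builder.check_residue_completeness(["N", "CA", "O", "HA2", "XX"])
print(builder.missing)  # ['C']
print(builder.unknown)  # ['XX']
```

A pipeline that must reject incomplete files could raise an exception in on_missing_atom() instead of recording the name, while a viewer could keep the permissive defaults.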

Connecting atoms

A single function call to ConnectAll() is sufficient to assign residue and atom properties as well as to connect atoms with bonds.

from ost import conop
# Suppose that BuildRawModel is a function that returns a protein structure
# with no atom properties assigned and no bonds formed.
ent=BuildRawModel(...)
print(ent.bonds)  # prints an empty list
# Call ConnectAll() to assign properties/connect atoms
conop.ConnectAll(ent)
print(ent.bonds)  # prints a list containing many bonds

For fine-grained control, the Builder interface may be used directly.
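As a sketch of what direct use of a builder might look like, the helper below walks over the residues of each chain and calls Builder methods documented above. It relies only on method names listed in this document. The helper name connect_chains, the LoggingBuilder stand-in and the convention of passing chains as lists of residues are illustrative assumptions; how residues are obtained from an entity, and how an invalid handle is represented, depends on the mol API and is left to the caller.

```python
def connect_chains(builder, chains):
    """Connect atoms within and between consecutive residues.

    builder: object implementing the Builder interface
    chains:  list of residue lists, each ordered from N- to C-terminus
    """
    for residues in chains:
        prev = None
        for residue in residues:
            # Intra-residue bonds, based on residue and atom names
            builder.ConnectAtomsOfResidue(residue)
            # Inter-residue bond to the preceding residue, if there is one
            if prev is not None:
                builder.ConnectResidueToPrev(residue, prev)
            prev = residue
        # Design choice of this sketch: assign torsions only after
        # all bonds of the chain have been created
        for residue in residues:
            builder.AssignBackboneTorsionsToResidue(residue)


class LoggingBuilder(object):
    """Stand-in that records calls; replace with a real builder."""

    def __init__(self):
        self.calls = []

    def ConnectAtomsOfResidue(self, residue):
        self.calls.append(("intra", residue))

    def ConnectResidueToPrev(self, residue, prev):
        self.calls.append(("prev", residue, prev))

    def AssignBackboneTorsionsToResidue(self, residue):
        self.calls.append(("torsions", residue))


# Strings stand in for residue handles in this example
log = LoggingBuilder()
connect_chains(log, [["GLY1", "ALA2"]])
for call in log.calls:
    print(call)
```

Resetting prev at the start of every chain ensures that the last residue of one chain is never linked to the first residue of the next.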

Convert MM CIF dictionary

The CompoundLib may be created from a MM CIF dictionary. The latest dictionary can be found on the wwPDB site.

After downloading the dictionary in MM CIF format, use chemdict_tool to convert it into our internal format.

chemdict_tool create <components.cif> <compounds.chemlib>

If you are working with CHARMM trajectory files, you will also have to add the definitions for CHARMM. Assuming you are in the top-level source directory of OpenStructure, this can be achieved by:

chemdict_tool update modules/conop/data/charmm.cif <compounds.chemlib> charmm

