Connectivity¶
Motivation¶
The connectivity of atoms is notoriously difficult to come by for biological macromolecules. PDB files, the de facto standard exchange format for structural information, allow bonds to be specified in CONECT records. However, these records are not mandatory. Many programs, especially those that do not depend on atom connectivity, do not write CONECT records. As a result, programs and structural biology frameworks can’t rely on connectivity information being present and have to derive it themselves.
Loader heuristics are great if you are the one who implemented them, but they are problematic if you are merely a user of software that contains them. As time goes on, these heuristics become buried in thousands of lines of code and are often hard, if not impossible, to trace back.
Different clients of the framework have different requirements. Visualisation software wants to read in a PDB file as is, without making any changes. A script in an automated pipeline, however, wants to either strictly reject incomplete files or fill in missing structural features. All these aspects are implemented in the conop module, separated from the loading of the PDB file, giving clients fine-grained control over the loading process. The conop logic can thus be reused in code requiring the presence of connectivity information.
The conop module defines a Processor
interface, to run connectivity
algorithms, that is to connect the atoms with bonds and perform basic clean up
of erroneous structures. The clients of the conop module can specify how the
Processor should treat unknown amino acids, missing atoms and chemically
infeasible bonds.
Processors¶
The exact behaviour of a processor is implementation-specific. So far, two classes implement the processor interface: a heuristic and a rule-based processor. The processors mainly differ in the source of their connectivity information.
The HeuristicProcessor uses a hard-coded heuristic connectivity
table for the 20 standard amino acids as well as nucleotides. For other
compounds such as ligands the HeuristicProcessor runs a distance-based
connectivity algorithm that connects two atoms if they belong to the same or
two consecutive residues, and are within a
reasonable distance
of each other.
The RuleBasedProcessor uses the compound library, a connectivity library containing all molecular components present in the PDB files on PDB.org. The library can easily be extended with custom connectivity information, if required.
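The distance-based fallback described for the HeuristicProcessor can be illustrated with a minimal, self-contained sketch. It is not the conop implementation: the `Atom` class, the `connect_by_distance` function and the 1.9 Å cutoff are hypothetical stand-ins for the entity types and for the "reasonable distance" criterion, which the text above does not quantify.

```python
import math
from dataclasses import dataclass

# Hypothetical cutoff in angstroms, for illustration only. The real
# criterion is whatever the library considers a feasible bond length.
MAX_BOND_LENGTH = 1.9


@dataclass(frozen=True)
class Atom:
    """Hypothetical stand-in for an atom of a loaded structure."""
    name: str
    res_index: int  # position of the atom's residue within its chain
    pos: tuple      # (x, y, z) coordinates in angstroms


def connect_by_distance(atoms, max_dist=MAX_BOND_LENGTH):
    """Return index pairs (i, j) with i < j of atoms that would be bonded."""
    bonds = []
    for i, a in enumerate(atoms):
        for j in range(i + 1, len(atoms)):
            b = atoms[j]
            # Only atoms in the same or in consecutive residues qualify.
            if abs(a.res_index - b.res_index) > 1:
                continue
            # Connect the pair if it lies within the distance cutoff.
            if math.dist(a.pos, b.pos) <= max_dist:
                bonds.append((i, j))
    return bonds


atoms = [
    Atom("C1", 0, (0.0, 0.0, 0.0)),
    Atom("C2", 0, (1.5, 0.0, 0.0)),
    Atom("O1", 1, (2.9, 0.0, 0.0)),
    Atom("C3", 3, (0.0, 1.5, 0.0)),  # close to C1, but three residues away
]
bonds = connect_by_distance(atoms)
```

In this toy input, C3 is close enough to C1 to pass the distance test but is skipped because the two residues are not consecutive. The quadratic pair loop keeps the sketch short; a real loader would more likely use a spatial lookup structure.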
- class Processor¶
- check_bond_feasibility¶
Whether an additional bond feasibility check is performed. Disabled by default. If turned on, atoms are only connected by bonds if they are within a reasonable distance (as defined by IsBondFeasible()).
- Type:
bool
- assign_torsions¶
Whether backbone torsions should be added to the backbone. Enabled by default. If turned on, PHI, PSI and OMEGA torsions are assigned to the peptide residues. See also AssignBackboneTorsions().
- Type:
bool
- connect¶
Whether to connect atoms by bonds. Enabled by default. Turn this off if you would like to speed up the loading process and do not require connectivity information to be present in your structures. Note though that peptide_bonds may be ignored if this is turned off.
- Type:
bool
- peptide_bonds¶
Whether to connect residues by peptide bonds. Enabled by default. This also sets the is_protein property of residues when peptide bonds are created. Turn this off if you would like to create your own peptide bonds.
- Type:
bool
- zero_occ_treatment¶
Controls how atoms with zero occupancy are treated on import. By default, this is set to warn.
- Type:
ConopAction
- connect_hetatm¶
Whether to connect atoms that are both hetatms. Enabled by default. Disabling can be useful if there are compounds which are not covered by the PDB component dictionary and you prefer to create your own connectivity for those.
- Type:
bool
- Process(ent)¶
Processes the entity ent according to the current options.
- class HeuristicProcessor(check_bond_feasibility=False, assign_torsions=True, connect=True, peptide_bonds=True, connect_hetatm=True, zero_occ_treatment=CONOP_WARN)¶
The HeuristicProcessor implements the Processor interface. Refer to its documentation for methods and accessors common to all processors.
- Parameters:
check_bond_feasibility – Sets check_bond_feasibility
assign_torsions – Sets assign_torsions
connect – Sets connect
peptide_bonds – Sets peptide_bonds
connect_hetatm – Sets connect_hetatm
zero_occ_treatment – Sets zero_occ_treatment
- class RuleBasedProcessor(compound_lib, fix_elements=True, strict_hydrogens=False, unknown_res_treatment=CONOP_WARN, unknown_atom_treatment=CONOP_WARN, check_bond_feasibility=False, assign_torsions=True, connect=True, peptide_bonds=True, connect_hetatm=True, zero_occ_treatment=CONOP_WARN)¶
The RuleBasedProcessor implements the Processor interface. Refer to its documentation for methods and accessors common to all processors.
- Parameters:
compound_lib (CompoundLib) – The compound library to use
fix_elements – Sets fix_elements
strict_hydrogens – Sets strict_hydrogens
unknown_res_treatment – Sets unk_res_treatment
unknown_atom_treatment – Sets unk_atom_treatment
check_bond_feasibility – Sets check_bond_feasibility
assign_torsions – Sets assign_torsions
connect – Sets connect
peptide_bonds – Sets peptide_bonds
connect_hetatm – Sets connect_hetatm
zero_occ_treatment – Sets zero_occ_treatment
- fix_elements¶
Whether the element of the atom should be changed to the element defined in the compound library. Enabled by default.
- Type:
bool
- strict_hydrogens¶
Whether to use strict hydrogen naming rules outlined in the compound library. Disabled by default.
- Type:
bool
- unk_atom_treatment¶
Treatment upon encountering an unknown atom. Warn by default.
- Type:
ConopAction
- unk_res_treatment¶
Treatment upon encountering an unknown residue. Warn by default.
- Type:
ConopAction
- class ConopAction¶
Defines actions to take when certain events happen during processing. Possible values: CONOP_WARN, CONOP_SILENT, CONOP_REMOVE, CONOP_REMOVE_ATOM, CONOP_REMOVE_RESIDUE, CONOP_FATAL
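How an action value might steer the handling of unknown atoms can be sketched in plain Python. This is an illustration under stated assumptions, not the library's behaviour: the `Action` enum and `filter_unknown_atoms` are hypothetical, and the semantics assigned to each value (warn and keep, keep silently, drop the atom, abort) are inferred from the names alone.

```python
import enum
import logging


class Action(enum.Enum):
    """Hypothetical stand-ins named after four of the ConopAction values."""
    WARN = "CONOP_WARN"
    SILENT = "CONOP_SILENT"
    REMOVE_ATOM = "CONOP_REMOVE_ATOM"
    FATAL = "CONOP_FATAL"


def filter_unknown_atoms(atom_names, known_names, action):
    """Return the atom names that survive under the given action.

    Assumed semantics: WARN logs and keeps the atom, SILENT keeps it
    without logging, REMOVE_ATOM drops it, FATAL raises an error.
    """
    kept = []
    for name in atom_names:
        if name in known_names:
            kept.append(name)
            continue
        if action is Action.FATAL:
            raise ValueError(f"unknown atom {name!r}")
        if action is Action.REMOVE_ATOM:
            continue  # drop the unknown atom
        if action is Action.WARN:
            logging.warning("unknown atom %r", name)
        kept.append(name)  # WARN and SILENT both keep the atom
    return kept


known = {"N", "CA", "C", "O"}
observed = ["N", "CA", "C", "O", "XX"]
cleaned = filter_unknown_atoms(observed, known, Action.REMOVE_ATOM)
```

Passing the policy as a value, rather than hard-coding it in the loader, is what lets a viewer keep a file as is while a pipeline script rejects or cleans the same input.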