This document is for OpenStructure version 2.2, the latest version is 2.9.0 !

mmCIF File Format

The mmCIF file format is a container for structural entities provided by the PDB. Here we describe how to load those files and how to deal with the information that goes beyond the legacy PDB format (MMCifInfo, MMCifInfoCitation, MMCifInfoTransOp, MMCifInfoBioUnit, MMCifInfoStructDetails, MMCifInfoObsolete, MMCifInfoStructRef, MMCifInfoStructRefSeq, MMCifInfoStructRefSeqDif, MMCifInfoRevisions, MMCifInfoEntityBranchLink).

Loading mmCIF Files

LoadMMCIF(filename, fault_tolerant=None, calpha_only=None, profile='DEFAULT', remote=False, seqres=False, info=False)

Load a mmCIF file and return one or more entities. Several options allow you to customize the exact behaviour of the mmCIF import. For more information on these options, see IO Profiles for entity importer.

Residues are flagged as ligand if they are mentioned in a HET record.

Parameters:
  • fault_tolerant – Enable/disable fault-tolerant import. If set, overrides the value of IOProfile.fault_tolerant.

  • remote – If set to True, the method tries to load the entry from the remote PDB repository www.pdb.org. The filename is then interpreted as the PDB ID.

  • seqres – Whether to read SEQRES records. If True, a SequenceList object is returned as the second item. The sequences in the list are named according to the mmCIF chain name. This feature requires a default compound library to be defined and accessible via GetDefaultLib(); otherwise an empty list is returned.

  • info – Whether to return an info container with the other output. If True, a MMCifInfo object is returned as the last item.

Return type:

EntityHandle (or tuple if seqres or info are True).

Raises:

IOException if the import fails due to an erroneous or non-existent file.
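
The shape of the return value depends on the seqres and info flags. The following is a minimal sketch of how the result might be unpacked. The helper names unpack_load_result and load_with_annotation are hypothetical, and the OpenStructure import is deferred so that the sketch also runs where ost is not installed.

```python
def unpack_load_result(result, seqres=False, info=False):
    """Normalise the return value of LoadMMCIF to (entity, seqres, info).

    Hypothetical helper: items that were not requested come back as None.
    """
    if not (seqres or info):
        # Without extra flags a bare EntityHandle is returned.
        return result, None, None
    items = list(result)
    entity = items.pop(0)
    # The SequenceList is documented as the second item, MMCifInfo as the last.
    seqres_list = items.pop(0) if seqres else None
    info_obj = items.pop() if info else None
    return entity, seqres_list, info_obj


def load_with_annotation(path):
    """Load an mmCIF file with sequences and info (needs OpenStructure)."""
    from ost import io  # imported lazily, only when this function is called
    result = io.LoadMMCIF(path, seqres=True, info=True)
    return unpack_load_result(result, seqres=True, info=True)


# Plain tuples stand in for real return values to show the unpacking.
print(unpack_load_result(("ent", "seqs", "info"), seqres=True, info=True))
print(unpack_load_result(("ent", "info"), info=True))
```

Normalising to a fixed three-item tuple keeps calling code independent of which flags were set.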

Categories Available

The following categories of a mmCIF file are considered by the reader:

Notes:

  • Structures in mmCIF format can have two chain names. The “new” chain name extracted from atom_site.label_asym_id is used to name the chains in the EntityHandle. The “old” (author provided) chain name is extracted from atom_site.auth_asym_id for the first atom of the chain. It is added as string property named “pdb_auth_chain_name” to the ChainHandle. The mapping is also stored in MMCifInfo as GetMMCifPDBChainTr() and GetPDBMMCifChainTr() if SEQRES records are read in LoadMMCIF() and a non-empty SEQRES record exists for that chain (this should exclude ligands and water).

  • Molecular entities in mmCIF are identified by an entity.id. Each chain is mapped to an ID in MMCifInfo as GetMMCifEntityIdTr().
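
As an illustration of the first note, the author chain names could be collected from the string property like this. This is a sketch only: it assumes the generic property accessors HasProp() and GetStringProp() on chain handles, and the _FakeChain and _FakeEntity classes are stand-ins so the example runs without OpenStructure.

```python
def auth_chain_names(ent):
    """Map mmCIF chain names to author chain names for an entity.

    Reads the "pdb_auth_chain_name" string property described above.
    Chains without the property are mapped to an empty string.
    """
    mapping = {}
    for ch in ent.chains:
        if ch.HasProp("pdb_auth_chain_name"):
            mapping[ch.GetName()] = ch.GetStringProp("pdb_auth_chain_name")
        else:
            mapping[ch.GetName()] = ""
    return mapping


class _FakeChain:
    """Stand-in for ChainHandle (assumption, not the real class)."""
    def __init__(self, name, props):
        self._name, self._props = name, props
    def GetName(self):
        return self._name
    def HasProp(self, key):
        return key in self._props
    def GetStringProp(self, key):
        return self._props[key]


class _FakeEntity:
    """Stand-in for EntityHandle exposing only the chains attribute."""
    def __init__(self, chains):
        self.chains = chains


demo = _FakeEntity([_FakeChain("A", {"pdb_auth_chain_name": "H"}),
                    _FakeChain("B", {})])
print(auth_chain_names(demo))
```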

Info Classes

Information from mmCIF files that goes beyond structural data is kept in a special container, the MMCifInfo class. Here is a detailed description of the available annotation.

class MMCifInfo

This is the container for all bits of non-molecular data pulled from a mmCIF file.

citations

Stores a list of citations (MMCifInfoCitation).

Also available as GetCitations().

biounits

Stores a list of biounits (MMCifInfoBioUnit).

Also available as GetBioUnits().

method

Stores the experimental method used to create the structure.

Also available as GetMethod(). May also be modified by SetMethod().

resolution

Stores the resolution of the crystal structure. Set to 0 if no value in loaded mmCIF file.

Also available as GetResolution(). May also be modified by SetResolution().

r_free

Stores the R-free value of the crystal structure. Set to 0 if no value in loaded mmCIF file.

Also available as GetRFree(). May also be modified by SetRFree().

r_work

Stores the R-work value of the crystal structure. Set to 0 if no value in loaded mmCIF file.

Also available as GetRWork(). May also be modified by SetRWork().
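
Because resolution, R-free and R-work are set to 0 when the file carries no value, code that reports them may want to treat 0 as missing. A minimal sketch, with a hypothetical helper name quality_summary and a stand-in object in place of a real MMCifInfo:

```python
def quality_summary(info):
    """Collect method, resolution, R-free and R-work from an MMCifInfo.

    Values documented as 0 when absent are reported as None.
    """
    def or_none(value):
        return value if value != 0 else None

    return {
        "method": info.GetMethod(),
        "resolution": or_none(info.GetResolution()),
        "r_free": or_none(info.GetRFree()),
        "r_work": or_none(info.GetRWork()),
    }


class _FakeInfo:
    """Stand-in for MMCifInfo with made-up values (assumption)."""
    def GetMethod(self):
        return "X-RAY DIFFRACTION"
    def GetResolution(self):
        return 1.8
    def GetRFree(self):
        return 0.0  # mimics "no value in the loaded file"
    def GetRWork(self):
        return 0.19


summary = quality_summary(_FakeInfo())
print(summary)
```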

operations

Stores the operations needed to transform a crystal structure into a bio unit.

Also available as GetOperations(). May also be modified by AddOperation().

struct_details

Stores details about the structure in a MMCifInfoStructDetails object.

Also available as GetStructDetails(). May also be modified by SetStructDetails().

struct_refs

Lists all links to external databases in the mmCIF file.

revisions

Stores a simple history of a PDB entry.

Also available as GetRevisions(). May be extended by AddRevision().

Type:

MMCifInfoRevisions

obsolete

Stores information about obsoleted / superseded entries.

Also available as GetObsoleteInfo(). May also be modified by SetObsoleteInfo().

Type:

MMCifInfoObsolete

AddCitation(citation)

Add a citation to the citation list of an info object.

Parameters:

citation (MMCifInfoCitation) – Citation to be added.

AddAuthorsToCitation(id, authors)

Adds a list of authors to a specific citation.

Parameters:
  • id (str) – Identifier of the citation.

  • authors (StringList) – List of authors.

GetCitations()

See citations

AddBioUnit(biounit)

Add a bio unit to the bio unit list of an info object. If the id of biounit already exists in the set of assemblies, both will be merged. This means that chain and operations lists will be concatenated and the interval lists (operationsintervalls, chainintervalls) will be updated.

Parameters:

biounit (MMCifInfoBioUnit) – Bio unit to be added.
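
To make the merging behaviour more concrete, here is a plain Python sketch of concatenating two chain lists while keeping the interval lists consistent. It is not the library's implementation. It assumes that intervals are (start, end) index pairs into the chain list, so the intervals of the appended list are shifted by the length of the first list. The function name merge_biounit_lists is hypothetical.

```python
def merge_biounit_lists(chains_a, intervals_a, chains_b, intervals_b):
    """Concatenate two chain lists and shift the second set of intervals.

    Assumption: intervals are (start, end) index pairs into the chain
    list, so intervals of the appended list move by len(chains_a).
    """
    offset = len(chains_a)
    merged_chains = list(chains_a) + list(chains_b)
    merged_intervals = list(intervals_a) + [
        (start + offset, end + offset) for start, end in intervals_b
    ]
    return merged_chains, merged_intervals


chains, intervals = merge_biounit_lists(["A", "B"], [(0, 2)], ["C"], [(0, 1)])
print(chains, intervals)
```

The same bookkeeping would apply to the operations list and its interval list.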

GetBioUnits()

See biounits

SetMethod(method)

See method

GetMethod()

See method

SetResolution(resolution)

See resolution

GetResolution()

See resolution

AddOperation(operation)

See operations

GetOperations()

See operations

SetStructDetails(details)

See struct_details

GetStructDetails()

See struct_details

AddMMCifPDBChainTr(cif_chain_id, pdb_chain_id)

Set up a translation for a certain mmCIF chain name to the traditional PDB chain name.

Parameters:
  • cif_chain_id (str) – atom_site.label_asym_id

  • pdb_chain_id (str) – atom_site.auth_asym_id

GetMMCifPDBChainTr(cif_chain_id)

Get the translation of a certain mmCIF chain name to the traditional PDB chain name. Only works if SEQRES records are read in LoadMMCIF() and a compound library is available (see GetDefaultLib()).

Parameters:

cif_chain_id (str) – atom_site.label_asym_id

Returns:

atom_site.auth_asym_id as str (empty if no mapping)

AddPDBMMCifChainTr(pdb_chain_id, cif_chain_id)

Set up a translation for a certain PDB chain name to the mmCIF chain name.

Parameters:
  • pdb_chain_id (str) – atom_site.auth_asym_id

  • cif_chain_id (str) – atom_site.label_asym_id

GetPDBMMCifChainTr(pdb_chain_id)

Get the translation of a certain PDB chain name to the mmCIF chain name.

Parameters:

pdb_chain_id (str) – atom_site.auth_asym_id

Returns:

atom_site.label_asym_id as str (empty if no mapping)
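
Since both translation getters return an empty string when no mapping is stored, callers may want a fallback. A minimal sketch, where pdb_chain_name is a hypothetical helper and _FakeInfo is a stand-in for a real MMCifInfo object:

```python
def pdb_chain_name(info, cif_chain_id):
    """Return the author chain name for an mmCIF chain name.

    Falls back to the mmCIF name when no mapping is stored, since an
    empty string is returned in that case.
    """
    name = info.GetMMCifPDBChainTr(cif_chain_id)
    return name if name else cif_chain_id


class _FakeInfo:
    """Stand-in for MMCifInfo backed by a dictionary (assumption)."""
    def __init__(self, table):
        self._table = table
    def GetMMCifPDBChainTr(self, cif_chain_id):
        return self._table.get(cif_chain_id, "")


demo_info = _FakeInfo({"A": "H"})
print(pdb_chain_name(demo_info, "A"), pdb_chain_name(demo_info, "C"))
```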

AddMMCifEntityIdTr(cif_chain_id, entity_id)

Set up a translation for a certain mmCIF chain name to the mmCIF entity ID.

Parameters:
  • cif_chain_id (str) – atom_site.label_asym_id

  • entity_id (str) – atom_site.label_entity_id

GetMMCifEntityIdTr(cif_chain_id)

Get the translation of a certain mmCIF chain name to the mmCIF entity ID.

Parameters:

cif_chain_id (str) – atom_site.label_asym_id

Returns:

atom_site.label_entity_id as str (empty if no mapping)

AddRevision(num, date, status, major=-1, minor=-1)

Add a new iteration to the revision history. See MMCifInfoRevisions.AddRevision().

GetRevisions()

See revisions

SetRevisionsDateOriginal(date)

Set the date when this entry first entered the PDB. Ignored if the date has already been set. See MMCifInfoRevisions.SetDateOriginal().

GetObsoleteInfo()

See obsolete

SetObsoleteInfo()

See obsolete

Get bond information for branched entities. Returns all MMCifInfoEntityBranchLink objects in one list. Chain and residue information is available by the stored AtomHandles of each entry.

Returns:

list of MMCifInfoEntityBranchLink

GetEntityBranchByChain(chain_name)

Get bond information for chains with branched entities. Returns all MMCifInfoEntityBranchLink objects in one list if chain is a branched entity, an empty list otherwise.

Parameters:

chain_name (str) – Chain name to check for branch links

Returns:

list of MMCifInfoEntityBranchLink

Add bond information for a branched entity.

Parameters:
  • chain_name (str) – Chain the bond belongs to

  • atom1 (AtomHandle) – First atom of the bond

  • atom2 (AtomHandle) – Second atom of the bond

  • bond_order (int) – Bond order (e.g. 1=single, 2=double, 3=triple)

Returns:

Nothing

GetEntityBranchChainNames()

Get a list of chain names which contain branched entities.

Returns:

list of str

GetEntityBranchChains()

Get a list of chains which contain branched entities.

Returns:

list of ChainHandle

Establish all bonds stored for branched entities.
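
The branch link getters can be combined to get an overview of the stored bonds per chain. The sketch below uses only GetEntityBranchChainNames(), GetEntityBranchByChain() and the atom1, atom2 and bond_order attributes of the link objects. The helper name branch_link_summary is hypothetical, and the demo uses strings in place of AtomHandle objects so it runs without OpenStructure.

```python
from types import SimpleNamespace


def branch_link_summary(info):
    """Return {chain_name: [(atom1, atom2, bond_order), ...]}."""
    summary = {}
    for chain_name in info.GetEntityBranchChainNames():
        links = info.GetEntityBranchByChain(chain_name)
        summary[chain_name] = [
            (link.atom1, link.atom2, link.bond_order) for link in links
        ]
    return summary


class _FakeInfo:
    """Stand-in for MMCifInfo holding links per chain (assumption)."""
    def __init__(self, links):
        self._links = links
    def GetEntityBranchChainNames(self):
        return list(self._links)
    def GetEntityBranchByChain(self, chain_name):
        return self._links.get(chain_name, [])


# Made-up link; real links carry AtomHandle objects instead of strings.
demo_info = _FakeInfo({
    "C": [SimpleNamespace(atom1="NAG1.O4", atom2="NAG2.C1", bond_order=1)],
})
demo_summary = branch_link_summary(demo_info)
print(demo_summary)
```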

class MMCifInfoCitation

This stores citation information from an input file.

id

Stores an internal identifier for a citation. If not provided, it is an empty string.

Also available as GetID(). May also be modified by SetID().

cas

Stores a Chemical Abstracts Service (CAS) identifier if available. If not provided, it is an empty string.

Also available as GetCAS(). May also be modified by SetCAS().

isbn

Stores the ISBN code, presumably for cited books. If not provided, it is an empty string.

Also available as GetISBN(). May also be modified by SetISBN().

published_in

Stores the book or journal title of a publication. Should be the full title without abbreviations. If not provided, it is an empty string.

Also available as GetPublishedIn(). May also be modified by SetPublishedIn().

volume

Stores volume information for journals. Since the volume number is not always a simple integer, it is stored as a string. If not provided, it is an empty string.

Also available as GetVolume(). May also be modified by SetVolume().

page_first

Stores the first page of a publication. Since page numbers are not always simple integers, it is stored as a string. If not provided, it is an empty string.

Also available as GetPageFirst(). May also be modified by SetPageFirst().

page_last

Stores the last page of a publication. Since page numbers are not always simple integers, it is stored as a string. If not provided, it is an empty string.

Also available as GetPageLast(). May also be modified by SetPageLast().

doi

Stores the Digital Object Identifier (DOI) as used by doi.org for a cited document. If not provided, it is an empty string.

Also available as GetDOI(). May also be modified by SetDOI().

pubmed

Stores the PubMed accession number. If not provided, is set to 0.

Also available as GetPubMed(). May also be modified by SetPubMed().

year

Stores the publication year. If not provided, is set to 0.

Also available as GetYear(). May also be modified by SetYear().

title

Stores a title. If not provided, is set to an empty string.

Also available as GetTitle(). May also be modified by SetTitle().

book_publisher

Name of publisher of the citation, relevant for books and book chapters.

Also available as GetBookPublisher() and SetBookPublisher().

book_publisher_city

City of the publisher of the citation, relevant for books and book chapters.

Also available as GetBookPublisherCity() and SetBookPublisherCity().

citation_type

Defines where a citation was published. Either journal, book or unknown.

Also available as GetCitationType(). May also be modified by SetCitationType() with values from MMCifInfoCType. For convenience, the setters SetCitationTypeJournal(), SetCitationTypeBook() and SetCitationTypeUnknown() exist.

For checking the type of a citation, IsCitationTypeJournal(), IsCitationTypeBook() and IsCitationTypeUnknown() can be used.

authors

Stores a StringList of authors.

Also available as GetAuthorList(). May also be modified by SetAuthorList().
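
The getters above can be combined into a one-line reference. The following sketch skips fields that are not provided (empty string or 0, as documented above). The function name format_citation is hypothetical, and _FakeCitation is a stand-in filled with made-up placeholder values.

```python
def format_citation(cit):
    """Build a one-line reference from an MMCifInfoCitation-like object."""
    fields = []
    authors = list(cit.GetAuthorList())
    if authors:
        fields.append(", ".join(authors))
    if cit.GetYear():  # 0 means "not provided"
        fields.append("(%d)" % cit.GetYear())
    for text in (cit.GetTitle(), cit.GetPublishedIn(), cit.GetVolume()):
        if text:  # an empty string means "not provided"
            fields.append(text)
    if cit.GetDOI():
        fields.append("doi:" + cit.GetDOI())
    return " ".join(fields)


class _FakeCitation:
    """Stand-in for MMCifInfoCitation with placeholder data (assumption)."""
    def GetAuthorList(self):
        return ["Doe, J.", "Roe, R."]
    def GetYear(self):
        return 2001
    def GetTitle(self):
        return "An example title."
    def GetPublishedIn(self):
        return "Journal of Examples"
    def GetVolume(self):
        return ""
    def GetDOI(self):
        return ""


reference = format_citation(_FakeCitation())
print(reference)
```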

GetCAS()

See cas

SetCAS(cas)

See cas

GetISBN()

See isbn

SetISBN(isbn)

See isbn

GetPublishedIn()

See published_in

SetPublishedIn(title)

See published_in

GetVolume()

See volume

SetVolume(volume)

See volume

GetPageFirst()

See page_first

SetPageFirst(first)

See page_first

GetPageLast()

See page_last

SetPageLast(last)

See page_last

GetDOI()

See doi

SetDOI(doi)

See doi

GetPubMed()

See pubmed

SetPubMed(no)

See pubmed

GetYear()

See year

SetYear(year)

See year

GetTitle()

See title

SetTitle(title)

See title

GetBookPublisher()

See book_publisher

SetBookPublisher()

See book_publisher

GetBookPublisherCity()

See book_publisher_city

SetBookPublisherCity()

See book_publisher_city

GetCitationType()

See citation_type

SetCitationType(publication_type)

See citation_type

SetCitationTypeJournal()

See citation_type

SetCitationTypeBook()

See citation_type

SetCitationTypeUnknown()

See citation_type

IsCitationTypeJournal()

See citation_type

IsCitationTypeBook()

See citation_type

IsCitationTypeUnknown()

See citation_type

GetAuthorList()

See authors

SetAuthorList(list)

See authors

class MMCifInfoTransOp

This stores operations needed to transform an EntityHandle into a bio unit.

id

A unique identifier. If not provided, it is an empty string.

Also available as GetID(). May also be modified by SetID().

type

Describes the operation. If not provided, it is an empty string.

Also available as GetType(). May also be modified by SetType().

translation

The translational vector. Also available as GetVector(). May also be modified by SetVector().

rotation

The rotational matrix. Also available as GetMatrix(). May also be modified by SetMatrix().

GetID()

See id

SetID(id)

See id

GetType()

See type

SetType(type)

See type

GetVector()

See translation

SetVector(x, y, z)

See translation

GetMatrix()

See rotation

SetMatrix(i00, i01, i02, i10, i11, i12, i20, i21, i22)

See rotation
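
To illustrate what such an operation does to a coordinate, here is a plain Python sketch that applies a rotation matrix followed by a translation vector to a point. It assumes the usual mmCIF convention x' = R x + t and uses nested lists and tuples rather than the objects returned by GetMatrix() and GetVector(). The function name apply_operation is hypothetical.

```python
def apply_operation(rotation, translation, point):
    """Return rotation * point + translation for a 3D point.

    rotation is a 3x3 nested list (row-major), translation and point
    are length-3 sequences.
    """
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Rotation by 90 degrees about the z axis, then a shift of 1 along z.
rot_z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
moved = apply_operation(rot_z_90, (0, 0, 1), (1, 0, 0))
print(moved)  # → (0, 1, 1)
```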

class MMCifInfoBioUnit

This stores information on how a structure is to be assembled to form the bio unit.

id

The id of a bio unit as given by the original mmCIF file.

Also available as GetID(). May also be modified by SetID().

Type:

str

details

Special aspects of the biological assembly. If not provided, it is an empty string.

Also available as GetDetails(). May also be modified by SetDetails().

method_details

Details about the method used to determine this biological assembly.

Also available as GetMethodDetails(). May also be modified by SetMethodDetails().

chains

Chains involved in this bio unit. If not provided, it is an empty list.

Also available as GetChainList(). May also be modified by AddChain() or SetChainList().

chainintervals

List of intervals on the chain list. Needed if there are several sets of chains and transformations to create the bio unit. Comes as a list of tuples. The first component is the start, the second is the right border of the interval.

Also available as GetChainIntervalList(). Is automatically modified by AddChain(), SetChainList() and MMCifInfo.AddBioUnit().

operations

Translations and rotations needed to create the bio unit. Filled with objects of class MMCifInfoTransOp.

Also available as GetOperations(). May be modified by AddOperations().

operationsintervalls

List of intervals on the operations list. Needed if there are several sets of chains and transformations to create the bio unit. Comes as a list of tuples. The first component is the start, the second is the right border of the interval.

Also available as GetOperationsIntervalList(). Is automatically modified by AddOperations() and MMCifInfo.AddBioUnit().

GetID()

See id

SetID(id)

See id

GetDetails()

See details

SetDetails(details)

See details

GetMethodDetails()

See method_details

SetMethodDetails(details)

See method_details

GetChainList()

See chains

SetChainList(chains)

See chains, also resets chainintervalls to contain only one interval enclosing the whole chain list.

Parameters:

chains (StringList) – List of chain names.

AddChain(chain name)

See chains, also extends the right border of the last entry in chainintervalls.

GetChainIntervalList()

See chainintervals

GetOperations()

See operations

AddOperations(list of operations)

See operations, also extends the right border of the last entry in operationsintervalls.

GetOperationsIntervalList()

See operationsintervalls

PDBize(asu, seqres=None, min_polymer_size=None, transformation=False, peptide_min_size=10, nucleicacid_min_size=10, saccharide_min_size=10)

Returns the biological assembly (bio unit) for an entity. The new entity created is well suited to be saved as a PDB file. Therefore the function tries to meet the requirements of single-character chain names. The following measures are taken.

  • All ligands are put into one chain (_)

  • Water is put into one chain (-)

  • Each polymer gets its own chain, named A-Z 0-9 a-z.

  • The description of non-polymer chains will be put into a generic string property called description on the residue level.

  • Ligands that resemble a polymer but have fewer than min_polymer_size / peptide_min_size / nucleicacid_min_size / saccharide_min_size residues are assigned the same numeric residue number. The residues are distinguished by insertion code.

  • Sometimes bio units exceed the coordinate system storable in a PDB file. In that case, the box around the entity will be aligned to the lower left corner of the coordinate system.

Since this function is at the moment mainly used to create biounits from mmCIF files to be saved as PDBs, the function assumes that the ChainType properties are set correctly.

Parameters:
  • asu (EntityHandle) – Asymmetric unit to work on. Should be created from a mmCIF file.

  • seqres (SequenceList) – If set to a valid sequence list, the length of the seqres records will be used to determine if a certain chain has the minimally required length.

  • min_polymer_size (int) – The minimal number of residues a polymer needs to get its own chain. Everything below that number will be sorted into the ligand chain. Overrides peptide_min_size, nucleicacid_min_size and saccharide_min_size if set to a value different than None.

  • transformation (bool) – If set, return the transformation matrix used to move the bounding box of the bio unit to the lower left corner.

  • peptide_min_size (int) – Minimal size to get an individual chain for a polypeptide. Is overridden by min_polymer_size.

  • nucleicacid_min_size (int) – Minimal size to get an individual chain for a polynucleotide. Is overridden by min_polymer_size.

  • saccharide_min_size (int) – Minimal size to get an individual chain for an oligosaccharide or polysaccharide. Is overridden by min_polymer_size.
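
Two of the rules above can be sketched in plain Python: the pool of single-character chain names (A-Z, 0-9, a-z, in that order) and the way min_polymer_size overrides the three per-type thresholds. This is an illustration of the documented behaviour, not the library code. The names PDB_CHAIN_NAMES, polymer_chain_name, effective_min_sizes and first_biounit_for_pdb are hypothetical.

```python
import string

# Order given in the text: A-Z, then 0-9, then a-z (62 names in total).
PDB_CHAIN_NAMES = (string.ascii_uppercase + string.digits
                   + string.ascii_lowercase)


def polymer_chain_name(index):
    """Return the single-character chain name for the index-th polymer."""
    if not 0 <= index < len(PDB_CHAIN_NAMES):
        raise ValueError("only %d single-character names available"
                         % len(PDB_CHAIN_NAMES))
    return PDB_CHAIN_NAMES[index]


def effective_min_sizes(min_polymer_size=None, peptide_min_size=10,
                        nucleicacid_min_size=10, saccharide_min_size=10):
    """Resolve the thresholds; min_polymer_size wins when it is not None."""
    if min_polymer_size is not None:
        return {"peptide": min_polymer_size,
                "nucleicacid": min_polymer_size,
                "saccharide": min_polymer_size}
    return {"peptide": peptide_min_size,
            "nucleicacid": nucleicacid_min_size,
            "saccharide": saccharide_min_size}


def first_biounit_for_pdb(asu, info):
    """Build the first bio unit of an entry (needs OpenStructure objects)."""
    biounits = info.GetBioUnits()
    return biounits[0].PDBize(asu, min_polymer_size=10)


print(polymer_chain_name(0), polymer_chain_name(26), polymer_chain_name(36))
print(effective_min_sizes(min_polymer_size=5))
```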

class MMCifInfoStructDetails

Holds details about the structure.

entry_id

Identifier for a certain data block. If not provided, it is an empty string.

Also available as GetEntryID(). May also be modified by SetEntryID().

title

Set a title for the structure.

Also available as GetTitle(). May also be modified by SetTitle().

casp_flag

Tells whether this structure was a target in some competition.

Also available as GetCASPFlag(). May also be modified by SetCASPFlag().

descriptor

Descriptor for an NDB structure or the unstructured content of a PDB COMPND record.

Also available as GetDescriptor(). May also be modified by SetDescriptor().

mass

Molecular mass of a molecule.

Also available as GetMass(). May also be modified by SetMass().

mass_method

Method used to determine the molecular weight.

Also available as GetMassMethod(). May also be modified by SetMassMethod().

model_details

Details about how the structure was determined.

Also available as GetModelDetails(). May also be modified by SetModelDetails().

model_type_details

Details about how the type of the structure was determined.

Also available as GetModelTypeDetails(). May also be modified by SetModelTypeDetails().

GetEntryID()

See entry_id

SetEntryID(id)

See entry_id

GetTitle()

See title

SetTitle(title)

See title

GetCASPFlag()

See casp_flag

SetCASPFlag(flag)

See casp_flag

GetDescriptor()

See descriptor

SetDescriptor(descriptor)

See descriptor

GetMass()

See mass

SetMass(mass)

See mass

GetMassMethod()

See mass_method

SetMassMethod(method)

See mass_method

GetModelDetails()

See model_details

SetModelDetails(details)

See model_details

GetModelTypeDetails()

See model_type_details

SetModelTypeDetails(details)

See model_type_details

class MMCifInfoObsolete

Holds details on obsolete / superseded structures. The data is available both in the obsolete and in the replacement entries.

date

When was the entry replaced?

Also available as GetDate(). May also be modified by SetDate().

id

Type of change. Either Obsolete or Supersede. Returns a string starting upper case. Has to be set via OBSLTE or SPRSDE.

Also available as GetID(). May also be modified by SetID().

pdb_id

ID of the replacing entry.

Also available as GetPDBID(). May also be modified by SetPDBID().

replace_pdb_id

ID of the replaced entry.

Also available as GetReplacedPDBID(). May also be modified by SetReplacedPDBID().

GetDate()

See date

SetDate(date)

See date

GetID()

See id

SetID(id)

See id

GetPDBID()

See pdb_id

SetPDBID(flag)

See pdb_id

GetReplacedPDBID()

See replace_pdb_id

SetReplacedPDBID(descriptor)

See replace_pdb_id

class MMCifInfoStructRef

Holds the information of the struct_ref category. The category describes the link of polymers in the mmCIF file to sequences stored in external databases such as UniProt. The related categories struct_ref_seq and struct_ref_seq_dif also list differences between the sequences of the deposited structure and the sequences in the database. Two prominent examples of such differences are point mutations and/or expression tags.

db_name

Name of the external database, for example UNP for UniProt.

Type:

str

db_id

Name of the reference sequence in the database pointed to by db_name.

Type:

str

db_access

Alternative accession code for the sequence in the database pointed to by db_name.

Type:

str

GetAlignedSeq(name)

Returns the aligned sequence for the given name, None if the sequence does not exist.

aligned_seqs

List of aligned sequences (all entries of the struct_ref_seq category mapping to this struct_ref).

class MMCifInfoStructRefSeq

An aligned range of residues between a sequence in a reference database and the deposited sequence.

align_id

Uniquely identifies every struct_ref_seq item in the mmCIF file.

Type:

str

seq_begin
seq_end

The starting point (1-based) and end point of the aligned range in the deposited sequence, respectively.

Type:

int

db_begin
db_end

The starting point (1-based) and end point of the aligned range in the database sequence, respectively.

Type:

int

difs

List of differences between the deposited sequence and the sequence in the database.

chain_name

Chain name of the polymer in the mmCIF file.

class MMCifInfoStructRefSeqDif

A particular difference between the deposited sequence and the sequence in the database.

rnum

The residue number (1-based) of the residue in the deposited sequence.

Type:

int

details

A textual description of the difference, e.g. point mutation, expression tag, purification artifact.

Type:

str
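
The three classes nest: each struct_refs entry holds aligned_seqs, and each aligned sequence holds difs. A minimal sketch of walking that hierarchy to list all sequence differences follows. The function name list_sequence_differences is hypothetical, and the demo object is built from SimpleNamespace stand-ins with made-up values.

```python
from types import SimpleNamespace


def list_sequence_differences(info):
    """Return (db_name, db_id, chain_name, rnum, details) tuples."""
    rows = []
    for ref in info.struct_refs:
        for aligned in ref.aligned_seqs:
            for dif in aligned.difs:
                rows.append((ref.db_name, ref.db_id, aligned.chain_name,
                             dif.rnum, dif.details))
    return rows


# Stand-in for MMCifInfo with a single reference and one difference.
demo_info = SimpleNamespace(struct_refs=[
    SimpleNamespace(db_name="UNP", db_id="EXAMPLE_ID", aligned_seqs=[
        SimpleNamespace(chain_name="A", difs=[
            SimpleNamespace(rnum=1, details="expression tag"),
        ]),
    ]),
])
demo_rows = list_sequence_differences(demo_info)
print(demo_rows)
```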

class MMCifInfoRevisions

Revision history of a PDB entry. If you find a ‘?’ somewhere, this means ‘not set’.

date_original

The date when this entry was seen in PDB for the very first time. This is not necessarily the release date. Expected format ‘yyyy-mm-dd’.

Type:

str

first_release

Index + 1 of the revision releasing this entry. A value of 0 means it has not been set yet. It is set the first time a GetStatus() value of “full release” (mmCIF versions < 5) or “Initial release” (current mmCIF) is encountered.

Type:

int

AddRevision(num, date, status, major=-1, minor=-1)

Add a new iteration to the history.

Parameters:
Raises:

Exception if num is <= the last added iteration.

GetSize()
Returns:

Number of revisions (valid revision indices are in [0, number-1]).

Return type:

int

GetDate(i)
Parameters:

i (int) – Index of revision

Returns:

Date the PDB revision took place. Expected format ‘yyyy-mm-dd’.

Return type:

str

Raises:

Exception if i out of bounds.

GetNum(i)
Parameters:

i (int) – Index of revision

Returns:

Unique identifier of revision (assigned in increasing order)

Return type:

int

Raises:

Exception if i out of bounds.

GetStatus(i)
Parameters:

i (int) – Index of revision

Returns:

The status of this revision.

Return type:

str

Raises:

Exception if i out of bounds.

GetMajor(i)
Parameters:

i (int) – Index of revision

Returns:

The major version of this revision (-1 if not set).

Return type:

int

Raises:

Exception if i out of bounds.

GetMinor(i)
Parameters:

i (int) – Index of revision

Returns:

The minor version of this revision (-1 if not set).

Return type:

int

Raises:

Exception if i out of bounds.

GetLastDate()
Returns:

Date of the latest revision (‘?’ if no revision set).

Return type:

str

GetLastMajor()
Returns:

Major version of the latest revision (-1 if not set).

Return type:

int

GetLastMinor()
Returns:

Minor version of the latest revision (-1 if not set).

Return type:

int

SetDateOriginal(date)
GetDateOriginal()

See date_original

GetFirstRelease()

See first_release
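
The indexed getters can be used to walk the revision history, and first_release (index + 1, 0 if unset) can be turned into a release date. A minimal sketch with hypothetical helper names revision_history and release_date, and a stand-in class filled with made-up dates:

```python
def revision_history(revs):
    """Return a list of (num, date, status) for all revisions."""
    return [(revs.GetNum(i), revs.GetDate(i), revs.GetStatus(i))
            for i in range(revs.GetSize())]


def release_date(revs):
    """Return the date of the releasing revision, or None if unset."""
    first = revs.GetFirstRelease()
    if first == 0:  # 0 means the release revision was never seen
        return None
    return revs.GetDate(first - 1)  # first_release is index + 1


class _FakeRevisions:
    """Stand-in for MMCifInfoRevisions with placeholder rows (assumption)."""
    _rows = [(1, "2012-01-11", "Initial release"),
             (2, "2013-05-08", "Other")]
    def GetSize(self):
        return len(self._rows)
    def GetNum(self, i):
        return self._rows[i][0]
    def GetDate(self, i):
        return self._rows[i][1]
    def GetStatus(self, i):
        return self._rows[i][2]
    def GetFirstRelease(self):
        return 1


demo_revs = _FakeRevisions()
print(revision_history(demo_revs))
print(release_date(demo_revs))
```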

class MMCifInfoEntityBranchLink

Data from pdbx_entity_branch, most specifically pdbx_entity_branch_link. This is connectivity information for branched entities, e.g. carbohydrates/oligosaccharides. Conop Processors cannot easily connect them, so this information is used in LoadMMCIF() to do that.

atom1

The first atom of the bond. Corresponds to entity_branch_link.atom_id_1, entity_branch_link.comp_id_1 and entity_branch_link.entity_branch_list_num_1. Also available via GetAtom1() and SetAtom1().

Type:

AtomHandle

atom2

The second atom of the bond. Corresponds to entity_branch_link.atom_id_2, entity_branch_link.comp_id_2 and entity_branch_link.entity_branch_list_num_2. Also available via GetAtom2() and SetAtom2().

Type:

AtomHandle

bond_order

Order of a bond (e.g. 1=single, 2=double, 3=triple). Corresponds to entity_branch_link.value_order. Also available via GetBondOrder() and SetBondOrder().

Type:

int

Establish a bond between atom1 and atom2 of a MMCifInfoEntityBranchLink.

Parameters:

editor (XCSEditor) – The editor instance to call for connecting the atoms.

Returns:

Nothing

GetAtom1()

See atom1

GetAtom2()

See atom2

GetBondOrder()

See bond_order

SetAtom1()

See atom1

SetAtom2()

See atom2

SetBondOrder()

See bond_order
