mmCIF File Format¶
The mmCIF file format is an alternative container for structural entities, also provided by the PDB. Here we describe how to load those files and how to handle the information they provide beyond the common PDB format (MMCifInfo, MMCifInfoCitation, MMCifInfoTransOp, MMCifInfoBioUnit, MMCifInfoStructDetails).
Loading mmCIF Files¶
- LoadMMCIF(filename, restrict_chains='', fault_tolerant=None, calpha_only=None, profile='DEFAULT', remote=False, seqres=False, info=False)¶
Load an mmCIF file from disk and return one or more entities. Several options allow customizing the exact behaviour of the mmCIF import. For more information on these options, see IO Profiles for entity importer.
Residues are flagged as ligand if they are mentioned in a HET record.
Parameters: - restrict_chains – If not an empty string, only chains listed in the string will be imported.
- fault_tolerant – Enable/disable fault-tolerant import. If set, overrides the value of IOProfile.fault_tolerant.
- remote – If set to True, the method tries to load the pdb from the remote pdb repository www.pdb.org. The filename is then interpreted as the pdb id.
- seqres – Whether to read SEQRES records. If set to True, the SEQRES entries are returned as the second item, after the loaded entity.
- info – Whether to return an info container with the other output. Returns a MMCifInfo object as last item.
Return type:
Raises: IOException if the import fails due to an erroneous or non-existent file.
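The shape of the return value depends on the seqres and info flags described above. The helper below is a minimal sketch of how a caller might normalize that result. `unpack_load_result` is a hypothetical name, not part of the library, and it assumes that a call with neither flag returns the entity on its own, while a flagged call returns a tuple ordered entity, seqres, info.

```python
def unpack_load_result(result, seqres=False, info=False):
    """Normalize the result of a LoadMMCIF-style call into a dict.

    Assumption: without flags the entity is returned alone; with flags a
    tuple is returned, with seqres as second item and info as last item.
    """
    if not seqres and not info:
        return {"entity": result, "seqres": None, "info": None}
    items = list(result)
    out = {"entity": items[0], "seqres": None, "info": None}
    if seqres:
        out["seqres"] = items[1]
    if info:
        out["info"] = items[-1]
    return out


# Plain strings stand in for the real entity, seqres and info objects.
demo = unpack_load_result(("ENTITY", "SEQRES", "INFO"), seqres=True, info=True)
print(demo)
```

Passing the same flag values to the helper as to the loader keeps the unpacking in one place, so calling code does not have to index into tuples.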
Categories Available¶
The following categories of a mmCIF file are considered by the reader:
- atom_site: Used to build the entity
- entity: Involved in setting ChainTypes
- entity_poly: Involved in setting ChainTypes
- citation: Goes into MMCifInfoCitation
- citation_author: Goes into MMCifInfoCitation
- exptl: Goes into MMCifInfo as method.
- refine: Goes into MMCifInfo as resolution.
- pdbx_struct_assembly: Used for MMCifInfoBioUnit.
- pdbx_struct_assembly_gen: Used for MMCifInfoBioUnit.
- pdbx_struct_oper_list: Used for MMCifInfoBioUnit.
- struct: Details about a structure, stored in MMCifInfoStructDetails.
- struct_conf: Stores secondary structure information (practically helices) in the entity
- struct_sheet_range: Stores secondary structure information for sheets in the entity
- pdbx_database_PDB_obs_spr: Verbose information on obsoleted/superseded entries, stored in MMCifInfoObsolete
- struct_ref: Stored in MMCifInfoStructRef
- struct_ref_seq: Stored in MMCifInfoStructRefSeq
- struct_ref_seq_dif: Stored in MMCifInfoStructRefSeqDif
- database_pdb_rev: Stored in MMCifInfoRevisions
Info Classes¶
Information from mmCIF files that goes beyond structural data is kept in a special container, the MMCifInfo class. Here is a detailed description of the available annotation.
- class MMCifInfo¶
This is the container for all bits of non-molecular data pulled from a mmCIF file.
- citations¶
Stores a list of citations (MMCifInfoCitation).
Also available as GetCitations().
- biounits¶
Stores a list of biounits (MMCifInfoBioUnit).
Also available as GetBioUnits().
- method¶
Stores the experimental method used to create the structure.
Also available as GetMethod(). May also be modified by SetMethod().
- resolution¶
Stores the resolution of the crystal structure.
Also available as GetResolution(). May also be modified by SetResolution().
- operations¶
Stores the operations needed to transform a crystal structure into a bio unit.
Also available as GetOperations(). May also be modified by AddOperation().
- struct_details¶
Stores details about the structure in a MMCifInfoStructDetails object.
Also available as GetStructDetails(). May also be modified by SetStructDetails().
- struct_refs¶
Lists all links to external databases in the mmCIF file.
- revisions¶
Stores a simple history of a PDB entry.
Also available as GetRevisions(). May be extended by AddRevision().
- AddCitation(citation)¶
Add a citation to the citation list of an info object.
Parameters: citation (MMCifInfoCitation) – Citation to be added.
- AddAuthorsToCitation(id, authors)¶
Adds a list of authors to a specific citation.
Parameters: - id (str) – Identifier of the citation.
- authors (StringList) – List of authors.
- AddBioUnit(biounit)¶
Add a bio unit to the bio unit list of an info object. If the id of biounit already exists in the set of assemblies, both will be merged. This means that chain and operations lists will be concatenated and the interval lists (operationsintervalls, chainintervalls) will be updated.
Parameters: biounit (MMCifInfoBioUnit) – Bio unit to be added.
- SetResolution(resolution)¶
See resolution
- GetResolution()¶
See resolution
- AddOperation(operation)¶
See operations
- GetOperations()¶
See operations
- SetStructDetails(details)¶
See struct_details
- GetStructDetails()¶
See struct_details
- AddMMCifPDBChainTr(cif_chain_id, pdb_chain_id)¶
Set up a translation for a certain mmCIF chain name to the traditional PDB chain name.
Parameters: - cif_chain_id (str) – atom_site.label_asym_id
- pdb_chain_id (str) – atom_site.auth_asym_id
- GetMMCifPDBChainTr(cif_chain_id)¶
Get the translation of a certain mmCIF chain name to the traditional PDB chain name.
Parameters: cif_chain_id (str) – atom_site.label_asym_id Returns: atom_site.auth_asym_id as str
- AddPDBMMCifChainTr(pdb_chain_id, cif_chain_id)¶
Set up a translation for a certain PDB chain name to the mmCIF chain name.
Parameters: - pdb_chain_id (str) – atom_site.auth_asym_id
- cif_chain_id (str) – atom_site.label_asym_id
- GetPDBMMCifChainTr(pdb_chain_id)¶
Get the translation of a certain PDB chain name to the mmCIF chain name.
Parameters: pdb_chain_id (str) – atom_site.auth_asym_id Returns: atom_site.label_asym_id as str
- AddRevision(num, date, status)¶
Add a new iteration to the history.
Parameters: - num (int) – database_pdb_rev.num
- date (str) – database_pdb_rev.date
- status (str) – database_pdb_rev.status
- SetRevisionsDateOriginal(date)¶
Set the date, when this entry first entered the PDB.
Parameters: date (str) – database_pdb_rev.date_original
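As a rough illustration of how the attributes of MMCifInfo fit together, the sketch below collects the method, the resolution and the citations into printable lines. It only reads attributes documented above (method, resolution, citations, and id, title, year on each citation). `summarize_info` is a hypothetical helper, and the `SimpleNamespace` objects are made-up stand-ins for a real info object.

```python
from types import SimpleNamespace


def summarize_info(info):
    """Return a list of human-readable lines describing an info object."""
    lines = [
        "method: {}".format(info.method),
        "resolution: {}".format(info.resolution),
    ]
    for cit in info.citations:
        lines.append("citation {}: {} ({})".format(cit.id, cit.title, cit.year))
    return lines


# Stand-in data; a real object would come from loading a file with info=True.
demo_info = SimpleNamespace(
    method="X-RAY DIFFRACTION",
    resolution=2.0,
    citations=[SimpleNamespace(id="primary", title="Example title", year=2010)],
)
for line in summarize_info(demo_info):
    print(line)
```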
- class MMCifInfoCitation¶
This stores citation information from an input file.
- id¶
Stores an internal identifier for a citation. If not provided, defaults to an empty string.
Also available as GetID(). May also be modified by SetID().
- cas¶
Stores a Chemical Abstracts Service identifier if available. If not provided, defaults to an empty string.
Also available as GetCAS(). May also be modified by SetCas().
- isbn¶
Stores the ISBN code, presumably for cited books. If not provided, defaults to an empty string.
Also available as GetISBN(). May also be modified by SetISBN().
- published_in¶
Stores the book or journal title of a publication. Should hold the full title, without abbreviations. If not provided, defaults to an empty string.
Also available as GetPublishedIn(). May also be modified by SetPublishedIn().
- volume¶
Stores volume information for journals. Since the volume number is not always a simple integer, it is stored as a string. If not provided, defaults to an empty string.
Also available as GetVolume(). May also be modified by SetVolume().
- page_first¶
Stores the first page of a publication. Since page numbers are not always simple integers, they are stored as strings. If not provided, defaults to an empty string.
Also available as GetPageFirst(). May also be modified by SetPageFirst().
- page_last¶
Stores the last page of a publication. Since page numbers are not always simple integers, they are stored as strings. If not provided, defaults to an empty string.
Also available as GetPageLast(). May also be modified by SetPageLast().
- doi¶
Stores the Document Object Identifier as used by doi.org for a cited document. If not provided, defaults to an empty string.
Also available as GetDOI(). May also be modified by SetDOI().
- pubmed¶
Stores the PubMed accession number. If not provided, is set to 0.
Also available as GetPubMed(). May also be modified by SetPubmed().
- year¶
Stores the publication year. If not provided, is set to 0.
Also available as GetYear(). May also be modified by SetYear().
- title¶
Stores a title. If not provided, is set to an empty string.
Also available as GetTitle(). May also be modified by SetTitle().
Stores a StringList of authors.
Also available as GetAuthorList(). May also be modified by SetAuthorList().
- GetPublishedIn()¶
See published_in
- SetPublishedIn(title)¶
See published_in
- GetPageFirst()¶
See page_first
- SetPageFirst(first)¶
See page_first
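Because unset citation fields fall back to an empty string (or 0 for pubmed and year), a formatter can simply skip falsy values. The following sketch builds a one-line reference from the documented attributes title, published_in, volume, page_first, page_last, year and doi. `format_citation` is a hypothetical helper and the citation object is a made-up stand-in.

```python
from types import SimpleNamespace


def format_citation(cit):
    """Build a one-line reference, skipping fields that were not provided."""
    parts = []
    if cit.title:
        parts.append(cit.title + ".")
    if cit.published_in:
        parts.append(cit.published_in)
    if cit.volume:
        parts.append(cit.volume)
    if cit.page_first:
        pages = cit.page_first
        if cit.page_last:
            pages += "-" + cit.page_last
        parts.append(pages)
    if cit.year:  # 0 means "not provided"
        parts.append("({})".format(cit.year))
    if cit.doi:
        parts.append("doi:" + cit.doi)
    return " ".join(parts)


# Stand-in citation with some fields left at their defaults.
demo_citation = SimpleNamespace(
    title="An example title",
    published_in="Example Journal",
    volume="12",
    page_first="100",
    page_last="",
    year=2011,
    doi="",
)
print(format_citation(demo_citation))
```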
- class MMCifInfoTransOp¶
This stores operations needed to transform an entity into a bio unit.
- id¶
A unique identifier. If not provided, defaults to an empty string.
- type¶
Describes the operation. If not provided, defaults to an empty string.
Also available as GetType(). May also be modified by SetType().
- translation¶
The translational vector. Also available as GetVector(). May also be
modified by SetVector().
- rotation¶
The rotational matrix. Also available as GetMatrix(). May also be
modified by SetMatrix().
- GetVector()¶
See translation
- SetVector(x, y, z)¶
See translation
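To make the roles of rotation and translation concrete, here is a small pure-Python sketch that applies one operation to a single point. It assumes the usual convention that the rotation matrix is applied first and the translation vector is added afterwards. `apply_operation` is a hypothetical helper, and plain tuples stand in for the library's matrix and vector types.

```python
def apply_operation(rotation, translation, point):
    """Return rotation * point + translation for a 3x3 matrix and 3-vectors."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# A 180 degree rotation about the z axis, followed by a shift of 10 along x.
rot = ((-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, 1.0))
shift = (10.0, 0.0, 0.0)
moved = apply_operation(rot, shift, (1.0, 2.0, 3.0))
print(moved)
```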
- class MMCifInfoBioUnit¶
This stores information on how a structure is to be assembled to form the bio unit.
- id¶
The id of a bio unit as given by the original mmCIF file.
Also available as GetID(). May also be modified by SetID().
Type : str
- details¶
Special aspects of the biological assembly. If not provided, defaults to an empty string.
Also available as GetDetails(). May also be modified by SetDetails().
- method_details¶
Details about the method used to determine this biological assembly.
Also available as GetMethodDetails(). May also be modified by SetMethodDetails().
- chains¶
Chains involved in this bio unit. If not provided, defaults to an empty list.
Also available as GetChainList(). May also be modified by AddChain() or SetChainList().
- chainintervals¶
List of intervals on the chain list. Needed if there are several sets of chains and transformations to create the bio unit. Comes as a list of tuples. The first component is the start, the second is the right border of the interval.
Also available as GetChainIntervalList(). Is automatically modified by AddChain(), SetChainList() and MMCifInfo.AddBioUnit().
- operations¶
Translations and rotations needed to create the bio unit. Filled with objects of class MMCifInfoTransOp.
Also available as GetOperations(). May be modified by AddOperations()
- operationsintervalls¶
List of intervals on the operations list. Needed if there are several sets of chains and transformations to create the bio unit. Comes as a list of tuples. The first component is the start, the second is the right border of the interval.
Also available as GetOperationsIntervalList(). Is automatically modified by AddOperations() and MMCifInfo.AddBioUnit().
- GetMethodDetails()¶
See method_details
- SetMethodDetails(details)¶
See method_details
- SetChainList(chains)¶
See chains, also resets chainintervalls to contain only one interval enclosing the whole chain list.
Parameters: chains (StringList) – List of chain names.
- AddChain(chain name)¶
See chains, also extends the right border of the last entry in chainintervalls.
- GetChainIntervalList()¶
See chainintervalls
- GetOperations()¶
See operations
- AddOperations(list of operations)¶
See operations, also extends the right border of the last entry in operationsintervalls.
- GetOperationsIntervalList()¶
See operationsintervalls
- PDBize(asu, seqres=None, min_polymer_size=10, transformation=False)¶
Returns the biological assembly (bio unit) for an entity. The new entity created is well suited to be saved as a PDB file. Therefore the function tries to meet the requirements of single-character chain names. The following measures are taken.
- All ligands are put into one chain (_)
- Water is put into one chain (-)
- Each polymer gets its own chain, named A-Z 0-9 a-z.
- The description of non-polymer chains will be put into a generic string property called description on the residue level.
- Ligands that resemble a polymer but have fewer than min_polymer_size residues are assigned the same numeric residue number. The residues are distinguished by insertion code.
- Sometimes bio units exceed the coordinate system storable in a PDB file. In that case, the box around the entity will be aligned to the lower left corner of the coordinate system.
Since this function is at the moment mainly used to create biounits from mmCIF files to be saved as PDBs, the function assumes that the ChainType properties are set correctly.
Parameters: - asu (EntityHandle) – Asymmetric unit to work on. Should be created from a mmCIF file.
- seqres (SequenceList) – If set to a valid sequence list, the length of the seqres records will be used to determine if a certain chain has the minimally required length.
- min_polymer_size (int) – The minimal number of residues a polymer needs to get its own chain. Everything below that number will be sorted into the ligand chain.
- transformation (bool) – If set, return the transformation matrix used to move the bounding box of the bio unit to the lower left corner.
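The chain naming rules listed for PDBize can be sketched in a few lines of plain Python. This is only an illustration of the rules as described above, not the library's implementation. `assign_chain_names` and its input format (original name, kind, residue count) are hypothetical.

```python
import string

# Single-character names for polymer chains, in the order A-Z, 0-9, a-z.
POLYMER_NAMES = string.ascii_uppercase + string.digits + string.ascii_lowercase


def assign_chain_names(chains, min_polymer_size=10):
    """Map original chain names to single-character PDB-style names.

    chains: list of (original_name, kind, n_residues) tuples, where kind
    is 'polymer', 'ligand' or 'water' (a made-up input format).
    """
    names = {}
    next_polymer = 0
    for original, kind, n_residues in chains:
        if kind == "water":
            names[original] = "-"
        elif kind == "polymer" and n_residues >= min_polymer_size:
            if next_polymer >= len(POLYMER_NAMES):
                raise ValueError("more polymers than single-character names")
            names[original] = POLYMER_NAMES[next_polymer]
            next_polymer += 1
        else:
            # Ligands and polymers below min_polymer_size share one chain.
            names[original] = "_"
    return names


demo_chains = [
    ("A", "polymer", 120),
    ("B", "polymer", 4),
    ("C", "ligand", 1),
    ("D", "water", 57),
    ("E", "polymer", 80),
]
print(assign_chain_names(demo_chains))
```

With 26 upper-case letters, 10 digits and 26 lower-case letters, this scheme offers 62 polymer chain names, which is why the sketch raises an error beyond that point.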
- class MMCifInfoStructDetails¶
Holds details about the structure.
- entry_id¶
Identifier for a certain data block. If not provided, defaults to an empty string.
Also available as GetEntryID(). May also be modified by SetEntryID().
- title¶
Stores a title for the structure.
Also available as GetTitle(). May also be modified by SetTitle().
- casp_flag¶
Tells whether this structure was a target in some competition.
Also available as GetCASPFlag(). May also be modified by SetCASPFlag().
- descriptor¶
Descriptor for an NDB structure or the unstructured content of a PDB COMPND record.
Also available as GetDescriptor(). May also be modified by SetDescriptor().
- mass_method¶
Method used to determine the molecular weight.
Also available as GetMassMethod(). May also be modified by SetMassMethod().
- model_details¶
Details about how the structure was determined.
Also available as GetModelDetails(). May also be modified by SetModelDetails().
- model_type_details¶
Details about how the type of the structure was determined.
Also available as GetModelTypeDetails(). May also be modified by SetModelTypeDetails().
- GetDescriptor()¶
See descriptor
- SetDescriptor(descriptor)¶
See descriptor
- GetMassMethod()¶
See mass_method
- SetMassMethod(method)¶
See mass_method
- GetModelDetails()¶
See model_details
- SetModelDetails(details)¶
See model_details
- GetModelTypeDetails()¶
See model_type_details
- SetModelTypeDetails(details)¶
See model_type_details
- class MMCifInfoObsolete¶
Holds details on obsolete/superseded structures.
- id¶
Type of change. Either Obsolete or Supersede. Returns a string starting upper case. Has to be set via OBSLTE or SPRSDE.
- pdb_id¶
ID of the replacing entry.
Also available as GetPDBID(). May also be modified by SetPDBID().
- replace_pdb_id¶
ID of the replaced entry.
Also available as GetReplacedPDBID(). May also be modified by SetReplacedPDBID().
- GetReplacedPDBID()¶
See replace_pdb_id
- SetReplacedPDBID(descriptor)¶
See replace_pdb_id
- class MMCifInfoStructRef¶
Holds the information of the struct_ref category. The category describes the link of polymers in the mmCIF file to sequences stored in external databases such as UniProt. The related categories struct_ref_seq and struct_ref_seq_dif also list differences between the sequences of the deposited structure and the sequences in the database. Two prominent examples of such differences are point mutations and/or expression tags.
- db_name¶
Name of the external database, for example UNP for UniProt.
Type : str
- db_access¶
Alternative accession code for the sequence in the database pointed to by db_name.
Type : str
- GetAlignedSeq(name)¶
Returns the aligned sequence for the given name, None if the sequence does not exist.
- aligned_seqs¶
List of aligned sequences (all entries of the struct_ref_seq category mapping to this struct_ref).
- class MMCifInfoStructRefSeq¶
An aligned range of residues between a sequence in a reference database and the deposited sequence.
- align_id¶
Uniquely identifies every struct_ref_seq item in the mmCIF file.
Type : str
- seq_begin¶
- seq_end¶
The starting point (1-based) and end point of the aligned range in the deposited sequence, respectively.
Type : int
- db_begin¶
- db_end¶
The starting point (1-based) and end point of the aligned range in the database sequence, respectively.
Type : int
- difs¶
List of differences between the deposited sequence and the sequence in the database.
- chain_name¶
Chain name of the polymer in the mmCIF file.
- class MMCifInfoStructRefSeqDif¶
A particular difference between the deposited sequence and the sequence in the database.
- rnum¶
The residue number (1-based) of the residue in the deposited sequence.
Type : int
- details¶
A textual description of the difference, e.g. point mutation, expression tag, purification artifact.
Type : str
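The three struct_ref classes nest: each MMCifInfoStructRef holds aligned_seqs, and each aligned range holds difs. The sketch below walks that hierarchy and flattens it into rows, using only the documented attributes db_name, db_access, aligned_seqs, chain_name, difs, rnum and details. `collect_differences` is a hypothetical helper and the accession code in the stand-in data is made up.

```python
from types import SimpleNamespace


def collect_differences(struct_refs):
    """Flatten struct_ref -> aligned_seqs -> difs into a list of tuples."""
    rows = []
    for ref in struct_refs:
        for aln in ref.aligned_seqs:
            for dif in aln.difs:
                rows.append(
                    (ref.db_name, ref.db_access, aln.chain_name, dif.rnum, dif.details)
                )
    return rows


# Stand-in objects; real ones would be taken from the struct_refs of an info object.
demo_refs = [
    SimpleNamespace(
        db_name="UNP",
        db_access="P00000",
        aligned_seqs=[
            SimpleNamespace(
                chain_name="A",
                difs=[SimpleNamespace(rnum=5, details="expression tag")],
            )
        ],
    )
]
for row in collect_differences(demo_refs):
    print(row)
```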
- class MMCifInfoRevisions¶
Revision history of a PDB entry. If you find a ‘?’ somewhere, this means ‘not set’.
- date_original¶
The date when this entry was seen in PDB for the very first time. This is not necessarily the release date.
Type : str
- first_release¶
Index + 1 of the revision releasing this entry. If the value is 0, was not set yet.
Type : int
- SetDateOriginal(date)¶
Set the date, when this entry first entered the PDB.
Parameters: date (str) – database_pdb_rev.date_original
- GetDateOriginal()¶
Retrieve database_pdb_rev.date_original.
Returns: database_pdb_rev.date_original as str in format ‘yyyy-mm-dd’
- AddRevision(num, date, status)¶
Add a new iteration to the history.
Parameters: - num (int) – database_pdb_rev.num
- date (str) – database_pdb_rev.date
- status (str) – database_pdb_rev.status
- GetSize()¶
Returns: Number of revisions as int
- GetDate(i)¶
Parameters: i (int) – Index of revision
Returns: database_pdb_rev.date as str
- GetNum(i)¶
Parameters: i (int) – Index of revision
Returns: database_pdb_rev.num as int
- GetStatus(i)¶
Parameters: i (int) – Index of revision
Returns: database_pdb_rev.status as str
- GetLastDate()¶
The date of the latest revision.
Returns: date as str
- GetFirstRelease()¶
Points to the revision releasing the entry.
Returns: Index as int
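The index-based accessors GetSize(), GetNum(i), GetDate(i) and GetStatus(i) lend themselves to a simple loop. The sketch below turns a revisions object into a list of rows. It assumes indices run from 0 to GetSize() - 1. `revision_table` is a hypothetical helper and `FakeRevisions` is a made-up stand-in that only mimics the documented method names.

```python
class FakeRevisions:
    """Stand-in exposing the documented accessor names over a list of rows."""

    def __init__(self, rows):
        self._rows = rows  # each row: (num, date, status)

    def GetSize(self):
        return len(self._rows)

    def GetNum(self, i):
        return self._rows[i][0]

    def GetDate(self, i):
        return self._rows[i][1]

    def GetStatus(self, i):
        return self._rows[i][2]


def revision_table(revisions):
    """Collect (num, date, status) for every revision, assuming 0-based indices."""
    return [
        (revisions.GetNum(i), revisions.GetDate(i), revisions.GetStatus(i))
        for i in range(revisions.GetSize())
    ]


# Made-up history with two revisions.
demo_revisions = FakeRevisions([(1, "2010-01-01", "full release"), (2, "2011-06-15", "?")])
for row in revision_table(demo_revisions):
    print(row)
```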