mmCIF File Format
The mmCIF file format is an alternative container for structural entities, also
provided by the PDB. Here we describe how to load those files and how to deal
with the information they provide beyond the common PDB format (MMCifInfo, MMCifInfoCitation, MMCifInfoTransOp, MMCifInfoBioUnit, MMCifInfoStructDetails, MMCifInfoObsolete, MMCifInfoStructRef, MMCifInfoStructRefSeq, MMCifInfoStructRefSeqDif, MMCifInfoRevisions).
Loading mmCIF Files
-
LoadMMCIF
(filename, fault_tolerant=None, calpha_only=None, profile='DEFAULT', remote=False, seqres=False, info=False)¶ Load an mmCIF file from disk and return one or more entities. Several options allow customizing the exact behaviour of the mmCIF import. For more information on these options, see IO Profiles for entity importer.
Residues are flagged as ligand if they are mentioned in a HET record.
Parameters: - fault_tolerant – Enable/disable fault-tolerant import. If set, overrides the value of IOProfile.fault_tolerant.
- remote – If set to True, the method tries to load the file from the remote PDB repository www.pdb.org. The filename is then interpreted as the PDB ID.
- seqres – Whether to read SEQRES records. If set to True, the SEQRES entries are returned as second item, after the loaded entity.
- info – Whether to return an info container with the other output. If set to True, a MMCifInfo object is returned as last item.
Raises: IOException if the import fails due to an erroneous or non-existent file.
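As a minimal sketch of how the optional return values line up, the helper below requests both the SEQRES records and the info container, so the call yields three items in the order entity, SEQRES, info. The helper name `load_all` is hypothetical, and the loader is passed in as an argument (in an OpenStructure session this would be `ost.io.LoadMMCIF`) so that the snippet does not depend on an installed OpenStructure.

```python
def load_all(load_mmcif, filename):
    """Load an mmCIF file together with its SEQRES records and info container.

    load_mmcif is expected to behave like ost.io.LoadMMCIF; it is passed in
    explicitly so this sketch has no hard dependency on OpenStructure.
    """
    # With seqres=True and info=True the documented return order is
    # (entity, seqres, info): SEQRES second, the MMCifInfo object last.
    entity, seqres, info = load_mmcif(filename, seqres=True, info=True)
    return {"entity": entity, "seqres": seqres, "info": info}
```

With OpenStructure available, a call would presumably look like `load_all(io.LoadMMCIF, "my_structure.cif")` after `from ost import io`; the file name is a placeholder.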
Categories Available
The following categories of an mmCIF file are considered by the reader:
- atom_site: Used to build the EntityHandle
- entity: Involved in setting ChainType of chains
- entity_poly: Involved in setting ChainType of chains
- citation: Goes into MMCifInfoCitation
- citation_author: Goes into MMCifInfoCitation
- exptl: Goes into MMCifInfo as method.
- refine: Goes into MMCifInfo as resolution, r_free and r_work.
- pdbx_struct_assembly: Used for MMCifInfoBioUnit.
- pdbx_struct_assembly_gen: Used for MMCifInfoBioUnit.
- pdbx_struct_oper_list: Used for MMCifInfoBioUnit.
- struct: Details about a structure, stored in MMCifInfoStructDetails.
- struct_conf: Stores secondary structure information (practically helices) in the EntityHandle
- struct_sheet_range: Stores secondary structure information for sheets in the EntityHandle
- pdbx_database_PDB_obs_spr: Verbose information on obsoleted/superseded entries, stored in MMCifInfoObsolete
- struct_ref: Stored in MMCifInfoStructRef
- struct_ref_seq: Stored in MMCifInfoStructRefSeq
- struct_ref_seq_dif: Stored in MMCifInfoStructRefSeqDif
- database_pdb_rev (mmCIF dictionary version < 5): Stored in MMCifInfoRevisions
- pdbx_audit_revision_history and pdbx_audit_revision_details (mmCIF dictionary version >= 5): Used to fill MMCifInfoRevisions
Notes:
- Structures in mmCIF format can have two chain names. The “new” chain name, extracted from atom_site.label_asym_id, is used to name the chains in the EntityHandle. The “old” (author-provided) chain name is extracted from atom_site.auth_asym_id for the first atom of the chain. It is added as a string property named “pdb_auth_chain_name” to the ChainHandle. The mapping is also stored in MMCifInfo, accessible via GetMMCifPDBChainTr() and GetPDBMMCifChainTr(), if SEQRES records are read in LoadMMCIF() and a non-empty SEQRES record exists for that chain (this should exclude ligands and water).
- Molecular entities in mmCIF are identified by an entity.id. Each chain is mapped to an ID in MMCifInfo, accessible via GetMMCifEntityIdTr().
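The chain name mappings described in the notes can be collected into a small table. The following is a sketch only: the function name `chain_name_table` is hypothetical, and `info` is assumed to be an MMCifInfo object (or anything offering the same two getters). Empty strings, which the getters return when no mapping is stored, are turned into `None` for readability.

```python
def chain_name_table(info, cif_chain_names):
    """Return (mmCIF name, PDB author name, entity ID) for each chain name.

    info is expected to offer GetMMCifPDBChainTr() and GetMMCifEntityIdTr()
    as documented for MMCifInfo.
    """
    table = []
    for cif_name in cif_chain_names:
        # Each getter returns an empty string when no mapping is stored.
        pdb_name = info.GetMMCifPDBChainTr(cif_name)
        entity_id = info.GetMMCifEntityIdTr(cif_name)
        table.append((cif_name, pdb_name or None, entity_id or None))
    return table
```

The chain names would typically be taken from the loaded entity; keep in mind that the PDB chain name translation is only filled when SEQRES records were read during loading.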
Info Classes
Information from mmCIF files that goes beyond structural data is kept in a
special container, the MMCifInfo class. Here is a detailed description
of the available annotation.
-
class
MMCifInfo
¶ This is the container for all bits of non-molecular data pulled from an mmCIF file.
-
citations
¶ Stores a list of citations (MMCifInfoCitation).
Also available as GetCitations().
-
biounits
¶ Stores a list of biounits (MMCifInfoBioUnit).
Also available as GetBioUnits().
-
method
¶ Stores the experimental method used to create the structure.
Also available as GetMethod(). May also be modified by SetMethod().
-
resolution
¶ Stores the resolution of the crystal structure. Set to 0 if the loaded mmCIF file provides no value.
Also available as GetResolution(). May also be modified by SetResolution().
-
r_free
¶ Stores the R-free value of the crystal structure. Set to 0 if the loaded mmCIF file provides no value.
Also available as GetRFree(). May also be modified by SetRFree().
-
r_work
¶ Stores the R-work value of the crystal structure. Set to 0 if the loaded mmCIF file provides no value.
Also available as GetRWork(). May also be modified by SetRWork().
-
operations
¶ Stores the operations needed to transform a crystal structure into a bio unit.
Also available as GetOperations(). May also be modified by AddOperation().
-
struct_details
¶ Stores details about the structure in a MMCifInfoStructDetails object.
Also available as GetStructDetails(). May also be modified by SetStructDetails().
-
struct_refs
¶ Lists all links to external databases in the mmCIF file.
-
revisions
¶ Stores a simple history of a PDB entry.
Also available as GetRevisions(). May be extended by AddRevision().
Type: MMCifInfoRevisions
-
AddCitation
(citation)¶ Add a citation to the citation list of an info object.
Parameters: citation (MMCifInfoCitation) – Citation to be added.
-
AddAuthorsToCitation
(id, authors)¶ Adds a list of authors to a specific citation.
Parameters: - id (str) – Identifier of the citation.
- authors (StringList) – List of authors.
-
AddBioUnit
(biounit)¶ Add a bio unit to the bio unit list of an info object. If the id of biounit already exists in the set of assemblies, both will be merged. This means that the chain and operations lists will be concatenated and the interval lists (operationsintervalls, chainintervalls) will be updated.
Parameters: biounit (MMCifInfoBioUnit) – Bio unit to be added.
-
SetResolution
(resolution)¶ See resolution
-
GetResolution
()¶ See resolution
-
AddOperation
(operation)¶ See operations
-
GetOperations
()¶ See operations
-
SetStructDetails
(details)¶ See struct_details
-
GetStructDetails
()¶ See struct_details
-
AddMMCifPDBChainTr
(cif_chain_id, pdb_chain_id)¶ Set up a translation for a certain mmCIF chain name to the traditional PDB chain name.
Parameters: - cif_chain_id (str) – atom_site.label_asym_id
- pdb_chain_id (str) – atom_site.auth_asym_id
-
GetMMCifPDBChainTr
(cif_chain_id)¶ Get the translation of a certain mmCIF chain name to the traditional PDB chain name.
Parameters: cif_chain_id (str) – atom_site.label_asym_id
Returns: atom_site.auth_asym_id as str (empty if no mapping)
-
AddPDBMMCifChainTr
(pdb_chain_id, cif_chain_id)¶ Set up a translation for a certain PDB chain name to the mmCIF chain name.
Parameters: - pdb_chain_id (str) – atom_site.auth_asym_id
- cif_chain_id (str) – atom_site.label_asym_id
-
GetPDBMMCifChainTr
(pdb_chain_id)¶ Get the translation of a certain PDB chain name to the mmCIF chain name.
Parameters: pdb_chain_id (str) – atom_site.auth_asym_id
Returns: atom_site.label_asym_id as str (empty if no mapping)
-
AddMMCifEntityIdTr
(cif_chain_id, entity_id)¶ Set up a translation for a certain mmCIF chain name to the mmCIF entity ID.
Parameters: - cif_chain_id (str) – atom_site.label_asym_id
- entity_id (str) – atom_site.label_entity_id
-
GetMMCifEntityIdTr
(cif_chain_id)¶ Get the translation of a certain mmCIF chain name to the mmCIF entity ID.
Parameters: cif_chain_id (str) – atom_site.label_asym_id
Returns: atom_site.label_entity_id as str (empty if no mapping)
-
AddRevision
(num, date, status)¶ Add a new iteration to the revision history. See MMCifInfoRevisions.AddRevision().
-
SetRevisionsDateOriginal
(date)¶ Set the date when this entry first entered the PDB. Ignored if it was set in the past. See MMCifInfoRevisions.SetDateOriginal().
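To illustrate how the MMCifInfo attributes might be read in practice, here is a small sketch that condenses an info object into a dictionary. The function name `summarize_info` is hypothetical. It relies only on the attributes documented above and treats 0 as "not set" for resolution, r_free and r_work, following the descriptions of those attributes.

```python
def summarize_info(info):
    """Condense the core attributes of an MMCifInfo-like object into a dict."""

    def or_none(value):
        # 0 is the documented placeholder for a value missing in the file.
        return value if value != 0 else None

    return {
        "method": info.method,
        "resolution": or_none(info.resolution),
        "r_free": or_none(info.r_free),
        "r_work": or_none(info.r_work),
        "n_citations": len(info.citations),
        "n_biounits": len(info.biounits),
    }
```

Mapping the 0 placeholder to `None` is a design choice of this sketch; it keeps a missing R-free value from being mistaken for a perfect one further down the line.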
-
-
class
MMCifInfoCitation
¶ This stores citation information from an input file.
-
id
¶ Stores an internal identifier for a citation. If not provided, it is an empty string.
Also available as GetID(). May also be modified by SetID().
-
cas
¶ Stores a Chemical Abstracts Service identifier if available. If not provided, it is an empty string.
Also available as GetCAS(). May also be modified by SetCas().
-
isbn
¶ Stores the ISBN code, presumably for cited books. If not provided, it is an empty string.
Also available as GetISBN(). May also be modified by SetISBN().
-
published_in
¶ Stores the book or journal title of a publication. Should take the full title, no abbreviations. If not provided, it is an empty string.
Also available as GetPublishedIn(). May also be modified by SetPublishedIn().
-
volume
¶ Supposed to store volume information for journals. Since the volume number is not always a simple integer, it is stored as a string. If not provided, it is an empty string.
Also available as GetVolume(). May also be modified by SetVolume().
-
page_first
¶ Stores the first page of a publication. Since page numbers are not always simple integers, it is stored as a string. If not provided, it is an empty string.
Also available as GetPageFirst(). May also be modified by SetPageFirst().
-
page_last
¶ Stores the last page of a publication. Since page numbers are not always simple integers, it is stored as a string. If not provided, it is an empty string.
Also available as GetPageLast(). May also be modified by SetPageLast().
-
doi
¶ Stores the Digital Object Identifier as used by doi.org for a cited document. If not provided, it is an empty string.
Also available as GetDOI(). May also be modified by SetDOI().
-
pubmed
¶ Stores the PubMed accession number. If not provided, it is set to 0.
Also available as GetPubMed(). May also be modified by SetPubmed().
-
year
¶ Stores the publication year. If not provided, it is set to 0.
Also available as GetYear(). May also be modified by SetYear().
-
title
¶ Stores a title. If not provided, it is set to an empty string.
Also available as GetTitle(). May also be modified by SetTitle().
-
authors
¶ Stores a StringList of authors.
Also available as GetAuthorList(). May also be modified by SetAuthorList().
-
GetPublishedIn
()¶ See published_in
-
SetPublishedIn
(title)¶ See published_in
-
GetPageFirst
()¶ See page_first
-
SetPageFirst
(first)¶ See page_first
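The citation attributes can be combined into a human-readable reference line. The sketch below is one possible layout, not a format prescribed by the library; the function name `format_citation` is hypothetical. It skips fields that carry the documented "not provided" placeholders (empty string, or 0 for the year) and reads the authors through GetAuthorList().

```python
def format_citation(citation):
    """Build a one-line reference from an MMCifInfoCitation-like object."""
    parts = []
    authors = ", ".join(citation.GetAuthorList())
    if authors:
        parts.append(authors)
    if citation.year != 0:  # 0 marks a missing publication year
        parts.append("(%d)" % citation.year)
    if citation.title:
        parts.append(citation.title + ".")
    if citation.published_in:
        journal = citation.published_in
        if citation.volume:
            journal += " " + citation.volume
        if citation.page_first:
            # Pages are strings, so they are concatenated, not computed with.
            pages = citation.page_first
            if citation.page_last:
                pages += "-" + citation.page_last
            journal += ", " + pages
        parts.append(journal + ".")
    if citation.doi:
        parts.append("doi:" + citation.doi)
    return " ".join(parts)
```

Applied to every entry of `info.citations`, this would give a compact reference list for a loaded structure.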
-
-
class
MMCifInfoTransOp
¶ This stores operations needed to transform an EntityHandle into a bio unit.
-
id
¶ A unique identifier. If not provided, it is an empty string.
-
type
¶ Describes the operation. If not provided, it is an empty string.
Also available as GetType(). May also be modified by SetType().
-
translation
¶ The translational vector. Also available as GetVector(). May also be modified by SetVector().
-
rotation
¶ The rotational matrix. Also available as GetMatrix(). May also be modified by SetMatrix().
-
GetVector
()¶ See translation
-
SetVector
(x, y, z)¶ See translation
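How a rotation matrix and a translation vector act on a coordinate can be sketched in plain Python. This assumes the usual convention for symmetry operations, x' = R·x + t (rotate first, then translate); the text above does not state the order explicitly, so treat it as an assumption. Nested lists and tuples stand in for the objects returned by GetMatrix() and GetVector(), and the function name `apply_operation` is hypothetical.

```python
def apply_operation(rotation, translation, point):
    """Apply x' = R * x + t to a 3D point.

    rotation is a 3x3 nested sequence, translation and point are
    3-element sequences; plain Python types stand in for geometry classes.
    """
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# A 90 degree rotation about the z axis followed by a shift of (1, 2, 3).
rot_z = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
moved = apply_operation(rot_z, (1, 2, 3), (1, 0, 0))
```

In the usage lines, the point (1, 0, 0) is first rotated onto the y axis and then shifted.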
-
-
class
MMCifInfoBioUnit
¶ This stores information on how a structure is to be assembled to form the bio unit.
-
id
¶ The id of a bio unit as given by the original mmCIF file.
Also available as GetID(). May also be modified by SetID().
Type: str
-
details
¶ Special aspects of the biological assembly. If not provided, it is an empty string.
Also available as GetDetails(). May also be modified by SetDetails().
-
method_details
¶ Details about the method used to determine this biological assembly.
Also available as GetMethodDetails(). May also be modified by SetMethodDetails().
-
chains
¶ Chains involved in this bio unit. If not provided, it is an empty list.
Also available as GetChainList(). May also be modified by AddChain() or SetChainList().
-
chainintervals
¶ List of intervals on the chain list. Needed if there are several sets of chains and transformations to create the bio unit. Comes as a list of tuples. The first component is the start, the second is the right border of the interval.
Also available as GetChainIntervalList(). Is automatically modified by AddChain(), SetChainList() and MMCifInfo.AddBioUnit().
-
operations
¶ Translations and rotations needed to create the bio unit. Filled with objects of class MMCifInfoTransOp.
Also available as GetOperations(). May be modified by AddOperations().
-
operationsintervalls
¶ List of intervals on the operations list. Needed if there are several sets of chains and transformations to create the bio unit. Comes as a list of tuples. The first component is the start, the second is the right border of the interval.
Also available as GetOperationsIntervalList(). Is automatically modified by AddOperations() and MMCifInfo.AddBioUnit().
-
GetMethodDetails
()¶ See method_details
-
SetMethodDetails
(details)¶ See method_details
-
SetChainList
(chains)¶ See chains; also resets chainintervalls to contain only one interval enclosing the whole chain list.
Parameters: chains (StringList) – List of chain names.
-
AddChain
(chain name)¶ See chains; also extends the right border of the last entry in chainintervalls.
-
GetChainIntervalList
()¶ See chainintervals
-
GetOperations
()¶ See operations
-
AddOperations
(list of operations)¶ See operations; also extends the right border of the last entry in operationsintervalls.
-
GetOperationsIntervalList
()¶ See operationsintervalls
-
PDBize
(asu, seqres=None, min_polymer_size=10, transformation=False)¶ Returns the biological assembly (bio unit) for an entity. The new entity is well suited to be saved as a PDB file; the function therefore tries to meet the requirement of single-character chain names. The following measures are taken:
- All ligands are put into one chain (_)
- Water is put into one chain (-)
- Each polymer gets its own chain, named A-Z 0-9 a-z.
- The description of non-polymer chains will be put into a generic string property called description on the residue level.
- Ligands that resemble a polymer but have fewer than min_polymer_size residues are assigned the same numeric residue number. The residues are distinguished by insertion code.
- Sometimes bio units exceed the coordinate system storable in a PDB file. In that case, the box around the entity will be aligned to the lower left corner of the coordinate system.
Since this function is at the moment mainly used to create biounits from mmCIF files to be saved as PDBs, the function assumes that the ChainType properties are set correctly.
Parameters: - asu (EntityHandle) – Asymmetric unit to work on. Should be created from an mmCIF file.
- seqres (SequenceList) – If set to a valid sequence list, the length of the seqres records will be used to determine if a certain chain has the minimally required length.
- min_polymer_size (int) – The minimal number of residues a polymer needs to get its own chain. Everything below that number will be sorted into the ligand chain.
- transformation (bool) – If set, return the transformation matrix used to move the bounding box of the bio unit to the lower left corner.
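The interval lists of a bio unit can be read as slice boundaries on the flat chain and operations lists. The sketch below makes two assumptions that the text above does not spell out: the right border of an interval is exclusive (as in Python slicing), and the i-th chain interval belongs to the i-th operations interval. Both function names are hypothetical, and plain lists stand in for the values returned by GetChainList(), GetChainIntervalList(), GetOperations() and GetOperationsIntervalList().

```python
def split_by_intervals(items, intervals):
    """Cut a flat list into the groups described by (start, end) tuples.

    Assumption: the right border is exclusive, as in Python slicing.
    """
    return [list(items[start:end]) for start, end in intervals]


def pair_chains_with_operations(chains, chain_intervals,
                                operations, operation_intervals):
    """Pair each group of chains with its group of operations.

    Assumption: intervals at the same position in both lists belong together.
    """
    chain_groups = split_by_intervals(chains, chain_intervals)
    operation_groups = split_by_intervals(operations, operation_intervals)
    return list(zip(chain_groups, operation_groups))
```

For a bio unit built from two sets of chains, this yields two (chains, operations) pairs, each describing which transformations apply to which chains.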
-
-
class
MMCifInfoStructDetails
¶ Holds details about the structure.
-
entry_id
¶ Identifier for a certain data block. If not provided, it is an empty string.
Also available as GetEntryID(). May also be modified by SetEntryID().
-
title
¶ A title for the structure.
Also available as GetTitle(). May also be modified by SetTitle().
-
casp_flag
¶ Tells whether this structure was a target in some competition.
Also available as GetCASPFlag(). May also be modified by SetCASPFlag().
-
descriptor
¶ Descriptor for an NDB structure or the unstructured content of a PDB COMPND record.
Also available as GetDescriptor(). May also be modified by SetDescriptor().
-
mass_method
¶ Method used to determine the molecular weight.
Also available as GetMassMethod(). May also be modified by SetMassMethod().
-
model_details
¶ Details about how the structure was determined.
Also available as GetModelDetails(). May also be modified by SetModelDetails().
-
model_type_details
¶ Details about how the type of the structure was determined.
Also available as GetModelTypeDetails(). May also be modified by SetModelTypeDetails().
-
GetDescriptor
()¶ See descriptor
-
SetDescriptor
(descriptor)¶ See descriptor
-
GetMassMethod
()¶ See mass_method
-
SetMassMethod
(method)¶ See mass_method
-
GetModelDetails
()¶ See model_details
-
SetModelDetails
(details)¶ See model_details
-
GetModelTypeDetails
()¶ See model_type_details
-
SetModelTypeDetails
(details)¶ See model_type_details
-
-
class
MMCifInfoObsolete
¶ Holds details on obsolete/ superseded structures.
-
id
¶ Type of change, either Obsolete or Supersede. Returns a string starting with an upper-case letter. Has to be set via OBSLTE or SPRSDE.
-
pdb_id
¶ ID of the replacing entry.
Also available as GetPDBID(). May also be modified by SetPDBID().
-
replace_pdb_id
¶ ID of the replaced entry.
Also available as GetReplacedPDBID(). May also be modified by SetReplacedPDBID().
-
GetReplacedPDBID
()¶ See replace_pdb_id
-
SetReplacedPDBID
(descriptor)¶ See replace_pdb_id
-
-
class
MMCifInfoStructRef
¶ Holds the information of the struct_ref category. The category describes the link of polymers in the mmCIF file to sequences stored in external databases such as UniProt. The related categories struct_ref_seq and struct_ref_seq_dif also list differences between the sequences of the deposited structure and the sequences in the database. Two prominent examples of such differences are point mutations and expression tags.
-
db_name
¶ Name of the external database, for example UNP for UniProt.
Type: str
-
db_access
¶ Alternative accession code for the sequence in the database pointed to by db_name.
Type: str
-
GetAlignedSeq
(name)¶ Returns the aligned sequence for the given name, None if the sequence does not exist.
-
aligned_seqs
¶ List of aligned sequences (all entries of the struct_ref_seq category mapping to this struct_ref).
-
-
class
MMCifInfoStructRefSeq
¶ An aligned range of residues between a sequence in a reference database and the deposited sequence.
-
align_id
¶ Uniquely identifies every struct_ref_seq item in the mmCIF file.
Type: str
-
seq_begin
¶
-
seq_end
¶ The starting point (1-based) and end point of the aligned range in the deposited sequence, respectively.
Type: int
-
db_begin
¶
-
db_end
¶ The starting point (1-based) and end point of the aligned range in the database sequence, respectively.
Type: int
-
difs
¶ List of differences between the deposited sequence and the sequence in the database.
-
chain_name
¶ Chain name of the polymer in the mmCIF file.
-
-
class
MMCifInfoStructRefSeqDif
¶ A particular difference between the deposited sequence and the sequence in the database.
-
rnum
¶ The residue number (1-based) of the residue in the deposited sequence
Type: int
-
details
¶ A textual description of the difference, e.g. point mutation, expression tag, purification artifact.
Type: str
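The three struct_ref classes nest: each MMCifInfoStructRef holds aligned sequences, and each aligned sequence holds its differences. As a sketch of how this hierarchy might be walked, the hypothetical function `collect_seq_difs` below flattens it into rows, using only the attributes documented above (struct_refs, db_name, db_access, aligned_seqs, chain_name, difs, rnum, details).

```python
def collect_seq_difs(info):
    """Flatten all sequence differences of an MMCifInfo-like object.

    Returns rows of (db_name, db_access, chain_name, rnum, details).
    """
    rows = []
    for ref in info.struct_refs:
        for aligned in ref.aligned_seqs:
            for dif in aligned.difs:
                rows.append((ref.db_name, ref.db_access,
                             aligned.chain_name, dif.rnum, dif.details))
    return rows
```

Such a table makes it easy to spot, for example, which chains carry expression tags relative to their database sequence.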
-
-
class
MMCifInfoRevisions
¶ Revision history of a PDB entry. If you find a ‘?’ somewhere, this means ‘not set’.
-
date_original
¶ The date when this entry was seen in PDB for the very first time. This is not necessarily the release date. Expected format ‘yyyy-mm-dd’.
Type: str
-
first_release
¶ Index + 1 of the revision releasing this entry. A value of 0 means it has not been set yet. It is set the first time a GetStatus() value of “full release” (mmCIF versions < 5) or “Initial release” (current mmCIF) is encountered.
Type: int
-
AddRevision
(num, date, status)¶ Add a new iteration to the history.
Parameters: - num (int) – See GetNum()
- date (str) – See GetDate()
- status (str) – See GetStatus()
Raises: Exception if num is <= the last added iteration.
-
GetSize
()¶ Returns: Number of revisions (valid revision indices are in [0, number-1]).
Return type: int
-
GetDate
(i)¶ Parameters: i (int) – Index of revision
Returns: Date the PDB revision took place. Expected format ‘yyyy-mm-dd’.
Return type: str
Raises: Exception if i out of bounds.
-
GetNum
(i)¶ Parameters: i (int) – Index of revision
Returns: Unique identifier of revision (assigned in increasing order)
Return type: int
Raises: Exception if i out of bounds.
-
GetStatus
(i)¶ Parameters: i (int) – Index of revision
Returns: The status of this revision.
Return type: str
Raises: Exception if i out of bounds.
-
GetLastDate
()¶ Returns: Date of the latest revision (‘?’ if no revision set).
Return type: str
-
SetDateOriginal
(date)¶ See date_original
-
GetDateOriginal
()¶ See date_original
-
GetFirstRelease
()¶ See first_release
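The index-based getters of MMCifInfoRevisions lend themselves to a simple loop. The sketch below, with the hypothetical function names `revision_history` and `first_release_date`, lists all revisions and looks up the date of the releasing revision. Since first_release is documented as index + 1, the index is obtained by subtracting 1, and 0 is treated as "not set".

```python
def revision_history(revisions):
    """Return a list of (num, date, status) for an MMCifInfoRevisions-like object."""
    history = []
    # Valid indices run from 0 to GetSize() - 1.
    for i in range(revisions.GetSize()):
        history.append((revisions.GetNum(i),
                        revisions.GetDate(i),
                        revisions.GetStatus(i)))
    return history


def first_release_date(revisions):
    """Return the date of the releasing revision, or None if not set."""
    first = revisions.GetFirstRelease()
    if first == 0:  # 0 means no releasing revision was encountered
        return None
    return revisions.GetDate(first - 1)  # first_release is index + 1
```

The object to pass in would be the `revisions` attribute of a loaded MMCifInfo.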
-