io
- Input and Output of Sequences, Structures and Maps¶
The io module deals with the input and output of entities
, alignments
,
sequences
, images
. Importers for common file formats containing molecules
such as PDB, SDF and CHARMM trajectory files are available. Sequence and
alignment file formats such as FASTA and CLUSTALW are supported as well as
various image data (e.g. png, dm3) and density map files (e.g. CCP4, MRC).
Molecular Structures¶
Loading Molecular Structures¶
The io
modules offer several ways to load molecular structures
depending on your requirements. The most general way is offered by
LoadEntity()
, which will automatically detect the file format based
on the file extension.
- LoadEntity(filename, format='auto')¶
Load entity from disk. If format is set to ‘auto’, the function guesses the filetype based on the extension of the file e.g. files ending in ‘.pdb’, ‘.ent’, ‘.ent.gz’ and ‘.pdb.gz’ will automatically be loaded as PDB files. For files without or exotic extensions, the format can be set explicitly as the second parameter.
# Recognizes SDF file by file extension ent = io.LoadEntity('file.sdf') # In this case, there is no file extensions, so you have to say it's a # SDF file explicitly ent = io.LoadEntity('file', 'sdf')
For a list of file formats supported by
LoadEntity()
, see Supported Structure File Formats.- Raises:
IOUnknownFormatException
if the format string supplied is not recognized or the file format can not be detected based on the file extension.IOException
if the import fails due to an erroneous or inexistent file.
Some of the formats have a dedicated function that allows you to tweak many
parameters that affect the import. PDB files can be loaded with LoadPDB()
and mmCIF files with LoadMMCIF()
(this also gives you access to the
MMCifInfo
class). It offers tighter control over the exact loading
behaviour.
- LoadPDB(filename, restrict_chains='', no_hetatms=None, fault_tolerant=None, load_multi=False, quack_mode=None, join_spread_atom_records=None, calpha_only=None, profile='DEFAULT', remote=False, remote_repo='pdb', dialect=None, seqres=False, bond_feasibility_check=None, read_conect=False)¶
Load PDB file from disk and return one or more entities. Several options allow to customize the exact behaviour of the PDB import. For more information on these options, see IO Profiles for entity importer.
Residues are flagged as ligand if they are mentioned in a HET record.
- Parameters:
filename (
str
) – File to be loadedrestrict_chains (
str
) – If not an empty string, only chains listed in the string will be imported.no_hetatms (
bool
) – If set to True, HETATM records will be ignored. Overrides the value ofIOProfile.no_hetatms
fault_tolerant (
bool
) – Enable/disable fault-tolerant import. If set, overrides the value ofIOProfile.fault_tolerant
.load_multi (
bool
) – If set to True, a list of entities will be returned instead of only the first. This is useful when dealing with multi-PDB files.quack_mode (
bool
) – Guess the chemical class for unknown residues based on its atoms and connectivity. If set, overrides the value ofIOProfile.quack_mode
.join_spread_atom_records (
bool
) – If set, overrides the value ofIOProfile.join_spread_atom_records
.calpha_only (
bool
) – When set to true, forces the importer to only load atoms named CA. If set, overrides the value ofIOProfile.calpha_only
.profile (
str
/ost.io.IOProfile
) – Aggregation of flags and algorithms to control import and processing of molecular structures. Can either be astr
specifying one of the default profiles [‘DEFAULT’, ‘SLOPPY’, ‘CHARMM’, ‘STRICT’] or an actual object of typeost.io.IOProfile
. If astr
defines a default profile,IOProfile.processor
is set toost.conop.RuleBasedProcessor
with the currently setost.conop.CompoundLib
available asost.conop.GetDefaultLib()
. If noost.conop.CompoundLib
is available,ost.conop.HeuristicProcessor
is used instead. See IO Profiles for entity importer for more info.remote (
bool
) – If set to True, the method tries to load the pdb from the remote repository given as remote_repo. The filename is then interpreted as the entry id as further specified for the remote_repo parameter.remote_repo (
str
) – Remote repository to fetch structure if remote is True. Must be one in [‘pdb’, ‘smtl’, ‘pdb_redo’]. In case of ‘pdb’ and ‘pdb_redo’, the entry must be given as lower case pdb id, which loads the deposited assymetric unit (e.g. ‘1ake’). In case of ‘smtl’, the entry must also specify the desired biounit (e.g. ‘1ake.1’).dialect (
str
) – Specifies the particular dialect to use. If set, overrides the value ofIOProfile.dialect
seqres (
bool
) – Whether to read SEQRES records. If set to True, the loaded entity and seqres entry will be returned as a tuple.bond_feasibility_check (
bool
) – Flag forIOProfile.processor
. If turned on, atoms are only connected by bonds if they are within a reasonable distance (as defined byost.conop.IsBondFeasible()
). If set, overrides the value ofost.conop.Processor.check_bond_feasibility
read_conect (
bool
) – By default, OpenStructure doesn’t read CONECT statements in a pdb file. Reason is that they’re often missing and we prefer to rely on the chemical component dictionary from the PDB. However, there may be cases where you really want these CONECT statements. For example novel compounds with no entry in the chemical component dictionary. Setting this to True has two effects: 1) CONECT statements are read and blindly applied 2) The processor does not connect any pair of HETATOM atoms in order to not interfer with the CONECT statements.
- Return type:
EntityHandle
or a list thereof if load_multi is True.- Raises:
IOException
if the import fails due to an erroneous or inexistent file
- PDBStrToEntity(pdb_string, profile=IOProfile(), process=False)¶
Load entity from a string in PDB format. By default the entity is loaded with an empty IO Profile and is not processed with the
Processor
, even if one is available in the IO Profile.- Parameters:
pdb_string – A PDB file as a string.
profile – The IO Profile to read the entity with. For more information on the IO Profiles available, see IO Profiles for entity importer.
process – If set to True, run the
Processor
contained in the IO Profile.
- Return type:
To get an entity equivalent to one loaded with
LoadPDB()
, set the profile and process arguments as follows:with open('protein.pdb') as pdb_fd: pdb_str = pdb.read() ent = io.PDBStrToEntity(pdb_str, ost.io.profiles['DEFAULT'], True)
- SDFStrToEntity(sdf_string)¶
Load entity from a string in SDF format.
- Parameters:
pdb_string – A SDF file as a string.
- Return type:
- class ost.io.OMF¶
Experimental data format capable of storing minimally required information of a
ost.mol.EntityHandle
in a heavily compressed manner (OMF - OpenStructure Minimal Format). Shares lots of ideas with the mmtf or binaryCIF formats but comes with no dependencies attached.- static FromEntity(ent)¶
Generates
ost.io.OMF
object starting from anost.mol.EntityHandle
. ent is is assigned as assymetric unit, no biounits are defined (i.e.GetBU()
raises an error in any case`).- Parameters:
ent (
ost.mol.EntityHandle
) – Structural data- Returns:
The created
ost.io.OMF
object
- static FromFile(filepath)¶
Generates
ost.io.OMF
object from a file stored withToFile()
.- Parameters:
filepath (
str
) – The file- Returns:
The created
ost.io.OMF
object
- static FromMMCIF(ent, info)¶
Generates
ost.io.OMF
object starting from anost.mol.EntityHandle
with an associatedMMCifInfo
(you get this stuff withost.io.LoadMMCIF()
). ent is assigned as assymetric unit and the biounits are fetched from info.- Parameters:
ent (
ost.mol.EntityHandle
) – Structural datainfo (
MMCifInfo
) – Biounits are fetched from here
- Returns:
The created
ost.io.OMF
object
- static FromBytes(bytes_string)¶
Generates
ost.io.OMF
object from abytes
object created withToBytes()
.- Parameters:
bytes_string (
bytes
) – The super heavily compressed OMF structure
- ToFile(filepath)¶
Stores object to file
- Parameters:
filepath (
str
) – The file
- ToBytes()¶
Stores object to
bytes
string- Returns:
The
bytes
string
- GetAU()¶
Getter for assymetric unit
- Returns:
The assymetric unit as
ost.mol.EntityHandle
- GetAUChain(chain_name)¶
Getter for single chain in the assymetric unit
- Parameters:
chain_name (
str
) – The name of the chain you want- Returns:
ost.mol.EntityHandle
of the AU only containing the specified chain
- GetBU(bu_idx)¶
Getter for Biounit
- Parameters:
bu_idx (
int
) – The index from the biounit as it has been read from info inFromMMCIF()
.- Returns:
The Biounit as
ost.mol.EntityHandle
- Raises:
RuntimeError
if bu_idx doesn’t exist
Loading Molecular Structures From Remote Repositories¶
LoadPDB()
already provides access to selected structural databases in
the internet when enabling the remote flag. Predefined
ost.io.remote.RemoteRepository
objects are available as
from ost.io import remote
repo_name = 'smtl'
repo = remote.REMOTE_REPOSITORIES.get(repo_name)
# url for entry with id 1ake.1
print(repo.URLForID('1ake.1'))
where repo_name can be one of [‘pdb’, ‘cif’, ‘pdb_redo’, ‘smtl’]. Instead of explicit access, you can directly fetch data using:
- RemoteGet(id, from_repo='pdb')¶
Invokes
RemoteRepository.Get()
on predefined repositories (‘pdb’, ‘smtl’, ‘cif’, ‘pdb_redo’)- Parameters:
from_repo (
str
) – One of the predefined repositories
- RemoteLoad(id, from_repo='pdb')¶
Invokes
RemoteRepository.Load()
on predefined repositories (‘pdb’, ‘smtl’, ‘cif’, ‘pdb_redo’)- Parameters:
from_repo (
str
) – One of the predefined repositories
- class RemoteRepository(name, url_pattern, type, id_transform='upper')¶
A remote repository represents a structural database accessible through the internet, e.g. the PDB or SWISS-MODEL template library.
- Parameters:
name (
str
) – Name of the repositoryurl_pattern (
str
) – URL pattern for repository. Required format is described inURLForID()
type (
str
) – Data format to expect at resolved URL must be in (‘pdb’, ‘cif’)id_transform (
str
) – Transformation to apply to ID before resolving URL inURLForID()
. Must be in (‘lower’, ‘upper’)
- Get(id)¶
Resolves URL with
URLForID()
, dumps the content in a temporary file and returns its path.- Parameters:
id (
str
) – ID to resolve
- Load(id)¶
Resolves URL with
URLForID()
and directly loads/returns the accordingost.mol.EntityHandle
. Loading invokes theost.io.LoadPDB()
/ost.io.LoadMMCIF()
with default parameterization. If you need custom settings, you might want to consider to callGet()
and do the loading manually.- Parameters:
id (
str
) – ID to resolve
- URLForID(id)¶
Resolves URL given url_pattern and id_transform provided at object initialization. The url_pattern must contain substring ‘$ID’. Given id, the URL to the structure gets constructed by applying id_transform and inserting it at the location of ‘$ID’. e.g. ‘https://files.rcsb.org/view/$ID.pdb’ given 1ake as id and ‘upper’ as id_transform resolves to: ‘https://files.rcsb.org/view/1AKE.pdb’
Saving Molecular Structures¶
Saving a complete entity or a view is a matter of calling
SaveEntity()
.
ent = io.LoadEntity('protein.pdb')
# save full entity
io.SaveEntity(ent, 'full.pdb')
# only save C-alpha atoms
io.SaveEntity(ent.Select('aname=CA and peptide=true'), 'calpha.pdb')
SavePDB()
provides a simple way to save several entities into one
file:
ent = io.LoadEntity('protein.pdb')
# Save complete entity
io.SavePDB(ent, 'full.pdb')
# Save chain A and chain B separately
io.SavePDB([ent.Select('cname=A'), ent.Select('cname=B')], 'split.pdb')
- SaveEntity(ent, filename, format='auto')¶
Save entity to disk. If format is set to ‘auto’, the function guesses the filetype based on the file extension, otherwise the supplied format is checked against the available export plugins.
- Parameters:
ent (
EntityHandle
orEntityView
) – The entity to be savedfilename (string) – The filename
format (string) – Name of the format
- Raises:
IOUnknownFormatException
if the format string supplied is not recognized or the file format can not be detected based on the file extension.
- SavePDB(models, filename, dialect=None, pqr=False, profile='DEFAULT')¶
Save entity or list of entities to disk. If a list of entities is supplied the PDB file will be saved as a multi PDB file. Each of the entities is wrapped into a MODEL/ENDMDL pair.
If the atom number exceeds 99999, ‘*’ is used.
- Parameters:
models – The entity or list of entities (handles or views) to be saved
filename (string) – The filename
- Raises:
IOException if the restrictions of the PDB format are not satisfied (with the exception of atom numbers, see above):
Chain names with more than one character
Atom positions with coordinates outside range [-999.99, 9999.99]
Residue names longer than three characters
Atom names longer than four characters
Numeric part of
ost.mol.ResNum
outside range [-999, 9999]Alternative atom indicators longer than one character
- EntityToPDBStr(ent, profile=IOProfile())¶
Return entity as a string in PDB format.
- Parameters:
entity – The
EntityHandle
orEntityView
profile – The IO Profile to read the entity with. For more information on the IO Profiles available, see IO Profiles for entity importer.
- Raises:
IOException if the restrictions of the PDB format are not satisfied (see
ost.io.SavePDB()
)- Return type:
string.
- EntityToSDFStr(ent)¶
Return entity as a string in SDF format.
- Parameters:
entity – The
EntityHandle
orEntityView
- Return type:
string.
Sequences and Alignments¶
Loading sequence or alignment files¶
- LoadSequence(filename, format='auto')¶
Load sequence data from disk. If format is set to ‘auto’, the function guesses the filetype based on the extension of the file. Files ending in ‘.fasta’, ‘.aln’ will automatically be loaded.
- Parameters:
filename (string) – The filename
format (string) – Name of the format
- Return type:
For files with non-standard extensions, the format can be set explicitly specifying the format parameter.
# recognizes FASTA file by file extension myseq = io.LoadSequence('seq.fasta') # for obtaining a SequenceList seqlist = io.LoadSequenceList('seqs.fasta') # or for multiple alignments (here from CLUSTAL) aln = io.LoadAlignment('algnm.aln', format="clustal")
For a list of file formats supported by
LoadSequence()
see Supported Sequence File Formats.- Raises:
IOUnknownFormatException
if the format string supplied is not recognized or the file format can not be detected based on the file extension.IOException
if the import fails due to an erroneous or inexistent file.
- LoadSequenceList(filename, format='auto')¶
For a description of how to use
LoadSequenceList()
please refer toLoadSequence()
. For a list of file formats supported byLoadSequenceList()
see Supported Sequence File Formats.- Parameters:
filename (string) – The filename
format (string) – Name of the format
- Return type:
- LoadAlignment(filename, format='auto')¶
For a description of how to use
LoadAlignment()
please refer toLoadSequence()
. For a list of file formats supported byLoadAlignment()
see Supported Sequence File Formats.- Parameters:
filename (string) – The filename
format (string) – Name of the format
- Return type:
- LoadSequenceProfile(filename, format='auto')¶
Load sequence profile data from disk. If format is set to ‘auto’, the function guesses the filetype based on the extension of the file. Files ending in ‘.hhm’ (output of HHblits) and ‘.pssm’ (ASCII Table (PSSM) output of PSI-BLAST as generated with blastpgp and flag -Q) will automatically be loaded.
For files with non-standard extensions, the format can be set explicitly specifying the format parameter.
# recognizes hhm file by file extension myprof = io.LoadSequenceProfile('myhmm.hhm') # recognizes pssm file by file extension myprof = io.LoadSequenceProfile('myprof.pssm') # to override format myprof = io.LoadSequenceProfile('myfile', format='hhm') myprof = io.LoadSequenceProfile('myfile', format='pssm')
For a list of file formats supported by
LoadSequenceProfile()
see Supported Sequence Profile File Formats.- Parameters:
filename (string) – The filename
format (string) – Name of the format
- Return type:
- Raises:
IOUnknownFormatException
if the format string supplied is not recognized or the file format can not be detected based on the file extension.IOException
if the import fails due to an erroneous or inexistent file.
- AlignmentFromString(data, format)¶
Load alignment from string.
The format argument is mandatory. For a list of supported formats, see Supported Sequence File Formats.
- Parameters:
data (string) – The alignment
format (string) – Name of the format
- Return type:
- SequenceFromString(data, format)¶
Load sequence from string.
The format argument is mandatory. For a list of supported formats, see Supported Sequence File Formats.
- Parameters:
data (string) – The sequence
format (string) – Name of the format
- Return type:
- SequenceListFromString(data, format)¶
Load a list of sequences from string.
The format argument is mandatory. For a list of supported formats, see Supported Sequence File Formats.
- Parameters:
data (string) – The list of sequences
format (string) – Name of the format
- Return type:
- SequenceProfileFromString(data, format)¶
Load sequence profile from string.
The format argument is mandatory.
- Parameters:
data (string) – String containing the data you would read from a file with specified format.
format (string) – Name of the format. Can either be “hhm” or “pssm”.
- Return type:
Saving Sequence Data¶
- SaveSequence(sequence, filename, format='auto')¶
Saving sequence data is performed by calling
SaveSequence()
. For files with non-standard extensions, the format can be set explicitly specifying the ‘format’ parameter.# recognizes FASTA file by file extension io.SaveSequence(myseq, 'seq.fasta') # for saving a SequenceList io.SaveSequenceList(seqlist, 'seqlist.fasta') # or for multiple alignments (here in FASTA format) io.SaveAlignment(aln, 'aln.fasta')
For a list of file formats supported by
SaveSequence()
see Supported Sequence File Formats.- Parameters:
sequence (
SequenceHandle
) – The sequencefilename (string) – The filename
format (string) – Name of the format
- Raises:
IOUnknownFormatException
if the format string supplied is not recognized or the file format can not be detected based on the file extension.IOException
if the import fails due to an erroneous or inexistent file.
- SaveSequenceList(seq_list, filename, format='auto')¶
For a desription of how to use
SaveSequenceList()
please refer toSaveSequence()
. For a list of file formats supported bySaveSequenceList()
see Supported Sequence File Formats.- Parameters:
seq_list (
SequenceList
) – The sequence listfilename (string) – The filename
format (string) – Name of the format
- SaveAlignment(aln, filename, format='auto')¶
For a desription of how to use
SaveAlignment()
please refer toSaveSequence()
.For a list of file formats supported by
SaveAlignment()
see Supported Sequence File Formats.- Parameters:
aln (
AlignmentHandle
) – The alignmentfilename (string) – The filename
format (string) – Name of the format
- AlignmentToString(ali, format)¶
Return alignment as a string.
The format argument is mandatory. For a list of supported formats see Supported Sequence File Formats.
- Parameters:
ali (
AlignmentHandle
) – The alignmentformat (string) – Name of the format
- Return type:
string
- SequenceToString(seq, format)¶
Return sequence as a string.
The format argument is mandatory. For a list of supported formats see Supported Sequence File Formats.
- Parameters:
seq (
SequenceHandle
) – The sequenceformat (string) – Name of the format
- Return type:
string
- SequenceListToString(seq_list, format)¶
Return sequence list as a string.
The format argument is mandatory. For a list of supported formats see Supported Sequence File Formats.
- Parameters:
seq_list – The sequence list
format (string) – Name of the format
- Return type:
string
Density Maps¶
Loading Density Maps¶
- LoadImage(filename)¶
Load density map from disk with the extension being guessed by the function.
- Parameters:
filename (string) – The filename
- LoadImage(filename, format)¶
Load density map from disk. If no format is given, the function guesses the filetype based on the extension of the file. If the extension is unknown or not present the filetype will be guessed based on the content of the file if possible.
- Parameters:
filename (string) – The filename
format – The file format
- Raises:
IOUnknownFormatException
if the format supplied is not recognized or the file format can not be detected based on the file extension and content.IOException
if the import fails due to an erroneous or inexistent file.
# recognizes mrc file by file extension ent = io.LoadImage('file.mrc') # it is always possible to explicitly set the image format # DAT file explicitly ent = io.LoadImage('file', Dat())
For a list of file formats supported by
LoadImage()
, see Supported Image File Formats.
Saving Density Maps¶
- SaveImage(image, filename)¶
Save density map to disk with the function guessing the filetype based on the file extension.
- SaveImage(image, filename, format)¶
Save density map to disk. If no format is set, the function guesses the filetype based on the file extension.
- Parameters:
image (
IMageHandle
) – The density map to be savedfilename (string) – The filename
format – The file format
- Raises:
IOUnknownFormatException
if the file format can not be detected based on the file extension
For a list of file formats supported by
SaveImage()
, see Supported Image File Formats.# load density map image = io.LoadImage('density_map.ccp4') # save density map io.SaveImage(image, 'new_map.map', CCP4())
Stereochemical Parameters¶
In order to check the structure for some stereo-chemical and steric clashes
before computing the lDDT scores it is required to pass parameter file based on
Engh and Huber parameters, and on the atomic radii as defined in the Cambridge
Structural Database. OpenStructure ships with default file called
stereo_chemical_props.txt located in $OST_ROOT/share/openstructure
directory. A function ReadStereoChemicalPropsFile()
is used to
read this file.
- ReadStereoChemicalPropsFile(filename='', check=True)¶
Read stereochemical parameters - if not provided a local version will be used.
- Parameters:
filename (
str
) – The path to the parameter file that will be used. If set to “”, it reads the default file shipped with OpenStructure.check (
bool
) – Raise an error when any of the resulting tables are empty.
- Returns:
Object containing stereochemical parameters
- Return type:
StereoChemicalProps
- GetStereoChemicalPropsFile()¶
Get the default path to the stereochemical paramteres file.