Queries

OpenStructure includes a powerful query system that allows you to perform custom selections on a molecular entity in a convenient way.

The Basics

It is often convenient to highlight or focus on certain parts of the structure. Selections are mainly carried out by calling the Select method, made available by EntityHandle and EntityView objects, with a query string. Queries are written in a dedicated mini-language. For example, to select all arginine residues of a given structure, one would write:

arginines = model.Select('rname=ARG')

A simple selection query (called a predicate) consists of a property (here, rname), a comparison operator (here, =) and an argument (here, ARG). The return value of a call to the EntityHandle.Select() method is always an EntityView. The EntityView always contains a full hierarchy of elements, never standalone separated elements. In the above example, the EntityView called arginines will contain all chains from the structure called model that have at least one arginine. In turn, these chains will contain all residues that have been identified as arginines. The residues themselves will contain references to all of their atoms. Of course, queries are not limited to selecting residues based on their type; it is also possible to select atoms by name:

c_betas = model.Select('aname=CB')

As before, c_betas is an instance of an EntityView object and contains a full hierarchy. The main difference from the previous example is that the selected residues do not contain a list of all of their atoms but only the C-beta. These examples clarify why the name ‘view’ was chosen for the result of a Select() statement. It represents a reduced, restricted way of looking at the original structure.
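
To make the "full hierarchy" idea concrete, here is a minimal pure-Python sketch that does not use OpenStructure at all. The nested dictionary `toy_entity` and the helper `select_view` are hypothetical stand-ins for an entity and for Select(); they only mimic the behaviour described above and say nothing about how the library implements it.

```python
# Toy stand-in for an entity: chain name -> list of (residue name, atom names).
# This is NOT the OpenStructure data model, only an illustration.
toy_entity = {
    "A": [("ARG", ["N", "CA", "CB"]), ("GLY", ["N", "CA"])],
    "B": [("GLY", ["N", "CA"])],
}


def select_view(entity, atom_matches):
    """Return the chains/residues/atoms for which atom_matches(...) is true.

    Chains and residues without any matching atom are left out, so the
    result is always a complete chain -> residue -> atom hierarchy.
    """
    view = {}
    for cname, residues in entity.items():
        kept = []
        for rname, atoms in residues:
            matching = [a for a in atoms if atom_matches(rname, a)]
            if matching:
                kept.append((rname, matching))
        if kept:
            view[cname] = kept
    return view


# Analogue of 'rname=ARG': every atom of an arginine matches.
arginines = select_view(toy_entity, lambda rname, aname: rname == "ARG")
# Analogue of 'aname=CB': only the C-beta of each residue matches.
c_betas = select_view(toy_entity, lambda rname, aname: aname == "CB")
print(arginines)
print(c_betas)
```

In both cases chain B is absent from the result, because none of its atoms match; in the second case the arginine is present but carries only its CB atom.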

Both selection statements used so far take strings as their arguments. However, selection properties such as rnum (residue number) take numeric arguments. With numeric arguments it is possible to use the identity operators (!= and =). It is also possible to compare them using the >, <, >= and <= operators. For example, the 20 N-terminal residues of a protein can be selected with:

n_term = model.Select('rnum<=20')

If you want to supply arguments with special characters, they need to be put in quotation marks (' or "). For instance, this is needed for any chain name containing spaces, as in:

model.Select('cname=" "')

Almost any name can be quoted with QueryQuoteName().

Combining predicates

Selection predicates can be combined with boolean operators. For example, you might want to select all C atoms with crystallographic occupancy higher than 50. These atoms must match the predicate ele=C in addition to the predicate occ>50. In the query language this can be written as:

model.Select('ele=C and occ>50')

Compact forms are available for several selection statements. For example, to select all arginines and asparagines, one could use a statement like:

arg_and_asn = model.Select('rname=ARG or rname=ASN')

However, this is rather cumbersome as it requires the word rname to be typed twice. Since the only difference between the two parts of the selection is the argument that follows the word rname, the statement can also be written in an abbreviated form:

arg_and_asn = model.Select('rname=ARG,ASN')

Another example: to select residues with numbers in the range 130 to 200, one could use the following statement:

center = model.Select('rnum>=130 and rnum<=200')

or alternatively use the much nicer syntax:

center = model.Select('rnum=130:200')

This last statement is completely equivalent to the previous one. This syntax can be used when the selection statement requires a range of integer values within a closed interval.
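
When the values come from Python data rather than being typed by hand, the two compact forms can be assembled with small string helpers. The sketch below is plain Python; `any_of` and `in_range` are hypothetical names, not part of OpenStructure, and the values are assumed to need no quoting.

```python
def any_of(prop, values):
    """Build the abbreviated 'prop=V1,V2,...' form from a list of values."""
    return prop + "=" + ",".join(str(v) for v in values)


def in_range(prop, first, last):
    """Build the closed-interval 'prop=first:last' form for integers."""
    if first > last:
        raise ValueError("first must not be larger than last")
    return f"{prop}={first}:{last}"


print(any_of("rname", ["ARG", "ASN"]))  # rname=ARG,ASN
print(in_range("rnum", 130, 200))       # rnum=130:200
```

The resulting strings can then be passed to Select() exactly like the hand-written queries above.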

Distance Queries

The query

around_center = model.Select('5 <> {0,0,0}')

selects all chains, residues and atoms that lie within 5 Å of the origin of the reference system ({0,0,0}). The <> operator is called the ‘within’ operator. Instead of a point, a within statement can also take another selection statement applied to the same entity; the result is a view containing all chains, residues and atoms within the given radius of that selection. Square brackets are used to delimit the inner query statement.

around_hem = model.Select('5 <> [rname=HEM]')
model.Select('5 <> [rname=HEM and ele=C] and rname!=HEM')
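
If the radius, the point or the inner query are held in Python variables, the within statements can be built with string formatting. This is a small sketch in plain Python; `within_point` and `within_selection` are hypothetical helper names, and it is an assumption that non-integer radii and coordinates are accepted in the same positions as the integers shown above.

```python
def within_point(radius, point):
    """Query string for everything within `radius` of an (x, y, z) point."""
    x, y, z = point
    # Doubled braces produce the literal { and } of the point syntax.
    return f"{radius:g} <> {{{x:g},{y:g},{z:g}}}"


def within_selection(radius, inner_query):
    """Query string for everything within `radius` of an inner query."""
    return f"{radius:g} <> [{inner_query}]"


print(within_point(5, (0, 0, 0)))         # 5 <> {0,0,0}
print(within_selection(5, "rname=HEM"))   # 5 <> [rname=HEM]
```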

Bonds and Queries

When an EntityView is generated by a selection, it includes by default only bonds for which both connected atoms satisfy the query statement. This can be changed by passing the parameters EXCLUSIVE_BONDS or NO_BONDS when calling the Select method. EXCLUSIVE_BONDS adds bonds to the EntityView when at least one of the two atoms falls within the boundary of the selection. NO_BONDS suppresses the bond inclusion step completely.
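
The three bond-inclusion rules can be modelled in a few lines of plain Python. The function below is only an illustration of the rules as stated above: `bonds_in_view` and its `mode` labels are hypothetical and do not correspond to any OpenStructure call.

```python
def bonds_in_view(bonds, selected, mode="default"):
    """Return the bonds a view would contain under the given mode.

    bonds    -- iterable of (atom_a, atom_b) pairs
    selected -- set of atoms that satisfy the query
    mode     -- "default", "exclusive" or "none" (hypothetical labels
                standing for no flag, EXCLUSIVE_BONDS and NO_BONDS)
    """
    if mode == "none":
        return []
    if mode == "exclusive":
        # At least one of the two bonded atoms is selected.
        return [b for b in bonds if b[0] in selected or b[1] in selected]
    if mode == "default":
        # Both bonded atoms are selected.
        return [b for b in bonds if b[0] in selected and b[1] in selected]
    raise ValueError(f"unknown mode: {mode}")


bonds = [("N", "CA"), ("CA", "CB"), ("CA", "C")]
selected = {"CA", "CB"}
print(bonds_in_view(bonds, selected))               # only CA-CB
print(bonds_in_view(bonds, selected, "exclusive"))  # all three bonds
print(bonds_in_view(bonds, selected, "none"))       # no bonds
```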

Whole Residue Queries

If the parameter MATCH_RESIDUES is passed when the Select method is called, the resulting EntityView will include whole residues for which at least one atom satisfies the query. This means that if at least one atom in the residue falls within the boundaries of the selection, all atoms of the residue will be included in the View.
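
The effect of whole-residue matching can be sketched in plain Python as an expansion step applied after the atom-level selection. The helper `match_residues` and its data layout are hypothetical; they model the rule described above, not the library's implementation.

```python
def match_residues(residues, selected):
    """Expand an atom selection to whole residues.

    residues -- dict mapping a residue label to the list of its atom names
    selected -- set of (residue label, atom name) pairs matching the query
    """
    expanded = set()
    for label, atoms in residues.items():
        # One matching atom is enough to pull in the whole residue.
        if any((label, atom) in selected for atom in atoms):
            expanded.update((label, atom) for atom in atoms)
    return expanded


residues = {"ARG1": ["N", "CA", "CB"], "GLY2": ["N", "CA"]}
only_cb = {("ARG1", "CB")}
print(sorted(match_residues(residues, only_cb)))
```

Here the single selected CB atom brings in all three atoms of ARG1, while GLY2, which has no selected atom, stays out.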

More Query Usage

The Select methods of the EntityHandle and EntityView classes form the high-level interface for queries. Given a query string, they return a view consisting of a subset of the elements.

Queries also offer a second interface: IsAtomSelected(), IsResidueSelected() and IsChainSelected() take an atom, residue or chain as their argument and return true or false, depending on whether the element fulfills the predicates.
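
The per-element interface is convenient when you want to filter elements yourself instead of building a view. The sketch below is duck-typed: `atoms_matching` is a hypothetical helper that works with any object offering IsAtomSelected(), and `EveryCarbon` is a made-up stand-in used only so the example runs without OpenStructure.

```python
def atoms_matching(query, atoms):
    """Return the atoms for which query.IsAtomSelected(atom) is true."""
    return [atom for atom in atoms if query.IsAtomSelected(atom)]


class EveryCarbon:
    """Hypothetical stand-in for a query object, used only for this demo."""

    def IsAtomSelected(self, atom):
        # Atoms are plain name strings here, not atom handles.
        return atom.startswith("C")


print(atoms_matching(EveryCarbon(), ["N", "CA", "C", "O"]))  # ['CA', 'C']
```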

Generic Properties in Queries

The query language can also be used for numeric generic properties (i.e. float and int), but the syntax is slightly different. To access any generic properties, it needs to be specified that they are generic and at which level they are defined. Therefore, all generic properties start with a g, followed by an a, r or c for atom, residue or chain level respectively.

# set generic properties for atom, residue, chain
atom_handle.SetFloatProp("testpropatom", 5.2)
resid_handle.SetFloatProp("testpropres", 1.1)
chain_handle.SetIntProp("testpropchain", 10)

# query statements
sel_a = e.Select("gatestpropatom<=10.0")
sel_r = e.Select("grtestpropres=1.0")
sel_c = e.Select("gctestpropchain>5")

Since generic properties do not need to be defined for all parts of an entity (e.g. a property could be set on one single AtomHandle only), the query statement will throw an error unless you specify a default value in the query statement, which can be done using a ‘:’ character:

# if one or more atoms have no generic properties

sel = e.Select("gatestprop=5")
# this will throw an error

# you can specify a default value:
sel = e.Select("gatestprop:1.0=5")
# this will run through smoothly and use 1.0 as
# the default value for all atoms that do not
# have the generic property 'testprop'

This way, you are alerted whenever a generic property is not set for all atoms, residues or chains, unless you specify a default value. Be careful when you do supply one: every element that lacks the property is then silently treated as if it had the default value.
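
The naming rule for generic properties (a g, then a, r or c, then the property name, optionally followed by ‘:’ and a default) can be captured in a small string helper. This is a plain-Python sketch; `generic_predicate` and `LEVEL_PREFIX` are hypothetical names, not part of OpenStructure.

```python
LEVEL_PREFIX = {"atom": "ga", "residue": "gr", "chain": "gc"}


def generic_predicate(level, name, op, value, default=None):
    """Build a predicate on a generic property, e.g. 'gatestprop:1.0=5'."""
    if level not in LEVEL_PREFIX:
        raise ValueError(f"level must be one of {sorted(LEVEL_PREFIX)}")
    prop = LEVEL_PREFIX[level] + name
    if default is not None:
        # Default used for elements that do not carry the property.
        prop += f":{default}"
    return f"{prop}{op}{value}"


print(generic_predicate("atom", "testpropatom", "<=", 10.0))       # gatestpropatom<=10.0
print(generic_predicate("atom", "testprop", "=", 5, default=1.0))  # gatestprop:1.0=5
```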

Available Properties

The following properties may be used in predicates. The type is given for each property.

Properties of Chains

cname/chain (str): Chain name

Properties of Residues

rname (str): Residue name

rnum (int): Residue number. Currently only the numeric part is honored.

rtype (str): Residue type as given by the DSSP code (e.g. “H” for alpha helix, “E” for extended), “helix” for all helix types, “ext” or “strand” for all beta sheets or “coil” for any type of coil (see SecStructure).

rindex (int): Index of residue handle in chain. This index is the same for views and handles.

peptide (bool): Whether the residue is peptide linking.

protein (bool): Whether the residue is considered to be part of a connected protein.

rbfac (float): Average B (temperature) factor of residue

ligand (bool): Whether the residue is a ligand.

water (bool): Whether the residue is water.

Properties of Atoms

aname (str): Atom name

ele (str): Atom element

occ (float): Atom occupancy

abfac (float): Atom B-factor

x (float): X coordinate of atom.

y (float): Y coordinate of atom.

z (float): Z coordinate of atom.

aindex (int): Atom index

ishetatm (bool): Whether the atom is a heteroatom.

acharge (float): Atom charge

Query API documentation

In the following, the interface of the query class is described. In general, you will not have to use this interface but will pass the query as string directly.

class Query(string='')

Create a new query from the given string. The constructor does not throw any error in case the query contains syntax errors. Use valid to check whether the query was valid.

string

The string used to create the query.

Type:

str

valid

True, when the query could be compiled without syntax errors.

Type:

bool

error

If valid is false, this attribute contains the error message. Otherwise it is set to an empty string.

Type:

str

IsAtomSelected(atom)

Returns true when the given atom handle fulfills the predicates, false otherwise.

IsChainSelected(chain)

Returns true when at least one of the atoms of the chain matches the predicates.

IsResidueSelected(residue)

Returns true when at least one atom of the residue matches the predicates.
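
Because the constructor does not raise on syntax errors, code that builds Query objects directly may want to check valid explicitly. The helper below is a duck-typed sketch: `checked` is a hypothetical name, and it relies only on the string, valid and error attributes documented above.

```python
def checked(query):
    """Return `query` unchanged, or raise ValueError if it did not compile."""
    if not query.valid:
        raise ValueError(f"invalid query {query.string!r}: {query.error}")
    return query
```

Wrapping each newly created query in such a check turns a silently invalid query into an immediate, descriptive error.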

class QueryFlag

Defines flags to change default behaviour of Select queries. Possible values:

  • EXCLUSIVE_BONDS - adds bonds to the EntityView when at least one of the two bonded atoms was selected (by default both must be selected)

  • NO_BONDS - do not include any bonds (by default bonds are included)

  • MATCH_RESIDUES - include all atoms of a residue if any of its atoms is selected (by default only selected atoms are included)

QueryQuoteName(name)

Adds appropriate quotation marks to use name in a Query. For instance the following code snippet would generate a query string selecting all chains from a list of chain names:

query = "cname=" + ','.join(mol.QueryQuoteName(name) for name in names)

Note that there is some limited support for wildcard symbols (* and ?), which may have undesired effects in a query such as the one built by the code above.

Parameters:

name (str) – Name to put in quotation marks

Return type:

str

Raises:

Exception if name cannot be used in queries. This happens if name includes both ' and " or if it ends with \.
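
Given the note on wildcard symbols, it can be worth flagging names that contain * or ? before joining them into a query. The following is a plain-Python sketch; `names_with_wildcards` is a hypothetical helper and makes no claim about how OpenStructure itself interprets these characters.

```python
WILDCARDS = ("*", "?")


def names_with_wildcards(names):
    """Return the names that contain a wildcard symbol (* or ?)."""
    return [n for n in names if any(w in n for w in WILDCARDS)]


print(names_with_wildcards(["A", "B*", "?"]))  # ['B*', '?']
```

An empty result means none of the names contains a wildcard symbol, so the joined query should match the listed names literally.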