Queries¶
OpenStructure includes a powerful query system that allows you to perform custom selections on a molecular entity in a convenient way.
The Basics¶
It is often convenient to highlight or focus on certain parts of a structure.
Selections are carried out mainly by calling the Select method of EntityHandle and EntityView
objects with a query string. Queries are written in a dedicated mini-language. For example, to select all arginine residues of a given structure, one would write:
arginines = model.Select('rname=ARG')
A simple selection query (called a predicate) consists of a property (here,
rname), a comparison operator (here, =) and an argument (here, ARG). The
return value of a call to the EntityHandle.Select() method is always an
EntityView. The EntityView always contains a full hierarchy of elements, never
standalone separated elements. In the above example, the EntityView called
arginines will contain all chains from the structure called model that have at
least one arginine. In turn, these chains will contain all residues that have
been identified as arginines. The residues themselves will contain references
to all of their atoms. Queries are not limited to selecting residues based on
their type; it is also possible to select atoms by name:
c_betas = model.Select('aname=CB')
As before, c_betas is an instance of an EntityView object and contains a full
hierarchy. The main difference to the previous example is that the selected
residues do not contain a list of all of their atoms but only the C-beta. These
examples clarify why the name ‘view’ was chosen for the result of a Select()
statement. It represents a reduced, restrained way of looking at the original
structure.
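The pruning behaviour described above can be mimicked in plain Python. The following is only a toy sketch: the nested dictionary and the select_atoms helper are hypothetical stand-ins, not the OpenStructure data model or API. It shows how a view keeps the chain/residue/atom hierarchy while dropping everything that does not lead to a matching atom.

```python
# Toy stand-in for a structure: chain name -> residue (name, number) -> atom names.
# This is NOT the OpenStructure data model, only an illustration of the idea.
model = {
    "A": {("ARG", 1): ["N", "CA", "CB", "C"], ("GLY", 2): ["N", "CA", "C"]},
    "B": {("GLY", 1): ["N", "CA", "C"]},
}


def select_atoms(entity, atom_predicate):
    """Return a pruned copy of the hierarchy holding only matching atoms."""
    view = {}
    for chain_name, residues in entity.items():
        kept_residues = {}
        for residue_key, atoms in residues.items():
            kept_atoms = [atom for atom in atoms if atom_predicate(atom)]
            if kept_atoms:  # residues without a matching atom are dropped
                kept_residues[residue_key] = kept_atoms
        if kept_residues:  # chains without a matching residue are dropped
            view[chain_name] = kept_residues
    return view


# Rough analogue of model.Select('aname=CB') on the toy data.
c_betas_view = select_atoms(model, lambda name: name == "CB")
```

In this toy example only chain A survives, holding a single residue that lists just its C-beta, while the original model dictionary is left untouched.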
Both selection statements used so far take strings as their arguments. However, selection properties such as rnum (residue number) take numeric arguments. Numeric arguments can be compared with the equality operators (= and !=) as well as with the >, <, >= and <= operators. For example, the 20 N-terminal residues of a protein can be selected with:
n_term = model.Select('rnum<=20')
If you want to supply arguments with special characters, they need to be put in quotation marks (' or "). For instance, this is needed for any chain name containing spaces, as in:
model.Select('cname=" "')
Almost any name can be quoted with QueryQuoteName().
Combining predicates¶
Selection predicates can be combined with boolean operators. For example, you might want to select all C atoms with crystallographic occupancy higher than 50. These atoms must match the predicate ele=C in addition to the predicate occ>50. In the query language this can be written as:
model.Select('ele=C and occ>50')
Compact forms are available for several selection statements. For example, to select all arginines and asparagines, one could use a statement like:
arg_and_asn = model.Select('rname=ARG or rname=ASN')
However, this is rather cumbersome as it requires the word rname to be typed twice. Since the only difference between the two parts of the selection is the argument that follows the word rname, the statement can also be written in an abbreviated form:
arg_and_asn = model.Select('rname=ARG,ASN')
Another example: to select residues with numbers in the range 130 to 200, one could use the following statement:
center = model.Select('rnum>=130 and rnum<=200')
or alternatively use the much nicer syntax:
center = model.Select('rnum=130:200')
This last statement is completely equivalent to the previous one. This syntax can be used when the selection statement requires a range of integer values within a closed interval.
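To make the equivalence between the compact and the long forms concrete, here is a small string-rewriting sketch. The expand_compact helper is hypothetical and not part of OpenStructure; it assumes simple, unquoted predicates of the form property=argument and only illustrates what the abbreviations stand for.

```python
def expand_compact(predicate):
    """Rewrite a compact 'prop=a,b' or 'prop=lo:hi' predicate in its long form.

    Hypothetical helper for illustration; handles unquoted arguments only.
    """
    prop, _, argument = predicate.partition("=")
    if ":" in argument:
        # Closed integer interval: both end points are included.
        low, high = argument.split(":")
        return f"{prop}>={low} and {prop}<={high}"
    if "," in argument:
        # Comma-separated list: any of the listed values may match.
        return " or ".join(f"{prop}={value}" for value in argument.split(","))
    return predicate


long_list = expand_compact("rname=ARG,ASN")
long_range = expand_compact("rnum=130:200")
```

Here long_list and long_range reproduce the two long-form statements shown above.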
Distance Queries¶
The query
around_center = model.Select('5 <> {0,0,0}')
selects all chains, residues and atoms that lie within 5 Å of the origin of the reference system ({0,0,0}). The <> operator is called the ‘within’ operator. Instead of a point, the within operator can also take another selection statement applied to the same entity, returning a view that contains all chains, residues and atoms within the given radius of that selection. Square brackets are used to delimit the inner query statement.
around_hem = model.Select('5 <> [rname=HEM]')
model.Select('5 <> [rname=HEM and ele=C] and rname!=HEM')
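The geometry behind the within operator can be sketched in a few lines of plain Python. The atom names, coordinates and the within helper below are invented for illustration and do not use the OpenStructure API. Treating the cutoff as inclusive is an assumption; the test points are chosen so that no atom sits exactly on the boundary.

```python
import math

# Hypothetical atom positions in Angstrom (not taken from a real structure).
atoms = {
    "CA": (1.0, 2.0, 2.0),   # 3.0 A from the origin
    "CB": (7.0, 0.0, 0.0),   # 7.0 A from the origin, 3.0 A from FE
    "FE": (10.0, 0.0, 0.0),  # plays the role of the inner selection
}


def within(atoms, radius, centers):
    """Names of atoms lying at most `radius` away from any of the `centers`."""
    return [
        name
        for name, position in atoms.items()
        if any(math.dist(position, center) <= radius for center in centers)
    ]


# Analogue of '5 <> {0,0,0}': distance to a fixed point.
around_origin = within(atoms, 5.0, [(0.0, 0.0, 0.0)])

# Analogue of '5 <> [...]': distance to the atoms of an inner selection.
around_fe = within(atoms, 5.0, [atoms["FE"]])

# The reference atoms are themselves within the radius, which is why the
# last query above adds 'and rname!=HEM' to exclude them.
neighbours_only = [name for name in around_fe if name != "FE"]
```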
Bonds and Queries¶
When an EntityView is generated by a selection, it includes by default only
bonds for which both connected atoms satisfy the query statement. This can be
changed by passing the parameters EXCLUSIVE_BONDS or NO_BONDS when calling the
Select method. EXCLUSIVE_BONDS adds bonds to the EntityView when at least one
of the two atoms falls within the boundary of the selection. NO_BONDS
suppresses the bond inclusion step completely.
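The three bond-handling modes can be compared side by side with a toy example. The bond list and the bonds_in_view helper are hypothetical; the mode strings merely mirror the names of the flags and are not how the flags are passed to Select.

```python
# Atoms picked by some query, and the bonds of the full structure.
selected = {"N", "CA"}
bonds = [("N", "CA"), ("CA", "C"), ("C", "O")]


def bonds_in_view(bonds, selected, mode="default"):
    """Return the bonds a view would keep under the given (toy) mode."""
    if mode == "no_bonds":
        return []  # bond inclusion step skipped entirely
    if mode == "exclusive_bonds":
        # at least one of the two bonded atoms is selected
        return [b for b in bonds if b[0] in selected or b[1] in selected]
    # default: both bonded atoms must be selected
    return [b for b in bonds if b[0] in selected and b[1] in selected]


default_bonds = bonds_in_view(bonds, selected)
exclusive_bonds = bonds_in_view(bonds, selected, "exclusive_bonds")
no_bonds = bonds_in_view(bonds, selected, "no_bonds")
```

In this sketch the CA-C bond appears only in the exclusive mode, because C itself was not selected.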
Whole Residue Queries¶
If the parameter MATCH_RESIDUES is passed when the Select method is called, the
resulting EntityView will include whole residues for which at least one atom
satisfies the query. This means that if at least one atom in the residue falls
within the boundaries of the selection, all atoms of the residue will be
included in the view.
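As a rough illustration of the difference, the sketch below applies the same atom predicate with and without whole-residue matching. The residue dictionary and the select helper with its match_residues argument are hypothetical and only imitate the documented behaviour.

```python
# Toy residues: label -> atom names (not OpenStructure objects).
residues = {
    "ARG1": ["N", "CA", "CB"],
    "GLY2": ["N", "CA"],
}


def select(residues, atom_predicate, match_residues=False):
    """Keep matching atoms, or whole residues if match_residues is set."""
    view = {}
    for label, atoms in residues.items():
        hits = [atom for atom in atoms if atom_predicate(atom)]
        if hits:
            # One matching atom is enough to pull in the full residue.
            view[label] = list(atoms) if match_residues else hits
    return view


atoms_only = select(residues, lambda name: name == "CB")
whole_residues = select(residues, lambda name: name == "CB", match_residues=True)
```

GLY2 has no C-beta, so it is absent from both results; ARG1 is reduced to its C-beta in the first case and kept complete in the second.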
More Query Usage¶
The high-level interface for queries is the Select method of the EntityHandle
and EntityView classes. By passing in a query string, a view consisting of a
subset of the elements is returned.
Queries also offer a second interface: IsAtomSelected(), IsResidueSelected() and IsChainSelected() take an atom, residue or chain as their argument and return true or false, depending on whether the element fulfills the predicates.
Generic Properties in Queries¶
The query language can also be used for numeric generic properties (i.e. float and int), but the syntax is slightly different. To access any generic properties, it needs to be specified that they are generic and at which level they are defined. Therefore, all generic properties start with a g, followed by an a, r or c for atom, residue or chain level respectively.
# set generic properties for atom, residue, chain
atom_handle.SetFloatProp("testpropatom", 5.2)
resid_handle.SetFloatProp("testpropres", 1.1)
chain_handle.SetIntProp("testpropchain", 10)
# query statements
sel_a = e.Select("gatestpropatom<=10.0")
sel_r = e.Select("grtestpropres=1.0")
sel_c = e.Select("gctestpropchain>5")
Since generic properties do not need to be defined for all parts of an entity
(e.g. it could be specified for one single AtomHandle), the query statement
will throw an error unless you specify a default value in the query statement,
which can be done using a ‘:’ character:
# if one or more atoms have no generic properties
sel = e.Select("gatestprop=5")
# this will throw an error
# you can specify a default value:
sel = e.Select("gatestprop:1.0=5")
# this will run through smoothly and use 1.0 as
# the default value for all atoms that do not
# have the generic property 'testprop'
Without a default value, the error alerts you whenever a generic property is not set for all atoms, residues or chains. Specifying a default value suppresses that check, so choose it with care.
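Why the default deserves care can be seen in a small stand-alone sketch. The list of property dictionaries and the select_by_prop helper are hypothetical and only imitate the documented semantics: a missing property raises an error unless a default is supplied, and the default then takes part in the comparison.

```python
# One dict of generic properties per atom; the second atom has none set.
atom_props = [{"testprop": 5.0}, {}, {"testprop": 2.0}]

_MISSING = object()  # sentinel meaning "no default given"


def select_by_prop(atoms, name, target, default=_MISSING):
    """Indices of atoms whose property equals target (toy illustration)."""
    hits = []
    for index, props in enumerate(atoms):
        if name in props:
            value = props[name]
        elif default is _MISSING:
            raise KeyError(f"atom {index} has no generic property '{name}'")
        else:
            value = default  # the default is compared like a real value
        if value == target:
            hits.append(index)
    return hits


safe_default = select_by_prop(atom_props, "testprop", 5.0, default=1.0)
risky_default = select_by_prop(atom_props, "testprop", 5.0, default=5.0)
```

With a default of 1.0 only the first atom is selected. With a default of 5.0 the atom lacking the property is selected as well, which is rarely what is intended.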
Available Properties¶
The following properties may be used in predicates. The type is given for each property.
Properties of Chains¶
cname/chain (str): Chain name
Properties of Residues¶
rname (str): Residue name
rnum (int): Residue number. Currently only the numeric part is honored.
rtype (str): Residue type as given by the DSSP code (e.g. “H” for alpha
helix, “E” for extended), “helix” for all helix types, “ext” or “strand” for
all beta sheets or “coil” for any type of coil (see SecStructure).
rindex (int): Index of residue handle in chain. This index is the same for views and handles.
peptide (bool): Whether the residue is peptide linking.
protein (bool): Whether the residue is considered to be part of a connected protein.
rbfac (float): Average B (temperature) factor of residue
ligand (bool): Whether the residue is a ligand.
water (bool): Whether the residue is water.
Properties of Atoms¶
aname (str): Atom name
ele (str): Atom element
occ (float): Atom occupancy
abfac (float): Atom B-factor
x (float): X coordinate of atom.
y (float): Y coordinate of atom.
z (float): Z coordinate of atom.
aindex (int): Atom index
ishetatm (bool): Whether the atom is a heterogeneous atom.
acharge (float): Atom charge
Query API documentation¶
In the following, the interface of the query class is described. In general, you will not have to use this interface but will pass the query as a string directly.
- class Query(string='')¶
Create a new query from the given string. The constructor does not throw any error in case the query contains syntax errors. Use valid to check whether the query was valid.
- string¶
The string used to create the query.
- Type: str
- valid¶
True when the query could be compiled without syntax errors.
- Type: bool
- error¶
If valid is false, this attribute contains the error message. Otherwise it is set to an empty string.
- Type: str
- IsAtomSelected(atom)¶
Returns true when the given atom handle fulfills the predicates, false if not.
- IsChainSelected(chain)¶
Returns true if at least one of the atoms of the chain matches the predicates.
- IsResidueSelected(residue)¶
Returns true when at least one atom of the residue matches the predicates.
- class QueryFlag¶
Defines flags to change default behaviour of Select queries. Possible values:
EXCLUSIVE_BONDS - adds bonds to the EntityView when at least one of the two bonded atoms was selected (by default both must be selected)
NO_BONDS - do not include any bonds (by default bonds are included)
MATCH_RESIDUES - include all atoms of a residue if any of its atoms is selected (by default only selected atoms are included)
- QueryQuoteName(name)¶
Adds appropriate quotation marks to use name in a Query. For instance, the following code snippet would generate a query string selecting all chains from a list of chain names:
query = "cname=" + ','.join(mol.QueryQuoteName(name) for name in names)
Note that there is some limited support of wild card symbols (* and ?) which may have undesired effects in a query such as the code above.
- Parameters: name (str) – Name to put in quotation marks
- Return type: str
- Raises: Exception if name cannot be used in queries. This happens if name includes both ' and " or if it ends with \.
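The documented constraints on quotable names can be illustrated with a stand-alone sketch. The quote_name function below is a hypothetical re-creation based only on the rules stated above. It is not the implementation of QueryQuoteName, which may pick quotation marks differently, and it raises ValueError as an assumed concrete exception type.

```python
def quote_name(name):
    """Hypothetical stand-in for QueryQuoteName, following the documented rules."""
    if name.endswith("\\"):
        raise ValueError("name must not end with a backslash")
    if "'" in name and '"' in name:
        raise ValueError("name must not contain both ' and \"")
    # Assumption: prefer double quotes, fall back to single quotes
    # when the name itself contains a double quote.
    quote = "'" if '"' in name else '"'
    return f"{quote}{name}{quote}"


# Chain names including a blank one and one containing a single quote.
names = ["A", " ", "B'"]
query = "cname=" + ",".join(quote_name(name) for name in names)
```

The resulting string combines the quoting rule with the comma-separated list syntax, so one predicate covers all listed chains.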