:mod:`~ost.bindings.hhblits` - Search related sequences in databases
================================================================================

Introduction
--------------------------------------------------------------------------------

HHblits is a sequence search tool like BLAST but able to find more distant
homologs. This is achieved by aligning hidden Markov models (HMMs) in the search
process, as opposed to the `sequence-sequence` searches in BLAST. HHblits
searches an HMM database, which is usually provided, with an HMM representing
your target sequence. This query HMM needs to be calculated before the actual
search.

The software suite needed for HHblits can be found on
`github <https://github.com/soedinglab/hh-suite>`_. Alternatively, the
deprecated HHblits 2.x suite can be found
`here <http://wwwuser.gwdg.de/~compbiol/data/hhsuite/releases/all/>`_.

On HHblits Versions
--------------------------------------------------------------------------------

The binding for HHblits 3 has internally been forked from the HHblits 2
binding. The binding for HHblits 2 is considered deprecated and no longer
receives bugfixes. The documentation here refers to the HHblits 3 binding.
The different bindings can be imported explicitly:

.. code-block:: python

  from ost.bindings import hhblits2
  from ost.bindings import hhblits3

Alternatively, you can let OpenStructure figure out the HHblits version you're
using and import the appropriate binding for you under the base name hhblits.
This assumes the HHblits binary (hhblits) to be in your path and raises an
error otherwise.

.. code-block:: python

  from ost.bindings import hhblits

Examples
--------------------------------------------------------------------------------

A typical search: get an instance of the binding, build the search profile from
the query sequence, run the search and iterate over the results.

.. code-block:: python

  from ost import seq
  from ost.bindings import hhblits3

  # Create a SequenceHandle, alternatively you can load any sequence in
  # FASTA format using ost.io.LoadSequence(<PATH_TO_SEQUENCE>)
  query_seq = seq.CreateSequence('Query',
                                 'TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN')

  # set up the search environment
  # let's assume a default installation with hhblits binary at
  # <PATH_TO_HHBLITS_INSTALL>/bin/hhblits
  hh = hhblits3.HHblits(query_seq, '<PATH_TO_HHBLITS_INSTALL>')

  # now create a search profile for the query sequence against uniclust30
  # which you can get with instructions in the hh-suite user guide (github)
  # <PATH_TO_DB>/uniclust30_2018_08 is just the prefix common to
  # all db files, so `ls <PATH_TO_DB>/uniclust30_2018_08*` would list all
  # of them
  a3m_file = hh.BuildQueryMSA(nrdb='<PATH_TO_DB>/uniclust30_2018_08')

  # let's load the data in the a3m_file and display the generated
  # multiple sequence alignment; note that ParseA3M is not a class method
  # but a module function
  with open(a3m_file) as a3m_fh:
      a3m_data = hhblits3.ParseA3M(a3m_fh)
  print(a3m_data['msa'])

  # search time! we just search against uniclust again, but any HHblits db
  # works here, e.g. one built from all the sequences in PDB
  hit_file = hh.Search(a3m_file, '<PATH_TO_DB>/uniclust30_2018_08')

  # let's have a look at the results
  with open(hit_file) as hit_fh:
      header, hits = hhblits3.ParseHHblitsOutput(hit_fh)
      for hit in hits:
          print(hit.aln)

  # cleanup
  hh.Cleanup()

Binding API
--------------------------------------------------------------------------------

.. automodule:: ost.bindings.hhblits3
   :synopsis: Search related sequences in databases
   :members:

..  LocalWords:  HHblits homologs