clustalw - Perform multiple sequence alignment
- ClustalW(seq1, seq2=None, clustalw=None, keep_files=False, nopgap=False, clustalw_option_string=False)
Runs a ClustalW multiple sequence alignment. The results are returned as an
AlignmentHandle instance.
There are two ways to use this function:
Align exactly two sequences:
- param seq1: sequence_one
- type seq1: SequenceHandle or str
- param seq2: sequence_two
- type seq2: SequenceHandle or str
The two sequences can be specified as two separate function parameters (seq1, seq2). Each parameter can be either a SequenceHandle or a str, but both parameters must have the same type.
Align two or more sequences:
- param seq1: sequence_list
- type seq1: SequenceList
- param seq2: must be None
Two or more sequences can be specified by using a SequenceList, which is passed as the first function parameter (seq1). The second parameter (seq2) must be None.
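The two calling conventions described above can be sketched as follows. This is a minimal illustration, not part of the binding: `align_pair` and `align_many` are hypothetical wrapper names, and the sketch assumes that `ost.seq.CreateSequence` and `ost.seq.CreateSequenceList` are available as in the OST `seq` module. The OST imports are deferred so that the argument checks run even where OpenStructure is not installed.

```python
def align_pair(seq1, seq2, clustalw_path=None):
    """Align exactly two sequences (both str or both SequenceHandle)."""
    # The documentation requires both arguments to have the same type.
    if type(seq1) is not type(seq2):
        raise TypeError("seq1 and seq2 must have the same type")
    # Imported lazily so the check above works without OpenStructure.
    from ost.bindings import clustalw
    return clustalw.ClustalW(seq1, seq2, clustalw=clustalw_path)


def align_many(named_sequences, clustalw_path=None):
    """Align two or more (name, residues) pairs via a SequenceList."""
    if len(named_sequences) < 2:
        raise ValueError("need at least two sequences")
    from ost import seq
    from ost.bindings import clustalw
    seq_list = seq.CreateSequenceList()
    for name, residues in named_sequences:
        seq_list.AddSequence(seq.CreateSequence(name, residues))
    # seq2 keeps its default of None, as required for a SequenceList.
    return clustalw.ClustalW(seq_list, clustalw=clustalw_path)
```

With OpenStructure and a ClustalW executable available, `align_pair("MKTAYIAK", "MKTAYLAK")` would be expected to return an AlignmentHandle, as described above.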
- Parameters:
  - clustalw (str) – path to ClustalW executable (used in Locate())
  - nopgap (bool) – turn residue-specific gaps off
  - clustalw_option_string (str) – additional ClustalW flags (see http://www.clustal.org/download/clustalw_help.txt)
  - keep_files (bool) – do not delete temporary files
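Because clustalw_option_string is a single string of flags, it can be convenient to assemble it from a dictionary. The helper below is a hypothetical sketch, not part of OST. The flag names GAPOPEN and GAPEXT are taken from the ClustalW help file linked above and should be checked against it for the installed version.

```python
def build_option_string(options):
    """Join a dict of ClustalW options into one flag string."""
    # Each entry becomes "-NAME=value"; dict order is preserved.
    return " ".join(f"-{key.upper()}={value}" for key, value in options.items())


flags = build_option_string({"gapopen": 10, "gapext": 0.2})
print(flags)  # -GAPOPEN=10 -GAPEXT=0.2
```

The resulting string could then be passed as `clustalw_option_string=flags`.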
Note
In the passed sequences ClustalW will convert lowercase to uppercase, and change all ‘.’ to ‘-’. OST will convert ‘?’ to ‘X’ before aligning sequences with ClustalW.
If a sequence name contains spaces, only the part before the space is considered as the sequence name. To avoid surprises, you should remove spaces from the sequence name.
Sequence names must be unique (a ValueError exception is raised otherwise).
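The conversions and naming rules in this note can be checked before calling ClustalW. The following is a plain-Python sketch with hypothetical helper names. It only mimics the documented behaviour (uppercasing, ‘.’ to ‘-’, ‘?’ to ‘X’) and does not call OST or ClustalW. Replacing whitespace with an underscore is one possible choice, not something the binding prescribes.

```python
def preview_normalization(residues):
    """Mimic the documented conversions: uppercase, '.' -> '-', '?' -> 'X'."""
    return residues.upper().replace(".", "-").replace("?", "X")


def sanitize_names(names):
    """Replace whitespace in names with '_' and reject duplicates."""
    cleaned = ["_".join(name.split()) for name in names]
    seen = set()
    for name in cleaned:
        if name in seen:
            raise ValueError(f"duplicate sequence name: {name}")
        seen.add(name)
    return cleaned
```

Checking for duplicates after the spaces are replaced catches names that only become identical once they are cleaned.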
ClustalW will accept only IUB/IUPAC amino acid and nucleic acid codes:
Residue  Name                      Residue  Name
A        alanine                   P        proline
B        aspartate or asparagine   Q        glutamine
C        cysteine                  R        arginine
D        aspartate                 S        serine
E        glutamate                 T        threonine
F        phenylalanine             U        selenocysteine
G        glycine                   V        valine
H        histidine                 W        tryptophan
I        isoleucine                Y        tyrosine
K        lysine                    Z        glutamate or glutamine
L        leucine                   X        any
M        methionine                *        translation stop
N        asparagine                -        gap of indeterminate length
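A sequence can be screened against the codes in this table before alignment. The sketch below is a hypothetical helper, not part of OST. It uppercases the input first, since lowercase is converted anyway, and reports every character that is not listed in the table. ‘.’ and ‘?’ are therefore reported too, although the note above says they are converted rather than rejected.

```python
# Codes listed in the table above (no J or O).
ACCEPTED_CODES = frozenset("ABCDEFGHIKLMNPQRSTUVWXYZ*-")


def invalid_residues(residues):
    """Return the sorted, unique characters not listed in the table."""
    return sorted(set(residues.upper()) - ACCEPTED_CODES)
```

An empty list means that every character in the sequence appears in the table.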