HHProfileParser hhmake compatibility

description

HHMs created with HHsuite 2.0's hhmake can have a number of features that will cause an exception while loading with HHProfileParser.
I noticed two issues. First, if the user has not explicitly set the -cons flag, the master sequence of the profile may contain gaps. Second, if the user has set the -cons flag, loading fails on additional sequences in the .hhm whose length differs from that of the consensus sequence (the consensus sequence contains only match states, whereas the alignment may also include insert columns).
It seems that these issues are due to the different HMM standards of HHsuite 2.0.

comments

kalev wrote Jan 22, 2015 at 7:21 AM

Do you have a sample file? Please attach it here. Thanks!

jbassler wrote Jan 27, 2015 at 7:58 AM

Sure, here it comes.

The file was created via 'hhmake -i (input file <fasta>) -o (output file <hhm>)'

When I try to parse it with 'HHProfileParser('gapped_master_sequence.hhm').parse()', it fails with 'HHProfileFormatError: Layer 1 can't be represented by a gap'.

CSB-Version: 1.2.3
HHMake version 2.0.16 (Jan 2013)

jbassler wrote Jan 27, 2015 at 8:05 AM

Concerning the second issue, I have to correct myself: The problem seems coupled to the '-M first' option of HHmake, not the '-cons'. Using '-cons' in the above case circumvents the problem, whereas using '-M first' or similar results in a different error when parsing: 'SequenceError: <RichSequence: gi|325296841|ref|NP_001191662.1|:(2-213), 259 residues> is not of the expected length'.

The original sequence alignment is, of course, valid and all sequences have the same length. It seems I can only upload one file per post, so please contact me if you need it.

kalev wrote Jan 30, 2015 at 7:55 AM

Can you try to convert your source alignment to a3m format before calling hhmake? The second error is 100% legitimate - you can't have gaps in the master sequence. HHmake doesn't check for that, but it's an error and CSB correctly detects it. This has tripped up a few people in the past. It happens when you feed in a custom alignment in FASTA format, which contains gaps all over the place. You are better off converting it to a3m format first to make sure that gaps are understood by hhmake as gaps, not as match states. I have a feeling that this might resolve the first issue too, so give it a try.
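
Before running hhmake, a quick pre-flight check along these lines might show whether the first (master) sequence of a FASTA alignment contains gaps. This is only a sketch: the helper names (read_fasta, master_gap_positions) are made up for illustration and are not part of CSB or HHsuite, and it assumes gaps are written as '-' or '.'.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def master_gap_positions(fasta_text, gap_chars="-."):
    """Return the 1-based alignment columns where the first sequence has a gap.

    Assumption: the first record is the master sequence, and gaps are
    written with one of the characters in gap_chars.
    """
    records = read_fasta(fasta_text)
    if not records:
        raise ValueError("no FASTA records found")
    _, master = records[0]
    return [i for i, ch in enumerate(master, start=1) if ch in gap_chars]


if __name__ == "__main__":
    example = ">master\nMK-LV\n>other\nMKALV\n"
    print(master_gap_positions(example))  # [3]
```

An empty list means the master sequence has no gaps in the input alignment; a non-empty list gives the columns worth inspecting before building the HMM.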

kalev wrote Jan 30, 2015 at 3:01 PM

Just to clarify:
  1. 'HHProfileFormatError: Layer %i can't be represented by a gap' means that you have gaps in the master sequence. This is a real problem in your HMM file, which is essentially corrupt. You should convert your FASTA alignment to A3M and then call hhmake to correct this problem.
  2. 'SequenceError: %s is not of the expected length' indicates misalignment, a corrupt A3M alignment in this case. To illustrate why these sequences are misaligned, take the top 3 sequences from the alignment and remove all deletions (-) and insertions (lower case chars) or just count the number of upper case characters. This number should be equal to the number of match states (205), which is currently not the case (203, 202, ...).
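
The counting check from point 2 could be sketched roughly as follows. It follows the recipe literally (count the upper-case characters of the top sequences and compare with the number of match states). The function names are hypothetical, not CSB API, and skipping lines that start with '#' is an assumption about the A3M file layout.

```python
def upper_case_counts(a3m_text, limit=3):
    """Count upper-case characters in the first `limit` sequences of A3M text."""
    sequences = []  # [header, sequence] pairs in file order
    for line in a3m_text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no sequence data.
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            sequences.append([line[1:], ""])
        elif sequences:
            sequences[-1][1] += line
    return [(header, sum(ch.isupper() for ch in seq))
            for header, seq in sequences[:limit]]


def mismatched(a3m_text, expected, limit=3):
    """Return the (header, count) pairs whose count differs from `expected`."""
    return [(header, n)
            for header, n in upper_case_counts(a3m_text, limit)
            if n != expected]


if __name__ == "__main__":
    example = ">seq1\nMKLV\n>seq2\nMKaLV\n>seq3\nMkLV\n"
    print(upper_case_counts(example))  # [('seq1', 4), ('seq2', 4), ('seq3', 3)]
    print(mismatched(example, 4))      # [('seq3', 3)]
```

For the file discussed here, `expected` would be 205, and the sequences reported with 203 or 202 upper-case characters are the ones to look at.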