function [as1, as2, score, T] = alignSequences(s1, s2, SM, g) %ALIGNSEQUENCES Perform sequence alignment % [as1, as2, score, T] = alignSequences(s1, s2, SM, g) % Takes two sequences, s1 and s2, a scoring matrix SM, and a gap % score g, and performs sequence alignment. Returns the two % aligned sequences, as1 and as2, along with the alignment score % and the dynamic-programming matrix T. % Retrieve alignment sequence lengths N = length(s1); M = length(s2); % Convert amino acid symbols ('A','R','N',...,'V') to amino acid % positions (1,2,...,20). This is done for faster, more efficient % scoring matrix lookups. seq1 = sym2pos(s1); seq2 = sym2pos(s2); % Initialize the dynamic matrix. Since Matlab indeces start at 1, % the dynamic matrix will be (N+1)x(M+1) T(1,1) = 0; T(2:N+1,1) = g .* [1:N]'; T(1,2:M+1) = g .* [1:M]; % Perform dynamic programming. Note that due to a small change in % indexing, padding seq1 and seq2 is not necessary. for i=1:N for j=1:M T(i+1,j+1) = max([T(i,j+1)+g, T(i+1,j)+g, T(i,j)+SM(seq1(i), seq2(j))]); end end % Recover the optimal alignment. Again, note the change in indexing i = N; j = M; as1 = []; as2 = []; while (i>0 | j>0) if (i>0 & T(i+1,j+1) == T(i,j+1) + g) as1 = [seq1(i) as1]; % s1(i) aligned against a gap as2 = [ 0 as2]; % 0 corresponds to a gap i = i - 1; elseif (j>0 & T(i+1,j+1) == T(i+1,j) + g) as1 = [ 0 as1]; % 0 corresponds to a gap as2 = [seq2(j) as2]; % s2(i) aligned against a gap j = j - 1; else as1 = [seq1(i) as1]; % s1(i) is aligned against s2(j) as2 = [seq2(j) as2]; i = i - 1; j = j - 1; end end % Convert as1 and as2 from amino acid positions (1,2,...20) to % amino acid symbols ('A','R','N',...,'V') as1 = pos2sym(as1); as2 = pos2sym(as2); % Return the total score score = T(N+1,M+1);