Dynamic programming for alignment

Dynamic programming is a technique that breaks a larger problem down into smaller sub-problems, solves those and caches the answers.  In this way, solving the larger problem is more computationally efficient than a naive brute-force approach which tries all possible answers.  One example use is for aligning two strings to find the minimum edit distance that transforms one string into the other. For example, four edits (two substitutions, one deletion and one insertion) are needed to turn “the cat sat on the mat” into “the mouse bit the mat gently“:

the cat   sat on the mat 
the mouse bit    the mat gently

Dynamic programming involves solving sub-problems like aligning “the cat” and “the mouse“, and then expanding to solve, say, aligning “the cat” and “the mouse bit“, until the two complete strings are aligned.

A table is built where each column corresponds to a word in the first string, and each row to a word in the second. The first row and column correspond to empty strings.

        | "" the cat sat on the mat
------------------------------------
""      | .  .   .   .   .  .   .
the     | .  .   .   .   .  .   .
mouse   | .  .   .   .   .  .   .
bit     | .  .   .   .   .  .   .
the     | .  .   .   .   .  .   . 
mat     | .  .   .   .   .  .   .
gently  | .  .   .   .   .  .   .

The cost of word insertions, deletions and substitutions are defined as I, D and S. Starting from the top left corner, the table is filled in. The first row and column are simple, a running sum of deletion/insertion costs:

        | "" the cat sat on the mat
------------------------------------
""      | 0  D   2D  3D  4D 5D  6D
the     | I  .   .   .   .  .   .
mouse   | 2I .   .   .   .  .   .
bit     | 3I .   .   .   .  .   .
the     | 4I .   .   .   .  .   . 
mat     | 5I .   .   .   .  .   .
gently  | 6I .   .   .   .  .   .

Any other position (i,j) is filled in with the minimum of

(i-1,j)   + I
(i  ,j-1) + D
(i-1,j-1) + S if (word i in string B != word j in string A)
(i-1,j-1)     if (word i in string B == word j in string A)

These choices correspond to 1) inserting word i from string B, 2) deleting word j from string A, 3) substituting word j from string A with word i from string B, and 4) aligning word j from string A with word i from string B as they are equal.

The table can now be filled in from position (1,1), and the value in the bottom right hand corner gives the minimum edit distance. Assuming that I=S=D=1, then the completed table is

        | "" the cat sat on the mat
------------------------------------
""      | 0  1   2   3   4  5   6
the     | 1  0   1   2   3  4   5
mouse   | 2  1   1   2   3  4   5
bit     | 3  2   2   2   3  4   5
the     | 4  3   3   3   3  3   4 
mat     | 5  4   4   4   4  4   3
gently  | 6  5   5   5   5  5   4

The numbers marked in bold show the path through the table to the bottom right corner, or the best decisions made as the algorithm progressed. Tracing back this path tells us how to align the two strings.

The substitution cost S does not need to be constant. For example, we might decide that the words “mouse” and “house” are closer than the words “cat” and “house”, and set a lower value of S for the first pair of words than the second.

Dynamic programming can be used aligning many different things, for example:

  • In bioinformatics to align segments of DNA, RNA or protein
  • Aligning two time series, e.g. segments of audio, to see how similar they are
  • Aligning letters in two words for spellchecking
Advertisements
This entry was posted in Technology and tagged , , , . Bookmark the permalink.

One Response to Dynamic programming for alignment

  1. Pingback: The Knapsack Problem and Dynamic Programming | My computer doesn't listen

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s