jueves, 28 de febrero de 2013

Resumen. Suffix Tree

Reference to  "Algorithms on Strings, Trees, and Sequences" by Dan Gusfield.


The string of lenght  m can be generated into m sufflix can be stored

  •  created this structure requieres time O(m)

  • for a parttner requieres tiimeO(n)
These two properties make the suffix tree an appealing structure for a diverse range of bioinformatics applications including: multiple genome alignment (Michael Hohl et al., 2002)

Biological Application

Genome Alignament

 Reference this book "Algorithms on Strings, Trees, and Sequences" by Dan Gusfield.

 At a least two programs based Suffix trees are available for whole gnome alignament. 

MUMmer and MGA both use common subsequences as anchors for the alignment. While they both use the same data structure:

MUMmer extracts Maximal Unique Matches (MUMs)  sequences that occur exactly once in each genome, sorts these sequences to find the longest set of MUMs occurring in the same order in both sequences, and uses this set of sequences to anchor the multiple alignment. 

For this the gaps between anchors are filled using the Needleman Wunsch dynamic programming alignment algorithm.  MUMmer is restricted to comparing two genomes.

The function

MGA computes the longest non-overlapping sequence of Maximal Multiple Exact Matches (multiMEMs) and uses these to guide the multiple alignment. A MEM is defined as a (K+1)-tuple (l, p_0, p_1, ... p_k-1) such that l indicates the length of the MEM, and p_i indicates the start coordinate of the exact match in genome i. A maximal MEM cannot be extended to the left or the right and is referred to as a multiMEM. Gaps are shortened by recursively extracting multiMEM sequences, and finally filled using ClustalW - a progressive/iterative multiple alignment method.



1 comentario: