Reference to
"Algorithms on Strings, Trees, and Sequences" by Dan
Gusfield.
Introduction
The string of lenght m can be generated into m sufflix can be stored
created this structure requieres time O(m)
- for a parttner requieres tiimeO(n)
These two
properties make the suffix tree an appealing structure for a
diverse range of bioinformatics applications including: multiple
genome alignment (Michael Hohl et al., 2002)
Biological Application
Genome Alignament
Reference this book
"Algorithms on Strings, Trees, and Sequences" by Dan
Gusfield.
At a least two programs based Suffix trees are available for whole gnome alignament.
MUMmer and
MGA both use common subsequences as anchors for the alignment.
While they both use the same data structure:
MUMmer extracts Maximal Unique Matches (MUMs) sequences that
occur exactly once in each genome, sorts these sequences to find
the longest set of MUMs occurring in the same order in both
sequences, and uses this set of sequences to anchor the multiple
alignment.
For this the gaps between anchors are filled using the
Needleman Wunsch dynamic programming alignment algorithm. MUMmer is restricted to comparing two genomes.
The function
MGA computes the longest non-overlapping sequence of Maximal
Multiple Exact Matches (multiMEMs) and uses these to guide the
multiple alignment. A MEM is defined as a (K+1)-tuple (l, p_0, p_1,
... p_k-1) such that l indicates the length of the MEM, and p_i
indicates the start coordinate of the exact match in genome i. A
maximal MEM cannot be extended to the left or the right and is
referred to as a multiMEM. Gaps are shortened by recursively
extracting multiMEM sequences, and finally filled using ClustalW -
a progressive/iterative multiple alignment method.
Bibliografia
Quedó algo breve y mal estructurado, pero te pongo un punto extra.
ResponderEliminar