|
C++ Programming |
Download Source and Sample
Definitions
OverviewThe Longest Common Subsequence algorithm constructs a sequence from two source sequences, and contains elements common to both the sources in the order in which they appears. For example, take the first two lines of a popular nursery rhyme as input sequences. What is the longest sequence that contains elements from both of these sequences, with the elements in the same order as they appear in the originals? To visualise the problem, we can write the two strings, one below the other, such that each letter that appears in both strings are aligned:
c a le te", and has a length of 10
elements. To solve the problem of identifying the LCS, we can define that:
Once a decision is made in respect to the first element in the input sequences, the rest of the problem is simply another LCS problem on the short sequences. That is to say, that the problem is recursive. The LCS is computed using a 2-dimensional array, with one sequence along the x-axis and the other sequence along the y-axis. Given that the subsequence is common to both sequences, the x and y axis are interchangable. The algorithm uses memory based on rows, so it follows that by representing the shortest sequence in the x axis will use the least memory, but representing the shortest sequence in the y axis will use less iterations and therefore be slightly faster. I have adopted the smallest memory footprint approach in my algorithm. In order to calculate a value in the array, we must know the values to the left, above and diagonally above-left. So, we only ever need two rows of information, so we can allocate enough memory for two rows of working data and reuse the buffer for alternate rows. At the end of the algorithm, the length of the subsequence is then given by the right most value in last row. AlgorithmNote that the first and second row must be initialised to zero. The algorithm below assumes
the sequence elements are in array indexes 1..n. Index 0 is reserved for 0 value elements to
avoid conditions in the main loops. The
for each element in the first sequence from first to last
for each element in the second sequence from first to (n / 2)
if the data for each element is equal, then
set array[column][row] = 1 + array[column-1][row-1]
else
set array[column][row] = max(array[column-1][row], array[column][row-1])
for each element in the first sequence from the first to the last
for each element in the second sequence from (n / 2) + 1 to the last
if the data for each element is equal, then
set array[column][row] = 1 + array[column-1][row-1]
set position[column] = position[column-1];
else if array[column-1][row] > array[column][row-1]
set array[column][row] = array[column-1][row]
set position[column] = position[column-1];
else
set array[column][row] = array[column][row-1]
Once we get to this point, we know the length of the subsequence, and we know the middle
node along the solution path, which is in the last element of SynopsisWe have two fundamental algorithms. The calculation of the Subsequence and the Length of the Subsequence. Each algorithm has two overloaded implementations, one that takes an allocator object for memory management and the other that provides a default.
template<typename size_type,
typename ItIn1,
typename ItIn2,
typename ItSubSeq>
size_type
longest_common_subsequence(ItIn1 begin_first, ItIn1 end_first,
ItIn2 begin_second, ItIn2 end_second,
ItSubSeq subsequence);
template<typename size_type,
typename Alloc,
typename ItIn1,
typename ItIn2,
typename ItSubSeq>
size_type
longest_common_subsequence(ItIn1 begin_first, ItIn1 end_first,
ItIn2 begin_second, ItIn2 end_second,
ItSubSeq subsequence, Alloc &alloc);
size_type
longest_common_subsequence_length(ItIn1 begin_first, ItIn1 end_first,
ItIn2 begin_second, ItIn2 end_second);
template<typename size_type,
typename Alloc,
typename ItIn1,
typename ItIn2>
size_type
longest_common_subsequence_length(ItIn1 begin_first, ItIn1 end_first,
ItIn2 begin_second, ItIn2 end_second,
Alloc &alloc);
Example
test_character_seq(const char *str1, const char *str2)
{
std::cout << "Comparing: \"" << str1 << "\"" << std::endl
<< " & \"" << str2 << "\"" << std::endl;
std::string subsequence;
signed short llcs;
llcs = boost::longest_common_subsequence<
signed short>(
str1, str1+strlen(str1),
str2, str2+strlen(str2),
std::back_inserter<>(subsequence));
std::cout << "Longest Common Subsequence: \"" << subsequence
<< "\"" << std::endl
<< "Length is " << llcs << std::endl;
BOOST_ASSERT(llcs == static_cast<signed short>(subsequence.size()));
BOOST_ASSERT(
llcs == boost::longest_common_subsequence_length<
signed short>(
str1,
str1+strlen(str1),
str2,
str2+strlen(str2)));
}
Test ProgramThe test program has been compiled and tested with 4 compilers with 5 STL variants.
|
||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||