Sunday 8 September 2013

String matching algorithms

Rabin-Karp algorithm

This is an application of rolling hash. The expected running time is O(n + m) where m is the length of the pattern p[0...m - 1] and n is the length of the source text.

Horspool

Horspool algorithm is a variation based on Knuth-Morris-Pratt algorithm and Boyer-Moore algorithm. These algorithms are all based on the idea that using the information of the pattern and the text we can shift the position of comparison by more than 1 position. They all require pre-computing a shift table.

The shift table in the Horspool algorithm is called bad character shift table. For each character c in p[0...m-2], bad_char_skip[c] = m - 1 - position of  last occurrence of c in p[0...m-2]. When a character is encountered that does not occur in the pattern, we can safely skip ahead for the whole length of the pattern.

The algorithm is as follow:
  1. align pattern p with text and move from right to left comparing each character in p[0...m-1] with corresponding character in t[i...i+m-1]
  2. if the left end of p is reached return the position in t that aligns with the left most in p
  3. if before reaching the left end of p, we get a mismatch, and we need to shift right. The number of position to shift is based t[i+m-1], i.e. bad_char_skip[t[i+m-1]].
Complexity: it has an average-case complexity of O(n) on randome text, although it has O(mn) in the worst case.

Z algorithms

No comments :

Post a Comment