Saturday, 21 September 2013

CUDA programming

Coalescing: compute capability >= 1.2

  • Possible bus transaction sizes: 32B, 64B,  or 128B
  • Memory segment must be aligned: first address = multiple of segment size

Add CUDA programming support in Visual Studio

Sunday, 8 September 2013

String matching algorithms

Rabin-Karp algorithm

This is an application of rolling hash. The expected running time is O(n + m) where m is the length of the pattern p[0...m - 1] and n is the length of the source text.

Horspool

Horspool algorithm is a variation based on Knuth-Morris-Pratt algorithm and Boyer-Moore algorithm. These algorithms are all based on the idea that using the information of the pattern and the text we can shift the position of comparison by more than 1 position. They all require pre-computing a shift table.

The shift table in the Horspool algorithm is called bad character shift table. For each character c in p[0...m-2], bad_char_skip[c] = m - 1 - position of  last occurrence of c in p[0...m-2]. When a character is encountered that does not occur in the pattern, we can safely skip ahead for the whole length of the pattern.

The algorithm is as follow:
  1. align pattern p with text and move from right to left comparing each character in p[0...m-1] with corresponding character in t[i...i+m-1]
  2. if the left end of p is reached return the position in t that aligns with the left most in p
  3. if before reaching the left end of p, we get a mismatch, and we need to shift right. The number of position to shift is based t[i+m-1], i.e. bad_char_skip[t[i+m-1]].
Complexity: it has an average-case complexity of O(n) on randome text, although it has O(mn) in the worst case.

Z algorithms

Hashing

Hash table is implemented as an array that allows random access. There are two common ways to deal with key collisions:
  • Chaining: colliding elements are put in a linked list in each slot of table. This increase the total size of the data structure. The expected running time for search is \(\Theta(1 + \alpha)\) where \(\alpha\) = # keys stored in the table / # slots in table, i.e., the load factor.
  • Open addressing:  the hash function takes two arguments, the key k and the trial number i. The hash function has the property that if we keep trying h(k, i), we will even hit all slots of the table. An example of such a hash function is double hashing: \(h(k, i) = h_1(k) + i\cdot h_2(k)\)