Rabin-Karp Algorithm
The Rabin-Karp Algorithm is a string matching/searching algorithm that uses hashing to find a pattern within a text.
It is efficient when we need to search for multiple patterns in a single text.
It compares the hash value of the pattern with hash values of substrings of the text. If the hash matches, it does a character-by-character match to confirm.
How It Works (Step-by-Step):
- Calculate the hash of the pattern.
- Slide a window of the same length over the text.
- For each window:
- Compute the hash of the current substring.
- If the hash matches the pattern’s hash, check the actual characters.
- If characters match, pattern is found.
- Use a rolling hash to efficiently compute the next substring hash.
Example:
Text: “ababcababc”
Pattern: “abc”
- Hash of “abc” → Assume hash = 123
- Sliding over text:
- “aba” → hash ≠ 123
- “bab” → hash ≠ 123
- “abc” → hash = 123 → check characters → Match!
Pattern found at index 2 and again at index 7
Time Complexity:
|
Case |
Time |
|
Best/Average |
O(n + m) |
|
Worst Case |
O(nm) (if too many hash collisions) |
- n = length of text
- m = length of pattern
Python Code Logic (Core Only):
def rabin_karp (text, pattern):
n, m = len(text), len(pattern)
base = 256
prime = 101
pattern_hash = 0
window_hash = 0
h = 1
# Precompute h = base^(m-1) % prime
for i in range(m-1):
h = (h * base) % prime
# Initial hash values
for i in range(m):
pattern_hash = (base * pattern_hash + ord(pattern[i])) % prime
window_hash = (base * window_hash + ord(text[i])) % prime
# Slide window
for i in range(n – m + 1):
if pattern_hash == window_hash:
if text[i:i+m] == pattern:
print(“Pattern found at index”, i)
if i < n – m:
window_hash = (base * (window_hash – ord(text[i]) * h) + ord(text[i + m])) % prime
if window_hash < 0:
window_hash += prime
Why Use Rabin-Karp?
- Ideal when searching many patterns in the same text.
- Efficient due to rolling hash.
- Avoids repeated character-by-character checking.