Z Algorithm
Problem Motivation
- Given a pattern string and a main string, find all the occurences of pattern in main string in O(n).
- In the basic approach, it is basically a O(n^2) approach available, but the question demands a O(n) complexity algorithm.
Algorithm (in basic words)
- Given a string S of length n, the Z Algorithm produces an array Z where Z[i] is the length of the longest substring starting from S[i] which is also a prefix of S, i.e. the maximum k such that S[j] = S[i + j] for all 0 ≤ j < k. Note that Z[i] = 0 means that S[0] ≠ S[i]. For easier terminology, we will refer to substrings which are also a prefix as prefix-substrings.
Algorithm Description
- While traversing the string we maintain two variables L and R such that S[L…R] is a prefix substring.
- Initially, we run brutly for finding Z[1], by comparing and traversing S[0..] with S[1…].Thus, it will give the starting L and R.
- Now, let’s say that for i , we have all values till Z[i-1] and L and R currently, in order to calculate Z[i] from L to R , two cases will arise
- If i>R, it means that no prefix substring is there, thus brutly finding the answer for Z[i] and we will update L to R in this case like comparing S[i…] to S[0…].
- Otherwise, i<R , then lets say k=i-l, thus Z[i] = min(Z[k], R-i+1) because S[i..] matches with S[k…] for atleast R-i+1 elements , Now consider:
- If Z[k] < R-i+1 , there is no longer substring from i or Z[k] would be greater, thus in this case Z[i]=Z[k].
- If Z[k] >= R-i+1 thus may be more characters than R-i+1 can be matched, that we have to find, thus in this case again find Z[i] by brute forcing and update L and R again.
Time Complexity
- The run time complexity is clearly O(n), because we never compare the characters less than R , thus R can reach n atmost in n steps incrementing 1 each time, thus complexity is O(n).
Problem Solution
- One application of the Z Algorithm is for the standard string matching problem of finding matches for a pattern T of length m in a string S of length n. We can do this in O(n + m) time by using the Z Algorithm on the string T Φ S (that is, concatenating T, Φ, and S) where Φ is a character that matches nothing. The indices i with Z[i] = m correspond to matches of T in S.
vectorzfunction(int n,string s) { vector z(n); z[0]=0; int l=0,r=0; for(int i=1;i<n;i++) { if(i>r) { l=r=i; while(r < n && s[r-l] == s[r]) r++; z[i]=r-l; r--; } else { k=i-l; if(z[k] < r-i+1) z[i]=z[k]; else { l=i; while(r < n&&s[r-l]==s[r])r++; z[i]=r-l; r--; } } } return z; }