Z Algorithm

Problem Motivation

  • Given a pattern string and a main string, find all the occurences of pattern in main string in O(n).
  • In the basic approach, it is basically a O(n^2) approach available, but the question demands a O(n) complexity algorithm.

Algorithm (in basic words)

  • Given a string S of length n, the Z Algorithm produces an array Z where Z[i] is the length of the longest substring starting from S[i] which is also a prefix of S, i.e. the maximum k such that S[j] = S[i + j] for all 0 ≤ j < k. Note that Z[i] = 0 means that S[0] ≠ S[i]. For easier terminology, we will refer to substrings which are also a prefix as prefix-substrings.

Algorithm Description

  • While traversing the string we maintain two variables L and R such that S[L…R] is a prefix substring.
  • Initially, we run brutly for finding Z[1], by comparing and traversing S[0..] with S[1…].Thus, it will give the starting L and R.
  • Now, let’s say that for i , we have all values till Z[i-1] and L and R currently, in order to calculate Z[i] from L to R , two cases will arise
    • If i>R, it means that no prefix substring is there, thus brutly finding the answer for Z[i] and we will update L to R in this case like comparing S[i…] to S[0…].
    • Otherwise, i<R , then lets say k=i-l, thus Z[i] = min(Z[k], R-i+1) because S[i..] matches with S[k…] for atleast R-i+1 elements , Now consider:
      • If Z[k] < R-i+1 , there is no longer substring from i or Z[k] would be greater, thus in this case Z[i]=Z[k].
      • If Z[k] >= R-i+1 thus may be more characters than R-i+1 can be matched, that we have to find, thus in this case again find Z[i] by brute forcing and update L and R again.

Time Complexity

  • The run time complexity is clearly O(n), because we never compare the characters less than R , thus R can reach n atmost in n steps incrementing 1 each time, thus complexity is O(n).

Problem Solution

  • One application of the Z Algorithm is for the standard string matching problem of finding matches for a pattern T of length m in a string S of length n. We can do this in O(n + m) time by using the Z Algorithm on the string T Φ S (that is, concatenating T, Φ, and S) where Φ is a character that matches nothing. The indices i with Z[i] = m correspond to matches of T in S.
	vector zfunction(int n,string s)
       { 
         vector z(n);
         z[0]=0;
         int l=0,r=0;
         for(int i=1;i<n;i++)
         {
            if(i>r)
            {
                l=r=i;
                while(r < n && s[r-l] == s[r]) r++;
                z[i]=r-l;
                r--;
            }
            else
            {
                k=i-l;
                if(z[k] < r-i+1)
                  z[i]=z[k];
                else
                {
                   l=i;
                   while(r < n&&s[r-l]==s[r])r++;
                   z[i]=r-l;
                   r--;
                }
            }
         }
         return z;
     }