Problem 9 - Implement strStr()

Leetcode Link: Implement strStr().

Statement
Examples
Solution 1 - Simple Loop
Solution 2 - Use a Deterministic Finite Automata (DFA)
Solution 3 - Knuth-Morris-Pratt (KMP) Algorithm

Statement

Implement strStr().

Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Clarification:

What should we return when needle is an empty string? This is a great question to ask during an interview.

For the purpose of this problem, we will return 0 when needle is an empty string. This is consistent to C’s strstr() and Java’s indexOf().

Examples

Input: haystack = "hello", needle = "ll"
Output: 2

Input: haystack = "aaaaa", needle = "bba"
Output: -1

Solution 1 - Simple Loop

Concept

Assertion: Consider a string \(\text{text}\) of length \(n\) and a pattern \(\text{pattern}\) of length \(m \le n\). Then, \(\text{pattern}\) can occur in \(\text{text}\) starting at any index \(i\) such that \(0 \le i \le n - m\).

If \(\text{needle}\) is an empty string, return \(0\).

Let length of \(\text{haystack}\) be \(n\) and that of \(\text{needle}\) be \(m\).

For each \(0 \le i \le n - m\):

If \(\text{needle} = \text{haystack}[i : i + m]\), return \(i\).
Otherwise, continue.

Return \(-1\) by default.

Example

Input

haystack = "hello"
needle = "ll"

Procedure

\(m = 2\)
Iteration 1:
- \(i = 0\)
- Since \(\text{needle} \not = {\text{haystack}[0:2] = \text{he}}\), continue
Iteration 2:
- \(i = 1\)
- Since \(\text{needle} \not = {\text{haystack}[1:3] = \text{el}}\), continue
Iteration 3:
- \(i = 2\)
- Since \(\text{needle} = \text{haystack}[2:4] = \text{ll}\), return \(i = 2\)

Time complexity

There are a total of \(n - m + 1\) possible values for \(i\) and for each \(i\), the solution takes out a slice of length \(m\). Therefore, the time complexity is \(\mathcal{O}((n - m + 1)*m)\).

Solution 2 - Use a Deterministic Finite Automata (DFA)

Concept

To learn about DFAs, see Deterministic finite automaton.

Assertion 1: Consider a string \(\text{pattern}\) of length \(m\). Let \(A=(Q, \Sigma, \delta, q_0, F)\) be a DFA such that \(L(A)=\{w \mid w \text{ ends with pattern}\}\). Then, \(A\) has \(m + 1\) states (\(\mid Q \mid = m\)) and a single final state \(q_F\) (\(\mid F \mid = 1\)).

For example, if \(\text{pattern}=ll\), \(Q=\{0, 1, 2\}\), \(\Sigma=\{a,b,\dots,z\}\), \(q_0=0\) and \(F=\{2\}\), then \(A\) is as shown:

Assertion 2: Let \(A\) be a DFA as mentioned above with \(Q=\{0, 1, \dots, m\}\) and \(q_F=m\).

Consider a string \(\text{text}\) of length \(n\).

If \(\text{text}\) is processed through \(A\) and the final state \(q_F=m\) is reached on the character at index \(i \lt n\), then \(\text{pattern}\) exists in \(\text{text}\) and starts at index \(i - m + 1\).

Assertion 3: The empty string \(\epsilon\) is a prefix and a suffix of all strings.

The solution is made up of three parts.

Prefix Function

The prefix function is a function \(\pi:\{0, 1, \dots, m - 1\} \rightarrow \{0, 1, \dots, m-1\}\) defined for \(\text{pattern}\) such that:

\[\pi[q] = \max \{k: k \lt q \text{ and pattern}[:k] \text{ is a suffix of pattern}[:q]\}\]

In other words, \(\pi[q]\) is the length of the longest prefix of \(\text{pattern}\) that is a proper suffix (since \(k \lt q\)) of \(\text{pattern}[:q+1]\).

For example, let \(\text{pattern} = \text{ababaca}\).

The prefixes of \(\text{pattern}\) are as shown:

\(\epsilon\)

\(\text{a}\)

\(\text{ab}\)

\(\text{aba}\)

\(\text{abab}\)

\(\text{ababa}\)

\(\text{ababac}\)

\(\text{ababaca}\)

\(\pi\) is as shown:

q	0	1	2	3	4	5	6
\(\text{pattern}[:q+1]\)	\(\text{a}\)	\(\text{ab}\)	\(\text{aba}\)	\(\text{abab}\)	\(\text{ababa}\)	\(\text{ababac}\)	\(\text{ababaca}\)
Proper suffixes	\(\epsilon\)	\(\epsilon\), \(\text{b}\)	\(\epsilon\), \(\text{a}\), \(\text{ba}\)	\(\epsilon\), \(\text{b}\), \(\text{ab}\), \(\text{bab}\)	\(\epsilon\), \(\text{a}\), \(\text{ba}\), \(\text{aba}\), \(\text{baba}\)	\(\epsilon\), \(\text{c}\), \(\text{ac}\), \(\text{bac}\), \(\text{abac}\), \(\text{babac}\)	\(\epsilon\), \(\text{a}\), \(\text{ca}\), \(\text{aca}\), \(\text{baca}\), \(\text{abaca}\), \(\text{babaca}\)
Required prefix	\(\epsilon\)	\(\epsilon\)	\(\text{a}\)	\(\text{ab}\)	\(\text{aba}\)	\(\epsilon\)	\(\text{a}\)
\(\pi[q]\)	0	0	1	2	3	0	1

The code for computing \(\pi\) is as follows:

def pi(pattern: str) -> list[int]:
    pi = [-1] * len(pattern)
    pi[0] = 0

    k = 0

    for idx, char in enumerate(pattern[1:], 1):
        while k > 0 and pattern[k] != char:
            k = pi[k - 1]

        if pattern[k] == char:
            k += 1

        pi[idx] = k

    return pi

Transition Function

Note: There is a much simpler way of computing the transition function without using \(\pi\) but it is significantly slower.

Given the prefix function \(\pi\) for \(\text{pattern}\), the transition function \(\delta\) of the DFA \(A\) as mentioned in the assertion can be created easily.

Set \(\delta(q, a) = 0\) for all \(a \in \Sigma - \{\text{pattern[0]}\}\).

Set \(\delta(0, \text{pattern[0]})=1\).

For each \(1 \le q \le m\) and \(a \in \Sigma\):

If \(q = m\) or \(\text{pattern}[q] \not = a\), \(\delta(q, a) = \delta(\pi[q - 1], a)\)
Otherwise, \(\delta(q, a) = q + 1\)

Running \(\text{text}\) through the DFA

Given the transition function \(\delta\), the starting position of \(\text{pattern}\) in \(\text{text}\) can be found easily.

Initialize \(q\) to \(0\), the start state of the DFA.

For each character \(a=\text{text}[i]\) in \(\text{text}\):

Get the next state as \(q = \delta(q, a)\)
If \(q = m\) (the final state), return \(i - m + 1\)
Otherwise, continue

Return \(-1\) at the end.

Example

Input

haystack = "hello"
needle = "ll"

Procedure

Compute prefix function \(\pi\):

q 0 1

\(\pi[q]\) 0 1
Compute the transition function \(\delta\):

\(\delta\) \(\text{a}\) \(\text{b}\) \(\dots\) \(\text{l}\) \(\dots\) \(\text{z}\)

0 0 0 \(\dots\) 1 \(\dots\) 0

1 0 0 \(\dots\) 2 \(\dots\) 0

*2 0 0 \(\dots\) 2 \(\dots\) 0

The \(\dots\) indicates that the values are same everywhere and \(*\) indicates the final state:.
\(q = 0\)
Iteration 1:
- \(i = 0\)
- \(a = \text{haystack[i]} = \text{h}\)
- \(q = \delta(0, \text{h}) = 0\)
- Since \(q \not = 2\), continue.
Iteration 2:
- \(i = 1\)
- \(a = \text{haystack[i]} = \text{e}\)
- \(q = \delta(0, \text{e}) = 0\)
- Since \(q \not = 2\), continue.
Iteration 3:
- \(i = 2\)
- \(a = \text{haystack[i]} = \text{l}\)
- \(q = \delta(0, \text{l}) = 1\)
- Since \(q \not = 2\), continue.
Iteration 4:
- \(i = 3\)
- \(a = \text{haystack[i]} = \text{l}\)
- \(q = \delta(1, \text{l}) = 2\)
- Since \(q = 2\), return \(i - m + 1 = 2\).

q	0	1
\(\pi[q]\)	0	1

\(\delta\)	\(\dots\)	\(\text{l}\)	\(\dots\)
0	\(\dots\)	1	\(\dots\)
1	\(\dots\)	2	\(\dots\)
*2	\(\dots\)	2	\(\dots\)

Time complexity

The prefix function can be computed in \(\mathcal{O}(m)\).

Using the prefix function, the transition function can be computed in \(\mathcal{O}(m\mid \Sigma \mid)\).

Problem 9 - Implement strStr()

Statement

Examples

Solution 1 - Simple Loop

Concept

Example

Time complexity

Solution 2 - Use a Deterministic Finite Automata (DFA)

Concept

Example

Time complexity

Solution 3 - Knuth-Morris-Pratt (KMP) Algorithm

Concept

Example

Time complexity