A

R

P

```
"""
LCS Problem Statement: Given two sequences, find the length of longest subsequence
present in both of them. A subsequence is a sequence that appears in the same relative
order, but not necessarily continuous.
Example:"abc", "abg" are subsequences of "abcdefgh".
"""
def longest_common_subsequence(x: str, y: str):
"""
Finds the longest common subsequence between two strings. Also returns the
The subsequence found
Parameters
----------
x: str, one of the strings
y: str, the other string
Returns
-------
L[m][n]: int, the length of the longest subsequence. Also equal to len(seq)
Seq: str, the subsequence found
>>> longest_common_subsequence("programming", "gaming")
(6, 'gaming')
>>> longest_common_subsequence("physics", "smartphone")
(2, 'ph')
>>> longest_common_subsequence("computer", "food")
(1, 'o')
"""
# find the length of strings
assert x is not None
assert y is not None
m = len(x)
n = len(y)
# declaring the array for storing the dp values
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
match = 1 if x[i - 1] == y[j - 1] else 0
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1] + match)
seq = ""
i, j = m, n
while i > 0 and j > 0:
match = 1 if x[i - 1] == y[j - 1] else 0
if dp[i][j] == dp[i - 1][j - 1] + match:
if match == 1:
seq = x[i - 1] + seq
i -= 1
j -= 1
elif dp[i][j] == dp[i - 1][j]:
i -= 1
else:
j -= 1
return dp[m][n], seq
if __name__ == "__main__":
a = "AGGTAB"
b = "GXTXAYB"
expected_ln = 4
expected_subseq = "GTAB"
ln, subseq = longest_common_subsequence(a, b)
print("len =", ln, ", sub-sequence =", subseq)
import doctest
doctest.testmod()
```

Given two strings `S`

and `T`

, find the length of the longest common subsequence (LCS).

Let the `dp[i][j]`

be the length of the longest common subsequence of prefixes `S[1..i]`

and `T[1..j]`

. Our answer (the length of LCS) is `dp[|S|][|T|]`

since the prefix of the length of string is the string itself.

Both `dp[0][i]`

and `dp[i][0]`

are `0`

for any `i`

since the LCS of empty prefix and anything else is an empty string.

Now let's try to calculate `dp[i][j]`

for any `i`

, `j`

. Let's say `S[1..i] = *A`

and `T[1..j] = *B`

where `*`

stands for any sequence of letters (could be different for `S`

and `T`

), `A`

stands for any letter and `B`

stands for any letter different from `A`

. Since `A != B`

, our LCS doesn't include `A`

or `B`

as a last character. So we could try to throw away `A`

or `B`

character. If we throw `A`

, our LCS length will be `dp[i - 1][j]`

(since we have prefixes `S[1..i - 1]`

and `T[1..j]`

). If we try to throw `B`

character, we will have prefixes `S[1..i]`

and `T[1..j - 1]`

so the length of LCS will be `dp[i][j - 1]`

. As we are looking for the Longest common subsequence, we will pick the maximum value from `dp[i][j - 1]`

and `dp[i - 1][j]`

.

But what if `S[1..i] = *A`

and `T[1..j] = *A`

? We could say that the LCS of our prefixes is LCS of prefixes `S[1..i - 1]`

and `T[1..j - 1]`

plus the letter `A`

. So `dp[i][j] = dp[i - 1][j - 1] + 1`

if `S[i] = T[j]`

.

We could see that we can fill our `dp`

table row by row, column by column. So our algorithm will be like:

- Let's say that we have strings
`S`

of the length N and`T`

of the length M (numbered from 1). Let's create the table`dp`

of size`(N + 1) x (M + 1)`

numbered from 0. - Let's fill the 0th row and the 0th column of
`dp`

with 0. - Then, we follow the algorithm:

```
for i in range(1..N):
for j in range(1..M):
if(S[i] == T[j])
dp[i][j] = dp[i - 1][j - 1] + 1
else
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
```

`O(N * M)`

In any case

`O(N * M)`

- simple implementation
`O(min {N, M})`

- two-layers implementation (as `dp[i][j]`

depends on only i-th and i-th layers, we coudld store only two layers).

Let's say we have strings `ABCB`

and `BBCB`

. We will build such a table:

```
# # A B C B
# 0 0 0 0 0
B 0 ? ? ? ?
B 0 ? ? ? ?
C 0 ? ? ? ?
B 0 ? ? ? ?
```

Now we will start to fill our table from 1st row. Since `S[1] = A`

and `T[1] = B`

, the `dp[1][1]`

will be the maximal value of `dp[0][1] = 0`

and `dp[1][0] = 0`

. So `dp[1][1] = 0`

. But now `S[2] = B = T[1]`

, so `dp[1][2] = dp[0][1] + 1 = 1`

. `dp[1][3]`

is `1`

since `A != C`

and we pick `max{dp[1][2], dp[0][3]}`

. And `dp[1][4] = dp[0][3] + 1 = 1`

.

```
# # A B C B
# 0 0 0 0 0
B 0 0 1 1 1
B 0 ? ? ? ?
C 0 ? ? ? ?
B 0 ? ? ? ?
```

Now let's fill the other part of the table:

```
# # A B C B
# 0 0 0 0 0
B 0 0 1 1 1
B 0 0 1 1 2
C 0 0 1 2 2
B 0 0 1 2 3
```

So the length of LCS is `dp[4][4] = 3`

.