Problem
This problem is taken from the book Introduction to Algorithms, Third Edition By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein:
In order to transform one source string of text x[1..m] to a target
string y[1..n], we can perform various transformation operations.
Our goal is, given x and y, to produce a series of transformations
that change x to y. We use an array ́z—assumed to be large enough to
hold all the characters it will need—to hold the intermediate results.
Initially, z is empty, and at termination, we should have z[j] = y[j]
for j = 1, 2,…, n. We maintain current indices i into x and j into
z, and the operations are allowed to alter z and these indices.
Initially, i = j = 1. We are required to examine every character in x
during the transformation, which means that at the end of the sequence
of transformation operations, we must have i = m + 1.We may choose from among six transformation operations:
Copy a character from x to z by setting ́z[j] = x[i] and then incrementing both i and j. This operation examines x[i].
Replace a character from x by another character c, by setting z[j] = c, and then incrementing both i and j . This operation examines x[i].
Delete a character from x by incrementing i but leaving j alone. This operation examines x[i].
Insert the character c into z by setting z[j] = c and then incrementing j , but leaving i alone. This operation examines no
characters of x.Twiddle (i.e., exchange) the next two characters by copying them from x to z but in the opposite order; we do so by setting z[j] = x[i
+ 1] and ́z[j + 1] = x[i] and then setting i = i + 2 and j = j + 2. This operation examines x[i] and x[i + 1].Kill the remainder of x by setting i = m + 1. This operation examines all characters in x that have not yet been examined. This
operation, if performed, must be the final operation.a. Given two sequences x[1..m] and y[1..n] and set of transformation
operation costs, the edit distance from x to y is the cost of the
least expensive operation sequence that transforms x to y. Describe a
dynamic-programming algorithm that finds the edit distance from
x[1..m] to y[1..n] and prints an optimal operation sequence. Analyze
the running time and space requirements of your algorithm
My solution is O(nm+m+n).
Code:
extern crate rand;
use std::collections::HashMap;
use std::collections::VecDeque;
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
enum EditOp {
Copy,
Replace,
Delete,
Insert,
Twiddle,
Kill,
}
impl EditOp {
fn valid(&self,
i : usize,
j : usize,
iprev : char,
jprev : char,
icurrent: char,
jcurrent: char) -> bool {
let threshold = |p| i != p || j == p;
match *self {
EditOp::Copy => threshold(1)
&& icurrent == jcurrent,
EditOp::Replace => threshold(1),
EditOp::Delete => i > 1,
EditOp::Twiddle => i > 1 && j > 1
&& threshold(2)
&& icurrent == jprev
&& iprev == jcurrent,
_ => true,
}
}
fn displacement(&self) -> (usize, usize) {
match *self {
EditOp::Copy => (1, 1),
EditOp::Replace => (1, 1),
EditOp::Delete => (1, 0),
EditOp::Insert => (0, 1),
EditOp::Twiddle => (2, 2),
EditOp::Kill => (0, 0),
}
}
fn apply(&self,
from_itr: &mut std::str::Chars,
to_itr : &mut std::str::Chars,
result : &mut String) {
match *self {
EditOp::Copy => {
result.push(from_itr.next().unwrap());
to_itr.next();
},
EditOp::Replace => {
result.push(to_itr.next().unwrap());
from_itr.next();
},
EditOp::Delete => {
from_itr.next();
},
EditOp::Insert => {
result.push(to_itr.next().unwrap());
},
EditOp::Twiddle => {
let second = from_itr.next().unwrap();
let first = from_itr.next().unwrap();
result.push(first);
result.push(second);
to_itr.next();
to_itr.next();
},
EditOp::Kill => {},
}
}
}
fn create_tables(from_len: usize,
to_len : usize,
costs : &HashMap<EditOp, usize>)
-> (Vec<Vec<EditOp>>, Vec<Vec<usize>>) {
let mut op_table = vec![ vec![EditOp::Kill; to_len]; from_len ];
let mut cost_table = vec![ vec![0usize; to_len]; from_len ];
let delete_cost = costs.get(&EditOp::Delete).map(|c| *c).unwrap_or(0);
for (i, c) in (1.. from_len).map(|i|(i, i * delete_cost)) {
cost_table[i][0] = c;
op_table[i][0] = EditOp::Delete;
}
(op_table , cost_table)
}
fn edit_distance(from : &str,
to : &str,
costs: &HashMap<EditOp, usize>)
-> (usize, VecDeque<EditOp>){
if costs.is_empty() {
panic!("map of costs is empty");
}
let from_len = from.chars().count();
let to_len = to.chars().count();
let (mut op_table, mut cost_table) = create_tables(from_len + 1,
to_len + 1,
costs);
let (mut lasti, mut lastj) = (' ', ' ');
for (i, ichar) in from.chars().enumerate() {
for (j, jchar) in to.chars().enumerate() {
let (i, j) = (i + 1, j + 1);
let (op, cost) = costs.iter()
.filter(|&(x, _)| !x.eq(&EditOp::Kill))
.filter(|&(x, _)| x.valid(i, j, lasti, lastj,
ichar, jchar))
.map(|(x, c)| {
let (di, dj) = x.displacement();
(x, cost_table[i - di][j - dj] + c)
})
.min_by_key(|&(_, c)| c)
.unwrap();
cost_table[i][j] = cost;
op_table[i][j] = *op;
lastj = jchar;
}
lasti = ichar;
}
let no_kill_cost = cost_table[from_len][to_len];
let (i, cost, kill) = costs.get(&EditOp::Kill)
.map(|kc| min_with_kill(&cost_table,
from_len,
to_len, *kc))
.unwrap_or((from_len, no_kill_cost, None));
(cost, build_operations(kill, &op_table, i, to_len))
}
fn edit(from: &str, to: &str, ops: &VecDeque<EditOp>) -> String {
let mut result = String::new();
let mut from_itr = from.chars();
let mut to_itr = to.chars();
for op in ops {
op.apply(&mut from_itr, &mut to_itr, &mut result);
}
result
}
fn min_with_kill(cost_table: &Vec<Vec<usize>>,
from_size : usize,
to_size : usize,
kill_cost : usize ) -> (usize, usize, Option<EditOp>) {
let no_kill_cost = cost_table[from_size][to_size];
(1..from_size).map(|i|(i, cost_table[i][to_size] + kill_cost) )
.map(|(i, c)|(i, c, Some(EditOp::Kill)))
.chain(Some((from_size, no_kill_cost, None)).into_iter())
.min_by_key(|&(_, cost, _)| cost).unwrap()
}
fn build_operations(kill : Option<EditOp>,
op_table: &Vec<Vec<EditOp>>,
i : usize,
j : usize) -> VecDeque<EditOp> {
let itr = std::iter::repeat(()).scan((i, j), |s,_| {
let op = op_table[s.0][s.1];
let (id, jd) = op.displacement();
*s = (s.0 - id, s.1 - jd);
Some(op)
}).take_while(|op| !op.eq(&EditOp::Kill));
let mut stack = VecDeque::new();
for op in kill.into_iter().chain(itr) {
stack.push_front(op);
}
stack
}
fn main() {
let mut map = HashMap::new();
map.insert(EditOp::Copy, 2);
map.insert(EditOp::Replace, 3);
map.insert(EditOp::Delete, 2);
map.insert(EditOp::Insert, 4);
map.insert(EditOp::Twiddle, 1);
map.insert(EditOp::Kill, 1);
let from = "▓▓▓▓n
▒▒▒▓▓n
▒▒▒▒▒▓n
▒▒▒▒▒▒▓n
▒▒▒▒▒▒▓n
▒▒▒▒▒▒▒▓n
▒▒▒▒▒▒▒▓▓▓n
▒▓▓▓▓▓▓░░░▓n
▒▓░░░░▓░░░░▓n
▓░░░░░░▓░▓░▓n
▓░░░░░░▓░░░▓n
▓░░▓░░░▓▓▓▓n
▒▓░░░░▓▒▒▒▒▓n
▒▒▓▓▓▓▒▒▒▒▒▓n
▒▒▒▒▒▒▒▒▓▓▓▓n
▒▒▒▒▒▓▓▓▒▒▒▒▓n
▒▒▒▒▓▒▒▒▒▒▒▒▒▓n
▒▒▒▓▒▒▒▒▒▒▒▒▒▓n
▒▒▓▒▒▒▒▒▒▒▒▒▒▒▓n
▒▓▒▓▒▒▒▒▒▒▒▒▒▓n
▒▓▒▓▓▓▓▓▓▓▓▓▓n
▒▓▒▒▒▒▒▒▒▓n
▒▒▓▒▒▒▒▒▓";
let to = "│▒│ /▒/n
│▒│/▒/n
│▒/▒/─┬─┐n
│▒│▒|▒│▒│n
┌┴─┴─┐-┘─┘n
│▒┌──┘▒▒▒│n
└┐▒▒▒▒▒▒ ";
let (c, s) = edit_distance(&from, &to, &map);
let result = edit(&from, &to, &s);
println!("cost = {}", c);
println!("{:?}", s);
println!("{}", result);
}
Test:
use rand::Rng;
fn random_string(size: usize) -> String {
(0..size).map(|_|rand::random::<char>()).collect()
}
#[test]
fn test_for_accurate_edition(){
let mut map = HashMap::new();
map.insert(EditOp::Copy, 2);
map.insert(EditOp::Replace, 3);
map.insert(EditOp::Delete, 2);
map.insert(EditOp::Insert, 4);
map.insert(EditOp::Twiddle, 1);
map.insert(EditOp::Kill, 1);
let size_generator = || rand::thread_rng().gen_range(100, 500);
for _ in 0..100 {
let from = random_string(size_generator());
let to = random_string(size_generator());
let (_, s) = edit_distance(&from, &to, &map);
let result = edit(&from, &to, &s);
assert_eq!(to, result);
}
}
Solution
Here’s a lightweight review…
- You can combine imports with
{}
:use std::collections::{HashMap, VecDeque};
- Consider importing enum variants in methods that exhaustively match on a lot of them. This helps avoid duplication and rightward pushing of code.
- Accept a generic type that implements an iterator, instead of the specific
std::str::Chars
. - Introduce more and unique types, especially when you have repeated parallel names (
i
,iprev
,icurrent
). This also allows you to move functionality into methods. - Introduce type aliases. This is a tiny step towards official types, but gives an opportunity to introduce names and shorten them.
- Instead of
VecDeque
, use a normalVec
, then you can useIterator::collect
and reverse it. - Use
a == &b
instead ofa.eq(&b)
. - I always recommend using
expect
instead ofunwrap
. When the panic happens, you will be much happier having some context of where the failure is. - Maybe introduce a dedicated struct for the cost mapping. Then you can implement
Default
, and there’s no chance of missing a key. - Use
&[T]
instead of&Vec<T>
. Faster and more usable. - [Clippy] Combine identical
match
arms withFoo | Bar
. - [Clippy] Use
map_or
instead ofmap().unwrap_or
. - [Clippy] Can use
cloned
instead of|x| *x
. - [Clippy] Calls to
edit
andedit_distance
have extra references. - Check out rustfmt, which, let’s say disagrees, with some of your style choices.
Unfortunately, I found the code very dense, which made reading over it fairly difficult. It’s possible some more verbose names could help. Here’s a cherry-picked sample of code that takes a second to mentally process :
.map(|(x, c)| {
let (di, dj) = x.displacement();
(x, cost_table[i - di][j - dj] + c)
})
extern crate rand;
use std::collections::HashMap;
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
enum EditOp {
Copy,
Replace,
Delete,
Insert,
Twiddle,
Kill,
}
#[derive(Debug, Copy, Clone)]
struct FindBetterName {
idx: usize,
prev: char,
current: char,
}
impl FindBetterName {
fn valid_twiddle(&self, other: &FindBetterName) -> bool {
self.idx > 1 && other.idx > 1 && self.current == other.prev && self.prev == other.current
}
}
impl EditOp {
fn valid(&self, i: FindBetterName, j: FindBetterName) -> bool {
use EditOp::*;
let threshold = |p| i.idx != p || j.idx == p;
match *self {
Copy => threshold(1) && i.current == j.current,
Replace => threshold(1),
Delete => i.idx > 1,
Twiddle => threshold(2) && i.valid_twiddle(&j),
_ => true,
}
}
fn displacement(&self) -> (usize, usize) {
use EditOp::*;
match *self {
Copy | Replace => (1, 1),
Delete => (1, 0),
Insert => (0, 1),
Twiddle => (2, 2),
Kill => (0, 0),
}
}
fn apply<F, T>(&self, mut from_itr: F, mut to_itr: T, result: &mut String)
where F: Iterator<Item = char>,
T: Iterator<Item = char>
{
use EditOp::*;
match *self {
Copy => {
result.push(from_itr.next().unwrap());
to_itr.next();
}
Replace => {
result.push(to_itr.next().unwrap());
from_itr.next();
}
Delete => {
from_itr.next();
}
Insert => {
result.push(to_itr.next().unwrap());
}
Twiddle => {
let second = from_itr.next().unwrap();
let first = from_itr.next().unwrap();
result.push(first);
result.push(second);
to_itr.next();
to_itr.next();
}
Kill => {}
}
}
}
type CostMap = HashMap<EditOp, usize>;
fn create_tables(from_len: usize,
to_len: usize,
costs: &CostMap)
-> (Vec<Vec<EditOp>>, Vec<Vec<usize>>) {
let mut op_table = vec![ vec![EditOp::Kill; to_len]; from_len ];
let mut cost_table = vec![ vec![0usize; to_len]; from_len ];
let delete_cost = costs.get(&EditOp::Delete).cloned().unwrap_or(0);
for (i, c) in (1..from_len).map(|i| (i, i * delete_cost)) {
cost_table[i][0] = c;
op_table[i][0] = EditOp::Delete;
}
(op_table, cost_table)
}
fn edit_distance(from: &str, to: &str, costs: &CostMap) -> (usize, Operations) {
if costs.is_empty() {
panic!("map of costs is empty");
}
let from_len = from.chars().count();
let to_len = to.chars().count();
let (mut op_table, mut cost_table) = create_tables(from_len + 1, to_len + 1, costs);
let (mut lasti, mut lastj) = (' ', ' ');
for (i, ichar) in from.chars().enumerate() {
for (j, jchar) in to.chars().enumerate() {
let (i, j) = (i + 1, j + 1);
let ii = FindBetterName {
idx: i,
prev: lasti,
current: ichar,
};
let jj = FindBetterName {
idx: j,
prev: lastj,
current: jchar,
};
let (op, cost) = costs.iter()
.filter(|&(x, _)| x != &EditOp::Kill)
.filter(|&(x, _)| x.valid(ii, jj))
.map(|(x, c)| {
let (di, dj) = x.displacement();
(x, cost_table[i - di][j - dj] + c)
})
.min_by_key(|&(_, c)| c)
.unwrap();
cost_table[i][j] = cost;
op_table[i][j] = *op;
lastj = jchar;
}
lasti = ichar;
}
let no_kill_cost = cost_table[from_len][to_len];
let (i, cost, kill) = costs.get(&EditOp::Kill)
.map_or((from_len, no_kill_cost, None), |kc| {
min_with_kill(&cost_table, from_len, to_len, *kc)
});
(cost, build_operations(kill, &op_table, i, to_len))
}
type Operations = Vec<EditOp>;
type OperationsSlice<'a> = &'a [EditOp];
fn edit(from: &str, to: &str, ops: OperationsSlice) -> String {
let mut result = String::new();
let mut from_itr = from.chars();
let mut to_itr = to.chars();
for op in ops {
op.apply(&mut from_itr, &mut to_itr, &mut result);
}
result
}
fn min_with_kill(cost_table: &[Vec<usize>],
from_size: usize,
to_size: usize,
kill_cost: usize)
-> (usize, usize, Option<EditOp>) {
let no_kill_cost = cost_table[from_size][to_size];
(1..from_size)
.map(|i| (i, cost_table[i][to_size] + kill_cost))
.map(|(i, c)| (i, c, Some(EditOp::Kill)))
.chain(Some((from_size, no_kill_cost, None)).into_iter())
.min_by_key(|&(_, cost, _)| cost)
.unwrap()
}
fn build_operations(kill: Option<EditOp>,
op_table: &[Vec<EditOp>],
i: usize,
j: usize)
-> Operations {
let itr = std::iter::repeat(())
.scan((i, j), |s, _| {
let op = op_table[s.0][s.1];
let (id, jd) = op.displacement();
*s = (s.0 - id, s.1 - jd);
Some(op)
})
.take_while(|op| op != &EditOp::Kill);
let mut stack: Vec<_> = kill.into_iter().chain(itr).collect();
stack.reverse();
stack
}
fn main() {
let mut cost_map = HashMap::new();
cost_map.insert(EditOp::Copy, 2);
cost_map.insert(EditOp::Replace, 3);
cost_map.insert(EditOp::Delete, 2);
cost_map.insert(EditOp::Insert, 4);
cost_map.insert(EditOp::Twiddle, 1);
cost_map.insert(EditOp::Kill, 1);
let from = "▓▓▓▓n
▒▒▒▓▓n
▒▒▒▒▒▓n
▒▒▒▒▒▒▓n
▒▒▒▒▒▒▓n
▒▒▒▒▒▒▒▓n
▒▒▒▒▒▒▒▓▓▓n
▒▓▓▓▓▓▓░░░▓n
▒▓░░░░▓░░░░▓n
▓░░░░░░▓░▓░▓n
▓░░░░░░▓░░░▓n
▓░░▓░░░▓▓▓▓n
▒▓░░░░▓▒▒▒▒▓n
▒▒▓▓▓▓▒▒▒▒▒▓n
▒▒▒▒▒▒▒▒▓▓▓▓n
▒▒▒▒▒▓▓▓▒▒▒▒▓n
▒▒▒▒▓▒▒▒▒▒▒▒▒▓n
▒▒▒▓▒▒▒▒▒▒▒▒▒▓n
▒▒▓▒▒▒▒▒▒▒▒▒▒▒▓n
▒▓▒▓▒▒▒▒▒▒▒▒▒▓n
▒▓▒▓▓▓▓▓▓▓▓▓▓n
▒▓▒▒▒▒▒▒▒▓n
▒▒▓▒▒▒▒▒▓";
let to = "│▒│ /▒/n
│▒│/▒/n
│▒/▒/─┬─┐n
│▒│▒|▒│▒│n
┌┴─┴─┐-┘─┘n
│▒┌──┘▒▒▒│n
└┐▒▒▒▒▒▒ ";
let (c, s) = edit_distance(from, to, &cost_map);
let result = edit(from, to, &s);
println!("cost = {}", c);
println!("{:?}", s);
println!("{}", result);
}
It seems my solution is incorrect for some inputs.
The main problem is in the valid
and create_tables
functions.
For some reason I added a several conditions in valid
so that i
will reach zero always after j
and this is wrong. For example "a" -> "ha"
with insert and copy should be valid. But for this to work properly all j
so that i == 0
most have insert in the table (The only way to edit an empty string is with insert).
So valid
and create_table
will end up like this:
fn create_tables(from_len: usize,
to_len : usize,
costs : &HashMap<EditOp, usize>)
-> (Vec<Vec<EditOp>>, Vec<Vec<usize>>) {
let mut op_table = vec![ vec![EditOp::Kill; to_len]; from_len ];
let mut cost_table = vec![ vec![0usize; to_len]; from_len ];
let delete_cost = *costs.get(&EditOp::Delete).expect("delete most be in costs");
let insert_cost = *costs.get(&EditOp::Insert).expect("insert most be in costs");
for (i, c) in (1.. from_len).map(|i|(i, i * delete_cost)) {
cost_table[i][0] = c;
op_table[i][0] = EditOp::Delete;
}
for (j, c) in (1.. to_len).map(|j|(j, j * insert_cost)) {
cost_table[0][j] = c;
op_table[0][j] = EditOp::Insert;
}
(op_table , cost_table)
}
fn valid(&self,
i : usize,
j : usize,
iprev : char,
jprev : char,
icurrent: char,
jcurrent: char) -> bool {
match *self {
EditOp::Copy => icurrent == jcurrent,
EditOp::Twiddle => i > 1 && j > 1
&& icurrent == jprev
&& iprev == jcurrent,
_ => true,
}
}