Searching for an idiomatic Rust implementation of Minimum Edit Distance (LeetCode #72)

Posted on

Problem

I am solving LeetCode Dynamic Programming challenges (this one is #72, at https://leetcode.com/problems/edit-distance).

Here below, I’ve implemented a Rust solution for the minimum edit distance between two strings problem. (Note that this solution works and passes all the test cases on LeetCode)

use std::collections::HashMap;
use std::cmp::min;

impl Solution {
    pub fn min_distance(word1: String, word2: String) -> i32 {
        let mut D: HashMap<(i32, i32), i32> = HashMap::new();
        let m = word1.len() as i32;
        let n = word2.len() as i32;

        let w1: Vec<char> = word1.chars().collect();
        let w2: Vec<char> = word2.chars().collect();

        for i in 0..=m {
            D.insert((i, 0), i);
        }
        for j in 0..=n {
            D.insert((0, j), j);
        }

        for i in 1..=m {
            for j in 1..=n {
                if w1[(i-1) as usize] == w2[(j-1) as usize] { 
                    D.insert((i, j), *D.get(&(i-1, j-1)).unwrap());
                } else {
                    let p = *D.get(&(i - 1, j - 1)).unwrap();
                    let q = *D.get(&(i - 1, j)).unwrap();
                    let r = *D.get(&(i, j - 1)).unwrap();

                    D.insert((i, j), 1 + min(p, min(q, r)));
                }
            }
        }

        return *D.get(&(m, n)).unwrap();
    }
}

I’ve actually also solved the same thing in JAVA, which I have a much better command on…

public class EditDistance {
    public static int minDistance(String word1, String word2) {
        int m = word1.length();
        int n = word2.length();
        int[][] D = new int[m+1][n+1];

        for(int i=0; i<=m; ++i) D[i][0] = i;
        for(int j=0; j<=n; ++j) D[0][j] = j;

        for(int i=1; i<=m; ++i) {
            for(int j=1; j<=n; ++j) {
                if(word1.charAt(i-1) == word2.charAt(j-1)) D[i][j] = D[i-1][j-1];
                else D[i][j] = 1 + Math.min(D[i-1][j-1], Math.min(D[i-1][j], D[i][j-1]));
            }
        }
        return D[m][n];
    }

    public static void main(String[] args) {
        System.out.println(minDistance("intention", "execution"));
    }
}

Something about my Rust solution seems off to me. Not really sure but here are some of my irritations:

  1. I couldn’t find a better way to iterate a string and refer to it by its index easily other than having to convert it to a vector.

  2. I think I’m unwrap()-ing too much. Although I’m not aiming for code golf here, it still feels subconsciously that I could have avoided so many unwrap()s.

  3. I feel a bit bad because I kind of word to word translated my solution from Java… perhaps there’s a more Rust-idiomatic way of doing whatever I did?

Looking forward to your advise and help 🙂 Thanks in advance!

Solution

The first thing I noticed about both versions of your code is the plenitude of single-letter names, something you can improve upon. Incidentally, Rust uses snake_case for variables, so the compiler would have warned you about D.

Now, let’s talk about the struggles I presume you have by reading the Rust code.

First of all, you probably felt forced to convert between i32 and usize all the time. The decision to return i32 instead of usize is a typical indicator of the less-than-ideal quality of LeetCode interfaces (as usual). In fact, it goes worse – impl Solution is not a good idea in Rust, and there is no reason to take the ownership of strings here. My advice here would be to perform all calculations using usize and only convert to i32 at the very end.

I assume HashMap<(i32, i32), i32> results from the lack of multi-dimensional vector support in Rust. Indeed, my first choice would be to use ndarray in this scenario, which is not available in the standard library. You can make a small wrapper for this:

struct Vec2<T> {
    elements: Vec<T>,
    n_rows: usize,
    n_columns: usize,
}

and derive Index<(usize, usize)> and IndexMut<(usize, usize)>. Or you could sacrifice some performance and use Vec<Vec<usize>>.

Instead of

let w1: Vec<char> = word1.chars().collect();
let w2: Vec<char> = word2.chars().collect();

I would simply use .as_bytes(), assuming that LeetCode doesn’t consider multi-byte characters.


Here’s how I reassembled everything according to the Java version: (not tested)

pub fn min_distance(word_a: String, word_b: String) -> i32 {
    let word_a = word_a.as_bytes();
    let word_b = word_b.as_bytes();

    let len_a = word_a.len();
    let len_b = word_b.len();

    let mut matrix = vec![vec![0; len_b + 1]; len_a + 1];

    for i in 0..=len_a {
        matrix[i][0] = i;
    }
    for j in 0..=len_b {
        matrix[0][j] = j;
    }

    for i in 1..=len_a {
        for j in 1..=len_b {
            matrix[i][j] = if word_a[i - 1] == word_b[j - 1] {
                matrix[i - 1][j - 1]
            } else {
                1 + matrix[i - 1][j - 1]
                    .min(matrix[i - 1][j])
                    .min(matrix[i][j - 1])
            };
        }
    }

    matrix[len_a][len_b] as i32
}

The restriction of LeetCode (poor interface, no external crates) caused me to be lazy, so I used nested Vecs. In a real-world scenario, ndarray::Array2 would be a far better choice.

Besides, this is still far from optimal for Rust, where we tend to avoid manual indexing and - 1. Taking into account the algorithmic nature of this application, I won’t go deeper into this.

Leave a Reply

Your email address will not be published. Required fields are marked *