Mapping words to letters which contains

Posted on

Problem

Given sentence for example: ala ma kota, kot koduje w Javie kota I should to every letter occurring in the sentence I should map every string that this letter contains. So the output should be:

{a=[ala, javie, kota, ma], d=[koduje], e=[javie, koduje], i=[javie], j=[javie, koduje], k=[koduje, kot, kota], l=[ala], m=[ma], o=[koduje, kot, kota], t=[kot, kota], u=[koduje], v=[javie], w=[w]}

I have written algorithm for it, but I think that there is a better way do to this using streams (or somehow else).
Code;

String sentence = "ala ma kota, kot koduje w Javie kota .kota";

        String purifiedLowerCaseSentence = getPurifiedLowerCaseSentence(sentence);
        Set<Character> characters = getEveryCharacterOccurringInSentence(purifiedLowerCaseSentence);
        String[] splittedSentence = purifiedLowerCaseSentence.split("\s");
        Map<Character, Set<String>> characterWithAccordingWord = new LinkedHashMap<>();

        for (char iteratedCharacter : characters) {
            for (String iteratedWord : splittedSentence) {
                if (iteratedWord.indexOf(iteratedCharacter) >= 0) {
                    characterWithAccordingWord.computeIfAbsent(iteratedCharacter, k -> new TreeSet<>());
                    characterWithAccordingWord.get(iteratedCharacter).add(iteratedWord);
                }
            }
        }

And these two methods which are placed in other class:

static String getPurifiedLowerCaseSentence(String sentence) {
        return sentence.replaceAll("\p{P}", "").toLowerCase();
    }

    static Set<Character> getEveryCharacterOccurringInSentence(String sentence) {
        return sentence.chars()
                .mapToObj(e -> (char) e).collect(Collectors.toSet());
    }

Solution

Not sure about its efficiency or whether or not it improves your algorithm, but since you want to use streams, this would be an option:

public static void main(String[] args) {
        String sentence = "ala ma kota, kot koduje w Javie kota .kota";
        String sanitizedSentence = sentence.replaceAll("[^A-Za-z ]", "").toLowerCase();
        String[] words = sanitizedSentence.split(" ");
        Map<String,List<String>> result = sanitizedSentence
            .chars()
            .filter(Character::isAlphabetic)
            .distinct()
            .sorted()
            .mapToObj(c->(char)c)
            .map(Object::toString)
            .collect(Collectors.toMap(
                Function.identity(), 
                c->Stream.of(words).distinct().filter(word -> word.contains(c)).sorted().collect(Collectors.toList()),
                (a,b)->a,
                LinkedHashMap::new // <- to ensure they are shown in order when printing
            ));
        System.out.println(result);
    }

You actually have three nested loops there, while two would suffice (the indexOf loops the characters in the word).

For each word in the sentence
    For each character in the word
        Add the word to the set associated to the character

The solution is inefficient.

First of all, there is absolutely no need to calculate the set of characters in advance. you can iterate over the sentence and build the map during this iteration. you don’t need to know the keys in advance.

Furthermore, for each distinct character, you iterate over the entire sentence. there is really no need for this Cartesian-type processing. the map can be calculated with just one iteration over the sentence. This is providing you can parse the words without the need for split()

while we are in this subject, split by whitespace will leave the words with punctuation marks, so you would get “kota,” in the output, not to mention your example sentence contains two occurrences of “kota” that will appear in the output, so you need to decide if that’s what you want.

Leave a Reply

Your email address will not be published. Required fields are marked *