Awk program for extracting unique values from a k1=v1,k2=v2,… list

Problem

A single string consisting of comma-separated pairs in the format <key>=<value>, like

ABC=https://host:443/topic1,DEF=https://host:443/topic1,GHI=https://host:443/topic3,JKL=https://host:443/topic3

must be converted to a single line containing the set of unique values (order does not matter), separated by single spaces, i.e.:

https://host:443/topic1 https://host:443/topic3

via an Awk program.

The background: a shell script has to convert one form of CLI arguments into another, like:

#!/bin/bash

all_args="ABC=https://host:443/topic1,DEF=https://host1:443/topic1,GHI=https://host:443/topic3,JKL=https://host:443/topic3"
command1 "$all_args"
# The command substitution is left unquoted on purpose so each value becomes its own argument
command2 $(echo "$all_args" | awk -f extractor.awk)

The current solution:

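BEGIN { FS="," }
{
    for (i = 1; i <= NF; i++) {
        split($i, arr, "=")        # arr[1] is the key, arr[2] the value
        vals[arr[2]] = arr[2]      # duplicates overwrite the same index
    }
    # "%s" keeps a "%" in the data from being read as a format specifier
    for (v in vals) printf("%s ", v)
}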

Solution

The array values are irrelevant. Instead of

        vals[arr[2]] = arr[2]

you could use

        vals[arr[2]] = 1

to indicate you’re storing flags in vals, and it is the keys that are important later, not the values.
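
Because only the keys matter, the same array can also be queried with the in operator. The following is a minimal sketch, not part of the original program: the function name unique_values and the array name seen are made up for illustration. It uses the flag array to emit each value once, in first-seen order and without a trailing space.

```shell
# Hypothetical helper: prints the unique values of a k=v,k=v,... line,
# space-separated, in the order they first appear.
unique_values() {
    awk -F, '
    {
        out = ""
        for (i = 1; i <= NF; i++) {
            split($i, arr, "=")
            if (!(arr[2] in seen)) {     # "in" tests the keys only
                seen[arr[2]] = 1         # the stored 1 is just a flag
                out = (out == "" ? "" : out " ") arr[2]
            }
        }
        print out
    }'
}

result=$(printf '%s\n' "ABC=https://host:443/topic1,DEF=https://host:443/topic1,GHI=https://host:443/topic3,JKL=https://host:443/topic3" | unique_values)
```

Note that seen is never cleared here, so with multi-line input a value already printed for an earlier line would be skipped on later lines.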

If your input is guaranteed to be “well formed”, and you can assume there is exactly one , between each pair, and one = inside each pair, you can separate fields on BOTH the comma and the equals characters. Then you could extract every second field and flag those in your vals set. This eliminates the need for split().

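BEGIN { FS="[,=]" }
{
        # Fields alternate key, value, key, value, ... so the values are the even-numbered fields
        for (i = 2; i <= NF; i += 2) {
                vals[$i] = 1
        }
        # "%s" keeps a "%" in the data from being read as a format specifier
        for (v in vals) printf("%s ", v)
}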
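
The "well formed" assumption matters: if a value itself contains an = (a URL with a query string, for example), splitting on both characters would cut the value apart. As a hedged alternative, the sketch below splits on commas only and cuts each pair at its first = with index() and substr(). The function name extract_values and the sample URLs are invented for illustration, and values containing commas are still not handled.

```shell
# Hypothetical helper: prints each unique value on its own line.
# Each pair is cut at its FIRST "=", so "=" inside a value survives.
extract_values() {
    awk -F, '
    {
        for (i = 1; i <= NF; i++) {
            pos = index($i, "=")         # 0 when the pair has no "="
            if (pos > 0) vals[substr($i, pos + 1)] = 1
        }
    }
    END {
        for (v in vals) print v          # iteration order is unspecified
    }'
}

# Sort only to make the result reproducible, then join the lines with spaces.
result=$(printf '%s\n' "A=https://host/t?x=1,B=https://host/t?x=1,C=https://host/t3" \
    | extract_values | LC_ALL=C sort | paste -s -d ' ' -)
```

Printing one value per line in an END block also avoids the trailing space, and an unquoted $(...) in the calling script still splits the output into separate arguments.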
