Huffman coding

By building a tree

This version uses nested Arrays to build a tree like shown in this diagram, and then recursively traverses the finished tree to accumulate the prefixes.

sub huffman (%frequencies) {
    my @queue = %frequencies.map({ [.value, .key] }).sort;
    while @queue > 1 {
        given @queue.splice(0, 2) -> ([$freq1, $node1], [$freq2, $node2]) {
            @queue = (|@queue, [$freq1 + $freq2, [$node1, $node2]]).sort;
        }
    }
    hash gather walk @queue[0][1], '';
}

multi walk ($node,            $prefix) { take $node => $prefix; }
multi walk ([$node1, $node2], $prefix) { walk $node1, $prefix ~ '0';
                                         walk $node2, $prefix ~ '1'; }

Without building a tree

This version uses an Array of Pairs to implement a simple priority queue. Each value of the queue is a Hash mapping from letters to prefixes, and when the queue is reduced the hashes are merged on-the-fly, so that the last one remaining is the wanted Huffman table.

sub huffman (%frequencies) {
    my @queue = %frequencies.map: { .value => (hash .key => '') };
    while @queue > 1 {
        @queue.=sort;
        my $x = @queue.shift;
        my $y = @queue.shift;
        @queue.push: ($x.key + $y.key) => hash $x.value.deepmap('0' ~ *),
                                               $y.value.deepmap('1' ~ *);
    }
    @queue[0].value;
}

# Testing

for huffman 'this is an example for huffman encoding'.comb.Bag {
    say "'{.key}' : {.value}";
}

# To demonstrate that the table can do a round trip:

say '';
my $original = 'this is an example for huffman encoding';

my %encode-key = huffman $original.comb.Bag;
my %decode-key = %encode-key.invert;
my @codes      = %decode-key.keys;

my $encoded = $original.subst: /./,      { %encode-key{$_} }, :g;
my $decoded = $encoded .subst: /@codes/, { %decode-key{$_} }, :g;

.say for $original, $encoded, $decoded;

Output:

'x' : 11000
'p' : 01100
'h' : 0001
'g' : 00000
'a' : 1001
'e' : 1101
'd' : 110011
's' : 0111
'f' : 1110
'c' : 110010
'm' : 0010
' ' : 101
'n' : 010
'o' : 0011
'u' : 10001
't' : 10000
'i' : 1111
'r' : 01101
'l' : 00001

this is an example for huffman encoding
1000000011111011110111110111101100101010111011100010010010011000000111011011110001101101101000110001111011100010100101010111010101100100011110011111101000000
this is an example for huffman encoding