Text processing/2

This version does validation with a single Perl 6 regex that is much more readable than the typical regex, and arguably expresses the data structure more straightforwardly. Here we use normal quotes for literals, and \h for horizontal whitespace.

Variables like $good-record that are going to be autoincremented do not need to be initialized.

The .push method on a hash is magical and loses no information; if a duplicate key is found in the pushed pair, an array of values is automatically created of the old value and the new value pushed. Hence we can easily track all the lines that a particular duplicate occurred at.

The .all method does "junctional" logic: it autothreads through comparators as any English speaker would expect. Junctions can also short-circuit as soon as they find a value that doesn't match, and the evaluation order is up to the computer, so it can be optimized or parallelized.

The final line simply greps out the pairs from the hash whose value is an array with more than 1 element. (Those values that are not arrays nevertheless have a .elems method that always reports 1.) The .pairs is merely there for clarity; grepping a hash directly has the same effect. Note that we sort the pairs after we've grepped them, not before; this works fine in Perl 6, sorting on the key and value as primary and secondary keys. Finally, pairs and arrays provide a default print format that is sufficient without additional formatting in this case.

my $good-records;
my $line;
my %dates;

for lines() {
    $line++;
    / ^
    (\d ** 4 '-' \d\d '-' \d\d)
    [ \h+ \d+'.'\d+ \h+ ('-'?\d+) ] ** 24
    $ /
        or note "Bad format at line $line" and next;
    %dates.push: $0 => $line;
    $good-records++ if $1.all >= 1;
}

say "$good-records good records out of $line total";

say 'Repeated timestamps (with line numbers):';
.say for sort %dates.pairs.grep: *.value.elems > 1;

Output:

Output:

5017 good records out of 5471 total
Repeated timestamps (with line numbers):
1990-03-25 => [84 85]
1991-03-31 => [455 456]
1992-03-29 => [819 820]
1993-03-28 => [1183 1184]
1995-03-26 => [1910 1911]