Multiple-alias syntax for foreach

Preamble

Author:
Sponsor: Nicholas Clark <nick@ccl4.org>
ID:      0001
Status:  Implemented

Abstract

Implement the currently illegal syntax for my ($key, $value) (%hash) { ... } to act as two-at-a-time iteration over hashes. This approach is not specific to hashes - it generalises to n-at-a-time over any list.

Motivation

The each function is generally discouraged due to implementation details, but is still the most “pretty” and concise way to write such a loop:

while (my ($key, $value) = each %hash) { ... }

An alternative to this would be a syntax to alias multiple items per iteration in a for loop.

for my ($key, $value) (%hash) { ... }

This generalizes as a “bundler” - to alias a number of elements from the list equal to the number of variables specified. Such as:

for my ($foo, $bar, $baz) (@array) {
    # $foo, $bar, and $baz are the next three elements of @array,
    # or undef if overflowed

Rationale

The existing syntax to iterate over the keys and values of a hash:

while (my ($key, $value) = each %hash) { ... }

suffers from several problems

For correctness it assumes that the internal state of the hash being “clean” - to be robust one should reset the iterator first with keys %hash; in void context
It’s hard to teach to beginners - to understand what is going on here one needs to know
- list assignment
- empty lists are false; non-empty lists are true
- that the hash holds an internal iterator
You can’t modify the hash inside the loop without confusing the iterator

You can write it in two lines as

for my $k (keys %hash) {
    my $v = $hash{$h};
    ...
}

(three if %hash is actually a complex expression you don’t want to evaluate twice)

but that’s not perceived as an obvious “win” over the one-liner.

The more general n-at-a-time iteration of an array problem doesn’t have a simple generic solution. For example, you can destructively iterate an array:

while (@list) {
    my ($x, $y, $z) = splice @list, 0, 3;
    ...
}

(with that 3 needing to be written out explicitly - it can’t derived from the number of lexicals)

You can iterate over an list non-destructively like this:

my @temp = qw(Ace Two Three Four Five Six Seven Eight Nine Ten Jack Queen King);
for (my $i = 0; $i < @temp; $i += 3) {
    my ($foo, $bar, $baz) = @temp[$i .. $i + 2];
    print "$foo $bar $baz\n";
}

but that

is verbose
is hard to get right
repeats the three-ness in three places
needs a temporary array (hence a copy) for a list
also copies all the values into the lexicals

The proposed syntax solves all of these problems.

Specification

diff --git a/pod/perlsyn.pod b/pod/perlsyn.pod
index fe511f052e..490119d00e 100644
--- a/pod/perlsyn.pod
+++ b/pod/perlsyn.pod
@@ -282,6 +282,14 @@ The following compound statements may be used to control flow:

     PHASE BLOCK

+As of Perl 5.36, you can iterate over multiple values at a time by specifying
+a list of lexicals within parentheses
+
+    no warnings "experimental::for_list";
+    LABEL for my (VAR, VAR) (LIST) BLOCK
+    LABEL for my (VAR, VAR) (LIST) BLOCK continue BLOCK
+    LABEL foreach my (VAR, VAR) (LIST) BLOCK
+    LABEL foreach my (VAR, VAR) (LIST) BLOCK continue BLOCK
+
 If enabled by the experimental C<try> feature, the following may also be used

     try BLOCK catch (VAR) BLOCK
@@ -549,6 +557,14 @@ followed by C<my>.  To use this form, you must enable the C<refaliasing>
 feature via C<use feature>.  (See L<feature>.  See also L<perlref/Assigning
 to References>.)

+As of Perl 5.36, you can iterate over a list of lexical scalars n-at-a-time.
+If the size of the LIST is not an exact multiple of number of iterator
+variables, then on the last iteration the "excess" iterator variables are
+undefined values, much like if you slice beyond the end of an array.  You
+can only iterate over scalars - unlike list assignment, it's not possible to
+use C<undef> to signify a value that isn't wanted.  This is a limitation of
+the current implementation, and might be changed in the future.
+
 Examples:

     for (@ary) { s/foo/bar/ }
@@ -574,6 +590,17 @@ Examples:
         # do something which each %hash
     }

+    foreach my ($foo, $bar, $baz) (@list) {
+        # do something three-at-a-time
+    }
+
+    foreach my ($key, $value) (%hash) {
+        # iterate over the hash
+        # The hash is eagerly flattened to a list before the loop starts,
+        # but as ever keys are copies, values are aliases.
+        # This is the same behaviour as for $var (%hash) {...}
+    }
+
 Here's how a C programmer might code up a particular algorithm in Perl:

     for (my $i = 0; $i < @ary1; $i++) {

Backwards Compatibility

The new syntax is a syntax error on existing Perls. It generates this error message

Missing $ on loop variable at /home/nick/test/three-at-a-time.pl line 4.

Existing static tooling should be able to recognise it as a syntax error, but without changes wouldn’t be able to give any better diagnostics.

The proposed implementation has tests and patches for B::Deparse and B::Concise. Devel::Cover, Devel::NYTProf and Perl::Critic handle the new syntax just fine.

I don’t think that we have any way to emulate this syntax for earlier Perls. I don’t have any feel for what sort of API we would need to add to the parser to make it possible in the future, and whether any API we could add would be fragile and tightly coupled to the exact current implementation.

Security Implications

We don’t think that this change is likely to cause CVEs or similar problems.

Examples

The examples in the Motivation seem sufficient.

Prototype Implementation

See https://github.com/nwc10/perl5/commits/smoke-me/nicholas/pp_iter

Future Scope

Permit `undef` in the list of scalars

for my ($a, undef, $c) (1 .. 9) { ... }

It’s a reasonable generalisation of list assignment, where you can assign to undef to mean “ignore this value”. It’s also safe syntax to implement for foreach (either as part of this proposal, or at a later date), and likely will be useful in some specific cases. But not having it doesn’t hinder the utility of the basic proposal.

The easiest implementation of n-at-a-time foreach is if there are exactly n scalars, all declared in the for loop itself, because this way they occupy adjacent Pad slots. This means that there is only one extra integer to store in the optree, which used both to calculate the n-at-a-time and the addresses of the target variables. adding undef to the mix rules out an obvious simple, clear implementation.

If we discover that there is a good way to add undef without much increased complexity, then we should consider doing this.

Rejected Ideas

Permit @array or %hash in the list of lexicals

for my ($key, $value, %rest) (%hash) { ... }
for my ($first, $second, @rest) (@array) {... }

Syntactically these all “work”, and don’t violate the assumption that all lexicals are in adjacent Pad slots. But it would add complexity to the runtime. Generalising 1 scalar at a time to n at a time is mostly just adding a C for loop around some existing (known working) code.

Implementing these would mean adding code for what is basically a funky way of writing

{ my ($key, $value, %rest) = %hash; ... }
{ my ($first, $second, @rest) = @array; ... }

Permit `our` as well as `my`

Using our instead of my works in the grammar. ie we can parse this:

for our ($foo, $bar, $baz) (@array) {

However, the implementation would be far more complex. Each package variable would need a pair of GV and RV2GV ops to set it up for aliasing in a lexical, meaning that we’d need to create a list to feed into ENTERITER, and have it process that list to set up Pads. This complexity isn’t worth it.

Alternative parsable syntaxes suggested by Father Chrysostomos

When this syntax was suggested in 2017, Father Chrysostomos observed that in addition to the first syntax, the next two are also currently errors and could be parsed:

for my ($key, $value) (%hash) { ... }
for my $key, my $value (%hash) { ... }
for $foo, $bar (%hash) { ... }

He notes that it’s only this illegal syntax that could not be parsed:

for ($foo, $bar) (%hash) { ... } # syntax error

https://www.nntp.perl.org/group/perl.perl5.porters/2017/04/msg243856.html

Strictly we also can’t parse this:

for (my $key, my $value) (%hash) { ... }

meaning that we can’t offer choice of syntax in for analogous to:

(my $foo, my $bar, my $baz) = @ARGV
my ($foo, $bar, $baz) = @ARGV

which parse to the same optree.

However, these two lines are not the same:

my $key, my $value = @pair;
my ($key, $value) = @pair;

hence we should not offer the alternative syntax:

for my $key, my $value (%hash) { ... }

For implementation reasons we only want to handle lexicals. This rules out:

for $foo, $bar (%hash) { ... }

So these alternative possible syntaxes should not be offered.

Should this be behind a feature guard?

As the new syntax was previously an error, and previously in similar situations we have not used a feature guard, we don’t think that we should this time.

Open Issues

Copyright

This document and code and documentation within it may be used, redistributed and/or modified under the same terms as Perl itself.