Multiple-alias syntax for foreach
Preamble
Author:
Sponsor: Nicholas Clark <nick@ccl4.org>
ID: 0001
Status: Implemented
Abstract
Implement the currently illegal syntax
for my ($key, $value) (%hash) { ... }
to act as
two-at-a-time iteration over hashes. This approach is not specific to
hashes - it generalises to n-at-a-time over any list.
Motivation
The each function is generally discouraged due to implementation
details, but is still the most “pretty” and concise way to write such
a loop:
while (my ($key, $value) = each %hash) { ... }
An alternative to this would be a syntax to alias multiple items per
iteration in a for loop.
for my ($key, $value) (%hash) { ... }
This generalizes as a “bundler”: each iteration aliases as many elements from the list as there are variables specified. For example:
for my ($foo, $bar, $baz) (@array) {
# $foo, $bar, and $baz are the next three elements of @array,
# or undef if overflowed
}
Rationale
The existing syntax to iterate over the keys and values of a hash:
while (my ($key, $value) = each %hash) { ... }
suffers from several problems:
- For correctness it assumes that the internal state of the hash is
  “clean”. To be robust one should reset the iterator first with
  keys %hash; in void context
- It’s hard to teach to beginners. To understand what is going on
  here one needs to know
  - list assignment
  - empty lists are false; non-empty lists are true
  - that the hash holds an internal iterator
- You can’t modify the hash inside the loop without confusing the iterator
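The first problem can be reproduced with a small sketch. The hash contents and the helper name count_pairs are invented for illustration; only each and keys are core Perl.

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2, c => 3);

# Hypothetical helper: count the pairs that a while/each loop visits.
sub count_pairs {
    my ($h) = @_;
    my $n = 0;
    while (my ($key, $value) = each %$h) {
        $n++;
    }
    return $n;
}

# Some earlier code leaves the iterator part-way through the hash.
my @consumed = each %hash;

# The loop resumes after the pair already consumed, so it sees only two.
my $stale = count_pairs(\%hash);

# Disturb the iterator again, then reset it with keys in void context.
@consumed = each %hash;
keys %hash;
my $clean = count_pairs(\%hash);

print "stale: $stale, clean: $clean\n";   # stale: 2, clean: 3
```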
You can write it in two lines as
for my $k (keys %hash) {
my $v = $hash{$k};
...
}
(three if %hash is actually a complex expression you don’t want to
evaluate twice) but that’s not perceived as an obvious “win” over the
one-liner.
The more general problem of n-at-a-time iteration over an array doesn’t have a simple generic solution. For example, you can destructively iterate an array:
while (@list) {
my ($x, $y, $z) = splice @list, 0, 3;
...
}
(with that 3 needing to be written out explicitly - it can’t be derived from the number of lexicals)
You can iterate over a list non-destructively like this:
my @temp = qw(Ace Two Three Four Five Six Seven Eight Nine Ten Jack Queen King);
for (my $i = 0; $i < @temp; $i += 3) {
my ($foo, $bar, $baz) = @temp[$i .. $i + 2];
print "$foo $bar $baz\n";
}
but that
- is verbose
- is hard to get right
- repeats the three-ness in three places
- needs a temporary array (hence a copy) for a list
- also copies all the values into the lexicals
The proposed syntax solves all of these problems.
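For comparison, the index-based loop above might be rewritten with the proposed syntax roughly as follows. This is only a sketch: it assumes Perl 5.36 or later (where the syntax is available, initially as an experimental feature), and the @cards and @pairs data are invented for illustration.

```perl
use v5.36;
no warnings "experimental::for_list";

my @cards = qw(Ace Two Three Four Five Six);

# The "three-ness" appears once: in the number of lexicals.
my @rows;
for my ($foo, $bar, $baz) (@cards) {
    push @rows, "$foo $bar $baz";
}
say for @rows;

# The lexicals are aliases, not copies, so assigning to one
# modifies the underlying array element. No temporary array is needed.
my @pairs = (1, 10, 2, 20);
for my ($n, $scale) (@pairs) {
    $n *= $scale;
}
say "@pairs";   # 10 10 40 20
```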
Specification
diff --git a/pod/perlsyn.pod b/pod/perlsyn.pod
index fe511f052e..490119d00e 100644
--- a/pod/perlsyn.pod
+++ b/pod/perlsyn.pod
@@ -282,6 +282,14 @@ The following compound statements may be used to control flow:
PHASE BLOCK
+As of Perl 5.36, you can iterate over multiple values at a time by specifying
+a list of lexicals within parentheses
+
+ no warnings "experimental::for_list";
+ LABEL for my (VAR, VAR) (LIST) BLOCK
+ LABEL for my (VAR, VAR) (LIST) BLOCK continue BLOCK
+ LABEL foreach my (VAR, VAR) (LIST) BLOCK
+ LABEL foreach my (VAR, VAR) (LIST) BLOCK continue BLOCK
+
If enabled by the experimental C<try> feature, the following may also be used
try BLOCK catch (VAR) BLOCK
@@ -549,6 +557,14 @@ followed by C<my>. To use this form, you must enable the C<refaliasing>
feature via C<use feature>. (See L<feature>. See also L<perlref/Assigning
to References>.)
+As of Perl 5.36, you can iterate over a list of lexical scalars n-at-a-time.
+If the size of the LIST is not an exact multiple of number of iterator
+variables, then on the last iteration the "excess" iterator variables are
+undefined values, much like if you slice beyond the end of an array. You
+can only iterate over scalars - unlike list assignment, it's not possible to
+use C<undef> to signify a value that isn't wanted. This is a limitation of
+the current implementation, and might be changed in the future.
+
Examples:
for (@ary) { s/foo/bar/ }
@@ -574,6 +590,17 @@ Examples:
# do something which each %hash
}
+ foreach my ($foo, $bar, $baz) (@list) {
+ # do something three-at-a-time
+ }
+
+ foreach my ($key, $value) (%hash) {
+ # iterate over the hash
+ # The hash is eagerly flattened to a list before the loop starts,
+ # but as ever keys are copies, values are aliases.
+ # This is the same behaviour as for $var (%hash) {...}
+ }
+
Here's how a C programmer might code up a particular algorithm in Perl:
for (my $i = 0; $i < @ary1; $i++) {
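The behaviour described in the patch (undefined "excess" variables on the last iteration, and hash values being aliased while keys are copied) can be illustrated with a short sketch. It assumes Perl 5.36 or later; the variable names and data are made up for the example.

```perl
use v5.36;
no warnings "experimental::for_list";

# Seven elements, three variables: on the last iteration the two
# "excess" variables are undef.
my @list = (1 .. 7);
my @seen;
for my ($x, $y, $z) (@list) {
    push @seen, join ",", map { defined($_) ? $_ : "undef" } ($x, $y, $z);
}
say for @seen;   # 1,2,3 / 4,5,6 / 7,undef,undef

# Hash values are aliases, so they can be updated in place;
# keys are copies.
my %price = (apple => 1, pear => 2);
for my ($name, $cost) (%price) {
    $cost *= 100;
}
say "$_ => $price{$_}" for sort keys %price;
```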
Backwards Compatibility
The new syntax is a syntax error on existing Perls. It generates this error message:
Missing $ on loop variable at /home/nick/test/three-at-a-time.pl line 4.
Existing static tooling should be able to recognise it as a syntax error, but without changes wouldn’t be able to give any better diagnostics.
The proposed implementation has tests and patches for B::Deparse and B::Concise. Devel::Cover, Devel::NYTProf and Perl::Critic handle the new syntax just fine.
I don’t think that we have any way to emulate this syntax for earlier Perls. I don’t have any feel for what sort of API we would need to add to the parser to make it possible in the future, and whether any API we could add would be fragile and tightly coupled to the exact current implementation.
Security Implications
We don’t think that this change is likely to cause CVEs or similar problems.
Examples
The examples in the Motivation seem sufficient.
Prototype Implementation
See https://github.com/nwc10/perl5/commits/smoke-me/nicholas/pp_iter
Future Scope
Permit undef in the list of scalars
for my ($a, undef, $c) (1 .. 9) { ... }
It’s a reasonable generalisation of list assignment, where you can
assign to undef to mean “ignore this value”. It’s also safe syntax to
implement for foreach (either as part of this proposal, or at a later
date), and likely will be useful in some specific cases. But not having
it doesn’t hinder the utility of the basic proposal.
The easiest implementation of n-at-a-time foreach is if there are
exactly n scalars, all declared in the for loop itself, because this
way they occupy adjacent Pad slots. This means that there is only one
extra integer to store in the optree, which is used both to calculate
the n-at-a-time and the addresses of the target variables. Adding undef
to the mix rules out an obvious simple, clear implementation.
If we discover that there is a good way to add undef without much
increased complexity, then we should consider doing this.
Rejected Ideas
Permit @array or %hash in the list of lexicals
for my ($key, $value, %rest) (%hash) { ... }
for my ($first, $second, @rest) (@array) { ... }
Syntactically these all “work”, and don’t violate the assumption that
all lexicals are in adjacent Pad slots. But it would add complexity to
the runtime. Generalising 1 scalar at a time to n at a time is mostly
just adding a C for loop around some existing (known working) code.
Implementing these would mean adding code for what is basically a funky way of writing
{ my ($key, $value, %rest) = %hash; ... }
{ my ($first, $second, @rest) = @array; ... }
Permit our as well as my
Using our instead of my works in the grammar, i.e. we can parse this:
for our ($foo, $bar, $baz) (@array) { ... }
However, the implementation would be far more complex. Each package variable would need a pair of GV and RV2GV ops to set it up for aliasing in a lexical, meaning that we’d need to create a list to feed into ENTERITER, and have it process that list to set up Pads. This complexity isn’t worth it.
Alternative parsable syntaxes suggested by Father Chrysostomos
When this syntax was suggested in 2017, Father Chrysostomos observed that in addition to the first syntax, the next two are also currently errors and could be parsed:
for my ($key, $value) (%hash) { ... }
for my $key, my $value (%hash) { ... }
for $foo, $bar (%hash) { ... }
He notes that it’s only this illegal syntax that could not be parsed:
for ($foo, $bar) (%hash) { ... } # syntax error
https://www.nntp.perl.org/group/perl.perl5.porters/2017/04/msg243856.html
Strictly we also can’t parse this:
for (my $key, my $value) (%hash) { ... }
meaning that we can’t offer a choice of syntax in for analogous to:
(my $foo, my $bar, my $baz) = @ARGV
my ($foo, $bar, $baz) = @ARGV
which parse to the same optree.
However, these two lines are not the same:
my $key, my $value = @pair;
my ($key, $value) = @pair;
hence we should not offer the alternative syntax:
for my $key, my $value (%hash) { ... }
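The difference between the two assignment forms can be seen in a small sketch. Because assignment binds more tightly than the comma operator, the form without parentheses is a scalar assignment to the second variable only. The @pair contents and the $k1/$v1/$k2/$v2 names are invented for illustration.

```perl
use strict;
use warnings;

my @pair = ("colour", "red");

# Parsed as: my $key, (my $value = @pair);
# so $value receives the element count and $key stays undef.
my ($k1, $v1) = do {
    my $key, my $value = @pair;
    ($key, $value);
};

# A true list assignment: each variable receives one element.
my ($k2, $v2) = do {
    my ($key, $value) = @pair;
    ($key, $value);
};

printf "without parens: key=%s value=%s\n", $k1 // "undef", $v1;
printf "with parens:    key=%s value=%s\n", $k2, $v2;
```

The first line prints key=undef value=2, the second key=colour value=red.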
For implementation reasons we only want to handle lexicals. This rules out:
for $foo, $bar (%hash) { ... }
So these alternative possible syntaxes should not be offered.
Should this be behind a feature guard?
As the new syntax was previously an error, and previously in similar situations we have not used a feature guard, we don’t think that we should this time.
Open Issues
Copyright
Copyright (C) 2021, Nicholas Clark
This document and code and documentation within it may be used, redistributed and/or modified under the same terms as Perl itself.