built-in functions for checking the original type of a scalar

Preamble

Author:  Graham Knop <haarg@haarg.org>
Sponsor:
ID:      0017
Status:  Implemented

Abstract

This PPC proposes adding new builtin functions created_as_string and created_as_number for checking the original form of non-scalar references. Specifically, checking if a value was created as a string or a number. This is primarily meant for use by serializers.

Motivation

Perl itself is operator typed and is not generally meant to distinguish between values like 1 and “1”. However, many external systems do care about these types, and many times these values need to be transferred through perl without anything like a schema to enforce these types. This makes it important for things like serializers to be able to track the “origin” type of a value.

Classically, perl scalars have no set type. They may be created as strings or numbers, but based on their use they will internally store alternate forms of the value. If these alternate forms are considered “accurate”, the corresponding flag will be set. This makes it impossible to know for certain how some values were created, as they will have both string and number forms that are flagged as “accurate”.

In perl v5.36, this has changed. The POK flag is now only set for values that are created as strings. Values that are created as strings may gain IOK or NOK flags based on their use, but since numbers will never gain the POK flag, this can still be used to detect numbers.

Exposing this information via builtin functions will allow pure perl code to accurately serialize values.

Rationale

While the flags on scalars can be checked via the B module, using it requires an understanding of perl internals, and it is slow and uses a fairly large amount of memory. Being able to interpret the flags to know string vs number is also not entirely obvious.

The functions are intentionally using longer less convenient names, with the hope that this will discourage their use for type checks within perl. Inside perl, 1 and “1” are meant to be treated equivalently. It is only when transferring data outside perl that these are meant to be used.

created_as is meant to only tell how a value was created, without implying anything further about what the value “is”. created in this case means any modification of the value, as any modification of a value requires creating the new, modified value.

Specification

created_as_string VALUE

created_as_string takes a single scalar argument.

It returns a boolean value representing if the value was originally created as a string. It will return false for references, numbers, booleans, and undef. Internally, this would be SvPOK(sv) && !SvIsBOOL(sv).

created_as_number VALUE

created_as_number takes a single scalar argument.

It returns a boolean value representing if the value was originally created as a number. It will return false for references, strings, booleans, and undef. Internally, this would be SvNIOK(sv) && !SvPOK(sv) && !SvIsBOOL(sv).

Backwards Compatibility

There should be no significant backwards compatibility concerns. created_as_string and created_as_number will be imported only when requested. Static analysis tools may need to be updated.

As these functions rely on the changes to how SV flags get set, it isn’t possible to accurately implement these on older perls. Existing serializers use heuristics that will usually give accurate answers. One of these heuristics could be used to provide a compatible polyfill.

Security Implications

Nothing specific predicted.

Examples

use builtin qw(created_as_number created_as_string);

my $value1 = "1";
my $string1 = created_as_string $value1; # true
my $number1 = created_as_number $value1; # false

my $value2 = 1;
my $string2 = created_as_string $value2; # false
my $number2 = created_as_number $value2; # true

my $value3 = "1";
my $used_as_number3 = 0+$value3;
my $string3 = created_as_string $value3; # true
my $number3 = created_as_number $value3; # false

my $value4 = 1;
my $used_as_string4 = "$value4";
my $string4 = created_as_string $value4; # false
my $number4 = created_as_number $value4; # true

Prototype Implementation

None.

Future Scope

It is possible that in the future, the NOK or IOK flags will only be set for values created as numbers. That would mean the implementation of these functions may change, but should not impact the results returned.

Additionally, if the NOK and IOK flags were only set for values created as integers or created as floats, it may be possible to provide additional functions created_as_integer and created_as_float to distinguish these types. Some serializers care about this distinction as well.

Rejected Ideas

Various alternative names for these functions have been proposed.

is_string and is_number were the initial discussed functions, but these names imply stronger typing than perl is really meant to have. They would more strongly encourage using these functions for type checking within perl, rather than the usually more appropriate looks_like_number.

was_originally_string and was_originally_number were also proposed, but originally carries the implication that the values may no longer be those things. The use of an adverb also feels wrong stylistically.

Open Issues

None?

Copyright (C) 2022, Graham Knop.

This document and code and documentation within it may be used, redistributed and/or modified under the same terms as Perl itself.