Perl Style Guide

From Proxmox VE
Revision as of 07:33, 16 April 2020 by Thomas Lamprecht (talk | contribs) (add some regex info)

PVE Perl Style Guide

Many of our files have inconsistent styles, partly due to historical growth and partly because of the mixed styles we received from various contributors.

Please try to follow this guide if you're working on code for PVE to avoid adding to the style mess.

Here's a summary of our style (which is somewhat unusual, at least when it comes to the mixed indentation).

Indentation

We indent by 4 spaces, assume tabs to be 8 spaces wide, and convert all groups of 8 spaces at the beginning of a line into tabs. Here's an example, with tabs represented as '>.......':

sub foo {
    my ($arg1, $arg2, $arg3) = @_;
    if ($arg1) {
>.......print($arg2, "\n");
>.......if ($arg3) {
>.......    die "Exceptions should end with a newline in most cases\n"
>.......>.......if $arg3 ne $arg2;
>.......    print("Another line\n");
>.......}
    }
}

Please try to avoid lines longer than 80 characters unless it really ruins the code layout (which often means you want to factor your code better).

The vim configuration would be `:set ts=8 sts=4 sw=4 noet cc=80`

Like all editors (for some reason I'll never understand), we do not distinguish between indentation and alignment, so if you split an expression over multiple lines, the continuation lines still follow the same "every 8 spaces become 1 tab" pattern.
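The conversion rule can be sketched as a small helper. This is only an illustration: the name retab_leading is made up for this example, and it assumes the leading whitespace of the input consists of spaces only.

```perl
use strict;
use warnings;

# Hypothetical helper: turn every full group of 8 leading spaces into one
# tab and keep the remaining spaces (at most 7) as they are.
sub retab_leading {
    my ($line) = @_;
    my ($indent, $rest) = $line =~ /^( *)(.*)$/s;
    my $tabs = int(length($indent) / 8);
    my $spaces = length($indent) % 8;
    return ("\t" x $tabs) . (' ' x $spaces) . $rest;
}

# 12 leading spaces (three indentation levels) become one tab plus 4 spaces.
print(retab_leading((' ' x 12) . "code();\n"));
```

A line indented by only 4 spaces passes through unchanged, which is why the first indentation level in the example above contains no tab.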

Spacing and syntax usage

Spaces around parentheses with syntactical words are inserted as you would in regular text (one before the opening parenthesis, one after the closing parenthesis). Similarly for curly braces:

  • use if (cond) {
  • use while (cond) {
  • not if(cond) {
  • not if (cond){
  • not if(cond){

BUT: no space between a function's name and its parameter list:

  • func(params)
  • not func (params)

Use spaces around braces of lists:

  • use my $list = [ 1, 2, 3 ];
  • use my $hash = { one => 1, two => 2 };

No spaces for unary operators or sigils which are directly connected to one another, or in array/hash accesses (here the contents of the brackets or curly braces represent content of the expression/variable to its left, so it makes sense to "group" them):

  • use !$foo
  • not ! $foo
  • use $foo->{text}
  • use $foo{text}
  • use $foo->[index]
  • use $foo[index]
  • use $foo->(args)
  • not &$foo(args)
  • not & $foo(args)

In general: use spaces in arithmetic expressions in a way which makes sense, e.g. you can skip them on a short single binary operation, or if it helps reading the expression by grouping it so that the operator precedence is emphasized. Do not add spaces in a way which conflicts with operator precedence:

  • use a + b
  • not a+b
  • may use a*3 + b*4
  • must not use a+3 * b+4
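Put together, the spacing rules above might look like this in practice. The snippet is a minimal sketch; sum, $double and the data are invented purely for illustration.

```perl
use strict;
use warnings;

my $list = [ 1, 2, 3 ];
my $hash = { one => 1, two => 2 };
my $double = sub { return $_[0] * 2; };

sub sum {
    my (@values) = @_;
    my $total = 0;
    $total += $_ for @values;
    return $total;
}

# Keyword, space, parenthesis; no space between function name and arguments.
if (!$hash->{three}) {
    print(sum(@{$list}), "\n");
}

# Tight spacing groups the multiplications, matching operator precedence.
my $weighted = $list->[0]*3 + $list->[1]*4;
print($double->($weighted), "\n");
```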

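In if-else blocks an else or elsif should be on its own line together with its curly braces, i.e. on the same line as the closing brace of the previous block and the opening brace of its own block.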

# Good
if (expr) {
    code();
} else {
    code();
}

# Bad
if (expr) {
    code();
}
else {
    code();
}

# Bad
if (expr)
{
    code();
}
else
{
    code();
}

Perl syntax choices

Most of these are chosen for semantic clarity and should make the code easier to understand for people who don't use much Perl or simply aren't used to our code base:

  • use $foo->(args) instead of &$foo(args)
  • use $foo->[subscript] instead of $$foo[subscript]
  • use $foo->{subscript} instead of $$foo{subscript}
  • prefer $foo = value if !defined($foo);
    over $foo //= value;
  • use if (!cond) {
    over unless (cond) {
  • use either of foreach or for when looping over a list of elements.

Regarding the first three: When not accessing an element but simply dereferencing *once*, the dereferencing sigil can be put in front with braces, e.g. ${stuff} or @{stuff} (where stuff stands for the expression yielding the reference), provided stuff is easy enough to read. Otherwise pull stuff out into a local variable first.
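A short sketch of how these choices read together. All names and data here ($handlers, $config, $limit) are hypothetical and exist only for the example.

```perl
use strict;
use warnings;

my $handlers = {
    greet => sub { my ($name) = @_; return "hello $name"; },
};
my $config = { names => [ 'alice', 'bob' ] };

# Explicit defined-check instead of '//='.
my $limit;
$limit = 2 if !defined($limit);

# Pull references out into local variables so the dereference stays simple.
my $greet = $handlers->{greet};
my $names = $config->{names};

my @greetings;
for my $name (@{$names}) {
    push @greetings, $greet->($name);
}

# Negated 'if' instead of 'unless'.
if (!@greetings) {
    die "no names configured\n";
}
print("$greetings[0], $names->[1]\n");
```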

Blocks and multi-line statements

While there certainly are way-too-long lines in our code already, please try to avoid adding more of them.

Generally, when a line is long, try to split it up in a reasonable manner, and when there's a follow-up block or statement afterwards (e.g. the contents of an if or while-loop), please *separate* the block visually:

sub foo {
    my ($arg1, $arg2, $arg3) = @_;

    # Good:
    if ($arg1->{foo}->{bar} && $arg2 * $arg1->{xzy} > $arg3->{baz}
        && $arg3->{more})
    {
>.......code();
    }
    # Bad: (block not visually distinct from the code())
    if ($arg1->{foo}->{bar} && $arg2 * $arg1->{xzy} > $arg3->{baz}
        && $arg3->{more}) {
>.......code();
    }

    # Good:
    if ($arg1->{foo}->{bar}
        && $arg2 * $arg1->{xzy} > $arg3->{baz}
        && $arg3->{more})
    {
>.......code();
    }
    # Bad: (inconsistent '&&' placement)
    if ($arg1->{foo}->{bar}
        && $arg2 * $arg1->{xzy} > $arg3->{baz} &&
        $arg3->{more})
    {
>.......code();
    }
}

If a language permits leaving out braces for single statements in an if for example, *do* use braces when the condition spans multiple lines.

Comments

Besides the fact that your comments should not explain what a code hunk does, but rather why it is the way it is, or why it even exists if one could question that, you should adhere to the following rules:

  • at least a single space after the initial comment hash symbol (#), more can be added for formatting purposes
  • try to stay below the limit of 80 characters per line
  • do not wrap lines early, use the full width (80 columns) available
  • short comments can stay in the same line as the to-be-commented statement
  • try to keep comment and its related statement(s) together location-wise, so that it's clear what the comment target is.
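As an illustration of these rules, here is a minimal sketch. The helper parse_size and the reason given in its comment are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, only here to illustrate comment placement.
sub parse_size {
    my ($text) = @_;

    # Reject anything but plain ASCII digits here: the value ends up in a
    # config file that other tools parse, and they only understand [0-9].
    return undef if $text !~ /^[0-9]+$/;

    return int($text); # short remarks may share the line with the statement
}

print(defined(parse_size('12a')) ? "accepted\n" : "rejected\n");
```

The block comment explains why the check exists rather than restating what the regex does, and it sits directly above the statement it refers to.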

Regular Expressions

Matching Digits Fallacy

Digits are normally matched with \d, but this has some implications which may not be expected. Namely, \d matches all Unicode characters (code points) from the 'Number, Decimal Digit' category, not just the ASCII digits 0 to 9.

If you think that this could have any implication on security or long-term functionality, do not use \d but use the [0-9] character class instead.
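The difference can be demonstrated with a single non-ASCII digit. This is a minimal sketch that assumes the string holds decoded characters (not raw UTF-8 bytes); the variable names are made up.

```perl
use strict;
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE is in the 'Number, Decimal Digit' category.
my $input = "\x{0663}";

my $matches_d = $input =~ /^\d$/ ? 1 : 0;
my $matches_ascii = $input =~ /^[0-9]$/ ? 1 : 0;

print("\\d: $matches_d, [0-9]: $matches_ascii\n");
```

Here \d accepts the character while [0-9] rejects it, so a value validated with \d may later fail in code that expects ASCII digits only.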

Non-Capturing Groups

Normally you should use non-capturing groups (?:) as long as you do not need to use the value in parentheses. They can make the regex execution faster and simpler to understand.
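For example, when a group is needed only for alternation, making it non-capturing keeps the numbering of the remaining captures obvious. The volume ID format below is made up for illustration and is not meant to reflect any real parser.

```perl
use strict;
use warnings;

my $volid = 'local:vm-100-disk-0';

# The alternation is only used for grouping, so it does not capture; the two
# capture groups are exactly the two values needed afterwards.
my $re = qr/^([a-z]+):(?:vm|base)-([0-9]+)-disk-[0-9]+$/;
my ($storage, $vmid) = $volid =~ $re;

print("storage=$storage vmid=$vmid\n");
```

With a capturing (vm|base) group instead, the ID would move from the second to the third capture, which is easy to get wrong when the regex changes later.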