Difference between revisions of "Perl Style Guide"

From Proxmox VE

Latest revision as of 14:56, 2 April 2021

PVE Perl Style Guide

Many of our files have inconsistent styles, due to historical growth as well as the mixed styles we got from various contributors.

Please try to follow this guide if you're working on code for PVE to avoid adding to the style mess.

Here's a summary of our style (which is somewhat unusual at least when it comes to the mixed indentation).

Indentation

We indent by 4 spaces, assume tabs to be 8 spaces and convert all groups of 8 spaces into tabs at the beginning of a line. Here's an example, with tabs represented via '>.......'

sub foo {
    my ($arg1, $arg2, $arg3) = @_;
    if ($arg1) {
>.......print($arg2, "\n");
>.......if ($arg3) {
>.......    die "Exceptions should end with a newline in most cases\n"
>.......>.......if $arg3 ne $arg2;
>.......    print("Another line\n");
>.......}
    }
}

Like all editors, for some reason I'll never understand, we do not distinguish between indentation and alignment, so if you split up an expression over multiple lines, we still use the same "all 8 spaces are 1 tab" pattern.

The vim configuration would be :set ts=8 sts=4 sw=4 noet
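The conversion rule above can be sketched in Perl as follows. This is only an illustration: the helper name retab_leading_whitespace is made up for this example, and it assumes the line is indented with spaces only (no pre-existing tabs).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any PVE module): turn every full group of
# 8 leading spaces into one tab and keep the remaining spaces as they are.
# Assumes the indentation consists of spaces only.
sub retab_leading_whitespace {
    my ($line) = @_;

    my ($indent, $rest) = $line =~ /^( *)(.*)$/s;
    my $tabs = int(length($indent) / 8);
    my $spaces = length($indent) % 8;

    return ("\t" x $tabs) . (' ' x $spaces) . $rest;
}

# 12 leading spaces become one tab followed by 4 spaces
print(retab_leading_whitespace((' ' x 12) . "print(\"Another line\\n\");\n"));
```

Only whitespace at the beginning of the line is touched; spaces inside the statement stay as they are.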

Breaking long lines and strings

The preferred limit on the length of a single line is 80 columns.

Statements longer than 80 columns should be broken into sensible chunks, unless exceeding 80 columns significantly increases readability and does not hide information. The maximal line length tolerated for those exceptions is 100 columns.

The vim configuration would be :set cc=81 or, to show both soft and hard limit: :set cc=81,101
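As a rough illustration of the two limits, the following sketch classifies a line by its display width. The helper names display_width and check_line_length and the returned labels are made up for this example; tabs are counted as advancing to the next multiple of 8 columns, matching the indentation rules above.

```perl
use strict;
use warnings;

# Hypothetical helper: display width of a line, where a tab advances to the
# next multiple of 8 columns and every other character counts as 1 column.
sub display_width {
    my ($line) = @_;

    chomp($line);
    my $width = 0;
    $width += $_ eq "\t" ? 8 - ($width % 8) : 1 for split(//, $line);

    return $width;
}

# Hypothetical helper: compare a line against the preferred limit (80) and
# the maximum tolerated for exceptions (100).
sub check_line_length {
    my ($line) = @_;

    my $width = display_width($line);
    return 'too long' if $width > 100;
    return 'exception only' if $width > 80;
    return 'ok';
}

print(check_line_length("\tmy \$foo = 1;\n"), "\n");
```

Whether a line between 81 and 100 columns is acceptable remains a judgment call, as described above; the sketch only reports which range a line falls into.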


Wrapping Arguments

Once you need to wrap a function call, wrap each argument, including the first, on a separate line. Keep the trailing comma for the last argument too.

# GOOD! Wrapped nicely, trailing commas added
my $result = PVE::Some::Long::Module::Name::get_foo_bar_baz(
    $argument_one,
    $argument_two,
    $argument_three,
    ...
    $argument_last,
);
# GOOD! Not too long, so it must not be wrapped
my $result = PVE::Some::Long::Module::Name::short($a);
# BAD! First argument needs to be on its own line
my $result = PVE::Some::Long::Module::Name::get_foo_bar_baz($argument_one,
    $argument_two,
    ...
    $argument_last,
);
# BAD! Do not move indentation over to match opening line, but add a single level
# (or 4 spaces, depending on how other code in the module handles this).
my $result = PVE::Some::Long::Module::Name::get_foo_bar_baz(
                                                            $argument_one,
                                                            $argument_two,
                                                            ...
                                                            $argument_last);

Wrapping Post-If

Once a statement with a post-if gets too long, always wrap the whole if EXPR to a new line and indent it with four (4) spaces.

# GOOD: wrapped correctly
$a_longer_variable_name = $some_hash->{value_foo}
    if !defined($another_variable_name) && defined($some_hash->{value_foo});
# GOOD: short enough, so one line it is
$foo = $bar if defined($bar);
# BAD: once wrapping is required, the if needs to be ALWAYS on its own line
$a_longer_variable_name = $some_hash->{value_foo} if !defined($another_variable_name)
    && defined($some_hash->{value_foo});
# BAD: wrapped correctly, but the indentation is missing!
$a_longer_variable_name = $some_hash->{value_foo}
if !defined($another_variable_name) && defined($some_hash->{value_foo});

If the expression of a post-if is so long that it would need to be wrapped as well, consider using a normal if (EXPR) { instead.

# BAD: wrapping the actual post-if over multiple lines makes it very hard to read
$a_longer_variable_name = $some_hash->{value_foo}
    if !defined($another_variable_name) && defined($some_hash->{value_foo})
    && $another_condition && defined($yet_another_check);
# GOOD: use a "regular" if instead. Expression wrapping is more flexible, but place
# the { start of block on separate line
if (!defined($another_variable_name) && defined($some_hash->{value_foo}) &&
    $another_condition && defined($yet_another_check))
{
    $a_longer_variable_name = $some_hash->{value_foo};
}
# GOOD: a more fine-grained expression wrap is OK and can help readability,
# follow the surrounding code when deciding which style to use
if (
    !defined($another_variable_name)
    && defined($some_hash->{value_foo})
    && $another_condition
    && defined($yet_another_check)
) {
    $a_longer_variable_name = $some_hash->{value_foo};
}
    

Also remember that you must NOT use a post-if when declaring a variable for the first time with my! Perl leaves the behaviour of a my declaration combined with a statement modifier undefined.
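A minimal sketch of the alternatives, using a made-up $config hash for illustration: declare the variable first and assign with the post-if afterwards, or fold the condition into the expression.

```perl
use strict;
use warnings;

# made-up example data
my $config = { timeout => 30 };

# BAD: declaration combined with a post-if, do not do this
# my $timeout = $config->{timeout} if defined($config->{timeout});

# GOOD: declare first, then assign conditionally
my $timeout;
$timeout = $config->{timeout} if defined($config->{timeout});

# GOOD: or make the condition part of the expression
my $retries = defined($config->{retries}) ? $config->{retries} : 3;

print("timeout=$timeout retries=$retries\n");
```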

Spacing and syntax usage

Spaces around parentheses with syntactical words are inserted as you would in regular text (one before the opening parenthesis, one after the closing parenthesis). Similarly for curly braces:

  • use if (cond) {
  • use while (cond) {
  • not if(cond) {
  • not if (cond){
  • not if(cond){

BUT: no space between a function's name and its parameter list:

  • func(params)
  • not func (params)

Use spaces around braces of lists:

  • use my $list = [ 1, 2, 3 ];
  • use my $hash = { one => 1, two => 2 };

No spaces for unary operators or sigils which are directly connected to one another, or in array/hash accesses (here the contents of the brackets or curly braces represent content of the expression/variable to its left, so it makes sense to "group" them):

  • use !$foo
  • not ! $foo
  • use $foo->{text}
  • use $foo{text}
  • use $foo->[index]
  • use $foo[index]
  • use $foo->(args)
  • not &$foo(args)
  • not & $foo(args)

In general: use spaces in arithmetic expressions in a way which makes sense, eg. you can skip them on a short single binary operation, or if it helps reading the expression by grouping it so that the operator precedence is emphasized. Do not add spaces in a way which conflicts with the operators' precedences:

  • use a + b
  • not a+b
  • may use a*3 + b*4
  • must not use a+3 * b+4

In if-else blocks, an else or elsif should be on its own line together with its curly braces.

# Good
if (expr) {
    code();
} else {
    code();
}

# Bad
if (expr) {
    code();
}
else {
    code();
}

# Bad
if (expr)
{
    code();
}
else
{
    code();
}

Perl syntax choices

Most of these are chosen for semantic clarity and should make the code easier to understand for people who don't use much Perl or simply aren't used to our code base:

  • use $foo->(args) instead of &$foo(args)
  • use $foo->[subscript] instead of $$foo[subscript]
  • use $foo->{subscript} instead of $$foo{subscript}

When not accessing an element but simply dereferencing *once*, the dereferencing sigil can be put in front with braces, eg. ${stuff} or @{stuff}, provided stuff is easy enough to read. Otherwise, pull stuff out into a local variable first.
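A short sketch of these rules, with made-up variable names and data: arrow syntax for element access and calls, the sigil with braces for a single dereference, and a local variable when the expression gets longer.

```perl
use strict;
use warnings;

# made-up example data
my $foo = {
    list => [ 1, 2, 3 ],
    callback => sub { return $_[0] * 2 },
};

# element access and calls use the arrow syntax
my $second = $foo->{list}->[1];
my $doubled = $foo->{callback}->(21);

# pull the nested expression out into a local variable first ...
my $list = $foo->{list};

# ... then dereference it once with the sigil in front and braces
my $count = scalar(@{$list});

print("second=$second doubled=$doubled count=$count\n");
```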

  • prefer $foo = value if !defined($foo);
    over $foo //= value;
  • use if (!cond) {
    over unless (cond) {
  • use for when looping over a list of elements
    over foreach
  • prefer foo($a, $b);
    over foo $a, $b

Function calls should favor the use of parentheses. Omitting them is only allowed in simple cases.

foo($a, $b); # ok
foo $a, $b; # okay, but discouraged
print $a, $b; # okay, print is commonly used this way
print($a, $b); # preferred
delete $a->{b}; # okay, common
my $var = delete($hash->{key}) // "default"; # ok
my $var = delete $hash->{key} // "default"; # NOT ok!

As a rule of thumb, when operators are involved, or when calling functions other than print, map or grep, always use parentheses for function calls.
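One reason for this rule of thumb is that a user-defined function called without parentheses takes everything to its right as its argument list, operators included. The following sketch shows the difference; lookup is a hypothetical function that was made up for this example and always returns nothing.

```perl
use strict;
use warnings;

# Hypothetical function for illustration: finds nothing, whatever the input.
sub lookup {
    return;
}

my $key = "some-key";

# Without parentheses this is parsed as lookup($key // "default"), so the
# fallback is passed to the function instead of applying to its result.
my $without = lookup $key // "default";

# With parentheses the // applies to the return value, as intended.
my $with = lookup($key) // "default";

print(defined($without) ? "defined" : "undef", " vs. $with\n");
```

With parentheses the reader does not need to remember how a given function or operator parses its arguments.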

Blocks and multi-line statements

While there certainly are way-too-long lines in our code already, please try to avoid adding more of them.

Generally, when a line is long, try to split it up in a reasonable manner, and when there's a follow-up block or statement afterwards (e.g. the contents of an if or while-loop), please *separate* the block visually:

sub foo {
    my ($arg1, $arg2, $arg3) = @_;

    # Good:
    if ($arg1->{foo}->{bar} && $arg2 * $arg1->{xzy} > $arg3->{baz}
        && $arg3->{more})
    {
>.......code();
    }
    # Bad: (block not visually distinct from the code())
    if ($arg1->{foo}->{bar} && $arg2 * $arg1->{xzy} > $arg3->{baz}
        && $arg3->{more}) {
>.......code();
    }

    # Good:
    if ($arg1->{foo}->{bar}
        && $arg2 * $arg1->{xzy} > $arg3->{baz}
        && $arg3->{more})
    {
>.......code();
    }
    # Bad: (inconsistent '&&' placement)
    if ($arg1->{foo}->{bar}
        && $arg2 * $arg1->{xzy} > $arg3->{baz} &&
        $arg3->{more})
    {
>.......code();
    }
}

If a language permits leaving out braces for single statements in an if for example, *do* use braces when the condition spans multiple lines.

Comments

Besides the fact that your comments should not explain what a code hunk does, but rather why it is the way it is, or why it even exists if one could question that, you should adhere to the following rules:

  • at least a single space after the initial comment hash symbol (#), more can be added for formatting purposes
  • try to stay below the 80 or 100 character per line length limit, whatever limit the surrounding code uses
  • do not add newlines early, use the full width (80 to 100 cc) available
  • short comments can stay in the same line as the to-be-commented statement
  • try to keep comment and its related statement(s) together location-wise, so that it's clear what the comment target is.

Regular Expressions

Matching Digits Fallacy

For matching digits normally \d is used, but this has some implications which may not be expected. Namely, it matches all "Unicode Characters" (code points) from the 'Number, Decimal Digit' Category.

If you think that this could have any implication on security or long term functionality then do not use \d but use the [0-9] character class.
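The difference can be seen with a non-ASCII digit. The sketch below uses U+0663 (ARABIC-INDIC DIGIT THREE) as sample input; the variable names are made up for this example. It also shows the /a modifier, which is not mentioned above but restricts \d to ASCII digits in Perl 5.14 and later.

```perl
use strict;
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE is in the 'Number, Decimal Digit' category
my $input = "\x{0663}";

# \d accepts it, the explicit [0-9] character class does not
my $matches_d = $input =~ /^\d+$/ ? 1 : 0;
my $matches_class = $input =~ /^[0-9]+$/ ? 1 : 0;

# with the /a modifier \d only matches the ASCII digits 0-9
my $matches_d_ascii = $input =~ /^\d+$/a ? 1 : 0;

print("\\d: $matches_d, [0-9]: $matches_class, \\d with /a: $matches_d_ascii\n");
```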

Non-Capturing Groups

Normally you should use non-capturing groups (?:) as long as you do not need to use the value in parentheses. They can make the regex execution faster and simpler to understand.
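As a small illustration, the alternation below is only needed for grouping, so it is written as a non-capturing group and the number is the only captured value. The input string and variable names are made up for this example.

```perl
use strict;
use warnings;

# made-up example input
my $name = "vm-100-disk-0";

# (?:vm|base) only groups the alternation, ([0-9]+) is the only capture
my ($vmid) = $name =~ /^(?:vm|base)-([0-9]+)-disk-[0-9]+$/;

print("VMID: $vmid\n") if defined($vmid);
```

Had the alternation been written as (vm|base), the number would have moved to the second capture and the first one would hold a value nobody uses.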

Basic Linting with perlcritic

You can use perlcritic to catch some runtime errors and style issues. It is far from perfect and cannot fix the issues dynamic languages have in general, but it is a lot better than nothing.

Installation

You can install it from the Debian package repositories:

apt update
apt install libperl-critic-perl libperl-critic-freenode-perl

Configuration

For now we only target the highest severity level, 5. You can also try level 4 to see if some other sensible style nits pop up, but that level may already get a bit noisy.

Set up the following configuration in ~/.perlcriticrc to fine-tune the policies:

severity  = 5

[TestingAndDebugging::RequireUseStrict]
severity = 5

[TestingAndDebugging::RequireUseWarnings]
severity = 5

# we allow and use octals
[Perl::Critic::Policy::ValuesAndExpressions::ProhibitLeadingZeros]
severity  = 2

# sub prototypes can be handy, even only sometimes
[Perl::Critic::Policy::Subroutines::ProhibitSubroutinePrototypes]
severity  = 2

[Perl::Critic::Policy::BuiltinFunctions::RequireGlobFunction]
severity  = 2

[Perl::Critic::Policy::Freenode::DollarAB]
severity  = 2

# mostly for tests, but also for very short child modules which live in
# the parent file
[Perl::Critic::Policy::Modules::RequireFilenameMatchesPackage]
severity  = 4

# return undef; vs. return; - while the latter is better, do not
[Perl::Critic::Policy::Subroutines::ProhibitExplicitReturnUndef]
severity  = 4

[Perl::Critic::Policy::Subroutines::RequireFinalReturn]
severity  = 3

[Perl::Critic::Policy::Variables::RequireLocalizedPunctuationVars]
severity  = 3

Usage

After installation and configuration you can use it by passing a directory or file as the first argument. Without one, perlcritic waits on STDIN for code to check.

perlcritic PVE/

You can also check with another severity by passing -X, where X is the integer level:

perlcritic -4 PVE/

Note that it strictly performs linting checks, so a

perl -wc PVE/File.pm

would still be required for some other checks. The two complement each other.