Reverse-mode AD (adjoint)

The adjoint of the Fortran source code is constructed using a source-to-source and line-by-line approach, transforming the target (to be transformed) routine into three different routines, one of which computes the adjoints of the variables by reversing the order of computation of the target routine.

This is implemented in PSyclone by parsing the source code file containing the target routine, and eventually the routines it calls, transforming it into a PSyIR AST and applying reverse-mode automatic differentiation transformations to the nodes thus obtained. The resulting PSyIR tree is then written to Fortran source code.

Reverse-mode transformations

Several transformations, to be applied on PSyIR nodes, have been implemented. In reverse-mode, all of them follow the naming convention ADReverse[PSyIRNodeSubclass]Trans. The one of most interest for the user is the ADReverseContainerTrans class and its apply method.

After parsing the Fortran code file containing the target routine, an ADReverseContainerTrans instance should be applied to it to perform automatic differentiation. The ADReverseContainerTrans.apply method in turn applies an ADReverseRoutineTrans to the target routine, which goes line-by-line through the statements found in the Routine node, applying ADReverse[PSyIRNodeSubclass]Trans to the statements, etc.

Figure made with TikZ

Reverse-mode AD transformation call graph

Container transformation

class psyclone.autodiff.transformations.ADReverseContainerTrans

A class for automatic differentation transformation of Container nodes in reverse-mode. This is the transformation to apply on the PSyIR AST generated from a source.

apply(container, routine_name, dependent_vars, independent_vars, reversal_schedule, options=None)

Applies the transformation, returning a new container with routine definitions for both motions using the reverse-mode of automatic differentiation.

Options:
- bool ‘jacobian’: whether to generate the Jacobian routine. Defaults to False.
- bool ‘verbose’ : toggles explanatory comments. Defaults to False.
- bool ‘simplify’: True to apply simplifications after applying AD transformations. Defaults to True.
- int ‘simplify_n_times’: number of time to apply simplification rules to BinaryOperation nodes. Defaults to 5.
- bool ‘inline_operation_adjoints’: True to inline all possible operation adjoints definitions. Defaults to True.
Parameters:
  • container (psyclone.psyir.nodes.Container) – Container Node to the transformed.

  • routine_name (Str) – name of the Routine to be transformed.

  • dependent_vars (List[Str]) – list of dependent variables names to be differentiated.

  • independent_vars (List[str]) – list of independent variables names to differentiate with respect to.

  • reversal_schedule (psyclone.autodiff.ADReversalSchedule) – reversal schedule for routined called inside the one to transform (and inside them, etc.).

  • options (Optional[Dict[Str, Any]]) – a dictionary with options for transformations, defaults to None.

Returns:

a copied and modified container with all necessary Routine definitions.

Return type:

psyclone.psyir.nodes.Container

As can be seen, the required arguments include the PSyIR [File]Container node obtained by parsing and transforming the source code, the names of the target routine, dependent variables (to be differentiated) and independent variables (to differentiate with respect to), as well as the reversal schedule and eventual transformation options.

The transformation returns a PSyIR Container node containing four Routine definitions for:

  • the advancing (original) motion,

  • the recording motion, which records overwritten values to the tape,

  • the returning motion, which recovers values from the tape and computes the adjoints of the independent variables,

  • the reversing motion, which combines the two precedent recording and returning motions and is the one to call in order to differentiate.

If some other routine is called by the target one, the returned Container node also contains four definitions for its different motions.

Target routine

The target routine is the Fortran routine in which to differentiate the dependent variables with respect to the independent variables. The routines it may call will also be differentiated iff they can be found in the [File]Container being transformed.

Dependent variables

The dependent variables are those to differentiate. Their intent in the target routine must be compatible with their values being returned, ie. they cannot be intent(in) arguments of the target routine.

Independent variables

The independent variables are those to differentiate with respect to. Their intent in the target routine must be compatible with their values being provided as arguments, ie. they cannot be intent(out) arguments of the target routine.

Reversal schedules

Reversal schedules (see Griewank and Walther[1] chapter 12.2, p.265) specify the way a transformed routine may call other transformed routines. They are implemented as 3 subclasses of ADReversalSchedule.

As an example let us consider the target routine foo calling subroutines bar and qux.

subroutine foo(x, y)
    call bar(x, y)
    call qux(x, y)
end subroutine foo

And let us denote:

  • advancing \square routine the original,

  • recording \Large\triangleright routine the one recording values to the tape,

  • returning \Large\triangleleft routine the one computing the adjoints,

  • reversing \Large\triangleright\triangleleft routine the one combining the two preceding. Call it to differentiate.

Split reversal schedule

In split reversal, children (called) routines follow the recording or returning motion of their parent (calling) routine.

By doing so, the computational complexity is kept as low as possible but values stored to the tape in the recording motion of the called routines need to be kept until they are called in returning motion, thus using a possibly large amount of memory.

Figure made with TikZ

Split reversal schedule

Joint reversal schedule

In joint reversal, all children (called) routines advance without recording when their parent (calling) routine is recording and reverse (record then immendiatly return) when their parent routine is returning.

On the one hand, this reversal schedule uses a smaller tape overall, as the values used in adjoining the called routines do no need to be stored longer than for them to be reversed. On the other hand, called subroutines computations are repeated, with increases the computational complexity of the adjoint program.

Figure made with TikZ

Joint reversal schedule

Value tape

Prevalues of overwritten variables are recorded and restored from a value tape, implemented as a static array. The transformations themselves employ an ADValueTape to generate recording and restoring statements to and from the value tape array.

class psyclone.autodiff.tapes.ADValueTape(name, datatype)

A class for recording and recovering function values in reverse-mode automatic differentiation. The prevalues of references are recorded. Based on static arrays storing a single type of data rather than a LIFO stack. Provides methods to create the PSyIR assignments for recording and restoring operations.

Parameters:
  • name – name of the value_tape (after a prefix).

  • datatype (Union[psyclone.psyir.symbols.ScalarType, psyclone.psyir.symbols.ArrayType]) – datatype of the elements of the value_tape.

Raises:
  • TypeError – if name is of the wrong type.

  • TypeError – if datatype is of the wrong type.

  • NotImplementedError – if datatype is not of type ‘ScalarType’.

record(reference)

Add the reference as last element of the value_tape and return the Assignment node to record the prevalue to the tape.

Parameters:

reference (psyclone.psyir.nodes.Reference) – reference whose prevalue should be recorded.

Raises:
  • TypeError – if reference is of the wrong type.

  • TypeError – if the intrinsic of reference’s datatype is not the same as the intrinsic of the value_tape’s elements datatype.

  • NotImplementedError – if the reference’s datatype is ArrayType.

Returns:

an Assignment node for recording the prevalue of the reference as last element of the value_tape.

Return type:

psyclone.psyir.nodes.Assignment

restore(reference)

Restore the last element of the value_tape if it is the symbol argument and return the Assignment node to restore the prevalue to the variable.

Parameters:

reference (psyclone.psyir.symbols.DataSymbol) – reference whose prevalue should be restored from the value_tape.

Raises:

TypeError – if reference is of the wrong type.

Returns:

an Assignment node for restoring the prevalue of the reference from the last element of the value_tape.

Return type:

psyclone.psyir.nodes.Assignment

Generating adjoints

The transformations applied to generate adjoints are detailled below. They mostly follow the guidelines found in Griewank and Walther[1] chapter 6.2, pp.125-126.

Internally, the transformations used are ADReverseAssignmentTrans, ADReverseOperationTrans and ADReverseCallTrans, depending on the PSyIR node being transformed. These all return two separate lists of PSyIR statements, used respectively in extending the recording and returning routines being generated.

Adjoints of operations

Adjoints of unary operations

Advancing motion

Recording motion

Returning motion

f = +x

value_tape(i) = f

f = +x

f = value_tape(i)

x_adj = x_adj + f_adj

f_adj = 0.0

f = -x

value_tape(i) = f

f = -x

f = value_tape(i)

x_adj = x_adj - f_adj

f_adj = 0.0

f = SQRT(x)

value_tape(i) = f

f = SQRT(x)

f = value_tape(i)

x_adj = x_adj + f_adj / (2 * SQRT(x))

f_adj = 0.0

f = EXP(x)

value_tape(i) = f

f = EXP(x)

f = value_tape(i)

x_adj = x_adj + f_adj * EXP(x)

f_adj = 0.0

f = LOG(x)

value_tape(i) = f

f = LOG(x)

f = value_tape(i)

x_adj = x_adj + f_adj / x

f_adj = 0.0

f = LOG10(x)

value_tape(i) = f

f = LOG10(x)

f = value_tape(i)

x_adj = x_adj + f_adj / (x * LOG(10.0))

f_adj = 0.0

f = COS(x)

value_tape(i) = f

f = COS(x)

f = value_tape(i)

x_adj = x_adj - f_adj * SIN(x)

f_adj = 0.0

f = SIN(x)

value_tape(i) = f

f = SIN(x)

f = value_tape(i)

x_adj = x_adj + f_adj * COS(x)

f_adj = 0.0

f = TAN(x)

value_tape(i) = f

f = TAN(x)

f = value_tape(i)

x_adj = x_adj + f_adj * (1.0 + TAN(x) ** 2)

f_adj = 0.0

f = ACOS(x)

value_tape(i) = f

f = ACOS(x)

f = value_tape(i)

x_adj = x_adj - f_adj / SQRT(1.0 - x ** 2)

f_adj = 0.0

f = ASIN(x)

value_tape(i) = f

f = ASIN(x)

f = value_tape(i)

x_adj = x_adj + f_adj / SQRT(1.0 - x ** 2)

f_adj = 0.0

f = ATAN(x)

value_tape(i) = f

f = ATAN(x)

f = value_tape(i)

x_adj = x_adj + f_adj / (1.0 + x ** 2)

f_adj = 0.0

f = ABS(x)

value_tape(i) = f

f = ABS(x)

f = value_tape(i)

x_adj = x_adj + f_adj * (x / ABS(x))

f_adj = 0.0

Note: some of these adjoints computations, explicitly those for SQRT, EXP, TAN and ABS, could reuse the (post)value of f before restoring its prevalue from the value tape rather than recompute it (see Griewank and Walther[1] table 4.8, p.68). This is not implemented yet.

Adjoints of binary operations

Advancing motion

Recording motion

Returning motion

f = x + y

value_tape(i) = f

f = x + y

f = value_tape(i)

x_adj = x_adj + f_adj

y_adj = y_adj + f_adj

f_adj = 0.0

f = x - y

value_tape(i) = f

f = x - y

f = value_tape(i)

x_adj = x_adj + f_adj

y_adj = y_adj - f_adj

f_adj = 0.0

f = x * y

value_tape(i) = f

f = x * y

f = value_tape(i)

x_adj = x_adj + f_adj * y

y_adj = y_adj + f_adj * x

f_adj = 0.0

f = x / y

value_tape(i) = f

f = x / y

f = value_tape(i)

x_adj = x_adj + f_adj / y

y_adj = y_adj - f_adj * x / y ** 2

f_adj = 0.0

f = x ** y

value_tape(i) = f

f = x ** y

f = value_tape(i)

x_adj = x_adj + f_adj * y * x ** (y - 1)

y_adj = y_adj + f_adj * x ** y * LOG(x)

f_adj = 0.0

Note: some of these adjoints computations, explicitly those for / and ** could reuse the (post)value of f before restoring its prevalue from the value tape rather than recompute it (see Griewank and Walther[1] table 4.8, p.68). This is not implemented yet.

The cases detailled above are the simpler ones, of assigning the result of an operation to a variable.

When composed operations are present, an adjoint variable is declared for the adjoint of the operation itself and used to increment the adjoints of its operands.

The transformation option inline_operation_adjoints allows the user to choose whether these operation adjoints should be substituted in further computations of adjoints as a postprocessing step, iff they only appear once on the RHS of an assignment.

As an example, consider the following computation involving composed operations and the associated adjoints computations, without and with substitution. Note: taping assignments are omitted below.

Composed operation

Adjoints, without substitution

Adjoints, with substitution

f = EXP(x) + z

op_adj = f_adj

z_adj = z_adj + f_adj

x_adj = x_adj + op_adj * EXP(x)

f_adj = 0.0

z_adj = z_adj + f_adj

x_adj = x_adj + f_adj * EXP(x)

f_adj = 0.0

Adjoints of iterative assignments

In the case of iterative assignments ie. where the LHS variable of the assignment is also present on the RHS, additional care must be taken to avoid incorrect computations of the LHS adjoint by assigning to it last rather than incrementing its value as in the general case detailled above (see Griewank and Walther[1] chapter 5.1, p.93).

As an example consider the following adjoint:

Advancing motion

Recording motion

Returning motion

f = 2 * f + x

value_tape(i) = f

f = 2 * f + x

f = value_tape(i)

x_adj = x_adj + f_adj

f_adj = f_adj * 2

Adjoints of calls to subroutines

The adjoints of calls to subroutines depend on the reversal schedule that is used.

Whether the prevalues of the arguments are recorded and restored from the tape depend on their intent in the called subroutine, which determines whether their value might be overwritten by it or not.

Operations as subroutine call arguments are also transformed.

Split reversal schedule

Advancing motion

Recording motion

Returning motion

call func(x, y)

[value_tape(i) = x]

[x = value_tape(i)]

[value_tape(i + 1) = y]

[y = value_tape(i + 1)]

call func_recording(x,y)

call func_returning(x, x_adj, y, y_adj)

Joint reversal schedule

Advancing motion

Recording motion

Returning motion

call func(x, y)

[value_tape(i) = x]

[x = value_tape(i)]

[value_tape(i + 1) = y]

[y = value_tape(i + 1)]

call func(x,y)

call func_reversing(x, x_adj, y, y_adj)