Automatic differentiation (AD)
Simply put, the aim of automatic differentiation (AD) is to automatically obtain the derivatives of some variables output by an existing program with respect to some of its input variables.
It avoids resorting to symbolic differentiation, which is error-prone when done manually and quickly becomes excessively complex when applied automatically, and to finite differences, which are inexact.
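The inexactness of finite differences can be seen on a small case. The sketch below is only an illustration: the function `f`, its hand-derived derivative `df_exact`, the evaluation point `x0` and the step sizes are all arbitrary choices made for this example.

```python
import math

def f(x):
    # Arbitrary smooth test function (an assumption for this illustration).
    return math.sin(x) * x**2

def df_exact(x):
    # Derivative of f obtained by hand, used as the reference value.
    return math.cos(x) * x**2 + 2.0 * x * math.sin(x)

def forward_difference(func, x, h):
    # One-sided finite-difference approximation of the derivative.
    return (func(x + h) - func(x)) / h

x0 = 1.5
# Absolute error of the approximation for a few step sizes.
errors = {h: abs(forward_difference(f, x0, h) - df_exact(x0))
          for h in (1e-1, 1e-2, 1e-4)}

for h, err in errors.items():
    print(f"h = {h:g}: error = {err:.3e}")
```

The error shrinks with the step size but does not vanish, and for very small steps floating-point round-off typically takes over, which is why a method that differentiates the operations themselves is attractive.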
To gain an intuition of the way this is achieved, consider a program computing return values of variables from values of arguments through intermediate values, where each value is obtained from its direct predecessors through elemental operations.
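Let us denote:
- independent variables: $x_i$, $i = 1, \dots, n$,
- dependent variables: $y_k$, $k = 1, \dots, m$,
- intermediate values: $v_j$, which may or may not be assigned to variables in the program,
- relation: $i \prec j$ if $v_j$ depends directly on $v_i$,
- predecessors: $P_j = \{ i : i \prec j \}$,
- operation: $v_j = \varphi_j\left( (v_i)_{i \in P_j} \right)$.

The relation, predecessors and operation are each illustrated in the example program below.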

Example program
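Since all programs can be reduced to sequential elemental operations in this fashion, automatic differentiation makes it possible to compute the derivatives of the dependent variables with respect to the independent variables by differentiating the elemental operations and using the chain rule.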
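As a rough sketch of what such a reduction looks like, the snippet below writes a small program as a sequence of elemental operations, each listing its direct predecessors. The program `y = sin(x1 * x2) + x1`, the table `elementals` and the helper `evaluate` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical program: y = sin(x1 * x2) + x1.
# Indices 0 and 1 hold the independent variables x1 and x2.
# Each entry maps an index j to (operation name, indices of direct predecessors).
elementals = {
    2: ("mul", (0, 1)),  # v2 = v0 * v1
    3: ("sin", (2,)),    # v3 = sin(v2)
    4: ("add", (3, 0)),  # v4 = v3 + v0, the dependent variable y
}

OPS = {
    "mul": lambda a, b: a * b,
    "sin": math.sin,
    "add": lambda a, b: a + b,
}

def evaluate(x1, x2):
    """Run the elemental operations in order and return every value."""
    v = {0: x1, 1: x2}
    for j in sorted(elementals):
        name, preds = elementals[j]
        v[j] = OPS[name](*(v[i] for i in preds))
    return v

values = evaluate(2.0, 3.0)
print(values[4])  # value of the dependent variable
```

Both AD modes work on this kind of sequence: each elemental operation has a simple known derivative, and the chain rule combines them.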
It comes in two main flavors, usually called forward- or tangent-mode and reverse- or adjoint-mode, which differ in the way substitutions are performed in the chain rule, which partial derivatives are computed as a result and the order in which statements in the original program are differentiated by the AD transformation.
Forward- or tangent-mode
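Using the notations introduced above, forward-mode automatic differentiation computes the derivatives of all values w.r.t. a single independent variable $x_i$.
Let us denote the derivatives of the values $v_j$ w.r.t. $x_i$ as

$$\dot{v}_j = \frac{\partial v_j}{\partial x_i}$$

such that the chain rule writes

$$\dot{v}_j = \sum_{k \prec j} \frac{\partial \varphi_j}{\partial v_k} \, \dot{v}_k ,$$

where the sum runs over the direct predecessors $v_k$ of $v_j$ and $\varphi_j$ is the elemental operation producing $v_j$.
Forward-mode automatic differentiation is equivalent to applying the substitutions in the order of the original computation: each $\dot{v}_j$ is obtained from the already computed $\dot{v}_k$ of its predecessors.
Initializing $\dot{x}_i = 1$ and $\dot{x}_l = 0$ for $l \neq i$, we obtain in a single evaluation

$$\dot{y} = J \, e_i ,$$

where $J = \partial y / \partial x$ is the Jacobian matrix and $e_i$ is the $i$-th unit vector, i.e. $\dot{y}$ is the $i$-th column of $J$.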
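One common way to realize forward-mode is operator overloading on pairs of a value and its derivative, often called dual numbers. The sketch below assumes that approach; the class `Dual`, the helpers `_lift`, `sin` and `jacobian_column`, and the two-output function `program` are hypothetical names introduced for this example, not part of any particular AD tool.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """A value together with its derivative w.r.t. the seeded input."""
    value: float
    tangent: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.tangent + other.tangent)
    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule applied to the elemental multiplication.
        return Dual(self.value * other.value,
                    self.tangent * other.value + self.value * other.tangent)
    __rmul__ = __mul__

def _lift(x):
    # Treat plain numbers as constants with a zero tangent.
    return x if isinstance(x, Dual) else Dual(float(x))

def sin(x):
    return Dual(math.sin(x.value), math.cos(x.value) * x.tangent)

def program(x1, x2):
    # Hypothetical program with two dependent variables.
    return sin(x1 * x2) + x1, x1 * x2 * x2

def jacobian_column(f, point, i):
    """Seed input i with tangent 1, all others with 0, and run f once."""
    seeds = [Dual(p, 1.0 if k == i else 0.0) for k, p in enumerate(point)]
    return [out.tangent for out in f(*seeds)]

column_0 = jacobian_column(program, (2.0, 3.0), 0)
column_1 = jacobian_column(program, (2.0, 3.0), 1)
```

A single run yields the derivatives of every output with respect to the one seeded input, that is, one column of the Jacobian.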
Advantages and drawbacks
Forward-mode is easy to implement, as derivatives can be computed in the same order as the computations of the original program.
If there are fewer independent than dependent variables, its complexity is lower than that of the reverse- or adjoint-mode. But frequently, and perhaps even more so in ocean and atmosphere models, the number of inputs greatly exceeds the number of outputs, so that forward-mode requires many repeated evaluations, one for each independent variable to differentiate with respect to.
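The cost of repeated evaluations can be made concrete with a hand-written tangent code. In this sketch, the function `y = sum(x_i**2)` and the helpers `sum_of_squares_tangent` and `gradient_by_forward_mode` are assumptions made for illustration; the point is only to count how many forward-mode passes a full gradient takes.

```python
def sum_of_squares_tangent(x, x_dot):
    """One forward-mode pass for y = sum(x_i**2): returns (y, y_dot)."""
    y, y_dot = 0.0, 0.0
    for xi, xi_dot in zip(x, x_dot):
        y += xi * xi
        y_dot += 2.0 * xi * xi_dot  # tangent of the elemental operation xi * xi
    return y, y_dot

def gradient_by_forward_mode(x):
    """Assemble the gradient one input at a time, counting the passes."""
    n = len(x)
    grad, passes = [], 0
    for i in range(n):
        seed = [1.0 if k == i else 0.0 for k in range(n)]
        _, y_dot = sum_of_squares_tangent(x, seed)
        grad.append(y_dot)
        passes += 1
    return grad, passes

grad, passes = gradient_by_forward_mode([1.0, 2.0, 3.0])
print(grad, passes)
```

With three inputs and one output, three passes are needed; with the very large input counts typical of model state vectors, the same gradient would take as many passes as there are inputs.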
Reverse- or adjoint-mode
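Using the notations introduced above, reverse-mode automatic differentiation computes the derivatives of a single dependent variable $y_k$ w.r.t. all values.
Let us denote the adjoints of the values $v_i$ w.r.t. $y_k$ as

$$\bar{v}_i = \frac{\partial y_k}{\partial v_i}$$

such that the chain rule writes

$$\bar{v}_i = \sum_{j \,:\, i \prec j} \boldsymbol{\bar{v}_j} \, \frac{\partial \varphi_j}{\partial v_i} ,$$

where bold font is used to highlight how the value of the adjoint $\bar{v}_i$ depends on the adjoints of the successors $v_j$ of $v_i$, and $\varphi_j$ is the elemental operation producing $v_j$.
Reverse-mode automatic differentiation is equivalent to applying the substitutions in the reverse order of the original computation: each $\bar{v}_i$ is obtained from the already computed $\bar{v}_j$ of its successors.
Initializing $\bar{y}_k = 1$ and $\bar{y}_l = 0$ for $l \neq k$, we obtain in a single evaluation

$$\bar{x} = J^T e_k ,$$

i.e. the $k$-th row of the Jacobian matrix $J = \partial y / \partial x$, which is the gradient of $y_k$.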
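A minimal sketch of reverse-mode, assuming an operator-overloading design in which every elemental operation is appended to a list acting as the tape. The class `Var`, the helpers `sin` and `gradient`, and the test function are hypothetical; real AD tools differ considerably in how they record and replay operations.

```python
import math

class Var:
    """A value recorded on a tape together with its local partial derivatives."""

    def __init__(self, value, tape, parents=()):
        self.value = value
        self.adjoint = 0.0
        self.parents = parents  # tuples (predecessor, d self / d predecessor)
        self.tape = tape
        tape.append(self)       # creation order is the order of computation

    def __add__(self, other):
        return Var(self.value + other.value, self.tape,
                   ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value, self.tape,
                   ((self, other.value), (other, self.value)))

def sin(x):
    return Var(math.sin(x.value), x.tape, ((x, math.cos(x.value)),))

def gradient(f, point):
    """Run f once forward, then sweep the tape backwards to get all adjoints."""
    tape = []
    inputs = [Var(p, tape) for p in point]
    output = f(*inputs)
    output.adjoint = 1.0  # seed the chosen dependent variable
    for node in reversed(tape):
        for parent, partial in node.parents:
            parent.adjoint += node.adjoint * partial
    return [x.adjoint for x in inputs]

grad = gradient(lambda x1, x2: sin(x1 * x2) + x1, (2.0, 3.0))
print(grad)
```

One forward run plus one backward sweep gives the derivatives of the seeded output with respect to every input, whatever their number.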
Advantages and drawbacks
Reverse-mode is considerably more complicated to implement than forward-mode, as adjoints need to be computed in the reverse order of the original program's computations, as illustrated in the example below.
If there are fewer dependent than independent variables, as is often the case, its complexity is lower than that of the forward- or tangent-mode.
However, when some variables are overwritten in the program, reverse-mode also requires running the original program and recording the overwritten values, and possibly the results of some operations, when they appear in the computation of some adjoints. This adds further complications compared to forward-mode and requires either using a persistent “tape”, which needs to be kept in memory, or recomputing values as many times as they are required.
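The need to record overwritten values can be sketched with a program that reuses a single variable. The functions `program` and `program_adjoint` below are hypothetical, and the Python list used as a stack is just one simple way to stand in for a tape.

```python
import math

def program(x):
    # The variable v is overwritten twice.
    v = x
    v = math.sin(v)
    v = v * v
    return v

def program_adjoint(x, y_bar=1.0):
    """Return d program / d x times y_bar, using a stack as the tape."""
    tape = []
    # Forward sweep: save each value of v just before it is overwritten.
    v = x
    tape.append(v)
    v = math.sin(v)
    tape.append(v)
    v = v * v
    # Reverse sweep: restore the saved values in the opposite order.
    v_bar = y_bar
    v = tape.pop()               # value of v before v = v * v
    v_bar = 2.0 * v * v_bar      # adjoint of v = v * v
    v = tape.pop()               # value of v before v = sin(v)
    v_bar = math.cos(v) * v_bar  # adjoint of v = sin(v)
    return v_bar

x_bar = program_adjoint(0.7)
```

Without the saved values, the reverse sweep would have to re-run part of the original program each time an overwritten value is needed, which is the recomputation alternative mentioned above.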
A simple example in reverse-mode with non-linearities
Let us consider the simple computations displayed below and illustrate how to compute the adjoints for a chosen dependent variable.

Simple program example

Reverse-mode example
Initialize the adjoint of the chosen dependent variable to 1 to obtain the adjoints.
Notice that the adjoints of variables appearing as operands in the original computations (top) are incremented in the reverse-mode ones (bottom). Moreover, non-linearities in the original computations cause operation results and other non-adjoint variables to appear in the adjoint computations; these can be either recomputed or recorded and restored from a tape.
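The two observations above can be reproduced on a hand-written adjoint. The program below is not the one from the original figures; `simple_program`, `simple_program_adjoint` and the chosen computations are assumptions made purely to illustrate the pattern.

```python
import math

def simple_program(x1, x2):
    # Hypothetical original computations, with two non-linear operations.
    v = x1 * x2
    y = math.sin(v) + x1
    return y

def simple_program_adjoint(x1, x2, y_bar=1.0):
    """Adjoint statements, written in the reverse order of the original ones."""
    # The non-linearities need the original value v, recomputed here.
    v = x1 * x2
    x1_bar = x2_bar = v_bar = 0.0
    # Adjoint of y = sin(v) + x1: both operands have their adjoints incremented.
    v_bar += math.cos(v) * y_bar
    x1_bar += y_bar
    # Adjoint of v = x1 * x2: x1 is an operand a second time, hence the increment.
    x1_bar += x2 * v_bar
    x2_bar += x1 * v_bar
    return x1_bar, x2_bar

x1_bar, x2_bar = simple_program_adjoint(2.0, 3.0)
```

Here `x1` is used in two statements, so its adjoint accumulates two contributions, and the non-adjoint values `v`, `x1` and `x2` appear in the adjoint statements because `sin` and the product are non-linear.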