Function as a directed acyclic graph

Viewing a function as a directed acyclic graph is a powerful tool. It allows one to understand how autodiff systems are designed and used to compute gradients.
0. Simple examples
DAG of \( y = f(x) \): \( \amber{x \longrightarrow y} \)
DAG of \( \rose{z = g(y)}, y = f(x) \): \( \amber{x \longrightarrow y \longrightarrow z} \)
Note that the we have left out how the input is mapped to the output. The DAG only shows the structure of the function and how different variables are related to each other.
Here is a DAG of a function that calculates the variance of two variables \( {\sigma^2 = \frac{1}{n} \sum_{i=1}^n (\underbrace{x_i - \mu}_{\rose{y_i}})^2} \) where \( \mu = \frac{1}{n} \sum_{i=1}^n x_i \):
\( x_1 \)\( x_2 \)\( \mu \)\( y_1 \)\( y_2 \)\( y_1^2 + y_2^2 \)\( \sigma^2 \)
A DAG of the variance function
1. Mixing vectors and scalars
In the previous example, every node in the DAG of variance function was a scalar. We can consider a DAG where some of the nodes are vectors. This provide a more linear-algebraic view of the function. From this point of view, variance is a function \( f: R^n \rightarrow R \) and its DAG looks like:
Another DAG of the variance function
\( \underbrace{x}_{\rose{\in R^2}} \)\( \mu \)\( \underbrace{y}_{\rose{\in R^2}} \)\( \| y \|^2 \)\( \sigma^2 \)
\( x,y \) are vectors while the rest are scalars
Click the equation below to see the intermediate steps and variable definitions:$$ \toggle {x \ \grey{\text{(click me)}}} {\mu = \frac{1}{n} 1^T x} {y = \underbrace{x - \mu}_{\rose{\text{elementwise subtraction}}}} {\| y \|^2 = y^T y} {\sigma^2 = \frac{1}{n} \| y \|^2 \ \blacksquare} \endtoggle $$
2. Jacobians as edge representations
Each edge can be seen as a function \( f: R^n \rightarrow R^m \) whose Jacobian is a matrix of shape \( (m, n) \). We attach this Jacobian matrix to its corresponding edge.$$ x \in R^n \xrightarrow[{\amber{\text{Shape (m,n)}}}]{ \amber{\partial y / \partial x} } y \in R^m $$Next we will see how to use this setup to calculate the Jacobian of the function represented by the DAG.