Understanding Why Chain Rule Works
“Mathematical intuition is the ability to see the truth without first having gone through a formal process of reasoning.”
— Henri Poincaré
The goal of this post is to force you to think about the chain rule a bit more deeply.
The only prerequisite for reading this is understanding the power rule (i.e. $\frac{d}{dx}x^n = nx^{n-1}$). An intuitive understanding of differentiation is recommended.
The chain rule is a differentiation method for composite functions. For a function $y = f(u)$ where $u = g(x)$, it is defined as $\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$. That is: differentiate $y$ with respect to $u$, treating $u$ as an independent variable, differentiate $u$ with respect to $x$, and then multiply the two results.
For example, given the function , we can differentiate it as follows: we would set and so we can express as .
Now, let’s use the chain rule: , and . Recall that , so we have .
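As a separate, self-contained illustration, here is a minimal sketch that applies the chain rule to a hypothetical function, $y = (3x + 1)^2$, and compares the result with a numerical estimate. The function, the helper names, and the evaluation point are all assumptions made for this sketch; they are not necessarily the ones used in the example above.

```python
# Hypothetical example (an assumption for this sketch): y = (3x + 1)^2,
# split into an inner function u = g(x) and an outer function y = f(u).

def g(x):
    """Inner function: u = 3x + 1."""
    return 3 * x + 1

def f(u):
    """Outer function: y = u^2."""
    return u ** 2

def dy_du(u):
    """Power rule applied to y = u^2."""
    return 2 * u

def du_dx(x):
    """Derivative of u = 3x + 1; the slope is 3 for every x."""
    return 3

def chain_rule_derivative(x):
    """dy/dx = dy/du * du/dx, with dy/du evaluated at u = g(x)."""
    return dy_du(g(x)) * du_dx(x)

def numerical_derivative(func, x, h=1e-6):
    """Central-difference estimate of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 2.0
chain_value = chain_rule_derivative(x0)
numeric_value = numerical_derivative(lambda x: f(g(x)), x0)

print(chain_value)    # value from the chain rule
print(numeric_value)  # numerical estimate, should be close to the value above
```

The two printed numbers should agree up to floating-point error, which is a quick sanity check that multiplying the two derivatives gives the slope of the composite function.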
To understand the chain rule, you must first understand how function compositions work.
If we have functions $f(x)$ and $g(x)$, what does $f(g(x))$ mean?
It means that for the function $f$, we evaluate it at $g(x)$ instead of $x$. We want to study the effect of this change.
How does $f(g(x))$ change with respect to a change in $x$? Another way to put it is this: how does $f$ change with respect to $g$, and $g$ with respect to $x$?
Let’s explore the behaviour of change in a function composition.
Example 1: For and for an interval .
Let’s see the graph:
Let’s build the intuition for this by looking at the table of values of , and on the interval.
The graph shows that the maximum value for is while that of is . That’s a scale-up.
Can you see a pattern? Every value of is times the value of .
We can show this formally by picking any two unique pairs of and (i.e at the same ) and finding the slope(i.e ). For example, using the pairs (, ) and (, ), we have . Let’s use another two pairs (, ) and (, ), we have
This turns out to be easy to see because is simply times so it makes sense that this is the case. Other places you can notice is as follows:
- The values of is 4 times the values of
- Every value of and is divisible by
- The difference between any two values of and is divisible by
This means that for every change in the value of there’s a 4x change in the value of . This sounds weird, right? We just represented the rate of change between two somewhat independent functions as if they were dependent on each other. If this relationship were a function, it would be so that . This works because is a composition of and and .
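To make the "constant factor" idea concrete, here is a small sketch with an assumed pair of functions, $f(x) = x^2$ and $g(x) = 2x$, chosen only because they also produce a factor of 4. They are not necessarily the functions from Example 1. The sketch tabulates $f(x)$ and $f(g(x))$ at the same $x$ and computes the slope between consecutive pairs.

```python
# Assumed functions for illustration only (not necessarily the post's own pair).
def f(x):
    return x ** 2

def g(x):
    return 2 * x

xs = [0, 1, 2, 3, 4, 5]
rows = [(x, f(x), f(g(x))) for x in xs]

# Table of values: x, f(x), and f(g(x)) at the same x.
print(f"{'x':>3} {'f(x)':>6} {'f(g(x))':>8}")
for x, fx, fgx in rows:
    print(f"{x:>3} {fx:>6} {fgx:>8}")

def slope(p, q):
    """Change in f(g(x)) per unit change in f(x) between two pairs."""
    return (q[1] - p[1]) / (q[0] - p[0])

# Pairs of (f(x), f(g(x))) taken at the same x.
pairs = [(fx, fgx) for _, fx, fgx in rows]
slopes = [slope(p, q) for p, q in zip(pairs, pairs[1:])]
print(slopes)  # every slope is the same for this particular pair of functions
```

For this assumed pair, $f(g(x)) = (2x)^2 = 4x^2 = 4f(x)$, so every slope comes out as 4 regardless of which two pairs are picked.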
Let’s try this for and . But then, you might ask: Is a composition of ? Let’s see.
If we created a function , then equals . This is true for every function of . Having set this foundation, we can see that for every change in , by a factor of .
Note: You won’t always have a constant factor between $f(x)$ and $f(g(x))$, or between their derivatives. But most of the time, $g(x)$ affects how $f(g(x))$ changes compared to $f(x)$. It can make the change faster or slower, scale it up or down, or do a combination of these along certain intervals.
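As a quick sketch of the non-constant case, take the assumed pair $f(x) = x^2$ and $g(x) = x + 1$ (hypothetical functions picked for this illustration). The slope between consecutive $(f(x), f(g(x)))$ pairs is no longer a single number.

```python
# Assumed functions for illustration only.
def f(x):
    return x ** 2

def g(x):
    return x + 1

xs = [1, 2, 3, 4, 5]

# Pairs of (f(x), f(g(x))) taken at the same x.
pairs = [(f(x), f(g(x))) for x in xs]

# Slope between consecutive pairs: change in f(g(x)) per unit change in f(x).
slopes = [(q[1] - p[1]) / (q[0] - p[0]) for p, q in zip(pairs, pairs[1:])]

for (p, q), s in zip(zip(pairs, pairs[1:]), slopes):
    print(p, q, round(s, 4))
```

Here the slopes shrink as $x$ grows, so the effect of $g$ on the rate of change depends on where along the interval you look.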
The examples below show some of these other behaviours of change.
Example 2: For and for an interval .
Example 3: For and for an interval .
The chain rule answers the question: how do you express the rate of change of a function with respect to its variable when that variable is itself a function of another variable?
Why is $u$ treated like a variable?
In the chain rule $\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$, why are we differentiating $y$ with respect to $u$ as if $u$ were a variable? Simply because it is one.
The intuition is the definition of the behaviour of change in a composite function: as $x$ changes $u$, and as $u$ changes $y$.
$u$ is a variable in $y = f(u)$, so it’s treated as such when we represent the change in $y$.
Why is the change in $y$ multiplied by the change in $u$?
This is simply because as $x$ changes, $u$ changes, and that’s a factor affecting the change in $y$.
Why Multiplication (and Not Addition)?
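Why is the chain rule $\frac{dy}{dx} = \frac{dy}{du} \times \frac{du}{dx}$ and not $\frac{dy}{dx} = \frac{dy}{du} + \frac{du}{dx}$?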
We will look at this in two ways: the first is a bit hand-wavy and the other is more intuitive.
For the first one, the intuition lies in the definition (writing $y = f(u)$ and $u = g(x)$): as $x$ changes $u$, and as $u$ changes $y$. For clarity, let’s break this statement into three parts:
- "as $u$ changes $y$": $u$ is treated as a variable
- "$x$ changes $u$"
- "and": from binary operations, we know that "and" implies multiplication, just like "or" implies addition.
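One way to make this step a little less hand-wavy is to look at average rates of change over a small interval. The following is only a sketch: it assumes the notation $y = f(u)$ and $u = g(x)$ (the symbols are an assumption), and it assumes $\Delta u \neq 0$ so that the division is allowed.

$$
\frac{\Delta y}{\Delta x} = \frac{\Delta y}{\Delta u} \cdot \frac{\Delta u}{\Delta x}, \qquad \Delta u \neq 0
$$

The $\Delta u$ cancels on the right-hand side, so the identity holds for any small change. Letting $\Delta x \to 0$ turns each ratio into a derivative (when the limits exist), which gives $\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$. Nothing cancels if the two ratios are added instead, which is one reason addition does not work here.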
Let’s look at the more intuitive version: currency conversions!
I have Nigerian Naira (NGN) and I want to convert it to Pounds sterling (GBP). But there’s a little challenge: only two exchanges are available, NGN/USD and USD/GBP. We have to convert from NGN to USD and then from USD to GBP. The rates are 1 USD = NGN1,500 and 1 GBP = 1.25 USD respectively. How do we convert NGN20,000 to GBP?
First, we convert to USD: NGN20,000 is 20,000/1,500 ≈ 13.33 USD. Then, USD to GBP is 13.33/1.25 ≈ 10.67 GBP. That is, NGN20,000 equals about 10.67 GBP. To get the rate of NGN to GBP, we multiply both rates (i.e. 1/1500 × 1/1.25 = 1/1875).
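The interesting thing is that we can turn these individual conversions into functions. The function for converting NGN to USD would be $g(x) = \frac{x}{1500}$ and the one for converting USD to GBP would be $f(x) = \frac{x}{1.25}$. The most interesting part is that the function for converting NGN to GBP is the function composition of $f$ and $g$! That is, $f(g(x)) = \frac{x}{1500 \times 1.25} = \frac{x}{1875}$.
If we apply the chain rule to the composite function $y = f(u)$ where $u = g(x)$, we have $\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} = \frac{1}{1.25} \times \frac{1}{1500} = \frac{1}{1875}$. That is our expected rate!
But if we change multiplication to addition, we have $\frac{1}{1.25} + \frac{1}{1500} = \frac{1201}{1500} \approx 0.8$. This is very wrong!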
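The currency example can also be checked numerically. The sketch below uses the two rates stated above; the function and variable names are hypothetical choices for this illustration, and the derivatives are estimated with a simple central difference rather than computed symbolically.

```python
NGN_PER_USD = 1500   # rate from the post: 1 USD = NGN 1,500
USD_PER_GBP = 1.25   # rate from the post: 1 GBP = 1.25 USD

def ngn_to_usd(ngn):
    """Inner conversion: NGN to USD."""
    return ngn / NGN_PER_USD

def usd_to_gbp(usd):
    """Outer conversion: USD to GBP."""
    return usd / USD_PER_GBP

def ngn_to_gbp(ngn):
    """Composition: convert NGN to USD, then USD to GBP."""
    return usd_to_gbp(ngn_to_usd(ngn))

def numerical_derivative(func, x, h=1e-3):
    """Central-difference estimate of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

amount = 20_000

# Rate of each individual conversion, and of the composite conversion.
d_usd_d_ngn = numerical_derivative(ngn_to_usd, amount)
d_gbp_d_usd = numerical_derivative(usd_to_gbp, ngn_to_usd(amount))
d_gbp_d_ngn = numerical_derivative(ngn_to_gbp, amount)

product_of_rates = d_gbp_d_usd * d_usd_d_ngn  # what the chain rule says
sum_of_rates = d_gbp_d_usd + d_usd_d_ngn      # the "addition" alternative

print(round(ngn_to_gbp(amount), 2))           # GBP received for NGN 20,000
print(d_gbp_d_ngn, product_of_rates, 1 / 1875)
print(sum_of_rates)
```

The product of the two rates matches the rate of the composite conversion (1/1875), while the sum is dominated by the USD to GBP rate and is nowhere near the true NGN to GBP rate.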
I implore you to think of other ways, besides currency conversions, in which this can be shown!
I hope you were able to think wide and far about function compositions and the chain rule!