Motivation
Suppose there is an earthquake. Let $X$ be the number of casualties and $Y$ be the magnitude of the earthquake on the Richter scale.
(a) Without being given anything, what is the distribution of $X$?
(b) Given that $Y=y_1$ (a very small magnitude), what is the distribution of $X$?
(c) Given that $Y=y_2$ (a very large magnitude), what is the distribution of $X$?
Remark. $Y=y_1$ means the earthquake is micro, and $Y=y_2$ means the earthquake is great.
Are your answers to (a), (b), (c) different?
In (b) and (c), we have the conditional distribution of $X$ given $Y=y_1$, and the conditional distribution of $X$ given $Y=y_2$, respectively.
In general, we have the conditional distribution of $X$ given $Y$ (before observing the value of $Y$), or of $X$ given $Y=y$ (after observing the value of $Y$).
Conditional distributions
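Recall the definition of conditional probability: $\mathbb{P}(A|B)=\frac{\mathbb{P}(A\cap B)}{\mathbb{P}(B)}$, in which $A,B$ are events, with $\mathbb{P}(B)>0$.
Applying this definition to discrete random variables $X,Y$, we have
$$\mathbb{P}(X=x|Y=y)=\frac{\mathbb{P}(X=x\cap Y=y)}{\mathbb{P}(Y=y)}=\frac{f(x,y)}{f_Y(y)},$$
where $f$ is the joint pmf of $X$ and $Y$, and $f_Y$ is the marginal pmf of $Y$.
It is natural to call such a conditional probability a conditional pmf, right?
We will denote such a conditional probability as $f_{X|Y}(x|y)$.
Then, this is basically the definition of conditional pmf: the conditional pmf of $X$ given $Y=y$ is the conditional probability $\mathbb{P}(X=x|Y=y)$.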
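As a small illustration of this definition, the sketch below computes a conditional pmf from a joint pmf by dividing by the marginal pmf. The joint pmf table and the function names are hypothetical, chosen only for this example.

```python
from fractions import Fraction

# Hypothetical joint pmf f(x, y) of (X, Y), stored as {(x, y): probability}.
joint_pmf = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

def marginal_pmf_y(joint, y):
    """f_Y(y): sum the joint pmf over all x for the given y."""
    return sum((p for (_, yy), p in joint.items() if yy == y), Fraction(0))

def conditional_pmf(joint, x, y):
    """f_{X|Y}(x|y) = f(x, y) / f_Y(y), defined only when f_Y(y) > 0."""
    f_y = marginal_pmf_y(joint, y)
    if f_y == 0:
        raise ValueError("the conditioning event Y = y has probability zero")
    return joint.get((x, y), Fraction(0)) / f_y

# Conditional pmf of X given Y = 0; the values sum to one.
for x in (0, 1):
    print(x, conditional_pmf(joint_pmf, x, 0))
```

Exact fractions are used so that the conditional pmf values can be compared without rounding error.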
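Naturally, we expect the conditional pdf to be defined similarly. This is indeed the case:
Definition.
(Conditional probability density function)
The conditional pdf of $X$ given $Y=y$ is $f_{X|Y}(x|y)=\frac{f(x,y)}{f_Y(y)}$, where $f$ is the joint pdf of $X$ and $Y$, and $f_Y$ is the marginal pdf of $Y$ with $f_Y(y)>0$.
Remark.
- The marginal pdf $f_Y(y)$ can be interpreted as a normalizing constant, which makes the integral $\int_{-\infty}^{\infty}f_{X|Y}(x|y)\,dx=1$, since $\int_{-\infty}^{\infty}f(x,y)\,dx=f_Y(y)$. Here we integrate over the region in which $Y$ is fixed to be $y$ (the region in which the condition is satisfied), so we only integrate over the corresponding interval of $x$ ($x$ is still a variable).
- This is similar to the denominator in the definition of conditional probability, which makes the conditional probability of the whole sample space equal to one, as the probability axioms require.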
To understand the definition more intuitively for the continuous case, consider the following diagram.
Top view:
|
|
*---------------*
| |
| |
fixed y *===============* <--- corresponding interval
| |
| |
*---------------*
|
*---------------- x
Side view:
*
/ \
*\ * /
/|#\ \
| / |##\ / *---------*
| * |###\ /\
| |\ |##/#\----------/--\
| | \|#/###*--------* /
| | \/############/#\ /
| |y *\===========/===*
| | / *---------* /
| |/ \ /
| *----------------*
|/
*------------------------- x
Front view:
|
|
|
*\
|#\
|##\
|###\
|####\ <------ Area: f_Y(y)
|#####*--------*
|###############\
*================*-------------- x
*---*
|###| : corresponding cross section from joint pdf
*---*
We can see that when we are conditioning on $Y=y$, we take a "slice" out from the region under the joint pdf, and the area of the "whole slice" is the area between the univariate joint pdf $f(x,y)$, with fixed $y$ and variable $x$, and the $x$-axis.
This area is given by $\int_{-\infty}^{\infty}f(x,y)\,dx=f_Y(y)$, while according to the probability axioms, the area should equal 1.
Hence, we rescale the area of the "slice" by a factor of $f_Y(y)$, by dividing the univariate joint pdf $f(x,y)$ by $f_Y(y)$.
After that, the curve at the top of the scaled "slice" is the graph of the conditional pdf $f_{X|Y}(x|y)$.
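The "slice and rescale" picture can be checked numerically. The following is a minimal sketch, assuming a hypothetical joint pdf $f(x,y)=x+y$ on the unit square (not taken from the text) and a simple midpoint rule for the integrals.

```python
def joint_pdf(x, y):
    """Hypothetical joint pdf f(x, y) = x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def integrate(func, a, b, n=1000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def make_conditional_pdf(y):
    """Return the function x -> f_{X|Y}(x|y), i.e. the slice f(., y) rescaled."""
    slice_area = integrate(lambda x: joint_pdf(x, y), 0.0, 1.0)  # approximates f_Y(y)
    if slice_area <= 0.0:
        raise ValueError("f_Y(y) must be positive")
    return lambda x: joint_pdf(x, y) / slice_area

y0 = 0.3
slice_area = integrate(lambda x: joint_pdf(x, y0), 0.0, 1.0)
scaled_area = integrate(make_conditional_pdf(y0), 0.0, 1.0)
print(slice_area)   # area of the slice, close to f_Y(0.3) = 0.5 + 0.3
print(scaled_area)  # area under the conditional pdf, close to 1
```

The slice area is computed once per value of $y$ and reused, since it plays the role of the normalizing constant $f_Y(y)$.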
So far, we have discussed the cases where both random variables are discrete or both are continuous.
How about the case where one of them is discrete and the other is continuous?
In this case, there is no "joint probability function" of these two random variables, since one is discrete and the other is continuous!
But we can still define the conditional probability function in another way.
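To motivate the following definition, suppose $X$ is continuous and $Y$ is discrete, and let $F_{X|Y}(x|y)$ be the conditional probability $\mathbb{P}(X\le x|Y=y)$.
Then, differentiating $F_{X|Y}(x|y)$ with respect to $x$ should yield the conditional pdf $f_{X|Y}(x|y)$.
So, we have
$$f_{X|Y}(x|y)=\frac{d}{dx}\mathbb{P}(X\le x|Y=y)=\lim_{\Delta x\to 0}\frac{\mathbb{P}(x<X\le x+\Delta x|Y=y)}{\Delta x}=\lim_{\Delta x\to 0}\frac{\mathbb{P}(Y=y|x<X\le x+\Delta x)}{\mathbb{P}(Y=y)}\cdot\frac{\mathbb{P}(x<X\le x+\Delta x)}{\Delta x}=\frac{\mathbb{P}(Y=y|X=x)f_X(x)}{\mathbb{P}(Y=y)}.$$
Thus, it is natural to have the following definition.
Definition.
(Conditional pdf of a continuous random variable given a discrete one)
Let $X$ be a continuous random variable and $Y$ be a discrete random variable. The conditional pdf of $X$ given $Y=y$ is $f_{X|Y}(x|y)=\frac{\mathbb{P}(Y=y|X=x)f_X(x)}{\mathbb{P}(Y=y)}$, where $\mathbb{P}(Y=y)>0$.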
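Now, how about the case where $X$ is discrete and $Y$ is continuous?
In this case, let us use the above definition as the motivation. However, we should interchange $X$ and $Y$ so that the assumptions are still satisfied.
Then, we get $f_{Y|X}(y|x)=\frac{\mathbb{P}(X=x|Y=y)f_Y(y)}{\mathbb{P}(X=x)}$.
In this case, $X$ is discrete, so it is natural to define the conditional pmf of $X$ given $Y=y$ as $\mathbb{P}(X=x|Y=y)$ in the expression.
Now, after rearranging the terms, we get $\mathbb{P}(X=x|Y=y)=\frac{f_{Y|X}(y|x)\mathbb{P}(X=x)}{f_Y(y)}$.
Thus, we have the following definition.
Definition.
(Conditional pmf of a discrete random variable given a continuous one)
Let $X$ be a discrete random variable and $Y$ be a continuous random variable. The conditional pmf of $X$ given $Y=y$ is $f_{X|Y}(x|y)=\frac{f_{Y|X}(y|x)f_X(x)}{f_Y(y)}$, where $f_X$ is the pmf of $X$ and $f_Y(y)>0$.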
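To make the mixed case concrete, here is a minimal sketch assuming the rearranged expression reads $\mathbb{P}(X=x|Y=y)=f_{Y|X}(y|x)\,\mathbb{P}(X=x)/f_Y(y)$ with $f_Y(y)=\sum_x f_{Y|X}(y|x)\,\mathbb{P}(X=x)$. The model (a two-valued $X$ and a normal $Y$ whose mean depends on $X$) and all names are hypothetical.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of the normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical model: X is discrete with pmf `pmf_x`,
# and given X = x, Y is normal with mean `mean_given_x[x]` and sd 1.
pmf_x = {0: 0.7, 1: 0.3}
mean_given_x = {0: 0.0, 1: 2.0}

def marginal_pdf_y(y):
    """f_Y(y) = sum over x of f_{Y|X}(y|x) * P(X = x)."""
    return sum(normal_pdf(y, mean_given_x[x], 1.0) * p for x, p in pmf_x.items())

def conditional_pmf_x_given_y(x, y):
    """P(X = x | Y = y) = f_{Y|X}(y|x) * P(X = x) / f_Y(y)."""
    return normal_pdf(y, mean_given_x[x], 1.0) * pmf_x[x] / marginal_pdf_y(y)

for y in (0.0, 1.0, 3.0):
    print(y, [conditional_pmf_x_given_y(x, y) for x in (0, 1)])
```

At $y=1$ both conditional densities of $Y$ coincide, so observing $Y=1$ leaves the pmf of $X$ unchanged; larger values of $y$ shift probability towards $X=1$.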
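Based on the definitions of conditional probability functions, it is natural to define the conditional cdf as follows.
Definition.
(Conditional cumulative distribution function)
The conditional cdf of $X$ given $Y=y$ is $F_{X|Y}(x|y)=\mathbb{P}(X\le x|Y=y)$, which equals $\sum_{u\le x}f_{X|Y}(u|y)$ if $X$ is discrete, and $\int_{-\infty}^{x}f_{X|Y}(u|y)\,du$ if $X$ is continuous.
Remark.
- We should be aware that when $Y$ is continuous, the event $\{Y=y\}$ has probability zero. So, according to the definition of conditional probability, the conditional cdf in this case would be undefined. In this context, however, we define the conditional probability by the expression above, which makes sense and is well defined.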
Graphical illustration of the definition (continuous random variables):
Top view:
|
|
*---------------*
| |
| |
fixed y *=========@=====* <--- corresponding interval
| x |
| |
*---------------*
|
*----------------
Side view:
*
/ \
*\ * /
/|#\ \
| / |##\ / *---------*
| * |###\ /\
| |\ |##/#\----------/--\
| | \|#/###*--------* /
| | \/######### / \ /
| |y *\========@==/===*
| | / *-------x-* /
| |/ \ /
| *----------------*
|/
*------------------------- x
Front view:
|
|
|
*\
|#\
|##\
|###\
|####\ <------------- Area: f_Y(y)
|#####*--------*
|########### \
*==========@=====*--------------
x
*---*
|###| : the desired region from the cross section from joint pdf, whose area is the probability from the cdf
*---*
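If $Y=\mathbf{1}\{A\}$ for some event $A$, we have some special notations for simplicity:
- the conditional probability function of $X$ given $A$ (that is, given $Y=1$) becomes $f_X(x|A)$;
- the conditional cdf of $X$ given $A$ (that is, given $Y=1$) becomes $F_X(x|A)$.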
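Proposition.
(Determining independence of two random variables)
Random variables $X,Y$ are independent if and only if $f_{X|Y}(x|y)=f_X(x)$ for each $x,y$.
Proof.
Recall the definition of independence between two random variables:
$X,Y$ are independent if
- $f(x,y)=f_X(x)f_Y(y)$ for each $x,y$.
Since $f_{X|Y}(x|y)=\frac{f(x,y)}{f_Y(y)}=f_X(x)\iff f(x,y)=f_X(x)f_Y(y)$ for each $x,y$,
we have the desired result.
Remark.
- This is expected, since conditioning on an independent event should not affect the occurrence of another independent event.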
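For discrete random variables, independence can be checked numerically by comparing every conditional pmf value $f_{X|Y}(x|y)$ with the corresponding marginal pmf value $f_X(x)$. The sketch below does this for two hypothetical joint pmf tables; the tables and the function name are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def is_independent(joint):
    """Check whether f_{X|Y}(x|y) == f_X(x) for every x and every y with f_Y(y) > 0."""
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    f_x = {x: sum((joint.get((x, y), Fraction(0)) for y in ys), Fraction(0)) for x in xs}
    f_y = {y: sum((joint.get((x, y), Fraction(0)) for x in xs), Fraction(0)) for y in ys}
    for x, y in product(xs, ys):
        if f_y[y] == 0:
            continue  # the conditional pmf is not defined for this y
        if joint.get((x, y), Fraction(0)) / f_y[y] != f_x[x]:
            return False
    return True

# Hypothetical joint pmf built as a product of marginals: independent.
product_table = {
    (0, 0): Fraction(1, 6), (1, 0): Fraction(1, 3),
    (0, 1): Fraction(1, 6), (1, 1): Fraction(1, 3),
}
# Hypothetical joint pmf that is not a product of its marginals: dependent.
skewed_table = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

print(is_independent(product_table), is_independent(skewed_table))
```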
We can extend the definition of conditional probability function and cdf to groups of random variables, for joint cdf's and joint probability functions, as follows:
Definition.
(Conditional joint probability function)
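Let $\mathbf{X}=(X_1,\dotsc,X_n)^T$ and $\mathbf{Y}=(Y_1,\dotsc,Y_m)^T$ be two random vectors.
The conditional joint probability function of $\mathbf{X}$ given $\mathbf{Y}=\mathbf{y}$ is $f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})=\frac{f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})}$, where $f_{\mathbf{Y}}(\mathbf{y})>0$.
Then, we also have a similar proposition for determining independence of two random vectors.
Proposition.
(Determining independence of two random vectors)
Random vectors $\mathbf{X},\mathbf{Y}$ are independent if and only if $f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})=f_{\mathbf{X}}(\mathbf{x})$ for each $\mathbf{x},\mathbf{y}$.
Proof.
The definition of independence between two random vectors is
$\mathbf{X},\mathbf{Y}$ are independent if
- $f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y})=f_{\mathbf{X}}(\mathbf{x})f_{\mathbf{Y}}(\mathbf{y})$ for each $\mathbf{x},\mathbf{y}$.
Since $f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})=\frac{f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})}=f_{\mathbf{X}}(\mathbf{x})\iff f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y})=f_{\mathbf{X}}(\mathbf{x})f_{\mathbf{Y}}(\mathbf{y})$ for each $\mathbf{x},\mathbf{y}$,
we have the desired result.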
Conditional distributions of bivariate normal distribution
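Recall from the Probability/Important Distributions chapter that the joint pdf of $\mathcal{N}_2(\boldsymbol{\mu},\boldsymbol{\Sigma})$, with $\boldsymbol{\mu}=(\mu_X,\mu_Y)^T$ and $\boldsymbol{\Sigma}=\begin{pmatrix}\sigma_X^2&\rho\sigma_X\sigma_Y\\\rho\sigma_X\sigma_Y&\sigma_Y^2\end{pmatrix}$, is
$$f(x,y)=\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2-2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right)+\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right),\quad(x,y)\in\mathbb{R}^2,$$
and $X\sim\mathcal{N}(\mu_X,\sigma_X^2)$ and $Y\sim\mathcal{N}(\mu_Y,\sigma_Y^2)$ in this case,
in which $\sigma_X$ and $\sigma_Y$ are positive and $-1<\rho<1$.
Proposition.
(Conditional distributions of bivariate normal distribution)
Let $(X,Y)^T\sim\mathcal{N}_2(\boldsymbol{\mu},\boldsymbol{\Sigma})$.
Then, $X|Y=y\sim\mathcal{N}\left(\mu_X+\rho\frac{\sigma_X}{\sigma_Y}(y-\mu_Y),\,\sigma_X^2(1-\rho^2)\right)$ and $Y|X=x\sim\mathcal{N}\left(\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X),\,\sigma_Y^2(1-\rho^2)\right)$
(abuse of notation: when we say the distribution of "$X|Y=y$", we mean the conditional distribution of $X$ given $Y=y$).
Proof.
- First, the conditional pdf is
$$f_{X|Y}(x|y)=\frac{f(x,y)}{f_Y(y)}=\frac{1}{\sqrt{2\pi}\,\sigma_X\sqrt{1-\rho^2}}\exp\left(-\frac{\left(x-\mu_X-\rho\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right)^2}{2\sigma_X^2(1-\rho^2)}\right),$$
which follows by dividing $f(x,y)$ by $f_Y(y)=\frac{1}{\sqrt{2\pi}\,\sigma_Y}\exp\left(-\frac{1}{2}\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right)$ and completing the square in the exponent.
- Then, we can see that $X|Y=y\sim\mathcal{N}\left(\mu_X+\rho\frac{\sigma_X}{\sigma_Y}(y-\mu_Y),\,\sigma_X^2(1-\rho^2)\right)$,
- and by symmetry (interchanging $X$ and $Y$, and also interchanging $x$ and $y$), $Y|X=x\sim\mathcal{N}\left(\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X),\,\sigma_Y^2(1-\rho^2)\right)$.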
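The conditional distribution of $X$ given $Y=y$ can be sanity-checked numerically: the ratio of the joint pdf to the marginal pdf of $Y$ should agree with a normal pdf with mean $\mu_X+\rho\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)$ and variance $\sigma_X^2(1-\rho^2)$. The sketch below assumes the standard bivariate normal density; the parameter values and function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of the normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def bivariate_normal_pdf(x, y, mu_x, mu_y, sd_x, sd_y, rho):
    """Joint pdf of the bivariate normal distribution with correlation rho."""
    zx = (x - mu_x) / sd_x
    zy = (y - mu_y) / sd_y
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * sd_x * sd_y * math.sqrt(1.0 - rho * rho))

def conditional_pdf_by_ratio(x, y, mu_x, mu_y, sd_x, sd_y, rho):
    """f_{X|Y}(x|y) computed directly as f(x, y) / f_Y(y)."""
    return bivariate_normal_pdf(x, y, mu_x, mu_y, sd_x, sd_y, rho) / normal_pdf(y, mu_y, sd_y ** 2)

def conditional_pdf_closed_form(x, y, mu_x, mu_y, sd_x, sd_y, rho):
    """f_{X|Y}(x|y) from the normal distribution stated in the proposition."""
    mean = mu_x + rho * sd_x / sd_y * (y - mu_y)
    var = sd_x ** 2 * (1.0 - rho ** 2)
    return normal_pdf(x, mean, var)

# Hypothetical parameter values, used only for this check.
params = dict(mu_x=1.0, mu_y=-2.0, sd_x=1.5, sd_y=0.5, rho=0.6)
for x, y in [(0.0, -2.0), (1.3, -1.5), (2.5, -2.8)]:
    print(conditional_pdf_by_ratio(x, y, **params), conditional_pdf_closed_form(x, y, **params))
```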
Conditional version of concepts
We can obtain conditional versions of the concepts previously established for 'unconditional'
distributions analogously, by substituting the 'unconditional' cdf, pdf or pmf, i.e. $F_X(x)$ or $f_X(x)$,
by their conditional counterparts, i.e. $F_{X|Y}(x|y)$ or $f_{X|Y}(x|y)$.
Conditional independence
Definition.
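Random variables $X_1,\dotsc,X_n$ are conditionally independent given $Y$ if and only if
$F_{X_1,\dotsc,X_n|Y}(x_1,\dotsc,x_n|y)=F_{X_1|Y}(x_1|y)\dotsb F_{X_n|Y}(x_n|y)$
or
$f_{X_1,\dotsc,X_n|Y}(x_1,\dotsc,x_n|y)=f_{X_1|Y}(x_1|y)\dotsb f_{X_n|Y}(x_n|y)$
for each real number $x_1,\dotsc,x_n,y$ and for each positive integer $n$, in which $F_{X_1,\dotsc,X_n|Y}$ and $f_{X_1,\dotsc,X_n|Y}$ denote the joint cdf and probability function of $X_1,\dotsc,X_n$ conditional on $Y$ respectively.
Remark.
- For random variables, conditional independence and independence are not related, i.e. neither of them implies the other.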
Example.
(Conditional independence does not imply independence)
TODO
Example.
(Independence does not imply conditional independence)
TODO
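Both "does not imply" claims can be checked on small discrete examples. The sketch below uses two examples that are assumptions of this illustration rather than taken from the text: two tosses of a randomly chosen biased coin (conditionally independent given the coin, but not independent), and two independent fair bits together with their exclusive-or (independent, but not conditionally independent given the exclusive-or).

```python
from fractions import Fraction
from itertools import product

AXIS = {"x": 0, "y": 1, "z": 2}

def prob(joint, **fixed):
    """Probability that the named coordinates of (X, Y, Z) take the given values."""
    return sum(
        (p for outcome, p in joint.items()
         if all(outcome[AXIS[name]] == value for name, value in fixed.items())),
        Fraction(0),
    )

def independent(joint):
    """Check P(X=x, Y=y) == P(X=x) P(Y=y) for all binary x, y."""
    return all(
        prob(joint, x=x, y=y) == prob(joint, x=x) * prob(joint, y=y)
        for x, y in product((0, 1), repeat=2)
    )

def conditionally_independent(joint):
    """Check P(X=x, Y=y | Z=z) == P(X=x | Z=z) P(Y=y | Z=z) whenever P(Z=z) > 0."""
    for z in (0, 1):
        pz = prob(joint, z=z)
        if pz == 0:
            continue
        for x, y in product((0, 1), repeat=2):
            lhs = prob(joint, x=x, y=y, z=z) / pz
            rhs = (prob(joint, x=x, z=z) / pz) * (prob(joint, y=y, z=z) / pz)
            if lhs != rhs:
                return False
    return True

# Hypothetical example 1: Z picks one of two coins (heads probability 1/4 or 3/4)
# with equal chance; X and Y are two tosses of the chosen coin.
coins = {}
for z, heads in ((0, Fraction(1, 4)), (1, Fraction(3, 4))):
    for x, y in product((0, 1), repeat=2):
        px = heads if x == 1 else 1 - heads
        py = heads if y == 1 else 1 - heads
        coins[(x, y, z)] = Fraction(1, 2) * px * py

# Hypothetical example 2: X and Y are independent fair bits and Z = X xor Y.
xor_bits = {(x, y, x ^ y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

print(conditionally_independent(coins), independent(coins))
print(independent(xor_bits), conditionally_independent(xor_bits))
```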
Conditional expectation
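Definition.
(Conditional expectation)
The conditional expectation of $X$ given $Y=y$ is $\mathbb{E}[X|Y=y]=\sum_{x}xf_{X|Y}(x|y)$ if $X$ is discrete, and $\mathbb{E}[X|Y=y]=\int_{-\infty}^{\infty}xf_{X|Y}(x|y)\,dx$ if $X$ is continuous. We write $\mathbb{E}[X|Y]$ for the random variable obtained by evaluating the function $y\mapsto\mathbb{E}[X|Y=y]$ at $Y$.
Similarly, we have a conditional version of the law of the unconscious statistician.
Proposition.
(Conditional law of the unconscious statistician)
For each function $g$, $\mathbb{E}[g(X)|Y=y]=\sum_{x}g(x)f_{X|Y}(x|y)$ if $X$ is discrete, and $\mathbb{E}[g(X)|Y=y]=\int_{-\infty}^{\infty}g(x)f_{X|Y}(x|y)\,dx$ if $X$ is continuous.
Proposition.
(Conditional expectation under independence)
If random variables $X,Y$ are independent, $\mathbb{E}[g(X)|Y]=\mathbb{E}[g(X)]$
for each function $g$.
Proof.
Since $X,Y$ are independent, $f_{X|Y}(x|y)=f_X(x)$ for each $x,y$. Hence, in the continuous case, $\mathbb{E}[g(X)|Y=y]=\int_{-\infty}^{\infty}g(x)f_{X|Y}(x|y)\,dx=\int_{-\infty}^{\infty}g(x)f_X(x)\,dx=\mathbb{E}[g(X)]$ for each $y$. The discrete case is the same with sums in place of integrals.
Remark.
- This equality may not hold if $X,Y$ are not independent.
The properties of $\mathbb{E}[\cdot]$ still hold for conditional expectations $\mathbb{E}[\cdot|Y]$, with every 'unconditional' expectation replaced by a conditional expectation and some suitable modifications, as follows: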
Proposition.
(Properties of conditional expectation)
For each random variable $X$,
- (linearity) $\mathbb{E}[\alpha(Y)X_1+\beta(Y)X_2+\gamma(Y)|Y]=\alpha(Y)\mathbb{E}[X_1|Y]+\beta(Y)\mathbb{E}[X_2|Y]+\gamma(Y)$ for each functions $\alpha(Y),\beta(Y),\gamma(Y)$ of $Y$ (each of which is constant given $Y$) and for each random variable $X_1,X_2$;
- (nonnegativity) if $X\geq 0$, $\mathbb{E}[X|Y]\geq 0$;
- (monotonicity) if $X\geq Z$, $\mathbb{E}[X|Y]\geq\mathbb{E}[Z|Y]$ for each random variable $Z$;
- (triangle inequality) $\left|\mathbb{E}[X|Y]\right|\leq\mathbb{E}\left[|X|\,\big|\,Y\right]$;
- (multiplicativity under independence) if $X,Z$ are conditionally independent given $Y$, $\mathbb{E}[XZ|Y]=\mathbb{E}[X|Y]\,\mathbb{E}[Z|Y]$.
Proof.
The proof is similar to the one for 'unconditional' expectations.
Remark.
- $\alpha(Y),\beta(Y),\gamma(Y)$ are treated as constants given $Y$, since after observing the value of $Y$, they cannot be changed.
- Each result also holds with $Y$ replaced by a random vector $\mathbf{Y}$.
The following theorem about conditional expectation is quite important.
Theorem.
(Law of total expectation)
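For each function $g$ and for each random variable $X$, $\mathbb{E}[g(X)]=\mathbb{E}\big[\mathbb{E}[g(X)|Y]\big]$.
Proof.
In the continuous case (the discrete case is analogous, with sums in place of integrals),
$$\mathbb{E}\big[\mathbb{E}[g(X)|Y]\big]=\int_{-\infty}^{\infty}\mathbb{E}[g(X)|Y=y]f_Y(y)\,dy=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}g(x)f_{X|Y}(x|y)f_Y(y)\,dx\,dy=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}g(x)f(x,y)\,dx\,dy=\mathbb{E}[g(X)].$$
Remark.
- We can replace $Y$ by a random vector $\mathbf{Y}$ and get $\mathbb{E}[g(X)]=\mathbb{E}\big[\mathbb{E}[g(X)|\mathbf{Y}]\big]$.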
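The law of total expectation can be verified exactly on a small discrete example. The following sketch, with a hypothetical joint pmf table and $g(x)=x^2$, computes $\mathbb{E}[g(X)]$ directly and as the weighted average of the conditional expectations $\mathbb{E}[g(X)|Y=y]$.

```python
from fractions import Fraction

# Hypothetical joint pmf f(x, y) of (X, Y), stored as {(x, y): probability}.
joint_pmf = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (2, 1): Fraction(1, 4), (4, 1): Fraction(1, 4),
}

def marginal_pmf_y(joint, y):
    """f_Y(y): sum the joint pmf over all x for the given y."""
    return sum((p for (_, yy), p in joint.items() if yy == y), Fraction(0))

def conditional_expectation(joint, g, y):
    """E[g(X) | Y = y] = sum over x of g(x) f(x, y) / f_Y(y)."""
    numerator = sum((g(x) * p for (x, yy), p in joint.items() if yy == y), Fraction(0))
    return numerator / marginal_pmf_y(joint, y)

def expectation_direct(joint, g):
    """E[g(X)] computed directly from the joint pmf."""
    return sum((g(x) * p for (x, _), p in joint.items()), Fraction(0))

def expectation_by_conditioning(joint, g):
    """E[E[g(X) | Y]]: average the conditional expectations over the pmf of Y."""
    ys = sorted({y for _, y in joint})
    return sum(
        (conditional_expectation(joint, g, y) * marginal_pmf_y(joint, y) for y in ys),
        Fraction(0),
    )

def square(x):
    return x * x

print(expectation_direct(joint_pmf, square), expectation_by_conditioning(joint_pmf, square))
```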
Corollary.
(Generalized law of total probability)
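For each event $A$, $\mathbb{P}(A)=\mathbb{E}[\mathbb{P}(A|Y)]$.
Proof.
- By the fundamental bridge between probability and expectation, $\mathbb{E}[\mathbf{1}\{A\}]=\mathbb{P}(A)$ and $\mathbb{E}[\mathbf{1}\{A\}|Y]=\mathbb{P}(A|Y)$.
- Then, using law of total expectation, $\mathbb{P}(A)=\mathbb{E}[\mathbf{1}\{A\}]=\mathbb{E}\big[\mathbb{E}[\mathbf{1}\{A\}|Y]\big]=\mathbb{E}[\mathbb{P}(A|Y)]$.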
Corollary.
(Expectation version of law of total probability)
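Suppose the sample space $\Omega=B_1\cup B_2\cup\dotsb$, in which $B_i$'s are mutually exclusive.
Then, $\mathbb{E}[X]=\sum_{i}\mathbb{E}[X|B_i]\,\mathbb{P}(B_i)$.
Remark.
- The number of events can be finite, as long as they are mutually exclusive and their union is the whole sample space.
- If $X=\mathbf{1}\{A\}$ for some event $A$, it reduces to the law of total probability.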
Example.
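Let $X$ be the human height in m.
A person is randomly selected from a population consisting of the same number of men and women. Given that the mean height of a man is 1.8 m, and that of a woman is 1.7 m,
the mean height of the entire population is $\mathbb{E}[X]=\mathbb{E}[X|\text{man}]\,\mathbb{P}(\text{man})+\mathbb{E}[X|\text{woman}]\,\mathbb{P}(\text{woman})=1.8(0.5)+1.7(0.5)=1.75$ m.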
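Proposition.
(Conditional expectation given an event)
For each random variable $X$ and for each event $A$ with $\mathbb{P}(A)>0$, $\mathbb{E}[X|A]=\frac{\mathbb{E}[X\mathbf{1}\{A\}]}{\mathbb{P}(A)}$.
Proof.
By the formula of expectation computed by weighted average of conditional expectations,
$\mathbb{E}[X\mathbf{1}\{A\}]=\mathbb{E}[X\mathbf{1}\{A\}|A]\,\mathbb{P}(A)+\mathbb{E}[X\mathbf{1}\{A\}|A^c]\,\mathbb{P}(A^c)=\mathbb{E}[X|A]\,\mathbb{P}(A)+0,$
and the result follows if $\mathbb{P}(A)>0$.
Remark.
- If $X=\mathbf{1}\{B\}$ for some event $B$, it reduces to the definition of the conditional probability $\mathbb{P}(B|A)=\frac{\mathbb{P}(A\cap B)}{\mathbb{P}(A)}$, by the fundamental bridge between probability and expectation.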
After defining conditional expectation, we can also have conditional variance, covariance and correlation coefficient, since variance, covariance, and correlation coefficient are built upon expectation.
Conditional expectations of bivariate normal distribution
Proposition.
(Conditional expectations of bivariate normal distribution)
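Let $(X,Y)^T$ follow the bivariate normal distribution with means $\mu_X,\mu_Y$, standard deviations $\sigma_X,\sigma_Y$ and correlation coefficient $\rho$.
Then, $\mathbb{E}[X|Y=y]=\mu_X+\rho\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)$ and $\mathbb{E}[Y|X=x]=\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X)$.
Proof.
- The result follows readily from the proposition about conditional distributions of bivariate normal distribution.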
Conditional variance
Definition.
(Conditional variance)
The conditional variance of random variable $X$ given $Y$ is $\operatorname{Var}(X|Y)=\mathbb{E}\big[(X-\mathbb{E}[X|Y])^2\,\big|\,Y\big]$.
Similarly, we have properties of conditional variance which are similar to those of variance.
Proof.
The proof is similar to the one for properties of variance.
Besides the law of total expectation, we also have the law of total variance, as follows:
Proposition.
(Law of total variance)
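For each random variable $X$, $\operatorname{Var}(X)=\mathbb{E}[\operatorname{Var}(X|Y)]+\operatorname{Var}(\mathbb{E}[X|Y])$.
Proof.
Using the law of total expectation and $\operatorname{Var}(X|Y)=\mathbb{E}[X^2|Y]-(\mathbb{E}[X|Y])^2$,
$$\operatorname{Var}(X)=\mathbb{E}[X^2]-(\mathbb{E}[X])^2=\mathbb{E}\big[\mathbb{E}[X^2|Y]\big]-\big(\mathbb{E}\big[\mathbb{E}[X|Y]\big]\big)^2=\mathbb{E}[\operatorname{Var}(X|Y)]+\mathbb{E}\big[(\mathbb{E}[X|Y])^2\big]-\big(\mathbb{E}\big[\mathbb{E}[X|Y]\big]\big)^2=\mathbb{E}[\operatorname{Var}(X|Y)]+\operatorname{Var}(\mathbb{E}[X|Y]).$$
Remark.
- We can replace $Y$ by $\mathbf{Y}$, a random vector.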
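The decomposition $\operatorname{Var}(X)=\mathbb{E}[\operatorname{Var}(X|Y)]+\operatorname{Var}(\mathbb{E}[X|Y])$ can be checked exactly on a small discrete example. The sketch below uses a hypothetical joint pmf table; the helper names are assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical joint pmf f(x, y) of (X, Y), stored as {(x, y): probability}.
joint_pmf = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (2, 1): Fraction(1, 4), (4, 1): Fraction(1, 4),
}

def marginal_pmf_y(joint, y):
    """f_Y(y): sum the joint pmf over all x for the given y."""
    return sum((p for (_, yy), p in joint.items() if yy == y), Fraction(0))

def conditional_moment(joint, y, k):
    """E[X^k | Y = y]."""
    total = sum((x ** k * p for (x, yy), p in joint.items() if yy == y), Fraction(0))
    return total / marginal_pmf_y(joint, y)

def conditional_variance(joint, y):
    """Var(X | Y = y) = E[X^2 | Y = y] - (E[X | Y = y])^2."""
    return conditional_moment(joint, y, 2) - conditional_moment(joint, y, 1) ** 2

ys = sorted({y for _, y in joint_pmf})

# Left-hand side: Var(X) computed directly from the joint pmf.
mean_x = sum((x * p for (x, _), p in joint_pmf.items()), Fraction(0))
var_x = sum((x ** 2 * p for (x, _), p in joint_pmf.items()), Fraction(0)) - mean_x ** 2

# First term: E[Var(X|Y)], the average of the conditional variances.
expected_cond_var = sum(
    (conditional_variance(joint_pmf, y) * marginal_pmf_y(joint_pmf, y) for y in ys),
    Fraction(0),
)

# Second term: Var(E[X|Y]), the variance of the conditional means.
mean_of_cond_mean = sum(
    (conditional_moment(joint_pmf, y, 1) * marginal_pmf_y(joint_pmf, y) for y in ys),
    Fraction(0),
)
var_of_cond_mean = sum(
    (conditional_moment(joint_pmf, y, 1) ** 2 * marginal_pmf_y(joint_pmf, y) for y in ys),
    Fraction(0),
) - mean_of_cond_mean ** 2

print(var_x, expected_cond_var, var_of_cond_mean)
```

The first term measures the variability of $X$ remaining within each value of $Y$, and the second term measures the variability of the conditional mean across values of $Y$.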
Conditional variances of bivariate normal distribution
Proposition.
(Conditional variances of bivariate normal distribution)
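Let $(X,Y)^T$ follow the bivariate normal distribution with standard deviations $\sigma_X,\sigma_Y$ and correlation coefficient $\rho$.
Then, $\operatorname{Var}(X|Y=y)=(1-\rho^2)\sigma_X^2$ and $\operatorname{Var}(Y|X=x)=(1-\rho^2)\sigma_Y^2$.
Proof.
- The result follows readily from the proposition about conditional distributions of bivariate normal distribution.
Remark.
- It can be observed that the exact values of $x$ and $y$ in the conditions do not matter. The result is the same for different values of them.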
Conditional covariance
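Definition.
(Conditional covariance)
The conditional covariance of random variables $X$ and $Y$ given $Z$ is $\operatorname{Cov}(X,Y|Z)=\mathbb{E}\big[(X-\mathbb{E}[X|Z])(Y-\mathbb{E}[Y|Z])\,\big|\,Z\big]$.
Proposition.
(Properties of conditional covariance)
(i) (symmetry) for each random variable $X,Y$, $\operatorname{Cov}(X,Y|Z)=\operatorname{Cov}(Y,X|Z)$
(ii) for each random variable $X$, $\operatorname{Cov}(X,X|Z)=\operatorname{Var}(X|Z)$
(iii) (alternative formula of covariance) $\operatorname{Cov}(X,Y|Z)=\mathbb{E}[XY|Z]-\mathbb{E}[X|Z]\,\mathbb{E}[Y|Z]$
(iv) $\operatorname{Cov}\left(\sum_{i=1}^{n}a_iX_i,\sum_{j=1}^{m}b_jY_j\,\Big|\,Z\right)=\sum_{i=1}^{n}\sum_{j=1}^{m}a_ib_j\operatorname{Cov}(X_i,Y_j|Z)$
for each constant $a_1,\dotsc,a_n,b_1,\dotsc,b_m$,
and for each random variables $X_1,\dotsc,X_n,Y_1,\dotsc,Y_m$
(v) for each random variable $X_1,\dotsc,X_n$, $\operatorname{Var}\left(\sum_{i=1}^{n}X_i\,\Big|\,Z\right)=\sum_{i=1}^{n}\operatorname{Var}(X_i|Z)+2\sum_{i<j}\operatorname{Cov}(X_i,X_j|Z)$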
Conditional correlation coefficient
Remark.
- Similar to the 'unconditional' correlation coefficient, the conditional correlation coefficient also lies between $-1$ and $1$ inclusive. The proof is similar, with every unconditional term replaced by its conditional counterpart.
Conditional quantile
Remark.
- Then, we can have the conditional median, conditional interquartile range, etc., which are defined using the conditional quantile in the same way as the unconditional ones.