lectures/mix_model.md
A compound lottery can be said to create a _mixture distribution_.

Our two ways of constructing a compound lottery will differ in their **timing**.

* in one, mixing between two possible probability distributions will occur once and for all at the beginning of time

* in the other, mixing between the same two possible probability distributions will occur each period

The statistical setting is close but not identical to the problem studied in that quantecon lecture.
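The two timings can be contrasted with a small simulation. The sketch below is illustrative only and not from the lecture: it uses degenerate endpoint "distributions" (all mass on $1$ for one, all mass on $0$ for the other) purely to make the contrast easy to see. Per-period marginals coincide under the two timings, but serial dependence differs sharply.

```python
import numpy as np

# Degenerate endpoint distributions chosen purely for illustration;
# this is not the lecture's calibration.
rng = np.random.default_rng(0)
alpha, T, n_paths = 0.5, 50, 2000

# Timing 1: nature mixes once and for all at time -1, then draws from
# the chosen distribution every period (each path is constant here).
choose_f = rng.random((n_paths, 1)) < alpha
once = np.repeat(choose_f.astype(float), T, axis=1)

# Timing 2: nature re-mixes each period.
each = (rng.random((n_paths, T)) < alpha).astype(float)

# The per-period marginal means agree...
m1, m2 = once.mean(), each.mean()
# ...but serial dependence is completely different.
c1 = np.corrcoef(once[:, 0], once[:, 1])[0, 1]   # perfectly correlated
c2 = np.corrcoef(each[:, 0], each[:, 1])[0, 1]   # roughly uncorrelated
```

The point is that the once-and-for-all timing makes successive draws highly dependent, while per-period mixing makes them i.i.d., even though a single period's distribution looks identical.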
In that lecture, there were two i.i.d. processes that could possibly govern successive draws of a non-negative random variable $W$.

Nature decided once and for all whether to make a sequence of i.i.d. draws from either $f$ or from $g$.

That lecture studied an agent who knew both $f$ and $g$ but did not know which distribution nature chose at time $-1$.

The agent represented that ignorance by assuming that nature had chosen $f$ or $g$ by flipping an unfair coin that put probability $\pi_{-1}$ on probability distribution $f$.

That assumption allowed the agent to construct a subjective joint probability distribution over the
random sequence $\{W_t\}_{t=0}^\infty$.

We studied how the agent would then use the laws of conditional probability and an observed history $w^t =\{w_s\}_{s=0}^t$ to form

$$
\pi_t = E [ \textrm{nature chose distribution } f \mid w^t ], \quad t = 0, 1, 2, \ldots
$$
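The conditional probability above obeys a simple one-step Bayes recursion, $\pi_t = \pi_{t-1} f(w_t) / \left(\pi_{t-1} f(w_t) + (1-\pi_{t-1}) g(w_t)\right)$, which can be sketched as follows. The Beta parameters standing in for $f$ and $g$ are placeholders, not necessarily the lecture's calibration.

```python
import numpy as np
from math import gamma

# Placeholder Beta densities standing in for f and g
# (illustrative parameters, not necessarily the lecture's calibration).
F_a, F_b = 1.0, 1.0
G_a, G_b = 3.0, 1.2

def beta_pdf(w, a, b):
    "Beta(a, b) density on (0, 1)."
    return w**(a - 1) * (1 - w)**(b - 1) * gamma(a + b) / (gamma(a) * gamma(b))

def update(pi, w):
    "One step of Bayes' law: posterior probability that nature chose f, given draw w."
    lf, lg = beta_pdf(w, F_a, F_b), beta_pdf(w, G_a, G_b)
    return pi * lf / (pi * lf + (1 - pi) * lg)

# Feed draws that actually come from g through the recursion:
# pi_t should head toward 0.
rng = np.random.default_rng(0)
pi = 0.5
for w in rng.beta(G_a, G_b, size=200):
    pi = update(pi, w)
```

Running the recursion on draws from $g$ drives $\pi_t$ toward $0$, exactly as the lecture's analysis predicts for a correctly specified model.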
However, in the setting of this lecture, that rule imputes to the agent an incorrect model.

The reason is that now the wage sequence is actually described by a different statistical model.

Thus, we change the {doc}`quantecon lecture <likelihood_bayes>` specification in the following way.

We'll study two agents who try to learn about the wage process, but who use different statistical models.
Both types of agent know $f$ and $g$ but neither knows $\alpha$.

Our first type of agent erroneously thinks that at time $-1$, nature once and for all chose $f$ or $g$ and thereafter
permanently draws from that distribution.

Our second type of agent knows, correctly, that nature mixes $f$ and $g$ with mixing probability $\alpha \in (0,1)$
each period, though the agent doesn't know the mixing parameter.

Our first type of agent applies the learning algorithm described in {doc}`this quantecon lecture <likelihood_bayes>`.

In the context of the statistical model that prevailed in that lecture, that was a good learning algorithm and it enabled the Bayesian learner
eventually to learn the distribution that nature had drawn at time $-1$.
But in the present context, our type 1 decision maker's model is incorrect because the $h$ that actually
generates the data is neither $f$ nor $g$ and so is beyond the support of the models that the agent thinks are
possible.

Nevertheless, we'll see that our first type of agent muddles through and eventually learns something interesting and useful, even though it is not *true*.

Instead, it turns out that our type 1 agent who is armed with a wrong statistical model ends up learning whichever probability distribution, $f$ or $g$,
is in a special sense *closest* to the $h$ that actually generates the data.

We'll tell the sense in which it is closest.
Our second type of agent understands that nature mixes between $f$ and $g$ each period with a fixed mixing
probability $\alpha$.

But the agent doesn't know $\alpha$.

The agent sets out to learn $\alpha$ using Bayes' law applied to his model.

In this lecture, we'll learn about

* The [Kullback-Leibler statistical divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) that governs statistical learning under an incorrect statistical model

* A useful Python function `numpy.searchsorted` that, in conjunction with a uniform random number generator, can be used to sample from an arbitrary distribution

As usual, we'll start by importing some Python tools.
We'll generate samples using each of them and verify that they match well.

In other words, if $X \sim F(x)$ we can generate a random sample from $F$ by drawing a random variable $U$ from
a uniform distribution on $[0,1]$ and computing $F^{-1}(U)$.

We'll use this fact
in conjunction with the `numpy.searchsorted` command to sample from $H$ directly.

See the [numpy.searchsorted documentation](https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html) for details on the `searchsorted` function.
def draw_lottery(p, N):

def draw_lottery_MC(p, N):
    "Draw from the compound lottery using the Monte Carlo trick."
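The `draw_lottery` and `draw_lottery_MC` signatures shown in the diff can be fleshed out along the following lines. This is a hedged sketch with placeholder Beta parameters for $f$ and $g$ (not necessarily the lecture's calibration): `draw_lottery` mixes period by period with a $p$-coin, while `draw_lottery_MC` builds the mixture CDF $H$ on a grid and inverts it with `numpy.searchsorted`.

```python
import numpy as np
from math import gamma

# Placeholder Beta parameters for f and g (illustrative only).
F_a, F_b = 1.0, 1.0
G_a, G_b = 3.0, 1.2

def beta_pdf(x, a, b):
    "Beta(a, b) density on (0, 1)."
    return x**(a - 1) * (1 - x)**(b - 1) * gamma(a + b) / (gamma(a) * gamma(b))

def draw_lottery(p, N, seed=0):
    "Draw directly: each period flip a p-coin, then draw from f or from g."
    rng = np.random.default_rng(seed)
    coins = rng.random(N) <= p
    return np.where(coins, rng.beta(F_a, F_b, N), rng.beta(G_a, G_b, N))

def draw_lottery_MC(p, N, seed=0):
    "Draw from the compound lottery using the Monte Carlo (inverse-CDF) trick."
    rng = np.random.default_rng(seed)
    xs = np.linspace(1e-8, 1 - 1e-8, 10_000)
    h = p * beta_pdf(xs, F_a, F_b) + (1 - p) * beta_pdf(xs, G_a, G_b)
    cdf = np.cumsum(h)
    cdf /= cdf[-1]                 # numerical CDF of H on the grid
    # searchsorted inverts the CDF: the grid point where H first exceeds U
    return xs[np.searchsorted(cdf[:-1], rng.random(N))]

# Sanity check: the two samplers should agree on the mean of H.
m_direct = draw_lottery(0.5, 100_000).mean()
m_mc = draw_lottery_MC(0.5, 100_000).mean()
```

Indexing into `cdf[:-1]` keeps `searchsorted`'s return value inside the grid even when a uniform draw exceeds the last tabulated CDF value.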
ax1.plot(np.nan, np.nan, '--', color='b', label='Log likelihood ratio process')
ax1.set_ylabel(r"$\pi_t$")
ax1.set_xlabel("t")
ax1.legend()

Evidently, $\alpha$ is having a big effect on the destination of $\pi_t$ as $t \rightarrow + \infty$.
## Kullback-Leibler divergence governs limit of $\pi_t$

To understand what determines whether the limit point of $\pi_t$ is $0$ or $1$ and how the answer depends on the true value of the mixing probability $\alpha \in (0,1)$ that generates $h$, it helps to compare the Kullback-Leibler divergences of $f$ and of $g$ from $h$.

Evidently, our type 1 learner who applies Bayes' law to his misspecified set of statistical models eventually learns an approximating model that is as close as possible to the true model, as measured by its
Kullback-Leibler divergence:
- When $\alpha$ is small, $KL_g < KL_f$, meaning the divergence of $g$ from $h$ is smaller than that of $f$ and so the limit point of $\pi_t$ is close to $0$.

- When $\alpha$ is large, $KL_f < KL_g$, meaning the divergence of $f$ from $h$ is smaller than that of $g$ and so the limit point of $\pi_t$ is close to $1$.
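These orderings are easy to check numerically. The sketch below uses placeholder Beta parameters for $f$ and $g$ (not necessarily the lecture's calibration) and approximates $D(h \| s) = \int h(w) \log\left(h(w)/s(w)\right) dw$ for a candidate model $s$ by a Riemann sum on a grid.

```python
import numpy as np
from math import gamma

# Placeholder Beta parameters for f and g (illustrative only).
F_a, F_b = 1.0, 1.0
G_a, G_b = 3.0, 1.2

xs = np.linspace(1e-6, 1 - 1e-6, 100_000)
dx = xs[1] - xs[0]

def beta_pdf(x, a, b):
    "Beta(a, b) density on (0, 1)."
    return x**(a - 1) * (1 - x)**(b - 1) * gamma(a + b) / (gamma(a) * gamma(b))

f, g = beta_pdf(xs, F_a, F_b), beta_pdf(xs, G_a, G_b)

def KL(h, s):
    "Riemann-sum approximation to D(h || s) = integral of h log(h/s)."
    return np.sum(h * np.log(h / s)) * dx

def kl_pair(alpha):
    "Return (KL_f, KL_g) for the mixture h = alpha*f + (1-alpha)*g."
    h = alpha * f + (1 - alpha) * g
    return KL(h, f), KL(h, g)
```

For small $\alpha$, `kl_pair` returns $KL_g < KL_f$; for large $\alpha$ the ordering flips, matching the two bullets above.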
## Type 2 agent

We now describe how our type 2 agent formulates his learning problem and what he eventually learns.

Our type 2 agent understands the correct statistical model but does not know $\alpha$.

We apply Bayes' law to deduce an algorithm for learning $\alpha$ under the assumption
that the agent knows that
$$
h(w) = \alpha f(w) + (1 - \alpha) g(w)
$$
but does not know $\alpha$.

We'll assume that the agent starts out with a prior probability $\pi_0(\alpha)$ on
$\alpha \in (0,1)$ where the prior has one of the forms that we deployed in {doc}`this quantecon lecture <bayes_nonconj>`.
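Before deploying any sampler, the same inference can be sketched by brute force on a grid: under a uniform prior, the posterior over $\alpha$ is proportional to the likelihood $\prod_t \left(\alpha f(w_t) + (1-\alpha) g(w_t)\right)$. The Beta parameters and the true $\alpha = 0.8$ below are illustrative placeholders.

```python
import numpy as np
from math import gamma

# Placeholder Beta parameters for f and g (illustrative only).
F_a, F_b = 1.0, 1.0
G_a, G_b = 3.0, 1.2
alpha_true = 0.8

def beta_pdf(x, a, b):
    "Beta(a, b) density on (0, 1)."
    return x**(a - 1) * (1 - x)**(b - 1) * gamma(a + b) / (gamma(a) * gamma(b))

# Simulate the compound lottery: nature re-mixes every period.
rng = np.random.default_rng(1)
N = 5000
coins = rng.random(N) <= alpha_true
w = pdf_draws = np.where(coins, rng.beta(F_a, F_b, N), rng.beta(G_a, G_b, N))

# Posterior over alpha on a grid, starting from a uniform prior.
grid = np.linspace(0.001, 0.999, 999)
pdf_f, pdf_g = beta_pdf(w, F_a, F_b), beta_pdf(w, G_a, G_b)
log_post = np.log(grid[None, :] * pdf_f[:, None]
                  + (1 - grid[None, :]) * pdf_g[:, None]).sum(axis=0)
log_post -= log_post.max()                 # for numerical stability
post = np.exp(log_post)
post /= post.sum() * (grid[1] - grid[0])   # normalize to a density
alpha_hat = grid[np.argmax(post)]          # posterior mode
```

With a few thousand observations the posterior mode sits near the true mixing parameter, anticipating the collapse of the posterior that the `numpyro` experiments below display.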
We'll fire up `numpyro` and apply it to the present situation.

We'll create graphs of the posterior $\pi_t(\alpha)$ as
$t \rightarrow +\infty$ corresponding to ones presented in the [quantecon lecture on Bayesian nonconjugate priors](https://python.quantecon.org/bayes_nonconj.html).

We anticipate that a posterior distribution will collapse around the true $\alpha$ as
$t \rightarrow + \infty$.

Let us try a uniform prior first.
ax.set_xlabel(r'$\alpha$')
plt.show()
```

Evidently, the Bayesian posterior narrows in on the true value $\alpha = .8$ of the mixing parameter as the length of a history of observations grows.
## Concluding remarks

Our type 1 agent deploys an incorrect statistical model.

He believes
that either $f$ or $g$ generated the $w$ process, but just doesn't know which one.

That is wrong because nature is actually mixing each period with mixing probability $\alpha$.

Our type 1 agent eventually believes that either $f$ or $g$ generated the $w$ sequence, the outcome being determined by the model, either $f$ or $g$, whose KL divergence relative to $h$ is smaller.
Our type 2 agent has a different statistical model, one that is correctly specified.

He knows the parametric form of the statistical model but not the mixing parameter $\alpha$.

He knows that he does not know it.

But by using Bayes' law in conjunction with his statistical model and a history of data, he eventually acquires a more and more accurate inference about $\alpha$.

This little laboratory exhibits some important general principles that govern outcomes of Bayesian learning of misspecified models.

Thus, the following situation prevails quite generally in empirical work.

A scientist approaches the data with a manifold $S$ of statistical models $s(X \mid \theta)$, where $s$ is a probability distribution over a random vector $X$, $\theta \in \Theta$
is a vector of parameters, and $\Theta$ indexes the manifold of models.

The scientist with observations that he interprets as realizations $x$ of the random vector $X$ wants to solve an **inverse problem** of somehow _inverting_
$s(x \mid \theta)$ to infer $\theta$ from $x$.
But the scientist's model is misspecified, being only an approximation to an unknown model $h$ that nature uses to generate $X$.

If the scientist uses Bayes' law or a related likelihood-based method to infer $\theta$, it occurs quite generally that for large sample sizes the inverse problem infers a $\theta$ that minimizes the KL divergence of the scientist's model $s$ relative to nature's model $h$.
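This general principle can be exhibited in a toy example that is not from the lecture: let nature's $h$ be a two-component Gaussian mixture while the scientist's manifold contains only single Gaussians $N(\mu, \sigma^2)$. The Gaussian MLE then converges to the member of the manifold that minimizes $D(h \| s_\theta)$, which is the Gaussian matching $h$'s first two moments. All numbers below are illustrative choices.

```python
import numpy as np

# Nature's model h: a two-component Gaussian mixture (illustrative choice).
p, mu1, s1, mu2, s2 = 0.3, -2.0, 1.0, 1.0, 0.5

rng = np.random.default_rng(2)
N = 200_000
comp = rng.random(N) < p
x = np.where(comp, rng.normal(mu1, s1, N), rng.normal(mu2, s2, N))

# The scientist's (misspecified) Gaussian MLE: sample mean and variance.
mu_hat, var_hat = x.mean(), x.var()

# The KL-minimizing Gaussian matches h's first two moments exactly.
mu_star = p * mu1 + (1 - p) * mu2
var_star = (p * (s1**2 + (mu1 - mu_star)**2)
            + (1 - p) * (s2**2 + (mu2 - mu_star)**2))
```

Even though no single Gaussian equals $h$, the likelihood-based estimate homes in on the pseudo-true parameter values $(\mu^*, \sigma^{*2})$, just as the type 1 agent homes in on the KL-closest of $f$ and $g$.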