
Commit 67c3a4b

fix spacing, spelling and unclear sentences using claude code
1 parent 7cba5eb commit 67c3a4b

File tree

1 file changed: +46 −46 lines


lectures/mix_model.md

Lines changed: 46 additions & 46 deletions
@@ -33,32 +33,32 @@ A compound lottery can be said to create a _mixture distribution_.
Our two ways of constructing a compound lottery will differ in their **timing**.

* in one, mixing between two possible probability distributions will occur once and for all at the beginning of time

* in the other, mixing between the same two possible probability distributions will occur each period

The statistical setting is close but not identical to the problem studied in that quantecon lecture.

In that lecture, there were two i.i.d. processes that could possibly govern successive draws of a non-negative random variable $W$.

Nature decided once and for all whether to make a sequence of i.i.d. draws from either $f$ or from $g$.

That lecture studied an agent who knew both $f$ and $g$ but did not know which distribution nature chose at time $-1$.

The agent represented that ignorance by assuming that nature had chosen $f$ or $g$ by flipping an unfair coin that put probability $\pi_{-1}$ on probability distribution $f$.

That assumption allowed the agent to construct a subjective joint probability distribution over the
random sequence $\{W_t\}_{t=0}^\infty$.

We studied how the agent would then use the laws of conditional probability and an observed history $w^t = \{w_s\}_{s=0}^t$ to form

$$
\pi_t = E [\textrm{nature chose distribution } f \mid w^t], \quad t = 0, 1, 2, \ldots
$$

However, in the setting of this lecture, that rule imputes to the agent an incorrect model.

The reason is that now the wage sequence is actually described by a different statistical model.

Thus, we change the {doc}`quantecon lecture <likelihood_bayes>` specification in the following way.

@@ -71,17 +71,17 @@ $$
H(w) = \alpha F(w) + (1-\alpha) G(w), \quad \alpha \in (0,1)
$$

We'll study two agents who try to learn about the wage process, but who use different statistical models.

Both types of agent know $f$ and $g$ but neither knows $\alpha$.

Our first type of agent erroneously thinks that at time $-1$, nature once and for all chose $f$ or $g$ and thereafter
permanently draws from that distribution.

Our second type of agent knows, correctly, that nature mixes $f$ and $g$ with mixing probability $\alpha \in (0,1)$
each period, though the agent doesn't know the mixing parameter.

Our first type of agent applies the learning algorithm described in {doc}`this quantecon lecture <likelihood_bayes>`.

In the context of the statistical model that prevailed in that lecture, that was a good learning algorithm and it enabled the Bayesian learner
eventually to learn the distribution that nature had drawn at time $-1$.
@@ -93,7 +93,7 @@ But in the present context, our type 1 decision maker's model is incorrect becau
generates the data is neither $f$ nor $g$ and so is beyond the support of the models that the agent thinks are
possible.

Nevertheless, we'll see that our first type of agent muddles through and eventually learns something interesting and useful, even though it is not *true*.

Instead, it turns out that our type 1 agent who is armed with a wrong statistical model ends up learning whichever probability distribution, $f$ or $g$,
is in a special sense *closest* to the $h$ that actually generates the data.
@@ -103,7 +103,7 @@ We'll tell the sense in which it is closest.
Our second type of agent understands that nature mixes between $f$ and $g$ each period with a fixed mixing
probability $\alpha$.

But the agent doesn't know $\alpha$.

The agent sets out to learn $\alpha$ using Bayes' law applied to his model.

@@ -116,7 +116,7 @@ In this lecture, we'll learn about
* The [Kullback-Leibler statistical divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) that governs statistical learning under an incorrect statistical model

* A useful Python function `numpy.searchsorted` that, in conjunction with a uniform random number generator, can be used to sample from an arbitrary distribution

As usual, we'll start by importing some Python tools.

@@ -161,7 +161,7 @@ G_a, G_b = 3, 1.2
@vectorize
def p(x, a, b):
    r = gamma(a + b) / (gamma(a) * gamma(b))
    return r * x**(a-1) * (1 - x)**(b-1)

# The two density functions.
f = jit(lambda x: p(x, F_a, F_b))
@@ -208,7 +208,7 @@ l_seq_f = np.cumprod(l_arr_f, axis=1)
## Sampling from compound lottery $H$

We implement two methods to draw samples from
our mixture model $\alpha F + (1-\alpha) G$.

We'll generate samples using each of them and verify that they match well.
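The first, direct method can be sketched in isolation as follows. This is a stand-alone illustration, not the lecture's `draw_lottery` itself; the Beta shape parameters for $F$ are assumed to be $(1, 1)$ here for illustration, while $(3, 1.2)$ for $G$ matches the `G_a, G_b` set earlier.

```python
import numpy as np
from scipy.stats import beta

F_a, F_b = 1, 1      # assumed shapes for F (illustrative)
G_a, G_b = 3, 1.2    # shapes for G, as set earlier in the lecture

def draw_lottery_direct(α, N, seed=0):
    """Draw N samples from H = α F + (1-α) G by flipping an
    α-coin each period and then drawing from the chosen Beta."""
    rng = np.random.default_rng(seed)
    coins = rng.random(N) < α                # True -> draw from F this period
    draws_f = beta.rvs(F_a, F_b, size=N, random_state=rng)
    draws_g = beta.rvs(G_a, G_b, size=N, random_state=rng)
    return np.where(coins, draws_f, draws_g)

sample = draw_lottery_direct(0.5, 100_000)
```

The sample mean should be close to $\alpha\, EF + (1-\alpha)\, EG$, which gives a quick sanity check on the draws.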
@@ -237,7 +237,7 @@ In other words, if $X \sim F(x)$ we can generate a random sample from $F$ by dra
a uniform distribution on $[0,1]$ and computing $F^{-1}(U)$.

We'll use this fact
in conjunction with the `numpy.searchsorted` command to sample from $H$ directly.

See the [numpy.searchsorted documentation](https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html) for details on the `searchsorted` function.
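As a quick stand-alone illustration of the trick (not the lecture's code cell; the Beta shapes are again assumed to be $(1,1)$ and $(3,1.2)$): tabulate the CDF of $H$ on a fine grid, draw uniforms, and let `searchsorted` act as the inverse CDF.

```python
import numpy as np
import scipy.stats as sp

α = 0.5
xs = np.linspace(1e-8, 1 - 1e-8, 10_000)
# tabulated CDF of the mixture H = α F + (1-α) G
CDF = α * sp.beta.cdf(xs, 1, 1) + (1 - α) * sp.beta.cdf(xs, 3, 1.2)

rng = np.random.default_rng(1)
Us = rng.random(100_000)
# searchsorted finds, for each U, the first grid point where CDF exceeds U,
# i.e. it evaluates the (tabulated) inverse CDF; the minimum guards the
# rare U that lands above the last tabulated CDF value
idx = np.minimum(np.searchsorted(CDF, Us), len(xs) - 1)
draws = xs[idx]
```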
@@ -263,7 +263,7 @@ def draw_lottery(p, N):
def draw_lottery_MC(p, N):
    "Draw from the compound lottery using the Monte Carlo trick."

    xs = np.linspace(1e-8, 1-(1e-8), 10000)
    CDF = p*sp.beta.cdf(xs, F_a, F_b) + (1-p)*sp.beta.cdf(xs, G_a, G_b)

    Us = np.random.rand(N)
@@ -296,7 +296,7 @@ We'll now study what our type 1 agent learns
Remember that our type 1 agent uses the wrong statistical model, thinking that nature mixed between $f$ and $g$ once and for all at time $-1$.

The type 1 agent thus uses the learning algorithm studied in {doc}`this quantecon lecture <likelihood_bayes>`.

We'll briefly review that learning algorithm now.

@@ -306,8 +306,8 @@ $$
\pi_t = {\rm Prob}(q=f|w^t)
$$

The likelihood ratio process plays a principal role in the formula that governs the evolution
of the posterior probability $\pi_t$, an instance of **Bayes' Law**.

Bayes' law implies that $\{\pi_t\}$ obeys the recursion

@@ -334,8 +334,8 @@ def update(π, l):
    return π
```

Formula {eq}`eq:recur1` can be generalized by iterating on it and thereby deriving an
expression for the time $t$ posterior $\pi_{t+1}$ as a function
of the time $0$ prior $\pi_0$ and the likelihood ratio process
$L(w^{t+1})$ at time $t$.
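A sketch of this equivalence in code (self-contained; the one-step `update` is rewritten inline, and the likelihood ratios are stand-in lognormal draws rather than draws tied to $f$ and $g$):

```python
import numpy as np

def update(π, l):
    "One step of Bayes' law: π' = π l / (π l + 1 - π)."
    return π * l / (π * l + 1 - π)

rng = np.random.default_rng(0)
ls = rng.lognormal(size=50)     # stand-in draws of the ratio f(w)/g(w)

π0 = 0.4
π = π0
for l in ls:                    # iterate the recursion t = 0, ..., 49
    π = update(π, l)

L = np.prod(ls)                 # cumulative likelihood ratio L(w^{t+1})
closed_form = π0 * L / (π0 * L + 1 - π0)
# π and closed_form agree: the one-step recursion telescopes
# into the closed-form expression in π0 and L(w^{t+1})
```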
@@ -381,7 +381,7 @@ $ L\left(w^{t+1}\right)>0 $, we can verify that
$\pi_{t+1}\in\left(0,1\right)$.

After rearranging the preceding equation, we can express $\pi_{t+1}$ as a
function of $L\left(w^{t+1}\right)$, the likelihood ratio process at $t+1$,
and the initial prior $\pi_{0}$

$$
@@ -440,7 +440,7 @@ def plot_π_seq(α, π1=0.2, π2=0.8, T=200):
    for i in range(2):
        ax1.plot(range(T+1), π_seq_mixed[i, :], label=rf"$\pi_0$={π_seq_mixed[i, 0]}")

    ax1.plot(np.nan, np.nan, '--', color='b', label='Log likelihood ratio process')
    ax1.set_ylabel(r"$\pi_t$")
    ax1.set_xlabel("t")
    ax1.legend()
@@ -471,7 +471,7 @@ Evidently, $\alpha$ is having a big effect on the destination of $\pi_t$ as $t \
## Kullback-Leibler divergence governs limit of $\pi_t$

To understand what determines whether the limit point of $\pi_t$ is $0$ or $1$ and how the answer depends on the true value of the mixing probability $\alpha \in (0,1)$ that generates

$$ h(w) \equiv h(w | \alpha) = \alpha f(w) + (1-\alpha) g(w) $$

@@ -490,13 +490,13 @@ $$
We shall plot both of these functions against $\alpha$ as we use $\alpha$ to vary
$h(w) = h(w|\alpha)$.

The limit of $\pi_t$ is determined by

$$ \min_{f,g} \{KL_g, KL_f\} $$

The only possible limits are $0$ and $1$.

As $t \rightarrow +\infty$, $\pi_t$ goes to one if and only if $KL_f < KL_g$.
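A back-of-the-envelope numerical check of this criterion, independent of the lecture's vectorized code (Beta shapes again assumed to be $(1,1)$ for $f$ and $(3,1.2)$ for $g$):

```python
import numpy as np
from scipy.stats import beta
from scipy.integrate import quad

f = lambda x: beta.pdf(x, 1, 1)
g = lambda x: beta.pdf(x, 3, 1.2)

def KL(h, q):
    "Kullback-Leibler divergence D(h || q) = ∫ h log(h/q), by quadrature."
    return quad(lambda x: h(x) * np.log(h(x) / q(x)), 1e-8, 1 - 1e-8)[0]

α = 0.8
h = lambda x: α * f(x) + (1 - α) * g(x)   # nature's mixture

KL_f, KL_g = KL(h, f), KL(h, g)
# with α this large, h is closer to f, so KL_f < KL_g and π_t → 1
```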
```{code-cell} ipython3
@vectorize
@@ -566,7 +566,7 @@ ax.legend(loc='upper right')
plt.show()
```

Let's compute an $\alpha$ for which the KL divergence between $h$ and $g$ is the same as that between $h$ and $f$.

```{code-cell} ipython3
# where KL_f = KL_g
@@ -578,7 +578,7 @@ We can compute and plot the convergence point $\pi_{\infty}$ for each $\alpha$ t
The blue circles show the limiting values of $\pi_t$ that simulations discover for different values of $\alpha$
recorded on the $x$ axis.

Thus, the graph below confirms how a minimum KL divergence governs what our type 1 agent eventually learns.

```{code-cell} ipython3
α_arr_x = α_arr[(α_arr<discretion)|(α_arr>discretion)]
@@ -609,17 +609,17 @@ plt.show()
Evidently, our type 1 learner who applies Bayes' law to his misspecified set of statistical models eventually learns an approximating model that is as close as possible to the true model, as measured by its
Kullback-Leibler divergence:

- When $\alpha$ is small, $KL_g < KL_f$, meaning the divergence of $g$ from $h$ is smaller than that of $f$ and so the limit point of $\pi_t$ is close to $0$.

- When $\alpha$ is large, $KL_f < KL_g$, meaning the divergence of $f$ from $h$ is smaller than that of $g$ and so the limit point of $\pi_t$ is close to $1$.

## Type 2 agent

We now describe how our type 2 agent formulates his learning problem and what he eventually learns.

Our type 2 agent understands the correct statistical model but does not know $\alpha$.

We apply Bayes' law to deduce an algorithm for learning $\alpha$ under the assumption
that the agent knows that

$$
@@ -628,11 +628,11 @@ $$
but does not know $\alpha$.

We'll assume that the agent starts out with a prior probability $\pi_0(\alpha)$ on
$\alpha \in (0,1)$ where the prior has one of the forms that we deployed in {doc}`this quantecon lecture <bayes_nonconj>`.

We'll fire up `numpyro` and apply it to the present situation.

Bayes' law now takes the form

@@ -642,12 +642,12 @@ $$
{ \int h(w_{t+1} | \hat \alpha) \pi_t(\hat \alpha) d \hat \alpha }
$$

We'll use `numpyro` to approximate this equation.
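Before invoking `numpyro`, here is a minimal grid-based sketch of the same update (a hand-rolled approximation, not the lecture's implementation; Beta shapes assumed as in the illustrations above):

```python
import numpy as np
from scipy.stats import beta

α_true = 0.8
rng = np.random.default_rng(0)

α_grid = np.linspace(0.001, 0.999, 999)    # grid for the posterior over α
π = np.full(len(α_grid), 1 / len(α_grid))  # uniform prior

def h(w, α):
    "Mixture density h(w | α) = α f(w) + (1 - α) g(w)."
    return α * beta.pdf(w, 1, 1) + (1 - α) * beta.pdf(w, 3, 1.2)

for _ in range(2000):
    # nature draws w from h(· | α_true)
    if rng.random() < α_true:
        w = beta.rvs(1, 1, random_state=rng)
    else:
        w = beta.rvs(3, 1.2, random_state=rng)
    # Bayes' law on the grid: multiply by the likelihood, renormalize
    π = π * h(w, α_grid)
    π = π / π.sum()

α_hat = (α_grid * π).sum()   # posterior mean, should be near α_true
```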
We'll create graphs of the posterior $\pi_t(\alpha)$ as
$t \rightarrow +\infty$ corresponding to ones presented in the [quantecon lecture on Bayesian nonconjugate priors](https://python.quantecon.org/bayes_nonconj.html).

We anticipate that a posterior distribution will collapse around the true $\alpha$ as
$t \rightarrow + \infty$.

Let us try a uniform prior first.
@@ -695,40 +695,40 @@ ax.set_xlabel(r'$\alpha$')
plt.show()
```

Evidently, the Bayesian posterior narrows in on the true value $\alpha = .8$ of the mixing parameter as the length of a history of observations grows.

## Concluding remarks

Our type 1 agent deploys an incorrect statistical model.

He believes
that either $f$ or $g$ generated the $w$ process, but just doesn't know which one.

That is wrong because nature is actually mixing each period with mixing probability $\alpha$.

Our type 1 agent eventually believes that either $f$ or $g$ generated the $w$ sequence, the outcome being determined by the model, either $f$ or $g$, whose KL divergence relative to $h$ is smaller.

Our type 2 agent has a different statistical model, one that is correctly specified.

He knows the parametric form of the statistical model but not the mixing parameter $\alpha$.

He knows that he does not know it.

But by using Bayes' law in conjunction with his statistical model and a history of data, he eventually acquires a more and more accurate inference about $\alpha$.

This little laboratory exhibits some important general principles that govern outcomes of Bayesian learning of misspecified models.

Thus, the following situation prevails quite generally in empirical work.

A scientist approaches the data with a manifold $S$ of statistical models $s(X | \theta)$, where $s$ is a probability distribution over a random vector $X$, $\theta \in \Theta$
is a vector of parameters, and $\Theta$ indexes the manifold of models.

The scientist with observations that he interprets as realizations $x$ of the random vector $X$ wants to solve an **inverse problem** of somehow _inverting_
$s(x | \theta)$ to infer $\theta$ from $x$.

But the scientist's model is misspecified, being only an approximation to an unknown model $h$ that nature uses to generate $X$.

If the scientist uses Bayes' law or a related likelihood-based method to infer $\theta$, it occurs quite generally that for large sample sizes the inverse problem infers a $\theta$ that minimizes the KL divergence of the scientist's model $s$ relative to nature's model $h$.
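A toy numerical version of this principle, reusing the lecture's own two-model setting (a sketch: maximum likelihood stands in for the large-sample behavior of Bayes' law, and the Beta shapes are assumed as above):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
α, N = 0.8, 5_000

# nature's data: each period an α-coin picks F or G
coins = rng.random(N) < α
data = np.where(coins,
                beta.rvs(1, 1, size=N, random_state=rng),
                beta.rvs(3, 1.2, size=N, random_state=rng))

# the scientist's misspecified manifold is just {f, g}; likelihood-based
# fitting selects the model with higher average log likelihood, which in
# large samples is the model with smaller KL divergence from h
logL_f = np.log(beta.pdf(data, 1, 1)).mean()
logL_g = np.log(beta.pdf(data, 3, 1.2)).mean()
best = 'f' if logL_f > logL_g else 'g'   # with α = 0.8, this selects f
```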
## Exercises
