Commit 6824b3a

Merge pull request #20 from sichkar-valentyn/updating-readme
Update README.md
2 parents 1ced65e + 466c36b commit 6824b3a

1 file changed: +3 -1 lines changed

README.md

Lines changed: 3 additions & 1 deletion
@@ -17,9 +17,11 @@ The environment:
 
 Goal is to learn how to take actions in order to maximize the reward. The objective function is as following:
 
-<b>Q[s_, a_] = Q[s, a] + λ * (r + γ * max (Q[s_, a_]) – Q[s, a]),</b>
+<b>Q_[s_, a_] = Q[s, a] + λ * (r + γ * max(Q_[s_, a_]) – Q[s, a]),</b>
 
 where,
+<br/><b>Q_[s_, a_]</b> - value of the objective function on the next step,
+<br/><b>max(Q_[s_, a_]) – Q[s, a])</b> - choosing maximum value from the possible next steps,
 <br/><b>s</b> – current position of the agent,
 <br/><b>a</b> – current action,
 <br/><b>λ</b> – learning rate,
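
For comparison with the formula in this diff, the standard tabular Q-learning update can be sketched as follows. This is a minimal sketch, not code from the repository: the function name `q_update`, the dict-of-dicts table layout, and the example states, actions, and numbers are all assumptions made for illustration. In the standard formulation the updated value is stored back in Q[s, a], with λ as the learning rate, γ as the discount factor, r as the reward, and the maximum taken over the actions available in the next state s_.

```python
def q_update(q, s, a, r, s_next, lam=0.1, gamma=0.9):
    """Apply one tabular Q-learning update to q[s][a] in place; return the new value.

    q is a hypothetical table: {state: {action: value}}.
    """
    # Best estimated value among actions in the next state
    # (0.0 if the next state has no actions, e.g. a terminal state).
    best_next = max(q[s_next].values(), default=0.0)
    # Difference between the bootstrapped target and the current estimate.
    td_error = r + gamma * best_next - q[s][a]
    # Move the current estimate toward the target by the learning rate.
    q[s][a] += lam * td_error
    return q[s][a]


# Illustrative table with two states and two actions (values are made up).
q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
new_value = q_update(q, "s0", "right", r=1.0, s_next="s1", lam=0.5, gamma=0.5)
print(new_value)  # prints 1.0
```

With these numbers the target is 1.0 + 0.5 * 2.0 = 2.0, and the estimate moves halfway from 0.0 toward it.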
