Exercise 10.5 #93

@ShawnHymel

I also found the wording of this question confusing. My best guess is that it is asking "how would the differential TD(0) algorithm differ from tabular TD(0)?" Like you, I also came up with the update formula for the weight vector. Equation (10.10) gives us the TD error, assuming we have the average-reward estimate R_bar. From there, I think the only thing you're missing to create the differential TD(0) algorithm is the update for R_bar, which also uses the TD error.

In tabular TD(0), a single line updates V(S). For differential TD(0), I think we need to expand that into the following three lines, which compute the TD error, update R_bar, and update the weight vector.

IMG-0032
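To make the three updates concrete, here is a minimal Python sketch of how I read them. It assumes a linear value estimate v_hat(s, w) = w · x(s), and the function name, the one-hot features, the step sizes `alpha` and `beta`, and the two-state toy task are all my own illustrative choices, not anything from the book or the original solution.

```python
import numpy as np


def differential_td0_step(w, r_bar, x, reward, x_next, alpha, beta):
    """One differential semi-gradient TD(0) update (hypothetical helper).

    Assumes a linear value estimate v_hat(s, w) = w @ x(s).
    """
    # 1. TD error in the form of (10.10), using the current R_bar.
    delta = reward - r_bar + w @ x_next - w @ x
    # 2. Update the average-reward estimate with the TD error.
    r_bar = r_bar + beta * delta
    # 3. Update the weights; for a linear v_hat the gradient is x.
    w = w + alpha * delta * x
    return w, r_bar, delta


# Toy task (an assumption for illustration): a deterministic two-state
# cycle 0 -> 1 -> 0 with rewards 0 and 2, so the true average reward is 1.
features = np.eye(2)          # one-hot features, i.e. the tabular case
rewards = {0: 0.0, 1: 2.0}    # reward received when leaving each state

w = np.zeros(2)
r_bar = 0.0
state = 0
for _ in range(2000):
    next_state = 1 - state
    w, r_bar, _ = differential_td0_step(
        w, r_bar, features[state], rewards[state], features[next_state],
        alpha=0.1, beta=0.1,
    )
    state = next_state

print("R_bar estimate:", r_bar)
print("v_hat(1) - v_hat(0):", w[1] - w[0])
```

On this toy cycle the R_bar estimate should settle near the true average reward of 1, and only the difference between the two state values is pinned down, which is what you would expect for differential values.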

Let me know if you think that sounds reasonable.

Also, since you have done a lot of work to produce these solutions, you might want to see whether Rich Sutton would honor his offer to provide the book solutions if you email him your answers :) He says on his site that he would: http://incompleteideas.net/book/solutions.html

Your answers have been invaluable as I work through the textbook, and I'd also be curious to know how close they are to the book solutions.
