Inconsistent policy gradient

The policy gradient here seems to differ from the one given in most references, e.g., [Berkeley CS285](http://rail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-5.pdf). Could the author provide a citation for the original algorithm?