@GregorySchwartz

Unfortunately this changes the API, but I believe a p-value should be reported with every correlation. This technique uses Student's t distribution, which is fine, but it would be neat to also have exact p-values for small sample sizes (in the future).

@Shimuuar Shimuuar self-requested a review July 30, 2017 13:46
@GregorySchwartz
Author

I changed this request a little -- it makes more sense to start from pearson, which has the same test, and go from there.

  pearson :: (G.Vector v (Double, Double), G.Vector v Double)
-         => v (Double, Double) -> Double
- pearson = correlation
+         => v (Double, Double) -> (Double, PValue Double)

You should probably document the return value. It's not clear, at least to an amateur like me.

Collaborator

Yes, it's absolutely necessary to document the meaning of the p-value and which hypothesis is being tested.
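
As a rough illustration of what such documentation could look like, here is a minimal sketch. It assumes a two-sided test of "the true correlation is zero" via Student's t distribution; the name `pearsonWithPValue`, the `Maybe` return type, and the choice of unboxed vectors are hypothetical and not taken from this PR. It relies on `pearson`, `studentT`, `cumulative`, and `mkPValue` from the `statistics` package.

```haskell
import qualified Data.Vector.Unboxed as U
import Statistics.Correlation (pearson)
import Statistics.Distribution (cumulative)
import Statistics.Distribution.StudentT (studentT)
import Statistics.Types (PValue, mkPValue)

-- | Pearson correlation coefficient together with a two-sided p-value.
--
-- Null hypothesis: the true correlation is zero. Under it (and assuming
-- roughly bivariate-normal data) the statistic
--
-- > t = r * sqrt ((n - 2) / (1 - r^2))
--
-- follows Student's t distribution with @n - 2@ degrees of freedom.
-- The p-value is the probability, under the null hypothesis, of seeing
-- a @|t|@ at least as large as the observed one.
--
-- Returns 'Nothing' for fewer than 3 pairs, where the test is undefined.
-- NOTE: hypothetical helper for illustration, not the function in this PR.
-- Degenerate input (e.g. a constant column, where r is NaN) is not handled.
pearsonWithPValue :: U.Vector (Double, Double) -> Maybe (Double, PValue Double)
pearsonWithPValue xs
  | n < 3     = Nothing
  | otherwise = Just (r, mkPValue (min 1 p))
  where
    n  = U.length xs
    r  = pearson xs
    df = fromIntegral (n - 2)
    t  = r * sqrt (df / (1 - r * r))
    -- Two-sided tail area: twice the lower tail at -|t|.
    p  = 2 * cumulative (studentT df) (negate (abs t))
```

Returning `Maybe` is one way to make the "too few samples" case visible in the type; throwing an error or returning NaN would be alternatives with different trade-offs.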

Collaborator

@Shimuuar Shimuuar left a comment

The most important thing is comments. For statistical applications it's crucial to describe exactly what a function does.

     )
  => v (a, b)
- -> Double
+ -> (Double, PValue Double)
Collaborator

It's especially important for Spearman correlation. What is the meaning of the p-value here? Is it described anywhere? I'm not sure that Student's t distribution will arise for ranks.

Author

Wikipedia sources the following for this test:

Press; Vetterling; Teukolsky; Flannery (1992). Numerical Recipes in C: The Art of Scientific Computing (2nd ed.). p. 640.

The equation is 14.6.2. I do not know whether this approximation is optimal, and I'm sure there are better methods for computing p-values for Spearman's correlation coefficient, but I used Student's t distribution as a simple solution.
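
For readers without the book at hand: as far as I can tell, that equation is the usual t statistic for a correlation coefficient, applied to ranks. The symbols are my own notation, with $r_s$ the Spearman coefficient and $N$ the number of pairs. Under the null hypothesis of no association, $t$ is approximately Student-t distributed with $N - 2$ degrees of freedom.

```latex
t = r_s \sqrt{\frac{N - 2}{1 - r_s^{2}}}
```

The word "approximately" matters here: the approximation is generally considered reasonable for moderate $N$, which is why exact p-values would be preferable for very small samples.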
