Skip to content

Add defaults for parameters #1293

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Conversation

yangdanny97
Copy link
Contributor

@yangdanny97 yangdanny97 commented Jul 27, 2025

Closes #1292

This PR uses a semi-automated/LLM-assisted method to add a few hundred parameter defaults to pandas-stubs, replacing what was previously ....

There were 3 annotations that had to change because their type signatures are not compatible with the default.

The codemod that made these transformations is here: https://gist.github.com/yangdanny97/0d81e16f6374582413dd5bcf90d902fd

How it works is:

  1. parse pandas-stubs and find the module, method/function, & parameter name for any params w/ default ...
  2. parse the docstring of the method/function in pandas, and if the parameter's docstring contains "default" extract the part after that keyword
  3. look up the type annotation in a mapping of the extracted snippet to type annotation
  4. use libCST to apply the type annotation to the parameter

The main thing that needs careful reviewing is the default_type_map which maps the docstring snippet to the type annotation we replace the ... with. That part was generated with chatGPT's help (I gave it every snippet I got after step 2), and I commented out the mappings that seemed wrong or unclear. So, even though this was "LLM-assisted" the script is deterministic and will always give you the same thing each time, and the LLM is only involved in a one-time manual step.

start: TimedeltaConvertibleTypes = ...,
end: TimedeltaConvertibleTypes = ...,
periods: int | None = ...,
start: TimedeltaConvertibleTypes | None = None,
Copy link
Contributor Author

@yangdanny97 yangdanny97 Jul 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring says the default is None so I had to change the annotation

https://github.com/pandas-dev/pandas/blob/e4a03b6e47a8ef9cd045902916289cbc976d3d33/pandas/core/indexes/timedeltas.py#L256-L259

Based on the docstring, this should probably be rewritten to have overloads:

Of the four parameters start, end, periods, and freq,
exactly three must be specified.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is true for date_range() as well

@loicdiridollou
Copy link
Member

Just for my own knowledge, what is adding the defaults instead of the ellipsis trying to achieve? Is it to help when looking at the docs in the IDE, for pyrefly, or something else?

@@ -1133,7 +1141,7 @@ class Series(IndexOpsMixin[S1], NDFrame):
self,
periods: int | Sequence[int] = ...,
freq: DateOffset | timedelta | _str | None = ...,
axis: Axis = ...,
axis: Axis | None = None,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this annotation changed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs aren't right here for Series.align

@yangdanny97
Copy link
Contributor Author

yangdanny97 commented Jul 27, 2025

Just for my own knowledge, what is adding the defaults instead of the ellipsis trying to achieve? Is it to help when looking at the docs in the IDE, for pyrefly, or something else?

It's to help with looking at docs/signature in the IDE. This work is unrelated to Pyrefly.

Here's a thread on typeshed w/ some discussion on the pros and cons of defaults: python/typeshed#8988

The consensus seems to be that if the default is simple (like None, or some literal) then it's fine, but if it's complex we use .... Mypy's stub generator has been updated to emit stubs containing simple defaults.

The second reason I made the PR is just to test out this way of doing LLM-assisted codemods. I have another PR in the works that uses the same approach to add a few hundred types for un-annotated parameters and returns, but that one requires more manual filtering to make sure the generated types are good.

@yangdanny97 yangdanny97 marked this pull request as ready for review July 27, 2025 03:05
@yangdanny97 yangdanny97 marked this pull request as draft July 27, 2025 18:21
@yangdanny97 yangdanny97 marked this pull request as ready for review July 27, 2025 19:51
Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked every method you changed by looking in the docs. It picks up things that we need to delete or modify anyway. Some general comments:

  1. I use the documented signatures to determine default values, not just what the docs say, because the docs don't always indicate the correct default value, but the signature does because it is picked up in the docs from the actual python declaration.
  2. The copy argument often has a default value of None, so I had to change the declaration a lot of times in your PR, and probably missed a few.

It might be worth modifying your script to look at the default values that are in the signatures rather than the docs. If you do that, you'll pick up a lot more defaults (and probably catch a bunch that I caught via this review).

This PR points out that the pandas docs could be improved, but creating an issue for each improvement would be a lot of work.

When you update, please don't force push, and don't resolve conversations, because then I can see if everything I found was addressed, and can just concentrate on what is changed in your next commit(s).

Thanks for doing this.

@@ -197,12 +197,12 @@ class Period(PeriodMixin):
def day_of_year(self) -> int: ...
@property
def day_of_week(self) -> int: ...
def asfreq(self, freq: str | BaseOffset, how: _PeriodFreqHow = ...) -> Period: ...
def asfreq(self, freq: str | BaseOffset, how: _PeriodFreqHow = "end") -> Period: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether this matters. Code has default of how as "E", but docs say "end"

I'm fine with leaving it as you have it, but open to which way to go.

@classmethod
def now(cls, freq: str | BaseOffset = ...) -> Period: ...
def strftime(self, fmt: str) -> str: ...
def to_timestamp(
self,
freq: str | BaseOffset | None = ...,
how: _PeriodToTimestampHow = ...,
how: _PeriodToTimestampHow = "S",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same issue as in asfreq(). docs say "S", but code says "start"

I'm fine with leaving it as you have it, but open to which way to go.

@@ -157,7 +157,7 @@ class Timestamp(datetime, SupportsIndex):
) -> Timestamp: ...
def astimezone(self, tz: _tzinfo | None = ...) -> Self: ...
def ctime(self) -> str: ...
def isoformat(self, sep: str = ..., timespec: str = ...) -> str: ...
def isoformat(self, sep: str = "T", timespec: str = "auto") -> str: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def isoformat(self, sep: str = "T", timespec: str = "auto") -> str: ...
def isoformat(self, sep: str = "T", timespec: Literal[‘auto’, ‘hours’, ‘minutes’, ‘seconds’, ‘milliseconds’, ‘microseconds’, ‘nanoseconds’]= "auto") -> str: ...

periods: int = ...,
fill_method: None = ...,
periods: int = 1,
fill_method: None = None,
freq: DateOffset | timedelta | _str | None = ...,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
freq: DateOffset | timedelta | _str | None = ...,
freq: DateOffset | timedelta | _str | None = None,

periods: int = ...,
fill_method: None = ...,
periods: int = 1,
fill_method: None = None,
freq: DateOffset | timedelta | _str | None = ...,
*,
fill_value: Scalar | NAType | None = ...,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove fill_value param - not documented

Comment on lines 1836 to 1838
span: float | None = ...,
halflife: float | None = ...,
alpha: float | None = ...,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
span: float | None = ...,
halflife: float | None = ...,
alpha: float | None = ...,
com: float | None = None,
span: float | None = None,
halflife: float | None = None,
alpha: float | None = None,

github wouldn't let me do suggestion on com

@@ -1950,67 +1958,67 @@ class Series(IndexOpsMixin[S1], NDFrame):
self,
other: num | _ListLike | Series[S1],
level: Level | None = ...,
fill_value: float | None = ...,
fill_value: float | None = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be changed in mul too

def pow(
self,
other: num | _ListLike | Series[S1],
level: Level | None = ...,
fill_value: float | None = ...,
fill_value: float | None = None,
axis: AxisIndex | None = ...,
) -> Series[S1]: ...
def prod(
self,
axis: AxisIndex | None = ...,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all of these operations, the axis value defaults to 0. There are lots of them in here - I'm commenting on this one, but you should fix the others. I think it is the case that if axis is an argument to a Series method, the default value is 0, because that's the only value that makes sense.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jul 28, 2025

2. The copy argument often has a default value of None, so I had to change the declaration a lot of times in your PR, and probably missed a few.

I looked further into the copy argument issue. Eventually, that argument will be removed. So I think it would be best to not change copy: _bool = False or copy: _bool = True to be copy: _bool | None = None, because None won't be accepted in the future. So ignore those requests in my review.

@yangdanny97
Copy link
Contributor Author

Thanks for the detailed review!

That's a good point about taking the default values directly from the function declaration. I've pushed a commit doing that using https://gist.github.com/yangdanny97/170f82ee5389584f8b6292bc4ea9c24d

I'll go thru the comments one by one and address them over the next few days, and keep track of which followup issues we need to file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Do we want to add trivial defaults to the stubs
3 participants