Skip to content

Conversation

tim-schilling
Copy link
Member

@tim-schilling tim-schilling commented Jul 23, 2025

The blog results should have a property of whether it is included in the search results. We should also limit the blogs that are searchable for a version of Django based on the support end. This will allow us to limit the inclusion of blog posts in the search based on the time the entry was created, keeping the search results relevant to that version of Django.

This is a WIP yet. I'm opening the draft PR to solicit early feedback on the approach. I'm thinking we'll re-use the DocumentationCategory.WEBSITE in place of "weblog".

@tim-schilling
Copy link
Member Author

tim-schilling commented Jul 31, 2025

I was given tacit approval (lack of pushback) from the website WG to use this approach. I plan on working on this at the DjangoCon Africa sprints (Aug 14 & 15), but if anyone wants to complete it before that, please do!

@tim-schilling tim-schilling marked this pull request as ready for review September 1, 2025 17:48
@tim-schilling
Copy link
Member Author

I'm going to try to clean up the commits shortly here.

The blog results should have a property of whether it is included in
the search results. We should also limit the blogs that are searchable
for a version of Django based on the support end. This will allow us
to limit the inclusion of blog posts in the search based on the time
the entry was created, keeping the search results relevant to that
version of Django.
@tim-schilling
Copy link
Member Author

Cleaned up the commits. Please let me know if any other changes are necessary.

docs/models.py Outdated
path = reverse_path(url_name, kwargs=kwargs)
request = RequestFactory().get(path, HTTP_HOST=www_host)
body = resolve(path).func(request).render().text
# Need to parse the body element.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot this TODO yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmispelon do you have any good ideas here?

Copy link
Member

@bmispelon bmispelon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fantastic PR, thanks for submitting it! It addresses a real issue that's been plaguing the site for a long time, and as far as PR go, it's a really high quality one, kudos 👏🏻

Of course because I'm me I'll always find something to complain about so I've left a few comments ranging from improvement suggestions to open questions.

I'll note that I have not tested the code locally (yet). I'm still working on getting the preview server up and running and I think this PR would be a great candidate for it.

Comment on lines 644 to 646
cls.document_index, cls.document_view, cls.document_detail = (
cls.release.documents.order_by("path")
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break if/when more views are added to _sync_views_to_db(), right?

I'd suggest breaking the single test_document_url() method into three separate tests then.

Copy link
Member Author

@tim-schilling tim-schilling Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Judging from the rest of the tests, it seems like there's an effort to keep them efficient. It may make more sense as two separate tests, one for the old document_url and then a new one for the DocumentationCategory.WEBSITE branch. Let me know if you disagree.

docs/models.py Outdated
]
if not www_hosts or not (www_host := www_hosts[0]):
return
synced_views = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should live in a constant at the top level of a module (I'll let you decide which module makes more sense 😁 )

Copy link
Member Author

@tim-schilling tim-schilling Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're moving this from the code away from the usage logic, then we should use a dataclass so it's not relying on positional arguments. That gets a bit more verbose and likely could go in an entirely separate module.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be side-stepped due to #2136 (comment)

docs/models.py Outdated
Comment on lines 275 to 277
path = reverse_path(url_name, kwargs=kwargs)
request = RequestFactory().get(path, HTTP_HOST=www_host)
body = resolve(path).func(request).render().text
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a lot of stuff going on in such a few lines: you make it look deceptively simple 😁

I'm curious: did you try other approaches before settling on this? I'm a bit worried that the whole www_host logic could be a bit brittle. If we're calling the view function directly anyway, can we skip setting the host on the request object?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely for a reason, but let me try once more and better document the weirdness.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so the simplest we can get it is to use get_template("aggregator/ecosystem.html").render(). However, that does sort of open us to the problem of the view (which is defined in the urls.py) of using a different template or implementation than expected. However, using this approach eliminates the need to get the host which eliminates having the search the ALLOWED_HOSTS setting. So I think it's worth it. Let me know what you think.

Copy link
Member

@bmispelon bmispelon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the move to a dataclass for SearchableView, nice one 👍🏻

I think we can go further and delegate the nitty-gritty details of getting a "document"'s URL and content to that class.

I've got grand ideas of generalizing this and having a collection of searchable/indexable models (sphinx document, blog, template view, flat page, ...) but that'd make the scope of this PR explode and I'd rather see this shipped. We can always iterate.

Other than that I left a question about the new support_end. I think we should be conservative with adding new fields and would like to explore reusing existing ones, to minimize the combinatorial explosion of possible model states. If we have to keep support_end, would it be possible for the existing DocumentRelease to get a sensible default assigned to them with a data migration (we have 100 of them in production, it'd be a shame if we had to go and back-populate them by hand).

Thanks again for your work on this!

docs/models.py Outdated
Comment on lines 107 to 111
support_end = models.DateField(
null=True,
blank=True,
help_text="The end of support for this release of Django.",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a dedicated field for this, why couldn't we get the information from release.eol_date instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why we do code reviews. Let me get this switched over.

docs/models.py Outdated
Comment on lines 270 to 273
absolute_url = reverse(searchable_view.url_name, host="www")
# This must match the template used for the url `community-ecosystem`
html = get_template("aggregator/ecosystem.html").render()
# Need to parse the body element.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make absolute_url and html attributes (likely a property) of SearchableView? The name of the template should be an attribute of the SearchableView too and not be hardcoded here.

docs/search.py Outdated
TOPICS = "topics", _("Using Django")
HOWTO = "howto", _("How-to guides")
RELEASE_NOTES = "releases", _("Release notes")
WEBSITE = "weblog", _("Django Website")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the breadcrumbs work for you in the first place? I don't think I've ever used that feature and it doesn't seem to work correctly for me locally: even for documents with two or more parents, the links for all the parents are all the same and go to the document's own page.

Not sure I'd want to start adding redirects, that seems like solving the problem at the wrong level to me.

@pauloxnet
Copy link
Member

Can you review this PR based on this latest change in the search docs #2192

@tim-schilling
Copy link
Member Author

@pauloxnet I don't see anything obvious that would cause a problem. As long as #2192 didn't change how the search data needs to be stored, then this PR won't be impacted. The hack that is this PR is to insert search-able data and point it to the blog or ecosystem page.

@pauloxnet
Copy link
Member

I don't see anything obvious that would cause a problem. As long as #2192 didn't change how the search data needs to be stored, then this PR won't be impacted. The hack that is this PR is to insert search-able data and point it to the blog or ecosystem page.

I was indirectly suggesting using the approach outlined in the documentation, which involves delegating the generation of search vectors to a generated field. I'm sorry I don't have the time to do a more thorough review of this PR (I'm recuperating from my Django sprints). 😅

@tim-schilling
Copy link
Member Author

@pauloxnet I think you can agree there's value in shipping functionality rather than perfecting the approach. If that suggestion is more of a, "hey we need to conform to this new standard", I can make the changes. If it's a "it'd be cool if it did this" type suggestion, I'm going to push for shipping something first.

@pauloxnet
Copy link
Member

I think you can agree there's value in shipping functionality rather than perfecting the approach. If that suggestion is more of a, "hey we need to conform to this new standard", I can make the changes. If it's a "it'd be cool if it did this" type suggestion, I'm going to push for shipping something first.

My comment was, rather, I saw that you started this PR before the docs search improvement with GeneratedField, and, not having time to do a full review of the PR, I tried to point it out in this PR. No intent of perfectionism.

Sync the blog entries into search based on the release documents
support end date.
"""
if self.lang != "en" or not self.release.eol_date:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self.lang != "en" or not self.release.eol_date:
if self.lang != "en" or (self.release and not self.release.eol_date):

You need something like that as there are english "dev" docs which don't have a release object and eol date
Might be worth testing this case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, my assumption was that any release had an EOL date set when it was created.

),
default=False,
)
is_searchable = models.BooleanField(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have examples of blog entries you're expecting to be included and excluded? To me, I feel like everything should be searchable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I haven't, but since this is searching the docs, not searching the site, I would imagine blog posts such as announcing board elections would not be searchable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants