Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 17 additions & 7 deletions insights/core/spec_factory.py
Original file line number Diff line number Diff line change
Expand Up @@ -214,7 +214,22 @@ def __init__(self, relative_path, root="/", save_as=None, ds=None, ctx=None, cle

self.validate()

def _is_inside_root(self):
Copy link
Contributor

@xiangce xiangce Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this method is not used anywhere out the validate, better make it an inner funciton of the validate, Otherwise, we may need to add particular tests it (although it's a private one)

Copy link
Contributor Author

@jholecek-rh jholecek-rh Jun 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot agree with this. In my opinion, nested function and class definitions are a niche feature for specific cases (e.g. wrappers). Nested functions cannot be accessed, reused, or called manually when troubleshooting.

I intentionally chose a private name for _is_inside_root so that it remains an implementation detail. It is not part of the public interface and thus should not need a dedicated test. The implementation is properly covered by tests that I did add (in this PR and in #4458) that use the public validate method.

In fact, I would argue that validate should not be public either. We touched the issue of interfaces of provider classes in our private conversations and I do intend to continue those. However, clarifying the public interface of the FileProvider class was beyond the scope of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. The FileProvider.validate() method is not used only in FileProvider.__init__. Should it be defined as a nested function in the __init__ method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't get your point

  1. Although the validate() is called only in the __ini__, but for different child classes, it has different implementation. It should not be designed as inner function.
  2. inner function, only works for function which is very simple (just like this one) and sometimes a bit common (can be called times) for its parent function.

If its function is common enough, and is useful not only for its parent caller. We can upgrade it as you did or might need to consider moving it to the insights.util as a utility function.

Because I didn't get from the code and feel hard to imagine the possibility that it will be used by other cases in this class, I asked you move it to a inner function or just expand it in the caller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the validate() is called only in the ini, but for different child classes, it has different implementation. It should not be designed as inner function.

I do agree that the validate() method should not be defined as an inner function. I used this example to demonstrate why I do not agree with the reasons that you provided for defining _is_inside_root() as an inner function.

As for the "different implemenations", no subclass of FileProvider overrides the validate() method. The CommandOutputProvider class happens to define a similar method, but the implementation is completely independent. I have brought up issues in the insight-core hierarchy of provider classes with you in a separate discussion. I think the hierarchy deserves a close review of the interfaces and separation of responsibilities. I would prefer to keep this topic separate from this PR that aims to fix a vulnerability.

As for the rest, it seems that our views of what a well-structured, readable code looks like differ significantly. The structure and readability of insights-core code is actually one of the broader topics that we wanted to discuss with you, because my team continuously struggles when dealing with the insights-core code base.

I cannot continue this PR without a better understanding of/alignment on desired code style and its justification. I will reach out to you in private to agree next steps and timelines.

"""Checks that `self.relative_path` does not point outside `self.root`."""
resolved = os.path.realpath(self.path)

# pathlib.Path.is_relative_to() has been added only in Python 3.9
resolved_root = os.path.realpath(self.root)
if not resolved_root.endswith(os.sep):
resolved_root += os.sep

return resolved.startswith(resolved_root)

def validate(self):
if not self._is_inside_root():
msg = "Relative path points outside the root: %s"
raise ValueError(msg % (self.path))

# 1. No Such File
if not os.path.exists(self.path):
raise ContentException("%s does not exist." % self.path)
Expand All @@ -224,15 +239,10 @@ def validate(self):
if self._filterable and not self._filters:
raise NoFilterException("Skipping %s due to no filters." % dr.get_name(self.ds))
# 2.2 Customer Prohibits Collection
if not blacklist.allow_file("/" + self.relative_path):
log.warning("WARNING: Skipping file %s", "/" + self.relative_path)
if not blacklist.allow_file(os.sep + self.relative_path):
log.warning("WARNING: Skipping file %s", os.sep + self.relative_path)
raise BlacklistedSpec()

resolved = os.path.realpath(self.path)
if not resolved.startswith(os.path.realpath(self.root)):
msg = "Relative path points outside the root: %s -> %s."
raise Exception(msg % (self.path, resolved))

if not os.access(self.path, os.R_OK):
raise ContentException("Cannot access %s" % self.path)

Expand Down
11 changes: 8 additions & 3 deletions insights/tests/core/spec_factory/test_file_provider.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,8 @@ def create_file(relpath, content=""):
create_file(SAMPLE_FILE)
create_directory("sample/root")
create_file("sample/root/access_denied.txt")
os.chmod(os.path.join(root, "sample/root/access_denied.txt"), 0)
os.chmod(os.path.join(root, "sample/root/access_denied.txt"), 0o000)
create_file("sample/root.txt")
return root


Expand All @@ -48,7 +49,7 @@ def test_repr(sample_directory):


@pytest.mark.skipif(
# GitHub workflows run Python 2.7 and 3.6 tests in containers, because these old Python
# GitHub workflows run Python 2.7 and 3.6 tests in containers because these old Python
# versions are not available in GitHub-hosted runners anymore. The tests are executed as root.
os.getuid() == 0,
reason="Test must not be run as root. Root ignores file mode bits."
Expand All @@ -59,7 +60,11 @@ def test_repr(sample_directory):
("access_denied.txt", ContentException, "Cannot access"),
("no_such_file.txt", ContentException, "does not exist"),
# file outside root that exists
("../../sample_file.txt", Exception, "Relative path points outside the root"),
("../../sample_file.txt", ValueError, "Relative path points outside the root"),
# file outside root that does not exist
("../no_such_file.txt", ValueError, "Relative path points outside the root"),
# file outside root where root path is a prefix of the file path string-wise
("../root.txt", ValueError, "Relative path points outside the root"),
]
)
def test_validate(sample_directory, relpath, exception_type, match):
Expand Down
Loading