Skip to content

Conversation

alexsku
Copy link
Collaborator

@alexsku alexsku commented Jul 26, 2025

Based on the original PR #13933 by @david-leifker, this implementation provides a focused subset of system information functionality, specifically extracting configuration data from ConfigurationProvider and Spring application context details. While David's comprehensive PR included storage systems, Kubernetes, and Kafka collectors with extensive infrastructure dependencies, this implementation streamlines the scope to focus on core system configuration and Spring component status (GMS, MAE Consumer, MCE Consumer) with a new dedicated /openapi/v1/system-info endpoint family. The implementation preserves the hierarchical JSON structure from the original design, includes robust property filtering for sensitive data redaction, and maintains the parallel execution architecture, but excludes the broader infrastructure monitoring capabilities to provide a more targeted solution for configuration visibility and component health reporting. MAE/MCE consumer info currently uses placeholder data to avoid circular dependency issues in embedded deployment mode, with the foundation prepared for future deployment mode detection as envisioned in the original PR.

@github-actions github-actions bot added product PR or Issue related to the DataHub UI/UX devops PR or Issue related to DataHub backend & deployment labels Jul 26, 2025
Copy link

alwaysmeticulous bot commented Jul 26, 2025

✅ Meticulous spotted 0 visual differences across 1393 screens tested: view results.

Meticulous evaluated ~9 hours of user flows against your PR.

Expected differences? Click here. Last updated for commit f847596. This comment will update as new commits are pushed.

Copy link

codecov bot commented Jul 26, 2025

Bundle Report

Changes will decrease total bundle size by 82 bytes (-0.0%) ⬇️. This is within the configured threshold ✅

Detailed changes
Bundle name Size Change
datahub-react-web-esm 22.36MB -82 bytes (-0.0%) ⬇️

Affected Assets, Files, and Routes:

view changes for bundle: datahub-react-web-esm

Assets Changed:

Asset Name Size Change Total Size Change (%)
assets/index-*.js -82 bytes 18.69MB -0.0%

Copy link

codecov bot commented Jul 26, 2025

Codecov Report

❌ Patch coverage is 80.35714% with 11 lines in your changes missing coverage. Please review.
⚠️ Test Analytics upload error: Unsupported file format

Files with missing lines Patch % Lines
...ct/openapi/operations/v1/SystemInfoController.java 86.53% 2 Missing and 5 partials ⚠️
.../factory/system_info/SystemInfoServiceFactory.java 0.00% 4 Missing ⚠️

📢 Thoughts on this report? Let us know!

private boolean isAllowedProperty(String key) {
String lowerKey = key.toLowerCase();
return ALLOWED_PATTERNS.stream().anyMatch(pattern -> pattern.matcher(lowerKey).find())
|| SENSITIVE_PATTERNS.stream().noneMatch(lowerKey::endsWith);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied the sensitive detection logic from the original PR, however I wonder how much we are concerned about avoiding exposing sensitive data.

I think the most safe is approach is the explicit allow list, when the element is not in the allow list it would be redacted. This allow list should not use wildcards (since with the wildcards it is easier to miss and leak sensitive data. At the same time there are almost 700 items in the config, so this explicit allowlist would be pretty large.

@david-leifker and @RyanHolstien - what are your thoughts on this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the problem with an allow list is that we'll never keep it up to date. I was trying to balance the maintenance of an allow list with some kind of rational default.
While the endpoint is behind a restricted endpoint, the use-case is for support requests. We'd like to request the dump of system information from a customers' system for debugging and prevent leaking sensitive information to us.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ryan and I discussed this and we should add a unit test that explicitly will require properties to be tagged as sensitive or not (and check that the sensitive ones get redacted)

Copy link
Collaborator

@david-leifker david-leifker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending passing CI

@datahub-cyborg datahub-cyborg bot added pending-submitter-merge and removed needs-review Label for PRs that need review from a maintainer. labels Jul 30, 2025
return token_data["accessToken"], token_data["metadata"]["id"]


def test_system_info_authenticated_non_admin_user_returns_403():
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RyanHolstien I added this test to make sure users without MANAGE_SYSTEM_OPERATIONS_PRIVILEGE cannot access the system-info endpoints. the test does create_user and remove_user in finally so normally there should not be users left after running this test.

the pattern is similar to what privileges_and_test_user_setup from test_privileges.py (for example) is doing


This directory contains end-to-end smoke tests for DataHub functionality. These tests can be run locally for faster development and debugging compared to the full CI pipeline.

> 📖 **Official Documentation**: For additional setup details and troubleshooting, see the [Running smoke tests locally](https://www.notion.so/acryldata/Running-smoke-tests-locally-23ffc6a6427780df8f22dd0a4e57d793) guide.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't link out to internal Notion doc on OSS repo

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, missed this comment - I will do another pr to remove it

"neo4j.maxConnectionPoolSize",
"neo4j.maxTransactionRetryTime",
"neo4j.uri",
"neo4j.username",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waffling on whether or not usernames should be considered sensitive, they aren't really, but might get some pushback with them. Wdyt?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the non sensitive list, so they are not considered sensitive right now, it makes sense to me. David's pr also had them as such

Copy link
Collaborator

@RyanHolstien RyanHolstien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good to me, just the failing CI test needs to be fixed

@Slf4j
@Component
@RequiredArgsConstructor
public class SpringComponentsCollector {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed fetchRemoteComponentInfo - it is not used and it was adding to non covered lines.

it is still here

private CompletableFuture<ComponentInfo> fetchRemoteComponentInfo(

@alexsku alexsku merged commit 6b64c25 into master Jul 31, 2025
44 checks passed
@alexsku alexsku deleted the sa-config-PFP-1193-david branch July 31, 2025 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
devops PR or Issue related to DataHub backend & deployment pending-submitter-merge product PR or Issue related to the DataHub UI/UX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants