@EmilienM EmilienM commented May 15, 2025

The Tempest report parser was updated to correctly extract the final traceback
when multiple tracebacks are present in a single log entry.

Previously, the parser captured every traceback it encountered, along with the log lines in between. This could produce
inputs too large for our model to handle. For now, let's focus only on the last
traceback found for each test.
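
To illustrate the behaviour described above, here is a minimal sketch of "keep only the last traceback". It is not the code merged in this PR: the function name `extract_last_traceback` and the sample log are hypothetical, and it assumes a simple, unchained Python traceback whose exception message fits on one line.

```python
TRACEBACK_HEADER = "Traceback (most recent call last):"


def extract_last_traceback(text: str) -> str:
    """Return the last traceback found in `text`, or "" if there is none.

    Simplifying assumptions (hypothetical, for illustration only):
    frame lines are indented, and the first non-indented line after the
    header is the single-line exception message that ends the traceback.
    """
    start = text.rfind(TRACEBACK_HEADER)  # position of the LAST header
    if start == -1:
        return ""

    lines = text[start:].splitlines()
    kept = [lines[0]]
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the traceback
        kept.append(line)
        if not line.startswith((" ", "\t")):
            break  # non-indented line: the exception message
    return "\n".join(kept)


# Made-up log entry with two tracebacks and log lines in between.
sample = (
    "2025-05-15 some log line\n"
    "Traceback (most recent call last):\n"
    '  File "a.py", line 1, in f\n'
    "    g()\n"
    "ValueError: first\n"
    "some log in the middle\n"
    "Traceback (most recent call last):\n"
    '  File "b.py", line 2, in h\n'
    "    k()\n"
    "KeyError: 'last'\n"
    "trailing log line\n"
)

last = extract_last_traceback(sample)
print(last)
```

Only the second traceback is kept; the first traceback and the surrounding log lines are dropped, which is what keeps the model input small.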

@EmilienM
Contributor Author

Tested with this test, which has 7 tracebacks.

API logs (with a print enabled to show that only the last traceback was sent):

2025-05-15 18:51:21 - HTTP Request: GET https://sf.apps.int.gpc.ocp-hub.prod.psi.redhat.com/logs/ac3/components-integration/ac3cf6af0d7f486bbc69aed0b96c833b/logs/controller-0/ci-framework-data/tests/test_operator//tempest-tests-tempest-workflow-step-01-single-thread-testing//stestr_results.html "HTTP/1.1 200 OK"
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/whitebox_neutron_tempest_plugin/tests/scenario/test_mtu.py", line 247, in test_south_to_north_pmtud_udp_basic
    self.check_pmtud_basic()
  File "/usr/lib/python3.9/site-packages/whitebox_neutron_tempest_plugin/tests/scenario/test_mtu.py", line 202, in check_pmtud_basic
    self.validate_next_hop_mtu(
  File "/usr/lib/python3.9/site-packages/whitebox_neutron_tempest_plugin/tests/scenario/test_mtu.py", line 190, in validate_next_hop_mtu
    self._verify_capture_for_icmp_unreacheable(
  File "/usr/lib/python3.9/site-packages/whitebox_neutron_tempest_plugin/tests/scenario/test_mtu.py", line 114, in _verify_capture_for_icmp_unreacheable
    self.assertEqual(
  File "/usr/lib/python3.9/site-packages/testtools/testcase.py", line 393, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/usr/lib/python3.9/site-packages/testtools/testcase.py", line 480, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: 0 != 1442: Delivered data has size 0 bytes while expected 1442.
2025-05-15 18:51:22 - HTTP Request: POST http://x.x.x.4:8000/tokenize "HTTP/1.1 200 OK"
2025-05-15 18:51:22 - HTTP Request: GET http://x.x.x.173:6333/collections "HTTP/1.1 200 OK"
2025-05-15 18:51:22 - HTTP Request: POST http://x.x.x.4:8000/v1/embeddings "HTTP/1.1 200 OK"
2025-05-15 18:51:22 - HTTP Request: POST http://x.x.x.173:6333/collections/rca-knowledge-base/points/search "HTTP/1.1 200 OK"
2025-05-15 18:51:23 - HTTP Request: POST http://x.x.x.4:8001/v1/rerank "HTTP/1.1 200 OK"
2025-05-15 18:51:23 - HTTP Request: POST http://x.x.x.4:8001/v1/rerank "HTTP/1.1 200 OK"
2025-05-15 18:51:23 - HTTP Request: POST http://x.x.x.173:6333/collections/rca-ci/points/search "HTTP/1.1 200 OK"

Curl:

curl -X POST http://0.0.0.0:8001/rca-from-tempest \
  -H "Content-Type: application/json" -H "Authorization: Bearer 4469540e-932f-42ff-9f5a-ba44d45696df" \
  -d '{"tempest_report_url": "https://sf.apps.int.gpc.ocp-hub.prod.psi.redhat.com/logs/ac3/components-integration/ac3cf6af0d7f486bbc69aed0b96c833b/logs/controller-0/ci-framework-data/tests/test_operator//tempest-tests-tempest-workflow-step-01-single-thread-testing//stestr_results.html"}'
[{"test_name":"whitebox_neutron_tempest_plugin.tests.scenario.test_mtu.GatewayMtuTestUdp.test_south_to_north_pmtud_udp_basic","response":"\n\n**Root Cause of the Failure:**\nThe test failures are likely caused by firewall rules in the test environment that are blocking the expected network traffic. The `iptables` rules might be incorrectly filtering or dropping the necessary UDP traffic, leading to the \"delivered data size\" mismatch.\n\n**Steps to Resolve:**\n1. **Check Firewall Rules:** Inspect the `iptables` rules on the test environment to ensure that UDP ports (e.g., 65000) are not being blocked or dropped.\n2. **Temporarily Disable Firewall:** If the issue is due to temporary firewall rules, consider disabling or adjusting the firewall during test execution to allow the expected traffic.\n3. **Verify Network Configuration:** Ensure that network interfaces and routing are correctly configured to allow the test traffic to flow as expected.\n4. **Re-run Tests:** After making changes to the firewall or network configuration, re-run the tests to check if the issue is resolved.\n\n**Additional Considerations:**\n- If the issue persists, it may be related to a deeper network configuration problem or a misconfiguration in the test setup.\n- Consider reaching out to the network or infrastructure team for assistance in verifying the test environment's network configuration.","urls":["https://issues.redhat.com/browse/OSPRH-9095","https://issues.redhat.com/browse/OSPRH-12863"]}]

@sbekkerm
Contributor

LGTM

@EmilienM EmilienM requested a review from lpiwowar May 15, 2025 19:18
Contributor

@lpiwowar lpiwowar left a comment

LGTM! 👍

Maybe we could also update the tools repo so that only the last traceback is stored in the vector db [1].

[1] https://github.com/RCAccelerator/tools/blob/e294a89983d13bad9f7719ebb46323737395b38c/data_scraper/processors/ci_logs_provider.py#L187

@lpiwowar lpiwowar merged commit 8791d57 into main May 16, 2025
3 checks passed
@lpiwowar lpiwowar deleted the tracebacks branch May 16, 2025 09:02
@EmilienM
Contributor Author

> LGTM! 👍
>
> Maybe we could also update the tools repo so that only the last traceback is stored in the vector db [1].
>
> [1] https://github.com/RCAccelerator/tools/blob/e294a89983d13bad9f7719ebb46323737395b38c/data_scraper/processors/ci_logs_provider.py#L187

Not necessarily, as the more tracebacks we have in vectordb the better?

@lpiwowar
Contributor

> Not necessarily, as the more tracebacks we have in vectordb the better?

The matches (cosine similarity) between the long logs stored in the vector db and the short logs we are producing here are not going to be as high as they would be if we had the short logs in the vector db as well.

Also, the long logs carry a lot of other noise (other tracebacks, etc.), which makes matching more difficult. The problem IMO is not that we are storing more tracebacks in the vectordb. The issue is that in some cases we take almost the entire error string and compute the key from it, instead of computing the key solely from the Traceback section.
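
A rough sketch of the effect described above, under loud assumptions: the real pipeline computes embeddings with a model, whereas this toy example uses plain bag-of-words vectors as a stand-in, and the noise lines in the "long log" are made up. It only shows the direction of the effect, not real similarity scores.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of whitespace-token count vectors (toy stand-in
    for a real embedding model)."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Short query: what is sent once only the last traceback is kept.
query = "testtools.matchers._impl.MismatchError: 0 != 1442"

# Key stored from the traceback section only (hypothetical).
stored_short = query

# Key stored from almost the entire error string (hypothetical noise).
stored_long = (
    "DEBUG connecting to server retry attempt failed "
    "Traceback earlier unrelated error TimeoutError waiting for port "
    + query
)

sim_short = cosine_similarity(query, stored_short)
sim_long = cosine_similarity(query, stored_long)
print(f"short key: {sim_short:.3f}, long key: {sim_long:.3f}")
```

In this toy setup the short stored key scores higher than the long one against the same query, because the extra tokens in the long key lengthen its vector without adding overlap.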

@EmilienM
Contributor Author

good points, let's do it then
