Added ioservice parser + unit testing #196

Aweinhof · 2025-09-15T10:48:28Z

Added a recursive parser that parses all the files in the ioreg folder.

Added a string parser that recursively parses structures found in the values.

…timized yet

… some cleanup

cvandeplas · 2025-09-25T06:45:46Z

Great job @Aweinhof !
Minor question: would you be able to also add a unit test per parser? (doing the for case_id, case in self.sd.cases().items(): loop)
That's practical to identify issues when we run unit tests on a larger dataset.

Aweinhof · 2025-09-25T11:44:03Z

Is this better or am I missing something?

cvandeplas · 2025-09-25T06:50:01Z

src/sysdiagnose/utils/ioreg_parsers/structure_parser.py

+    def check_start_node(self):
+        if '+-o' not in self.line:
+            logger.error('This is not normal. Recursive function called on random line.')
+            exit(1)


We should not call exit in a library as that will kill the full app using the library. Better is to throw an exception then.

cvandeplas · 2025-09-25T06:53:57Z

src/sysdiagnose/utils/ioreg_parsers/structure_parser.py

+
+class IORegStructParser:
+    rollback_addr = None
+    line = None


Minor remark: Using this class var makes your life easier so you don't have to pass on the variable to all the functions. On the other hand, it may lead to confusion for someone using one of the functions later on.
Following PEP8 guidance best is to use the __foo variable name convention. Perhaps __curr_line would be also more clear to the purpose of the line tracker.

I made them as class attributes because they are basically used everywhere in the class. This way I don't have to pass them as argument and return them every time, which would make the code a lot heavier to read in my opinion.

tests/test_string_parser.py

….py to utils

Aweinhof · 2025-09-25T16:04:56Z

Alright, done. I also decided to move up string_parser.py to the utils folder, because I believe it might become useful somewhere else in the project at some point. Let me know what you think of it.

cvandeplas · 2025-09-30T13:28:34Z

I notice that the tests_parsers_* all contain the same test_basic_structure, test_value_overflow_anomaly, tests_non_ascii_byte_anomaly. This is a lot of duplicate/redundant data.

I would advise to move these tests to one file such as test_ioregstruct.py file instead considering this is testing functionality of the util. The test_parsers_ files will contain the minimal

below the result of the tests on more data:

test_non_ascii_byte_anomaly

test_non_ascii_byte_anomaly (test_parsers_iousb.TestParsersIOService.test_non_ascii_byte_anomaly) ... Warning : A structure might have been recognized in here, if so please consider adding it to the string_parser.py file
---> value -->¿<--
ok

Is there a way that this test can run without this warning? (so by changing the lib, or if this is expected the lib outputs this warning, then have the test not show the warning.

test_parsers_ioacpiplane

runs very quickly !

test_parsers_iodevicetree

outputs a few of

test_parse_case (test_parsers_iodevicetree.TestParsersIODeviceTree.test_parse_case) ... Warning : A structure might have been recognized in here, if so please consider adding it to the string_parser.py file
---> < hpadhpa>
Warning : A structure might have been recognized in here, if so please consider adding it to the string_parser.py file
---> < hpa>
Warning : A structure might have been recognized in here, if so please consider adding it to the string_parser.py file
Warning : A structure might have been recognized in here, if so please consider adding it to the string_parser.py file
---> < CIAUPCSUPCAKLCS>

Can you review this ?

takes a reasonable time.

test_parser_iofirewire

runs very quickly !

test_parsers_iopower

test_parse_case (test_parsers_iopower.TestParsersIOPower.test_parse_case) ... Skipped line with 526635 characters. Recursion depth : 2954
--> max recursion depth can be increased in utils/string_parser.py in parse(). Feel free to try as high as needed to parse this line.

many of those max recursion reached, took some more time, about 10 seconds per case

test_parsers_ioservice

takes a loong time.
Similar warnings as iodevicetree.
max recursion depth reached many many times

Warning : trying to incorporate dict/list in a string :
---> <Z[STRUCT]1lDA>

test_parsers_iousb

runs very quickly

recursion limit

The limit is being reached quite some time.
What happens when this is the case? will the value still be stored in the key ? is the rest of the file converted?
Best would be to find a good recursion number, or re-think how to handle too large recursions.
Potentially look at the size of a value before starting recursion?

What are your thoughts?

Aweinhof · 2025-09-30T15:30:46Z

About the warning stating that there is a potential struct :

This is a basic warning that triggers if it finds struct-like characters (< > ( ) { ... ) in a string. I see that '-->' is triggering it which is a false alarm. "< abc>" however is a case I have to handle, my detection probably takes it as a struct because of the space. I don't think this warning is relevant anymore, unless there are undiscovered struct out there in the wild.

About the redundant test files :

I'll fix this, I did this because i thought there was some kind of rule to have one testfile per parser.

About this warning :

Warning : trying to incorporate dict/list in a string :
---> <Z[STRUCT]1lDA>

This warning is very relevant, it is a very strange situation. Imagine we have the string :

<raw{key1: value1, key2: value2}text>

What should the resulting struct be? I am asking you because I don't currently know, my way to handle this is to discard the rogue text and only keep the dict {key1: value1, key2: value2}

What happens when the recursion depth is reached ?

The stringparser tool takes in a string, and returns a struct, or the initial input string if it couldn't parse it. Hence you will have a long string (unchanged) as value to a "top level" key in the parsed data.

Concerning the recursion

This problem is complex, the thing is that you cannot make a top-to-bottom parser with a stateless regex-based parser (thus we cannot make a partial parsing of a long line, which is sad). Currently, as said previously, the value in the main dict is left unchanged if it is too long to parse. This is slow however because we tried to parse the long string. Looking at the size of the line is not an accurate way, since some lines have a huuuge amount of data but almost no structures, hence they are parsed instantly. My proposal is to make a struct-defining character counter, where we basically count the amounts of {}, (), [], "". This would be a quick, accurate enough check (since these characters are not that often found in strings). Let me know what you think of it

Aweinhof added 5 commits September 12, 2025 12:08

Added recursive, basic version of IOService parser

81e2f35

IOService parser now recreates the tree in a json format

2a7131f

IOService parser now handles anomalies

333cbda

Fixed an anomaly where a non-ascii byte stands in the IOService.txt file

7484620

Added unit testing for ioservice parser

5e03cec

Aweinhof closed this Sep 15, 2025

Fixed codestyling of ioservice and its unit testing

25145b8

Aweinhof reopened this Sep 15, 2025

Aweinhof marked this pull request as draft September 16, 2025 08:31

Aweinhof and others added 11 commits September 16, 2025 17:42

Small refactor of the resulting data structure in the ioservice parser

ebb74a6

Merge branch 'EC-DIGIT-CSIRC:main' into main

5d26e31

Added basic value string parsers, used for io* parsers

2de2208

Merge branch 'main' of https://github.com/Aweinhof/sysdiagnose

858d5b6

Added recursive version of string_parser, still rudimentary

b5a9188

Refactor of algo logic in utils/string_parser

49b1982

Enhanced string_parser + Added solid tests + fixes here and there

61d386c

string_parser now fully working on the testfile, quoted values not op…

5c81d1c

…timized yet

Added a line preparation that greatly optimizes the recursion depth +…

12c566e

… some cleanup

Cleaned code releated to ioreg and stringparsers

58a0524

bug fix concerning strings inside <> and cleanup

4394482

Aweinhof marked this pull request as ready for review September 24, 2025 09:28

cvandeplas self-assigned this Sep 25, 2025

dissociated ioreg parsers + small fixes

e5a9eba

cvandeplas reviewed Sep 25, 2025

View reviewed changes

Aweinhof and others added 2 commits September 25, 2025 17:56

Replaced exit calls, renamed class attributes and moved string_parser…

eea716a

….py to utils

Merge branch 'EC-DIGIT-CSIRC:main' into main

9020e7e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Added ioservice parser + unit testing #196

Added ioservice parser + unit testing #196

Uh oh!

Aweinhof commented Sep 15, 2025 •

edited

Loading

Uh oh!

cvandeplas commented Sep 25, 2025

Uh oh!

Aweinhof commented Sep 25, 2025

Uh oh!

cvandeplas Sep 25, 2025

Uh oh!

cvandeplas Sep 25, 2025

Uh oh!

Aweinhof Sep 25, 2025

Uh oh!

Uh oh!

Aweinhof commented Sep 25, 2025

Uh oh!

cvandeplas commented Sep 30, 2025 •

edited

Loading

Uh oh!

Aweinhof commented Sep 30, 2025

Uh oh!

Uh oh!

Added ioservice parser + unit testing #196

Are you sure you want to change the base?

Added ioservice parser + unit testing #196

Uh oh!

Conversation

Aweinhof commented Sep 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cvandeplas commented Sep 25, 2025

Uh oh!

Aweinhof commented Sep 25, 2025

Uh oh!

cvandeplas Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

cvandeplas Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

Aweinhof Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Aweinhof commented Sep 25, 2025

Uh oh!

cvandeplas commented Sep 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

test_non_ascii_byte_anomaly

test_parsers_ioacpiplane

test_parsers_iodevicetree

test_parser_iofirewire

test_parsers_iopower

test_parsers_ioservice

test_parsers_iousb

recursion limit

Uh oh!

Aweinhof commented Sep 30, 2025

Uh oh!

Uh oh!

Aweinhof commented Sep 15, 2025 •

edited

Loading

cvandeplas commented Sep 30, 2025 •

edited

Loading