Skip to content

Conversation

Aweinhof
Copy link
Contributor

@Aweinhof Aweinhof commented Sep 15, 2025

Added a recursive parser that parses all the files in the ioreg folder.

Added a string parser that recursively parses structures found in the values.

@Aweinhof Aweinhof closed this Sep 15, 2025
@Aweinhof Aweinhof reopened this Sep 15, 2025
@Aweinhof Aweinhof marked this pull request as draft September 16, 2025 08:31
@Aweinhof Aweinhof marked this pull request as ready for review September 24, 2025 09:28
@cvandeplas
Copy link
Contributor

Great job @Aweinhof !
Minor question: would you be able to also add a unit test per parser? (doing the for case_id, case in self.sd.cases().items(): loop)
That's practical to identify issues when we run unit tests on a larger dataset.

@cvandeplas cvandeplas self-assigned this Sep 25, 2025
@Aweinhof
Copy link
Contributor Author

Is this better or am I missing something?

def check_start_node(self):
if '+-o' not in self.line:
logger.error('This is not normal. Recursive function called on random line.')
exit(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not call exit in a library as that will kill the full app using the library. Better is to throw an exception then.


class IORegStructParser:
rollback_addr = None
line = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor remark: Using this class var makes your life easier so you don't have to pass on the variable to all the functions. On the other hand, it may lead to confusion for someone using one of the functions later on.
Following PEP8 guidance best is to use the __foo variable name convention. Perhaps __curr_line would be also more clear to the purpose of the line tracker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made them as class attributes because they are basically used everywhere in the class. This way I don't have to pass them as argument and return them every time, which would make the code a lot heavier to read in my opinion.

@Aweinhof
Copy link
Contributor Author

Alright, done. I also decided to move up string_parser.py to the utils folder, because I believe it might become useful somewhere else in the project at some point. Let me know what you think of it.

@cvandeplas
Copy link
Contributor

cvandeplas commented Sep 30, 2025

I notice that the tests_parsers_* all contain the same test_basic_structure, test_value_overflow_anomaly, tests_non_ascii_byte_anomaly. This is a lot of duplicate/redundant data.

  • I would advise to move these tests to one file such as test_ioregstruct.py file instead considering this is testing functionality of the util. The test_parsers_ files will contain the minimal

below the result of the tests on more data:

test_non_ascii_byte_anomaly

test_non_ascii_byte_anomaly (test_parsers_iousb.TestParsersIOService.test_non_ascii_byte_anomaly) ... Warning : A structure might have been recognized in here, if so please consider adding it to the string_parser.py file
---> value -->¿<--
ok
  • Is there a way that this test can run without this warning? (so by changing the lib, or if this is expected the lib outputs this warning, then have the test not show the warning.

test_parsers_ioacpiplane

runs very quickly !

test_parsers_iodevicetree

outputs a few of

test_parse_case (test_parsers_iodevicetree.TestParsersIODeviceTree.test_parse_case) ... Warning : A structure might have been recognized in here, if so please consider adding it to the string_parser.py file
---> < hpadhpa>
Warning : A structure might have been recognized in here, if so please consider adding it to the string_parser.py file
---> < hpa>
Warning : A structure might have been recognized in here, if so please consider adding it to the string_parser.py file
Warning : A structure might have been recognized in here, if so please consider adding it to the string_parser.py file
---> < CIAUPCSUPCAKLCS>

  • Can you review this ?

takes a reasonable time.

test_parser_iofirewire

runs very quickly !

test_parsers_iopower

test_parse_case (test_parsers_iopower.TestParsersIOPower.test_parse_case) ... Skipped line with 526635 characters. Recursion depth : 2954
--> max recursion depth can be increased in utils/string_parser.py in parse(). Feel free to try as high as needed to parse this line.

many of those max recursion reached, took some more time, about 10 seconds per case

test_parsers_ioservice

takes a loong time.
Similar warnings as iodevicetree.
max recursion depth reached many many times

Warning : trying to incorporate dict/list in a string :
---> <Z[STRUCT]1lDA>

test_parsers_iousb

runs very quickly

recursion limit

The limit is being reached quite some time.
What happens when this is the case? will the value still be stored in the key ? is the rest of the file converted?
Best would be to find a good recursion number, or re-think how to handle too large recursions.
Potentially look at the size of a value before starting recursion?

What are your thoughts?

@Aweinhof
Copy link
Contributor Author

  • About the warning stating that there is a potential struct :

This is a basic warning that triggers if it finds struct-like characters (< > ( ) { ... ) in a string. I see that '-->' is triggering it which is a false alarm. "< abc>" however is a case I have to handle, my detection probably takes it as a struct because of the space. I don't think this warning is relevant anymore, unless there are undiscovered struct out there in the wild.

  • About the redundant test files :

I'll fix this, I did this because i thought there was some kind of rule to have one testfile per parser.

  • About this warning :
Warning : trying to incorporate dict/list in a string :
---> <Z[STRUCT]1lDA>

This warning is very relevant, it is a very strange situation. Imagine we have the string :

<raw{key1: value1, key2: value2}text>

What should the resulting struct be? I am asking you because I don't currently know, my way to handle this is to discard the rogue text and only keep the dict {key1: value1, key2: value2}

  • What happens when the recursion depth is reached ?

The stringparser tool takes in a string, and returns a struct, or the initial input string if it couldn't parse it. Hence you will have a long string (unchanged) as value to a "top level" key in the parsed data.

  • Concerning the recursion

This problem is complex, the thing is that you cannot make a top-to-bottom parser with a stateless regex-based parser (thus we cannot make a partial parsing of a long line, which is sad). Currently, as said previously, the value in the main dict is left unchanged if it is too long to parse. This is slow however because we tried to parse the long string. Looking at the size of the line is not an accurate way, since some lines have a huuuge amount of data but almost no structures, hence they are parsed instantly. My proposal is to make a struct-defining character counter, where we basically count the amounts of {}, (), [], "". This would be a quick, accurate enough check (since these characters are not that often found in strings). Let me know what you think of it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants