-
Notifications
You must be signed in to change notification settings - Fork 30
Added ioservice parser + unit testing #196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Great job @Aweinhof ! |
Is this better or am I missing something? |
def check_start_node(self): | ||
if '+-o' not in self.line: | ||
logger.error('This is not normal. Recursive function called on random line.') | ||
exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not call exit
in a library as that will kill the full app using the library. Better is to throw an exception then.
|
||
class IORegStructParser: | ||
rollback_addr = None | ||
line = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor remark: Using this class var makes your life easier so you don't have to pass on the variable to all the functions. On the other hand, it may lead to confusion for someone using one of the functions later on.
Following PEP8 guidance best is to use the __foo
variable name convention. Perhaps __curr_line
would be also more clear to the purpose of the line tracker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made them as class attributes because they are basically used everywhere in the class. This way I don't have to pass them as argument and return them every time, which would make the code a lot heavier to read in my opinion.
Alright, done. I also decided to move up string_parser.py to the utils folder, because I believe it might become useful somewhere else in the project at some point. Let me know what you think of it. |
I notice that the tests_parsers_* all contain the same test_basic_structure, test_value_overflow_anomaly, tests_non_ascii_byte_anomaly. This is a lot of duplicate/redundant data.
below the result of the tests on more data: test_non_ascii_byte_anomaly
test_parsers_ioacpiplaneruns very quickly ! test_parsers_iodevicetreeoutputs a few of
takes a reasonable time. test_parser_iofirewireruns very quickly ! test_parsers_iopower
many of those max recursion reached, took some more time, about 10 seconds per case test_parsers_ioservicetakes a loong time.
test_parsers_iousbruns very quickly recursion limitThe limit is being reached quite some time. What are your thoughts? |
This is a basic warning that triggers if it finds struct-like characters (< > ( ) { ... ) in a string. I see that '-->' is triggering it which is a false alarm. "< abc>" however is a case I have to handle, my detection probably takes it as a struct because of the space. I don't think this warning is relevant anymore, unless there are undiscovered struct out there in the wild.
I'll fix this, I did this because i thought there was some kind of rule to have one testfile per parser.
This warning is very relevant, it is a very strange situation. Imagine we have the string :
What should the resulting struct be? I am asking you because I don't currently know, my way to handle this is to discard the rogue text and only keep the dict
The stringparser tool takes in a string, and returns a struct, or the initial input string if it couldn't parse it. Hence you will have a long string (unchanged) as value to a "top level" key in the parsed data.
This problem is complex, the thing is that you cannot make a top-to-bottom parser with a stateless regex-based parser (thus we cannot make a partial parsing of a long line, which is sad). Currently, as said previously, the value in the main dict is left unchanged if it is too long to parse. This is slow however because we tried to parse the long string. Looking at the size of the line is not an accurate way, since some lines have a huuuge amount of data but almost no structures, hence they are parsed instantly. My proposal is to make a struct-defining character counter, where we basically count the amounts of {}, (), [], "". This would be a quick, accurate enough check (since these characters are not that often found in strings). Let me know what you think of it |
Added a recursive parser that parses all the files in the ioreg folder.
Added a string parser that recursively parses structures found in the values.