New lexer dev #190
Closed
In Progress Experimental Lexer
I have been working on a new lexer. In no hurry. Just hacking away at it when time permits. It's nowhere near ready. But it's in a state now where it seems to work, as far as I can tell, and I haven't looked real hard. I've no doubt there are still issues. There are still issues. Error handling is minimal. There's debugging code. It uses an external header-only lexing library (lexertl17). (I've never used lexertl before; I picked it because it's the lexer Boost uses.) But it's fast! Really fast! I would not normally post messy dev code like this, but I feel the speed difference justifies it.
I am under no illusions that this should be merged in its current state. However, a PR seems to be the only vehicle for sharing this kind of stuff.
ImHex changes here
Timing Tests
These tests are from the "hex.builtin.task.analyzing_data" background task.
Thoughts
There are CI build errors. Some of them are my fault. I think, but can’t be sure, that some are not. I'm getting better at using Git, but I'm still crap at it.
I can't be sure that the 2x+ performance gains won't be whittled away as I make my lexer more fit for purpose. Although I can't be sure exactly what makes it faster, it seems obvious it’s lexertl.
I’ve added a pre-build step to the build system. Lexertl supports generating a “static lexer” (it generates source code to build the state machine at compile time). I was planning on using this in release builds. It’s causing problems on some platforms. I’ve never used CMake before ImHex. I could use some help here.
I guess I'm posting this to see if anyone is interested. Without meaning to reopen old wounds, I don’t want to waste my time. I’ve made a few PRs that I thought deserved consideration but that were rejected out of hand. That said, I’ll probably complete it anyway if I’m honest. It’s an interesting problem.
I was planning on rewriting the whole lex/pre-process/parse stack. But in no hurry.
There are some decisions I've made in the lexer that I would move to the parser.
`<=` and `>=` being lexed as two separate tokens, for example (matching the old lexer). The lexer could be simpler. Although the line is blurry, at times it feels like it's straying into parser-land.
Part of me suspects I've made some stupid mistake. 2x+ seems too good to be true.
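To make the `<=` / `>=` point concrete, here is a minimal sketch of one way the split could work: the lexer emits one token per operator character, and a parser-side pass recombines an adjacent `<` `=` or `>` `=` pair. This is not the code in this PR and it does not use lexertl; `Kind`, `Token`, `lex` and `mergeComparisons` are hypothetical names made up for the illustration, and the "adjacent offsets" rule is an assumption about how the parser might decide.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token kinds for this sketch; not the PR's actual types.
enum class Kind { Less, Greater, Assign, LessEqual, GreaterEqual, Identifier };

struct Token {
    Kind kind;
    std::string text;
    std::size_t offset;  // offset of the token's first character in the source
};

// Lexer side: every operator character becomes its own token, so "<=" yields
// Less followed by Assign. Spaces are skipped; any other run is an identifier.
std::vector<Token> lex(std::string_view src) {
    auto isOp = [](char c) { return c == '<' || c == '>' || c == '='; };
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const char c = src[i];
        if (c == ' ') { ++i; continue; }
        if (c == '<') { out.push_back({Kind::Less, "<", i}); ++i; continue; }
        if (c == '>') { out.push_back({Kind::Greater, ">", i}); ++i; continue; }
        if (c == '=') { out.push_back({Kind::Assign, "=", i}); ++i; continue; }
        const std::size_t start = i;
        while (i < src.size() && src[i] != ' ' && !isOp(src[i])) ++i;
        out.push_back({Kind::Identifier, std::string(src.substr(start, i - start)), start});
    }
    return out;
}

// Parser side: fold "<" "=" or ">" "=" into one comparison token, but only
// when the two characters were directly adjacent in the source text.
std::vector<Token> mergeComparisons(const std::vector<Token>& in) {
    std::vector<Token> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const Token& t = in[i];
        const bool adjacentAssign = i + 1 < in.size()
            && in[i + 1].kind == Kind::Assign
            && in[i + 1].offset == t.offset + 1;
        if (t.kind == Kind::Less && adjacentAssign) {
            out.push_back({Kind::LessEqual, "<=", t.offset});
            ++i;  // consume the "=" as well
        } else if (t.kind == Kind::Greater && adjacentAssign) {
            out.push_back({Kind::GreaterEqual, ">=", t.offset});
            ++i;  // consume the "=" as well
        } else {
            out.push_back(t);
        }
    }
    return out;
}
```

With this split, `mergeComparisons(lex("a <= b"))` contains a single `LessEqual` token between the two identifiers, while `a < = b` stays as separate `Less` and `Assign` tokens. The lexer stays trivial, at the cost of the parser needing source offsets to tell the two cases apart.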
The Deferred Confusion Anti-Pattern
We’ve all written code that later we find hard to understand ourselves. Hand-written lexers/parsers excel here. I feel a more formal framework, once you get your head around it, is beneficial.