Add match input stream #214
base: master
|
Hi konefah, Thanks for your comments and the pull request. I checked it and I saw some issues regarding Unicode and case normalization, which would have to be done differently if we never had the whole string in memory. That said, I think it could be done, but it's not completely trivial. That's why I would ask you about your use case if possible. Where are you reading the strings from? How long are they? Best regards. |
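To illustrate one way the normalization concern can arise, here is a minimal sketch using only the JDK's `java.text.Normalizer`. It assumes that canonical (NFC) normalization is part of the issue and that the input arrives in arbitrary chunks; the class and method names are hypothetical, and nothing here describes how the library normalizes internally.

```java
import java.text.Normalizer;

// Hypothetical demo class: shows that normalizing chunk by chunk can differ
// from normalizing the whole string when a chunk boundary splits a base
// character from its combining mark.
class ChunkedNormalization {

    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // True if normalizing the two chunks separately gives the same result
    // as normalizing their concatenation.
    static boolean chunkedEqualsWhole(String first, String second) {
        String whole = nfc(first + second);
        String chunked = nfc(first) + nfc(second);
        return whole.equals(chunked);
    }

    public static void main(String[] args) {
        // Boundary falls between 'e' and COMBINING ACUTE ACCENT (U+0301).
        System.out.println(chunkedEqualsWhole("cafe", "\u0301"));
        // Boundary falls between two plain ASCII letters.
        System.out.println(chunkedEqualsWhole("ca", "fe"));
    }
}
```

A streaming implementation would therefore need to hold back trailing characters until it knows no combining mark follows, which is one reason the change is not trivial.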
|
Hi marianobarrios & thank you for your feedback! |
|
Thanks for the explanation. Again, I am trying to understand the use case (to see if the extra complexity is justified). In your case, I see that you could split the file on newlines, as each JSON document is on its own line. Is it possible to rely on that? Additionally, would you mind sharing the regular expression that you are using? |
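The newline-splitting suggestion could look roughly like the sketch below. It uses `java.util.regex.Pattern` purely as a stand-in matcher, since the thread does not show the library's API; the class name, method name, and sample pattern are all hypothetical. Only one line is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper: validates a newline-delimited stream line by line.
class LineValidator {

    // Returns true if every line read from the stream fully matches the pattern.
    // java.util.regex is a stand-in here, not the library under discussion.
    static boolean allLinesMatch(InputStream in, Pattern pattern) throws IOException {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!pattern.matcher(line).matches()) {
                    return false; // stop at the first invalid line
                }
            }
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample pattern for one very specific JSON shape, assumed for illustration.
        Pattern p = Pattern.compile("\\{\"id\":\\d+\\}");
        byte[] data = "{\"id\":1}\n{\"id\":22}\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(allLinesMatch(new ByteArrayInputStream(data), p));
    }
}
```

This only helps when each record is guaranteed to fit on one line and each line is small enough to buffer.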
|
Thanks for your feedback! |
|
But what about the newlines? Additionally, parsing JSON using a regular expression is not really possible (JSON is not a regular language). It only works in your case because you are asking for a specific JSON shape. Sorry, but I really need to understand the actual use case: in which context this program will run, and where these requirements come from. |
|
Hi @marianobarrios , |
|
OK, but you cannot parse arbitrary JSON with a regular expression... |
|
Thank you for your feedback! I will replace the JSON file with the TXT file for the test. Your suggestions regarding Unicode handling and normalization cases would be greatly appreciated. |
|
What I am asking for is a real-world use case in which this is useful. I cannot imagine any. It's important to have a use case that justifies the added complexity. |
|
Hi @marianobarrios, let me start by comparing the situation before and after my modification. Before: the matches method could only accept a parameter of type CharSequence. The matches method is used to validate text. Analogy: With this modification, it becomes possible to validate any data source of any size without loading it into memory. Here are some examples. Embedded resource validation: loading a configuration file from the classpath or a JAR. Data pipeline validation: ETL or streaming data processing. |
This library meets part of my needs. It provides a solid foundation and already offers several useful features. However, it does not support using an InputStream to handle a continuous data stream without loading everything into memory — a feature that is essential for my use case. Without this capability, I cannot fully leverage the library in my context.
To address this limitation, I propose an update to the library that adds this missing functionality. With this modification, the matchAndReport method could take an InputStream as input.
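As a rough illustration of what matching against an InputStream without buffering the whole input could mean, here is a minimal sketch of a hand-written automaton that consumes one character at a time. It is not the library's implementation and does not reflect the actual signature of `matchAndReport`; the class, the state numbering, and the hard-coded pattern `[0-9]+(,[0-9]+)*` are all assumptions made for the example. It also ignores the Unicode and case normalization concerns raised in the conversation.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical streaming matcher for the fixed pattern [0-9]+(,[0-9]+)*.
class StreamingDfa {

    // States: 0 = start, 1 = inside a number (accepting), 2 = just after a comma.
    // Returns -1 (dead state) when no transition exists.
    static int step(int state, int ch) {
        boolean digit = ch >= '0' && ch <= '9';
        switch (state) {
            case 0:
            case 2:
                return digit ? 1 : -1;
            case 1:
                return digit ? 1 : (ch == ',' ? 2 : -1);
            default:
                return -1;
        }
    }

    // Reads the stream one UTF-16 code unit at a time; memory use is bounded
    // by the reader's buffer, not by the input size.
    static boolean matches(InputStream in) throws IOException {
        try (Reader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            int state = 0;
            int ch;
            while ((ch = reader.read()) != -1) {
                state = step(state, ch);
                if (state == -1) {
                    return false; // fail fast without reading the rest
                }
            }
            return state == 1;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "10,20,30".getBytes(StandardCharsets.UTF_8);
        System.out.println(matches(new ByteArrayInputStream(data)));
    }
}
```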