-
Notifications
You must be signed in to change notification settings - Fork 82
Encode column names with encoding of context #210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Encode column names with encoding of context #210
Conversation
contrib/ruby/ext/trilogy-ruby/cext.c
Outdated
| } | ||
| } | ||
|
|
||
| rb_encoding *conn_enc = rb_to_encoding(ctx->encoding); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rb_to_encoding is quite costly. We probably should it only once after connect or something, and keep a rb_encoding * in ctx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least, it should be moved outside the loop, so that if you have 20 fields you don't do the conversion 20 times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I moved the call to rb_to_encoding to the rb_trilogy_connect method and store it in rb_encoding *conn_encoding in the ctx. I also replaced the call to the rb_to_encoding method in rb_trilogy_query to use this value in the ctx struct. Let me know if you'd need any other adaptions for this.
contrib/ruby/ext/trilogy-ruby/cext.c
Outdated
|
|
||
| rb_encoding *conn_enc = rb_to_encoding(ctx->encoding); | ||
|
|
||
| #ifdef HAVE_RB_INTERNED_STR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be updated to HAVE_RB_ENC_INTERNED_STR and extconf.rb should also be updated accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I changed this to use HAVE_RB_ENC_INTERNED_STR and added the corresponding have_func to extconf.rb
contrib/ruby/ext/trilogy-ruby/cext.c
Outdated
| trilogy_handshake_t handshake; | ||
| VALUE val; | ||
|
|
||
| RB_OBJ_WRITE(self, &ctx->encoding, encoding); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| RB_OBJ_WRITE(self, &ctx->encoding, encoding); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're keeping both. I think we can get rid of the reference to encoding. Just turn encoding into a rb_encoding *, so we don't need to mark it or antyhing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I remove encoding from the ctx (and also from the trilogy_ctx struct) alltogether?
It's only used in the RB_OBJ_WRITE call (which, if I understand correctly, writes the encoding value to the ctx->encoding) and in the rb_to_encoding call, which probably makes storing this obsolete as we have the rb_encoding pointer in the ctx struct which can be used to encode values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I'm suggesting yes. That member is no longer used, so no point keeping it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, didn't see your other comment in the conversation tab 😅 I applied these changes, now encoding in the ctx struct is the rb_encoding * value which can be directly used :)
|
Looks good to me now, but I'm not a maintainer so I can't merge. |
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
|
@byroot Great, thanks for reviewing, in that case let's wait for someone from the maintainers to review this as well :) |
|
Makes sense to me. It's still not quite the same as mysql2, which uses the default internal encoding, but that might also be fine—we are not obligated to match mysql2 behavior. |
Not matching the exact mysql2 behaviour is totally fine, I only mentioned this because it broke some code that relied on non-ascii encoding of column names 😅 Let me know if you need me to change anything else for you to be able to accept this PR 😄 |
|
May we place a friendly reminder to consider this pull request? Many thanks for your work. |
We detected some behaviour which might lead to some unexpected failures when migrating from the
mysql2Adapter to thetrilogyadapter.In the result set returned by a query, one can access the column names using the
fieldsattribute, however the encoding of these strings is different from the encoding of the database.Behaviour in mysql2:
When a query is executed via the database adapter (or rather its client), the column names (in the attribute
fieldsof the result) respect the encoding set for the database connection. Running a query on a database withencoding: utf8mb4specified in the client creation respects this encoding for the column names, returning the strings withEncoding:UTF-8encoding.Behaviour in trilogy:
Running the same query on the same database using the trilogy adapter returns the column names encoded with
Encoding:US-ASCII, ignoring the database encoding.This might lead to some unexpected encoding issues when the client uses the names from the query to further process the data.
Tests:
Behaviour with
mysql2Adapter:Result:
UTF-8, UTF-8, UTF-8, UTF-8, UTF-8, UTF-8, UTF-8Behaviour with
trilogy2.9.0:Result:
US-ASCII, US-ASCII, US-ASCII, US-ASCII, US-ASCII, US-ASCII, US-ASCIIBehaviour after patch:
Result with patch:
UTF-8, UTF-8, UTF-8, UTF-8, UTF-8, UTF-8, UTF-8With this patch, trilogy returns the strings in the same encoding as the database, which matches the behaviour of the previous used mysql2 adapter.
I am not sure whether simply returning the strings in ASCII encoding is a deliberate choice or not, therefore feel free to accept or reject this PR, both is fine, I wanted to bring this to attention.
Let me know if you need any other informations, and I'd appreciate someone looking at my usage of Ruby C-API methods to ensure everything is used in its intended way, as I have only limited experience with the C-API.
The CI passed in my fork, see here and here.
Have a pleasant day!