-
Notifications
You must be signed in to change notification settings - Fork 1
Fix o2m id field handling rebased3 #141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @bosd, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly strengthens the data import capabilities by introducing advanced pre-processing and error recovery mechanisms. It addresses common data quality issues, such as incorrect field types and encoding problems in source CSVs, by automatically correcting them or providing robust fallbacks. The changes aim to make the import process more resilient to imperfect input data, reducing manual intervention and improving the success rate of complex data synchronizations with Odoo. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces significant improvements to the data import process, focusing on robustness and correcting the handling of relational fields. Key changes include:
- Proactive data type conversion for numeric fields to prevent common Odoo errors.
- A more robust method for reading CSV files with fallback encoding support.
- A complete fix for the
write_tuplestrategy for many-to-many relationships, which was previously using incorrect logic. - Improved resolution of external IDs, now supporting both database IDs and XML IDs more efficiently.
The changes are well-structured and address important issues. However, I've identified a critical memory regression in the new file reading logic that could impact performance with large files. I've also found a new, potentially useful function that appears to be uncalled. My detailed comments are below.
- Add automatic type correction preflight check for integer fields - Fix batch processing scaling logic for memory/gateway errors - Ensure successful imports are properly aggregated in id_map - Improve error handling and fallback mechanisms
26f240b to
96ad746
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces significant improvements to the data import functionality, focusing on robustness and correctness. Key changes include a major refactoring of CSV file reading to handle various encodings and fix a bug in skipping header rows. It proactively converts data types before sending them to Odoo, which should prevent many common import errors. The logic for handling many-to-many relationships and resolving external IDs has been rewritten to be more correct, performant, and robust. A new preflight check for auto-correcting data types is also a great addition. My review includes a suggestion to fix a bug in the new error handling logic to prevent double-processing of failed data chunks.
- Fixed undefined variable 'clean_error' in _execute_load_batch function - Fixed batch processing aggregation logic to properly collect successful records - Added type_correction_check preflight function with comprehensive tests - Fixed manual fallback implementation to use _handle_fallback_create helper - Added missing continue statement to prevent double-processing - Added 4 new tests for preflight functionality (+2.17% coverage improvement) - All syntax and type errors resolved - All 75 core tests passing (0 regressions)
…age check - Added 4 tests for type_correction_check preflight functionality (+2.17% coverage) - Added test for language check error handling when _get_installed_languages fails - Fixed syntax errors from pre-commit changes in _execute_load_batch function - Improved overall coverage from 78.57% to 80.83% (+2.26 percentage points) - All 47 preflight tests now pass (previously 31 passed) - All 30 import_threaded tests still pass (no regressions) - Fixed manual fallback implementation bug from review comment
- Added comprehensive tests for type_correction_check preflight function (+2.17% coverage) - Added tests for language check error handling scenarios - Improved overall coverage from 78.57% to 80.87% (+2.30 percentage points) - All core functionality now properly tested - Fixed syntax errors and undefined variable issues - All 47 preflight tests and 30 import_threaded tests passing
6feb4c9 to
85ec927
Compare
85ec927 to
fbfde07
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces substantial improvements to the data import process, focusing on robustness and error handling. Key changes include a sophisticated file reading mechanism with encoding fallbacks, proactive data type correction for numeric fields, and a more efficient and correct method for resolving relational IDs. The code is also significantly refactored into smaller, more manageable helper functions, and test coverage has been greatly increased. My review includes a couple of suggestions to address a potential regression in ID sanitization and a minor code redundancy.
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a significant and impressive set of improvements to the data import process. The refactoring greatly enhances robustness, correctness, and maintainability. Key highlights include the new pre-flight type correction, safer value conversion to prevent server-side errors, and a complete overhaul of relational ID handling to correctly process both database IDs and various XML ID formats. The addition of comprehensive, focused unit tests is also a major step forward. I've found a critical issue related to batch processing logic and a couple of areas for further improvement, detailed in the comments.
| elif isinstance(id_val, str) and len(id_val) > 0: | ||
| # It's a non-empty string that's not purely numeric - treat as XML ID | ||
| xml_ids.append(id_val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new logic for resolving IDs is a great improvement for handling both database IDs and XML IDs. However, there seems to be a potential regression. The previous implementation handled full XML IDs with module prefixes (e.g., base.res_partner_1) by splitting the string. This new logic treats the entire string as the XML ID name.
If the source data might contain full XML IDs with module prefixes, the lookup in ir.model.data will fail because it only stores the identifier part in the name field. It would be more robust to strip the module prefix if it exists.
| elif isinstance(id_val, str) and len(id_val) > 0: | |
| # It's a non-empty string that's not purely numeric - treat as XML ID | |
| xml_ids.append(id_val) | |
| elif isinstance(id_val, str) and len(id_val) > 0: | |
| # It's a non-empty string that's not purely numeric - treat as XML ID. | |
| # Strip module prefix if present, as ir.model.data stores the name without it. | |
| xml_ids.append(id_val.split(".", 1)[-1]) |
| except IndexError: | ||
| # Handle tuple index out of range errors specifically in load operations | ||
| log.warning( | ||
| "Tuple index out of range error detected, falling back to individual " | ||
| "record processing" | ||
| ) | ||
| progress.console.print( | ||
| "[yellow]WARN:[/] Tuple index out of range error, falling back to " | ||
| "individual record processing" | ||
| ) | ||
| _handle_fallback_create( | ||
| model, | ||
| current_chunk, | ||
| batch_header, | ||
| uid_index, | ||
| context, | ||
| ignore_list, | ||
| progress, | ||
| aggregated_id_map, | ||
| aggregated_failed_lines, | ||
| batch_number, | ||
| error_message="type conversion error", | ||
| ) | ||
| lines_to_process = lines_to_process[chunk_size:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic to handle IndexError (specifically 'tuple index out of range') is duplicated here and later in the generic except Exception block at line 1142. While this might be intentional to catch the error from different sources (local vs. RPC), it can be a bit confusing.
Since both blocks trigger the same _handle_fallback_create logic, consider adding a brief comment explaining why two separate handlers are necessary. This would improve clarity for future maintenance.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
b06d4df to
4b67f29
Compare
4b67f29 to
f90aff9
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is a substantial and valuable pull request that significantly refactors and hardens the import process. The introduction of helper functions like _parse_csv_data, _safe_convert_field_value, and _handle_fallback_create greatly improves code clarity and maintainability. The more robust error handling, especially for different file encodings and common Odoo RPC errors like 'tuple index out of range', is a major improvement that will make the import tool much more reliable. The addition of a preflight check for data type correction is also a fantastic feature. Finally, the extensive suite of new tests demonstrates a strong commitment to quality. I've left a couple of minor comments on potential issues, but overall this is excellent work.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a major refactoring to improve the robustness and error handling of the data import process. Key improvements include:
- Enhanced CSV Reading: The file reading logic now gracefully handles
UnicodeDecodeErrorby trying a series of fallback encodings. - Proactive Type Conversion: A new
_safe_convert_field_valuefunction and a pre-flighttype_correction_checkhave been added to automatically handle common data type mismatches (e.g., converting "1.0" to an integer), preventing many server-side errors. - Flexible Relational ID Handling: The logic for resolving relational fields (
/idcolumns) has been significantly improved to support both database IDs (integers) and XML IDs (strings), making it much more versatile. - Improved Error Handling: Error handling for "tuple index out of range" errors is now centralized and provides more user-friendly feedback. The fallback logic for batch failures has been refactored into a clean helper function, reducing code duplication.
- Code Modularity: The codebase has been broken down into smaller, more focused helper functions, improving readability and maintainability.
- Increased Test Coverage: A substantial number of new tests have been added to cover the new functionality and edge cases.
Overall, this is a very strong pull request that significantly hardens the import engine. I have one suggestion to further improve the robustness of the CSV reading logic for relational imports.
* Update import_threaded.py * Update src/odoo_data_flow/import_threaded.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update src/odoo_data_flow/import_threaded.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update src/odoo_data_flow/import_threaded.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: bosd <5e2fd43-d292-4c90-9d1f-74ff3436329a@anonaddy.me> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update test_import_threaded.py * Refactor: Use list comprehension for XML ID sanitization - Replace verbose nested loops with concise list comprehension for sanitizing unique ID fields - Makes the code more Pythonic and readable while maintaining the same functionality - Applies to_xmlid sanitization to unique ID field values when model has no _fields attribute Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * Update src/odoo_data_flow/import_threaded.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Fixups * Update src/odoo_data_flow/import_threaded.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update src/odoo_data_flow/import_threaded.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: bosd <5e2fd43-d292-4c90-9d1f-74ff3436329a@anonaddy.me>
…ataFlow/odoo-data-flow into fix-o2m-id-field-handling-rebased3
…ed comprehensive tests to improve coverage of import_threaded.py from ~53% to 79%\n- Added targeted tests to improve coverage of importer.py from ~73% to 80%\n- Fixed mypy type annotation issues in new test files\n- All tests pass (564/564)\n- Project coverage now at 88.70% (above 85% target)\n- Mypy sessions now pass across all Python versions (3.9-3.13)
- Added comprehensive tests to improve coverage of import_threaded.py from ~53% to 79% - Added targeted tests to improve coverage of importer.py from ~73% to 80% - Fixed mypy type annotation issues in new test files - Fixed ruff unused variable warnings (prefixed with _) - All tests pass (564/564) - Project coverage now at 88.70% (above 85% target) - Mypy sessions now pass across all Python versions (3.9-3.13)
- Added comprehensive tests for error handling functions (_handle_create_error, _format_odoo_error, _handle_tuple_index_error) - Added extensive tests for data processing functions (_safe_convert_field_value, _read_data_file, _parse_csv_data) - Added tests for batch processing functions (_create_batch_individually, _execute_load_batch, _recursive_create_batches) - Added tests for utility functions (_get_model_fields, _filter_ignored_columns, _setup_fail_file) - Added tests for orchestration functions (_orchestrate_pass_1, _orchestrate_pass_2, _run_threaded_pass) - Coverage improved from ~53% to 78% for import_threaded.py - All 594 tests pass - Overall project coverage at 89.18% (above 85% target)
- Added missing type annotations to resolve mypy errors - Fixed assertion in test_safe_convert_field_value_edge_cases to allow None values - All mypy checks now pass - All 594 tests still pass - Coverage remains at 89.18%
- Moved all coverage improvement tests from separate files into existing test files: * tests/test_import_threaded_additional_coverage.py → tests/test_import_threaded.py * tests/test_import_threaded_coverage_improvement.py → tests/test_import_threaded_edge_cases.py * tests/test_import_threaded_targeted_coverage.py → tests/test_import_threaded_focused.py * tests/test_importer_coverage.py → tests/test_importer.py - Improved organization by grouping related tests together - All 594 tests still pass - All mypy checks pass - Project coverage remains at 89.18%
- Added missing import polars as pl statement - Fixed duplicate test function definitions that caused type errors - Removed orphaned code that caused syntax errors - Added assertion to clarify to mypy that source_df is not None after CSV reading - All mypy checks now pass (35 source files with no issues) - All 590 tests still pass - Coverage remains at 89.18%
- Fixed mypy type errors in test files by adding proper type annotations - Fixed failing tests by correcting assertions and mock return values - Added missing polars import to enable CSV reading functionality - Fixed tuple unpacking issues in test assertions - Fixed encoding fallback logic in CSV reading - All 569 tests now pass - Mypy passes on all 84 source files - Coverage remains at 86.41% (above 85% target) - Pre-commit style checks pass
b2e031f to
5031438
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a significant and valuable refactoring of the data import logic. The changes make the import process more robust by improving error handling for data types and encodings, enhancing the flexibility of relational ID resolution, and adding proactive data correction checks. The introduction of helper functions and centralization of logic for error handling and batch fallbacks greatly improves code clarity and maintainability. The addition of extensive tests is also a major plus. I've identified a couple of minor areas for improvement regarding code clarity and logging precision, but overall, this is an excellent set of changes.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces significant improvements to the data import functionality, particularly around error handling, data type conversion, and relational field processing. The refactoring of _read_data_file to handle encoding fallbacks and the introduction of helper functions like _get_model_fields, _safe_convert_field_value, and _handle_fallback_create greatly enhance the robustness and maintainability of the code. The addition of a pre-flight check for type correction is also a valuable feature.
I've identified a few areas for improvement, mainly concerning the simplification of data conversion logic and fixing a bug in ID resolution. Overall, this is a high-quality contribution that addresses important issues in the import process.
| elif field_type == "float": | ||
| try: | ||
| # Convert numeric strings to float | ||
| if str_value.replace(".", "").replace("-", "").isdigit(): | ||
| return float(str_value) | ||
| else: | ||
| # Non-numeric string - leave as-is | ||
| return str_value | ||
| except (ValueError, TypeError): | ||
| # Conversion failed - leave as original string | ||
| return field_value | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for converting a string to a float is overly complex and potentially buggy. The check str_value.replace(".", "").replace("-", "").isdigit() can be fooled by strings with multiple dots or misplaced minus signs (e.g., "1.2.3" or "1--2"), which would then cause float() to raise a ValueError. A more direct and Pythonic approach is to just try the conversion and handle the exception.
elif field_type == "float":
try:
# Attempt to convert to float directly. This is more robust.
return float(str_value)
except (ValueError, TypeError):
# If conversion fails, return the original value to let Odoo handle it.
return field_value| else: | ||
| log.info(f"Load operation returned {msg_type}: {msg_text}") | ||
|
|
||
| res["messages"][0].get("message", "Batch load failed.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for id_val in id_list: | ||
| if isinstance(id_val, str) and id_val.isdigit(): | ||
| # It's a numeric database ID | ||
| db_ids.append(int(id_val)) | ||
| elif isinstance(id_val, str) and len(id_val) > 0: | ||
| # It's a non-empty string that's not purely numeric - treat as XML ID | ||
| xml_ids.append(id_val) | ||
| else: | ||
| # Empty or None values | ||
| invalid_ids.append(id_val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic to separate database IDs from XML IDs does not handle integer values from the external_ids series. If id_list contains integers, they will fall into the else case and be treated as invalid IDs, which is incorrect. The logic should be updated to correctly handle integer database IDs.
for id_val in id_list:
if isinstance(id_val, int):
db_ids.append(id_val)
elif isinstance(id_val, str):
if id_val.isdigit():
db_ids.append(int(id_val))
elif len(id_val) > 0:
xml_ids.append(id_val)
else: # empty string
invalid_ids.append(id_val)
else:
# Other types like float, or None which was already dropped
invalid_ids.append(id_val)| if field_type in ("integer", "positive", "negative"): | ||
| try: | ||
| # Handle float strings like "1.0", "2.0" by converting to int | ||
| if "." in str_value: | ||
| float_val = float(str_value) | ||
| if float_val.is_integer(): | ||
| return int(float_val) | ||
| else: | ||
| # Non-integer float - leave as-is to let Odoo handle it | ||
| return str_value | ||
| elif str_value.lstrip("+-").isdigit(): | ||
| # Integer string like "1", "-5", or "+5" | ||
| return int(str_value) | ||
| else: | ||
| # Non-numeric string - leave as-is | ||
| return str_value | ||
| except (ValueError, TypeError): | ||
| # Conversion failed - leave as original string to let Odoo handle it | ||
| return field_value | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for integer conversion is more complex than necessary. The elif/else structure to check if a string is a digit before converting is redundant. A try-except block around the conversion is more Pythonic and handles all cases more cleanly and robustly.
try:
# Handle float strings like "1.0", "2.0" by converting to int
if "." in str_value:
float_val = float(str_value)
if float_val.is_integer():
return int(float_val)
# It's a non-integer float, return as string for Odoo to handle.
return str_value
# For other cases, attempt direct conversion to int.
return int(str_value)
except (ValueError, TypeError):
# If any conversion fails, return the original value.
return field_value
No description provided.