Delta Kernel Draft PR #729

Open: wants to merge 17 commits into base: main

vaibhavk1992
Contributor

Important Read

  • Please ensure the GitHub issue is mentioned at the beginning of the PR

What is the purpose of the pull request

(For example: This pull request implements the sync for delta format.)

Brief change log

(for example:)

  • Fixed JSON parsing error when persisting state
  • Added unit tests for schema evolution

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added TestConversionController to verify the change.
  • Manually verified the change by running a job locally.

@vaibhavk1992 marked this pull request as draft June 30, 2025 10:42
@vaibhavk1992 marked this pull request as ready for review June 30, 2025 15:40
@@ -53,7 +53,7 @@
 <module>xtable-utilities</module>
 <module>xtable-aws</module>
 <module>xtable-hive-metastore</module>
-<module>xtable-service</module>
+<!-- <module>xtable-service</module>-->
Contributor

This should be added back. Is there any reason why you had to comment it out?

Contributor Author

Yes, xtable-service was causing build failures. Since xtable-service is independent of the Delta Kernel changes, can we just review the Delta Kernel changes for now? I just want to get my changes validated once; the final version of the Delta Kernel changes will include the xtable-service module anyway.

image

Contributor

@vaibhavk1992 I think if you rebase onto the latest main branch, you shouldn't see those failures.

<dependency>
  <groupId>io.delta</groupId>
  <artifactId>delta-kernel-api</artifactId>
  <version>4.0.0</version>
Contributor

Can you add a property in the root pom called <delta.kernel.version>4.0.0</delta.kernel.version> and reference it here, instead of using the hardcoded value?

Also, I'm curious how you ended up choosing this Delta Kernel version. Is there a specific version that needs to align with the Delta Lake version we have in the repo?

public class DeltaKernelConversionSourceProvider extends ConversionSourceProvider<Long> {
  @Override
  public DeltaKernelConversionSource getConversionSourceInstance(SourceTable sourceTable) {
    Configuration hadoopConf = new Configuration();
Contributor

Is there any reason why you are creating a new hadoopConf? Can you instead use the hadoopConf from the parent class, similar to what DeltaConversionSourceProvider does?
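
A minimal sketch of what this suggestion could look like, assuming the parent provider keeps the configuration it was initialized with in a protected hadoopConf field. The Configuration class, the init method, and the String parameter of getConversionSourceInstance are simplified stand-ins introduced here so the example compiles without Hadoop or XTable on the classpath; they are not the project's actual types.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for org.apache.hadoop.conf.Configuration (assumption, keeps the sketch dependency-free).
class Configuration {
  private final Map<String, String> values = new HashMap<>();

  void set(String key, String value) {
    values.put(key, value);
  }

  String get(String key) {
    return values.get(key);
  }
}

// Simplified stand-in for the parent provider: the caller hands over its configuration once.
abstract class ConversionSourceProvider<C> {
  protected Configuration hadoopConf;

  void init(Configuration hadoopConf) {
    this.hadoopConf = hadoopConf;
  }
}

// Hypothetical, trimmed-down source that only records the configuration it was given.
class DeltaKernelConversionSource {
  final String basePath;
  final Configuration hadoopConf;

  DeltaKernelConversionSource(String basePath, Configuration hadoopConf) {
    this.basePath = basePath;
    this.hadoopConf = hadoopConf;
  }
}

class DeltaKernelConversionSourceProvider extends ConversionSourceProvider<Long> {
  DeltaKernelConversionSource getConversionSourceInstance(String basePath) {
    // Reuse the inherited configuration instead of calling new Configuration(),
    // so settings supplied by the caller are not silently dropped.
    return new DeltaKernelConversionSource(basePath, hadoopConf);
  }
}
```

With this shape, whatever the caller set on the configuration before init (filesystem settings, for example) reaches the source unchanged, whereas a freshly constructed configuration would only carry defaults.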

    return INSTANCE;
  }

  public InternalSchema toInternalSchema_v2(StructType structType) {
Contributor

I think we can just call this toInternalSchema, since it's in its own distinct class, right?

// Get schema from Delta Kernel's snapshot
io.delta.kernel.types.StructType schema = snapshot.getSchema();

System.out.println("Kernelschema: " + schema);
Contributor

@rahil-c rahil-c Jul 1, 2025

[nit] We'll need to remove these in the final version of the PR.

* Converts between Delta and InternalTable schemas. Some items to be aware of:
*
* <ul>
* <li>Delta schemas are represented as Spark StructTypes which do not have enums so the enum
Contributor

I think we can leave this file as is, right?

import org.apache.xtable.spi.extractor.ConversionSource;

@Builder
public class DeltaKernelConversionSource implements ConversionSource<Long> {
Contributor

We will need full implementation of all the interface methods, otherwise this will fail during the table format sync. Can you refer to the impl for DeltaConversionSource for these methods?
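
As a rough illustration of "every interface method implemented", here is a toy source keyed by table version. MiniConversionSource is a hypothetical, heavily reduced stand-in written for this sketch; the real org.apache.xtable.spi.extractor.ConversionSource has more methods and richer parameter and return types, so the existing DeltaConversionSource remains the reference to follow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced stand-in for the conversion-source contract (assumption: the
// method names and String/List types here are simplified for illustration only).
interface MiniConversionSource<C> {
  String getCurrentTable();

  List<C> getCommitsBacklog(C lastSyncedCommit);

  String getCommitIdentifier(C commit);
}

// Toy implementation keyed by table version, with every contract method implemented.
class InMemoryKernelSource implements MiniConversionSource<Long> {
  private final String tableName;
  private final long latestVersion;

  InMemoryKernelSource(String tableName, long latestVersion) {
    this.tableName = tableName;
    this.latestVersion = latestVersion;
  }

  @Override
  public String getCurrentTable() {
    return tableName + "@v" + latestVersion;
  }

  @Override
  public List<Long> getCommitsBacklog(Long lastSyncedCommit) {
    // Every version after the last synced one still has to be processed.
    List<Long> pending = new ArrayList<>();
    for (long v = lastSyncedCommit + 1; v <= latestVersion; v++) {
      pending.add(v);
    }
    return pending;
  }

  @Override
  public String getCommitIdentifier(Long commit) {
    return String.valueOf(commit);
  }
}
```

A sync loop that calls each of these methods in turn then has nothing left unimplemented to trip over, which is the failure mode the comment warns about.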

@@ -53,7 +53,7 @@
 <module>xtable-utilities</module>
 <module>xtable-aws</module>
 <module>xtable-hive-metastore</module>
-<module>xtable-service</module>
+<!-- <module>xtable-service</module>-->
Contributor

Why is this commented out?

@@ -14,6 +14,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-junit.jupiter.execution.parallel.enabled=true
+junit.jupiter.execution.parallel.enabled=false
Contributor

I have run the tests locally with this set to true and they pass. Can we revert this config and see how the GH build does?

Contributor

@vinishjail97 vinishjail97 left a comment

This is great progress @vaibhavk1992, added some comments.

@@ -713,7 +713,7 @@
</executions>
<configuration>
<skip>${skipUTs}</skip>
<redirectTestOutputToFile>true</redirectTestOutputToFile>
Contributor

Why revert this?

@@ -19,6 +19,6 @@ targetFormats:
   - DELTA
 datasets:
   -
-    tableBasePath: /Desktop/opensource/iceberg/warehouse/demo/nyc/taxis
-    tableDataPath: /Desktop/opensource/iceberg/warehouse/demo/nyc/taxis/data
+    tableBasePath: /Users/vaibhakumar/Desktop/opensource/iceberg/warehouse/demo/nyc/taxis
Contributor

This one can be reverted too?

* limitations under the License.
*/

package org.apache.xtable.delta;
Contributor

Let's move new DeltaKernel* classes to org.apache.xtable.kernel?
Any reason for having kernel in a new package outside delta? We might rename to delta-kernel or something if we keep it outside.

4 participants