Description
Current behavior
When hb reads nested lists with ragged_rank > 1, the resulting Value cannot be converted to a SparseTensor by hb.data.to_sparse.
For example:
dense_feature is one of the features read by hb.data.ParquetDataset, and to_sparse does not work for it.

Moreover, if I swap the order of the two nested_row_splits, to_sparse works.
So maybe the order of the nested_row_splits produced when reading a Parquet file is incorrect?
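To make the ordering question concrete, here is a minimal pure-Python sketch of a consistency check. It assumes the TensorFlow RaggedTensor convention that nested_row_splits are listed from outermost to innermost, so the last entry of each level must equal the number of rows of the next level, and the last entry of the innermost level must equal the number of flat values. The helper name `splits_are_outermost_first` is hypothetical and is not part of HybridBackend.

```python
def splits_are_outermost_first(num_values, nested_row_splits):
    """Check that nested_row_splits are ordered outermost -> innermost.

    Hypothetical helper for illustration only (not a HybridBackend API).
    """
    levels = [list(s) for s in nested_row_splits]
    for i, splits in enumerate(levels):
        # Every row_splits vector starts at 0 and is non-decreasing.
        if not splits or splits[0] != 0:
            return False
        if any(a > b for a, b in zip(splits, splits[1:])):
            return False
        # The last split must cover all rows of the next (inner) level,
        # or all flat values if this is the innermost level.
        if i + 1 < len(levels):
            expected = len(levels[i + 1]) - 1
        else:
            expected = num_values
        if splits[-1] != expected:
            return False
    return True


values = [1, 2, 3, 4, 5]
as_reported = ([0, 1, 3, 4, 5], [0, 2, 4])  # order from the demo in this issue
swapped = ([0, 2, 4], [0, 1, 3, 4, 5])      # order after swapping the two splits

print(splits_are_outermost_first(len(values), as_reported))
print(splits_are_outermost_first(len(values), swapped))
```

Under that assumption, only the swapped order passes the check, which is consistent with the observation that to_sparse works after swapping.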
Expected behavior
The Value read from a Parquet file can be converted to a SparseTensor.
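As a rough illustration of the result I would expect for the simple demo, the sketch below converts flat values plus nested row splits into COO indices and a dense shape in pure Python. It assumes the splits are given outermost first (the swapped order) and mirrors what a ragged-to-sparse conversion conceptually produces; `ragged_to_coo` is a hypothetical name, not the hb.data.to_sparse implementation.

```python
def ragged_to_coo(values, nested_row_splits):
    """Convert flat values + nested row splits (outermost first) to COO form.

    Hypothetical helper for illustration only (not a HybridBackend API).
    Returns (indices, values, dense_shape).
    """
    # flat[i] is the position of value i in the current level's flat element list.
    flat = list(range(len(values)))
    coords = [[] for _ in values]
    dims = []
    # Walk from the innermost level outwards, prepending one coordinate per level.
    for splits in reversed(nested_row_splits):
        row_of = []
        for row in range(len(splits) - 1):
            for pos in range(splits[row], splits[row + 1]):
                row_of.append((row, pos - splits[row]))
        lengths = [splits[r + 1] - splits[r] for r in range(len(splits) - 1)]
        dims.append(max(lengths, default=0))
        for i in range(len(values)):
            row, offset = row_of[flat[i]]
            coords[i].insert(0, offset)
            flat[i] = row
    # The remaining row id is the outermost (batch) coordinate.
    for i in range(len(values)):
        coords[i].insert(0, flat[i])
    dense_shape = [len(nested_row_splits[0]) - 1] + dims[::-1]
    return [tuple(c) for c in coords], list(values), dense_shape


indices, vals, dense_shape = ragged_to_coo(
    [1, 2, 3, 4, 5], ([0, 2, 4], [0, 1, 3, 4, 5]))
print(indices)      # [(0, 0, 0), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
print(dense_shape)  # [2, 2, 2]
```

This corresponds to the ragged value [[[1], [2, 3]], [[4], [5]]].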
System information
- GPU model and memory: No
- OS Platform: Ubuntu
- Docker version: No
- GCC/CUDA/cuDNN version: 7.4/No/No
- Python/conda version: 3.6.13/4.13.0
- TensorFlow/PyTorch version: 1.14.0
Code to reproduce
import tensorflow as tf
import hybridbackend.tensorflow as hb
dataset = hb.data.ParquetDataset("test2.zstd.parquet", batch_size=1)
dataset = dataset.apply(hb.data.to_sparse())
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
sess = tf.Session()
vals = sess.run(next_element)
# One more simple demo:
import numpy as np
import tensorflow as tf
import hybridbackend.tensorflow as hb
# Same splits order as read from the Parquet file; to_sparse fails on it.
val = hb.data.dataframe.DataFrame.Value(
    values=np.array([1, 2, 3, 4, 5]),
    nested_row_splits=(np.array([0, 1, 3, 4, 5]), np.array([0, 2, 4])))
sess = tf.Session()
sess.run(val.to_sparse())
Willing to contribute
Yes
