Efficient Dataset Loading and Preprocessing using TensorFlow Datasets
Access to standardized, ready-to-use datasets is critical for deep learning research and prototyping. This project demonstrates how to load, inspect, preprocess, and prepare datasets using TensorFlow Datasets (TFDS), enabling quick experimentation without manual dataset handling.
The notebook follows a clear workflow:
- Loading datasets from TFDS – Use tfds.load() to fetch datasets with optional train/test splits.
- Inspecting dataset metadata – View dataset info, features, and label mappings.
- Preprocessing – Apply image resizing, normalization, and type conversion.
- Batching, shuffling, and caching – Build optimized input pipelines for training.
- Visualization – Display sample images with labels for verification.
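The preprocessing step in the workflow above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the function name `preprocess` and the 32x32 target size (`IMG_SIZE`) are assumptions chosen for the example, and the input is a synthetic MNIST-shaped tensor so the snippet runs without downloading anything.

```python
import tensorflow as tf

IMG_SIZE = (32, 32)  # hypothetical target size, not taken from the notebook


def preprocess(image, label):
    """Resize, convert to float32, and scale pixel values to [0, 1]."""
    image = tf.image.resize(image, IMG_SIZE)    # bilinear resize, returns float32
    image = tf.cast(image, tf.float32) / 255.0  # explicit type conversion + normalization
    return image, label


# Synthetic stand-in for one MNIST example (28x28 grayscale, uint8).
dummy_image = tf.constant(255, shape=(28, 28, 1), dtype=tf.uint8)
dummy_label = tf.constant(7, dtype=tf.int64)

out_image, out_label = preprocess(dummy_image, dummy_label)
print(out_image.shape, out_image.dtype)
```

With a dataset loaded using `as_supervised=True`, a function of this shape would typically be applied with `ds.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)`.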
Libraries used in the code:
- TensorFlow – Model compatibility, preprocessing, and pipeline integration.
- TensorFlow Datasets (TFDS) – Dataset loading and metadata management.
- Matplotlib – Visualization of images and labels.
- NumPy – Optional numerical handling.
The notebook uses datasets from TensorFlow Datasets (TFDS); the specific dataset (e.g., MNIST, CIFAR-10) depends on the parameters passed to tfds.load() in the code.
No manual dataset download or external file handling is required.
Requirements:
pip install tensorflow tensorflow-datasets matplotlib numpy
Run the notebook:
jupyter notebook tfds.ipynb
or in JupyterLab:
jupyter lab tfds.ipynb
- Successfully loaded a TFDS dataset with both training and testing splits.
- Visualized sample images with correct labels for dataset validation.
- Built an optimized input pipeline using batching, shuffling, caching, and prefetching.
Example snippet:
import tensorflow as tf
import tensorflow_datasets as tfds
train_ds, test_ds = tfds.load('mnist', split=['train', 'test'], as_supervised=True)
train_ds = train_ds.shuffle(1024).batch(32).prefetch(tf.data.AUTOTUNE)
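The snippet above omits the caching and normalization steps mentioned earlier. One possible way to combine them is sketched here; the helper names `normalize` and `build_pipeline` and the default parameter values are assumptions for illustration, and a synthetic in-memory dataset stands in for the TFDS split so the example runs offline.

```python
import tensorflow as tf


def normalize(image, label):
    """Convert uint8 pixels to float32 in [0, 1]."""
    return tf.cast(image, tf.float32) / 255.0, label


def build_pipeline(ds, batch_size=32, training=True, shuffle_buffer=1024):
    """map -> cache -> shuffle (training only) -> batch -> prefetch."""
    ds = ds.map(normalize, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.cache()  # cache after the deterministic map so it runs once
    if training:
        ds = ds.shuffle(shuffle_buffer)  # reshuffled on every epoch
    ds = ds.batch(batch_size)
    return ds.prefetch(tf.data.AUTOTUNE)  # overlap input work with training


# Synthetic stand-in for a TFDS split: 100 MNIST-shaped examples.
images = tf.zeros((100, 28, 28, 1), dtype=tf.uint8)
labels = tf.zeros((100,), dtype=tf.int64)
raw_ds = tf.data.Dataset.from_tensor_slices((images, labels))

pipeline_ds = build_pipeline(raw_ds)
batch_images, batch_labels = next(iter(pipeline_ds))
print(batch_images.shape, batch_images.dtype)
```

Placing `cache()` before `shuffle()` keeps the cached data in a fixed order while still allowing a fresh shuffle each epoch; for a test split, passing `training=False` skips shuffling.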
Sample dataset visualization:
Image: <tf.Tensor: shape=(28, 28, 1), dtype=uint8>
Label: 7
(Accompanied by a plotted image rendered with Matplotlib in the notebook.)
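A plot of this kind could be produced along the following lines. This is a hedged sketch, not the notebook's plotting code: the helper name `show_samples` and its parameters are hypothetical, and random arrays stand in for real images so the example runs without a dataset.

```python
import matplotlib.pyplot as plt
import numpy as np


def show_samples(images, labels, class_names=None, cols=4):
    """Plot images in a grid, titling each with its label."""
    n = len(images)
    rows = (n + cols - 1) // cols  # ceiling division
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows), squeeze=False)
    for ax in axes.flat:
        ax.axis("off")  # hide ticks, including on unused cells
    for ax, image, label in zip(axes.flat, images, labels):
        # Drop a trailing channel axis of size 1 so grayscale images display.
        ax.imshow(np.squeeze(image), cmap="gray")
        if class_names is not None:
            ax.set_title(class_names[int(label)])
        else:
            ax.set_title(str(int(label)))
    fig.tight_layout()
    return fig


# Random stand-in data shaped like eight MNIST examples.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(8, 28, 28, 1), dtype=np.uint8)
fake_labels = np.arange(8)

fig = show_samples(fake_images, fake_labels)
```

In a notebook, the figure renders inline; elsewhere, call `plt.show()`. Real examples can be pulled from an unbatched TFDS split with something like `tfds.as_numpy(train_ds.take(8))` before being passed to the helper.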
- TFDS provides instant access to a wide variety of datasets with minimal code.
- Integrating TFDS with tf.data transformations (caching, batching, prefetching) helps keep the GPU supplied with data during training.
- Always inspect a dataset visually before training to verify preprocessing correctness.
- TFDS is highly useful for benchmarking and educational purposes.
💡 Some interactive outputs (e.g., plots, widgets) may not display correctly on GitHub. If so, please view this notebook via nbviewer.org for full rendering.
Mehran Asgari
Email: imehranasgari@gmail.com
GitHub: https://github.com/imehranasgari
This project is licensed under the Apache 2.0 License – see the LICENSE file for details.