One Does Not Simply Query a Stream - Project Analysis

Table of Contents

🔍 Project Summary
🚀 Architecture Overview
🔧 Technology Stack
💡 Improvement Suggestions
🛠️ Implementation Plan
📈 Comparison of Implementations
✈️ Kafka Connect JDBC Sink Demo
- Running the Demo

🔍 Project Summary

This project demonstrates different approaches to stream processing using various technologies:

Apache Kafka - Used as the core messaging platform
Kafka Streams - For Java-based stream processing
Apache Flink - For SQL-based stream processing
ksqlDB - For SQL-like stream processing on Kafka
RisingWave - For PostgreSQL-compatible stream processing
TableFlow - For data visualization using Trino and Apache Superset

The project implements a word count example across different stream processing technologies, as well as a flight status tracking system. It uses Docker Compose for containerization and provides a Makefile with emoji-enhanced commands for better user experience.

🚀 Architecture Overview

The architecture consists of several components:

Data Generation - A flight data generator produces events to Kafka
Stream Processing - Multiple implementations (Kafka Streams, Flink, ksqlDB, RisingWave)
Data Storage - SQLite database for storing flight status
Data Visualization - Apache Superset connected to Trino for querying data

🔧 Technology Stack

Messaging: Apache Kafka 3.9.0 (KRaft mode without ZooKeeper)
Stream Processing:
- Kafka Streams 3.5.0 (Java implementation)
- Apache Flink (SQL implementation)
- ksqlDB (SQL-like implementation)
- RisingWave (PostgreSQL-compatible implementation)
Data Storage: SQLite
Data Visualization: Apache Superset + Trino
Build System: Gradle with Kotlin DSL
Containerization: Docker Compose
Documentation: AsciiDoc

💡 Improvement Suggestions

1. Version Alignment

:warning-caption: :warning:

[WARNING]
====
* Kafka version in docker-compose.yml (3.9.0) doesn't match the user's preferred version (3.8.0)
* Kafka Streams version (3.5.0) doesn't match the user's preferred version (3.8.0)
* No explicit Flink version control to ensure 1.20 is used
====

2. Code Structure Improvements

:tip-caption: :bulb:

[TIP]
====
* *Package Structure*: The `MovieProducer.java` uses package `com.example` instead of the project's package structure `ai.startree.dev.query.kafka`
* *Consistent Naming*: Align class names with functionality (e.g., rename `MovieProducer` to `FlightStatusProducer`)
* *TODOs Cleanup*: Remove TODOs like "TODO versions" in build.gradle.kts and "TODO - configurable (?)" in WordCountService.java
====

3. Documentation Enhancements

:tip-caption: :bulb:

[TIP]
====
* Create a comprehensive root README.adoc that explains:
  ** Project overview and purpose
  ** How the different implementations relate to each other
  ** Step-by-step guide to run the complete project
  ** Architecture diagram showing data flow
* Add diagrams using PlantUML or Mermaid for better visualization
====

4. Build System Improvements

:tip-caption: :bulb:

[TIP]
====
* Update Gradle to use Kotlin DSL consistently across all modules
* Add version catalogs for dependency management
* Implement a root Gradle project with subprojects for each implementation
* Align all dependency versions across modules
====

5. Testing Enhancements

:tip-caption: :bulb:

[TIP]
====
* Add comprehensive unit tests for all implementations
* Implement integration tests using testcontainers
* Add performance benchmarks to compare different implementations
* Create CI/CD pipeline using GitHub Actions
====

6. Operational Improvements

:tip-caption: :bulb:

[TIP]
====
* Create a unified Makefile at the root level with colorful emoji commands
* Add health checks for all services in docker-compose.yml
* Implement proper logging configuration across all components
* Add monitoring using Prometheus and Grafana
====

7. Feature Enhancements

:tip-caption: :bulb:

[TIP]
====
* Implement a more complex example beyond word count (the flight status is a good start)
* Add a unified web UI to compare results from different implementations
* Implement exactly-once semantics across all implementations
* Add schema registry integration for data consistency
====

🛠️ Implementation Plan

Here’s a suggested implementation plan for these improvements:

:task-caption: :clipboard:

[discrete]
=== Phase 1: Alignment & Cleanup

[task]
* Update Kafka and Kafka Streams versions to 3.8.0
* Fix package structure inconsistencies
* Clean up TODOs
* Create comprehensive root README.adoc

[discrete]
=== Phase 2: Build & Test Improvements

[task]
* Implement root Gradle project with subprojects
* Add version catalogs
* Enhance unit and integration tests
* Set up GitHub Actions CI/CD

[discrete]
=== Phase 3: Operational & Feature Enhancements

[task]
* Create unified Makefile with emoji commands
* Add monitoring and health checks
* Implement more complex examples
* Create unified web UI

📈 Comparison of Implementations

Technology	Strengths	Weaknesses	Best Use Cases
Kafka Streams	Java API, stateful processing, exactly-once semantics	Java-only, steeper learning curve	Complex event processing, stateful applications
Apache Flink	SQL interface, powerful windowing, batch+stream	More complex deployment	Complex analytics, large-scale processing
ksqlDB	SQL-like syntax, easy to use, interactive	Limited advanced features	Simple analytics, stream-table joins
RisingWave	PostgreSQL compatibility, materialized views	Newer technology, less mature	SQL-centric teams, materialized views
Kafka Connect	Easy integration with external systems, declarative configuration	Limited transformation capabilities	Data ingestion/egress, CDC, database integration

✈️ Kafka Connect JDBC Sink Demo

The project includes a Kafka Connect demo that shows how to use Kafka Connect with a JDBC Sink connector to stream flight status data from a Kafka topic to a SQLite database. The demo uses Confluent Cloud for Kafka and Schema Registry.

Running the Demo

To run the Kafka Connect demo:

Navigate to the project root directory
Configure your Confluent Cloud credentials in kafka-connect/cloud.properties
Run the setup command:
```
make setup
```
Verify the setup by querying the flight status data:
```
make query
```

For detailed instructions, refer to the Kafka Connect Demo README.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.github/workflows		.github/workflows
flink		flink
kafka-connect		kafka-connect
kafka-streams		kafka-streams
kwack		kwack
risingwave		risingwave
tableflow		tableflow
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.adoc		README.adoc
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

One Does Not Simply Query a Stream - Project Analysis

🔍 Project Summary

🚀 Architecture Overview

🔧 Technology Stack

💡 Improvement Suggestions

1. Version Alignment

2. Code Structure Improvements

3. Documentation Enhancements

4. Build System Improvements

5. Testing Enhancements

6. Operational Improvements

7. Feature Enhancements

🛠️ Implementation Plan

📈 Comparison of Implementations

✈️ Kafka Connect JDBC Sink Demo

Running the Demo

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

gAmUssA/one-does-not-simply-query-a-stream

Folders and files

Latest commit

History

Repository files navigation

One Does Not Simply Query a Stream - Project Analysis

🔍 Project Summary

🚀 Architecture Overview

🔧 Technology Stack

💡 Improvement Suggestions

1. Version Alignment

2. Code Structure Improvements

3. Documentation Enhancements

4. Build System Improvements

5. Testing Enhancements

6. Operational Improvements

7. Feature Enhancements

🛠️ Implementation Plan

📈 Comparison of Implementations

✈️ Kafka Connect JDBC Sink Demo

Running the Demo

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages