| 1 | +# Kafka PubSub |
| 2 | + |
| 3 | +The Kafka implementation of the Bullet PubSub uses [Apache Kafka](https://kafka.apache.org) as the backing PubSub queue. It can be used with any Backend and Web Service. |
| 4 | + |
| 5 | +## How does it work? |
| 6 | + |
| 7 | +The implementation by default asks you to create two topics in a Kafka cluster - one for queries and another for results. The Web Service publishes queries to the queries topic and reads results from the results topic. Similarly, the Backend reads queries from the queries topic and writes results to the results topic. All messages are sent as [PubSubMessages](architecture.md#messages). |
| 8 | + |
| 9 | +You do not need to have two topics. You can use a single topic, but it should then have multiple partitions, and you must configure your Web Service and Backend to produce to and consume from the right partitions. See the [Partitions](#partitions) section for more details. |
| 10 | + |
| 11 | +!!! note "Kafka Client API" |
| 12 | + |
| 13 | + The Bullet Kafka implementation uses the Kafka 0.10.2 client APIs. Generally, Kafka's forward and backward compatibility between clients and brokers should work as expected. |
| 14 | + |
| 15 | +## Setup |
| 16 | + |
| 17 | +Before you begin, you will need a running Kafka cluster with your topic(s) created. This cluster need only be a couple of machines, although the right size depends on your query and result volumes. Generally, these are at most a few hundred to a few thousand messages per second, and a small Kafka cluster will suffice. |
| 18 | + |
| 19 | +### Plug into the Backend |
| 20 | + |
| 21 | +Depending on how your Backend is built, either add Bullet Kafka to your classpath or include it in your build tool. Head over to our [releases page](../releases.md#bullet-kafka) to get the artifacts. If you're adding Bullet Kafka to the classpath instead of building a fat jar, get the jar with the ```fat``` classifier, since you will need Bullet Kafka and all its dependencies. |
| 22 | + |
| 23 | +Configure the Backend to use the Kafka PubSub: |
| 24 | + |
| 25 | +```yaml |
| 26 | +bullet.pubsub.context.name: "QUERY_PROCESSING" |
| 27 | +bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub" |
| 28 | +bullet.pubsub.kafka.bootstrap.servers: "server1:port1,server2:port2,..." |
| 29 | +bullet.pubsub.kafka.request.topic.name: "your-query-topic" |
| 30 | +bullet.pubsub.kafka.response.topic.name: "your-result-topic" |
| 31 | +``` |
| 32 | + |
| 33 | +You will then need to configure the Publishers and Subscribers. For details on what to configure and what the defaults are, see the [configuration file](https://github.com/yahoo/bullet-kafka/blob/master/src/main/resources/bullet_kafka_defaults.yaml). |
| 34 | + |
| 35 | +### Plug into the Web Service |
| 36 | + |
| 37 | +Head over to our [releases page](../releases.md#bullet-kafka) and get the JAR artifact with the ```fat``` classifier. For example, you can download the artifact for the 0.2.0 release [directly from JCenter](http://jcenter.bintray.com/com/yahoo/bullet/bullet-kafka/0.2.0/). |
| 38 | + |
| 39 | +You should then plug in this JAR to your Web Service following the instructions [here](../ws/setup.md#launch). |
| 40 | + |
| 41 | +For configuration, you should [follow the steps here](../ws/setup.md#pubsub-configuration) to create and provide a YAML file to the Web Service. Remember to change the context to ```QUERY_SUBMISSION```. |
| 42 | + |
| 43 | +```yaml |
| 44 | +bullet.pubsub.context.name: "QUERY_SUBMISSION" |
| 45 | +bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub" |
| 46 | +bullet.pubsub.kafka.bootstrap.servers: "server1:port1,server2:port2,..." |
| 47 | +bullet.pubsub.kafka.request.topic.name: "your-query-topic" |
| 48 | +bullet.pubsub.kafka.response.topic.name: "your-result-topic" |
| 49 | +``` |
| 50 | + |
| 51 | +As with the Backend, you will then need to configure the Publishers and Subscribers. See the [configuration file](https://github.com/yahoo/bullet-kafka/blob/master/src/main/resources/bullet_kafka_defaults.yaml). Remember that the Subscribers in your Backend read what the Publishers in your Web Service write, and vice versa, so make sure the topics and settings match on both sides if you make any custom changes. |
| 52 | + |
| 53 | +## Passthrough Configuration |
| 54 | + |
| 55 | +You can pass additional Kafka Producer or Consumer properties to the PubSub Publishers and Subscribers by prefixing them with either ```bullet.pubsub.kafka.producer.``` for Producers or ```bullet.pubsub.kafka.consumer.``` for Consumers. The PubSub configuration provides defaults for a few settings that it considers important to manage. You can tweak these and add others. For a list of properties that you can configure, see the [Producer](https://kafka.apache.org/0102/documentation.html#producerconfigs) or [Consumer](https://kafka.apache.org/0102/documentation.html#newconsumerconfigs) configs in Kafka. |
| 56 | + |
| 57 | +!!! note "Types for the properties" |
| 58 | + |
| 59 | + All Kafka properties are better off specified as Strings since Kafka type casts them accordingly. If you provide types, you might run into issues where YAML types do not match what the Kafka client is expecting. |
| 60 | + |
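As a rough sketch of the passthrough prefixes described above, the fragment below forwards a few standard Kafka 0.10.2 client properties. The property names after each prefix (`acks`, `retries`, `max.poll.records`, `auto.offset.reset`) are ordinary Kafka settings, but the values are illustrative assumptions, not recommendations or Bullet defaults. Every value is quoted so that it reaches the Kafka client as a String.

```yaml
# Illustrative values only; check the Bullet Kafka defaults file before overriding anything.

# Forwarded to the Kafka Producer used by the Publishers
bullet.pubsub.kafka.producer.acks: "1"
bullet.pubsub.kafka.producer.retries: "3"

# Forwarded to the Kafka Consumer used by the Subscribers
bullet.pubsub.kafka.consumer.max.poll.records: "500"
bullet.pubsub.kafka.consumer.auto.offset.reset: "latest"
```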
| 61 | +## Partitions |
| 62 | + |
| 63 | +You may choose to partition your topics for a couple of reasons: |
| 64 | + |
| 65 | +1. You may have one topic for both queries and responses and use partitions as a way to separate them. |
| 66 | +2. You may use two topics and partition one or both for scalability when reading and writing. |
| 67 | +3. You may use two topics and partition one or both for sharding across multiple Web Service instances (and multiple instances in your Backend). |
| 68 | + |
| 69 | +You can accomplish all this with partition maps. You can configure which partitions your Publishers (Web Service or Backend) write to using ```bullet.pubsub.kafka.request.partitions``` and which partitions your Subscribers read from using ```bullet.pubsub.kafka.response.partitions```. Providing these to an instance of the Web Service or the Backend in the YAML file ensures that the Publishers in that instance only write to these request partitions and the Subscribers only read from these response partitions. The Publishers randomly add one of the response partitions to each message they send, which ensures that responses only arrive at a partition that this instance's Subscribers are waiting on. For more details, see the [configuration file](https://github.com/yahoo/bullet-kafka/blob/master/src/main/resources/bullet_kafka_defaults.yaml). |
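To make the single-topic case concrete, here is a minimal sketch of what one Web Service instance's YAML might look like. It assumes a hypothetical topic named `bullet-pubsub` with four partitions, and it assumes the partition settings take YAML lists of partition numbers; confirm the expected format in the configuration file linked above before relying on it.

```yaml
# Hypothetical single-topic layout: partitions 0-1 carry queries, 2-3 carry results.
bullet.pubsub.context.name: "QUERY_SUBMISSION"
bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub"
bullet.pubsub.kafka.bootstrap.servers: "server1:port1,server2:port2,..."

# The same (assumed) topic name is used for both requests and responses
bullet.pubsub.kafka.request.topic.name: "bullet-pubsub"
bullet.pubsub.kafka.response.topic.name: "bullet-pubsub"

# Assumed list format: Publishers write queries only to partitions 0 and 1
bullet.pubsub.kafka.request.partitions: [0, 1]
# Assumed list format: Subscribers read results only from partitions 2 and 3
bullet.pubsub.kafka.response.partitions: [2, 3]
```

The Backend would need matching settings so that it reads queries from the same request partitions that this instance writes to.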