Commit 222239b

committed: All but Quick Start
1 parent 2aeb481

File tree: 12 files changed (+184, -66 lines)


docs/about/contact.md

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,8 @@ If you have any issues with any of the particular Bullet sub-components, feel fr
 | Web Service | [https://github.com/yahoo/bullet-service/issues](https://github.com/yahoo/bullet-service/issues) |
 | UI | [https://github.com/yahoo/bullet-ui/issues](https://github.com/yahoo/bullet-ui/issues) |
 | Record | [https://github.com/yahoo/bullet-record/issues](https://github.com/yahoo/bullet-record/issues) |
+| Core | [https://github.com/yahoo/bullet-core/issues](https://github.com/yahoo/bullet-core/issues) |
+| Kafka PubSub | [https://github.com/yahoo/bullet-kafka/issues](https://github.com/yahoo/bullet-kafka/issues) |
 | Documentation | [https://github.com/yahoo/bullet-docs/issues](https://github.com/yahoo/bullet-docs/issues) |

 ## Mailing Lists

docs/about/contributing.md

Lines changed: 8 additions & 5 deletions
@@ -6,18 +6,21 @@ We welcome all contributions! We also welcome all usage experiences, stories, an
 Bullet is hosted under the [Yahoo Github Organization](https://github.com/yahoo). In order to contribute to any Yahoo project, you will need to submit a CLA. When you submit a Pull Request to any Bullet repository, a CLABot will ask you to sign the CLA if you haven't signed one already.

+Read the [human-readable summary](https://yahoocla.herokuapp.com/) of the CLA.
+
 ## Future plans

-Here is a list of features we are currently considering/working on. Feel free to [contact us](contact.md) with any ideas/suggestions/PRs for features mentioned here or anything else you think about!
+Here is a selected list of features we are currently considering or working on. Feel free to [contact us](contact.md) with ideas, suggestions, or PRs for the features mentioned here or anything else you think of!

-This list is neither comprehensive nor in any particular order and lists some high level directions.
+This list is neither comprehensive nor in any particular order.

 | Feature | Components | Description | Status |
 |-------------------- | ----------- | ------------------------- | ------------- |
-| PubSub | BE, WS, UI | WS and BE talk through the PubSub. Bullet Storm uses Storm DRPC for this (strictly request-response) Using a pub/sub queue will let us implement Bullet on other Stream Processors, support incremental updates through WebSockets and more! | In Progress [#1](https://github.com/yahoo/bullet-core/pull/1) |
-| Incremental updates | BE, WS, UI | Push results back to users as soon as they arrive. Our aggregations are additive, so progressive results can be streamed back. Micro-batching and other features come into play | In Progress |
+| Incremental updates | BE, WS, UI | Push results back to users during the query lifetime. Micro-batching, windowing, and other features come into play | In Progress |
+| Bullet on Spark | BE | Implement Bullet on Spark Streaming. Compared with SQL on Spark Streaming, which stores data in memory, Bullet will be lightweight | In Progress |
 | Security | WS, UI | The obvious enterprise security for locking down access to the data and the instance of Bullet. Considering SSL, Kerberos, LDAP etc. | Planning |
-| Bullet on X | BE | With the pub/sub feature, Bullet can be implemented on other Stream Processors like Spark Streaming, Flink, Kafka Streaming, Samza etc | Open |
+| Bullet on X | BE | With the pub/sub feature, Bullet can be implemented on other Stream Processors like Flink, Kafka Streaming, Samza etc | Open |
 | SQL API | BE, WS | WS supports an endpoint that converts a SQL-like query into Bullet queries | Open |
 | LocalForage | UI | Migration to LocalForage to distance ourselves from the relatively small LocalStorage space | [#9](https://github.com/yahoo/bullet-ui/issues/9) |
+| Spring Boot Reactor | WS | Migrate the Web Service to use Spring Boot Reactor instead of servlet containers | Open |
 | UI Packaging | UI | Github releases and building from source are the only two options. Docker or something similar may be more apt | Open |

docs/backend/storm-setup.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Bullet on Storm

-This section explains how to set up and run Bullet on Storm. If you're using the Storm DRPC PubSub, refer to [this section](../pubsub/storm-drpc-setup.md) for further details.
+This section explains how to set up and run Bullet on Storm. If you're using the Storm DRPC PubSub, refer to [this section](../pubsub/storm-drpc.md) for further details.

 ## Configuration

docs/index.md

Lines changed: 2 additions & 2 deletions
@@ -48,7 +48,7 @@ To set up Bullet on a real data stream, you need:
 1. Plug in your source of data. See [Getting your data into Bullet](backend/ingestion.md) for details
 2. Consume your data stream
 2. The [Web Service](ws/setup.md) set up to convey queries and return results back from the backend
-3. To choose a [PubSub implementation](pubsub/architecture.md) that connects the Web Service and the Backend. We currently support [Kafka](pubsub/kafka-setup.md) on any Backend and [Storm DRPC](pubsub/storm-drpc-setup.md) for the Storm Backend.
+3. To choose a [PubSub implementation](pubsub/architecture.md) that connects the Web Service and the Backend. We currently support [Kafka](pubsub/kafka.md) on any Backend and [Storm DRPC](pubsub/storm-drpc.md) for the Storm Backend.
 4. The optional [UI](ui/setup.md) set up to talk to your Web Service. You can skip the UI if all your access is programmatic

 !!! note "Schema in the UI"

@@ -151,7 +151,7 @@ The core of Bullet querying is not tied to the Backend and lives in a core libra
 The PubSub is responsible for transmitting queries from the API to the Backend and returning results back from the Backend to the clients. It decouples whatever particular Backend you are using with the API. We currently provide a PubSub implementation using Kafka as the transport layer. You can very easily [implement your own](pubsub/architecture.md#implementing-your-own-pubsub) by defining a few interfaces that we provide.

-In the case of Bullet on Storm, there is an [additional simplified option](pubsub/storm-drpc-setup.md) using [Storm DRPC](http://storm.apache.org/releases/1.0.0/Distributed-RPC.html) as the PubSub. This layer is planned to only support a request-response model for querying in the future.
+In the case of Bullet on Storm, there is an [additional simplified option](pubsub/storm-drpc.md) using [Storm DRPC](http://storm.apache.org/releases/1.0.0/Distributed-RPC.html) as the PubSub. This layer is planned to only support a request-response model for querying in the future.

 !!! note "DRPC PubSub"

docs/pubsub/architecture.md

Lines changed: 2 additions & 2 deletions
@@ -27,8 +27,8 @@ The PubSub layer does not deal with queries and results and just works on instan
 If you want to use an implementation already built, we currently support:

-1. [Kafka](kafka-setup.md#setup) for any Backend
-2. [Storm DRPC](storm-drpc-setup.md#setup) if you're using Bullet on Storm as your Backend
+1. [Kafka](kafka.md#setup) for any Backend
+2. [Storm DRPC](storm-drpc.md#setup) if you're using Bullet on Storm as your Backend

 ## Implementing your own PubSub

docs/pubsub/kafka-setup.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs/pubsub/kafka.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@

# Kafka PubSub

The Kafka implementation of the Bullet PubSub uses [Apache Kafka](https://kafka.apache.org) as the backing PubSub queue and can be used with any Backend and Web Service.

## How does it work?

By default, the implementation asks you to create two topics in a Kafka cluster - one for queries and another for results. The Web Service publishes queries to the queries topic and reads results from the results topic. Similarly, the Backend reads queries from the queries topic and writes results to the results topic. All messages are sent as [PubSubMessages](architecture.md#messages).

You do not need to have two topics. You can have just one, but you should then use multiple partitions and configure your Web Service and Backend to produce to and consume from the right partitions. See the [setup](#configuration) section for more details.

!!! note "Kafka Client API"

    The Bullet Kafka implementation uses the Kafka 0.10.2 client APIs. Generally, forward and backward compatibility with other Kafka versions should work as expected.

## Setup

Before setting up, you will need a Kafka cluster with your topic(s) created. This cluster need only be a couple of machines, though the right size depends on your query and result volumes. These are generally at most a few hundred or thousand messages per second, so a small Kafka cluster will suffice.

### Plug into the Backend

Depending on how your Backend is built, either add Bullet Kafka to your classpath or include it in your build tool. Head over to our [releases page](../releases.md#bullet-kafka) to get the artifacts. If you're adding Bullet Kafka to the classpath instead of building a fat JAR, you will need to get the JAR with the ```fat``` classifier, since you will need Bullet Kafka and all its dependencies.

Configure the Backend to use the Kafka PubSub:

```yaml
bullet.pubsub.context.name: "QUERY_PROCESSING"
bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub"
bullet.pubsub.kafka.bootstrap.servers: "server1:port1,server2:port2,..."
bullet.pubsub.kafka.request.topic.name: "your-query-topic"
bullet.pubsub.kafka.response.topic.name: "your-result-topic"
```

You will then need to configure the Publishers and Subscribers. For details on what to configure and what the defaults are, see the [configuration file](https://github.com/yahoo/bullet-kafka/blob/master/src/main/resources/bullet_kafka_defaults.yaml).

### Plug into the Web Service

Head over to our [releases page](../releases.md#bullet-kafka) and get the JAR artifact with the ```fat``` classifier. For example, you can download the artifact for the 0.2.0 release [directly from JCenter](http://jcenter.bintray.com/com/yahoo/bullet/bullet-kafka/0.2.0/).

You should then plug this JAR into your Web Service following the instructions [here](../ws/setup.md#launch).

For configuration, you should [follow the steps here](../ws/setup.md#pubsub-configuration) to create and provide a YAML file to the Web Service. Remember to change the context to ```QUERY_SUBMISSION```.

```yaml
bullet.pubsub.context.name: "QUERY_SUBMISSION"
bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub"
bullet.pubsub.kafka.bootstrap.servers: "server1:port1,server2:port2,..."
bullet.pubsub.kafka.request.topic.name: "your-query-topic"
bullet.pubsub.kafka.response.topic.name: "your-result-topic"
```

As with the Backend, you will then need to configure the Publishers and Subscribers. See the [configuration file](https://github.com/yahoo/bullet-kafka/blob/master/src/main/resources/bullet_kafka_defaults.yaml). Remember that the Subscribers in your Backend read what the Publishers in your Web Service produce and vice-versa, so make sure to match up the topics and settings accordingly if you have made any custom changes.

## Passthrough Configuration

You can pass additional Kafka Producer or Consumer properties to the PubSub Publishers and Subscribers by prefixing them with either ```bullet.pubsub.kafka.producer.``` for Producers or ```bullet.pubsub.kafka.consumer.``` for Consumers. The PubSub configuration uses and provides a few defaults for settings it thinks are important to manage. You can tweak them and add others. For a list of properties that you can configure, see the [Producer](https://kafka.apache.org/0102/documentation.html#producerconfigs) or [Consumer](https://kafka.apache.org/0102/documentation.html#newconsumerconfigs) configs in Kafka.

!!! note "Types for the properties"

    All Kafka properties are better off specified as Strings, since Kafka casts them to the right types. If you provide other types, you might run into issues where the YAML types do not match what the Kafka client is expecting.
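As an illustration of the passthrough prefixes described above, consider the sketch below. The suffixes ```acks```, ```compression.type```, ```max.poll.records```, and ```fetch.max.wait.ms``` are standard Kafka 0.10.2 Producer/Consumer settings; the particular values chosen here are only examples.

```yaml
# Stripped of the prefix, these are handed to the Kafka Producer as
# acks and compression.type (example values, not recommendations)
bullet.pubsub.kafka.producer.acks: "all"
bullet.pubsub.kafka.producer.compression.type: "gzip"
# These are handed to the Kafka Consumer. Per the note above, values
# are given as Strings and Kafka casts them to the right types.
bullet.pubsub.kafka.consumer.max.poll.records: "50"
bullet.pubsub.kafka.consumer.fetch.max.wait.ms: "500"
```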
## Partitions

You may choose to partition your topics for a few reasons:

1. You may have one topic for both queries and responses and use partitions as a way to separate them.
2. You may use two topics and partition one or both for scalability when reading and writing.
3. You may use two topics and partition one or both for sharding across multiple Web Service instances (and multiple instances in your Backend).

You can accomplish all of this with partition maps. You can configure which partitions your Publishers (Web Service or Backend) write to using ```bullet.pubsub.kafka.request.partitions``` and which partitions your Subscribers read from using ```bullet.pubsub.kafka.response.partitions```. Providing these to an instance of the Web Service or the Backend in the YAML file ensures that the Publishers in that instance only write to those request partitions and its Subscribers only read from those response partitions. The Publishers also randomly add one of the response partitions to each message sent, ensuring that responses arrive only on a partition that this instance's Subscribers are waiting on. For more details, see the [configuration file](https://github.com/yahoo/bullet-kafka/blob/master/src/main/resources/bullet_kafka_defaults.yaml).
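As a sketch of a partition map (the list format is an assumption; check the linked defaults file for the exact shape expected), a Web Service instance configured like this would write queries only to partitions 0 and 1 of the request topic and wait for results only on partitions 2 and 3 of the response topic:

```yaml
# Hypothetical partition map for one Web Service instance
bullet.pubsub.kafka.request.partitions:
  - 0
  - 1
bullet.pubsub.kafka.response.partitions:
  - 2
  - 3
```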

docs/pubsub/storm-drpc-setup.md renamed to docs/pubsub/storm-drpc.md

Lines changed: 13 additions & 10 deletions
@@ -1,4 +1,4 @@
-# Storm DRPC PubSub Setup
+# Storm DRPC PubSub

 Bullet on [Storm](https://storm.apache.org/) can use [Storm DRPC](http://storm.apache.org/releases/1.0.0/Distributed-RPC.html) as a PubSub layer. DRPC or Distributed Remote Procedure Call, is built into Storm and consists of a set of servers that are part of the Storm cluster.

@@ -22,7 +22,7 @@ The DRPC PubSub is part of the [Bullet Storm](../releases.md#bullet-storm) start

 ### Plug into the Storm Backend

-When you are setting up your Bullet topology with your plug-in data source (a Spout or a topology), you will naturally build a JAR with all the dependencies or a *fat* JAR. This will include all the DRPC PubSub code and dependencies. You do not need anything else. For configuration, the YAML file that you probably already provide to your topology needs to have the additional settings listed below (the function name is optional but it should be unique per Storm cluster).
+When you are setting up your Bullet topology with your plug-in data source (a Spout or a topology), you will naturally build a JAR with all the dependencies or a *fat* JAR. This will include all the DRPC PubSub code and dependencies. You do not need anything else. For configuration, the YAML file that you probably already provide to your topology needs to have the additional settings listed below (the function name is optional, but you should change the default since the DRPC function needs to be unique per Storm cluster). If you now launch your topology, it should be wired up to use Storm DRPC.

 ```yaml
 bullet.pubsub.context.name: "QUERY_PROCESSING"

@@ -36,12 +36,15 @@ When you're plugging in the DRPC PubSub layer into your Web Service, you will ne

 You should then plug in this JAR to your Web Service following the instructions [here](../ws/setup.md#launch).

-For configuration, you should [follow the steps here](..ws/setup.md#pubsub-configuration) and add the context and class name listed above. You can also configure other settings that are listed in the [PubSub and PubSub Storm DRPC defaults section](https://github.com/yahoo/bullet-storm/blob/master/src/main/resources/bullet_storm_defaults.yaml) in the Bullet Storm defaults file. You will need to point to your DRPC servers using and set the function to the same value you chose [above](#storm-backend).
+For configuration, you should [follow the steps here](../ws/setup.md#pubsub-configuration) and add the context and class name listed above. You will need to point to your DRPC servers and set the function to the same value you chose [above](#storm-backend). You can configure this and other settings that are explained further in the [PubSub and PubSub Storm DRPC defaults section](https://github.com/yahoo/bullet-storm/blob/master/src/main/resources/bullet_storm_defaults.yaml) in the Bullet Storm defaults file.

 ```yaml
 bullet.pubsub.context.name: "QUERY_SUBMISSION"
 bullet.pubsub.class.name: "com.yahoo.bullet.storm.drpc.DRPCPubSub"
-bullet.pubsub.storm.drpc.servers: [server1, server2, server3]
+bullet.pubsub.storm.drpc.servers:
+  - server1
+  - server2
+  - server3
 bullet.pubsub.storm.drpc.function: "custom-name"
 bullet.pubsub.storm.drpc.http.protocol: "http"
 bullet.pubsub.storm.drpc.http.port: "4080"

@@ -54,16 +57,16 @@ bullet.pubsub.storm.drpc.http.connect.timeout.ms: 1000

 #### Scalability

-DRPC servers are a shared resource per Storm cluster and while it is horizontally scalable, the scalability of the Bullet backend is tied to it. If you only have a few DRPC servers in your Storm cluster, you may also need to add more to support more simultaneous DRPC requests. We have [found that](../backend/storm-performance.md#conclusion_3) each server gives us about ~250 simultaneous queries.
+DRPC servers are a shared resource per Storm cluster, and you may have to contend with other topologies in a multi-tenant cluster. While it is horizontally scalable, it does tie the scalability of the Bullet backend to it. If you only have a few DRPC servers in your Storm cluster, you may need to add more to support more simultaneous DRPC requests. We have [found that](../backend/storm-performance.md#conclusion_3) each server gives us about 250 simultaneous queries. There is an async implementation coming in Storm 2.0 that should increase the throughput.

 #### Query Duration

-The maximum time a query can run for depends on the maximum time Storm DRPC request can last in your Storm topology. Generally the default is set to 10 minutes. This means that the **longest query duration possible will be 10 minutes**. This is up to your cluster maintainers.
+The maximum time a query can run for depends on the maximum time a Storm DRPC request can last in your Storm topology. Generally, the default is set to 10 minutes. This means that the **longest query duration possible will be 10 minutes**. This value is up to your cluster maintainers.

-#### Reliability
+#### Request-Response

-Storm DRPC follows the principle of leaving retries to the DRPC user (in our case, the Bullet web service). At this moment, we have not chosen to add reliability mechanisms to the query publishing, result publishing or result subscribing sides of our DRPC PubSub implementations but the query subscribers do use the ```BufferingSubscriber``` mentioned [here](architecture.md#reliability).
+Our PubSub uses DRPC over HTTP REST in a request-response model. This means that it will not support incremental results as it is! We could switch our usage of DRPC to send signals to the topology to fetch results and start queries. Depending on demand, we may support this in our implementation in the future.

-#### Request-Response
+#### Reliability

-Our PubSub uses DRPC using HTTP REST in a request-response model. This means that it will not support incremental results as it is! We could switch our usage of DRPC to send signals to the topology in addition to queries. Depending on if there is demand, we may support this in our implementation in the future.
+Storm DRPC follows the principle of leaving retries to the DRPC user (in our case, the Bullet Web Service). At the moment, we have not added reliability mechanisms to the query publishing, result publishing, or result subscribing sides of our DRPC PubSub implementation, but the query subscribers do use the ```BufferingSubscriber``` mentioned [here](architecture.md#reliability).
