Commit 222239b

committed: All but Quick Start
1 parent 2aeb481

File tree: 12 files changed (+184, -66 lines)


docs/about/contact.md

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,8 @@ If you have any issues with any of the particular Bullet sub-components, feel fr
 | Web Service | [https://github.com/yahoo/bullet-service/issues](https://github.com/yahoo/bullet-service/issues) |
 | UI | [https://github.com/yahoo/bullet-ui/issues](https://github.com/yahoo/bullet-ui/issues) |
 | Record | [https://github.com/yahoo/bullet-record/issues](https://github.com/yahoo/bullet-record/issues) |
+| Core | [https://github.com/yahoo/bullet-core/issues](https://github.com/yahoo/bullet-core/issues) |
+| Kafka PubSub | [https://github.com/yahoo/bullet-kafka/issues](https://github.com/yahoo/bullet-kafka/issues) |
 | Documentation | [https://github.com/yahoo/bullet-docs/issues](https://github.com/yahoo/bullet-docs/issues) |

 ## Mailing Lists

docs/about/contributing.md

Lines changed: 8 additions & 5 deletions
@@ -6,18 +6,21 @@ We welcome all contributions! We also welcome all usage experiences, stories, an
 Bullet is hosted under the [Yahoo Github Organization](https://github.com/yahoo). In order to contribute to any Yahoo project, you will need to submit a CLA. When you submit a Pull Request to any Bullet repository, a CLABot will ask you to sign the CLA if you haven't signed one already.

+Read the [human-readable summary](https://yahoocla.herokuapp.com/) of the CLA.
+
 ## Future plans

-Here is a list of features we are currently considering/working on. Feel free to [contact us](contact.md) with any ideas/suggestions/PRs for features mentioned here or anything else you think about!
+Here is a selected list of features we are currently considering or working on. Feel free to [contact us](contact.md) with ideas, suggestions, or PRs for the features mentioned here or anything else you think of!

-This list is neither comprehensive nor in any particular order and lists some high level directions.
+This list is neither comprehensive nor in any particular order.

 | Feature | Components | Description | Status |
 |-------------------- | ----------- | ------------------------- | ------------- |
-| PubSub | BE, WS, UI | WS and BE talk through the PubSub. Bullet Storm uses Storm DRPC for this (strictly request-response) Using a pub/sub queue will let us implement Bullet on other Stream Processors, support incremental updates through WebSockets and more! | In Progress [#1](https://github.com/yahoo/bullet-core/pull/1) |
-| Incremental updates | BE, WS, UI | Push results back to users as soon as they arrive. Our aggregations are additive, so progressive results can be streamed back. Micro-batching and other features come into play | In Progress |
+| Incremental updates | BE, WS, UI | Push results back to users during the query lifetime. Micro-batching, windowing, and other features come into play | In Progress |
+| Bullet on Spark | BE | Implement Bullet on Spark Streaming. Compared with SQL on Spark Streaming, which stores data in memory, Bullet will be lightweight | In Progress |
 | Security | WS, UI | The obvious enterprise security for locking down access to the data and the instance of Bullet. Considering SSL, Kerberos, LDAP etc. | Planning |
-| Bullet on X | BE | With the pub/sub feature, Bullet can be implemented on other Stream Processors like Spark Streaming, Flink, Kafka Streaming, Samza etc | Open |
+| Bullet on X | BE | With the pub/sub feature, Bullet can be implemented on other Stream Processors like Flink, Kafka Streaming, Samza etc | Open |
 | SQL API | BE, WS | WS supports an endpoint that converts a SQL-like query into Bullet queries | Open |
 | LocalForage | UI | Migration to LocalForage to distance ourselves from the relatively small LocalStorage space | [#9](https://github.com/yahoo/bullet-ui/issues/9) |
+| Spring Boot Reactor | WS | Migrate the Web Service to use Spring Boot Reactor instead of servlet containers | Open |
 | UI Packaging | UI | Github releases and building from source are the only two options. Docker or something similar may be more apt | Open |

docs/backend/storm-setup.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Bullet on Storm

-This section explains how to set up and run Bullet on Storm. If you're using the Storm DRPC PubSub, refer to [this section](../pubsub/storm-drpc-setup.md) for further details.
+This section explains how to set up and run Bullet on Storm. If you're using the Storm DRPC PubSub, refer to [this section](../pubsub/storm-drpc.md) for further details.

 ## Configuration

docs/index.md

Lines changed: 2 additions & 2 deletions
@@ -48,7 +48,7 @@ To set up Bullet on a real data stream, you need:
 1. Plug in your source of data. See [Getting your data into Bullet](backend/ingestion.md) for details
 2. Consume your data stream
 2. The [Web Service](ws/setup.md) set up to convey queries and return results back from the backend
-3. To choose a [PubSub implementation](pubsub/architecture.md) that connects the Web Service and the Backend. We currently support [Kafka](pubsub/kafka-setup.md) on any Backend and [Storm DRPC](pubsub/storm-drpc-setup.md) for the Storm Backend.
+3. To choose a [PubSub implementation](pubsub/architecture.md) that connects the Web Service and the Backend. We currently support [Kafka](pubsub/kafka.md) on any Backend and [Storm DRPC](pubsub/storm-drpc.md) for the Storm Backend.
 4. The optional [UI](ui/setup.md) set up to talk to your Web Service. You can skip the UI if all your access is programmatic

 !!! note "Schema in the UI"

@@ -151,7 +151,7 @@ The core of Bullet querying is not tied to the Backend and lives in a core libra
 The PubSub is responsible for transmitting queries from the API to the Backend and returning results back from the Backend to the clients. It decouples whatever particular Backend you are using with the API. We currently provide a PubSub implementation using Kafka as the transport layer. You can very easily [implement your own](pubsub/architecture.md#implementing-your-own-pubsub) by defining a few interfaces that we provide.

-In the case of Bullet on Storm, there is an [additional simplified option](pubsub/storm-drpc-setup.md) using [Storm DRPC](http://storm.apache.org/releases/1.0.0/Distributed-RPC.html) as the PubSub. This layer is planned to only support a request-response model for querying in the future.
+In the case of Bullet on Storm, there is an [additional simplified option](pubsub/storm-drpc.md) using [Storm DRPC](http://storm.apache.org/releases/1.0.0/Distributed-RPC.html) as the PubSub. This layer is planned to only support a request-response model for querying in the future.

 !!! note "DRPC PubSub"

docs/pubsub/architecture.md

Lines changed: 2 additions & 2 deletions
@@ -27,8 +27,8 @@ The PubSub layer does not deal with queries and results and just works on instan
 If you want to use an implementation already built, we currently support:

-1. [Kafka](kafka-setup.md#setup) for any Backend
-2. [Storm DRPC](storm-drpc-setup.md#setup) if you're using Bullet on Storm as your Backend
+1. [Kafka](kafka.md#setup) for any Backend
+2. [Storm DRPC](storm-drpc.md#setup) if you're using Bullet on Storm as your Backend

 ## Implementing your own PubSub

docs/pubsub/kafka-setup.md

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs/pubsub/kafka.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@

# Kafka PubSub

The Kafka implementation of the Bullet PubSub uses [Apache Kafka](https://kafka.apache.org) as the backing PubSub queue and can be used with any Backend and Web Service.

## How does it work?

By default, the implementation asks you to create two topics in a Kafka cluster - one for queries and another for results. The Web Service publishes queries to the queries topic and reads results from the results topic. Similarly, the Backend reads queries from the queries topic and writes results to the results topic. All messages are sent as [PubSubMessages](architecture.md#messages).

You do not need to have two topics. You can have just one, but you should then use multiple partitions and configure your Web Service and Backend to produce to and consume from the right partitions. See the [setup](#configuration) section for more details.

!!! note "Kafka Client API"

    The Bullet Kafka implementation uses the Kafka 0.10.2 client APIs. Generally, forward and backward compatibility with other Kafka versions should work as expected.

## Setup

Before setting up, you will need a Kafka cluster with your topic(s) created. This cluster need only be a couple of machines, though the right size depends on your query and result volumes. These are generally at most a few hundred or thousand messages per second, so a small Kafka cluster will suffice.

### Plug into the Backend

Depending on how your Backend is built, either add Bullet Kafka to your classpath or include it in your build tool. Head over to our [releases page](../releases.md#bullet-kafka) to get the artifacts. If you're adding Bullet Kafka to the classpath instead of building a fat JAR, you will need to get the JAR with the ```fat``` classifier, since you will need Bullet Kafka and all its dependencies.

Configure the Backend to use the Kafka PubSub:

```yaml
bullet.pubsub.context.name: "QUERY_PROCESSING"
bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub"
bullet.pubsub.kafka.bootstrap.servers: "server1:port1,server2:port2,..."
bullet.pubsub.kafka.request.topic.name: "your-query-topic"
bullet.pubsub.kafka.response.topic.name: "your-result-topic"
```

You will then need to configure the Publishers and Subscribers. For details on what to configure and what the defaults are, see the [configuration file](https://github.com/yahoo/bullet-kafka/blob/master/src/main/resources/bullet_kafka_defaults.yaml).

### Plug into the Web Service

Head over to our [releases page](../releases.md#bullet-kafka) and get the JAR artifact with the ```fat``` classifier. For example, you can download the artifact for the 0.2.0 release [directly from JCenter](http://jcenter.bintray.com/com/yahoo/bullet/bullet-kafka/0.2.0/).

You should then plug this JAR into your Web Service following the instructions [here](../ws/setup.md#launch).

For configuration, you should [follow the steps here](../ws/setup.md#pubsub-configuration) to create and provide a YAML file to the Web Service. Remember to change the context to ```QUERY_SUBMISSION```.

```yaml
bullet.pubsub.context.name: "QUERY_SUBMISSION"
bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub"
bullet.pubsub.kafka.bootstrap.servers: "server1:port1,server2:port2,..."
bullet.pubsub.kafka.request.topic.name: "your-query-topic"
bullet.pubsub.kafka.response.topic.name: "your-result-topic"
```

As with the Backend, you will then need to configure the Publishers and Subscribers. See the [configuration file](https://github.com/yahoo/bullet-kafka/blob/master/src/main/resources/bullet_kafka_defaults.yaml). Remember that the Subscribers in your Backend read what the Publishers in your Web Service produce and vice-versa, so make sure to match up the topics and settings accordingly if you have made any custom changes.

## Passthrough Configuration

You can pass additional Kafka Producer or Consumer properties to the PubSub Publishers and Subscribers by prefixing them with either ```bullet.pubsub.kafka.producer.``` for Producers or ```bullet.pubsub.kafka.consumer.``` for Consumers. The PubSub configuration uses and provides a few defaults for settings it thinks are important to manage. You can tweak them and add others. For a list of properties that you can configure, see the [Producer](https://kafka.apache.org/0102/documentation.html#producerconfigs) or [Consumer](https://kafka.apache.org/0102/documentation.html#newconsumerconfigs) configs in Kafka.

!!! note "Types for the properties"

    All Kafka properties are better off specified as Strings, since Kafka casts them to the right types. If you provide other types, you might run into issues where the YAML types do not match what the Kafka client is expecting.
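As an illustration of the passthrough prefixes described above, consider the sketch below. The suffixes ```acks```, ```compression.type```, ```max.poll.records```, and ```fetch.max.wait.ms``` are standard Kafka 0.10.2 Producer/Consumer settings; the particular values chosen here are only examples.

```yaml
# Stripped of the prefix, these are handed to the Kafka Producer as
# acks and compression.type (example values, not recommendations)
bullet.pubsub.kafka.producer.acks: "all"
bullet.pubsub.kafka.producer.compression.type: "gzip"
# These are handed to the Kafka Consumer. Per the note above, values
# are given as Strings and Kafka casts them to the right types.
bullet.pubsub.kafka.consumer.max.poll.records: "50"
bullet.pubsub.kafka.consumer.fetch.max.wait.ms: "500"
```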
## Partitions

You may choose to partition your topics for a few reasons:

1. You may have one topic for both queries and responses and use partitions as a way to separate them.
2. You may use two topics and partition one or both for scalability when reading and writing.
3. You may use two topics and partition one or both for sharding across multiple Web Service instances (and multiple instances in your Backend).

You can accomplish all of this with partition maps. You can configure which partitions your Publishers (Web Service or Backend) write to using ```bullet.pubsub.kafka.request.partitions``` and which partitions your Subscribers read from using ```bullet.pubsub.kafka.response.partitions```. Providing these to an instance of the Web Service or the Backend in the YAML file ensures that the Publishers in that instance only write to those request partitions and its Subscribers only read from those response partitions. The Publishers also randomly add one of the response partitions to each message sent, ensuring that responses arrive only on a partition that this instance's Subscribers are waiting on. For more details, see the [configuration file](https://github.com/yahoo/bullet-kafka/blob/master/src/main/resources/bullet_kafka_defaults.yaml).
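As a sketch of a partition map (the list format is an assumption; check the linked defaults file for the exact shape expected), a Web Service instance configured like this would write queries only to partitions 0 and 1 of the request topic and wait for results only on partitions 2 and 3 of the response topic:

```yaml
# Hypothetical partition map for one Web Service instance
bullet.pubsub.kafka.request.partitions:
  - 0
  - 1
bullet.pubsub.kafka.response.partitions:
  - 2
  - 3
```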

docs/pubsub/storm-drpc-setup.md renamed to docs/pubsub/storm-drpc.md

Lines changed: 13 additions & 10 deletions
@@ -1,4 +1,4 @@
-# Storm DRPC PubSub Setup
+# Storm DRPC PubSub

 Bullet on [Storm](https://storm.apache.org/) can use [Storm DRPC](http://storm.apache.org/releases/1.0.0/Distributed-RPC.html) as a PubSub layer. DRPC or Distributed Remote Procedure Call, is built into Storm and consists of a set of servers that are part of the Storm cluster.

@@ -22,7 +22,7 @@ The DRPC PubSub is part of the [Bullet Storm](../releases.md#bullet-storm) start

 ### Plug into the Storm Backend

-When you are setting up your Bullet topology with your plug-in data source (a Spout or a topology), you will naturally build a JAR with all the dependencies or a *fat* JAR. This will include all the DRPC PubSub code and dependencies. You do not need anything else. For configuration, the YAML file that you probably already provide to your topology needs to have the additional settings listed below (the function name is optional but it should be unique per Storm cluster).
+When you are setting up your Bullet topology with your plug-in data source (a Spout or a topology), you will naturally build a JAR with all the dependencies or a *fat* JAR. This will include all the DRPC PubSub code and dependencies. You do not need anything else. For configuration, the YAML file that you probably already provide to your topology needs to have the additional settings listed below (the function name is optional, but you should change the default since the DRPC function needs to be unique per Storm cluster). If you now launch your topology, it should be wired up to use Storm DRPC.

 ```yaml
 bullet.pubsub.context.name: "QUERY_PROCESSING"

@@ -36,12 +36,15 @@ When you're plugging in the DRPC PubSub layer into your Web Service, you will ne

 You should then plug in this JAR to your Web Service following the instructions [here](../ws/setup.md#launch).

-For configuration, you should [follow the steps here](..ws/setup.md#pubsub-configuration) and add the context and class name listed above. You can also configure other settings that are listed in the [PubSub and PubSub Storm DRPC defaults section](https://github.com/yahoo/bullet-storm/blob/master/src/main/resources/bullet_storm_defaults.yaml) in the Bullet Storm defaults file. You will need to point to your DRPC servers using and set the function to the same value you chose [above](#storm-backend).
+For configuration, you should [follow the steps here](../ws/setup.md#pubsub-configuration) and add the context and class name listed above. You will need to point to your DRPC servers and set the function to the same value you chose [above](#storm-backend). You can configure this and other settings that are explained further in the [PubSub and PubSub Storm DRPC defaults section](https://github.com/yahoo/bullet-storm/blob/master/src/main/resources/bullet_storm_defaults.yaml) in the Bullet Storm defaults file.

 ```yaml
 bullet.pubsub.context.name: "QUERY_SUBMISSION"
 bullet.pubsub.class.name: "com.yahoo.bullet.storm.drpc.DRPCPubSub"
-bullet.pubsub.storm.drpc.servers: [server1, server2, server3]
+bullet.pubsub.storm.drpc.servers:
+  - server1
+  - server2
+  - server3
 bullet.pubsub.storm.drpc.function: "custom-name"
 bullet.pubsub.storm.drpc.http.protocol: "http"
 bullet.pubsub.storm.drpc.http.port: "4080"

@@ -54,16 +57,16 @@ bullet.pubsub.storm.drpc.http.connect.timeout.ms: 1000

 #### Scalability

-DRPC servers are a shared resource per Storm cluster and while it is horizontally scalable, the scalability of the Bullet backend is tied to it. If you only have a few DRPC servers in your Storm cluster, you may also need to add more to support more simultaneous DRPC requests. We have [found that](../backend/storm-performance.md#conclusion_3) each server gives us about ~250 simultaneous queries.
+DRPC servers are a shared resource per Storm cluster, and you may have to contend with other topologies in a multi-tenant cluster. While it is horizontally scalable, it does tie the scalability of the Bullet backend to it. If you only have a few DRPC servers in your Storm cluster, you may need to add more to support more simultaneous DRPC requests. We have [found that](../backend/storm-performance.md#conclusion_3) each server gives us about 250 simultaneous queries. There is an async implementation coming in Storm 2.0 that should increase the throughput.

 #### Query Duration

-The maximum time a query can run for depends on the maximum time Storm DRPC request can last in your Storm topology. Generally the default is set to 10 minutes. This means that the **longest query duration possible will be 10 minutes**. This is up to your cluster maintainers.
+The maximum time a query can run for depends on the maximum time a Storm DRPC request can last in your Storm topology. Generally, the default is set to 10 minutes. This means that the **longest query duration possible will be 10 minutes**. This value is up to your cluster maintainers.

-#### Reliability
+#### Request-Response

-Storm DRPC follows the principle of leaving retries to the DRPC user (in our case, the Bullet web service). At this moment, we have not chosen to add reliability mechanisms to the query publishing, result publishing or result subscribing sides of our DRPC PubSub implementations but the query subscribers do use the ```BufferingSubscriber``` mentioned [here](architecture.md#reliability).
+Our PubSub uses DRPC over HTTP REST in a request-response model. This means that it will not support incremental results as it is! We could switch our usage of DRPC to send signals to the topology to fetch results and start queries. Depending on demand, we may support this in our implementation in the future.

-#### Request-Response
+#### Reliability

-Our PubSub uses DRPC using HTTP REST in a request-response model. This means that it will not support incremental results as it is! We could switch our usage of DRPC to send signals to the topology in addition to queries. Depending on if there is demand, we may support this in our implementation in the future.
+Storm DRPC follows the principle of leaving retries to the DRPC user (in our case, the Bullet Web Service). At the moment, we have not added reliability mechanisms to the query publishing, result publishing, or result subscribing sides of our DRPC PubSub implementation, but the query subscribers do use the ```BufferingSubscriber``` mentioned [here](architecture.md#reliability).
