
Commit d6b1bb7: Docs update (#42)

1 parent b161ac6 commit d6b1bb7

28 files changed: +460 -366 lines

README.md

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ While mkdocs is available:

## Building the examples

-You will need [Maven 3](https://maven.apache.org/install.html) and [JDK 8](http://www.oracle.com/technetwork/java/javase/downloads/index.html) installed to build the examples.
+You will need [Maven 3](https://maven.apache.org/install.html) and [JDK 8](https://jdk.java.net/java-se-ri/8-MR3) installed to build the examples.

```bash
cd bullet-docs/examples/ && make
```
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Replace me with the real documentation.

docs/backend/dsl.md

Lines changed: 1 addition & 1 deletion
@@ -138,7 +138,7 @@ The JSONBulletRecordConverter is used to convert String JSON representations of

### AvroBulletRecordConverter

-The AvroBulletRecordConverter is used to convert Avro records into BulletRecords. Without a schema, it inserts every field into a BulletRecord without any type-checking. With a schema, you get type-checking, and you can also specify a RECORD field, and the converter will accept Avro Records in addition to Maps, flattening them into the BulletRecord.
+The AvroBulletRecordConverter is used to convert Avro records into BulletRecords. Without a schema, it inserts every field into a BulletRecord without any type-checking. With a schema, you get type-checking, and you can also specify a RECORD field, and the converter will accept Avro Records in addition to Maps, flattening them into the BulletRecord. This converter also handles container types (such as Maps and Lists) that contain heterogeneous nested types, as well as more nesting levels than the types we support. It maps them to the appropriate Bullet `UNKNOWN_MAP`, `UNKNOWN_MAP_MAP`, etc. types so that queries can still be written that pull out these fields, and if *they* are types that Bullet understands, the query can still execute.
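The idea in the added paragraph can be sketched language-agnostically. The sketch below is purely illustrative and in Python (the real converter is Java and its API differs): containers whose values all share a supported type keep a precise type, while mixed-type containers are tagged `UNKNOWN_MAP` but still stored, so their fields remain extractable by later queries.

```python
# Illustrative sketch only, NOT the real AvroBulletRecordConverter.
# Supported primitive types for this toy example.
SUPPORTED = (bool, int, float, str)

def classify(container: dict) -> str:
    """Tag a map precisely if homogeneous and supported, else UNKNOWN_MAP."""
    value_types = {type(v) for v in container.values()}
    if len(value_types) == 1 and all(t in SUPPORTED for t in value_types):
        return "MAP"          # precisely typed map
    return "UNKNOWN_MAP"      # heterogeneous or unsupported values

def convert(record: dict) -> dict:
    # Flatten into (type-tag, value) pairs, mirroring the converter's idea
    return {k: (classify(v) if isinstance(v, dict) else "PRIMITIVE", v)
            for k, v in record.items()}

avro_like = {
    "counts": {"a": 1, "b": 2},     # homogeneous values -> MAP
    "mixed": {"a": 1, "b": "two"},  # heterogeneous values -> UNKNOWN_MAP
    "name": "bullet",
}
converted = convert(avro_like)
```

Even though `mixed` is only `UNKNOWN_MAP`, its entries are still present, so a query that pulls out `mixed.a` (an `int`, which Bullet understands) can still execute.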

### Schema

docs/backend/ingestion.md

Lines changed: 3 additions & 16 deletions
@@ -36,26 +36,13 @@ Data placed into a Bullet Record is strongly typed. We support these types curre
3. List of any of the [Primitives](#primitives)
3. List of any Map in 1

-With these types, it is unlikely you would have data that cannot be represented as Bullet Record but if you do, please let us know and we are more than willing to accommodate.
+With these types, it is unlikely you would have data that cannot be represented as a Bullet Record, but if you do, please let us know and we are more than willing to accommodate. It is also possible to place `UNKNOWN` container types such as Maps and Lists into the record. This can be useful for more deeply nested data structures or heterogeneous container types. However, operations that extract fields from them can only work if the type of the extracted object is one of the supported types above.

## Installing the Record directly

Generally, you depend on the Bullet Core artifact for your Stream Processor when you plug in the piece that gets your data into the Stream processor. The Bullet Core artifact already brings in the Bullet Record containers as well. See the usage for the [Storm](storm-setup.md#installation) for an example.

-However, if you need it, the artifacts are available through JCenter to depend on them in code directly. You will need to add the repository. Below is a Maven example:
-
-```xml
-<repositories>
-    <repository>
-        <snapshots>
-            <enabled>false</enabled>
-        </snapshots>
-        <id>central</id>
-        <name>bintray</name>
-        <url>http://jcenter.bintray.com</url>
-    </repository>
-</repositories>
-```
+However, if you need it, the artifacts are available through Maven Central to depend on them in code directly. Below is a Maven example:

```xml
<dependency>
@@ -65,6 +52,6 @@ However, if you need it, the artifacts are available through JCenter to depend o
</dependency>
```

-If you just need the jar artifact, you can download it directly from [JCenter](http://jcenter.bintray.com/com/yahoo/bullet/bullet-record/).
+If you just need the jar artifact, you can download it directly from [Maven Central](https://repo1.maven.org/maven2/com/yahoo/bullet/bullet-record/).

You can also add ```<classifier>sources</classifier>``` or ```<classifier>javadoc</classifier>``` if you want the sources or the javadoc.
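For example, pulling the sources jar could look like the following sketch (the version below is illustrative; use whichever bullet-record version you actually depend on):

```xml
<dependency>
    <groupId>com.yahoo.bullet</groupId>
    <artifactId>bullet-record</artifactId>
    <!-- illustrative version; pick the one you actually use -->
    <version>1.2.0</version>
    <classifier>sources</classifier>
</dependency>
```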

docs/backend/spark-architecture.md

Lines changed: 3 additions & 1 deletion
@@ -14,7 +14,7 @@ The red lines are the path for the queries that come in through the PubSub, the

### Data processing

-Bullet can accept arbitrary sources of data as long as they can be ingested by Spark. They can be Kafka, Flume, Kinesis, and TCP sockets etc. In order to hook up your data to Bullet Spark, you just need to implement the [Data Producer Trait](https://github.com/bullet-db/bullet-spark/blob/master/src/main/scala/com/yahoo/bullet/spark/DataProducer.scala). In your implementation, you can either:
+Bullet can accept arbitrary sources of data as long as they can be ingested by Spark, such as Kafka, Flume, Kinesis, or TCP sockets. You can either use [DSL](dsl.md) or hook up your data directly to Bullet Spark. To do the latter, you just need to implement the [Data Producer Trait](https://github.com/bullet-db/bullet-spark/blob/master/src/main/scala/com/yahoo/bullet/spark/DataProducer.scala). In your implementation, you can either:

* Use [Spark Streaming built-in sources](https://spark.apache.org/docs/latest/streaming-programming-guide.html#input-dstreams-and-receivers) to receive data. Below is a quick example for a direct Kafka source in Scala. You can also write it in Java:

@@ -48,6 +48,8 @@ class DirectKafkaProducer extends DataProducer {

* Write a [custom receiver](https://spark.apache.org/docs/latest/streaming-custom-receivers.html) to receive data from any arbitrary data source beyond the ones for which it has built-in support (that is, beyond Flume, Kafka, Kinesis, files, sockets, etc.). See [example](https://github.com/bullet-db/bullet-db.github.io/tree/src/examples/spark/src/main/scala/com/yahoo/bullet/spark/examples).

+To use DSL, enable it by setting `bullet.spark.dsl.data.producer.enable: true` and configuring the various DSL parameters.
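A minimal sketch of the relevant settings might look like this (only the enable flag comes from this doc; the connector and converter class names below are illustrative placeholders, so point them at whichever Bullet DSL implementations you actually use):

```yaml
# Enable the DSL data producer instead of a hand-written DataProducer
bullet.spark.dsl.data.producer.enable: true
# Illustrative DSL settings: which connector reads your data and which
# converter turns it into BulletRecords
bullet.dsl.connector.class.name: "com.yahoo.bullet.dsl.connector.KafkaConnector"
bullet.dsl.converter.class.name: "com.yahoo.bullet.dsl.converter.AvroBulletRecordConverter"
```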

After receiving your data, you can do any transformations like joins or type conversions in your implementation before emitting to the Filter Streaming stage.

The Filter Streaming stage checks every record from your data source against every query from the Query Unioning stage to see if it matches and emits partial results to the Join Streaming stage.

docs/backend/spark-setup.md

Lines changed: 7 additions & 20 deletions
@@ -8,31 +8,18 @@ Bullet is configured at run-time using settings defined in a file. Settings not

## Installation

-Download the Bullet Spark standalone jar from [JCenter](http://jcenter.bintray.com/com/yahoo/bullet/bullet-spark/).
+Download the Bullet Spark standalone jar from [Maven Central](https://repo1.maven.org/maven2/com/yahoo/bullet/bullet-spark/).

-If you are using Bullet Kafka as pluggable PubSub, you can download the fat jar from [JCenter](http://jcenter.bintray.com/com/yahoo/bullet/bullet-kafka/). Otherwise, you need to plug in your own PubSub jar or use the RESTPubSub built-into bullet-core and turned on in the API.
+If you are using Bullet Kafka as a pluggable PubSub, you can download the fat jar from [Maven Central](https://repo1.maven.org/maven2/com/yahoo/bullet/bullet-kafka/). Otherwise, you need to plug in your own PubSub jar or use the RESTPubSub built into bullet-core and turned on in the API.

-To use Bullet Spark, you need to implement your own [Data Producer Trait](https://github.com/bullet-db/bullet-spark/blob/master/src/main/scala/com/yahoo/bullet/spark/DataProducer.scala) with a JVM based project or you can use Bullet DSL (see below). If you choose to implement your own, you have two ways as described in the [Spark Architecture](spark-architecture.md#data-processing) section. You include the Bullet artifact and Spark dependencies in your pom.xml or other equivalent build tools. The artifacts are available through JCenter. Here is an example if you use Scala and Maven:
-
-```xml
-<repositories>
-    <repository>
-        <snapshots>
-            <enabled>false</enabled>
-        </snapshots>
-        <id>central</id>
-        <name>bintray</name>
-        <url>http://jcenter.bintray.com</url>
-    </repository>
-</repositories>
-```
+To use Bullet Spark, you need to implement your own [Data Producer Trait](https://github.com/bullet-db/bullet-spark/blob/master/src/main/scala/com/yahoo/bullet/spark/DataProducer.scala) with a JVM-based project or you can use Bullet DSL (see below). If you choose to implement your own, you have two ways as described in the [Spark Architecture](spark-architecture.md#data-processing) section. You include the Bullet artifact and Spark dependencies in your pom.xml or other equivalent build tool. The artifacts are available through Maven Central. Here is an example if you use Scala and Maven:

```xml
<properties>
-    <scala.version>2.11.7</scala.version>
-    <scala.dep.version>2.11</scala.dep.version>
-    <spark.version>2.3.0</spark.version>
-    <bullet.spark.version>0.1.1</bullet.spark.version>
+    <scala.version>2.12.10</scala.version>
+    <scala.dep.version>2.12</scala.dep.version>
+    <spark.version>3.1.2</spark.version>
+    <bullet.spark.version>1.2.0</bullet.spark.version>
</properties>

<dependency>

docs/backend/storm-setup.md

Lines changed: 26 additions & 26 deletions
@@ -8,27 +8,14 @@ Bullet is configured at run-time using settings defined in a file. Settings not

## Installation

-To use Bullet, you need to implement a way to read from your data source and convert your data into Bullet Records (bullet-record is a transitive dependency for Bullet and can be found [in JCenter](ingestion.md#installing-the-record-directly). You have a couple of options in how to get your data into Bullet:
+To use Bullet, you need to implement a way to read from your data source and convert your data into Bullet Records (bullet-record is a transitive dependency for Bullet and can be found in [Maven Central](ingestion.md#installing-the-record-directly)). You have a couple of options in how to get your data into Bullet:

-1. You can implement a spout (or even a topology) that reads from your data source and emits Bullet Records. You then write a main class that submits the topology with your topology wired in [using our submit method](https://github.com/bullet-db/bullet-storm/blob/master/src/main/java/com/yahoo/bullet/storm/StormUtils.java).
+1. You can implement a spout (or even a topology) that reads from your data source and emits Bullet Records. You then write a main class that submits the topology with your topology wired in [using our submit methods](https://github.com/bullet-db/bullet-storm/blob/master/src/main/java/com/yahoo/bullet/storm/StormUtils.java).
2. Use Bullet DSL to configure a spout (and optionally a bolt) that you provide in the settings to our main class. This will wire up your data source and data format to Bullet without you having to write code!

You can refer to the [Pros and Cons](storm-architecture.md#data-processing) of the various approaches to determine what works best for you.

-You need a JVM based project that implements one of the two options above. You include the Bullet artifact and Storm dependencies in your pom.xml or other dependency management system. The artifacts are available through JCenter, so you will need to add the repository.
-
-```xml
-<repositories>
-    <repository>
-        <snapshots>
-            <enabled>false</enabled>
-        </snapshots>
-        <id>central</id>
-        <name>bintray</name>
-        <url>http://jcenter.bintray.com</url>
-    </repository>
-</repositories>
-```
+You need a JVM-based project that implements one of the two options above. You include the Bullet artifact and Storm dependencies in your pom.xml or other dependency management system. The artifacts are available through Maven Central.

```xml
<dependency>
@@ -45,7 +32,7 @@ You need a JVM based project that implements one of the two options above. You i
</dependency>
```

-If you just need the jar artifact directly, you can download it from [JCenter](http://jcenter.bintray.com/com/yahoo/bullet/bullet-storm/).
+If you just need the jar artifact directly, you can download it from [Maven Central](https://repo1.maven.org/maven2/com/yahoo/bullet/bullet-storm/).

You can also add ```<classifier>sources</classifier>``` or ```<classifier>javadoc</classifier>``` if you want the sources or javadoc. We also package up our test code where we have some helper classes to deal with [Storm components](https://github.com/bullet-db/bullet-storm/tree/master/src/test/java/com/yahoo/bullet/storm). If you wish to use these to help with testing your topology, you can add another dependency on bullet-storm with ```<type>test-jar</type>```.

@@ -80,7 +67,22 @@ Storm topologies are generally launched with "fat" jars (jar-with-dependencies),
### Older Storm Versions

Since package prefixes changed from `backtype.storm` to `org.apache.storm` in Storm 1.0 and above, you will need to get the storm-0.10 version of Bullet if
-your Storm cluster is still not at 1.0 or higher. You change your dependency to:
+your Storm cluster is still not at 1.0 or higher. These older packages are only available in JCenter, which has been sunset but remains available in read-only
+mode. We recommend that you do not use those versions and migrate to Bullet Storm versions greater than 1.1.2 as soon as possible. If you still
+need them, you can change your dependency to:
+
+```xml
+<repositories>
+    <repository>
+        <snapshots>
+            <enabled>false</enabled>
+        </snapshots>
+        <id>central</id>
+        <name>bintray</name>
+        <url>http://jcenter.bintray.com</url>
+    </repository>
+</repositories>
+```

```xml
<dependency>
@@ -90,8 +92,6 @@ your Storm cluster is still not at 1.0 or higher. You change your dependency to:
</dependency>
```

-The jar artifact can be downloaded directly from [JCenter](http://jcenter.bintray.com/com/yahoo/bullet/bullet-storm-0.10/).
-
You can also add ```<classifier>sources</classifier>``` or ```<classifier>javadoc</classifier>``` if you want the source or javadoc and ```<type>test-jar</type>``` for the test classes as with bullet-storm.

Also, since storm-metrics and the Resource Aware Scheduler are not in Storm versions less than 1.0, there are changes in the Bullet settings. The settings that set the CPU and memory loads do not exist (so the config file does not specify them). The setting to enable the topology scheduler is no longer present (you can still override these settings if you run a custom version of Storm by passing them to the storm jar command. [See below](#launch).) You can take a look at the settings file on the storm-0.10 branch in the Git repo.
@@ -157,26 +157,26 @@ The Bullet Storm jar is not built with Bullet DSL or with other dependencies you

##### Kafka

-[Kafka Clients 2.1.0](https://bintray.com/bintray/jcenter/org.apache.kafka%3Akafka-clients)
+[Kafka Clients 2.6.1](https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients/2.6.1/)

##### Pulsar

-[Pulsar Client 2.2.1](https://bintray.com/bintray/jcenter/org.apache.pulsar%3Apulsar-client)
+[Pulsar Client 2.2.1](https://repo1.maven.org/maven2/org/apache/pulsar/pulsar-client/2.2.1/)

-[Pulsar Client Schema 2.2.1](https://bintray.com/bintray/jcenter/org.apache.pulsar%3Apulsar-client-schema)
+[Pulsar Client Schema 2.2.1](https://repo1.maven.org/maven2/org/apache/pulsar/pulsar-client-schema/2.2.1/)

-[Pulsar Protobuf Shaded 2.1.0-incubating](https://bintray.com/bintray/jcenter/org.apache.pulsar%3Aprotobuf-shaded)
+[Pulsar Protobuf Shaded 2.1.1-incubating](https://repo1.maven.org/maven2/org/apache/pulsar/protobuf-shaded/2.1.1-incubating/)

##### Example

The following is an example for Pulsar in Storm 1.2.2+:

```
-storm jar bullet-storm-0.9.1.jar \
+storm jar bullet-storm-1.3.0.jar \
    com.yahoo.bullet.storm.Topology \
    --bullet-conf ./bullet_settings.yaml \
-    --jars "bullet-dsl-0.1.2.jar,pulsar-client-2.2.1.jar,pulsar-client-schema-2.2.1.jar,protobuf-shaded-2.1.0-incubating.jar"
+    --jars "bullet-dsl-1.2.0.jar,pulsar-client-2.2.1.jar,pulsar-client-schema-2.2.1.jar,protobuf-shaded-2.1.1-incubating.jar"
```

## Storage and Replay

docs/pubsub/architecture.md

Lines changed: 4 additions & 0 deletions
@@ -23,6 +23,10 @@ A PubSub provides Publisher and Subscriber instances that, depending on the cont

The PubSub layer does not deal with queries and results and just works on instances of messages of type ```com.yahoo.bullet.pubsub.PubSubMessage```. These [PubSubMessages](https://github.com/bullet-db/bullet-core/blob/master/src/main/java/com/yahoo/bullet/pubsub/PubSubMessage.java) are keyed (```id``` and ```sequence```) and store content and metadata. This is a light wrapper around the payload and is tailored to work with multiple results per query and to support communicating additional information and signals to and from the PubSub in addition to just queries and results.

+### SerDe
+
+The PubSub layer also supports a ```PubSubMessageSerDe``` interface to customize how the data is stored in the message. The SerDe is only used for publishing a message from the Web Service and for reading it in the backend. This is particularly relevant if you are storing the PubSubMessage in a [storage layer](../ws/setup.md#storage-configuration) for resiliency. Using an appropriate SerDe controls how the payload is serialized and deserialized for transportation and storage. For instance (and by default), the [ByteArrayPubSubMessageSerDe](https://github.com/bullet-db/bullet-core/blob/master/src/main/java/com/yahoo/bullet/pubsub/ByteArrayPubSubMessageSerDe.java) is used for queries. This converts the Query object payload into a byte[] when storing and transmitting it to the backend. The backend, however, does not reify the payload back into a Query object until it needs the Query. So the PubSubMessage can be serialized and deserialized multiple times as it is transferred between components without needless conversions back and forth. You can write your own if you wish to customize the behavior and control what is stored in the storage layer if one is used. For instance, BQL provides a [LazyPubSubMessageSerDe](https://github.com/bullet-db/bullet-bql/blob/master/src/main/java/com/yahoo/bullet/bql/query/LazyPubSubMessageSerDe.java) that keeps the query as a String and makes the backend create the Query object using BQL (normally this is done in the API)!
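The lazy serialize-once, reify-on-demand idea in the added section can be illustrated with a small sketch. Python is used here purely for brevity; the real ```PubSubMessageSerDe``` is a Java interface and the names below are only analogous, not the actual Bullet API.

```python
# Illustrative sketch of the lazy SerDe idea, NOT the real Bullet interface.
import pickle

class Message:
    """A message whose payload stays as opaque bytes between components."""
    def __init__(self, payload: bytes):
        self.payload = payload  # cheap to re-serialize and forward as-is

    def reify(self):
        # Deserialize only on demand, at the last possible moment
        return pickle.loads(self.payload)

def to_message(query) -> Message:
    # "Web Service" side: serialize the query payload once, up front
    return Message(pickle.dumps(query))

message = to_message({"query": "SELECT * FROM STREAM(2000, TIME)"})
# ... the message hops through the PubSub and storage as bytes ...
query = message.reify()  # only the backend reifies the payload
```

Because the payload stays as bytes in transit, intermediate hops avoid needless back-and-forth conversions, which is the same design choice the default ByteArrayPubSubMessageSerDe makes.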

## Choosing a PubSub implementation

If you want to use an implementation already built, we currently support:
