README.md (1 addition, 1 deletion)
@@ -55,7 +55,7 @@ While mkdocs is available:
 
 ## Building the examples
 
-You will need [Maven 3](https://maven.apache.org/install.html) and [JDK 8](http://www.oracle.com/technetwork/java/javase/downloads/index.html) installed to build the examples.
+You will need [Maven 3](https://maven.apache.org/install.html) and [JDK 8](https://jdk.java.net/java-se-ri/8-MR3) installed to build the examples.
docs/backend/dsl.md (1 addition, 1 deletion)
@@ -138,7 +138,7 @@ The JSONBulletRecordConverter is used to convert String JSON representations of
 
 ### AvroBulletRecordConverter
 
-The AvroBulletRecordConverter is used to convert Avro records into BulletRecords. Without a schema, it inserts every field into a BulletRecord without any type-checking. With a schema, you get type-checking, and you can also specify a RECORD field, and the converter will accept Avro Records in addition to Maps, flattening them into the BulletRecord.
+The AvroBulletRecordConverter is used to convert Avro records into BulletRecords. Without a schema, it inserts every field into a BulletRecord without any type-checking. With a schema, you get type-checking, and you can also specify a RECORD field, and the converter will accept Avro Records in addition to Maps, flattening them into the BulletRecord. This converter also handles container types (such as Maps and Lists) that contain heterogeneous nested types, as well as more nesting levels than the types we support. It then maps them to the appropriate Bullet `UNKNOWN_MAP`, `UNKNOWN_MAP_MAP`, etc. types so that queries can still be written to pull out fields within them, and if *those* fields are types that Bullet understands, the query can still execute.
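The fallback to `UNKNOWN` container types described in the added paragraph can be illustrated with a conceptual sketch. This is not the actual converter code; the `classify` method and the simplified type-name strings here are hypothetical and only show the idea of falling back when a container is heterogeneous or too deeply nested:

```java
import java.util.List;
import java.util.Map;

// Conceptual sketch: decide whether a Map's values are one supported primitive
// type (e.g. a map of Longs) or heterogeneous/nested, which falls back to UNKNOWN_MAP.
public class TypeSketch {
    public static String classify(Map<String, ?> map) {
        Class<?> common = null;
        for (Object value : map.values()) {
            // Nested containers beyond the supported depth fall back to UNKNOWN_MAP
            if (value instanceof Map || value instanceof List) {
                return "UNKNOWN_MAP";
            }
            if (common == null) {
                common = value.getClass();
            } else if (!common.equals(value.getClass())) {
                // Heterogeneous value types also fall back to UNKNOWN_MAP
                return "UNKNOWN_MAP";
            }
        }
        return common == null ? "UNKNOWN_MAP" : "MAP_OF_" + common.getSimpleName().toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(classify(Map.of("a", 1L, "b", 2L)));
        System.out.println(classify(Map.of("a", 1L, "b", "text")));
    }
}
```

A query that extracts `map.a` from the second map would still work, since the extracted field is a supported Long even though the container itself is `UNKNOWN_MAP`.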
docs/backend/ingestion.md (3 additions, 16 deletions)
@@ -36,26 +36,13 @@ Data placed into a Bullet Record is strongly typed. We support these types curre
 3. List of any of the [Primitives](#primitives)
 3. List of any Map in 1
 
-With these types, it is unlikely you would have data that cannot be represented as Bullet Record but if you do, please let us know and we are more than willing to accommodate.
+With these types, it is unlikely you would have data that cannot be represented as a Bullet Record, but if you do, please let us know and we are more than willing to accommodate. It is also possible to place `UNKNOWN` container types such as Maps and Lists into the record. This can be useful for more deeply nested data structures or heterogeneous container types. However, operations that extract fields from them only work if the type of the extracted object is one of the supported types above.
 
 ## Installing the Record directly
 
 Generally, you depend on the Bullet Core artifact for your Stream Processor when you plug in the piece that gets your data into the Stream processor. The Bullet Core artifact already brings in the Bullet Record containers as well. See the usage for the [Storm](storm-setup.md#installation) for an example.
 
-However, if you need it, the artifacts are available through JCenter to depend on them in code directly. You will need to add the repository. Below is a Maven example:
-
-```xml
-<repositories>
-    <repository>
-        <snapshots>
-            <enabled>false</enabled>
-        </snapshots>
-        <id>central</id>
-        <name>bintray</name>
-        <url>http://jcenter.bintray.com</url>
-    </repository>
-</repositories>
-```
+However, if you need it, the artifacts are available through Maven Central to depend on them in code directly. Below is a Maven example:
 
 ```xml
 <dependency>
@@ -65,6 +52,6 @@ However, if you need it, the artifacts are available through JCenter to depend o
 </dependency>
 ```
 
-If you just need the jar artifact, you can download it directly from [JCenter](http://jcenter.bintray.com/com/yahoo/bullet/bullet-record/).
+If you just need the jar artifact, you can download it directly from [Maven Central](https://repo1.maven.org/maven2/com/yahoo/bullet/bullet-record/).
 
 You can also add ```<classifier>sources</classifier>``` or ```<classifier>javadoc</classifier>``` if you want the sources or the javadoc.
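As a sketch of the classifier usage mentioned above, the element slots into the dependency block like this (the version number here is illustrative; check Maven Central for the latest):

```xml
<dependency>
    <groupId>com.yahoo.bullet</groupId>
    <artifactId>bullet-record</artifactId>
    <version>1.2.0</version>
    <classifier>sources</classifier>
</dependency>
```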
docs/backend/spark-architecture.md (3 additions, 1 deletion)
@@ -14,7 +14,7 @@ The red lines are the path for the queries that come in through the PubSub, the
 
 ### Data processing
 
-Bullet can accept arbitrary sources of data as long as they can be ingested by Spark. They can be Kafka, Flume, Kinesis, and TCP sockets etc. In order to hook up your data to Bullet Spark, you just need to implement the [Data Producer Trait](https://github.com/bullet-db/bullet-spark/blob/master/src/main/scala/com/yahoo/bullet/spark/DataProducer.scala). In your implementation, you can either:
+Bullet can accept arbitrary sources of data as long as they can be ingested by Spark: Kafka, Flume, Kinesis, TCP sockets, etc. You can either use the [DSL](dsl.md) or hook up your data directly to Bullet Spark. To do the latter, you just need to implement the [Data Producer Trait](https://github.com/bullet-db/bullet-spark/blob/master/src/main/scala/com/yahoo/bullet/spark/DataProducer.scala). In your implementation, you can either:
 
 * Use [Spark Streaming built-in sources](https://spark.apache.org/docs/latest/streaming-programming-guide.html#input-dstreams-and-receivers) to receive data. Below is a quick example for a direct Kafka source in Scala. You can also write it in Java:
@@ -48,6 +48,8 @@ class DirectKafkaProducer extends DataProducer {
 
 * Write a [custom receiver](https://spark.apache.org/docs/latest/streaming-custom-receivers.html) to receive data from any arbitrary data source beyond the ones for which it has built-in support (that is, beyond Flume, Kafka, Kinesis, files, sockets, etc.). See [example](https://github.com/bullet-db/bullet-db.github.io/tree/src/examples/spark/src/main/scala/com/yahoo/bullet/spark/examples).
 
+To use the DSL, enable it by setting `bullet.spark.dsl.data.producer.enable: true` and configuring the various DSL parameters.
+
 After receiving your data, you can do any transformations like joins or type conversions in your implementation before emitting to the Filter Streaming stage.
 
 The Filter Streaming stage checks every record from your data source against every query from Query Unioning stage to see if it matches and emits partial results to the Join Streaming stage.
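The DSL enablement added above might look like this in a settings file. This is a sketch: only `bullet.spark.dsl.data.producer.enable` appears in this page; the connector and converter setting names and their class values are assumptions based on Bullet DSL's configuration and should be checked against the Bullet DSL documentation:

```yaml
# Setting from this page: turn on the DSL data producer in Bullet Spark
bullet.spark.dsl.data.producer.enable: true
# Assumed DSL settings (hypothetical names/values; verify against Bullet DSL docs)
bullet.dsl.connector.class.name: "com.yahoo.bullet.dsl.connector.KafkaConnector"
bullet.dsl.converter.class.name: "com.yahoo.bullet.dsl.converter.AvroBulletRecordConverter"
```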
docs/backend/spark-setup.md (7 additions, 20 deletions)
@@ -8,31 +8,18 @@ Bullet is configured at run-time using settings defined in a file. Settings not
 
 ## Installation
 
-Download the Bullet Spark standalone jar from [JCenter](http://jcenter.bintray.com/com/yahoo/bullet/bullet-spark/).
+Download the Bullet Spark standalone jar from [Maven Central](https://repo1.maven.org/maven2/com/yahoo/bullet/bullet-spark/).
 
-If you are using Bullet Kafka as pluggable PubSub, you can download the fat jar from [JCenter](http://jcenter.bintray.com/com/yahoo/bullet/bullet-kafka/). Otherwise, you need to plug in your own PubSub jar or use the RESTPubSub built-into bullet-core and turned on in the API.
+If you are using Bullet Kafka as the pluggable PubSub, you can download the fat jar from [Maven Central](https://repo1.maven.org/maven2/com/yahoo/bullet/bullet-kafka/). Otherwise, you need to plug in your own PubSub jar or use the RESTPubSub built into bullet-core and turned on in the API.
 
-To use Bullet Spark, you need to implement your own [Data Producer Trait](https://github.com/bullet-db/bullet-spark/blob/master/src/main/scala/com/yahoo/bullet/spark/DataProducer.scala) with a JVM based project or you can use Bullet DSL (see below). If you choose to implement your own, you have two ways as described in the [Spark Architecture](spark-architecture.md#data-processing) section. You include the Bullet artifact and Spark dependencies in your pom.xml or other equivalent build tools. The artifacts are available through JCenter. Here is an example if you use Scala and Maven:
-
-```xml
-<repositories>
-    <repository>
-        <snapshots>
-            <enabled>false</enabled>
-        </snapshots>
-        <id>central</id>
-        <name>bintray</name>
-        <url>http://jcenter.bintray.com</url>
-    </repository>
-</repositories>
-```
+To use Bullet Spark, you need to implement your own [Data Producer Trait](https://github.com/bullet-db/bullet-spark/blob/master/src/main/scala/com/yahoo/bullet/spark/DataProducer.scala) with a JVM-based project, or you can use Bullet DSL (see below). If you choose to implement your own, you have two ways, as described in the [Spark Architecture](spark-architecture.md#data-processing) section. You include the Bullet artifact and Spark dependencies in your pom.xml or other equivalent build tool. The artifacts are available through Maven Central. Here is an example if you use Scala and Maven:
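A sketch of what that dependency might look like (the full example is collapsed in this diff; the version number below is illustrative, so check Maven Central for the latest):

```xml
<dependency>
    <groupId>com.yahoo.bullet</groupId>
    <artifactId>bullet-spark</artifactId>
    <version>1.0.0</version>
</dependency>
```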
docs/backend/storm-setup.md (26 additions, 26 deletions)
@@ -8,27 +8,14 @@ Bullet is configured at run-time using settings defined in a file. Settings not
 
 ## Installation
 
-To use Bullet, you need to implement a way to read from your data source and convert your data into Bullet Records (bullet-record is a transitive dependency for Bullet and can be found [in JCenter](ingestion.md#installing-the-record-directly). You have a couple of options in how to get your data into Bullet:
+To use Bullet, you need to implement a way to read from your data source and convert your data into Bullet Records (bullet-record is a transitive dependency for Bullet and can be found in [Maven Central](ingestion.md#installing-the-record-directly)). You have a couple of options in how to get your data into Bullet:
 
-1. You can implement a spout (or even a topology) that reads from your data source and emits Bullet Records. You then write a main class that submits the topology with your topology wired in [using our submit method](https://github.com/bullet-db/bullet-storm/blob/master/src/main/java/com/yahoo/bullet/storm/StormUtils.java).
+1. You can implement a spout (or even a topology) that reads from your data source and emits Bullet Records. You then write a main class that submits the topology with your topology wired in [using our submit methods](https://github.com/bullet-db/bullet-storm/blob/master/src/main/java/com/yahoo/bullet/storm/StormUtils.java).
 2. Use Bullet DSL to configure a spout (and optionally a bolt) that you provide in the settings to our main class. This will wire up your data source and data format to Bullet without you having to write code!
 
 You can refer to the [Pros and Cons](storm-architecture.md#data-processing) of the various approaches to determine what works best for you.
 
-You need a JVM based project that implements one of the two options above. You include the Bullet artifact and Storm dependencies in your pom.xml or other dependency management system. The artifacts are available through JCenter, so you will need to add the repository.
-
-```xml
-<repositories>
-    <repository>
-        <snapshots>
-            <enabled>false</enabled>
-        </snapshots>
-        <id>central</id>
-        <name>bintray</name>
-        <url>http://jcenter.bintray.com</url>
-    </repository>
-</repositories>
-```
+You need a JVM-based project that implements one of the two options above. You include the Bullet artifact and Storm dependencies in your pom.xml or other dependency management system. The artifacts are available through Maven Central.
 
 ```xml
 <dependency>
@@ -45,7 +32,7 @@ You need a JVM based project that implements one of the two options above. You i
 </dependency>
 ```
 
-If you just need the jar artifact directly, you can download it from [JCenter](http://jcenter.bintray.com/com/yahoo/bullet/bullet-storm/).
+If you just need the jar artifact directly, you can download it from [Maven Central](https://repo1.maven.org/maven2/com/yahoo/bullet/bullet-storm/).
 
 You can also add ```<classifier>sources</classifier>``` or ```<classifier>javadoc</classifier>``` if you want the sources or javadoc. We also package up our test code where we have some helper classes to deal with [Storm components](https://github.com/bullet-db/bullet-storm/tree/master/src/test/java/com/yahoo/bullet/storm). If you wish to use these to help with testing your topology, you can add another dependency on bullet-storm with ```<type>test-jar</type>```.
@@ -80,7 +67,22 @@ Storm topologies are generally launched with "fat" jars (jar-with-dependencies),
 ### Older Storm Versions
 
 Since package prefixes changed from `backtype.storm` to `org.apache.storm` in Storm 1.0 and above, you will need to get the storm-0.10 version of Bullet if
-your Storm cluster is still not at 1.0 or higher. You change your dependency to:
+your Storm cluster is still not at 1.0 or higher. These older packages are only available in JCenter, which has been sunset but remains available in read-only
+mode. We recommend that you do not use those versions and that you migrate to Bullet Storm versions greater than 1.1.2 as soon as possible. If you still
+need them, you can change your dependency to:
+
+```xml
+<repositories>
+    <repository>
+        <snapshots>
+            <enabled>false</enabled>
+        </snapshots>
+        <id>central</id>
+        <name>bintray</name>
+        <url>http://jcenter.bintray.com</url>
+    </repository>
+</repositories>
+```
 
 ```xml
 <dependency>
@@ -90,8 +92,6 @@ your Storm cluster is still not at 1.0 or higher. You change your dependency to:
 </dependency>
 ```
 
-The jar artifact can be downloaded directly from [JCenter](http://jcenter.bintray.com/com/yahoo/bullet/bullet-storm-0.10/).
-
 You can also add ```<classifier>sources</classifier>``` or ```<classifier>javadoc</classifier>``` if you want the source or javadoc and ```<type>test-jar</type>``` for the test classes as with bullet-storm.
 
 Also, since storm-metrics and the Resource Aware Scheduler are not in Storm versions less than 1.0, there are changes in the Bullet settings. The settings that set the CPU and memory loads do not exist (so the config file does not specify them). The setting to enable the topology scheduler are no longer present (you can still override these settings if you run a custom version of Storm by passing it to the storm jar command. [See below](#launch).) You can take a look the settings file on the storm-0.10 branch in the Git repo.

@@ -157,26 +157,26 @@ The Bullet Storm jar is not built with Bullet DSL or with other dependencies you
docs/pubsub/architecture.md (4 additions)
@@ -23,6 +23,10 @@ A PubSub provides Publisher and Subscriber instances that, depending on the cont
 
 The PubSub layer does not deal with queries and results and just works on instances of messages of type ```com.yahoo.bullet.pubsub.PubSubMessage```. These [PubSubMessages](https://github.com/bullet-db/bullet-core/blob/master/src/main/java/com/yahoo/bullet/pubsub/PubSubMessage.java) are keyed (```id``` and ```sequence```), store content and metadata. This is a light wrapper around the payload and is tailored to work with multiple results per query and support communicating additional information and signals to and from the PubSub in addition to just queries and results.
 
+### SerDe
+
+The PubSub layer also supports a ```PubSubMessageSerDe``` interface to customize how the data is stored in the message. The SerDe is only used for publishing a message from the Web Service and for reading it in the backend. This is particularly relevant if you are storing the PubSubMessage in a [storage layer](../ws/setup.md#storage-configuration) for resiliency. Using an appropriate SerDe controls how the payload is serialized and deserialized for transportation and storage. For instance (and by default), the [ByteArrayPubSubMessageSerDe](https://github.com/bullet-db/bullet-core/blob/master/src/main/java/com/yahoo/bullet/pubsub/ByteArrayPubSubMessageSerDe.java) is used for queries. This converts the Query object payload into a byte[] when storing and transmitting it to the backend. The backend, however, does not reify the payload back into a Query object until it needs the Query, so the PubSubMessage can be serialized and deserialized multiple times as it is transferred between components without needless conversions back and forth. You can write your own if you wish to customize this behavior and control what is stored in the Storage layer if one is used. For instance, BQL provides a [LazyPubSubMessageSerDe](https://github.com/bullet-db/bullet-bql/blob/master/src/main/java/com/yahoo/bullet/bql/query/LazyPubSubMessageSerDe.java) that keeps the query as a String and makes the backend create the Query object using BQL (normally this is done in the API)!
+
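The lazy reification described in the SerDe paragraph above can be sketched as follows. This is NOT the Bullet interface; the class and method names are hypothetical, and a String stands in for the Query object, purely to show why deferring deserialization avoids needless conversions:

```java
import java.nio.charset.StandardCharsets;

// Conceptual sketch: the message keeps its payload as bytes and only reifies it
// into an object when a component actually asks for it.
public class LazyPayload {
    private final byte[] serialized;
    private String reified;             // the "Query" stand-in, built on first access
    private int deserializations = 0;   // counts how often we actually reify

    public LazyPayload(String query) {
        this.serialized = query.getBytes(StandardCharsets.UTF_8);
    }

    // Transporting the raw bytes between components costs no deserialization
    public byte[] toBytes() {
        return serialized;
    }

    // Reify only when the payload is needed, and only once
    public String getPayload() {
        if (reified == null) {
            reified = new String(serialized, StandardCharsets.UTF_8);
            deserializations++;
        }
        return reified;
    }

    public int getDeserializationCount() {
        return deserializations;
    }

    public static void main(String[] args) {
        LazyPayload message = new LazyPayload("SELECT * FROM STREAM(10000, TIME)");
        message.toBytes();                                  // shuffle between components...
        message.toBytes();                                  // ...still no reification
        System.out.println(message.getDeserializationCount());
        message.getPayload();                               // backend finally needs it
        message.getPayload();                               // cached thereafter
        System.out.println(message.getDeserializationCount());
    }
}
```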
 ## Choosing a PubSub implementation
 
 If you want to use an implementation already built, we currently support: