| 2016-05-03 |[**0.2.2**](https://github.com/yahoo/bullet-ui/releases/tag/v0.2.2)| Fixes maxlength of the input for points |
+
| 2016-05-02 |[**0.2.1**](https://github.com/yahoo/bullet-ui/releases/tag/v0.2.1)| Fixes a bug with a dependency that broke sorting the Filters |
| 2016-05-01 |[**0.2.0**](https://github.com/yahoo/bullet-ui/releases/tag/v0.2.0)| Release for Top K and Distribution. Supports Bullet Storm 0.4.2+ |
| 2016-02-21 |[**0.1.0**](https://github.com/yahoo/bullet-ui/releases/tag/v0.1.0)| The first release with support for all features included in Bullet Storm 0.2.1+ |
docs/quick-start.md: 5 additions & 5 deletions
@@ -6,7 +6,7 @@ By the following the steps in this section, you will:
* Setup the Bullet topology using a custom spout on [bullet-storm-0.4.2](https://github.com/yahoo/bullet-storm/releases/tag/bullet-storm-0.4.2)
* Setup the [Web Service](ws/setup.md) talking to the topology and serving a schema for your UI using [bullet-service-0.0.1](https://github.com/yahoo/bullet-service/releases/tag/bullet-service-0.0.1)
-
* Setup the [UI](ui/setup.md) talking to the Web Service using [bullet-ui-0.2.0](https://github.com/yahoo/bullet-ui/releases/tag/v0.2.0)
+
* Setup the [UI](ui/setup.md) talking to the Web Service using [bullet-ui-0.2.2](https://github.com/yahoo/bullet-ui/releases/tag/v0.2.2)
```bullet.query.aggregation.top.k.sketch.entries: 1024``` 0.75 times this number is the number of unique items for which counts can be done exactly. Approximates after.
-
```bullet.query.aggregation.distribution.max.points: 100``` The maximum number of points you can generate, use or provide for a Distribution aggregation.
+
```bullet.query.aggregation.distribution.max.points: 200``` The maximum number of points you can generate, use or provide for a Distribution aggregation.
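As a rough illustration of what these two settings imply, here is a minimal sketch. It is not Bullet code: the constant names and the `points_allowed` helper are made up for this example, and only the two numeric values come from the configuration above.

```python
# Values copied from the two configuration lines above.
TOP_K_SKETCH_ENTRIES = 1024
DISTRIBUTION_MAX_POINTS = 200

# Per the description, counts stay exact for up to 0.75 * entries unique items.
exact_unique_items = int(0.75 * TOP_K_SKETCH_ENTRIES)


def points_allowed(points):
    """Hypothetical check: does a list of distribution points fit the limit?"""
    return len(points) <= DISTRIBUTION_MAX_POINTS


print(exact_unique_items)  # 768
```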
!!! note "Want to tweak the example topology code?"
This was enabled as part of the configuration for the example backend. It is the timestamp when Bullet first saw this record. If you have timestamps in your data (as this example does), you will be able to tell exactly when your data was received by Bullet. Coupled with the timestamps in the Result Metadata for when your query was submitted and terminated, this lets you tell why a particular record was or was not seen in Bullet.
@@ -53,7 +53,7 @@ The Output Data section lets you aggregate or choose to see raw data records. Yo
**Example: Finding and picking out fields from events that have probability > 0.5**
!!! note "So why is the approximate count what it is?"
-
The backend should have produced ```20 * 200000/101``` or ```39603``` tuples with unique uuids. Due to the synthetic nature of the data generation and the building delays mentioned above, we estimated that we should subtract about 20 tuples for every 10 s the query runs. Since this query ran for ```200 s```, this makes the actual uuids generated to be at best ```39603 - (200/10) * 20``` or ```39203```. The result from Bullet was ```38886```, which is an error of ```~0.8 %```. The real error is probably about a *third* of that because we assumed the delay between periods to be 1 ms. It is more on the order of 2 or 3 ms, which makes the number of uuids actually generated even less.
+
The backend should have produced ```20 * 200000/101``` or ```39603``` tuples with unique uuids. Due to the synthetic nature of the data generation and the building delays mentioned above, we estimated that we should subtract about 20 tuples for every 10 s the query runs. Since this query ran for ```200 s```, this makes the actual uuids generated to be at best ```39603 - (200/10) * 20``` or ```39203```. The result from Bullet was ```39069```, which is an error of ```~0.3 %```. The real error is probably less than that because we assumed the delay between periods to be 1 ms to get the ```39203``` number. It's probably slightly larger making the actual uuids generated lower and closer to our estimate.
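The arithmetic in this note can be reproduced with a few lines of Python. This is only a worked check of the numbers quoted above; the variable names are illustrative.

```python
# Tuples the backend should have produced, using the note's formula.
tuples_expected = 20 * 200000 // 101

# Subtract about 20 tuples for every 10 s of the 200 s query.
query_seconds = 200
adjusted = tuples_expected - (query_seconds // 10) * 20

# Count reported by Bullet, and its relative error against the adjusted figure.
bullet_result = 39069
error_pct = (adjusted - bullet_result) / adjusted * 100

print(tuples_expected, adjusted)
print(f"{error_pct:.2f} %")
```

The last line prints a value of roughly 0.34 %, consistent with the ```~0.3 %``` quoted above.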
## Group all
@@ -129,7 +129,7 @@ When choosing the Grouped Data option, you can choose to add fields to group by.
The metrics you apply on fields are all numeric presently. If you apply a metric on a non-numeric field, Bullet will try to **type-cast** your field into a number and if it's not possible, the result will be ```null```. The result will also be ```null``` if the field was not present or no data matched your filters.
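The cast-or-null behavior described above can be pictured with the following sketch. It is an assumption-laden illustration in Python, not Bullet's actual casting rules: `to_number` is a hypothetical helper and `None` stands in for ```null```.

```python
def to_number(value):
    """Return value as a float, or None (standing in for null) if it cannot be cast."""
    if value is None:
        # A missing field yields null.
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        # Non-numeric strings, maps, lists and similar cannot be cast.
        return None


print(to_number("42"))      # 42.0
print(to_number("abc"))     # None
print(to_number({"a": 1}))  # None
```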
@@ -156,7 +156,7 @@ In this example, we group by ```tuple_number```. Recall that this is the number
!!! note "What happens if I group by uuid?"
-
Try it out! Nothing bad should happen. If the number of unique group values exceeds the [maximum configured](../quick-start.md#setting-up-the-example-bullet-topology) (we used 1024 for this example), you will receive a *uniform sample* across your unique group values. The results for your metrics however, are **not sampled**. It is the groups that are sampled on. This means that is **no** guarantee of order if you were expecting the *most popular* groups or similar. We are working on adding a ```TOP K``` query that can support these kinds of use-cases.
+
Try it out! Nothing bad should happen. If the number of unique group values exceeds the [maximum configured](../quick-start.md#setting-up-the-example-bullet-topology) (we used 1024 for this example), you will receive a *uniform sample* across your unique group values. The results for your metrics, however, are **not sampled**. It is the groups that are sampled on. This means that there is **no** guarantee of order if you were expecting the *most popular* groups or similar. You should use the Top K query in that scenario.
!!! note "Why no Count Distinct after Grouping"
@@ -167,3 +167,82 @@ In this example, we group by ```tuple_number```. Recall that this is the number
Good job, eagle eyes! Unfortunately, whenever we group on fields, those fields become strings under the current implementation. Rather than convert them back at the end, we have currently decided to leave it as is. This means that in your results, if you try and sort by a grouped field, it will perform a lexicographical sort even if it was originally a number.
However, this also means that you can actually group by any field - including non primitives such as maps and lists! The field will be converted to a string and that string will be used as the field's representation for uniqueness and grouping purposes.
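To see why sorting by a grouped field can look surprising, here is a small Python illustration of lexicographical versus numeric ordering. The sample values are invented for this example; the point is only that strings compare character by character.

```python
# Grouped fields come back as strings, so these are string values.
grouped_values = ["10", "9", "2", "100"]

# A plain sort compares character by character (lexicographically).
lexicographic = sorted(grouped_values)

# Converting back to numbers first gives the order you probably expected.
numeric = sorted(grouped_values, key=int)

print(lexicographic)  # ['10', '100', '2', '9']
print(numeric)        # ['2', '9', '10', '100']
```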
+
+
## Distributions
+
+
In this example, we find distributions of the ```duration``` field. This field is generated randomly from 0 to 10,049, with a tendency to have values that are closer to 0 than 10,049. Let's see if this is true. Note that since this field has random values, the results you see per query are the values generated during that query's duration.
+
+
The distribution type of output data requires you to pick a type of distribution: ```Quantiles```, ```Frequencies``` or ```Cumulative Frequencies```. ```Quantiles``` lets you get various percentiles (e.g. 25th, 99th) of your numeric field. ```Frequencies``` lets you break up the range of values of your field into intervals and get a count of how many values fell into each interval. ```Cumulative Frequencies``` does the same as ```Frequencies``` but each interval includes the counts of all the intervals prior to it. Both ```Frequencies``` and ```Cumulative Frequencies``` also give you a probability of how likely a value is to fall into the interval.
+
+
All the distributions require you to specify some numeric points. For ```Quantiles```, these points are between 0 and 1 and each value denotes the percentile you are looking for (0.25 for the 25th percentile, 0.99 for the 99th, etc.). For ```Frequencies``` and ```Cumulative Frequencies```, the points are between the minimum and maximum value of your field and every 2 contiguous points create an interval. However, the first interval always runs from *-∞* to the first point and the last interval always runs from your last point to *+∞*.
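The three distribution types and the way points turn into intervals can be sketched in plain Python. This is a simplified, exact illustration and not Bullet's implementation: the function names are hypothetical, the quantile rule is a simple rank lookup chosen for brevity, and treating each interval as closed on the left and open on the right is an assumption.

```python
import itertools
import math


def intervals(points):
    """Turn points into intervals, padded with -inf and +inf at the ends."""
    edges = [-math.inf] + sorted(points) + [math.inf]
    return list(zip(edges[:-1], edges[1:]))


def frequencies(values, points):
    """Count the values in each interval and the probability of each interval."""
    ivs = intervals(points)
    # Assumption: an interval includes its start and excludes its end.
    counts = [sum(1 for v in values if lo <= v < hi) for lo, hi in ivs]
    total = len(values)
    probabilities = [c / total if total else 0.0 for c in counts]
    return ivs, counts, probabilities


def cumulative(counts):
    """Each interval also includes the counts of all intervals before it."""
    return list(itertools.accumulate(counts))


def quantile(values, q):
    """Value at normalized rank q (between 0 and 1), using a simple rank lookup."""
    ordered = sorted(values)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


durations = [5, 15, 25, 35]
ivs, counts, probabilities = frequencies(durations, [10, 30])
print(ivs)                 # [(-inf, 10), (10, 30), (30, inf)]
print(counts)              # [1, 2, 1]
print(cumulative(counts))  # [1, 3, 4]
```

Two points produce three intervals here because of the open-ended first and last intervals.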
+
+
You can read much more about this in the UI help by clicking the ```Need more help?``` link.
+
+
### Exact
+
+
**Example: Finding the various percentiles of duration**
+
+
This example shows all 3 ways of specifying points and shows *exact* distribution results for the ```duration``` field.
To understand what this means, refer to the [explanation here](../ws/examples.md#normalized-rank-error). You can also refer to the help in the Result Metadata section.
+
+
!!! note "Wouldn't it be nice to graph these?"
+
+
This is in the works! We plan to add pivoting and graphing as a general option in the results pages. Feel free to follow [the issue here](https://github.com/yahoo/bullet-ui/issues/24).
+
+
## Top K
+
+
Top K lets you get the most *frequent items* or the *heavy hitters* for the values in a set of fields.
+
+
### Exact
+
+
This example gets the Top 3 most popular ```type``` values (there are only 6 but this illustrates the idea).
By adding ```duration``` into the fields, the number of unique values for ```(type, duration)``` is increased. However, because ```duration``` has a tendency to have low values, we will have some *frequent items*. The counts are now estimated. We ask for the top 300 results but we also say that they should have a count of at least 20. This restricts the overall number of results to 12.
The ```maximum_count_error``` value for the query above was ```3```. This means that the difference between the upper bound and the lower bound of each count estimate is ```3```. Bullet returns the upper bound as the estimate so subtracting ```3``` from each count gives you the lower bound of the count. Note that some counts are closer to each other than the count error. For instance, ```(quux, 1)``` and ```(baz, 0)``` have counts ```67``` and ```66``` but their true counts could be from ```64 to 67``` and ```63 to 66``` respectively. This means that ```(baz, 0)``` could well be the most frequent item for this query.
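The bound arithmetic above can be checked with a short Python sketch. Only the two counts and the error of ```3``` come from the text; the helper names are made up for illustration.

```python
MAX_COUNT_ERROR = 3

# Bullet returns the upper bound of each count as the estimate.
estimates = {("quux", 1): 67, ("baz", 0): 66}


def count_range(upper_bound, max_error):
    """Return the (lower, upper) bounds of the true count."""
    return (upper_bound - max_error, upper_bound)


def overlaps(a, b):
    """True if two (lower, upper) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


ranges = {item: count_range(count, MAX_COUNT_ERROR) for item, count in estimates.items()}
print(ranges[("quux", 1)])  # (64, 67)
print(ranges[("baz", 0)])   # (63, 66)
print(overlaps(ranges[("quux", 1)], ranges[("baz", 0)]))  # True
```

Because the two ranges overlap, the relative order of these two items cannot be determined from the estimates alone.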