Goal: Efficient (time and space) quantiles over sliding, time-bounded windows. Terms: t: abstract quantity of time units over which measurements are drawn N: maximum number of events in time window V: the number of distinct observable values we take this to be the integers [0,V) (support is planned for dynamic ranges with fixed relative error) q: number of arbitrary quantiles to be measured N_b: number of bytes in word required to represent a value in [0,N) V_b: number of bytes in word required to represent a value in [0,V) take log_256(x), round up to nearest word size: 1, 2, 4, 8 w: word size of machine in bits Requirements: monotonic time The algorithms here are presented in a way that depends on V for time. The goal is O(1), but the best hypothetical we currently have is O(loglogV). Let's briefly layout the algorithm. We will use the following window of values for example purposes: { 5, 6, 5, 1, 1, 8, 9 } sorted this is: [ 1, 1, 5, 5, 6, 8, 9 ] We always store at least one copy of each event along with time it occurred. We do this in a circular buffer of length N. Events are ordered by time due to the monotonic time requirement. This means that we always know where the next event will go -- at the head of the list -- and where the next element may be popped from -- the tail of the list. This is trivial to maintain in O(1) per event. The second structure we maintain is an associative array from values in [0,V) to their frequency in [0,N). This is trivial to maintain in O(1) per event as an array of values (full array, not sparse). In our example it would be: 0: 0 1: 2 2: 0 ... 5: 2 6: 1 ... 8: 1 9: 1 ... The bulk of the interesting work is in the maintenance of the quantiles themselves. For each quantile, we store which quantile (as a double in [0,1]) along with the current numeric value. In addition, we store two values which reference an imaginary flattened, sorted version of the full window of values. We store the absolute index into this flattened version, along with the offset into the run of values it is. From our example, the median quantile would have the values: index: 3 value: 5 offset: 1 (it points to the first 5) When a new value is inserted, we may have to adjust the pseudo-index of the current value the quantile is pointed to. If the new value is less than or equal to the value of the quantile, the pseudo index has effectively increased by one and we track it accordingly. Otherwise, the pseudo-index is unaffected. We then possibly adjust the pseudo-index to match the target index of the quantile for the size array we now have. We can move up or down in a run of the same value in constant time with a simple decrement or increment. For instance, if a 0 value was added to the example, the median would actually be the first 5 and we'd only need to decrement the offset. In the event we are in the run of the wrong number, we can detect this event when frequency < 0 or > the count of the value, we need to search for the next lowest or highest value in the window and set the quantiles value to that. Naively, we must seach the entire value space of V, for O(V). We can do this with a very low constant factor. Using Van Emde Boas trees, we can perform this operation in O(loglogV). With naive find prev/next: space: O(V*N_b + N*V_b + q + 1) time: O(Vq) per event we speculate this to be closer to O(1) under realistic scenarios This may require a linear scan through the universe V to find next/prev per quantile tracked per event. With fast find prev/next: space: O(V*N_b + N*V_b + q + 1 + V/8) time: O(Vq) per event, low constant => O(Vq/w) we speculate this to be closer to O(1) under realistic scenarios This improves the speed by maintaining an auxillary bitset marking numbers in the range [0,V). We can skip entire any empty block the size of w with a single comparison against 0 in this case. Inside a w-block, we can use a single instruction like bit scan reverse if it is available. We anticipate the constant to be on the order of 1/w potentially making this very fast. O(Vq/w) = O(Vq), however. With Van Emde Boas tree for finding prev/next: space: TODO time: O(q * loglogV) per event This is a direct extension to maintain a Van Emde Boas tree for its fast find previous/next functions. For small universes of values, the constants are likely to swamp out the "fast" naive method.