dtree: dynamic trees à la carte (C++)

David Eisenstat

July 2012, revised October 2013 and May 2014

Download

The latest release is May 2014.

Feel free to contact me if you have any questions.

Introduction

“Dynamic tree” refers to one of a family of abstract data types that specify operations for querying and updating a collection of trees (a forest). These operations typically include Link, which joins two trees into one, and Cut, which splits one tree into two. The forest is often node- or edge-weighted, with operations to update many weights at a time or aggregate them.

Dynamic trees were introduced in the early 1980s by Sleator and Tarjan, who used them to obtain the then-fastest algorithm for maximum flow. It is the case for many graph problems that the solutions with the best known asymptotic running time use dynamic trees.

Most dynamic-tree operations can be implemented in (amortized) logarithmic time. For dtree, the few exceptions are noted individually. In theory, amortization is not necessary, but it has led to simpler data structures that perform better in practice. The worst-case time per operation is linear.

Please let me know if you find dtree useful.

Why dtree?

Efficient
dtree adheres to the pay-for-what-you-use philosophy of C++, and I have spent a lot of time tuning it. In particular, the internal operation Expose makes one pass. Some other implementations, including every possible generic implementation of top trees, make three.
Flexible
dtree provides operations with tree scope as well as path scope. Many dynamic tree operations described in the literature are implemented, as are dynamic sequences based on splay trees.
Open source
dtree is available under a permissive MIT license.
Portable
The dtree header files are intended to conform to the C++98 standard. dtree is known to work on Linux and OS X with recent versions of the GNU Compiler Collection (GCC), Clang, and Intel C++ Compiler. Unfortunately, dtree is incompatible with Microsoft Visual C++ 2010, due to a compiler bug involving shadowed names (see the reduced test case in msvc_bug.cpp). There seems to be no reasonable workaround.
Well tested
dtree comes with an extensive test suite. For many tree/sequence configurations, the results of random operations are compared against the ground truth computed by a simpler implementation. Between operations, all data-structural invariants are verified.

Installation

Copy the header files in standalone/ into your project.

Building the example programs and tests requires a POSIX environment and C99 functionality. Run make to build the examples with GCC and make check to build and run the tests. To use Clang, for example, instead of GCC, run make CXX=clang++. To fix errors involving __attribute__, run the script tool/remove_attribute.sh and try again.

Example programs

flow/
Implements the push-relabel algorithm for maximum (pre)flow due to Goldberg and Tarjan, with global and gap relabeling, with and without dynamic trees. Please note that, while the core algorithm is competitive with Goldberg’s implementation, the input and initialization steps could be made significantly more efficient.
maze/
Samples and displays random spanning trees of a grid graph.
tsp/
Implements a 2-opt local search for the traveling salesman problem in the 2D unit square, with dynamic sequences.

Version history

The July 2012 version was the first public release. The October 2013 version excised the hack previously required to support index predicates. The May 2014 version fixed a bug that prevented the naive versions of seq-inl.h and tree-inl.h from being included in the same file. As part of the fix, the function template Connected was deprecated in favor of SameSeq and SameTree.

May 2014 also changed the usage of WithReverseBy and WithEvertBy and introduced the transformers WithAncDescValue and WithAncDescAggr, the method SetNonDescValue, and the preprocessor symbol DTREE_VERSION (currently 201405L).

Unless another bug is found, May 2014 likely will be the last version of dtree.

Acknowledgments

dtree never would have been written without the vision, encouragement, and support of my PhD coadvisor Philip N. Klein. I would like to thank Sarah Eisenstat and Daniel Patterson for reading early drafts of this manual. Any errors or omissions that remain are my own.

The development of dtree was funded in part by NSF grant CCF-0964037.

Quick start: Sleator–Tarjan trees

The following MIT-licensed code demonstrates the operations of the dynamic tree described in the 1985 paper “Self-Adjusting Binary Search Trees” by Sleator and Tarjan.

#include "dtree/tree.h"
#include "dtree/tree-inl.h"

class Forest {
 public:
  // don't change Cost to an unsigned type; that breaks the Min aggregate
  // read the appendix before changing Cost to a floating-point type
  typedef int Cost;

 private:
  typedef dtree::WithAncAggr<dtree::Min<Cost>,
                             dtree::WithAncValue<dtree::Add<Cost>,
                                                 dtree::Begin<> > > C;
  typedef dtree::WithEvert<C> E;
  typedef dtree::EndTree<E> Node;

 public:
  typedef size_t NodeId;
  Forest();
  ~Forest();
  // creates a forest with node ids 0..size-1
  void Initialize(size_t size);
  // as in [ST85]
  Cost FindCost(NodeId v) { return C::Value(&node_[v]); }
  NodeId FindRoot(NodeId v) { return Root(&node_[v]) - node_; }
  NodeId FindMin(NodeId v);
  void AddCost(NodeId v, Cost x) { C::AddToAnc(&node_[v], x); }
  void Link(NodeId v, NodeId w) { dtree::Link(&node_[v], &node_[w]); }
  void Cut(NodeId v) { dtree::Cut(&node_[v]); }
  void Evert(NodeId v) { E::Evert(&node_[v]); }

 private:
  Node* node_;
  // disallows the copy constructor and the assignment operator
  Forest(const Forest&);
  void operator=(const Forest&);
};

Forest::Forest() : node_(NULL) {
}

Forest::~Forest() {
  delete[] node_;
}

void Forest::Initialize(size_t size) {
  Node* old_node = node_;
  // avoids a double delete[] if new[] throws
  node_ = NULL;
  delete[] old_node;
  node_ = new Node[size];
}

Forest::NodeId Forest::FindMin(NodeId v) {
  return C::FindRootmostAnc(&node_[v],
                            dtree::LessEqual(C::AggrAnc(&node_[v]))) - node_;
}

Terminology

Round brackets (like this) delimit parenthetical remarks. Square brackets [like this] resolve syntactic ambiguity.

For us, trees always are rooted and unordered. All of the terms defined below are standard except for “leafmost” and “rootmost”, whose meanings should be clear. Every node is its own ancestor and descendant.

Formally, the topology of a forest on a set of nodes V is specified by a partial map p : VV such that, for every integer k > 0, the iterated map pk has no fixed point (∀k > 0. ∄uV. pk(u) = u). The parent of a node uV is p(u). There is an edge from u to p(u). The ancestors of u comprise the set {pk(u) : k ≥ 0}. The proper ancestors of u are the ancestors of u except for u itself. With respect to a set of nodes UV, a node uU is rootmost if and only if U contains no proper ancestor of u. The root of u’s tree is the rootmost ancestor of u. Nodes u and v belong to the same tree if and only if the roots of their trees are the same node. The nodes belonging to the same tree as u comprise u’s tree.

Most of these terms have an analog describing the reverse relationship. The children of a node u are the nodes with parent u. The descendants of u are the nodes with ancestor u. The proper descendants of u are the descendants of u except for u itself. With respect to a set of nodes, a member u is leafmost if and only if the set contains no proper descendant of u. A leaf is a node without proper descendants.

Sequences are ordered from left to right. The topology of a collection of sequences on a set of nodes V is specified by partial maps l, r : VV such that, for every node uV, if l(u) is defined, then r(l(u)) = u, and if r(u) is defined, then l(r(u)) = u. The node immediately to the left of a node u is l(u). The node immediately to the right of a node u is r(u). A node u is properly to the left of a node v if and only if there exists an integer k > 0 such that lk(v) = u. A node u is properly to the right of a node v if and only if there exists an integer k > 0 such that rk(v) = u. With respect to a set of nodes UV, a node uU is leftmost if and only if U contains no node properly to the left of u. Node u is rightmost if and only if U contains no node properly to the right of u. Nodes u and v belong to the same sequence if and only if there exists an integer k ≥ 0 such that u ∈ {lk(v), rk(v)}. The nodes belonging to the same sequence as u comprise u’s sequence.

Transformers and template arguments

We use the term “transformer” to mean a template class that derives from its last template argument. Below is an example of a transformer.

template<typename Base>
class Transformer : public Base {
  /* ... */
};

Node types are constructed by applying the transformer dtree::Begin<Base>, whose sole argument defaults to the empty class dtree::Empty, followed by transformers specifying the features desired, followed by an end transformer. The simplest tree node, for example, has type dtree::EndTree<dtree::Begin<> >.

Other than the base class, transformers in dtree accept template arguments of three kinds: groups, semigroups, and predicates.

Groups

Groups specify how values are updated. dtree defines several groups. For more information, see the appendix.

Semigroups

Semigroups specify how aggregates are computed. dtree defines several semigroups. For more information, see the appendix.

The aggregate type of dtree::CountAndSum<C, S> is dtree::CasAggr<C, S>, a structure with two fields: count, of type C; and sum, of type S.

Predicates

Predicates specify search goals. dtree defines several predicates. For more information, see the appendix.

Three special predicates match the node at a specified index, which is asserted to be nonnegative. Formally, let the nodes in search scope be u1, …, un from dirmost to dirleast. (In case the search has nonlinear scope, the ordering is topological and inconsistent across invocations.) For [Index with Count] or IndexByCount, node ui is matched by indexes in the interval [i-1, i). For [Index with Sum] or IndexBySum, all node values in scope must be nonnegative, and node ui is matched by indexes in the interval [Si-1, Si), where Sj = value(u1) + … + value(uj).

Memory

Nodes cannot be copied and thus cannot be stored directly in Standard Template Library containers. Store smart pointers to nodes instead or use a lightweight container such as Boost’s scoped_array. Alternatively, one hook method to support a custom node container is provided.

Except for Assemble, an operation described in the appendix, every operation

dtree thus can be used in a multithreaded program if accesses to the same tree/sequence are properly synchronized.

Trees

Include "dtree/tree.h" for the types and "dtree/tree-inl.h" for the functions. The transformer dtree::EndTree<Base> yields a node type for trees without descendant pointers. The transformer dtree::EndTreeWithDesc<Base> yields a node type for trees with descendant pointers. Descendant pointers consume time and space and should be requested only if required. Each of these transformers must be the last to be applied (most derived).

Topological operations on trees

Topological operations on trees with descendant pointers

Changing the root of a tree (eversion)

The transformer dtree::WithEvert<Base> extends its argument to support the Evert operation.

Values and updates in trees

The transformer dtree::WithStaticValue<Type, Base> extends its second argument by a value field (Type must be default-constructible, copyable, and assignable). The transformer dtree::WithAncValue<Group, Base> extends its second argument by a value field supporting bulk addition and subtraction in Group with ancestor (path) scope. The transformer dtree::WithDescValue<Group, Base> extends its second argument by a value field supporting bulk addition and subtraction in Group with descendant (tree) scope. The transformer dtree::WithAncDescValue<Group, Base> extends its second argument by a value field supporting bulk addition and subtraction in Group with ancestor and descendant scope. Descendant values do not require descendant pointers.

Static values (not bulk-updatable)

Ancestor values (path-updatable)

Descendant values (tree-updatable)

Ancestor-descendant values (path- and tree-updatable)

With three caveats, all of the methods for ancestor and descendant values described above are available for WithAncDescValue. A custom group with PlusFilter and MinusFilter is required; see below for more details. Intuitively, the filter passes descendant updates only.

SetValue is restricted to trees with descendant pointers. The delta argument for the ancestor-value methods is filtered by delta = Group::MinusFilter(delta, delta). All of the operations with descendant scope use [PlusFilter instead of Plus] and [MinusFilter instead of Minus].

Aggregates and searches in trees

The transformer dtree::WithAncAggr<Semigroup, Base> extends its second argument to maintain Semigroup aggregates with ancestor (path) scope. The transformer dtree::WithDescAggr<Semigroup, Base> extends its second argument to maintain Semigroup aggregates with descendant (tree) scope. The transformer dtree::WithAncDescAggr<Semigroup, Base> extends its second argument to maintain Semigroup aggregates with both ancestor and descendant scope.

Aggregation uses the value field declared by second argument’s most derived transformer of type Begin, WithStaticValue, WithAncValue, WithDescValue, WithAncDescValue, or maybe WithEvert (don’t make any assumptions). The group implicitly associated with dtree::Begin<> is dtree::Nop<dtree::NoValue>. The group implicitly associated with dtree::WithStaticValue<Type, Base> is dtree::Nop<Type>.

Descendant aggregates require descendant pointers. Searches require an aggregate compatible with the search scope and predicate.

Ancestor aggregates (path-searchable)

Descendant aggregates (tree-searchable)

Ancestor-descendant aggregates (path- and tree-searchable)

All of the methods for ancestor and descendant aggregates described above are available for WithAncDescAggr.

Simulating edge values

Edges in dtree are not first-class. If edge values are required, there are at least two solutions. The first, which applies if there is no eversion, is to store the value of each edge in its leafmost endpoint. The second is to subdivide each edge and store its value in the midpoint. Neither solution is provided by dtree because both result in leaky abstractions.

Sequences

Include "dtree/seq.h" for the types and "dtree/seq-inl.h" for the functions. The transformer dtree::EndSeq<Base> yields a node type for sequences. This transformer must be the last to be applied (most derived).

Topological operations on sequences

Operating on contiguous subsequences

The template class dtree::ScopedCutDirOf<Node> streamlines the idiom of cutting a sequence, operating on one of its parts, and linking the parts back together again. Analogous subclasses dtree::ScopedCutLeftOf<Node> and dtree::ScopedCutRightOf<Node> binding dir to dtree::kLeft or dtree::kRight respectively also are provided.

Reversing a sequence

The transformer dtree::WithReverse<Base> extends its argument to support the ReverseSeq operation.

Values and updates in sequences

The transformer dtree::WithValue<Group, Base> extends its second argument by a value field supporting bulk addition and subtraction in Group.

Aggregates and searches in sequences

The transformer dtree::WithAggr<Semigroup, Base> extends its second argument to maintain Semigroup aggregates. Aggregation uses the value field declared by second argument’s most derived transformer of type Begin, WithValue, or maybe WithReverse (don’t make any assumptions). The group implicitly associated with Begin is dtree::Nop<dtree::NoValue>.

Searches require an aggregate compatible with the search predicate.

Directed pairs

The template group dtree::DpAdd<T> provides pairs of values under componentwise addition that, when used with the proper transformers, respond to eversion or reversal. One element of the pair is denoted lw, for leafward or leftward, and the other is denoted rw, for rootward or rightward. The implementation of directed pairs uses the wreath product (T, +) ≀ Z2. Read the appendix before choosing T to be an unsigned or floating-point type.

The value type for directed pairs is dtree::DpValue<T>, with a zero-argument constructor that constructs zero and a two-argument constructor with arguments lw and rw. The aggregate type for directed pairs is dtree::DpAggr<T>, with analogous constructors. These types are distinct because DpValue also has a flip member. Both types have getters .lw() and .rw() and setters .set_lw(T lw) and .set_rw(T rw). Additionally, getter .dw(int dir) and setter .set_dw(int dir, T dw) provide generic access to lw and rw by accepting a direction argument that is either dtree::kLw or dtree::kRw. For convenience, it is guaranteed that dtree::kLw == dtree::kLeaf and dtree::kRw == dtree::kRoot and dtree::kLw == dtree::kLeft and dtree::kRw == dtree::kRight.

Directed pairs are especially useful in combination with simulated edge values.

Directed pairs for trees

Directed pairs for sequences

Semigroups for directed pairs

Pass these to WithAncAggr, WithDescAggr, or WithAggr.

Predicates for directed pairs

Pass these to Find*.

Appendix

The appendix describes dtree in more detail than is necessary for most uses.

Custom groups

“Group”, Plus, and Minus are misnomers; see the nonstandard axioms below. Groups are specified by classes with three public members.

An example of a group follows.

struct AddInt {
  typedef int Type;
  int Plus(int x, int y) { return x + y; }
  int Minus(int x, int y) { return x - y; }
};

Groups must satisfy several axioms. Plus and Minus must be associative.

Plus(·, y) and Minus(·, y) must be inverses.

Groups for WithEvertBy and WithReverseBy

To be used with WithEvertBy or WithReverseBy, a group must have two additional public members.

FlippedFromValue must be a homomorphism from the group to Z2.

flip_delta() must have an effect.

Groups for WithAncDescValue

To be used with WithAncDescValue, a group must have two additional public members.

There are several more axioms. To specify them compactly, we stipulate the existence of a function Type Filter(Type x) with the following properties.

Custom semigroups

Semigroups are specified by classes with four public members.

An example of an semigroup follows.

struct MinInt {
  typedef int Type;
  int AggrFromValue(int x) { return x; }
  int CombineAggrs(int a, int b) { return std::min(a, b); }
  int empty_aggr() { return std::numeric_limits<int>::max(); }
};

The group must have a right action on aggregates consisting of two public members.

Right actions must satisfy several axioms analogous to the group axioms.

Semigroups must satisfy several axioms and be compatible with the group and its right action. CombineAggrs must be commutative and associative.

empty_aggr() must be the identity element of CombineAggrs.

AggrFromValue must commute with Plus(·, y) and Minus(·, y).

Plus and Minus must distribute over CombineAggrs.

Semigroups for WithAncDescValue

The group must have a filtered right action on aggregates consisting of two public members.

This action must satisfy the following two axioms, where Filter is the witness from before.

Custom predicates

Predicates must satisfy the following three axioms. In algebraic terms, the support of a predicate must be a proper two-sided ideal of the semigroup.

Overflow and unobserved violations of the axioms

Technically, since the behavior of signed-integer overflow is undefined in C++, dtree::Add<int> may not be a group. Even if overflow is resolved modulo UINT_MAX + 1, the group dtree::Add<int> is not compatible with the semigroup dtree::Min<int>, as the distributive axiom min(a + x, b + x) == min(a, b) + x is violated in some cases where exactly one of the additions on the left-hand side wraps around. Floating-point numbers are problematic as well; floating-point addition, for example, is not associative.

Fortunately, axiom violations that dtree does not observe do not affect correctness. Since dtree constructs values and aggregates in only a few ways, detailed below, careful use can keep all violations hidden. For integers, it suffices never to store negative values. For floating-point numbers, this means using a subset on which floating-point arithmetic behaves like fixed-point.

At all times, for every node u and every value field, the value x of u may be constructed. Additionally, for every node w in the same tree/sequence as u, the value Minus(x, y) may be constructed, where y is the value of w. For every [aggregate associated with the value field] and every nonempty set of nodes in the same tree/sequence, the aggregate a of that set may be constructed, as may, for every node u in the set, Minus(a, x), where x is the value of u. The identity element empty_aggr() may be constructed. WithAncDescAggr may construct Minus(empty_aggr(), x).

Naive implementations

For the non-standalone headers, defining the preprocessor symbol NAIVE_DTREE causes the implementations based on splay trees to be replaced by ones based on linked lists. The latter often are faster for small numbers of nodes. The naive tree omits descendant pointers, descendant values, and descendant aggregates. The naive sequence omits nothing.

Miscellaneous operations on trees

Include "dtree/tree-extra.h".


© 2012–2014 David Eisenstat