Trees, roots and leaves #2 @ Software @ Cunobaros

Comments? Send me an e-mail, read what others have written, or say what you think below.

Trees, roots and leaves #2

This is the second article of three in which I create a generic, re-usable tree. In the first part, I defined the requirements and made a start at the implementation, putting the value and parent members in place. In this part, we'll see how the management of children brings its own, special headaches, and finish with something that is quite complete and usable.

If it weren't for those pesky kids

With the requirement that I should be able to take a subtree out and treat it as a separate tree, I must use dynamic containment, as a list of pointers, rather than direct aggregation to hold the children. I choose to use an ordinary std::vector to keep track of the pointers, and I'll give access to many of its functions through simple forwarding functions.

template <typename T >
class tree_node
{
public:
    typedef tree_node<T> node_type;
    typedef tree_node<T>& reference;
    typedef const tree_node<T>& const_reference;
    typedef tree_node<T>* pointer;
    typedef std::vector<pointer> child_holder;
    typedef child_holder::iterator iterator;
    typedef const T& const_value_reference;
    ...
    iterator begin()
    {
        return children_.begin();
    }
    reference front()
    {
        return *children_.front();
    }
    ...
private:
    child_holder children_;
    ...

However, this is the first obvious problem with my new container. Because I have a vector of pointers, I have to de-reference what I get from the call to front(). So what do you get if you de-reference the iterator above? Right. A pointer to a tree_node. That it is a tree_node and not the template type isn't a problem - as I said earlier this is a feature, not an implementation detail - but I would ordinarily expect to get a reference out rather than a pointer. For the time being, I will ignore this, as it does not fundamentally affect how the class works, but just makes for uglier code when using it. It's an issue I will return to in the last part, though.

While we all know you should never break encapsulation casually, this class illustrates why. Were I to reveal the storage mechanism of the children, it would allow the uncontrolled adding and deletion of children, something that could, no, make that would, lead to memory leaks and access violations. We must keep track of those relationships.

Liaisons dangereuse

A node serving as branch or root needs to be able to take in new children and get rid of old. Since we're talking about a parent-child relationship, I decided to call these adopt and disown, to get a bit of drama. At first look, they are simple: the first adds a child to an internal list of children while the second removes it. But it is a bit more complicated than that - relationships between parents and children are dangerous and volatile things, as anyone with a teenager at home will tell you.

By necessity, there is a two-way relationship between parent and child - they both know of each other. A node can be made a parent by telling it to adopt another node. This not only adds the new child to the list of children the parent holds, but also tells the child about this new relationship by calling a private function, reparent, on the child.

    iterator adopt(pointer child)
    {
        if (NULL == child)
            throw std::invalid_argument(":Cannot adopt NULL pointers":);
        iterator i = find_child(child);
        if (i == children_.end())
        {
            children_.push_back(child);
            i =
            --children_.end();

        }
        child->reparent(this);
        return i;
    }

private:
    iterator find_child(const pointer child)
    {
        return std::find(children_.begin(), children_.end(), child);
    }

    void reparent(pointer parent)
    {
        if (parent == parent_)
            return;
        elope();
        parent_ = parent;
    }

I do little more than the barest minimum of safety checking here, just enough to make sure I don't put a NULL pointer in. This is bad enough to warrant an exception. Of course, in the documentation I could just say that the pointer should be valid and bring out the ":undefined behaviour": warning flag if it isn't, but since I need to check whether I already have it anyway (as a pointer that occurs twice in the list would be deleted twice on destruction), I might as well give a hint. Should someone try to adopt the same child twice, that's fine with me. At the end of the call, the child is adopted, so conceptually, the function has succeeded.

But what happens if the child already has another parent, if we are in fact moving it from one parent to another? Contrary to literary tradition, it must inform its old parent that it's leaving. This is done internally by the child through a call to elope(), which in turn calls the private function eloping(). This lets the old parent remove the child from its list of dependants and get on with its life, without having to worry about the mysterious disappearance of its offspring. Here, I'm using a return value to provide a handle to the object. If elope() is called through an iterator, I most definitely want to have its pointer, since I will have lost it forever if I don't.

    pointer elope()
    {
        if (NULL != parent_)
        {
            parent_->eloping(this);
            parent_ = NULL;
        }
        return this;
    }

    pointer disown(iterator child)
    {
        if ((child != children_.end()) && (this == (*child)->parent()))
            return (*child)->elope();
        throw std::invalid_argument("Cannot find child to disown");

    }

private:
    void eloping(pointer child)
    {
        iterator ch = find_child(child);
        if (ch != children_.end())
            children_.erase(ch);
    }
    ...

The scenario opposite from adoption, when I take a child from its parent, works the same way behind the scenes. You call the public function disown on the parent, telling it which child to give up on. This, in turn, causes the child to shout, "You can't disown me, I'm eloping!" like the ungrateful brat it is.

Admittedly, this seems a bit unnecessary. Couldn't you just inform the child it's been abandoned and remove it from the list? Yes, you could, but this has the advantage that there is only one place where this abandoning of children takes place, in eloping. And it looks more dramatic when I write about it. Note that disown doesn't erase the child, only its relationship with the parent, which is why I return a pointer to the newly independent child.

Clearance

However, we do need that function - abandon - or something like it. Why? Well, a parent can do worse things to its children than disown them. It can also get rid of them completely. The destructor of a node must inform its parent, if any, that it is no longer a valid child by calling elope(). This is right and proper, but inefficient and unnecessary if it is the parent that is destroying it.

Since I figure I should provide functions to erase children anyway, I wrote the following:

    iterator erase(iterator it)
    {
        pointer p = *it;
        iterator i = children_.erase(it);
        p->parent_ = NULL;           // abandon
        delete p;                       // delete
        return i;
    }

This works fine, as far as it goes. But what about when the client wants to erase a range, or even all children? What about the destructor, which needs to do just that? While I could call the above function manually for each one, this would be error-prone and inefficient. A better idea is to use separate function to do the actual deletion, giving the following clean-up code:

    ~tree_node()
    {
        delete value_;
        clear();
    }

    iterator erase(iterator it)
    {
        pointer p = *it;
        iterator next = children_.erase(it);
        p->erase_silent();
        return next;
    }

    iterator erase(iterator first, iterator last)
    {
        child_holder backup(first, last);
        iterator next = children_.erase(first, last);
        std::for_each(backup.begin(), backup.end(), std::mem_fun(&node_type::erase_silent));
        return next;
    }

    void clear()
    {
        erase(begin(), end());
    }
    ...
private:
    bool erase_silent()
    {
        parent_ = NULL;
        delete this;
        return true;
    }
    ...

There are some points worthy of notice in this code. Again, I only have one place in which the action happens - the erase_silent function - just like in the eloping and disowning discussed earlier. This is a sound design principle, both to avoid code duplication, and therefore maintenance problems, and to increase clarity and brevity. Remember that sword - there should only be one purpose, whether it's a class, function or variable. Conversely, any purpose should only be expressed once, and all other instances - of functions, classes and variables - should be expressed in terms of that one.

I am also careful not to delete things until I have managed to take them out of my list of children. Had I done it the other way around, I might erase children that are not mine, without letting the other parent know this had happened (since the parent pointer is reset before deletion) before failing to get them out of my own list of children. The other parent would then have pointers to children that didn't exit, causing all sorts of unpleasant surprises.

This way, std::vector::erase will protest before I do anything I might regret. Admittedly, this means I'm relying on the erase function to raise a fuss, and since the standard says that erasing an iterator that is not valid causes undefined behaviour, I can't be sure what happens. I feel it's sufficient to emulate this, though, and if nothing else it will be consistent with how your implementation of the standard library does things.

In the range version of the erase function, I take a copy of the range first. This might fail if the range is invalid, as it falls under undefined behaviour, but even if it is valid it might not be mine. If it isn't, the call to children_.erase should protest, and I won't have deleted anything. If it works, I still have a record of the objects I need to delete, even though they've been erased from the list of children.

Making the function erase_silent appear as a functor, or function object, by putting it in the std::mem_fun adapter also saves me the chore of writing a loop, since I can simply use the one provided in the algorithm library. Many programmers I know only use the containers from the STL, and consider things like algorithms and function adapters to be strange and esoteric, but really it is quite simple. The std::mem_fun just makes a member function look like a functor. Unfortunately, any function passed to the standard function adapters has to return a value. They shouldn't have to, but few compilers and libraries accept this construct yet, which is why erase_silent always returns true.

Copy right

That's the erasing part done, but I also promised a function to copy the contents of a tree_node. For this, I'll use a similar approach, and add an adoption function that creates a new child from a value instead of taking an existing one. This is a bit more complicated, so I will need some helper functions, both private and static.

    void adopt(const_value_reference child)
    {
        adopt(create_value_node(&child));
    }

    void copy_contents(const_reference node)
    {
        node_type temp();
        temp.copy_children(node);
        if (node.has_value())
            temp.assign_value(node.value());
        // Make room
        clear();
        if (temp.size() > children_.capacity())
            children_.reserve(temp.size());
        while (temp.size())
            adopt(*temp.begin());
        value_ = temp.value_;
        temp.value_ = NULL;
    }
    ...
private:
    void copy_children(const_reference node)
    {
        clear();
        std::for_each(node.begin(), node.end(), std::bind1st(AdoptCopy (), this));
    }

    static pointer create_value_node(const_value_pointer child)
    {
        if (NULL != child)
            return new node_type(*child);
        return new node_type();
    }

    static pointer create_copy(const_pointer pnode)
    {
        pointer p = NULL;
        try
        {
            p = create_value_node(pnode->value_);
            std::for_each(pnode->begin(), pnode->end(), std::bind1st(AdoptCopy(), p));
        }
        catch (...)
        {
            delete p;
            throw;
        }
        return p;
    }

    struct AdoptCopy : public std::binary_function<pointer, const_pointer, bool>
    {
        bool operator()(pointer lhs, const_pointer rhs) const
        {
            lhs->adopt(create_copy(rhs));
            return true;
        }
    };

First of all, in order to maximise exception safety I use the "create a temporary and swap" idiom. If the creation of the temporary works, I can swap safely. The same goes for the value, which is a simple pointer copy operation and guaranteed not to throw. Of course, I can't use the vector::swap function, since the parent needs to be set, but the idiom is the same. The adopt function is safe, since it's only doing pointer assignment and a push_back on a vector which is guaranteed to have the capacity.

Unlike the erasing, I can't easily use std::mem_fun on a member function to get a functor. The reason for this is that I need to bind one of the parameters - the destination - since std::for_each only works on unary (single-parameter) functors. For std::bind1st to work, it needs the functor to have some type definitions you get for free from std::binary_function, which also explains why it quite pointlessly returns a boolean - it is required to have a return type.

The functor AdoptCopy takes pointers to what would be the left and right sides of an assignment operator, i.e. the destination and source, respectively. Its purpose is simply to call a static function that creates a copy of the source and ask the destination to adopt this copy. In both the copy_children and create_copy functions, it is used in the std::for_each algorithm, with the destination bound to it.

Note that this is a recursive setup. AdoptCopy calls create_copy which calls AdoptCopy... There have been many arguments about the benefits and/or horrors of recursion over the years, and I have to say that I am usually not fond of using it. In this case, however, I think it's the best solution, as it makes a much more elegant solution than a non-recursive one. Trust me, I tried.

On a final note, the snippet above also features the only try-catch block in the code. That's because this is the only potential memory leak I can control directly. Should the copying of children fail - because of lack of memory, or an exception thrown in the value constructor - I abandon the whole thing, delete the node I was assigning children to and rethrow the exception, making sure I do not leave any unadopted children lying around.

To finish the tree_node, I'll add some call-through functions for the list of children - like back, end, empty and size - and put in const versions of existing functions. With that, this class is complete. It is as exception-safe as std::vector and the value type allows, it's without memory leaks, it works, and it's useful. Wonderful. But before I dislocate my shoulder trying to pat myself on the back, I must remember that it's not quite finished.

In the next, and final instalment, we'll take a look at what is required of a container, and do what we can to fulfil those requirements. Until then, you are welcome to have a look at the source for what we have done so far.

Here is a zip file containing tree_node_2.hpp which defines and declares the tree_node class as it stands at the end of this article, and tree_test.cpp which contains a small example of how it can be used, in this case by representing a very simple XML document.

Comments