Iterating through a C++ enumeration
To give a bit of motivation, when running a particle physics
analysis one needs to consider different systematic effects.
The correct way to do this involves applying one systematic effect
at a time to the analysis procedure. Thus, in order to produce
the analysis with systematics requires looping over all possible
systematic effects. One could handle these systematics as a
string and loop over a container of all strings, or store these
in a JSON/XML file and parse it. Of course in C++, this is
describing a single particular value among a list of pre-defined
values which is perhaps best handled by an enum
. Still, when
faced with an enum
type how does one loop over all possible
values?
The key to this can be taken from two perspectives - one is the
manner in which STL containers are looped over prior to C++11 and
the second is that multiple enum
names can be backed by the same
value using assignment. Prior to C++11, one traditionally loops
over an iterator from the container’s begin()
method up to but
excluding the iterator from the container’s end()
method.
With an enum
, rather than listing all the values if one adds
two extra values named BEGIN
and END
as follows:
enum Systematic {
BEGIN,
Nominal = BEGIN,
Electron_Up,
// ...
END
};
then one can also loop over the values as
for(Systematic systematic(BEGIN); systematic != END; systematic=static_cast<Systematic>(systematic+1)) {
run_analysis(systematic);
}
Of course, if one defines an enum
with gaps in the values then the above
cannot work as it clearly only increments by one. With that limitation,
though, we’ve successfully iterated through an enum
just by defining
two special values. The downside is that the loop is downright awful.
We are free to define global operators which might be useful, though. Consider
Systematic operator+(Systematic s, int n) {
return static_cast<Systematic>(static_cast<int>(s) + n);
}
If one’s confused by the inner static_cast
, note that s + n
by
itself would invoke this function. With this we have at least eliminated
the cast in the loop:
for(Systematic systematic(BEGIN); systematic != END; systematic=systematic+1) {
run_analysis(systematic);
}
Of course we can go a step further and define
Systematic& operator+=(Systematic& s, int n) {
s = static_cast<Systematic>(s + n);
return s;
}
which produces a loop like
for(Systematic systematic(BEGIN); systematic != END; systematic += 1) {
run_analysis(systematic);
}
For an even more iterator-like loop, defining the pre-fix incrementation operator as
Systematic& operator++(Systematic& s) {
s = static_cast<Systematic>(s + 1);
return s;
}
yields
for(Systematic systematic(BEGIN); systematic != END; ++systematic) {
run_analysis(systematic);
}
Personally, I don’t like the post-fix operator so I won’t bother with that. However, there are at least two things that we can improve upon here.
One is that the enum
names are in the global scope. It would be better
if one had to write Systematic::BEGIN
. This is also important if one ever
wants to have more than one iterable enumeration without resorting to gross
tricks such as looping over BEGIN
to END
for one type and Begin
to End
for another. Clearly something as simple as a namespace
would solve this
scope issue, as would C++11’s enum class
. Another solution, which I favor,
is placing the enum
inside a class
.
The second thing I don’t like is actually having to specify that we’re
looping from BEGIN
to END
. In my typical usage I wanted to loop over
all systematics so I would like that to be the default behavior. Ideally,
I would want to write code like this and then figure out how to support
that syntax:
for(Systematic systematic; systematic; ++systematic) {
run_analysis(systematic);
}
Clearly this requires two additional things beyond just being able to
increment an enumeration value. First, we need the default value to be
the BEGIN
value, and we need to support truth testing. Both of these
problems are easily solved with a class
as the default constructor
can ensure the correct value is chosen, and truth testing can be handled
via the conversion operator to a boolean, operator bool()
, which must
be a member function.
Thus, the solution looks something like:
class Systematic {
public:
enum Mode { BEGIN,
Nominal = BEGIN,
Electron_Up,
Electron_Down,
// ...
END
};
Systematic() : mode(BEGIN) { }
Systematic(Mode m) : mode(m) { }
operator bool() const {
return mode != END;
}
Systematic& operator++() {
mode = static_cast<Mode>(mode + 1);
return *this;
};
private:
Mode mode;
};
This solves most of the problems. The last points to consider are
what do we want something like if(systematic == Systematic::Electron_Up)
to actually do. As-is, there’s no comparison operator between a
Systematic
and a Systematic::Mode
. The comparison would require that
a temporary Systematic
is implicitly constructed via the second constructor
and the comparison is made between two Systematic
objects using the compiler generated default comparison operator, which simply compares the
mode
members. We could add a function such as
bool Systematic::operator==(Systematic::Mode m) {
return mode == m;
}
and remove the implicitly constructed Systematic
.
As a very useful extension of this enum
inside a class
pattern is
the ability to then define helper methods that are clearly part of the
class. For example, a method that returns the name of the systematic mode
as a string, or returns the name of the TTree
for writing/reading data.
Combining such methods with a switch
statement is even more powerful
as the compiler can automatically detect missing cases. In the case that
a particular change to the analysis is made for a variety of modes then
one can end up with long boolean chains, like
if(systematic == Systematic::Electron_Up || systematic == Systematic::Electron_Down)
but a simple member function can fix all that:
bool Systematic::IsElectron() const {
switch(mode) {
case Electron_Up:
case Electron_Down:
return true;
default:
return false;
}
}
Now, of course with the plain-old enum
we always had the freedom to
define something like
bool IsElectron(Systematic s) {
switch(s) {
case Electron_Up:
case Electron_Down:
return true;
default:
return false;
}
}
and use if(IsElectron(systematic))
. The object-oriented change to
if(systematic.IsElectron())
isn’t fundamentally different, aside from
the C++ class being closed to such modifications.
With C++11 syntax, is there opportunity for improvement? To an extent, yes. It is possible to enable the following code to work the same as our previous loop:
for(Systematic systematic: Systematic()) {
run_analysis(systematic);
}
Perhaps the first approach one would have is to define a function which essentially just aggregrates all possible systematics using a pre-C++11 style loop, eg,
std::vector<Systematic> GetSystematics() {
std::vector<Systematic> systematics;
for(Systematic systematic; systematic; ++systematic) {
systematics.push_back(systematic);
}
return systematics;
}
which generates a loop like
for(Systematic systematic : GetSystematics()) {
run_analysis(systematic);
}
This, of course, is just iteration through a STL container and therefore
boring. However, digging into how the C++11 loops work we know that they rely
on a begin
and end
function and a dereference operator. If we choose
to define these as:
// static
Systematic Systematic::begin() {
return Mode::BEGIN;
}
// static
Systematic Systematic::end() {
return Mode::END;
}
const Systematic& Systematic::operator*() const {
return *this;
}
then the loop
for(Systematic systematic : Systematic())
ends up working as one wishes. This particular implementation is still
relatively incomplete. For example, we can only start at the beginning
and we can only stop at the end. Iterating through just a subset is not
possible directly - one needs an if
statement in the body to weed out
undesired cases. Also note the above begin
and end
member methods
do not need to be static
; they are only suggested as static
because
they do not depend on an instance of the class.
Taking this just one step further, we can reduce the repetitiveness of the C++11 loop with
for(auto systematic: Systematic())