Wednesday, June 18, 2008

Threading mess!

Software development requires discipline, you know what I mean: brainstorming, coding rules, code inspections, pair programming. Unfortunately all these activities for the management are a waste of time so at the end you end up to just act as a "code monkey"; to rub salt to the wound "multithread programming" requires ten time the discipline you need in a single thread environment. I've recently stepped in a project of medium size, and at the easy question: "are those class instances shared between two or more threads" the response was: "no... wait... yes, well I'm not sure... I don't know...". Riiiight.
Let's see a quick technique that should permit to detect (at runtime, sigh!) if two or more threads are using concurrently a class.

Suppose we have the following class:

struct Shared {
   void foo() { ...  }
};

and we are unsure if two threads are calling the Shared::foo() at same time. One way is to add a mutex to the class Shared and then attempt a "try lock" as first thing to do inside the foo and raise an error in case the try lock fails.

Something like:

class Shared {
   void foo() {
      TryScopedLock aLock(theMutex);

      if (!aLock) { throw std::runtime_error("BOOM"); }

      ...
   }

private:
   volatile mutex theMutex;

};

this approach works but it will slow down your software, hiding other problems around and, most of all, introduces useless synchronization; a mutex lock is not exactly a cheap operation.

The idea is to use the technique above but without using a lock, GCC gives us some functions for atomic memory access, and we can use for example:

bool __sync_bool_compare_and_swap (type *ptr, type oldval type newval, ...)

for our very purpose. That function assigns at *ptr the value newval only if the current value of *ptr is oldval, it returns true if the comparison is successful and newval was written. We can use it to store the threadId when we enter the critical section, "zeroing" the value when we exit.

Basically I wrote a class that store (with an atomic operation) the threadID of the thread entering the critical section, and when it leaves forgets about the threadID. This was the result:

#ifndef THREAD_COLLISION_WARNING
#define THREAD_COLLISION_WARNING

#include <stdexcept>

#ifdef NDEBUG

#define THREAD_WATCH(obj)
#define SCOPED_WATCH(obj)

#else

#define THREAD_WATCH(obj) ThreadCollisionWarning _##obj;
#define SCOPED_WATCH(obj) ThreadCollisionWarning::ScopedWatch scoped_watch_##obj(_##obj);

#endif

class ThreadCollisionWarning {
public:
    ThreadCollisionWarning()
    :theActiveThread(0)
    { }

    ~ThreadCollisionWarning() { }

    class ScopedWatch {
        public:
            ScopedWatch(ThreadCollisionWarning& aTCW)
            :theWarner(aTCW)
            { theWarner.enter(); }

            ~ScopedWatch() { theWarner.leave(); }

        private:
            ThreadCollisionWarning& theWarner;
    };

private:

    void enter() { 
        if (!__sync_bool_compare_and_swap(&theActiveThread, 0, pthread_self())) {
            //gotcha! another thread is trying to use the same class
            throw std::runtime_error("Thread Collision");
        }
    }
    void leave() { 
        __sync_fetch_and_xor(&theActiveThread, theActiveThread);
    }

    pthread_t theActiveThread;
};

#endif

The class ThreadCollisionWarning has the responsibility to store the thread using the class (or more in general entering a critical section) while the nested class ScopedWatch is used to notify the entering and the leaving the critical section. Look the implementation of the two ThreadCollisionWarning::enter and ThreadCollisionWarning::leave, the former stores the thread Id only if the old value was 0 the latter just zeroes it. The macros simplify the usage.

So there we go, the class Shared becomes then:

struct Shared {
   void foo(char aC) { 
       SCOPED_WATCH(Shared)
       ...
   }

   THREAD_WATCH(Shared)
};

using SCOPED_WATCH we just check that two threads are not using the method foo concurrently.

Of course the implementation above is not by any mean a complete solution to the problem I exposed at the beginning, it helps and it can be a good start point to create a better tool to detect if someone messed around.

No comments: