Thursday, March 26, 2009

Lockless Programming in Games (Bruce Dawson, Microsoft)

Current Hardware

  • 360 – 6 hardware threads
  • PS3 – 9 hardware threads
  • Windows – Quad cores not uncommon
  • Point being multi-core is here to stay

Multithreading is mandatory if you want to harness the available power.  If not, you are wasting the advanced features of the hardware.

Multithreaded programming is easy if you don’t share data.  :)  Of course this is not usually an option.

The best way to share data between threads is usually by using locks.  This is important: lockless is not a one-size-fits-all approach.

Lockless programming typically involves a job queue, for example one built on an STL queue.  The problem is that STL queues are not thread safe, so we have to make them safe. :)

Solution: use a critical section to guard the code that touches the queue.
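A minimal sketch of the locked job queue, to make the baseline concrete. It uses std::mutex as a portable stand-in for a Win32 critical section, and the Job type and LockedJobQueue name are made up for illustration; none of this code is from the talk.

```cpp
#include <mutex>
#include <queue>

// Hypothetical job type for illustration; the talk did not specify one.
struct Job {
    int id;
};

// A std::queue guarded by a mutex. Every access takes the lock, so the
// queue's internal state is never touched by two threads at once.
class LockedJobQueue {
public:
    void push(const Job& job) {
        std::lock_guard<std::mutex> guard(mutex_);
        jobs_.push(job);
    }

    // Returns false when the queue is empty; otherwise removes the oldest
    // job and copies it into `out`.
    bool try_pop(Job& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (jobs_.empty()) {
            return false;
        }
        out = jobs_.front();
        jobs_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<Job> jobs_;
};
```

The empty check and the pop happen under one lock on purpose: checking empty() and calling pop() as two separately locked steps would let another thread drain the queue in between.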

Bad things about locks

  • Acquiring and releasing locks takes time
  • Deadlocks
  • Contention – waiting, holding locks too long
  • Priority inversions – system threads on 360 do this (too often)

Use locks carefully, or go lockless

  • Safely share data without locks (no deadlocks or priority inversion)
  • Cons
      • Very limited, tricky, generally not portable

SList (singly linked list): InterlockedPushEntrySList

This is NOT a queue!  This is a stack!  Don’t use on 360!!!!!
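To see why a lockless singly linked list behaves as a stack, here is a rough portable sketch using std::atomic rather than the Win32 SList API. Node, LockFreeStack, and pop_all are illustrative names, not the real SList interface. It only offers push and "take everything" (roughly what InterlockedFlushSList does), because a naive single-node pop runs into the ABA problem.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type; the real SLIST_ENTRY is Win32-specific.
struct Node {
    int value;
    Node* next;
};

class LockFreeStack {
public:
    // Link the node in front of the current head. If the CAS fails,
    // compare_exchange_weak reloads the current head into node->next,
    // so the loop simply retries with fresh data.
    void push(Node* node) {
        assert(node != nullptr);
        node->next = head_.load(std::memory_order_relaxed);
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Detach the whole list in one atomic step and return its head.
    // Taking everything at once sidesteps the ABA problem.
    Node* pop_all() {
        return head_.exchange(nullptr, std::memory_order_acquire);
    }

private:
    std::atomic<Node*> head_{nullptr};
};
```

Pushing 1, 2, 3 and then walking the list returned by pop_all() visits 3, 2, 1: last in, first out. A job system that needs FIFO order has to reverse the list or use a different structure.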

One writer, one reader (singleton) (works on paper, not in real world)

Read data (cpu to L2), write (cpu to L2)

Writes can happen before getting put in L2 cache

Happens on reads too (second read could come from L1)

read and write can pass each other


On PowerPC, reads and writes can pass each other freely, but on x86 only a load can pass an earlier store

Reads not passing writes would basically disable L1, huge perf hit

publisher / subscriber model

ExportBarrier – no passing sign (stop sign) HANDLE BOTH reads and writes


Compilers are just as evil: they rearrange code as if it were single threaded

Compiler/CPU reordering barriers needed

_ReadWriteBarrier();   x86  (compiler only)

__lwsync();  PowerPC  (both cpu and compiler)

Positioning is crucial (barrier between writes)

The names for this are write-release semantics (publisher side) and read-acquire semantics (subscriber side)

reader needs both read / write
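A small sketch of the publisher / subscriber pattern with the barriers positioned as described above. It uses C++11 fences in place of the platform intrinsics from the talk, and g_payload, g_ready, publish, and consume are made-up names for illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical shared state for illustration.
int g_payload = 0;                  // plain data being published
std::atomic<bool> g_ready{false};   // flag telling the reader it is safe

// Writer: fill in the data, then the barrier, then the flag.
void publish(int value) {
    g_payload = value;
    // Export barrier: earlier writes may not move below this point.
    std::atomic_thread_fence(std::memory_order_release);
    g_ready.store(true, std::memory_order_relaxed);
}

// Reader: check the flag, then the barrier, then the data.
int consume() {
    while (!g_ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Import barrier: later reads may not move above this point.
    std::atomic_thread_fence(std::memory_order_acquire);
    return g_payload;
}

// Usage: one writer thread, with the calling thread as the reader.
int run_publish_demo() {
    std::thread writer(publish, 42);
    int seen = consume();
    writer.join();
    return seen;
}
```

Without the two fences, the reader could see the flag set and still read a stale payload, which is the "works on paper, not in the real world" failure.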


Dekker’s / Peterson’s Algorithm



  • x86: __asm xchg Barrier, eax
  • x64: __faststorefence();
  • PowerPC: __sync();
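Peterson's algorithm is the classic case where acquire/release is not enough: each thread writes its own flag and then reads the other thread's flag, and a load passing an earlier store breaks it. A sketch, assuming sequentially consistent std::atomic operations stand in for the full barriers listed above. PetersonLock and run_peterson_demo are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Peterson's lock for exactly two threads, identified as 0 and 1.
class PetersonLock {
public:
    PetersonLock() : turn_(0) {
        interested_[0].store(false);
        interested_[1].store(false);
    }

    void lock(int self) {
        assert(self == 0 || self == 1);
        const int other = 1 - self;
        // Default std::atomic operations are sequentially consistent,
        // which supplies the full barrier between the store to our own
        // flag and the load of the other thread's flag.
        interested_[self].store(true);
        turn_.store(other);
        while (interested_[other].load() && turn_.load() == other) {
            std::this_thread::yield();
        }
    }

    void unlock(int self) {
        interested_[self].store(false);
    }

private:
    std::atomic<bool> interested_[2];
    std::atomic<int> turn_;
};

// Usage: two threads bump a plain (non-atomic) counter under the lock.
int run_peterson_demo(int iterations) {
    PetersonLock lock;
    int counter = 0;
    auto worker = [&](int self) {
        for (int i = 0; i < iterations; ++i) {
            lock.lock(self);
            ++counter;
            lock.unlock(self);
        }
    };
    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    return counter;
}
```

Weakening the stores and loads in lock() to release and acquire would allow both threads into the critical section at once, which is why this algorithm needs the heavier barrier.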


What about volatile?

Standard volatile: NO

It doesn't prevent CPU reordering, and all variables would need to be tagged volatile

VC++ volatile is better, but it doesn't prevent hardware reordering on 360

It acts as read-acquire / write-release on x86/x64 and Itanium

atomic<T> in C++0x

Double checked locking – singleton



doesn’t work on 360

It's a full barrier on x86/x64/Itanium

InterlockedXxx Acquire/Release are portable (preferred)
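As an illustration of choosing acquire/release ordering per operation, here is a sketch of a reference count written with std::atomic instead of the InterlockedXxx family. RefCounted and its member names are hypothetical, and the ordering choices are the commonly used ones, not something the talk spelled out.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive reference count for illustration.
class RefCounted {
public:
    // Taking another reference needs atomicity but no ordering: the caller
    // already holds a reference, so the object cannot go away underneath it.
    void add_ref() {
        count_.fetch_add(1, std::memory_order_relaxed);
    }

    // Returns true when the caller dropped the last reference and should
    // destroy the object. Release publishes this thread's earlier writes;
    // acquire lets the last owner see everyone else's writes.
    bool release() {
        const int previous = count_.fetch_sub(1, std::memory_order_acq_rel);
        assert(previous > 0);
        return previous == 1;
    }

    int use_count() const {
        return count_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<int> count_{1};  // the creator holds the first reference
};
```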


  • Reference counts
  • Setting a flag
  • Publish/Subscribe
  • SLists
  • XMCore on 360
  • Double checked locking
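Of the uses listed above, double-checked locking is short enough to sketch in full. This version assumes C++11 atomics; Settings, g_instance, and get_settings are made-up names, and the object is deliberately never freed to keep the example small.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical singleton type for illustration.
struct Settings {
    int volume = 7;
};

std::atomic<Settings*> g_instance{nullptr};
std::mutex g_init_mutex;

Settings* get_settings() {
    // First check, no lock. Acquire pairs with the release store below so
    // that a non-null pointer implies a fully constructed object.
    Settings* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> guard(g_init_mutex);
        // Second check, under the lock: another thread may have won.
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Settings();  // intentionally never freed in this sketch
            g_instance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

The lock is only taken on the rare path where the instance does not exist yet; every later call is a single acquire load.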

Export, import, full barriers

Prefer to use locks!!!!

use lockless when locks are too costly
