Current Hardware
- 360 – 6 hardware threads
- PS3 – 9 hardware threads
- Windows – Quad cores not uncommon
- The point: multi-core is here to stay
Multithreading is mandatory if you want to harness the available power. Without it you are wasting the advanced features of the hardware.
Multithreaded programming is easy if you don’t share data. :) Of course this is not usually an option.
The best way to share data between threads is usually with locks. This is important: lockless is not a one-size-fits-all approach.
Lock-based programming typically involves a job queue built on an STL queue. The problem is that STL queues are not thread safe, so we have to make them safe. :)
Solution: use a critical section to guard the code that touches the queue.
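A minimal sketch of the guarded job queue, assuming portable C++11: std::mutex stands in for a Win32 critical section, and the names LockedJobQueue and run_locked_queue_demo are made up for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical job queue. std::queue is not thread safe, so every
// access goes through one mutex (standing in for a critical section).
class LockedJobQueue {
public:
    void push(int job) {
        std::lock_guard<std::mutex> guard(lock_);
        jobs_.push(job);
    }

    // Returns false if the queue is empty; otherwise pops the front into 'job'.
    bool try_pop(int& job) {
        std::lock_guard<std::mutex> guard(lock_);
        if (jobs_.empty()) return false;
        job = jobs_.front();
        jobs_.pop();
        return true;
    }

private:
    std::mutex lock_;
    std::queue<int> jobs_;
};

// Two producers push 1000 jobs each; returns how many jobs were popped afterwards.
int run_locked_queue_demo() {
    LockedJobQueue q;
    auto producer = [&q] {
        for (int i = 0; i < 1000; ++i) q.push(i);
    };
    std::thread a(producer);
    std::thread b(producer);
    a.join();
    b.join();

    int count = 0;
    int job = 0;
    while (q.try_pop(job)) ++count;
    return count;
}
```

The lock is held only for the push or pop itself, which keeps the contention cost listed below as small as possible.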
Bad things about locks
- Acquiring and releasing locks takes time
- Deadlocks
- Contention – waiting, holding locks too long
- Priority inversions – system threads on 360 do this (too often)
Use locks carefully, or go lockless
- Pros: safely share data without locks (no deadlocks or priority inversions)
- Cons: very limited, tricky, generally not portable
SList (interlocked singly linked list), e.g. InterlockedPushEntrySList
This is NOT a queue! This is a stack! Don’t use on 360!!!!!
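To see why this is a stack and not a queue, here is a rough sketch of a lock-free singly linked list in portable C++11. It is not the Win32 SList API; LockFreeStack and its methods are hypothetical names. Every push swaps in a new head, so the last item pushed is the first one out.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct Node {
    int value;
    Node* next;
};

// Hypothetical lock-free stack: push with compare-and-swap,
// drain the whole list with a single atomic exchange.
class LockFreeStack {
public:
    ~LockFreeStack() { flush(); }

    void push(int value) {
        Node* node = new Node{value, head_.load(std::memory_order_relaxed)};
        // If head_ changed under us, compare_exchange_weak reloads
        // node->next with the current head and we retry.
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Detaches every node at once and returns the values head first.
    std::vector<int> flush() {
        Node* node = head_.exchange(nullptr, std::memory_order_acquire);
        std::vector<int> out;
        while (node != nullptr) {
            out.push_back(node->value);
            Node* next = node->next;
            delete node;
            node = next;
        }
        return out;
    }

private:
    std::atomic<Node*> head_{nullptr};
};
```

Pushing 1, 2, 3 and then flushing hands back 3, 2, 1. Draining everything in one exchange also sidesteps the ABA problem that a node-by-node pop would have to deal with.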
One writer, one reader (singleton): works on paper, not in the real world, because of reordering
Reads and writes travel between the CPU and the L2 cache
Writes can be delayed before they reach the L2 cache, so other cores see them late
The same happens on reads (a second read could be served from L1)
So reads and writes can pass each other
On PowerPC reads and writes can pass each other in any combination; on x86 only a read can pass an earlier write
Preventing reads from passing writes would basically disable L1, a huge perf hit
Publisher / subscriber model
Export barrier – a no-passing sign (stop sign); it has to handle BOTH reads and writes
Compilers are just as evil: they rearrange code as if it were single threaded
Barriers against both compiler and CPU reordering are needed
_ReadWriteBarrier(); x86 (compiler barrier only)
__lwsync(); PowerPC (barrier for both CPU and compiler)
Positioning is crucial (the barrier goes between the writes)
On the writer side this is called write-release semantics
On the reader side it is called read-acquire semantics
The reader's barrier has to cover both reads and writes
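A small sketch of publish/subscribe with write-release and read-acquire, assuming C++11 std::atomic in place of the platform barriers above. g_payload, g_ready, publish and subscribe are illustrative names only.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int g_payload = 0;                 // plain data being published
std::atomic<bool> g_ready{false};  // flag that announces the data

void publish(int value) {
    g_payload = value;  // 1. write the data
    // 2. write-release: the data write above cannot move below this store
    g_ready.store(true, std::memory_order_release);
}

int subscribe() {
    // read-acquire: the payload read below cannot move above this load
    while (!g_ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return g_payload;
}

// Runs one publisher and one subscriber; returns what the subscriber saw.
int run_publish_demo() {
    g_payload = 0;
    g_ready.store(false);
    int seen = 0;
    std::thread reader([&seen] { seen = subscribe(); });
    std::thread writer([] { publish(42); });
    writer.join();
    reader.join();
    return seen;
}
```

The position of the release store is the whole trick: it sits between the data write and the point where the reader is allowed to look.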
Dekker's / Peterson's algorithm: acquire/release is not enough here, a full barrier is needed
MemoryBarrier (full barrier)
- x86: __asm xchg Barrier, eax
- x64: __faststorefence()
- PowerPC: __sync()
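As a sketch of why Peterson's algorithm wants a full barrier: each thread writes its own flag and then reads the other thread's flag, and that write-then-read order is exactly what reordering breaks. The version below assumes C++11 and leans on the default sequentially consistent atomics instead of an explicit MemoryBarrier; PetersonLock and run_peterson_demo are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical two-thread lock. All atomic operations use the default
// std::memory_order_seq_cst, which supplies the full-barrier ordering.
class PetersonLock {
public:
    PetersonLock() : turn_(0) {
        wants_[0].store(false);
        wants_[1].store(false);
    }

    void lock(int self) {
        const int other = 1 - self;
        wants_[self].store(true);  // announce intent
        turn_.store(other);        // give the other thread priority
        // This read must not pass the two writes above.
        while (wants_[other].load() && turn_.load() == other) {
            std::this_thread::yield();
        }
    }

    void unlock(int self) { wants_[self].store(false); }

private:
    std::atomic<bool> wants_[2];
    std::atomic<int> turn_;
};

// Two threads bump a plain counter 10000 times each under the lock.
int run_peterson_demo() {
    PetersonLock lock;
    int counter = 0;
    auto worker = [&lock, &counter](int self) {
        for (int i = 0; i < 10000; ++i) {
            lock.lock(self);
            ++counter;
            lock.unlock(self);
        }
    };
    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    return counter;
}
```

Weakening the stores to release and the loads to acquire would let the read of the other flag pass the write of your own, and both threads could enter at once.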
What about volatile?
Standard volatile: NO
It doesn't prevent CPU reordering, and every shared variable would need to be tagged volatile
VC++ volatile is better, but it still doesn't prevent hardware reordering on 360
It acts as read-acquire / write-release on x86/x64 and Itanium
atomic<T> in C++0x
Double checked locking – singleton
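A hedged sketch of double-checked locking, assuming C++11 atomics and a mutex. Config, g_instance and get_instance are invented for the example, and the singleton is deliberately never freed.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct Config {
    int value = 7;  // hypothetical singleton payload
};

std::atomic<Config*> g_instance{nullptr};
std::mutex g_instance_lock;

Config* get_instance() {
    // First check, no lock. Acquire pairs with the release store below,
    // so a non-null pointer always refers to a fully constructed object.
    Config* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> guard(g_instance_lock);
        // Second check, under the lock: another thread may have won the race.
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Config();                                // construct first...
            g_instance.store(p, std::memory_order_release);  // ...then publish
        }
    }
    return p;
}
```

The lock is only taken on the first few calls; after that every caller takes the lock-free fast path.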
InterlockedXxx
It is not a CPU memory barrier on 360
It is a full barrier on x86/x64/Itanium
InterlockedXxxAcquire / InterlockedXxxRelease are portable (preferred)
Uses
- Reference counts
- Setting a flag
- Publish/Subscribe
- SLists
- XMCore on 360
- Double checked locking
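Reference counting is probably the simplest of these uses. A minimal sketch, assuming C++11 fetch_add / fetch_sub in the role of InterlockedIncrement / InterlockedDecrement; RefCounted and the destroyed flag exist only for the example.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusively reference-counted object.
class RefCounted {
public:
    explicit RefCounted(bool* destroyed_flag)
        : refs_(1), destroyed_(destroyed_flag) {}

    void add_ref() { refs_.fetch_add(1, std::memory_order_relaxed); }

    // Drops one reference. Returns true if this call destroyed the object.
    bool release() {
        // acq_rel: everything done through other references is visible
        // to whichever thread ends up running the destructor.
        if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete this;
            return true;
        }
        return false;
    }

private:
    ~RefCounted() { *destroyed_ = true; }  // only reachable through release()

    std::atomic<int> refs_;
    bool* destroyed_;
};
```

Incrementing can be relaxed because taking a reference publishes nothing; it is the final decrement that needs ordering.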
Export, import, full barriers
Prefer to use locks!!!!
Use lockless only when locks are too costly
http://msdn.microsoft.com/en-us/library/bb310595(VS.85).aspx