Thursday, March 26, 2009

The PlayStation 3’s SPUs in the Real World (presented by Michiel van der Leeuw)

  • Things they did on the SPUs (post mortem)
  • What worked and didn’t work
  • Practical advice
  • Food for thought

3 years of development, a team of ~120 with 27 programmers

  • Cinematic
  • Dense
  • Realistic
  • Intense

  • 6 x 3.2 GHz SPU processors
  • Local memory per SPU
  • Very fast DMA (see the sketch below)
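
A minimal sketch of the streaming pattern these bullets imply: keep data flowing through the small local store with double-buffered DMA so transfers overlap computation. The dma_get/dma_wait helpers stand in for the Cell SDK's MFC transfer intrinsics and are hypothetical, as are the chunk size and process() function.

    // Double-buffered streaming through SPU local store (sketch).
    // dma_get/dma_wait are hypothetical stand-ins for MFC transfer intrinsics.
    #include <cstdint>
    #include <cstddef>

    const size_t CHUNK = 4 * 1024;               // bytes per DMA transfer (assumption)
    static uint8_t buffer[2][CHUNK];             // two buffers in local store

    void dma_get(void* ls, uint64_t ea, size_t size, int tag); // hypothetical helper
    void dma_wait(int tag);                                    // hypothetical helper
    void process(uint8_t* data, size_t size);                  // per-chunk work

    // Assumes total is a multiple of CHUNK for brevity.
    void stream_job(uint64_t ea, size_t total)
    {
        int cur = 0;
        dma_get(buffer[cur], ea, CHUNK, cur);                // prefetch the first chunk
        for (size_t done = 0; done < total; done += CHUNK)
        {
            int next = cur ^ 1;
            if (done + CHUNK < total)                        // kick the next transfer early
                dma_get(buffer[next], ea + done + CHUNK, CHUNK, next);
            dma_wait(cur);                                   // wait only for the current chunk
            process(buffer[cur], CHUNK);                     // compute while the next DMA is in flight
            cur = next;
        }
    }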

  • Core Requirements
    • Animation
    • AI
    • Skinning
    • Physics
    • Compression/Decompression
    • etc.

Graphics

  • Light probe sampling
  • ~2500 static light probes per level
  • 9x3 Spherical Harmonics stored in a KD-tree
  • Sample lighting by blending the 4 closest light probes, then rotating into view space (see the sketch below)
  • Bake lights into the level
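
A rough sketch of the probe blend described above, assuming 9 SH coefficients per color channel and simple inverse-distance weights for the 4 closest probes returned by the KD-tree lookup. The types and the weighting scheme are illustrative, not Guerrilla's actual code.

    #include <cmath>

    struct SH9x3 { float c[9][3]; };                 // 9 SH coefficients x RGB
    struct Probe { float pos[3]; SH9x3 sh; };

    // Blend the 4 closest probes with inverse-distance weights (one plausible scheme).
    SH9x3 sampleProbes(const Probe* probes[4], const float samplePos[3])
    {
        SH9x3 out = {};
        float w[4], wsum = 0.0f;
        for (int i = 0; i < 4; ++i)
        {
            float dx = probes[i]->pos[0] - samplePos[0];
            float dy = probes[i]->pos[1] - samplePos[1];
            float dz = probes[i]->pos[2] - samplePos[2];
            w[i] = 1.0f / (std::sqrt(dx * dx + dy * dy + dz * dz) + 1e-4f);
            wsum += w[i];
        }
        for (int i = 0; i < 4; ++i)
            for (int k = 0; k < 9; ++k)
                for (int ch = 0; ch < 3; ++ch)
                    out.c[k][ch] += (w[i] / wsum) * probes[i]->sh.c[k][ch];
        return out;   // still to be rotated into view space before use
    }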

Particle simulation

  • 250 particle systems per frame
  • 150 drawn
  • 3000 particles updated
  • 200 collision ray casts
  • System grew over time

Refactored

  • Vertex generation
  • Particle simulation inner loop (see the sketch after this list)
  • Initialization & deletion of particles
  • High-level management / glue
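
A sketch of what the refactored inner loop might look like as an SPU-friendly job: a pure function over a contiguous structure-of-arrays batch already resident in local store, with no pointers back to main memory. The ParticleBatch layout and the drag constant are assumptions for illustration.

    #include <cstddef>

    struct ParticleBatch                  // SoA layout, resident in local store
    {
        float* px; float* py; float* pz;  // positions
        float* vx; float* vy; float* vz;  // velocities
        float* life;                      // remaining lifetime in seconds
        size_t count;
    };

    void simulateBatch(ParticleBatch& b, float dt, const float gravity[3])
    {
        const float drag = 0.98f;                         // illustrative damping value
        for (size_t i = 0; i < b.count; ++i)
        {
            b.vx[i] = (b.vx[i] + gravity[0] * dt) * drag;
            b.vy[i] = (b.vy[i] + gravity[1] * dt) * drag;
            b.vz[i] = (b.vz[i] + gravity[2] * dt) * drag;
            b.px[i] += b.vx[i] * dt;
            b.py[i] += b.vy[i] * dt;
            b.pz[i] += b.vz[i] * dt;
            b.life[i] -= dt;                              // deletion handled by the glue layer
        }
    }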

Not done on SPU

  • Updating the global scene graph
  • Starting & stopping sounds

Image Post Processing

Effects done on SPU

  • Motion blur
  • Depth of field
  • Bloom

SPUs assist the RSX with post-processing

  • RSX prepares low-res image buffers
  • RSX triggers an interrupt to start the SPUs
  • SPUs perform the image operations
  • RSX already starts the next frame
  • The SPU result is consumed by the RSX early in the next frame (see the sketch below)
  • Similar to PhyreEngine now
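
A very rough sketch of that frame pipelining: the RSX kicks SPU post-processing for frame N, immediately starts frame N+1, and picks up the SPU result one frame later. The queue class and its functions are invented for illustration; the real synchronization goes through RSX interrupts and labels.

    struct Frame { int index; /* low-res buffers prepared by the RSX, etc. */ };

    class SpuPostProcessQueue                    // hypothetical wrapper
    {
    public:
        void kick(const Frame& f);               // signal the SPUs to start on frame f
        bool resultReady(int frameIndex) const;  // poll for a finished frame
    };

    void renderLoop(SpuPostProcessQueue& q)
    {
        int n = 0;
        for (;;)
        {
            Frame frame = { n };
            // 1. RSX renders the scene and downsamples the buffers the SPUs need.
            // 2. Kick SPU image post-processing for this frame and move on.
            q.kick(frame);
            // 3. Early in frame n, the RSX composites the SPU result of frame n-1.
            if (n > 0 && q.resultReady(n - 1))
            {
                // composite motion blur / DoF / bloom produced by the SPUs
            }
            ++n;
        }
    }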

SPUs are compute-bound

  • Bandwidth no issue
  • Code can be optimized

The trade-off: RSX vs. SPU time

  • SPUs take longer
  • SPUs look better
  • RSX was the bottleneck

Bloom and Lens Reflection

  • 13% of one SPU
  • Depth-dependent intensity response curve
  • 7x7 Gaussian blur (see the sketch after this list)
  • Upscaling results from different levels
  • Internal lens reflection
  • Result buffer
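
The 7x7 Gaussian blur above would typically be done as two separable 7-tap passes; below is a plain C++ sketch of the horizontal pass with one plausible normalized kernel, not Guerrilla's actual weights.

    #include <vector>
    #include <algorithm>

    static const float kKernel[7] = { 0.03f, 0.10f, 0.22f, 0.30f, 0.22f, 0.10f, 0.03f };

    // Horizontal 7-tap pass; the vertical pass is identical with x and y swapped.
    void blurH(const std::vector<float>& src, std::vector<float>& dst, int w, int h)
    {
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
            {
                float sum = 0.0f;
                for (int k = -3; k <= 3; ++k)
                {
                    int xs = std::min(std::max(x + k, 0), w - 1);   // clamp at the borders
                    sum += kKernel[k + 3] * src[y * w + xs];
                }
                dst[y * w + x] = sum;
            }
    }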

Waypoint cover maps -> depth map

IBL Sampling

SPUs cost a lot of dev time

The code is future-proof, scales to more cores, and supports what they require.

The future is memory-local and excessively parallel

SPUs are just one of these ‘new architectures’

Optimize for the concept

Keep code portable

Parallelization of code takes time

Treat CPU as cluster

Think in workloads / jobs

Build latency into algorithms
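
A small sketch of what "think in jobs, build latency in" looks like in practice: kick work as early as possible, do unrelated work in between, and only wait for the result as late as possible. The kickJob/waitJob API is hypothetical.

    struct JobHandle { int id; };

    JobHandle kickJob(void (*fn)(void*), void* data);  // enqueue for a worker/SPU (hypothetical)
    void waitJob(JobHandle h);                         // block until complete (hypothetical)

    void animateCharacters(void* skeletons);
    void updateGameLogic();

    void gameFrame(void* skeletons)
    {
        JobHandle anim = kickJob(&animateCharacters, skeletons);  // kick early
        updateGameLogic();                                        // latency hides here
        waitJob(anim);                                            // wait as late as possible
    }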

Don’t optimize too early
