- Things they did on the SPUs (post-mortem)
- What worked and didn’t work
- Practical advice
- Food for thought
3 years, team of ~120 with 27 programmers
- Cinematic
- Dense
- Realistic
- Intense
- 6 x 3.2 GHz processors
- Local mem per SPU
- Very fast DMA
- Core Requirements
- Animation
- AI
- Skinning
- Physics
- Compression/Decompression
- etc
Graphics
- Light probe sampling
- ~2500 static light probes per level
- 9x3 Spherical Harmonics in KD-tree
- Sample light, blend 4 closest light probes, rotate into view space
- bake lights into level
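
The probe blending step above could be sketched as follows. This is a minimal illustration, not the studio's code: it finds the 4 closest probes by brute force rather than with a KD-tree, and the inverse-distance weighting is an assumption, since the notes do not say how the blend weights were chosen. The function name `blend_closest_probes` and the data layout are hypothetical. Rotation into view space is left out.

```python
import math

NUM_COEFFS = 9    # 9 spherical harmonics coefficients per probe
NUM_CHANNELS = 3  # one value per RGB channel (the "9x3" in the notes)


def blend_closest_probes(position, probes, k=4):
    """Blend the SH coefficients of the k probes nearest to `position`.

    `probes` is a list of (probe_position, coeffs) pairs, where coeffs is a
    list of 9 [r, g, b] triples. Inverse-distance weighting is an assumption.
    """
    # Brute-force nearest-neighbour search (a KD-tree would replace this sort).
    nearest = sorted(probes, key=lambda p: math.dist(position, p[0]))[:k]

    # Closer probes get larger weights; the epsilon avoids division by zero.
    weights = [1.0 / (math.dist(position, pos) + 1e-6) for pos, _ in nearest]
    total = sum(weights)

    blended = [[0.0] * NUM_CHANNELS for _ in range(NUM_COEFFS)]
    for w, (_, coeffs) in zip(weights, nearest):
        for i in range(NUM_COEFFS):
            for c in range(NUM_CHANNELS):
                blended[i][c] += (w / total) * coeffs[i][c]
    return blended
```

Because SH coefficients are linear, a weighted sum of probes is itself a valid set of coefficients, which is what makes blending this cheap.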
Particle simulation
- 250 particle systems per frame
- 150 drawn
- 3000 particles updated
- 200 collision ray casts
- System grown over time
Refactored
- Vertex generation
- Particle simulation inner loop
- Initialization & deletion of particles
- High-level management / glue
Not done on SPU
- Updated global scene graph
- Starting & stopping sounds
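
One way to picture the split above: the inner loop works only on its own particle data, and anything that needs global state is recorded as an event for the main processor to handle afterwards. The sketch below assumes that deferred-event approach; the notes do not say how the hand-over was actually done. `Particle`, `simulate`, and the `"particle_died"` event are hypothetical names.

```python
from dataclasses import dataclass

GRAVITY = (0.0, -9.81, 0.0)


@dataclass
class Particle:
    position: tuple
    velocity: tuple
    life: float  # remaining lifetime in seconds


def simulate(particles, dt):
    """Integrate, age, and cull particles using only the data passed in.

    Returns (survivors, events). Events are deferred requests (for example
    stopping a sound) that the caller applies outside the inner loop.
    """
    survivors, events = [], []
    for p in particles:
        life = p.life - dt
        if life <= 0.0:
            # Do not touch global state here; just record what happened.
            events.append(("particle_died", p.position))
            continue
        velocity = tuple(v + g * dt for v, g in zip(p.velocity, GRAVITY))
        position = tuple(x + v * dt for x, v in zip(p.position, velocity))
        survivors.append(Particle(position, velocity, life))
    return survivors, events
```

Keeping the loop free of global reads and writes is what lets the same data be shipped to local memory, processed, and shipped back.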
Image Post Processing
Effects done on SPU
- Motion blur
- Depth of field
- Bloom
SPUs assist the RSX with post-processing
- RSX prepares low-res image buffers
- RSX triggers interrupt to start SPUs
- SPUs perform image operations
- RSX already starts next frame
- SPU results are processed by the RSX early in the next frame
- Similar to PhyreEngine now
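
The hand-off above means the post-processing result always arrives one frame late. A toy, single-threaded model of that ordering is sketched below; it only tracks which result each frame consumes and does no real image work. The function `run_frames` and the string labels are made up for illustration.

```python
def run_frames(num_frames):
    """Log which post-processing result each frame consumes."""
    pending = None  # SPU result kicked off during the previous frame
    log = []
    for frame in range(num_frames):
        # Early in the frame, the RSX picks up last frame's SPU result.
        consumed = pending

        # The RSX prepares low-res buffers for this frame and starts the SPUs.
        low_res = f"lowres[{frame}]"

        # The SPUs work while the RSX moves on, so this result is only
        # available to the next frame.
        pending = f"post({low_res})"

        log.append((frame, consumed))
    return log
```

For three frames the log shows frame 0 consuming nothing, frame 1 consuming the result built from frame 0, and so on.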
SPUs are compute-bound
- Bandwidth no issue
- Code can be optimized
Our trade-off: RSX vs SPU time
- SPUs take longer
- SPUs look better
- RSX was the bottleneck
Bloom and Lens Reflection
- 13% on one SPU
- Depth-dependent intensity response curve
- 7x7 Gaussian blur
- Upscaling results from different levels
- Internal lens reflection
- Result buffer
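
As an illustration of the 7x7 Gaussian blur step, here is a small sketch on a grayscale image stored as a list of rows. It runs the blur as two 1D passes (a Gaussian kernel is separable), which is a common optimization but an assumption here; the notes do not say how the blur was implemented. The sigma value and the clamp-to-edge border handling are also assumptions.

```python
import math


def gaussian_kernel(size=7, sigma=1.5):
    """Return a normalized 1D Gaussian kernel with `size` taps."""
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(row, kernel):
    """Convolve one row with the kernel, clamping samples at the edges."""
    half = len(kernel) // 2
    n = len(row)
    out = []
    for x in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            sx = min(max(x + k - half, 0), n - 1)  # clamp to image bounds
            acc += w * row[sx]
        out.append(acc)
    return out


def gaussian_blur(image, size=7, sigma=1.5):
    """Blur a 2D image with a horizontal pass, then a vertical pass."""
    kernel = gaussian_kernel(size, sigma)
    horizontal = [blur_1d(row, kernel) for row in image]
    # Transpose so columns become rows, blur, then transpose back.
    columns = [blur_1d(list(col), kernel) for col in zip(*horizontal)]
    return [list(row) for row in zip(*columns)]
```

Two 7-tap passes cost 14 multiply-adds per pixel instead of 49 for a full 7x7 kernel, which matters when the work is compute-bound.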
Waypoint cover maps –> depth map
IBL Sampling
SPUs cost a lot of dev time
Code is future proof, scales to more cores, supports the items they require.
The future is memory-local and excessively parallel
SPUs are just one of these ‘new architectures’
Optimize for the concept
Keep code portable
Parallelization of code takes time
Treat CPU as cluster
Think in workloads / jobs
Build latency into algorithms
Don’t optimize too early
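
The "think in workloads / jobs" advice can be sketched as splitting the data into independent chunks, having each job copy its chunk into local storage, and collecting the results in order. The example below is a rough illustration using Python's standard thread pool as a stand-in for a set of cores; the squaring workload and the names `split_into_jobs`, `run_job`, and `process` are placeholders. Python threads will not actually run this in parallel, so it shows the structure only.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_jobs(data, job_size):
    """Cut the workload into independent, fixed-size chunks."""
    return [data[i:i + job_size] for i in range(0, len(data), job_size)]


def run_job(chunk):
    """Process one chunk using only a local copy of its input."""
    local = list(chunk)            # stands in for a DMA transfer into local memory
    return [x * x for x in local]  # placeholder compute; result is "sent back"


def process(data, job_size=4, workers=6):
    """Run all jobs on a pool of workers and stitch the results together."""
    jobs = split_into_jobs(data, job_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_job, jobs))  # map preserves job order
    return [y for chunk in results for y in chunk]
```

Because no job reads or writes anything outside its own chunk, the same code runs unchanged with one worker or many.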