How we made Killzone 2 run @ 30FPS
- Deferred shading
- Diet for render targets
- Dirty lighting tricks
- Rendering, memory and SPUs
Deferred shading
- not forward rendering
- Geometry pass – fill the GBuffer (all material info for lighting)
- loading depth map, normal / bump map, albedo (diffuse color and texture), shininess (reflective materials)
- Lighting pass – accumulate info (only light, no textures)
GBuffer
- RGBA FP16 buffers proved to be too much
- Moved to RGBA8
- 4xRGBA8 + D24S8 – 18.4 MB
- 2xMSAA (Quincunx) – 36.8 MB
- Memory reused by later rendering stages
- Low res pass, post processing, HUD
- View space position computed from depth buffer
- Normal.z = sqrt(1 – Normal.x² – Normal.y²)
- Negative z cannot be stored, but this does not cause problems
- 2xFP16 compressed to RGBA8 on write
- Motion vectors – screen space
- Albedo – material diffuse color
- Roughness – specular exponent in log range
- Specular intensity – single channel only
- Sun Shadow – pre-rendered sun shadows (offline light map)
- Mixed with real-time sun shadows
- Lighting accumulation buffer (LAB)
- Geometry pass fills in indirect lighting terms
- Stored in lightmaps and IBLs
- Adds ambient color, scene reflections
- Lighting pass adds contribution of each light
- Glow – contains HDR luminance of LAB
- Used to reconstruct HDR RGB for bloom
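As a rough check of two GBuffer points above, the sketch below recomputes the memory footprint and the Normal.z reconstruction. It assumes a 1280x720 target (the resolution quoted for the lighting pass), 4 bytes per pixel for each of the five targets, and decimal megabytes; the function names are hypothetical and not taken from the original code.

```python
import math

WIDTH, HEIGHT = 1280, 720   # assumed render resolution
BYTES_PER_PIXEL = 4         # RGBA8 and D24S8 are both 32 bits per pixel
NUM_TARGETS = 5             # 4 x RGBA8 colour targets + 1 x D24S8 depth/stencil


def gbuffer_bytes(samples_per_pixel=1):
    """Total GBuffer size in bytes for the given MSAA sample count."""
    return WIDTH * HEIGHT * BYTES_PER_PIXEL * NUM_TARGETS * samples_per_pixel


def reconstruct_normal_z(nx, ny):
    """Rebuild z from the stored x and y of a unit-length normal.

    Only the non-negative root is recoverable, which matches the
    note that negative z is not stored.
    """
    return math.sqrt(max(0.0, 1.0 - nx * nx - ny * ny))


print(gbuffer_bytes() / 1e6)     # size without MSAA, in MB
print(gbuffer_bytes(2) / 1e6)    # size with 2xMSAA, in MB
print(reconstruct_normal_z(0.6, 0.0))
```

The clamp to zero guards against small negative values caused by quantising x and y to 8 bits.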
Lighting pass
- Most expensive pass
- 100+ dynamic lights per frame
- 10+ shadow casting lights per frame
- AA means more of everything
- Optimization
- Avoid hard work
- Work less for MSAA
- Precompute sun shadow offline
- Approximate
Avoid hard work
- Don’t run shaders
- Use early z/stencil cull unit
- Depth bounds test is the new cool
- Enable conditional rendering
- Optimized light shaders
- For each combination of light features
- Fade out shadows for small lights
- Remove small objects from shadow map
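The depth bounds test mentioned above lets the hardware reject pixels whose stored depth lies outside a range, before any light shader runs. A minimal sketch of the idea follows, assuming a point light described by a view-space depth and radius. It works in linear view-space depth for clarity, whereas real hardware compares against depth-buffer values; all names are hypothetical.

```python
def light_depth_bounds(center_depth, radius, near, far):
    """Depth range covered by a spherical light, clamped to the view frustum."""
    return max(near, center_depth - radius), min(far, center_depth + radius)


def passes_depth_bounds(pixel_depth, bounds):
    """True if the pixel may be touched by the light and must be shaded."""
    min_depth, max_depth = bounds
    return min_depth <= pixel_depth <= max_depth


bounds = light_depth_bounds(center_depth=20.0, radius=5.0, near=0.1, far=1000.0)
depths = [3.0, 18.0, 24.5, 40.0]
# Only pixels inside the light's depth range run the light shader.
shaded = [d for d in depths if passes_depth_bounds(d, bounds)]
print(bounds, shaded)
```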
Lighting pass and MSAA
- MSAA facts
- Each sample has to be lit
- Samples of a non-edge pixel are equal
- KZ2 solution – in-shader supersampling
- Run at 1280x720 not 2560x720
- Light two samples in one go
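One way to picture "light two samples in one go" is a per-pixel routine that receives both MSAA samples and reuses the result when they are identical. This is only a sketch of how the non-edge observation could be exploited; whether the actual shader reuses results this way is an assumption, and `light_pixel` and `shade` are hypothetical names.

```python
def light_pixel(sample_a, sample_b, shade):
    """Light both MSAA samples of one pixel in a single call."""
    color_a = shade(sample_a)
    # Non-edge pixels carry identical samples, so the first result is reused.
    color_b = color_a if sample_b == sample_a else shade(sample_b)
    return color_a, color_b


shade_calls = []


def shade(sample):
    """Stand-in lighting function that records how often it runs."""
    shade_calls.append(sample)
    return sample * 2.0


interior = light_pixel(1.0, 1.0, shade)   # non-edge pixel: one shade call
edge = light_pixel(1.0, 3.0, shade)       # edge pixel: two shade calls
print(interior, edge, len(shade_calls))
```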
Shadow map filtering distribution
- Motivation
- Define filtering quality per pixel rather than per sample.
- Split filter coordinates into disjoint sets
- One set per pixel sample
- MSAA is almost as fast as non-MSAA
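The filter distribution above can be sketched as follows: the full set of shadow-map filter taps is split into disjoint subsets, each MSAA sample filters with its own subset, and the MSAA resolve averages the samples, so the pixel as a whole sees every tap. The tap pattern and the `lookup` function below are illustrative assumptions, not the original kernel.

```python
def split_taps(taps, num_samples):
    """Split filter taps into disjoint interleaved sets, one per MSAA sample."""
    return [taps[i::num_samples] for i in range(num_samples)]


def filtered_shadow(taps, lookup):
    """Average the shadow-map comparison result over the given taps."""
    return sum(lookup(dx, dy) for dx, dy in taps) / len(taps)


# Hypothetical 4x2 tap pattern, offsets in shadow-map texels.
TAPS = [(dx, dy) for dy in (-0.5, 0.5) for dx in (-1.5, -0.5, 0.5, 1.5)]


def lookup(dx, dy):
    """Stand-in shadow test: lit only to the right of a shadow edge."""
    return 1.0 if dx > 1.0 else 0.0


tap_sets = split_taps(TAPS, 2)
per_sample = [filtered_shadow(taps, lookup) for taps in tap_sets]
resolved = sum(per_sample) / len(per_sample)   # what the MSAA resolve produces
full_filter = filtered_shadow(TAPS, lookup)    # all taps on a single sample
print(per_sample, resolved, full_filter)
```

With equally sized subsets the resolved value matches the full filter, while each sample pays for only half of the taps.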
Sunlight
- Fullscreen directional light
- We divide screen into depth slices
- Each depth slice is lit separately
- Different shadow properties
- Used depth bounds test
- Use sun shadow from GBuffer
- Stencil mark pixels completely in shadow
- Skip expensive sunlight shader
- Also mixed with real-time shadows
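The depth-slice scheme above can be illustrated with a small lookup that maps a pixel's view-space depth to its slice and gives the depth range a depth bounds test would use for that slice. The split distances are made-up values for illustration; the source does not state the real ones.

```python
import bisect

# Hypothetical view-space split distances between slices.
SPLIT_DEPTHS = [10.0, 40.0, 150.0]


def slice_index(view_depth, split_depths=SPLIT_DEPTHS):
    """Index of the depth slice containing view_depth (0 is the nearest)."""
    return bisect.bisect_right(split_depths, view_depth)


def slice_depth_bounds(index, near, far, split_depths=SPLIT_DEPTHS):
    """(min, max) depth range to use as depth bounds when lighting one slice."""
    edges = [near] + list(split_depths) + [far]
    return edges[index], edges[index + 1]


for depth in (5.0, 75.0, 500.0):
    index = slice_index(depth)
    print(depth, index, slice_depth_bounds(index, near=0.1, far=1000.0))
```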
Sunlight rendering – Fake MSAA
- Used only in distant pixels
- Cut down lighting cost
- Run lighting equation on closest sample only
- Is this wrong?
- It's a hack
- Works correctly against background
- The edges are still partially anti-aliased
- Distant scenery is heavily post processed
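A minimal sketch of the fake MSAA idea, assuming each pixel exposes the depths of its samples: beyond some distance, only the closest sample is lit and its result is shared by all samples. The `distance_threshold` parameter and the function names are hypothetical; the source only says the trick is limited to distant pixels.

```python
def fake_msaa_light(sample_depths, shade_sample, distance_threshold):
    """Return one lit colour per sample of a pixel.

    Near pixels light every sample. Distant pixels run the lighting
    equation on the closest sample only and copy the result.
    """
    count = len(sample_depths)
    closest = min(range(count), key=lambda i: sample_depths[i])
    if sample_depths[closest] < distance_threshold:
        return [shade_sample(i) for i in range(count)]
    color = shade_sample(closest)
    return [color] * count


colors = [0.2, 0.9]   # stand-in lighting result for each sample


def shade(i):
    return colors[i]


print(fake_msaa_light([5.0, 6.0], shade, distance_threshold=100.0))
print(fake_msaa_light([300.0, 250.0], shade, distance_threshold=100.0))
```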
Sunlight – shadow map rendering
- Generate shadow map for each depth slice
- Common approach
- Align shadow map to view direction
- Pros – max shadow map usage
- Result – shadow map shimmering
- Fix
- Remove shadow map rotation
- Align shadow maps to world instead of view
- Remove sub-pixel movement
- Cons – unused shadow map space
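Removing sub-pixel movement is commonly done by snapping the shadow map origin to whole shadow-map texels in world-aligned light space, so that the same world points always land on the same texels as the camera moves. The sketch below shows that snapping under the assumption of a square orthographic shadow map; the function and parameter names are hypothetical.

```python
import math


def snap_to_texel(center_x, center_y, world_extent, resolution):
    """Snap a shadow map centre to the shadow map's texel grid.

    world_extent is the world-space width covered by the map and
    resolution is its width in texels.
    """
    texel_size = world_extent / resolution
    return (math.floor(center_x / texel_size) * texel_size,
            math.floor(center_y / texel_size) * texel_size)


# A 512-texel map covering 64 world units has 0.125-unit texels.
print(snap_to_texel(10.30, -4.70, world_extent=64.0, resolution=512))
# Moving the camera by less than one texel leaves the origin unchanged.
print(snap_to_texel(10.35, -4.70, world_extent=64.0, resolution=512))
```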
GPU driven memory allocation system
Push Buffer building
- Multiple SPUs building PB in parallel
- Additional SPUs generating data
- Skinning, particles – VB
- IBL interpolation – textures
- Common solutions
- Ring Buffering
- Issue with out of order allocations
- Double Buffering
- Too much memory
KZ2 render memory allocator
- Fixed mem pool
- 22 MB block – split into 256 KB blocks
- Each block has associated AllocationID
- Specified by client during allocation
- Only whole block can be allocated
- Global FreeID identifies free blocks
- Updated as RSX consumes ‘Free’ marker
- Lockless, out of order, memory allocation
- From PPU and/or SPU
- Simple table walk (fast!)
- Allows immediate memory reuse
- We generate push-buffer just in time for RSX
- Block can be reused right after RSX consumption
- Can allocate memory for skinning early…
- and still free at correct point in frame
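The allocator described above can be sketched as a table of per-block IDs plus one global FreeID. The sketch assumes a block counts as free when its AllocationID is less than or equal to FreeID, and that IDs increase along the frame; both are assumptions made for illustration. It is also single-threaded Python, so it shows the table walk but not the lockless behaviour. Class and method names are hypothetical.

```python
BLOCK_SIZE = 256 * 1024
POOL_SIZE = 22 * 1024 * 1024
NUM_BLOCKS = POOL_SIZE // BLOCK_SIZE   # 88 blocks


class BlockPool:
    def __init__(self, num_blocks=NUM_BLOCKS):
        # AllocationID per block; 0 means the block was never allocated.
        self.block_ids = [0] * num_blocks
        # Every block whose ID is <= free_id is considered free (assumed rule).
        self.free_id = 0

    def allocate(self, allocation_id):
        """Walk the table and claim the first free block, or return None."""
        if allocation_id <= self.free_id:
            raise ValueError("allocation_id has already been released")
        for index, block_id in enumerate(self.block_ids):
            if block_id <= self.free_id:
                self.block_ids[index] = allocation_id
                return index
        return None

    def consume_free_marker(self, allocation_id):
        """Simulate the GPU reaching a 'Free' marker in the push buffer."""
        self.free_id = max(self.free_id, allocation_id)


pool = BlockPool(num_blocks=2)
skinning = pool.allocate(2)      # allocated early, released at marker 2
geometry = pool.allocate(1)      # allocated later, released at marker 1
exhausted = pool.allocate(3)     # no free block yet
pool.consume_free_marker(1)      # GPU has consumed the first marker
reused = pool.allocate(3)        # the geometry block is reused immediately
print(skinning, geometry, exhausted, reused)
```

Because the client picks the ID, memory can be allocated out of order and still be released at the intended point in the frame.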