
Parallelization of Game AI
Björn Knafla
Wednesday, 17 June 2009

Good afternoon, my name is…


Motivation: why this talk?
Today I will give an overview of ways to parallelize game AI
… in no way complete, just a brief view
… purely technology oriented
More
Dynamic
Lifelike
Realism

Developing game AI is a challenge: bigger worlds, more details, more interactions, more entities, dynamic worlds, more realism, and expectations of lifelike behavior, emotions, individuals, …

The power wall leads to parallel hardware, which leads to parallel programming.

Parallel Programming Problems
Errors
Race conditions
Deadlocks
Performance
Contention

- Race conditions: errors due to non-synchronized access to shared resources
- Deadlocks: errors due to tasks blocking each other because each needs resources the other holds
- Contention: synchronization that limits performance
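The first and third problems can be illustrated with a minimal C++ sketch. It assumes a hypothetical shared counter that several threads increment; the function name `count_with_lock` and its parameters are illustrative and not from the talk.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example: several threads add to one shared counter.
// Without the lock, the read-modify-write on `total` would be a data race.
long count_with_lock(int thread_count, int increments_per_thread) {
    assert(thread_count > 0 && increments_per_thread >= 0);
    long total = 0;
    std::mutex total_mutex;  // guards `total`

    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> lock(total_mutex);
                ++total;  // synchronized access: no race condition
            }
        });
    }
    for (auto& worker : workers) worker.join();
    return total;
}
```

Taking the lock on every increment makes the result correct, but it also serializes the threads, which is exactly the contention named in the last bullet. Accumulating per thread and combining once at the end would reduce it.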
Not Deterministic

- Parallel execution is not deterministic by default: thread scheduling can order the same work differently on every run
I can’t offer you a silver bullet …
… but I will show you some methods and approaches that can help you exploit parallelism and push your AI further.
Outline

1. Parallel Hardware
2. Example AI
3. Parallelization Approaches
4. Conclusion

… followed by a panel discussion with Julien Hamaide, Markus Mohr, and me.
Parallel Hardware

… let's lift off.
It's the parallel hardware that forces us to program in parallel. We need to understand it to understand parallel programming, which follows a different programming model than the von Neumann architecture (the model for sequential programming).
- High-level view of a processor
[Diagram: a single core with its L1 cache, connected to memory]

- Assume a single address space and shared memory
- A core is like a CPU: a processing unit


[Diagram: six cores, six L1 caches, three L2 caches, and memory]

- Mention NUMA
- Mention shared caches and caches that are connected
- Mention homogeneous and heterogeneous core clusters (PS3, PC and GPU)
Memory Wall

Problem revisited: the memory wall forces us to fight latency and to watch bandwidth usage. It is triggered by faster and faster processors, or by more and more cores that want to be fed.

Latency

Latency doesn’t decrease as fast as core speed went (and goes) up.

Bandwidth

Bandwidth increases, but it is a precious resource when we have tons of cores (thousands? tens of thousands? …) that can easily saturate the memory “bus”/connection.
Combined Problem: Sharing

- Sharing is the root of contention: sharing needs synchronization
- Cache-coherency protocol ping-pong
- Communication uses bandwidth and takes time due to latency
- Synchronization is cheaper for cores that share a cache or local storage
Solutions

Latency: Maximize Throughput

Fight latency with:
- Out-of-order execution
- Caches or local storage
- Prefetching
- Simultaneous multithreading (SMT) -> throughput
Bandwidth: Maximize Memory Locality

Fight the bandwidth problem, especially between SMT hardware threads that share a cache:
- Reuse data in cache or local storage
- Increase arithmetic intensity
- Contiguous memory accesses, not random ones
- Coalesced memory accesses
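One common way to get contiguous accesses is a structure-of-arrays layout. The sketch below is only an illustration under that assumption; `AgentPositions` and `translate_all` are hypothetical names, not part of the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical agent data in structure-of-arrays layout: each field is
// stored contiguously, so a pass over one field streams through memory.
struct AgentPositions {
    std::vector<float> x;
    std::vector<float> y;
};

// Moves every agent by (dx, dy) with purely contiguous accesses.
void translate_all(AgentPositions& agents, float dx, float dy) {
    assert(agents.x.size() == agents.y.size());
    for (std::size_t i = 0; i < agents.x.size(); ++i) {
        agents.x[i] += dx;
        agents.y[i] += dy;
    }
}
```

Compared with chasing pointers to individually allocated agent objects, every cache line fetched here is used completely, which saves bandwidth and makes the hardware prefetcher's job easy.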
Sharing: Minimize (False) Sharing

- Minimize sharing to prevent cache-coherency protocol ping-pong
- Minimize sharing to minimize the need for syncing
- Minimize syncing
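False sharing happens when threads write to different variables that happen to sit on the same cache line. A minimal sketch of one way to avoid it follows. It assumes C++17 and a 64-byte cache line (common, but not universal; `std::hardware_destructive_interference_size` exists for this purpose but is not available in every standard library). All names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines, which is common but not universal.
constexpr std::size_t kAssumedCacheLineSize = 64;

// Each counter is aligned to its own cache line, so two threads that update
// neighboring counters do not invalidate each other's cache line.
struct alignas(kAssumedCacheLineSize) PaddedCounter {
    long value = 0;
};

// Every thread writes only to its own padded slot; results are combined
// once at the end, after all threads have joined.
long count_without_sharing(int thread_count, int increments_per_thread) {
    assert(thread_count > 0);
    std::vector<PaddedCounter> counters(thread_count);
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&counters, t, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                ++counters[t].value;
            }
        });
    }
    for (auto& worker : workers) worker.join();

    long total = 0;
    for (const auto& counter : counters) total += counter.value;
    return total;
}
```

No lock is needed because no two threads ever write the same slot, and the only synchronization point is the join at the end.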
Balance

Maximize locality vs. minimize sharing

- Balance synchronization needs vs. pure parallelism
Example AI

OK, so we have covered the background: we know the basic properties of parallel hardware. Let's take a “real” example to see what all of this has to do with game AI.
- Computer-controlled entities = agents
- Real-time game in which agents cooperate to capture a flag or clear an area

Agent
Sensing
Reactive behavior
Pathfinding
Steering
Locomotion
Acting
Animation

Agents have a higher-level planning AI and a lower-level reactive AI (finite-state machines or behavior trees). They sense their environment and use pathfinding and locomotion planning to create effector actions.
- Reactive agent (the higher level might be planning plus terrain analysis)
- Uses sensing to perceive its surroundings
- Plans paths with pathfinding and follows them using steering and locomotion planning
- Generates actions (movement, taking ammunition, firing a gun) and animations
Parallelization Approaches

Now on towards real parallelism: simultaneous execution of multiple tasks.
- Parallelize because conceptual planning shows the need
- Parallelize because profiling shows the need
I show parallelization for performance (relevant to game logic). The alternative is parallelization for eye candy: not relevant to game logic, but nice and fluffy.

1. Asynchronous Calls

OK, let's concentrate purely on AI from now on.
- Profiler driven
- Valve’s Source Engine supports this

[Diagram: frames 1 and 2 on a timeline. Core 1 runs the sequential agent updates (Agent 1, Agent 2, Agent 3, Agent 4, ...). Cores 2 to 5 run Freestep Path 1 to Freestep Path 4, one per core.]

- Sequential agent update loop
- Run pathfinding asynchronously on another thread
- Freestep with raw threads is not deterministic
- Scaling is uncontrollable with raw threads (over- or under-subscription, context switching)
- Creating and destroying raw threads often is expensive

Task Pool

[Diagram: enqueued tasks feed a task pool that owns Thread 1, Thread 2, …, Thread n]

- Tasks
- The pool creates the threads it manages once; the user enqueues tasks and doesn’t interact directly with raw threads
- Enables scaling and automatic load balancing
- Central queue (thread pool), or central queue plus one queue per thread (task pool)
- Tasks can spawn (sub-)tasks into the local thread queue
- Work stealing
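A minimal sketch of the central-queue variant, assuming C++11 or later. The class name `TaskPool` and its interface are illustrative only; per-thread queues, sub-task spawning, and work stealing are deliberately left out, so this is closer to the "thread pool" than to the full "task pool" described above.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal central-queue pool: threads are created once and reused.
class TaskPool {
public:
    explicit TaskPool(unsigned thread_count) {
        assert(thread_count > 0);
        for (unsigned i = 0; i < thread_count; ++i) {
            threads_.emplace_back([this] { worker_loop(); });
        }
    }

    // Lets the workers drain all enqueued tasks, then joins them.
    ~TaskPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& thread : threads_) thread.join();
    }

    void enqueue(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};
```

The single mutex around the central queue is itself a point of contention, which is one motivation for the per-thread queues and work stealing mentioned in the notes.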

[Animation: in frame 1 the sequential agent updates (Agent 1, Agent 2, Agent 3, Agent 4, ...) enqueue pathfinding tasks 1 to 4. The task pool moves each enqueued task onto a free thread: Thread 1 runs Freestep Path 1 and later Freestep Path 4, Thread 2 runs Freestep Path 2, and Thread n runs Freestep Path 3.]

- Task decomposition -> pathfinding
- … for all agents
- Task pool schedules tasks onto free threads
- … for all tasks enqueued
- … observes queues

No raw threads anymore from now on!

- Scaling plus automatic load balancing!
- Still not deterministic -> a lockstep approach with pathfinding iteration steps -> better bandwidth control with lockstep and systems
- Task pools from now on, or whatever helps to automate SPU or GPU usage
Synchronization

- Futures
- Events / messages
- Polling of a pathfinding system
- etc.

Future

[Diagram: agent instance, path future, and freestep path task]

- Not deterministic (we need other syncing plus lockstep for that)
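A minimal sketch of the future-based variant using the standard library's `std::future`. The `Waypoint`, `Path`, `find_path`, and `Agent` names are hypothetical, and `find_path` is only a stand-in for a real pathfinder.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <vector>

struct Waypoint { int x; int y; };
using Path = std::vector<Waypoint>;

// Hypothetical stand-in for a real pathfinder: walks along x, then along y.
Path find_path(Waypoint start, Waypoint goal) {
    Path path{start};
    Waypoint current = start;
    while (current.x != goal.x) {
        current.x += (goal.x > current.x) ? 1 : -1;
        path.push_back(current);
    }
    while (current.y != goal.y) {
        current.y += (goal.y > current.y) ? 1 : -1;
        path.push_back(current);
    }
    return path;
}

// The agent keeps a future for its path request and polls it once per
// update instead of blocking the frame.
struct Agent {
    std::future<Path> pending_path;
    Path path;

    void request_path(Waypoint start, Waypoint goal) {
        pending_path = std::async(std::launch::async, find_path, start, goal);
    }

    // Returns true once the path has arrived and was stored in `path`.
    bool poll_path() {
        if (!pending_path.valid()) return false;
        if (pending_path.wait_for(std::chrono::seconds(0)) !=
            std::future_status::ready) {
            return false;  // still computing; try again next update
        }
        path = pending_path.get();
        return true;
    }
};
```

The frame in which `poll_path` first returns true depends on thread scheduling, which is why this approach alone is not deterministic. Note also that `std::async` with `std::launch::async` may start a new thread per call, so a production version would more likely hand the request to a task pool.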


Event / Message

[Diagram: agent instance, message queue, and freestep path task]

- React to messages in the next frame!
- Deterministic with a lockstep system
- Or use polling
- Enqueue the path result itself, or only a message telling the agent to pull the path result from inside the path system
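A minimal sketch of such a message queue, assuming the "message only" variant in which the agent later pulls the result from the path system. `PathReadyMessage` and `MessageQueue` are hypothetical names. Sorting by request id on drain is an added assumption: it makes the processing order independent of the posting order, but the set of messages per frame is only reproducible if the producers run in lockstep.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical message: "the path for this request is ready".
struct PathReadyMessage {
    int request_id;
};

// Producers (path tasks) post at any time; the agent drains the queue only
// at the start of its next update, so it reacts one frame later.
class MessageQueue {
public:
    void post(PathReadyMessage message) {
        assert(message.request_id >= 0);  // sketch convention: ids are non-negative
        std::lock_guard<std::mutex> lock(mutex_);
        incoming_.push_back(message);
    }

    // Takes everything posted so far and returns it sorted by request id.
    std::vector<PathReadyMessage> drain() {
        std::vector<PathReadyMessage> drained;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            drained.swap(incoming_);
        }
        std::sort(drained.begin(), drained.end(),
                  [](const PathReadyMessage& a, const PathReadyMessage& b) {
                      return a.request_id < b.request_id;
                  });
        return drained;
    }

private:
    std::mutex mutex_;
    std::vector<PathReadyMessage> incoming_;
};
```

The lock is held only for the swap, so producers are blocked for a very short time regardless of how many messages are pending.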
Asynchronous Calls

Pros:
- Scaling (with task pool)
- Potentially better memory locality

Cons:
- Determinism needs care
- Get syncing right
- Beware of side effects

Avoid raw threads!

2. Parallel Agents

- What if the sequential updates take too long?
- We still have lots of untapped parallelism left (cores to exploit)

[Diagram: within one frame, the update per time step runs Update 1 to Update 4 in sequence, one for each agent instance (Agent 1 to Agent 4). Parallelization potential!]

- Look into one agent update step
- We process agents in a sequential loop: an opportunity for parallelism!
- Agents as tasks

[Diagram: the four agent updates are distributed over task pool threads. Thread n runs Update 1 and Update 3; thread n+1 runs Update 2 and Update 4.]

- Think in terms of message passing and no shared memory to steer your thinking!
- Task plus message passing
- Run on task pool threads, on SPUs, on the GPU, or whatever -> scaling plus automatic load balancing
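A minimal sketch of updating agents as parallel tasks. It assumes that an update touches only its own agent, and it uses `std::async` as a stand-in for a real task pool. The names are illustrative, and the work is split into contiguous chunks (for locality) rather than interleaved as in the diagram.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

struct Agent {
    float position = 0.0f;
    float velocity = 1.0f;
};

// Updates only the agent it is given: no shared state is written.
void update_agent(Agent& agent, float dt) {
    agent.position += agent.velocity * dt;
}

// Splits the agent array into contiguous chunks, runs the chunks
// concurrently, and waits for all of them before the frame continues.
void update_agents_parallel(std::vector<Agent>& agents, float dt,
                            std::size_t task_count) {
    assert(task_count > 0);
    const std::size_t chunk = (agents.size() + task_count - 1) / task_count;
    std::vector<std::future<void>> tasks;
    for (std::size_t begin = 0; begin < agents.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, agents.size());
        tasks.push_back(std::async(std::launch::async, [&agents, begin, end, dt] {
            for (std::size_t i = begin; i < end; ++i) update_agent(agents[i], dt);
        }));
    }
    for (auto& task : tasks) task.get();  // barrier: all agents are updated
}
```

This is only safe because each task writes to a disjoint range of agents. As soon as an update reads or writes other agents or shared world data, the problems discussed next appear.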

[Diagram: agents 1 to 4, shared spatial data, and entities 1 and 2. The parallel agent updates access the spatial data and the entities.]

- Shared data structures, like spatial data, and accessing or using other game world entities
- Unorganized access is bad for bandwidth and locality, and it is not deterministic
- Syncing is needed -> convoying -> contention -> serialization; blocking is bad for task pools
- … if the sequential solution was order dependent, then we are now non-deterministic and need refactoring

- Entity management is ownership management -> network and distributed programming
- Suggestion: defer anything but read accesses. Enqueue ownership and action requests, and work on them after all agents are updated; agents react to the results in the next update.
- Detect conflict islands and resolve them deterministically. Compare with the actor model ;-)

Mittwoch, 17. Juni 2009 58

So, how do we solve the syncing, serialization, and determinism problems?

First: identify the dependencies: reads and writes, public data and private data
Mittwoch, 17. Juni 2009 59

Split instance data into public data (read only) and private data (writable only by the instance's owner)
This duplicates state
Public data represents the state from the last time step / frame
Note: add a random number generator instance to each agent's private data
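
A minimal sketch of the public/private split with a per-agent random number generator. The class names, the one-dimensional position, and seeding from the agent id are all assumptions chosen to keep the example short.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicState:
    """Read-only snapshot of an agent from the last time step."""
    position: float


class Agent:
    def __init__(self, agent_id, position):
        self.agent_id = agent_id
        self.public = PublicState(position)   # readable by everyone
        self.private_position = position      # written only by this agent
        # Per-agent generator; seeding from the id is an assumption.
        self.rng = random.Random(agent_id)

    def update(self, agents):
        """Read public state of all agents; write only own private state."""
        mean = sum(a.public.position for a in agents) / len(agents)
        jitter = self.rng.uniform(-0.1, 0.1)
        own = self.public.position
        self.private_position = own + 0.5 * (mean - own) + jitter


agents = [Agent(i, float(i)) for i in range(4)]
for agent in agents:
    agent.update(agents)
```

Because every update reads only frozen public snapshots and draws from its own generator, the agents can be updated in any order, or in parallel, with the same result.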
[Figure: Frames. Two copies of the state for agents 1-4, spatial data, and entities 1-2: a private copy and a public copy labeled Frame-1. Update tasks in two rows: Update 1 and Update 3, Update 2 and Update 4.]

Mittwoch, 17. Juni 2009 60

Reorganize the update phase into two stages:

1. Read public data and write own private data
-> reading public data must be free of side effects!
Mittwoch, 17. Juni 2009 61

This first stage is the collection stage
[Figure: the same private/public diagram, with the update tasks now labeled as the collection stage.]
Mittwoch, 17. Juni 2009 62

2. Processing and updating of own data

Now update the public data of the last frame :-)
[Figure: the complete frame. Private and public (Frame-1) copies of agents 1-4, spatial data, and entities 1-2; a collection stage (Update 1 and Updt. 3, Update 2 and Update 4) followed by an instance update stage (Update 1 and Update 4, Update 2 and Updt. 3).]


Mittwoch, 17. Juni 2009 63

Private and public data work a bit like double buffering, and could also be implemented with double
buffering…
On to the next time step
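
A minimal sketch of one time step handled as double buffering. The ring of neighbour averages is a made-up workload, and the thread pool only stands in for whatever task pool the engine uses.

```python
from concurrent.futures import ThreadPoolExecutor


def average_with_neighbours(i, public):
    """Example update: reads last frame's public values, returns a new value."""
    left = public[i - 1]
    right = public[(i + 1) % len(public)]
    return (left + public[i] + right) / 3.0


def step(public, update_fn, pool):
    """Advance one time step.

    Stage 1: every agent reads the shared `public` list (last frame) and
    produces its next value; the results land in a separate private buffer.
    Stage 2: the private buffer is returned and becomes the new public data.
    """
    indices = range(len(public))
    private = list(pool.map(lambda i: update_fn(i, public), indices))
    return private


public = [0.0, 3.0, 6.0, 9.0]
with ThreadPoolExecutor(max_workers=2) as pool:
    for _ in range(2):
        public = step(public, average_with_neighbours, pool)
```

No task ever writes to the buffer that other tasks read, so no locks are needed inside a stage; the only synchronization point is the buffer swap between steps.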
Mittwoch, 17. Juni 2009 64

There is still a danger of side effects and of non-thread-safe function calls inside an agent…

Honestly: the implementation is complex, but there is no other way -> and the changes help even sequential execution!
It is a first step toward better locality and more implicit syncing
Parallel Agents
Pros                      Cons
Guides parallelization    Implementation effort
Scaling possible          Get determinism right
Implicit syncing          Get syncing right
                          Avoid side effects
                          Bad locality

Mittwoch, 17. Juni 2009 66


Annotation on “Implementation effort”: … but enables even more parallelism

Mittwoch, 17. Juni 2009 66


3. Data Parallel Systems

Mittwoch, 17. Juni 2009 67

Unsolved problem from the last approach: bad locality and bandwidth usage because of random,
unordered access to data structures -> let's improve this
This approach builds upon the last approach
Approach supported by Valve’s Source Engine, used by Insomniac, and by Guerrilla Games for
Killzone 2
Mittwoch, 17. Juni 2009 68

… task pool thread or SPU/GPU instruction exec


[Figure: one agent update, split into a collection stage followed by an instance update stage.]

Mittwoch, 17. Juni 2009 69

What happens inside the collection stage? Details


Mittwoch, 17. Juni 2009 70

… looks different for every agent (order, timing, branches)

- agents run in parallel on many cores -> random memory accesses, no coordination
-> what about bandwidth and memory locality?
[Figure: four data columns (agent instances, entity instances, sensor spatial data structure, pathfinding data), with one agent's logic, sensing, and pathfinding accesses spread across them.]

Mittwoch, 17. Juni 2009 71

Agent logic, sensing, pathfinding, terrain analysis, planning

-> these are all different aspects (of the behavior) of an agent
[Figure: the same four data columns, with the agent's work regrouped by aspect: pathfinding, sensing, logic.]

Mittwoch, 17. Juni 2009 72

Slice the agent into its aspects



Mittwoch, 17. Juni 2009 73

… and do this for all agents!



Mittwoch, 17. Juni 2009 74

Group each aspect's processing and data -> and handle it in a specialized system
[Figure: three systems side by side: pathfinding system, sensor system, AI logic system.]
Mittwoch, 17. Juni 2009 76

No single update function per agent anymore...

Systems have dependencies -> independent systems can run in parallel, asynchronously
The systems form the stages of our AI (pipeline)
Aspects needed every step are always computed; others are deferred until requested -> async calls
Manage the parallel resources if multiple systems run in parallel!
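
A minimal sketch of turning system dependencies into pipeline stages. The dependency table is hypothetical (the talk does not say which system depends on which); `graphlib.TopologicalSorter` is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependencies: each system maps to the systems it needs first.
dependencies = {
    "sensor": set(),
    "pathfinding": set(),
    "ai_logic": {"sensor", "pathfinding"},
}


def plan_stages(deps):
    """Group systems into stages; systems within one stage are independent."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = plan_stages(dependencies)
print(stages)
```

Systems that share a stage could be handed to the task pool together; the boundary between two stages acts as the implicit sync point.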
Mittwoch, 17. Juni 2009 77

Innards of a system
[Figure: the sensor system, split into sensor system data and sensor system execution.]

Mittwoch, 17. Juni 2009 78

Sensor system execution runs through several stages

-> implicit syncing between stages (barriers)
-> the first stage is always active, to collect input for the next time step
[Figure: sensor system data above sensor system execution, which has four stages: collect data updates and sensing requests, prepare parallel execution, parallel execution, publish results.]

Mittwoch, 17. Juni 2009 79

Example: a 2D grid that stores the sensable entities and the sensors

Data decomposition -> data parallelism
Lean data, locality possible!

Mittwoch, 17. Juni 2009 80

Collect (message queue?)


… data or update commands for data (points = entities)
… and sensing requests (triangle = field of view)

Mittwoch, 17. Juni 2009 81

Prepare data if necessary, possibly in parallel


Associate entities with grid cells or change their transformations
Add or enable/disable sensor view frustums, and associate them with grid cells
Sort if necessary...
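
A minimal sketch of the prepare stage for the 2D grid example: entities are associated with grid cells and sorted within each cell. The cell size, the entity table, and the function names are assumptions for illustration.

```python
from collections import defaultdict

CELL_SIZE = 10.0  # assumed edge length of a grid cell


def cell_of(x, y, cell_size=CELL_SIZE):
    """Map a world position to integer grid cell coordinates."""
    return (int(x // cell_size), int(y // cell_size))


def build_grid(entities):
    """Associate entity ids with grid cells; ids are sorted within a cell."""
    grid = defaultdict(list)
    for entity_id, (x, y) in entities.items():
        grid[cell_of(x, y)].append(entity_id)
    for ids in grid.values():
        ids.sort()
    return dict(grid)


# Hypothetical entity positions collected in the previous stage.
entities = {7: (3.0, 4.0), 2: (8.5, 1.0), 5: (12.0, 4.0), 9: (-1.0, 25.0)}
grid = build_grid(entities)
```

Sorting inside each cell gives the later parallel stage a fixed iteration order, which helps when results have to be reproducible.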

Mittwoch, 17. Juni 2009 82

Execute the stage in parallel:

… to maximize locality, run sensing per grid cell, with the cells processed in parallel
-> control: iterations per timestep and
… lockstep is possible -> deterministic if the parallel computation is order independent
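
A minimal sketch of per-cell parallel sensing. It simplifies heavily: sensors are circles rather than view frustums, each sensor only tests entities in its own cell, and the data layout and function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def sense_cell(entity_positions, sensors):
    """Return (sensor_id, entity_id) pairs for one grid cell.

    Reads only its arguments and returns a new list, so cells can be
    processed in any order or in parallel.
    """
    hits = []
    for sensor_id, (sx, sy, radius) in sensors:
        for entity_id, (ex, ey) in entity_positions:
            if (ex - sx) ** 2 + (ey - sy) ** 2 <= radius ** 2:
                hits.append((sensor_id, entity_id))
    return hits


def run_sensing(cells, pool):
    """cells: {cell: (entities, sensors)}; merges hits in sorted cell order."""
    keys = sorted(cells)
    per_cell = pool.map(lambda key: sense_cell(*cells[key]), keys)
    return [hit for hits in per_cell for hit in hits]


# Hypothetical prepared data: entities and sensors already binned per cell.
cells = {
    (0, 0): ([(1, (1.0, 1.0)), (2, (9.0, 9.0))], [(100, (0.0, 0.0, 2.0))]),
    (1, 0): ([(3, (12.0, 1.0))], [(101, (11.0, 1.0, 5.0))]),
}
with ThreadPoolExecutor(max_workers=2) as pool:
    hits = run_sensing(cells, pool)
```

The merge walks the cells in sorted order regardless of which worker finished first, so the published result does not depend on thread timing.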

Mittwoch, 17. Juni 2009 83

Publish the computed results via the observer pattern (Intel Smoke Framework)
or the MVC pattern (planned for the AiGameDev.com Sandbox)
… decide: buffer the results until they are pulled, or send them out
… decide: events or futures -> think about determinism, look at the async service call approach
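
A minimal sketch of the two publishing choices named in these notes: results are buffered so consumers can pull them, and registered observers are also notified. `ResultBoard` and its methods are hypothetical names; this is not the Intel Smoke Framework API.

```python
class ResultBoard:
    """Hypothetical publish stage for one system."""

    def __init__(self):
        self._observers = []
        self._buffer = {}

    def subscribe(self, callback):
        """Register an observer that is called on every publish."""
        self._observers.append(callback)

    def publish(self, results):
        """Called once per time step, after parallel execution has finished."""
        self._buffer = dict(results)        # pull: kept until the next publish
        for callback in self._observers:    # push: notified in subscribe order
            callback(self._buffer)

    def pull(self, key, default=None):
        """Read a buffered result without triggering any computation."""
        return self._buffer.get(key, default)


board = ResultBoard()
seen = []
board.subscribe(lambda results: seen.append(sorted(results)))
board.publish({"agent_1": ["entity_2"], "agent_0": []})
```

Publishing from a single place after the parallel stage, and notifying observers in a fixed order, keeps the hand-off deterministic.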

Mittwoch, 17. Juni 2009 84

Exploit the system's internal data access pattern by allowing the injection of Insomniac-style “SPU” shaders that generate
other data that might be useful.
Full control over the context for both the system and the “SPU” shader -> no side effects
-> easy to exploit the parallelism offered by the system developer
Mittwoch, 17. Juni 2009 85

Data exchange between systems -> walk through the system outputs and the inputs needed,
from the last system to the first, and design the systems and their data accordingly (Insomniac’s
advice).
Porting system by system is possible
Mittwoch, 17. Juni 2009 86

deterministic if no order dependency


Data Parallel Systems
Pros                        Cons
Lean Data Structures        Implementation effort
Locality                    Internally low-level parallel
Scaling                     Get syncing right
Deterministic               Deferring necessary
Implicit Synchronization
Explicit Context
Possibilities: “Shaders”

Mittwoch, 17. Juni 2009 87


Stream Processing
Input flows through kernels into output buffers
Constrained solution space
Automatic parallelization
Cross-platform capable
Look into:
Emergent’s Floodgate tech
OpenCL
Mittwoch, 17. Juni 2009 88
Conclusion

Mittwoch, 17. Juni 2009 89


- minimize possibility for errors first
- then maximize performance
- in your domain: no low-level parallel programming; create frameworks that structure the parallelism and let you express your logic while the framework manages parallelism and uses implicit syncing
- think in tasks and workloads, and avoid side effects
- use double buffering and message passing to decouple calculations and to allow deferring of computations
- use ownership management, instanced random number generators, and lockstep mode in systems to enable determinism; be independent of the order of computation
- differentiate reading from writing and input from output, to make dependencies explicit, to identify the aspects of your computation, and to collect the aspects in staged systems
- optimize for throughput, and balance max. locality against min. sharing to exploit the hardware
- no rocket science, just software engineering
- use parallelism to push your AI further
Understand Parallel Hardware

Mittwoch, 17. Juni 2009 90


Mittwoch, 17. Juni 2009 91

Prevent errors first - then care about performance


Know the hardware: optimize throughput, balance max locality with min sharing
Error-freeness first
Performance afterwards

Parallelization Approaches as Guides

Mittwoch, 17. Juni 2009 92

Use the approaches as guides for how to tap into parallelism


Mittwoch, 17. Juni 2009 93
KISS

Mittwoch, 17. Juni 2009 93


Keep It Short and Simple

Mittwoch, 17. Juni 2009 94


… that's it

Mittwoch, 17. Juni 2009 95


Mittwoch, 17. Juni 2009 96
Email: bknafla@member.igda.org

Twitter: @bjoernknafla

Mittwoch, 17. Juni 2009 96


Thank You!

Mittwoch, 17. Juni 2009 97


Three Great References
• Joe Valenzuela (Insomniac), “SPU gameplay” presentation from GDC ’09 - http://www.insomniacgames.com/tech/techpage.php
• OpenCL or Nvidia’s CUDA documentation
• James Reinders, “Intel Threading Building Blocks”, O’Reilly, 2007

Mittwoch, 17. Juni 2009 98