
Planning to Chronicle

Hazhar Rahmani¹, Dylan A. Shell², and Jason M. O'Kane¹

¹ University of South Carolina, Department of Computer Science and Engineering
² Texas A&M University, Department of Computer Science and Engineering

This work was graciously supported, in part, by the National Science Foundation
through awards IIS-1453652, IIS-1849249, and IIS-1849291.

Abstract. An important class of applications entails a robot monitoring,
scrutinizing, or recording the evolution of an uncertain time-extended process.
This sort of situation leads to an interesting family of planning problems in
which the robot is limited in what it sees and must, thus, choose what to pay
attention to. The distinguishing characteristic of this setting is that the
robot has influence over what it captures via its sensors, but exercises no
causal authority over the evolving process. As such, the robot's objective is
to observe the underlying process and to produce a 'chronicle' of occurrent
events, subject to a goal specification of the sorts of event sequences that
may be of interest. This paper examines variants of such problems when the
robot aims to collect sets of observations to meet a rich specification of
their sequential structure. We study this class of problems by modeling a
stochastic process via a variant of a hidden Markov model, and specify the
event sequences of interest as a regular language, developing a vocabulary of
'mutators' that enable sophisticated requirements to be expressed. Under
different suppositions about the information gleaned about the event model, we
formulate and solve different planning problems. The core underlying idea is
the construction of a product between the event model and a specification
automaton. The paper reports and compares performance metrics by drawing on
some small case studies analyzed in depth in simulation.

Keywords: Planning, Story-telling, Reconnoitering, Raconteuring

1 Motivation and Introduction


This paper is about robotic planning problems in which the goals are expressed
as time-extended sequences of discrete events whose occurrence the robot cannot
causally influence. As a concrete motivation for this sort of setting, consider the
proliferation of home videos. These videos are, with remarkably few exceptions,
crummy specimens of the cinematic arts. They fail, generally, to establish and
then bracket a scene; they often founder in emphasizing the importance of key
subjects within the developing action, and are usually unsuccessful in attempts
to trace an evolving narrative arc. And the current generation of autonomous
personal robots and video drones, in their roles as costly and glorified ‘selfie
sticks,’ are set to follow suit. The trouble is that capturing footage to tell a story
is challenging. A camera can only record what you point it toward, so part of
the difficulty stems from the fact that you can’t know exactly how the scene will
unfold before it actually does. Moreover, what constitutes structure isn’t easily
summed up with a few trite quantities. Another part of the challenge, of course,
is that one has only limited time to capture video footage.
Setting aside pure vanity as a motivator, many applications can be cast as
the problem of producing a finite-length sensor-based recording of the evolution
of some process. As the video example emphasizes, one might be interested
in recordings that meet rich specifications of the event sequences that are of
interest. When the evolution of the event-generating process is uncertain/non-
deterministic and sensing is local (necessitating its active direction), then one
encounters an instance from this class of problem. The broad class encompasses
many monitoring and surveillance scenarios. An important characteristic of such
settings is that the robot has influence over what it captures via its sensors, but
cannot control the process of interest.
Our incursion into this class of problem involves two lines of attack. The first
is a wide-embracing formulation in which we pose a general stochastic model,
including aspects of hidden/latent state, simultaneity of event occurrence, and
various assumptions on the form of observability. Secondly, we specify the
sequences of interest via a deterministic finite automaton (DFA), and we define
several language mutators, which permit composition and refinement of
specification DFAs, allowing for rich descriptions of desirable event
sequences. The two parts are brought together via our approach to planning: we
show how to compute an optimal policy (to satisfy the specifications as quickly
as possible) via a form of product automaton. Empirical evidence from
simulation experiments attests to the feasibility of this approach.
Beyond the pragmatics of planning, a theoretical contribution of the paper
is to prove a result on representation independence of the specifications. That
is, though multiple distinct DFAs may express the same regular language and
despite the DFA being involved directly in constructing the product automaton
used to solve the planning problem, we show that it is merely the language
expressed that affects the resulting optimal solution. Returning to mutators
that transform DFAs, enabling easy expression of sophisticated requirements, we
distinguish when mutators preserve representational independence too.

2 Related Work
Our interest in understanding robot behavior in terms of the robots’ observations
of a sequence of discrete events is, of course, not unique. The story validation
problem [24, 25] can be viewed as an inverse of our problem. The aim there is to
determine whether a given story is consistent with a sequence of events captured
by a network of sensors in the environment. In our problem, it is the robot that
needs to capture a sequence of events that constitute a desired story.
Video summarization is the problem of making a ‘good’ summary of a given
video by prioritizing sequences of frames based on some selection criterion
(importance, representativeness, diversity, etc.). Various approaches include
identifying important objects [9], finding interesting events [4], selection
using supervised learning [3], and finding inter-frame connections [11]. For a survey on video
summarization see [23], which one might augment with the more recent results
of [6, 12, 13, 26]. Girdhar and Dudek [2] considered the related vacation
snapshot problem, in which the goal is to retain a diverse subset from data observed
by a mobile robot. However, in such summarization techniques, the problem is
essentially to post-process a collection of images already recorded. This paper,
by contrast, addresses the problem of deciding which video segments the robot
should attempt to capture in the first place.
For text-based and interactive narratives, a variety of methods are known for
narrative planning and generating natural language stories [5, 15, 16].
Research closely related to the present paper is [21], which introduces the idea
of using a team of autonomous robots, coordinated by a planner, to capture a
sequence of events specifying a given narrative structure. That work raised (but
did not answer) several questions, among which is how the robot can formulate
effective plans to capture events relevant to the story specification. Here we build
upon that prior effort, showing how such plans can be formed in a principled way.
Related to our problem are also the theories of Markov decision processes
(MDPs) and partially observable Markov decision processes (POMDPs), which
are surveyed in [1, 8, 17, 20]. We solve our problem by constructing a product of
the event model and the specification, which together yield a specific POMDP.

3 The Problem
First, we introduce the basic elements of our model and problem formalization.

3.1 Events and observations


The essential objects of interest are events, that is, atomic occurrences situated
at specific times and places. We propose to treat each event as a letter drawn
from a finite alphabet E, a set which contains all possible events. Accordingly,
any finite sequence of events, in particular a story ξ the robot wants to record
from the events that occur in the system, is a finite word in E ∗ .
We model the occurrence of events using a structure defined as follows.
Definition 1 (event model). An event model M = (S, P, s0, E, g) is a tuple
in which (1) S, which is a nonempty finite set, is the state space of the
model; (2) P : S × S → [0, 1] is the transition probability function of the
model, such that for each state s ∈ S, Σ_{s′∈S} P(s, s′) = 1; (3) s0 ∈ S is
the initial state; (4) E is the set of all possible events; and (5) g : S → 2^E
is a labeling function assigning, to each state, the (possibly empty) set of
events that are occurring (mnemonically, 'going-on') simultaneously at that
state. We assume that g(s0) = ∅.
An execution of the model starts from the initial state s0 and then, at each time
step k, the system makes a transition from state sk to state sk+1 , the latter
being chosen randomly based on P from those states for which P(sk , ·) > 0.
This execution specifies a path s0 s1 · · · . For every time step k, when the system
enters state sk , each event in g(sk ) occurs simultaneously.
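
To make Definition 1 concrete, here is a minimal sketch in Python (the language
of our implementation) of an event model together with the sampling of an
execution path. The class and helper names are illustrative assumptions of
this sketch, not the interface of our implementation.

    import random

    class EventModel:
        """An event model M = (S, P, s0, E, g) as in Definition 1."""
        def __init__(self, states, P, s0, events, g):
            self.states = states   # S: nonempty finite set of states
            self.P = P             # P[s][s2]: probability of the move s -> s2
            self.s0 = s0           # initial state
            self.events = events   # E: set of all possible events
            self.g = g             # g[s]: set of events 'going on' at s

        def step(self, s):
            """Sample a successor of s from the distribution P(s, .)."""
            succs = list(self.P[s])
            weights = [self.P[s][t] for t in succs]
            return random.choices(succs, weights=weights)[0]

    def sample_path(model, horizon):
        """Sample a path s0 s1 ... and the event sets occurring along it."""
        path = [model.s0]
        for _ in range(horizon):
            path.append(model.step(path[-1]))
        return path, [model.g[s] for s in path]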

We are interested in scenarios in which a robot is tasked with recording
certain sequences of events. We model the state of the event model as only
partially observable to the robot. That is, the current state sk of the event model
is hidden from the robot, but the system instead emits an output observable to
the robot at each time step. The next definition formalizes the idea.
Definition 2 (observation model). For a given event model M = (S, P, s0,
E, g), an observation model B = (Y, h) is a pair in which (1) Y is a set of
observations or outputs; (2) h : S × Y → [0, 1] is the emission probability
function of the model, such that for each state s ∈ S, Σ_{y∈Y} h(s, y) = 1.
At each time step, when the system enters a state sk , it emits an output yk ,
drawn according to h(sk , ·). The emitted output yk is observable to the robot.
An event model and observation model can be depicted together as a directed
graph (e.g., see Figure 1a), where we show each state’s events as an attached
set (in braces in the figure) and display observations from Y along with their
emission probabilities (in brackets). We consider, as important special cases, two
particular types of observation models.
Definition 3. Given an event model M = (S, P, s0 , E, g) with observation model
B = (Y, h), we say that B makes M fully observable if (1) Y = S, and (2) h(s, y) =
1 if and only if s = y.
We write Bobs (M) to denote the unique observation model that makes M fully
observable. At the other extreme, another special event model is one in which
the emitted outputs do not help at all to reduce uncertainty.
Definition 4. Given an event model M = (S, P, s0 , E, g) with observation model
B = (Y, h), then B causes the event model to be fully hidden if the observation
space Y is a singleton set.
Since the particular single observation comprising Y is unimportant, by Bhid (M)
we denote some observation model making M fully hidden.
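
Both special cases are mechanical to construct. The following sketch, reusing
the representation above and encoding an observation model as a pair (Y, h)
with h[s][y] the emission probability, is one way to realize them; the
function names are ours, chosen for illustration.

    def fully_observable(model):
        """B_obs(M) of Definition 3: Y = S and h(s, y) = 1 iff y = s."""
        Y = set(model.states)
        h = {s: {y: (1.0 if y == s else 0.0) for y in Y} for s in model.states}
        return Y, h

    def fully_hidden(model, token="nil"):
        """A B_hid(M) of Definition 4: Y is a singleton, so every state
        emits the same output and observations carry no information;
        the particular token is immaterial."""
        return {token}, {s: {token: 1.0} for s in model.states}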

3.2 Story specifications, belief states, and policies


As the system evolves along s0 s1 s2 · · · , the robot attempts to record some of the
events that actually occur in the world to form a story ξ ∈ E ∗ . We specify the
desired story using a deterministic finite automaton (DFA) D = (Q, E, δ, q0 , F ),
where Q is its state space, E is its alphabet, δ : Q × E → Q is its transition
function, q0 its initial state, and F ⊆ Q is the set of all final (accepting) states
of the automaton. In other words, we want the robot to make a story ξ in the
language of D, denoted L(D), which is the set of all strings in E∗ that, when
tracked from q0, bring the automaton to an accepting state.
The semantics of event capture are as follows. At each step k ≥ 0, the robot
chooses one event e from E to attempt to record in the next step, k + 1. If any
of the actual events that do happen at step k + 1 (i.e., any of the events in
g(sk+1 )) match the robot’s prediction, then the robot successfully records this
event; otherwise, it records nothing. The robot is aware of the success or failure
of each of its attempts. The robot stops making guesses and observations once
it has recorded a desired story—a story in L(D).

To estimate the current state, the robot maintains, at each time step k, a
belief state bk : S → [0, 1], in which Σ_{s∈S} bk(s) = 1. For each s ∈ S,
bk(s) represents the probability that the event model is in state s at time
step k, according to the information available to the robot, including both
the observations emitted directly by the event model and the sequence of
successes or failures in recording events. It also maintains, for each time
step k, the sequence ξk of events it has recorded up to time step k, and the
(unique) DFA state qk reached by tracking ξk.
The robot’s predictions are governed by a policy π : ∆(S) × Q → E that
depends on the belief state and the state of the DFA. At time step k + 1, the
robot may append a recorded event to ξk via the following formula:
    ξk+1 = { ξk π(bk, qk)    if π(bk, qk) ∈ g(sk+1)
           { ξk              if π(bk, qk) ∉ g(sk+1).        (1)

The initial condition is that ξ0 = ε, in which ε is the empty string. The robot
changes the value of variable qk only when the guessed event actually happened:

    qk+1 = { δ(qk, π(bk, qk))    if π(bk, qk) ∈ g(sk+1)
           { qk                  if π(bk, qk) ∉ g(sk+1).    (2)
The robot stops when qk ∈ F .
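
A single step of this capture process, implementing updates (1) and (2), can
be sketched as follows; here policy stands for π, dfa.delta[q][e] for δ(q, e),
and g_next for g(sk+1), all of which are naming assumptions of the sketch.

    def record_step(policy, dfa, belief, q, xi, g_next):
        """One capture step: the robot predicts e = pi(b_k, q_k); if e is
        among the events that actually occur at step k+1, it is appended
        to the story and the DFA advances (equations (1) and (2))."""
        e = policy(belief, q)
        if e in g_next:                  # prediction succeeded: record e
            return xi + [e], dfa.delta[q][e]
        return xi, q                     # prediction failed: record nothing

    # The robot repeats record_step, interleaved with belief updates,
    # until the DFA state q lands in the accepting set F.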
3.3 Optimal recording problems
The robot’s goal is to record a story (or video) as quickly as possible. We consider
this problem in three different settings: a general setting without any restriction
on the event model, a setting in which the event model is fully observable, and
a final one in which the event model is fully hidden. First, the general setting.
Problem: Recording Time Minimization (Rtm)
Input: An event set E, an event model M = (S, P, s0 , E, g) with observa-
tion model B = (Y, h), and a DFA D = (Q, E, δ, q0 , F ).
Output: A policy minimizing the expected number of steps k until ξk ∈ L(D).

Note that k is not necessarily the length of the resulting story ξk , but rather
is the number of steps the system runs to capture that story. In fact, |ξk | ≤ k.
The second setting constrains the system to be fully observable.
Problem: RTM with Fully Observable Model (Rtm/Fom)
Input: An event set E, an event model M = (S, P, s0 , E, g), and a DFA
D = (Q, E, δ, q0 , F ).
Output: A policy that, under observation model Bobs (M), minimizes the
expected number of steps k until ξk ∈ L(D).

In this setting, because states are fully observable to the robot, we might
have defined the policy as a function over S × Q rather than over ∆(S) × Q.
Nonetheless, our current definition poses no problem. Any reachable belief
state in this setting places all probability on a single outcome (i.e., for
any k, bk(s) = 1 for exactly one s ∈ S), and thus we are interested in the
optimal policy only for those reachable beliefs.
The third setting assumes a fully hidden event model state.
Problem: RTM with Fully Hidden Model (Rtm/Fhm)


Input: An event set E, an event model M = (S, P, s0 , E, g), and a DFA
D = (Q, E, δ, q0 , F ).
Output: A policy that, under observation model Bhid (M), minimizes the
expected number of steps k until ξk ∈ L(D).

4 Algorithm Description
Next we give an algorithm for Rtm; it also solves Rtm/Fom and Rtm/Fhm, since
these are instances of the same Rtm problem whose inputs carry two special
kinds of observation models.

4.1 The Goal POMDP


The first step of the algorithm constructs a specific partially observable Markov
decision process (POMDP), which we term the Goal POMDP, as follows:
Definition 5 (Goal POMDP). For an event model M = (S, P, s0 , E, g) with
observation model B = (Y, h), and a DFA D = (Q, E, δ, q0 , F ), the associated
Goal POMDP is a tuple P(M,B;D) = (X, A, b0 , T, XG , Z, O, c), in which
1. X = S × Q is the state space;
2. A = E is the action space;
3. b0 ∈ ∆(X) is the initial belief state, in which b0 (x) = 1 iff x = (s0 , q0 );
4. T : X × A × X → [0, 1] is the transition probability function such that for
each e ∈ E and (s, q), (s′, q′) ∈ X,

    T((s, q), e, (s′, q′)) = { P(s, s′)   if q ∉ F, q′ = δ(q, e), and e ∈ g(s′)   (4.a)
                             { P(s, s′)   if q ∉ F, q′ = q, and e ∉ g(s′)          (4.b)
                             { 1          if q ∈ F, q′ = q, and s′ = s             (4.c)
                             { 0          otherwise;
5. XG = S × F is the set of goal states;
6. Z = ({True, False} × Y ) ∪ {⊥} is the set of observations;
7. O : A × X × Z → [0, 1] is the observation probability function such that for
each e ∈ E, s ∈ S, q ∈ Q, and y ∈ Y :
(a) O(e, (s, q), (True, y)) = h(s, y) if q ∉ F and e ∈ g(s),
(b) O(e, (s, q), (False, y)) = 0 if q ∉ F and e ∈ g(s),
(c) O(e, (s, q), (False, y)) = h(s, y) if q ∉ F and e ∉ g(s),
(d) O(e, (s, q), (True, y)) = 0 if q ∉ F and e ∉ g(s),
(e) O(e, (s, q), ⊥) = 1 if q ∈ F ;
8. c : X × A → R≥0 is the cost function such that for each x ∈ X and a ∈ A,
c(x, a) = 1 if x ∉ XG, and c(x, a) = 0 otherwise.

Each state of this POMDP is a pair (s, q) indicating the situation where, under
an execution of the system, the current state of the event model is s and the
current state of the DFA is q. For each x, x′ ∈ X and a ∈ A, T(x, a, x′) gives
the probability of transitioning from state x to state x′ under performance of
action a. In the context of our event model, each transition corresponds to a
situation where the robot chooses an event e to observe and the event model
makes a transition from a state s to s′. If e appears in g(s′), then the robot
records e and then changes the current state of the DFA to δ(q, e); otherwise,
it does nothing and the DFA remains in state q. These correspond to cases
(4.a) and (4.b) above, respectively. Case (4.c) makes all the goal states of
the POMDP absorbing states. The goal states of the POMDP are those in which
the robot has recorded a story, i.e., the current state of the specification
DFA is accepting.

Fig. 1: a) An event model M with its observation model B. b) A DFA D,
specifying event sequences that contain at least one event. c) The Goal POMDP
P(M,B;D), constructed by Definition 5. (Self-loop transitions of the goal
states are omitted to reduce visual clutter.)
For each a ∈ A, x ∈ X, and z ∈ Z, the value O(a, x, z) gives
the probability of observing z given that the system has
entered state x via action a. The POMDP has a special observation, ⊥, which is
observed only when a goal state is reached. Any other observation is a pair (r, y)
where r ∈ {True, False} discloses whether the robot’s prediction was correct—
the event did happen—or not, and y indicates the sensed observation the robot
made (as per B). Rules 7a–7d ensure that the first element of the observation
pair informs the robot whether its prediction was correct. To see this, if the robot
has predicted e to occur, the event model has entered state s such that e ∈ g(s),
and the robot has made an observation y, then the probability of observing
(True, y) on entering state (s, q) via action e equals h(s, y) (case 7a). If event
e ∉ g(s), then the robot's prediction must be wrong, and thus, the probability
of observing (False, y) in state (s, q) when it is reached via action e is h(s, y)
(expressed in case 7c). Cases 7b and 7d ensure that there is no misreporting
of the correctness of the prediction. Case 7e indicates the observation that the
robot has completed recording of a story in L(D).
Figure 1 illustrates this construction for an elementary example.
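
The product construction is a direct transcription of Definition 5. Below is a
sketch of the transition function T; the observation function O and the cost c
follow the same pattern. The attribute names on model and dfa match the
earlier sketches and are assumptions, not a fixed API.

    def goal_pomdp_transitions(model, dfa):
        """Build T((s, q), e, (s2, q2)) of Definition 5 as nested dicts:
        T[x][e][x2] is the probability of moving from x to x2 under e."""
        T = {}
        for s in model.states:
            for q in dfa.states:
                x = (s, q)
                T[x] = {e: {} for e in model.events}
                for e in model.events:
                    if q in dfa.accepting:
                        T[x][e][x] = 1.0          # case (4.c): goal absorbing
                        continue
                    for s2, p in model.P[s].items():
                        if p == 0.0:
                            continue
                        if e in model.g[s2]:       # case (4.a): e is captured
                            T[x][e][(s2, dfa.delta[q][e])] = p
                        else:                      # case (4.b): nothing recorded
                            T[x][e][(s2, q)] = p
        return T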
4.2 Solving the Goal POMDP
A POMDP is usually formulated as a fully observable MDP called a belief MDP
whose (continuous) state space consists of the belief space of the POMDP. Ac-
cordingly, in the belief MDP from Goal POMDP P = (X, A, b0 , T, XG , Z, O, c),
for each belief state b ∈ ∆(X), action a ∈ A, and observation z ∈ Z, we de-
note the updated belief state of b after action a and observation z by baz . It is
computed as follows:
O(a, x, z) x0 ∈X T(x0 , a, x)b(x)
P
a
bz (x) = P r(x|z, a, b) = , (3)
P r(z|a, b)
in which

    Pr(z | a, b) = Σ_{x∈X} O(a, x, z) Σ_{x′∈X} T(x′, a, x) b(x′).    (4)
For this belief MDP, the cost of each action a at belief state b is
c′(b, a) = Σ_{x∈X} b(x) c(x, a), which in our case gives c′(b, a) = 1 if b is
not a goal belief state, and c′(b, a) = 0 otherwise. An optimal policy
π′* : ∆(X) → A for this MDP is formulated as a solution to the Bellman
recurrences

    V′*(b) = min_{a∈A} [ c′(b, a) + Σ_{z∈Z} Pr(z | a, b) V′*(b^a_z) ],    (5)

    π′*(b) = argmin_{a∈A} [ c′(b, a) + Σ_{z∈Z} Pr(z | a, b) V′*(b^a_z) ].    (6)

One can use any standard technique to solve these recurrences. For surveys
on methods, see [1, 17, 20]. An optimal policy computed via these recurrences
prescribes, for any belief state reachable from b0 , an optimal action to execute.
Accordingly, the robot executes, at each step, the action given by the optimal
policy, and then updates its belief state via (3). One can show, via
induction, that at each step i, there is a unique qi ∈ Q such that belief
state bi assigns positive probability only to (though possibly not all of the)
states xj = (sj, qi) ∈ X, j = 1, 2, . . . , |S|. As such, a function
β : ∆(X) → ∆(S) × Q maps each such belief state bi to a tuple (d, qi), where
for each s ∈ S, d(s) = bi((s, qi)). Subsequently, the optimal policy π′*
computed for P(M,B;D) can be mapped to an optimal solution
π* : ∆(S) × Q → A to Rtm by setting π*(β(bi)) = π′*(bi) for each reachable
belief state bi ∈ ∆(X).
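
Equations (3) and (4) amount to a standard Bayes-filter update over the
product space. A sketch, with T represented as in the earlier sketch and O
supplied as a function, assuming those conventions:

    def belief_update(T, O, b, a, z):
        """Implements (3) and (4): returns the updated belief b^a_z and
        the normalizer Pr(z | a, b)."""
        predicted = {}                   # push b through T (prediction)
        for x, bx in b.items():
            if bx == 0.0:
                continue
            for x2, p in T[x][a].items():
                predicted[x2] = predicted.get(x2, 0.0) + p * bx
        # weight each state by its observation likelihood (correction)
        unnorm = {x: O(a, x, z) * px for x, px in predicted.items()}
        pr_z = sum(unnorm.values())                             # equation (4)
        if pr_z == 0.0:
            raise ValueError("observation has zero probability under b")
        return {x: p / pr_z for x, p in unnorm.items()}, pr_z   # equation (3)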

4.3 Solving Rtm/Fom via Goal MDP


The previous construction can be used to solve Rtm/Fom too, but, given that
the event model fed into Rtm/Fom is fully observable, to improve solution
tractability, it is more sensible to construct a Goal MDP. To do so, for the event
model and the DFA in Definition 5, the Goal MDP M = (X, A, b0 , T, XG , c)
embedded in the POMDP P of that definition is extracted, and then an optimal
policy for M is computed. An optimal policy π′′* for the MDP is a function
over X = S × Q, computed via the Bellman equations

    V′′*(x) = min_{a∈A} [ c(x, a) + Σ_{x′∈X} V′′*(x′) T(x, a, x′) ],    (7)

    π′′*(x) = argmin_{a∈A} [ c(x, a) + Σ_{x′∈X} V′′*(x′) T(x, a, x′) ].    (8)

These equations may be solved by a variety of methods (see [8, Chp. 10] for a
survey). In the evaluation reported below, we use standard value iteration. After

computing π′′*, for each x = (s, q) ∈ X, we make a belief state b ∈ ∆(S) such
that b(s′) = 1 if and only if s′ = s, and then set π*(b, q) = π′′*((s, q)),
where π* is an optimal solution to Rtm/Fom. Observe that π* for Rtm/Fom is
computed only for finitely many pairs (b, q): those in which b places all
probability on a single state.
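
For reference, a minimal value-iteration sketch solving (7) and (8) for the
Goal MDP, with unit cost per step outside the goal set, is below. It assumes
every state can reach XG; otherwise the values diverge.

    def value_iteration(X, A, T, goal, eps=1e-9):
        """Solve (7)-(8): V[x] becomes the expected number of steps to reach
        the goal set from x; T[x][a] maps successors to probabilities."""
        V = {x: 0.0 for x in X}
        while True:
            residual = 0.0
            for x in X:
                if x in goal:
                    continue            # goal states: absorbing, cost 0
                best = min(1.0 + sum(p * V[x2] for x2, p in T[x][a].items())
                           for a in A)
                residual = max(residual, abs(best - V[x]))
                V[x] = best
            if residual < eps:
                break
        policy = {x: min(A, key=lambda a: sum(p * V[x2]
                                              for x2, p in T[x][a].items()))
                  for x in X if x not in goal}
        return V, policy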

5 Representation-invariance of expected time


Notice that the event selected by the policy π* at each step depends, in part, on
the current state of the specification DFA. Because a single regular language may
be represented with a variety of distinct DFAs with different sets of states —and
thus, their optimal policies cannot be identical— one might wonder whether the
expected execution time achieved by their computed policies depends on the
specific DFA, rather than only on the language. The question is particularly
relevant in light of the language mutators we examine in Section 6. Here, we
show that the expected number of steps required to capture a story within a
given event model does indeed depend only on the language specified by the
DFA, and not on the particular representation of that language.
For a DFA D = (Q, E, δ, q0 , F ), we define a function f : Q → {0, 1} such
that for each q ∈ Q, f (q) = 1 if q ∈ F , and otherwise, f (q) = 0. Now consider
the well-known notion of bisimulation, defined as follows:
Definition 6 (bisimulation [18]). Given DFAs D = (Q, E, δ, q0, F) and
D′ = (Q′, E, δ′, q0′, F′), a relation R ⊆ Q × Q′ is a bisimulation relation
for (D, D′) if for any (q, q′) ∈ R: (1) f(q) = f′(q′); (2) for any e ∈ E,
(δ(q, e), δ′(q′, e)) ∈ R.
Bisimulation implies language equivalence and vice versa.
Proposition 1 ([18]). For two DFAs D = (Q, E, δ, q0, F) and
D′ = (Q′, E, δ′, q0′, F′), we have L(D) = L(D′) iff (q0, q0′) ∈ R for some
bisimulation relation R for (D, D′).
Bisimulation is preserved for reachable pairs. The state that a DFA with
transition function δ reaches by tracking an event sequence r from state q is
denoted δ*(q, r).
Proposition 2. If (q, q′) are related by a bisimulation relation R for (D, D′),
then for any r ∈ E∗, (δ*(q, r), δ′*(q′, r)) ∈ R.
We now define a notion of equivalence for a pair of belief states.
Definition 7. Given an event model M = (S, P, s0, E, g), an observation model
B = (Y, h) for M, and DFAs D = (Q, E, δ, q0, F) and D′ = (Q′, E, δ′, q0′, F′)
such that L(D) = L(D′), let P(M,B;D) = (X, A, b0, T, XG, Z, O, c) and
P′(M,B;D′) = (X′, A, b0′, T′, X′G, Z, O′, c′). For two reachable belief states
b ∈ ∆(X) and b′ ∈ ∆(X′), with β(b) = (d, q) and β′(b′) = (d′, q′), we say that
b′ is equivalent to b, denoted b ≡ b′, if (1) (q, q′) are related by a
bisimulation relation for (D, D′) and (2) d = d′, i.e., for each s ∈ S,
d(s) = d′(s).
Equivalence is preserved for updated belief states.
Lemma 1. Given the structures in Definition 7, let b ∈ ∆(X) and b′ ∈ ∆(X′) be
two reachable belief states such that b ≡ b′. For any action a ∈ A and
observation z ∈ Z, it holds that b^a_z ≡ b′^a_z and that
Pr(z | a, b) = Pr(z | a, b′).
Note that for a Goal POMDP P with initial belief state b0, V*(b0) is the
expected cost of reaching a goal belief state under an optimal policy for P.
We now present our result.

Theorem 1. For the structures in Definition 7, it holds that V*(b0) = V′*(b0′).
Proof. For a belief MDP M, let Tree(M) be its tree-unravelling: the tree whose
paths from the root to the leaf nodes are all possible paths in M that start
from the initial belief state. A policy π for M chooses a fixed set of paths
over Tree(M), and the expected cost of reaching a goal belief state under π is
equal to Σ_{p∈GoalPaths(π,Tree(M))} C(p) · W(p), where GoalPaths(π, Tree(M))
is the set of all paths chosen by π that reach a goal belief state from the
root of Tree(M), C(p) is the sum of the costs of all transitions in path p,
and W(p) is the product of the probability values of all transitions in p. The
idea is that if we can overlap the tree-unravellings of the belief MDPs of
P(M,B;D) and P′(M,B;D′) in such a way that each pair of overlapped belief
states are equivalent in the sense of Definition 7 and each pair of overlapped
transitions have the same probability and the same cost, then for each pair of
overlapped belief states b ∈ ∆(X) and b′ ∈ ∆(X′), using π*(b) as the decision
at belief state b′ shows, because those fixed paths are overlapped, that
V*(b0) ≥ V′*(b0′). In a similar fashion, V*(b0) ≤ V′*(b0′), and thus
V*(b0) = V′*(b0′). The following construction makes those trees and shows how
we can overlap them.

For an integer n ≥ 1, we make two trees Tn and Tn′ by the following procedure.
(1) Set b0 as the root of Tn and b0′ as the root of Tn′; make a relation R and
set R ← {(b0, b0′)}. (2) While |Tn| < n, extract a pair (b, b′) from R that
has not been checked yet and in which b and b′ are not goal belief states; for
each action a and observation z, compute b^a_z and b′^a_z, add node b^a_z and
edge (b, b^a_z) to Tn, add node b′^a_z and edge (b′, b′^a_z) to Tn′, and add
the pair (b^a_z, b′^a_z) to R; label both edges (a, z). Also assign to edge
(b, b^a_z) the probability value Pr(z | a, b) and to edge (b′, b′^a_z) the
probability value Pr(z | a, b′); the cost of each edge is set to 1.

Given that L(D) = L(D′), by Proposition 1, states q0 and q0′ are related by a
bisimulation relation for (D, D′), which by Definition 7 and the construction
in Definition 5 implies that b0 ≡ b0′. This, combined with Lemma 1, implies
that for each pair (b, b′) ∈ R, b ≡ b′. We now overlap Tn and Tn′ such that
each pair (b, b′) related by R are overlapped. By Lemma 1, each pair of
overlapped edges have the same probability value and the same cost value.
Since for any integer n ≥ 1 we can overlap trees Tn and Tn′ in the desired
way, we can overlap the tree-unravellings of the belief MDPs of P(M,B;D) and
P′(M,B;D′) in the desired way too; this completes the proof.


The upshot of this analysis is that we can indeed attend only to the story
specification language (given indirectly via D) and that the specific presentation
of that language does not impact the expected number of steps to capture an
event sequence satisfying that specification.

6 Construction of Specification Languages


In this section we describe how one might construct, in a partially automated
way, specifications for a variety of interesting scenarios. The idea is to use a
variety of mutators to construct specification DFAs.

6.1 Multiple recipients

Suppose we would like to capture several videos, one for each of several recipients,
within a single execution. Given language specifications D1 , . . . , Dn ∈ D , where
D denotes the set of all DFAs over a fixed event set E, how can we form a
single specification that directs the robot to capture events that can be post-
processed into the individual output sequences? One way is via two relatively
simple operations on DFAs:
(MS) A supersequence operation MS : D → D, where

    L(MS(D)) = {w ∈ E∗ | ∃w′ ∈ L(D), w′ is a subsequence of w}.    (9)
This operation is produced by first treating D as a nondeterministic finite
automaton (NFA), then for each event and state, adding a transition labeled by
that event from that state to itself, and converting the result back into a
DFA [14]; a sketch appears at the end of this subsection.
(MI) An intersection operation MI : D × D → D, under which
L(MI(D1, D2)) = L(D1) ∩ L(D2).
Based on these two operations, we can form a specification that asks the robot
to capture an event sequence that satisfies all n recipients as follows:
    D = MI(· · · MI(MI(MS(D1), MS(D2)), MS(D3)), · · · , MS(Dn)).    (10)
Then from any ξ ∈ L(D), we can produce a ξi ∈ L(Di ) by discarding (as a
post-production step) some events from ξ.
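
The supersequence mutator is a few lines on top of the textbook subset
construction [14]: treat D as an NFA that may 'skip' any input letter via the
added self-loops, then determinize. A sketch, with an assumed lightweight Dfa
container (reused in later sketches):

    from dataclasses import dataclass
    from itertools import chain

    @dataclass
    class Dfa:
        states: set
        alphabet: set
        delta: dict        # delta[q][e] -> next state
        q0: object
        accepting: set

    def supersequence(dfa):
        """M_S(D): accepts exactly the supersequences of words in L(D)."""
        def nfa_step(S, e):
            # either ignore e (self-loop) or follow the original edge
            return frozenset(chain(S, (dfa.delta[q][e] for q in S)))

        start = frozenset({dfa.q0})
        delta, accepting, todo = {}, set(), [start]
        while todo:        # subset construction; finitely many subsets
            S = todo.pop()
            if S in delta:
                continue
            delta[S] = {}
            if S & dfa.accepting:
                accepting.add(S)
            for e in dfa.alphabet:
                delta[S][e] = nfa_step(S, e)
                todo.append(delta[S][e])
        return Dfa(states=set(delta), alphabet=set(dfa.alphabet),
                   delta=delta, q0=start, accepting=accepting)

The intersection MI is the usual product construction and is omitted here.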

6.2 Mistakes were made

What should the robot do if it cannot capture an event sequence that fits its
specification D, either because some necessary events did not occur, or because
the robot failed to capture them when they did occur? One possibility is to
accept some limited deviation between the desired specification and what the
robot actually captures.
Let d : E ∗ ×E ∗ → Z+ denote the Levenshtein distance [10], that is, a distance
metric that measures the minimum number of insert, delete, and substitute
operations needed to transform one string into another. A mutator that allows
a bounded amount of such distance might be:
(ML) A Levenshtein mutator ML : D × Z+ → D that transforms a DFA D into one
that accepts strings within a given distance of some string in L(D):

    L(ML(D, k)) = {ξ | ∃ξ′ ∈ L(D), d(ξ, ξ′) ≤ k}.    (11)
This mutation can be achieved using a Levenshtein automaton construc-
tion [7, 19]. Then, if the robot captures a sequence in L(ML (D, k)), it can be
converted to a sequence in L(D) by at most k edits. For example, an insertion
edit would perhaps require the undesirable use of alternative ‘stock footage’,
rendering of appropriate footage synthetically, or simply a leap of faith on the
part of the viewer. By assigning the costs associated with each edit appropriately
in the construction, we can model the relative costs of these kinds of repairs.
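
For concreteness, the metric d itself is the classic dynamic program, sketched
below over event sequences. Constructing ML(D, k) uses the
Levenshtein-automaton machinery of [7, 19], which we do not reproduce here.

    def levenshtein(u, v):
        """d(u, v): minimum number of insertions, deletions, and
        substitutions transforming sequence u into sequence v [10]."""
        prev = list(range(len(v) + 1))        # distances from the empty prefix
        for i, a in enumerate(u, start=1):
            cur = [i]                         # deleting all of u[:i]
            for j, b in enumerate(v, start=1):
                cur.append(min(prev[j] + 1,              # delete a
                               cur[j - 1] + 1,           # insert b
                               prev[j - 1] + (a != b)))  # substitute a -> b
            prev = cur
        return prev[-1]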

6.3 At least one good shot

In some scenarios, there are multiple distinct views available of the same basic
event. We may consider, therefore, scenarios in which this kind of good/better
correspondence is known between two events, and in which the robot should
endeavor to capture, say, at least one better shot from that class. We define a
mutator that produces such a DFA:
(MG) An at-least-k-good-shots mutator MG : D × E × E × Z+ → D, in which
MG(D, e, e′, k) produces a DFA in which e′ is considered to be a superior
version of event e, and the resulting DFA accepts strings similar to those in
L(D), but with at least k occurrences of e replaced with e′.
The construction makes a DFA in which D has been copied k + 1 times, each copy
called a level, with the initial state at level 1 and the accepting states at
level k + 1. Most edges remain unchanged, but each edge labeled e, at all
levels less than k + 1, is augmented by a corresponding edge labeled e′ that
moves to the next level. This guarantees that e′ has replaced e at least k
times before any accepting state can be reached.
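
The leveled construction is short enough to sketch directly, reusing the Dfa
container from the sketch in Section 6.1. It assumes e′ is a fresh event
symbol that, when it does not climb a level, behaves like e; the states of the
result are (level, q) pairs.

    def at_least_k_good_shots(dfa, e, e_good, k):
        """M_G(D, e, e', k): stack k+1 copies of D ('levels'). Below level
        k+1, reading e' follows e's edge and climbs one level; acceptance,
        possible only at level k+1, therefore requires e' to have stood in
        for e at least k times."""
        delta = {}
        for level in range(1, k + 2):
            for q in dfa.states:
                x = (level, q)
                delta[x] = {a: (level, dfa.delta[q][a]) for a in dfa.alphabet}
                if level < k + 1:
                    delta[x][e_good] = (level + 1, dfa.delta[q][e])  # climb
                else:
                    delta[x][e_good] = (level, dfa.delta[q][e])      # stay
        accepting = {(k + 1, q) for q in dfa.accepting}
        return Dfa(states=set(delta), alphabet=dfa.alphabet | {e_good},
                   delta=delta, q0=(1, dfa.q0), accepting=accepting)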

7 Case studies
In this section, we present two examples solved by a Python implementation of
our algorithm. For Rtm/Fom we form the Goal MDP, while for Rtm/Fhm and
Rtm we form a Goal POMDP. To solve the POMDP, we use the APPL Online
(Approximate POMDP Planning Online) toolkit, which implements the DESPOT
algorithm [22]—one of the fastest known online solvers. We compare the results
for different observability conditions based upon the expected number of steps
the system runs until the robot records a desired story under an optimal policy.

7.1 Turisti Oulussa


William is a tourist visiting Oulu as shown in Figure 2a. William’s family
has secretly contracted a robotic videography company to record him seeing
the sights, specifically the Kauppahalli (k), the Hupisaaret park (h), and either
Tietomaa museum (t) or the Oulu Cathedral (c). The robot does not know
William’s specific plans, but it does know, through some statistics, that a typical
tourist moves among those districts according to the event model in Figure 2b.
The desired video is specified using the DFA in Figure 2c. The robot is
given other tasks to do aside from recording William, and thus, cannot merely
follow William; it must form a strategy that predicts which events to try to
capture. We conducted our experiments in three settings: (1) Rtm/Fom: the
robot always knows the current district in which William is located, perhaps by
the help of some static sensors; (2) Rtm: when the robot does not know at which
district William is currently located but there is a single useful observation, a
message sent from a security guard in district s1 , that informs the robot that
William is in district s1 whenever he is there; (3) Rtm/Fhm: the robot receives
no direct knowledge about William’s location. We computed the optimal policy
for Rtm/Fom, case (1), using the Goal MDP approach in Section 4.3. Under this
optimal policy with full observability, the expected number of steps to record
a story satisfying the specification is approximately 35.24.
To verify the correctness of the algorithm, we simulated the execution of this
policy 1,000 times. In each simulation, William followed a random path through
the city according to the event model in Figure 2b, and the robot executed the
computed policy to capture an event sequence satisfying the specification. The
average number of steps to record a satisfactory sequence over those 1,000
simulations was 35.16, quite close to the expected number of steps. Figure 2d
shows the results of those simulations in the form of a histogram and a pie
chart.

Fig. 2: a) Districts of Oulu that William is touring. b) An event model
describing how tourists visit those districts. Edges are labeled with
transition probabilities. c) A DFA specifying that the captured story must
contain events k and h and at least one of c or t. d) A histogram showing, for
a thousand simulations of the Rtm/Fom problem (the full observability
assumption), the distribution of the number of hours (steps) the system ran
until the robot recorded a story specified by the DFA, together with a pie
chart showing the distribution of recorded sequences in these simulations.
e) Histogram and pie chart for 1,000 simulations of the Rtm problem where the
current state of the event model is observable to the robot only when William
is in district s1. f) Histogram and pie chart for 1,000 simulations of the
Rtm/Fhm problem.
For cases (2) and (3), our algorithm constructed a Goal POMDP, as described
by Definition 5, and supplied it to APPL to conduct 1,000 simulations. In case
(2), Rtm with a useful observation, the average number of steps to record a
desired story was 37.15, while in case (3), Rtm/Fhm, the average number of
steps was 45.32. Note how a single observation of whether William is in s1 helped
the robot to record a story considerably faster than when it did not have any
state information. Even a stream of quite limited information, if chosen aptly,
can be very useful to this kind of robot. The histograms and the pie charts for
these two cases are shown in Figure 2e and Figure 2f, respectively. Note the
difference between the histograms of those three settings.
Fig. 3: a) The event model for the behavior of a typical person at a party,
which has six states: Ii, the state of arriving; Ei, the state of being
entertained; Ci, for consuming coffee; Bi, for drinking other beverages; Di,
for dancing; and Si, for smoking. b) The histogram of execution times for 500
simulations of the wedding reception example for Rtm/Fom. c) The histogram of
execution times for 500 simulations of the wedding reception example for Rtm
with only a single observation of whether Chris is currently smoking or not.
d) The histogram of execution times for 500 simulations of the wedding
reception example for Rtm/Fhm.

7.2 Wedding reception

A videographer robot is asked to produce videos that convey different stories,
assembled from unpredictable events at a wedding reception. The wedding guests
include Alice, Bob, and Chris, and the events of interest for any of those guests
are: arriving at the reception, (i); dancing, (d); drinking coffee, (c); drinking
other beverages, (b); smoking, (s); and being entertained, (e). Each guest has
their own sense of the events they would like to see captured: Alice is mainly
interested in seeing Chris drinking or smoking, but also has plans to share the
last dance with Bob; Bob cares for nothing but seeing his own dancing through
the evening, but hopes to share the last dance with Alice; Chris does not care
to see any events at all, but Chris’s children are concerned about his unhealthy
habits, and so if Chris is drinking too much coffee or smoking too much, they
would like to know. The robot in that scenario is given three parallel objectives.
We can formalize those as languages, shown here for compactness as regular
expressions: for Alice, r1 = (s3 + c3 )+ d12 ; for Bob, r2 = (d2 + d12 + d23 )+ d12 ;
and for Chris, r3 = (s3 + c3 )(s3 + c3 )(s3 + c3 )+ . These three requests are encoded
using DFAs D1 , D2 , and D3 , respectively.
The behavior of each guest is modeled by the event model in Figure 3a,
in which P is the transition probability function of the model. The joint
behavior of the three guests is modeled by an event model M obtained as the
Cartesian product of the models for the individuals, which has 6³ states in this
example. The joint event model is further enhanced with joint events created
from single events. For example, d12 is the event in which Alice and Bob dance
together. To form a DFA D from the given specification DFAs, the robot uses
D = MI (MI (MS (D1 ), MS (D2 )), MS (D3 )).
Our implementation for this case study consists of 500 simulations for each
of the settings Rtm/Fom, Rtm/Fhm, and Rtm where the only observation is
if Chris is currently smoking or not, which could perhaps be sensed through a
smart smoke detector. The expected number of steps for an optimal policy for
Rtm/Fom is 35.2, and over the 500 simulations, the average number of steps to
record a story was 35.63, which is very close. The average numbers of steps for
Rtm with a single useful observation and for Rtm/Fhm were 37.68 and 40.4,
respectively.

8 Conclusions and future work


We have considered the problem of minimizing the expected time to record an
event sequence satisfying a set of specifications. This was posed as the problem
of computing an optimal policy in an associated Markov decision problem. Our
implementation has verified that as the robot’s ability to perceive the world
increases, the expected number of steps to record a desired story decreases.
Future work should consider several extensions: factoring in the motion needed
to navigate in order to record an event, so that the objective minimizes some
expected cost rather than the expected number of steps; the case where a set
of events (rather than a single event), each assigned to a single robot, may
be predicted; the case where the robot is given new specification DFAs to
satisfy while still recording stories for previous requests; and the case
where the robot must learn a new event model of the environment owing to
failures in predicting events.

References
1. B. Bonet and H. Geffner, “Solving POMDPs: RTDP-Bel versus point-based algo-
rithms,” in International Joint Conference on Artificial Intelligence, 2009.
2. Y. Girdhar and G. Dudek, “Efficient on-line data summarization using extremum
summaries,” in IEEE International Conference on Robotics and Automation, 2012,
pp. 3490–3496.
3. B. Gong, W.-L. Chao, K. Grauman, and F. Sha, “Diverse sequential subset se-
lection for supervised video summarization,” in Advances in Neural Information
Processing Systems, 2014, pp. 2069–2077.
4. M. Gygli, H. Grabner, H. Riemenschneider, and L. Van Gool, “Creating summaries
from user videos,” in European Conference on Computer Vision, 2014, pp. 505–520.
5. A. Jhala and R. M. Young, “Cinematic visual discourse: Representation, genera-
tion, and evaluation,” IEEE Transactions on Computational Intelligence and AI in
Games, vol. 2, no. 2, pp. 69–81, 2010.
6. Z. Ji, K. Xiong, Y. Pang, and X. Li, “Video summarization with attention-based
encoder-decoder networks,” IEEE Transactions on Circuits and Systems for Video
Technology, 2019.
7. S. Konstantinidis, “Computing the edit distance of a regular language,” Informa-
tion and Computation, vol. 205, no. 9, pp. 1307–1316, Sep. 2007.
8. S. M. LaValle, Planning Algorithms. Cambridge, U.K.: Cambridge University
Press, 2006, available at http://planning.cs.uiuc.edu/.
9. Y. J. Lee, J. Ghosh, and K. Grauman, “Discovering important people and objects
for egocentric video summarization,” in IEEE Conference on Computer Vision and
Pattern Recognition, 2012, pp. 1346–1353.
10. V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and
reversals,” Soviet Physics Doklady, vol. 10, no. 8, pp. 707–710, 1966.
11. Z. Lu and K. Grauman, “Story-driven summarization for egocentric video,” in
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
2013, pp. 2714–2721.
12. B. Mahasseni, M. Lam, and S. Todorovic, “Unsupervised video summarization with
adversarial LSTM networks,” in Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, 2017, pp. 202–211.
13. B. A. Plummer, M. Brown, and S. Lazebnik, “Enhancing video summarization via
vision-language embedding,” in Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, 2017, pp. 5781–5789.
14. M. O. Rabin and D. Scott, “Finite automata and their decision problems,” IBM
Journal of Research and Development, vol. 3, no. 2, pp. 114–125, 1959.
15. M. O. Riedl and R. M. Young, “Narrative planning: Balancing plot and character,”
Journal of Artificial Intelligence Research, vol. 39, pp. 217–268, 2010.
16. J. Robertson and R. M. Young, “Narrative mediation as probabilistic planning,”
in Thirteenth Artificial Intelligence and Interactive Digital Entertainment Confer-
ence, 2017.
17. S. Ross, J. Pineau, S. Paquet, and B. Chaib-Draa, “Online planning algorithms for
POMDPs,” Journal of Artificial Intelligence Research, vol. 32, pp. 663–704, 2008.
18. J. Rot, M. Bonsangue, and J. Rutten, “Proving language inclusion and equivalence
by coinduction,” Information and Computation, vol. 246, pp. 62–76, 2016.
19. K. U. Schulz and S. Mihov, “Fast string correction with Levenshtein automata,”
International Journal on Document Analysis and Recognition, vol. 5, no. 1, pp.
67–85, 2002.
20. G. Shani, J. Pineau, and R. Kaplow, “A survey of point-based POMDP solvers,”
Autonomous Agents and Multi-Agent Systems, vol. 27, no. 1, pp. 1–51, 2013.
21. D. A. Shell, L. Huang, A. T. Becker, and J. M. O’Kane, “Planning coordinated
event observation for structured narratives,” in IEEE International Conference on
Robotics and Automation (ICRA), 2019, pp. 7632–7638.
22. A. Somani, N. Ye, D. Hsu, and W. S. Lee, “DESPOT: Online POMDP planning with
regularization,” in Advances in Neural Information Processing Systems, 2013, pp.
1772–1780.
23. B. T. Truong and S. Venkatesh, “Video abstraction: A systematic review and
classification,” ACM Transactions on Multimedia Computing, Communications, and
Applications (TOMM), vol. 3, no. 1, pp. 3–es, 2007.
24. J. Yu and S. M. LaValle, “Cyber detectives: Determining when robots or people
misbehave,” in Algorithmic Foundations of Robotics IX. Springer, 2010, pp. 391–
407.
25. ——, “Story validation and approximate path inference with a sparse network of
heterogeneous sensors,” in 2011 IEEE International Conference on Robotics and
Automation. IEEE, 2011, pp. 4980–4985.
26. K. Zhang, K. Grauman, and F. Sha, “Retrospective encoders for video summariza-
tion,” in Proceedings of the European Conference on Computer Vision (ECCV),
2018, pp. 383–399.
