A simple aircraft collision avoidance POMDP in Julia, built on the POMDPs.jl framework.
Included is an implementation of the unscented Kalman filter (UKF) for belief updating.
*Figures: example encounters under the climb action and the descend action.*
```julia
@with_kw mutable struct CollisionAvoidancePOMDP <: POMDP{Vector{Float64}, Float64, Vector{Float64}}
    h_rel_range::Vector{Real} = [-150, 150]      # initial relative altitudes [m]
    dh_rel_range::Vector{Real} = [-1e-6, 1e-6]   # initial relative vertical rates [m/s]
    ddh_max::Real = 1.0                          # vertical acceleration limit [m/s²]
    τ_max::Real = 40                             # max time to closest approach [s]
    actions::Vector{Real} = [0.0, -5, 5]         # relative vertical rate actions [m/s]
    a_prev_zero::Bool = true                     # whether to update `a_prev` when the action is zero
    collision_threshold::Real = 50               # collision threshold [m]
    reward_collision::Real = -100                # reward obtained if a collision occurs
    reward_reversal::Real = -1                   # reward obtained if the action reverses direction (e.g., from +5 to -5)
    reward_alert::Real = -1                      # reward obtained when alerted (i.e., non-zero vertical rate)
    apply_continuous_alerting_cost::Bool = false # apply penalty during any alert, not just the first alert
    apply_min_separation_cost::Bool = false      # apply penalty based on separation to be minimized
    px = DiscreteNonParametric([-1.0, 0.0, 1.0], [0.25, 0.5, 0.25]) # transition noise on relative vertical rate [m/s]
    σobs::Vector{Real} = [15, 1, eps(), eps()]   # observation noise stds [h_rel, dh_rel, a_prev, τ]
    γ::Real = 0.99                               # discount factor
end
```
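Because the struct is defined with `@with_kw`, any field can be overridden by keyword when constructing the problem. A small sketch (the particular values below are illustrative, not recommended settings):

```julia
using CollisionAvoidancePOMDPs

# Override default parameters via keywords (values here are illustrative)
pomdp = CollisionAvoidancePOMDP(ddh_max=2.0, τ_max=60, reward_collision=-200)
```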
```julia
] add https://github.com/sisl/CollisionAvoidancePOMDP.jl
```
```julia
using CollisionAvoidancePOMDPs

pomdp = CollisionAvoidancePOMDP()
up = CASBeliefUpdater(pomdp)
policy = RandomPolicy(pomdp)
h = simulate(HistoryRecorder(), pomdp, policy, up)
```
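The recorded history can then be inspected with the standard POMDPTools utilities (`eachstep` and `discounted_reward` are stock POMDPTools helpers; this assumes POMDPTools is available, as it provides `HistoryRecorder` and `RandomPolicy` as well):

```julia
using CollisionAvoidancePOMDPs
using POMDPTools: eachstep, discounted_reward

pomdp = CollisionAvoidancePOMDP()
up = CASBeliefUpdater(pomdp)
policy = RandomPolicy(pomdp)
h = simulate(HistoryRecorder(), pomdp, policy, up)

# Iterate over (action, reward) pairs from the recorded history
for (a, r) in eachstep(h, "(a,r)")
    println("a = $a, r = $r")
end

println(discounted_reward(h)) # total discounted return of the episode
```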
*Figures: online CAS policy, value function estimate, and probability of failure estimate.*
An implementation of the unscented Kalman filter (UKF) is included as a belief `Updater`. Unlike the extended Kalman filter, the UKF is derivative free: no Jacobians are required, since sigma points are propagated through the nonlinear models directly.
Mykel J. Kochenderfer, Tim A. Wheeler, and Kyle H. Wray, *Algorithms for Decision Making*, Chapter 19: Beliefs, MIT Press, 2022.
```julia
using CollisionAvoidancePOMDPs

pomdp = CollisionAvoidancePOMDP()
up = UKFUpdater(pomdp; λ=1.0)
ds = initialstate(pomdp)
b::UKFBelief = initialize_belief(up, ds)
s = rand(b)
a = rand(actions(pomdp))
o = rand(observation(pomdp, a, s))
b′ = update(up, b, a, o)
```
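Chaining these calls gives a closed-loop filtering pass. A sketch (the rollout loop and step count are illustrative, not part of the package API):

```julia
using CollisionAvoidancePOMDPs

pomdp = CollisionAvoidancePOMDP()
up = UKFUpdater(pomdp; λ=1.0)
b = initialize_belief(up, initialstate(pomdp))
s = rand(initialstate(pomdp))

for t in 1:10
    a = rand(actions(pomdp))           # random action for illustration
    s = rand(transition(pomdp, s, a))  # sample the next state
    o = rand(observation(pomdp, a, s)) # sample a noisy observation
    b = update(up, b, a, o)            # UKF belief update
end
```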
```julia
@with_kw mutable struct UKFUpdater{P<:POMDP} <: Updater
    pomdp::P
    Σₛ # state covariance matrix
    Σₒ # observation covariance matrix
    λ  # sigma point spread parameter
end

@with_kw mutable struct UKFBelief
    μ = missing # mean vector
    Σ = missing # covariance matrix
    ϵ = 1e-6    # added to the covariance for numerical stability when sampling
end
```
The UKF belief update has two stages:

**Predict.** Predict where the agent is going by propagating the belief through the nonlinear transition function, then reconstruct the updated mean and covariance via the unscented transform. (If we applied the transform to the identity function, we would simply reconstruct the provided mean and covariance from the generated sigma points.)

**Update.**
- Apply the unscented transform to the observation model using the predicted mean and covariance.
- Calculate the cross-covariance matrix, which measures the covariance between two multi-dimensional variables; here, between the transition prediction $𝛍_p$ and the observation model update $𝛍_o$.
- Update the mean and covariance of the belief.
```julia
function POMDPs.update(up::UKFUpdater, b::UKFBelief, a, o)
    μ, Σ, λ = b.μ, b.Σ, up.λ
    w = weights(μ, λ)

    # Predict: propagate the belief through the transition model
    fₜ = s -> rand(transition(up.pomdp, s, a))
    μₚ, Σₚ, _, _ = unscented_transform(μ, Σ, fₜ, λ, w)
    Σₚ += up.Σₛ

    # Update: propagate the prediction through the observation model
    fₒ = sp -> rand(observation(up.pomdp, sp))
    μₒ, Σₒ, Sₒ, Sₒ′ = unscented_transform(μₚ, Σₚ, fₒ, λ, w)
    Σₒ += up.Σₒ

    # Calculate the cross covariance matrix
    Σₚₒ = cross_cov(μₚ, μₒ, w, Sₒ, Sₒ′)

    # Apply the Kalman gain to update the belief
    K = Σₚₒ / Σₒ         # Kalman gain
    μ′ = μₚ + K*(o - μₒ) # updated mean
    Σ′ = Σₚ - K*Σₒ*K'    # updated covariance

    return UKFBelief(μ′, Σ′, b.ϵ)
end
```
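The helpers `weights`, `unscented_transform`, and `cross_cov` are provided by the package. A minimal, self-contained sketch of what they might compute, using the conventions of *Algorithms for Decision Making* (Ch. 19) rather than the package's exact implementation:

```julia
using LinearAlgebra

# Sigma-point weights for an n-dimensional mean with spread parameter λ:
# the central point gets λ/(n+λ); the other 2n points split the rest evenly.
function weights(μ, λ)
    n = length(μ)
    return [λ / (n + λ); fill(1 / (2(n + λ)), 2n)]
end

# Generate 2n+1 sigma points from mean μ and covariance Σ,
# spread along the columns of the matrix square root of (n+λ)Σ.
function sigma_points(μ, Σ, λ)
    n = length(μ)
    Δ = cholesky((n + λ) * Symmetric(Σ)).L
    S = [μ]
    for i in 1:n
        push!(S, μ + Δ[:, i])
        push!(S, μ - Δ[:, i])
    end
    return S
end

# Pass the sigma points through a nonlinear function f and reconstruct
# the transformed mean and covariance from the weighted samples.
function unscented_transform(μ, Σ, f, λ, w)
    S = sigma_points(μ, Σ, λ)
    S′ = f.(S)
    μ′ = sum(w .* S′)
    Σ′ = sum(wᵢ * (sᵢ - μ′) * (sᵢ - μ′)' for (wᵢ, sᵢ) in zip(w, S′))
    return μ′, Σ′, S, S′
end

# Cross covariance between the input sigma points (around μₚ)
# and the transformed sigma points (around μₒ).
function cross_cov(μₚ, μₒ, w, S, S′)
    return sum(wᵢ * (sᵢ - μₚ) * (s′ᵢ - μₒ)' for (wᵢ, sᵢ, s′ᵢ) in zip(w, S, S′))
end
```

A sanity check on this sketch: transforming through the identity function should reproduce the input mean and covariance exactly.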
The unscented transform creates a set of sigma point samples as a deterministic approximation of the belief distribution.
The sigma points are associated with the weights:
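Under the standard formulation (following Kochenderfer et al., Ch. 19; this is the convention assumed in the sketch above, not necessarily the package's exact notation), the $2n+1$ sigma points for an $n$-dimensional mean $𝛍$ and covariance $𝚺$ are

$$
s_1 = 𝛍, \qquad
s_{2i} = 𝛍 + \left(\sqrt{(n+\lambda)𝚺}\right)_i, \qquad
s_{2i+1} = 𝛍 - \left(\sqrt{(n+\lambda)𝚺}\right)_i \quad \text{for } i = 1, \ldots, n
$$

where $(\cdot)_i$ denotes the $i$th column of the matrix square root, with the associated weights

$$
w_i = \begin{cases} \dfrac{\lambda}{n+\lambda} & i = 1 \\[6pt] \dfrac{1}{2(n+\lambda)} & \text{otherwise.} \end{cases}
$$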