seqdec — sequential decision making under uncertainty
seqdec is a small C program that models a classic problem
in sequential decision making under uncertainty: a pizza restaurant
ordering sausage from a supplier. It evaluates the
order-up-to policy, in which a minimum threshold
(theta-min) triggers an order that refills inventory up to
a maximum threshold (theta-max). By simulating stochastic
demand over many time periods, the program searches the policy
parameter space and reports the pair that yields the highest profit.
The model also accounts for shipping fees and a free-shipping threshold, reflecting real considerations a small restaurant faces when placing supply orders. The work is inspired by Warren B. Powell's Reinforcement Learning and Stochastic Optimization (2022).
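The policy described above can be sketched in a few lines of C. This is a minimal illustration, not the code seqdec ships: the struct params fields, the simulate_profit name, the rule that an order is placed when inventory is at or below theta-min, immediate delivery, lost sales on stock-outs, and a free initial stock of theta-max are all assumptions made for the example.

```c
/* Hypothetical cost parameters; names and meanings are illustrative only. */
struct params {
    double price;        /* revenue per unit sold */
    double unit_cost;    /* purchase cost per unit ordered */
    double ship_fee;     /* flat shipping fee per order */
    double free_ship_at; /* order cost at or above which shipping is free */
};

/*
 * Simulate one demand path under an order-up-to policy and return the
 * total profit. Assumptions: an order is placed when inventory is at or
 * below theta_min, it arrives immediately, and unmet demand is lost.
 */
double simulate_profit(const int *demand, int periods,
                       int theta_min, int theta_max,
                       const struct params *p)
{
    int inventory = theta_max; /* assumed starting stock, not charged */
    double profit = 0.0;

    for (int t = 0; t < periods; t++) {
        if (inventory <= theta_min) {
            /* Refill up to theta_max and pay for the order. */
            int qty = theta_max - inventory;
            double cost = qty * p->unit_cost;
            if (cost < p->free_ship_at)
                cost += p->ship_fee; /* below the free-shipping threshold */
            profit -= cost;
            inventory = theta_max;
        }
        /* Sell what demand and stock allow. */
        int sold = demand[t] < inventory ? demand[t] : inventory;
        profit += sold * p->price;
        inventory -= sold;
    }
    return profit;
}
```

A policy search would call such a function for every candidate (theta-min, theta-max) pair on the same demand scenarios and keep the pair with the highest result.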
Goals
- Provide a compact, readable implementation of an order-up-to policy search.
- Demonstrate parallelism with OpenMPI across cores and nodes, using a leader/worker scheme to divide the parameter search.
- Run cleanly on both OpenBSD and GNU/Linux (including Slurm clusters via slurm-job.sh).
How it works
Rank 0 is the leader: it generates the stochastic demand scenarios,
broadcasts them to the workers, and partitions the policy parameter
ranges. Each worker evaluates its slice of the search space and
returns its best (theta-min, theta-max) pair along with
the resulting profit. The leader picks the global best and writes it
to standard output. See order-up(1) for
the full set of flags.
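The division of labor can be illustrated without any MPI calls. The sketch below shows one plausible way to split a parameter range into contiguous slices and to pick the global best from per-worker results. How seqdec actually partitions the ranges is not stated here, so struct slice, struct result, partition, and pick_best are hypothetical names, and the even-split rule is an assumption.

```c
/* Half-open slice [lo, hi) of an integer parameter range. */
struct slice {
    int lo;
    int hi;
};

/* Best pair found by one worker (hypothetical layout). */
struct result {
    int theta_min;
    int theta_max;
    double profit;
};

/*
 * Split [lo, hi) into nworkers contiguous slices whose sizes differ by
 * at most one, and return the slice for worker w (0 <= w < nworkers).
 * The first (n % nworkers) workers each take one extra element.
 */
struct slice partition(int lo, int hi, int nworkers, int w)
{
    int n = hi - lo;
    int base = n / nworkers;
    int extra = n % nworkers;
    struct slice s;

    s.lo = lo + w * base + (w < extra ? w : extra);
    s.hi = s.lo + base + (w < extra ? 1 : 0);
    return s;
}

/* Leader side: return the result with the highest profit (n >= 1). */
struct result pick_best(const struct result *r, int n)
{
    struct result best = r[0];

    for (int i = 1; i < n; i++)
        if (r[i].profit > best.profit)
            best = r[i];
    return best;
}
```

In an MPI program, the demand scenarios would typically reach the workers through a broadcast such as MPI_Bcast, and the per-worker results would be collected on rank 0 before a step like pick_best runs; those calls are omitted so the sketch compiles without an MPI installation.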
Building
Pick a worker count (typically up to the number of cores per node)
and build with FAST=1 for an optimized binary:
- GNU:
  make -f Makefile.gnumake MT=128 FAST=1
- OpenBSD:
  export OMPI_CC=/usr/bin/clang && make MT=128 FAST=1
Run with sbatch slurm-job.sh on a Slurm system, or invoke
mpirun directly (see the run target in
the Makefile). The openmpi
package must be installed. Note that OpenMPI relies on
shmget(2), which precludes the use of
pledge(2).