Heuristic production control policies such as CONWIP, kanban, and other hybrid policies have been used for years as better alternatives to MRP-based push control policies. Although efficient, these policies are far from optimal. Our goal is to develop a methodology that, for a given system, finds a dynamic control policy via intelligent agents; such a policy, while achieving the productivity (i.e., demand service rate) goal of the system, optimizes a cost/reward function based on the WIP inventory. To achieve this goal we applied a simulation-based optimization technique called Reinforcement Learning (RL) to a four-station serial line. The control policy obtained by the RL algorithm was compared with existing policies on the basis of total average WIP and average WIP cost. We also developed a heuristic control policy in light of the insight gained from a close examination of the policies obtained by the RL algorithm. This heuristic policy, named Behavior-Based Control (BBC), although second to the RL policy, proved to be a more efficient and leaner control policy than most of the existing policies in the literature. The performance of the BBC policy was comparable to that of the Extended Kanban Control System (EKCS), which, in our experiments, turned out to be the best of the existing policies. The numerical results used for comparison were obtained from a four-station serial line with two different (constant and Poisson) demand arrival processes.
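The abstract does not specify the RL algorithm or the simulation model used; the sketch below only illustrates the general approach with tabular Q-learning on a simplified discrete-time model of a four-station serial line, where the action authorizes production at each station and the reward penalizes WIP holding and unmet demand. The buffer cap, cost parameters, Bernoulli demand process, and all function names are illustrative assumptions, not the paper's actual formulation.

```python
import random
from collections import defaultdict

# Minimal sketch (not the authors' simulator): a discrete-time, 4-station
# serial line. buffers[i] holds WIP in the output buffer of station i;
# finished parts in the last buffer serve demand.
N_STATIONS = 4
BUFFER_CAP = 3            # hypothetical cap so the state space stays tabular
HOLDING_COST = 1.0        # cost per unit of WIP per period (assumed)
BACKORDER_COST = 5.0      # penalty for an unmet demand (assumed)
DEMAND_PROB = 0.7         # Bernoulli stand-in for the demand arrival process

def step(buffers, action):
    """Advance the line one period; action[i] == 1 authorizes station i."""
    buffers = list(buffers)
    # Process stations back-to-front so a part moves at most one buffer per period.
    for i in reversed(range(N_STATIONS)):
        upstream_ok = (i == 0) or (buffers[i - 1] > 0)  # raw material always available at station 1
        if action[i] and upstream_ok and buffers[i] < BUFFER_CAP:
            if i > 0:
                buffers[i - 1] -= 1
            buffers[i] += 1
    # WIP holding cost, then a demand arrival served from the finished-goods buffer.
    reward = -HOLDING_COST * sum(buffers)
    if random.random() < DEMAND_PROB:
        if buffers[-1] > 0:
            buffers[-1] -= 1
        else:
            reward -= BACKORDER_COST
    return tuple(buffers), reward

# Every combination of per-station production authorizations.
ACTIONS = [tuple((k >> i) & 1 for i in range(N_STATIONS)) for k in range(2 ** N_STATIONS)]

def q_learning(episodes=2000, horizon=200, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state = (0,) * N_STATIONS
        for _ in range(horizon):
            if random.random() < eps:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            nxt, r = step(state, action)
            best_next = max(Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])
            state = nxt
    return Q

if __name__ == "__main__":
    Q = q_learning()
    # Inspect the learned authorization decision when the line is empty.
    s0 = (0,) * N_STATIONS
    print(max(ACTIONS, key=lambda a: Q[(s0, a)]))
```

Under this kind of formulation, the learned greedy policy maps buffer levels to production authorizations, which is the sense in which a state-dependent (dynamic) control policy can be compared against fixed-parameter pull policies such as CONWIP, kanban, or EKCS on average WIP and WIP cost.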