We propose to train trading systems and portfolios by optimizing objective functions that directly measure trading and investment performance. Rather than basing a trading system on forecasts or training via a supervised learning algorithm using labelled trading data, we train our systems using recurrent reinforcement learning (RRL) algorithms. The performance functions that we consider for reinforcement learning are profit or wealth, economic utility, the Sharpe ratio, and our proposed differential Sharpe ratio.
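As a concrete illustration, the differential Sharpe ratio D_t can be computed incrementally from exponential moving estimates A_t and B_t of the first and second moments of the trading returns R_t, updated as A_t = A_{t-1} + eta*(R_t - A_{t-1}) and B_t = B_{t-1} + eta*(R_t^2 - B_{t-1}). The sketch below (Python; the adaptation rate eta, the initial values, and the function name are illustrative assumptions, not taken from the paper) computes D_t = (B_{t-1}*dA_t - 0.5*A_{t-1}*dB_t) / (B_{t-1} - A_{t-1}^2)^{3/2}, a per-period quantity suited to online reinforcement learning:

```python
import numpy as np

def differential_sharpe_ratio(returns, eta=0.01, A0=0.0, B0=0.0):
    """Incrementally compute the differential Sharpe ratio D_t.

    A and B are exponential moving estimates of the first and second
    moments of the returns; eta is the adaptation rate. Names and
    defaults are illustrative, not prescribed by the paper.
    """
    A, B = A0, B0
    D = []
    for R in returns:
        dA = R - A            # innovation in the first moment
        dB = R**2 - B         # innovation in the second moment
        denom = (B - A**2) ** 1.5
        # Guard the first steps, where the variance estimate is zero.
        D.append((B * dA - 0.5 * A * dB) / denom if denom > 0 else 0.0)
        A += eta * dA         # update the moving estimates
        B += eta * dB
    return np.asarray(D)
```

Because each D_t depends only on the current return and the running moments, it can be maximized online, one trading period at a time.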
The trading and portfolio management systems require prior decisions as input in order to account properly for the effects of transaction costs, market impact, and taxes. This temporal dependence on system state requires the use of reinforcement versions of standard recurrent learning algorithms.
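To make the recurrence explicit, a minimal long/short trader might take the form F_t = tanh(w.x_t + u*F_{t-1} + b), with per-period trading return R_t = mu*(F_{t-1}*r_t - delta*|F_t - F_{t-1}|), so that the cost term delta*|F_t - F_{t-1}| couples each decision to the previous one. The following sketch (Python; the lagged-price-change features, the parameter names, and the proportional cost model are illustrative assumptions) shows why gradients of any performance function of the R_t must be propagated through the recurrent term F_{t-1}:

```python
import numpy as np

def trader_returns(prices, w, u, b, delta=0.001, mu=1.0):
    """Forward pass of a hypothetical single-layer recurrent trader.

    F_t = tanh(w . x_t + u*F_{t-1} + b) is the position in [-1, 1];
    feeding F_{t-1} back in makes the trading return
    R_t = mu*(F_{t-1}*r_t - delta*|F_t - F_{t-1}|) depend on the
    previous decision through the transaction cost delta.
    """
    r = np.diff(prices)            # price changes r_t = p_t - p_{t-1}
    m = len(w)                     # number of lagged changes in the input
    F_prev, R = 0.0, []
    for t in range(m, len(r)):
        x = r[t - m:t]             # recent price changes as features
        F = np.tanh(w @ x + u * F_prev + b)
        R.append(mu * (F_prev * r[t] - delta * abs(F - F_prev)))
        F_prev = F
    return np.asarray(R)
```

With delta = 0 the returns decouple across time and supervised methods suffice; with delta > 0 the dependence of R_t on F_{t-1} is exactly what forces recurrent gradient computations of the RTRL or backpropagation-through-time variety.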
We present empirical results in controlled experiments that demonstrate the efficacy of some of our methods for optimizing trading systems and portfolios. For a long/short trader, we find that maximizing the differential Sharpe ratio yields more consistent results than maximizing profits, and that both methods outperform a trading system based on forecasts that minimize mean squared error (MSE). We find that portfolio traders trained to maximize the differential Sharpe ratio achieve better risk-adjusted returns than those trained to maximize profit. Finally, we provide simulation results for an S&P 500/TBill asset allocation system that demonstrate the presence of out-of-sample predictability in the monthly S&P 500 stock index for the 25-year period 1970 through 1994. © 1998 John Wiley & Sons, Ltd.