
# High-resolution path-integral development of financial options

Lester Ingber

Lester Ingber Research
POB 06440 Sears Tower, Chicago, IL 60606
and
DRW Investments LLC
311 S Wacker Dr, Ste 900, Chicago, IL 60606

ingber@ingber.com, ingber@alumni.caltech.edu

+1.312.542.1006 Voice
+1.801.881.1088 Fax

ABSTRACT

The Black-Scholes theory of option pricing has been considered for many years as an important but very approximate zeroth-order description of actual market behavior. We generalize the functional form of the diffusion of these systems and also consider multi-factor models including stochastic volatility. Daily Eurodollar futures prices and implied volatilities are fit to determine exponents of functional behavior of diffusions using methods of global optimization, Adaptive Simulated Annealing (ASA), to generate tight fits across moving time windows of Eurodollar contracts. These short-time fitted distributions are then developed into long-time distributions using a robust non-Monte Carlo path-integral algorithm, PATHINT, to generate prices and derivatives commonly used by option traders.

Keywords: options; eurodollar; volatility; path integral; optimization; statistical mechanics

PACS Nos.: 05.10.-a, 02.70.-c, 82.20.wt, 89.90.+n

## 1. INTRODUCTION

### 1.1. Background

There always is much interest in developing more sophisticated pricing models for financial instruments. In particular, there currently is much interest in improving option pricing models with respect to stochastic variables [1-4].
The standard Black-Scholes (BS) theory assumes a lognormal distribution of market prices, i.e., a diffusion linearly proportional to the market price. However, many texts include outlines of more general diffusions proportional to an arbitrary power of the market price [5].
The above aspects of stochastic volatility and of more general functional dependencies of diffusions are most often “swept under the rug” of a simple lognormal form. Experienced traders often use their own intuition to put volatility “smiles” into the constant coefficient of the BS lognormal distribution to compensate for these aspects.
It is generally acknowledged that since the market crash of 1987, markets have been increasingly difficult to describe using the BS model, and so better modelling and computational techniques should be used by traders [6], although in practice simple BS models are the rule rather than the exception simply because they are easy to use [7]. To a large extent, previous modelling that has included stochastic volatility and multiple factors has been driven more by the desire either to delve into mathematics tangential to these issues, or to deal only with models that can accommodate closed-form algebraic expressions. We do not see much of the philosophy in the literature that has long driven the natural sciences: to respect first raw data, secondly models of raw data, and finally the use of numerical techniques that do not excessively distort models for the sake of ease of analysis and speed of computation. Indeed, very often the reverse set of priorities is seen in mathematical finance.

### 1.2. Our Approach

We have addressed the above issues in detail within the framework of a previously developed statistical mechanics of financial markets (SMFM) [8-13].
Our approach requires three sensible parts. Part one is the formulation of the model, which to some extent also involves specification of the specific market(s) data to be addressed. Part two is the fitting of the model to specific market data. Part three is the use of the resulting model to calculate option prices and their Greeks (partial derivatives of the prices with respect to their independent variables), which are used as risk parameters by traders. Each part requires some specific numerical tuning to the market under consideration.
The first part was to develop the algebraic model to replace/generalize BS, including the possibility of also addressing how to handle data regions not previously observed in trading. This is not absurd; current BS models perform integrals that must include much influence from fat tails covering data regions never seen or likely to be seen in real-world markets. There are some issues as to whether we should take seriously the notion that the market is strongly driven by some element of a “self-fulfilling prophecy” of the BS model [14], but in any case our models have parameters to handle a wide range of possible cases that might arise.
We have developed two parallel tracks starting with part one, a one-factor and a two-factor model. The two-factor model includes stochastic volatility. At first we sensed the need to develop this two-factor model, and we now see that this is at the least an important benchmark against which to judge the worth of the one-factor model.
The second part was to fit the actual raw data so we can come up with real distributions. Some tests illustrated that standard quasi-linear fitting routines could not get the proper fits, and so we used a more powerful global optimization algorithm, Adaptive Simulated Annealing (ASA) [15]. Tuning and selection of the time periods to perform the fits to the data were not trivial aspects of this research. Practical decisions had to be made on the time span of data to be fit and how to aggregate the fits to get sensible “fair values” for reasonable standard deviations of the exponents in the diffusions.
The third part was to develop Greeks and risk parameters from these distributions without making premature approximations just to ease the analysis. Perhaps someday, simple approximations and intuitions similar to what traders now use for BS models will be available for these models, but we do not think the best approach is to start out with such approximations until we first see proper calculations, especially in this uncharted territory. When it seemed that Cox-Ross-Rubinstein (CRR) standard tree codes (discretized approximations to partial differential equations) [16] were not stable for general exponents, i.e., for other than the lognormal case, we turned to a PATHINT code developed a decade ago for some hard nonlinear multifactor problems [17], e.g., combat analyses [18], neuroscience [19,20], and potentially chaotic systems [21,22]. In 1990 and 1991 papers on financial applications, it was mentioned how these techniques could be used for stochastic interest rates and bonds [9,10]. The modifications required here for one-factor European and then American cases went surprisingly smoothly; we still had to tune the meshes, etc. The two-factor model presented a technical problem to the algorithm, which we have reasonably handled using a combination of selection of the model in part one and a reasonable approach to developing the meshes.
More detailed numerical calculations than contained in this paper will be presented in a subsequent paper [23].

### 1.3. Outline of Paper

Section 1 is this introduction. Section 2 describes the nature of Eurodollar (ED) futures data and the evidence for stochastic volatility. Section 3 outlines the algebra of modelling options, including the standard BS theory and our generalizations. Section 4 outlines the three equivalent mathematical representations used by SMFM; this is required to understand the development of the short-time distribution that defines the cost function we derive for global optimization, as well as the numerical methods we have developed to calculate the long-time evolution of these short-time distributions. Section 5 outlines ASA and explains its use to fit short-time probability distributions defined by our models to the Eurodollar data; we offer the fitted exponent in the diffusion as a new important technical indicator of market behavior. Section 6 outlines PATHINT and explains its use to develop long-time probability distributions from the fitted short-time probability distributions, for both the one-factor and two-factor tracks. Section 7 describes how we use these long-time probability distributions to calculate European and American option prices and Greeks; here we give numerical tests of our approach to BS CRR algorithms. Section 8 is our conclusion.

## 2. DATA

### 2.1. Eurodollars

Eurodollars are fixed-rate time deposits held primarily by overseas banks, but denominated in US dollars. They are not subject to US banking regulations and therefore tend to have a tighter bid-ask spread than deposits held in the United States [24].

### 2.2. Futures

The three-month Eurodollar futures contract is one of the most actively traded futures markets in the world. The contract is quoted as an index where the yield is equal to the Eurodollar price subtracted from 100. This yield is equal to the fixed rate of interest paid by Eurodollar time deposits upon maturity and is expressed as an annualized interest rate based on a 360-day year. The Eurodollar futures are cash settled based on the 90-day London Interbank Offer Rate (LIBOR). A “notional” principal amount of \$1 million is used to determine the change in the total interest payable on a hypothetical underlying time deposit, but is never actually paid or received [24].
Currently a total of 40 quarterly Eurodollar futures contracts (or ten years worth) are listed, with expirations annually in March, June, September and December.
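The quoting arithmetic above can be sketched in a few lines; the helper names are hypothetical, and the dollar value of a basis point simply follows from the \$1 million notional, 90-day term, and 360-day year stated above.

```python
# Hypothetical helpers illustrating the Eurodollar quoting convention:
# yield = 100 - price, cash settled on 90-day LIBOR, $1M notional,
# 360-day year.

NOTIONAL = 1_000_000
DAYS = 90
YEAR = 360

def ed_yield(price: float) -> float:
    """Annualized yield implied by a Eurodollar futures price."""
    return 100.0 - price

def bp_value() -> float:
    """Change in interest payable for a 0.01 (one basis point) yield move."""
    return NOTIONAL * 0.0001 * DAYS / YEAR

print(ed_yield(94.50))   # yield implied by a 94.50 quote
print(bp_value())        # dollars per basis point
```
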

### 2.3. Options on Futures

The options traded on the Eurodollar futures include not only 18 months of options expiring at the same time as the underlying future, but also various short dated options which themselves expire up to one year prior to the expiration of the underlying futures contract.

### 2.4. Front/Back Month Contracts

For purposes of risk minimization, as discussed in a previous paper [4], traders put on spreads across a variety of option contracts. One common example is to trade the spread on contracts expiring one year apart, where the future closer to expiration is referred to as the front month contract, and the future expiring one year later is called the back month. The availability of short dated or “mid-curve” options, which are based on an underlying back month futures contract but expire at the same time as the front month, allows one to trade the volatility ratios of the front and back month futures contracts without having to take the time differences in option expirations into consideration. We studied the volatilities of these types of front and back month contracts. Here, we give analyses with respect only to quarterly data longer than six months from expiration.

### 2.5. Stochastic Volatility

Below we develop two-factor models to address stochastic volatility. In a previous paper, we have performed empirical studies of Eurodollar futures to support the necessity of dealing with these issues [4].

## 3. MODELS

### 3.1. Random walk model

The use of Brownian motion as a model for financial systems is generally attributed to Bachelier [25], though he incorrectly intuited that the noise scaled linearly instead of as the square root relative to the random log-price variable. Einstein is generally credited with using the correct mathematical description in a larger physical context of statistical systems. However, several studies imply that changing prices of many markets do not follow a random walk, that they may have long-term dependences in price correlations, and that they may not be efficient in quickly arbitraging new information [26-28]. A random walk for returns, rate of change of prices over prices, is described by a Langevin equation with simple additive noise $\eta$, typically representing the continual random influx of information into the market.

$\dot{\Gamma} = -\gamma_1 + \gamma_2\,\eta\ ,$
$\dot{\Gamma} = d\Gamma/dt\ ,$
$\langle \eta(t) \rangle_\eta = 0\ ,\qquad \langle \eta(t)\,\eta(t') \rangle_\eta = \delta(t - t')\ ,$

(1)

where $\gamma_1$ and $\gamma_2$ are constants, and $\Gamma$ is the logarithm of (scaled) price. Price, although the most dramatic observable, may not be the only appropriate dependent variable or order parameter for the system of markets [29]. This possibility has also been called the “semistrong form of the efficient market hypothesis” [26].
The generalization of this approach to include multivariate nonlinear nonequilibrium markets led to a model of statistical mechanics of financial markets (SMFM) [8].
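A minimal Euler-scheme simulation of the random walk of Eq. (1) might look as follows; the parameter values are illustrative, not fitted, and the Gaussian increments $\sqrt{dt}\,N(0,1)$ realize the delta-correlated white noise $\eta$.

```python
# Euler discretization of Eq. (1): Gamma-dot = -gamma1 + gamma2 * eta,
# with Gamma the log of (scaled) price.  Illustrative parameters only.
import math
import random

def simulate_log_price(gamma1=0.0, gamma2=0.01, dt=1.0, n_steps=252, seed=0):
    rng = random.Random(seed)
    gamma = 0.0            # Gamma in Eq. (1), starting log-price
    path = [gamma]
    for _ in range(n_steps):
        # drift term -gamma1 dt plus white-noise term gamma2 sqrt(dt) N(0,1)
        gamma += -gamma1 * dt + gamma2 * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(gamma)
    return path

path = simulate_log_price()
print(len(path))   # initial point plus one trading year of daily steps
```
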

### 3.2. Black-Scholes (BS) Theory

The standard partial-differential equation used to formulate most variants of Black-Scholes (BS) models describing the market value of an option,$V$ , is

 $\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS\frac{\partial V}{\partial S} - rV = 0\ ,$ (2) where $S$ is the asset price, $\sigma$ is the standard deviation, or volatility, of $S$, and $r$ is the short-term interest rate. The solution depends on boundary conditions, subject to a number of interpretations, some requiring minor transformations of the basic BS equation or its solution. For example, the basic equation can apply to a number of one-dimensional models of interpretations of prices given to $V$, e.g., puts or calls, and to $S$, e.g., stocks or futures, dividends, etc. For instance, if $V$ is set to $C$, a European call option with exercise price $X$ and maturity at $T$, the solution is
$C(S,t) = S\,N(d_1) - X e^{-r(T-t)} N(d_2)\ ,$
$d_1 = \frac{\ln(S/X) + \left(r + \frac{1}{2}\sigma^2\right)(T-t)}{\sigma (T-t)^{1/2}}\ ,$
$d_2 = \frac{\ln(S/X) + \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{\sigma (T-t)^{1/2}}\ .$

(3)

In practice, the volatility $\sigma$ is the least known parameter in this equation, and its estimation is generally the most important part of pricing options. Usually the volatility is given on a yearly basis, baselined to some standard, e.g., 252 trading days per year, or 360 or 365 calendar days. Therefore, all values of volatility given in the graphs in this paper, based on daily data, would be annualized by multiplying the standard deviations of the yields by $\sqrt{252} = 15.87$. We have used this factor to present our implied volatilities as daily movements.
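As a concrete sketch, the call price of Eq. (3) and the annualization factor above can be coded directly; the function names and numerical inputs are illustrative, and the put follows from put-call parity, Eq. (17), discussed below.

```python
# Sketch of the Black-Scholes call of Eq. (3).  N(.) is the standard
# normal CDF, built from math.erf.  The sqrt(252) factor converts a
# daily volatility to the yearly sigma entering d1, d2.
import math

def ncdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, X, r, sigma, T):
    d1 = (math.log(S / X) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = (math.log(S / X) + (r - 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return S * ncdf(d1) - X * math.exp(-r * T) * ncdf(d2)

def bs_put(S, X, r, sigma, T):
    # put-call parity, Eq. (17): c + X e^{-rT} = p + S
    return bs_call(S, X, r, sigma, T) + X * math.exp(-r * T) - S

sigma_daily = 0.01
sigma_yearly = sigma_daily * math.sqrt(252)   # roughly 15.87 * daily
c = bs_call(S=100.0, X=100.0, r=0.05, sigma=sigma_yearly, T=0.25)
p = bs_put(S=100.0, X=100.0, r=0.05, sigma=sigma_yearly, T=0.25)
print(round(c, 4), round(p, 4))
```
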

### 3.3. Some Key Issues in Derivation of BS

The basic BS model considers a portfolio in terms of delta ($\Delta$ ),

 $\Pi =V-\Delta S\text{ },$ (4) in a market with Gaussian-Markovian (“white”) noise$X$ and drift$\mu$ ,
 $\frac{dS}{S}=\sigma dX+\mu dt\text{ },$ (5) where$V\left(S,t\right)$ inherits a random process from$S$ ,
 $dV=\sigma S\frac{\partial V}{\partial S}dX+\left(\mu S\frac{\partial V}{\partial S}+\frac{1}{2}{\sigma }^{2}{S}^{2}\frac{{\partial }^{2}V}{\partial {S}^{2}}+\frac{\partial V}{\partial t}\right)dt\text{ }.$ (6) This yields
 $d\Pi =\sigma \left(\frac{\partial V}{\partial S}-\Delta \right)dX+\left(\mu S\frac{\partial V}{\partial S}+\frac{1}{2}{\sigma }^{2}{S}^{2}\frac{{\partial }^{2}V}{\partial {S}^{2}}+\frac{\partial V}{\partial t}-\mu \Delta S\right)dt\text{ }.$ (7) The expected risk-neutral return of$\Pi$ is
 $d\Pi =r\Pi dt=r\left(V-\Delta S\right)dt\text{ }.$ (8) Options$V$ on futures$F$ can be derived, e.g., using simple transformations to take cost of carry into consideration, such as
 $F=S{e}^{r\left(T-t\right)}\text{ },$ (9) and setting
 $d\Pi =rV\text{\hspace{0.17em}}dt\text{ }.$ (10) The corresponding BS equation for futures$F$ is
 $\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 F^2 \frac{\partial^2 V}{\partial F^2} - rV = 0\ .$ (11) At least two advantages are present if $\Delta$ is chosen such that
 $\Delta = \frac{\partial V}{\partial S}\ .$ (12) Then, the portfolio can be instantaneously “risk-neutral,” in terms of zeroing the coefficient of $X$, as well as independent of the direction of market, in terms of zeroing the coefficient of $\mu$. For the above example of $V$ = $C$,
 $\Delta =N\left({d}_{1}\right)\text{ }.$ (13) Other trading strategies based on this simple model use similar constructs as risk parameters, e.g., gamma ($\Gamma$ ), theta ($\Theta$ ), vega ($Υ$ ), rho ($\rho$ ) [5],
$\Gamma =\frac{{\partial }^{2}\Pi }{\partial {S}^{2}}\text{ },$
$\Theta =\frac{\partial \Pi }{\partial t}\text{ },$
$Υ=\frac{\partial \Pi }{\partial \sigma }\text{ },$
 $\rho =\frac{\partial \Pi }{\partial r}\text{ }.$ (14) The BS equation, Eq. (2), may be written as
$\Theta + rS\Delta + \frac{1}{2}(\sigma S)^2 \Gamma = rV\ .$

(15)

### 3.4. Models

Our two-factor model includes stochastic volatility$\sigma$ of the underlying$S$ ,

$dS = \mu\,dt + \sigma\,F(S, S_0, S_\infty, x, y)\,dz_S\ ,$
$d\sigma = \nu\,dt + \epsilon\,dz_\sigma\ ,$
$\langle dz_i \rangle = 0\ ,\qquad i = \{S, \sigma\}\ ,$
$\langle dz_i\,dz_j \rangle = dt\,\delta(t-t')\ ,\qquad i = j\ ,$
$\langle dz_i\,dz_j \rangle = \rho\,dt\,\delta(t-t')\ ,\qquad i \ne j\ ,$
 $F(S, S_0, S_\infty, x, y) = \begin{cases} S\ , & S < S_0 \\ S^x S_0^{1-x}\ , & S_0 \le S \le S_\infty \\ S^y S_0^{1-x} S_\infty^{x-y}\ , & S > S_\infty \end{cases}$ (16) where $S_0$ and $S_\infty$ are selected to lie outside the data region used to fit the other parameters, e.g., $S_0 = 1$ and $S_\infty = 20$ for fits to Eurodollar futures, which historically have a very tight range relative to other markets. We have used the Black-Scholes form $F = S$ inside $S < S_0$ to obtain the usual benefits, e.g., no negative prices, as the distribution is naturally excluded from $S < 0$, and preservation of put-call parity. Put-call parity for European options is derived quite independently of any mathematical model of options [5]. In its simplest form, it is given by
 $c + X e^{-r(T-t)} = p + S\ ,$ (17) where $c$ ($p$) is the fair price of a call (put), $X$ is the strike price, $r$ is the risk-free interest rate, $t$ is the present time, $T$ is the time of expiration, and $S$ is the underlying market. We have taken $y = 0$, a normal distribution, to reflect total ignorance of markets outside the range $S > S_\infty$. The one-factor model just assumes a constant $\sigma$. It is often noted that BS models incorrectly include untenable contributions from large-$S$ regions because of their fat tails [30]. (If we wished to handle negative interest rates, i.e., ED prices > 100, we would shift the $S = 0$ axis to some $S < 0$ value.) We found that the abrupt, albeit continuous, changes across $S_0$, especially for $x \le 0$, did not cause any similar effects in the distributions evolved using these diffusions, as reported below. The formula for pricing an option $P$, derived in a Black-Scholes generalized framework after factoring out interest-rate discounting, is equivalent to using the form
$dS=\mu \text{\hspace{0.17em}}S\text{\hspace{0.17em}}dt+\sigma \text{\hspace{0.17em}}F\left(S,{S}_{0},{S}_{\infty },x,y\right)\text{\hspace{0.17em}}d{z}_{S}\text{ },$
 $d\sigma = \nu\,dt + \epsilon\,dz_\sigma\ .$ (18) We experimented with some alternative functional forms, primarily to apply some smooth cutoffs across the above three regions of $S$. For example, we used $F'$, a function designed to revert to the lognormal Black-Scholes model in several limits,
$F'(S, S_0, S_\infty, x) = S\,C_0 + (1 - C_0)\left(S^x S_0^{1-x} C_\infty + S_0 (1 - C_\infty)\right)\ ,$
$C_0 = \exp\left[-\left(\frac{S}{S_0}\,\frac{|1-x|}{1+|1-x|}\right)^{|2-x|+1}\right]\ ,$
$C_\infty = \exp\left[-\left(\frac{S}{S_\infty}\right)^2\right]\ ,$
$\lim_{S \to \infty,\ x \ne 1} F'(S, S_0, S_\infty, x) = S_0 = \mathrm{constant}\ ,$
$\lim_{S \to 0^+} F'(S, S_0, S_\infty, x) = \lim_{x \to 1} F'(S, S_0, S_\infty, x) = S\ .$

(19)

However, our fits were most sensitive to the data when we permitted the central region to be pure $S^x$, using $F$ above.

#### 3.4.1. Various Diffusions

Fig. 1 gives examples of $F(S, S_0, S_\infty, x, y)\,dz_S$ for $x$ in {-1, 0, 1, 2}. The other parameters are $S = 5$, $S_0 = 0.5$, $S_\infty = 20$, $y = 0$.
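The piecewise diffusion $F$ of Eq. (16) is straightforward to code; the following sketch evaluates it at the Fig. 1 parameters.

```python
# The piecewise diffusion F of Eq. (16), evaluated at the Fig. 1
# parameters S = 5, S0 = 0.5, S_inf = 20, y = 0, for x in {-1, 0, 1, 2}.

def F(S, S0, Sinf, x, y):
    if S < S0:
        return S                                 # lognormal (BS) region
    if S <= Sinf:
        return S**x * S0**(1 - x)                # power-law region fitted to data
    return S**y * S0**(1 - x) * Sinf**(x - y)    # tail region; normal for y = 0

for x in (-1, 0, 1, 2):
    print(x, F(5.0, 0.5, 20.0, x, 0))
```

Note that the three branches match continuously at $S_0$ and $S_\infty$ by construction.
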

## 4. STATISTICAL MECHANICS OF FINANCIAL MARKETS (SMFM)

### 4.1. Statistical Mechanics of Large Systems

Aggregation problems in nonlinear nonequilibrium systems typically are “solved” (accommodated) by having new entities/languages developed at these disparate scales in order to efficiently pass information back and forth. This is quite different from the nature of quasi-equilibrium quasi-linear systems, where thermodynamic or cybernetic approaches are possible. These approaches typically fail for nonequilibrium nonlinear systems.
Many systems are aptly modeled in terms of multivariate differential rate-equations, known as Langevin equations,

$\dot{M}^G = f^G + \hat{g}^G_j \eta^j\ ,\qquad (G = 1, \ldots, \Lambda)\ ,\qquad (j = 1, \ldots, N)\ ,$
$\dot{M}^G = dM^G/dt\ ,$
 $\langle \eta^j(t) \rangle_\eta = 0\ ,\qquad \langle \eta^j(t)\,\eta^{j'}(t') \rangle_\eta = \delta^{jj'}\delta(t-t')\ ,$ (20) where $f^G$ and $\hat{g}^G_j$ are generally nonlinear functions of mesoscopic order parameters $M^G$, $j$ is a microscopic index indicating the source of fluctuations, and $N \ge \Lambda$. The Einstein convention of summing over repeated indices is used. Vertical bars on an index, e.g., $|j|$, imply no sum is to be taken on repeated indices. Via a somewhat lengthy, albeit instructive, calculation, outlined in several other papers [8,10,31], involving an intermediate derivation of a corresponding Fokker-Planck or Schrödinger-type equation for the conditional probability distribution $P[M(t)|M(t_0)]$, the Langevin rate Eq. (20) is developed into the more useful probability distribution for $M^G$ at long-time macroscopic time event $t = (u+1)\theta + t_0$, in terms of a Stratonovich path integral over mesoscopic Gaussian conditional probabilities [32-36]. Here, macroscopic variables are defined as the long-time limit of the evolving mesoscopic system. The corresponding Schrödinger-type equation is [34,35]
$\partial P/\partial t=\frac{1}{2}\left({g}^{GG\prime }P{\right)}_{,GG\prime }-\left({g}^{G}P{\right)}_{,G}+V\text{ },$
${g}^{GG\prime }={k}_{T}{\delta }^{jk}{\stackrel{\text{^}}{g}}_{j}^{G}{\stackrel{\text{^}}{g}}_{k}^{G\prime }\text{ },$
${g}^{G}={f}^{G}+\frac{1}{2}{\delta }^{jk}{\stackrel{\text{^}}{g}}_{j}^{G\prime }{\stackrel{\text{^}}{g}}_{k,G\prime }^{G}\text{ },$
 $[\ldots]_{,G} = \partial[\ldots]/\partial M^G\ .$ (21) This is properly referred to as a Fokker-Planck equation when $V \equiv 0$. Note that although the partial differential Eq. (21) contains information regarding $M^G$ as in the stochastic differential Eq. (20), all references to $j$ have been properly averaged over. I.e., $\hat{g}^G_j$ in Eq. (20) is an entity with parameters in both microscopic and mesoscopic spaces, but $M$ is a purely mesoscopic variable, and this is more clearly reflected in Eq. (21). The path-integral representation is given in terms of the “Feynman” Lagrangian $L$:
$P\left[{M}_{t}|{M}_{{t}_{0}}\right]dM\left(t\right)=\int \text{\hspace{0.17em}}...\text{\hspace{0.17em}}\int \text{\hspace{0.17em}}\underset{_}{D}M\text{exp}\left(-S\right)\delta \left[M\left({t}_{0}\right)={M}_{0}\right]\delta \left[M\left(t\right)={M}_{t}\right]\text{ },$
$S={k}_{T}^{-1}\text{min}\text{\hspace{0.17em}}\underset{{t}_{0}}{\overset{t}{\int }}\text{\hspace{0.17em}}dt\prime L\text{ },$
$\underset{_}{D}M=\underset{u\to \infty }{\text{lim}}\text{\hspace{0.17em}}\underset{\rho =1}{\overset{u+1}{\Pi }}\text{\hspace{0.17em}}{g}^{1/2}\text{\hspace{0.17em}}\underset{G}{\Pi }\text{\hspace{0.17em}}\left(2\pi \theta {\right)}^{-1/2}d{M}_{\rho }^{G}\text{ },$
$L\left({\stackrel{\text{.}}{M}}^{G},{M}^{G},t\right)=\frac{1}{2}\left({\stackrel{\text{.}}{M}}^{G}-{h}^{G}\right){g}_{GG\prime }\left({\stackrel{\text{.}}{M}}^{G\prime }-{h}^{G\prime }\right)+\frac{1}{2}{{h}^{G}}_{;G}+R/6-V\text{ },$
${h}^{G}={g}^{G}-\frac{1}{2}{g}^{-1/2}\left({g}^{1/2}{g}^{GG\prime }{\right)}_{,G\prime }\text{ },$
${g}_{GG\prime }=\left({g}^{GG\prime }{\right)}^{-1}\text{ },$
$g=\text{det}\left({g}_{GG\prime }\right)\text{ },$
${{h}^{G}}_{;G}={h}_{,G}^{G}+{\Gamma }_{GF}^{F}{h}^{G}={g}^{-1/2}\left({g}^{1/2}{h}^{G}{\right)}_{,G}\text{ },$
${\Gamma }_{JK}^{F}\equiv {g}^{LF}\left[JK,L\right]={g}^{LF}\left({g}_{JL,K}+{g}_{KL,J}-{g}_{JK,L}\right)\text{ },$
$R={g}^{JL}{R}_{JL}={g}^{JL}{g}^{JK}{R}_{FJKL}\text{ },$
 $R_{FJKL} = \frac{1}{2}\left(g_{FK,JL} - g_{JK,FL} - g_{FL,JK} + g_{JL,FK}\right) + g_{MN}\left(\Gamma^M_{FK}\Gamma^N_{JL} - \Gamma^M_{FL}\Gamma^N_{JK}\right)\ .$ (22) Mesoscopic variables have been defined as $M^G$ in the Langevin and Fokker-Planck representations, in terms of their development from the microscopic system labeled by $j$. The Riemannian curvature term $R$ arises from nonlinear $g_{GG'}$, which is a bona fide metric of this space [34]. Even if a stationary solution, i.e., $\dot{M}^G = 0$, is ultimately sought, a necessarily prior stochastic treatment of $\dot{M}^G$ terms gives rise to these Riemannian “corrections.” Even for a constant metric, the term $h^G_{\ ;G}$ contributes to $L$ for a nonlinear mean $h^G$. $V$ may include terms such as $\sum_{T'} J_{T'G} M^G$, where the Lagrange multipliers $J_{T'G}$ are constraints on $M^G$, which are advantageously modeled as extrinsic sources in this representation; they too may be time-dependent. For our purposes, the above Feynman Lagrangian defines a kernel of the short-time conditional probability distribution, in the curved space defined by the metric, in the limit of continuous time, whose iteration yields the solution of the previous partial differential equation, Eq. (21). This differs from the Lagrangian which satisfies the requirement that the action is stationary to first order in $dt$ (the WKBJ approximation), but which does not include the first-order correction to the WKBJ approximation, as does the Feynman Lagrangian. The WKBJ Lagrangian differs from the Feynman Lagrangian essentially by replacing $R/6$ above by $R/12$ [37]. In this sense, the WKBJ Lagrangian is more useful for some theoretical discussions [38]. However, the use of the Feynman Lagrangian coincides with the numerical method we present here using the PATHINT code.
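The iteration of the short-time kernel described above can be sketched in one dimension with constant drift and diffusion; this toy version only illustrates folding $\exp(-L\,dt)$ over a mesh, not the full PATHINT code, which handles nonlinear $h^G$ and $g^{GG'}$ and adaptive meshes.

```python
# One-dimensional sketch: evolve P(M, t) by repeatedly folding the
# short-time Gaussian propagator over a mesh.  Constant drift h and
# diffusion g are assumed for simplicity.
import math

def evolve(h, g, dt, n_steps, x_min=-5.0, x_max=5.0, n_pts=201):
    dx = (x_max - x_min) / (n_pts - 1)
    xs = [x_min + i * dx for i in range(n_pts)]
    # initial condition: a delta function of probability at M = 0
    P = [1.0 / dx if abs(x) < dx / 2 else 0.0 for x in xs]
    norm = 1.0 / math.sqrt(2.0 * math.pi * g * dt)
    for _ in range(n_steps):
        # P(x, t + dt) = sum_x' G(x | x') P(x', t) dx
        P = [sum(norm * math.exp(-(x - xp - h * dt) ** 2 / (2.0 * g * dt))
                 * P[j] * dx for j, xp in enumerate(xs))
             for x in xs]
    return xs, P

xs, P = evolve(h=0.0, g=1.0, dt=0.01, n_steps=10)
mass = sum(p * (xs[1] - xs[0]) for p in P)
print(round(mass, 3))
```

For this driftless case, total probability should stay near 1 and the variance should grow like $g\,t$, mirroring the exact Fokker-Planck solution.
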
Using the variational principle,${J}_{TG}$ may also be used to constrain${M}^{G}$ to regions where they are empirically bound. More complicated constraints may be affixed to$L$ using methods of optimal control theory [39]. With respect to a steady state$\overline{P}$ , when it exists, the information gain in state$P$ is defined by
$Υ\left[P\right]=\text{\hspace{0.17em}}\int \text{\hspace{0.17em}}...\text{\hspace{0.17em}}\int \text{\hspace{0.17em}}\underset{_}{D}M\prime \text{\hspace{0.17em}}P\text{\hspace{0.17em}}\text{ln}\text{\hspace{0.17em}}\left(P/\overline{P}\right)\text{ },$
 $\underline{D}M' = \underline{D}M / dM_{u+1}\ .$ (23) In the economics literature, there appears to be sentiment to define Eq. (20) by the Ito, rather than the Stratonovich, prescription. It is true that Ito integrals have Martingale properties not possessed by Stratonovich integrals [40], which leads to risk-neutral theorems for markets [41,42], but the nature of the proper mathematics should eventually be determined by proper aggregation of relatively microscopic models of markets. It should be noted that virtually all investigations of other physical systems, which are also continuous-time models of discrete processes, conclude that the Stratonovich interpretation coincides with reality, when multiplicative noise with zero correlation time, modeled in terms of white noise $\eta^j$, is properly considered as the limit of real noise with finite correlation time [43]. The path integral succinctly demonstrates the difference between the two: The Ito prescription corresponds to the prepoint discretization of $L$, wherein $\theta\dot{M}(t) \to M_{\rho+1} - M_\rho$ and $M(t) \to M_\rho$. The Stratonovich prescription corresponds to the midpoint discretization of $L$, wherein $\theta\dot{M}(t) \to M_{\rho+1} - M_\rho$ and $M(t) \to \frac{1}{2}(M_{\rho+1} + M_\rho)$. In terms of the functions appearing in the Fokker-Planck Eq. (21), the Ito prescription of the prepoint-discretized Lagrangian, $L_I$, is relatively simple, albeit deceptively so because of its nonstandard calculus,
 $L_I(\dot{M}^G, M^G, t) = \frac{1}{2}(\dot{M}^G - g^G)\,g_{GG'}\,(\dot{M}^{G'} - g^{G'}) - V\ .$ (24) In the absence of a nonphenomenological microscopic theory, the difference between an Ito prescription and a Stratonovich prescription is simply a transformed drift [37]. There are several other advantages to Eq. (22) over Eq. (20). Extrema and most probable states of $M^G$, $\ll M^G \gg$, are simply derived by a variational principle, similar to conditions sought in previous studies [44]. In the Stratonovich prescription, necessary, albeit not sufficient, conditions are given by
${\delta }_{G}L={L}_{,G}-{L}_{,\stackrel{\text{.}}{G}:t}=0\text{ },$
${L}_{,\stackrel{\text{.}}{G}:t}={L}_{,\stackrel{\text{.}}{G}G\prime }{\stackrel{\text{.}}{M}}^{G\prime }+{L}_{,\stackrel{\text{.}}{G}\stackrel{\text{.}}{G}\prime }{\stackrel{\text{..}}{M}}^{G\prime }\text{ }.$

(25)

For stationary states, $\dot{M}^G = 0$, and $\partial\bar{L}/\partial\bar{M}^G = 0$ defines $\ll \bar{M}^G \gg$, where the bars identify stationary variables; in this case, the macroscopic variables are equal to their mesoscopic counterparts. [Note that $\bar{L}$ is not the stationary solution of the system, e.g., to Eq. (21) with $\partial P/\partial t = 0$. However, in some cases [45], $\bar{L}$ is a definite aid to finding such stationary states.] Many times only properties of stationary states are examined, but here a temporal dependence is included. E.g., the $\dot{M}^G$ terms in $L$ permit steady states and their fluctuations to be investigated in a nonequilibrium context. Note that Eq. (25) must be derived from the path integral, Eq. (22), which is at least one reason to justify its development.
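The prepoint/midpoint distinction above can be checked numerically for multiplicative noise $dS = \sigma S\,dX$: under the Stratonovich (midpoint) rule the sample mean grows like $\exp(\sigma^2 t/2)$, while the Ito (prepoint) rule leaves it flat, consistent with the transformed-drift statement above. The parameters in this sketch are illustrative.

```python
# Monte Carlo comparison of prepoint (Ito) and midpoint (Stratonovich)
# discretizations of dS = sigma S dX, starting from S = 1.
import math
import random

def sample_means(sigma=0.5, dt=0.01, n_steps=100, n_paths=10_000, seed=0):
    rng = random.Random(seed)
    ito_sum = strat_sum = 0.0
    for _ in range(n_paths):
        s_ito = s_str = 1.0
        for _ in range(n_steps):
            dW = math.sqrt(dt) * rng.gauss(0.0, 1.0)
            s_ito *= 1.0 + sigma * dW          # prepoint: S(t) -> S_rho
            a = 0.5 * sigma * dW               # midpoint: S(t) -> (S_rho + S_rho+1)/2,
            s_str *= (1.0 + a) / (1.0 - a)     # solved implicitly for S_rho+1
        ito_sum += s_ito
        strat_sum += s_str
    return ito_sum / n_paths, strat_sum / n_paths

m_ito, m_str = sample_means()
print(round(m_ito, 2), round(m_str, 2))  # near 1.00 and exp(0.125) respectively
```
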

### 4.2. Correlations

Correlations between variables are modeled explicitly in the Lagrangian as a parameter usually designated $\rho$ (not to be confused with the rho Greek calculated for options). This section uses a simple two-factor model to develop the correspondence between the correlation $\rho$ in the Lagrangian and that among the commonly written Wiener distributions $dz$.
Consider coupled stochastic differential equations

$dr={f}^{r}\left(r,l\right)dt+{\stackrel{\text{^}}{g}}^{r}\left(r,l\right){\sigma }_{1}d{z}_{1}\text{ },$
$dl={f}^{l}\left(r,l\right)dt+{\stackrel{\text{^}}{g}}^{l}\left(r,l\right){\sigma }_{2}d{z}_{2}\text{ },$
$<d{z}_{i}>\text{\hspace{0.17em}}=0\text{ },\text{ }i=\text{{}1,2\text{}}\text{ },$
$<d{z}_{i}\left(t\right)d{z}_{j}\left(t\prime \right)>\text{\hspace{0.17em}}=dt\delta \left(t-t\prime \right)\text{ },\text{ }i=j\text{ },$
$<d{z}_{i}\left(t\right)d{z}_{j}\left(t\prime \right)>\text{\hspace{0.17em}}=\rho dt\delta \left(t-t\prime \right)\text{ },\text{ }i\ne j\text{ },$
 $\delta \left(t-t\prime \right)=\left\{\begin{array}{cc}0\text{\hspace{0.17em}},\hfill & t\ne t\prime \text{ },\hfill \\ 1\text{\hspace{0.17em}},\hfill & t=t\prime \text{ },\hfill \end{array}$ (26) where $<.>$ denotes expectations. These can be rewritten as Langevin equations (in the Itô prepoint discretization)
$dr/dt={f}^{r}+{\stackrel{\text{^}}{g}}^{r}{\sigma }_{1}\left({\gamma }^{+}{n}_{1}+sgn\rho \text{\hspace{0.17em}}{\gamma }^{-}{n}_{2}\right)\text{ },$
$dl/dt={f}^{l}+{\stackrel{\text{^}}{g}}^{l}{\sigma }_{2}\left(sgn\rho \text{\hspace{0.17em}}{\gamma }^{-}{n}_{1}+{\gamma }^{+}{n}_{2}\right)\text{ },$
${\gamma }^{±}=\frac{1}{\sqrt{2}}\left[1±\left(1-{\rho }^{2}{\right)}^{1/2}{\right]}^{1/2}\text{ },$
 ${n}_{i}=\left(dt{\right)}^{1/2}{p}_{i}\text{ },$ (27) where ${p}_{1}$ and ${p}_{2}$ are independent Gaussian distributions with zero mean and unit variance. The equivalent short-time probability distribution, $P$, for the above set of equations is
$P={g}^{1/2}\left(2\pi dt{\right)}^{-1}\text{exp}\left(-Ldt\right)\text{ },$
$L=\frac{1}{2}{F}^{†}\underset{_}{g}F\text{ },$
$F=\left(\begin{array}{c}\hfill dr/dt-{f}^{r}\hfill \\ \hfill dl/dt-{f}^{l}\hfill \end{array}\right)\text{ },$
$g=\text{det}\left(\underset{_}{g}\right)\text{ },$
 $k=1-{\rho }^{2}\text{ }.$ (28)$\underset{_}{g}$ , the metric in$\text{{}r,l\text{}}$ -space, is the inverse of the covariance matrix,
${\underset{_}{g}}^{-1}=\left(\begin{array}{cc}\left({\stackrel{\text{^}}{g}}^{r}{\sigma }_{1}{\right)}^{2}\hfill & \hfill \rho {\stackrel{\text{^}}{g}}^{r}{\stackrel{\text{^}}{g}}^{l}{\sigma }_{1}{\sigma }_{2}\\ \rho {\stackrel{\text{^}}{g}}^{r}{\stackrel{\text{^}}{g}}^{l}{\sigma }_{1}{\sigma }_{2}\hfill & \hfill \left({\stackrel{\text{^}}{g}}^{l}{\sigma }_{2}{\right)}^{2}\end{array}\right)\text{ }.$

(29)

The above also corrects previous papers which inadvertently dropped the$sgn$ factors in the above [9,10,17].
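The algebra behind the $\gamma^{\pm}$ construction of Eq. (27) is easy to verify numerically. A minimal sketch in plain Python (the function name is ours, not from the paper): each noise combination has unit variance, ${\gamma^{+}}^{2}+{\gamma^{-}}^{2}=1$, and the two combinations correlate with coefficient $2\,sgn(\rho)\,\gamma^{+}\gamma^{-}=\rho$, reproducing Eq. (26).

```python
import math

def gamma_pm(rho):
    """gamma± of Eq. (27): gamma± = (1/sqrt(2)) [1 ± (1 - rho^2)^(1/2)]^(1/2)."""
    s = math.sqrt(1.0 - rho * rho)
    gp = math.sqrt((1.0 + s) / 2.0)
    gm = math.sqrt((1.0 - s) / 2.0)
    return gp, gm

# For any rho in (-1, 1):
#   gp^2 + gm^2 = 1          (each combined noise has unit variance)
#   2 sgn(rho) gp gm = rho   (the two combined noises correlate as rho)
for rho in (-0.9, -0.25, 0.0, 0.3, 0.7):
    gp, gm = gamma_pm(rho)
    sgn = 1.0 if rho >= 0 else -1.0
    assert abs(gp * gp + gm * gm - 1.0) < 1e-12
    assert abs(2.0 * sgn * gp * gm - rho) < 1e-12
```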

## 5. ADAPTIVE SIMULATED ANNEALING (ASA) FITS

### 5.1. ASA Outline

The algorithm now called Adaptive Simulated Annealing (ASA) [46] fits short-time probability distributions to observed data, using a maximum likelihood technique on the Lagrangian. It has been developed to fit observed data to a theoretical cost function over a $D$-dimensional parameter space [46], adapting for varying sensitivities of parameters during the fit. The ASA code can be obtained at no charge, via WWW from http://www.ingber.com/ or via FTP from ftp.ingber.com [15].

#### 5.1.1. General description

Simulated annealing (SA) was developed in 1983 to deal with highly nonlinear problems [47], as an extension of a Monte-Carlo importance-sampling technique developed in 1953 for chemical physics problems. It helps to visualize the problems presented by such complex systems as a geographical terrain. For example, consider a mountain range, with two “parameters,” e.g., along the North−South and East−West directions. We wish to find the lowest valley in this terrain. SA approaches this problem similarly to using a bouncing ball that can bounce over mountains from valley to valley. We start at a high “temperature,” where the temperature is an SA parameter that mimics the effect of a fast-moving particle in a hot object like a hot molten metal, thereby permitting the ball to make very high bounces, able to bounce over any mountain to access any valley, given enough bounces. As the temperature is made relatively colder, the ball cannot bounce so high, and it also can settle to become trapped in relatively smaller ranges of valleys.
We imagine that our mountain range is aptly described by a “cost function.” We define probability distributions of the two directional parameters, called generating distributions since they generate possible valleys or states we are to explore. We define another distribution, called the acceptance distribution, which depends on the difference of cost functions of the present generated valley we are to explore and the last saved lowest valley. The acceptance distribution decides probabilistically whether to stay in a new lower valley or to bounce out of it. All the generating and acceptance distributions depend on temperatures.
In 1984 [48], it was established that, by carefully controlling the rate of cooling of the temperature, SA could statistically find the best minimum, e.g., the lowest valley of our example above. This was good news for people trying to solve hard problems which could not be solved by other algorithms. The bad news was that the guarantee was only good if they were willing to run SA forever. In 1987, a method of fast annealing (FA) was developed [49], which permitted lowering the temperature exponentially faster, thereby statistically guaranteeing that the minimum could be found in some finite time. However, that time still could be quite long. Shortly thereafter, Very Fast Simulated Reannealing (VFSR) was developed in 1987 [46], now called Adaptive Simulated Annealing (ASA), which is exponentially faster than FA.
ASA has been applied to many problems by many people in many disciplines [50-52]. The feedback of many users regularly scrutinizing the source code ensures its soundness as it becomes more flexible and powerful.

#### 5.1.2. Mathematical outline

ASA considers a parameter${\alpha }_{k}^{i}$ in dimension$i$ generated at annealing-time$k$ with the range

 ${\alpha }_{k}^{i}\in \left[{A}_{i},{B}_{i}\right]\text{ },$ (30) calculated with the random variable${y}^{i}$ ,
${\alpha }_{k+1}^{i}={\alpha }_{k}^{i}+{y}^{i}\left({B}_{i}-{A}_{i}\right)\text{ },$
 ${y}^{i}\in \left[-1,1\right]\text{ }.$ (31) The generating function${g}_{T}\left(y\right)$ is defined,
 ${g}_{T}\left(y\right)=\text{\hspace{0.17em}}\underset{i=1}{\overset{D}{\Pi }}\text{\hspace{0.17em}}\frac{1}{2\left(|{y}^{i}|+{T}_{i}\right)\text{\hspace{0.17em}}\text{ln}\left(1+1/{T}_{i}\right)}\equiv \text{\hspace{0.17em}}\underset{i=1}{\overset{D}{\Pi }}\text{\hspace{0.17em}}{g}_{T}^{i}\left({y}^{i}\right)\text{ },$ (32) where the subscript$i$ on${T}_{i}$ specifies the parameter index, and the$k$ -dependence in${T}_{i}\left(k\right)$ for the annealing schedule has been dropped for brevity. Its cumulative probability distribution is
${G}_{T}\left(y\right)=\text{\hspace{0.17em}}\underset{-1}{\overset{{y}^{1}}{\int }}...\underset{-1}{\overset{{y}^{D}}{\int }}\text{\hspace{0.17em}}dy{\prime }^{1}...dy{\prime }^{D}\text{\hspace{0.17em}}{g}_{T}\left(y\prime \right)\equiv \text{\hspace{0.17em}}\underset{i=1}{\overset{D}{\Pi }}\text{\hspace{0.17em}}{G}_{T}^{i}\left({y}^{i}\right)\text{ },$
 ${G}_{T}^{i}\left({y}^{i}\right)=\frac{1}{2}+\frac{\text{\hspace{0.17em}}sgn\text{\hspace{0.17em}}\left({y}^{i}\right)}{2}\frac{\text{ln}\left(1+|{y}^{i}|/{T}_{i}\right)}{\text{ln}\left(1+1/{T}_{i}\right)}\text{ }.$ (33) ${y}^{i}$ is generated from a ${u}^{i}$ drawn from the uniform distribution
${u}^{i}\in U\left[0,1\right]\text{ },$
 ${y}^{i}=\text{\hspace{0.17em}}sgn\text{\hspace{0.17em}}\left({u}^{i}-\frac{1}{2}\right){T}_{i}\left[\left(1+1/{T}_{i}{\right)}^{|2{u}^{i}-1|}-1\right]\text{ }.$ (34) It is straightforward to calculate that, for an annealing schedule for ${T}_{i}$
 ${T}_{i}\left(k\right)={T}_{0i}\text{exp}\left(-{c}_{i}{k}^{1/D}\right)\text{ },$ (35) a global minimum statistically can be obtained. I.e.,
 $\underset{{k}_{0}}{\overset{\infty }{\Sigma }}\text{\hspace{0.17em}}{g}_{k}\text{≈}\text{\hspace{0.17em}}\underset{{k}_{0}}{\overset{\infty }{\Sigma }}\text{\hspace{0.17em}}\left[\text{\hspace{0.17em}}\underset{i=1}{\overset{D}{\Pi }}\text{\hspace{0.17em}}\frac{1}{2|{y}^{i}|{c}_{i}}\right]\frac{1}{k}=\infty \text{ }.$ (36) Control can be taken over${c}_{i}$ , such that
${T}_{fi}={T}_{0i}\text{exp}\left(-{m}_{i}\right)\text{ }when\text{ }{k}_{f}=\text{exp}{n}_{i}\text{ },$
${c}_{i}={m}_{i}\text{exp}\left(-{n}_{i}/D\right)\text{ },$

(37)

where${m}_{i}$ and${n}_{i}$ can be considered “free” parameters to help tune ASA for specific problems.
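Equations (33)-(35) and (37) can be exercised directly. The sketch below (plain Python; all parameter values are illustrative, not tuned) checks that the generator of Eq. (34) inverts the cumulative distribution of Eq. (33), and that the choice of ${c}_{i}$ in Eq. (37) indeed delivers ${T}_{fi}={T}_{0i}\text{exp}\left(-{m}_{i}\right)$ at annealing-time ${k}_{f}=\text{exp}\,{n}_{i}$.

```python
import math

def asa_generate(u, T):
    """Eq. (34): map a uniform u in [0,1] to y in [-1,1] at temperature T."""
    return math.copysign(1.0, u - 0.5) * T * ((1.0 + 1.0 / T) ** abs(2.0 * u - 1.0) - 1.0)

def asa_cdf(y, T):
    """Eq. (33): cumulative distribution G_T of the generating function g_T."""
    return 0.5 + math.copysign(0.5, y) * math.log(1.0 + abs(y) / T) / math.log(1.0 + 1.0 / T)

T = 0.01
for u in (0.05, 0.3, 0.5001, 0.9):
    y = asa_generate(u, T)
    assert -1.0 <= y <= 1.0
    assert abs(asa_cdf(y, T) - u) < 1e-10   # Eq. (34) inverts Eq. (33)

# Eq. (35)/(37): with c_i = m_i exp(-n_i/D), T_i(k_f) = T_0i exp(-m_i) at k_f = exp(n_i)
D, m_i, n_i, T0 = 4, 5.0, 6.0, 1.0
c_i = m_i * math.exp(-n_i / D)
k_f = math.exp(n_i)
T_kf = T0 * math.exp(-c_i * k_f ** (1.0 / D))
assert abs(T_kf - T0 * math.exp(-m_i)) < 1e-12
```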
ASA has over 100 OPTIONS available for tuning. A few important ones were used in this project.

#### 5.1.3. Reannealing

Whenever doing a multi-dimensional search in the course of a complex nonlinear physical problem, inevitably one must deal with different changing sensitivities of the${\alpha }^{i}$ in the search. At any given annealing-time, the range over which the relatively insensitive parameters are being searched can be “stretched out” relative to the ranges of the more sensitive parameters. This can be accomplished by periodically rescaling the annealing-time$k$ , essentially reannealing, every hundred or so acceptance-events (or at some user-defined modulus of the number of accepted or generated states), in terms of the sensitivities${s}_{i}$ calculated at the most current minimum value of the cost function,$C$ ,

 ${s}_{i}=\partial C/\partial {\alpha }^{i}\text{ }.$ (38) In terms of the largest${s}_{i}$ =${s}_{\text{max}}$ , a default rescaling is performed for each${k}_{i}$ of each parameter dimension, whereby a new index$k{\prime }_{i}$ is calculated from each${k}_{i}$ ,
${k}_{i}\to k{\prime }_{i}\text{ },$
$T{\prime }_{ik\prime }={T}_{ik}\left({s}_{\text{max}}/{s}_{i}\right)\text{ },$
$k{\prime }_{i}=\left(\text{ln}\left({T}_{i0}/{T}_{ik\prime }\right)/{c}_{i}{\right)}^{D}\text{ }.$

(39)

${T}_{i0}$ is set to unity to begin the search, which is ample to span each parameter dimension.
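The reannealing rescaling of Eqs. (38)-(39) amounts to a round trip through the temperature schedule of Eq. (35). A small sketch (plain Python; the sensitivities and schedule constants are illustrative, not from the production ASA code):

```python
import math

def schedule_T(k, T0, c, D):
    """Eq. (35): T(k) = T0 exp(-c k^(1/D))."""
    return T0 * math.exp(-c * k ** (1.0 / D))

def reanneal_k(T_new, T0, c, D):
    """Eq. (39): annealing-time index k' that reproduces temperature T_new."""
    return (math.log(T0 / T_new) / c) ** D

D, T0, c = 3, 1.0, 0.8
k = 1000.0
s_i, s_max = 0.2, 1.0            # illustrative sensitivities, Eq. (38)
T_k = schedule_T(k, T0, c, D)
T_new = T_k * (s_max / s_i)      # insensitive parameter: its range is stretched out
k_new = reanneal_k(T_new, T0, c, D)
assert k_new < k                 # annealing-time effectively rewound
assert abs(schedule_T(k_new, T0, c, D) - T_new) < 1e-12
```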

#### 5.1.4. Quenching

Another adaptive feature of ASA is its ability to perform quenching in a methodical fashion. This is applied by noting that the temperature schedule above can be redefined as

${T}_{i}\left({k}_{i}\right)={T}_{0i}\text{exp}\left(-{c}_{i}{k}_{i}^{{Q}_{i}/D}\right)\text{ },$
 ${c}_{i}={m}_{i}\text{exp}\left(-{n}_{i}{Q}_{i}/D\right)\text{ },$ (40) in terms of the “quenching factor” ${Q}_{i}$. The sampling proof fails if ${Q}_{i}>1$, as
$\underset{k}{\Sigma }\stackrel{D}{\Pi }1/{k}^{{Q}_{i}/D}=\underset{k}{\Sigma }1/{k}^{{Q}_{i}}<\infty \text{ }.$

(41)

This simple calculation shows how the “curse of dimensionality” arises, and also gives a possible way of living with this disease. In ASA, the influence of large dimensions becomes clearly focused in the exponential of the power of $k$ being $1/D$, as the annealing required to properly sample the space becomes prohibitively slow. So, if resources cannot be committed to properly sample the space, then for some systems the next best procedure may be to turn on quenching, whereby ${Q}_{i}$ can become on the order of the size of the number of dimensions.
The scale of the power of $1/D$ temperature schedule used for the acceptance function can be altered in a similar fashion. However, this does not affect the annealing proof of ASA, and so this may be used without damaging the sampling property.
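The convergence criterion of Eq. (41) is the heart of the matter; partial sums make the dichotomy concrete (a toy check, not part of ASA itself):

```python
# Partial sums of Eq. (41): sum_k 1/k^Q diverges for Q = 1 (sampling proof
# intact) but converges for Q > 1 (proof fails; quenching trades rigor for speed).
N = 100000
s1 = sum(1.0 / k for k in range(1, N + 1))       # Q = 1: grows like ln N
s2 = sum(1.0 / k ** 2 for k in range(1, N + 1))  # Q = 2: bounded by pi^2/6
assert s1 > 11.0          # harmonic partial sum keeps growing
assert s2 < 1.6449341     # stays below its finite limit pi^2/6
```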

### 5.2. $x$-Indicator of Market Contexts

Our studies of contexts of markets well recognized by option traders to have significantly different volatility behavior show that the exponents$x$ are reasonably faithful indicators defining these different contexts.
We feel the two-factor model is more accurate because the data indeed demonstrate stochastic volatility [4]. We also note that the two-factor $x$ fit by ASA has been stable across the last few years; this is not true of the one-factor ASA-fitted $x$. We do not fit the Black-Scholes $\sigma$ as a parameter, but rather calculate it as an historical volatility during all runs. Some results of two-factor studies and one-factor studies using a Black-Scholes $\sigma$ have been reported elsewhere [13].
Since$\sigma$ is not widely traded and arbitraged, to fit the two-factor model, we calculate this quantity as an historical volatility for both its prepoint and postpoint values. Some previous studies used a scaled implied volatility (which is calculated from a BS model). We use a standard deviation$\sigma \prime$ ,

$\sigma \prime =StdDev\left(dS\text{\hspace{0.17em}}/\text{\hspace{0.17em}}F\left(S,{S}_{0},{S}_{\infty },x,y\right)\right)\text{ }.$

(42)

In the one-factor model, it does not make good numerical sense to have two free parameters in one term, i.e., $\sigma$ and $x$, as these cannot be fit very well within the variance of the data. Instead, one method is to take guidance from the two-factor results, to set a scale for an effective $\sigma$, and then fit the parameter $x$. Another method is to apply the above StdDev as a proxy for $\sigma$. Some motivation for this approach is given by considering collapsing a two-factor stochastic volatility model into a one-factor model: the one-factor model now has an integral over the stochastic process in its diffusion term. This integral is what we are approximating by using the standard deviation of a moving window of the data.
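The StdDev proxy of Eq. (42) can be sketched as follows (plain Python; since $F\left(S,{S}_{0},{S}_{\infty },x,y\right)$ is defined earlier in the paper, the demo below assumes the lognormal case $F(S)=S$, i.e., $x=1$, purely for illustration, and all numbers are synthetic):

```python
import math, random

def moving_stddev_proxy(S, F, window):
    """Eq. (42) sketch: sigma' = StdDev(dS / F(S)) over a trailing window.
    F stands in for the paper's diffusion scale F(S, S0, S_inf, x, y);
    here it is passed as a callable (an illustrative assumption)."""
    ratios = [(S[i + 1] - S[i]) / F(S[i]) for i in range(len(S) - 1)]
    r = ratios[-window:]
    mean = sum(r) / len(r)
    return math.sqrt(sum((x - mean) ** 2 for x in r) / (len(r) - 1))

# Demo: simulate dS = sigma * S * dz; the proxy should recover sigma * sqrt(dt).
random.seed(0)
sigma, dt, S = 0.10, 1.0 / 252, [100.0]
for _ in range(5000):
    S.append(S[-1] * (1.0 + sigma * math.sqrt(dt) * random.gauss(0.0, 1.0)))
est = moving_stddev_proxy(S, lambda s: s, window=2000) / math.sqrt(dt)
assert abs(est - sigma) < 0.01
```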

## 6. PATH-INTEGRAL (PATHINT) DEVELOPMENT

### 6.1. PATHINT Outline

The fits described above clearly demonstrate the need to incorporate stochastic volatility in option pricing models. If one-factor fits are desired, e.g., for efficiency of calculation, then at the least the exponent of price$x$ should be permitted to freely adapt to the data. In either case, it is required to develop a full set of Greeks for trading. To meet these needs, we have used a path-integral code, PATHINT, described below, with great success. At this time, the two-factor code takes too long to run for daily use, but it proves to be a good weekly baseline for the one-factor code.
The PATHINT algorithm develops the long-time probability distribution from the Lagrangian fit by the first optimization code. A robust and accurate histogram-based (non-Monte Carlo) path-integral algorithm to calculate the long-time probability distribution has been developed to handle nonlinear Lagrangians [18-20,22,53-55].
The histogram procedure recognizes that the distribution can be numerically approximated to a high degree of accuracy as a sum of rectangles at points ${M}_{i}$ of height ${P}_{i}$ and width $\Delta {M}_{i}$. For convenience, just consider a one-dimensional system. The above path-integral representation can be rewritten, for each of its intermediate integrals, as

$P\left(M;t+\Delta t\right)=\text{\hspace{0.17em}}\int \text{\hspace{0.17em}}dM\prime \left[{g}_{s}^{1/2}\left(2\pi \Delta t{\right)}^{-1/2}\text{exp}\left(-{L}_{s}\Delta t\right)\right]P\left(M\prime ;t\right)=\text{\hspace{0.17em}}\int \text{\hspace{0.17em}}dM\prime G\left(M,M\prime ;\Delta t\right)P\left(M\prime ;t\right)\text{ },$
$P\left(M;t\right)=\underset{i=1}{\overset{N}{\Sigma }}\pi \left(M-{M}_{i}\right){P}_{i}\left(t\right)\text{ },$
 $\pi \left(M-{M}_{i}\right)=\left\{\begin{array}{l}1\text{\hspace{0.17em}},\text{ }\left({M}_{i}-\frac{1}{2}\Delta {M}_{i-1}\right)\le M\le \left({M}_{i}+\frac{1}{2}\Delta {M}_{i}\right)\text{ },\\ 0\text{\hspace{0.17em}},\text{ }otherwise\text{\hspace{0.17em}},\end{array}$ (43) which yields
${P}_{i}\left(t+\Delta t\right)={T}_{ij}\left(\Delta t\right){P}_{j}\left(t\right)\text{ },$
 ${T}_{ij}\left(\Delta t\right)=\frac{2}{\Delta {M}_{i-1}+\Delta {M}_{i}}\text{\hspace{0.17em}}\int \underset{{M}_{i}-\Delta {M}_{i-1}/2}{\overset{{M}_{i}+\Delta {M}_{i}/2}{\text{\hspace{0.17em}}}}dM\text{\hspace{0.17em}}\int \underset{{M}_{j}-\Delta {M}_{j-1}/2}{\overset{{M}_{j}+\Delta {M}_{j}/2}{\text{\hspace{0.17em}}}}dM\prime G\left(M,M\prime ;\Delta t\right)\text{ }.$ (44)${T}_{ij}$ is a banded matrix representing the Gaussian nature of the short-time probability centered about the (varying) drift. Care must be used in developing the mesh in$\Delta {M}^{G}$ , which is strongly dependent on the diagonal elements of the diffusion matrix, e.g.,
 $\Delta {M}^{G}\text{≈}\left(\Delta t{g}^{|G||G|}{\right)}^{1/2}\text{ }.$ (45) Presently, this constrains the dependence of the covariance of each variable to be a nonlinear function of that variable, albeit arbitrarily nonlinear, in order to present a straightforward rectangular underlying mesh. Below we address how we have handled this problem in our two-factor stochastic-volatility model. Fitting data with the short-time probability distribution, effectively using an integral over this epoch, permits the use of coarser meshes than the corresponding stochastic differential equation. The coarser resolution is appropriate, typically required, for numerical solution of the time-dependent path integral: By considering the contributions to the first and second moments of $\Delta {M}^{G}$ for small time slices $\theta$, conditions on the time and variable meshes can be derived [53]. The time slice essentially is determined by $\theta \text{ }\le \text{ }{\overline{L}}^{-1}$, where $\overline{L}$ is the “static” Lagrangian with $d{M}^{G}/dt\text{ }=\text{ }0$, throughout the ranges of ${M}^{G}$ giving the most important contributions to the probability distribution $P$. The variable mesh, a function of ${M}^{G}$, is optimally chosen such that $\Delta {M}^{G}$ is measured by the covariance ${g}^{GG\prime }$, or $\Delta {M}^{G}\text{≈}\left({g}^{GG}\theta {\right)}^{1/2}$. If the histogram method is further developed into a trapezoidal integration, then more accuracy can be expected at more complex boundaries [54]. Such problems do not arise here, and 6−7 significant-figure accuracy is easily achieved, provided great care is taken to develop the mesh as described above. For example, after setting the initial-condition discretized delta function in ${M}^{G}$ at the prepoint of an interval, the mesh going forward in ${M}^{G}$ is simply calculated stepwise using
$\Delta {M}^{G}=\left({g}^{GG}\theta {\right)}^{1/2}\text{ }.$

(46)

However, going backwards in${M}^{G}$ , an iterative procedure was used at each step, starting with an estimate from the prepoint and going forward again, until there was no mismatch. That much care is required for the mesh was observed in the original Wehner-Wolfer paper [53].
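A minimal one-factor illustration of the folding of Eqs. (43)-(45) is sketched below (plain Python; a constant drift $f$ and diffusion $g$ are assumed so the developed moments are checkable, and a midpoint rule stands in for the double integral of Eq. (44), a simplification for illustration only):

```python
import math

def pathint_fold(p, mesh, f, g, dt):
    """One folding of Eqs. (43)-(44) on a uniform mesh.  The midpoint rule
    replaces the double integral of Eq. (44); f and g are the (here
    constant) drift and diffusion of the short-time Gaussian kernel."""
    dM = mesh[1] - mesh[0]
    norm = 1.0 / math.sqrt(2.0 * math.pi * g * dt)
    new = []
    for M in mesh:
        acc = 0.0
        for Mp, pj in zip(mesh, p):
            acc += norm * math.exp(-(M - Mp - f * dt) ** 2 / (2.0 * g * dt)) * pj * dM
        new.append(acc)
    return new

f, g, T, n = 0.5, 0.04, 0.5, 50
dt = T / n
dM = math.sqrt(g * dt)                        # mesh condition, Eq. (45)
mesh = [-1.0 + i * dM for i in range(int(round(2.0 / dM)) + 1)]
i0 = min(range(len(mesh)), key=lambda i: abs(mesh[i]))
p = [0.0] * len(mesh)
p[i0] = 1.0 / dM                              # discretized delta function at M = 0
for _ in range(n):
    p = pathint_fold(p, mesh, f, g, dt)
mass = sum(p) * dM
mean = sum(M * q for M, q in zip(mesh, p)) * dM
var = sum((M - mean) ** 2 * q for M, q in zip(mesh, p)) * dM
assert abs(mass - 1.0) < 1e-3                 # foldings preserve normalization
assert abs(mean - f * T) < 1e-2               # <M> develops as f T
assert abs(var - g * T) < 1e-3                # Var(M) develops as g T
```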
It is important to stress that very good numerical accuracy is required to get the very good Greeks required for real-world trading. Many authors develop very efficient numerical schemes to get reasonable prices to 2 or 3 significant figures, but these methods often do not deliver enough significant figures to get good precision for the Greeks. Typical Monte Carlo methods are notorious for giving such poor results after very long computer runs. In particular, we do not believe that good Greeks required for trading can be obtained by using meshes obtained by other simpler numerical algorithms [56].
The PATHINT algorithm in its present form can “theoretically” handle any n-factor model subject to its diffusion-mesh constraints. In practice, the calculation of 3-factor and 4-factor models likely will wait until giga-hertz speeds and giga-byte RAM are commonplace.

### 6.2. Development of Long-Time Probabilities

The noise determined empirically as the diffusion of the data is the same, independent of $x$ within our approach. Therefore, we scale different exponents such that the diffusions, the square of the “basis-point volatilities” (BPV), are scaled to be equivalent. Then, there is not a very drastic change in option prices for different exponents $x$ for the strike $X$ set to the underlying $S$, the at-the-money (ATM) strike. This is not the case for out of the money (OTM) or in the money (ITM) strikes, i.e., when exercising at the strike would generate a loss or a profit, respectively. This implies that current pricing models are not radically mispricing the markets, but there still are significant changes in Greeks using more sophisticated models.
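The BPV-equivalent scaling can be sketched as follows (assuming, for illustration only, the pure power form $F(S)=S^{x}$ and hypothetical numbers, since the paper's $F\left(S,{S}_{0},{S}_{\infty },x,y\right)$ is defined earlier). Matching the diffusions at the current underlying ${S}_{0}$ makes the ATM diffusions agree for all $x$, while away from the money the diffusion profiles, and hence OTM/ITM prices and Greeks, retain their $x$-dependence:

```python
# BPV-equivalent scaling: choose sigma_x so that the diffusion sigma_x * F(S)
# matches the lognormal diffusion sigma_ln * S at the current underlying S0.
# The pure power form F(S) = S**x and all numbers are illustrative assumptions.
S0, sigma_ln = 98.0, 0.10
for x in (-1.0, 0.0, 0.7, 2.0):
    sigma_x = sigma_ln * S0 / S0 ** x                      # BPV equality at S0
    assert abs(sigma_x * S0 ** x - sigma_ln * S0) < 1e-9   # ATM diffusions agree
    # Away from the money the diffusion profiles still differ with x, which is
    # where OTM/ITM prices and the Greeks pick up their x-dependence:
    S_otm = 1.1 * S0
    ratio = (sigma_x * S_otm ** x) / (sigma_ln * S_otm)    # = (S_otm/S0)**(x-1)
    assert (ratio < 1.0) == (x < 1.0)
```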

### 6.3. Dependence of Probabilities on $x$ and $\sigma$

Fig. 2 gives examples of the short-time distribution evolved out to $T=0.5$ year for $x$ in {-1, 0, 1, 2}, with 500 intermediate epochs/foldings, and BS $\sigma =0.0075$. Each calculation scales $\sigma$ by multiplying by $S/F\left(S,{S}_{0},{S}_{\infty },x,y\right)$.
Fig. 3 gives an example of a two-factor distribution evolved out to $T=0.5$ year for $x=0.7$.

### 6.4. Two-Factor Volatility and PATHINT Modifications

In our two-factor model, the mesh of $S$ would depend on $\sigma$ and cause some problems in any PATHINT grid to be developed in $S$-$\sigma$ space.
For some time we have considered how to handle this generic problem for$n$ -factor multivariate systems with truly multivariate diffusions within the framework of PATHINT. In one case, we have taken advantage of the Riemannian invariance of the probability distribution as discussed above, to transform to a system where the diffusions have only “diagonal” multiplicative dependence [19]. However, this leads to cumbersome numerical problems with the transformed boundary conditions [20]. Another method, not yet fully tested, is to develop a tiling of diagonal meshes for each factor$i$ that often are suitable for off-diagonal regions in an$n$ -factor system, e.g.,

$\Delta {M}_{k}^{i}={2}^{{m}_{k}^{i}}\Delta {M}_{0}^{i}\text{ },$
$\Delta {M}_{0}^{i}\text{≈}\sqrt{{g}_{{k}_{0}}^{|i||i|}\Delta t}\text{ },$

(47)

where the mesh of variable$i$ at a given point labeled by$k$ is an exponentiation of 2, labeled by${m}_{k}^{i}$ ; the integral power${m}_{k}^{i}$ is determined such that it gives a good approximation to the diagonal mesh given by the one-factor PATHINT mesh conditions, in terms of some minimal mesh$\Delta {M}_{0}^{i}$ , throughout regions of the Lagrangian giving most important contributions to the distribution as predetermined by a scan of the system. This tiling of the kernel is to be used together with interpolation of intermediate distributions.
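The integer tiling exponent of Eq. (47) can be sketched as a rounding in $\log_2$ (the function name and the numbers below are ours, for illustration only):

```python
import math

def tile_exponent(dM_local, dM0):
    """Eq. (47): integer power m such that 2**m * dM0 best approximates the
    locally desired diagonal mesh dM_local (a hypothetical helper)."""
    return round(math.log2(dM_local / dM0))

dM0 = 0.01                                   # minimal mesh of Eq. (47)
for dM_local, expected in ((0.0105, 0), (0.019, 1), (0.043, 2), (0.0052, -1)):
    m = tile_exponent(dM_local, dM0)
    assert m == expected
    assert 0.5 < (2 ** m * dM0) / dM_local < 2.0   # tiled mesh within a factor of 2
```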
The results of our study here are that, after the at-the-money BPV are scaled to be equivalent, there is not a very drastic change in the one-factor ATM Greeks developed below. Therefore, while we have not at all changed the functional dependence of the Lagrangian on $S$ and $\sigma$, we have determined our meshes using a diffusion for the $S$ equation as ${\sigma }_{0}\text{\hspace{0.17em}}F\left(S,{S}_{0},{S}_{\infty },x,y\right)$, where ${\sigma }_{0}$ is determined by the same BPV-equivalent condition as imposed on the one-factor models. This seems to work very well, especially since we have taken our $\sigma$ equation to be normal with a limited range of influence in the calculations. Future work has yet to establish a more definitive distribution for $\sigma$.

## 7. CALCULATION OF DERIVATIVES

### 7.1. Primary Use of Probabilities For European Options

We can use PATHINT to develop the distribution of the option value back in time from expiration. This is the standard approach used by CRR, explicit and implicit Crank-Nicolson models, etc. [57].
For European options, we also take advantage of the accuracy of PATHINT enhanced by normalizing the distribution as well as the kernel at each iteration (though in these studies this was not required after normalizing the kernel). Therefore, we have calculated our European option prices and Greeks using the most elementary and intuitive definition of the option’s price$V$  [58], which is the expected value

 $V=\text{\hspace{0.17em}}<\text{max}\left[z\left(S-X\right),0\right]>\text{ },\text{ }\left\{\begin{array}{cc}z=1\text{ },\hfill & call\hfill \\ z=-1\text{ },\hfill & put\hfill \end{array}$ (48) where $X$ is the strike price, and the expected value $<.>$ is taken with respect to the risk-neutral distribution of the underlying market $S$. It should be noted that, while the standard approach of developing the option price delivers at the present time a range of underlying values for a given strike, our approach delivers a more practical complete range of strikes (as many as 50−60 for Eurodollar options) for a given underlying at the present time, resulting in a greatly enhanced numerical efficiency. The risk-neutral distribution is effectively calculated taking the drift as the cost-of-carry $b$ times $S$, using the above arguments leading to the BS formula. We have designed our codes to use parameters risk-free-rate $r$ and cost-of-carry $b$ such that
$b=r,\text{ }\text{ }cost\text{ }of\text{ }carry\text{ }on\text{ }nondividend\text{ }stock\text{ },$
$b=r-q,\text{ }cost\text{ }of\text{ }carry\text{ }on\text{ }stock\text{ }paying\text{ }dividend\text{ }yield\text{ }q\text{ },$
$b=0,\text{ }cost\text{ }of\text{ }carry\text{ }on\text{ }futures\text{ }contract\text{ },$
 $b=r-{r}_{f},\text{ }cost\text{ }of\text{ }carry\text{ }on\text{ }currency\text{ }with\text{ }rate\text{ }{r}_{f}\text{ },$ (49) which is similar to how generalized European BS codes use$b$ and$r$  [59]. Using this approach, the European price${V}_{E}$ is calculated as
 ${V}_{E}=\text{\hspace{0.17em}}<\text{max}\left[z\left({e}^{\left(b-r\right)T}{S}_{T}-{e}^{-rT}X\right),0\right]>\text{ }.$ (50) The American price${V}_{A}$ must be calculated using a different kernel going back in time from expiration, using as “initial conditions” the option values used in the above average. This kernel is the transposed matrix used for the European case, and includes additional drift and “potential” terms due to the need to develop this back in time. This can be understood as requiring the adjoint partial differential equation or a postpoint Lagrangian in real time. That is, a forward equation for the conditional probability distribution$P\left[M\left(t+dt\right),t+dt|M\left(t\right),t\right]$ is
 $\partial P/\partial t=\frac{1}{2}\left({g}^{GG\prime }P{\right)}_{,GG\prime }-\left({g}^{G}P{\right)}_{,G}+V\text{ },$ (51) where the partial derivatives with respect to ${M}^{G}$ act on the postpoint ${M}^{G}\left(t+dt\right)$. A backward equation for the conditional probability distribution $P\left[M\left(t+dt\right),t+dt|M\left(t\right),t\right]$ is
 $\partial P/\partial t=\frac{1}{2}{g}^{GG\prime }{P}_{,GG\prime }-{g}^{G}{P}_{,G}+V\text{ },$ (52) where the partial derivatives with respect to ${M}^{G}$ act on the prepoint ${M}^{G}\left(t\right)$. The forward equation has a particularly simple form in the mathematically equivalent prepoint path integral representation. Above, we have described how the forward distribution at present time ${T}_{0}$ is evolved using PATHINT to the time of expiration, $P\left(T\right)$, e.g., using a path-integral kernel $K$ folded over $n$ epochs, where it is folded with the function $O$,
$V=\text{\hspace{0.17em}}<O\left(T\right)P\left(T\right)>\text{\hspace{0.17em}}=\text{\hspace{0.17em}}<O\left(T\right)\left({K}^{n}P\left({t}_{0}\right)\right)>\text{ },$
 $O\left(T\right)=\text{max}\left[z\left({e}^{\left(b-r\right)T}{S}_{T}-{e}^{-rT}X\right),0\right]\text{ },$ (53) to determine the European value at the present time of the calls and puts at different strike values $X$. An equivalent calculation can be performed by using the backward equation, expressed in terms of the “equivalent” kernel ${K}^{†}$ acting on $O$,
 $V=\text{\hspace{0.17em}}<O\left(T\right)P\left(T\right)>\text{\hspace{0.17em}}=\text{\hspace{0.17em}}<\left({K}^{n}O\left(T\right)\right)P\left({t}_{0}\right)>\text{ }.$ (54) It is convenient to use the simple prepoint representation for the Lagrangian, so the backward kernel is first re-expressed as a forward kernel by bringing the diffusions and drifts inside the partial derivatives, giving a transformed adjoint kernel ${K}^{†}$. The above mathematics is easily tested by calculating European options going forwards and backwards. For American options, while performing the backwards calculation, at each point of mesh, the option price evolving backwards from $T$ is tested and calculated as
${O}_{new}=\text{max}\left[\left(S-X\right),\left({e}^{-rdt}{O}_{old}\right)\right]\text{ }.$

(55)

The Greeks$\text{{}\Delta ,\Gamma ,\Theta \text{}}$ are directly taken off the final developed option at the present time, since the${M}^{G}$ mesh is available for all derivatives. We get excellent results for all Greeks. Note that for CRR trees, only one point of mesh is at the present time, so$\Delta$ requires moving back one epoch and$\Gamma$ requires moving back two epochs, unless the present time is pushed back additional epochs, etc.
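Equations (48)-(50) can be baselined against the generalized Black-Scholes formula under the conventions of Eq. (49). In the sketch below (plain Python), ${S}_{T}$ is taken as a driftless lognormal variable, with the carry and discount factors applied in the payoff as written in Eq. (50); the quadrature over the Gaussian driver is an illustrative stand-in for a PATHINT-developed distribution, and $S=X=1$ is an assumption of ours:

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_value_price(S, X, T, r, b, sigma, z, n=4000):
    """Eq. (50): V_E = <max[z(e^{(b-r)T} S_T - e^{-rT} X), 0]>.
    S_T is modeled as driftless lognormal (carry b and discount r sit in the
    payoff); trapezoidal quadrature over the Gaussian driver u stands in
    for a PATHINT-developed distribution (an illustrative simplification)."""
    lo, hi = -8.0, 8.0
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        u = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        ST = S * math.exp(-0.5 * sigma * sigma * T + sigma * math.sqrt(T) * u)
        payoff = max(z * (math.exp((b - r) * T) * ST - math.exp(-r * T) * X), 0.0)
        total += w * payoff * math.exp(-0.5 * u * u)
    return total * h / math.sqrt(2.0 * math.pi)

def generalized_bs(S, X, T, r, b, sigma, z):
    """Generalized Black-Scholes with cost-of-carry b, Eq. (49) conventions."""
    d1 = (math.log(S / X) + (b + 0.5 * sigma * sigma) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return z * (S * math.exp((b - r) * T) * norm_cdf(z * d1)
                - X * math.exp(-r * T) * norm_cdf(z * d2))

# b = 0 (futures, the Table 1 context) and b = r (nondividend stock), per Eq. (49)
for b in (0.0, 0.05):
    v = expected_value_price(1.0, 1.0, 0.5, 0.05, b, 0.10, z=1)
    assert abs(v - generalized_bs(1.0, 1.0, 0.5, 0.05, b, 0.10, z=1)) < 1e-5
```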

### 7.2. PATHINT Baselined to CRR and BS

The CRR method is a simple binomial tree which in a specific limit approaches the BS partial differential equation. It has the virtues of being fast and readily accommodates European and American calculations. However, it suffers a number of well-known numerical problems, e.g., a systematic bias in the tree approximation and an oscillatory error as a function of the number of intermediate epochs/iterations in its time mesh. Some Greeks like$\text{{}\Delta ,\Gamma ,\Theta \text{}}$ can be directly taken off the tree used for pricing with reasonable approximations (at epochs just before the actual current time). The first problem for American options can be alleviated somewhat by using the variant method [5],

 $CR{R}_{variant}=CR{R}_{American}-CR{R}_{European}+BS\text{ }.$ (56) The second problem can be alleviated somewhat by averaging runs of$n$ iterations with runs of$n+1$ iterations [60]. This four-fold increase of runs is rarely used, though perhaps it should be more often. Furthermore, if increased accuracy in price is needed in order to take numerical derivatives, typically 200−300 iterations should be used for expirations some months away, not 30−70 as too often used in practice. When taking numerical derivatives there can arise a need to tune increments taken for the differentials. For some Greeks like$\Delta$ and$\Gamma$ the size of the best differentials to use may vary with strikes that have different sensitivities to parts of the underlying distribution. One method of building in some adaptive flexibility across many such strikes is to increase the order of terms used in taking numerical derivatives. (This was not required for results presented here.) For example, it is straightforward to verify that, while the central difference
 $\frac{df}{dx}=\frac{f\left(x+dx\right)-f\left(x-dx\right)}{2dx}$ (57) is typically good to$o\left(\left(dx{\right)}^{3}\right)$ ,
 $\frac{df}{dx}=\frac{-f\left(x+2dx\right)+8f\left(x+dx\right)-8f\left(x-dx\right)+f\left(x-2dx\right)}{12dx}$ (58) is typically good to$o\left(\left(dx{\right)}^{5}\right)$ Similarly, while
 $\frac{{d}^{2}f}{d{x}^{2}}=\frac{f\left(x+dx\right)-2f\left(x\right)+f\left(x-dx\right)}{dx\text{\hspace{0.17em}}dx}$ (59) is typically good to $o\left(\left(dx{\right)}^{4}\right)$ ,
$\frac{{d}^{2}f}{d{x}^{2}}=\frac{-f\left(x+2dx\right)+16f\left(x+dx\right)-30f\left(x\right)+16f\left(x-dx\right)-f\left(x-2dx\right)}{12dx\text{\hspace{0.17em}}dx}$

(60)

is typically good to $o\left(\left(dx{\right)}^{6}\right)$ .
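The four stencils of Eqs. (57)-(60) are straightforward to implement and compare, here on $f=\exp$, where $f=f'=f''$, so the exact values are known:

```python
import math

def d1_3pt(f, x, dx):
    """Three-point central first derivative, Eq. (57)."""
    return (f(x + dx) - f(x - dx)) / (2.0 * dx)

def d1_5pt(f, x, dx):
    """Five-point central first derivative, Eq. (58): higher-order accurate."""
    return (-f(x + 2 * dx) + 8 * f(x + dx) - 8 * f(x - dx) + f(x - 2 * dx)) / (12.0 * dx)

def d2_3pt(f, x, dx):
    """Three-point second derivative, Eq. (59)."""
    return (f(x + dx) - 2.0 * f(x) + f(x - dx)) / (dx * dx)

def d2_5pt(f, x, dx):
    """Five-point second derivative, Eq. (60), normalized by 12 dx^2."""
    return (-f(x + 2 * dx) + 16 * f(x + dx) - 30 * f(x)
            + 16 * f(x - dx) - f(x - 2 * dx)) / (12.0 * dx * dx)

f, x, dx = math.exp, 0.3, 1e-2
exact = math.exp(0.3)                        # f = f' = f'' for exp
assert abs(d1_5pt(f, x, dx) - exact) < abs(d1_3pt(f, x, dx) - exact)
assert abs(d2_5pt(f, x, dx) - exact) < abs(d2_3pt(f, x, dx) - exact)
assert abs(d1_5pt(f, x, dx) - exact) < 1e-9
assert abs(d2_5pt(f, x, dx) - exact) < 1e-7
```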
Table 1 gives an example of baselining our one-factor PATHINT code to the CRR and BS codes using the above safeguards for an option whose American price is the same as its European counterpart, a typical circumstance [59]. In the literature, the CRR code is most often taken as the benchmark for American calculations. We took the number of intermediate epochs/points to be 300 for each calculation. Parameters used for this particular ATM call are $T=0.5$ years, $r=0.05$, $b=0$, $\sigma =0.10$.

Table 1.

Tests with American CRR and American PATHINT lead to results with the same degree of accuracy.
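A sketch of the CRR baseline with the $n$/$n+1$ averaging safeguard described above (plain Python; the Table 1 parameters are used, with $S=X=1$ an assumption of ours since the table's underlying is not restated here):

```python
import math

def crr_european_call(S, X, T, r, b, sigma, n):
    """Cox-Ross-Rubinstein binomial tree for a European call with cost-of-carry b."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(b * dt) - d) / (u - d)     # risk-neutral up-probability
    disc = math.exp(-r * dt)
    values = [max(S * u ** j * d ** (n - j) - X, 0.0) for j in range(n + 1)]
    for _ in range(n):                       # roll back through the tree
        values = [disc * (p * values[j + 1] + (1.0 - p) * values[j])
                  for j in range(len(values) - 1)]
    return values[0]

def crr_averaged(S, X, T, r, b, sigma, n):
    """Average runs of n and n+1 iterations to damp the oscillatory error."""
    return 0.5 * (crr_european_call(S, X, T, r, b, sigma, n)
                  + crr_european_call(S, X, T, r, b, sigma, n + 1))

# Table 1 parameters (ATM call); compare against the closed-form b = 0 price.
S, X, T, r, b, sigma = 1.0, 1.0, 0.5, 0.05, 0.0, 0.10
d1 = (math.log(S / X) + (b + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
d2 = d1 - sigma * math.sqrt(T)
N = lambda y: 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))
bs = S * math.exp((b - r) * T) * N(d1) - X * math.exp(-r * T) * N(d2)
assert abs(crr_averaged(S, X, T, r, b, sigma, 300) - bs) < 1e-4
```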

### 7.3. Two-Factor PATHINT Baselined to One-Factor PATHINT

Previous papers and tests have demonstrated that the two-factor PATHINT code performs as expected. The code was developed with only a few lines to be changed for running any $n$-factor problem, i.e., of course after coding the Lagrangian specific to a given system. Tests were performed by combining two one-factor problems, and there is no loss of accuracy. However, here we are making some additional mesh approximations as discussed above to accommodate $\sigma$ in the $S$ diffusion. This seems quite reasonable, but there is no sure test of the accuracy. We indeed see that the ATM results are very close across $x$, similar to our ATM comparisons between BS and our one-factor PATHINT results for various $x$, when scaling is performed to have all models use the same BPV (using the ${\sigma }_{0}$ procedure for the mesh as described above for the two-factor model).
The logical extension of Greeks for the two-factor model is to develop derivatives of price with respect to $\rho$ and $\epsilon$ in the $\sigma$ volatility equation. However, we did not find a bona fide two-factor proxy for the one-factor $Υ$, the derivative of price with respect to the one-factor $\sigma$ constant. We get very good ATM $Υ$ comparisons between BS and our one-factor models with various $x$. We tried simply multiplying the noise in the two-factor stochastic volatility in the price equation by a parameter with deviations from 1 to get numerical derivatives of PATHINT solutions, and this gave reasonably good agreement with the ATM BPV-scaled BS $Υ$, within a couple of significant figures. Perhaps this is not too surprising, especially given the substantial correlation $\rho$ between the price and volatility equations, which we do not neglect.
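The bump-and-revalue idea used above, perturbing a noise parameter away from 1 and differencing the resulting prices, can be sketched with a central difference. Here the closed-form BS price stands in for a PATHINT solution, and the parameters are the hypothetical ATM values used earlier, not data from the text:

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, T, r, b, sigma):
    d1 = (math.log(S / K) + (b + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * math.exp((b - r) * T) * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def numerical_vega(price_fn, sigma, h=1e-4):
    """Central-difference derivative of price with respect to sigma."""
    return (price_fn(sigma + h) - price_fn(sigma - h)) / (2.0 * h)

# hypothetical ATM parameters
S = K = 1.0
T, r, b, sigma = 0.5, 0.05, 0.0, 0.10
vega_num = numerical_vega(lambda s: bs_call(S, K, T, r, b, s), sigma)

# analytic check for the one-factor case: Upsilon = S e^{(b-r)T} sqrt(T) phi(d1)
d1 = (math.log(S / K) + (b + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
vega_exact = S * math.exp((b - r) * T) * math.sqrt(T) * math.exp(-0.5 * d1**2) / math.sqrt(2.0 * math.pi)
print(vega_num, vega_exact)
```

The same central-difference recipe applies to a numerically generated PATHINT price, where no closed-form check is available.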

## 8. CONCLUSIONS

The results of our study are that, after the at-the-money basis-point volatilities are scaled to be equivalent, there is only a very small change in option prices for different exponents $x$, both for the one-factor and two-factor models. This implies that current pricing models are not radically mispricing the markets. However, there still are significant differences in Greeks using more sophisticated models, especially for out-of-the-money options.
Our studies point to contexts of markets well recognized by option traders to have significantly different volatility behavior. Suppression of stochastic volatility in the one-factor model just leaks out into stochasticity of parameters in the model, e.g., especially in$x$ , unless additional numerical methods are employed, e.g., using an adaptive standard deviation. Our studies show that the two-factor exponents$x$ are reasonably faithful indicators defining these different contexts. The$x$ -exponents in the two-factor fits are quite stable. As such, especially the two-factor$x$ can be considered as a “context indicator” over a longer time scale than other indicators typically used by traders. The two-factor fits also exhibit differences due to the$\sigma$ parameters, including the$\rho$ correlations, in accord with the sense traders have about the nature of changing volatilities across this time period.
Modern methods of developing multivariate nonlinear multiplicative Gaussian-Markovian systems are quite important, as there are many such systems and the mathematics must be diligently exercised if such models are to faithfully represent the data they describe. Similarly, sophisticated numerical techniques, e.g., global optimization and path integration, are important tools to carry out the modeling and fitting to data without compromising the model, e.g., by unwarranted quasi-linear approximations. Three quite different systems have benefited from this approach:
The large-scale modeling of neocortical interactions has benefited from the use of intuitive constructs that yet are faithful to the complex algebra describing this multiple-scaled complex system. For example, canonical-momenta indicators have been successfully applied to multivariate financial markets.
It is clear that ASA optimization and PATHINT path-integral tools are very useful to develop the algebra of statistical mechanics for a large class of nonlinear stochastic systems encountered in finance. However, it also is clear that each system typically presents its own unique character, and this must be included in any such analysis. A virtue of this statistical-mechanics approach and these associated tools is that they appear flexible and robust enough to handle quite different systems.

## ACKNOWLEDGMENTS

I thank Donald Wilson for his support and Jennifer Wilson for collaboration running ASA and PATHINT calculations. Implied volatility and yield data was extracted from the MIM database of Logical Information Machines (LIM).

## FIGURE CAPTIONS

FIG. 1. (a) $F\left(S,{S}_{0},{S}_{\infty },x,y\right)$ for $x=1$, the Black-Scholes case. The other parameters are $S=5$, ${S}_{0}=0.5$, ${S}_{\infty }=20$, $y=0$. (b) $F\left(S,{S}_{0},{S}_{\infty },x,y\right)$ for $x=0$, the normal distribution. (c) $F\left(S,{S}_{0},{S}_{\infty },x,y\right)$ for $x=-1$. (d) $F\left(S,{S}_{0},{S}_{\infty },x,y\right)$ for $x=2$.
FIG. 2. The short-time probability distribution at time $T=0.5$ years for $x=1$, the (truncated) Black-Scholes distribution; for $x=0$, the normal distribution; for $x=-1$; and for $x=2$.
FIG. 3. A two-factor distribution evolved out to $T=0.5$ years for $x=0.7$.

## TABLE CAPTIONS

Table 1. Calculation of prices and Greeks are given for closed form BS (only valid for European options), binomial tree$CR{R}_{European}$ ,$CR{R}_{American}$ ,$CR{R}_{variant}$ , and PATHINT. As verified by calculation, the American option would not be exercised early, so the PATHINT results are identical to the European option. The$CR{R}_{American}$ differs somewhat from the$CR{R}_{European}$ due to the discrete nature of the calculation. All CRR calculations include averaging over 300 and 301 iterations to minimize oscillatory errors.