]> High-resolution path-integral development of financial options

High-resolution path-integral development of financial options

1. INTRODUCTION
1.1. Background
1.2. Our Approach
1.3. Outline of Paper
2. DATA
2.1. Eurodollars
2.2. Futures
2.3. Options on Futures
2.4. Front/Back Month Contracts
2.5. Stochastic Volatility
3. MODELS
3.1. Random walk model
3.2. Black-Scholes (BS) Theory
3.3. Some Key Issues in Derivation of BS
3.4. Models
3.4.1. Various Diffusions
4. STATISTICAL MECHANICS OF FINANCIAL MARKETS (SMFM)
4.1. Statistical Mechanics of Large Systems
4.2. Correlations
5. ADAPTIVE SIMULATED ANNEALING (ASA) FITS
5.1. ASA Outline
5.1.1. General description
5.1.2. Mathematical outline
5.1.3. Reannealing
5.1.4. Quenching
5.2. -Indicator of Market Contexts
6. PATH-INTEGRAL (PATHINT) DEVELOPMENT
6.1. PATHINT Outline
6.2. Development of Long-Time Probabilities
6.3. Dependence of Probabilities on and
6.4. Two-Factor Volatility and PATHINT Modifications
7. CALCULATION OF DERIVATIVES
7.1. Primary Use of Probabilities For European Options
7.2. PATHINT Baselined to CRR and BS
7.3. Two-Factor PATHINT Baselined to One-Factor PATHINT
8. CONCLUSIONS
ACKNOWLEDGMENTS
REFERENCES
FIGURE CAPTIONS
TABLE CAPTIONS
Figure 1
Figure 2
Figure 3
Table 1


Lester Ingber

Lester Ingber Research
POB 06440 Sears Tower, Chicago, IL 60606
and
DRW Investments LLC
311 S Wacker Dr, Ste 900, Chicago, IL 60606

ingber@ingber.com, ingber@alumni.caltech.edu

+1.312.542.1006 Voice
+1.801.881.1088 Fax

ABSTRACT

The Black-Scholes theory of option pricing has been considered for many years as an important but very approximate zeroth-order description of actual market behavior. We generalize the functional form of the diffusion of these systems and also consider multi-factor models including stochastic volatility. Daily Eurodollar futures prices and implied volatilities are fit to determine exponents of functional behavior of diffusions using methods of global optimization, Adaptive Simulated Annealing (ASA), to generate tight fits across moving time windows of Eurodollar contracts. These short-time fitted distributions are then developed into long-time distributions using a robust non-Monte Carlo path-integral algorithm, PATHINT, to generate prices and derivatives commonly used by option traders.

Keywords: options; eurodollar; volatility; path integral; optimization; statistical mechanics

PACS Nos.: 05.10.-a, 02.70.-c, 82.20.wt, 89.90.+n

1. INTRODUCTION

1.1. Background

There always is much interest in developing more sophisticated pricing models for financial instruments. In particular, there currently is much interest in improving option pricing models, particularly with respect to stochastic variables [1-4].
The standard Black-Scholes (BS) theory assumes a lognormal distribution of market prices, i.e., a diffusion linearly proportional to the market price. However, many texts include outlines of more general diffusions proportional to an arbitrary power of the market price [5].
The above aspects of stochastic volatility and of more general functional dependencies of diffusions are most often “swept under the rug” of a simple lognormal form. Experienced traders often use their own intuition to put volatility “smiles” into the BS theoretical constant coefficient in the BS lognormal distribution to compensate for these aspects.
It is generally acknowledged that since the market crash of 1987, markets have been increasingly difficult to describe using the BS model, and so better modelling and computational techniques should be used traders [6], although in practice simple BS models are the rule rather than the exception simply because they are easy to use [7]. To a large extent, previous modelling that has included stochastic volatility and multiple factors has been driven more by the desire to either delve into mathematics tangential to these issues, or to deal only with models that can accommodate closed-form algebraic expressions. We do not see much of the philosophy in the literature that has long driven the natural sciences: to respect first raw data, secondly models of raw data, and finally the use of numerical techniques that do not excessively distort models for the sake of ease of analysis and speed of computation. Indeed, very often the reverse set of priorities is seen in mathematical finance.

1.2. Our Approach

We have addressed the above issues in detail within the framework of a previously developed statistical mechanics of financial markets (SMFM) [8-13].
Our approach requires three sensible parts. Part one is the formulation of the model, which to some extent also involves specification of the specific market(s) data to be addressed. Part two is the fitting of the model to specific market data. Part three is the use of the resulting model to calculate option prices and their Greeks (partial derivatives of the prices with respect to their independent variables), which are used as risk parameters by traders. Each part requires some specific numerical tuning to the market under consideration.
The first part was to develop the algebraic model to replace/generalize BS, including the possibility of also addressing how to handle data regions not previously observed in trading. This is not absurd; current BS models perform integrals that must include a much influence from fat tails that include data regions never seen or likely to be seen in real-world markets. There are some issues as to whether we should take seriously the notion that the market is strongly driven by some element of a “self-fulfilling prophesy” by the BS model [14], but in any case our models have parameters to handle a wide range of possible cases that might arise.
We have developed two parallel tracks starting with part one, a one-factor and a two-factor model. The two-factor model includes stochastic volatility. At first we sensed the need to develop this two-factor model, and we now see that this is at the least an important benchmark against which to judge the worth of the one-factor model.
The second part was to fit the actual raw data so we can come up with real distributions. Some tests illustrated that standard quasi-linear fitting routines, could not get the proper fits, and so we used a more powerful global optimization, Adaptive Simulated Annealing (ASA) [15]. Tuning and selection of the time periods to perform the fits to the data were not trivial aspects of this research. Practical decisions had to be made on the time span of data to be fit and how to aggregate the fits to get sensible “fair values” for reasonable standard deviations of the exponents in the diffusions.
The third part was to develop Greeks and risk parameters from these distributions without making premature approximations just to ease the analysis. Perhaps someday, simple approximations and intuitions similar to what traders now use for BS models will be available for these models, but we do not think the best approach is to start out with such approximations until we first see proper calculations, especially in this uncharted territory. When it seemed that Cox-Ross-Rubenstein (CRR) standard tree codes (discretized approximations to partial differential equations) [16] were not stable for general exponents, i.e., for other than the lognormal case, we turned to a PATHINT code developed a decade ago for some hard nonlinear multifactor problems [17], e.g., combat analyses [18], neuroscience [19,20], and potentially chaotic systems [21,22]. In 1990 and 1991 papers on financial applications, it was mentioned how these techniques could be used for stochastic interest rates and bonds [9,10]. The modifications required here for one-factor European and then American cases went surprisingly smoothly; we still had to tune the meshes, etc. The two-factor model presented a technical problem to the algorithm, which we have reasonably handled using a combination of selection of the model in part one and a reasonable approach to developing the meshes.
More detailed numerical calculations than contained in this paper will be presented in a subsequent paper [23].

1.3. Outline of Paper

Section 1 is this introduction. Section 2 describes the nature of Eurodollar (ED) futures data and the evidence for stochastic volatility. Section 3 outlines the algebra of modelling options, including the standard BS theory and our generalizations. Section 4 outlines the three equivalent mathematical representations used by SMFM; this is required to understand the development of the short-time distribution that defines the cost function we derive for global optimization, as well as the numerical methods we have developed to calculate the long-time evolution of these short-time distributions. Section 5 outlines ASA and explains its use to fit short-time probability distributions defined by our models to the Eurodollar data; we offer the fitted exponent in the diffusion as a new important technical indicator of market behavior. Section 6 outlines PATHINT and explains its use to develop long-time probability distributions from the fitted short-time probability distributions, for both the one-factor and two-factor tracks. Section 7 describes how we use these long-time probability distributions to calculate European and American option prices and Greeks; here we give numerical tests of our approach to BS CRR algorithms. Section 8 is our conclusion.

2. DATA

2.1. Eurodollars

Eurodollars are fixed-rate time deposits held primarily by overseas banks, but denominated in US dollars. They are not subject to US banking regulations and therefore tend to have a tighter bid-ask spread than deposits held in the United States [24].

2.2. Futures

The three-month Eurodollar futures contract is one of the most actively traded futures markets in the world. The contract is quoted as an index where the yield is equal to the Eurodollar price subtracted from 100. This yield is equal to the fixed rate of interest paid by Eurodollar time deposits upon maturity and is expressed as an annualized interest rate based on a 360-day year. The Eurodollar futures are cash settled based on the 90-day London Interbank Offer Rate (LIBOR). A “notional” principal amount of $1 million, is used to determine the change in the total interest payable on a hypothetical underlying time deposit, but is never actually paid or received [24].
Currently a total of 40 quarterly Eurodollar futures contracts (or ten years worth) are listed, with expirations annually in March, June, September and December.

2.3. Options on Futures

The options traded on the Eurodollar futures include not only 18 months of options expiring at the same time as the underlying future, but also various short dated options which themselves expire up to one year prior to the expiration of the underlying futures contract.

2.4. Front/Back Month Contracts

For purposes of risk minimization, as discussed in a previous paper [4], traders put on spreads across a variety of option contracts. One common example is to trade the spread on contracts expiring one year apart, where the future closer to expiration is referred to as the front month contract, and the future expiring one year later is called the back month. The availability of short dated or “mid-curve” options which are based on an underlying back month futures contract, but expire at the same time as the front month, allow one to trade the volatility ratios of the front and back month futures contracts without having to take the time differences in option expirations into consideration. We studied the volatilities of these types of front and back month contracts. Here, we give analyses with respect only to quarterly data longer than six months from expiration.

2.5. Stochastic Volatility

Below we develop two-factor models to address stochastic volatility. In a previous paper, we have performed empirical studies of Eurodollar futures to support the necessity of dealing with these issues [4].

3. MODELS

3.1. Random walk model

The use of Brownian motion as a model for financial systems is generally attributed to Bachelier [25], though he incorrectly intuited that the noise scaled linearly instead of as the square root relative to the random log-price variable. Einstein is generally credited with using the correct mathematical description in a larger physical context of statistical systems. However, several studies imply that changing prices of many markets do not follow a random walk, that they may have long-term dependences in price correlations, and that they may not be efficient in quickly arbitraging new information [26-28]. A random walk for returns, rate of change of prices over prices, is described by a Langevin equation with simple additive noiseη , typically representing the continual random influx of information into the market.

Γ.=γ1+γ2η,
Γ.=dΓ/dt,
<η(t)>η=0,<η(t),η(t)>η=δ(tt),

(1)

whereγ1 andγ2 are constants, andΓ is the logarithm of (scaled) price. Price, although the most dramatic observable, may not be the only appropriate dependent variable or order parameter for the system of markets [29]. This possibility has also been called the “semistrong form of the efficient market hypothesis” [26].
The generalization of this approach to include multivariate nonlinear nonequilibrium markets led to a model of statistical mechanics of financial markets (SMFM) [8].

3.2. Black-Scholes (BS) Theory

The standard partial-differential equation used to formulate most variants of Black-Scholes (BS) models describing the market value of an option,V , is

Vt+12σ2S22VS2+rSVSrV=0,

(2)

whereS is the asset price, andσ is the standard deviation, or volatility ofS , andr is the short-term interest rate. The solution depends on boundary conditions, subject to a number of interpretations, some requiring minor transformations of the basic BS equation or its solution. For example, the basic equation can apply to a number of one-dimensional models of interpretations of prices given toV , e.g., puts or calls, and toS , e.g., stocks or futures, dividends, etc.
For instance, ifV is set toC , a call on an European option with exercise priceX with maturity atT , the solution is

C(S,t)=SN(d1)Xer(Tt)N(d2),
d1=ln(S/X)+(r+12σ2)(Tt)σ(Tt)1/2,
d2=ln(S/X)+(r12σ2)(Tt)σ(Tt)1/2.

(3)

In practice, the volatilityσ is the least known parameter in this equation, and its estimation is generally the most important part of pricing options. Usually the volatility is given in a yearly basis, baselined to some standard, e.g., 252 trading days per year, or 360 or 365 calendar days. Therefore, all values of volatility given in the graphs in this paper, based on daily data, would be annualized by multiplying the standard deviations of the yields by252=15.87 We have used this factor to present our implied volatilities as daily movements.

3.3. Some Key Issues in Derivation of BS

The basic BS model considers a portfolio in terms of delta (Δ ),

Π=VΔS,

(4)

in a market with Gaussian-Markovian (“white”) noiseX and driftμ ,

dSS=σdX+μdt,

(5)

whereV(S,t) inherits a random process fromS ,

dV=σSVSdX+(μSVS+12σ2S22VS2+Vt)dt.

(6)

This yields

dΠ=σ(VSΔ)dX+(μSVS+12σ2S22VS2+VtμΔS)dt.

(7)

The expected risk-neutral return ofΠ is

dΠ=rΠdt=r(VΔS)dt.

(8)

OptionsV on futuresF can be derived, e.g., using simple transformations to take cost of carry into consideration, such as

F=Ser(Tt),

(9)

and setting

dΠ=rVdt.

(10)

The corresponding BS equation for futuresF is

Vt+12σ2F22VS2rV=0.

(11)

At least two advantages are present ifΔ is chosen such that

Δ=VS.

(12)

Then, the portfolio can be instantaneously “risk-neutral,” in terms of zeroing the coefficient ofX , as well as independent of the direction of market, in terms of zeroing the coefficient ofμ For the above example ofV =C ,

Δ=N(d1).

(13)

Other trading strategies based on this simple model use similar constructs as risk parameters, e.g., gamma (Γ ), theta (Θ ), vega (Υ ), rho (ρ ) [5],

Γ=2ΠS2,
Θ=Πt,
Υ=Πσ,
ρ=Πr.

(14)

The BS equation, Eq. (2), may be written as

Θ+rSΔ+12(σS)2Γ=rf.

(15)

3.4. Models

Our two-factor model includes stochastic volatilityσ of the underlyingS ,

dS=μdt+σF(S,S0,S,x,y)dzS,
dσ=νdt+εdzσ,
<dzi>=0,i={S,σ},
<dzi(t)dzj(t)>=dtδ(tt),i=j,
<dzi(t)dzj(t)>=ρdtδ(tt),ij,
F(S,S0,S,x,y)={S,SxS01x,SyS01xSxy,S<S0S0SSS>S,(null)

(16)

whereS0 andS are selected to lie outside the data region used to fit the other parameters, e.g.,S0=1 andS=20 for fits to Eurodollar futures which historically have a very tight range relative to other markets. We have used the Black-Scholes formF=S insideS<S0 to obtain the usual benefits, e.g., no negative prices as the distribution is naturally excluded fromS<0 and preservation of put-call parity. Put-call parity for European options is derived quite independent of any mathematical model of options [5]. In its simplest form, it is given by

c+Xer(Tt)=p+S,

(17)

wherec (p ) is the fair price of a call (put),X is the strike price,r is the risk-free interest rate,t is the present time,T is the time of expiration, andS is the underlying market. We have takeny=0 , a normal distribution, to reflect total ignorance of markets outside the range ofS>S The one-factor model just assumes a constantσ It is often noted that BS models incorrectly include untenable contributions from largeS regions because of their fat tails [30]. (If we wished to handle negative interest rates, ED prices > 100, we would move shift theS=0 axis to someS<0 value.)
We found that the abrupt, albeit continuous, changes acrossS0 especially forx0 did not cause any similar effects in the distributions evolved using these diffusions, as reported below.
The formula for pricing an optionP , derived in a Black-Scholes generalized framework after factoring out interest-rate discounting, is equivalent to using the form

dS=μSdt+σF(S,S0,S,x,y)dzS,
dσ=νdt+εdzσ.

(18)

We experimented with some alternative functional forms, primarily to apply some smooth cutoffs across the above three regions ofS For example, we usedF , a functionF designed to revert to the lognormal Black-Scholes model in several limits,

F(S,S0,S,x)=SC0+(1C0)(SxS01xC+S0(1C)),
C0=exp[(SS0|1x|1+|1x|)|2x|+1],
C=exp[(SS)2],
limS,x1F(S,S0,S,x)=S0=constant,
limS0+F(S,S0,S,x)=limx1F(S,S0,S,x)=S.

(19)

However, our fits were most sensitive to the data when we permitted the central region to be pureSx usingF above.

3.4.1. Various Diffusions

Fig. 1. gives examples ofF(S,S0,S,x,y)dzS forx in {-1, 0, 1, 2}. The other parameters areS=5 ,S0=0.5 ,S=20 ,y=0

4. STATISTICAL MECHANICS OF FINANCIAL MARKETS (SMFM)

4.1. Statistical Mechanics of Large Systems

Aggregation problems in nonlinear nonequilibrium systems typically are “solved” (accommodated) by having new entities/languages developed at these disparate scales in order to efficiently pass information back and forth. This is quite different from the nature of quasi-equilibrium quasi-linear systems, where thermodynamic or cybernetic approaches are possible. These approaches typically fail for nonequilibrium nonlinear systems.
Many systems are aptly modeled in terms of multivariate differential rate-equations, known as Langevin equations,

M.G=fG+g^jGηj,(G=1,...,Λ),(j=1,...,N),
M.G=dMG/dt,
<ηj(t)>η=0,<ηj(t),ηj(t)>η=δjjδ(tt),

(20)

wherefG andg^jG are generally nonlinear functions of mesoscopic order parametersMG ,j is a microscopic index indicating the source of fluctuations, andNΛ The Einstein convention of summing over repeated indices is used. Vertical bars on an index, e.g., |j|, imply no sum is to be taken on repeated indices.
Via a somewhat lengthy, albeit instructive calculation, outlined in several other papers [8,10,31], involving an intermediate derivation of a corresponding Fokker-Planck or Sch¨odinger-type equation for the conditional probability distributionP[M(t)|M(t0)] , the Langevin rate Eq. (20) is developed into the more useful probability distribution forMG at long-time macroscopic time eventt=(u+1)θ+t0 , in terms of a Stratonovich path-integral over mesoscopic Gaussian conditional probabilities [32-36]. Here, macroscopic variables are defined as the long-time limit of the evolving mesoscopic system.
The corresponding Sch¨odinger-type equation is [34,35]

P/t=12(gGGP),GG(gGP),G+V,
gGG=kTδjkg^jGg^kG,
gG=fG+12δjkg^jGg^k,GG,
[...],G=[...]/MG.

(21)

This is properly referred to as a Fokker-Planck equation whenV0 Note that although the partial differential Eq. (21) contains information regardingMG as in the stochastic differential Eq. (20), all references toj have been properly averaged over. I.e.,g^jG in Eq. (20) is an entity with parameters in both microscopic and mesoscopic spaces, butM is a purely mesoscopic variable, and this is more clearly reflected in Eq. (21).
The path integral representation is given in terms of the “Feynman” Lagrangian

L
P[Mt|Mt0]dM(t)=...D_Mexp(S)δ[M(t0)=M0]δ[M(t)=Mt],
S=kT1mint0tdtL,
D_M=limuΠρ=1u+1g1/2ΠG(2πθ)1/2dMρG,
L(M.G,MG,t)=12(M.GhG)gGG(M.GhG)+12hG;G+R/6V,
hG=gG12g1/2(g1/2gGG),G,
gGG=(gGG)1,
g=det(gGG),
hG;G=h,GG+ΓGFFhG=g1/2(g1/2hG),G,
ΓJKFgLF[JK,L]=gLF(gJL,K+gKL,JgJK,L),
R=gJLRJL=gJLgJKRFJKL,
RFJKL=12(gFK,JLgJK,FLgFL,JK+gJL,FK)+gMN(ΓFKMΓJLNΓFLMΓJKN).

(22)

Mesoscopic variables have been defined asMG in the Langevin and Fokker-Planck representations, in terms of their development from the microscopic system labeled byj The Riemannian curvature termR arises from nonlineargGG , which is a bona fide metric of this space [34]. Even if a stationary solution, i.e.,M.G=0 , is ultimately sought, a necessarily prior stochastic treatment ofM.G terms gives rise to these Riemannian “corrections.” Even for a constant metric, the termhG;G contributes toL for a nonlinear meanhGV may include terms such asΣTJTGMG , where the Lagrange multipliersJTG are constraints onMG , which are advantageously modeled as extrinsic sources in this representation; they too may be time-dependent.
For our purposes, the above Feynman Lagrangian defines a kernel of the short-time conditional probability distribution, in the curved space defined by the metric, in the limit of continuous time, whose iteration yields the solution of the previous partial differential equation Eq. (21). This differs from the Lagrangian which satisfies the requirement that the action is stationary to the first order indt — the WKBJ approximation, but which does not include the first-order correction to the WKBJ approximation as does the Feynman Lagrangian. This latter Lagrangian differs from the Feynman Lagrangian, essentially by replacingR/6 above byR/12  [37]. In this sense, the WKBJ Lagrangian is more useful for some theoretical discussions [38]. However, the use of the Feynman Lagrangian coincides with the numerical method we present here using the PATHINT code.
Using the variational principle,JTG may also be used to constrainMG to regions where they are empirically bound. More complicated constraints may be affixed toL using methods of optimal control theory [39]. With respect to a steady stateP¯ , when it exists, the information gain in stateP is defined by

Υ[P]=...D_MPln(P/P¯),
D_M=D_M/dMu+1.

(23)

In the economics literature, there appears to be sentiment to define Eq. (20) by the Ito, rather than the Stratonovich prescription. It is true that Ito integrals have Martingale properties not possessed by Stratonovich integrals [40] which leads to risk-neural theorems for markets [41,42], but the nature of the proper mathematics should eventually be determined by proper aggregation of relatively microscopic models of markets. It should be noted that virtually all investigations of other physical systems, which are also continuous time models of discrete processes, conclude that the Stratonovich interpretation coincides with reality, when multiplicative noise with zero correlation time, modeled in terms of white noiseηj , is properly considered as the limit of real noise with finite correlation time [43]. The path integral succinctly demonstrates the difference between the two: The Ito prescription corresponds to the prepoint discretization ofL , whereinθM.(t)Mρ+1Mρ andM(t)Mρ The Stratonovich prescription corresponds to the midpoint discretization ofL , whereinθM.(t)Mρ+1Mρ andM(t)12(Mρ+1+Mρ) In terms of the functions appearing in the Fokker-Planck Eq. (21), the Ito prescription of the prepoint discretized Lagrangian,LI , is relatively simple, albeit deceptively so because of its nonstandard calculus.

LI(M.G,MG,t)=12(M.GgG)gGG(M.GgG)V.

(24)

In the absence of a nonphenomenological microscopic theory, the difference between a Ito prescription and a Stratonovich prescription is simply a transformed drift [37].
There are several other advantages to Eq. (22) over Eq. (20). Extrema and most probable states ofMG ,MG , are simply derived by a variational principle, similar to conditions sought in previous studies [44]. In the Stratonovich prescription, necessary, albeit not sufficient, conditions are given by

δGL=L,GL,G.:t=0,
L,G.:t=L,G.GM.G+L,G.G.M..G.

(25)

For stationary states,M.G=0 , andL¯/M¯G=0 definesM¯G , where the bars identify stationary variables; in this case, the macroscopic variables are equal to their mesoscopic counterparts. [Note thatL¯ is not the stationary solution of the system, e.g., to Eq. (21) withP/t=0 However, in some cases [45],L¯ is a definite aid to finding such stationary states.] Many times only properties of stationary states are examined, but here a temporal dependence is included. E.g., theM.G terms inL permit steady states and their fluctuations to be investigated in a nonequilibrium context. Note that Eq. (25) must be derived from the path integral, Eq. (22), which is at least one reason to justify its development.

4.2. Correlations

Correlations between variables are modeled explicitly in the Lagrangian as a parameter usually designatedρ (not to be confused with the Rho Greek calculated for options). This section uses a simple two-factor model to develop the correspondence between the correlationρ in the Lagrangian and that among the commonly written Weiner distributionsdz
Consider coupled stochastic differential equations

dr=fr(r,l)dt+g^r(r,l)σ1dz1,
dl=fl(r,l)dt+g^l(r,l)σ2dz2,
<dzi>=0,i={1,2},
<dzi(t)dzj(t)>=dtδ(tt),i=j,
<dzi(t)dzj(t)>=ρdtδ(tt),ij,
δ(tt)={0,,1,tt,t=t,(null)

(26)

where<.> denotes expectations.
These can be rewritten as Langevin equations (in the It^ prepoint discretization)

dr/dt=fr+g^rσ1(γ+n1+sgnργn2),
dl/dt=gl+g^lσ2(sgnργn1+γ+n2),
γ±=12[1±(1ρ2)1/2]1/2,
ni=(dt)1/2pi,

(27)

wherep1 andp2 are independent [0,1] Gaussian distributions.
The equivalent short-time probability distribution,P , for the above set of equations is

P=g1/2(2πdt)1/2exp(Ldt),
L=12Fg_F,
F=(dr/dtfr)dl/dtfl)),
g=det(g_),
k=1ρ2.

(28)

g_

, the metric in{r,l} -space, is the inverse of the covariance matrix,

g_1=((g^rσ1)2ρg^rg^lσ1σ2ρg^rg^lσ1σ2(g^lσ2)2).

(29)

The above also corrects previous papers which inadvertently dropped thesgn factors in the above [9,10,17].

5. ADAPTIVE SIMULATED ANNEALING (ASA) FITS

5.1. ASA Outline

The algorithm developed which is now called Adaptive Simulated Annealing (ASA) [46] fits short-time probability distributions to observed data, using a maximum likelihood technique on the Lagrangian. This algorithm has been developed to fit observed data to a theoretical cost function over aD -dimensional parameter space [46], adapting for varying sensitivities of parameters during the fit. The ASA code can be obtained at no charge, via WWW from http://www.ingber.com/ or via FTP from ftp.ingber.com [15].

5.1.1. General description

Simulated annealing (SA) was developed in 1983 to deal with highly nonlinear problems [47], as an extension of a Monte-Carlo importance-sampling technique developed in 1953 for chemical physics problems. It helps to visualize the problems presented by such complex systems as a geographical terrain. For example, consider a mountain range, with two “parameters,” e.g., along the North−South and East−West directions. We wish to find the lowest valley in this terrain. SA approaches this problem similar to using a bouncing ball that can bounce over mountains from valley to valley. We start at a high “temperature,” where the temperature is an SA parameter that mimics the effect of a fast moving particle in a hot object like a hot molten metal, thereby permitting the ball to make very high bounces and being able to bounce over any mountain to access any valley, given enough bounces. As the temperature is made relatively colder, the ball cannot bounce so high, and it also can settle to become trapped in relatively smaller ranges of valleys.
We imagine that our mountain range is aptly described by a “cost function.” We define probability distributions of the two directional parameters, called generating distributions since they generate possible valleys or states we are to explore. We define another distribution, called the acceptance distribution, which depends on the difference of cost functions of the present generated valley we are to explore and the last saved lowest valley. The acceptance distribution decides probabilistically whether to stay in a new lower valley or to bounce out of it. All the generating and acceptance distributions depend on temperatures.
In 1984 [48], it was established that SA possessed a proof that, by carefully controlling the rates of cooling of temperatures, it could statistically find the best minimum, e.g., the lowest valley of our example above. This was good news for people trying to solve hard problems which could not be solved by other algorithms. The bad news was that the guarantee was only good if they were willing to run SA forever. In 1987, a method of fast annealing (FA) was developed [49], which permitted lowering the temperature exponentially faster, thereby statistically guaranteeing that the minimum could be found in some finite time. However, that time still could be quite long. Shortly thereafter, Very Fast Simulated Reannealing (VFSR) was developed in 1987 [46], now called Adaptive Simulated Annealing (ASA), which is exponentially faster than FA.
ASA has been applied to many problems by many people in many disciplines [50-52]. The feedback of many users regularly scrutinizing the source code ensures its soundness as it becomes more flexible and powerful.

5.1.2. Mathematical outline

ASA considers a parameterαki in dimensioni generated at annealing-timek with the range

αki[Ai,Bi],

(30)

calculated with the random variableyi ,

αk+1i=αki+yi(BiAi),
yi[1,1].

(31)

The generating functiongT(y) is defined,

gT(y)=Πi=1D12(|yi|+Ti)ln(1+1/Ti)Πi=1DgTi(yi),

(32)

where the subscripti onTi specifies the parameter index, and thek -dependence inTi(k) for the annealing schedule has been dropped for brevity. Its cumulative probability distribution is

GT(y)=1y1...1yDdy1...dyDgT(y)Πi=1DGTi(yi),
GTi(yi)=12+sgn(yi)2ln(1+|yi|/Ti)ln(1+1/Ti).

(33)

yi

is generated from aui from the uniform distribution

uiU[0,1],
yi=sgn(ui12)Ti[(1+1/Ti)|2ui1|1].

(34)

It is straightforward to calculate that for an annealing schedule for

Ti
Ti(k)=T0iexp(cik1/D),

(35)

a global minima statistically can be obtained. I.e.,

Σk0gkΣk0[Πi=1D12|yi|ci]1k=.

(36)

Control can be taken overci , such that

Tfi=T0iexp(mi)whenkf=expni,
ci=miexp(ni/D),

(37)

wheremi andni can be considered “free” parameters to help tune ASA for specific problems.
ASA has over 100 OPTIONS available for tuning. A few important ones were used in this project.

5.1.3. Reannealing

Whenever doing a multi-dimensional search in the course of a complex nonlinear physical problem, inevitably one must deal with different changing sensitivities of theαi in the search. At any given annealing-time, the range over which the relatively insensitive parameters are being searched can be “stretched out” relative to the ranges of the more sensitive parameters. This can be accomplished by periodically rescaling the annealing-timek , essentially reannealing, every hundred or so acceptance-events (or at some user-defined modulus of the number of accepted or generated states), in terms of the sensitivitiessi calculated at the most current minimum value of the cost function,C ,

si=C/αi.

(38)

In terms of the largestsi =smax , a default rescaling is performed for eachki of each parameter dimension, whereby a new indexki is calculated from eachki ,

kiki,
Tik=Tik(smax/si),
ki=(ln(Ti0/Tik)/ci)D.

(39)

Ti0

is set to unity to begin the search, which is ample to span each parameter dimension.

5.1.4. Quenching

Another adaptive feature of ASA is its ability to perform quenching in a methodical fashion. This is applied by noting that the temperature schedule above can be redefined as

Ti(ki)=T0iexp(cikiQi/D),
ci=miexp(niQi/D),

(40)

in terms of the “quenching factor”Qi The sampling proof fails ifQi>1 as

ΣkΠD1/kQi/D=Σk1/kQi<.

(41)

This simple calculation shows how the “curse of dimensionality” arises, and also gives a possible way of living with this disease. In ASA, the influence of large dimensions becomes clearly focussed on the exponential of the power ofk being1/D , as the annealing required to properly sample the space becomes prohibitively slow. So, if resources cannot be committed to properly sample the space, then for some systems perhaps the next best procedure may be to turn on quenching, wherebyQi can become on the order of the size of number of dimensions.
The scale of the power of1/D temperature schedule used for the acceptance function can be altered in a similar fashion. However, this does not affect the annealing proof of ASA, and so this may used without damaging the sampling property.

5.2. -Indicator of Market Contexts

Our studies of contexts of markets well recognized by option traders to have significantly different volatility behavior show that the exponentsx are reasonably faithful indicators defining these different contexts.
We feel the two-factor model is more accurate because the data indeed demonstrate stochastic volatility [4]. We also note that the two-factorx fit by ASA across the last few years. This is not true of the one-factor ASA fittedx the Black-Scholesσ as a parameter, but rather calculate as historical volatility during all runs. Some results of two-factor studies and one-factor studies using a Black-Sc