JMF Vol.10 No.1, February 2020
A Clustering Method to Solve Backward Stochastic Differential Equations with Jumps
Abstract: In this paper, we introduce a clustering method to approximate the solution to a general Backward Stochastic Differential Equation with Jumps (BSDEJ). We show the convergence of the sequence of approximate solutions to the true one. The method is implemented for an application in finance. Numerical results show that the method is efficient.

1. Introduction

Solving BSDEs is of great importance in mathematical finance. Derivatives pricing problems, dynamic portfolio choice problems, and even dynamic stochastic general equilibrium problems can all be related to various types of BSDEs. However, due to its recursive nature, a BSDE with jumps cannot be solved directly by plain Monte Carlo simulation.

In this paper, we apply the methodology for evaluating conditional expectations originally introduced in [1] to solve BSDEs with jumps. This method simplifies the structure of the approximation to the true solution by running local approximations. Under the assumption that the solution is continuous, a local linear regression approximation is accurate enough at a very limited computational budget, and convergence can be shown. Specifically, we use Monte Carlo simulation together with a machine learning clustering method to divide the state space into small clusters, within which we approximate the true solutions via linear regression.

Recent literature on machine learning methods for solving PDEs/BSDEs includes, for example, [2] [3] [4] [5] and [6]. In those references, the authors perform a brute-force machine learning approximation of the true solutions, and the methods are therefore time-consuming. Our method can be seen as a generalization of the above methods, in that when the number of clusters is set to 1, our method reduces to the aforementioned ones.

The organization of this paper is as follows. Section 2 describes the method and the convergence proof. Section 3 discusses an application in finance and Section 4 concludes. All proofs can be found in the Appendix.

2. A Formal Description of Our Approach

2.1. The BSDE

Throughout the paper, we consider a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{0 \le t \le T}, \mathbb{P})$, with $T \in \mathbb{R}_+$. The space supports a $d$-dimensional Brownian motion $W_t = (W_t^1, \ldots, W_t^d)$ and a Poisson random measure $N$ on $\mathcal{B}([0,T]) \otimes \mathcal{E}$, where $\mathcal{B}([0,T])$ is the Borel $\sigma$-algebra on $[0,T]$ and $(E, \mathcal{E})$ is a measurable space. Define $E := \mathbb{R}^q$ and $\mathcal{E}$ as the Borel $\sigma$-algebra on $E$. $\mathbb{P}$ is the probability measure on $\mathcal{F}$. The filtration $(\mathcal{F}_t)_{0 \le t \le T}$ is completed with all $\mathbb{P}$-null sets, right-continuous, and $\mathcal{F}_t = \mathcal{F}_t^{W,N}$ is generated by $(W_t, N(\cdot, [0,t], \cdot))$ for $t \in [0,T]$. Assume that $\mathcal{F} = \mathcal{F}_T^{W,N}$ and that $W$ and $N$ are mutually independent under $\mathbb{P}$. Suppose that the compensating measure of $N$ is $\nu(dt, de) := \nu(de)\,dt$, where $\nu$ is a $\sigma$-finite measure on $(E, \mathcal{E})$ satisfying $\int_E (1 \wedge |e|^2)\,\nu(de) < \infty$. The corresponding compensated Poisson random measure is defined by $\tilde{N}(\omega, dt, de) := N(\omega, dt, de) - \nu(de)\,dt$.

The BSDEJ under consideration, whose solution is denoted by $(X, Y, Z, U)$, is

$$\begin{aligned} dX_t &= \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t + \int_E \gamma(t, X_t, e)\,\tilde{N}(dt, de), & X_0 &= x_0,\\ -dY_t &= f(t, X_t, Y_t, Z_t, V_t)\,dt - Z_t\,dW_t - \int_E U_t(e)\,\tilde{N}(dt, de), & Y_T &= \phi(X_T), \end{aligned} \tag{1}$$

where $V_t = \int_E U_t(e)\rho(e)\,\nu(de)$ for a given smooth and bounded function $\rho$. The vector of state variables $X_t$ is $\mathcal{F}_t$-measurable, $r$-dimensional, and satisfies the forward SDE (FSDE) above. The standard Brownian motion $W$ is $d$-dimensional, $Y_t$ is 1-dimensional, $Z_t$ is $d$-dimensional, $U_t(e)$ is $q$-dimensional, $V_t$ is 1-dimensional, and $\tilde{N}$ is $q$-dimensional (all adapted to $\mathcal{F}_t$). $(X, Y, Z, U)$ is called the solution to the BSDEJ system (1). We assume that $r \le d$ throughout the paper without loss of generality.
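Although the method itself concerns the backward component, everything starts from simulated paths of the forward jump-diffusion $X$. The sketch below is our own minimal Euler scheme for the forward equation in (1), not the paper's implementation: we take $r = d = q = 1$, a multiplicative jump coefficient $\gamma(t, x, e) = x \cdot e$, Gaussian marks, and a Poisson jump intensity `lam`, all of which are illustrative assumptions.

```python
import numpy as np

def simulate_forward(x0, mu, sigma, lam, mark_mean, mark_std,
                     T=0.5, n=50, M=10_000, seed=0):
    """Euler scheme for dX = mu dt + sigma dW + int_E gamma(t,X,e) N~(dt,de),
    with the illustrative choices gamma(t, x, e) = x * e, marks
    e ~ N(mark_mean, mark_std^2), and jump intensity lam.  The compensator
    lam * mark_mean * X * dt is subtracted, so jumps are integrated against
    the compensated measure N~."""
    rng = np.random.default_rng(seed)
    h = T / n
    X = np.empty((M, n + 1))
    X[:, 0] = x0
    for i in range(n):
        x = X[:, i]
        dW = rng.normal(0.0, np.sqrt(h), M)
        k = rng.poisson(lam * h, M)        # jump counts per path in (t, t+h]
        # For small h multiple jumps per step are rare; we approximate the
        # sum of k marks by k times a single draw.
        marks = k * rng.normal(mark_mean, mark_std, M)
        X[:, i + 1] = (x + mu(x) * h + sigma(x) * dW
                       + x * marks                  # jump part of N
                       - x * lam * mark_mean * h)   # compensator term
    return X

# Example: mean-reverting drift and proportional volatility
paths = simulate_forward(x0=100.0, mu=lambda x: 0.05 * (100.0 - x),
                         sigma=lambda x: 0.2 * x, lam=0.5,
                         mark_mean=0.0, mark_std=0.05)
```

Any other coefficient functions can be plugged in through `mu`, `sigma`, and the jump parameters; the time grid `n` and path count `M` match the discretization used later in the backward step.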

2.2. The Numerical Approximation Scheme

This section describes the construction of the approximate solution to the BSDEJ (1). The intermediate solutions can be obtained from the following equations, according to [7]:

$$\begin{aligned} Y_t &= u(t, X_t) = \mathbb{E}\Big[\phi(X_T) + \int_t^T f(v, X_v, Y_v, Z_v, V_v)\,dv \,\Big|\, \mathcal{F}_t\Big],\\ Z_t &= \partial_x u(t, X_t)\,\sigma(t, X_t),\\ U_t(e) &= u(t, X_t + \gamma(t, X_t, e)) - u(t, X_t), \end{aligned} \tag{2}$$

where $(Y_t, Z_t, U_t)$ are functions of $(t, X_t)$. It is therefore crucial to be able to evaluate the above conditional expectations for each $t$.

Suppose that the domain of the state variables $X$ is $D \subseteq \mathbb{R}^r$.^1 Partition $D$ by a class of disjoint sets $\{U_k\}_{k=1}^K$. Denote the diameter $d_U = \sup_{x, y \in U} |x - y|$, where $|\cdot|$ is the canonical Euclidean distance. It is known that evaluating a conditional expectation is essentially a regression problem, in the following sense:

$$\mathbb{E}_t[\psi(X_T)] = \arg\min_{\phi \in \Phi(D)} \mathbb{E}\big[\,|\psi(X_T) - \phi(t, X_t)|^2\,\big] \tag{3}$$

where $\Phi(D)$ is an appropriate function space with domain $D$.^2 First, we need the following result.

Lemma 1. We have

$$\mathbb{E}_t[\psi(X_T)] = \lim_{i \to \infty} \arg\min_{\phi \in \Phi(D_i)} \mathbb{E}\big[\,|\psi(X_T) - \phi(t, X_t)|^2 \mathbf{1}_{X_t \in D_i}\,\big] \tag{4}$$

in the $L^2$ sense.
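The projection characterization in (3) is easy to verify numerically. In the toy example below (our own; all parameters are illustrative) we take $X_T = X_t + \varepsilon$ with $\varepsilon \sim N(0, 0.25)$ and $\psi(x) = x^2$, so that $\mathbb{E}_t[\psi(X_T)] = X_t^2 + 0.25$ in closed form; an ordinary least-squares fit over degree-2 polynomials of $X_t$ recovers exactly these coefficients.

```python
import numpy as np

rng = np.random.default_rng(42)
M = 200_000

x_t = rng.normal(0.0, 1.0, M)            # samples of X_t
x_T = x_t + rng.normal(0.0, 0.5, M)      # X_T = X_t + eps, eps ~ N(0, 0.25)
psi = x_T ** 2                           # psi(X_T)

# Least-squares regression of psi(X_T) onto polynomials of X_t, as in (3)
coef = np.polyfit(x_t, psi, 2)           # highest-degree coefficient first

# True conditional expectation: E[psi(X_T) | X_t] = X_t**2 + 0.25,
# i.e. coefficients close to (1, 0, 0.25)
print(coef)
```

Even though each sample $\psi(X_T)$ is very noisy, the regression converges to the conditional expectation as a function of $X_t$, which is exactly what relation (3) asserts.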

With the above result in mind, we will always write $D_i$ as $D$ and assume that $D$ is bounded and closed. Many methods have emerged for finding global solutions to Equation (3). In this paper, however, we follow a divide-and-conquer approach and fit the regression locally in each subspace $U_k$, for $k = 1, 2, \ldots, K$. That is, we evaluate

$$\phi_k(t, X_t) := \arg\min_{\phi \in \Phi(U_k)} \mathbb{E}\big[\,|\psi(X_T) \mathbf{1}_{X_t \in U_k} - \phi(t, X_t) \mathbf{1}_{X_t \in U_k}|^2\,\big] \tag{5}$$

where $\Phi(U_k)$ is an appropriate function space with domain $U_k$, under the constraint $\sum_{k=1}^K \phi_k(t, x) \mathbf{1}_{x \in U_k} \in \Phi$. ([1], Theorem 23) concludes that, indeed, we have $\phi(t, x) = \sum_{k=1}^K \phi_k(t, x) \mathbf{1}_{x \in U_k}$. In ([1], Theorem 24), the constraint $\sum_{k=1}^K \phi_k(t, x) \mathbf{1}_{x \in U_k} \in \Phi$ is relaxed under certain conditions. In this paper, we use these two theorems to show that polynomial regressions or Taylor expansions for $\phi_k$ and $\psi$, combined with Monte Carlo simulation to generate samples and machine learning clustering to partition the simulated paths, yield convergent numerical methods. In order to use a Taylor expansion, we need $\psi$ to be $C^l$-smooth, meaning that it has bounded derivatives up to order $l$ in $D$. This requires us to approximate it by $C^l$-smooth functions. The following assumption applies throughout the paper.

Assumption 1 (On $\psi$). Assume that $\psi$ can be approximated by a sequence of $C^l$-smooth functions $\{\psi_i\}$, with $l$ finite or $\infty$. Moreover, we have

$$\lim_{i \to \infty} \mathbb{E}_t\big[\,|\psi(X_t) - \psi_i(X_t)|^2\,\big] = 0 \tag{6}$$

for any $t \in [0, T]$.

Remark 2. For example, $\psi_i$ can converge to $\psi$ pointwise or in the $L^2$ sense. Because of Assumption 1, we will always assume that $\psi$ is $C^l$-smooth.

Then, we have the following theorems.

Theorem 3 (On Polynomial Regression). Assume that^3

$$\hat{\phi}_{k,J}(t, X_t) := \arg\min_{P_J \in \mathcal{P}_J(U_k)} \mathbb{E}\big[\,|\psi(X_T) \mathbf{1}_{X_t \in U_k} - P_J(X_t) \mathbf{1}_{X_t \in U_k}|^2\,\big] \tag{7}$$

where $\mathcal{P}_J(U_k)$ is the space of all polynomials of degree at most $J$ with domain $U_k$. Then, we have

$$\phi(t, X_t) = \lim_{\max_{1 \le k \le K} d_{U_k} \to 0^+} \sum_{k=1}^K \hat{\phi}_{k,J}(t, X_t) \mathbf{1}_{X_t \in U_k} \tag{8}$$

in the $L^2$ sense with $J \ge 1$ fixed.

Theorem 4 (On Taylor Expansion). Assume that $\psi(x) \approx \hat{\psi}_J(x, x_t^k) = \sum_{|\alpha| \le J} \frac{\partial^\alpha \psi(x_t^k)}{\alpha!} (x - x_t^k)^\alpha$ and

$$\hat{\phi}_{k,J}(t, X_t^k) = \mathbb{E}_t\big[\hat{\psi}_J(X_T, X_t^k) \mathbf{1}_{X_t^k \in U_k}\big] \tag{9}$$

where $X_t^k \in U_k$. Then, we have

$$\phi(t, X_t) = \lim_{\max_{1 \le k \le K} d_{U_k} \to 0^+} \sum_{k=1}^K \hat{\phi}_{k,J}(t, X_t) \mathbf{1}_{X_t \in U_k} \tag{10}$$

in the $L^2$ sense with $J \ge 1$ fixed.

The following theorem establishes the convergence under Monte Carlo simulation.

Theorem 5 (On Monte Carlo Simulation). Assume that there is a time discretization $\{t_j\}_{j=1}^n$ and that we obtain $M$ samples $\{X_{t_j}^m\}_{j=1, m=1}^{n, M}$. Then, we have

$$\phi(t, X_t) = \lim_{M \to \infty} \lim_{\max_{1 \le k \le K} d_{U_k} \to 0^+} \frac{1}{M} \sum_{m=1}^M \sum_{k=1}^K \hat{\phi}_{k,J}(t, X_t^m) \mathbf{1}_{X_t^m \in U_k} \tag{11}$$

where $\hat{\phi}_{k,J}$ is obtained via OLS in the cases of polynomial regression and Taylor expansion, in the $L^2$ sense with $J \ge 1$ fixed.

Theorem 5 is ultimately the convergence result for the methodology we propose. We first generate $M$ copies of the sample paths of $X$ from 0 to $T$. Then, we use the MiniBatchKMeans function from Python's scikit-learn library to partition the $M$ simulated values of $X$, at each time $t \in [0, T]$, into $K$ clusters. Within each cluster $U_k$, $k = 1, 2, \ldots, K$, we perform the analysis shown above and concatenate the pieces to form the values of the conditional expectations at time $t$. The methodology works backwards in time. Of course, there are many different choices of $\hat{\phi}_{k,J}$, depending on how we want to approximate the conditional expectations within $U_k$.
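The cluster-then-regress step can be sketched as follows. The snippet is our own illustration: to keep it dependency-light we use a small hand-rolled Lloyd's k-means in place of scikit-learn's MiniBatchKMeans, and we compare a single global linear regression (the K = 1, brute-force case) against local linear regressions in K = 20 clusters for a nonlinear target; all data and parameters are illustrative.

```python
import numpy as np

def kmeans_1d(x, K, iters=50, seed=0):
    """Plain Lloyd's k-means on scalar data (a stand-in for MiniBatchKMeans)."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, K, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean()
    return labels

rng = np.random.default_rng(7)
M, K = 50_000, 20
x_t = rng.uniform(-2.0, 2.0, M)                     # simulated values of X_t
psi = np.sin(3.0 * x_t) + rng.normal(0.0, 0.1, M)   # noisy nonlinear response
truth = np.sin(3.0 * x_t)                           # E[psi | X_t]

# K = 1: a single global linear regression (the brute-force baseline)
global_fit = np.polyval(np.polyfit(x_t, psi, 1), x_t)

# K = 20: a linear regression within each cluster, then concatenate
labels = kmeans_1d(x_t, K)
local_fit = np.empty(M)
for k in range(K):
    idx = labels == k
    if idx.sum() > 2:
        local_fit[idx] = np.polyval(np.polyfit(x_t[idx], psi[idx], 1),
                                    x_t[idx])
    else:                     # degenerate cluster: fall back to raw samples
        local_fit[idx] = psi[idx]

mse_global = float(np.mean((global_fit - truth) ** 2))
mse_local = float(np.mean((local_fit - truth) ** 2))
print(mse_global, mse_local)   # local regression is far more accurate
```

This is the point of the divide-and-conquer design: within a small cluster a low-degree polynomial is accurate, so cheap local fits concatenated over the partition beat a single global fit of the same degree.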

With the above discussion on the evaluation of conditional expectations, we can now describe the main numerical method for a BSDEJ. First, define a set of time nodes $\pi_n = \{t_i\}_{i=0}^n$, with $t_0 = 0$ and $t_n = T$. Suppose that $t_i - t_{i-1} \equiv h = T/n$. Second, we simulate $M$ copies of the forward process $X$ at the discretization nodes $\{t_i\}_{i=0}^n$. Third, we iterate backwards in time using the methodologies for evaluating conditional expectations outlined above. On the time interval $[t_{n-1}, t_n)$, use the MiniBatchKMeans method to cluster $\{X_{t_{n-1}}^{\pi_n, m}\}_{m=1}^M$ into $K$ disjoint subsets $\{U_{t_{n-1}}^k\}_{k=1}^K$. Within each cluster $U_{t_{n-1}}^k$, apply the polynomial regression or Taylor expansion method to evaluate $\hat{\phi}_{\pi_n, M, k, J}(t_{n-1}, X_{t_{n-1}}^{\pi_n, m})$, where the subscript $\pi_n$ reminds the reader that this intermediate function now depends on the time discretization, and $M$ denotes its dependence on the Monte Carlo simulation. After obtaining $\{\hat{\phi}_{\pi_n, M, k, J}\}_{k=1}^K$, we treat it as the intermediate solution to the BSDEJ at time $t_{n-1}$. In order to evaluate $Z$ and $V$ (taking $Z$ as an example), we use the relation $Z_t \approx \frac{1}{h} \mathbb{E}_t[\Delta Y_t \Delta W_t]$. The equations for $Z$ and $V$ are stated in ([7], Equation 2.10). Once we obtain the intermediate approximate solution $(Y_{t_{n-1}}^{\pi_n, M, k, J}, Z_{t_{n-1}}^{\pi_n, M, k, J}, V_{t_{n-1}}^{\pi_n, M, k, J})$, we plug the tuple into the driver $f$ of the BSDEJ, and $Y_{t_{n-1}}^{\pi_n, M, k, J} + h f(t_{n-1}, X_{t_{n-1}}^{\pi_n}, Y_{t_{n-1}}^{\pi_n, M, k, J}, Z_{t_{n-1}}^{\pi_n, M, k, J}, V_{t_{n-1}}^{\pi_n, M, k, J})$ is the terminal condition for the evaluations on the time interval $[t_{n-2}, t_{n-1})$. Iterate until $t_0$ is reached.
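To make the backward recursion concrete, here is a stripped-down sketch in our own notation. To isolate the time-stepping logic we drop the jump part (so $V \equiv 0$), use a single cluster ($K = 1$) with a global polynomial regression, and take the linear driver $f(t, x, y, z, v) = -r y$ with terminal condition $\phi(x) = x$ on a driftless forward process $X_t = x_0 + W_t$; then $Y_t = e^{-r(T-t)} X_t$ is known in closed form, which lets us check the scheme. All function names and parameters are illustrative.

```python
import numpy as np

def solve_bsde_backward(x0=1.0, r=0.1, T=0.5, n=50, M=100_000,
                        deg=2, seed=1):
    """One-cluster backward scheme: Y_{t_i} = E_i[Y_{t_{i+1}} + h f(...)],
    with each conditional expectation evaluated by polynomial least squares."""
    rng = np.random.default_rng(seed)
    h = T / n
    dW = rng.normal(0.0, np.sqrt(h), (M, n))
    X = x0 + np.hstack([np.zeros((M, 1)), np.cumsum(dW, axis=1)])

    Y = X[:, -1].copy()                   # terminal condition phi(x) = x
    for i in range(n - 1, 0, -1):
        target = Y + h * (-r * Y)         # Y_{i+1} + h f with driver f = -r y
        coef = np.polyfit(X[:, i], target, deg)
        Y = np.polyval(coef, X[:, i])     # regression plays the role of E_i[.]
        # Z_{t_i} could be estimated at this point from the relation
        # Z ~ (1/h) E_i[dY dW], regressing target * dW_i in the same way.
    return float(np.mean(Y + h * (-r * Y)))   # at t_0 the state is constant

y0 = solve_bsde_backward()
print(y0)   # close to the closed form exp(-r T) * x0 = exp(-0.05) ~ 0.9512
```

Restoring the full method amounts to replacing the single global `polyfit` by the clustered local fits of the previous subsection and adding the jump terms $U$ and $V$ to the driver.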

2.3. Convergence Results

First, we introduce the spaces necessary for the discussion of the convergence results. For an $\mathbb{R}^r$-valued function $x : [0, T] \to \mathbb{R}^r$, let the sup-norm be

$$\|x\|_{[a,b]} := \sup\{|x_t|, t \in [a, b]\}. \tag{12}$$

We shall use the following spaces for stochastic processes, with $p \ge 2$.

$\mathcal{S}_r^p[s,t]$ is the set of $\mathbb{R}^r$-valued adapted càdlàg processes $X$ such that

$$\|X\|_{\mathcal{S}_r^p[s,t]} := \mathbb{E}\big[e^{\theta(t-s)} \|X(\omega)\|_{[s,t]}^p\big]^{\frac{1}{p}} < \infty. \tag{13}$$

Here $\theta \ge 0$ is a nonnegative constant. We sometimes write $\mathcal{S}_r^p[s,t]$ as $\mathcal{S}^p$ if doing so causes no ambiguity. The same is true for the spaces defined below.

$\mathcal{H}^p[s,t]$ is the set of progressively measurable $\mathbb{R}^d$-valued processes $Z$ such that

$$\|Z\|_{\mathcal{H}^p[s,t]} := \mathbb{E}\Big[\Big(\int_s^t e^{\theta(v-s)} |Z_v|^2\,dv\Big)^{\frac{p}{2}}\Big]^{\frac{1}{p}} < \infty. \tag{14}$$

$\mathcal{J}^p[s,t]$ is the set of $q$-dimensional processes $\psi = \{\psi_i, 1 \le i \le q\}$, $\psi_i : \Omega \times [0,T] \times E \to \mathbb{R}$, which are $\mathcal{F} \otimes \mathcal{B}([0,T]) \otimes \mathcal{B}(E)$-measurable and satisfy

$$\|\psi\|_{\mathcal{J}^p[s,t]} := \mathbb{E}\Big[\Big(\sum_{i=1}^q \int_s^t \int_E e^{\theta(v-s)} |\psi_i(\omega, v, e)|^2\,\nu_i(de)\,dv\Big)^{\frac{p}{2}}\Big]^{\frac{1}{p}} < \infty. \tag{15}$$

$\mathcal{K}^p[s,t]$ is the set of processes $(Y, Z, \psi)$ in the space $\mathcal{S}^p[s,t] \times \mathcal{H}^p[s,t] \times \mathcal{J}^p[s,t]$ with the norm

$$\|(Y, Z, \psi)\|_{\mathcal{K}^p[s,t]} := \big(\|Y\|_{\mathcal{S}^p[s,t]}^p + \|Z\|_{\mathcal{H}^p[s,t]}^p + \|\psi\|_{\mathcal{J}^p[s,t]}^p\big)^{\frac{1}{p}}. \tag{16}$$

Let $C_b^g(D)$ be the space of bounded functions that have continuous and bounded derivatives up to order $g$ on the domain $D \subseteq \mathbb{R}^r$, and $C^g(D)$ the space of functions that have continuous derivatives up to order $g$. In what follows, we only consider the norms with $\theta > 0$ and $p = 2$. In order to state the final convergence result, we need the following assumption.

Assumption 6 (On the BSDEJ). Assume that the BSDEJ (1) admits a unique solution $S_t = (Y_t, Z_t, V_t)$ in $\mathcal{K}^p[0,T]$ and that the time-discretized solution $S_t^{\pi_n} := (Y_t^{\pi_n}, Z_t^{\pi_n}, V_t^{\pi_n})$ converges to $S_t = (Y_t, Z_t, V_t)$ in the $\mathcal{K}^p[0,T]$ sense.

Then, the following convergence result justifies the methodology introduced above.

Theorem 7 (Main Convergence Result). Under Assumptions 1 and 6, we have

$$\lim_{h \to 0^+} \lim_{M \to \infty} \lim_{\max_{1 \le k \le K} d_{U_k} \to 0^+} \|S^{\pi_n, M, K, J} - S\|_{\mathcal{K}^p[0,T]} = 0 \tag{17}$$

where $S_t^{\pi_n, M, K, J} = \sum_{k=1}^K S_t^{\pi_n, M, k, J} \mathbf{1}_{X_t \in U_t^k}$ and $S_t^{\pi_n, M, k, J} = (Y_t^{\pi_n, M, k, J}, Z_t^{\pi_n, M, k, J}, V_t^{\pi_n, M, k, J})$.

3. Applications in Finance

The BSDE associated with American option prices can be found in [8]. We numerically solve ([8], Equation 2.10) using the methodology introduced in Section 2; the performance is documented below. Here $M$ is the number of sample paths and $H$ is the number of time-discretization nodes. The time to maturity of the American call option is $T = 0.50$, $S_0$ is the initial price of the underlying asset, and the strike price is 100. Table 1 and Table 2 contain the efficiency results.
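For readers who want a runnable reference point for this section, the snippet below sketches the classical Longstaff-Schwartz regression [9] for an American put; it is the no-jump, K = 1 special case of the method above, not the paper's own implementation. The parameters ($r = 0.06$, $\sigma = 0.2$, $S_0 = K = 100$, $T = 0.5$) are our own illustrative choices and are not those behind Table 1 and Table 2.

```python
import numpy as np

def lsm_american_put(S0=100.0, K=100.0, r=0.06, sigma=0.2, T=0.5,
                     n=50, M=50_000, seed=3):
    """Longstaff-Schwartz: regress discounted future cash flows on
    polynomials in S, exercising when the intrinsic value exceeds the
    estimated continuation value."""
    rng = np.random.default_rng(seed)
    h = T / n
    z = rng.normal(0.0, 1.0, (M, n))
    S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * h
                              + sigma * np.sqrt(h) * z, axis=1))
    S = np.hstack([np.full((M, 1), S0), S])        # prepend the initial price

    payoff = np.maximum(K - S, 0.0)
    cash = payoff[:, -1]                           # exercise at maturity
    for i in range(n - 1, 0, -1):
        cash *= np.exp(-r * h)                     # discount one step back
        itm = payoff[:, i] > 0                     # regress on ITM paths only
        if itm.sum() > 10:
            coef = np.polyfit(S[itm, i], cash[itm], 2)
            cont = np.polyval(coef, S[itm, i])     # continuation value
            exercise = payoff[itm, i] > cont
            cash[itm] = np.where(exercise, payoff[itm, i], cash[itm])
    return float(np.exp(-r * h) * cash.mean())

price = lsm_american_put()
print(price)   # American put estimate; the European BS value here is ~4.2
```

Replacing the single global `polyfit` by clustered local regressions, and the diffusion by the jump-diffusion forward process, turns this classic into the clustering method proposed above.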

4. Conclusion

In this paper, we propose a new machine learning method, based on clustering, to run [9]-type regressions to solve BSDEs with jumps. A convergence proof is given and numerical results show good performance. The generalization of the proposed methodology to more complicated BSDEs with jumps, for example McKean-Vlasov type BSDEs, is of interest. We leave it for future research.

Table 1. Numerical results when M = 50000.

Table 2. Numerical results when M = 75000.

Appendix: Proofs

Proof of Lemma 1. Denote $\lambda_i = \min_{\phi \in \Phi(D_i)} \mathbb{E}\big[|\psi(X_T) - \phi(t, X_t)|^2 \mathbf{1}_{X_t \in D_i}\big]$. Because $D_i \subseteq D_{i+1}$, it is obvious that $\Phi(D_{i+1}) \supseteq \Phi(D_i)$. Therefore, we have $\lambda_i < \lambda_{i+1} \le \lambda^*$, where

$$\lambda^* = \min_{\phi \in \Phi(D)} \mathbb{E}\big[|\psi(X_T) - \phi(t, X_t)|^2\big]. \tag{18}$$

We immediately have $\lim_{i \to \infty} \lambda_i = \hat{\lambda} \le \lambda^*$. In addition, we have

$$\lambda^* = \min_{\phi \in \Phi(D)} \mathbb{E}\big[|\psi(X_T) - \phi(t, X_t)|^2\big] \tag{19}$$

$$\le \min_{\phi_1 \in \Phi(D_i)} \mathbb{E}\big[|\psi(X_T) - \phi_1(t, X_t)|^2 \mathbf{1}_{X_t \in D_i}\big] \tag{20}$$

$$\quad + \mathbb{E}\big[|\psi(X_T) - \phi^*(t, X_t)|^2 \mathbf{1}_{X_t \in D \setminus D_i}\big]. \tag{21}$$

Here $\phi^* = \arg\min_{\phi \in \Phi(D)} \mathbb{E}\big[|\psi(X_T) - \phi(t, X_t)|^2\big]$. The term

$$\mathbb{E}\big[|\psi(X_T) - \phi^*(t, X_t)|^2 \mathbf{1}_{X_t \in D \setminus D_i}\big] \tag{22}$$

$$\le C\,\mathbb{P}(X_t \in D \setminus D_i). \tag{23}$$

Here $C$ is independent of $i$ and we have $\lim_{i \to \infty} \mathbb{P}(X_t \in D \setminus D_i) = 0$. Combining this result with $\lambda_i < \lambda_{i+1} \le \lambda^*$ gives the claim of the lemma.

Lemma 2 (On Lead-Lag Regression). Suppose that $H \subseteq \Phi$. Then we have

$$\arg\min_{\phi \in H} \mathbb{E}\big[|\psi(X_T) - \phi(t, X_t)|^2\big] = \arg\min_{\phi \in H} \mathbb{E}\big[|\phi^*(t, X_t) - \phi(t, X_t)|^2\big] \tag{24}$$

where $\phi^*(t, X_t) = \arg\min_{\phi \in \Phi} \mathbb{E}\big[|\psi(X_T) - \phi(t, X_t)|^2\big]$.

Proof of Lemma 2. The proof of this lemma follows directly from the Repeated Projection Theorem (see, e.g., [1], Theorem 8).

Proof of Theorem 3. Apply Lemma 2 and ([1], Theorem 23); the claim follows immediately.

Proof of Theorem 4. The proof of the theorem follows from the error term of the Taylor expansion, Lemma 2 and ( [1], Theorem 24).

NOTES

1. Of course, the set $D$ can be time-dependent and unbounded. For brevity we suppress the time variable throughout the paper. But we will use a sequence of bounded and closed subsets of $D$, denoted by $\{D_i\}_{i=1}^\infty$, and assume that $D_i \subseteq D_{i+1}$ and $\bigcup_{i=1}^\infty D_i = D$. Then, we choose a sufficiently large $I$ and perform computations in $D_I$ instead of $D$.

2. For example, we can take $\Phi(D) := \{\phi \mid \phi(t, X_t) \text{ is } L^2 \text{ for all } t \in [0, T]\}$.

3. The equation below means that we run a lead-lag polynomial regression within each $U_k$, with response variable $\psi(X_T)$ and regressor $X_t$.

Cite this paper: Zhang, L. (2020) A Clustering Method to Solve Backward Stochastic Differential Equations with Jumps. Journal of Mathematical Finance, 10, 1-9. doi: 10.4236/jmf.2020.101001.
References

[1]   Ye, T.T. and Zhang, L.L. (2019) Derivatives Pricing via Machine Learning. Journal of Mathematical Finance, 9, 561-589. https://doi.org/10.4236/jmf.2019.93029

[2]   Han, J., Jentzen, A. and E, W.N. (2017) Overcoming the Curse of Dimensionality: Solving High-Dimensional Partial Differential Equations Using Deep Learning. Working Paper. https://arxiv.org/pdf/1707.02568.pdf

[3]   Khoo, Y., Lu, J. and Ying, L. (2017) Solving Parametric PDE Problems with Artificial Neural Networks. Working Paper. https://arxiv.org/pdf/1707.03351.pdf

[4]   Sirignano, J. and Spiliopoulos, K. (2017) DGM: A Deep Learning Algorithm for Solving Partial Differential Equations. Working Paper.
https://arxiv.org/pdf/1708.07469.pdf

[5]   Beck, C., E, W.N. and Jentzen, A. (2017) Machine Learning Approximation Algorithms for High-Dimensional Fully Nonlinear Partial Differential Equations and Second-Order Backward Stochastic Differential Equations. Working Paper. https://arxiv.org/pdf/1709.05963.pdf

[6]   E, W.N., Han, J. and Jentzen, A. (2017) Deep Learning-Based Numerical Methods for High-Dimensional Parabolic Partial Differential Equations and Backward Stochastic Differential Equations. Working Paper. https://arxiv.org/pdf/1706.04702.pdf

[7]   Bouchard, B. and Elie, R. (2008) Discrete-Time Approximation of Decoupled Forward-Backward SDE with Jumps. Stochastic Processes and Their Applications, 118, 53-75. https://doi.org/10.1016/j.spa.2007.03.010

[8]   Fujii, M., Sato, S. and Takahashi, A. (2015) An FBSDE Approach to American Option Pricing with an Interacting Particle Method. Asia-Pacific Financial Markets, 22, 239-260. https://doi.org/10.1007/s10690-014-9195-6

[9]   Longstaff, F.A. and Schwartz, E.S. (2001) Valuing American Options by Simulation: A Simple Least-Squares Approach. Review of Financial Studies, 14, 113-147. https://doi.org/10.1093/rfs/14.1.113

 
 