Solving BSDEs is of great importance in mathematical finance: derivatives pricing problems, dynamic portfolio choice problems and even dynamic stochastic general equilibrium problems can be related to various types of BSDEs. However, due to its recursive nature, a BSDE with jumps cannot be directly solved via the Monte Carlo method.
In this paper, we apply the methodology for evaluating conditional expectations, originally introduced in , to solve BSDEs with jumps. This novel method simplifies the structure of the approximation to the true solution by running a local approximation. Under the assumption that the solution is continuous, we show that a local linear regression approximation is accurate enough with a very limited computational budget, and convergence is established. To be specific, we use Monte Carlo simulation and a machine learning clustering method to divide the state space into small clusters, in which we approximate the true solutions via linear regression.
Recent literature on machine learning methods for solving PDEs/BSDEs includes, for example, Han, Jentzen and E (2017), Beck, E and Jentzen (2017) and E, Han and Jentzen (2017). In those references, the authors perform a brute-force machine learning approximation to the true solutions, and the methods are therefore time-consuming. Our method can serve as a generalization of the above methods in that, when the number of clusters is set to 1, our method degenerates to the aforementioned ones.
The organization of this paper is as follows. Section 2 describes the method and the convergence proof. Section 3 discusses an application in finance and Section 4 concludes. All the proofs can be found in the Appendix.
2. A Formal Description of Our Approach
2.1. The BSDE
Throughout the paper, we consider a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$, with $\mathbb{F} = (\mathcal{F}_t)_{0 \le t \le T}$. The space supports a $d$-dimensional Brownian motion $W$ and a Poisson random measure $N$ on $[0,T] \times E$, where $\mathcal{B}([0,T])$ is the Borel σ-algebra on $[0,T]$ and $(E, \mathcal{E})$ is a measurable space. Define $E \subseteq \mathbb{R}^q \setminus \{0\}$ and $\mathcal{E}$ as the Borel σ-algebra on $E$. $\mathbb{P}$ is the probability measure on $(\Omega, \mathcal{F})$. The filtration $\mathbb{F}$ is completed with all $\mathbb{P}$-null sets, right-continuous, and is generated by $W$ and $N$ for $t \in [0, T]$. Assume that $\mathcal{F} = \mathcal{F}_T$ and that $W$ and $N$ are mutually independent under $\mathbb{P}$. Suppose that the compensating measure of $N$ is $\nu(de)\,dt$, where $\nu$ is a σ-finite measure on $(E, \mathcal{E})$ satisfying $\int_E (1 \wedge |e|^2)\,\nu(de) < \infty$. The corresponding compensated Poisson random measure is defined by $\tilde{N}(dt, de) := N(dt, de) - \nu(de)\,dt$.
The BSDEJ under consideration, whose solution is denoted by $(Y, Z, V)$, is
$$
\begin{aligned}
X_t &= X_0 + \int_0^t b(X_s)\,ds + \int_0^t \sigma(X_s)\,dW_s + \int_0^t \int_E \beta(X_{s-}, e)\,\tilde{N}(ds, de), \\
Y_t &= g(X_T) + \int_t^T f(s, X_s, Y_s, Z_s, V_s)\,ds - \int_t^T Z_s\,dW_s - \int_t^T \int_E V_s(e)\,\tilde{N}(ds, de),
\end{aligned} \qquad (1)
$$
where the terminal condition is $g(X_T)$ for a given smooth and bounded function $g$. The vector of state variables $X$ is $r$-dimensional and satisfies the forward SDE (FSDE) in (1). The standard Brownian motion $W$ is $d$-dimensional, $Y$ is 1-dimensional, $Z$ is $d$-dimensional, $V$ is $q$-dimensional, the driver $f$ is 1-dimensional and the jump coefficient $\beta$ is $q$-dimensional. The triple $(Y, Z, V)$ is called the solution to the BSDEJ system (1). We assume that $r = d = q = 1$ throughout the paper without loss of generality.
2.2. The Numerical Approximation Scheme
This section describes the construction of the approximate solution to the BSDEJ (1). Given a time discretization $0 = t_0 < t_1 < \cdots < t_H = T$, the intermediate solutions can be obtained by the following equations, according to Bouchard and Elie (2008):
$$
\bar{Y}_{t_i} = \mathbb{E}\big[\bar{Y}_{t_{i+1}} \,\big|\, \mathcal{F}_{t_i}\big] + (t_{i+1} - t_i)\, f\big(t_i, \bar{X}_{t_i}, \bar{Y}_{t_i}, \bar{Z}_{t_i}, \bar{V}_{t_i}\big), \qquad (2)
$$
where $\bar{Y}_{t_i}$, $\bar{Z}_{t_i}$ and $\bar{V}_{t_i}$ are functions of $\bar{X}_{t_i}$. It is therefore crucial to be able to evaluate the above conditional expectations for each $t$.
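To fix ideas, the following is a minimal sketch of one explicit backward step of (2), written against a generic conditional-expectation estimator. The names `backward_step` and `cond_exp` are ours, for illustration only; the estimators for $\bar{Z}_{t_i}$ and $\bar{V}_{t_i}$ are omitted here and discussed later in this section.

```python
import numpy as np

def backward_step(x_now, y_next, dt, f, cond_exp):
    """One explicit backward step of the scheme (2) -- an illustrative sketch.

    x_now    : shape (M,), simulated values of X at t_i
    y_next   : shape (M,), intermediate solution Y at t_{i+1}
    dt       : time step t_{i+1} - t_i
    f        : driver of the BSDEJ, called as f(x, y, z, v)
    cond_exp : estimator of E[ y_next | X_{t_i} ], evaluated at the samples
    """
    ey = cond_exp(x_now, y_next)          # E[Y_{t_{i+1}} | F_{t_i}] as a function of X_{t_i}
    z = np.zeros_like(ey)                 # placeholder: Z is estimated separately
    v = np.zeros_like(ey)                 # placeholder: V is estimated separately
    return ey + dt * f(x_now, ey, z, v)   # explicit variant of (2)
```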
Suppose that the domain of the state variables $X$ is $D$.¹ Partition $D$ by a class of disjoint sets $\{R_k\}_{k=1}^K$. Denote the distance between $x$ and $y$ by $\rho(x, y)$, where $\rho$ is the canonical Euclidean distance. It is known that the evaluation of conditional expectations is essentially a regression problem, in the following sense:
$$
h^* := \mathbb{E}[Y \mid X = \cdot\,] = \operatorname*{arg\,min}_{h \in \mathcal{H}} \mathbb{E}\big[(Y - h(X))^2\big], \qquad (3)
$$
where $\mathcal{H}$ is an appropriate function space with domain $D$.² First, we need the following result.
Lemma 1. We have
$$
\lim_{I \to \infty} \mathbb{E}\Big[\big(h^*_I(X)\,\mathbf{1}_{\{X \in D_I\}} - h^*(X)\big)^2\Big] = 0,
$$
where $h^*$ and $h^*_I$ denote the solutions to (3) with domains $D$ and $D_I$, respectively, and $\{D_I\}_{I \ge 1}$ is the exhausting sequence introduced in footnote 1.
With the above result in mind, we will always write $D_I$ as $D$ and assume that $D$ is bounded and closed. Many methods have emerged to find global solutions to the above Equation (3). However, in this paper, we follow a divide-and-conquer approach, trying to fit the regression locally in each of the subspaces $R_k$, for $k = 1, \dots, K$. This means we should try to evaluate
$$
h^*_k := \operatorname*{arg\,min}_{h \in \mathcal{H}(R_k)} \mathbb{E}\big[(Y - h(X))^2\,\mathbf{1}_{\{X \in R_k\}}\big],
$$
where $\mathcal{H}(R_k)$ is an appropriate function space with domain $R_k$, under the constraint $\mathbb{P}(X \in R_k) > 0$. ( , Theorem 23) concludes that, indeed, we have $h^*_k = h^*$ on each $R_k$. In ( , Theorem 24), this condition is relaxed under certain assumptions. In this paper, we will use the two aforementioned theorems and show that polynomial regressions and Taylor expansions for the local regression functions $h^*_k$, applying Monte Carlo simulation to generate samples and machine learning clustering to partition the simulated paths, result in convergent numerical methods. In order to use Taylor expansions, we need $h^*$ to be $C^l_b$-smooth, meaning it should have bounded derivatives up to order $l$ in $D$. When this fails, we must approximate $h^*$ using $C^l_b$-smooth functions. The following assumption applies throughout the paper.
Assumption 1 (On $h^*$). Assume that, in some sense, $h^*$ can be approximated by a sequence of $C^l_b$-smooth functions $\{h_n\}_{n \ge 1}$, with $l$ finite or $l = \infty$. Moreover, we have
$$
\lim_{n \to \infty} \mathbb{E}\big[(h_n(X) - h^*(X))^2\big] = 0
$$
for any $t \in [0, T]$.
Remark 2. For example, $h_n$ can converge to $h^*$ in a point-wise sense or in the $L^2$ sense. Because of Assumption 1, we will always assume that $h^*$ is $C^l_b$-smooth.
Then, we have the following theorems.
Theorem 3 (On Polynomial Regression). Assume that³
$$
h_{k,J} := \operatorname*{arg\,min}_{p \in \mathcal{P}_J(R_k)} \mathbb{E}\Big[\big(\bar{Y}_{t_{i+1}} - p(\bar{X}_{t_i})\big)^2\,\mathbf{1}_{\{\bar{X}_{t_i} \in R_k\}}\Big],
$$
where $\mathcal{P}_J(R_k)$ is the space of all polynomials of degree equal to or less than $J$ with domain $R_k$. Then, we have
$$
h_{k,J} \longrightarrow h^*_k \quad \text{as } J \to \infty,
$$
in the $L^2$ sense with $K$ fixed.
Theorem 4 (On Taylor Expansion). Assume that $h^* \in C^l_b(D)$ and
$$
h^{\mathrm{T}}_{k,J}(x) := \sum_{|\alpha| \le J} \frac{\partial^{\alpha} h^*(x_k)}{\alpha!}\,(x - x_k)^{\alpha}, \qquad x \in R_k,
$$
where $x_k \in R_k$ and $J \le l$. Then, we have
$$
h^{\mathrm{T}}_{k,J} \longrightarrow h^*_k \quad \text{as } \max_{1 \le k \le K} \delta_k \to 0,
$$
where $\delta_k := \sup_{x, y \in R_k} \rho(x, y)$ is the diameter of $R_k$, in the $L^2$ sense with $J$ fixed.
The following theorem establishes the convergence under Monte Carlo simulation.
Theorem 5 (On Monte Carlo Simulation). Assume that there is a time discretization $0 = t_0 < t_1 < \cdots < t_H = T$ and we obtain $M$ samples $\{\bar{X}^m_{t_i}\}_{m=1}^M$ at each node. Then, we have
$$
h^M_{k,J} \longrightarrow h_{k,J} \quad \text{as } M \to \infty,
$$
where $h^M_{k,J}$ is obtained via OLS in the cases of polynomial regression and Taylor expansion, in the $L^2$ sense with $K$ and $J$ fixed.
Theorem 5 is ultimately the convergence result for the methodology we propose. We first generate $M$ copies of the sample paths of $X$ from 0 to $T$. Then, we use the MiniBatchKMeans function of the Python library scikit-learn to partition the $M$ copies of the simulated values of $X$, at each time $t_i$, into $K$ clusters. Within each cluster $R_{i,k}$, for $k = 1, \dots, K$, we perform the analysis shown above and concatenate the different pieces to assemble the values of the conditional expectations at time $t_i$. The methodology works backwards in time. Of course, there are many different choices of $\mathcal{H}(R_{i,k})$, depending on how we want to approximate the conditional expectations within each cluster.
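As an illustration, the following is a minimal sketch of this clustering-plus-local-OLS step for a one-dimensional state variable. The helper name `cluster_regress` and the default parameter values are ours, not the paper's.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def cluster_regress(x_now, y_next, n_clusters=10, degree=1):
    """Estimate E[y_next | x_now] by per-cluster polynomial OLS.

    x_now, y_next : arrays of shape (M,), simulated values at t_i and t_{i+1}
    Returns the fitted conditional expectation at each of the M samples.
    """
    km = MiniBatchKMeans(n_clusters=n_clusters).fit(x_now.reshape(-1, 1))
    fitted = np.empty_like(y_next)
    for k in range(n_clusters):
        idx = km.labels_ == k
        if not idx.any():
            continue  # skip empty clusters
        # design matrix of monomials 1, x, ..., x^degree within cluster k
        A = np.vander(x_now[idx], N=degree + 1, increasing=True)
        coef, *_ = np.linalg.lstsq(A, y_next[idx], rcond=None)
        fitted[idx] = A @ coef
    return fitted
```

Note that setting `n_clusters=1` recovers a single global regression, which is the sense in which the proposed method generalizes the brute-force approaches mentioned in the Introduction.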
With the above discussion on the evaluation of conditional expectations, we can now describe the main numerical method for a BSDEJ. First, define a set of time nodes $\{t_i\}_{i=0}^H$, with $t_0 = 0$ and $t_H = T$. Suppose that the mesh is uniform, i.e., $t_{i+1} - t_i = T/H =: \Delta$. Second, we simulate $M$ copies of the forward process $X$ at the discretization nodes $\{t_i\}_{i=0}^H$. Third, we iterate backwards in time using the methodologies for evaluating conditional expectations outlined above. On the time interval $[t_i, t_{i+1})$, use the MiniBatchKMeans method to cluster $\{\bar{X}^m_{t_i}\}_{m=1}^M$ into $K$ disjoint subsets $\{R_{i,k}\}_{k=1}^K$. Within each cluster $R_{i,k}$, apply the polynomial regression or Taylor expansion method to evaluate $h^{H,M}_{i,k}$, where we add the superscript $H$ to remind the reader that this intermediate function now depends on the time discretization, and $M$ to denote the Monte Carlo simulation dependence. After we obtain $\bar{Y}^{H,M}_{t_i}$, we can treat it as the intermediate solution to the BSDEJ at time $t_i$. In order to evaluate $Z$ and $V$, taking $Z$ as an example, we use the relation $\bar{Z}_{t_i} = \frac{1}{\Delta}\,\mathbb{E}\big[\bar{Y}_{t_{i+1}}\,(W_{t_{i+1}} - W_{t_i})^{\top} \,\big|\, \mathcal{F}_{t_i}\big]$. The equations for $Z$ and $V$ are stated in ( , Equation 2.10). When we obtain the intermediate approximate solution $(\bar{Y}^{H,M}_{t_i}, \bar{Z}^{H,M}_{t_i}, \bar{V}^{H,M}_{t_i})$, we plug the tuple into the driver $f$ of the BSDEJ, and $\bar{Y}^{H,M}_{t_i}$ becomes the terminal condition for the evaluations on the time interval $[t_{i-1}, t_i)$. Iterate until we reach $t_0 = 0$.
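A sketch of the $Z$-evaluation step under the relation above, reusing any conditional-expectation estimator such as `cluster_regress`, is given below. The function name `estimate_z` is ours; the analogous $V$-step (stated in the cited Equation 2.10) is omitted.

```python
def estimate_z(x_now, y_next, dw, dt, cond_exp):
    """Estimate Z_{t_i} via Z ~ E[Y_{t_{i+1}} (W_{t_{i+1}} - W_{t_i}) | X_{t_i}] / dt.

    x_now    : shape (M,), simulated values of X at t_i
    y_next   : shape (M,), intermediate solution Y at t_{i+1}
    dw       : shape (M,), Brownian increments over [t_i, t_{i+1}]
    cond_exp : conditional-expectation estimator, e.g. cluster_regress
    """
    return cond_exp(x_now, y_next * dw) / dt
```

Together with `backward_step` from Section 2.2, this closes one backward iteration: estimate $\bar{Z}$ (and $\bar{V}$), then update $\bar{Y}$ by plugging the tuple into the driver $f$.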
2.3. Convergence Results
First, we introduce the necessary spaces to facilitate the discussion of the convergence results. For an $\mathbb{R}^n$-valued function $h$, let the sup-norm be
$$
\|h\|_{\infty} := \sup_{x} |h(x)|.
$$
We shall use the following spaces for stochastic processes with exponent $p$:
• $\mathcal{S}^p[0,T]$ is the set of $\mathbb{R}$-valued adapted càdlàg processes $X$ such that
$$
\|X\|_{\mathcal{S}^p} := \mathbb{E}\Big[\sup_{0 \le t \le T} |X_t|^p\Big]^{1/p} < \infty.
$$
Here $p$ is a nonnegative constant. We sometimes write $\mathcal{S}^p[0,T]$ as $\mathcal{S}^p$ if doing so causes no ambiguity. The same is true for the spaces to be defined below.
• $\mathcal{H}^p[0,T]$ is the set of progressively measurable $\mathbb{R}^d$-valued processes $Z$ such that
$$
\|Z\|_{\mathcal{H}^p} := \mathbb{E}\bigg[\Big(\int_0^T |Z_t|^2\,dt\Big)^{p/2}\bigg]^{1/p} < \infty.
$$
• $\mathcal{H}^p_N[0,T]$ is the set of $q$-dimensional processes $V$, which are $\mathcal{P} \otimes \mathcal{E}$-measurable ($\mathcal{P}$ denoting the predictable σ-algebra) and satisfy
$$
\|V\|_{\mathcal{H}^p_N} := \mathbb{E}\bigg[\Big(\int_0^T \int_E |V_t(e)|^2\,\nu(de)\,dt\Big)^{p/2}\bigg]^{1/p} < \infty.
$$
• $\mathcal{B}^p$ is the set of processes $(Y, Z, V)$ in the space $\mathcal{S}^p[0,T] \times \mathcal{H}^p[0,T] \times \mathcal{H}^p_N[0,T]$ with the norm
$$
\|(Y, Z, V)\|_{\mathcal{B}^p} := \Big(\|Y\|_{\mathcal{S}^p}^p + \|Z\|_{\mathcal{H}^p}^p + \|V\|_{\mathcal{H}^p_N}^p\Big)^{1/p}.
$$
Let $C^g_b(D)$ be the space of bounded functions that have continuous and bounded derivatives up to order $g$ in the domain $D$, and $C^g(D)$ the space of functions that have continuous derivatives up to order $g$. In what follows, we will only consider the norms with $p = 2$ and the horizon $[0, T]$. In order to provide the final convergence result, we need the following assumption.
Assumption 6 (On the BSDEJ). Assume that the BSDEJ (1) admits a unique solution $(Y, Z, V)$ in $\mathcal{B}^2$ and that the time-discretized solution $(\bar{Y}, \bar{Z}, \bar{V})$ converges to $(Y, Z, V)$ in the $\mathcal{B}^2$ sense.
Then, the following convergence result justifies the methodology introduced above.
Theorem 7 (Main Convergence Result). Under Assumptions 1 and 6, we have
$$
\lim_{H \to \infty}\ \lim_{\kappa \to \infty}\ \lim_{M \to \infty}\ \big\|\big(\bar{Y}^{H,M}, \bar{Z}^{H,M}, \bar{V}^{H,M}\big) - (Y, Z, V)\big\|_{\mathcal{B}^2} = 0,
$$
where $(\bar{Y}^{H,M}, \bar{Z}^{H,M}, \bar{V}^{H,M})$ is the numerical solution constructed in Section 2.2, and $\kappa = J$ (with $K$ fixed) in the polynomial regression case while $\kappa = K$ (with $J$ fixed) in the Taylor expansion case.
3. Applications in Finance
The BSDE associated with American option prices can be found in Fujii, Sato and Takahashi (2015). We numerically solve ( , Equation 2.10) using the methodology introduced in Section 2. Performance is documented below. Here $M$ is the number of sample paths and $H$ is the number of time discretization nodes. The time to maturity of the American call option is $T$. $S_0$ is the initial price of the underlying asset and the strike price is 100. Table 1 and Table 2 contain the efficiency results.
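For readers who wish to reproduce a similar experiment, the following sketch simulates forward paths of a jump-diffusion underlying that could feed the solver. The Merton-style specification and every parameter value here are illustrative assumptions of ours, not the specification behind Tables 1 and 2.

```python
import numpy as np

def simulate_paths(M=50_000, H=50, T=0.5, s0=100.0,
                   mu=0.05, sigma=0.2, lam=0.3, jump_mu=-0.1, jump_sig=0.15,
                   seed=0):
    """Simulate M paths of a Merton-style jump-diffusion on H time steps.

    All parameter values are placeholders for illustration.
    Returns an array of shape (M, H + 1) holding the price paths.
    """
    rng = np.random.default_rng(seed)
    dt = T / H
    x = np.full(M, np.log(s0))            # work in log-price
    paths = np.empty((M, H + 1))
    paths[:, 0] = np.exp(x)
    for i in range(H):
        dw = rng.normal(0.0, np.sqrt(dt), M)        # Brownian increments
        n_jumps = rng.poisson(lam * dt, M)          # jump counts per step
        # compound Poisson with normal jump sizes: sum of n iid normals
        jumps = rng.normal(jump_mu * n_jumps, jump_sig * np.sqrt(n_jumps))
        x += (mu - 0.5 * sigma**2) * dt + sigma * dw + jumps
        paths[:, i + 1] = np.exp(x)
    return paths
```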
4. Conclusion
In this paper, we propose a new machine learning method, based on clustering, to run Longstaff-Schwartz-type regressions to solve BSDEs with jumps. A convergence proof is given and the numerical results show good performance. The generalization of the proposed methodology to more complicated BSDEs with jumps, for example, McKean-Vlasov-type BSDEs, is of interest. We leave it for future research.
Table 1. Numerical results when M = 50000.
Table 2. Numerical results when M = 75000.
Appendix
Proof of Lemma 1. Denote $A_I := D \setminus D_I$. Because we have $D_I \subseteq D_{I+1}$, it is obvious that $\mathbf{1}_{\{X \in A_I\}}$ decreases to 0 almost surely as $I \to \infty$. Therefore, we have $h^*_I = h^*$ on $D_I$, where both functions coincide with the conditional expectation $\mathbb{E}[Y \mid X = \cdot\,]$ restricted to $D_I$. We immediately have
$$
\mathbb{E}\Big[\big(h^*_I(X)\,\mathbf{1}_{\{X \in D_I\}} - h^*(X)\big)^2\Big] = \mathbb{E}\big[h^*(X)^2\,\mathbf{1}_{\{X \in A_I\}}\big].
$$
In addition, we have $\|h^*\|_{\infty} \le C$ because the terminal function $g$ is bounded. The term
$$
\mathbb{E}\big[h^*(X)^2\,\mathbf{1}_{\{X \in A_I\}}\big] \le C^2\,\mathbb{P}(X \in A_I).
$$
Here $C$ is independent of $I$ and we have $\mathbb{P}(X \in A_I) \to 0$ by dominated convergence. Combine this result with the identity above and we obtain the claim of this lemma.
Lemma 2 (On Lead-Lag Regression). Suppose that $\mathbb{E}\big[\bar{Y}_{t_{i+1}}^2\big] < \infty$; then we have
$$
\operatorname*{arg\,min}_{h} \mathbb{E}\Big[\big(\bar{Y}_{t_{i+1}} - h(\bar{X}_{t_i})\big)^2\Big] = \mathbb{E}\big[\bar{Y}_{t_{i+1}} \,\big|\, \bar{X}_{t_i} = \cdot\,\big],
$$
where the minimum is taken over all measurable $h$ with $\mathbb{E}[h(\bar{X}_{t_i})^2] < \infty$.
Proof of Lemma 2. The proof of this lemma follows directly from the Repeated Projection Theorem (see, e.g., , Theorem 8).
Proof of Theorem 3. Apply Lemma 2 and ( , Theorem 23); the claim follows immediately.
Proof of Theorem 4. The proof of the theorem follows from the error term of the Taylor expansion, Lemma 2 and ( , Theorem 24).
¹ Of course, the set $D$ can be time-dependent and unbounded. For brevity, we ignore the time variable throughout the paper. But we will use a sequence of bounded and closed subsets of $D$, denoted by $\{D_I\}_{I \ge 1}$, and assume that $D_I \subseteq D_{I+1}$ and $\bigcup_{I \ge 1} D_I = D$. Then, we will choose a sufficiently large $I$ and perform computations in $D_I$ instead of $D$.
² For example, we can ask $\mathcal{H}$ to be the set of all measurable functions $h$ with $\mathbb{E}[h(X)^2] < \infty$.
³ The equation below means that we run a lead-lag polynomial regression within each $R_k$, with $Y$ variable $\bar{Y}_{t_{i+1}}$ and $X$ variable $\bar{X}_{t_i}$.
 Han, J., Jentzen, A. and E, W. (2017) Overcoming the Curse of Dimensionality: Solving High-Dimensional Partial Differential Equations Using Deep Learning. Working Paper. https://arxiv.org/pdf/1707.02568.pdf
 Beck, C., E, W. and Jentzen, A. (2017) Machine Learning Approximation Algorithms for High-Dimensional Fully Nonlinear Partial Differential Equations and Second-Order Backward Stochastic Differential Equations. Working Paper.
 E, W., Han, J. and Jentzen, A. (2017) Deep Learning-Based Numerical Methods for High-Dimensional Parabolic Partial Differential Equations and Backward Stochastic Differential Equations. Working Paper. https://arxiv.org/pdf/1706.04702.pdf
 Bouchard, B. and Elie, R. (2008) Discrete-Time Approximation of Decoupled Forward-Backward SDE with Jumps. Stochastic Processes and Their Applications, 118, 53-75. https://doi.org/10.1016/j.spa.2007.03.010
 Fujii, M., Sato, S. and Takahashi, A. (2015) An FBSDE Approach to American Option Pricing with an Interacting Particle Method. Asia-Pacific Financial Markets, 22, 239-260.
 Longstaff, F.A. and Schwartz, E.S. (2001) Valuing American Options by Simulation: A Simple Least-Squares Approach. Review of Financial Studies, 14, 113-147. https://doi.org/10.1093/rfs/14.1.113