Derivatives Pricing via Machine Learning

1. Introduction

Theoretical and empirical finance research routinely requires the evaluation of conditional expectations. In a continuous-time jump-diffusion setting, these can be related to second-order partial integro-differential equations (PIDEs) of parabolic type by the Feynman-Kac theorem, and, in more complicated settings, to other types of equations such as backward stochastic differential equations with jumps (BSDEJs) or quasi-linear PIDEs. In theoretical continuous-time finance, many problems, such as asset pricing with market frictions, dynamic hedging or dynamic portfolio-consumption choice, can be related to Hamilton-Jacobi-Bellman (HJB) equations via dynamic programming techniques. From another perspective, the HJB equations are equivalent to BSDEs derived from a probabilistic approach. The nonlinear BSDEs studied in [2] can be decomposed, via Picard iteration, into a sequence of linear equations, which can be solved by taking conditional expectations. For empirical studies, the literature has focused on evaluating the cross-sectional conditional risk-adjusted expected returns and explaining them using factors; see [3], [4] and [5] for good illustrations. Thus, whether the underlying models are continuous-time or discrete-time, evaluating conditional expectations is unavoidable in the finance literature. Moreover, in order to perform XVA computations for the measurement of counterparty credit risk, we need to evaluate the conditional expectations, i.e., the derivative prices, on a future simulation grid, as outlined in [6]. These facts call for efficient methods to compute the aforementioned quantities.

In this paper, we extend the basis function expansion approach proposed in [1] with machine learning techniques. Specifically, we propose new efficient methods to evaluate conditional expectations, regardless of the dynamics of the underlying stochastic processes, as long as they can be simulated. Rigorous convergence proofs are given using Hilbert space theory. The methodologies can be applied to time-zero pricing as well as pricing on a future simulation grid, with the advantage of the ANN approximation most prominent in high-dimensional problems. In the sequel, we show applications of our methodologies to the pricing of European derivatives; the extension to contracts with optimal stopping features is straightforward through either the approach of [1] or reflected BSDEs.

Compared to the literature on traditional stochastic analysis, our methodologies are able to handle large data sets and high-dimensional problems, and therefore suffer much less from the curse of dimensionality, owing to the nature of ANN methods. Moreover, our methodologies are very efficient when evaluating solutions of BSDEJs and PIDEs on a future simulation grid, where none of the traditional methodologies applies. With respect to the recent machine learning literature on numerical solutions to BSDEs and PDEs, our methodologies enjoy the theoretical advantage of being able to handle equations with jump-diffusions, and convergence results are provided. When applied to the solutions of BSDEJs and PIDEs, our methodologies require far fewer parameters than the current machine learning based methods mentioned below. At any step in the solution process, only one ANN is needed and no nested optimization is required. In terms of application, not all prices of OTC derivatives can be easily translated into BSDEJs and PIDEs; an example is a range accrual with both American and barrier (e.g., knock-out) features. Our methodologies, however, are naturally suited to those situations. To conclude, our methods enjoy many theoretical and empirical advantages, which make them attractive and novel.

There is a large literature on applications of machine learning techniques to financial research. Classical applications focus on the prediction of market variables such as equity indexes or FX rates and on the detection of market anomalies, for example, [7] and [8]. Option pricing via brute-force curve fitting by ANNs dates back to [9]. More applications of machine learning in finance, especially option pricing prediction, are surveyed in [10] and the references therein. Pricing of American options in high dimensions can be found in [11], which is closest to our Method 1. However, our methods improve on this reference in several ways. First, we enable deep neural network (DNN) approximation and show convergence. Second, we can incorporate constraints in the estimation of the DNN approximation and prove the mathematical validity of this approach. Third, we propose two more efficient methods to complement our first method. Our treatment of constraints in the estimation of DNNs extends the work of [12] in that we can deal with a larger class of constraints by specifying a general Hilbert subspace as the constrained set. Risk measure computation using machine learning can be found in [13]. Applications of machine learning function approximation to financial econometrics can be found in [14], [15], [16] and [17]. Recent applications include empirical and theoretical asset pricing, and reinforcement learning and Q-learning for solving dynamic programming problems such as optimal investment-consumption choice, option pricing and the construction of optimal trading strategies, e.g., [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28] and references therein. Numerical methods to solve PDEs and BSDEs or the related inverse problems can be found in [29], [30], [31], [32], [33], [34], [35], [36], [37], [38] and [39]. Machine learning based methods enjoy the advantage of being fast and able to handle large data sets and high-dimensional problems.

Our methodologies combine traditional statistical learning theory and stochastic analysis with advanced machine learning techniques, introducing powerful function approximation methods via the universal approximation theorem and artificial neural networks (ANNs), while preserving the regression-type analysis documented in [1]. The methods are easy to use, effective, time-efficient and, as illustrated by the numerical experiments, accurate. They differ from the convergent expansion method, e.g., [40], from simulation methods such as [41], [42], [43] and [44], and from the asymptotic expansion method proposed by [45], [46], [47], [48], [49], [50], [51] and [52], in that we no longer resort to polynomial basis function expansion or small-diffusion type analysis. Our methods also differ from the pure machine learning based ones documented in [29], [30], [31], [32], [33], [34] and [35], in that we utilize the lead-lag regression formula to evaluate the conditional expectations, preserving the time-dependent structure, and our methods are able to handle jump-diffusion processes easily.

The organization of this paper is as follows. Section 2 documents the methodologies. Section 3 illustrates the usefulness of our methods by considering European and American derivatives pricing. Section 4 considers numerical experiments and Section 5 concludes. An outline of the proofs and other applications can be found in the appendices.

2. The Methodology

Mathematical Setup

We use a Markov process modeled by a jump-diffusion as an illustration. Suppose that we have a stochastic differential equation with jumps

$\text{d}{X}_{t}=\mu \left(t\mathrm{,}{X}_{t}\right)\text{d}t+\sigma \left(t\mathrm{,}{X}_{t}\right)\text{d}{W}_{t}+{\displaystyle {\int}_{E}}\gamma \left(t\mathrm{,}{X}_{t}\mathrm{,}e\right)\tilde{N}\left(\text{d}t\mathrm{,}\text{d}e\right),\quad {X}_{0}={x}_{0}$ (1)

where $X\in {\mathbb{R}}^{r}$, $W\in {\mathbb{R}}^{d}$ is a standard d-dimensional Brownian motion and $\tilde{N}$ is a q-dimensional compensated Poisson random measure, with the compensator $\nu \left(\text{d}t\mathrm{,}\text{d}e\right)\mathrm{:}=\nu \left(\text{d}e\right)\text{d}t$. The information filtration ${\mathcal{F}}_{t}={\mathcal{F}}_{t}^{W,N}$ is generated by $\left(W\mathrm{,}N\right)$. We hope to evaluate the conditional expectation ${\mathbb{E}}_{t}\left[\psi \left({X}_{T}\right)\right]$ for any $0<t<T$, e.g., see [53]. Assumptions on $\psi $ and X are stated below.

Assumption 1 (On Growth Condition of ψ). $\psi $ has polynomial growth in its argument x, i.e., there exists a positive integer P, independent of x, such that for all $\left|x\right|>1$, we have, for constant C independent of x

$\left|\psi \left(x\right)\right|\le C{\left|x\right|}^{P}\mathrm{.}$ (2)

The following assumption is w.r.t. X.

Assumption 2 (On X). There exists a unique strong solution to Equation (1) and X has finite polynomial moments of all orders.
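Since the methods below only require that X can be simulated, it may help to see what a simulation of Equation (1) could look like. The following is a minimal sketch under simplifying assumptions that are not in the paper: a scalar process ($r=d=q=1$), an Euler time discretization, and a compound Poisson jump component with intensity `jump_rate` and normally distributed jump sizes, i.e., $\gamma \left(t,x,e\right)=e$. The function name `simulate_jump_diffusion` and all parameter names are hypothetical.

```python
import numpy as np

def simulate_jump_diffusion(x0, mu, sigma, jump_rate, jump_mean, jump_std,
                            T=1.0, n_steps=100, n_paths=10_000, seed=0):
    """Euler scheme for a scalar version of Equation (1).

    Assumption (illustrative only): jumps are compound Poisson with
    intensity `jump_rate` and N(jump_mean, jump_std^2) sizes, and the
    jump integral is compensated so that it has zero mean.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, float(x0))
    for k in range(n_steps):
        t = k * dt
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        n_jumps = rng.poisson(jump_rate * dt, n_paths)
        # A sum of n i.i.d. N(m, s^2) jump sizes is N(n*m, n*s^2).
        jumps = jump_mean * n_jumps + jump_std * np.sqrt(n_jumps) * rng.standard_normal(n_paths)
        compensator = jump_rate * jump_mean * dt
        x = x + mu(t, x) * dt + sigma(t, x) * dw + jumps - compensator
    return x

# Zero drift, constant volatility: X is a martingale, so the sample mean stays near x0.
x_T = simulate_jump_diffusion(1.0, lambda t, x: 0.0 * x, lambda t, x: 0.2,
                              jump_rate=1.0, jump_mean=0.1, jump_std=0.2)
print(x_T.mean(), x_T.var())
```

With zero drift the compensated process is a martingale, so the sample mean should stay close to $x_0$; this is a quick sanity check on the compensator term rather than a statement about the paper's experiments.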

The General Approximation Theory

First, we need the following assumptions, definitions and results. Note that some of the spaces we introduce are actually conditional ones. Discussions of conditional Hilbert spaces can be found in [54]; e.g., ${L}^{2}\left({\mathcal{F}}_{t}\right)$ is a conditional Hilbert space for all $t\in \left[\mathrm{0,}T\right]$.

Definition 3 (Projection Operator). Let $\mathcal{X}$ and $\mathcal{H}$ be Hilbert spaces with $\mathcal{H}\subset \mathcal{X}$. Define ${\text{PROJ}}_{\mathcal{H}}x$ as the projection of $x\in \mathcal{X}$ onto $\mathcal{H}$.

Definition 4 (Orthogonal Space). Let $\mathcal{X}$ and $\mathcal{H}$ be Hilbert spaces with $\mathcal{H}\subset \mathcal{X}$. Define ${\text{ORTH}}_{\mathcal{H}}\mathcal{X}$ as the orthogonal space of $\mathcal{H}$ in $\mathcal{X}$.

Definition 5 (Spanning the Hilbert Space). Assume that $\mathcal{E}={\left\{{e}^{j}\right\}}_{j\in \Lambda}$ is a set of elements in Hilbert space $\mathcal{X}$ and $\Lambda $ is an index set. Define ${\text{H}}_{\mathcal{E}}$ as the intersection of all Hilbert subspaces of $\mathcal{X}$ containing $\mathcal{E}$.

Assumption 6 (On Joint Continuity). $\mathcal{X}$ and $\mathcal{H}$ are two Hilbert spaces and $\mathcal{H}\subset \mathcal{X}$. Moreover, ${\left\{{\mathcal{H}}_{n}\right\}}_{n=1}^{\infty}$ is a sequence of Hilbert sub-spaces of $\mathcal{H}$ satisfying ${\mathcal{H}}_{n}\subset {\mathcal{H}}_{n+1}$ for any $n\ge 1$ and $\overline{{\displaystyle {\cup}_{n=1}^{\infty}{\mathcal{H}}_{n}}}=\mathcal{H}$. We have ${\mathrm{lim}}_{n\to \infty}{\Vert h-{\text{PROJ}}_{{\mathcal{H}}_{n}}{h}_{n}\Vert}_{\mathcal{H}}=0$ for any $h\in \mathcal{H}$ and ${\mathrm{lim}}_{n\to \infty}{h}_{n}=h$.

The next two theorems are well-known in the literature.

Theorem 7 (Hilbert Projection Theorem). Let $\mathcal{H}\subset \mathcal{X}$ be two Hilbert spaces and let $x\in \mathcal{X}$. Then, ${\text{PROJ}}_{\mathcal{H}}x$ exists and is unique. Moreover, it is characterized uniquely by $x-{\text{PROJ}}_{\mathcal{H}}x\in {\text{ORTH}}_{\mathcal{H}}\mathcal{X}$.

Theorem 8 (Repeated Projection Theorem). Let $\mathcal{G}\subset \mathcal{H}\subset \mathcal{X}$ be three Hilbert spaces. Then, for any $x\in \mathcal{X}$, ${\text{PROJ}}_{\mathcal{G}}x={\text{PROJ}}_{\mathcal{G}}\left({\text{PROJ}}_{\mathcal{H}}x\right)$.
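Theorems 7 and 8 can be checked numerically in a finite-dimensional setting. The sketch below takes $\mathcal{X}={\mathbb{R}}^{6}$ with randomly generated nested subspaces $\mathcal{G}\subset \mathcal{H}$; the helper name `projector` and the dimensions are illustrative choices, not part of the paper.

```python
import numpy as np

def projector(basis):
    """Orthogonal projector onto the column span of `basis` (via reduced QR)."""
    q, _ = np.linalg.qr(basis)
    return q @ q.T

rng = np.random.default_rng(0)
h_basis = rng.standard_normal((6, 4))  # H: a 4-dimensional subspace of X = R^6
g_basis = h_basis[:, :2]               # G: a 2-dimensional subspace of H
x = rng.standard_normal(6)

proj_g, proj_h = projector(g_basis), projector(h_basis)

# Theorem 7: the residual x - PROJ_H x is orthogonal to every element of H.
residual_inner_products = h_basis.T @ (x - proj_h @ x)

# Theorem 8: projecting onto G directly equals projecting onto H first, then G.
direct = proj_g @ x
repeated = proj_g @ (proj_h @ x)
print(np.allclose(residual_inner_products, 0.0), np.allclose(direct, repeated))
```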

Remark 9. The conditions of Theorems 7 and 8 on $\mathcal{G}$ and $\mathcal{H}$ can be relaxed to convexity and completeness instead of Hilbert sub-spaces.

Finally, we have the result below.

Theorem 10. Suppose $\mathcal{X}$ is a Hilbert space, ${\left\{{\mathcal{H}}_{n}\right\}}_{n=1}^{\infty}$ and $\mathcal{H}$ are Hilbert subspaces of $\mathcal{X}$ satisfying ${\mathcal{H}}_{n}\subset {\mathcal{H}}_{n+1}$ and $\overline{{\displaystyle {\cup}_{n=1}^{\infty}{\mathcal{H}}_{n}}}=\mathcal{H}\subset \mathcal{X}$. For $x\in \mathcal{X}$, define ${h}_{n}={\text{PROJ}}_{{\mathcal{H}}_{n}}x$ and $h={\text{PROJ}}_{\mathcal{H}}x$. Then we have ${\mathrm{lim}}_{n\to \infty}{h}_{n}=h$ w.r.t. the norm topology in $\mathcal{X}$, if Assumption 6 is satisfied.

Sometimes we need to add constraints on the calibrated ANN, e.g., the shape constraints. The following assumption and theorem deal with this situation.

Assumption 11 (On Constrained Sub-space). Suppose that $\Psi \subset \mathcal{X}$ is such that ${\left\{\Psi \cap {\mathcal{H}}_{n}\right\}}_{n=1}^{\infty}$ is a sequence of non-empty, convex and complete subsets of $\mathcal{X}$ satisfying Assumption 6, where $\mathcal{X}$ and ${\left\{{\mathcal{H}}_{n}\right\}}_{n=1}^{\infty}$ are as described above.

The following theorem handles the constrained approximation and its convergence.

Theorem 12 (On Constrained Approximation). Under Assumptions 6 and 11, for $x\in \mathcal{X}$, if $h={\text{PROJ}}_{\mathcal{H}}x\in \Psi $, then, we have ${\mathrm{lim}}_{n\to \infty}{\text{PROJ}}_{\Psi \cap {\mathcal{H}}_{n}}x=h$.

Remark 13 (On Ψ). In Theorem 12, the set $\Psi $ represents prior knowledge on constraints that h satisfies. It can be represented by a set of non-linear inequalities or equalities on functionals of h. Common constraints for option pricing include the non-negativity constraint and the positivity constraint on the second-order derivatives. Whether ${\left\{\Psi \cap {\mathcal{H}}_{n}\right\}}_{n=1}^{\infty}$ satisfies Assumption 6 should be verified on a case-by-case basis.

To proceed further, we need the following assumptions.

Assumption 14 (On Some Spaces). ${\left\{{\mathcal{H}}_{t}^{J}\right\}}_{J=1}^{\infty}$ is an increasing sequence of Hilbert sub-spaces of ${L}^{2}\left({\mathcal{F}}_{t}\right)$, ${\mathcal{H}}_{t}^{J}\subset {\mathcal{H}}_{t}^{J+1}$, $\overline{{\displaystyle {\cup}_{J=1}^{\infty}{\mathcal{H}}_{t}^{J}}}={\mathcal{H}}_{t}\subset {L}^{2}\left({\mathcal{F}}_{t}\right)$. Moreover, $\overline{\left\{{\mathbb{E}}_{t}\left[{\xi}_{T}\right]\mathrm{|}{\xi}_{T}\in {L}^{2}\left({\mathcal{F}}_{T}\right)\mathrm{,}{\mathbb{E}}_{t}\left[{\xi}_{T}\right]\in {L}^{2}\left({\mathcal{F}}_{t}\right)\right\}}\subset {\mathcal{H}}_{t}\subset {L}^{2}\left({\mathcal{F}}_{t}\right)\subset {L}^{2}\left({\mathcal{F}}_{T}\right)={\mathcal{X}}_{T}$.

Assumption 15 (On Structure of ${\mathcal{H}}_{t}^{J}$). ${\left\{{e}_{t}^{j}\right\}}_{j\in \Lambda}$ is a set of elements of ${L}^{2}\left({\mathcal{F}}_{t}\right)$, such that ${\mathcal{H}}_{t}^{J}={\text{H}}_{{\left\{{e}_{t}^{j}\right\}}_{j\in {\Lambda}_{J}}}$, where ${\Lambda}_{J}\subset {\Lambda}_{J+1}\subset \Lambda $ for any $J\ge 1$ and ${\cup}_{J=1}^{\infty}{\Lambda}_{J}=\Lambda $, satisfies Assumption 14^{1}.

Then, we have the following results.

Lemma 1. For any adapted stochastic process $\xi $ such that ${\xi}_{T}\in {L}^{2}\left({\mathcal{F}}_{T}\right)$, if ${\mathbb{E}}_{t}\left[{\xi}_{T}\right]\in {L}^{2}\left({\mathcal{F}}_{t}\right)$, we have

${\mathbb{E}}_{t}\left[{\xi}_{T}\right]=\mathrm{arg}\underset{{\eta}_{t}\in {L}^{2}\left({\mathcal{F}}_{t}\right)}{\mathrm{min}}\mathbb{E}\left[{\left({\xi}_{T}-{\eta}_{t}\right)}^{2}\right].$ (3)

The following proposition is a natural extension of Lemma 1.

Proposition 16. For any measurable function $\psi $ and stochastic process X such that $\psi \left({X}_{T}\right)\in {L}^{2}\left({\mathcal{F}}_{T}\right)$ and ${\mathbb{E}}_{t}\left[\psi \left({X}_{T}\right)\right]\in {L}^{2}\left({\mathcal{F}}_{t}\right)$, we have

${\mathbb{E}}_{t}\left[\psi \left({X}_{T}\right)\right]=\mathrm{arg}\underset{{\xi}_{t}\in {L}^{2}\left({\mathcal{F}}_{t}\right)}{\mathrm{min}}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right)-{\xi}_{t}\right)}^{2}\right]\mathrm{.}$ (4)

Here ${\xi}_{t}\in {\mathcal{F}}_{t}$ and the above minimization problem has a unique solution. In particular, if X is a Markov process, then ${\xi}_{t}=\varphi \left(t\mathrm{,}{X}_{t}\right)$, i.e., ${\xi}_{t}$ is a function of time t and ${X}_{t}$.
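Proposition 16 can be illustrated on a case with a known answer. In the sketch below, which is an illustrative example rather than one of the paper's experiments, X is a standard Brownian motion and $\psi \left(x\right)={x}^{2}$, so that ${\mathbb{E}}_{t}\left[{X}_{T}^{2}\right]={X}_{t}^{2}+\left(T-t\right)$. Minimizing the sample mean-squared error over functions of the form ${a}_{0}+{a}_{1}{X}_{t}+{a}_{2}{X}_{t}^{2}$ should therefore recover coefficients close to $\left(T-t,0,1\right)$. The function name `projection_coefficients` is hypothetical.

```python
import numpy as np

def projection_coefficients(n_paths=200_000, t=1.0, T=2.0, seed=0):
    """Least-squares regression of psi(X_T) = X_T^2 on {1, X_t, X_t^2}.

    Assumption (illustrative only): X is a standard Brownian motion,
    for which E_t[X_T^2] = X_t^2 + (T - t) in closed form.
    """
    rng = np.random.default_rng(seed)
    x_t = np.sqrt(t) * rng.standard_normal(n_paths)
    x_T = x_t + np.sqrt(T - t) * rng.standard_normal(n_paths)
    payoff = x_T**2
    basis = np.column_stack([np.ones(n_paths), x_t, x_t**2])
    coef, *_ = np.linalg.lstsq(basis, payoff, rcond=None)
    return coef

coef = projection_coefficients()
print(coef)  # close to [T - t, 0, 1] = [1, 0, 1], up to Monte Carlo noise
```

The regression uses only simulated pairs $\left({X}_{t},{X}_{T}\right)$ and never evaluates the conditional law explicitly, which is the property the methods in this section rely on.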

We then have the following theorem.

Theorem 17. Under Assumptions 1, 2, 6, 14 and 15, for any adapted stochastic process $\xi $ such that ${\xi}_{T}\in {L}^{2}\left({\mathcal{F}}_{T}\right)$ and ${\mathbb{E}}_{t}\left[{\xi}_{T}\right]\in {L}^{2}\left({\mathcal{F}}_{t}\right)$, we have

$\underset{J\to \infty}{\mathrm{lim}}\mathrm{arg}\underset{{\eta}_{t}\in {\mathcal{H}}_{t}^{J}}{\mathrm{min}}\mathbb{E}\left[{\left({\xi}_{T}-{\eta}_{t}\right)}^{2}\right]{=}_{{L}^{2}\left({\mathcal{F}}_{t}\right)}{\mathbb{E}}_{t}\left[{\xi}_{T}\right]\mathrm{.}$ (5)

Further, for any measurable function $\psi $ and stochastic process X such that $\psi \left({X}_{T}\right)\in {L}^{2}\left({\mathcal{F}}_{T}\right)$ and ${\mathbb{E}}_{t}\left[\psi \left({X}_{T}\right)\right]\in {L}^{2}\left({\mathcal{F}}_{t}\right)$, we have the following equality

$\underset{J\to \infty}{\mathrm{lim}}\mathrm{arg}\underset{{\xi}_{t}\in {\mathcal{H}}_{t}^{J}}{\mathrm{min}}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right)-{\xi}_{t}\right)}^{2}\right]{=}_{{L}^{2}\left({\mathcal{F}}_{t}\right)}{\mathbb{E}}_{t}\left[\psi \left({X}_{T}\right)\right].$ (6)

If X is Markov, then we have ${\xi}_{t}=\varphi \left(t\mathrm{,}{X}_{t}\right)$, i.e., ${\xi}_{t}$ is a function of time t and ${X}_{t}$.

The following theorem justifies the Monte Carlo approximation of expectation in the above optimization problems.

Theorem 18 (On Sequential Convergence). Under Assumptions 1, 2, 6, 14 and 15, suppose that $\left|{\Lambda}_{J}\right|={m}_{J}<\infty $ for all $J\ge 1$, ${\left\{{X}_{T}^{i}\right\}}_{i=1}^{M}$ and ${\left\{{e}_{t}^{j,i}\right\}}_{j,i=1,1}^{{m}_{J},M}$ are M i.i.d. copies of ${X}_{T}$ and ${\left\{{e}_{t}^{j}\right\}}_{j=1}^{{m}_{J}}$. Then we have

$\underset{J\to \infty}{\mathrm{lim}}\underset{M\to \infty}{\mathrm{lim}}\mathrm{arg}\underset{{\xi}_{t}^{m}\in {\mathcal{H}}_{t}^{J}}{\mathrm{min}}\frac{1}{M}{\displaystyle \underset{m\mathrm{=1}}{\overset{M}{\sum}}}{\left(\psi \left({X}_{T}^{m}\right)-{\xi}_{t}^{m}\right)}^{2}{=}_{\mathbb{P}}{\mathbb{E}}_{t}\left[\psi \left({X}_{T}\right)\right]\mathrm{.}$ (7)
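As a worked step, consider the inner limit in (7) for fixed J. The notation ${\mathbf{e}}_{t}^{i}$, ${\hat{a}}_{M}$ and ${a}^{*}$ is introduced here for illustration only, and the Gram matrices below are assumed to be invertible. Writing a candidate as ${\xi}_{t}={\sum}_{j=1}^{{m}_{J}}{a}_{j}{e}_{t}^{j}$ and collecting the basis values on path i into the vector ${\mathbf{e}}_{t}^{i}={\left({e}_{t}^{1,i},\cdots ,{e}_{t}^{{m}_{J},i}\right)}^{\top}$, the sample problem is an ordinary least-squares problem with solution

$\hat{a}_{M}={\left(\frac{1}{M}{\displaystyle \sum_{i=1}^{M}}{\mathbf{e}}_{t}^{i}{\left({\mathbf{e}}_{t}^{i}\right)}^{\top}\right)}^{-1}\frac{1}{M}{\displaystyle \sum_{i=1}^{M}}{\mathbf{e}}_{t}^{i}\psi \left({X}_{T}^{i}\right).$

By the law of large numbers, as $M\to \infty $,

$\hat{a}_{M}\to {a}^{*}={\left(\mathbb{E}\left[{\mathbf{e}}_{t}{\mathbf{e}}_{t}^{\top}\right]\right)}^{-1}\mathbb{E}\left[{\mathbf{e}}_{t}\psi \left({X}_{T}\right)\right]$

in probability, and ${\left({a}^{*}\right)}^{\top}{\mathbf{e}}_{t}$ is the projection of $\psi \left({X}_{T}\right)$ onto ${\mathcal{H}}_{t}^{J}$. The outer limit $J\to \infty $ is then the convergence of projections stated in Theorem 17.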

The following results justify the universal approximation and ANN approximation approaches proposed in this paper.

Proposition 19 (On Universal Approximation Theory). Let $\sigma $ denote the function in the universal approximation theorem mentioned in [55], [56] and [57]. Define ${\left\{{e}_{t}^{j}\right\}}_{j=1}^{{m}_{n}}:={\left\{\sigma \left({\alpha}_{j}+{\beta}_{j}{X}_{t}\right)\right\}}_{j=1}^{{m}_{n}}$, where X satisfies Equation (1) and Assumption 2, ${\alpha}_{j}$ and ${\beta}_{j}$ have at most n significant digits in total, where $n\in \mathbb{N}$, i.e., n belongs to the set of natural numbers, j runs from 1 to ${m}_{n}$ and ${m}_{n}$ is the number of all related $\left\{{e}_{t}^{j}\right\}$, i.e., ${m}_{n}=\left|\left\{\sigma \left(\alpha +\beta {X}_{t}\right)\mathrm{|}\alpha \text{ and }\beta \text{ have at most }n\text{ total significant digits}\right\}\right|$. Then, ${\left\{{\text{H}}_{{\left\{{e}_{t}^{j}\right\}}_{j=1}^{{m}_{n}}}\right\}}_{n\in \mathbb{N}}$ satisfies Assumptions 6, 14 and 15. Therefore, Theorems 17 and 18 apply.
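To make the span in Proposition 19 concrete, the following sketch regresses a payoff on a finite dictionary $\left\{\sigma \left({\alpha}_{j}+{\beta}_{j}{X}_{t}\right)\right\}$. Several choices here are assumptions made for illustration and are not taken from the paper: $\sigma =\mathrm{tanh}$, the pairs $\left({\alpha}_{j},{\beta}_{j}\right)$ are drawn at random and then frozen (rather than enumerated by significant digits), a constant feature is added explicitly (it corresponds to ${\beta}_{j}=0$), X is a standard Brownian motion with $t=1$, $T=2$, and $\psi \left(x\right)=\mathrm{max}\left(x,0\right)$, for which ${\mathbb{E}}_{t}\left[\psi \left({X}_{T}\right)\right]={X}_{t}\Phi \left({X}_{t}\right)+\phi \left({X}_{t}\right)$ in closed form. The names `fit_random_feature_expansion` and `true_phi` are hypothetical.

```python
import math
import numpy as np

def fit_random_feature_expansion(x_t, y, n_features=20, seed=1):
    """Least-squares fit of y on {1, tanh(alpha_j + beta_j * x_t)}."""
    rng = np.random.default_rng(seed)
    alpha = rng.standard_normal(n_features)  # frozen inner parameters (assumption)
    beta = rng.standard_normal(n_features)

    def features(x):
        x = np.asarray(x, dtype=float)
        return np.column_stack([np.ones_like(x), np.tanh(alpha + np.outer(x, beta))])

    coef, *_ = np.linalg.lstsq(features(x_t), y, rcond=None)
    return lambda x: features(x) @ coef

def true_phi(x):
    """Closed form of E[max(x + Z, 0)] for Z ~ N(0, 1): x * Phi(x) + phi(x)."""
    x = np.asarray(x, dtype=float)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)
    return x * cdf + pdf

rng = np.random.default_rng(0)
n_paths = 50_000
x_t = rng.standard_normal(n_paths)        # X_t with t = 1
x_T = x_t + rng.standard_normal(n_paths)  # X_T with T - t = 1
phi_hat = fit_random_feature_expansion(x_t, np.maximum(x_T, 0.0))

grid = np.linspace(-1.0, 1.0, 21)
print(np.max(np.abs(phi_hat(grid) - true_phi(grid))))  # maximum error on the grid
```

Freezing the inner parameters turns the fit into a linear least-squares problem, which matches the fixed-dictionary setting of the proposition; jointly optimizing them is a different, non-convex problem.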

Proposition 20 (On Deep Neural Network Approximation). For the DNN defined in ([58], Definition 1.1), observe that ${W}_{l}\left(x\right)={\alpha}_{l}+{\beta}_{l}x$. Define

${e}_{t}^{j}\mathrm{:}={W}_{L\mathrm{,}j}\circ \rho \circ {W}_{L-\mathrm{1,}j}\circ \rho \circ \cdots \circ {W}_{\mathrm{1,}j}\circ \rho \left({X}_{t}\right)$ (8)

where ${W}_{l,j}\left(x\right)={\alpha}_{l,j}+{\beta}_{l,j}x$ satisfies that $l=1,2,\cdots ,L$, $\left({\alpha}_{l\mathrm{,}j}\mathrm{,}{\beta}_{l\mathrm{,}j}\right)$ have at most n total significant digits and $n\in \mathbb{N}$. Then, ${\left\{{\text{H}}_{{\left\{\mathrm{1,}{e}_{t}^{j}\right\}}_{j=1}^{{m}_{n}}}\right\}}_{n\in \mathbb{N}}$, where 1 means the function $f\left(x\right)\equiv 1$ for all x, satisfies Assumptions 6, 14 and 15. Therefore, Theorems 17 and 18 apply after a localization argument on $\psi $ and X on a compact sub-domain in ${\mathbb{R}}^{r}$.

Remark 21 (On DNN). Please note that, in Proposition 20, we do not intend to prove the convergence when the number of layers goes to infinity. Instead, we show convergence when the number of connections goes to infinity, which can be achieved via enlarging the number of neurons in each layer with the total number of layers remaining fixed.

Remark 22 (On Euler Time Discretization). [59] proposes an exact simulation method for multi-dimensional stochastic differential equations. When the Euler method is used instead, the discretization error of the regression approach proposed in this paper is not hard to analyze if $\psi $ satisfies Assumption 1, in which case the dominated convergence theorem and the ${L}^{2}$ convergence of the Euler method can be applied to show convergence.

The proofs of the above results can be found in Appendix A. In what follows, we will propose three methods to compute, approximately, the function $\varphi $ in Proposition 16.

Method 1

In general, $\varphi $, defined in Proposition 16 and Theorem 17, cannot be found in closed form. A natural idea is to resort to function expansion representations, i.e., to find the solution to the following problem

${\mathbb{E}}_{t}\left[\psi \left({X}_{T}\right)\right]=\mathrm{arg}\text{}\underset{{\left\{{a}_{j}\mathrm{,}{\theta}_{j}\right\}}_{j=0}^{\infty}\in \mathcal{A}}{\mathrm{min}}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right)-{\displaystyle \underset{j=0}{\overset{\infty}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{a}_{j}{e}^{j}\left(t\mathrm{,}{X}_{t}\mathrm{|}{\theta}_{j}\right)\right)}^{2}\right]$ (9)

where $\mathcal{A}$ is an appropriate space for coefficients ${\left\{{a}_{j},{\theta}_{j}\right\}}_{j=0}^{\infty}$ and ${\left\{{e}^{j}\left({\theta}_{j}\right)\right\}}_{j=0}^{\infty}$ is a set of functions, with $\text{Span}\left({\left\{{e}^{j}\left({\theta}_{j}\right)\right\}}_{j=0}^{\infty}\right)$ ^{2} dense in an appropriate function space $\Phi $ ^{3}. To further proceed, we seek a truncation of the function representation formula as follows

${\mathbb{E}}_{t}\left[\psi \left({X}_{T}\right)\right]\cong \mathrm{arg}\underset{{\left\{{a}_{j}\mathrm{,}{\theta}_{j}\right\}}_{j=0}^{J}\in {\mathcal{A}}_{J}}{\mathrm{min}}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right)-{\displaystyle \underset{j=0}{\overset{J}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{a}_{j}{e}^{j}\left(t\mathrm{,}{X}_{t}\mathrm{|}{\theta}_{j}\right)\right)}^{2}\right]$ (10)

for J sufficiently large, where ${\mathcal{A}}_{J}$ is a compact set in the Euclidean space where ${\left\{{a}_{j}\mathrm{,}{\theta}_{j}\right\}}_{j=0}^{J}$ take values. The last step is to use Monte Carlo simulation to approximate the unconditional expectation appearing in Equations (9) and (10), thereby turning the conditional expectation computation problem into a least-squares function regression problem, similar to [1]. An obvious choice of ${\left\{{e}^{j}\left({\theta}_{j}\right)\right\}}_{j=0}^{\infty}$ is a polynomial basis, for example, the set of Fourier-Hermite basis functions. For expansion using Fourier-Hermite basis functions in high dimensions, see [60].

In fact, Artificial Neural Networks (ANNs) prove to be an efficient and convergent function approximation tool that we can utilize in the above expressions. Write

${\mathbb{E}}_{t}\left[\psi \left({X}_{T}\right)\right]\cong \mathrm{arg}\underset{{\left\{{a}_{j}\mathrm{,}{\theta}_{j}\right\}}_{j=0}^{J}\in {\mathcal{A}}_{J}}{\mathrm{min}}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right)-{\text{ANN}}_{J}\left({\left\{{a}_{j}\mathrm{,}{\theta}_{j}\right\}}_{j=0}^{J}\mathrm{|}t\mathrm{,}{X}_{t}\right)\right)}^{2}\right]$ (11)

where ${\text{ANN}}_{J}$ denotes an ANN with parameters ${\left\{{a}_{j},{\theta}_{j}\right\}}_{j=0}^{J}$.
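A minimal sketch of the sample version of Equation (11) is given below, in which all network parameters are optimized jointly. The architecture (one hidden tanh layer), the optimizer (plain full-batch gradient descent), the learning rate and the test problem are all assumptions chosen to keep the example self-contained; the paper does not prescribe them. The function name `train_ann` is hypothetical.

```python
import numpy as np

def train_ann(x_t, y, hidden=16, lr=0.01, n_iter=2000, seed=0):
    """Fit y ~ ANN(x_t) by minimizing the sample mean-squared error.

    Assumptions (illustrative only): one hidden tanh layer and
    full-batch gradient descent with a fixed learning rate.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x_t, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    w1 = rng.standard_normal((1, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.standard_normal((hidden, 1)) / np.sqrt(hidden)
    b2 = np.zeros(1)
    losses = []
    for _ in range(n_iter):
        h = np.tanh(x @ w1 + b1)            # hidden activations
        err = h @ w2 + b2 - y               # prediction error
        losses.append(float(np.mean(err**2)))
        g_out = 2.0 * err / len(x)          # d(loss)/d(prediction)
        g_hidden = (g_out @ w2.T) * (1.0 - h**2)
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (x.T @ g_hidden)
        b1 -= lr * g_hidden.sum(axis=0)

    def predict(z):
        z = np.asarray(z, dtype=float).reshape(-1, 1)
        return (np.tanh(z @ w1 + b1) @ w2 + b2).ravel()

    return predict, losses

# Regress psi(X_T) = max(X_T, 0) on X_t for a Brownian motion (t = 1, T = 2).
rng = np.random.default_rng(1)
x_t = rng.standard_normal(5_000)
y = np.maximum(x_t + rng.standard_normal(5_000), 0.0)
predict, losses = train_ann(x_t, y)
print(losses[0], losses[-1])
```

The training targets are the simulated payoffs $\psi \left({X}_{T}^{i}\right)$, not conditional expectations; the fitted network approximates ${\mathbb{E}}_{t}\left[\psi \left({X}_{T}\right)\right]$ only because the mean-squared-error minimizer over functions of ${X}_{t}$ is the conditional expectation.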

Note that, via proper time discretization and fixed point iteration, solving a BSDE with jumps can be decomposed into a series of evaluations of conditional expectations. The machine learning based method outlined above can be applied there. We will write down the algorithm to solve a general Coupled Forward-Backward Stochastic Differential Equation with Jumps (CFBSDEJs) in the appendix. Extensions to other types of BSDEJs are possible.

Here we assume that X is a Markov process. To handle path dependency or non-Markov processes, we can apply the backward induction method outlined in [1]. With the machine learning approach, it is easy to see that this method enables us to get the values of conditional expectations on a future simulation grid.

Method 2

Another method to utilize the idea of [1] is inspired by the boosting random tree method (BRT); see, for example, [61]. Partition the domain space ${\mathbb{R}}^{r}={\displaystyle {\cup}_{k=1}^{K}}{U}_{t}^{k}$ ^{4}, where ${\left\{{U}_{t}^{k}\right\}}_{k=1}^{K}$ is a set of disjoint sets in ${\mathbb{R}}^{r}$ and consider

$\begin{array}{c}{\mathbb{E}}_{t}\left[\psi \left({X}_{T}\right)\right]=\mathrm{arg}\underset{\varphi \in \Phi}{\mathrm{min}}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right)-\varphi \left(t\mathrm{,}{X}_{t}\right)\right)}^{2}\right]\\ \cong \mathrm{arg}\underset{{\displaystyle {\sum}_{k=1}^{K}{\varphi}_{k}\left(t\mathrm{,}x\right){1}_{x\in {U}_{t}^{k}}}\in \Phi}{\mathrm{min}}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right)-{\varphi}_{k}\left(t\mathrm{,}{X}_{t}\right)\right)}^{2}{1}_{{X}_{t}\in {U}_{t}^{k}}\right]\mathrm{.}\end{array}$ (12)

The choice of ${\left\{{U}_{t}^{k}\right\}}_{k=1}^{K}$ is important, and we can use machine learning classification techniques (or any classification rule), such as the kmeans function in the R programming language, in the Monte Carlo simulation and related computations. Denote ${d}_{U}={\mathrm{sup}}_{x,y\in U}\left|x-y\right|$. It is possible to show that as long as ${\mathrm{lim}}_{K\to \infty}{\mathrm{max}}_{1\le k\le K}{d}_{{U}_{t}^{k}}=0$, we only need a finite number of functions, for example, ${\left\{{e}^{j}\left({\theta}_{j}\right)\right\}}_{j=0}^{J}$, to approximate each element of ${\left\{{\varphi}_{k}\right\}}_{k=1}^{K}$ and obtain convergence. In practice, although the domain of ${X}_{t}$ is ${\mathbb{R}}^{r}$, its distribution might be concentrated on a small subset ${\mathbb{C}}_{t}$, which facilitates the partition process. Note also that this method might require us to mollify the function $\psi $ if it is not smooth. We adopt a finite-order Taylor expansion as the function expansion representation approach. The following theorems provide convergence analysis for this method.

Theorem 23. For an appropriate function space $\Phi $, we have

$\begin{array}{c}{\mathbb{E}}_{t}\left[\psi \left({X}_{T}\right)\right]=\mathrm{arg}\underset{\varphi \in \Phi}{\mathrm{min}}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right)-\varphi \left(t\mathrm{,}{X}_{t}\right)\right)}^{2}\right]\\ =\mathrm{arg}\underset{\varphi \in \Phi}{\mathrm{min}}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right)-\varphi \left(t\mathrm{,}{X}_{t}\right)\right)}^{2}{\displaystyle \underset{k\mathrm{=1}}{\overset{K}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{1}_{{X}_{t}\in {U}_{t}^{k}}\right]\\ =\mathrm{arg}\underset{\varphi \in \Phi}{\mathrm{min}}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right){\displaystyle \underset{k\mathrm{=1}}{\overset{K}{\sum}}}{1}_{{X}_{t}\in {U}_{t}^{k}}-{\displaystyle \underset{k\mathrm{=1}}{\overset{K}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\varphi \left(t\mathrm{,}{X}_{t}\right){1}_{{X}_{t}\in {U}_{t}^{k}}\right)}^{2}\right]\\ =\mathrm{arg}\underset{{\displaystyle {\sum}_{k=1}^{K}{\varphi}_{k}\left(t\mathrm{,}x\right){1}_{x\in {U}_{t}^{k}}}\in \Phi}{\mathrm{min}}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right)-{\varphi}_{k}\left(t\mathrm{,}{X}_{t}\right)\right)}^{2}{1}_{{X}_{t}\in {U}_{t}^{k}}\right]\mathrm{.}\end{array}$ (13)

Theorem 24. Let ${\mathcal{H}}_{t}^{J}$ be as described previously and ${\mathcal{H}}_{t}=\left\{\varphi \left(t,{X}_{t}\right)|\varphi \in \Phi \right\}$. Then, we have

$\underset{{\mathrm{max}}_{1\le k\le K}{d}_{{U}_{t}^{k}}\to 0}{\mathrm{lim}}{\Vert {\displaystyle \underset{k=1}{\overset{K}{\sum}}}{\hat{\varphi}}_{k}\left(t\mathrm{,}{X}_{t}\right){1}_{{X}_{t}\in {U}_{t}^{k}}-\varphi \left(t\mathrm{,}{X}_{t}\right)\Vert}_{{L}^{2}\left({\mathcal{F}}_{t}\right)}=0$ (14)

with J large enough, fixed and finite, where ${\hat{\varphi}}_{k}$ is an approximation to ${\varphi}_{k}$, which satisfies

$\mathbb{E}\left[{\left({\varphi}_{k}\left(t\mathrm{,}{X}_{t}\right)-{\hat{\varphi}}_{k}\left(t\mathrm{,}{X}_{t}\right)\right)}^{2}{1}_{{X}_{t}\in {U}_{t}^{k}}\right]\le {\epsilon}_{K}$ (15)

for any $k=1,2,\cdots ,K$, $K\in \mathbb{N}$, ${\mathrm{lim}}_{K\to \infty}K{\epsilon}_{K}=0$ and ${\epsilon}_{K}$ is independent of k when K is sufficiently large.
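The partition-then-regress idea of Method 2 can be sketched in one dimension as follows. The choices below are illustrative assumptions rather than the paper's implementation: the cells ${U}_{t}^{k}$ are quantile bins of the simulated ${X}_{t}$ (standing in for a k-means classification), each ${\hat{\varphi}}_{k}$ is a first-order polynomial fitted by least squares (standing in for a finite-order Taylor expansion), and the test problem is a Brownian motion with $\psi \left(x\right)={x}^{2}$, $t=1$, $T=2$, for which $\varphi \left(t,x\right)={x}^{2}+1$. The function name `fit_partitioned_regression` is hypothetical.

```python
import numpy as np

def fit_partitioned_regression(x_t, y, n_cells=10):
    """Fit a separate local linear regression of y on x_t in each cell.

    Assumption (illustrative only): cells are quantile bins of x_t.
    """
    edges = np.quantile(x_t, np.linspace(0.0, 1.0, n_cells + 1))

    def cell_of(x):
        # Index k of the cell containing x, clipped to the outermost cells.
        return np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_cells - 1)

    cells = cell_of(x_t)
    coefs = np.zeros((n_cells, 2))
    for k in range(n_cells):
        mask = cells == k
        design = np.column_stack([np.ones(mask.sum()), x_t[mask]])
        sol, *_ = np.linalg.lstsq(design, y[mask], rcond=None)
        coefs[k] = sol

    def predict(x):
        x = np.asarray(x, dtype=float)
        k = cell_of(x)
        return coefs[k, 0] + coefs[k, 1] * x

    return predict

rng = np.random.default_rng(0)
n_paths = 100_000
x_t = rng.standard_normal(n_paths)
x_T = x_t + rng.standard_normal(n_paths)
phi_hat = fit_partitioned_regression(x_t, x_T**2)

grid = np.linspace(-1.0, 1.0, 41)
rmse = float(np.sqrt(np.mean((phi_hat(grid) - (grid**2 + 1.0))**2)))
print(rmse)  # root-mean-squared error against the closed form x^2 + 1
```

Each local regression involves only the paths falling in its cell, so the K fits are independent of one another and could be run in parallel.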

Method 3

Next, we propose an algorithm combining the ANN and the universal approximation theorem (UAT). Suppose that ${L}^{2}\left({\mathcal{F}}_{t}\right)$ is the space where we are performing the approximation. Also assume that ${\mathcal{F}}_{t}^{W\mathrm{,}N}={\mathcal{F}}_{t}^{X}$, i.e., the information filtration is equivalently generated by X. Denote an ANN with N connections by $\text{ANN}\left(x\mathrm{,}N\mathrm{,}{\theta}_{j}\mathrm{,}j\right)$, where x is the vector of state variables that the ANN depends on, ${\theta}_{j}$ is the vector of parameters and j is its label. We define the following nested regression approximation

$\psi \left({X}_{T}\right)=\text{ANN}\left({X}_{t}\mathrm{,}N\mathrm{,}{\theta}_{1}\mathrm{,1}\right)+{\epsilon}_{t\mathrm{,}T}^{1}$ (16)

${\epsilon}_{t\mathrm{,}T}^{1}=\text{ANN}\left({X}_{t}\mathrm{,}N\mathrm{,}{\theta}_{2}\mathrm{,2}\right)+{\epsilon}_{t\mathrm{,}T}^{2}$ (17)

${\epsilon}_{t\mathrm{,}T}^{2}=\text{ANN}\left({X}_{t}\mathrm{,}N\mathrm{,}{\theta}_{3}\mathrm{,3}\right)+{\epsilon}_{t\mathrm{,}T}^{3}$ (18)

$\cdots =\cdots $ (19)

${\epsilon}_{t\mathrm{,}T}^{J}=\text{ANN}\left({X}_{t}\mathrm{,}N\mathrm{,}{\theta}_{J+1}\mathrm{,}J+1\right)+{\epsilon}_{t\mathrm{,}T}^{J+1}$ (20)

$\cdots =\cdots $ (21)

where ${\left\{{\displaystyle {\sum}_{j=1}^{J+1}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{ANN}\left({X}_{t}\mathrm{,}N\mathrm{,}{\theta}_{j}\mathrm{,}j\right)\right\}}_{J=0}^{\infty}$ is the approximate sequence of ${\mathbb{E}}_{t}\left[\psi \left({X}_{T}\right)\right]$.
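The nested regression (16)-(21) can be sketched as a stage-wise fit in which each stage regresses the previous stage's residual on ${X}_{t}$ and the fitted stages are summed. As an assumption made to keep the example short and deterministic, each "ANN" below is a small single-layer tanh network whose inner parameters are drawn at random and frozen, so that each stage reduces to a linear least-squares problem; the paper's ANNs are not restricted in this way. The function names are hypothetical.

```python
import numpy as np

def random_feature_learner(x, target, n_features, rng):
    """One stage: least-squares fit of `target` on tanh(alpha_j + beta_j * x)."""
    alpha = rng.standard_normal(n_features)
    beta = rng.standard_normal(n_features)
    feats = lambda z: np.tanh(alpha + np.outer(z, beta))
    coef, *_ = np.linalg.lstsq(feats(x), target, rcond=None)
    return lambda z: feats(z) @ coef

def fit_nested_regression(x_t, y, n_stages=5, n_features=4, seed=0):
    """Stage j fits the residual left by stages 1..j-1, as in (16)-(20)."""
    rng = np.random.default_rng(seed)
    residual = np.asarray(y, dtype=float).copy()
    learners, mse_path = [], []
    for _ in range(n_stages):
        learner = random_feature_learner(x_t, residual, n_features, rng)
        learners.append(learner)
        residual = residual - learner(x_t)   # epsilon^{j} in the text
        mse_path.append(float(np.mean(residual**2)))
    predict = lambda z: sum(l(np.asarray(z, dtype=float)) for l in learners)
    return predict, mse_path

rng = np.random.default_rng(1)
x_t = rng.standard_normal(20_000)
y = np.maximum(x_t + rng.standard_normal(20_000), 0.0)
predict, mse_path = fit_nested_regression(x_t, y)
print(mse_path)  # in-sample mean-squared residual after each stage
```

Because each stage is a least-squares projection of the current residual, the in-sample mean-squared residual cannot increase from one stage to the next; it does not decrease to zero, since the irreducible part $\psi \left({X}_{T}\right)-{\mathbb{E}}_{t}\left[\psi \left({X}_{T}\right)\right]$ is not a function of ${X}_{t}$.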

In this paper, we will test and compare the performance of all of the proposed methods. A general discussion and rigorous proofs can be found in Appendix A^{5}.

3. Applications in Derivatives Pricing

3.1. European Option Pricing

Suppose that, similar to [62] and [63], the payoff of a European claim can be written as $\left(f\mathrm{,}\psi \right)$, where ${f}_{t}$ is a stream of cash flows materialized at each time instant t and ${\psi}_{T}$ is a one-time terminal payoff at time T. Then, under the no-arbitrage condition, the price of this European payoff under the risk-neutral measure can be written as

${V}_{t}^{e}\mathrm{:}={\mathbb{E}}_{t}\left[{\displaystyle {\int}_{t}^{T}}{D}_{t\mathrm{,}u}{f}_{u}\text{d}u+{D}_{t\mathrm{,}T}{\psi}_{T}\right]$ (22)

where ${D}_{t\mathrm{,}u}\mathrm{:}={\text{e}}^{-{\displaystyle {\int}_{t}^{u}}\text{\hspace{0.05em}}{r}_{v}\text{d}v}$ is the stochastic discount factor. If we assume a Markov structure ${f}_{t}=f\left(t,{X}_{t}\right)$ and ${\psi}_{T}=\psi \left({X}_{T}\right)$, then ${V}_{t}^{e}:={v}^{e}\left(t,{X}_{t}\right)$, i.e., ${V}_{t}^{e}$ is a function of time t and state vector ${X}_{t}$. This problem is a canonical application of the evaluation of conditional expectations and we can apply the methodologies outlined in Section 2 to solve it. European claims with barrier features can be incorporated and priced in a similar way. For example, the price of a knock-in European claim can be written as

${V}_{t}^{e}\mathrm{:}={\mathbb{E}}_{t}\left[{\displaystyle {\int}_{\tau}^{T}}{D}_{t\mathrm{,}u}{f}_{u}\text{d}u+{D}_{\tau \mathrm{,}T}{\psi}_{T}\right]$ (23)

where $\tau ={\mathrm{inf}}_{v\in \left[t\mathrm{,}T\right]}\left\{{X}_{v}\in \mathcal{T}\mathrm{|}{X}_{t}\notin \mathcal{T}\right\}$ is the first time at which X enters the set $\mathcal{T}\subset {\mathbb{R}}^{r}$. In our setting, the dynamics of X can be arbitrary: stochastic differential equations with jumps, Markov chains, or even non-Markov processes. Monte Carlo based methods for option pricing can be found in [64] and [65], among others.
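A minimal sketch of how a barrier feature enters a simulation-based valuation is given below. It rests on several assumptions that are not taken from the text: a one-dimensional geometric Brownian motion with a constant interest rate, no running cash flows ($f=0$), an up-barrier monitored only on the simulation grid, and discounting of the terminal payoff over the whole interval $\left[t,T\right]$. The function names and parameter values are hypothetical.

```python
import math
import random


def bs_call(s0, k, r, sigma, t_mat):
    """Black-Scholes call price, used only as a reference value."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t_mat) / (sigma * math.sqrt(t_mat))
    d2 = d1 - sigma * math.sqrt(t_mat)
    cdf = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return s0 * cdf(d1) - k * math.exp(-r * t_mat) * cdf(d2)


def barrier_call_prices(s0=1.0, k=1.0, barrier=1.2, r=0.05, sigma=0.2,
                        t_mat=0.5, n_steps=50, n_paths=20_000, seed=0):
    """Monte Carlo prices of up-and-in and up-and-out calls (grid monitoring)."""
    rng = random.Random(seed)
    h = t_mat / n_steps
    drift = (r - 0.5 * sigma ** 2) * h
    vol = sigma * math.sqrt(h)
    disc = math.exp(-r * t_mat)
    knock_in = knock_out = 0.0
    for _ in range(n_paths):
        s = s0
        hit = s0 >= barrier               # has the path entered the barrier set?
        for _ in range(n_steps):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            hit = hit or s >= barrier
        payoff = disc * max(s - k, 0.0)
        if hit:
            knock_in += payoff            # payoff received only if the barrier was hit
        else:
            knock_out += payoff
    return knock_in / n_paths, knock_out / n_paths


knock_in, knock_out = barrier_call_prices()
vanilla_bs = bs_call(1.0, 1.0, 0.05, 0.2, 0.5)
```

On every path exactly one of the two claims pays, so `knock_in + knock_out` is the plain Monte Carlo estimate of the vanilla call and should be close to `vanilla_bs` up to sampling error.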

3.2. American Option Pricing

We again use $\left(f\mathrm{,}\psi \right)$ to denote the payoff structure of an American claim, whose price is given by

${V}_{t}^{a}\mathrm{:}=\underset{\tau \in \mathcal{S}\left[t\mathrm{,}T\right]}{\mathrm{sup}}{\mathbb{E}}_{t}\left[{\displaystyle {\int}_{t}^{\tau}}{D}_{t\mathrm{,}u}{f}_{u}\text{d}u+{D}_{t\mathrm{,}\tau}{\psi}_{\tau}\right]\mathrm{.}$ (24)

Here $\mathcal{S}\left[t\mathrm{,}T\right]$ is the space of all stopping times in $\left[t\mathrm{,}T\right]$. We refer interested readers to [62] and [66] for a general derivation and explanation of Equation (24). It is also possible to derive the general BSDE that an American claim price satisfies; see, for example, [67]. Moreover, in [27] and [1], the authors utilize a backward induction approach to solve optimal stopping problems. The idea can be carried out using the methodologies documented in Section 2. American claims with barrier features can be incorporated and priced in a similar way. It is also known that American option prices can be related to reflected BSDEs (RBSDEs); a rigorous discussion of the existence and uniqueness of solutions to such equations can be found in [68] and references therein.
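The backward induction idea behind Equation (24) can be sketched on a recombining binomial lattice. This is a swapped-in textbook technique (a Cox-Ross-Rubinstein tree), not the regression or ANN based procedure of the paper, and it assumes a constant rate, no dividends, no running cash flows and a put payoff; the parameter values are hypothetical. At each node the holder compares the continuation value with the immediate exercise value.

```python
import math


def binomial_put(s0, k, r, sigma, t_mat, n_steps, american):
    """Put price on a CRR lattice by backward induction."""
    h = t_mat / n_steps
    u = math.exp(sigma * math.sqrt(h))
    d = 1.0 / u
    p = (math.exp(r * h) - d) / (u - d)      # risk-neutral up probability
    disc = math.exp(-r * h)
    # Terminal values; node j has j up moves and n_steps - j down moves.
    values = [max(k - s0 * u ** j * d ** (n_steps - j), 0.0)
              for j in range(n_steps + 1)]
    for i in range(n_steps - 1, -1, -1):
        for j in range(i + 1):
            continuation = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:
                exercise = max(k - s0 * u ** j * d ** (i - j), 0.0)
                values[j] = max(continuation, exercise)   # optimal stopping decision
            else:
                values[j] = continuation
    return values[0]


european_put = binomial_put(100.0, 100.0, 0.05, 0.2, 1.0, 200, american=False)
american_put = binomial_put(100.0, 100.0, 0.05, 0.2, 1.0, 200, american=True)
early_exercise_premium = american_put - european_put
```

In a simulation-based version, the one-step conditional expectation inside `continuation` is the quantity that would be replaced by a regression or ANN approximation.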

4. Numerical Experiments

4.1. European Option Pricing

In this section, we consider a Heston model

$\frac{\text{d}{S}_{t}}{{S}_{t}}=r\text{d}t+\sqrt{{\nu}_{t}}\text{d}{W}_{t},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{S}_{0}={s}_{0}$ (25)

$\text{d}{\nu}_{t}=\kappa \left(\theta -{\nu}_{t}\right)\text{d}t+\sigma \sqrt{{\nu}_{t}}\left(\rho \text{d}{W}_{t}+\sqrt{1-{\rho}^{2}}\text{d}{B}_{t}\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{\nu}_{0}={v}_{0}$ (26)

where $\left(W\mathrm{,}B\right)$ is a two dimensional standard Brownian motion. The parameter values are chosen as $r=0.05$, $\kappa =1.00$, $\theta =0.04$, $\sigma =0.10$, $\rho =-0.50$, ${s}_{0}=1.00$, $K=1.00$ and ${v}_{0}=0.04$. Time to maturity is set to $T=0.50$, with time discretization step $h=0.01$ and $N=\frac{T}{h}=50$. The number of simulation paths is $M=10000$. We price a plain vanilla European call option ${\left({S}_{T}-K\right)}^{+}$ as an illustration. The QQ-plots are displayed in Figures 1-9. The first three correspond to a recursive evaluation, i.e., regressing the values at $t+1$ on state variables at time t. The rest of the plots correspond to direct regression, i.e., regressing the discounted payoffs at time T on state variables at time t. Figures 10-12 are for the prices of a digital call option under the Black-Scholes setting and Figures 13-15 are QQ-plots for Delta values. Figure 16 and Figure 17 show the QQ-plots for Method 3 under the Heston model with 3 nested ANN approximations of size 4 and with one ANN approximation of size 12, respectively, using the R routine nnet. The absolute RMSE is 0.1938% for the former and 0.2581% for the latter, with a running time of 10.36 seconds for the nested approximation compared to 52.31 seconds for the ANN approximation of size 12.
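For readers who wish to reproduce the simulation setup, the following sketch generates Heston paths with the parameter values stated above and returns a time-zero Monte Carlo price of the call. The discretization details are assumptions of this sketch rather than the authors' implementation: an Euler scheme on the log-price and full truncation of the variance at zero. The function name is hypothetical.

```python
import math
import random


def heston_call_mc(s0=1.0, v0=0.04, k=1.0, r=0.05, kappa=1.0, theta=0.04,
                   sigma=0.10, rho=-0.50, t_mat=0.50, h=0.01,
                   n_paths=10_000, seed=0):
    """Time-zero Monte Carlo price of (S_T - K)^+ under Equations (25)-(26)."""
    rng = random.Random(seed)
    n_steps = round(t_mat / h)
    sqrt_h = math.sqrt(h)
    rho_bar = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(n_paths):
        log_s, v = math.log(s0), v0
        for _ in range(n_steps):
            dw = sqrt_h * rng.gauss(0.0, 1.0)
            db = sqrt_h * rng.gauss(0.0, 1.0)
            v_pos = max(v, 0.0)            # full truncation (assumption of this sketch)
            root_v = math.sqrt(v_pos)
            # Euler step on log S (assumption; the SDE is stated for dS/S).
            log_s += (r - 0.5 * v_pos) * h + root_v * dw
            v += kappa * (theta - v_pos) * h + sigma * root_v * (rho * dw + rho_bar * db)
        total += max(math.exp(log_s) - k, 0.0)
    return math.exp(-r * t_mat) * total / n_paths


price = heston_call_mc()
```

The simulated pairs of state variables at an intermediate date and discounted payoffs at maturity are the inputs on which the recursive and direct regressions described above would be run.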

Figure 1. QQ-plot for Method 1, $\tau =0.05$ and relative pricing error is 1.20%.

Figure 2. QQ-plot for Method 1, $\tau =0.25$ and relative pricing error is 1.50%.

Figure 3. QQ-plot for Method 1, $\tau =0.45$ and relative pricing error is 1.20%.

Figure 4. QQ-plot for Method 1, $\tau =0.05$ and relative pricing error is 1.66%.

Figure 5. QQ-plot for Method 1, $\tau =0.20$ and relative pricing error is 1.75%.

Figure 6. QQ-plot for Method 1, $\tau =0.30$ and relative pricing error is 3.00%.

Figure 7. QQ-plot for Method 2, $\tau =0.05$ and relative pricing error is 1.80%.

Figure 8. QQ-plot for Method 2, $\tau =0.20$ and relative pricing error is 3.50%.

Figure 9. QQ-plot for Method 2, $\tau =0.30$ and relative pricing error is 3.53%.

Figure 10. QQ-plot for Method 1, $\tau =0.02$ and relative pricing error is 0.40%.

Figure 11. QQ-plot for Method 1, $\tau =0.05$ and relative pricing error is 0.80%.

Figure 12. QQ-plot for Method 1, $\tau =0.08$ and relative pricing error is 0.60%.

Figure 13. Delta QQ-plot for Method 1, $\tau =0.02$.

Figure 14. Delta QQ-plot for Method 1, $\tau =0.05$.

Figure 15. Delta QQ-plot for Method 1, $\tau =0.08$.

Figure 16. Price QQ-plot for Method 3, $\tau =0.20$.

Figure 17. Price QQ-plot for Method 3, $\tau =0.20$.

4.2. American Option Pricing

Here we refer the readers to [67] for the BSDE satisfied by a plain vanilla American option. For $r=0.03$, $d=0.07$, $\sigma =0.20$, $T=3.00$, $N=150$, ${S}_{0}=100$ and $K=100$, the benchmark American option price at ${t}_{0}=0$ is 9.0660 and the relative difference of our Monte-Carlo price is 0.27%. The running time is less than 30 seconds.

5. Conclusion and Future Research

In this paper, we show how machine learning techniques, specifically ANN function approximation methods, can be applied to derivatives pricing. We relate pricing problems to the evaluation of conditional expectations via BSDEJs and PIDEs. Potential topics for future research include the development of reinforcement learning methodologies to solve dynamic programming problems and their application in the empirical asset pricing literature. Moreover, the evaluation of energy derivatives calls for SDEJs defined in a Hilbert space. The same theoretical constructions can also be found in the evaluation of fixed income derivatives, such as the random field models proposed and studied in [69]. One can, of course, apply a Karhunen-Loève expansion as a dimension reduction, which reduces the problem to the evaluation of conditional expectations of regular SDEJs. However, it is important to develop machine learning based methods that directly evaluate conditional expectations of stochastic processes defined in a Hilbert space. In addition, stochastic differential games, which arise in the context of American game options and equity swaps, and the related McKean-Vlasov type FBSDEJs (mean-field FBSDEJs, see [70]) are important topics in mathematical finance. They are also related to the theoretical analysis of high-frequency trading. Finding machine learning based numerical methods to solve these equations is of great interest to us. Last, but not least, machine learning methods in asset pricing and portfolio optimization, which can be found in [71], [72], [73], [28], [74] and [75], offer an elegant way to price financial derivatives under the $\mathbb{P}$-measure. For example, we can use the method in [72] to calibrate the SDF process and use [75] to generate market scenarios. These methodologies, combined with the methods documented in this paper and [1], have the potential to solve for any derivative price. We leave these developments to future research.

Acknowledgements

We thank the Editor and the referee for their comments. Moreover, we are grateful to Professor Jérôme Detemple, Professor Marcel Rindisbacher and Professor Weidong Tian for their useful suggestions.

Appendix

A. Convergence of the Proposed Methodologies

Proof of Theorem 10. It is known from the projection theorem of Hilbert space that ${\left\{{h}_{n}\right\}}_{n=1}^{\infty}$ and h actually exist and are unique. Moreover, ${\text{PROJ}}_{{\mathcal{H}}_{n}}h={h}_{n}$ as indicated by the repeated projection theorem. It is also known that $h-{h}_{n}\in {\text{ORTH}}_{{\mathcal{H}}_{n}}\mathcal{H}$. As we ask that Assumption 6 hold, we know that ${\Vert h-{h}_{n}\Vert}_{\mathcal{X}}\to 0+$ as $n\to \infty $.

Proof of Theorem 12. The proof follows from Assumption 6 and Theorem 8. We have

$\underset{n\to \infty}{\mathrm{lim}}{\text{PROJ}}_{\Psi \cap {\mathcal{H}}_{n}}x$ (27)

$=\underset{n\to \infty}{\mathrm{lim}}{\text{PROJ}}_{\Psi \cap {\mathcal{H}}_{n}}{\text{PROJ}}_{{\mathcal{H}}_{n}}x$ (28)

$=\underset{n\to \infty}{\mathrm{lim}}{\text{PROJ}}_{\Psi \cap {\mathcal{H}}_{n}}{h}_{n}$ (29)

$={\text{PROJ}}_{\Psi \cap \mathcal{H}}h$ (30)

$=h\mathrm{.}$ (31)

This concludes the proof.

Proof of Lemma 1. For any ${\lambda}_{t}\in {L}^{2}\left({\mathcal{F}}_{t}\right)$, we have

$\mathbb{E}\left[{\left({\xi}_{T}-{\lambda}_{t}\right)}^{2}\right]$ (32)

$=\mathbb{E}\left[{\left({\xi}_{T}-{\mathbb{E}}_{t}\left[{\xi}_{T}\right]\right)}^{2}\right]+\mathbb{E}\left[{\left({\lambda}_{t}-{\mathbb{E}}_{t}\left[{\xi}_{T}\right]\right)}^{2}\right]$ (33)

$\text{\hspace{0.17em}}\text{\hspace{0.17em}}+\underbrace{2\mathbb{E}\left[\left({\lambda}_{t}-{\mathbb{E}}_{t}\left[{\xi}_{T}\right]\right)\left({\xi}_{T}-{\mathbb{E}}_{t}\left[{\xi}_{T}\right]\right)\right]}_{=0}$ (34)

$=\mathbb{E}\left[{\left({\xi}_{T}-{\mathbb{E}}_{t}\left[{\xi}_{T}\right]\right)}^{2}\right]+\mathbb{E}\left[{\left({\lambda}_{t}-{\mathbb{E}}_{t}\left[{\xi}_{T}\right]\right)}^{2}\right]$ (35)

$\ge \mathbb{E}\left[{\left({\xi}_{T}-{\mathbb{E}}_{t}\left[{\xi}_{T}\right]\right)}^{2}\right]\mathrm{.}$ (36)

Therefore we have the claim announced.
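The minimizing property in Lemma 1 can be checked numerically on a toy example. The model below is an assumption chosen so that the conditional expectation is known in closed form: ${\xi}_{T}={X}_{t}^{2}+\text{noise}$, so that ${\mathbb{E}}_{t}\left[{\xi}_{T}\right]={X}_{t}^{2}$. The competing candidates are arbitrary ${\mathcal{F}}_{t}$-measurable functions picked for illustration.

```python
import random

rng = random.Random(0)
n = 50_000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]           # F_t-measurable state X_t
xi = [xv * xv + rng.gauss(0.0, 1.0) for xv in x]      # xi_T with E_t[xi_T] = X_t^2


def mse(candidate):
    """Sample analogue of E[(xi_T - lambda_t)^2] for lambda_t = candidate(X_t)."""
    return sum((y - candidate(xv)) ** 2 for xv, y in zip(x, xi)) / n


mse_cond = mse(lambda s: s * s)            # the conditional expectation
mse_linear = mse(lambda s: 1.0 + 0.5 * s)  # an arbitrary linear candidate
mse_const = mse(lambda s: 1.0)             # the unconditional mean
```

In this toy model the population values are 1, 3.25 and 3 respectively, so the conditional expectation attains the smallest mean squared error, consistent with inequality (36).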

Proof of Theorem 17. The proof of this theorem follows from Assumptions 1, 2, 6, 14, 15 and Theorem 10, by choosing $\overline{\left\{{\mathbb{E}}_{t}\left[{\xi}_{T}\right]\mathrm{|}{\xi}_{T}\in {L}^{2}\left({\mathcal{F}}_{T}\right)\mathrm{,}{\mathbb{E}}_{t}\left[{\xi}_{T}\right]\in {L}^{2}\left({\mathcal{F}}_{t}\right)\right\}}\subset {\mathcal{H}}_{t}\subset {L}^{2}\left({\mathcal{F}}_{t}\right)\subset {L}^{2}\left({\mathcal{F}}_{T}\right)={\mathcal{X}}_{T}$.

Proof of Theorem 18. Essentially, Equation (7) is the result of Gauss-Markov Theorem and the consistency property of OLS estimator.

Proof of Proposition 19. This is a direct consequence of the discussion in ([57], Section 3) (see Equation (5)) and Theorem 10. To elaborate, consider ${\mathcal{X}}_{T}={L}^{2}\left({\mathcal{F}}_{T}\right)$, $x=\psi \left({X}_{T}\right)$, and its projections h and ${h}_{n}$ on ${\mathcal{H}}_{t}=\overline{{\cup}_{n=1}^{\infty}{\mathcal{H}}_{t}^{n}}\subset {L}^{2}\left({\mathcal{F}}_{t}\right)$ and ${\mathcal{H}}_{t}^{n}$, as defined in this proposition. Suppose that $h={\sum}_{j=1}^{\infty}{\lambda}_{j}{e}_{t}^{j}$ and ${h}_{n}={\sum}_{j=1}^{{m}_{n}}{\mu}_{j}^{n}{e}_{t}^{j}$, where ${m}_{n}<{m}_{n+1}$ and ${\left\{{e}_{t}^{j}\right\}}_{j=1}^{\infty}$ is an orthonormal basis of ${\mathcal{H}}_{t}$. From the repeated projection theorem, we know that ${\mu}_{j}^{n+1}={\mu}_{j}^{n}={\lambda}_{j}$ for any $1\le j\le {m}_{n}$ ^{6} and $n\in \mathbb{N}$. From the ${L}^{2}$ property of h, we know that ${\sum}_{j=1}^{\infty}{\lambda}_{j}^{2}<\infty $. Therefore, ${\Vert h-{h}_{n}\Vert}_{{L}^{2}\left({\mathcal{F}}_{T}\right)}^{2}={\sum}_{j={m}_{n}+1}^{\infty}{\lambda}_{j}^{2}\to 0$ as $n\to \infty $.

Proof of Proposition 20. This is a direct consequence of the discussion in ( [58], Theorem 2.2), localization arguments, Theorem 10 and the proof of Proposition 19.

Proof of Theorem 23. The first, second and third equalities are obvious given an appropriate choice of $\Phi $ depending on the Markov property of X and its moment conditions in Assumption 2. Actually, because of the existence and uniqueness of $\varphi \in \Phi $ such that the RHS of the first equality achieves its minimum, we know that

$\underset{\varphi \in \Phi}{\mathrm{min}}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right){\displaystyle \underset{k=1}{\overset{K}{\sum}}}{1}_{{X}_{t}\in {U}_{t}^{k}}-{\displaystyle \underset{k=1}{\overset{K}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\varphi \left(t\mathrm{,}{X}_{t}\right){1}_{{X}_{t}\in {U}_{t}^{k}}\right)}^{2}\right]$ (37)

$\le \underset{{\displaystyle {\sum}_{k=1}^{K}{\varphi}_{k}\left(t\mathrm{,}x\right){1}_{x\in {U}_{t}^{k}}}\in \Phi}{\mathrm{min}}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right)-{\varphi}_{k}\left(t\mathrm{,}{X}_{t}\right)\right)}^{2}{1}_{{X}_{t}\in {U}_{t}^{k}}\right]\mathrm{.}$ (38)

From another perspective, we know that ${\mathrm{min}}_{{\displaystyle {\sum}_{k=1}^{K}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\varphi}_{k}\left(t\mathrm{,}x\right){1}_{x\in {U}_{t}^{k}}\in \Phi}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right)-{\varphi}_{k}\left(t\mathrm{,}{X}_{t}\right)\right)}^{2}{1}_{{X}_{t}\in {U}_{t}^{k}}\right]$ is a piecewise minimization. Therefore

$\underset{\varphi \in \Phi}{\mathrm{min}}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right){\displaystyle \underset{k=1}{\overset{K}{\sum}}}{1}_{{X}_{t}\in {U}_{t}^{k}}-{\displaystyle \underset{k=1}{\overset{K}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\varphi \left(t\mathrm{,}{X}_{t}\right){1}_{{X}_{t}\in {U}_{t}^{k}}\right)}^{2}\right]$ (39)

$\ge \underset{{\displaystyle {\sum}_{k=1}^{K}{\varphi}_{k}\left(t\mathrm{,}x\right){1}_{x\in {U}_{t}^{k}}}\in \Phi}{\mathrm{min}}\mathbb{E}\left[{\left(\psi \left({X}_{T}\right)-{\varphi}_{k}\left(t\mathrm{,}{X}_{t}\right)\right)}^{2}{1}_{{X}_{t}\in {U}_{t}^{k}}\right]\mathrm{.}$ (40)

Combining the two inequalities, the last equality in Equation (13) holds.

Proof of Theorem 24. The proof of this theorem is a direct consequence of Equations (13), (15) and triangle inequality.

B. Other Applications

In this section, we document other applications of our methodologies in finance.

B.1. Joint Valuation and Calibration

Suppose that there are N derivatives contracts whose prices at time ${t}_{0}$ can be expressed as ${\left\{{V}_{{t}_{0}}^{n}\right\}}_{n=1}^{N}$. Their payoffs are ${\left\{{\phi}_{n}\left({X}_{\cdot}\right)\right\}}_{n=1}^{N}$, where X is an r-dimensional vector of state variables. Sometimes we write ${X}^{\theta}$ to explicitly state dependence of X on its vector of parameters $\theta $. Here suppose ${X}^{\theta}$ satisfies a system of stochastic differential equations with jumps

$\text{d}{X}_{t}^{\theta}=\mu \left(t\mathrm{,}{X}_{t}^{\theta}\mathrm{|}\theta \right)\text{d}t+\sigma \left(t\mathrm{,}{X}_{t}^{\theta}\mathrm{|}\theta \right)\text{d}{W}_{t}+{\displaystyle {\int}_{E}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\gamma \left(t\mathrm{,}{X}_{t}^{\theta}\mathrm{,}e\mathrm{|}\theta \right)\tilde{N}\left(\text{d}t\mathrm{,}\text{d}e\right)\mathrm{.}$ (41)

The main idea is that ${\left\{{V}_{{t}_{0}}^{n}\right\}}_{n=1}^{N}$ might contain derivatives contracts from different asset classes or hybrid ones. Therefore, we need to model X as a joint high dimensional cross-asset system. One potential problem is that $\theta $ is in general a high-dimensional vector, which is hard to estimate using the usual optimization routines in R or MATLAB. However, we can apply the ADAM method, studied in [76], for the parameter estimation. It is a stochastic iteration method based on the gradient of the MSE function. The key to evaluating the gradient of the MSE function is to evaluate the dynamics of ${\partial}_{\theta}{X}_{t}^{\theta}$, which satisfies the following system of SDEJs

$\begin{array}{c}\text{d}{\partial}_{\theta}{X}_{t}^{\theta}={\partial}_{\theta}\mu \left(t\mathrm{,}{X}_{t}^{\theta}\mathrm{|}\theta \right)\text{d}t+{\partial}_{x}\mu \left(t\mathrm{,}{X}_{t}^{\theta}\mathrm{|}\theta \right){\partial}_{\theta}{X}_{t}^{\theta}\text{d}t\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+{\partial}_{\theta}\sigma \left(t\mathrm{,}{X}_{t}^{\theta}\mathrm{|}\theta \right)\text{d}{W}_{t}+{\partial}_{x}\sigma \left(t\mathrm{,}{X}_{t}^{\theta}\mathrm{|}\theta \right){\partial}_{\theta}{X}_{t}^{\theta}\text{d}{W}_{t}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+{\displaystyle {\int}_{E}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\partial}_{\theta}\gamma \left(t\mathrm{,}{X}_{t}^{\theta}\mathrm{,}e\mathrm{|}\theta \right)\tilde{N}\left(\text{d}t\mathrm{,}\text{d}e\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+{\displaystyle {\int}_{E}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\partial}_{x}\gamma \left(t\mathrm{,}{X}_{t}^{\theta}\mathrm{,}e\mathrm{|}\theta \right){\partial}_{\theta}{X}_{t}^{\theta}\tilde{N}\left(\text{d}t\mathrm{,}\text{d}e\right)\mathrm{.}\end{array}$ (42)

The existence and uniqueness of the solution to the SDEJ system (42) can be obtained with necessary regularity conditions on the coefficients.
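As a small illustration of the sensitivity system (42), the sketch below simulates a state variable together with its parameter derivative on the same Brownian increments. The model is an assumption chosen for simplicity: a scalar diffusion without jumps, $\text{d}X=\mu X\text{d}t+\theta X\text{d}W$, where the parameter $\theta $ is the volatility, so that (42) reduces to $\text{d}Y=\mu Y\text{d}t+X\text{d}W+\theta Y\text{d}W$ with $Y={\partial}_{\theta}X$. The function name and parameter values are hypothetical.

```python
import math
import random


def euler_with_sensitivity(theta, mu=0.05, x0=1.0, t_mat=1.0, n_steps=100, seed=0):
    """Euler scheme for X and for Y = dX/dtheta along one Brownian path."""
    rng = random.Random(seed)
    h = t_mat / n_steps
    x, y = x0, 0.0
    for _ in range(n_steps):
        dw = math.sqrt(h) * rng.gauss(0.0, 1.0)
        # Update Y first so that it uses X at the left end point of the step.
        y = y + mu * y * h + x * dw + theta * y * dw
        x = x + mu * x * h + theta * x * dw
    return x, y


theta0, bump = 0.2, 1e-4
x_T, pathwise = euler_with_sensitivity(theta0)
x_up, _ = euler_with_sensitivity(theta0 + bump)     # same seed, so same noise
x_dn, _ = euler_with_sensitivity(theta0 - bump)
finite_diff = (x_up - x_dn) / (2.0 * bump)
```

The pathwise derivative agrees with the central finite difference computed on common random numbers. Averaging such pathwise derivatives across paths gives the kind of gradient estimate that a stochastic gradient method such as ADAM would consume.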

B.2. Option Surface Fitting

There is a strand of literature that strives to fit option panels using different dynamics for the underlying assets, for example, [77] on stochastic volatility models, [78] on local volatility models and [79] on local-stochastic volatility models. Models that incorporate jumps can be found in [80], [81] and references therein.

Consider the following stochastic differential equation

$\begin{array}{l}\frac{\text{d}{S}_{t}}{{S}_{t}}=r\left(t\mathrm{,}{X}_{t}\right)\text{d}t+\sigma \left(t\mathrm{,}{S}_{t}\mathrm{,}{X}_{t}\right)\text{d}{W}_{t},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{S}_{0}={s}_{0}\\ \text{d}{X}_{t}=\alpha \left(t\mathrm{,}{X}_{t}\right)\text{d}t+\beta \left(t\mathrm{,}{X}_{t}\right)\text{d}{W}_{t},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{X}_{0}={x}_{0}\mathrm{.}\end{array}$ (43)

Here we model $\sigma $ by a DNN. The advantage of doing so is that it might fully capture the market volatility surface while at the same time ensuring a good dynamic fit and preserving the existence and uniqueness result for the related stochastic differential equation system (43).

B.3. Credit Risk Management: Evaluation on a Future Simulation Grid

We refer to [6] for the problem definition. It is easy to show that the problem is equivalent to the evaluation of conditional expectations on a future simulation grid, and our methods are suitable for this type of problem. Note that some XVA quantities, such as KVA, require the evaluation of CVA on a future simulation grid. Our methodologies, such as the ones proposed in Section 2 and Appendix B.7, can be applied to the evaluation of KVA, once we obtain future present values of financial claims.

B.4. Dynamic Hedging

There are references that utilize machine learning (mainly Reinforcement Learning, or RL) to solve dynamic hedging problems, e.g., [82], [83] and [84]. However, here in this paper we will not follow this route. Instead, we use the BSDE formulation of the problem in [2] and try to solve the BSDE that characterizes the hedging problem. The methodology is outlined in Appendix B.11.

B.5. Dynamic Portfolio-Consumption Choice

We use [85] as an example and try to solve the related coupled FBSDE with jumps. The methodology is outlined in Appendix B.11. Other examples of dynamic portfolio optimization can be found in [53], [86], [87], [88], [89], [90], [91], [92], [93] and [94]. Essentially, dynamic portfolio-consumption choice problems are stochastic programming in nature and can be related to HJB equations or BSDEs. An example of using the HJB representation of the problem can be found in [95]. The equations can be solved using the methodologies outlined in Section 2 and Appendix B.11.

B.6. Transition Density Approximation

We can generalize the theory in [96] and [97] to approximate the transition density of a multivariate time-inhomogeneous stochastic differential equation with jumps. According to [96] and [97], the transition density of a multivariate time-inhomogeneous stochastic differential equation with or without jumps can be approximated by polynomials in a weighted Hilbert space. See ([97], Equation (2.1)), for example. The key is to evaluate the coefficients ${\left\{{c}_{\alpha}\right\}}_{\alpha}$, which is, again, the evaluation of conditional expectations. The resulting transition density can be used in option pricing, MLE estimation for MSDEJs, and prediction, filtering and smoothing problems for hidden Markov models; see [98].

B.7. Evaluating Conditional Expectations via a Measure Change

Consider the following equation

${\mathbb{E}}_{t}\left[\psi \left({X}_{\tau}\right)\right]={\displaystyle {\int}_{{\mathbb{R}}^{r}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\Gamma \left(t\mathrm{,}x\mathrm{;}\tau \mathrm{,}y\right)\psi \left(y\right)\text{d}y$ (44)

$={\displaystyle {\int}_{{\mathbb{R}}^{r}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\Gamma}_{0}\left({t}_{0}\mathrm{,}x\mathrm{;}\tau \mathrm{,}y\right)\frac{\Gamma \left(t\mathrm{,}x\mathrm{;}\tau \mathrm{,}y\right)}{{\Gamma}_{0}\left({t}_{0}\mathrm{,}x\mathrm{;}\tau \mathrm{,}y\right)}\psi \left(y\right)\text{d}y$ (45)

where ${\Gamma}_{0}$ is the transition density of a stochastic differential equation with jumps, which can be simulated for arbitrary $\left(t\mathrm{,}\tau \right)$ without using time discretization^{7}, and $\Gamma $ is the transition density function of X. $\Gamma $ can be approximated by the method outlined in Appendix B.6. It is immediately obvious that we can generate random numbers from ${\Gamma}_{0}$ and reuse them for the evaluation of the conditional expectation on the left hand side of Equation (44) for different $\left(t\mathrm{,}\tau \right)$.
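The reuse of random numbers described above amounts to reweighting draws from ${\Gamma}_{0}$ by the density ratio in Equation (45). The sketch below uses Gaussian densities for both ${\Gamma}_{0}$ and $\Gamma $, which is purely an assumption made so that exact values are available for comparison; in the setting of the text, $\Gamma $ would instead come from the approximation in Appendix B.6. All names and numbers are hypothetical.

```python
import math
import random


def normal_pdf(y, mean, std):
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


rng = random.Random(0)
# One set of draws from the reference density Gamma_0 = N(0, 1), generated once.
draws = [rng.gauss(0.0, 1.0) for _ in range(50_000)]


def reweighted_expectation(psi, mean, std):
    """Estimate the integral of psi(y) * Gamma(y) dy using draws from Gamma_0."""
    total = 0.0
    for y in draws:
        weight = normal_pdf(y, mean, std) / normal_pdf(y, 0.0, 1.0)  # Gamma / Gamma_0
        total += weight * psi(y)
    return total / len(draws)


# The same draws serve two different target densities (two different (t, tau) pairs).
est_a = reweighted_expectation(lambda y: y * y, 0.3, 0.8)    # exact value 0.73
est_b = reweighted_expectation(lambda y: y * y, -0.2, 0.9)   # exact value 0.85
```

The estimator behaves well here because the target densities have lighter tails than the reference density, which keeps the weights bounded; this is a property of the chosen example and not a general guarantee.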

B.8. Empirical Asset Pricing with Factor Models: Evaluating Expected Returns

In this section, we propose to use machine learning, mainly, ANN techniques, to construct factor models and evaluate the conditional expected asset returns and risk-premium cross-sectionally. Related references are [28] and [74], among others. [3] provide a good example with basis function expansion to capture the non-linearity in asset returns. Specifically, consider the following lead-lag regression

${R}_{t+1}=f\left(t\mathrm{,}{X}_{t}\right)+{\epsilon}_{t\mathrm{,}t+1}\mathrm{.}$ (46)

Here ${\mathbb{E}}_{t}\left[{\epsilon}_{t\mathrm{,}t+1}\right]=0$ and X is a set of risk factors. Then, ${\mathbb{E}}_{t}\left[{R}_{t+1}\right]=f\left(t,{X}_{t}\right)$. Linear factor models assume that $f\left(t,x\right)={a}_{t}+{b}_{t}x$. Alternatively, f can be approximated by a basis function expansion, justified by the universal approximation theorem, or by ANNs. The fitted conditional expected asset returns can then be fed into a mean-variance optimizer, e.g., [99], to construct long-short portfolios or other trading strategies.
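The lead-lag structure of Equation (46) can be made concrete with the linear special case $f\left(t,x\right)=a+bx$ and a single factor. The data below are simulated under assumed coefficients, and constant coefficients over time are a further simplifying assumption; the point of the sketch is the alignment of the factor at time t with the return at time t+1.

```python
import random

rng = random.Random(0)
n = 5000
a_true, b_true = 0.01, 0.5                     # assumed coefficients of the toy model

factor = [rng.gauss(0.0, 1.0) for _ in range(n)]              # X_0, ..., X_{n-1}
# returns[t + 1] depends on factor[t]; returns[0] has no predictor and is unused.
returns = [0.0] + [a_true + b_true * factor[t] + rng.gauss(0.0, 0.2)
                   for t in range(n - 1)]

# Lead-lag pairs (X_t, R_{t+1}).
pairs = list(zip(factor[:-1], returns[1:]))
m = len(pairs)
mean_x = sum(xv for xv, _ in pairs) / m
mean_r = sum(rv for _, rv in pairs) / m
b_hat = (sum((xv - mean_x) * (rv - mean_r) for xv, rv in pairs)
         / sum((xv - mean_x) ** 2 for xv, _ in pairs))
a_hat = mean_r - b_hat * mean_x


def expected_return(x_t):
    """Fitted conditional expected return E_t[R_{t+1}] given X_t = x_t."""
    return a_hat + b_hat * x_t


forecast = expected_return(factor[-1])         # forecast for the next period
```

Replacing the two-parameter linear fit by a basis expansion or an ANN changes only the estimation step; the pairing of predictors and next-period returns stays the same.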

B.9. Recovery and Representation Theorem

In [100], the authors propose a model-free recovery theorem, based on a series expansion of higher order conditional moments of asset returns. Their work inspires us to exploit the ANN-factor models to represent the higher order conditional moments of asset returns and thereby validate the recovery theorem proposed therein. Moreover, similar to [57], our machine learning approximation to the conditional expectations of financial payoffs amounts to a compound option representation of arbitrary ${L}^{2}$-claims in the financial economic system. Also, the second numerical method means that any financial claim can be locally approximated by a linear combination of power derivatives, following the same idea.

B.10. Theoretical Asset Pricing via Dynamic Stochastic General Equilibrium

Note that, the equation systems proposed in [101], [102] and [103] can be transformed into BSDEs and we can use time discretization and apply the techniques proposed in Section 2 and Appendix B.11 to solve them. In this paper, however, we will not test our methods on this strand of literature.

B.11. Solving High-Dimensional CFBSDEJs

A coupled forward-backward stochastic differential equation with jumps (CFBSDEJ) can be written as

$\begin{array}{l}\text{d}{X}_{t}=\mu \left(t\mathrm{,}{X}_{t}\mathrm{,}{Y}_{t}\mathrm{,}{Z}_{t}\mathrm{,}{V}_{t}\right)\text{d}t+\sigma \left(t\mathrm{,}{X}_{t}\mathrm{,}{Y}_{t}\mathrm{,}{Z}_{t}\mathrm{,}{V}_{t}\right)\text{d}{W}_{t}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}+{\displaystyle {\int}_{E}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\gamma \left(t\mathrm{,}{X}_{t}\mathrm{,}{Y}_{t}\mathrm{,}{Z}_{t}\mathrm{,}{V}_{t}\mathrm{,}e\right)\tilde{N}\left(\text{d}t\mathrm{,}\text{d}e\right)\\ {X}_{0}={x}_{0}\\ \text{d}{Y}_{t}=f\left(t\mathrm{,}{X}_{t}\mathrm{,}{Y}_{t}\mathrm{,}{Z}_{t}\mathrm{,}{V}_{t}\right)\text{d}t+{Z}_{t}\text{d}{W}_{t}+{\displaystyle {\int}_{E}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{U}_{t}\left(e\right)\tilde{N}\left(\text{d}t\mathrm{,}\text{d}e\right)\\ {V}_{t}={\displaystyle {\int}_{E}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{U}_{t}\left(e\right)\nu \left(\text{d}e\right)\\ {Y}_{T}=\varphi \left({X}_{T}\right)\end{array}$ (47)

where $\tilde{N}\left(\text{d}t\mathrm{,}\text{d}e\right)=N\left(\text{d}t\mathrm{,}\text{d}e\right)-\nu \left(\text{d}e\right)\text{d}t$ is a compensated Poisson random measure. We take the following steps to solve Equation (47) numerically.

Time Discretization

Discretize the time interval $\left[t\mathrm{,}T\right]$ into n sub-intervals of equal length, $\pi ={\left\{\left[{t}_{i},{t}_{i+1}\right)\right\}}_{i=0}^{n-1}$, with $h={t}_{i+1}-{t}_{i}=\frac{T-t}{n}$, ${t}_{0}=t$ and ${t}_{n}=T$. Consider the following Euler discretized equation.

$\begin{array}{l}\text{d}{X}_{{t}_{i}}=\mu \left({t}_{i}\mathrm{,}{X}_{{t}_{i}}\mathrm{,}{Y}_{{t}_{i}}\mathrm{,}{Z}_{{t}_{i}}\mathrm{,}{V}_{{t}_{i}}\right)h+\sigma \left({t}_{i}\mathrm{,}{X}_{{t}_{i}}\mathrm{,}{Y}_{{t}_{i}}\mathrm{,}{Z}_{{t}_{i}}\mathrm{,}{V}_{{t}_{i}}\right)\text{d}{W}_{{t}_{i}}\\ \text{\hspace{1em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}+{\displaystyle {\int}_{E}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\gamma \left({t}_{i}\mathrm{,}{X}_{{t}_{i}}\mathrm{,}{Y}_{{t}_{i}}\mathrm{,}{Z}_{{t}_{i}}\mathrm{,}{V}_{{t}_{i}}\mathrm{,}e\right)\tilde{N}\left(\text{d}{t}_{i}\mathrm{,}\text{d}e\right)\\ {X}_{0}={x}_{0}\\ \text{d}{Y}_{{t}_{i}}=f\left({t}_{i}\mathrm{,}{X}_{{t}_{i}}\mathrm{,}{Y}_{{t}_{i}}\mathrm{,}{Z}_{{t}_{i}}\mathrm{,}{V}_{{t}_{i}}\right)h+{Z}_{{t}_{i}}\text{d}{W}_{{t}_{i}}+{\displaystyle {\int}_{E}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{U}_{{t}_{i}}\left(e\right)\tilde{N}\left(\text{d}{t}_{i}\mathrm{,}\text{d}e\right)\\ {V}_{{t}_{i}}={\displaystyle {\int}_{E}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{U}_{{t}_{i}}\left(e\right)\nu \left(\text{d}e\right)\\ {Y}_{T}=\varphi \left({X}_{T}\right)\end{array}$ (48)

where $\text{d}{X}_{{t}_{i}}:={X}_{{t}_{i+1}}-{X}_{{t}_{i}}$ and $\text{d}{Y}_{{t}_{i}}\mathrm{:}={Y}_{{t}_{i+1}}-{Y}_{{t}_{i}}$. Denote the solution to the time-discretized CFBSDEJ as $\left({X}^{\pi}\mathrm{,}{Y}^{\pi}\mathrm{,}{Z}^{\pi}\mathrm{,}{U}^{\pi}\right)$. We need the following assumption.

Assumption 25. Under the norm ${\Vert \text{\hspace{0.05em}}\cdot \text{\hspace{0.05em}}\Vert}_{\mathcal{K}\left[t\mathrm{,}T\right]}^{2}$ introduced in [104], we have

${\Vert \left(X\mathrm{,}Y\mathrm{,}Z\mathrm{,}U\right)-\left({X}^{\pi}\mathrm{,}{Y}^{\pi}\mathrm{,}{Z}^{\pi}\mathrm{,}{U}^{\pi}\right)\Vert}_{\mathcal{K}\left[t\mathrm{,}T\right]}^{2}\to 0$ (49)

as $n\to \infty $.

Mollification

Define a sequence of functions $\left({\mu}^{m}\mathrm{,}{\sigma}^{m}\mathrm{,}{\gamma}^{m}\mathrm{,}{f}^{m}\mathrm{,}{\varphi}^{m}\right)$, which are bounded and have bounded derivatives of all orders and

$\underset{m\to \infty}{\mathrm{lim}}\left({\mu}^{m}\mathrm{,}{\sigma}^{m}\mathrm{,}{\gamma}^{m}\mathrm{,}{f}^{m}\mathrm{,}{\varphi}^{m}\right)=\left(\mu \mathrm{,}\sigma \mathrm{,}\gamma \mathrm{,}f\mathrm{,}\varphi \right)$ (50)

in a point-wise sense. Also denote the solution to the CFBSDEJ with coefficients $\left({\mu}^{m}\mathrm{,}{\sigma}^{m}\mathrm{,}{\gamma}^{m}\mathrm{,}{f}^{m}\mathrm{,}{\varphi}^{m}\right)$ as $\left({X}^{m}\mathrm{,}{Y}^{m}\mathrm{,}{Z}^{m}\mathrm{,}{U}^{m}\right)$. Then, we have the following theorem.

Theorem 26. Under Assumption 25

${\mathbb{E}}_{t}\left[g\left({X}_{u}^{\pi \mathrm{,}m}\mathrm{,}{Y}_{u}^{\pi \mathrm{,}m}\mathrm{,}{Z}_{u}^{\pi \mathrm{,}m}\mathrm{,}{V}_{u}^{\pi \mathrm{,}m}\right)\right]\to {\mathbb{E}}_{t}\left[g\left({X}_{u}\mathrm{,}{Y}_{u}\mathrm{,}{Z}_{u}\mathrm{,}{V}_{u}\right)\right]$ (51)

as $n\mathrm{,}m\to \infty $ for arbitrary $T>u>t>0$. g is a function with at most polynomial growth in its arguments.

Picard Iteration

After the time discretization and mollification are done, we resort to the Picard fixed point iteration technique to decompose the solution $\left({X}^{\pi \mathrm{,}m}\mathrm{,}{Y}^{\pi \mathrm{,}m}\mathrm{,}{Z}^{\pi \mathrm{,}m}\mathrm{,}{U}^{\pi \mathrm{,}m}\right)$ into a sequence of uncoupled FBSDEJs whose solutions are denoted by $\left({X}^{\pi \mathrm{,}m\mathrm{,}k}\mathrm{,}{Y}^{\pi \mathrm{,}m\mathrm{,}k}\mathrm{,}{Z}^{\pi \mathrm{,}m\mathrm{,}k}\mathrm{,}{U}^{\pi \mathrm{,}m\mathrm{,}k}\right)$, where k denotes the index of the Picard iteration. For the initial iteration, $k=1$, in which the backward components in the forward coefficients are set to zero, consider

$\begin{array}{l}\text{d}{X}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,1}}={\mu}^{m}\left({t}_{i}\mathrm{,}{X}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,1}}\mathrm{,0,0,0}\right)h+{\sigma}^{m}\left({t}_{i}\mathrm{,}{X}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,1}}\mathrm{,0,0,0}\right)\text{d}{W}_{{t}_{i}}\\ \text{\hspace{1em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}+{\displaystyle {\int}_{E}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\gamma}^{m}\left({t}_{i}\mathrm{,}{X}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,1}}\mathrm{,0,0,0,}e\right)\tilde{N}\left(\text{d}{t}_{i}\mathrm{,}\text{d}e\right)\\ {X}_{0}^{\pi \mathrm{,}m\mathrm{,1}}={x}_{0}\end{array}$

$\begin{array}{l}\text{d}{Y}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,1}}={f}^{m}\left({t}_{i}\mathrm{,}{X}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,1}}\mathrm{,}{Y}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,1}}\mathrm{,}{Z}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,1}}\mathrm{,}{V}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,1}}\right)h+{Z}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,1}}\text{d}{W}_{{t}_{i}}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}+{\displaystyle {\int}_{E}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{U}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,1}}\left(e\right)\tilde{N}\left(\text{d}{t}_{i}\mathrm{,}\text{d}e\right)\\ {V}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,1}}={\displaystyle {\int}_{E}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{U}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,1}}\left(e\right)\nu \left(\text{d}e\right)\\ {Y}_{T}^{\pi \mathrm{,}m\mathrm{,1}}=\varphi \left({X}_{T}^{\pi \mathrm{,}m\mathrm{,1}}\right)\end{array}$ (52)

For $k\ge 2$, define

$\begin{array}{l}\text{d}{X}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k}={\mu}^{m}\left({t}_{i}\mathrm{,}{X}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k}\mathrm{,}{Y}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k-1}\mathrm{,}{Z}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k-1}\mathrm{,}{V}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k-1}\right)h\\ \text{\hspace{1em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}+{\sigma}^{m}\left({t}_{i}\mathrm{,}{X}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k}\mathrm{,}{Y}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k-1}\mathrm{,}{Z}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k-1}\mathrm{,}{V}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k-1}\right)\text{d}{W}_{{t}_{i}}\\ \text{\hspace{1em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}+{\displaystyle {\int}_{E}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\gamma}^{m}\left({t}_{i}\mathrm{,}{X}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k}\mathrm{,}{Y}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k-1}\mathrm{,}{Z}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k-1}\mathrm{,}{V}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k-1}\mathrm{,}e\right)\stackrel{\u02dc}{N}\left(\text{d}{t}_{i}\mathrm{,}\text{d}e\right)\\ {X}_{0}^{\pi \mathrm{,}m\mathrm{,}k}={x}_{0}\end{array}$

$\begin{array}{l}\text{d}{Y}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k}={f}^{m}\left({t}_{i}\mathrm{,}{X}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k}\mathrm{,}{Y}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k}\mathrm{,}{Z}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k}\mathrm{,}{V}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k}\right)h\\ \qquad +{Z}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k}\text{d}{W}_{{t}_{i}}+{\displaystyle {\int}_{E}}{U}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k}\left(e\right)\tilde{N}\left(\text{d}{t}_{i}\mathrm{,}\text{d}e\right)\\ {V}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k}={\displaystyle {\int}_{E}}{U}_{{t}_{i}}^{\pi \mathrm{,}m\mathrm{,}k}\left(e\right)\nu \left(\text{d}e\right)\\ {Y}_{T}^{\pi \mathrm{,}m\mathrm{,}k}=\varphi \left({X}_{T}^{\pi \mathrm{,}m\mathrm{,}k}\right)\end{array}$ (53)

Evaluation of Conditional Expectations

For Equation system (53), we can start from the last time interval and work backwards. The problem is transformed into the evaluation of ${\mathbb{E}}_{{t}_{i}}\left[u\left({t}_{i+1}\mathrm{,}{X}_{{t}_{i+1}}^{\pi \mathrm{,}m\mathrm{,}k}\right)\right]$, where u is the intermediate solution and satisfies $u\left(T\mathrm{,}\cdot \right)=\varphi (\cdot )$.
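As an illustration of one backward step, the following minimal sketch estimates a conditional expectation of this form by least-squares regression on simulated paths, in the spirit of the basis function approach of [1]. The polynomial basis, the toy one-dimensional dynamics and the names `conditional_expectation`, `x_now` and `u_next` are assumptions made only for this example; they are not the specification used in the paper, and an ANN could replace the polynomial basis.

```python
import numpy as np

def conditional_expectation(x_now, u_next, degree=2):
    """Least-squares estimate of E[u_next | x_now] on a polynomial basis.

    x_now  : simulated states at time t_i, shape (M,)
    u_next : simulated values u(t_{i+1}, X_{t_{i+1}}), shape (M,)
    """
    # Design matrix with columns 1, x, x^2, ..., x^degree.
    basis = np.vander(x_now, degree + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(basis, u_next, rcond=None)
    return basis @ coeffs, coeffs

# Toy check (assumed dynamics): X_{t+h} = X_t + sqrt(h) * eps and u(x) = x^2,
# so that E[u(X_{t+h}) | X_t] = X_t^2 + h is known in closed form.
rng = np.random.default_rng(0)
h = 0.01
x_now = rng.normal(1.0, 0.2, size=50_000)
x_next = x_now + np.sqrt(h) * rng.normal(size=x_now.size)
u_next = x_next ** 2
fitted, coeffs = conditional_expectation(x_now, u_next)
```

Repeating this regression from the last time interval backwards, with `u_next` replaced at each step by the previously fitted values plus the driver contribution, gives the backward recursion described above.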

B.12. Pricing Kernel Approximation

A pricing kernel ${\eta}_{t}$ is an ${L}^{2}\left({\mathcal{F}}_{t}\right)$ stochastic process, adapted to the information filtration ${\left\{{\mathcal{F}}_{t}\right\}}_{0\le t\le T}$, such that

${V}_{t}={\mathbb{E}}_{t}\left[{D}_{t,T}{\eta}_{t,T}{V}_{T}\right]$ (54)

where ${V}_{T}$ is an ${\mathcal{F}}_{T}$-measurable payoff, ${D}_{t\mathrm{,}T}=\frac{{D}_{T}}{{D}_{t}}={\text{e}}^{-{\displaystyle {\int}_{t}^{T}}{r}_{v}\text{d}v}$ and ${\eta}_{t,T}=\frac{{\eta}_{T}}{{\eta}_{t}}$. It is obvious that ${\eta}_{t}={\mathbb{E}}_{t}\left[{\eta}_{T}\right]$, i.e., $\eta $ is a $\mathbb{P}$-martingale. Represent

${D}_{T}{\eta}_{T}={\displaystyle {\sum}_{j=0}^{\infty}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{a}^{j}{e}_{T}^{j}\left({\theta}_{j}\right)$

where ${\left\{{e}_{T}^{j}\right\}}_{j=0}^{\infty}$ is an orthonormal basis of the space ${L}^{2}\left({\mathcal{F}}_{T}\right)$ and ${\theta}_{j}$ is the vector of coefficients of ${e}^{j}$. Suppose that we have K derivative contracts, denoted by ${\left\{{V}_{T}^{k}\right\}}_{k=1}^{K}$, with basis representation ${V}_{T}^{k}={\displaystyle {\sum}_{j=0}^{\infty}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{b}_{k}^{j}{e}_{T}^{j}\left({\theta}_{j}\right)$. Therefore

${V}_{{t}_{0}}^{k}={\mathbb{E}}_{{t}_{0}}\left[{\displaystyle \underset{j=0}{\overset{\infty}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{a}^{j}{e}_{T}^{j}\left({\theta}_{j}\right){\displaystyle \underset{j=0}{\overset{\infty}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{b}_{k}^{j}{e}_{T}^{j}\left({\theta}_{j}\right)\right]={\displaystyle \underset{j=0}{\overset{\infty}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{a}^{j}{b}_{k}^{j}.$ (55)

Equation (55), if truncated after J terms, forms a linear system of equations, and the unknowns ${\left\{{a}^{j}\right\}}_{j=0}^{J}$ and ${\left\{{\theta}_{j}\right\}}_{j=0}^{J}$ can be recovered by ordinary least squares. After we obtain ${\eta}_{T}$, ${\eta}_{t}$ can be recovered by ${\eta}_{t}={\mathbb{E}}_{t}\left[{\eta}_{T}\right]$, via the methodology outlined in Section 2.
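To make the truncated system concrete, the sketch below solves it by ordinary least squares under a simplifying assumption that is not part of the text: the basis parameters ${\theta}_{j}$ are held fixed, so the payoff coefficients ${b}_{k}^{j}$ are known and only ${\left\{{a}^{j}\right\}}_{j=0}^{J}$ is estimated. The function name `recover_kernel_coefficients` and the synthetic data are hypothetical.

```python
import numpy as np

def recover_kernel_coefficients(b, v0):
    """Solve v0[k] = sum_j a[j] * b[k, j] for a in the least-squares sense.

    b  : payoff coefficients on the truncated basis, shape (K, J + 1)
    v0 : observed time-zero prices of the K contracts, shape (K,)
    """
    a, *_ = np.linalg.lstsq(b, v0, rcond=None)
    return a

# Synthetic example: K = 12 contracts, J + 1 = 4 retained basis terms.
rng = np.random.default_rng(1)
K, J = 12, 3
b = rng.normal(size=(K, J + 1))               # assumed known payoff coefficients
a_true = np.array([0.95, -0.10, 0.03, 0.01])  # illustrative kernel coefficients
v0 = b @ a_true                               # prices implied by truncated (55)
a_hat = recover_kernel_coefficients(b, v0)
```

With noise-free prices and K larger than J + 1, the recovered coefficients coincide with the generating ones up to floating-point error; if ${\theta}_{j}$ must also be estimated, the problem is no longer linear and a nonlinear least-squares routine would be needed.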

Remark 27. If ${\left\{{e}_{t}^{j}\left({\theta}_{j}\right)\right\}}_{j=0}^{\infty}$ is not orthonormal, Equation (55) becomes nonlinear in ${\left\{{\theta}_{j}\right\}}_{j=0}^{J}$. The evaluations remain the same, with only more complicated numerical computations. The basis can also be represented by ANNs.

Remark 28. For a specific representation via the universal approximation theorem, see [55].

Remark 29. It is possible to impose shape constraints in the estimation of (55) and formulate a constrained optimization problem; see [105], for example.

We can also directly utilize the method proposed in Section 2, when used with time discretization and Monte Carlo simulation. Denote M as the number of sample paths and ${\left\{{V}_{T}^{m,k}\right\}}_{m=1,k=1}^{M,K}$ as M simulated final payoffs for each of the K derivatives. Define ${\left\{{a}_{m}\right\}}_{m=1}^{M}$ as M real numbers. Let ${\left\{{V}_{0}^{k}\right\}}_{k=1}^{K}$ be K derivative prices at time ${t}_{0}=0$. Find the solution to the following optimization problem

${\left\{{a}_{m}\right\}}_{m=1}^{M}=\mathrm{arg}\underset{{\left\{{\varphi}_{m}\right\}}_{m=1}^{M}}{\mathrm{min}}\text{}\left[{\displaystyle \underset{k=1}{\overset{K}{\sum}}}{\left({V}_{0}^{k}-\frac{1}{M}{\displaystyle \underset{m=1}{\overset{M}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\varphi}_{m}{V}_{T}^{m,k}\right)}^{2}\right]\mathrm{.}$ (56)

After obtaining ${\left\{{a}_{m}\right\}}_{m=1}^{M}$, we seek a functional relation g such that

${a}_{m}=g\left(T,{X}_{T}^{m}\right)={D}_{0,T}^{m}{\eta}_{T}^{m}$

where ${\left\{{X}_{T}^{m}\right\}}_{m=1}^{M}$ is a set of simulated state variables at time T. When fitting g, we can add some shape or no-arbitrage constraints, or other regularization conditions, to the optimization problem and formulate a constrained ANN (ACNN). We always assume that the matrix $\text{t}\left({\left\{{V}_{T}^{m,k}\right\}}_{m=1,k=1}^{M,K}\right){\left\{{V}_{T}^{m\mathrm{,}k}\right\}}_{m=1,k=1}^{M\mathrm{,}K}$ is a $K\times K$ invertible matrix, where $\text{t}(\cdot )$ is the matrix transpose operator.
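A minimal sketch of problem (56) follows. When M exceeds K the problem is underdetermined, so the sketch selects the minimum-norm minimizer, $a=M\,V{\left(\text{t}\left(V\right)V\right)}^{-1}{V}_{0}$ with V the $M\times K$ payoff matrix; this selection rule is an assumption of the example rather than something stated in the text, and it relies on the invertibility condition above. The call payoffs, the strikes and the name `fit_path_weights` are hypothetical.

```python
import numpy as np

def fit_path_weights(v_T, v_0):
    """Minimum-norm solution of (56).

    v_T : simulated terminal payoffs, shape (M, K)
    v_0 : time-zero prices of the K derivatives, shape (K,)
    Returns path weights a of shape (M,) with (1/M) * v_T.T @ a == v_0.
    """
    m = v_T.shape[0]
    gram = v_T.T @ v_T                      # K x K matrix, assumed invertible
    return m * (v_T @ np.linalg.solve(gram, v_0))

# Synthetic example: three call payoffs on a lognormal terminal state.
rng = np.random.default_rng(2)
M = 1000
x_T = np.exp(rng.normal(-0.02, 0.2, size=M))   # assumed terminal states
strikes = np.array([0.9, 1.0, 1.1])
v_T = np.maximum(x_T[:, None] - strikes[None, :], 0.0)
v_0 = 0.97 * v_T.mean(axis=0)                  # illustrative market prices
a = fit_path_weights(v_T, v_0)
repriced = (a[:, None] * v_T).mean(axis=0)     # reproduces v_0
```

The weights `a` reprice the K contracts exactly; fitting g to the pairs $\left({X}_{T}^{m},{a}_{m}\right)$, with or without constraints, is a separate regression step that the sketch does not cover.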

C. Intuition of Convergence Proof for Appendix B.11

In Appendix B.11, we propose a method to solve a CFBSDEJ numerically. As long as the time discretization scheme is convergent, we can argue that the numerical solution converges, in some sense, to the true solution, as outlined above in Appendix B.11. Potentially, we need an a priori estimate, similar to the one in [2], for coupled BSDEs, to justify Picard iteration at every time discretization step.

NOTES

^{1}It is obvious that ${\left\{{e}_{t}^{j}\right\}}_{j\in \Lambda}$ can be the basis or frame of ${L}^{2}\left({\mathcal{F}}_{t}\right)$. However, we do not assume so in this paper.

^{2}It is the linear space spanned by the set ${\left\{{e}^{j}\left({\theta}_{j}\right)\right\}}_{j=0}^{\infty}$.

^{3}We should understand that distance can be defined in function space $\Phi $.

^{4}K can be positive infinity, i.e., $K=\infty $.

^{5}We will only show convergence of Methods 1 and 2.

^{6}Here we only consider the case where $\left|{\Lambda}_{n}\right|={m}_{n}<\infty $ for any $n\in \mathbb{N}$. The case with $\left|{\Lambda}_{n}\right|=\infty $ is analogous.

^{7}For example, a Lévy process.

References

[1] Longstaff, F. and Schwartz, E. (2001) Valuing American Options by Simulation: A Simple Least-Squares Approach. The Review of Financial Studies, 14, 113-147.

https://doi.org/10.1093/rfs/14.1.113

[2] El Karoui, N., Peng, S. and Quenez, M.C. (1997) Backward Stochastic Differential Equations in Finance. Mathematical Finance, 7, 1-71.

https://doi.org/10.1111/1467-9965.00022

[3] Adrian, T., Crump, R. and Vogt, E. (2018) Nonlinearity and Flight-to-Safety in the Risk-Return Trade-Off for Stocks and Bonds. Journal of Finance, 74, 1931-1973.

[4] Fama, E. and French, K. (1993) Common Risk Factors in the Returns on Stocks and Bonds. Journal of Financial Economics, 33, 3-56.

https://doi.org/10.1016/0304-405X(93)90023-5

[5] Fama, E. and French, K. (2015) A Five-Factor Asset Pricing Model. Journal of Financial Economics, 116, 1-22.

[6] Zhu, S. and Pykhtin, M. (2008) A Guide to Modeling Counterparty Credit Risk. Working Paper.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1032522

[7] Aydogdu, M. (2018) Predicting Stock Returns Using Neural Networks. Working Paper.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3141492

https://doi.org/10.2139/ssrn.3141492

[8] Voshgha, H. (2008) Early Detection of Defaulting Firms: Artificial Neural Network Application; Australian Context. Working Paper.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2130505

[9] Hutchinson, J., Lo, A. and Poggio, T. (1994) A Nonparametric Approach to Pricing and Hedging Derivative Securities via Learning Networks. Journal of Finance, 49, 851-889.

https://doi.org/10.1111/j.1540-6261.1994.tb00081.x

[10] Hahn, J.T. (2013) Option Pricing Using Artificial Neural Networks: The Australian Perspective. Ph.D. Thesis, Bond University, Queensland.

[11] Kohler, M., Krzyzak, M. and Todorovic, N. (2010) Pricing of High-Dimensional American Options by Neural Networks. Mathematical Finance, 20, 383-410.

https://doi.org/10.1111/j.1467-9965.2010.00404.x

[12] Dugas, C., Bengio, Y., Bélisle, F., Nadeau, C. and Garcia, R. (2009) Incorporating Functional Knowledge in Neural Networks. Journal of Machine Learning Research, 10, 1239-1262.

[13] Eckstein, S., Kupper, M. and Pohl, M. (2018) Robust Risk Aggregation with Neural Networks. Quantitative Finance, 1-40.

https://arxiv.org/abs/1811.00304

[14] Giovanis, E. (2010) Applications of Neural Network Radial Basis Function in Economics and Financial Time Series. SSRN Electronic Journal.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1667442

https://doi.org/10.2139/ssrn.1667442

[15] Kopitkov, D. and Indelman, V. (2018) Deep PDF: Probabilistic Surface Optimization and Density Estimation. Computer Science, 1-18.

https://arxiv.org/abs/1807.10728

[16] Luo, R., Zhang, W., Xu, X. and Wang, J. (2017) A Neural Stochastic Volatility Model. Computer Science, 1-11.

https://arxiv.org/pdf/1712.00504.pdf

[17] Sasaki, H. and Hyvarinen, A. (2018) Neural-Kernelized Conditional Density Estimation. Statistics, 1-12.

https://arxiv.org/abs/1806.01754

[18] Weissensteiner, A. (2009) A Q-Learning Approach to Derive Optimal Consumption and Investment Strategies. IEEE Transactions on Neural Networks, 20, 1234-1243.

https://doi.org/10.1109/TNN.2009.2020850

[19] Casgrain, P. and Jaimungal, S. (2016) Trading Algorithms with Learning in Latent Alpha Models. SSRN Electronic Journal.

https://doi.org/10.2139/ssrn.2871403

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2871403

[20] Heaton, J., Polson, N. and Witte, J. (2016) Deep Learning for Finance: Deep Portfolios. Applied Stochastic Models in Business and Industry, 33, 3-12.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2838013

https://doi.org/10.2139/ssrn.2838013

[21] Samo, Y. and Vervuurt, A. (2016) Stochastic Portfolio Theory: A Machine Learning Perspective. Quantitative Finance, 1-9.

https://arxiv.org/pdf/1605.02654.pdf

[22] Jiang, Z., Xu, D. and Liang, J. (2017) A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem. Computational Finance, 1-31.

https://arxiv.org/pdf/1706.10059.pdf

[23] Deng, Y., Bao, F., Kong, Y., Ren, Z. and Dai, Q. (2017) Deep Direct Reinforcement Learning for Financial Signal Representation and Trading. IEEE Transactions on Neural Networks and Learning Systems, 28, 653-664.

https://doi.org/10.1109/TNNLS.2016.2522401

[24] Halperin, I. (2017) QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds. Quantitative Finance, 1-34.

https://arxiv.org/abs/1712.04609v2

https://doi.org/10.2139/ssrn.3087076

[25] Ritter, G. (2017) Machine Learning for Trading. Working Paper.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3015609

https://doi.org/10.2139/ssrn.3015609

[26] Xing, F., Cambria, E., Malandri, L. and Vercellis, C. (2018) Discovering Bayesian Market Views for Intelligent Asset Allocation.

https://arxiv.org/pdf/1802.09911.pdf

[27] Becker, S., Cheridito, P. and Jentzen, A. (2018) Deep Optimal Stopping. Mathematics, arXiv: 1804.05394.

https://arxiv.org/abs/1804.05394

[28] Gu, S., Kelly, B. and Xiu, D. (2018) Empirical Asset Pricing via Machine Learning. 31st Australasian Finance and Banking Conference 2018, Sydney, 13-15 December 2018.

https://doi.org/10.3386/w25398

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3159577

[29] Weinan, E., Han, J. and Jentzen, A. (2017) Deep Learning-Based Numerical Methods for High-Dimensional Parabolic Partial Differential Equations and Backward Stochastic Differential Equations. Mathematics, 1-39.

https://arxiv.org/pdf/1706.04702.pdf

[30] Weinan, E., Hutzenthaler, M., Jentzen, A. and Kruse, T. (2017) On Multilevel Picard Numerical Approximations for High-Dimensional Nonlinear Parabolic Partial Differential Equations and High-Dimensional Nonlinear Backward Stochastic Differential Equations. Mathematics, 1-25.

https://arxiv.org/pdf/1708.03223.pdf

[31] Han, J., Jentzen, A. and Weinan, E. (2017) Overcoming the Curse of Dimensionality: Solving High-Dimensional Partial Differential Equations Using Deep Learning. Mathematics, 1-14.

https://arxiv.org/pdf/1707.02568.pdf

[32] Khoo, Y., Lu, J. and Ying, L. (2017) Solving Parametric PDE Problems with Artificial Neural Networks. Mathematics, 1-17.

https://arxiv.org/pdf/1707.03351.pdf

[33] Beck, C., Weinan, E. and Jentzen, A. (2017) Machine Learning Approximation Algorithms for High-Dimensional Fully Nonlinear Partial Differential Equations and Second-Order Backward Stochastic Differential Equations. Mathematics, 1-56.

https://arxiv.org/pdf/1709.05963.pdf

[34] Sirignano, J. and Spiliopoulos, K. (2017) DGM: A Deep Learning Algorithm for Solving Partial Differential Equations. Mathematics, 1-31.

https://arxiv.org/pdf/1708.07469.pdf

[35] Long, Z., Lu, Y. and Ma, X. (2018) PDE-Net: Learning PDEs from Data. Mathematics, 1-17.

https://arxiv.org/pdf/1710.09668.pdf

[36] Long, Z. and Lu, Y. (2018) PDE-Net 2.0: Learning PDEs from Data with a Numeric Symbolic Hybrid Deep Network. Computer Science, 1-16.

https://arxiv.org/pdf/1812.04426.pdf

[37] Haehnel, P., Marecek, J. and Monteil, J. (2018) Scaling up Deep Learning for PDE-Based Models. Computer Science, 1-39.

https://arxiv.org/pdf/1810.09425.pdf

[38] Berg, J. and Nystrom, K. (2018) Data-Driven Discovery of PDEs in Complex Datasets. Statistics, 1-22.

https://arxiv.org/pdf/1808.10788.pdf

[39] Rudy, S., Alla, A., Brunton, S. and Nathan Kutz, J. (2018) Data-Driven Identification of Parametric Partial Differential Equations. Mathematics, 1-17.

https://arxiv.org/pdf/1806.00732.pdf

[40] Detemple, J., Lorig, M., Rindisbacher, M. and Zhang, L. (2018) An Analytical Expansion Method for Forward Backward Stochastic Differential Equations with Jumps.

[41] Briand, P. and Labart, C. (2012) Simulation of BSDEs by Wiener Chaos Expansion. The Annals of Applied Probability, 24, 1129-1171.

https://doi.org/10.1214/13-AAP943

[42] Geiss, C. and Labart, C. (2015) Simulation of BSDEs with Jumps by Wiener Chaos Expansion. Mathematics, arXiv: 1502.05649.

http://arxiv.org/abs/1502.05649

[43] Gnameho, K., Stadje, M. and Pelsser, A. (2017) A Regression-Later Algorithm for Backward Stochastic Differential Equations. Mathematics, 1-33.

https://arxiv.org/pdf/1706.07986

[44] Gobet, E. and Labart, C. (2007) Error Expansion for the Discretization of Backward Stochastic Differential Equations. Stochastic Processes and Their Applications, 117, 803-829.

https://doi.org/10.1016/j.spa.2006.10.007

[45] Takahashi, A. and Yamada, T. (2016) An Asymptotic Expansion for Forward-Backward SDEs: A Malliavin Calculus Approach. Asia-Pacific Financial Markets, 23, 337-373.

[46] Takahashi, A. and Yamada, T. (2015) On the Expansion to Quadratic FBSDEs.

[47] Gobet, E. and Pagliarani, S. (2014) Analytical Approximations of BSDEs with Non-Smooth Driver. SIAM Journal on Financial Mathematics, 6, 919-958.

https://doi.org/10.2139/ssrn.2448691

[48] Fujii, M. and Takahashi, A. (2012) Analytical Approximation for Non-Linear FBSDEs with Perturbation Scheme. International Journal of Theoretical and Applied Finance, 15, Article ID: 1250034.

https://doi.org/10.1142/S0219024912500343

[49] Fujii, M. and Takahashi, A. (2012) Perturbative Expansion of FBSDE in an Incomplete Market with Stochastic Volatility. The Quarterly Journal of Finance, 2, 1-22.

https://doi.org/10.2139/ssrn.1999137

[50] Fujii, M. and Takahashi, A. (2015) Asymptotic Expansion for Forward-Backward SDEs with Jumps. Quantitative Finance, 1-39.

https://doi.org/10.2139/ssrn.2672890

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2672890

[51] Fujii, M. and Takahashi, A. (2016) Quadratic-Exponential Growth BSDEs with Jumps and Their Malliavin’s Differentiability. Working Paper.

https://doi.org/10.2139/ssrn.2705670

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2705670

[52] Fujii, M. and Takahashi, A. (2016) Solving Backward Stochastic Differential Equations by Connecting the Short-Term Expansions. Quantitative Finance, 1-41.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2795490

[53] Detemple, J. and Rindisbacher, M. (2005) Closed-Form Solutions for Optimal Portfolio Selection with Stochastic Interest Rate and Investment Constraints. Mathematical Finance, 15, 539-568.

https://doi.org/10.1111/j.1467-9965.2005.00250.x

[54] Hansen, L. and Richard, S. (1987) The Role of Conditioning Information in Deducing Testable Restrictions Implied by Dynamic Asset Pricing Models. Econometrica, 55, 587-613.

https://doi.org/10.2307/1913601

[55] Jiang, J. and Tian, W. (2018) Semi-Nonparametric Approximation and Index Options. Annals of Finance, 1-38.

https://doi.org/10.1007/s10436-018-0341-4

[56] Tian, W. (2014) Spanning with Indexes. Journal of Mathematical Economics, 53, 111-118.

https://doi.org/10.1016/j.jmateco.2014.06.007

[57] Tian, W. (2018) The Financial Market: Not as Big as You Think. Mathematics and Financial Economics, 51, 1-19.

[58] Bolcskei, H., Grohs, P., Kutyniok, G. and Petersen, P. (2018) Optimal Approximation with Sparsely Connected Deep Neural Networks. Computer Science, 1-36.

https://arxiv.org/abs/1705.01714

[59] Henry-Labordere, P. (2015) Exact Simulation of Multi-Dimensional Stochastic Differential Equations. Working Paper, 1-28.

https://doi.org/10.2139/ssrn.2598505

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2598505

[60] Prater, A. (2012) Discrete Sparse Fourier Hermite Approximations in High Dimensions. Doctoral Thesis, Syracuse University, New York.

[61] Fonseca, Y., Medeiros, M., Vasconcelos, G. and Veiga, A. (2018) Boost: Boosting Smooth Trees for Partial Effect Estimation in Nonlinear Regressions. Statistics, 1-30.

https://arxiv.org/pdf/1808.03698.pdf

[62] Detemple, J. (2006) American-Style Derivatives: Valuation and Computation. Chapman and Hall/CRC, New York.

https://doi.org/10.1201/9781420034868

[63] Guyon, J. and Henry-Labordere, P. (2014) Nonlinear Option Pricing. Chapman and Hall, New York.

https://doi.org/10.1201/b16332

[64] Detemple, J., Garcia, R. and Rindisbacher, M. (2005) Representation Formulas for Malliavin Derivatives of Diffusion Processes. Finance and Stochastics, 9, 349-367.

https://doi.org/10.1007/s00780-004-0151-6

[65] Detemple, J. and Rindisbacher, M. (2005) Asymptotic Properties of Monte Carlo Estimators of Derivatives. Management Science, 51, 1657-1675.

https://doi.org/10.1287/mnsc.1050.0398

[66] Detemple, J. (2014) Optimal Exercise for Derivative Securities. Annual Review of Financial Economics, 6, 459-487.

https://doi.org/10.1146/annurev-financial-110613-034241

[67] Fujii, M., Sato, S. and Takahashi, A. (2012) An FBSDE Approach to American Option Pricing with an Interacting Particle Method. Quantitative Finance, 1-18.

https://arxiv.org/abs/1211.5867

https://doi.org/10.2139/ssrn.2180696

[68] Chassagneux, J., Elie, R. and Kharroubi, I. (2010) A Note on Existence and Uniqueness for Solutions of Multidimensional Reflected BSDEs. Electronic Communications in Probability, 16, 120-128.

https://doi.org/10.1214/ECP.v16-1614

[69] Collin-Dufresne, P. and Goldstein, R. (2003) Generalizing the Affine Framework to HJM and Random Field Models. SSRN Electronic Journal.

https://doi.org/10.2139/ssrn.410421

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=410421

[70] Carmona, R. and Delarue, F. (2015) Forward-Backward Stochastic Differential Equations and Controlled McKean-Vlasov Dynamics. Annals of Probability, 43, 2647-2700.

https://doi.org/10.1214/14-AOP946

[71] Bianchi, D., Büchner, M. and Tamoni, A. (2019) Bond Risk Premia with Machine Learning. USC-INET Research Paper No. 19-11.

https://doi.org/10.2139/ssrn.3400941

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3232721

[72] Chen, L., Pelger, M. and Zhu, J. (2019) Deep Learning in Asset Pricing. Quantitative Finance, 1-89.

https://arxiv.org/abs/1904.00745

https://doi.org/10.2139/ssrn.3350138

[73] Feng, G., Polson, N. and Xu, J. (2019) Deep Learning in Asset Pricing. Statistics, 1-33.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3350138

[74] Yang, Q., Ye, T. and Zhang, L. (2018) A General Framework of Optimal Investment. Working Paper.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3136708

[75] Yu, P., Lee, J., Kulyatin, I., Shi, Z. and Dasgupta, S. (2019) Model-Based Deep Reinforcement Learning for Dynamic Portfolio Optimization. Computer Science, 1-21.

https://arxiv.org/abs/1901.08740

[76] Kingma, D. and Ba, J.L. (2014) Adam: A Method for Stochastic Optimization. Computer Science, 1-15.

https://arxiv.org/abs/1412.6980

[77] Heston, S. (1993) A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options. The Review of Financial Studies, 6, 327-343.

https://doi.org/10.1093/rfs/6.2.327

[78] Dupire, B. (1994) Pricing with a Smile. Risk.

http://www.risk.net/data/risk/pdf/technical/2007/risk20_0707_technical_volatility.pdf

[79] Homescu, C. (2014) Local Stochastic Volatility Models: Calibration and Pricing. Working Paper.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2448098

https://doi.org/10.2139/ssrn.2448098

[80] Broadie, M., Chernov, M. and Johannes, M. (2007) Model Specification and Risk Premia: Evidence from Futures Options. Journal of Finance, 62, 1453-1490.

https://doi.org/10.1111/j.1540-6261.2007.01241.x

[81] Guennon, H. (2016) Local Volatility Models Enhanced with Jumps. Working Paper, 1-11.

https://papers.ssrn.com/abstract=2781102

https://doi.org/10.2139/ssrn.2781102

[82] Buehler, H., Gonon, L., Teichmann, J. and Wood, B. (2018) Deep Hedging. Working Paper.

https://doi.org/10.2139/ssrn.3120710

https://arxiv.org/abs/1802.03042

[83] Halperin, I. (2018) The QLBS Q-Learner Goes NuQLear: Fitted Q Iteration, Inverse RL, and Option Portfolios. Quantitative Finance, 1-18.

https://arxiv.org/abs/1801.06077

https://doi.org/10.2139/ssrn.3102707

[84] Halperin, I. (2018) QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds. Quantitative Finance, 1-34.

https://arxiv.org/abs/1712.04609

https://doi.org/10.2139/ssrn.3087076

[85] Schroder, M. and Skiadas, C. (2008) Optimality and State Pricing in Constrained Financial Markets with Recursive Utility under Continuous and Discontinuous Information. Mathematical Finance, 18, 199-238.

https://doi.org/10.1111/j.1467-9965.2007.00330.x

[86] Detemple, J. and Zapatero, F. (1991) Asset Prices in an Exchange Economy with Habit Formation. Econometrica, 59, 1633-1657.

https://doi.org/10.2307/2938283

[87] Karatzas, I., Lehoczky, J., Shreve, S. and Xu, G. (1991) Martingale and Duality Methods for Utility Maximization in an Incomplete Market. SIAM Journal on Control and Optimization, 29, 702-730.

https://doi.org/10.1137/0329039

[88] He, H. and Pearson, N. (1991) Consumption and Portfolio Policies with Incomplete Markets and Short-Sale Constraints: The Infinite Dimensional Case. Journal of Economic Theory, 54, 259-304.

https://doi.org/10.1016/0022-0531(91)90123-L

[89] Karatzas, I. and Cvitanic, J. (1992) Convex Duality in Constrained Portfolio Optimization. Annals of Applied Probability, 2, 767-818.

https://doi.org/10.1214/aoap/1177005576

[90] Detemple, J., Garcia, R. and Rindisbacher, M. (2003) A Monte Carlo Method for Optimal Portfolios. Journal of Finance, 58, 401-446.

https://doi.org/10.1111/1540-6261.00529

[91] Detemple, J., Garcia, R. and Rindisbacher, M. (2005) Intertemporal Asset Allocation: A Comparison of Methods. Journal of Banking and Finance, 29, 2821-2848.

https://doi.org/10.1016/j.jbankfin.2005.02.004

[92] Detemple, J. and Rindisbacher, M. (2010) Dynamic Asset Allocation: Portfolio Decomposition Formula and Applications. The Review of Financial Studies, 23, 25-100.

https://doi.org/10.1093/rfs/hhp040

[93] Detemple, J. (2012) Portfolio Selection: A Review. Journal of Optimization Theory and Applications, 161, 1-21.

https://doi.org/10.1007/s10957-012-0208-1

[94] Matoussi, A. and Xing, H. (2016) Convex Duality for Stochastic Differential Utility. Quantitative Finance, 1-22.

http://arxiv.org/pdf/1601.03562.pdf

https://doi.org/10.2139/ssrn.2715425

[95] Kraft, H., Seiferling, T. and Seifried, F. (2015) Optimal Consumption and Investment with Epstein-Zin Recursive Utility. Working Paper.

https://doi.org/10.2139/ssrn.2444747

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2424706

[96] Ait-Sahalia, Y. (2008) Closed-Form Likelihood Expansions for Multivariate Diffusions. Annals of Statistics, 36, 906-937.

https://doi.org/10.1214/009053607000000622

[97] Filipovic, D., Mayerhofer, E. and Schneider, P. (2013) Density Approximations for Multivariate Affine Jump Diffusion Processes. Journal of Econometrics, 176, 93-111.

https://doi.org/10.1016/j.jeconom.2012.12.003

[98] Van Handel, R. (2008) Hidden Markov Models. Princeton Lecture Notes.

[99] Markowitz, H. (1952) Portfolio Selection. Journal of Finance, 7, 77-91.

https://doi.org/10.1111/j.1540-6261.1952.tb01525.x

[100] Schneider, P. and Trojani, F. (2018) (Almost) Model Free Recovery. Journal of Finance, 74, 323-370.

https://doi.org/10.1111/jofi.12737

[101] Chabakauri, G. (2013) Dynamic Equilibrium with Two Stocks, Heterogeneous Investors, and Portfolio Constraints. The Review of Financial Studies, 26, 3104-3141.

https://doi.org/10.2139/ssrn.2221073

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2221073

[102] Chabakauri, G. (2015) Asset Pricing with Heterogeneous Preferences, Beliefs, and Portfolio Constraints. Journal of Monetary Economics, 75, 21-34.

[103] Kardaras, C., Xing, H. and Zitkovic, G. (2015) Incomplete Stochastic Equilibria for Dynamic Monetary Utility. Mathematics, 1-33.

https://arxiv.org/abs/1505.07224

[104] Halle, J.O. (2010) Backward Stochastic Differential Equations with Jumps. Master Thesis, University of Oslo, Oslo, Norway.

[105] Dalderop, J. (2016) Nonparametric State-Price Density Estimation Using High Frequency Data. Working Paper.

https://doi.org/10.2139/ssrn.2718938