Full Image Inference Conditionally upon Available Pieces Transmitted into Limited Resources Context

Rodrigue Saoungoumi-Sourpele^{1}^{*},
Jean Michel Nlong^{2},
David Jaurès Fotsa-Mbogne^{1},
Jean-Robert Kala Kamdjoug^{3},
Laurent Bitjoka^{4}


1. Introduction

Transmission of digital images has been widely studied since the early years of the Internet [1]. It deals with compressing and transmitting image data in such a way that the receiver can start decoding and displaying the received image even before the whole file has arrived. Because of the large amounts of data involved in image technology, applications are highly constrained by the available resources and by the quality of service during transmission. In video streaming, for instance, the latency of the transmission of individual image frames plays a fundamental role [2] [3] because of the isochronal character of video: the images must be displayed at a given frequency, with a fault threshold above which the visual quality of the video is no longer acceptable. Many techniques have been proposed in the literature to tackle these problems, among which are image compression and Progressive Image Transmission (PIT).

The primary objective of PIT is to transmit a significant and interpretable core of the image first and subsequently transmit complementary layers that gradually improve its quality. This method requires a preparation of the image to be transmitted. PIT techniques can be grouped into three main areas: the spatial domain [4], the transform domain, and the pyramid-structured domain [4] [5]. As new areas of interest emerge, such as live streaming over narrow networks and wireless sensor networks, digital image transmission remains a significant challenge. As reviewed in [4], most of the recent improvements in image coding are based on wavelet transformation. The challenge is then to organize the transmission of the bitstream so as to adapt to the fluctuations of the network and to the capabilities of the receiving device.

In this work, we are interested in the progressive transmission and refinement of still images, as a process that adapts to low-quality network service. A special focus is placed on the JPEG2000 format, since it is the most widely used standard nowadays [6] [7]. However, the method presented here is general enough to be applied to any image format that encodes the image file as a two-dimensional data container (resolution and color depth), where the resolution and the color depth can be chosen independently to suit the end-user device. The display of an image is treated as a progressive process in order to adapt to network conditions. The sender selects the image data, layer by layer, from the most significant to the least significant, depending on the desired image quality at the receiver. Upon reception of these data, the receiver decodes a blurred version of the image and smooths it by statistically inferring the missing information. As more refinement data arrive from the network, this process is repeated recursively. The Kalman filtering (KF) algorithm is used to infer the refinement data [8] [9] [10].

The rest of the paper is organized as follows. In Section 2, we review the literature on progressive image transmission. Section 3 covers the theoretical foundations of discrete Kalman filtering, while Section 4 presents the proposed method, which models image transmission as a filtering procedure. In Section 5, we apply the FRM-KF to a standard image gallery and discuss its performance.

2. Short Background on PIT

PIT techniques can be grouped into three main areas, namely the spatial domain, the transform domain and the pyramid-structured domain.

2.1. Spatial Domain

Spatial domain methods are based on the bit-plane decomposition (BPDM) [4] and the vector quantization method (VQM).

The bit-plane decomposition method is the most intuitive one for tackling the problem of progressive transmission. Indeed, the gray level of each pixel in an image is coded over 8 bits of different significances. The collection of the *i^{th}* significant bits of all pixels constitutes the *i^{th}* bit plane; the planes are then transmitted from the most significant to the least significant.
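As a concrete illustration, the plane split and a partial reconstruction can be sketched in a few lines of NumPy (a minimal sketch; the function names are ours, not part of any cited codec):

```python
import numpy as np

def bitplanes(img):
    """Split an 8-bit grayscale image into its 8 bit planes.

    Returns [p0, ..., p7], where p_i holds the i-th significant bit of
    every pixel (p7 is the most significant plane, sent first).
    """
    return [(img >> i) & 1 for i in range(8)]

def reconstruct(planes, n_planes=8):
    """Rebuild an approximation from the n_planes most significant planes."""
    img = np.zeros_like(planes[0], dtype=np.uint8)
    for i in range(8 - n_planes, 8):
        img |= planes[i].astype(np.uint8) << i
    return img

img = np.array([[200, 13], [128, 255]], dtype=np.uint8)   # toy 2x2 image
planes = bitplanes(img)
coarse = reconstruct(planes, 4)   # preview from the 4 MSB planes only
```

With all 8 planes the reconstruction is exact; with the 4 most significant planes, each pixel keeps only its top 4 bits (e.g. 200 becomes 192), which is precisely the coarse preview the receiver can display early.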

In vector quantization, the pixels are grouped into blocks (code-blocks), each of which is transformed into a vector. The obtained vectors are gathered into a lighter structure called a code-book, where they serve as codewords. Codewords are progressively transmitted and used to produce an approximate image on the receiver side. The main available improvement of the VQM is the Tree-Structured Vector Quantization method (TSVQM), which consists in transmitting first the vector quantizations that contribute most quickly to a better image quality [12]. The VQM has some disadvantages: block effects during display, codeword-by-codeword transmission, increased complexity on the transmitter side, and the computational overhead of creating, organizing and selecting codewords.

2.2. Transform Domain

The main goal of transform-based methods (e.g. the Discrete Cosine Transform (DCT)) is to concentrate the energy into low-frequency areas, grouped into a small number of coefficients. The low-frequency coefficients have a strong and decisive impact on the final overall quality of the image. Before their transmission in decreasing order of importance, those coefficients are hierarchized following a specific scanning pattern (e.g. the zigzag scan used in JPEG [13]) or a multistage quantization based on the variances of the coefficients.
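For instance, the JPEG zigzag scan can be generated programmatically. The sketch below (our own helper, not taken from any standard library) enumerates the coefficient positions of an n × n block along anti-diagonals, alternating the traversal direction:

```python
def zigzag_order(n):
    """Return (row, col) index pairs of an n x n block in JPEG zigzag order.

    Coefficients on the same anti-diagonal share row + col = s; alternating
    the traversal direction per diagonal yields the classic zigzag pattern,
    visiting low-frequency coefficients first.
    """
    order = []
    for s in range(2 * n - 1):                       # one anti-diagonal per s
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])  # alternate direction
    return order
```

Transmitting DCT coefficients in `zigzag_order(8)` order realizes the "decreasing order of importance" described above, since energy is concentrated near the (0, 0) corner.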

2.3. Pyramid Structured Domain

The pyramidal shape is ideal for progressive transmission. To form a pyramid, an image is reduced in resolution according to a predefined method such as the Discrete Wavelet Transform (DWT) [4] [5] or the Quadrature Mirror Filter (QMF) [12]. The reduced image has fewer coefficients, and the transmission process therefore consists of sending the top of the pyramid first, followed by the differences between each layer and the next.
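A minimal sketch of such a pyramid, using plain 2 × 2 averaging as a stand-in for the low-pass step of the Haar DWT (our simplification, not the full filter bank):

```python
import numpy as np

def pyramid(img, levels):
    """Build a resolution pyramid by repeated 2x2 averaging.

    Each level halves both dimensions: every 2x2 block is replaced by its
    mean (the Haar low-low subband, up to normalization). For progressive
    transmission, pyramid[-1] (the top) would be sent first, followed by
    the differences needed to refine each level.
    """
    levels_list = [img.astype(float)]
    for _ in range(levels):
        a = levels_list[-1]
        a = (a[0::2, 0::2] + a[1::2, 0::2] + a[0::2, 1::2] + a[1::2, 1::2]) / 4.0
        levels_list.append(a)
    return levels_list

img = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 image
pyr = pyramid(img, 2)                            # pyr[2] is a single mean value
```

After two reductions of a 4 × 4 image, the top of the pyramid is a single coefficient, the global mean, which is the cheapest possible first transmission.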

2.4. Fast Progressive Image Transmission

None of the above techniques incorporates a prediction of the data not yet received. Such an inference allows faster access to a transmitted image. One method pursuing that goal is pixel interpolation, which estimates not-yet-received data using a model constructed from the available data. The SIDE-MATCH algorithm is an implementation of the interpolation method [14]. Despite their expected speed of convergence toward a relatively good image quality, inference methods suffer from a certain number of drawbacks, namely the difficulty of producing good-quality images at the beginning of the process and a complexity that induces large computation times.
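As a toy illustration of interpolation-based refinement (a deliberately simplified stand-in, not the actual SIDE-MATCH algorithm of [14]), missing pixels can be estimated from the mean of their already-received neighbors:

```python
def interpolate_missing(grid):
    """Fill None pixels with the mean of their available 4-neighbors.

    grid is a list of rows; received pixels are numbers, missing ones are
    None. This captures the idea that received data constrain the estimate
    of the data not yet received.
    """
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(h):
        for j in range(w):
            if grid[i][j] is None:
                nb = [grid[x][y]
                      for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                      if 0 <= x < h and 0 <= y < w and grid[x][y] is not None]
                out[i][j] = sum(nb) / len(nb) if nb else 0
    return out

filled = interpolate_missing([[10, None], [None, 30]])
```

Here both missing pixels are estimated as 20, the mean of the two received values, illustrating both the appeal (an immediate full-resolution estimate) and the weakness (early estimates are crude) of interpolation methods.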

3. The Discrete Kalman Filter

Filtering is a procedure that aims at estimating the state of a given dynamic system from noisy observations. Usually, the outputs are given as a sequence ${\left\{{Y}_{n}\right\}}_{n\in T}$ , where $T\subseteq \mathbb{R}$ denotes a set of time values. It can be discrete or continuous, depending on data availability and the observation rate. Each output ${Y}_{n}$ is related to an unknown or partially known state ${X}_{n}$ through a stochastic model of the form

${Y}_{n}={H}_{n}\left({X}_{n}\right)+{V}_{n}\mathrm{,}$ (1)

where ${V}_{n}$ is the noise occurring in the measurement procedure and ${H}_{n}$ represents an averaged relationship between ${Y}_{n}$ and ${X}_{n}$ ; in other terms, it is a trend of the evolution of Y as a function of X. The observation noise is usually assumed to be a normal (Gaussian) random variable [15]. The additional hypothesis of independence of the system ${\left\{{V}_{n}\right\}}_{n\in T}$ is very common and useful for computations. In Equation (1), the observation ${Y}_{n}$ is available while neither ${X}_{n}$ nor ${V}_{n}$ is known. Filtering aims at giving the law of the hidden signal ${X}_{n}$ conditionally upon the subsequence ${\left\{{Y}_{i}\right\}}_{i\le n}$ despite the presence of the noise ${\left\{{V}_{i}\right\}}_{i\le n}$ . One can focus on ${\stackrel{^}{X}}_{n}$ , the conditional expectation of the signal ${X}_{n}$ given the subsequence ${\left\{{Y}_{i}\right\}}_{i\le n}$ , noted ${Y}_{\mathrm{0:}n}$ for simplicity [10] [16] [17]. As mentioned in [18] [19] [20], applications of filtering cover areas such as sensorless control, prognostics and health management (PHM), fault-tolerant control of AC drives, management of storage systems, signal processing, robotics, computer vision, real-time industrial control systems, localization, navigation, mobile trajectory tracking and other applications combining a priori knowledge of the dynamics with sensor measurements.

A large class of these applications is covered by the discrete filtering that can be described by the general linear problem

$\{\begin{array}{l}{X}_{n+1}={A}_{n}{X}_{n}+{B}_{n}+{\epsilon}_{n}\hfill \\ {Y}_{n+1}={C}_{n+1}{X}_{n+1}+{D}_{n+1}+{\omega}_{n+1}\hfill \\ {\epsilon}_{n}\sim \mathcal{N}\left(\mathrm{0,}{K}_{n}\right)\mathrm{,}{\omega}_{n}\sim \mathcal{N}\left(\mathrm{0,}{W}_{n}\right)\hfill \end{array}$ (2)

where ${A}_{n}$ , ${B}_{n}$ , ${C}_{n}$ and ${D}_{n}$ are matrices expressing the dynamics of the signal and the observation. The filtering problem (2) has an explicit solution in the Gaussian linear case known as the “discrete Kalman Filter” which is presented as follows. Let

${X}_{n}^{p}=\mathbb{E}\left[{X}_{n}\mathrm{|}{Y}_{0}\mathrm{,}\cdots \mathrm{,}{Y}_{n-1}\right]\mathrm{,}$ (3)

${X}_{n}^{e}=\mathbb{E}\left[{X}_{n}\mathrm{|}{Y}_{0}\mathrm{,}\cdots \mathrm{,}{Y}_{n}\right]$ (4)

and ${Q}_{n}^{p}=Var\left({X}_{n}-{X}_{n}^{p}\right)$ , ${Q}_{0}^{p}$ given by the law of ${X}_{0}$ and ${X}_{0}^{p}={X}_{0}^{e}$ . The filtering equations are given as

$\{\begin{array}{l}{x}_{n+1}^{p}={A}_{n}\left(I-{N}_{n}{C}_{n}\right){x}_{n}^{p}+{A}_{n}{N}_{n}\left({y}_{n}-{D}_{n}\right)+{B}_{n}\\ {x}_{n+1}^{e}=\left(I-{N}_{n+1}{C}_{n+1}\right){x}_{n+1}^{p}+{N}_{n+1}\left({y}_{n+1}-{D}_{n+1}\right)\\ {N}_{n}={Q}_{n}^{p}{C}_{n}^{\text{T}}{\left({C}_{n}{Q}_{n}^{p}{C}_{n}^{\text{T}}+{W}_{n}\right)}^{-1}\\ {Q}_{n+1}^{p}={A}_{n}\left(I-{N}_{n}{C}_{n}\right){Q}_{n}^{p}{A}_{n}^{\text{T}}+{K}_{n}\end{array}$ (5)

Since ${Q}_{n}^{e}=Var\left({x}_{n}-{x}_{n}^{e}\right)$ , one has ${Q}_{n}^{e}=\left(I-{N}_{n}{C}_{n}\right){Q}_{n}^{p}+{N}_{n}{W}_{n}{N}_{n}^{\text{T}}$ . On the other hand, ${Q}_{0}^{p}=Var\left({x}_{0}\right)+Var\left({x}_{0}^{p}\right)$ , and if ${x}_{0}^{p}$ is chosen to be constant (0 for example), then its variance vanishes and ${Q}_{0}^{p}=Var\left({x}_{0}\right)$ . The techniques developed for linear filtering can sometimes be extended to the nonlinear case by means of linearization methods [21]. However, there are more general results that apply in nonlinear cases, such as particle filtering.
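Written as code, one pass of Equation (5) reads as follows (a scalar sketch with our own naming; in the matrix case the division becomes a matrix inverse and the products become matrix products):

```python
def kalman_step(xp, Qp, y, A, B, C, D, K, W):
    """One step of the discrete Kalman filter of Equation (5), scalar case.

    xp, Qp : prediction x^p_n and its error variance Q^p_n
    y      : current observation y_n
    Returns the filtered estimate x^e_n together with the next
    prediction x^p_{n+1} and its variance Q^p_{n+1}.
    """
    N = Qp * C / (C * Qp * C + W)              # gain N_n
    xe = (1.0 - N * C) * xp + N * (y - D)      # filtered estimate x^e_n
    xp_next = A * xe + B                       # = A(I - NC)x^p + AN(y - D) + B
    Qp_next = A * (1.0 - N * C) * Qp * A + K   # Q^p_{n+1}
    return xe, xp_next, Qp_next

# toy run: nearly noiseless direct observation of a constant state
xp, Qp = 0.0, 1.0
for y in (5.0, 5.0, 5.0):
    xe, xp, Qp = kalman_step(xp, Qp, y, A=1.0, B=0.0, C=1.0, D=0.0, K=0.0, W=1e-9)
```

Note that the prediction line is an algebraic rearrangement of the first line of (5): substituting ${x}_{n}^{e}$ into ${x}_{n+1}^{p}={A}_{n}{x}_{n}^{e}+{B}_{n}$ recovers ${A}_{n}\left(I-{N}_{n}{C}_{n}\right){x}_{n}^{p}+{A}_{n}{N}_{n}\left({y}_{n}-{D}_{n}\right)+{B}_{n}$ exactly.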

4. Presentation of the FRM-KF

4.1. Model Statement

We consider the progressive transmission of a JPEG2000 image, encoded in bit planes. We assume the transmission is done bit plane by bit plane, over a narrow network channel. Because of the poor network quality, the receiver cannot wait until all the data have been transmitted before decoding and displaying the image. Moreover, the transmission can stop unpredictably at any time. Thus, the receiver has to use the data received so far to estimate the whole image as well as possible. A first approach consists in simply refreshing the estimated image with newly received layers and displaying the result when its quality reaches a given threshold. Instead, we learn from the successive bitplanes or layers, considered as partial observations of the image, to infer the missing parts. Hence, the bitplane transmission can be viewed as a dynamic system with partial observations. Since image structures are variable, we can use a representative sample of coefficients for statistical inference.

Let *S* be such an image, and
${\left\{{L}_{n}\right\}}_{1\le n\le M}$ the sequence of bitplanes extracted from *S*. Transmitting *S* consists in transmitting the layers
${L}_{M},{L}_{M-1},\cdots ,{L}_{1}$ . We call
${X}_{n}$ the part of *S* yet to be transmitted, after the sequence
${L}_{M},\cdots ,{L}_{M-n+1}$ , has been transmitted: that is the “residual”
$S-\left\{{L}_{M}\mathrm{,}\cdots \mathrm{,}{L}_{M-n+1}\right\}$ . For convenience, we also say
${X}_{0}=S$ ,
${Y}_{0}=0$ and
${Y}_{n}={L}_{M-n+1}$ .

Deterministically, we can then write

$\{\begin{array}{l}{X}_{n+1}={X}_{n}-{Y}_{n+1}\hfill \\ {X}_{0}=S,{Y}_{0}=0\hfill \end{array}$ (6)

However, following our purpose of inference, a stochastic description is needed here. Hence, from the receiver’s viewpoint, the following model that recalls the problem (2) can be considered:

${X}_{n+1}=\alpha {X}_{n}-\beta -{\epsilon}_{n}$ (7)

${Y}_{n+1}=\frac{1-\alpha}{\alpha}{X}_{n+1}+\frac{\beta}{\alpha}+\frac{{\epsilon}_{n}}{\alpha}+{\omega}_{n+1}$ (8)

with $\alpha \mathrm{,}\beta \in \mathbb{R}$ , ${\epsilon}_{n}\sim \mathcal{N}\left(\mathrm{0,}{\gamma}_{n}^{2}\right)$ , ${\omega}_{n}\sim \mathcal{N}\left(\mathrm{0,}{\sigma}_{n}^{2}\right)$ , where ${\gamma}_{0}>0$ , ${\gamma}_{n+1}={a}^{n+1}{\gamma}_{0}$ , ${\sigma}_{0}=0$ and ${\sigma}_{n+1}={b}^{n}{\sigma}_{1}+\frac{c}{1-b}$ .

Equation (7) describes the dynamics of the remaining information to be received, while Equation (8) gives the next layer to be received. Indeed, we make the hypothesis of an arithmetico-geometric progression of the part of the image that remains to be sent ( ${X}_{n}$ ). In the same manner, we assume an affine relationship between the current layer to be sent ( ${Y}_{n}$ ) and the current part of the image that remains to be sent ( ${X}_{n}$ ). The choice of an affine model is both simple and natural for a first modeling, and it will prove reasonable. Notice that ${Y}_{0}=0$ and that, by formulation of the problem, ${X}_{0}$ follows the uniform law $\mathcal{U}\left(\left[\mathrm{0;255}\right]\right)$ . Indeed, apart from the fact that the coefficients belong to $\left[\mathrm{0;255}\right]\cap \mathbb{N}$ , we do not have any prior information on them, and the complete information is given by

$S={X}_{n}+{\displaystyle {\sum}_{i=0}^{n}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{Y}_{i},\text{\hspace{0.17em}}\forall n=0,\cdots ,M.$ (9)
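Equations (6) and (9) can be checked deterministically on a single 8-bit pixel, where the layers are simply the bitplane contributions (a small sketch with our own helper name):

```python
def layers_from_value(s):
    """Bitplane contributions of an 8-bit value s, most significant first.

    Y[n] is what the (n+1)-th transmitted plane adds; the residual
    X_n = s - (Y_1 + ... + Y_n) decreases to 0, which is Equation (9)
    specialized to one pixel.
    """
    Y = [s & (1 << (7 - n)) for n in range(8)]   # MSB contribution first
    residuals, X = [], s
    for y in Y:
        X -= y                                   # Equation (6): X_{n+1} = X_n - Y_{n+1}
        residuals.append(X)
    return Y, residuals

Y, residuals = layers_from_value(200)   # 200 = 0b11001000
```

For s = 200 the layers are 128, 64, 0, 0, 8, 0, 0, 0: the residual drops quickly at first and is exactly zero once every plane has arrived, which motivates the geometric decay assumed in (7).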

Equation (7) implies an exponential evolution of the estimation errors and of their variances. The filtering procedure consists in determining the mathematical expectation of
${X}_{n}$ conditionally upon
${Y}_{\mathrm{0:}n}$ , at each step
$n=1,\cdots ,7$ . The choice of the upper bound *n* = 7 is motivated by the fact that we process images channel by channel: for a true-color image, the red, green and blue channels are each coded on 8 bits (numbered from 0 to 7). The estimation
${S}_{n}$ of *S* is given by

${S}_{n}=\mathbb{E}\left[{X}_{n}|{Y}_{0},\cdots ,{Y}_{n}\right]+{\displaystyle {\sum}_{i=0}^{n}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{Y}_{i}.$ (10)

Proposition 1. If $0<\left|\alpha \right|<1$ then

$\underset{n\to \infty}{\mathrm{lim}}\mathbb{E}\left[{X}_{n}\right]+\frac{\beta}{1-\alpha}=\underset{n\to \infty}{\mathrm{lim}}\mathbb{E}\left[{Y}_{n}\right]=0.$

Moreover, if $0\le a<1$ and $0<b<1$ , then

$\underset{n\to \infty}{\mathrm{lim}}Var\left[{X}_{n}\right]=0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\underset{n\to \infty}{\mathrm{lim}}Var\left[{Y}_{n}\right]={\left(\frac{c}{1-b}\right)}^{2}.$

*Proof*. Let
${U}_{n}=\mathbb{E}\left[{X}_{n}\right]$ . One has

${U}_{n+1}=\alpha {U}_{n}-\beta ={\alpha}^{n+1}{U}_{0}-\frac{1-{\alpha}^{n+1}}{1-\alpha}\beta $

Hence, $\mathbb{E}\left[{X}_{n}\right]={\alpha}^{n}\mathbb{E}\left[{X}_{0}\right]-\frac{1-{\alpha}^{n}}{1-\alpha}\beta $ and if $\left|\alpha \right|<1$ then $\underset{n\to \infty}{\mathrm{lim}}{\alpha}^{n}=0$ and $\underset{n\to \infty}{\mathrm{lim}}\mathbb{E}\left[{X}_{n}\right]=-\frac{\beta}{1-\alpha}$ . On the other hand, $\mathbb{E}\left[{Y}_{n}\right]=\frac{1-\alpha}{\alpha}\mathbb{E}\left[{X}_{n}\right]+\frac{\beta}{\alpha}$ and

$\underset{n\to \infty}{\mathrm{lim}}\mathbb{E}\left[{Y}_{n}\right]=\underset{n\to \infty}{\mathrm{lim}}\frac{1-\alpha}{\alpha}\mathbb{E}\left[{X}_{n}\right]+\frac{\beta}{\alpha}=0$ (11)

For the second part of Proposition 1,

$Var\left[{X}_{n+1}\right]={\alpha}^{2}Var\left[{X}_{n}\right]+Var\left[{\epsilon}_{n}\right]$ (12)

$={\alpha}^{2\left(n+1\right)}Var\left[{X}_{0}\right]+\frac{1-{\left({a}^{2}{\alpha}^{-2}\right)}^{n+1}}{1-{a}^{2}{\alpha}^{-2}}{\alpha}^{2n}{\gamma}_{0}^{2}$ (13)

and

$Var\left[{Y}_{n+1}\right]={\left(1-\alpha \right)}^{2}Var\left[{X}_{n}\right]+Var\left[{\epsilon}_{n}\right]+Var\left[{\omega}_{n+1}\right]={\left(1-\alpha \right)}^{2}Var\left[{X}_{n}\right]+Var\left[{\epsilon}_{n}\right]+{\left({b}^{n}{\sigma}_{1}+\frac{c}{1-b}\right)}^{2}$ (14)

Hence, if additionally $0\le a<1$ and $0<b<1$ , then $\underset{n\to \infty}{\mathrm{lim}}Var\left[{X}_{n}\right]=\underset{n\to \infty}{\mathrm{lim}}Var\left[{\epsilon}_{n}\right]=0$ and $\underset{n\to \infty}{\mathrm{lim}}Var\left[{Y}_{n}\right]={\left(\frac{c}{1-b}\right)}^{2}$ .

Proposition 1 shows that in the long run ( $n\to \infty $ ), the remaining information about the image is predictable and tends to $-\frac{\beta}{1-\alpha}$ , while the layers to be received tend to zero. That is realistic, since only a finite number of layers is needed. Following the same principle, the remaining information should vanish in the long run, so we should have $\beta =0$ . We adopt this value in the remainder of the work.
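Proposition 1 can also be checked numerically by iterating the moment recursions of the model; the parameter values below are illustrative placeholders, not the calibrated estimates:

```python
# Iterate the mean/variance recursions of model (7)-(8) and compare
# with the limits of Proposition 1. All numeric values are illustrative.
alpha, beta = 0.5, 0.0
a, b, c = 0.6, 0.7, 0.3
sigma1, gamma0 = 1.0, 2.0

EX, VX = 127.5, 255**2 / 12                    # E[X_0], Var[X_0] for U([0;255])
for n in range(200):
    EX = alpha * EX - beta                     # mean recursion
    VX = alpha**2 * VX + (a**n * gamma0) ** 2  # variance recursion, Eq. (12)

sigma_inf = c / (1 - b)                        # limit of the sigma_n sequence
VY = (1 - alpha) ** 2 * VX + 0.0 + sigma_inf ** 2   # limiting Var[Y_n]
```

After 200 steps, `EX` and `VX` are numerically zero and `VY` equals `(c / (1 - b)) ** 2`, matching the stated limits.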

The use of the Kalman filter also gives us the benefit of its memoryless character: it only retains the previous state to infer the current one, so it is not necessary to keep all the previously computed states in memory for the prediction.

4.2. Calibration and Validation of the Model

The dynamics of the conditional distribution (characterized by its mean vector and its variance-covariance matrix) is governed by the filtering equations. In order to determine the coefficients
$\alpha $ ,
$\beta $ , *a*, *b* and *c*, we proceed by statistical regressions on the sample
$\left[\mathrm{0;255}\right]\cap \mathbb{N}$ corresponding to all possible values of a block of pixels. Regression aims at identifying the set of parameters which minimizes the sum of squared errors (SSE) of the fitted model. Precisely, we shall determine
$\alpha $ and
$\beta $ which minimize the quantity

${\text{SSE}}_{1}={\displaystyle {\sum}_{n=0}^{7}}{\left(\frac{1}{256}{\displaystyle {\sum}_{i=0}^{255}}\left({X}_{n+1}^{i}-\alpha {X}_{n}^{i}+\beta \right)\right)}^{2}$

$={\displaystyle {\sum}_{n=0}^{7}}{\left({\stackrel{\xaf}{X}}_{n+1}-\alpha {\stackrel{\xaf}{X}}_{n}+\beta \right)}^{2}$ (18)

Since we adopted
$\beta =0$ , it remains to find
$\alpha $ such that (18) is minimal. After
$\alpha $ and
$\beta $ have been identified, one obtains consecutively *a*, *b* and *c* by minimizing the following SSEs:

${\text{SSE}}_{2}={\displaystyle {\sum}_{n=0}^{7}}{\left({\gamma}_{n+1}-a{\gamma}_{n}\right)}^{2}={\displaystyle {\sum}_{n=0}^{7}}{\left({S}_{{X}_{n+1}}-a{S}_{{X}_{n}}\right)}^{2}$ (19)

and

${\text{SSE}}_{3}={\displaystyle {\sum}_{n=0}^{7}}{\left({\sigma}_{n+1}-b{\sigma}_{n}-c\right)}^{2}={\displaystyle {\sum}_{n=0}^{7}}{\left({S}_{{Y}_{n+1}}-b{S}_{{Y}_{n}}-c\right)}^{2}.$ (20)

In (18), (19) and (20), we have for $Z=X,Y$ ,

${\stackrel{\xaf}{Z}}_{n}=\frac{1}{256}{\displaystyle \underset{i=0}{\overset{255}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{Z}_{n}^{i}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{S}_{{Z}_{n}}^{2}=\frac{1}{255}{\displaystyle \underset{i=0}{\overset{255}{\sum}}}{\left({Z}_{n}^{i}-{\stackrel{\xaf}{Z}}_{n}\right)}^{2}.$
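The minimizers of (18), (19) and (20) are ordinary least-squares estimates. A short sketch (our own helper functions, demonstrated on synthetic sequences rather than the real bitplane statistics):

```python
def fit_ratio(z_next, z_cur):
    """Slope through the origin: argmin_k sum (z_next - k * z_cur)^2.

    Used for alpha in (18) (with beta = 0) and for a in (19).
    """
    return sum(p * q for p, q in zip(z_next, z_cur)) / sum(q * q for q in z_cur)

def fit_affine(z_next, z_cur):
    """Ordinary least squares z_next ~ b * z_cur + c, as in (20)."""
    n = len(z_cur)
    mx, my = sum(z_cur) / n, sum(z_next) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(z_cur, z_next))
         / sum((x - mx) ** 2 for x in z_cur))
    return b, my - b * mx

# synthetic check: means decaying geometrically with ratio 0.8,
# standard deviations following sigma_{n+1} = 0.7 * sigma_n + 0.3
means = [100 * 0.8 ** n for n in range(9)]
sigmas = [2 * 0.7 ** n + 1 for n in range(9)]
alpha_hat = fit_ratio(means[1:], means[:-1])
b_hat, c_hat = fit_affine(sigmas[1:], sigmas[:-1])
```

On these exactly arithmetico-geometric sequences the fits recover the generating parameters (0.8, and 0.7 with 0.3), which is the sanity check one would run before fitting the real per-bitplane means and standard deviations.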

Following the aforementioned regressions, we obtained Table 1.

Note that all the parameters satisfy the hypotheses of Proposition 1 and therefore guarantee the exponential convergence of the filter.

5. Experimental Evaluation of FRM-KF

This section applies the filtering procedure described above to a sample of 210 images from the University of Southern California Signal and Image Processing Institute (USC-SIPI^{1}) database. We used a computer workstation with the following characteristics: 4 GB of RAM, a 4-core Intel Core i3-3227U CPU @ 1.90 GHz (32-bit), running Ubuntu 18.04 (Linux kernel 4.15.0-74-generic).

Table 1. Estimates of parameters.

According to Section 3, we have ${A}_{n}=\alpha $ , ${B}_{n}=\beta $ , ${C}_{n}=\frac{1-\alpha}{\alpha}$ , ${D}_{n}=\frac{\beta}{\alpha}$ , ${K}_{n}={\gamma}_{n}^{2}$ , ${W}_{n}=\frac{{\gamma}_{n}^{2}}{{\alpha}^{2}}+{\sigma}_{n}^{2}$ and ${Q}_{0}^{p}\in \left\{\frac{{255}^{2}}{12}\mathrm{;}\frac{{255}^{2}}{6}\right\}$ . Recall that a uniform law $\mathcal{U}\left(\left[a\mathrm{,}b\right]\right)$ has variance $\frac{{\left(b-a\right)}^{2}}{12}$ . Figure 1 illustrates the evolution of the visual rendering of the images as quality layers are received and the filtering procedure is applied.
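Plugging these mappings into the filter of Section 3 gives, for one pixel block, the following sketch. The parameter values and received layers are illustrative placeholders, not the calibrated values of Table 1:

```python
# Scalar FRM-KF recursion with A_n = alpha, B_n = 0, C_n = (1-alpha)/alpha,
# D_n = 0, K_n = gamma_n^2 and W_n = gamma_n^2/alpha^2 + sigma_n^2.
# All numeric values below are illustrative, not the calibrated ones.
alpha, a, b, c = 0.5, 0.6, 0.7, 0.05
sigma1, gamma0 = 0.4, 8.0

C = (1.0 - alpha) / alpha
xp, Qp = 127.5, 255 ** 2 / 12          # prior mean/variance from U([0;255])
layers = [64.0, 32.0, 16.0, 8.0]       # toy received layers Y_1..Y_4
S_hat = 0.0

for n, y in enumerate(layers):
    gamma2 = (a ** n * gamma0) ** 2                  # K_n
    sigma2 = (b ** n * sigma1 + c / (1 - b)) ** 2    # illustrative sigma_n^2
    W = gamma2 / alpha ** 2 + sigma2                 # W_n
    N = Qp * C / (C * Qp * C + W)                    # gain
    xe = (1 - N * C) * xp + N * y                    # filtered residual X_n^e
    S_hat = xe + sum(layers[: n + 1])                # Equation (10)
    xp = alpha * xe                                  # next prediction
    Qp = alpha * (1 - N * C) * Qp * alpha + gamma2   # next error variance
```

With these toy numbers, the running estimate `S_hat` stabilizes near the sum of the received layers plus a small predicted residual, and the error variance `Qp` shrinks at every step, which is the qualitative behavior shown in Figure 1.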

Compared to the results in [5], the visual rendering they obtained at their fifth step is obtained here at the 3^{rd} step (Figure 1(d)), corresponding to a good visual quality for human perception. The Peak Signal-to-Noise Ratio (PSNR) was measured for the successive estimated images based on the received layers. We compared our PSNRs to those of reference methods: the Set Partitioning In Hierarchical Trees (SPIHT) method, the method of Tzu-Chuen Lu and the method of Tung [5].
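The PSNR used here is the usual definition for 8-bit images; a minimal reference implementation:

```python
import math

def psnr(ref, est, peak=255.0):
    """Peak Signal-to-Noise Ratio between two equal-size images,
    given as flat sequences of pixel values (higher is better)."""
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    if mse == 0:
        return float("inf")        # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

For instance, an estimate that is off by one gray level everywhere scores about 48.13 dB, while a perfect reconstruction has infinite PSNR.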

A regression analysis showed, for all the considered methods, that there is an affine relation between the number of received layers and the measured PSNR (adjusted R-squared of at least 93%), with highly significant^{2} slope and intercept.

Figure 1. Visual rendering of the model on Lenna (256 × 256), from (b) to (h). (a) is the original Lenna (256 × 256) image compressed with the OpenJPEG library (version 2.3.0, October 2017), with 6 resolution levels and 8 quality levels (80%, 70%, 60%, 50%, 40%, 30%, 20% and 1%, the original image being of high quality).

Table 2 gives the regression coefficients of each method for the 256 × 256 and 512 × 512 resolutions of the Lenna image studied in [5].

Table 3 gives the difference between the regression coefficients of each considered method and those of FRM-KF. The intercepts show that the Tzu-Chuen Lu method has the best initial PSNR while FRM-KF has the worst, probably because its initial estimate is drawn uniformly at random. Fortunately, FRM-KF has the best slope, about 1.63 times larger than the second-highest slope, displayed by the SPIHT method. Hence, after only a few layers (from the 3rd), FRM-KF presents the best performance among the compared methods.

The USC-SIPI database contains 73 images with a 256 × 256 resolution, 83 images with a 512 × 512 resolution, 53 images with a 1024 × 1024 resolution and only 1 image with a 2050 × 2050 resolution. For our statistical analyses we therefore focused on the 256 × 256, 512 × 512 and 1024 × 1024 resolutions. Again we found an affine relation between the PSNR and the number of received layers. The P-value was less than 2 × 10^{−16} and the adjusted *R*^{2} (model fitting factor) was between 88.77% and 96.96%. In order to give the general behavior of the FRM-KF method, the computed values of the slope and the intercept are given in Table 4.

Table 2. PSNR as function of the number of received layers.

Table 3. Difference of regression coefficients of each method with respect to FRM-KF.

Table 4. Regression coefficients of the PSNR as linear function of received layers using FRM-KF method.

We evaluated the time needed to decode the images. The first phase, consisting of generating white noise, decoding the first quality layer of the original image, and combining the two, took about 2.24 × 10^{−1} ± 2.868 × 10^{−2}, 8.87 × 10^{−1} ± 5.612 × 10^{−2} and 3.505 ± 1.587 × 10^{−1} seconds (mean ± standard deviation) for the 256 × 256, 512 × 512 and 1024 × 1024 resolutions respectively. The time needed to decode each further quality layer and combine it with the previous result was 3.149 × 10^{−2} ± 4.012 × 10^{−3}, 1.223 × 10^{−1} ± 1.024 × 10^{−2} and 4.95 × 10^{−1} ± 4.009 × 10^{−2} seconds for the 256 × 256, 512 × 512 and 1024 × 1024 resolutions respectively. Images used on current mobile devices have a resolution of at least 512 × 512. Given the processing time of the 1024 × 1024 images, we recommend the FRM-KF method for resolutions less than or equal to 512 × 512.

Focusing on the amount of data transmitted during a streaming of images for each quality layer, we notice that less than 10% of the image data is needed to reach the sixth quality layer. The process is therefore suitable, in terms of processing and memory resources, for small devices with low computing capabilities.

6. Conclusions

This work addressed the problem of image transmission in resource-limited environments. We were interested in the progressive transmission and refinement of still images, as a process that adapts to low-quality network service. To achieve our objectives, we proposed a stochastic model which represents the missing parts of the image as noise effects. In a stochastic context, the problem of dynamically estimating a signal conditionally upon the available observations is known as filtering. We successfully calibrated a Kalman filter model using statistical regression and some general considerations. The resulting model is precisely a discrete Kalman filter.

Applying the filtering procedure to a dataset of 209 images, we obtained satisfactory results. We evaluated the evolution of the Peak Signal-to-Noise Ratio (PSNR) with respect to the number of received layers. An affine relation was found regardless of the PIT method considered (Set Partitioning In Hierarchical Trees, Tzu-Chuen, Tung and FRM-KF methods). The FRM-KF approach we proposed appeared to be the one which improves the PSNR fastest.

The performance of the FRM-KF method has been further evaluated in terms of image quality per amount of data sent and image quality per processing time. A high quality was reached quickly with relatively little data (less than 10% of the image data is needed to reach the sixth quality layer). The processing time also decreases quickly with the number of received layers. However, we found that the processing time may become large from an image resolution of 1024 × 1024 upward. Hence, we recommend the FRM-KF method for resolutions less than or equal to 512 × 512.

In future work, we expect to extend our method to multimedia communication environments subject to disturbances, in order to ensure robustness to breakdowns and interference. We also plan to adapt our approach to video streaming, in order to ensure greater continuity of the video streaming service content.

NOTES

^{1}http://sipi.usc.edu/database/

^{2}Significance codes under R software in terms of P-value: 0 “***” 0.001 “**” 0.01 “*” 0.05 “•” 0.1 “ ” 1.

References

[1] Kiely, A.B. (1996) Progressive Transmission and Compression of Images. The Telecommunications and Data Acquisition Progress Report 42-124, Jet Propulsion Laboratory, Pasadena, California, October-December 1995, 88-103.

https://tmo.jpl.nasa.gov/progress_report/42-124/124E.pdf

[2] Chen, C., Zhu, X., de Veciana, G., Bovik, A.C. and Heath, R.W. (2015) Rate Adaptation and Admission Control for Video Transmission with Subjective Quality Constraints. IEEE Journal of Selected Topics in Signal Processing, 9, 22-36.

https://doi.org/10.1109/JSTSP.2014.2337277

[3] Servetto, S.D. and Vetterli, M. (2000) High-Bandwidth Internet Video Telephony. The 10th International Packet Video Workshop, Forte Village Resort, Cagliari, 1-2 May 2000.

https://infoscience.epfl.ch/record/34083?ln=fr

[4] Boujelbene, R., Jemaa, Y.B. and Zribi, M. (2019) A Comparative Study of Recent Improvements in Wavelet-Based Image Coding Schemes. Multimedia Tools and Applications, 78, 1649-1683.

https://doi.org/10.1007/s11042-018-6262-4

[5] Lu, T.-C. and Chang, C.-C. (2007) A Progressive Image Transmission Technique Using Haar Wavelet Transformation. International Journal of Innovative Computing, Information and Control, 3, 1449-1461.

[6] Christopoulos, C., Skodras, A. and Ebrahimi, T. (2000) The JPEG2000 Still Image Coding System: An Overview. IEEE Transactions on Consumer Electronics, 46, 1103-1127.

https://doi.org/10.1109/30.920468

[7] Rabbani, M. (2002) JPEG2000: Image Compression Fundamentals, Standards and Practice. Journal of Electronic Imaging, 11, 286.

https://doi.org/10.1117/1.1469618

[8] Chui, C.K. and Chen, G. (2017) Kalman Filtering: With Real-Time Applications. Springer International Publishing, Cham.

[9] Grewal, M.S. (2011) Kalman Filtering. In: Lovric, M., Ed., International Encyclopedia of Statistical Science, Springer, Berlin, Heidelberg, 705-708.

https://doi.org/10.1007/978-3-642-04898-2_321

[10] Kalman, R.E. (1960) A New Approach to Linear Filtering and Prediction Problems. Journal of Basic Engineering, 82, 35-45.

https://doi.org/10.1115/1.3662552

[11] Jiang, J.-H., Chang, C.-C. and Chen, T.-S. (1997) Selective Progressive Image Transmission Using Diagonal Sampling Technique. Proceedings of International Symposium on Digital Media Information Base, Nara, 26-28 November 1997, 59-67.

[12] Tzou, K.-H. (1987) Progressive Image Transmission: A Review and Comparison of Techniques. Optical Engineering, 26, Article ID: 267581.

https://doi.org/10.1117/12.7974121

[13] Ambadekar, S.P., Jain, J. and Khanapuri, J. (2019) Digital Image Watermarking through Encryption and DWT for Copyright Protection. In: Bhattacharyya, S., Mukherjee, A., Bhaumik, H., Das, S. and Yoshida, K., Eds., Recent Trends in Signal and Image Processing, Springer, Singapore, 187-195.

https://doi.org/10.1007/978-981-10-8863-6_19

[14] Chen, T.-S. and Chang, C.-C. (1997) Progressive Image Transmission Using Side Match Method. IPSJ International Symposium on Information Systems and Technologies for Network Society, Japan, 24-26 September 1997, 191-198.

[15] Welch, G. and Bishop, G. (2006) An Introduction to the Kalman Filter. TR 95-041, University of North Carolina, Chapel Hill.

[16] Anderson, B.D. and Moore, J.B. (1979) Optimal Filtering. Prentice-Hall, Englewood Cliffs, 21.

[17] Durbin, J. and Koopman, S.J. (2012) Time Series Analysis by State Space Methods. Oxford University Press, Oxford.

https://doi.org/10.1093/acprof:oso/9780199641178.001.0001

[18] Auger, F., Hilairet, M., Guerrero, J.M., Monmasson, E., Orlowska-Kowalska, T. and Katsura, S. (2013) Industrial Applications of the Kalman Filter: A Review. IEEE Transactions on Industrial Electronics, 60, 5458-5471.

https://doi.org/10.1109/TIE.2012.2236994

[19] Carraro, C. (1989) A Few Problems with the Application of the Kalman Filter. In: Decarli, A., Francis, B.J., Gilchrist, R. and Seeber, G.U.H., Eds., Statistical Modelling, Vol. 57, Springer, New York, 75-83.

https://doi.org/10.1007/978-1-4612-3680-1_9

[20] Lang, T. and Dunne, D. (2008) Application of Particle Filters in a Hierarchical Data Fusion system. 2008 11th International Conference on Information Fusion, Cologne, 30 June-3 July 2008, 1-7.

[21] Ristic, B., Arulampalam, S. and Gordon, N. (2003) Beyond the Kalman Filter: Particle Filters for Tracking Applications. Artech House, Washington DC.