Dynamic Indoor Localization Using Maximum Likelihood Particle Filtering

A popular approach for solving the indoor dynamic localization problem based on WiFi measurements consists of using particle filtering. However, a drawback of this approach is that a very large number of particles are needed to achieve accurate results in real environments. The reason for this drawback is that, in this particular application, classical particle filtering wastes many unnecessary particles. To remedy this, we propose a novel particle filtering method which we call maximum likelihood particle filter (MLPF). The essential idea consists of combining the particle prediction and update steps into a single one in which all particles are efficiently used. This drastically reduces the number of particles, leading to numerically feasible algorithms with high accuracy. We provide experimental results, using real data, confirming our claim.


Introduction
Wireless positioning is the most popular approach for indoor location-based services. It finds applications in emergency rescue, smart home, security monitoring and many other areas. In contrast to outdoor positioning, whose dominant approach is based on the global positioning system (GPS), indoor positioning is hindered by multiple sources of interference and multipath effects in time-varying environments. In view of this, multiple techniques for indoor positioning are currently being investigated. The most popular ones are based on radio frequency identification [1,2], Bluetooth signals [3,4], ultra-wideband signals [5,6] and WiFi signals [7,8]. Due to their large coverage, low implementation cost and availability in existing mobile devices, WiFi-based indoor positioning is the most promising technique.
A broad classification of indoor localization methods can be done in two categories, namely, geometric mapping and fingerprinting [9]. In geometric mapping, the target's position is estimated using geometric information like distances and angles from the target to a set of access points (APs) [10]. Typical techniques for obtaining geometric information are time of flight (ToF) [11] and angle of arrival (AoA) [12]. This approach typically requires obtaining information from more than three APs. the resulting methods are based on line of sight (LOS) signals, and therefore are easily influenced by obstacles. Strict synchronization is also needed between different devices. The above limitations are overcome by the fingerprinting method [13]. It consists of building a database containing the values of certain received signal features at a given set of known locations. Localization is then done by comparing the features obtained at the target's location with those of the database. Localization based on fingerprinting is more immune to distortions caused by the environment, since the latter are coded in the feature database. However, it suffers from poor positioning accuracy when the target is in a position far from anyone in the database.
The most commonly WiFi signal feature used for indoor localization is its received signal strength (RSS) [14]. It measures the signal power strength at the receiver side. Although RSS measurements are easy to obtain, only a very rough estimation of the receiver's position can be obtained from it. This is due to the fading and distortion suffered by signals propagating through multipaths. To tackle this problem, in recent years channel state information (CSI) has been used instead of RSS [9]. CSI contains information of the phase and amplitude value of each subcarrier of an orthogonal frequency division multiplexing (OFDM) channel. It can be readily obtained from any OFDM system based on the 802.11n protocol. Since multipath and fading information is represented in the CSI, its use for indoor localization leads to more accurate and stable results in comparison to RSS.
To improve the accuracy of any given localization method, extra information can be obtained from a motion model of the target. The common solution for fusing the information from the motion model with that from measurements consists of using some Bayesian tracking technique. We refer to resulting techniques as dynamic localization methods [15]. One such techniques is the Kalman filter [16][17][18]. While being optimal and having a closed-form, it only applies to linear Gaussian model. This assumption can be relaxed using the extended Kalman filter [19]. However, since this method is simply based on linearizing a non-linear model, it often leads to inaccurate results. This can be overcome by using particle filtering [20][21][22][23]. This essentially consists of approximating non-Gaussian probability distributions using a number of samples (particles) multiplied by their associated weights.
As it is known, a Bayesian tracking method consists of alternating two steps called prediction and update. In a particle filter, a set of new particles is drawn during each prediction step, whose weighs are determined during the subsequent update step. A drawback of this approach is that this weighting results in little use of those particles whose weights are very small. The number of wasted particles can be very large in the case where measurements already provide enough information to determine the target's location with a reasonable accuracy, without using the information provided by the motion model. This is often the case in localization setups, for otherwise, algorithms not using a motion model would not work with reasonable accuracy. Consequently, a very large number of particles are required to avoid particle depletion after the update step, which would result in an inaccurate result. The vast number of required particles renders the approach numerically intractable in many applications.
In order to tackle the above shortcoming, a number of methods are available to reduce the estimation error resulting from using particle filtering with a restricted number of particles. In [21], a machine learning classifier is added to decide whether particles are in the correct room. In [22], a context variable, with values in the set {'free space', 'constrained space', 'static space'}, is added to the state to account for the movement constraints at the target's current position. Also, the authors of [23] divide the environment into reachable and unreachable areas, so as to discard particles falling into an unreachable area.
Generally speaking, the above methods aim to reduce the particle depletion phenomenon described above by complementing the motion model with environmentdependent information to decide where to draw particles. The aim of this work is to make the most of this principle, i.e., ideally to totally avoid particle depletion. To this end, we propose a novel particle filtering scheme which combines the prediction and update steps into a single one. In other words, we avoid drawing unnecessary particles during the prediction step that will afterwards disappear during the subsequent update step. This drastically reduces the required number of particles, making the approach numerically feasible in virtually any application. In deriving our method, we exploit the principle underlying a Bayesian tracking method called maximum likelihood Kalman filtering (MLKF) [24]. We then call the proposed method the maximum likelihood particle filter (MLPF).
The authors used the MLKF in [25] to propose a dynamic localization method which only applies to the case in which the motion model is linear and Gaussian. In this sense, the method proposed in this work can be thought of as the generalization of the method we proposed in [25] to the case of non-linear motion models. This kind of models are typically used in robotics [26], and as we show with experiments using real data, leads to great accuracy improvements.
The rest of the paper is organized as follows: In Section 2, we state the dynamic localization problem. In Section 3, we describe the particle filtering approach for solving it. In Section 4, we derive the proposed MLPF method. In Section 5, we present experimental evidence, showing the numerical advantages offered by the proposed method. Concluding remarks are given in Section 6. For ease of readability, all proofs appear in Appendix C .

Problem Description
We assume there is a target moving in an indoor environment. The target's Cartesian coordinates at time t are y t = [a t , b t ] ∈ R 2 and θ t ∈ (−π, π] is its orientation in radians, as shown in Figure 1. The target's motion model is described by the following non-linear state-transition equation where u t is the target's control input and e t = ã t ,b t and n t are process noises accounting for the model's inaccuracy. To write the above in compact form we define the target's pose x t = y t , θ t ∈ R 3 and the process noise w t = e t , n t . We then write with f defined according to (1) and (2). We assume that x 1 = 0 and the sequence w t ∼ N (0, Q(u t )) is i.i.d., with Q(u) being a diagonal positive definite matrix for each u.

Remark 1.
We state the motion model in the general form (1) in order to make our presentation valid for the different models used in the literature. In Appendix A we briefly describe the most popular models falling into this general form.
At each time t ∈ N, we obtain the following measurement where the map m : R 2 → R N is known and the measurement noise v t ∼ N (0, R) is i.i.d. and independent of w t . Problem 1. Let Z T = z 1 , . . . , z T denote the set of all measurements available up to time T. The dynamic localization problem consists of using, at time T, the measurements Z T to obtain an estimatex T of x T .

Available Solution Using Particle Filtering
As described in Section 1, the dynamic localization problem can be solved using the well-known particle filtering method [27]. In this section, we describe this approach.
The Bayesian filtering equations are initialized by p(x 1 |Z 1 ) = δ(x 1 ), and proceed as follows Equations (5) and (6) are called the prediction and update steps, respectively. These equations do not have close form expressions, in general. Hence, a numeric method is needed to approximately compute them. This can be done using particle filtering. More precisely, we start with Then, at time t > 1, the prediction step is computed by with x i t|t−1 obtained by drawing it from the following distribution Also, the update step is computed by with i t|t normalized so that ∑ I i=1 i t|t = 1. The use of particle filtering for solving the localization problem permits accommodating the accuracy by increasing the number of particles. As mentioned in Section 1, a problem that often occurs is that the desired accuracy requires a very large number of particles. This can result in a numerically very expensive algorithm. To address this issue, in the next section we propose a variant of the particle filtering algorithm, which drastically reduces the number of particles required to achieve a given accuracy.

Proposed Algorithm
In the particle filtering algorithm described in Section 3, at time t, we use the approximation (9) of the updated distribution p(x t−1 |Z t−1 ), together with the state-transition Equation (1), to generate the approximation (8) of the predicted distribution p(x t |Z t−1 ). The latter is then used in combination with the likelihood function p(z t |x t ) to approximate the new updated distribution p(x t |Z t ). This is done by weighting the i-th particle x i t|t by a weight i t|t proportional to the particle's likelihood p y t |x i t . As mentioned in Section 1, a drawback of this approach is that this weighting wastes many particles. In order to avoid this, we combine the prediction and update steps into a single one. More precisely, particles for approximating the updated distribution p(x t |Z t ) are directly drawn using the knowledge of the particle approximation of the previous updated distribution p(x t−1 |Z t−1 ), without drawing particles to represent the predicted distribution p(x t |Z t−1 ). We describe below how this is done. Notation 1. In order to simplify the notation we remove the dependence of Q on u t , i.e., we use Q instead of Q(u t ).
and, for ξ = h, a, b, We assume that at time t we know the following approximation Our goal is to use the above to build an approximation of the form Our first step is to derive a convenient decomposition of p(x t |Z t ). This is done in the following lemma. Lemma 1. The following equality holds in the system given by (1), (2) and (4) where It follows from Lemma 1 that, in order to find an expression of p(x t |Z t ) we need expressions of p(y t |Z t−1 ) and p(θ t |y t , Z t ). Under the approximation (10), the desired expressions are given in the following lemma.
p(θ t |y t , Z t−1 ) where to simplify the notation we use g −i t−1 = g i t−1 −1 , and ∼ ∝ denotes the approximately proportional sign. Also where J denoted the Jacobian operator, In order to build an approximation of the form (11) we need a method for drawing samples from p(x t |Z t ). In view of (12), this can be done by drawing samples from p(θ t |y t , Z t ) and p(y t |Z t ). It follows from (16) that p(θ t |y t , Z t ) is a Gaussian mixture distribution. Hence, we can readily draw samples from it. On the other hand, from (13) and (15) we obtain from where it is not clear how to draw samples. Our strategy to go around this consists finding a convenient Gaussian approximationp t (y t ) of p(y t |Z t ), from where we can draw samples, and then weight these samples to account for the difference betweenp t (y t ) and p(y t |Z t ).
To derive the Gaussian approximationp t (y t ) we use (13) and proceed in three steps. In Lemma 3 we do a Gaussian approximation of p(y t |Z t−1 ), in Lemma 4 we do the same with p(z t |y t ), and in Lemma 5 we combine these two approximations to obtainp t (y t ).

Lemma 4.
The following approximation holds where with Ξ t (y) = log N (z t ; m(y), R).
Combining Lemmas 3 and 4 we obtain the Gaussian approximation of p(y t |Z t ) given in the following lemma.

Lemma 5. The following approximation holds
with Combining the results in Lemmas 1, 2 and 5 we can build the desired approximation (11) following the procedure described in Algorithm 1.

2.
For each j = 1, · · · , I, do: (a) Pick an index i ∈ {1, · · · , I} by drawing it from the discrete distribution such that i has probability proportional to i (c) Compute the particle weight using

3.
Normalize the weights so that ∑ I j=1 j t|t = 1.

4.
Estimate the position usingx t = ∑ I i=1 i t|t x i t|t .

Experimental Validation
In this section, we evaluate the performance of our proposed indoor localization method. To this end, we compare it with that of two methods based on particle filtering (PF). The first one is a newly proposed RSS-based method described in [20], which we refer to as PF-RSS. In order to assess the advantage resulting from using CSI measurements instead of RSS ones, the second method is the CSI-based version the PF-RSS one. It is obtained by using the PF method, described in Section 3, instead of the proposed MLPF method. We refer to it as PF-CSI. We also consider the dynamic localization method from [25], which we call MLKF, and only applies to linear Gaussian motion models like the one given in Appendix B. We also consider the localization performance obtained via static localization. To this end, we use the static localization method proposed in [25], which, as shown in that work, outperforms other available static positioning methods like FILA [28], DeepFi [29], and PhaseFi [30].
As APs we used TP-Link TL-WDR4310 routers with the OpenWrt platform installed. To acquire CSI values we use the Atheros CSI tool [31]. It provides CSI values of 56 subcarriers, and for each one, two 10 bit values are used to represent its phase and amplitude. Motion capture cameras with millimeter-level accuracy are used to obtain the ground truth position.
For the target we use a two-wheel robot controlled by a NVIDIA Jetson Nano developer board, equipped with an Atheros AR9590 network interface controller. We consider a target whose motion is described by the velocity model given in Appendix A.1, with σ 2 v = 0.36, σ 2 ω = 0.072 and σ 2 γ = 0.072. To generate the measurements we pass the acquired CSI phase values through the linear calibration stage proposed in [30] ( § II.B), then unwrap the resulting phases and use the resulting phase differences as fingerprints. The whole procedure for generating fingerprints is described in [25]. The measurement model is given by the following Gaussian kernel expansion where p n ∈ R 2 , n ∈ [1, 2, . . . , N], denote the positions used to build the fingerprint database. Following [32] (Chapter 4.3.2) we choose χ = 1 2 I 1/3 . Also, κ = [κ 1 , . . . , κ N ] is computed so that, for each n = 1, . . . , N, m(p n ) matches the CSI measured at p n .
As performance metric we use the mean squared localization error , i.e., if the estimated pose at time t isx t = ŷ t ,θ t and the ground truth is y t , then For the PF method, following [23] we use I = 1000 particles. Also, for the MLKF method, we use the linear Gaussian motion model described in Appendix B, with σ 2 p = 0.12 and σ 2 ι = 0.25.
In the first experiment, we aim to evaluate the performance in large environments. To this end we use an empty room of 7 × 16 m 2 . Only one AP is placed at the corner of the room as shown in Figure 2. To build the fingerprinting database, we measured CSI's at N = 70 positions, one meter apart from each other, on a grid as also shown in the same figure. As the target moves within the room, we estimate its position every τ = 0.5 seconds.
As we can see from Table 1, the proposed MLPF method, with only 10 particles, clearly outperforms the outcomes of static positioning and particle filtering. Its performance is comparable to that of the MLKF. However, it outperforms the latter if we increase the number of particles to 50. This is due to the use of a non-linear motion model which yields more accurate statistical knowledge of the target's position at each sample time. Also, although particle filtering should yield the theoretically optimal result if the number of particles is large enough, the very large number of required particles result in that, even with 1000 particles, its accuracy is significantly inferior than that of the proposed MLPF method with 10 particles. Figure 3 shows how the positioning accuracy of the proposed method improves with the number of particles. In Figure 4 we show the error cumulative distribution (ECD) of each method, i.e., for each error (in meters), we show the proportion of positions whose localization error is smaller than that error. The estimated trajectory and ground truth of the experiment are shown in Figure 5.    The second experiment aims to show the performance of our method in living and working environments. To this end we use two adjacent offices of 7.1 × 11.3 m 2 connected by a corridor. We build the fingerprint database using N = 72 positions, as shown in Figure 6, together with the AP's location.  We see from Table 2 that, while in the empty room the proposed method with 10 particles performed similarly to the MLKF method, in this second experiment it largely outperforms the latter. The table also shows the clear advantage of the proposed method over its rivals. The reason for this advantage is as follows. Due to the signal distortion and attenuation caused by multipaths, walls and furnitures, the localization information provided by measurements is not enough to yield accurate estimates at certain locations. This can be seen in the ECD shown in Figure 7. At those locations, dynamic localization methods resort to motion model information to produce an estimate. Figure 7 shows that only the proposed MLPF method can produce accurate estimates at those locations. This is because it is the only method that is able to make full use of the rich information provided by a non-linear motion model. While methods based on PF potentially have the same possibility, the extremely large number of required particles make this unfeasible in practice. To conclude the experiment, we show in Figure 8 the ground truth and estimated trajectories.

Conclusions
We proposed a new indoor positioning method, which we called maximum likelihood particle filtering. Its essential idea consists of combing the prediction and updates steps of a traditional particle filter into a single step, which requires solving a maximum likelihood estimation problem. This results in a better utilization of particles. The method so obtained achieves high accuracy indoor positioning with a drastically smaller number of particles in comparison with particle filtering. This makes it numerically feasible for real-time applications. We validated our claims in a WiFi indoor positioning experiment using real data. Our experiment show that our method leads to great accuracy improvements achieving sub-meter level indoor localization accuracy, which exceeds the requirements of the 5G NR Release 16 standard from 3GPP [33].  The input is u t = [v t , ω t ], with v t and ω t being the robot's measured linear and angular speeds, respectively. Also and α 1 , . . . , α 6 being system parameters. The motion model is given by where τ denotes the sampling time and v t = v t +ṽ t , ω t = ω t +ω t .
Proof of Lemma 2. We split the proof in steps: 1.
It follows straightforwardly from [34] ( § 5.4) that Since the variance of n t−1 is small, we can do the following approximation, where to simplify the notation we used f −i t−1 = f i t−1 −1 .

Proof of Lemma 4. Let
L t (y) = p(z t |y t = y).
Recall that N is the dimension of z t . It follows from ( [24], Theorem 6) that, for all ∈ R 2 , we can do the following approximation with Ξ t (y) = log L t (y) = log N (z t ; m(y), R).
In view of the above, for large N we have Hence p(z t |y t ) = L t (y t ) ∼ ∝ N (y t ; λ t , Λ t ).
Proof of Lemma 5. From the above we get p(y t |Z t ) ∝p(z t |y t )p(y t |Z t−1 ) The result then follows since, from [35] ( § 8.1.8), N (y t ; λ t , Λ t )N y t ; g i t−1 (0), Jg i t−1 (0)Q e Jg i t−1 (0) = α i t N y t ; ξ i t , Ξ i t .