Real-time NLOS/LOS Identification for Smartphone-based Indoor Positioning System using WiFi RTT and RSS

The accuracy of smartphone-based positioning methods using WiFi usually suffers from ranging errors caused by non-line-of-sight (NLOS) conditions. Previous research usually exploits several statistical features from a long time series (hundreds of samples) of WiFi received signal strength (RSS) or WiFi round-trip time (RTT) to achieve a high identification accuracy. However, a long time series or large sample size leads to high power and time consumption in data collection for both training and testing. It is also detrimental to user experience, as the waiting time to collect enough samples is long. Therefore, this paper proposes a new real-time NLOS/LOS identification method for smartphone-based indoor positioning systems using WiFi RTT and RSS. Based on our extensive analysis of RSS and RTT features, a machine learning-based method using random forest was chosen and developed to separate the samples for NLOS/LOS conditions. Experiments in different environments show that our method achieves a discrimination accuracy of about 94% with a sample size of 10. Considering the theoretically shortest WiFi ranging interval of 100 ms of RTT-enabled smartphones, our algorithm can deliver a test result with a latency of 1 s, the shortest among the state-of-the-art methods.


Index Terms
Real-time NLOS identification, WiFi RTT, WiFi RSS, machine learning, random forest, smartphone, positioning.

I. INTRODUCTION
Location is vital for numerous applications driven by uncountable mobile users and developers.
The global navigation satellite system (GNSS) has served for years to provide high-precision localization and related applications in outdoor scenarios [1]. However, the low penetration of GNSS signals through walls and obstacles sharply decreases the positioning accuracy in indoor environments [2]. Numerous indoor positioning methods have been proposed in recent years to fill this gap and provide location-based services (LBS) indoors, using technologies such as WiFi [3]-[5], ultra-wideband (UWB) [6]-[8], radio frequency identification (RFID) [9]-[11], and Bluetooth [12]-[14]. Since most smartphones are WiFi-enabled, and WiFi access points (APs) are widely installed in both private and public environments, WiFi-based methods are widely used to provide positioning services to smartphone users in indoor scenarios.
WiFi-based indoor positioning is usually implemented with either a fingerprinting method or a range-based method. The fingerprinting method computes the user's position by matching the received signal strength (RSS) from multiple nearby WiFi APs to the RSS pre-recorded at known locations. The range-based method usually computes the user's location with algorithms such as multilateration and least squares, using the distance between the AP and the smartphone estimated from the RSS [18]. In particular, the fine time measurement (FTM) protocol standardized in IEEE 802.11-2016 introduced round-trip time (RTT) ranging, which can provide meter-level positioning accuracy [16]. Promoted by Google, various manufacturers, including Google, Xiaomi, LG, Samsung, and Sharp, claim that their recent Android-powered smartphones are WiFi-RTT enabled [17]. These WiFi-RTT enabled smartphones can send WiFi ranging requests to nearby APs and obtain the ranging results (such as RSS and RTT-based distance measurement) in a short period of time (≥200 ms in this study) without connecting to the APs.

DRAFT April 26, 2021
However, neither fingerprinting nor range-based localization is satisfactory, as the WiFi signal is vulnerable to multi-path effects, especially in non-line-of-sight (NLOS) conditions, when obstacles block the clear line-of-sight between the transmitter (AP) and the receiver (smartphone). Therefore, NLOS conditions should always be identified first. The work from Xiao et al. [18], [19] extracts multiple features from a group of RSS samples to distinguish between LOS and NLOS conditions. With a group size of 1000, the algorithm achieves an accuracy of around 95% on the testing set (collected in the same experiment environment as the training set) and over 90% on the validation set (collected in a different environment).
Yu et al. [20] proposed a method to reduce the impact of NLOS and multi-path by combining a real-time ranging model based on WiFi RTT with pedestrian dead reckoning (PDR). Genter et al. presented a distance estimation error model based on a Gaussian mixture model to calibrate the distance measured using WiFi RTT. The work from Han et al. [21] uses a support vector machine (SVM) to classify NLOS and LOS conditions with features extracted from a group of WiFi RSS and RTT samples. The best accuracy of this method is over 92% on the testing set (collected on the same site as the training set) with a group size of 99.
Although the above-mentioned state-of-the-art methods can achieve good performance under certain circumstances, most of them are not fast enough for, or not intended for, real-time positioning. Given a fixed sampling rate, it costs a smartphone considerable time and power to collect hundreds of samples to identify the NLOS and LOS conditions between the phone and the AP. This is even worse in practice, as the range-based method usually needs at least three APs to calculate the user's location. Some researchers have tried to reduce the influence of NLOS conditions when tracking the user's position by integrating the PDR method. However, this is not suitable for computing the absolute position. Moreover, although some work achieves good performance on a testing set collected on the same site as the training set, its robustness has not been validated.
Therefore, this paper proposes a real-time NLOS/LOS identification method for smartphone-based indoor positioning systems using the ranging results of WiFi RSS and RTT-based distance measurement. By analyzing a series of temporal WiFi samples (ranging samples) from the same location, we extract several features of the data and implement a machine learning-based algorithm using random forest to distinguish between NLOS and LOS conditions. The main contributions of this work are as follows:
• We are the first to use WiFi RTT and RSS to realize real-time NLOS and LOS identification for smartphone-based indoor positioning. In particular, our method provides high identification accuracy with low latency for each test. For each NLOS/LOS identification, the small number of samples required can save time and power in both training and testing. This also enhances the user experience, as fewer ranging samples mean a shorter wait for the results.
• Our method is the first investigation of the random forest algorithm for NLOS/LOS identification using WiFi RTT-based distance measurement and RSS. Our algorithm achieves the highest accuracy among the state-of-the-art methods for real-time NLOS/LOS identification on smartphones using WiFi.
• We investigate and analyze multiple features of the collected WiFi ranging samples; some of them are shown to be effective for NLOS identification.
• We show that using only the mean value of RTT, RSS, or their combination is insufficient to provide reliable discrimination accuracy. Therefore, a machine learning-based classification method using random forest is chosen to exploit the various features simultaneously.
• We validate our proposed method in a different experiment environment, rather than only using the widely adopted K-fold cross-validation on a testing set collected in the same environment as the training set.
• Our method is easy to implement for practical use. All the experiments and evaluations of our method are based on data collected by smartphones and WiFi access points in real experiment sites, without any pre-setting or reconfiguration of the infrastructure.

A. Ranging with WiFi RTT
As mentioned in the previous section, the FTM protocol can be used to calculate the distance between RTT-enabled smartphones and RTT-enabled access points. As shown in Figure 1, the access point sends an acknowledgement (ACK) to the smartphone once the FTM request is received (an Initial FTM Request (iFTMR) should be sent first). This gives a single burst that contains multiple FTMs (maximum of 31, excluding the iFTMR). Bursts can repeat with a burst period ranging from 100 ms to $2^{16} \times 100$ ms (about 1.8 hours), which is ultimately determined by the master [23]. As the timestamps record the times at which the signal is sent and received, the RTT can be calculated by subtracting the time delay that occurs in the smartphone from the interval measured at the AP. Accordingly, the distance between the smartphone and the access point (RDM) can be estimated by multiplying half the RTT by the speed of light ($c = 3 \times 10^{8}$ m/s):

$$\mathrm{RDM} = \frac{\mathrm{RTT}}{2} \cdot c$$

Promoted by Google, WiFi RTT was introduced in Android 9 (API level 28) for more practical use. On RTT-enabled smartphones, the FTM request is called a ranging request in the Android system. A successful ranging request gives the user multiple measurements, such as RTT-based distance (in mm), RSS (in dBm), timestamp (in ms), and so forth. In this paper, we define a pair of RTT-based measured distance (RDM) and RSS from one successful ranging request as one ranging sample.
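The ranging arithmetic described above can be illustrated with a minimal sketch. It assumes the commonly used four-timestamp convention for one FTM exchange (`t1` frame departure, `t2` frame arrival, `t3` reply departure, `t4` reply arrival); these names, and all function names, are illustrative assumptions and are not taken from the paper or from the Android API.

```python
SPEED_OF_LIGHT_M_PER_S = 3e8  # value of c used in the text


def rtt_from_timestamps(t1, t2, t3, t4):
    """Round-trip time (s) of one exchange.

    Assumed convention: (t4 - t1) is the total interval seen by the device
    that sent the frame, and (t3 - t2) is the turnaround delay inside the
    other device, which is subtracted out.
    """
    return (t4 - t1) - (t3 - t2)


def burst_rtt(exchanges):
    """Average RTT (s) over the exchanges of one burst."""
    rtts = [rtt_from_timestamps(*exchange) for exchange in exchanges]
    return sum(rtts) / len(rtts)


def rtt_to_distance_m(rtt_s):
    """Distance (m): half the round-trip time multiplied by c."""
    return 0.5 * rtt_s * SPEED_OF_LIGHT_M_PER_S


def max_burst_period_hours():
    """Longest burst period mentioned in the text: 2^16 * 100 ms, in hours."""
    return (2 ** 16) * 0.1 / 3600.0
```

For example, an exchange with a 100 ns turnaround delay and a 180 ns total interval yields an RTT of 80 ns, which corresponds to a distance of about 12 m.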

B. RDM and RSS in NLOS/LOS Conditions
As previously discussed in Section I, RSS is detrimentally affected by multi-path effects, especially in NLOS environments. Similarly, NLOS conditions usually cause an inaccurate distance measurement between the smartphone and the AP, due to the delay and fluctuation of the signal's travel time. One of our experiments, shown in Figure 2 (the experiment devices and settings are described in Section III), shows that the blocking wall causes a longer RDM and a weaker RSS in NLOS than in LOS (see also Figure 3). As listed in Table I, the RDM in NLOS has a positive bias relative to that in LOS, while the RSS is over a hundred times weaker. This motivates us to discriminate between NLOS and LOS conditions using the combination of RDM and RSS. As shown in Figure 4a, the NLOS and LOS ranging samples gather in two clusters with a little overlap. The boundary between the two clusters is much clearer when we calculate the mean value of a group of ranging samples. Figure 4b presents the mean of every 10 ranging samples from NLOS and LOS conditions at the same ground-truth distance. The identification of NLOS and LOS conditions in this situation can be treated as a binary classification problem with two features (RDM and RSS) that could be solved by a simple classifier, such as a linear support vector machine or a decision tree.
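The grouping step behind Figure 4b and the "over a hundred times weaker" comparison can be sketched as follows. This is only an illustration: the function names are hypothetical, a ranging sample is assumed to be a `(rdm_m, rss_dbm)` pair, and the RSS mean is assumed to be taken directly on the dBm values.

```python
def group_means(samples, group_size=10):
    """Average consecutive, non-overlapping groups of (rdm_m, rss_dbm) pairs.

    A trailing group with fewer than group_size samples is dropped.
    """
    means = []
    for start in range(0, len(samples) - group_size + 1, group_size):
        group = samples[start:start + group_size]
        mean_rdm = sum(rdm for rdm, _ in group) / group_size
        mean_rss = sum(rss for _, rss in group) / group_size
        means.append((mean_rdm, mean_rss))
    return means


def power_ratio(rss_a_dbm, rss_b_dbm):
    """Linear power ratio between two RSS values given in dBm."""
    return 10 ** ((rss_a_dbm - rss_b_dbm) / 10)
```

Because dBm is a logarithmic unit, a gap of 20 dB between two mean RSS values already corresponds to a factor of 100 in received power.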
However, the problem is more complicated in practice. In indoor environments, similar RSS and RDM values can be measured for different ground-truth distances, obstacles, devices, and so forth. As shown in Figure 5, the ranging samples from NLOS and LOS overlap to a large extent, even if we calculate the mean of every 10 samples. As introduced before, the work from Xiao et al. [15] verified that the combination of some features of RSS (such as mean and kurtosis) can achieve a very high identification accuracy when the set size N is 1000. However, in practice it takes about 100 s (considering the shortest sampling interval of 100 ms) for an Android-powered smartphone to collect that many samples.
As we consider real-time NLOS identification for smartphones, a small set size should always be considered in both training and testing of the algorithm. Therefore, we analyze the variation, dispersion and distribution (using quartile deviation, skewness, kurtosis, and so forth) of a set of $N$ ranging samples, where $N$ = 10, 30, 50, 100. All the extracted features and their parameters are listed below.

where $Q_3$ and $Q_1$ are the third and the first quartiles of the frequency distribution of the set, respectively. We also calculate the number $\lambda$ of outliers that lie more than $1.5(Q_3 - Q_1)$, i.e., 1.5 times the so-called interquartile range, above the third quartile or below the first quartile.

Range ($R_{RDM}$, $R_{RSS}$): We calculate the range of the samples to denote the fluctuations of a set of measurements. The range $R$ is defined as:

$$R = T_{max} - T_{min}$$

where $T_{max}$ and $T_{min}$ are the maximum and minimum measured RDM or RSS in each set, respectively.
Skewness ($S_{RDM}$, $S_{RSS}$): We calculate the skewness to evaluate the asymmetry of the probability distribution of the ranging samples in each set [24].

Kurtosis ($K_{RDM}$, $K_{RSS}$): The kurtosis evaluates the peakedness of the probability distribution of the ranging samples in each set.

Therefore, one sampled datum $F$ from a set of $N$ ranging samples consists of the features defined above, extracted from both RDM and RSS.

As discussed before, the ranging samples from different conditions overlap to a large extent in real indoor environments. It is uncertain whether the extracted features of RDM and RSS retain their characteristics in NLOS/LOS conditions in a complex environment in practice. Therefore, we conduct experiments (described in Section III) to investigate the NLOS and LOS ranging samples covering the following situations:
• Same ground-truth distance (SGD): In this situation, the ground-truth distance between the smartphone and the APs is the same in both NLOS and LOS conditions (this is similar to the experiment mentioned previously).
• Similar RTT-based measured distance (SMD): As the ground-truth distances to the two APs in NLOS and LOS conditions are different, it is expected that some features (such as the mean of RDM) that are able to distinguish the NLOS and LOS conditions in SGD may not work in this situation.
• Similar received signal strength (SRS): Similar to SMD, as the RSS obtained in NLOS and LOS conditions is similar, some features (such as the mean of RSS) may not work in this situation.
Besides, we are also interested in the number of extracted features and their impact on NLOS and LOS identification. As we notice in Figures 4 and 5 that RSS usually shows higher sparsity but lower dispersion than RDM, we speculate that some extracted features of RSS may not help to identify the NLOS and LOS conditions, especially when the set size N is small. Features that contribute little to the identification accuracy should be eliminated, as more features increase the complexity of the algorithm. Therefore, we use the technique described below to find the best combination of features to distinguish between NLOS and LOS conditions by evaluating multiple features simultaneously.
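The per-set features discussed in this subsection (mean, standard deviation, quartile deviation, outlier count, range, skewness, kurtosis) could be computed along the following lines. This is a sketch under stated assumptions rather than the authors' implementation: the estimators are the plain population (biased) moments, the quartiles follow the default method of Python's `statistics.quantiles`, the quartile deviation is taken as half the interquartile range, and all function and key names are hypothetical.

```python
import statistics


def extract_features(values):
    """Statistical features of one set of RDM or RSS measurements."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    # Outliers lie more than 1.5 * IQR above Q3 or below Q1.
    outliers = sum(1 for v in values if v > q3 + 1.5 * iqr or v < q1 - 1.5 * iqr)
    if std > 0:
        skewness = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        kurtosis = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    else:
        # Degenerate set with identical values: moments are undefined, use 0.
        skewness = kurtosis = 0.0
    return {
        "mean": mean,
        "std": std,
        "quartile_deviation": iqr / 2,
        "outliers": outliers,
        "range": max(values) - min(values),
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


def feature_vector(rdm_values, rss_values):
    """Concatenate the RDM and RSS features of one set into a flat list."""
    rdm = extract_features(rdm_values)
    rss = extract_features(rss_values)
    return list(rdm.values()) + list(rss.values())
```

With N = 10 the higher-order moments are estimated from very few points, which is consistent with the concern raised above that some distribution features may carry little information at small set sizes.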

C. NLOS/LOS Identification Employing Random Forest
Based on the problem analysis and motivation in the previous subsection, the task here is to decide whether a given set of ranging samples corresponds to NLOS or LOS conditions. Machine learning-based algorithms have been widely adopted to classify NLOS and LOS samples. The least squares support vector machine (LS-SVM) is one of the most popular methods and has been implemented in various studies by Xiao et al. [18], [19], Chitambira et al. [30], Han et al. [22], and so forth. In recent years, deep learning-based NLOS/LOS identification methods (such as [31], [32]) have attracted some attention. However, the high computational complexity of the above-mentioned methods leads to long training and testing times, which is not suitable for practical use. Alternatively, some studies such as [35], [36] have shown that random forest can perform well in NLOS/LOS identification with low computational complexity. Therefore, we implement a random forest machine learning algorithm to solve the NLOS/LOS discrimination problem.
Random forest (RF) is an ensemble learning algorithm that trains the model using several classifiers (decision trees), each with a random set of features [33]. It makes the final prediction by combining the results from all the classifiers through a majority vote. As RF combines bagging with random feature selection, it usually produces an efficient model that is less prone to overfitting.
As we only have the two labels of NLOS and LOS conditions in this study, Classification and Regression Tree (CART) is chosen to solve this binary classification problem (we employ 10 decision trees in this study to avoid high computational complexity [34]). Rather than information entropy, the Gini index is used in CART to evaluate the features and divide the input samples, for faster computation. The Gini index is defined as:

$$\mathrm{Gini}(X) = 1 - \sum_{l=1}^{L} p_l^2 \qquad (8)$$

where $L$ is the number of categories in the dataset $X$, and $p_l$ denotes the probability that a sample's label is $l$. As the dataset $X$ has 2 classes of data in this case, $L$ is set to 2, and hence the Gini index of $X$ according to a given feature $x_i$ can be computed by:

$$\mathrm{Gini}(X, x_i) = \frac{|X_1|}{|X|}\,\mathrm{Gini}(X_1) + \frac{|X_2|}{|X|}\,\mathrm{Gini}(X_2)$$

where $X_1$ and $X_2$ are the two subsets into which $x_i$ divides $X$. The Gini index reflects the uncertainty of the given set of samples. Since the Gini index is the difference between 1 and the sum of the squared probabilities of the categories $l$ (as shown in (8)), the larger the Gini index, the higher the uncertainty of the samples. Therefore, the optimal partition feature $x$ can be selected by minimizing the Gini index as follows:

$$x = \arg\min_{x_i} \mathrm{Gini}(X, x_i)$$
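The Gini-based split selection described above can be sketched in a few lines. This is a generic CART-style illustration, not the authors' code: candidate thresholds are assumed to be the observed feature values, and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(feature_values, labels, threshold):
    """Weighted Gini index after splitting on feature <= threshold."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, labels):
    """Return (feature_index, threshold, gini) minimizing the split Gini.

    rows is a list of feature vectors. Returns None if no feature has at
    least two distinct values.
    """
    best = None
    for i in range(len(rows[0])):
        column = [row[i] for row in rows]
        # The largest value is skipped: it would leave the right side empty.
        for threshold in sorted(set(column))[:-1]:
            g = split_gini(column, labels, threshold)
            if best is None or g < best[2]:
                best = (i, threshold, g)
    return best
```

A random forest repeats this selection inside each tree on a bootstrap sample and a random subset of the features, then combines the trees by majority vote.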

III. EXPERIMENT SETTINGS
This section describes the experiment sites, setup, and equipment that we used to collect the ranging samples in this study. It also discusses the data collection procedure for constructing the different datasets for training, testing, and validation.

A. Experiment Sites
The experiments were conducted in two different real-world sites: an office and a student accommodation. Figure 6 shows the office site on the ground floor of the Scottish Microelectronics Centre. It is a complex indoor environment with wooden doors and concrete walls (reinforced with metal rebars), as well as different obstacles. The volunteers were asked to collect the ranging samples from the RTT-enabled access points while following the paths. The paths are composed of multiple test points at intervals of 1 m, as shown in Figure 7. The NLOS samples were collected from AP2 on the path marked in red, while the LOS samples were collected from AP1 on the path marked in blue. The two paths are separated by the wall and other obstacles in the environment. With these settings of test points and access point locations, the ground-truth distances between the smartphone and the access point vary from approximately 0.5 m to 12 m. As the access points were mounted on two tripods at a height of over 1.7 m at the center of the office site, the clear line-of-sight was not affected by the desks and chairs surrounding the blue path.
The other experiment site, the student accommodation shown in Figure 14a, comprises multiple rooms. Data were collected at the test points marked in green. At each point, the volunteers were asked to collect samples from all the APs, covering both conditions. For example, the samples collected in room 1 from AP1 are in LOS condition, and the samples collected from AP2 and AP3 are in NLOS condition. There is also one test point marked in red midway between AP1 and AP2, which are 8 m apart.

B. Devices and Software
In this experiment, four smartphones (three Google Pixel 2 and one Google Pixel 2XL) were used to collect the ranging samples. Google WiFi access points were used as the transmitters in the measurements. Some core specifications are listed in Table II. The devices used in this work are shown in Figure 8. In this study, we used the Android application WifiRttScan, developed by Google, to send ranging requests and collect the ranging results.

C. Data Collection
As previously mentioned, the shortest interval between ranging requests from one RTT-enabled smartphone is 100 ms in theory. However, Google recommends that the sampling interval should not be shorter than 200 ms, to avoid collisions and other software problems. Therefore, the shortest interval is set to 200 ms in this work, which gives 10 ranging samples every two seconds if all requests are successful (one sample per 200 ms). Although there is no clear evidence that the sampling rate affects the quality of the samples, we set the ranging period to 200 ms, 250 ms, 333 ms, and 500 ms for the Google Pixel 2XL and the three Google Pixel 2 phones, respectively, to reduce the uncertainty. During the data collection, the smartphones were always kept awake (the application was always running in the foreground), face up, and oriented parallel to the ground, to avoid effects that may be caused by the way the phone is held, the orientation of the antenna, or other software problems. Besides, the volunteers were asked to collect the samples statically: the volunteer holding the smartphone could not move once the data collection started. At least 600 samples were collected by each device at each point. In particular, more than 7000 samples were collected at the test point marked in red in Figure 14a for the analysis described in Section II.
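The relation between sample size, ranging period, and waiting time used throughout the paper is simple arithmetic, sketched here for reference. The function name is hypothetical, and the calculation assumes that every ranging request succeeds.

```python
def collection_time_s(sample_size, ranging_period_ms):
    """Time (s) needed to collect sample_size ranging samples."""
    return sample_size * ranging_period_ms / 1000.0


if __name__ == "__main__":
    # Ranging periods used in this study, plus the theoretical minimum.
    for period_ms in (100, 200, 250, 333, 500):
        print(period_ms, "ms ->", collection_time_s(10, period_ms), "s for N = 10")
```

For instance, N = 10 at 200 ms takes 2 s, N = 10 at the theoretical 100 ms takes 1 s, and N = 1000 at 100 ms takes 100 s.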

IV. EVALUATION
This section presents the evaluation of the proposed method. We first introduce the construction of the three datasets, and then present how the extracted features affect the NLOS/LOS identification through an analysis of their probability density functions (PDF). This is followed by the results of training and testing the machine learning-based method using random forest. Finally, we select the best features and validate the reliability of the trained models.

A. Datasets
We construct three datasets from the data collected in different environments. The first dataset contains the samples collected in the student accommodation. Our preliminary experiment and analysis of ranging samples in different conditions (described in Section II) are based on this dataset, which motivated us to design a machine learning-based method and conduct the following experiments to solve the NLOS and LOS identification problem.
The second set of samples was collected in the static environment of the office site, where few people were walking around. This set is used for training and testing the general NLOS identification performance of different algorithms.
The last set of samples, collected in the student accommodation, is used for validating the robustness of our method in an experiment site different from the office site. In practice, it is essential that the algorithms work in different scenarios to avoid repeated training, as training is usually labour intensive and time consuming.

B. Training
Figure 5a presents the data we collected in the static office site in Figure 6 for training and testing. [...] their probability distributions. As shown in Figure 11, the NLOS and LOS ranging samples give different probability distributions of these features (the set size N = 10). We can observe that, besides the different shapes of the PDF curves of the features extracted from NLOS and LOS samples, there is usually a shift between the two fitted curves of the samples collected in different conditions. However, most of the features show some overlapping areas, which may cause inaccurate identification of NLOS and LOS conditions. Therefore, we test the performance of the proposed method through different combinations of the features. The combinations of features are represented by $C_i$ ($i = 1, 2, \ldots, 8$). The subsets of different combinations of features are shown in Table IV. As the features are extracted from either RDM or RSS signals, we design four different schemes to assemble the features from different signals:
• $C^{FTM}_i$ ($i = 1, 2, \ldots, 8$): This scheme uses all the features from RDM and RSS (also called FTM features).
• $C^{SEL}_i$ ($i = 1, 2, \ldots, 8$): The selected features are used in this scheme (also called SEL features). This contains all the features from RDM samples and only the mean value of RSS.

C. Testing
The metrics we use to evaluate the proposed algorithm are the fail rate (the algorithm fails to detect NLOS), the false alarm rate (the algorithm identifies NLOS while the samples are from LOS), and the overall false detection rate (the sum of the fail rate and the false alarm rate). The three metrics are denoted by $P_N$, $P_L$ and $P_O$, respectively.
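A minimal sketch of the three metrics follows. It assumes that both $P_N$ and $P_L$ are normalised by the total number of tests, so that their sum $P_O$ equals the overall fraction of misclassified tests; the paper does not spell out the normalisation, and the function name and label strings are hypothetical.

```python
def detection_rates(true_labels, predicted_labels):
    """Return (P_N, P_L, P_O) for labels given as "NLOS" or "LOS".

    P_N: fail rate (NLOS sets classified as LOS).
    P_L: false alarm rate (LOS sets classified as NLOS).
    P_O: overall false detection rate, P_N + P_L.
    Assumption: both rates are divided by the total number of tests.
    """
    n = len(true_labels)
    pairs = list(zip(true_labels, predicted_labels))
    missed = sum(1 for t, p in pairs if t == "NLOS" and p == "LOS")
    false_alarms = sum(1 for t, p in pairs if t == "LOS" and p == "NLOS")
    p_n = missed / n
    p_l = false_alarms / n
    return p_n, p_l, p_n + p_l
```

Under this normalisation, a false detection rate of 0.05 corresponds to an identification accuracy of 95%.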
To evaluate the identification performance more accurately, we test the proposed algorithms using K-fold cross validation [38]:
• All the training samples collected in the office site are randomly divided into K disjoint sets.
• The algorithms are trained with K − 1 sets and tested with the remaining set.
• The training and testing processes are conducted K times to ensure that each set has been tested. K is set to 10 in this work.

[...] and slightly decreases when the sample size expands from 10 to 100. This is because skewness, a distribution feature that measures the asymmetry of the probability distribution, cannot be reliably extracted from a small number of samples. We also notice that kurtosis, another distribution feature, does not help to reduce the identification error at any sample size. The false detection rate of $C^{RDM}_7$, which uses the combination of mean and kurtosis, is even higher than that of $C^{RDM}_1$.
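The K-fold procedure listed above can be sketched with the standard library alone. The helper names, the fixed random seed, and the `train_fn` / `error_fn` callbacks are hypothetical placeholders for whichever classifier and metric are being evaluated.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random division into folds
    folds = [indices[i::k] for i in range(k)]  # k disjoint sets
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(samples, labels, train_fn, error_fn, k=10, seed=0):
    """Average the test error over k folds.

    train_fn(train_samples, train_labels) returns a model;
    error_fn(model, test_samples, test_labels) returns an error rate.
    """
    errors = []
    for train_idx, test_idx in k_fold_indices(len(samples), k, seed):
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        errors.append(error_fn(model,
                               [samples[i] for i in test_idx],
                               [labels[i] for i in test_idx]))
    return sum(errors) / len(errors)
```

Each sample is used for testing exactly once and for training K − 1 times, which is why this scheme only measures performance within a single site and a separate validation site is still needed to assess robustness.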
As shown in Figure 12b, the lowest false detection rate using RSS features is 0.1750 [...] present a trend in which the error decreases as the sample size expands; nevertheless, this does not meet the demand of real-time NLOS/LOS identification using only a few samples.
Figures 12a and 12b also show that the false detection rates are the highest when the sample size is 10 and the lowest when it is 100, which illustrates that the number of RDM or RSS samples collected at each location affects the NLOS/LOS discrimination accuracy. Owing to measurement noise, using a single signal source (either RDM or RSS) and its features over a small sample size usually leads to an inaccurate fit of the samples and a high false detection rate. A large sample size may help reduce the error in NLOS identification, but it does not meet the aim of real-time identification in this work, as discussed at the beginning of this paper. Therefore, we selected and combined both RDM and RSS features to achieve a better performance while keeping the sample size small (N = 10). The results are discussed as follows.
Figure 13a shows the testing results of using the features extracted from FTM samples (both RDM and RSS). We can observe that the false detection rates in all subsets ($C^{FTM}_1 - C^{FTM}_8$) are all much lower than those using only RDM or RSS features. Almost all the subsets achieve a false detection rate lower than 0.05. Most subsets provide low identification errors even when the sample size is down to 10. This may be because the model learns the joint distribution of multiple features of both RDM and RSS signals.
Although the method using FTM features achieves a low false detection rate with a small set of samples, it doubles the number of features in each subset compared with single-source features (either RDM or RSS), which increases the training cost. As previously discussed, most of the extracted features of RSS do not show a significant difference between NLOS and LOS conditions for a small sample size. Our testing results in Figure 12b also verify that the extracted features of RSS bring limited improvement in the detection rate. Therefore, we construct several new subsets of selected (SEL) features that retain all the RDM features but only the mean of RSS in each subset. As shown by the testing results in Figure 13b, the false detection rates of all the subsets using SEL features remain at a similarly low level as those using FTM features.
The three lowest false detection rates using SEL features are 0.0193, 0.0221 and 0.0238, obtained when the subsets $C^{SEL}_3$, $C^{SEL}_4$ and $C^{SEL}_5$ are used, respectively. We can also observe that the standard deviation and quartile deviation of RDM ($C^{SEL}_3$ and $C^{SEL}_4$) always help to reduce the identification error, even when the sample size is down to 10, which meets our aim of real-time NLOS and LOS identification using a limited number of samples. To verify the performance and the reliability of using SEL features over only 10 samples, we conducted more experiments in different environments for validation. The results are discussed in the next subsection.

D. Validation
We have fully trained and tested the models using a large amount of data collected at the [...]

We also compare our method with other state-of-the-art NLOS/LOS identification algorithms for smartphones proposed in recent years. We evaluate the methods in several aspects, including the signal used, the number of features, the sample size for one test, the testing accuracy (on data collected on the same site as the training set) and the validation accuracy (on data collected on a different site from the training set). As shown in Table VI, we first compare our results with other NLOS/LOS identification methods for smartphones using WiFi signals. Our testing accuracy outperforms all other methods with the smallest sample size of 10 and a moderate number of features of 5. Since most of the other methods report no validation experiments, the robustness and reliability of our method are better supported. We also list, as a reference, some methods that use other mobile radio frequency devices and CSI for NLOS/LOS identification. We can observe that our method, which uses WiFi RTT and RSS with the smallest number of samples per test, is highly competitive and accurate.

V. CONCLUSION AND FUTURE WORK
This paper has proposed a real-time NLOS/LOS identification method with high accuracy and low latency for smartphone-based indoor positioning system using WiFi ranging.Our preliminary experiment and analysis present the possibility and limitation of NLOS/LOS identification using

Fig. 2 .Fig. 3 .
Fig. 2. Collecting ranging samples from the smartphone at the middle of the path between the two APs in different rooms

Fig. 4. Illustration of how RDM and RSS can be distinguished between NLOS and LOS conditions (a) original ranging samples; (b) grouped ranging samples (group size = 10).

Fig. 5. A large amount of ranging samples collected in a variety of settings, including different smartphones, sampling rates and ground-truth distances (details are given in Section III), in NLOS/LOS conditions (a) original ranging samples; (b) grouped ranging samples (group size = 10).

Fig. 6. Floor plan of the first floor of the Scottish Microelectronics Centre. Access points and receiver locations are marked.

Fig. 7. Marks of the test points on the path in the office site

Fig. 8. Smartphones and access point used in this experiment

Fig. 9. Interface of the Android application for data collection

Fig. 10 .Fig. 11 .
Figure 5a presents the data we collected in the static office site in Figure 6 for training and testing. The large amount of ranging samples covers all the situations of concern that are described in Section II. This contains the situations of the ranging samples collected in both

Fig. 12. Fail rate ($P_N$), false alarm rate ($P_L$) and false detection rate ($P_O$) of the RF-based NLOS identification method using different signals and their features (a) RDM features; (b) RSS features.

Figure 12 presents the fail rate, false alarm rate and false detection rate of the RF-based NLOS identification method using varying signals and different subsets of features. We can observe that the extracted features of RDM of a set of samples and their combinations ($C^{RDM}_2 - C^{RDM}_8$)

Fig. 13. Fail rate ($P_N$), false alarm rate ($P_L$) and false detection rate ($P_O$) of the RF-based NLOS identification method using ranging samples (both RDM and RSS) and their features (a) FTM features (RDM and RSS); (b) SEL features (selected features of RDM and RSS).

$C^{SEL}_3$, $C^{SEL}_4$ and $C^{SEL}_5$ are used, respectively. We can also observe that the standard deviation and quartile deviation of RDM ($C^{SEL}_3$ and $C^{SEL}_4$

Fig. 14. Fail rate ($P_N$), false alarm rate ($P_L$) and false detection rate ($P_O$) of the RF-based NLOS identification method using ranging samples (both RDM and RSS) and their features (a) RDM and RSS; (b) RDM and RSS (selected features).
WiFi RSS and WiFi RTT-based distance measurements. Based on the evaluation of multiple features extracted from a large amount of ranging samples, a machine learning-based algorithm using random forest has been established to investigate the impact of different combinations of the features on real-time NLOS/LOS discrimination accuracy. We then evaluated our method through experiments on different sites for training and validation. The proposed method shows good reliability and robustness to changes of environment, which means it does not need a repeated site survey and training for different applications. The required sample size and the identification accuracy are highly competitive compared with other state-of-the-art NLOS/LOS identification methods. As this study focuses on real-time NLOS/LOS identification

TABLE I STATISTICAL RESULTS OF RDM AND RSS COLLECTED AT THE SAME GROUND-TRUTH DISTANCE IN DIFFERENT CONDITIONS
Condition | Mean of RDM (m) | Mean of RSS (dBm)

TABLE VI COMPARISON OF DIFFERENT NLOS/LOS IDENTIFICATION METHODS

In this subsection, we first compare our proposed random forest-based method with other machine learning algorithms. We implemented and trained an LS-SVM model and a fully connected deep neural network (3 dense layers, 512 nodes per layer, batch size = 100, epochs = 100) using the best features of $C^{SEL}$. As listed in Table V, the proposed method using random forest achieves the best correct detection rate with the shortest training time under 10-fold cross validation.