Are We There Yet? Indoor Localization and Tracking
Renewed Interest in Indoor Localization
As I was re-watching the Mission: Impossible series, whose first movie was released in 1996, I was reminded of how effortlessly Ethan Hunt and his team at IMF could track a moving target, in the basement of an unknown building, in real time, uninterrupted, and with centimeter precision. Around the same time, handheld GPS receivers were becoming mainstream. I remember the excitement and awe of being able to determine my location anywhere on earth, thinking it would not be long until the fiction of indoor tracking would become a practical and affordable reality.
Now some 30 years and many (implausible?) spy movies later, that future is finally showing promise, and the implications of such a technology are profound. Potential applications in healthcare include asset tracking (like wheelchairs, defibrillators, or IV pumps), patient flow optimization (reducing wait times by mapping bottlenecks in real time), and infection control (reconstructing contact networks after outbreaks). Retail and commerce could be streamlined with personalized navigation (guiding shoppers to specific products), heatmap-driven store layout (redesigning shelves based on actual traffic patterns), and inventory reconciliation (knowing where items actually are vs. where they should be). Factories and warehouses could also benefit from asset tracking, collision prevention, and throughput optimization. You can envisage uses in museums, public transport, emergency response, forensics, accessibility, and much more. In the home, simply knowing the location of smart speakers, light bulbs, door locks, or even kitchen appliances may dramatically reduce the friction of setup.
A key enabling factor for wider adoption is the emergence of technologies that offer high accuracy (in some cases better than 10 cm) at commodity-level prices. Running on very low power, sometimes harvesting RF or solar energy, standardized anchors and tags can create large indoor positioning ecosystems. The most transformative applications will likely emerge from combining location with context: not just where someone is, but what they’re doing, what they need, and how the environment should respond. The line between helpful and intrusive will be one of the defining design challenges of the next decade.
This article provides a brief overview of some of the indoor tracking technologies available today and in the near future.
Foundational Technologies
The current localization landscape may seem fragmented, but all technologies come down to a simple physical idea: using measurable signal changes to infer distance and/or angle, be it with radio signals, sound, light, inertial measurement, or a fusion of the above. Throughout this text, we define a tag as a device that is tracked relative to an anchor. Some techniques rely on two-way communication between an initiator and responder.
Wi-Fi
Wi-Fi was designed for data communication, not positioning. Retrofitting distance estimation onto a communication protocol introduces fundamental challenges: multipath interference, signal absorption, hardware variation, and the physics of radio propagation in complex indoor environments.
Some early work on indoor localization used the Wi-Fi Received Signal Strength Indicator (RSSI) as a proxy for range. As signal strength decays with distance, the observed signal strengths from different Wi-Fi access points become a fingerprint of a particular location. However, signal strength is also affected by constructive and destructive interference caused by nearby reflectors, and it can change dramatically indoors, where the environment often changes as furniture is moved, people walk around, and so on. While information from several Wi-Fi access points provides some degree of resilience, only coarse-grained localization is possible using Wi-Fi fingerprinting.
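To illustrate why RSSI gives only a coarse range, the sketch below inverts the widely used log-distance path-loss model. The reference power at 1 m (-40 dBm) and the path-loss exponents are illustrative assumptions rather than values from any particular device; in practice both would need calibrating for each environment.

```python
def rssi_to_distance(rssi_dbm, rssi_at_1m_dbm=-40.0, path_loss_exponent=2.0):
    """Invert the log-distance path-loss model to get an approximate range in metres.

    Model: rssi = rssi_at_1m - 10 * n * log10(distance), with n the exponent.
    """
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10.0 * path_loss_exponent))


# The same reading interpreted with two plausible (assumed) exponents.
for exponent in (2.0, 3.0):
    print(exponent, round(rssi_to_distance(-70.0, path_loss_exponent=exponent), 1))
```

Under these assumptions the same -70 dBm reading maps to roughly 32 m with an exponent of 2 but only 10 m with an exponent of 3, which hints at why fingerprinting is usually preferred over direct RSSI ranging.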
With this limitation in mind, the recent IEEE 802.11mc and 802.11az Wi-Fi standards use the Fine Timing Measurement (FTM) protocol to measure the Round Trip Time (RTT) of the wireless signal, allowing a mobile Wi-Fi device to actively range with an access point and obtain its distance. IEEE 802.11mc typically achieves 0.5-2 m accuracy in practice.
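The RTT calculation can be sketched as follows. This is a simplified illustration assuming a single frame and reply with four timestamps (the names t1 to t4 and the function name are ours); real FTM sessions combine many exchanges and must calibrate hardware delays.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def rtt_distance(t1, t2, t3, t4):
    """Estimate range from one two-way exchange.

    t1: device A transmits a frame     (A's clock, seconds)
    t2: device B receives it           (B's clock)
    t3: device B transmits its reply   (B's clock)
    t4: device A receives the reply    (A's clock)
    """
    # B's turnaround time (t3 - t2) is subtracted, so the unknown offset
    # between the two clocks cancels out.
    round_trip = (t4 - t1) - (t3 - t2)
    return SPEED_OF_LIGHT * round_trip / 2.0


# Simulated exchange: 6 m apart, B's clock offset by 1 ms, 100 us turnaround.
time_of_flight = 6.0 / SPEED_OF_LIGHT
clock_offset, turnaround = 1e-3, 100e-6
t1 = 0.0
t2 = t1 + time_of_flight + clock_offset
t3 = t2 + turnaround
t4 = t1 + time_of_flight + turnaround + time_of_flight
estimated_m = rtt_distance(t1, t2, t3, t4)

# Range error caused by one nanosecond of round-trip timing error.
metres_per_ns = SPEED_OF_LIGHT * 1e-9 / 2.0
print(round(estimated_m, 3), round(metres_per_ns, 3))
```

Each nanosecond of round-trip timing error corresponds to about 15 cm of range error, which gives a sense of why sub-metre accuracy is demanding for hardware designed around communication.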
Several recent efforts have shown that Wi-Fi can also be a viable technology for sensing applications such as device-free localization, motion recognition, or human identification. The IEEE 802.11bf task group has formalized Wi-Fi sensing in the 2025 standard.
Bluetooth
Like Wi-Fi, Bluetooth was designed as a communication protocol and not a ranging technology. Bluetooth RSSI has nevertheless been used for ranging between cellphones and consumer devices including smart tags (e.g. Tile), hearing aids (e.g. Oticon, Starkey), door locks (e.g. Yale Assure), and smart speakers (e.g. Apple HomePod). Direction finding was added in Bluetooth 5.1 to measure Angle of Arrival (AoA) for receivers and Angle of Departure (AoD) for transmitters using phase shifts in antenna arrays. With linear arrays, AoA/AoD error increases as the beam is steered away from the broadside direction; depending upon the use case, reliable AoA/AoD might be constrained to a narrow field of view.
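A minimal sketch of phase-based AoA for a two-element array may help show why error grows away from broadside. It assumes a far-field plane wave, half-wavelength element spacing, and a fixed 5-degree phase measurement error; the function names and all numeric values are illustrative assumptions, not part of the Bluetooth specification.

```python
import math


def phase_difference(angle_rad, spacing_m, wavelength_m):
    """Phase difference between two antennas for a plane wave arriving at
    angle_rad from broadside (0 = perpendicular to the array)."""
    return 2.0 * math.pi * spacing_m * math.sin(angle_rad) / wavelength_m


def angle_of_arrival(delta_phi_rad, spacing_m, wavelength_m):
    """Invert the relation above; the ratio is clipped to keep asin defined."""
    ratio = wavelength_m * delta_phi_rad / (2.0 * math.pi * spacing_m)
    return math.asin(max(-1.0, min(1.0, ratio)))


WAVELENGTH = 0.125                # metres, roughly 2.4 GHz
SPACING = WAVELENGTH / 2.0        # assumed half-wavelength spacing
PHASE_ERROR = math.radians(5.0)   # assumed phase measurement error


def angle_error_deg(true_angle_deg):
    """Angle error caused by PHASE_ERROR for a source at true_angle_deg."""
    phi = phase_difference(math.radians(true_angle_deg), SPACING, WAVELENGTH)
    estimate = angle_of_arrival(phi + PHASE_ERROR, SPACING, WAVELENGTH)
    return math.degrees(estimate) - true_angle_deg


for angle in (0.0, 40.0, 70.0):
    print(angle, round(angle_error_deg(angle), 2))
```

Because the angle depends on the arcsine of the measured phase, the same phase error produces a several times larger angle error at 70 degrees than at broadside.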
Bluetooth 6.0 introduced the Channel Sounding (CS) framework, including RTT and Phase-Based Ranging (PBR). Bluetooth RTT, operating on a similar principle to IEEE 802.11mc Wi-Fi, achieves 1-2 m accuracy in practice. Bluetooth RTT is less widely deployed than Wi-Fi RTT but benefits from Bluetooth’s ubiquity in small devices, lower cost, and lower power consumption. PBR measures the phase shift due to time of flight with pure tones on multiple frequencies exchanged between an initiator and reflector. Range is determined either by fitting to the unwrapped phase slope or by estimating the Channel Impulse Response (CIR) with an interpolated IFFT followed by peak picking.
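The phase-slope variant can be sketched as below. This is an idealized illustration, assuming a single noise-free line-of-sight path, a two-way phase model of -4*pi*f*d/c, and 72 tones at 1 MHz spacing; the channel plan and function names are assumptions for the example rather than a description of any real implementation.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def unwrap(phases):
    """Remove 2*pi jumps so the phase varies smoothly across frequency."""
    unwrapped = [phases[0]]
    for phase in phases[1:]:
        step = phase - unwrapped[-1]
        step -= 2.0 * math.pi * round(step / (2.0 * math.pi))
        unwrapped.append(unwrapped[-1] + step)
    return unwrapped


def fitted_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def pbr_distance(freqs_hz, two_way_phases_rad):
    """Two-way phase is modelled as -4*pi*f*d/c, so d = -c*slope/(4*pi)."""
    slope = fitted_slope(freqs_hz, unwrap(two_way_phases_rad))
    return -SPEED_OF_LIGHT * slope / (4.0 * math.pi)


# Simulated, noise-free tones: 72 channels at 1 MHz spacing (assumed plan).
true_distance = 3.7
freqs = [2.404e9 + k * 1e6 for k in range(72)]
phases = [math.remainder(-4.0 * math.pi * f * true_distance / SPEED_OF_LIGHT,
                         2.0 * math.pi) for f in freqs]
print(round(pbr_distance(freqs, phases), 3))
```

With multipath the phase is no longer linear in frequency, which is one motivation for the CIR-based alternative mentioned above.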
Bluetooth CS is now supported by several smartphones running Android and iOS and is expected to achieve 20-50 cm accuracy in practice. The temporal resolution of the CIR is inversely proportional to the effective total bandwidth, in this case limited to 80 MHz. The Bluetooth Special Interest Group (SIG) is exploring 5 and 6 GHz bands that would increase bandwidth and consequently ranging accuracy.
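As a rough guide to what bandwidth buys, the sketch below applies the common rule of thumb that two paths can be separated when their delays differ by about 1/B. The function name is ours and the rule is an approximation; actual performance depends on the estimator and the signal-to-noise ratio.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def path_resolution_m(bandwidth_hz, two_way=True):
    """Approximate smallest separable path difference for a given bandwidth.

    Uses the 1/B rule of thumb for delay resolution. For two-way ranging the
    signal covers the distance twice, so the range resolution is halved.
    """
    delay_resolution_s = 1.0 / bandwidth_hz
    path_m = SPEED_OF_LIGHT * delay_resolution_s
    return path_m / 2.0 if two_way else path_m


for bandwidth in (80e6, 500e6):
    print(int(bandwidth / 1e6), "MHz:", round(path_resolution_m(bandwidth), 2), "m")
```

By this estimate 80 MHz corresponds to about 1.9 m of two-way path resolution, against roughly 0.3 m for a 500 MHz channel. Resolution here means the ability to separate two paths, which is not the same as ranging accuracy; with a clean line-of-sight path, interpolation can locate a single peak more finely than this.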
Ultra-wideband (UWB)
Recall that both Wi-Fi and Bluetooth were designed primarily as communication protocols and not as ranging or localization primitives, and that narrow-band signals impose limits on the time-domain accuracy measurable through round trip time methods. Ultra-wideband radios tackle this issue by transmitting extremely short standardized pulses using their entire bandwidth (typically 500 MHz to 1 GHz). At a receiver, these pulses appear as signals that quickly rise above the noise floor allowing precise time of arrival inference. Further, the sharp short pulses remain separable from nearby multipath providing a robust estimate of the first arriving signal path, which is often also the line-of-sight path. Together with a protocol that eliminates clock offset errors and mitigates the effects of clock drift, UWB realizes close to 10-centimeter ranging capabilities.
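One commonly described protocol of this kind is double-sided two-way ranging, sketched below next to the simpler single-sided exchange. The text above does not name a specific protocol, so treat this as an assumed example; the 10 ppm clock errors, turnaround delays, and function names are all illustrative.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def single_sided_tof(round_a, reply_b):
    """Single-sided two-way ranging: one poll and one response."""
    return (round_a - reply_b) / 2.0


def double_sided_tof(round_a, reply_a, round_b, reply_b):
    """Asymmetric double-sided two-way ranging (poll, response, final)."""
    return ((round_a * round_b - reply_a * reply_b)
            / (round_a + round_b + reply_a + reply_b))


# Simulate two devices 10 m apart whose clocks run fast/slow by 10 ppm.
tof = 10.0 / SPEED_OF_LIGHT
drift_a, drift_b = 1 + 10e-6, 1 - 10e-6
true_reply_b, true_reply_a = 200e-6, 300e-6    # assumed turnaround delays

# Each interval is measured by the local (drifting) clock of that device.
round_a = (2 * tof + true_reply_b) * drift_a
round_b = (2 * tof + true_reply_a) * drift_b
reply_a = true_reply_a * drift_a
reply_b = true_reply_b * drift_b

ss_error_m = abs(single_sided_tof(round_a, reply_b) - tof) * SPEED_OF_LIGHT
ds_error_m = abs(double_sided_tof(round_a, reply_a, round_b, reply_b) - tof) * SPEED_OF_LIGHT
print(round(ss_error_m, 3), ds_error_m < 1e-3)
```

With these assumed values the single-sided estimate is off by about 0.6 m, while the double-sided estimate stays within a millimetre, because the drift terms largely cancel in the ratio.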
The UWB pulses and the preamble formats are standardized in IEEE 802.15.4a and IEEE 802.15.4z. UWB packets can also carry data (albeit at data rates significantly lower than those available in Wi-Fi), including device IDs and timing information that are crucial for precise ranging. This allows interoperable bi-directional communication between UWB-enabled devices, with the ability to initiate ranging available at both the anchors as well as the tags.
UWB is becoming a popular ranging technology with its inclusion in most recent Apple phones and Apple AirTags, Google Pixel phones, and the higher-end Samsung Galaxy phones. Further, car manufacturers such as Tesla and BMW have also adopted UWB for their keyless entry systems, expanding the UWB reach into several aspects of everyday life.
UWB’s large bandwidth also opens up the scope for sensing signal reflections from the human body or other objects nearby. Observing disturbances to received signals allows device-free inferences about movements in an indoor space, enables through-wall movement sensing, and allows creating security perimeters while keeping the sensors hidden deep inside an indoor space. UWB RADAR technology is expected to be used for gesture recognition, in vehicles for collision prevention, and also for proximity-based context awareness in future applications.
Acoustics
Many consumer TVs, soundbars, and smart speakers are equipped with microphone arrays for voice capture. The same microphones can also be used to determine the range and angle of a sounding loudspeaker by estimating the CIR through deconvolution of a known stimulus such as wideband music, sine sweeps, or Gold codes. Compared with RF techniques like Bluetooth CS or UWB, acoustical techniques benefit from proportionately lower phase noise, higher bandwidths (many octaves vs. fractions of an octave), and slower signal propagation. TCL, Hisense, and LG TVs implementing Dolby Atmos FlexConnect typically locate wireless speakers with an accuracy of 5-15 cm in living room environments with a 2-element array. AoA can also be estimated with sufficient precision throughout the horizontal plane; this is important for the case when loudspeakers are placed to the side of a TV and error variance is maximized.
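The sketch below shows the idea for a single microphone using cross-correlation (matched filtering) instead of a full deconvolution; for a spectrally flat stimulus such as a pseudo-random code the two give similar peaks. It assumes playback and capture share a clock, uses a simulated recording, and picks the strongest path rather than the first; the sample rate, delays, and names are illustrative.

```python
import random

SPEED_OF_SOUND = 343.0   # metres per second, approximate at room temperature
SAMPLE_RATE = 48_000     # samples per second (assumed)


def cross_correlate(stimulus, recording, max_lag):
    """Correlate the known stimulus against the recording for lags 0..max_lag-1."""
    return [sum(s * recording[n + lag] for n, s in enumerate(stimulus))
            for lag in range(max_lag)]


def strongest_path_delay(stimulus, recording, max_lag):
    """Return the lag (in samples) of the largest correlation peak."""
    correlation = cross_correlate(stimulus, recording, max_lag)
    return max(range(max_lag), key=lambda lag: abs(correlation[lag]))


# Simulated capture: direct path at 140 samples plus a weaker reflection.
rng = random.Random(0)
stimulus = [rng.choice((-1.0, 1.0)) for _ in range(512)]
max_lag = 300
recording = [0.0] * (len(stimulus) + max_lag)
for delay, gain in ((140, 1.0), (200, 0.6)):
    for n, s in enumerate(stimulus):
        recording[n + delay] += gain * s

delay_samples = strongest_path_delay(stimulus, recording, max_lag)
distance_m = delay_samples / SAMPLE_RATE * SPEED_OF_SOUND
print(delay_samples, round(distance_m, 3))
```

One sample at 48 kHz corresponds to about 7 mm of acoustic path, which illustrates how slow propagation helps acoustic ranging. Repeating the estimate on a second microphone and comparing delays would give the angle.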
Ultrasonic arrays may be used for positioning outside the audible frequency range, additionally benefiting from smaller array form factors due to the shorter wavelengths. They are, however, sensitive to occlusion and require sampling rates that may not be supported by capture pipelines designed for wideband speech.
mmWave Radar
Millimeter waves (mmWave) are signals with wavelengths at the millimeter scale. They readily reflect off physical objects and the delay in reflections can be used for localization of objects, sensing of the environment, and observing changes to the physical space in which mmWave devices are placed. Due to their short wavelength, these signals do not penetrate through thick solid objects, making them most suitable for room-scale monitoring.
Of particular interest for localization and tracking (instead of mmWave communication) are mmWave Radars using 24-27 GHz, 60 GHz, and 77 GHz bands. These systems typically use frequency modulated continuous wave (FMCW) signals in which the carrier frequency is swept over a wide bandwidth in a carefully controlled ramp, with the receiver collocated with the transmitter. Although the transmitted signal may be many orders of magnitude stronger than the returning reflections, the receiver can recover weak echoes by mixing the received signal with a copy of the transmitted sweep. This converts the propagation delay of each reflection into a lower-frequency beat signal, whose frequency is proportional to the distance travelled by the reflected path. By analyzing these beat frequencies across one or more antennas, the radar can estimate range, velocity, and angle of arrival, producing point-clouds of nearby objects and motion. Fine range resolution becomes possible due to the large bandwidths available at these frequencies.
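The relation between beat frequency and range can be sketched as follows for a single static target and a linear sweep. The 4 GHz bandwidth and 50 microsecond chirp are assumed values chosen for illustration, not the parameters of any specific radar.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def beat_frequency_hz(range_m, bandwidth_hz, chirp_s):
    """Beat frequency for a static target: sweep slope times round-trip delay."""
    sweep_slope = bandwidth_hz / chirp_s            # Hz per second
    round_trip_delay = 2.0 * range_m / SPEED_OF_LIGHT
    return sweep_slope * round_trip_delay


def range_from_beat_m(beat_hz, bandwidth_hz, chirp_s):
    """Invert the relation above to recover range from a measured beat."""
    return SPEED_OF_LIGHT * beat_hz * chirp_s / (2.0 * bandwidth_hz)


BANDWIDTH, CHIRP = 4e9, 50e-6   # assumed sweep parameters
beat = beat_frequency_hz(3.0, BANDWIDTH, CHIRP)
print(round(beat / 1e6, 2), "MHz ->",
      round(range_from_beat_m(beat, BANDWIDTH, CHIRP), 3), "m")
```

Under these assumptions a target at 3 m produces a beat of about 1.6 MHz, which is far easier to digitize than the multi-gigahertz sweep itself.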
One of the main limitations of radar-based localization is the difficulty of identifying specific objects or people. As a result, mmWave Radar is typically applied where identity does not matter. In vehicles, it detects nearby objects, estimates their distance and relative velocity, and supports functions such as adaptive cruise control, collision avoidance, and parking assistance. In these use cases, the radar does not necessarily need to know the identity of each object; it only needs to categorize the obstacle into a general class, such as a pedestrian, an object on the road, or another car. In the home, products such as the Aqara Presence Sensor are marketed as tools for home automation through human presence detection.
Computer Vision
Optical / camera-based systems are a class of relatively mature and accurate techniques with the potential for sub-mm accuracy over large volumes. OptiTrack-class systems frequently provide the ground truth against which Bluetooth CS and UWB are validated, although they require retroreflective markers and multiple IR cameras to triangulate 3D position. Structured light cameras, such as those used in Microsoft Kinect and Intel RealSense, deliver accuracy of 1-5 mm; time of flight (ToF) cameras like Microsoft Kinect v2 deliver around 5-15 mm. Both have found varied applications in 3D scanning, gesture recognition and robotics manipulation. Stereo vision can provide 1-10 mm at close range, degrading with distance. LiDAR like the Velodyne HDL-64 and the varied types used on autonomous vehicles can achieve 2-5 cm over very large ranges but are limited by the need for mechanical rotation. Solid state LiDAR like the Livox Mid-360 has a narrower field of view, and 2D LiDAR like the SICK TiM uses a single-plane laser giving 1-3 cm accuracy for use cases like robot navigation. As a general rule, narrow field of view, sensitivity to occlusions, and privacy concerns limit the applicability of many computer vision techniques for home use.
Sensor Fusion and Robustness
While there has been significant research on each individual localization technology, no one technology is perfect; each has its own limitations. Fusion between technologies therefore becomes a promising alternative where limitations of one technology can be overcome by another. A popular sensor for fusion is the inertial sensor. Alone, inertial sensors drift over time, but when combined with periodic reset logic, inertial sensors can provide a privacy preserving, low-power localization primitive. Inertial sensors are combined with computer vision-based techniques to reduce the power needs of continuous camera operations. Combined with Wi-Fi or UWB, inertial sensors provide a fast tracking mechanism to bridge the time gap between successive ranging sessions. Sequential measurements may benefit from Kalman or particle filtering to detect outliers and estimate the trajectory of moving targets. Apple iPhones fuse UWB, camera, and inertial data with temporal filtering to calculate range and direction of an AirTag.
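As an illustration of temporal filtering, here is a minimal constant-velocity Kalman filter over a sequence of range measurements, with a simple gate that skips measurements whose innovation is implausibly large. The noise levels, the 3-sigma gate, the initial covariance, and the simulated outlier are assumptions for the example and do not describe any shipping product.

```python
import random


class RangeKalmanFilter:
    """Constant-velocity Kalman filter over the state [range, velocity]."""

    def __init__(self, initial_range, meas_var, accel_var, gate_sigma=3.0):
        self.r, self.v = initial_range, 0.0
        self.p = [[meas_var, 0.0], [0.0, 1.0]]   # initial covariance (assumed)
        self.meas_var, self.accel_var, self.gate = meas_var, accel_var, gate_sigma
        self.rejected = 0

    def predict(self, dt):
        (p00, p01), (p10, p11) = self.p
        q = self.accel_var
        self.r += self.v * dt
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and white-acceleration Q.
        self.p = [[p00 + dt * (p01 + p10) + dt * dt * p11 + q * dt ** 4 / 4,
                   p01 + dt * p11 + q * dt ** 3 / 2],
                  [p10 + dt * p11 + q * dt ** 3 / 2,
                   p11 + q * dt * dt]]

    def update(self, measured_range):
        (p00, p01), (p10, p11) = self.p
        innovation = measured_range - self.r
        s = p00 + self.meas_var                    # innovation variance
        if innovation ** 2 > self.gate ** 2 * s:   # outlier: skip the update
            self.rejected += 1
            return
        k0, k1 = p00 / s, p10 / s                  # Kalman gain
        self.r += k0 * innovation
        self.v += k1 * innovation
        self.p = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]


def true_range(step, dt=0.1):
    """Simulated target: starts at 2 m and recedes at 0.5 m/s."""
    return 2.0 + 0.5 * dt * step


rng = random.Random(1)
dt, noise_sd = 0.1, 0.1
kf = RangeKalmanFilter(true_range(0), noise_sd ** 2, accel_var=0.25)
for step in range(1, 51):
    z = true_range(step) + rng.gauss(0.0, noise_sd)
    if step == 20:
        z += 3.0   # simulated multipath outlier
    kf.predict(dt)
    kf.update(z)

final_error_m = abs(kf.r - true_range(50))
print(kf.rejected, round(final_error_m, 3))
```

In a fused system the predict step could be driven by inertial data between ranging sessions, with each new range applied through the update step.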
Map Building
With the exception of OptiTrack, we have so far considered only the estimation of range and optionally angle of a tag relative to a single anchor. A system for robust indoor localization and tracking requires infrastructure consisting of multiple anchors using redundancy to address the problems caused by occlusion and multipath. With multiple anchors in known locations, trilateration may be used to locate a tag by the intersection of circles (or spheres) whose radii are derived from the estimated ranges of the tag. GPS operates on this principle by receiving timestamps from synchronized satellites. Alternatively, triangulation uses the intersection of lines derived from the AoA. A practical implementation may use numerical optimization to estimate locations with both techniques.
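Trilateration can be sketched with a linearized least-squares solve. This 2-D example assumes at least three non-collinear anchors at known positions and uses noise-free ranges; the anchor layout and function name are illustrative, and a practical system would more likely use an iterative optimizer that also weights each measurement.

```python
import math


def trilaterate_2d(anchors, ranges):
    """Least-squares 2-D position from three or more anchors and ranges."""
    (x0, y0), r0 = anchors[0], ranges[0]
    s_aa = s_ab = s_bb = s_ac = s_bc = 0.0
    for (xi, yi), ri in zip(anchors[1:], ranges[1:]):
        # Subtracting the first circle equation removes the x^2 + y^2 terms,
        # leaving one linear equation a*x + b*y = c per remaining anchor.
        a = 2.0 * (xi - x0)
        b = 2.0 * (yi - y0)
        c = (xi ** 2 + yi ** 2 - ri ** 2) - (x0 ** 2 + y0 ** 2 - r0 ** 2)
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ac += a * c
        s_bc += b * c
    # Solve the 2x2 normal equations directly.
    det = s_aa * s_bb - s_ab ** 2
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear or too few to fix a position")
    x = (s_bb * s_ac - s_ab * s_bc) / det
    y = (s_aa * s_bc - s_ab * s_ac) / det
    return x, y


anchors = [(0.0, 0.0), (5.0, 0.0), (0.0, 4.0), (5.0, 4.0)]
tag = (2.0, 1.5)
ranges = [math.hypot(tag[0] - ax, tag[1] - ay) for ax, ay in anchors]
estimate = trilaterate_2d(anchors, ranges)
print(round(estimate[0], 3), round(estimate[1], 3))
```

The fourth anchor is redundant here, but with noisy ranges the extra equation lets the least-squares fit average out some of the error.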
It is reasonable to assume that the locations of the anchors are known in many industrial applications. However, this is unlikely to be true for ad hoc networks like smart devices in the home. Mapping is the process of determining the anchor locations through a calibration procedure. Care must be taken to determine an appropriate coordinate system and to address symmetries that may arise due to degeneracy in the anchor locations and the range/AoA data they provide, especially in the presence of occlusion or multipath.
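A small sketch of the coordinate-system choice: given only the pairwise distances between three anchors, positions can be recovered up to translation, rotation, and reflection, so a convention is needed. The convention and function name below are arbitrary assumptions for illustration.

```python
import math


def anchor_frame_from_ranges(d01, d02, d12):
    """Place three anchors in 2-D from their pairwise distances.

    Convention (an arbitrary choice): anchor 0 is the origin, anchor 1 lies
    on the positive x-axis, and anchor 2 has a non-negative y coordinate.
    """
    x2 = (d01 ** 2 + d02 ** 2 - d12 ** 2) / (2.0 * d01)
    y2 = math.sqrt(max(d02 ** 2 - x2 ** 2, 0.0))   # clamp small negative noise
    return (0.0, 0.0), (d01, 0.0), (x2, y2)


anchors = anchor_frame_from_ranges(4.0, 3.0, 5.0)
# The mirror image is equally consistent with the measured ranges.
mirrored = [(x, -y) for x, y in anchors]
print(anchors, mirrored)
```

Range data alone cannot distinguish the two layouts; resolving the mirror ambiguity needs extra information such as an AoA measurement or a known device orientation.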
Privacy and Security
The extent to which privacy in indoor localization is a requirement depends heavily on the application. A person using an indoor localization infrastructure for navigation may not wish their own location to be known to the infrastructure. In contrast, an object being tracked inside a warehouse or shop floor may have no privacy requirement. Protocols and infrastructure that are inherently secure and privacy preserving can enable both kinds of localization use cases.
Several security measures have been taken to protect raw ranging data. Unlike RSSI, RTT prevents an attacker from simply intercepting and amplifying the signal to trick an access point, because any delay introduced by the attacker increases the measured distance and is therefore detectable. This defeats relay attacks on the keyless entry systems used by cars; in a lighter vein, here is a YouTube video demonstrating this attack. In Bluetooth CS, the pseudo-random channel hopping sequence cannot be predicted easily by an adversary. Bluetooth CS also makes anti-spoofing provisions because a malicious device that could predict the hopping sequence could potentially inject fake distance measurements. Both Bluetooth CS and UWB employ physical layer protection to prevent bad actors from spoofing or tampering with the ranging sequence.
Some critical tensions are worth noting. Efficiency gains increase the risk of surveillance normalization. Personalization may bring data commodification. A clear challenge for the indoor localization and tracking industry is to build public trust rather than erode it by demonstrating benefits that outweigh the potential privacy risks. A framework that clearly demarcates the privacy risks and explains these to users of localization facilities might help develop public trust and provide better control in the hands of the users.
Conclusion
So are we on the cusp of a Mission: Impossible-style future? Kind of. The physics of occlusions, multipath, and signal bandwidth will always be limiting factors that might be addressed through a combination of sensor fusion, redundancy, temporal filtering, or by exploiting certain characteristics of a specific use case. Security measures taken to prevent hacking and spoofing make the examples from the movies seem all the more far-fetched. The extent to which the public will accept these technologies will be dictated by the trust they have in the companies that make them, and the convenience that comes from new compelling use cases, ubiquity, and price driven by broad adoption of the upcoming standards. The movies have nevertheless given us many exciting uses for indoor location and tracking, and we are excited by the possibilities that seem tantalizingly close to practical reality.
Further Reading
Indoor localization, tracking, and navigation have been topics of interest for several decades. The literature is vast and spans several academic venues as well as industry-facing publications.
Journals
IEEE Journal of Indoor and Seamless Positioning and Navigation
IEEE Transactions on Mobile Computing
IEEE Transactions on Instrumentation and Measurement
Journal of Location Based Services
Conferences
International Conference on Indoor Positioning and Indoor Navigation
IPSN: ACM/IEEE International Conference on Information Processing in Sensor Networks
Biographies
Mark R.P. Thomas is an Editor for IEEE SPS Industry Signals and a Principal Researcher at Dolby Laboratories. His research background is in all things audio from DSP to UX, and he leads a research group working on the capture, creation, coding, transporting, perception, and rendering of spatial audio for both professional and consumer applications. Dr. Thomas received an MEng degree in Electrical and Electronic Engineering from Imperial College London in 2006 and a PhD in Glottal-Synchronous Speech Processing from the same institution in 2010.

Ashutosh Dhekne is an associate professor at the School of Computer Science at Georgia Tech. His research interests include wireless networking, wireless localization, and sensing. Dr. Dhekne received his Ph.D. from the University of Illinois at Urbana-Champaign in 2019. He is a recipient of the NSF CAREER Award.

