Tracking, Geolocation and Digital Exhaust

You are unique… In so many ways…

The accounting systems on which modern society depends are surveillance systems when viewed with another lens. All administrative, financial, logistics, public heath, and intelligence systems rely on the ability to track people, objects, and data. Efficiency and effectiveness in tracking have been greatly aided by improvements in data analysis, computational capabilities, and greater aggregations of data.

Advances in social network analysis, traffic analysis, fingerprinting, profiling, de-anonymization/re-identification, and behavioral modeling techniques have all contributed to better tracking capabilities. In addition, modern technological artifacts typically contain one or more unique hardware device identifiers. These identifiers—particularly in mobile devices, but also RFIDs, and soon Intelligent Vehicle-Highway Systems—are widespread, but also effectively unmodifiable and relatively unknown to most of their owners. For example, with mobile devices, each network interface (cellular, Bluetooth, WiFi) requires a minimum of one unique hardware identifier—all uniquely trackable. One hand, aggregating these unique identifiers allows services like Google, Skyhook, and others to associate geolocation data with WiFi access points and provide useful services. On the other hand, Samy Kamkar’s work described in Hack pinpoints where you live: How I met your girlfriend shows the potentially awkward and invasive side effects.

Individuals generate transactional data from common interactions offline such as card key systems and nearly every online transaction. Improvements in techniques to correlate disparate data as well as techniques to analyze the unique characteristics of software, hardware, network traffic to form a fingerprint is frequently unique. For example, a large-scale analysis of web browsers from the Panopticlick project showed that over 90% of seemingly common consumer configurations were effectively unique. IP geolocation data can be used to increase security as with Detecting Malice with ModSecurity: GeoLocation Data or it can be used in ways that are quite Creepy.

Another major shift is the widespread collection and aggregation of geolocation information from mobile devices. Location can be a highly unique identifier, even if the mobile device changes. Philippe Golle and Kurt Partridge show that two data points sampled during the day—one at home and one at work are enough to uniquely identify many individuals, even in anonymized data. Geolocation data can also reveal significant information about the people spend time with and a view of their social network. Jeff Jonas sums this up well in Your Movements Speak for Themselves: Space-Time Travel Data is Analytic Super-Food! In a sense the mobile phone has caused an enormous increase in uniquely identifiable data that can be used for tracking.

An average person now generates a constant stream of geolocation data that is collected by mobile carriers. Geolocation information is generated from cellular triangulation, geolocated IP addresses, and integrated GPS units, which deliver down to 10 meter accuracy. Geolocated mobile transaction data aggregated across multiple carriers is increasingly available for commercial use. It is possible to accurately track large numbers of individuals in constrained environments simply by sniffing the ITMI (temporary ID) as Path Intelligence does in mall, although they could sniff the IMEI just as easily, but they say they do not to protect privacy. Still, large-scale analysis of geolocation data is in its infancy. ReadWriteWeb describes how Developers Can Now Access Locations of 250 Million Phones Across U.S. Carriers

Tracking technologies—particularly when combined with geolocation information—have matured far beyond tracking individuals and are rapidly becoming capable of tracking groups and larger populations, which could be applied to entire enterprises or political organizations. Tools and techniques have made it feasible to correlate geolocation information, commercially aggregated profiles of online use, digital fingerprints, and offline transactional data. In addition, analysis of current anonymization techniques has repeatedly shown that simply adding another source of data is enough to re-identify a large percentage of the population. The Spatial Law and Policy blog is doing a nice job of tracking the policy implications of geolocation data.

The immense potential value of geolocation and other tracking data may well provide enough incentive for it to be used in ways counter to our own interests. Potential threats for misuse of the data need to be taken into account when designing systems. For example, what is the value of highly accurate logistical data about a US corporation derived from geolocation data and social network analysis to a foreign industrial competitor? Even a small amount of data that allowed a rudimentary analysis of external individuals meeting with internal high-level executives would be a worthwhile target. Similarly, both foreign industrial interests and foreign states may be willing to spend significant resources to acquire details on the movements and meetings of political parties.

More broadly I have been thinking about the question—What does it mean for a third-party to acquire better logistics about an organization than the organization has itself? What are the policy implications when and if these tracking tools are deployed in places without the rule of law, stable transitions of government, and low levels of corruption that we assume in the US? Could changes in the design and implementation of these systems mitigate the risks outlined? For example, should these design changes include internal controls, data scrubbing capabilities, and user interfaces that more clearly indicate a big picture of what data is being given off. Are there behavioral strategies that would reduce risks? To what extent can user education reduce risk?

You should follow me on Twitter.