Password Managers Relieve Password Headaches

Passwords Are a Hassle

I’ll be the first to admit I can’t remember all my passwords. Most of us can’t, so we pick a few passwords that are easy to remember and then use them with multiple sites. This results in two immediate problems. A password manager can help with both of these problems. First, passwords that are easy to remember are typically also easy to guess. Second, a compromised password is a risk to every site where it has been reused. A password manager both of these problems since it can generate a secure and unique password for each site, but only requires that you remember a single password to unlock the database. While it is possible, to create passwords that are secure and memorable, it is more difficult to do this with the significant number of passwords we frequently use in modern life. I detailed some additional problems with passwords in previous articles Your NYE Resolution—Pick Better Passwords and Data Evaporation and the Security of Recycled Accounts. I find that password manager with solid browser integration is well worth the initial setup time and expense.

While there are many good options, my password manager of choice is 1Password from AgileBits that is available for Mac OS X, Windows, and the iPhone, iPad, iPod Touch. I consider it an indispensable tool and I use it daily both on my desktop and my phone. 1Password integrates with many popular browsers, which makes logging into web sites faster and more convenient. The application allows me to easily switch between multiple browsers and multiple devices without worrying, which browser I might have saved a particular password.

When I first looked at 1Password in 2006, I thought there was no way I would be willing pay for it since all modern browsers ship with password management functionality. Shortly after I started testing the application I found it so convenient, I changed my mind and purchased it. Nearly six years and many major upgrades later, I have no regrets. I have nearly eight hundred logins saved in 1Password. Even though I regularly clean out duplicates and entries for dead services, this is still a ridiculous number of accounts. Look at it this way, I test services so you don’t have to.

We All Forget Passwords

A 2007 paper A Large-Scale Study Of Web Password Habits of more than half a million users found that about 1.5% of all Yahoo! users forgot their password each month. Yahoo Mail alone has more than 200 million accounts, so this is a significant number. The authors found that the “average user has 6.5 passwords, each of which is shared across 3.9 different sites. Each user has about 25 accounts that require passwords, and types an average of 8 passwords per day.”

Complicated Passwords and Compact Keyboards Don’t Mix

The current crop of smartphones ship with highly capable browsers, but entering lengthy passwords on a phone keyboard is even more error prone and frustrating on the desktop. Here again, a password manager can reduce the complexities of entering many different password strings on a mobile device. The application allows you to make a mobile keyboard optimized and possibly simplified password that protects your longer more complex passwords and notes. This is of course a security tradeoff.

Mobile Safari on the iPhone and iPad does not permit plugins, so the 1Password application on iOS devices embeds a browser that is able to offer the automatic login feature. I prefer the default browser, but unfortunately there is no option for direct integration. The 1Password bookmarklet makes it relatively quick to look up an entry in the database and then copy and paste long passwords from its database far more easily than trying to type them in by hand

Other Advantages of 1Password

I regularly use multiple browsers. I also frequently delete my cookies and browser settings when I test services. This would typically cause a nightmare of needing to re-authenticate to each web site where I deleted the cookies. Since all of my login information is stored in 1Password rather than the browser, I don’t have to care about which browser I am currently using or even if my cookies still exist.

Since 1Password is also a general form filler it can cope with login forms that have partial entries or multi-stage. For example, many services require that users re-enter their password to access account management features even if they are already logged in. This is to prevent another person from simply walking up to your unattended computer from viewing or making changes to billing information, email forwarding, and passwords. In most cases, 1Password is able treat the re-authentication sign in forms exactly like a standard sign in form.

Some sign in forms are multi-stage where login process is split across several forms. For example, many online banks are multi-stage sign in forms. In the first stage, the user enters a username and their browser must acquire a cookie from the bank. If the user does not already have a cookie from a previous session, the user must enter a second authentication factor such responding to a text message with a unique code or entering the code from a hardware token. Next, on a second form on a separate page the user enters a password.

In cases where 1Password is confused by multiple stage forms, the work around for this type of site is to simply make two separately named entries in 1Password. For example, the first entry would contain the username and the second entry would contain the password. The user must go through the full sign in process the first time to received a cookie from the bank by completing the two-factor authentication process and has create a 1Password entry for each step in the form. Each subsequent login to the bank will be treated like all other sites and can be automated with the auto-login and auto-submit features.

Here is a small laundry list of other features I regularly use and appreciate about 1Password.

  • General form saving support. 1Password can save and replay many kind of web forms, which is a useful feature if you find yourself filling out the same information over and over again.
  • Support for “identities” where the application stores commonly used bits of information such as name, email, phone number and can populate this information into many types of forms with little effort.
  • Basic anti-phishing protection since by default 1Password will only post usernames, passwords, and other forms back to the same domain name as the original.
  • The application can generate random passwords with several different templates that will satisfy most password requirements.
  • In addition to usernames, passwords, forms and identities, 1Password also supports encrypted notes.
  • The Mac OS X desktop application will sync over the local wired network and WiFi for iOS devices
  • 1Password will sync with Dropbox for all desktop and mobile applications including Windows and Android

Limitations of 1 Password

There are several important limitations with 1Password. The application cannot handle login forms built with Adobe Flash. Previous generations of 1Password supported login forms with HTTP basic authentication, however the new plugin architecture for Safari and Chrome do not offer support for HTTP basic. AgileBits says it is working on a solution for Firefox.

The features of the Windows version of 1Password are not quite yet on part with the Mac, for example it only supports 32-bit Internet Explorer, 32-bit Firefox, Chrome, and Safari. This said that covers most browsers that user’s need.

Pricing

1Password for Mac and 1Password for Windows is $49.99, 1Password Pro is $14.95 is available for iPhone, iPad, and iPod touch.

1Password Bookmarklet Gone Missing

If you are a frequent 1Password user, particularly on iOS devices, you may have noticed that AgileBits discontinued support for the 1Password bookmarklet, which was the best option for integrating with Mobile Safari rather than the integrated browser in the application. Fortunately, Kevin Yank and * have produced a working 1Password bookmarklet. I have reproduced it here:

javascript:window.location='onepassword://'+window.location.href.substring(window.location.href.indexOf('//')+2)

You should follow me on Twitter.

The World is Not Flat and Neither Are Social Networks

Now that I and the rest of the Internet has grown accustomed to Google Plus and Facebook’s most recent friend categorization features, I thought it was time to revisit and revise a previously unpublished piece of mine. Take a moment and think about your friends, family, colleagues, friends of friends, acquaintances, and members of the same social club. These six groups could comprise a large part, but certainly not all, of the people that you know. You may also have extended family, classmates, common members of sports teams, religious associations, and the familiar strangers you recognize, but don’t know their names. To further complicate matters, the people in these groups often change over time as we move through life. How we conduct ourselves depends on the situation. It is highly unlikely that you act the same way around your grandmother as you do at a party with your friends and people do not expect you to act the same way. Your friends, work colleagues, and extended family do not all know each other and I suspect that in many cases you would like to keep it that way. For this reason, it seems odd to expect that our interactions in online social networks would be any different.

I had the final word in Erica Naone’s Technology Review article Can Google Get Social Networking Right?. Naone’s piece argues that Google needed to dramatically improve its social offerings to compete against Facebook. She asked me to comment on Google’s social services such as Buzz and Profiles and how they might interact with user’s search history. It is interesting to see how much the discussion has changed since the article appeared. Disclosure: I worked as an engineering intern on Google Accounts during 2005-2006, but this was well before any of Google’s social options existed. I responded with a discussion of broad problems I saw with social network services. The following quote in the Naone’s article mostly reflects my statements, although the quote makes it appear that I am singling out Facebook for criticism, which misses the point that I think this is a fundamental problem across many social networks.

“Facebook, meanwhile, has its own problems, and some of these could turn out to be opportunities for Google. Ben Gross, an expert in online identity, notes that Facebook and other social networks don’t accurately differentiate between people’s social connections, making their social graph information less valuable to users and advertisers. For example, social networks tend to put all of a user’s connections into a single group of “friends,” and expect users to manage complex privacy settings to sort out family, work connections, and bar buddies. “Social network services should not assume that networks are flat, or that people are willing to put in the effort to articulate these networks or that they even want to,” he says.”

My full response from which the quote was taken follows below. I fixed a few typos, but it is otherwise unedited.

“I see several consistent problems with many of the social network services. First, they often unify disparate social networks in ways that do not match people’s actual experience and may not even make sense to them. In order to have a real representation of people’s social networks, they would have to fully articulate these networks to the service, which is a pretty unnatural thing to do. For many people the edges of the network shift regularly. Most social network services do not make it easy to maintain multiple independent networks on the service. It is common for people to maintain independent social networks, where individuals may not want the networks unified and people may not even care or wish to know about the other networks. For example, one’s extended family vs. one’s work colleagues vs. one’s friends they have brunch with on the weekend. The idea that there is a single flat network is sort of ridiculous.

I often hear people say that people who want to maintain independent identities or networks are somehow up to no good. I have interviewed quite a few people about this topic for my dissertation. It’s clear that people’s lives are complicated and their identifiers and networks reflect this. If you think about it, it is not at all strange for someone to want to separate their work life, from their family life, from their friend, or all manner of combinations. The boundaries of these relationships shift and behaviors vary widely. Social network services should not assume that networks are flat, that people are willing to put in the effort to articulate these networks, or that they even want to. Also for many people, they may have portions of their network that they are connected to online and therefore the online representation of their network may be very skewed. Even if people are connected to multiple networks online, they may use different social network services for different social networks. For example, it is not unusual for people to primarily have email conversations with some connections, use AIM for others, Google Talk for others, SMS for another group, and Facebook for yet another. Each service would be missing the chunk of connections for the other service.”

You need context to create a meaningful representation of a person’s social network. To make matters worse, that context shifts constantly as do peoples social relations, particularly those with whom we have weak connections. This is why people often see online social network representations as a cartoonish view of their own complex and ever changing social worlds. This is not a new revelation about social relations. William James published the following in 1890.

Properly speaking, a man has as many social selves as there are individuals who recognize him and carry an image of him in their mind. To wound any one of these his images is to wound him. But as the individuals who carry the images fall naturally into classes, we may practically say that he has as many different social selves as there are distinct groups of persons about whose opinion he cares. He generally shows a different side of himself to each of these different groups. Many a youth who is demure enough before his parents and teachers, swears and swaggers like a pirate among his ‘tough’ young friends. We do not show ourselves to our children as to our club-companions, to our customers as to the laborers we employ, to our own masters and employers as to our intimate friends. From this there results what practically is a division of the man into several selves; and this may be a discordant splitting, as where one is afraid to let one set of his acquaintances know him as he is elsewhere; or it may be a perfectly harmonious division of labor, as where one tender to his children is stern to the soldiers or prisoners under his command.

It is important to recognize that forcing people interact with their social relations as a flat network has many undesirable consequences. Figuring out how to restore a more natural balance to social relations is a grand challenge for social networks. People we think of as friends, enemies, and acquaintances change over time as friendships intensify and cool and we move through life phases. Also, complete visibility in networks is not always desirable or healthy. When we remove people’s choice to disclose their relationships and group memberships we strip them of something that is fundamentally human. We provide people with only one option for presenting themselves at a time denies them an important means of self-expression that is also fundamentally human.

I find it heartening to see how much has improved over the last year as both Google Plus and Facebook have dramatically improved the situation in allowing us more options to interact naturally with different social spheres. Framing choices about self presentation as choices about privacy misses the point that the issue is usually about context. When social networks lack context, it forces people to articulate everyone that should be included or excluded from a particular interaction. In these cases, the cognitive overhead of potentially making this judgement for each interaction is staggeringly high. Unless you are a public figure, you likely never need to decide if what you say is appropriate or even remotely interesting to someone you went to grade school with, someone you went to college with, a work colleague, your aunt, your next door neighbor, and a dear friend. We should not force people to work this hard unnecessarily.

References

danah michele boyd. Friendster and publicly articulated social networking. In CHI ‘04 extended abstracts on Human factors in computing systems, pages 1279–1282, New York, NY, USA, 2004. ACM. Articulated Social Networks: An Ethnographic Study of Friendster

Erving Goffman. Presentation of Self in Everyday Life. Anchor Books, New York, 1959.

Francesca Grippa, Antonio Zilli, Robert Laubacher, and Peter A. Gloor. E-mail may not reflect the social network. In Proceedings of the North American Association for Computational Social and Organizational Science Conference, 2006.

Ido Guy, Michal Jacovi, Noga Meshulam, Inbal Ronen, and Elad Shahar. Public vs. private: Comparing public social network information with email. In CSCW ‘08: Proceedings of the ACM 2008 conference on Computer supported cooperative work, pages 393–402, New York, NY, USA, 2008. ACM

Kai Fischbach, Peter A. Gloor, and Detlef Schoder. Analysis of informal communication networks – a case study. Business & Information Systems Engineering, 1:140–149, 2009.

William James. The Principles of Psychology, volume 1. Henry Holt & Co., 1890

Hat tip to Gaurav Mishra whose similar titled article The World is Not Flat and Neither is the Social Web (site is currently offline), from 2008 I found after I finished writing this post.

You should follow me on Twitter.

Tracking, Geolocation and Digital Exhaust

You are unique… In so many ways…

The accounting systems on which modern society depends are surveillance systems when viewed with another lens. All administrative, financial, logistics, public heath, and intelligence systems rely on the ability to track people, objects, and data. Efficiency and effectiveness in tracking have been greatly aided by improvements in data analysis, computational capabilities, and greater aggregations of data.

Advances in social network analysis, traffic analysis, fingerprinting, profiling, de-anonymization/re-identification, and behavioral modeling techniques have all contributed to better tracking capabilities. In addition, modern technological artifacts typically contain one or more unique hardware device identifiers. These identifiers—particularly in mobile devices, but also RFIDs, and soon Intelligent Vehicle-Highway Systems—are widespread, but also effectively unmodifiable and relatively unknown to most of their owners. For example, with mobile devices, each network interface (cellular, Bluetooth, WiFi) requires a minimum of one unique hardware identifier—all uniquely trackable. One hand, aggregating these unique identifiers allows services like Google, Skyhook, and others to associate geolocation data with WiFi access points and provide useful services. On the other hand, Samy Kamkar’s work described in Hack pinpoints where you live: How I met your girlfriend shows the potentially awkward and invasive side effects.

Individuals generate transactional data from common interactions offline such as card key systems and nearly every online transaction. Improvements in techniques to correlate disparate data as well as techniques to analyze the unique characteristics of software, hardware, network traffic to form a fingerprint is frequently unique. For example, a large-scale analysis of web browsers from the Panopticlick project showed that over 90% of seemingly common consumer configurations were effectively unique. IP geolocation data can be used to increase security as with Detecting Malice with ModSecurity: GeoLocation Data or it can be used in ways that are quite Creepy.

Another major shift is the widespread collection and aggregation of geolocation information from mobile devices. Location can be a highly unique identifier, even if the mobile device changes. Philippe Golle and Kurt Partridge show that two data points sampled during the day—one at home and one at work are enough to uniquely identify many individuals, even in anonymized data. Geolocation data can also reveal significant information about the people spend time with and a view of their social network. Jeff Jonas sums this up well in Your Movements Speak for Themselves: Space-Time Travel Data is Analytic Super-Food! In a sense the mobile phone has caused an enormous increase in uniquely identifiable data that can be used for tracking.

An average person now generates a constant stream of geolocation data that is collected by mobile carriers. Geolocation information is generated from cellular triangulation, geolocated IP addresses, and integrated GPS units, which deliver down to 10 meter accuracy. Geolocated mobile transaction data aggregated across multiple carriers is increasingly available for commercial use. It is possible to accurately track large numbers of individuals in constrained environments simply by sniffing the ITMI (temporary ID) as Path Intelligence does in mall, although they could sniff the IMEI just as easily, but they say they do not to protect privacy. Still, large-scale analysis of geolocation data is in its infancy. ReadWriteWeb describes how Developers Can Now Access Locations of 250 Million Phones Across U.S. Carriers

Tracking technologies—particularly when combined with geolocation information—have matured far beyond tracking individuals and are rapidly becoming capable of tracking groups and larger populations, which could be applied to entire enterprises or political organizations. Tools and techniques have made it feasible to correlate geolocation information, commercially aggregated profiles of online use, digital fingerprints, and offline transactional data. In addition, analysis of current anonymization techniques has repeatedly shown that simply adding another source of data is enough to re-identify a large percentage of the population. The Spatial Law and Policy blog is doing a nice job of tracking the policy implications of geolocation data.

The immense potential value of geolocation and other tracking data may well provide enough incentive for it to be used in ways counter to our own interests. Potential threats for misuse of the data need to be taken into account when designing systems. For example, what is the value of highly accurate logistical data about a US corporation derived from geolocation data and social network analysis to a foreign industrial competitor? Even a small amount of data that allowed a rudimentary analysis of external individuals meeting with internal high-level executives would be a worthwhile target. Similarly, both foreign industrial interests and foreign states may be willing to spend significant resources to acquire details on the movements and meetings of political parties.

More broadly I have been thinking about the question—What does it mean for a third-party to acquire better logistics about an organization than the organization has itself? What are the policy implications when and if these tracking tools are deployed in places without the rule of law, stable transitions of government, and low levels of corruption that we assume in the US? Could changes in the design and implementation of these systems mitigate the risks outlined? For example, should these design changes include internal controls, data scrubbing capabilities, and user interfaces that more clearly indicate a big picture of what data is being given off. Are there behavioral strategies that would reduce risks? To what extent can user education reduce risk?

You should follow me on Twitter.

Data Evaporation and the Security of Online Identities

Disappearing Data

What happens to our data when we are gone? What happens to us, when our data is gone? Does any of this missing data make us vulnerable? These questions that once seemed theoretical are increasingly relevant to our everyday lives. The consequences include not only the potential for lost communications, but also lost data in cloud services, and risk for security breaches for individuals and businesses alike.

We all understand that data deteriorates along with the physical media it is stored on–photographs fade and hard disks crash. This is why we have backups, or at least should have them. The problem is, unfortunately, not so simple these days as much of our data in the cloud depends on multiple systems and services acting in concert to exist. This means that data may disappear for reasons independent of the physical media, even with backups and replication.

I think evaporation is a useful analogy for describing the complex array of factors that cause data to disappear–including services going out of business, enforced retention policies, missed subscription payments, malicious deletion, and loss due to system migrations. One new problem is that the loss of modern data often includes not only documents and media on file systems, but also accounts and online identities.

Lost Data = Lost Access = Lost Identities

It is not a stretch to say our online identities are now essential for daily communication. As part of my dissertation research, I began to investigate the lifecycle–selection, increased use, decreased use, discontinuation, and points in between–of online identifiers including email addresses, instant messenger IDs, and social network services. I was particularly interested in what caused people to stop using their identifiers and if it was by choice. I found that often people lost access to identifiers for reasons out of their control, such as account lockouts, account inactivity, and failure to renew subscriptions. There is often a limited window of time before that data begins to evaporate due to account inactivity or missed payments for a service.

I began to look at the policies from major service providers related to inactive accounts. The policies I found were conflicting, inconsistently presented and followed, and are evolving rapidly. Email services tend to mark accounts inactive, while social networks do not. Paid email accounts do not have activity requirements.

Here are some of the policies from large providers of webmail and other services:

  • AOL: May mark free email account as inactive after 30 days and data may be deleted.
  • Gmail: Marks account as inactive after six months. Inactive accounts may still receive email. After nine months of inactivity, addresses may be deleted. Deleted addresses are not recycled or recoverable.
  • Hotmail: Microsoft says free Hotmail accounts will become inactive after 270 days or if you do not log in for 10 days after creating the account. Inactive accounts will not receive email. Account names may be deleted after 360 days of inactivity and Window’s Live IDs may be deleted after 365 days of inactivity. I also found conflicting documents on the – Microsoft site that said Hotmail accounts might be marked inactive after 30 days or 120 days of not logging in.
  • Yahoo: Deactivates free email accounts after four months. After this time, accounts may be reactivated, however any existing email is deleted and cannot be recovered.

Security and Recycled Identifiers

Depending on the circumstances, services may recycle expired accounts. This means that old identifiers may have new owners. The consequences may be much more than needing a new email address after forgetting to renew a domain name or the loss of a loved one’s letters after an account becomes inactive. There are serious security and privacy implications ranging from potential identity theft to corporate espionage.

If your old email address ends up with a new owner, that new owner will receive any email that was once destined for you. Why is this a problem? Suppose that email address was listed as the primary address or the recovery address for another account. Most systems send either one-time links to reset passwords, or worse, the password in plain text to the email primary or recovery email address. Unfortunately, people tend to reuse passwords across accounts. It is also not uncommon for people to list the older email address as the recovery address for a newer email account, meaning it would be possible to reset the password for a new account as well. Gaining access to an individual’s primary email account is the key to gaining access to most other accounts.

This is a not a theoretical problem. In 2009, Twitter’s internal systems were compromised when an attacker systematically evaluated Twitter employee’s personal accounts looking for potential points of access. The attacker realized that one employee registered a Gmail account using a Hotmail account that had since been marked inactive.

Hotmail recycled the Twitter employee’s account as it had been inactive more than a year and so the attacker simply registered the old username and then used it to reset the current Gmail password. The attacker then found messages in the Gmail account that contained plain text passwords and correctly guessed that the password had also been the Gmail password and simply reset the password to the old password to remain unnoticed. The hacker then used his access to the Gmail account and passwords to compromise other personal accounts of the employee and then those of other employees. One compromise led to another and eventually the hacker gained access to internal Twitter systems. He downloaded hundreds of internal documents, posted screen shots proving his exploits and released more than 300 internal documents to Techcrunch.

Domain Names

The rules and policies under which domain names expire and may be transferred to other parties are complex and vary widely–both by registrar, TLD, and ccTLD–but in general this is not much more than two months and after two to three months the domain will be resold. Here is a brief overview to give you a sense of the time frame and the complications related to expiring domain names.

When the owner of a domain fails to pay, the domain is typically assigned an “Expired” status usually lasting between 30 and 45 days. During this time the domain is usually renewable, but may not be accessible or transferable. Afterwards the domain enters what is known as the Redemption Grace Period (RGP), which is 30 days. Individual details are removed from the WHOIS database and the DNS are deleted so the domain is inaccessible. During the RGP, no edits or transfers are allowed, although the domain may be restored by paying the registrar a fee of $100-$250 USD. After this time, the domain is assigned a “Pending Delete” status, which lasts for five days. At the end of this period, the domain is generally either placed up for auction or released to the general registration pool.

Once a domain is reregistered, the new domain owner may create addresses and Web pages that match the old ones. Domains of defunct businesses may have potentially hosted many email accounts. As with the Twitter breach, these accounts could potentially lead to the compromise of other accounts.

Risk Analysis

The following are some risks to consider, and a few thoughts on how to mitigate those risks.

Potential Risks

  • A complex web of interlocking accounts and systems may affect your risk of a security breach.
  • Do not disregard the risk of “low value” accounts, as they may allow access to more sensitive accounts.
  • Inactive accounts may introduce as much liability as accounts with weak passwords.
  • Best practices may demand a clear separation of business and personal accounts and data, but there are often lapses in the real world.

Suggestions to Mitigate Risk

  • Document usernames and recovery addresses for each account.
  • Set recurring calendar tasks for account renewal payments and to log into infrequently used accounts.
  • Consider purchasing a subscription for infrequently used email accounts used as recovery addresses.
  • Consider using a password manager to generate and store unique strong passwords for each site.
  • Services should never send passwords in plain text.
  • Services should not allow password changes to recently used passwords.
  • Services should offer more notification options about accounts with a pending inactive or deleted status.

OpenID Trends: Improved Usability and Increased Centralization

The OpenID authentication framework is the most well known of the federated user-centric identity systems. OpenID has effectively become the first commonplace single sign-on option for the Internet at large. Most sizeable Web-based service providers such as AOL, Google, Facebook, Microsoft, MySpace and Yahoo! have integrated at least limited support for OpenID. Services often run OpenID authentication side-by-side with their in-house developed authentication or as an alternate method of authentication. Once the user has authenticated via their OpenID provider, their credentials can be used to automatically sign the user into other services previously linked to their OpenID. Widespread support has made OpenID the de-facto authentication mechanism for low-value transactions on the Web.

Two quick and somewhat loose definitions. An OpenID Provider is part of the backend of an identity system that offers an authentication services to other systems known as OpenID Relying Parties. Say your favorite blog requires that you log into Google to verify your identity to comment on a post. In this case Google would be the OpenID Provider (Identity Provider is the generic term) and your favorite blog would be the Relying Party since it depends on Google to handle the details of authenticating you so you can post.

Usability

OpenID has made great improvements in usability in the last several years. Many people found early OpenID implementations confusing. Users needed to first enter the URL that served as their OpenID identifier such as http://username.openidprovider.com. Without an existing cookie, users would have to enter their email address and password to complete the authentication. In addition, the users browser window was typically redirected to the OpenID provider’s site and then redirected back to the service they were trying to log into resulting in further confusion. Service providers found that the combination of URL-based identifiers and a login sequence differed from the entrenched standard of a username and password combination confused many people.

Each of these factors significantly reduced the usability of OpenID. However, OpenID specifications and implementations have evolved to mitigate and eliminate many of the usability problems. In many current deployments, users simply click on the logo their OpenID Provider (e.g., Google or Yahoo!) and then log in with familiar credentials without realizing the authentication is OpenID-based. One significant unsolved usability problem is that OpenID offers no support for Single Log Out. In the case of public or shared computers this situation is a significant security risk, as well as a usability problem, as subsequent users may find themselves signed in under the wrong user name when navigating to new sites.

User centric identity theoretically offers the end-user more control over his own identifiers, however in practice the amount of control is dependent on the amount of control the user has over the domain name or service of the OpenID URL. Users may maintain multiple OpenIDs and OpenIDs may be delegated. For example, an individual may wish to use a personal domain as an OpenID URL. The problem is this requires the skills to run the OpenID server as well as the overhead of maintaining and securing the server. There are two straightforward solutions to OpenID delegation, both of which require some technical facilities. The first–and most common–requires inserting a block of HTML containing the delegation commands on a Web page on the site being delegated to the OpenID Provider. The second requires adding an additional DNS CNAME for a host on the site that is being delegated to the OpenID Provider. Most individuals are highly unlikely to have this knowledge; the desire to obtain it, or even the knowledge that it exists.

Centralizing the Decentralized

OpenID was designed as a decentralized, federated, user-centric identity system. The OpenID infrastructure as a whole is decentralized. There are no dependencies on any single piece of hardware, software, service, individual, or company. The independent OpenID Foundation holds the intellectual property for the OpenID standards. The lack of dependencies removes the vulnerability of a catastrophic single point of failure.

I would argue that the common use cases for OpenID are increasingly centralized and realistic options for individuals to have any real control over their OpenIDs is decreasing. I recognize that some may argue with the last statement, but I would like to use a simple metric, which is the answer to this question: Can you take it with you? In the vast majority of common use cases, the answer is no. I would argue that the only viable way to have a true user-centric OpenID is to own a domain name and to have control over its DNS. The lack of end-user control does not mean the system functions any less efficiently, the opposite is quite likely true, but it does mean that it is not particularly user-centric.

In practice, OpenID appears to be heading towards greater centralization for Web-based authentication. Many services that offer OpenID authentication only accept authentication from a very limited set of OpenID providers. Services that accept OpenID authentication from any OpenID provider often place the general authentication in a less prominent location. Service providers have an incentive to limit authentication services they accept as it can significantly reduce risk and complexity and most users already have credentials from one of the major service providers. I believe this situation is not inherent to OpenID and would likely occur with any successful user-centric identity system. For example, Twitter does not support OpenID, rather it uses OAuth for both external authorization as well as authentication. Many services offer support for authentication via Twitter OAuth in the same interface as other providers that use OpenID.

Furthermore, most large OpenID enabled services are Identity Providers meaning they offer an authentication mechanism to other services. Most smaller OpenID enabled sites are OpenID Relying Parties meaning they accept authentication from OpenID Providers. OpenID Providers typically offer authentication services, but do not accept outside OpenID authentication themselves. Effectively, a few OpenID Providers serve many OpenID Relying Parties. Delegating the development and maintenance of user account management systems and password reset flows are benefits for offering authentication as an OpenID Relying Party. In addition these services gain the benefit of any advances in OpenID security and usability.

OpenID Increasingly Popular

In the close of my 2008 article: “The Promise and Problems of OpenID,” I wrote: “OpenID is clearly gaining in adoption and importance. Currently, OpenID is both too lightweight for enterprise identity management and too insecure for sites with financial or other highly sensitive data. Some of the current problems will be mitigated by OpenID extensions and new more secure mechanisms for OpenID authentication and improved phishing protection. Businesses, especially those with consumer Web-based services, would do well to familiarize themselves with the technology and pay attention to its progress.”

When people authenticate to poplar services via OpenID without having to even know they are using it, this indicates OpenID is becoming a mainstream authentication infrastructure. The protocol is evolving rapidly and it appears that common implementations in the future may be hybrids of OpenID and the OAuth authorization protocol. Still, there are substantial costs to implementing, managing, securing, and supporting user account management systems. Offering authentication as an OpenID Relying Party can potentially significantly reduce these costs and the friction for new account signups for people with existing OpenIDs. However, this reduction in cost comes with a loss of control over user account information that must be weighed against the benefits. Even though long-term stability for OpenID may be a ways off, it is clearly a critical technology to monitor.

* This article originally appeared as OpenID Trends: Improved Usability and Increased Centralization in the August 2010 issue of Messaging News.

Federal Digital Identity Proposal Lacking in Usability

The White House announced The National Strategy for Trusted Identities in Cyberspace (NSTIC) proposal and a NSTIC Fact Sheet on The White House blog. The NSTIC proposal (PDF) describes a plan to implement a federated online identity system with strong authentication. The document states the President expects to sign a final version in October 2010 and the strategy will likely significantly influence the government’s identity management efforts. In this post I will discuss the usability aspects of the proposal.

One of my primary concerns is that the proposal barely mentions usability factors within the identity system, even though they will be crucial for gaining public acceptance and critical to its effectiveness. Researchers studying usability and security have repeatedly shown that people are likely to resist or circumvent security in a system with poor usability. One of the guiding principles for the strategy is that “Identity Solutions will be Cost-Effective and Easy To Use.” However, the section is only a half a page long and largely discusses the potential benefit derived from reducing the number of username and password combinations individuals must remember. The section includes a few sentences that state that the new identity system should take advantage of as many existing widely used of infrastructure as possible and that service providers should conduct usability studies. The section leaves the reader with the impression that usability in actually unimportant even the proposal lists ease of use as listed as a major goal.

I would argue that most modern identity systems have been overly complicated for individuals to use and have required too much cognitive overhead for routine transactions. This is in no small part why it has been so difficult to move beyond the much-criticized username and password combination for user authentication. In order for a new identity system to provide significant improvements in reliability, assurance, security, and privacy, we must make significant improvements in usability. This is not a new problem. In his 1992 paper Observing Reusable Password Choices, Eugene Spafford, published research detailing problems with reusing weak passwords on multiple sites (Spafford 1992). In their 1999 paper Users are not the enemy, Adams and Sasse investigated compliance with security policies and in particular password management policies in several companies and found that compliance rates were substantially lower when policies conflicted with or prevented common work practices. In their 2006 paper Why Phishing Works, Rachna Dhamija and colleagues showed how individuals consistently fail to detect fraudulent web sites even when security indicators provided notifications that something was amiss.

Another component of usability is accessibility. The proposal made no mention of how the new identity systems will accommodate the less technically savvy and less able-bodied segments of the population. The strategy should consider those with limited vision, limited mobility, or other disabilities. The American Foundation for the Blind provides the following statistics of adult Americans with limited vision. Ages 18-44 8.0 million, ages 45-64 10.7 million, ages 65-74 2.8 million, ages 75 and older 3.7 million. This is a total of 25.2 million adults who have trouble seeing even with glasses or contact lenses.

The proposal promotes a federated and user-centric identity system. The common definition of a federated identity system is one that allows one service to accept authentication from another service. User-centric identity systems allow individuals some measure of control over their identities–typically a username or other unique identifier–and the attributes–age, email address, citizenship–attached to that identity. The usability problems for federated identities, user-centric-identities, and attribute exchange are neither trivial nor solved. OpenID is arguably the first widely adopted federated authentication mechanism for the internet with a user-centric model.

The history of OpenID is an excellent illustration of the usability challenges. Early incarnations required that users enter their OpenID URL to begin the authentication process. Their browser session was then redirected to the OpenID provider they used for authentication, which was often a different domain than the one they were attempting to log in to. Finally, after a successful authentication, the user would be redirected back to the original site. The change from the traditional username and password combination combined with a confusing authentication flow with multiple redirects left many users confused. OpenID specifications and implementations have evolved to mitigate and eliminate many of the usability problems. In many current deployments, most users will not even realize they are using OpenID for authentication, as they simply will click on a Google or Yahoo logo and then log in with familiar credentials.

This post is a revised version of the usability portion of the comments I submitted to the official NSTIC submission site. I based the critique on research from my dissertation Online Identifiers in Everyday Life, where I examined at the ways that social, technical and policy factors affect individual’s behavior with online identifiers.

The State of User Tracking and the Impossibility of Anonymizing Data

What we think is reasonable, commonplace, or even possible in terms of protecting or violating online privacy shifts constantly. Recent developments in tools and techniques for tracking online behavior and identifying individuals from supposedly anonymized data sets should cause us to reevaluate what is possible.

Katherine McKinley of iSEC Partners published a detailed analysis of how popular browsers and browser extensions handle cookies and other methods of local data storage used for tracking users in her December, 2008 paper Cleaning Up after Cookies (PDF). McKinley tested the ability for browsers and extensions to clear the private data as well as “private browsing” features. She found that most browsers attempted to clear previous stored private data, but often left some data accessible. She found that Adobe Flash did not attempt to remove this data and in fact stored it in such a way that it circumvented most privacy protections offered by browsers. iSEC Partners created an online version of the test used in the article to allow individuals to test their own configurations. It is available at Breadcrumbs Tracker.

The August, 2009 paper Flash Cookies and Privacy by Ashkan Soltani and Shannon Canty and Quentin Mayo and Lauren Thomas and Chris Jay Hoofnagle at UC Berkeley focuses directly on the privacy issues related to Flash Cookies. The authors survey the top 100 web sites according to QuantCast in July of 2009 and found that more than half of them used Flash cookies. The authors note that unlike standard HTTP cookies, Flash cookies do not have an expiration date and are stored in a different location on the file system that is harder to find. Most cookie management tools will not delete these type of cookies and they remain in place even when private browsing mode in enabled. The authors found that Flash cookies were frequently employed to track users that had explicitly attempted to prevent cookie tracking by using the Flash cookie to regenerate a HTTP cookie that had been deleted.

Most online services use multiple tracking services for analytics, performance monitoring, and usability analysis. A mixture of JavaScript-based tracking codes and cookies is the most common method for user tracking. The paper On the Leakage of Personally Identifiable Information Via Online Social Networks (PDF) presented at the ACM Workshop on Online Social Networks by Balachander Krishnamurthy and Craig Wills describes the techniques used by advertising firms and social networks services to track users and the types of information they release. The authors studied information leakage from twelve online social networks and found that the bulk of user information is released through HTTP headers and third-party cookies.

In his post Netflix’s Impending (But Still Avoidable) Multi-Million Dollar Privacy Blunder on the Freedom to Tinker blog, Paul Ohm discusses his 2009 publication Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization in the context of the announcement for the second Netflix prize to improve the accuracy of Netflix predictions. Ohm argues that it is not possible to anonymize the data and that it is irresponsible and possibly illegal to release it. Netflix released a half a million anonymized subscriber records for analysis in the original contest. The one million dollar prize resulted in significant numbers of researchers entering the contest.

Soon after the Netflix released the records, researchers Arvind Narayanan and Vitaly Shmatikov proved they could identify individual subscribers by combining the Netflix data with other databases. Their publication Robust De-anonymization of Large Sparse Datasets (PDF) presented at the 2008 IEEE Symposium on Security and Privacy describes How to Break Anonymity of the Netflix Prize Dataset. Narayanan and Shmatikov continued their research and demonstrated de-anonymizing social networks such as Twitter in De-Anonymizing Social Networks (PDF) (paper FAQ) a paper presented at the 2009 IEEE Symposium on Security and Privacy. Ohm, reminds the readers about the scandal that occurred in 2006 when AOL researchers Greg Pass, Abdur Chowdhury, and Cayley Torgeson presented their paper A Picture of Search (PDF) at the first International Conference on Scalable Information Systems. The authors released an anonymized dataset they analyzed in the paper that included more than six hundred thousand AOL users, some individuals were subsequently individually identified.

Carnegie Mellon University professor Latanya Sweeney developed the foundation for much of the current work on de-anonymizing data sets. Her paper All the Data on All The People (only abstract is publicly available) published in 2000, showed that it was possible to identify individuals in US Census data using only a few variables. The paper argues that it is possible to identify almost 90% of the US population using only full date of birth, gender, and ZIP code.

Alessandro Acquisti and Ralph Gross (no relation) presented their research on Predicting Social Security Numbers from Public Data. The authors demonstrate that it is possible to automate the process of predicting an individual’s Social Security Number (SSN) for large portions of the population using public information. The information used to create predictions is easily harvested from social networking sites, voter registration records, and commercial databases. Aquiusti and Gross argue that we must reconsider our policies around the use of SSNs, which are commonly used for authentication and frequently abused by identity thieves.

* This article originally appeared as The State of User Tracking and the Impossibility of Anonymizing Data in my Messaging News “On Message Column.”

You should follow me on Twitter.