Disappearing Data

What happens to our data when we are gone? What happens to us when our data is gone? Does any of this missing data make us vulnerable? These questions once seemed theoretical, but they are increasingly relevant to our everyday lives. The consequences include not only lost communications, but also lost data in cloud services and the risk of security breaches for individuals and businesses alike.

We all understand that data deteriorates along with the physical media it is stored on: photographs fade and hard disks crash. This is why we have backups, or at least should have them. Unfortunately, the problem is no longer so simple. Much of our data in the cloud exists only as long as multiple systems and services keep working in concert. This means that data may disappear for reasons independent of the physical media, even with backups and replication.

I think evaporation is a useful analogy for describing the complex array of factors that cause data to disappear–including services going out of business, enforced retention policies, missed subscription payments, malicious deletion, and loss due to system migrations. One new problem is that the loss of modern data often includes not only documents and media on file systems, but also accounts and online identities.

Lost Data = Lost Access = Lost Identities

It is not a stretch to say our online identities are now essential for daily communication. As part of my dissertation research, I began to investigate the lifecycle (selection, increased use, decreased use, discontinuation, and points in between) of online identifiers including email addresses, instant messenger IDs, and social network services. I was particularly interested in what caused people to stop using their identifiers and whether it was by choice. I found that people often lost access to identifiers for reasons out of their control, such as account lockouts, account inactivity, and failure to renew subscriptions. There is often only a limited window of time before the data tied to an identifier begins to evaporate because of account inactivity or missed payments for a service.

I began to look at the policies from major service providers related to inactive accounts. The policies I found were conflicting, inconsistently presented, inconsistently followed, and rapidly evolving. Email services tend to mark accounts inactive, while social networks do not. Paid email accounts do not have activity requirements.

Here are some of the policies from large providers of webmail and other services:

  • AOL: May mark free email accounts as inactive after 30 days, and data may be deleted.
  • Gmail: Marks account as inactive after six months. Inactive accounts may still receive email. After nine months of inactivity, addresses may be deleted. Deleted addresses are not recycled or recoverable.
  • Hotmail: Microsoft says free Hotmail accounts will become inactive after 270 days or if you do not log in for 10 days after creating the account. Inactive accounts will not receive email. Account names may be deleted after 360 days of inactivity and Windows Live IDs may be deleted after 365 days of inactivity. I also found conflicting documents on the Microsoft site that said Hotmail accounts might be marked inactive after 30 days or 120 days of not logging in.
  • Yahoo: Deactivates free email accounts after four months. After this time, accounts may be reactivated, however any existing email is deleted and cannot be recovered.
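
To get a feel for how short these windows are, the first threshold from each policy above can be turned into a date calculation. The following is a minimal sketch, not a statement of any provider's current rules: the function names are hypothetical, months are approximated as 30 days, and the numbers are only those quoted in the list.

```python
from datetime import date, timedelta

# Days of inactivity before an account may be marked inactive, transcribed
# from the list above. Months are approximated as 30 days (an assumption).
# Providers change these policies often, so the values are illustrative only.
INACTIVITY_DAYS = {
    "aol": 30,
    "gmail": 6 * 30,
    "hotmail": 270,
    "yahoo": 4 * 30,
}


def inactive_on(provider: str, last_login: date) -> date:
    """Return the date on which the account may be marked inactive."""
    return last_login + timedelta(days=INACTIVITY_DAYS[provider])


def days_remaining(provider: str, last_login: date, today: date) -> int:
    """Days left before the inactivity window closes (negative if passed)."""
    return (inactive_on(provider, last_login) - today).days
```

For example, days_remaining("hotmail", last_login, date.today()) gives the number of days left to log in before the 270-day threshold is reached.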

Security and Recycled Identifiers

Depending on the circumstances, services may recycle expired accounts. This means that old identifiers may have new owners. The consequences may be much more than needing a new email address after forgetting to renew a domain name or the loss of a loved one’s letters after an account becomes inactive. There are serious security and privacy implications ranging from potential identity theft to corporate espionage.

If your old email address ends up with a new owner, that new owner will receive any email that was once destined for you. Why is this a problem? Suppose that email address was listed as the primary address or the recovery address for another account. Most systems send either one-time links to reset passwords or, worse, the password in plain text to the primary or recovery email address. Unfortunately, people tend to reuse passwords across accounts. It is also not uncommon for people to list an older email address as the recovery address for a newer email account, meaning it would be possible to reset the password for the newer account as well. Gaining access to an individual’s primary email account is the key to gaining access to most other accounts.
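
The chain of recovery addresses described above can be modeled as a small graph and searched. The sketch below is illustrative only: the account names and the RECOVERY inventory are hypothetical, and it assumes that control of a recovery address is enough to reset any account that lists it.

```python
from collections import deque

# Hypothetical inventory: each account maps to the address that can reset it
# (its recovery address), or None if it has no recovery address.
RECOVERY = {
    "old@hotmail.example": None,
    "new@gmail.example": "old@hotmail.example",
    "bank": "new@gmail.example",
    "work-vpn": "new@gmail.example",
    "forum": "other@mail.example",
    "other@mail.example": None,
}


def exposed_accounts(recovery: dict, compromised: str) -> list:
    """Return every account reachable by password reset from `compromised`."""
    # Invert the mapping: recovery address -> accounts it can reset.
    resets = {}
    for account, address in recovery.items():
        if address is not None:
            resets.setdefault(address, []).append(account)

    # Breadth-first search along the reset relationships.
    exposed, queue, seen = [], deque([compromised]), {compromised}
    while queue:
        current = queue.popleft()
        for account in resets.get(current, []):
            if account not in seen:
                seen.add(account)
                exposed.append(account)
                queue.append(account)
    return exposed
```

Calling exposed_accounts(RECOVERY, "old@hotmail.example") lists the newer email account and everything that depends on it, which is why a single recycled address can matter far more than its apparent value.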

This is not a theoretical problem. In 2009, Twitter’s internal systems were compromised when an attacker systematically evaluated Twitter employees’ personal accounts looking for potential points of access. The attacker realized that one employee had registered a Gmail account using a Hotmail account that had since been marked inactive.

Because the Twitter employee’s Hotmail account had been inactive for more than a year, Hotmail had recycled it. The attacker simply registered the old username and used it to reset the current Gmail password. The attacker then found messages in the Gmail account that contained plain-text passwords, correctly guessed that one of them had also been the Gmail password, and reset the Gmail password to that old password to remain unnoticed. The attacker then used his access to the Gmail account and passwords to compromise other personal accounts of the employee, and then those of other employees. One compromise led to another, and eventually the attacker gained access to internal Twitter systems. He downloaded hundreds of internal documents, posted screenshots proving his exploits, and released more than 300 internal documents to TechCrunch.

Domain Names

The rules and policies under which domain names expire and may be transferred to other parties are complex and vary widely by registrar, TLD, and ccTLD. In general, though, the owner has not much more than two months to recover an expired domain, and after two to three months the domain will be resold. Here is a brief overview to give you a sense of the time frame and the complications related to expiring domain names.

When the owner of a domain fails to pay, the domain is typically assigned an “Expired” status, usually lasting between 30 and 45 days. During this time the domain is usually renewable, but may not be accessible or transferable. Afterwards the domain enters what is known as the Redemption Grace Period (RGP), which lasts 30 days. Individual details are removed from the WHOIS database and the DNS records are deleted, so the domain is inaccessible. During the RGP, no edits or transfers are allowed, although the domain may be restored by paying the registrar a fee of $100-$250 USD. After this time, the domain is assigned a “Pending Delete” status, which lasts for five days. At the end of this period, the domain is generally either placed up for auction or released to the general registration pool.
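
Adding up the phases above gives the overall window. The sketch below uses only the durations quoted in the preceding paragraph. The function name and the example start date are hypothetical, and real values vary by registrar and TLD.

```python
from datetime import date, timedelta

# Phase lengths in days as (name, shortest, longest), taken from the
# paragraph above. Real values vary by registrar and TLD.
PHASES = [
    ("Expired", 30, 45),
    ("Redemption Grace Period", 30, 30),
    ("Pending Delete", 5, 5),
]


def expiry_timeline(expired_on: date) -> list:
    """Return (phase, earliest_end, latest_end) for each phase in order."""
    timeline = []
    earliest = latest = expired_on
    for name, shortest, longest in PHASES:
        earliest += timedelta(days=shortest)
        latest += timedelta(days=longest)
        timeline.append((name, earliest, latest))
    return timeline


# Example with an arbitrary expiration date.
for name, earliest, latest in expiry_timeline(date(2024, 1, 1)):
    print(f"{name}: ends between {earliest} and {latest}")
```

Under these assumptions the domain becomes available to someone else between 65 and 80 days after it expires, and the last chance to redeem it comes five days before that.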

Once a domain is reregistered, the new domain owner may create addresses and Web pages that match the old ones. Domains of defunct businesses may once have hosted many email accounts. As with the Twitter breach, these accounts could lead to the compromise of other accounts.

Risk Analysis

The following are some risks to consider, and a few thoughts on how to mitigate those risks.

Potential Risks

  • A complex web of interlocking accounts and systems may affect your risk of a security breach.
  • Do not disregard the risk of “low value” accounts, as they may allow access to more sensitive accounts.
  • Inactive accounts may introduce as much liability as accounts with weak passwords.
  • Best practices may demand a clear separation of business and personal accounts and data, but there are often lapses in the real world.

Suggestions to Mitigate Risk

  • Document usernames and recovery addresses for each account.
  • Set recurring calendar tasks for account renewal payments and to log into infrequently used accounts.
  • Consider purchasing a subscription for infrequently used email accounts used as recovery addresses.
  • Consider using a password manager to generate and store unique strong passwords for each site.
  • Services should never send passwords in plain text.
  • Services should not allow password changes to recently used passwords.
  • Services should offer more notification options about accounts with a pending inactive or deleted status.
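
The suggestion that services should refuse recently used passwords can be sketched as follows. This is a minimal illustration rather than a production design: the function names, the history length of five, and the iteration count are all assumptions, and old passwords are kept only as salted hashes so that the history itself never stores plain text.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your hardware
HISTORY_LEN = 5       # assumed number of previous passwords to remember


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def is_recently_used(password: str, history: list) -> bool:
    """Check a candidate against stored (salt, digest) pairs, newest last."""
    for salt, digest in history[-HISTORY_LEN:]:
        if hmac.compare_digest(hash_password(password, salt), digest):
            return True
    return False


def change_password(password: str, history: list) -> None:
    """Record a new password, rejecting any found in the recent history."""
    if is_recently_used(password, history):
        raise ValueError("password was used recently")
    salt = os.urandom(16)
    history.append((salt, hash_password(password, salt)))
    del history[:-HISTORY_LEN]  # keep only the most recent entries
```

A check like this would have made it harder for the attacker in the Twitter incident to set the Gmail password back to its earlier value and remain unnoticed.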