The World is Not Flat and Neither Are Social Networks

Now that I and the rest of the Internet has grown accustomed to Google Plus and Facebook’s most recent friend categorization features, I thought it was time to revisit and revise a previously unpublished piece of mine. Take a moment and think about your friends, family, colleagues, friends of friends, acquaintances, and members of the same social club. These six groups could comprise a large part, but certainly not all, of the people that you know. You may also have extended family, classmates, common members of sports teams, religious associations, and the familiar strangers you recognize, but don’t know their names. To further complicate matters, the people in these groups often change over time as we move through life. How we conduct ourselves depends on the situation. It is highly unlikely that you act the same way around your grandmother as you do at a party with your friends and people do not expect you to act the same way. Your friends, work colleagues, and extended family do not all know each other and I suspect that in many cases you would like to keep it that way. For this reason, it seems odd to expect that our interactions in online social networks would be any different.

I had the final word in Erica Naone’s Technology Review article Can Google Get Social Networking Right?. Naone’s piece argues that Google needed to dramatically improve its social offerings to compete against Facebook. She asked me to comment on Google’s social services such as Buzz and Profiles and how they might interact with user’s search history. It is interesting to see how much the discussion has changed since the article appeared. Disclosure: I worked as an engineering intern on Google Accounts during 2005-2006, but this was well before any of Google’s social options existed. I responded with a discussion of broad problems I saw with social network services. The following quote in the Naone’s article mostly reflects my statements, although the quote makes it appear that I am singling out Facebook for criticism, which misses the point that I think this is a fundamental problem across many social networks.

“Facebook, meanwhile, has its own problems, and some of these could turn out to be opportunities for Google. Ben Gross, an expert in online identity, notes that Facebook and other social networks don’t accurately differentiate between people’s social connections, making their social graph information less valuable to users and advertisers. For example, social networks tend to put all of a user’s connections into a single group of “friends,” and expect users to manage complex privacy settings to sort out family, work connections, and bar buddies. “Social network services should not assume that networks are flat, or that people are willing to put in the effort to articulate these networks or that they even want to,” he says.”

My full response from which the quote was taken follows below. I fixed a few typos, but it is otherwise unedited.

“I see several consistent problems with many of the social network services. First, they often unify disparate social networks in ways that do not match people’s actual experience and may not even make sense to them. In order to have a real representation of people’s social networks, they would have to fully articulate these networks to the service, which is a pretty unnatural thing to do. For many people the edges of the network shift regularly. Most social network services do not make it easy to maintain multiple independent networks on the service. It is common for people to maintain independent social networks, where individuals may not want the networks unified and people may not even care or wish to know about the other networks. For example, one’s extended family vs. one’s work colleagues vs. one’s friends they have brunch with on the weekend. The idea that there is a single flat network is sort of ridiculous.

I often hear people say that people who want to maintain independent identities or networks are somehow up to no good. I have interviewed quite a few people about this topic for my dissertation. It’s clear that people’s lives are complicated and their identifiers and networks reflect this. If you think about it, it is not at all strange for someone to want to separate their work life, from their family life, from their friend, or all manner of combinations. The boundaries of these relationships shift and behaviors vary widely. Social network services should not assume that networks are flat, that people are willing to put in the effort to articulate these networks, or that they even want to. Also for many people, they may have portions of their network that they are connected to online and therefore the online representation of their network may be very skewed. Even if people are connected to multiple networks online, they may use different social network services for different social networks. For example, it is not unusual for people to primarily have email conversations with some connections, use AIM for others, Google Talk for others, SMS for another group, and Facebook for yet another. Each service would be missing the chunk of connections for the other service.”

You need context to create a meaningful representation of a person’s social network. To make matters worse, that context shifts constantly as do peoples social relations, particularly those with whom we have weak connections. This is why people often see online social network representations as a cartoonish view of their own complex and ever changing social worlds. This is not a new revelation about social relations. William James published the following in 1890.

Properly speaking, a man has as many social selves as there are individuals who recognize him and carry an image of him in their mind. To wound any one of these his images is to wound him. But as the individuals who carry the images fall naturally into classes, we may practically say that he has as many different social selves as there are distinct groups of persons about whose opinion he cares. He generally shows a different side of himself to each of these different groups. Many a youth who is demure enough before his parents and teachers, swears and swaggers like a pirate among his ‘tough’ young friends. We do not show ourselves to our children as to our club-companions, to our customers as to the laborers we employ, to our own masters and employers as to our intimate friends. From this there results what practically is a division of the man into several selves; and this may be a discordant splitting, as where one is afraid to let one set of his acquaintances know him as he is elsewhere; or it may be a perfectly harmonious division of labor, as where one tender to his children is stern to the soldiers or prisoners under his command.

It is important to recognize that forcing people interact with their social relations as a flat network has many undesirable consequences. Figuring out how to restore a more natural balance to social relations is a grand challenge for social networks. People we think of as friends, enemies, and acquaintances change over time as friendships intensify and cool and we move through life phases. Also, complete visibility in networks is not always desirable or healthy. When we remove people’s choice to disclose their relationships and group memberships we strip them of something that is fundamentally human. We provide people with only one option for presenting themselves at a time denies them an important means of self-expression that is also fundamentally human.

I find it heartening to see how much has improved over the last year as both Google Plus and Facebook have dramatically improved the situation in allowing us more options to interact naturally with different social spheres. Framing choices about self presentation as choices about privacy misses the point that the issue is usually about context. When social networks lack context, it forces people to articulate everyone that should be included or excluded from a particular interaction. In these cases, the cognitive overhead of potentially making this judgement for each interaction is staggeringly high. Unless you are a public figure, you likely never need to decide if what you say is appropriate or even remotely interesting to someone you went to grade school with, someone you went to college with, a work colleague, your aunt, your next door neighbor, and a dear friend. We should not force people to work this hard unnecessarily.

References

danah michele boyd. Friendster and publicly articulated social networking. In CHI ‘04 extended abstracts on Human factors in computing systems, pages 1279–1282, New York, NY, USA, 2004. ACM. Articulated Social Networks: An Ethnographic Study of Friendster

Erving Goffman. Presentation of Self in Everyday Life. Anchor Books, New York, 1959.

Francesca Grippa, Antonio Zilli, Robert Laubacher, and Peter A. Gloor. E-mail may not reflect the social network. In Proceedings of the North American Association for Computational Social and Organizational Science Conference, 2006.

Ido Guy, Michal Jacovi, Noga Meshulam, Inbal Ronen, and Elad Shahar. Public vs. private: Comparing public social network information with email. In CSCW ‘08: Proceedings of the ACM 2008 conference on Computer supported cooperative work, pages 393–402, New York, NY, USA, 2008. ACM

Kai Fischbach, Peter A. Gloor, and Detlef Schoder. Analysis of informal communication networks – a case study. Business & Information Systems Engineering, 1:140–149, 2009.

William James. The Principles of Psychology, volume 1. Henry Holt & Co., 1890

Hat tip to Gaurav Mishra whose similar titled article The World is Not Flat and Neither is the Social Web (site is currently offline), from 2008 I found after I finished writing this post.

You should follow me on Twitter.

Tracking, Geolocation and Digital Exhaust

You are unique… In so many ways…

The accounting systems on which modern society depends are surveillance systems when viewed with another lens. All administrative, financial, logistics, public heath, and intelligence systems rely on the ability to track people, objects, and data. Efficiency and effectiveness in tracking have been greatly aided by improvements in data analysis, computational capabilities, and greater aggregations of data.

Advances in social network analysis, traffic analysis, fingerprinting, profiling, de-anonymization/re-identification, and behavioral modeling techniques have all contributed to better tracking capabilities. In addition, modern technological artifacts typically contain one or more unique hardware device identifiers. These identifiers—particularly in mobile devices, but also RFIDs, and soon Intelligent Vehicle-Highway Systems—are widespread, but also effectively unmodifiable and relatively unknown to most of their owners. For example, with mobile devices, each network interface (cellular, Bluetooth, WiFi) requires a minimum of one unique hardware identifier—all uniquely trackable. One hand, aggregating these unique identifiers allows services like Google, Skyhook, and others to associate geolocation data with WiFi access points and provide useful services. On the other hand, Samy Kamkar’s work described in Hack pinpoints where you live: How I met your girlfriend shows the potentially awkward and invasive side effects.

Individuals generate transactional data from common interactions offline such as card key systems and nearly every online transaction. Improvements in techniques to correlate disparate data as well as techniques to analyze the unique characteristics of software, hardware, network traffic to form a fingerprint is frequently unique. For example, a large-scale analysis of web browsers from the Panopticlick project showed that over 90% of seemingly common consumer configurations were effectively unique. IP geolocation data can be used to increase security as with Detecting Malice with ModSecurity: GeoLocation Data or it can be used in ways that are quite Creepy.

Another major shift is the widespread collection and aggregation of geolocation information from mobile devices. Location can be a highly unique identifier, even if the mobile device changes. Philippe Golle and Kurt Partridge show that two data points sampled during the day—one at home and one at work are enough to uniquely identify many individuals, even in anonymized data. Geolocation data can also reveal significant information about the people spend time with and a view of their social network. Jeff Jonas sums this up well in Your Movements Speak for Themselves: Space-Time Travel Data is Analytic Super-Food! In a sense the mobile phone has caused an enormous increase in uniquely identifiable data that can be used for tracking.

An average person now generates a constant stream of geolocation data that is collected by mobile carriers. Geolocation information is generated from cellular triangulation, geolocated IP addresses, and integrated GPS units, which deliver down to 10 meter accuracy. Geolocated mobile transaction data aggregated across multiple carriers is increasingly available for commercial use. It is possible to accurately track large numbers of individuals in constrained environments simply by sniffing the ITMI (temporary ID) as Path Intelligence does in mall, although they could sniff the IMEI just as easily, but they say they do not to protect privacy. Still, large-scale analysis of geolocation data is in its infancy. ReadWriteWeb describes how Developers Can Now Access Locations of 250 Million Phones Across U.S. Carriers

Tracking technologies—particularly when combined with geolocation information—have matured far beyond tracking individuals and are rapidly becoming capable of tracking groups and larger populations, which could be applied to entire enterprises or political organizations. Tools and techniques have made it feasible to correlate geolocation information, commercially aggregated profiles of online use, digital fingerprints, and offline transactional data. In addition, analysis of current anonymization techniques has repeatedly shown that simply adding another source of data is enough to re-identify a large percentage of the population. The Spatial Law and Policy blog is doing a nice job of tracking the policy implications of geolocation data.

The immense potential value of geolocation and other tracking data may well provide enough incentive for it to be used in ways counter to our own interests. Potential threats for misuse of the data need to be taken into account when designing systems. For example, what is the value of highly accurate logistical data about a US corporation derived from geolocation data and social network analysis to a foreign industrial competitor? Even a small amount of data that allowed a rudimentary analysis of external individuals meeting with internal high-level executives would be a worthwhile target. Similarly, both foreign industrial interests and foreign states may be willing to spend significant resources to acquire details on the movements and meetings of political parties.

More broadly I have been thinking about the question—What does it mean for a third-party to acquire better logistics about an organization than the organization has itself? What are the policy implications when and if these tracking tools are deployed in places without the rule of law, stable transitions of government, and low levels of corruption that we assume in the US? Could changes in the design and implementation of these systems mitigate the risks outlined? For example, should these design changes include internal controls, data scrubbing capabilities, and user interfaces that more clearly indicate a big picture of what data is being given off. Are there behavioral strategies that would reduce risks? To what extent can user education reduce risk?

You should follow me on Twitter.