The World is Not Flat and Neither Are Social Networks

Now that the rest of the Internet and I have grown accustomed to Google Plus and Facebook’s most recent friend categorization features, I thought it was time to revisit and revise a previously unpublished piece of mine. Take a moment and think about your friends, family, colleagues, friends of friends, acquaintances, and members of the same social club. These six groups could comprise a large part, but certainly not all, of the people that you know. You may also have extended family, classmates, fellow members of sports teams, religious associations, and the familiar strangers whom you recognize but whose names you don’t know. To further complicate matters, the people in these groups often change over time as we move through life. How we conduct ourselves depends on the situation. It is highly unlikely that you act the same way around your grandmother as you do at a party with your friends, and people do not expect you to act the same way. Your friends, work colleagues, and extended family do not all know each other, and I suspect that in many cases you would like to keep it that way. For this reason, it seems odd to expect that our interactions in online social networks would be any different.

I had the final word in Erica Naone’s Technology Review article Can Google Get Social Networking Right? Naone’s piece argues that Google needed to dramatically improve its social offerings to compete against Facebook. She asked me to comment on Google’s social services such as Buzz and Profiles and how they might interact with users’ search history. It is interesting to see how much the discussion has changed since the article appeared. Disclosure: I worked as an engineering intern on Google Accounts during 2005-2006, but this was well before any of Google’s social options existed. I responded with a discussion of broad problems I saw with social network services. The following quote in Naone’s article mostly reflects my statements, although the quote makes it appear that I am singling out Facebook for criticism, which misses my point that this is a fundamental problem across many social networks.

“Facebook, meanwhile, has its own problems, and some of these could turn out to be opportunities for Google. Ben Gross, an expert in online identity, notes that Facebook and other social networks don’t accurately differentiate between people’s social connections, making their social graph information less valuable to users and advertisers. For example, social networks tend to put all of a user’s connections into a single group of “friends,” and expect users to manage complex privacy settings to sort out family, work connections, and bar buddies. “Social network services should not assume that networks are flat, or that people are willing to put in the effort to articulate these networks or that they even want to,” he says.”

My full response from which the quote was taken follows below. I fixed a few typos, but it is otherwise unedited.

“I see several consistent problems with many of the social network services. First, they often unify disparate social networks in ways that do not match people’s actual experience and may not even make sense to them. In order to have a real representation of people’s social networks, they would have to fully articulate these networks to the service, which is a pretty unnatural thing to do. For many people the edges of the network shift regularly. Most social network services do not make it easy to maintain multiple independent networks on the service. It is common for people to maintain independent social networks, where individuals may not want the networks unified and people may not even care or wish to know about the other networks. For example, one’s extended family vs. one’s work colleagues vs. one’s friends they have brunch with on the weekend. The idea that there is a single flat network is sort of ridiculous.

I often hear people say that people who want to maintain independent identities or networks are somehow up to no good. I have interviewed quite a few people about this topic for my dissertation. It’s clear that people’s lives are complicated and their identifiers and networks reflect this. If you think about it, it is not at all strange for someone to want to separate their work life, from their family life, from their friend, or all manner of combinations. The boundaries of these relationships shift and behaviors vary widely. Social network services should not assume that networks are flat, that people are willing to put in the effort to articulate these networks, or that they even want to. Also for many people, they may have portions of their network that they are connected to online and therefore the online representation of their network may be very skewed. Even if people are connected to multiple networks online, they may use different social network services for different social networks. For example, it is not unusual for people to primarily have email conversations with some connections, use AIM for others, Google Talk for others, SMS for another group, and Facebook for yet another. Each service would be missing the chunk of connections for the other service.”

You need context to create a meaningful representation of a person’s social network. To make matters worse, that context shifts constantly, as do people’s social relations, particularly those with whom we have weak connections. This is why people often see online social network representations as a cartoonish view of their own complex and ever-changing social worlds. This is not a new revelation about social relations. William James published the following in 1890.

Properly speaking, a man has as many social selves as there are individuals who recognize him and carry an image of him in their mind. To wound any one of these his images is to wound him. But as the individuals who carry the images fall naturally into classes, we may practically say that he has as many different social selves as there are distinct groups of persons about whose opinion he cares. He generally shows a different side of himself to each of these different groups. Many a youth who is demure enough before his parents and teachers, swears and swaggers like a pirate among his ‘tough’ young friends. We do not show ourselves to our children as to our club-companions, to our customers as to the laborers we employ, to our own masters and employers as to our intimate friends. From this there results what practically is a division of the man into several selves; and this may be a discordant splitting, as where one is afraid to let one set of his acquaintances know him as he is elsewhere; or it may be a perfectly harmonious division of labor, as where one tender to his children is stern to the soldiers or prisoners under his command.

It is important to recognize that forcing people to interact with their social relations as a flat network has many undesirable consequences. Figuring out how to restore a more natural balance to social relations is a grand challenge for social networks. People we think of as friends, enemies, and acquaintances change over time as friendships intensify and cool and we move through life phases. Also, complete visibility in networks is not always desirable or healthy. When we remove people’s choice to disclose their relationships and group memberships, we strip them of something that is fundamentally human. Providing people with only one option for presenting themselves at a time denies them an important means of self-expression that is also fundamentally human.

I find it heartening to see how much has changed over the last year, as both Google Plus and Facebook have dramatically improved the situation by allowing us more options to interact naturally with different social spheres. Framing choices about self-presentation as choices about privacy misses the point that the issue is usually about context. When social networks lack context, they force people to articulate everyone who should be included or excluded from a particular interaction. In these cases, the cognitive overhead of potentially making this judgement for each interaction is staggeringly high. Unless you are a public figure, you likely never need to decide if what you say is appropriate or even remotely interesting to someone you went to grade school with, someone you went to college with, a work colleague, your aunt, your next door neighbor, and a dear friend. We should not force people to work this hard unnecessarily.

References

danah michele boyd. Friendster and publicly articulated social networking. In CHI ‘04 extended abstracts on Human factors in computing systems, pages 1279–1282, New York, NY, USA, 2004. ACM. Articulated Social Networks: An Ethnographic Study of Friendster

Erving Goffman. Presentation of Self in Everyday Life. Anchor Books, New York, 1959.

Francesca Grippa, Antonio Zilli, Robert Laubacher, and Peter A. Gloor. E-mail may not reflect the social network. In Proceedings of the North American Association for Computational Social and Organizational Science Conference, 2006.

Ido Guy, Michal Jacovi, Noga Meshulam, Inbal Ronen, and Elad Shahar. Public vs. private: Comparing public social network information with email. In CSCW ‘08: Proceedings of the ACM 2008 conference on Computer supported cooperative work, pages 393–402, New York, NY, USA, 2008. ACM.

Kai Fischbach, Peter A. Gloor, and Detlef Schoder. Analysis of informal communication networks – a case study. Business & Information Systems Engineering, 1:140–149, 2009.

William James. The Principles of Psychology, volume 1. Henry Holt & Co., 1890.

Hat tip to Gaurav Mishra, whose similarly titled 2008 article, The World is Not Flat and Neither is the Social Web (site is currently offline), I found after I finished writing this post.

Experimental Options for Analyzing Social Networks in Messaging Systems

Social network analysis is the study of connections, flows, and structure among people, groups, organizations, and systems. The points or nodes in the network may include people, routers, or even disease vectors. The ability to analyze communication patterns and social networks has become a major component of eDiscovery systems. Packages from Autonomy’s Zantaz, Cataphora, and Seagate’s i365 MetaLINCS all feature social network analysis functionality. Research, development, and experimentation in social network analysis tools are likely to make significant contributions to commercial eDiscovery systems in the future. Community, communication, and collaboration services, such as LinkedIn, Twitter, Facebook, and MySpace, are now commonly used in conjunction with institutional systems. These external services are not yet commonly integrated with most compliance and archiving systems. In this article I discuss the NodeXL and Maltego applications. Both of these tools offer a specialized feature set that could offer insight into future development for eDiscovery platforms in terms of external data and analysis of social networks.

Social network analysis and network theory research has a rich literature that spans many disciplines including anthropology, criminology, economics, epidemiology, political science, psychology, sociology, and statistics. Social scientists in anthropology, psychology, and sociology developed modern social network analysis methods from the 1930s to the 1960s. Starting in the 1970s, social network analysis attracted researchers from a growing array of fields and rapidly increasing subspecialties. Mark Granovetter’s 1973 paper “The Strength of Weak Ties” and Stanley Milgram’s small world project (the source of the idea of “six degrees of separation”) were both fundamental to the growth of the field. Euler’s 1736 paper on the “Seven Bridges of Königsberg” problem is considered the first paper on graph theory, which underlies much of the mathematics behind social network analysis. An excellent place to begin looking for more information is the Web site of the International Network for Social Network Analysis (INSNA), a professional organization dedicated to advancing the field of social network analysis.
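To make the “six degrees of separation” idea concrete, here is a minimal Python sketch that counts the hops on a shortest path between two people using breadth-first search. The function name and the tiny acquaintance list are invented for illustration, and the sketch assumes that ties are undirected.

```python
from collections import deque


def degrees_of_separation(edges, start, goal):
    """Return the number of hops on a shortest path, or None if unreachable."""
    # Build an undirected adjacency map from (person, person) pairs.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    if start not in neighbors or goal not in neighbors:
        return None

    # Breadth-first search visits people in order of increasing distance,
    # so the first time we reach the goal we have a shortest path.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            return distance[person]
        for friend in neighbors[person]:
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return None


# Hypothetical acquaintance ties, invented for this example.
EXAMPLE_EDGES = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("eve", "fay")]
```

With these example ties, ann and dan are three hops apart, while ann and eve have no connecting path at all.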

NodeXL

NodeXL is a free and open source social network analysis and visualization tool that is an add-on package for Microsoft Excel 2007. The ease of use of the software, combined with integrated import functionality for multiple common types of data, makes NodeXL particularly compelling. NodeXL is straightforward to use, quick to set up, and capable of analyzing data without requiring programming experience from anyone who is familiar with working with Excel. The software is in use by both academics and professionals. Several university classes teach social network analysis with NodeXL. The primary NodeXL documentation is a well-written tutorial developed for the courses. Update: the book Analyzing Social Media Networks with NodeXL: Insights from a Connected World is available as of September 2010.

One major difficulty with many experimental and research systems is that mechanisms for importing real world data are often limited or nonexistent. In these systems, data extraction, normalization, and cleaning may involve significant effort. Commercial electronic discovery systems typically include an integrated set of components that are capable of managing the entire lifecycle of the eDiscovery process including preservation, collection, processing, review, and analysis. Most eDiscovery systems are able to import common document and messaging formats. NodeXL is able to import real world data from multiple sources including email messages, Twitter messages, Flickr tags, and YouTube user networks. NodeXL relies on Windows Desktop Search in XP or Windows Search in Vista to import email. The software can also import data in formats used by other social network analysis tools, including UCINET, GraphML, Pajek, and CSV. The software requires Windows XP or Vista, Office 2007, and several other updated system components from Microsoft, but no other third-party software.
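As a rough idea of what the simplest of these imports involves, the following Python sketch reads a CSV edge list into an adjacency map. The two-column “source,target” layout is an assumption made for this example and is not NodeXL’s actual file format; the function name is likewise hypothetical.

```python
import csv
import io


def read_edge_list(csv_text):
    """Parse 'source,target' rows into a dict mapping each vertex to its neighbors."""
    adjacency = {}
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank or malformed rows rather than guessing at their meaning.
        if len(row) < 2:
            continue
        source, target = row[0].strip(), row[1].strip()
        if not source or not target:
            continue
        # Treat every tie as undirected by recording it in both directions.
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency
```

Even in this toy form, most of the code deals with cleaning (blank lines, missing fields) rather than analysis, which is the effort the paragraph above describes.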

Commercial eDiscovery systems include a much wider array of features for analysis, such as the ability to reconstruct conversations and threads, as well as a history of connections between messages, documents, and access logs. Dedicated SNA packages, such as NodeXL, typically contain more specialized network metrics, network layouts, and visualizations than general eDiscovery systems, although with a more limited set of data types. These specialized packages allow for experimentation with different types of analyses and for comparison with existing analyses. The NodeXL team plans to include support for additional types of popular social network services such as Facebook and enterprise information sources such as Active Directory. Access to the data is provided through official APIs from each of the services. The terms of service for an API typically restrict how much data may be collected and the potential uses for the data. The use of these APIs to collect data for legal action will almost certainly require a court order to remain compliant with the terms of service.
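For readers unfamiliar with the network metrics that dedicated SNA packages compute, here is a small Python sketch of two standard ones, degree centrality and graph density. It is a generic textbook illustration and not code from NodeXL; it assumes an undirected network without self-loops, stored as a symmetric adjacency map, and the star-shaped sample data is invented.

```python
def degree_centrality(adjacency):
    """Fraction of the other vertices that each vertex is directly tied to."""
    n = len(adjacency)
    if n < 2:
        return {vertex: 0.0 for vertex in adjacency}
    return {vertex: len(nbrs) / (n - 1) for vertex, nbrs in adjacency.items()}


def density(adjacency):
    """Ratio of ties present to the number of ties that could exist."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    # Each undirected tie appears twice in a symmetric adjacency map.
    edge_count = sum(len(nbrs) for nbrs in adjacency.values()) / 2
    return edge_count / (n * (n - 1) / 2)


# Hypothetical star network: one hub tied to three otherwise unconnected people.
STAR = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}
```

In the star example the hub has a degree centrality of 1.0 because it is tied to everyone else, each spoke has one third, and the density is 0.5 because three of the six possible ties exist.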

Maltego

Maltego from Paterva is a unique tool that bills itself as an “open source intelligence” application that could be viewed as an eDiscovery system for the Internet at large. Paterva is based in South Africa. Roelof Temmingh, who is active and vocal in the security community, formed the company in 2007. Maltego launched as a commercial product in 2008.

The application helps you to collect information about people, documents, network resources, and the trails of information we increasingly leave around the Internet. Once you have gathered your information, Maltego provides methods to analyze the data, make inferences about relationships among the items, and then visualize these connections. Maltego excels at enumerating Internet infrastructure such as information about IP addresses, net blocks, autonomous system numbers, DNS records, and WHOIS records. There are methods to collect information about email addresses, mail servers, URLs, phone numbers, document metadata, and social network services.

Maltego relies on specialized data connectors called “transforms” that interact with online services to gather information from many sources. For example, one set of transforms connects to the WikiScanner service and allows a user to query if a particular IP address or netblock has made edits to Wikipedia or query which IP address made a particular edit. Another set of transforms allows users to discover connections between phone numbers, email addresses, URLs, and IP addresses. In the first version of the application, known then as Evolution, the transforms ran directly on the Maltego client machine. After a legal threat from a large social network, Paterva reexamined its data sources and determined that several of the transforms violated the terms of service.

Paterva then redesigned the software and released a second version called Maltego that ran the transforms directly from Paterva’s servers rather than on the client. Paterva eliminated all the transforms that potentially violated terms of service for various providers, including all social network service transforms. In addition, Paterva changed the primary search engine from Google to Yahoo!, which allows automated queries under its terms of service. The new architecture also allowed Paterva to quickly add new transforms to the service and manage the number of API calls to prevent users from reaching limits defined by the services. Recent versions of Maltego once again include the ability to perform local transforms, so that users can create and share their own transforms, and add the ability to extract data from local files or databases.
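Managing API calls so that users stay under a provider’s limit can be approached in many ways. The Python sketch below shows one common technique, a rolling-window call budget. It is not a description of how Paterva’s servers work; the class name, method name, and limits are hypothetical.

```python
import time
from collections import deque


class CallBudget:
    """Allow at most max_calls within any rolling window of period seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the behavior can be tested
        self.timestamps = deque()

    def try_acquire(self):
        """Record a call and return True if it fits the budget, else False."""
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.period:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_calls:
            self.timestamps.append(now)
            return True
        return False
```

A server-side component could check try_acquire() before forwarding each query to an external service and queue or reject the query when it returns False. Centralizing that check is one reason a hosted architecture makes limits easier to respect than code running independently on every client.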

Paterva offers a commercial edition and a free community edition of the Maltego client. The commercial edition costs $430 USD for the first year and $320 USD per year for renewals. The community edition is free, but places some limitations on use including a limited number of queries per day, no export of data, and limited levels of detail. Paterva offers a Maltego Transform Application Server (TAS) that allows customers to run transforms from their own server. This allows customers to integrate Maltego with their own infrastructure and eliminate their reliance on Paterva’s servers for privacy reasons. Maltego Mesh is an experimental Firefox plugin that automatically extracts entities from Web pages, such as names, companies, email, addresses, phone numbers, dates and IP addresses. Mesh can then save these entities along with the source of the entities for later analysis in Maltego.
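To give a sense of what automatic entity extraction involves, here is a deliberately simple Python sketch that pulls email addresses and IPv4 addresses out of page text and records where each one came from. It is not how Maltego Mesh is implemented; the patterns are simplified assumptions that will miss some valid addresses, and real extractors handle many more entity types.

```python
import ipaddress
import re

# Simplified patterns, assumed for illustration only.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
IPV4_CANDIDATE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?!\d|\.\d)")


def extract_entities(text, source):
    """Return (kind, value, source) tuples for entities found in text."""
    entities = []
    for match in EMAIL_PATTERN.finditer(text):
        entities.append(("email", match.group(), source))
    for match in IPV4_CANDIDATE.finditer(text):
        # The regex only finds dotted numbers; let the standard library
        # decide whether each candidate is a valid IPv4 address.
        try:
            ipaddress.IPv4Address(match.group())
        except ValueError:
            continue
        entities.append(("ipv4", match.group(), source))
    return entities
```

Keeping the source alongside each value mirrors the point above: an extracted entity is far more useful for later analysis when you can trace it back to the page it came from.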

Acquiring datasets of real world examples that are of significant size and do not carry significant legal or privacy restrictions can be a major problem when evaluating systems for eDiscovery, social network analysis, anti-spam, and email content analysis. Enron’s investigation by the Federal Energy Regulatory Commission (FERC) led to the email messages used in the trial entering the public record. Academics acquired, processed, and cleaned these emails and made them available for others to analyze. The result is known as the Enron Email Corpus, which contains approximately a half million messages from more than 150 individuals. The corpus has resulted in a significant number of publications and has been a boon to researchers and practitioners alike, as it is a tremendous resource to experiment with and test against.
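A first step that many analyses of an email corpus share is turning message headers into a network of who wrote to whom. The Python sketch below shows one way to do that with the standard library. It is an illustration under simple assumptions (plain RFC 822 style message text, only the From, To, and Cc headers), the sample messages are invented rather than taken from the Enron data, and the function name is hypothetical.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parseaddr


def communication_edges(raw_messages):
    """Count (sender, recipient) pairs across a list of raw message texts."""
    edges = Counter()
    for raw in raw_messages:
        message = message_from_string(raw)
        sender = parseaddr(message.get("From", ""))[1].lower()
        if not sender:
            continue  # no usable sender, so no tie to record
        recipient_headers = message.get_all("To", []) + message.get_all("Cc", [])
        for _name, address in getaddresses(recipient_headers):
            if address:
                edges[(sender, address.lower())] += 1
    return edges


# Hypothetical messages, invented for this example.
SAMPLE_MESSAGES = [
    "From: Ann <ann@example.com>\n"
    "To: bob@example.com, Cat <cat@example.com>\n"
    "Subject: hello\n\nbody",
    "From: ann@example.com\n"
    "To: bob@example.com\n"
    "Cc: dan@example.com\n\nbody",
]
```

For the two sample messages this yields three weighted ties from ann: two messages to bob and one each to cat and dan. The resulting counts can serve as edge weights in a tool such as the ones described above.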

Communications applications, and therefore eDiscovery systems, will increasingly be designed with an awareness of social network systems and social network analysis. The tools presented in this article offer some insight into potential future developments for these systems.

* This article originally appeared as Experimental Options for Analyzing Social Networks in Messaging Systems in the November 2009 issue of Messaging News magazine. Minor updates and link to NodeXL book added on September 13, 2010.