Six years later, with big data again promising a new way of "doing social science," this warning remains all too true. The OkCupid data release reminds us that the ethical, research, and regulatory communities must engage in collaborative, dedicated, and multipronged efforts to address the conceptual muddles present in big data research, reframe the ethical dilemmas inherent in such research projects, expand educational and outreach efforts, and develop policy guidance focused on the unique challenges of big data research ethics.

The release was defended on the grounds that "all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form." For those concerned about privacy, research ethics, and the rise of publicly releasing large data sets, this logic of "but the data is already public" is an all-too-familiar refrain used to easily set aside thorny ethical concerns.

At the same time, the big data that increasingly fuels economic decision-making has emerged as a rich terrain for academic research and experimentation. Think of the Facebook emotional contagion experiment of 2014, in which the news feeds of nearly 700,000 users were altered to study the impact on mood; the first wave of Harvard researchers' "Tastes, Ties and Time" dataset, released in 2008 and comprising four years’ worth of complete Facebook profile data harvested from the accounts of an entire cohort of 1,700 college students; or AOL's 2006 public release of over 20 million search queries from 658,000 of its users in an attempt to support academic research on search engine usage.

Within this context, determining what constitutes "private information" – and thus what triggers particular privacy concerns – becomes difficult. Distinctions within the regulatory definition of "private information" – namely, that it applies only to information that subjects reasonably expect is not normally monitored or collected and not normally publicly available – become less clearly applicable to the data environments and collection practices that typify big data research, such as the wholesale mining of Facebook activity or public OkCupid accounts. When viewed through the lens of this regulatory definition, social media postings are often considered public, especially when users take no visible, affirmative steps to restrict access. As a result, big data researchers conclude that their subjects do not warrant particular privacy consideration.

