A group of social science researchers has essentially doxed about 70,000 OkCupid users, dumping a massive data set that includes everything from usernames to answers to what may be deeply personal questions. The release may have gone well over the line in the amount of detail it included, but the group's methodology, so-called "data scraping," isn't uncommon.
Many privacy advocates are calling the release unethical, and OkCupid did not sanction the group's actions.
Nonetheless, nearly 70,000 people have had their answers published to questions like "Are your parents ugly?" and "Would you ever consider cutting a partner (who asked for it) in sexual play?"
What OkCupid Is And What Really Happened
The site and its app "use math to help you find dates," OkCupid proclaims on its website. That math relies on user data including height, age, education, income, ethnicity, vices and answers to questions meant to assess compatibility.
Access to that information, readily found on users' profiles, is restricted to OkCupid users. While OkCupid would probably like all of its users to be there for companionship, Emil Kirkegaard, a master's student at Aarhus University, and his associates went there to collect data on people.
"Since late 2014 we have been scraping the data [from] OkCupid and currently have a database of some 70,000 unique users and their answers to approximately 2,000 questions as well as demographic information, currently unpublished," says Kirkegaard.
The data was mined from the site between November 2014 and March 2015 using a scraper, an automated tool that was programmed to seek out profiles based on the number of questions the user had answered.
The group wrote a paper on the observations they made from the scraped data, and they published it all online at the Open Science Framework. In the paper, the authors test a hypothesis that posits that cognitive ability has a negative impact on religion and religion has a positive impact on politics.
"This is a clear violation of our terms of service - and the Computer Fraud and Abuse Act - and we're exploring legal options," an OkCupid spokesperson told DailyMail.com.
Moral Or Amoral
Scott B. Weingart, a digital humanities specialist at Carnegie Mellon University, tweeted about the OkCupid scraping, asserting that he can, with almost 90 percent accuracy, connect real names to screen names based on sexual preferences and histories.
The data dump is a violation of a reasonable expectation of privacy, but it isn't clear whether OkCupid and its users have any legal recourse. The information, as sensitive as some of it may be, was shared on a semi-public platform.
Legal or illegal, Emily Gorcenski, a software engineer, asserts in a blog post that the group's scraping and data dumping were "a fundamental violation of research ethics."
"Human subjects research must also meet the guidelines of beneficence and equipoise: the researchers must do no harm, the research must answer a legitimate question, and the research must be of a benefit to society," wrote Gorcenski, who says she has her NIH Certification in Human Subjects research.
Gorcenski also called into question the methodology the group used in its paper and the hypothesis the study tested, and she pointed out conflicts of interest in the journal's peer review process.
"This has a dramatic stench of attempting to find a dataset to match a pre-formed conclusion; in this case, it smells a lot like the prototypical rhetoric of a specific atheist politic," Gorcenski wrote. "One author's comments betray any sense of independence in this regard."
Scrapers
OkCupid has modified its site, so the code Kirkegaard and company used won't work without revision. But the lesson here is to avoid posting things that should never be public information.
There have been other data scraping efforts launched against OkCupid and countless other sites. It's just that this latest incident drew attention because of the researchers' decision to boldly share such a large cache of personal data.
Data scrapers are a type of bot, a software tool built to automate repetitive tasks. Nearly half of Internet traffic comes from bots, according to a recent study from DeviceAtlas.