FACEBOOK ENGINEERS: WE HAVE NO IDEA WHERE WE KEEP ALL YOUR PERSONAL DATA
In a discovery hearing, two veteran Facebook engineers told the court that the company doesn’t keep track of all your personal data.
IN MARCH, two veteran Facebook engineers found themselves grilled about the company’s sprawling data collection operations in a hearing for the ongoing lawsuit over the mishandling of private user information stemming from the Cambridge Analytica scandal.
The hearing, a transcript of which was recently unsealed, was aimed at resolving one crucial issue: What information, precisely, does Facebook store about us, and where is it? The engineers’ response will come as little relief to those concerned with the company’s stewardship of billions of digitized lives: They don’t know.
The admissions occurred during a hearing with special master Daniel Garrie, a court-appointed subject-matter expert tasked with resolving a disclosure impasse. Garrie was attempting to get the company to provide an exhaustive, definitive accounting of where personal data might be stored in some 55 Facebook subsystems. Both veteran Facebook engineers, with, according to LinkedIn, two decades of experience between them, struggled even to say what might be stored in Facebook’s subsystems. “I’m just trying to understand at the most basic level from this list what we’re looking at,” Garrie said.
“I don’t believe there’s a single person that exists who could answer that question,” replied Eugene Zarashaw, a Facebook engineering director. “It would take a significant team effort to even be able to answer that question.”
When asked about how Facebook might track down every bit of data associated with a given user account, Zarashaw was stumped again: “It would take multiple teams on the ad side to track down exactly the — where the data flows. I would be surprised if there’s even a single person that can answer that narrow question conclusively.”
In an emailed statement that did not directly address the remarks from the hearing, Meta spokesperson Dina El-Kassaby told The Intercept that a single engineer’s inability to know where all user data was stored came as no surprise. She said Meta worked to guard users’ data, adding, “We have made — and continue making — significant investments to meet our privacy commitments and obligations, including extensive data controls.”
THE DISPUTE OVER where Facebook stores data arose when, as part of the litigation, now in its fourth year, the court ordered Facebook to turn over information it had collected about the suit’s plaintiffs. The company complied but provided data consisting mostly of material that any user could obtain through the company’s publicly accessible “Download Your Information” tool.
Facebook contended that any data not included in this set was outside the scope of the lawsuit, ignoring the vast quantities of information the company generates through inferences, outside partnerships, and other nonpublic analysis of our habits — parts of the social media site’s inner workings that are obscure to consumers. Briefly, what we think of as “Facebook” is in fact a composite of specialized programs that work together when we upload videos, share photos, or get targeted with advertising. The social network wanted to keep data storage in those nonconsumer parts of Facebook out of court.
In 2020, the judge disagreed with the company’s contention, ruling that Facebook’s initial disclosure had indeed been too sparse and that the company must reveal data obtained through its oceanic ability to surveil people across the internet and make monetizable predictions about their next moves.
Facebook’s stonewalling has been revealing on its own, providing variations on the same theme: It has amassed so much data on so many billions of people and organized it so confusingly that full transparency is impossible on a technical level. In the March 2022 hearing, Zarashaw and Steven Elia, a software engineering manager, described Facebook as a data-processing apparatus so complex that it defies understanding from within. The hearing amounted to two high-ranking engineers at one of the most powerful and resource-flush engineering outfits in history describing their product as an unknowable machine.
The special master at times seemed in disbelief, as when he questioned the engineers over whether any documentation existed for a particular Facebook subsystem. “Someone must have a diagram that says this is where this data is stored,” he said, according to the transcript. Zarashaw responded: “We have a somewhat strange engineering culture compared to most where we don’t generate a lot of artifacts during the engineering process. Effectively the code is its own design document often.” He quickly added, “For what it’s worth, this is terrifying to me when I first joined as well.”
THE REMARKS IN the hearing echo those found in an internal document leaked to Motherboard earlier this year detailing how the internal engineering dysfunction at Meta, which owns Facebook and Instagram, makes compliance with data privacy laws an impossibility. “We do not have an adequate level of control and explainability over how our systems use data, and thus we can’t confidently make controlled policy changes or external commitments such as ‘we will not use X data for Y purpose,’” the 2021 document read.
The fundamental problem, according to the engineers in the hearing, is that Facebook’s sprawl has made it impossible to know what the company consists of anymore; it never bothered to cultivate institutional knowledge of how each of these component systems works, what they do, or who’s using them. There is no documentation of what happens to your data once it’s uploaded, because documenting it has simply never been the company’s practice, the two explained. “It is rare for there to exist artifacts and diagrams on how those systems are then used and what data actually flows through them,” explained Zarashaw.
Facebook’s inability to comprehend its own functioning took the hearing up to the edge of the metaphysical. At one point, the court-appointed special master noted that the “Download Your Information” file provided to the suit’s plaintiffs must not have included everything the company had stored on those individuals, because the company appears to have no idea what it truly stores on anyone. Can it be that Facebook’s designated tool for comprehensively downloading your information might not actually download all your information? This, again, is outside the boundaries of knowledge.
“The solution to this is unfortunately exactly the work that was done to create the DYI file itself,” noted Zarashaw. “And the thing I struggle with here is in order to find gaps in what may not be in DYI file, you would by definition need to do even more work than was done to generate the DYI files in the first place.”
The systemic fogginess of Facebook’s data storage made answering even the most basic question futile. At another point, the special master asked how one could find out which systems actually contain user data that was created through machine inference.
“I don’t know,” answered Zarashaw. “It’s a rather difficult conundrum.”
Update: September 7, 2022, 9:56 p.m. ET
This story has been updated to include a statement from Meta sent after publication.
Open letter from The BMJ to Mark Zuckerberg
Dear Mark Zuckerberg,
We are Fiona Godlee and Kamran Abbasi, editors of The BMJ, one of the world’s oldest and most influential general medical journals. We are writing to raise serious concerns about the “fact checking” being undertaken by third party providers on behalf of Facebook/Meta.
In September, a former employee of Ventavia, a contract research company helping carry out the main Pfizer covid-19 vaccine trial, began providing The BMJ with dozens of internal company documents, photos, audio recordings, and emails. These materials revealed a host of poor clinical trial research practices occurring at Ventavia that could impact data integrity and patient safety. We also discovered that, despite receiving a direct complaint about these problems over a year ago, the FDA did not inspect Ventavia’s trial sites.
The BMJ commissioned an investigative reporter to write up the story for our journal. The article was published on 2 November, following legal review and external peer review, and subject to The BMJ’s usual high level of editorial oversight and review.
But from 10 November, readers began reporting a variety of problems when trying to share our article. Some reported being unable to share it. Many others reported having their posts flagged with a warning about “Missing context … Independent fact-checkers say this information could mislead people.” Those trying to post the article were informed by Facebook that people who repeatedly share “false information” might have their posts moved lower in Facebook’s News Feed. Administrators of groups where the article was shared received messages from Facebook informing them that such posts were “partly false.”
Readers were directed to a “fact check” performed by a Facebook contractor named Lead Stories.
We find the “fact check” performed by Lead Stories to be inaccurate, incompetent and irresponsible.
— It fails to provide any assertions of fact that The BMJ article got wrong
— It has a nonsensical title: “Fact Check: The British Medical Journal Did NOT Reveal Disqualifying And Ignored Reports Of Flaws In Pfizer COVID-19 Vaccine Trials”
— The first paragraph inaccurately labels The BMJ a “news blog”
— It contains a screenshot of our article with a stamp over it stating “Flaws Reviewed,” despite the Lead Stories article not identifying anything false or untrue in The BMJ article
— It published the story on its website under a URL that contains the phrase “hoax-alert”
We have contacted Lead Stories, but they refuse to change anything about their article or actions that have led to Facebook flagging our article.
We have also contacted Facebook directly, requesting immediate removal of the “fact checking” label and any link to the Lead Stories article, thereby allowing our readers to freely share the article on your platform.
There is also a wider concern that we wish to raise. We are aware that The BMJ is not the only high quality information provider to have been affected by the incompetence of Meta’s fact checking regime. To give one other example, we would highlight the treatment by Instagram (also owned by Meta) of Cochrane, the international provider of high quality systematic reviews of the medical evidence. Rather than investing a proportion of Meta’s substantial profits to help ensure the accuracy of medical information shared through social media, you have apparently delegated responsibility to people incompetent in carrying out this crucial task. Fact checking has been a staple of good journalism for decades. What has happened in this instance should be of concern to anyone who values and relies on sources such as The BMJ.
We hope you will act swiftly: specifically to correct the error relating to The BMJ’s article and to review the processes that led to the error; and generally to reconsider your investment in and approach to fact checking overall.
Fiona Godlee, editor in chief
Kamran Abbasi, incoming editor in chief
Thacker PD. Covid-19: Researcher blows the whistle on data integrity issues in Pfizer’s vaccine trial. BMJ. 2021 Nov 2;375:n2635. doi: 10.1136/bmj.n2635. PMID: 34728500. https://www.bmj.com/content/375/bmj.n2635
Miller D. Fact Check: The British Medical Journal Did NOT Reveal Disqualifying And Ignored Reports Of Flaws In Pfizer COVID-19 Vaccine Trials. Lead Stories. 2021 Nov 10. https://leadstories.com/hoax-alert/2021/11/fact-check-british-medical-jo…
Competing interests: As current and incoming editors in chief, we are responsible for everything The BMJ contains.