In January 2019, IBM released a collection of nearly a million photos scraped from the image hosting service Flickr to academic and corporate research groups. The photos were filtered from a collection of 100 million images published in 2014 by Yahoo, then the parent company of Flickr, and were annotated with skin tone, estimated age and gender, and extensive facial measurements (Solon). The release was met with immediate public outcry, despite the fact that the photos were intended to be used to reduce bias in facial recognition software. Commercial facial recognition algorithms severely underperform for minorities and for women in particular, a problem seemingly rooted in a lack of representative data: only 7.4% and 4.4%, respectively, of the images in the common benchmark datasets Adience and IJB-A are of dark-skinned women (Suresh and Guttag). With the release, IBM attempted to combat this problem and promote better recognition of underrepresented groups. However, whatever "benevolence" IBM intended was lost in the process, and the release drew swift criticism. There are extensive problems with the publication of the photo collection, but I argue that none is bigger than the promotion of facial recognition itself.

The most glaring issue with the release lies in the lack of consent of the image owners. It is true that there is a significant gap between what the public and industry professionals consider "acceptable practice" when it comes to data scraping; facial recognition researchers treated the scraping of public images as common practice when the software was first being developed (Hao). In fact, scraping public sources is one of the first topics taught to data science professionals, and it is certainly not considered out of the ordinary. Jason Schultz, a law professor at NYU, calls public scraping the "dirty little secret of AI training sets," noting that "researchers often just grab whatever images are available in the wild" (Solon). However, the advancement of associated technologies has created a need to re-evaluate what was once commonplace. IBM Watson's own Visual Recognition software can now be used to identify specific people from photos (Solon), creating a clear conflict with the European General Data Protection Regulation (GDPR). The GDPR prohibits the collection of sensitive biometric data if individual identities can be confirmed from the data, unless explicit consent is given (Khan). As the Visual Recognition software makes clear, this now applies to every image in the IBM collection. While the images were published on Flickr under a Creative Commons license, meaning others can reuse the photos without paying licensing fees, that consent was given for an entirely different internet ecosystem. Greg Peverill-Conti, a Boston-based public relations executive with over 700 photos in the collection, rightly observes that "none of the people I photographed had any idea their images [would be] used in this way" (Solon). Additionally, the Illinois Biometric Information Privacy Act requires a person's written consent to capture, store, and share biometric information, including face geometry. The release of the IBM collection set up a system demanding an "opt-out" from users concerned with privacy, an overwhelmingly difficult task to impose on ordinary citizens for several reasons.

First, it is nearly impossible to know whether specific photos are included in the database, because the dataset was not released to the public. Even if a user somehow learns their photos are included and wants them removed, the company has stated only that it is committed to "protecting the privacy of individuals" and "will work with anyone who requests a URL to be removed from the dataset" (Solon). But there are problems with this "commitment" to privacy as well. For example, NBC News alerted a photographer that more than 1,000 of his photos were in the IBM collection, and he promptly contacted IBM with his Flickr user ID to request an opt-out. IBM responded that none of his photos were in the dataset, to which the photographer replied with specific links to his photos in the collection (provided to him by NBC News, as he could not access the dataset himself). The company then responded apologetically, blamed an indexing bug, and removed the four photos the photographer had been able to link to. NBC News estimates that 1,001 of the photographer's photos remain in the dataset. Furthermore, even if a photo is removed from IBM's original collection, it has already spread to the 250 organizations that have requested access to the dataset (Solon). This arduous, nearly impossible "opt-out" process is unacceptable.

Racial Profiling

The second danger posed by the release of the IBM collection lies, paradoxically, in the improvement of facial recognition software itself. John Smith, who oversees artificial intelligence research at IBM, states, "we recognize that societal bias is not necessarily something we can fully tackle with science, but our aim is to address mathematical and algorithmic bias" (Solon). This is fundamentally the wrong approach to combating bias and promoting ethical practice in advanced algorithms, because it does not consider the potential ramifications of developing the software in the first place. Eighty-five racial justice and civil rights groups have recently called on tech companies to refuse to sell facial recognition software to governments, citing the historical exploitation of technology by governments to target communities of color, religious minorities, and immigrants ("Pressure Mounts"). This call to action comes from the understanding that improving the accuracy of facial recognition software says nothing about the accuracy or fairness of the manner in which it is deployed. For example, an NBC News investigation found that criminal biometric databases contain overwhelming numbers of mugshots of African Americans, Latinos, and immigrants, as these groups are frequently targeted by biased policing practices (Solon). Even if facial recognition were used to promote the "general safety and well-being of society," the software would still disproportionately identify those already present in criminal databases as a result of unfair targeting.

Woodrow Hartzog, a professor of law and computer science at Northeastern University, describes the bind between improving accuracy and risking further harm: "You've really got a rock-and-a-hard-place situation happening here. Facial recognition can be incredibly harmful when it's inaccurate and incredibly oppressive the more accurate it gets" (Solon). The release of the IBM collection has created a structure in which users are unwittingly helping to train systems that may be used in oppressive ways against them or their communities. For example, Maryland's state-of-the-art facial recognition system was used to identify and arrest protesters in Baltimore following the death of Freddie Gray. It is fear of this misuse of the technology that has led Axon, the country's largest supplier of police body cameras, to refuse to invest in or research facial recognition (Schuppe). Jennifer King, the director of consumer privacy at the Center for Internet and Society at Stanford Law School, articulates this doubt: "My concern is that a city buys into this so deeply and buys into a process that … forces people to defend themselves against things they haven't done" (Schuppe). Giving law enforcement agencies the ability to track someone deemed suspicious without proving they have done anything wrong is inherently un-American and raises the specter of a "guilty until proven innocent" system. As Zeynep Tufekci argued when Facebook released its emotional contagion study, algorithmic gatekeepers are dangerous and should be approached with caution (Boyd). Unfairly biased facial recognition software that informs police and government decisions exerts real power over the country and the livelihoods of its citizens, and it must be treated as such.

Recommendation to IBM

Nicole Ozer, the technology and civil liberties director for the ACLU of California, writes in a letter to tech companies that "we are at a crossroads with face surveillance, and the choices made by companies now will determine whether the next generation will have to fear being tracked by the government for attending a protest, going to their place of worship, or simply living their lives" ("Pressure Mounts"). I urge IBM not to end up on the wrong side of this crossroads, and to publicly declare that it will not support or aid government use of facial recognition technology. Furthermore, the released IBM photo collection should be governed far more strictly, and "opt-out" options need to be far more extensive. The collection is too easily shared with unknown parties (even NBC News was able to obtain a full copy), and IBM must recognize the sensitive nature of this dataset. It must make clear to users whether their images are included and obtain their written consent.

Reducing bias in the software is important, but bias is a symptom of a greater issue: the danger of facial recognition in the first place. In China, a nightmare scenario of highly accurate facial recognition software is already in place; the technology is being used to target the Uighur Muslim ethnic minority in what has been called "automated racism" (Ghaffary). In the past year, San Francisco, Oakland, and Somerville, Massachusetts, have all passed legislation banning government use of facial recognition, recognizing the danger of state surveillance and the truly impossible task of opting out. "Unlike your cellphone or computer," writes tech reporter Shirin Ghaffary, "there's no way to turn off your face." The European Union is currently working to impose strict limits on the indiscriminate use of facial recognition technology, and legislation is expected within the next year (Khan).

IBM's Chief Privacy Officer, Christina Montgomery, recently published an article about the need for an American privacy law, and she is not wrong. Ms. Montgomery writes that "no company has ever succeeded without listening intently to its customers" (Montgomery). I urge you to listen now, and to see the dangers in the software whose development you are openly supporting and promoting. The trend toward restrictive legislation is easy to see, and an early public declaration denouncing government use of facial recognition technology would put IBM at the forefront of biometric privacy advocacy. IBM has a clear opportunity here to right a wrong, and you must take it.

References

Boyd, Danah. "Untangling research and practice: What Facebook's 'emotional contagion' study teaches us." Research Ethics, vol. 12, no. 1, 2016, pp. 4-13, doi: 10.1177/1747016115583379. Accessed 20 October 2019.

Ghaffary, Shirin. "How facial recognition became the most feared technology in the US." Recode, 9 August 2019, https://www.vox.com/recode/2019/8/9/20799022/facial-recognition-law. Accessed 20 October 2019.

Hao, Karen. "IBM's photo-scraping scandal shows what a weird bubble AI researchers live in." MIT Technology Review, 15 March 2019, https://www.technologyreview.com/f/613131/ibms-photo-scraping-scandal-shows-what-a-weird-bubble-ai-researchers-live-in/. Accessed 20 October 2019.

Khan, Mehreen. "EU plans sweeping regulation of facial recognition." Financial Times, 22 August 2019, https://www.ft.com/content/90ce2dce-c413-11e9-a8e9-296ca66511c9. Accessed 20 October 2019.

Montgomery, Christina. "IBM's Chief Privacy officer: What America needs in a national consumer privacy law." Fox Business, 2 October 2019, https://www.foxbusiness.com/technology/ibms-chief-privacy-officer-what-america-needs-in-a-national-consumer-privacy-law. Accessed 20 October 2019.

"Pressure Mounts on Amazon, Microsoft, and Google Against Selling Facial Recognition to Government." ACLU Northern California, 15 January 2019, https://www.aclunc.org/news/pressure-mounts-amazon-microsoft-and-google-against-selling-facial-recognition-government. Accessed 20 October 2019.

Schuppe, Jon. "Facial recognition gives police a powerful new tracking tool. It's also raising alarms." NBC News, 30 July 2018, https://www.nbcnews.com/news/us-news/facial-recognition-gives-police-powerful-new-tracking-tool-it-s-n894936. Accessed 20 October 2019.

Solon, Olivia. "Facial recognition's 'dirty little secret': Millions of online photos scraped without consent." NBC News, 12 March 2019, https://www.nbcnews.com/tech/internet/facial-recognition-s-dirty-little-secret-millions-online-photos-scraped-n981921. Accessed 20 October 2019.

Suresh, Harini, and John V. Guttag. "A Framework for Understanding Unintended Consequences of Machine Learning." arXiv, 28 January 2019, https://arxiv.org/pdf/1901.10002.pdf. Accessed 20 October 2019.