Fuzzy Hashing's Intentionally Black Box
How does it work? While there has been a lot of published work on fuzzy hashing, the innards of the process are intentionally a bit of a mystery. The New York Times recently wrote a story that was probably the most public discussion of how such technology works. The challenge is that if criminal producers and distributors of CSAM knew exactly how such tools worked, they might be able to alter their images in ways that defeat them. To be clear, Cloudflare will be running the CSAM Scanning Tool on behalf of the website operator from within our secure points of presence. We will not be distributing the software directly to users. We will remain vigilant for potential attempted abuse of the platform, and will take prompt action as necessary.
Tradeoff Between False Negatives and False Positives
We have been working with a number of authorities on how we can best roll out this functionality to our customers. One of the challenges for a network with as diverse a set of customers as Cloudflare's is what the appropriate threshold should be to set the comparison distance between the fuzzy hashes.
If the threshold is too strict — meaning that it's closer to a traditional hash and two images need to be virtually identical to trigger a match — then you're more likely to have many false negatives (i.e., CSAM that isn't flagged). If the threshold is too loose, then it's possible to have many false positives. False positives may seem like the lesser evil, but there are legitimate concerns that increasing the possibility of false positives at scale could waste limited resources and further overwhelm the existing ecosystem. We will work to iterate the CSAM Scanning Tool to provide more granular control to the website owner while supporting the ongoing effectiveness of the ecosystem. Today, we believe we can offer a good first set of options for our customers that will allow us to more quickly flag CSAM without overwhelming the resources of the ecosystem.
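To make the tradeoff concrete, here is a minimal sketch of how a distance threshold between fuzzy hashes might work. This is illustrative only: real tools such as PhotoDNA use proprietary perceptual hashes, and the 16-bit hash values and threshold numbers below are hypothetical, not Cloudflare's actual parameters.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions where two fixed-width fuzzy hashes differ."""
    return bin(hash_a ^ hash_b).count("1")

def is_match(hash_a: int, hash_b: int, threshold: int) -> bool:
    """Two images 'match' when their hashes differ in at most `threshold` bits."""
    return hamming_distance(hash_a, hash_b) <= threshold

# Hypothetical 16-bit perceptual hashes: a known image and a slightly
# altered copy whose hash differs in a single bit.
known_hash    = 0b1011001011110000
uploaded_hash = 0b1011001011110010

# A threshold of 0 behaves like a traditional hash: the altered copy slips
# through (a false negative). A looser threshold of 4 still flags it, at the
# cost of potentially matching more unrelated images (false positives).
print(is_match(known_hash, uploaded_hash, threshold=0))  # strict: missed
print(is_match(known_hash, uploaded_hash, threshold=4))  # loose: flagged
```

The single knob, `threshold`, is exactly the comparison distance discussed above: raising it catches more altered copies but also raises the false-positive rate.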
Different Thresholds for Different Customers
The same desire for a granular approach was reflected in our conversations with our customers. When we asked what was appropriate for them, the answer varied radically based on the type of business, how sophisticated its existing abuse process was, and its likely exposure level and tolerance for the risk of CSAM being posted on their site.
For instance, a mature social network using Cloudflare with a sophisticated abuse team may want the threshold set quite loose, but not want the material to be automatically blocked because they have the resources to manually review whatever is flagged.
A new startup dedicated to providing a forum to new parents may want the threshold set quite loose and want any hits automatically blocked because they haven't yet built a sophisticated abuse team and the risk to their brand is so high if CSAM material is posted, even if that will result in some false positives.
A commercial financial institution may want to set the threshold quite strict, because they're less likely to have user-generated content and would have a low tolerance for false positives, but still automatically block anything that's detected: if their systems are somehow compromised to host known CSAM, they want it stopped immediately.
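The three examples above can be sketched as per-site policy settings. The schema below is hypothetical, purely to illustrate the two independent dials (match distance and automatic blocking), and does not reflect Cloudflare's actual configuration:

```python
from dataclasses import dataclass

@dataclass
class ScanPolicy:
    max_hash_distance: int  # looser = larger allowed distance between fuzzy hashes
    auto_block: bool        # block matches immediately, or only flag them for review

# Illustrative values only.
social_network = ScanPolicy(max_hash_distance=12, auto_block=False)  # loose, manual review
parents_forum  = ScanPolicy(max_hash_distance=12, auto_block=True)   # loose, auto-block
bank           = ScanPolicy(max_hash_distance=2,  auto_block=True)   # strict, auto-block
```

The point is that the threshold and the response to a match are separate choices, and the right combination depends on the site's abuse-handling resources and risk tolerance.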
Different Requirements for Different Jurisdictions
There also may be challenges based on where our customers are located and the laws and regulations that apply to them. Depending on where a customer's business is located and where they have users, they may choose to use one, more than one, or all of the different available hash lists.
In other words, one size does not fit all and, ideally, we believe allowing individual site owners to set the parameters that make the most sense for their particular site will result in lower false negative rates (i.e., more CSAM being flagged) than if we try to set one global standard for every one of our customers.
Over time, we are hopeful that we can improve CSAM screening for our customers. We expect that we will add additional lists of hashes from numerous global agencies for our customers with users around the world to subscribe to. We're committed to enabling this flexibility without overly burdening the ecosystem that is set up to fight this horrible crime.
Finally, we believe there may be an opportunity to help build the next generation of fuzzy hashing. For example, the software can only scan images that are at rest in memory on a machine, not those that are streaming. We're talking with Hany Farid, the former Dartmouth professor who now teaches at UC Berkeley, about ways that we may be able to build a more flexible fuzzy hashing system in order to flag images before they're even posted.
Concerns and Responsibility
One question we asked ourselves back when we began to consider offering CSAM scanning was whether we were the right place to be doing this at all. We share the universal concern about the distribution of depictions of horrific crimes against children, and believe it should have no place on the Internet. However, Cloudflare is a network infrastructure provider, not a content platform.
But we thought there was an appropriate role for us to play in this space. Fundamentally, Cloudflare delivers tools to our more than 2 million customers that were previously reserved for only the Internet giants. Without us, the security, performance, and reliability services that we offer, often for free, would have been extremely expensive or limited to Internet giants like Facebook and Google.
Today there are startups that are working to build the next Internet giant and compete with Facebook and Google because they can use Cloudflare to be secure, fast, and reliable online. But, as the regulatory hurdles around dealing with incredibly difficult issues like CSAM continue to increase, many of them lack access to sophisticated tools to scan proactively for CSAM. You have to get big to get into the club that gives you access to these tools, and, concerningly, being in the club is increasingly a prerequisite to getting big.
If we want more competition for the Internet giants we need to make these tools available more broadly and to smaller organizations. From that perspective, we think it makes perfect sense for us to help democratize this powerful tool in the fight against CSAM.
We hope this will help enable our customers to build more sophisticated content moderation teams appropriate for their own communities and will allow them to scale in a responsible way to compete with the Internet giants of today. That is directly aligned with our mission of helping build a better Internet, and it's why we're announcing that we will be making this service available for free for all our customers.