June 11, 2024
No Robots(.txt): How to Ask ChatGPT and Google Bard to Not Use Your Website for Training.
You can prevent ChatGPT and Bard from using the content on your website to train their models. Using your website's robots.txt file, you can instruct bots and web crawlers NOT to scrape your content.
While it may appear that AI dominates every industry and sector, public sentiment around AI grows increasingly skeptical. We've witnessed its lies (hallucinations), its generic knowledge (regurgitated output) and its biased responses (whitewashed content).
Every new AI release unveils yet another countermeasure. And this arsenal keeps expanding with approaches that block the AI's access to data. Today, we highlight an oldie-but-goodie approach, along with a recap of recent and human-based interventions. Stacking these old-school, new-school and forever-school approaches helps to curb runaway AI.
The robots.txt file has been a staple since the beginning of the internet. It has been a constant line of defense for limiting access to a website's pages and/or directories. Simply providing instructions for web crawlers, and now generative AI algorithms, should be a universal stopgap. It's not. First, adding disallow instructions to this file contradicts a website's purpose, which is to be discovered. So the incentive to update the robots.txt file remains low. And second, following this file's instructions isn't required or enforceable.
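To make this concrete, here's a minimal sketch of the opt-out. GPTBot is OpenAI's published crawler token and Google-Extended is Google's token for controlling generative AI training; crawler tokens do change over time, so verify them against each company's current documentation before relying on this:

```
# Block OpenAI's crawler (gathers training data for ChatGPT)
User-agent: GPTBot
Disallow: /

# Block Google-Extended (controls use of content for Bard/Gemini training)
User-agent: Google-Extended
Disallow: /
```

Note that Google-Extended only governs generative AI training; it doesn't remove your site from regular Google Search results.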
The resilience of the robots.txt file comes from web crawling politeness policies. The most relevant one: web scraping tools of any kind are expected to respect the allow and disallow instructions. So what does that mean for you? If you host a website, revise your robots.txt file to disallow bots from continuing to access, read and ingest your webpage content. Companies adhere to politeness policies since a bad-actor label damages their reputation and profits. Follow the step-by-step guide in the full article linked below.
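If you want to confirm what a polite crawler would conclude from your file, Python's built-in urllib.robotparser can simulate the check. This is a quick sketch; the domain and path below are placeholders to swap for your own:

```python
# Check how a *compliant* crawler would interpret a live robots.txt,
# using Python's standard-library urllib.robotparser.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()  # fetches and parses the live file

# A polite bot runs this check before fetching any page.
for agent in ("GPTBot", "Google-Extended", "Googlebot"):
    allowed = rp.can_fetch(agent, "https://example.com/blog/post-1")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Remember the limitation from above: this only tells you what a rule-following crawler would do. Nothing in the protocol forces a bad actor to comply.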
There's a swell of checks-and-balances methods being built into AI systems. Recently, watermarking in AI has gained renewed attention. It amounts to embedding a hard-to-remove digital tracker, a small signal that keeps a record of how the content was generated or manipulated. Digital watermarking can help distinguish between what's AI generated, AI assisted or AI enabled. It can help indicate what's perceived as digitally true. Automated content moderation algorithms attempt to identify blatantly inappropriate content. When executed responsibly, they can also help combat AI-enabled fraud, misinformation and disinformation. Otherwise, content moderation dissolves into algorithmic misogynoir.
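To give a feel for the embed-and-detect idea, here's a toy illustration only: it hides a short tag in text using zero-width characters. Unlike production AI watermarking (which typically biases token choices statistically at generation time and is far harder to strip), this naive version is trivially removable; the function names and tag are hypothetical:

```python
# Toy watermark: hide a short tag in text via zero-width characters.
# Zero-width space encodes a 0 bit; zero-width non-joiner encodes a 1 bit.
ZW0, ZW1 = "\u200b", "\u200c"

def embed(text: str, tag: str) -> str:
    """Append the tag, encoded as invisible bits, to the text."""
    bits = "".join(f"{ord(c):08b}" for c in tag)
    return text + "".join(ZW1 if b == "1" else ZW0 for b in bits)

def detect(text: str) -> str:
    """Recover any hidden tag by reading back the invisible bits."""
    bits = "".join("1" if c == ZW1 else "0" for c in text if c in (ZW0, ZW1))
    usable = len(bits) - len(bits) % 8
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, usable, 8))

marked = embed("This paragraph was machine-generated.", "AI")
print(detect(marked))  # -> "AI", invisible to a human reader
```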
The human eye — with our critical thinking skills — remains one of the best stopgap measures to check AI’s output. Here’s a suggested shortcut to help you more quickly vet responses.
Read the Entire Article Here!
"People believe that they won’t be able to learn to code, that it’ll take a long time to learn the skill well or they have to be a math prodigy to understand and apply coding concepts. In reality, you don’t have to be “super smart,” but you must be persistent." pg 87
Get Your Copy of Data Conscience Here!
Stay Rebel Techie,
Dr. Brandeis
Thanks for subscribing! If you like what you read or use it as a resource, please share the newsletter signup with three friends!
Removing the digital debris swirling on the interwebs. A space you can trust to bring the data readiness, AI literacy and AI adoption realities you need to be an informed and confident leader. We discuss AI in education, responsible AI and data guidance, data/AI governance and more. Commentary is often provided by our CEO, Dr. Brandeis Marshall. Subscribe to Rebel Tech Newsletter!