June 11, 2024
No Robots(.txt): How to Ask ChatGPT and Google Bard to Not Use Your Website for Training.
You can prevent ChatGPT and Bard from using the content on your website to train their models. Using your website's "robots.txt" file, you can instruct bots and web crawlers NOT to scrape your content.
While it may appear that AI dominates every industry and sector, public sentiment around AI remains skeptical. We've witnessed its lies (hallucinations), its generic knowledge (regurgitated output) and its biased responses (whitewashed content).
Every new AI release unveils yet another countermeasure, and this arsenal keeps expanding with approaches that block the AI's access to data. Today, an oldie-but-goodie approach is highlighted, along with a recap of recent and human-based interventions. Stacking these old-school, new-school and forever-school approaches helps to curb runaway AI.
The robots.txt file has been a staple since the beginning of the internet. It has been a constant line of defense for limiting access to a website's pages and directories. Providing simple instructions for web crawlers, and now generative AI algorithms, should be a universal stopgap. It's not. First, adding disallow instructions to this file contradicts a website's purpose, which is to be discovered, so the incentive to update the robots.txt file remains low. Second, following this file's instructions isn't required or enforceable.
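As a concrete sketch, here is what those disallow instructions could look like in a robots.txt file. The user-agent tokens GPTBot (OpenAI's crawler) and Google-Extended (Google's AI training opt-out token) are the names those companies have published; `example.com` and the exact rules are illustrative assumptions, not a one-size-fits-all configuration.

```text
# robots.txt at https://example.com/robots.txt
# Block OpenAI's training crawler from the entire site.
User-agent: GPTBot
Disallow: /

# Block Google from using site content for AI training (Bard/Gemini).
User-agent: Google-Extended
Disallow: /

# Everyone else (including regular search crawlers) stays welcome.
User-agent: *
Allow: /
```

Note that Google-Extended only controls AI training use; regular Google Search indexing is governed by Googlebot and is unaffected by this rule.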
The resilience of the robots.txt file comes from web crawling politeness policies, the most relevant being that web scraping tools respect a site's allow and disallow instructions. So what does that mean for you? If you host a website, revise your robots.txt file to disallow bots from continuing to access, read and ingest your webpage content. Companies adhere to politeness policies because a bad-actor label damages their reputation and profits. Follow the step-by-step guide given in the article below.
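If you want to sanity-check your rules before publishing them, Python's standard-library `urllib.robotparser` applies robots.txt instructions the same way a polite crawler would. This is a minimal sketch: the robots.txt content, the `example.com` URLs and the bot names checked are illustrative assumptions (GPTBot and Google-Extended are the crawler tokens OpenAI and Google have published).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows AI training crawlers
# while leaving the rest of the site open to everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler calls can_fetch() before requesting a page.
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))       # False
print(parser.can_fetch("Google-Extended", "https://example.com/about"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/about"))     # True
```

The catch, as noted above: `can_fetch()` returning False only matters if the bot bothers to ask. Politeness policies, not enforcement, are what make these rules stick.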
There’s a swell of checks-and-balances methods performed as part of AI systems. Recently, watermarking in AI has gained renewed attention. It amounts to adding a hard-to-remove digital tracker to an algorithm: a small piece of code that keeps a running log of how the algorithm was manipulated. Digital watermarking can help distinguish between what’s AI generated, AI assisted or AI enabled, and can help indicate what’s perceived as digitally true. Automated content moderation algorithms attempt to identify blatantly inappropriate content. When executed responsibly, they can also help combat AI-enabled fraud, misinformation and disinformation. Otherwise, content moderation dissolves into algorithmic misogynoir.
The human eye — with our critical thinking skills — remains one of the best stopgap measures for checking AI’s output. Here’s a suggested shortcut to help you vet responses more quickly.
Read the Entire Article Here!
"People believe that they won’t be able to learn to code, that it’ll take a long time to learn the skill well or they have to be a math prodigy to understand and apply coding concepts. In reality, you don’t have to be “super smart,” but you must be persistent." pg 87
Get Your Copy of Data Conscience Here!
Stay Rebel Techie,
Dr. Brandeis
Thanks for subscribing! If you like what you read or use it as a resource, please share the newsletter signup with three friends!
Removing the digital debris swirling on the interwebs. A space you can trust to bring the data readiness, AI literacy and AI adoption realities you need to be an informed and confident leader. We discuss AI in education, responsible AI and data guidance, data/AI governance and more. Commentary is often provided by our CEO, Dr. Brandeis Marshall. Subscribe to Rebel Tech Newsletter!