new post. ha. only because it became a link collection that wanted a title. and now it feels like I'm microblogging in my commit history.

Markus Heurung 2025-03-31 18:18:11 +02:00
---
layout: default
title: One can hope.
tags:
- ai
- poison
date: "2025-03-31 14:18"
---
[poisoning well](https://heydonworks.com/article/poisoning-well/)
> One of the many pressing issues with Large Language Models (LLMs) is they are trained on content that isn't theirs to consume.
>
> Since most of what they consume is on the open web, it's difficult for authors to withhold consent without also depriving legitimate agents (AKA humans or “meat bags”) of information.
>
> Some well-meaning but naive developers have implored authors to instate robots.txt rules, intended to block LLM-associated crawlers.
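For context, the kind of robots.txt rule the article calls well-meaning but naive looks like this (GPTBot and CCBot are real, documented crawler tokens; the catch is that nothing forces a scraper to honor them):

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```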
Related from my bookmarks:
- [iocaine](https://iocaine.madhouse-project.org/), which I've been following for quite a while since [@algernon](https://come-from.mad-scientist.club/@algernon) does a great job of developing in the open.
> This is deliberately malicious software, intended to cause harm. Do not deploy if you aren't fully comfortable with what you are doing. LLM scrapers are relentless and brutal, they will place additional burden on your server, even if you only serve static content.
- [tobi of gotosocial about related fediverse stats poisoning](https://gts.superseriousbusiness.org/@dumpsterqueer/statuses/01JK5PXHCHWP58172Y40QKE2ZZ)
- > [Quixotic](https://marcusb.org/hacks/quixotic.html) is a program that will feed fake content to bots and robots.txt-ignoring LLM scrapers.
- > [Nepenthes](https://zadzmo.org/code/nepenthes/) [...] is a tarpit intended to catch web crawlers. Specifically, it's targeting crawlers that scrape data for LLMs - but really, like the plants it is named after, it'll eat just about anything that finds its way inside.
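To make the tarpit idea concrete: the core trick shared by tools like Nepenthes and Quixotic is serving endless pages of nonsense that link only to more nonsense, so a crawler that ignores robots.txt wanders forever. This is a toy sketch of that idea in Python, not the actual code of any of the tools above; the function and URL scheme are mine:

```python
import random
import string

def junk_page(depth: int = 0, links: int = 5) -> str:
    """Generate a page of nonsense links pointing at more nonsense pages.

    A crawler that follows every link it finds will descend into /trap/
    indefinitely; a human never sees these URLs because no real page
    links to them.
    """
    def slug() -> str:
        # Random lowercase gibberish, both as link targets and "content".
        return "".join(random.choices(string.ascii_lowercase, k=8))

    anchors = "\n".join(
        f'<a href="/trap/{depth + 1}/{slug()}">{slug()}</a>'
        for _ in range(links)
    )
    return (
        f"<html><body><p>{slug()} {slug()} {slug()}</p>\n"
        f"{anchors}\n</body></html>"
    )
```

The real tools go further - Markov-generated prose instead of gibberish, deliberate slow responses to waste crawler time - but the endless self-referential link graph is the heart of it.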