new post. ha. only because it became a link collection that wanted a title. and now it feels like I'm microblogging in my commit history.
parent 9ef4fc7317
commit a4e30ea7e1
1 changed file with 25 additions and 0 deletions

_posts/2025-03-13-one-can-hope.md (new file, +25)
@@ -0,0 +1,25 @@
---
layout: default
title: One can hope.
tags:
- ai
- poison
date: "2025-03-31 14:18"
---

[poisoning well](https://heydonworks.com/article/poisoning-well/)

> One of the many pressing issues with Large Language Models (LLMs) is they are trained on content that isn’t theirs to consume.
>
> Since most of what they consume is on the open web, it’s difficult for authors to withhold consent without also depriving legitimate agents (AKA humans or “meat bags”) of information.
>
> Some well-meaning but naive developers have implored authors to instate robots.txt rules, intended to block LLM-associated crawlers.
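For reference, the kind of robots.txt rule the quote is talking about looks roughly like this. The specific user agents below (GPTBot is OpenAI's crawler, CCBot is Common Crawl's) are my own examples, not something from the article, and a scraper is free to ignore the file entirely, which is rather the point:

```text
# Example rules for two commonly cited LLM-associated crawlers.
# The user-agent names are illustrative; check each vendor's docs.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```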
Related from my bookmarks (a toy sketch of the shared tarpit/fake-content idea follows this list):
- [iocaine](https://iocaine.madhouse-project.org/), which I've been following for quite a while since [@algernon](https://come-from.mad-scientist.club/@algernon) does a great job of developing in the open.
> This is deliberately malicious software, intended to cause harm. Do not deploy if you aren’t fully comfortable with what you are doing. LLM scrapers are relentless and brutal, they will place additional burden on your server, even if you only serve static content.
- [tobi of gotosocial about related fediverse stats poisoning](https://gts.superseriousbusiness.org/@dumpsterqueer/statuses/01JK5PXHCHWP58172Y40QKE2ZZ)
- > [Quixotic](https://marcusb.org/hacks/quixotic.html) is a program that will feed fake content to bots and robots.txt-ignoring LLM scrapers.
- > [Nepenthes](https://zadzmo.org/code/nepenthes/) [...] is a tarpit intended to catch web crawlers. Specifically, it's targetting crawlers that scrape data for LLMs - but really, like the plants it is named after, it'll eat just about anything that finds it's way inside.
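To make the idea these tools share a bit more concrete, here is a tiny, deliberately naive sketch of a tarpit that serves slow, procedurally generated pages full of nonsense and self-referencing links. It is not code from iocaine, Quixotic, or Nepenthes; every name, path, and number in it is made up for illustration:

```python
# A toy sketch of the "slow fake pages + endless links" idea behind
# tarpit tools like the ones above -- not code from any of them.
import random
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit"]


def fake_page(path: str) -> str:
    """Build a page of nonsense text plus links deeper into the maze."""
    rng = random.Random(path)  # same path -> same page, so the maze looks stable
    prose = " ".join(rng.choices(WORDS, k=200))
    links = " ".join(
        f'<a href="/{rng.randrange(10**8):08d}">more</a>' for _ in range(10)
    )
    return f"<html><body><p>{prose}</p><p>{links}</p></body></html>"


class Tarpit(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(2)  # make every request slow: that's the "tarpit" part
        body = fake_page(self.path).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


if __name__ == "__main__":
    # http.server is demo-only; a real deployment would sit behind a proper
    # web server and only answer requests routed into the trap.
    HTTPServer(("127.0.0.1", 8080), Tarpit).serve_forever()
```

Seeding the generator with the request path keeps each fake page stable across visits, so the maze reads like a static (if very dull) site rather than obvious random noise.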