Scrape

I want to talk about the recent news of Tumblr and WordPress parent company Automattic being in talks to sell user content to AI companies OpenAI and Midjourney to train their models on. All that we currently know is in that sentence, by the way; the talks are still in progress and the company’s not super transparent about it, which makes sense to me.

What doesn’t make sense to me is that a lot of Internet users seem to think this is outrageous, or new, or somehow strange behaviour for a large company, or that it is only just starting. It seems obvious, given AI companies’ proclivity to go ahead and do the thing and then ask forgiveness rather than permission, that Tumblr/WordPress users’ public data has already been hoovered up into the gaping maw of the AI training sets and that this is a mea-culpa gesture: not so much a business proposal as a sheepish admission of guilt and an offer of monetary compensation. One wonders what would have happened had they not been called out.

When I was in publishing school back in the early twenty-teens, it was drilled into us that any blog content could be considered published and therefore disqualified from any submission to a publication unless they were specifically asking for previously published pieces. There was at that time a dawning awareness that whatever you had put on the internet (or continued to put out there) was not going to go away. Are you familiar with how Facebook saves everything that you type, even if you don’t post it? That was the big buzz, back then. Twitter was on the rise, and so was Tumblr, and in that context, it seemed a bit naïve to assume that anything written online would ever be private again (if it ever was in the first place…). It was de rigueur for me to go into my privacy settings on Facebook and adjust them in line with updates every few months.

So, for example, this little post of mine here wouldn’t really count as submittable material unless I substantially added to or changed it in some way before approaching a publisher with it. (The definition of “substantially” is up to said publisher, of course.) This might have changed with time (and depending on location), but my brain latched on to it and I find it safest to proceed from this assumption. For the record, I don’t think it’s foolish or naïve for internet users to hold the opposite assumption, and trust that the companies whose platforms they are using will handle their content in a respectful way and guard their privacy. That should be the baseline. It is a right and correct impulse, taken egregious advantage of by the morally bankrupt.

In any case, I at first interpreted this whole debacle as …slightly empowering? to users? in a way? as now there are opt-out procedures that Tumblr users can take to put the kibosh on a process that is already happening, and now this scraping of data will be monitored by the parent site, instead of operating according to a don’t-ask-don’t-tell policy. I have to wonder if the same will be extended to Reddit users, or the commenters on CNN or Fox News. And whether my first impression will bear up under any weight of scrutiny whatsoever.

On social media, I assume that everything I post will always and forever be accessible to anyone with enough skills (or money) to want to access it. Same with email, anything in “the cloud” that is not hosted on a double-encrypted server, my search engine preferences, and really any site that I have a login for. My saving grace thus far has been that I am a boring person with neither fame nor wealth nor enemies with a reason to go after me. Facebook got big when I was in my undergraduate years; given that social media was extremely nascent back then, I put a lot of stuff up that I shouldn’t have. Data that I care about. Things I would like to keep secret, keep safe. But I’ve long made my peace with the fact that the internet has known everything I was willing to put up about myself for my entire adult life, and continues to grasp for more and more. At least on Tumblr, I can say “no”, and then get righteously indignant when that “no” is inevitably ignored and my rights violated.

I hate this state of affairs. But I also want to be able to talk to my family, connect with other solarpunks, do research, communicate with my colleagues … to live in a society, one might say. I try not to let it bother me much. However, I DO sign anything and everything that comes my way from the Electronic Frontier Foundation, an organization dedicated to legislating the shit out of these corporations that have given us free tickets to unlimited knowledge and communication for the price of our personal data, and effectively excommunicated anyone who does not agree to their TOS. The EFF is US-based, but given that most of the social media and AI giants on the internet are also US-based, I feel like it’s relevant.

In my solarpunk future, the internet does still exist, and we can access and use it as much or as little as we like. But it is tightly controlled so that the reckless appropriation and use of art, writing, content, and personal data cannot happen and is not the fee charged for participation in the World Wide Web. I want to live in a world where my personal data is my own but I can still reach out to my friends and family whenever I’d like, about whatever I want; isn’t that a nice thought?
