How I Made a Self-Quoting Tweet

Or the real reason Twitter doesn't want you to have an edit button.

I'll try to leave the pulp in here and keep this as chronological as I can. In that spirit, no tweets were harmed in the making of this post.

The original idea of making a tweet that quote-tweets itself dates from the 28th of May 2020, as recorded in Evernote, but I think it had likely occurred to me earlier, when considering what ramifications an edit button on Twitter would have: most notably being able to mislead, being able to vandalize someone's timeline post-retweet, and of course being able to edit your tweet to refer to itself.

Fundamentally the challenge is just correctly guessing what ID a given tweet is going to get, then appending that onto the URL for our profile and tweeting it.

This initial note already had some background research done into determining how tweet IDs were generated, with a link to this article containing a useful breakdown of Twitter's Snowflake IDs, so thanks to the author of that, Nauman Siddique.

Anatomy of a tweet ID

Twitter used to use sequential IDs but no longer does. Public-facing sequential IDs have the drawback of making usage of your platform easy to estimate. They are also hard to generate in a distributed fashion while preserving order.

From the link above we find that the new Twitter IDs (used for more than just tweets—for example, lists) are composed of three parts: a timestamp, a machine ID, and a sequence number, arranged like so:

TIMESTAMP MACHINE ID SEQUENCE NUM
41 BITS   10 BITS    12 BITS
000...000 0000000000 000000000000 

These are then just stuck together and interpreted as a decimal number and look something like 1320553050730340354.
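To make "stuck together" concrete, here is a minimal Python sketch that writes each field as a fixed-width binary string, concatenates the three, and reads the result back as a single integer. The name `pack_fields` is made up for this example, and the field values are the ones extracted from a real tweet ID later in this post.

```python
# Field widths from the layout above: 41-bit timestamp, 10-bit machine ID,
# 12-bit sequence number.
TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12


def pack_fields(timestamp: int, machine_id: int, sequence: int) -> int:
    """Concatenate the three fields bit-wise and read them as one integer."""
    bits = (
        format(timestamp, f"0{TIMESTAMP_BITS}b")
        + format(machine_id, f"0{MACHINE_BITS}b")
        + format(sequence, f"0{SEQUENCE_BITS}b")
    )
    return int(bits, 2)


# Prints 1309237975868469248
print(pack_fields(312146657912, 375, 0))
```

String concatenation is slow compared with bit shifts, but it mirrors the diagram directly, which is the only point of this sketch.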

Brute-forcing the whole thing is not going to work here as there are so many possibilities, but thankfully the largest section is the timestamp, which should be easy enough to guess correctly. This will likely just involve finding the delay between my program guessing an ID and Twitter assigning an ID to the generated tweet. There will always be fluctuations here as we're dealing in milliseconds and both my computer and all of Twitter's systems will be under varying loads. However, it should be somewhat consistent, at least within a given timeframe. Hopefully then we can just figure the other two out as they are much smaller, having only 1024 and 4096 possibilities compared to the timestamp's over 2 trillion (that's a lot of milliseconds).
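The sizes quoted above follow directly from the bit widths. As a quick check, this short Python snippet works them out, along with how many years the timestamp field covers (assuming 365.25-day years, which is my choice for the conversion).

```python
TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12

machine_ids = 2 ** MACHINE_BITS        # 1024
sequence_numbers = 2 ** SEQUENCE_BITS  # 4096
timestamps = 2 ** TIMESTAMP_BITS       # 2199023255552, just over 2 trillion

# Even with a perfect timestamp, blindly guessing the two small fields
# still leaves about 4 million combinations.
small_field_combinations = machine_ids * sequence_numbers  # 4194304

# The timestamp range expressed in years.
ms_per_year = 1000 * 60 * 60 * 24 * 365.25
years = timestamps / ms_per_year

print(machine_ids, sequence_numbers, timestamps)
print(small_field_combinations)
print(round(years, 1))
```

So the two small fields are only small relative to the timestamp, and anything that narrows them down further is welcome.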

I knew I'd likely have to do some spamming as I was not going to get it right on the first go, so I created a new account to spare my few but wonderful followers.

Why the name?

From Wikipedia:
A quine is a computer program which takes no input and produces a copy of its own source code as its only output.

So it's only natural that a quinetweet would print its own URL, and thus hopefully quote tweet or retweet itself. And naturally I set the profile photo to one of Quine himself, and the banner to a relevant Escher lithograph.

Tweeting with the API

I signed the new profile up for a developer account to start tweeting programmatically using Twitter's API, and began with their examples using twurl.

The first step is authorization with my shiny new API keys:

twurl authorize --consumer-key CONSUMER_KEY \
                --consumer-secret CONSUMER_SECRET

And now we can get straight into tweeting:

twurl -d 'status=Test tweet using the POST statuses/update endpoint' /1.1/statuses/update.json

Resulting in our beautiful first tweet:

Test tweet using the POST statuses/update endpoint

— quinetweet (@quinetweet) September 23, 2020

In the returned response there is quite a lot of information, but we only really care about the ID, in this case 1308911113229209601 which thankfully matches up with what shows up on Twitter's website—they're not lying!

Okay, so now let's quote tweet the previous tweet:

twurl -d 'status=https://twitter.com/quinetweet/status/1308911113229209601' /1.1/statuses/update.json

Beautiful! I can almost taste the recursion already.

https://t.co/FXps7y4yMw

— quinetweet (@quinetweet) September 23, 2020

Now to investigate the behaviour of the various components of the ID, let's do two tweets in quick succession, using a simple Bash loop:

for i in {1..2}; do twurl -d 'status=Quick succession test' /1.1/statuses/update.json; done

To which we're met with a warning from Twitter about the second attempt being a duplicate—so apparently Twitter do have some protection against unoriginality.

No worries, simply adding a variable should fix this:

for i in {1..2}; do twurl -d 'status=Quick succession test $i' /1.1/statuses/update.json; done

Oh no! This is also getting the same duplicate warning, what's going on? Let's check Twitter:

Quick succession test $i

— quinetweet (@quinetweet) September 24, 2020

How embarrassing—we've accidentally linked to Intelsat's stock ticker! We should have used double quotes:

for i in {1..2}; do twurl -d "status=Quick succession test $i" /1.1/statuses/update.json; done

Quick succession test 1

— quinetweet (@quinetweet) September 24, 2020

Quick succession test 2

— quinetweet (@quinetweet) September 24, 2020

Finally! Now we can say we're programmatically tweeting without completely lying.

Okay now let's take a look at these last two IDs, splitting them into timestamp, machine ID, and sequence number:

1309237975868469248 -> (312146657912, 375, 0)
1309237977982345216 -> (312146658416, 362, 0)

We see the second was posted 504 ms after the first (from Twitter's point of view), the machine IDs differ by 13, and both the sequence numbers are 0. We might be able to get away with assuming the sequence number is most commonly 0. This is great news because it was the larger of the two non-timestamp components so greatly reduces the number of checks we'll have to make. The range for our brute forcing looks like it might be small enough after all!

While Bash was great to start off with, I'm more comfortable with Python, so...

Let's start guessing some IDs

I'm just going to post the final code here with a brief description of each function. I'm sure there are numerous ways the code could be improved (for one it should probably take the machine ID and other guesswork bits as arguments).

tweet_id_from_timestamp

This does roughly what it says on the tin, and was created by simply reversing the get_tweet_timestamp function that was helpfully shared in the article mentioned in the intro, including Twitter's timestamp OFFSET that they had already worked out.
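The code itself is not reproduced in this text, so here is a minimal sketch of what the pair of functions could look like, not the original implementation. The value of `OFFSET` is the commonly documented Snowflake epoch in Unix milliseconds. The signatures and the optional `machine_id` and `sequence` parameters are my assumptions.

```python
from datetime import datetime, timezone

# Twitter's Snowflake epoch in Unix milliseconds (the OFFSET mentioned above).
OFFSET = 1288834974657


def get_tweet_timestamp(tweet_id: int) -> int:
    """Unix time in milliseconds at which the ID was generated."""
    return (tweet_id >> 22) + OFFSET


def tweet_id_from_timestamp(unix_ms: int, machine_id: int = 0, sequence: int = 0) -> int:
    """Reverse of get_tweet_timestamp: build an ID from a Unix ms timestamp."""
    return ((unix_ms - OFFSET) << 22) | (machine_id << 12) | sequence


# The first test tweet from earlier in the post.
unix_ms = get_tweet_timestamp(1308911113229209601)
print(datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc))
```

Round-tripping an ID through both functions only reproduces it exactly if the same machine ID and sequence number are passed back in, because the timestamp alone discards the low 22 bits.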

tweet_id_to_parts

This gets a tweet id and splits it up into the parts described above: the timestamp, machine ID, and sequence number.
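A sketch of how such a split can be done with shifts and masks follows. It is an illustration based on the bit layout described earlier and is not copied from the original code. The two IDs are the "Quick succession test" tweets from above.

```python
from typing import Tuple


def tweet_id_to_parts(tweet_id: int) -> Tuple[int, int, int]:
    """Split an ID into (timestamp, machine ID, sequence number)."""
    timestamp = tweet_id >> 22             # top 41 bits
    machine_id = (tweet_id >> 12) & 0x3FF  # next 10 bits
    sequence = tweet_id & 0xFFF            # lowest 12 bits
    return timestamp, machine_id, sequence


print(tweet_id_to_parts(1309237975868469248))  # (312146657912, 375, 0)
print(tweet_id_to_parts(1309237977982345216))  # (312146658416, 362, 0)
```

Note that the timestamp returned here is relative to Twitter's own epoch, not Unix time.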

compare_ids

To see how far off our guesses are, we'll need a function to compare the ID we guessed to the one Twitter actually assigned. A tweet ID looks like a single number, so you might think you could just subtract one from the other. However, because the timestamp sits above the 22 bits of machine ID and sequence number, being off by one millisecond while getting everything else right would put your guess off by 4,194,304 (2^22). For this reason it makes more sense to compare the individual parts, so that is what we do here.
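Here is a hedged sketch of a field-by-field comparison. The sign convention (guess minus actual) is inferred from the terminal output shown later in this post, and the helper is redefined so the snippet runs on its own.

```python
def tweet_id_to_parts(tweet_id):
    """Split an ID into (timestamp, machine ID, sequence number)."""
    return tweet_id >> 22, (tweet_id >> 12) & 0x3FF, tweet_id & 0xFFF


def compare_ids(guessed_id, actual_id):
    """Per-field difference between guess and actual (guess minus actual)."""
    guessed = tweet_id_to_parts(guessed_id)
    actual = tweet_id_to_parts(actual_id)
    return tuple(g - a for g, a in zip(guessed, actual))


actual_id = 1309935902421114881
one_ms_late = actual_id + (1 << 22)  # same ID with the timestamp bumped by 1 ms

print(one_ms_late - actual_id)              # 4194304
print(compare_ids(one_ms_late, actual_id))  # (1, 0, 0)
print(compare_ids(1309935902421114880, actual_id))  # (0, 0, -1)
```

The raw difference hides which field was wrong, while the tuple says at a glance whether to adjust the delay, the machine ID, or nothing at all.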

guess_tweet_id

Again, a simply named function that guesses a tweet ID based on the time it is called and another time offset and machine ID. Note here that we don't do anything about the sequence number as it was usually zero so there's not much point guessing anything else.
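A minimal sketch of such a guess, under stated assumptions: the parameter names, the optional `now_ms` argument (there only to make the function testable) and the example values 250 and 375 are all placeholders of mine, not the values actually used.

```python
import time
from typing import Optional

OFFSET = 1288834974657  # Snowflake epoch in Unix milliseconds


def guess_tweet_id(delay_ms: int, machine_id: int, now_ms: Optional[int] = None) -> int:
    """Guess the ID of a tweet posted now and assigned delay_ms later."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    timestamp = now_ms + delay_ms - OFFSET
    # The sequence number is left at 0, so the lowest 12 bits stay empty.
    return (timestamp << 22) | (machine_id << 12)


guess = guess_tweet_id(delay_ms=250, machine_id=375)
print(f"https://twitter.com/quinetweet/status/{guess}")
```

Keeping the delay and machine ID as arguments means they can be tuned between bursts without touching the function.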

guess

This function actually does the posting of the tweets and will guess N different tweets in quick succession with the same time offset and machine ID. I kept N low enough so I could manually change the offsets and if they were very far off I wouldn't eat into the rate limit too much.
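The loop can be sketched roughly as below. To avoid guessing at any particular Twitter client library, the API call is passed in as a hypothetical `post_status` callable that takes the tweet text and returns the ID Twitter assigned. `fake_post_status` is a stand-in that never touches the network, and all parameter names are assumptions.

```python
import time
from typing import Callable, List, Tuple

OFFSET = 1288834974657  # Snowflake epoch in Unix milliseconds
PROFILE_URL = "https://twitter.com/quinetweet/status/"


def guess_tweet_id(delay_ms: int, machine_id: int) -> int:
    """Guess an ID from the current time, a delay and a machine ID."""
    now_ms = int(time.time() * 1000)
    return ((now_ms + delay_ms - OFFSET) << 22) | (machine_id << 12)


def guess(
    post_status: Callable[[str], int], n: int, delay_ms: int, machine_id: int
) -> List[Tuple[int, int]]:
    """Post up to n guesses; return (guessed, actual) pairs, stopping on a hit."""
    results = []
    for _ in range(n):
        guessed = guess_tweet_id(delay_ms, machine_id)
        actual = post_status(PROFILE_URL + str(guessed))
        results.append((guessed, actual))
        if guessed == actual:
            print("Success")
            break
    return results


def fake_post_status(text: str) -> int:
    """Hypothetical stand-in for the API: always misses by one sequence number."""
    return int(text.rsplit("/", 1)[1]) + 1


for guessed, actual in guess(fake_post_status, n=3, delay_ms=250, machine_id=375):
    print(guessed, actual)
```

Returning the pairs instead of only printing them leaves room to feed the misses into whatever adjusts the delay next.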

A non-gist version of the code is on GitHub here.

An idea that didn't work

While manual adjustment of the offsets and machine ID was getting me kind of close, I thought it could be even better to do that automatically. If these values were time-sensitive, a program would be able to update them much faster than I could. I tried to do this by updating based on the mean error of the previous several responses, but this ended up not really working (maybe the median or mode would fare better here). It ended up being easier to just eyeball the differences and pick something reasonable, though I'm not entirely sure I know, even now, exactly what I was doing.
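For anyone who wants to try the median variant, here is an untested-in-the-wild sketch of what it might look like. The class name, window size and starting delay of 250 ms are invented for illustration. The five errors fed in are the first five timestamp differences from the terminal output shown later in this post, read as guess minus actual.

```python
from collections import deque
from statistics import median


class DelayEstimator:
    """Keeps a sliding window of timestamp errors and nudges the delay."""

    def __init__(self, initial_delay_ms: int, window: int = 10):
        self.delay_ms = initial_delay_ms
        self.errors = deque(maxlen=window)

    def record(self, timestamp_error_ms: int) -> None:
        """Store one timestamp error (guess minus actual, in milliseconds)."""
        self.errors.append(timestamp_error_ms)

    def update(self) -> int:
        """Shift the delay by the median error, then start a fresh window."""
        if self.errors:
            # A positive error means the guess was too late, so shrink the delay.
            self.delay_ms -= round(median(self.errors))
            # Old errors were measured against the old delay, so drop them.
            self.errors.clear()
        return self.delay_ms


estimator = DelayEstimator(initial_delay_ms=250)
for error in [7, 8, 0, 7, 3]:
    estimator.record(error)
print(estimator.update())  # 243
```

The median ignores the occasional wild miss that would drag a mean around, though whether that is enough to beat eyeballing is an open question.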

Shit gets weird

With that all done, I began the sport of just letting it run in short bursts until the rate limit (300 tweets per 3-hour window) forced me to go do something else. Who knew rate limits could be such an effective public health measure?

A strange and completely unexpected thing began to happen. In these quiet hours of the internet, some of the attempted self-quotes started linking to tweets from other accounts, mostly in South America and Japan! What was going on? All of the guessed URLs had the quinetweet account name hardcoded into them, so why and how were they linking to other tweets?

https://t.co/OOUKS9dwCp

— quinetweet (@quinetweet) September 26, 2020

That is most certainly not my account. So what did we actually try to tweet here and what did it link to? Conveniently both of our accounts have ten-character names, so the URLs line up nicely in a monospaced font, which makes visually comparing them even easier than normal.

https://twitter.com/quinetweet/status/1309684114073808896
https://twitter.com/gzhdigital/status/1309684114073808896

Okay, wild! We guessed someone else's tweet ID! And as the IDs are time-dependent that means they were met with an instantaneous retweet—creepy. Also, it seems like Twitter doesn't actually care about the username and just resolves URLs based on the tweet ID. I'm sure lots of people already knew that but it's new to me.
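The comparison can also be done in code. This small sketch pulls the numeric ID out of each URL and confirms the two only differ in the username part. `status_id` is a made-up helper that assumes the `/<user>/status/<id>` path shape seen above.

```python
from urllib.parse import urlparse


def status_id(url: str) -> int:
    """Return the numeric ID from a /<user>/status/<id> URL."""
    parts = urlparse(url).path.strip("/").split("/")
    # parts looks like ["quinetweet", "status", "1309684114073808896"]
    return int(parts[2])


guessed = "https://twitter.com/quinetweet/status/1309684114073808896"
resolved = "https://twitter.com/gzhdigital/status/1309684114073808896"

print(status_id(guessed) == status_id(resolved))  # True
```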

Let's try another, this time from the Pope: https://twitter.com/Pontifex/status/1107421599333007362

Okay this is pretty interesting, but back to the task at hand. We're met with plenty more of these close misses along the way, which brings up the idea that it's probably easier to guess your own ID at times of lower traffic, when fewer people are fighting for it.

The latest content discovery mechanism

There was one more of these examples that definitely deserves a shoutout. This one was slightly confusing at first because I actually retweeted someone else's retweet, but it's worth a watch ('tis a bit loud).

https://t.co/WKJ360tAsR

— quinetweet (@quinetweet) September 26, 2020

The song is Thundercat's Funny Thing which also has a great video, so I think I may have just found my new favourite content discovery mechanism—generating random tweet IDs and checking if they exist. Cue the old-timers saying that's how they browsed the internet before search engines.

The Promised Land

You may have noticed from the code that for every guess we print the actual ID followed by how far off the guess was with respect to the time, the machine ID, then the sequence number. Here's a sample of the terminal output:

1309935898243600384
7 -3 0
1309935900311334913
8 11 -1
1309935902421114881
0 0 -1
1309935904895758336
7 -1 0
1309935906963501058
3 11 -2
1309935909010317312
6 12 0
1309935911094886401
-4 12 -1
1309935913133314054
15 13 -6
1309935915259891713
1 -3 -1
1309935917319282689
2 0 -1

Hot damn, check the third example (the one with differences 0 0 -1):
The timestamp: exactly right, down to the millisecond!
The machine ID: nailed it!
The sequence number that we stopped caring about because it generally just seems to be zero: ... NOT ZERO!

Guessed ID: 1309935902421114880
Actual ID:  1309935902421114881

Fuck, that was close! I'm still not going to change the sequence number though as 0 still seems to be the most common value it takes. This makes sense as it's a counter and has to go through 0 to get to any other value. Similarly 1 should be more common than 2.

We get a few more that are super close. For a frustrating number of them the two more opaque ones are perfect and the time is just off by a few milliseconds. We adjust appropriately (the appropriate level in these close calls being not much) and soldier on:

1309951030889787392
-3 -2 0
1309951033003700224
-11 2 0
1309951035092463620
5 2 -4
1309951037185372161
9 14 -1
1309951039249027077
8 0 -5
1309951041321013248
0 0 0
Success

:O SUCCESS!!! We have done it! Twitter has eaten its own tail. In the throes of elation at finally getting this, I immediately regretted how mundane I had made the success printout. Anyway, let's see the fabled quinetweet, the bringer of loops:

https://t.co/MAbIwtoonW

— quinetweet (@quinetweet) September 26, 2020

Okay, that's kind of disappointing (it works better in the Twitter app so I'm told, but still tempted to submit a bug-report) but I care much less about the visual display than the fact the deed is done! And it only took 960 tweets to do it.

If you want to try to do this yourself using fewer tweets, better code, or anything else, please be my guest! I wonder how reliably this can be done. Is one in every hundred tweets doable? Probably. One in ten? Could be tough, but then again there's so much more to learn about these IDs and lots of analysis that I didn't do, so it's entirely possible. If you want to use my code, it's all here.

And another thing

One other weird idea would be to tweet lots of IDs with timestamps from a good bit in the future—maybe a month or a year from now—and see the reactions when people realise they were quote-tweeted a year before they actually sent the tweet! Heck, why limit ourselves to such short-termism? There's enough room in these timestamps for almost 70 years of milliseconds. This means we could actually quote-tweet someone before they've even been born! Now that would be something.

If you liked this article, have anything to add, or have beaten my score, please let me know under my tweet about this on Twitter. I'd love to hear from you!

And for anyone at Twitter who was depending on the network of tweets being a Directed Acyclic Graph, I'm so terribly sorry.