“There are things known and there are things unknown and in between are the doors of perception.” — Aldous Huxley
I’m Huxley Westemeier (’26), and welcome to “The Sift,” a weekly opinion column focused on the impacts and implications of new technologies.
______________________________________________________
AI has been in the news a lot in the last week: OpenAI acquired the website domain “chat.com” for over $10 million, Apple Intelligence features have started to appear on newer devices, and perhaps most importantly, AI-generated election misinformation spiked on social media services last Tuesday. Instead of focusing on these stories, which are pretty self-explanatory, I’d like to highlight a specific technological advancement.
Decart.ai, a company specializing in real-time AI-generated video, has trained a model on perhaps the most unexpected source: Minecraft. Researchers used hours of Minecraft gameplay collected by OpenAI to train a transformer model called Oasis that, according to the site, “generates frames autoregressively.” Before Oasis, text-to-video generators such as OpenAI’s Sora processed multiple frames simultaneously, attending in both directions to ensure visual consistency. Oasis takes the opposite approach: it generates each frame one at a time, relying only on context from the frames before it.
In the split second between frames, the model can consider input from the player. If you press the spacebar, the model determines, based on patterns in its training gameplay, that it should start a jump animation in the next frame.
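That loop can be sketched roughly as follows. This is a minimal Python sketch of the autoregressive idea, not Decart’s actual code: `predict_next_frame` is a hypothetical stand-in for the real transformer, and here it just nudges pixel values so the loop runs.

```python
import numpy as np

FRAME_SHAPE = (360, 640, 3)  # a low-resolution RGB frame

def predict_next_frame(prev_frame: np.ndarray, action: str) -> np.ndarray:
    """Stand-in for the learned model: maps (previous frame, player action)
    to the next frame. A real model would be a large transformer; this toy
    version only perturbs the previous frame so the example is runnable."""
    rng = np.random.default_rng(abs(hash(action)) % (2**32))
    noise = rng.normal(0, 1, size=FRAME_SHAPE)
    return np.clip(prev_frame * 0.99 + noise, 0, 255)

def play(actions: list[str]) -> list[np.ndarray]:
    """Generate one new frame per player input -- no game engine involved."""
    frame = np.zeros(FRAME_SHAPE)  # the "world" starts as a blank frame
    frames = [frame]
    for action in actions:  # e.g. "jump" when the spacebar is pressed
        # Each frame is conditioned only on the previous frame plus the input.
        frame = predict_next_frame(frame, action)
        frames.append(frame)
    return frames

frames = play(["forward", "forward", "jump"])
print(len(frames))  # the initial frame plus one frame per input
```

The key structural point is that nothing outside the model tracks world state: whatever consistency the next frame has must come from what the model inferred from the previous one.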
I recommend trying the public demo here (completely free) before continuing.
I noticed a few things: if you turn away from a scene and look back, it will sometimes have entirely disappeared; the inventory system is broken (the number of torches you hold, for example, changes from frame to frame); and while the game is playable, it runs slowly and at a low resolution.
But there’s an essential distinction between this game and Minecraft. The official game has a built-in physics and lighting engine that manages the visual elements and helps the game feel more immersive and responsive. The AI-generated version doesn’t just lack physics or lighting algorithms: there’s no game engine at all.
That’s right.
Everything you see is solely a result of training data combined with keypress information. When you move your mouse around, it isn’t telling a virtual camera to pan through 3D space, as in the real Minecraft engine. The system has simply learned from its training data that the world rotates when the user moves the mouse.
This is a big deal: it’s the distinction between staged AI systems (like Tesla’s recent Tesla Bot demo, which turned out to be controlled by humans) and actual AI models that can take user input and reshape their output in real time.
The technology isn’t fully fleshed out, as anyone who plays Oasis will quickly find. However, it’s still impressive that the model can (somewhat) mimic Minecraft’s gameplay and visual elements without requiring a dedicated game engine. I firmly believe that AI systems such as chatbots and video-generation models raise significant ethical concerns: using an actor’s likeness or an author’s copyrighted content for training shouldn’t be allowed. But using the technology for gaming and interactive purposes, like changing a game’s backgrounds to match a user’s preferences, opens up a world of possibilities.
While Oasis hasn’t perfectly matched the visual fidelity of Minecraft, I still prefer it to the upcoming Minecraft Movie’s hyper-realistic style.
If you know, you know.