
Visual Reasoning Is Coming Soon

AIPedant 9 days ago
This seems to ignore the mixed record of video generation models:

  For visual reasoning practice, we can do supervised fine-tuning on sequences similar to the marble example above. For instance, to understand more about the physical world, we can show the model sequential pictures of Slinkys going down stairs, or basketball players shooting 3-pointers, or people hammering birdhouses together.... 

  But where will we get all this training data? For spatial and physical reasoning tasks, we can leverage computer graphics to generate synthetic data. This approach is particularly valuable because simulations provide a controlled environment where we can create scenarios with known outcomes, making it easy to verify the model's predictions. But we'll also need real-world examples. Fortunately, there's an abundance of video content online that we can tap into. While initial datasets might require human annotation, soon models themselves will be able to process videos and their transcripts to extract training examples automatically. 
Almost every video generator makes constant "folk physics" errors and doesn't understand object permanence. DeepMind's Veo 2 is very impressive but still struggles with object permanence and qualitatively nonsensical physics: https://x.com/Norod78/status/1894438169061269750

Humans do not learn these things by pure observation (newborns understand object permanence; I suspect this is the case for all vertebrates). I doubt transformers are capable of learning it as robustly, even if trained on all of YouTube. There will always be "out of distribution" physical nonsense involving mistakes humans (or lizards) would never make, even if they've never seen the specific objects.
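
To make the quoted synthetic-data proposal concrete, here is a minimal sketch of the idea: simulate a simple physical scenario with a known analytic outcome, then emit supervised examples a model's predictions can be checked against. Everything beyond the textbook kinematics (the JSONL format, the prompt/target fields, the coordinate-list "frames" standing in for rendered images) is an illustrative assumption, not anything from the article.

```python
# Minimal sketch: synthetic physical-reasoning data from a simulation
# with verifiable ground truth (projectile range has a closed form).
# The example format below is an assumption for illustration only.
import json
import math
import random

G = 9.81  # gravitational acceleration, m/s^2

def simulate_throw(v0, angle_deg, dt=0.1):
    """Sample (t, x, y) positions of a projectile until it lands."""
    vx = v0 * math.cos(math.radians(angle_deg))
    vy = v0 * math.sin(math.radians(angle_deg))
    t, frames = 0.0, []
    while True:
        x, y = vx * t, vy * t - 0.5 * G * t * t
        if y < 0 and t > 0:  # projectile has hit the ground
            break
        frames.append({"t": round(t, 2), "x": round(x, 2), "y": round(y, 2)})
        t += dt
    landing_x = vx * (2 * vy / G)  # analytic range: the known outcome
    return frames, round(landing_x, 2)

with open("physics_sft.jsonl", "w") as f:
    for _ in range(1000):
        frames, landing_x = simulate_throw(
            v0=random.uniform(5, 25), angle_deg=random.uniform(20, 70))
        f.write(json.dumps({
            "prompt": f"Observed frames of a thrown ball: {frames[:5]}... "
                      "Predict where it lands.",
            "target": f"x = {landing_x} m",
        }) + "\n")
```

A real pipeline would render these trajectories as image sequences rather than coordinate lists, but the article's point carries over: because the simulator knows the answer, every generated example is automatically verifiable.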


nkingsy 9 days ago
The example of the cat and the detective hat shows that even with the latest update, it isn't "editing" the image. The generated cat is younger, with bigger, brighter eyes and more "perfect" ears.

I found that when editing images of myself, the result looked weird, like a funky version of me. For the cat, it looks "more attractive" I guess, but for humans (and I'd imagine for a cat looking at the edited cat with a keen eye for cat faces), the features often don't work together when changed slightly.


Tiberium 9 days ago
It's sad that they used 4o's image generation feature for the cat example, which does diffusion or something similar that results in the whole image changing. They should've instead used Gemini 2.0 Flash's image generation feature (or at least mentioned it!), which, even if far lower in quality and resolution (max of 1024x1024, though Gemini will try to match the resolution of the original image, so you can get something like 681x1024), is much, much better at leaving the untouched parts of the image actually "untouched".

Here's the best of a few attempts with a really similar prompt, made more detailed since Flash is a much smaller model: "Give the cat a detective hat and a monocle over his right eye, properly integrate them into the photo." You can see that the rest of the image is practically untouched to the naked eye: https://ibb.co/zVgDbqV3

Honestly, Google has been really good at catching up in the LLM race, and their modern models like 2.0 Flash and 2.5 Pro are among the best (or the best) in their respective areas. I hope they'll scale up their image generation feature to base it on 2.5 Pro (or maybe 3 Pro by the time they do it) for higher quality and prompt adherence.

If you want, you can give 2.0 Flash image gen a try for free (with generous limits) at https://aistudio.google.com/prompts/new_chat; just select it in the model selector on the right.
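
For anyone who'd rather script it than click through AI Studio, here's a rough sketch of the same edit using Google's google-genai Python SDK. The model id ("gemini-2.0-flash-exp-image-generation") and the response-modalities config match my understanding of the experimental release, but both are assumptions that may have changed since; check the current docs before relying on them.

```python
# Rough sketch: image editing with Gemini 2.0 Flash via the google-genai
# SDK (pip install google-genai pillow). Model id and response_modalities
# are assumptions based on the experimental release and may be renamed.
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

cat = Image.open("cat.jpg")  # the photo to edit
prompt = ("Give the cat a detective hat and a monocle over his right eye, "
          "properly integrate them into the photo.")

response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",  # assumed model id
    contents=[prompt, cat],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

# The edited image comes back as inline bytes in one of the response parts.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("cat_detective.png")
    elif part.text:
        print(part.text)
```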


uaas 9 days ago
> Rather watch than read? Hey, I get it - sometimes you just want to kick back and watch! Check out this quick video where I walk through everything in this post

Hm, no, I’ve never had this thought.


rel_ic 9 days ago
The inconsistency of an optimistic blog post ending with a picture of a Terminator robot makes me think this author isn't taking themself seriously enough. Or is the author the Terminator robot?