OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computers
abrichr 15 days ago
Thank you for making this available!

Check out OpenAdapt, a cross-platform (Mac and Windows) open source library that learns to perform tasks in desktop apps by observing human demonstrations.

We believe a major shortcoming of conventional approaches to AI agents is expecting them to figure out tasks of arbitrary complexity from first principles. While understandable from an academic perspective, this is unnecessary for practical utility, since humans perform these tasks constantly.

With OpenAdapt you can demonstrate to a model how to perform a task, then have it take over the task, with additional user-supplied natural language instructions.

I have created an issue to evaluate OpenAdapt on OSWorld: Contributions welcome!

Edit: from

> The ./trajectories file contains the annotated trajectories for each data item in ./examples for finishing the task.

Unfortunately this file does not appear to be included in the repo. I have submitted an issue here:

ec109685 15 days ago
Buried in their presentation is the current effectiveness of agents to complete desktop computing tasks.

Humans are able to complete the given tasks at 70%+ effectiveness, while the best model (GPT-4V) is at 12%. Most of the other models were <5% effective.

TheRoque 15 days ago
Gotta love people working on replacing themselves. Jokes aside, seeing an AI interacting with a computer is kind of scary. It's not just outputting text anymore, it's doing the full work of a human working on a computer, meaning... a ton of people

stavros 15 days ago
I built a small Python script so I could let GPT-4 debug my system issues:

It works surprisingly well!

bitwize 15 days ago
Coming soon: Human-trained AI that can actuate a robotic hand to fill in paper forms with a Selectric typewriter. The doom of us all!