What might be interesting here is that they are thinking about a taxonomy of tool use-cases, and exploring training on them and therefore optimizing how the model uses them.
This to me is a proof of concept — an interesting one, but just a proof of concept. You can see from their example search that the model over-relied on search; it didn’t need to re-search three times to get the answer.
A next step that I think would be useful is updating the reward function to penalize search calls, pushing the model to search only when it needs to and not before. This seems like a likely framework going forward, where MCP tool costs matter, and it would be really useful to have in the next generation of tool-calling LLMs.
In the case of search, we'd hopefully get a really useful signal and outcome: when the model is unsure, it would call a friend and get good info, and when it's sure, we'd have taught it not to waste reward on an unneeded call.
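A minimal sketch of what such a reward might look like. The penalty weight, the linear shape, and the function name are all my own assumptions for illustration, not anything from the paper:

```python
def reward(answer_correct: bool, num_search_calls: int,
           search_penalty: float = 0.05) -> float:
    """Outcome reward minus a small per-call search cost.

    Hypothetical example: the penalty weight and linear shape
    are assumptions, chosen only to illustrate the idea that
    each search call should cost a little reward.
    """
    base = 1.0 if answer_correct else 0.0
    return base - search_penalty * num_search_calls
```

Under this shaping, a correct answer found with one search scores higher than the same answer found after three redundant searches, so the model is nudged toward searching only when it's unsure.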
So far I only have the “cold start” data posted, but I’m planning on posting a full distillation dataset.
When OpenAI was launched, this is what I thought it was going to be like. Something, something for the betterment of mankind.