▲Launch HN: Halluminate (YC S25) – Simulating the internet to train computer use
66 points by wujerry2000 22 hours ago | 42 comments
zebomon 21 hours ago [-]
This is very interesting. I think a lot of people may be quick to overlook the value of such simulators when thinking about AI agents at the extremes. (Either they're not good enough to trust or they're so good they'll leapfrog over any economic value here.)

My own experience makes me lean toward thinking that the truth is somewhere in the middle in this situation, and that simulators like these will be valuable. I've been experimenting a lot with computer use on my website Bingeclock, passing through different prompts along the lines of "make a movie marathon based on X." The newest agents are consistently impressive, while also being consistently imperfect in surprising and interesting ways.

Whether or not all the labs are already running this kind of thing internally for themselves, you would know better than I. But it's an idea that seems very useful nonetheless. Congratulations on the launch!

wujerry2000 20 hours ago [-]
Computer use agents are starting to perform well on websites/apps that are in their training distribution, but still struggle a lot when dealing with tasks outside their distribution. A big reason is that many niche/enterprise applications are really hard to test on in the real world, hence the need for sims!

re: labs doing this internally. They definitely are! However, the scale of sims buildout is going to be massive, probably many orders of magnitude above what we have today. We think it makes sense for one central player to do this because a really good simulator can be used by multiple people at once. It doesn’t make sense for every AI lab/company to build out their own environments if an industry standard catalog exists.

reactordev 2 hours ago [-]
Conway’s law strikes again…
zebomon 20 hours ago [-]
Intriguing analysis. I'll be following along with interest!
nasmorn 3 hours ago [-]
This is very interesting, but I would worry: if this proves to be an important part of the solution, why wouldn't Expedia release its own sandbox that returns validation once agent use becomes valuable for them?
mandeepj 14 hours ago [-]
Have you looked at agents from OpenAI and Perplexity? Sure, they aren't perfect, but at the same time, they aren't far from ready.

Is this simulation really required? There's another YC startup that processes PDFs, I believe. They didn't train their systems on any simulation.

Edited to reword and add more context.

wujerry2000 14 hours ago [-]
OpenAI agent is very impressive!

That being said, there are still a lot of use cases it's not good at. Looking at long trajectory tasks, enterprise work tasks, etc., I imagine those are all still very nascent.

I think we are still very early on computer use, being "production ready" requires probably close to 95%+ accuracy on most tasks and we're not there yet for most use cases.

davecyen 14 hours ago [-]
Very cool - is it possible to simulate this on a live production site (i.e. instead of Halluminate Flights, just test the agent live on Expedia)? Even though you don't have access to the backend json, presumably you could verify the right values were entered in the frontend/UI?
wm2 14 hours ago [-]
yup, though without access to the code it's much harder to pull the state of the components - it becomes more like a web scraping problem, which is brittle and much hackier than just intentionally exposing component state like we can do in the sim.

more important, though, are use cases that depend on the data. the data on real google flights/expedia is constantly changing, so it's impossible to build datasets based on ground truth, e.g. the answer for a task like "Find the cheapest round-trip flight option from Bologna (BLQ) to Dushanbe (DYU) if I leave on 2026-05-05 and come back on 2026-05-15. Return the total price and the flight numbers for all flights." isn't stable. on our site, we control the data, so that answer is stable (deterministically random). so controlling the whole clone rather than running on the prod site unlocks richer and more repeatable tasks/testing.

lastly, our site runs the exact same locally as deployed, it has zero internet dependencies. so it can be run offline directly on the cluster with no issue for network latency/failures
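A minimal sketch of what "deterministically random" data could look like, assuming a seeded generator keyed on route and date. The names `generate_flights`, `cheapest_round_trip`, the `HX` flight-number prefix, and the price range are all hypothetical and not taken from Halluminate's code; the point is only that the same query yields the same ground truth on every run, with no network access.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Flight:
    number: str
    origin: str
    dest: str
    date: str
    price: int  # whole dollars, hypothetical


def generate_flights(origin, dest, date, seed="sim-v1", count=5):
    """Return the same flights for the same (seed, route, date) on every run."""
    # A private Random seeded with a string is reproducible across processes.
    rng = random.Random(f"{seed}|{origin}|{dest}|{date}")
    return [
        Flight(f"HX{rng.randint(100, 999)}", origin, dest, date,
               rng.randint(200, 1500))
        for _ in range(count)
    ]


def cheapest_round_trip(origin, dest, depart, ret):
    """Ground-truth answer for a 'find the cheapest round trip' task."""
    out = min(generate_flights(origin, dest, depart), key=lambda f: f.price)
    back = min(generate_flights(dest, origin, ret), key=lambda f: f.price)
    return {
        "total_price": out.price + back.price,
        "flight_numbers": [out.number, back.number],
    }


answer = cheapest_round_trip("BLQ", "DYU", "2026-05-05", "2026-05-15")
```

Because the answer depends only on the seed and the query, it can be stored once in a dataset and compared against an agent's response later, which is not possible when prices change on a live site.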

BobbyJo 10 hours ago [-]
Had this exact idea recently, applied to various software tooling. I think agents of all types are going to follow a similar path to self-driving cars: first 80% comes in a big boom, and the last 20% comes over a decade of training and simulations.

I think each agent use case is going to need a simulation for its reward to eke out the last 20%.

Edit: Realized I forgot to say Great Work! Looks Cool!

wujerry2000 9 hours ago [-]
Self driving cars are a really good place to derive intuitions. Robotics as well!

Both those spaces are still optimizing on the last mile performance gains that get exponentially harder.

The good thing about computer use is that building software environments is faster and also more repeatable, so hopefully we see quicker improvements here. :)

sandGorgon 3 hours ago [-]
this is very cool ! i contribute to an opensource mobile browser (github.com/wootzapp/wootz-browser). would love to have it work in Westworld if it makes sense for you folks.
DearAll 16 hours ago [-]
Love what you’re doing. Are you currently open to interns? Would love to connect with you and chat more about using high quality data to help people better train and evaluate their ai agents!
wm2 14 hours ago [-]
hey not hiring right now but connect with me on twitter and we can talk more there: https://x.com/wgm752
orliesaurus 18 hours ago [-]
Good luck Jerry!!! Interesting pivot for sure, playgrounds for AI seems like a good idea, I wish someone tackled them in 3D too (not just for browser/computer agent I mean) :P
whymauri 19 hours ago [-]
Are these simulations shared between your customers, or are you building bespoke environments per client/user? How does the creation of environments scale?
wujerry2000 19 hours ago [-]
These are really good questions!

we share the public/consumer simulators, but we also build bespoke environments on a per customer basis (think enterprise sites or even full VMs loaded with applications and data).

environment creation scalability is a big priority for us. we currently automate most of the process, but it still takes a fair bit of manual work to finish them and to get the details right. there is some reusability across environments, for example, we can use the flight results generation code in any travel/flightbooking sim. we also have some semi-automated approaches for creating tasks and verifiers. but still lots of work to be done here.
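As a rough illustration of a task paired with a verifier, here is a sketch that assumes the simulator exposes its component state as a plain dictionary. The state keys (`search_form`, `booking_confirmed`) and the `Task` and `make_booking_task` names are hypothetical; the real task and verifier format is not described in this thread.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    prompt: str
    verify: Callable[[dict], bool]  # takes exposed sim state, returns pass/fail


def make_booking_task(origin: str, dest: str, depart_date: str) -> Task:
    """Build a task whose verifier reads state instead of scraping the page."""

    def verify(state: dict) -> bool:
        form = state.get("search_form", {})
        return (
            form.get("origin") == origin
            and form.get("dest") == dest
            and form.get("depart_date") == depart_date
            and state.get("booking_confirmed") is True
        )

    prompt = f"Book a one-way flight from {origin} to {dest} on {depart_date}."
    return Task(prompt=prompt, verify=verify)


task = make_booking_task("BLQ", "DYU", "2026-05-05")

# Hypothetical final state exposed by the sim after an agent run.
final_state = {
    "search_form": {"origin": "BLQ", "dest": "DYU", "depart_date": "2026-05-05"},
    "booking_confirmed": True,
}
passed = task.verify(final_state)
```

A factory like this is one way the reuse mentioned above could work: the same verifier shape applies to any travel sim that exposes the same state keys.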

whymauri 18 hours ago [-]
Super interesting, thank you.
sealthedeal 21 hours ago [-]
Super cool. What would the real-world use cases be for SME adoption?
wujerry2000 20 hours ago [-]
A few common ones we've heard

Engineering: QA automation is huge, closes the loop on "fully automated" software engineering if another computer use system is able to click around and help identify bugs in software

Deep Research: probably the biggest use case for computer use right now, finding information that isn't easily indexed or accessible via APIs.

General RPA: This is industry specific, but lots of just everyday knowledge work involves data transfer between many platforms that sucks and no one wants to do. A great example is Epic in Healthcare. SO much labor is employed just to write and read information from this desktop app that isn't easily accessible. Imagine a computer use system that can do automated data pulls at scale for legacy desktop apps. This is a huge huge use case, and something that we're excited to try and improve with simulators of things like Epic, SAP, Salesforce, etc.

Consumer: Lots of just general everyday tasks. I would recommend checking out https://yutori.com/ if you're interested in seeing how a computer use agent can be helpful in your day to day. It's fun for daily news reports, restaurant reservation checking, etc.

CodingJeebus 21 hours ago [-]
Curious to see how this works out. The flight booking example is interesting because it’s one of the last purchase powers I’d want to hand over to an AI.

If it gets a major travel detail wrong, purchases a business class ticket by accident, etc. and I need to adjust the booking by calling the airline, then I’m way less happy than I would have been if I had just bought the ticket myself. Not to mention what happens when Google Flights gets a UI refresh and knocks the accuracy rate of the agent down even 10%.

Digital criminals are gonna love it, though.

I’m personally much more interested in automating browser tasks that aren’t economically valuable because that mitigates the risk.

wujerry2000 20 hours ago [-]
UI refreshes knocking down simulator realism is a real issue that we're still trying to solve.

I think this will probably be a mixture of automated QA/engineering and scale.

Another interesting path is actually partnering directly with software providers to offer their platforms as simulators IF they see there is a competitive advantage to training agents to perform well on their UI.

This idea we're really excited about, but it would require a company to see real revenue potential in enabling agentic access vs not. I'd say we're still on the "block them out" phase of the internet (ex. see Cloudflare's recent post about bot detection: https://blog.cloudflare.com/perplexity-is-using-stealth-unde...)

mousetree 21 hours ago [-]
Why are flight bookings the go to example always? For most people, booking a flight happens infrequently, is a non-trivial expense (to your point), and is not that burdensome to do yourself.
wujerry2000 20 hours ago [-]
We agree that as a demo flight booking is probably overused.

However, in talking with many AI labs, their perspective on flight booking is a little different. "Solving" flight booking requires the AI agent to solve a LOT of hard problems. Namely, personalization, context, weighing multiple options, interacting with the UI, math, then wrapping that all up into a coherent response. The thought process is IF a computer use agent is able to solve flight booking well, then we will have developed many other powerful primitives that will scale to other problems.

So as a standalone use case, I'm inclined to agree this might not be where the most agent traction is seen. However, as a research/capability goal, there are some generalizations that could apply to other very important use cases.

jedberg 13 hours ago [-]
> and is not that burdensome to do yourself.

I don't know about you, but it takes me hours to book a flight if it's for my family, because I'm usually booking a flight, a car, and a hotel, and I have to constantly min-max the costs between hotels on certain days, flights on certain days, and cars on certain days.

If it's not burdensome for you, then you're either taking very simple trips or you're so rich that you don't care.

mandeepj 11 hours ago [-]
> I have to constantly min-max the costs between hotels on certain days, flights on certain days, and cars on certain days.

I agree it's a burdensome chore!

Just wondering - your hotel stay can't be less than the days between your flights. For the car, one can manage to cut down with Uber/public transport, but that still turns out to be more expensive than a rental car.

jedberg 10 hours ago [-]
> your hotel stay can't be less than the days between your flight.

This is exactly right, and why it's such a pain. Because if I have a bit of flexibility, I have to figure out which flying day is best for prices and seats, and then see if the hotel is more or less between those days.

For example, if I fly on Tuesday I can save $400 vs flying Sunday. But if I want to stay a week, the hotel may not have the following Sunday. So now I have to look for an alternate hotel, which may not include parking like the first one, and so on and so on. There are so many variables that can all change based on the day of arrival and departure.
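The trade-off described here can be sketched as a small exhaustive search over departure day and hotel. Every number below is a made-up illustration (the only figure taken from the comment is the $400 gap between Sunday and Tuesday flights), and the hotel names, availability sets, and parking fees are hypothetical.

```python
from itertools import product

NIGHTS = 7  # "stay a week"

# Hypothetical round-trip flight price by departure day.
flight_price = {"Sun": 900, "Mon": 700, "Tue": 500}

# Hypothetical hotels: nightly rate, nightly parking fee, and the arrival
# days for which a full week is available.
hotels = {
    "Hotel A": {"nightly": 200, "parking": 0, "available_from": {"Sun", "Mon"}},
    "Hotel B": {"nightly": 180, "parking": 40, "available_from": {"Sun", "Mon", "Tue"}},
}


def trip_options():
    """Return every feasible (total, day, hotel) combination, cheapest first."""
    options = []
    for day, (name, hotel) in product(flight_price, hotels.items()):
        if day not in hotel["available_from"]:
            continue  # this hotel has no full week starting on this day
        total = flight_price[day] + NIGHTS * (hotel["nightly"] + hotel["parking"])
        options.append((total, day, name))
    return sorted(options)


best = trip_options()[0]
print(best)  # (2040, 'Tue', 'Hotel B')
```

With these numbers the cheaper Tuesday flight forces the hotel with paid parking, and the overall saving shrinks from $400 on the flight to $260 on the whole trip. Adding rental cars or more dates multiplies the combinations in the same way.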

We used to have travel agents for this (and still do!). But I've used travel agents, and I've used (other people's) personal assistants, but no one ever gets it right. I only trust myself, my wife, and my sister in law to get this right.

Having an AI agent that gets this right would be incredible.

> For the car, one can manage to cut down with Uber/public transport, but that still turns out to be more expensive than a rental car.

If I'm getting a car it's usually because it's a place where Lyft and public transport won't work. Otherwise I always default to public transport and then Lyft if necessary.

fragmede 20 hours ago [-]
It's because most people have done it, and it's infrequent and sufficiently expensive, which makes it enough of a pain point to be a good example. Because it's infrequent, most people don't have a rigorous well-practiced system for how to go about it to get the optimal ticket for their particular circumstances for that flight, and because it can be somewhat expensive, there's a bit of a burden taken on in order to optimize for price as well, especially given all the shenanigans airlines play with pricing.

If you're rich, you can just look for the ticket at the time you like on your preferred airline and buy a first class ticket, whatever the price, for whenever you want to fly, even if it's tomorrow. For the rest, that's not practical. So the flight search has to begin a few months out, with the burden of doing multiple searches (in incognito mode) across various airlines and/or aggregators, in order to optimize various factors. This takes a non-trivial amount of time. Add in looking for hotels and rental cars, and for some it's fun, for others it's an annoying burdensome chore that stands in the way of being on vacation.

It's just an example use case though. Similar to how "robot maid" that folds clothes isn't the be-all or end-all for robotics, if an AI is able to perform that task, it's going to have capabilities necessary for performing a wide variety of other tasks.

mandeepj 11 hours ago [-]
> (in incognito mode)

I used to do that, but when I cross-compared with normal mode, the prices were the same.

superb_dev 15 hours ago [-]
Airlines will love it too. How long until an AI company gets paid to prefer a certain company?
wujerry2000 14 hours ago [-]
I think this is totally going to be the case!

AI vibe coding tools already prefer some solutions over others, probably because of training data distribution/post training preferences. This is leading to massive revenue differences and growth compared to companies that have not optimized to be AI agent preferred/in their training data distribution.

I imagine something similar will happen over time, where companies who are in the training data distribution get used by agents more, while others who neglect this get slowly phased out because systems don't know how to use them (out of distribution).

mrbluecoat 21 hours ago [-]
Interesting name for an AI company - one letter away from hallucinate..
wujerry2000 21 hours ago [-]
Yea haha ... early idea was illuminate + hallucinations. Naming isn't our strength :)
aresant 17 hours ago [-]
It's a great / memorable / and tongue-in-cheek name that anybody seriously in the space will instantly get and appreciate.
mousetree 21 hours ago [-]
I thought it was halloumi + illuminate
suninsight 20 hours ago [-]
how about halarax ... hallucinate and parallax
inLewOf 20 hours ago [-]
Not too late to change the logo to a friendly piece of halloumi :) (best i could find from a quick google image search https://www.redbubble.com/i/poster/Halloumi-by-PaulSDesign/1...)
wm2 16 hours ago [-]
might have to make some merch from this!
seemaze 20 hours ago [-]
Was disappointed to discover no cheese was involved in this venture
rickcarlino 21 hours ago [-]
I misread it as humiliate. Side note that this is not intended as a joke. This name might not be good long term.
thebiglebrewski 21 hours ago [-]
Man, I was kind of hoping this was a YCombinator-backed cheese factory. But good luck on the launch!
mikepurvis 21 hours ago [-]
“good luck with lunch”