Building an AI tour guide that helps users navigate Ramp's platform for financial operations

Introduction

Tour de Ramp

The best tour guides do more than just point you in the right direction – they anticipate your needs, explain complex landmarks, and make each step of the journey easy to follow. Ramp’s AI-powered assistant – aptly dubbed “Tour Guide” – is a seasoned sherpa that helps users navigate Ramp’s platform for financial operations.

This agent-based solution guides users through tasks ranging from expense approval to dynamically adjusting credit limits within the Ramp web application. Armed with knowledge about Ramp’s platform, Tour Guide increases user productivity by showing users how to accomplish their most important tasks.

Problem

Improving delight and platform accessibility

Ramp’s product automates a lot for users, from bill payments to credit card access to expense management and more. Like any software with layers of functionality, it asks users to become experts in how to use and administer the tool. There’s an onboarding curve, and Ramp wanted to reduce the time it took for someone to self-serve their needs.

Ramp wanted to provide faster, more immediate assistance in the Ramp product that didn’t involve calling customer support for help, while also maximizing user delight. Instead of aiming for full automation, which could be higher risk and uncomfortable for their users, Ramp designed an agent that allowed users to see and pause actions as the agent walked through a task.

UX

Navigating and educating users with human-agent collaboration

Ramp’s Tour Guide UX educates users about the platform’s functionality while also building user trust as they see the AI agent taking actions step-by-step. Tour Guide takes control of the user’s cursor to perform the actions a human would take in Ramp (e.g. clicking a button, navigating a dropdown, or filling out a form).

As the AI navigates through the interface, it provides step-by-step explanations of its actions. A small banner pops up next to each relevant element, offering context and rationale for each click or input.

What’s unique about the Tour Guide agent is its strong emphasis on human-agent collaboration. Users can see all the agent actions and interrupt or take control of the agent at any point, rather than just running it in the background. Ramp designers also implemented a springing cursor that keeps users engaged and feeling like active participants as the Tour Guide agent performs actions on their behalf.

When designing the user experience for Tour Guide, the Ramp team was careful to meet user needs without overstepping.

“We avoid putting users in flows where they don’t actually need the Tour Guide.” - Rahul Sengottuvelu, Head of Applied AI at Ramp

In this vein, users don't need to manually activate Tour Guide’s capabilities. Instead, the Ramp team developed a classifier that identifies relevant queries and automatically routes them to the Tour Guide feature when appropriate.
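The routing idea can be sketched as follows. This is an illustrative stand-in, assuming a simple keyword heuristic in place of Ramp's actual learned classifier; all names and phrases are hypothetical.

```python
# Hypothetical query router: how-to/navigation queries trigger Tour Guide,
# everything else falls through to the default assistant.
# A keyword heuristic stands in for Ramp's real classifier.

HOW_TO_PHRASES = (
    "how do i",
    "where can i",
    "show me how",
    "walk me through",
)

def route_query(query: str) -> str:
    """Return 'tour_guide' for queries asking how to do something in the app."""
    normalized = query.lower().strip()
    if any(phrase in normalized for phrase in HOW_TO_PHRASES):
        return "tour_guide"
    return "default"
```

In production this decision would come from a trained classifier rather than string matching, but the contract is the same: a cheap pre-step that keeps users out of Tour Guide flows they don't need.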

Cognitive architecture

Iterative action-taking

One of the Ramp engineering team’s key insights was that every user interaction with the Ramp web app could be categorized as one of three steps: a scroll, a button click, or a text fill. So, to automate a task for the user, the Tour Guide agent would need to generate these interaction steps in the right sequence.

The Ramp team designed the agent to take the current state of the web app session as input and suggest the next best action. Each action Tour Guide took updated the state of the app, so the agent generated exactly one action – a scroll, click, or text fill – at a time. The resulting altered session was then fed back in to generate the next action on the tour. This iterative action-taking approach was more effective than designing the entire tour from start to finish, since fulfilling a user’s request typically required many scrolls, clicks, and text fills.
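The iterative loop described above can be sketched as a minimal control structure. The `Action` type, callback names, and step cap are assumptions for illustration, not Ramp's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Action:
    kind: str        # "scroll", "click", or "fill"
    target: str      # label of the element to act on
    value: str = ""  # text to type, for "fill" actions

def run_tour(state: dict,
             suggest_next_action: Callable[[dict], Optional[Action]],
             apply_action: Callable[[dict, Action], None],
             max_steps: int = 20) -> list[Action]:
    """Generate exactly one action per iteration; the updated session state
    is fed back in to produce the next action, until the model signals done."""
    actions: list[Action] = []
    for _ in range(max_steps):
        action = suggest_next_action(state)  # e.g. one LLM call on current state
        if action is None:                   # model signals the task is complete
            break
        apply_action(state, action)          # mutates the session state
        actions.append(action)
    return actions
```

The key design choice is that `suggest_next_action` never sees a full plan – only the freshest state – which keeps each decision grounded in what is actually on screen.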

To generate the next best action, the team initially built a multi-step agent that made two separate LLM calls. The first was a planning step: given an array of objects the agent could interact with, make a plan for interacting with them. The second was a grounding step that executed the object interaction.

However, while using two discrete LLM calls was great for accuracy, it made the user experience too slow. Ramp instead switched to a consolidated, single-call prompt that combined planning and action generation in one step.
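A consolidated prompt of this kind might look like the sketch below. The template wording and the `ACTION` output format are assumptions for illustration; Ramp's actual prompt is not public.

```python
# Hypothetical single-call prompt: the model reasons (planning) and then
# commits to exactly one action (grounding) in the same completion.
PROMPT_TEMPLATE = """You are guiding a user through the Ramp web app.

Goal: {goal}

Interactive elements currently on screen:
{elements}

First, reason briefly about the next best step toward the goal.
Then output exactly one action on the last line, formatted as:
ACTION <scroll|click|fill> <element> [text]
"""

def build_prompt(goal: str, elements: list[str]) -> str:
    """Render the goal and the candidate elements into one prompt string."""
    listing = "\n".join(f"- {el}" for el in elements)
    return PROMPT_TEMPLATE.format(goal=goal, elements=listing)
```

Collapsing the two calls into one trades a little per-step accuracy for roughly half the round-trip latency, which mattered here because the agent runs in a live UI.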

Prompt engineering

Optimizing model inputs for high-accuracy outputs

When designing model inputs, the Ramp team worked with their own component library and had a combination of image and text data. They developed an annotation script that would tag interactive HTML elements with visible labels, similar to functionality provided by the Vimium browser extension. They also incorporated accessibility tags from the DOM, which provided clear, language-based descriptions of interface components to pass into the model.

To make sure the model could generate actionable steps instead of just descriptions of the UI, the team focused on refining inputs through data pre-processing. They simplified the DOM to prune out irrelevant objects, which created cleaner, more efficient inputs that could better guide the model’s actions.

According to Alex Shevchenko, the engineer behind Ramp Tour Guide:

“The most effective way to improve an agent’s accuracy is by constraining the decision space. LLMs still struggle to pick the best option among many similar ones.”

In addition to streamlining their inputs, the Ramp team also experimented with prompt optimization to improve output accuracy. Instead of letting the model pick from a lengthy list of interactable elements, they found that labeling a fixed set of options in the prompt with letters (A to Z) made it clear to the model which options were available. This led to a significant improvement in output accuracy.

In this process, Ramp’s biggest hurdle was keeping the prompt as concise as possible, since longer prompts increased latency. While they tried context stuffing – piecing together extra context alongside the user’s screenshot – they found it more effective to focus on well-enriched interactions without overloading the prompt.


Evaluation

Guardrails to keep the agent rolling smoothly

Ramp relied heavily on manual testing to get a sense of which actions performed well and which didn't. Once they identified the agent’s patterns of failure or success, they added guardrails. The team hardcoded restrictions to prevent the agent from interacting with tricky pages – including those containing complex workflows like large canvas interfaces or tables with numerous elements.
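A guardrail of this kind can be as simple as a hardcoded blocklist checked before the agent acts. The patterns below are made up for illustration; the source only says that canvas-heavy pages and large tables were restricted.

```python
# Hypothetical guardrail: refuse to run the agent on page types that manual
# testing identified as failure-prone (patterns here are invented examples).
BLOCKED_PAGE_PATTERNS = ("canvas", "bulk-table")

def agent_allowed(page_path: str) -> bool:
    """Return False for pages the agent is restricted from touching."""
    return not any(pattern in page_path for pattern in BLOCKED_PAGE_PATTERNS)
```

When `agent_allowed` returns False, the product can fall back to plain text guidance instead of cursor control, so the user still gets help without the agent risking a visible failure.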

This approach allowed Ramp to boost reliability by limiting risk in high-failure areas and focusing the agent on tasks it could handle smoothly.

Conclusion

Adding rigor paid off

What truly sets Ramp apart is its exceptional user experience design. With seamless integration, a visually engaging interface, and step-by-step guidance, Ramp doesn’t just solve problems – it also empowers users to master the platform over time.

Looking ahead, Ramp plans to expand this into a broader "Ramp Copilot" - a single entry point for all user queries and actions within the platform. This underscores their commitment to simplifying complex financial workflows with AI, while keeping the user at the forefront of their journey. 
