Martian has worked with companies of all sizes, from startups to the Fortune 500. One of those Fortune 500 companies is in the telecommunications industry, with internal and external applications including videoconferencing, chat, and scientific reasoning. Viewing AI as critical to the company's future, they wanted a system that could outperform the competition and deliver as much value as possible. Martian built routing systems for their verticals, letting them boost performance by using multiple models in concert instead of a single model.
To validate that the router could provide value in their use case, we first conducted a proof of concept (POC) for a specific task: routing for foreign language understanding. After successfully completing the POC, we developed routers for the company's other use cases. In both phases, we were able to raise performance above GPT-4, the model the company was previously using.
In the POC, we worked with the company to develop a series of test cases where GPT-4 struggled in their application (namely, foreign language understanding). Martian collected representative datasets for the test cases, then plotted the performance of individual models against a router.
In the following figure, points farther to the right indicate better performance, and points lower on the vertical axis are less expensive.

In the POC, the Martian Router was able to achieve higher performance than GPT-4 in foreign language understanding across languages and tasks within those languages.
After the POC, we built routing systems for videoconferencing (referred to below as "Google Meet"), chat (referred to below as "Messenger"), and scientific reasoning (referred to below as "Wolfram Alpha"). In each of these use cases, routing outperformed the single model the company was previously using, as evaluated via human preference annotations from users. In the use case most important to the company, videoconferencing, routing produced a system that users preferred over GPT-4 90% of the time.
In the following figure, a larger share of the bar is better. Red is the router; gray is GPT-4.
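
For readers unfamiliar with preference-based evaluation, here is a minimal sketch of how a figure like "preferred 90% of the time" can be computed from pairwise annotations. It is an illustration only, not Martian's evaluation code: the function name `win_rate`, the label strings, and the sample data are all hypothetical, and the sketch assumes each annotation records which of the two systems a user preferred.

```python
from collections import Counter


def win_rate(annotations, system="router"):
    """Return the fraction of pairwise comparisons in which `system` was preferred.

    `annotations` is a list of labels, one per comparison, naming the
    preferred system. The label names here are hypothetical.
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    counts = Counter(annotations)
    return counts[system] / len(annotations)


# Made-up sample: the router is preferred in 9 of 10 comparisons.
sample = ["router"] * 9 + ["gpt-4"]
print(win_rate(sample))  # 0.9
```

This sketch assumes every comparison has a clear winner. A real annotation scheme may also allow ties, which would have to be handled explicitly.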
