Joe Chen and Jonathan Shen (Upwork) Fireside Chat
We were thrilled to feature Joe and Jonathan from Upwork at March's Chat8VC in San Francisco. We covered their journey from teams like Google Brain and Cruise, and their own startup, to leading AI efforts at Upwork—building Uma, a suite of specialized LLMs powering workflows for freelancers and clients across the platform.
They shared lessons on building robust, production-grade models in-house, and why generalist LLMs fall short for long-form, human-in-the-loop interactions. We got a deep dive into their custom data pipelines, safety-aware training processes, and the tradeoffs between off-the-shelf APIs and tightly integrated models tuned to platform behavior.
As a reminder, Chat8VC is a monthly series hosted at our SF office where we bring together founders and early-phase operators excited about the foundation model ecosystem. We focus on highly technical, implementation-oriented conversations, giving builders the chance to showcase what they’re working on through lightning demos and fireside chats.
If this conversation resonates and you'd like to get involved—or join us at a future event—reach out to Vivek Gopalan (vivek@8vc.com) and Bela Becerra (bela@8vc.com).
It’s my pleasure to introduce our friends Joe & Jonathan at Upwork! Over the past couple of years, they've taught a good chunk of the 8VC team a great deal about LLM internals and helped us become better predictors of forward progress in AI. They started a company, were scouted by Upwork to build their tech there, and are now running AI R&D. Want to say a few words about yourselves before we dive in?
Joe (Upwork): Sure, absolutely. As Vivek mentioned, we were doing a startup and then joined Upwork to lead a lot of the LLM training and AI functionality here. My background is actually in AI robustness and reliability. I led reliability research at Cruise and also at Waymo in the self-driving car space. So I come at this from the "old man" mentality, because I’ve seen how things in self-driving don’t always work 100% of the time. That’s why it’s so important these systems are robust and reliable. That’s the mentality we’re bringing to Upwork too—building AI systems that actually work in practice.
Jonathan (Upwork): I’m definitely much more on the software side. I come from a background at Google Brain, doing research into deep learning models. Now that I’m at Upwork, I’m making sure we develop our own in-house models and continue doing foundational research—because that’s critical to bringing these models to market.
Cool! Let’s maybe start by talking about Uma. First, for those who don’t know—can you briefly explain what Upwork does and then talk about the first use cases you had in mind for Uma, both on the client and freelancer side?
Joe: Upwork is a large two-sided work marketplace for freelancers and businesses. It’s a public company that’s been around for a while. People come to the site to connect with skilled freelancers for their business needs—website development, mobile software, those kinds of things. A lot of what we’re doing with Uma, Upwork’s Mindful AI, is building models to help that process.
The first use cases were custom models to help freelancers write proposals—since on Upwork, a freelancer has to write a proposal for every job. We also built Uma to help clients post jobs and evaluate candidates. It’s very much human-in-the-loop. That excites us—it’s not about full end-to-end automation just yet. It’s about making the business flywheel spin faster.
Let’s go a bit deeper into the model side—why focus on specialized models? And what did you have to do around synthetic data, post-training, and data curation?
Joe: I should caveat that a lot of what we’re doing is still in user testing – we had to build out our entire GPU cluster first, and we got to a custom fine-tuned model five months after we joined. The first things we productionized were proposal writing and candidate evaluation. Even these models alone are making significant contributions to our business.
We love data curation. One of the biggest advantages of being at Upwork is that we can go out and hire talented writers and domain experts from our platform to write interesting new data for us. For some of our initial models, we had to define: What is Uma’s personality? What is the style of interaction we want? So we hired people—many of them screenwriters and others already on the Upwork platform—to write conversations. That helped us differentiate the style of interaction early on.
We also have a robust synthetic data program, which is another big element. You can only go so far with real data—it’s somewhat scalable, but not massively so. So we take raw data from help articles and internal documentation and use it to build better datasets. You can’t just toss raw tokens into a model and expect it to work.
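To make that concrete, here is a minimal sketch of the kind of distillation step Joe is describing: a teacher model turns raw help-article text into question–answer pairs suitable for fine-tuning. The prompt, file names, function names, and choice of the OpenAI client are our own illustrative assumptions, not details of Upwork's actual pipeline.

```python
# Hypothetical sketch: distill raw help articles into Q&A training pairs
# with a teacher model. Prompt, model choice, and file names are
# illustrative assumptions, not Upwork's pipeline.
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()

PROMPT = (
    "From the help article below, write three realistic user questions and "
    "answers grounded only in the article. Respond as a JSON object: "
    '{"pairs": [{"question": "...", "answer": "..."}]}\n\nArticle:\n'
)

def article_to_pairs(article_text: str) -> list[dict]:
    """Ask the teacher model to turn one article into Q&A pairs."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": PROMPT + article_text}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)["pairs"]

# Write fine-tuning rows, one JSON object per line.
with open("synthetic_qa.jsonl", "w") as out:
    for path in Path("help_articles").glob("*.txt"):
        for pair in article_to_pairs(path.read_text()):
            out.write(json.dumps(pair) + "\n")
```

In practice a pipeline like this would add deduplication and quality filtering before any of the generated pairs reach training, which echoes Joe's point that raw tokens alone aren't enough.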
Jonathan: On the technical side, we’re really leveraging open-source models like Llama and DeepSeek. Ultimately, for specialized experiences, we need to control the kind of data that has gone into the model weights.
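As a rough illustration of what fine-tuning an open-weights model involves, here is a minimal LoRA supervised fine-tuning sketch using recent versions of Hugging Face's trl and peft libraries. The base model, data file, and hyperparameters are assumptions for the example, not details Jonathan shared.

```python
# Minimal LoRA SFT sketch with Hugging Face trl/peft. Base model, data file,
# and hyperparameters are illustrative assumptions, not Upwork's setup.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Expects a JSONL file with a "text" column of formatted conversations.
dataset = load_dataset("json", data_files="uma_sft_data.jsonl", split="train")

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B",  # any open-weights base model
    train_dataset=dataset,
    args=SFTConfig(output_dir="uma-sft", max_seq_length=4096),
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```

LoRA keeps the base weights frozen and trains small low-rank adapters, which is one common way teams iterate on specialized models without the cost of full-parameter training.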
What’s been the most interesting use case where specialized models outperformed generalist ones—or where a generalist model would have required a ton of scaffolding?
Joe: Yeah, so when we joined Upwork, our initial hypothesis was that off-the-shelf models are great for short Q&A, but they fall short for the long tail of business use cases. Like, if you want to build an agent that can handle complex, multi-turn conversations—customer service, project scoping, that sort of thing—you don’t want something that’s just prompt-tuned to death with a giant flowchart.
So at our prior startup, we were working on algorithms to support long-form dialogue: how to stabilize those conversations, remove the need for prompt-tuning, and avoid overfitting to brittle conversation trees. That’s what led to us joining Upwork—Upwork saw what we had built and said, “Yes, this is what we need.”
And it’s worked out. In user studies, we saw that our fine-tuned models doubled quality scores—style, accuracy—relative to off-the-shelf models like GPT-4 or Claude. That’s not shocking. A model trained to act a certain way will always outperform one that’s just lightly prompted.
More surprising was how much better the custom models were even on non-conversational tasks—like pure factual Q&A. We have a slide that compares our Q&A model to GPT-4o and Claude. Even when RAG results were bad, our model did better than GPT-4o did with good RAG. That extra layer of robustness matters—it’s like teaching your model to forget how to be wrong.
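One way to probe the robustness Joe describes is to score a Q&A model while deliberately degrading retrieval. Below is a hedged sketch of such a harness; the `model_answer` callable, the data schema, and the substring-match metric are our own simplifications, not Upwork's eval setup.

```python
# Hypothetical robustness probe: score a Q&A model when a fraction of
# retrieved passages are replaced with distractors. The data schema and
# substring-match metric are simplifying assumptions for illustration.
import random

def eval_under_degraded_rag(model_answer, qa_items, corpus, noise_rate=0.5):
    """model_answer(question, passages) -> str is any system under test.

    Each qa_item needs "question", "answer", and "gold_passages" keys;
    corpus is a pool of unrelated passages used as distractors.
    """
    correct = 0
    for item in qa_items:
        passages = list(item["gold_passages"])
        if random.random() < noise_rate:
            # Simulate a bad retrieval call: all passages are distractors.
            passages = random.sample(corpus, k=len(passages))
        prediction = model_answer(item["question"], passages)
        correct += item["answer"].lower() in prediction.lower()
    return correct / len(qa_items)

# Running the same harness at noise_rate=0.0 and 1.0 shows how gracefully
# a model degrades when retrieval fails, which is the comparison above.
```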
In our demo of Uma, we covered everything from training data collection (how to simulate user behavior with our freelancer network and collect accurate human data) and Q&A evals to UX choices (how best to embed this within the existing Upwork product and imbue notions of memory) and handling contextual task switching. To learn more about the work Joe, Jonathan, and their team are doing, read this blog HERE.