Charles Srisuwananukorn (Together AI) Fireside Chat

May 9, 2025

We were thrilled to feature Charles Srisuwananukorn from Together AI at January’s Chat8VC. Charles is the Founding Vice President of Engineering at Together AI, where he leads the company’s work on AI infrastructure and clusters. Previously, he was Head of Applied Machine Learning at Snorkel AI and held engineering roles at Apple. He studied Computer Science at Stanford and has helped steer Together from an early contributor to open-source AI to a full-stack infra platform.

You might know their team from their algorithmic and data work on projects like FlashAttention and RedPajama. In this convo, Charles shared his experience scaling both hardware and systems engineering at Together, as well as their team’s philosophy around efficient AI: bridging breakthroughs in model architectures with low-level optimizations in networking, kernel design, and cluster reliability. We cover the challenges of running physical infrastructure at scale, lessons learned from handling esoteric GPU failures, and Together’s ambitions to support both high-scale model training and the next wave of small models for fast inference. 

As a reminder, Chat8VC is a monthly series hosted at our SF office where we bring together founders and early-phase operators excited about the foundation model ecosystem. We focus on highly technical, implementation-oriented conversations, giving builders the chance to showcase what they’re working on through lightning demos and fireside chats.

If this conversation resonates and you'd like to get involved—or join us at a future event—reach out to Vivek Gopalan (vivek@8vc.com) and Bela Becerra (bela@8vc.com).

Vivek: Excited to have Charles here with us. Before we dive into tooling and infrastructure, let’s take a step back. What’s it been like building Together during this intense period of demand in AI?

Charles (Together): Yeah, it's been a wild ride. We run our own infrastructure - actual physical clusters - which is a massive challenge. You’re talking about deploying GPUs and building site processes, in addition to all the software we ship - it’s not purely virtual. It’s very real, and that complexity is a big part of our day-to-day. But it’s also what makes it fun. And we’re growing a lot. 

Vivek: Many AI-native companies are facing similar growth stories and challenges – though not with hardware, and not at the same level of the stack that you guys are dealing with.

When you look at the open-source AI ecosystem – what’s missing? What would you tell future founders in this room to build to support some of the challenges that you’re running into?

Charles: The obvious gap is good data. Clean, diverse, high-quality datasets are still hard to come by. That’s why we launched RedPajama early on. We saw the need - there was a lack of really good data to train models and explore LLMs - and tried to help fill it. That’s a big thing people need to contribute back to open source AI. 

The other big one is tooling for reinforcement learning. As reasoning models get better, the ability to steer and refine them through RL becomes more powerful. But the tools still lag behind.

Vivek: Yeah, and we've seen a bunch of people who are working on products to help with the post-training lifecycle!

Part of why we're here today is to talk about Together GPU clusters. Maybe tell us a little bit more about that. What kinds of companies are using them? And why expand in this direction rather than just offering pure inference as a service?

Charles: We have GPUs and people can rent them and actually use them to train models, run inference, or do whatever they need. These are H100s, H200s, and we’ll have GB200s and B200s soon. We set you up with Slurm or Kubernetes or whatever you want – and make sure everything works well so that on day one, you can be productive.

We use these clusters ourselves for our inference service, for continuous training, and for our internal research. They’re designed by AI engineers and researchers, for AI engineers and researchers.

The reason we’re doing this ties back to our broader mission. We want to offer a holistic platform for LLMs, and compute infrastructure is a critical piece. If you're doing anything serious with LLMs, reliable compute is table stakes. We think this is a core part of the product.
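For a concrete sense of what “productive on day one” means on a Slurm-managed cluster, here’s a minimal sketch of bringing up a distributed PyTorch job from the environment variables Slurm provides. It assumes srun launches one task per GPU and that MASTER_ADDR and MASTER_PORT are exported in the job script; it’s a generic pattern, not Together’s actual provisioning or tooling.

```python
# Minimal sketch: initialize distributed PyTorch from Slurm's environment.
# Assumes one srun task per GPU and MASTER_ADDR/MASTER_PORT exported by the
# job script. Generic pattern only - not Together's tooling.
import os
import torch
import torch.distributed as dist

def init_from_slurm() -> int:
    rank = int(os.environ["SLURM_PROCID"])        # global rank assigned by Slurm
    world_size = int(os.environ["SLURM_NTASKS"])  # total number of tasks (GPUs)
    local_rank = int(os.environ["SLURM_LOCALID"]) # rank within this node

    torch.cuda.set_device(local_rank)             # pin this process to one GPU
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    return local_rank

if __name__ == "__main__":
    local_rank = init_from_slurm()
    # Sanity check: every rank contributes 1.0, so the sum equals the world size.
    x = torch.ones(1, device=f"cuda:{local_rank}")
    dist.all_reduce(x)
    if dist.get_rank() == 0:
        print(f"all_reduce across {dist.get_world_size()} GPUs: {x.item()}")
    dist.destroy_process_group()
```

Launched with something like `srun --ntasks-per-node=8 python train.py`, the same script runs unchanged on one node or many.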

Vivek: As you scale these clusters to 10K, 50K, 100K GPUs, what are the biggest challenges you’ve run into, whether it’s kernel optimization, networking, or something else?

Charles: Yeah, I mean, I think the most common issues we see in these clusters are the same ones you’d see in the Llama technical report – GPUs falling off the bus all the time, ECC errors in the GPUs, etc. But some of the more surprising things are around kernels and hardware reliability.

We’ve had issues with overheating transceivers. At one point we literally walked around the data center with an infrared camera to see what was running too hot and out of spec. That kind of low-level ops – it wasn’t what I thought I’d be doing when I started this job.

And then you start learning about things like: how does debris get cleaned off the optical fiber so that your InfiniBand links don’t flap and cause your training to slow down? These are deep physical-world problems. I didn’t expect to have to get this far into that layer, but you need it if you want to run clusters reliably.
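Many of these failure modes show up in software well before anyone walks the floor. As a rough illustration (not Together’s health-check stack), here’s how you might poll NVML for the uncorrected ECC errors Charles mentions; a GPU that has fallen off the bus often simply disappears from the device count or raises an NVML error.

```python
# Rough sketch of a GPU health probe using NVML (pip install nvidia-ml-py).
# Illustrative only; this is not Together's monitoring stack.
import pynvml

def check_gpus(expected_count: int) -> list[str]:
    problems = []
    pynvml.nvmlInit()
    try:
        count = pynvml.nvmlDeviceGetCount()
        if count != expected_count:
            # A GPU that has "fallen off the bus" often just disappears here.
            problems.append(f"expected {expected_count} GPUs, NVML sees {count}")
        for i in range(count):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            try:
                # Volatile counters reset on reboot; uncorrected errors are the bad ones.
                ecc = pynvml.nvmlDeviceGetTotalEccErrors(
                    handle,
                    pynvml.NVML_MEMORY_ERROR_TYPE_UNCORRECTED,
                    pynvml.NVML_VOLATILE_ECC,
                )
                if ecc > 0:
                    problems.append(f"GPU {i}: {ecc} uncorrected ECC errors")
            except pynvml.NVMLError as err:
                problems.append(f"GPU {i}: NVML error {err}")
    finally:
        pynvml.nvmlShutdown()
    return problems

if __name__ == "__main__":
    for p in check_gpus(expected_count=8):
        print("PROBLEM:", p)
```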

Vivek: How do you think about ops overhead at that scale? How much can you rely on automation? Or do you literally have people walking around with IR cameras?

Charles: Yeah, it’s both. At our scale, we can’t function without a ton of automation. We’ve got agents running on every machine, monitoring GPU utilization, thermals, and failures, and paging the team when something goes wrong.

But then, yeah, someone might have to walk into the data center. Maybe it’s rebooting a node. Maybe it’s replacing a bad cable. The systems are built to self-report, but the physical work still has to happen. These aren’t abstract cloud resources. They're real servers that need care and feeding.
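A toy version of the agent loop Charles describes: poll each GPU’s utilization and temperature, and page someone when a threshold is crossed. The threshold and the page_oncall hook below are hypothetical placeholders, not Together’s production automation.

```python
# Toy monitoring agent: poll GPU temperature and utilization via NVML and
# page on-call when something looks wrong. Threshold and paging hook are
# hypothetical placeholders, not Together's production agent.
import time
import pynvml

TEMP_LIMIT_C = 85        # assumed threshold, for illustration only
POLL_INTERVAL_S = 30

def page_oncall(message: str) -> None:
    # Placeholder: in practice this would call your paging provider's API.
    print(f"[PAGE] {message}")

def poll_once() -> None:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
        if temp >= TEMP_LIMIT_C:
            page_oncall(f"GPU {i} running hot: {temp}C (util {util}%)")

if __name__ == "__main__":
    pynvml.nvmlInit()
    try:
        while True:
            poll_once()
            time.sleep(POLL_INTERVAL_S)
    finally:
        pynvml.nvmlShutdown()
```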

Vivek: And when you look at the broader market, there are a lot of people offering inference and renting out compute. What’s your core competency, and what’s hard for others to replicate?

Charles: At Together, our focus is efficient AI. And to us, efficiency spans both the model side (algorithm design) and the infra side (kernel tuning, systems optimization, etc.).

So when you rent a GPU cluster from us, you’re not just getting bare metal. You get access to our Together Kernel Collection. This is a suite of custom kernels that we’ve optimized to make training faster and more efficient. Drop them into your training loop, and you might see 10%+ improvements out of the box.

And we’ve been through the pain ourselves. We operate these clusters not just for customers, but for our own research and services. That operational experience compounds. When you’re running 10K+ GPU clusters and keeping them reliable for your own workloads, you learn what matters.
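To make the drop-in idea concrete, the sketch below contrasts a naive attention implementation with the fused, FlashAttention-style kernel that ships with PyTorch: the same math, swapped underneath with no other changes to the loop. It’s a generic illustration of the pattern, not the Together Kernel Collection API.

```python
# Generic illustration of the "drop-in kernel" pattern: same math, different kernel.
# Uses PyTorch's built-in fused attention; this is NOT the Together Kernel
# Collection API, just a stand-in to show how little the training loop changes.
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Straightforward attention: materializes the full (seq x seq) score matrix.
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

def fused_attention(q, k, v):
    # FlashAttention-style fused kernel: same output, far less memory traffic.
    return F.scaled_dot_product_attention(q, k, v)

if __name__ == "__main__":
    shape = (4, 16, 2048, 64)  # (batch, heads, seq, head_dim)
    q, k, v = (torch.randn(shape, device="cuda", dtype=torch.bfloat16) for _ in range(3))
    ref = naive_attention(q, k, v)
    fast = fused_attention(q, k, v)
    # Outputs match to bf16 precision; only the kernel underneath changed.
    print("max abs diff:", (ref - fast).abs().max().item())
```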

Vivek: Let’s talk about R1 for a second. R1 distillation showed you can compress a big model down to something like a 32B and still get impressive performance. How does that change your infrastructure planning? Do you worry about building for giant models only to have people prefer smaller ones? Do you risk infra obsolescence if models stay in this mid-sized range?

Charles: Yeah, I think about that a lot. And it’s lucky for us, because even though you can distill a model like R1 and get great performance at 70B, there’s still real demand for the full-sized versions, and more use cases open up for the smaller ones.

I was literally talking to someone tonight about this as we host DeepSeek, and they asked: “Are you hosting the big one?” People still want the big one! Especially in research, pretraining, frontier work. The appetite is there.

So when we build infra, we build it to support both. That means really fast InfiniBand and fully non-blocking topologies, so we can support fully distributed large models. But the same setup also runs small models just fine. It’s about flexibility.

Vivek: And what about the extreme end, the 1B and 3B models that can run locally? We’re big believers in local AI here, having invested in Ollama early. Do you see Together extending toward the edge with something in the middle, like edge points of presence colocated close to where end users are but not quite in the cloud? Is that a real problem today, or something more like 3-4 years out?

Charles: Yeah, totally. I actually think that problem is real today even without super-small models.

Latency matters. A lot of inference use cases, like AI companions and phone calls, are super latency-sensitive. People don’t want to wait. So we’re already thinking about edge POPs, routing, and co-locating inference close to the user. We’ve already had to tackle this latency problem and plan it out even with our large models – figuring out how to build an edge network and reduce latency no matter where the request is coming from.

The smaller models just make that more viable and you should see a lot of people playing around with them. You could imagine a hierarchy with device, then edge, then a really big hub data center, and that opens up some really interesting architectural ideas. Like treating the device as an extension of our cloud. 
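One way to picture that device/edge/hub hierarchy is as a routing decision: send each request to the lowest-latency tier that can actually serve the requested model. Everything in the sketch below (tier names, latencies, model names) is invented for illustration.

```python
# Hypothetical sketch of the device -> edge -> hub hierarchy: route each request
# to the nearest tier that can serve the requested model. All names and numbers
# here are made up for illustration.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    rtt_ms: float      # measured round-trip latency to this tier
    models: set[str]   # models this tier can serve

def route(model: str, tiers: list[Tier]) -> Tier:
    # Prefer the lowest-latency tier that actually hosts the model.
    candidates = [t for t in tiers if model in t.models]
    if not candidates:
        raise ValueError(f"no tier can serve {model}")
    return min(candidates, key=lambda t: t.rtt_ms)

if __name__ == "__main__":
    tiers = [
        Tier("on-device", rtt_ms=0.0, models={"tiny-3b"}),
        Tier("edge-pop", rtt_ms=8.0, models={"tiny-3b", "mid-70b"}),
        Tier("hub-dc", rtt_ms=45.0, models={"tiny-3b", "mid-70b", "big-moe"}),
    ]
    for m in ("tiny-3b", "mid-70b", "big-moe"):
        print(m, "->", route(m, tiers).name)
```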

Vivek: Maybe one last thing to close us out. We've got a lot more compute coming online over the next year or two, and you guys are obviously continuing to scale fast. What's one thing that keeps you up at night, and what are you most bullish and excited about right now?

Charles: I think the thing that keeps me up at night is that GPUs don't sleep. I'll get paged in the middle of the night, and I’m just making sure everything keeps running. Literally, PagerDuty keeps me up at night.

As we scale, the surface area just explodes. We’re at something like 160 people now, and the infra footprint is massive. It’s a lot of responsibility. 

I’m also super excited. The most recent model releases, like Qwen and DeepSeek, and what we’ve seen with open and smaller models, are game changers for accessibility. We’re going to see a huge wave of people suddenly able to run GPT-level models on their laptops.

That’s super cool and I can’t wait to see what people will do with it.