Tome Fireside Chat
In the spirit of sharing some lessons from our events, earlier this year we had the pleasure of hosting Keith Peiris and Henri Liriani, the co-founders of Tome, at our first Chat8VC gathering in April. As a reminder, Chat8VC is a series hosted at our SF office where we bring together founders and early-phase operators excited about the foundation model ecosystem. We focus on having technical, implementation-oriented conversations and give founders and builders the opportunity to showcase what they’re working on through lightning demos and fireside chats. This can be a fun side project, related to a company they’re starting, or a high-leverage use case within a company operating at scale.
Keith & Henri shared their philosophy around presenting ideas, how that can be augmented through generative AI, and where Tome might be headed in the months to come.
If any of these themes excite you and you’d like to chat more with our team about them or attend future Chat8VC events, please reach out to Vivek Gopalan at email@example.com and Bela Becerra at firstname.lastname@example.org!
Let's start with a little bit of background on you both, because I think it's important that we understand why you want to change how we share ideas. You both have a lot of product and design experience, and you’ve been making a lot of presentations throughout your careers. And Keith, you were an entrepreneur before you even went to school – take us back to what inspired you in the first place to build Tome.
Keith: I’ve spent a good chunk of my adult life working in consumer social. I was at Meta and Instagram, and I started Citizen after that. It's funny – when you work on a social network, the people creating content actually drive the whole ecosystem. You need people to create content frequently so your audience returns, views that content, and ultimately views ads. Because of this dynamic, a lot of engineering and design effort during the mobile era went into making powerful expression accessible.
When I say that, think about what you can do now with the TikTok, Instagram or Snapchat camera relative to Adobe After Effects. A lot of that innovation was taking the really rich expression that used to live in professional tools and making it available for someone to play with on their phone. We’re now able to get people to share hundreds of selfies a day, documenting what they’re eating, what they’re thinking, and how they’re spending time. Presumably, because of that, we became a little closer to our friends in society.
But if you want to share something in your head, you're stuck using something that approximates a document or a PowerPoint. So a lot of our thinking was, "Wait a minute, can we help make communication, expression, power, more easily available when you want to share something in your head?"
The other piece is that, when you work on consumer communication products at scale, nobody ever uses your product the intended way. Everyone uses these different modalities of video, text, images, gifs, payments in a unique way. We quickly became excited about building an open-ended communication product that could weave between different modes, so that you could use the right tool for the right job.
Henri: I was always interested in shortening that journey from this clear picture in your head of something you might want to say to someone, and really getting them to understand it. Just because you can draw it on a piece of paper doesn’t mean someone else understands. We’ve developed some ritualistic tools to convey that understanding.
The slide deck is really well suited to the syllogism – you could build one brick of an argument at a time for someone to understand – this was one of the reasons why it seemed like an interesting use case to focus on initially. Especially because a lot of the tools that have classically been used to create decks aren't very oriented towards that. They're not optimized for coaching you on how to create a compelling argument, or how to substantiate what you're saying, or make what you're saying look good for your audience. It's a lot like, "Hey, let's get you on this shortest path to checking the box and completing this ritual with whatever you're doing in life."
And do you consider the diversity of aesthetics as a core component of the communication, or is it a lower priority compared to the content itself?
Henri: Yeah, I mean the authentic and pedantic answer that we started with initially was that we didn't want you to think about design at all. We wanted you to focus on: What are you trying to say? What order are you making these points in? Do they follow? Are they compelling? Do you have the right information in the right place? And are you wasting time gathering it and assembling the way it looks so that you can share it with someone?
We launched Tome initially with up to two tiles per page, one background color, and one typeface. We've since added flexibility, but that ethos is still something we care about – you shouldn't be bogged down in a thousand different decisions about parameters and visualization.
We obviously want to give you those vectors in the fullness of time, and a lot of that can come from the generative components.
Keith, one point that you made is that everyone uses consumer social products in different ways. The deck is so ubiquitous – everyone thinks about how to present using a deck. When you were thinking about the initial vision of Tome, did you think about scoping that problem for specific personas, rather than going broad and designing for everyone? Has that changed over time?
Keith: One of the things that Henri teaches me every two weeks is that we're actually more similar than different. If you can make something that you really love, there are probably more people like you than you think. If you deconstruct the billion people or so that use PowerPoint or Google Slides, the largest persona is probably sales. But I think we just decided in earnest to build for people who we knew, and we figured that their needs were similar enough to most people.
We ended up building for people like us – product managers, designers, and engineering leaders. We figured that we at least know those people, so we could beg them to use our early stage product. When we started to enjoy using Tome, then we realized there were a lot of other people who also started to like it.
And as you started to see adoption from all of these different personas that may be more similar than different, how did you figure out the right features to add, especially on the AI side? And when did you start to see product inflection?
Henri: I think similarly a lot of these things come from a mix of introspection and just checking that with people that we could find time with and talk to. A lot of people found the “cold start” problem the hardest thing. People procrastinate and sit in front of their screens and think: "Oh, what do I say? What do I do to get going?" And then once you're going, you might be progressing at a faster rate because you've kind of given yourself a set of boxes to fill.
The structure piece felt like something pretty intuitive that was blocking a lot of people in the beginning. So applying an LLM to generate some structure – even if it wasn't the right structure – was just provocative enough to unblock a lot of people. Imagine if someone just sent you a partially incorrect outline for everything you needed to do all the time, it would probably still be pretty useful, and I think it’s actually kind of analogous to a lot of other successful products in the space.
And as you launched that, what was the initial feedback, and how do you think about instrumenting that feedback in the business? That's especially important now that you've grown to millions of active users.
Keith: A lot of it is drinking from the fire hose. First we launched automatic image generation, which was material in the sense that most people who use PowerPoint don't know how to draw. I thought it was really helpful to be able to generate a custom visual for the point you were trying to make.
When we launched the first version of automatic Tome generation, and when Henri posted a video of it on Twitter, things just started to erupt. We had everyone from teachers in Korea using it for lesson plans, to sales people making eight Tomes a day. All of these people are trying to do very different things with Tome – we’re trying to find the right overlap.
Henri: The only thing I'd add is that we have to build feature parity with incumbents that have been around for decades.
It's hard to imagine being able to immediately have all of exactly the right nuanced expression vectors that those tools have on day one. The V1 of “Auto Tome” though did a really good job of showcasing the power of multimodal storytelling from a really small input.
It's a good preview for the version coming out very soon that does a better job of selecting the right supporting artifact for every page – where that might not be a generated image, but instead a chart built from data pulled from a source that's helpful to you.
There's some amount of roadmap determinism where you have to reach feature parity with PowerPoint, and then there's some amount of non-determinism for all the net-new generative capabilities. How do you trade that off internally within the Tome engineering and product teams? How much time do you spend on reaching feature parity versus all of these new experiments?
Henri: We're debating this nearly every day. We have an “offensive roadmap,” which is making progress towards this vision that doesn't exist anywhere else in the world. There's also defensive stuff we have to do: keeping our site up, keeping models from drifting, and to some extent buttoning up feature sets whose gaps cause leakage, like export formats.
The offensive roadmap has to do with a lot of first principles thinking plus end-user validation. And also a focus on broad adoption versus narrowing too deeply on any one type of persona right now while we're still waiting for Tome to be distributed.
Keith: This is what makes the development fun. As product people, you're defined more by what you choose not to build than what you choose to build.
You're working through the question of “If we do X and Y do we end up with joy?” You need this joy or this pull to get someone to use your product. We’ve taken an indirect product path where we focused on the joy and the “wow”, and we hope that our users will give us a chance to build the rest.
To transition into some of the roadmap – the “chance to build the rest” – and something that we’ve discussed together before: how do you pull source-of-truth information into presentations? It's so valuable, right? Stale data in a presentation is just not acceptable. And the process of building and generating reports and centralizing all this data in one place, rather than hooking it into your presentation system, is tedious. You should be able to connect the framework you use to share ideas with your sources of truth. How do you think about that on the roadmap? What's the most exciting thing you have there?
Keith: There are a few! Henri, do you want to talk about citations and then I will talk about some data integrations?
Henri: Yeah! I think one key primitive to exist in a format like this for both a creator and a viewer is the construct of the citation, which is basically just a relationship between something that was generated and a source of some kind that you can read to decide if you agree with it or not. We’re working to get that primitive right.
Is that mostly a trust element or do users want to go retrieve more information from the cited sources? How is usage evolving?
Keith: I would say a little bit of both.
It's funny, one of our best sources of enterprise funnel has been TikTok. I remember talking to the COO of a large bank who found Tome because his daughter watched a TikTok about Tome.
His statement was, "We actually like the structure of the content that you're generating but we just want to pull content from our workspace.” We’re working on that domain specific retrieval functionality.
Henri: Back to citations, the basic version looks like: you ask the question "How many people live in California?" and a web search occurs, information is retrieved, a summary is generated, and you can click on the summary and see what source led to that fact. This same workflow can be used to do the same sort of question answering from internal data.
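The citation primitive Henri describes – a generated summary linked back to the source it came from – can be sketched minimally as a data structure. This is an illustrative sketch, not Tome's actual implementation; the class and field names are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    """A retrieved document the model drew on."""
    title: str
    url: str
    snippet: str  # the passage that was summarized

@dataclass
class CitedFact:
    """Generated text tied to the sources behind it."""
    text: str
    sources: list[Source] = field(default_factory=list)

    def render(self) -> str:
        # Append one footnote-style marker per source, e.g. "[1]"
        markers = "".join(f"[{i + 1}]" for i in range(len(self.sources)))
        return f"{self.text} {markers}".strip()

# The "How many people live in California?" flow: a web search retrieves
# a source, a summary is generated, and the two stay linked.
fact = CitedFact(
    text="California has roughly 39 million residents.",
    sources=[
        Source(
            title="U.S. Census Bureau QuickFacts: California",
            url="https://www.census.gov/quickfacts/CA",
            snippet="Population estimates, July 1 ...",
        )
    ],
)
print(fact.render())
```

Clicking a marker in a viewer would resolve to `sources[i].url`, letting the reader decide whether they agree with the generated claim.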
Cool. So there's some element of internal knowledge base search that you guys will end up having to build. For example, if I want to pull up a PRD that was in Notion, you can backlink to that. What do you think are the hard problems that you guys will have to solve, and what kind of talent will you have to bring onboard?
Keith: One of the fun challenges about this space is that you have to build a really high performance editor that feels fluid and intuitive. And then you also need to build a way to ingest data from a wide variety of web services.
We're going to operate outside of the Microsoft walled garden and the Google walled garden – and most modern companies keep their data outside of those. We're also building and growing a very strong LLM team, which is now starting to fine-tune models, and very soon we'll be building our own models for these tasks.
On that front, base models are getting better at multimodal generation. I think there's a question around how content gets generated for Tomes or how you can go from text input to presentation. Right now, if I understand correctly, you're decomposing language model output and then structuring it.
Today you generate text and then structure it into tiles, but at some point you'll be able to generate tiles directly with both image and text as output. How much of that do you think you guys need to do on your own? What exciting research work have you seen?
Henri: The foundation model work is happening faster than it makes sense for us to try to compete with. As a result, all of the features that we've built today have a swappable architecture, so we can move to new iterations of models, and this has put us in a position to very easily try new models and test new prompt templates for those models.
In time, that language that you just described, basically text-to-Tome, needs to exist, and that's a pretty independent piece. The long-term vision looks like a way of generating an entire Tome through some intermediate representation.
It could be represented as JSON right?
Keith: Yep, that's one of the ways we can sort of approximate multimodality now!
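A JSON intermediate representation along these lines might have the model emit structured pages and tiles that the editor then renders. The schema below is purely illustrative – these field names are assumptions, not Tome's actual format:

```python
import json

# Hypothetical text-to-Tome intermediate representation: mixed text and
# image tiles, where image tiles carry a generation prompt rather than
# pixels.
tome_ir = {
    "title": "Q3 Product Review",
    "pages": [
        {
            "tiles": [
                {"type": "text", "content": "Usage grew 40% quarter over quarter."},
                {"type": "image", "prompt": "line chart of weekly active users"},
            ]
        }
    ],
}

# Round-trip through JSON, as a model-to-editor handoff would.
serialized = json.dumps(tome_ir, indent=2)
parsed = json.loads(serialized)
print(len(parsed["pages"][0]["tiles"]))  # 2
```

Because the representation is plain structured text, a language model can generate it directly, and each tile type can be handed off to the right renderer or image model downstream.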
For more on Tome’s latest product and technology developments, check out the Tome Blog.