The Other 90%: Introducing the Aryn-8VC Partnership

Sep 28, 2023

Unstructured data is the undiscovered country of the enterprise, containing the institutional wisdom AI seeks to capture. Unlocking the value of that data has proved elusive, as both enterprise search and LLMs have fallen short individually. Aryn's mission is to help you answer questions from all of your data. To this end, Aryn is bringing generative AI to OpenSearch and data preparation - and bridging the gap from results to answers.

Why Now?

While building analytics services at AWS, Aryn’s founding team observed several trends pointing towards a new company. Many customers successfully processed structured data but struggled with unstructured data - which represents up to 90% of enterprise-generated data. However, most analytics services under development focused on structured data. A few proprietary unstructured data search platforms existed for end users and IT teams, but developers were largely ignored. It became clear that better unstructured data services were urgently needed. More recently, advances in generative AI have captivated users, who increasingly prefer natural language interactions in their applications. The next wave of search experiences will be conversational. Aryn will make them attainable.

Architecture & Advantages

Aryn is built on two realizations: 1) In generative AI, data quality determines answer quality - hence the necessity of cleaning and enriching unstructured data. 2) Generative AI is not a layer or tool, but an enabler at every layer. By augmenting data preparation and search with AI, as well as for AI, it’s possible to deliver a conversational search experience that’s high-quality and easy to build on. The result is Aryn’s conversational search stack for unstructured data:

Conversational APIs let developers easily build conversational search apps, leveraging their preferred generative AI model(s) with plug-and-play convenience.
Hybrid Search integrates the best of semantic and keyword search behind a seamless interface, using OpenSearch, the popular open source search and analytics suite.
Sycamore, Aryn’s semantic data preparation system, uses generative AI to clean, extract, enrich, and summarize data. In other words, it improves data quality to improve answer quality.
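To make the idea of hybrid search concrete, here is a minimal sketch of one common way to blend keyword and semantic results: rescale each score list to a shared range, then take a weighted average. This is an illustration only, not Aryn's or OpenSearch's implementation. The function names, the weight, and the sample scores are all hypothetical.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(keyword_scores, semantic_scores, keyword_weight=0.5):
    """Blend normalized keyword and semantic scores into one ranking."""
    kw = min_max(keyword_scores)
    sem = min_max(semantic_scores)
    blended = {}
    for doc in set(kw) | set(sem):
        # A document missing from one result list contributes 0 for that signal.
        blended[doc] = (keyword_weight * kw.get(doc, 0.0)
                        + (1 - keyword_weight) * sem.get(doc, 0.0))
    return sorted(blended, key=blended.get, reverse=True)


# Hypothetical inputs: BM25-style keyword scores and cosine similarities.
keyword_scores = {"doc_a": 12.0, "doc_b": 4.0, "doc_c": 8.0}
semantic_scores = {"doc_a": 0.55, "doc_b": 0.90, "doc_d": 0.20}

ranking = hybrid_rank(keyword_scores, semantic_scores)
print(ranking)
```

Normalizing first matters because keyword scores and vector similarities live on different scales. Without it, whichever signal has the larger raw numbers would dominate the blend.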

Aryn’s key differentiators are threefold:

Speed to production and scale. For most companies, retraining LLMs to answer questions on private data is prohibitively expensive. This can be overcome using retrieval-augmented generation (RAG), which runs semantic search over a company's own data and passes the results to a generative AI model to ground its answers. However, this approach typically requires a complex pipeline of components that lacks scalability, security, and maintainability. Aryn’s conversational search stack provides everything developers need to build conversational apps - without requiring AI or search expertise. With OpenSearch as a central piece, developers can easily scale their applications to production.
Better quality answers. When using RAG, answer quality reflects the data presented to the LLM, and the ability to experiment with different models and prompts can have a significant effect on quality. High-stakes work requires accurate outputs, not just decent probabilities. To tackle this problem, Aryn created and open sourced Sycamore, a robust, scalable semantic data preparation system for making unstructured data meaningful - and therefore searchable. Aryn allows developers to customize their data prep and search pipelines, so they can ground AI models on the highest quality data required by their use cases.
Fully open-source. Aryn’s conversational search stack is 100% open source, under the Apache v2.0 license. It allows developers to customize pipelines and choose their AI models, avoiding lock-in or overreliance on any LLM.

A cross-section of early Aryn customers reveals the demand for conversational search, spanning verticals such as financial services, manufacturing, life sciences, government, productivity tools, and media.
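For readers new to retrieval-augmented generation, the pattern described above can be sketched in a few lines: retrieve the most relevant passages, then hand them to the model as numbered sources it must cite. The sketch below is a toy under stated assumptions, not Aryn's pipeline. Term overlap stands in for real semantic search, the model call is left out, and the helper names and the third passage are hypothetical (the first two passages paraphrase the sort benchmark example later in this post).

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, k=2):
    """Return the k passages sharing the most terms with the question.

    Term overlap is a stand-in for semantic search in this sketch.
    """
    q = tokenize(question)
    ranked = sorted(passages,
                    key=lambda p: len(q & tokenize(p["text"])),
                    reverse=True)
    return ranked[:k]


def build_prompt(question, retrieved):
    """Assemble a grounded prompt with numbered, citable sources."""
    sources = "\n".join(f"[{i}] ({p['id']}) {p['text']}"
                        for i, p in enumerate(retrieved, 1))
    return ("Answer using only the sources below and cite them by number.\n\n"
            f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:")


passages = [
    {"id": "sort-2022", "text": "Exoshuffle-CloudSort won the 2022 CloudSort benchmark."},
    {"id": "joule", "text": "JouleSort measures the energy required to sort records."},
    {"id": "hr-policy", "text": "Employees accrue vacation days monthly."},  # hypothetical filler
]

question = "Which algorithm won the CloudSort benchmark?"
retrieved = retrieve(question, passages)
prompt = build_prompt(question, retrieved)
print(prompt)  # this string would be sent to the generative model of your choice
```

Because the model only sees what retrieval hands it, the quality of the indexed passages bounds the quality of the answer.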

Aryn in Action

Already, Aryn consistently generates answers that would be impossible using purely lexical or LLM-based approaches. As an illustration, take the following query: "What algorithm was the most recent winner of the sort benchmarks?" Aryn's answer:

"The most recent winners of the sort benchmarks in 2022 are Exoshuffle-CloudSort for CloudSort and ELSAR for JouleSort [citation]. CloudSort measures the minimum cost for sorting a large amount of data on a public cloud. JouleSort measures the amount of energy required to sort a certain amount of records. [citation]."

Asking this question of most lexical search platforms returns a document or snippet with matching keywords, not a direct answer. These systems force users to wade through results that may not even address the questions asked, let alone correctly.

In the absence of ground truth, most LLM-based platforms today don't fare much better. Here's ChatGPT's answer: "I don't have access to real-time data, and there isn't a specific 'sort benchmarks' competition or award that I'm aware of with regularly updated winners. Sorting algorithms and their performance are typically evaluated in research papers, textbooks, and online coding communities. Benchmarks are conducted by researchers and developers to compare the performance of different sorting algorithms in specific scenarios."

Famously, when engineers join Google’s search team, they’re given a presentation on the “Life of a Query”, describing exactly what happens from the moment a user types a query and hits “enter”. Applying this frame to Aryn’s conversational search stack, we can appreciate the full journey from question to answer:

User input: Everything starts with a user posing a natural language question.
Query understanding: The system passes both the recent conversation turns and the input query to the LLM as context, so the LLM can understand and rewrite the query for better retrieval. For example, if a user asks “What’s the temperature of a cool star?”, traditional search systems might interpret ‘cool’ as fascinating or interesting. An LLM, recognizing the scientific context, would understand that the user is talking about lower-temperature stars, e.g. red dwarfs.
Index lookup: As with a traditional search system, the system checks its index of known documents to identify potentially relevant documents for the retrieval set.
Dual ranking: The system combines traditional lexical ranking signals, such as BM25 and TF-IDF, with semantic and contextual relevance from LLMs to sort the documents in the retrieval set by relevance.
Synthesis: The system uses a foundation model to synthesize an answer from the top-ranked documents, along with their citations.
Conversation platform: The system keeps a record of the questions and answers within a session and uses that information to home in on specific topics over time.
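Two of the steps above, index lookup and the lexical half of dual ranking, can be sketched with a small inverted index scored by BM25. This is a minimal illustration, not Aryn's or OpenSearch's code. The class name, stopword list, and sample documents are hypothetical, the BM25 parameters are common defaults, and the LLM-driven steps (query understanding, semantic ranking, synthesis) are out of scope here.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny hypothetical stopword list so common words do not drive matches.
STOPWORDS = {"a", "an", "the", "of", "is", "are", "what", "which",
             "to", "on", "with", "for"}


def tokenize(text):
    """Lowercase, split into alphanumeric terms, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS]


class TinyIndex:
    """Inverted index with BM25 ranking over a small in-memory corpus."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.term_counts = {d: Counter(tokenize(text)) for d, text in docs.items()}
        self.lengths = {d: sum(c.values()) for d, c in self.term_counts.items()}
        self.avg_len = sum(self.lengths.values()) / len(docs)
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        for doc_id, counts in self.term_counts.items():
            for term in counts:
                self.postings[term].add(doc_id)

    def lookup(self, terms):
        """Index lookup: every document containing at least one query term."""
        found = set()
        for term in terms:
            found |= self.postings.get(term, set())
        return found

    def bm25(self, terms, doc_id):
        """BM25 score of one document for the given query terms."""
        n = len(self.term_counts)
        score = 0.0
        for term in terms:
            tf = self.term_counts[doc_id][term]
            if tf == 0:
                continue
            df = len(self.postings[term])
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            length_ratio = self.lengths[doc_id] / self.avg_len
            denom = tf + self.k1 * (1 - self.b + self.b * length_ratio)
            score += idf * tf * (self.k1 + 1) / denom
        return score

    def search(self, query):
        """Look up candidate documents, then sort them by BM25 score."""
        terms = tokenize(query)
        candidates = self.lookup(terms)
        return sorted(candidates, key=lambda d: self.bm25(terms, d), reverse=True)


docs = {
    "stars": "Red dwarfs are cool stars with low surface temperature.",
    "sorting": "CloudSort measures the cost of sorting data on a public cloud.",
    "energy": "JouleSort measures the energy required to sort records.",
}
index = TinyIndex(docs)
print(index.search("Which benchmark measures energy?"))
```

Note that the lexical ranker matches "cool" only as a literal term. Deciding that the user means low-temperature stars rather than interesting ones is the job of the query understanding and semantic ranking steps.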

Reflections

Only an extraordinary team could resolve the challenges of creating true conversational search app infrastructure. CEO Mehul Shah served as GM of OpenSearch and Glue at AWS, leading product, engineering, operations, and GTM, and bringing these technologies to some of the world’s largest enterprises. CTO Ben Sowell was technical lead for both AWS Glue and AWS Lake Formation. Chief Product Officer Jon Fritz led product management for AWS’ Apache Spark and Hadoop services, and in-memory database services like ElastiCache for Redis. Having three former AWS Principal Engineers and two AWS GM/Directors among Aryn’s founding team is extraordinary, but only one token of their search, big data, and cloud systems bona fides. On a personal note, BG and Mehul are fellow practitioners from the old school of Silicon Valley database kernels, and have been hoping to work together for almost two decades. We’re confident Aryn will prove worth the wait.

Today, Aryn exited stealth, announcing a $7.5 million seed round. We participated for many reasons, starting with the team, which combines vast experience with a rare cohesiveness. Architecture-wise, many vector databases have cropped up, but we bet on a proven search platform (including a scalable vector database) already used by tens of thousands of engineers and thousands of enterprises, and pre-approved by their IT departments. The elegant integration of transformer models throughout adds untold possibilities to these mature, powerful frameworks - without locking customers into one model. Crucially, Aryn is the only end-to-end infrastructure for conversational search that is 100% open source, configurable, and model agnostic. For developers and users alike, this commitment to flexibility ensures they can meet mission requirements now and in the future.

In a sense, Aryn enables stereoscopic vision for the enterprise, combining familiar and novel methods into something natural and universal. In practice, Aryn achieves an elegant symbiosis: it allows developers to build apps for mission critical endeavors without needing to become AI experts, while making the contributions of AI experts exponentially more useful to fields where they have no prior knowledge. We are honored to be a part of Aryn’s story, and look forward to seeing that story become an epic.

