Plato Data Intelligence.
Vertical Search & Ai.

CoinFund Leads $3.1M Bagel Network Round to Support Next-Gen Data Infra in web3 x AI Stack

Bagel Network: The Next-Gen Computable Data Layer for AI

CoinFund is proud to lead a $3.1 million financing for Bagel Network, a startup building a decentralized protocol for collaborative embeddings datasets and expanding the decentralized AI computable data ecosystem. To catch up on CoinFund’s ongoing research and investments in the emerging web3 x AI stack, please see our previous content on Worldcoin, Giza, our 2022 AI x web3 overview, and our Gensyn seed thesis.

Intro to Vector Embeddings

As an introduction, vector embeddings are a way to convert words, sentences, images, and other data into mathematical objects (vectors) while preserving both the meaning of each item and its relationships to other items. For example, an embedding could translate the word “apple” into a 200-dimensional vector based on the word’s context across a large dataset. This vector would capture the essential meaning of apple and its relationships to related concepts like fruit, orchard, pie, etc. While the primary applications of vector embeddings have been in text, with flagship models such as Word2Vec and GloVe, embeddings can also be produced for other kinds of data, including image and audio data. This context is critical as AI development becomes increasingly focused on multi-modality, where a model can process text, image, or audio input and produce output in any of these three mediums as well. Furthermore, embedding models could be used to capture larger data types, such as user-specific embeddings that capture user preferences, behaviors, and characteristics, or product-level embeddings that capture a product’s attributes, features, or any other semantic information.
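To make the idea of "relationships between concepts" concrete, here is a minimal sketch of how closeness between embeddings is commonly measured with cosine similarity. The four-dimensional vectors below are hand-picked, hypothetical values for illustration only; real embeddings (for example, the 200-dimensional vector mentioned above) come from a trained model, and none of the names here are part of any Bagel Network API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings; the values are invented, not model output.
toy_embeddings = {
    "apple": [0.9, 0.8, 0.1, 0.0],
    "orchard": [0.7, 0.9, 0.2, 0.1],
    "pie": [0.8, 0.3, 0.6, 0.0],
    "carburetor": [0.0, 0.1, 0.2, 0.9],
}

def nearest(word, embeddings):
    """Rank every other word by cosine similarity to `word`, most similar first."""
    query = embeddings[word]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = nearest("apple", toy_embeddings)
for other, score in ranking:
    print(f"{other}: {score:.2f}")
```

With these invented values, "orchard" and "pie" rank well above "carburetor", which is the kind of semantic neighborhood a vector database is built to search at scale.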

The Bagel Opportunity

The commercial opportunity for vector databases has grown rapidly over the past 12 months alongside the mainstream adoption of early consumer AI applications such as ChatGPT, Midjourney and Runway, just to name a few. Bagel is among the first web3-native attempts to combine a vector embeddings database with an incentivized marketplace protocol, leveraging web3 primitives to supercharge permissioned data and model sharing and collaboration. We see a potential path to winning the web3-native category from both a product and an incentivized network perspective. We also see the ability to move quickly to preserve its early leadership, given founder Bidhan Roy’s cross-disciplinary professional experience on the Amazon Alexa team, at Instacart, and at Arweave. We believe Bagel Network is a key enabler for the next generation of AI applications. Adoption of those applications today remains bottlenecked by their ability to provide contextualized, highly applicable and use case-specific responses, which in turn is gated by the insatiable demand for training data, especially as most of the world’s data remains unstructured.

While some web2 embeddings companies (both VC-funded and corporate spinouts) are part of the broader competitive set, Bagel Network has been able to ship quickly to take advantage of its time-limited opportunity to lead the embeddings category from a web3-native perspective, with an already-live demo, SDK and pilot users. Longer term, we believe that Bagel’s approach of building a decentralized protocol and marketplace for indexed vector embedding datasets positions the network at the intersection of two mutually reinforcing key trends: the rise of LLMs (and derivative applications) and the embrace of the permissionless, transparent and decentralized core values of web3.

While the market for vector embeddings is nascent, there are data points we can consider. First, we can look at the relational database management market as an established comparable whose scale vector databases could eventually approach (source). Today that market is worth $69.44B and growing at a CAGR of 12%. Second, there is the end-market analysis: some primary industries serviced by vector embeddings include image recognition ($38B), recommendation engines ($4.55B), and AI chatbots ($5.4B), which collectively are projected to grow at a CAGR of 20–40% through 2030. Lastly, global spending on artificial intelligence (including ML, AI robotics, computer vision, NLP and sensor tech) is now projected to grow from $300B+ in 2024 to $700B+ by 2030. With these figures in mind, vector embeddings are likely to play a role as an enabling technology for the increasingly capable multi-modal AI models and applications that will be emerging over the next decade.
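As a quick sanity check on the AI-spending projection, the implied compound annual growth rate can be derived from the two endpoints. This is a rough sketch: it treats "$300B+" and "$700B+" as exactly $300B and $700B, which is an assumption, so the result is only an approximation.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by growing from start to end over `years`."""
    return (end_value / start_value) ** (1 / years) - 1

# Endpoints from the text, in billions of USD; the "+" in each figure is ignored (assumption).
ai_spend_cagr = implied_cagr(300.0, 700.0, 2030 - 2024)
print(f"Implied AI spending CAGR, 2024-2030: {ai_spend_cagr:.1%}")
```

Under these assumptions the implied rate is roughly 15% per year, above the 12% cited for relational databases and below the 20–40% range cited for the individual end markets.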

Bagel’s Role in the Decentralized AI Stack

We believe that Bagel Network will supercharge permissioned sharing and collaboration through its cryptonative marketplace model, solving key problems within the data layer of the AI tech stack. This fits the web3 ethos of permissionless access and collaboration, all while delivering needed infrastructure for the next generation of AI. Currently, a disproportionate amount of data is owned and controlled by large entities, boxing out smaller organizations, either by restricting access to high-quality datasets or simply through the compounding effect of scaled intelligence. Bagel Network redefines the AI data landscape by creating a two-sided marketplace where machine learning engineers, researchers, and AI agents collaboratively build, trade, and license datasets. Because embedding generation is often one of the most computationally intensive parts of an AI pipeline, the high levels of redundancy in today’s vector database systems lead to inefficiencies, higher costs, and duplicated work. Bagel Network allows models to share embeddings, avoiding that duplication. This is more efficient, and it retains attribution via blockchain metadata along with the other ingredients needed to share future monetization potential fairly, which helps route around cold-start-related friction. In the context of artificial intelligence, we are already seeing open source efforts to replicate closed source datasets to advance model improvement (see RedPajama-Data, a reproduction of the LLaMA training dataset, or the Mistral/Mixtral open model approach).
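The efficiency argument above can be illustrated with a minimal sketch of a shared, content-addressed embedding store: an embedding is computed once, keyed by a hash of the model and input, and later requesters reuse it while the original contributor stays on record. Everything here is hypothetical, including the class name, the record fields, and the toy embedding function. It is not Bagel Network’s protocol or SDK, and a plain in-memory dictionary stands in for whatever on-chain metadata the network actually uses.

```python
import hashlib

class SharedEmbeddingStore:
    """Hypothetical in-memory stand-in for a shared embedding registry."""

    def __init__(self):
        self._records = {}

    @staticmethod
    def key_for(model_id, text):
        # Identical (model, input) pairs always map to the same key.
        payload = f"{model_id}\x00{text}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get_or_compute(self, model_id, text, contributor, embed_fn):
        key = self.key_for(model_id, text)
        record = self._records.get(key)
        if record is None:
            # Cache miss: pay the compute cost once and record who contributed it.
            record = {"vector": embed_fn(text), "contributor": contributor}
            self._records[key] = record
        return record

calls = {"count": 0}

def toy_embed(text):
    """Placeholder for an expensive embedding model; counts how often it runs."""
    calls["count"] += 1
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

store = SharedEmbeddingStore()
first = store.get_or_compute("toy-model-v1", "apple pie recipe", "team_a", toy_embed)
second = store.get_or_compute("toy-model-v1", "apple pie recipe", "team_b", toy_embed)
print(calls["count"], second["contributor"])
```

In this sketch the second team receives the vector without recomputing it, and the stored record still credits the first team, which is the attribution property a marketplace would need in order to route payments.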

We anticipate that a vector database coupled with a decentralized network can out-compete by leveraging open source and collaborative development (an approach that has already won in the backend web2 stack). For example, a smart contract can manage permissioned access down to the level of discrete embeddings, which is not possible with a GitHub-like centralized approach. A protocol can reward data contributions, monitor and incentivize network participation (through forking), and track compute resource usage. Current vector database solutions lack support for collaboration, while open-source platforms like GitHub and Hugging Face lack the incentives to produce high-quality embeddings. Today, much high-quality data sits within enterprises and public datasets, fragmented and underexploited, and in the future it can be onboarded, aligned, and monetized. Finally, an open marketplace allows permissioned development on embedding collections by multiple teams simultaneously, much like open-source software but for vectorized datasets. This catalyzes innovation across sectors, in contrast to siloed efforts.

Conclusion

As with any investment, many risks (execution, competition, scaling, monetization) exist with such an ambitious vision. However, we believe that Bagel Network exhibits promising early traction and is well-positioned in a high-growth market with several secular tailwinds in its favor, especially given a currently uncrowded greenfield opportunity to design and launch a leading web3 implementation well-aligned with the AI/data value creation flywheel. Ultimately, CoinFund views Bagel’s long-term vision of creating a decentralized marketplace for machine learning computable datasets as a missing and critical piece of the web3 stack being built for AI/ML use cases. While it is still early days, we believe that the market potential outweighs the risks, hence CoinFund’s high-conviction bet and our excitement to roll up our sleeves together with Bidhan Roy and the rest of the Bagel team. To learn more or sign up as an early data partner, visit www.bagel.net!

Disclaimer: The views expressed here are those of the individual CoinFund Management LLC (“CoinFund”) personnel quoted and are not the views of CoinFund or its affiliates. Certain information contained herein has been obtained from third-party sources, which may include portfolio companies of funds managed by CoinFund. While taken from sources believed to be reliable, CoinFund has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation.

This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by CoinFund. An offer to invest in a CoinFund fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety. Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by CoinFund, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by CoinFund (excluding investments for which the issuer has not provided permission for CoinFund to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://www.coinfund.io/portfolio.

Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. This presentation contains “forward-looking statements,” which can be identified by the use of forward-looking terminology such as “may”, “will”, “should”, “expect”, “anticipate”, “project”, “estimate”, “intend”, “continue” or “believe” or the negatives thereof or other variations thereon or comparable terminology. Due to various risks and uncertainties, actual events or results may differ materially and adversely from those reflected or contemplated in the forward-looking statements.
