18 June 2023

Learnings from doing ML on web3 data


We have been running experiments that use large language models (LLMs) to model web3 data. Along the way, we have learned a few things about doing machine learning (ML) in the context of web3, and we would like to share them with the community.

Here’s what we’ve learned:

Web3 data is not optimized for batch read

Web3 data is not optimized for batch reads. The few systems that exist are designed for writes. The core use case for a blockchain is storing data, and storing data means writing to the chain. ML use cases, on the other hand, require reading large amounts of data in batches.
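To make the mismatch concrete: the standard Ethereum JSON-RPC interface is oriented around individual lookups. For example, `eth_getBlockByNumber` returns a single block, so assembling a training set means issuing one request per block. A common workaround is to group those calls into JSON-RPC 2.0 batch requests. The sketch below only builds the payloads. It assumes the endpoint accepts batch requests, and the chunk size of 100 is an arbitrary assumption, since providers set their own batch limits (and some reject batches entirely).

```python
from typing import Any, Dict, Iterator, List


def block_batches(start: int, end: int, chunk_size: int = 100) -> Iterator[List[Dict[str, Any]]]:
    """Yield JSON-RPC 2.0 batch payloads covering blocks start..end (inclusive).

    Assumption: the node accepts batches of up to `chunk_size` requests.
    """
    for chunk_start in range(start, end + 1, chunk_size):
        chunk_end = min(chunk_start + chunk_size, end + 1)
        yield [
            {
                "jsonrpc": "2.0",
                "id": n,  # reuse the block number as the request id
                "method": "eth_getBlockByNumber",
                # Block numbers are hex-encoded quantities; True asks for full tx objects.
                "params": [hex(n), True],
            }
            for n in range(chunk_start, chunk_end)
        ]


batches = list(block_batches(17_000_000, 17_000_249))
print([len(b) for b in batches])
```

Each batch would then be serialized (for example with `json.dumps`) and POSTed to a node. How to send it, and how to retry partial failures within a batch, depends on your client and provider.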

No standardized data schema across blockchains

There is no standardized data schema across blockchains. For example, Ethereum has a different data model than Solana. This means that if you are building a generalized, blockchain-level solution, you will have to customize your code at the data layer for each blockchain that diverges. This isn’t necessarily a dealbreaker. However, in software design we prefer solutions that generalize and require minimal effort to accommodate new instances of the same primitive.
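In practice, that per-chain customization usually takes the form of an adapter layer that maps each chain’s raw records onto one internal schema. The sketch below is hypothetical: `NormalizedTx`, `from_ethereum`, `from_solana`, and `normalize` are illustrative names, and the inputs assume simplified versions of an Ethereum JSON-RPC transaction object (`hash`, `from`, `blockNumber`) and a Solana `getTransaction` result in plain `json` encoding (`slot`, `transaction.signatures`, `transaction.message.accountKeys`).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class NormalizedTx:
    """Hypothetical chain-agnostic record consumed by downstream ML code."""
    chain: str
    tx_id: str
    sender: str
    position: int  # block number on Ethereum, slot on Solana (not the same concept)


def from_ethereum(raw: Dict[str, Any]) -> NormalizedTx:
    # Ethereum JSON-RPC encodes quantities as hex strings such as "0x10".
    return NormalizedTx(
        chain="ethereum",
        tx_id=raw["hash"],
        sender=raw["from"],
        position=int(raw["blockNumber"], 16),
    )


def from_solana(raw: Dict[str, Any]) -> NormalizedTx:
    # A Solana transaction is identified by its first signature; the first
    # account key is the fee payer, which we treat as the "sender" here.
    message = raw["transaction"]["message"]
    return NormalizedTx(
        chain="solana",
        tx_id=raw["transaction"]["signatures"][0],
        sender=message["accountKeys"][0],
        position=raw["slot"],
    )


# One adapter per chain; downstream code only ever sees NormalizedTx.
ADAPTERS: Dict[str, Callable[[Dict[str, Any]], NormalizedTx]] = {
    "ethereum": from_ethereum,
    "solana": from_solana,
}


def normalize(chain: str, raw: Dict[str, Any]) -> NormalizedTx:
    try:
        adapter = ADAPTERS[chain]
    except KeyError:
        raise ValueError(f"no adapter for chain {chain!r}") from None
    return adapter(raw)
```

With this structure, supporting another chain means writing one more adapter rather than changing the code that consumes the normalized records.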

Ethereum and Solana are both blockchains, and in principle they should enable a metaverse: a world where you can take your data with you anywhere you want on the internet. That day is still not today.

Non-human-readable code

Even though web3 is supposed to be open, many smart contracts do not publish their source code. What one can reliably access is the smart contract’s bytecode, and bytecode, of course, is not human-readable.
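To illustrate what "not human-readable" means in practice: deployed EVM bytecode is a hex string of opcodes, and straightforward tooling recovers only a flat opcode listing, not the original function names, variables, or control structure. The sketch below is a minimal disassembler with a deliberately partial, hand-picked opcode table; any byte not in the table is printed as `UNKNOWN`. The sample input `0x6080604052` is the prologue that Solidity-compiled contracts typically begin with.

```python
from typing import List

# Small subset of the EVM instruction set; the real table is much larger.
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x10: "LT", 0x14: "EQ", 0x15: "ISZERO",
    0x1C: "SHR", 0x34: "CALLVALUE", 0x35: "CALLDATALOAD", 0x36: "CALLDATASIZE",
    0x50: "POP", 0x52: "MSTORE", 0x54: "SLOAD", 0x55: "SSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST", 0x80: "DUP1",
    0x90: "SWAP1", 0xF3: "RETURN", 0xFD: "REVERT",
}


def disassemble(bytecode_hex: str) -> List[str]:
    """Turn hex-encoded EVM bytecode into a list of opcode mnemonics."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    out: List[str] = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32 carry 1..32 bytes of immediate data after the opcode.
            n = op - 0x5F
            arg = code[i + 1 : i + 1 + n]
            out.append(f"PUSH{n} 0x{arg.hex()}")
            i += 1 + n
        else:
            out.append(OPCODES.get(op, f"UNKNOWN(0x{op:02x})"))
            i += 1
    return out


print(disassemble("0x6080604052"))  # ['PUSH1 0x80', 'PUSH1 0x40', 'MSTORE']
```

Even with a complete opcode table, a listing like this is far from source code. Reconstructing readable logic from it (decompilation) is the much harder step.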

Very nascent developer tooling

The developer tooling landscape for web3 is nascent. For example, if you want to decompile bytecode back to source code, the available tools are not robust. There are more tasks like this that should be fairly easy to do but are not.


From the experiments we ran, we concluded that the best way to approach web3 is to treat it as beta-level, next-generation web infrastructure. We are still deep in the building phase of web3 infrastructure; unfortunately, we as a web3 community have treated it as if it were ready for production. Quite a few significant things still need to be invented before we unleash it on the consumer web.