Decoding Decentralized AI With Vishwa Raman
Five questions with Vishwa Raman, Head of Enterprise Solutions at Oasis Labs.
As Head of Enterprise Solutions at Oasis Labs, Vishwa Raman leads multi-faceted efforts spanning engineering, product, and customer success. Since most of his work focuses on privacy-preserving access to sensitive data, Vishwa is the perfect person to discuss decentralized AI with. In the following, we'll walk through five questions about his research, decentralized AI, and more.
1. What's your definition of decentralized AI?
Vishwa Raman: AI technologies have existed for far longer than the idea of decentralized AI (DeAI). DeAI is less an evolution of AI than an attempted revolution, one that has gained momentum since the advent of Large Language Models (LLMs) and ChatGPT. Decentralized AI responds to an emergent concern: that AI is too powerful to be controlled by a select few, and needs to be democratized with transparency around training, inference, and incentives.
Decentralized AI, in my view, is best captured by the following visual:
AI is a powerful technology, and the strides we have seen over the last few years have been breathtaking. It is poised to become the most significant derivative data product in generations. Humanity should retain the ability to understand what data is used to build models, to evaluate how useful those models are when applied to specific human-scale problems, and to ensure that value flows all the way back to the contributors of the data and models.
Anyone, anywhere, should be able to build models that solve problems and improve outcomes for everyone. Transparency in training and inference, data provenance, data utility, incentive mechanisms, governance, fairness, and inclusivity are critical, and they are best enabled using trustless, confidential, and privacy-preserving blockchain technologies.
2. How did you start working on decentralized AI, and where do you see it going?
VR: My earliest foray into deep learning was building a multi-modal driver-assistance system, where a driver could talk to an AI assistant as they would talk to a friend, with context augmented by where the driver was looking. This was in 2012, using Theano, a framework that predates TensorFlow, to build a multi-layer perceptron that classified driver head orientation. Needless to say, we have come a long way in the 12 years since that work.
My earliest foray into decentralized AI was building a simple logistic regression model for classification to run as a smart contract on our early Rust-based runtime, predating Emerald! This onchain proof-of-concept could handle only minuscule datasets of a few thousand vectors and a handful of classes. It affirmed that large-scale machine learning onchain, even if realizable, would be prohibitive in both time and cost. We therefore built Parcel, which supported offchain confidential compute using Google Confidential VMs: the expectation was that we would commit data access requests and grants onchain, store data confidentially offchain, and relegate compute-intensive operations, such as ML training, to offchain workers.
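To give a sense of scale, a classifier like the one in that proof-of-concept can be sketched in a few dozen lines. This is a hypothetical illustration in Python, not the original Rust smart contract; the point is that even this tiny workload multiplies out to many metered operations when every arithmetic step costs gas in a replicated runtime.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, epochs=200, lr=0.5):
    """Binary logistic regression by stochastic gradient descent.

    Every multiply-add here would be a metered operation onchain,
    which is why such models only work for minuscule datasets.
    """
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# A toy dataset: the class is simply the first feature.
xs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ys = [0, 0, 1, 1]
w, b = train(xs, ys)
print([predict(w, b, x) for x in xs])  # → [0, 0, 1, 1]
```

Scaling the inner loop to even a few thousand vectors makes the cost argument above concrete: the work grows with dataset size times epochs, while fully replicated execution multiplies that again by every node in the network.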
But Parcel may have been ahead of its time: AI training via federated learning remained in the prototype and discussion stages. With the advances in transformer-based models such as LLMs, we are at a fascinating stage where our vision is well within our grasp, given Sapphire and Runtime Offchain Logic (ROFL), a newly launched generalized computing framework that enables arbitrary applications to function in a decentralized, verifiable, and confidentiality-preserving way. This means the benefits we have pursued since the formation of the company (mentioned in the graphic above) are within reach.
3. Say a bit more about DeAI and ROFL. What will it enable?
VR: Data is non-rival: once it is shared in its raw form, it can be reproduced, reshared, and reused ad infinitum without the data owner being aware of how it is used and monetized. Consider AI pipelines: the data used to train models often has a clearly defined owner. Individuals own their identity information, demographic data, financial data, wealth and buying patterns, thought leadership pieces, works of fiction and non-fiction, talks, lectures, and so on. Corporations own the vast amounts of data for which they hold data rights, and the products they build with that data provide services and improve outcomes (for example, healthcare systems).
Once this data is shared, it is difficult to know how much value is derived from the data via derivative products such as AI models. Responsible and privacy-preserving use of data brings us to the following requirements:
A. Data use requires data owner consent.
B. Data use should be within confidential environments so that the data is used by an attested, verified algorithm and is not accessible to any individual or corporation.
C. The infrastructure that performs computation should be verifiably confidential and not owned by any single entity to ensure trustlessness.
D. The value derived from data, either in kind or otherwise, should be fed back to the data owners.
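Requirements A through C can be made concrete with a small sketch: consent is recorded against the identity (hash) of a specific algorithm, and data may only be used by code whose identity matches a grant. All names here (`ConsentRegistry`, `attest`, `run_confidentially`) are hypothetical illustrations, not Oasis APIs, and `eval` merely stands in for loading attested code into a confidential environment.

```python
import hashlib

class ConsentRegistry:
    """Onchain-style record of which algorithm hashes each owner approved."""
    def __init__(self):
        self.grants = {}  # (owner, algo_hash) -> True

    def grant(self, owner, algo_hash):
        self.grants[(owner, algo_hash)] = True

    def is_granted(self, owner, algo_hash):
        return self.grants.get((owner, algo_hash), False)

def attest(algorithm_source: str) -> str:
    """Stand-in for remote attestation: the hash identifies the exact code."""
    return hashlib.sha256(algorithm_source.encode()).hexdigest()

def run_confidentially(registry, owner, data, algorithm_source):
    """Run the algorithm only if the owner consented to this exact code."""
    algo_hash = attest(algorithm_source)
    if not registry.is_granted(owner, algo_hash):
        raise PermissionError("no consent recorded for this algorithm")
    # Inside a real TEE, `data` would never leave the enclave;
    # only the algorithm's result flows out.
    return eval(algorithm_source)(data)

registry = ConsentRegistry()
algo = "lambda rows: sum(rows) / len(rows)"  # a vetted aggregate, not raw access
registry.grant("alice", attest(algo))
print(run_confidentially(registry, "alice", [1, 2, 3], algo))  # → 2.0
```

The design choice worth noting: consent binds to a code hash, not to a party, so changing even one byte of the algorithm invalidates the grant, which is the property attestation provides in real confidential-computing environments.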
Sapphire and the Oasis Network enable all of the above. For AI pipelines, we have the following additional requirements:
A. Training workloads should run with confidentiality for the training data, with the resulting models generated and stored confidentially and with defined ownership. The data should never be used for any purpose other than training.
B. Derivative products such as AI models, which “carry” the intelligence in the data and are the model providers’ intellectual property, should be used within confidential environments so that the inference results flow out, but the models remain private and confidential.
These requirements cannot be satisfied in fully replicated, verifiable computing environments such as blockchain runtimes like Sapphire. Compute costs would be prohibitive, and constrained blockchain runtime environments preclude network access, non-deterministic algorithms, and communication between compute nodes. Storage costs would also be infeasible, since AI models often run to billions of bytes.
This brings us to ROFL, which, in simple terms, provides the same mechanisms for verifiable computing as Sapphire and the Oasis Network, but with the ability to run compute-intensive workloads, such as AI training and inference, offchain to enable scale and reduce cost.
From an architectural standpoint, ROFL is an extension of Sapphire that enables anyone to train AI models and perform inference offchain, with confidentiality guarantees similar to onchain computation at a fraction of the compute and storage costs. As an extension of Sapphire, ROFL enables these AI workloads to seamlessly use Sapphire for verifiability, provenance tracking, usage tracking, and value determination and transfer, extending trustlessness and transparency to AI pipelines.
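The architectural split described above can be sketched as follows: the compute-heavy step runs offchain, and only a small commitment over its inputs and outputs touches the chain for later verification. This is an illustrative toy, not ROFL's actual interface; the `Chain` class and `offchain_inference` function are hypothetical stand-ins.

```python
import hashlib
import json

class Chain:
    """Stand-in for onchain storage: holds only small commitments."""
    def __init__(self):
        self.commitments = []

    def commit(self, record: dict):
        # A hash commitment is cheap to store and later verifiable
        # against the full offchain record.
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        self.commitments.append(digest)

def offchain_inference(model, inputs):
    """The compute-heavy step: runs on an offchain (ROFL-style) worker."""
    return [model(x) for x in inputs]

chain = Chain()
model = lambda x: x * 2  # placeholder for a large AI model
inputs = [1, 2, 3]
outputs = offchain_inference(model, inputs)

# Only a fixed-size commitment goes onchain, however large the model is.
chain.commit({"inputs": inputs, "outputs": outputs})
print(outputs)  # → [2, 4, 6]
```

The point of the split: storage and compute on the chain stay constant-size regardless of model size, while anyone holding the offchain record can recompute the hash and check it against the onchain commitment.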
Anyone should be able to run ROFL nodes and support a decentralized network of confidential computing for use in AI training and inference. Validators can co-locate ROFL nodes with their Sapphire and consensus nodes and establish a confidential computing mesh for decentralized AI. That would be the vision.
4. What objectives does decentralized AI aim to accomplish?
VR: The following are objectives that DeAI can tackle using a blockchain network with or without confidentiality.
A. Provenance tracking for the data that goes into training decentralized AI models so that consumers know exactly what data was used to build each model to enable informed choice in using these models.
B. Transparency in value flows, where payments for model use are in crypto, opening up the possibility of compensating data owners in a verifiable, non-repudiable manner given provenance.
C. Establishment of Decentralized Autonomous Organizations (DAOs) that bring together a consortium of related industry entities to provide their data and/or models for use, with onchain tracking of the value each organization contributes, based on usage, to enable transparency in incentives and democratization of governance.
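Objectives A and B can be illustrated together with a toy sketch: each training dataset is identified by a content hash, a model records which dataset hashes went into it, and a payment for model use is split among those contributors. The names (`ModelRegistry`, `dataset_id`) and the even split are illustrative assumptions; real incentive schemes would weight contributions by measured data utility.

```python
import hashlib

def dataset_id(data: bytes) -> str:
    """Content hash serving as a provenance identifier for a dataset."""
    return hashlib.sha256(data).hexdigest()

class ModelRegistry:
    """Contract-style bookkeeping: which datasets built which model."""
    def __init__(self):
        self.provenance = {}  # model name -> list of dataset ids

    def register(self, model, dataset_ids):
        self.provenance[model] = list(dataset_ids)

    def split_payment(self, model, amount):
        """Divide a model-use payment evenly among contributing datasets."""
        ids = self.provenance[model]
        share = amount / len(ids)
        return {d: share for d in ids}

reg = ModelRegistry()
d1 = dataset_id(b"hospital-a-records")
d2 = dataset_id(b"hospital-b-records")
reg.register("triage-model-v1", [d1, d2])
print(reg.split_payment("triage-model-v1", 10.0))  # each dataset gets 5.0
```

Because the provenance record and the payment split both live onchain, a consumer can verify exactly which data built a model, and a contributor can verify they were paid for every use, which is the non-repudiation property the objectives above call for.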
Note that without confidentiality in use, as provided by the likes of ROFL, data and models will remain where they are: sharing them requires trust, which has not worked well in the past and has led to data silos. With confidential runtimes like Sapphire and verifiably confidential offchain computation provided by ROFL, we can tackle the following additional objectives.
A. Enable the movement of data and models from the clouds that host them to any decentralized node that provides confidential computing.
B. Enable joint training of AI models using data from different entities, given confidentiality of the data in use.
C. Enable models to freely participate in marketplaces without explicit trust, as the infrastructure handles the required confidentiality and integrity.
In short, ROFL puts the decentralized in decentralized AI.
5. Can you share some info on your current research?
VR: Most of my work, if not all, is in privacy-preserving access to sensitive data. One aspect of what I do is enabling access to regulated data, such as HIPAA-regulated healthcare data, with minimal friction, using differential privacy. The other aspect, of which I am an ardent fan, is the democratization of healthcare using AI. I cannot wait for the day we realize a DAO of healthcare institutions that share and deploy AI models, so that every doctor has the same ability to help her patients as the best-regarded specialist for her patient’s condition. A doctor should be able to call a “friend” to discuss a patient’s condition and jointly determine the best treatment option available, to ensure the best long-term outcomes for her patient.
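The differential-privacy approach mentioned above can be illustrated with the classic Laplace mechanism: a count query over sensitive records is released with calibrated noise so that no individual record is identifiable, while aggregate statistics stay useful. This is a minimal textbook sketch, not the system described in the interview; `dp_count` and its parameters are illustrative.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon=1.0):
    """Release a count with epsilon-differential privacy.

    The sensitivity of a counting query is 1 (adding or removing one
    record changes the count by at most 1), so Laplace noise with
    scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Example: how many patients in a cohort are under 50?
ages = [34, 61, 45, 72, 29, 55, 48, 67]
noisy = dp_count(ages, lambda age: age < 50, epsilon=1.0)
print(noisy)  # close to the true count of 4, but randomized
```

Smaller `epsilon` means stronger privacy and noisier answers; the friction-reducing work mentioned above amounts to choosing such parameters so that regulated data remains useful to researchers without exposing any individual.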