Tech
March 19, 2025

JOLTx: Reaching for SOTA in Zero-Knowledge Machine Learning (zkML)

Zero-knowledge machine learning (zkML) is where cryptographic magic meets AI. Imagine proving that a machine learning model produced a certain result without revealing the model’s inputs, inner workings, or results. That’s the promise of zkML – and several cutting-edge approaches are vying to make it a reality. In this post, we’ll dive into three leading zkML paradigms – JOLT (modified for ML operations with precompiles: JOLTx), EZKL (Halo2-based), and DeepProve (GKR-based) – comparing how they work, how they perform, and why JOLT’s lookup-centric approach will shake up the industry like a snowglobe in a taxi.

What Is zkML?

Zero-knowledge machine learning (zkML) is an emerging field that combines zero-knowledge proofs (ZKPs) with machine learning (ML) to enable verifiable and privacy-preserving ML computations. zkML allows a prover to demonstrate that an ML model was executed correctly without revealing sensitive inputs or requiring the verifier to rerun the computation. It can also be used to hide the model weights, or the model itself, from the verifier. This is crucial for applications in decentralized finance, privacy-preserving AI, and secure off-chain computations. By ensuring that ML inference is both private and trustless, zkML paves the way for transparent and scalable AI-powered applications in blockchain, Web3, and beyond.

Technical Foundations of zkML

JOLT-based zkML (JOLTx) – JOLT is a zkVM, a zero-knowledge virtual machine that can prove the execution of arbitrary programs (JOLT paper, JOLT blog).

JOLT targets the RISC-V instruction set, meaning you can compile any program (written in a high-level language like Rust or C++) down to RISC-V assembly and then generate a proof that this assembly code ran correctly​. Under the hood, JOLT introduces a new front-end based on a concept called the lookup singularity: instead of heavy algebraic constraints for each operation, it converts CPU instructions into lookups in a gigantic pre-defined table of valid instruction outcomes​. Every step of computation (like an addition, multiplication, or even a bitwise operation) is verified by checking it against this table via a fast lookup argument named Lasso​, or more recently Shout.

The remarkable result is that JOLT’s circuits only perform lookups into these tables, drastically simplifying proof generation. For example, a 64-bit OR or AND operation – which would be expensive to enforce with normal arithmetic constraints – becomes a single table lookup check under JOLT​. The design ensures the prover’s main job per CPU instruction is just committing a handful of field elements (about six 256-bit numbers per step) and proving those lookups correct​. This makes JOLT a general and extensible approach: any ML model can be treated as a program and proved with minimal custom circuit design.

We propose contributing to the JOLT stack with specialty precompiles for ML operations. Non-linearities can be handled efficiently via lookups, while other common operations can be handled by sum-check and a few other techniques unique to the JOLT zkVM. We are calling this extension JOLTx, but it’s really just JOLT with a few added tools and specialty precompiles. We expect to release it in a few months, so stay tuned!
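To make the precompile idea concrete, here is an illustrative sketch (plain Python, not JOLTx code) of how a non-linear activation such as sigmoid reduces to a table lookup over quantized inputs, exactly the kind of relation a lookup argument proves cheaply. The 16-bit quantization and fixed-point scale here are assumptions chosen for illustration.

```python
# Illustrative sketch (not JOLTx code): a non-linear activation expressed as a
# table lookup over quantized inputs. A lookup-based precompile would prove
# "output = TABLE[input]" rather than arithmetizing the function itself.
import math

SCALE = 2**8   # fixed-point scale (assumption for illustration)
BITS = 16      # quantized input width (assumption for illustration)

def build_sigmoid_table():
    # Precompute once: for every 16-bit input, the quantized sigmoid output.
    table = []
    for u in range(2**BITS):
        x = (u - 2**(BITS - 1)) / SCALE      # interpret as signed fixed-point
        y = 1.0 / (1.0 + math.exp(-x))
        table.append(int(round(y * SCALE)))
    return table

SIGMOID_TABLE = build_sigmoid_table()

def sigmoid_lookup(u: int) -> int:
    # The prover's job reduces to showing this indexed read was consistent.
    return SIGMOID_TABLE[u & (2**BITS - 1)]

# Example: x = 1.5 in fixed point
u = (int(1.5 * SCALE) + 2**(BITS - 1)) & 0xFFFF
print(sigmoid_lookup(u) / SCALE)   # ~0.82
```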

What other cool zkML projects are out there?

EZKL (Halo2-based) – EZKL takes a more traditional SNARK circuit approach, built on Halo2.

Instead of emulating a CPU, EZKL works at the level of the ML model’s computation graph. Developers export a neural network (or any computation graph) to an ONNX file, and EZKL’s toolkit compiles that into a set of polynomial constraints (an arithmetic circuit) tailored to the model.
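To make that pipeline concrete, here is a minimal sketch of the export step, assuming a small PyTorch model; the follow-on EZKL steps are indicated only as comments, since the exact CLI and API details vary by version.

```python
# Minimal sketch of the model-export step an EZKL-style pipeline starts from.
# The torch -> ONNX export is standard; the EZKL commands are indicative only.
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyNet().eval()
dummy = torch.randn(1, 16)

# Export the computation graph that the proving toolkit will arithmetize.
torch.onnx.export(model, dummy, "network.onnx",
                  input_names=["input"], output_names=["output"])

# From here an EZKL-style flow (roughly) compiles the ONNX graph into a Halo2
# circuit, generates a witness from concrete inputs, and produces a proof,
# e.g. along the lines of:
#   ezkl gen-settings -M network.onnx
#   ezkl compile-circuit -M network.onnx
#   ezkl gen-witness / ezkl prove / ezkl verify
# (command names are indicative, not authoritative)
```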

Each layer of the neural network – e.g. convolution, matrix multiplication, activation functions – is translated into constraints that the Halo2 prover must satisfy. To handle operations that aren’t naturally polynomial (like ReLU activations or big integer ops), Halo2 uses lookup arguments too, but in a more limited way. For instance, large tables (like all $2^{64}$ possibilities of a 64-bit operation) must be split or “chunked” into smaller tables (say 16-bit chunks), and multiple lookups plus recombination constraints are needed to simulate the original operation. This chunking adds overhead and complexity to circuits. Thus, EZKL’s proof generation involves creating many such constraints and using Halo2’s proving algorithm (typically with KZG or IPA commitments) to produce a proof. The strength of EZKL’s approach is that it’s model-aware – it can optimize specifically for neural network layers and even prune or quantize weights for efficiency. However, each new model or layer type may require custom constraint writing or at least regenerating the circuit, and the prover has to handle a large system of constraints, which can be slow for big models.
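A toy illustration of that chunking pattern: checking a 64-bit AND via four 16-bit lookups plus recombination (plain Python, not circuit code; the 16-bit chunk width is just an example).

```python
# Toy illustration of "chunked" lookups: a 64-bit AND is checked via four
# 16-bit lookups plus recombination, since a table for the full 64-bit
# two-operand relation could never be materialized.
MASK16 = (1 << 16) - 1

def and16(a: int, b: int) -> int:
    # Stands in for a table read AND16[(a << 16) | b]; real systems shrink
    # chunks further or, like Lasso, avoid materializing tables entirely.
    return a & b

def and64_via_chunks(a: int, b: int) -> int:
    out = 0
    for i in range(4):                       # one lookup per 16-bit limb
        a_i = (a >> (16 * i)) & MASK16
        b_i = (b >> (16 * i)) & MASK16
        out |= and16(a_i, b_i) << (16 * i)   # recombination constraint
    return out

a, b = 0xDEADBEEF_CAFEBABE, 0x0F0F0F0F_F0F0F0F0
assert and64_via_chunks(a, b) == a & b
```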

DeepProve (GKR-based) – DeepProve, by Lagrange, takes a different path using an interactive proof protocol known as GKR (Goldwasser–Kalai–Rothblum).

In essence, GKR treats the entire computation (like a forward pass through an ML model) as a layered arithmetic circuit and proves its correctness via the sum-check protocol rather than heavy polynomial arithmetic. DeepProve’s workflow is to ingest a model (also via ONNX), then automatically generate a sequence of computations that correspond to each layer of the neural network. Instead of directly turning this into a static SNARK circuit, it uses GKR to check each layer’s outputs against its inputs with minimal cryptographic hashing/commitment overhead. The prover in a GKR scheme runs the actual model computation but then engages in an interactive proof (made non-interactive with Fiat-Shamir) to convince the verifier that every layer was computed correctly.

​The beauty of GKR is that its prover complexity is linear in the size of the circuit (O(n)), with only a small constant factor slowdown compared to normal execution​. In fact, modern GKR-based systems can be <10× slower than plain execution for some tasks like large matrix multiplications​. DeepProve combines this with modern polynomial commitment techniques to output a succinct proof after the sum-check rounds, effectively creating a zkSNARK tailored to neural network inference. One trade-off is that GKR works best on structured computations (like the static layers of an NN) and involves more complex cryptographic protocol logic, but its strength is raw speed in proving deep computations.
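To give a feel for why sum-check is so verifier-friendly, here is a toy, non-zero-knowledge sum-check for a three-variable multilinear polynomial over a small prime field (a minimal sketch, not DeepProve’s implementation; the field and polynomial are arbitrary choices). The prover does the hypercube sums; the verifier does constant work per round plus one final evaluation of the polynomial.

```python
# Toy sum-check: the prover convinces the verifier of the sum of g over the
# boolean hypercube, with the verifier doing only O(n) work plus one final
# evaluation of g at a random point.
import random

P = 2**61 - 1   # toy prime field (real systems use different, larger fields)
N_VARS = 3

def g(x):
    # An example multilinear polynomial in three variables.
    x1, x2, x3 = x
    return (2 * x1 * x2 + 3 * x3 + x1 * x3 + 5) % P

def hypercube_sum(fixed):
    # Prover-side: sum g over all boolean assignments of the unfixed variables.
    free = N_VARS - len(fixed)
    total = 0
    for mask in range(2**free):
        point = list(fixed) + [(mask >> i) & 1 for i in range(free)]
        total = (total + g(point)) % P
    return total

claim = hypercube_sum([])   # the prover's claimed total sum
r = []                      # verifier's random challenges

for _ in range(N_VARS):
    # Prover sends the round polynomial via its values at 0 and 1
    # (degree <= 1 in each variable, since g is multilinear).
    g_at_0 = hypercube_sum(r + [0])
    g_at_1 = hypercube_sum(r + [1])

    # Verifier: consistency check against the running claim, then a challenge.
    assert (g_at_0 + g_at_1) % P == claim
    r_i = random.randrange(P)
    claim = (g_at_0 + (g_at_1 - g_at_0) * r_i) % P   # evaluate the line at r_i
    r.append(r_i)

# Final check: a single evaluation of g at the random point.
assert g(r) == claim
print("sum-check passed")
```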

Strengths & Weaknesses

Each approach has unique strengths and potential drawbacks:

JOLTx (Lookup based zkVM with precompiles)

Strengths: Extremely flexible (it can prove any code, not just neural nets) and benefits from JOLT’s “lookup singularity” optimization, making even bit-level operations cheap.

It requires no custom circuit per model – you compile and go – which greatly improves developer experience and reduces the chance of bugs​.

The use of Lasso lookups means the proof constraints scale mainly with the number of operations executed, not their complexity, giving JOLT a consistent cost model. 

Weaknesses: As a general VM, it may introduce some overhead for each instruction; for extremely large models with millions of simple operations, a specialized approach (like GKR) might achieve lower absolute proving time by streaming through the computation. Also, JOLT is relatively new – it relies on a novel lookup argument and complex ISA-level tables, which are cutting-edge tech that will need time to mature. But given its design, even JOLT’s current prototype outperforms prior zkVMs in efficiency​.


EZKL (Halo2/PLONK) 

Strengths: It’s built on a widely-used SNARK framework, which means it benefits from existing tooling, audits, and on-chain verifier support (Halo2 proofs can be verified with Ethereum-friendly cryptography). EZKL is fairly user-friendly for data scientists: you can take a PyTorch or TensorFlow model, export ONNX, and get a proof that your model inference was done correctly​.

It has seen practical integrations already (from DeFi risk models to gaming AI, as we’ll discuss) which show it’s capable of proving real ML tasks. 

Weaknesses: Performance can become a bottleneck as models grow. Traditional SNARK circuits often impose huge overhead – historically on the order of a million times more work for the prover than just running the model​.

Halo2’s approach tries to optimize, but operations like big matrix multiplies or non-linear activations still translate to lots of constraints. The need to chunk large lookups (e.g. for 32-bit arithmetic or non-linear functions) adds extra constraints and proving time.

Essentially, EZKL may struggle with very large networks (both in proving time and memory), sometimes requiring splitting the circuit or using special techniques to fit within practical limits. It’s a great general-purpose SNARK approach, but not the fastest when pushed to scale.

DeepProve (GKR) 

Strengths: Blazing fast proof generation for deep models. By avoiding the overhead of encoding every multiplication as a polynomial constraint, GKR lets the prover mostly just do normal numeric computations and then add a thin layer of cryptographic checks. DeepProve’s team reports 54× to 158× faster proving than EZKL on equivalent neural networks​.

In fact, the bigger the model, the more GKR shines: as model complexity increases, DeepProve’s advantage grows, since its linear scaling stays manageable while circuit-based approaches balloon in cost​.

Weaknesses: The approach is somewhat specialized to circuit-like computations (which luckily includes most feed-forward ML). It’s less immediately flexible if your workload has lots of conditional logic or irregular operations – those are easier to handle in a VM like JOLT. Additionally, making GKR proofs zero-knowledge and succinct can result in proof sizes larger than classical SNARKs and verification procedures that, while much faster than re-running the whole model, are not instant. (DeepProve’s proofs take on the order of 0.5 seconds to verify for CNNs, which is excellent for large models, but traditional SNARK verifiers can be in the milliseconds.)

So DeepProve is laser-focused on performance, possibly at the cost of a bit more complexity in the proof protocol and slightly heavier verification than something like a Halo2 proof. It’s a powerful approach for scaling zkML, especially in server or cloud environments, though maybe less oriented to light clients or on-chain verification without further optimization.

Performance and Efficiency Comparison

When it comes to raw performance, each zkML approach has different priorities. Proof generation time is often the critical metric. DeepProve’s GKR-based prover currently takes the crown in speed – benchmarks show it generating proofs 50× to 150× faster than EZKL on the same models.

This leap comes from GKR’s near-linear time algorithm that sidesteps the heavy polynomial algebra of SNARK circuits. In practical terms, a neural network inference that might take EZKL hours to prove could be done in minutes with DeepProve. And as models get larger (more layers, more parameters), this gap widens, because GKR’s per-operation overhead stays low while Halo2’s grows​.

JOLT’s performance goal is similarly ambitious – it aims to be an order of magnitude faster than existing SNARK frameworks. a16z’s team has demonstrated Lasso (the lookup engine in JOLT) running 10× faster than Halo2’s lookup mechanism, with a roadmap to reach a 40× improvement.

This means operations that were once a bottleneck (like those pesky bitwise ops or big-field arithmetic) become much cheaper. JOLT essentially trades computation for table lookups, and thanks to Lasso, looking up values in a huge virtual table is cheap. The cost does not grow with the table size (no, they aren’t actually storing $2^{128}$ rows in memory!) – it grows mostly with the number of lookups performed​.

So if your ML model does a million ReLUs, the prover cost grows with those million operations, but each is just a quick table check. Early results show JOLT’s prover needing to handle only a handful of field element commitments per instruction step​, which is extremely low overhead. In short, JOLT is optimizing the heck out of the computation per operation, slashing the traditional SNARK overhead by using precomputed knowledge (the lookup tables) to skip doing hard math on the fly​.
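As a back-of-envelope sanity check (an illustration, not a benchmark), those roughly six 256-bit committed field elements per instruction imply committed witness data that scales linearly with the number of instructions executed:

```python
# Back-of-envelope only: committed witness size if the prover commits roughly
# six 256-bit field elements per executed instruction, as quoted above.
FIELD_BYTES = 32          # 256-bit field elements
ELEMS_PER_STEP = 6        # approximate per-instruction figure

def committed_witness_bytes(n_instructions: int) -> int:
    return n_instructions * ELEMS_PER_STEP * FIELD_BYTES

for n in (10_000, 1_000_000, 10_000_000):
    mib = committed_witness_bytes(n) / 2**20
    print(f"{n:>10,} instructions -> ~{mib:,.0f} MiB of committed field elements")
```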

EZKL with Halo2, while slower in comparison, isn’t standing still either. It has benefitted from Halo2’s optimizations like custom gates and partial lookups. For moderate-sized models, EZKL is perfectly usable and was shown to outperform some earlier alternatives (it was about 3× faster than a STARK-based approach on Cairo, and 66× faster than a RISC-V STARK in one 2024 benchmark).

This indicates that a well-optimized SNARK circuit can beat a more naively implemented VM. The downside is that to reach those speeds, EZKL and Halo2 had to carefully tune everything, and they still inherit the fundamental cost of polynomial commitments, FFTs, and proof arithmetic. By contrast, JOLT’s and DeepProve’s newer methods avoid much of the FFT or high-degree polynomial overhead – JOLT by limiting itself to lookups (plus a new argument to handle them efficiently), and DeepProve by using sum-check (which works over multilinear polynomials and requires only lightweight hashing commitments)​.

In classical SNARKs, a lot of time is spent doing things like computing large FFTs or multi-scalar multiplications; GKR largely avoids FFTs by operating on boolean hypercubes and multilinear extensions​, and JOLT avoids huge FFTs for lookups by never materializing giant tables in the first place​.

In terms of proof size and verification, there are trade-offs. JOLT and EZKL (Halo2) ultimately produce standard SNARK proofs – typically a few kilobytes – that can be verified quickly (a few pairings or some polynomial evaluations). DeepProve’s approach, being akin to a STARK, might produce larger proofs (tens of KB to maybe hundreds, depending on the model) and a verification that, while dramatically faster than re-running the model, might involve more steps than verifying a succinct KZG-proof. The DeepProve team highlighted that their verification is up to 671× faster than having a GPU simply re-compute an MLP to check it​, bringing verification down to ~0.5s for fairly complex models​.

That’s a strong result, indicating that even if the proof is bigger, it’s still far easier to verify than doing the raw AI computation (especially important if the verifier is a smart contract or a light client with limited compute). Halo2 proofs are smaller and even faster to check, but the difference is somewhat moot if proving them in the first place is too slow for your application.

A big aspect of efficiency is how these frameworks handle special operations. For example, in ML inference we often have non-linear steps (like ArgMax, ReLU, or Sigmoid). EZKL might handle a ReLU by enforcing a constraint with a boolean selector (which itself might be done via a lookup or additional constraints). JOLT, by contrast, can implement ReLU in the program (as a couple of CPU instructions branching on a sign bit) and prove those branches cheaply through lookups – essentially leveraging the CPU’s ability to do comparisons. DeepProve can accommodate piecewise-linear functions by incorporating them into the arithmetic circuit that GKR will verify, but highly non-linear functions (like a sigmoid) might need polynomial or lookup approximation. In general, JOLT’s philosophy is to make all the weird stuff look normal by executing actual code (with conditionals, loops, etc.) and using the lookup argument to cover any operation’s logic.​
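The contrast is easiest to see on ReLU itself. Here is an illustrative sketch (not code from any of these systems) of the two encodings: a circuit-style check with a boolean selector, versus the plain branch a zkVM guest would simply execute and prove via lookups.

```python
# Sketch of the two ways a ReLU can be handled, as described above.

# (1) Circuit-style: the prover supplies y and a boolean selector s, and the
#     circuit enforces constraints like these (range checks omitted):
#         s * (s - 1) == 0        # s is boolean
#         y == s * x              # y equals x when s = 1, else 0
#         s == 1  iff  x >= 0     # done with a range/lookup gadget in practice
def relu_constraints_hold(x: int, y: int, s: int) -> bool:
    return s in (0, 1) and y == s * x and (s == 1) == (x >= 0)

# (2) zkVM-style: the guest program simply branches; the proof covers the
#     executed instructions (compare, branch, move) via lookups.
def relu_program(x: int) -> int:
    if x > 0:
        return x
    return 0

x = -7
y = relu_program(x)
assert relu_constraints_hold(x, y, s=1 if x >= 0 else 0)
```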

Halo2’s philosophy is to restrict everything to polynomial equations (sometimes cumbersome for complex ops), and GKR’s is to break computations down to sums and products that can be recursively checked. The efficiency of each approach in handling these will reflect in proof times: JOLT likely excels at control-heavy logic, GKR at large linear algebra crunching, and Halo2 sits in the middle with a balance but higher overhead on both counts.

For verification of Kinic proofs we are using the Internet Computer blockchain, since it can handle larger proof verification much more quickly than other L1s. We are also not limited by precompiles that would require tricky workarounds to get JOLTx verification on-chain.

Scalability for Large-Scale Models

When we talk scalability in zkML, we could mean two things: handling large models (many layers, many parameters) and handling diverse models (different architectures or even dynamic models). Let’s consider large-scale deployment first – say you want to prove inference on a big CNN or a transformer with millions of parameters.

DeepProve is explicitly built for this scenario. Its performance gains multiply as models grow. If a small model (say a tiny MLP) is, for example, 50× faster to prove with DeepProve than with EZKL, a much larger model might be 100×+ faster​.

The team notes that as we scale to models with millions of parameters, DeepProve will become even faster relative to alternatives, and they plan to push this further with parallel and distributed proving techniques​. GKR is friendly to parallelization because many sub-calculations (like all neurons in a layer) can be checked in a batch. Also, GKR doesn’t require a giant monolithic “proving key” that grows with circuit size – it works more on-the-fly – so memory usage can be lower. This makes DeepProve promising for cloud-scale or even on-chain verification of fairly large neural nets in the future.

EZKL (Halo2) can handle reasonably large networks, but there are limits. Building a single huge circuit for a big model can demand a lot of memory and time (tens of gigabytes of RAM and hours of compute in some cases). The EZKL team has been exploring methods to improve this, such as circuit splitting and aggregation (proving pieces of the model separately and then combining proofs)​, and quantization strategies to reduce the arithmetic complexity. Still, a general SNARK like Halo2 will face challenges beyond a certain scale without specialized optimizations. It might be best suited for small to medium models or cases where proving happens offline and infrequently (or where strong hardware is available). The plus side is that once you generate a proof, verifying it stays easy even for large models – which is great for on-chain scenarios where a smart contract might verify a proof of a 100 million parameter model, something absolutely infeasible to recompute on chain.

JOLT sits in an interesting position on scalability. On one hand, it’s a SNARK-based approach so it also has to do work roughly linear in the number of operations executed (like Halo2 and GKR both do). On the other hand, JOLT’s constant per operation is very low due to the lookup-based technique. If we consider a “large-scale deployment” like running a model with, say, 10 million operations, JOLT would have to perform those 10 million lookups and corresponding commitments. That’s heavy, but not fundamentally heavier than what GKR or even a straightforward circuit would do – it’s still linear.

The question is: how well can JOLT optimize and parallelize that? Since JOLT treats each instruction as a step, it could potentially parallelize proving by executing multiple instructions in parallel (akin to multiple CPU cores) if the proof system supports splitting the trace. The current research suggests they’re focusing on single-core performance first (with that ~6 field elements per step cost), but because the approach is a virtual machine, one could envision distributing different segments of the program trace across provers in the future (this is speculative, but not off the table given lookup arguments can be composed). Even without fancy distribution, JOLT can leverage existing polynomial commitment tricks – e.g. a universal setup that supports circuits of arbitrary size so it doesn’t need a new trusted setup for a bigger model.

In terms of handling different ML architectures, JOLT shines: whether your model is a CNN, an RNN with loops, a decision tree ensemble, or some new hybrid, as long as you can run it on a CPU, JOLT can prove it. That makes it highly scalable in a development sense – you don’t need to redesign your proving method for each new model type.

DeepProve and other GKR-based methods are currently tailored to typical deep neural networks (layers of matrix ops and elementwise functions). They scale wonderfully with depth and width of those networks, but if you threw a very irregular workload at them (say a model that dynamically decides to skip layers or has data-dependent loops), the framework might need adaptation or might lose some efficiency. However, most large-scale ML models in deployment (vision, NLP, etc.) have a regular structure, so that’s fine.

One could ask: which approach is best for real-time or on-device usage, as opposed to cloud-scale? Real-time implies we want very low latency proofs even for smaller models. JOLT’s approach, being a SNARK, might allow smaller models to be proven in seconds or less on a decent device, especially as the tech matures. EZKL on a mobile CPU would likely be slower (Halo2 proving isn’t exactly mobile-friendly yet, though there are efforts to accelerate it). DeepProve could leverage GPUs effectively – if you have a GPU on device, it might actually prove a small model extremely fast (GKR loves parallel hardware). But DeepProve on a CPU might not be as optimized as JOLT on a CPU for real-time scenarios. So scalability isn’t just about handling bigger – it’s about handling the right size efficiently in the right environment. JOLT is aiming to be an all-purpose workhorse across environments, which in the long run makes it a strong candidate for both cloud and edge deployment.

JOLTx: Redefining zkML Capabilities With Sum-check and Lookups

With all these innovations, why do we put a spotlight on JOLT-based zkML as the best option moving forward? The answer comes down to versatility, performance, and real-world practicality – a combination that is hard to beat.

First, JOLT introduces a fundamentally new paradigm for building SNARKs. Instead of contorting high-level programs into circuits gate by gate, JOLT realizes the vision of designing circuits that only do lookups. This means complexity is handled in a precomputation phase (defining those huge instruction tables), and the online phase is dead-simple: just prove you did valid lookups. It’s like turning every hard operation into an “O(1)” step in the circuit. This drastically lowers the prover overhead for tasks that were traditionally painful in SNARKs (e.g. bit fiddling, arbitrary branching logic). For machine learning models, which often have a mix of linear algebra (which SNARKs handle reasonably well) and non-linear decisions or data transformations (which SNARKs handle poorly), JOLT provides a balanced solution – the linear algebra still has to be done step by step, but each arithmetic op is straightforward, and any non-linear decision (like “if neuron > 0 then…” in ReLU) is also straightforward because the VM can simply branch and the proof of the correct branch is just a lookup check.

Secondly, JOLT is fast and getting faster. The research behind it shows an immediate >10× prover speedup vs. mainstream SNARK tools​, and hints at reaching 40× with optimizations​. It also inherits a lot of prior SNARK advancements: it uses modern polynomial commitment schemes and can leverage existing cryptographic libraries. The Lasso argument at its core was built with efficiency in mind and already demonstrated superior performance to older lookup arguments (which were the bottleneck in systems like Halo2).

​What this means for real-time or local ML is that suddenly the idea of generating a proof on, say, a smartphone or a laptop doesn’t sound so crazy. If your model inference takes 100 milliseconds normally, maybe JOLT can prove it in a few seconds – that’s a big deal! Contrast that with older approaches where that proof might have taken minutes or hours, making it impractical outside of server farms. JOLT’s efficiency gains bring zkML closer to the realm of interactive use. We could envision, for example, a browser extension that uses JOLT to prove “I ran a vision model on the image you just uploaded and it contains no NSFW content” in real-time before letting the image be posted. Or a car’s onboard computer proving to insurance servers that it indeed used a verified driving model during an autonomous drive. These scenarios require quick turnaround, and JOLT’s speed makes it feasible.

Another huge factor is developer and user experience. JOLT lets developers work in the languages and models they’re comfortable with (via RISC-V compilation and, soon, ONNX conversion). You don’t need to know the intricacies of SNARK circuits to use it. For the ML world, this is crucial: most ML engineers aren’t cryptography experts, and they shouldn’t have to be. With JOLT, they can write or compile existing code and still get proofs. This approach is analogous to the early days of GPUs – initially, only graphics experts wrote GPU code, but eventually general frameworks allowed any programmer to utilize GPU acceleration. JOLT is like the “GPU for ZK proofs” in that sense: a specialized engine that is accessible through standard toolchains. This lowers the barrier to adoption dramatically. We’re likely to see libraries that package common machine learning tasks (like model inference, model accuracy proofs, etc.) on top of JOLT so others can plug-and-play.

The auditability of JOLT is another subtle game-changer. Because it’s essentially proving a standard ISA’s execution, it’s easier to reason about and audit than a bespoke circuit. You can rely on the well-defined RISC-V spec and the correctness of the lookup tables, rather than verifying tens of thousands of hand-written constraints. This means higher confidence in proofs of critical ML models. For example, if a model is used in court or in a medical decision, having a clear trail (“this program was executed correctly”) is much more assuring than “this custom circuit that only a handful of experts understand was satisfied.” Auditors could even single-step through a VM execution trace if needed, something not really possible with monolithic circuit proofs.

JOLT-based zkML synthesizes theoretical elegance with practical implementation – the performance breakthroughs needed for scalability and the flexibility needed for broad adoption. It turns zero-knowledge proofs into a developer-friendly, high-speed utility. While Halo2-based and GKR-based approaches have blazed the trail in showing what’s possible (and will continue to be used for specific strengths), JOLT is poised to unify and uplift the zkML field. It’s like the jump from hand-crafted assembly to high-level programming – once a general-purpose, efficient solution exists, it empowers an entire ecosystem to flourish. For anyone looking to deploy verifiable machine learning in practice, JOLT offers a clear and compelling path forward: fast proofs, any model, any time. The future of zkML belongs to the carefully crafted zkVM and its precompiles!

Wyatt Benno

Technical founder focused on portable verifiable compute.