
Febo on Pinocchio, p-token, and Pushing Solana's Limits
Written By
Fernando Otero (Febo) and Brian Wong
May 11, 2026
If you've written a Solana program in the last year, you've probably heard of Pinocchio. It started as a sidebar conversation at a London hacker house about dependency hell and has since become the foundation for a new generation of highly optimized Solana programs. The first of those, p-token, is a drop-in reimplementation of SPL Token that brings transfer costs down from 4,645 compute units to 76 and transfer_checked from 6,200 compute units to 105, alongside a set of new instructions designed for how programs actually use the token program today.
A few quick definitions before we get into the conversation:
Pinocchio is a Rust library for writing Solana programs with no external dependencies. The name comes from "no strings attached," a joke about the lack of dependencies. By rewriting the types a program needs from scratch, Pinocchio avoids the external dependency conflicts that used to plague the solana-program crate, and along the way it opens up optimizations that are not possible when you're pulling in the full SDK.
Zero-copy account access means reading account data directly from the input buffer instead of deserializing it into owned types. For programs like the token program, where most instructions touch a small number of fixed-layout accounts, this avoids a huge amount of unnecessary work.
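To make the idea concrete, here is a minimal sketch of a zero-copy read. It assumes the standard SPL token account layout (the mint and owner pubkeys occupy the first 64 bytes, followed by the 8-byte little-endian amount); it is illustrative only, not p-token's actual code:

```rust
/// Read the token amount directly from raw account data, with no
/// intermediate owned struct and no allocation. Layout assumption:
/// mint (32 bytes) and owner (32 bytes) come first, so the 8-byte
/// little-endian amount starts at byte offset 64.
fn read_amount_zero_copy(data: &[u8]) -> Option<u64> {
    let bytes: [u8; 8] = data.get(64..72)?.try_into().ok()?;
    Some(u64::from_le_bytes(bytes))
}

fn main() {
    // A 165-byte buffer standing in for SPL token account data.
    let mut data = [0u8; 165];
    data[64..72].copy_from_slice(&1_000u64.to_le_bytes());
    assert_eq!(read_amount_zero_copy(&data), Some(1_000));
}
```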
p-token is a reimplementation of SPL Token built on Pinocchio. It preserves the exact behavior of the original program, instruction by instruction and error by error, while cutting compute costs by an order of magnitude. SIMD-0266, which authorizes the upgrade, was approved on March 14, 2026. p-token also introduces three new instructions: Batch, WithdrawExcessLamports, and UnwrapLamports.
What's new for developers: programs that use the token program for CPIs will see meaningful CU reductions automatically once p-token ships. Programs that adopt Batch can compress multiple token operations into a single CPI, paying the 1,000 CU base invocation fee once instead of once per instruction. We've measured roughly 12 to 13 percent block space recovery from the p-token switch alone, and that figure does not yet account for the further gains from Batch.
Background and entry into Solana
Q: What was your path from academia into crypto and Solana? What convinced you to go all in, and how does your academic background still show up in your work?
My path was a little unusual. My research area was evolutionary computation: genetic algorithms, particle swarm optimization, ant colony optimization, that family of techniques. In academia it's very easy to get specialized in a narrow lane and stay there. You can be very successful, but you miss what's happening outside. I was always making an effort to look beyond my own research and pay attention to what was happening in technology more broadly.
When blockchains started creating a real buzz, I got curious. I was lucky that I did some reading before jumping in. At the very beginning I was actually a bit skeptical. If you're going to put everyone's transactions on a blockchain, how is that going to scale? When I came across Solana, the focus on scalability stood out. This was a project actually building something that could work at scale. I consider myself lucky that I started in Solana directly.
I didn't jump in all at once. I was still working in academia and started doing two hours a week of Solana work. That quickly became four hours, because you can't really do much in two. After about four or five months of that, I realized I was already spending way more than four hours on it and having a lot of fun. That was the moment I went full time.
The academic background still shows up in how I approach problems. Research trains you to define a problem precisely, find a metric, and iterate. A lot of what we did with p-token was exactly that: a clearly defined behavior to preserve, a clear metric to drive down, and a tight loop of measuring and changing one thing at a time.
The origin of Pinocchio
Q: Pinocchio started as a conversation with Jon Cinque at a London hacker house. What problem was discussed and when did you realize you were building something bigger than a refactor of solana-program?
There was no plan. I had just joined Anza, and the London hacker house happened to be a month after I started. It was the first time I was meeting Jon in person after joining. We weren't going there to design a new library.
That same week, people at the hacker house were running into yet another round of dependency conflicts. At the time, if you wanted to write a program, even if you were using Anchor, you needed the solana-program crate. That crate had a huge number of dependencies, many of which had nothing to do with on-chain programs. It existed to serve other use cases too, so it pulled in everything those use cases needed. It was extremely easy to end up with two versions of the same dependency in conflict, and there was no clean way to resolve it. That had happened three or four times in a short period, and it was happening again that week.
So I went to Jon and said, can we do something about this? The conversation started right there. The idea was simple: build a library focused on programs only, with no external dependencies. Jon picked the name Pinocchio in that same conversation. "No strings attached," because there are no dependency strings.
The catch is that if you don't bring in the SDK, you have to rewrite all the types you need from scratch. Pubkey, AccountInfo, instruction parsing, all of it. And once we started rewriting, we realized we had an opportunity to make those types more efficient than the SDK version. Cavey had already pointed out a few places where the SDK was doing more work than it needed to, so the timing was perfect. At the beginning, efficiency was not the main goal. The main goal was eliminating dependency conflicts. But once you're rewriting the foundation anyway, you may as well rewrite it well.
It started as a small experiment, almost a "let's see what this looks like" thing. Very quickly we saw that it worked, and that it was significantly more efficient. At that point the decision to keep pushing was easy.
From building p-token to getting it to mainnet
Q: At Breakpoint 2024 in Singapore, Jon walked up and said "we're writing a new token program, transfers need to be 200 CUs or less." It currently sits at 4,645 CUs. What was your initial plan, and did you know rewriting the program in Pinocchio was going to be the answer?
Using Pinocchio was the plan from day one. By Breakpoint 2024, we had already been working on Pinocchio for several months. The first public release was either out or very close, and we knew what kind of efficiency gains we could get from it. So when Jon said "200 CUs or less," I had a baseline reason to believe it was reachable, but I wasn't sure we could actually hit that target.
The 200 CU value wasn't an arbitrary number. Jon had a rationale behind it. I asked him later, and he walked me through how he arrived at it. The only requirement he gave me was that single number.
The way I worked on it was to write just enough of the program to be able to run a transfer, and then optimize. I wasn't going to write the whole program, find it was too slow, and have to redo everything. I got to around 600 CUs fairly quickly, then to around 300, and at some point I crossed under 200 and ended up around 140 in the first iteration of optimizations. Once I knew transfer was that cheap, that's when we committed to writing the whole program out and keeping every instruction at the same level of optimization.
Q: SIMD-0266 was approved on March 14. Can you walk through what it took to get from a working prototype to something ready for mainnet?
Rewriting an existing program is challenging, but it's also a well-defined problem, and the well-defined part is what makes it tractable.
The challenging part is that you have to preserve behavior exactly. If you find a clever reordering of checks that would speed things up but change the order in which errors fire, you can't ship it. SPL Token is one of the most heavily used programs on Solana, and any program that depends on a specific error code firing under a specific condition would break. So errors have to happen at exactly the same point, with exactly the same input, and with exactly the same error code. That constrains the optimization space a lot.
The positive side is that you have a clear metric and a clear reference. You know what behavior you need to match, you know what the current CU consumption is, and you know what direction "better" looks like. That's actually a luxury. A lot of optimization work is hard because you don't know what good looks like. Here we always did.
Q: You've said roughly 70 percent of the CU reduction came from two changes: switching the entry point and zero-copy account access. The other 30 percent was tinkering. What did the tinkering actually look like?
The 70 percent is an estimate, but it's roughly correct. As soon as you replace the solana-program entrypoint with the Pinocchio entrypoint, and you stop using bincode and borsh in favor of zero-copy reads, you get most of the way there. Those are easy wins in the sense that they don't require touching the per-instruction logic. Doing only those changes got me to around 600 CUs on transfer. Good, but not 200.
The remaining 30 percent varies instruction by instruction, so there's no single trick. The patterns that came up the most were:
Removing duplicate checks. The original token program does the right validations, but it sometimes does them twice. In a transfer, for example, there's already a check that the source account has enough balance. Then later, when the balance is updated, the code uses checked arithmetic in Rust, which repeats the same check. The second check is unnecessary, because the first one already guarantees the operation is safe. You can replace the checked operation with an unchecked one and save a few CUs (there's a sketch of this pattern after the list). Multiply that across every instruction and it adds up.
Unchecked borrows where they're sound. If an instruction only borrows a single account once, the borrow tracker isn't doing anything for you, and you can use an unchecked borrow safely.
Match versus if. This is the part that really earns the word "tinkering." In some places, replacing a match with an if improves the code generation, and in other places it doesn't. You can't predict it. You change one line, run the benchmark, and keep the change if it helps.
None of these individually is dramatic. But you go through every instruction line by line, and the small wins compound.
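Here is the duplicate-check pattern in miniature. This is an illustrative sketch, not p-token's code; the shape of the reasoning is what matters: an explicit guard makes the later checked operation redundant, so the unchecked form is sound.

```rust
fn debit(source_balance: &mut u64, amount: u64) -> Result<(), &'static str> {
    // The program has to do this check anyway so it can return the
    // correct error code to the caller.
    if *source_balance < amount {
        return Err("insufficient funds");
    }
    // `checked_sub` would repeat the comparison above before
    // subtracting. The guard already proved `amount <= *source_balance`,
    // so the unchecked form is sound and skips the redundant check.
    *source_balance = source_balance.wrapping_sub(amount);
    Ok(())
}

fn main() {
    let mut balance = 100u64;
    assert!(debit(&mut balance, 40).is_ok());
    assert_eq!(balance, 60);
    assert!(debit(&mut balance, 1_000).is_err());
}
```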
There's another optimization that came from Cavey, which deserves its own explanation. Cavey had the idea of giving the transfer instruction priority inside the program. Normally, when a program receives an invocation, it parses all the accounts, then parses the instruction data, then dispatches. That parsing costs CUs before you've even started doing any real work. Cavey's idea was: peek at the input first, see if the shape is consistent with a transfer (the right number of accounts, the right size), then peek at the instruction data, and if it really is a transfer, jump straight to the transfer logic without doing the generic parsing pass. You already know the layout, so you can read fields at fixed offsets instead of iterating.
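A simplified sketch of that fast-path shape, with hypothetical function names. The constants follow SPL Token's transfer layout (discriminator 3, three accounts, 1 tag byte plus an 8-byte amount), but the real entrypoint logic is more involved:

```rust
// Hypothetical sketch of fast-path dispatch; not p-token's actual
// entrypoint. Transfer in SPL Token is discriminator 3, takes 3
// accounts, and carries 9 bytes of instruction data.
const TRANSFER_DISCRIMINATOR: u8 = 3;

fn process(num_accounts: usize, instruction_data: &[u8]) {
    // Peek at the input shape before paying for the generic
    // parse-then-dispatch pass.
    if num_accounts == 3
        && instruction_data.len() == 9
        && instruction_data[0] == TRANSFER_DISCRIMINATOR
    {
        // Fast path: the layout is known, so read fields at fixed
        // offsets instead of iterating over accounts.
        let amount = u64::from_le_bytes(instruction_data[1..9].try_into().unwrap());
        process_transfer_fast(amount);
        return;
    }
    // Slow path: generic parsing and dispatch for everything else.
    process_generic(instruction_data);
}

fn process_transfer_fast(_amount: u64) { /* transfer logic */ }
fn process_generic(_data: &[u8]) { /* full parse + dispatch */ }

fn main() {
    let mut data = vec![TRANSFER_DISCRIMINATOR];
    data.extend_from_slice(&42u64.to_le_bytes());
    process(3, &data);
}
```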
We picked transfer because we did an analysis of about a month of mainnet token program traffic. Close to 50 percent of all token program instructions are transfers. So optimizing transfer specifically, even more than the rest, makes 50 percent of all token program traffic cheaper. The same analysis showed that about seven instructions account for roughly 80 percent of mainnet token program usage, and those seven get a lighter version of the same priority treatment. You can't give every instruction priority, because then nothing has priority. But you can stack-rank by usage and lean into the top of the curve.
Q: Walk us through the verification pipeline: unit tests, fuzzing with Firedancer tooling, Neodyme replaying months of mainnet transactions, audits, formal verification. What does each layer catch that the others don't?
We followed a stepwise approach, and each layer is good at something the others aren't.
Unit tests are great when you're actively developing, especially before you're feature-complete, because you can run the tests for the instructions you've already implemented and skip the rest. The huge advantage with p-token was that the SPL Token test suite already existed. We knew those tests were valid and we knew what passing looked like. Tests are good at predictable outcomes: success cases, and known failure cases. They're not good at the long tail.
Fuzzing is what catches the long tail. We use the Firedancer fuzzing tooling, which throws essentially gibberish at the program. The expectation isn't that the program succeeds, it's that the program fails the same way the original does. Same input, same error code, same point of failure. On the very first fuzzing run we caught cases where we were returning an error at the right step but with a different error code that didn't match the original. That's exactly what fuzzing is for. It's unpredictable on input but lets you assert on output equivalence.
Neodyme replaying mainnet transactions is a different angle on the same goal. Instead of synthetic gibberish, you replay real history and check that your reimplementation produces the same results as the original. That catches realistic combinations that fuzzing might never randomly generate.
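Whether the input is fuzzer gibberish or replayed mainnet history, the property being asserted is the same. Here is a minimal sketch of that equivalence check, with hypothetical stand-in functions; the actual Firedancer and Neodyme harnesses are far more involved:

```rust
// Hypothetical differential harness, illustrating only the property.
#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum Outcome {
    Success,
    Error { step: usize, code: u32 },
}

// Stand-ins for executing one input against each implementation.
fn run_spl_token(input: &[u8]) -> Outcome {
    Outcome::Error { step: 0, code: input.len() as u32 }
}
fn run_p_token(input: &[u8]) -> Outcome {
    Outcome::Error { step: 0, code: input.len() as u32 }
}

// The assertion is not "the input succeeds" but "both programs do
// exactly the same thing with it": same step, same error code.
fn assert_equivalent(input: &[u8]) {
    assert_eq!(run_spl_token(input), run_p_token(input));
}

fn main() {
    assert_equivalent(b"\x00gibberish");
}
```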
Audits look at the code from a perspective that tests and fuzzing can't replicate. Auditors are not asking "does this work?" They're asking "how do I break this?" That's a fundamentally different mindset. Developers tend to look at code and think about what it's trying to do. Auditors look at code and think about what they can make it do. That's how you find the issues that come from a combination of things, where any one piece in isolation looks fine.
Formal verification is the strongest guarantee. Tests and fuzzing sample the input space. Formal verification proves a property over the entire input space. In our case, the property we're proving is equivalence: that p-token behaves exactly the same as SPL Token, for every possible input. That work is currently underway, with completion targeted for mid-May.
Q: p-token adds three new instructions: Batch, WithdrawExcessLamports, and UnwrapLamports. Where did the ideas for these come from?
These were mostly community requests.
Batch came from a suggestion by Dean. The idea is that you can execute multiple token instructions inside a single program invocation. The reason this matters is the per-CPI base fee. Every cross-program invocation costs at least 1,000 CUs before you do anything else. If your DeFi protocol does a swap that involves seven token instructions, that's 7,000 CUs in base CPI fees alone, on top of the actual work. With Batch, you do one CPI into p-token and pass it all seven instructions, and you pay the 1,000 CU base fee once. For programs that do a lot of token operations per transaction, that's a meaningful structural improvement on top of the per-instruction gains.
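The arithmetic is simple but worth seeing. A back-of-envelope sketch using the numbers from the example above (the Batch wire format itself isn't shown here):

```rust
// Base CPI fee from the discussion above.
const CPI_BASE_FEE_CUS: u64 = 1_000;

fn base_fee_without_batch(num_token_instructions: u64) -> u64 {
    // One CPI per token instruction.
    num_token_instructions * CPI_BASE_FEE_CUS
}

fn base_fee_with_batch(_num_token_instructions: u64) -> u64 {
    // One CPI into p-token carrying all the instructions.
    CPI_BASE_FEE_CUS
}

fn main() {
    // The seven-instruction swap: 7,000 CUs of base fees become 1,000.
    assert_eq!(base_fee_without_batch(7), 7_000);
    assert_eq!(base_fee_with_batch(7), 1_000);
}
```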
The 12 to 13 percent block space figure I mentioned earlier doesn't include Batch. That number comes from the per-instruction optimizations alone. Once programs adopt Batch, there's a second wave of recovery on top of that.
WithdrawExcessLamports is more boring. It already exists on Token-2022. People sometimes accidentally send lamports to a mint account instead of a token account, and once the lamports are sitting on a mint they're stuck. WithdrawExcessLamports lets you recover them. We knew there was demand for it, so we brought it over.
UnwrapLamports is a community request. Today, if you have wrapped SOL and want to unwrap it, you have to create a temporary ATA, transfer the wrapped SOL into it, and close the account to recover the lamports. That's three operations for something that should be one. UnwrapLamports lets you specify a destination directly and sends the lamports there in a single step. For DeFi protocols that handle wrapped SOL constantly, that's a real quality-of-life improvement.
Q: An issue came up during the audit related to owner checks not firing between batched instructions. What did that teach you about the tradeoffs of optimization?
That was a really good find, and it's a good example of how introducing a new use case can break an assumption that was sound under the old one.
When you optimize, you optimize for a use case. The original token instructions were designed to run as a single invocation. You invoke, the instruction runs, the program returns. Under that model, there's an optimization where you can skip an explicit owner check inside the instruction, because the runtime enforces at the end of the invocation that you can only have written to accounts you own. If you wrote to an account you don't own, it fails. So an in-program owner check would be redundant.
Batch breaks that assumption. Now multiple instructions run inside the same invocation context, and the runtime's owner check only fires when the outermost invocation finishes, not between batched instructions. So an instruction inside a batch can write to an account, and the runtime won't catch the ownership violation until much later, after other instructions have already executed. The auditors caught it, we added explicit owner checks for the batch path, and the issue was resolved before mainnet. But it's a clear illustration of why audits matter. The original optimization was correct under the original assumptions. The new feature changed the assumptions.
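Schematically, the assumption and its violation look like this. The code is purely illustrative, not p-token's batch logic; `runtime_owner_check` stands in for the runtime's end-of-invocation enforcement:

```rust
struct Instr;

fn process(_i: &Instr) { /* instruction logic, may write to accounts */ }
fn runtime_owner_check() { /* enforced by the runtime, not the program */ }
fn explicit_owner_check(_i: &Instr) { /* the check added for the batch path */ }

fn single_invocation(instr: Instr) {
    // Old model: one instruction per invocation. An in-program owner
    // check would be redundant because the runtime check fires
    // immediately afterwards.
    process(&instr);
    runtime_owner_check();
}

fn batched_invocation(instrs: &[Instr]) {
    for instr in instrs {
        // New model: later instructions run before the runtime check,
        // so each batched instruction needs its own explicit check.
        explicit_owner_check(instr);
        process(instr);
    }
    runtime_owner_check(); // still fires only once, at the very end
}

fn main() {
    single_invocation(Instr);
    batched_invocation(&[Instr, Instr]);
}
```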
The bigger lesson is that anyone running a protocol with real users and real funds should be auditing their code, ideally formally verifying it where it's feasible. Edge cases are not trivial to spot, especially when you're deep in optimization mode and focused on a specific path. You can introduce something that creates an exploit precisely because you weren't looking at it from an attacker's angle. If you're shipping code that holds user funds, that has to be part of the process.
What comes next
Q: Beyond p-token, what other core programs is the team looking at rewriting with Pinocchio? And how does type-sharing between Pinocchio and the SDK change the developer experience?
We're actively working on more. p-ATA is well underway. That's a reimplementation of the Associated Token Account program, the one that derives the deterministic token account address from a mint and an owner. It's how you can send USDC to someone's wallet address and know the funds will land in the right place.
p-memo is about to deploy. The memo program is small, but we got the same kind of efficiency gains there, in the 90+ percent improvement range.
Beyond those specific programs, there's a broader push to rewrite the core programs to be no_std. Once you're doing that rewrite anyway, the natural thing is to take the opportunity to make them more efficient at the same time. So you should expect more programs to move in this direction over time.
On type-sharing: at the very beginning, Pinocchio had to define its own types because the SDK pulled in too many dependencies. That was the original problem we were solving. But over the last year or so, we've been gradually improving the SDK to reduce its dependency footprint. As that happened, we were able to start sharing types between Pinocchio and the SDK. Pinocchio today has much less code than it did at the start, because a lot of what made it efficient has moved into the SDK itself. We sometimes joke that the SDK has been "Pinocchio-fied."
The reason that matters for developers is that the SDK is used both on-chain and off-chain. If your account state types are shared between Pinocchio and the SDK, you can write your account layout once and use it in your program and your Rust client without conversion code in between. That was one of the most common community requests when Pinocchio first took off: "you have a different Pubkey type, I have to convert everything." Now you don't.
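As a small illustration of what sharing buys you, here is a hypothetical account type used identically on-chain and off-chain. The struct name and layout are invented for this example; the point is the absence of a conversion layer:

```rust
// Hypothetical account layout shared by a program and its Rust client.
#[repr(C)]
pub struct CounterAccount {
    pub owner: [u8; 32], // the same 32-byte pubkey representation everywhere
    pub count: u64,
}

impl CounterAccount {
    /// Reinterpret raw account data as the struct. Both the on-chain
    /// program and the off-chain client can call this directly.
    pub fn from_bytes(data: &[u8]) -> Option<&Self> {
        if data.len() < core::mem::size_of::<Self>()
            || data.as_ptr().align_offset(core::mem::align_of::<Self>()) != 0
        {
            return None;
        }
        // SAFETY: length and alignment were checked above, and every
        // bit pattern is valid for these field types.
        Some(unsafe { &*(data.as_ptr() as *const Self) })
    }
}

fn main() {
    // A u64-backed buffer guarantees the alignment check passes.
    let raw = [0u64; 5];
    let bytes = unsafe { core::slice::from_raw_parts(raw.as_ptr() as *const u8, 40) };
    let acc = CounterAccount::from_bytes(bytes).unwrap();
    assert_eq!(acc.count, 0);
}
```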
Q: Pinocchio has a different syntax than Anchor. How do you recommend developers start building with it?
The most important thing is to understand what you're doing before you reach for the optimizations. Pinocchio gives you more freedom, and that freedom can bite you if you don't understand the model.
Pinocchio offers both a safe API and an unsafe API. Start with the safe API. Understand what each call is doing and why. Look at the unsafe API only when you understand what guarantees you're trading away and you're confident you can uphold them yourself.
If you're completely new to Solana, the first priority is understanding the account model, how programs work, and what validations a program is responsible for performing. Whether your first program is in Anchor or in Pinocchio with the safe API matters less than whether you understand those fundamentals. Once you have a working program that does the right validations, then you start asking how to make it more efficient.
The thing I'd warn against is reading the p-token source code as a tutorial on how to write programs. p-token's optimizations work because of the specific context they're in: a heavily audited reimplementation with a known reference behavior. If you copy a pattern out of p-token without understanding why it's sound there, you can very easily introduce unsoundness in your own program.
For learning resources, the team at Blueshift has the most beginner-friendly path. They have written guides and a set of progressive challenges that walk you through writing Pinocchio programs hands-on. Solana Turbine runs cohorts every quarter or so that cover Pinocchio. The Pinocchio README itself is reasonably friendly, but it assumes you already know how to write Solana programs. It's documentation for people who need to learn Pinocchio specifically, not for people who are learning Solana from scratch.
Q: You've been working on Solana for about five years, two of them at Anza. What keeps you excited about working on Solana?
The developer community. We have a strong and demanding developer community that does not settle for "good enough." Cavey, Dean, Leo, the folks at Blueshift and Solana Turbine, and many others are constantly pushing on what can be improved next. That's what makes it exciting to keep working here. There is always something to learn and someone is always finding a better way to do something you thought was already done.
It is also collaborative in a way that I think is underappreciated. Pinocchio is where it is today because the community embraced it and started contributing ideas. We started it, but the iterative improvements came from a lot of different people in a lot of different places exchanging ideas and chipping in. That kind of culture, where people from different companies work together on the same low-level infrastructure, is not a given.
Q: What happens after p-token ships to mainnet, and do you have any final advice for people in the Solana community?
I'm pretty sure that once people see what p-token does, the next request is going to be a p-token-2022 version [laughs]. I've already been getting questions about it. As I mentioned earlier, we are working on making the core programs no_std, and in the process we will also look for more optimizations.
As for advice: stay curious. Ask whether there is a better way to do the thing you are doing, even if the current way works. That is how things actually improve. There is always a new problem to solve and something worth rethinking. I don't think you ever really feel comfortable working on Solana, because someone is always finding something new. That is actually a very good thing.
Febo is a core engineer at Anza working on Pinocchio, p-token, and the broader effort to make Solana's core programs as efficient as possible. You can follow Febo at @0x_febo and Anza at @anza_xyz.