
Conversation

@julienrbrt (Member)

Overview

Attempt to replace the go-header default store with the ev-node store.
This avoids duplicating stored blocks.


github-actions bot commented Jan 30, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed
Format: ⏩ skipped
Lint: ✅ passed
Breaking: ✅ passed
Updated (UTC): Feb 1, 2026, 10:02 PM


claude bot commented Jan 30, 2026

Claude finished @julienrbrt's task in 3m 56s.


Comprehensive Code Review - Complete ✅

I've completed a detailed review of this refactoring. The PR successfully unifies the store architecture by replacing the go-header store with ev-node store adapters, eliminating data duplication. Here's my analysis:


Summary

This is a significant architectural improvement that:

  • ✅ Eliminates duplicate storage of blocks between go-header and ev-node stores
  • ✅ Simplifies rollback logic by consolidating to a single store
  • ✅ Maintains P2P sync compatibility through well-designed adapters
  • ✅ Removes ~700 lines of wrapper code (exchange_wrapper.go)
  • ⚠️ Has some areas that need attention before merging

Critical Issues

1. context.Background() Usage in Constructors

pkg/store/store_adapter.go:142 and pkg/store/store_adapter.go:400

Using context.Background() in NewStoreAdapter and Height() means these operations cannot be canceled or have timeouts. While Height() is likely fast, this violates Go best practices for context propagation.

Impact: Medium - Could cause issues during shutdown or if the store operation blocks unexpectedly.

Recommendation: Pass a context parameter to constructors, or document why context.Background() is acceptable here (e.g., if Height() is guaranteed O(1) from in-memory cache).
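A minimal sketch of the context-propagating variant, assuming a trimmed-down, non-generic StoreGetter with only a Height method; the PR's actual constructor signature differs:

package store

import (
	"context"
	"fmt"
)

// StoreGetter is a stand-in interface with only the method used below.
type StoreGetter interface {
	Height(ctx context.Context) (uint64, error)
}

// StoreAdapter is a simplified stand-in for the generic adapter.
type StoreAdapter struct {
	getter StoreGetter
	height uint64
}

// NewStoreAdapter accepts the caller's context instead of using
// context.Background(), so the initial height lookup can be canceled or
// bounded by a deadline during startup or shutdown.
func NewStoreAdapter(ctx context.Context, getter StoreGetter) (*StoreAdapter, error) {
	h, err := getter.Height(ctx)
	if err != nil {
		return nil, fmt.Errorf("initial height lookup: %w", err)
	}
	return &StoreAdapter{getter: getter, height: h}, nil
}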



2. Tail() Performance for Pruned Nodes

pkg/store/store_adapter.go:234-269

The Tail() method performs a linear scan from genesisInitialHeight to find the first available block after pruning. This is O(n) where n is the number of pruned heights.

Impact: High for pruned nodes - Could cause significant performance degradation on large chains with aggressive pruning.

Status: Already acknowledged with TODO comment at line 233. The author plans to optimize this in PR #2984 by storing the pruned tail in state.

Recommendation: Ensure PR #2984 is prioritized and merged before this becomes a bottleneck in production.
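For illustration only, the scan described above amounts to something like the following; the function name, parameters, and existence check are assumptions, not the PR's code:

package store

import (
	"context"
	"errors"
)

// tailHeight walks up from the initial height until it finds a stored item.
// O(1) when nothing is pruned; O(n) in the number of pruned heights otherwise.
func tailHeight(ctx context.Context, exists func(context.Context, uint64) (bool, error), initial, head uint64) (uint64, error) {
	for h := initial; h <= head; h++ {
		if err := ctx.Err(); err != nil {
			return 0, err
		}
		ok, err := exists(ctx, h)
		if err != nil {
			return 0, err
		}
		if ok {
			return h, nil
		}
	}
	return 0, errors.New("store is empty")
}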


3. Potential Data Loss in Append() Without Error Propagation

pkg/store/store_adapter.go:434-462

The Append() method silently skips items that are already in the store (lines 447-450) and zero-value items (lines 440-442). While this is likely intentional, there is no logging or metric to track when items are skipped.

Impact: Low - Could make debugging difficult if blocks are unexpectedly not appearing in the pending cache.

Recommendation: Add debug-level logging or metrics when items are skipped to aid in troubleshooting.
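A sketch of that recommendation, with the store lookup and pending-cache add abstracted behind stand-in function fields; every name here is an assumption:

package store

import "log/slog"

type pendingItem struct{ Height uint64 }

type adapter struct {
	logger *slog.Logger
	stored func(height uint64) bool // stand-in for an ev-node store lookup
	add    func(it pendingItem)     // stand-in for the pending LRU's Add
}

// appendPending mirrors the skip conditions the review describes, but logs
// each skip at debug level so missing blocks are traceable afterwards.
func (a *adapter) appendPending(items ...pendingItem) {
	for _, it := range items {
		switch {
		case it.Height == 0:
			a.logger.Debug("append: skipping zero item")
		case a.stored(it.Height):
			a.logger.Debug("append: item already in store", "height", it.Height)
		default:
			a.add(it)
		}
	}
}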


Concurrency Safety Analysis

✅ Well-Implemented Thread Safety

  1. heightSub mechanism (lines 35-104): Excellent implementation using atomic operations plus a mutex for the wait/notify pattern. The double-check pattern in Wait() (lines 67-76) prevents races; a minimal reconstruction follows this list.

  2. Pending cache: Uses hashicorp/golang-lru/v2 which is thread-safe internally. LRU eviction prevents unbounded memory growth.

  3. Height tracking: Proper use of atomic.Uint64 in heightSub and sync service.
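For readers unfamiliar with the pattern, here is a minimal, self-contained reconstruction of the double-checked wait/notify mechanism from item 1; field and method names are assumptions, not the PR's actual code:

package store

import (
	"context"
	"sync"
	"sync/atomic"
)

type heightSub struct {
	height  atomic.Uint64
	mu      sync.Mutex
	waiters map[uint64][]chan struct{}
}

func newHeightSub() *heightSub {
	return &heightSub{waiters: make(map[uint64][]chan struct{})}
}

// Wait blocks until the tracked height reaches h or ctx is done.
func (s *heightSub) Wait(ctx context.Context, h uint64) error {
	if s.height.Load() >= h { // fast path, lock-free
		return nil
	}
	s.mu.Lock()
	// Double-check under the lock: Pub may have advanced the height between
	// the atomic load above and acquiring the mutex.
	if s.height.Load() >= h {
		s.mu.Unlock()
		return nil
	}
	ch := make(chan struct{})
	s.waiters[h] = append(s.waiters[h], ch)
	s.mu.Unlock()

	select {
	case <-ch:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Pub advances the height and wakes every waiter at or below it.
func (s *heightSub) Pub(h uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if h > s.height.Load() {
		s.height.Store(h)
	}
	for wh, chs := range s.waiters {
		if wh <= h {
			for _, ch := range chs {
				close(ch)
			}
			delete(s.waiters, wh)
		}
	}
}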

⚠️ Race Condition Potential

In pendingHead() method (pkg/store/store_adapter.go:188-198):

for _, h := range a.pending.Keys() {
    if item, ok := a.pending.Peek(h); ok && h > maxHeight {
        maxHeight = h
        head = item
    }
}

While LRU is thread-safe, iterating over Keys() and then Peek()ing each key is not atomic. A key could be evicted between the Keys() call and Peek(), though this is handled by the ok check. However, the iteration order from Keys() is not guaranteed to be deterministic.

Impact: Low - The ok check handles eviction gracefully, but this could return a non-maximal head in rare race scenarios.

Recommendation: Document this race possibility, or use a mutex to protect the entire iteration.
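If the mutex route is taken, one possible shape is the following sketch, assuming every writer to the pending cache takes the same lock (types and names are assumptions, not the PR's code):

package store

import (
	"sync"

	lru "github.com/hashicorp/golang-lru/v2"
)

type item struct{ Height uint64 }

type adapter struct {
	pendingMu sync.Mutex // writers must hold this too for the lock to help
	pending   *lru.Cache[uint64, item]
}

// pendingHead serializes the whole Keys+Peek iteration behind a mutex,
// closing the window between Keys() and Peek() described above.
func (a *adapter) pendingHead() (item, bool) {
	a.pendingMu.Lock()
	defer a.pendingMu.Unlock()
	var head item
	var maxHeight uint64
	found := false
	for _, h := range a.pending.Keys() {
		if it, ok := a.pending.Peek(h); ok && h > maxHeight {
			maxHeight, head, found = h, it, true
		}
	}
	return head, found
}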


Architecture & Design

✅ Strong Points

  1. Adapter Pattern: Clean separation between ev-node store and go-header interface requirements. The StoreGetter[H] interface is well-designed for testability.

  2. Pending Cache Strategy: Smart approach to handle P2P-received blocks before validation. The LRU eviction (1000 items) prevents memory issues while keeping recent unvalidated blocks accessible.

  3. Blocking GetByHeight: The heightSub mechanism elegantly solves go-header's requirement for blocking reads until data is available. Good use of channels for synchronization.

  4. Type Safety: Generic StoreAdapter[H] with concrete type aliases (HeaderStoreAdapter, DataStoreAdapter) provides both flexibility and convenience.

  5. Rollback Simplification: Removing 40+ lines from each rollback command and consolidating to the single source of truth is a major win for maintainability.

⚠️ Design Considerations

  1. WriteToStoreAndBroadcast Still Needed

    Per @tac0turtle's comment: "Do we still need to do WriteToStoreAndBroadcast in the sync loop if we have this?"

    Looking at the removed code in block/internal/syncing/syncer.go, the WriteToStoreAndBroadcast calls were removed for DA-sourced events. However, WriteToStoreAndBroadcast is still used in other places (like when aggregators produce blocks).

    Analysis: The DA-sourced blocks no longer need WriteToStoreAndBroadcast because:

    • The store adapter's Append() already handles adding to pending cache
    • The P2P syncer can access blocks via the adapter without explicit broadcast for local-only sync

    But: Producer-created blocks still need WriteToStoreAndBroadcast to propagate via P2P gossip.

    Recommendation: Add a comment in the code clarifying when WriteToStoreAndBroadcast is needed vs. when store adapter alone suffices.

  2. Pending Cache Size

    defaultPendingCacheSize = 1000 (line 21) seems reasonable, but there's no documentation on how this number was chosen or what happens if a node falls >1000 blocks behind.

    Recommendation: Document the reasoning and add metrics to track cache hit/miss rates and evictions.


Code Quality & Testing

✅ Test Coverage

  • Comprehensive adapter tests in header_store_adapter_test.go and data_store_adapter_test.go
  • Tests cover: basic CRUD, store vs pending cache, hash lookups, ranges, initialization, blocking reads
  • Good use of table-driven tests and parallel execution

⚠️ Test Gaps

  1. Concurrency tests missing: No tests for concurrent Append() + GetByHeight() or race conditions.
  2. LRU eviction tests missing: No tests verifying behavior when pending cache exceeds 1000 items.
  3. Tail() pruning tests incomplete: Tests don't cover the O(n) scan scenario mentioned in the TODO.
  4. Error handling: Limited tests for context cancellation during blocking GetByHeight().

Coverage: 78.73% patch coverage with 57 lines missing (mostly in adapter error paths).

Recommendation: Add concurrency and eviction tests, especially for the pending cache behavior under load.
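A sketch of such a concurrency test; newTestAdapter and makeTestHeader are hypothetical helpers, and the Append/GetByHeight signatures are assumed to mirror go-header's header.Store:

package store

import (
	"context"
	"sync"
	"testing"
	"time"
)

func TestAdapterConcurrentAppendAndGet(t *testing.T) {
	t.Parallel()
	adapter := newTestAdapter(t) // hypothetical: adapter over an in-memory store

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // writer appends heights 1..100
		defer wg.Done()
		for h := uint64(1); h <= 100; h++ {
			if err := adapter.Append(ctx, makeTestHeader(h)); err != nil { // makeTestHeader is hypothetical
				t.Error(err)
				return
			}
		}
	}()
	go func() { // reader blocks on each height until the writer catches up
		defer wg.Done()
		for h := uint64(1); h <= 100; h++ {
			if _, err := adapter.GetByHeight(ctx, h); err != nil {
				t.Error(err)
				return
			}
		}
	}()
	wg.Wait()
}

Running this with go test -race lets the race detector flag any unsynchronized access between the two goroutines.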


Error Handling

✅ Good Practices

  • Consistent error wrapping with context (fmt.Errorf)
  • Proper use of errors.Join() in rollback and sync service cleanup
  • Handles both header.ErrNotFound and header.ErrEmptyStore appropriately

⚠️ Issues

  1. Silent error ignoring in Height() (pkg/store/store_adapter.go:400-401):

    if h, err := a.getter.Height(context.Background()); err == nil && h > 0 {

    If Height() returns an error, it's silently ignored and falls through to the cached height. This could mask real issues; a sketch of a logged fallback follows this list.

  2. DeleteRange onDeleteFn error handling (pkg/store/store_adapter.go:494-512):
    If onDeleteFn returns an error partway through the range, some items are deleted but not others, potentially leaving inconsistent state. Consider wrapping in a transaction or rolling back on error.
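A sketch of the logged fallback for the first issue, assuming a slog logger and an atomic cached height on the adapter (names are assumptions); it also takes a context, in line with the earlier recommendation:

package store

import (
	"context"
	"log/slog"
	"sync/atomic"
)

type heightGetter interface {
	Height(ctx context.Context) (uint64, error)
}

type adapter struct {
	getter       heightGetter
	logger       *slog.Logger
	cachedHeight atomic.Uint64
}

// Height surfaces lookup failures instead of silently falling back.
func (a *adapter) Height(ctx context.Context) uint64 {
	h, err := a.getter.Height(ctx)
	switch {
	case err != nil:
		a.logger.Warn("store height lookup failed, using cached height", "err", err)
	case h > 0:
		return h
	}
	return a.cachedHeight.Load()
}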


Performance Considerations

  1. Multiple store.Height() calls: the Height() method (lines 398-429) calls getter.Height(), which may hit disk. Consider caching more aggressively.

  2. Linear scan in Get() for pending items (lines 283-287): finding an item by hash iterates over all pending cache keys. With 1000 items this can be slow; consider adding a hash->height index for the pending cache (see the sketch after this list).

  3. Tail() optimization: already covered in Critical Issues #2.
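A sketch of the hash->height index idea from item 2, built on golang-lru's eviction callback so the index stays consistent with the cache; the types and names are assumptions, not the PR's code:

package store

import (
	"sync"

	lru "github.com/hashicorp/golang-lru/v2"
)

type pendingItem struct {
	Hash string
	Data []byte
}

// pendingIndex pairs the height-keyed LRU with a hash->height map so lookups
// by hash are O(1) instead of a scan over all keys.
type pendingIndex struct {
	mu      sync.Mutex
	byHash  map[string]uint64
	pending *lru.Cache[uint64, pendingItem]
}

func newPendingIndex(size int) (*pendingIndex, error) {
	p := &pendingIndex{byHash: make(map[string]uint64)}
	cache, err := lru.NewWithEvict(size, func(_ uint64, it pendingItem) {
		// Keep the index consistent when the LRU evicts an entry.
		p.mu.Lock()
		delete(p.byHash, it.Hash)
		p.mu.Unlock()
	})
	if err != nil {
		return nil, err
	}
	p.pending = cache
	return p, nil
}

func (p *pendingIndex) Add(it pendingItem, height uint64) {
	p.mu.Lock()
	p.byHash[it.Hash] = height
	p.mu.Unlock()
	p.pending.Add(height, it) // may evict; the callback cleans the index
}

func (p *pendingIndex) GetByHash(hash string) (pendingItem, bool) {
	p.mu.Lock()
	height, ok := p.byHash[hash]
	p.mu.Unlock()
	if !ok {
		return pendingItem{}, false
	}
	return p.pending.Peek(height) // ok=false if evicted in between
}

Note that Add releases the index mutex before touching the LRU, so the eviction callback (which runs inside the LRU's Add) never deadlocks on the same lock.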


Security Considerations

No security issues identified. The changes:

  • Don't expose new APIs
  • Maintain the same data validation as before
  • Don't introduce credential or secret handling
  • Properly handle untrusted P2P data in pending cache before validation

Documentation

✅ Good Documentation

  • Excellent inline comments in store_adapter.go explaining the adapter pattern (lines 106-132)
  • Clear TODO comments for known limitations
  • Updated pkg/store/README.md with adapter information

⚠️ Could Improve

  • Missing godoc examples for public adapter constructors
  • No sequence diagram showing adapter interaction with P2P sync flow
  • Pending cache eviction policy not documented in README

Recommendations Summary

Must Fix Before Merge

  1. Add tests for pending cache LRU eviction behavior
  2. Add concurrency tests for adapter under load
  3. Document or fix context.Background() usage in constructors

Should Fix Before Merge

  1. Add logging/metrics when Append() skips items
  2. Document when WriteToStoreAndBroadcast is still needed
  3. Improve error handling in Height() method (don't silently ignore errors)

Future Work (Can be separate PRs)

  1. Optimize Tail() for pruned nodes (already tracked in #2984, feat: block pruning)
  2. Add hash->height index for pending cache to speed up Get()
  3. Add metrics for pending cache hit/miss rates

Final Assessment

This is a well-architected refactoring that successfully achieves its goal of eliminating data duplication. The adapter pattern is clean, the pending cache strategy is sound, and the concurrency mechanisms are well-implemented.

Key strengths:

  • Eliminates duplicate storage
  • Simplifies rollback logic significantly
  • Maintains backward compatibility with go-header P2P
  • Good test coverage for core functionality

Key concerns:

  • Tail() performance for pruned nodes (acknowledged with TODO)
  • Missing tests for concurrency and cache eviction
  • Some error handling could be more robust

Overall: ✅ Approve with minor changes recommended. The critical issues are either already acknowledged (Tail optimization) or easily fixable (tests, logging). This is a significant improvement to the codebase architecture.


@gemini-code-assist (Contributor)

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the node's data storage strategy by centralizing all block-related data within the ev-node store. This change aims to prevent data duplication and simplify the overall architecture. By introducing adapter layers, the existing go-header P2P synchronization mechanisms can now operate directly on the unified ev-node store, leading to a more coherent and maintainable system. The modifications also streamline rollback procedures and reduce redundant code paths related to data persistence.

Highlights

  • Unified Store Architecture: The pull request refactors the storage mechanism to use the ev-node store as the single source of truth for both headers and data, eliminating the need for a separate go-header store and reducing data duplication.
  • Introduction of Store Adapters: New HeaderStoreAdapter and DataStoreAdapter implementations are introduced. These adapters wrap the ev-node store, allowing it to satisfy the header.Store interface from the go-header library, thus integrating seamlessly with go-header's P2P infrastructure.
  • Simplified Rollback Logic: The rollback commands (apps/evm/cmd/rollback.go and apps/testapp/cmd/rollback.go) are simplified. They now primarily rely on the ev-node store's Rollback method, removing explicit go-header store operations for headers and data, and improving error handling with errors.Join. A sketch of the pattern follows this list.
  • Refactored Sync Service Initialization: The pkg/sync/sync_service.go file is updated to use the new store adapters. NewDataSyncService and NewHeaderSyncService now directly accept the ev-node store and wrap it with the appropriate adapter, streamlining the setup of synchronization services.
  • Dependency Clean-up: Imports related to go-header/store and ds.Batching are removed from various files, reflecting the consolidated storage approach and reducing unnecessary dependencies.
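As a rough sketch of the rollback pattern described above, under an assumed minimal Store interface (the actual command wiring in the rollback commands differs):

package rollback

import (
	"context"
	"errors"
	"fmt"
)

// Store is a trimmed-down stand-in for the ev-node store.
type Store interface {
	Rollback(ctx context.Context, height uint64) error
	Close() error
}

// rollbackAndClose relies on the store's single Rollback method (no separate
// go-header store operations) and joins cleanup errors with errors.Join.
func rollbackAndClose(ctx context.Context, s Store, height uint64) error {
	var rbErr error
	if err := s.Rollback(ctx, height); err != nil {
		rbErr = fmt.Errorf("rollback to height %d: %w", height, err)
	}
	return errors.Join(rbErr, s.Close())
}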



@gemini-code-assist bot (Contributor) left a comment


Code Review

The pull request successfully refactors the store integration for go-header by introducing DataStoreAdapter and HeaderStoreAdapter. This change eliminates data duplication by allowing the go-header P2P infrastructure to directly utilize the ev-node store. The rollback commands have been updated to reflect this unified store approach, and comprehensive tests have been added for the new adapter implementations. This is a significant improvement in architecture and efficiency.


codecov bot commented Jan 30, 2026

Codecov Report

❌ Patch coverage is 78.73134% with 57 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.96%. Comparing base (9bc6311) to head (9101f11).

Files with missing lines       Patch %   Lines
pkg/store/store_adapter.go     78.83%    37 missing, 14 partials ⚠️
pkg/sync/sync_service.go       70.00%    4 missing, 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3036      +/-   ##
==========================================
+ Coverage   55.48%   55.96%   +0.48%     
==========================================
  Files         117      117              
  Lines       11685    11838     +153     
==========================================
+ Hits         6483     6625     +142     
- Misses       4480     4484       +4     
- Partials      722      729       +7     
Flag       Coverage Δ
combined   55.96% <78.73%> (+0.48% ⬆️)



@julienrbrt marked this pull request as ready for review on January 30, 2026, 16:54.
// For ev-node, this is typically the genesis/initial height.
// If pruning has occurred, it walks up from initialHeight to find the first available item.
// TODO(@julienrbrt): Optimize this when pruning is enabled.
func (a *StoreAdapter[H]) Tail(ctx context.Context) (H, error) {
@julienrbrt (Member, Author) commented Jan 30, 2026:

It is O(1) for a non-pruned node and O(n) for pruned nodes, so we should improve this by saving the pruned tail in state in #2984

var errElapsedHeight = errors.New("elapsed height")

// defaultPendingCacheSize is the default size for the pending headers/data LRU cache.
const defaultPendingCacheSize = 1000
@julienrbrt (Member, Author) commented:

This should leave enough headers/data for p2p-syncing nodes before they execute the blocks.
Otherwise, we can consider making it bigger.

@tac0turtle (Contributor) left a comment:

Do we still need to do WriteToStoreAndBroadcast in the sync loop if we have this?


julienrbrt commented Feb 1, 2026

Do we still need to do WriteToStoreAndBroadcast in the sync loop if we have this?

We shouldn't need it indeed 👍
