dev_copycat.erl - Message Indexing Orchestration Device
Overview
Purpose: Orchestrate indexing of messages from foreign sources into HyperBEAM caches
Module: dev_copycat
Device: ~copycat@1.0
Pattern: External Data → HyperBEAM Cache
This device orchestrates the indexing of messages from external sources (primarily Arweave) into a HyperBEAM node's local caches. It acts as a coordinator, delegating to specific engine implementations for each data source type.
Supported Sources
1. GraphQL Endpoints
- Engine: dev_copycat_graphql
- Use Case: Query Arweave gateway GraphQL APIs
- Features: Pagination, filtering by tags/owner/recipient
2. Arweave Nodes
- Engine: dev_copycat_arweave
- Use Case: Fetch blocks directly from Arweave nodes
- Features: Height-based range fetching, reverse chronological
Dependencies
- Engines: dev_copycat_graphql, dev_copycat_arweave
Public Functions Overview
%% Source-Specific Functions
-spec graphql(Base, Request, Opts) -> {ok, IndexedCount} | {error, Reason}.
-spec arweave(Base, Request, Opts) -> {ok, FinalHeight} | {error, Reason}.
Public Functions
1. graphql/3
-spec graphql(Base, Request, Opts) -> {ok, IndexedCount} | {error, Reason}
    when
        Base :: map(),
        Request :: map(),
        Opts :: map(),
        IndexedCount :: integer(),
        Reason :: term().
Description: Fetch data from a GraphQL endpoint for replication. Delegates to dev_copycat_graphql:graphql/3.
Request parameters:
- <<"query">> - GraphQL query string
- <<"tag">> / <<"tags">> - Filter by tags
- <<"owner">> - Filter by transaction owner
- <<"recipient">> - Filter by recipient
- <<"node">> (in Opts) - Gateway URL
- <<"operationName">> - GraphQL operation name
- <<"variables">> - Query variables
-module(dev_copycat_graphql_test).
-include_lib("eunit/include/eunit.hrl").
graphql_with_query_test() ->
    Base = #{},
    Request = #{
        <<"query">> => <<"query { transactions { edges { node { id } } } }">>
    },
    Opts = #{
        <<"node">> => <<"https://arweave.net">>
    },
    % Note: This would hit a real endpoint; use a mock in actual tests
    Result = dev_copycat:graphql(Base, Request, Opts),
    ?assert(is_tuple(Result)).
graphql_with_tag_filter_test() ->
    Request = #{
        <<"tag">> => <<"App-Name">>,
        <<"value">> => <<"MyApp">>
    },
    Result = dev_copycat:graphql(#{}, Request, #{}),
    ?assert(is_tuple(Result)).
2. arweave/3
-spec arweave(Base, Request, Opts) -> {ok, FinalHeight} | {error, Reason}
    when
        Base :: map(),
        Request :: map(),
        Opts :: map(),
        FinalHeight :: integer(),
        Reason :: term().
Description: Fetch data from an Arweave node for replication. Delegates to dev_copycat_arweave:arweave/3.
Request parameters:
- <<"from">> - Starting block height (default: current height)
- <<"to">> - Ending block height (default: 0, the Genesis block)
-module(dev_copycat_arweave_test).
-include_lib("eunit/include/eunit.hrl").
arweave_range_test() ->
    Base = #{},
    Request = #{
        <<"from">> => 1000,
        <<"to">> => 999
    },
    Result = dev_copycat:arweave(Base, Request, #{}),
    ?assert(is_tuple(Result)).
arweave_from_current_test() ->
    % Use a small bounded range to avoid hanging
    Request = #{
        <<"from">> => 1001,
        <<"to">> => 1000
    },
    Result = dev_copycat:arweave(#{}, Request, #{}),
    ?assert(is_tuple(Result)).
Engine Architecture
Module Pattern
Each data source has its own engine module:
-module(dev_copycat_[ENGINE]).
-export([[engine_name]/3]).
[engine_name](Base, Request, Opts) ->
    % Engine-specific implementation
    {ok, Result}.
Adding New Engines
To add support for a new data source:
- Create dev_copycat_[engine].erl
- Implement the [engine]/3 function
- Add a routing function to dev_copycat.erl:
[engine](Base, Request, Opts) -> dev_copycat_[engine]:[engine](Base, Request, Opts).
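As a concrete instance of the template above, here is a minimal sketch of an engine module. The module name `dev_copycat_example`, the function `example/3`, and the `<<"items">>` request key are all hypothetical and are not part of HyperBEAM; a real engine would fetch from its data source and write to the cache where this one only counts its input.

```erlang
%% Hypothetical engine: dev_copycat_example.erl (illustration only).
-module(dev_copycat_example).
-export([example/3]).

%% Follows the engine signature from the module pattern: (Base, Request, Opts).
%% "Indexes" the entries listed under the hypothetical <<"items">> key by
%% counting them. A real engine would fetch, parse, and cache each entry.
example(_Base, Request, _Opts) ->
    case maps:get(<<"items">>, Request, undefined) of
        Items when is_list(Items) ->
            {ok, length(Items)};
        undefined ->
            {error, missing_items};
        _Other ->
            {error, invalid_items}
    end.
```

With the routing function from the last step added to dev_copycat.erl, this engine would be reached through `dev_copycat:example/3` in the same way as the two built-in sources.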
Common Patterns
%% Index from GraphQL with custom query
Result = dev_copycat:graphql(
#{},
#{
<<"query">> => <<"
query($after: String) {
transactions(
first: 100,
after: $after,
tags: [{name: \"App-Name\", values: [\"MyApp\"]}]
) {
edges {
node { id owner { address } tags { name value } }
cursor
}
pageInfo { hasNextPage }
}
}
">>,
<<"variables">> => #{}
},
#{<<"node">> => <<"https://arweave.net">>}
).
% Returns: {ok, TotalIndexed}
%% Index by tag filter (auto-generates query)
dev_copycat:graphql(
#{},
#{
<<"tag">> => <<"App-Name">>,
<<"value">> => <<"MyApp">>
},
#{}
).
%% Index by owner
dev_copycat:graphql(
#{},
#{<<"owner">> => <<"wallet-address-here">>},
#{}
).
%% Fetch Arweave blocks in a range
dev_copycat:arweave(
#{},
#{
<<"from">> => 1500000,
<<"to">> => 1499000
},
#{}
).
% Returns: {ok, FinalHeight}
%% Fetch all blocks from current to target
dev_copycat:arweave(
#{},
#{<<"to">> => 1000000},
#{}
).
%% Fetch every block, starting from the current height
dev_copycat:arweave(#{}, #{}, #{}).
% Fetches from the latest block down to Genesis
Use Cases
1. Initial Node Sync
Populate a new HyperBEAM node with historical data:
% Start from current block, index backwards
dev_copycat:arweave(#{}, #{}, #{}).
2. Process-Specific Indexing
Index only messages for a specific process:
dev_copycat:graphql(
#{},
#{
<<"tag">> => <<"Process">>,
<<"value">> => ProcessID
},
#{}
).
3. Owner-Based Indexing
Index all transactions from a specific wallet:
dev_copycat:graphql(
#{},
#{<<"owner">> => WalletAddress},
#{}
).
4. Gap Filling
Fill missing blocks in specific height ranges:
% Fetch specific range
dev_copycat:arweave(
#{},
#{
<<"from">> => MissingEnd,
<<"to">> => MissingStart
},
#{}
).
5. Application-Specific Indexing
Index transactions for a specific application:
dev_copycat:graphql(
#{},
#{
<<"tags">> => [
#{<<"name">> => <<"App-Name">>, <<"value">> => <<"MyApp">>},
#{<<"name">> => <<"App-Version">>, <<"value">> => <<"1.0">>}
]
},
#{}
).
Engine Characteristics
GraphQL Engine (dev_copycat_graphql)
Strengths:
- Flexible filtering
- Efficient pagination
- Tag-based queries
- Owner/recipient filtering
Limitations:
- Depends on gateway availability
- Limited to data the gateway has indexed
- Query complexity limits
Best for:
- Targeted message retrieval
- Tag-based filtering
- Recent transaction indexing
Arweave Engine (dev_copycat_arweave)
Strengths:
- Direct block access
- Complete data availability
- No gateway dependency
- Reliable for historical data
Limitations:
- Sequential block fetching
- Slower than GraphQL for filtered queries
- Network-intensive
Best for:
- Complete node sync
- Historical block indexing
- Gap filling in block ranges
Error Handling
Both engines return standardized result tuples:
{ok, Result} % Success with result
{error, Reason} % Failure with reason
Common error scenarios:
- Network failures
- Invalid queries
- Missing blocks
- Rate limiting
- Parse errors
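Because both engines return the same tuple shapes, a caller can wrap either one in the same handling code. The sketch below is a caller-side helper, not something the device is documented to provide: the module name `copycat_retry_sketch` and the function `with_retries/2` are hypothetical, and retrying on every `{error, _}` is an assumption that only suits transient failures such as network errors or rate limiting.

```erlang
%% Hypothetical caller-side helper (illustration only).
-module(copycat_retry_sketch).
-export([with_retries/2]).

%% Calls Fun up to Attempts times and returns the first {ok, _} result,
%% or the last {error, _} once the attempts are used up.
with_retries(Fun, Attempts)
  when is_function(Fun, 0), is_integer(Attempts), Attempts > 0 ->
    case Fun() of
        {ok, Result} ->
            {ok, Result};
        {error, _Reason} when Attempts > 1 ->
            %% Attempts remain: try again
            with_retries(Fun, Attempts - 1);
        {error, Reason} ->
            %% Last attempt failed: surface the error to the caller
            {error, Reason}
    end.
```

A call such as `copycat_retry_sketch:with_retries(fun() -> dev_copycat:graphql(#{}, Request, Opts) end, 3)` would then give the gateway three chances before reporting the failure.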
Integration with Cache System
The copycat device works closely with HyperBEAM's cache:
- Check Cache: Before fetching, checks if data exists
- Fetch: Retrieves data from external source
- Parse: Converts to HyperBEAM message format
- Write: Stores in local cache
- Continue: Proceeds to next item
This ensures efficient replication without duplicating existing data.
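The five steps can be sketched as a small loop. This is an illustration under stated assumptions, not the device's implementation: the module `copycat_cache_sketch` and the four callbacks in the `Steps` map (`exists`, `fetch`, `parse`, `write`) are hypothetical stand-ins for the real cache and source calls, and skipping an item on a fetch error is assumed from the note that indexing continues on individual item failures.

```erlang
%% Hypothetical sketch of the check/fetch/parse/write/continue loop.
-module(copycat_cache_sketch).
-export([replicate/2]).

%% Steps is a map of hypothetical callbacks:
%%   exists => fun(Id)  -> boolean()
%%   fetch  => fun(Id)  -> {ok, Raw} | {error, Reason}
%%   parse  => fun(Raw) -> Msg
%%   write  => fun(Msg) -> ok
%% Returns {ok, Written}, the number of items newly written.
replicate(Ids, Steps) ->
    replicate(Ids, Steps, 0).

replicate([], _Steps, Written) ->
    {ok, Written};
replicate([Id | Rest], Steps, Written) ->
    #{exists := Exists, fetch := Fetch, parse := Parse, write := Write} = Steps,
    case Exists(Id) of
        true ->
            %% Check Cache: already present, nothing to do
            replicate(Rest, Steps, Written);
        false ->
            case Fetch(Id) of
                {ok, Raw} ->
                    %% Parse, then Write, then Continue
                    ok = Write(Parse(Raw)),
                    replicate(Rest, Steps, Written + 1);
                {error, _Reason} ->
                    %% Skip the failed item and continue with the rest
                    replicate(Rest, Steps, Written)
            end
    end.
```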
Performance Considerations
GraphQL
- Batch size: 100 transactions per request (typical)
- Pagination: Automatic with cursor-based continuation
- Rate limits: Depends on gateway
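Cursor-based continuation can be sketched as a loop that feeds each page's cursor into the next request. The module `copycat_page_sketch`, the `FetchPage` callback, and its return shape are hypothetical; the actual engine's internals may differ.

```erlang
%% Hypothetical sketch of cursor-based pagination.
-module(copycat_page_sketch).
-export([index_all/1]).

%% FetchPage is a hypothetical fun(Cursor) that returns
%%   {ok, Items, NextCursor} | {ok, Items, done} | {error, Reason}.
%% The first call receives the atom 'undefined' as its cursor.
%% Returns {ok, Total}, the number of items seen across all pages.
index_all(FetchPage) ->
    index_all(FetchPage, undefined, 0).

index_all(FetchPage, Cursor, Total) ->
    case FetchPage(Cursor) of
        {ok, Items, done} ->
            %% Last page: no further cursor to follow
            {ok, Total + length(Items)};
        {ok, Items, NextCursor} ->
            index_all(FetchPage, NextCursor, Total + length(Items));
        {error, Reason} ->
            {error, Reason}
    end.
```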
Arweave
- Sequential: One block at a time
- Caching: Skips already-indexed blocks
- Direction: Reverse chronological (newest first)
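The three properties above combine into a simple descending walk over block heights. In this sketch the module `copycat_range_sketch`, the `FetchBlock` callback, and its return atoms are hypothetical, and treating the `to` height as inclusive is an assumption rather than documented behavior.

```erlang
%% Hypothetical sketch of a reverse-chronological block walk.
-module(copycat_range_sketch).
-export([walk/3]).

%% Visits heights From, From - 1, ..., To (To assumed inclusive).
%% FetchBlock is a hypothetical fun(Height) -> fetched | cached | not_found.
%% Returns {ok, Fetched}, the number of blocks newly fetched.
walk(From, To, FetchBlock) when is_integer(From), is_integer(To) ->
    walk(From, To, FetchBlock, 0).

walk(Height, To, _FetchBlock, Fetched) when Height < To ->
    {ok, Fetched};
walk(Height, To, FetchBlock, Fetched) ->
    case FetchBlock(Height) of
        fetched ->
            walk(Height - 1, To, FetchBlock, Fetched + 1);
        cached ->
            %% Already indexed: skip without counting
            walk(Height - 1, To, FetchBlock, Fetched);
        not_found ->
            %% Missing block: move on to the next height
            walk(Height - 1, To, FetchBlock, Fetched)
    end.
```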
Monitoring
Both engines emit events for monitoring:
% GraphQL events
?event({graphql_parsed_msgs, Count})
?event({indexer_graphql_wrote, {total, Total}})
% Arweave events
?event({arweave_block_cached, {height, Height}})
?event({arweave_block_not_found, {height, Height}})
References
- GraphQL Engine - dev_copycat_graphql.erl
- Arweave Engine - dev_copycat_arweave.erl
- Cache System - hb_cache.erl
- Gateway Client - hb_gateway_client.erl
- Arweave Device - ~arweave@2.9-pre
Notes
- Orchestrator Pattern: Delegates to specialized engines
- Extensible: Easy to add new data sources
- Cache-Aware: Avoids duplicate indexing
- Event-Driven: Emits events for monitoring
- Async-Capable: Can run indexing in background
- Gateway Integration: Works with Arweave gateways
- Direct Node Access: Can fetch directly from nodes
- Flexible Filtering: Multiple query options for GraphQL
- Range Support: Block height ranges for Arweave
- Automatic Pagination: Handles multi-page GraphQL results
- Error Resilient: Continues on individual item failures
- Simple API: Two main functions for different sources
- Stateless: No persistent state in device itself
- Cache-First: Leverages existing cached data
- Modular Design: Clean separation of concerns per engine