Indexer
Indexers are node operators in The Graph Network that stake Graph Tokens (GRT) in order to provide indexing and query processing services. Indexers earn query fees and indexing rewards for their services. They also earn from a Rebate Pool that is shared with all network contributors in proportion to their work, following the Cobb-Douglas Rebate Function.
GRT that is staked in the protocol is subject to a thawing period and can be slashed if Indexers are malicious and serve incorrect data to applications or if they index incorrectly. Indexers can also be delegated stake from Delegators, to contribute to the network.
Indexers select subgraphs to index based on the subgraph's curation signal, where Curators stake GRT in order to indicate which subgraphs are high-quality and should be prioritized. Consumers (e.g. applications) can also set parameters for which Indexers process queries for their subgraphs and set preferences for query fee pricing.
Technical Level Required
ADVANCED
FAQ
What is the minimum stake required to be an indexer on the network?
The minimum stake for an indexer is currently set to 100K GRT.
What are the revenue streams for an indexer?
Query fee rebates - Payments for serving queries on the network. These payments are mediated via state channels between an indexer and a gateway. Each query request from a gateway contains a payment and the corresponding response a proof of query result validity.
Indexing rewards - Generated via a 3% annual protocol wide inflation, the indexing rewards are distributed to indexers who are indexing subgraph deployments for the network.
How are rewards distributed?
Indexing rewards come from protocol inflation which is set to 3% annual issuance. They are distributed across subgraphs based on the proportion of all curation signal on each, then distributed proportionally to indexers based on their allocated stake on that subgraph. An allocation must be closed with a valid proof of indexing (POI) that meets the standards set by the arbitration charter in order to be eligible for rewards.
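The two-step proportional split described above can be sketched as simple arithmetic. This is only an illustration: the function name and the figures are hypothetical, and the real protocol computes rewards on-chain.

```python
def indexing_rewards(issuance, signal_on_subgraph, total_signal,
                     indexer_allocation, total_allocation_on_subgraph):
    """Rewards (in GRT) one indexer accrues on one subgraph for a period."""
    # Step 1: the subgraph's share of issuance follows its share of curation signal.
    subgraph_rewards = issuance * signal_on_subgraph / total_signal
    # Step 2: the indexer's share follows its share of stake allocated to that subgraph.
    return subgraph_rewards * indexer_allocation / total_allocation_on_subgraph


# Hypothetical period: 1,000 GRT issued, the subgraph holds 10% of all signal,
# and the indexer holds 25% of the stake allocated to that subgraph.
print(indexing_rewards(1_000, 10, 100, 25, 100))  # 25.0
```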
Numerous tools have been created by the community for calculating rewards; you'll find a collection of them organized in the Community Guides collection. You can also find an up to date list of tools in the #delegators and #indexers channels on the Discord server.
What is a proof of indexing (POI)?
POIs are used in the network to verify that an indexer is indexing the subgraphs they have allocated on. A POI for the first block of the current epoch must be submitted when closing an allocation for that allocation to be eligible for indexing rewards. A POI for a block is a digest for all entity store transactions for a specific subgraph deployment up to and including that block.
When are indexing rewards distributed?
Allocations continuously accrue rewards while they're active. Rewards are collected by the indexers and distributed whenever their allocations are closed. An allocation is closed either manually, whenever the indexer wants to force close it, or after 28 epochs, when a delegator can close the allocation for the indexer; in that case no rewards are minted. 28 epochs is the max allocation lifetime (right now, one epoch lasts for ~24h).
Can pending indexer rewards be monitored?
The RewardsManager contract has a read-only getRewards function that can be used to check the pending rewards for a specific allocation.
Many of the community-made dashboards include pending rewards values and they can be easily checked manually by following these steps:
Query the mainnet subgraph to get the IDs for all active allocations:
Use Etherscan to call getRewards():
Navigate to the Etherscan interface for the Rewards contract.
To call getRewards():
Expand the 10. getRewards dropdown.
Enter the allocationID in the input.
Click the Query button.
What are disputes and where can I view them?
An Indexer's queries and allocations can both be disputed on The Graph during the dispute period. The dispute period varies depending on the type of dispute: queries/attestations have a 7-epoch dispute window, whereas allocations have 56 epochs. After these periods pass, disputes cannot be opened against either allocations or queries. When a dispute is opened, a deposit of a minimum of 10,000 GRT is required from the Fishermen, which will be locked until the dispute is finalized and a resolution has been given. Fishermen are any network participants that open disputes.
Disputes have three possible outcomes, and each outcome also determines what happens to the Fishermen's deposit.
If the dispute is rejected, the GRT deposited by the Fishermen will be burned, and the disputed Indexer will not be slashed.
If the dispute is settled as a draw, the Fishermen's deposit will be returned, and the disputed Indexer will not be slashed.
If the dispute is accepted, the GRT deposited by the Fishermen will be returned, the disputed Indexer will be slashed and the Fishermen will earn 50% of the slashed GRT.
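The three outcomes can be summarized as a small lookup. This is a sketch only: the names are hypothetical, and the amount slashed from the Indexer is passed in as an input because it is not specified here.

```python
from enum import Enum


class Outcome(Enum):
    REJECTED = "rejected"
    DRAW = "draw"
    ACCEPTED = "accepted"


FISHERMAN_REWARD_SHARE = 0.5  # 50% of the slashed GRT, as stated above


def settle(outcome, deposit, slash_amount):
    """Return (GRT paid to the Fisherman, deposit GRT burned, GRT slashed from the Indexer)."""
    if outcome is Outcome.REJECTED:
        return 0, deposit, 0          # deposit burned, Indexer not slashed
    if outcome is Outcome.DRAW:
        return deposit, 0, 0          # deposit returned, Indexer not slashed
    reward = slash_amount * FISHERMAN_REWARD_SHARE
    return deposit + reward, 0, slash_amount  # deposit returned plus reward


# Hypothetical numbers: a 10,000 GRT deposit and a 20,000 GRT slash.
print(settle(Outcome.ACCEPTED, 10_000, 20_000))
```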
Disputes can be viewed in the UI in an Indexer's profile page under the Disputes tab.
What are query fee rebates and when are they distributed?
Query fees are collected by the gateway whenever an allocation is closed and accumulated in the subgraph's query fee rebate pool. The rebate pool is designed to encourage Indexers to allocate stake in rough proportion to the amount of query fees they earn for the network. The portion of query fees in the pool that is allocated to a particular indexer is calculated using the Cobb-Douglas Production Function; the distributed amount per indexer is a function of their contributions to the pool and their allocation of stake on the subgraph.
Once an allocation has been closed and the dispute period has passed, the rebates are available to be claimed by the indexer. Upon claiming, the query fee rebates are distributed to the indexer and their delegators based on the query fee cut and the delegation pool proportions.
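To make the shape of the rebate function concrete, here is a sketch of the general Cobb-Douglas form applied to a rebate pool. The exponent alpha is a protocol parameter; the value 0.5 below is an assumption chosen for illustration, and the function and argument names are hypothetical.

```python
def query_fee_rebate(total_fees, indexer_fees, total_stake, indexer_stake, alpha=0.5):
    """Cobb-Douglas share of a rebate pool (alpha=0.5 is an illustrative assumption)."""
    fee_share = indexer_fees / total_fees        # contribution to the pool
    stake_share = indexer_stake / total_stake    # share of allocated stake
    return total_fees * fee_share ** alpha * stake_share ** (1 - alpha)


# Stake share equal to fee share: the indexer gets back roughly what it contributed.
balanced = query_fee_rebate(1_000, 250, 100, 25)
# Same fees but a much smaller stake share: the rebate shrinks.
under_allocated = query_fee_rebate(1_000, 250, 100, 4)
print(balanced, under_allocated)
```

Under these assumptions, an indexer whose stake share matches its fee share recovers about the fees it contributed, while one that allocates proportionally less stake recovers less, which is the incentive described above.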
What is query fee cut and indexing reward cut?
The queryFeeCut and indexingRewardCut values are delegation parameters that the Indexer may set along with cooldownBlocks to control the distribution of GRT between the indexer and their delegators. See the last steps in Stake in the protocol for instructions on setting the delegation parameters.
queryFeeCut - the % of query fee rebates accumulated on a subgraph that will be distributed to the indexer. If this is set to 95%, the indexer will receive 95% of the query fee rebate pool when an allocation is claimed with the other 5% going to the delegators.
indexingRewardCut - the % of indexing rewards accumulated on a subgraph that will be distributed to the indexer. If this is set to 95%, the indexer will receive 95% of the indexing rewards pool when an allocation is closed and the delegators will split the other 5%.
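The effect of a cut can be sketched as a simple two-way split. This is illustrative only; the function name and the amounts are hypothetical.

```python
def split_by_cut(amount, indexer_cut_percent):
    """Split an amount of GRT between the indexer and the delegator pool."""
    indexer_share = amount * indexer_cut_percent / 100
    return indexer_share, amount - indexer_share


# With a 95% cut on a hypothetical 1,000 GRT, delegators share the remaining 5%.
print(split_by_cut(1_000, 95))  # (950.0, 50.0)
```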
How do indexers know which subgraphs to index?
Indexers may differentiate themselves by applying advanced techniques for making subgraph indexing decisions but to give a general idea we'll discuss several key metrics used to evaluate subgraphs in the network:
Curation signal - The proportion of network curation signal applied to a particular subgraph is a good indicator of the interest in that subgraph, especially during the bootstrap phase when query volume is ramping up.
Query fees collected - The historical data for volume of query fees collected for a specific subgraph is a good indicator of future demand.
Amount staked - Monitoring the behavior of other indexers or looking at proportions of total stake allocated towards specific subgraphs can allow an indexer to monitor the supply side for subgraph queries to identify subgraphs that the network is showing confidence in or subgraphs that may show a need for more supply.
Subgraphs with no indexing rewards - Some subgraphs do not generate indexing rewards mainly because they are using unsupported features like IPFS or because they are querying another network outside of mainnet. You will see a message on a subgraph if it is not generating indexing rewards.
What are the hardware requirements?
Small - Enough to get started indexing several subgraphs, will likely need to be expanded.
Standard - Default setup, this is what is used in the example k8s/terraform deployment manifests.
Medium - Production indexer supporting 100 subgraphs and 200-500 requests per second.
Large - Prepared to index all currently used subgraphs and serve requests for the related traffic.
Setup | Postgres (CPUs) | Postgres (memory in GBs) | Postgres (disk in TBs) | VMs (CPUs) | VMs (memory in GBs) |
---|---|---|---|---|---|
Small | 4 | 8 | 1 | 4 | 16 |
Standard | 8 | 30 | 1 | 12 | 48 |
Medium | 16 | 64 | 2 | 32 | 64 |
Large | 72 | 468 | 3.5 | 48 | 184 |
What are some basic security precautions an indexer should take?
Operator wallet - Setting up an operator wallet is an important precaution because it allows an indexer to maintain separation between the keys that control stake and those that control day-to-day operations. See Stake in the protocol for instructions.
Firewall - Only the indexer service needs to be exposed publicly, and particular attention should be paid to locking down admin ports and database access: the Graph Node JSON-RPC endpoint (default port: 8020), the indexer management API endpoint (default port: 18000), and the Postgres database endpoint (default port: 5432) should not be exposed.
Infrastructure
At the center of an indexer's infrastructure is the Graph Node, which monitors Ethereum, extracts and loads data per a subgraph definition, and serves it as a GraphQL API. The Graph Node needs to be connected to Ethereum EVM node endpoints and an IPFS node for sourcing data, a PostgreSQL database for its store, and the indexer components which facilitate its interactions with the network.
PostgreSQL database - The main store for the Graph Node, this is where subgraph data is stored. The indexer service and agent also use the database to store state channel data, cost models, and indexing rules.
Ethereum endpoint - An endpoint that exposes an Ethereum JSON-RPC API. This may take the form of a single Ethereum client or it could be a more complex setup that load balances across multiple. It's important to be aware that certain subgraphs will require particular Ethereum client capabilities such as archive mode and the tracing API.
IPFS node (version less than 5) - Subgraph deployment metadata is stored on the IPFS network. The Graph Node primarily accesses the IPFS node during subgraph deployment to fetch the subgraph manifest and all linked files. Network indexers do not need to host their own IPFS node; an IPFS node for the network is hosted at https://ipfs.network.thegraph.com.
Indexer service - Handles all required external communications with the network. Shares cost models and indexing statuses, passes query requests from gateways on to a Graph Node, and manages the query payments via state channels with the gateway.
Indexer agent - Facilitates the indexer's interactions on chain, including registering on the network, managing subgraph deployments to its Graph Node/s, and managing allocations.
Prometheus metrics server - The Graph Node and Indexer components log their metrics to the metrics server.
Note: To support agile scaling, it is recommended that query and indexing concerns are separated between different sets of nodes: query nodes and index nodes.
Ports overview
Important: Be careful about exposing ports publicly - administration ports should be kept locked down. This includes the Graph Node JSON-RPC and the indexer management endpoints detailed below.
Graph Node
Port | Purpose | Routes | CLI Argument | Environment Variable |
---|---|---|---|---|
8000 | GraphQL HTTP server (for subgraph queries) | /subgraphs/id/... /subgraphs/name/.../... | --http-port | - |
8001 | GraphQL WS (for subgraph subscriptions) | /subgraphs/id/... /subgraphs/name/.../... | --ws-port | - |
8020 | JSON-RPC (for managing deployments) | / | --admin-port | - |
8030 | Subgraph indexing status API | /graphql | --index-node-port | - |
8040 | Prometheus metrics | /metrics | --metrics-port | - |
Indexer Service
Port | Purpose | Routes | CLI Argument | Environment Variable |
---|---|---|---|---|
7600 | GraphQL HTTP server (for paid subgraph queries) | /subgraphs/id/... /status /channel-messages-inbox | --port | |
7300 | Prometheus metrics | /metrics | --metrics-port | - |
Indexer Agent
Port | Purpose | Routes | CLI Argument | Environment Variable |
---|---|---|---|---|
8000 | Indexer management API | / | --indexer-management-port | |
Setup server infrastructure using Terraform on Google Cloud
Install prerequisites
Google Cloud SDK
Kubectl command line tool
Terraform
Create a Google Cloud Project
Clone or navigate to the indexer repository.
Navigate to the ./terraform directory; this is where all commands should be executed.
Authenticate with Google Cloud and create a new project.
Use the Google Cloud Console's billing page to enable billing for the new project.
Create a Google Cloud configuration.
Enable required Google Cloud APIs.
Create a service account.
Enable peering between database and Kubernetes cluster that will be created in the next step.
Create minimal terraform configuration file (update as needed).
Use Terraform to create infrastructure
Before running any commands, read through variables.tf and create a file terraform.tfvars in this directory (or modify the one we created in the last step). For each variable where you want to override the default, or where you need to set a value, enter a setting into terraform.tfvars.
Run the following commands to create the infrastructure.
Download credentials for the new cluster into ~/.kube/config and set it as your default context.
Creating the Kubernetes components for the indexer
Copy the directory k8s/overlays to a new directory $dir, and adjust the bases entry in $dir/kustomization.yaml so that it points to the directory k8s/base.
Read through all the files in $dir and adjust any values as indicated in the comments.
Deploy all resources with kubectl apply -k $dir.
Graph Node
Graph Node is an open source Rust implementation that event sources the Ethereum blockchain to deterministically update a data store that can be queried via the GraphQL endpoint. Developers use subgraphs to define their schema and a set of mappings for transforming the data sourced from the blockchain, and the Graph Node handles syncing the entire chain, monitoring for new blocks, and serving the data via a GraphQL endpoint.
Getting started from source
Install prerequisites
Rust
PostgreSQL
IPFS
Additional Requirements for Ubuntu users - To run a Graph Node on Ubuntu a few additional packages may be needed.
Setup
Start a PostgreSQL database server
Clone the Graph Node repo and build the source by running cargo build
Now that all the dependencies are set up, start the Graph Node:
Getting started using Docker
Prerequisites
Ethereum node - By default, the docker compose setup will use mainnet: http://host.docker.internal:8545 to connect to the Ethereum node on your host machine. You can replace this network name and URL by updating docker-compose.yaml.
Setup
Clone Graph Node and navigate to the Docker directory:
For Linux users only - Use the host IP address instead of host.docker.internal in the docker-compose.yaml using the included script:
Start a local Graph Node that will connect to your Ethereum endpoint:
Indexer components
Participating successfully in the network requires almost constant monitoring and interaction, so we've built a suite of TypeScript applications for facilitating an Indexer's network participation. There are three indexer components:
Indexer agent - The agent monitors the network and the indexer's own infrastructure and manages which subgraph deployments are indexed and allocated towards on chain and how much is allocated towards each.
Indexer service - The only component that needs to be exposed externally, the service passes on subgraph queries to the Graph Node, manages state channels for query payments, and shares important decision-making information with clients such as the gateways.
Indexer CLI - The command line interface for managing the indexer agent. It allows indexers to manage cost models and indexing rules.
Getting started
The indexer agent and indexer service should be co-located with your Graph Node infrastructure. There are many ways to set up virtual execution environments for your indexer components; here we'll explain how to run them on bare metal using NPM packages or source, or via Kubernetes and Docker on the Google Cloud Kubernetes Engine. If these setup examples do not translate well to your infrastructure, there will likely be a community guide to reference; come say hi on Discord! Remember to stake in the protocol before starting up your indexer components!
From NPM packages
From source
Using docker
Pull images from the registry
Or build images locally from source
Run the components
NOTE: After starting the containers, the indexer service should be accessible at http://localhost:7600 and the indexer agent should be exposing the indexer management API at http://localhost:18000/.
Using K8s and Terraform
See the Setup Server Infrastructure Using Terraform on Google Cloud section
Usage
NOTE: All runtime configuration variables may be applied either as parameters to the command on startup or using environment variables of the format COMPONENT_NAME_VARIABLE_NAME (ex. INDEXER_AGENT_ETHEREUM).
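The naming format above can be sketched as a small helper. This is an illustration of the convention only: the helper is hypothetical, and it assumes that hyphens in component and option names become underscores.

```python
def env_var_name(component, option):
    """Build COMPONENT_NAME_VARIABLE_NAME from a component and a CLI option."""
    def normalize(text):
        # Assumption: leading dashes are dropped and inner hyphens become underscores.
        return text.strip("-").replace("-", "_").upper()

    return f"{normalize(component)}_{normalize(option)}"


print(env_var_name("indexer-agent", "--ethereum"))  # INDEXER_AGENT_ETHEREUM
```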
Indexer agent
Indexer service
Indexer CLI
The Indexer CLI is a plugin for @graphprotocol/graph-cli accessible in the terminal at graph indexer.
Indexer management using indexer CLI
The indexer agent needs input from an indexer in order to autonomously interact with the network on behalf of the indexer. The mechanism for defining indexer agent behavior is the indexing rules. Using indexing rules, an indexer can apply their specific strategy for picking subgraphs to index and serve queries for. Rules are managed via a GraphQL API served by the agent and known as the Indexer Management API. The suggested tool for interacting with the Indexer Management API is the Indexer CLI, an extension to the Graph CLI.
Usage
The Indexer CLI connects to the indexer agent, typically through port-forwarding, so the CLI does not need to run on the same server or cluster. To help you get started, and to provide some context, the CLI will briefly be described here.
graph indexer connect <url> - Connect to the indexer management API. Typically the connection to the server is opened via port forwarding, so the CLI can be easily operated remotely. (Example: kubectl port-forward pod/<indexer-agent-pod> 8000:8000)
graph indexer rules get [options] <deployment-id> [<key1> ...] - Get one or more indexing rules using all as the <deployment-id> to get all rules, or global to get the global defaults. An additional argument --merged can be used to specify that deployment specific rules are merged with the global rule. This is how they are applied in the indexer agent.
graph indexer rules set [options] <deployment-id> <key1> <value1> ... - Set one or more indexing rules.
graph indexer rules start [options] <deployment-id> - Start indexing a subgraph deployment if available and set its decisionBasis to always, so the indexer agent will always choose to index it. If the global rule is set to always then all available subgraphs on the network will be indexed.
graph indexer rules stop [options] <deployment-id> - Stop indexing a deployment and set its decisionBasis to never, so it will skip this deployment when deciding on deployments to index.
graph indexer rules maybe [options] <deployment-id> - Set the decisionBasis for a deployment to rules, so that the indexer agent will use indexing rules to decide whether to index this deployment.
All commands which display rules in the output can choose between the supported output formats (table, yaml, and json) using the -output argument.
Indexing rules
Indexing rules can either be applied as global defaults or for specific subgraph deployments using their IDs. The deployment and decisionBasis fields are mandatory, while all other fields are optional. When an indexing rule has rules as the decisionBasis, then the indexer agent will compare non-null threshold values on that rule with values fetched from the network for the corresponding deployment. If the subgraph deployment has values above (or below) any of the thresholds it will be chosen for indexing.
For example, if the global rule has a minStake of 5 (GRT), any subgraph deployment which has more than 5 (GRT) of stake allocated to it will be indexed. Threshold rules include maxAllocationPercentage, minSignal, maxSignal, minStake, and minAverageQueryFees.
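The decision logic described above can be sketched as follows. This is a simplified model rather than the agent's actual implementation: the class and function names are hypothetical, maxAllocationPercentage is left out, and the strict comparisons are an assumption based on the "more than 5 (GRT)" wording.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexingRule:
    decision_basis: str                             # "always", "never", or "rules"
    min_stake: Optional[float] = None               # GRT
    min_signal: Optional[float] = None              # GRT
    max_signal: Optional[float] = None              # GRT
    min_average_query_fees: Optional[float] = None  # GRT


def should_index(rule, stake, signal, average_query_fees):
    """Decide whether a deployment is chosen for indexing under one rule."""
    if rule.decision_basis == "always":
        return True
    if rule.decision_basis == "never":
        return False
    # Only non-null thresholds are compared; meeting any one of them is enough.
    checks = [
        rule.min_stake is not None and stake > rule.min_stake,
        rule.min_signal is not None and signal > rule.min_signal,
        rule.max_signal is not None and signal < rule.max_signal,
        rule.min_average_query_fees is not None
        and average_query_fees > rule.min_average_query_fees,
    ]
    return any(checks)


global_rule = IndexingRule(decision_basis="rules", min_stake=5)
print(should_index(global_rule, stake=6, signal=0, average_query_fees=0))  # True
print(should_index(global_rule, stake=4, signal=0, average_query_fees=0))  # False
```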
Data model:
Cost models
Cost models provide dynamic pricing for queries based on market and query attributes. The Indexer Service shares a cost model with the gateways for each subgraph for which they intend to respond to queries. The gateways, in turn, use the cost model to make indexer selection decisions per query and to negotiate payment with chosen indexers.
Agora
The Agora language provides a flexible format for declaring cost models for queries. An Agora price model is a sequence of statements that execute in order for each top-level query in a GraphQL query. For each top-level query, the first statement which matches it determines the price for that query.
A statement consists of a predicate, which is used for matching GraphQL queries, and a cost expression which, when evaluated, outputs a cost in decimal GRT. Values in the named argument position of a query may be captured in the predicate and used in the expression. Globals may also be set and substituted in for placeholders in an expression.
Example cost model:
Example query costing using the above model:
Query | Price |
---|---|
{ pairs(skip: 5000) { id } } | 0.5 GRT |
{ tokens { symbol } } | 0.1 GRT |
{ pairs(skip: 5000) { id } tokens { symbol } } | 0.6 GRT |
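The first-match rule can be modeled in a few lines. Note that this is Python, not Agora syntax, and the two statements below are hypothetical ones chosen only so that the prices agree with the table: a statement matching pairs queries with a large skip, followed by a default.

```python
from decimal import Decimal

# Each statement is a (predicate, cost expression) pair evaluated in order.
# Both statements are assumptions made for this illustration.
STATEMENTS = [
    (lambda name, args: name == "pairs" and args.get("skip", 0) > 2000,
     lambda name, args: Decimal("0.0001") * args["skip"]),
    (lambda name, args: True,                      # default: matches anything
     lambda name, args: Decimal("0.1")),
]


def price(top_level_queries):
    """Sum the cost of each top-level query using the first matching statement."""
    total = Decimal(0)
    for name, args in top_level_queries:
        for predicate, cost in STATEMENTS:
            if predicate(name, args):
                total += cost(name, args)
                break                              # later statements are ignored
    return total


print(price([("pairs", {"skip": 5000})]))                  # equals 0.5 GRT
print(price([("tokens", {})]))                             # equals 0.1 GRT
print(price([("pairs", {"skip": 5000}), ("tokens", {})]))  # equals 0.6 GRT
```

A request with two top-level queries is priced as the sum of the two, which is how a combined pairs and tokens request comes to 0.6 GRT.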
Applying the cost model
Cost models are applied via the Indexer CLI, which passes them to the Indexer Management API of the indexer agent for storing in the database. The Indexer Service will then pick them up and serve the cost models to gateways whenever they ask for them.
Interacting with the network
Stake in the protocol
The first steps to participating in the network as an Indexer are to approve the protocol, stake funds, and (optionally) set up an operator address for day-to-day protocol interactions. Note: For the purposes of these instructions Remix will be used for contract interaction, but feel free to use your tool of choice (OneClickDapp, ABItopic, and MyCrypto are a few other known tools).
Once an indexer has staked GRT in the protocol, the indexer components can be started up and begin their interactions with the network.
Approve tokens
Open the Remix app in a browser.
In the File Explorer create a file named GraphToken.abi with the token ABI.
With GraphToken.abi selected and open in the editor, switch to the Deploy and Run Transactions section in the Remix interface.
Under environment select Injected Web3 and under Account select your indexer address.
Set the GraphToken contract address - Paste the GraphToken contract address (0xc944E90C64B2c07662A292be6244BDf05Cda44a7) next to At Address and click the At address button to apply.
Call the approve(spender, amount) function to approve the Staking contract. Fill in spender with the Staking contract address (0xF55041E37E12cD407ad00CE2910B8269B01263b9) and amount with the tokens to stake (in wei).
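Because the amount is entered in wei, a whole-GRT figure has to be scaled by the token's 18 decimals first. A minimal conversion sketch (the helper name is hypothetical):

```python
from decimal import Decimal

GRT_DECIMALS = 18  # GRT uses 18 decimals, like ETH


def grt_to_wei(grt):
    """Convert a GRT amount (int, or str for fractional values) to wei."""
    # Decimal avoids the rounding errors that float arithmetic would introduce.
    return int(Decimal(str(grt)) * 10 ** GRT_DECIMALS)


# The 100K GRT minimum stake expressed in wei is 10**23.
print(grt_to_wei(100_000))
```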
Stake tokens
Open the Remix app in a browser.
In the File Explorer create a file named Staking.abi with the staking ABI.
With Staking.abi selected and open in the editor, switch to the Deploy and Run Transactions section in the Remix interface.
Under environment select Injected Web3 and under Account select your indexer address.
Set the Staking contract address - Paste the Staking contract address (0xF55041E37E12cD407ad00CE2910B8269B01263b9) next to At Address and click the At address button to apply.
Call stake() to stake GRT in the protocol.
(Optional) Indexers may approve another address to be the operator for their indexer infrastructure in order to separate the keys that control the funds from those that are performing day-to-day actions such as allocating on subgraphs and serving (paid) queries. In order to set the operator, call setOperator() with the operator address.
(Optional) In order to control the distribution of rewards and strategically attract delegators, indexers can update their delegation parameters by updating their indexingRewardCut (parts per million), queryFeeCut (parts per million), and cooldownBlocks (number of blocks). To do so, call setDelegationParameters(). The following example sets the queryFeeCut to distribute 95% of query rebates to the indexer and 5% to delegators, sets the indexingRewardCut to distribute 60% of indexing rewards to the indexer and 40% to delegators, and sets the cooldownBlocks period to 500 blocks.
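The parts-per-million values for the percentages above can be worked out as follows. This sketch only computes the numbers; the helper is hypothetical, and the argument order expected by setDelegationParameters() should be checked against the Staking ABI.

```python
PPM = 1_000_000  # parts per million: 1,000,000 corresponds to 100%


def percent_to_ppm(percent):
    """Convert a percentage to parts per million."""
    return round(percent * PPM / 100)


delegation_parameters = {
    "indexingRewardCut": percent_to_ppm(60),  # 60% of indexing rewards to the indexer
    "queryFeeCut": percent_to_ppm(95),        # 95% of query rebates to the indexer
    "cooldownBlocks": 500,
}
print(delegation_parameters)
```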
The life of an allocation
After being created by an indexer, a healthy allocation goes through four states.
Active - Once an allocation is created on-chain (allocateFrom()) it is considered active. A portion of the indexer's own and/or delegated stake is allocated towards a subgraph deployment, which allows them to claim indexing rewards and serve queries for that subgraph deployment. The indexer agent manages creating allocations based on the indexer rules.
Closed - An indexer is free to close an allocation once 1 epoch has passed (closeAllocation()), or their indexer agent will automatically close the allocation after the maxAllocationEpochs (currently 28 days). When an allocation is closed with a valid proof of indexing (POI), its indexing rewards are distributed to the indexer and its delegators (see "How are rewards distributed?" in the FAQ to learn more).
Finalized - Once an allocation has been closed there is a dispute period, after which the allocation is considered finalized and its query fee rebates are available to be claimed (claim()). The indexer agent monitors the network to detect finalized allocations and claims them if they are above a configurable (and optional) threshold, --allocation-claim-threshold.
Claimed - The final state of an allocation; it has run its course as an active allocation, all eligible rewards have been distributed and its query fee rebates have been claimed.
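The four states form a strictly linear progression, which can be sketched as a small state machine. The names below are hypothetical and only mirror the list above; the on-chain calls are noted in comments.

```python
from enum import Enum


class AllocationState(Enum):
    ACTIVE = "active"
    CLOSED = "closed"
    FINALIZED = "finalized"
    CLAIMED = "claimed"


# Each state has exactly one successor; CLAIMED is terminal.
NEXT_STATE = {
    AllocationState.ACTIVE: AllocationState.CLOSED,       # closeAllocation()
    AllocationState.CLOSED: AllocationState.FINALIZED,    # dispute period passes
    AllocationState.FINALIZED: AllocationState.CLAIMED,   # claim()
}


def advance(state):
    """Return the next state of a healthy allocation."""
    if state not in NEXT_STATE:
        raise ValueError(f"{state.name} is the final state")
    return NEXT_STATE[state]


state = AllocationState.ACTIVE
while state in NEXT_STATE:
    state = advance(state)
print(state.name)  # CLAIMED
```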