Data Structures in Blockchain for Analytics Engineers Part 2: How to Query Blockchain Data
Mastering blockchain data extraction,techniques and tools for analytics engineers
In Part 1 of our series, we explored the fundamental data structures that form the foundation of blockchain technology. We looked into linked lists, Merkle trees, hash tables, Patricia tries, and directed acyclic graphs (DAGs), providing analytics engineers with a firm understanding of blockchain architecture.

Now, in Part 2, we'll build upon that knowledge by focusing on where to find blockchain data and how to query it effectively. This guide is crucial for analytics engineers looking to apply their skills in the web3 space. We'll explore various data sources, and querying methods that will enable you to extract valuable insights from blockchain networks.
If this sounds exciting to you, let’s dive right in to learn about the intricacies of blockchain data and how to explore it.
🔍 What is Blockchain Data?
Blockchain data refers to publicly accessible information about transactions and activities that have occurred on blockchain networks such as Bitcoin, Ethereum, Polygon, and BNB Smart Chain (BSC). Often referred to as on-chain data, it serves as the lifeblood of blockchain networks, providing a transparent and immutable record of all activities. With blockchain data, it is possible to track and analyse cryptocurrency transactions, monitor smart contract interactions, identify trends in decentralized finance (DeFi) protocols, trace the movement of assets, and generate advanced market insights through blockchain data analytics.
There are three main types of blockchain data:
- Transaction Data: This is the information about individual transactions on a blockchain network, including sender and receiver addresses, the amount transferred, transaction ID, and timestamp e.g., a Bitcoin transfer from Alice to Bob. 
- Block Data: Data about a block in the blockchain includes Block ID, timestamp, list of transactions, previous block hash, and nonce e.g., a block containing 100 Bitcoin transactions. 
- Smart Contract Data: This is the information regarding smart contracts deployed to a blockchain network i.e., contract code, state, logs, and emitted events. 
🗄️Blockchain Data Sources
Blockchain data sources are key to analyzing and understanding blockchain networks. They offer access to a variety of data types, including on-chain, off-chain, and indexed data. For analytics engineers looking to access blockchain data, several options are available, which we will discuss below:
- On-chain data refers to information directly stored on the blockchain, accessible to anyone with an internet connection. There are two primary ways to access on-chain data: - Block explorers: These are web-based tools that allow users to view and interact with blockchain data, e.g., Etherscan.io, Blockchain.com, Blockcypher.com, Blockchair.com, etc. 
- Node APIs: Node APIs allow direct interaction with blockchain nodes. This method provides the most up-to-date data but requires running a node or connecting to a node service, e.g., Ethereum JSON-RPC. 
 
from web3 import Web3
# Connect to an Ethereum node (replace with your node URL)
w3 = Web3(Web3.HTTPProvider('<https://mainnet.infura.io/v3/YOUR-PROJECT-ID>'))
# Check if connected
print(w3.isConnected())
# Get balance of an address
address = '0x742d35Cc6634C0532925a3b844Bc454e4438f44e'
balance = w3.eth.get_balance(address)
print(f"Balance: {w3.from_wei(balance, 'ether')} ETH")
- Off-chain data: Unlike on-chain data, off-chain data refers to information related to blockchain activities but not stored directly on the blockchain network. This data can provide valuable context and additional insights when combined with on-chain data. It includes: - Centralized exchanges provide data on trading volumes, order books, and price movements. Examples include Coinbase, Binance, and Kraken. 
- DeFi protocols, or decentralized finance protocols, offer data on liquidity pools, yield farming, and other on-chain financial activities. Examples include Uniswap, Aave, MakerDAO, and Curve Finance. 
 
- Indexed data: Indexed data services process and organise blockchain data to make it easier to query. These services are crucial for making blockchain data accessible for complex analyses. An example of an indexed data service/source is The Graph which is a decentralized protocol for indexing and querying blockchain data. It allows developers to create and publish open APIs, called subgraphs, which make it easier to query specific data from various blockchain networks. - Other indexing services like Covalent and Moralis provide indexed blockchain data across multiple chains. They offer APIs that simplify querying and analyzing blockchain data. 
from gql import gql, Client
from gql.transport.requests import RequestsHTTPTransport
# Initialize a Graph client
transport = RequestsHTTPTransport(
    url='<https://api.thegraph.com/subgraphs/name/uniswap/uniswap-v2>'
)
client = Client(transport=transport, fetch_schema_from_transport=True)
# Define the query
query = gql('''
{
  pairs(first: 5, orderBy: reserveUSD, orderDirection: desc) {
    id
    token0 {
      symbol
    }
    token1 {
      symbol
    }
    reserveUSD
  }
}
''')
# Execute the query
result = client.execute(query)
# Print the results
for pair in result['pairs']:
    print(f"{pair['token0']['symbol']}-{pair['token1']['symbol']}: ${float(pair['reserveUSD']):,.2f}")
📊Querying Methods and Interfaces
Now that we understand the various sources of blockchain data, let's explore the different methods and interfaces used to query it. As an analytics engineer, you'll need to be familiar with these querying techniques to work effectively with blockchain data.
- REST APIs: Representational State Transfer (REST) APIs are widely used in web development and blockchain data querying. They are suitable for simple data retrieval tasks and are often the easiest to implement. They're ideal for querying specific information, such as account balances or transaction details. Many block explorers, like Etherscan, provide REST APIs for accessing blockchain data. 
- GraphQL: GraphQL is a query language that has gained popularity in recent years, including in the blockchain space. It allows for flexible, efficient querying of blockchain data. Clients can request exactly the data they need, reducing the over-fetching and under-fetching of data. GraphQL is especially useful for complex queries that fetch related data in a single request. This can significantly reduce the number of API calls needed and improve performance. 
- SQL-like queries: For analytics engineers familiar with SQL, some platforms offer SQL-like interfaces for querying blockchain data. Platforms like Dune Analytics, Flipside let users query blockchain data using SQL syntax, making it easier for those with traditional database experience to transition to web3. 
SELECT
    date_trunc('day', block_time) AS date,
    COUNT(*) AS daily_transactions
FROM ethereum.transactions
WHERE block_time >= NOW() - INTERVAL '30 days'
GROUP BY 1
ORDER BY 1
SQL-like interfaces leverage existing SQL skills, making it easier for analytics engineers to transition to web3. They also allow for complex data manipulations and aggregations familiar to SQL users.
🚀Advanced Querying Techniques
As you become more proficient with blockchain data, you'll encounter scenarios requiring advanced querying techniques. These methods enable more efficient and targeted data retrieval and analysis.
- Working with Merkle proofs: Merkle proofs, which we discussed in Part 1 of this series, allow for efficient verification of data integrity without downloading the entire blockchain. Understanding how to work with Merkle proofs is crucial for certain types of web3 analytics, especially when dealing with light clients or when verifying specific transactions without processing the entire block. 
- Querying state tries in Ethereum: State tries in Ethereum store the current state of all accounts and contracts. Querying these tries directly can provide insights into the current state of the Ethereum network without needing to process the entire transaction history. 
from web3 import Web3
w3 = Web3(Web3.HTTPProvider('<https://mainnet.infura.io/v3/YOUR-PROJECT-ID>'))
# Get the storage at a specific position for a contract
contract_address = '0x6B175474E89094C44Da98b954EedeAC495271d0F'  # DAI token contract
storage_position = 0
storage_value = w3.eth.get_storage_at(contract_address, storage_position)
print(f"Storage value: {storage_value.hex()}")
- Techniques for querying large datasets efficiently. When working with blockchain data, you'll often encounter large datasets that require efficient querying techniques: - Use pagination to break large datasets into manageable chunks 
- Implement caching strategies to reduce redundant queries 
- Utilize parallel processing for complex computations 
 
🛠️Tools and Platforms for Blockchain Data Analysis
There are several tools and platforms that facilitate the analysis of blockchain data, offering various features such as querying, visualization, and real-time insights. Key tools include:
- Dune Analytics: Dune Analytics is a powerful platform that allows users to query, visualize, and share blockchain data. It uses SQL to interact with the data and provides a collaborative environment where analysts can create and share dashboards. 
- The Graph: The Graph is a decentralized protocol for indexing and querying blockchain data. It allows developers to create and publish open APIs, called subgraphs, that applications can query using GraphQL. This makes it easier to access and analyze blockchain data. 
- Nansen: Nansen is an analytics platform for blockchain that combines on-chain data with a massive database of wallet addresses. It provides insights into market trends, token flows, and the activities of various market participants. 
- Glassnode: Glassnode provides on-chain data and intelligence for blockchain assets. It offers a range of metrics and tools for analyzing blockchain networks, market indicators, and exchange activity, helping investors and analysts make informed decisions. 
- Flipside Crypto: Flipside Crypto provides real-time, on-demand analytics on blockchain networks. It transforms blockchain data into actionable insights for various stakeholders, including developers, investors, and analysts. 
- Chainalysis: Chainalysis offers blockchain data and analysis services to government agencies, exchanges, financial institutions, and cybersecurity companies. It helps these entities monitor, investigate, and comply with regulatory requirements. 
- Santiment: Santiment provides data feeds, signals, and cryptocurrency market analysis. It aggregates data from multiple sources, including on-chain, social media, and development activity, to provide a comprehensive market view. 
- CryptoQuant: CryptoQuant is a platform that provides on-chain data, market indicators, and alerts. It offers insights to help traders and investors understand market trends and make better trading decisions. 
- Messari: Messari offers a range of tools for crypto research, including on-chain data analysis, market insights, and news aggregation. It provides a comprehensive suite of analytics tools for tracking and analyzing blockchain assets. 
🎯Summary
In this article, we explored the world of blockchain data and how to query it effectively. We covered the types of blockchain data, various data sources, querying methods and interfaces, advanced techniques, and popular tools and platforms for web3.
By understanding these concepts and mastering various querying techniques, you'll be better equipped to design efficient queries, build robust data pipelines, and derive meaningful insights from blockchain data. Whether you're tracking DeFi protocol usage, analyzing NFT trends, or monitoring network health, these tools and techniques form the foundation of blockchain analytics.
Remember, the key to successful web3 analytics is to start simple, experiment with different tools and techniques, and gradually build your expertise. Don't be afraid to dive in and get your hands dirty with real data – that's the best way to learn and grow in this exciting field.
References
- Ethereum.org. "JSON-RPCAPI."https://ethereum.org/en/developers/docs/apis/json-rpc/ 
- Etherscan. "Etherscan APIs."https://docs.etherscan.io/ 
- The Graph. "Introduction to The Graph." https://thegraph.com/docs/en/ 
- Dune Analytics. "Dune Analytics Documentation." https://dune.com/docs/ 
- Web3.py. "Web3.py Documentation."https://web3py.readthedocs.io/ 
- Uniswap. "Uniswap V2 Subgraph."https://thegraph.com/explorer/subgraph/uniswap/uniswap-v2 
- Glassnode. "Glassnode Academy."https://academy.glassnode.com/ 
- Nansen. "Nansen API Documentation."https://docs.nansen.ai/ 
- Covalent. "Covalent API Documentation." https://www.covalenthq.com/docs/api/ 
- Moralis. "Moralis Documentation." https://docs.moralis.io/ 
- Moralis. “How to Query Blockchain Data” https://moralis.io/how-to-query-blockchain-data-for-transactions-balances-and-more/ 

