Building Analytics That Scale: Our PostHog + BigQuery Architecture

November 8, 2025
Best Practices
11 min read
By OnlyLinks Team

When we set out to build OnlyLinks, we knew analytics would be mission-critical. Content creators need real-time insights into their traffic, clicks, and audience behavior—and they need it fast. But traditional analytics solutions either didn't scale well or forced us to reinvent the wheel.

Here's how we built an analytics system that handles millions of events while keeping queries blazing fast and costs under control.

The Hybrid Architecture: Best of Both Worlds

Most teams choose either a hosted analytics platform (like PostHog, Mixpanel, or Amplitude) or a data warehouse (like BigQuery, Snowflake, or Redshift). We chose both—and that makes all the difference.

PostHog handles event capture, giving us:

  • Battle-tested SDKs for client and server-side tracking
  • Automatic session recording and dead click detection
  • Rich event properties with GeoIP lookup, device detection, and browser/OS detection
  • A resilient ingestion pipeline designed not to drop events

BigQuery handles queries, giving us:

  • Sub-second queries across billions of rows
  • Materialized views that pre-aggregate data
  • Clustering and partitioning for optimal performance
  • Cost-effective storage with columnar compression

This separation of concerns is key. PostHog is optimized for capturing every single event reliably, while BigQuery is optimized for querying that data efficiently.
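
On the capture side, nothing custom is required. Here's a minimal sketch using posthog-js—the event name and properties are illustrative, not our exact schema, and the project key and host are placeholders:

import posthog from 'posthog-js';

// Initialize once on page load.
posthog.init('<YOUR_PROJECT_API_KEY>', { api_host: 'https://us.i.posthog.com' });

// Capture a link click with the properties our views later aggregate on.
// GeoIP and device fields are added automatically by PostHog's pipeline.
function trackLinkClick(destinationUrl: string, linkTitle: string) {
  posthog.capture('link_click', {
    destination_url: destinationUrl,
    link_title: linkTitle,
  });
}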

The Secret Sauce: Materialized Views

Here's where things get interesting. Raw event data is flexible but slow to query. Traditional analytics platforms force you to define metrics upfront. We found a middle ground: materialized views with intelligent clustering.

We maintain several pre-aggregated views that transform raw events into queryable metrics:

1. Hourly Traffic Aggregation

CREATE MATERIALIZED VIEW analytics.hourly_traffic
CLUSTER BY hour_timestamp, user_path
AS
SELECT
  TIMESTAMP_TRUNC(event_timestamp, HOUR) AS hour_timestamp,
  user_path,
  COUNT(*) AS view_count,
  COUNT(DISTINCT session_id) AS unique_sessions
FROM analytics.raw_events
WHERE event_type = 'page_view'
GROUP BY hour_timestamp, user_path;

Why this matters: Clustering by hour_timestamp and user_path means queries filtering by time range and user are ridiculously fast. BigQuery can skip entire data blocks that don't match your query.
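
For example, a typical time-series read against this view filters on exactly those clustered columns. A minimal sketch using the Node client—the function name and 7-day window are illustrative:

import { BigQuery } from '@google-cloud/bigquery';

const bigquery = new BigQuery();

// Both filters line up with the clustered columns (hour_timestamp, user_path),
// so BigQuery prunes blocks outside the requested window and profile.
async function getWeeklyTraffic(userPath: string) {
  const [rows] = await bigquery.query({
    query: `
      SELECT hour_timestamp, view_count, unique_sessions
      FROM analytics.hourly_traffic
      WHERE user_path = @userPath
        AND hour_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
      ORDER BY hour_timestamp`,
    params: { userPath },
  });
  return rows;
}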

2. Geographic & Device Analytics

CREATE MATERIALIZED VIEW analytics.audience_distribution
CLUSTER BY date, user_path, country_code, device_category
AS
SELECT
  DATE(event_timestamp) AS date,
  user_path,
  COALESCE(geo_country, 'Unknown') AS country_code,
  COALESCE(device_type, 'Unknown') AS device_category,
  COALESCE(browser_name, 'Unknown') AS browser_name,
  COUNT(*) AS event_count
FROM analytics.raw_events
WHERE event_type = 'page_view'
GROUP BY date, user_path, country_code, device_category, browser_name;

The clustering strategy is crucial. BigQuery allows a maximum of 4 clustering fields, so we prioritized based on our most common query patterns:

  1. date - nearly all queries filter by date range
  2. user_path - filtering by specific users/profiles
  3. country_code - geographic analytics are frequently requested
  4. device_category - device breakdowns are common

Browser is included but not clustered—a deliberate trade-off we made after analyzing our query patterns.

3. Link Engagement Tracking

CREATE MATERIALIZED VIEW analytics.link_performance
CLUSTER BY date, user_path, destination_url
AS
SELECT
  DATE(event_timestamp) AS date,
  user_path,
  destination_url,
  link_title,
  COUNT(*) AS click_count,
  COUNT(DISTINCT session_id) AS unique_clickers
FROM analytics.raw_events
WHERE event_type = 'link_click'
GROUP BY date, user_path, destination_url, link_title;

This powers our "Top Links" analytics, showing creators which specific links are driving the most engagement.
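
Here's a sketch of how we read from this view for the dashboard—the 30-day window and LIMIT are illustrative, and bigquery is the shared client from the snippet above:

// Powers the "Top Links" panel: last 30 days of clicks for one creator,
// ranked by total clicks.
async function getTopLinks(userPath: string) {
  const [rows] = await bigquery.query({
    query: `
      SELECT
        destination_url,
        link_title,
        SUM(click_count) AS clicks
      FROM analytics.link_performance
      WHERE user_path = @userPath
        AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
      GROUP BY destination_url, link_title
      ORDER BY clicks DESC
      LIMIT 10`,
    params: { userPath },
  });
  return rows;
}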

Redis Caching: The Performance Multiplier

Even with optimized BigQuery queries, hitting the database on every request adds latency. That's where Redis comes in—but not in the way you might expect.

We use multi-level caching with simple, TTL-based invalidation:

async function getAnalyticsDashboard(userId: string, timeRange: string) {
  const cacheKey = `analytics:${userId}:${timeRange}`;
  
  // Try Redis first
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // Cache miss - query BigQuery
  const data = await queryBigQueryForDashboard(userId, timeRange);
  
  // Cache for 5 minutes
  await redis.setex(cacheKey, 300, JSON.stringify(data));
  
  return data;
}

Key insight: We cache at the API response level, not individual query results. This means:

  • A single cache hit returns the entire analytics dashboard
  • Cache invalidation is simple—just a TTL
  • No complex dependency tracking between queries

For real-time campaign tracking, we implemented tiered caching with fallback behavior:

async function getCampaignMetrics(campaignId: string) {
  const inMemoryCache = memoryCache.get(campaignId);
  
  // Try in-memory cache first (faster but shorter TTL)
  if (inMemoryCache && Date.now() - inMemoryCache.timestamp < 60000) {
    return inMemoryCache.data;
  }
  
  // Try Redis cache
  const redisCache = await redis.get(`campaign:${campaignId}`);
  if (redisCache) {
    const data = JSON.parse(redisCache);
    memoryCache.set(campaignId, { data, timestamp: Date.now() });
    return data;
  }
  
  // Query database
  const freshData = await queryDatabase(campaignId);
  
  // Update both caches
  await redis.setex(`campaign:${campaignId}`, 300, JSON.stringify(freshData));
  memoryCache.set(campaignId, { data: freshData, timestamp: Date.now() });
  
  return freshData;
}

This keeps analytics available through brief database slowdowns or outages, as long as either cache still holds recent data.
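
The memoryCache referenced above isn't anything exotic—a plain Map with timestamps is enough. A minimal sketch, with field names chosen to match the snippet:

interface CacheEntry<T> {
  data: T;
  timestamp: number;
}

// Process-local cache: fastest to read, but scoped to one server instance
// and cleared on every deploy—hence the short 60-second TTL above.
const memoryCache = new Map<string, CacheEntry<unknown>>();

For long-running processes you'd also want periodic eviction (or a small LRU) so the map doesn't grow unbounded.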

Batch Querying: From N Queries to 1

Here's a problem we ran into early: agency dashboards showing analytics for 100+ creators would make 100+ separate BigQuery queries. Even with caching, cold loads were brutally slow.

Our solution? Batch everything into a single query using CTEs and array parameters.

WITH user_pageviews AS (
  SELECT
    user_path,
    SUM(view_count) AS total_views,
    SUM(unique_sessions) AS total_sessions
  FROM analytics.hourly_traffic
  WHERE user_path IN UNNEST(@user_paths)
    AND hour_timestamp >= @start_date
    AND hour_timestamp < @end_date
  GROUP BY user_path
),
user_clicks AS (
  SELECT
    user_path,
    SUM(click_count) AS total_clicks,
    COUNT(DISTINCT destination_url) AS unique_links
  FROM analytics.link_performance
  WHERE user_path IN UNNEST(@user_paths)
    AND date >= DATE(@start_date)
    AND date < DATE(@end_date)
  GROUP BY user_path
),
combined_metrics AS (
  SELECT
    pv.user_path,
    pv.total_views,
    pv.total_sessions,
    cl.total_clicks,
    cl.unique_links,
    SAFE_DIVIDE(cl.total_clicks, pv.total_views) * 100 AS ctr
  FROM user_pageviews pv
  LEFT JOIN user_clicks cl USING (user_path)
)
SELECT * FROM combined_metrics
ORDER BY total_views DESC;

Then in our application code:

async function getAgencyDashboard(agencyId: string, period: string) {
  // Get all creator paths for this agency
  const creators = await db.getAgencyCreators(agencyId);
  const userPaths = creators.map(c => c.path);
  
  // Single batched query instead of N queries
  const results = await bigquery.query({
    query: BATCH_ANALYTICS_QUERY,
    params: {
      user_paths: userPaths,
      start_date: getStartDate(period),
      end_date: new Date()
    }
  });
  
  return processResults(results);
}

The impact: Agency dashboards went from 15-30 second load times to under 2 seconds. And because we're aggregating in BigQuery instead of in our application servers, we use far less memory and CPU.

Intelligent Anomaly Detection

Here's something most analytics platforms don't do: proactive alerting for traffic anomalies.

We built automated detection that runs daily to identify unusual patterns:

Traffic Spike & Drop Detection

WITH daily_stats AS (
  SELECT
    user_path,
    DATE(hour_timestamp) AS date,
    SUM(view_count) AS daily_views
  FROM analytics.hourly_traffic
  WHERE hour_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)
  GROUP BY user_path, date
),
baseline_comparison AS (
  SELECT
    user_path,
    AVG(CASE WHEN date < CURRENT_DATE() THEN daily_views END) AS avg_views,
    MAX(CASE WHEN date = CURRENT_DATE() THEN daily_views END) AS today_views
  FROM daily_stats
  GROUP BY user_path
  HAVING avg_views > 100  -- Only users with meaningful traffic
),
anomaly_detection AS (
  SELECT
    user_path,
    today_views,
    avg_views,
    ((today_views - avg_views) / avg_views) * 100 AS percent_change,
    CASE
      WHEN today_views < avg_views * 0.5 THEN 'DROP'
      WHEN today_views > avg_views * 2.0 THEN 'SURGE'
      ELSE 'NORMAL'
    END AS alert_type
  FROM baseline_comparison
)
SELECT *
FROM anomaly_detection
WHERE alert_type != 'NORMAL';

This runs daily and alerts creators when something unusual happens—like a viral post driving 10x traffic, or a technical issue causing views to plummet.
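
The plumbing around it is simple: a daily job (Cloud Scheduler, cron, whatever you already run) executes the query and fans out notifications. A rough sketch, where ANOMALY_QUERY holds the SQL above and notifyCreator is a stand-in for your notification pipeline:

async function runAnomalyCheck() {
  // bigquery is the shared client from the earlier snippets.
  const [anomalies] = await bigquery.query({ query: ANOMALY_QUERY });

  for (const row of anomalies) {
    await notifyCreator(row.user_path, {
      type: row.alert_type,          // 'SURGE' or 'DROP'
      todayViews: row.today_views,
      baselineViews: row.avg_views,
      percentChange: row.percent_change,
    });
  }
}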

Click-Through Rate Monitoring

WITH daily_views AS (
  SELECT
    user_path,
    DATE(hour_timestamp) AS date,
    SUM(view_count) AS views
  FROM analytics.hourly_traffic
  WHERE hour_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)
  GROUP BY user_path, date
),
daily_clicks AS (
  SELECT
    user_path,
    date,
    SUM(click_count) AS clicks
  FROM analytics.link_performance
  WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY)
  GROUP BY user_path, date
),
daily_metrics AS (
  SELECT
    v.user_path,
    v.date,
    v.views,
    COALESCE(c.clicks, 0) AS clicks
  FROM daily_views v
  LEFT JOIN daily_clicks c USING (user_path, date)
),
ctr_analysis AS (
  SELECT
    user_path,
    AVG(SAFE_DIVIDE(clicks, views) * 100) AS avg_ctr,
    MAX(CASE WHEN date = CURRENT_DATE() 
        THEN SAFE_DIVIDE(clicks, views) * 100 
        END) AS today_ctr
  FROM daily_metrics
  WHERE views > 50  -- Statistical significance threshold
  GROUP BY user_path
)
SELECT
  user_path,
  today_ctr,
  avg_ctr,
  SAFE_DIVIDE(today_ctr - avg_ctr, avg_ctr) * 100 AS ctr_change_percent
FROM ctr_analysis
WHERE ABS(SAFE_DIVIDE(today_ctr - avg_ctr, avg_ctr)) > 0.5  -- 50% change threshold
ORDER BY ABS(SAFE_DIVIDE(today_ctr - avg_ctr, avg_ctr)) DESC;

Why this matters: A creator's CTR dropping 50% might indicate broken links, confusing copy, or audience mismatch. Catching this early helps creators optimize their pages before losing significant engagement.

Performance Monitoring Built-In

Every query logs its resource usage to help us optimize costs:

async function executeMonitoredQuery(sql: string, params: any) {
  const startTime = Date.now();
  
  const [job] = await bigquery.createQueryJob({ query: sql, params });
  const [rows] = await job.getQueryResults();
  const [metadata] = await job.getMetadata();
  
  const stats = {
    duration: Date.now() - startTime,
    // Job statistics come back as strings, so coerce them before doing math
    bytesProcessed: Number(metadata.statistics.query.totalBytesProcessed),
    bytesBilled: Number(metadata.statistics.query.totalBytesBilled),
    rowsReturned: rows.length
  };
  
  // Log to monitoring service
  logger.info('BigQuery execution', {
    query: sql.substring(0, 100),
    ...stats,
    costEstimate: (stats.bytesBilled / 1e12) * 5 // $5 per TB
  });
  
  // Alert if query is expensive
  if (stats.bytesBilled > 1e9) { // > 1GB
    alerts.notify('Expensive query detected', stats);
  }
  
  return rows;
}

This gives us real-time visibility into query costs. We've optimized several queries by 90%+ just by reviewing these logs.

For example, we discovered that adding a date filter reduced bytes processed from 500 MB to 50 MB for a common dashboard query—saving ~$0.002 per query, which adds up to $120/month at our query volume.
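
To evaluate a change like that before shipping it, BigQuery's dry-run mode reports the bytes a query would process without actually running or billing it. A small sketch using the same client:

// Dry runs are free: the job is validated and planned but never executed,
// and the job metadata reports what a real run would scan.
async function estimateBytesProcessed(sql: string, params?: Record<string, unknown>) {
  const [job] = await bigquery.createQueryJob({ query: sql, params, dryRun: true });
  return Number(job.metadata.statistics.totalBytesProcessed);
}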

Parallel Execution with Graceful Degradation

When loading a dashboard, we fetch all metrics in parallel with individual error handling:

async function loadDashboard(userId: string, period: string) {
  const [
    trafficData,
    topLinks,
    geoDistribution,
    deviceBreakdown,
    sessionMetrics
  ] = await Promise.allSettled([
    getTrafficTimeseries(userId, period),
    getTopPerformingLinks(userId, period),
    getGeographicDistribution(userId, period),
    getDeviceBreakdown(userId, period),
    getSessionMetrics(userId, period)
  ]);
  
  return {
    traffic: trafficData.status === 'fulfilled' ? trafficData.value : null,
    topLinks: topLinks.status === 'fulfilled' ? topLinks.value : [],
    geography: geoDistribution.status === 'fulfilled' ? geoDistribution.value : [],
    devices: deviceBreakdown.status === 'fulfilled' ? deviceBreakdown.value : [],
    sessions: sessionMetrics.status === 'fulfilled' ? sessionMetrics.value : null,
    hasErrors: [trafficData, topLinks, geoDistribution, deviceBreakdown, sessionMetrics]
      .some(r => r.status === 'rejected')
  };
}

Each query has a fallback so one failure doesn't break the entire dashboard. If the "Geographic Distribution" query fails, we show an empty state—but traffic, clicks, and other metrics still load.

Parameterized Queries for Security & Performance

All queries use named parameters instead of string interpolation:

const sql = `
  SELECT
    user_path,
    SUM(view_count) AS total_views
  FROM analytics.hourly_traffic
  WHERE user_path = @userId
    AND hour_timestamp >= @startDate
    AND hour_timestamp < @endDate
  GROUP BY user_path
`;

const results = await bigquery.query({
  query: sql,
  params: {
    userId: '/creator-username',
    startDate: '2024-01-01T00:00:00Z',
    endDate: '2024-01-31T23:59:59Z'
  }
});

This provides:

  • SQL injection protection (even though we're server-side)
  • Better query plan caching in BigQuery (identical queries with different params reuse plans)
  • Type safety for parameters
  • Cleaner, more maintainable code

What We Learned

Building this analytics system taught us a few key lessons:

  1. Don't reinvent event capture. Use a proven platform like PostHog, Segment, or RudderStack. Focus your energy on the analytics experience, not the infrastructure.

  2. Materialized views are a cheat code. The upfront cost of defining views pays off 100x in query performance. Start with simple aggregations and iterate based on usage.

  3. Cluster by your query patterns. Study your most common queries and cluster accordingly. Don't just cluster by date—consider user paths, categories, or other frequently-filtered dimensions.

  4. Cache aggressively, invalidate simply. TTL-based caching is good enough for 99% of analytics use cases. Don't over-engineer it with complex invalidation logic.

  5. Batch queries ruthlessly. One complex query with CTEs is almost always faster than many simple queries, especially across a network. Plus, you save on per-query overhead.

  6. Build in observability from day one. Logging bytes processed helped us optimize costs before they became a problem. Treat query performance as a first-class metric.

  7. Graceful degradation > perfect data. It's better to show partial analytics than to show nothing when one query fails. Users understand temporary gaps—they don't understand blank screens.

  8. Anomaly detection adds massive value. Proactive alerts about traffic changes help users catch problems early and capitalize on viral moments. This feature gets mentioned in user feedback constantly.

The Results

Today, our analytics system handles:

  • 10M+ events per day ingested through PostHog
  • Sub-second queries for individual creator dashboards
  • 2-second load times for agency dashboards aggregating 100+ creators
  • 99.9% uptime with automatic failover to cached data

And most importantly: creators love it. They trust the data, rely on the alerts, and use insights to optimize their pages. We see 70%+ of active users checking their analytics at least weekly.

Technical Architecture Summary

Here's a quick reference of our stack:

  • Event Capture: PostHog (exports to BigQuery)
  • Data Warehouse: Google BigQuery
  • Caching: Redis
  • API Layer: tRPC for type-safe endpoints
  • Frontend: React + Recharts for visualizations
  • Monitoring: Custom logging + CloudWatch alerts
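
As a taste of how these pieces connect, here's a simplified sketch of an analytics procedure in tRPC, reusing the loadDashboard function from earlier—the router shape and period values are illustrative:

import { initTRPC } from '@trpc/server';
import { z } from 'zod';

const t = initTRPC.create();

export const analyticsRouter = t.router({
  dashboard: t.procedure
    .input(z.object({
      userId: z.string(),
      period: z.enum(['7d', '30d', '90d']),
    }))
    // Delegates to the loadDashboard function shown earlier; the client
    // gets end-to-end types for free.
    .query(({ input }) => loadDashboard(input.userId, input.period)),
});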

Next Steps

We're continuously improving our analytics system. On the roadmap:

  • Predictive analytics: Use historical patterns to forecast future traffic
  • Cohort analysis: Track user retention and engagement over time
  • Custom metrics: Let creators define their own KPIs
  • Real-time streaming: Sub-minute data freshness for critical metrics
  • Machine learning alerts: Smarter anomaly detection using ML models

Want to see it in action? Sign up for OnlyLinks and check out the analytics dashboard. It's fast, reliable, and built with the techniques described in this post.

Have questions about our architecture? We love talking about analytics infrastructure. Email us at [email protected].


Written by James Ash • Published November 2025

The OnlyLinks team is dedicated to helping creators optimize their online presence with powerful link management tools and insights.
