Building Analytics That Actually Scale: Our PostHog + BigQuery Architecture
When we set out to build OnlyLinks, we knew analytics would be mission-critical. Content creators need real-time insights into their traffic, clicks, and audience behavior—and they need it fast. But traditional analytics solutions either didn't scale well or meant building everything from scratch.
Here's how we built an analytics system that handles millions of events while keeping queries blazing fast and costs under control.
The Hybrid Architecture: Best of Both Worlds
Most teams choose either a hosted analytics platform (like PostHog, Mixpanel, or Amplitude) or a data warehouse (like BigQuery, Snowflake, or Redshift). We chose both—and that makes all the difference.
PostHog handles event capture, giving us:
- Battle-tested SDKs for client and server-side tracking
- Automatic session recording and dead click detection
- Rich event properties with GeoIP lookup plus device and browser detection
- A robust ingestion pipeline designed not to drop events
BigQuery handles queries, giving us:
- Sub-second queries across billions of rows
- Materialized views that pre-aggregate data
- Clustering and partitioning for optimal performance
- Cost-effective storage with columnar compression
This separation of concerns is key. PostHog is optimized for capturing every single event reliably, while BigQuery is optimized for querying that data efficiently.
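Capture itself stays simple on our side. As a rough illustration (the event and property names here are assumptions that mirror the BigQuery columns used later in this post, not PostHog defaults), a server-side capture with posthog-node looks something like this:
import { PostHog } from 'posthog-node';

// Shared server-side client; posthog-node batches events and flushes them in the background.
const posthog = new PostHog(process.env.POSTHOG_API_KEY!, {
  host: 'https://us.i.posthog.com',
});

// Illustrative helper for recording a link click.
export function captureLinkClick(sessionId: string, userPath: string, destinationUrl: string, linkTitle: string) {
  posthog.capture({
    distinctId: sessionId,
    event: 'link_click',
    properties: {
      user_path: userPath,
      destination_url: destinationUrl,
      link_title: linkTitle,
    },
  });
}

// Call posthog.shutdown() on process exit so buffered events are flushed.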
The Secret Sauce: Materialized Views
Here's where things get interesting. Raw event data is flexible but slow to query, while pre-defined metrics are fast but rigid. We found a middle ground: materialized views with intelligent clustering.
We maintain several pre-aggregated views that transform raw events into queryable metrics:
1. Hourly Traffic Aggregation
CREATE MATERIALIZED VIEW analytics.hourly_traffic
CLUSTER BY hour_timestamp, user_path
AS
SELECT
TIMESTAMP_TRUNC(event_timestamp, HOUR) AS hour_timestamp,
user_path,
COUNT(*) AS view_count,
COUNT(DISTINCT session_id) AS unique_sessions
FROM analytics.raw_events
WHERE event_type = 'page_view'
GROUP BY hour_timestamp, user_path;
Why this matters: Clustering by hour_timestamp and user_path means queries filtering by time range and user are ridiculously fast. BigQuery can skip entire data blocks that don't match your query.
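As a quick illustration (the helper name and wiring are ours for this post, not part of the view definition), a timeseries read that lines up with the clustering keys looks something like this:
import { BigQuery } from '@google-cloud/bigquery';

const bigquery = new BigQuery();

// Both filters map onto the clustering keys (hour_timestamp, user_path),
// so BigQuery can prune blocks instead of scanning the whole view.
async function queryHourlyTraffic(userPath: string, startDate: Date, endDate: Date) {
  const sql = `
    SELECT hour_timestamp, view_count, unique_sessions
    FROM analytics.hourly_traffic
    WHERE user_path = @userPath
      AND hour_timestamp >= @startDate
      AND hour_timestamp < @endDate
    ORDER BY hour_timestamp
  `;
  const [rows] = await bigquery.query({
    query: sql,
    params: { userPath, startDate, endDate }, // Date values are sent as TIMESTAMP parameters
  });
  return rows;
}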
2. Geographic & Device Analytics
CREATE MATERIALIZED VIEW analytics.audience_distribution
CLUSTER BY date, user_path, country_code, device_category
AS
SELECT
DATE(event_timestamp) AS date,
user_path,
COALESCE(geo_country, 'Unknown') AS country_code,
COALESCE(device_type, 'Unknown') AS device_category,
COALESCE(browser_name, 'Unknown') AS browser_name,
COUNT(*) AS event_count
FROM analytics.raw_events
WHERE event_type = 'page_view'
GROUP BY date, user_path, country_code, device_category, browser_name;
The clustering strategy is crucial. BigQuery allows a maximum of 4 clustering fields, so we prioritized based on our most common query patterns:
1. date - nearly all queries filter by a date range
2. user_path - filtering by specific users/profiles
3. country_code - geographic analytics are frequently requested
4. device_category - device breakdowns are common
Browser is included but not clustered—a deliberate trade-off we made after analyzing our query patterns.
3. Link Engagement Tracking
CREATE MATERIALIZED VIEW analytics.link_performance
CLUSTER BY date, user_path, destination_url
AS
SELECT
DATE(event_timestamp) AS date,
user_path,
destination_url,
link_title,
COUNT(*) AS click_count,
COUNT(DISTINCT session_id) AS unique_clickers
FROM analytics.raw_events
WHERE event_type = 'link_click'
GROUP BY date, user_path, destination_url, link_title;
This powers our "Top Links" analytics, showing creators which specific links are driving the most engagement.
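Reading from the view is a single grouped query. Here's an illustrative sketch (the helper name and the LIMIT are ours; note that summing daily unique_clickers over-counts people who click on multiple days, so treat that column as approximate):
import { BigQuery } from '@google-cloud/bigquery';

const bigquery = new BigQuery();

// Illustrative "Top Links" read for one creator over a date range.
async function queryTopLinks(userPath: string, startDate: Date, endDate: Date) {
  const sql = `
    SELECT
      destination_url,
      link_title,
      SUM(click_count) AS clicks,
      SUM(unique_clickers) AS approx_unique_clickers
    FROM analytics.link_performance
    WHERE user_path = @userPath
      AND date >= DATE(@startDate)   -- cast TIMESTAMP params for the DATE column
      AND date < DATE(@endDate)
    GROUP BY destination_url, link_title
    ORDER BY clicks DESC
    LIMIT 10
  `;
  const [rows] = await bigquery.query({
    query: sql,
    params: { userPath, startDate, endDate },
  });
  return rows;
}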
Redis Caching: The Performance Multiplier
Even with optimized BigQuery queries, hitting the database on every request adds latency. That's where Redis comes in—but not in the way you might expect.
We use multi-level caching with intelligent invalidation:
async function getAnalyticsDashboard(userId: string, timeRange: string) {
const cacheKey = `analytics:${userId}:${timeRange}`;
// Try Redis first
const cached = await redis.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
// Cache miss - query BigQuery
const data = await queryBigQueryForDashboard(userId, timeRange);
// Cache for 5 minutes
await redis.setex(cacheKey, 300, JSON.stringify(data));
return data;
}
Key insight: We cache at the API response level, not individual query results. This means:
- A single cache hit returns the entire analytics dashboard
- Cache invalidation is simple—just a TTL
- No complex dependency tracking between queries
For real-time campaign tracking, we implemented tiered caching with fallback behavior:
async function getCampaignMetrics(campaignId: string) {
const inMemoryCache = memoryCache.get(campaignId);
// Try in-memory cache first (faster but shorter TTL)
if (inMemoryCache && Date.now() - inMemoryCache.timestamp < 60000) {
return inMemoryCache.data;
}
// Try Redis cache
const redisCache = await redis.get(`campaign:${campaignId}`);
if (redisCache) {
const data = JSON.parse(redisCache);
memoryCache.set(campaignId, { data, timestamp: Date.now() });
return data;
}
// Query database
const freshData = await queryDatabase(campaignId);
// Update both caches
await redis.setex(`campaign:${campaignId}`, 300, JSON.stringify(freshData));
memoryCache.set(campaignId, { data: freshData, timestamp: Date.now() });
return freshData;
}
This ensures analytics are always available, even during database maintenance or temporary outages.
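The snippet above only shows the happy path; the outage resilience comes from keeping a longer-lived copy of each payload around and serving it when the fresh query fails. Roughly, with illustrative key names and TTLs:
// Write path: refresh both the short-lived cache and a long-lived fallback copy.
async function writeCampaignCaches(campaignId: string, data: unknown) {
  const payload = JSON.stringify(data);
  await redis.setex(`campaign:${campaignId}`, 300, payload);          // 5 minutes
  await redis.setex(`campaign:stale:${campaignId}`, 86400, payload);  // 24-hour fallback
}

// Read path: if the normal tiered lookup throws, fall back to the stale copy.
async function getCampaignMetricsWithFallback(campaignId: string) {
  try {
    return await getCampaignMetrics(campaignId);
  } catch (err) {
    const stale = await redis.get(`campaign:stale:${campaignId}`);
    if (stale) {
      return { ...JSON.parse(stale), stale: true };
    }
    throw err; // nothing cached at all, surface the error
  }
}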
Batch Querying: From N Queries to 1
Here's a problem we ran into early: agency dashboards showing analytics for 100+ creators would make 100+ separate BigQuery queries. Even with caching, cold loads were brutally slow.
Our solution? Batch everything into a single query using CTEs and array parameters.
WITH user_pageviews AS (
SELECT
user_path,
SUM(view_count) AS total_views,
SUM(unique_sessions) AS total_sessions
FROM analytics.hourly_traffic
WHERE user_path IN UNNEST(@user_paths)
AND hour_timestamp >= @start_date
AND hour_timestamp < @end_date
GROUP BY user_path
),
user_clicks AS (
SELECT
user_path,
SUM(click_count) AS total_clicks,
COUNT(DISTINCT destination_url) AS unique_links
FROM analytics.link_performance
WHERE user_path IN UNNEST(@user_paths)
AND date >= DATE(@start_date)  -- params arrive as TIMESTAMPs, so cast for the DATE column
AND date < DATE(@end_date)
GROUP BY user_path
),
combined_metrics AS (
SELECT
pv.user_path,
pv.total_views,
pv.total_sessions,
cl.total_clicks,
cl.unique_links,
SAFE_DIVIDE(cl.total_clicks, pv.total_views) * 100 AS ctr
FROM user_pageviews pv
LEFT JOIN user_clicks cl USING (user_path)
)
SELECT * FROM combined_metrics
ORDER BY total_views DESC;
Then in our application code:
async function getAgencyDashboard(agencyId: string, period: string) {
// Get all creator paths for this agency
const creators = await db.getAgencyCreators(agencyId);
const userPaths = creators.map(c => c.path);
// Single batched query instead of N queries
const [rows] = await bigquery.query({
query: BATCH_ANALYTICS_QUERY,
params: {
user_paths: userPaths,
start_date: getStartDate(period),
end_date: new Date()
}
});
return processResults(rows);
}
The impact: agency dashboard load times dropped from 15-30 seconds to under 2 seconds. And because the heavy lifting happens in BigQuery instead of on our application servers, we use far less memory and CPU.
Intelligent Anomaly Detection
Here's something most analytics dashboards don't give you out of the box: proactive alerting for traffic anomalies.
We built automated detection that runs daily to identify unusual patterns:
Traffic Spike & Drop Detection
WITH daily_stats AS (
SELECT
user_path,
DATE(hour_timestamp) AS date,
SUM(view_count) AS daily_views
FROM analytics.hourly_traffic
WHERE hour_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)
GROUP BY user_path, date
),
baseline_comparison AS (
SELECT
user_path,
AVG(CASE WHEN date < CURRENT_DATE() THEN daily_views END) AS avg_views,
MAX(CASE WHEN date = CURRENT_DATE() THEN daily_views END) AS today_views
FROM daily_stats
GROUP BY user_path
HAVING avg_views > 100 -- Only users with meaningful traffic
),
anomaly_detection AS (
SELECT
user_path,
today_views,
avg_views,
((today_views - avg_views) / avg_views) * 100 AS percent_change,
CASE
WHEN today_views < avg_views * 0.5 THEN 'DROP'
WHEN today_views > avg_views * 2.0 THEN 'SURGE'
ELSE 'NORMAL'
END AS alert_type
FROM baseline_comparison
)
SELECT *
FROM anomaly_detection
WHERE alert_type != 'NORMAL';
This runs daily and alerts creators when something unusual happens—like a viral post driving 10x traffic, or a technical issue causing views to plummet.
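The query does the heavy lifting; the job around it is thin. A minimal sketch, assuming a daily scheduler and a hypothetical notifyCreator helper for delivering the alert:
// ANOMALY_QUERY holds the SQL above; this runs once a day from a scheduled job.
const ANOMALY_QUERY = `/* traffic spike & drop detection query from above */`;

async function runDailyAnomalyCheck() {
  const [anomalies] = await bigquery.query({ query: ANOMALY_QUERY });

  for (const row of anomalies) {
    // notifyCreator is a hypothetical helper that emails or pushes the alert.
    await notifyCreator(row.user_path, {
      type: row.alert_type,          // 'DROP' or 'SURGE'
      todayViews: row.today_views,
      baselineViews: row.avg_views,
      percentChange: row.percent_change,
    });
  }
}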
Click-Through Rate Monitoring
WITH daily_views AS (
SELECT
user_path,
DATE(hour_timestamp) AS date,
SUM(view_count) AS views
FROM analytics.hourly_traffic
WHERE hour_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)
GROUP BY user_path, date
),
daily_clicks AS (
SELECT
user_path,
date,
SUM(click_count) AS clicks
FROM analytics.link_performance
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 14 DAY)
GROUP BY user_path, date
),
daily_metrics AS (
SELECT
dv.user_path,
dv.date,
dv.views,
COALESCE(dc.clicks, 0) AS clicks
FROM daily_views dv
LEFT JOIN daily_clicks dc USING (user_path, date)
),
ctr_analysis AS (
SELECT
user_path,
AVG(SAFE_DIVIDE(clicks, views) * 100) AS avg_ctr,
MAX(CASE WHEN date = CURRENT_DATE()
THEN SAFE_DIVIDE(clicks, views) * 100
END) AS today_ctr
FROM daily_metrics
WHERE views > 50 -- Statistical significance threshold
GROUP BY user_path
)
SELECT
user_path,
today_ctr,
avg_ctr,
SAFE_DIVIDE(today_ctr - avg_ctr, avg_ctr) * 100 AS ctr_change_percent
FROM ctr_analysis
WHERE ABS(SAFE_DIVIDE(today_ctr - avg_ctr, avg_ctr)) > 0.5 -- 50% change threshold
ORDER BY ABS(SAFE_DIVIDE(today_ctr - avg_ctr, avg_ctr)) DESC;
Why this matters: A creator's CTR dropping 50% might indicate broken links, confusing copy, or audience mismatch. Catching this early helps creators optimize their pages before losing significant engagement.
Performance Monitoring Built-In
Every query logs its resource usage to help us optimize costs:
async function executeMonitoredQuery(sql: string, params: any) {
const startTime = Date.now();
const [job] = await bigquery.createQueryJob({ query: sql, params });
const [rows] = await job.getQueryResults();
const [metadata] = await job.getMetadata();
const stats = {
duration: Date.now() - startTime,
bytesProcessed: Number(metadata.statistics.totalBytesProcessed),
bytesBilled: Number(metadata.statistics.query.totalBytesBilled),
rowsReturned: rows.length
};
// Log to monitoring service
logger.info('BigQuery execution', {
query: sql.substring(0, 100),
...stats,
costEstimate: (stats.bytesBilled / 1e12) * 5 // $5 per TB
});
// Alert if query is expensive
if (stats.bytesBilled > 1e9) { // > 1GB
alerts.notify('Expensive query detected', stats);
}
return rows;
}
This gives us real-time visibility into query costs. We've optimized several queries by 90%+ just by reviewing these logs.
For example, we discovered that adding a date filter reduced bytes processed from 500 MB to 50 MB for a common dashboard query—saving ~$0.002 per query, which adds up to $120/month at our query volume.
Parallel Execution with Graceful Degradation
When loading a dashboard, we fetch all metrics in parallel with individual error handling:
async function loadDashboard(userId: string, period: string) {
const [
trafficData,
topLinks,
geoDistribution,
deviceBreakdown,
sessionMetrics
] = await Promise.allSettled([
getTrafficTimeseries(userId, period),
getTopPerformingLinks(userId, period),
getGeographicDistribution(userId, period),
getDeviceBreakdown(userId, period),
getSessionMetrics(userId, period)
]);
return {
traffic: trafficData.status === 'fulfilled' ? trafficData.value : null,
topLinks: topLinks.status === 'fulfilled' ? topLinks.value : [],
geography: geoDistribution.status === 'fulfilled' ? geoDistribution.value : [],
devices: deviceBreakdown.status === 'fulfilled' ? deviceBreakdown.value : [],
sessions: sessionMetrics.status === 'fulfilled' ? sessionMetrics.value : null,
hasErrors: [trafficData, topLinks, geoDistribution, deviceBreakdown, sessionMetrics]
.some(r => r.status === 'rejected')
};
}
Each query has a fallback so one failure doesn't break the entire dashboard. If the "Geographic Distribution" query fails, we show an empty state—but traffic, clicks, and other metrics still load.
Parameterized Queries for Security & Performance
All queries use named parameters instead of string interpolation:
const sql = `
SELECT
user_path,
SUM(view_count) AS total_views
FROM analytics.hourly_traffic
WHERE user_path = @userId
AND hour_timestamp >= @startDate
AND hour_timestamp < @endDate
GROUP BY user_path
`;
const results = await bigquery.query({
query: sql,
params: {
userId: '/creator-username',
startDate: new Date('2024-01-01T00:00:00Z'), // Date objects are sent as TIMESTAMP params
endDate: new Date('2024-02-01T00:00:00Z')    // exclusive upper bound covers all of January
}
});
This provides:
- SQL injection protection (even though we're server-side)
- Consistent query text, which makes caching and cost monitoring more predictable (only the parameter values change between requests)
- Type safety for parameters
- Cleaner, more maintainable code
What We Learned
Building this analytics system taught us a few key lessons:
Don't reinvent event capture. Use a proven platform like PostHog, Segment, or RudderStack. Focus your energy on the analytics experience, not the infrastructure.
Materialized views are a cheat code. The upfront cost of defining views pays off 100x in query performance. Start with simple aggregations and iterate based on usage.
Cluster by your query patterns. Study your most common queries and cluster accordingly. Don't just cluster by date—consider user paths, categories, or other frequently-filtered dimensions.
Cache aggressively, invalidate simply. TTL-based caching is good enough for 99% of analytics use cases. Don't over-engineer it with complex invalidation logic.
Batch queries ruthlessly. One complex query with CTEs is almost always faster than many simple queries, especially across a network. Plus, you save on per-query overhead.
Build in observability from day one. Logging bytes processed helped us optimize costs before they became a problem. Treat query performance as a first-class metric.
Graceful degradation > perfect data. It's better to show partial analytics than to show nothing when one query fails. Users understand temporary gaps—they don't understand blank screens.
Anomaly detection adds massive value. Proactive alerts about traffic changes help users catch problems early and capitalize on viral moments. This feature gets mentioned in user feedback constantly.
The Results
Today, our analytics system handles:
- 10M+ events per day ingested through PostHog
- Sub-second queries for individual creator dashboards
- 2-second load times for agency dashboards aggregating 100+ creators
- 99.9% uptime with automatic failover to cached data
And most importantly: creators love it. They trust the data, rely on the alerts, and use insights to optimize their pages. We see 70%+ of active users checking their analytics at least weekly.
Technical Architecture Summary
Here's a quick reference of our stack:
- Event Capture: PostHog (exports to BigQuery)
- Data Warehouse: Google BigQuery
- Caching: Redis
- API Layer: tRPC for type-safe endpoints (see the sketch after this list)
- Frontend: React + Recharts for visualizations
- Monitoring: Custom logging + CloudWatch alerts
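To make the API layer above concrete, the dashboard endpoint is roughly a tRPC query procedure wrapping loadDashboard. The router shape, input schema, and period values below are illustrative:
import { initTRPC } from '@trpc/server';
import { z } from 'zod';
// loadDashboard is the function from the "Parallel Execution" section above.

const t = initTRPC.create();

export const analyticsRouter = t.router({
  dashboard: t.procedure
    .input(z.object({
      userId: z.string(),
      period: z.enum(['7d', '30d', '90d']), // illustrative period values
    }))
    .query(({ input }) => loadDashboard(input.userId, input.period)),
});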
Next Steps
We're continuously improving our analytics system. On the roadmap:
- Predictive analytics: Use historical patterns to forecast future traffic
- Cohort analysis: Track user retention and engagement over time
- Custom metrics: Let creators define their own KPIs
- Real-time streaming: Sub-minute data freshness for critical metrics
- Machine learning alerts: Smarter anomaly detection using ML models
Want to see it in action? Sign up for OnlyLinks and check out the analytics dashboard. It's fast, reliable, and built with the techniques described in this post.
Have questions about our architecture? We love talking about analytics infrastructure. Email us at [email protected].
Written by James Ash • Published November 2025
