From Normalization to Flexibility: A Shift in Database Thinking#
In the 1970s and 80s, when relational databases were king, schema design revolved around one principle: normalization. The goal was to reduce redundancy, optimize storage, and enforce strict data integrity.
But times have changed. Storage is abundant, compute is elastic, and the cost-performance trade-offs are completely different. The rise of cloud-native systems and big data platforms has upended traditional database wisdom.

The Trade-offs of Modern Schema Design#
In a modern analytics pipeline, three patterns matter more than ever:
- Data access pattern: how data is read
- Storage pattern: how data is stored
- Compute pattern: how data is processed
Each pattern comes with its own constraints - especially in terms of latency, cost, and fault tolerance. Schema design today is no longer just a modeling exercise; it's a business decision that directly impacts performance and reliability.
UserBird: A Schemaless, Append-Only Approach#
At UserBird, we designed our ingest pipeline to be:
- Fast
- Resilient
- Cheap
Our architecture allows the ingest pipeline to remain fully operational even if the rest of the application is offline. We made a deliberate decision to go schemaless at ingest.
We simply write timestamped JSON payloads to an append-only time series store. There's no enforced schema, no validation logic, and very little code involved. This has several advantages:
- Ingest is extremely fast
- Failure points are minimized
- We're future-proofed against schema evolution
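The ingest path described above can be sketched in a few lines. This is a minimal stand-in, not UserBird's actual pipeline: a local JSONL file plays the role of the append-only time series store, and the event fields are invented for illustration. Note how little can go wrong: no schema check, no validation, just serialize and append.

```python
import json
import time
from pathlib import Path

def ingest(event: dict, log_path: Path) -> None:
    """Append a timestamped JSON payload. No schema, no validation."""
    record = {"ts": time.time(), "payload": event}
    # Append-only: one JSON document per line, existing lines are never rewritten.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log = Path("events.jsonl")
ingest({"event": "pageview", "path": "/pricing"}, log)
# A payload with brand-new fields needs no migration, it just gets appended:
ingest({"event": "click", "button": "signup", "variant": 2}, log)
```

Because nothing inspects the payload, a new field (like `variant` above) costs nothing at write time; its meaning is decided later, at read time.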
The Cost of Flexibility: Expensive Queries#
The trade-off is clear: while ingest is cheap, querying becomes expensive. Parsing, filtering, and aggregating semi-structured data on the fly is compute-intensive.
But here's the key point: analytics queries are almost always human-facing. An extra 500ms of latency is acceptable when you're rendering a chart or viewing a report. And we mitigate even this by precaching common computations and leveraging BigQuery’s parallel execution model.
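The precaching idea can be sketched with nothing more than an in-process memoizer; this assumes nothing about UserBird's actual caching layer, and all names and data below are invented. The first call pays for a full scan over raw JSON events; repeat calls are served from the cache.

```python
import json
from functools import lru_cache

# Invented raw event log: three pageviews on one day, one on the next.
EVENTS = [json.dumps({"day": "2025-01-01", "event": "pageview"})] * 3 + \
         [json.dumps({"day": "2025-01-02", "event": "pageview"})]

@lru_cache(maxsize=1024)
def pageviews(day: str) -> int:
    # Stand-in for an expensive aggregate over raw JSON payloads.
    return sum(1 for raw in EVENTS
               if (e := json.loads(raw))["day"] == day
               and e["event"] == "pageview")

print(pageviews("2025-01-01"))  # 3, computed by scanning
print(pageviews("2025-01-01"))  # 3, served from the cache
```

In a real pipeline the cache would live outside the process (a results table, a materialized view), but the shape of the optimization is the same: pay the JSON-parsing cost once per distinct query, not once per page load.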
When a query runs, we dynamically explode the JSON into virtual tables and execute SQL across them. This gives us schema-on-read flexibility at a performance cost users rarely notice.
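The explode-and-query step can be illustrated with SQLite standing in for BigQuery (the post's actual engine). Assuming a SQLite build with the JSON functions enabled, which is standard in recent Python distributions, `json_extract` carves virtual columns out of the raw payloads at query time; the table itself knows nothing about the event shape.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# The stored "schema" is just a timestamp and an opaque JSON string.
conn.execute("CREATE TABLE events (ts REAL, payload TEXT)")
rows = [
    (1.0, json.dumps({"event": "pageview", "path": "/"})),
    (2.0, json.dumps({"event": "pageview", "path": "/pricing"})),
    (3.0, json.dumps({"event": "click", "button": "signup"})),
]
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

# Schema-on-read: columns exist only in the query, not in the table.
counts = conn.execute("""
    SELECT json_extract(payload, '$.event') AS event, COUNT(*) AS n
    FROM events
    GROUP BY event
    ORDER BY n DESC
""").fetchall()
print(counts)  # [('pageview', 2), ('click', 1)]
```

Adding a new event type or field requires no `ALTER TABLE`; only the queries that care about it change.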
Why Raw Data Wins Long-Term#
This model also allows us to evolve our schema retrospectively without destructive migrations. We always store raw, unprocessed data. Any transformation or modeling can be reapplied later, safely and transparently.
This prevents data loss and avoids schema drift bugs. It also makes debugging and analytics reproducibility much easier.
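Because the raw payloads are kept, a transformation is just a function that can be re-run over history. A toy sketch of retrospective schema evolution (event shapes and function names are invented): version 2 of the model fixes a normalization bug, and re-deriving the view is a replay, not a migration.

```python
import json

# Raw payloads stored at ingest, never mutated.
raw_events = [
    json.dumps({"event": "signup", "plan": "Pro"}),
    json.dumps({"event": "signup", "plan": "free"}),
]

def model_v1(raw: str) -> dict:
    # Original transformation: take the plan field as-is.
    return {"plan": json.loads(raw)["plan"]}

def model_v2(raw: str) -> dict:
    # Later fix: normalize case. Re-run over the same raw data; nothing is lost.
    return {"plan": json.loads(raw)["plan"].lower()}

print([model_v1(r)["plan"] for r in raw_events])  # ['Pro', 'free']
print([model_v2(r)["plan"] for r in raw_events])  # ['pro', 'free']
```

Had the v1 output been the only stored copy, the casing bug would be baked in forever; with raw data, every derived view is reproducible and disposable.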
A New Paradigm for Data Systems#
What would have seemed reckless in the 1980s - throwing away schema enforcement, duplicating data, wasting CPU cycles - makes perfect sense in 2025.
Cloud economics has flipped the equation. Storage is cheap. Compute is elastic. Human attention is expensive. And flexibility is king.