Time—“A finite extent or stretch of continued existence, as the interval separating two successive events or actions, or the period during which an action, condition, or state continues.” – Oxford Dictionary
Late in the session. You’ve been drilling for an hour, and the coach points at you and a fresh partner. “Switch.” You’re tired, they’re not, and you can feel it immediately. Combinations that landed clean twenty minutes ago are getting blocked. You’re a half-step slow on everything. “You in round one is not you in round five,” the coach says as you’re getting worked over. “Respect the clock.” Same problem as before: you need better conditioning. And that will come with time…
It goes beyond one session. The version of you who walked in a few months ago could barely throw a proper jab. The version of you today can string together combinations. The people who’ve been at the gym for years will tell you the same thing—they barely recognize their old selves. Entities change over time. Attributes expire. Relationships start and end. When you ignore time in your data model, you confuse who someone was with who they are now. Companies have been destroyed in minutes because their systems confused one type of time with another—deployment time, execution time, market time—all tangled together with no safeguards. Time cuts across everything, and when you get it wrong, the consequences can range from annoying to catastrophic.
So where does time show up? Everywhere. Time is a universal concept that binds all six layers of data modeling together. Every layer has a temporal dimension, whether explicitly modeled or implicitly assumed—from entities evolving at the structural layer, to how timestamps are stored at the technical layer, to how history is aggregated and queried at the analytical layer. The figure below maps time across all six layers. Miss the temporal dimension at any one of them, and the problems cascade upward, creating confusion about what your model actually represents.
In 2012, the Knight Capital Group, one of the largest stock trading firms in the U.S., lost $440 million in less than an hour due to a software deployment gone wrong. A legacy module that hadn’t been used in years was accidentally reactivated on some servers but not others. The module triggered automated trades at the wrong time, sending a flood of unintended orders into the stock market. Knight bought high and sold low, spiraling into massive losses almost instantly. The system lacked temporal safeguards to prevent stale code from executing live trades. By the time anyone noticed, the company was insolvent.
Poor handling of time has cratered countless applications, analyses, and ML/AI models. Time is a top-priority concern for any data system, yet it remains one of the most nuanced and misunderstood aspects of our work. Reality isn’t static. Events happen, states change, and when we model data, we are almost always attempting to capture a faithful record of these changes over time.
We’re going to build a framework for handling time in your data models. We’ll define the types of time, show you how to track history, and work through the practical details of representing and querying time. This matters. Get it right, and your models will survive the real world. Get it wrong, and you’re building on sand.
Fundamental Types of Time

Time in data modeling is more complicated than it looks. It’s not one thing—it’s several things layered on top of each other. Data gets created, moves through systems, and gets transformed. Each of those handoffs happens at a different time. For practical purposes, focus on four types: event time, ingestion time, processing time, and valid time. Understand what each one means and when they diverge.
Figure 10.1: Four major types of time in data modeling
Event time is when something actually happened in the real world. It’s the moment the fighter threw the punch, the customer clicked buy, and the system crashed. This is the truth. Everything else is a variation on when that truth made it into your system.
Ingestion time is when information reaches your system: the timestamp recorded at the moment of entry. For a log file, it’s when the file hits your ingestion layer. For a queue, it’s the message arrival time. For an API, it’s the request timestamp. Ingestion time can lag far behind event time, or it can come before it.
Yes, ingestion time can come before event time. Sounds backwards, but it happens. A clock somewhere is wrong, or data arrived late with a stale timestamp. Welcome to data modeling.
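One practical response is to compare the two timestamps on arrival and flag the impossible cases for investigation. A minimal sketch, assuming a simple dict-per-record shape and an optional skew tolerance (both are illustrative choices, not a prescribed schema):

```python
from datetime import datetime, timedelta

# Hypothetical records: each carries the event's own timestamp plus
# the timestamp our pipeline stamped on it at entry.
records = [
    {"id": 1, "event_time": datetime(2024, 1, 1, 12, 0, 0),
     "ingestion_time": datetime(2024, 1, 1, 12, 0, 3)},   # normal: arrived after it happened
    {"id": 2, "event_time": datetime(2024, 1, 1, 12, 5, 0),
     "ingestion_time": datetime(2024, 1, 1, 12, 4, 55)},  # suspect: "arrived" before it happened
]

def flag_clock_skew(records, tolerance=timedelta(seconds=0)):
    """Return ids of records whose ingestion time precedes their event time."""
    return [r["id"] for r in records
            if r["ingestion_time"] + tolerance < r["event_time"]]

print(flag_clock_skew(records))  # [2]
```

A nonzero tolerance lets you ignore sub-second skew between honest clocks while still catching timestamps that are badly wrong.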
Processing time is when your system actually works on the data—parsing, validating, transforming, and loading data. It’s when computation happens, and it can be totally disconnected from when the data arrived.
In a simple system where everything happens right now, ingestion time and processing time look the same. But most systems aren’t simple. Here’s where they diverge:
• Streaming systems: Data might be ingested into a queue at 2:00 PM but not processed until 2:05 PM due to backlog.
• Batch systems: Files ingested throughout the day might all be processed together at midnight.
• Distributed systems: Data ingested by one service might be processed by another service minutes or hours later.
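The batch case above can be made concrete by computing the two lags separately. A quick sketch with invented timestamps:

```python
from datetime import datetime

# Hypothetical batch-pipeline record: ingested during the day,
# processed with everything else at the midnight batch run.
record = {
    "event_time":      datetime(2024, 3, 1, 14, 0),   # the fact occurred at 2:00 PM
    "ingestion_time":  datetime(2024, 3, 1, 14, 2),   # landed in the system at 2:02 PM
    "processing_time": datetime(2024, 3, 2, 0, 0),    # midnight batch job ran it
}

ingestion_lag  = record["ingestion_time"]  - record["event_time"]
processing_lag = record["processing_time"] - record["ingestion_time"]

print(ingestion_lag)   # 0:02:00
print(processing_lag)  # 9:58:00
```

Two minutes of ingestion lag, nearly ten hours of processing lag: the same record has three different answers to "when."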
Don’t confuse processing time with event time. Some people do, and their models blow up. Processing time tells you when the computation happened. Event time tells you when the reality you’re modeling actually occurred. They’re not the same thing.
Last one: valid time. This is when a fact is actually true in the real world. A customer’s address isn’t just one event. It’s a series. The customer lived at 123 Main from January to March, then moved to 456 Oak. Both were true. Both matter. Valid time tracks the actual lifespan of that truth.
To answer a question like “How many customers lived in California on January 1, 2024?”, our system must check which customers have an address record where the target date (January 1, 2024) falls on or after the valid_from date and before the valid_to date. In pseudocode:
SELECT COUNT(*) FROM customer_addresses
WHERE state = 'CA'
  AND valid_from <= '2024-01-01'::DATE
  AND valid_to > '2024-01-01'::DATE;

When a customer moves, their old record’s valid_to is updated, and a new record is created with a new valid_from. This enables accurate reconstruction of historical states, which is not possible using only event timestamps.
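The point-in-time query can be run end to end. Here is a minimal sketch using SQLite, where ISO-8601 date strings compare correctly without the `::DATE` cast; the sample rows and the open-ended `9999-12-31` sentinel are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_addresses (
        customer_id INTEGER, state TEXT, valid_from TEXT, valid_to TEXT)
""")
conn.executemany(
    "INSERT INTO customer_addresses VALUES (?, ?, ?, ?)",
    [
        (1, "CA", "2023-06-01", "2024-03-15"),  # in CA on the target date
        (2, "CA", "2024-02-01", "9999-12-31"),  # moved to CA after the target date
        (3, "NY", "2023-01-01", "9999-12-31"),  # never in CA
    ],
)

# "How many customers lived in California on January 1, 2024?"
(count,) = conn.execute("""
    SELECT COUNT(*) FROM customer_addresses
    WHERE state = 'CA'
      AND valid_from <= '2024-01-01'
      AND valid_to   >  '2024-01-01'
""").fetchone()
print(count)  # 1
```

Only customer 1 counts: customer 2 is in California today but wasn’t yet on January 1, which is exactly the distinction valid time exists to preserve.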
To recap: event time captures when something happened; valid time captures how long something remained true; processing time tells you when the system “knew” about the event.
Modeling History With Temporality

The types of time we just examined refer to a single event’s transition through its creation, ingestion, processing, and validity within a system. But what happens when something affects this event later? This is where temporality comes into play.
Temporality is the practice of tracking and storing data values over time. This goes beyond just timestamping a record and involves managing multiple dimensions of time to provide complete historical context. Here, the term “dimension” refers to timelines (yes, time can have various timelines). The most common types of temporality are nontemporal, unitemporal, bitemporal, and tritemporal data.
Non-Temporal and Unitemporal

Before we describe unitemporal data, let’s look at what it’s not. Non-temporal data represents a single moment or “current” state. By contrast, unitemporal data is data with a single timeline, typically the valid time (also known as “business time” or “effective time”), the period during which something is true in the real world. Most data is unitemporal.
Unitemporal data models a single, linear version of history. For example, consider a Products table with price, valid_from, and valid_to columns that indicate the price of a product during a specific period. Let’s say a product has a pricing error that needs to be corrected. If you fix this historical price error, the original incorrect record is overwritten, and the history of what your database previously thought is lost. This is common with Create-Read-Update-Delete (CRUD) operations on transactional databases, where a value can be overwritten in place. This is not ideal if you want to reconstruct the product’s pricing history. Thankfully, there’s an easy solution: bitemporal data.
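That failure mode is easy to demonstrate. A minimal in-memory sketch of the Products table described above, with invented prices and dates, showing a CRUD-style correction destroying the prior belief:

```python
# Unitemporal price history: one timeline (valid time), one version of the past.
prices = [
    {"product_id": 1, "price": 19.99, "valid_from": "2024-01-01", "valid_to": "2024-02-01"},
    {"product_id": 1, "price": 24.99, "valid_from": "2024-02-01", "valid_to": "9999-12-31"},
]

# The January price turns out to be wrong; a CRUD-style fix updates in place.
for row in prices:
    if row["valid_from"] == "2024-01-01":
        row["price"] = 18.99  # the old value 19.99 is overwritten, gone for good

print(prices[0]["price"])  # 18.99
# No audit trail remains: we can no longer answer
# "what did the database believe the January price was, before the fix?"
```

The valid-time history of the product is intact, but the history of the database’s own knowledge is not; that second timeline is what bitemporal data adds.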