
An example of Timing First in action

  • Jan 16
  • 4 min read

Updated: Jan 22



This is an interesting study, and I'd like to use it to highlight what a 'timing-first' architecture is and why it is so important in today's mainstream healthcare instrumentation and measurement systems.


The following paragraph is taken from the PDF above:

'Each PSG recording is resampled to 128 Hz to standardize sampling rates across participants and sites. Before downsampling, we utilized a fourth-order low-pass Butterworth filter to prevent aliasing, applied in a zero-phase setting to avoid phase distortion. Finally, we standardized the signal to have zero mean and unit variance. For any signals that needed to be upsampled, this was done using linear interpolation. Due to the channel-agnostic model design, we did not need any other data harmonization. Signals are segmented into 5-s patches, with each segment embedded into a vector representation for transformer model processing. To prevent data leakage, PSGs were split into pretrain, train, validation, test and temporal test sets early in the preprocessing pipeline. Although there is overlap between the pretraining and training sets, no overlap exists with the validation, test or temporal test sets. The SHHS serves as an independent dataset not used during pretraining, instead being used to evaluate the model’s ability to adapt to a new site through lightweight fine-tuning.'


The manuscript presents compelling results using large-scale multimodal PSG data and a foundation-model approach. However, several preprocessing choices (as noted above) introduce implicit assumptions about temporal validity and synchronization that merit explicit discussion, particularly given that the central findings rely on cross-system “discord” and inter-organ relationships.


First, resampling all signals to a uniform 128 Hz assumes that physiologically meaningful timing relationships are preserved through resampling. In practice, different PSG channels often originate from independent acquisition clocks and may exhibit small but nontrivial inter-channel skew or drift. Resampling enforces an artificial common clock post hoc, without verifying whether temporal alignment existed at acquisition. This makes it difficult to distinguish true physiological discord from measurement-induced misalignment.
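To make the concern concrete, here is a minimal sketch (not from the paper) of how inter-channel skew could be estimated by cross-correlation before any resampling is applied; the signal, sampling rate, and 4-sample skew are illustrative assumptions:

```python
# Hypothetical sketch: estimate inter-channel skew via cross-correlation
# BEFORE resampling enforces a common clock. All values are illustrative.
import numpy as np

def estimate_lag_samples(ref, sig):
    """Return the lag (in samples) that best aligns `sig` to `ref`."""
    ref = ref - ref.mean()
    sig = sig - sig.mean()
    corr = np.correlate(sig, ref, mode="full")
    # Re-center so that index 0 corresponds to zero lag.
    return int(np.argmax(corr)) - (len(ref) - 1)

fs = 256.0                       # acquisition rate (assumed)
t = np.arange(0, 2.0, 1.0 / fs)
ref = np.sin(2 * np.pi * 3.0 * t)
skew = 4                         # simulate a 4-sample clock skew
sig = np.roll(ref, skew)

lag = estimate_lag_samples(ref, sig)
print(lag)                       # recovered skew, in samples
```

If a check like this is never run, resampling to 128 Hz simply bakes any such skew into the "aligned" data.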


Second, the use of zero-phase filtering, while effective at avoiding phase distortion in the signal-processing sense, removes causal directionality by incorporating future samples into past estimates. For analyses concerned with inter-system coordination or lead–lag dynamics, this step eliminates information relevant to physiological response ordering and recovery behavior.
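This effect is easy to demonstrate. The sketch below (filter order and cutoff are assumptions chosen for illustration) compares a causal filter with its zero-phase counterpart on a step input: the zero-phase output begins rising before the step arrives, because the backward pass propagates future samples into the past.

```python
# Illustrative sketch: a zero-phase filter responds BEFORE a transient
# arrives, because filtfilt runs the filter backward as well as forward.
import numpy as np
from scipy.signal import butter, lfilter, filtfilt

fs = 128.0
b, a = butter(4, 20.0 / (fs / 2.0))   # 4th-order low-pass, 20 Hz cutoff (assumed)

x = np.zeros(256)
x[128:] = 1.0                          # step at sample 128

y_causal = lfilter(b, a, x)            # uses past samples only
y_zero_phase = filtfilt(b, a, x)       # uses past AND future samples

# Just before the step, the causal output is exactly zero, while the
# zero-phase output has already started to rise.
print(y_causal[120], y_zero_phase[120])
```

For static spectral analysis this is harmless; for lead-lag questions it means apparent "anticipation" can be a filtering artifact.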


Third, linear interpolation for upsampling introduces synthetic samples that were not observed, smoothing over short-term timing irregularities and potentially attenuating micro-jitter or transient misalignment. If discord between systems is treated as a predictive signal, interpolation risks conflating measurement artifacts with physiological phenomena.
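A small example makes the point (rates and the single-sample transient are assumed for demonstration): upsampling by linear interpolation inserts synthetic samples that are averages of their neighbors, not observations.

```python
# Hypothetical illustration: linearly upsampling a spiky signal creates
# half-height synthetic samples around the transient.
import numpy as np

fs_in, fs_out = 64, 128
t_in = np.arange(0, 1.0, 1.0 / fs_in)
x = np.zeros_like(t_in)
x[32] = 1.0                        # a single-sample transient at t = 0.5 s

t_out = np.arange(0, 1.0, 1.0 / fs_out)
x_up = np.interp(t_out, t_in, x)   # linear interpolation to 128 Hz

# The spike's new neighbors are synthetic half-height values.
print(x_up[64], x_up[63], x_up[65])
```

Any downstream model sees those interpolated values as if they were measured, which is exactly the conflation of artifact and physiology described above.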


Fourth, the channel-agnostic model design presumes temporal coherence across modalities rather than explicitly validating it. While this simplifies representation learning, it provides no mechanism for detecting or rejecting segments where synchronization fails. As a result, all segments are treated as equally valid inputs, even when temporal integrity may be compromised.


Fifth, segmentation into fixed 5-second patches imposes a uniform temporal structure on physiological processes that are inherently multi-scale and history-dependent. Recovery, compensation, and instability may not align with fixed window boundaries, and segment-first modeling can obscure these dynamics.
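For reference, the segmentation described in the quoted passage reduces to something like the following sketch (the 60-second signal is an assumed example); note that any event straddling a 640-sample boundary is split across two patches:

```python
# Minimal sketch of the fixed-window segmentation described in the paper:
# a 128 Hz signal cut into non-overlapping 5-s patches (640 samples each).
import numpy as np

fs = 128
patch_s = 5
patch_len = fs * patch_s                               # 640 samples per patch

x = np.random.default_rng(0).standard_normal(fs * 60)  # 60 s of signal (assumed)
n_patches = len(x) // patch_len
patches = x[: n_patches * patch_len].reshape(n_patches, patch_len)

print(patches.shape)    # 12 patches of 640 samples
```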


Finally, per-channel standardization to zero mean and unit variance removes baseline and tonic information that may be relevant to compensatory states. While appropriate for representation learning, this further shifts the model toward relative patterns at the expense of absolute physiological cost.
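The information loss here is total, not partial. In the sketch below (baseline values are assumed), two channels with very different tonic levels become numerically identical after z-scoring:

```python
# Illustration with assumed values: z-scoring erases absolute baseline,
# so two channels with different tonic levels become indistinguishable.
import numpy as np

def zscore(x):
    return (x - x.mean()) / x.std()

rng = np.random.default_rng(1)
noise = rng.standard_normal(1000)
low_baseline = 10.0 + noise      # tonic level of 10 units (assumed)
high_baseline = 90.0 + noise     # same dynamics, tonic level of 90 units

# After standardization the two channels carry the same values.
print(np.allclose(zscore(low_baseline), zscore(high_baseline)))
```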


Taken together, these preprocessing steps are standard and reasonable for large-scale representation learning, but they collectively assume—rather than verify—measurement integrity and temporal coherence. Consequently, the reported predictive performance should be interpreted as conditional on assumed timing validity. Given that the primary signal of interest is inter-organ discord, explicitly addressing these assumptions or incorporating timing validation would strengthen the physiological interpretability and clinical robustness of the findings.


The intent of this review is not to diminish the contribution of the work, but to clarify where an additional architectural layer could materially strengthen both interpretability and downstream use.


Our timing-first architecture would not replace representation learning or foundation models such as the one presented here. Instead, it would precede them.


We treat measurement integrity and temporal validity as first-class objects, rather than assumptions absorbed by preprocessing. Before any resampling, filtering, or segmentation occurs, we explicitly evaluate whether the signals qualify to be interpreted at all.


Concretely, this means:

  • Timing is validated, not normalized -- Rather than enforcing a common clock via resampling, we first test whether inter-channel timing lies within bounded, physiologically meaningful tolerances. If synchronization fails, that condition is surfaced explicitly rather than smoothed away.

  • Phase and coherence are measured, not inferred -- Cross-system relationships are evaluated through explicit coherence and stability contracts prior to representation learning. Apparent discord is only interpreted as physiological when it persists under verified timing alignment.

  • Causality is preserved where relevant -- Filtering choices are constrained to avoid retrocausal leakage when lead–lag relationships, response ordering, or recovery dynamics are part of the hypothesis being tested.

  • Segmentation follows stability, not the reverse -- Windowing is adaptive to system behavior, allowing recovery and compensation dynamics to be observed across variable timescales rather than imposed fixed patches.

  • Refusal is an allowed outcome -- When timing integrity, synchronization, or coherence fails, the system explicitly declines to produce embeddings or predictions for those segments. This avoids downstream models learning from data whose meaning is ambiguous or degraded.
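To make these contracts less abstract, here is a minimal sketch of a timing gate with refusal as an allowed outcome. The 5 ms tolerance, the cross-correlation lag estimator, and the segment values are all assumptions for illustration, not our production implementation:

```python
# Hypothetical "timing contract" gate: a segment is passed forward only if
# inter-channel skew lies within a bounded tolerance; otherwise the gate
# refuses (returns None) rather than silently repairing the data.
import numpy as np

def estimate_lag_s(ref, sig, fs):
    """Estimate inter-channel lag in seconds via cross-correlation."""
    ref = ref - ref.mean()
    sig = sig - sig.mean()
    corr = np.correlate(sig, ref, mode="full")
    return (int(np.argmax(corr)) - (len(ref) - 1)) / fs

def gate_segment(ref, sig, fs, tol_s=0.005):
    """Pass the segment through only if skew is within tolerance (5 ms assumed)."""
    lag = estimate_lag_s(ref, sig, fs)
    if abs(lag) > tol_s:
        return None                    # explicit refusal, surfaced to the caller
    return np.stack([ref, sig])        # qualified data, passed on unaltered

fs = 256
t = np.arange(0, 2.0, 1.0 / fs)
ref = np.sin(2 * np.pi * 3.0 * t)
ok = gate_segment(ref, np.roll(ref, 1), fs)    # ~3.9 ms skew: passes
bad = gate_segment(ref, np.roll(ref, 4), fs)   # ~15.6 ms skew: refused
print(ok is not None, bad is None)
```

The essential design choice is that failure is a first-class return value: downstream models never see segments whose temporal meaning is ambiguous.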


Importantly, our timing-first architecture is collaborative by design. Once signals satisfy integrity and timing contracts, they can be passed forward—unaltered or lightly annotated—into existing foundation models. The result is not less data, but better-qualified data, accompanied by auditable guarantees about when interpretation is warranted.


In this framing, foundation models do what they do best: learning rich representations and long-horizon associations. Our contribution is to establish the conditions under which those associations are physiologically meaningful.


The tangible benefit is twofold:

  1. Reduced false discord, where apparent mismatch is driven by instrumentation rather than biology.

  2. Explicit uncertainty handling, where predictions are conditioned on verified measurement validity rather than implicit assumptions.


As models like SleepFM move closer to clinical translation, this separation of concerns becomes increasingly important. Predictive power alone is not sufficient; confidence must be traceable to measurement conditions that justify it.


We see this as a natural next step for the field: not a competing paradigm, but an enabling layer that allows large-scale physiological models to operate on firmer ground.
