Session 2: Wavefield Reconstruction of Cook Inlet DAS Data with SHallow Recurrent Decoder

Session 2: Wavefield Reconstruction of Cook Inlet DAS Data with SHallow Recurrent Decoder#

Author: Yiyu Ni (niyiyu@uw.edu)

Last updated: Dec 11, 2025

This notebook demonstrates the use of deep-learning model to compress and reconstruct wavefields recorded by offshore DAS. We demonstrate that SHallow REcurrent Decoders (SHRED) provide an effective framework to compress DAS data, enabling lower streaming rates, and being effective at reconstructing wavefield from ocean wave that composes the broadband DAS data.

import h5py
import obspy
import torch
import numpy as np
import matplotlib.pyplot as plt

from tqdm import tqdm

from utils import *

IMAGE_KWARGS = {"cmap": "RdBu", "vmax": 5, "vmin": -5, "aspect": "auto", "origin": "lower"}
TEXT_KWARGS = {"fontsize": 15, "color": "#c00000"}

Cook Inlet DAS experiment and data access#

Two seafloor telecommunication cables owned by the GCI Communication Corporation are used that originate from the landing station in Homer, Alaska: the Terrestrial for Every Rural Region (TERRA) runs westward to the Iliamna Bay, near Augustine Volcano, and the southern span of the Kodiak Kenai Fiber Link (KKFL‐S) that connects Kenai peninsula and the Kodiak archipelago. Beginning June 2023, both cables have been connected to a Sintela Onyx v1.0 interrogator unit in the GCI landing station for continuous data acquisition. The team has built a heterogeneous edge unit with a Network‐Attached‐System (NAS) for year‐long continuous data storage (∼160 TB), and two CPU and GPU edge computing servers for data processing. The interrogator unit saves the raw data at a 25 Hz sampling rate and further decimates it into 2.5 Hz; both raw and decimated data are saved. There are 8,531 channels using a 9.47 m channel spacing with a gauge length of 23.93 m between September and December.

The earthquake data from the Cook Inlet DAS experiment are hosted at https://dasway.ess.washington.edu/gci. Earthquakes and daily data reports were updated daily. Here, we check an earthquake event recorded on December 29, 2023 on the KKFLS cable. The event has an internal id 11784946 and you can find the USGS event report here.

We can load the QuakeML file for this event directly obspy.

e = obspy.read_events("https://dasway.ess.washington.edu/gci/events/2023-12-29/11784946/11784946.qml")[0]
print(e)

Event:	2023-12-29T02:13:56.063000Z | +60.006, -153.148 | 3.7  ML

	            resource_id: ResourceIdentifier(id="smi:service.iris.edu/fdsnws/event/1/query?eventid=11784946")
	             event_type: 'earthquake'
	    preferred_origin_id: ResourceIdentifier(id="smi:service.iris.edu/fdsnws/event/1/query?originid=49444222")
	 preferred_magnitude_id: ResourceIdentifier(id="smi:service.iris.edu/fdsnws/event/1/query?magnitudeid=214424597")
	                   ---------
	     event_descriptions: 1 Elements
	                origins: 1 Elements
	             magnitudes: 2 Elements

Here, we download the the 2-minute event file for this event.

!wget "https://dasway.ess.washington.edu/gci/events/2023-12-29/11784946/KKFLS.h5" -O KKFLS.h5

--2025-12-11 00:15:28--  https://dasway.ess.washington.edu/gci/events/2023-12-29/11784946/KKFLS.h5
Resolving dasway.ess.washington.edu (dasway.ess.washington.edu)... 128.208.23.57
Connecting to dasway.ess.washington.edu (dasway.ess.washington.edu)|128.208.23.57|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 102420064 (98M) [application/octet-stream]
Saving to: ‘KKFLS.h5’

KKFLS.h5            100%[===================>]  97.67M  4.66MB/s    in 25s     

2025-12-11 00:15:53 (3.93 MB/s) - ‘KKFLS.h5’ saved [102420064/102420064]

fp = h5py.File("./KKFLS.h5", 'r', locking=False)
gl = fp['Acquisition'].attrs['GaugeLength']
t0 = fp['Acquisition']['Raw[0]']['RawDataTime'][0]/1e6
fs = fp['Acquisition']['Raw[0]'].attrs['OutputDataRate']
dt = 1./fs
dx = fp['Acquisition'].attrs['SpatialSamplingInterval']
un = fp['Acquisition']['Raw[0]'].attrs['RawDataUnit']
ns = len(fp['Acquisition']['Raw[0]']['RawDataTime'][:])
nx = fp['Acquisition']['Raw[0]'].attrs['NumberOfLoci']
fp.close()


print('Start time:\t\t\t', t0)
print('Unit:\t\t\t\t',un.decode('UTF-8'))
print('Gauge length (m):\t\t', gl)
print('Sampling rate (Hz):\t\t',fs)
print('Sample interval (s):\t\t',dt)
print('Channel interval (m):\t\t',dx)
print('Number of samples:\t\t',ns)
print('Number of channels:\t\t',nx)

Start time:			 1703816046.56
Unit:				 Radians
Gauge length (m):		 23.928572013009905
Sampling rate (Hz):		 25.0
Sample interval (s):		 0.04
Channel interval (m):		 9.571428805203961
Number of samples:		 3000
Number of channels:		 8530

Here, we open the file and retrieve the DAS data from channel nc_start to nc_end (1000 channels in toal). The data is sampled at 25 Hz.

nc_start = 1000
nc_end = 2000

fp = h5py.File("./KKFLS.h5", 'r', locking=False)
raw = fp['/Acquisition/Raw[0]/RawData'][:, nc_start:nc_end].T
raw -= np.mean(raw, axis=-1, keepdims=True)
raw /= np.std(raw, axis=-1, keepdims=True)
fp.close()

print(f"Got data array with {raw.shape} (channel, time)")

Got data array with (1000, 3000) (channel, time)

plt.figure(figsize=(10, 4), dpi=150)
plt.imshow(raw, extent=[0, 120, nc_start, nc_end], **IMAGE_KWARGS)
plt.ylabel("Channel", fontsize=15)
plt.text(2, nc_end-70, "Raw Data", **TEXT_KWARGS)
plt.xlabel("Time (sec)", fontsize=15)

Text(0.5, 0, 'Time (sec)')

../../_images/f00d2075632f8f1d95b4afcd2ecb6d5c771105f3bae06b05b446a15ac1901057.png

SHallow REcurrent Decoder (SHRED)#

SHRED was designed to take a short time series (sensory temporal trajectory) of a field sampled on a sparse network and reconstruct the current state of the field in a dense, spatial array. The adapted SHRED model takes inputs as the strain rate time series measured before and at the current time \(T_0\) from \(N_{input}\) channels and reconstructs the spatial wavefield snapshot at time \(T_0\) of \(N_{output}\) continuous channel, from which the inputs are decimated.

Below, we define model parameters and load the pretrained SHRED model. The core of the SHRED model is an LSTM and one or more Linear Layers. See PyTorch document below:

# define a SHallow Recurrent Decoder model with LSTM and Linear model
class SHRED(torch.nn.Module):
    def __init__(self, n_channel, n_hidden, n_output, n_lstm):
        super().__init__()
        self.lstm = torch.nn.LSTM(
            n_channel, n_hidden, n_lstm, batch_first=True, dropout=0.1
        )
        self.sdn1 = torch.nn.Linear(n_hidden, n_output // 2)
        self.sdn3 = torch.nn.Linear(n_output // 2, n_output)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = self.lstm(x)[1][0][-1]
        x = self.relu(self.sdn1(x))
        x = self.sdn3(x)
        return x

n_lag = 100
n_channel = 101
n_hidden = 256
n_output = 1000
n_lstm = 2

model = SHRED(n_channel, n_hidden, n_output, n_lstm)
model.load_state_dict(torch.load("./SHRED_KKFLS_25Hz_101i_1000o_100sp.pt", 
                                 map_location=torch.device('cpu')))

print(f"The SHRED model has {count_weights(model)} weights")
model.eval()

The SHRED model has 1523452 weights

SHRED(
  (lstm): LSTM(101, 256, num_layers=2, batch_first=True, dropout=0.1)
  (sdn1): Linear(in_features=256, out_features=500, bias=True)
  (sdn3): Linear(in_features=500, out_features=1000, bias=True)
  (relu): ReLU()
)

Wavefield reconstruction#

The SHRED model uses downsampled DAS data as input, i.e., we select n_channel channels from n_output channels with a regular spacing.

cidx = np.linspace(1, n_output, n_channel, dtype='int') - 1 # python convention
cidx

array([  0,   9,  19,  29,  39,  49,  59,  69,  79,  89,  99, 109, 119,
       129, 139, 149, 159, 169, 179, 189, 199, 209, 219, 229, 239, 249,
       259, 269, 279, 289, 299, 309, 319, 329, 339, 349, 359, 369, 379,
       389, 399, 409, 419, 429, 439, 449, 459, 469, 479, 489, 499, 509,
       519, 529, 539, 549, 559, 569, 579, 589, 599, 609, 619, 629, 639,
       649, 659, 669, 679, 689, 699, 709, 719, 729, 739, 749, 759, 769,
       779, 789, 799, 809, 819, 829, 839, 849, 859, 869, 879, 889, 899,
       909, 919, 929, 939, 949, 959, 969, 979, 989, 999])

raw_dsp = np.zeros_like(raw)
raw_dsp[cidx, :] = raw[cidx, :]

plt.figure(figsize=(10, 4), dpi=150)
plt.imshow(raw_dsp, extent=[0, 120, nc_start, nc_end], interpolation = "none", **IMAGE_KWARGS)
plt.ylabel("Channel", fontsize=15)
plt.text(2, nc_end-70, "Masked / Downsampled Data", **TEXT_KWARGS)
plt.xlabel("Time (sec)", fontsize=15)

Text(0.5, 0, 'Time (sec)')

../../_images/bfb4610520b022353aac7c8db3db0616f8c6c28fa6e63971250901019982c473.png

rec = np.zeros_like(raw)
for idt in tqdm(range(n_lag, 3000), ncols=80):
    X = raw_dsp[cidx, idt-(n_lag-1):idt+1].T
    with torch.no_grad():
        rec[:, idt] = model(torch.tensor(X))

100%|██████████████████████████████████████| 2900/2900 [00:12<00:00, 239.99it/s]

plt.figure(figsize=(10, 12), dpi=150)
plt.subplots_adjust(hspace=0.1)
plt.subplot(3, 1, 1)
plt.imshow(raw_dsp, extent=[0, 120, nc_start, nc_end], **IMAGE_KWARGS)
plt.text(2, nc_end-70, "Masked Data", **TEXT_KWARGS)
plt.xticks([])
plt.ylabel("\nChannel", fontsize=15)

plt.subplot(3, 1, 2)
plt.imshow(rec, extent=[0, 120, nc_start, nc_end], **IMAGE_KWARGS)
plt.text(2, nc_end-70, "Reconstructed Data", **TEXT_KWARGS)
plt.xticks([])
plt.ylabel("Channel", fontsize=15)

plt.subplot(3, 1, 3)
plt.imshow(raw, extent=[0, 120, nc_start, nc_end], **IMAGE_KWARGS)
plt.text(2, nc_end-70, "Raw Data", **TEXT_KWARGS)
plt.ylabel("\nChannel", fontsize=15)
plt.xlabel("Time (sec)", fontsize=15)

Text(0.5, 0, 'Time (sec)')

../../_images/fabb30ca26dd0bb096d1d9f2c37744202d0fee1b2dc7a05cd58b2e7588e7e9c6.png

Spectrum#

The power spectrum of the raw data (black solid line) and reconstruction (red solid line) across the cables, with the selected channel, suggesting a well‐reconstruction in the frequency domain below 0.15 Hz.

icha = 0
istart = n_lag
iend = 3000
nsp = iend - istart
frq = np.fft.rfftfreq(nsp, d=1./fs)

ftr_raw = []
ftr_rec = []

while iend <= raw.shape[1]:
    ftr_raw.append(20 * np.log10((2/nsp) * abs(np.fft.rfft(raw[icha, istart:iend] * np.hamming(nsp)))))
    ftr_rec.append(20 * np.log10((2/nsp) * abs(np.fft.rfft(rec[icha, istart:iend] * np.hamming(nsp)))))
    istart += 100
    iend += 100

plt.figure(figsize=(7, 4), dpi=150)
plt.plot(frq, np.mean(ftr_raw, axis = 0), linestyle = "-", linewidth=2, 
         color='k', alpha=1, label = "Raw")
plt.plot(frq, np.mean(ftr_rec, axis = 0), linestyle = "--", linewidth=2, 
         color='r', alpha=1, label = 'Reconstruction')

plt.grid(True, alpha=0.5)
plt.title(f"Channel number: {icha + nc_start}", fontsize = 15)
plt.vlines(0.15, -200, 0, color = 'gray', linestyle = ':')
plt.text(0.16, -115, "f = 0.15 Hz", rotation = 90)
plt.legend(fontsize=15, loc = 'upper right')
plt.xlabel("Frequency (Hz)", fontsize=15)
plt.ylabel("Power Spectrum (dB/Hz)", fontsize=15)
plt.xscale("log")
plt.xlim([3e-3, 8])
plt.ylim([-120, 0])

(-120.0, 0.0)

../../_images/98404f40e384069748ceb339c019a0026624000d0fabf2ed8067d903773cff09.png

Kurtosis event detection#

Kurtosis, defined by the fourth standardized moment, is a statistical value measuring the shape of a given dis- tribution, and it has been used as a characteristic function for the state change of a system. Because the seismic phase onset usually generates non‐Gaussian distribution on the wavefield measurement, the kurtosis can be used for event detection indicated by a sharp jump when the measuring window includes the onset signal.

residual = raw[:, n_lag:] - rec[:, n_lag:]
kur = np.zeros_like(residual)
for i in tqdm(range(residual.shape[0])):
    kur[i, :] = akurtosis(residual[i, :], dt, win=1.)

100%|██████████| 1000/1000 [00:02<00:00, 390.00it/s]

fig, axs = plt.subplot_mosaic([['k'], ['mean']], figsize=(10, 7), height_ratios=(2, 1), dpi=150)

axs["k"].imshow(kur, vmax=1, vmin=-2, extent=[0, 116, nc_start, nc_end], 
                aspect="auto", origin="lower")
axs["k"].set_ylabel("Channel", fontsize=15)
axs["mean"].plot(np.mean(kur, axis=0), color="#c00000", linewidth=2)
axs["mean"].set_xticks([i*25 for i in np.linspace(0, 120, 7)], [i for i in np.linspace(0, 120, 7)])
axs["mean"].set_xlim([0, nsp])
axs["mean"].grid(True, alpha=0.5)
axs["mean"].set_xlabel("Time (sec)", fontsize=15)
axs["mean"].set_ylabel("Mean Kurtosis", fontsize=15)

Text(0, 0.5, 'Mean Kurtosis')

../../_images/ed205918b1bff752812eec8ebe0e2d1bfe7c94a475c1d3d4eb48ae44e40946bb.png

References#

niyiyu/DAS-reconstruction
Ni, Y., Denolle, M. A., Shi, Q., Lipovsky, B. P., Pan, S., & Kutz, J. N. (2024). Wavefield reconstruction of distributed acoustic sensing: Lossy compression, wavefield separation, and edge computing. Journal of Geophysical Research: Machine Learning and Computation, 1(3), e2024JH000247 10.1029/2024JH000247.
Williams, J. P., Zahn, O., & Kutz, J. N. (2023). Sensing with shallow recurrent decoder networks. arXiv preprint arXiv:2301.12011.