Saving Data with ASDF

Pyatoa stores data and processing results to PyASDF ASDFDataSets, which are seismological data structures built upon the HDF5 file format.

Data are stored as ObsPy or NumPy objects within the ASDFDataSet and can be easily retrieved for later processing.

Collections of ASDFDataSets can be read by the Inspector class to perform bulk misfit assessment.

To load an example dataset to play around with:

from pyatoa.scripts.load_example_data import load_example_asdfdataset

ds = load_example_asdfdataset()

Writing Data to a Dataset

Pyatoa stores data, metadata and processing results in ASDFDataSets. This can either be done manually, or automatically during a processing workflow.

Writing Data Manually

The write function of the Manager writes data to ASDFDataSets, including: observed waveforms, synthetic waveforms, station metadata, event metadata, misfit windows and adjoint sources.

from pyasdf import ASDFDataSet
from pyatoa import Manager

ds = ASDFDataSet("example.h5")
mgmt = Manager(ds=ds)
# ... some processing steps

mgmt.write(ds=ds)

The Config class can also write itself to a dataset, this must be done separate from the Manager write.

from pyatoa import Config

cfg = Config()
cfg.write(write_to=ds)

Writing Data Automatically

Prior to data gathering and processing with the Manager, you can set the Config parameter save_to_ds to tell a Manager to automatically save any data and processing results to the dataset (this is set True by default).

The Manager must be supplied a valid ADSFDataSet. See the data discovery page for automated data discovery routines.

from pyatoa import Config, Manager
from pyasdf import ASDFDataSet

ds = ASDFDataSet("example.h5")
cfg = Config(save_to_ds=True, paths={...})  # paths set to local data
mgmt = Manager(ds=ds, config=cfg)
mgmt.gather()  # <- automatically stores gathered data to dataset
mgmt.standardize().preprocess()
mgmt.window()  # <- automatically stores misfit windows to dataset
mgmt.measure()  # <- automatically stores adjoint sources to dataset

Accessing Data from Datasets

This section details how to access waveforms, misfit results and metadata stored inside an ASDFDataSet.

See the PyASDF documentation for more information.

Event and Station Metadata

To access event metadata, stored as an ObsPy Event object

Note

By design, Pyatoa only stores one event per ASDFDataSet, to avoid file sizes getting too large;

>>> ds.events[0]
Event:      2018-02-18T07:43:48.130000Z | -39.949, +176.299 | 4.86 mw

                      resource_id: ResourceIdentifier(id="smi:local/cmtsolution/2018p130600/event")
                       event_type: 'earthquake'
              preferred_origin_id: ResourceIdentifier(id="smi:local/cmtsolution/2018p130600/origin#cmt")
           preferred_magnitude_id: ResourceIdentifier(id="smi:local/cmtsolution/2018p130600/magnitude#mw")
     preferred_focal_mechanism_id: ResourceIdentifier(id="smi:local/cmtsolution/2018p130600/focal_mechanism")
                             ---------
               event_descriptions: 1 Elements
                         comments: 1 Elements
                 focal_mechanisms: 1 Elements
                          origins: 2 Elements
                       magnitudes: 3 Elements

To access the station list, which stores data and metadata for all stations in the dataset:

>>> ds.waveforms.list()
['NZ.BFZ']

Waveforms are stored alongside metadata coded by the the network and station code of each receiver.

>>> ds.waveforms.NZ_BFZ
Contents of the data set for station NZ.BFZ:
    - Has a StationXML file
    - 2 Waveform Tag(s):
        observed
        synthetic_i01s00

To access station metadata, stored as an ObsPy Inventory object

>>> ds.waveforms.NZ_BFZ.StationXML
Inventory created at 2020-02-02T22:21:59.000000Z
    Created by: Delta
            None
    Sending institution: GeoNet (WEL(GNS_Test))
    Contains:
        Networks (1):
            NZ
        Stations (1):
            NZ.BFZ (Birch Farm)
        Channels (3):
            NZ.BFZ.10.HHZ, NZ.BFZ.10.HHN, NZ.BFZ.10.HHE

Observed and Synthetic Waveforms

Observed waveforms are tagged by Pyatoa with the Config.observed_tag attribute, which is ‘observed’ by default. Waveforms are stored as Stream objects.

>>> ds.waveforms.NZ_BFZ.observed
3 Trace(s) in Stream:
NZ.BFZ..BXE | 2018-02-18T07:43:28.130000Z - 2018-02-18T07:49:30.557500Z | 13.8 Hz, 5000 samples
NZ.BFZ..BXN | 2018-02-18T07:43:28.130000Z - 2018-02-18T07:49:30.557500Z | 13.8 Hz, 5000 samples
NZ.BFZ..BXZ | 2018-02-18T07:43:28.130000Z - 2018-02-18T07:49:30.557500Z | 13.8 Hz, 5000 samples

Synthetic waveforms are tagged by Pyatoa with the Config.synthetic_tag attribute.

ds.waveforms.NZ_BFZ.synthetic

During a SeisFlows inversion, the synthetic_tag may reflect the iteration and step count assigned by SeisFlows.

Note

See the naming standards page for further explanation on tagging for inversions.

For iteration 1, step count 0, synthetics will be saved as:

>>> ds.waveforms.NZ_BFZ.synthetics_i01s00
3 Trace(s) in Stream:
NZ.BFZ..BXE | 2018-02-18T07:43:28.130000Z - 2018-02-18T07:49:30.557500Z | 13.8 Hz, 5000 samples
NZ.BFZ..BXN | 2018-02-18T07:43:28.130000Z - 2018-02-18T07:49:30.557500Z | 13.8 Hz, 5000 samples
NZ.BFZ..BXZ | 2018-02-18T07:43:28.130000Z - 2018-02-18T07:49:30.557500Z | 13.8 Hz, 5000 samples

This tagging system allows Pyatoa to save multiple sets of synthetic waveforms to a single ASDFDataSet.

Misfit Windows

Misfit windows, adjoint sources and configuration parameters are stored in the auxiliary_data attribute of the ASDFDataSet.

>>> ds.auxiliary_data
Data set contains the following auxiliary data types:
    AdjointSources (1 item(s))
    Configs (2 item(s))
    MisfitWindows (1 item(s))

The MisfitWindows attribute stores information about misfit windows

ds.auxiliary_data.MisfitWindows

During an inversion, misfit windows are tagged by the iter_tag and step_tag attributes of Config

>>> ds.auxiliary_data.MisfitWindows
1 auxiliary data sub group(s) of type 'MisfitWindows' available:
    i01
>>> ds.auxiliary_data.MisfitWindows.i01
1 auxiliary data sub group(s) of type 'MisfitWindows/i01' available:
    s00
>>> ds.auxiliary_data.MisfitWindows.i01.s00
3 auxiliary data item(s) of type 'MisfitWindows/i01/s00' available:
    NZ_BFZ_E_0
    NZ_BFZ_N_0
    NZ_BFZ_Z_0

Accessing each misfit window provides a dictionary of window parameters, same as the information that is outputted by Pyflex.

>>> ds.auxiliary_data.MisfitWindows.i01.s00.NZ_BFZ_E_0
Auxiliary Data of Type 'MisfitWindows'
    Path: 'i01/s00/NZ_BFZ_E_0'
    Data shape: '(2,)', dtype: 'int64'
    Parameters:
        absolute_endtime: 2018-02-18T07:44:59.915000Z
        absolute_starttime: 2018-02-18T07:43:57.130000Z
        cc_shift_in_samples: 97
        cc_shift_in_seconds: 7.0325
        center_index: 833
        channel_id: NZ.BFZ..BXE
        dlnA: 0.8178943677509113
        dt: 0.0725
        left_index: 400
        max_cc_value: 0.9260584412126905
        min_period: 8.0
        phase_arrival_P: 15.262235117775926
        phase_arrival_Pn: 15.131536549180034
        phase_arrival_S: 25.700988089152666
        phase_arrival_Sn: 25.674453184025445
        phase_arrival_p: 14.045597727214647
        phase_arrival_s: 23.62091920350575
        phase_arrival_sP: 18.77953271333086
        relative_endtime: 91.785
        relative_starttime: 28.999999999999996
        right_index: 1266
        time_of_first_sample: 2018-02-18T07:43:28.130000Z
        window_weight: 7.267822403942347

Adjoint Sources

Adjoint sources can be accessed in the same manner as misfit windows, through the AdjointSources attribute of auxiliary data.

ds.auxiliary_data.AdjointSources

During an inversion, adjoint sources are tagged by the iter_tag and step_tag attributes of Config

>>> ds.auxiliary_data.AdjointSources.i01.s00
3 auxiliary data item(s) of type 'AdjointSources/i01/s00' available:
    NZ_BFZ_BXE
    NZ_BFZ_BXN
    NZ_BFZ_BXZ

Adjoint sources are stored as dictionaries with relevant creation information:

>>> ds.auxiliary_data.AdjointSources.default.NZ_BFZ_BXE
Auxiliary Data of Type 'AdjointSources'
    Path: 'i01/s00/NZ_BFZ_BXE'
    Data shape: '(5000, 2)', dtype: 'float64'
    Parameters:
        adj_src_type: cc_traveltime_misfit
        component: BXE
        dt: 0.0725
        location:
        max_period: 20.0
        min_period: 8.0
        misfit: 24.220799999999993
        network: NZ
        starttime: 2018-02-18T07:43:28.130000Z
        station: BFZ

The actual data array of the adjoint source is also stored here in two column format (time, amplitude):

>>> ds.auxiliary_data.AdjointSources.i01.s00.NZ_BFZ_BXE.data[:]
array([[-20.    ,   0.    ],
       [-19.9275,   0.    ],
       [-19.855 ,   0.    ],
       ...,
       [342.2825,   0.    ],
       [342.355 ,   0.    ],
       [342.4275,   0.    ]])

Configuration Parameters

Users can access Config parameters from the auxiliary data attribute. This is useful for understanding how windows and adjoint sources were generated.

>>> ds.auxiliary_data.Configs.i01.s00
Auxiliary Data of Type 'Configs'
    Path: 'i01/s00'
    Data shape: '(1,)', dtype: 'bool'
    Parameters:
        _synthetic_tag: None
        adj_src_type: cc_traveltime_misfit
        client: None
        component_list: ['Z' 'N' 'E']
        end_pad: 350
        event_id: 2018p130600
        filter_corners: 4
        iteration: 1
        max_period: 20.0
        min_period: 8.0
        observed_tag: observed
        ...

Loading Data From a Dataset

Data previously saved to an ASDFDataSet can be loaded back into a Manager class using the the load function. This is useful for repeating measurements, re-using misfit windows on new data, or running seismic inversions.

Config Parameters

To load the Config class from an ASDFDataSet, you need to specify a path which was generated from the iter_tag and step_tag attributes of the saved Config.

cfg = Config()
cfg.read(read_from=ds, path="i01/s00", fmt="asdf")

Data and Metadata

The Managers load function searches for metadata, waveforms and configuration parameters, based on the code and path arguments.

The path attribute is specified by the iter_tag and step_tag attributes of the saved Config.

Note

Waveforms stored in the ASDFDataSet are unprocessed. Users will have to re-run the standardize and preprocess functions to retrieve the waveforms used to generate saved windows/adjoint sources.

mgmt = Manager(ds=ds)
mgmt.load(code="NZ.BFZ", path="i01/s00")

Windows and Adjoint Sources

Note

Misfit windows and adjoint sources are not explicitely re-loaded when calling the load function.

To re-load windows, you can call the window function, setting the fix_windows argument to True and specifying the iteration and step_count to retrieve windows from:

mgmt.window(fix_windows=True, iteration="i01", step_count="s00")

The Manager does not currently have the capability to re-load adjoint sources, but given a loaded Config and set of windows, you can re-calculate adjoint sources with the measure function:

mgmt.measure()