pyatoa

Package Contents

Classes

Config

The Config class is the main interaction object between the User and the workflow.

Manager

Pyatoa's core workflow object.

Executive

The Executive is hierarchically above Pyatoa's core class, the Manager.

Inspector

This plugin object will collect information from a Pyatoa run folder and allow the User to easily understand statistical information or generate statistical plots to help understand a seismic inversion.

Attributes

logger

ch

FORMAT

formatter

pyatoa.logger[source]
pyatoa.ch[source]
pyatoa.FORMAT = '[%(asctime)s] - %(name)s - %(levelname)s: %(message)s'[source]
pyatoa.formatter[source]
class pyatoa.Config(yaml_fid=None, ds=None, path=None, iteration=None, step_count=None, event_id=None, min_period=10, max_period=100, rotate_to_rtz=False, unit_output='DISP', component_list=None, adj_src_type='cc_traveltime', observed_tag='observed', synthetic_tag=None, st_obs_type='obs', st_syn_type='syn', win_amp_ratio=0.0, pyflex_parameters=None, pyadjoint_parameters=None)[source]

The Config class is the main interaction object between the User and the workflow. It is used by the Manager for workflow management, and also for information sharing between Pyatoa objects and functions. The Config can be read from and written to external files and ASDFDataSets.
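
A minimal sketch of direct instantiation, using the defaults shown in the signature above; the printed summary uses the __str__ method documented below.

from pyatoa import Config

# Period band, output units and adjoint source type match the defaults above
cfg = Config(min_period=10, max_period=100, unit_output="DISP",
             adj_src_type="cc_traveltime")
print(cfg)  # parameters are binned for readability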

property pfcfg

simple dictionary print of pyflex config object

property pacfg

simple dictionary print of pyadjoint config object

property iter_tag

string formatted version of iteration, e.g. ‘i00’

property step_tag

string formatted version of step, e.g. ‘s00’

property eval_tag

string formatted version of iter and step, e.g. ‘i01s00’

property synthetic_tag

tag to be used for synthetic data, uses iteration and step count

property aux_path

property to quickly get a bog-standard aux path e.g. i00/s00

__str__()[source]

String representation of the class for print statements. It separates information into similar bins for readability.

__repr__()[source]

Simple call string representation

_check()[source]

A series of sanity checks to make sure that the configuration parameters are set properly to avoid any problems throughout the workflow. Should normally be run after any parameters are changed to make sure that they are acceptable.

_get_aux_path(default='default', separator='/')[source]

Pre-formatted path to be used for tagging and identification in ASDF dataset auxiliary data. Internal function to be called by property aux_path.

Parameters:
  • default (str) – if no iteration or step information is given, path will default to this string. By default it is ‘default’.

  • separator (str) – if an iteration and step_count are available, separator will be placed between them. Defaults to ‘/’, use ‘’ for no separator.

static _check_io_format(fid, fmt=None)[source]

A simple check before reading or writing the config to determine what file format to use. Currently accepted file formats are yaml, asdf and ascii.

Parameters:

fmt (str) – format specified by the User

Return type:

str

Returns:

format string to be understood by the calling function

copy()[source]

Simple convenience function to return a deep copy of the Config

write(write_to, fmt=None)[source]

Wrapper for underlying low-level write functions

Parameters:
  • fmt (str) –

    format to save parameters to. Available:

    • yaml: Write all parameters to a .yaml file which can be read later

    • ascii: Write parameters to a simple ascii file; not very smart, and yaml is preferable in most cases

    • asdf: Save the Config into an ASDFDataSet under the auxiliary data attribute

  • write_to (str or pyasdf.ASDFDataSet) – filename to save config to, or dataset to save to

read(read_from, path=None, fmt=None)[source]

Wrapper for underlying low-level read functions

Parameters:
  • read_from (str or pyasdf.asdf_data_set.ASDFDataSet) – filename to read config from, or ds to read from

  • path (str) – if fmt=’asdf’, path to the config in the aux data

  • fmt (str) – file format to read parameters from, will be guessed but can also be explicitly set (available: ‘yaml’, ‘ascii’, ‘asdf’)
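
For illustration, a minimal write/read round trip through YAML using the wrappers above; the filename is a placeholder.

cfg.write(write_to="pyatoa_config.yaml", fmt="yaml")

cfg_new = Config()
cfg_new.read(read_from="pyatoa_config.yaml", fmt="yaml")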

_write_yaml(filename)[source]

Write config parameters to a yaml file, retain order

Parameters:

filename (str) – filename to save the yaml file to

_write_asdf(ds)[source]

Save the Config values as a parameter dictionary in the ASDFDataSet. Converts types to play nicely with ASDF auxiliary data. Flattens dictionaries and external Config objects for easy storage.

Parameters:

ds (pyasdf.asdf_data_set.ASDFDataSet) – dataset to save the config file to

_write_ascii(filename)[source]

Write the config parameters to an ascii file

Parameters:

filename (str) – filename to write the ascii file to

_read_yaml(filename)[source]

Read config parameters from a yaml file, parse to attributes.

Parameters:

filename (str) – filename of the yaml file to read from

Return type:

dict

Returns:

keyword arguments that do not belong to Pyatoa are passed back as a dictionary object; these are expected to be arguments destined for the Pyflex and Pyadjoint configs

Raises:

ValueError – if unrecognized kwargs are found in the yaml file

_read_asdf(ds, path)[source]

Read and set config parameters from an ASDF Dataset. Assumes that all necessary parameters are located in the auxiliary data subgroup of the dataset, which will be the case if the _write_asdf() function was used. Assumes some things about the structure of the auxiliary data.

Parameters:
  • ds (pyasdf.asdf_data_set.ASDFDataSet) – dataset with config parameter to read

  • path (str) – model number e.g. ‘m00’ or ‘default’, or ‘m00/s00’

class pyatoa.Manager(config=None, ds=None, event=None, st_obs=None, st_syn=None, inv=None, windows=None, staltas=None, adjsrcs=None, gcd=None, baz=None)[source]

Pyatoa's core workflow object.

Manager is the central workflow control object. It calls on mid and low level classes to gather data, standardize and preprocess stream objects, generate misfit windows, and calculate adjoint sources. Has a variety of internal sanity checks to ensure that the workflow stays on the rails.
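
A minimal sketch of the core workflow, relying on the documented behavior of load() (see below) to supply example data when no dataset is attached.

from pyatoa import Manager

mgmt = Manager()
mgmt.load()  # with no dataset, load() returns example data
mgmt.flow()  # standardize, preprocess, window and measure in one call
print(mgmt)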

property st

Simplified call to return all streams available, observed and synthetic

__str__()[source]

Print statement shows available data detailing workflow

__repr__()[source]

Return repr(self).

check()[source]

(Re)check the stats of the workflow and data within the Manager.

Rechecks conditions whenever called, in case something has gone awry mid-workflow. Stats should only be set by this function.

reset()[source]

Restart workflow by deleting all collected data in the Manager, but retain dataset, event, config, so a new station can be processed with the same configuration as the previous workflow.

write_to_dataset(ds=None, choice=None)[source]

Write the data collected inside Manager to an ASDFDataSet

Parameters:
  • ds (pyasdf.asdf_data_set.ASDFDataSet or None) – write to a given ASDFDataSet. If None, will look for internal attribute self.ds to write to. Allows overwriting to new datasets

  • choice (list or None) – choose which internal attributes to write; by default writes all of the following:

    • ‘event’: Event attribute as a QuakeML

    • ‘inv’: Inventory attribute as a StationXML

    • ‘st_obs’: Observed waveform under tag config.observed_tag

    • ‘st_syn’: Synthetic waveform under tag config.synthetic_tag

    • ‘windows’: Misfit windows collected by Pyflex, stored under auxiliary_data.MisfitWindow

    • ‘adjsrcs’: Adjoint sources created by Pyadjoint, stored under auxiliary_data.AdjointSources

    • ‘config’: the Pyatoa Config object, stored under ‘auxiliary_data.Config’ and usable to re-load the Manager and re-do processing

write_adjsrcs(path='./', write_blanks=True)[source]

Write internally stored adjoint source traces into SPECFEM-defined two-column ascii files. Filenames are based on what is expected by SPECFEM, that is: ‘NN.SSS.CCC.adj’

Note

By default, adjoint sources are written for ALL components if any one component has an adjoint source. If an adjoint source doesn’t exist for a given component, it will be written with zeros. This is to satisfy SPECFEM3D requirements.

Parameters:
  • path (str) – path to save the adjoint sources to

  • write_blanks (bool) – write zeroed-out adjoint sources for components with no adjoint sources to meet the requirements of SPECFEM3D. Defaults to True

load(code=None, path=None, ds=None, synthetic_tag=None, observed_tag=None, config=True, windows=False, adjsrcs=False)[source]

Populate the manager using a previously populated ASDFDataSet. Useful for re-instantiating an existing workflow that has already gathered data and saved it to an ASDFDataSet.

Note

mgmt.load() will return example data with no dataset

Warning

Loading any floating point values may result in rounding errors. Be careful to round off floating points to the correct place before using in future work.

Parameters:
  • code (str) – SEED convention code, e.g. NZ.BFZ.10.HHZ

  • path (str) – if no Config object is given during init, the User can specify the config path here to load data from the dataset. This skips the need to initiate a separate Config object.

  • ds (None or pyasdf.asdf_data_set.ASDFDataSet) – optional dataset to load from; will not set the internal ds attribute

  • synthetic_tag (str) – waveform tag of the synthetic data in the dataset e.g. ‘synthetic_m00s00’. If None given, will use config attribute.

  • observed_tag (str) – waveform tag of the observed data in the dataset e.g. ‘observed’. If None given, will use config attribute.

  • config (bool) – load config from the dataset, defaults to True but can be set False if Config should be instantiated by the User

  • windows (bool) – load misfit windows from the dataset, defaults to False

  • adjsrcs (bool) – load adjoint sources from the dataset, defaults to False
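
A sketch of re-instantiating a previously processed source-receiver pair; the dataset filename is a placeholder for an ASDFDataSet written during an earlier workflow.

from pyasdf import ASDFDataSet
from pyatoa import Manager

ds = ASDFDataSet("2018p130600.h5")  # placeholder filename
mgmt = Manager(ds=ds)
mgmt.load(code="NZ.BFZ.10.HHZ", path="i01/s00")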

flow(standardize_to='syn', fix_windows=False, iteration=None, step_count=None, **kwargs)[source]

A convenience function to run the full workflow with a single command. Does not include gathering. Takes kwargs related to all underlying functions.

mgmt = Manager()
mgmt.flow() == mgmt.standardize().preprocess().window().measure()
Parameters:
  • standardize_to (str) – choice of ‘obs’ or ‘syn’ to use one of the time series to standardize (resample, trim etc.) the other.

  • fix_windows (bool) – if True, will attempt to retrieve saved windows from an ASDFDataSet under the iteration and step_count tags to use during misfit quantification rather than measuring new windows

  • iteration (int or str) – if ‘fix_windows’ is True, look for windows in this iteration. If None, will check the latest iteration/step_count in the given dataset

  • step_count (int or str) – if ‘fix_windows’ is True, look for windows in this step_count. If None, will check the latest iteration/step_count in the given dataset

Raises:

ManagerError – for any controlled exceptions
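
An illustrative call that re-uses fixed windows from an assumed earlier evaluation rather than picking new ones:

mgmt.flow(standardize_to="syn", fix_windows=True,
          iteration=1, step_count=0)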

flow_multiband(periods, standardize_to='syn', fix_windows=False, iteration=None, step_count=None, plot=False, **kwargs)[source]

Run the full workflow for a number of distinct period bands, returning a final set of adjoint sources generated as a summation of adjoint sources from each of these period bands. Allows for re-using windows collected from the first set of period bands to evaluate adjoint sources from the remaining period bands.

Note

Kwargs are passed through to Manager.preprocess() function only

Basic Usage

Manager.flow_multiband(periods=[(1, 5), (10, 30), (40, 100)])

Parameters:
  • periods (list of tuples) – a list of tuples that define multiple period bands to generate windows and adjoint sources for. Overwrites the Config’s internal min_period and max_period parameters. The final adjoint source will be a summation of all adjoint sources generated.

  • standardize_to (str) – choice of ‘obs’ or ‘syn’ to use one of the time series to standardize (resample, trim etc.) the other.

  • fix_windows (bool) – if True, will attempt to retrieve saved windows from an ASDFDataSet under the iteration and step_count tags to use during misfit quantification rather than measuring new windows

  • iteration (int or str) – if ‘fix_windows’ is True, look for windows in this iteration. If None, will check the latest iteration/step_count in the given dataset

  • step_count (int or str) – if ‘fix_windows’ is True, look for windows in this step_count. If None, will check the latest iteration/step_count in the given dataset

  • plot (str) – name of figure; if given, will plot waveform and map for each period band and append the period band to the figure name

Return type:

tuple of dict

Returns:

(windows, adjoint_sources), returns all the collected measurements from each of the period bands

Raises:

ManagerError – for any controlled exceptions
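
Expanding on the Basic Usage above, the returned dictionaries can be captured directly; the period bands are illustrative.

windows, adjsrcs = mgmt.flow_multiband(
    periods=[(1, 5), (10, 30), (40, 100)]
)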

_combine_mutliband_results(windows, adjsrcs)[source]

Function flow_multiband() generates multiple sets of adjoint sources for a variety of period bands, however the User is only interested in a single adjoint source which is the average of all of these adjoint sources.

This function will take the multiple sets of adjoint sources and sum them accordingly, returning a single set of AdjointSource objects which can be used in the same way as the adjsrcs attribute returned from measure().

Parameters:

adjsrcs (dict of dicts) – a collection of dictionaries whose keys are the period band set in flow_multiband(periods) and whose values are dictionaries returned in Manager.adjsrcs from Manager.measure()

Return type:

(dict of Windows, dict of AdjointSource)

Returns:

a dictionary of Windows, and AdjointSource objects for each component in the component list. Adjoint sources and misfits are the average of all input adjsrcs for the given period range

standardize(force=False, standardize_to='syn', normalize_to=None)[source]

Standardize the observed and synthetic traces in place. Ensures Streams have the same starttime, endtime, sampling rate, npts.

Parameters:
  • force (bool) – allow the User to force the function to run even if checks say that the two Streams are already standardized

  • standardize_to (str) – allows the User to set which Stream conforms to which. By default the observed traces conform to the synthetic ones, because exports to SPECFEM should be controlled by the synthetic sampling rate, npts, etc. Choices are ‘obs’ and ‘syn’.

  • normalize_to (str) – allow for normalizing the amplitudes of the two traces. Choices are: ‘obs’: normalize synthetic waveforms to the max amplitude of obs; ‘syn’: normalize observed waveforms to the max amplitude of syn; ‘one’: normalize both waveforms so that their max amplitude is 1

preprocess(which='both', filter_=True, corners=2, remove_response=False, taper_percentage=0.05, zerophase=True, normalize_to=None, convolve_with_stf=True, half_duration=None, **kwargs)[source]

Apply a simple, default preprocessing scheme to observed and synthetic waveforms in place.

Default preprocessing tasks: Remove response (optional), rotate (optional), filter, convolve with source time function (optional)

User is free to skip this step and perform their own preprocessing on Manager.st_obs and Manager.st_syn if they require their own unique processing workflow.

Parameters:
  • which (str) – “obs”, “syn” or “both” to choose which stream to process defaults to “both”

  • filter (bool) – filter data using Config.min_period and Config.max_period with corners filter corners. Apply tapers and demeans before and after application of filter.

  • taper_percentage (float) – percentage [0, 1] of taper to apply to head and tail of the data before and after preprocessing

  • corners (int) – number of filter corners to apply if `filter`==True

  • zerophase (bool) – apply a zerophase filter (True) or not (False). Zerophase filters are run forward and backward, meaning no phase shift is applied, but more waveform distortion may be present.

  • remove_response (bool) – flag, remove instrument response from ‘obs’ type data using the provided inv. Defaults to False. Kwargs are passed directly to the ObsPy remove_response function. See ObsPy docs for available options.

  • convolve_with_stf (bool) – flag, convolve ‘syn’ type data with a Gaussian source time function to mimic a finite source. Used when the half duration in seismic simulations is set to 0. Defaults to True and relies on the parameter half_duration

  • half_duration (float) – Source time function half duration in units of seconds. Only used if `convolve_with_stf`==True

  • normalize_to (str) – allow for normalizing the amplitudes of the two traces. Choices are: ‘obs’: normalize synthetic waveforms to the max amplitude of obs; ‘syn’: normalize observed waveforms to the max amplitude of syn; ‘one’: normalize both waveforms so that their max amplitude is 1
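
An illustrative preprocessing call; the half duration value is an assumption for a moderate-sized event.

# Filter both streams, deconvolve the instrument response from the
# observed data, and convolve synthetics with an assumed 1.5 s STF
mgmt.preprocess(which="both", remove_response=True,
                convolve_with_stf=True, half_duration=1.5)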

window(fix_windows=False, iteration=None, step_count=None, force=False)[source]

Evaluate misfit windows using Pyflex. Save windows to ASDFDataSet. Allows previously defined windows to be retrieved from ASDFDataSet.

Note

  • Windows are stored as dictionaries of pyflex.Window objects.

  • All windows are saved into the ASDFDataSet, even if retrieved.

  • STA/LTA information is collected and stored internally.

Parameters:
  • fix_windows (bool) – do not pick new windows, but load windows from the given dataset from ‘iteration’ and ‘step_count’

  • iteration (int or str) – if ‘fix_windows’ is True, look for windows in this iteration. If None, will check the latest iteration/step_count in the given dataset

  • step_count (int or str) – if ‘fix_windows’ is True, look for windows in this step_count. If None, will check the latest iteration/step_count in the given dataset

  • force (bool) – ignore flag checks and run function, useful if e.g. external preprocessing is used that doesn’t meet flag criteria

retrieve_windows(iteration, step_count, return_previous)[source]

Mid-level window selection function that retrieves windows from a PyASDF Dataset, recalculates window criteria, and attaches window information to Manager. No access to rejected window information.

Parameters:
  • iteration (int or str) – retrieve windows from the given iteration

  • step_count (int or str) – retrieve windows from the given step count in the given dataset

  • return_previous (bool) – if True: return windows from the previous step count in relation to the given iteration/step_count. if False: return windows from the given iteration/step_count

select_windows_plus()[source]

Mid-level custom window selection function that calls Pyflex select windows, but includes additional window suppression functionality. Includes custom Pyflex addition of outputting rejected windows, which will be used internally for plotting.

Note

Pyflex will throw a ValueError if the arrival of the P-wave is too close to the initial portion of the waveform, considered the ‘noise’ section. This happens for short source-receiver distances (< 100km).

This error becomes a PyflexError if no event/station attributes are provided to the WindowSelector

We could potentially deal with this by zero-padding the waveforms, and running select_windows() again, but for now we just raise a ManagerError and allow processing to continue

measure(force=False)[source]

Measure misfit and calculate adjoint sources using PyAdjoint.

The method for calculating misfit is set in Config. Pyadjoint expects standardized traces with the same spectral content, so this function will not run unless these checks have passed.

Returns a dictionary of adjoint sources based on component. Saves resultant dictionary to a pyasdf dataset if given.

Note

Pyadjoint returns an unscaled misfit value for an entire set of windows. To return a “total misfit” value as defined by Tape (2010) Eq. 6, the total summed misfit will need to be scaled by the number of misfit windows chosen in Manager.window().

Parameters:

force (bool) – ignore flag checks and run function, useful if e.g. external preprocessing is used that doesn’t meet flag criteria
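
A sketch of the window-count scaling described in the Note above, assuming Manager.windows maps components to lists of Pyflex windows and that each Pyadjoint AdjointSource exposes a misfit attribute.

mgmt.measure()

# Scale the summed, unscaled misfit by the number of chosen windows
n_win = sum(len(wins) for wins in mgmt.windows.values())
total_misfit = sum(adj.misfit for adj in mgmt.adjsrcs.values()) / n_win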

_format_windows()[source]

In pyadjoint.calculate_adjoint_source, the window needs to be a list of lists, with each list containing the [left_window, right_window]. Each window argument should be given in units of time (seconds).

Return type:

dict of list of lists

Returns:

dictionary with key related to individual components, and corresponding to a list of lists containing window start and end

plot(choice='both', save=None, show=True, corners=None, figsize=None, dpi=100, **kwargs)[source]

Plot observed and synthetic waveforms, misfit windows, STA/LTA and adjoint sources for all available components. Appends information about misfit, windows and window selection. Also creates, as a subplot, a source-receiver map annotated with information detailing the src-rcv relationship, such as distance and BAz. Options to plot either the waveforms or the map individually.

For valid keyword arguments see visuals.manager_plotter and visuals.map_maker

Parameters:
  • show (bool) – show the plot once generated, defaults to True

  • save (str) – absolute filepath and filename if figure should be saved

  • corners – {lat_min, lat_max, lon_min, lon_max} corners to cut the map to, otherwise a global map is provided

  • choice (str) – choice for what to plot: * ‘wav’: plot waveform figure only * ‘map’: plot a source-receiver map only * ‘both’ (default): plot waveform and source-receiver map together

  • figsize (tuple) – optional size of the figure, set by plot()

  • dpi (int) – optional dots per inch (resolution) of figure
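
An illustrative plotting call; the output path is a placeholder.

mgmt.plot(choice="both", save="./figures/NZ.BFZ.png", show=False, dpi=100)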

exception pyatoa.ManagerError[source]

Bases: Exception

A class-wide custom exception raised when functions fail gracefully

class pyatoa.Executive(event_ids, station_codes, config, max_stations=4, max_events=1, cat='+', log_level='DEBUG', cwd=None, datasets=None, figures=None, logs=None, adjsrcs=None, ds_fid_template=None)[source]

The Executive is hierarchically above Pyatoa’s core class, the Manager. It sets up a simple framework to organize and parallelize misfit quantification.
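
A minimal sketch of parallel misfit quantification; the event id and station code are placeholders.

from pyatoa import Config, Executive

cfg = Config(min_period=10, max_period=100)
exc = Executive(event_ids=["2018p130600"],        # placeholder event
                station_codes=["NZ.BFZ.10.HHZ"],  # placeholder station
                config=cfg, max_events=1, max_stations=4)
exc.process()  # process all events concurrently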

property codes

Define a set of event-station codes that are used to traverse through all possible source receiver combinations.

Note

Workaround for the difficulty of passing multiple arguments into an executor: just pass a list of strings that is split by the parallel processes.

check()[source]

Parameter checking

process()[source]

Process all events concurrently

process_event(event_id)[source]

Process all given stations concurrently for a single event

Parameters:

event_id (str) – one value from the Executor.events list specifying a given event to process

process_station(event_id_and_station_code)[source]

Process multiple Managers in parallel, which is the biggest time sink. IO is done in serial to get around BlockingIO.

Note

Very broad exceptions to keep process running smoothly, you will need to check log messages individually to figure out if and where things did not work

Note

Employs a workaround to the inability to write to HDF5 files in parallel (BlockingIOError) by doing the processing first, and then waiting for each process to finish writing before accessing.

Parameters:

event_id_and_station_code (str) – a string concatenation of a given event id and station code, which will be used to process a single source receiver pair

_check_rank(event_id_and_station_code)[source]

Poor man’s method for determining the processor rank for a given event. Used so that processes that happen only once (e.g., writing config) are done consistently by one process

Parameters:

event_id_and_station_code (str) – a string concatenation of a given event id and station code, which will be used to process a single source receiver pair

Return type:

int

Returns:

rank index in Executive.codes based on event and station

_generate_logger(log_path)[source]

Create a log file for each source. No stream handler, only file output. Also creates a memory handler to dump all log messages at once, rather than as they happen, allowing multiple stations to write to the same file sequentially.

Parameters:

log_path (str) – path and filename to save log file

class pyatoa.Inspector(tag='default', verbose=True)[source]

Bases: pyatoa.visuals.insp_plot.InspectorPlotter

This plugin object will collect information from a Pyatoa run folder and allow the User to easily understand statistical information or generate statistical plots to help understand a seismic inversion.

Inherits plotting capabilities from InspectorPlotter class to reduce clutter

property keys

Shorthand to access the keys of the Windows dataframe

property events

Return an array of all event ids

property stations

Return an array of all stations

property networks

Return an array of all networks

property netsta

Return a DataFrame containing unique network-station identifiers

property srcrcv

Return a dataframe with source-receiver information, dists and baz

property pairs

Determine the number of unique source-receiver pairs

property iterations

Return an array of all iterations

property steps

Return a pandas.Series of iterations whose values list the available step counts

property models

Return a dict of model numbers related to a unique iteration/step

property initial_model

Return a tuple of the iteration and step count corresponding to model M00

property final_model

Return tuple of iteration and step count for final accepted model

property good_models

Return models that are only status 0 or 1 (initial or success)

property restarts

Try to guess the indices of restarts for the convergence plot, based on misfit increases between adjacent good models as well as discontinuous misfit values between the final line-search model and the subsequent initial model. Not guaranteed to catch everything, so manual review using the convergence() function may be required

property evaluations

Return the number of evaluations, i.e. the sum of all step counts over all iterations

property mags

Return a dictionary of event magnitudes

property times

Return a dictionary of event origin times

property depths

Return a dictionary of event depths in units of meters

_get_str()[source]

Get the string representation once and save as internal attribute

__str__()[source]

Return a list of all variables and functions available for quick ref

__repr__()[source]

Return repr(self).

_try_print(a)[source]

Try-except catch for property print statements

_get_srcrcv_from_dataset(ds)[source]

Get source and receiver information from dataset, this includes latitude and longitude values for both, and event information including magnitude, origin time, id, etc.

Returns Dataframes for sources and receivers iff they are not already contained in the class dataframes, to avoid duplicates.

Returns empty DataFrames if no unique info was found.

Parameters:

ds (pyasdf.ASDFDataSet) – dataset to query for distances

Rtype source:

pandas.core.frame.DataFrame

Return source:

single row Dataframe containing event info from dataset

Rtype receivers:

multiindexed dataframe containing unique station info

_get_windows_from_dataset(ds)[source]

Get window and misfit information from the dataset auxiliary data. Model and step information should match between the two auxiliary data objects MisfitWindows and AdjointSources.

TODO: break this into _get_windows_from_dataset and _get_adjsrcs_from_dataset?

Parameters:

ds (pyasdf.ASDFDataSet) – dataset to query for misfit

Return type:

pandas.DataFrame

Returns:

a dataframe object containing information per misfit window

_parse_nonetype_eval(iteration, step_count)[source]

Whenever a User does not choose an iteration or step count, e.g., in plotting functions, this function defines default values based on the initial model (if neither is given), or the last step count for a given iteration (if only the iteration is given). Specifying only a step count is not allowed.

Parameters:
  • iteration (str) – chosen iteration, formatted as e.g., ‘i01’

  • step_count (str) – chosen step count, formatted as e.g., ‘s00’

Return type:

tuple of str

Returns:

(iteration, step_count) default values for the iteration and step_count

discover(path='./', ignore_symlinks=True)[source]

Allow the Inspector to scour through a path and find relevant files, appending them to the internal structure as necessary.

Parameters:
  • path (str) – path to the pyasdf.asdf_data_set.ASDFDataSets that were outputted by the Seisflows workflow

  • ignore_symlinks (bool) – skip over symlinked HDF5 files when discovering
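
A sketch of a typical Inspector workflow, assuming run ASDFDataSets live under a placeholder ‘datasets’ directory.

from pyatoa import Inspector

insp = Inspector(tag="inversion_01")  # tag is an assumed label
insp.discover(path="./datasets")
insp.save(fmt="csv")  # cache results for quicker re-loading via read()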

append(dsfid, srcrcv=True, windows=True)[source]

Simple function to parse information from a pyasdf.asdf_data_set.ASDFDataSet file and append it to the current collection of information.

Parameters:
  • dsfid (str) – fid of the dataset

  • srcrcv (bool) – gather source-receiver information

  • windows (bool) – gather window information

extend(windows)[source]

Extend the current Inspector data frames with the windows from another Inspector. This is useful for when an inversion has been run in legs, so two individual inspectors constitute a single inversion.

Note

The current inspector is considered leg A, and the argument ‘windows’ is considered leg B. Leg B will have its iteration numbers changed to reflect this

Warning

This will only work if all the events and stations are the same. That is, only two identical inversion scenarios can be used.

Parameters:

windows (pandas.core.data_frame.DataFrame or list of DataFrames) – Windows from a separate inspector object that will be used to extend the current Inspector. Can also be provided as a list of DataFrames to extend multiple times.

save(path='./', fmt='csv', tag=None)[source]

Save the collected attributes to file for easier re-loading.

Note

fmt == ‘hdf’ requires ‘pytables’ to be installed in the environment

Parameters:
  • tag (str) – tag to use to save files, defaults to the class tag but allows for the option of overwriting that

  • path (str) – optional path to save to, defaults to cwd

  • fmt (str) – format of the files to write, default csv

write(**kwargs)[source]

Same as Inspector.save(), but I kept writing .write()

read(path='./', fmt=None, tag=None)[source]

Load previously saved attributes to avoid re-processing data.

Parameters:
  • tag (str) – tag to use to look for files, defaults to the class tag but allows for the option of overwriting that

  • path (str) – optional path to file, defaults to cwd

  • fmt (str) – format of the files to read, default csv

copy()[source]

Return a deep copy of the Inspector

reset()[source]

Simple function to wipe out all the internal attributes

isolate(iteration=None, step_count=None, event=None, network=None, station=None, channel=None, component=None, keys=None, exclude=None, unique_key=None)[source]

Return a new DataFrame with rows selected by the given parameters. Any parameter left as None defaults to returning all available values.

Parameters:
  • event (str) – event id e.g. ‘2018p130600’ (optional)

  • iteration (str) – iteration e.g. ‘i00’ (optional)

  • step_count (str) – step count e.g. ‘s00’ (optional)

  • station (str) – station name e.g. ‘BKZ’ (optional)

  • network (str) – network name e.g. ‘NZ’ (optional)

  • channel (str) – channel name e.g. ‘HHE’ (optional)

  • component (str) – component name e.g. ‘Z’ (optional)

  • unique_key (str) – isolates model, event and station information, alongside a single info key, such as dlnA. Useful for looking at one variable without having to write out long lists to ‘exclude’ or ‘keys’

  • keys (list) – list of keys to retain in returned dataset, ‘exclude’ will override this variable, best to use them separately

  • exclude (list) – list of keys to remove from returned dataset

Return type:

pandas.DataFrame

Returns:

DataFrame with selected rows based on selected column values
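
An illustrative call using the example values from the parameter list above:

df = insp.isolate(iteration="i00", step_count="s00",
                  network="NZ", station="BKZ", unique_key="dlnA")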

nwin(level='step')[source]

Find the cumulative length of misfit windows for a given iter/step, or the number of misfit windows for a given iter/step.

Note

Neat trick to select just by station: insp.nwin(level="station").query("station == 'BFZ'")

Parameters:

level (str) –

Level to get number of windows by. Default is ‘step’

  • step: to get the total window length and number of windows for the given step count.

  • station: to get this on a per-station basis, useful for identifying sta quality.

Return type:

pandas.DataFrame

Returns:

a DataFrame with indices corresponding to iter, step, columns listing the number of windows (nwin) and the cumulative length of windows in seconds (length_s)

misfit(level='step', reset=False)[source]

Sum the total misfit for a given iteration based on the individual misfits for each misfit window, and the number of sources used. Calculated misfits are stored internally to avoid needing to recalculate each time this function is called

Note

To get per-station misfit on a per-step basis

df = insp.misfit(level="station").query("station == 'TOZ'")
df.groupby(["iteration", "step"]).sum()

Parameters:
  • level (str) – level at which to sum misfit. Default is ‘step’. Choices: ‘station’: unscaled misfit on a per-station basis; ‘step’: total misfit for a given step count; ‘event’: misfit on a per-event basis

  • reset (bool) – reset internally stored attribute and re-calculate misfit

Return type:

dict

Returns:

total misfit for each iteration in the class

stats(level='event', choice='mean', key=None, iteration=None, step_count=None)[source]

Calculate the per-level statistical values for DataFrame

Parameters:
  • level (str) – get statistical values per ‘event’ or ‘station’

  • choice (str) – Pandas function, ‘mean’, ‘std’, ‘var’, etc.

  • iteration (str) – filter for a given iteration

  • step_count (str) – filter for a given step count

Return type:

pandas.DataFrame

Returns:

DataFrame containing the choice of stats for given options

minmax(iteration=None, step_count=None, keys=None, quantities=None, pprint=True)[source]

Calculate and print the min/max values for a whole slew of parameters for a given iteration and step count. Useful for understanding the worst/best case scenarios and their relation to the average.

Parameters:
  • iteration (str) – filter for a given iteration

  • step_count (str) – filter for a given step count

  • keys (list of str) – keys to calculate minmax values for, must be a subset of Inspector.windows.keys()

  • quantities (list of str) – quantities to get values for, e.g. min, max, median, must be an attribute of pandas.core.series.Series

  • pprint (bool) – pretty print the resulting values

Return type:

dict

Returns:

dictionary containing the minmax stats

compare(iteration_a=None, step_count_a=None, iteration_b=None, step_count_b=None)[source]

Compare the misfit and number of windows on an event by event basis between two evaluations. Provides absolute values as well as differences. Final dataframe is sorted by the difference in misfit, showing the most and least improved events.

Parameters:
  • iteration_a (str) – initial iteration to use in comparison

  • step_count_a (str) – initial step count to use in comparison

  • iteration_b (str) – final iteration to use in comparison

  • step_count_b (str) – final step count to use in comparison

Return type:

pandas.core.data_frame.DataFrame

Returns:

a sorted data frame containing the difference of misfit and number of windows between final and initial
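
For illustration, a comparison spanning the whole inversion using the initial_model and final_model properties documented above:

iter_a, step_a = insp.initial_model
iter_b, step_b = insp.final_model
df = insp.compare(iteration_a=iter_a, step_count_a=step_a,
                  iteration_b=iter_b, step_count_b=step_b)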

compare_windows(iteration_a=None, step_count_a=None, iteration_b=None, step_count_b=None)[source]

Compare individual, matching misfit windows between two evaluations.

Note

This will only work/make sense if the windows were fixed between the two evaluations, such that they share the exact same window selections.

Parameters:
  • iteration_a (str) – initial iteration to use in comparison

  • step_count_a (str) – initial step count to use in comparison

  • iteration_b (str) – final iteration to use in comparison

  • step_count_b (str) – final step count to use in comparison

Return type:

pandas.core.data_frame.DataFrame

Returns:

a data frame containing differences of windowing parameters between the final and initial models

filter_sources(lat_min=None, lat_max=None, lon_min=None, lon_max=None, depth_min=None, depth_max=None, mag_min=None, mag_max=None, min_start=None, max_start=None)[source]

Go through misfits and windows and remove events that fall outside a certain bounding box. Return sources that fall within the box. Bounds are inclusive of given values.

Parameters:
  • lat_min (float) – minimum latitude in degrees

  • lat_max (float) – maximum latitude in degrees

  • lon_min (float) – minimum longitude in degrees

  • lon_max (float) – maximum longitude in degrees

  • depth_min (float) – minimum depth of event in km, depth is positive

  • depth_max (float) – maximum depth of event in km, depth is positive

  • mag_min (float) – minimum magnitude

  • mag_max (float) – maximum magnitude

  • min_start (obspy.UTCDateTime) – minimum origin time of event

  • max_start (obspy.UTCDateTime) – maximum origin time of event
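
An illustrative filter; the geographic bounds, depth and magnitude limits are placeholders.

insp.filter_sources(lat_min=-42.5, lat_max=-37.0,
                    lon_min=173.0, lon_max=179.0,
                    depth_max=60.0, mag_min=4.5)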

get_models()[source]

Return a sorted list of misfits which correspond to accepted models, label discards of the line search, and differentiate the final accepted line search evaluation from the previous iteration and the initial evaluation of the current iteration.

Note

State and status are given as: 0 == INITIAL function evaluation for the model; 1 == SUCCESSful function evaluation for the model; -1 == DISCARD trial step from the line search.

Return type:

pandas.core.data_frame.DataFrame

Returns:

a dataframe containing model numbers, their corresponding iteration, step count and misfit value, and the status of the function evaluation.

get_srcrcv()[source]

Retrieve information regarding source-receiver pairs including distance, backazimuth and theoretical traveltimes for a 1D Earth model.

Return type:

pandas.core.frame.DataFrame

Returns:

separate dataframe with distance and backazimuth columns, that may be used as a lookup table

get_unique_models(float_precision=3)[source]

Find all accepted models (status 0 or 1) that have a unique misfit value. Because some forward evaluations are repeats of the previous line-search evaluation, they are effectively the same evaluation and can be removed.

Parameters:

float_precision (int) – identical misfit values will differ after some decimal place. This value determines the decimal place at which to truncate values for comparison.