pyatoa.core.inspector

A class to aggregate time windows, source-receiver information and misfit using Pandas.

Module Contents

Classes

Inspector

This plugin object will collect information from a Pyatoa run folder and allow the User to understand statistical information about a seismic inversion.

class pyatoa.core.inspector.Inspector(tag='default', verbose=True)[source]

Bases: pyatoa.visuals.insp_plot.InspectorPlotter

This plugin object will collect information from a Pyatoa run folder and allow the User to easily understand statistical information or generate statistical plots to help understand a seismic inversion.

Inherits plotting capabilities from InspectorPlotter class to reduce clutter

property keys[source]

Shorthand to access the keys of the Windows dataframe

property events[source]

Return an array of all event ids

property stations[source]

Return an array of all stations

property networks[source]

Return an array of all networks

property netsta[source]

Return a Dataframe containing unique network-station idents

property srcrcv[source]

Return a dataframe with source-receiver information, dists and baz

property pairs[source]

Determine the number of unique source-receiver pairs

property iterations[source]

Return an array of all iterations

property steps[source]

Returns a pandas.Series of iterations with values listing steps

property models[source]

Return a dict of model numbers related to a unique iteration/step

property initial_model[source]

Return tuple of the iteration and step count corresponding to M00

property final_model[source]

Return tuple of iteration and step count for final accepted model

property good_models[source]

Return models that are only status 0 or 1 (initial or success)

property restarts[source]

Try to guess the indices of restarts for the convergence plot, based on misfit increase in adjacent good models as well as discontinuous misfit values between the final line search model and the subsequent initial model. Not guaranteed to catch everything, so may require manual review using the convergence() function.
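The first criterion, a misfit increase between adjacent good models, can be sketched in plain Python. This is a simplified illustration and not Pyatoa's implementation: the function name `guess_restarts` and the input list are hypothetical, and the second criterion (discontinuity between the final line search model and the next initial model) is left out.

```python
def guess_restarts(misfits):
    """Return indices where misfit increases relative to the previous model.

    `misfits` is a hypothetical list of misfit values for accepted models,
    ordered by model number.
    """
    return [i for i in range(1, len(misfits)) if misfits[i] > misfits[i - 1]]


# Hypothetical misfit history: the jump at index 3 suggests a restart
misfits = [10.0, 8.0, 6.5, 9.0, 7.0, 6.0]
print(guess_restarts(misfits))
```

A heuristic like this can miss restarts where the misfit happens to decrease, which is why manual review is still suggested.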

property evaluations[source]

Returns the number of function evaluations, i.e., the sum of all step counts

property mags[source]

Return a dictionary of event magnitudes

property times[source]

Return a dictionary of event origin times

property depths[source]

Return a dictionary of event depths in units of meters

_get_str()[source]

Get the string representation once and save as internal attribute

__str__()[source]

Return a list of all variables and functions available for quick ref

__repr__()[source]

Return repr(self).

_try_print(a)[source]

Try-except catch for property print statements

_get_srcrcv_from_dataset(ds)[source]

Get source and receiver information from dataset, this includes latitude and longitude values for both, and event information including magnitude, origin time, id, etc.

Returns Dataframes for sources and receivers iff they are not already contained in the class dataframes, to avoid duplicates.

Returns empty DataFrames if no unique info was found.

Parameters:

ds (pyasdf.ASDFDataSet) – dataset to query for distances

Rtype source:

pandas.core.frame.DataFrame

Return source:

single row Dataframe containing event info from dataset

Rtype receivers:

multiindexed dataframe containing unique station info

_get_windows_from_dataset(ds)[source]

Get window and misfit information from dataset auxiliary data. Model and Step information should match between the two auxiliary data objects, MisfitWindows and AdjointSources.

TODO: break this into _get_windows_from_dataset and _get_adjsrcs_from_dataset?

Parameters:

ds (pyasdf.ASDFDataSet) – dataset to query for misfit

Return type:

pandas.DataFrame

Returns:

a dataframe object containing information per misfit window

_parse_nonetype_eval(iteration, step_count)[source]

Whenever a user does not choose an iteration or step count, e.g., in plotting functions, this function defines default values based on the initial model (if neither is given), or the last step count for a given iteration (if only the iteration is given). Providing only a step count is not allowed.

Parameters:
  • iteration (str) – chosen iteration, formatted as e.g., ‘i01’

  • step_count (str) – chosen step count, formatted as e.g., ‘s00’

Return type:

tuple of str

Returns:

(iteration, step_count) default values for the iteration and step_count
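The default-selection rules described above can be sketched as follows. This is a minimal illustration, not the library's code: the standalone function, its extra arguments `initial_model` and `steps`, and the choice of `ValueError` for the disallowed case are all assumptions.

```python
def parse_nonetype_eval(iteration, step_count, initial_model, steps):
    """Fill in default (iteration, step_count) values.

    `initial_model` is a hypothetical (iteration, step_count) tuple and
    `steps` a hypothetical dict mapping each iteration to its ordered steps.
    """
    if iteration is None and step_count is None:
        # Neither given: fall back to the initial model
        return initial_model
    if iteration is None:
        # Only a step count was given, which is ambiguous (assumed error type)
        raise ValueError("step_count cannot be given without an iteration")
    if step_count is None:
        # Only the iteration was given: use its last step count
        return iteration, steps[iteration][-1]
    return iteration, step_count


steps = {"i01": ["s00", "s01", "s02"], "i02": ["s00", "s01"]}
print(parse_nonetype_eval(None, None, ("i01", "s00"), steps))
print(parse_nonetype_eval("i02", None, ("i01", "s00"), steps))
```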

discover(path='./', ignore_symlinks=True)[source]

Allow the Inspector to scour through a path and find relevant files, appending them to the internal structure as necessary.

Parameters:
  • path (str) – path to the pyasdf.asdf_data_set.ASDFDataSets that were output by the Seisflows workflow

  • ignore_symlinks (bool) – skip over symlinked HDF5 files when discovering

append(dsfid, srcrcv=True, windows=True)[source]

Simple function to parse information from a pyasdf.asdf_data_set.ASDFDataSet file and append it to the current collection of information.

Parameters:
  • dsfid (str) – fid of the dataset

  • srcrcv (bool) – gather source-receiver information

  • windows (bool) – gather window information

extend(windows)[source]

Extend the current Inspector data frames with the windows from another Inspector. This is useful for when an inversion has been run in legs, so two individual inspectors constitute a single inversion.

Note

The current inspector is considered leg A, and the argument ‘windows’ is considered leg B. Leg B will have its iteration numbers changed so that they follow on from leg A.

Warning

This will only work if all the events and stations are the same. That is, only two identical inversion scenarios can be used.

Parameters:

windows (pandas.core.frame.DataFrame or list of DataFrames) – Windows from a separate inspector object that will be used to extend the current Inspector. Can also be provided as a list of DataFrames to extend multiple times.
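To illustrate what renumbering leg B could look like, here is one plausible scheme in plain Python. It assumes iteration labels of the form ‘i01’ and simply offsets leg B by the last iteration of leg A; the helper `renumber_leg_b` is hypothetical and the renumbering actually performed by extend() may differ.

```python
def renumber_leg_b(iterations_a, iterations_b):
    """Shift leg B iteration labels so they continue after leg A.

    Assumes labels formatted as 'iNN'; this is an illustrative scheme only.
    """
    offset = max(int(it[1:]) for it in iterations_a)
    return [f"i{int(it[1:]) + offset:02d}" for it in iterations_b]


leg_a = ["i01", "i02", "i03"]
leg_b = ["i01", "i02"]
print(renumber_leg_b(leg_a, leg_b))
```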

save(path='./', fmt='csv', tag=None)[source]

Save the collected attributes to files (CSV by default) for easier re-loading.

Note

fmt == ‘hdf’ requires ‘pytables’ to be installed in the environment

Parameters:
  • tag (str) – tag to use to save files, defaults to the class tag but allows for the option of overwriting that

  • path (str) – optional path to save to, defaults to cwd

  • fmt (str) – format of the files to write, default csv

write(**kwargs)[source]

Same as Inspector.save(), but I kept writing .write()

read(path='./', fmt=None, tag=None)[source]

Load previously saved attributes to avoid re-processing data.

Parameters:
  • tag (str) – tag to use to look for files, defaults to the class tag but allows for the option of overwriting that

  • path (str) – optional path to file, defaults to cwd

  • fmt (str) – format of the files to read, default csv

copy()[source]

Return a deep copy of the Inspector

reset()[source]

Simple function to wipe out all the internal attributes

isolate(iteration=None, step_count=None, event=None, network=None, station=None, channel=None, component=None, keys=None, exclude=None, unique_key=None)[source]

Returns a new dataframe that is grouped by a given index. If a variable is None, defaults to returning all available values for that variable.

Parameters:
  • event (str) – event id e.g. ‘2018p130600’ (optional)

  • iteration (str) – iteration e.g. ‘i00’ (optional)

  • step_count (str) – step count e.g. ‘s00’ (optional)

  • station (str) – station name e.g. ‘BKZ’ (optional)

  • network (str) – network name e.g. ‘NZ’ (optional)

  • channel (str) – channel name e.g. ‘HHE’ (optional)

  • component (str) – component name e.g. ‘Z’ (optional)

  • unique_key (str) – isolates model, event and station information, alongside a single info key, such as dlnA. Useful for looking at one variable without having to write out long lists to ‘exclude’ or ‘keys’

  • keys (list) – list of keys to retain in returned dataset, ‘exclude’ will override this variable, best to use them separately

  • exclude (list) – list of keys to remove from returned dataset

Return type:

pandas.DataFrame

Returns:

DataFrame with selected rows based on selected column values
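The selection logic, where a None value means "no filter" and ‘exclude’ takes precedence over ‘keys’, can be sketched on a list of dictionaries. This is a simplified stand-in for the DataFrame-based method; the function below and the column names in the sample rows are hypothetical.

```python
def isolate(rows, keys=None, exclude=None, **criteria):
    """Select rows matching every non-None criterion, then trim columns."""
    selected = [
        row for row in rows
        if all(row[col] == val for col, val in criteria.items() if val is not None)
    ]
    if keys is not None:
        # Retain only the requested columns
        selected = [{k: row[k] for k in keys} for row in selected]
    if exclude is not None:
        # Applied last, so 'exclude' overrides 'keys'
        selected = [
            {k: v for k, v in row.items() if k not in exclude} for row in selected
        ]
    return selected


# Hypothetical window table
windows = [
    {"iteration": "i01", "step": "s00", "station": "BKZ", "misfit": 1.2},
    {"iteration": "i01", "step": "s00", "station": "BFZ", "misfit": 0.7},
    {"iteration": "i02", "step": "s00", "station": "BKZ", "misfit": 0.9},
]
print(isolate(windows, iteration="i01", station=None, keys=["station", "misfit"]))
```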

nwin(level='step')[source]

Find the cumulative length of misfit windows for a given iter/step, or the number of misfit windows for a given iter/step.

Note

Neat trick to select just by station: insp.nwin(level='station').query("station == 'BFZ'")

Parameters:

level (str) –

Level to get number of windows by. Default is ‘step’

  • step: to get the total window length and number of windows for the given step count.

  • station: to get this on a per-station basis, useful for identifying sta quality.

Return type:

pandas.DataFrame

Returns:

a DataFrame with indices corresponding to iter, step, columns listing the number of windows (nwin) and the cumulative length of windows in seconds (length_s)
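The per-step tally described above can be sketched without pandas. The sample rows and the `start_s`/`end_s` field names are hypothetical; only the output names `nwin` and `length_s` come from the description.

```python
from collections import defaultdict


def nwin(windows):
    """Tally window count and cumulative length per (iteration, step)."""
    totals = defaultdict(lambda: {"nwin": 0, "length_s": 0.0})
    for win in windows:
        entry = totals[(win["iteration"], win["step"])]
        entry["nwin"] += 1
        entry["length_s"] += win["end_s"] - win["start_s"]
    return dict(totals)


# Hypothetical misfit windows with start and end times in seconds
windows = [
    {"iteration": "i01", "step": "s00", "start_s": 10.0, "end_s": 25.0},
    {"iteration": "i01", "step": "s00", "start_s": 40.0, "end_s": 70.0},
    {"iteration": "i01", "step": "s01", "start_s": 12.0, "end_s": 30.0},
]
for key, value in nwin(windows).items():
    print(key, value)
```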

misfit(level='step', reset=False)[source]

Sum the total misfit for a given iteration based on the individual misfits for each misfit window, and the number of sources used. Calculated misfits are stored internally to avoid needing to recalculate each time this function is called

Note

To get per-station misfit on a per-step basis

df = insp.misfit(level="station").query("station == 'TOZ'")
df.groupby(['iteration', 'step']).sum()

Parameters:
  • level (str) – Level to get misfit by. Default is ‘step’

      • station: unscaled misfit on a per-station basis.

      • step: total misfit for a given step count.

      • event: misfit on a per-event basis.

  • reset (bool) – reset internally stored attribute and re-calculate misfit

Return type:

dict

Returns:

total misfit for each iteration in the class

stats(level='event', choice='mean', key=None, iteration=None, step_count=None)[source]

Calculate the per-level statistical values for DataFrame

Parameters:
  • level (str) – get statistical values per ‘event’ or ‘station’

  • choice (str) – Pandas function, ‘mean’, ‘std’, ‘var’, etc.

  • iteration (str) – filter for a given iteration

  • step_count (str) – filter for a given step count

Return type:

pandas.DataFrame

Returns:

DataFrame containing the choice of stats for given options

minmax(iteration=None, step_count=None, keys=None, quantities=None, pprint=True)[source]

Calculate and print the min/max values for a whole slew of parameters for a given iteration and step count. Useful for understanding the worst/best case scenarios and their relation to the average.

Parameters:
  • iteration (str) – filter for a given iteration

  • step_count (str) – filter for a given step count

  • keys (list of str) – keys to calculate minmax values for, must be a subset of Inspector.windows.keys()

  • quantities (list of str) – quantities to get values for, e.g. min, max, median, must be an attribute of pandas.core.series.Series

  • pprint (bool) – pretty print the resulting values

Return type:

dict

Returns:

dictionary containing the minmax stats

compare(iteration_a=None, step_count_a=None, iteration_b=None, step_count_b=None)[source]

Compare the misfit and number of windows on an event by event basis between two evaluations. Provides absolute values as well as differences. Final dataframe is sorted by the difference in misfit, showing the most and least improved events.

Parameters:
  • iteration_a (str) – initial iteration to use in comparison

  • step_count_a (str) – initial step count to use in comparison

  • iteration_b (str) – final iteration to use in comparison

  • step_count_b (str) – final step count to use in comparison

Return type:

pandas.core.frame.DataFrame

Returns:

a sorted data frame containing the difference of misfit and number of windows between final and initial
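As a rough sketch of the event-by-event comparison, the following takes per-event misfits for two evaluations, computes final minus initial, and sorts by that difference. The function, the event labels, and the ascending sort order are assumptions made for illustration; the real method returns a DataFrame and also compares window counts.

```python
def compare(misfit_a, misfit_b):
    """Difference in per-event misfit between evaluation B and evaluation A."""
    rows = [
        {
            "event": event,
            "misfit_a": misfit_a[event],
            "misfit_b": misfit_b[event],
            "diff_misfit": misfit_b[event] - misfit_a[event],
        }
        for event in misfit_a
        if event in misfit_b
    ]
    # Most improved (largest misfit reduction) first
    return sorted(rows, key=lambda row: row["diff_misfit"])


# Hypothetical per-event misfits for an initial and a final evaluation
initial = {"ev1": 4.0, "ev2": 3.0, "ev3": 5.0}
final = {"ev1": 2.5, "ev2": 3.5, "ev3": 1.0}
for row in compare(initial, final):
    print(row)
```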

compare_windows(iteration_a=None, step_count_a=None, iteration_b=None, step_count_b=None)[source]

Compare individual, matching misfit windows between two evaluations.

Note

This will only work/make sense if the windows were fixed between the two evaluations, such that they share the exact same window selections.

Parameters:
  • iteration_a (str) – initial iteration to use in comparison

  • step_count_a (str) – initial step count to use in comparison

  • iteration_b (str) – final iteration to use in comparison

  • step_count_b (str) – final step count to use in comparison

Return type:

pandas.core.frame.DataFrame

Returns:

a data frame containing differences of windowing parameters between final and initial models

filter_sources(lat_min=None, lat_max=None, lon_min=None, lon_max=None, depth_min=None, depth_max=None, mag_min=None, mag_max=None, min_start=None, max_start=None)[source]

Go through misfits and windows and remove events that fall outside a certain bounding box. Return sources that fall within the box. Bounds are inclusive of given values.

Parameters:
  • lat_min (float) – minimum latitude in degrees

  • lat_max (float) – maximum latitude in degrees

  • lon_min (float) – minimum longitude in degrees

  • lon_max (float) – maximum longitude in degrees

  • depth_min (float) – minimum depth of event in km, depth is positive

  • depth_max (float) – maximum depth of event in km, depth is positive

  • mag_min (float) – minimum magnitude

  • mag_max (float) – maximum magnitude

  • min_start (obspy.UTCDateTime()) – minimum origin time of event

  • max_start (obspy.UTCDateTime()) – maximum origin time of event
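The inclusive bounding behaviour, with None meaning "no limit on that side", can be sketched as below. Only a subset of the parameters is shown; the helper `in_bounds`, the dictionary field names, and the sample events are hypothetical.

```python
def in_bounds(value, lower=None, upper=None):
    """Inclusive bounds check; None means that side is unbounded."""
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


def filter_sources(sources, lat_min=None, lat_max=None, mag_min=None, mag_max=None):
    """Keep sources whose latitude and magnitude fall within the given bounds."""
    return [
        src for src in sources
        if in_bounds(src["latitude"], lat_min, lat_max)
        and in_bounds(src["magnitude"], mag_min, mag_max)
    ]


# Hypothetical source table
sources = [
    {"event": "ev1", "latitude": -41.0, "magnitude": 5.0},
    {"event": "ev2", "latitude": -38.5, "magnitude": 4.2},
    {"event": "ev3", "latitude": -45.0, "magnitude": 6.1},
]
kept = filter_sources(sources, lat_min=-42.0, lat_max=-38.0, mag_min=5.0)
print([src["event"] for src in kept])
```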

get_models()[source]

Return a sorted list of misfits which correspond to accepted models, label discards of the line search, and differentiate the final accepted line search evaluation from the previous iteration and the initial evaluation of the current iteration.

Note

State and status are given as: 0 == INITIAL function evaluation for the model; 1 == SUCCESS-ful function evaluation for the model; -1 == DISCARD trial step from line search.

Return type:

pandas.core.frame.DataFrame

Returns:

a dataframe containing model numbers, their corresponding iteration, step count and misfit value, and the status of the function evaluation.

get_srcrcv()[source]

Retrieve information regarding source-receiver pairs including distance, backazimuth and theoretical traveltimes for a 1D Earth model.

Return type:

pandas.core.frame.DataFrame

Returns:

separate dataframe with distance and backazimuth columns, that may be used as a lookup table

get_unique_models(float_precision=3)[source]

Find all accepted models (status 0 or 1) that have a unique misfit value. Because some forward evaluations are repeats of the previous line search evaluation, they are effectively the same evaluation and can be removed.

Parameters:

float_precision (int) – identical misfit values may differ after some decimal place. This value determines the decimal place at which values are truncated for comparison
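The role of float_precision can be sketched as follows: misfits are truncated to the given number of decimal places and only the first accepted model with each truncated value is kept. This is an illustrative stand-in, not the library's code; the helper `truncate`, the record layout, and the sample values are hypothetical.

```python
import math


def truncate(value, float_precision=3):
    """Cut a float off after `float_precision` decimal places (no rounding)."""
    factor = 10 ** float_precision
    return math.trunc(value * factor) / factor


def get_unique_models(models, float_precision=3):
    """Keep accepted models (status 0 or 1) whose truncated misfit is new."""
    seen = set()
    unique = []
    for model in models:
        if model["status"] not in (0, 1):
            continue  # skip discarded line search steps
        key = truncate(model["misfit"], float_precision)
        if key not in seen:
            seen.add(key)
            unique.append(model)
    return unique


# Hypothetical evaluations; ids 2 and 3 differ only past the third decimal
models = [
    {"id": 0, "status": 0, "misfit": 2.51234},
    {"id": 1, "status": -1, "misfit": 2.9},
    {"id": 2, "status": 1, "misfit": 1.87654},
    {"id": 3, "status": 0, "misfit": 1.87659},
]
print([m["id"] for m in get_unique_models(models)])
```

With a larger float_precision, such as 5, the last two evaluations would be treated as distinct.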