ccatkidlib.analysis.utils package

Submodules

ccatkidlib.analysis.utils.dataframe module

ccatkidlib.analysis.utils.dataframe.add_data_to_properties(obj, df, col_name) → DataFrame

Add a quantity calculated from a data object’s data DataFrame to its properties DataFrame.

Note

  • The df DataFrame need not be derived from a data object’s data DataFrame, but this method is designed specifically for that use case

Example

Parameters:
  • obj – Data object whose properties DataFrame is updated

  • df (pl.DataFrame) – Polars DataFrame containing the data to add to the properties DataFrame. The DataFrame must be in wide format, with tone numbers as the column names (e.g., ‘0000’, ‘0001’, etc.)

  • col_name (str) – Name of column to add to properties DataFrame

ccatkidlib.analysis.utils.dataframe.check_properties(obj, col_name: str, include: int | list[int] | None = None, exclude: int | list[int] | None = None, recalc: bool = False) → list[int]

Check which detectors (tones) do not have a value for the specified column

Parameters:
  • col_name (str) – Name of data column

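  • include (int | list[int] | None, optional) – Defaults to None

  • exclude (int | list[int] | None, optional) – Defaults to None

  • recalc (bool, optional) – Defaults to False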

Returns:

List of tones without a value for the specified column

Return type:

list[int]

ccatkidlib.analysis.utils.dataframe.coalesce_join(left_df: DataFrame, right_df: DataFrame, on: str, shared_cols: str | list[str]) → DataFrame

Join two Polars DataFrames, replacing the values in the shared columns with the non-null values from the right DataFrame, right_df

Parameters:
  • left_df (pl.DataFrame) – Left (old) DataFrame

  • right_df (pl.DataFrame) – Right (new) DataFrame

  • on (str | list[str]) – Columns to join two DataFrames on

  • shared_cols (str | list[str]) – Columns shared by both DataFrames

Returns:

Joined DataFrame

Return type:

pl.DataFrame

ccatkidlib.analysis.utils.dataframe.get_properties(obj, col_name: str | list[str] = '.*', include: int | list[int] | None = None, exclude: int | list[int] | None = None, strict: bool = False)

Get the specified data columns and rows from the properties Polars DataFrame

Parameters:
  • col_name (str | list[str], optional) – Defaults to all columns

  • include (int | list[int] | None, optional) – Defaults to None

  • exclude (int | list[int] | None, optional) – Defaults to None

  • strict (bool, optional) – Defaults to False

ccatkidlib.analysis.utils.dataframe.parse_tones(func_include: Callable[[list[int], Any], list[Expr]], func_exclude: Callable[[list[int], Any], list[Expr]], func_all: Callable[[Any], list[Expr]], include: int | list[int] | None = None, exclude: int | list[int] | None = None, *args) → any

ccatkidlib.analysis.utils.multiprocess module

Library of helper functions for multiprocessing data analysis code

ccatkidlib.analysis.utils.multiprocess.check_max_workers(max_workers: int) → int

Ensure that the maximum number of worker processes specified is less than or equal to the number of available CPU cores

Parameters:

max_workers – Maximum number of workers to use for multiprocessing

Returns:

max_workers if it is less than or equal to the number of available CPU cores; otherwise, the number of available CPU cores
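A minimal sketch of this clamping, assuming the core count comes from os.cpu_count(); the actual function may use a different source, such as the process's CPU affinity.

```python
import os


def check_max_workers_sketch(max_workers: int) -> int:
    # os.cpu_count() may return None on some platforms; assume one core then
    available = os.cpu_count() or 1
    # Never request more worker processes than there are cores
    return min(max_workers, available)
```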

ccatkidlib.analysis.utils.multiprocess.create_batches(func: Callable[[DataFrame], Series], tones: list[int], col_name: list[str], schema: Schema, return_col: list[str], return_type: list[DataType], padding: int = 4, calc_col: list[str] | None = None, max_workers: int = 1, recalc: bool = False) → tuple[Expr, list[list[int]], list[list[int]], str | list[str], list[list[str]], int]

Parameters:
  • func (Callable[[pl.DataFrame], pl.Series]) – Analysis function to apply to tones. Must take a Polars DataFrame as input and return a Polars Series

  • tones (list[int])

ccatkidlib.analysis.utils.multiprocess.optional_executor(max_workers: int = 1, ex: ProcessPoolExecutor | None = None) → Iterator[ProcessPoolExecutor]

Context manager that yields the provided concurrent.futures ProcessPoolExecutor, or creates a new one if ex is None

Parameters:
  • max_workers – Maximum number of worker processes to use for multiprocessing. Only used if ex is None

  • ex – An existing concurrent.futures ProcessPoolExecutor to reuse, or None to create a new one

Yields:

The provided concurrent.futures ProcessPoolExecutor, or a newly created one if ex is None
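A minimal sketch of such a context manager built with contextlib.contextmanager. It assumes that an executor passed in by the caller is left running on exit (the caller owns it), while one created here is shut down when the with-block ends; the library's exact ownership rules are not stated.

```python
from __future__ import annotations

from concurrent.futures import ProcessPoolExecutor
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def optional_executor_sketch(
    max_workers: int = 1, ex: ProcessPoolExecutor | None = None
) -> Iterator[ProcessPoolExecutor]:
    if ex is not None:
        # Reuse the caller's executor; the caller stays responsible for shutting it down
        yield ex
    else:
        # Create a temporary executor; its own with-block shuts it down on exit
        with ProcessPoolExecutor(max_workers=max_workers) as new_ex:
            yield new_ex
```

This lets a helper run standalone, creating its own pool, or share one pool across many calls when the caller passes ex, which avoids repeatedly paying process start-up costs.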

ccatkidlib.analysis.utils.multiprocess.package_results(results_dict: dict) → Series

ccatkidlib.analysis.utils.multiprocess.process_batches(func: Callable, *args, **kwargs) → list[Any | Exception]

ccatkidlib.analysis.utils.multiprocess.struct_batches(struct: pl.Struct, num_data_cols: int, batch_len: int, max_workers: int) → list[list[np.ndarray]]

ccatkidlib.analysis.utils.pair module

Library of helper functions for locating ccatkidlib data files and pairing them with their corresponding configuration files.

ccatkidlib.analysis.utils.pair.get_config(path: str | PosixPath, all_cfg: bool = False) → list[str]

Get the config files associated with the specified data file.

Parameters:
  • path (str | pathlib.PosixPath) – Path of data file

  • all_cfg (bool, optional) – Whether to return config files for all drones. Defaults to False.

Returns:

List of config file paths (io_cfg, drone_cfg(s), and ext_cfg) associated with the specified data file

Return type:

list[str]

ccatkidlib.analysis.utils.pair.get_data_file(com_to: str, timestamp: str | int, data_type: str, data_dir: str = '**', date: str = '**', sess_id: str = '**', root_data_dir: str = '/') → list[str]

Get a ccatkidlib data file based on provided path information.

Parameters:
  • com_to (str) – Drone that took the data, in the form ‘Board.Drone’

  • timestamp (str | int) – Timestamp of data file

  • data_type (str) – Type of data file. Should be one of ‘vna’, ‘targ’, ‘timestream’.

  • data_dir (str, optional) – Directory where data is stored. Defaults to wildcard ‘**’

  • date (str, optional) – Date data was taken. Defaults to wildcard ‘**’

  • sess_id (str, optional) – ccatkidlib session ID of data. Defaults to wildcard ‘**’

  • root_data_dir (str, optional) – Root directory where data is stored. Defaults to ‘/’

Returns:

Path of found data file. Returns ‘invalid/path’ if data file not found.

Return type:

str
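The ‘**’ defaults suggest that the lookup is a recursive glob. The sketch below assumes that, plus a hypothetical directory layout and file-name pattern (<root_data_dir>/<data_dir>/<date>/<sess_id>/<com_to>_<timestamp>_<data_type>.*) that ccatkidlib's real naming scheme may not follow. It returns a single path or ‘invalid/path’, as the Returns section describes.

```python
from __future__ import annotations

import glob
import os


def get_data_file_sketch(
    com_to: str,
    timestamp: str | int,
    data_type: str,
    data_dir: str = "**",
    date: str = "**",
    sess_id: str = "**",
    root_data_dir: str = "/",
) -> str:
    # HYPOTHETICAL layout and file name; only the '**' wildcards come from the docs
    pattern = os.path.join(
        root_data_dir, data_dir, date, sess_id, f"{com_to}_{timestamp}_{data_type}.*"
    )
    # recursive=True lets each bare '**' component match zero or more directories;
    # normpath plus a set collapses duplicate spellings of the same match
    matches = sorted({os.path.normpath(p) for p in glob.glob(pattern, recursive=True)})
    return matches[0] if matches else "invalid/path"
```

With the default root_data_dir of ‘/’, a recursive pattern like this can walk a large part of the filesystem, so narrowing data_dir, date, or sess_id keeps the search fast.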

ccatkidlib.analysis.utils.pair.get_sess_dir(sess_id, data_dir: str = '**', date: str = '**', root_data_dir: str = '/') → str

ccatkidlib.analysis.utils.pair.get_sweep(path: str | PosixPath, **kwargs)

ccatkidlib.analysis.utils.pair.replace_root(path: str | PosixPath, old_root: str, new_root: str)

Replace the root directory of a file path with a new root

Parameters:
  • path (str | pathlib.PosixPath) – Original file path

  • old_root (str) – Old root directory of file path to be replaced

  • new_root (str) – New root directory to replace the old root

Returns:

New file path with the root directory replaced. If the new file path does not exist, returns the original path.

Return type:

str
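A minimal pathlib sketch of the documented behaviour (swap the root, fall back to the original path if the new one does not exist). It additionally assumes, without confirmation from the docs, that a path not located under old_root is returned unchanged.

```python
from __future__ import annotations

from pathlib import Path


def replace_root_sketch(path: str | Path, old_root: str, new_root: str) -> str:
    path = Path(path)
    try:
        # Part of the path below the old root, e.g. "sess1/a.npy"
        rel = path.relative_to(old_root)
    except ValueError:
        # ASSUMPTION: paths outside old_root are returned unchanged
        return str(path)
    new_path = Path(new_root) / rel
    # Documented fallback: keep the original path if the new one does not exist
    return str(new_path) if new_path.exists() else str(path)
```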

ccatkidlib.analysis.utils.pickle module

ccatkidlib.analysis.utils.pickle.multi_dump(network: Network, pickle_name: str, num_segments: int = 1, transform: Callable[[Network], Network] | None = None) → None

Split the specified Network into segments, then transform and pickle each segment individually. Each segment is a separate Network object containing a subset of the Detectors in the .data DataFrame

Parameters:
  • network – Network object to segment, transform, and pickle

  • pickle_name – Name of the pickle file

  • num_segments – Number of segments to split the Network object into

  • transform – Function to run on each segment. Must return a Network

ccatkidlib.analysis.utils.pickle.multi_load(com_to: str, pickle_name: str, sess_id: str, data_dir: str = '**', date: str = '**', root_data_dir: str = '/', transform: Callable[[Network], Network] | None = None) → Network

Module contents