ccatkidlib.analysis.utils package

Submodules

ccatkidlib.analysis.utils.dataframe module

ccatkidlib.analysis.utils.dataframe.add_data_to_properties(obj, df, col_name) → DataFrame

Add a quantity calculated with a data object’s data DataFrame to the properties DataFrame

Note

df does not need to derive from a data object’s data DataFrame, but this function is designed specifically for that use case

Example

Parameters:

df (pl.DataFrame) – Polars DataFrame with the data to be added to the properties DataFrame. The DataFrame must be in wide format with the column names being tone numbers (e.g., ‘0000’, ‘0001’, etc.)
col_name (str) – Name of column to add to properties DataFrame

ccatkidlib.analysis.utils.dataframe.check_properties(obj, col_name: str, include: int | list[int] | None = None, exclude: int | list[int] | None = None, recalc: bool = False) → list[int]

Check which subset of detectors do not have a value for the specified column

Parameters:

col_name (str) – Name of data column
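include (int | list[int] | None, optional) – Defaults to None
exclude (int | list[int] | None, optional) – Defaults to None
recalc (bool, optional) – Defaults to False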

Returns:

List of tones without a value for the specified column

Return type:

list[int]

ccatkidlib.analysis.utils.dataframe.coalesce_join(left_df: DataFrame, right_df: DataFrame, on: str, shared_cols: str | list[str]) → DataFrame

Join two Polars DataFrames, replacing values in the shared columns with the non-null values from the right DataFrame (right_df)

Parameters:

left_df (pl.DataFrame) – Left (old) DataFrame
right_df (pl.DataFrame) – Right (new) DataFrame
on (str | list[str]) – Columns to join two DataFrames on
shared_cols (str | list[str]) – Shared columns between both DataFrames

Returns:

Joined DataFrame

Return type:

pl.DataFrame

Get the specified data columns and rows from the properties Polars DataFrame

Parameters:

col_name (str | list[str], optional) – Defaults to all columns
include (int | list[int] | None, optional) – Defaults to None
exclude (int | list[int] | None, optional) – Defaults to None
strict (bool, optional) – Defaults to False

ccatkidlib.analysis.utils.dataframe.parse_tones(func_include: Callable[[list[int], Any], list[Expr]], func_exclude: Callable[[list[int], Any], list[Expr]], func_all: Callable[[Any], list[Expr]], include: int | list[int] | None = None, exclude: int | list[int] | None = None, *args) → any

ccatkidlib.analysis.utils.multiprocess module

Library of helper functions for multiprocessing data analysis code

ccatkidlib.analysis.utils.multiprocess.check_max_workers(max_workers: int) → int

Ensure that the maximum number of worker processes specified is less than or equal to the number of available CPU cores

Parameters:

max_workers (int) – Maximum number of workers to use for multiprocessing

Returns:

max_workers if it is less than or equal to the number of available CPU cores, otherwise the number of available CPU cores

Return type:

int
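The clamping behavior described above can be illustrated with a minimal sketch. This is not the library's implementation: the name check_max_workers_sketch is hypothetical, and using os.cpu_count() as the source of the core count is an assumption.

```python
import os


def check_max_workers_sketch(max_workers: int) -> int:
    """Hypothetical stand-in: clamp max_workers to the CPU core count."""
    # Assumption: os.cpu_count() is the measure of "available CPU cores".
    # It can return None on some platforms, so fall back to 1.
    n_cpus = os.cpu_count() or 1
    # Return the request unchanged if it fits, otherwise the core count.
    return max_workers if max_workers <= n_cpus else n_cpus
```

A request for a single worker always passes through unchanged, while an oversized request is reduced to the core count of the machine running the code.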

ccatkidlib.analysis.utils.multiprocess.create_batches(func: Callable[[DataFrame], Series], tones: list[int], col_name: list[str], schema: Schema, return_col: list[str], return_type: list[DataType], padding: int = 4, calc_col: list[str] | None = None, max_workers: int = 1, recalc: bool = False) → tuple[Expr, list[list[int]], list[list[int]], str | list[str], list[list[str]], int]

Parameters:

func (Callable[[pl.DataFrame], pl.Series]) – Analysis function to apply to tones. Must take a Polars DataFrame as the input and return a Polars Series
tones (list[int])

ccatkidlib.analysis.utils.multiprocess.optional_executor(max_workers: int = 1, ex: ProcessPoolExecutor | None = None) → Iterator[ProcessPoolExecutor]

Context manager that yields the concurrent.futures ProcessPoolExecutor provided or creates a new one if None provided

Parameters:

max_workers – Maximum number of worker processes to use for multiprocessing. Only used if ex is None
ex – A concurrent.futures ProcessPoolExecutor

Yields:

The concurrent.futures ProcessPoolExecutor provided or a newly created one if None provided
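A context manager with this behavior could be written roughly as follows. This is a sketch under stated assumptions, not the library's code: the name optional_executor_sketch is hypothetical, and leaving a caller-supplied executor open on exit is an assumed design choice that the documentation above does not confirm.

```python
from __future__ import annotations

from collections.abc import Iterator
from concurrent.futures import ProcessPoolExecutor
from contextlib import contextmanager


@contextmanager
def optional_executor_sketch(
    max_workers: int = 1, ex: ProcessPoolExecutor | None = None
) -> Iterator[ProcessPoolExecutor]:
    """Hypothetical stand-in: yield ex if given, else a temporary executor."""
    if ex is not None:
        # Assumption: the caller owns this executor, so it is not shut down here.
        yield ex
    else:
        # No executor supplied: create one and shut it down when the block exits.
        with ProcessPoolExecutor(max_workers=max_workers) as new_ex:
            yield new_ex
```

Written this way, a helper can accept an optional executor and use the same `with` block in both cases, so several analysis steps can share one process pool instead of each starting its own.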

ccatkidlib.analysis.utils.multiprocess.package_results(results_dict: dict) → Series

ccatkidlib.analysis.utils.multiprocess.process_batches(func: Callable, *args, **kwargs) → list[Any | Exception]

ccatkidlib.analysis.utils.multiprocess.struct_batches(struct: pl.Struct, num_data_cols: int, batch_len: int, max_workers: int) → list[list[np.ndarray]]

ccatkidlib.analysis.utils.pair module

Library of helper functions for getting ccatkidlib data files and pairing with corresponding configuration files.

ccatkidlib.analysis.utils.pair.get_config(path: str | PosixPath, all_cfg: bool = False) → list[str]

Get the config files associated with the specified data file.

Parameters:

path (str | pathlib.PosixPath) – Path of data file
all_cfg (bool, optional) – Whether to return config files for all drones. Defaults to False.

Returns:

List of config file paths (io_cfg, drone_cfg(s), and ext_cfg) associated with the specified data file

Return type:

list[str]

ccatkidlib.analysis.utils.pair.get_data_file(com_to: str, timestamp: str | int, data_type: str, data_dir: str = '**', date: str = '**', sess_id: str = '**', root_data_dir: str = '/') → list[str]

Get a ccatkidlib data file based on provided path information.

Parameters:

com_to (str) – Drone that took the data. In form ‘Board.Drone’
timestamp (str | int) – Timestamp of data file
data_type (str) – Type of data file. Should be one of ‘vna’, ‘targ’, ‘timestream’.
data_dir (str, optional) – Directory where data is stored. Defaults to wildcard ‘**’
date (str, optional) – Date data was taken. Defaults to wildcard ‘**’
sess_id (str, optional) – ccatkidlib session ID of data. Defaults to wildcard ‘**’
root_data_dir (str, optional) – Root directory where data is stored. Defaults to ‘/’

Returns:

Path of found data file. Returns ‘invalid/path’ if data file not found.

Return type:

str

ccatkidlib.analysis.utils.pair.get_sess_dir(sess_id, data_dir: str = '**', date: str = '**', root_data_dir: str = '/') → str

ccatkidlib.analysis.utils.pair.get_sweep(path: str | PosixPath, **kwargs)

ccatkidlib.analysis.utils.pair.replace_root(path: str | PosixPath, old_root: str, new_root: str)

Replace the root directory of a file path with a new root

Parameters:

path (str | pathlib.PosixPath) – Original file path
old_root (str) – Old root directory of file path to be replaced
new_root (str) – New root directory to replace the old root

Returns:

New file path with the root directory replaced. If the new file path does not exist, returns the original path.

Return type:

str
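The fallback rule (return the original path when the new one does not exist) can be sketched with pathlib. The name replace_root_sketch is hypothetical, and returning the original path when it does not start with old_root is an assumption not stated above.

```python
from __future__ import annotations

from pathlib import Path


def replace_root_sketch(path: str | Path, old_root: str, new_root: str) -> str:
    """Hypothetical stand-in: swap old_root for new_root if the result exists."""
    original = str(path)
    try:
        # Part of the path below old_root, e.g. "sub/f.txt"
        relative = Path(original).relative_to(old_root)
    except ValueError:
        # Assumption: a path outside old_root is returned unchanged
        return original
    candidate = Path(new_root) / relative
    # Only use the new path if it exists on disk
    return str(candidate) if candidate.exists() else original
```

This kind of helper is useful when data files recorded on one machine are later read from a different mount point, since paths that cannot be remapped are passed through untouched.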

ccatkidlib.analysis.utils.pickle module

ccatkidlib.analysis.utils.pickle.multi_dump(network: Network, pickle_name: str, num_segments: int = 1, transform: Callable[[Network], Network] | None = None) → None

Segment specified Network and transform/pickle segments individually. Specifically, multiple Network objects are created, each with a subset of the Detectors in the .data DataFrame

Parameters:

network – Network object to segment and run transformations on/pickle
pickle_name – Name of pickle file
num_segments – Number of segments to split Network object into
transform – Function to run on each segment. Must return a Network
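The segment, transform, and pickle pattern can be illustrated with plain Python objects. This sketch does not use the Network class: a list stands in for the set of detectors, and the function names, the out_dir parameter, and the "{pickle_name}_{i}.pkl" file naming are all hypothetical.

```python
from __future__ import annotations

import pickle
from pathlib import Path
from typing import Callable


def split_into_segments(items: list, num_segments: int) -> list[list]:
    """Split items into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), num_segments)
    segments, start = [], 0
    for i in range(num_segments):
        # The first `extra` segments each take one additional item
        stop = start + base + (1 if i < extra else 0)
        segments.append(items[start:stop])
        start = stop
    return segments


def dump_segments_sketch(
    items: list,
    out_dir: str | Path,
    pickle_name: str,
    num_segments: int = 1,
    transform: Callable[[list], list] | None = None,
) -> list[Path]:
    """Transform (optionally) and pickle each segment to its own file."""
    paths = []
    for i, segment in enumerate(split_into_segments(items, num_segments)):
        if transform is not None:
            segment = transform(segment)
        # Hypothetical naming scheme: one numbered file per segment
        path = Path(out_dir) / f"{pickle_name}_{i}.pkl"
        with open(path, "wb") as fh:
            pickle.dump(segment, fh)
        paths.append(path)
    return paths
```

Processing one segment at a time keeps only a fraction of the data in memory during the transform, which is a plausible motivation for segmenting a large object before pickling it.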

ccatkidlib.analysis.utils.pickle.multi_load(com_to: str, pickle_name: str, sess_id: str, data_dir: str = '**', date: str = '**', root_data_dir: str = '/', transform: Callable[[Network], Network] | None = None) → Network
