ccatkidlib.analysis.utils package
Submodules
ccatkidlib.analysis.utils.dataframe module
- ccatkidlib.analysis.utils.dataframe.add_data_to_properties(obj, df, col_name) DataFrame
Add a quantity calculated with a data object’s data DataFrame to the properties DataFrame
Note
The df DataFrame does not necessarily need to derive from a data object’s data DataFrame, but the structure of this method is designed specifically for that use case.
- Parameters:
df (pl.DataFrame) – Polars DataFrame with the data to be added to the properties DataFrame. The DataFrame must be in wide format with the column names being tone numbers (e.g., ‘0000’, ‘0001’, etc.)
col_name (str) – Name of column to add to the properties DataFrame
- ccatkidlib.analysis.utils.dataframe.check_properties(obj, col_name: str, include: int | list[int] | None = None, exclude: int | list[int] | None = None, recalc: bool = False) list[int]
Check which subset of detectors do not have a value for the specified column
- Parameters:
col_name (str) – Name of data column
include (int | list[int] | None)
exclude (int | list[int] | None)
recalc (bool)
- Returns:
List of tones without a value for the specified column
- Return type:
list[int]
- ccatkidlib.analysis.utils.dataframe.coalesce_join(left_df: DataFrame, right_df: DataFrame, on: str, shared_cols: str | list[str]) DataFrame
Join two Polars DataFrames, replacing shared columns with non-null values from the right DataFrame right_df
- Parameters:
left_df (pl.DataFrame) – Left (old) DataFrame
right_df (pl.DataFrame) – Right (new) DataFrame
on (str | list[str]) – Columns to join two DataFrames on
shared_cols (str | list[str]) – Shared columns between both DataFrames
- Returns:
Joined DataFrame
- Return type:
pl.DataFrame
- ccatkidlib.analysis.utils.dataframe.get_properties(obj, col_name: str | list[str] = '.*', include: int | list[int] | None = None, exclude: int | list[int] | None = None, strict: bool = False)
Get the specified data columns and rows from the properties Polars DataFrame
- Parameters:
col_name (str | list[str], optional) – Defaults to all columns
include (int | list[int] | None, optional) – Defaults to None
exclude (int | list[int] | None, optional) – Defaults to None
strict (bool, optional) – Defaults to False
- ccatkidlib.analysis.utils.dataframe.parse_tones(func_include: Callable[[list[int], Any], list[Expr]], func_exclude: Callable[[list[int], Any], list[Expr]], func_all: Callable[[Any], list[Expr]], include: int | list[int] | None = None, exclude: int | list[int] | None = None, *args) any
ccatkidlib.analysis.utils.multiprocess module
Library of helper functions for multiprocessing data analysis code
- ccatkidlib.analysis.utils.multiprocess.check_max_workers(max_workers: int) int
Ensure that the maximum number of worker processes specified is less than or equal to the number of available CPU cores
- Parameters:
max_workers – Maximum number of workers to use for multiprocessing
- Returns:
max_workers if it is less than or equal to the number of CPUs; otherwise, the number of available CPU cores
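The clamping rule described above can be sketched with the standard library alone. The name clamp_max_workers is hypothetical, and the use of os.cpu_count() to count cores is an assumption about how the check is performed.

```python
import os


def clamp_max_workers(max_workers: int) -> int:
    """Hypothetical sketch: cap the worker count at the number of CPU cores."""
    # os.cpu_count() can return None on unusual platforms; fall back to 1.
    n_cpus = os.cpu_count() or 1
    return min(max_workers, n_cpus)
```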
- ccatkidlib.analysis.utils.multiprocess.create_batches(func: Callable[[DataFrame], Series], tones: list[int], col_name: list[str], schema: Schema, return_col: list[str], return_type: list[DataType], padding: int = 4, calc_col: list[str] | None = None, max_workers: int = 1, recalc: bool = False) tuple[Expr, list[list[int]], list[list[int]], str | list[str], list[list[str]], int]
- Parameters:
func (Callable[[pl.DataFrame], pl.Series]) – Analysis function to apply to tones. Must take a Polars DataFrame as the input and return a Polars Series
tones (list[int])
- ccatkidlib.analysis.utils.multiprocess.optional_executor(max_workers: int = 1, ex: ProcessPoolExecutor | None = None) Iterator[ProcessPoolExecutor]
Context manager that yields the concurrent.futures ProcessPoolExecutor provided or creates a new one if None provided
- Parameters:
max_workers – Maximum number of worker processes to use for multiprocessing. Only used if ex is None
ex – A concurrent.futures ProcessPoolExecutor
- Yields:
The concurrent.futures ProcessPoolExecutor provided or a newly created one if None provided
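A context manager with this behaviour might look like the following sketch, built on contextlib.contextmanager. The name optional_executor_sketch is hypothetical, and the decision to leave a caller-supplied executor running on exit is an assumption, not something the documentation states.

```python
from concurrent.futures import ProcessPoolExecutor
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def optional_executor_sketch(
    max_workers: int = 1,
    ex: Optional[ProcessPoolExecutor] = None,
) -> Iterator[ProcessPoolExecutor]:
    """Hypothetical sketch: reuse the given executor or create a temporary one."""
    if ex is not None:
        # Assumption: the caller owns this executor, so it is not shut down here.
        yield ex
    else:
        # A new pool is created and shut down when the with-block exits.
        with ProcessPoolExecutor(max_workers=max_workers) as new_ex:
            yield new_ex
```

This pattern lets several analysis steps share one pool when the caller passes ex, while a standalone call still works with only max_workers.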
- ccatkidlib.analysis.utils.multiprocess.package_results(results_dict: dict) Series
- ccatkidlib.analysis.utils.multiprocess.process_batches(func: Callable, *args, **kwargs) list[Any | Exception]
- ccatkidlib.analysis.utils.multiprocess.struct_batches(struct: pl.Struct, num_data_cols: int, batch_len: int, max_workers: int) list[list[np.ndarray]]
ccatkidlib.analysis.utils.pair module
Library of helper functions for getting ccatkidlib data files and pairing with corresponding configuration files.
- ccatkidlib.analysis.utils.pair.get_config(path: str | PosixPath, all_cfg: bool = False) list[str]
Get the config files associated with the specified data file.
- Parameters:
path (str | pathlib.PosixPath) – Path of data file
all_cfg (bool, optional) – Whether to return config files for all drones. Defaults to False.
- Returns:
List of config file paths (io_cfg, drone_cfg(s), and ext_cfg) associated with the specified data file
- Return type:
list[str]
- ccatkidlib.analysis.utils.pair.get_data_file(com_to: str, timestamp: str | int, data_type: str, data_dir: str = '**', date: str = '**', sess_id: str = '**', root_data_dir: str = '/') list[str]
Get a ccatkidlib data file based on provided path information.
- Parameters:
com_to (str) – Drone that took the data. In form ‘Board.Drone’
timestamp (str | int) – Timestamp of data file
data_type (str) – Type of data file. Should be one of ‘vna’, ‘targ’, ‘timestream’.
data_dir (str, optional) – Directory where data is stored. Defaults to wildcard ‘**’
date (str, optional) – Date data was taken. Defaults to wildcard ‘**’
sess_id (str, optional) – ccatkidlib session ID of data. Defaults to wildcard ‘**’
root_data_dir (str, optional) – Root directory where data is stored. Defaults to ‘/’
- Returns:
Path of found data file. Returns ‘invalid/path’ if data file not found.
- Return type:
str
- ccatkidlib.analysis.utils.pair.get_sess_dir(sess_id, data_dir: str = '**', date: str = '**', root_data_dir: str = '/') str
- ccatkidlib.analysis.utils.pair.get_sweep(path: str | PosixPath, **kwargs)
- ccatkidlib.analysis.utils.pair.replace_root(path: str | PosixPath, old_root: str, new_root: str)
Replace the root directory of a file path with a new root
- Parameters:
path (str | pathlib.PosixPath) – Original file path
old_root (str) – Old root directory of file path to be replaced
new_root (str) – New root directory to replace the old root
- Returns:
New file path with the root directory replaced. If the new file path does not exist, returns the original path.
- Return type:
str
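The documented fallback (return the original path when the new one does not exist) can be sketched with pathlib. The name replace_root_sketch is hypothetical, and returning the original path when it does not sit under old_root is an added assumption.

```python
from pathlib import Path


def replace_root_sketch(path: "str | Path", old_root: str, new_root: str) -> str:
    """Hypothetical sketch: swap old_root for new_root if the result exists."""
    original = Path(path)
    try:
        # Part of the path below old_root, e.g. 'date/sess/file.npy'.
        relative = original.relative_to(old_root)
    except ValueError:
        # Assumption: a path outside old_root is returned unchanged.
        return str(original)

    candidate = Path(new_root) / relative
    # Fall back to the original path when the re-rooted file is missing.
    return str(candidate) if candidate.exists() else str(original)
```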
ccatkidlib.analysis.utils.pickle module
- ccatkidlib.analysis.utils.pickle.multi_dump(network: Network, pickle_name: str, num_segments: int = 1, transform: Callable[[Network], Network] | None = None) None
Segment the specified Network and transform/pickle segments individually. Specifically, multiple Network objects are created, each with a subset of the Detectors in the .data DataFrame
- Parameters:
network – Network object to segment and run transformations on/pickle
pickle_name – Name of pickle file
num_segments – Number of segments to split Network object into
transform – Function to run on each segment. Must return a Network
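The segment, transform, and pickle flow can be sketched with a plain list standing in for a Network. Everything here is illustrative: the name multi_dump_sketch, the "{pickle_name}_{i}.pkl" file naming, and the returned list of paths are assumptions (the documented function returns None and operates on Network objects).

```python
import pickle
from pathlib import Path
from typing import Any, Callable, Optional


def multi_dump_sketch(
    items: list,
    pickle_name: str,
    num_segments: int = 1,
    transform: Optional[Callable[[list], Any]] = None,
) -> list:
    """Hypothetical sketch: split items into segments and pickle each one."""
    if num_segments < 1:
        raise ValueError("num_segments must be at least 1")

    # Spread items as evenly as possible; the first `extra` segments get one more.
    base, extra = divmod(len(items), num_segments)
    written = []
    start = 0
    for i in range(num_segments):
        stop = start + base + (1 if i < extra else 0)
        segment = items[start:stop]
        if transform is not None:
            segment = transform(segment)
        # Assumed naming scheme: one file per segment.
        out_path = Path(f"{pickle_name}_{i}.pkl")
        with out_path.open("wb") as fh:
            pickle.dump(segment, fh)
        written.append(out_path)
        start = stop
    return written
```

Processing one segment at a time keeps only a subset of the detectors in memory while the transform runs, which is presumably the motivation for segmenting before pickling.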