HPO API
HPO Inputs
Params
An example of a fully constructed params argument:
from crayai import hpo
my_params = hpo.Params([['--learning_rate', 1e-4, (1e-6, 1)],
                        ['--dropout_rate', (0.1, 1)],
                        ['--optimizer', 'sgd', ['sgd', 'gd', 'adam']]])
class crayai.hpo.params.Params(params)
The Params class stores the hyperparameters to be optimized with a chosen strategy.
- Parameters
params – List of lists of hyperparameter information describing the flags associated with hyperparameters, their default values, and their possible values. Each element of the params list must follow one of these formats:
# Hyperparameter search space expressed as a tuple of bounds
[flag_string, default_value, (lower_bound, upper_bound)]
# Hyperparameter search space expressed as a list of values
[flag_string, default_value, [value1, value2, value3, ..., valueN]]
# Hyperparameter search space without a default value chooses an initial value at random
[flag_string, [value1, value2, value3, ..., valueN]]
[flag_string, (lower_bound, upper_bound)]
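The four entry shapes above can be told apart by their length and by the type of the last element (a tuple for bounds, a list for values). The helper below is a hypothetical sketch for checking a params list before passing it to hpo.Params; describe_param is not part of crayai, and it assumes the shapes are exactly the ones listed above.

```python
from typing import Any, List, Tuple


def describe_param(entry: List[Any]) -> Tuple[str, Any, str]:
    """Classify one params entry as (flag, default, space_kind).

    Hypothetical helper, not part of crayai. default is None when the
    entry omits it (crayai then picks an initial value at random).
    """
    if len(entry) == 3:
        flag, default, space = entry
    elif len(entry) == 2:
        flag, space = entry
        default = None
    else:
        raise ValueError(f"expected 2 or 3 elements, got {len(entry)}: {entry!r}")

    if not isinstance(flag, str):
        raise TypeError(f"flag must be a string, got {flag!r}")

    if isinstance(space, tuple):
        # Range search space: (lower_bound, upper_bound)
        if len(space) != 2:
            raise ValueError(f"bounds must be (lower, upper), got {space!r}")
        lower, upper = space
        if lower > upper:
            raise ValueError(f"lower bound exceeds upper bound in {space!r}")
        return flag, default, "bounds"
    if isinstance(space, list):
        # Discrete search space: [value1, ..., valueN]
        if not space:
            raise ValueError(f"value list for {flag} is empty")
        return flag, default, "values"
    raise TypeError(f"search space for {flag} must be a tuple or a list")
```

Applied to the entries of the example at the top of this page, the learning rate and dropout rate are classified as "bounds" (the latter with default None) and the optimizer as "values".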
detect_cycle(self, g) → bool
Determine whether there is any cycle in the dependency graph.
- Args:
g: nx.DiGraph object
- Returns:
bool: whether g contains a cycle.
get_dependency_graph_as_str(self) → str
Return the conditions concatenated into a single string, separated by ';'.
get_ordered_param_when_valid(self) → List[str]
Return an ordered list of parameters if the dependency graph contains no cycle.
get_params(self)
Get the hyperparameters.
- Returns:
dict(string, string): The key of the dictionary is the name of the hyperparameter, e.g. --learning_rate, and the value is the value of the hyperparameter, e.g. 0.01.
serialize_graph(self, graph) → List[str]
Return an ordered list of the nodes in graph. Assumes the graph has no cycle.
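detect_cycle, get_ordered_param_when_valid, and serialize_graph all revolve around ordering the dependency graph built from conditions, so that a parent hyperparameter is handled before the hyperparameters that depend on it. A rough sketch of that idea, using the standard library's graphlib.TopologicalSorter (Python 3.9+) in place of networkx, is shown below. ordered_params and has_cycle are hypothetical helpers, and treating the order as a topological order with parents first is an assumption.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set


def ordered_params(depends_on: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Order hyperparameters so every parent precedes its dependents.

    depends_on maps a dependent hyperparameter to the parents it is
    conditioned on. Returns None if the dependencies contain a cycle.
    """
    try:
        # TopologicalSorter expects node -> predecessors, i.e. dependent -> parents.
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


def has_cycle(depends_on: Dict[str, Set[str]]) -> bool:
    """Analogue of detect_cycle: True when no valid ordering exists."""
    return ordered_params(depends_on) is None
```

For the dependencies in the Condition example later on this page (for instance, -b depends on -a and -c, and both -a and -c depend on -g), ordered_params places -g before -a and -c, and both of those before -b.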
Evaluator
class crayai.hpo.evaluator.Evaluator(cmd, **kwargs)
The Evaluator class defines how to evaluate a set of hyperparameters by running the kernel program (model training script) with command-line arguments.
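The kernel program therefore has to accept each hyperparameter flag on its command line and report the metric in its output. The following is a hypothetical minimal kernel script built with argparse: it reuses the flags from the Params example above, replaces real training with a made-up placeholder loss, and assumes the metric is reported as a line of the form "FoM: <value>". Check which output format your CrayAI version expects.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Flag names must match the flag strings given to hpo.Params.
    parser = argparse.ArgumentParser(description="Toy kernel script for HPO")
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    parser.add_argument("--dropout_rate", type=float, default=0.5)
    parser.add_argument("--optimizer", choices=["sgd", "gd", "adam"], default="sgd")
    return parser.parse_args(argv)


def train(args: argparse.Namespace) -> float:
    # Placeholder for real training: a made-up loss that is smallest
    # at learning_rate=1e-3, dropout_rate=0.2 and optimizer='adam'.
    penalty = {"sgd": 0.1, "gd": 0.2, "adam": 0.0}[args.optimizer]
    return abs(args.learning_rate - 1e-3) + abs(args.dropout_rate - 0.2) + penalty


def main(argv: Optional[List[str]] = None) -> str:
    fom = train(parse_args(argv))
    line = f"FoM: {fom}"
    print(line)  # the evaluator searches the output for the metric name
    return line


if __name__ == "__main__":
    main()
```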
- Parameters
- cmd: str
Shell command used to evaluate the hyperparameters (without hyperparameter flags), for example python relative/path/to/model.py. The path to the kernel script should be a relative path. If src_path is defined, cmd should be set as if it is run from src_path. If src_path is not defined, cmd should be set as if it is run from the current working directory.
- src_path: str, optional (default="")
Path to source files. src_path must be defined in order to use run_path. If src_path="", all evaluations take place in the current working directory. Can be a relative or absolute path.
- run_path: str, optional (default='run')
Top-level workspace directory in which subdirectories are created for running evaluations and generating log files. src_path must be set for run_path to be used; otherwise, a warning is generated and all runs take place in the current working directory. Can be a relative or absolute path.
- metric: Any, optional (default={'FoM': 1.0})
Dictionary (or string) containing the evaluation metrics and their weights to use during hyperparameter optimization. A weighted sum is used when multiple metrics are specified. When a string is specified (e.g., 'f1: '), it is treated as a single metric with 100% of the weight assigned to it. Metric names are assumed to be unique strings that are searched for in the evaluation output. When multiple metrics are specified, CrayAI should be built with regex support, i.e., CHPL_REGEXP=re2. When a single metric is used, the evaluation output is parsed with regex if CHPL_REGEXP=re2; otherwise, plain string matching is used.
- checkpoint: str, optional (default="")
Path to the checkpoint directory per workspace. Required when using the PBT optimizer.
- nodes: int, optional (default=0)
Number of nodes to allocate for distributed training. Ignored when using an existing allocation. Setting nodes>1 with workload_manager='local' generates an error. When using Kubernetes, this designates the number of pods. If <=1, then allocate 1 node. If >1, then allocate that many nodes.
- nodes_per_eval: int, optional (default=0)
Number of nodes to use for each evaluation. Only applicable if the evaluation supports distributed execution. If nodes_per_eval is not set, one node is used per evaluation unless nodes per evaluation is itself a hyperparameter.
- num_parallel_evals: int, optional (default=0)
Number of evaluations to run in parallel. If 0, then run nodes/nodes_per_eval evaluations in parallel. If >0, then run that many evaluations in parallel.
- workload_manager: string, optional (default='auto')
Workload manager to be used for acquiring and managing allocations.
If 'auto', detect the workload manager; run locally if none is found.
If 'local', use no workload manager (run locally).
If 'slurm', use the Slurm workload manager.
If 'pbs', use the PBS workload manager.
If 'k8s', use Kubernetes as the workload manager (the launcher must also be 'k8s').
- launcher: string, optional (default='auto')
Launcher to be used for executing evaluations. This can be left as 'auto' unless using a different launcher than the workload manager.
If 'auto', inherit from workload_manager.
If 'local', run with no launcher (locally).
If 'slurm', use Slurm (srun) as the launcher.
If 'urika', use Urika (run_training) as the launcher.
If 'k8s', use Kubernetes as the launcher (the workload manager must also be 'k8s').
- workload_image: string, optional (default="")
The image containing the workload platform to be used in the evaluation (e.g., TensorFlow, PyTorch). This should be the specific name of the image in the registry, e.g., 'shasta-tensorflowv1.15-ubuntu:latest'. Currently only supported, and required, when workload_manager is 'k8s'.
- launch_args: string, optional (default="")
Flags to pass to the launcher command.
- alloc_jobid: int, optional (default=0)
Job ID specifying which allocation to use. Currently only supports workload_manager='slurm'. Can be omitted if the job ID is available through environment variables such as SLURM_JOBID.
- alloc_timeout: int, optional (default=30)
Number of minutes requested for the allocation.
- alloc_args: string, optional (default="")
Additional arguments to be passed to the allocation command, for example alloc_args='-C haswell'.
- timeout: real (default=0.0)
Time budget for all evaluations, in minutes.
- flag: dict, optional (default={})
Flags used to pass information (e.g., the number of nodes used in an evaluation) from the evaluator to the kernel script. Possible keys are: 'nodes_per_evaluation'. For example, setting flag={'nodes_per_evaluation': '--N'} allows nodes per evaluation to be represented by the flag --N.
- verbose: bool, optional (default=False)
Enable verbose output for evaluation and job management.
- num_retries: int, optional (default=0)
Number of times a failed evaluation can be re-attempted.
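The defaults for nodes, nodes_per_eval, and num_parallel_evals interact. The function below restates those rules as code, as a sketch only: resolve_resources is hypothetical, it ignores the case where nodes per evaluation is itself a hyperparameter, and floor division for nodes/nodes_per_eval is an assumption, since the text does not say how a non-integer ratio is handled.

```python
from typing import Tuple


def resolve_resources(nodes: int = 0, nodes_per_eval: int = 0,
                      num_parallel_evals: int = 0) -> Tuple[int, int, int]:
    """Return (allocated nodes, nodes per evaluation, parallel evaluations)."""
    # nodes <= 1 -> allocate 1 node; nodes > 1 -> allocate that many.
    allocated = nodes if nodes > 1 else 1
    # nodes_per_eval not set (0) -> one node per evaluation.
    per_eval = nodes_per_eval if nodes_per_eval > 0 else 1
    if num_parallel_evals > 0:
        parallel = num_parallel_evals        # explicit value wins
    else:
        parallel = allocated // per_eval     # 0 -> nodes / nodes_per_eval (floor assumed)
    return allocated, per_eval, parallel
```

For example, nodes=8 with nodes_per_eval=2 and the default num_parallel_evals=0 yields 4 evaluations running in parallel.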
Examples
from crayai import hpo
# Local evaluation
evaluator = hpo.Evaluator('python3 relative/path/to/train_model.py',
                          workload_manager='local')
# Distributed evaluation, where Evaluator will allocate nodes
evaluator = hpo.Evaluator('python3 relative/path/to/train_model.py',
                          workload_manager='slurm',
                          nodes=4)
# Evaluation with multiple metrics
evaluator = hpo.Evaluator('python3 relative/path/to/train_model.py --FoM',
                          metric={
                              'Metric_categorical_crossentropy': 0.5,
                              'Metric_weighted_categorical_crossentropy': 0.5
                          })
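The last example above weights two metrics equally, so the combined figure of merit is their average. How such a weighted sum could be computed from evaluation output is sketched below, assuming each metric appears as "<name>: <number>" and that the last occurrence of each name wins. weighted_fom is a hypothetical helper, not CrayAI's parser.

```python
import re
from typing import Dict

# Signed decimal or scientific-notation number, e.g. 0.25, -3, 1.5e-3
_NUMBER = r"([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)"


def weighted_fom(output: str, metric: Dict[str, float]) -> float:
    """Combine the metrics found in evaluation output into one weighted sum."""
    total = 0.0
    for name, weight in metric.items():
        # (?<!\w) keeps "FoM" from matching inside a longer name such as "xFoM".
        pattern = r"(?<!\w)" + re.escape(name) + r"\s*:\s*" + _NUMBER
        matches = re.findall(pattern, output)
        if not matches:
            raise ValueError(f"metric {name!r} not found in output")
        total += weight * float(matches[-1])  # use the last reported value
    return total
```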
Condition
An example of conditional hyperparameters:
from crayai import hpo
from crayai.hpo import condition
evaluator = hpo.Evaluator('python source/sin.py')
params = hpo.Params([["-a", 1.0, (1, 10.0)],
["-b", 1.0, (1, 10.0)],
["-c", 1.0, (1, 10.0)],
["-d", 1.0, (1, 10.0)],
["-e", 1.0, (1, 10.0)],
["-f", 1.0, (1, 10.0)],
["-g", 1.0, (1, 10.0)]])
conditions = [condition.greater_than('-b', '-a', 2),
              condition.greater_than('-b', '-c', 3),
              condition.less_than('-f', '-d', 2),
              condition.less_than('-e', '-b', 4),
              condition.less_than('-a', '-g', 5),
              condition.less_than('-c', '-g', 5)]
params.add_conditions(conditions)
class crayai.hpo.condition.Condition(dependent_param: str, parent_param: str, parent_param_value: Any, dependent_param_value: Any = None, operator: str = '')
Parent class for encoding a dependency between hyperparameters.
type_as_str(self, value)
Get the type as a clean string such as 'int', 'float' or 'list(str)'.
class crayai.hpo.condition.equals(dependent_param: str, parent_param: str, parent_param_value: Any, dependent_param_value: Any = None, operator: str = '==')
Condition subclass where a hyperparameter is used only when its parent hyperparameter meets an equality condition (e.g., b | a == 1).
class crayai.hpo.condition.greater_than(dependent_param: str, parent_param: str, parent_param_value: Any, dependent_param_value: Any = None, operator: str = '>')
Condition subclass where a hyperparameter is used only when its parent hyperparameter is greater than a given value (e.g., b | a > 5.0).
class crayai.hpo.condition.inside(dependent_param: str, parent_param: str, parent_param_value: Any, dependent_param_value: Any = None, operator: str = 'in')
Condition subclass where a hyperparameter is used only when its parent hyperparameter takes one of a given list of values (e.g., b | a in [1, 2, 3, 4]).
class crayai.hpo.condition.less_than(dependent_param: str, parent_param: str, parent_param_value: Any, dependent_param_value: Any = None, operator: str = '<')
Condition subclass where a hyperparameter is used only when its parent hyperparameter is less than a given value (e.g., b | a < 10.0).
class crayai.hpo.condition.not_equals(dependent_param: str, parent_param: str, parent_param_value: Any, dependent_param_value: Any = None, operator: str = '!=')
Condition subclass where a hyperparameter is used only when its parent hyperparameter meets an inequality condition (e.g., b | a != 1).
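Each Condition subclass above encodes the rule "use the dependent hyperparameter only if parent <operator> parent_param_value", with the operator given by its default operator argument. That decision can be sketched with Python's operator module. is_active is a hypothetical helper, not how crayai evaluates conditions internally; in the example at the start of this section, condition.greater_than('-b', '-a', 2) corresponds to is_active('>', value_of_a, 2) deciding whether -b is used.

```python
import operator
from typing import Any, Callable, Dict

# Operator strings taken from the Condition subclass signatures above.
_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    "in": lambda value, allowed: value in allowed,
}


def is_active(op: str, parent_value: Any, parent_param_value: Any) -> bool:
    """Return True if the dependent hyperparameter should be used."""
    try:
        check = _OPS[op]
    except KeyError:
        raise ValueError(f"unsupported operator {op!r}") from None
    return bool(check(parent_value, parent_param_value))
```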
HPO Strategies
Genetic HPO
Population-based Training
Population-based training (PBT) requires enabling checkpointing, which requires some modifications to the model training program code, as well as additional arguments to the Optimizer constructor.
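The code change PBT needs is that the training program can save its state and later resume from it. The sketch below shows only that general save/resume pattern in plain Python: the state.json file, its fields, and the save_checkpoint, load_checkpoint, and train_some_epochs names are hypothetical. The directory would be the per-workspace checkpoint path configured through the Evaluator's checkpoint argument; consult the CrayAI PBT documentation for the flags and layout the optimizer actually uses.

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_checkpoint(ckpt_dir: str, state: Dict[str, Any]) -> None:
    """Write training state via a temporary file so a partial write never replaces a good one."""
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, "state.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # swap in the new checkpoint in one step


def load_checkpoint(ckpt_dir: str) -> Dict[str, Any]:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    path = os.path.join(ckpt_dir, "state.json")
    if not os.path.exists(path):
        return {"epoch": 0, "weights": None}
    with open(path) as f:
        return json.load(f)


def train_some_epochs(ckpt_dir: str, epochs: int) -> Dict[str, Any]:
    state = load_checkpoint(ckpt_dir)   # resume where the previous run stopped
    for _ in range(epochs):
        state["epoch"] += 1             # placeholder for one epoch of training
    save_checkpoint(ckpt_dir, state)    # persist state for the next run
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ckpt:
        print(train_some_epochs(ckpt, 2))  # first run starts from epoch 0
        print(train_some_epochs(ckpt, 3))  # second run resumes from epoch 2
```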
class crayai.hpo.genetic_optimizer.GeneticOptimizer(evaluator=None, **kwargs)
Genetic optimizer
Employs a genetic search over the hyperparameters to minimize the figure of merit (FoM). Hyperparameters can be optimized from a list of values (e.g., a learning rate selected from the list [1e-4, 1e-3, 1e-2, 1e-1]) or from a range of values (e.g., any real number in (1e-4, 1e-1)). If a hyperparameter is optimized from a list of values, only values in the list are searched, and mutations occur with respect to their indices. For example, a hyperparameter at index 20 in a list of 100 elements has a higher chance of mutating to a nearby index, such as 21 or 19.
- Parameters
evaluator (Evaluator) – Evaluator instance
generations (int) – Number of generations. Defaults to 1000.
num_demes (int) – Number of distinct demes (populations). Defaults to 4.
pop_size (int) – Number of individuals per deme. The total number of individuals per generation is num_demes * pop_size. Defaults to 64.
mutation_rate (float) – Probability of mutation per hyperparameter during creation of the next generation. Can be 0.0 to 1.0. Defaults to 0.05 (5%).
crossover_rate (float) – Probability of crossover per hyperparameter during creation of the next generation. Can be 0.0 to 1.0. Defaults to 0.33 (33%).
migration_interval (float) – Interval of migration between demes. Defaults to 5.
mul_mutation_bounds (list) – Bounds on mutation percentages. Index 0 is the upper bound of a small mutation, index 1 is the lower bound of a large mutation, and index 2 is the upper bound of a large mutation. Defaults to [0.01, 0.1, 0.2] ([1%, 10%, 20%]).
add_mutation_bounds (list) – Bounds on addition percentages. Index 0 is the upper bound of a small mutation, index 1 is the lower bound of a large mutation, and index 2 is the upper bound of a large mutation. Defaults to [0.03, 0.03, 0.13] ([3%, 3%, 13%]).
name (str) – Experiment name used as a prefix for log filenames that record the results of optimization. Defaults to the empty string "".
verbose (bool) – Enable verbose output. Defaults to False.
best_fom = None
Figure of merit associated with the best hyperparameters.
best_params = None
Best set of hyperparameters found.
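The per-hyperparameter crossover and the index-based mutation for list-valued hyperparameters described above can be sketched as follows. This is not the GeneticOptimizer implementation: make_child and mutate_index are hypothetical, only list-valued (index-encoded) hyperparameters are handled, and the geometric distribution of mutation step sizes is an assumption; the text only states that nearby indices are more likely.

```python
import random
from typing import List


def mutate_index(index: int, n_values: int, rng: random.Random,
                 p_stop: float = 0.5) -> int:
    """Move index by a random step; small steps are more likely than large ones."""
    step = 1
    while rng.random() > p_stop and step < n_values:  # geometric step length
        step += 1
    new = index + step if rng.random() < 0.5 else index - step
    return min(max(new, 0), n_values - 1)  # clamp to a valid list position


def make_child(parent_a: List[int], parent_b: List[int], sizes: List[int],
               rng: random.Random, crossover_rate: float = 0.33,
               mutation_rate: float = 0.05) -> List[int]:
    """Build one child; each gene is an index into a hyperparameter's value list."""
    child = []
    for gene_a, gene_b, n in zip(parent_a, parent_b, sizes):
        # Crossover: take this hyperparameter from parent_b with probability crossover_rate.
        gene = gene_b if rng.random() < crossover_rate else gene_a
        # Mutation: shift the index with probability mutation_rate.
        if rng.random() < mutation_rate:
            gene = mutate_index(gene, n, rng)
        child.append(gene)
    return child
```

With the defaults (num_demes=4, pop_size=64), a generation consists of 256 individuals, each encoded as such a list of genes when all hyperparameters are list-valued.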
Random Search
class crayai.hpo.random_optimizer.RandomOptimizer(evaluator, **kwargs)
Random optimizer
Employs a random search to minimize the figure of merit (FoM).
- Parameters
evaluator (Evaluator) – Evaluator instance
num_iters (int) – Number of iterations to run. Defaults to 1000.
seed (int) – Seed for the random number generator. Defaults to 0, which means a random seed is used.
name (str) – Experiment name used as a prefix for log filenames that record the results of optimization. Defaults to the empty string "".
verbose (bool) – Enable verbose output. Defaults to False.
best_fom = None
Figure of merit associated with the best hyperparameters.
best_params = None
Best set of hyperparameters found.
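A random search draws each hyperparameter independently and keeps the sample with the lowest FoM. The sketch below does this against a Python objective instead of running a kernel script through an Evaluator; random_search is hypothetical, and drawing ranges uniformly and lists with equal probability is an assumption about the sampling, not a description of RandomOptimizer's internals.

```python
import random
from typing import Any, Callable, Dict, List, Tuple, Union

Space = Union[Tuple[float, float], List[Any]]


def random_search(space: Dict[str, Space],
                  objective: Callable[[Dict[str, Any]], float],
                  num_iters: int = 1000,
                  seed: int = 0) -> Tuple[float, Dict[str, Any]]:
    """Return (best_fom, best_params) after num_iters random samples."""
    # Mirror the documented convention: seed 0 means "use a random seed".
    rng = random.Random(seed if seed != 0 else None)
    best_fom: float = float("inf")
    best_params: Dict[str, Any] = {}
    for _ in range(num_iters):
        params = {
            flag: rng.choice(values) if isinstance(values, list)
            else rng.uniform(*values)
            for flag, values in space.items()
        }
        fom = objective(params)
        if fom < best_fom:  # lower FoM is better
            best_fom, best_params = fom, params
    return best_fom, best_params
```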
Grid Search
class crayai.hpo.grid_optimizer.GridOptimizer(evaluator, **kwargs)
Grid optimizer
Employs a grid search to minimize the figure of merit (FoM).
- Parameters
evaluator (Evaluator) – Evaluator instance
grid_size (int) – Number of grid points used to discretize each hyperparameter range. This argument does not affect hyperparameters given as lists of values. Defaults to 4.
chunk_size (int) – Number of grid points to evaluate per batch (chunk). Defaults to 1000.
name (str) – Experiment name used as a prefix for log filenames that record the results of optimization. Defaults to the empty string "".
verbose (bool) – Enable verbose output. Defaults to False.
best_fom = None
Figure of merit associated with the best hyperparameters.
best_params = None
Best set of hyperparameters found.
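A grid search evaluates the Cartesian product of per-hyperparameter grids. In the sketch below, each (lower, upper) range is discretized into grid_size evenly spaced points including both endpoints, lists of values are used unchanged, and points are yielded in batches of at most chunk_size. The even spacing rule is an assumption, and axis and grid_points are hypothetical helpers, not the GridOptimizer implementation.

```python
import itertools
from typing import Any, Dict, Iterator, List, Tuple, Union

Space = Union[Tuple[float, float], List[Any]]


def axis(values: Space, grid_size: int) -> List[Any]:
    """Discretize a (lower, upper) range; lists of values are left unchanged."""
    if isinstance(values, list):
        return values
    lower, upper = values
    if grid_size == 1:
        return [lower]
    step = (upper - lower) / (grid_size - 1)
    return [lower + i * step for i in range(grid_size)]


def grid_points(space: Dict[str, Space], grid_size: int = 4,
                chunk_size: int = 1000) -> Iterator[List[Dict[str, Any]]]:
    """Yield the full grid in chunks of at most chunk_size points."""
    flags = list(space)
    axes = [axis(space[f], grid_size) for f in flags]
    # Lazily enumerate every combination so large grids are never held in memory at once.
    product = (dict(zip(flags, combo)) for combo in itertools.product(*axes))
    while True:
        chunk = list(itertools.islice(product, chunk_size))
        if not chunk:
            return
        yield chunk
```

With grid_size=4, one range hyperparameter and a two-value list produce 4 × 2 = 8 grid points.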