# Introduction
argmin is a numerical optimization library written entirely in Rust.
Its goal is to offer a wide range of optimization algorithms with a consistent interface.
It is type-agnostic by design, meaning that any type and/or math backend, such as `nalgebra` or `ndarray`, can be used -- even your own.
Observers allow one to track the progress of iterations, either by using one of the provided observers for logging to screen or disk, or by implementing your own.
An optional checkpointing mechanism helps to mitigate the negative effects of crashes in unstable computing environments.
Thanks to Rust's powerful generics and traits, most features can be exchanged for your own tailored implementations.
argmin is designed to simplify the implementation of optimization algorithms and as such can also be used as a toolbox for the development of new algorithms. One can focus on the algorithm itself, while the handling of termination, parameter vectors, populations, gradients, Jacobians and Hessians is taken care of by the library.
> **IMPORTANT NOTE**
>
> This book covers version 0.10 of argmin! Parts of this book may not apply to versions below 0.10.
## The argmin ecosystem
The ecosystem consists of a number of crates:
- `argmin`: Optimization algorithms and framework
- `argmin-math`: Interface for math backend abstraction and implementations for various versions of `ndarray`, `nalgebra` and `Vec`s
- `argmin-testfunctions`: A collection of test functions
- `finitediff`: Finite differentiation
- `modcholesky`: Modified Cholesky decompositions
## Algorithms
argmin comes with a number of line searches (Backtracking, More-Thuente, Hager-Zhang), trust region methods (Cauchy point, Dogleg, Steihaug), steepest descent, (nonlinear) conjugate gradient, Newton's method, Newton-CG, quasi-Newton methods (BFGS, L-BFGS, DFP, SR1-TrustRegion), Gauss-Newton methods (with and without line search), golden-section search, Landweber, Brent's optimization and root-finding methods, Nelder-Mead, Simulated Annealing, Particle Swarm Optimization and CMA-ES.
For a complete and up-to-date list of all algorithms please visit the API documentation.
Examples for each algorithm can be found on Github. Make sure to choose the tag matching the argmin version you are using.
## Documentation
This book is a guide on how to use argmin's algorithms as well as on how to implement algorithms using argmin's framework. For detailed information on specific algorithms or traits, please refer to argmin's API documentation.
The argmin-math documentation outlines the abstractions over the math backends, and the argmin-testfunctions API documentation lists all available test functions.
For details on how to use `finitediff` for finite differentiation, please refer to the corresponding API documentation. The documentation of `modcholesky` can be found here.
## Discord server
Feel free to join the argmin Discord!
## License
Both the source code and the documentation are licensed under either of

- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
## Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
# Getting started
This chapter shows how to integrate argmin into your project, how to define an optimization problem, and how to apply a solver to it. It also showcases other features such as observers and checkpointing.
## Using argmin
In order to use argmin, one needs to add both `argmin` and `argmin-math` to `Cargo.toml`:
```toml
[dependencies]
argmin = { version = "0.10" }
argmin-math = { version = "0.4", features = ["ndarray_latest", "nalgebra_latest"] }
```
or, for the current development version:
```toml
[dependencies]
argmin = { git = "https://github.com/argmin-rs/argmin" }
argmin-math = { git = "https://github.com/argmin-rs/argmin", features = ["ndarray_latest", "nalgebra_latest"] }
```
By adding `argmin-math`, one can choose which math backend should be available. The examples above use both the `ndarray_latest` and `nalgebra_latest` features of `argmin-math`. For details on which options are available in `argmin-math`, please refer to the documentation.
### Crate features
argmin offers a number of features which can be enabled or disabled depending on your needs.
#### Optional

- `serde1`: Support for `serde`. Needed for checkpointing. Deactivating this feature leads to fewer dependencies and can lower compilation time, but it will also disable checkpointing.
- `ctrlc`: Uses the `ctrlc` crate to properly stop the optimization (and return the current best result) after pressing `Ctrl+C` during an optimization run.
- `rayon`: Adds `rayon` as a dependency and allows for parallel computation of cost functions, operators, gradients, Jacobians and Hessians. Note that only solvers that operate on multiple parameter vectors per iteration benefit from this feature (e.g. Particle Swarm Optimization).
- `full`: Enables all default and optional features.
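Optional features are enabled in `Cargo.toml` as usual; the combination below is just an illustration:

```toml
[dependencies]
# Example: enable checkpointing support (serde1) and parallel bulk evaluation (rayon)
argmin = { version = "0.10", features = ["serde1", "rayon"] }
```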
### Experimental support for compiling to WebAssembly
Compiling to WASM requires the feature `wasm-bindgen`.
WASM support is still experimental. Please report any issues you encounter when using argmin in a WASM context.
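As a rough sketch, this could look as follows in `Cargo.toml`. Disabling the default features (such as `ctrlc`, which may not be available on the WASM target) is an assumption here, not an official recommendation:

```toml
[dependencies]
argmin = { version = "0.10", default-features = false, features = ["wasm-bindgen"] }
```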
## Which math backend to use
argmin offers abstractions over basic `Vec`s as well as `ndarray` and `nalgebra` types.
For performance reasons, the latter two should be preferred. Which one to use is a matter of taste and may depend on what you are already using.

`Vec`s, on the other hand, do not have very efficient implementations of the different mathematical operations and are therefore not well suited for solvers which rely heavily on matrix operations.
However, `Vec`s are suitable for solvers such as Simulated Annealing and Particle Swarm Optimization, which mainly operate on the parameter vectors themselves.
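To illustrate what switching backends looks like, the following is a minimal sketch of a cost function defined on `ndarray` types (assuming a matching `ndarray_*` feature of `argmin-math` is enabled; the `Quadratic` problem is made up for this example):

```rust
use argmin::core::{CostFunction, Error};
use ndarray::Array1;

struct Quadratic {}

impl CostFunction for Quadratic {
    // The parameter vector is now an ndarray type instead of a `Vec<f64>`
    type Param = Array1<f64>;
    type Output = f64;

    fn cost(&self, p: &Self::Param) -> Result<Self::Output, Error> {
        // Squared Euclidean norm of `p`
        Ok(p.dot(p))
    }
}
```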
## Examples
Every solver and most features of argmin are showcased in the examples directory.
Each example is a dedicated Rust crate with a corresponding `Cargo.toml` which includes all relevant dependencies to run it.
Make sure that the examples you are looking at match the argmin version you are using by choosing the appropriate Git version tag on Github.
# Basic concept
There are three components needed for solving an optimization problem in argmin:
- A definition of the optimization problem / model
- A solver
- An executor
The `Executor` applies the solver to the optimization problem. It also accepts observers and checkpointing mechanisms, as well as an initial guess of the parameter vector, the cost function value at that initial guess, the gradient, and so on.
A solver is anything that implements the `Solver` trait. This trait defines how the optimization algorithm is initialized, how a single iteration is performed, and when and how to terminate the iterations.
The optimization problem needs to implement a subset of the traits `CostFunction`, `Gradient`, `Jacobian`, `Hessian`, and `Operator`.
Which subset is needed is given by the requirements of the solver.
For example, `SteepestDescent`, as a gradient descent method, requires `CostFunction` and `Gradient`, while Newton's method expects `Gradient` and `Hessian`.
## Defining an optimization problem
Depending on the requirements of the solver that is to be used, the optimization problem needs to implement a subset of the following traits:

- `CostFunction`: Computes the cost or fitness for a parameter vector `p`
- `Gradient`: Computes the gradient for a parameter vector `p`
- `Jacobian`: Computes the Jacobian for a parameter vector `p`
- `Hessian`: Computes the Hessian for a parameter vector `p`
- `Operator`: Applies an operator to the parameter vector `p`
- `Anneal`: Creates a new parameter vector by "annealing" the current parameter vector `p` (needed for Simulated Annealing)
Which subset is needed is given in the documentation of each solver.
### Example
The following code snippet shows how to use the Rosenbrock test functions from `argmin-testfunctions` in argmin.
For the sake of simplicity, this example will use the `Vec` math backend.
```rust
#![allow(unused)]
fn main() {
extern crate argmin;
extern crate argmin_testfunctions;
use argmin_testfunctions::{rosenbrock, rosenbrock_derivative, rosenbrock_hessian};
use argmin::core::{Error, CostFunction, Gradient, Hessian};

/// First, we create a struct called `Rosenbrock` for your problem
struct Rosenbrock {}

/// Implement `CostFunction` for `Rosenbrock`
///
/// First, we need to define the types which we will be using. Our parameter
/// vector will be a `Vec` of `f64` values and our cost function value will
/// be a 64 bit floating point value.
/// This is reflected in the associated types `Param` and `Output`, respectively.
///
/// The method `cost` then defines how the cost function is computed for a
/// parameter vector `p`.
impl CostFunction for Rosenbrock {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the return value computed by the cost function
    type Output = f64;

    /// Apply the cost function to a parameter `p`
    fn cost(&self, p: &Self::Param) -> Result<Self::Output, Error> {
        // Evaluate Rosenbrock function
        Ok(rosenbrock(p))
    }
}

/// Implement `Gradient` for `Rosenbrock`
///
/// Similarly to `CostFunction`, we need to define the type of our parameter
/// vectors and of the gradient we are computing. Since the gradient is also
/// a vector, it is of type `Vec<f64>` just like `Param`.
impl Gradient for Rosenbrock {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the gradient
    type Gradient = Vec<f64>;

    /// Compute the gradient at parameter `p`.
    fn gradient(&self, p: &Self::Param) -> Result<Self::Gradient, Error> {
        // Compute gradient of the Rosenbrock function
        Ok(rosenbrock_derivative(p))
    }
}

/// Implement `Hessian` for `Rosenbrock`
///
/// Again, the types of the involved parameter vector and the Hessian need to
/// be defined. Since the Hessian is a 2D matrix, we use `Vec<Vec<f64>>` here.
impl Hessian for Rosenbrock {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the Hessian
    type Hessian = Vec<Vec<f64>>;

    /// Compute the Hessian at parameter `p`.
    fn hessian(&self, p: &Self::Param) -> Result<Self::Hessian, Error> {
        // Compute Hessian of the Rosenbrock function
        Ok(rosenbrock_hessian(p))
    }
}
}
```
The following example shows how to implement the `Operator` trait when the operator is the 2x2 matrix `[4.0, 1.0; 1.0, 3.0]`.
```rust
#![allow(unused)]
fn main() {
extern crate argmin;
use argmin::core::{Error, Operator};

struct MyProblem {}

impl Operator for MyProblem {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the output
    type Output = Vec<f64>;

    /// Apply the matrix to the parameter vector `p`
    fn apply(&self, p: &Self::Param) -> Result<Self::Output, Error> {
        Ok(vec![4.0 * p[0] + 1.0 * p[1], 1.0 * p[0] + 3.0 * p[1]])
    }
}
}
```
### Parallel evaluation with `bulk_*` methods
> **NOTE**
>
> So far only Particle Swarm Optimization allows parallel evaluation of parameter vectors. Therefore, no increase in performance should be expected for other solvers.
All of the above-mentioned traits come with additional methods which enable processing several parameter vectors at once.
These methods require the `rayon` feature to be enabled. Without this feature, the methods fall back to sequential processing of all inputs.
The methods are shown in the following table:
| Trait          | Single input | Multiple inputs |
|----------------|--------------|-----------------|
| `CostFunction` | `cost`       | `bulk_cost`     |
| `Gradient`     | `gradient`   | `bulk_gradient` |
| `Jacobian`     | `jacobian`   | `bulk_jacobian` |
| `Hessian`      | `hessian`    | `bulk_hessian`  |
| `Operator`     | `apply`      | `bulk_apply`    |
| `Anneal`       | `anneal`     | `bulk_anneal`   |
These `bulk_*` methods come with a default implementation, which essentially looks like this:
```rust
#![allow(unused)]
fn main() {
extern crate argmin;
extern crate argmin_testfunctions;
use argmin_testfunctions::rosenbrock;
use argmin::core::{Error, CostFunction, SyncAlias, SendAlias};

struct Rosenbrock {}

impl CostFunction for Rosenbrock {
    type Param = Vec<f64>;
    type Output = f64;

    /// Conventional cost function which only processes a single parameter vector
    fn cost(&self, p: &Self::Param) -> Result<Self::Output, Error> {
        // [ ... ]
        Ok(rosenbrock(p))
    }

    /// Bulk cost function which processes an array of parameter vectors
    fn bulk_cost<'a, P>(&self, params: &'a [P]) -> Result<Vec<Self::Output>, Error>
    where
        P: std::borrow::Borrow<Self::Param> + SyncAlias,
        Self::Output: SendAlias,
        Self: SyncAlias,
    {
        #[cfg(feature = "rayon")]
        {
            if self.parallelize() {
                // Process the parameter vectors in parallel using rayon
                params.par_iter().map(|p| self.cost(p.borrow())).collect()
            } else {
                params.iter().map(|p| self.cost(p.borrow())).collect()
            }
        }
        #[cfg(not(feature = "rayon"))]
        {
            params.iter().map(|p| self.cost(p.borrow())).collect()
        }
    }

    /// Indicates whether parameter vectors passed to `bulk_cost` should be
    /// evaluated in parallel or sequentially.
    /// This allows one to turn off parallelization for individual traits, even if
    /// the `rayon` feature is enabled.
    fn parallelize(&self) -> bool {
        true
    }
}
}
```
If needed, the `bulk_*` methods can be overwritten by custom implementations.

Besides the `bulk_*` methods, there is also a method `parallelize` which by default returns `true`.
This is a convenient way to turn parallelization off, even if the `rayon` feature is enabled.
It allows one to, for instance, turn parallel processing on for `CostFunction` and off for `Gradient`.
To turn it off, simply overwrite `parallelize` such that it returns `false`, as in the sketch below.
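A minimal sketch of such an override, reusing the `Rosenbrock` problem from above:

```rust
use argmin::core::{Error, Gradient};
use argmin_testfunctions::rosenbrock_derivative;

struct Rosenbrock {}

impl Gradient for Rosenbrock {
    type Param = Vec<f64>;
    type Gradient = Vec<f64>;

    fn gradient(&self, p: &Self::Param) -> Result<Self::Gradient, Error> {
        Ok(rosenbrock_derivative(p))
    }

    // `bulk_gradient` now processes its inputs sequentially, even if the
    // `rayon` feature is enabled (e.g. while `CostFunction` stays parallel).
    fn parallelize(&self) -> bool {
        false
    }
}
```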
## Running a solver
The `Executor`'s constructor takes a solver and an optimization problem as input and applies the solver to the problem.
The initial state of the optimization run can be modified via the `configure` method. This method accepts a closure with the state as its only parameter, which allows one to provide initial parameter vectors, cost function values, Hessians, and so on.

There are different kinds of state, and which kind is used depends on the solver. Most solvers internally use `IterState`, but some (for instance Particle Swarm Optimization) use `PopulationState`. Please refer to the respective documentation for details on how to modify the state.
Once the `Executor` is configured, the optimization is run via the `run` method. This method returns an `OptimizationResult` which contains the provided problem, the solver and the final state.
Assuming the variable is called `res`, the final parameter vector can be accessed via `res.state().get_best_param()` and the corresponding cost function value via `res.state().get_best_cost()`.
For an overview, `OptimizationResult`'s `Display` implementation can be used to print the result: `println!("{}", res)`.
The following example shows how to use the `SteepestDescent` solver to solve a problem which implements `CostFunction` and `Gradient` (both of which are required by this solver).
```rust
#![allow(unused_imports)]
extern crate argmin;
extern crate argmin_testfunctions;
use argmin::core::{State, Error, Executor, CostFunction, Gradient};
use argmin::solver::gradientdescent::SteepestDescent;
use argmin::solver::linesearch::MoreThuenteLineSearch;
use argmin_testfunctions::{rosenbrock, rosenbrock_derivative};

struct MyProblem {}

// Implement `CostFunction` for `MyProblem`
impl CostFunction for MyProblem {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the return value computed by the cost function
    type Output = f64;

    /// Apply the cost function to a parameter `p`
    fn cost(&self, p: &Self::Param) -> Result<Self::Output, Error> {
        Ok(rosenbrock(p))
    }
}

// Implement `Gradient` for `MyProblem`
impl Gradient for MyProblem {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the gradient
    type Gradient = Vec<f64>;

    /// Compute the gradient at parameter `p`.
    fn gradient(&self, p: &Self::Param) -> Result<Self::Gradient, Error> {
        Ok(rosenbrock_derivative(p).to_vec())
    }
}

fn run() -> Result<(), Error> {
    // Create new instance of cost function
    let cost = MyProblem {};

    // Define initial parameter vector
    let init_param: Vec<f64> = vec![-1.2, 1.0];

    // Set up line search needed by `SteepestDescent`
    let linesearch = MoreThuenteLineSearch::new();

    // Set up solver -- `SteepestDescent` requires a line search
    let solver = SteepestDescent::new(linesearch);

    // Create an `Executor` object
    let res = Executor::new(cost, solver)
        // Via `configure`, one has access to the internally used state.
        // This state can be initialized, for instance by providing an
        // initial parameter vector.
        // The maximum number of iterations is also set via this method.
        // In this particular case, the state exposed is of type `IterState`.
        // The documentation of `IterState` shows how this struct can be
        // manipulated.
        // Population based solvers use `PopulationState` instead of
        // `IterState`.
        .configure(|state| {
            state
                // Set initial parameters (depending on the solver,
                // this may be required)
                .param(init_param)
                // Set maximum iterations to 10
                // (optional, set to `std::u64::MAX` if not provided)
                .max_iters(10)
                // Set target cost. The solver stops when this cost
                // function value is reached (optional)
                .target_cost(0.0)
        })
        // run the solver on the defined problem
        .run()?;

    // print result
    println!("{}", res);

    // Extract results from state

    // Best parameter vector
    let best = res.state().get_best_param().unwrap();
    // Cost function value associated with best parameter vector
    let best_cost = res.state().get_best_cost();
    // Check the execution status
    let termination_status = res.state().get_termination_status();
    // Optionally, check why the optimizer terminated (if status is terminated)
    let termination_reason = res.state().get_termination_reason();
    // Time needed for optimization
    let time_needed = res.state().get_time().unwrap();
    // Total number of iterations needed
    let num_iterations = res.state().get_iter();
    // Iteration number where the last best parameter vector was found
    let num_iterations_best = res.state().get_last_best_iter();
    // Number of evaluation counts per method (Cost, Gradient)
    let function_evaluation_counts = res.state().get_func_counts();
    Ok(())
}

fn main() {
    if let Err(ref e) = run() {
        println!("{}", e);
        std::process::exit(1);
    }
}
```
Optionally, `Executor` allows one to terminate a run after a given timeout, which can be set with the `timeout` method of `Executor`.
The check of whether the overall runtime exceeds the timeout is performed after every iteration; therefore the actual runtime can be longer than the set timeout.
In case of a timeout, the run terminates with `TerminationReason::Timeout`, as sketched below.
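A sketch, assuming `cost`, `solver` and `init_param` are defined as in the example above:

```rust
let res = Executor::new(cost, solver)
    .configure(|state| state.param(init_param).max_iters(1000))
    // Terminate the run if it takes longer than 3 seconds. Since the check
    // happens after each iteration, the actual runtime may be slightly longer.
    .timeout(std::time::Duration::from_secs(3))
    .run()?;
```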
## Observing progress
Argmin offers an interface to observe the state of the solver at initialization as well as after every iteration. This includes the parameter vector, gradient, Jacobian, Hessian, iteration number, cost values and many more general as well as solver-specific metrics. This interface can be used to implement loggers, send information to storage, or plot metrics.
The observer `ParamWriter` saves the parameter vector to disk and as such requires the parameter vector to be serializable. This observer is available in the `argmin-observer-paramwriter` crate.

The observer `SlogLogger` logs the progress of the optimization to screen or to disk. This can be found in the `argmin-observer-slog` crate. Writing to disk requires the `serde1` feature to be enabled in `argmin-observer-slog`.
The rate at which the progress of the solver is observed can be set via `ObserverMode`, which can be either `Always`, `Never`, `NewBest` (whenever a new best solution is found) or `Every(i)`, which means every `i`th iteration.
Custom observers can be used as well by implementing the `Observe` trait (see the chapter on implementing an observer for details).
The following example shows how to add an observer to an `Executor` which logs progress to the terminal. The mode `ObserverMode::Always` ensures that every iteration is printed to screen. Note that multiple observers can be added to a single `Executor`.
```rust
#![allow(unused_imports)]
extern crate argmin;
extern crate argmin_testfunctions;
use argmin::core::{Error, Executor, CostFunction, Gradient};
use argmin::core::observers::ObserverMode;
use argmin_observer_slog::SlogLogger;
use argmin::solver::gradientdescent::SteepestDescent;
use argmin::solver::linesearch::MoreThuenteLineSearch;
use argmin_testfunctions::{rosenbrock, rosenbrock_derivative};

struct Rosenbrock {}

/// Implement `CostFunction` for `Rosenbrock`
impl CostFunction for Rosenbrock {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the return value computed by the cost function
    type Output = f64;

    /// Apply the cost function to a parameter `p`
    fn cost(&self, p: &Self::Param) -> Result<Self::Output, Error> {
        Ok(rosenbrock(p))
    }
}

/// Implement `Gradient` for `Rosenbrock`
impl Gradient for Rosenbrock {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the gradient
    type Gradient = Vec<f64>;

    /// Compute the gradient at parameter `p`.
    fn gradient(&self, p: &Self::Param) -> Result<Self::Gradient, Error> {
        Ok(rosenbrock_derivative(p))
    }
}

fn run() -> Result<(), Error> {
    // Define cost function (must implement `CostFunction` and `Gradient`)
    let cost = Rosenbrock {};

    // Define initial parameter vector
    let init_param: Vec<f64> = vec![-1.2, 1.0];

    // Set up line search
    let linesearch = MoreThuenteLineSearch::new();

    // Set up solver
    let solver = SteepestDescent::new(linesearch);

    // [...]

    let res = Executor::new(cost, solver)
        .configure(|state| state.param(init_param).max_iters(10))
        // Add an observer which will log all iterations to the terminal
        .add_observer(SlogLogger::term(), ObserverMode::Always)
        .run()?;

    println!("{}", res);
    Ok(())
}

fn main() {
    if let Err(ref e) = run() {
        println!("{}", e);
        std::process::exit(1);
    }
}
```
## Checkpointing
Checkpointing is a useful mechanism for mitigating the effects of crashes when software is run in an unstable environment, particularly for long run times. Checkpoints are snapshots of the current state of the optimization which can be resumed from in case of a crash. These checkpoints are saved regularly at a user-chosen frequency.
Currently, only saving checkpoints to disk with `FileCheckpoint` is provided, in the `argmin-checkpointing-file` crate. Via the `Checkpoint` trait, other checkpointing approaches can be implemented (see the chapter on implementing a checkpointing method for details).

The `CheckpointingFrequency` defines how often checkpoints are saved and can be chosen to be either `Always` (every iteration), `Every(u64)` (every Nth iteration) or `Never`.
The following example shows how the `checkpointing` method is used to configure and activate checkpointing.
If no checkpoint is available on disk yet, the optimization starts from scratch.
If the run crashes and a checkpoint is found on disk, it resumes from that checkpoint.
### Example
```rust
extern crate argmin;
extern crate argmin_testfunctions;
use argmin::core::{CostFunction, Error, Executor, Gradient};
#[cfg(feature = "serde1")]
use argmin::core::checkpointing::CheckpointingFrequency;
use argmin_checkpointing_file::FileCheckpoint;
use argmin::solver::landweber::Landweber;
use argmin_testfunctions::{rosenbrock, rosenbrock_derivative};

#[derive(Default)]
struct Rosenbrock {}

/// Implement `CostFunction` for `Rosenbrock`
impl CostFunction for Rosenbrock {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the return value computed by the cost function
    type Output = f64;

    /// Apply the cost function to a parameter `p`
    fn cost(&self, p: &Self::Param) -> Result<Self::Output, Error> {
        Ok(rosenbrock(p))
    }
}

/// Implement `Gradient` for `Rosenbrock`
impl Gradient for Rosenbrock {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the gradient
    type Gradient = Vec<f64>;

    /// Compute the gradient at parameter `p`.
    fn gradient(&self, p: &Self::Param) -> Result<Self::Gradient, Error> {
        Ok(rosenbrock_derivative(p).to_vec())
    }
}

fn run() -> Result<(), Error> {
    // define initial parameter vector
    let init_param: Vec<f64> = vec![1.2, 1.2];
    let my_optimization_problem = Rosenbrock {};

    let iters = 35;
    let solver = Landweber::new(0.001);

    // [...]

    #[cfg(feature = "serde1")]
    let checkpoint = FileCheckpoint::new(
        // Directory
        ".checkpoints",
        // File base name
        "optim",
        // How often to save a checkpoint (in this case every 20 iterations)
        CheckpointingFrequency::Every(20),
    );

    #[cfg(feature = "serde1")]
    let res = Executor::new(my_optimization_problem, solver)
        .configure(|state| state.param(init_param).max_iters(iters))
        .checkpointing(checkpoint)
        .run()?;

    // [...]

    Ok(())
}

fn main() {
    if let Err(ref e) = run() {
        println!("{}", e);
    }
}
```
# Advanced topics
## Implementing a solver
In this section we are going to implement the Landweber solver, which essentially is a special form of gradient descent. In iteration \( k \), the new parameter vector \( x_{k+1} \) is calculated from the previous parameter vector \( x_k \) and the gradient at \( x_k \) according to the following update rule:
\[ x_{k+1} = x_k - \omega \nabla f(x_k) \]
In order to implement this using argmin, one first needs to define the struct `Landweber`, which holds data specific to the solver (the step length \( \omega \)). Then, the `Solver` trait needs to be implemented for this struct. The `Solver` trait consists of several methods; however, not all of them need to be implemented, since most come with default implementations:
- `NAME`: a `&'static str` which holds the solver's name (mainly needed for the observers).
- `init(...)`: Runs before the actual iterations and initializes the solver. Does nothing by default.
- `next_iter(...)`: One iteration of the solver. Will be executed by the `Executor` until a stopping criterion is met.
- `terminate(...)`: Solver-specific stopping criteria. This method is run after every iteration. Note that one can also terminate from within `next_iter` if necessary. (A sketch of a custom `terminate` follows this list.)
- `terminate_internal(...)`: By default calls `terminate` and in addition checks if the maximum number of iterations was reached or if the best cost function value is below the target cost value. Should only be overwritten if absolutely necessary.
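As a hedged sketch, a custom stopping criterion in `terminate` could look like this (the tolerance field `tol` is illustrative and not part of any argmin solver):

```rust
use argmin::core::{TerminationReason, TerminationStatus};

// Inside an `impl Solver<O, IterState<P, G, (), (), (), F>> for MySolver` block:
fn terminate(&mut self, state: &IterState<P, G, (), (), (), F>) -> TerminationStatus {
    // Stop as soon as the current cost drops below a solver-specific tolerance
    if state.get_cost() < self.tol {
        TerminationStatus::Terminated(TerminationReason::SolverConverged)
    } else {
        TerminationStatus::NotTerminated
    }
}
```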
Both `init` and `next_iter` have access to the optimization problem (`problem`) as well as the internal state (`state`). The methods `terminate` and `terminate_internal` only have access to `state`.

The function parameter `problem` is a wrapped version of the optimization problem and as such gives access to the cost function, gradient, Hessian, Jacobian, and so on. It also keeps track of how often each of these is called.

Via `state`, the solver has access to the current parameter vector, the current best parameter vector, gradient, Hessian, Jacobian, population, the current iteration number, and so on. The `state` can be modified (for instance, a new parameter vector can be set) and is then returned by both `init` and `next_iter`. The `Executor` then takes care of updating the state properly, for instance by updating the current best parameter vector if the new parameter vector is better than the previous best.
It is advisable to design the solver such that it is generic over the actual type of the parameter vector, gradient, and so on.
The current naming convention for generics in argmin is as follows:
- `O`: Optimization problem
- `P`: Parameter vector
- `G`: Gradient
- `J`: Jacobian
- `H`: Hessian
- `F`: Floats (`f32` or `f64`)
These individual generic parameters are then constrained by trait bounds. For instance, the Landweber iteration requires the problem `O` to implement `Gradient`, therefore a trait bound of the form `O: Gradient<Param = P, Gradient = G>` is necessary.

From the Landweber update formula, we know that a scaled subtraction of two vectors is required. This must be represented in the form of a trait bound as well: `P: ArgminScaledSub<G, F, P>`. `ArgminScaledSub` is a trait from `argmin-math` which represents a scaled subtraction. With this trait bound, we require that it must be possible to subtract a value of type `G`, scaled by a value of type `F`, from a value of type `P`, resulting in a value of type `P`. The generic type `F` represents a floating point value and therefore allows users to choose the precision they want.
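For the `Vec` backend, `scaled_sub` amounts to an elementwise `x[i] - omega * g[i]`, as this small illustration shows:

```rust
use argmin_math::ArgminScaledSub;

let x = vec![1.0, 2.0];
let g = vec![0.5, 1.0];
let omega = 2.0;

// Computes x - omega * g elementwise
let x_new = x.scaled_sub(&omega, &g);
assert_eq!(x_new, vec![0.0, 0.0]);
```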
Implementing the algorithm is straightforward: First we get the current parameter vector `xk` from the state via `state.take_param()`. Note that `take_param` moves the parameter vector from the `state` into `xk`; therefore one needs to make sure to move the updated parameter vector into `state` at the end of `next_iter` via `state.param(...)`. Landweber requires the user to provide an initial parameter vector; if none was provided, we return an error to inform the user. Then the gradient `grad` is computed by calling `problem.gradient(...)` on the parameter vector. This returns the gradient and internally increases the gradient function evaluation count. We compute the updated parameter vector `xkp1` via `xk.scaled_sub(&self.omega, &grad)` (which is possible because of the `ArgminScaledSub` trait bound introduced before). Finally, the state is updated via `state.param(xkp1)` and returned by the function.
```rust
#![allow(unused)]
fn main() {
extern crate argmin;
use argmin::core::{
    ArgminFloat, KV, Error, Gradient, IterState, Problem, Solver, State
};
use argmin::argmin_error_closure;
use serde::{Deserialize, Serialize};
use argmin_math::ArgminScaledSub;

// Define a struct which holds any parameters/data which are needed during the
// execution of the solver. Note that this does not include parameter vectors,
// gradients, Hessians, cost function values and so on, as those will be
// handled by the `Executor` and its internal state.
#[derive(Serialize, Deserialize)]
pub struct Landweber<F> {
    /// Step length
    omega: F,
}

impl<F> Landweber<F> {
    /// Constructor
    pub fn new(omega: F) -> Self {
        Landweber { omega }
    }
}

impl<O, F, P, G> Solver<O, IterState<P, G, (), (), (), F>> for Landweber<F>
where
    // The Landweber solver requires `O` to implement `Gradient`.
    // `P` and `G` indicate the types of the parameter vector and gradient,
    // respectively.
    O: Gradient<Param = P, Gradient = G>,
    // The parameter vector of type `P` needs to implement `ArgminScaledSub`
    // because of the update formula
    P: Clone + ArgminScaledSub<G, F, P>,
    // `F` is the floating point type (`f32` or `f64`)
    F: ArgminFloat,
{
    // This gives the solver a name which will be used for logging
    const NAME: &'static str = "Landweber";

    // Defines the computations performed in a single iteration.
    fn next_iter(
        &mut self,
        // This gives access to the problem supplied to the `Executor`. `O` implements
        // `Gradient` and `Problem` takes care of counting the calls to the respective
        // functions.
        problem: &mut Problem<O>,
        // Current state of the optimization. This gives access to the parameter
        // vector, gradient, Hessian and cost function value of the current,
        // previous and best iteration as well as current iteration number, and
        // many more.
        mut state: IterState<P, G, (), (), (), F>,
    ) -> Result<(IterState<P, G, (), (), (), F>, Option<KV>), Error> {
        // First we obtain the current parameter vector from the `state` struct (`x_k`).
        // Landweber requires an initial parameter vector. Return an error if this was
        // not provided by the user.
        let xk = state.take_param().ok_or_else(argmin_error_closure!(
            NotInitialized,
            "Initial parameter vector required!"
        ))?;
        // Then we compute the gradient at `x_k` (`\nabla f(x_k)`)
        let grad = problem.gradient(&xk)?;
        // Now subtract `\nabla f(x_k)` scaled by `omega` from `x_k`
        // to compute `x_{k+1}`
        let xkp1 = xk.scaled_sub(&self.omega, &grad);
        // Return the updated `state`
        Ok((state.param(xkp1), None))
    }
}
}
```
Another example with a more complex solver, which requires `init` as well as returning a `KV`, is (hopefully) coming soon!
## Implementing an observer
Coming soon!
## Implementing a checkpointing method
Coming soon!
# Contributing
This crate is looking for contributors! Potential projects can be found in the Github issues, but feel free to suggest your own ideas. Besides adding optimization methods and new features, other contributions are also highly welcome, for instance improving performance, documentation, writing examples (with real world problems), developing tests, adding observers, implementing a C interface or Python wrappers. Bug reports (and fixes) are of course also highly appreciated.
## Running the tests
The repository is organized as a workspace with the two crates `argmin` and `argmin-math`.
It is recommended to run the tests for both crates individually, as this simplifies the choice of features.
In the case of `argmin`, all combinations of features should lead to working tests. For `argmin-math`, this is not the case, since some of the features are mutually exclusive.
Therefore it is recommended to run the tests for `argmin` from the root directory of the repository like this:
```bash
cargo test -p argmin --all-features
```
This will work for `--all-features` and any other combination of features. Note that not all tests will run if only a subset of the features is enabled.
In the case of `argmin-math`, one can just test the default features:
```bash
cargo test -p argmin-math
```
Or the default features plus the latest `ndarray`/`nalgebra` backends:
```bash
cargo test -p argmin-math --features "latest_all"
```
Individual backends can be tested as well; however, care has to be taken not to enable two different versions of the same backend, as that may not work.
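For instance, to test only the latest `ndarray` backend (reusing the `ndarray_latest` feature shown earlier):

```bash
cargo test -p argmin-math --features "ndarray_latest"
```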