# Introduction
argmin is a numerical optimization library written entirely in Rust.
Its goal is to offer a wide range of optimization algorithms with a consistent interface.
It is type-agnostic by design, meaning that any type and/or math backend, such as `nalgebra` or `ndarray`, can be used -- even your own.
Observers allow one to track the progress of iterations, either by using one of the provided observers for logging to screen or disk, or by implementing your own.
An optional checkpointing mechanism helps to mitigate the negative effects of crashes in unstable computing environments.
Thanks to Rust's powerful generics and traits, most features can be exchanged for your own tailored implementations.
argmin is designed to simplify the implementation of optimization algorithms and as such can also be used as a toolbox for the development of new algorithms. One can focus on the algorithm itself, while the handling of termination, parameter vectors, populations, gradients, Jacobians and Hessians is taken care of by the library.
> **IMPORTANT NOTE**
>
> This book covers version 0.10 of argmin! Parts of this book may not apply to versions below 0.10.
## The argmin ecosystem
The ecosystem consists of a number of crates:
- `argmin`: Optimization algorithms and framework
- `argmin-math`: Interface for math backend abstraction and implementations for various versions of `ndarray`, `nalgebra` and `Vec`s
- `argmin-testfunctions`: A collection of test functions
- `finitediff`: Finite differentiation
- `modcholesky`: Modified Cholesky decompositions
## Algorithms
argmin comes with a number of line searches (Backtracking, More-Thuente, Hager-Zhang), trust region methods (Cauchy point, Dogleg, Steihaug), steepest descent, (nonlinear) conjugate gradient, Newton's method, Newton-CG, quasi-Newton methods (BFGS, L-BFGS, DFP, SR1-TrustRegion), Gauss-Newton methods (with and without line search), golden-section search, Landweber, Brent's optimization and root-finding methods, Nelder-Mead, Simulated Annealing, Particle Swarm Optimization and CMA-ES.
For a complete and up-to-date list of all algorithms please visit the API documentation.
Examples for each algorithm can be found on Github. Make sure to choose the tag matching the argmin version you are using.
## Documentation
This book is a guide on how to use argmin's algorithms as well as on how to implement algorithms using argmin's framework. For detailed information on specific algorithms or traits, please refer to argmin's API documentation.
The argmin-math documentation outlines the abstractions over the math backends, and the argmin-testfunctions API documentation lists all available test functions.
For details on how to use `finitediff` for finite differentiation, please refer to the corresponding API documentation. The documentation of `modcholesky` can be found here.
## Discord server
Feel free to join the argmin Discord!
## License
Both the source code and the documentation are licensed under either of

- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
## Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
# Getting started
This chapter shows how to integrate argmin into your project, how to define an optimization problem, and how to apply a solver to it. It also showcases other features such as observers and checkpointing.
## Using argmin
In order to use argmin, one needs to add both `argmin` and `argmin-math` to `Cargo.toml`:
```toml
[dependencies]
argmin = { version = "0.10" }
argmin-math = { version = "0.4", features = ["ndarray_latest", "nalgebra_latest"] }
```
or, for the current development version:
```toml
[dependencies]
argmin = { git = "https://github.com/argmin-rs/argmin" }
argmin-math = { git = "https://github.com/argmin-rs/argmin", features = ["ndarray_latest", "nalgebra_latest"] }
```
By adding `argmin-math`, one can choose which math backend should be available. The examples above use both the `ndarray_latest` and `nalgebra_latest` features of `argmin-math`. For details on which options are available in `argmin-math`, please refer to the documentation.
### Crate features
argmin offers a number of features which can be enabled or disabled depending on your needs.
#### Optional

- `serde1`: Support for `serde`. Needed for checkpointing. Deactivating this feature leads to fewer dependencies and can lower compilation time, but it will also disable checkpointing.
- `ctrlc`: Uses the `ctrlc` crate to properly stop the optimization (and return the current best result) after pressing `Ctrl+C` during an optimization run.
- `rayon`: Adds `rayon` as a dependency and allows for parallel computation of cost functions, operators, gradients, Jacobians and Hessians. Note that only solvers that operate on multiple parameter vectors per iteration benefit from this feature (e.g. Particle Swarm Optimization).
- `full`: Enables all default and optional features.
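Optional features are enabled in `Cargo.toml` as usual; the combination below is just an illustration:

```toml
[dependencies]
# Example: enable checkpointing support (serde1) and parallel bulk evaluation (rayon)
argmin = { version = "0.10", features = ["serde1", "rayon"] }
```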
### Experimental support for compiling to WebAssembly
Compiling to WASM requires the feature `wasm-bindgen`.
WASM support is still experimental. Please report any issues you encounter when using argmin in a WASM context.
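As a rough sketch, this could look as follows in `Cargo.toml`. Disabling the default features (such as `ctrlc`, which may not be available on the WASM target) is an assumption here, not an official recommendation:

```toml
[dependencies]
argmin = { version = "0.10", default-features = false, features = ["wasm-bindgen"] }
```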
## Which math backend to use
argmin offers abstractions over basic `Vec`s as well as `ndarray` and `nalgebra` types.
For performance reasons, the latter two should be preferred. Which one to use is a matter of taste and may depend on what you are already using.

`Vec`s, on the other hand, do not have very efficient implementations of the different mathematical operations and are therefore not well suited for solvers which rely heavily on matrix operations.
However, `Vec`s are suitable for solvers such as Simulated Annealing and Particle Swarm Optimization, which mainly operate on the parameter vectors themselves.
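To illustrate what switching backends looks like, the following is a minimal sketch of a cost function defined on `ndarray` types (assuming a matching `ndarray_*` feature of `argmin-math` is enabled; the `Quadratic` problem is made up for this example):

```rust
use argmin::core::{CostFunction, Error};
use ndarray::Array1;

struct Quadratic {}

impl CostFunction for Quadratic {
    // The parameter vector is now an ndarray type instead of a `Vec<f64>`
    type Param = Array1<f64>;
    type Output = f64;

    fn cost(&self, p: &Self::Param) -> Result<Self::Output, Error> {
        // Squared Euclidean norm of `p`
        Ok(p.dot(p))
    }
}
```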
## Examples
Every solver and most features of argmin are showcased in the examples directory.
Each example is a dedicated Rust crate with a corresponding `Cargo.toml` which includes all relevant dependencies to run it.
Make sure that the examples you are looking at match the argmin version you are using by choosing the appropriate Git version tag on Github.
# Basic concept
There are three components needed for solving an optimization problem in argmin:
- A definition of the optimization problem / model
- A solver
- An executor
The `Executor` applies the solver to the optimization problem. It also accepts observers and checkpointing mechanisms, as well as an initial guess of the parameter vector, the cost function value at that initial guess, the gradient, and so on.
A solver is anything that implements the `Solver` trait. This trait defines how the optimization algorithm is initialized, how a single iteration is performed, and when and how to terminate the iterations.
The optimization problem needs to implement a subset of the traits `CostFunction`, `Gradient`, `Jacobian`, `Hessian`, and `Operator`.
Which subset is needed is given by the requirements of the solver.
For example, `SteepestDescent`, as a gradient descent method, requires `CostFunction` and `Gradient`, while Newton's method expects `Gradient` and `Hessian`.
## Defining an optimization problem
Depending on the requirements of the solver that is to be used, the optimization problem needs to implement a subset of the following traits:

- `CostFunction`: Computes the cost or fitness for a parameter vector `p`
- `Gradient`: Computes the gradient for a parameter vector `p`
- `Jacobian`: Computes the Jacobian for a parameter vector `p`
- `Hessian`: Computes the Hessian for a parameter vector `p`
- `Operator`: Applies an operator to the parameter vector `p`
- `Anneal`: Creates a new parameter vector by "annealing" the current parameter vector `p` (needed for Simulated Annealing)
Which subset is needed is given in the documentation of each solver.
### Example
The following code snippet shows how to use the Rosenbrock test functions from `argmin-testfunctions` in argmin.
For the sake of simplicity, this example will use the `Vec` math backend.
```rust
#![allow(unused)]
fn main() {
extern crate argmin;
extern crate argmin_testfunctions;
use argmin_testfunctions::{rosenbrock, rosenbrock_derivative, rosenbrock_hessian};
use argmin::core::{Error, CostFunction, Gradient, Hessian};

/// First, we create a struct called `Rosenbrock` for your problem
struct Rosenbrock {}

/// Implement `CostFunction` for `Rosenbrock`
///
/// First, we need to define the types which we will be using. Our parameter
/// vector will be a `Vec` of `f64` values and our cost function value will
/// be a 64 bit floating point value.
/// This is reflected in the associated types `Param` and `Output`, respectively.
///
/// The method `cost` then defines how the cost function is computed for a
/// parameter vector `p`.
impl CostFunction for Rosenbrock {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the return value computed by the cost function
    type Output = f64;

    /// Apply the cost function to a parameter `p`
    fn cost(&self, p: &Self::Param) -> Result<Self::Output, Error> {
        // Evaluate Rosenbrock function
        Ok(rosenbrock(p))
    }
}

/// Implement `Gradient` for `Rosenbrock`
///
/// Similarly to `CostFunction`, we need to define the type of our parameter
/// vectors and of the gradient we are computing. Since the gradient is also
/// a vector, it is of type `Vec<f64>` just like `Param`.
impl Gradient for Rosenbrock {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the gradient
    type Gradient = Vec<f64>;

    /// Compute the gradient at parameter `p`.
    fn gradient(&self, p: &Self::Param) -> Result<Self::Gradient, Error> {
        // Compute gradient of the Rosenbrock function
        Ok(rosenbrock_derivative(p))
    }
}

/// Implement `Hessian` for `Rosenbrock`
///
/// Again, the types of the involved parameter vector and the Hessian need to
/// be defined. Since the Hessian is a 2D matrix, we use `Vec<Vec<f64>>` here.
impl Hessian for Rosenbrock {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the Hessian
    type Hessian = Vec<Vec<f64>>;

    /// Compute the Hessian at parameter `p`.
    fn hessian(&self, p: &Self::Param) -> Result<Self::Hessian, Error> {
        // Compute Hessian of the Rosenbrock function
        Ok(rosenbrock_hessian(p))
    }
}
}
```
The following example shows how to implement the `Operator` trait when the operator is the 2x2 matrix `[4.0, 1.0; 1.0, 3.0]`.
```rust
#![allow(unused)]
fn main() {
extern crate argmin;
use argmin::core::{Error, Operator};

struct MyProblem {}

impl Operator for MyProblem {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the output
    type Output = Vec<f64>;

    /// Apply the matrix to the parameter vector `p`
    fn apply(&self, p: &Self::Param) -> Result<Self::Output, Error> {
        Ok(vec![4.0 * p[0] + 1.0 * p[1], 1.0 * p[0] + 3.0 * p[1]])
    }
}
}
```
### Parallel evaluation with `bulk_*` methods
> **NOTE**
>
> So far only Particle Swarm Optimization allows parallel evaluation of parameter vectors. Therefore, no increase in performance should be expected for other solvers.
All of the above-mentioned traits come with additional methods which enable processing several parameter vectors at once.
These methods require the `rayon` feature to be enabled. Without this feature, the methods fall back to sequential processing of all inputs.
The methods are shown in the following table:
| Trait          | Single input | Multiple inputs |
|----------------|--------------|-----------------|
| `CostFunction` | `cost`       | `bulk_cost`     |
| `Gradient`     | `gradient`   | `bulk_gradient` |
| `Jacobian`     | `jacobian`   | `bulk_jacobian` |
| `Hessian`      | `hessian`    | `bulk_hessian`  |
| `Operator`     | `apply`      | `bulk_apply`    |
| `Anneal`       | `anneal`     | `bulk_anneal`   |
These `bulk_*` methods come with a default implementation, which essentially looks like this:
```rust
#![allow(unused)]
fn main() {
extern crate argmin;
extern crate argmin_testfunctions;
use argmin_testfunctions::rosenbrock;
use argmin::core::{Error, CostFunction, SyncAlias, SendAlias};

struct Rosenbrock {}

impl CostFunction for Rosenbrock {
    type Param = Vec<f64>;
    type Output = f64;

    /// Conventional cost function which only processes a single parameter vector
    fn cost(&self, p: &Self::Param) -> Result<Self::Output, Error> {
        // [ ... ]
        Ok(rosenbrock(p))
    }

    /// Bulk cost function which processes an array of parameter vectors
    fn bulk_cost<'a, P>(&self, params: &'a [P]) -> Result<Vec<Self::Output>, Error>
    where
        P: std::borrow::Borrow<Self::Param> + SyncAlias,
        Self::Output: SendAlias,
        Self: SyncAlias,
    {
        #[cfg(feature = "rayon")]
        {
            if self.parallelize() {
                // Process the parameter vectors in parallel using rayon
                params.par_iter().map(|p| self.cost(p.borrow())).collect()
            } else {
                params.iter().map(|p| self.cost(p.borrow())).collect()
            }
        }
        #[cfg(not(feature = "rayon"))]
        {
            params.iter().map(|p| self.cost(p.borrow())).collect()
        }
    }

    /// Indicates whether parameter vectors passed to `bulk_cost` should be
    /// evaluated in parallel or sequentially.
    /// This allows one to turn off parallelization for individual traits, even if
    /// the `rayon` feature is enabled.
    fn parallelize(&self) -> bool {
        true
    }
}
}
```
If needed, the `bulk_*` methods can be overwritten by custom implementations.

Besides the `bulk_*` methods, there is also a method `parallelize` which by default returns `true`.
This is a convenient way to turn parallelization off, even if the `rayon` feature is enabled.
It allows one to, for instance, turn parallel processing on for `CostFunction` and off for `Gradient`.
To turn it off, simply overwrite `parallelize` such that it returns `false`, as in the sketch below.
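A minimal sketch of such an override, reusing the `Rosenbrock` problem from above:

```rust
use argmin::core::{Error, Gradient};
use argmin_testfunctions::rosenbrock_derivative;

struct Rosenbrock {}

impl Gradient for Rosenbrock {
    type Param = Vec<f64>;
    type Gradient = Vec<f64>;

    fn gradient(&self, p: &Self::Param) -> Result<Self::Gradient, Error> {
        Ok(rosenbrock_derivative(p))
    }

    // `bulk_gradient` now processes its inputs sequentially, even if the
    // `rayon` feature is enabled (e.g. while `CostFunction` stays parallel).
    fn parallelize(&self) -> bool {
        false
    }
}
```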
## Running a solver
The `Executor`'s constructor takes a solver and an optimization problem as input and applies the solver to the problem.
The initial state of the optimization run can be modified via the `configure` method. This method accepts a closure with the state as its only parameter, which allows one to provide initial parameter vectors, cost function values, Hessians, and so on.

There are different kinds of state, and which kind is used depends on the solver. Most solvers internally use `IterState`, but some (for instance Particle Swarm Optimization) use `PopulationState`. Please refer to the respective documentation for details on how to modify the state.
Once the `Executor` is configured, the optimization is run via the `run` method. This method returns an `OptimizationResult` which contains the provided problem, the solver and the final state.
Assuming the variable is called `res`, the final parameter vector can be accessed via `res.state().get_best_param()` and the corresponding cost function value via `res.state().get_best_cost()`.
For an overview, `OptimizationResult`'s `Display` implementation can be used to print the result: `println!("{}", res)`.
The following example shows how to use the `SteepestDescent` solver to solve a problem which implements `CostFunction` and `Gradient` (both of which are required by this solver).
```rust
#![allow(unused_imports)]
extern crate argmin;
extern crate argmin_testfunctions;
use argmin::core::{State, Error, Executor, CostFunction, Gradient};
use argmin::solver::gradientdescent::SteepestDescent;
use argmin::solver::linesearch::MoreThuenteLineSearch;
use argmin_testfunctions::{rosenbrock, rosenbrock_derivative};

struct MyProblem {}

// Implement `CostFunction` for `MyProblem`
impl CostFunction for MyProblem {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the return value computed by the cost function
    type Output = f64;

    /// Apply the cost function to a parameter `p`
    fn cost(&self, p: &Self::Param) -> Result<Self::Output, Error> {
        Ok(rosenbrock(p))
    }
}

// Implement `Gradient` for `MyProblem`
impl Gradient for MyProblem {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the gradient
    type Gradient = Vec<f64>;

    /// Compute the gradient at parameter `p`.
    fn gradient(&self, p: &Self::Param) -> Result<Self::Gradient, Error> {
        Ok(rosenbrock_derivative(p).to_vec())
    }
}

fn run() -> Result<(), Error> {
    // Create new instance of cost function
    let cost = MyProblem {};

    // Define initial parameter vector
    let init_param: Vec<f64> = vec![-1.2, 1.0];

    // Set up line search needed by `SteepestDescent`
    let linesearch = MoreThuenteLineSearch::new();

    // Set up solver -- `SteepestDescent` requires a line search
    let solver = SteepestDescent::new(linesearch);

    // Create an `Executor` object
    let res = Executor::new(cost, solver)
        // Via `configure`, one has access to the internally used state.
        // This state can be initialized, for instance by providing an
        // initial parameter vector.
        // The maximum number of iterations is also set via this method.
        // In this particular case, the state exposed is of type `IterState`.
        // The documentation of `IterState` shows how this struct can be
        // manipulated.
        // Population based solvers use `PopulationState` instead of
        // `IterState`.
        .configure(|state| {
            state
                // Set initial parameters (depending on the solver,
                // this may be required)
                .param(init_param)
                // Set maximum iterations to 10
                // (optional, set to `std::u64::MAX` if not provided)
                .max_iters(10)
                // Set target cost. The solver stops when this cost
                // function value is reached (optional)
                .target_cost(0.0)
        })
        // run the solver on the defined problem
        .run()?;

    // print result
    println!("{}", res);

    // Extract results from state

    // Best parameter vector
    let best = res.state().get_best_param().unwrap();
    // Cost function value associated with best parameter vector
    let best_cost = res.state().get_best_cost();
    // Check the execution status
    let termination_status = res.state().get_termination_status();
    // Optionally, check why the optimizer terminated (if status is terminated)
    let termination_reason = res.state().get_termination_reason();
    // Time needed for optimization
    let time_needed = res.state().get_time().unwrap();
    // Total number of iterations needed
    let num_iterations = res.state().get_iter();
    // Iteration number where the last best parameter vector was found
    let num_iterations_best = res.state().get_last_best_iter();
    // Number of evaluation counts per method (Cost, Gradient)
    let function_evaluation_counts = res.state().get_func_counts();
    Ok(())
}

fn main() {
    if let Err(ref e) = run() {
        println!("{}", e);
        std::process::exit(1);
    }
}
```
Optionally, `Executor` allows one to terminate a run after a given timeout, which can be set with the `timeout` method of `Executor`.
The check of whether the overall runtime exceeds the timeout is performed after every iteration; therefore the actual runtime can be longer than the set timeout.
In case of a timeout, the run terminates with `TerminationReason::Timeout`, as sketched below.
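A sketch, assuming `cost`, `solver` and `init_param` are defined as in the example above:

```rust
let res = Executor::new(cost, solver)
    .configure(|state| state.param(init_param).max_iters(1000))
    // Terminate the run if it takes longer than 3 seconds. Since the check
    // happens after each iteration, the actual runtime may be slightly longer.
    .timeout(std::time::Duration::from_secs(3))
    .run()?;
```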
## Observing progress
Argmin offers an interface to observe the state of the solver at initialization as well as after every iteration. This includes the parameter vector, gradient, Jacobian, Hessian, iteration number, cost values and many more general as well as solver-specific metrics. This interface can be used to implement loggers, send information to storage, or plot metrics.
The observer `ParamWriter` saves the parameter vector to disk and as such requires the parameter vector to be serializable. This observer is available in the `argmin-observer-paramwriter` crate.

The observer `SlogLogger` logs the progress of the optimization to screen or to disk. This can be found in the `argmin-observer-slog` crate. Writing to disk requires the `serde1` feature to be enabled in `argmin-observer-slog`.
The rate at which the progress of the solver is observed can be set via `ObserverMode`, which can be either `Always`, `Never`, `NewBest` (whenever a new best solution is found) or `Every(i)`, which means every `i`th iteration.
Custom observers can be used as well by implementing the `Observe` trait (see the chapter on implementing an observer for details).
The following example shows how to add an observer to an `Executor` which logs progress to the terminal. The mode `ObserverMode::Always` ensures that every iteration is printed to screen. Note that multiple observers can be added to a single `Executor`.
```rust
#![allow(unused_imports)]
extern crate argmin;
extern crate argmin_testfunctions;
use argmin::core::{Error, Executor, CostFunction, Gradient};
use argmin::core::observers::ObserverMode;
use argmin_observer_slog::SlogLogger;
use argmin::solver::gradientdescent::SteepestDescent;
use argmin::solver::linesearch::MoreThuenteLineSearch;
use argmin_testfunctions::{rosenbrock, rosenbrock_derivative};

struct Rosenbrock {}

/// Implement `CostFunction` for `Rosenbrock`
impl CostFunction for Rosenbrock {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the return value computed by the cost function
    type Output = f64;

    /// Apply the cost function to a parameter `p`
    fn cost(&self, p: &Self::Param) -> Result<Self::Output, Error> {
        Ok(rosenbrock(p))
    }
}

/// Implement `Gradient` for `Rosenbrock`
impl Gradient for Rosenbrock {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the gradient
    type Gradient = Vec<f64>;

    /// Compute the gradient at parameter `p`.
    fn gradient(&self, p: &Self::Param) -> Result<Self::Gradient, Error> {
        Ok(rosenbrock_derivative(p))
    }
}

fn run() -> Result<(), Error> {
    // Define cost function (must implement `CostFunction` and `Gradient`)
    let cost = Rosenbrock {};

    // Define initial parameter vector
    let init_param: Vec<f64> = vec![-1.2, 1.0];

    // Set up line search
    let linesearch = MoreThuenteLineSearch::new();

    // Set up solver
    let solver = SteepestDescent::new(linesearch);

    // [...]

    let res = Executor::new(cost, solver)
        .configure(|state| state.param(init_param).max_iters(10))
        // Add an observer which will log all iterations to the terminal
        .add_observer(SlogLogger::term(), ObserverMode::Always)
        .run()?;

    println!("{}", res);
    Ok(())
}

fn main() {
    if let Err(ref e) = run() {
        println!("{}", e);
        std::process::exit(1);
    }
}
```
## Checkpointing
Checkpointing is a useful mechanism for mitigating the effects of crashes when software is run in an unstable environment, particularly for long run times. Checkpoints are snapshots of the current state of the optimization which can be resumed from in case of a crash. These checkpoints are saved regularly at a user-chosen frequency.
Currently, only saving checkpoints to disk with `FileCheckpoint` is provided, in the `argmin-checkpointing-file` crate. Via the `Checkpoint` trait, other checkpointing approaches can be implemented (see the chapter on implementing a checkpointing method for details).

The `CheckpointingFrequency` defines how often checkpoints are saved and can be chosen to be either `Always` (every iteration), `Every(u64)` (every Nth iteration) or `Never`.
The following example shows how the `checkpointing` method is used to configure and activate checkpointing.
If no checkpoint is available on disk yet, the optimization starts from scratch.
If the run crashes and a checkpoint is found on disk, it resumes from that checkpoint.
### Example
```rust
extern crate argmin;
extern crate argmin_testfunctions;
use argmin::core::{CostFunction, Error, Executor, Gradient};
#[cfg(feature = "serde1")]
use argmin::core::checkpointing::CheckpointingFrequency;
use argmin_checkpointing_file::FileCheckpoint;
use argmin::solver::landweber::Landweber;
use argmin_testfunctions::{rosenbrock, rosenbrock_derivative};

#[derive(Default)]
struct Rosenbrock {}

/// Implement `CostFunction` for `Rosenbrock`
impl CostFunction for Rosenbrock {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the return value computed by the cost function
    type Output = f64;

    /// Apply the cost function to a parameter `p`
    fn cost(&self, p: &Self::Param) -> Result<Self::Output, Error> {
        Ok(rosenbrock(p))
    }
}

/// Implement `Gradient` for `Rosenbrock`
impl Gradient for Rosenbrock {
    /// Type of the parameter vector
    type Param = Vec<f64>;
    /// Type of the gradient
    type Gradient = Vec<f64>;

    /// Compute the gradient at parameter `p`.
    fn gradient(&self, p: &Self::Param) -> Result<Self::Gradient, Error> {
        Ok(rosenbrock_derivative(p).to_vec())
    }
}

fn run() -> Result<(), Error> {
    // define initial parameter vector
    let init_param: Vec<f64> = vec![1.2, 1.2];
    let my_optimization_problem = Rosenbrock {};

    let iters = 35;
    let solver = Landweber::new(0.001);

    // [...]

    #[cfg(feature = "serde1")]
    let checkpoint = FileCheckpoint::new(
        // Directory
        ".checkpoints",
        // File base name
        "optim",
        // How often to save a checkpoint (in this case every 20 iterations)
        CheckpointingFrequency::Every(20),
    );

    #[cfg(feature = "serde1")]
    let res = Executor::new(my_optimization_problem, solver)
        .configure(|state| state.param(init_param).max_iters(iters))
        .checkpointing(checkpoint)
        .run()?;

    // [...]

    Ok(())
}

fn main() {
    if let Err(ref e) = run() {
        println!("{}", e);
    }
}
```
# Advanced topics
## Implementing a solver
In this section we are going to implement the Landweber solver, which essentially is a special form of gradient descent. In iteration \( k \), the new parameter vector \( x_{k+1} \) is calculated from the previous parameter vector \( x_k \) and the gradient at \( x_k \) according to the following update rule:
\[ x_{k+1} = x_k - \omega \nabla f(x_k) \]
In order to implement this using argmin, one first needs to define the struct `Landweber`, which holds data specific to the solver (the step length \( \omega \)). Then, the `Solver` trait needs to be implemented for this struct. The `Solver` trait consists of several methods; however, not all of them need to be implemented, since most come with default implementations:
- `NAME`: a `&'static str` which holds the solver's name (mainly needed for the observers).
- `init(...)`: Runs before the actual iterations and initializes the solver. Does nothing by default.
- `next_iter(...)`: One iteration of the solver. Will be executed by the `Executor` until a stopping criterion is met.
- `terminate(...)`: Solver-specific stopping criteria. This method is run after every iteration. Note that one can also terminate from within `next_iter` if necessary. (A sketch of a custom `terminate` follows this list.)
- `terminate_internal(...)`: By default calls `terminate` and in addition checks if the maximum number of iterations was reached or if the best cost function value is below the target cost value. Should only be overwritten if absolutely necessary.
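As a hedged sketch, a custom stopping criterion in `terminate` could look like this (the tolerance field `tol` is illustrative and not part of any argmin solver):

```rust
use argmin::core::{TerminationReason, TerminationStatus};

// Inside an `impl Solver<O, IterState<P, G, (), (), (), F>> for MySolver` block:
fn terminate(&mut self, state: &IterState<P, G, (), (), (), F>) -> TerminationStatus {
    // Stop as soon as the current cost drops below a solver-specific tolerance
    if state.get_cost() < self.tol {
        TerminationStatus::Terminated(TerminationReason::SolverConverged)
    } else {
        TerminationStatus::NotTerminated
    }
}
```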
Both `init` and `next_iter` have access to the optimization problem (`problem`) as well as the internal state (`state`). The methods `terminate` and `terminate_internal` only have access to `state`.

The function parameter `problem` is a wrapped version of the optimization problem and as such gives access to the cost function, gradient, Hessian, Jacobian, and so on. It also keeps track of how often each of these is called.

Via `state`, the solver has access to the current parameter vector, the current best parameter vector, gradient, Hessian, Jacobian, population, the current iteration number, and so on. The `state` can be modified (for instance, a new parameter vector can be set) and is then returned by both `init` and `next_iter`. The `Executor` then takes care of updating the state properly, for instance by updating the current best parameter vector if the new parameter vector is better than the previous best.
It is advisable to design the solver such that it is generic over the actual type of the parameter vector, gradient, and so on.
The current naming convention for generics in argmin is as follows:
- `O`: Optimization problem
- `P`: Parameter vector
- `G`: Gradient
- `J`: Jacobian
- `H`: Hessian
- `F`: Floats (`f32` or `f64`)
These individual generic parameters are then constrained by trait bounds. For instance, the Landweber iteration requires the problem `O` to implement `Gradient`, therefore a trait bound of the form `O: Gradient<Param = P, Gradient = G>` is necessary.

From the Landweber update formula, we know that a scaled subtraction of two vectors is required. This must be represented in the form of a trait bound as well: `P: ArgminScaledSub<G, F, P>`. `ArgminScaledSub` is a trait from `argmin-math` which represents a scaled subtraction. With this trait bound, we require that it must be possible to subtract a value of type `G`, scaled by a value of type `F`, from a value of type `P`, resulting in a value of type `P`. The generic type `F` represents a floating point value and therefore allows users to choose the precision they want.
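For the `Vec` backend, `scaled_sub` amounts to an elementwise `x[i] - omega * g[i]`, as this small illustration shows:

```rust
use argmin_math::ArgminScaledSub;

let x = vec![1.0, 2.0];
let g = vec![0.5, 1.0];
let omega = 2.0;

// Computes x - omega * g elementwise
let x_new = x.scaled_sub(&omega, &g);
assert_eq!(x_new, vec![0.0, 0.0]);
```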
Implementing the algorithm is straightforward: First we get the current parameter vector `xk` from the state via `state.take_param()`. Note that `take_param` moves the parameter vector from the `state` into `xk`; therefore one needs to make sure to move the updated parameter vector into `state` at the end of `next_iter` via `state.param(...)`. Landweber requires the user to provide an initial parameter vector; if none was provided, we return an error to inform the user. Then the gradient `grad` is computed by calling `problem.gradient(...)` on the parameter vector. This returns the gradient and internally increases the gradient function evaluation count. We compute the updated parameter vector `xkp1` via `xk.scaled_sub(&self.omega, &grad)` (which is possible because of the `ArgminScaledSub` trait bound introduced before). Finally, the state is updated via `state.param(xkp1)` and returned by the function.
```rust
#![allow(unused)]
fn main() {
extern crate argmin;
use argmin::core::{
    ArgminFloat, KV, Error, Gradient, IterState, Problem, Solver, State
};
use argmin::argmin_error_closure;
use serde::{Deserialize, Serialize};
use argmin_math::ArgminScaledSub;

// Define a struct which holds any parameters/data which are needed during the
// execution of the solver. Note that this does not include parameter vectors,
// gradients, Hessians, cost function values and so on, as those will be
// handled by the `Executor` and its internal state.
#[derive(Serialize, Deserialize)]
pub struct Landweber<F> {
    /// Step length
    omega: F,
}

impl<F> Landweber<F> {
    /// Constructor
    pub fn new(omega: F) -> Self {
        Landweber { omega }
    }
}

impl<O, F, P, G> Solver<O, IterState<P, G, (), (), (), F>> for Landweber<F>
where
    // The Landweber solver requires `O` to implement `Gradient`.
    // `P` and `G` indicate the types of the parameter vector and gradient,
    // respectively.
    O: Gradient<Param = P, Gradient = G>,
    // The parameter vector of type `P` needs to implement `ArgminScaledSub`
    // because of the update formula
    P: Clone + ArgminScaledSub<G, F, P>,
    // `F` is the floating point type (`f32` or `f64`)
    F: ArgminFloat,
{
    // This gives the solver a name which will be used for logging
    const NAME: &'static str = "Landweber";

    // Defines the computations performed in a single iteration.
    fn next_iter(
        &mut self,
        // This gives access to the problem supplied to the `Executor`. `O` implements
        // `Gradient` and `Problem` takes care of counting the calls to the respective
        // functions.
        problem: &mut Problem<O>,
        // Current state of the optimization. This gives access to the parameter
        // vector, gradient, Hessian and cost function value of the current,
        // previous and best iteration as well as current iteration number, and
        // many more.
        mut state: IterState<P, G, (), (), (), F>,
    ) -> Result<(IterState<P, G, (), (), (), F>, Option<KV>), Error> {
        // First we obtain the current parameter vector from the `state` struct (`x_k`).
        // Landweber requires an initial parameter vector. Return an error if this was
        // not provided by the user.
        let xk = state.take_param().ok_or_else(argmin_error_closure!(
            NotInitialized,
            "Initial parameter vector required!"
        ))?;
        // Then we compute the gradient at `x_k` (`\nabla f(x_k)`)
        let grad = problem.gradient(&xk)?;
        // Now subtract `\nabla f(x_k)` scaled by `omega` from `x_k`
        // to compute `x_{k+1}`
        let xkp1 = xk.scaled_sub(&self.omega, &grad);
        // Return the updated `state`
        Ok((state.param(xkp1), None))
    }
}
}
```
Another example with a more complex solver, which requires `init` as well as returning a `KV`, is (hopefully) coming soon!
## Implementing an observer
Coming soon!
## Implementing a checkpointing method
Coming soon!
# Contributing
This crate is looking for contributors! Potential projects can be found in the Github issues, but feel free to suggest your own ideas. Besides adding optimization methods and new features, other contributions are also highly welcome, for instance improving performance, documentation, writing examples (with real world problems), developing tests, adding observers, implementing a C interface or Python wrappers. Bug reports (and fixes) are of course also highly appreciated.
## Running the tests
The repository is organized as a workspace with the two crates `argmin` and `argmin-math`.
It is recommended to run the tests for both crates individually, as this simplifies the choice of features.
In the case of `argmin`, all combinations of features should lead to working tests. For `argmin-math`, this is not the case, since some of the features are mutually exclusive.
Therefore it is recommended to run the tests for `argmin` from the root directory of the repository like this:
```bash
cargo test -p argmin --all-features
```
This will work for `--all-features` and any other combination of features. Note that not all tests will run if only a subset of the features is enabled.
In the case of `argmin-math`, one can just test the default features:
```bash
cargo test -p argmin-math
```
Or the default features plus the latest `ndarray`/`nalgebra` backends:
```bash
cargo test -p argmin-math --features "latest_all"
```
Individual backends can be tested as well; however, care has to be taken not to enable two different versions of the same backend, as that may not work.
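For instance, to test only the latest `ndarray` backend (reusing the `ndarray_latest` feature shown earlier):

```bash
cargo test -p argmin-math --features "ndarray_latest"
```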