Methodologies, approximation schemes, and algorithms for functional optimization
Functional optimization problems investigate the minimization (or maximization) of a functional with respect to admissible solutions belonging to an infinite-dimensional space of functions.
Originally studied in the classical calculus of variations [14], nowadays functional optimization problems model a variety of tasks arising in emerging fields of interest in Operations Research (see the examples and the references in [20, 28]), which share the following basic feature: it is required to determine a function that is optimal, in a sense specified by a suitable cost or merit functional, among a set of candidate admissible functions. Such a function might express, e.g., the dependence of an optimal solution to a mathematical programming problem on some varying parameters of the problem itself, the input/output mapping of a device learning from examples, the model of a system for fault diagnosis, the routing strategies in telecommunication networks, the water-releasing decisions in water resources management, etc.
In all such problems, infinite dimension makes inapplicable many tools typically used in finite-dimensional optimization, the subject of Mathematical Programming, which have been developed for sets of admissible solutions belonging to finite-dimensional spaces [7]. For the latter case there exists a “universal” ambient space, i.e., the space Rn (for a suitable positive integer n) and all norms are topologically equivalent on it. In contrast to this, functional optimization has to face additional difficulties, basically due to the choice of the ambient space and of the norm on it, which often determine substantially different convergence properties for procedures of approximate solution [13]. Approaches based on variational methods and functional analysis are well-suited to functional optimization, but often they are unable to provide a solution in analytical form and computationally efficient numerical procedures.
When an analytical solution cannot be found or cannot be efficiently implemented, one can search suboptimal solutions having the form of combinations of a certain numbers of basis elements, corresponding to computational units with a simple structure (e.g., sines and cosines, Gaussians, wavelets’ computational units, etc.), which are available from a software library and can be dealt with efficiently. Classical methods of approximate functional optimization [10,14] exploit linear approximation schemes : they search suboptimal solutions among all admissible functions that can be expressed as linear combinations of a certain number n of fixed basis functions. In such approach, tools from functional analysis are exploited to reduce the original functional optimization problem to a nonlinear programming problem. As the basis functions are a-priori fixed, the only objects to be optimized are the coefficients of the linear combination, whose optimization entails a nonlinear programming problem.
The rationale behind the idea of searching suboptimal solutions of this form is intuitive: when the number of basis functions becomes sufficiently large, suitable properties of such functions and of the functional to be minimized may ensure that the suboptimal solutions converge to the optimal one of the original problem [10]. The implementation of these approximation schemes on classical computers makes the number n of basis functions needed to guarantee that suboptimal solutions are “sufficiently close” to the optimal one (provided it exists) a crucial issues; such a number n plays the role of "model complexity" of the approximation scheme and can be studied with tools from approximation theory.
Besides infinite-dimensionality, another difficulty often arises: the mathematical formalization of many tasks in emerging areas of science and technology entails functional optimization problems in which it is required to perform optimization with respect to admissible functions dependent on a large number d of variables (see the examples and the references in [20, 28]). This is the case, for example, when the variables are the nodes of a communication network in routing problems, the number of items in inventory problems, the number of reservoirs in water resources management, etc. Linear approximation schemes searching for suboptimal solutions made up of at most a certain number n of basis computational units are often unable to deal efficiently with such functional optimization problems, because of the so-called “curse of dimensionality” [5]. Roughly speaking, the curse of dimensionality implies a very fast growth of the number n(e,d) of fixed basis functions necessary to obtain a fixed accuracy e of approximate optimization, with the number d of variables of admissible solutions (typically, an exponential growth O((1/e)d )). This drawback often limits the feasibility of numerical approaches and of traditional methods developed long time ago for the calculus of variations, such as the classical Ritz method [14], to applications involving a large number of variables [4, 9, 16, 28] and motivates the search for more efficient approximation schemes.
Experience has shown that
optimization over admissible sets made up of combinations of relatively few
computational units with a simple structure and dependent on a set of “inner” parameters (e.g., free-node splines,
rational functions, neural networks, exponential sums; see [18,19]) often achieves surprisingly good performances,
much better than those achievable by using a-priori fixed computational units with no parametric dependence (e.g.,
algebraic and trigonometric polynomials). The introduction in functional optimization of an alternative to the
classical Ritz method, called in [28] the "extended Ritz method" (ERIM), was motivated by the
innovative and
successful application since the late ’80s of feedforward neural networks to the approximate solution of
functional optimization problems with admissible solutions
dependent on a large number of variables[6, 8, 17, 21, 22, 26].
The origins of the extended Ritz method have to be sought in a series of papers [1, 2, 23, 24, 25, 27, 28], in which it was explicitly suggested restricting the set of admissible decision functions to linear combinations of functions dependent on some inner “free” parameters, rather than linear combinations of fixed basis functions as used in the classical Ritz method. Thus, in the extended Ritz method a nested family of linear subspaces of increasing dimensionality, which in the classical Ritz method approximates the set of admissible solutions, is replaced with a nested family of nonlinear approximating sets, called "variable-basis approximation scheme". Such schemes include a variety of nonlinear approximation schemes such as free-node splines [11, Chapter 13], polynomials with free frequencies and phases [12], feedforward neural networks [3, 15,18,19, 20], etc.
I am currently investigating:
i) accuracy of suboptimal solutions obtainable by variable-basis approximation schemes;
ii) classes of functional optimization problems and approximation schemes with polynomial (with respect to the number d of variables) rates of approximation;
iii) variable-basis approximation schemes that reduce functional optimization to a sequence of nonlinear programming problems, and algorithms for the solution of such problems;
v) applications of computationally effective algorithms to functional optimization problems in emerging areas (e.g., routing in communication networks, optimal traffic control, web exploration, optimization in stochastic graphs, etc.).
References
[1] Baglietto, M., Parisini, T., and Zoppoli, R.: Numerical solutions to the Witsenhausen counterexample by approximating networks, IEEE Trans. on Automatic Control, vol. 46, pp. 1471-1477, 2001.
[2] Baglietto, M., Parisini, T., and R. Zoppoli, Distributed-information neural control: the case of dynamic routing in traffic networks, IEEE Trans. on Neural Networks, vol. 12, pp. 485-502, 2001.
[3] A. R. Barron: “Universal Approximation Bounds for Superpositions of a Sigmoidal Function”. IEEE Trans. on Information Theory 39, 930-945, 1993.
[4] R. W. Beard and T. W. McLain, “Successive Galerkin approximation algorithms for nonlinear optimal and robust control,” Int. J. of Control, vol. 71, pp. 717-743, 1998.
[5] R. Bellman, Dynamic Programming, Princeton University Press, Princeton, New Jersey, 1957.
[6] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Belmont, Massachusetts, 1996.
[7] Bertsekas, D. P.: Nonlinear Programming. Athena Scientific, Belmont, MA, 1999.
[8] Chen, F.C. and Khalil, H.: Adaptive control of a class of nonlinear discrete-time systems using multilayer neural networks, IEEE Trans. on Automatic Control 1995, vol. 40, pp. 791-801.
[9] V. C. P. Chen, D. Ruppert, and C. A. Shoemaker, “Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming,” Oper. Res., vol. 47, pp. 38-53, 1999.
[10] J. W. Daniel, The Approximate Minimization of Functionals, Prentice-Hall, Englewood Cliffs, N.J., 1971.
[11] R. A. DeVore and G. G. Lorentz, “Constructive approximation,” Grundlehren der Mathematischen Wissenschaften, vol. 303. Springer-Verlag, Berlin, 1993.
[12] R. A. DeVore and V. N. Temlyakov, “Nonlinear approximation by trigonometric sums,” The J. of Fourier Analysis and Applications, vol. 2, no. 1, pp. 29-48, 1995.
13 [25] I. Ekeland and T. Turnbull. Infinite-Dimensional Optimization and Convexity, The University of Chicago Press, Chicago, USA, 1983.
[14] I. M. Gelfand and S. V. Fomin. Calculus of Variations, Prentice Hall, Englewood Cliffs, N. J., 1963.
[15] F. Girosi, M. Jones, T. Poggio: “Regularization Theory and Neural Networks Architectures”. Neural Computation 7, pp. 219-269, 1995.
[16] S. A. Johnson, J. R. Stedinger, C. Shoemaker, Y. Li and J. A. Tejada-Guibert, “Numerical solution of continuous-state dynamic programs using linear and spline interpolation,” Oper. Res., vol. 41, pp. 484-500, 1993.
[17] A. Juditsky, H. Hjalmarsson, A. Benveniste, B. Delyon, L. Ljung, J. Sjöberg, and Q. Zhang, Nonlinear black-box models in system identification: mathematical foundations, Automatica, vol. 31, pp. 1725-1750, 1995.
[18] V. Kủrková and M. Sanguineti, “Bounds on rates of variable-basis and neural-network approximation”, IEEE Transactions on Information Theory, vol. 47, pp. 2659-2665, 2001.
[19] V. Kủrková and M. Sanguineti, “Comparison of worst case errors in linear and neural network approximation”, IEEE Transactions on Information Theory, vol. 48, pp. 264-275, 2002.
[20] V. Kủrková and M. Sanguineti, “Error estimates for approximate optimization by the extended Ritz method”, SIAM Journal on Optimization, vol. 15, pp. 461-487, 2005.
[21] Narendra, K.S. and Mukhopadhyay, S.: Adaptive control using neural networks and approximate models. IEEE Trans. on Neural Networks, vol. 8, pp. 475-485, 1997.
[22] Narendra, K. S. and Parthasarathi, K.: Identification and control of dynamical systems using neural networks, IEEE Trans. on Neural Networks, vol. 4, pp. 4-26, 1990.
[23] Parisini, T., Sanguineti, M., and Zoppoli, R.: Nonlinear stabilization by receding-horizon neural regulators, Int. J. of Control, vol. 70, pp. 341-362, 1998.
[24] Parisini, T. and Zoppoli, R.: Neural networks for feedback feedforward nonlinear control systems, IEEE Trans. on Neural Networks, vol. 5, pp. 436-449, 1994.
[25] Parisini, T. and Zoppoli, R.: Neural approximations for multistage optimal control of nonlinear stochastic systems, IEEE Trans. on Automatic Control, vol. 41, pp. 889-895, 1996.
[26] J. Sjöberg, Q. Zhang, L. Ljung, A. Benveniste, P.-Y. Glorennec, B. Delyon, H. Hjalmarsson, and A. Juditsky, Nonlinear black-box modeling in system identification: a unified overview, Automatica, vol. 31, pp. 1691-1724, 1995.
[27] Zoppoli, R. and Parisini, T.: Learning techniques and neural networks for the solution of N-stage nonlinear nonquadratic optimal control problems, in Systems, Models and Feedback: Theory and Applications (A. Isidori and T. J. Tarn, Eds.), pp. 193-210. Birkhäuser, Boston, 1992.
[28] R. Zoppoli, M. Sanguineti, and T. Parisini, “Approximating networks and extended Ritz method for the solution of functional optimization problems”, Journal of Optimization Theory and Applications, vol. 112, pp. 403-440, 2002.
[29] K. Hlaváčková-Schindler and M. Sanguineti, “Bounds on the complexity of neural-network models and comparison with linear methods”, International Journal of Adaptive Control and Signal Processing, vol. 17, pp. 179-194, 2003.
[30] P. C. Kainen, V. Kủrková, and M. Sanguineti, “Minimization of error functionals over variable-basis functions”, SIAM Journal on Optimization, vol. 14, pp. 732-742, 2003.