The shortest path from Anarchy to Alcoholism

goes through Jesus and ants, according to wikipedia (14/07/2012). This path is precisely the following one :

Anarchism, Jesus, Ant, Termite, TermitomycesR.HeimRoger HeimUppsala UniversityTure NermanAlcoholism

For instance, consider the wikipedia page Ture Nerman, you should find a link toward  the wikipedia page Alcoholism. This is the shortest path according to the length of the linked wikipedia page names (i.e. the path quoted below has a distance of 91 characters – “AnarchismJesusAntTermiteTermitomycesR.HeimRoger HeimUppsala UniversityTure NermanAlcoholism”- that is the shortest wikipedia path from Anarchy to Alcoholism according to this distance).

You could be skeptical reading such a result : why some people are loosing their time and money to compute such a path ? Or you could be enthusiastic : this might open some work upon knowledge cartography. For instance, some pertinent metrics could be found (why not links density with a more pertinent distance function) to pop up potential interesting things.

I want here to illustrate a technical point : this shortest path is computed considering a BIG graph. All nodes of this graph are wikipedia pages (about 10 millions pages), and edges are links to other pages (about 100 millions links). The data are publicly available as wikipedia dump (40 Go) as Extensible_Markup_Language (xml). However xml is not a graph structure, but a tree data structure. In particular, there is no way to use XML technologies to compute this wikipedia shortest path.

In fact I did use XML technologies to compute this shortest path. Indeed, the main point of this post is to prove that these technologies can scale up quite efficiently to handle Big Graph. Applications of such an XML based Graph Databases are numerous, specially for Data Scientists. In fact, since xml is very rich from a semantic point of view, data extraction, exploitation, and transformation, using XML based technology to query graphs is more than pertinent, it is really very efficient.

Technically, this work relies mainly over an open-sourced Extensible_Markup_Language   (xml) Database, called BaseX. A common query langage to query XML is called XPATH, extensively used through the functional programming language XQUERY.  As noted above, since xml is not a graph structure, you can’t use XQUERY / XPATH to query graphs. For this wikipedia proof-of-concept, I extended XPATH to directed graph, so that I can now use  XQUERY together with this extension to query graphs. I give to this this technology the name XQUERY++ (but I am open to a better name !).

Indeed, I already noticed (see this post) that this extension of XML technology was really powerful for data exploitation.  I am also already using these technologies for finance applications, because they are allowing fast data extraction / transformation, as well as easy and powerful reporting tools libraries.

You might ask why investing in Yet-Another-Graph-Technology, as this market is already slowly maturing. Indeed, I had no choice : I have tried to use Graph Databases. However they present two major inconvenient:

– Firstly, I do not know a powerful enough graph query language for my purposes. On the contrary, XQUERY is a very powerful data programming language, that is W3C standardized.

– XML is at heart of big company information systems (IS), as the de-facto tools used in IS urbanization. My clients data are usually XML or SQL based (SQL can be quite easily exported to XML).

Note that XML Databases and XQUERY have the reputation to exhibit poor performances. I confirm this point for heavy computational algorithmic task. Indeed, the “hard” algorithmic parts of this wikipedia work are written binding the database BaseX with C++ (binding XML databases with C++/JAVA / C# / Python … is quite easy now).

I hope that this small proof-of-concept convinced the reader that this technology can now be considered a serious potential concurrent to Graph Databases solution vendors.  And for those interested in this technology, I included a XQUERY ++ presentation !




Explicit solutions to multi-dimensional conservation laws

In this post, I am releasing a reviewed article concerning the convex hull algorithm, including numerical analysis in higher dimensions, see this link. It was a challenging algorithmic task to extend this work to higher dimensions.

This reviewed version applies the CHA algorithm to a two-dimensional problems that is the solution u=u(t,x_1,x_2) (for (x_1,x_2) \in {\mathbb{R}}^2) to the following Burgers-type initial value problem:

 \partial_t u + \nabla \cdot f(u) = 0,\quad f(u)(t,x) \in {\mathbb{R}}^2,\quad u(0,x_1,x_2)=sup\Big(\frac{1 - |(x_1,x_2)|_\infty}{10},0.\Big)

where we denote the norm |X-Y|_p = \Big( \sum_{d=1, \ldots, D}(x_d-y_d)^p \Big)^{1/p} for any exponent 1 \le p \le \infty. The following picture illustrates the behavior of this equation for different times with a simple flux f(u)= (\frac{u^2}{2},0) with a structured mesh. The next picture present a more complex flux f(u)= (\frac{u^2}{2},\frac{u^3}{30}) using an unstructured mesh.

I am quite impressed by these numerical results, the shock location (corresponding to time 5 and 7 on the figures) are lying on a one-dimensional manifold, as expected (see the article for details). Some work though remains to do to stabilize this algorithm for unstructured mesh.




2D solution of a non-linear conservation law (CHA-algorithm)













2-D solution using an unstructured mesh

The Convex Hull transformation

In this post, I am releasing a preprint, that can be found at Arxiv, submitted to Journal of Computational Physics, related to a joint work with Philippe Lefloch, work that I had begun to discuss in these posts : here and here .

I am really excited about this work. We introduce a transformation of maps, named the convex hull transformation.  This transformation is a new one to my knowledge, that might be part of  Transport theory, and that can be  be compared to the Polar factorization one of Yann Brenier. We use  this new transformation  to compute EXACT solutions to numerous hyperbolic non-linear equations arising in mathematical physics. This is probably my best mathematical work so far. Included is the abstract of this paper.


We revisit the method of characteristics for shock wave solutions to nonlinear hyperbolic problems and we describe a novel numerical algorithm —the convex hull algorithm (CHA)— in order to compute, both, entropy dissipative solutions (satisfying all relevant entropy inequalities) and entropy conservative (or multivalued) solutions to nonlinear hyperbolic conservation laws. Our method also applies to Hamilton-Jacobi equations and other problems endowed with a method of characteristics. From the multivalued solutions determined by the method of characteristic, our algorithm ”extracts” the entropy dissipative solutions, even after the formation of shocks. It applies to, both, convex or non-convex flux/Hamiltonians. We demonstrate the relevance of the proposed approach with a variety of numerical tests including a problem from fluid


Fast technologies for strategies or risk measures evaluations over big portfolio with a lot of risk sources

I posted some months ago a working paper at SSRN, that can be found at SSRN siterelated to a post called “a breach in the curse of dimensionality“. I recently implemented this method with success, feeding it with real big portfolio data through a CRIMERE database concept.

I am recalling that, today, nobody knows how to evaluate complex risk measures, as CVA or systemic risk ones, for big bank or insurance portfolios, because there are too many risk sources.

The result seems to be in accordance with my expectations : fast algorithms to evaluate complex risk measures, as CVA or systemic risk ones, for big bank portfolios, seems now possible. Moreover, most probably, a single PC is enough to compute them : don’t waste your money and your intelligence to buy complex distributed systems to compute these measures. It means also that this technique should be able to evaluate complex strategies over big portfolios, as those coming from bank or insurance. Please do not hesitate to contact me if you have any needs in this direction.

Explicit Solver for non linear hyperbolic equations

In this post, I am publishing a small Executable, which has the particularity to be able to calculate the exact solution of a nonlinear equations class, given by scalar conservation laws in one dimension, which prototype is the Burger equation. This executable illustrates some of the work we have underway with Philippe Lefloch, work that I had begun to discuss in this post.

This small executable is distributed for educational purposes, without warranty of any kind, to allow teachers to illustrate the behavior of the solutions of these equations, as the phenomena of shock interactions or asymptotic behavior for large times. A very primitive interface allows the user to define his or her own tests. This is a first demonstrator: if there is any interest there, do not hesitate to contact me for changes or bugs or suggestions.

This program was designed to compute the solution u(t,x), t\ge 0, x \in {\mathbb{R}} of the Cauchy problem for scalar conservation laws in one dimension:

\partial_t u+\partial_x f(u)=0,\quad u(0,\cdot) = u_0(\cdot),

where u_0 is the initial condition, and f (u) the nonlinearity. I coded two types of nonlinearities: a convex f (u) = u ^2/2, and a non-convex f (u) = u ^ 3/3 - u , but would be very easy to integrate any other non-linearity. In fact, the computation engine used for this small program is able to compute the much more complex solution than one dimension scalar conservation laws : the approach works equally well in any dimension, for systems of non linear equations as those coming from Hamilton-Jacobi equations or other nonlinear hyperbolic type, and for all types of non-linearity (convex and non-convex). Do not hesitate to contact me if you have any needs numerical computations for this type of equations. Finally, I recall that there is an infinity of solutions of scalar conservation laws. This program compute the limiting viscosity solution, considered as the “physical” solution of interest, but it could also produce the other ones.


I designed these numerical techniques. However, as a numerical analyst, it seems clear that these techniques are much more efficient than the existing ones to compute hyperbolic non linear equations.

This exe is a rar archive. Once extracted,  a folder is containing two files :

– Exact_Solver.exe is an exe running under Windows.

– Initial_Condition.xml is an input file allowing the user to consider its own initial conditions.

  1. Exact_Solver.exe : Here is the main windows, once ran the exe:










The basic functionality are the following ones:

1) The box labelled “Inc. Time”  allows to define an increment time, default value 0.1 sec. You can input any Inc. Time between 0. to 99.99.

2) The button “++” computes the solution accordingly to the box “Inc. Time”.  As an illustration, here is the solution of Buger equation with gaussian initial conditions computed at time 99.99Exact_Solver_99






3) The right combo box allows to select any initial conditions defined in “Initial_Conditions.xml”


  • Initial_Condidions.xml

This is an xml file. Here are its default value :










One sees the two nonlinearities available: “u2″ or “u3_u”. Within, each of the three types of initial conditions available:

1)  <Gaussian> indicates that the initial condition is a Gaussian.

2)  <Riemann> indicates that the initial condition is a condition of Riemann type with u_0 (x) = u_l, x \le S, u_r, x \ge S

3)  <RiemannInteraction>indicates that the initial condition is a Riemann data of kind  u_0(x) = u_l;x \le S_l, \quad u_0(x) = u_m; S_l \le x \le S_r, \quad u_r \text{ else}.




Solveur explicite pour les équations hyperboliques non-linéaires

Dans ce post, je diffuse un petit programme, qui a la particularité de savoir calculer la solution exacte d’une classe d’équations non-linéaires, les lois de conservation scalaires en une dimension, dont le prototype est l’équation de Burger. Cet exécutable illustre une partie des travaux que nous avons en cours avec Philippe Lefloch, travaux que j’avais commencés à évoquer dans ce post .

Ce petit exécutable est diffusé à titre pédagogique, sans garantie aucune, pour permettre à des enseignants d’illustrer le comportement des solutions de ces équations, comme les phénomènes d’interactions de chocs ou le comportement asymptotique en temps grand. Une interface très primitive permet à l’utilisateur de définir lui-même ses propres tests. Il s’agit d’un premier démonstrateur: s’il y a un intérêt, n’hésitez pas à me contacter pour me demander des modifications ou me faire part de bugs ou suggestions.

Ce programme a été conçu pour calculer la solution u(t,x), t\ge 0, x \in {\mathbb{R}} du problème de Cauchy pour les lois de conservation scalaire en une dimension :

\partial_t u+\partial_x f(u)=0,\quad u(0,\cdot) = u_0(\cdot),

où  u_0(\cdot) est la condition initiale, et f(u) la non-linéarité. J’ai codé pour ce programme deux types de non-linéarités : une convexe f(u)=u^2 / 2, et une non convexe f(u)=u^3 / 3-u, mais il me serait très simple d’intégrer n’importe quelle autre non-linéarité. En fait, le moteur de calcul utilisé pour ce petit programme est capable également de calculer la solution de problèmes beaucoup plus complexes que les lois de conservation scalaires en une dimension : l’approche fonctionne indifféremment en dimension quelconque, pour les systèmes d’équations non-linéaires provenant des équations d’Hamilton-Jacobi, et bien d’autres équations de type hyperboliques non-linéaires, et pour tout types de non-linéarités (convexe et non convexe). Ici aussi, n’hésitez pas à me contacter si vous avez des besoins en calcul numériques sur ce type d’équations. Finalement, je rappelle qu’il existe une infinité de solutions des lois de conservation scalaires. Ce programme calcule la solution de viscosité, considérée comme la solution “physique” intéressante, mais il pourrait également exhiber les autres solutions.

Je suis juge et parti dans ce travail bien sûr. Cependant, en tant que numéricien, il me semble clair que les techniques numériques qui ont été développées pour ce travail sont bien plus performantes que l’existant pour simuler ce type de problème.

Le programme se présente sous forme d’un fichier .rar. Une fois extrait, apparaît le dossier contenant deux fichiers :

– Exact_Solver.exe est un exécutable Windows.

– Initial_Condition.xml est un fichier xml permettant à l’utilisateur de définir des jeux de conditions initiales.

  1. Exact_Solver.exe : Une fois lancé, l’exécutable apparaît :










Les fonctionnalités de base de ce programme sont les suivantes :

1) La box de la bel “Inc. Time”  permet de définir un incrément de temps, par défaut à 0.1 sec. Vous pouvez définir cet incrément de temps de 0. à 99.99.

2) Le bouton “++” calcule la solution sur le laps de temps défini dans “Inc. Time”.  Par exemple, voici la solution de l’équation de Buger avec conditions initiales gaussiennesExact_Solver_99, calculée au temps 99.99

3) La liste à droite su solveur permet de sélectionner les différents jeux de conditions initiales utilisateur entrées dans le fichier “Initial_Conditions.xml”


  • Initial_Condidions.xml

Il s’agit d’un fichier xml. Voici la version par défaut fournies avec le programme










On voit apparaître les deux non-linéarités disponibles : “u2″ ou “u3_u”. Au sein de chacune, les trois types de conditions initiales disponibles :

1)  <Gaussian> indique que la condition initiale est une gaussienne.

2)  <Riemann> indique que la condition initiale est une condition de Riemann avec u_0(x) = u_l;x \le S, \quad u_r \text{ else}.

3)  <RiemannInteraction> indique que la condition initiale est une condition de Riemann avec u_0(x) = u_l;x \le S_l, \quad u_0(x) = u_m; S_l \le x \le S_r, \quad u_r \text{ else}.

A High-Dimensional Pricing Framework for Financial Instrument valuation

I posted a working paper at SSRN, that can be found either at SSRN site, or directly here in this blog site, related to my previous post “a breach in the curse of dimensionality“. This is a working paper, and I am actually implementing this method, thus this paper will likely to change while experimenting. Do not hesitate to contact me for further details or feed back.

A breach in the Curse of Dimensionality

I am quite excited to release a work devoted to the Curse of dimensionality, in the context of Financial mathematics. I am actually trying to validate this work by peers, and will publish it as soon as it will be.

This work proposes a framework allowing to consider valuations of complex OTC products, including american exercising, written on a large number of underlyings. The motivation is to provide numerical algorithms able to quickly valuate big investment bank portfolios with sophisticated pricing models, as those coming from risk measures, and we think particularly to CVA (counterpart risk valuation) ones, leading to consider high dimensional Optimal Stopping Problems.

Indeed, it has already been noticed that current numerical methods devoted to pricing algorithms face an open mathematical problem, the curse of dimensionality (see for instance these authors Crepey, Labordere, Reghai), particularly in the context of CVA computations. We notice also that the econometry problem, consisting in finding a risk-neutral dynamic of the underlyings, faces the very same open problem. Indeed, the CVA approach, as some others existing risk measures, although vertuous, can not be credible as long as they will be “cursed”.

The framework introduced has two purposes : firstly to provide a fast calibration algorithm to determine a risk-neutral econometry, using risk-less market prices, able to capture correlation structures between underlyings. Secondly, to design a fast backwardation algorithms upon this econometry. Both algorithms breaks the curse of dimensionality wall. This is achieved using thoroughly a particular context, given by martingales processes, a classical assumption in mathematical finance. From a mathematical point of view, these results are obtained blending together stochastic, partial differential equations, and optimal transport theories, and are a generalization to the multi-dimensional case of this paper.

My perception is that these tools should be the right candidates to provide a standardized framework for computing risk measures of large portfolios or investment strategies, as well as a pedestal to tackle credible systemic risk measures. Moreover, this approach seems to be very performent from an algorithmic point of view. The resulting algorithms should be able to reduce drastically the computational burden for existing measures, as CVA or risk-neutral Front-office valuations.

The curse of dimensionality has been opened roughly sixty years ago as a mathematical problem. This problem is of crucial importance in the context of mathematical finance, where the dimensionality is the number of underlyings of a given portfolio, usually counting thousands of instruments as shares, rates, commodities, default probabilities and so on. Among the authors that tackled this problem in a financial context, let us cite for instance C. Schwab’s PDE approach, or V. Bally, G. Pages, J.Printems stochastic approach. Actually, these authors provide methods working for “low” dimensions. It is usually considered that the algorithmic burden of these methods is too heavy for let say ten underlyings.

For partial differential equations (P.D.E.), the curse of dimensionality is quite easy to illustrate: most numerical methods rely over cartesian meshes, consisting in N^D points, N being the number of points necessary to model one dimension, D being the number of dimensions. Crude methods as the alternate direction implicit one uses N^{D+1} operations to compute a solution, converging at rate N^{-2} in the best case toward the solution of the continuous equation. Hence, the overall computational burden to reach an accuracy of order \epsilon, for a small \epsilon>0, is \epsilon^{-\frac{D+1}{2}} in the best cases, being a clear algorithmic complexity limitation to address higher dimensional problems, i.e. when D becomes large, as it is in Finance.

We designed algorithms based over N points, lying in  {\mathbb{R}}^D. They show rate of convergence of order N^{-1/2}, in the worst case, independently of the dimension D. For convex solutions, this rate can be as good as N^{-2} without american exercising, and N^{-1} with, dimension independent. To reach this order of convergence, N^3 operations are necessary. Hence, the overall computational burden to reach an accuracy of order \epsilon, is \epsilon^{-3/2} in the best case, \epsilon^{-3} for american exercising, \epsilon^{-6} for the worst case. Low number of discretization points should be necessary to get accurate enough, in order to reach the confidence limit of state-of-the-art in finance modeling.


Data Heterogeneity with XQUERY++ and BaseX …

… is the title of a conference that I should give tomorrow (14/02/14) for the XML Prague conferencea conference on markup languages and data on the web. My slides can be found here : XMLPrague0214.

Big Data and Fast Dev Technics

During the fall 2013, I focused into the movement so-called “Big Data” , as part of my small company CRIMERE . Like many movements before it (see for instance “Semantic Web” , or “cloud ” ), it seems quite difficult to have a clear definition of what is really “Big Data” beyond a pure marketing concept . This movement, like others elsewhere, is reinventing the wheel by presenting it in a different angle. However, it clearly has a high potential , I recommend for instance this paper (in French), which deals with ” new” professions likely to emerge from these technologies.

In particular, this movement has two distinct facets that I find very interesting , even if the tackled problematic are not so news. The first facet of  “Big Data” is that associated with the processing of large volumes of data , related to cloud computing. The second, in which I am much more interested in my professional activity, is related to the heterogeneity of data and information systems : how to make sense of large volumes of data using complex models, and especially how to be able transform them in other data models, better suited to a particular treatment, which itself will generate other datas, etc etc … These problems are currently generating huge costs for companies , which support teams of Business analysts / computer scientists, whose task is precisely to make these data transformation.

Studying a few technical aspects related to that second facet ( data manipulation ), it came to me an iconoclastic idea : trying to explore the potential of these new technologies in the context of my professional activity. So I did the following exercise: develop a “professional” application engine of sensitivity VaR ( Value at Risk). This application is very classic, and occupies entire dedicated teams in risk departments of investment banks since decades.

Here is the result : it took me three weeks to develop this type of application , knowing that I am neither a professional developer, nor a VaR specialist. The characteristics of this application are:

– Input: 30,000 trades , described in FpML format version 5.5 , all kind of derivatives.
– Transformation into an internal VaR model data.
– Computing VaR sensitivity .

The complete application cycle ( data integration – 1 GB / data transformation / VaR computation) takes less than five minutes on my PC. The “business” part of this application consists in 400 lines of codes.

Obviously, there is some intelligence in the design of this application. However , the recipe is completely generic and can be applied to the entire software industry .

What I am defending is that the technology proposed by this movement allows, if used smartly, to gain a factor of scale in terms of productivity and hence in terms of costs across the whole industry software . I speak here of the costs generated throughout the life cycle of complex applications using large volumes of data: data transformation , development, maintenance , documentation, harmonization / standardization of information systems.

For instance, if you want to start a project involving the development of a complex application using large data volumes, do not hesitate to ask me for a estimation. This will allow you to compare, for pure information purposes of course, with quotes offered by your internal teams or your external providers.