Last week I spent my evenings tweaking {:ok, 📍} LibLatLon, a helper library for geocoding. Currently it supports both OpenStreetMap and GoogleMaps providers (others might easily be added) and provides a fancy lookup given an image with GPS coordinates included as an argument:

iex|1  LibLatLon.lookup "images/IMG_1.jpg"
#⇒ %LibLatLon.Info{
#     address: "Avinguda del Litoral, [...] España",
#     bounds: %LibLatLon.Bounds{
#     ...

OK, the shameless leading ad is over; let’s get back to the topic of today’s talk. This library claims it can read latitude and longitude from almost anything that looks like it contains coordinates.

It was easy for:

  • {lat, lon} when is_number() tuples;
  • {[degree, minute, second], reference} values;
  • some weird combinations of the above, used here and there;
  • images with GPS geo information included (that was easy!).

One might check the diversity of the accepted types in the library examples. Everything was fun until I stepped into accepting strings as input. In case you wonder, Google supports a nifty, properly typeset format with real degree, minute and second signs.

Cool? Yes. I decided I needed to support this format as well. In other words, I had to parse input like "41°22´33.612˝N" and produce a float out of it. I love regular expressions. If I ever launch my own programming language, it’ll support no syntax but regular expressions.

The only drawbacks are productivity and the inability to test all the damn corner cases of a regular expression. One might go with something like:

\d{1,2}°\d{1,2}´\d{1,2}(\.\d{1,})?˝[NWES]

or even be more precise and disallow degrees greater than 89, minutes greater than 59 and all that. My goal was different: I wanted to use Elixir binary pattern matching to accomplish the task. Because I love regular expressions, but binary pattern matching is still way sexier.
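For comparison, the regular expression route could look roughly like the sketch below. RegexDmsSketch and its functions are hypothetical names made up for illustration; they are not part of LibLatLon, and the conversion to decimal degrees (negative for S and W) is my assumption about what "produce a float" means.

```elixir
defmodule RegexDmsSketch do
  # Hypothetical helper for illustration; not part of LibLatLon.
  # [0-9] is used instead of \d so that only ASCII digits are accepted.
  defp dms_regex do
    ~r/^([0-9]{1,2})°([0-9]{1,2})´([0-9]{1,2}(?:\.[0-9]+)?)˝([NWES])$/u
  end

  # Returns {:ok, decimal_degrees} or :error.
  def parse(input) when is_binary(input) do
    with [_all, d, m, s, ref] <- Regex.run(dms_regex(), input),
         {d, ""} <- Float.parse(d),
         {m, ""} <- Float.parse(m),
         {s, ""} <- Float.parse(s) do
      # Assumption: southern and western values are negative.
      sign = if ref in ["S", "W"], do: -1, else: 1
      {:ok, sign * (d + m / 60 + s / 3600)}
    else
      _ -> :error
    end
  end
end
```

Calling RegexDmsSketch.parse("41°22´33.612˝N") yields an :ok tuple with a value close to 41.376, while anything the expression does not match yields :error.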

The issue is that one cannot pattern match a binary of undetermined length in the middle of a match; only the last segment may have an unknown size. My first idea was to strictly disallow malformed input like "42°0´6.57252˝N,3°8´28.13388˝E", but a friend of mine with the address “17257 Fontanilles, Girona, Spain” would complain that {42, 3.14159265} is accepted fine, while "42°0´0˝N,3°14´15.9˝E" is not.
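To make the restriction concrete, here is a minimal hand-written sketch. FixedSizeSketch is a hypothetical module, not library code: it only extracts the degrees part and needs one clause per possible byte size.

```elixir
defmodule FixedSizeSketch do
  # A pattern such as <<d::binary, "°", rest::binary>> does not compile:
  # a binary segment without a size is allowed only at the very end.
  # With explicit sizes, each possible length needs its own clause.
  def degrees(<<d::binary-size(2), "°", _rest::binary>>), do: {:ok, d}
  def degrees(<<d::binary-size(1), "°", _rest::binary>>), do: {:ok, d}
  def degrees(_), do: :error
end
```

Here FixedSizeSketch.degrees("42°0´0˝N") hits the first clause and FixedSizeSketch.degrees("3°14´15.9˝E") hits the second one. Writing such clauses by hand for every combination of lengths does not scale.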

“Well, Elixir provides great opportunities for macro programming,” the astute reader would probably yell here, and yes, here we go. We are about to generate all the possible variants of the string above at compile time.

Let’s do it for the single blahtitude:

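  for id <- 1..2,
      im <- 1..2,
      is <- 1..@decimal_precision do
    # One clause per combination of byte sizes of degrees, minutes and seconds.
    def parse(<<
          d::binary-size(unquote(id)), "°",
          m::binary-size(unquote(im)), "´",
          s::binary-size(unquote(is)), "˝",
          # The hemisphere letter is matched but not used in this snippet.
          _ss::binary-size(1)
        >>),
    do: Enum.map([d, m, s], fn v ->
          with {v, ""} <- Float.parse(v), do: v
        end)
  end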

Hey, that was simple! @decimal_precision is a parameter that is small in the development environment and set to 12 in production. 12 gives 2 × 2 × 12 = 48 different clauses for a single blahtitude alone, and they take a noticeable time to compile, while runtime execution is blazingly fast.

The result is nearly the same as if we had copy-pasted def parse(<<.........>>), do: {} 48 times and changed the details here and there.
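For those who want to play with it, here is a self-contained sketch of the same idea. The module name DmsSketch, the guard on the hemisphere letter, the conversion to decimal degrees and the :error fallback are my assumptions for illustration, not what the library actually does.

```elixir
defmodule DmsSketch do
  # Hypothetical module for illustration; not part of LibLatLon.
  @decimal_precision 12

  # Generates 2 * 2 * 12 = 48 clauses of parse/1 at compile time.
  for id <- 1..2,
      im <- 1..2,
      is <- 1..@decimal_precision do
    def parse(<<
          d::binary-size(unquote(id)), "°",
          m::binary-size(unquote(im)), "´",
          s::binary-size(unquote(is)), "˝",
          ref::binary-size(1)
        >>)
        when ref in ["N", "S", "E", "W"] do
      with {d, ""} <- Float.parse(d),
           {m, ""} <- Float.parse(m),
           {s, ""} <- Float.parse(s) do
        # Assumption: southern and western values are negative.
        sign = if ref in ["S", "W"], do: -1, else: 1
        {:ok, sign * (d + m / 60 + s / 3600)}
      else
        _ -> :error
      end
    end
  end

  # Anything that fits none of the generated clauses.
  def parse(_), do: :error
end
```

DmsSketch.parse("41°22´33.612˝N") returns an :ok tuple with a value close to 41.376; a string that does not fit any generated clause falls through to :error.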

The implementation of the whole match does not differ much: another three comprehension loops are added, and the total number of generated clauses rises drastically, since the counts for the two coordinates multiply. That’s mostly it.
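A back-of-the-envelope count shows how fast this grows. The sketch assumes the second coordinate uses the very same three ranges as the first; the real implementation may use different ones, and ClauseCountSketch is a hypothetical helper.

```elixir
defmodule ClauseCountSketch do
  # Number of clauses generated for one coordinate.
  def single(precision) do
    length(for id <- 1..2, im <- 1..2, is <- 1..precision, do: {id, im, is})
  end

  # Assumption: the second coordinate uses the same three ranges,
  # so the number of combinations is squared.
  def pair(precision), do: single(precision) * single(precision)
end

IO.inspect(ClauseCountSketch.single(12)) # prints 48
IO.inspect(ClauseCountSketch.pair(12))   # prints 2304
```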

The same technique might be applied to parsing dates, times, floating point numbers with a limited mantissa (with an unlimited one it’s still possible, with a fallback to a regular expression when the number of digits is greater than 42), you name it.
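As a rough illustration for times, the sketch below accepts "H:M" strings with one or two digits in each part. TimeSketch, the accepted format and the range checks are assumptions picked for the example.

```elixir
defmodule TimeSketch do
  # Hypothetical module for illustration.
  # Generates 2 * 2 = 4 clauses: "9:5", "9:05", "09:5" and "09:05" shapes.
  for ih <- 1..2,
      im <- 1..2 do
    def parse(<<
          h::binary-size(unquote(ih)), ":",
          m::binary-size(unquote(im))
        >>) do
      with {h, ""} <- Integer.parse(h),
           {m, ""} <- Integer.parse(m),
           true <- h in 0..23 and m in 0..59 do
        {:ok, {h, m}}
      else
        _ -> :error
      end
    end
  end

  # Anything that fits none of the generated clauses.
  def parse(_), do: :error
end
```

TimeSketch.parse("9:05") returns {:ok, {9, 5}}, while "25:00" and "noon" both end up as :error.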


Author’s note: sometimes a plain old regular expression looks way saner, to say the least.