When a web server receives a request, it has to figure out what piece of code to run to process this particular request. Any element of the request can be used for this routing, but in practice, most web frameworks use the path part of the URL. Defining expressions like these ones:
"/resources/:id" => "some code to do something" "/users/:name/profile" => "some code to do something" ...
And these implementations seem to use the same strategy to determine the matching route. They convert the path string of each rule into a regular expression, evaluate the rules in the defined order, and return the first match.
This is a flexible solution, but I think that regular expressions are a bit overkill for this task, and that evaluating them one by one doesn’t scale well.
Sure developers can try to optimize the order of the routes, but any application or REST API with a few hundreds of endpoints will waste non negligible cycles running these regexps before actually executing the application code.
I’ve checked the following web frameworks, they all use this Regexps + Loop strategy:
Note that Dancer acknowledges the performance issue and provides a cache mechanism to mitigate it. (see the description here)
I’ve tried something different, I’ve tried another strategy based on the use of the Trie data structure.
The idea is to use a prefix tree, and to make it support
*splat placeholders. Finding the possible routes by traversing the tree should scale better than iterating on the list of routes. Of course this approach won’t let you express the routes directly with regexps as some web frameworks do. But I’d argue that simple and clean URLs are nice, and that
*splat should be enough.
I’ve got a first implementation here in Go: https://github.com/ant0ine/go-urlrouter
I wrote a quick benchmark:
PASS BenchmarkNoCompression 50000 59603 ns/op BenchmarkCompression 50000 34498 ns/op BenchmarkRegExpLoop 1000 1422348 ns/op ok github.com/ant0ine/go-urlrouter 7.277s
(Compression here means Trie Compression which is just the reduction of the size of the tree once you’ve made it read only)
That makes the Trie strategy 40 times faster on this benchmark.
That said, the numbers are small here (ns), and the gain may not be significant compared to the full response time of the average request. I wonder what would be the results with an implementation in Perl or Python. Go is just really fast.
I think the nice thing here is that you can keep adding tons of routes to your application, knowing those at the bottom won't get any performance hit.
What do you think ?