A while back I wrote a blog series about DSLs, grammars, tokenizers, parsers and a SQL generator. The idea was that you could write a DSL query to mine your error log data and the code would generate the SQL for you. The series can be found here: http://jack-vanlightly.com/blog/2016/2/3/how-to-create-a-query-language-dsl.
The tokenizer, while simple, was very inefficient, so I wrote a better one; you can find that here: http://jack-vanlightly.com/blog/2016/2/24/a-more-efficient-regex-tokenizer
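If you haven't read those posts, here is a rough idea of what a regex tokenizer for this kind of DSL can look like: a precedence-ordered list of token definitions tried against the input in a single forward pass, with the `\G` anchor pinning every match attempt to the current position. To be clear, this is a simplified illustration with made-up token types, not the code from the repo; the posts above walk through the actual implementations.

```csharp
// Generic sketch of a single-pass, precedence-ordered regex tokenizer.
// Token types and patterns are illustrative, not the repo's actual definitions.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public enum TokenType { Match, And, Limit, Equals, StringValue, Number, Identifier, Eof }

public record Token(TokenType Type, string Value);

public class RegexTokenizer
{
    // Tried in order at the current position; keywords come before Identifier
    // so that e.g. "AND" is not swallowed as an identifier.
    private static readonly List<(TokenType Type, Regex Pattern)> Definitions = new()
    {
        (TokenType.Match,       new Regex(@"\GMATCH\b", RegexOptions.IgnoreCase)),
        (TokenType.And,         new Regex(@"\GAND\b",   RegexOptions.IgnoreCase)),
        (TokenType.Limit,       new Regex(@"\GLIMIT\b", RegexOptions.IgnoreCase)),
        (TokenType.Equals,      new Regex(@"\G=")),
        (TokenType.StringValue, new Regex(@"\G'[^']*'")),
        (TokenType.Number,      new Regex(@"\G\d+")),
        (TokenType.Identifier,  new Regex(@"\G[A-Za-z_][A-Za-z0-9_.]*")),
    };

    public IEnumerable<Token> Tokenize(string input)
    {
        int position = 0;
        while (position < input.Length)
        {
            // Skip whitespace between tokens.
            if (char.IsWhiteSpace(input[position])) { position++; continue; }

            bool matched = false;
            foreach (var (type, pattern) in Definitions)
            {
                // \G anchors the match attempt at 'position', so the input is
                // consumed in one forward pass rather than re-scanned per token.
                var match = pattern.Match(input, position);
                if (!match.Success) continue;

                yield return new Token(type, match.Value);
                position += match.Length;
                matched = true;
                break;
            }

            if (!matched)
                throw new Exception($"Unrecognised input at position {position}");
        }

        yield return new Token(TokenType.Eof, string.Empty);
    }
}
```

The repo's tokenizers differ in the details; this sketch is only meant to make the rest of the walkthrough easier to follow.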
I have just published working code based on this series and the better regex tokenizer on GitHub here: https://github.com/Vanlightly/DslParser. It shows example DSL queries, the tokens that get generated, the intermediate representation and finally the generated SQL text. It also shows the elapsed time for running the inefficient and more efficient tokenizers.
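To give a feel for the tokenizer stage, here is roughly what dumping the token stream for a small query looks like, reusing the sketch tokenizer above. The query here is made up for illustration; the repo ships its own example queries.

```csharp
using System;

class TokenDemo
{
    static void Main()
    {
        // Illustrative query only; the repo's example queries will differ.
        string query = "MATCH APP = 'MyTestApp' AND EX = 'System.NullReferenceException' LIMIT 100";

        // Print each token type and the text it matched.
        foreach (var token in new RegexTokenizer().Tokenize(query))
            Console.WriteLine($"{token.Type,-12} {token.Value}");
    }
}
```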
It is a console application that gives you some options for viewing the output at each stage and running simple elapsed-time performance tests.
Press 1 to see:
Both the inefficient tokenizer and the more efficient tokenizer produce the same output, so pressing 1 and 4 will show the same thing on screen.
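A performance option like this presumably amounts to running the chosen tokenizer in a loop and measuring the elapsed time. Here is a minimal sketch of that kind of harness, again using the sketch tokenizer from above rather than the repo's actual classes:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class TimingDemo
{
    static void Main()
    {
        // Stand-in query; the console app times its own small and large example queries.
        string query = "MATCH APP = 'MyTestApp' AND EX = 'System.NullReferenceException' LIMIT 100";
        var tokenizer = new RegexTokenizer();

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < 1000; i++)
        {
            // ToList() forces the lazy token stream to be fully produced each iteration.
            tokenizer.Tokenize(query).ToList();
        }
        stopwatch.Stop();

        Console.WriteLine($"1000 runs: {stopwatch.Elapsed.TotalSeconds:F2}s");
    }
}
```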
Let's see the performance difference. We'll press 2 to run the slow tokenizer 1000 times on a small DSL query:
We'll press 3 for a large DSL query:
Now let's see the more efficient tokenizer at work. We'll press 5 for a small query:
So twice as fast, but not a dramatic difference. Let's run it for a large query:
Now we see the difference: 28 seconds versus 2.5 seconds. My post on the more efficient version goes into more detail on where that gap comes from.
So there you go, working code on GitHub for you all :)