Creating your own DSL is fun. It involves multiple complex steps which can be challenging and very rewarding to figure out. However, if you don't know what you're doing then your code can end up as one big hack which, while it works, is complicated, hard to change and hard to read. You may be pleased to have a working DSL, but you know that underneath it's no looker. In this series I will go through, step by step, creating a simple query language that gets translated into SQL and executed against SQL Server.
Text parsing is a massive subject and most of the material out there is pretty abstract and uses mathematical notation. How complex the parsing gets depends on how complex the language being parsed is. I am going to try to keep this series as simple as possible by using a simple DSL.
DSL to SQL - The Stages
Any complex problem should be broken down into smaller and simpler steps with nice separation of concerns.
Step 1 - Tokenize the DSL Text
We need to take a string and convert it into a sequence of tokens. A string is made of a sequence of characters, but multiple characters can form a single word or concept. For example, 'WHERE' is made of 5 characters but it is a single token.
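To make the idea concrete, here is a minimal sketch of a character-by-character tokenizer. It is not the tokenizer built later in this series: the `TokenType` values, the keyword list and the `SimpleTokenizer` class are all assumptions made up for illustration, and it only handles words, whole numbers and three single-character operators.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds, chosen only for this illustration.
public enum TokenType { Keyword, Identifier, Operator, Number }

public class Token
{
    public Token(TokenType type, string value) { Type = type; Value = value; }
    public TokenType Type { get; }
    public string Value { get; }
}

public static class SimpleTokenizer
{
    // Assumed keyword set; a real DSL would define its own.
    private static readonly HashSet<string> Keywords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "WHERE", "AND", "OR" };

    public static List<Token> Tokenize(string text)
    {
        var tokens = new List<Token>();
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (char.IsWhiteSpace(c))
            {
                i++; // whitespace separates tokens but is not a token itself
            }
            else if (char.IsLetter(c))
            {
                // Consume a whole word, then decide whether it is a keyword.
                int start = i;
                while (i < text.Length && char.IsLetterOrDigit(text[i])) i++;
                string word = text.Substring(start, i - start);
                var type = Keywords.Contains(word) ? TokenType.Keyword : TokenType.Identifier;
                tokens.Add(new Token(type, word));
            }
            else if (char.IsDigit(c))
            {
                // Consume a run of digits as a single number token.
                int start = i;
                while (i < text.Length && char.IsDigit(text[i])) i++;
                tokens.Add(new Token(TokenType.Number, text.Substring(start, i - start)));
            }
            else if (c == '=' || c == '<' || c == '>')
            {
                tokens.Add(new Token(TokenType.Operator, c.ToString()));
                i++;
            }
            else
            {
                throw new FormatException("Unexpected character '" + c + "' at position " + i);
            }
        }
        return tokens;
    }
}
```

With these assumptions, `SimpleTokenizer.Tokenize("WHERE age > 30")` turns 14 characters into four tokens: the keyword `WHERE`, the identifier `age`, the operator `>` and the number `30`.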
Step 2 - Parse the Token Sequence and Generate a Data Structure
The parser doesn't need to worry about characters, newlines, tabs etc. It just receives a sequence of discrete tokens which makes this step a lot simpler.
Parsing is a complex topic which involves grammars, lexers and parsers. Most commonly you'll read about LL(1) and LL(k) Grammars and Parsers (Recursive-Descent Parsers), Backtracking Parsers and Predicated Parsers. We'll go into some of these grammars and parsers in detail in later posts. Once you understand grammars you will want to check out parser generators. These are programs that can automatically generate a parser for you from a grammar alone. However, for your first DSL I would recommend hand coding the parser so that you understand how the whole process works. It will become less abstract and theoretical and more concrete in your mind.
As the parser is parsing, it builds a data structure that represents the structure of the code being parsed. You will have heard about Parse Trees and Abstract Syntax Trees. In our example we'll be creating a simple QueryModel class to represent the query.
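As a rough idea of what such a data structure might look like, here is a sketch of a query model. Only the name QueryModel comes from this series. The `Condition` class, every property and the `QueryModelExample` helper are hypothetical, and the real class in the later posts may look quite different.

```csharp
using System.Collections.Generic;

// Hypothetical shape of one filter, e.g. "age > 30".
public class Condition
{
    public string Field { get; set; }
    public string Operator { get; set; }
    public string Value { get; set; }
}

// Hypothetical query model: what to query and how to filter it.
public class QueryModel
{
    public string Source { get; set; }
    public List<Condition> Conditions { get; } = new List<Condition>();
}

public static class QueryModelExample
{
    // Builds by hand the model a parser might produce for a query that
    // filters a "Users" source with the single condition "age > 30".
    public static QueryModel Build()
    {
        var model = new QueryModel { Source = "Users" };
        model.Conditions.Add(new Condition { Field = "age", Operator = ">", Value = "30" });
        return model;
    }
}
```

The point of a model like this is that it contains no text positions, whitespace or token details, only the meaning of the query.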
Step 3 - Generate SQL Text and Parameters From the Data Structure
We take the data structure, forgetting about text parsing completely at this stage, and we build an SQL query and its associated SqlParameters. The query can then be executed and the results returned.
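The sketch below shows one way this step could work. It is an illustration under stated assumptions, not the generator from the later posts: `FilterCondition`, `GeneratedSql` and `SqlGenerator` are hypothetical names, and to keep the example free of external dependencies the parameters are collected in a plain dictionary, where real code would create one SqlParameter per entry.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for one condition held by the query model.
public class FilterCondition
{
    public string Column { get; set; }
    public string Operator { get; set; }
    public object Value { get; set; }
}

// Hypothetical result type: the SQL text plus parameter names and values.
public class GeneratedSql
{
    public string Text { get; set; }
    public Dictionary<string, object> Parameters { get; } = new Dictionary<string, object>();
}

public static class SqlGenerator
{
    // Only operators from this list are ever written into the SQL text.
    private static readonly HashSet<string> AllowedOperators =
        new HashSet<string> { "=", "<", ">" };

    // Wraps an identifier in brackets, doubling any closing bracket inside it.
    private static string Quote(string identifier)
    {
        return "[" + identifier.Replace("]", "]]") + "]";
    }

    public static GeneratedSql Generate(string table, IList<FilterCondition> conditions)
    {
        var result = new GeneratedSql();
        var sql = new StringBuilder("SELECT * FROM " + Quote(table));
        for (int i = 0; i < conditions.Count; i++)
        {
            FilterCondition c = conditions[i];
            if (!AllowedOperators.Contains(c.Operator))
                throw new ArgumentException("Unsupported operator: " + c.Operator);

            // Values never go into the SQL text; they travel as parameters.
            string name = "@p" + i;
            sql.Append(i == 0 ? " WHERE " : " AND ");
            sql.Append(Quote(c.Column) + " " + c.Operator + " " + name);
            result.Parameters[name] = c.Value;
        }
        result.Text = sql.ToString();
        return result;
    }
}
```

For a table named `Users` and a single condition `age > 30`, this produces the text `SELECT * FROM [Users] WHERE [age] > @p0` with `@p0` set to 30. Passing values as parameters rather than concatenating them into the text is what protects the generated query against SQL injection. A real implementation would also want to check table and column names against a known list.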
Next we'll look at what a tokenizer is and how to create one.
Example code on GitHub
SERIES LINKS
--> How to Create a Query Language DSL in C#
Creating a Simple Tokenizer (Lexer) in C#
Understanding Grammars
Implementing a DSL Parser in C#
Generating SQL from a Data Structure