Programming

Announcing Taskling.NET, a C# Batch Job API

Taskling.NET is a C# batch processing library that saves you from rewriting the same code over and over again for batch and micro-batch jobs.

Overview

  • Partitioning of batches into blocks of work, with guaranteed isolation between blocks across batches
  • Recovery from failures through automatic reprocessing/retries of blocks
  • Limiting the number of concurrent task executions (across servers)
  • Critical sections across servers
  • Standardised activity logging and alerting
  • Thread safety, enabling parallel processing of blocks and list block items

Why You Should Understand Databases

In recent years I have seen developers distance themselves from databases more and more, for various reasons. The two most common reasons seem to be:

  • Business logic should not be split between the database and the application; it should all live in the application code. So stored procedures and functions are now an anti-pattern.
  • The ORM (such as Entity Framework, Hibernate or ActiveRecord) handles the details of SQL as well as data migrations. ORMs provide better developer productivity, so writing SQL by hand is an anti-pattern.

Optimizing Regex performance with RegexOptions.RightToLeft

Regex is fast when it is scanning text that doesn't match its pattern at all. However, when it encounters text that almost matches, things can start to slow down. You really want the Regex to either match a text or discard it as soon as possible. Building up large potential matches, only to find near the end that they don't fit, can be very costly.

For patterns that are string literals, the Regex will run in linear time. So if one Regex gets lots of partial matches that are only discarded after 20 characters, while another Regex discards its potential matches after 2 characters, the first will be in the order of ten times slower.
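To make the idea concrete, here is a minimal C# sketch of the option in use. The pattern and the sample log text are hypothetical: a date prefix that appears on every line, followed by the rare keyword FATAL that actually decides whether a line matches. Because the discriminating part sits at the end of the pattern, matching right to left should let most candidate positions be rejected almost immediately, whereas left to right each date is matched in full before failing. The .NET engine applies its own optimisations as well, so measure on your own patterns and data before switching.

```csharp
using System.Text.RegularExpressions;

// Minimal sketch: one hypothetical pattern compiled with and without RegexOptions.RightToLeft.
public static class RightToLeftDemo
{
    // Hypothetical pattern: a date prefix that is common in the text, followed by
    // the rare keyword "FATAL", which is what actually decides a match.
    public const string Pattern = @"\d{4}-\d{2}-\d{2} FATAL";

    // Left to right: conceptually, every date is matched in full (10 characters
    // plus the space) before the comparison with 'F' fails on a non-FATAL line.
    private static readonly Regex ForwardRegex = new Regex(Pattern);

    // Right to left: conceptually, the pattern is matched backwards from its last
    // character, so most candidate positions fail as soon as 'L' does not match.
    private static readonly Regex BackwardRegex = new Regex(Pattern, RegexOptions.RightToLeft);

    // Returns the matched text, or an empty string when there is no match.
    // Note: with RightToLeft, Match returns the LAST occurrence in the input.
    public static string FindFatal(string log, bool rightToLeft) =>
        (rightToLeft ? BackwardRegex : ForwardRegex).Match(log).Value;
}
```

With a single FATAL line both regexes return the same text, for example "2016-02-22 FATAL". Be aware that the option changes semantics as well as performance: with RightToLeft, Match returns the last occurrence in the string rather than the first.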

A More Efficient Regex Tokenizer

As part of a DSL parsing series, I wrote a post about a super simple yet memory-inefficient way of tokenizing some input text. The benefit was that the tokenizer was extremely simple; the downside was that it wouldn't be suitable for large texts, or if the tokenizer was called very frequently.

In this post we'll look at a similar Regex-based tokenizer and trade a little simplicity for a large gain in performance.
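One common way to make that trade in C# is to stop slicing off the remaining text with Substring after every token. Instead, each pattern is anchored with \G and run at the current position via Regex.Match(input, startat), where \G is satisfied at startat. The sketch below shows that technique. It is not necessarily the approach this post takes, and the TokenKind values and patterns are hypothetical stand-ins for a real DSL's token set.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical token kinds for illustration; not the token set from the series.
public enum TokenKind { Whitespace, Keyword, Identifier, Number, StringLiteral, Equals, OpenParen, CloseParen }

public sealed class Lexeme
{
    public Lexeme(TokenKind kind, string value, int index) { Kind = kind; Value = value; Index = index; }
    public TokenKind Kind { get; }
    public string Value { get; }
    public int Index { get; }   // position of the token in the original input
}

public static class AnchoredTokenizer
{
    // Order matters: earlier definitions win when several could match at the same position.
    // \G pins each pattern to the startat position passed to Regex.Match(input, startat),
    // so no Substring of the unread input is ever allocated. Each Regex is built once and reused.
    private static readonly (TokenKind Kind, Regex Pattern)[] Definitions =
    {
        (TokenKind.Whitespace,    new Regex(@"\G\s+")),
        (TokenKind.Keyword,       new Regex(@"\G(?:MATCH|AND|OR)\b", RegexOptions.IgnoreCase)),
        (TokenKind.Identifier,    new Regex(@"\G[A-Za-z_][A-Za-z0-9_]*")),
        (TokenKind.Number,        new Regex(@"\G\d+")),
        (TokenKind.StringLiteral, new Regex(@"\G'[^']*'")),
        (TokenKind.Equals,        new Regex(@"\G=")),
        (TokenKind.OpenParen,     new Regex(@"\G\(")),
        (TokenKind.CloseParen,    new Regex(@"\G\)")),
    };

    public static List<Lexeme> Tokenize(string input)
    {
        var tokens = new List<Lexeme>();
        int position = 0;
        while (position < input.Length)
        {
            Match match = Match.Empty;
            TokenKind matchedKind = TokenKind.Whitespace;

            // Try each definition at the current position; take the first that matches.
            foreach (var (kind, pattern) in Definitions)
            {
                Match candidate = pattern.Match(input, position);
                if (candidate.Success) { match = candidate; matchedKind = kind; break; }
            }

            if (!match.Success)
                throw new FormatException($"Unexpected character '{input[position]}' at index {position}.");

            // Whitespace is consumed but not emitted as a token.
            if (matchedKind != TokenKind.Whitespace)
                tokens.Add(new Lexeme(matchedKind, match.Value, position));

            position += match.Length;
        }
        return tokens;
    }
}
```

For example, Tokenize("MATCH app = 'MyApp' AND level = 3") yields eight tokens, starting with Keyword "MATCH" and Identifier "app" at index 6. The \b after the keywords keeps an identifier such as ORDER from being split into the keyword OR followed by DER.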

Implementing a DSL Parser

We previously created a tokenizer that breaks up a sequence of characters into a sequence of tokens (enum TokenType), and an LL(2) grammar in production notation that acts as a template for the parser code.

The input to this parser will be the sequence of tokens, and the output will be an Intermediate Representation (IR): a data structure that represents the DSL text in a structured manner. The next step, after parsing, will be translating this IR into SQL.
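As a rough illustration of "tokens in, IR out", here is a minimal recursive-descent parser in C#. The token types, the grammar in the comment and the QueryIr/Condition classes are all hypothetical. They are not the LQL grammar from the series, and this toy grammar needs only one token of lookahead. The IR records what the query means (which fields must equal which values) without any reference to SQL, which is what lets a later stage translate it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token types for illustration only.
public enum Tok { Match, And, Identifier, Equals, StringValue, NumberValue, End }

public sealed class Token
{
    public Token(Tok type, string value) { Type = type; Value = value; }
    public Tok Type { get; }
    public string Value { get; }
}

// Hypothetical IR: a structured, SQL-agnostic description of the query.
public sealed class Condition
{
    public Condition(string field, string value) { Field = field; Value = value; }
    public string Field { get; }
    public string Value { get; }
}

public sealed class QueryIr
{
    public QueryIr(IReadOnlyList<Condition> conditions) { Conditions = conditions; }
    public IReadOnlyList<Condition> Conditions { get; }
}

// Recursive-descent sketch for the hypothetical grammar:
//   Query     -> MATCH Condition (AND Condition)* End
//   Condition -> Identifier '=' (StringValue | NumberValue)
public sealed class QueryParser
{
    private readonly IReadOnlyList<Token> _tokens;
    private int _pos;

    public QueryParser(IReadOnlyList<Token> tokens) { _tokens = tokens; }

    public QueryIr Parse()
    {
        Expect(Tok.Match);
        var conditions = new List<Condition> { ParseCondition() };
        while (Peek().Type == Tok.And)   // one token of lookahead decides whether to continue
        {
            Next();
            conditions.Add(ParseCondition());
        }
        Expect(Tok.End);
        return new QueryIr(conditions);
    }

    private Condition ParseCondition()
    {
        string field = Expect(Tok.Identifier).Value;
        Expect(Tok.Equals);
        Token value = Next();
        if (value.Type != Tok.StringValue && value.Type != Tok.NumberValue)
            throw new FormatException($"Expected a value but found {value.Type}.");
        return new Condition(field, value.Value);
    }

    // Current token without consuming it; past the end of the list we report End.
    private Token Peek() => _pos < _tokens.Count ? _tokens[_pos] : new Token(Tok.End, "");

    private Token Next() { Token t = Peek(); _pos++; return t; }

    private Token Expect(Tok type)
    {
        Token t = Next();
        if (t.Type != type) throw new FormatException($"Expected {type} but found {t.Type}.");
        return t;
    }
}
```

Given the tokens for MATCH app = 'MyApp' AND level = 3, Parse returns a QueryIr with two conditions, app = MyApp and level = 3. A SQL translator would then walk Conditions without needing to know anything about tokens.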

Creating a Simple Tokenizer (Lexer) in C#

What is a Tokenizer?

Before you read on: there is a more efficient version of this tokenizer, with significantly better performance, here: http://jack-vanlightly.com/blog/2016/2/24/a-more-efficient-regex-tokenizer

So a tokenizer, or lexer, takes a sequence of characters and outputs a sequence of tokens. Let's dive straight into an example to illustrate this.

Meet a simplified version of Logging Query Language (LQL)

How to Create a Query Language DSL with C#

Creating your own DSL is fun. It involves multiple complex steps that can be challenging and very rewarding to figure out. However, if you don't know what you're doing, your code can end up as one big hack which, while it works, is complicated, hard to change and hard to read. You may be pleased to have a working DSL, but you know that underneath it's no looker. In this series I will go through, step by step, creating a simple query language that gets translated into SQL and executed against SQL Server.

Code analysis rules versus training and coaching

I have an ongoing and friendly disagreement with colleagues over the relative value of code analysis rules and training. I spend part of my time on training and coaching, as I feel this is a great investment both in people and in the quality of the software that is developed. My colleagues argue that training needs to be repeated over and over again to cover a large developer base and to reach new people as they arrive. Also, you can run a training session with a development team, but that doesn't stop them from committing bad code.