Where can I learn the basics of writing a lexer?

Language Agnostic, Lexer, Compiler Construction

Language Agnostic Problem Overview


I want to learn how to write a lexer. My university course had an assignment where we had to write a parser (and a lexer to go along with it), but it was given to us with no instruction or feedback (beyond the mark), so I didn't really learn much from it.

After searching for this topic, I can only find fairly advanced write-ups that focus on areas I feel are a few steps ahead of where I am. I want a discussion of the basics of writing a lexer for a very simple language, which I can use as a basis for investigating how to tokenise more complex languages.

At this stage I'm not really interested in best practices or optimisation techniques but instead prefer a focus on the essentials. What are some good resources to get me started?

Language Agnostic Solutions


Solution 1 - Language Agnostic

Basically there are two main approaches to writing a lexer:

  1. Writing one by hand, in which case I recommend this small tutorial.
  2. Using a lexer generator tool such as lex. In this case, I recommend reading the tutorial for your tool of choice.
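To give a feel for the rule-driven style that generator tools such as lex encourage, here is a minimal sketch in Python. It does not use lex itself: it assumes Python's standard `re` module as a stand-in, and the token names (`NUMBER`, `IDENT`, `OP`) and the tiny arithmetic language are made up for illustration. The idea is the same, though: you list patterns in priority order and let the machinery pick the matching rule.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str   # which rule matched (hypothetical names, chosen for this sketch)
    text: str   # the matched characters
    pos: int    # offset of the match in the source string


# Each rule is (token kind, regular expression). Order matters: at any
# position, the first alternative that matches wins.
RULES = [
    ("NUMBER", r"\d+(?:\.\d+)?"),   # integers and simple decimals
    ("IDENT", r"[A-Za-z_]\w*"),     # identifiers
    ("OP", r"[+\-*/=()]"),          # single-character operators and parentheses
    ("SKIP", r"\s+"),               # whitespace, discarded
    ("ERROR", r"."),                # anything else is reported as an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens from source, skipping whitespace."""
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(
                f"unexpected character {match.group()!r} at offset {match.start()}"
            )
        yield Token(kind, match.group(), match.start())
```

Calling `list(tokenize("x = 3.5 * (y + 42)"))` produces one `Token` per identifier, number, and operator. A real generator tool compiles its rules into a state machine rather than trying alternatives one by one, but the rule table is the part you write in either case.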

I would also recommend the Kaleidoscope tutorial from the LLVM documentation. It walks through the implementation of a simple language and, in particular, demonstrates how to write a small lexer. There are C++ and Objective Caml versions of the tutorial.
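As a rough idea of what a small hand-written lexer looks like, here is a sketch in Python for a Kaleidoscope-like token set (`def`, `extern`, identifiers, numbers, `#` comments, and single characters for everything else). This is not the tutorial's code: the function name `lex`, the token kind names, and the tuple representation are assumptions made for this example.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text); a representation chosen for this sketch

KEYWORDS = {"def": "DEF", "extern": "EXTERN"}


def lex(source: str) -> List[Token]:
    """Scan source one character at a time and return a list of tokens."""
    tokens: List[Token] = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1  # whitespace separates tokens but is not a token itself
        elif ch == "#":
            # Comment: skip everything up to the end of the line.
            while i < n and source[i] != "\n":
                i += 1
        elif ch.isalpha():
            # Identifier or keyword: a letter followed by letters or digits.
            start = i
            while i < n and source[i].isalnum():
                i += 1
            word = source[start:i]
            tokens.append((KEYWORDS.get(word, "IDENT"), word))
        elif ch.isdigit():
            # Number: digits, optionally followed by '.' and more digits.
            start = i
            while i < n and source[i].isdigit():
                i += 1
            if i + 1 < n and source[i] == "." and source[i + 1].isdigit():
                i += 1
                while i < n and source[i].isdigit():
                    i += 1
            tokens.append(("NUMBER", source[start:i]))
        else:
            # Anything else (operators, parentheses) becomes a one-character token.
            tokens.append(("CHAR", ch))
            i += 1
    tokens.append(("EOF", ""))  # explicit end marker for the parser
    return tokens
```

For example, `lex("def add(x y) x + 1.5")` starts with `("DEF", "def")`, `("IDENT", "add")`, `("CHAR", "(")` and ends with `("EOF", "")`. The whole technique is the loop: look at the current character, decide which kind of token starts there, consume characters until that token ends, and repeat.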

The classic textbook on the subject is Compilers: Principles, Techniques, and Tools, also known as the Dragon Book. However, it probably falls under the category of "fairly advanced write-ups".

Solution 2 - Language Agnostic

The Dragon Book is probably the definitive guide on the subject, although it can be a bit overwhelming. Language Implementation Patterns and Programming Language Pragmatics are great resources as well.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Rupert Madden-Abbott | View Question on Stack Overflow
Solution 1 - Language Agnostic | vitaut | View Answer on Stack Overflow
Solution 2 - Language Agnostic | Brandon Moretz | View Answer on Stack Overflow