Comparison of regexp syntax

To help myself learn Emacs regular expressions, I've put together this cheat sheet comparing regular expression syntax in Emacs, Python, and egrep.

Note that in Emacs, the syntax described below is for regular expressions that are entered directly (e.g. in isearch-forward-regexp and occur). When you provide a regexp in a string (e.g. in Lisp code or in re-builder) you need to double all backslashes.
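
The doubling rule can be sketched as a small helper. This is only an illustration in Python, not part of Emacs: the function name to_lisp_string is hypothetical, and it handles nothing beyond backslashes and double quotes.

```python
def to_lisp_string(interactive_regexp: str) -> str:
    """Turn a regexp as typed in isearch into an Emacs Lisp string literal.

    Hypothetical helper for illustration: every backslash is doubled and
    every double quote is escaped, then the result is wrapped in quotes.
    """
    escaped = interactive_regexp.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# The regexp you would type interactively ...
typed = r"\(foo\|bar\)"
# ... and the same regexp as it would appear in Lisp code.
print(to_lisp_string(typed))
```

The printed literal has two backslashes wherever the interactively typed regexp has one.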

                       Emacs          Python          egrep
Any character          .              .               .
Beginning of line      ^              ^               ^
End of line            $              $               $
0 or more repetitions  *              *               *
1 or more repetitions  +              +               +
3-5 repetitions        \{3,5\}        {3,5}           {3,5}
Optional               ?              ?               ?
Character set          [...]          [...]           [...]
Alternatives           \|             |               |
Group                  \(...\)        (...)           (...)
Named group                           (?P<name>...)
Non-capturing group¹   \(?:...\)      (?:...)
Word boundary          \b             \b              \b
Digit                  [[:digit:]]    \d              [[:digit:]]
Whitespace char        \s-            \s
Alphanumeric char      \w             \w              \w
Back reference         \1             \1
Named back reference                  (?P=name)

¹ Also referred to as "shy groups".
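
Here are a few of the Python-column entries in action, as a minimal sketch using the standard re module. The sample strings and variable names are made up for illustration.

```python
import re

# 3-5 repetitions of a digit: {3,5} is greedy, so it takes five digits.
digits = re.search(r"\d{3,5}", "id 1234567").group()

# Named group plus named back reference: find a doubled word.
doubled = re.search(r"(?P<word>\w+) (?P=word)", "it is is fine")

# Non-capturing group with alternatives: findall returns whole matches
# because the pattern contains no capturing group.
tagged = re.findall(r"(?:foo|bar)\d", "foo1 bar2 baz3")

# Numbered back references in a replacement string.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")

print(digits, doubled.group("word"), tagged, swapped)
```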

5 comments:

  1. Wow, I did not know one cannot specify a digit in Emacs regex dialect.

    Isn't there any Emacs package that could extend the regex syntax?

    By the way, thank you for the very, very nice emacs posts.

  2. Oops, I was mistaken. Emacs supports a character class for digits within [...], just like egrep. I updated the post.

  3. In the age of Unicode, you might want to be a little careful with \d; it doesn't always mean ASCII digits any more.

  4. @Phil: Good then, I am relieved :)

    @oylenshpeegul: Thank you for that snippet. Good to know!

  5. re-builder is awesome

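
The caution about \d raised in the comments can be seen in Python 3, where \d in a str pattern matches any Unicode decimal digit unless the re.ASCII flag is given. This is a minimal sketch; the sample string is made up and mixes an ASCII digit with an Arabic-Indic and a fullwidth one.

```python
import re

# "7", ARABIC-INDIC DIGIT THREE (U+0663), FULLWIDTH DIGIT THREE (U+FF13)
sample = "7 \u0663 \uff13"

# Default for str patterns: \d matches every Unicode decimal digit.
unicode_digits = re.findall(r"\d", sample)

# With re.ASCII, \d is restricted to [0-9].
ascii_digits = re.findall(r"\d", sample, flags=re.ASCII)

print(len(unicode_digits), len(ascii_digits))
```

Writing [0-9] explicitly is another way to limit a match to ASCII digits.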