What Perl-compatible regular expression (PCRE) could match books and chapters of the Bible?
How about this one:
/(^[123]?(\s+)?[a-zA-Z\s]+)(\s+)?([0-9]+)/
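Everything inside the first set of parentheses is the book name. The parentheses capture pieces of the match so we can pass them on separately in an array. This is a common task when working with PCRE, but if you only want to test whether a pattern matches, you may not need it. The parentheses around each \s+ are a different story: they are there for grouping, so that the question mark applies to the whole \s+, not for passing anything on. They still count when the groups are numbered, though, so the chapter ends up in group 4.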
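To see how the captured pieces come out, here is a minimal sketch in Python. Python's re module is not PCRE, so this assumes the pattern behaves the same in both engines (it only uses syntax they share). The sample input "1 John 3" is my own.

```python
import re

# The pattern from the post, unchanged.
PATTERN = re.compile(r"(^[123]?(\s+)?[a-zA-Z\s]+)(\s+)?([0-9]+)")

match = PATTERN.search("1 John 3")

# Groups are numbered by their opening parenthesis, left to right.
book = match.group(1)          # whole book name, number included
inner_space = match.group(2)   # the optional space after the number
outer_space = match.group(3)   # the optional space before the chapter
chapter = match.group(4)       # the chapter digits

print(repr(book), repr(inner_space), repr(outer_space), repr(chapter))
# prints: '1 John ' ' ' None '3'
```

Notice that the book name comes back with a trailing space. The character class in group 1 is greedy and also accepts whitespace, so it takes the space before the chapter and leaves group 3 with nothing to match. Trim the book name before using it.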
^[123]?
First we have to see if it is one of the books that starts with a 1, 2, or 3 (e.g., 1 Chronicles, 3 John). The ^ anchors the match to the start of the string. The question mark means the digit is optional, because most books of the Bible don't have one.
(\s+)?
Next we have to check if there is a space between that number and the book name, because when somebody is searching for a book of the Bible, they may get lazy (like me) and not put in a space. Again, it's optional, so we use a question mark.
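A quick way to check the optional number and the optional space is to run a few inputs through the full pattern. This is a sketch in Python's re module, assuming it treats this pattern the same way PCRE does. The helper name and the sample inputs are made up for illustration.

```python
import re

PATTERN = re.compile(r"(^[123]?(\s+)?[a-zA-Z\s]+)(\s+)?([0-9]+)")

def matches(text):
    """True if the pattern finds a book and chapter in text."""
    return PATTERN.search(text) is not None

samples = ["1 Chronicles 5", "1Chronicles 5", "Genesis 1", "4 Kings 2"]
results = {text: matches(text) for text in samples}

# "4 Kings 2" is the only miss: 4 is not in [123], and ^ pins the match
# to the start of the string, so the engine cannot skip past the 4.
for text, ok in results.items():
    print(f"{text!r}: {ok}")
```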
[a-zA-Z\s]+
Next we have to allow both capital and small letters, and we need to allow spaces as well for books like "Song of Solomon" (actually that's the only one with a space in it, not counting the numbered books). Instead of the question mark we have the plus, because at least one of these characters is required. You could also use [\p{L}\s]+ in PCRE, where \p{L} matches a letter from any language. Perl also accepts the long form, \p{Letter}, but PCRE does not support the long property names, so stick with \p{L} there.
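Here is the book-name piece tested on its own, as a small Python sketch. The helper name is hypothetical, and I use fullmatch so the whole string has to fit the character class. Python's built-in re module has no \p{L}, so only the ASCII version is shown.

```python
import re

# Just the book-name piece of the pattern, tested in isolation.
NAME = re.compile(r"[a-zA-Z\s]+")

def is_name(text):
    """True if text is one or more ASCII letters or whitespace characters."""
    return NAME.fullmatch(text) is not None

checks = {
    "Song of Solomon": is_name("Song of Solomon"),  # spaces are allowed
    "john": is_name("john"),                        # lower case is fine
    "": is_name(""),                                # + needs one character
    "Psalm23": is_name("Psalm23"),                  # digits are not in the class
    "Ésaïe": is_name("Ésaïe"),                      # accented letters are not a-z
}
```

The last entry is the case \p{L} is meant to fix: a-zA-Z only covers unaccented letters.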
The last set of parentheses is the chapter (capture group 4, since the two \s+ groups are counted too)...
(\s+)?
Again, the space (or more than one space for that matter) is optional between the book name and the chapter.
([0-9]+)
The chapter number. At least one digit is required (the + symbol).
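Putting the pieces together, a small helper could look something like this. It is only a sketch in Python's re module, assuming the pattern behaves as it does in PCRE. The function name, the return shape, and the whitespace clean-up are my own choices, not part of the pattern.

```python
import re

PATTERN = re.compile(r"(^[123]?(\s+)?[a-zA-Z\s]+)(\s+)?([0-9]+)")

def parse_reference(text):
    """Return (book, chapter) for input like '1 John 3', or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # Group 1 is the book name; collapse runs of whitespace and trim the ends.
    book = " ".join(match.group(1).split())
    # Group 4 is the chapter number as a string of digits.
    chapter = int(match.group(4))
    return book, chapter

print(parse_reference("1 John 3"))           # ('1 John', 3)
print(parse_reference("Song of Solomon 2"))  # ('Song of Solomon', 2)
print(parse_reference("Genesis"))            # None
```

Because the pattern is not anchored at the end, anything after the chapter is ignored, so "John 3:16" comes back as ('John', 3).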
Pretty easy, and honestly I put this blog entry in for myself to quickly refer to in case I have documented all my code badly (which I have!).