Regular Expression to match the beginning of a line

Forum for TextAloud version 3

Moderator: Jim Bretti

TurnieGC
Posts: 2
Joined: Wed Nov 20, 2013 4:26 am

Regular Expression to match the beginning of a line

Post by TurnieGC »

Right now I'm trying to use the pronunciation dictionary to skip various characters used for Markdown formatting.

To give you an easy example, the following line marks a chapter:

Code:

# Chapter 1

Now I tried to skip the '#' by using, for example, the following regular expression:

Code:

^#

Most of my regular expressions work fine so far, but they fail once I try to match text at the beginning of a line, i.e. when the regexp starts with a '^' sign.

Is there some way to get this to work? Or is this feature not supported by TextAloud?

Best regards,

Michael :-)
Jim Bretti
Posts: 1558
Joined: Wed Oct 29, 2003 11:07 am

Re: Regular Expression to match the beginning of a line

Post by Jim Bretti »

Hi Michael

By default, regular expressions in TextAloud treat the text as one big string, so the symbol ^ matches only the beginning of the entire text.

You can force an expression to process text line by line using the modifier (?m). So the expression (?m)^# should match a # symbol at the beginning of a line. With the (?m) modifier, ^ matches the beginning of a line and $ matches the end of a line.
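
As a rough illustration of the (?m) modifier, here is a minimal sketch using Python's re module. This is an assumption made for demonstration only: TextAloud uses its own regular expression engine, which may differ in details, and the sample text is made up.

```python
import re

# Made-up sample text: two chapter headings and one line with a '#' in the middle.
text = "# Chapter 1\nSome text with a # inside\n# Chapter 2\n"

# Without (?m), ^ anchors only at the very start of the whole text,
# so only the first '#' is removed.
whole_text = re.sub(r"^#", "", text)

# With (?m), ^ anchors at the start of every line,
# so the '#' in front of each chapter heading is removed.
per_line = re.sub(r"(?m)^#", "", text)

print(repr(whole_text))
print(repr(per_line))
```

In both cases the '#' in the middle of the second line is left alone, because ^ never matches there.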
Jim Bretti
NextUp.com
TurnieGC
Posts: 2
Joined: Wed Nov 20, 2013 4:26 am

Re: Regular Expression to match the beginning of a line

Post by TurnieGC »

Thanks a lot for your help.

It's working well, although I encountered another behaviour I didn't expect.

I want to suppress a couple of lines that start with a certain qualifier, skipping not just the qualifier but the rest of the line as well.

So I tried the following regular expression:

Code:

(?m)^qualifier.*$

It looks like ".*" isn't recognized. But that's no big deal; in the end the following expression did the same thing:

Code:

(?m)^qualifier[\w\s,;/]*$

Best regards,

Michael
Jim Bretti
Posts: 1558
Joined: Wed Oct 29, 2003 11:07 am

Re: Regular Expression to match the beginning of a line

Post by Jim Bretti »

A possible problem with the expression (?m)^qualifier.*$ has to do with greedy vs. ungreedy (or lazy) pattern matching. By default, pattern matches are greedy, meaning they match as much as possible. The expression (?m)^qualifier.*$ begins matching at the first qualifier at the beginning of a line. Because the match is greedy and nothing, not even a line break, stops the . wildcard, it runs all the way to the very end of the text. So we're matching up to the last place in the text where $ matches, rather than the first, which is what you want.

To make the matching ungreedy (or lazy), use a question mark (?) after the asterisk. So the expression to do what you want looks like this:

(?m)^qualifier.*?$

Greedy pattern matching is the default for the regular expression engine we use in TextAloud.
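
The difference can be sketched with Python's re module. This is only an approximation under stated assumptions: Python is not the engine TextAloud uses, and the (?s) flag is added here so that . also matches line breaks, which is what the explanation above implies for TextAloud. The sample text is made up.

```python
import re

# Made-up sample text: two lines to suppress, two lines to keep.
text = (
    "qualifier skip me\n"
    "keep this line\n"
    "qualifier skip me too\n"
    "keep this as well"
)

# Greedy: .* runs across line breaks to the end of the text, so the match
# ends at the LAST position where $ matches and everything is removed.
greedy = re.sub(r"(?ms)^qualifier.*$", "", text)

# Lazy: .*? stops at the FIRST position where $ matches, i.e. the end of
# the current line, so only the qualifier lines are emptied.
lazy = re.sub(r"(?ms)^qualifier.*?$", "", text)

print(repr(greedy))
print(repr(lazy))
```

The lazy version leaves the line breaks of the removed lines in place, since $ matches just before the line break rather than after it.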
Jim Bretti
NextUp.com