Embedded Conditions Not Working Correctly
Posted: Sat Apr 09, 2011 3:05 am
The regex below, with the embedded condition (?(?= – ) – 0?([1-9]|1[0-2]):0([1-9])\s*?([AP])\.?M\.?), should pick up the time 4:15 – 5:05 p.m. in the text below, and it does.
(?#dd:To)(?m)(?<=^|\s)['"‘“\p{Pi}\p{Ps}]{0,2}0?([1-9]|1[0-2]):([1-5][0-9])(?(?= – ) – 0?([1-9]|1[0-2]):0([1-9])\s*?([AP])\.?M\.?)(?=\s|$)
$1 $2 to $3 o $4 $5.M
<< start of text >>
Arts & Letters Day
Romance and Mystery Novels: an Intimate Conversation with Annette Blair
In a cozy living room setting Annette Blair discusses the professional life of a writer of romance and mystery novels.
Wednesday, April 6th
4:15 – 5:05 p.m.
Conference Center, Second Floor, Room K-422
<< end of text >>
However, if it encounters the following text, it also incorrectly affects and mispronounces the time 8:19 PM PDT.
<< start of 2nd text >>
Yahoo! Breaking News Friday, April 8, 2011, 8:19 PM PDT
WASHINGTON (AP) Senate passes short-term spending bill to keep the government open
<< end of 2nd text >>
8:19 PM PDT should be unaffected by the above regex, since the lookahead condition, (?= – ), should have failed. I was able to get it to work properly by introducing an else condition, as follows:
(?#dd:To)(?m)(?<=^|\s)['"‘“\p{Pi}\p{Ps}]{0,2}0?([1-9]|1[0-2]):([1-5][0-9])(?(?= – ) – 0?([1-9]|1[0-2]):0([1-9])\s*?([AP])\.?M\.?|he is a Jackass)(?=\s|$)
$1 $2 to $3 o $4 $5.M
However, it seems that, since the condition is not met, the regex should have failed even without the else condition.
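A possible explanation, sketched in Python under stated assumptions: in the regex flavors I know of that support (?(condition)yes|no), a conditional with no else branch matches the empty string when its condition fails, so the overall match can still succeed on 8:19. Python's standard re module has no lookahead conditionals, so this sketch emulates (?(?=C)yes|no) with the alternation (?:(?=C)yes|(?!C)no). The names TIME, RANGE, END, no_else and with_failing_else, the simplified time pattern, the use of (?!) as a never-matching else branch, and the case-insensitive flag are all assumptions made for illustration, not the original tool's settings.

```python
import re

# Simplified, hypothetical pieces of the pattern from the post.
TIME = r"(?<!\S)0?(1[0-2]|[1-9]):([0-5][0-9])"
RANGE = r" – 0?(1[0-2]|[1-9]):0([1-9])\s*([AP])\.?M\.?"
END = r"(?=\s|$)"

# Emulates (?(?= – )RANGE): with no else branch, a failed condition
# contributes an empty match, so the rest of the regex is still tried.
no_else = re.compile(
    TIME + r"(?:(?= – )" + RANGE + r"|(?! – ))" + END, re.IGNORECASE
)

# Emulates (?(?= – )RANGE|<something that never matches>): a failed
# condition now runs an else branch that always fails.
with_failing_else = re.compile(
    TIME + r"(?:(?= – )" + RANGE + r"|(?! – )(?!))" + END, re.IGNORECASE
)

text1 = "Wednesday, April 6th\n4:15 – 5:05 p.m.\nConference Center"
text2 = "Yahoo! Breaking News Friday, April 8, 2011, 8:19 PM PDT"

print(no_else.search(text1).group(0))            # 4:15 – 5:05 p.m.
print(no_else.search(text2).group(0))            # 8:19
print(with_failing_else.search(text1).group(0))  # 4:15 – 5:05 p.m.
print(with_failing_else.search(text2))           # None
```

If this reading is right, the else branch only needs to be something that can never match at that position; any such text would do.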
Percy Henry