xsl-list@lists.mulberrytech.com

Re: [xsl] Complex Regex takes 201 steps in regex buddy but runs forever in Analyze-String
Wolfgang Laun
Mon Jan 31 12:01:34 2011

Yes, meanwhile I had changed the middle part to
   («[^»¤]+»\s*|§[^§¤]+§\s*){0,255}
so we agree :)
-W
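
To illustrate why the revised middle part behaves better, here is a minimal sketch in Python. It is not Saxon's regex engine, so it only shows the idea, not the timing seen in oxygen. The names AMBIGUOUS, REVISED, whitespace_splits and sample are hypothetical and chosen for this example; the sample string is a short fragment modelled on the posted input. With \s+ as its own alternative inside {0,255}, a run of n spaces can be cut into pieces in 2^(n-1) ways, and a backtracking engine may visit many of them once a later part of the pattern fails. When the whitespace is attached to the preceding token as \s*, each run can be consumed in only one way.

```python
import re

# Middle part as first posted: \s+ is an alternative of its own inside {0,255}.
AMBIGUOUS = r"(?:«[^»¤]+»|\s+|§[^§¤]+§){0,255}"
# Middle part as revised above: whitespace is absorbed by the token before it.
REVISED = r"(?:«[^»¤]+»\s*|§[^§¤]+§\s*){0,255}"


def whitespace_splits(n: int) -> int:
    """Ways to cut a run of n whitespace characters (1 <= n <= 255) into \\s+ pieces."""
    if not 1 <= n <= 255:
        raise ValueError("n must be between 1 and 255")
    # Each of the n - 1 gaps between characters is either a cut or not.
    return 2 ** (n - 1)


# Hypothetical fragment modelled on the posted input: two tokens around 9 spaces.
sample = "«/TD»" + " " * 9 + '«TD id="H13225"»«/TD»'

# Both patterns accept the same text; they differ only in how many ways they can do it.
both_match = all(re.fullmatch(p, sample) for p in (AMBIGUOUS, REVISED))
print(both_match)
print(whitespace_splits(9))  # decompositions of the 9-space run under AMBIGUOUS
```

The count covers a single whitespace run in isolation; several runs in one input multiply together, which is the growth a backtracking engine has to cope with when the overall match fails.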

On 31 January 2011 20:15, Alex Muir <[EMAIL PROTECTED]> wrote:
> Okay, this one seems to work, based on your suggestion plus a little
> tweak so that it surrounds all the LISTITEMs
>
> ((¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)\s*(((«[^»¤]+»\s*|§[^§¤]+§\s*){0,255})(¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)){0,200})
>
> Also, I note that the input I posted was in fact working. I was trying
> to reduce the input text and ended up using a project scenario rather
> than a global scenario with the same name. After restarting oxygen I
> guess I switched to a different scenario, which ran different input
> than I wanted.
>
> Thanks much
>
>
> On Mon, Jan 31, 2011 at 6:59 PM, Wolfgang Laun <[EMAIL PROTECTED]> wrote:
>> The parentheses '(' and ')' do not match well in <xsl:variable
>> name="CompleteListIdentificationRegex" >. Please check.
>>
>> But one evil subpattern is this (with spaces inserted for readability):
>>
>>   ( ( «[^»¤]+» | \s+  |  §[^§¤]+§  ){0,255})
>>
>> Because \s+ is repeated inside {0,255}, a run of spaces can be split in
>> many ways, so the engine may try many combinations of zero to 255
>> repetitions of "any number > 0 of spaces".
>>
>> Cleaner is
>>    (\s+|(«[^»¤]+»|§[^§¤]+§){0,255})
>>
>> -W
>>
>> On 31 January 2011 19:40, Alex Muir <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>>
>>> With the following code:
>>> ------------------------------
>>>
>>> <?xml version="1.0"?>
>>> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>>  xmlns:saxon="http://saxon.sf.net/" 
>>> xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>>  version="2.0"  exclude-result-prefixes="#all">
>>>  <xsl:output method="xml" indent="no"/>
>>>
>>>
>>>  <xsl:template match="unknown[exists(text())]">
>>>    <xsl:copy>
>>>      <xsl:copy-of select="@*"/>
>>>
>>>      <xsl:call-template name="CompleteListAnalyze">
>>>        <xsl:with-param name="content" select="text()"/>
>>>      </xsl:call-template>
>>>
>>>    </xsl:copy>
>>>  </xsl:template>
>>>
>>>
>>>  <xsl:template name="CompleteListAnalyze">
>>>    <xsl:param name="content"/>
>>>
>>>    <xsl:variable name="CompleteListIdentificationRegex" >
>>>      
>>> <xsl:text>((¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)(((«[^»¤]+»|\s+|§[^§¤]+§){0,255})(¤LISTITEM[^¤]+¤[^¤]+¤/LISTITEM¤)){0,200})</xsl:text>
>>>    </xsl:variable>
>>>
>>>    <xsl:analyze-string select="$content"
>>> regex="{$CompleteListIdentificationRegex}">
>>>      <xsl:matching-substring>
>>>        <xsl:text>¤COMPLETELIST POSITION="</xsl:text>
>>>        <xsl:value-of select="position()"/>
>>>        <xsl:text>" PLACEMENT=""¤</xsl:text>
>>>        <xsl:value-of select="regex-group(1)"/>
>>>        <xsl:text>¤⊕/COMPLETELIST¤</xsl:text>
>>>      </xsl:matching-substring>
>>>      <xsl:non-matching-substring>
>>>        <xsl:value-of select="."/>
>>>      </xsl:non-matching-substring>
>>>    </xsl:analyze-string>
>>>  </xsl:template>
>>>
>>> </xsl:stylesheet>
>>>
>>>
>>> And the following input file:
>>> ----------------------------------
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <doc>
>>>    <unknown>¤LISTITEM BULLET="15" TITLE="TEXT TEXT TEXT TEXT"
>>> TYPE="SNLI"¤«§HL§FONT size="2" id="H13211"»15«/§HL§FONT»«/§HL§TD»
>>>   «§HL§TD id="H13213"»«/§HL§TD» «/§HL§TR» «§HL§TR id="H13215"»«§HL§TD
>>> id="H13216"» «/§HL§TD»«/§HL§TR» «§HL§TR valign="bottom" id="H13218"»
>>>      «§HL§TD id="H13220"»«/§HL§TD»         «§HL§TD colspan="2"
>>> id="H13222"»«§HL§FONT size="2" id="H13223"»TEXT TEXT TEXT
>>> TEXT«/§HL§FONT»¤/LISTITEM¤«/TD»         «TD id="H13225"»«/TD»
>>> «TD id="H13227"»«/TD»         «TD id="H13229"»«/TD»         «TD
>>> id="H13231"»«/TD»         «TD align="right" id="H13233"»¤LISTITEM
>>> BULLET="16" TITLE="TEXT TEXT TEXT TEXT" TYPE="SNLI"¤«§HL§FONT size="2"
>>> id="H13234"»16«/§HL§FONT»«/§HL§TD»         «§HL§TD
>>> id="H13236"»«/§HL§TD» «/§HL§TR» «§HL§TR id="H13238"»«§HL§TD
>>> id="H13239"» «/§HL§TD»«/§HL§TR» «§HL§TR valign="bottom" id="H13241"»
>>>      «§HL§TD id="H13243"»«/§HL§TD»         «§HL§TD colspan="2"
>>> id="H13245"»«§HL§FONT size="2" id="H13246"»TEXT TEXT TEXT TEXT TEXT
>>> «/§HL§FONT»¤/LISTITEM¤«/TD»         «TD id="H13248"»«/TD»         «TD
>>> id="H13250"»«/TD»         «TD id="H13252"»«/TD»         «TD
>>> id="H13254"»«/TD»         «TD align="right" id="H13256"»¤LISTITEM
>>> BULLET="17" TITLE="TEXT TEXT TEXT TEXT" TYPE="SNLI"¤«§HL§FONT size="2"
>>> id="H13257"»17«/§HL§FONT»«/§HL§TD»         «§HL§TD
>>> id="H13259"»«/§HL§TD» «/§HL§TR» «§HL§TR id="H13261"»«§HL§TD
>>> id="H13262"» «/§HL§TD»«/§HL§TR» «§HL§TR valign="bottom" id="H13264"»
>>>      «§HL§TD id="H13266"»«/§HL§TD»         «§HL§TD colspan="2"
>>> id="H13268"»«§HL§FONT size="2" id="H13269"»TEXT TEXT TEXT TEXT TEXT
>>> «/§HL§FONT»¤/LISTITEM¤</unknown>
>>> </doc>
>>>
>>> The regex held in the variable CompleteListIdentificationRegex runs
>>> fine in RegexBuddy on the same input, executing to completion in 201
>>> steps. It essentially just identifies all the content within the above
>>> <unknown> element.
>>>
>>> However, the equivalent xsl:analyze-string running in oxygen 12.1
>>> keeps running and never finishes on the same input.
>>>
>>> Any ideas?
>>>
>>> Been working on it for 4 hours without much progress other than
>>> reducing the number of execution steps in regex buddy by 40.
>>>
>>> Thanks Much
>>>
>>>
>>> --
>>> Alex
>>> -----
>>> Currently:
>>> Freelance Software Engineer 6+ yrs exp
>>>
>>> Previously:
>>> https://sites.google.com/a/utg.edu.gm/alex/
>>>
>>>
>>> A Bafila, is two rivers flowing together as one:
>>> http://www.facebook.com/pages/Bafila/125611807494851
>>>
>>> --~------------------------------------------------------------------
>>> XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>>> To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
>>> or e-mail: <mailto:[EMAIL PROTECTED]>
>>> --~--