Recent changes to 11: scanString start and end offsets incorrect.

Gary O'Leary-Steele — Fri, 23 Oct 2015 20:51:55 -0000

Calling parseWithTabs() on the grammar was the solution...
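A minimal sketch of that fix, assuming a hypothetical tab-indented sample (the tabs in the samples posted in this thread may not have survived formatting): once parseWithTabs() is called, scanString leaves tabs in place, so start and stop can be used to slice the original string directly.

```python
import pyparsing as pp

# Same PHP-variable pattern as in the report.
variable = pp.Regex(r'\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*')

# Hypothetical sample indented with real tab characters.
SAMPLE = "\t$test;\n\t\t$test1;"

# Without this call scanString would expand tabs first, so start/stop
# would index into SAMPLE.expandtabs() rather than SAMPLE itself.
variable.parseWithTabs()

matches = []
for tokens, start, stop in variable.scanString(SAMPLE):
    # With tabs kept, the offsets slice straight back into the input.
    assert SAMPLE[start:stop] == tokens[0]
    matches.append((tokens[0], start, stop))
    print("{} [{}:{}] from a string length of {}".format(tokens[0], start, stop, len(SAMPLE)))
```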

Thanks
Gary

Gary O'Leary-Steele — Fri, 23 Oct 2015 20:20:54 -0000

Here is another example, with a simpler grammar:

import pyparsing as pp
variable = pp.Regex(r'\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*').setResultsName('variable')

grammar = variable

SAMPLE_PHP = r'''
$test;
$test1;
$test2;
$test3;
}'''

# scanString yields (tokens, start, end) for every match in the input
for token, start, stop in grammar.scanString(SAMPLE_PHP):
    print("{} [{}:{}] from a string length of {}".format(token, start, stop, len(SAMPLE_PHP)))

I expected each match to report its offset in the input string, but that's not what I'm seeing.
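One way to check that expectation is a small helper, hypothetical and not part of pyparsing, that slices the input at each reported offset and collects the matches that do not line up. On space-indented input it finds nothing; on the same lines indented with a tab it flags every match, which points at tab handling rather than at the grammar.

```python
import pyparsing as pp

def offset_mismatches(grammar, text):
    """Hypothetical helper: return (token, start, stop) for every match whose
    offsets do not slice back to the matched text in `text`."""
    bad = []
    for tokens, start, stop in grammar.scanString(text):
        if text[start:stop] != tokens[0]:
            bad.append((tokens[0], start, stop))
    return bad

variable = pp.Regex(r'\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*')

# Indented with spaces: offsets slice back correctly.
clean = offset_mismatches(variable, "\n    $test;\n    $test1;")
# Indented with tabs: default tab expansion shifts the offsets.
tabbed = offset_mismatches(variable, "\n\t$test;\n\t$test1;")
print(clean)
print(tabbed)
```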

Gary O'Leary-Steele — Fri, 23 Oct 2015 20:01:35 -0000

Hi,

I'm using pyparsing to parse some PHP code, and I'm trying to use scanString so that I can reference parsed components in the input text. For some reason I'm getting start and stop offsets that are beyond the length of the string. Here is a simple example:

def test_foreach_bug_standalong():
    '''
    test to diagnose a scanStringOffsetBug
    '''
    import pyparsing as pp

    # a {...} block, matched with balanced braces
    nested_block = pp.nestedExpr(opener="{", closer="}").setResultsName("block_code")
    foreach = pp.Group(pp.Literal("foreach") + "(" + ")" +
                       nested_block
                       ).setResultsName("foreach")
    # a PHP variable such as $test
    variable = pp.Regex(r'\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*').setResultsName('variable')

    grammar = foreach | variable

    SAMPLE_PHP = r'''
        foreach(){          
                eval($item)
            }                   
        $test;

}'''
    # scanString yields (tokens, start, end) for every match in the input
    for token, start, stop in grammar.scanString(SAMPLE_PHP):
        print("{} [{}:{}] from a string length of {}".format(token, start, stop, len(SAMPLE_PHP)))


test_foreach_bug_standalong()

The output is as follows:

[['foreach', '(', ')', ['eval($item)']]] [6:72] from a string length of 88
['$test'] [112:117] from a string length of 88

As you can see, the second match reports a start of 112 and an end of 117, both past the end of the 88-character string.
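The out-of-range offsets are what scanString's default tab handling would produce: unless parseWithTabs() has been called, it scans instring.expandtabs(), so start and stop index into the expanded text, which is longer than the original whenever the input contains tabs. This sketch uses a hypothetical two-line, tab-indented sample to show the offsets slicing correctly into the expanded string while running past the end of the original.

```python
import pyparsing as pp

variable = pp.Regex(r'\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*')

# Hypothetical tab-indented input (two tab characters on the second line).
SAMPLE = "\t$a;\n\t\t$b;"
expanded = SAMPLE.expandtabs()  # what scanString scans by default

results = list(variable.scanString(SAMPLE))
for tokens, start, stop in results:
    # The offsets are valid for the expanded text...
    assert expanded[start:stop] == tokens[0]
    print(tokens[0], start, stop, len(SAMPLE), len(expanded))

# ...and the last match ends past the end of the original string.
print(results[-1][2] > len(SAMPLE))
```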

Am I doing something wrong here?

Thanks
Gary