LMD SyntaxEdit Schemes Language Reference


Rules


Each scheme contains zero or more rules. A rule contains a regular expression that is used either to find tokens or to switch into another scheme. Keep in mind, however, that the scope of every regexp is limited to a single line. Also, spaces and newlines inside a regexp are ignored, so to specify a space inside a regexp, use the "\s" construct.
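
As a minimal sketch of the whitespace rule (the token name 'keyword' is hypothetical, and the <Regex> element itself is described below), the following rule is laid out over several lines for readability. Because spaces and newlines in the regexp are ignored, the gap between the two words has to be written explicitly as \s+:

<Regex token0='keyword'>
    \b end
    \s+
    if \b
</Regex>

Without the \s+ part, the same rule would be expected to match the text "endif" rather than "end if".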

 

Element: <Regex>

 

This rule searches a line for a regexp and either splits the match into tokens or hands the whole match over to another scheme.

 

Attribute: regex, type: Regular expression
Specifies the regexp to search for. Instead of giving the regex attribute, you can put the regexp inside the element. Nota bene: follow the common XML rules and use entity references for the special characters «<», «>», «'», «"»: write them as &lt; &gt; &apos; &quot; respectively.
Refer to the W3C XML specification for details on entities.

Attribute: moreWordSeparators, type: string.
Extends the default set of word separator characters used by the \b regexp operator, for this rule's regexp only. See the corresponding topic in the regexps section. See the <KeywordRegex> element for an example.

Attribute: moreWordChars, type: string.
Extends the default set of word characters used by the \b regexp operator, for this rule's regexp only. See the corresponding topic in the regexps section. See the <KeywordRegex> element for an example.

The two rules below are equivalent: the first gives the regexp as element content, the second gives it in the regex attribute.

 

<Regex token0='attributeValue'>

    [^ &lt; &gt; &quot; &apos; = \s ]+         

</Regex>

<Regex token0='attributeValue'

       regex='[^ &lt; &gt; &quot; &apos; = \s ]+' />

 

Attribute: token0, token1, token2…, type: string, <Token> reference.
Specifies the token for a particular group of the matched regexp. token0 assigns the whole match to a token, token1 assigns the first match group to a token, token2 assigns the second match group, and so on.

 

Example 1:

 

<Regex token0='email'>

    [_a-zA-Z\d\-\.]+     

    @ 

    ([_ a-z A-Z \d \-]+ 

    (\. [_ a-z A-Z \d \-]+ )+ )

</Regex>  

           

The whole match will produce a single "email" <Token>.

 

Example 2:

 

<Regex token1='emailUser' token2='emailAt' token3='emailHost'>

    ( [_a-zA-Z\d\-\.]+ )

    ( @  )

    ([_ a-z A-Z \d \-]+ 

    (\. [_ a-z A-Z \d \-]+ )+ )

</Regex>    

         

The match will produce three tokens: "emailUser", "emailAt", "emailHost". If no token is given for a group, the scheme's default token is produced for that part of the match:

 

Example 3:

 

<Regex token3='emailHost'>

    ( [_a-zA-Z\d\-\.]+ )

    ( @  )

    ([_ a-z A-Z \d \-]+ 

    (\. [_ a-z A-Z \d \-]+ )+ )

</Regex>             

 

This match will produce two tokens: "default", "emailHost". If a token is given for a group that encloses another group, no token is produced for the inner group; only the outer group's token is produced.

 

Example 4:

 

<Regex token3='emailHost' token4='emailHostEnd'>

    ( [_a-zA-Z\d\-\.]+ )

    ( @  )

    ([_ a-z A-Z \d \-]+ 

    (\. [_ a-z A-Z \d \-]+ )+ )

</Regex>                

 

This match will produce two tokens: "default", "emailHost". Group 4 is inside group 3, so no token is produced for group 4: its text is already covered by the token given for the outer group 3.

 

Example 5:

 

<Regex token0='email' token3='emailHost' token4='emailHostEnd'>

    ( [_a-zA-Z\d\-\.]+ )

    ( @  )

    ([_ a-z A-Z \d \-]+ 

    (\. [_ a-z A-Z \d \-]+ )+ )

</Regex>                

 

This match will produce one token: "email". Groups 3 and 4 are inside group 0 (the whole match), so no tokens are produced for them: their text is already covered by the token given for the outer group 0.
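
Putting these rules together, here is one more sketch that is not part of the original examples. It assumes the same e-mail regexp as above. Tokens are given for groups 1 and 3 only, so the text of group 2 (the "@" sign) should fall back to the scheme's default token:

<Regex token1='emailUser' token3='emailHost'>
    ( [_a-zA-Z\d\-\.]+ )
    ( @  )
    ([_ a-z A-Z \d \-]+ 
    (\. [_ a-z A-Z \d \-]+ )+ )
</Regex>

By the rules described above, this match would be expected to produce three tokens: "emailUser", "default", "emailHost".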

 

Attribute: innerScheme, type: string, case-sensitive, scheme reference.
Causes the parser to switch into the specified scheme to parse the matched text. When the parser finds a rule with an inner scheme, it switches into the new scheme, parses the matched text using the inner scheme's rules, skips past the parsed text, and switches back to the scheme it was in. The innerScheme attribute is incompatible with the token0..N attributes. For a description of the scheme nesting feature, see the "Schemes nesting" section.

 

Also, a rule can refer to any scheme from another SSL document in the TLMDEditDocument.SyntaxSchemes collection, using syntax like this: innerScheme='OtherDoc.SomeScheme'. For example:

 

<!--Strings scheme -->

<Scheme name='String' defaultToken='string'>

    <!-- Will highlight emails inside string literals -->

    <Regex token0='email'>

        [_a-zA-Z\d\-\.]+ 

        @ 

        ([_ a-z A-Z \d \-]+ 

        (\. [_ a-z A-Z \d \-]+ )+ )

    </Regex>                

</Scheme>

 

<!-- Text inside two quotes will be parsed by String scheme rules -->

<Regex innerScheme='String'> 

    &quot; ( [^ &quot; \\ ] | \\. )* &quot;

</Regex>

 

<!-- Text inside two apostrophes will be parsed by the String scheme 

     from another XML document in the TLMDEditDocument.SyntaxSchemes 

     collection, named 'JavaScript' -->

<Regex innerScheme='JavaScript.String'> 

    &apos; ( [^ &apos; \\ ] | \\. )* &apos;

</Regex>

 

Attribute: priority, type: Integer
Sets the priority of this rule for cases where several rules can match the same text; the rule with the higher priority is preferred. The default priority is 0. For example:

 

<!-- defaultToken='string': all text inside will be green -->

<Scheme name='String' inherit='Text' defaultToken='string'>

    <Regex token0='escaped' regex='\\[a-z &quot; ]' />    

    <Regex token0='escaped' regex='\\0x[a-fA-F0-9]+' />

</Scheme>

 

<!-- defaultToken='badString': all text inside will be red -->

<Scheme name='BadString' inherit='String' defaultToken='badString'/>

 

<!-- You can inherit from this scheme to highlight C++ string literals -->

<Scheme name='StringFind'>

    <!-- Text starting with a " character may be 

         a closed or an unclosed string literal -->

 

    <!-- First, we check for a good (closed) string. 

         The high priority makes this rule win. -->

    <Regex innerScheme='String' priority='10'> 

        &quot; ( [^ &quot; \\ ] | \\. )* &quot;

    </Regex>

 

    <!-- Second, we check for a bad (unclosed) string. 

         The priority is left at its default, 0. -->

    <Regex innerScheme='BadString'> 

        &quot; (.*?\\ &quot; )*? .* $ 

    </Regex>

</Scheme>
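
The same idea applies to any pair of rules that can match at the same place. As a sketch (the token names 'number' and 'hexNumber' are hypothetical), both rules below can match at the start of the text 0x1F. Assuming that the rule with the higher priority is preferred, as in the string example above, the text would be highlighted as one hexadecimal literal rather than as the decimal number 0 followed by other text:

<!-- Decimal number: default priority (0) -->
<Regex token0='number' regex='\b \d+' />

<!-- Hexadecimal number: higher priority, preferred over the rule above -->
<Regex token0='hexNumber' priority='10' regex='\b 0x [a-fA-F0-9]+' />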

 

Attribute: innerContentGroup, type: Integer
Gives the number of the regexp group whose text is used as the token "contents" for further syntax parsing. For more, see the "Syntax Blocks" section. For example:

 

<!-- All preprocessor text will go as one token 'preprocessor', 

     with contents taken from matched group1 -->

<Regex innerScheme='Preprocessor' innerContentGroup='1' priority='10' >

    ^ \s* \# ([a-zA-Z]+) .* $

</Regex>

 

<!-- We will fold text inside preprocessor if/endif -->

<SyntaxBlock capture="true">

    <!-- Here: 'preprocessor' is the token, 'if' is the token contents -->      

    <Start> 

        [ preprocessor:if  preprocessor:ifdef  ]

    </Start>

    <End> 

        [ preprocessor:ifend  preprocessor:endif ] 

    </End>

</SyntaxBlock>
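
Along the same lines, here is a sketch for folding #region / #endregion pairs. It assumes the Preprocessor rule shown above, so that group 1 supplies the token contents, and it assumes that the highlighted language actually uses these two directive names:

<SyntaxBlock capture="true">
    <!-- 'preprocessor' is the token, 'region' and 'endregion' are token contents -->
    <Start> 
        [ preprocessor:region ]
    </Start>
    <End> 
        [ preprocessor:endregion ] 
    </End>
</SyntaxBlock>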