Menu

#252 TCoolEdit: Syntax highlighting overhaul

8
pending
1
2025-10-04
2024-10-18
No

Inspired by Sebastian Ledesma's (@sebas_ledesma) work on new syntax highlighting parsers [discussion:d005538c39][feature-requests:#250][feature-requests:#251], I started reviewing and improving the relevant TCoolEdit code in the CoolPrj code base (see [discussion:d005538c39#b357]).

Although this was initially only exploratory and experimental, I have decided to submit my changes, as requested by Sebastian. My further support may be limited, but hopefully my changes will benefit him and others in using, maintaining and further developing the neat syntax highlighting feature.

Changes I have made so far [r7154]:

  • New virtual overload for TSyntaxParser::ParseLine replaces the old unsafe signature. The new overload has a safe signature, now returning a TTextBlocks container based on std::map, rather than taking an unsafe buffer pointer. The cookie now has an alias type TCookie and is taken by non-const reference as an in-out parameter. The default implementation implements a compatibility layer for old derived parsers still using the old signature. It also produces diagnostics for analysing the output from the old parsers, useful for validation and optimisation.

  • Optimised TCoolEditWnd::DrawLine — now computes the optimal spans for rendering the line based on the (possibly suboptimal) TTextBlocks given by the parser, ensuring the minimal number of drawing calls are made. The optimisation is linear and highly efficient, and is implemented as a generic algorithm in a local lambda. Diagnostics useful for analysis and validation are generated. This solution also removes redundancy in the old impementation.

  • TParserCookies, a std::unordered_map alias, is now used in TCoolEditWnd for simpler and safer management of syntax highlighting cookies (parser state per line). This change replaces a lot of ugly explicit memory management, and also eliminates an old workaround related to the old cookie handling (see discussion). The new implementation is much cleaner and should also be more efficient.

  • TColorIndex is now the name of the previously anonymous enum for the syntax highlighting colour indexes (e.g. COLORINDEX_NORMALTEXT). Likewise, internal alias TCharPos (for int) has been added for use in structure TTextBlock, which represents a syntax element marker in the old implementation. TCharPos serves as the std::map key in the new implementation.

  • TCSyntaxParser overhaul: This parser has been thoroughly simplified and refactored in modern C++ style, adhering to our Coding Standards, with a few issues fixed [bugs:#581][bugs:#582]. Hopefully, it should also be just as efficient, if not more, as it generates far fewer superfluous blocks, if any.

Edit: Additional refinements in this overhaul:

  • NEW: TCoolEdit: Syntax highlighting for ".rh" files [feature-requests:#250].
  • NEW: TCoolEdit: Syntax highlighting for INI files [feature-requests:#251].
  • NEW: TCoolEdit: Syntax highlighting for Unidiff patch files [feature-requests:#254].
  • NEW: CoolPrj: TParserSelector::ParserInfo: Added entry for HTML Help Project Files (*.hpj) [r7163].
  • NEW: TCoolEdit: Syntax highlighting for Doxygen configuration files [feature-requests:#253][r7176].
  • NEW: TCoolEdit: Syntax highlighting for Cpp2 files [r7197].
  • NEW: TCoolEdit: Syntax highlighting for C# files [r7200].
  • NEW: TCoolEdit: Base class TCppParserBase for TCppParser and TRcParser, for code reuse [r7214].
  • NEW: TCoolEdit now renders ligatures [r7381].
  • NEW: TCoolEdit::Copy [r7427].
  • NEW: CoolDemo: CoolEdit Benchmark [r7428].
  • CHG: TCoolEdit syntax highlighting overhaul [r7154].
  • CHG: TIniSyntaxParser: Removed redundant qualification of TCoolTextWnd::TTextBlocks [r7155].
  • CHG: CoolPrj: TSyntaxParser::TCookie has now been widened to UINT_PTR, for consistency [r7156].
  • CHG: CoolPrj: Renamed private function TCoolTextWnd::GetParseCookie to GetParserCookie, for consistency [r7156].
  • CHG: CoolPrj: TSyntaxParser::ParseLine: Corrected documentation [r7157].
  • CHG: CoolPrj: Updated TParserSelector::ParserInfo with the "*.hhp" pattern [r7167].
  • CHG: CoolPrj: Cleaned up TParserSelector::ParserInfo [r7175].
  • CHG: CoolPrj: Renamed CParserCreator to CppParserCreator, for consistency [r7180].
  • CHG: CoolPrj: TIniSyntaxParser and TCppParser refactoring [r7181].
  • CHG: TCoolTextWnd::GetParserCookie: Optimised the implementation [r7187].
  • CHG: TCoolTextWnd::DrawLine: Optimised the cookie handling [r7188].
  • CHG: CoolPrj: Removed old unsupported parsers for syntax highlighting [r7205].
  • CHG: CoolPrj: Removed "regexp.cpp" and rewrote TRegexSearchEngine using std::regex [r7205][r7207][r7211].
  • CHG: CoolPrj: Removed "fpattern.(h|cpp)" [r7209].
  • CHG: CoolPrj: Removed TParserSelector [r7214][r7226], for simplicity.
  • CHG: CoolPrj: Refactored TSyntaxParser, for clarity and simplicity [r7282].
  • CHG: CoolPrj: TEditPropDlg refactoring and improvements [r7289].
  • CHG: TCoolTextWnd overhaul continued [r7408].
  • BUG: CoolPrj: THtmlSyntaxParser does not highlight '<' correctly [r7151].
  • BUG: CoolPrj: Parser factory functions are exported [r7165].
  • BUG: TCoolEdit::KeyUnTab violates assertion [r7191].
  • BUG: TCoolEdit: Debug assertion fires in TSyntaxParser for isalnum etc. [bugs:#285].
  • BUG: TCoolEdit: Rendered characters are partly chopped [bugs:#525].
  • BUG: CoolPrj pollutes the global namespace [bugs:#526].
  • BUG: TCoolEdit: Text Block clipboard format is inconsistent [bugs:#551].
  • BUG: TCoolTextWnd::TStreamSelection may truncate, add garbage or cause crash [bugs:#553].
  • BUG: TCoolEdit::DeleteWordBack fails [bugs:#554].
  • BUG: CoolPrj auto-links static library in dynamic linking build mode [bugs:#563].
  • BUG: TCoolEdit::DeleteWord crashes [bugs:#555].
  • BUG: TCoolEdit renders whitespace as operators [bugs:#581].
  • BUG: TCoolEdit renders escape sequence incorrectly [bugs:#582].
  • BUG: TCoolEdit renders multiline syntax elements incorrectly [bugs:#583].
  • BUG: TCoolEdit highlights numbers incorrectly [r7226].
  • BUG: OWLMaker: Font change in the Syntax Highlighting settings is ignored [bugs:#588].

Related

Bugs: #285
Bugs: #525
Bugs: #526
Bugs: #551
Bugs: #553
Bugs: #554
Bugs: #555
Bugs: #563
Bugs: #580
Bugs: #581
Bugs: #582
Bugs: #583
Bugs: #588
Commit: [r7151]
Commit: [r7154]
Commit: [r7155]
Commit: [r7156]
Commit: [r7157]
Commit: [r7163]
Commit: [r7165]
Commit: [r7167]
Commit: [r7175]
Commit: [r7176]
Commit: [r7180]
Commit: [r7181]
Commit: [r7187]
Commit: [r7188]
Commit: [r7191]
Commit: [r7197]
Commit: [r7200]
Commit: [r7205]
Commit: [r7207]
Commit: [r7209]
Commit: [r7211]
Commit: [r7214]
Commit: [r7226]
Commit: [r7282]
Commit: [r7289]
Commit: [r7381]
Commit: [r7408]
Commit: [r7427]
Commit: [r7428]
Discussion: CoolEdit issues
Discussion: CoolEdit issues
Discussion: d005538c39
Discussion: CoolEdit in C++Builder 10.x
Discussion: CoolEdit in C++Builder 10.x
Discussion: CoolEdit in C++Builder 10.x
Feature Requests: #250
Feature Requests: #251
Feature Requests: #253
Feature Requests: #254
Feature Requests: #266
Feature Requests: #268
Feature Requests: #269
Wiki: Coding_Standards
Wiki: OWLNext_Roadmap_and_Prereleases

Discussion

  • Vidar Hasfjord

    Vidar Hasfjord - 2024-10-19
    • summary: TCoolEdit syntax highlighting overhaul --> TCoolEdit: Syntax highlighting overhaul
     
  • Vidar Hasfjord

    Vidar Hasfjord - 2024-10-21

    BUG: CoolPrj: TRsrcSyntaxParser does not parse lines of length 1 [r7184].

    Also, the TRsrcSyntaxParser does not recognise ACCELERATORS, STRINGTABLE and TEXTINCLUDE as keywords.

    Looking at the implementation it looks suspiciously like the old C++ parser implementation, just with a different keyword list. I've now tried to copy over the new cleaned up C++ parser implementation, just with the keyword list changed (including adding the omissions mentioned above), and it seems to work well.

    I have now committed this new RC parser [r7185], but note that we ideally should get rid of this duplication of code by refactoring the code for reuse between the parsers, e.g. by using a common base class that has the keyword list as a compile-time parameter. I have left that as a \todo item in the file documentation.

    Edit: This has now been implemented. A base class TCppParserBase for TCppParser and TRcParser has been added, for code reuse [r7214].

     
    ❤️
    1

    Related

    Commit: [r7184]
    Commit: [r7185]
    Commit: [r7214]


    Last edit: Vidar Hasfjord 2024-11-16
  • Vidar Hasfjord

    Vidar Hasfjord - 2024-10-22

    BUG:TCoolEdit renders multiline syntax elements incorrectly [bugs:#583].

    In the new code, an initialisation order bug affecting this issue was fixed in [r7186], but this bug isn't the only cause of the issue, since the issue applies to the old code as well, and fixing the bug in the new code doesn't solve the issue.

    The wider problem seems to be with the way cookies are handled in TCoolTextWnd. Notably, in DrawLine, a line is not parsed if it is empty. Changing the code so that empty lines are parsed seems to fix the issue, but the question remains: why?

    GetParserCookie should give us the correct cookie for the preceding line, even if the preceding line is empty, since GetParserCookie will call ParseLine for any line that needs its cookie generated. It doesn't skip empty lines, and it will even iterate over the whole file, if necessary.

    I suspect the issue is that the cookies are not properly updated as lines are edited. Editing may invalidate the cookie container, either if the change affects the rendering of the following line (as is the case for extended comments), or if you add or remove lines (in which case the line indexes change, thereby invalidating the cookie container from that line onwards) . I'm not sure this is properly handled, if at all. I tried saving and reopening the test file, thereby regenerating the cookies, and after that the extended comment did indeed render properly, indicating that the issue lies with stale cookies, as suspected.

    Edit: This issue has now been fixed. See [bugs:#583] and [r7189][r7192].

     

    Related

    Bugs: #583
    Commit: [r7186]
    Commit: [r7189]
    Commit: [r7192]


    Last edit: Vidar Hasfjord 2024-11-16
  • Vidar Hasfjord

    Vidar Hasfjord - 2024-10-25

    With the addition of TCppParserBase (in "lang.h") introduced in [r7214], it is now easier than ever to create a new parser based on a C-like language, provided that it (more or less) shares the same tokens as C++, including number, string and character literals, escape sequences, line continuation (line ending with backslash), "#..." preprocessor directives, "//..." line comments, and "/*...*/" extended comments. You can just derive from TCppParserBase and override virtual function IsKeyword to indicate the words that should be highlighted as keywords.

    TCppParser and TRcParser now do just that.

    However, in [r7214], I did more changes as well. A major change was to effectively retire TParserSelector from the API. It is still alive, but it is now just a bloated implementation detail in "lang.cpp", hence given a trailing underscore (TParserSelector_) to indicate that it is not part of the library API. However, it is far too elaborate in its current role; the implementation can do without it and be much simplified. On the other hand, I've left it intact, in case there is demand for bringing it back to the API.

    Just to clarify, its purpose is to simplify parser selection for clients that provide their own, in addition to, or in place of, the built-in parsers. By passing true (the default) for initDefTypes in the constructor, the class will initialise its list of parsers from the built-in list. If false is passed, it will start empty. Then the apparent idea is that the user should call AddParser to add custom parsers, by passing a parser factory function for the parser, along with the appropriate file filter pattern (which is a string, listing filters separated by semicolon). When this is all done, the client can then call CreateParser, passing a file path, and have the appropriate parser for that file type returned, if any.

    All that said, I suspect this is overkill for the few (quite possibly zero) applications that implement a custom parser. I bet they are likely to just test the file type themselves and simply create their custom parser directly using its contructor, rather than go through all these hoops to use the TParserSelector functionality. And to use the built-in parsers, they'll just simply call ChooseSyntaxParser (declared in "lang.h").

    Okay, that was my long awkward exposition. Now, do you want TParserSelector back or not?

    If not, I may further simplify "lang.cpp" by sending TParserSelector_ into the woods.

    Edit: TParserSelector_ is no more [r7226].

     

    Related

    Commit: [r7214]
    Commit: [r7226]


    Last edit: Vidar Hasfjord 2024-10-27
  • Vidar Hasfjord

    Vidar Hasfjord - 2024-10-27

    The original C++ parser did a lousy job with numbers. This has been much improved in [r7226]. Unlike the old, the revised implementation properly supports scientific format, digit separators and suffixes (built-in and user-defined).

    Here is a test screenshot; new on the left, old on the right:

    TCppParserBase number highlighting screenshot

    Please review and test!

     
    👍
    1

    Related

    Commit: [r7226]


    Last edit: Vidar Hasfjord 2025-09-11
  • Vidar Hasfjord

    Vidar Hasfjord - 2024-11-05

    In [r7289], I overhauled the TEditPropDlg class for the syntax highlighting configuration dialog. In the following screenshot the old dialog is on the left, and the revamped dialog is on the right.

    TEditPropDlg improvements screenshot

    A colour sample along with the colour name is now shown for the items in the colour drop-down boxes.

    In addition to the drop-down lists, I have added detail buttons that open the common colour dialog, so that any custom colour can be selected. After selecting a custom colour, it will appear in the drop-down list with the name "Custom".

    The default colour is shown in the drop-down lists by appending "(Default)" to the colour name. The name of the currently active colour is shown in bold font. This is the "current" colour that was active when you opened the dialog, or since you switched category. That is, if you change the selected colour from the current colour and then switch to a different category, then if you return to the same category, your selected colour is now the current colour shown in bold.

    The dialog has generally been enlarged to better accommodate long font and colour names. And the dialog now uses the operating system message font (which is slightly larger than the old Tahoma dialog font).

    Category items "Normal Text" and "Whitespace" have swapped places, so that the former appears first in the list, for clarity and consistency. Note also that the font name and size fields are now disabled for other categories than normal text, since these base font settings are shared. Font variants bold, italic and underline can be individually set for each category, though.

    Note: There is still a lingering bug in the colour drop-down boxes, where sometimes multiple custom colours are listed at the bottom of the list (max two should be shown; the selected custom colour, as well as the default colour, if that too is a custom colour not in the list of named colours). If you spot the issue, let me know. Other than that, the dialog is fully functional now — after the bug fix in OWLMaker [bugs:#588] font changes also work, and are restored properly from the configuration file.

    Edit: The bug has been found and fixed [r7311].

     
    👍
    1

    Related

    Bugs: #588
    Commit: [r7289]
    Commit: [r7311]


    Last edit: Vidar Hasfjord 2025-08-29
  • Vidar Hasfjord

    Vidar Hasfjord - 2024-11-19

    In [r7381], support for coding fonts with ligatures was added. For example, when using the font Cascadia Code, the double-character operator token "!=" will be rendered as a double-width not-equal operator "≠", the double-character arrow token "->" will be rendered as a double-width right-arrow "→", and a series of dashes "----" will be rendered as a solid line "⎯⎯". Other double-character tokens, like comparison, increment and decrement operators, are also rendered as double-width single characters.

    In the screenshot below, Cascadia Code is used in the first window from the top, and Cascadia Mono (without ligatures) is used in the second window.

    TCoolEdit ligatures screenshot

     
    👍
    1

    Related

    Commit: [r7381]


    Last edit: Vidar Hasfjord 2025-08-29
  • Vidar Hasfjord

    Vidar Hasfjord - 2024-11-23

    After all the fancy refactoring I've done for CoolEdit — rewrites using nice modern C++, while simplifying the code, correcting bugs and eliminating old inefficiencies and workarounds — you would hope to see some performance gains, right?

    Me too. I was very interested to see how my changes have affected the performance. So I added a little benchmark to our CoolDemo example. Alas, any hopes of performance gains were dashed. The new code is several times slower than the old!

    CoolEdit Benchmark screenshot

    In the screenshot above, the results for the old code are on the left, and the results for the new code are on the right. The results on the top are for the first run (cold cache), while the results on the bottom are the best results after a few reruns.

    Notable is the much slower reading speed from file. The new code is 16-20 times slower. The main reason for this is that I replaced the old reading code by (notoriously slow) standard streams to enable support for Unicode. We now read and write all the UTF encoding formats, and with any line ending convention. However, this is where most gains can be had by optimising the code.

    Sadly, the use of std::map and std::unordered_map for editor data structures, rather than the old raw arrays, has also affected performance negatively. Looking at the parsing and rendering speed (paging up to the top through the whole file), the new code is about 2.5 times slower. For text insertion (paste), the new code is 5 times slower. And for jumping around, the new code is 6-7 times slower.

    (Edit: It turns out the code performs well. The slowdown is all to do with replacing ExtTextOut by DrawTextEx. See my later post.)

    However, note that the new code includes corrections, such as refreshing the text correctly, which the old code didn't do properly. This means the new code may do significantly more work for correctness. Nevertheless, there is opportunity for a lot of optimisation here.

     

    Last edit: Vidar Hasfjord 2025-08-29
  • Vidar Hasfjord

    Vidar Hasfjord - 2025-04-17
    • Labels: CoolPrj --> CoolPrj, Cleanup
     
  • Vidar Hasfjord

    Vidar Hasfjord - 2025-08-15

    In my last post, I discussed performance, and I was concerned I would have to do some optimisation work to regain speed on the trunk (version 8), as compared to the old code (version 7). However, it turns out the major performance degradation in paging speed was all down to the replacement of ExtTextOut by DrawTextEx to get support for font ligatures (see earlier post).

    Notably, if you do the same replacement on "branches/7-coolprj-dev" (development branch based on the old code with some fixes and enhancements, including the slow Unicode file operations merged from the trunk), the trunk version is actually faster.

    See results below.


    CoolEdit Benchmark Results

    File: https://sourceforge.net/p/owlnext/code/8323/tree/trunk/source/coolprj/cooledit.cpp (5141 lines)


    CoolDemo 7.0.19.8313-prerelease (branches/7-coolprj-dev)

    Note: TCoolEdit was modified to have no extra line spacing (LineSpacingPercent = 0), just like the trunk version, to make tests comparable.

    1. Opening the file took 219 ms.
    2. Jumping to the last line took 4 ms.
    3. Selecting and copying everything took 1 ms.
    4. Pasting everything took 5 ms.
    5. Paging up to the top took 807 ms.
    6. Undo took 6 ms.

    The whole benchmark took 1045 ms.


    ExtTextOut replaced by DrawTextEx

    Note: This levels the playing field, since the trunk version also uses DrawTextEx.

    1. Opening the file took 219 ms.
    2. Jumping to the last line took 6 ms.
    3. Selecting and copying everything took 1 ms.
    4. Pasting everything took 8 ms.
    5. Paging up to the top took 2022 ms.
    6. Undo took 7 ms.

    The whole benchmark took 2266 ms.


    CoolDemo 8.0.5.8323-prerelease (trunk)

    1. Opening the file took 210 ms.
    2. Jumping to the last line took 6 ms.
    3. Selecting and copying everything took 1 ms.
    4. Pasting everything took 8 ms.
    5. Paging up to the top took 1890 ms.
    6. Undo took 4 ms.

    The whole benchmark took 2121 ms.

    This makes the trunk version ~7% faster in this test.

     
    👍
    1

    Last edit: Vidar Hasfjord 2025-08-15
  • Vidar Hasfjord

    Vidar Hasfjord - 2025-08-15

    Note that the particular benchmark in my last post is kind to the old code, since the test file is a C++ file, and "branches/7-coolprj-dev" has merged the new and improved C++ parser from the trunk. The new parser creates far fewer tokens than the old parser.

    When running benchmarks using old vs new parsers, the performance difference is much larger than shown previously. For example, see benchmark results below using the old resource file parser, which is identical to the old C++ parser except for different keywords.


    CoolEdit Benchmark Results

    File: https://sourceforge.net/p/lima-vva/code/2303/tree/trunk/Lima.rc (1860 lines)


    CoolDemo 7.0.19.8313-prerelease (branches/7-coolprj-dev)

    Note: ExtTextOut was replaced by DrawTextEx, to match the trunk version and level the playing field.

    1. Opening the file took 58 ms.
    2. Jumping to the last line took 4 ms.
    3. Selecting and copying everything took 1 ms.
    4. Pasting everything took 10 ms.
    5. Paging up to the top took 1587 ms.
    6. Undo took 7 ms.

    The whole benchmark took 1669 ms.


    CoolDemo 8.0.5.8323-prerelease (trunk)

    Note: The font size was increased to match the extra line spacing used by the version on "branches/7-coolprj-dev". This means that this test is slightly handicapped by having to draw more pixels.

    1. Opening the file took 52 ms.
    2. Jumping to the last line took 6 ms.
    3. Selecting and copying everything took 1 ms.
    4. Pasting everything took 10 ms.
    5. Paging up to the top took 913 ms.
    6. Undo took 4 ms.

    The whole benchmark took 988 ms.

    This makes the trunk version ~59% faster in this test.

     

    Last edit: Vidar Hasfjord 2025-08-15
  • Erwin Lotter

    Erwin Lotter - 2025-08-15

    Great! This could make a noticeable difference even on fast systems.

     
    👍
    1
  • Vidar Hasfjord

    Vidar Hasfjord - 2025-08-15

    @elotter wrote:

    Great! This could make a noticeable difference even on fast systems.

    Thanks. It was a great relief to find that my code didn't cause unnecessary slowdown.

    Now, it remains to find a way to optimise DrawTextEx, or use it only when needed (for fonts that have coding ligatures). DrawTextEx has a lot of options, so maybe just finding the right flag to set is all it takes to get a nice speed bump.

    (Edit: Optimisations have now been made [r8334].)

    PS. Optimising the file operations should be done as well (replacing slow streams).

     

    Related

    Commit: [r8334]


    Last edit: Vidar Hasfjord 2025-08-17
  • Vidar Hasfjord

    Vidar Hasfjord - 2025-09-03
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -45,6 +45,7 @@
    
     - BUG: CoolPrj: Parser factory functions are exported [r7165].
     - BUG: TCoolEdit::KeyUnTab violates assertion [r7191].
     - BUG: TCoolEdit: Debug assertion fires in TSyntaxParser for isalnum etc. [bugs:#285].
    +- BUG: TCoolEdit: Rendered characters are partly chopped [bugs:#525].
     - BUG: CoolPrj pollutes the global namespace [bugs:#526].
     - BUG: TCoolEdit: Text Block clipboard format is inconsistent [bugs:#551].
     - BUG: TCoolTextWnd::TStreamSelection may truncate, add garbage or cause crash [bugs:#553].
    
     

    Related

    Bugs: #285
    Bugs: #525
    Bugs: #526
    Bugs: #551
    Bugs: #553
    Commit: [r7165]
    Commit: [r7191]

  • Vidar Hasfjord

    Vidar Hasfjord - 2025-10-04
    • status: open --> pending
     

Anonymous
Anonymous

Add attachments
Cancel





MongoDB Logo MongoDB