#2884 Canvas placing on AI/MP Model objects not working properly anymore

Milestone: 2024.1
Status: Started
Priority: Low
Updated: 2024-11-03
Created: 2024-04-20
Private: No

Hi there,

TL;DR
Something broke with the new OSG pager threads when running with --props:/sim/rendering/database-pager/threads set to a value greater than 1.

The offending commits are probably the ones related to the OSG paging and Scenery instantiation in combination with threading:

flightgear: ab1828365 (Enable /sim/rendering/database-pager/threads, 2024-02-24)
simgear:    6a2a33bc  (Implement instancing for scenery objects, 2024-02-24)

Background
In 2020 I added code to the C182 to have a custom registration painted onto a model.
At the time I wrote that addition in 2020, it worked [1].
Also, when implementing the same thing in November 2023 for the C172 and DA40NG, I did not see this issue [2].

Problem description
For some time now, adjusting the registration canvas on remote planes has been broken in a very curious way (on fgfs 2020.4):
For the first instance, the registration is not shown.
For the other instances, everything works fine.

Details
The interesting thing is that the canvas properties always show a "match": they report that the canvas is indeed matching the selected model.
The actual content of the canvas also shows fine if I additionally draw it to a canvas window (screenshot in the GitHub ticket linked below).

I did a thorough analysis and bisect, and documented the results and tests in my GitHub ticket:
https://github.com/HHS81/c182s/issues/584#issuecomment-2063950321


Workaround
When the first instance disconnects from MP and reconnects, it starts to work again, and the first instance sees the correct registration on the remote model. It does not seem to matter whether this is a local LAN test or goes via MP servers.
Setting "--props:/sim/rendering/database-pager/threads=1" with a recent, cleanly compiled build of next fixes the issue.

Discussion

  • Stuart Buchanan

    Stuart Buchanan - 2024-05-01

    Hi Benedikt,

    Thanks for the detailed analysis you did. I think I've tracked this down, but I'm not 100% sure. Could you test the attached patch with multiple OSG threads and see if it resolves the problem for you?

    The background is that the Nasal loader isn't threadsafe, and I think can end up over-writing itself.
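
The kind of overwrite described above can be illustrated with a minimal sketch. It is written in Python rather than the C++ of the actual loader, it is not the attached patch (which is not reproduced in this ticket), and every name in it (`ScriptLoader`, `_current_model`, the model paths) is hypothetical. It assumes a loader that keeps per-load state in one shared slot, and shows how serialising `load()` with a lock keeps a second thread from replacing that slot mid-load.

```python
import threading
import time


class ScriptLoader:
    """Hypothetical loader that keeps per-load state in one shared slot."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current_model = None  # shared scratch state (assumed, for illustration)
        self.modules = {}

    def load(self, model_path):
        # Without this lock, another pager thread could overwrite
        # _current_model between the assignment and its use below.
        with self._lock:
            self._current_model = model_path
            time.sleep(0.001)  # widen the window in which a race would occur
            self.modules[model_path] = f"namespace:{self._current_model}"


loader = ScriptLoader()
paths = [f"Models/aircraft-{i}.xml" for i in range(8)]  # made-up paths
threads = [threading.Thread(target=loader.load, args=(p,)) for p in paths]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the lock in place every model ends up with its own namespace.
mismatched = [p for p in paths if loader.modules[p] != f"namespace:{p}"]
print(mismatched)  # prints []
```

Removing the `with self._lock:` line makes mismatches possible, which is the pattern a non-threadsafe loader would show when several pager threads load models at once.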

     
  • Stuart Buchanan

    Stuart Buchanan - 2024-05-01
    • status: New --> Started
     
  • Benedikt Hallinger

    Hi Stuart,
    thanks for investigating.

    Recompiled next with the patch applied as of:

    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
            modified:   src/Scripting/NasalModelData.cxx
            modified:   src/Scripting/NasalModelData.hxx
    
    
    Compiled:    2024-05-02 15:13:32
    fgmeta:      4c09a9c      -/Ticket_2878 (local)
    flightgear:  941bc8dda    origin/next (https://git.code.sf.net/p/flightgear/flightgear)
    fgdata:      71dcf80dd    origin/next (git://git.code.sf.net/p/flightgear/fgdata)
    simgear:     41af35f2     origin/next (https://git.code.sf.net/p/flightgear/simgear)
    

    Results:

    • Confirmed: it still works with the patch applied and threads=1
    • Unfortunately, with threads=2 and threads=4 it does not work but exhibits the described behavior :(
     
  • Stuart Buchanan

    Stuart Buchanan - 2024-05-24

    Hi Benedikt,

    I recently pushed a number of fixes for multithreading that you might want to try, but I'm not optimistic that they will have fixed the problem as none of them are related to Nasal or Canvas. :(

    However, I also pushed fixes to cmake to enable ThreadSanitizer (aka TSAN) which may make it easier to diagnose.

    If you have the time and inclination, could you create a build with ENABLE_TSAN=ON (you'll need to build both simgear and flightgear), repro the problem, and then send me the logs? There should be lots of information about thread race conditions reported to STDERR.

    You might find this page useful: https://github.com/google/sanitizers/wiki/ThreadSanitizerFlags

    In particular, you can set the logpath to something other than STDERR.
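
As a minimal sketch of redirecting the reports, the helper below builds an environment with `TSAN_OPTIONS` set. `TSAN_OPTIONS` and its `log_path` flag come from the ThreadSanitizer flags page linked above; the helper name `tsan_environment` and the path `/tmp/fgfs-tsan` are assumptions made for illustration.

```python
import os


def tsan_environment(log_path, base_env=None, extra_flags=None):
    """Return a copy of the environment with TSAN_OPTIONS set.

    TSAN_OPTIONS takes space-separated key=value flags; with log_path set,
    reports go to files named <log_path>.<pid> instead of STDERR.
    """
    env = dict(os.environ if base_env is None else base_env)
    flags = {"log_path": log_path}
    flags.update(extra_flags or {})
    env["TSAN_OPTIONS"] = " ".join(f"{key}={value}" for key, value in flags.items())
    return env


# Hypothetical log location; any writable path prefix should do.
env = tsan_environment("/tmp/fgfs-tsan", base_env={})
print(env["TSAN_OPTIONS"])  # prints log_path=/tmp/fgfs-tsan
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when starting the TSAN-enabled build.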

    Alternatively, could you send me a more detailed repro scenario - I've tried reproing this with MP loopback on the c182s but the registration looks OK to me.

    Thanks,

    -Stuart

     
  • Stuart Buchanan

    Stuart Buchanan - 2024-05-24
    • assigned_to: Stuart Buchanan
     
  • Benedikt Hallinger

    Hey Stuart,
    detailed reproduction steps and a very detailed analysis are in my GitHub ticket: https://github.com/HHS81/c182s/issues/584#issuecomment-2063950321

    • Basically, launch a first instance with these additional command-line options in your launcher: "--multiplay=in,10,127.0.0.1,5010 --multiplay=out,10,127.0.0.1,5011 --callsign=T-EST1 --airport=LOWI --parking-id=GA05"
    • Then launch a second one (separate shell) with "--multiplay=in,10,127.0.0.1,5011 --multiplay=out,10,127.0.0.1,5010 --callsign=T-EST2 --airport=LOWI --parking-id=GA06"
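
The two option sets above mirror each other: each instance listens on the port the other one sends to. A minimal sketch that generates both, so the ports cannot be mixed up, could look like this. The helper name `loopback_args` and its parameter names are made up for illustration; the option strings themselves are the ones quoted above.

```python
def loopback_args(callsign, in_port, out_port, parking_id,
                  host="127.0.0.1", rate_hz=10, airport="LOWI"):
    """Build the multiplayer loopback options for one instance."""
    return [
        f"--multiplay=in,{rate_hz},{host},{in_port}",
        f"--multiplay=out,{rate_hz},{host},{out_port}",
        f"--callsign={callsign}",
        f"--airport={airport}",
        f"--parking-id={parking_id}",
    ]


# The second instance swaps the in/out ports of the first.
first = loopback_args("T-EST1", in_port=5010, out_port=5011, parking_id="GA05")
second = loopback_args("T-EST2", in_port=5011, out_port=5010, parking_id="GA06")

print(" ".join(first))
print(" ".join(second))
```

Each list can be appended to whatever command starts the simulator in the respective shell.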

    Basically, the problem occurs when I launch two instances connected through real multiplayer (so really two dedicated instances, though it is OK to run them on one machine). It also appeared in a real MP environment.

    I'll try the TSAN thing.

     
  • Benedikt Hallinger

    Regarding the TSAN build: it turns out I can't run it. It overloads my system and the process runs out of memory (OOM).
    Maybe you have more luck?

     
  • Benedikt Hallinger

    But without TSAN, it looks somewhat better now as of:

    Compiled:    2024-05-24 21:21:53
    fgmeta:      4c09a9c      -/Ticket_2878 (local)
    flightgear:  f854cef91    origin/next (https://git.code.sf.net/p/flightgear/flightgear)
    fgdata:      b7c5dc404    origin/next (git://git.code.sf.net/p/flightgear/fgdata)
    simgear:     44d4eeba     origin/next (https://git.code.sf.net/p/flightgear/simgear)
    

    With --props:/sim/rendering/database-pager/threads=<p>:

    p        Result
    p=1      5 tests OK (which was also good before)
    p=2      5 tests OK (which was broken before already)
    p=3      5 tests OK
    p=4      1 test BAD (segfault at start); 1 test BAD (not showing, as described above); 3 tests OK
    unset    4 tests OK; 1 test BAD

    BUT, even with p=2, after disconnecting and reconnecting, the rejoined plane does not show the canvas in the instance that stayed connected. With p=1 this is not the case; there, reconnecting works.

    So to summarize: the case where both instances see a remote plane for the first time is much better now.
    But rejoining reliably triggers the issue, while with p=1 all is fine, always.

     
  • ranguli

    ranguli - 2024-11-03
    • Milestone: 2020.4 --> 2024.1
     
