Despite saying in #385 that my focus was on UX more than raw performance, I've experimented some more with gscan2pdf's zooming, and also run it under sysprof to collect profiling data. Here's what I discovered:
Number 5 I've been struggling to make sense of, because once you've reached that level of zoom each level beyond it adds nothing to the view except blank canvas. The same number of page elements is being rendered each time, just smaller. In principle it might take the same amount of time to render at each level, but it really shouldn't take longer.
The fact that render time continues to increase makes me suspect the canvas area around the page is rendered as part of the page itself. (IOW, zooming out increases the virtual page size, and thus the total area that has to be rendered, even once we've moved beyond the actual page boundaries.)
Given that a larger page apparently == a longer rendering time, it's possible some efficiency could be gained from drawing/handling the background canvas and the content area separately: render only within the bounds of the content area, and display that rendered view on the canvas which (outside of that region) can be drawn with just a simple background fill.
Doing that, progressive zoom-outs beyond zoom-to-fit would leave the actual content region unchanged but render it into a smaller area each time, rather than having the area to be rendered grow larger each time relative to a progressively smaller content area.
Either way, by not including the canvas background/padding in the area to be rendered, with each successive zoom outward the actual content region should hopefully draw, if not faster, at least no slower.
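To put rough numbers on that, here's a toy model of the two approaches. It's only a sketch of the geometry, not gscan2pdf's actual drawing code: `Rect`, `content_rect` and `visible_area_in_page_units` are names I made up, and I'm assuming the page is simply centred in the viewport.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    @property
    def area(self) -> float:
        return self.w * self.h


def content_rect(viewport_w, viewport_h, page_w, page_h, zoom):
    """Device-space rectangle the page occupies when centred in the viewport.

    Hypothetical helper: only this rectangle would need real rendering;
    everything outside it could be a plain background fill.
    """
    w, h = page_w * zoom, page_h * zoom
    return Rect((viewport_w - w) / 2, (viewport_h - h) / 2, w, h)


def visible_area_in_page_units(viewport_w, viewport_h, zoom):
    """Area covered, in page units, if the whole viewport is rendered as 'page'."""
    return (viewport_w / zoom) * (viewport_h / zoom)


# Assumed sizes: an 800x600 viewport and a 400x300 page (zoom-to-fit is 2.0).
for zoom in (2.0, 1.0, 0.5):
    whole = visible_area_in_page_units(800, 600, zoom)
    content = content_rect(800, 600, 400, 300, zoom).area
    print(f"zoom {zoom}: whole-canvas area {whole:.0f}, content-only area {content:.0f}")
```

In this model the whole-canvas area quadruples every time the zoom halves, while the content-only area shrinks by the same factor. If render time tracks the area being rendered, that would match the slowdown I'm seeing beyond zoom-to-fit.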
Number 6 is also a curiosity. Granted, call time data isn't immediately helpful in understanding why so much time is spent in fontconfig. (Especially since I assume you're not making library calls directly, but instead are calling higher-level APIs that indirectly result in calls to libfontconfig.) However, the simple fact that every zoom change incurs the full penalty points to a prime opportunity to improve performance with the addition of caching. Given the time spent in fontconfig preparing a page view, is all of the data transient/useless beyond that single rendering? Or is there anything that could possibly be retained and reused from one zoom level to the next, so it doesn't have to be recomputed from the ground up every time?
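As a purely illustrative sketch of the kind of caching I mean (I don't know what the real call path into fontconfig looks like, so `resolve_font` and `render_page` below are hypothetical stand-ins, not gscan2pdf or fontconfig APIs): anything that doesn't depend on the zoom level gets looked up once and reused, and only the scaling is redone per zoom change.

```python
from functools import lru_cache

lookups = 0  # counts how often the "expensive" path actually runs


@lru_cache(maxsize=128)
def resolve_font(family: str) -> str:
    """Stand-in for a costly, zoom-independent font lookup (hypothetical)."""
    global lookups
    lookups += 1
    return f"resolved:{family}"


def render_page(words, zoom):
    """Pretend render: resolve each word's font, then scale its size by zoom."""
    return [(resolve_font(family), size * zoom) for family, size in words]


# Three text elements using two distinct font families (made-up data).
words = [("Sans", 12.0), ("Serif", 10.0), ("Sans", 8.0)]

# Three zoom changes, but the slow lookup only runs once per distinct family.
for zoom in (1.0, 0.5, 0.25):
    render_page(words, zoom)
```

After the loop, `lookups` is 2 rather than 9. Whether the real fontconfig work can be split into a zoom-independent part like this is exactly the question I'm asking.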