The Data Cache feature needs some extra controls, which could be implemented as additional default settings. These are needed to address several issues:
1. When requesting very long time ranges, the data cache can become very large, which significantly slows things down.
2. During batch processing, the data cache grows to its maximum size and then swaps constantly, which slows down the whole system, not just the IDL process.
It would be good to add the following controls via default options:
a. Set a maximum size for any data chunk stored in the cache. If a chunk is larger than this limit, do not store it.
b. An option to always delete the contents of the data cache when a new timerange is set. This is needed for batch processing. (If the new timerange is contained in the previous one, then of course do not delete the cache.)
c. An option to fully disable the data cache. This must be transparent to the modules that use the cache: their calls to the data cache objects still work, but nothing is stored or returned.
This option would be great for debugging...
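The three requested controls can be sketched together. This is an illustrative Python sketch, not the actual IDL implementation; the class and method names (`DataCache`, `store`, `retrieve`, `set_timerange`) are hypothetical stand-ins for the real cache API:

```python
# Sketch of the three proposed controls: (a) a per-chunk size cap,
# (b) clear-on-new-timerange, and (c) a disable flag that keeps the
# API working but stores and returns nothing.
class DataCache:
    def __init__(self, max_chunk_bytes=None, enabled=True):
        self.max_chunk_bytes = max_chunk_bytes  # control (a), None = no limit
        self.enabled = enabled                  # control (c)
        self.timerange = None
        self._store = {}

    def set_timerange(self, start, end):
        # Control (b): clear unless the new range is contained in the old one.
        if self.timerange is None or not (self.timerange[0] <= start
                                          and end <= self.timerange[1]):
            self._store.clear()
        self.timerange = (start, end)

    def store(self, dataset_id, data):
        if not self.enabled:
            return  # control (c): transparently drop the store
        if self.max_chunk_bytes is not None and len(data) > self.max_chunk_bytes:
            return  # control (a): chunk too large to cache
        self._store[dataset_id] = data

    def retrieve(self, dataset_id):
        return self._store.get(dataset_id)  # None on a miss or when disabled
```

Callers never see an error from a rejected store; they simply get a cache miss on the next retrieve, which keeps the controls transparent to existing modules.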
Logged In: YES
user_id=1578913
Originator: NO
Caching is always hard to get right, which was part of the reason a core cache facility was introduced. These suggestions are great. Item (c) is tricky because we quickly got into the habit of using the cache to convey data from the reader to the plotter, and sometimes from the plotter to the slicer. So I think we may have to essentially combine (b) and (c), which amounts to saying: for any dataset id, cache just one dataset. I'll take a look at the code...
Logged In: YES
user_id=1578913
Originator: NO
Here's what I propose:
(a) This is easy enough to implement, but it's also possible that something is not being garbage collected. I'll set up a model of this to see what's going on; if datasets are not being collected, a "heap_gc, /verbose" should indicate that a bunch of them are being orphaned. We'd also need to keep supporting the readers and plotters that use the cache to convey data. Note that I rediscovered the "protected" keyword, which I believe was intended to mark this use, but it is not used by any modules.
(b) This is clean and easily describable. I'll implement this right away.
(c) I think we should add a "clear cache before drawing" option to the "Develop" preferences. At the beginning of the drawing cycle, we clear the cache; optionally we also clear after each panel is drawn, with the understanding that slicing will then fail.
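The proposed reset hooks can be sketched as a pair of clear points in the drawing cycle. This is a hypothetical Python illustration; the mode names and the `draw_all` function are invented for the sketch:

```python
# Hypothetical sketch of the "clear cache before drawing" developer option:
# the drawing cycle clears the cache up front, and can optionally clear
# again after each panel is drawn.
AUTO_RESET_OFF, AUTO_RESET_BEFORE_DRAW, AUTO_RESET_AFTER_PANEL = 0, 1, 2

def draw_all(panels, cache, auto_reset=AUTO_RESET_OFF):
    if auto_reset in (AUTO_RESET_BEFORE_DRAW, AUTO_RESET_AFTER_PANEL):
        cache.clear()  # start each drawing cycle from a cold cache
    results = []
    for panel in panels:
        results.append(panel())  # draw one panel
        if auto_reset == AUTO_RESET_AFTER_PANEL:
            cache.clear()  # note: slicers that re-read the cache will fail
    return results
```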
Logged In: YES
user_id=1578913
Originator: NO
(b) I'm implementing (b) as a limit-per-dataset count. This is just as easy to implement and provides more flexibility. For example, 1 gives the batch behavior requested; 2 allows jumping back and forth between two timeranges; 5 is a reasonable limit to keep the clutter down; 50 effectively means no limit. (Note that 0 is not allowed because of reader-to-plotter communication.)
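The limit-per-dataset count can be sketched as a small eviction policy, keeping at most N entries per dataset id and evicting the oldest first. A minimal Python sketch, with hypothetical names (`CountLimitedCache`, `store`, `retrieve`):

```python
# Illustrative sketch of a per-dataset-id count limit: the cache keeps at
# most `count_limit` entries per dataset id, evicting the oldest first.
# A limit of 1 gives the requested batch behavior.
from collections import defaultdict, deque

class CountLimitedCache:
    def __init__(self, count_limit=5):
        assert count_limit >= 1  # 0 would break reader-to-plotter communication
        self.count_limit = count_limit
        self._entries = defaultdict(deque)  # dataset_id -> deque of (timerange, data)

    def store(self, dataset_id, timerange, data):
        q = self._entries[dataset_id]
        q.append((timerange, data))
        while len(q) > self.count_limit:
            q.popleft()  # evict the oldest entry for this dataset id

    def retrieve(self, dataset_id, timerange):
        for tr, data in self._entries[dataset_id]:
            if tr == timerange:
                return data
        return None
```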
Logged In: YES
user_id=1578913
Originator: NO
Another interesting thing I just came across: it looks like (b) should already be working for batch mode. Try setting up a batch, then resetting the cache, then running the batch. You should have just one entry in there (per dataset id): the last one plotted.
Logged In: YES
user_id=1578913
Originator: NO
I've added two defaults, under the "develop" tab:
"cache count limit" and "auto cache reset".
"cache count limit" limits the number of entries of a particular dataset id within a cache. This addresses request (b).
"auto cache reset" triggers cache resets to support developers. Note that "after each panel" will probably break most slicers. This addresses request (c).
I'm not sure what to do with (a). The cache can't simply reject a store request, because of the intra-module communication. The "protected" keyword for store and "release" for retrieve were intended for this purpose, but we'll need to make sure modules use these keywords before implementing (a).
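One way the protected/release keywords could reconcile (a)'s size limit with intra-module communication: protected stores bypass the size check, and a release retrieve removes the conveyed data once it has been consumed. This is only a hypothetical sketch of that idea, not the existing IDL code:

```python
# Hypothetical sketch: `protected` lets intra-module data skip the (a) size
# limit, and `release` removes the conveyed data once the consumer has it.
class Cache:
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self._store = {}

    def store(self, key, data, protected=False):
        if not protected and len(data) > self.max_bytes:
            return  # ordinary oversize data is rejected (control a)
        self._store[key] = data  # protected data is always kept

    def retrieve(self, key, release=False):
        if release:
            return self._store.pop(key, None)  # consume the conveyed data
        return self._store.get(key)
```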
Logged In: YES
user_id=1578913
Originator: NO
I'm wondering if the problems you are seeing in (a) (very large datasets) are due not to the cache but to the way the dataset is constructed. It's likely that the entire dataset is being copied with each bit of data that is appended. Could you point me to the code?
If that is the case, then we should have a way to pre-allocate the space for the dataset, so that each append just fills it in.
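The suspected problem and the proposed fix can be illustrated side by side. Appending by concatenation copies everything accumulated so far on every append (O(n²) total work), while pre-allocating the full buffer and filling it in is O(n). A pure-Python sketch; the same principle applies to growing IDL arrays:

```python
# Suspected pattern: each append copies the whole dataset built so far.
def build_by_append(chunks):
    data = []
    for chunk in chunks:
        data = data + chunk  # copies everything accumulated so far
    return data

# Proposed fix: allocate the full dataset once, then fill it in place.
def build_preallocated(chunks, total_len):
    data = [None] * total_len  # pre-allocate the full dataset
    pos = 0
    for chunk in chunks:
        data[pos:pos + len(chunk)] = chunk  # fill in place, no full copy
        pos += len(chunk)
    return data
```

Both produce the same result; only the amount of copying differs, which matters once the dataset is very large.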