
What is Data Hoarding and How it Works

Data hoarding

Hoarding is the process of preloading data into the cache in anticipation of a disconnection, so that the client can continue its operation while disconnected. Hoarding is similar to prefetching used in file and database systems to improve performance.

However, there are important differences between hoarding and prefetching. Prefetching is an ongoing process that transfers soon-to-be-needed files to the cache during periods of low network traffic. Because prefetching, unlike hoarding, runs continuously, keeping its overhead low is important. Hoarding, in turn, is more critical than prefetching, because a cache miss cannot be serviced during a disconnection.


Thus, data hoarding tends to overestimate the client’s need for data. On the other hand, since the cache at the mobile client is a scarce resource, it cannot accommodate excessive estimates. An important parameter is the unit of hoarding, which ranges from a disk block to a file to groups of files or directories. Another issue is when to initiate hoarding. The Coda file system [KS92] periodically runs a process called a hoard walk to ensure that critical files are in the mobile user’s cache.

The decision on which files to cache can be either

(a) assisted by instructions explicitly given by the user or

(b) taken automatically by the system by utilizing implicit information, which is most often based on the past history of file references.

Coda [KS92] combines both approaches in deciding which data to hoard. Data are prefetched using priorities based on a combination of recent reference history and user-defined hoard files.
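To illustrate how a user hint and recent reference history might be blended into a single priority, here is a minimal sketch. It is not Coda's actual formula: the names FileInfo, hoard_priority, and select_hoard_set, the weight alpha, the horizon of 1000 time units, and the 0..1000 hint scale are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size: int            # bytes
    last_ref: int        # logical time of the most recent reference
    user_priority: int   # hypothetical user hint, 0 if none was given


def hoard_priority(info, now, alpha=0.5, horizon=1000):
    """Blend a user hint with a recency score (both scaled to 0..1).

    alpha and horizon are illustrative tuning knobs, not values from Coda.
    """
    age = max(0, now - info.last_ref)
    recency = max(0.0, 1.0 - age / horizon)
    hint = min(info.user_priority, 1000) / 1000
    return alpha * hint + (1 - alpha) * recency


def select_hoard_set(files, now, cache_bytes):
    """Greedily fill the cache with the highest-priority files that fit."""
    chosen, used = [], 0
    ranked = sorted(files, key=lambda f: hoard_priority(f, now), reverse=True)
    for info in ranked:
        if used + info.size <= cache_bytes:
            chosen.append(info.path)
            used += info.size
    return chosen
```

In this sketch, a file the user marked as important can outrank a file that was touched more recently, which mirrors the idea of combining explicit and implicit information. A periodic hoard walk could call select_hoard_set again whenever the reference history changes.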

A tree-based method is suggested in [TLA+95] that processes the history of file references to build an execution tree. The nodes of the tree represent the programs and data files referenced. An edge exists from parent node A to child node B when either program A calls program B or program A uses file B. A GUI is used to assist the user in deploying this tracing facility to determine which files to hoard.
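The parent-child structure described above can be sketched as follows. This is only an illustration under stated assumptions: the trace is assumed to arrive as (parent, child) pairs, and the function names build_execution_tree and files_to_hoard are hypothetical rather than taken from the cited work.

```python
from collections import defaultdict


def build_execution_tree(events):
    """Build a parent -> children mapping from a reference trace.

    events is an iterable of (parent, child) pairs, meaning that program
    `parent` called program `child` or used file `child`.
    """
    children = defaultdict(list)
    for parent, child in events:
        if child not in children[parent]:
            children[parent].append(child)
    return dict(children)


def files_to_hoard(tree, root):
    """Return root and everything reachable from it, in depth-first order."""
    seen, stack = [], [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        # Push children in reverse so they are visited in recorded order.
        stack.extend(reversed(tree.get(node, [])))
    return seen


trace = [
    ("make", "cc"),
    ("make", "Makefile"),
    ("cc", "main.c"),
    ("cc", "util.h"),
    ("vi", "notes.txt"),
]
tree = build_execution_tree(trace)
hoard = files_to_hoard(tree, "make")
```

Selecting the subtree rooted at one program yields the programs and files that this program depends on, which is the kind of choice a user could make through the GUI.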

Besides clarity of presentation to users, the advantage of this approach is that it helps differentiate between the files accessed during multiple executions of the same program. Seer [Kue94] is a predictive caching scheme based on the user’s past behaviour. Files are automatically prefetched based on a measure called semantic distance that quantifies how closely related they are.


The measure chosen is the local reference distance from a file A to a file B. This distance can informally be defined as the number of file references separating two adjacent references to A and B in the history of past file references.
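The informal definition above can be made concrete with a small sketch. It assumes that the history is a flat list of file names, that the distance counts only the references strictly between A and B, and that each reference to A is paired with at most one later reference to B. The function name local_reference_distances is hypothetical, and Seer's actual bookkeeping may differ.

```python
def local_reference_distances(history, a, b):
    """Return one distance per adjacent (a, b) pair found in the history.

    Each distance is the number of references strictly between a reference
    to `a` and the next reference to `b` (an assumption of this sketch).
    """
    distances = []
    last_a = None
    for i, name in enumerate(history):
        if name == a:
            last_a = i
        elif name == b and last_a is not None:
            distances.append(i - last_a - 1)
            last_a = None  # pair each reference to a with at most one b
    return distances


history = ["A", "x", "y", "B", "A", "B", "z", "B"]
a_to_b = local_reference_distances(history, "A", "B")
b_to_a = local_reference_distances(history, "B", "A")
```

Small distances suggest that B tends to be used shortly after A, so the two files are good candidates to hoard together. The measure is directional: the distances from A to B need not match those from B to A.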
