A page for a joint project of Yoav Freund and Kresimir Mihic.
The goal of this project is to demonstrate the advantages of applying an economic model and online learning to the problem of scheduling jobs in a large computer cluster, such as a data center.
The basic idea is that each task is associated with a revenue of the following form:
If the task finishes within time T, the revenue is R; otherwise, the revenue is zero.
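A minimal sketch of this all-or-nothing revenue rule in Python; the names `completion_time`, `deadline`, and `reward` are illustrative, not taken from the project:

```python
def task_revenue(completion_time: float, deadline: float, reward: float) -> float:
    # Full reward R if the task finishes within time T, zero otherwise.
    return reward if completion_time <= deadline else 0.0
```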
On the other hand, there is a cost for running the computers. The cost has two components:
- Fixed cost: amortized cost of hardware and space, X dollars per hour, regardless of utilization.
- Power cost: cost of electricity for running and cooling the computers, Y dollars per kilowatt-hour (kWh).
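For one machine, the hourly operating cost could be computed as in this sketch; the parameter names are assumptions, not project code:

```python
def hourly_cost(fixed_cost_per_hour: float,  # X: amortized hardware/space
                price_per_kwh: float,        # Y: electricity price
                avg_power_kw: float) -> float:
    # One hour at avg_power_kw kilowatts consumes avg_power_kw kWh, so the
    # power cost is Y * avg_power_kw; the fixed cost X is paid regardless
    # of utilization.
    return fixed_cost_per_hour + price_per_kwh * avg_power_kw
```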
We assume that we have several streams of jobs. Jobs in each stream are similar to each other. We currently plan to use three job streams taken from the PARSEC benchmark suite:
- dedup - enterprise storage application; already in use in commercial applications
- ferret - search engine for non-text document data types (e.g. images)
- freqmine - data-mining application targeting e-commerce; already in use in commercial applications
The starting point, or straw man, is that computation resources are not shared. In other words, jobs from each stream, which can be thought of as a client of the data center, run on a separate processor.
The goal of our work is to discover ways in which sharing resources among streams yields an overall gain. This will be measured in terms of increased profit, i.e. an increase in revenue (number of high-paying jobs completed per CPU-hour) and/or a decrease in variable cost (reduced power consumption).
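Putting the two sides together, profit over a run might be tallied as in the sketch below; the names and the flat per-CPU power draw are simplifying assumptions:

```python
def profit(jobs_on_time: int, reward_per_job: float,
           n_cpus: int, hours: float,
           fixed_cost_per_hour: float, price_per_kwh: float,
           avg_power_kw_per_cpu: float) -> float:
    # Revenue: only jobs that met their deadline pay out.
    revenue = jobs_on_time * reward_per_job
    # Cost: every CPU pays the fixed and power cost for the whole run.
    cost = n_cpus * hours * (fixed_cost_per_hour
                             + price_per_kwh * avg_power_kw_per_cpu)
    return revenue - cost
```

Under this accounting, a shared schedule wins if it raises `jobs_on_time` per CPU-hour or lowers the power consumed.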
The experts framework will be used as follows: each expert represents a scheduling policy (which types of jobs to run on the same CPU), and hedging algorithms will be used to find the best policies through trial and error.
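For concreteness, here is a textbook Hedge (multiplicative-weights) loop over a fixed set of policies. It is only a sketch: it assumes every policy's loss can be observed each round, as in simulation; with live trial-and-error feedback, where only the chosen policy's outcome is seen, a bandit variant such as Exp3 would be the natural substitute:

```python
import math

def hedge(loss_fn, n_experts, n_rounds, eta=0.5):
    # loss_fn(t, i) -> loss in [0, 1] of expert (policy) i in round t,
    # e.g. one minus that policy's normalized profit for the interval.
    weights = [1.0] * n_experts
    for t in range(n_rounds):
        losses = [loss_fn(t, i) for i in range(n_experts)]
        # Multiplicative-weights update: policies with higher loss
        # lose weight; the best policy comes to dominate.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(weights)
    return [w / total for w in weights]  # final distribution over policies
```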