Research aims to cut cloud computing costs
Cloud computing is as much a blessing as it is a burden. Its almost limitless capacity to store and process information comes at a price, and a very high one.
Social media sites such as Facebook and Flickr use cloud computing. As the volume of user-generated data that must be maintained grows, so does the cost of maintaining it. Government agencies such as the Australian Taxation Office, Bureau of Statistics and Treasury are also heavy users of cloud computing services, and their costs are rising. In fact, they spend a lot more than the websites mentioned above. An estimated one billion dollars could be saved if the Australian government develops a strategy for its data centers, the core of cloud computing, covering the next 15 years.
That is why researchers from Swinburne University of Technology's Center for Computing and Engineering Software Systems are looking for ways to reduce the high cost of internet data storage and retrieval in cloud computing. Funded by an Australian Research Council Discovery Project grant, they are developing more cost-effective models for heavy users of cloud computing.
Professors Yun Yang and John Grundy (from Swinburne) and Dr. Jinjun Chen (now with the University of Technology, Sydney) have been exploring the management of raw data and of the intermediate datasets generated by processing it. "The tradeoff is going to be between storage cost and computation cost," said Grundy. "Finding this balance is complex, and there are currently no decision-making tools to advise on whether to store or delete intermediate datasets, and if to store, which ones." To overcome this, the researchers have developed a mathematical model that factors in the size of the initial datasets, the rates charged by the service provider and the amount of intermediate data stored over a specified period. "The formula can be used to find the best deals for storing data in the cloud," noted Yang.
They have also developed an Intermediate Data-dependency Graph (IDG) that helps users decide whether they are better off spending money on storage or computation for intermediate datasets. "IDG records how each intermediate dataset is generated from the one before it and shows the generation relationship between them. This means if a deleted intermediate dataset needs to be regenerated, the IDG could find the nearest predecessor of the dataset. This can save computation cost, time and electricity consumption," Grundy added.
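The lookup Grundy describes can be illustrated with a minimal Python sketch. It is not the researchers' implementation: it assumes the simplest case, in which each dataset has a single direct predecessor, and the class, the function names and the compute hours are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    name: str
    gen_hours: float                     # hypothetical compute hours to build it from its direct predecessor
    stored: bool                         # True if a copy is currently kept in cloud storage
    parent: Optional["Dataset"] = None   # direct predecessor; None for the raw data


def nearest_stored_predecessor(target: Dataset) -> Optional[Dataset]:
    """Walk back along the chain until a dataset that is still stored is found."""
    node = target.parent
    while node is not None and not node.stored:
        node = node.parent
    return node


def regeneration_hours(target: Dataset) -> float:
    """Compute hours needed to rebuild a deleted dataset from its nearest stored predecessor."""
    if target.stored:
        return 0.0
    hours = target.gen_hours
    node = target.parent
    # Every deleted dataset between the target and the stored predecessor must be rebuilt too.
    while node is not None and not node.stored:
        hours += node.gen_hours
        node = node.parent
    return hours


# Illustrative chain (invented values): raw -> d1 -> d2 -> d3, with d2 and d3 deleted.
raw = Dataset("raw", 0.0, True)
d1 = Dataset("d1", 2.0, True, raw)
d2 = Dataset("d2", 5.0, False, d1)
d3 = Dataset("d3", 1.0, False, d2)

print(nearest_stored_predecessor(d3).name)  # d1
print(regeneration_hours(d3))               # 6.0
```

In this example, rebuilding d3 starts from d1 rather than from the raw data, so only the two deleted steps are recomputed.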
The researchers have been evaluating the solutions by simulating a pulsar survey used to crunch information from radio telescopes. "Searching for pulsars - rapidly spinning stars that beam light - is a typical scientific application. It generates vast amounts of data - typically at one gigabyte per second. That data will be processed and may be reanalyzed by astronomers all over the world for years to come," stated Yang. "We used the prices offered by Amazon cloud's cost model for this evaluation. For example, 15 cents per gigabyte per month for storage and 10 cents per hour for computation."
From one set of raw beam data collected by the telescope, the pulsar application generated six milestone intermediate datasets. The model generated three different cost scenarios. For one hour of observation data from the telescope, with intermediate data kept for 30 days, the minimum-cost strategy came to $200; storing no intermediate data and regenerating it when needed cost $1,000; and storing all intermediate data cost $390. This gave the researchers options for which data to keep and which to delete. "We could delete the intermediate datasets that were large but with lower generation expenses, and save the ones that were costly to generate, even though small," Yang said.
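Yang's rule of thumb can be sketched as a per-dataset comparison in Python. The two rates are the ones quoted for the evaluation; the dataset sizes, compute hours and reuse frequency are invented for illustration, and the rule treats each dataset independently, which is a simplification rather than the researchers' actual model.

```python
STORAGE_RATE = 0.15  # dollars per gigabyte per month (rate quoted in the article)
COMPUTE_RATE = 0.10  # dollars per compute hour (rate quoted in the article)


def monthly_storage_cost(size_gb: float) -> float:
    """Cost of keeping a dataset in cloud storage for one month."""
    return size_gb * STORAGE_RATE


def monthly_regeneration_cost(gen_hours: float, uses_per_month: float) -> float:
    """Cost of recomputing a deleted dataset each time it is needed in a month."""
    return gen_hours * COMPUTE_RATE * uses_per_month


def should_store(size_gb: float, gen_hours: float, uses_per_month: float) -> bool:
    """Keep the dataset only if storing it is cheaper than regenerating it on demand."""
    return monthly_storage_cost(size_gb) < monthly_regeneration_cost(gen_hours, uses_per_month)


# Hypothetical datasets, each needed once a month.
large_but_cheap = should_store(size_gb=1000.0, gen_hours=2.0, uses_per_month=1.0)
small_but_costly = should_store(size_gb=1.0, gen_hours=50.0, uses_per_month=1.0)

print(large_but_cheap)   # False: about $150 to store versus about $0.20 to regenerate
print(small_but_costly)  # True: about $0.15 to store versus about $5 to regenerate
```

A fuller treatment would also account for dependencies between datasets, since deleting a predecessor raises the cost of regenerating everything derived from it.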
These are only a few of the solutions the researchers have produced. To cater to different sectors, the group is also working on models that will allow users to determine the minimum cost on the fly and as frequently as desired.