e-Science challenges in Astronomy and Astrophysics
The astronomical community is intricately linked with e-Science by the nature of the research conducted. Efforts to formalise this relationship are highlighted by the creation of Virtual Observatory (VO) networks both locally (AusVO) and internationally (the US-based NVO, for example). The goal of these networks is to present astronomical archives as an integrated and interoperating virtual observatory, and to develop and deploy the tools necessary for international utilisation of the archives. This has resulted in a set of data standards published by the International Virtual Observatory Alliance (IVOA). While much has been achieved, the VO networks remain a fledgling endeavour in some respects, with many challenges yet to be met, especially as we move towards peta-scale data-sets. Furthermore, e-Science has now moved beyond the management of observational data to broadly encompass all computationally intensive areas of astronomy and astrophysics. The focus of the workshop will be on e-Science challenges within astronomy and astrophysics, the innovations these challenges are producing now, and the innovations that will be required in the near future. Broadly speaking, this covers the creation, processing and visualization of data (via either computation or observation), which in turn relates to data-management and accessibility (and, by extension, the VO).
Computational Science and Engineering Workshop
Computational Science and Engineering is the area of scientific research that merges science, mathematics, and computing to produce computer models and simulations that allow users to study complex and challenging scientific behaviour. Computational code developers face a growing number of challenges in today's computational climate. The complexity of the science and mathematics, coupled with heterogeneous computer architectures and inadequate program development tools, complicates software development. New processing hardware demands that algorithms be re-thought in terms of massive, and probably asynchronous, parallelisation schemes. The ageing legacy libraries on which most user-driven science requirements depend are long overdue for replacement with software suited to today's more multifaceted computer architectures. To adapt to the current generation of kilo-core processing, new algorithms, software designs and advanced parallel programming techniques need to be developed. Consequently, the code development process is difficult. This workshop aims to bring together computational scientists and engineers of different disciplines to discuss new issues, tackle complex problems and find advanced solutions that propagate new trends in Computational Science. It will afford the opportunity for closer cooperation between computational modellers, who can share results attained from model developments and applications.
High-Performance Computing in the Life Sciences
Today, there are a variety of parallel and distributed high-performance computing platforms including multi-core architectures, GPGPUs/GPUs, clusters, grids, and clouds that, together with new programming paradigms such as MapReduce and Many Task Computing, offer a wide range of possibilities for solving compute- and data-intensive problems in an efficient manner. The purpose of this workshop is to provide the opportunity for participants to discuss and share the latest research in parallel and distributed high performance computing systems applied to problems in the Life Sciences, i.e. all fields of science that involve the scientific study of living organisms, such as biology, biochemistry, biophysics, botany, ecology, food science, medicine, medical imaging, neuroscience, pharmacology, physiology, systems biology, and zoology.
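As a toy illustration of the MapReduce paradigm applied to a typical life-science task, the sketch below counts k-mers across DNA sequencing reads; the reads, function names and choice of k are illustrative only, not tied to any particular framework:

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool

K = 3  # k-mer length (illustrative)

def map_kmers(read):
    """Map step: emit a count table of every k-mer in one sequencing read."""
    return Counter(read[i:i + K] for i in range(len(read) - K + 1))

def reduce_counts(a, b):
    """Reduce step: merge two partial k-mer count tables."""
    a.update(b)
    return a

if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGT", "TTACGG"]  # toy DNA reads
    with Pool() as pool:
        partial = pool.map(map_kmers, reads)   # map phase runs across cores
    totals = reduce(reduce_counts, partial, Counter())
    print(totals["ACG"])  # "ACG" appears once in each read, so prints 3
```

On a cluster or cloud deployment the same two functions would be handed to a MapReduce runtime rather than a local process pool; only the execution substrate changes.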
Parallel Optimisation and Parameter Fitting
This workshop focuses on the design and application of optimisation algorithms to real problems in computational science. Emphasis is placed on algorithms and methods that are particularly suited to parallel and distributed computing environments, such as Grids, Peer-to-Peer Networks, Multi-Core systems, GPGPU and Cloud Computing, due to the practical needs of many of the target problems. In addition, we are interested in optimisation problem solving by evolutionary approaches or hybrid evolutionary approaches. The workshop will consider practical implementations, including all aspects from user interface design and details of optimisation methods to system architecture and access to resources. Discussion of applications will also address the power and usefulness of these methods for optimisation and parameter fitting of real-world problems in areas such as Arts, Humanities and e-Social Science, Bioinformatics and Health, Physical Sciences and Engineering, and Climate and Earth Sciences.
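A minimal sketch of the kind of parallel evolutionary parameter fitting in scope here, assuming a toy objective (fitting a line y = 2x + 1 by least squares); the population size, mutation scheme and all names are illustrative:

```python
import random
from multiprocessing import Pool

def fitness(params):
    """Toy objective: sum of squared errors fitting y = a*x + b to data."""
    a, b = params
    data = [(x, 2.0 * x + 1.0) for x in range(10)]  # target: a=2, b=1
    return sum((a * x + b - y) ** 2 for x, y in data)

def mutate(params, sigma=0.1):
    """Gaussian perturbation of a candidate parameter vector."""
    return tuple(p + random.gauss(0.0, sigma) for p in params)

if __name__ == "__main__":
    pop = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(40)]
    with Pool() as pool:
        for _ in range(100):
            # Fitness evaluation is embarrassingly parallel: each candidate
            # is scored independently, here across local cores.
            scores = pool.map(fitness, pop)
            ranked = [p for _, p in sorted(zip(scores, pop))]
            elite = ranked[:10]                      # keep the best quarter
            pop = elite + [mutate(random.choice(elite)) for _ in range(30)]
    best = min(pop, key=fitness)
    print(best)  # should approach (2.0, 1.0)
```

On a Grid or cloud the `pool.map` call is the natural point to distribute work, since candidate evaluations share no state.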
Exploring the Legal and Policy Aspects of Accessing and Making Use of Scientific Knowledge and Information
In recent years, e-Science, and e-Research more generally, has grown considerably and the scope of application now covers research domains from science and engineering through to the arts and humanities and the social sciences. This workshop will explore the legal and policy aspects of accessing and making use of scientific knowledge and information. Its focus will be the steps that are being taken, and will need to be taken in the future, to better enable scientific data to be shared, used and reused. In this session, we shall discuss open access regimes that might feasibly form the basis of policy frameworks for allowing public access to and use of public sector information, and allowing better access to publicly funded research. We shall also discuss the various aspects of usability and user-centred design that have emerged from e-Science and related Web 2.0 research projects. We hope to identify ways in which e-Science tools and techniques can be developed with the aim of involving experts within the community whose skills can be drawn upon.
High Resolution Tiled Display Walls
Speakers include: Prof. Jurgen Schulze, Calit2; Brett Rolosen, AARNET
Scalable high-resolution tiled display walls are becoming increasingly important to decision makers and researchers because high pixel counts in combination with large screen areas facilitate content rich, simultaneous display of computer-generated visualization information and high-definition video data from multiple sources. This tutorial is designed to cater for new users as well as researchers who are currently operating tiled display walls or 'OptiPortals'. We will discuss the current and future applications of display wall technology and explore opportunities for participants to collaborate and contribute in a growing community. Multiple tutorial streams will cover both hands-on practical development, as well as policy and method design for embedding these technologies into the research process.
Attendees will be able to gain an understanding of how to get started with developing similar systems themselves, in addition to becoming familiar with typical applications and large-scale visualisation techniques. Presentations in this tutorial will describe current implementations of tiled display walls that highlight the effective usage of screen real-estate with various visualization datasets, including collaborative applications such as visualcasting, classroom learning and video conferencing. A feature presentation for this tutorial will be given by Jurgen Schulze from Calit2 at the University of California, San Diego. Jurgen is an expert in scientific visualization in virtual environments, human-computer interaction, real-time volume rendering, and graphics algorithms on programmable graphics hardware.
The Microsoft Biology Foundation
The aim of the Microsoft Biology Foundation (MBF) project is to produce a well-architected and comprehensively documented library of common functionality related to bioinformatics and genomics, with the intention of making it easier to write life science applications on the Windows platform. Using C# and the .NET 4.0 framework provides additional flexibility for the developer: over 70 .NET programming languages are compatible, from Visual Basic and Python to C++ and F#. It also leverages the power of .NET (over 15,000 pre-written functions) and takes advantage of .NET Parallel Extensions, a new feature that can parallelise algorithms across all cores and processors of the local machine.
This demonstration will include a brief tour of the MBF library, including details of its free, open source, community-curated and community-owned philosophy and how scientists and developers can participate in future development. We will also demonstrate the flexibility and usability of the library through a range of applications, including a DNA sequence assembler using the Windows Presentation Foundation, an add-in for Microsoft Excel integrating bioinformatics functionality directly with the spreadsheet, access to webservices including demonstration of the Microsoft cloud computing solution Azure, and integration with HPC and scientific workflows. We will also provide a brief introduction to software development using MBF.
Cloud Computing for eScience Applications
Duration: half-day tutorial (four hours)
Cloud computing offers a potential mechanism to increase the efficiency of current research, ensure continuity of critical data and enable new kinds of research not currently feasible. In this model, researchers focus on the higher levels of the software stack: applications and innovation, not low-level infrastructure. The cloud service providers deliver economies of scale and capabilities driven by a large market base and energy-efficient infrastructure. In this tutorial we will cover the basics of cloud computing platforms, specifically Microsoft's Windows Azure and Amazon Web Services, and their associated application programming models. Included in this discussion will be a comparison between the two cloud computing platforms, highlighting their unique capabilities and differences. Following this introduction to two distinct cloud platforms, we will present examples from actual eScience and technical computing applications that illustrate how cloud computing platforms can be used as an intellectual amplifier for research.
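A common programming model on both platforms is task farming: independent work items consumed by a pool of workers. The pattern can be sketched locally with only the Python standard library; the data and analysis function are placeholders, and on Azure or EC2 the process pool would be replaced by worker instances pulling messages from a hosted queue:

```python
from concurrent.futures import ProcessPoolExecutor

def analyse(chunk):
    """Stand-in for a compute-intensive analysis of one data chunk."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    # Partition the input into independent chunks (the "task queue").
    chunks = [range(i * 1000, (i + 1) * 1000) for i in range(8)]
    # Locally, a process pool plays the role of the cloud worker fleet;
    # in the cloud, each task becomes a queue message consumed by a VM.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(analyse, chunks))
    print(sum(results))
```

The key property making this cloud-friendly is that tasks share no state, so the worker count can be scaled elastically without changing the application code.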
Scientific Workflows: The Pegasus Workflow Management System Example
Length of tutorial: Half Day
Content level: 50% introductory, 30% intermediate
Keywords - Scientific workflows, Workflow Management, Resource Provisioning, Pegasus, Condor DAGMan, Condor Glidein
Workflows are a key technology for enabling complex scientific applications. They capture the interdependencies between data transformations and analysis steps, as well as the mechanisms to execute them in a distributed environment in a reliable and efficient fashion. Workflows can capture processes at different levels of abstraction, and also provide the provenance infrastructure necessary for scientific reproducibility and sharing. There is a spectrum of workflow systems used today in a variety of scientific disciplines such as astronomy, bioinformatics, and physics. In some cases workflow systems focus on service composition, standalone application composition, or both. In this tutorial we will examine the opportunities and challenges of designing and running scientific workflows in distributed environments such as the TeraGrid and the Open Science Grid, and in cloud environments such as Amazon EC2.
We will also explore the design and functionality of the Pegasus Workflow Management System, which is composed of the Pegasus Workflow Mapper and the Condor DAGMan workflow execution engine. Pegasus allows users to design workflows at a high level of abstraction and then automatically maps them to distributed resources. Through hands-on exercises, we will cover issues of workflow composition (how to design a workflow in a portable way) and workflow execution (how to run the workflow efficiently and reliably). Lately, there has been interest in running workflows across the TeraGrid and the Open Science Grid (OSG). We will cover the challenges involved in running across OSG and TeraGrid, and the differences between the two environments.
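The core behaviour such an engine automates, running each task only once its dependencies have completed, can be sketched independently of Pegasus's actual APIs with a small topological-order executor; the task names and structure below are hypothetical:

```python
def run_workflow(tasks, deps):
    """Execute tasks in dependency order (a Kahn-style topological sort).

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> list of task names it depends on
    """
    done, order = set(), []
    remaining = set(tasks)
    while remaining:
        ready = [t for t in remaining if all(d in done for d in deps.get(t, []))]
        if not ready:
            raise ValueError("cycle detected in workflow")
        for t in sorted(ready):  # deterministic here; a real engine runs these in parallel
            tasks[t]()
            done.add(t)
            order.append(t)
        remaining -= set(ready)
    return order

if __name__ == "__main__":
    log = []
    tasks = {name: (lambda n=name: log.append(n)) for name in "ABCD"}
    deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}  # diamond-shaped DAG
    print(run_workflow(tasks, deps))  # prints ['A', 'B', 'C', 'D']
```

A system like Pegasus adds what this sketch omits: mapping each abstract task to a concrete site, staging its data, and retrying failures.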
Users now routinely run workflows with thousands of tasks. In some cases, they model their workflows as workflow of workflows where the total number of tasks reaches hundreds of thousands. An important component of the tutorial will be how to monitor, debug and analyze such workflows.
In order to scale up their scientific applications, users routinely provision resources beforehand on community grids like the TeraGrid or on the cloud. We will introduce tools like Corral WMS and Wrangler that allow users to provision and manage Condor glideins on community grids and on clouds, respectively. The SCEC CyberShake project routinely provisions resources using the Corral Glidein Service on the TeraGrid, and then runs millions of jobs on these resources using Pegasus-WMS.
Pegasus-WMS has been in development for more than nine years and is used in production by several scientific applications. The Corral Glidein Service is the result of three years of development as part of the Pegasus project. There is currently an ongoing effort to integrate Corral with GlideinWMS (the glidein tool used on OSG).
Attendee prerequisites (if any): The hands-on exercises will be done on a VMware virtual machine image that is configured to act as a computational grid. To run the image, attendees will require VMware Player or VMware Fusion installed on their laptops. Please note that while VMware Player is a free download for Windows, Mac users must purchase VMware Fusion to run the virtual image. Please contact the presenter beforehand if you have a Mac or Linux laptop and do not have VMware Fusion installed. Participants will be expected to bring their own laptops with the following software installed: an SSH client, a web browser, and a PDF reader.
Last updated: 20/10/10