The XCAT Science Portal

Sriram Krishnan, Randall Bramley, Dennis Gannon, Madhusudhan Govindaraju, Rahul Indurkar, Aleksander Slominski, Benjamin Temko
Department of Computer Science, Indiana University, Bloomington, IN

Jay Alameda
National Computational Science Alliance, IL

Richard Alkire, Timothy Drews, Eric Webb
Department of Chemical Engineering, University of Illinois, Urbana-Champaign, IL

Abstract

The design and prototype implementation of the XCAT Grid Science Portal is described in this paper. The portal lets grid application programmers easily script complex distributed computations and package these applications with simple interfaces for others to use. Each application is packaged as a “notebook” which consists of web pages and editable parameterized scripts. The portal is a workstation-based specialized “personal” web server, capable of executing the application scripts and launching remote grid applications for the user. The portal server can receive event streams published by the application and grid resource information published by Network Weather Service (NWS) [32] or Autopilot [15] sensors. Notebooks can be “published” and stored in web based archives for others to retrieve and modify. The XCAT Grid Science Portal has been tested with various applications, including the distributed simulation of chemical processes in semiconductor manufacturing and collaboratory support for X-ray crystallographers.

Keywords: Grid, Science Portal, Distributed Simulations, Scripted Applications.

1 Introduction

The concept of a Science Portal was first introduced by the National Computational Science Alliance (NCSA) as part of a project designed to provide computational biologists with access to advanced tools and databases that could be shared by a community of users via web technology.
A Science Portal can be broadly defined as an application-specific environment for using and programming complex tasks involving remote resources. Over the past year the Science Portal concept has been heavily influenced by the emergence of the Grid [12] as a computational platform.

[Footnote: Supported by NSF grants 4029710 and 4029713, NCSA Alliance, DOE2000. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SC2001 November 2001, Denver (c) 2001 ACM 1-58113-293-X/01/0011 $5.00]

A Grid is a set of distributed services and protocols that have been deployed across a large set of resources. These services include authentication, authorization, security, namespaces and file/object management, events, resource co-scheduling, user services, network quality of service, and information/directory services. Together these services enable applications to access and manage the remote resources and computations. Web-based Grid Portals provide mechanisms to launch and manage jobs on the grid via the web. Grid Science Portals are problem solving environments that give scientists the ability to program, access and execute distributed applications using grid resources, which are launched and managed from a conventional Web browser and other desktop tools.
In such portals, scientific domain knowledge and tools are presented to the user in terms of the application science, and not in terms of complex distributed computing protocols. The system effectively makes the grid into a vast and powerful computation engine that seamlessly extends the user’s desktop to remote resources like compute servers, data sources and on-line instruments.

This paper describes the XCAT Science Portal (XCAT-SP), which is an implementation of the NCSA Grid Science Portal concept. XCAT-SP is based on the idea of an “active document” which can be thought of as a “notebook” containing pages of text and graphics describing the science of a particular computational application and pages of parameterized, executable scripts. These scripts launch and manage the computation on the grid, and results are dynamically added to the document in the form of data or links to output results and event traces.

XCAT-SP is a tool which allows the user to read, edit, and execute these notebook documents. The goal of this research and the focus of this paper is to address the following set of questions.

•  How well does the active document model work for real scientific applications?
•  How does one use scripts to steer computations from the portal?
•  What is a simple and efficient mechanism to store and retrieve data specific to each application?
•  How should the portal be designed to interact with an event system to receive feedback from the remotely executing applications?
•  How can a portal use a grid monitoring system to provide resource utilization information about its environment?

2 Existing Grid Portals

The area of Grid Portal design is now an extremely active and important part of the emerging Grid research agenda.
The existing projects can be grouped into three categories:

•  User Portals for simple job submission and tracking, file management and resource selection
•  Portal Construction Kits, which provide the APIs necessary for a portal to communicate with Grid services
•  Science Portals, as defined earlier

In the user portal category, the NPACI Hot Page [28] is the first and most successful system. Other user portal projects are the European project Unicore [4], Nimrod-G from Australia [11], and the IPG Launch Pad, which is the user portal for NASA’s Information Power Grid [6].

At least three projects provide portal construction toolkits. The Argonne Commodity Grid (CoG) [17] toolkit is a Java interface for Globus. GPDK from Lawrence Berkeley Labs [8] is a JSP API for CoG, and JiPANG from Tokyo Institute of Technology [25] uses Sun Microsystems’ Jini [23] to provide an interface to both CoG and networked solvers like Ninf [26] and Netsolve [7].

Science Portals have a variety of forms. Some are designed around relatively specific application domains. For example, the Cactus Portal [16] from the Albert Einstein Institute was originally designed for black hole simulations, and the ECCE/ELN [27] project from ORNL, LBNL and PNNL is for Computational Chemical Engineering. The Lattice Portal [20] from Jefferson Labs is a user portal for high-energy physics. One category of science portals directly addresses the problem of building multidisciplinary applications. The Gateway project [5] and the Mississippi project [31] use CORBA [14] and Enterprise Java Beans (EJB) [24] to build a three-tier architecture for launching and scheduling multiple applications. These two projects also use scripting to orchestrate large, complex application scenarios. Another CORBA-based project is the Rutgers Discover portal [10], which also provides a good interface for computational steering and collaborations.
3 The XCAT Science Portal

An initial prototype science portal that tests some of the features described above has been developed over the last year at Indiana University with the help of the Chemical Engineering Team from NCSA. The portal differs in its architecture from the examples described above because it does not use a centralized web server on a remote machine. In our system the portal software that runs on each user’s desktop/laptop has a built-in server. The reason for this is that the XCAT Science Portal is designed to integrate the user’s desktop environment with the remote grid resources. If the portal resides elsewhere, the only tool the user can use to interact with the Grid is a Web browser or another HTTP client. In our model, the portal server provides a single, local gateway between the Grid Services and local applications. A local web browser can still interact with it through HTTP, but other applications may also communicate with it via local protocols and services, such as COM [22], .NET [21] and Bonobo/Gnome [1].

As illustrated in Figure 1, the major components of the portal server include:

•  A Java-based (Tomcat) server engine, which spawns off a set of Java Servlets that manage access to the other components.
•  A notebook database. A notebook is an active document defined by an XML object that describes a set of resources used in a computational application. It consists of documents, web pages, execution scripts, and other notebooks.
•  A Script Engine that is used to execute complex Grid operations. The scripting is currently in JPython, which has become popular with many computational scientists. We provide JPython-based interfaces to the Argonne CoG toolkit, which in turn provides access to Globus functionality and the GSI [18] Grid authentication mechanisms. It also has an API that allows easy access to the DOE Common Component Architecture (CCA) services [19].
•  An Event Subsystem that is capable of handling event messages, which may be generated by grid resources or user applications.
•  A Grid Performance Monitor that provides the user with a view of available resources, their current loads and network loads.
•  A Remote File Management Interface that uses the GSI-enabled FTP service.

3.1 The Notebook Database

The underlying directory structure of the filesystem is used as the database to support the portal. The database stores a notebook corresponding to each computational application. Each notebook is stored as a directory and each page of the notebook is stored in a different subdirectory. An XML file containing meta-data about the notebook and a list of pointers and references to the pages in the notebook is also stored in the local database. Figure 2 shows a snippet of such an XML file. It describes a notebook session, with the title Notebook_Intro, containing a notebook page, BigPicture. The complete schemas can be viewed at http://www.extreme.indiana.edu/an/xsd.

A notebook session can be saved as a jar (Java archive) file, which can be published in a repository using GSI-enabled FTP or other file transfer services. Authorized portal users can retrieve the jar file from the repository and place it in a database local to their own portal. This enables the portal users in a scientific community to share data corresponding to their experiments with their peers.

[Figure 1: The XCAT Science Portal Architecture. Diagram not reproduced; its labeled components include the Web Browser, Local Components, Viz Tools, the MyPortal Active Notebook Server, the Script Engine, the Notebook Database, the Grid Performance Monitor, Application Proxies, Wrapped Applications, and The Grid.]

3.2 Grid Application Scripting

One difference between a user portal and a science portal is the complexity of the tasks that the portal supports. A user portal allows users to submit single jobs to the grid.
The portal provides features to make it very simple to manage the job, providing load-time and run-time information, and to help the user select resources and monitor the execution of the job. In a science portal, the applications tend to be more complex. A single scientific experiment may involve running many different computational simulations and data analysis tasks. It may involve coupled multidisciplinary applications, collaboration, and remote software components linked together to form a distributed application. Often these complex tasks take a great deal of effort to plan and orchestrate, and the entire application may need to be run many times, each with a slightly different set of parameter values. We have found that the best way to allow this sort of computation to be carried out is to give the scientist access to a simple scripting language which has been endowed with a library of utilities to manage Grid applications. Furthermore, we provide a simple tool which allows the scientist to build a web form interface to configure and launch the scripts. Users of the scripts simply fill in parameter values in the web form and then click the Submit button. This launches a script which executes on the user’s desktop, but manages remote applications on the grid. Our prototype implementation uses the (J)Python language for scripting because it is popular with scientists and has an excellent interface to Java, and we make the scripts grid-enabled by providing an API to Globus Services using the CoG Toolkit.

Figure 4 illustrates a portal interface, which is typically application-dependent and is configurable by the users. In the panel on the left, there is a view of an open notebook session. It consists of a set of pages and script forms. In this figure, the form for a simple script which launches a local visualization application is shown. Parameter values selected by the user from the form page are bound to variables in the script.
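The binding of form values to script variables described above can be sketched in a few lines of modern Python. This is only an illustrative sketch, not the portal's JPython implementation: the names `bind_and_run`, `SCRIPT`, `output_file`, and the `launch` callback are hypothetical, and the launcher here merely records the command instead of starting a process.

```python
def bind_and_run(script_source, form_values, launcher):
    """Run a notebook script with web-form values bound as variables.

    script_source: text of the (hypothetical) notebook script.
    form_values:   dict mapping form field names to submitted values.
    launcher:      callable the script uses to start a program.
    """
    # Each form field becomes a variable visible to the script.
    namespace = dict(form_values)
    # Expose a launch() helper, standing in for the portal's script library.
    namespace["launch"] = launcher
    exec(script_source, namespace)
    return namespace


# A trivial script in the spirit of the paper's example: it launches a
# local program called "animator" on a simulation output file.
SCRIPT = """
command = ["animator", output_file]
launch(command)
"""

launched = []  # records what the script asked to launch
bind_and_run(SCRIPT, {"output_file": "run42.dat"}, launched.append)
```

In a real deployment the recording stub would be replaced by something that actually starts the program, for example a wrapper around `subprocess.Popen`, and the script text would come from the notebook page rather than a string literal.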
By selecting Edit, both the script and the form page may be edited, as shown in Figure 5. In this case, the script launches a local program called animator which takes as a parameter the name of a simulation output file to animate. In this example the script is trivial, but it is not much more difficult to write a script to launch an application on the grid and to manage remote files.

<activeNotebook xmlns="http://www.extreme.indiana.edu/an">
  <activeNotebookInfo>
    <title>Notebook_Intro (session)</title>
    <creationDate>Thu Apr 19 10:54:10 EST 2001</creationDate>
    <modifiedDate>Thu Apr 19 10:54:18 EST 2001</modifiedDate>
    <version>1.0</version>
    <id>NotebookIntro.7444</id>
    <open>true</open>
    <relatedTo>NotebookIntro</relatedTo>
    <unsaved>false</unsaved>
  </activeNotebookInfo>
  <pageContent>
    <title>BigPicture</title>
    <url>/an/database/notebook/nNotebookIntro.7444/pBigPicture/big_picture.html</url>
    <id>BigPicture</id>
    <number>1</number>
    <open>false</open>
  </pageContent>
</activeNotebook>

Figure 2: An XML file with notebook metadata

A second form of scripting is used to manage the local details of the program’s execution on a remote site. The remote applications are managed by application managers. In most cases, the applications that the scientists and engineers want to run on the Grid are not grid aware, i.e. they are ordinary programs that read and write data to and from files. In some cases, we have access to the application source, but often that is not available, e.g., when using commercial application codes. An application manager is an agent process that helps the application make use of grid services. For example, the manager can stage input files from remote locations or invoke post-processing on the application output when the application has finished. The manager also serves as an event conduit between the application and the portal. If the application dies or creates a file, the manager can send an event back to the portal with the appropriate message.
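The event-conduit role of an application manager can be illustrated with a small Python sketch. It is written under several assumptions and is not the XCAT implementation: the class name `ApplicationManager`, the event dictionaries, and the `send_event` callback are all hypothetical, the wrapped application is an ordinary local process, and new files are detected only after the process exits rather than while it runs.

```python
import os
import subprocess
import sys
import tempfile


class ApplicationManager:
    """Hypothetical wrapper around an application that is not grid aware."""

    def __init__(self, send_event):
        # send_event stands in for the channel back to the portal.
        self.send_event = send_event

    def run(self, command, workdir):
        """Run the application and report what happened as events."""
        before = set(os.listdir(workdir))
        self.send_event({"type": "started", "command": command})
        proc = subprocess.run(command, cwd=workdir)
        # Report every file that appeared in the working directory.
        for name in sorted(set(os.listdir(workdir)) - before):
            self.send_event({"type": "file_created", "name": name})
        # A non-zero exit status is reported as the application dying.
        kind = "finished" if proc.returncode == 0 else "died"
        self.send_event({"type": kind, "returncode": proc.returncode})
        return proc.returncode


def demo():
    """Wrap a tiny stand-in application that writes one output file."""
    events = []
    manager = ApplicationManager(events.append)
    with tempfile.TemporaryDirectory() as workdir:
        app = [sys.executable, "-c", "open('result.dat', 'w').write('done')"]
        manager.run(app, workdir)
    return events
```

Calling `demo()` returns the list of events the manager would have forwarded to the portal. Staging input files and post-processing, which the text also assigns to the manager, would be added as steps before and after the `subprocess.run` call.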
The application manager is shown in Figure 6.

The application manager can also act as a service broker for the application. The manager can register itself with the Grid Information Service [13] and advertise the application’s capabilities. If a user with the appropriate authorization discovers it, then the manager can launch the application on behalf of the user and mediate the interaction with the user. For example, suppose the application is a library for solving sparse linear systems of equations on a large parallel supercomputer. The manager can export a remote solver interface that takes a sparse linear system as input and returns solution vectors and error flags as output. If a user has a remote reference to the manager, the solver can be invoked by a remote method call passing a linear system (or its URI) as a parameter and the solution vector can be received as a result of the call. This is the model used by JiPANG to invoke Ninf and Netsolve.

In the XCAT system, the application managers conform to the DOE Common Component Architecture (CCA) specification. XCAT is our implementation of the CCA specification, built on top of SoapRMI [29], that allows the users to write CCA compliant components in C++ and Java. The application managers are designed to be scriptable components, which have one standard port providing the creator with the ability to download a script which the component can run. The scripting language and library used by the component is identical to the language and library available to the portal engine. The application managers combine the advantages of a persistent remote shell with that of a remote object which may be invoked through a well defined set of interfaces. Furthermore, the interfaces that a manager component supports can change