Where Do the Data Go? An Analysis of Chandra Data Dissemination





Reprinted by permission from ADASS XV Proceedings (Escorial, Spain, 2005), ASP Conf. Ser., C. Gabriel, C. Arviset, D. Ponz, and E. Solano, eds.

Introduction

The Chandra Data Archive (CDA) has been distributing data to users since September 1999. Over the years, four interfaces to the archive have been developed: two are web-based, one is stand-alone, and one is ftp-based. The public data in the CDA are mirrored at sites around the world. Downloads are logged, and since 2002 the logs have been maintained in a database (Blecksmith et al. 2003). In addition to tracking the number of files and the volume of data being downloaded, this database is used to perform meta-analyses. Questions addressed in this paper include: (1) At what rate are the applications used relative to each other? (2) Where do users come from? (3) Is there a link between where a user is from and which application is used? (4) What sorts of events trigger downloads of a particular observation?

Relative Application Use

Users can connect to the CDA and browse or retrieve data using a number of interfaces. At this time no single interface provides access to all classes of data in the archive. As a result, all analyses include only downloads that can be done through WebChaser, which has the smallest set of retrievable products. In this study we therefore count only downloads of public data, excluding engineering observations and supporting products.

Provisional Retrieval Interface (PRI):
This is the original interface to the CDA, providing access to public data. It is a CGI script with minimal search capabilities. It is the only GUI-based interface providing access to engineering observations.

Chaser:
This is a stand-alone Java application providing access to proprietary as well as public data. Users can enter individual selection criteria, as well as upload criteria from a file. Search results may be viewed in Chaser or saved to a file. Users may select individual files, which may be downloaded directly or staged to an ftp area. Chaser is the only interface providing access to supporting products.

WebChaser:
This is a web-based application providing access to proprietary and public data. Search criteria must be entered individually, but WebChaser provides the most detailed search results, including quick-look images, V&V reports, and links to processing status and publications tracked by the CDA operations group. Primary and secondary data packages are downloaded to an ftp staging area.

FTP:
This is an anonymous ftp site holding primary and secondary products for all public observations.

An analysis of downloads shows that the choice of application depends upon the number of observations downloaded. Most sessions result in fewer than 10 downloaded datasets. When Chaser and WebChaser came on-line in mid-2001, the usage of the PRI began a steady decline; however, a not insignificant number of users continue to use it today. Chaser’s predominant use is to download supporting products. For large downloads (more than 500 observations per download) users prefer the FTP site. For smaller downloads, most users prefer WebChaser, and users have a slight preference for the FTP site over the PRI.

FIGURE 28: Relative download application use for downloads of 10 or fewer and of 500 or more datasets in a three-month period by a single host.
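
The grouping behind this comparison can be illustrated with a minimal sketch. It is not the code used for the study: the record layout (host, date, application, dataset identifier) is hypothetical, since the schema of the downloads database is not described here, and each record is assumed to represent one downloaded dataset. Downloads are grouped by host and calendar quarter, each group is classed as small (10 or fewer datasets) or large (500 or more), and the share of each application is computed per class.

```python
from collections import Counter, defaultdict
from datetime import date


def quarter(day):
    """Return (year, quarter) for a date, with quarter in 1..4."""
    return (day.year, (day.month - 1) // 3 + 1)


def application_shares(records, small_max=10, large_min=500):
    """Share of each application among small and large download groups.

    records: iterable of (host, date, application, dataset_id) tuples.
    This layout is an assumption made for illustration only.
    """
    # Count datasets per application for every (host, quarter) group.
    per_group = defaultdict(Counter)
    for host, day, app, _dataset in records:
        per_group[(host, quarter(day))][app] += 1

    # Pool the groups that fall into the small or the large class.
    totals = {"small": Counter(), "large": Counter()}
    for apps in per_group.values():
        n = sum(apps.values())
        if n <= small_max:
            totals["small"].update(apps)
        elif n >= large_min:
            totals["large"].update(apps)

    # Convert pooled counts into fractions; an empty class gives {}.
    shares = {}
    for label, counts in totals.items():
        total = sum(counts.values())
        shares[label] = (
            {app: c / total for app, c in counts.items()} if total else {}
        )
    return shares


# Made-up records, only to show the call.
sample = [
    ("host-a", date(2004, 1, 5), "WebChaser", "obs-1"),
    ("host-a", date(2004, 2, 9), "PRI", "obs-2"),
]
shares = application_shares(sample)
```

Groups of intermediate size (11 to 499 datasets) are left out of both classes, mirroring the two cases shown in the figure.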

User Demographics

While the vast majority of CDA users are from the United States, we see a steadily increasing number of countries accessing the archive each year. We have gone from serving 22 countries in 2000 to 40 countries in 2004, and have served 57 countries over the course of the mission. Several countries download significant volumes of data; some of these either host or are scheduled to host mirror sites of the public data. The data show China in particular to be a major user of the CDA, making it a good candidate for a future mirror site.

FIGURE 29: Left: Percentage of Chandra downloads by country for PRI, FTP, and WebChaser. Right: Relative downloads by country for WebChaser in 2004.

It is interesting to note that there are country-specific preferences for accessing the CDA. For instance, the majority of users in the UK prefer WebChaser, while users in Japan prefer FTP. This may be related to the UK becoming a mirror site host in 2002, which improves response time for local use of WebChaser. Japan is scheduled to host a mirror site in the near future, which should make the use of WebChaser, with its superior search capabilities, more attractive. Users in China seem to prefer the PRI (with the exception of 2004, when there were two complete downloads of the archive using ftp). The PRI, a much simpler application than WebChaser, requires much less network bandwidth and thus may be more appealing in countries with limited bandwidth.

FIGURE 30: Relative usage of PRI, WebChaser, and ftp for the UK and Japan.
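
As a rough illustration of the two tallies discussed in this section (countries served per year, and the most-used application per country), the following sketch assumes that each download record has already been reduced to a hypothetical (country, year, application) tuple. How hosts are mapped to countries is not covered here and is outside the sketch.

```python
from collections import Counter, defaultdict


def countries_per_year(records):
    """Number of distinct countries seen in each year."""
    seen = defaultdict(set)
    for country, year, _app in records:
        seen[year].add(country)
    return {year: len(countries) for year, countries in sorted(seen.items())}


def preferred_application(records, year=None):
    """Most-used application per country, optionally for a single year.

    Ties are not handled specially; whichever application the Counter
    reports first is returned.
    """
    counts = defaultdict(Counter)
    for country, rec_year, app in records:
        if year is None or rec_year == year:
            counts[country][app] += 1
    return {
        country: apps.most_common(1)[0][0] for country, apps in counts.items()
    }


# Made-up records, only to show the calls.
sample = [
    ("UK", 2003, "WebChaser"),
    ("UK", 2003, "WebChaser"),
    ("UK", 2003, "FTP"),
    ("Japan", 2003, "FTP"),
]
by_year = countries_per_year(sample)
favourite = preferred_application(sample, year=2003)
```

Counting records rather than data volume is a simplification; a volume-weighted version would sum file sizes instead of incrementing by one.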

Publications and Downloads

What sorts of events trigger downloads of a particular observation? Three obvious events are the public release of data (the date of which is available in the CDA), scientific publication of the data (tracked by the CDA operations group; Rots et al. 2004), and media coverage related to the observation (tracked by the Chandra Education and Public Outreach group). We find that, not surprisingly, all three types of events increase the number of downloads. Most observations are downloaded within days of going public, indicating that users monitor the archive for such events. Scientific publications increase downloads around the time of the publication, more so for the first publication associated with an observation. Press releases have a lesser effect, stimulating downloads only if the release is picked up by many large media services.
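
One simple way to look for such effects is to count the downloads of an observation that fall within a fixed window after each event. The sketch below is only an illustration under stated assumptions: the 14-day window, the event labels, and the input layout (a list of download dates for one observation plus a mapping of event label to date) are all hypothetical and are not taken from the study.

```python
from datetime import date, timedelta


def downloads_near_event(download_dates, event_date, window_days=14):
    """Count downloads from event_date through event_date + window_days.

    Both ends of the window are included. The default window length is
    an arbitrary choice made for this sketch.
    """
    end = event_date + timedelta(days=window_days)
    return sum(1 for day in download_dates if event_date <= day <= end)


def event_summary(download_dates, events, window_days=14):
    """Map each event label to the number of downloads in its window."""
    return {
        label: downloads_near_event(download_dates, when, window_days)
        for label, when in events.items()
    }


# Made-up dates for a single observation, only to show the call.
downloads = [date(2004, 3, 1), date(2004, 3, 2), date(2004, 6, 10)]
events = {
    "public release": date(2004, 3, 1),
    "first publication": date(2004, 5, 30),
}
summary = event_summary(downloads, events)
```

Comparing these counts with the download rate outside any event window would give a crude measure of how strongly each type of event stimulates downloads.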

FIGURE 31: Time-lines of two Chandra observations showing the effects of public release, scientific publications and media coverage on downloads.

Conclusion

Our analysis of the Chandra downloads database indicates that users generally prefer web interfaces and direct access via ftp to stand-alone applications. Preferences for download applications vary by country, possibly because of varying bandwidth. Public release of a dataset, scientific publication of the data, and major press coverage all appear to increase the downloads of an observation near the time of the event.

This work is supported by NASA contract NAS 8-03060.


S. Winkelman, A. Rots, A. Duffy, S. Blecksmith, D. Jerius
Chandra X-ray Center/Smithsonian Astrophysical Observatory

References
Blecksmith, E., Paltani, S., Rots, A. & Winkelman, S. 2003, in ADASS XII, 283
Rots, A., Winkelman, S., Paltani, S., Blecksmith, S. & Bright, J. 2004, in ADASS XIII, 605