Converting SED Data to a Supported Format
Iris Threads
Overview
Synopsis:
If you have a SED data file written in an unsupported format which you would like to analyze in Iris, you can use the Iris SED Importer tool to convert it to an IVOA-compliant format so that it can be uploaded into the application. The SED Importer can be used to convert a single spectroscopic data segment or point, or multiple, separate segments or points, loaded from various locations; moreover, the tool can handle already compliant files. The single or aggregate converted SED can be serialized in FITS or VOTable format, and subsequently loaded into Iris for analysis.
Last Update: 23 Sep 2011 - moved section "Supported Input File Formats: Extended Information" to the References page SED Importer File Formats
Contents
- Introduction
- Getting Started
- Importing Data
- Entering the Conversion Configuration
- Building a Multi-segment SED
- Saving the Converted SED Data to File
- Loading the Converted File into Iris
- Getting Help
- Advanced: SED Importer Command-line Interface (CLI)
- Advanced: Creating Custom Input File Formats with Plugins
- History
Introduction
The SED Importer tool, which is bundled with the Iris software package, allows you to create SEDs by converting SED data written in an unsupported format into an IVOA-compliant and Iris-compatible FITS or VOTable format. You may load your SED data from a binary or text-based file on your local disk; from a URL address (http or ftp); from the NASA/IPAC Extragalactic Database (NED), based on a query on target name; or even transmit data from a remotely connected application, such as TOPCAT. Some examples of data file formats which are considered non-standard within the context of Iris are CSV, ASCII, and certain instances of FITS and VOTable (those which do not conform to the IVOA standard and are not compatible with common VO tools).
When you load SED data into the SED Importer from a file written in a format which is not supported by Iris, you are prompted to enter a few key pieces of information about the data format so that the SED Importer is able to convert it into a form which is recognized by VO tools. The resulting configuration information may be saved to file, separately from the converted SED file, to be used later in the non-interactive command-line interface of the tool (e.g., to quickly convert a batch of files). If you happen to load a file into the SED Importer which is already in the supported FITS or VOTable format - e.g., to build an aggregate SED from a mix of segments in unsupported and supported formats - this extra step of entering configuration information is unnecessary.
Supported Input File Formats
The file formats supported by the SED Importer, listed below with brief descriptions, are those which are supported by common VO tools. For each, a simple, generic assumption is made: data is arranged in a tabular format, with all rows having the same number of columns, and possibly with a header where metadata is stored. The way in which both data and metadata are stored depends on the specific format.
For detailed information on these formats, refer to the SED Importer File Formats page.
- ASCII - text file with columns separated by spaces and/or tabs
- CSV - text file with columns separated by commas (the first row may contain the names of the columns)
- FITS - consists of a series of Header Data Units (HDUs), each containing two components: an ASCII text header and the binary data. The header contains a series of header keywords that describe the data in a particular HDU and the data component immediately follows the header.
- VOTABLE - (text or binary) XML standard for the interchange of data represented as a set of tables. Consists of an unordered set of rows, each of a uniform structure, as specified in the table metadata. Each row in a table is a sequence of table cells, and each of these contains either a primitive data type, or an array of such primitives.
- IPAC - a custom bar-separated text format by IPAC
- TST - Tab Separated Table (comments are ignored, metadata is in key, value pairs)
Interoperability with SAMP
In the lower-left corner of the SED Importer desktop there is an icon that shows the status of the Simple Application Messaging Protocol (SAMP) connection, which the tool uses to communicate with other interoperable applications, including Iris.
SAMP is a Virtual Observatory protocol that allows desktop applications to communicate with each other. If you use other SAMP-enabled Virtual Observatory applications that manipulate tables of astronomical data, such as Topcat or Aladin, you can transmit tables of data from these external applications to the SED Importer. This protocol is also used to transmit SEDs from the SED Importer to other applications, like Iris.
Getting Started
The main mode of interaction with the SED Importer is through the Graphical User Interface (GUI), but a non-interactive, command-line interface (CLI) is also available for advanced users. In order to use the SED Importer CLI, what is referred to in this document as a 'setup' file is required input, along with the file to be converted. The setup file stores the configuration information needed to convert SED data written in an unsupported format into one of the supported FITS or VOTable formats; it may be created by using the SED Importer in interactive mode, first.
To open the SED Importer GUI, simply start the tool on the Unix command line by typing the full path to the SED Importer script in the Iris installation directory, as shown below:
% <basedir>/iris-1.0-<plat>-<arch>/SedImporter &
This opens the desktop interface which is available for using the tool in interactive mode. The desktop contains links to help documentation for both the Iris and SED Importer GUIs - which run as separate applications in the current release of Iris - as well as icons that launch the respective applications.
Most of the windows launched by the SED Importer will be confined within this desktop. Each window can be iconified and its icon will stay in the SED Importer desktop itself, so that it will not take up space in your applications bar.
The SED Importer desktop takes on the same look and feel as your native desktop, and the behaviour is consistent with your native environment. For example, on Mac OS X, iconified windows appear as miniatures at the bottom-center of the desktop, while on Linux, they could be draggable buttons placed on the bottom-left of the desktop. The SED Importer desktop can itself be resized, maximized and iconified in your native desktop.
Importing Data
The "Load SED" icon on the SED Importer desktop is the entry point into the tool.
Upon selecting this option, you are prompted to provide an ID for the new SED which you will create, then a SED builder window opens in which you may load a spectroscopic/photometric segment using the "New Segment(s)" option. Data may be loaded from a file on your local disk; an http or ftp URL address; from the NED SED service (internet connection required for this option); or from another desktop application. A "segment" refers to a data spectrum, a photometric point or a series of photometric points; see the Iris FAQ entry "Which SED data types are supported by Iris?" for more details.
An arbitrary number of data segments may be loaded into the SED Importer using the "New Segment" function. You are provided with three options for loading data: browsing for a file on your local disk, entering a URL file location, or searching NED for SED data associated with the entered target name.
Loading from File
When you load a data segment from a file on your local disk into the SED Importer, the file is parsed according to the format you provide in the data import window, e.g., "ASCII" or "CSV".
Loading from a URL
To load a file from a web address, simply enter the full ftp or http URL location into the field provided in the data import window; e.g., "http://cxc.cfa.harvard.edu/iris/threads/importer/3c273.csv".
Loading from NED
If you are using the "Get an SED from the NED Service" option, the entered target name will be resolved by NED, and any associated SED data found in the NED photometry archive will be returned by the web service and loaded into the tool.
The "Change Endpoint" input can be used to change the NED service URL which is queried; however, this is not usually needed.
Loading from a SAMP-enabled application
To transmit data to the SED Importer from another SAMP-enabled desktop application, such as Topcat, simply broadcast the table from the external application (or send it directly to the SED Importer). If in the SED Importer there are no open SEDs, a new SED will be created. Each SED building window has a check box labeled "Accept tables from SAMP". If the box is ticked, the SED will receive the new importer segment from the external application; if the box is unticked, it will not. This also means that different SEDs can share the same segment.
In the SED builder window, you have the option of entering a target name and coordinates into the Target Info field, to be associated with the new SED and recorded in the metadata of the saved SED file. If this information is provided and an internet connection is available, the name is resolved and the associated coordinate fields are automatically populated. (Note that if you have entered a target name into the Target Info field, this name will be offered as a suggestion whenever you choose to upload a data segment using the NED SED web service option.)
A SED builder window with example target information entered is shown below.
Entering the Conversion Configuration
If the file you are loading is compliant, the segments in the file will be automatically imported and no other input will be required.
If the file is not compliant, after loading a data segment into the SED Importer - e.g., using the URL option with an HTTP file location entered as "http://cxc.cfa.harvard.edu/iris/threads/importer/3c273_hut.ascii" - another interactive window opens, labeled "Import Setup Frame". In this window, you must enter various pieces of information which will be used to define the configuration for the format conversion for this particular data segment (i.e., so that the tool knows how to convert the data from the unsupported format to a supported one). The "Save Setup" button beneath the "Setup Help" window is available for those who wish to write this configuration to file for use with the SED Importer CLI, described in the "Advanced: SED Importer Command-line Interface (CLI)" section below.
If the imported file contains more than one table, one "Import Setup Frame" window will be opened for each table.
The fields in the Import Setup Frame window which require entries are the X Axis, Y Axis, and Y Error fields. Until these fields are populated, the Setup Help window will contain warning messages indicating that the form is incomplete, and you will not be able to save the current setup to a file or import new segments into the SED.
In the X Axis section, you must characterize the column in your file which corresponds to the spectral coordinate axis, e.g., wavelength in Angstrom units, frequency in Hz, or energy in eV. In the Column drop-down menu, you will find the names of the columns as they appear in your file. If no column names can be found, then "colN" will be used, where N is the number of the column as it appears in the file.
In the Y Axis section, you are to characterize the column in your file which corresponds to the flux density axis, e.g., energy flux density in ergs/s/cm2/Hz, photon flux density in photons/s/cm2/Hz, or the AB magnitude equivalent of the flux density.
In the Y Error section, you can characterize the error for the Y Axis using several different options:
- Unknown: the error is unknown.
- ConstantValue: all the points in the segment have the same error, which is not actually included in the file, but is typed into the provided field (e.g., "0.001").
- SymmetricColumn: the error is symmetric and its value is contained in a column.
- SymmetricParameter: the error is symmetric and its value is contained in a parameter in the file header. The Importer tries to read the parameters from the file header for the formats that allow such parameters. If the parameter cannot be read, you are required to include the error value manually as a ConstantValue.
In the remaining fields of the Import Setup Frame window, you can view the name you assigned to the SED along with the name and path of the associated file; as well as view and edit the target information and data publisher (e.g., "NED") which was optionally entered in the SED builder window, or add this information here if it was not done previously.
After entering the required configuration information into the setup window, the "Add Segment to SED" and "Save Setup" buttons become active.
At this point you can optionally save the configuration to a text file for later use of the tool in non-interactive mode, as well as add the configured segment to the new SED. Adding the segment closes the "Import Setup Frame" window and brings you back to the interactive window which is labeled with your SED ID and contains general information about all SED segments loaded in the session thus far.
Here, you may continue loading data segments from various locations, and repeat the process outlined above until you are finished building your SED. The list of loaded segments is shown with associated coordinates, publisher information, and the number of points in the segment, if this information is available. You can select one or more segments in this list and use the relevant buttons in the Segments Operations section to perform tasks on the selected segments.
Building a Multi-segment SED
As the SED Importer has the flexibility to read data files in a variety of formats and from different locations, you are able to use the tool to gather multiple SED data segments in a single session and save them together, as an aggregate SED, to a single FITS or VOTable format file.
In order to build a multi-segment SED - where each segment may be converted using a different configuration - you can load a second segment using the "New Segment" option in the SED builder window, using any of the file upload options described in the "Importing Data" section, above. For example, if the first segment had been loaded using the "URL" option, the second segment could be loaded from the NED SED Service by entering the object name "3c273" into the appropriate field of the data import window.
In this case, the segment would be directly added to the SED, without prompting you to enter a separate configuration for the segment in the Import Setup Frame window, because this step is not required for data loaded from compliant files, like the ones provided by NED.
Saving the Converted SED Data to File
When you have finished importing and configuring data segments within the SED Importer GUI, and optionally saving the associated setup file(s) for later use in the SED Importer CLI, you may write the new SED to an Iris-compatible FITS or VOTable format file by selecting "Save SED". The converted SED data file is now in an Iris-compatible format and therefore may be loaded into the Iris GUI for SED analysis.
Loading the Converted File into Iris
Once you have used the SED Importer to convert your SED data into an IVOA-compliant FITS or VOTable format, you may load the converted file into Iris for analysis. The "Launch Iris" icon on the SED Importer desktop is available as a shortcut to the Iris GUI, which runs separately from the SED Importer GUI in the current release.
This is identical to launching Iris from the command line in the usual way:
% <basedir>/iris-1.0-<plat>-<arch>/Iris &
The "Broadcast SED" button in the SED Importer can be used to transmit the SED to Iris. You can also select one or more segments and send them to Iris individually.
Getting Help
Two lifebuoy shaped icons on the SED Importer desktop will point your default browser to the documentation pages for both the SED Importer (this page) and Iris (the other components of the Iris How-to Guide). If the program cannot open these links in your default browser, a simple browser should appear in the SED Importer Desktop itself (note that formatting errors may result, in this case).
Advanced: SED Importer Command-line Interface (CLI)
- Creating a Setup File
- Using a Setup File
- Editing a Setup File
The SED Importer may be run non-interactively from the Unix command line, using both the data file to be converted, and a setup file containing the configuration information for the conversion. The setup file must be created using the SED Importer interactively, first, according to the procedure described in the section "Entering the Conversion Configuration", above.
Creating a Setup File for the CLI
A SED Importer setup file is a text file which records the configuration information used by the tool to convert from the user-input unsupported format, to a supported VOTable or FITS data format. Its intended use is to automate the file conversion procedure in scripting. It may be created and saved within the "Import Setup Frame" window of the SED Importer GUI.
The full set of instructions for creating a setup file is provided in the "Entering the Conversion Configuration" section, above; the basic steps are:
- Start a new SED building session by clicking on the "Load SED" icon in the SED Importer desktop interface.
- Assign it a label when prompted and then click the "New Segment(s)" button in the SED builder window to load data from a disk, URL, the NED SED web service, or a remotely connected application.
- In the "Import Setup Frame" window of the SED Importer GUI, enter the spectral coordinate and flux density characterization of the loaded data so that the tool can make the conversion to a supported data format.
- Save the configuration to a setup file using the "Save Setup" option.
Using a Setup File
To run the SED Importer from the command line using a newly created setup file as input, the following arguments are provided, in the order shown (only the last one may be omitted):
- config_file - The setup file which contains the importing configuration(s), which was created using the SED Importer in interactive mode
- output_file - The output file that will contain the new SED.
- format - The format in which the new SED file must be written. Choose between 'vot' and 'fits'. If omitted, the default will be 'vot'.
For example:
% <basedir>/iris-1.0-<plat>-<arch>/SedImporter config_file.ini outputfile.vot vot
The setup file created in the SED Importer GUI allows you to quickly convert data in a given unsupported format into one of those supported by Iris. This is particularly useful when you have a long list of files in the same unsupported format which you need to convert and analyze in Iris: instead of loading each file into the GUI and unnecessarily re-creating the same conversion configuration, you can simply use the one setup file you already created to convert a batch of files on the command line.
An example Python script which performs a batch conversion is available here, with accompanying instructions for customizing and executing. Such a script may be used to automatically convert all the files in a given directory using the SED Importer Command Line Interface.
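The following is not the script referenced above, only a minimal sketch of the same idea. It assumes that a setup file saved from the GUI can be reused for another data file by changing its fileLocation key, that the setup file can be read with Python's configparser module, and that the template describes a single kind of segment. The function names (setup_for_file, build_command, convert_directory) are hypothetical; only the argument order config_file, output_file, format comes from the description above.

```python
import configparser
import io
import subprocess
from pathlib import Path


def setup_for_file(template_text, data_path):
    """Return setup-file text with every section pointed at data_path."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep keys such as XAxisColumnNumber as written
    parser.read_string(template_text)
    for section in parser.sections():
        # Local files are given as "file:" followed by the absolute path.
        parser[section]["fileLocation"] = "file:" + Path(data_path).as_posix()
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


def build_command(importer, setup_path, output_path, fmt="vot"):
    """Build the argument list in the order: config_file output_file format."""
    if fmt not in ("vot", "fits"):
        raise ValueError("format must be 'vot' or 'fits'")
    return [str(importer), str(setup_path), str(output_path), fmt]


def convert_directory(importer, template_path, data_dir, pattern="*.csv",
                      fmt="vot", run=subprocess.run):
    """Convert every file in data_dir that matches pattern.

    For each data file, a setup file with the same base name and the
    extension .ini is written next to it, then the importer is invoked.
    The run argument exists so that the call can be replaced when testing.
    """
    template_text = Path(template_path).read_text()
    for data_file in sorted(Path(data_dir).glob(pattern)):
        setup_path = data_file.with_suffix(".ini")
        setup_path.write_text(setup_for_file(template_text, data_file.resolve()))
        output_path = data_file.with_suffix("." + fmt)
        run(build_command(importer, setup_path, output_path, fmt), check=True)
```

A call such as convert_directory("<basedir>/iris-1.0-<plat>-<arch>/SedImporter", "config_file.ini", "data") would then write one .vot file per CSV file in the data directory, provided all of those files share the column layout recorded in config_file.ini.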
Note that a setup file output by the GUI may also be loaded into the GUI, using the "Load from Setup" option in the File menu. This is useful when you are building a multi-segment, aggregate SED in the GUI, and would like to contribute a segment for which you have already created a conversion configuration, in a previous session.
Editing a SED Importer Setup File
The contents and format of a SED Importer setup file are described in detail in this section, so that you may learn how to edit the file and customize the configuration to suit your needs, independently of the GUI.
Setup File Format
The output setup file looks like a Windows ini file or a MySQL configuration file. It is organized into sections, with each section representing a separate data segment. This means that in a single setup file you can include many segments from many different files.
Each section has a title between square brackets, e.g. "[Segment1]". Titles are not used when the file is processed, but different titles mark different segments; this means that two sections with the same title would refer to the same segment. While it is allowed to split the information for one segment across several sections, it is not wise to do so: if the same information is included more than once in different sections, the result may become unpredictable.
Besides the title, all the information is expressed in key/value pairs: the key and the value are on the same line and they are separated by the character "=", e.g., "XAxisColumnNumber = 5"; the order does not matter.
Decimal numbers may be represented in scientific notation, e.g., 5.5E-7.
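Because of the ini-like layout, a setup file can usually be inspected with a generic ini reader. The sketch below uses Python's configparser module for this; that choice is an assumption, and the SED Importer's own reader may behave differently, in particular for repeated section titles. Here repeated titles are merged into one segment, as the description above suggests. The function name read_segments and the sample text are hypothetical.

```python
import configparser

SETUP_TEXT = """\
[Segment0]
XAxisColumnNumber = 5
constantErrorValue = 5.5E-7

[Segment1]
YAxisColumnNumber = 6

[Segment0]
XAxisUnit = HERTZ
"""


def read_segments(text):
    """Return {section title: {key: value}} for setup-file text."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # keys and values are separated by "=" only
        interpolation=None,  # treat "%" in values literally
        strict=False,        # merge sections that repeat a title
    )
    parser.optionxform = str  # keep the case of the keys
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


segments = read_segments(SETUP_TEXT)
print(sorted(segments))
print(segments["Segment0"])
# Values are read as strings; scientific notation converts with float().
print(float(segments["Segment0"]["constantErrorValue"]))
```

The two [Segment0] sections end up as a single entry holding all three keys, which is why splitting one segment across sections works but invites conflicts when a key is repeated.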
Setup File Contents
The contents of an SED Importer setup file are shown below, where there is a field to specify the location of the input file to be converted, as well as various other fields for specifying the configuration of this input file.
[Segment0]
XAxisColumnNumber = 5
XAxisQuantity = FREQUENCY
XAxisUnit = HERTZ
YAxisColumnNumber = 6
YAxisQuantity = FLUXDENSITY
YAxisUnit = FLUXDENSITYFREQ1
constantErrorValue = 2.0
errorType = ConstantValue
fileLocation = file:/Users/data/3c273.csv
formatName = CSV
publisher = UNKNOWN
targetDec = 2.05238729
targetName = 3c273
targetRa = 187.27791798
The fields of the setup file are defined below.
General Information
- targetName - A string representing the name of the object this segment belongs to, e.g. 3c273
- targetRa - The Right Ascension of this segment in decimal degrees (double), e.g. 187.27791798
- targetDec - The Declination of this segment in decimal degrees (double), e.g. 2.05238729
- publisher - A string representing the data curator of this segment.
File Information
- fileLocation - A URL pointing to the actual location of the file. If it is a local file, the absolute path of the file must be preceded by the protocol file:, e.g., file:/User/data/3c273.csv
- formatName - the name of the file format which has to be used for reading the file. The string must be one of the following:
  - VOTABLE
  - CSV
  - FITS
  - ASCIITABLE
  - IPAC
  - TST
X Axis
- XAxisColumnNumber - an integer representing the column position in the file, where 0 represents the first column.
- XAxisQuantity - a string representing the spectral quantity of this segment, among:
  - FREQUENCY
  - WAVELENGTH
  - ENERGY
- XAxisUnit - a string representing the X Axis units. The units have to be consistent with the Axis quantity, i.e.:
  - FREQUENCY: HERTZ, KHZ, MHZ, GHZ
  - WAVELENGTH: ANGSTROM, CM, M, MICRON, NM
  - ENERGY: EV, KEV, MEV
Y Axis
- YAxisColumnNumber - an integer representing the column position in the file, where 0 represents the first column.
- YAxisQuantity - a string representing the spectral quantity of this segment, among:
  - FLUX (Flux)
  - FLUXDENSITY (Flux Density)
  - MAGNITUDE (Magnitude)
  - PHOTONFLUXDENSITY (Photon Flux Density)
- YAxisUnit - a string representing the Y Axis units. The units have to be consistent with the Axis quantity. The list below gives the supported labels for each quantity and, where applicable, the unit string each label stands for. Note that the value to enter is the label (e.g. FLUXDENSITYFREQ1), not the unit string:
  - FLUX:
    - FLUX0: Jy-Hz
    - FLUX1: erg/s/cm2/Hz
  - FLUXDENSITY:
    - FLUXDENSITYFREQ1: Jy
    - FLUXDENSITYFREQ2: Watt/m2/Hz
    - FLUXDENSITYWL0: erg/s/cm2/Angstrom
    - FLUXDENSITYWL1: Watt/m2/um (micron)
  - MAGNITUDE:
    - ABMAG
    - STMAG
    - OBMAG
  - PHOTONFLUXDENSITY:
    - PHOTONFLUXDENSITY0: photon/s/cm2/Hz
    - PHOTONFLUXDENSITY1: photon/s/cm2/Angstrom
Y Error
- errorType - the type of the error that characterizes the Y Axis, among the following:
  - Unknown
  - ConstantValue
  - SymmetricColumn
  - SymmetricParameter
  - AsymmetricColumn
  - AsymmetricParameter
These values for the errorType key may require at least one more key/value pair, with semantics that depend on the chosen option. In the following list, for each error type option you will find a list of acceptable keys and the values they expect.
- Unknown: no other information is required
- ConstantValue:
  - constantErrorValue: a decimal number representing the value of the error, e.g. 0.2
- SymmetricColumn:
  - symmetricErrorColumnNumber: an integer representing the position of the error column in the file, where the first column is in position 0.
- SymmetricParameter:
  - symmetricErrorParameter: the name of the parameter in the file header that contains the value of the error for all the points in the file.
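When a setup file is edited by hand, it can help to check the edited values against the lists above before running the CLI. The sketch below is one way to do that; it is not part of the SED Importer, the names check_segment and check_setup are hypothetical, and reading the file with Python's configparser is an assumption. The two asymmetric error types are accepted without further checks because their extra keys are not described above.

```python
import configparser

X_UNITS = {
    "FREQUENCY": {"HERTZ", "KHZ", "MHZ", "GHZ"},
    "WAVELENGTH": {"ANGSTROM", "CM", "M", "MICRON", "NM"},
    "ENERGY": {"EV", "KEV", "MEV"},
}
Y_UNITS = {
    "FLUX": {"FLUX0", "FLUX1"},
    "FLUXDENSITY": {"FLUXDENSITYFREQ1", "FLUXDENSITYFREQ2",
                    "FLUXDENSITYWL0", "FLUXDENSITYWL1"},
    "MAGNITUDE": {"ABMAG", "STMAG", "OBMAG"},
    "PHOTONFLUXDENSITY": {"PHOTONFLUXDENSITY0", "PHOTONFLUXDENSITY1"},
}
FORMATS = {"VOTABLE", "CSV", "FITS", "ASCIITABLE", "IPAC", "TST"}
# Extra key required by each error type (None means no extra key).
ERROR_KEYS = {
    "Unknown": None,
    "ConstantValue": "constantErrorValue",
    "SymmetricColumn": "symmetricErrorColumnNumber",
    "SymmetricParameter": "symmetricErrorParameter",
}

EXAMPLE = """\
[Segment0]
XAxisColumnNumber = 5
XAxisQuantity = FREQUENCY
XAxisUnit = HERTZ
YAxisColumnNumber = 6
YAxisQuantity = FLUXDENSITY
YAxisUnit = FLUXDENSITYFREQ1
constantErrorValue = 2.0
errorType = ConstantValue
fileLocation = file:/Users/data/3c273.csv
formatName = CSV
"""


def check_segment(fields):
    """Return a list of problems found in one segment's key/value pairs."""
    problems = []
    if fields.get("formatName") not in FORMATS:
        problems.append("formatName is missing or not a listed format")
    for axis, table in (("X", X_UNITS), ("Y", Y_UNITS)):
        quantity = fields.get(axis + "AxisQuantity")
        unit = fields.get(axis + "AxisUnit")
        if quantity not in table:
            problems.append(f"{axis}AxisQuantity is missing or unknown")
        elif unit not in table[quantity]:
            problems.append(f"{axis}AxisUnit {unit} does not match {quantity}")
    error_type = fields.get("errorType")
    extra_key = ERROR_KEYS.get(error_type)
    if extra_key is not None and extra_key not in fields:
        problems.append(f"errorType {error_type} needs the key {extra_key}")
    return problems


def check_setup(text):
    """Return {section title: list of problems} for setup-file text."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep the case of the keys
    parser.read_string(text)
    return {name: check_segment(dict(parser[name])) for name in parser.sections()}


print(check_setup(EXAMPLE))
```

An empty list for a segment means that none of these particular checks failed; it does not guarantee that the SED Importer will accept the file.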
Advanced: Creating Custom Input File Formats with Plugins
Plugins, software components used to extend the functionality of a given software application, can be used to extend the list of input file formats supported by the SED Importer. Plugins are created by trusted third party developers or by the Iris team itself, and can be installed by the user and then used to read files in custom formats. If you want to develop your own plugin, please refer to the "For Developers" section of this documentation.
Plugins can be loaded into the SED Importer from the local disk or directly from the web. To load a plugin, select "Plugins" from the "File" menu.
This will open the Plugin Manager GUI, where the plugins which have already been installed are shown, and new plugins can be added by clicking on the button "Load Plugin".
Plugins are distributed as Java jar files (extension .jar). Each plugin can contain more than one filter.
Once imported, the plugins and all the filters they contain will be listed in the Plugin Manager.
By right-clicking on a plugin, the plugin can be removed.
The effect of installing a plugin is that all the contained filters will be listed in the Load Segment Window File Format drop-down menu (along with the natively supported ones, like CSV).
For Developers
A "Plugin" refers to a Java archive that contains one or more "File Filters".
To develop a plugin, the SED Importer jar can be used as a simple Software Development Kit (SDK). A more complete SDK will soon be available as a Maven Archetype, with example files and test infrastructure, to help the developer in creating and testing plugins.
For developing custom filters, there are two possibilities: one is to directly implement the IFilter interface; the other is to annotate an arbitrary class and some of its methods with some annotations.
In any case the SEDImporter SDK employs the Inversion of Control paradigm to make the development of Custom Filters as seamless as possible.
Implementing the IFilter interface
This approach should be straightforward but does not offer much flexibility. For this reason, a developer is encouraged to leverage the SED Importer SDK framework to write less code and with less risk. In particular, the custom filter implementation can directly extend the AbstractFilter class.
The basic information needed by the Custom Filter is the URL of the file, whether it is local (file protocol) or remote (http or ftp). The Filter is supposed to read the file from this URL and provide the Data and the Metadata to the SEDImporter framework.
The AbstractFilter class provides the getUrl() method that can be used to retrieve the file URL. This means that a Filter extending this class is automatically compliant with the SED Importer caching mechanism, which stores a Filter instance for each URL. The getUrl() method can be used wherever in the code the developer needs to access the file URL.
Several methods are abstract and must be implemented by the extending Filter. Some of these methods are used to provide metadata about the Filter itself (getName(), getAuthor(), getDescription() and getVersion()) and are quite trivial to implement.
The other two abstract methods are used to implement the actual importing code.
The getMetadata() method is invoked by the SED Importer to get the metadata of the file that is being imported. The Filter is supposed to read the file (using the getUrl() method) and return a list of ISegmentMetadata objects. These objects contain lists of two kinds of metadata entities: columns (interface ISegmentColumn) and parameters (interface ISegmentParameter).
There is currently a STIL based implementation of these interfaces that the developer can take advantage of.
In general, the implemented Filter could allow the underlying file format to store more than one table in the same file. This is why the getMetadata() method returns a list. The number of elements in the list is equal to the number of tables (i.e. segments) in the file. So, if the file contains three tables, there will be three elements in the list of ISegmentMetadata objects.
Once the SED Importer framework has acquired the metadata instances, it will ask the Filter to return the actual columns, through the getData method.
The getData(int, int) method takes two integers as arguments, the index of the segment in the file and the number of the column for which the SEDImporter is requesting data from the filter, and returns an array of java.lang.Number. This method applies to generic file formats that allow for multiple tables to be stored in the same file. As discussed above, if the Custom Filter doesn't allow for more than one table to be stored in the file, then the first argument of this method will always be 0. Otherwise the method implementation will be required to handle the different indexes. Since the first segment has index 0, the index is always smaller than the length of the list produced by the getMetadata method.
Note that the extending class can implement its own state so that information is efficiently retained. The important information here is that the framework will keep an instance of the Filter object for each URL, i.e. for each file imported.
Also note that the return type is Number[], and that Number is an abstract class. This means that you can return an array of any subclass of Number, like Double[] or Integer[].
Using annotations
A more flexible and economical (but more error prone, if misused) approach employs Java annotations. A correctly annotated arbitrary class needs less code and gives the developer more freedom in the development of Filters. It is up to the developer to evaluate the trade-off between flexibility and safety.
The first annotation is @Filter. It must be used to annotate the class implementing the Filter, and has several attributes that can be used to provide filter metadata. Even though these attributes have default values, the developer is supposed to override them with the actual Filter metadata. Here is an example of the use of this annotation, which should be self explanatory:
@Filter(
    name="ASCII Header",
    author="Omar Laurino",
    version="1.0",
    description="ASCII Table with utypes in the header"
)
public class ASCIIHeaderFilter {
The @FileLocation annotation can be used to tag the field that, at runtime, will contain the URL of the file to import. This variable doesn't need to be initialized and will be "injected" with the actual URL at runtime. Of course, the variable *must* be of java.net.URL type. Note that in the current release (1.0) this field needs to be declared public. In the next release it will be possible to declare the field private.
The two annotations @Data and @Metadata must be used to tag the methods that return the data and the metadata objects. These methods can have arbitrary names. However, they must take the same arguments as their IFilter counterparts (see previous section) and return the same types.
History
08 Aug 2011 | updated for Iris Beta 2.5 |
20 Sep 2011 | updated for Iris 1.0 |
23 Sep 2011 | moved section "Supported Input File Formats: Extended Information" to the References page SED Importer File Formats |