source: trunk/loci/formats/using-bioformats.txt @ 2319

Revision 2319, 9.2 KB checked in by curtis, 13 years ago

Some documentation updates.

                                  Overview
                                -----------

This document collects notes that are useful to know when working with
Bio-Formats.  It is recommended that you obtain the Bio-Formats source by
following the directions at http://www.loci.wisc.edu/software, rather than
using an official release.  It is also recommended that you keep a copy of
the Javadocs nearby; the notes that follow will make more sense when you can
see the API.

For a complete list of supported formats, see the Bio-Formats home page:
http://www.loci.wisc.edu/ome/formats.html

                              Basic File Reading
                         ---------------------------

Bio-Formats provides several methods for retrieving data from files in any
supported format.  These methods fall into three categories: raw pixels, core
metadata, and format-specific metadata.  All methods described here are
declared and documented in loci.formats.IFormatReader; consult its source
and/or Javadoc for details.  In general, it is recommended that you read files
using an instance of ImageReader.  While it is possible to work directly with
the reader for a specific format, ImageReader contains additional logic to
detect the format of a file automatically and delegate subsequent calls to the
appropriate reader.
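
To make "detect, then delegate" concrete, here is a minimal sketch of the
pattern.  The SimpleReader interface, its canRead method and both reader
classes are hypothetical stand-ins written for this example, not Bio-Formats
classes.  Detection here looks only at the file extension; a real detector may
also inspect the file's contents.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of ImageReader-style format detection and delegation.
// None of these types are Bio-Formats classes.
class DelegatingReaderSketch {

  /** Hypothetical per-format reader. */
  interface SimpleReader {
    boolean canRead(String fileName);
    String formatName();
  }

  static final class TiffLikeReader implements SimpleReader {
    public boolean canRead(String fileName) {
      String n = fileName.toLowerCase(Locale.ROOT);
      return n.endsWith(".tif") || n.endsWith(".tiff");
    }
    public String formatName() { return "TIFF"; }
  }

  static final class LeiLikeReader implements SimpleReader {
    public boolean canRead(String fileName) {
      return fileName.toLowerCase(Locale.ROOT).endsWith(".lei");
    }
    public String formatName() { return "Leica LEI"; }
  }

  // Candidate readers are tried in order; the first one that accepts the file wins.
  private static final List<SimpleReader> READERS =
      Arrays.asList(new LeiLikeReader(), new TiffLikeReader());

  /** Picks the reader that later calls for this file would be delegated to. */
  static SimpleReader readerFor(String fileName) {
    for (SimpleReader r : READERS) {
      if (r.canRead(fileName)) return r;
    }
    throw new IllegalArgumentException("Unsupported format: " + fileName);
  }

  public static void main(String[] args) {
    System.out.println(readerFor("cells.tif").formatName());   // TIFF
    System.out.println(readerFor("dataset.lei").formatName()); // Leica LEI
  }
}
```

Keeping the per-format logic behind one interface is what lets calling code
stay unaware of which format it is reading.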

Raw pixels are always retrieved one plane at a time.  Planes can be returned
either as a byte array or as a java.awt.image.BufferedImage, using
openBytes(String, int) and openImage(String, int) respectively.  Which method
you use is entirely up to you, as the pixel values are identical either way.
In general, BufferedImages are more convenient for viewer applications and
other applications that don't need to perform computations on pixel data,
while byte arrays are better suited to applications that manipulate pixels.
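
A byte array holds the raw bytes of a plane, so an application that
manipulates pixels has to assemble multi-byte values itself.  The sketch below
decodes a plane of unsigned 16-bit pixels using only java.nio.  It assumes a
tightly packed layout (width * height * 2 bytes, no padding).  The littleEndian
parameter stands in for the value reported by isLittleEndian(String).
PlaneDecoder is a hypothetical name for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: decoding a raw 16-bit plane (assumed tightly packed) into unsigned ints.
class PlaneDecoder {

  static int[] decodeUnsigned16(byte[] plane, int width, int height, boolean littleEndian) {
    int n = width * height;
    if (plane.length < 2 * n) {
      throw new IllegalArgumentException("plane too short: " + plane.length + " bytes");
    }
    ByteBuffer buf = ByteBuffer.wrap(plane)
        .order(littleEndian ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
    int[] pixels = new int[n];
    for (int i = 0; i < n; i++) {
      // getShort() is signed; masking recovers the unsigned range 0..65535.
      pixels[i] = buf.getShort() & 0xFFFF;
    }
    return pixels;
  }

  /** Example pixel computation: the maximum intensity in the plane. */
  static int max(int[] pixels) {
    int m = 0;
    for (int p : pixels) m = Math.max(m, p);
    return m;
  }

  public static void main(String[] args) {
    byte[] plane = {(byte) 0x01, (byte) 0x02, (byte) 0xFF, (byte) 0xFF};
    int[] le = decodeUnsigned16(plane, 2, 1, true);  // 0x0201, 0xFFFF
    int[] be = decodeUnsigned16(plane, 2, 1, false); // 0x0102, 0xFFFF
    System.out.println(le[0] + " " + be[0] + " " + max(le)); // prints "513 258 65535"
  }
}
```

The same two bytes give different values depending on byte order.  That is why
the endianness flag has to travel with the raw bytes.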

Core metadata is the general term for anything that might be needed to work
with the planes in a file.  The core metadata fields are listed below, with
the corresponding accessor method in parentheses:

- image width (getSizeX(String))
- image height (getSizeY(String))
- total number of images (planes) per file (getImageCount(String))
- number of slices per file (getSizeZ(String))
- number of timepoints per file (getSizeT(String))
- number of actual channels per file (getSizeC(String))
- number of channels per image (getRGBChannelCount(String))
- the ordering of the images within the file (getDimensionOrder(String))
- whether each image is RGB (isRGB(String))
- whether the pixel bytes are in little-endian order (isLittleEndian(String))
- whether the channels in an image are interleaved (isInterleaved(String))
- the type of pixel data in the file (getPixelType(String))
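
The dimension order is what lets you map a (Z, C, T) position to a plane
number.  The helper below is a hypothetical sketch, not a Bio-Formats method,
and it rests on two assumptions.  First, the three letters after "XY" are
listed from fastest- to slowest-varying, so "XYZCT" stores all Z slices of the
first channel and timepoint first.  Second, every plane holds exactly one
channel, so the image count is sizeZ * sizeC * sizeT.  RGB planes pack
getRGBChannelCount(String) channels into one image, so they would presumably
need a smaller C extent.

```java
// Hypothetical helper (not a Bio-Formats method): maps (z, c, t) to a plane
// number, assuming one channel per plane and dimensions listed from fastest-
// to slowest-varying after "XY".
class PlaneIndex {

  static int getPlaneIndex(String order, int sizeZ, int sizeC, int sizeT,
                           int z, int c, int t) {
    if (order.length() != 5 || !order.startsWith("XY")) {
      throw new IllegalArgumentException("Unexpected dimension order: " + order);
    }
    checkRange("z", z, sizeZ);
    checkRange("c", c, sizeC);
    checkRange("t", t, sizeT);
    int index = 0;
    int stride = 1; // planes spanned by one step along the current axis
    String seen = "";
    for (int i = 2; i < 5; i++) {
      char dim = order.charAt(i);
      if (seen.indexOf(dim) >= 0) {
        throw new IllegalArgumentException("Repeated dimension in " + order);
      }
      seen += dim;
      int pos;
      int size;
      switch (dim) {
        case 'Z': pos = z; size = sizeZ; break;
        case 'C': pos = c; size = sizeC; break;
        case 'T': pos = t; size = sizeT; break;
        default: throw new IllegalArgumentException("Unknown dimension " + dim);
      }
      index += pos * stride;
      stride *= size;
    }
    return index;
  }

  private static void checkRange(String name, int value, int size) {
    if (value < 0 || value >= size) {
      throw new IndexOutOfBoundsException(name + " = " + value + " outside [0, " + size + ")");
    }
  }

  public static void main(String[] args) {
    System.out.println(getPlaneIndex("XYZCT", 5, 2, 3, 1, 1, 2)); // 26
    System.out.println(getPlaneIndex("XYCZT", 5, 2, 3, 1, 1, 2)); // 23
  }
}
```

For example, with order XYZCT, sizeZ = 5, sizeC = 2 and sizeT = 3, the plane at
z = 1, c = 1, t = 2 is 1 + 1*5 + 2*(5*2) = 26.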

All file formats are guaranteed to report core metadata accurately.

Format-specific metadata refers to any other data stored in the file, such as
acquisition and hardware parameters.  This data is stored internally in a
java.util.Hashtable and can be accessed in two ways: the method
getMetadataValue(String, String) returns the value of a single key, while
getMetadata(String) returns the entire Hashtable.  Note that the keys in this
Hashtable differ from format to format, hence the name "format-specific
metadata".
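
Because the keys vary by format, code that consumes format-specific metadata
should tolerate missing keys.  The sketch below works on a plain
java.util.Hashtable standing in for the one returned by getMetadata(String).
The key names "Objective" and "Exposure time" are invented for illustration
and are not keys of any particular format.  MetadataDump is a hypothetical
class name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Hashtable;
import java.util.List;

// Sketch: inspecting a format-specific metadata table without assuming its keys.
class MetadataDump {

  /** Formats every entry as "key = value", sorted by key for stable output. */
  static List<String> describe(Hashtable<String, Object> meta) {
    List<String> keys = new ArrayList<>(meta.keySet());
    Collections.sort(keys);
    List<String> lines = new ArrayList<>();
    for (String k : keys) {
      lines.add(k + " = " + meta.get(k));
    }
    return lines;
  }

  /** Single-key lookup with a fallback, since a key may not exist for this format. */
  static Object valueOr(Hashtable<String, Object> meta, String key, Object fallback) {
    Object v = meta.get(key);
    return v == null ? fallback : v;
  }

  public static void main(String[] args) {
    // Hypothetical entries; real key names depend entirely on the file format.
    Hashtable<String, Object> meta = new Hashtable<>();
    meta.put("Objective", "40x");
    meta.put("Exposure time", 0.25);
    describe(meta).forEach(System.out::println);
    System.out.println(valueOr(meta, "Pinhole", "unknown"));
  }
}
```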

                             File Reading Extras
                           ----------------------

The previous section described how to read pixels as they are stored in the
file.  However, the native format isn't necessarily convenient, so Bio-Formats
provides a few extras to make file reading more flexible.

- loci.formats.FileStitcher implements IFormatReader, and uses advanced
  pattern matching heuristics to group files that belong to the same dataset.
- loci.formats.ChannelSeparator implements IFormatReader, and makes sure that
  all planes are grayscale: RGB images are split into 3 separate grayscale
  images.
- loci.formats.ChannelMerger implements IFormatReader, and merges grayscale
  images into RGB images if the number of channels is greater than 1.
- ImageTools provides a number of methods for manipulating BufferedImages and
  primitive type arrays.  In particular, there are methods to split and merge
  channels in a BufferedImage/array, as well as to convert to a specific data
  type (e.g. convert short data to byte data).
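
Splitting and merging channels amounts to rearranging samples within a plane.
The following sketch is not the ChannelSeparator, ChannelMerger or ImageTools
code, and it assumes 8-bit samples.  It handles both layouts distinguished by
isInterleaved(String): interleaved (RGBRGB...) and planar (RRR...GGG...BBB...).
ChannelSplit is a hypothetical class name.

```java
import java.util.Arrays;

// Sketch (not ChannelSeparator/ChannelMerger/ImageTools code): rearranging
// 8-bit samples between one multi-channel plane and per-channel planes.
class ChannelSplit {

  /**
   * Splits a plane holding `channels` samples per pixel into one grayscale
   * plane per channel. interleaved = true means RGBRGB...; false means
   * RRR...GGG...BBB...
   */
  static byte[][] split(byte[] plane, int channels, boolean interleaved) {
    if (channels <= 0 || plane.length % channels != 0) {
      throw new IllegalArgumentException("plane length not divisible by channel count");
    }
    int pixels = plane.length / channels;
    byte[][] out = new byte[channels][pixels];
    for (int c = 0; c < channels; c++) {
      if (interleaved) {
        for (int i = 0; i < pixels; i++) {
          out[c][i] = plane[i * channels + c];
        }
      } else {
        System.arraycopy(plane, c * pixels, out[c], 0, pixels);
      }
    }
    return out;
  }

  /** Inverse of split(..., true): merges grayscale planes into one interleaved plane. */
  static byte[] mergeInterleaved(byte[][] planes) {
    int channels = planes.length;
    int pixels = planes[0].length;
    byte[] out = new byte[channels * pixels];
    for (int c = 0; c < channels; c++) {
      if (planes[c].length != pixels) {
        throw new IllegalArgumentException("plane sizes differ");
      }
      for (int i = 0; i < pixels; i++) {
        out[i * channels + c] = planes[c][i];
      }
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] rgb = {10, 20, 30, 11, 21, 31};                        // two interleaved RGB pixels
    byte[][] gray = split(rgb, 3, true);
    System.out.println(Arrays.toString(gray[1]));                 // [20, 21]
    System.out.println(Arrays.equals(rgb, mergeInterleaved(gray))); // true
  }
}
```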

                                Writing Files
                        ----------------------------

The following file formats can be written using Bio-Formats:

- TIFF (uncompressed or LZW)
- JPEG
- PNG
- AVI (uncompressed)
- QuickTime (uncompressed is supported natively; additional codecs use QTJava)
- Encapsulated PostScript (EPS)

We are planning support for OME-XML in the near future.

The writer API (see loci.formats.IFormatWriter) is very similar to the reader
API, in that files are written one plane at a time rather than all at once.

All writers allow the output file to be changed before the last plane has
been written.  This allows you to write to any number of output files using
the same writer and output settings (compression, frames per second, etc.),
which is especially useful for formats that do not support multiple images
per file.
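
With Bio-Formats, this means keeping one writer and its settings and switching
its output file between planes.  The sketch below imitates that pattern with
the standard javax.imageio API rather than IFormatWriter, writing each plane
of a stack to its own file.  The PerPlaneWriter class and its base_000.png
naming scheme are hypothetical choices for this example.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

// Sketch using javax.imageio as a stand-in for a single-image-per-file writer.
class PerPlaneWriter {

  /** Writes each plane to its own file in `dir`, all in the same format. */
  static List<File> writePlanes(List<BufferedImage> planes, File dir, String base, String format) {
    List<File> written = new ArrayList<>();
    for (int i = 0; i < planes.size(); i++) {
      // New output file for every plane; the output settings (format) stay the same.
      File out = new File(dir, String.format("%s_%03d.%s", base, i, format));
      try {
        if (!ImageIO.write(planes.get(i), format, out)) {
          throw new IllegalArgumentException("no ImageIO writer for " + format);
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      written.add(out);
    }
    return written;
  }

  public static void main(String[] args) {
    File dir = new File(System.getProperty("java.io.tmpdir"));
    List<BufferedImage> planes = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      planes.add(new BufferedImage(64, 64, BufferedImage.TYPE_BYTE_GRAY));
    }
    for (File f : writePlanes(planes, dir, "stack", "png")) {
      System.out.println(f);
    }
  }
}
```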

A word of warning: IFormatWriter.save(String, Image, boolean) accepts generic
java.awt.Images and converts them to BufferedImages internally.  The problem
is that not all formats support all types of data (e.g. JPEG does not support
16-bit data).  To avoid writing corrupt or invalid files, check that the Image
you supply to save() is supported before saving it.  You can do this with the
isSupportedType and getPixelTypes methods of IFormatWriter.
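
The sketch below mirrors the two steps described above with standard java.awt
classes.  It converts a generic Image to a BufferedImage, then checks the bits
per sample against what the target format accepts.  The SaveGuard class and
its SUPPORTED_BITS table are hypothetical and for illustration only; with a
real writer, ask isSupportedType and getPixelTypes instead of hard-coding a
table.  The conversion draws the Image into a new ARGB BufferedImage.  This is
one common approach, not necessarily what the Bio-Formats writers do
internally.

```java
import java.awt.Graphics2D;
import java.awt.Image;
import java.awt.image.BufferedImage;
import java.util.Map;
import java.util.Set;

// Sketch of pre-save checks; the support table is illustrative, not from Bio-Formats.
class SaveGuard {

  // Hypothetical: bits per sample accepted by each output format.
  private static final Map<String, Set<Integer>> SUPPORTED_BITS = Map.of(
      "jpeg", Set.of(8),
      "tiff", Set.of(8, 16));

  /** Returns img itself if it is already a BufferedImage, otherwise a drawn copy. */
  static BufferedImage toBufferedImage(Image img) {
    if (img instanceof BufferedImage) return (BufferedImage) img;
    int w = img.getWidth(null);
    int h = img.getHeight(null);
    if (w < 0 || h < 0) {
      // Toolkit images may not know their size until they are fully loaded.
      throw new IllegalStateException("image dimensions not yet available");
    }
    BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
    Graphics2D g = out.createGraphics();
    try {
      g.drawImage(img, 0, 0, null);
    } finally {
      g.dispose();
    }
    return out;
  }

  /** Largest number of bits used by any band of the image. */
  static int bitsPerSample(BufferedImage img) {
    int max = 0;
    for (int bits : img.getSampleModel().getSampleSize()) max = Math.max(max, bits);
    return max;
  }

  static boolean canSave(String format, Image img) {
    Set<Integer> bits = SUPPORTED_BITS.get(format);
    return bits != null && bits.contains(bitsPerSample(toBufferedImage(img)));
  }

  public static void main(String[] args) {
    BufferedImage gray16 = new BufferedImage(4, 4, BufferedImage.TYPE_USHORT_GRAY);
    System.out.println(canSave("jpeg", gray16)); // false: 16-bit data
    System.out.println(canSave("tiff", gray16)); // true
  }
}
```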

Please see the Movie Stitcher (loci.apps.stitcher) for an example of how to
write files using Bio-Formats.

                    Arcane Notes and Implementation Details
                 --------------------------------------------

The following is a list of known oddities.

o While the IFormatReader API provides methods to read a plane into either a
  byte array or a BufferedImage, the IFormatWriter API only accepts a
  java.awt.Image.  For completeness, we should add a method to IFormatWriter
  that accepts an array of bytes for writing.  The only reason we haven't done
  so is that no one has needed this feature (yet).  If you think it would be
  useful, it can be added.

o IFormatWriter accepts Image objects, not BufferedImages, yet all writers
  convert the Image to a BufferedImage.  You can still pass in a
  BufferedImage, but you are free to pass in any Image object (this is mainly
  for compatibility with ImageJ).

o All readers have a second openBytes method that takes a pre-allocated byte
  array, but there is no corresponding method for openImage.  The rationale
  behind pre-allocated byte arrays is that (1) array allocation takes a
  relatively long time, and (2) pre-allocation avoids memory spikes on the
  heap.  The reason there isn't something similar for openImage (i.e., a
  method that takes a pre-allocated BufferedImage) is that it's kind of a pain
  to implement, and no one has cared so far.  If you want this method, we can
  work towards adding it.

o Leica LEI files sometimes (actually, frequently) don't look right.
  However, there is a sneaky way of getting Bio-Formats to read them
  correctly: call setColorTableIgnored(true) on the reader object.  Why?
  Well, the LEI file format consists of a "header" file (.lei extension) and
  a set of TIFF files that contain the actual pixel data.  The LEI
  acquisition software allows the user to specify an "acquisition channel";
  in fact, multiple acquisition channels can be specified for each dataset.
  A color lookup table is then applied to the grayscale data for each
  acquisition channel.  This doesn't sound so bad, but it gets really ugly if
  half of the TIFF files in a dataset had an acquisition channel specified
  and half didn't: you get alternating RGB and grayscale planes.  Forcing
  Bio-Formats to ignore the color tables gives you all grayscale planes, but
  preserves the number of channels (so channel merging still works).

o Importing multi-file formats (Leica LEI, PerkinElmer, FV1000 OIF, ICS, and
  Prairie TIFF) can fail if any of the files are renamed.  There are
  "best guess" heuristics in these readers, but they aren't guaranteed to work
  in general.  So please don't rename files in these formats.

o If you are working on a Macintosh, make sure that the data and resource forks
  of your image files are stored together.  Bio-Formats does not handle
  separated forks (the native QuickTime reader tries, but usually fails).

o Through specialized I/O classes, Bio-Formats is able to control the number
  of open file descriptors (in the current JVM).  Currently, the maximum is
  200, which is lower than the default limit on most systems.  Side note on
  I/O: the reasons for writing our own I/O classes (see
  loci.formats.RandomAccessStream) are that (1) InputStreams are fast at
  reading data sequentially, but cannot do random access; (2)
  RandomAccessFiles are great for random access, but less efficient for
  sequential reading; (3) we needed RandomAccessFile-like functionality for
  byte arrays; and (4) we wanted to be able to read from disk, over HTTP, and
  potentially from other sources.  The result is a hybrid class that extends
  InputStream and implements DataInput to meet all of these goals.
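
For the byte-array case, the hybrid design described above fits in a few dozen
lines.  ByteArrayRandomAccess is a hypothetical class written for this
example, not loci.formats.RandomAccessStream, and it leaves out file
descriptor management entirely.  It extends InputStream and implements
DataInput.  It also adds RandomAccessFile-style seek() and getFilePointer(),
and it reuses a java.io.DataInputStream internally to decode big-endian
values.

```java
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch (not loci.formats.RandomAccessStream): a byte-array
 * reader that is both an InputStream and a DataInput, with RandomAccessFile-
 * style seek() and getFilePointer().
 */
class ByteArrayRandomAccess extends InputStream implements DataInput {
  private final byte[] data;
  private int pos;
  // Decodes big-endian primitives by pulling bytes through this object's read methods.
  private final DataInputStream decoder = new DataInputStream(this);

  ByteArrayRandomAccess(byte[] data) {
    this.data = data;
  }

  public void seek(long position) {
    if (position < 0 || position > data.length) {
      throw new IllegalArgumentException("position out of range: " + position);
    }
    pos = (int) position;
  }

  public long getFilePointer() { return pos; }

  public long length() { return data.length; }

  // ---- InputStream: sequential access ----
  @Override public int read() {
    return pos < data.length ? data[pos++] & 0xFF : -1;
  }

  @Override public int read(byte[] b, int off, int len) {
    if (off < 0 || len < 0 || len > b.length - off) throw new IndexOutOfBoundsException();
    if (len == 0) return 0;
    if (pos >= data.length) return -1;
    int n = Math.min(len, data.length - pos);
    System.arraycopy(data, pos, b, off, n);
    pos += n;
    return n;
  }

  @Override public int available() { return data.length - pos; }

  // ---- DataInput: typed reads at the current position ----
  @Override public void readFully(byte[] b) throws IOException { readFully(b, 0, b.length); }

  @Override public void readFully(byte[] b, int off, int len) throws IOException {
    if (len > data.length - pos) throw new EOFException();
    System.arraycopy(data, pos, b, off, len);
    pos += len;
  }

  @Override public int skipBytes(int n) {
    int skipped = Math.max(0, Math.min(n, data.length - pos));
    pos += skipped;
    return skipped;
  }

  @Override public boolean readBoolean() throws IOException { return decoder.readBoolean(); }
  @Override public byte readByte() throws IOException { return decoder.readByte(); }
  @Override public int readUnsignedByte() throws IOException { return decoder.readUnsignedByte(); }
  @Override public short readShort() throws IOException { return decoder.readShort(); }
  @Override public int readUnsignedShort() throws IOException { return decoder.readUnsignedShort(); }
  @Override public char readChar() throws IOException { return decoder.readChar(); }
  @Override public int readInt() throws IOException { return decoder.readInt(); }
  @Override public long readLong() throws IOException { return decoder.readLong(); }
  @Override public float readFloat() throws IOException { return decoder.readFloat(); }
  @Override public double readDouble() throws IOException { return decoder.readDouble(); }
  @Override public String readUTF() throws IOException { return DataInputStream.readUTF(this); }

  /** Reads bytes up to "\n", "\r" or "\r\n", per the DataInput contract. */
  @Override public String readLine() {
    if (pos >= data.length) return null;
    StringBuilder sb = new StringBuilder();
    while (pos < data.length) {
      int ch = data[pos++] & 0xFF;
      if (ch == '\n') break;
      if (ch == '\r') {
        if (pos < data.length && data[pos] == '\n') pos++;
        break;
      }
      sb.append((char) ch);
    }
    return sb.toString();
  }
}
```

Because the class is an InputStream, it can be handed to stream-based code for
fast sequential reads.  Because it is a DataInput with seek(), it can also jump
straight to, say, a header field at a known byte offset.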