Ticket #296 (closed defect: fixed)

Opened 12 years ago

Last modified 9 years ago

Stitching multiple series needs improvement

Reported by: curtis
Owned by: melissa
Priority: minor
Milestone: bio-formats-4.2.1
Component: bio-formats
Severity: serious
Keywords:
Cc:
Blocked By:
Blocking:

Description

The FileStitcher and AxisGuesser classes currently have logic for stitching together multi-series datasets distributed across multiple files. The organization of such datasets could vary greatly, but see dicom/john/Fake Set for one example.

There are two possible situations:

  1. Each file contains multiple image series (presumably the same number of series per file).
  2. Image series are distributed using file numbering.

In addition, a dataset could conceivably fall into both categories. The problem in either case is that each series can have its own image count with differing sizeZ, sizeC and sizeT values. The current code does not handle such differences elegantly.
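
To make the mismatch concrete, here is a minimal sketch of per-series dimensions and a uniformity check. SeriesDims is a hypothetical helper written for this ticket, not an existing Bio-Formats class, and it assumes one image plane per (Z, C, T) position.

```java
import java.util.List;

/** Hypothetical holder for one series' dimensions; not a Bio-Formats class. */
final class SeriesDims {
  final int sizeZ;
  final int sizeC;
  final int sizeT;

  SeriesDims(int sizeZ, int sizeC, int sizeT) {
    this.sizeZ = sizeZ;
    this.sizeC = sizeC;
    this.sizeT = sizeT;
  }

  /** Plane count, assuming one plane per (Z, C, T) position. */
  int imageCount() {
    return sizeZ * sizeC * sizeT;
  }

  /** True if the other series has identical Z, C and T sizes. */
  boolean sameShape(SeriesDims other) {
    return sizeZ == other.sizeZ && sizeC == other.sizeC && sizeT == other.sizeT;
  }

  /** True if every series shares the dimensions of the first one. */
  static boolean uniform(List<SeriesDims> series) {
    for (SeriesDims s : series) {
      if (!s.sameShape(series.get(0))) return false;
    }
    return true;
  }
}
```

When uniform returns false, a single set of sizeZ, sizeC and sizeT values cannot describe the whole stitched dataset, which is the case the current code handles poorly.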

As a first step, I would like to remove the multi-series logic from FileStitcher altogether. The resulting code should be significantly shorter and easier to understand.

Once the multi-series code has been removed, we should reexamine how best to handle multi-series stitched datasets. Some considerations:

  1. A single file pattern fundamentally cannot represent multiple series distributed across files using one or more "series" axis types, since the resultant pattern may not be rectangular (that is, not every combination of axis values necessarily corresponds to a file). Rather, each series should have its own pattern.
  2. We can detect when the files in a folder are potentially problematic by checking whether the "pattern reduction" logic is invoked (see source:trunk/loci/formats/FilePattern.java, at the bottom of the findPattern method around line 459). When we detect this situation, we could invoke more thorough filename analysis logic to puzzle out which numerical axes represent series variability, ultimately producing one file pattern per series.
  3. We could check up front whether all files in the pattern "exist" (are on the list), and if not, make inferences from which ones are missing.
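
As a rough illustration of considerations 1 and 3, the sketch below tests whether a list of file names is "rectangular" and splits it by one numeric field. SeriesAxisSketch and its methods are hypothetical names, not part of FilePattern or FileStitcher. It assumes every name contains the same number of numeric fields, that the extension contains no digits, and that the caller already knows which field is the series axis.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not part of FilePattern or FileStitcher. */
final class SeriesAxisSketch {
  private static final Pattern DIGITS = Pattern.compile("\\d+");

  /** Extracts every run of digits in a file name, in order of appearance. */
  static List<Integer> numericFields(String name) {
    List<Integer> fields = new ArrayList<>();
    Matcher m = DIGITS.matcher(name);
    while (m.find()) fields.add(Integer.parseInt(m.group()));
    return fields;
  }

  /**
   * True if the names cover every combination of the values seen on each
   * numeric axis, i.e. no file is "missing" from the implied pattern.
   */
  static boolean isRectangular(List<String> names) {
    if (names.isEmpty()) return true;
    List<TreeSet<Integer>> axes = new ArrayList<>();
    Set<List<Integer>> seen = new HashSet<>();
    for (String name : names) {
      List<Integer> fields = numericFields(name);
      for (int i = 0; i < fields.size(); i++) {
        if (axes.size() <= i) axes.add(new TreeSet<>());
        axes.get(i).add(fields.get(i));
      }
      seen.add(fields);
    }
    // A rectangular set has exactly one file per combination of axis values.
    long expected = 1;
    for (TreeSet<Integer> axis : axes) expected *= axis.size();
    return expected == seen.size();
  }

  /** Groups names by the value of one numeric field (the assumed series axis). */
  static Map<Integer, List<String>> groupByAxis(List<String> names, int axis) {
    Map<Integer, List<String>> groups = new TreeMap<>();
    for (String name : names) {
      int key = numericFields(name).get(axis);
      groups.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
    }
    return groups;
  }
}
```

For example, the names S0_T0.tif, S0_T1.tif, S0_T2.tif, S1_T0.tif and S1_T1.tif are not rectangular as a whole, but grouping on field 0 yields two groups that each are, so each could get its own pattern. Deciding which field is the series axis remains the hard part, and this sketch does not attempt it.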

Regardless, there will be pathological situations where we cannot know which axes represent multiple series, just as we cannot always tell which are Z, T or C axes.

We should consider encapsulating series-specific logic into a new class if possible, to avoid complicating the FileStitcher class any more than necessary.

Change History

comment:1 Changed 9 years ago by melissa

  • Owner changed from curtis to melissa
  • Milestone set to bio-formats-4.2.1

This is related to #542.

comment:2 Changed 9 years ago by melissa

  • Status changed from new to closed
  • Resolution set to fixed

(In [7021]) Updated FileStitcher and FilePattern to better handle multi-series datasets in which the series axis is spread across multiple files. Closes #296, closes #542.

Note that datasets such as S<0-2>_T<0-10>.lif with each file containing multiple series are still unsupported. However, attempting to stitch datasets of this type will result in an informative exception.
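
For reference, this sketch shows which file names a pattern like the one above denotes. PatternExpansionSketch is a hypothetical helper, not the FilePattern implementation. It handles only plain numeric blocks of the form <first-last> and ignores zero padding and any other block syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; handles only simple <first-last> numeric blocks. */
final class PatternExpansionSketch {
  // Matches a numeric block such as <0-10>.
  private static final Pattern BLOCK = Pattern.compile("<(\\d+)-(\\d+)>");

  /** Expands the leftmost block, then recurses on each resulting string. */
  static List<String> expand(String pattern) {
    List<String> results = new ArrayList<>();
    Matcher m = BLOCK.matcher(pattern);
    if (!m.find()) {
      // No blocks left: the string is a concrete file name.
      results.add(pattern);
      return results;
    }
    int first = Integer.parseInt(m.group(1));
    int last = Integer.parseInt(m.group(2));
    String prefix = pattern.substring(0, m.start());
    String suffix = pattern.substring(m.end());
    for (int i = first; i <= last; i++) {
      results.addAll(expand(prefix + i + suffix));
    }
    return results;
  }
}
```

Calling expand("S<0-2>_T<0-10>.lif") yields 33 names, from S0_T0.lif through S2_T10.lif. The unsupported case is the one where each of those files additionally contains multiple series of its own.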
