Ticket #19 (closed defect: fixed)

Opened 13 years ago

Last modified 12 years ago

File pattern detection sometimes depends on the file chosen as the basis

Reported by: curtis
Owned by: curtis
Priority: minor
Milestone:
Component: bio-formats
Severity: serious
Keywords:
Cc:
Blocked By:
Blocking:

Description

Some file groups are not always properly detected due to variations in axis length. For example:

Series004_t000.tif - Series004_t999.tif
Series005_t000.tif - Series005_t999.tif
Series006_t000.tif - Series006_t999.tif
Series007_t000.tif - Series007_t999.tif
Series008_t000.tif - Series008_t476.tif
Series009_t000.tif - Series009_t999.tif

Given Series004_t000.tif from the above collection of files, FilePattern would detect the pattern as Series00<4-9>_t<000-999>.tif. Unfortunately, for Series008, t ranges only from 000 to 476; the files Series008_t<477-999>.tif do not exist, so the detected pattern includes missing image planes.

The problem occurs because FilePattern, in the interest of efficiency, scans along each numerical block individually, starting from the basis file, rather than checking every combination of every numerical block. Thus, if Series008_t000.tif is chosen as the basis instead, the pattern is detected as Series00<4-9>_t<000-476>.tif, which has no missing image planes.
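To make the dependence on the basis file concrete, here is a minimal Python sketch of block-by-block detection. It is not the FilePattern code: the function name detect_pattern, the in-memory file list, and the assumption that each block spans a gap-free range between its smallest and largest value are all illustrative.

```python
import os
import re

def detect_pattern(basis, files):
    """Vary one numeric block at a time; basis must be one of the names in files."""
    parts = re.split(r"(\d+)", basis)  # odd indices hold the numeric blocks
    out = list(parts)
    for i in range(1, len(parts), 2):
        # Hold every other block at the basis value and let only block i vary.
        probe = re.compile(
            re.escape("".join(parts[:i]))
            + r"(\d{%d})" % len(parts[i])
            + re.escape("".join(parts[i + 1:]))
        )
        values = sorted(m.group(1) for m in map(probe.fullmatch, files) if m)
        lo, hi = values[0], values[-1]  # assumes no gaps between lo and hi
        if lo != hi:
            n = len(os.path.commonprefix([lo, hi]))  # shared leading digits
            out[i] = f"{lo[:n]}<{lo[n:]}-{hi[n:]}>"
    return "".join(out)

# The file collection from this ticket: Series008 stops at t476.
files = [
    f"Series{s:03d}_t{t:03d}.tif"
    for s in range(4, 10)
    for t in range(477 if s == 8 else 1000)
]

print(detect_pattern("Series004_t000.tif", files))
print(detect_pattern("Series008_t000.tif", files))
```

The two calls return different patterns for the same set of files, because the t block is only ever scanned at the Series value of the basis file.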

It is not entirely clear how to fix this problem. In nearly every such case, the files are not intended to be stitched together into a single collection. In the example above, each series is supposed to be a separate collection.

One option is for FileStitcher to intelligently divide incompatible numerical blocks into distinct series. In the example above, the reader would report six separate series, each free to have a different number of time points. Detecting whether a given dimensional axis is actually a "series" axis would require a new heuristic in AxisGuesser.
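A rough Python sketch of the multi-series idea follows. It is not FileStitcher or AxisGuesser code: the names split_into_series and is_series_axis are hypothetical, and the sketch assumes every file name contains exactly two numeric blocks (outer, then inner).

```python
import re
from collections import defaultdict

# Assumption: every name has exactly two numeric blocks (outer, then inner).
NAME = re.compile(r"(\D*)(\d+)(\D+)(\d+)(\D*)")

def split_into_series(files):
    """Return one pattern per outer-block value, each with its own inner bounds."""
    inner_values = defaultdict(list)
    for name in files:
        m = NAME.fullmatch(name)
        if m:
            head, outer, mid, inner, tail = m.groups()
            inner_values[(head, outer, mid, tail)].append(inner)
    return [
        f"{head}{outer}{mid}<{min(v)}-{max(v)}>{tail}"
        for (head, outer, mid, tail), v in sorted(inner_values.items())
    ]

def is_series_axis(patterns):
    """Hypothetical heuristic: treat the outer block as a series axis
    when the inner bounds differ from one outer value to the next."""
    bounds = {re.search(r"<\d+-\d+>", p).group() for p in patterns}
    return len(bounds) > 1

# The file collection from this ticket: Series008 stops at t476.
files = [
    f"Series{s:03d}_t{t:03d}.tif"
    for s in range(4, 10)
    for t in range(477 if s == 8 else 1000)
]

series = split_into_series(files)
for pattern in series:
    print(pattern)
print(is_series_axis(series))
```

Each of the six patterns carries its own t bounds, so no image planes are missing and the result does not depend on any basis file.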

Alternatively, the file pattern detector could simply report the shorter pattern regardless of which file is chosen as the basis. To detect the bounds correctly and efficiently in this case, the existence of more files must be checked. It makes sense to start with the shortest axes and work from there. For example, Series has only six elements, so the code could check the min and max bounds for t at each Series position. Generalizing such an approach would require some thought.
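For the two-block example, the "shorter pattern" idea could be sketched in Python as below. The function name common_bounds is hypothetical, and the sketch works from a complete file listing rather than from individual existence checks, so it sidesteps the efficiency question. It also assumes zero-padded blocks of equal width, so that string comparison matches numeric order.

```python
import re
from collections import defaultdict

# Assumption: every name has exactly two numeric blocks (Series, then t).
NAME = re.compile(r"\D*(\d+)\D+(\d+)\D*")

def common_bounds(files):
    """Return (series_lo, series_hi, t_lo, t_hi), where the t bounds are
    those shared by every Series position."""
    t_by_series = defaultdict(list)
    for name in files:
        m = NAME.fullmatch(name)
        if m:
            t_by_series[m.group(1)].append(m.group(2))
    # Check min and max of t at each Series position, then keep the overlap.
    t_lo = max(min(v) for v in t_by_series.values())
    t_hi = min(max(v) for v in t_by_series.values())
    return min(t_by_series), max(t_by_series), t_lo, t_hi

# The file collection from this ticket: Series008 stops at t476.
files = [
    f"Series{s:03d}_t{t:03d}.tif"
    for s in range(4, 10)
    for t in range(477 if s == 8 else 1000)
]

print(common_bounds(files))
```

Since no basis file is involved, the result is the same however detection is started; the cost is that t477 through t999 of the other series are left out of the pattern.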

A third option would be to stick with the wider bounds but return null for missing image planes. But that solution does not correct the problem of the pattern detection results being dependent on which file is chosen as the basis. This dependence is downright baffling from an end-user perspective, and should be corrected.
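As an illustration of the third option only, the Python sketch below keeps the wider bounds and maps each missing plane to None (Python's counterpart of null). The function name plane_table and the hard-coded name format are assumptions made for this example.

```python
def plane_table(series_range, t_range, present):
    """Map every (series, t) position in the wide pattern to a file name,
    or to None when that file does not exist."""
    table = {}
    for s in series_range:
        for t in t_range:
            name = f"Series{s:03d}_t{t:03d}.tif"
            table[(s, t)] = name if name in present else None
    return table

# The file collection from this ticket: Series008 stops at t476.
present = {
    f"Series{s:03d}_t{t:03d}.tif"
    for s in range(4, 10)
    for t in range(477 if s == 8 else 1000)
}

table = plane_table(range(4, 10), range(1000), present)
missing = [key for key, name in table.items() if name is None]
print(len(missing))
```

The caller still has to know the wide bounds in the first place, which is exactly the part that depends on the basis file.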

The multi-series solution is probably strongest in the long term, if the implementation details can be worked out.

Change History

comment:1 Changed 12 years ago by curtis

  • Status changed from new to closed
  • Resolution set to fixed
  • Severity set to serious

Fixed in r3838. For now, we simply redetect the pattern from the first file in the pattern if that file was not the one originally used as the basis. This technique works, but does not implement any of the more sophisticated schemes discussed above, such as dividing a pattern into multiple image series.
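The redetection step can be sketched in Python as follows. This is one reading of the comment above, not the r3838 code: the names first_file, redetect_from_first, and toy_detect are hypothetical, and toy_detect is a stand-in that merely reproduces the two outcomes described in this ticket.

```python
import re

BLOCK = re.compile(r"<(\d+)-\d+>")

def first_file(pattern):
    """Expand a pattern to its first file by taking each block's lower bound."""
    return BLOCK.sub(r"\1", pattern)

def redetect_from_first(basis, detect):
    """Detect from basis, then redetect from the pattern's first file
    if that file was not the basis."""
    pattern = detect(basis)
    first = first_file(pattern)
    return pattern if first == basis else detect(first)

def toy_detect(basis):
    # Stand-in detector mimicking the basis dependence reported in this ticket.
    if basis.startswith("Series008"):
        return "Series00<4-9>_t<000-476>.tif"
    return "Series00<4-9>_t<000-999>.tif"

print(redetect_from_first("Series004_t000.tif", toy_detect))
print(redetect_from_first("Series008_t000.tif", toy_detect))
```

Both calls now yield the same pattern, although in this example it is the wider one, so the missing Series008 planes remain part of it.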
