Skip to content

DL3 Event List with EventPreprocessor#3004

Open
kosack wants to merge 11 commits into
mainfrom
feature/dl3_eventlist_with_processor
Open

DL3 Event List with EventPreprocessor#3004
kosack wants to merge 11 commits into
mainfrom
feature/dl3_eventlist_with_processor

Conversation

@kosack

@kosack kosack commented May 18, 2026

Copy link
Copy Markdown
Member

This adds a FeatureSet to the EventPreprocessor for transforming DL2 to DL3 observed events. This is a work-in-progress to see what is needed to replace some of what is in #2979

  • To support coordinate transforms, I added a altaz_to_icrs helper function, and allow arbitrary objects to be passed into the FeatureGenerator (in this case the observatory location), and also allow the subarray to be passed in to init.
  • There are now two modes: DROP (cuts events) and MARK (creates new boolean columns for each quality query). The latter is useful for debugging or benchmarking.

TODO:

Example:

# a fully applied DL2 file (has to have gammaness and energy, and I add a fake time column)
test_dl2 = "/Users/kkosack/Data/CTA/Prod6/North_Alpha_Zen20_Az180_NSB01/gamma_20deg_180deg_run002012___cta-prod6-2156m-LaPalma-dark+magic_cone10.01_alpha_test_applied.DL2.h5"
# output of ctapipe-optimize
test_cuts = "/Users/kkosack/Projects/CTA/PipeWork/TestIRFs/Selection_Cuts.fits" 

cuts = OptimizationResult.read(test_cuts)
apply_gammaness_cut = partial(evaluate_binned_cut, cut_table=cuts.gh_cuts, op=operator.ge) 

with TableLoader(test_dl2, dl2=True, pointing=True, observation_info=True) as loader:
      processed = []
      preprocess = EventPreprocessor(feature_set="dl2_to_dl3", mode="KEEP", subarray=loader.subarray)

      for chunk in tqdm(
          loader.read_subarray_events_chunked(chunk_size=10_000, stop=100_000)
      ):
          processed.append(
              preprocess(
                  chunk.data, 
                  gammaness_cut_function=apply_gammaness_cut, 
                  multiplicity_cut_function=lambda m,e: m>=4
              )
          )
processed = QTable(vstack(processed))
EVENT_ID TIME RA DEC ENERGY ALT AZ FOV_LON FOV_LAT GAMMANESS MULTIP HMAX PASSED_GH
3611 0.0 36.39662350967956 13.181637086091095 0.7886559320362269 73.47979669912594 191.68312922045783 3.308260464769828 3.7976173873254258 0.635594432581715 2 10549.88363817283 False
7815 0.0 41.497736016190096 11.13345829907106 0.1201553049782346 71.70613976698166 174.58745551052763 -1.6974919647423072 1.7815269359464097 0.5352792594560092 2 11165.816231391931 False
10605 0.0 nan nan nan nan nan nan nan nan 0 nan False
10607 0.0 44.119883677161845 9.231156218646664 4.5139644334844435 69.48510550212504 167.61302288026295 -4.311362381591428 -0.07564972460559523 0.855976625014405 5 9145.982531426409 True
22312 0.0 nan nan nan nan nan nan nan nan 0 nan False
36609 0.0 nan nan nan nan nan nan nan nan 0 nan False
36611 0.0 39.32701254373573 6.380309010565509 0.1071564794967893 67.0024948143038 181.01639543139208 0.3976198153673294 -2.994190390275193 0.844323214214656 3 11816.080167982072 False
42217 0.0 39.758252574150056 13.182609322182753 0.9676237982439976 73.8115075459884 180.0999841805689 0.02793579401849974 3.8115311089855477 0.8351008610837526 3 11384.874142945158 True
101604 0.0 35.49254214507106 7.759876584987491 0.8787181468947538 67.98702098851416 191.28771591764462 4.208977246630804 -1.622427043094159 0.8820927737812743 2 9579.43598772079 True
... ... ... ... ... ... ... ... ... ... ...
preprocess.quality_query.to_table()
criteria counts cumulative_counts
TOTAL 100000 100000
VALID_RECO 59895 59895
PASSED_GH 21700 21700

Running with mode="DROP" instead, gives this:

EVENT_ID TIME RA DEC ENERGY ALT AZ FOV_LON FOV_LAT GAMMANESS MULTIP HMAX
10607 0.0 44.119883677161845 9.231156218646664 4.5139644334844435 69.48510550212504 167.61302288026295 -4.311362381591428 -0.07564972460559523 0.855976625014405 5 9145.982531426409
42217 0.0 39.758252574150056 13.182609322182753 0.9676237982439976 73.8115075459884 180.0999841805689 0.02793579401849974 3.8115311089855477 0.8351008610837526 3 11384.874142945158
101604 0.0 35.49254214507106 7.759876584987491 0.8787181468947538 67.98702098851416 191.28771591764462 4.208977246630804 -1.622427043094159 0.8820927737812743 2 9579.43598772079
101614 0.0 35.10404178608531 7.761124864610876 0.8044300019185018 67.91578787905237 192.2955001287401 4.59407965049234 -1.6196555603949843 0.9534247605630375 2 9819.945429768006
158707 0.0 43.70203610110545 8.837011833749948 8.340466870436115 69.16713653769105 168.9553064577644 -3.906852221775937 -0.4781807764333982 0.6671689461596626 7 10390.57041889935
243205 0.0 38.667999787833736 4.082779296133155 7.353141833650544 64.68325267947209 182.42326261654924 1.040428102885588 -5.296069400797796 0.776961403714988 3 9055.18240057809
243217 0.0 38.64339135111972 4.0392617588179895 6.952456093580335 64.63865147450528 182.47588580284236 1.0648352169273947 -5.339725880239597 0.83467228299174 3 9851.006604209662
329010 0.0 43.715917424242946 9.25592194577844 3.1268765514540457 69.57828174283871 168.72396990259503 -3.9122085316907196 -0.0590712363524156 0.8818997025060955 4 8863.360606510592
353201 0.0 37.92246434262939 9.23761178505964 0.2548752624380273 69.77801219205789 185.22961361037795 1.805453423615171 -0.14451968919799074 0.8003449970879816 3 9717.493291877923
... ... ... ... ... ... ... ...  

And just to check it all worked:

FOV = 30
plt.hist2d(
    processed["FOV_LAT"].to_value("deg"),
    processed["FOV_LON"].to_value("deg"),
    range=[[-FOV / 2, FOV / 2], [-FOV / 2, FOV / 2]],
    bins=[50, 50],
)
plt.gca().set_aspect(1)
image

the configuration it generates:

EventPreprocessor:
  FeatureGenerator:
    features:
    - [EVENT_ID, event_id]
    - [TIME, time],
    - [ALT, HillasReconstructor_alt]
    - [AZ, HillasReconstructor_az]
    - - reco_fov_coord
      - altaz_to_nominal(AZ, ALT, subarray_pointing_lon, subarray_pointing_lat)
    - - FOV_LON
      - reco_fov_coord[:,0]
    - - FOV_LAT
      - reco_fov_coord[:,1]
    - - reco_icrs_coord
      - altaz_to_icrs(AZ, ALT, TIME, subarray.reference_location)
    - - RA
      - reco_icrs_coord[:,0]
    - - DEC
      - reco_icrs_coord[:,1]
    - - ENERGY
      - RandomForestRegressor_energy
    - - GAMMANESS
      - RandomForestClassifier_prediction
    - - MULTIP
      - subarray.multiplicity(HillasReconstructor_telescopes)
    - - HMAX
      - HillasReconstructor_h_max
    - - passed_gh
      - apply_gammaness_cut(GAMMANESS, ENERGY)
  QualityQuery:
    quality_criteria:
    - - VALID_RECO
      - HillasReconstructor_is_valid
    - - PASSED_GH
      - passed_gh
  energy_reconstructor: RandomForestRegressor
  feature_set: dl2_to_dl3
  features:
  - EVENT_ID
  - TIME
  - RA
  - DEC
  - ENERGY
  - ALT
  - AZ
  - FOV_LON
  - FOV_LAT
  - GAMMANESS
  - MULTIP
  - HMAX
  gammaness_reconstructor: RandomForestClassifier
  geometry_reconstructor: HillasReconstructor
  mode: EventPreprocessorMode.DROP

@kosack kosack mentioned this pull request May 18, 2026
@kosack

kosack commented Jun 1, 2026

Copy link
Copy Markdown
Member Author

One thing of course missing is x_max, which we should have, but then we need to think about how to get th atmosphere profile, which would come from calibpipe, so I'll leave that for a future version. We have h_max, however, so can add that now, even if it's not in GADF

@maxnoe

maxnoe commented Jun 1, 2026

Copy link
Copy Markdown
Member

x_max

x_max really is not that interesting at DL3 for gamma rays, is it?

I wouldn't worry about filling that, we won't get good estimates for mono anyways.

@kosack

kosack commented Jun 1, 2026

Copy link
Copy Markdown
Member Author

x_max really is not that interesting at DL3 for gamma rays, is it?
I wouldn't worry about filling that, we won't get good estimates for mono anyways.

It's more for CR studies, but probably then you want DL2 anyhow (but maybe with event types...).

@kosack kosack force-pushed the feature/dl3_eventlist_with_processor branch from 35c993a to 4125df3 Compare June 1, 2026 15:05
@kosack

kosack commented Jun 2, 2026

Copy link
Copy Markdown
Member Author

It would be nice to add a tutorial here, but I lack a nice test dataset of DL2 events with energy and gammaness applied and an optimized cuts file. We should create one, or else make a fixture to generate one. Maybe for another PR?

@kosack kosack marked this pull request as ready for review June 2, 2026 15:36
@kosack kosack requested review from maxnoe and mdebony June 2, 2026 15:36
)
icrs_coord = event_coord.transform_to("icrs")

return u.Quantity(np.column_stack((icrs_coord.ra.deg, icrs_coord.dec.deg)), u.deg)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to return the SkyCoord object and then do coord.ra.to(u.deg) in the transform isntead of filling a 2d array and then indexing into that?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just to output a simple 2D array, not a SkyCoord: astropy.table supports writing columns of SkyCoords, but our io.write_table function doesn't.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But looking at the examples and the DL3 config, the goal seems to anyway be to get two separate columns? The 2d array is never actually used.

@kosack kosack Jun 8, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is used: it's a bit confusing maybe, but there is a 2D feature genereated with this function, and then the following two features extract the single columns. THat was the most efficient way to do this without calculating twice.

e.g.:

            (
                "reco_fov_coord",
                "altaz_to_nominal(AZ, ALT, subarray_pointing_lon, subarray_pointing_lat)",
            ),  # makes the 2D coordinate
            ("FOV_LON", "reco_fov_coord[:,0]"), # makes single column 
            ("FOV_LAT", "reco_fov_coord[:,1]"),  # makes single column

The 2D column (reco_fov_coord) is then not included in the EventPreprocessor output, so it's temporary only.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I wrote above, right?

and then indexing into that

You can just leave the skycoord and then in the feature generator you access its fields instead of indexing into the array.

@mdebony

mdebony commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

It would be nice to add a tutorial here, but I lack a nice test dataset of DL2 events with energy and gammaness applied and an optimized cuts file. We should create one, or else make a fixture to generate one. Maybe for another PR?

If you look in the #2979, I've implemented fixtures that generate the optimized cuts files from existing test files.

("passed_gh", "apply_gammaness_cut(GAMMANESS, ENERGY)"),
],
"quality_criteria": [
("VALID_RECO", f"{preprocessor.geometry_reconstructor}_is_valid"),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As quality criteria are independant for each reconstructor, I think it would be safer to perform an overall "and" on reconstructor quality cut.

@kosack kosack Jun 4, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will add criteria also for energy and gammaness. It's true they could be different, and it's useful to track

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also realized I forgot the MULTIP cut, so I added a similar function hook where we can add a function of f(multiplicity, energy) that applies it. That should also come out of the OptimizationResult.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about it, but perhaps for the future after we refactor the optimization code to use this as well, would be to just copy over the OptimizationResult.quality_query directly to the QualityQuery here, rather than allowing the user to attach them - it's already stored in the output. That way we are sure to have the same selection applied.

I have to think what would be the cleanest way to do that, while keeping this class a bit generic. I think one option is to expand the OptimizationResult class to be a bit smarter: it should also handle the actual cutting, and then we can just pass it in like the Subarray, rather than require gammaness_cut_function and multiplicity_cut_function to be passed into the EventPreprocessor on reconstruction.

In that scenario, I would just have:

("PASSED_GH", "optimization_result.pass_gh(GAMMANESS, ENERGY)"),
("PASSED_MULTIP", "optimization_result.pass_multip(MULTIP, ENERGY)"

@kosack

kosack commented Jun 4, 2026

Copy link
Copy Markdown
Member Author

If you look in the #2979, I've implemented fixtures that generate the optimized cuts files from existing test files.

Great, then once that is merged, we can add a more fancy test here. Right now, I just pass in very simplisitic cut functions in the test, but eventually we can have an integration test that checks the whole thing.

@kosack

kosack commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

To use Codex here, create a Codex account and connect to github.

@maxnoe

maxnoe commented Jun 11, 2026

Copy link
Copy Markdown
Member

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Nice work!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@kosack kosack force-pushed the feature/dl3_eventlist_with_processor branch from 9d02316 to b7a66e5 Compare June 16, 2026 07:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add "column" mode to EventPreprocessor: keep but mark events that don't pass qualityquery

3 participants