we fixed groupby issue in cuj2; sync the change into nightly test. this test doesn't need to use gpu at all. it returns soon after data ingestion finishes.
This PR does two things:
merge latest groupby based filtering to CUJ2
add a debug mode so we only run dummy trainer for measure data processing performance.