In Blender it's possible to bake an audio file to f-curves and use the result to drive a simple visualization. I wanted something much more detailed (note by note), so I tried building a new workflow. It turned out to be fairly successful and convenient.
The basic sequence is:
- Create MIDI track(s) and corresponding audio track as you desire.
- Convert MIDI track(s) to CSV file(s) containing timed note events. [I used a small Java program for the conversion; it's not quite ready to publish yet. Of course, one could also create the CSV data directly.]
- Import CSV file(s) into Geometry Nodes.
- Process note events. Most of the processing is in a reusable node group.
- Apply the processed data to geometry as you desire.
- Add the animation and the audio track to the Video Sequence Editor (VSE), and render.
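
To give a rough idea of what "timed note events" and "process note events" could mean in practice, here is a minimal Python sketch that runs outside Blender. It is not the actual converter or node group: the CSV columns (`start`, `duration`, `pitch`, `velocity`), the sample data, the helper names `load_notes` and `active_pitches`, and the 24 fps frame rate are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical CSV layout (an assumption, not the format used in the post):
# one row per note, with start time and duration in seconds.
SAMPLE_CSV = """start,duration,pitch,velocity
0.0,0.5,60,100
0.5,0.5,64,90
0.75,1.0,67,80
"""


def load_notes(text):
    """Parse CSV text into a list of note dicts with start and end times."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "start": float(row["start"]),
            "end": float(row["start"]) + float(row["duration"]),
            "pitch": int(row["pitch"]),
            "velocity": int(row["velocity"]),
        }
        for row in reader
    ]


def active_pitches(notes, frame, fps=24):
    """Return the sorted MIDI pitches sounding at the given frame.

    A note counts as active from its start time up to, but not
    including, its end time. The default fps of 24 is an assumption.
    """
    t = frame / fps
    return sorted(n["pitch"] for n in notes if n["start"] <= t < n["end"])


notes = load_notes(SAMPLE_CSV)
```

Calling `active_pitches(notes, frame)` for each frame of the animation gives the set of notes that should be "lit up" at that moment; inside Geometry Nodes the same comparison would be made against the scene time rather than in Python.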
That's pretty darn cool. I'm impressed. Think of the possibilities. I expect great things from you!