On April 7, 2026, Matt Mireles released gemma-tuner-multimodal, a fine-tuning toolkit he developed over six months for training Gemma 4 models locally on Apple Silicon. The Show HN post reached 117 points with 15 comments, while an accompanying X announcement received 163 likes and 184 bookmarks, indicating strong community interest in local multimodal fine-tuning.
Developer Builds Custom Streaming Pipeline for Limited Compute Budgets
Mireles originally created the project to fine-tune Whisper locally on his M2 Ultra Mac Studio with a limited compute budget. Facing a dataset of 15,000 hours of audio stored in Google Cloud Storage—too large to fit on local storage—he built a system to stream training data directly from cloud storage during training sessions.
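The toolkit's internals aren't described in detail, but the core streaming idea can be sketched in a few lines: read fixed-size chunks from a remote file-like object so that only the chunk currently being processed occupies local memory. In a real pipeline the file-like object would come from a cloud storage client (for example, the google-cloud-storage library's `Blob.open("rb")`); the in-memory buffer below is a stand-in so the sketch is self-contained.

```python
import io

def stream_chunks(fileobj, chunk_size=4096):
    """Yield fixed-size byte chunks from a file-like object without
    loading the whole file. With a cloud-storage file handle in place
    of the local buffer, this is the basic pattern for training on a
    dataset too large for local disk."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Stand-in for a remote audio blob: 10 KiB of silence.
blob = io.BytesIO(b"\x00" * 10240)
chunks = list(stream_chunks(blob, chunk_size=4096))
print(len(chunks), len(chunks[-1]))  # 3 2048  (4096 + 4096 + 2048 bytes)
```

A production version would layer decoding and batching on top of this (e.g. a PyTorch `IterableDataset` wrapping the generator), but the memory story is the same: the 15,000-hour dataset never needs to land on disk.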
The toolkit supports:
- LoRA fine-tuning for Gemma models (originally Whisper, expanded to Gemma 3n, now Gemma 4)
- Audio, image, and text modalities
- Local execution on Apple Silicon via PyTorch and Metal acceleration
- Cloud data streaming from Google Cloud Storage
- Easy-to-use CLI wizard interface
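LoRA is the technique that makes fine-tuning feasible on consumer hardware: instead of updating a full weight matrix, it trains a low-rank correction. The NumPy sketch below illustrates the idea only; it is not the toolkit's code, and the dimensions and scaling factor are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# A frozen pretrained weight (e.g. one projection in an attention block).
d_out, d_in, rank = 64, 64, 4
W = rng.standard_normal((d_out, d_in))

# LoRA trains only two small matrices A and B; the effective weight is
# W + (alpha / rank) * B @ A. B starts at zero, so training begins
# exactly at the pretrained model.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))
alpha = 8.0

def lora_forward(x):
    # x: (batch, d_in) -> (batch, d_out)
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T

x = rng.standard_normal((2, d_in))
assert np.allclose(lora_forward(x), x @ W.T)  # B == 0: output matches frozen W

# Trainable parameters per layer: LoRA vs. full fine-tuning.
print(rank * (d_in + d_out), "vs", d_in * d_out)  # 512 vs 4096
```

The parameter count is why this fits in 64GB of unified memory: only A and B (and their optimizer state) need gradients, while W stays frozen.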
Project Addresses Gap in MLX Audio Fine-Tuning Capabilities
Mireles explained the project's necessity: "You can't really do audio fine-tuning with MLX, that's really the reason this exists (in addition to my personal interest). I would have preferred to use MLX and not have had to make this, but here we are."
The developer noted technical challenges with longer sequences: "One thing I have learned so far: It's very easy to OOM when you fine-tune on longer sequences! My local Mac Studio has 64GB RAM, so I run out of memory constantly." This transparency about limitations reflects the realities of local AI development on consumer hardware.
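Activation memory grows with sequence length, so long audio clips blow past a fixed RAM budget even at small batch sizes. One standard workaround (not necessarily what this toolkit does) is gradient accumulation: split the desired batch into micro-batches that fit in memory and accumulate gradients across them, trading wall-clock time for peak memory. A minimal sketch, with illustrative numbers:

```python
def micro_batches(target_batch, max_micro_batch):
    """Split a target effective batch size into micro-batch sizes,
    each <= max_micro_batch, summing exactly to target_batch.
    Gradients are accumulated across the micro-batches before a
    single optimizer step, so the effective batch is unchanged."""
    sizes = []
    remaining = target_batch
    while remaining > 0:
        take = min(max_micro_batch, remaining)
        sizes.append(take)
        remaining -= take
    return sizes

# Effective batch of 32 when only 6 samples fit in memory at once:
print(micro_batches(32, 6))  # [6, 6, 6, 6, 6, 2]
```

Gradient checkpointing and capping the maximum sequence length are the other usual levers when long sequences OOM on a 64GB machine.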
Tool Democratizes Access to Custom Multimodal Models
The project fills a significant gap in the local AI fine-tuning ecosystem. Most multimodal fine-tuning tutorials assume cloud compute availability, while MLX, Apple's machine learning framework for Apple Silicon, lacks audio fine-tuning support. By enabling fine-tuning on consumer Mac hardware with cloud data streaming, gemma-tuner-multimodal makes custom multimodal models accessible to developers without expensive GPU infrastructure.
The community response suggests strong demand for such tools. Beyond Hacker News engagement, the project accumulated 10,871 impressions and 184 bookmarks on X, indicating developers are actively seeking local fine-tuning solutions that balance compute constraints with data access needs.
Mireles concluded his announcement: "And so I made this. I hope you have as much fun using it as I had fun making it." The project is available on GitHub at github.com/mattmireles/gemma-tuner-multimodal.
Key Takeaways
- Matt Mireles released gemma-tuner-multimodal after six months of development, enabling local Gemma fine-tuning on Apple Silicon with multimodal support
- The toolkit streams training data from Google Cloud Storage during training, solving the problem of large datasets that exceed local storage capacity
- The project addresses a gap in MLX's capabilities, which doesn't support audio fine-tuning for local Apple Silicon development
- The Show HN post reached 117 points while the X announcement received 163 likes and 184 bookmarks, indicating strong community interest
- The tool runs on consumer Mac hardware (M2 Ultra with 64GB RAM), though developers may encounter out-of-memory issues with longer sequences