I've been working on RVW, my adaptation of the standard transformer model that is capable of online continual learning without catastrophic forgetting. I finally published the first pre-print of my early experiments: https://doi.org/10.5281/zenodo.20064617
Now I'm working on expanding the work into more parameters and improving performance. I just finished an extremely harsh test of a Nemotron-flavored RVW that consisted of stretches of a random assortment of domains interspersed with long runs of single domains. Across all of it the model didn't forget (and actually improved on some of the more challenging domains). PPL on SmolTalk is still in the ~18 range, which I'd like to get lower, but this is all with only 4B params.
Currently, I'm training a Llama 3.2-flavored RVW with only about 2B params to see how that turns out. Depending on results of that, I may take it to Gemma 4 next.
IIRC, there was always a way to filter out certain messages (or that may be an alt.org customization, but it's been a part of my config file for a while now).
For real! Valkyrie is the perfect "just bash things while only half paying attention" class. Great for when I'm playing to unwind (as opposed to playing as a challenge to myself).
I think Douglas Adams had one of the best quotes regarding observing infinity:
"Infinity itself looks flat and uninteresting. Looking up into the night sky is looking into infinity – distance is incomprehensible and therefore meaningless."
It's been a while since I worked at Apple, but back in the day the entire OS X Server team made extensive use of kerberized NFS shares for moving around large files...
...the last version of Server shipped in 2021 (and the last real version shipped almost a decade before that).
Hmm, the more I think about it, the more I think you’re right: they likely still do use kerberized NFS, but I think the auth layer they use is… different. Without giving too much away, the internal SSO software ends up either wrapping or providing Kerberos tickets in some way, so I’m imagining that code path doesn’t panic.
In fact, that’s probably the clue… everyone internally at Apple using krb5 auth with NFS is probably using the internal SSO software, and the code path for “vanilla” Kerberos (i.e., Ticket Viewer.app and so on) has zero testing. Maybe I’ll write that into the next crash tracer report I type up :-D
My first job after finishing my undergrad degree was performing quality analysis on corn starch. As a condition of employment, I had to sign a paper saying anything I invented related to corn was property of my employer.
It's been more than a few years since I worked at Apple, but they were always unique in the tech space in that their retail division dwarfed the rest of the company's headcount. If I recall correctly, all of OS X Lion was produced by around 3,000 engineers (and probably fewer, since I think that count included iLife and iWork).
I've been working on an ML model capable of robust continuous learning, resistant to catastrophic forgetting without relying on replay, an external memory system, or unbounded parameter growth. Last week I confirmed that the first non-toy, 580M-parameter version soundly beat LoRA, EWC, and full fine-tuning. This week I'm scaling up to 4.4B parameters...
Based on what you've already mentioned, there's a good chance you're familiar, but on the off chance you're not: "Funkungfusion" (or, really, anything off the Ninja Tune label) might be right up your alley.