Hacker News: jballanc's comments

I've been working on RVW, my adaptation of the standard transformer model that is capable of online continual learning without catastrophic forgetting. I finally published the first pre-print of my early experiments: https://doi.org/10.5281/zenodo.20064617

Now I'm working on expanding the work into more parameters and improving performance. I just finished an extremely harsh test of a Nemotron-flavored RVW that consisted of stretches of a random assortment of domains interspersed with long runs of single domains. Across all of it the model didn't forget (and actually improved on some of the more challenging domains). PPL on SmolTalk is still in the ~18 range, which I'd like to get lower, but this is all with only 4B params.
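For context on the metric: perplexity (PPL) is the exponential of the mean per-token negative log-likelihood. A PPL near 18 therefore means the model is, on average, about as uncertain as a uniform guess among 18 tokens. Here is a minimal sketch. It assumes you already have the natural-log probability the model assigned to each target token. The inputs below are synthetic, not data from these runs.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over target tokens.

    `token_logprobs` holds the natural-log probabilities the model assigned
    to each actual next token (hypothetical input for illustration).
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# A model that assigns probability 1/18 to every target token has PPL 18.
uniform = [math.log(1 / 18)] * 100
print(round(perplexity(uniform), 6))  # 18.0
```

Lower is better. Because of the exponent, a handful of very low-probability tokens can pull PPL up noticeably.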

Currently, I'm training a Llama 3.2-flavored RVW with only about 2B params to see how that turns out. Depending on the results, I may take it to Gemma 4 next.


Super interesting. I'm also super into the idea of always online continual learning.

I'll check it out. Thanks for sharing.


IIRC, there was always a way to filter out certain messages (or that may be an alt.org customization, but it's been a part of my config file for a while now).


For real! Valkyrie is the perfect "just bash things while only half paying attention" class. Great for when I'm playing to unwind (as opposed to playing as a challenge to myself).

At least there's still Samurai.


That was my favorite class. Still remember the game where I mostly (t) threw my wakizashi (b) at enemies.


I think Douglas Adams had one of the best quotes regarding observing infinity:

"Infinity itself looks flat and uninteresting. Looking up into the night sky is looking into infinity – distance is incomprehensible and therefore meaningless."


It's been a while since I worked at Apple, but back in the day the entire OS X Server team made extensive use of kerberized NFS shares for moving around large files...

...the last version of Server shipped in 2021 (and the last real version shipped almost a decade before that).


Apple was still using Kerberos when I was there not that long ago.


Hmm, the more I think about it, the more I think you’re right: they likely still do use kerberized NFS, but I think the auth layer they use is… different. Without giving too much away, the internal SSO software ends up either wrapping or providing Kerberos tickets in some way, so I’m imagining that code path doesn’t panic.

In fact, that’s probably the clue… everyone internally at Apple using krb5 auth with NFS is probably using the internal SSO software, and the code path for “vanilla” Kerberos (i.e., Ticket Viewer.app and so on) has zero testing. Maybe I’ll write that into the next crash tracer report I type up :-D


If you want a slightly different black hole to send your report to, you could use Feedback Assistant: https://developer.apple.com/feedback-assistant/


My first job after finishing my undergrad degree was performing quality analysis on corn starch. As a condition of employment, I had to sign a paper saying anything I invented related to corn was property of my employer.


It's been more than a few years since I worked at Apple, but they were always unique in the tech space in that their retail division dwarfed the rest of the company in headcount. If I recall correctly, all of OS X Lion was produced by around 3,000 engineers (and probably fewer, since I think that count included iLife and iWork).


Aren’t they sort of unique in that they… have a retail division, as a real ongoing thing? (I’m sure MS tried an MS store, but I’ve never seen one.)

Well, unique other than Amazon I guess.


I've been working on an ML model capable of robust continuous learning, resistant to catastrophic forgetting without relying on replay, an external memory system, or unbounded parameter growth. Last week I confirmed the first non-toy, 580M parameter version soundly beat LoRA, EWC, and full fine-tuning. This week I'm scaling up to 4.4B parameters...
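One common way to put a number on catastrophic forgetting is "average forgetting." This is not necessarily the measure used in the experiments above. For each earlier task, take the best score it ever reached, then subtract its score after training on the final task. Here is a minimal sketch. It assumes an accuracy-style matrix `acc[i][j]`, meaning the score on task j after finishing training on task i, where higher is better. The numbers are illustrative only, not results from this work.

```python
def average_forgetting(acc):
    """Average forgetting over all tasks except the last.

    acc[i][j] is the score on task j measured after finishing training
    on task i (higher is better). For each earlier task j, forgetting is
    the best score it reached before the final stage minus its score
    after the final stage.
    """
    T = len(acc)
    if T < 2:
        return 0.0
    drops = []
    for j in range(T - 1):
        best_before_end = max(acc[i][j] for i in range(j, T - 1))
        drops.append(best_before_end - acc[T - 1][j])
    return sum(drops) / len(drops)

# Illustrative numbers only: task 0 drops from 0.90 to 0.60 after
# training on later tasks; task 1 drops from 0.80 to 0.70.
acc = [
    [0.90, 0.10, 0.05],
    [0.85, 0.80, 0.10],
    [0.60, 0.70, 0.88],
]
print(round(average_forgetting(acc), 3))  # 0.2
```

A value near zero (or negative, if old tasks actually improve) is what "didn't forget" looks like under this measure. For a lower-is-better metric like PPL, flip the sign and take the minimum instead of the maximum.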


Do you have a public repo for that? I'm also trying to do that, although I'm using "replay"/distillation and Hopfield memory banks.


No public repo yet, but coming soon. Just filed for a patent on the technique and am preparing a paper. Posted the first figure I have for the paper here: https://dev.to/jballanc/what-would-you-do-with-an-ai-model-c...


We need benchmarks that can distinguish between continuous learning and long-context extrapolation.


oh that's easy: continuous learning is not something current architectures can do, so the benchmark for that can be done mentally.


Based on what you've already mentioned, there's a good chance you're familiar, but on the off chance you're not: "Funkungfusion" (or, really, anything off the Ninja Tune label) might be right up your alley.

