locusofself's comments

650/month is incredible. My mortgage payment is almost eight thousand dollars.


I really only need a small apartment, anything more would be a waste for me. I just need a place to cook/eat and sleep, and keep my things.


It's also really nice to just do some work outside instead of staring at a screen. I really enjoy mowing my huge lawn, doing some light landscaping stuff. My wife and daughter pick weeds for hours and it really centers them.


I don't think the data supports this at all.

Home prices have doubled here since 2014. We are having a modest dip right now.


This might be an unpopular thing to do, but lately I've been literally having claude/chatgpt write books for my personal consumption: things to brush up interview skills, python, system design, and even agentic coding. I've also had it aggregate substack posts for me, etc.

I have it generate an ePub file, and drag and drop it to my kindle.

It's a great way to take information with me, out for a long walk where I can focus on it and absorb it.


I recommend using internet archive to access free texts for ereaders. LLMs will be trained on these texts already, it is better to get it straight from the source.


How are you sure that it hasn't hallucinated content that you end up relying on?


wow, they had to OCR it back in from paper printouts

> This source code is old enough that it hadn’t been stored digitally. “A dedicated team of historians and preservationists led by Yufeng Gao and Rich Cini,” calling itself the “DOS Disassembly Group,” painstakingly transcribed and scanned in code from paper printouts provided by Paterson. This process was made even more difficult because modern OCR software struggled with the quality of the decades-old printout.


I'd like to hear more about what works in OCR of dot-matrix fonts.

I've been able to OCR letter-quality printer output to 97% (mostly Os and Xs problems).

But it seems that machine-learning text-recognition is also now biased to reject computer code because it doesn't look like human language.


There's a writeup here from one of the people on the team about the work it took to go from the listings to source code. http://cini.classiccmp.org/recoveryblog.htm

> With less-than-satisfactory OCR output, I resorted to a process I used many years ago when converting scans made of old Commodore ROM dumps printed on a Commodore 1515 dot-matrix printer. The process relies on the ASCII OCR output having the same repetitive errors. "B" and "8", "S" and "5" are good examples, as are "l" and "1", and "O" and "0". There are many other similar single-character errors and, when working with x86 code, there are similar errors with instructions like "MOV". This process naturally works better if the output file is monolithic rather than single-page OCR conversions because you can do substitutions across the entire converted printout and not 75 separate files.

> The next formatting hassle was the spacing. This required repetitive substitutions of a descending numbers of spaces to tabs (i.e., replace 8 spaces with a tab, 7, 6, etc.). Then if you want to return it to fixed spaces (which is likely how the original printer printed it -- spaces and not vertical tabs), you can. For pure re-creation work, spaces produce absolute column formatting while tabs can move around depending on the program displaying the file.

> Once you run thought the 15 or so common global substitutions and tab conversion, it's a lot easier to work with the file to fix formatting and perform other cleanup. This is then followed by a line-by-line comparison against the original printouts. Overall I'd say the conversion output quality with this method is very good.
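
The two passes described in that writeup (global substitutions, then the descending spaces-to-tabs replacement) can be sketched roughly like this in Python. This is only an illustration, not the team's actual tooling: the substitution list and the function names are made up, and a real list would be built from the errors your own OCR output keeps repeating.

```python
import re

# Hypothetical examples of repeated OCR confusions inside x86 mnemonics.
# A real list would come from inspecting the monolithic OCR output.
GLOBAL_SUBSTITUTIONS = [
    (r"\bM0V\b", "MOV"),
    (r"\bPU5H\b", "PUSH"),
    (r"\bP0P\b", "POP"),
]


def apply_substitutions(text):
    """Apply every global substitution across the whole converted printout."""
    for pattern, replacement in GLOBAL_SUBSTITUTIONS:
        text = re.sub(pattern, replacement, text)
    return text


def spaces_to_tabs(text, widest=8):
    """Replace runs of spaces with tabs, widest run first (8, 7, 6, ... 2).

    Single spaces are left alone so operands like "AX, BX" survive.
    """
    for width in range(widest, 1, -1):
        text = text.replace(" " * width, "\t")
    return text


if __name__ == "__main__":
    line = "START:  M0V     AX,BX"
    print(repr(spaces_to_tabs(apply_substitutions(line))))
```

Note that this literal replacement ignores tab stops, so it is a cleanup aid rather than a faithful column reconstruction; the line-by-line comparison against the printout is still what catches the rest.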


Hmm, doesn't say anything about what OCR tools they used.

I've got a 4" stack of wide-carriage COBOL. I guess it's two revisions of the same system so I only need to scan the newer half. It's probably from a TI Omni 810.

On the other hand, I've got 100 pages of code printed in compressed font by someone wanting to make sure that 80+ char lines fit within margins. So a lot of words just don't come out at all. A frequent error is "A" becomes "H", "O" becomes "U" because the top dots aren't "attached".

And columns of line numbers starting with 0001, or hex? The most confounding thing is OCR that thinks 00 is a sideways 8, and that dominates the uniform block, so it tries to interpret the whole column as sideways text. In another situation, it interprets two stacked lines (each starting with 0) as one line starting with 8 and it just goes off the rails.

So I've been working with automatic skew correction, then clipping it into rows, in order to get each line of text isolated from the surrounding context. When I do that, I get better results, but it is not great either.
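
For the row-clipping step, a minimal sketch of one common approach (a horizontal projection profile) is below. It is not the code I actually run: it assumes the page is already deskewed and binarized into a list of pixel rows with 1 for ink, and `split_rows` and `min_ink` are names invented for the example.

```python
def split_rows(image, min_ink=1):
    """Find bands of consecutive pixel rows that contain ink.

    image: list of pixel rows, each a list of 0/1 values (1 = ink).
    Returns a list of (start, end) row indices, end exclusive, one per band.
    """
    bands = []
    start = None
    for y, row in enumerate(image):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                  # first inked row of a new band
        elif not has_ink and start is not None:
            bands.append((start, y))   # blank row closes the band
            start = None
    if start is not None:              # band runs to the bottom edge
        bands.append((start, len(image)))
    return bands


if __name__ == "__main__":
    page = [
        [0, 0, 0],
        [1, 0, 1],
        [1, 1, 0],
        [0, 0, 0],
        [0, 1, 0],
    ]
    print(split_rows(page))
```

Each band can then be cropped and handed to the recognizer on its own. On real dot-matrix scans the dots inside one text line may leave blank pixel rows between them, so some gap tolerance would likely be needed before this is usable.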

I'm considering going all-in on training a new recognizer on snippets. For that, I'll be constructing "The Set of All As" and so on.


Pretty interesting. I wonder if a whitelist against certain columns in the output could help, e.g. this column can only contain valid x86 instructions (e.g. MOV is allowed, M0V is not), this column can only contain hexadecimal (1 is allowed but never "l"), etc. Probably more work than it's worth given the final line-by-line comparison that happens anyway.
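
Something like the following Python sketch is what I have in mind. Everything in it is an assumption for illustration: the mnemonic list is a tiny hypothetical subset, and the look-alike tables only cover the confusions mentioned in the writeup.

```python
import re

# Hypothetical whitelist; a real one would list every mnemonic the assembler accepts.
VALID_MNEMONICS = {"MOV", "ADD", "SUB", "JMP", "CALL", "RET", "PUSH", "POP", "INT"}

HEX_RE = re.compile(r"[0-9A-F]+")

# In a hex column, letters that cannot be hex digits map to look-alike digits.
HEX_FIXES = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0", "S": "5"})

# In a mnemonic column, digits map to look-alike letters.
ALPHA_FIXES = str.maketrans({"0": "O", "1": "I", "5": "S", "8": "B"})


def fix_hex(field):
    """Return a corrected hex field, or None if it still is not valid hex."""
    candidate = field.translate(HEX_FIXES).upper()
    return candidate if HEX_RE.fullmatch(candidate) else None


def fix_mnemonic(field):
    """Return a whitelisted mnemonic, or None to flag the line for review."""
    if field in VALID_MNEMONICS:
        return field
    candidate = field.translate(ALPHA_FIXES)
    return candidate if candidate in VALID_MNEMONICS else None


if __name__ == "__main__":
    print(fix_mnemonic("M0V"), fix_hex("0l3F"), fix_hex("O1AG"))
```

A `None` result just means "a human should look at this line". It also cannot settle "B" versus "8" in a hex column, since both are valid hex digits.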


I tried giving a dictionary of "preferred recognitions" to the OCR but got no improvement.

I might do what you said, column sensitive. A first-pass assembler which does spell checking and makes the corrections. M0V is a single replacement away from MOV, MOV8 is closest to MOVB. For registers, R Oh must be R zero. But R Oh will be valid as a symbol name (curse your poor choice of symbol name). Alas, R1 is defined in the symbol table as a mnemonic for 1.

This idiom occurs in TMS9900 assembly (of which I have 2100+ pages to scan)

Indexed addressing into caller's register file: MOV @R1*2(R13),R0

Where R1 is 1, a small offset in #words so the operand is pointer to the word after where R13 points. Yet @RI(R13) is valid if RI is in the symbol table.

So there has to be some heuristic that starts at "is RI a defined symbol?" "Can a symbol be used in this context?" Yes/Nope: it is probably R1.
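
A rough Python sketch of that heuristic, as I currently imagine it. The function name, the `symbol_allowed` flag and the look-alike table are all placeholders; the only hard fact baked in is that the TMS9900 workspace registers are R0 through R15.

```python
# TMS9900 workspace registers R0..R15.
REGISTERS = {f"R{n}" for n in range(16)}

# Letters that OCR tends to produce in place of digits.
LOOKALIKES = str.maketrans({"O": "0", "I": "1", "l": "1"})


def resolve_operand(token, symbols, symbol_allowed=True):
    """Decide what an OCR'd operand token most likely is.

    token: the text as recognized, e.g. "RI".
    symbols: set of names defined in the program's symbol table.
    symbol_allowed: whether a symbol may appear in this operand position.
    Returns the resolved name, or None if no guess can be made.
    """
    if token in REGISTERS:
        return token
    if symbol_allowed and token in symbols:
        return token                    # a real symbol, leave it alone
    if token.startswith("R"):
        candidate = "R" + token[1:].translate(LOOKALIKES)
        if candidate in REGISTERS:
            return candidate            # probably a misread register
    return None                         # flag for manual review


if __name__ == "__main__":
    print(resolve_operand("RI", set()))
    print(resolve_operand("RI", {"RI"}))
    print(resolve_operand("RII", set()))
```

The symbol table itself comes from the same OCR output, so in practice this would need a second pass once the label column has been cleaned up.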

And R11 is used a lot.

Same curse on people who used I as a counter variable in type-in programs. Countless folks typed it as a 1 in expressions before magazines got better fonts.


Boring reply perhaps, but I've had wild success with adding even a tiny LLM afterwards to do "fixups" over OCRd text, works great for the typical O/0 issues and similar, just pass it the scrambled OCRd text together with the text around it, and even dumb and tiny 7b models running on CPU do a pretty fine job.


ABBYY has a specific module for dot matrix printouts so I’m surprised it was a struggle for them but every document is different


I've recovered some ancient software I wrote via scanning in listings I found among my dad's papers.


Yet another case where text printed on paper outlived any digital storage.


Seems like it was never digitally stored in the first place, and the printed text was barely readable due to age. Not really a big win for paper.


Well it had to have been on disk or tape at some point. It wasn't all typed in by hand every time they needed to build a new version.


unless they used punch cards


Punch cards are still a form of digital storage, mind.


Also a form of storing things on paper


Reminds me of an old fortune cookie message or meme, something like "digital data is made from analog parts".


> unless they used punch cards

For MS-DOS?


Not likely. Punch cards disappeared around the end of 1976.


I remember seeing stacks of cards being carried into/out of the university "computing center" in the mid 1980s, on more than a couple of occasions. Though in retrospect, these were probably just old programs that had been in various professors' offices since the mid 70s, being taken to get read into some disk in the mainframe.


We still learned how to use them in the 80’s high school computer classes, mostly because we had a balance of CP/M plus card-reader/early DOS machines, eventually .. in the labs. Rich kid schools had Apples though, and some of them also had card readers for BASIC ..


"[..] card readers for BASIC"

Finally, a sensible use case for BASIC's "READ" and "DATA" commands. Learning BASIC as a kid on a micro, it always struck me as an odd way to get input into a program. Sure, with INPUT, you'd have to hand enter your input every time, but baking into the program meant that you'd have to edit your program any time you wanted to change anything.

But with a card reader, you could "cut the deck". Keep the program cards, and then just stack on whatever set of data cards you wanted.

From this vantage point, in the 21st century with our flying cars and what not, it seems really quirky that back then, even your data could be a tangible thing.


Indeed, we still pay homage to the era with terms such as the stack, pushing and popping, and all kinds of things .. i remember we had fun inserting random infinite loops in other students' cards on occasion until we all realized we could just have marked “finished” stacks with an X across the spine, and also to ease sorting, and so on .. i would mark certain sub-routines with different color markers on the spine too, just to see a budget for how much computing time i expected to be billed for, and so on and on .. lots of valuable hands-on experience came from card-based computing, it's a lost art ..


My first job out of college in the early 1990s was at an equipment manufacturer who was still using them. They had a big chart on the wall titled "punch-card elimination" and a line trending down, but it wasn't at zero yet.

My work there was all new code and didn't involve any of that, however.


My college used them for PL/I and IBM Assembly language programming classes until 1982. Cards were used well into the mid-80s.


I learned COBOL in college at UNC-W on punch cards in 1980.


We still used them in the university as late as in 2010...

...as writing paper.


I threw out all my punch cards. Wish I'd kept at least a listing!


I find punch cards being used in old engineering books I buy from the 60s.

Maybe write them again?


Punch* 'em again


The idea that it never existed digitally is obviously untrue. Likely poor wording on the author's part. They probably meant something like, so old that a printout is all that survived (which sounds vaguely like not being digital to someone in an era so far removed from a time when programs were/could realistically be printed.)


Having printouts was necessary when:

1. you were using a DECwriter dot matrix printer as a terminal

2. using an ASR-33 teletype as a terminal

3. using punch cards or paper tape

4. using a glass tty that could only display 24 lines

5. when you did not have a remote terminal, and wanted to spread your code out on a table and debug it


Brings back memories of desk checking


> a time when programs were/could realistically be printed

Really depends on the program. Source code is often quite manageable. Even artifacts aren't always as large as you might expect. Busybox on my system weighs in at 1.9 MiB or alternatively 928 KiB with zstd maxed out.

But I don't really see a point to printing any of it. A situation that might require the printouts is likely to largely preclude the continued existence of modern electronics, the ability to replace batteries, or even a connection to a reliable electrical grid.


Yeah, that's why I tried to include both categories. Even for programs that are small enough to be printed, we just don't do it any more. I could have worded that part better myself.


Early versions of some things, MS Basic being one example I think, were baked into ROM. One of the best innovations that Paul Allen came up with was adding software hooks to the code so bugs that were found later could still be patched.


How did they print it then, I wonder?


They had some old German guy with a big beard, and two interns, running some sort of big contraption that looked like a medieval torture instrument, and the interns would run and put letters in a row and then the old guy would move a massive lever and in the end out came a bit of paper with source code on it.


I appreciate the subtle sarcasm, I did not consider a printing press as an option!


Where can I buy this printer?


Humbrechthof, Mainz, Germany ofc.

(https://en.wikipedia.org/wiki/Humbrechthof)


One has to be pretty ignorant and dismissive to claim that this is not "a big win for paper".

First of all, that comment is weirdly out of place. The quality and longevity of paper is not the topic.

Secondly, there are fragments of paper with writing as old as 2,000 years.

Thirdly, paper you look at and see the writing. With digital documents, you need the technology to read the medium and then you need to know how the information was encoded onto the medium, before you even arrive at the same level with paper, where you can start to decipher the actual writing.

Paper has brought us where we are today, and given us what we know about the past. Don't be so ignorant and dismissive.


> struggled with the quality of the decades-old printout.

barely

It sounds like this printout has deteriorated badly and was barely readable.


If it was your standard issue cheap dot-matrix printout, it may not have been particularly legible even back then.


Even if the printer itself was fine it doesn't imply the ribbon was wet enough.


I assume you are saying it would look less ridiculous? By how much?


certainly will be interesting to find out..


I assume it is just meant to imply light-weightness, either in the application itself or making you feel as though you can float / fly.


Wonderful piece. I feel this so acutely as a 42yo father and tech worker. It's hard to even keep up with my own siblings. I see some of my "best friends" once a year if that. Sending the occasional text and stupid memes back and forth is no replacement for how things were when we were younger. I keep telling myself I'm going to do something about it, start a book club, plan more trips with old friends, etc., and another year slips by.


I have a strong urge to listen to Pink Floyd and the "ten years have got behind you" song. The years just flow past, winter, spring, summer, fall, each day so busy and the time so short and also long.


TIL you can run OpenBSD on apple silicon. With how much effort has gone into Asahi Linux, I'm surprised.


OpenBSD has had support for a bit over four years (v7.1, though the earlier v7.0 had /some/ support).

OpenBSD 7.1, 2022-04-21 -- https://www.openbsd.org/71.html

R/AsahiLinux posting from around that time, only one comment -- https://www.reddit.com/r/AsahiLinux/comments/u8rb2o/openbsd_...


Running anything on Apple Silicon is a result of the Asahi Linux effort.


Only M1 and M2 machines though. M3 and up is still missing.

