Hacker News | leetharris's comments

I have the pro account for ChatGPT, Claude, Gemini, and Grok.

They all have various strengths and weaknesses. My favorite is still ChatGPT, then Gemini/Claude, then Grok.

Grok often feels 1-2 generations behind the competition in general use, but it has three things that I love:

1. It seems to be the best at understanding current events. Maybe due to X integration, or some other tool call optimization in the backend? I don't know, but I often ask about things going on, and the other models have outdated info, give unhelpful answers, etc.

2. It is generally the least sycophantic for personal things. Anthropic is getting here too. ChatGPT and Gemini are working on this, but previous models in those families would almost never say anything negative about what I am doing. Sometimes I need career advice, personal advice, etc and I like the tone of how it responds. I think Claude will be caught up soon.

3. For professional work, there are certain topics that other models would refuse to engage with. At my last company we had an enormous amount of legal users. When a deposition would need a summary on certain topics, most models would refuse. Grok would not. I understand the need for safety and I don't blame the other model providers, but for some professional use cases you NEED a model that is capable of handling sensitive subjects.


I recently worked with an NRC dataset, specifically nuclear reactor events and status reports (example: https://www.nrc.gov/reading-rm/doc-collections/event-status/...). Public data that just needed some cleaning. Several times the Claude API refused to engage. Because of that I can't trust Claude to clean production data sets.

> 1. It seems to be the best at understanding current events. Maybe due to X integration, or some other tool call optimization in the backend? I don't know, but I often ask about things going on, and the other models have outdated info, give unhelpful answers, etc.

That makes sense, but occasionally you ask about an issue where it's clearly received political instruction from the commissar and it acts totally lobotomized. But it's true that Gemini will often blithely state that something could never happen and you'll say "what do you mean, that just happened" and then it comes back apologizing after running a Web search.


We saw this too with Gemini specifically. My favorite example: we built a hallucination detector (given the input, does the output make any false claims) in Gemini, and after the Seahawks won the Super Bowl in February, it would consistently flag that as "not possible".

I believe it was assuring me the Israelis would never invade southern Lebanon and declare a buffer zone inside it after that had already happened.

Do you have an example of this?

Which "this"?

All 4 of these still regularly insist that I am a genius and everything I say is brilliant. Grok definitely pushes back more than the others, but I don't like how sycophantic they all still are.

I don't want to open up that whole can of worms, but Grok on any vaguely philosophical or political topic is a scaredy cat and has a very hard time staying factual if the facts could make Musk or the conservative movement look bad.

Opus 4.8 has made huge jumps in being less sycophantic. I see it pushing back on ideas a lot, and that's very helpful when you're evaluating options.

Almost too much so. It often feels like Opus is pushing back for the sake of pushing back, the way old models used to add disclaimers to every message regardless of content.

That's because it can't literally reason, it has just been manually steered into those reasoning speech cycles.

Yes, yes. Does everyone still find it interesting to go over this point every time about how it's not literally a person with human reasoning?

Uh, only when people don't seem to understand it, or try to personify it. Which is quite often.

What about when they ask how you can take gold at IMO and solve research-level math problems without reasoning?

People "personify" their cars, but I don't think that's because they think cars have human cognition.

People are weird about their cars and make major errors in judgement as a result (e.g. we tolerate incredibly high rates of people getting killed because they were "hit by a car", as though the driver had nothing to do with it). Pushing back on that is absolutely worthwhile.

Which has approximately zero to do with the anthropomorphization of the car itself. I could have chosen a different machine or tool to make my point.

> Which has approximately zero to do with the anthropomorphization of the car itself.

You don't think people talking about the car doing things has anything to do with anthropomorphising the car?


No, in general I don't buy this idea that if we start using awkward phrases like "died by suicide" everywhere or avoiding phrases like "car accident" (which, despite what advocates claim, is a literally accurate description of unintentionally hitting someone or something with your car) but avoid changing any of the circumstances that cause the behavior it changes anything.

That's a completely different claim from the one you were making in your previous comment.

> avoid changing any of the circumstances that cause the behavior

The normalisation of unsafe driving is the circumstance that causes the behaviour. Just look at how the cultural shift in how drink-driving is perceived over the last few decades has changed the rate of it happening.


Not in the same way.

That doesn't seem to be much more than special pleading without an explanation of how you think it's different.

It's more like Opus wants you to do its job for it. I feel that the number of times I have to tell it "no, you do that" increases with each new version.

It was mind blowing the first time I got a refusal, and retorted "yes you can" and had that work, but now it's just another reason to move to a different model.

> Anthropic is getting here too.

I almost exclusively use Claude for all my professional and private needs. In my experience it's really good at adhering to my wishes with regard to sycophancy and pushing back. If you really want to, you can tell it to systematically push back on anything where pushback makes sense before it continues with the flow of conversation.

In my first therapy session, the answers were too long and contained multiple questions, spawning multiple threads of conversation. I told it to tone it down and only ever ask one question back, maybe two, if they are related. The answers got too short. I told it to make them "slightly longer" again and reached a sweet spot.

The conversation is yours to form! You need to find the "system prompts" and guidelines to give it that work for you.


What are you using it for? I'm pretty surprised ChatGPT is your top model, but maybe you aren't using it for code.

codex-5.5 > Opus 4.7, imo.

My favorite was ChatGPT, and I still use it often, but it gets 'hair splitting' argumentative far too often over very minor, non-controversial topics. Like it's always going out of its way to "well actually..."

Grok used to be really really bad ~8 months ago or so, but it's gotten better.

ChatGPT team needs to turn down the 'disagree just because' factor by a lot.


But in terms of agentic coding? Dead last.

My SO works in audit/compliance and business Gemini definitely does not refuse to answer.

Career and personal advice from LLMs? Not sure that's your best bet.

1. It seeks to manipulate the information you see and your lens to the world. This is already partially true from independent and major publications.

As soon as we hand over searching out information to social media algorithms and LLM tools, we abandon our ability to see reality outside our direct vision.

Grok's ownership has already demonstrated capacity to influence major world elections and other events. You cannot trust it with this sort of information gathering and reporting.


The only mistake he made was not buying Tesla around the M3 launch. Elon was desperate and would have sold it to him for cheap. He didn't take the meeting.

Otherwise I completely agree, once Tesla reached takeoff, it's too hard for anyone else to do it without burning mountains worth of cash.


I will say this in the most charitable terms I can. Microsoft simply does not have it in their culture to compete with something like this. Their prime days are over. They are slowly becoming IBM.

They were completely correct to not compete in foundation models. They would have no chance. I mean, they can't even make a decent app or harness to use the other models!


They aren't publicly traded.


You're right they don't have a board and definitely weren't punished by both the FTC/DOJ for anti-trust and manipulative business practices.

Just a small mom and pop shop that somehow seems to elude the typical braindead MBA playbook of ruining lives to justify their shitty business decisions.

Hopefully the beloved indie game studio can navigate these waters successfully! Lots of sharks out there that like to rat fuck the commons for personal gain, wouldn't want that to happen to the gaming company that helped normalize gambling to children.


Amen.


Just a company with 40%+ of ownership from Tencent, Disney and Sony (all publicly traded companies). You could have googled this.


Market in this context doesn't necessarily mean the publicly traded markets. It can mean the investor market (and their ability to raise) or the financial market (their ability to get loans or other financial products). All of these depend on overall company performance, which can be manipulated in the short term by mass firing 1k employees so some executive can hit a KPI and get their bonus. Ya know, sociopathic stuff like that.


None of them do it well from our experience. We had to write our own custom pipeline with a mixture of legacy CV approaches to handle this (AI contract analysis). We constantly benchmark every new multimodal and VLM model that comes out and are consistently disappointed.


If someone releases a benchmark/dataset, I'm sure that significantly increases the chances of one of these AI labs training on the task.


"Polluting" is a very charged term. These satellites provide immense value. So far, there is no evidence these will stop us from watching the stars.


"value" is also a very charged term, and pollution is nearly purely a byproduct of the pursuit of value.


Document Crunch | REMOTE(US) / HYBRID

Company description: We do construction compliance AI. We're in series B, about $40m raised. Growing a lot! Feel free to reply or DM me if you want some more info.

Tech Stack: Typescript full stack except for Python for ML, NestJS backend, NextJS frontend, React Native mobile, IaC in Pulumi using TS.

Open Jobs:

- Senior/Staff/Principal Frontend Engineer (fullstack Typescript nice to have)

- Senior/Staff/Principal Node Engineer (fullstack Typescript nice to have)

- Senior/Staff/Principal QA Engineer

Location: Austin or Texas preferred, but open to remote for the right candidate in the US.

Jobs page: https://www.documentcrunch.com/careers#open-roles


Was I too late for the QA positions? I'm not seeing them on the careers page.


I used to work in ASR. Due to the nature of current multimodal architectures, it is unlikely we'll ever see accurate timestamps over a longer horizon. You're better off using an encoder-decoder ASR architecture, then traditional diarization via embedding clustering, then a multimodal model to refine the result, then a forced alignment technique (maybe even something pre-NN) to get proper timestamps, and reconciling everything at the end.

These things are getting really good at just regular transcription (as long as you don't care about verbatimicity), but every additional dimension you add (timestamps, speaker assignment, etc) will make the others worse. These work much better as independent processes that then get reconciled and refined by a multimodal LLM.
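To make the "independent processes that then get reconciled" idea concrete, here is a minimal sketch of just the reconciliation step. It assumes you already have word-level timestamps from a forced aligner and speaker turns from a diarizer; the `Word` and `SpeakerTurn` types and the function names are hypothetical, not any particular library's API. Each word is assigned to the speaker turn it overlaps most in time, and consecutive words from the same speaker are merged.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds, assumed to come from forced alignment
    end: float


@dataclass
class SpeakerTurn:
    speaker: str
    start: float  # seconds, assumed to come from diarization
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most."""
    labeled = []
    for w in words:
        best_speaker, best_overlap = "unknown", 0.0
        for t in turns:
            o = overlap(w.start, w.end, t.start, t.end)
            if o > best_overlap:
                best_speaker, best_overlap = t.speaker, o
        labeled.append((best_speaker, w))
    return labeled


def merge_turns(labeled):
    """Group consecutive same-speaker words into (speaker, start, end, text)."""
    merged = []
    for speaker, w in labeled:
        if merged and merged[-1][0] == speaker:
            _, start, _, text = merged[-1]
            merged[-1] = (speaker, start, w.end, text + " " + w.text)
        else:
            merged.append((speaker, w.start, w.end, w.text))
    return merged


words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.1, 1.3)]
turns = [SpeakerTurn("A", 0.0, 1.0), SpeakerTurn("B", 1.0, 2.0)]
segments = merge_turns(assign_speakers(words, turns))
```

In a real pipeline the output of this step would presumably be what gets handed to the multimodal model for refinement, and words that fall outside every turn (labeled "unknown" here) are the ones worth a second look.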


It is insane that the thoughtful act of gift giving has been reduced into a checklist item. What is the point anymore?


Our family stopped doing gifts after it had devolved into just selecting items off of each person's Amazon wishlist.


Two PMs left a company with 125,665 employees. Yes, they are very important, but this happens every day at companies all around the world. Could be burnout, they are rich, got better offers, anything really.

People are so desperate for Tesla to fail they will latch onto anything.


What is the specific criticism? You believe it should not be reported on? Or you simply don't like the tone (in which case, why are you omitting that)?


Apart from the facts (that program managers have left), the tone of the whole article feels like this writer absolutely dislikes Musk.


Sorry, I didn't mean tone as in "dislikes Musk", I meant tone as in "unreasonably dislikes Musk".


makes one really wonder how the writer came to dislike Musk of all people...


If they're claiming to be a journalist, their personal feelings on Musk should be irrelevant to the story.


They're low level program managers. Doomerism article for the sake of it.


The heads of Cybertruck and Model Y programs are "low level"?


He wasn't the "head of cybertruck" - if you look at his LinkedIn, he was a lead program manager. Still an important cog in the machine, but he'd been there for 8 years, starting as an intern.

Model Y guy makes it sound like he was more of a key person, though.


More like not important, Tesla was never a car company /s


It's the only way to recoup that trillion, or do you have any other ideas?

