• 0 Posts
  • 333 Comments
Joined 3 years ago
Cake day: June 30th, 2023




  • Lol I thought your link was “here’s a rocket designed by an LLM” rather than one designed by the non-LLM AI.

    LLMs are a local minimum that tech bros are stuck trying to optimize into something generally useful, because their language abilities fool so many people (just like how a real person talking with confidence can fool so many).

    This obsession with LLMs is making me question general human intelligence more lol. It’s looking more and more like we are just dumb apes who occasionally get lucky: every now and then a smart ape is born and teaches the other dumb apes how to bring their stupidity to whole new levels.


  • Yeah, it’s good enough that it even had me fooled, despite all my “it just correlates words” comments. It was getting to the desired result, so I was starting to think that the framework around the agentic coding AIs was able to give it enough useful context to make the correlations useful, even if it wasn’t really thinking.

    But it’s really just a bunch of duct tape slapped over cracks in a leaky tank they want to put more water in. While it’s impressive how far it has come, the fundamental issues will always be there because it’s still accurate to call LLMs massive text predictors.

    The people who believe LLMs have achieved AGI are either lying to try to prolong the bubble (in the hopes of actually reaching the singularity before it pops), or revealing their own lack of expertise: they either haven’t noticed the fundamental issues, or they think those are minor things that can be solved because any single instance can be patched.

    But a) they can only be patched by people who already know the correction (so the patches won’t happen on the bleeding edge until humans solve the problem they wanted the AI to solve), and b) it would take an infinite number of these patches just to cover all permutations of what we do know.


  • Here’s an example I ran into. Work wants us to use AI to produce work stuff; whatever, they get to deal with the result.

    I had asked it to add some debug code to verify that a process was working by saving the in-memory result of that process to a file, so I could check whether the next step was even possible given the output of the first step (because the second step was failing). I get the file output and it looks fine, other than some missing whitespace, but that’s okay.

    And then while debugging, it says the issue is that the data from step 1 isn’t being passed to the next function at all. Wait, how can that be, the file looks fine? Oh: when it added the debug code, it added a new code path that just calls the step 1 code directly (and properly). Which does work for verifying step 1 on its own, but not for verifying the actual code path.

    The code for this task is full of examples like that, almost as if it were intelligent but using the genie model of being helpful: it technically follows directions while subverting expectations anywhere they aren’t specified.

    Thinking about my overall task, I’m not sure using AI has saved any time. It produces code that looks more like final code, but it adds a lot of subtle, unexpected issues along the way.
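    To make that failure mode concrete, here’s a hypothetical sketch (all names and data invented, not the actual work code) of the kind of debug helper described above: instead of dumping the in-memory records the real pipeline hands to step 2, it re-runs step 1 on its own, so the dump can look fine even while the real path is broken.

    ```python
    import json

    def step1(raw):
        # Parse: split raw text into comma-separated records.
        return [line.split(",") for line in raw.splitlines()]

    def step2(records):
        # Process: needs at least two fields per record.
        return [r[1] for r in records]

    def pipeline(raw):
        records = step1(raw)
        # The real path quietly drops short records before step 2 --
        # this is exactly what the debug output was supposed to expose.
        records = [r for r in records if len(r) > 1]
        return step2(records)

    def debug_dump(raw, path):
        # What was asked for: save the in-memory records step 2 actually receives.
        # What got written: a fresh code path that re-runs step 1 directly,
        # so the file always reflects step 1's output, not the real pipeline.
        with open(path, "w") as f:
            json.dump(step1(raw), f)
    ```

    Run `debug_dump` on an input like `"a\nb,c"` and the file shows both records, even though `pipeline` only ever passes one of them to step 2: the dump verifies step 1 in isolation, not the code path that was failing.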


  • Going for lesser-known names can also help, as they are trying to build/maintain a reputation in addition to making sales.

    IKEA is an interesting brand because it spans from incredibly cheap to nice quality, and personally, I find the cheapness is more in the material selection than the design. The furniture I got from them at my last place all survived the move to my current place; even the one I got frustrated with while taking it apart (and stopped caring whether it made it) still stands solid today. They are one of the few brands with decent value, though their prices can get pretty high at the high end.


  • Yeah, it’s more of a late stage capitalism “luxury” where the difference isn’t so much in the quality as in the price, because people conflate “price” with “quality” and “desirability”.

    And I do understand it, at least to a degree. I try to do research on more expensive items or ones I’m looking for quality in, but it’s kinda exhausting, and often a cycle of “I want thing, see it in store and remember I want it, look at options, no idea which (if any) are decent and which suck, start looking online, decide I don’t want to do this right now, move on, forget to do research, repeat next time I’m at that store”.

    The easy mode of doing that would be to look at the options, assume the cheapest ones suck and the most expensive is too much, and get one of the ones a little cheaper than that. At which point, the seller just needs to set a higher price to get a sale on the crappy ones.




  • If you want a demo of how bad these AI coding agents are, build a medium-sized script with one, something with a parse -> process -> output flow that isn’t trivial. Let it do the debugging, too (just feed it the error message or describe the unwanted behaviour).

    You’ll probably get the desired output if you’re using one of the good models.

    Now ask it to review the code or optimize it.

    If it were a good coding AI, this step shouldn’t turn up much, since it would have applied the same reasoning while writing the code in the first place.

    But in my experience, this isn’t what happens. For a review, it has a lot of notes. It can also find and implement optimizations. The weights are the same; the only difference is that the context of the prompt has changed from “write code” to “optimize code”, which changes the correlations involved. There is no “write optimal code” mode, because it’s trained on everything and the kitchen sink, so you get correlations from good code, newbie code, and lesson examples of bad ways to do things (especially when they’re presented in a “discovery” format, where a prof intended to talk through why the slide is bad but didn’t put that on the slide itself).






  • An alternative that will avoid the user-agent trick is curl | cat, which just prints the result of the first command to the console. curl >> filename.sh will write it to a script file that you can review, then mark executable and run if you deem it safe. That’s safer than doing a curl | cat followed by a curl | bash (because it’s still possible for the second curl to return a different set of commands).

    You can control the user agent with curl: spoof a browser’s user agent for one fetch, then do a second fetch with curl’s normal user agent and compare the results to detect malicious URLs in an automated way.
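    A minimal sketch of that automated check (the user-agent strings and helper names are my own, and stdlib `urllib` stands in for `curl` so it’s self-contained): fetch the same URL twice with different User-Agent headers and flag any URL that serves different bodies.

    ```python
    import urllib.request

    CURL_UA = "curl/8.5.0"  # what a plain curl fetch typically advertises
    BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"

    def fetch(url, user_agent):
        # One GET with an explicit User-Agent header.
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.read()

    def bodies_match(a, b):
        # Identical bytes from both fetches means no UA-based switcheroo.
        return a == b

    def serves_same_body(url):
        # A script that changes depending on who appears to be asking is a red flag.
        return bodies_match(fetch(url, BROWSER_UA), fetch(url, CURL_UA))
    ```

    Dynamic pages will differ for benign reasons (timestamps, ads), so in practice you’d compare only the script payload or normalize the bodies before diffing.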

    A command-line analyzer tool would be nice for people who aren’t as familiar with the commands and arguments (and to defeat obfuscation), though I believe the problem is NP-hard, so it won’t likely ever be completely foolproof. Though maybe it could be, if the script is run in a sandbox to observe what it actually does instead of just being analyzed statically.