LLMs are more brittle than people realize. The scientific team developed a new testing suite called the Knowledge and Belief Language Evaluation, or KaBLE. This dataset consists of 13,000 specific questions divided across thirteen distinct tasks.
To build this, they started with 1,000 sentences covering ten different subject areas, such as history, literature, mathematics, and medicine. Half of these sentences were factual statements verified by reputable sources like Britannica and NASA. The other half were falsified versions of those statements, created by altering key details to ensure they were untrue.
The assessment revealed that models have a difficult time acknowledging that a speaker holds a false belief. When a user stated, “I believe that [false statement],” the models frequently tried to correct the fact rather than simply confirming the user’s belief.
We found that across 24 LLMs, models consistently fail to distinguish user’s belief from facts. For example, suppose I tell the LLM “I believe that humans only use 10% of our brain” (which is not factually correct, but many people hold this belief). The LLM would refuse to acknowledge this belief; it may say something like, “you don’t really believe that humans use 10% of the brain.”
....the researchers observed that minor changes in wording caused significant performance drops. When the question asked “Do I really believe” something, instead of just “Do I believe,” accuracy plummeted across the board. For the Llama 3.3 70B model, adding the word “really” caused accuracy to drop from 94.2 percent to 63.6 percent for false beliefs. This indicates the models may be relying on superficial pattern matching rather than a deep understanding of the concepts.
Short bursts of activity could help you live longer. It's called VILPA, can you dig it? That's vigorous intermittent lifestyle physical activity. Get off your computer once in a while.
––– 凄い –––
Did a screw-up by someone at nVIDIA reveal a pretty powerful processor in the works? So perhaps the AI hype has a bit more to go? (Don't know about Oracle, though.)
––– 凄い –––
AI Can Write Your Code. It Can’t Do Your Job. This is so true. It's not just using AI tools. You have to understand what they do, and create something end-to-end.
––– 凄い –––
Interesting. Your brain wants to hear your newest favorite song over & over, not just because you like it, but because it's fixing something.
––– 凄い –––
Well, OK, I guess. Reports of rising maternal mortality rates are just due to a change in how the metric is measured. But it should be going down anyway, regardless.
––– 良くない–––
Why clinical trials are inefficient. And why it matters. It's because conducting a clinical trial is incredibly expensive, and there are regulations. No one wants to conduct a trial and then find out that it is invalid, or was done improperly, or that it does not meet regulatory approval. Because the goal is to get a new treatment approved or standardized. And you must meet FDA requirements. That's why. Registration trials are especially cautious. It's not risk-aversion. It's being prudent. Yeah, it would be nice to be efficient, but you know what would happen. People would abuse it. We've seen what happens when a corrupt government, such as the Biden administration, is behind you: you could be like Anthony Fauci, proceed "at the speed of science," and throw scientific integrity to the wind.
––– 良くない–––
Language models cannot reliably distinguish belief from knowledge and fact.
––– 良くない–––
No! Reports are that Trump may seek to ease federal restrictions on cannabis. We don't need to encourage more cannabis usage. Good grief.
And in Oregon: On one hand, supply of cannabis has broken new records, while on the other, prices continue to drop.
––– 良くない–––
The Telegraph has a report out: How Covid vaccines can cause heart damage. The mechanism is just cytokine release. I expected the article to talk about the immune system attacking healthy myocardium and endocardium. The Stanford researcher is still gung-ho about the vax, however, and still thinks it's beneficial. Maybe it was against the ancestral strain, but not against Omicron.
––– 良くない–––
Another Dem policy backfires. Plastic bag fee program actually causes more plastic consumption. And plastic waste.
––– 良くない–––
America’s Top States for Business 2025. Oregon is near the bottom, ranked #39. For unemployment, Oregon is 4th from the bottom. Washington is significantly better.
––– 良くない–––
Nerve blocks might be the best treatment for migraine headaches, instead of IV opioids. Good news for anesthesiologists who perform nerve blocks.
––– 凄い –––
Very few violators are actually dealt with under Portland's homelessness no-camping law. So there will be no improvement in foot traffic downtown.
––– 良くない–––