Knowledge Updates

Observations while developing web applications and creating great software.

  • Things we learned about LLMs in 2024 ↗

    Simon Willison:

    A lot has happened in the world of Large Language Models over the course of 2024. Here’s a review of things we figured out about the field in the past twelve months, plus my attempt at identifying key themes and pivotal moments.

    Prompt-to-interface exploded:

    We already knew LLMs were spookily good at writing code. If you prompt them right, it turns out they can build you a full interactive application using HTML, CSS and JavaScript (and tools like React if you wire up some extra supporting build mechanisms)—often in a single prompt.

    […] the Chatbot Arena team introduced a whole new leaderboard for this feature, driven by users building the same interactive app twice with two different models and voting on the answer. Hard to come up with a more convincing argument that this feature is now a commodity that can be effectively implemented against all of the leading models.

    This is one reason why Rolldown is focused on WebAssembly performance — the world needs build tools to run in browsers. Web apps generating web apps. Watch VoidZero in 2025.

    On MLX:

    Apple’s mlx-lm Python library supports running a wide range of MLX-compatible models on my Mac, with excellent performance. mlx-community on Hugging Face offers more than 1,000 models that have been converted to the necessary format.

    Prince Canuma’s excellent, fast-moving mlx-vlm project brings vision LLMs to Apple Silicon as well. I used that recently to run Qwen’s QvQ.

    MLX is used by Exo, which is a very fast way to get started running models locally.
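
    For a sense of what that looks like in practice, here’s a minimal sketch of mlx-lm’s Python API as documented in its README (the load/generate helpers are real; the specific model name is just one example from the mlx-community catalog):

      # Minimal sketch: run an MLX-converted model via mlx-lm (pip install mlx-lm).
      # Requires Apple Silicon; downloads the weights from Hugging Face on first use.
      from mlx_lm import load, generate

      # Any mlx-community model works here; this 4-bit Mistral is one example.
      model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

      # verbose=True streams tokens to stdout as they are generated.
      response = generate(
          model,
          tokenizer,
          prompt="Explain cosine similarity in one sentence.",
          verbose=True,
      )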

    On the value of LLMs:

    I get it. There are plenty of reasons to dislike this technology—the environmental impact, the (lack of) ethics of the training data, the lack of reliability, the negative applications, the potential impact on people’s jobs.

    […]

    LLMs absolutely warrant criticism. We need to be talking through these problems, finding ways to mitigate them and helping people learn how to use these tools responsibly in ways where the positive applications outweigh the negative.

    I think telling people that this whole field is environmentally catastrophic plagiarism machines that constantly make things up is doing those people a disservice, no matter how much truth that represents. There is genuine value to be had here, but getting to that value is unintuitive and needs guidance.

    Agreed.

  • EXO Private Search ↗

    EXO Labs:

    So we face a dilemma: how can we maintain the privacy benefits of local models while still accessing real-time information?

    […]

    The problem is that this database is hosted on a server somewhere in the cloud. The host of the server can see our query and the response of that query, compromising our privacy.

    A potential solution would be to download all the data locally. That way, we could query the database locally keeping the query and response private. But this is impractical – Twitter alone generates 200TB of text per year. Your phone can’t store that much data.

    What we want is a way for the server to process our query without ever seeing what we were asking for or what results we got back. Let’s see if we can find a way to do that by looking at how search works.

    How modern search engines work:

    Modern search engines convert both documents and queries into normalized vectors such that similar items end up close together. This conversion is called “embedding”. For instance:

    
      "I love cats"      → A = [0.7, 0.71] (normalized)
      "I like felines"   → B = [0.5, 0.866] (normalized)
      "Buy cheap stocks" → C = [-0.8, 0.6] (normalized)

    To find relevant documents, we look for vectors that point in similar directions – meaning they have a small angle between them. The cosine of this angle gives us a stable, efficient similarity measure ranging from -1 to 1:

    • cos(0°) = 1 (vectors point same direction)
    • cos(90°) = 0 (vectors are perpendicular)
    • cos(180°) = -1 (vectors point opposite directions)

    The cosine similarity formula is shown below, where a and b are vectors with components a₁, a₂, etc. and b₁, b₂, etc., θ is the angle between them, · represents the dot product, and ||a|| denotes the length (magnitude) of vector a:

    cos(θ) = (a·b)/(||a||·||b||)
           = (a₁b₁ + a₂b₂ + ...)/(√(a₁² + a₂² + ...) · √(b₁² + b₂² + ...))

    However, when vectors are normalized (meaning their length equals 1), the denominator becomes 1·1 = 1, leaving us with just the dot product:

    cos(θ) = a·b = a₁b₁ + a₂b₂ + a₃b₃ + ...

    This is why working with normalized vectors is so efficient – the similarity calculation reduces to simply multiplying corresponding elements and adding up the results.

    Let’s compute the similarities between our example vectors:

    cos(θ₁) = A·B = (0.7 × 0.5) + (0.71 × 0.866) = 0.964 // Similar!
    cos(θ₂) = A·C = (0.7 × -0.8) + (0.71 × 0.6) = -0.134 // Quite different

    The first high value (0.964) tells us vectors A and B point in similar directions – just as we’d expect for two phrases about cats! The second value (-0.134) shows that vectors A and C point in quite different directions, indicating unrelated topics.
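
    These numbers are easy to sanity-check in a few lines of Python; this sketch also shows that the full cosine formula and the bare dot product agree once the vectors are (approximately) unit length:

      import math

      A = [0.7, 0.71]   # "I love cats"
      B = [0.5, 0.866]  # "I like felines"
      C = [-0.8, 0.6]   # "Buy cheap stocks"

      def dot(a, b):
          return sum(x * y for x, y in zip(a, b))

      def cosine(a, b):
          # Full formula: divides by the magnitudes, so it also
          # works for vectors that are not normalized.
          norm = lambda v: math.sqrt(sum(x * x for x in v))
          return dot(a, b) / (norm(a) * norm(b))

      print(dot(A, B), cosine(A, B))  # 0.9649 vs 0.9677 (A is only ~unit length)
      print(dot(A, C), cosine(A, C))  # -0.1340 vs -0.1344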

    relevance = query_vector · document_vector
              = q₁d₁ + q₂d₂ + q₃d₃ + ...

    This seemingly simple calculation – just multiplications and additions – is the key to private search.
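
    In code, scoring a query against an entire corpus of normalized embeddings is a single matrix-vector product. A sketch with NumPy, reusing the toy vectors (a real index would have millions of rows and hundreds of columns):

      import numpy as np

      # Each row is one normalized document embedding.
      documents = np.array([
          [0.5, 0.866],  # "I like felines"
          [-0.8, 0.6],   # "Buy cheap stocks"
      ])
      query = np.array([0.7, 0.71])  # "I love cats"

      # One matrix-vector product computes q₁d₁ + q₂d₂ + ... per document.
      relevance = documents @ query
      print(relevance)               # [ 0.9649 -0.134 ]
      print(np.argsort(-relevance))  # document indices, best match first

    Presumably this is why it matters for privacy: additions and multiplications are exactly the operations that homomorphic encryption schemes can evaluate over encrypted data, so a server could compute these scores without ever seeing the query.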

    Via X
  • A theory of appropriateness with applications to generative artificial intelligence ↗

    Google DeepMind with Joel Z. Leibo, Alexander Sasha Vezhnevets, Manfred Diaz, John P. Agapiou, William A. Cunningham, Peter Sunehag, Julia Haas, Raphael Koster, Edgar A. Duéñez-Guzmán, William S. Isaac, Georgios Piliouras, Stanley M. Bileschi, Iyad Rahwan, Simon Osindero, and others:

    What is appropriateness? Humans navigate a multi-scale mosaic of interlocking notions of what is appropriate for different situations. We act one way with our friends, another with our family, and yet another in the office. Likewise for AI, appropriate behavior for a comedy-writing assistant is not the same as appropriate behavior for a customer-service representative. What determines which actions are appropriate in which contexts? And what causes these standards to change over time? Since all judgments of AI appropriateness are ultimately made by humans, we need to understand how appropriateness guides human decision making in order to properly evaluate AI decision making and improve it. This paper presents a theory of appropriateness: how it functions in human society, how it may be implemented in the brain, and what it means for responsible deployment of generative AI technology.

  • Load instantly ↗

    if your product doesn’t load instantly in the web browser with 0 logins and clicks it’s a bad product and no one is going to use it

  • Ladybird in December 2024 ↗

    Ladybird is a brand-new browser & web engine targeting a first Alpha release for early adopters in 2026. Here’s what was new last month.