AI Tech and Product Releases
ChatGPT has been released on the Android Play Store. Following its iOS release, this move continues OpenAI's global expansion of ChatGPT into as many markets as possible. With smartphones used by billions of people worldwide, powerful AI accessible as smartphone apps will be highly impactful.
The GOAT-7B-Community model is the SOTA among open-source 7B models. It is fine-tuned from the recently released LLaMA-2 7B model, using a fine-tuning dataset collected from the GoatChat app. Thanks to a number of fine-tuning innovations, GOAT-7B achieved an MMLU score close to 50, on par with larger 13B models and the best MMLU result for a 7B-parameter model thus far.
Stability AI releases its latest image-generation model, Stable Diffusion XL 1.0. Yet another better-than-ever image-generation release, Stable Diffusion XL 1.0 improves in multiple ways: it supports image-to-image prompting, inpainting, and outpainting; is better at generating legible text; can understand complex multi-part instructions; and is faster, producing "full 1-megapixel resolution images in seconds in multiple aspect ratios." At only 3B parameters, this open-source AI model can be fine-tuned and run locally, so we can expect an explosion of customized models built on this latest release.
AI Research News
Prompting Large Language Models with Speech Recognition Abilities combines audio with text input to enable speech recognition in LLMs: "By directly prepending a sequence of audial embeddings to the text token embeddings, the LLM can be converted to an automatic speech recognition (ASR) system, and be used in the exact same manner as its textual counterpart."
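The core idea can be sketched in a few lines, under the assumption that the LLM consumes embedding vectors directly: a short sequence of audio embeddings is prepended to the text token embeddings, and the model attends over the combined sequence exactly as it would over text alone. The function and toy vectors below are illustrative, not from the paper's code.

```python
def build_input_sequence(audio_embeddings, text_embeddings):
    """Concatenate audio embeddings ahead of the text token embeddings.

    Each embedding is a fixed-size vector; the audio encoder must project
    its frames to the LLM's model dimension before they can be prepended.
    """
    dim = len(text_embeddings[0])
    assert all(len(vec) == dim for vec in audio_embeddings), \
        "audio embeddings must match the LLM's model dimension"
    return audio_embeddings + text_embeddings

# Toy example with model dimension 4: two audio frames, one text token.
audio = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
text = [[1.0, 0.0, 0.0, 0.0]]
combined = build_input_sequence(audio, text)
print(len(combined))  # 3 positions: audio frames first, then the text token
```

From the LLM's point of view, the audio frames are just extra "tokens" at the start of the context, which is why no architectural change to the language model is needed.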
Google DeepMind introduces a new vision-language-action model for improving robotics, described in "Speaking robot: Our new AI model translates vision and language into robotic actions."
In research presented at ICML this week, researchers from MIT and Stanford developed a machine-learning technique that can efficiently learn to control a robot. This should help drones and autonomous vehicles move more effectively and efficiently in dynamic environments where conditions can change rapidly.
AI Paper Proposes to Inject the 3D World into Large Language Models and Introduce a Whole New Family of 3D-LLMs. Multi-modal LLMs (like Flamingo and BLIP-2) align pictures and videos with LLMs, yielding a new breed of AI models that can comprehend and make sense of 2D visuals. This research extends multi-modal concepts into the real 3D physical world, to build models that understand things like spatial relationships, affordances, physics, and interaction.
If we combine the above four research efforts into a single concept, we can envision "fully multi-modal" AI models with multi-sensory inputs of text, audio, and vision, plus 3D understanding and an embodied sense of how to navigate that 3D world. At the current rapid pace of development, we could see such models in only a few years.
AI safety concerns highlighted: AI researchers say they've found 'virtually unlimited' ways to bypass Bard and ChatGPT's safety rules. The researchers found that jailbreaks they had developed for open-source systems could also be turned against mainstream, closed AI systems, bypassing the safety rules of several LLMs. The "jailbreak" technique involves adding a specific suffix to a prompt to trick the model into answering instead of refusing. Any given jailbreak can be patched, but for each breach that is sealed another waits to be discovered. How to build tools that avoid these pitfalls is a key question in AI safety research.
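Mechanically, the attack is simple string concatenation; the hard part, which the researchers automate with a gradient-guided search over token sequences, is discovering a suffix that actually works. The sketch below is illustrative only, and the suffix shown is a harmless placeholder, not a real adversarial string.

```python
# Placeholder standing in for a suffix found offline by automated,
# gradient-guided token search (the search itself is not shown here).
PLACEHOLDER_SUFFIX = "<optimized adversarial token sequence>"

def craft_attack_prompt(user_prompt: str, suffix: str = PLACEHOLDER_SUFFIX) -> str:
    """Append the adversarial suffix to an otherwise-refused prompt."""
    return f"{user_prompt} {suffix}"

print(craft_attack_prompt("A request the model would normally refuse"))
```

A key finding of the research is transfer: a suffix optimized against open-source models often carries over to closed models, which is why patching individual strings does not close the underlying hole.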
A recent paper called "Application of artificial intelligence in libraries and information centers services: prospects and challenge" considered AI in library information services. It concludes that institutions could establish AI systems to handle a myriad of tasks, including cataloging, indexing, and referencing, among other things.
Machine learning models have been applied to stock price prediction, assessing news and picking up on social cues to predict price movements and profit from the prediction. The paper "A Future Trading System using Ensemble Deep Learning" illustrates how ensemble deep learning can improve performance on this challenge, with accuracy up to 91%.
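The ensemble idea can be sketched minimally, assuming the paper combines several trained networks by averaging their outputs; each "model" below is a stub function returning the probability of an upward move, where real ensemble members would be trained deep networks.

```python
def ensemble_predict(models, features, threshold=0.5):
    """Average the members' up-move probabilities and vote on direction."""
    avg = sum(model(features) for model in models) / len(models)
    return "up" if avg >= threshold else "down"

# Three stub members; the average (0.7 + 0.6 + 0.4) / 3 is about 0.567.
members = [lambda x: 0.7, lambda x: 0.6, lambda x: 0.4]
print(ensemble_predict(members, features=None))  # up
```

Averaging over diverse members tends to cancel the idiosyncratic errors of any single network, which is the usual reason ensembles beat their individual components on noisy targets like price movements.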
AI Business and Policy
Studios Quietly Go on Hiring Spree for AI Specialist Jobs Amid Strike. In the midst of the actor and writer strikes, companies are opting to hire AI specialists rather than concede to striking workers; for example, Netflix touts a $900K AI job amid the Hollywood strikes. One of the core points of contention in these strikes is the growing presence of generative AI, yet Disney, Sony, and Netflix, among other companies, seem to be leaning into AI and searching for replacements rather than addressing the strikers' demands.
More than 70% of companies are engaging with generative AI, but few are willing to commit more spending: over half (54.6%) of organizations are experimenting with it, while only 18.2% have moved to implementation. Corporate adoption of AI is still in its early days.
Stanford's HAI recently discussed the impacts of the EU's AI Act on American policy. The overarching idea in this discussion is that policy abroad will influence our policy, as companies do not want to design different tools for two different markets. Importantly, key weaknesses identified in the EU's AI Act were that the required transparency level of certain tools may need to depend on the fields they are used in, and that the Act does not consider datasets or model training. This is a pivotal time for forming AI policies that could both protect our future and encourage innovation.
AI startup funding: Beyond Work raises $2.5M to make work more "human" with LLMs.
AI Opinions and Articles
We have commonly applied the term "open source" to AI models that have some degree of openness. Yet the term has been abused, as organizations and companies have declared their models "open" while hiding details about training data, training process, model weights, and architecture. This article corrects the record: Llama and ChatGPT Are Not Open-Source: Few ostensibly open-source LLMs live up to the openness claim.
This article reports on research from Radboud University, which assessed the openness of AI models across a number of categories and charted the results of its assessment.
They are particularly scathing about Meta declaring its Llama models "open source" when we know little about the internals of the models: what is open about Llama / Llama 2 is the model weights, which enables customization.
"Meta using the term 'open source' for this is positively misleading: There is no source to be seen, the training data is entirely undocumented, and beyond the glossy charts the technical documentation is really rather poor. We do not know why Meta is so intent on getting everyone into this model, but the history of this company's choices does not inspire confidence. Users beware." - Mark Dingemanse
A Look Back …
Stanford University shares a website that catalogs the thousands of LLMs produced in recent years, called "Constellation: An Atlas of 15,821 Large Language Models." It leverages the Hugging Face catalog of uploaded LLMs to build out its database of models. This atlas serves as a taxonomy of the many AI models being produced and, in so doing, sheds light on the origin and evolution of LLMs in recent years.