Language model lore… what is it good for?
SJ: I’ve been working w/ generative models since the Met hackathon we helped organize: on ways to dissect & label latent spaces, and recently on getting text transformers (like GPT) to generate syntheses (encyclopedia articles) and to take non-text inputs (describing images) w/ minimal retraining. Both approaches currently work w/ supervision; the amount of supervision needed and the usefulness of citations in the outputs are improving every week. Models remain prone to mistakes, as do sources themselves, so this is still only suitable for draft generation. However, Bing😊 and Neeva point to ways to choose the density of sourcing, and to how that can be integrated w/ search + review for a fast, spot-checkable result. Even with that supervision step, this is a great speedup for background research: 5-10x faster for my own drafting. Currently poor at deciding how much weight to give different ideas, or at comparing conflicting ones; fine for lists, overviews, summaries, redrafting, tone + format switching, translation.
Specific example from the wikiverse (where edits are already all of unknown quality, from pseudonymous sources, and validated by the accuracy + quality of their cites): for drafting or expanding articles, GPT-4 works as a fast 80/20 substitute for a range of other tools (and faceless third-party services). Without supervision: GPT-3 was hard to use; ChatGPT is comparable to low-quality anon edits today; Bing😊 produced crank-quality drafts (competent, but with possibly persistent mistakes); and GPT-4 produces high-quality anon edits by default (could pass as a research note; may be swayed by one camp over another, but likely to note nuance + avoid overconfidence) and good edits under the right conditions and prompting (translation, clarification, expansion, adding valid sources + recent facts from the news); better still with a Wolfram plugin.
All of this, plus good open models, points to inexpensive self-hosting and fine-tuning this year. Once you have access to a good text transformer, you can spend much less time training a projection of other inputs onto what it can parse. Fine-tuning outputs w/ feedback to satisfy correctness or technical constraints is also improving rapidly, and will directly replace a range of entry-level analysis. GPT4-in-progress (“DV”), according to friendly lawyers, can output boilerplate legal text comparable not to an L1 (a first-year law student) but to a first-year lawyer. [Still just a transformer + RLHF!]
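A minimal sketch of that projection idea (my gloss on adapter-style approaches like Frozen / BLIP-2, not a specific recipe; all dimensions and names are made up): freeze both the image encoder and the LM, and train only a small linear map from image embeddings into the LM’s token-embedding space.

```python
import torch
import torch.nn as nn

IMG_DIM, LM_DIM, N_PREFIX = 768, 4096, 8  # hypothetical sizes

class ImagePrefixProjector(nn.Module):
    """Only this layer is trained; the image encoder and the LM stay frozen."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(IMG_DIM, LM_DIM * N_PREFIX)

    def forward(self, img_emb: torch.Tensor) -> torch.Tensor:
        # (batch, IMG_DIM) -> (batch, N_PREFIX, LM_DIM): a few "soft tokens"
        # prepended to the text embeddings the LM already understands.
        return self.proj(img_emb).view(-1, N_PREFIX, LM_DIM)
```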
Current-gen transformers are strongly aligned w/ reading + writing workflows: good at responding to natural language and at tasks from parsing and interpretation to completion, translation, and format conversion. They’re also fundamentally inexpensive (at the scale of cloud services): a few cents a prompt for one-offs, and dropping quickly; $100-200/hr for an unlimited private instance of GPT-4 w/ 8k-32k tokens of context per session. Open versions of the major text and media models come out quickly after closed ones do, so it’s possible to spin up and fine-tune your own.
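To make “a few cents a prompt” concrete, back-of-envelope arithmetic (the per-token rates are my assumption, taken from the March ’23 GPT-4 8k list prices; check current pricing before relying on this):

```python
PROMPT_RATE, COMPLETION_RATE = 0.03, 0.06  # assumed $ per 1K tokens, GPT-4 8k

def prompt_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return (prompt_tokens * PROMPT_RATE
            + completion_tokens * COMPLETION_RATE) / 1000

print(f"${prompt_cost(1500, 500):.3f}")  # -> $0.075 for a typical exchange
```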
These models are good at many things that improve the quality of reading, writing, and analysis, and are likely to be central to all future publishing workflows. Our current services could be improved with the right integration, and new ones may become possible. Content transformation (text to speech, audio, and video) is constantly getting better, and staying on top of that for users is a hugely valuable meta-service.
How might this fit into the roadmap?
A subset of what’s possible, off the top of my head: goals that aren’t too much of a stretch from current plans (as of Feb ‘23):
Make common tasks easier, faster, more fun:
1. Reading / writing
2. Tagging / assessment
3. Alternate formats: outline, illustration, abstract
4. Moderation (of spam, comments, users)
Support complex tasks (more likely to need oversight):
5. Format transformations
6. Brainstorming names / titles
7. Outlines / list completions
8. Reviews / analyses
Suggest summarization
Abstract generation
Rewriting for brevity
Suggest completion
Outline / list completion
Transform format/style
Tables to text, v-v
Current import
Dataset to data-paper, v-v
Image generation
Automatic thumbs and headers (midjourney, sd, dalle)
Media integration
Audio and video transcription
Image captioning and alt-text
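The transcription half is already easy to prototype w/ the open-source whisper package (a real library; the file path is a placeholder):

```python
import whisper

model = whisper.load_model("base")     # small open checkpoint, CPU-friendly
result = model.transcribe("talk.mp3")  # placeholder path
print(result["text"])                  # full transcript; timestamps are in result["segments"]
```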
Overall
Buttons that run specific prompts w/ context drawn from current metadata + cursor / viewing-window position (see the sketch after this list)
Separate column for quick prompt + output. [conveniently: can silently specify size constraints of outputs]
Embedded at cursor under some circumstances (Ghostwriter does this; most contextual, hard to get right)
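A rough sketch of the prompt-button idea (the function, metadata fields, and word limit are hypothetical; the call matches the openai Python API as of early ’23):

```python
import openai  # assumes openai.api_key is configured

def run_prompt_button(task: str, pub_meta: dict, selection: str) -> str:
    """Run one fixed task over the text near the cursor, w/ pub context."""
    messages = [
        {"role": "system",
         "content": f"You are editing '{pub_meta['title']}' ({pub_meta['genre']}). "
                    f"Task: {task}. Keep the output under 120 words."},  # the silent size constraint
        {"role": "user", "content": selection},
    ]
    resp = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    return resp.choices[0].message.content
```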
Text
Prompt prefix/suffix for ChatGPT wrapper (a Ghostwriter-like experience; sketch at the end of this section)
Fine-tuning on a topical corpus? on existing public work in a community?
Fixed prompts for common tasks, w/ pub or its metadata as input
Series templates: sparkling C&P
Templated draft of the next in a series (special issues)
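For the prefix/suffix wrapper above, a sketch (the task presets are invented; same early-’23 openai API): each button is just a different (prefix, suffix) pair around the pub’s text.

```python
import openai  # assumes openai.api_key is configured

PROMPTS = {  # hypothetical presets
    "summarize": ("Summarize the following pub for a general reader:\n\n", ""),
    "next_in_series": ("Using this special issue as a template, draft an "
                       "outline for the next one:\n\n",
                       "\n\nKeep the section structure; replace the topic."),
}

def run_wrapped(task: str, pub_text: str) -> str:
    prefix, suffix = PROMPTS[task]
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prefix + pub_text + suffix}],
    )
    return resp.choices[0].message.content
```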
Images
Thumb/header generation (sketch below)
Inline section illustration
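Sketch of thumb/header generation (DALL·E here because it had a public API; Midjourney / SD would be analogous; the prompt template and metadata fields are assumptions):

```python
import openai  # assumes openai.api_key is configured

def header_image_url(pub_meta: dict) -> str:
    prompt = (f"Abstract editorial header illustration for '{pub_meta['title']}', "
              "minimal, muted palette, wide crop")
    resp = openai.Image.create(prompt=prompt, n=1, size="1024x1024")
    return resp["data"][0]["url"]
```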
Layout
Description-to-template layout generator for PubPub-specific style
Community-level CSS: generating a valid PubPub look?
Data
Extraction from pub text (sketch after this list)
Conversion to/from Pub tables (and collections, other?)
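Sketch of the extraction piece (the schema is invented, and real use would need a validation pass, since models can emit malformed or wrong JSON):

```python
import json
import openai  # assumes openai.api_key is configured

def extract_table(pub_text: str) -> list:
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Extract every numeric claim from the text below as a JSON "
                       'array of {"quantity", "value", "unit", "source_sentence"} '
                       "objects. Output JSON only.\n\n" + pub_text,
        }],
    )
    return json.loads(resp.choices[0].message.content)
```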
Homeworld: for grants drafting (help submitters convert a one-paragraph summary into a full proposal), topic tagging
Discourse graphs: for conversion b/t graph, outline, and summary; and for conversion among common JSON formats + tags
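A toy sketch of graph → outline (the node/edge shape is an assumption; real discourse-graph formats vary by community):

```python
def graph_to_outline(nodes, edges, root, depth=0):
    """Depth-first walk of a claim graph, rendered as an indented outline."""
    line = "  " * depth + "- " + nodes[root]
    children = [dst for src, dst in edges if src == root]
    return "\n".join([line] + [graph_to_outline(nodes, edges, c, depth + 1)
                               for c in children])

nodes = {"q1": "Question: does X scale?",
         "c1": "Claim: yes, roughly linearly",
         "e1": "Evidence: benchmark run"}
edges = [("q1", "c1"), ("c1", "e1")]
print(graph_to_outline(nodes, edges, "q1"))
```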
ML/AI communities: already actively using such tools to track breaking developments; sidebars that auto-summarize specific queries from the recent literature would be useful (a fast-changing field) and their limitations understood (the right audience)
Multimedia communities: the 🇨🇭🔪 aspect means something like IFTTT will be able to do more semantically interesting transforms (motif to description, MIDI to transposition or to video clip). Communities will likely split into keen / averse to integrating such a thing.
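One of those transforms is nearly a one-liner already, e.g. MIDI transposition w/ the mido library (a real library; filenames are placeholders):

```python
import mido

mid = mido.MidiFile("motif.mid")
for track in mid.tracks:
    for msg in track:
        if msg.type in ("note_on", "note_off"):
            msg.note = min(127, msg.note + 2)  # up a whole tone, clamped to MIDI range
mid.save("motif_up2.mid")
```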
(Addendum: plugin approaches to LLMs make the above easier to get started with & no less interesting.)