The future is plain English in plain text files
02 Oct 2023 by Luke Puplett - Founder
Have you ever considered using plain old text files to store facts and data for your applications? I know, it sounds a bit counterintuitive in today's world of complex databases. But bear with me for a moment.
With the rise of advanced AI systems like large language models, new possibilities have emerged for how applications can leverage knowledge. Models like GPT-3 have shown impressive abilities to encode factual information within their parameters through training. This allows them to then reason about concepts and generate language reflecting that knowledge. At the same time, these models can directly process natural language, reducing the need for structured databases in some cases.
This leads to an intriguing question: for building new applications in an AI-first world, could plain text storage actually be a simple yet powerful approach for certain use cases? Might it facilitate rapid iteration while meeting core needs, especially when combined with AI capabilities?
In this post, I make the case that storing factual knowledge in human-readable text can provide unique benefits for many AI-centric applications. We'll explore some real-world examples later on. Of course, traditional structured storage like SQL databases will continue to play an important role in many situations. But for apps built around language models, maintaining facts as plain text can enable lightweight development while providing flexibility.
Let's first look at some of the advantages this unconventional approach can offer...
The Benefits of Plain Text Storage
Storing factual knowledge as plain text rather than in traditional structured databases offers some unique advantages:
Simplicity
There is no need to model the data, design a schema, or set up complex storage infrastructure. Facts can be written in whatever format is clearest. This simplifies development, speeds iteration, and makes it easy to capture unstructured/unpredictable information.
Natural Language Processing
With facts stored as readable sentences and paragraphs, advanced AI models can directly ingest the text data without needing to parse structured information. This allows seamless natural language processing, summarization, and reasoning.
Flexibility
Plain text facts can express rich information without the limitations of a tabular structure. This makes it possible to represent evolving real-world data, even when its shape cannot be predicted in advance.
Low Overhead
Text files have very little storage overhead compared to databases. They also minimize infrastructure needs and management, avoiding complex setup/maintenance.
While plain text gives up some amount of queryability and analyzability compared to structured data, many applications don't require extensive declarative queries or cross-table joins for their core functionality. The natural flexibility and lightweight nature of text storage can make it an optimal choice for rapidly developing AI-centric systems.
Real-World Use Case: Job Application Tracking
Let's look at a concrete example of how plain text storage could simplify an AI application.
Consider a recruiting platform that helps summarize the status of job applications. The latest updates for each applicant could be maintained in a text file like this:
App ID 345192
- Candidate submitted application on 9/24/2023
- Automated screening completed on 9/25/2023
- Phone interview conducted on 10/1/2023
- Hiring manager completed interview notes on 10/3/2023
- Offer letter sent to candidate on 10/5/2023
App ID 832471
- Application received on 8/30/2023
- Completed screening questions on 9/2/2023
- In-person interview scheduled for 10/8/2023
- Requested work samples on 10/1/2023
- Candidate uploaded samples on 10/4/2023
By storing the updates in plain text, a language model could scan the latest facts and generate a summary like "App 345192 was sent a job offer last week, while 832471 has an in-person interview upcoming after passing initial screens."
The unstructured text format makes it easy to capture evolving applicant data without imposing a schema. No complex parsing is needed: the language model ingests the plain text facts directly and produces helpful tracking overviews for recruiters. This showcases the natural fit between unstructured text storage and language model capabilities.
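As a rough illustration of the "no parsing" point, here is a minimal Python sketch that wraps the raw text in a prompt and hands it to a model. The names `build_summary_prompt`, `summarize`, and `complete` are hypothetical, as is the prompt wording and the date passed in; `complete` stands in for whichever model client you use, since the post does not name one.

```python
from typing import Callable

# A shortened copy of the sample file above, kept as one raw string.
FACTS = """\
App ID 345192
- Candidate submitted application on 9/24/2023
- Offer letter sent to candidate on 10/5/2023
App ID 832471
- Application received on 8/30/2023
- In-person interview scheduled for 10/8/2023
"""


def build_summary_prompt(facts: str, today: str) -> str:
    """Wrap the raw facts in instructions; the facts themselves are not parsed."""
    return (
        f"Today's date is {today}.\n"
        "Below are status updates for job applications, as plain text.\n"
        "Summarize the latest status of each application in one sentence.\n\n"
        f"{facts}"
    )


def summarize(facts: str, today: str, complete: Callable[[str], str]) -> str:
    """Send the prompt to whatever model client `complete` wraps (an assumption)."""
    return complete(build_summary_prompt(facts, today))


if __name__ == "__main__":
    # Stand-in for a real model call, so the sketch runs offline.
    def fake_model(prompt: str) -> str:
        return f"(model output for a {len(prompt)}-character prompt)"

    print(summarize(FACTS, "10/6/2023", fake_model))
```

Passing today's date in the prompt is one way to let the model resolve relative phrases such as "last week"; whether that is enough in practice depends on the model.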
While this text representation would not allow efficient aggregated reporting across applications, that kind of analysis is not needed for the core use case of summarizing the latest status of individual applications. By focusing the storage format on that primary functionality, development and iteration can be greatly simplified.
Retaining Complete Activity Log
Even with plain text storage, it can be useful to archive a complete log of all status changes, actions, and events over time. This full history can then be used to reconstruct other representations like databases later if needed.
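One simple way to keep such a history is an append-only text file. The sketch below is an assumption about how that might look, not a description of any particular system: `append_event`, the file name, and the line layout (UTC timestamp, then application ID, then the event) are all made up for illustration.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_event(log_path: Path, app_id: str, event: str) -> str:
    """Append one timestamped line to the log and return the line written."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    line = f"{stamp} App ID {app_id}: {event}"
    # Mode "a" only ever adds to the end, so earlier history is never rewritten.
    with log_path.open("a", encoding="utf-8") as log:
        log.write(line + "\n")
    return line


if __name__ == "__main__":
    # Demo in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as folder:
        log_file = Path(folder) / "activity.log"  # hypothetical file name
        append_event(log_file, "345192", "Offer letter sent to candidate")
        append_event(log_file, "832471", "Candidate uploaded samples")
        print(log_file.read_text(encoding="utf-8"))
```

Because lines are only ever added, the file doubles as an audit trail, and the "latest status" file can be regenerated from it at any time.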
For example, the AI system could parse the history logs to automatically generate:
- A CSV file of all status updates
- SQL commands to recreate the current snapshot in a relational format
- A JSON document mapping application IDs to timeline arrays
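The post suggests handing this conversion to the AI system. As a point of comparison, when the text is as regular as the sample file earlier in the post, a few lines of ordinary Python can produce the JSON and CSV forms too. This is only a sketch under that assumption: `parse_timelines`, `to_csv`, and the column names are hypothetical, and messier free-form text is where a language model would earn its keep.

```python
import csv
import io
import json
import re

# A shortened copy of the sample file, assumed to follow its layout exactly.
LOG = """\
App ID 345192
- Candidate submitted application on 9/24/2023
- Offer letter sent to candidate on 10/5/2023
App ID 832471
- Application received on 8/30/2023
"""

HEADER = re.compile(r"^App ID (\d+)$")


def parse_timelines(text: str) -> dict[str, list[str]]:
    """Map each application ID to its list of update lines, in file order."""
    timelines: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            timelines.setdefault(current, [])
        elif line.startswith("- ") and current is not None:
            timelines[current].append(line[2:])
    return timelines


def to_csv(timelines: dict[str, list[str]]) -> str:
    """Flatten the timelines into CSV rows of (app_id, update)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["app_id", "update"])
    for app_id, updates in timelines.items():
        for update in updates:
            writer.writerow([app_id, update])
    return buffer.getvalue()


if __name__ == "__main__":
    parsed = parse_timelines(LOG)
    print(json.dumps(parsed, indent=2))
    print(to_csv(parsed))
```

Lines that match neither pattern are silently skipped here; a real converter would probably want to report them instead.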
So while text storage is used during initial development, the log data preserves the ability to produce structured databases down the line by delegating that process to the AI system itself.
Conclusion
In the AI era, plain text can be considered a viable data storage option alongside traditional databases for certain applications. While structured storage offers important capabilities like complex querying, text files provide a lightweight and flexible method for capturing evolving real-world facts.
For rapidly developing AI-centric systems that leverage the power of language models, maintaining knowledge as readable text supports iteration speed and natural language processing. Storage overhead is minimized while accommodating unstructured data.
Of course, the optimal choice depends on an application's specific needs. Analytics-heavy systems will likely continue relying on relational databases. But many apps built around condensing knowledge into AI models can get by just fine on text files for storage.
By retaining comprehensive logs, this approach also allows generating structured representations later if desired. The text facts can be parsed into CSV exports, SQL table constructs, JSON documents, and more on demand.
In summary, don't underestimate plain text and human language when designing AI applications. Combined with the capabilities of modern language models, text storage provides a simple yet adaptable paradigm for managing evolving real-world knowledge. For many use cases, it strikes the right balance between flexibility and functionality. And it may just help you build and iterate your next AI system faster than ever.