TLDR:
This article is a short primer for a general audience on the use of AI agents to build autonomous organizations, or on key trends in that direction. In keeping with the rest of this Substack, it offers a personal, experiential perspective on how individuals can develop high-leverage tools, particularly in knowledge domains. It presents two autonomous pipelines used to gather and assess information about the life science industry, and touches on some considerations for the longer-term integration of AI agents in an enterprise setting.
For more like this see: jasonsteiner.xyz
Redefining the Possible
A year ago, I started exploring what was possible using AI tools, primarily LLMs, as a non-career software engineer. The experience has led me to believe that commercially savvy individuals who understand what is currently possible, and what is being developed, with regard to AI agents and LLM tools will be able to build extremely capable and scalable organizations with very few resources. There is a lot of hype in the industry about AI agents and their potential impact on labor markets; that impact is both overestimated and underestimated. Building functional workflows entails a tremendous amount of work (and that is before even considering a polished product), but the scalability and capabilities those workflows enable will be an unparalleled lever for enterprises that construct them well. Below are two examples that I have built from scratch (to the extent that LLM coding assistants can be viewed as scratch): the entire frontend, backend, agentic logic, database management, and so on started from plain text files (and some useful libraries). For many people I know, the potential of LLM tooling is still opaque, and learning the relevant disciplines seems out of reach, but these tools are substantially more practical than one might imagine.
The adage "the future is here, it is just not evenly distributed" rings very true.
The Life Science Web: Workflow for Automating Data Extraction
The first example is a workflow for the automated construction of networks in the life science industry. It autonomously tracks conflict-of-interest disclosures reported across the industry. Currently it contains over 175k unique relationships between individuals and companies, typed by role (for example: consultant, investor, SAB member, advisor, employee), across more than 70k individuals and nearly 20k companies. It runs and updates automatically every day.
Deep(er) Research*: Autonomous Clinical Trials Assessment Agent
The second example is a clinical trials research agent built on top of hundreds of thousands of technical publications. While it is built on a workflow, it is agentic in nature: it determines whether it has sufficient information (and of what type), and continues to expand its search and identify information gaps until assessment criteria for the quality of the research are met. ( * like most things in the AI world, naming isn't great yet…)
These agents are autonomous. They operate in the background, can be called on demand, and are arbitrarily scalable. Their construction taught several lessons about database infrastructure, modularity, extensibility, and flexibility, many of which are discussed below. The underlying mechanics of each process are adaptable to a wide range of similar use cases (e.g., automated information gathering and extraction, or automated research synthesis, on any arbitrary topic).
These agents are potential components of an autonomous organization. Importantly, LLMs have dramatically lowered the technical bar for individuals who are not career software engineers. The construction of autonomous organizations happens cumulatively, with subprocesses developed to address specific business needs. Ensuring consistency, interoperability, and accessibility across agents will provide the foundation for increasingly autonomous organizations.
For more on this topic see: jasonsteiner.xyz
For details or interest in the above agents: Contact Here
Autonomous Unicorns
Sam Altman has been credited with posing the question of when we will see the first single-founder unicorn, that is, a company valued at over $1B and operated by a single individual. This is an ambitious viewpoint.
A nearer term question for most founders and established enterprises alike is how to incorporate advances in AI into their organizations — and indeed what is possible currently, what is coming, and how fast.
These advances are often referred to under the collective term "AI agents," and the defining characteristic is agency. A related, older idea is software indeterminacy, a central aspect of paradigms like fuzzy logic; agentic LLMs can be considered an extension of this idea: fuzzy logic in very high dimensions for complex situations. In practice, this means designing software systems in which LLMs produce outputs within a decision-making framework, often accompanied by "tools," which are generally deterministic software packages like calculators and web browsers but may also be other language models that perform different functions. LLMs that play different roles, for example in a debate setting, can provide much more robust and accurate information to users.1
This is an easy practical step to take from familiar LLM chatbots. In a very simple form, one can imagine asking an LLM a yes/no question and, depending on the answer, directing a software workflow down a particular path. The question can be arbitrarily complex, limited only by the performance of the LLM, which, as we have all experienced, can be considerable.
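As a toy illustration of that yes/no branch (the routing targets and the conflict-of-interest question are invented for the sketch, and the LLM call is injected as a plain callable so the logic can be shown without an API key):

```python
# Minimal sketch of gating a workflow on an LLM's yes/no answer.
# In practice `ask_llm` would wrap a chat-completion call from any provider.
from typing import Callable

def route_document(text: str, ask_llm: Callable[[str], str]) -> str:
    """Ask a yes/no question and branch the workflow on the answer."""
    prompt = (
        "Does the following text disclose a conflict of interest? "
        "Answer with exactly 'yes' or 'no'.\n\n" + text
    )
    answer = ask_llm(prompt).strip().lower()
    if answer.startswith("yes"):
        return "extract_relationships"  # route to the extraction pipeline
    return "archive"                    # no further processing needed

# A deterministic stub standing in for a real model:
def stub_llm(prompt: str) -> str:
    return "yes" if "consultant" in prompt else "no"

print(route_document("Dr. X is a paid consultant for Acme Bio.", stub_llm))
# extract_relationships
```

The same pattern scales from a single branch to the decision frameworks described below: the LLM's answer selects among deterministic next steps.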
With the capabilities of LLMs to produce language-based outputs to complex inputs and the ability to connect those outputs to specific actions, it is easy to imagine how very complex processes can be assembled to address a broad cross-section of use cases.
This article will discuss some of the practical aspects, from a first-hand perspective on building this type of software. The capabilities are impressive, but there are many technical challenges to effective implementation and adoption.
For some additional reading on prior topics related to the commercialization of AI products and building LLM systems, the following may be of interest:
Some Terminology
An important distinction to make upfront is the difference between LLM workflows and LLM agents — this article will discuss both.
The distinction lies in the indeterminacy of the code. LLM workflows use LLMs, for example for extracting information from documents, but their procedural path is fixed.
In contrast, LLM Agents do not have a fixed path — the actual execution of software is determined by the outputs of the LLMs themselves. Both are very useful paradigms. Agents, however, have considerably more complex implementations.
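The distinction can be sketched in a few lines (a toy illustration, with a stubbed LLM standing in for a real model):

```python
def llm(prompt: str) -> str:
    """Stub for a real LLM call: declares the search done once two results exist."""
    return "done" if prompt.count("result") >= 2 else "search_more"

def workflow(doc: str) -> list[str]:
    # Workflow: the procedural path is fixed; every input takes the same steps.
    return ["fetch", "extract", "store"]

def agent(findings: list[str], max_steps: int = 5) -> list[str]:
    # Agent: the LLM's own output decides whether to keep searching.
    steps = []
    for _ in range(max_steps):
        decision = llm(f"Findings so far: {findings}. Is this sufficient?")
        if decision == "done":
            break
        findings.append("result")
        steps.append("search")
    return steps
```

The workflow's path is known before it runs; the agent's path only emerges at runtime from the model's decisions, which is why agents need the logging and guardrails discussed below.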
Tools and Frameworks
There is an explosion of frameworks being developed to build AI agents — some of the more common ones include Langchain, n8n, crew.ai, and, most recently, a toolkit released by OpenAI. This article will not touch on the details of these tools. In general, these tools are not complex — they are largely collections of APIs that make instructions, tool allocation, and memory management more streamlined, but the underlying basics are the same. This discussion is about writing agents from scratch. The primary reason is both educational and practical, as the power of agents lies in the ability to customize their use cases, and this is often less possible with low-code or no-code solutions.
Key Lessons
Before starting to consider implementing agents, one should certainly be familiar with LLMs at the API level and what they can and cannot do.
Building AI agents is an architecture problem — the more detail that is put into the upfront architecture design, the better the implementation will be.
For real-world applications, data engineering is a very significant part of the implementation. This includes input information, reference information (e.g., for RAG), and logging information. This is perhaps the most extensive technical component of implementation. Traceability, memory management, and logging need to be priority design decisions.
Prompt engineering is very important for LLMs that serve as decision-making agents. Variations in how prompts are structured, and in the specific language used, can have significant impacts on outputs. For example, placing critical instructions at the end of the prompt helps the model attend to them when generating a response.
Mapping a decision workflow in detail is mandatory. For real-world applications where agents may operate autonomously, the greater the detail in the upfront workflow, the more consistently the agent will follow the logic.
Modularity is critical. To the greatest extent possible, agentic components should be modular to allow rapid swapping of prompts, tools, data pipelines, and logic flows. Many of the toolkits mentioned above do help with this by providing abstracted modules that can be plugged together.
For agents that are designed for creative or open-ended tasks, like research agents, one should manage LLM temperature settings accordingly. For agents that are designed to execute SOPs, temperature should be set to 0 for reproducible results.
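One way to act on the modularity and temperature points above (a hypothetical structure, not taken from any of the frameworks mentioned earlier) is to bundle the prompt, model settings, and tools into a small config object, with temperature an explicit per-module setting:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentModule:
    """A swappable agent component: prompt, settings, and tools in one place."""
    name: str
    prompt_template: str
    temperature: float = 0.0                     # 0 for reproducible SOP-style runs
    tools: dict[str, Callable] = field(default_factory=dict)

    def render(self, **kwargs) -> str:
        return self.prompt_template.format(**kwargs)

# Swapping a prompt, tool, or temperature is then a one-line change:
extractor = AgentModule("coi_extractor", "List conflicts of interest in: {text}")
researcher = AgentModule("researcher", "Propose hypotheses about {topic}", temperature=0.8)
```

Keeping these pieces in data rather than scattered through the control flow is what makes rapid swapping of prompts, tools, and logic practical.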
Robust error and exception handling are mandatory. This is important on at least three main levels: variability in third-party APIs, inconsistencies in real-world data integrity, and stochasticity in LLM outputs. All of these can negatively impact agentic processes. LLM capabilities like structured outputs, libraries like pydantic, and robust I/O management are all critical to managing edge cases.
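As a sketch of that defensive layer (the schema and retry policy are illustrative assumptions; the pydantic v2 API is shown):

```python
import json

from pydantic import BaseModel, ValidationError

class Relationship(BaseModel):
    """Illustrative schema for one extracted industry relationship."""
    person: str
    company: str
    relationship_type: str

def parse_llm_output(raw: str, retries: int = 2, regenerate=None):
    """Validate LLM JSON output; optionally re-prompt on failure, else skip."""
    for attempt in range(retries + 1):
        try:
            return Relationship.model_validate(json.loads(raw))
        except (json.JSONDecodeError, ValidationError):
            if regenerate is None or attempt == retries:
                return None           # log and skip rather than crash the pipeline
            raw = regenerate(raw)     # e.g. re-prompt the LLM with the error text

good = '{"person": "Dr. X", "company": "Acme Bio", "relationship_type": "consultant"}'
```

Returning `None` (and logging) on irrecoverable output keeps one malformed generation from halting an otherwise autonomous run.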
UI/UX is a primary objective and should be very clearly defined. This requires a clear sense of what the agent is intended to do and who the ultimate consumer is. Some agents will have other agents as consumers; for now, it is more likely that agents will have human consumers, especially in contexts where liability is a concern. In either case, the way users interact with the agent should be the primary motivator behind architecture decisions.
Training Agents
This article does not substantively touch on training agents, but there are a few paradigms worth mentioning:
Fine-tuning LLMs for decision making. Where robust SOPs exist, this is a supervised process: fine-tuning an LLM on generated input/output pairs to ensure that the agent correctly executes the SOP. It is most useful when there is a clear decision tree to follow and an objective endpoint.
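A supervised dataset for that kind of SOP fine-tuning might be assembled as pairs of situation and correct decision. The chat-style JSONL below follows a common convention for fine-tuning APIs, though exact field names vary by provider, and the events and actions are invented:

```python
import json

# Invented SOP decision examples: (event, correct action)
sop_cases = [
    ("Trial halted for a safety signal", "escalate_to_review"),
    ("Minor administrative protocol amendment", "log_and_continue"),
]

# One JSON object per line: the user turn states the event,
# the assistant turn is the action the SOP dictates.
jsonl_lines = [
    json.dumps({
        "messages": [
            {"role": "user", "content": f"SOP decision for event: {event}"},
            {"role": "assistant", "content": action},
        ]
    })
    for event, action in sop_cases
]
print(jsonl_lines[0])
```

The quality of such a dataset, i.e. how completely it covers the decision tree's branches, largely determines how reliably the fine-tuned agent executes the SOP.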
In open-ended settings, such as scientific research, where the decision-making framework is not established, training is often done in a reinforcement setting, where decisions are assigned a reward value. This is similar to the way reasoning models like DeepSeek R1 have been trained, with the addition of branches that incorporate the use of tools rather than a purely LLM-generated framework. Some interesting work in this area has been published by FutureHouse in their Aviary paper.4
Autonomous Organizations
Building agents is a cumulative process and should be thought of strategically, both from an implementation perspective and from a use case perspective. One of the central advantages of LLMs, and AI tooling in general, is the ability to synthesize vast amounts of information. This ability is most powerful when such information is cumulatively accessible and agents are built in a modular fashion. For example, if an agentic system were built to provide a strategic perspective for an executive team, that system would likely consist of several subsystems assessing specific aspects of the business and its ecosystem: agents for financial data, market data, sales data, operational data, and so on. Each of these subsystems may be constructed individually; however, they should have outputs and branch points that allow them to be piped together, and new agent subsystems should be easy to integrate. Adding one should be as straightforward as adding a new invitee to an executive review committee.
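A minimal sketch of that kind of composition (the subsystem names and output shapes are hypothetical): each subsystem takes a shared context and returns its section of the report, so adding a new "invitee" is just appending to the list.

```python
from typing import Callable

Subsystem = Callable[[dict], dict]  # shared context in, findings out

def finance_agent(ctx: dict) -> dict:
    runway = "tight" if ctx.get("cash_months", 0) < 12 else "fine"
    return {"finance": f"runway looks {runway}"}

def market_agent(ctx: dict) -> dict:
    return {"market": "category growing"}

def executive_review(ctx: dict, subsystems: list[Subsystem]) -> dict:
    """Pipe subsystem outputs into one report; each contributes its section."""
    report = {}
    for subsystem in subsystems:
        report.update(subsystem(ctx))
    return report

# Adding a new subsystem later means appending to this list, nothing more:
report = executive_review({"cash_months": 6}, [finance_agent, market_agent])
```

The important design choice is the shared contract (context in, findings out); with that fixed, subsystems can be developed independently and composed later.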
We are moving into a world where agentic processes will be built into organizational charts, with job descriptions, specified authorizations and access, and reporting lines. It is important, however, that agents are not, in general, substitutes for people, particularly in decision-making contexts that carry liability. Training an agent to act independently with reliability, consistency, and trust remains a challenge. It should be expected, however, that agents will significantly improve the scope and productivity of individuals, and organizations will need to plan for that.
The adoption of enterprise-level agents will be deliberate, but it is highly likely that companies that do this well will outperform in their industries.
Planning for the Future
Building AI agents into an enterprise, or building an enterprise with them, is a new paradigm, and the companies that do it well will excel. Agentic systems should be built with the understanding that:
LLM capabilities will improve. This includes tactical features like context length for larger inputs, data types for multimodal analyses, and reasoning capabilities that can aid in structuring agent workflows.
The end users of agents will increasingly be other agents (at least as intermediaries), but decisions that carry liability will still be made by humans, at least for now.
Data structures matter more than ever. Data infrastructure is a huge topic in enterprise business, and agents will make it even more important: the power of agents is their ability to synthesize information, and the more accessible that information is across an enterprise, the better the performance. Pipelines that transform all types of data (structured, semi-structured, and unstructured) into clean and accessible formats should be a central aspect of an enterprise's data strategy at the point of data generation, wherever that may be.
It should be expected that agents will transform many aspects of business. This may happen more slowly than hype cycles will advertise, but it will happen, and it will be significant. Companies that are building their infrastructure with this mental model will have advantages, operationally, tactically, and strategically.
It is Actually Amazing…
It is not necessary to write agents from scratch to benefit from agentic products. Tools like Gemini DeepResearch, new products from Perplexity.ai, and a host of products being released for both workflow automation and modular agent composition are enabling a broad spectrum of technical expertise to engage with agentic AI. One of the most significant accelerators of innovation was the development of cloud computing, which gave individuals on-demand access to an arbitrarily powerful computing stack and eliminated the need to build local infrastructure. Agentic AI will be similarly impactful, enabling the construction of complex processes that are arbitrarily scalable.
It is actually amazing.
I think we should expect to see organizations adopting increasing autonomy. It is not an overnight transition, but organizations that do it well will have substantial advantages.
A primary goal of this Substack is to encourage people who are bio x AI curious to gain a better intuition for what is possible and practical. If you like it, please consider subscribing and sharing.
References
https://arxiv.org/html/2402.06782v4
https://arxiv.org/html/2412.21154v1