AI in Chemistry: Transforming Molecule Discovery with Artificial Intelligence


Discovering and synthesising a specific molecule has always required a significant amount of time and scientific expertise. However, developments in AI are speeding up the process and redefining the procedures.

In this article we look at what work has been done with AI in chemistry for molecule discovery, explore examples of what is currently possible and examine the outcomes.


How is AI applied to chemistry?

Chemists have used computers to aid in molecule discovery for many years. The field of cheminformatics, also known as chemical informatics or chemoinformatics, emerged with the advancement of automated techniques for creating compounds in parallel and conducting high-throughput screening (HTS) in the 1990s. This led to a significant increase in the availability of data on compounds and their properties. Cheminformatics encompasses various areas, including databases storing chemical structures, the prediction of structure properties, methods for virtual screening, identifying lead compounds for HTS, and designing chemical libraries. In the context of drug molecule discovery, key cheminformatic disciplines include pharmacophore modeling (representing essential features required for molecular recognition), quantitative structure-activity relationship (QSAR), docking, and molecular dynamics (MD) simulations.
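To make virtual screening concrete, here is a minimal sketch of similarity-based screening. It assumes each compound has already been turned into a fingerprint, represented here as a set of "on" bit positions, and it ranks library compounds against a known active using the Tanimoto coefficient. The Tanimoto coefficient is a common choice in cheminformatics but is our assumption, not a method named above; the fingerprints, compound names and threshold are invented for illustration.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by all on-bits in either set."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def screen(query: Set[int], library: Dict[str, Set[int]],
           threshold: float) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs at or above the threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Invented fingerprints: real ones would come from a cheminformatics toolkit.
known_active = {1, 4, 7, 9}
library = {
    "cmpd_A": {1, 4, 7, 9, 12},  # shares four bits with the known active
    "cmpd_B": {1, 4, 20, 21},    # shares two bits
    "cmpd_C": {30, 31},          # shares none
}

hits = screen(known_active, library, threshold=0.3)
for name, similarity in hits:
    print(f"{name}: {similarity:.2f}")
```

In practice the hits from a screen like this would only be a starting list for further computational or laboratory work.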

Machine learning (ML), a branch of artificial intelligence, is often used in chemistry. It uses algorithms to analyse large amounts of training data, learn from the patterns in them, and then make informed predictions. AI and machine learning techniques are now being used to model nonlinear datasets, as well as big data of increasing depth and complexity. Converting a compound structure into chemical data suitable for machine learning is a complex, multi-step process, and the choices made can affect how well the model performs. Different kinds of machine learning model are used: some make inferences by comparing new cases directly with the training data (instance-based learning), while others infer from a trained statistical model (model-based learning). Machine learning can be applied to many areas of chemistry, such as the development of cosmetic products, food chemistry, agrochemistry, risk assessment of chemicals, analytical chemistry, materials science, and process control, but pharmaceutical drug discovery is a major application of AI in chemistry.
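The difference between instance-based and model-based learning can be sketched in a few lines. The example below is illustrative only: `featurise` is a deliberately crude, hypothetical descriptor that counts C, N and O characters in a SMILES string, and the property values in the training set are invented. Real featurisation is far richer, as noted above.

```python
import math
from typing import List, Tuple

Features = Tuple[int, int, int]


def featurise(smiles: str) -> Features:
    """Crude descriptor: counts of C, N and O characters in a SMILES string.

    Illustrative simplification only: it ignores bonds, rings and
    stereochemistry, and would miscount two-letter atoms such as Cl.
    """
    upper = smiles.upper()
    return (upper.count("C"), upper.count("N"), upper.count("O"))


# Toy training set of (SMILES, property value). The values are invented.
TRAINING: List[Tuple[str, float]] = [
    ("CO", 15.0), ("CCO", 25.0), ("CCCO", 35.0), ("CCCCO", 45.0),
]


def predict_instance_based(smiles: str) -> float:
    """Instance-based: copy the value of the nearest training compound."""
    query = featurise(smiles)
    _, value = min(
        ((math.dist(query, featurise(s)), v) for s, v in TRAINING),
        key=lambda pair: pair[0],
    )
    return value


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def predict_model_based(smiles: str) -> float:
    """Model-based: fit a line to carbon count, then evaluate it."""
    xs = [float(featurise(s)[0]) for s, _ in TRAINING]
    ys = [v for _, v in TRAINING]
    slope, intercept = fit_line(xs, ys)
    return slope * featurise(smiles)[0] + intercept


query = "CCCCCO"  # one carbon more than anything in the training set
nearest_value = predict_instance_based(query)
model_value = predict_model_based(query)
```

For a query outside the training range, the instance-based prediction simply repeats the closest known compound, whereas the fitted line extrapolates. Neither behaviour is automatically better; which is appropriate depends on the data.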

Molecule discovery, particularly drug molecule discovery, follows an iterative, three-step cycle of design-make-test: designing molecules, making (synthesising) molecules, and testing the molecules for activity. This trial-and-error approach is very slow. It is estimated that bringing a new drug to market from scratch costs around $1 billion and takes around 12 years. There is a continual need to accelerate drug discovery, reduce its costs, and decrease the time needed to develop new drugs and bring them to market.

But how can AI help this process?

Design – AI can design new molecules from scratch, but it can also be used to predict the chemical, biological, and physical properties of novel compounds and how their chemical structures can be optimised. Through repeated cycles, improved molecular designs are achieved, providing a range of hit molecules that can be investigated further. Unsupervised learning is the ML approach commonly used for this. To produce these lead compounds, certain predefined criteria have to be met, with different programmes and applications having different designs.

Make – Chemical synthesis is typically the most time-consuming step of the cycle. AI can help to identify the most effective and automatable synthetic routes and optimise them. It can also help with catalyst discovery and, combined with robotics, can be used to test synthesis pathways. Beyond simply providing a faster synthetic approach, it also helps to standardise synthesis, which is not always achieved by synthetic chemists. Some companies are even able to perform purification and analysis as well. Further, synthetic feasibility can be introduced as a required parameter for hit design.

Test – AI can predict other properties of drugs, such as absorption, distribution, metabolism, excretion, and toxicity (ADMET). This provides a way of analysing targets. However, it is no substitute for the physical testing that is essential for drug development. Again, using robotics, screening and testing of samples is possible, but this is in its infancy for drug discovery.
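The iterative nature of the design-make-test cycle can be sketched as a simple loop. Everything in this example is a hypothetical stand-in: molecules are represented by integers, `design` proposes neighbours of the current best candidate, `make` filters out candidates that cannot be "synthesised", and `assay` is a made-up scoring function standing in for a laboratory test. It is a toy under those assumptions, not a description of any real platform.

```python
from typing import List, Tuple


def design(best: int) -> List[int]:
    """Design: propose variations on the current best candidate."""
    return [best - 1, best + 1]


def make(candidates: List[int]) -> List[int]:
    """Make: keep only candidates that can be synthesised (toy rule)."""
    return [c for c in candidates if c >= 0]


def assay(molecule: int) -> int:
    """Test: made-up activity score that peaks at molecule 7."""
    return -(molecule - 7) ** 2


def run_cycle(start: int, max_rounds: int = 20) -> Tuple[int, List[Tuple[int, int]]]:
    """Repeat design-make-test until a round brings no improvement."""
    best, best_score = start, assay(start)
    history = [(best, best_score)]
    for _ in range(max_rounds):
        made = make(design(best))
        if not made:
            break  # nothing could be synthesised this round
        top_score, top = max((assay(m), m) for m in made)
        if top_score <= best_score:
            break  # no candidate beats the current best
        best, best_score = top, top_score
        history.append((best, best_score))
    return best, history


best_molecule, history = run_cycle(start=0)
for molecule, score in history:
    print(f"candidate {molecule}: score {score}")
```

Each pass through the loop corresponds to one full cycle, which is why speeding up any single step, particularly synthesis, shortens the whole process.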


Key players

There is a wide range of organisations from multinationals to startups as well as academia that have been researching the use of AI in molecule discovery, while most of the big pharmaceutical companies are partnering with AI providers or producing their own systems. Here are a few recent examples.

An interesting collaboration has been established between life-science supplier Sartorius, pharmaceutical giant Eli Lilly, and biotech research company Matterworks to enhance the efficiency of analysing raw metabolomic data. Matterworks has developed Pyxis, the first Large Spectral Model, which serves as a significant advancement in data analysis technology. This innovative tool, based on a semantic foundation, allows for the direct interpretation of raw instrument data. Scientists can now generate raw biochemical compositional data from LC-MS (Liquid Chromatography–Mass Spectrometry) data and leverage AI to interpret results up to 500 times faster than with traditional methods. Pyxis compresses weeks of data acquisition and interpretation into real-time insight generation for applications such as biomanufacturing, including the production of therapeutic proteins.

PostEra, an American startup, has successfully introduced Proton, an advanced end-to-end machine learning platform designed to streamline the vital three-step cycle of Design-Make-Test commonly used in medicinal chemistry. Originally stemming from academic research, their innovative machine learning technology integrates chemical data with low-data machine learning techniques to efficiently craft optimised molecules possessing a multitude of essential properties required for drug candidates, while also predicting the synthesis pathway for these molecules. In 2020, PostEra unveiled COVID Moonshot, a groundbreaking open-science initiative aimed at developing a cure for COVID-19. Through a pioneering study, they applied their cutting-edge model to the human muscarinic acetylcholine receptor M1, successfully identifying four experimentally confirmed agonists that exhibit unique chemical structures distinct from all known ligands. The precision and accuracy of their model far surpassed current industry standards.

Insilico, a biotechnology company from Hong Kong, have developed several platforms using AI for drug discovery. They have a two-stage approach to discovering molecules for treating diseases. Firstly, they use a model to identify potential drug targets for specific diseases, using data from trials and publications. They are able to identify the targets they could exploit in certain conditions, and even provide a likelihood score for success. Once a specific protein target has been identified, they use their molecule discovery platform, Chemistry42. It screens millions of molecules to find the best candidate for binding the specific protein and, ultimately, disease prevention. Insilico even have an AI tool to assist clinical trial design and implementation. The company are currently working on a new drug for pulmonary fibrosis, designed using their two-stage approach. Firstly, a new anti-pulmonary fibrosis target protein was identified, and an appropriate molecule was devised using their Chemistry42 programme with a structure-based drug design workflow. The initial molecule was optimised further to improve solubility and ensure it had a good safety profile. It took only 18 months from project start to their successful pre-clinical trials. Their lead drug molecule is now undergoing phase II clinical trials.

Academic groups, institutes and networks are also studying AI in chemistry. Individual academics are running multidisciplinary research teams composed of experts from different disciplines, including computing, chemistry and robotics. For example, Prof Alán Aspuru-Guzik runs the Matter Lab at the University of Toronto, which works at the interface of theoretical chemistry with physics, computer science, and applied mathematics. Entire institutes are being set up that specialise in the use of AI in chemistry, like the Molecule Maker Lab Institute at the University of Illinois. Finally, the emergence of collaborative research networks is fostering the exchange of ideas and research collaborations. One example is the European Laboratory for Learning and Intelligent Systems (ELLIS), a network of European researchers working on AI for chemistry as well as other applications.



Has the work on AI been successful?

The benefit of computational methods lies in their potential to identify lead compounds much faster than a human chemist, accelerating molecule discovery and offering cost savings.

However, the application of AI to molecule discovery is still very new, and it is not established that the AI tools currently available lead to better outcomes. Datasets for training AI are incomplete. Training an AI neural network requires a dataset containing molecular composition and chemical property data from at least 100,000 compounds. As well as being large, datasets need to be reliable and free of bias. Often more data are needed, both experimental and simulated, as well as historical data and data from failed experiments. Data may be available for a small number of molecules, but the number of possible chemical features that might determine binding will be much larger than the available dataset. For predicting synthesis reactions, an AI system is trained on the specific chemical structures that different reactions work with, but data in publications about new reactions are usually not comprehensive. Ideally, AI developers should have access to open data, but a lot of data on molecule discovery will be proprietary, e.g., held by big pharma companies.
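A back-of-envelope calculation shows why having many more candidate features than compounds is a problem. The sketch below makes strongly idealised assumptions that are ours, not the article's: every feature is binary, independent, and equally likely to be on or off, and each compound is simply labelled "binds" or "does not bind". Under those assumptions, any one feature matches the labels of n compounds purely by chance with probability 1/2^n, so the expected number of chance matches among p features is p/2^n. Real chemical features are correlated, so treat the numbers as indicative only.

```python
from itertools import product


def expected_chance_matches(n_compounds: int, n_features: int) -> float:
    """Expected number of random binary features that match all labels by chance."""
    return n_features * 0.5 ** n_compounds


def count_matching_columns(labels: tuple) -> int:
    """Brute-force check: of all possible binary columns, how many equal the labels?"""
    return sum(1 for column in product((0, 1), repeat=len(labels))
               if column == labels)


# Sanity check on a tiny case: exactly 1 of the 2**4 = 16 columns matches.
tiny_labels = (1, 0, 0, 1)
matching = count_matching_columns(tiny_labels)

# Hypothetical sizes chosen only to illustrate the effect.
small_dataset = expected_chance_matches(n_compounds=10, n_features=100_000)
larger_dataset = expected_chance_matches(n_compounds=30, n_features=100_000)

print(f"10 compounds: about {small_dataset:.1f} features fit by chance")
print(f"30 compounds: about {larger_dataset:.5f} features fit by chance")
```

With few compounds, many features appear to explain binding perfectly without meaning anything, which is one reason larger and more varied datasets matter.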

AI tools for chemical synthesis need to factor in manufacturability and safety. AI can generate lead compounds that theoretically have the desired functionality, but the compounds may be too complicated to be manufactured cheaply. Data on chemical manufacturability need to be incorporated into AI models – these data often come from human chemists. AI systems cannot yet reliably predict those reactions which should be avoided, for example, strongly exothermic or explosive reactions – human chemists would know of these dangers.

Robotic systems for synthesising chemicals have the promise of speed. One study reports that its robot could perform 1000 experiments in 10 days after an initial setup of half a day by a human chemist (making the system 1000× faster than a human chemist carrying out the experiments manually). Robotic systems are still seen as an enabling technology, not a replacement for the scientist. Current systems may not generate their own hypotheses and are not always successful, e.g., the system developed by Google DeepMind and A-Lab failed to make 17 of 58 target materials because of experimental difficulties.

Although lead drug compounds discovered using AI are now entering clinical trials, their success and overall time and cost savings are not yet known. Indeed, the work Insilico is doing does seem promising, but it is unclear whether researchers could have proposed the same molecules unaided, and whether AI truly had a positive impact.



Academics and companies across many industries are interested in the use of AI in molecule design, reflecting the importance of pharmaceutical drugs, actives, ingredients and chemicals in many products. AI tools are being developed to address all three stages of the design-make-test cycle of molecule discovery, but are still being refined to make them more effective. Underlying reasons for the need to improve AI tools include a lack of comprehensive datasets for AI training, as well as the challenge of translating 3D chemical structures into data formats compatible with machine learning algorithms. The synthesis of new molecules remains a bottleneck in molecule discovery, although robotic systems offer the potential to accelerate it.

AI cannot replace medicinal and synthetic chemists’ intuition, nuance, and originality, but chemists who are AI-enabled will ultimately outperform those who aren’t.

Strategic Allies Ltd is dedicated to collaborating with you to explore a variety of solutions for your business. Our approach involves a deep dive into the core of an issue to gain a comprehensive understanding of the complexities of a specific technology or market, enabling us to offer you a wide array of technology and solution options. If you are interested in exploring how we can support you, please reach out to John Allies at john@strategicallies.co.uk for an initial conversation.