Unearthing experimental data buried in scientific papers



TSUKUBA, Japan, Jan 8, 2026 - (ACN Newswire) - Technologies that underpin modern society, such as smartphones and automobiles, rely on a diverse range of functional materials. Materials scientists are therefore working to develop and improve new materials, but predicting material properties is no simple task. Data science is key to transforming this field, and new tools powered by artificial intelligence are expected to accelerate the exploration, collection, and management of materials property data worldwide.

Researchers and artificial intelligence work together to collect experimental materials science data from papers worldwide and build a database. (Copyright: Kenji Tashiro. Instagram: ripplemarkmaker. CC-BY-4.0)

The relationship between functional materials and their properties is complex. Even slight differences in composition or synthesis methods can affect electronic states and microstructures, often resulting in entirely different properties. For this reason, theoretical models alone cannot provide reliable predictions, and the intuition of researchers and engineers built on years of experience has played a significant role.

Machine learning is a technology that can learn empirical trends rather than relying on theory. By applying machine learning to experimental data in materials science, it may be possible to replicate such intuition computationally. Large language models (LLMs), such as ChatGPT, now support the daily lives of many people and are capable of flexible information extraction that takes background knowledge and context into account. This opens up the possibility of automating the process of converting complex information sources like scientific papers into structured data. If large-scale experimental datasets can be built through this approach, researchers are expected to gain inspiration from a bird's-eye view of the data and to use machine learning to predict properties based on empirical trends.

A team led by Dr. Yukari Katsura, a Senior Researcher at the National Institute for Materials Science (NIMS), has focused on this potential and developed two new tools to accelerate the construction of Starrydata, a materials property database built from data collected from scientific papers. This work was recently published in the journal Science and Technology of Advanced Materials: Methods.

"Graphs in the millions of papers published to date contain valuable experimental data collected by past researchers, and much of it remains untapped," says Prof. Katsura. In the Starrydata project, which she launched in 2015, data collection from papers was performed manually and supported by the independently developed Starrydata2 web system, successfully amassing an unprecedented volume of experimental data. The new tools are designed to further streamline this data collection process. "We found that by specifying a data structure and giving instructions to an LLM, we can accurately and comprehensively extract information about figures, tables, and samples from the text of paper PDFs across a wide range of fields."

Prof. Katsura added, "Many publishers prohibit the use of artificial intelligence on paper PDFs, so we are currently developing the system to target open-access papers."

The first tool, Starrydata Auto-Suggestion for Sample Information, is a function that reads the text of a paper and suggests candidate entries for data fields pre-designed for each materials domain; it is already integrated into the Starrydata2 web system. When a user pastes text from a paper's abstract or experimental methods section, it is sent to OpenAI's GPT via API, and candidate entries in English are automatically displayed below each input field.
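The suggestion flow described above can be sketched roughly as follows: build a prompt from the fields defined for a materials domain, send it to a model, and map the reply back onto those fields. The field names, the prompt wording, and the `fake_llm` stand-in are all assumptions made for illustration. The source does not describe the actual Starrydata2 schema or prompt, and a real deployment would call OpenAI's API where the stub is used here.

```python
import json
from typing import Callable, Optional

# Hypothetical fields for a sample-information form (illustrative only).
SAMPLE_FIELDS = ["composition", "synthesis_method", "sample_form"]


def build_prompt(paper_text: str, fields: list[str]) -> str:
    """Ask the model for one JSON object with exactly the requested keys."""
    field_list = ", ".join(fields)
    return (
        "Extract sample information from the text below. "
        f"Reply with a JSON object whose keys are: {field_list}. "
        "Use null for anything the text does not state.\n\n"
        f"TEXT:\n{paper_text}"
    )


def suggest_entries(
    paper_text: str, fields: list[str], llm: Callable[[str], str]
) -> dict[str, Optional[str]]:
    """Return one candidate entry per field; missing fields become None."""
    reply = llm(build_prompt(paper_text, fields))
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    # Keep only the fields the form defines; ignore anything extra.
    return {name: parsed.get(name) for name in fields}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real API call; returns a canned reply."""
    return json.dumps(
        {
            "composition": "Bi2Te3",
            "synthesis_method": "spark plasma sintering",
            "unrelated": "ignored",
        }
    )


suggestions = suggest_entries(
    "Bi2Te3 pellets were prepared by spark plasma sintering.",
    SAMPLE_FIELDS,
    fake_llm,
)
```

Restricting the result to the predefined fields means an unexpected or malformed reply cannot add stray entries to the form; a field the model does not fill simply stays empty for the user to complete.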

The second tool, Starrydata Auto-Summary GPT, deconstructs an entire open-access paper PDF uploaded by the user and automatically summarizes all descriptions of figures, tables, and samples appearing in the paper as structured data in JSON format. The JSON output is generated using ChatGPT's custom GPT feature, and the resulting data can be viewed as an easy-to-read table in a web browser. Although this data is not currently incorporated directly into the Starrydata database, it dramatically speeds up the work of data collectors by helping them locate target data and enter information quickly. Note that reading data points from graph images is difficult for LLMs, so this task is performed by data collectors using an independently developed semi-automated tool.
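To make the idea of a structured summary concrete, the sketch below flattens a small JSON document listing figures, tables, and samples into rows and renders them as a plain-text table. The JSON shape, the key names, and the example entries are hypothetical; the source does not give the schema that Starrydata Auto-Summary GPT actually produces.

```python
import json

# Hypothetical summary for one paper; keys and values are illustrative only.
summary_json = """
{
  "figures": [
    {"id": "Fig. 1", "description": "Seebeck coefficient vs temperature",
     "samples": ["S1", "S2"]}
  ],
  "tables": [
    {"id": "Table 1", "description": "Nominal compositions", "samples": ["S1"]}
  ],
  "samples": [
    {"id": "S1", "description": "x = 0.1"},
    {"id": "S2", "description": "x = 0.2"}
  ]
}
"""


def to_rows(summary: dict) -> list[tuple[str, str, str, str]]:
    """Flatten the summary into (kind, id, description, linked samples) rows."""
    rows = []
    for kind in ("figures", "tables", "samples"):
        for item in summary.get(kind, []):
            linked = ", ".join(item.get("samples", []))
            # kind[:-1] turns "figures" into "figure", and so on.
            rows.append((kind[:-1], item["id"], item["description"], linked))
    return rows


def render_table(rows: list[tuple[str, str, str, str]]) -> str:
    """Render rows as a column-aligned text table with a header line."""
    header = ("kind", "id", "description", "samples")
    all_rows = [header, *rows]
    widths = [max(len(row[i]) for row in all_rows) for i in range(4)]
    lines = [
        " | ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in all_rows
    ]
    return "\n".join(lines)


rows = to_rows(json.loads(summary_json))
print(render_table(rows))
```

Linking each figure and table to sample identifiers is one way a data collector could jump from a sample of interest to the graphs that report it.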

"A paper is a logical structure assembled to convey the author's claims, but by deconstructing it and returning it to the form of experimental data, other researchers can also use it for their own research," says Dr. Katsura. "In this way, we are aiming for a future where experimental data from all materials science fields can be shared in digital format and viewed from a bird's-eye perspective."

At present, Starrydata has built databases only for certain materials science fields, such as magnets and thermoelectric materials, which convert between heat and electricity. However, as an open dataset that can be used for new materials development, it is beginning to be used, primarily by leading researchers around the world. The team is advancing its research with the aim of raising broader awareness of the potential of such large-scale experimental data and establishing paper data collection as a recognized form of research within the scientific community.

Further information
Yukari Katsura
Senior Researcher, National Institute for Materials Science (NIMS)
KATSURA.Yukari@nims.go.jp
(Yukari Katsura is also an associate professor at University of Tsukuba and guest researcher at RIKEN)

Paper: https://doi.org/10.1080/27660400.2025.2590811 

About Science and Technology of Advanced Materials: Methods (STAM-M)

STAM Methods is an open access sister journal of Science and Technology of Advanced Materials (STAM), and focuses on emergent methods and tools for improving and/or accelerating materials development, such as methodology, apparatus, instrumentation, modeling, high-throughput data collection, materials/process informatics, databases, and programming. https://www.tandfonline.com/STAM-M

Dr Kazuya Saito
STAM Methods Publishing Director
SAITO.Kazuya@nims.go.jp

Press release distributed by Asia Research News for Science and Technology of Advanced Materials.


Source: Science and Technology of Advanced Materials

Copyright 2026 ACN Newswire. All rights reserved.
