ETFOptimize | High-performance ETF-based Investment Strategies

Quantitative strategies, Wall Street-caliber research, and insightful market analysis since 1998.



Gretel Releases World’s Largest Open Source Text-to-SQL Dataset to Accelerate AI Model Training

SAN FRANCISCO, April 04, 2024 (GLOBE NEWSWIRE) -- Gretel, the leader in synthetic data, today released the world’s largest open source Text-to-SQL dataset to unlock new possibilities for AI in the enterprise. Available on Hugging Face and released under the Apache 2.0 license, Gretel’s dataset consists of over 100,000 high-quality synthetic Text-to-SQL samples with SQL metadata and spans 100 verticals. With access to Gretel’s open-source, high-quality synthetic dataset, developers can train AI models that empower business users to extract value from critical enterprise data sources, expediting AI initiatives across the enterprise.

"Access to quality training data is one of the biggest obstacles to building with generative AI. Everything Gretel does is designed to address this issue head-on, and contributing to the open-source community is no exception," said Alex Watson, co-founder & Chief Product Officer at Gretel. "By providing developers with high-quality, synthetic Text-to-SQL data, we're enabling them to create AI models that can understand natural language queries and generate SQL queries. This empowers users across the organization to easily access and derive insights from complex databases, data warehouses, and data lakes, without needing to learn SQL or rely on technical teams. We're excited for developers to take our dataset for a spin and build upon it."

Growing demand for AI training data
The largest AI companies in the world are struggling with access to high-quality training data. And in the enterprise, Text-to-SQL data — data that’s essential for building natural language interfaces to critical data sources — is in particularly high demand. Nearly every enterprise has invaluable insights buried in data tables or data views that are only accessible to developers skilled in Structured Query Language (SQL) — the standard language for interacting with databases, data warehouses and data lakes. AI models trained on Text-to-SQL data allow business users to derive value from these datasets on demand.

Most Text-to-SQL datasets today are manually curated and annotated, which limits their size, applicability and utility. This process is expensive, labor-intensive, and cumbersome. For instance, the Spider Text-to-SQL dataset, consisting of 7k samples, was annotated by 11 college students at Yale and took a total of 1,000 hours to complete, an incredible amount of effort for a relatively small dataset in the context of large language models.

Furthermore, the vast majority of existing Text-to-SQL datasets lack a natural language explanation of what their SQL code does. Gretel’s dataset includes an explanation field providing a plain-English description of the SQL code, which helps end users quickly understand the output and realize its value.
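To make the role of the explanation field concrete, here is a minimal sketch of how a single sample might be turned into a training pair. The field names (question, schema, sql, explanation) and the prompt layout are assumptions chosen for illustration; they are not taken from the published dataset, so check the dataset card on Hugging Face for the actual column names.

```python
from dataclasses import dataclass


@dataclass
class TextToSqlSample:
    # Hypothetical field names, used only for illustration.
    question: str      # natural language request
    schema: str        # table definitions the query runs against
    sql: str           # target SQL query
    explanation: str   # plain-English description of what the SQL does


def to_training_pair(sample: TextToSqlSample) -> tuple[str, str]:
    """Build a (prompt, completion) pair for supervised fine-tuning."""
    prompt = (
        f"-- Schema:\n{sample.schema}\n"
        f"-- Question: {sample.question}\n"
        "-- Write the SQL query, then explain it."
    )
    # The completion carries both the query and its explanation.
    completion = f"{sample.sql}\n-- Explanation: {sample.explanation}"
    return prompt, completion


# Invented example record, not drawn from the dataset.
sample = TextToSqlSample(
    question="How many permits were issued in 2020?",
    schema="CREATE TABLE permits (id INTEGER, issued_year INTEGER);",
    sql="SELECT COUNT(*) FROM permits WHERE issued_year = 2020;",
    explanation="Counts the rows in permits whose issued_year is 2020.",
)
prompt, completion = to_training_pair(sample)
print(prompt)
print(completion)
```

Keeping the explanation alongside the query in the completion is one possible design; a model trained this way would return a description that a non-technical user can read next to the generated SQL.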

Filling a gap in the open source community
To date, the open source community has offered little reprieve. The Spider dataset, for instance, is available under a commercially permissive Creative Commons license (CC BY-SA 4.0), but it is a copyleft license, meaning a derivative work must be licensed under the same or a compatible license. This differs significantly from the MIT and Apache licenses, which allow derivative works to be released under different license terms, with no share-alike requirement.

With Gretel’s Text-to-SQL dataset released under the Apache 2.0 license, AI developers can build conversational applications that open up a range of new opportunities for businesses across industries. For instance:

  • Finance: Analysts and managers can ask questions about the company's financial performance and get instant answers sourced from their databases. Example query: "What was the total revenue generated from credit card transactions in the last quarter, broken down by product category?"
  • Health: Providers can streamline the process of querying and analyzing clinical trial data from multiple (2-10k) experiments. Example query: "Find the average reduction in blood pressure for patients aged 45-60 who received the new drug, compared to the placebo group, over the last 6 months of the trial."
  • Government: Leaders can provide citizens with an easy way to search for and access public records databases that may include licenses, property ownership, permits, etc. Example query: "Find the top 10 counties with the highest population growth rate between 2010 and 2020."
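As an illustration of what such an application does with a request like the Government example above, the sketch below pairs that question with a SQL query and runs it against an in-memory SQLite database. The table name, columns, and population figures are invented for this example, and the SQL is hand-written to show the kind of output a Text-to-SQL model would be expected to produce; it is not output from any Gretel model.

```python
import sqlite3

# Hypothetical schema and figures, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE county_population (county TEXT, year INTEGER, population INTEGER)"
)
conn.executemany(
    "INSERT INTO county_population VALUES (?, ?, ?)",
    [
        ("Adams", 2010, 1000), ("Adams", 2020, 1500),
        ("Baker", 2010, 2000), ("Baker", 2020, 2200),
        ("Clark", 2010, 500), ("Clark", 2020, 1000),
    ],
)

question = (
    "Find the top 10 counties with the highest population growth rate "
    "between 2010 and 2020."
)

# Hand-written SQL standing in for a model's answer to the question.
sql = """
SELECT a.county,
       (b.population - a.population) * 1.0 / a.population AS growth_rate
FROM county_population AS a
JOIN county_population AS b ON a.county = b.county
WHERE a.year = 2010 AND b.year = 2020
ORDER BY growth_rate DESC
LIMIT 10;
"""

rows = conn.execute(sql).fetchall()
for county, rate in rows:
    print(f"{county}: {rate:.0%}")
```

Executing the generated query against a small test database, as done here, is also a simple way to check that a model's SQL is at least syntactically valid before it is shown to a user.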

Elevating data quality in the age of AI
Gretel’s Text-to-SQL dataset was generated by Gretel Navigator, a compound AI system that combines agent-based execution, multiple proprietary models (including a custom tabular Large Language Model, TabLLM), and privacy-enhancing technologies to generate high-quality synthetic data for enterprise AI teams.

The quality of Gretel’s dataset was assessed and compared to other Text-to-SQL datasets using an independent service and the LLM-as-a-judge technique. A full synopsis of Gretel’s holistic quality evaluation is available on the blog. A deeper dive into the data, including data previews, examples, and statistics, can be found on Hugging Face. To learn more about Gretel’s platform and capabilities, visit gretel.ai.

About Gretel
Gretel leverages advanced generative AI models and privacy-enhancing technologies to turn data into a safe, renewable resource that anyone can use. These services are available on its platform as low-code tools and APIs that deliver synthetic versions of data in multiple modalities (text, tabular, time series, image). Gretel’s platform and SaaS tools are self-serve, flexible, and scalable to accommodate any developer workflow. Learn more at https://gretel.ai.

Media Contact
LaunchSquad for Gretel
gretel@launchsquad.com


