Creating Datasets and Data Preparation

A Dataset in Athenic AI is similar to a SQL view: a real-time, filtered view of your data. Each dataset becomes part of the Knowledge Graph, the visual semantic layer that helps Athenic understand your data and answer natural language questions.

For Enterprise Users: take advantage of 3 hours of complimentary consultation to optimize your datasets and get the best possible experience with Athenic AI.
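To make the view analogy concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a connected database. The orders table and west_orders view are hypothetical. The sketch shows generic SQL view behavior, not how Athenic stores datasets internally. A view holds a query rather than a copy of the rows, so rows added to the base table appear in it immediately.

```python
import sqlite3

# In-memory SQLite database standing in for a connected data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "West", 120.0), (2, "East", 80.0)],
)

# A view stores the filtering query, not a copy of the matching rows.
conn.execute(
    "CREATE VIEW west_orders AS "
    "SELECT id, amount FROM orders WHERE region = 'West'"
)
before = conn.execute("SELECT COUNT(*) FROM west_orders").fetchone()[0]

# A new row in the base table is visible through the view right away.
conn.execute("INSERT INTO orders VALUES (3, 'West', 50.0)")
after = conn.execute("SELECT COUNT(*) FROM west_orders").fetchone()[0]

print(before, after)  # 1 2
```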

Datasets

There are two types of datasets:

Basic Datasets

A Basic Dataset is a single table with the columns you choose. It does not allow for data cleaning, schema changes, or complex setup. This type is best for quickly exploring raw data as-is.

Advanced Datasets

An Advanced Dataset requires more detailed preparation. You write SQL to clean data, adjust the schema, pre-aggregate values, or join tables. Use this type when you need to combine multiple tables, create custom metrics, apply filters, or build refined, human-readable views.

Prepare Your Data and Understand Datasets

Preparing your data carefully helps build a solid foundation for your AI Analyst and improves the accuracy of insights.

Prepare Your Data

  • Remove any irrelevant or sensitive columns

  • Standardize formats where needed (for example, date formats or categories)

  • Check that relationships between tables are clear and logical (for example, that the key columns used to join tables match)
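A Basic Dataset does not allow data cleaning, so steps like these happen either in your source database or in the SQL of an Advanced Dataset. The sketch below assumes a hypothetical raw_customers table that has a sensitive email column, inconsistent plan names, and dates in two formats. Non-ISO dates are assumed to be MM/DD/YYYY. It uses SQLite as a stand-in, and functions such as substr and the || operator may differ in your warehouse's SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical raw table: email is sensitive; plan names and dates are inconsistent.
conn.execute(
    "CREATE TABLE raw_customers (id INTEGER, email TEXT, plan TEXT, signup_date TEXT)"
)
conn.executemany(
    "INSERT INTO raw_customers VALUES (?, ?, ?, ?)",
    [
        (1, "a@example.com", "Pro", "2024-03-05"),
        (2, "b@example.com", " pro ", "03/07/2024"),
        (3, "c@example.com", "FREE", "2024-04-01"),
    ],
)

PREPARED_SQL = """
SELECT
    id,                                    -- email is deliberately left out
    LOWER(TRIM(plan)) AS plan,             -- 'Pro' and ' pro ' both become 'pro'
    CASE
        WHEN signup_date LIKE '__/__/____' -- assumed MM/DD/YYYY, rewritten as YYYY-MM-DD
        THEN substr(signup_date, 7, 4) || '-' || substr(signup_date, 1, 2)
             || '-' || substr(signup_date, 4, 2)
        ELSE signup_date                   -- already ISO formatted
    END AS signup_date
FROM raw_customers
ORDER BY id
"""

rows = conn.execute(PREPARED_SQL).fetchall()
for row in rows:
    print(row)
```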

Create Your Datasets

1. Basic Datasets (No SQL Required)

A Basic Dataset is created by selecting a single table and picking the columns you want.

  • Choose your source table

  • Select the relevant columns

This method is ideal if you want to quickly share raw data without needing SQL or data cleaning.

[Screenshot: Add new basic dataset]
[Screenshot: Select new basic dataset]
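Conceptually, a Basic Dataset resembles a plain single-table SELECT over the columns you pick. The helper basic_dataset_sql below is hypothetical and not part of Athenic. It simply writes out that query so you can see what choosing a table and selecting columns amounts to, with SQLite as a stand-in source. Leaving supplier_notes out is equivalent to not selecting it in the interface.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def basic_dataset_sql(table: str, columns: list[str]) -> str:
    """Hypothetical helper: the single-table query a Basic Dataset resembles."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return f"SELECT {cols} FROM {quote_ident(table)}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT, name TEXT, cost REAL, supplier_notes TEXT)")
conn.execute("INSERT INTO products VALUES ('A1', 'Widget', 2.5, 'internal only')")

# Choose the source table, then select only the relevant columns.
sql = basic_dataset_sql("products", ["sku", "name", "cost"])
rows = conn.execute(sql).fetchall()

print(sql)   # SELECT "sku", "name", "cost" FROM "products"
print(rows)  # [('A1', 'Widget', 2.5)]
```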

2. Advanced Datasets (SQL Required)

An Advanced Dataset lets you use SQL to:

  • Define custom metrics, aggregations, or filters

  • Join multiple tables together with business rules

  • Clean and reshape data into polished views
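As an illustration of custom metrics, aggregations, and filters, the query below pre-aggregates a hypothetical orders table into monthly figures. It treats net revenue (amount minus refund) as a custom metric and excludes cancelled orders as a business rule. The query runs against SQLite through Python's sqlite3 module. The table, columns, and rule are assumptions, and date functions vary between SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical orders table with a status column and partial refunds.
conn.execute(
    "CREATE TABLE orders (id INTEGER, order_date TEXT, status TEXT, amount REAL, refund REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2024-01-15", "completed", 100.0, 0.0),
        (2, "2024-01-20", "completed", 50.0, 10.0),
        (3, "2024-01-22", "cancelled", 70.0, 0.0),
        (4, "2024-02-03", "completed", 200.0, 0.0),
    ],
)

MONTHLY_METRICS_SQL = """
SELECT
    substr(order_date, 1, 7)        AS month,               -- 'YYYY-MM'
    COUNT(*)                        AS order_count,
    SUM(amount - refund)            AS net_revenue,         -- custom metric
    ROUND(AVG(amount - refund), 2)  AS avg_net_order_value
FROM orders
WHERE status <> 'cancelled'                                 -- business rule
GROUP BY month
ORDER BY month
"""

rows = conn.execute(MONTHLY_METRICS_SQL).fetchall()
for row in rows:
    print(row)
```

Pre-aggregating like this gives the dataset one row per month, so questions about monthly revenue do not need to recompute the metric from raw orders each time.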

Athenic reads your schema automatically, but for advanced datasets you can also manually adjust:

  • Data types (e.g., number, text, date)

  • Field names (rename technical names to user-friendly labels)

  • Column descriptions

This approach is best suited for users comfortable with SQL who want optimized datasets for Athenic AI.
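Athenic lets you adjust data types, field names, and descriptions after it reads your schema. If you prefer to handle types and names in the query itself, CAST and AS aliases achieve a similar result in standard SQL. Descriptions have no plain-SQL equivalent and are still set through Athenic. The sketch assumes a hypothetical text-only export table, tbl_inv_01, and uses SQLite as a stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical export in which every column arrived as text with a technical name.
conn.execute("CREATE TABLE tbl_inv_01 (inv_no TEXT, amt_usd TEXT, is_pd TEXT)")
conn.execute("INSERT INTO tbl_inv_01 VALUES ('1001', '12.50', '1')")

TYPED_SQL = """
SELECT
    CAST(inv_no  AS INTEGER) AS invoice_number,  -- text to number, friendly name
    CAST(amt_usd AS REAL)    AS amount_usd,
    CAST(is_pd   AS INTEGER) AS is_paid          -- assumed: 1 = paid, 0 = unpaid
FROM tbl_inv_01
"""

cur = conn.execute(TYPED_SQL)
column_names = [d[0] for d in cur.description]  # aliases become the column names
row = cur.fetchone()

print(column_names)  # ['invoice_number', 'amount_usd', 'is_paid']
print(row)           # (1001, 12.5, 1)
```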

[Screenshot: Add new advanced dataset]

Common SQL Techniques

  • Join: Combine columns from multiple tables (pre-joining can improve consistency and speed)

  • Filter: Remove unnecessary rows or columns

  • Rename: Make column names clearer and business-friendly

  • Standardize: Normalize values (e.g., “California” → “CA”)
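The four techniques above often appear together in a single Advanced Dataset query. This sketch joins hypothetical ord and cust tables, filters out test accounts, renames technical columns, and standardizes state values with a CASE expression, mirroring the “California” → “CA” example. It runs on SQLite via Python's sqlite3 module, and all table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source tables with technical column names.
conn.execute("CREATE TABLE cust (cust_id INTEGER, cust_nm TEXT, st TEXT, is_test INTEGER)")
conn.execute("CREATE TABLE ord (ord_id INTEGER, cust_id INTEGER, amt REAL)")
conn.executemany(
    "INSERT INTO cust VALUES (?, ?, ?, ?)",
    [(1, "Acme", "California", 0), (2, "Globex", "CA", 0), (3, "Internal QA", "NY", 1)],
)
conn.executemany(
    "INSERT INTO ord VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 2, 100.0), (12, 3, 999.0)],
)

CUSTOMER_ORDERS_SQL = """
SELECT
    o.ord_id  AS order_id,                          -- Rename
    c.cust_nm AS customer_name,
    CASE                                            -- Standardize
        WHEN c.st IN ('California', 'CA') THEN 'CA'
        WHEN c.st IN ('New York', 'NY')   THEN 'NY'
        ELSE c.st
    END       AS state,
    o.amt     AS order_amount
FROM ord AS o
JOIN cust AS c ON c.cust_id = o.cust_id             -- Join
WHERE c.is_test = 0                                 -- Filter
ORDER BY o.ord_id
"""

rows = conn.execute(CUSTOMER_ORDERS_SQL).fetchall()
for row in rows:
    print(row)
```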

Run and Save Your SQL

  1. Click Run SQL to see a preview

  2. Review the results

  3. Name your dataset (top-left corner)

  4. Click Save

Your saved dataset will appear under Datasets and can be used in your project, added to the Knowledge Graph, and queried using natural language.
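Step 2, reviewing the results, can go beyond a visual check of the preview. One possible approach, sketched below, wraps the dataset query as a subquery and counts total rows, missing values, and duplicate keys before you save. The orders table and the choice of checks are assumptions, and SQLite stands in for your data source. Whether Athenic's SQL editor accepts these exact queries depends on the underlying database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data with one missing name and one duplicated order ID.
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_name TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Acme", 250.0), (2, None, 100.0), (2, "Globex", 75.0)],
)

# The dataset query under review (a constant, so f-string interpolation is safe here).
DATASET_SQL = "SELECT order_id, customer_name, amount FROM orders"

# 1. Preview a handful of rows.
preview = conn.execute(f"SELECT * FROM ({DATASET_SQL}) LIMIT 5").fetchall()

# 2. Sanity checks: total rows, missing names, and duplicated order IDs.
checks = conn.execute(f"""
    SELECT
        COUNT(*)                            AS total_rows,
        SUM(customer_name IS NULL)          AS missing_names,
        COUNT(*) - COUNT(DISTINCT order_id) AS duplicate_ids
    FROM ({DATASET_SQL})
""").fetchone()

print(preview)
print(checks)  # (3, 1, 1)
```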

[Screenshot: SQL Commands]

Tips for Effective Data Preparation

  • Include only the tables and fields you need in your datasets to keep performance high

  • Use clear, consistent naming and descriptions for easier use by your team

  • Plan your datasets and relationships carefully before creating them


Next Steps

After setting up your data, proceed to:


Managing Datasets

  • You can edit, rename, or delete datasets anytime

  • Changes automatically update the Knowledge Graph and affect future queries


Best Practices

  • Use Basic Datasets to expose raw or simple data views

  • Use Advanced Datasets to apply business logic and create polished views

  • Add clear descriptions and definitions to improve question accuracy

  • Use joins to connect related datasets for deeper insights
