# Creating Datasets and Data Preparation

A Dataset in Athenic AI is similar to a SQL view which is like a real-time filtered snapshot of data. Each dataset becomes part of the Knowledge Graph, the visual semantic layer that helps Athenic understand your data and respond to natural language questions.\
\
\&#xNAN;***For Enterprise Users:** Take advantage of 3 hours of complimentary consultation to optimize your datasets, ensuring an optimal experience with Athenic AI.*

### Prepare Your Data and Understand Datasets

Preparing your data carefully helps build a solid foundation for your AI Analyst and improves the accuracy of insights.

#### Prepare Your Data

* Remove any irrelevant or sensitive columns
* Standardize formats where needed (for example, date formats or categories)
* Check that relationships between tables are clear and logical

### Create Your Datasets

An **Advanced Dataset** lets you use SQL to:

* Define custom metrics, aggregations, or filters
* Join multiple tables together with business rules
* Clean and reshape data into polished views

Athenic reads your schema automatically, but for advanced datasets you can also manually adjust:

* Data types (e.g., number, text, date)
* Field names (rename technical names to user-friendly labels)
* Column descriptions

This approach is best suited for users comfortable with SQL who want optimized datasets for Athenic AI.

<figure><img src="/files/YoWVOFrQ9qXLM7Ui4pP7" alt=""><figcaption><p>Add new advanced dataset</p></figcaption></figure>

#### Common SQL Techniques

* **Join:** Combine columns from multiple tables (pre-joining can improve consistency and speed)
* **Filter:** Remove unnecessary rows or columns
* **Rename:** Make column names clearer and business-friendly
* **Standardize:** Normalize values (e.g., “California” → “CA”)

#### Run and Save Your SQL

1. Click **Run SQL** to see a preview
2. Review the results
3. Name your dataset (top-left corner)
4. Click **Save**

Your saved dataset will appear under **Datasets** and can be used in your project, added to the Knowledge Graph, and queried using natural language.

<figure><img src="/files/LKDPJwOJyxaY9fjFYwuG" alt=""><figcaption><p>SQL Commands</p></figcaption></figure>

### Requirements

* We highly recommend using a star schema or similar setup for your datasets. If you are not familiar with the concept, here is a great [introduction](https://www.geeksforgeeks.org/dbms/star-schema-in-data-warehouse-modeling/).

### Tips for Effective Data Preparation

* Use **Datasets** to apply business logic and create polished views
* Select only the tables and fields you need to include in your datasets to keep performance high
* Use clear, consistent naming  for easier use by your team
* Plan your datasets and relationships carefully before creating them
* Use joins to prepare your data into the grain you want for your governed metrics.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.athenic.com/getting-started/creating-datasets-and-data-preparation.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
