# Creating Datasets and Data Preparation

A Dataset in Athenic AI is similar to a SQL view which is like a real-time filtered snapshot of data. Each dataset becomes part of the Knowledge Graph, the visual semantic layer that helps Athenic understand your data and respond to natural language questions.\
\
\&#xNAN;***For Enterprise Users:** Take advantage of 3 hours of complimentary consultation to optimize your datasets, ensuring an optimal experience with Athenic AI.*

### Prepare Your Data and Understand Datasets

Preparing your data carefully helps build a solid foundation for your AI Analyst and improves the accuracy of insights.

#### Prepare Your Data

* Remove any irrelevant or sensitive columns
* Standardize formats where needed (for example, date formats or categories)
* Check that relationships between tables are clear and logical

### Create Your Datasets

An **Advanced Dataset** lets you use SQL to:

* Define custom metrics, aggregations, or filters
* Join multiple tables together with business rules
* Clean and reshape data into polished views

Athenic reads your schema automatically, but for advanced datasets you can also manually adjust:

* Data types (e.g., number, text, date)
* Field names (rename technical names to user-friendly labels)
* Column descriptions

This approach is best suited for users comfortable with SQL who want optimized datasets for Athenic AI.

<figure><img src="https://2503620627-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FVqTQOKxSAPs7Xr7cb4Ms%2Fuploads%2FeI9hq0CHGma9rgssAuMU%2FAdd%20Advanced%20Dataset.png?alt=media&#x26;token=5e40c3aa-a107-40c0-ae94-5a96bbbc8be9" alt=""><figcaption><p>Add new advanced dataset</p></figcaption></figure>

#### Common SQL Techniques

* **Join:** Combine columns from multiple tables (pre-joining can improve consistency and speed)
* **Filter:** Remove unnecessary rows or columns
* **Rename:** Make column names clearer and business-friendly
* **Standardize:** Normalize values (e.g., “California” → “CA”)

#### Run and Save Your SQL

1. Click **Run SQL** to see a preview
2. Review the results
3. Name your dataset (top-left corner)
4. Click **Save**

Your saved dataset will appear under **Datasets** and can be used in your project, added to the Knowledge Graph, and queried using natural language.

<figure><img src="https://2503620627-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FVqTQOKxSAPs7Xr7cb4Ms%2Fuploads%2Fhfzk5gAdpzr1g7KXZ8AO%2FScreenshot%202024-12-12%20at%2010.30.24%E2%80%AFAM.png?alt=media&#x26;token=342a0686-ba9f-4bcc-9736-533eb30ee8f8" alt=""><figcaption><p>SQL Commands</p></figcaption></figure>

### Requirements

* We highly recommend using a star schema or similar setup for your datasets. If you are not familiar with the concept, here is a great [introduction](https://www.geeksforgeeks.org/dbms/star-schema-in-data-warehouse-modeling/).

### Tips for Effective Data Preparation

* Use **Datasets** to apply business logic and create polished views
* Select only the tables and fields you need to include in your datasets to keep performance high
* Use clear, consistent naming  for easier use by your team
* Plan your datasets and relationships carefully before creating them
* Use joins to prepare your data into the grain you want for your governed metrics.
