Dataset in SAC


Do you know about datasets?

Before we discuss datasets, it is important to understand about few tools. Here, in SAC every tools are interconnected, so having a basic understanding about these tools helps you understand about datasets.

In upcoming blogs, we will explore each tools in detail. Stay Tuned!!!


Fig: 1

In menu, under Apps, you can see applications like story, analytics applications, data analyzer, digital boardroom, datasets, modeler, data action etc... It can also denoted as tools. I will give you a brief idea about some of these tools.
Stories: Where we build dashboard and reports.

Datasets: Dataset designed for quick analysis.

Modeler: You can use this as basis for your story. The Modeler is where you can create, maintain, transform, edit and load data into model. Modeler and datasets are kind of similar but modeler is more advanced. 


Datasets

Datasets designed for quick analysis. Its is the first choice when you want to create story or visualization quickly. Here, you can clean, and prepare data.

Step 1: Open datasets from menu, you can see this create option which is shown in the image given below.

Fig: 2

  • If you are uploading excel or csv file as your dataset, you can use 'from a csv or excel file'. 

  • If you are uploading data from any datasource use 'From a data source'. 
  • Lets start by uploading dataset from excel sheet. 

Step 2: Select 'From a CSV or excel file', and upload any excel sheet from your file.  

Step 3: After uploading data, you have to save the datasets in your file. 

Fig: 3 

Create a folder by clicking on the folder icon on top. You can save all your work in this folder. Save the file. 

Step 4: Here, you can improve your data quality. 

Fig: 4

In the top bar, you can see some options used for cleaning and transforming data. Save, undo, redo, sort columns, toggle transform bar, toggle custom expression editor, geo enrichment, level based hierarchy etc... You can see the name of each tool by hovering over the icons.


Fig: 5

In right panel, you can see the number of rows (2000), number of columns (15), and pencil icon used for editing (to rename the file). 

If your data contain more than 2000 number of rows, in datasets and modeler you will be able to see only 2000 number of rows. The changes applying to these sample will be applied to all data. 

Step 5: Discuss about each tools in it. 

    • Can edit the file name
    • Can convert measure to dimension and dimension to measure 

    Measure: Numerical value that can be calculated, aggregated, and analyzed. These are quantitative data, meaning they can be summed, averaged, compared or used in calculations. Eg: sales, profit, quantity, sold, cost.

    Dimension: Categories used to slice, group, identify and filter data. These are descriptive or categorical data, they give context to the measure. For Eg: Product name, customer ID, region/country, date, postal code. 

    So assign each column according to its datatype properly to measure and dimension. Hover over the column name in the right side panel, two icons will be visible. By clicking on 3 dots, 4 options will be visible. 

    There is the another option before three dots. By clicking on it, you will get more details of that particular column. Click on it and analyze the details by yourself.  

     

    • You can delete the column 
    • Can rename the column
    • Aggregation type: sum, count, none(where aggregation may not be possible), min, max. Set this according to our data. 
    • Toggle transform bar - By clicking on this option, you can see a bar (create transform) option.

      Fig:6

    1. Concatenate - to combine two columns
    2. Split - To split a column
    3. Extract - To extract any word from all values in a particular column
    4. Replace - To replace any word from a column
    5. Filter - To filter column according to any condition 

     If you select any of these option, corresponding formula would come on the bar. You can select the column and add into the equation. Try it out. 

    • The toggle custom expression editor is a tool in SAC used to fix and clean up messy data when you first bring it into the system.

    Fig:7

      • For eg: you can convert lowercase to uppercase using formula
      • [Clean_city] = upperCase([City]) 

    • Geo Enrichment - There are two options 
    • By coordinates: if you want to use latitude and longitude data to enrich a dimension or create a location dimension.
      • You can use chatgpt or other AI to create formula as per requirement.

      Fig: 8
       

      • By area name: If you want to enrich a dimension based on country, region, and sub region data.

      Fig: 9

    • Level-based-hierarchy: When your data is organized into levels, such as product category, product group and product. You can create this in story, and models also. 
    • In toolbar, open level based hierarchy. Name hierarchy

      • Select columns to build hierarchy. You can drag the column tokens to change the order of level.
      • Eg: Category --> Sub category --> Product


    • Transpose column into rows
      • Enables you to change selected columns into rows.
      • This is used when your data is wide format and you want to convert long format. 


    • Remove duplicate rows: Can remove duplicate rows by clicking this tool.

    • Reimport data into datasets
      • Data is increasing day by day. If you upload an excel one day and the data in the original data source has increased the very next day. If you want to get access to the newly entered data, reimport your dataset here. So that all new data changes will be applied to story. 

    • Details: Can see the details of the datasets using this option

    • Transform: By clicking on transform you can see the changes you made in this dataset


    Step 6: All these steps are done according to your requirement. Now you get a clean data and can save the changes.



    Comments