If you're a developer, you've been there. You query a third-party service, a NoSQL database, or an internal microservice, and what you get back is a deeply nested JSON object. While JSON's hierarchical structure is great for representing complex data entities, it becomes a major roadblock when you need to load that data into a BI tool, a spreadsheet, or a relational database.
The challenge is always the same: how do you effectively and reliably flatten this nested structure into a simple, tabular format? The traditional answer involves writing brittle, recursive scripts that are a pain to write and even more of a pain to maintain.
But what if you could describe the transformation you want and let an intelligent service handle the execution? Let's explore how to solve this common data transformation problem using a simple, powerful API.
Nested JSON is data that contains objects within objects or arrays of objects. For example, a customer order might look like this:
{
  "orderId": "ORD-123",
  "customer": {
    "id": "CUST-A",
    "contact": {
      "name": "Alice Johnson",
      "email": "alice@example.com"
    }
  },
  "items": [
    { "sku": "SKU-001", "name": "Widget A", "price": 10.00 },
    { "sku": "SKU-002", "name": "Widget B", "price": 15.50 }
  ]
}
This structure is perfectly logical, but most analytics platforms and data warehouses (like BigQuery, Redshift, or even a simple CSV importer) work best with flat data, like this:
order_id | customer_id | customer_name | customer_email | item_sku | item_name | item_price |
---|---|---|---|---|---|---|
ORD-123 | CUST-A | Alice Johnson | alice@example.com | SKU-001 | Widget A | 10.00 |
ORD-123 | CUST-A | Alice Johnson | alice@example.com | SKU-002 | Widget B | 15.50 |
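Flattening the data involves two key tasks: exploding each array (here, items) into one row per element, and mapping nested fields (such as customer.contact.name) to flat column names.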
Writing a custom script to handle this often leads to fragile code that breaks the moment the source schema changes. This is where a dedicated data transformation service becomes invaluable.
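To make the hand-written approach concrete, here is a minimal sketch of the kind of recursive script described above. The function names `flattenRecord` and `isPlainObject` are hypothetical, not part of any library. The column names are derived mechanically from the key path, so they come out as `items_sku` rather than the friendlier `item_sku` shown in the table; fixing that, or coping with a renamed source field, means more custom code.

```javascript
// Hypothetical hand-written flattener (illustrative only, not a library API).
// Nested objects become underscore-joined columns; every array multiplies
// the rows, producing one row per element.
function flattenRecord(value, prefix = "") {
  let rows = [{}];
  for (const [key, child] of Object.entries(value)) {
    const column = prefix ? `${prefix}_${key}` : key;
    let childRows;
    if (Array.isArray(child)) {
      // Explode the array into one partial row per element.
      childRows = child.flatMap((element) =>
        isPlainObject(element)
          ? flattenRecord(element, column)
          : [{ [column]: element }]
      );
      // Keep the parent row when the array is empty.
      if (childRows.length === 0) childRows = [{}];
    } else if (isPlainObject(child)) {
      // Descend into the nested object, extending the column prefix.
      childRows = flattenRecord(child, column);
    } else {
      childRows = [{ [column]: child }];
    }
    // Cross-join the rows built so far with the rows produced by this key.
    rows = rows.flatMap((row) =>
      childRows.map((extra) => ({ ...row, ...extra }))
    );
  }
  return rows;
}

function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

const order = {
  orderId: "ORD-123",
  customer: {
    id: "CUST-A",
    contact: { name: "Alice Johnson", email: "alice@example.com" }
  },
  items: [
    { sku: "SKU-001", name: "Widget A", price: 10.0 },
    { sku: "SKU-002", name: "Widget B", price: 15.5 }
  ]
};

const flatRows = flattenRecord(order);
console.log(flatRows); // one flat row per item
```

Even this short version hard-codes several decisions (the separator, how empty arrays are treated, and the cross-join across multiple arrays), and each of them is a place where a schema change can silently alter the output.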
Instead of writing imperative code that details how to loop and recurse through a JSON object, a modern approach lets you declaratively define what the final structure should look like.
This is the core philosophy behind transform.do—intelligent data transformation as code. You define your transformation rules in a simple, version-controlled format, and our AI agents handle the complex execution. This turns a complex ETL job into a stable, reusable service you can call from anywhere.
Let's take our nested order data and flatten it using the transform.do agent. All we need to do is define our source data and a set of transformation rules.
Here's how you can do it with a few lines of code:
import { Agent } from "@do/sdk";

// Initialize the transformation agent
const transform = new Agent("transform.do");

// 1. Define your nested source data
const sourceData = [{
  "orderId": "ORD-123",
  "orderDate": "2024-05-21T15:00:00Z",
  "customer": {
    "id": "CUST-A",
    "contact": { "name": "Alice Johnson", "email": "alice@example.com" }
  },
  "items": [
    { "sku": "SKU-001", "name": "Widget A", "price": 10.00, "quantity": 2 },
    { "sku": "SKU-002", "name": "Widget B", "price": 15.50, "quantity": 1 }
  ],
  "shipping": { "address": "123 Main St, Anytown, USA" }
}];

// 2. Define your flattening and transformation rules
const transformations = {
  rules: [
    // "Explode" the items array to create a row for each item
    { unwind: "items" },
    // Map nested fields to a flat structure using dot notation
    { rename: {
        "orderId": "order_id",
        "orderDate": "order_date",
        "customer.id": "customer_id",
        "customer.contact.name": "customer_name",
        "items.sku": "item_sku",
        "items.name": "item_name",
        "items.price": "item_price",
        "items.quantity": "item_quantity",
        "shipping.address": "shipping_address"
      }
    },
    // Create a new calculated field
    { addField: { "total_price": "{{item_price}} * {{item_quantity}}" } },
    // Clean up the original complex fields
    { removeFields: ["customer", "items", "shipping"] }
  ]
};

// 3. Execute the transformation (top-level await requires an ES module)
const result = await transform.run({
  source: sourceData,
  transform: transformations
});

console.log(result.data);
Executing this workflow produces a clean, flat array of objects, ready for any analytics tool or database. The agent automatically handled the array explosion and nested field mapping based on our simple rules.
[
  {
    "order_id": "ORD-123",
    "order_date": "2024-05-21T15:00:00Z",
    "customer_id": "CUST-A",
    "customer_name": "Alice Johnson",
    "item_sku": "SKU-001",
    "item_name": "Widget A",
    "item_price": 10,
    "item_quantity": 2,
    "shipping_address": "123 Main St, Anytown, USA",
    "total_price": 20
  },
  {
    "order_id": "ORD-123",
    "order_date": "2024-05-21T15:00:00Z",
    "customer_id": "CUST-A",
    "customer_name": "Alice Johnson",
    "item_sku": "SKU-002",
    "item_name": "Widget B",
    "item_price": 15.5,
    "item_quantity": 1,
    "shipping_address": "123 Main St, Anytown, USA",
    "total_price": 15.5
  }
]
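For readers who want a mental model of what the unwind and rename rules do, the sketch below approximates them in plain JavaScript. It is an assumption about their semantics inferred from the output above, not transform.do's implementation: the helpers `getPath`, `unwind`, and `rename` are hypothetical, rows without an array at the unwound field are simply dropped, and top-level keys are moved while nested sources are left in place for a later cleanup step.

```javascript
// Local approximations of two rules. The semantics are assumptions made for
// illustration; they are not taken from transform.do's implementation.

// Read a value by dot path, e.g. getPath(row, "customer.id").
function getPath(obj, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

// Emit one row per element of the array at `field`; the element replaces
// the array. Rows with no array at `field` are dropped in this sketch.
function unwind(rows, field) {
  return rows.flatMap((row) =>
    (Array.isArray(row[field]) ? row[field] : []).map((element) => ({
      ...row,
      [field]: element
    }))
  );
}

// Copy each dot-path source into a flat target key. Top-level sources are
// moved; nested sources stay in place until they are removed separately.
function rename(rows, mapping) {
  return rows.map((row) => {
    const out = { ...row };
    for (const [from, to] of Object.entries(mapping)) {
      const value = getPath(row, from);
      if (value === undefined) continue;
      if (!from.includes(".")) delete out[from];
      out[to] = value;
    }
    return out;
  });
}

const sample = [{
  orderId: "ORD-123",
  customer: { id: "CUST-A" },
  items: [{ sku: "SKU-001" }, { sku: "SKU-002" }]
}];

const unwound = unwind(sample, "items");
const renamed = rename(unwound, {
  "orderId": "order_id",
  "customer.id": "customer_id",
  "items.sku": "item_sku"
});

// Print only the flat columns for readability.
console.log(
  renamed.map(({ order_id, customer_id, item_sku }) => ({
    order_id,
    customer_id,
    item_sku
  }))
);
```

Because unwind runs first, the path items.sku resolves against a single item object on each row, which is why the rename step can address array elements with ordinary dot notation.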
Flattening is just one piece of the puzzle. The transform.do platform is designed to handle a wide range of data transformation tasks within the same simple workflow.
By defining these complex ETL pipelines as simple, version-controlled services, you can chain them together, integrate them into CI/CD, and build robust, scalable data processing workflows without the maintenance headache.
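One way a transformation could be guarded in CI is a small check that its output really is flat and carries the expected columns. The sketch below is a hypothetical helper (`findFlatnessProblems` is not part of any SDK) and it runs against hard-coded sample rows rather than a live service call.

```javascript
// Hypothetical CI guard: returns a list of problems found in flattened rows.
// An empty list means every row has the expected columns and no nested values.
function findFlatnessProblems(rows, expectedColumns) {
  const problems = [];
  rows.forEach((row, index) => {
    for (const column of expectedColumns) {
      if (!(column in row)) {
        problems.push(`row ${index}: missing column "${column}"`);
      }
    }
    for (const [column, value] of Object.entries(row)) {
      // Arrays and objects both have typeof "object"; null is allowed.
      if (value !== null && typeof value === "object") {
        problems.push(`row ${index}: column "${column}" is still nested`);
      }
    }
  });
  return problems;
}

const columns = ["order_id", "item_sku"];
const goodRows = [{ order_id: "ORD-123", item_sku: "SKU-001", item_price: 10 }];
const badRows = [{ order_id: "ORD-123", items: [{ sku: "SKU-001" }] }];

const goodProblems = findFlatnessProblems(goodRows, columns);
const badProblems = findFlatnessProblems(badRows, columns);

console.log(goodProblems); // []
console.log(badProblems);
```

A pipeline step could fail the build whenever the returned list is non-empty, which surfaces a source schema change before it reaches the warehouse.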
Stop writing one-off scripts. Start building intelligent data transformation services.
Ready to simplify your data workflows? Visit transform.do to learn more and get started for free.