
How to deploy DataOps: Step 2 – Create foundational data

Switchboard Feb 17


    In our last post, we covered Step 1 of the four steps to realizing the benefits of DataOps – Identify KPIs.

    Once your data collaboration goals and key metrics are established, the technical work of producing value from your data begins. Various teams across your organization require different combinations of metrics. But raw data accessed directly from APIs or log files rarely provides a format useful for collaborative analysis.

    To address this, you need to continuously and automatically refine raw data streams into a format from which your KPIs can be derived. This is what we call ‘foundational data’.

    Foundational data: the basis for true KPIs

    Just as a jet aircraft requires highly refined aviation fuel to achieve its full potential, the KPIs that drive your business decisions require data that is high quality, well understood, and, above all, reliable.

    Unfortunately, raw data that comes from vendors and third parties can be anything but. No two APIs are alike, so data must be cleaned, typed, and sometimes enriched with match tables to be useful. Data formats change often, and connectivity or vendor hiccups present an ongoing risk of data loss or corruption.
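
    As a rough illustration, here is a minimal sketch of that clean/type/enrich step in Python; the field names, the advertiser match table, and the refine_row helper are hypothetical rather than taken from any particular vendor's API:

        from decimal import Decimal, InvalidOperation

        # Hypothetical match table mapping a vendor's numeric advertiser IDs to the
        # canonical advertiser names used elsewhere in the business.
        ADVERTISER_MATCH_TABLE = {
            "1001": "Acme Corp",
            "1002": "Globex",
        }

        def refine_row(raw):
            """Clean, type, and enrich one raw API row (every value arrives as a string)."""
            try:
                impressions = int(raw.get("impressions", "0"))
                spend = Decimal(raw.get("spend", "0"))
            except (ValueError, InvalidOperation):
                # A numeric field that fails to parse is a data-quality signal;
                # flag it for review rather than silently guessing.
                return None
            return {
                "advertiser": ADVERTISER_MATCH_TABLE.get(raw.get("advertiser_id", ""), "UNKNOWN"),
                "impressions": impressions,
                "spend": spend,
            }

        print(refine_row({"advertiser_id": "1001", "impressions": "1200", "spend": "34.50"}))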

    Building meaningful KPIs from such data is impossible without taking on an enormous amount of complexity. With such a large gap between raw data and KPIs, experienced teams are investing in an intermediate concept – foundational data.

    Foundational data involves taking each source and normalizing it into a standardized, canonical version which, ideally, can be easily combined with other, similarly refined data sources.
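
    For instance, here is a sketch of what normalizing two hypothetical sources into one canonical schema might look like; the field names, date formats, and column labels are assumptions for illustration only:

        from datetime import datetime

        # Hypothetical canonical schema: every refined source is reshaped into these
        # fields so that delivery data from different systems can be combined.
        CANONICAL_FIELDS = {"date", "campaign", "impressions", "source"}

        def from_ad_server(row):
            # Assumed ad-server export: "MM/DD/YYYY" dates and an "Ad unit impressions" column.
            return {
                "date": datetime.strptime(row["Date"], "%m/%d/%Y").date().isoformat(),
                "campaign": row["Order name"].strip().lower(),
                "impressions": int(row["Ad unit impressions"]),
                "source": "ad_server",
            }

        def from_measurement_vendor(row):
            # Assumed vendor export: ISO dates and a "measurable_impressions" column.
            return {
                "date": row["date"],
                "campaign": row["campaign_name"].strip().lower(),
                "impressions": int(row["measurable_impressions"]),
                "source": "measurement_vendor",
            }

        combined = [
            from_ad_server({"Date": "02/01/2024", "Order name": " Spring Promo ", "Ad unit impressions": "5000"}),
            from_measurement_vendor({"date": "2024-02-01", "campaign_name": "spring promo", "measurable_impressions": "4800"}),
        ]
        assert all(set(row) == CANONICAL_FIELDS for row in combined)
        print(combined)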

    Data source integration challenges

    If you’re using Google Ad Manager, MOAT, Krux, YouTube or a similar analytics platform, you’ll likely have experienced some of the following:

    • Programmatic deal data is not easily matched to campaign delivery data from the ad server
    • Complex APIs require developer resources to use and maintain
    • Many different metrics, not all of which map cleanly to campaign delivery or audience segments
    • Large user and segment logs require data parsing skills
    • Raw data lands in the data warehouse in its original, unrefined form
    • Variables can be complex to analyze
    • Data such as UTMs and URLs can be challenging to clean (see the sketch after this list)
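
    As a small example of the last point, UTM parameters can be pulled out of landing-page URLs and normalized with nothing more than the Python standard library:

        from urllib.parse import parse_qs, urlsplit

        def extract_utms(url):
            """Extract UTM parameters from a URL and lowercase them for consistent grouping."""
            params = parse_qs(urlsplit(url).query)
            return {
                key: values[0].strip().lower()
                for key, values in params.items()
                if key.startswith("utm_")
            }

        print(extract_utms("https://example.com/landing?utm_source=Newsletter&utm_medium=Email&ref=abc"))
        # {'utm_source': 'newsletter', 'utm_medium': 'email'}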

    Common data sources need normalization

    Within most organizations, the list of data sources you need to master is clear. However, each incremental data source adds new complexities. By understanding the distinctive properties and challenges presented by each source, you'll be able to make a more informed tool selection based on the unique profile of your business.

    Disparate data sources, formats, and integration challenges will sap any budget, so how can you protect against mounting costs? Rather than attempting to onboard every single data source for its own sake, try to understand the data characteristics of your business today, and where it will be tomorrow. This will help you understand how your raw data can evolve to become the foundation for the specific metrics you need to succeed.

    Creating foundational data

    Let's say you're most interested in creating foundational GAM data because, as your primary ad server, GAM can provide a rich view of how certain display and video inventory is delivered. Start with the GAM API. For brevity, we'll assume you're already familiar with its quirks and limitations.

    • The first step is to determine the appropriate queries and granularity of data required. An important consideration is identifying the dimensions you really need as there are quota limits.
    • Next, use a script or a tool to invoke the API, then extract and store the query result (see the first sketch after this list). It's important to do this with 100% consistency so that query results maintain the same schema.
    • Each row in the query result needs to be type-checked, i.e., numeric values must be cast into integers or floats if they are to have any value for calculations.
    • Dimensions must be normalized in order to avoid textual inconsistencies (as a result of occasional human input error) that can also throw off calculations.
    • The query result needs to be written either to a file or, preferably, to a data warehouse, so that it can be consolidated and queried (the second sketch after this list illustrates these steps). Additional considerations include extracting key-values so that the business attributes captured in custom dimensions are available for analysis, as well as backfilling historical data.
    • Finally, consider if you need Data Transfer (event-level server logs that can provide the finest possible granularity of insights).
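
    To make the first two steps concrete, here is a minimal sketch assuming the googleads Python client library; the API version string, dimensions, columns, and date range are placeholders, and quota limits mean you should request only the dimensions you actually need:

        from googleads import ad_manager

        # Credentials are read from a googleads.yaml file.
        client = ad_manager.AdManagerClient.LoadFromStorage()
        report_downloader = client.GetDataDownloader(version="v202405")  # use a current API version

        # Keep this query definition fixed so that every run produces the same schema.
        report_job = {
            "reportQuery": {
                "dimensions": ["DATE", "ORDER_NAME", "LINE_ITEM_NAME"],
                "columns": ["AD_SERVER_IMPRESSIONS", "AD_SERVER_CLICKS"],
                "dateRangeType": "YESTERDAY",
            }
        }

        # Run the report, wait for it to complete, then download the result
        # (CSV_DUMP output is gzip-compressed by default).
        report_job_id = report_downloader.WaitForReport(report_job)
        with open("gam_report.csv.gz", "wb") as report_file:
            report_downloader.DownloadReportToFile(report_job_id, "CSV_DUMP", report_file)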
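
    And here is a sketch of the type-checking, normalization, and warehouse-load steps, assuming BigQuery as the destination; the CSV column headers, table name, and schema below are illustrative:

        import csv
        import gzip

        from google.cloud import bigquery

        def parse_rows(path):
            """Type-check and normalize rows from a downloaded report before loading."""
            with gzip.open(path, mode="rt", newline="") as handle:
                for raw in csv.DictReader(handle):
                    try:
                        impressions = int(raw["Column.AD_SERVER_IMPRESSIONS"])
                    except (KeyError, ValueError):
                        continue  # or route bad rows to a dead-letter table for inspection
                    yield {
                        "date": raw["Dimension.DATE"],
                        # Normalize free-text dimensions so small naming inconsistencies
                        # don't split one line item into several rows downstream.
                        "line_item": raw["Dimension.LINE_ITEM_NAME"].strip().lower(),
                        "impressions": impressions,
                    }

        # An explicit schema and write disposition keep every load consistent;
        # the destination table here is hypothetical.
        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig(
            schema=[
                bigquery.SchemaField("date", "DATE"),
                bigquery.SchemaField("line_item", "STRING"),
                bigquery.SchemaField("impressions", "INTEGER"),
            ],
            write_disposition="WRITE_APPEND",
        )
        load_job = client.load_table_from_json(
            list(parse_rows("gam_report.csv.gz")),
            "my_project.foundational_data.gam_delivery",
            job_config=job_config,
        )
        load_job.result()  # block until the load completes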

    The steps above are the abbreviated set of tasks involved in creating foundational data for a selected data source. These processes also involve numerous data cleansing tasks (file encoding, to name just one), but they are a necessary part of getting your data into a neat, foundational layer.

    If you need to unify your first- or second-party data, we can help. Contact us to learn how.
