Summer ’23

About Pratham Books 📚

Pratham Books is a not-for-profit multilingual children's book publisher based in Bangalore, India, with a mission to put "A Book in Every Child’s Hand". Pratham Books offers diverse solutions to ensure children have access to a wide range of books to support reading and knowledge acquisition. In 18 years, Pratham Books has published over 5000 books in 24 languages and distributed more than 30M printed books and story cards (mini story books) across India.

In 2015, Pratham Books launched StoryWeaver, an open-source digital repository of multilingual storybooks that are free to use under an open license (Creative Commons' CC-BY 4.0). The stories can be read, downloaded, translated, versioned, or printed using the tools embedded on the platform. StoryWeaver today has over 50,000 stories in 330+ languages and a readership of 110M. StoryWeaver has a large global footprint: its books are regularly accessed by users in 150 countries, with 70% of its traffic coming from India.

Project Goal 🎯

Pratham Books has a free-tier Google Analytics 4 (GA4) property set up for the StoryWeaver website. Data exploration happens through a BigQuery (BQ) link created directly from the GA4 property, which feeds a raw table called Events within the BQ sandbox. The team then analyzes the data in BigQuery's built-in SQL query console.

StoryWeaver previous workflow

Because Pratham Books uses free or low-cost data tools (GA-4, Metabase, Data Studio), they have struggled with quota limits and with limitations on data transformations.

With strong analytics skills in-house, their goal is to have a centralized data warehouse hosted on their own servers. This needs to be linked to a SQL-based advanced query environment that will allow them to model aggregate tables and maintain a well-defined intermediate analytical data layer.
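To make "modeling an aggregate table" concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the warehouse. The column names event_date, event_name, and user_pseudo_id follow the GA4 BigQuery export; the event names, sample rows, and the table name daily_event_counts are hypothetical, and this is not the team's actual model.

```python
import sqlite3

# Hypothetical raw rows, loosely shaped like GA4 export events:
# (event_date as YYYYMMDD, event_name, user_pseudo_id)
RAW_EVENTS = [
    ("20230601", "story_read", "user_a"),
    ("20230601", "story_read", "user_b"),
    ("20230601", "story_download", "user_a"),
    ("20230602", "story_read", "user_a"),
]


def build_daily_event_counts(conn):
    """Load raw events, derive a daily aggregate table, and return its rows."""
    conn.execute(
        "CREATE TABLE events (event_date TEXT, event_name TEXT, user_pseudo_id TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", RAW_EVENTS)
    # The aggregate table is defined purely in SQL on top of the raw table.
    conn.execute(
        """
        CREATE TABLE daily_event_counts AS
        SELECT event_date,
               event_name,
               COUNT(*) AS event_count,
               COUNT(DISTINCT user_pseudo_id) AS users
        FROM events
        GROUP BY event_date, event_name
        """
    )
    return conn.execute(
        "SELECT event_date, event_name, event_count, users "
        "FROM daily_event_counts ORDER BY event_date, event_name"
    ).fetchall()


conn = sqlite3.connect(":memory:")
rows = build_daily_event_counts(conn)
for row in rows:
    print(row)
```

Reports can then read from the small aggregate table instead of rescanning raw events, which is one way such a layer can ease quota pressure.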

Project Objective

Integrated Data Analytics Platform is a project aimed at creating a modern, future-proof data stack that can support insights informing product, content, and technical decisions at StoryWeaver. In addition to creating a centralized warehouse that will support holistic data views, we wish to move analyst efforts from manual reporting to automated analytics engineering.

  1. Cost-Efficient GA-4 Data Integration: Develop a streamlined, cost-efficient data engineering workflow to seamlessly transfer Google Analytics 4 (GA-4) data into Pratham Books' central warehouse and pre-process the data for further analysis.
  2. SQL-Based Transformation Environment: Establish a free or low-cost SQL-based transformation environment in the data warehouse to facilitate the visualization of intricate lineages within the derived tables, enhancing data insights and decision-making.
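
The lineage idea in the second objective can be sketched as a small dependency graph. The table names below are hypothetical and the code uses only Python's standard graphlib module (Python 3.9+); it illustrates how lineage determines build order and upstream dependencies, and is not the tooling chosen for the project.

```python
from graphlib import TopologicalSorter

# Hypothetical lineage: each derived table maps to the tables it reads from.
LINEAGE = {
    "stg_events": {"raw_events"},
    "daily_event_counts": {"stg_events"},
    "story_reads_by_language": {"stg_events"},
    "weekly_summary": {"daily_event_counts", "story_reads_by_language"},
}


def build_order(lineage):
    """Return an order in which every table comes after the tables it reads from."""
    return list(TopologicalSorter(lineage).static_order())


def upstream(table, lineage):
    """Return every table that the given table depends on, directly or indirectly."""
    seen = set()
    stack = [table]
    while stack:
        for parent in lineage.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


order = build_order(LINEAGE)
print(order)
print(sorted(upstream("weekly_summary", LINEAGE)))
```

A transformation environment that tracks this graph can rebuild derived tables in a safe order and show which reports are affected when a raw table changes.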

Curious about the process? 💭

Project Scope phase (3 weeks)

During the initial phase of scoping the project, our team engaged with the client to gain insights into their existing pipeline and identify potential areas for improvement. Due to the intricacies of the project, it became evident that a more comprehensive understanding of their data sources was essential. As a result, we proactively reached out to the client, requesting a detailed technical document outlining the databases involved and their corresponding requirements.

Leveraging the valuable guidance of our mentor, we conducted collaborative brainstorming and white-boarding sessions to explore innovative approaches that could not only automate but also enhance the efficiency of the pipeline. Subsequently, we meticulously documented these proposed strategies within an architecture document.

We then prepared a product requirements document listing the tasks to be taken up in this DFG summer cycle.

Project Implementation phase (12 weeks)