Data Engineering in 2024
THIS WEEK: Is AI as Disruptive as World War Two?

Dear Data Professional…
Few events have reshaped the world as profoundly as World War Two. The conflict spurred unprecedented technological advancements, from radar to the first general-purpose computers, forever altering the trajectory of human progress. Today, we are in the midst of a similarly seismic shift—this time driven by artificial intelligence (AI).
Just as WWII catalysed rapid innovation across multiple domains, AI is now revolutionising industries at a breakneck pace. The introduction of generative AI models like ChatGPT has not only automated tasks once thought to be the exclusive domain of human intellect but has also created entirely new paradigms for business operations and data management.
AI's impact on data engineering is profound. Traditional data analysis, once the realm of SQL wizards and BI tool experts, is now accessible to a much broader audience through natural language interfaces like ChatGPT. Likewise, AI-assisted tools are boosting the productivity of data engineers, enabling them to build and optimise data pipelines with new levels of efficiency.

“With great power comes great responsibility”
As Stan Lee wrote in Marvel's Spider-Man comics, the increased accessibility and automation brought by AI necessitate robust data observability practices to ensure the reliability and integrity of data products. As we navigate this new era, the lessons of the past remind us that those who adapt and innovate will lead the charge into the future.
The goal of this newsletter is to help Data & AI Engineers, along with Modern Data Managers, navigate the new world of AI-driven development, delivery and management. Each week we promise to deliver value to your inbox with news, views and insights on how your career will be impacted by the disruptive force that is Generative AI.
STORY: A Day in the Life of a Data Engineer in 2024

Peter Parker
If Peter Parker had his time over again, there’s a chance that instead of being a photojournalist, he’d become a data engineer. Much has changed over the last decade in Data Engineering, so he, just like you, would be coming to grips with the new realities of an AI-enhanced world. On any typical workday in 2024, Peter starts his morning by checking the status of the company's cloud-based data pipelines. With a quick glance at his dashboard, he confirms that the nightly ETL jobs have run successfully, transforming and loading data from various sources into the cloud data warehouse.
Mr Parker’s first task of the day involves collaborating with the Machine Learning team to optimise a real-time recommendation engine. Using a no-code data integration platform, he swiftly connects new data sources to the existing pipeline, a process that would have taken days of custom coding just a few years ago. This simplification allows him to focus on more strategic initiatives.
Next, Peter joins a virtual meeting with the analytics team to discuss data quality issues. Using advanced data observability tools, he quickly identifies the root cause of some inconsistencies in customer data. With a few clicks, he implements automated data quality checks and sets up alerts for future anomalies, showcasing how DataOps has changed the way data integrity is maintained.
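Checks like the ones Peter configures can be sketched in plain Python. The field names, validation rules, and alert threshold below are illustrative assumptions, not the API of any particular observability tool:

```python
# Sketch of automated data quality checks with a simple alert rule.
# Field names ("customer_id", "email") and the 5% threshold are
# illustrative assumptions, not a specific platform's defaults.

def check_customer_records(records):
    """Return a list of (row_index, issue) tuples for failing records."""
    issues = []
    seen_ids = set()
    for i, rec in enumerate(records):
        cid = rec.get("customer_id")
        if not cid:
            issues.append((i, "missing customer_id"))
        elif cid in seen_ids:
            issues.append((i, "duplicate customer_id"))
        else:
            seen_ids.add(cid)
        email = rec.get("email")
        if email and "@" not in email:
            issues.append((i, "malformed email"))
    return issues

def should_alert(issues, total, threshold=0.05):
    """Trigger an alert when the failure rate exceeds the threshold."""
    return total > 0 and len(issues) / total > threshold
```

In practice a pipeline would run checks like these after each load and route `should_alert` results to a notification channel, rather than failing the whole job on a single bad row.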

In the afternoon, Mr Parker turns his attention to a new project involving edge computing for IoT devices. He begins designing a distributed data processing architecture that will allow for real-time analytics at the point of data collection, reducing latency and bandwidth costs. This shift towards edge computing represents a significant evolution in how data is managed and processed compared to the centralised approaches of the 2010s.
Towards the end of the day, he reviews some pull requests for the team's data infrastructure code, which is now fully version-controlled and deployed using infrastructure-as-code principles. Sitting back at his desk he muses about how things have changed since he started out back in the 2010s:
1. The rise of cloud-native, managed data services has abstracted away much of the complexity in data infrastructure management.
2. The adoption of DataOps practices has led to more agile, collaborative, and quality-focused data engineering workflows.
3. The shift towards edge computing and distributed architectures has changed how data is collected, processed, and analysed, moving from manual configurations and monolithic architectures to more agile workflows.
The pace of adoption of AI models is growing exponentially. Nearly every type of knowledge worker is being impacted, and the practice of Data Science is no exception. It is our belief that AI will profoundly impact your career in ways yet to be understood. Each week we aim to provide you with a perspective on this revolutionary force. We are keen to hear your opinion in the poll below.
Is AI as disruptive as WWII? Have your say. Do you agree or disagree?
PLATFORM SPOTLIGHT: How Perplexity is reducing Hallucinations in everyday workflows
Lex Fridman was recently in conversation with Aravind Srinivas, CEO of Perplexity AI.
Fridman describes: “[Perplexity as] a company that aims to revolutionise how we humans get answers to questions on the internet. It combines search and large language models, LLMs, in a way that produces answers where every part of the answer has a citation to human-created sources on the web. This significantly reduces LLM hallucinations, and makes it much easier and more reliable to use for research, and general curiosity-driven late night rabbit hole explorations that I often engage in.”
When talking about what’s under the hood, Aravind Srinivas describes Perplexity as an “answer engine. Where you ask it a question, you get an answer. Except the difference is, all the answers are backed by sources. This is like how an academic writes a paper. Now, that referencing part, the sourcing part is where the search engine part comes in. You combine traditional search, extract results relevant to the query the user asked. You read those links, extract the relevant paragraphs, feed it into an LLM.”

Aravind Srinivas, CEO Perplexity
“That LLM takes the relevant paragraphs, looks at the query, and comes up with a well-formatted answer with appropriate footnotes to every sentence.” … ”It’s been instructed with that one particular instruction, given a bunch of links and paragraphs, write a concise answer for the user, with the appropriate citation. The magic is all of this working together in one single orchestrated product, and that’s what we built Perplexity for.”
The podcast delves into the workings of RAG - Retrieval Augmented Generation, along with a deeper discussion on when and if AI will achieve PhD level intelligence.
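The answer-engine loop Srinivas describes — retrieve links, extract relevant passages, and prompt an LLM to answer with citations — can be sketched as follows. The `search` and `llm_complete` functions and the prompt format are hypothetical stand-ins, not Perplexity's actual internals:

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# search() and llm_complete() are hypothetical stand-ins for a real
# search backend and LLM API; the prompt wording is illustrative.

def build_prompt(question, passages):
    """Number each retrieved passage so the model can cite it as [n]."""
    context = "\n".join(
        f"[{i + 1}] {p['text']} (source: {p['url']})"
        for i, p in enumerate(passages)
    )
    return (
        "Given the numbered passages below, write a concise answer "
        "to the question, citing passages as [n] after each sentence.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

def answer(question, search, llm_complete, top_k=3):
    """Retrieve passages, prompt the LLM, and return (answer, sources)."""
    passages = search(question)[:top_k]        # retrieval step
    prompt = build_prompt(question, passages)  # grounding step
    return llm_complete(prompt), [p["url"] for p in passages]
```

Because every sentence is tied back to a numbered source, a hallucinated claim has no passage to cite, which is the mechanism behind the reduced-hallucination behaviour discussed in the interview.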

Some key points to know about Perplexity and the way it works include:
Precise Information Retrieval: Perplexity's advanced natural language processing and machine learning capabilities enable it to understand complex queries and deliver accurate, context-relevant answers. This means you spend less time sifting through irrelevant search results and more time focusing on critical tasks.
Faster Response Times: By pre-indexing data, Perplexity can quickly retrieve relevant information from its database, resulting in faster response times compared to traditional search methods. This speed boost enables you to iterate and troubleshoot more efficiently, potentially accelerating project timelines.
Enhanced Data Quality: Perplexity’s generative capabilities can improve data quality by creating synthetic datasets for testing, simulating realistic scenarios, and augmenting existing data. This empowers you to build more reliable and resilient data pipelines and analytics platforms.
We highly recommend listening to this enlightening conversation on effectively reducing hallucinations when using LLMs.
For more on this topic check out the Data Radio Show - on YouTube and where you get your podcasts.
That’s a wrap for this week.