June, 2023

Generative AI and the Future of Data Engineering

Monte Carlo

Generative AI is taking the world by storm – here’s what it means for data engineering and why data observability is critical for this groundbreaking technology to succeed. Maybe you’ve noticed the world has dumped the internet, mobile, social, cloud and even crypto in favor of an obsession with generative AI. But is there more to generative AI than a fancy demo on Twitter?

Modern Data Engineering with MAGE: Empowering Efficient Data Processing

Analytics Vidhya

Introduction In today’s data-driven world, organizations across industries are dealing with massive volumes of data, complex pipelines, and the need for efficient data processing. Traditional data engineering solutions, such as Apache Airflow, have played an important role in orchestrating and controlling data operations in order to tackle these difficulties.

An educational side project

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of four topics in today’s subscriber-only The Scoop issue. If you’re not yet a full subscriber, you missed this week’s deep-dive on Agoda’s private cloud setup. To get the full issues, twice a week, subscribe here.

Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh

Data Engineering Podcast

Summary Data transformation is a key activity for all of the organizational roles that interact with data. Because of its importance and outsized impact on what is possible for downstream data consumers it is critical that everyone is able to collaborate seamlessly. SQLMesh was designed as a unifying tool that is simple to work with but powerful enough for large-scale transformations and complex projects.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

A Comprehensive Guide to Convolutional Neural Networks

KDnuggets

Artificial Intelligence has been witnessing monumental growth in bridging the gap between the capabilities of humans and machines. Researchers and enthusiasts alike work on numerous aspects of the field to make amazing things happen. One of many such areas is the domain of Computer Vision.

Introducing English as the New Programming Language for Apache Spark

databricks

Introduction We are thrilled to unveil the English SDK for Apache Spark, a transformative tool designed to enrich your Spark experience. Apache Spark™.

More Trending

The Journey of a Senior Data Scientist and Machine Learning Engineer at Spice Money

Analytics Vidhya

Introduction Meet Tajinder, a seasoned Senior Data Scientist and ML Engineer who has excelled in the rapidly evolving field of data science. Tajinder’s passion for unraveling hidden patterns in complex datasets has driven impactful outcomes, transforming raw data into actionable intelligence. In this article, we explore Tajinder’s inspiring success story.

Domain Registrars which Developers Recommend

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of four topics from today’s subscriber-only The Scoop issue. To get full issues twice a week, subscribe here.

Migrating Netflix to GraphQL Safely

Netflix Tech

By Jennifer Shin, Tejas Shikhare, Will Emmanuel. In 2022, a major change was made to Netflix’s iOS and Android applications. We migrated Netflix’s mobile apps to GraphQL with zero downtime, which involved a total overhaul from the client to the API layer. Until recently, an internal API framework, Falcor, powered our mobile apps. They are now backed by Federated GraphQL, a distributed approach to APIs where domain teams can independently manage and own specific sections of the API.

10 ChatGPT Plugins for Data Science Cheat Sheet

KDnuggets

For an overview of what we believe to be 10 of the best ChatGPT plugins for data science, check out our latest cheat sheet.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Meta developer tools: Working at scale

Engineering at Meta

Every day, thousands of developers at Meta are working in repositories with millions of files. Those developers need tools that help them at every stage of the workflow while working at extreme scale. In this article we’ll go through a few of the tools in the development process. And, as an added bonus, those we talk about below are open source so you can try them yourself.

New Approaches For Detecting AI-Generated Profile Photos

LinkedIn Engineering

Co-authors: Shivansh Mundra, Gonzalo Aniano Porcile, Smit Marvaniya, Hany Farid. A core part of what we do on the Trust Data Team at LinkedIn is create, deploy, and maintain models that detect and prevent many types of abuse. This spans the detection and prevention of fake accounts, account takeovers, and policy-violating content. We are constantly working to improve and increase the effectiveness of our anti-abuse defenses to protect the experiences of our members and customers.

What Data Engineers Really Do?

Analytics Vidhya

In a data-driven world, behind-the-scenes heroes like data engineers play a crucial role in ensuring smooth data flow. Imagine being an online shopper who suddenly receives irrelevant recommendations. A data engineer investigates the issue, identifies a glitch in the e-commerce platform’s data funnel, and swiftly implements seamless data pipelines.

Google Domains to shut down

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of five topics from today’s subscriber-only The Scoop issue. To get full issues twice a week, subscribe here.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Ensuring the Successful Launch of Ads on Netflix

Netflix Tech

By Jose Fernandez, Ed Barker, Hank Jacobs. Introduction In November 2022, we introduced a brand new tier: Basic with ads. This tier extended existing infrastructure by adding new backend components and a new remote call to our ads partner on the playback path. As we were gearing up for launch, we wanted to ensure it would go as smoothly as possible.

Your Ultimate Guide to Chat GPT and Other Abbreviations

KDnuggets

Everyone seems to have gone crazy about ChatGPT, which has become a cultural phenomenon. If you’re not on the ChatGPT train yet, this article might help you better understand the context and excitement around this innovation.

What is a self-serve data platform & how to build one

Start Data Engineering

1. Introduction 2. What is self-serve? 2.1. Components of a self-serve platform 3. Building a self-serve data platform 3.1. Creating dataset(s) 3.1.1. Gather requirements 3.1.2. Get data foundations right 3.2. Accessing data 3.3. Identify and remove dependencies 4. Conclusion 5. Further reading 6. References 1. Introduction Most companies want to build a self-serve data platform.

Yes, I'm learning Apache Flink - beginner's problems

Waitingforcode

Surprised? You shouldn't be. I've always been eager to learn, including 5 years ago when, for the first time, I left my Apache Spark comfort zone to explore Apache Beam. Since then I've had a chance to write some Dataflow streaming pipelines to fully appreciate this technology and to work on AWS, GCP, and Azure. But I miss the excitement of learning from scratch.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Top 10 Powerful Data Modeling Tools to Know in 2023

Analytics Vidhya

Introduction In the era of data-driven decision-making, having accurate data modeling tools is essential for businesses aiming to stay competitive. As a new developer, a robust data modeling foundation is crucial for effectively working with databases. Properly configured data structures ensure a smoother workflow and prevent data loss or misplacement.

Inside Agoda’s Private Cloud - Exclusive

The Pragmatic Engineer

👋 Hi, this is Gergely with the monthly, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. If you’re not a subscriber, you missed the issue on Shopify’s leveling split and a few others. Subscribe to get two full issues every week.

Exploring Graphs in Rust. Yikes.

Confessions of a Data Guy

I’ve been a dog licking my wounds for some time now. Over on my Substack newsletter, I’ve been doing a small series on DSA (Data Structures and Algorithms). I tackled some of the easier stuff first, like Linked Lists, Binary Search, and the like. What’s more, I actually did most of it in Rust, since […] The post Exploring Graphs in Rust.

AI: Large Language & Visual Models

KDnuggets

This article discusses the significance of large language and visual models in AI, their capabilities, potential synergies, challenges such as data bias, ethical considerations, and their impact on the market, highlighting their potential for advancing the field of artificial intelligence.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Data News — Week 23.25

Christophe Blefari

(credits) Hey, this is the Data News. It's super hard to change habits, but it is how it is: the newsletter is going out on Saturday. I hope this edition finds you well. Summer is coming ☀️ Thank you all because we crossed the 3000 subscribers mark last week. Let's go for the 4000 before the end of the year 🤗 This is an almost-raw edition for this week.

What's new in Apache Spark 3.4.0 - shuffle changes

Waitingforcode

Shuffle is a permanent point in the What's new in Apache Spark series. Why? It's often one of the most time-consuming parts of a job, and knowing the improvements simply helps in writing better pipelines.

Mr. Pavan’s Data Engineering Journey Drives Business Success

Analytics Vidhya

Introduction We had an amazing opportunity to learn from Mr. Pavan. He is an experienced data engineer with a passion for problem-solving and a drive for continuous growth. Throughout the conversation, Mr. Pavan shares his journey, inspirations, challenges, and accomplishments, providing valuable insights into the field of data engineering. As we explore Mr.

An explosion in software engineers using AI coding tools?

The Pragmatic Engineer

GitHub surveyed 500 developers in the US for a sense of how they use AI coding tools. I examine the results and add context on how the survey was conducted.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Conceptual Introduction to Delta Lake.

Confessions of a Data Guy

The post Conceptual Introduction to Delta Lake. appeared first on Confessions of a Data Guy.

Will ChatGPT Replace Data Scientists?

KDnuggets

Every job is at risk. Here’s how you can AI-proof your career.

How Column-Aware Development Tooling Yields Better Data Models

Data Engineering Podcast

Summary Architectural decisions are all based on certain constraints and a desire to optimize for different outcomes. In data systems one of the core architectural exercises is data modeling, which can have significant impacts on what is and is not possible for downstream use cases. Incorporating column-level lineage in the data modeling process encourages a more robust and well-informed design.

Data News — Week 23.24

Christophe Blefari

The newsletter, a metaphor (credits). Hello, after the good weather comes the storm. I'm now under the Berlin rain with 20°. When I write in these conditions I feel like a tortured author writing a depressing novel, while actually today I'll speak about the AI Act, Python, SQL and data platforms. Casual day at the office finally. Some personal news: next Monday and Tuesday I'll be at Berlin Buzzwords. If you're there, ping me; it would be a pleasure to meet and hang out together.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases, the number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.