2017

The Downfall of the Data Engineer

Maxime Beauchemin

This post follows up on The Rise of the Data Engineer, a recent post that attempted to define data engineering and described how this new role relates to historical and modern roles in the data space. In this post, I want to expose the challenges and risks that cripple data engineers and enumerate the forces that work against this discipline as it goes through its adolescence.

Evolving Distributed Tracing at Uber Engineering

Uber Engineering

Distributed tracing is quickly becoming a must-have component in the tools that organizations use to monitor their complex, microservice-based architectures. At Uber Engineering, our open source distributed tracing system Jaeger saw large-scale internal adoption throughout 2016, integrated into hundreds … The post Evolving Distributed Tracing at Uber Engineering appeared first on Uber Engineering Blog.

Trending Sources

Wallaroo with Sean T. Allen - Episode 12

Data Engineering Podcast

Summary Data oriented applications that need to operate on large, fast-moving streams of information can be difficult to build and scale due to the need to manage their state. In this episode Sean T. Allen, VP of engineering for Wallaroo Labs, explains how Wallaroo was designed and built to reduce the cognitive overhead of building this style of project.

Kafka 100

8 Key Facts You Should know if You are a HR Professional

U-Next

Two of the most common reasons why people think they can be great HR professionals are either they are very organized and systematic or they have good people skills. But these two qualities alone are not enough for anyone to make it big in their career in human resource management. The two attributes can land them jobs but to move up the ladder, they definitely need some qualities that will set them apart from other employees.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Constant Gardening

Zalando Engineering

How effective management is a continuing story of growth Producers’ Style One of the things I struggled the most with in the past year was identifying the best way to lead my teams. I worked a lot on myself, observed my peers, and tried to learn from my leads, but in the end, I ran into the well-known dilemma: task-focused or people-focused management, which one is best?

Recap of Hadoop News for November 2017

ProjectPro

News on Hadoop - November 2017 IBM leads BigInsights for Hadoop out behind barn. Shots heard. theRegister.co.uk, November 8, 2017. IBM’s BigInsights for Hadoop sunset on December 6, 2017. IBM will not provide any further new instances for the basic plan of its data analytics platform. The existing instances will continue to be available on the Bluemix console as is from December 7, 2017 to November 7, 2018.

Hadoop 52

More Trending

Deep Learning in Cloudera

Cloudera

Deep learning is in the news. It’s changing the game. It’s changing your life. It’s changing everything. It will change the world. It’s good to see people excited about technology. But deep learning is a tool that enterprises use to solve practical problems. Nothing more, and nothing less. In this blog, we provide a few examples that show how organizations put deep learning to work.

Apache Airflow and the Future of Data Engineering: A Q&A

Maxime Beauchemin

With a brief Introduction and Takeaway added by Taylor D. Edmiston Introduction Every once in a while I read a post about the future of tech that resonates with clarity. A few weeks ago it was The Rise of the Data Engineer by Maxime Beauchemin, a data engineer at Airbnb and creator of their data pipeline framework, Apache Airflow. At Astronomer, Apache Airflow is at the very core of our tech stack: our integration workflows are defined by data pipelines built in Apache Airflow as directed acycl

The Rise of the Data Engineer

Maxime Beauchemin

I joined Facebook in 2011 as a business intelligence engineer. By the time I left in 2013, I was a data engineer. I wasn’t promoted or assigned to this new role. Instead, Facebook came to realize that the work we were doing transcended classic business intelligence. The role we’d created for ourselves was a new discipline entirely. My team was at the forefront of this transformation.

Hudi: Uber Engineering’s Incremental Processing Framework on Apache Hadoop

Uber Engineering

With the evolution of storage formats like Apache Parquet and Apache ORC and query engines like Presto and Apache Impala, the Hadoop ecosystem has the potential to become a general-purpose, unified serving layer for workloads that can tolerate latencies … The post Hudi: Uber Engineering’s Incremental Processing Framework on Apache Hadoop appeared first on Uber Engineering Blog.

Hadoop 105

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

SiriDB: Scalable Open Source Timeseries Database with Jeroen van der Heijden - Episode 11

Data Engineering Podcast

Summary Time series databases have long been the cornerstone of a robust metrics system, but the existing options are often difficult to manage in production. In this episode Jeroen van der Heijden explains his motivation for writing a new database, SiriDB, the challenges that he faced in doing so, and how it works under the hood. Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project you’ll

Database 100

Confluent Schema Registry with Ewen Cheslack-Postava - Episode 10

Data Engineering Podcast

Summary To process your data you need to know what shape it has, which is why schemas are important. When you are processing that data in multiple systems it can be difficult to ensure that they all have an accurate representation of that schema, which is why Confluent has built a schema registry that plugs into Kafka. In this episode Ewen Cheslack-Postava explains what the schema registry is, how it can be used, and how they built it.

Kafka 100

data.world with Bryon Jacob - Episode 9

Data Engineering Podcast

Summary We have tools and platforms for collaborating on software projects and linking them together; wouldn’t it be nice to have the same capabilities for data? The team at data.world are working on building a platform to host and share data sets for public and private use that can be linked together to build a semantic web of information. The CTO, Bryon Jacob, discusses how the company got started, their mission, and how they have built and evolved their technical infrastructure.

Data Serialization Formats with Doug Cutting and Julien Le Dem - Episode 8

Data Engineering Podcast

Summary With the wealth of formats for sending and storing data it can be difficult to determine which one to use. In this episode Doug Cutting, creator of Avro, and Julien Le Dem, creator of Parquet, dig into the different classes of serialization formats, what their strengths are, and how to choose one for your workload. They also discuss the role of Arrow as a mechanism for in-memory data sharing and how hardware evolution will influence the state of the art for data formats.

Hadoop 100

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Astronomer with Ry Walker - Episode 6

Data Engineering Podcast

Summary Building a data pipeline that is reliable and flexible is a difficult task, especially when you have a small team. Astronomer is a platform that lets you skip straight to processing your valuable business data. Ry Walker, the CEO of Astronomer, explains how the company got started, how the platform works, and their commitment to open source.

ScyllaDB with Eyal Gutkind - Episode 4

Data Engineering Podcast

Summary If you like the features of Cassandra DB but wish it ran faster with fewer resources then ScyllaDB is the answer you have been looking for. In this episode Eyal Gutkind explains how Scylla was created and how it differentiates itself in the crowded database market. Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch

Database 100

Defining Data Engineering with Maxime Beauchemin - Episode 3

Data Engineering Podcast

Summary What exactly is data engineering? How has it evolved in recent years and where is it going? How do you get started in the field? In this episode, Maxime Beauchemin joins me to discuss these questions and more. Transcript provided by CastSource Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.

Dask with Matthew Rocklin - Episode 2

Data Engineering Podcast

Summary There is a vast constellation of tools and platforms for processing and analyzing your data. In this episode Matthew Rocklin talks about how Dask fills the gap between a task oriented workflow tool and an in memory processing framework, and how it brings the power of Python to bear on the problem of big data. Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the news

Hadoop 100

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

Pachyderm with Daniel Whitenack - Episode 1

Data Engineering Podcast

Summary Do you wish that you could track the changes in your data the same way that you track the changes in your code? Pachyderm is a platform for building a data lake with a versioned file system. It also lets you use whatever languages you want to run your analysis with its container based task graph. This week Daniel Whitenack shares the story of how the project got started, how it works under the covers, and how you can get started using it today!

Data Lake 100

Introducing The Show

Data Engineering Podcast

Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page which is linked from the site. To help other people find the show you can leave a review on iTunes or Google Play Music, share it on social media, and tell your friends and co-workers.

Media 100

Re-Architecting Cash and Digital Wallet Payments for India with Uber Engineering

Uber Engineering

Uber is developing a payment platform for India that enables operations teams to more seamlessly collect and distribute cash and digital wallet payments to drivers. In this article, San Francisco-based software engineer Yijun Liu reflects on his experiences working with … The post Re-Architecting Cash and Digital Wallet Payments for India with Uber Engineering appeared first on Uber Engineering Blog.

The Road to uChat: Building Uber’s Internal Chat Solution

Uber Engineering

Two years ago, Uber’s previous chat application began showing signs that it would not be able to adapt to our growth. There were app crashes, performance hiccups, and outages that crippled our company’s ability to effectively communicate online. With user … The post The Road to uChat: Building Uber’s Internal Chat Solution appeared first on Uber Engineering Blog.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Engineering Uber Predictions in Real Time with ELK

Uber Engineering

Uber’s services rely on the accuracy of our event prediction and forecasting tools. From estimating rider demand on a given date to predicting … The post Engineering Uber Predictions in Real Time with ELK appeared first on Uber Engineering Blog.

Introducing AthenaX, Uber Engineering’s Open Source Streaming Analytics Platform

Uber Engineering

Uber facilitates seamless and more enjoyable user experiences by channeling data from a variety of real-time sources. These insights range from in-the-moment traffic conditions that provide guidance on trip routes to the Estimated Time of Delivery (ETD) of an UberEATS … The post Introducing AthenaX, Uber Engineering’s Open Source Streaming Analytics Platform appeared first on Uber Engineering Blog.

Engineering On-Demand Transportation for Business with Uber Central

Uber Engineering

When Uber launched in 2009, our mission was simple: make transportation as reliable as running water everywhere, for everyone. While our mission remains the same today, the number of Uber use cases has grown dramatically, motivating our engineers to think … The post Engineering On-Demand Transportation for Business with Uber Central appeared first on Uber Engineering Blog.

Spaghetti and Marshmallows at Zalando: An Exercise to Inspire Deep Learning

Zalando Engineering

Some months ago I had the opportunity, with two fellow Zalandos, to organize the “Dortmund 5PM”, a gathering across all Dortmund teams, scheduled once a month on Fridays in our local event space. We want to foster further cross-team collaboration between individuals, making these meetings a memorable experience for all. We opted for running The Marshmallow Challenge, a funny design exercise that encourages teams to experience simple yet profound lessons in collaboration, innovation, and creativ

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Recap of Hadoop News for June 2017

ProjectPro

News on Hadoop - June 2017 Hadoop Servers Expose Over 5 Petabytes of Data. BleepingComputer.com, June 2, 2017. John Matherly, the founder of Shodan, a search engine used for discovering IoT devices, found that improperly configured HDFS-based Hadoop servers exposed over 5 PB of information. He found approximately 4487 HDFS servers available without authentication through public IP addresses that in total exposed 5120 TB of data. The expert said that 47820 MongoDB servers exp

Hadoop 52

Hadoop Cluster Overview: What it is and how to setup one?

ProjectPro

What is a Hadoop Cluster? In general, a computer cluster is a collection of various computers that work collectively as a single system. “A Hadoop cluster is a collection of independent components connected through a dedicated network to work as a single centralized data processing resource.” “A Hadoop cluster can be referred to as a computational computer cluster for storing and analysing big data (structured, semi-structured and unstructured) in a distributed environment.

Hadoop 52

Getting to Know Hadoop 3.0 -Features and Enhancements

ProjectPro

Hadoop was first made publicly available as open source in 2011; since then it has undergone major changes in three different versions. Apache Hadoop 3 is round the corner, with members of the Hadoop community at the Apache Software Foundation still testing it. The major release of Hadoop 3.x is anticipated to be rolled out sometime in mid-2017. What else can be more exciting for the big data community than waiting for the release of a major new version of the tiny toy elephant?

Hadoop 52

Signalling Your Jenkins Build Status with a Mini USB Traffic Light

Zalando Engineering

As part of an effort to increase developer awareness of quality, we wanted to draw attention to the fact that you should have healthy CI builds. The normal procedure revolved around emails sent to the individuals who broke the build with their last commit. With almost all of us used to receiving a lot of email-noise throughout the day, this is not a channel where you can expect an immediate reaction.

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.