November 2017

Data Serialization Formats with Doug Cutting and Julien Le Dem - Episode 8

Data Engineering Podcast

Summary: With the wealth of formats for sending and storing data, it can be difficult to determine which one to use. In this episode, Doug Cutting, creator of Avro, and Julien Le Dem, creator of Parquet, dig into the different classes of serialization formats, their strengths, and how to choose one for your workload. They also discuss the role of Arrow as a mechanism for in-memory data sharing and how hardware evolution will influence the state of the art for data formats.

Building a Big Data Culture

Cloudera

In an earlier VISION post, “The Five Markers on Your Big Data Journey,” Amy O’Connor shared some common traits of many of the most successful data-driven companies. In this blog, I’d like to explore what I believe is the most important of those traits: building and fostering a culture of data. The most important elements in establishing a data-driven culture are a strong executive sponsor and consistent communication.

Real-time Ranking with Apache Kafka’s Streams API

Zalando Engineering

Using Apache Kafka and the Kafka Streams API with Scala on AWS for real-time fashion insights. This piece was originally published on confluent.io. The Fashion Web: Zalando, Europe’s leading online fashion platform, cares deeply about fashion. Our mission statement is to “Reimagine fashion for the good of all.” To reimagine something, first you need to understand it.

Cybersecurity On Call: Information War with Bill Gertz

Cloudera

What’s more terrifying: knowing that you just lost your identity, or unknowingly being manipulated? While both seem awful, they are the reality of the digital world we live in; just look at the news. Countless articles discuss the recent Equifax hack, in which millions of Social Security numbers were compromised, while organizations like Facebook, Google, and Twitter have come forward about Russian accounts that bought ads to influence US elections.

Introduction to Six Strategies for Advancing Customer Knowledge

Cloudera

It’s almost indefensible today to argue that any asset is more important to a modern business than the health and happiness of its customers. We simply cannot grow as an enterprise without the support, voice, and participation of our users and constituents. However, our customers rarely raise their hands to speak to us. That doesn’t mean they aren’t talking to us.

Cybersecurity in our Connected Future

Cloudera

According to expert analysis, there will be more than 20 billion internet-connected devices by 2020. This profusion of connected devices is, of course, not limited to the private sector: from weapons systems and soldier uniforms to smart military bases and connected vehicles, the government has been an early adopter of the Internet of Things as a means to enhance national defense.

The Future of Cloud-based Analytics (Part 3)

Cloudera

As the market moves toward cloud-based big data and analytics, three qualities emerge as vital for success. Many services will gain some traction without meeting all three, but they will also disappoint users and cause perpetual headaches for IT. At Cloudera, we see these indisputable attributes as follows. Easy: Certainly no one goes out looking for a harder way to do their job.

Introducing Cloudera Altus Analytic DB (beta) for Cloud-based Data Warehousing

Cloudera

Today, we are thrilled to announce the upcoming beta release of Cloudera Altus Analytic DB. As the first data warehouse cloud service that brings the warehouse to the data, it delivers instant self-service BI and SQL analytics to anyone – easily, reliably, and securely. Business analysts get iterative and flexible analytics with no limits on the number of users or use cases, and IT can easily manage across all tenants with simplified security and governance.

Machine Learning, the DOCOMO Digital way: Two Core Use Cases

Cloudera

Pattern recognition. Anomaly detection. Event prediction. All of these capabilities are driven by machine learning (ML). Recently, ML has been a hot topic among our clients, and we’ve seen a steep uptick in companies highlighting their ML successes. Cloudera offers a unified platform for analytics and machine learning. Each customer that we work with has a unique story, executing numerous use cases, sometimes spanning multiple divisions, to gain insights that were impossible to achieve with

Cloudera in the Cloud (Part 2)

Cloudera

A noteworthy point is that Cloudera complements popular cloud services, such as Amazon Web Services (AWS) and Microsoft Azure. While cloud services do provide useful resources — such as compute instances and object storage on demand — Cloudera offers the unified platform to organize, process, analyze, and store data at large scale… anywhere.

Why Event Driven?

Zalando Engineering

Zalando is using an event-driven approach for its new Fashion Platform. Conor Clifford examines why. In a recent post, I wrote about how we went about building the core Article services and applications of Zalando’s new Fashion Platform with a strong event-first focus. That new platform also has a strong overall event-driven focus, rather than a more “traditional” service-oriented approach.

Do We Really Need UI Tests?

Zalando Engineering

Two brothers examine the pros and cons of UI testing. Based on their different experiences in Partner Solutions and Zalando Media Solutions respectively, we speak to frontend developers Vadym Kukhtin and Oleksandr Kukhtin about their opposing opinions on UI testing. The Case Against UI Testing, by Vadym. TL;DR: It depends on preference, but I believe that UI testing isn’t required in every instance. In my experience, it is a Sisyphean task to force developers to write even basic unit tests, never mind

Dedicated Ownership for Teams at Zalon

Zalando Engineering

Agile Lead and Software Engineer at Zalon Jan Helwich on how to work well. At the beginning of 2017, we at Zalon decided to enable our teams to work in what we believe is the most effective and efficient way. At the heart of this restructuring process, we assigned cross-functional teams only to business goals or user needs and let them take full responsibility for solving these problems.

Zalando Wins Big in Dublin

Zalando Engineering

Ana Peleteiro Ramallo takes the ‘Data Scientist of the Year’ award at the DatSci’s. There was a great turnout at the Dublin DatSci Awards at Dublin’s Croke Park, with data scientists from companies, universities, startups, and the public sector attending. Zalando Dublin had finalists in two major award categories, backed up with two tables of supporters.

Agile Fails

Zalando Engineering

Learning from and overcoming the issues that slow our teams down. As Agile Coaches, we meet a lot of teams, engineers, product people, leads, and managers in our daily work. Not all of them are agile experts, and we understand that there are some misconceptions about “agile”. When we encounter a misconception several times, we see it as a pattern that we can overcome.

Buzzfeed Data Infrastructure with Walter Menendez - Episode 7

Data Engineering Podcast

Summary: Buzzfeed needs to understand how its users are interacting with the myriad articles, videos, and other content it posts. This lets it produce new content that will continue to be well received. To surface the insights it needs to grow its business, Buzzfeed needs a robust data infrastructure that reliably captures all of those interactions.

Running Kafka Streams applications in AWS

Zalando Engineering

This is the second in a series about the use of Apache Kafka’s Streams API by Zalando, Europe’s leading online fashion platform. See “Ranking Websites in Real-time with Apache Kafka’s Streams API” for the first post in the series. This piece was first published on confluent.io. At Zalando, Europe’s leading online fashion platform, we use Apache Kafka for a wide variety of use cases.

Applying Data Science to Change Lives

Cloudera

My leadership experience has taught me many things so far: humility, patience, discipline, respect, and inclusion. All these qualities also apply in martial arts, one of my many passions outside of work. I have seen some of my early mentors integrate these values into the DNA of their teams, and today I aspire to do the same for the teams I lead at Cloudera.
