November, 2017

Data Serialization Formats with Doug Cutting and Julien Le Dem - Episode 8

Data Engineering Podcast

Summary: With the wealth of formats for sending and storing data, it can be difficult to determine which one to use. In this episode, Doug Cutting, creator of Avro, and Julien Le Dem, creator of Parquet, dig into the different classes of serialization formats, what their strengths are, and how to choose one for your workload. They also discuss the role of Arrow as a mechanism for in-memory data sharing and how hardware evolution will influence the state of the art for data formats.

Building a Big Data Culture

Cloudera

In an earlier VISION post, The Five Markers on Your Big Data Journey, Amy O’Connor shared some common traits of many of the most successful data-driven companies. In this blog, I’d like to explore what I believe is the most important of those traits: building and fostering a culture of data. The most important elements in establishing a data-driven culture are a strong executive sponsor and consistent communication.

Running Kafka Streams applications in AWS

Zalando Engineering

This is the second in a series about the use of Apache Kafka’s Streams API by Zalando, Europe’s leading online fashion platform. See Ranking Websites in Real-time with Apache Kafka’s Streams API for the first post in the series. This piece was first published on confluent.io. At Zalando, we use Apache Kafka for a wide variety of use cases.

Buzzfeed Data Infrastructure with Walter Menendez - Episode 7

Data Engineering Podcast

Summary: Buzzfeed needs to understand how its users interact with the myriad articles, videos, etc. that it posts. This lets the company produce new content that will continue to be well received. To surface the insights needed to grow the business, it needs a robust data infrastructure that reliably captures all of those interactions.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Cybersecurity On Call: Information War with Bill Gertz

Cloudera

What’s more terrifying: knowing that you just lost your identity, or unknowingly being manipulated? While both seem awful, they are the reality of the digital world that we live in; just look at the news. Countless articles discuss everything from the recent Equifax hack, in which thousands of social security numbers were compromised, to organizations like Facebook, Google, and Twitter coming forward with Russian accounts that were buying ads to influence US elections.

Introduction to Six Strategies for Advancing Customer Knowledge

Cloudera

It’s almost indefensible today to say that there is a single more important asset to a modern business than the health and happiness of its customers. We simply cannot grow as an enterprise without the support, voice, and participation of our users and constituents. However, our customers rarely raise their hands to speak to us. That doesn’t mean that our customers aren’t talking to us.

More Trending

Now Available: Cloudera Data Science Workbench Release 1.2

Cloudera

Cloudera Data Science Workbench (CDSW) is a self-service collaboration platform for data scientists. It offers secure access to Cloudera data; on-demand compute; support for Python, R, and Scala; workflow automation, version control, and sharing; and GPU acceleration for deep learning on demand. Now, with Release 1.2, CDSW is easier than ever to deploy and manage.

The Future of Cloud-based Analytics (Part 3)

Cloudera

As the market moves toward cloud-based big data and analytics, three qualities emerge as vital for success. While many services will get some traction without meeting all three goals, they will also disappoint users and cause perpetual headaches for IT. At Cloudera, we see these indisputable attributes as: Easy – Certainly, no one goes out looking for a harder way to do their job.

Introducing Cloudera Altus Analytic DB (beta) for Cloud-based Data Warehousing

Cloudera

Today, we are thrilled to announce the upcoming beta release of Cloudera Altus Analytic DB. As the first data warehouse cloud service that brings the warehouse to the data, it delivers instant self-service BI and SQL analytics to anyone – easily, reliably, and securely. Business analysts get iterative and flexible analytics with no limits on the number of users or use cases, and IT can easily manage across all tenants with simplified security and governance.

Machine Learning, the DOCOMO Digital way: Two Core Use Cases

Cloudera

Pattern recognition. Anomaly detection. Event prediction. All of these capabilities are driven by machine learning (ML). And recently, ML has been a hot topic among our clients. We’ve seen a steep uptick in companies highlighting their successes in ML. Cloudera offers a unified platform for analytics and machine learning. Each customer that we work with has a unique story, executing numerous use cases, sometimes spanning multiple divisions, to gain insights that were impossible to achieve with

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Cloudera in the Cloud (Part 2)

Cloudera

A noteworthy point is that Cloudera complements popular cloud services, such as Amazon Web Services (AWS) and Microsoft Azure. While cloud services do provide useful resources — such as compute instances and object storage on demand — Cloudera offers the unified platform to organize, process, analyze, and store data at large scale… anywhere.

Real-time Ranking with Apache Kafka’s Streams API

Zalando Engineering

Using Apache and the Kafka Streams API with Scala on AWS for real-time fashion insights. This piece was originally published on confluent.io. The Fashion Web: Zalando, Europe’s leading online fashion platform, cares deeply about fashion. Our mission statement is to “Reimagine fashion for the good of all”. To reimagine something, first you need to understand it.

Why Event Driven?

Zalando Engineering

Zalando is using an event-driven approach for its new Fashion Platform. Conor Clifford examines why. In a recent post, I wrote about how we went about building the core Article services and applications of Zalando’s new Fashion Platform with a strong event-first focus. That new platform also has a strong overall event-driven focus, rather than a more “traditional” service-oriented approach.

Do We Really Need UI Tests?

Zalando Engineering

Two brothers examine the pros and cons of UI testing. Based on their different experiences in Partner Solutions and Zalando Media Solutions respectively, we speak to frontend developers Vadym Kukhtin and Oleksandr Kukhtin about their opposing opinions on UI testing. The Case Against UI Testing - Vadym. TL;DR: It depends on preference, but I believe that UI testing isn’t required in every instance. In my experience, it is a Sisyphean task to force developers to write even basic unit tests, never mind

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Dedicated Ownership for Teams at Zalon

Zalando Engineering

Jan Helwich, Agile Lead and Software Engineer at Zalon, on how to work well. At the beginning of 2017, we at Zalon decided to enable our teams to work in what we believe is the most effective and efficient way. At the heart of this restructuring process, we assigned cross-functional teams to business goals or user needs only and let them take full responsibility for solving these problems.

Zalando Wins Big in Dublin

Zalando Engineering

Ana Peleteiro Ramallo takes the ‘Data Scientist of the Year’ award at the DatSci Awards. There was a great turnout at the Dublin DatSci Awards at Dublin’s Croke Park, with data scientists from across companies, universities, startups and the public sector attending. Zalando Dublin had finalists in two major award categories, backed up with two tables for support.

Agile Fails

Zalando Engineering

Learning from and overcoming the issues that slow our teams down. As Agile Coaches, we meet a lot of teams, engineers, product people, leads and managers throughout our daily work. Not all of them are agile experts, and we understand that there are some misconceptions about “agile”. When we encounter a misconception several times, we see it as a pattern that we can overcome.

Applying Data Science to Change Lives

Cloudera

My leadership experience has taught me many things so far: humility, patience, discipline, respect, and inclusion. All these qualities are applicable in martial arts, one of my many passions outside of work. I have also seen some of my early mentors integrate these values into the DNA of their teams, and today I aspire to do the same for the teams I lead at Cloudera.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?