Summary: With the wealth of formats for sending and storing data, it can be difficult to determine which one to use. In this episode Doug Cutting, creator of Avro, and Julien Le Dem, creator of Parquet, dig into the different classes of serialization formats, what their strengths are, and how to choose one for your workload. They also discuss the role of Arrow as a mechanism for in-memory data sharing and how hardware evolution will influence the state of the art for data formats.
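As a rough illustration of the row-oriented versus columnar distinction discussed in the episode, the sketch below writes the same records with fastavro (Avro, row-oriented) and pyarrow (Parquet, columnar, with Arrow as the in-memory representation). The schema, record values, and file names are made up for the example, and it assumes the fastavro and pyarrow packages are installed.

```python
# Illustrative only: the same records serialized as Avro (row-oriented)
# and Parquet (columnar). Schema and file names are hypothetical.
import fastavro
import pyarrow as pa
import pyarrow.parquet as pq

records = [
    {"user_id": 1, "event": "view"},
    {"user_id": 2, "event": "click"},
]

# Avro keeps whole records together and writes them against an explicit
# schema, which suits write-heavy and streaming pipelines.
avro_schema = fastavro.parse_schema({
    "name": "Event",
    "type": "record",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "event", "type": "string"},
    ],
})
with open("events.avro", "wb") as out:
    fastavro.writer(out, avro_schema, records)

# Parquet stores the same data column by column, which pays off when
# analytical queries only touch a few fields of a wide table.
table = pa.Table.from_pylist(records)
pq.write_table(table, "events.parquet")
print(pq.read_table("events.parquet", columns=["event"]))
```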
In an earlier VISION post, The Five Markers on Your Big Data Journey, Amy O’Connor shared some common traits of many of the most successful data-driven companies. In this blog, I’d like to explore what I believe is the most important of those traits: building and fostering a culture of data. The most important elements in establishing a data-driven culture are having a strong executive sponsor and consistent communication.
Using Apache Kafka and the Kafka Streams API with Scala on AWS for real-time fashion insights. This piece was originally published on confluent.io. The Fashion Web: Zalando, Europe’s leading online fashion platform, cares deeply about fashion. Our mission statement is to “reimagine fashion for the good of all.” To reimagine something, first you need to understand it.
What’s more terrifying: knowing that you just lost your identity, or unknowingly being manipulated? Both seem awful, and they are the reality of the digital world we live in; just look at the news. Countless articles cover everything from the recent Equifax hack, in which thousands of social security numbers were compromised, to organizations like Facebook, Google, and Twitter coming forward with Russian accounts that were buying ads to influence US elections.
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide, with best practices and examples, to debugging Airflow DAGs. You’ll learn how to create a standardized process for debugging to quickly diagnose errors in your DAGs, identify common issues with DAGs, tasks, and connections, and distinguish between Airflow-related and external issues.
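As a minimal sketch of the kind of debugging-friendly DAG the guide is concerned with, the example below uses Airflow's TaskFlow API with small single-purpose tasks and explicit logging so failures are easy to localize. The DAG id, schedule, and task bodies are placeholders, not taken from the guide, and the `schedule` argument assumes Airflow 2.4 or later.

```python
# A minimal, hypothetical DAG structured for easier debugging:
# small single-purpose tasks and explicit logging in each one.
import logging
from datetime import datetime

from airflow.decorators import dag, task

log = logging.getLogger(__name__)


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def example_debuggable_pipeline():
    @task
    def extract() -> list[int]:
        # Log what the task is doing so the task log pinpoints failures.
        log.info("Extracting records")
        return [1, 2, 3]

    @task
    def transform(records: list[int]) -> list[int]:
        log.info("Transforming %d records", len(records))
        return [r * 2 for r in records]

    transform(extract())


example_debuggable_pipeline()
```

An individual task can then be exercised in isolation with `airflow tasks test <dag_id> <task_id>`, which runs it locally without involving the scheduler.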
It’s almost indefensible today to claim that any asset matters more to a modern business than the health and happiness of its customers. We simply cannot grow as an enterprise without the support, voice, and participation of our users and constituents. However, our customers rarely raise their hands to speak to us. That doesn’t mean they aren’t talking to us.
According to expert analysis, there will be more than 20 billion internet-connected devices by 2020. This profusion of connected devices, of course, is not limited to the private sector: from weapons systems and soldier uniforms to smart military bases and connected vehicles, the government has been an early adopter of the Internet of Things as a means to enhance national defense.
Cloudera Data Science Workbench (CDSW) is a self-service collaboration platform for data scientists. It offers secure access to Cloudera data, on-demand compute, support for Python, R, and Scala, workflow automation, version control and sharing, and GPU acceleration for deep learning on demand. Now, with Release 1.2, CDSW is easier than ever to deploy and manage.
As the market moves toward cloud-based big data and analytics, three qualities emerge as vital for success. While many services will get some traction without meeting all three goals, they will also disappoint users and cause perpetual headaches for IT. At Cloudera, we see these indisputable attributes to be: Easy – certainly no one goes out looking for a harder way to do their job.
Today, we are thrilled to announce the upcoming beta release of Cloudera Altus Analytic DB. As the first data warehouse cloud service that brings the warehouse to the data, it delivers instant self-service BI and SQL analytics to anyone – easily, reliably, and securely. Business analysts get iterative and flexible analytics with no limits on the number of users or use cases, and IT can easily manage across all tenants with simplified security and governance.
Pattern recognition. Anomaly detection. Event prediction. All of these capabilities are driven by machine learning (ML), and recently ML has been a hot topic among our clients. We’ve seen a steep uptick in companies highlighting their successes in ML. Cloudera offers a unified platform for analytics and machine learning. Each customer that we work with has a unique story, executing numerous use cases, sometimes spanning multiple divisions, to gain insights that were impossible to achieve with
A noteworthy point is that Cloudera complements popular cloud services, such as Amazon Web Services (AWS) and Microsoft Azure. While cloud services do provide useful resources — such as compute instances and object storage on demand — Cloudera offers the unified platform to organize, process, analyze, and store data at large scale… anywhere.
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Zalando is using an event-driven approach for its new Fashion Platform; Conor Clifford examines why. In a recent post, I wrote about how we went about building the core Article services and applications of Zalando’s new Fashion Platform with a strong event-first focus. That new platform also has a strong overall event-driven focus, rather than a more “traditional” service-oriented approach.
Two brothers examine the pros and cons of UI testing. Based on their different experiences in Partner Solutions and Zalando Media Solutions respectively, we speak to frontend developers Vadym Kukhtin and Oleksandr Kukhtin about their opposing opinions on UI testing. The Case Against UI Testing – Vadym. TL;DR: It depends on preference, but I believe that UI testing isn’t required in every instance. In my experience, it is a Sisyphean task to force developers to write even basic unit tests, never mind
Agile Lead and Software Engineer at Zalon, Jan Helwich, on how to work well. At the beginning of 2017, we at Zalon decided to enable our teams to work in what we believe is the most effective and efficient way. At the heart of this restructuring process, we assigned cross-functional teams to business goals or user needs only and let them take full responsibility for solving these problems.
Ana Peleteiro Ramallo takes the ‘Data Scientist of the Year’ award at the DatSci Awards. There was a great turnout at the DatSci Awards at Dublin’s Croke Park, with data scientists from companies, universities, startups, and the public sector attending. Zalando Dublin had finalists in two major award categories, backed up with two tables for support.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Learning from and overcoming the issues that slow our teams down As Agile Coaches we meet a lot of teams, engineers, product people, leads and managers throughout our daily work. Not all of them are agile experts and we understand that there are some misconceptions about “agile”. When we encounter a misconception several times, we see it as a pattern that we can overcome.
Summary: Buzzfeed needs to be able to understand how its users are interacting with the myriad articles, videos, and other content that they post. This lets them produce new content that will continue to be well-received. To surface the insights that they need to grow their business, they need a robust data infrastructure to reliably capture all of those interactions.
Second in our series about the use of Apache Kafka’s Streams API by Zalando. This is the second in a series about the use of Apache Kafka’s Streams API by Zalando, Europe’s leading online fashion platform. See Ranking Websites in Real-time with Apache Kafka’s Streams API for the first post in the series. This piece was first published on confluent.io. Running Kafka Streams applications in AWS: At Zalando, Europe’s leading online fashion platform, we use Apache Kafka for a wide variety of use cases.
My leadership experience has taught me many things so far: humility, patience, discipline, respect, and inclusion. All these qualities are applicable in martial arts, one of my many passions outside of work. I have also seen some of my early mentors integrate these values into the DNA of their teams, and today I aspire to do the same for the teams I lead at Cloudera.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven decisions.