This site uses cookies to improve your experience. To help us insure we adhere to various privacy regulations, please select your country/region of residence. If you do not select a country, we will assume you are from the United States. Select your Cookie Settings or view our Privacy Policy and Terms of Use.
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Used for the proper function of the website
Used for monitoring website traffic and interactions
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Strictly Necessary: Used for the proper function of the website
Performance/Analytics: Used for monitoring website traffic and interactions
Deep learning forms the backbone of modern day artificial intelligence. Learn more about the important aspects of this connection with this freely available course.
My guest this week is Drew Smith , Vice President of Global Data and Analytics at Little Caesars Enterprises and Ilitch Companies. Little Caesars is a pizza franchise that is mainly in the United States. Illitch Companies owns the Detroit Tigers (baseball), Detroit Red Wings (hockey), and several stadiums. Before that, Drew worked at International Institute for Analytics (IIA), an analytics consulting company, and IKEA, the furniture retailer and manufacturer.
Cloudera has appointed Remus Lim as vice president of Asia Pacific and Japan, to drive adoption of the hybrid data platform across the region and support customers in their journey to become more data-driven. We’ve asked him to share his vision for Cloudera in APAC and reflect on his past few months since taking up the mantle. What drew you to the tech space and attracted you to the roles you’ve held?
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate
Summary Building and maintaining reliable data assets is the prime directive for data engineers. While it is easy to say, it is endlessly complex to implement, requiring data professionals to be experts in a wide range of disparate topics while designing and implementing complex topologies of information workflows. In order to make this a tractable problem it is essential that engineers embrace automation at every opportunity.
The need for data fabric. As Cloudera CMO David Moxey outlined in his blog , we live in a hybrid data world. Data is growing and continues to accelerate its growth. It is changing in makeup and appearing in ever more places. Driving insight and value from it all is as much of an opportunity as it is a challenge. As a result, it’s getting ??progressively more complex for businesses to access, use, and create value from it.
The need for data fabric. As Cloudera CMO David Moxey outlined in his blog , we live in a hybrid data world. Data is growing and continues to accelerate its growth. It is changing in makeup and appearing in ever more places. Driving insight and value from it all is as much of an opportunity as it is a challenge. As a result, it’s getting ??progressively more complex for businesses to access, use, and create value from it.
In a world that creates 1.145 trillion MB of data per day , change is the only constant. With brand new information being seeded every other second, businesses are evolving at the speed of light. Where there’s data, there’s analytics, and thus, the demand for skilled Business Analysts. Data enthusiasts have stumbled across enough facts and figures to know what’s trending and what’s needed to master these trends, which is why over 1,000 learners hit the road to becoming highly sought-after Busine
Summary Building a data platform is a journey, not a destination. Beyond the work of assembling a set of technologies and building integrations across them, there is also the work of growing and organizing a team that can support and benefit from that platform. In this episode Inbar Yogev and Lior Winner share the journey that they and their teams at Riskified have been on for their data platform.
In this article we discuss the various methods to replicate HBase data and explore why Replication Manager is the best choice for the job with the help of a use case. Cloudera Replication Manager is a key Cloudera Data Platform (CDP) service, designed to copy and migrate data between environments and infrastructures across hybrid clouds. The service provides simple, easy-to-use, and feature-rich data movement capability to deliver data and metadata where it is needed, and has secure data backup
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
There are many reasons you, as an analytics engineer, may want to capture the complete version history of data: You’re in an industry with a very high standard for data governance You need to track big OKRs over time to report back to your stakeholders You want to build a window to view history with both forward and backward compatibility These are often high-stakes situations!
Machine learning, big data analytics or AI may steal the headlines, but if you want to hone a smart, strategic skill that can elevate your career, look no further than SQL.
How Confluent built Intelligent Storage, for 10x more scalable and elastic Kafka storage with infinite retention, max cluster uptime, and zero operational burdens.
The supply chain disruptions affecting the UK retail market prompted me to have another think about shrink. There are many elements of retail shrink, or rather their knock-on effects, that are barely discussed and yet they represent a significant opportunity to reduce costs as well as recover the all-important margin. Fixing these problems will in turn deliver a more rounded customer service and help meet ESG objectives that are of growing reputational importance.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
A job profile famous for being among the highest-paying careers, Product Management is a very serious business. It requires the best of the best candidates who are always on their feet, ready to adorn a new hat to fit a new role that the situation demands. It’s almost heroic, the way a product manager functions. Just like you, three highly-energetic product management enthusiasts dreamed a little dream about making it big in this exciting domain.
Hi, I’m Pasha Finkelshteyn , and I’ll be your guide today through this month’s news. I’ll offer my impressions of recent developments in the data engineering space and highlight new ideas from the wider community. If you think I missed something worthwhile, catch me on Twitter and suggest a topic, link, or anything else you want to see. By the way, if you would prefer to get this monthly source of data engineering information delivered straight to your inbox each month, you can subscribe t
Data self-service, the ability for stakeholders in the organization to answer their own business questions with data, is a top initiative for nearly every data leader I’ve spoken to this year. It’s so foundational to creating a data-driven organization, that most of the questions surrounding it focus on the “when” rather than the “why.” That’s why we were surprised to hear it became such a passionate debate at our New York IMPACT event , which included some of the area’s top data leaders w
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
Today, much work goes into creating every product we use, whether the kitchenware or the branded watches we wear. Throughout the product life cycle, from product creation to customer service, it needs extra effort and meticulous work to succeed in the market today. Due to the ever-growing product market, businesses have taken it upon themselves to ensure their product establishes their value in the market.
Hi, I’m Pasha Finkelshteyn , and I’ll be your guide today through this month’s news. I’ll offer my impressions of recent developments in the data engineering space and highlight new ideas from the wider community. If you think I missed something worthwhile, catch me on Twitter and suggest a topic, link, or anything else you want to see. By the way, if you would prefer to get this monthly source of data engineering information delivered straight to your inbox each month, you can subscribe t
Within data engineering , one of the most frequent tasks is modeling data into data marts and data products. Traditionally, if a company wanted to model some data, there were two types of resources available that could accomplish this: data engineers or data analysts. However, the approach that a data engineer would take to model data vs. a data analyst looks very different.
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you
Why Do Businesses Need Analytics? Tim Berners-Lee, the inventor of the World Wide Web, once said, “Data is a precious thing, and it will last longer than the systems themselves.” Thirty-three years after the launch of the Web, his words still stand true. In fact, in 2022, data is the new diamond for businesses all across the globe. But, to tell a diamond from a stone, you need a high level of expertise and in-depth knowledge.
Generalizing things is easy for us humans, however, it can be challenging for Machine Learning models. This is where Cross-Validation comes into the picture.
Users are increasingly recognizing that data decay and temporal depreciation are major risks for businesses, consequently building solutions with low data latency, schemaless ingestion and fast query performance using SQL, such as provided by Rockset, becomes more essential. Rockset provides the ability to JOIN data across multiple collections using familiar SQL join types, such as INNER , OUTER , LEFT and RIGHT join.
As analytics professionals, we deal in data: serving ad-hoc reports on a minute’s notice, pulling queries for executives, and generally forecasting company performance across a variety of metrics. But how can we be truly successful if we don’t measure our own performance, too? In this article, we discuss six important steps to setting goals for our own data teams, from taking time for exploration to avoiding vanity stats while maintaining a constant pulse on the *actual* needs of the business.
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
In general, data people prefer the more granular over the less granular. Timestamps > dates , daily data > weekly data, etc.; having data at a more granular level always allows you to zoom in. However, you’re likely looking at your data at a somewhat zoomed-out level—weekly, monthly, or even yearly. To do that, you’re going to need a handy dandy function that helps you round out date or time fields.
With Confluent's new Bangalore office, India’s Head of HR shares her experience with Confluent’s team culture, fast growth, personal and career development, and why it’s a great place to work.
Data Science is only for persons with an IT background. It is a persistent myth that many people believe. Although it is true that some IT professionals seek to advance their skills in analytics, this field is not only open to people with a background in programming and IT. Many successful Data Scientists began their Data Science careers without prior coding knowledge or IT experience.
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m
We organize all of the trending information in your field so you don't have to. Join 37,000+ users and stay up to date on the latest articles your peers are reading.
You know about us, now we want to get to know you!
Let's personalize your content
Let's get even more personalized
We recognize your account from another site in our network, please click 'Send Email' below to continue with verifying your account and setting a password.
Let's personalize your content