Sat. Apr 24, 2021 - Fri. Apr 30, 2021

How to Survive a Kafka Outage

Confluent

There is a class of applications that cannot afford to be unavailable—for example, external-facing entry points into your organization. Typically, anything your customers interact with directly cannot go down. As […].

Writing memory efficient data pipelines in Python

Start Data Engineering

If you are wondering how to write memory-efficient data pipelines in Python, or working with a dataset that is too large to fit into memory, then this post is for you. It covers two approaches, with pros and cons for each: using generators (generator expressions, generator functions with yield, mini-batching, and reading in batches from a database) and using distributed frameworks.
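
As a rough illustration of the generator and mini-batching ideas named in the excerpt above, here is a minimal sketch. It is not code from the linked post; `read_rows`, `mini_batches`, and `batch_totals` are hypothetical names, and the in-memory `range` stands in for a real file or database cursor.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_rows(n: int) -> Iterator[dict]:
    """Hypothetical source: yields one row at a time instead of building a full list."""
    for i in range(n):
        yield {"id": i, "value": i * 2}


def mini_batches(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group a row stream into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))  # pulls only `size` rows into memory
        if not batch:
            return
        yield batch


def batch_totals(n: int, size: int) -> List[int]:
    """Process the stream batch by batch; only one batch is held at a time."""
    return [sum(row["value"] for row in batch)
            for batch in mini_batches(read_rows(n), size)]


if __name__ == "__main__":
    print(batch_totals(5, 2))
```

Because both functions are generators, peak memory depends on the batch size rather than on the total number of rows.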

Self Service Data Exploration And Dashboarding With Superset

Data Engineering Podcast

Summary: The reason for collecting, cleaning, and organizing data is to make it usable by the organization. One of the most common and widely used methods of access is through a business intelligence dashboard. Superset is an open source option that has been gaining popularity due to its flexibility and extensible feature set. In this episode, Maxime Beauchemin discusses how data engineers can use Superset to provide self-service access to data and deliver analytics.

DataOps Enables Your Data Fabric

DataKitchen

Industry analysts who follow the data and analytics industry tell DataKitchen that they are receiving inquiries about “data fabrics” from enterprise clients on a near-daily basis. Forrester relates that out of 25,000 reports published by the firm last year, the report on data fabrics and DataOps ranked in the top ten for downloads in 2020. Gartner included data fabrics in their top ten trends for data and analytics in 2019.

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Announcing ksqlDB 0.17.0

Confluent

We’re excited to announce ksqlDB 0.17, a big release for 2021. This version adds support for managing the lifecycle of your queries from CI servers, a first-class timestamp data type, […].

#ClouderaLife Spotlight: Suzy Tonini, Talent Researcher

Cloudera

As we continue to work toward diversity, equality, and inclusion in every aspect of our company culture and beyond, we’ve learned so much from our employees’ unique perspectives on allyship. One such employee is Suzy Tonini, a Talent Researcher with a globe-trotting childhood. Growing up with parents who worked for the U.S. State Department, Suzy had the opportunity to hop from country to country with her family, experiencing a variety of cultures.

More Trending

The Journey to DataOps Success: Key Takeaways from Transformation Trailblazers

DataKitchen

In early April 2021, DataKitchen sat down with Jonathan Hodges, VP Data Management & Analytics at Workiva; Chuck Smith, VP of R&D Data Strategy at GlaxoSmithKline (GSK); and Chris Bergh, CEO and Head Chef at DataKitchen, to find out about their enterprise DataOps transformation journeys, including key successes and lessons learned. You can listen to the entire conversation here or read the summary below.

How to Maximize Your Time at Kafka Summit Europe 2021

Confluent

This past year has offered little in the way of normalcy for pretty much everyone outside of New Zealand and Taiwan. Rising to the occasion, conference organizers have put together […].

Converting HBase ACLs to Ranger policies

Cloudera

CDP uses Apache Ranger for data security management. If you wish to use Ranger for centralized security administration, HBase ACLs need to be migrated to policies. This can be done via the Ranger web UI, accessible from Cloudera Manager. But first, let’s take a quick look at HBase's method for access control. HBase Authorization: If authorization is set up (for example, with Kerberos and by setting the hbase.security.authorization property to true), users can have rules defined on r

How Airbnb Achieved Metric Consistency at Scale

Airbnb Tech

Part-I: Introducing Minerva, Airbnb’s Metric Platform. By: Amit Pahwa, Cristian Figueroa, Donghan Zhang, Haim Grosman, John Bodley, Jonathan Parks, Maggie Zhu, Philip Weiss, Robert Chang, Shao Xie, Sylvia Tomiyama, Xiaohui Sun. Data is the voice of our users at scale. In the midst of the COVID-19 pandemic, we saw that travel with Airbnb has become hyper-local.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

7 Things that Make SQLite Unique and Awesome

Grouparoo

I became very close with SQLite in the few weeks it took me to build out Grouparoo's SQLite plugin. Through that process I came to find that SQLite is not like the others. It has a handful of quirks, caveats, and gotchas when compared to other databases like MySQL and PostgreSQL. Here are seven of those quirks that I find most interesting: 1. SQLite is serverless. SQLite doesn't require a separate process to run, as other databases do.
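
To make the "serverless" point concrete, here is a minimal sketch using Python's standard-library `sqlite3` module, where the database engine runs inside the calling process. This is an illustration only, not code from the Grouparoo plugin; the `users` table and the `demo` function are hypothetical.

```python
import sqlite3


def demo() -> list:
    """Create, fill, and query a database with no server process involved."""
    # ":memory:" keeps the database in RAM; a file path would persist it to disk.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany(
            "INSERT INTO users (name) VALUES (?)",
            [("ada",), ("grace",)],
        )
        conn.commit()
        return conn.execute("SELECT name FROM users ORDER BY id").fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

There is no host, port, or daemon to configure: opening a connection is just opening a file (or, here, a block of memory).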

Monte Carlo and Snowflake partner to help organizations achieve more trustworthy data

Monte Carlo

Monte Carlo, the data reliability company, today announced a partnership with Snowflake, the Data Cloud company, to help data teams trust their data and accelerate the adoption of analytics in the Data Cloud. This combination can provide Snowflake customers with end-to-end Data Observability across their entire Snowflake Data Cloud, from ingestion to analytics.

The New Releases of Apache NiFi in Public Cloud and Private Cloud

Cloudera

Cloudera released a lot of things around Apache NiFi recently! We just released Cloudera Flow Management (CFM) 2.1.1 that provides Apache NiFi on top of Cloudera Data Platform (CDP) 7.1.6. This major release provides the latest and greatest of Apache NiFi as it includes Apache NiFi 1.13.2 and additional improvements, bug fixes, components, etc. Cloudera also released CDP 7.2.9 on all three major cloud platforms, and it also brings Flow Management on DataHub with Apache NiFi 1.13.2 and more.

The Techniques and Technologies Bringing Agility to Enterprise Data

DataKitchen

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Cats Effect 3: Racing IOs Explained

Rock the JVM

Following the introduction to concurrency in Cats Effect: explore advanced techniques for managing racing IOs and fibers

Apple Migration Tips for M1 Macs

Grouparoo

Last week, I upgraded to an M1 MacBook Pro. I got it configured for development and 48 hours later, through a series of unfortunate events and hardware failure, I ended up with a second M1 MacBook Pro instead. The transition between computers wasn’t too bad thanks to Apple’s Migration Assistant. I ran into an interesting situation, though. About 90% of the migration worked as expected or better, but the other 10% presented some puzzling blockers.

Cable Companies Are Growing Up

Cloudera

Cable and satellite companies in the US have emerged from a decade of acquisitions, consolidation and shakeout and are beginning to assert themselves as full-service providers in the communications and media space. With Comcast just announcing its new suite of cellphone plans this month, and Charter, Altice and Dish ramping up their offerings, the Big Three in wireless – AT&T, Verizon and T-Mobile/Sprint – are looking over their shoulders.

A Guide to DataOps Tests

DataKitchen


How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

CFO Analytics – Machine Learning

Teradata

Once data issues are solved, we can focus on driving value leveraging analytics. Machine learning is a hot topic with CFOs today, but is it the right tool for CFO Analytics?

Delivering End-to-End Data Trust with Snowflake and Monte Carlo

Monte Carlo

As companies increasingly leverage data-driven insights to drive innovation and maintain their competitive edge, it’s important that this data is accurate and reliable. With Monte Carlo and Snowflake’s strategic partnership, teams can finally trust their data through end-to-end Data Observability and automated lineage of their entire data ecosystem, all the way down to field-level values.

Managing Python dependencies for Spark workloads in Cloudera Data Engineering

Cloudera

Apache Spark is now widely used in many enterprises for building high-performance ETL and Machine Learning pipelines. For users who are already familiar with Python, PySpark provides a Python API for Apache Spark. When users work with PySpark, they often use existing and/or custom Python packages in their programs to extend and complement Apache Spark’s functionality.

Cloud Migration Series (Step 1 of 5): Define Your Strategy

Cloud Academy

This is part 1 of a 5-part series on best practices for enterprise cloud migration. Released weekly from the end of April to the end of May 2021, each article will cover a new phase of a business’s transition to the cloud, what to be on the lookout for, and how to ensure the journey is a success. If you’ve already locked in your strategy, have a look at what you should do next.

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you

Flattening a JSON Object So It’s Queryable Using Rockset

Rockset

Many developers use NoSQL databases in order to ingest unstructured and schemaless data. When it comes to understanding the data by writing queries that join, aggregate, and search, it becomes more challenging. This is where Rockset becomes a great partner not only in understanding your unstructured data but in returning queries that join, aggregate, and search within milliseconds at scale.

GitHub Actions and AWS Fargate - GitHub Actions ECR

Preset

How the Apache Superset™ project uses AWS Fargate and GitHub Actions for ephemeral test environments.

People in Data (my favorite for Q1-2021) : Taylor Brownlow (Head of data @ Count)

François Nguyen

This is my second article on “Why do you find data so interesting after all these years?”, and my answer is always “it is not about the subject, it is about the people”. A distinctive and instantly recognizable style: I was reading the article “Is the Tableau Era Coming to an End?”, which listed no author, and long before the conclusion I was telling myself, “looks like an article from Taylor Brownlow”. It is clearly not easy with so many authors on the Data topic to have a dist

How Apache Superset™ Supports Real-Time Analytics

Preset

At the core of the Apache Superset™ project is real-time analytics. In this post, we showcase the key features that support real-time analytics.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Leading Design as a UX Team of 1

Rockset

"Rockset is like magic. Holy carp making query lambdas and then using them on the server is so much easier than Mongo. True story. This is my first time in here. My coworker made our previous query lambda. I'm sold now! It's soooo fast!" Customer love notes keep me going. It is like the wind beneath my wings. I helped build query lambdas and I am so proud of the work we did - the value we created for our customers.