Sat. Apr 24, 2021 - Fri. Apr 30, 2021

How to Survive a Kafka Outage

Confluent

There is a class of applications that cannot afford to be unavailable—for example, external-facing entry points into your organization. Typically, anything your customers interact with directly cannot go down. As […].

Kafka 134

People in Data (my favorite for Q1-2021): Taylor Brownlow (Head of data @ Count)

François Nguyen

This is my second article on “Why do you find Data so interesting after all these years?” and my answer is always “it is not about the subject, it is about the people”. A distinctive and instantly recognizable style: I was reading the article “Is the Tableau Era Coming to an End?” with no author shown, and long before the conclusion I was telling myself, “looks like an article from Taylor Brownlow”. It is clearly not easy with so many authors on the Data topic to have a dist

BI 130

Writing memory efficient data pipelines in Python

Start Data Engineering

Contents: Introduction; 1. Using generators; Using generator expression; Using generator yield; Mini batching; Reading in batches from a database; Pros & Cons; 2. Using distributed frameworks; Pros & Cons; Conclusion; Further reading; References. Introduction: If you are wondering how to write memory-efficient data pipelines in Python, or working with a dataset that is too large to fit into memory, then this post is for you.
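
As a rough illustration of the generator and mini-batching ideas named in this teaser, here is a minimal Python sketch. It is not taken from the linked post: `read_rows`, `mini_batches`, and `batch_totals` are hypothetical names, and the integer rows stand in for whatever records a real pipeline would stream from a file or database.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_rows(n: int) -> Iterator[int]:
    """Hypothetical source: yields one row at a time instead of building a full list."""
    for i in range(n):
        yield i


def mini_batches(rows: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Group a stream of rows into lists of at most batch_size items."""
    it = iter(rows)
    while True:
        # islice pulls only the next batch_size rows from the stream.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def batch_totals(n: int, batch_size: int) -> List[int]:
    """Process each batch independently; only one batch is held in memory at a time."""
    return [sum(batch) for batch in mini_batches(read_rows(n), batch_size)]
```

Because both functions are generators, peak memory depends on `batch_size` and not on the total number of rows.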

Self Service Data Exploration And Dashboarding With Superset

Data Engineering Podcast

Summary The reason for collecting, cleaning, and organizing data is to make it usable by the organization. One of the most common and widely used methods of access is through a business intelligence dashboard. Superset is an open source option that has been gaining popularity due to its flexibility and extensible feature set. In this episode Maxime Beauchemin discusses how data engineers can use Superset to provide self service access to data and deliver analytics.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Announcing ksqlDB 0.17.0

Confluent

We’re excited to announce ksqlDB 0.17, a big release for 2021. This version adds support for managing the lifecycle of your queries from CI servers, a first-class timestamp data type, […].

DataOps Enables Your Data Fabric

DataKitchen

Industry analysts who follow the data and analytics industry tell DataKitchen that they are receiving inquiries about “data fabrics” from enterprise clients on a near-daily basis. Forrester relates that out of 25,000 reports published by the firm last year, the report on data fabrics and DataOps ranked in the top ten for downloads in 2020. Gartner included data fabrics in their top ten trends for data and analytics in 2019.

More Trending

Making Customer Experience Your Competitive Advantage

Teradata

Customers expect organizations to know them, provide relevant & personalized experiences, and be good stewards of their data. Yet many businesses still struggle with this. Why?

Data 91

How to Maximize Your Time at Kafka Summit Europe 2021

Confluent

This past year has offered little in the way of normalcy for pretty much everyone outside of New Zealand and Taiwan. Rising to the occasion, conference organizers have put together […].

Kafka 81

7 Things that Make SQLite Unique and Awesome

Grouparoo

I became very close with SQLite in the few weeks it took me to build out Grouparoo's SQLite plugin. Through that process I came to find that SQLite is not like the others. It has a handful of quirks, caveats, and gotchas when compared to other databases like MySQL and PostgreSQL. Here are seven of those quirks that I find most interesting: 1. SQLite is serverless. SQLite doesn't require a separate process to run, as other databases do.
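
To make the "serverless" point concrete, the sketch below uses Python's standard-library `sqlite3` module, which runs the database engine inside the calling process. It is an illustration only, not code from the Grouparoo plugin; the `users` table and the `demo` function are hypothetical.

```python
import sqlite3


def demo() -> list:
    """Create, fill, and query a database without starting any server process."""
    # ":memory:" keeps the database in RAM; a file path would work the same way.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany(
            "INSERT INTO users (name) VALUES (?)",
            [("ada",), ("grace",)],
        )
        conn.commit()
        return conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    finally:
        conn.close()
```

There is no host, port, or daemon to configure: the connection target is just a file path or an in-memory database.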

Converting HBase ACLs to Ranger policies

Cloudera

CDP uses Apache Ranger for data security management. If you wish to use Ranger for centralized security administration, HBase ACLs need to be migrated to policies. This can be done via the Ranger web UI, accessible from Cloudera Manager. But first, let’s take a quick look at the HBase method for access control. HBase Authorization: If authorization is set up (for example with Kerberos and setting the hbase.security.authorization property to true), users can have rules defined on r

Finance 97

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Monte Carlo and Snowflake partner to help organizations achieve more trustworthy data

Monte Carlo

Monte Carlo, the data reliability company, today announced a partnership with Snowflake, the Data Cloud company, to help data teams trust their data and accelerate the adoption of analytics in the Data Cloud. This combination can provide Snowflake customers with end-to-end Data Observability across their entire Snowflake Data Cloud, from ingestion to analytics.

The Techniques and Technologies Bringing Agility to Enterprise Data

DataKitchen

Cats Effect 3: Racing IOs Explained

Rock the JVM

Following the introduction to concurrency in Cats Effect: explore advanced techniques for managing racing IOs and fibers

The New Releases of Apache NiFi in Public Cloud and Private Cloud

Cloudera

Cloudera released a lot of things around Apache NiFi recently! We just released Cloudera Flow Management (CFM) 2.1.1 that provides Apache NiFi on top of Cloudera Data Platform (CDP) 7.1.6. This major release provides the latest and greatest of Apache NiFi as it includes Apache NiFi 1.13.2 and additional improvements, bug fixes, components, etc. Cloudera also released CDP 7.2.9 on all three major cloud platforms, and it also brings Flow Management on DataHub with Apache NiFi 1.13.2 and more.

Cloud 78

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Apple Migration Tips for M1 Macs

Grouparoo

Last week, I upgraded to an M1 MacBook Pro. I got it configured for development and 48 hours later, through a series of unfortunate events and hardware failure, I ended up with a second M1 MacBook Pro instead. The transition between computers wasn’t too bad thanks to Apple’s Migration Assistant. I ran into an interesting situation, though. About 90% of the migration worked as expected or better, but the other 10% presented some puzzling blockers.

A Guide to DataOps Tests

DataKitchen

CFO Analytics – Machine Learning

Teradata

Once data issues are solved, we can focus on driving value leveraging analytics. Machine learning is a hot topic with CFOs today, but is it the right tool for CFO Analytics?

Cable Companies Are Growing Up

Cloudera

Cable and satellite companies in the US have emerged from a decade of acquisitions, consolidation and shakeout and are beginning to assert themselves as full-service providers in the communications and media space. With Comcast just announcing its new suite of cellphone plans this month, and Charter, Altice and Dish ramping up their offerings, the Big Three in wireless (AT&T, Verizon and T-Mobile/Sprint) are looking over their shoulders.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Delivering End-to-End Data Trust with Snowflake and Monte Carlo

Monte Carlo

As companies increasingly leverage data-driven insights to drive innovation and maintain their competitive edge, it’s important that this data is accurate and reliable. With Monte Carlo and Snowflake’s strategic partnership, teams can finally trust their data through end-to-end Data Observability and automated lineage of their entire data ecosystem, all the way down to field-level values.

Cloud Migration Series (Step 1 of 5): Define Your Strategy

Cloud Academy

This is part 1 of a 5-part series on best practices for enterprise cloud migration. Released weekly from the end of April to the end of May 2021, each article will cover a new phase of a business’s transition to the cloud, what to be on the lookout for, and how to ensure the journey is a success. If you’ve already locked in your strategy, have a look at what you should do next.

Cloud 40

Leading Design as a UX Team of 1

Rockset

"Rockset is like magic. Holy carp making query lambdas and then using them on the server is so much easier than Mongo. True story. This is my first time in here. My coworker made our previous query lambda. I'm sold now! It's soooo fast!" Customer love notes keep me going. It is like the wind beneath my wings. I helped build query lambdas and I am so proud of the work we did - the value we created for our customers.

Managing Python dependencies for Spark workloads in Cloudera Data Engineering

Cloudera

Apache Spark is now widely used in many enterprises for building high-performance ETL and Machine Learning pipelines. If users are already familiar with Python, PySpark provides a Python API for using Apache Spark. When users work with PySpark, they often use existing and/or custom Python packages in their programs to extend and complement Apache Spark’s functionality.

Python 65

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Github Actions and AWS Fargate - Github Actions ECR

Preset

How the Apache Superset™ project uses AWS Fargate and GitHub Actions for ephemeral test environments.

AWS 40

The Journey to DataOps Success: Key Takeaways from Transformation Trailblazers

DataKitchen

In early April 2021, DataKitchen sat down with Jonathan Hodges, VP Data Management & Analytics at Workiva; Chuck Smith, VP of R&D Data Strategy at GlaxoSmithKline (GSK); and Chris Bergh, CEO and Head Chef at DataKitchen, to find out about their enterprise DataOps transformation journeys, including key successes and lessons learned. You can listen to the entire conversation here or read the summary below.

Flattening a JSON Object So It’s Queryable Using Rockset

Rockset

Many developers use NoSQL databases in order to ingest unstructured and schemaless data. When it comes to understanding the data by writing queries that join, aggregate, and search, it becomes more challenging. This is where Rockset becomes a great partner not only in understanding your unstructured data but in returning queries that join, aggregate, and search within milliseconds at scale.

MongoDB 40

How Airbnb Achieved Metric Consistency at Scale

Airbnb Tech

Part-I: Introducing Minerva — Airbnb’s Metric Platform. By: Amit Pahwa, Cristian Figueroa, Donghan Zhang, Haim Grosman, John Bodley, Jonathan Parks, Maggie Zhu, Philip Weiss, Robert Chang, Shao Xie, Sylvia Tomiyama, Xiaohui Sun. Data is the voice of our users at scale. In the midst of the COVID-19 pandemic, we saw that travel with Airbnb has become hyper-local.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

How Apache Superset™ Supports Real-Time Analytics

Preset

At the core of the Apache Superset™ project is real-time analytics. In this post, we showcase the key features that support real-time analytics.

Project 40