Sat. Jan 16, 2021 - Fri. Jan 22, 2021

Helpful Tools for Apache Kafka Developers

Confluent

Apache Kafka® is at the core of a large ecosystem that includes powerful components, such as Kafka Connect and Kafka Streams. This ecosystem also includes many tools and utilities that […].

Kafka 130

The last (but not least) “ops” you need for your data: DataGovOps

François Nguyen

To finish the trilogy (DataOps, MLOps), let’s talk about DataGovOps, or how you can support your Data Governance initiative. The origin of the term: DataKitchen. We must give credit to Chris Bergh and his team at DataKitchen. You should visit their website; you will find incredibly good stuff there. This article was published in October 2020 with this title: “Data Governance as Code”. The idea behind it is that you should “actively promotes the safe use of data with automation

How to unit test sql transforms in dbt

Start Data Engineering

Contents: Introduction, Setup, Code, Conditional logic to read from mock input, Custom macro to test for equality, Setup environment specific test, Run ELT using dbt, Conclusion, Further reading. Introduction: With the recent advancements in data warehouses and tools like dbt, most transformations (the T of ELT) are being done directly in the data warehouse. While this provides a lot of functionality out of the box, it gets tricky when you want to test your SQL code locally before deploying to production.
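
To make the local-testing idea concrete, here is a minimal sketch in Python rather than in dbt itself. Everything in it is an assumption for illustration: a hypothetical transform (total amount per customer), a hypothetical `mock_orders` table standing in for mock input, and an in-memory SQLite database in place of a real warehouse. The `rows_equal` helper only imitates the role of an equality test; it is not dbt's macro.

```python
import sqlite3

# Hypothetical SQL transform under test: total order amount per customer.
# The {source} placeholder lets a test point it at mock input instead of
# the production table.
TRANSFORM_SQL = """
SELECT customer_id, SUM(amount) AS total_amount
FROM {source}
GROUP BY customer_id
"""


def run_transform(conn, source_table):
    """Run the transform against whichever table the caller selects."""
    return conn.execute(TRANSFORM_SQL.format(source=source_table)).fetchall()


def rows_equal(actual, expected):
    """Order-insensitive comparison of two result sets."""
    return sorted(actual) == sorted(expected)


def check_transform_with_mock_input():
    """Load mock rows, run the transform, and compare with the expected output."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mock_orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO mock_orders VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 7.5)],
    )
    actual = run_transform(conn, "mock_orders")
    expected = [(1, 15.0), (2, 7.5)]
    conn.close()
    return rows_equal(actual, expected)
```

Calling `check_transform_with_mock_input()` exercises the transform without touching production data. In a real dbt project the switch to mock input and the equality check would be expressed in models and macros; this sketch only illustrates the pattern.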

SQL 130

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 3: Productionization of ML models

Cloudera

In this last installment, we’ll discuss a demo application that uses PySpark.ML to build a classification model from training data stored in both Cloudera’s Operational Database (powered by Apache HBase) and Apache HDFS. This model is then scored and served through a simple web application. For more context, this demo is based on concepts discussed in the blog post “How to deploy ML models to production”.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Optimizing the Aural Experience on Android Devices with xHE-AAC

Netflix Tech

By Phill Williams and Vijay Gondi. Introduction: At Netflix, we are passionate about delivering great audio to our members. We began streaming 5.1 channel surround sound in 2010, Dolby Atmos in 2017, and adaptive bitrate audio in 2019. Continuing in this tradition, we are proud to announce that Netflix now streams Extended HE-AAC with MPEG-D DRC (xHE-AAC) to compatible Android Mobile devices (Android 9 and newer).

Metadata 110

Using Your Data Warehouse As The Source Of Truth For Customer Data With Hightouch

Data Engineering Podcast

Summary: The data warehouse has become the central component of the modern data stack. Building on this pattern, the team at Hightouch have created a platform that synchronizes information about your customers out to third party systems for use by marketing and sales teams. In this episode Tejas Manohar explains the benefits of sourcing customer data from one location for all of your organization to use, the technical challenges of synchronizing the data to external systems with varying APIs, and

More Trending

Digital Transformation is a Data Journey From Edge to Insight

Cloudera

Digital transformation is a hot topic for all markets and industries as it’s delivering value with explosive growth rates. Consider that Manufacturing’s Industrial Internet of Things (IIoT) was valued at $161b with an impressive 25% growth rate, the Connected Car market will be valued at $225b by 2027 with a 17% growth rate, or that in the first three months of 2020, retailers realized ten years of digital sales penetration.

Do You Need a DataOps Dojo?

DataKitchen

As DataOps activity takes root within an enterprise, managers face the question of whether to build centralized or decentralized DataOps capabilities. Centralizing analytics brings it under control, but granting analysts free rein is necessary to foster innovation and stay competitive. The beauty of DataOps is that you don’t have to choose between centralization and freedom.

What is the Business Case for Delivering a Good Customer Experience at Your Bank?

Teradata

Most banks talk about developing great customer experiences but don't understand the value that investment would deliver. Learn about the 6 key capabilities banks require to address this problem.

Banking 59

Event Streaming Across Networks and Corporate Firewalls Using PubNub and Confluent Platform

Confluent

This year’s pandemic has forced businesses all around the world to adopt a “remote-first” approach to their operations, with an emphasis on better enabling collaboration, remote work, and productivity. This […].

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Fostering community to help drive cultural change

Cloudera

2020 put on full display how humanity shows up in times of hardship. We saw everything from street celebrations to usher weary medical personnel home after long days fighting to save lives to places like food banks receiving more donations and volunteers than ever before. Some communities were harder hit than others, and we’ve seen the same in the global workplace.

Food 106

Demo: Supercharging Data Engineering with Magpie for Snowflake®

Silectis

For those using a robust analytics database, such as the Snowflake® Data Cloud, adding the power of a data engineering platform can help maximize the value you’re getting out of that database. In this demo, we’ll show you how native tools in the Magpie data engineering platform play well with Snowflake, ultimately allowing your team to do more in a centralized data engineering environment.

Digital Payments Data Drives Increased Usage and Customer Retention

Teradata

Payment data drives opportunities to increase usage & prevent attrition through hyper-segmentation, personalized interactions & optimized rewards programs. Read more.

Storing Cold Metadata, Snowflake Data Cloud, and More: Top 10 Links From Across the Web

Data Council

Here's our January 2021 roundup of links from across the web that could be relevant to you: 1. Storing Cold Metadata with Alki (Dropbox): Dropbox shared insights into Alki, the petabyte-scale metadata store it designed for infrequently accessed metadata (“cold data”). The post details how the one-size-fits-all database Edgestore was reaching capacity limits, and why audit logs were a good candidate to be moved off costly SSDs.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

Do you need faster time to value? Does your organization’s success depend on immediate delivery of new reports, applications, or projects? When you go to Central IT for support, are you blocked by insanely long wait times for the resources needed to meet your business goals? If so – you are likely one of the growing group of Line of Business (LoB) professionals forced into creating your own solution – creating your own Shadow IT.

IT 96

Hepta Analytics Microsoft Silver Partner

Hepta Analytics

Hepta Analytics is proud to announce that we have attained Silver Status within the Microsoft Partner Network! This achievement means that we have demonstrated our proven expertise in delivering quality solutions in one or more specialized areas of business (namely Cloud Platform and, in future, Data Analytics and Security). Microsoft competencies are designed to prepare companies to meet their customers’ needs, and to help attract new customers who are looking for Microsoft-certified sol

Defer Transaction Side-Effects in Node.js

Grouparoo

At Grouparoo, we use Actionhero as our Node.js API server and Sequelize for our Object Relational Mapping (ORM) tool - making it easy to work with complex records from our database. Within our Actions and Tasks, we often want to treat the whole execution as a single database transaction - either all the modifications to the database will succeed or fail as a unit.
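
The general idea of deferring side-effects until a transaction commits can be sketched as follows. This is a Python illustration under stated assumptions, not Grouparoo's Node.js implementation: `DeferredTransaction`, `defer`, and `demo` are hypothetical names, and SQLite stands in for the real database.

```python
import sqlite3


class DeferredTransaction:
    """Hypothetical helper: hold side-effects until the transaction commits."""

    def __init__(self, conn):
        self.conn = conn
        self._after_commit = []

    def defer(self, callback):
        """Register a side-effect (e.g. enqueueing a task) to run after commit."""
        self._after_commit.append(callback)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # The unit of work failed: undo the writes and drop the side-effects.
            self.conn.rollback()
            self._after_commit.clear()
            return False  # re-raise the original exception
        self.conn.commit()
        # Run side-effects only once the writes are durable.
        for callback in self._after_commit:
            callback()
        self._after_commit.clear()
        return False


def demo():
    """Run one successful and one failing transaction; return what happened."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    sent = []

    with DeferredTransaction(conn) as tx:
        conn.execute("INSERT INTO users VALUES ('ada')")
        tx.defer(lambda: sent.append("welcome:ada"))

    try:
        with DeferredTransaction(conn) as tx:
            conn.execute("INSERT INTO users VALUES ('bob')")
            tx.defer(lambda: sent.append("welcome:bob"))
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass

    count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    conn.close()
    return sent, count
```

In `demo()`, the second transaction is rolled back, so neither its row nor its deferred side-effect survives. Keeping the callbacks in a list until commit is one simple design choice; a production system would also need to decide what happens when a side-effect itself fails.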

Head Pose Estimation with Computer Vision

InData Labs

Recently, head pose estimation has become a popular area of research. Data scientists have spent over 20 years researching the most effective approaches to it, yet haven’t settled on one. The technology is needed for facial recognition, eye gaze estimation and emotion recognition. For instance, it can be used for safety monitoring on the road, […]. The post Head Pose Estimation with Computer Vision first appeared on InData Labs.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

How to configure clients to connect to Apache Kafka Clusters securely – Part 3: PAM authentication

Cloudera

In the previous posts in this series, we have discussed Kerberos and LDAP authentication for Kafka. In this post, we will look into how to configure a Kafka cluster to use a PAM backend instead of an LDAP one. The examples shown here will highlight the authentication-related properties in bold font to differentiate them from other required security properties, as in the example below.

Kafka 81

How Does UX Design Help in Visualizing Big Data?

Teradata

Learn about the UX principles that help in designing effective Big Data visualizations so users can better understand data and make more informed decisions.

Better to Be Wrong Than Vague: Apache Kafka and Data Architecture Predictions for 2021

Confluent

On a recent episode of Streaming Audio, Gwen Shapira, Michael Noll, and Ben Stopford joined me to hold forth about the near future of Apache Kafka® and software architecture in […].

Kafka 45

Creating a uniform landscape for macOS Software

Zalando Engineering

At the time of this writing, we have a universe of Mac applications — that are identified and version-inventoried — within the fleet of little over 3,000 Mac devices in Zalando from which a subset — selected either by their importance, frequency of updates or size of the install base — are part of a so-called software lifecycle. However, in July 2019, when a vulnerability was discovered in Zoom (long before becoming the mainstream video conference app during the COVID-19 pandemic), Information S

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Cloudera Flow Management Continuous Delivery while Minimizing Downtime

Cloudera

Cloudera Flow Management, based on Apache NiFi and part of the Cloudera DataFlow platform, is used by some of the largest organizations in the world to facilitate an easy-to-use, powerful, and reliable way to distribute and process data at high velocity in the modern big data ecosystem. Increasingly, customers are adopting CFM to accelerate their enterprise streaming data processing from concept to implementation.

Elasticsearch or Rockset for Real-Time Analytics: Managing Clusters vs Going Serverless

Rockset

Having the right analytics backend for your real-time application makes all the difference when it comes to how much time your team spends managing and maintaining the underlying infrastructure. Today, distributed systems that used to require a lot of manual intervention can often be replaced by more operationally efficient solutions. One example of this evolution is the move from Elasticsearch —which has been a great open-source, full-text search and analytics engine—to a low-ops alternative in

How to Build a Successful Cloud DataOps Program

DataKitchen

The post How to Build a Successful Cloud DataOps Program first appeared on DataKitchen.

2020 Visual Recap of the Apache Superset Project

Preset

The Apache Superset project experienced a critical growth period in 2020 in all aspects. In this post, I'll document how the key facets of the project changed last year.

Project 40

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Finding digital transformation in high places – how a ski resort improved operational agility and customer experiences

Cloudera

Most blogs in my history are very focused on Industry 4.0’s digital transformation of the manufacturing industry, which in itself is pretty remarkable. By 2025, Industry 4.0 is expected to generate more than $11 trillion in economic value as connected manufacturing processes, operations and their supply chains become more streamlined, efficient and agile, and realize improved productivity, uptime and product quality.

Cloudera Cares Speaker Series guiding value: Diversity

Cloudera

With intention and creativity, we opened eyes and minds. What now seems like a lifetime ago, our worlds were upended. As the stay at home orders were extended again and again and we continued to work from home, many of us were faced with reimagining our work. For me, an unexpected challenge as head of Cloudera Cares has been redesigning the employee volunteer experience to continue engaging Clouderans even when in-person activities were no longer possible.

Apache Superset 1.0 is out!

Preset

The best Superset release to date is finally out
