Sat. Sep 12, 2020 - Fri. Sep 18, 2020

Streaming Data from Apache Kafka into Azure Data Explorer with Kafka Connect

Confluent

Near-real-time insights have become a de facto requirement for Azure use cases involving scalable log analytics, time series analytics, and IoT/telemetry analytics. Azure Data Explorer (also called Kusto) is the […].

Kafka 139

Fundamentals for Success in Cloud Data Management

Cloudera

Everybody needs more data and more analytics, with so many different and sometimes conflicting needs. Data engineers need batch resources, while data scientists need to quickly onboard ephemeral users. Data architects deal with constantly evolving workloads, and business analysts must balance the urgency and importance of a concurrent user population that continues to grow.

Distributed In Memory Processing And Streaming With Hazelcast

Data Engineering Podcast

Summary: In-memory computing provides significant performance benefits, but brings along challenges for managing failures and scaling up. Hazelcast is a platform for managing stateful in-memory storage and computation across a distributed cluster of commodity hardware. On top of this foundation, the Hazelcast team has also built a streaming platform for reliable high-throughput data transmission.

Process 100

Analytics at Netflix: Who we are and what we do

Netflix Tech

An introduction to Analytics and Visualization Engineering at Netflix, by Molly Jackman & Meghana Reddy. Across nearly every industry, there is recognition that data analytics is key to driving informed business decision-making. But there is far less agreement on what that term “data analytics” actually means.

BI 97

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Using the Fully Managed MongoDB Atlas Connector in a Secure Environment

Confluent

Since the MongoDB Atlas source and sink became available in Confluent Cloud, we’ve received many questions around how to set up these connectors in a secure environment. By default, MongoDB […].

MongoDB 97

Introducing CDP Data Engineering: Purpose Built Tooling For Accelerating Data Pipelines

Cloudera

For enterprise organizations, managing and operationalizing increasingly complex data across the business has presented a significant challenge for staying competitive in analytic and data science driven markets. With growing disparate data across everything from edge devices to individual lines of business needing to be consolidated, curated, and delivered for downstream consumption, it’s no wonder that data engineering has become the most in-demand role across businesses — growing at an estima

More Trending

How Our Paths Brought Us to Data and Netflix

Netflix Tech

Part of our series on who works in Analytics at Netflix, and what the role entails, by Julie Beckley & Chris Pham. This Q&A provides insights into the diverse set of skills, projects, and culture within Data Science and Engineering (DSE) at Netflix through the eyes of two team members: Chris Pham and Julie Beckley.

Confluent Is Now Certified Ready on AWS Outposts

Confluent

Are you looking for a way to run AWS services on premises in your own datacenter? I am excited to share today that we have completed validation of support for […].

AWS 72

Announcing the 2020 Data Impact Awards Finalists

Cloudera

Announcing the finalists of the Data Impact Awards is always a highlight in our annual Cloudera calendar, and this year is no different. The 2020 entrants have shown incredible data-driven innovation and problem-solving ability, and have proven real-world impact. Our independent judges certainly had their jobs cut out for them, as they were faced with an overwhelming number of outstanding entries.

Banking 103

How to develop digital products and solutions for industrial environments?

Data Science Blog: Data Engineering

The Data Science and Engineering Process in PLM. Huge opportunities for digital products are accompanied by huge risks. Digitalization is about to profoundly change the way we live and work. The increasing availability of data, combined with growing storage capacities and computing power, makes it possible to create data-based products, services, and customer-specific solutions that create insight with value for the business.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Lenses, Prisms, and Optics in Scala

Rock the JVM

Inspect, extract, and modify deeply nested data structures in Scala with ease: discover a powerful method to handle complex data effortlessly

Scala 52

Celebrating Hispanic Heritage Month

Teradata

Hispanic Heritage Month not only promotes the rich culture & heritage that so many Americans share, it sheds a distinct light on colleagues & friends. Read more from our colleague, Crystal Diaz.

IT 52

Covid-19 Accelerates The Need for Retail, Manufacturing Supply Chains To Adapt – Part 2

Cloudera

Supply chain redesign has become an area of critical focus as businesses try to ensure supply chains aren’t overly reliant on any one constituent part. In Part One of our round-table discussion with Michael Ger, Managing Director of Manufacturing and Automotive, and Brent Biddulph, Global Managing Director, Retail and Consumer Goods at Cloudera, they talked about the direct impact of COVID-19 on the retail and manufacturing sectors with Vijay Raja, Director of Industry & Solutions Marketing

Don't Track Product Performance with Events

Grouparoo

Many businesses have built great analytics products to help with tracking the actions your users are taking in your product (Mixpanel, Pendo, and Amplitude, to name a few). These products use an events-based data model where they track user behavior, usually client-side, so you can understand and visualize behavior like page views and button clicks.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Data Observability Tools: Data Engineering’s Next Frontier

Monte Carlo

To keep pace with data’s lightning innovation speed, data engineers need to invest not only in the latest data modeling and analytics tools, but also technologies that can increase data accuracy and prevent broken ETL pipelines. The solution? Data observability tools, the next frontier of data engineering and a pillar of the emerging data reliability category.

Clean Up Your Enterprise Data Mess the Easy Way: Ignore it

Teradata

If you’re responsible for data strategy in a large organization, there’s a good chance you’ve got a data mess on your hands. So what do you do with it? Read more.

IT 52

How to get powerful and actionable insights from any and all of your data, without delay

Cloudera

Today’s data tool challenges. A North American telecom company struggled for years trying to react quickly enough to new categories and new levels of spam texts and calls. They also did not have a good way to know when and why they would need additional capacity on their own, or any other telecom company’s networks. By enabling their event analysts to monitor and analyze events in real time, as well as directly in their data visualization tool, and also rate and give feedback to the system

Akka HTTP Loves JSON: 3 Libraries You Can Integrate into Akka HTTP

Rock the JVM

Akka HTTP needs JSON like humans need water: discover how to integrate Spray-Json, circe, and Jackson into Akka HTTP

52

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Monte Carlo Raises $16M to Build the World’s First Data Reliability Platform

Monte Carlo

We’re excited to share that Monte Carlo has raised $16M in funding to pioneer the Data Reliability category. Our Series A was led by Accel, with participation from GGV Capital, and enables us to pursue our mission of accelerating the world’s adoption of data by reducing Data Downtime. Other angel investors include DJ Patil, the former Chief Data Scientist for the U.S., as well as top executives from Cloudera, eBay, Google and VMware.

#CloudGuruChallenge – Event-Driven Python on AWS

A Cloud Guru: Data Engineering

You can complete the project requirements by yourself or in collaboration with others. Feel free to ask questions in the discussion forum or on social media using the #CloudGuruChallenge hashtag! The post #CloudGuruChallenge – Event-Driven Python on AWS appeared first on A Cloud Guru.

AWS 52

Access control for Azure ADLS cloud object storage

Cloudera

Cloudera Data Platform 7.2.1 introduces fine-grained authorization for access to Azure Data Lake Storage using Apache Ranger policies. Cloudera and Microsoft have been working together closely on this integration, which greatly simplifies the security administration of access to ADLS-Gen2 cloud storage. Apache Ranger provides a centralized console to manage authorization and view audits of access to resources in a large number of services including Apache Hadoop’s HDFS, Apache Hive, Apache HBase

Leveraging Teradata Vantage's Superior Performance for Real-Time Analytics

Teradata

Learn how Teradata Vantage was used for a leading Turkish bank to predict credit scores for customers in real time and to make near-immediate decisions on their loan applications.

Banking 52

Prepare Now: 2025’s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Save your High Water Marks as Strings

Grouparoo

In Brian’s post, Building a Sync Engine, he talks about the value of using a High Water Mark to keep track of the latest bit of data you’ve imported. This approach is often a better pattern than using Limit and Offset, especially when the underlying data might be changing. In this post, I’m going to dive even deeper into this topic, and suggest that you should be storing your High Water Marks as strings whenever possible.

MySQL 52

Rockset: 1 Billion Events in a Day with 1-Second Data Latency

Rockset

YADB (Yet Another Database Benchmark). The world has a plethora of database benchmarks, starting with the Wisconsin Benchmark, which is my favorite. Firstly, that benchmark was from Dr. David DeWitt, who taught me Database Internals when I was a graduate student at the University of Wisconsin. Secondly, it is probably the earliest conference paper (circa 1983) that I ever read.

Addressing the data storm with the Enterprise Data Cloud

Cloudera

For some, this may look like a new category at this year’s Data Impact Awards. However, the Enterprise Data Cloud category marks the evolution of what was once the Data Anywhere category. The main reason for this change is that this title better represents the move that our customers are making, away from acknowledging the ability to have data ‘anywhere’.

Cloud 68