Sat, Jun 19, 2021 - Fri, Jun 25, 2021

Designing a Data Project to Impress Hiring Managers

Start Data Engineering

Contents: Introduction; Objective; Setup; Pre-requisites; Project (1. ETL Code; 2. Test; 3. Scheduler; 4. Presentation: 4.1. Formatting, Linting, and Type checks, 4.2. Architecture Diagram, 4.3. README.md; 5. Adding Dashboard to your Profile); Future Work; Tear down infra; Conclusion; Further Reading; References.

Building a data project for your portfolio is hard. Getting hiring managers to read through your GitHub code is even harder.

Saxo Bank’s Best Practices for a Distributed Domain-Driven Architecture Founded on the Data Mesh

Confluent

Al data til folket (all data to the people) is a compelling proposition in an enterprise context. Yet the ability to quickly address integration challenges and deliver data to those […].

Efficient and Reliable Compute Cluster Management at Scale

Uber Engineering

Introduction. Uber relies on a containerized microservice architecture. Our need for computational resources has grown significantly over the years as a consequence of the business's growth. It is now an important goal to increase the efficiency of our computing resources. Broadly … The post Efficient and Reliable Compute Cluster Management at Scale appeared first on Uber Engineering Blog.

Lessons Learned From The Pipeline Data Engineering Academy

Data Engineering Podcast

Summary Data Engineering is a broad and constantly evolving topic, which makes it difficult to teach in a concise and effective manner. Despite that, Daniel Molnar and Peter Fabian started the Pipeline Academy to do exactly that. In this episode they reflect on the lessons that they learned while teaching the first cohort of their bootcamp how to be effective data engineers.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Migrate Hive data from CDH to CDP public cloud

Cloudera

Introduction. Many Cloudera customers are transitioning from fully on-premises deployments to the cloud, either by backing up their data in the cloud or by running multi-functional analytics on CDP Public Cloud in AWS or Azure. The Replication Manager service facilitates both disaster recovery and data migration across different environments. Using easy-to-define policies, Replication Manager addresses one of the biggest barriers customers face in their cloud adoption journey by allowing them to mov

Thank You

Confluent

Today, Confluent became a publicly traded company. This is a big milestone in the short life of our company. To the employees, customers, partners, investors, and the larger developer community […].

Make Database Performance Optimization A Playful Experience With OtterTune

Data Engineering Podcast

Summary The database is the core of any system because it holds the data that drives your entire experience. We spend countless hours designing the data model, updating engine versions, and tuning performance. But how confident are you that you have configured it to be as performant as possible, given the dozens of parameters and how they interact with each other?

7 Types of Classification Algorithms in Machine Learning

ProjectPro

This blog will help you master the fundamentals of classification machine learning algorithms with their pros and cons. You will also explore some exciting machine learning project ideas that implement different types of classification algorithms. So, without much ado, let's dive in. Imagine that the pandemic is over and today is a weekday. All the schools, colleges, and offices are open, and you should reach your institution by 8 A.M.

Confluent Presented the Databricks ISV Momentum Partner Award 2021

Confluent

I’m excited to announce that Confluent was presented with the Databricks ISV Momentum Award at the Databricks Partner Executive Summit last month. This award is given to the partner whose […].

Exploring Data @ Netflix

Netflix Tech

By Gim Mahasintunan on behalf of Data Platform Engineering. Supporting a rapidly growing base of engineers of varied backgrounds using different data stores can be challenging in any organization. Netflix’s internal teams strive to provide leverage by investing in easy-to-use tooling that streamlines the user experience and incorporates best practices.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Standing Up a DataOps Program for Practitioners

DataKitchen

In this five-module course, Mike Lampa & Chris Bergh teach data professionals to plan their organization's DataOps program for low errors & fast deployment. The post Standing Up a DataOps Program for Practitioners first appeared on DataKitchen.

Look Out for Risks in Open Banking!

Teradata

Open Banking is re-shaping the landscape of financial services and introducing new types of risks extending beyond data security. Secure open banking is everyone’s responsibility.

Building Real-Time Event Streams in the Cloud, On Premises, or Both with Confluent

Confluent

To the developer or architect seeking to provide their business with as much value as possible, what is the best way to start working with data in motion? Choosing Apache […].

Deploying applications on CDP Operational Database (COD)

Cloudera

CDP Operational Database Experience (COD) is a PaaS offering on the Cloudera Data Platform (CDP). COD enables you to create a new operational database with a few clicks and auto-scales based on your workload. Behind the scenes, COD automatically manages cluster deployment and configuration, reducing overheads related to setting up new database instances.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Monte Carlo Expands Leadership Team from Snowflake, Segment to Support Hypergrowth of Data Observability Category

Monte Carlo

Monte Carlo, the data reliability company, today announced two new strategic hires to its leadership team: Daniel Day, Head of Marketing, and Jordan Van Horn, Head of Revenue. With experience leading award-winning go-to-market teams at Snowflake and Segment, Day and Van Horn share deep expertise in the data industry and will help Monte Carlo meet growing demand as the industry leader in Data Observability.

Federated Development with Deployment at Scale

Teradata

The connected cloud data warehouse is fundamental to Data Mesh implementation in large and complex organisations. Find out why.

ZIO Fibers: Concurrency and Lightweight Threads

Rock the JVM

Explore ZIO's unique fiber model for concurrency: see how it stands out from other effect libraries in the Scala ecosystem.

Zalando Tech Radar - Scaling Contributions to Technology Selection

Zalando Engineering

Introduction. In our previous post about Technology Choices at Zalando, we spoke about a few problems with scaling technology selection in tech companies. Since then, we have focused on the remaining categories of the Tech Radar beyond languages, and on the Tech Radar contribution process. Now, we'd like to reflect on our lessons learned, which you can use when designing technology selection processes.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

The Role of DataOps in Data Modernization

DataKitchen

Cognizant's JP Thakur & DataKitchen's Chris Bergh discuss how DataOps sets the foundation for Data Modernization initiatives enabling continuous data & insight. The post The Role of DataOps in Data Modernization first appeared on DataKitchen.

What Concept Are You Trying to Prove?

Teradata

When undertaking the expense & time to execute a proof of concept on your journey to the cloud, make sure the efforts are well defined & drive an actionable outcome.

The A-Z Guide to Gradient Descent Algorithm and Its Variants

ProjectPro

If you have heard of Machine Learning and Deep Learning, you must have also heard about cost (error or loss) functions. But, even if you haven't, fret not! The cost function, in simple words, is a way of measuring the performance of a Machine Learning model by attributing a cost to every 'mistake' or wrong prediction that the model makes. But as we know from personal experience, one can gain little by simply knowing that a mistake has occurred or how many, for that matter, if you have no clue ho
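
The blurb above describes a cost function as a way of putting a number on a model's mistakes. As a minimal sketch (not taken from the ProjectPro article), assuming mean squared error as the cost and a one-parameter linear model, the Python below shows how plain gradient descent uses the gradient of that cost to reduce it step by step. The function names, toy data, and learning rate are illustrative assumptions.

```python
# A minimal sketch: mean squared error (MSE) as a cost function, plus plain
# batch gradient descent on a one-parameter linear model y_hat = w * x.
# Function names, data, and the learning rate are illustrative assumptions.


def mse_cost(w: float, xs: list[float], ys: list[float]) -> float:
    """Average squared error of predictions w * x against targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w: float, xs: list[float], ys: list[float]) -> float:
    """Derivative of mse_cost with respect to w: mean of 2 * (w*x - y) * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs: list[float], ys: list[float], w0: float = 0.0,
                     learning_rate: float = 0.05, steps: int = 100) -> float:
    """Repeatedly move w against the gradient so the cost decreases."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * mse_gradient(w, xs, ys)
    return w


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # generated from y = 2x, so the ideal w is 2
    w = gradient_descent(xs, ys)
    print(f"w = {w:.4f}, cost = {mse_cost(w, xs, ys):.6f}")
```

Because the toy data lies exactly on y = 2x, the cost shrinks toward zero as w approaches 2. A learning rate that is too large would instead make w overshoot the minimum on every step and diverge.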

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Visualizing ClickHouse Data - ClickHouse SQLAlchemy

Preset

Analyzing data that’s frequently updated? A comprehensive walkthrough of how to analyze and visualize ClickHouse data in Superset.

Our Product Vision for a Developer-First CDP

RudderStack

This blog post describes RudderStack's product vision in detail: what you can expect from RudderStack in the coming quarters in terms of features and experience.

DAX-JUNGLE: NORM.DIST

FreshBI

It’s a jungle out there. Back in the day, when I was stuck on a DAX problem, I used to toggle through the IntelliSense in Power BI one letter at a time. I’ve learned much since then, and in this blog I’d like to share my experience with using NORM.DIST in DAX. A: ABS ACOS ACOSH … B: BETA.DIST BETA.INV BLANK Etc. Hours wasted. Mistakes were made. A MUCH better use of my time would have been reviewing quality solutions to real-world problems.

Top 20 Data Analytics Projects for Students to Practice in 2023

ProjectPro

According to Gartner, organizations can suffer a financial loss of up to 15 million dollars due to poor data quality. As per McKinsey, 47% of organizations believe that data analytics has impacted the market in their respective industries. According to Forbes, in 2012 only 12% of Fortune 1000 companies reported having a CDO (Chief Data Officer).

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Insurers – Be Aware of the Hidden Exposures in assessing the economic impact of Climate Risk

Cloudera

Climate change is a challenge for insurers in some obvious ways, such as stronger and more frequent natural disasters. Yet there are also more subtle risks to monitor, including changes to insured assets, risks, and exposures. Climate impacts the production quality and quantity of insured consumable goods, their location, and their supply chains. Climate change can also impact the insurance carrier as an enterprise itself; similarly to cyber risks, insurers underwrite cyber risks for their custom

RudderStack Product News Vol. #007 - New Security Features

RudderStack

This RudderStack product news update covers the latest security features, Mobile SDK distribution, new spreadsheet and database destinations, and RudderStack video content.

Using DataOps to Drive Agility and Business Value

DataKitchen

In May 2021 at the CDO & Data Leaders Global Summit, DataKitchen sat down with the following data leaders to learn how to use DataOps to drive agility and business value. Kurt Zimmer, Head of Data Engineering for Data Enablement at AstraZeneca. Ryan Chapin, Former Executive Manager, Advanced Additive Design, Chief Product and Portfolio Manager, GE Aviation.