Sat. Nov 28, 2020 - Fri. Dec 04, 2020

A Data Scientist in Engineering Wonderland

Team Data Science

As a data scientist, I always felt there was a missing link between the models I developed and putting them into production. Yes, I can create a pipeline, write a model, get results, and interpret them, but if I cannot scale it, all of this will just sit in my Jupyter notebooks. This thought led me to my data engineering adventure. I am confident that learning data engineering will make me a better data scientist.

Streaming Data Integration Without The Code at Equalum

Data Engineering Podcast

Summary: The first stage of every good pipeline is data integration. With the increasing pace of change and the need for up-to-date analytics, the need to integrate that data in near real time is growing. With the improvements in and increased variety of options for streaming data engines, along with improved tools for change data capture, it is possible for data teams to make that goal a reality.

Project Metamorphosis Month 8: Complete Apache Kafka in Confluent Cloud

Confluent

This is the eighth and final month of Project Metamorphosis: an initiative that brings the best characteristics of modern cloud-native data systems to the Apache Kafka® ecosystem, served from Confluent […].

Kafka 95

2020 Data Impact Award Winner Spotlight: Rush University Medical Center

Cloudera

After a tumultuous year, the final award category at the Data Impact Awards was a much-needed pick-me-up for everyone in attendance. Showcasing some of the most inspiring and uplifting use cases of Cloudera’s technology, the Data for Good category recognizes organizations that are tackling the challenging issues affecting society and the planet, and we all know there are plenty of them in 2020!

Medical 76

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

Teradata at AWS re:Invent

Teradata

Teradata is participating in AWS re:Invent 2020, demonstrating our cloud-first stance as a Gold sponsor. Find out more.

AWS 59

Immutable Linked Lists in Scala With Call-By-Name and Lazy Values

Rock the JVM

Discover how to harness lazy values and call-by-name techniques to craft a fully immutable doubly-linked list in Scala.

Scala 52

More Trending

How to configure clients to connect to Apache Kafka Clusters securely – Part 1: Kerberos

Cloudera

This is the first installment in a short series of blog posts about security in Apache Kafka. In this article we will explain how to configure clients to authenticate with clusters using different authentication mechanisms. Secured Apache Kafka clusters can be configured to enforce authentication using different methods, including the following: SSL – TLS client authentication.

Kafka 69

Risk-Based Wealth Management: What the Insurance Industry Gets Wrong

Teradata

Product-centric processes degrade customer experience. Insurers must insulate consumers from internal & regulatory-driven controls by placing them in the center of the customer experience.

Open Source Highlight: Klio

Data Council

Klio is a framework for easy large-scale processing and ML research on binary files, such as audio files -- its original use case. As a matter of fact, it was developed for audio intelligence at Spotify, which open-sourced it earlier this year at the 2020 International Society for Music Information Retrieval Conference.

Process 52

Real-Time Serverless Ingestion, Streaming, and Analytics using AWS and Confluent Cloud

Confluent

Due to the distributed architecture of Apache Kafka®, the operational burden of managing it can quickly become a limiting factor on adoption and developer agility. For this reason, it is […].

AWS 84

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Making Privacy an Essential Business Process

Cloudera

Canada is poised to become a world leader in privacy regulation, and with new regulation come record-breaking fines for those who can’t keep up. In November, Canada introduced the Digital Charter Implementation Act. If passed, companies could face fines of up to five percent of global revenue or $25 million CAD, whichever is greater, for violating Canadians’ privacy.

Process 68

Data and Strategic Alignment in the Bank of the Future

Teradata

Strategic alignment is a fundamental building block for the bank of the future. It must rest on integrated data & financial data analysis that inform each stage of the enterprise value chain.

Banking 52

2020 Retrospective (and What's Coming in 2021)

Rock the JVM

In this article, I'll recap 2020's highlights, share key insights and achievements, and unveil exciting plans for the future of Rock the JVM.

52

Getting Started with Spring Cloud Data Flow and Confluent Cloud

Confluent

Data is the currency of competitive advantage in today’s digital age. All organizations struggle with their data due to the sheer variety of data types and ways that it can […].

Cloud 59

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

2020 Data Impact Award Winner Spotlight: Telkomsel

Cloudera

2020 is a year that’s been defined by transformation. The way we work, how businesses operate, and even how they serve customers have all transformed in order to cope with the challenges that have been thrown our way. Amongst the chaos, some organizations have excelled. The Industry Transformation category at our Data Impact Awards celebrates these organizations: the ones that have looked digital transformation in the eye and said “bring it on!”

How to Tackle Data Skew

Teradata

Learn how to use Teradata's Global Space Accounting to counter our biggest villain: data skew.

Data 52

5 things you should know about Real-Time Analytics

A Cloud Guru: Data Engineering

Running analytics on real-time data is a challenge many data engineers are facing today. But not all analytics can be done in real time! Many are dependent on the volume of the data and the processing requirements. Even logic conditions are becoming a bottleneck. For example, think about join operations on huge tables with more […]

A Visual Tour of the Global COVID-19 Vaccine Efforts

Preset

In response to the COVID-19 pandemic, hundreds of countries, organizations, universities, and companies came together to fund many vaccine candidates.

Data 40

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

#ClouderaLife: Unplugged

Cloudera

It’s a trick as old as time… or at least as old as technology. We all know that step one in solving any tech issue is to turn it off and then turn it back on again. But could it solve issues before they happen? And could this work not only for technology but also for the people behind the technology? Our leadership team decided to explore that theory.

Intertoys

Teradata

Toy retailer uses Vantage on Azure, the modern cloud data analytics platform, as the building blocks for agility and cost-savings.

Retail 52

Cloudera Operational Database Infrastructure Planning Considerations

Cloudera

In this blog post, let us take a look at the infrastructure planning you may have to do when deploying an operational database cluster on a CDP Private Cloud Base deployment. Note that you may have to make some planning assumptions when designing your initial infrastructure, and it must be flexible enough to scale up or down based on your future needs.

Coffee with Cloudera: Cindy Maike, VP of Industry Solutions

Cloudera

Meet Cindy Maike, VP of Industry Solutions at Cloudera. Cindy has led the Industry Solutions team for over 3 years, has been with Cloudera for 6 years, and has been at the forefront of developing targeted vertical solutions for our customers and partners. Cindy is an exceptional female leader and we hope this blog gives you insight into the great work Cindy is doing with the Industry Solutions team!

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you