Top Data Engineering Digest Scala Data Storage Content for Week of Jan 02

Sat.Jan 02, 2021 - Fri.Jan 08, 2021

Data-driven 2021: Predictions for a new year in data, analytics and AI

DataKitchen

JANUARY 4, 2021

The post Data-driven 2021: Predictions for a new year in data, analytics and AI first appeared on DataKitchen.

Data Analytics

Data Analytics Data

Building a Machine Learning Application With Cloudera Data Science Workbench And Operational Database, Part 1: The Set-Up & Basics

Cloudera

JANUARY 6, 2021

Python is used extensively among Data Engineers and Data Scientists to solve all sorts of problems, from ETL/ELT pipelines to building machine learning models. Apache HBase is an effective data storage system for many workflows, but accessing this data specifically through Python can be a struggle. For data professionals who want to make use of data stored in HBase, the recent upstream project “hbase-connectors” can be used with PySpark for basic operations.

Machine Learning

Machine Learning Data Science Database Building

Trending Sources

Improving Population Health Through Citizen 360

Teradata

JANUARY 5, 2021

By leveraging data to create a 360-degree view of their citizenry, government agencies can create better experiences and improve outcomes, such as closing the tax gap or improving quality of care.

Government

Government IT Data

Skills you should have as a Data Engineer

Team Data Science

JANUARY 8, 2021

Big Data has become the dominant innovation in all high-performing companies. Notable businesses today focus their decision-making capabilities on knowledge gained from the study of big data. Big Data is a collection of large data sets, particularly from new sources, providing an array of possibilities for those who want to work with data and are enthusiastic about unraveling trends in rows of new, unstructured data.

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

How to Backfill a SQL query using Apache Airflow

Start Data Engineering

JANUARY 6, 2021

What is backfilling? Backfilling refers to any process that involves modifying or adding new data to existing records in a dataset. This is a common use case in data engineering. For example, a change in business logic may need to be applied to an already processed dataset.
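
As a rough illustration of the idea, the sketch below re-runs a parameterized SQL query once per day over a date range. It is not the Airflow-based approach from the linked article: it uses Python's built-in `sqlite3` module, and the table names (`orders`, `daily_revenue`), the column names, and the `backfill` and `demo` helpers are all hypothetical, chosen only for this example.

```python
import sqlite3
from datetime import date, timedelta


def daterange(start, end):
    """Yield each day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def backfill(conn, start, end):
    """Recompute the (hypothetical) daily_revenue table for each day in [start, end].

    The day's row is deleted before it is re-inserted, so running the same
    range twice leaves the table unchanged instead of duplicating rows.
    """
    for day in daterange(start, end):
        ds = day.isoformat()
        with conn:  # one transaction per day: commit on success, roll back on error
            conn.execute("DELETE FROM daily_revenue WHERE ds = ?", (ds,))
            conn.execute(
                "INSERT INTO daily_revenue (ds, revenue) "
                "SELECT ?, COALESCE(SUM(amount), 0) FROM orders WHERE order_date = ?",
                (ds, ds),
            )


def demo():
    """Build a tiny in-memory dataset, backfill it twice, and return the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
    conn.execute("CREATE TABLE daily_revenue (ds TEXT PRIMARY KEY, revenue REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2021-01-01", 10.0), ("2021-01-01", 5.0), ("2021-01-03", 7.5)],
    )
    # Run the same range twice to show that the backfill is safe to repeat.
    backfill(conn, date(2021, 1, 1), date(2021, 1, 3))
    backfill(conn, date(2021, 1, 1), date(2021, 1, 3))
    return conn.execute("SELECT ds, revenue FROM daily_revenue ORDER BY ds").fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Keying each run on a single day and replacing that day's output is one common way to make a backfill repeatable; an orchestrator would typically supply the day as a run parameter rather than looping in process.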

SQL

SQL Datasets Data Engineering Data Engineer

Implementing mTLS and Securing Apache Kafka at Zendesk

Confluent

JANUARY 7, 2021

At Zendesk, Apache Kafka® is one of our foundational services for distributing events among different internal systems. We have pods, which can be thought of as isolated cloud environments where […].

Kafka

Kafka Cloud Systems

New Applied ML Research: Few-shot Text Classification

Cloudera

JANUARY 7, 2021

Text classification is a ubiquitous capability with a wealth of use cases. For example, recommendation systems rely on properly classifying text content such as news articles or product descriptions in order to provide users with the most relevant information. Classifying user-generated content allows for more nuanced sentiment analysis. And in the world of e-commerce, assigning product descriptions to the most fitting product category ensures quality control.

Machine Learning

Machine Learning Algorithm Deep Learning Designing

More Trending

Bringing Feature Stores and MLOps to the Enterprise at Tecton

Data Engineering Podcast

JANUARY 4, 2021

As more organizations are gaining experience with data management and incorporating analytics into their decision making, their next move is to adopt machine learning. In order to make those efforts sustainable, the core capability they need is for data scientists and analysts to be able to build and deploy features in a self-service manner.

Python

Python Machine Learning Computer Science Data Lake

The Business Case for DataOps

DataKitchen

JANUARY 6, 2021

Savvy executives maximize the value of every budgeted dollar. Decisions to invest in new tools and methods must be backed up with a strong business case. As data professionals, we know the value and impact of DataOps: streamlining analytics workflows, reducing errors, and improving data operations transparency. Being able to quantify the value and impact helps leadership understand the return on past investments and supports alignment with future enterprise DataOps transformation initiatives.

Pharmaceutical

Pharmaceutical Consulting Utilities Programming

Digital Payments: An Explosion of Emerging Opportunities

Teradata

JANUARY 3, 2021

Digital payments generate 90% of financial institutions’ useful customer data. How can they exploit its value? Find out more.

IT Data

Maximizing Supply Chain Agility through the “Last Mile” Commitment

Cloudera

JANUARY 5, 2021

In my last two blogs (Get to Know Your Retail Customer: Accelerating Customer Insight and Relevance, and Improving your Customer-Centric Merchandising with Location-based in-Store Merchandising) we looked at the benefits to retail in building personalized interactions by accessing both structured and unstructured data from website clicks, email and SMS opens, in-store point-of-sale systems and past purchase behaviors.

Retail

Retail Unstructured Data Machine Learning Big Data

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Open Source Highlight: Orchest

Data Council

JANUARY 7, 2021

Orchest is an open-source tool for creating data science pipelines. Its core value proposition is to make it easy to combine notebooks and scripts with a visual pipeline editor (“build”); to make your notebooks executable (“run”); and to facilitate experiments (“discover”).

Data Science

Data Science Building IT Data

DataOps Facilitates Remote Work

DataKitchen

JANUARY 5, 2021

Remote working has revealed the inconsistency and fragility of workflow processes in many data organizations. The data teams share a common objective: to create analytics for the (internal or external) customer. Execution of this mission requires the contribution of several groups: data center/IT, data engineering, data science, data visualization, and data governance.

Data Governance

Data Governance Government Data Science Metadata

Drowning in Data - Regulators Need a Data Strategy Too!

Teradata

JANUARY 7, 2021

The problem for regulators & for banks alike is agreeing what good data looks like & how to share it to create a modern, flexible, shared data model. Read more.

Banking

Banking Data IT

What the Functor? Exploring Functors in Depth

Rock the JVM

JANUARY 4, 2021

Explore one of the most essential concepts in pure functional programming: the Functor, a crucial but abstract idea that will challenge your understanding.

Programming

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Data Engineering Glossary

Silectis

JANUARY 3, 2021

If you’re new to data engineering or are a practitioner of a related field, such as data science, or business intelligence, we thought it might be helpful to have a handy list of commonly used terms available for you to get up to speed. This data engineering glossary is by no means exhaustive, but should provide some foundational context and information.

Data Engineering

Data Engineering Data Engineer Engineering Hadoop

Gartner: Operational AI Requires Data Engineering, DataOps, and Data-AI Role Alignment

DataKitchen

JANUARY 2, 2021

In Gartner’s recent report, Operational AI Requires Data Engineering, DataOps, and Data-AI Role Alignment, Robert Thanaraj and Erick Brethenoux recognize that “organizations are not familiar with the processes needed to scale and promote artificial intelligence models from the prototype to the production stages; resulting in uncoordinated production deployment attempts.”

Data Engineering

Data Engineering Data Engineer Engineering Government
