Sat. Feb 27, 2021 - Fri. Mar 05, 2021

Build your data pipelines like the Toyota Way

François Nguyen

If there is only one book to read about lean manufacturing, this is the one. This is the kind of book you can read again and again and still learn something about your current context. It is also a book you can read whatever your industry; you will always find situations it covers. Today, we are going to apply these principles to data pipelines. “The right process will deliver the right results” – Toyota Way (section II) In the 14 Toyota Way principles, you have

How to set up a dbt data-ops workflow, using dbt cloud and Snowflake

Start Data Engineering

Contents: Introduction; Pre-requisites; Setting up the data-ops pipeline; Snowflake; Local development environment; dbt cloud; Connect to Snowflake; Link to GitHub repository; Setup deployment (release/prod) environment; Setup CI; PR -> CI -> merge cycle; Schedule jobs; Host data documentation; Conclusion and next steps; Further reading; References. Introduction: With companies realizing the importance of having correct data, there has been a lot of attention on the data-ops side of things.

Cloud 130

To Pull or to Push Your Data with Kafka Connect? That Is the Question.

Confluent

Today, every company is a data company. There are many different data pipeline, integration, and ingestion tools in the market, but before you can feed your data analytics needs, data […].

Kafka 126

CFO Analytics: What Is It and Why Should You Care?

Teradata

Finance-driven analytics might be the largest untapped opportunity for organizations & a catalyst for driving business value & strategic vision. But, what exactly is CFO analytics?

IT 119

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

The Netflix Cosmos Platform

Netflix Tech

Orchestrated Functions as a Microservice, by Frank San Miguel on behalf of the Cosmos team. Introduction: Cosmos is a computing platform that combines the best aspects of microservices with asynchronous workflows and serverless functions. Its sweet spot is applications that involve resource-intensive algorithms coordinated via complex, hierarchical workflows that last anywhere from minutes to years.

Media 94

In-memory Caching in Finance

Data Science Blog: Data Engineering

Big data has been gradually creeping into a number of industries through the years, and it seems there are no exceptions when it comes to what type of business it plans to affect. Businesses, understandably, are scrambling to catch up to new technological developments and innovations in the areas of data processing, storage, and analytics. Companies are in a race to discover how they can make big data work for them and bring them closer to their business goals.

Finance 52

More Trending

Enhancing Customer Experience with Every Journey

Teradata

Big Tech giants dominate by using data to improve product & experience. The auto industry can emulate this by analyzing data to improve customer experience & guide individual choices.

Data 95

A Day in the Life of an Experimentation and Causal Inference Scientist @ Netflix

Netflix Tech

Stephanie Lane, Wenjing Zheng, Mihir Tendulkar. Source credit: Netflix. Within the rapid expansion of data-related roles in the last decade, the title Data Scientist has emerged as an umbrella term for myriad skills and areas of business focus. What does this title mean within a given company, or even within a given industry? It can be hard to know from the outside.

Space-Time Tradeoff: Examining Snowflake's Compute Cost

Rockset

Imagine you had a big book, and you were looking for the section that talks about dinosaurs. Would you read through every page or use the index? The index will save you a lot of time and energy. Now imagine that it’s a big book with a lot of words in really tiny print, and you need to find all the sections that talk about animals. Using the index will save you a LOT of time and energy.

Kafka Summit Europe 2021 – A Look at the Agenda

Confluent

As you may have heard, we are hosting not one, not two, but three Kafka Summits in 2021. No matter where you are in the world, there’s a Summit event […].

Kafka 81

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Is the Centralized Data Warehouse Dead?

Teradata

Learn how Teradata's founding vision, along with its technology, has evolved over time to deliver on its core principle: bringing data together to drive analytics that matter.

Open Source Highlight: PostHog

Data Council

PostHog provides open-source product analytics, which users can deploy on their own infrastructure to collect every event on their website or app without having to send the data to third parties - an increasing source of concern in times of GDPR and CCPA.

Data 52

Why Production Machine Learning Fails — And How To Fix It

Monte Carlo

Machine learning has emerged as a must-have tool for any serious data team: augmenting processes, generating smarter and more accurate predictions, and generally improving our ability to make use of data. However, discussing applications of machine learning, in theory, is much different than actually applying machine learning models at scale in production.

SQL Dialect differences in Sequelize

Grouparoo

Like many applications, Grouparoo stores data in a relational database. Unlike most applications, Grouparoo works with 2 different types of databases: Postgres and SQLite. We enable our customers to run Grouparoo in a number of different ways: on their laptop with no external dependencies, and as part of a large cluster with many servers processing data in parallel.

SQL 52

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

How we use GraphQL at Europe's largest fashion e-commerce company

Zalando Engineering

Background: Today's large-scale organizations leveraging microservice architecture face a plethora of problems at the data aggregation and presentation layers. Managing consistent and backwards-compatible APIs for Web and Mobile App frontends is definitely one of the complex ones. Balancing a frontend developer's need for a consistent data source against a product manager's need to deliver new features quickly in a fast-paced, large organization is a tough nut to crack.

Certifying Ripple's System and Organization Controls: SOC 2

Ripple Engineering

More than a year of cross-team collaboration has resulted in an important achievement: Ripple has been awarded the SOC 2 certification! How do you make a computer system maximally secure and reliable? Disconnect it from all networks and never change any of the software or data. How do you make a computer system maximally useful? Connect it to networks and make frequent changes to the software and data!

Systems 52

Monte Carlo is SOC 2 Certified

Monte Carlo

When it comes to managing your company’s data, security is high on your list of priorities. Today, I’m thrilled to share that Monte Carlo has achieved SOC 2 Type I certification, an industry-leading standard for the security, availability, and confidentiality that our organization adopted. What does this mean for you? Our SOC 2 designation means that Monte Carlo has designed a set of internal controls, systems, policies, and procedures that meet industry best practices for protecting our custom

5 Tips to Create a Job-Winning Data Science Resume in 2023

ProjectPro

You are about to create the best data science resume out there, but first: Data Scientists are unicorns. However, most of your overburdened hiring managers don't know this. They can't see the wonders you make with data-driven insights. It's all Greek to them. You need to cram all your data science superpowers onto your data science resume to prove that you are the best candidate out there for the open data science job.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Organizing Services with ZIO and ZLayers

Rock the JVM

ZIO layers (ZLayers) help structure complex services into independent, composable, and easy-to-understand modules: discover how they can simplify your architecture

The Future Of Business Intelligence Is Open Source

Preset

It's time for the future of business intelligence to go open source, preventing lock in, providing extensibility, and fostering a community for innovation.

The New Rules of Data Quality

Monte Carlo

There are two types of data quality issues in this world: those you can predict (known unknowns) and those you can’t (unknown unknowns). Here’s how some of the best data teams are taking a more comprehensive approach to tackling both of them at scale. For the past several years, data teams have leveraged the equivalent of unit testing to detect data quality issues.

Recommender Systems Python-Methods and Algorithms

ProjectPro

Welcome to the World of Recommender Systems!!! Table of Contents: What is a Recommender System?; Recommender Systems – An Introduction; Types of Recommender Systems: 1) Content-Based Filtering, 2) Collaborative Filtering; Content-Based Recommender Systems; Grab Some Popcorn and Coke – We’ll Build a Content-Based Movie Recommender System; Analyzing Documents with TF-IDF; Creating a TF-IDF Vectorizer; Calculating the Cosine Similarity – The Dot Product of Normalized Vectors; It’s

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Building an End to End load test automation system on top of Kubernetes

Zalando Engineering

Introduction: At Zalando we continuously invent new ways for customers to interact with fashion. In order to provide an excellent customer experience, we must ensure our systems can technically handle high-traffic events such as Cyber Week or other sales campaigns. We have published a detailed article on how Zalando prepares for Cyber Week. Checkout- and payments-related systems are particularly important during sales events.

Systems 40

Time-series Analysis With Druid Superset and Prophet

Preset

Time-series analysis with Druid and Superset with in-chart analytics from Facebook's Prophet library.

40

International Women’s Day 2021: Challenging what’s possible

Cloudera

This year’s International Women’s Day (IWD) on March 8th comes at a time when global communities, businesses, and governments find themselves continuing to pirouette, pivot, and adapt in the face of a relentless, global pandemic. COVID-19 has touched every aspect of our lives. As women, overnight we suddenly found that we had a portfolio career – comprising our day jobs, caregiver, school teacher and house cleaner – that we had neither asked for, nor were consulted on.

Portfolio 111

Bridging The Gap Between Machine Learning And Operations At Iguazio

Data Engineering Podcast

Summary: The process of building and deploying machine learning projects requires a staggering number of systems and stakeholders to work in concert. In this episode, Yaron Haviv, co-founder of Iguazio, discusses the complexities inherent to the process, as well as how he has worked to democratize the technologies necessary to make machine learning operations maintainable.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Build a Slack Activity Dashboard Using Airbyte and Superset

Preset

In this post, we'll walk through how to use Airbyte with Superset to build a Slack dashboard.

Using SQL to democratize streaming data

Cloudera

Streaming analytics is crucial to modern business – it opens up new product opportunities and creates massive operational efficiencies. In many cases, it’s the difference between creating an outstanding customer experience versus a poor one – or losing the customer altogether. However, in the typical enterprise, only a small team has the core skills needed to gain access and create value from streams of data.

SQL 118