Sat. Mar 06, 2021 - Fri. Mar 12, 2021

Building a Data Engineering Project in 20 Minutes

Simon Späti

This post focuses on practical data pipelines with examples from web-scraping real-estate listings, uploading them to S3 with MinIO, Spark, and Delta Lake, adding some Data Science magic with Jupyter Notebooks, ingesting into the Apache Druid data warehouse, visualising dashboards with Superset, and managing everything with Dagster. The goal is to touch on common data engineering challenges and to use promising new technologies, tools, and frameworks, most of which I wrote about in Business Intelligence

Under the Hood of Real-Time Analytics with Apache Kafka and Pinot

Confluent

Real-time analytics has become the need of the hour for modern internet companies. The ability to derive internal insights around business metrics, user growth and adoption as well as security […].

Towards a Data Mesh (Part 1): Data Domains and Teams Topologies

François Nguyen

Just an illustration, not the truth, and we will pivot if it does not work. I discovered Zhamak Dehghani’s first article about Data Mesh in August 2020. Thanks to YouTube, you can watch her present it live in this video, with even more context and explanations. There is also a second video that introduces her second article (December 2020).

Remote Workstations for the Discerning Artists

Netflix Tech

By Michelle Brenner. Netflix is poised to become the world’s most prolific producer of visual effects and original animated content. To meet that demand, we need to attract the world’s best artistic talent. Artists like to work at places where they can create groundbreaking entertainment instead of worrying about getting access to the software or source files they need.

How to Tune RocksDB for Your Kafka Streams Application

Confluent

Apache Kafka ships with Kafka Streams, a powerful yet lightweight client library for Java and Scala to implement highly scalable and elastic applications and microservices that process and analyze data […].

Enterprise Data Operating Systems in the Cloud: Necessary, But Not Sufficient

Teradata

Getting your Cloud data architecture right starts with understanding which data products you need, the roles they perform, & the functional & non-functional characteristics that those roles demand.

ConsoleMe: A Central Control Plane for AWS Permissions and Access

Netflix Tech

By Curtis Castrapel, Patrick Sanders, and Hee Won Kim. At AWS re:Invent 2020, we open sourced two new tools for managing multi-account AWS permissions and access. We’re very excited to bring you ConsoleMe (pronounced: kuhn-soul-mee) and its CLI utility, Weep (pun intended)! If you missed the talk, check it out here.

Leave Your Data Where It Is And Automate Feature Extraction With Molecula

Data Engineering Podcast

Summary: A majority of the time spent in data engineering goes to copying data between systems to make the information available for different purposes. This introduces challenges such as keeping information synchronized, managing schema evolution, and building transformations to match the expectations of the destination systems. H.O. Maycotte faced these same challenges at a massive scale, which led him to question whether there is a better way.

Data governance beyond SDX: Adding third party assets to Apache Atlas

Cloudera

Governance and the sustainable handling of data are critical success factors in virtually all organizations. While Cloudera Data Platform (CDP) already supports the entire data lifecycle from ‘Edge to AI’, we at Cloudera are fully aware that enterprises have more systems outside of CDP. It is crucial to prevent CDP from becoming the next silo in your IT landscape.

Integrating Apache Kafka Clients with CNCF Jaeger at Funding Circle Using OpenTelemetry

Confluent

At Funding Circle, we rely heavily on Kafka as the main piece of infrastructure enabling our event-driven microservices architecture. There are numerous organizational benefits of microservices; however, a key […].

Production Media Management: Transforming Media Workflows by leveraging the Cloud

Netflix Tech

Written by Anton Margoline, Avinash Dathathri, Devang Shah, and Murthy Parthasarathi. Credit to Netflix Studio’s Product, Design, and Content Hub Engineering teams, along with all of the supporting partner and platform teams. In this post, we will share a behind-the-scenes look at how Netflix delivers technology and infrastructure to help production crews create and exchange media during the production and post-production stages.

How to Get Your Cloud Analytic Architecture Right

Teradata

Getting your Cloud data architecture right starts with understanding which data products you need, the roles they perform, & the functional & non-functional characteristics that those roles demand.

Cloudera celebrates International Women’s Day – Sharing experiences and our voices from around the globe

Cloudera

Cloudera is happy to be an official supporter of International Women’s Day 2021. We at Cloudera believe in the undeniable power of data to build a more equitable future, and we are humbled to be building the products that make it possible for data to change the world for the better. The theme of this year’s IWD is #ChooseToChallenge. As we celebrate the social, economic, cultural, and political achievements of women, we’re building a foundation for our future young women, raising awareness about […]

Gartner – Top Trends in Data and Analytics for 2021: XOps

DataKitchen

Gartner identified XOps (DataOps, ModelOps, DevOps) as one of the top trends in data and analytics for 2021. Below we provide additional suggestions for further reading based on Gartner’s recommendations. What is XOps? Gartner: “The multiplication of Ops disciplines stemming out of DevOps best practices has caused significant confusion in the marketplace.”

Micro Frontends: from Fragments to Renderers (Part 1)

Zalando Engineering

In 2015, we wanted to improve how we delivered features to customers and move away from a monolithic shop system. Project Mosaic and its microservices approach for the frontend were vital to support this transition. Mosaic enabled a relatively large number of teams to work on the main Zalando website independently and without performance compromises.

CRM System Rate Limiting Overview

Grouparoo

Rate limiting is the method by which an API limits the number of calls made to it. When building a data sync against an API, it's important to adapt to the approach the remote system takes. Whether stated or not, all systems have a rate limit. Even if it is not addressed explicitly, there is still some finite number of parallel connections that a set of servers can handle.
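Respecting a limit, documented or not, usually means throttling on the client side. A common technique for that is a token bucket, which allows short bursts while capping the sustained request rate. The sketch below is a generic illustration in Python, not Grouparoo's implementation; the `TokenBucket` class is hypothetical, and the `rate` and `capacity` values are assumptions you would set from the remote API's documented (or observed) limits.

```python
import time


class TokenBucket:
    """Client-side token bucket (hypothetical helper).

    Holds up to `capacity` tokens and refills at `rate` tokens per second.
    Each API call consumes one token, so bursts are capped at `capacity`
    and the sustained rate at `rate` calls per second.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.clock = clock             # injectable clock, useful for testing
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self):
        """Take one token if available; return False instead of waiting."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self):
        """Block until a token is available, then take it."""
        while not self.try_acquire():
            # Sleep roughly until the next whole token has accumulated.
            time.sleep((1 - self.tokens) / self.rate)
```

For example, calling `bucket.acquire()` before each request with `bucket = TokenBucket(rate=5, capacity=10)` would keep a sync job at about five requests per second while tolerating bursts of up to ten; both numbers are placeholders, not limits of any particular CRM.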

#ClouderaLife Spotlight: Karen Ji, Senior Manager, Customer Operations

Cloudera

Karen Ji is Cloudera’s Senior Manager, Customer Operations, ensuring the success of our global customers with a regional focus on China and Korea. “Multi-tasking is my superpower.” Karen joined Cloudera as a Solutions Engineer before switching to lead customer support. On a daily basis, Karen collaborates with different functions across the business, but works extremely closely with the field, sales, and professional services teams to ensure that customers have the support and insights they need to

7 Data Engineering Trends to Watch

Silectis

The importance of data engineering is on the rise, with organizations increasingly investing in talent and infrastructure. Here at Silectis, we are in the fortunate position of working with a wide range of enterprises across multiple industries. I caught up with a few members of the team to take note of some of the data engineering trends we anticipate seeing more of this year and beyond.

RippleNet Engineering's Inclusive Language Initiative: Part 2

Ripple Engineering

Welcome back to the second post of this Inclusive Language blog series! Previously, we contextualized the importance of eliminating terms with problematic and racist origins from our codebase, such as “master” and “slave”, or “blacklist” and “whitelist”. We then suggested replacing them with equally clear and more agreeable words such as “primary” and “secondary”, or “denylist” and “allowlist”.
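The renaming described above can be backed by a simple automated check, for instance in CI. The sketch below is hypothetical and is not Ripple's tooling: it flags the four terms named in the post and suggests the replacements listed there. It only matches a term when it is not glued to other letters, so `master_db` is flagged, while `mastery` and camelCase names such as `masterNode` are not.

```python
import re

# Term -> suggested replacement, as listed in the post summary.
REPLACEMENTS = {
    "master": "primary",
    "slave": "secondary",
    "blacklist": "denylist",
    "whitelist": "allowlist",
}

# Case-insensitive match of a term that is not preceded or followed by a letter,
# so "master_db" is flagged while "mastery" and "masterNode" are not.
_PATTERN = re.compile(
    r"(?<![A-Za-z])(" + "|".join(REPLACEMENTS) + r")(?![A-Za-z])",
    re.IGNORECASE,
)


def find_terms(text):
    """Return (line_number, term, suggestion) for each flagged occurrence."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            term = match.group(1)
            hits.append((lineno, term, REPLACEMENTS[term.lower()]))
    return hits


if __name__ == "__main__":
    sample = "connect(master_db)\nWhitelist = []\nmastery is fine"
    for hit in find_terms(sample):
        print(hit)
```

Letter-based boundaries are a deliberate trade-off here: they catch snake_case identifiers that `\b` would miss, at the cost of skipping camelCase, which a real check would need to handle separately.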

All That Glitters is Not Gold!

Teradata

All companies want a golden data analytics platform. But instead of looking at the real properties of the platform, they are often misled by its shine and look. Find out more.

Should You Build or Buy a DataOps Solution?

DataKitchen

The post Should You Build or Buy a DataOps Solution? first appeared on DataKitchen.

5 Tips for Recruiting Top Engineering Talent in Startups

Rockset

“Two of the most important things as a CEO of a company are to make sure you have money in the bank and recruit amazing people.” - Venkat Venkataramani, CEO and Co-Founder of Rockset We hosted a Clubhouse event with VPs of Engineering from Gusto and Robinhood, Nimrod Hoofien and Adam Wolff, on their tips for recruiting top engineering talent in startups.

Building Database Connectors for Superset Using SQLAlchemy

Preset

Superset can integrate with almost any SQL-speaking database thanks to SQLAlchemy, Python DB-API 2, and some minimal custom logic.
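Python DB-API 2 (PEP 249) is the common driver interface that SQLAlchemy dialects sit on top of: a compliant driver exposes `connect()`, cursors with `execute()` and `fetchall()`, and column metadata in `cursor.description`. The sketch below uses the standard-library `sqlite3` driver purely as an illustration of that interface; the `run_query` helper and the `sales` table are hypothetical and are not part of Superset's code.

```python
import sqlite3


def run_query(conn, sql, params=()):
    """Run a query through the generic DB-API 2 cursor interface (hypothetical helper).

    Returns (column_names, rows). The cursor calls work with any PEP 249
    connection, but the placeholder syntax in `sql` depends on the driver's
    `paramstyle` (sqlite3 uses "qmark", i.e. `?`).
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        # Each entry of cursor.description is a 7-item sequence; item 0 is the column name.
        columns = [col[0] for col in cur.description]
        rows = cur.fetchall()
    finally:
        cur.close()
    return columns, rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical demo table; connection-level execute() is a sqlite3 shortcut.
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [("north", 120), ("south", 80)])
    print(run_query(conn, "SELECT region, amount FROM sales WHERE amount > ?", (100,)))
    # prints (['region', 'amount'], [('north', 120)])
    conn.close()
```

Placeholder styles, type mapping, and SQL dialects all vary from driver to driver; that variation is the kind of difference a SQLAlchemy dialect smooths over, leaving only a thin layer of database-specific logic on top.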

The Future: Seamless Journey to Invisible Payments

Teradata

The future of payments is rapidly evolving toward seamless omni-channel customer journeys and ultimately, payments becoming invisible. Find out more.

6 security risks in software development and how to address them

DataKitchen

The post 6 security risks in software development and how to address them first appeared on DataKitchen.

Monte Carlo Launches Chief Data Officer Advisory Board

Monte Carlo

Today, I am proud to announce the formation of Monte Carlo’s Chief Data Officer (CDO) advisory board. The advisory board was launched to help Monte Carlo and the emerging data observability market better serve customers on their journeys to data trust, advise on our product roadmap, and pioneer the data observability category. This announcement comes just weeks after our $25M Series B funding round this February, led by Redpoint Ventures, backers of Snowflake and Looker, and GGV Capital, in

The Future of Business Intelligence is Open Source

Maxime Beauchemin

While “software is [still actively] eating the world”, it’s also clear that open source is taking over software. Simply put, open source is a superior approach to building and distributing software because it provides important guarantees around how software can be discovered, tried, operated, collaborated on, and packaged. For those reasons, it is not surprising that it has taken over most of the modern data stack: infrastructure, databases, orchestration, data processing, AI/ML, and beyond.

Banco Bradesco

Teradata

Vantage scales in-database R/Python models to 70M clients. The customer analytics are transforming Bradesco into the bank of the future, scaling insights and accelerating time-to-value.
