Sat. Sep 04, 2021 - Fri. Sep 10, 2021

Jellyfish: Cost-Effective Data Tiering for Uber’s Largest Storage System

Uber Engineering

Problem. Uber deploys a few storage technologies to store business data based on their application model. One such technology is called Schemaless, which enables the modeling of related entries in a single row of multiple columns, as well as … The post Jellyfish: Cost-Effective Data Tiering for Uber’s Largest Storage System appeared first on Uber Engineering Blog.

Value Proposition of the Cloudera Operational Database over Legacy Apache HBase Deployments

Cloudera

The CDP Operational Database (COD) builds on the foundation of existing operational database capabilities that were available with Apache HBase and/or Apache Phoenix in legacy CDH and HDP deployments. Within the context of a broader data and analytics platform implemented in the Cloudera Data Platform (CDP), COD will function as a highly scalable relational and non-relational transactional database, allowing users to leverage big data in operational applications as well as the backbone of the a

A View From The Round Table Of Gartner's Cool Vendors

Data Engineering Podcast

Summary: Gartner analysts are tasked each year with identifying promising companies that are making an impact in their respective categories. In the data management and analytics space, they recognized the efforts of Timbr.ai, Soda Data, Nexla, and Tada. In this episode, the founders and leaders of each of these organizations share their perspectives on the current state of the market and the challenges facing businesses and data professionals today.

Event Sourcing Outgrows the Database

Confluent

I’ve always found event sourcing to be fascinating. We spend so much of our lives as developers saving data in database tables—doing this in a completely different way seems almost […].

Decision Making at Netflix

Netflix Tech

Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, and Colin McFarland. This introduction is the first in a multi-part series on how Netflix uses A/B tests to make decisions that continuously improve our products, so we can deliver more joy and satisfaction to our members. Subsequent posts will cover the basic statistical concepts underpinning A/B tests, the role of experimentation across Netflix, how Netflix has invested in infrastructure to support and scale experimentation, a

Big Data 50: Companies Driving Innovation

DataKitchen

The post Big Data 50: Companies Driving Innovation first appeared on DataKitchen.

More Trending

Data Integration: Approaches, Techniques, Tools, and Best Practices for Implementation

AltexSoft

The pace of data being created is mind-blowing. For example, Amazon receives more than 66,000 orders per hour with each order containing valuable pieces of information for analytics. Yet, dealing with continuously growing volumes of data isn’t the only challenge businesses encounter on the way to better, faster decision-making. Information often resides across countless distributed data sources, resulting in data silos.

Data-Driven Performance Improvements: Basketball and actionable insights

Retail Insight

At the 1992 Olympics, the American men’s basketball team won the gold medal after years of disappointment and underperformance. For the first time at an Olympics, Team USA was composed of professional US National Basketball Association (NBA) players, including the legendary Michael Jordan. Since this ‘Dream Team’ was formed, the USA men’s basketball team has won seven golds in the last eight Olympics, including most recently at Tokyo 2020.

Spark vs Hive - What's the Difference

ProjectPro

Apache Hive and Apache Spark are two popular Big Data tools for complex data processing. To use them effectively, it is essential to understand their features and capabilities. This Spark vs. Hive comparison covers the two tools’ architecture, features, limitations, and key differences.

Supporting Transformation with an Integrated Data Platform. Three Common Questions Answered.

Cloudera

In recent years there has been increased interest in how to safely and efficiently extend enterprise data platforms and workloads into the cloud. CDOs are under increasing pressure to reduce costs by moving data and workloads to the cloud, similar to what has happened with business applications during the last decade. Our upcoming webinar is centered on how an integrated data platform supports the data strategy and goals of becoming a data-driven company.

Reflecting on Change.

Teradata

Change is inevitable, but you have to adapt to survive. Look back on the last 40 years to see how Teradata has adapted to change and has not only survived but thrived.

Jumpstart Your DataOps Program with DataKitchen’s Lean DataOps

DataKitchen

Adopting DataOps can be easy; by following DataKitchen's 'Lean DataOps' four-phase program, you can roll out DataOps in smaller, easy-to-manage increments. The post Jumpstart Your DataOps Program with DataKitchen’s Lean DataOps first appeared on DataKitchen.

Apache Superset™ Now Supports Rockset

Preset

Apache Superset™ now supports Rockset as a data source. Rockset is a real-time indexing database built for the cloud that uses RocksDB for fast storage.

#ClouderaLife Spotlight: Fanly Tanto, Regional Sales Director

Cloudera

Meet Fanly Tanto. Fanly is a Regional Sales Director operating out of Indonesia and the recent recipient of Channel Asia’s Women in ICT “Shining Star” Award – an award recognizing candidates with “a strong record of achievement and a consistent high performer who regularly achieves standout business results and continues to assume increased levels of seniority.”

Micro Frontends: Deep Dive into Rendering Engine (Part 2)

Zalando Engineering

Zalando's Fashion Store has been running on top of microservices for quite some time. This architecture has proven to be very flexible, and Project Mosaic has extended it, although only partially, to the frontend, allowing HTML fragments from multiple services to be stitched together and served as a single page. Fragments in Mosaic can be seen as the first step towards a Micro Frontends architecture.

Hello World: Join the New Rockset Developer Community

Rockset

At Rockset, we work hard to build developer tools (as well as APIs and SDKs) that allow you to easily consume semi-structured data using SQL and run sub-second queries on real-time data. You automatically get our Converged Index™, which unifies indexing, sub-second query latency on terabytes of nested data, real-time data ingestion with a data latency of mere seconds, and much more.

What is Operational Analytics?

Grouparoo

We've improved the Getting Started Experience! Check out our UI Configuration method. The steps that use the grouparoo generate command will not be replicable, as the command will be fully deprecated in v0.8.1. What is Operational Analytics? Operational analytics is the process of creating data pipelines and datasets to support business teams such as sales, marketing, and customer support.

Cloudera and NVIDIA Help IRS Fight Fraud, Safeguard Taxpayers

Cloudera

Across the federal government, agencies are struggling to identify, organize, analyze, and act on troves of data. It’s a problem that leaders are actively working to tackle, but they’re in a race against immeasurable volumes of data that are continuously generated, in perpetuity, in stores both known and unknown. At the Internal Revenue Service, decades’ worth of data exceeds even the most cutting-edge processing capabilities.

Slowly Changing Dimensions (SCD Type 1) with Delta and Databricks

Advancing Analytics: Data Engineering

From Warehouse to Lakehouse Pt. 1: SCD Type 1 in SQL and Python. With the move to cloud-based Data Lake platforms, there has often been criticism from the more traditional Data Warehousing community. A Data Lake, offering cheap, almost endlessly scalable storage in the cloud, is hugely appealing to a platform administrator; however, over the years that this approach has been promoted, some adopters have fallen victim to the infamous Data Swamp.

Taking Pride in Our Actions

Teradata

Corporate responsibility may have a new name, but Teradata’s commitments continue to shine. Read Claire Bramley and Molly Treese’s overview of Teradata’s dedicated ESG efforts.

Data Engineering Annotated Monthly – August 2021

Big Data Tools

August is usually a quiet month, with vacations taking their toll. But data engineering never stops. I’m Pasha Finkelshteyn and I will be your guide through this month’s news, my impressions of the developments, and ideas from the wider community. If you think I missed something worthwhile, ping me on Twitter and suggest a topic, link, or anything else.

20 Web Scraping Project Ideas for 2023

ProjectPro

In this article, you will find a list of interesting web scraping projects that are fun and easy to implement. The list includes worthwhile projects for both beginners and intermediate professionals, divided into categories so that you can quickly pick one that fits your requirements.

See Rockset’s Rollups for Streaming Data at Kafka Summit 2021

Rockset

Event stream processing has lately become the most-requested feature among data practitioners, who are constantly pushed by their business counterparts for fresher, real-time insights to improve operational decisions and boost the digital customer experience. But while streaming data is easy, analyzing it in real time was, until recently, too expensive and too slow.

Welcome, KC!

Grouparoo

The promise of open source is one of community. It is about people making great things together. With that in mind, maybe it's not surprising that we first met KC Glick years ago when he contributed to the Actionhero project that is at the core of Grouparoo. Now, he's on the Grouparoo team and will be contributing throughout the stack. KC comes to us most recently from iHeart, the media company that runs all those stations we listen to.

Building an Open-source Ingestion Layer with Airbyte

Preset

To build an open-source community tracker, we first build an ingestion layer with Airbyte.

Top 15 Machine Learning Use Cases in 2023

ProjectPro

The Machine Learning market is anticipated to be worth $30.6 Billion in 2024. The world is increasingly driven by Internet of Things (IoT) and Artificial Intelligence (AI) solutions, and machine learning plays a vital role in their design and development. Machine learning is everywhere. We live in an era led by machine learning applications, be it the voice assistants on our smartphones, the Face Unlock feature, surge pricing on ride-hailing apps, email filtering, and m

RudderStack Product News Vol. #012 - Call for Beta Users

RudderStack

In this update, we cover the S3 Data Lake destination, our Braze Currents source, and other new integrations.

Reverse ETL and Data Observability: Solving Data’s “Last Mile” Problem

Monte Carlo

Modern data teams have all the right solutions in place to ensure that data is ingested, stored, transformed, and loaded into their data warehouse, but what happens at “the last mile?” In other words, how can data analysts and engineers ensure that transformed, actionable data is actually available to access and use? Here’s where Reverse ETL and Data Observability can help teams go the extra mile when it comes to trusting their data products.

AWS vs GCP - Which One to Choose in 2023?

ProjectPro

Are you confused about choosing the best cloud platform for your next data engineering project? This AWS vs. GCP blog compares the two major cloud platforms to help you choose the best one. So, are you ready to explore the differences between the two cloud giants, AWS and Google Cloud? Let’s get started!
