Sat. Jul 13, 2019 - Fri. Jul 19, 2019

Getting started with the MongoDB Connector for Apache Kafka and MongoDB

Confluent

Together, MongoDB and Apache Kafka® make up the heart of many modern data architectures today. Integrating Kafka with external systems like MongoDB is best done through the use of Kafka Connect. This API enables users to leverage ready-to-use components that can stream data from external systems into Kafka topics, as well as stream data from Kafka topics into external systems.

MongoDB 21

Data Labeling That You Can Feel Good About With CloudFactory

Data Engineering Podcast

Summary: Successful machine learning and artificial intelligence projects require large volumes of data that is properly labelled. The challenge is that most data is not clean and well annotated, requiring a scalable data labeling process. Ideally this process can be done using the tools and systems that already power your analytics, rather than sending data into a black box.

How Analytics Answer the Most Challenging Business Questions

Teradata

Analytics can help enterprises answer the toughest business questions by leveraging all of the data across an organization.

Data 80

Educating Data Analysts at Scale: Cloudera Launches Modern Big Data Analysis with SQL on Coursera

Cloudera

At a time when machine learning, deep learning, and artificial intelligence capture an outsize share of media attention, jobs requiring SQL skills continue to vastly outnumber jobs requiring those more advanced skills. Influential data scientists often point to SQL as the most important yet underrated skill for anyone who works with data. SQL is today—and will remain for the foreseeable future—a vital foundational skill for a wide range of data professionals working in different roles across dif

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Bust the Burglars – Machine Learning with TensorFlow and Apache Kafka

Confluent

Have you ever realized that, according to the latest FBI report, more than 80% of all crimes are property crimes, such as burglaries? And that the FBI clearance figures indicate that only 13% of all burglaries in 2017 were cleared due to lack of witnesses and/or physical evidence? How cool would it be to build your own burglar alarm system that can alert you before the actual event takes place simply by using a few network-connected cameras and analyzing the camera images with Apache Kafka®,

Introduction to Streaming Data

Cloud Academy

Designing a streaming data pipeline presents many challenges, particularly around specific technology requirements. When designing a cloud-based solution, an architect is no longer faced with the question, “How do I get this job done with the technology we have?” but rather, “What is the right technology to support my use case?” In this blog post, we will walk through some initial scoping steps and then work through an example.

More Trending

Open Source: June Updates - New releases, continue to foster diversity and inclusion in tech

Zalando Engineering

Project Highlights: Kopf (Kubernetes Operator Pythonic Framework) now supports built-in resources and can be used to write controllers of any kind (pods, namespaces, mixed), not only for custom resources. Check out the latest release for more details [link]. Skipper publishes new releases weekly. Recently implemented features include support for proxying the Kubernetes API server and for Kubernetes externalName services from ingress.

AWS 52

Why Your Enterprise Needs a Hybrid Cloud Strategy

Cloudera

A combination of on-premises, public, and private cloud platforms and data centers describes the reality for today’s businesses. For some, this hybrid mix was born of rigorous planning or even a “cloud-first” mandate. For others, however, it evolved organically. Businesses merged, data centers ran out of room to expand, and departments made independent choices or engaged in shadow IT.

Cloud 41

SQL Query Planning for Operational Analytics

Rockset

Rockset is a schemaless SQL data platform. It is designed to support SQL on raw data. While most SQL databases are strongly and statically typed, data within Rockset is strongly but dynamically typed. Dynamic typing makes it difficult for us to adopt off-the-shelf SQL query optimizers since they are designed for statically typed data where the types of the columns are known ahead of time.

SQL 40