Thu.May 02, 2024

Containerize Python Apps with Docker in 5 Easy Steps

KDnuggets

Get up and running with Docker with this tutorial on containerizing Python applications.

How to build a data team

Christophe Blefari

My personal collection of the best resources to bootstrap a data team and get inspired by what others are doing.

Getting Started with PyTest: Effortlessly Write and Run Tests in Python

KDnuggets

Exploring the Test-Driven Development Paradigm in Python
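
As a rough illustration of the test-first loop the article explores, here is a minimal pytest-style module. The `apply_discount` function and its rules are hypothetical. pytest collects functions whose names start with `test_` and checks plain `assert` statements. The exception test uses `try`/`except` so the file also runs as plain Python; with pytest installed, `pytest.raises` is the more idiomatic choice.

```python
# test_pricing.py: hypothetical example of a TDD cycle.
# Write the tests first, watch them fail, then add the smallest
# implementation that makes them pass.

def apply_discount(price: float, percent: float) -> float:
    """Return `price` reduced by `percent` (0-100), rounded to cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def test_applies_percentage_discount():
    assert apply_discount(200.0, 25) == 150.0


def test_zero_discount_returns_original_price():
    assert apply_discount(19.99, 0) == 19.99


def test_rejects_out_of_range_percent():
    try:
        apply_discount(10.0, 150)
    except ValueError:
        pass  # expected: invalid input is rejected
    else:
        raise AssertionError("expected ValueError for percent > 100")
```

Running `pytest test_pricing.py` would collect and run the three `test_` functions.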

Moving Beyond MTEB and BEIR: Snowflake AI Research Joins Forces with the University of Waterloo to Evolve RAG and Retrieval Benchmarks

Snowflake

To accurately answer business questions using LLMs, companies must augment models with their data. Retrieval Augmented Generation (RAG) is a popular solution to this problem, as it integrates the organization’s factual, real-time data into the prompt for the LLM. While the adoption of RAG has increased, an open question remains: How do enterprises know how effective their system is?
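
To make the "retrieve, then augment the prompt" idea concrete, here is a minimal sketch under stated assumptions. The documents are hypothetical, and bag-of-words cosine similarity stands in for the embedding models and vector search that real RAG systems typically use. The LLM call itself is omitted, and the sketch only builds the augmented prompt.

```python
import math
import re
from collections import Counter

# Hypothetical in-house documents standing in for an organization's data.
DOCUMENTS = [
    "Q1 revenue grew 12 percent, driven by the EMEA region.",
    "The refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm on weekdays.",
]


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str], k: int = 1) -> str:
    """Augment the question with retrieved context before it goes to an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

For example, `build_prompt("How many days do customers have for returns?", DOCUMENTS)` places the refund-policy document in the context, because it is the only one sharing terms ("days", "returns") with the question.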

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Revolutionizing Data in Sports: The Game-Changing Impact of Databricks Marketplace and Delta Sharing

databricks

Unlock the power of advanced sports analytics with Databricks Marketplace and Delta Sharing. Discover how these platforms are transforming the sports industry by enabling seamless data access, collaboration, and real-time insights. Leverage a diverse array of data assets to optimize performance, enhance fan engagement, and gain a competitive edge. Explore the future of sports analytics, powered by Databricks.

Reading and Processing JSON with Rust vs Python.

Confessions of a Data Guy

Have you ever wondered about being explicit in your code versus being vague? I think about this a lot, since I write code every day. Most of the time, I've found I prefer being explicit and verbose rather than vague about what I'm doing. When it comes to debugging […]
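
The post's own Rust and Python code is not reproduced here. As a minimal Python-only sketch of the explicit-versus-vague contrast, assume a hypothetical `Device` payload: the vague version passes a raw dict around and trusts its keys, while the explicit version declares the shape once and validates it where the JSON enters the program.

```python
import json
from dataclasses import dataclass

# Hypothetical payload used for illustration.
RAW = '{"id": 7, "name": "sensor-a", "readings": [1.5, 2.0, 2.5]}'


# Vague: pass a dict around and hope the keys and types are as expected.
def mean_reading_vague(payload: dict):
    return sum(payload["readings"]) / len(payload["readings"])


# Explicit: declare the shape once and validate it at the boundary.
@dataclass(frozen=True)
class Device:
    id: int
    name: str
    readings: list[float]

    @classmethod
    def from_json(cls, text: str) -> "Device":
        data = json.loads(text)
        if not isinstance(data, dict):
            raise ValueError("top-level JSON value must be an object")
        if not isinstance(data.get("id"), int):
            raise ValueError("'id' must be an integer")
        if not isinstance(data.get("name"), str):
            raise ValueError("'name' must be a string")
        readings = data.get("readings")
        if not isinstance(readings, list) or not all(
            isinstance(r, (int, float)) for r in readings
        ):
            raise ValueError("'readings' must be a list of numbers")
        return cls(id=data["id"], name=data["name"],
                   readings=[float(r) for r in readings])

    def mean_reading(self) -> float:
        # An empty list yields 0.0 instead of a ZeroDivisionError.
        return sum(self.readings) / len(self.readings) if self.readings else 0.0
```

A malformed payload fails immediately in `Device.from_json` with a message naming the bad field, rather than surfacing later as a `KeyError` or `TypeError` deep inside the vague version.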

More Trending

Top 8 Snowflake Marketplace Questions, Answered

Snowflake

Snowflake Marketplace is designed to give customers and organizations a place to easily find, try and buy data, apps and AI products that help solve their most pressing business problems. We have more than 540 providers, offering over 2,400 live, ready-to-use data products (as of Jan 31, 2024), so there are many options to help you enrich your own data resources, build new data apps and leverage the power of AI on Snowflake.

What is Machine Learning and Why It Matters: Everything You Need to Know

Knowledge Hut

If you are a machine learning enthusiast and stay in touch with the latest developments, you will surely have come across the news "Machine learning identifies links between the world's oceans." We all know how complex it is to analyse something like the oceans and their behaviour, which undoubtedly involves billions of data points associated with many critical parameters such as wind velocities, temperatures, the earth's rotation, and more.

How Uber Serves Over 40 Million Reads Per Second from Online Storage Using an Integrated Cache

Uber Engineering

Learn how Uber serves over 40 million reads per second from its in-house, distributed database built on top of MySQL using an integrated caching solution: CacheFront.
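
CacheFront's internals are described in the Uber post and are not reproduced here. As a generic illustration of the underlying idea, a cache integrated in front of a storage engine, here is a minimal single-process read-through cache in Python. The class name, the TTL, and the invalidate-on-write policy are assumptions. A production system would also need to handle concurrency, distributed invalidation, and consistency.

```python
import time
from typing import Callable, Optional


class ReadThroughCache:
    """Serve reads from memory, falling back to the backing store on a miss."""

    def __init__(self, load: Callable[[str], Optional[str]], ttl_seconds: float = 60.0):
        self._load = load  # reads one key from the storage engine
        self._ttl = ttl_seconds
        # key -> (cached value, expiry time on the monotonic clock)
        self._entries: dict[str, tuple[Optional[str], float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read from storage and repopulate the cache.
        # Missing keys (None) are cached too, which avoids repeated lookups.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a cached entry, e.g. after the row is written in storage."""
        self._entries.pop(key, None)
```

With a plain dict as a stand-in database, `ReadThroughCache(db.get)` serves the first `get` from `db` and later ones from memory until the entry expires or `invalidate` is called after a write.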

How to install Apache Spark on Windows?

Knowledge Hut

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

How to Supercharge Your Python Classes with Class Methods

Towards Data Science

Four advanced tricks to give your data science and machine learning classes the edge you never knew they needed.
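
The article's four tricks are not reproduced here. One common use of `@classmethod` is the alternative constructor, sketched below with a hypothetical `Dataset` class. Because the method receives `cls` rather than a hard-coded class name, subclasses inherit constructors that return instances of the subclass.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Dataset:
    """A tiny tabular container (hypothetical, for illustration)."""
    columns: list[str]
    rows: list[list[str]]

    @classmethod
    def from_csv(cls, text: str) -> "Dataset":
        """Alternative constructor: parse CSV text, using the first row as the header."""
        parsed = list(csv.reader(io.StringIO(text)))
        if not parsed:
            return cls(columns=[], rows=[])
        header, *body = parsed
        return cls(columns=header, rows=body)

    @classmethod
    def from_records(cls, records: list[dict[str, str]]) -> "Dataset":
        """Alternative constructor: build from dicts that share the same keys."""
        columns = list(records[0]) if records else []
        return cls(columns=columns, rows=[[r[c] for c in columns] for r in records])


class LabeledDataset(Dataset):
    """Subclass inherits both constructors, and they return LabeledDataset."""
```

`LabeledDataset.from_csv("a\n1\n")` returns a `LabeledDataset`, which a `@staticmethod` that hard-codes `Dataset(...)` would not.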

The Role of Mathematics in Machine Learning

Knowledge Hut

Automation and machine learning have changed our lives. From the most technologically savvy people working at leading digital platform companies like Google or Facebook to those who simply use a smartphone, very few have not been impacted by artificial intelligence or machine learning in some form or other: through social media, smart banking, healthcare, or even Uber.

Scaling AI/ML Infrastructure at Uber

Uber Engineering

Accelerating Tomorrow: How Uber Turbocharges AI/ML Frontiers.

Why Working Remotely is an Issue with IT Managers?

Knowledge Hut

Today's workplaces are stretching their flexibility to accommodate the needs of professionals. Globally distributed offices have also made flexible workplaces the norm. Remote work is a trend that cuts across industries. While it comes with its own set of benefits, it isn't well suited to some industries or professions.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

DragonCrawl: Generative AI for High-Quality Mobile Testing

Uber Engineering

Learn how Uber improved mobile testing reliability and increased productivity for thousands of engineers by using machine learning to create DragonCrawl, a highly stable, low-maintenance testing system.

Powerful Tips for Writing the Best User Stories in Scrum

Knowledge Hut

The main reason most projects move to Agile is that teams want to see results fast. Those results cannot be achieved quickly without clarity on the outcome, and this is where the user story comes in. You might also find it interesting to go through some user story examples. User stories are like mini, single-line business requirements that tell you who a feature is for, why it is needed, and what to develop.

Building Scalable, Real-Time Chat to Improve Customer Experience

Uber Engineering

Innovatively scaling its chat channel, Uber’s Customer Obsession Team enhanced global support by transitioning 36% of contact volume to chat, leveraging a new architecture that slashed error rates from 46% to 0.45%, showcasing a significant leap in efficiency and customer satisfaction.

Role of HR in the Post-COVID Work Environment

Knowledge Hut

A study published recently in the Journal of Applied Psychology found that "the pandemic has resulted in people getting more stressed and less engaged at work." The COVID era has brought the shortcomings of the traditional workplace to the fore. Organizations are relying on HR to deal with new-age disruptions such as lack of engagement, employee retention, and motivation.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Customer Master Data 101: Challenges and Solutions

Precisely

In the digital era, your data is a crucial key to operational success – and the strategic importance of SAP customer master data can’t be overstated. When it comes to customer-related transactions and analytics, your data’s integrity, accuracy, and accessibility directly impact your business’s ability to operate efficiently and deliver value to customers.

Apache Spark vs MapReduce: A Detailed Comparison

Knowledge Hut

Why We Need Big Data Frameworks: Big data is primarily defined by the volume of a data set. Big data sets are generally huge, measuring tens of terabytes and sometimes crossing the threshold of petabytes. It is surprising how much data is generated every minute. As estimated by DOMO, over 2.5 quintillion bytes of data are created every single day, and that figure is only going to grow.

Migrating a Trillion Entries of Uber’s Ledger Data from DynamoDB to LedgerStore

Uber Engineering

Migrating money data with peace of mind. Learn how Uber flawlessly moved its money-related data, spanning trillions of rows and petabytes in size.

Selenium vs Testcomplete: A Quick Comparison

Knowledge Hut

Test automation is one of the most cost-effective and time-saving methods to test software products with long maintenance cycles. TestComplete and Selenium are the two most important automation testing tools which provide an open platform for you to easily build continuous testing frameworks to test non-stop with a lightweight execution engine and distributed testing.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.

How LedgerStore Supports Trillions of Indexes at Uber

Uber Engineering

Learn how Uber presents a consistent view of distributed financial data across earners, spenders, and merchants, powered by indexes in Uber's homegrown ledger-style database, LedgerStore.

Apache Spark Use Cases & Applications

Knowledge Hut

Apache Spark was developed by a team at UC Berkeley in 2009. Since then, it has seen a very high adoption rate among top-notch technology companies such as Google, Facebook, Apple, and Netflix, and demand keeps increasing. According to a marketanalysis.com survey, the worldwide Apache Spark market will grow at a CAGR of 67% between 2019 and 2022.

Uber: GC Tuning for Improved Presto Reliability

Uber Engineering

Want to improve the reliability of your Presto cluster with just a few lines of code? Come read how we reduced errors by 90% through improving garbage collection.

Fatal Mistakes IT Professionals Make While Transitioning Between Teams

Knowledge Hut

These days, it's very common for companies to shuffle teams and move people around depending on where they are needed or where the company is short-handed. One of the major challenges this creates is effective team building. While companies face the challenge of team building, individuals have their own issue to deal with: fitting in.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Model Excellence Scores: A Framework for Enhancing the Quality of Machine Learning Systems at Scale

Uber Engineering

With the introduction of Model Excellence Scores at Uber, we’re setting a new standard for measuring, monitoring, and maintaining ML model quality–read how this innovative approach aims to enhance ML governance and provide clearer insights.

How to Install Spark on Ubuntu: An Instructional Guide

Knowledge Hut

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

Ensuring Precision and Integrity: A Deep Dive into Uber’s Accounting Data Testing Strategies

Uber Engineering

Get behind-the-scenes access to Uber’s financial finesse. Explore Uber’s commitment to flawless financials with data-driven excellence.

Docker Vs Virtual Machines (VMs)

Knowledge Hut

Let's have a quick warm-up on resource management before we dive into the discussion of virtualization and Docker. In today's multi-technology environments, it is inevitable to work on different software and hardware platforms simultaneously. The need to run multiple different machines and platforms (desktops, laptops, handhelds, and servers) with customized hardware and software requirements has given rise to a new world of virtualization in the IT industry.

Introducing CDEs to Your Enterprise

Explore how enterprises can enhance developer productivity and onboarding by adopting self-hosted Cloud Development Environments (CDEs). This whitepaper highlights the simplicity and flexibility of cloud-based development over traditional setups, demonstrating how large teams can leverage economies of scale to boost efficiency and developer satisfaction.