Sat.Aug 17, 2019 - Fri.Aug 23, 2019

article thumbnail

Nothing but NumPy: Understanding & Creating Neural Networks with Computational Graphs from Scratch

KDnuggets

Entirely implemented with NumPy, this extensive tutorial provides a detailed review of neural networks followed by guided code for creating one from scratch with computational graphs.

Coding 123
article thumbnail

Building the New Uber Freight App as Lists of Modular, Reusable Components

Uber Engineering

As Uber Freight marked its second anniversary, we went back to the drawing board to redesign its app. The original carrier app was successful for owner-operators with one or two drivers, but it wasn’t optimized for larger fleets—feedback we … The post Building the New Uber Freight App as Lists of Modular, Reusable Components appeared first on Uber Engineering Blog.

Building 110
article thumbnail

Building Transactional Systems Using Apache Kafka

Confluent

Traditional relational database systems are ubiquitous in software systems. They are surrounded by a strong ecosystem of tools, such as object-relational mappers and schema migration helpers. Relational databases also provide strong guarantees in the form of ACID transactions, which are loved by developers for their all-or-nothing semantics. Today’s businesses, however, want to process ever-increasing amounts of data.

Kafka 22
article thumbnail

A High Performance Platform For The Full Big Data Lifecycle

Data Engineering Podcast

Summary Managing big data projects at scale is a perennial problem, with a wide variety of solutions that have evolved over the past 20 years. One of the early entrants that predates Hadoop and has since been open sourced is the HPCC (High Performance Computing Cluster) system. Designed as a fully integrated platform to meet the needs of enterprise grade analytics it provides a solution for the full lifecycle of data at massive scale.

Big Data 100
article thumbnail

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

article thumbnail

Top Handy SQL Features for Data Scientists

KDnuggets

Whenever we hear "data," the first thing that comes to mind is SQL! SQL comes with easy and quick to learn features to organize and retrieve data, as well as perform actions on it in order to gain useful insights.

SQL 123
article thumbnail

Applying Netflix DevOps Patterns to Windows

Netflix Tech

Baking Windows with Packer By Justin Phelps and Manuel Correa Customizing Windows images at Netflix was a manual, error-prone, and time consuming process. In this blog post, we describe how we improved the methodology, which technologies we leveraged, and how this has improved service deployment and consistency. Artisan Crafted Images In the Netflix full cycle DevOps culture the team responsible for building a service is also responsible for deploying, testing, infrastructure, and operation of t

AWS 84

More Trending

article thumbnail

A Guide to the Confluent Verified Integrations Program

Confluent

When it comes to writing a connector, there are two things you need to know how to do: how to write the code itself, and helping the world know about your new connector. This post specifically outlines the process by which we verify partner integrations, and is a means of letting the world know about our partner’s contributions to our connector ecosystem.

article thumbnail

Order Matters: Alibaba’s Transformer-based Recommender System

KDnuggets

Alibaba, the largest e-commerce platform in China, is a powerhouse not only when it comes to e-commerce, but also when it comes to recommender systems research. Their latest paper, Behaviour Sequence Transformer for E-commerce Recommendation in Alibaba, is yet another publication that pushes the state of the art in recommender systems.

Systems 123
article thumbnail

How We Reduced DynamoDB Costs by Using DynamoDB Streams and Scans More Efficiently

Rockset

Many of our users implement operational reporting and analytics on DynamoDB using Rockset as a SQL intelligence layer to serve live dashboards and applications. As an engineering team, we are constantly searching for opportunities to improve their SQL-on-DynamoDB experience. For the past few weeks, we have been hard at work tuning the performance of our DynamoDB ingestion process.

Bytes 52
article thumbnail

Announcing Bottom Navigator

Pandora Engineering

An Android Multiple Backstack Bottom Navigation Library Pandora’s latest mobile redesign brings the bottom navigation pattern to our apps. Bottom navigation has become a popular design choice for many apps due to its many advantages including easy one-handed use and enhanced discoverability of top app destinations. When Pandora embarked on this project our designers had a clear vision of how navigation should work, a vision that in many ways is familiar to users of other popular apps like Instag

article thumbnail

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

Data is Not the New Oil. Data is Water!

Teradata

If you work in data analytics or a related field, you’ve probably heard the mantra that data is the new oil. But data is not oil, it's water. Find out why.

Data 45
article thumbnail

Is Kaggle Learn a “Faster Data Science Education?”

KDnuggets

Kaggle Learn is "Faster Data Science Education," featuring micro-courses covering an array of data skills for immediate application. Courses may be made with newcomers in mind, but the platform and its content is proving useful as a review for more seasoned practitioners as well.

Education 123
article thumbnail

Building the New Uber Freight App as Lists of Modular, Reusable Components

Uber Engineering

As Uber Freight marked its second anniversary, we went back to the drawing board to redesign its app. The original carrier app was successful for owner-operators with one or two drivers, but it wasn’t optimized for larger fleets—feedback we … The post Building the New Uber Freight App as Lists of Modular, Reusable Components appeared first on Uber Engineering Blog.

article thumbnail

The Kafka Connect Plugin for Rockset and How It Works

Rockset

Rockset continuously ingests data streams from Kafka, without the need for a fixed schema, and serves fast SQL queries on that data. We created the Kafka Connect Plugin for Rockset to export data from Kafka and send it to a collection of documents in Rockset. Users can then build real-time dashboards or data APIs on top of the data in Rockset. This blog covers how we implemented the plugin.

Kafka 40
article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

Deep Learning for NLP: Creating a Chatbot with Keras!

KDnuggets

Learn how to use Keras to build a Recurrent Neural Network and create a Chatbot! Who doesn’t like a friendly-robotic personal assistant?

article thumbnail

Detecting stationarity in time series data

KDnuggets

Explore how to determine if your time series data is generated by a stationary process and how to handle the necessary assumptions and potential interpretations of your result.

Data 123
article thumbnail

Understanding Decision Trees for Classification in Python

KDnuggets

This tutorial covers decision trees for classification also known as classification trees, including the anatomy of classification trees, how classification trees make predictions, using scikit-learn to make classification trees, and hyperparameter tuning.

Python 122
article thumbnail

An Overview of Python’s Datatable package

KDnuggets

Modern machine learning applications need to process a humongous amount of data and generate multiple features. Python’s datatable module was created to address this issue. It is a toolkit for performing big data (up to 100GB) operations on a single-node machine, at the maximum possible speed.

Big Data 122
article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

Math for Programmers

KDnuggets

Math for Programmers teaches you the math you need to know for a career in programming, concentrating on what you need to know as a developer.

article thumbnail

Gender Diversity in AI Research

KDnuggets

Through an analysis of 1.5M papers from arXiv, this study reviews the evolution of gender diversity across disciplines, countries, and institutions as well as the semantic differences between AI papers with and without female co-authors.

110
110
article thumbnail

Proptech and the proper use of technology for house sales prediction

KDnuggets

Using the ATTOM dataset, we extracted data on sales transactions in the USA, loans, and estimated values of property. We developed an optimal prediction model from correlations in the time and status of ownership as well as the time of the year of sales fluctuations.

article thumbnail

Automate Stacking In Python: How to Boost Your Performance While Saving Time

KDnuggets

Utilizing stacking (stacked generalizations) is a very hot topic when it comes to pushing your machine learning algorithm to new heights. For instance, most if not all winning Kaggle submissions nowadays make use of some form of stacking or a variation of it.

Python 104
article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

Crafting an Elevator Pitch for your Data Science Startup

KDnuggets

If you are launching a data science startup, these tips will give you a head start as you seek capital for seed funding or your next level of growth.

article thumbnail

Manual Coding or Automated Data Integration – What’s the Best Way to Integrate Your Enterprise Data?

KDnuggets

What’s the best way to execute your data integration tasks: writing manual code or using ETL tool? Find out the approach that best fits your organization’s needs and the factors that influence it.

article thumbnail

Which skills / knowledge areas do you currently have, and which do you want to add or improve?

KDnuggets

New KDnuggets survey looks to find out what skills our readers currently use, and which they are looking to add or improve. Take a few minutes to participate.

article thumbnail

Comparing Decision Tree Algorithms: Random Forest vs. XGBoost

KDnuggets

Check out this tutorial walking you through a comparison of XGBoost and Random Forest. You'll learn how to create a decision tree, how to do tree bagging, and how to do tree boosting.

article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

Artificial Intelligence Is Not Intelligence – Interview With Andy Cotgreave (Keynote Speaker at Crunch Conf)

KDnuggets

Crunch is coming to Budapest, Hungary on 16-18 Oct. Use code KDNuggets to save on Data Science, Data Engineering, or BI tracks. But first, read this interview with keynote speaker Andy Cotgreave.

BI 85
article thumbnail

eBook: How to Enhance Privacy in Data Science

KDnuggets

Check out this eBook, How to Enhance Privacy in Data Science, to equip yourself with the tools to enhance privacy in data science, including transforming data in a manner that protects the privacy, an overview of the challenges and opportunities of privacy-aware analytics, and more.

article thumbnail

How LinkedIn, Uber, Lyft, Airbnb and Netflix are Solving Data Management and Discovery for Machine Learning Solutions

KDnuggets

As machine learning evolves, the need for tools and platforms that automate the lifecycle management of training and testing datasets is becoming increasingly important. Fast growing technology companies like Uber or LinkedIn have been forced to build their own in-house data lifecycle management solutions to power different groups of machine learning models.

article thumbnail

Lincoln Clean Energy: Director, Asset Performance [Austin, TX]

KDnuggets

Seeking an Asset Performance Director, a role which requires an individual that possesses a strong technical skill set and the ability to communicate findings effectively throughout the organization.

73
article thumbnail

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.