SlideShare a Scribd company logo
1 of 87
Daniel Jacobson
• @daniel_jacobson
• http://www.linkedin.com/in/danieljacobson
• http://www.slideshare.net/danieljacobson
Sangeeta Narayanan
• @sangeetan
• http://www.linkedin.com/in/sangeetanarayanan/
I have added more detail in the
notes field for each slide to provide
additional context
Strategy
Lessons
Implementation
Lessons
Know Your Audience
Target Audience Dictates
Everything Else
The target audience should be
the single biggest influence on
your API design
Small Set of Known Developers
SSKDs
Large Set of Unknown Developers
LSUDs
Both
SSKDs and LSUDs
No matter what…
Figure this out first!
Target Audience Influence
• Team Identity
• Staffing Decisions
• System Architecture
• SLAs
• Development Velocity
• Security Needs
Netflix API : Key Responsibilities
2008
• Broker data between internal services and
public developers
• Grow community of public developers
• Optimize design for reusability
Evangelists
Partner
Engagement
and Support
API
Engineers
Technical
Writer
QA
Specialists
Private API Public API
< 0.3% of total
API traffic *
* 11 years worth of public API requests = one day of private API requests
Netflix API : Key Responsibilities
Today
• Broker data between services and devices
• System resiliency
• Scaling the system
• High velocity development
• Insights
The consumers of the API are now
Netflix subscribers
We are now responsible for ensuring
subscribers can stream
Application
Engineers
Platform
Engineers
Technical
Writer
Tools and
Automation
Engineers
Team is now 6x its
size from 2010
Separation of Concerns
Primary Responsibilities of APIs
• Data Gathering
– Retrieving the requested data from one or many local
or remote data sources
• Data Formatting
– Preparing a structured payload to the requesting
agent
• Data Delivery
– Delivering the structured payload to the requesting
agent
There are two players in APIs
API Provider API Consumer
API Provider
PROVIDES
API Consumer
CONSUMES
Traditional API Interactions
API Provider
PROVIDES
EVERYTHING
API Consumer
CONSUMES
Everything means, API Provider does:
• Data Gathering
• Data Formatting
• Data Delivery
• (among other things)
Traditional API Interactions
Why do most API providers provide
everything?
• Many APIs have a large set of unknown and
external developers
• Generic API design tends to be easier for
teams closer to the source
• Centralized API functions makes them easier
to support
Why do most API providers provide
everything?
• Many APIs have a large set of unknown and
external developers
• Generic API design tends to be easier for
teams closer to the source
• Centralized API functions makes them easier
to support
Data Gathering Data Formatting Data Delivery
API Consumer
Don’t care how data is
gathered, as long as it
is gathered
Each consumer cares a
lot about the format
for that specific use
Each consumer cares a
lot about how payload
is delivered
API Provider
Care a lot about how
the data is gathered
Only cares about the
format to the extent it
is easy to support
Only cares that the
delivery method is
easy to support
Separation of Concerns
To be a better provider, the API should address the
separation of concerns of the three core functions
One Size Doesn’t Fit All
Embrace the Differences
Data Gathering Data Formatting Data Delivery
API Consumer
Don’t care how data is
gathered, as long as it
is gathered
Each consumer cares a
lot about the format
for that specific use
Each consumer cares a
lot about how payload
is delivered
API Provider
Care a lot about how
the data is gathered
Only cares about the
format to the extent it
is easy to support
Only cares that the
delivery method is
easy to support
Separation of Concerns
Screen Real Estate
Controllers
Technical Capabilities
Resource-Based API
vs.
Experience-Based API
Resource-Based Requests
• /users/<id>/ratings/title
• /users/<id>/queues
• /users/<id>/queues/instant
• /users/<id>/recommendations
• /catalog/titles/movie
• /catalog/titles/series
• /catalog/people
REST API
RECOMME
NDATIONS
MOVIE
DATA
SIMILAR
MOVIES
AUTH
MEMBER
DATA
A/B
TESTS
START-
UP
RATINGS
Network Border Network Border
Experience-Based Requests
• /ps3/homescreen
JAVA API
Network Border Network Border
RECOMME
NDATIONS
MOVIE
DATA
SIMILAR
MOVIES
AUTH
MEMBER
DATA
A/B
TESTS
START-
UP
RATINGS
Client Adapter Code
Be Pragmatic, Not Dogmatic
Common API Debates
• XML / JSON
• REST / SOAP
• OAuth / Other
• Versioning
• Hypermedia
Who Cares!?!?
Just Solve Problems for your
Audience
Embrace Change
Impermanence and Versionless APIs
v1.0
v1.5
v2.0
Versioning for APIs
1.0
1.5
2.0
3.0?
4.0?
5.0?
2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
Eliminate Versioning?
1.0
1.5
2.0
New Architecture
2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
JAVA API
Network Border Network Border
RECOMME
NDATIONS
MOVIE
DATA
SIMILAR
MOVIES
AUTH
MEMBER
DATA
A/B
TESTS
START-
UP
RATINGS
Act Fast, React Fast
Favor Velocity Over Completeness
Delivery Using Buckets
Testing
Production Traffic
Old Code (Baseline) New Code (Canary)
~1% Traffic
Deployments
Old Code New Code
Production Traffic
Enable Others to
Act Fast, React Fast
JAVA API
Network Border Network Border
RECOMME
NDATIONS
MOVIE
DATA
SIMILAR
MOVIES
AUTH
MEMBER
DATA
A/B
TESTS
START-
UP
RATINGS
Dynamically deployed
endpoints
Statically deployed
libraries
Dynamically deployed
endpoints
Dependency Canaries
Personalization
Service
API
• Build
• Test
• Deploy Service
• Release Lib
Pers.
Lib
• Integrate Lib
• Build
• Test
• Deploy Service
UI Script
Iterations in Hours or Days
Access
Data
Personalization
Service
API
• Build
• Test
• Deploy Service
• Release Lib
• Publish to API
Pers.
Lib
UI Script
Iterations in Minutes?
Access
Data
• Integrate Lib
• Build
• Test
• Deploy Service
Internal Developers Need
Engagement Too
Documentation
Tools
REPL:
Trainings
Failure is Inevitable
~5,000,000,000
Requests per day
~35
Dependencies
~600
Libraries
Things will break!
Scale at All Costs
-
10
20
30
40
50
60
June, 2010 June, 2011 June, 2012
RequestsinBillions
API Requests Per Month
Incoming Traffic
Predictive Auto Scaling
Predicted vs. Actual RPS
Reactive + Predictive Autoscaling
No. of instances
1. Know Your Audience
2. Separation of Concerns
3. One Size Doesn’t Fit All
4. Be Pragmatic, Not
Dogmatic
5. Embrace Change
1. Act Fast, React Fast
2. Enable Others to Act
Fast, React Fast
3. Internal Developers
Need Engagement Too
4. Failure is Inevitable
5. Scale at All Costs
http://github.com/Netflix
Daniel Jacobson
• @daniel_jacobson
• http://www.linkedin.com/in/danieljacobson
• http://www.slideshare.net/danieljacobson
Sangeeta Narayanan
• @sangeetan
• http://www.linkedin.com/in/sangeetanarayanan/

More Related Content

What's hot

Performance Testing from Scratch + JMeter intro
Performance Testing from Scratch + JMeter introPerformance Testing from Scratch + JMeter intro
Performance Testing from Scratch + JMeter introMykola Kovsh
 
UX and Usability Workshop Southampton Solent University
UX and Usability Workshop Southampton Solent University UX and Usability Workshop Southampton Solent University
UX and Usability Workshop Southampton Solent University Dr.Mohammed Alhusban
 
USER ACCEPTANCE TESTING
USER ACCEPTANCE TESTINGUSER ACCEPTANCE TESTING
USER ACCEPTANCE TESTINGKADARI SHIVRAJ
 
High-throughput and Automated Process Development for Accelerated Biotherapeu...
High-throughput and Automated Process Development for Accelerated Biotherapeu...High-throughput and Automated Process Development for Accelerated Biotherapeu...
High-throughput and Automated Process Development for Accelerated Biotherapeu...KBI Biopharma
 
Quality by Design
Quality by DesignQuality by Design
Quality by Designmahesh745
 
Software Testing Services
Software Testing ServicesSoftware Testing Services
Software Testing ServicesFuad Mak
 
Performance testing : An Overview
Performance testing : An OverviewPerformance testing : An Overview
Performance testing : An Overviewsharadkjain
 
Load and performance testing
Load and performance testingLoad and performance testing
Load and performance testingQualitest
 
Testing Centre of Excellence Model 2016
Testing Centre of Excellence Model 2016Testing Centre of Excellence Model 2016
Testing Centre of Excellence Model 2016Tony Barber
 
QBD for Downstream Virus Filtration
QBD for Downstream Virus Filtration QBD for Downstream Virus Filtration
QBD for Downstream Virus Filtration Merck Life Sciences
 
Clinical Data Management
Clinical Data ManagementClinical Data Management
Clinical Data ManagementMahesh Koppula
 
Commercializing antibody-drug conjugates: a CMO’s journey
Commercializing antibody-drug conjugates: a CMO’s journeyCommercializing antibody-drug conjugates: a CMO’s journey
Commercializing antibody-drug conjugates: a CMO’s journeyMerck Life Sciences
 
Clinical Data Management Process Overview_Katalyst HLS
Clinical Data Management Process Overview_Katalyst HLSClinical Data Management Process Overview_Katalyst HLS
Clinical Data Management Process Overview_Katalyst HLSKatalyst HLS
 
Product lifecycle management in the pharmaceutical industry
Product lifecycle management in the pharmaceutical industryProduct lifecycle management in the pharmaceutical industry
Product lifecycle management in the pharmaceutical industryGeorgi Daskalov
 
Load Testing Strategy 101
Load Testing Strategy 101Load Testing Strategy 101
Load Testing Strategy 101iradari
 
Quality Metrics in Pharmaceuticals
Quality Metrics in PharmaceuticalsQuality Metrics in Pharmaceuticals
Quality Metrics in PharmaceuticalsPiyush Tripathi
 
RIM - regulatory information management
RIM - regulatory information managementRIM - regulatory information management
RIM - regulatory information managementV E R A
 
Quality Assurance/Testing Overview & Capability Deck
Quality Assurance/Testing Overview & Capability DeckQuality Assurance/Testing Overview & Capability Deck
Quality Assurance/Testing Overview & Capability DeckSowmak Bardhan
 
Clinical Research Presentation
Clinical Research PresentationClinical Research Presentation
Clinical Research Presentationdeepikashankar
 

What's hot (20)

Performance Testing from Scratch + JMeter intro
Performance Testing from Scratch + JMeter introPerformance Testing from Scratch + JMeter intro
Performance Testing from Scratch + JMeter intro
 
UX and Usability Workshop Southampton Solent University
UX and Usability Workshop Southampton Solent University UX and Usability Workshop Southampton Solent University
UX and Usability Workshop Southampton Solent University
 
USER ACCEPTANCE TESTING
USER ACCEPTANCE TESTINGUSER ACCEPTANCE TESTING
USER ACCEPTANCE TESTING
 
High-throughput and Automated Process Development for Accelerated Biotherapeu...
High-throughput and Automated Process Development for Accelerated Biotherapeu...High-throughput and Automated Process Development for Accelerated Biotherapeu...
High-throughput and Automated Process Development for Accelerated Biotherapeu...
 
Quality by Design
Quality by DesignQuality by Design
Quality by Design
 
Software Testing Services
Software Testing ServicesSoftware Testing Services
Software Testing Services
 
Performance testing : An Overview
Performance testing : An OverviewPerformance testing : An Overview
Performance testing : An Overview
 
QA Process Overview
QA Process OverviewQA Process Overview
QA Process Overview
 
Load and performance testing
Load and performance testingLoad and performance testing
Load and performance testing
 
Testing Centre of Excellence Model 2016
Testing Centre of Excellence Model 2016Testing Centre of Excellence Model 2016
Testing Centre of Excellence Model 2016
 
QBD for Downstream Virus Filtration
QBD for Downstream Virus Filtration QBD for Downstream Virus Filtration
QBD for Downstream Virus Filtration
 
Clinical Data Management
Clinical Data ManagementClinical Data Management
Clinical Data Management
 
Commercializing antibody-drug conjugates: a CMO’s journey
Commercializing antibody-drug conjugates: a CMO’s journeyCommercializing antibody-drug conjugates: a CMO’s journey
Commercializing antibody-drug conjugates: a CMO’s journey
 
Clinical Data Management Process Overview_Katalyst HLS
Clinical Data Management Process Overview_Katalyst HLSClinical Data Management Process Overview_Katalyst HLS
Clinical Data Management Process Overview_Katalyst HLS
 
Product lifecycle management in the pharmaceutical industry
Product lifecycle management in the pharmaceutical industryProduct lifecycle management in the pharmaceutical industry
Product lifecycle management in the pharmaceutical industry
 
Load Testing Strategy 101
Load Testing Strategy 101Load Testing Strategy 101
Load Testing Strategy 101
 
Quality Metrics in Pharmaceuticals
Quality Metrics in PharmaceuticalsQuality Metrics in Pharmaceuticals
Quality Metrics in Pharmaceuticals
 
RIM - regulatory information management
RIM - regulatory information managementRIM - regulatory information management
RIM - regulatory information management
 
Quality Assurance/Testing Overview & Capability Deck
Quality Assurance/Testing Overview & Capability DeckQuality Assurance/Testing Overview & Capability Deck
Quality Assurance/Testing Overview & Capability Deck
 
Clinical Research Presentation
Clinical Research PresentationClinical Research Presentation
Clinical Research Presentation
 

Viewers also liked

Netflix Edge Engineering Open House Presentations - June 9, 2016
Netflix Edge Engineering Open House Presentations - June 9, 2016Netflix Edge Engineering Open House Presentations - June 9, 2016
Netflix Edge Engineering Open House Presentations - June 9, 2016Daniel Jacobson
 
Redesigning the Netflix API - OSCON
Redesigning the Netflix API - OSCONRedesigning the Netflix API - OSCON
Redesigning the Netflix API - OSCONDaniel Jacobson
 
Maintaining the Netflix Front Door - Presentation at Intuit Meetup
Maintaining the Netflix Front Door - Presentation at Intuit MeetupMaintaining the Netflix Front Door - Presentation at Intuit Meetup
Maintaining the Netflix Front Door - Presentation at Intuit MeetupDaniel Jacobson
 
Set Your Content Free! : Case Studies from Netflix and NPR
Set Your Content Free! : Case Studies from Netflix and NPRSet Your Content Free! : Case Studies from Netflix and NPR
Set Your Content Free! : Case Studies from Netflix and NPRDaniel Jacobson
 
API Revolutions : Netflix's API Redesign
API Revolutions : Netflix's API RedesignAPI Revolutions : Netflix's API Redesign
API Revolutions : Netflix's API RedesignDaniel Jacobson
 
Culture (Original 2009 version)
Culture (Original 2009 version)Culture (Original 2009 version)
Culture (Original 2009 version)Reed Hastings
 
How Do Developers React to API Deprecation? The Case of a Smalltalk Ecosystem
How Do Developers React to API Deprecation? The Case of a Smalltalk EcosystemHow Do Developers React to API Deprecation? The Case of a Smalltalk Ecosystem
How Do Developers React to API Deprecation? The Case of a Smalltalk Ecosystemmircea.lungu
 
API Business Models
API Business ModelsAPI Business Models
API Business ModelsJohn Musser
 
Scaling the Netflix API - From Atlassian Dev Den
Scaling the Netflix API - From Atlassian Dev DenScaling the Netflix API - From Atlassian Dev Den
Scaling the Netflix API - From Atlassian Dev DenDaniel Jacobson
 
Versioning schemes and branching models for Continuous Delivery - Continuous ...
Versioning schemes and branching models for Continuous Delivery - Continuous ...Versioning schemes and branching models for Continuous Delivery - Continuous ...
Versioning schemes and branching models for Continuous Delivery - Continuous ...Pavel Chunyayev
 
History and Future of the Netflix API - Mashery Evolution of Distribution
History and Future of the Netflix API - Mashery Evolution of DistributionHistory and Future of the Netflix API - Mashery Evolution of Distribution
History and Future of the Netflix API - Mashery Evolution of DistributionDaniel Jacobson
 
Netflix API - Presentation to PayPal
Netflix API - Presentation to PayPalNetflix API - Presentation to PayPal
Netflix API - Presentation to PayPalDaniel Jacobson
 
Automotive Grade APIs – designing for longevity
Automotive Grade APIs – designing for longevityAutomotive Grade APIs – designing for longevity
Automotive Grade APIs – designing for longevityNordic APIs
 
Why should C-Level care about APIs? It's the new economy, stupid.
Why should C-Level care about APIs? It's the new economy, stupid.Why should C-Level care about APIs? It's the new economy, stupid.
Why should C-Level care about APIs? It's the new economy, stupid.Fabernovel
 
NuGet package CI and CD
NuGet package CI and CDNuGet package CI and CD
NuGet package CI and CDYu GUAN
 

Viewers also liked (20)

Netflix Edge Engineering Open House Presentations - June 9, 2016
Netflix Edge Engineering Open House Presentations - June 9, 2016Netflix Edge Engineering Open House Presentations - June 9, 2016
Netflix Edge Engineering Open House Presentations - June 9, 2016
 
Culture
CultureCulture
Culture
 
Redesigning the Netflix API - OSCON
Redesigning the Netflix API - OSCONRedesigning the Netflix API - OSCON
Redesigning the Netflix API - OSCON
 
Maintaining the Netflix Front Door - Presentation at Intuit Meetup
Maintaining the Netflix Front Door - Presentation at Intuit MeetupMaintaining the Netflix Front Door - Presentation at Intuit Meetup
Maintaining the Netflix Front Door - Presentation at Intuit Meetup
 
Netflix API
Netflix APINetflix API
Netflix API
 
Set Your Content Free! : Case Studies from Netflix and NPR
Set Your Content Free! : Case Studies from Netflix and NPRSet Your Content Free! : Case Studies from Netflix and NPR
Set Your Content Free! : Case Studies from Netflix and NPR
 
API Revolutions : Netflix's API Redesign
API Revolutions : Netflix's API RedesignAPI Revolutions : Netflix's API Redesign
API Revolutions : Netflix's API Redesign
 
Culture (Original 2009 version)
Culture (Original 2009 version)Culture (Original 2009 version)
Culture (Original 2009 version)
 
Sulle tracce di Tullo Morgagni
Sulle tracce di Tullo MorgagniSulle tracce di Tullo Morgagni
Sulle tracce di Tullo Morgagni
 
How Do Developers React to API Deprecation? The Case of a Smalltalk Ecosystem
How Do Developers React to API Deprecation? The Case of a Smalltalk EcosystemHow Do Developers React to API Deprecation? The Case of a Smalltalk Ecosystem
How Do Developers React to API Deprecation? The Case of a Smalltalk Ecosystem
 
API Business Models
API Business ModelsAPI Business Models
API Business Models
 
Scaling the Netflix API - From Atlassian Dev Den
Scaling the Netflix API - From Atlassian Dev DenScaling the Netflix API - From Atlassian Dev Den
Scaling the Netflix API - From Atlassian Dev Den
 
Versioning schemes and branching models for Continuous Delivery - Continuous ...
Versioning schemes and branching models for Continuous Delivery - Continuous ...Versioning schemes and branching models for Continuous Delivery - Continuous ...
Versioning schemes and branching models for Continuous Delivery - Continuous ...
 
History and Future of the Netflix API - Mashery Evolution of Distribution
History and Future of the Netflix API - Mashery Evolution of DistributionHistory and Future of the Netflix API - Mashery Evolution of Distribution
History and Future of the Netflix API - Mashery Evolution of Distribution
 
Netflix API - Presentation to PayPal
Netflix API - Presentation to PayPalNetflix API - Presentation to PayPal
Netflix API - Presentation to PayPal
 
Automotive Grade APIs – designing for longevity
Automotive Grade APIs – designing for longevityAutomotive Grade APIs – designing for longevity
Automotive Grade APIs – designing for longevity
 
Why should C-Level care about APIs? It's the new economy, stupid.
Why should C-Level care about APIs? It's the new economy, stupid.Why should C-Level care about APIs? It's the new economy, stupid.
Why should C-Level care about APIs? It's the new economy, stupid.
 
Zuul @ Netflix SpringOne Platform
Zuul @ Netflix SpringOne PlatformZuul @ Netflix SpringOne Platform
Zuul @ Netflix SpringOne Platform
 
NuGet package CI and CD
NuGet package CI and CDNuGet package CI and CD
NuGet package CI and CD
 
Separation of concerns - DPC12
Separation of concerns - DPC12Separation of concerns - DPC12
Separation of concerns - DPC12
 

Similar to Top 10 Lessons Learned from the Netflix API - OSCON 2014

Oscon2014 Netflix API - Top 10 Lessons Learned
Oscon2014 Netflix API - Top 10 Lessons LearnedOscon2014 Netflix API - Top 10 Lessons Learned
Oscon2014 Netflix API - Top 10 Lessons LearnedSangeeta Narayanan
 
Maintaining the Front Door to Netflix
Maintaining the Front Door to NetflixMaintaining the Front Door to Netflix
Maintaining the Front Door to NetflixBenjamin Schmaus
 
Lessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptxLessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptxapidays
 
Building the Eventbrite API Ecosystem
Building the Eventbrite API EcosystemBuilding the Eventbrite API Ecosystem
Building the Eventbrite API EcosystemMitch Colleran
 
Building a REST API for Longevity
Building a REST API for LongevityBuilding a REST API for Longevity
Building a REST API for LongevityMuleSoft
 
apidays LIVE Paris 2021 - Lessons from the API Stewardship Journey in Azure b...
apidays LIVE Paris 2021 - Lessons from the API Stewardship Journey in Azure b...apidays LIVE Paris 2021 - Lessons from the API Stewardship Journey in Azure b...
apidays LIVE Paris 2021 - Lessons from the API Stewardship Journey in Azure b...apidays
 
APIs as a Product Strategy
APIs as a Product StrategyAPIs as a Product Strategy
APIs as a Product StrategyRavi Kumar
 
The Ultimate API Publisher's Guide
The Ultimate API Publisher's GuideThe Ultimate API Publisher's Guide
The Ultimate API Publisher's GuidePronovix
 
Best Practices for Salesforce Data Access
Best Practices for Salesforce Data AccessBest Practices for Salesforce Data Access
Best Practices for Salesforce Data AccessSalesforce Developers
 
Service api design validation & collaboration
Service api design validation & collaborationService api design validation & collaboration
Service api design validation & collaborationUchit Vyas ☁
 
apidays LIVE New York 2021 - Service API design validation by Uchit Vyas, KPMG
apidays LIVE New York 2021 - Service API design validation by Uchit Vyas, KPMGapidays LIVE New York 2021 - Service API design validation by Uchit Vyas, KPMG
apidays LIVE New York 2021 - Service API design validation by Uchit Vyas, KPMGapidays
 
Pain Points In API Development? They’re Everywhere
Pain Points In API Development? They’re EverywherePain Points In API Development? They’re Everywhere
Pain Points In API Development? They’re EverywhereNordic APIs
 
Recipes for API Ninjas
Recipes for API NinjasRecipes for API Ninjas
Recipes for API NinjasNordic APIs
 
API Strategy Introduction
API Strategy IntroductionAPI Strategy Introduction
API Strategy IntroductionDoug Gregory
 
DataHero / Eventbrite - API Best Practices
DataHero / Eventbrite - API Best PracticesDataHero / Eventbrite - API Best Practices
DataHero / Eventbrite - API Best PracticesJeff Zabel
 
Practical Application of API-First in microservices development
Practical Application of API-First in microservices developmentPractical Application of API-First in microservices development
Practical Application of API-First in microservices developmentChavdar Baikov
 
Portal and Intranets
Portal and Intranets Portal and Intranets
Portal and Intranets Redar Ismail
 
API Product Opportunity Responsibility Nicolas Sierro 2015.pptx
API Product Opportunity Responsibility Nicolas Sierro 2015.pptxAPI Product Opportunity Responsibility Nicolas Sierro 2015.pptx
API Product Opportunity Responsibility Nicolas Sierro 2015.pptxBlockchainizator
 

Similar to Top 10 Lessons Learned from the Netflix API - OSCON 2014 (20)

Oscon2014 Netflix API - Top 10 Lessons Learned
Oscon2014 Netflix API - Top 10 Lessons LearnedOscon2014 Netflix API - Top 10 Lessons Learned
Oscon2014 Netflix API - Top 10 Lessons Learned
 
Maintaining the Front Door to Netflix
Maintaining the Front Door to NetflixMaintaining the Front Door to Netflix
Maintaining the Front Door to Netflix
 
Lessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptxLessons learned on the Azure API Stewardship Journey.pptx
Lessons learned on the Azure API Stewardship Journey.pptx
 
Building the Eventbrite API Ecosystem
Building the Eventbrite API EcosystemBuilding the Eventbrite API Ecosystem
Building the Eventbrite API Ecosystem
 
API ARU-ARU
API ARU-ARUAPI ARU-ARU
API ARU-ARU
 
Building a REST API for Longevity
Building a REST API for LongevityBuilding a REST API for Longevity
Building a REST API for Longevity
 
apidays LIVE Paris 2021 - Lessons from the API Stewardship Journey in Azure b...
apidays LIVE Paris 2021 - Lessons from the API Stewardship Journey in Azure b...apidays LIVE Paris 2021 - Lessons from the API Stewardship Journey in Azure b...
apidays LIVE Paris 2021 - Lessons from the API Stewardship Journey in Azure b...
 
Flavours of APIs
Flavours of APIs Flavours of APIs
Flavours of APIs
 
APIs as a Product Strategy
APIs as a Product StrategyAPIs as a Product Strategy
APIs as a Product Strategy
 
The Ultimate API Publisher's Guide
The Ultimate API Publisher's GuideThe Ultimate API Publisher's Guide
The Ultimate API Publisher's Guide
 
Best Practices for Salesforce Data Access
Best Practices for Salesforce Data AccessBest Practices for Salesforce Data Access
Best Practices for Salesforce Data Access
 
Service api design validation & collaboration
Service api design validation & collaborationService api design validation & collaboration
Service api design validation & collaboration
 
apidays LIVE New York 2021 - Service API design validation by Uchit Vyas, KPMG
apidays LIVE New York 2021 - Service API design validation by Uchit Vyas, KPMGapidays LIVE New York 2021 - Service API design validation by Uchit Vyas, KPMG
apidays LIVE New York 2021 - Service API design validation by Uchit Vyas, KPMG
 
Pain Points In API Development? They’re Everywhere
Pain Points In API Development? They’re EverywherePain Points In API Development? They’re Everywhere
Pain Points In API Development? They’re Everywhere
 
Recipes for API Ninjas
Recipes for API NinjasRecipes for API Ninjas
Recipes for API Ninjas
 
API Strategy Introduction
API Strategy IntroductionAPI Strategy Introduction
API Strategy Introduction
 
DataHero / Eventbrite - API Best Practices
DataHero / Eventbrite - API Best PracticesDataHero / Eventbrite - API Best Practices
DataHero / Eventbrite - API Best Practices
 
Practical Application of API-First in microservices development
Practical Application of API-First in microservices developmentPractical Application of API-First in microservices development
Practical Application of API-First in microservices development
 
Portal and Intranets
Portal and Intranets Portal and Intranets
Portal and Intranets
 
API Product Opportunity Responsibility Nicolas Sierro 2015.pptx
API Product Opportunity Responsibility Nicolas Sierro 2015.pptxAPI Product Opportunity Responsibility Nicolas Sierro 2015.pptx
API Product Opportunity Responsibility Nicolas Sierro 2015.pptx
 

More from Daniel Jacobson

Maintaining the Front Door to Netflix : The Netflix API
Maintaining the Front Door to Netflix : The Netflix APIMaintaining the Front Door to Netflix : The Netflix API
Maintaining the Front Door to Netflix : The Netflix APIDaniel Jacobson
 
Why API? - Business of APIs Conference
Why API? - Business of APIs ConferenceWhy API? - Business of APIs Conference
Why API? - Business of APIs ConferenceDaniel Jacobson
 
Scaling the Netflix API - OSCON
Scaling the Netflix API - OSCONScaling the Netflix API - OSCON
Scaling the Netflix API - OSCONDaniel Jacobson
 
Netflix API: Keynote at Disney Tech Conference
Netflix API: Keynote at Disney Tech ConferenceNetflix API: Keynote at Disney Tech Conference
Netflix API: Keynote at Disney Tech ConferenceDaniel Jacobson
 
Techniques for Scaling the Netflix API - QCon SF
Techniques for Scaling the Netflix API - QCon SFTechniques for Scaling the Netflix API - QCon SF
Techniques for Scaling the Netflix API - QCon SFDaniel Jacobson
 
APIs for Internal Audiences - Netflix - App Dev Conference
APIs for Internal Audiences - Netflix - App Dev ConferenceAPIs for Internal Audiences - Netflix - App Dev Conference
APIs for Internal Audiences - Netflix - App Dev ConferenceDaniel Jacobson
 
Netflix API : BAPI 2011 Presentation : SF
Netflix API : BAPI 2011 Presentation : SFNetflix API : BAPI 2011 Presentation : SF
Netflix API : BAPI 2011 Presentation : SFDaniel Jacobson
 
Presentation to ESPN about the Netflix API
Presentation to ESPN about the Netflix APIPresentation to ESPN about the Netflix API
Presentation to ESPN about the Netflix APIDaniel Jacobson
 
The future-of-netflix-api
The future-of-netflix-apiThe future-of-netflix-api
The future-of-netflix-apiDaniel Jacobson
 
NPR Presentation at Wolfram Data Summit 2010
NPR Presentation at Wolfram Data Summit 2010NPR Presentation at Wolfram Data Summit 2010
NPR Presentation at Wolfram Data Summit 2010Daniel Jacobson
 
NPR: Digital Distribution Strategy: OSCON2010
NPR: Digital Distribution Strategy: OSCON2010NPR: Digital Distribution Strategy: OSCON2010
NPR: Digital Distribution Strategy: OSCON2010Daniel Jacobson
 
NPR's Digital Distribution and Mobile Strategy
NPR's Digital Distribution and Mobile StrategyNPR's Digital Distribution and Mobile Strategy
NPR's Digital Distribution and Mobile StrategyDaniel Jacobson
 
NPR API Usage and Metrics
NPR API Usage and MetricsNPR API Usage and Metrics
NPR API Usage and MetricsDaniel Jacobson
 
OpenID Adoption UX Summit
OpenID Adoption UX SummitOpenID Adoption UX Summit
OpenID Adoption UX SummitDaniel Jacobson
 

More from Daniel Jacobson (16)

Maintaining the Front Door to Netflix : The Netflix API
Maintaining the Front Door to Netflix : The Netflix APIMaintaining the Front Door to Netflix : The Netflix API
Maintaining the Front Door to Netflix : The Netflix API
 
Why API? - Business of APIs Conference
Why API? - Business of APIs ConferenceWhy API? - Business of APIs Conference
Why API? - Business of APIs Conference
 
Scaling the Netflix API - OSCON
Scaling the Netflix API - OSCONScaling the Netflix API - OSCON
Scaling the Netflix API - OSCON
 
Scaling the Netflix API
Scaling the Netflix APIScaling the Netflix API
Scaling the Netflix API
 
Netflix API: Keynote at Disney Tech Conference
Netflix API: Keynote at Disney Tech ConferenceNetflix API: Keynote at Disney Tech Conference
Netflix API: Keynote at Disney Tech Conference
 
Techniques for Scaling the Netflix API - QCon SF
Techniques for Scaling the Netflix API - QCon SFTechniques for Scaling the Netflix API - QCon SF
Techniques for Scaling the Netflix API - QCon SF
 
APIs for Internal Audiences - Netflix - App Dev Conference
APIs for Internal Audiences - Netflix - App Dev ConferenceAPIs for Internal Audiences - Netflix - App Dev Conference
APIs for Internal Audiences - Netflix - App Dev Conference
 
Netflix API : BAPI 2011 Presentation : SF
Netflix API : BAPI 2011 Presentation : SFNetflix API : BAPI 2011 Presentation : SF
Netflix API : BAPI 2011 Presentation : SF
 
Presentation to ESPN about the Netflix API
Presentation to ESPN about the Netflix APIPresentation to ESPN about the Netflix API
Presentation to ESPN about the Netflix API
 
The future-of-netflix-api
The future-of-netflix-apiThe future-of-netflix-api
The future-of-netflix-api
 
NPR Presentation at Wolfram Data Summit 2010
NPR Presentation at Wolfram Data Summit 2010NPR Presentation at Wolfram Data Summit 2010
NPR Presentation at Wolfram Data Summit 2010
 
NPR: Digital Distribution Strategy: OSCON2010
NPR: Digital Distribution Strategy: OSCON2010NPR: Digital Distribution Strategy: OSCON2010
NPR: Digital Distribution Strategy: OSCON2010
 
NPR's Digital Distribution and Mobile Strategy
NPR's Digital Distribution and Mobile StrategyNPR's Digital Distribution and Mobile Strategy
NPR's Digital Distribution and Mobile Strategy
 
NPR API Usage and Metrics
NPR API Usage and MetricsNPR API Usage and Metrics
NPR API Usage and Metrics
 
OpenID Adoption UX Summit
OpenID Adoption UX SummitOpenID Adoption UX Summit
OpenID Adoption UX Summit
 
NPR : Examples of COPE
NPR : Examples of COPENPR : Examples of COPE
NPR : Examples of COPE
 

Recently uploaded

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Top 10 Lessons Learned from the Netflix API - OSCON 2014

Editor's Notes

  1. The lessons that we discuss in these slides fall into two buckets: strategy and implementation.
  2. In some cases, the audience will be a small set of known developers (SSKDs). These developers are generally engineers within your company or one with whom you are partnering.
  3. In other cases, the audience may be a large set of unknown developers. This audience is typically associated with public APIs.
  4. And in some cases, the API will target both audience types.
  5. This is a short list of the things that the target audience will influence.
  6. For Netflix, we started out with a public API, with the audience being a large set of unknown developers. There were no internal use cases at launch.
  7. Based on the target audience of unknown developers, we staffed accordingly. The team was relatively small, with skills around development, evangelism, partnering, testing and documentation.
  8. As streaming became more critical to the company, we started having devices use the API. Our first mistake was that we were probably too late to pivot our architecture based on our change in target audience. At the time, we had many devices call into our REST API, the same one that we used for the unknown developers.
  9. But eventually, the data demonstrated that the architectural change was needed. This chart shows that the private API completely drarfs the public API in terms of requests. The private API does about five billion requests per day while the public API does between one and two million. This disparity clearly demonstrates the need for us to target the API to the small set of known developers – Netflix’s UI engineers – who build the vast majority of the experiences on Netflix devices.
  10. Given the shift in responsibilities, we positioned the team accordingly, hiring for skills mostly around engineering.
  11. And the team size grew by about 6x in the last few years. If the target audience was still the public API, it is likely that the team size would have grown, but less significantly (perhaps 2x) in that time frame.
  12. API consumers care a lot about data formatting and delivery, but each consumer, in such a diverse ecosystem, cares about them differently. For some devices, they may want an XML payload delivered as a complete document, while others may need JSON, protobuffer or some other format, potentially delivered as streamed bits. Because of these diverse needs, we need to separate out the concerns to better enable the consumers to get what they need.
  13. Most companies focus on a small handful of device implementations, most notably Android and iOS devices.
  14. At Netflix, we have more than 1,000 different device types that we support. Across those devices, there is a high degree of variability. As a result, we have seen inefficiencies and problems emerge across our implementations. Those issues also translate into issues with the API interaction.
  15. For example, screen size could significantly affect what the API should deliver to the UI. TVs with bigger screens that can potentially fit more titles and more metadata per title than a mobile phone. Do we need to send all of the extra bits for fields or items that are not needed, requiring the device itself to drop items on the floor? Or can we optimize the deliver of those bits on a per-device basis? Different devices have different controllers as well. Some, like the iPad, allow for fast swipe interactions so the content needs to be there for the entire row. Other devices, like smart TVs or game some game consoles have LRUD controllers, so it at least gives the opportunity to fetch the data as the row gets navigated. And the technical capabilities of the devices will influence the interactions as well. Some have more computing power or memory which will influence how much data you can process on the device vs. how much needs to be gathered in real-time.
  16. We evolved our discussion towards what ultimately became a discussion between resource-based APIs and experience-based APIs.
  17. The original one-size-fits-all API was very resource oriented with granular requests for specific data, delivering specific documents in specific formats.
  18. The interaction model looked basically like this, with (in this example) the PS3 making many calls across the network to the OSFA API. The API ultimately called back to dependent services to get the corresponding data needed to satisfy the requests.
  19. We have decided to pursue an experience-based approach instead. Rather than making many API requests to assemble the PS3 home screen, the PS3 will potentially make a single request to a custom, optimized endpoint.
  20. In an experience-based interaction, the PS3 can potentially make a single request across the network border to a scripting layer (currently Groovy), in this example to provide the data for the PS3 home screen. The call goes to a very specific, custom endpoint for the PS3 or for a shared UI. The Groovy script then interprets what is needed for the PS3 home screen and triggers a series of calls to the Java API running in the same JVM as the Groovy scripts. The Java API is essentially a series of methods that individually know how to gather the corresponding data from the dependent services. The Java API then returns the data to the Groovy script who then formats and delivers the very specific data back to the PS3.
  21. Our original REST API had granular endpoints and generic interaction models. This leads to different versions when significant changes are made. The REST API had three primary version before our move to the experience-based API.
  22. If we persisted in the REST API, we very likely could have continued to add versions while needing to support the old ones. The need to support prior versions stems from older device implementations that may not be able to updated or retired, thus forcing us to maintain these endpoints for a long time (perhaps as long as 10 years).
  23. Our target with the experience-based API was to build an architecture that allowed us to be versionless. Through SSKDs, separation of concerns, abstraction layers, and interaction optimizations, we are able move to a deprecation model.
  24. The primary goal is to limit versioning in the device-to-server interaction. Ideally, we can deprecate effectively in the server interactions as well, but that is sometimes more difficult. Back to our architecture view, the data can now flow from the services into the Java APIs. We expose granular methods (think data elements rather than resources) to the scripting tier. If a method needs to change, we can add a new method and then work closely with the SSKDs to migrate the calling scripts, enabling us to deprecate the old method. If we are not able to move the scripts, we can insulate the devices from the change either in the Java layer or in the scripting tier.
  25. Several years ago, we were deploying changes roughly every two weeks. We would accumulate changes over that time and then drop them into production all at once. Think of it as gathering water in a bucket.
  26. What we found was that our releases were unpredictable, sometimes resulting in outages, broken functionality, or incomplete work. Accordingly, we decided to slow down, changing our release cycles to three weeks. We figured that would give us more time to test our work. In other words, we got a larger bucket.
  27. Over time, however, we learned that the longer release cycle didn’t improve predictability or quality. Instead, it just slowed us down. In response, we moved aggressively towards continuous delivery. Instead of delivering water in buckets, we had a steady stream of water from a hose. This enabled us to have smaller changes, more isolated and testable, pushed to production instead of having bigger releases with more complexity.
  28. This is how code flows through the system. We have multiple canary releases per day. Internal envs are deployed ~8 times/day in 3 AWS regions. Prod deployments happen 2-3 times/week and can be triggered on demand.
  29. This dashboard lets us track the status of our master branch at any time. Builds that fail at any step in the pipeline are stopped from going further.
  30. A quick word on Testing. We follow the ‘Operate what you Build’ model where developers are responsible for shepherding their changes all the way through to production. We provide them with the tools necessary to help them gain confidence in the quality of their code. One such tool is the automated Canary Analyzer.
  31. Canary Analysis is the process wherein a small percent of traffic is routed to the new code and its performance is compared against the old code based on 1000s of metrics.
  32. A detailed report gives further insight into potential problem areas. In this case, our canary gives a score of 87%, which means it is likely not ready for release.
  33. In tandem with canaries, we use Red/Black deployments as well.
  34. The Red/Black process allows us to run production code in one cluster while we spin up the new code in a second one. As the new code proves itself, we can route all traffic to it and eventually shut down the old. It also allows us to have a fast, automated rollback in the even that the new code is seeing problems.
  35. Our architecture enables us to move faster because of the scripting tier. But this also put us in position to help our consuming teams and dependency teams to move faster as well.
  36. Let’s peek under the hood of the API Server. Client teams deploy endpoints dynamically based on their own schedule. Their cycles are completely asynchronous of server deployments. Newly deployed endpoints are live and ready to take traffic within minutes.
  37. Endpoint Activity Dashboard shows recent deployment activity. Rollbacks can be performed in a matter of minutes as well.
  38. Our dependent services provide to us client libraries that get compiled into our JVM upon deployment. These libraries typically expose static interfaces, which means changes to the interfaces require coding and deployments without our contain. Similar to the dynamic endpoints, we also have opportunity to improve the nimbleness and velocity around these libraries.
  39. One such improvement is dependency canaries, where we are evaluating our new code against the dependencies. This is a dashboard the provides insights into these canaries.
  40. Making the interaction with the consumers of the API dynamic has led to increased agility on the UI side. We are also exploring ways to increase the speed of iteration on the dependencies side. The current interaction model uses static domain models and client libraries to handle the data flow through the API. This results in long iteration cycles for even the simplest of use cases. We are actively pursuing an approach where our dependencies will be able to expose new data by using dynamic pass-through model using a Dictionary of key values.
  41. The idea is that this model will avoid the static update cycle on the API end, thereby resulting in shorter iteration cycles. This will require investment in things like safety checks and discoverability of the API. We are instrumenting the API layer to inspect traffic at runtime and provide insights into API usage.
  42. One of the early mistakes that we made in this new architecture was not treating internal developers like we did public developers. We don’t need the same degree of evangelism, but we do need to maintain strong communications with the client teams while providing robust tools and systems to help them be better developers in our system. An example of us being late to this is represented by our endpoint dashboard. One of our teams went from having about 30 scripts to about 500 in a matter of weeks. Each of these scripts are dynamically compiled into the JVM, occupying permgen space. As the script count shot up, we hit limits in our permgen which resulted in an outage. And an outage in our layer means people cannot stream Netflix. Of course, there is nothing like an outage to kickstart new behaviors. As a result, we immediately set up alerts and then focused more heavily on building tools to support the developers.
  43. Included in that effort is comprehensive documentation.
  44. We built an array of tools as well, including this REPL.
  45. And prepared frequent trainings and videos.
  46. Nobody has a 100% SLA, so things will fail
  47. In fact, a few years ago, we have many failures on a routine basis.
  48. Many of those failures were a result of failures in a dependent service that we did a poor job of protecting against. Because we are the last step before delivering content to the customers, we have a unique opportunity to help protect customers from such failures.
  49. Hystrix allows us to be resilient to failure by implementing the bulk-heading and circuit breaker patterns. Hystrix is open source and available at our github repository.
  50. Failure Simulation and Game Day exercises are a key part of the overall story. The Simian army is a fleet of monkeys who are simulate failures and alert us to non-conformities in an automated manner. Chaos Monkey periodically terminates AWS instances in production to see how the system responds to the instance disappearing. Latency Monkey introduces latencies and errors into a service to see how it responds and lets us assess the customer quality of experience. Conformity monkey alerts us to variations in versions of application across regions. The monkeys are also available in our open source github repository
  51. Because of our pivot to the private API and the explosion of devices consuming it, our traffic grew tremendously in a few years (and continues to grow at very fast rates). Scaling our systems to support this growth is absolutely critical to the success of the company. Techniques, such as throttling are not an option because that only serves to limit the interactions from our streaming subscribers. Instead, we need to be able to handle any load that our devices throw at us. This manifests in many ways, but the following is a detail on one of them – instance scaling.
  52. Let’s go back to the traffic chart. The pattern is predictable with higher peaks on the weekends
  53. To offset these limitations, we created Scryer (not yet open sourced, but in production at Netflix). Scryer evaluates needs based on historical data (week over week, month over month metrics), adjusts instance minimums based on algorithms, and relies on Amazon Auto Scaling for unpredicted events
  54. This graph shows that Scryer’s predictions are in line with actual RPS. In production, Scryer allows us to get instances into production prior to the need (which is different than Amazon’s reactive autoscaling engine which triggers the ramp up based on immediate need, only needing to wait until server start-up is complete). Because the instances are there in advance, Scryer smooths out load averages and response times, which in turn improves the customer experience.
  55. This is an example of what Scryer looks like during an outage. When actual traffic dropped because of an outage, the reactive autoscaling engine would have downsized the farm. In this case, Scryer kept the farm sized correctly so that we were able to deal with the traffic spike after the recovery.
  56. As a side benefit (not the initial intent), Scryer also allows us to be more precise with our instance counts, reducing inefficiencies.