Spring 2019 Syllabus

Notice that there are links to the slides in the syllabus, so you can make notes right on a copy of the slides if you wish.

Readings are listed where a specific paper or web page is better than what you can find in the Foster/Gannon book, or Ken's book.  Otherwise, we give book and chapter references.  None of the readings are from the big data book, because I haven't had time to read it carefully and link to it.  I hope to do that before classes start.

The readings are encouraged, and yet optional. They are a good way to learn more, or to gain clarity if you want to go back over a lecture topic that you weren't completely comfortable with from lecture (or if you miss a lecture entirely). To learn the material in CS5412 it would usually be a good idea to at least look at these recommended readings. Just the same, you won't ever be "tested" on something that requires knowing things that were never covered in class, or asked about optional readings when presenting your project (you might be asked about topics covered in class that seem relevant to the project).

Some lectures are really part of a larger group of lectures with multiple readings, covered on multiple class meetings. In those cases the same reading won't be repeated again and again, yet we might refer back to things that the prior lecture and reading explored.

Ken sometimes notices typos while lecturing, and will post updated slides for each lecture within a day or two if that happens.

  Date Topic Remarks, Recommended reading (optional, see note above)
1. Tue 1/22 [Internet of Things: Overview]

Overview of the course.  Azure IoT model: Sensors, Azure IoT Edge roles, Azure Intelligent Edge and IoT Hub, u-services model, data center file system and database infrastructures, big-data analytics infrastructures.

We focus on Azure just for coherency, but Amazon AWS has completely analogous components except with less focus (as of today) on IoT.

Slides: pptx  pdf
The first five lectures are really to help everyone get situated and onto the same page in terms of terminology and mindset.  In lecture one we look at an end-to-end perspective on how a smart farm would work in Microsoft Azure from data collection all the way back to data storage and big-data analytics.  The technical depth will be kind of shallow.

Azure.microsoft.com:  Home page for all of Azure and Azure IoT.  This is actually quite a useful resource for finding more details on the topics of the first few lectures.

Some of the examples in the lecture draw on work done by Professor Delimitrou in Cornell's ECE department.  A paper on her Seer system can be found here: Seer: Leveraging Big Data to Navigate the Complexity.  Seer depended on a suite of tools for benchmarking microservices discussed here: Benchmarking Microservices

One example discussed in Lecture 1 is Microsoft's smart farms project.  Read more at:  FarmBeats: AI & IoT for Agriculture.
2. Th 1/24 [Scalability and Key-Value Sharding]

Introduction to cloud scalability techniques: hierarchy, point of presence mini-datacenters, full datacenters, (key-value) sharding and simple fault-tolerance techniques, use of a DHT plus notifications to implement a publish-subscribe message bus, a DDS, or a message queue.  Putting it all together: Akamai CDN and Facebook's massive content delivery infrastructure.

Slides: pptx  pdf
Continuing our broad but shallow review, lecture two looks at ways of breaking large data sets into what are called sharded key-value stores.
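As a rough illustration of key-value sharding, here is a minimal in-memory sketch.  The names `shard_for` and `ShardedStore` are hypothetical and are not taken from the lecture or from any Azure API; the sketch simply assumes that a stable hash of the key, taken modulo the number of shards, decides which shard owns the key.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to get a repeatable placement.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Hypothetical key-value store split across several in-memory shards."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for a separate server holding one shard.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)
```

In a real deployment each shard would live on a different machine and would usually be replicated for fault tolerance; the sketch only shows how a key is routed to its shard.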

Much of what we discuss in Lecture 2 can be found on Wikipedia in the key-value database entry.  (In fact they go beyond what we will be talking about and look at the whole question of treating entire databases in a key-value manner, but in CS5412 we won't tackle the full question.)

The two papers we'll specifically cover are concerned with Facebook's caching policy, and the RIPQ mechanism they used to adapt S4LRU to work on flash SSD.  But you are only responsible for understanding the overall approach -- not the details.
3. Tue 1/29 [Function servers.] 

Customization of event handlers.  Typical micro-services: Message queuing, message bus, storage, data compression, image segmentation and tagging, other data transformations.  Concept of event-driven state machines in the function-server setting.  Stateless functions, and where to save state.

Slides: pptx  pdf
We used the term "micro-service", but where did this idea come from?  What does a typical micro-service do?  We'll also look at a few of the more important micro-services found in Azure.  By the way, in diagrams, AFN is a shorthand for "Azure Function".

The relevant Wikipedia page is here, but again, it is actually more general than what we discuss in CS5412.
4. Th 1/31 [Functions versus Micro-services]

Now that we have two programmable options, how will we pick between them?
Slides: pptx  pdf
In a nutshell, although you could probably implement any application any way you like, doing so wouldn't be a smart use of Azure and would lead to performance problems or other issues.  The right way to go is to use functions for very brief actions that either just query the state or trigger an out-of-band update.  We'll use microservices where there is a heavier computational aspect and perhaps significant state that changes over time.
5. Tue 2/5 [IoT sensor registration.  Risk of sensor inaccuracy.]

The Azure IoT hub and the concept of a secure sensor with a managed life-cycle.  Sensor properties.  Fault-tolerance.  The META system and its model of fault-tolerance for IoT devices.

Slides: pptx  pdf
To start drilling down, we'll look closely at how end-users connect devices like cameras, drones, microphones (Cortana/Siri/Alexa) and so forth to the cloud.  Azure IoT Hub is a microservice for secure sensor management.

Then we will study an example of a case where an IoT sensor malfunctions to start thinking about what this even means, how we could compensate, and what corrective actions might be appropriate.

Tools for Distributed Application Management. K. Marzullo, M. Wood, K. Birman and R. Cooper. IEEE Computer, Aug. 1991, 24(8):42-51.
6. Th 2/7 [Time and Causality]

Timestamped data.  Clocks and clock synchronization.  Sensor time, platform time.  Causal ordering and causal clocks.

Slides: pptx  pdf
We will start a somewhat deeper dive into underlying technology by looking at the issue of temporality in modern IoT settings, where sensors might have some form of clock.  We will end by looking at Lamport's definitions for causality and consistent cuts.
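For readers who want something concrete, here is a minimal sketch of a Lamport logical clock, the mechanism introduced in the 1978 paper listed for this lecture.  The class and method names are our own choices for illustration.

```python
class LamportClock:
    """Logical clock: counts events rather than measuring real time."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time: int) -> int:
        """Jump past the sender's timestamp so the receive is ordered after the send."""
        self.time = max(self.time, message_time) + 1
        return self.time
```

The key property is that if one event causally precedes another, its timestamp is smaller.  The converse does not hold: a smaller timestamp does not by itself imply causality.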

Time, clocks, and the ordering of events in a distributed system. L. Lamport. Commun. ACM 21, 7 (July 1978), 558-565.

Distributed snapshots: determining global states of distributed systems. K. Mani Chandy and Leslie Lamport. ACM Trans. Comput. Syst. 3, 1 (February 1985), 63-75.
7. Tue 2/12 [Temporality and causality in storage systems]

File systems with concepts of temporal data and causal consistency.

Slides: pptx  pdf
This lecture shows how the ideas we learned about in lecture 6 can be useful in understanding some of the issues that arise in modern file systems, which turn out to be pretty terrible at handling real-time data, or causality.  But there are file systems that do better (including one we created here at Cornell!) and more broadly, there are ways IoT application developers can overcome the limitations and problems.  We'll see how that might work for a file system, and then for a key-value "object store".

The Freeze-Frame File System. Weijia Song, Theo Gkountouvas, Ken Birman, Qi Chen, and Zhen Xiao. In Proceedings of the Seventh ACM Symposium on Cloud Computing (SoCC '16), Marcos K. Aguilera, Brian Cooper, and Yanlei Diao (Eds.). ACM, New York, NY, USA, 307-320.

As a side remark, machine learning systems often use the ARIMA model when accessing temporal data.
8. Th 2/14 [Systems Challenges for Intelligent IoT]

Intelligent decision-making via functions in the Azure IoT Edge.  Using drones and sensors to intelligently monitor a farm or greenhouse.  Concept of a real-time DDS, and how this concept failed in the past.

Slides: pptx  pdf
In this lecture we will apply our understanding of Azure to think about how we might create edge computing solutions that behave in intelligent ways.  While this may sound like a lecture on machine learning, actually we won't be discussing the way the machine learning solution works, at all.  Instead our focus will be on the challenges of making sure that the needed machine-learned "models" are present at the place the filters need to run.  But we will treat the actual classifier as a black box.

The lecture centers on a story with a warning in it: not everything can be made to work, even if you have huge resources to invest in the concept.  You can read about the CASD protocol we'll discuss in Ken's textbook, chapter 19.  Copies are available in the Engineering Library.
9. Tue 2/19 [The Architecture of an Edge IoT Solution]

We will discuss appropriate roles for functions and for smart micro-services and will encounter a style of computing common in the cloud, where all data is sharded all the time, and all computation is spread over multiple concurrent tasks.  Cornell's Derecho can help you create such services and will automate many of the self-management aspects.

Slides: pptx  pdf
Highly dynamic services pose their own special challenges, which we will discuss.  How would we initialize services when we need to suddenly launch new members to handle load surges, or suddenly shut things down?

This topic is covered in all three textbooks listed in the resources area for the class.  In Ken's textbook, it is the main topic in Part I and II.  As noted, copies are on reserve in the Engineering Library.
10. Th 2/21 [Fault-tolerance: Deep Dive]

Fault-tolerance concepts.  Split-brain concept.  "Stateless" computing with replicated persistent data.  State machine replication. Chain replication.

Slides: pptx  pdf
One big puzzle with a system split between sensors at the edge, cloud-hosted middle services, and then perhaps back-end computing on massive data sets, is that sooner or later elements will definitely fail and restart.  This lecture looks at the best ways to have your system keep running even after a stumble.

Optional reading if you want to learn more about Derecho:

Derecho: Fast State Machine Replication for Cloud Services. Sagar Jha, Jonathan Behrens, Theo Gkountouvas, Matthew Milano, Weijia Song, Edward Tremel, Sydney Zink, Kenneth P. Birman, Robbert van Renesse. To appear, ACM Transactions on Computer Systems (TOCS), 2019.

Optional reading if you want to learn more about Paxos (this is a very famous but really complex protocol... we touched on it but did not cover details in class, and you are not required to know how the protocol works):

Paxos Made Moderately Complex. R Van Renesse and D Altinbuken. ACM Comput. Surv. 47, 3, Article 42 (February 2015), DOI: 10.1145/2673577
-- 2/26 February break, no classes.

Have fun!  We won't go on holiday as a class, but if you might be tempted, this is a photo from Lake Placid's "Whiteface" mountain.  That area, or the famous ski areas in Vermont and New Hampshire, are just a few hours from Ithaca by car.  And guess what?  They offer great lessons and have beginner slopes too...  Just bring warm clothes and drive carefully: the roads can be slippery up in the frosty north...
11. Thu 2/28 [Geoscale Computing]

Availability zones.  WAN replication.  Mirroring versus active update models.  Google's Spanner system.  5G mobility.

Slides: pptx  pdf
If you depend on the cloud, clearly you need your cloud to be reliable.  Yet datacenters do fail.  An availability zone is a set of 2 or 3 side-by-side cloud datacenters that the vendor manages to ensure that (if possible) at most 1 would be down at any time.  Because the distances are so tiny, latencies are similar to intra-datacenter delays. 

WAN replication arises when datacenters are located at very long distances, maybe even globally.  Yet we can still do strongly consistent data replication even at that scale, as Google's Spanner demonstrates.

Spanner: Google’s Globally Distributed Database. James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford. 2013. ACM Trans. Comput. Syst. 31, 3, Article 8 (August 2013), 22 pages.
12. Tue 3/5 [Gossip Protocols]

We've talked about protocols based on Paxos, but Paxos isn't the only option.  We'll discuss gossip and see how it can be useful inside datacenters.  Bimodal Multicast and Astrolabe.  Swarm computing and computing for traffic convoys.

Slides: pptx  pdf
One puzzle in a self-managed system is that the components might not even know how to get in touch with one another.  Gossip protocols are unusually robust and scalable and have completely predictable load.  This makes them especially valuable in system management.
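To make the "predictable load" point concrete, here is a small simulation of push-style gossip.  It is a sketch under simplifying assumptions (synchronous rounds, no failures, every node knows the full membership) and is not the Bimodal Multicast or Astrolabe protocol; the function name `push_gossip_rounds` is hypothetical.

```python
import random


def push_gossip_rounds(num_nodes: int, seed: int = 0, max_rounds: int = 1000) -> int:
    """Return how many rounds it takes for a rumor to reach every node.

    In each round, every node that already knows the rumor sends it to
    exactly one peer chosen uniformly at random, so the per-node load
    is one message per round regardless of system size.
    """
    if num_nodes < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    informed = {0}  # node 0 starts the rumor
    for round_no in range(1, max_rounds + 1):
        newly_informed = set()
        for node in informed:
            peer = rng.randrange(num_nodes - 1)
            if peer >= node:
                peer += 1  # shift the index so a node never picks itself
            newly_informed.add(peer)
        informed |= newly_informed
        if len(informed) == num_nodes:
            return round_no
    raise RuntimeError("rumor did not spread within max_rounds")
```

Because the informed set can at most double in a round, at least log2(N) rounds are needed; in practice the total grows roughly logarithmically in the number of nodes.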

Here are papers on the two main systems we'll discuss during this lecture.  You can also read about them in Ken's textbook.  Amazon adopted Astrolabe, although they quickly evolved it to not resemble the original version anymore.  Later they reported on one way that gossip caused a problem: the famous Amazon S3 storage-availability tracking bug that shut S3 down for a whole day. 

Bimodal Multicast. Kenneth P. Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu and Yaron Minsky. ACM Transactions on Computer Systems, Vol. 17, No. 2, pp 41-88, May, 1999.

Astrolabe: A Robust and Scalable Technology for Distributed System Monitoring, Management, and Data Mining. Robbert van Renesse, Kenneth Birman and Werner Vogels. ACM Transactions on Computer Systems, May 2003, Vol.21, No. 2, pp 164-206  
13. Thu 3/7 [BlockChains for IoT]

Definitions.  Anonymity, Byzantine DDoS attacks.  Proof of work.  Permissionless versus Permissioned BlockChain models. Using Ethereum or Hyperledger to encode IoT event records.

Slides: pptx pdf
A fun question is to see whether we can link seemingly diverse ideas together.  So we'll look at whether we could use gossip as the basis for a permissioned BlockChain. 

You can read more about BlockChains of both permissioned and non-permissioned flavor on Wikipedia.
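Proof of work, one of the topics listed for this lecture, can be illustrated with a toy hash puzzle.  This is only a sketch: the record format, the `difficulty` parameter (number of leading zero hex digits), and the function names are assumptions made for illustration, and real systems such as Ethereum use different encodings and difficulty rules.

```python
import hashlib


def is_valid(block_data: str, nonce: int, difficulty: int) -> bool:
    """Check whether the hash of (data, nonce) starts with enough zero hex digits."""
    digest = hashlib.sha256(f"{block_data}:{nonce}".encode("utf-8")).hexdigest()
    return digest.startswith("0" * difficulty)


def mine(block_data: str, difficulty: int) -> int:
    """Search for a nonce by brute force; costly to find, cheap to verify."""
    nonce = 0
    while not is_valid(block_data, nonce, difficulty):
        nonce += 1
    return nonce
```

Finding a nonce takes about 16^difficulty hash evaluations on average, while checking one takes a single hash.  That asymmetry is what a permissionless BlockChain relies on; a permissioned one can skip the puzzle because participants are known.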

14. Tue 3/12 [BlockChain Puzzles and Concerns]

Vegvisir.  Open questions: BlockChain has been adopted so enthusiastically that early users are seemingly ignoring a great many puzzles.  We'll discuss a few of them.

Slides: pptx pdf
The main paper we will discuss is this:

Vegvisir: A Partition-Tolerant Blockchain for the Internet-of-Things. Kolbeinn Karlsson, Weitao Jiang, Stephen Wicker, Danny Adams, Edwin Ma, Robbert van Renesse, and Hakim Weatherspoon. In 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), Vienna, 2018, pp. 1150-1158.
15. Thu 3/14 [Hardware accelerators]

These days, anyone who follows the cloud literature sees endless rave reviews of hardware devices like RDMA, NVMe, GPU and GPU clusters, TPU and TPU clusters, FPGA.  But how important are these accelerators for cloud intelligence?  How do you get access to them, and can you use them without learning obscure languages like Verilog?

Slides: pptx  pdf
In the cloud, accelerators matter a lot.  Many kinds of cloud intelligence applications center on very costly computations, and we have to find ways to do them quickly and cost-effectively.  This dimension of the cloud centers on its ability to leverage highly specialized hardware.  We'll do a mile-high review of the most important accelerators.  You don't normally access these directly: instead, you use u-services that already are integrated with them.  But there are exceptions: GPU and TPU are sometimes accessible to users, and there are many software layers that have special permission to access other devices, too.   This drives us towards u-services: there just isn't any other way to get the needed performance at reasonable cost.

TensorFlow: A System for Large-Scale Machine Learning
Martin Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. In Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (OSDI'16). USENIX Association, Berkeley, CA, USA, 265-283.
16. Tue 3/19 [Why isn't RDMA everywhere?]

Why can't we just use RDMA everywhere and avoid "copying"?  Why copying is such a costly operation, and why zero-copy nonetheless remains a holy grail.  Challenges of introducing RDMA into big data centers (and how to view those challenges as a "warning" for other future accelerators that people may want to deploy at scale!)

Slides: pptx pdf
This lecture is really another one of Ken's "heroic challenges" lectures.  We will look at the path from the invention of RDMA into modern data centers like Azure, and focus on some of the many "hiccups" that occurred before deployment was finally feasible.  Microsoft was kind enough to write a series of papers on their experiences, so we'll focus on the story they describe.

RDMA over Commodity Ethernet at Scale. Chuanxiong Guo, Haitao Wu, Zhong Deng, Gaurav Soni, Jianxi Ye, Jitu Padhye, and Marina Lipshteyn. 2016. In Proceedings of the 2016 ACM SIGCOMM Conference (SIGCOMM '16). ACM, New York, NY, USA, 202-215. DOI: https://doi.org/10.1145/2934872.2934908
17. Thu 3/21 [Leave No Trace: Practical IoT Privacy for Cloud-Assisted Computing]

The privacy puzzle for IoT and the Cloud Edge.  Impossibilities.  Practical options: VPNs and Enterprise VLAN.  Container isolation.  ORAM.  Secure databases and the MIT CryptDB concept.  Intel's SGX model.  Audit trails and legal protections.

Slides: pptx pdf
Today's lecture will focus on a puzzle: if we use the cloud for sensitive tasks like making sense of speech and images, how can we preserve privacy when the underlying data is being collected in a private environment such as the home?

After looking at the way this question arises we will spend a few minutes each on some of the practical building blocks available today.  The perfect solution cannot exist but we can do extremely well if the IoT app developer and the cloud vendor are both committed to security and privacy.  If either is motivated primarily by advertising revenue, in contrast, or has "spies" working inside the company, the problem becomes unsolvable.
18. Tue 3/26 In-class prelim covering lectures 1-15. We have posted a study guide.  The prelim from 2018sp is here (solutions).  Keep in mind that the class itself isn't identical.  The 2019sp prelim will focus on topics discussed in this semester, not topics from one year ago.  But the style of the exam will be similar.
19. Thu 3/28 [Big Data Analytics Frameworks]

This lecture will be an introduction to big data, with a focus on the concept of "always sharded" computing.  We will look at one example of an existing big data infrastructure (Facebook TAO) and we'll discuss how IoT data will create new challenges, such as situations where most of the actual data is physically located in the edge or on the edge sensors, and only a tiny fraction can even be downloaded (and then would need to be processed urgently). 

Because a lot of people will already have left for Spring Break, this class is designed so that lecture 20 doesn't depend on it.

Slides: pptx pdf
TAO: Facebook's Distributed Data Store for the Social Graph. Nathan Bronson, Zach Amsden, George Cabrera, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Sachin Kulkarni, Harry Li, Mark Marchukov, Dmitri Petrov, Lovro Puzar, Yee Jiun Song, Venkat Venkataramani. 2013 USENIX Annual Technical Conference (USENIX ATC '13).
-- Tue 4/2 Spring Break, no classes.

You've earned it... maybe head for a warm place this time? 
-- Thu 4/4
20. Tue 4/9

[The Apache big-data technologies].

Zookeeper drill-down. 

HDFS, HBASE, YARN and Hadoop (a version of MapReduce).

Slides: pptx pdf
The Apache "ecosystem" uses Zookeeper for distributed system management and configuration control.

ZooKeeper: Distributed Process Coordination. Flavio Junqueira and Benjamin Reed. 2017, O'Reilly. ISBN-13: 978-1449361303. ISBN-10: 1449361307 Apache Zookeeper Site: https://zookeeper.apache.org

A simple totally ordered broadcast protocol. Benjamin Reed and Flavio P. Junqueira. 2008. In Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware (LADIS '08). ACM, New York, NY, USA, Article 2, 6 pages. DOI=http://dx.doi.org/10.1145/1529974.1529978

The life and times of a zookeeper. Flavio P. Junqueira and Benjamin C. Reed. 2009. In Proceedings of the 28th ACM symposium on Principles of distributed computing (PODC '09). ACM, New York, NY, USA, 4-4. DOI: https://doi.org/10.1145/1582716.1582721

Hadoop is a popular open-source version of MapReduce. We already discussed the MapReduce model, but today we will talk about how Hadoop actually works.  Hadoop is a way to run parallel batch applications on files, using a scheduler called YARN and a set of file access accelerators called HBase, PIG and Hive.  We won't talk about PIG and Hive in this lecture.

The Hadoop Wikipedia article gives a good overview of the Hadoop platform and has links with detail on HDFS, HBase, YARN, etc.  There are also journal papers written at Google on their versions, which were the original ones: MapReduce, GFS, etc.
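As a reminder of the MapReduce model that Hadoop implements, here is a single-process word-count sketch.  It only imitates the three phases (map, shuffle, reduce) in memory; it does not use Hadoop, HDFS, or YARN, and the function names are our own.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In Hadoop the map and reduce calls would run as parallel tasks on different machines, with the shuffle moving data between them over the network.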
21. Thu 4/11

[The Apache big-data technologies].

Hive and PIG.  Publish-subscribe messaging with Kafka.

Slides: pptx pdf
Some files have immense key-value tables or other forms of structured data, but way too much to load into memory (like log files from entire data centers).  How would you process those on Hadoop?  It turns out that they have some answers.

First we will look at Hive and PIG, which are more tools for file access, but a bit fancier than HBase. Then we will switch topics a bit, and discuss Kafka.  You might think that files and messaging are totally different ideas, but a long history of work on "message oriented middleware" has yielded some very popular systems that merge the two concepts: in these, you can generate objects that are not just stored, but also "notified" in the sense that applications can monitor files or "topics" for updates.  Examples found in the cloud include Kafka and OpenSplice.
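The idea of objects that are both stored and "notified" can be sketched with a tiny in-memory topic bus.  This is an illustration only: `TopicBus` and its methods are hypothetical names and do not correspond to the Kafka or OpenSplice APIs.

```python
from collections import defaultdict


class TopicBus:
    """Hypothetical message bus that keeps a log per topic and notifies subscribers."""

    def __init__(self) -> None:
        self._log = defaultdict(list)          # topic -> stored messages
        self._subscribers = defaultdict(list)  # topic -> callbacks

    def subscribe(self, topic: str, callback) -> None:
        """Register a callback to be invoked on every new message for a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message) -> None:
        """Store the message, then notify everyone monitoring the topic."""
        self._log[topic].append(message)
        for callback in self._subscribers[topic]:
            callback(message)

    def history(self, topic: str) -> list:
        """Return a copy of everything published so far on a topic."""
        return list(self._log[topic])
```

A late subscriber can call `history` to catch up on stored messages and then rely on notifications for new ones, which is the file-plus-messaging merge described above.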

Apache Kafka Site: https://kafka.apache.org/
22. Tue 4/16
[Write-Once Data and Rollback/Redo Fault Tolerance]

Fault-tolerance in MapReduce/Hadoop.  Why Hadoop's style of computing only requires file appends, not general updates or replacement. 

Slides: pptx pdf
Many people are surprised to learn that even though Hadoop's HDFS file system can be used more or less like a normal file system, in fact Hadoop only allows programs to append to files, not to do arbitrary updates.  Why did they impose this rule?  We'll see that it comes down to fault-tolerance in Hadoop.

In this talk we discussed the Fischer, Lynch and Paterson impossibility result.  The paper is not simple to read, although it is short.  Here is a pointer to it, and then a pointer to a much easier to follow paper about some other limitations on fault-tolerance that might interest you:

Impossibility of distributed consensus with one faulty process. Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. J. ACM 32, 2 (April 1985), 374-382. DOI=http://dx.doi.org/10.1145/3149.214121

Easy impossibility proofs for distributed consensus problems. Michael J. Fischer, Nancy A. Lynch, and Michael Merritt. In Proceedings of the fourth annual ACM symposium on Principles of distributed computing (PODC '85), Michael Malcolm and Ray Strong (Eds.). ACM, New York, NY, USA, 59-70. 1985.  DOI=http://dx.doi.org/10.1145/323596.323602
23. Thu 4/18 [Microsoft vision for Azure, Digital Farming, and IoT]

Guest Lecturer from Microsoft Azure IoT senior leadership team.
Ranveer Chandra, Chief Scientist for Azure Global, joined us for this lecture (it was presented remotely, and we do not yet have permission to post the slides).  A Cornell PhD, Ranveer is widely known as a pioneer of new wireless communication technologies.  He went on to commercialize that concept as a Microsoft product, shifted his group to explore new software-controlled battery concepts and integrate them into new drones, and recently was promoted to a new role as the visionary and leader for Azure Global. 
24. Tue 4/23 [Object Oriented Storage]

Ceph: A Scalable High-Performance Distributed File System

Slides: pptx pdf
While many big-data systems start with unstructured data (like web pages), there are growing needs to work with higher-level "objects" through file system APIs.  Ceph is a new and very popular file system that scales super well, has HPC extensions for people doing supercomputing research, and has a built-in layer for "object" storage that bypasses the POSIX file system API.

Ceph: A Scalable High-Performance Distributed File System.  Sage A. Weil, Scott A. Brandt, Ethan L. Miller, Darrell D. E. Long, and Carlos Maltzahn. 2006.  In Proceedings of the 7th symposium on Operating systems design and implementation (OSDI '06). USENIX Association, Berkeley, CA, USA, 307-320.

Ceph Object storage
25. Thu 4/25 [Spark RDD concept.]

Slides: pptx pdf
Hadoop used to be slow until a Berkeley project called Spark came up with a clever new caching concept centered on resilient distributed datasets, or RDDs.  We'll look at how these work, and how they can talk to temporal data from sensors.
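The caching idea can be sketched with a toy class that remembers how it was derived (its lineage), so that a cached result can be dropped and rebuilt on demand.  This is not the Spark API: `ToyRDD` is a hypothetical single-machine stand-in, and real RDDs are partitioned across a cluster.

```python
class ToyRDD:
    """Toy dataset that is computed lazily from its lineage and optionally cached."""

    def __init__(self, compute_fn) -> None:
        self._compute_fn = compute_fn  # how to rebuild this dataset from its parent
        self._cache_enabled = False
        self._cached = None

    @classmethod
    def from_list(cls, items):
        data = list(items)
        return cls(lambda: list(data))

    def map(self, fn):
        """Derive a new dataset; nothing is computed until collect() is called."""
        return ToyRDD(lambda: [fn(x) for x in self.collect()])

    def filter(self, predicate):
        return ToyRDD(lambda: [x for x in self.collect() if predicate(x)])

    def cache(self):
        """Ask for the result to be kept in memory after it is first computed."""
        self._cache_enabled = True
        return self

    def evict(self) -> None:
        """Simulate losing the cached copy, for example under memory pressure."""
        self._cached = None

    def collect(self) -> list:
        if self._cache_enabled and self._cached is not None:
            return self._cached
        result = self._compute_fn()
        if self._cache_enabled:
            self._cached = result
        return result
```

After `evict()`, the next `collect()` transparently recomputes the data from the lineage, which is the sense in which such datasets are "resilient".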

Spark: Cluster Computing with Working Sets. Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica. HotCloud 2010.

Improving MapReduce Performance in Heterogeneous Environments, M. Zaharia, A. Konwinski, A.D. Joseph, R. Katz and I. Stoica, OSDI 2008, December 2008. 

26. Tue 4/30 [Adapting RDDs for Real-Time IoT Data]

Slides: pptx pdf
This is a topic Theo (our TA) has been exploring for his PhD thesis.  He'll give this lecture, and will show how temporal access to FFFS via Spark RDDs led to inefficient cache use, and how he was able to modify the Spark caching policy to get far better performance with temporal queries that use the common "ARIMA" model (ARIMA is a machine-learning term for Autoregressive Integrated Moving Average).

How to Create an ARIMA Model for Time Series Forecasting in Python
27. Thu 5/2 [Smart Homes]

A dialog with the CEO and Founder of Caspar.ai: Dr. Ashutosh Saxena.
Our entire semester has focused on IoT applications like FarmBeats.  In this lecture, we'll have a guest speaker from a company called Caspar.ai.  Ashutosh Saxena used to be a Cornell faculty member in the robotics / database area, and then left to launch his company, which looks at IoT within the home.

Wikipedia Article on Smart Devices, Smart Homes, Smart Highways, Smart Grid.

Caspar.ai is working on smart homes, but they are in the middle of this ecosystem.

Caspar.ai web site
28. Tue
[The Future of the Cloud]

If the cloud is ultimately shaped by the flow of money, what can we learn about how the market for large-scale computing is evolving,  and what does this tell us about the future of the cloud?  The class will be as data-driven as possible (I plan to hunt for public materials about the evolution of the cloud business model and market, and we'll see what insights we can glean from the charts and predictions).

Slides: pptx pdf