Fall 1999. Distributed/Multiprocessor Operating Systems.
Class synopsis - just a brief outline of topics covered in class
Class 1: August 23 1999
- Information about: Course Web page,
Office Hours, Cheating, Exams, Projects, Lectures, Books,
Course outline.
- Multiprocessor systems:
- Types of machines: UMA, NUMA, NORMA, SIMD, MIMD (possibly
SISD, MISD)
- SMP and UMA machines, architecture, operations, speedup
Class 2: August 25 1999
- Cache Coherence in UMA machines
- Applications for UMA machines
- Operating systems for UMA
- Interrupt handling
- Locking - spin locks
Class 3: August 30 1999
- Usage of Spin locks, mutex locks, interrupt disabling in
uni- and multi-processors
- why spin locks are not used on a single processor
- Interrupt Handling
- Reentrant kernels
- Operating System Structure - Master Slave/Floating
Master/Symmetric
- Scheduling
- preemption inside spin lock
- cache corruption
- context switching
- coscheduling
- Affinity Scheduling
- Handoff Scheduling
- Switched Multiprocessors
Class 4: September 1st
- NUMA architectures
- scalability
- architecture details
- application programming
- operating systems
- switch design
- NORMA Architecture
- relationship to distributed systems
- switch characteristics
- NORMA programming strategies
- message passing - PVM, MPI, OpenMP
- distributed shared memory
- tuple space (Linda)
Class X: September 6th
- No class - HOLIDAY
Class 5: September 8th
- Distributed Operating Systems: what are they?
- The processor pool model
- The workstation model
- History of distributed systems
- timesharing systems
- minicomputer systems
- networked systems
- Towards centralization of distributed systems
- The Network File System (NFS)
- Dataless workstation configuration
- How NFS works (incomplete)
Class 6: September 13th
- NFS discussion continued
- Client stubs and server stubs in NFS
- Remote mounts and file name resolution
- open/read/write operations
- Client Caching in NFS, file semantics and concurrent
access
- Stateless and Stateful file service
- Scalability issues
- Andrew File System (AFS)
- File server, local files
- callbacks and coherence control
- Issues of scalability and consistency in AFS
Class 7: September 15th
- What are Distributed Systems used for?
-. Distributed Applications
-. Information Sharing
-. Resource Sharing
-. Better Price Performance
-. Higher Reliability
-. Faster Throughput
-. Growth/Flexibility
- What is a Distributed Operating System?
- Integration of autonomous computers.
- Issues:
- One world view of dist systems
- Logical and physical distribution
>> Transparency of:
*- access,
*- location,
*- replication,
*- failure
>> Reliability
*- Fault avoidance,
*- fault tolerance,
*- failure detection and
*- recovery
>> Flexibility: microkernels
>> Performance:
*- caching,
*- less copying,
*- low latency protocols,
>> Scalability
*- avoid centralized components
*- do things on clients
- Distributable system models
- message model
- object model
- shared memory model
Class 8: September 20th
- Message Passing - what it is
- SEND - the variations in semantics
- RECEIVE - the variations in semantics
- blocking, non-blocking, asynchronous, synchronous
- relative merits of the different kinds
- Programming with send/recv
- global mailboxes - ports
- Client server programs
Class 9: September 22nd
- Client Server Architecture
- using threads for parallelism
- Marshalling of Arguments
- Dynamic Port Creation
- Port Binding to Name server
- Name server strategies
- well known ports
- replications
Class 10: September 27th
- Multithreaded servers, why we need them:
- efficiency - even in uniprocessor systems
- handling recursion
- How to get multithreading?
- co-routines
- design of user level non-preemptive threads
- using semaphores for scheduling and mutual exclusion
- Reentrant Programming and techniques to attain reentrancy
- Kernel level threads
Class 11: October 4th
- More discussion of threads
- Managing servers without threads - using manual programming strategies
- Use of cookies to record state information
- Corrected version of the User-Level threads and server
program (uses cookies)
- Taxonomy of threads
- Usage guidelines for threads
- Introduction to Microkernels
Class 12: October 6th
- Microkernels - description, usage, facilities and
advantages
- Configuring an OS on top of a microkernel
- services
- system calls
- Performance problems with microkernels
- methods of alleviating performance problems
- Message system implementations
- single machine message passing
Class 13: October 11th
- Message passing between machines
- Local ports and global ports
- Sending to and receiving from ports that are on remote
machines
- Network servers, network protocols and inter-kernel
messaging
- Access Control, using Capabilities
Class 14: October 13th
Class 15: October 18th
- Lock management
- Lock types, lock compatibility
- Centralized lock manager for distributed locking (client/server)
- Locking and unlocking read and write locks, with
starvation prevention
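The lock compatibility idea above can be illustrated with a small sketch. This is not the code used in class; it assumes only two lock modes, read ("R") and write ("W"), and the names COMPATIBLE and can_grant are hypothetical. It deliberately ignores request queueing, so it does not show starvation prevention.

```python
# Hypothetical illustration of a lock compatibility table.
# Key: (mode already held, mode being requested) -> may both coexist?
COMPATIBLE = {
    ("R", "R"): True,   # many readers may share a lock
    ("R", "W"): False,  # a writer must wait for readers
    ("W", "R"): False,  # a reader must wait for the writer
    ("W", "W"): False,  # only one writer at a time
}

def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every mode in `held_modes`."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)
```

A centralized lock manager would consult a check like this before granting a request, and queue the request otherwise.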
Class 16: October 20th
- Upgrading of read locks to write locks
- Semantics of upgrading
- Deadlocks due to upgrading
- Deadlock resolution
- Starvation due to deadlock resolution
- Starvation due to upgrading (not discussed - open
problem)
- Project details
Class 17: October 25th
- MILAN (Calypso, Chime), by Tom Boyd
Class 18: October 27th
- Computing Communities by Tom Boyd
Class 19: November 1st
- Shortcomings of message passing as a programming method
- Remote Procedure calls (RPC)
- The IDL compiler
- Client stub generation and how client stubs work
(marshalling, type checking, parameter types)
Class 20: November 3rd
- RPC Continued...
- Server stub generation
- How the RPC server and clients communicate
- Name services
- Type Checking
- SunRPC, DCE RPC, MS RPC
- What is DCE
Class 21: November 8th
- Introduction to DSM
- Shared memory semantics (coherence, threads, processes, sharing)
- Distributed Shared Memory - approaches
- Page fault handling and DSM servers
- Invalidations
- Algorithms for implementing a sequentially consistent, page-based DSM system
Class 22: November 10th
- Performance problems of DSM
- Page Shuttling
- False Sharing
- Semantics of DSM systems
- Strict consistency, Sequential Consistency, Causal Consistency, PRAM, Weak
and Release Consistency
- What is release consistency (acquire/release of shared memory)
- How release consistency reduces page shuttling and false sharing
Class 23: November 15th
- Release consistency implementation
- page diffs
- multiple locks
- why is performance good with RC
- performance of DSM systems
- The Muddy Children Problem
- Hierarchies of knowledge
- The definition of "Common Knowledge"
Class 24: November 17th
- PROJECT 2 handed out. Please note STRICT due date.
- Consensus and agreement in distributed systems
- The Coordinated Attack Problem
- The 2-phase commit algorithm
- Introduction to "time"
Class 25: November 22nd
- Discussion of Project 2 requirements
- Concept of time and clocks
- Real time clocks
- Clock synchronization for real time clocks
- Modeling clocks as a counter
- Modeling time as event ordering
- Lamport's clock criteria
Class 26: November 24th
- Project 2 web pages have been updated
- Lamport Clocks
- Distributed Mutual Exclusion
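As a rough illustration of the Lamport clock rules (not the material shown in class), here is a minimal sketch. The class and method names are assumptions made for this example. Each process keeps a counter, increments it on every local event and send, and on receive advances it past the timestamp carried by the message.

```python
class LamportClock:
    """Hypothetical logical clock for one process."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned value is the message timestamp.
        return self.tick()

    def receive(self, msg_time):
        # Move past both the local time and the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time
```

With these rules, the timestamp of a receive event is always greater than the timestamp of the matching send, which is the ordering property that a timestamp-based mutual exclusion algorithm relies on.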
Class 27: November 29th
- Lamport Solution to DME
- Other solutions, Maekawa style solutions
- Centralized solutions - comparisons
- Distributed Snapshots
Class 28: December 1st
- Distributed Snapshots
- The Algorithms
- The Sample programs and examples of state recording
- The properties of recorded state
- Informal proof of correctness
Class 29: December 6th
- Project discussion, Snapshots discussion
- Distributed deadlock detection (edge chasing algorithm)
- Fault tolerance in distributed systems
- Data Fault Tolerance [Replication Management]
- Read 1 Write All, Available Copies, Quorum Consensus, etc.
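The quorum condition behind these replication schemes can be sketched as follows. This is a minimal illustration, assuming n copies with one vote each, a read quorum r and a write quorum w; the function name valid_quorum is hypothetical.

```python
def valid_quorum(n, r, w):
    """Check the usual quorum intersection conditions for n unweighted copies.

    r + w > n  : every read quorum overlaps every write quorum
    2 * w > n  : any two write quorums overlap
    """
    return r + w > n and 2 * w > n
```

Under this view, Read 1 Write All is the special case r = 1, w = n, while a majority scheme on five copies uses r = 3, w = 3.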
Class 30: December 8th
Final Exam: Tuesday December 14th, 10am - 11:50am