
1 Publications Review
8 Dec 2000. Copyright © 2000 by Brad Knowles, all rights reserved.
Large Mailing Lists
  – Kolstad97
  – Chalup98

2 Review: Kolstad97
Problem
  – Large mailing list
      ~2000 subscribers * ~200 msgs/day input = 400,000 msgs/day output
      Long delays
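
The fan-out arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the subscriber and message counts are the approximate figures from the slide, and the per-second rate is a derived average that assumes load is spread evenly over the day, which real list traffic is not.

```python
# Approximate figures taken from the slide.
subscribers = 2000
input_msgs_per_day = 200

# Every input message is delivered once per subscriber.
output_msgs_per_day = subscribers * input_msgs_per_day

# Average sustained delivery rate needed just to keep up
# (assumes an even spread over 24 hours, which is optimistic).
seconds_per_day = 24 * 60 * 60
required_rate = output_msgs_per_day / seconds_per_day

print(output_msgs_per_day)      # 400000
print(round(required_rate, 2))  # 4.63
```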

3 Review: Kolstad97
Goal
  – Reduce delivery times to ~5 min. for each subscriber
  – Mitigate problems with unavailable hosts

4 Review: Kolstad97
Solutions:
  – Sort list by average delay, fastest first
  – Break list into smaller & more manageable chunks, process chunks in parallel
  – Reduce initial timeouts
      Secondary process to clean up and retry with longer timeouts
  – Split into multiple queues & use multiple queue runners
      Avoid directory size problems
  – Improve filesystem to optimize sync. meta-data updates
      Early version of softupdates
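
The first two items (sort fastest-first, then split into chunks handled in parallel) might look roughly like the sketch below. It is not the paper's implementation: `deliver_chunk` is a hypothetical stand-in for handing a chunk to the MTA, the addresses and delays are made-up sample data, and a thread pool is just one convenient way to show the parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_chunks(avg_delay_by_addr, chunk_size):
    """Sort recipients by average delay (fastest first) and split into chunks."""
    ordered = sorted(avg_delay_by_addr, key=avg_delay_by_addr.get)
    return [ordered[i:i + chunk_size] for i in range(0, len(ordered), chunk_size)]


def deliver_chunk(chunk):
    """Hypothetical placeholder: a real system would invoke the MTA here."""
    return len(chunk)


def deliver_all(avg_delay_by_addr, chunk_size=2, workers=4):
    """Deliver every chunk concurrently; returns the chunks and per-chunk counts."""
    chunks = plan_chunks(avg_delay_by_addr, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return chunks, list(pool.map(deliver_chunk, chunks))


# Made-up sample data: address -> average delivery delay in seconds.
delays = {
    "a@example.org": 40.0,
    "b@example.org": 2.5,
    "c@example.org": 310.0,
    "d@example.org": 9.0,
    "e@example.org": 1.0,
}
chunks, counts = deliver_all(delays)
print(chunks[0])  # ['e@example.org', 'b@example.org']
```

Putting the slowest hosts in the last chunk means a single unreachable site delays only its own chunk rather than the whole list.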

5 Review: Kolstad97
Applicability
  – Some solutions specific to large mailing lists
  – Limited in scope for our purposes

6 Review: Chalup98
Problem:
  – Large mailing list
      ~4500 real-time subscribers
      ~4900 digest subscribers
      * ~ msgs/day input = ~ 95, ,000 msgs/day output
  – Move from greatcircle.com to GNAC

7 Review: Chalup98
Solutions
  – Don't blindly copy
      Redesign & reimplement based on original specifications
  – Host tuning
  – Remove smarthost to Uunet
  – Split list into smaller, more manageable chunks
      Handle chunks in parallel

8 Review: Chalup98
Solutions, continued
  – Bypass standard sendmail injection
      Directly create separate qf & hard-linked df files
  – Split queues into multiple separate queues
  – Multiple sendmail queue runners per queue
  – Take advantage of sendmail host status feature
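
The point of hard-linking the df (message body) files is that every chunk gets its own queue entry while the body is stored on disk only once. The sketch below illustrates just that filesystem property. The file names are hypothetical and are not sendmail's real queue-file naming, and no qf control file is written here.

```python
import os
import tempfile


def link_body(body_path, queue_dir, n_chunks):
    """Create one hard link to the message body per recipient chunk."""
    links = []
    for i in range(n_chunks):
        # Hypothetical name; real sendmail df files follow its own scheme.
        link = os.path.join(queue_dir, f"body-chunk{i}")
        os.link(body_path, link)  # new directory entry, same inode, no data copy
        links.append(link)
    return links


with tempfile.TemporaryDirectory() as queue_dir:
    body = os.path.join(queue_dir, "body")
    with open(body, "wb") as fh:
        fh.write(b"message text\n")
    links = link_body(body, queue_dir, 3)
    link_count = os.stat(body).st_nlink  # original name plus three links
    same_inode = all(os.path.samefile(body, link) for link in links)

print(link_count)  # 4
```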

9 Review: Chalup98
Further issues
  – Bounces
  – Junkmail
  – Reflectors
  – Undeliverable mail

10 Review: Chalup98
Applicability:
  – Some solutions specific to large lists
  – Some MTA-specific solutions
  – Somewhat limited in scope for our purposes

11 Publications Review
MTAs
  – Knowles98
  – Christenson99
  – Venema98
  – Golanski2000

12 Review: Knowles98
Goals
  – Analyze and describe typical operations for sendmail, enumerate some implications, describe solutions for various problems, indicate some potential future work for further enhancements

13 Review: Knowles98
Solutions
  – Monitor directory sizes for mqueues & re-make when/if necessary
  – Make /var/spool/mqueue separate filesystem and write only to subdirectories
  – Have your own queue management process
  – Use QueueSortOrder=host and QueueSortOrder=time with queue runners

14 Review: Knowles98
Solutions, continued
  – Use file system w/ built-in directory hashing to avoid long directory search times
  – Dedicate shared caching-only nameservers to maximize cache hit ratio for all servers
  – Put /var/spool/mqueue on faster storage technology
      E.g., RAID 1+0 w/ large battery-backed write-back RAM cache or even solid-state disk

15 Review: Knowles98
Solutions, continued
  – Run multiple daemons on separate IP addresses w/ separate sendmail.cf files and mqueue directories
      Reduces sync. meta-data & locking to 1/n in each mqueue directory
  – Set MinQueueAge to significant fraction of queue retry period
  – If you still have too many servers, implement NAT, proxying, or L4 load-balancing switches in front

16 Review: Knowles98
Solutions, continued
  – Don't use Linux: async writes unsafe
      Violation of RFC 1123, section 5.3.3?
  – Don't list too many MXes
      Takes too long to go through entire list if network unreachable
      May cause much worse problems if you exceed DNS UDP packet length limitations
  – Possibilities for future improvement

17 Review: Knowles98
Applicability
  – Very specific to only the MTA portion of mail system
  – Largely targeted towards sendmail
      Somewhat applicable towards other MTAs
  – Limited in scope, although some problems are similar or shared

18 Review: Christenson99
Goals
  – Analyze typical configurations of sendmail, identify typical bottlenecks, detail potential tuning changes to improve performance

19 Review: Christenson99
Solutions:
  – Most servers probably don't need tuning
  – Make sure network isn't the choke point
  – Don't try to tune the filesystem on small system
  – Almost never need CPU upgrades
  – Only need enough RAM to avoid swapping
  – Almost always better to tune I/O subsystem
      Not just I/O, but synchronous meta-data updates

20 Review: Christenson99
Solutions, continued
  – Mail is very dependent on DNS and very sensitive to latency
      Put a dedicated caching nameserver on each mail server
  – Not much kernel tuning needed on modern systems
      Increase depth of listen() queue
      Raise buffer sizes
      Raise maximum number of processes

21 Review: Christenson99
Solutions, continued
  – At least six synchronous meta-data operations per message relayed
      sendmail blocks for each
  – Gain asynchronous-level mqueue performance safely by using
      Journaling filesystem and/or softupdates filesystem
      NVRAM for filesystem acceleration
      Solid-state disk (SSD)
      Memory-based filesystem (RAM disk) if you don't care about contents
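
Because sendmail blocks on each synchronous meta-data operation, the rate at which the queue device completes such operations puts a ceiling on relay throughput. The sketch below gives a back-of-the-envelope estimate. The figure of six operations per message comes from the slide (as a lower bound); the 120 operations per second is an assumed illustrative value, not a measurement, and the `queues` parameter assumes each queue sits on an independent device.

```python
SYNC_OPS_PER_MESSAGE = 6  # "at least six", per the slide


def max_messages_per_second(sync_ops_per_second, queues=1):
    """Upper bound on relay rate when every message costs six blocking sync ops.

    `queues` assumes each queue directory lives on its own independent device.
    """
    return queues * sync_ops_per_second / SYNC_OPS_PER_MESSAGE


# Assumed example: a device that completes 120 synchronous operations/second.
print(max_messages_per_second(120))            # 20.0
print(max_messages_per_second(120, queues=4))  # 80.0
```

Since six is a minimum, the real ceiling is at or below these figures. The remedies on the slide (journaling, softupdates, NVRAM, SSD) all work by raising the achievable operations per second.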

22 Review: Christenson99
Solutions, continued
  – Large directory sizes are usually a killer
      Some filesystems implement internal hashing
        – E.g., SGI XFS
      Sendmail allows multiple mqueues
      Once a directory has grown, it does not shrink
        – Must delete and re-create to recover

23 Review: Christenson99
Solutions, continued
  – Networking can be bottleneck
      Ensure you only use high-quality NICs with generous buffers
        – Most on-board NICs have no buffering, increasing interrupt handling for CPUs
        – Few good cards available

24 Review: Christenson99
Solutions, continued
  – Message store should be on SCSI disks or RAID, 1+0 preferred
      Good place for NVRAM
  – For RAID, you want Ultra/wide SCSI or FibreChannel, with dedicated controller, and maximum number of high-speed spindles
      Reduce average latency as much as possible

25 Review: Christenson99
Solutions, continued
  – For mail reading, mailbox copy is biggest hit
      Chopping mailbox into separate files helps copy
        – File-per-message causes more synchronous meta-data operations
        – Requires scan of each and every file to build mailbox state
      If there are a lot of users, you will have directory size problems
        – Need filesystem with built-in hashing
        – Alternatively, implement your own mailbox hashing scheme
          » Requires modification of LDA, POP3, & IMAP servers
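
A do-it-yourself mailbox hashing scheme usually means deriving a few directory levels from a hash of the user ID, so that no single directory holds every mailbox. The sketch below shows one possible layout, not the scheme from the paper: the `/var/mail` root, the two-level depth, and the choice of SHA-256 are all assumptions. Whatever scheme is chosen, the LDA, POP3, and IMAP servers must all compute the same path, which is why the slide notes they need modification.

```python
import hashlib
from pathlib import PurePosixPath


def mailbox_path(userid, root="/var/mail", levels=2):
    """Map a user ID to root/<h1>/<h2>/.../userid using hex digits of a hash.

    Two hex digits per level gives 256 subdirectories at each level.
    """
    digest = hashlib.sha256(userid.encode("utf-8")).hexdigest()
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return PurePosixPath(root, *parts, userid)


path = mailbox_path("alice")
print(path.name)        # alice
print(len(path.parts))  # 6: '/', 'var', 'mail', two hash levels, 'alice'
```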

26 Review: Christenson99
Solutions, continued
  – Most problems are I/O problems
      Use tools like top, vmstat, iostat, sar, etc.
      Keep historical records to create baseline for comparison
  – Think you're CPU bound?
      May be memory problem causing excessive paging & swapping

27 Review: Christenson99
Solutions, continued
  – Think you're memory bound?
      Use top, vmstat, and sar to see how much memory is used and where
        – If unsure, add more memory
      May be slow rate at which I/O operations are completing, causing processes to stack up in memory

28 Review: Christenson99
Solutions, continued
  – Think you're I/O controller bound?
      Upgrade, reduce # of disks/controller, use or increase NVRAM to optimize and streamline disk operations
  – Think you're disk/spindle bound?
      Faster disks, faster controller, improved filesystem, NVRAM, RAID or rethink RAID configuration
      Easy to just add more disks

29 Review: Christenson99
Solutions, continued
  – Think you're network bound?
      Upgrade NIC
      Upgrade network
  – Sendmail-specific tuning recommendations

30 Review: Christenson99
Applicability
  – Largely focussed on MTA issues
  – Some specific recommended sendmail configuration options
  – Does address some broader issues

31 Review: Venema98
Goals
  – Replace sendmail with a maximally compatible, freely available, faster, more secure, more flexible system that is more predictable under stress
  – Also as example for book w/ Dan Farmer

32 Review: Venema98
Problems
  – Lots of broken software to talk to
  – Concurrent mail database access
  – Queue management
  – Junkmail/relay control
  – Security
  – Performance

33 Review: Venema98
Solutions
  – Break single monolithic setuid root process into family of mutually distrusting communicating processes, each of which operates with least privilege
  – No setuid programs
  – No /tmp race conditions
  – No remote data in shell variables

34 Review: Venema98
Solutions, continued
  – No fixed-length string sizes
  – No unbounded string sizes
  – No rewriting language, initially
      Table-driven instead
  – In-memory window to disk queue
  – Round-robin between per-destination queues
  – Random walk to select address within destination
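
The last two items describe a fairness policy: rotate between destinations so that one large site cannot monopolize delivery, and avoid always starting with the same address inside a destination. The sketch below is a simplified illustration, not Venema's implementation: the "random walk" is reduced to a uniform random pick, and the destination and recipient names are made up.

```python
import random
from collections import deque


def schedule(queues, rng):
    """Yield (destination, recipient) pairs, rotating across destinations.

    `queues` maps destination -> list of pending recipients. Within a
    destination the next recipient is picked at random (a simplification
    of the "random walk" mentioned on the slide).
    """
    pending = deque((dest, list(rcpts)) for dest, rcpts in queues.items() if rcpts)
    while pending:
        dest, rcpts = pending.popleft()
        yield dest, rcpts.pop(rng.randrange(len(rcpts)))
        if rcpts:
            pending.append((dest, rcpts))  # back of the line for the next turn


queues = {"a.example": ["u1", "u2", "u3"], "b.example": ["v1"]}
order = list(schedule(queues, random.Random(0)))
print([dest for dest, _ in order])
# ['a.example', 'b.example', 'a.example', 'a.example']
```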

35 Review: Venema98
Solutions, continued
  – Dead site list
  – Per-message bounded exponential backoff
  – Hashed queues
  – Tarpit on error
  – Controlled number of inbound connections
  – Slow-start of outbound parallel connections
      No thundering herd
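
Bounded exponential backoff doubles the wait between delivery attempts for a message, but only up to a fixed ceiling, so a long-dead destination is still retried at a predictable interval. The sketch below shows the idea. The 300-second base and 4-hour cap are assumed illustrative values, not settings taken from the paper.

```python
def retry_delay(attempt, base=300.0, cap=14400.0):
    """Seconds to wait before retry number `attempt` (0-based).

    The delay doubles with each failed attempt and is clamped at `cap`.
    `base` and `cap` are illustrative assumptions.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    delay = min(base, cap)
    for _ in range(attempt):
        if delay >= cap:
            break  # already at the ceiling; avoid needless doubling
        delay = min(cap, delay * 2)
    return delay


print([retry_delay(n) for n in range(8)])
# [300.0, 600.0, 1200.0, 2400.0, 4800.0, 9600.0, 14400.0, 14400.0]
```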

36 Review: Venema98
Applicability
  – Again, MTA-specific
  – Gives us fresh perspectives on many problems and some good ideas

37 Review: Golanski2000
Goal
  – Build mail system to handle mail for millions of free Internet users

38 Review: Golanski2000
Solutions
  – Exim can hash directories
  – Exim can make use of hints to improve performance
  – Local mailbox stored as message per file, not Berkeley mbox format
      Helps avoid mailbox locking
      Hurts POP3 performance because there is no central database to allow it to avoid stat() on each file to find out message size each time the mailbox is accessed
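
The POP3 cost described here comes from calling stat() on every message file at every login. One way to picture the missing "central database" is a per-mailbox size index that is consulted first, so only files not yet in the index are stat()ed. The sketch below is hypothetical and not something the paper describes: it assumes message files never change after delivery, and it keeps the index in memory where a real server would persist it.

```python
import os
import tempfile


def refresh_sizes(maildir, index):
    """Bring `index` (filename -> size) up to date; return the stat() count.

    Assumes delivered message files are immutable, so a cached size stays valid.
    """
    stat_calls = 0
    seen = set()
    with os.scandir(maildir) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            seen.add(entry.name)
            if entry.name not in index:
                index[entry.name] = entry.stat().st_size
                stat_calls += 1
    for name in set(index) - seen:
        del index[name]  # message was deleted since the last access
    return stat_calls


with tempfile.TemporaryDirectory() as maildir:
    for name, body in (("msg1", b"hello"), ("msg2", b"hi there")):
        with open(os.path.join(maildir, name), "wb") as fh:
            fh.write(body)
    index = {}
    first = refresh_sizes(maildir, index)   # both files are new
    second = refresh_sizes(maildir, index)  # nothing new to stat

print(first, second)  # 2 0
```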

39 Review: Golanski2000
Solutions, continued
  – Servers split into inbound, outbound/POP3, and fallback servers
  – Mail addressed to
      The node is same as the userid
      Can have multiple domains with same node

40 Review: Golanski2000
Solutions, continued
  – Separate outbound/POP3 server for one or more nodes
  – Mail stored on NetApp NFS file servers

41 Review: Golanski2000
Applicability
  – Again, very MTA-specific
  – Mixing POP3 & outbound SMTP services causes future scaling problems if future load of one service is not balanced relative to the other
  – Puts WAY too much information into the DNS
      To serve 2.5 million customers with a node each, requires at least 2.5 million DNS records
  – Users are not isolated from single server failure
      If the server for their node goes down, they are toast

