Enron email datasets LING 575 Fei Xia 01/04/2011.

History of Enron
– Enron was formed in 1985 under the direction of Kenneth Lay.
– In 1999, Enron officials began to use the “special purpose entities” (SPE) trick.
– In Dec 2000, Jeffrey Skilling took over the position of CEO from Kenneth Lay.
– In Aug 2001, Skilling unexpectedly resigned and Lay became CEO again. Sherron Watkins wrote an anonymous letter to Lay about possible fraud.
– In Oct 2001, the losses transferred from Enron to the SPEs totaled over $618 million, and the SEC started an inquiry into Enron.
– In Jan 2002, Lay resigned as chairman and CEO. Enron collapsed in the same year.
– In 2003, Enron emerged from bankruptcy as two separate companies. Most creditors would receive about 1/5 of the $67 billion they were owed.

History of Enron dataset
– Made public by the Federal Energy Regulatory Commission (FERC) during its investigation in May 2002
– Later collected and prepared by SRI for the CALO project
– Put on the web for researchers by William Cohen of CMU in March 2004 (the CMU dataset)
– Cleaned by ISI, which loaded the CMU dataset into a MySQL database (the ISI database)
– Further cleaned and annotated by various teams

Several corpora
– Raw data: emails between 1998 and 2002
  – the CMU dataset
  – the ISI database
  – …
– Annotated data
  – Personal vs. business
  – Zoning
  – …

The CMU dataset

Paper: (B. Klimt and Y. Yang, 2004)
– Available at
– Stored on patas under /corpora/enron_ _dataset/cmu/

CMU dataset
Raw corpus:
– 619,446 messages from 158 users
Cleanup:
– remove folders such as “discussion_threads”
– remove duplicates
Cleaned corpus:
– 200,399 messages from 158 users
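The two cleanup steps on this slide (dropping folders such as “discussion_threads”, then removing duplicates) can be sketched in Python. This is a minimal illustration, not Klimt and Yang's actual procedure: the `Message` record, the `clean` helper, and the rule that two copies are duplicates when owner, sender, subject, and body all match are assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical in-memory record for one message in a user's mailbox."""
    user: str     # mailbox owner
    folder: str   # folder the message was filed in
    sender: str
    subject: str
    body: str


# "discussion_threads" is named on the slide; any further folders are assumptions.
EXCLUDED_FOLDERS = {"discussion_threads"}


def content_key(msg: Message) -> str:
    """Hash the fields assumed to identify copies of the same message."""
    raw = "\x00".join([msg.user, msg.sender, msg.subject, msg.body])
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def clean(messages):
    """Drop excluded folders, then keep only the first copy of each duplicate."""
    seen = set()
    kept = []
    for msg in messages:
        if msg.folder in EXCLUDED_FOLDERS:
            continue
        key = content_key(msg)
        if key in seen:
            continue
        seen.add(key)
        kept.append(msg)
    return kept


# Toy mailbox: one message stored in three folders.
raw = [
    Message("user-a", "inbox", "x@enron.com", "Re: gas", "ok"),
    Message("user-a", "discussion_threads", "x@enron.com", "Re: gas", "ok"),
    Message("user-a", "all_documents", "x@enron.com", "Re: gas", "ok"),
]
print(len(clean(raw)))  # 1
```

Keying on the mailbox owner keeps a message that legitimately appears in two employees' mailboxes; dropping `msg.user` from the key would deduplicate across the whole corpus instead.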

Messages per user
A few people sent out a lot of messages.
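The skew on this slide can be quantified by counting messages per mailbox owner and measuring the share held by the busiest users. A minimal sketch with toy data; `messages_per_user` and `top_share` are hypothetical helpers, and the numbers are not taken from the corpus.

```python
from collections import Counter


def messages_per_user(owners):
    """Count messages per mailbox owner, busiest first."""
    return Counter(owners).most_common()


def top_share(counts, k):
    """Fraction of all messages that belong to the k busiest users."""
    total = sum(n for _, n in counts)
    return sum(n for _, n in counts[:k]) / total


# Toy data (one owner name per message), not figures from the corpus.
owners = ["user-a"] * 6 + ["user-b"] * 2 + ["user-c", "user-d"]
counts = messages_per_user(owners)
print(counts[0])             # ('user-a', 6)
print(top_share(counts, 1))  # 0.6
```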

Correlation of folders and messages
Most users do use folders to organize their emails, but their usage of folders varies a lot.
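To compare folder usage with mailbox size, each user can be reduced to a pair (number of messages, number of distinct folders), which is the kind of data one would plot against each other. A minimal sketch assuming the corpus has been flattened into (user, folder) pairs, one per message; `folder_stats` is a hypothetical helper and the data is made up.

```python
from collections import Counter, defaultdict


def folder_stats(pairs):
    """Return {user: (message_count, distinct_folder_count)} from (user, folder) pairs."""
    messages = Counter()
    folders = defaultdict(set)
    for user, folder in pairs:
        messages[user] += 1
        folders[user].add(folder)
    return {user: (messages[user], len(folders[user])) for user in messages}


# Toy data: user-a files into three folders, user-b keeps everything in the inbox.
pairs = [
    ("user-a", "inbox"), ("user-a", "projects/west"), ("user-a", "legal"),
    ("user-b", "inbox"), ("user-b", "inbox"),
]
print(folder_stats(pairs))  # {'user-a': (3, 3), 'user-b': (2, 1)}
```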

Distribution of thread sizes
Thread: messages with the same subject line exchanged among the same users.
Out of 200,399 messages, 61.6% are in threads (123,501 emails in 30,091 threads).
Most threads are small.
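The percentage follows from the counts: 123,501 / 200,399 ≈ 0.616, and an average thread holds 123,501 / 30,091 ≈ 4.1 messages. The thread definition can be sketched as grouping on a (subject, participants) key. This is an illustration under assumptions, not the paper's exact rule: stripping “Re:”/“Fw:” prefixes, ignoring case, taking participants as the sender plus all recipients, and requiring at least two messages per thread are all choices made for the example.

```python
import re
from collections import defaultdict

# Assumed normalization: drop any run of Re:/Fw:/Fwd: prefixes.
PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def thread_key(subject, sender, recipients):
    """Same (normalized) subject among the same set of users -> same thread."""
    subj = PREFIX.sub("", subject).strip().lower()
    people = frozenset([sender, *recipients])
    return subj, people


def threads(messages):
    """Group message indices by thread key; keep groups of two or more."""
    groups = defaultdict(list)
    for i, (subject, sender, recipients) in enumerate(messages):
        groups[thread_key(subject, sender, recipients)].append(i)
    return [ids for ids in groups.values() if len(ids) >= 2]


msgs = [
    ("Gas prices", "a", ["b"]),
    ("RE: Gas prices", "b", ["a"]),
    ("Re: re: gas prices", "a", ["b"]),
    ("Lunch", "a", ["c"]),
]
ts = threads(msgs)
in_threads = sum(len(t) for t in ts)
print(len(ts), in_threads / len(msgs))  # 1 0.75
```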

The ISI database

Paper: Shetty and Adibi’s report
– Report and data are available at
– Stored on patas under $data_dir/isi/
– Stored on capuchin as a MySQL database called “enron”

Data cleaning
– Start from the CMU dataset
– Remove duplicate emails
– Remove folders such as “discussion_threads”, “all documents”, and “sent_mail” …

Cleaned Enron dataset
– 252,759 emails from 151 employees, distributed across about 3,000 user-defined folders
– The dataset has been used by many research groups.

MySQL database: four tables
– rtype: TO, CC, or BCC
– rvalue: recipient value
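To illustrate how rtype and rvalue are used, the sketch below mimics the recipient table in an in-memory SQLite database (standing in for MySQL) and counts recipients by type. The slide only names the rtype and rvalue columns; the table name `recipientinfo` and the `rid`/`mid` columns are assumptions about the ISI schema, and the rows are toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: one row per (message, recipient) pair.
conn.execute("""
    CREATE TABLE recipientinfo (
        rid   INTEGER PRIMARY KEY,
        mid   INTEGER NOT NULL,                          -- id of the message
        rtype TEXT CHECK (rtype IN ('TO', 'CC', 'BCC')),
        rvalue TEXT NOT NULL                             -- recipient address
    )
""")
rows = [
    (1, "TO", "b@enron.com"),
    (1, "CC", "c@enron.com"),
    (1, "BCC", "c@enron.com"),
    (2, "TO", "a@enron.com"),
]
conn.executemany(
    "INSERT INTO recipientinfo (mid, rtype, rvalue) VALUES (?, ?, ?)", rows
)

# How many recipient slots of each type appear in the table?
counts = dict(conn.execute(
    "SELECT rtype, COUNT(*) FROM recipientinfo GROUP BY rtype ORDER BY rtype"
))
print(counts)  # {'BCC': 1, 'CC': 1, 'TO': 2}
```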

Distribution of sent emails per user
A few employees sent out a lot of messages.

Distribution of emails over time
Notice the spike around Nov 2001.
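A spike such as the one around Nov 2001 shows up once messages are binned by the month of their Date header. A minimal sketch using the standard library's `email.utils.parsedate_to_datetime`; the `per_month` helper and the sample headers are made up for illustration, and months are taken in each header's own time zone rather than normalized to UTC.

```python
from collections import Counter
from email.utils import parsedate_to_datetime


def per_month(date_headers):
    """Count messages per (year, month) from RFC 2822 Date header values."""
    months = Counter()
    for header in date_headers:
        dt = parsedate_to_datetime(header)
        months[(dt.year, dt.month)] += 1
    return sorted(months.items())


dates = [
    "Mon, 14 May 2001 16:39:00 -0700",
    "Fri, 2 Nov 2001 09:15:00 -0800",
    "Tue, 20 Nov 2001 11:02:00 -0800",
]
print(per_month(dates))  # [((2001, 5), 1), ((2001, 11), 2)]
```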

Social network
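One common way to turn the corpus into a social network is a directed graph with an edge from the sender to each recipient, weighted by the number of messages. The slide does not say how its network was built, so this construction is an assumption; `build_graph` and `degrees` are hypothetical names, and self-addressed copies are skipped.

```python
from collections import Counter, defaultdict


def build_graph(messages):
    """Directed, weighted graph: weight of (u, v) = number of messages u sent to v."""
    weights = Counter()
    for sender, recipients in messages:
        for r in sorted(set(recipients)):  # each recipient once per message
            if r != sender:                # skip self-addressed copies
                weights[(sender, r)] += 1
    return weights


def degrees(weights):
    """Out-degree and in-degree counted over distinct neighbours."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for u, v in weights:
        out_deg[u] += 1
        in_deg[v] += 1
    return dict(out_deg), dict(in_deg)


msgs = [("a", ["b", "c"]), ("a", ["b"]), ("b", ["a"]), ("c", ["a", "c"])]
w = build_graph(msgs)
out_deg, in_deg = degrees(w)
print(w[("a", "b")])  # 2
print(out_deg)        # {'a': 2, 'b': 1, 'c': 1}
```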