Slide 1: Python with COM: Get at your Office Data
Christian Tismer, 10-Nov-98

Slide 2: Contents
1. Introduction
2. Using COM with Python
3. Accessing Excel Data
4. Representing data tables in Python
5. Reading data from Access databases
6. Reading Word tables
7. Processing of data in Python
8. Creating results
A. Supplemental

Slide 3: 1 Introduction
- Foreword
- Prerequisites
- Short overview on data management
- Whetting your appetite: some online examples
- DM problems handled in this session
- Other DM problems one should know of
- Specific goals of this session
    - collect data from different sources, convert them into a suitable structure, modify them and put them back into some other form

Slide 4: 1.1 Foreword: Some words... (1 of 2)
- The following tutorial is on data management on the Windows platform. The main target is interaction with Office objects like Word tables, Excel sheets, and Access tables and queries. The Win32 COM interface therefore plays a central role, which led to the inclusion of COM in the tutorial's title.
- Nevertheless, my primary intent is to give you support, practice and a set of tools to make your data processing tasks more effective. You will get working examples on how to explore data, how to convert it, find a better shape, insert it into your result set, optimize for speed, make your data fly.
- You will not get complete applications to handle specific tasks. Instead, you get working code pieces and modules from production code, together with hints, tips, work-arounds, tricks, whatever can be squeezed out of a single person in 3 1/2 hours.
- The majority of the material has been prepared on transparencies. I will probably publish it as a preview on the Internet, so that you can prepare specific questions in advance on topics which are not covered yet. Hints by email are welcome.

Slide 5: 1.1 Foreword: ... before we begin. (2 of 2)
- The whole course material will be handed out to you on a CD-ROM. This includes the current Python distribution and all source code, including modules which are not in the public domain and contain a copyright statement.
- Attendees of this tutorial gain the right to use our modules for any purpose at home or in their company in order to increase their productivity.
- Excluded is the right to publish, sell or distribute original or modified versions of our software, or to bundle it as part of a product. Companies which wish to do so need to contact Professional Net Service about conditions.

Slide 6: 1.2 Prerequisites
- In order to make use of the materials which are handed out to you in the tutorial, you need the following equipment:
    - PC workstation running Windows 95, 98, NT 4.0 or up
    - At least 32 MB of main memory
    - CD-ROM drive
    - Python 1.5.1
    - Win32all-122.exe
    - Office Professional 97 or 98 (including Access)

Slide 7: 1.3 On Data Management
- Data management is quite a large field, and its complete scope is not well defined. An attempt to give an idea:
    - A Data Management department is usually responsible for data entry, data quality, data conversion, data verification, data access security, data storage security, preparation of raw data listings and basic statistics, and data and report archival.
    - This can be extended / simplified to: everything that is necessary before a statistician's work can begin.
    - Data Managers are more and more confronted with newly evolving data formats, multiple data systems being used in parallel, and much increased demands on output quality. Providing simple text files as a report is in most cases insufficient. Modern Office tools have set new standards for what can be expected and force the Data Manager not only to produce data, but also to present it in a convenient form.
- You can save a lot of time on all of the above by using Python with COM.

Slide 8: 1.4 DM problems
- DM tasks tackled in this session:
    - Data conversion
    - Data transport across applications
    - Finding a model for given data
    - Exploring an unknown set of data
    - Techniques for import and export
    - Common data formats

Slide 9: 1.5 Other DM problems
- DM tasks one should have heard about:
    - Verification of transport by closing the loop
    - Data entry and comparison
    - Generating raw data listings
    - Generating full reports with basic statistics
    - Design and management of databases
    - Data archival and retrieval
    - Coping with customer defined structures :-(

Slide 10: 1.6 Specific goals of this session

Slide 11: 2 Using COM with Python
- How to access a COM object
- How to create and read an interface

Slide 12: 2.1 Accessing a COM object
- Create an interface
    - identify which interface to use
    - generate the Python interface
    - figure out how to use it
    - try to create an object

    >>> import win32com.client
    >>> e=win32com.client.Dispatch("DAO.DBEngine")
    >>> db=e.OpenDatabase("h:\\ab\\vergleich.mdb")
    >>>

Slide 13: 2.2 Reading an Interface
- Learn about the interface
    - Buy books on the remote application
    - Use the application's online help
    - Read the generated Python code
    - Try the interface from your Python shell

    >>> w=win32com.client.Dispatch("Word.Application")
    >>> w.Visible=1
    >>> doc=w.Documents.Add()
    >>> doc.Range()
    >>> doc.Range().Text="Hi SPAM7"
    >>>

Slide 14: 3 Accessing Excel Data
- Getting an Excel sheet into a Python table
- Working with ranges and attributes
- Where to get help on my objects?
- How do I get the data as it looks like?
    - The hard way using COM
    - The hard way using delimited (SDF) files

Slide 15: 3.1 From Excel into Python
- Getting an Excel sheet into a Python table
    - make sure to get accustomed to ranges
    - be careful with strings: they come as Unicode when reading multiple cells

    >>> xl=win32com.client.Dispatch("Excel.Application")
    >>> xl.Visible=-1
    >>> wb=xl.Workbooks(1)
    >>> sh=wb.Worksheets(1)
    >>> for row in sh.UsedRange.Value: print row
    ...
    (L'Name', L'Age', L'Language', L'Salary')
    (L'Gates', 43.0, L'Visual Basic', L'dooh')
    (L'Tismer', 42.0, L'Python', L':-(')
    (L'Rossum', L'dunno, >42?', L'Python', L'SPAM')
    >>>

Slide 16: 3.2 Ranges and Attributes
- Many properties are themselves ranges

    >>> sh.UsedRange.Rows(1).Value
    ((L'Name', L'Age', L'Language', L'Salary'),)
    >>> sh.UsedRange.Columns(1).Value
    ((L'Name',), (L'Gates',), (L'Tismer',), (L'Rossum',))
    >>> sh.UsedRange.Columns(1)

- Other properties are attributes

    >>> r=sh.UsedRange.Columns(1)
    >>> r.Font
    >>> r.Font.Size
    10.0
    >>> r.Font.Size=20
    >>>

Slide 17: 3.3 Where to get help on my objects?

Slide 18: 3.4.1 WYSIWYG data Part I
- The hard way using COM
    - Value property gives the true internal value
    - Text property gives the current text representation

    >>> r.Cells(2,2).NumberFormat="0.000"
    >>> r.Cells(2,2).Text
    '43.000'
    >>> r.Cells(2,2).Value
    43.0
    >>>

- You have to cycle through all the single cells to get at the formatted text
- Works, but is very slow

Slide 19: 3.4.2 WYSIWYG data Part II
- The hard way using SDF - very fast!
    - Excel exports WYSIWYG - you parse the output

    def split_delimited(s):
        """split a delimited text file string into a list of tuples.
        Quotes are obeyed, enclosed newlines are expanded to tab,
        double quotes go to quotes"""
        # the trick to make this handy is to use \0 as a placeholder
        eol = buffalo.findlinedelimiter(s[:10000])  # guessing function
        parts = string.split(string.replace(s, "\t", "\0"), '"')
        limits = (0, len(parts)-1)
        for i in range(len(parts)):
            part = parts[i]
            if i%2:
                part = string.replace(part, eol, "\t")
            else:
                if not part and i not in limits:
                    part = '"'
            parts[i] = part
        # merge it back
        txt = string.join(parts, "")
        parts = string.split(txt, eol)
        # now break by \0
        for i in range(len(parts)):
            fields = string.split(parts[i], "\0")
            parts[i] = tuple(fields)
        return parts
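
For readers on a current Python, the same job (quotes obeyed, doubled quotes collapsed, newlines allowed inside quoted cells) can be sketched with the standard csv module instead of the hand-written placeholder trick. This is a swapped-in technique, not the slide's algorithm: it keeps an enclosed newline as a newline rather than expanding it to a tab, and the sample text is invented for illustration.

```python
import csv
import io


def split_delimited(text, delimiter="\t"):
    """Parse delimited text into a list of tuples, honoring quoted fields."""
    # newline="" hands line endings to the csv reader untouched, so a
    # newline inside a quoted cell stays part of that cell.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return [tuple(row) for row in reader]


# Invented sample: one doubled quote and one cell spanning two lines.
sample = 'Name\tNote\nGates\t"said ""dooh"""\nTismer\t"two\nlines"\n'
rows = split_delimited(sample)
```

Each tuple in rows corresponds to one logical record, even when the record spans two physical lines of the export.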

Slide 20: 4 Representing data tables in Python
- simple tables as with the Excel examples
- table wrapper class with named columns

    >>> tab
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
    >>> import dataset
    >>> ds = dataset.DataSet(["field1", "field2", "field3"], tab)
    >>> ds
    DataSet with 3 rows and 3 columns
    >>> ds.getFieldNames()
    ['field1', 'field2', 'field3']
    >>> ds[-1]
    {'field1': 7, 'field3': 9, 'field2': 8}
    >>>
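
The dataset module itself is not shown in the slides. As a rough idea of what a table wrapper with named columns involves, here is a minimal Python 3 sketch; MiniDataSet and its method names are hypothetical and only mimic the behavior visible in the session above.

```python
class MiniDataSet:
    """Hypothetical stand-in for a table wrapper: rows plus column names."""

    def __init__(self, field_names, rows):
        self.field_names = list(field_names)
        self.rows = [tuple(row) for row in rows]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, index):
        # Indexing returns one row as a {column name: value} dictionary.
        return dict(zip(self.field_names, self.rows[index]))

    def get_column(self, name):
        # Look the column up by name and collect its value from every row.
        pos = self.field_names.index(name)
        return [row[pos] for row in self.rows]

    def __repr__(self):
        return "MiniDataSet with %d rows and %d columns" % (
            len(self.rows), len(self.field_names))


tab = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
ds = MiniDataSet(["field1", "field2", "field3"], tab)
```

Keeping the rows as plain tuples and attaching names only at access time keeps the storage compact while still allowing access by column name.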

Slide 21: 4.1 Some DataSet methods
- ambiguous
- append
- appendColumns
- appendConstantColumn
- crossTabulate
- deTabulate
- display
- displayColumn
- expand
- filterByCategory
- filterByColumn
- filterByValueList
- flatten
- fold
- folded
- getColumn
- getColumnNames, getFieldNames
- getTuples
- getUniqueColumnValues
- guessColumnTypes
- hasColumn
- insert
- item
- join
- notinlist
- reduce
- reduced
- remove
- renameColumn, renameColumns
- selectColumns
- sortOnColumn, sortOnColumns
- splitByColumnValues
- substituteInColumn
- transformByColumn
- union
- unique

These methods are described in the dataset module.

Slide 22: 4.2 A little DataSet browser

    >>> import PyTabs, axsaxs
    >>> db = axsaxs.Accessor("h:/verwaltung/AB/Adressen/Adreßbuch.mdb")
    >>> ds = db.getDataSet("adressen")
    opening adressen
    reading records... 258
    >>> x=PyTabs.viewDS(ds)

This handy little tool is itself a COM server which I wrote with Delphi in an afternoon.

Slide 23: 5 Reading data from Access databases
- reading a table
- inspecting table and field properties
- creating queries dynamically

Slide 24: 5.1 Reading an Access table
- Using the native COM interface

    >>> import win32com.client
    >>> e=win32com.client.Dispatch("DAO.DBEngine.35")
    >>> db=e.OpenDatabase("h:/ab/vergleich.mdb")
    >>> rs = db.OpenRecordset("-finde-")
    >>> f = rs.Fields
    >>> names = map(lambda fld: fld.Name, f)
    >>> names
    ['Typ_3', 'Nummer', 'Finde', 'Bemerkung']
    >>> rs.MoveFirst()
    >>> while not rs.EOF:
    ...     values = map(lambda fld:fld.Value, f)
    ...     print values
    ...     rs.MoveNext() # never forget this one!
    ...
    ['ABNORM', 0, 'Normal', None]
    ['ABNORM', 1, 'Abnormal', None]
    ['ACTTYP', 1, 'Mild', None]
    >>>

Note on "DAO.DBEngine.35": this is usually written as "DAO.DBEngine". Some machines seem to require ".35". I believe this happens when no Office 95 was installed before, but I'm not sure.
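
Because a forgotten MoveNext() turns the loop above into an endless one, it can help to wrap the cursor walk once in a generator. The sketch below is written for a current Python and uses only the recordset members seen in the session (EOF, Fields, MoveNext, and Name / Value on each field). FakeField and FakeRecordset are hypothetical stand-ins so that the loop can be tried without DAO; they are not part of any library.

```python
def iter_records(recordset):
    """Yield each record as a {field name: value} dict, advancing the cursor."""
    while not recordset.EOF:
        yield {field.Name: field.Value for field in recordset.Fields}
        recordset.MoveNext()  # the step that is easy to forget in a manual loop


class FakeField:
    """Hypothetical stand-in for a DAO field: just a name and a value."""

    def __init__(self, name, value):
        self.Name = name
        self.Value = value


class FakeRecordset:
    """Hypothetical in-memory stand-in so the loop can run without DAO."""

    def __init__(self, names, rows):
        self._names = names
        self._rows = rows
        self._pos = 0

    @property
    def EOF(self):
        return self._pos >= len(self._rows)

    @property
    def Fields(self):
        current = self._rows[self._pos]
        return [FakeField(n, v) for n, v in zip(self._names, current)]

    def MoveNext(self):
        self._pos += 1


rs = FakeRecordset(["Typ_3", "Nummer"], [("ABNORM", 0), ("ABNORM", 1)])
records = list(iter_records(rs))
```

With a real recordset the call would look the same, since the generator only touches the three members named above.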

Slide 25: 5.1 Reading an Access table
- Using the axsaxs / dataset interface

    >>> import axsaxs, dataset
    >>> db = axsaxs.Accessor("h:/ab/vergleich.mdb")
    >>> ds = db.getDataSet("-finde-")
    opening -finde-
    reading records... 241
    >>> ds.getFieldNames()
    ['Typ_3', 'Nummer', 'Finde', 'Bemerkung']
    >>> ds[0]
    {'Bemerkung': None, 'Finde': 'Normal', 'Nummer': 0, 'Typ_3': 'ABNORM'}
    >>> ds.item(0)
    ('ABNORM', 0, 'Normal', None)
    >>>

- A dataset is a wrapper class around tabular data in Python. Axsaxs is a wrapper around DAO databases.

Slide 26: 5.2 Accessing properties
- Access TableDefs

    >>> import axsaxs
    >>> db=axsaxs.Accessor(r"h:\ab\vergleich.mdb")
    >>> d=db.daoDB
    >>> for td in d.TableDefs: print td.Name
    ...
    -Finde-
    -MedList-
    AE_TAB
    (...)

- Field properties

    >>> rs = d.OpenRecordset("AE_TAB")
    >>> f=rs.Fields[0]
    >>> f.Properties.Count
    25
    >>> for p in f.Properties:
    ...     print p.Name
    ...
    Value
    Attributes
    CollatingOrder
    Type
    Name
    OrdinalPosition
    Size
    SourceField
    SourceTable
    ValidateOnSet
    DataUpdatable
    ForeignName
    DefaultValue
    ValidationRule
    ValidationText
    Required
    AllowZeroLength
    FieldSize
    OriginalValue
    VisibleValue
    ColumnHidden
    ColumnWidth
    ColumnOrder
    DecimalPlaces
    DisplayControl
    >>>

- Some can be changed by assignment

Slide 27: 6.1 Reading Word tables
- using COM (online with Word)

    >>> import win32com.client
    >>> w=win32com.client.Dispatch("word.application")
    >>> w.Visible=1
    >>> doc=w.Documents.Add("d:\\tmp\\d.html")
    >>> doc.Tables.Count
    1
    >>> tbl = []
    >>> for row in range(1, 1+len(doc.Tables(1).Rows)):
    ...     line = []
    ...     for col in range(1, 1+len(doc.Tables(1).Columns)):
    ...         try:
    ...             line.append(doc.Tables(1).Cell(row, col).Range.Text)
    ...         except:pass # exception for joined cells
    ...     tbl.append(line)
    ...
    >>> len(tbl)
    11
    >>>

Note on Documents.Add("d:\\tmp\\d.html"): this works with HTML, too!

Slide 28: 6.2 Reading Word tables
- using Rich Text files (offline, RTF parser)

    # simple class to get the text from RTF.
    # Especially to read tables in and get their values.
    import string, sys
    sys.path.insert(0,"c:\\ab\\python")
    import rtfpars

    class rtftext(rtfpars.rtfstream):
        def __init__(self, fname):
            rtfpars.rtfstream.__init__(self, fname)
            self.level = 0

        def gettok(self):
            code, val = self.gettoken()
            if code < 2:
                self.level = self.level + code
            return code, val

        def skipuntil(self, target):
            while 1:
                code, val = self.gettok()
                if code == 0:
                    if not val or val in target:
                        return val

        def skiphead(self):
            self.skipuntil(["pard"])

        def readuntil(self, target):
            res = []
            while 1:
                tup = self.gettok()
                res.append(tup)
                code, val = tup
                if code == 0:
                    if not val or val in target:
                        return res

        def getthing(self):
            # a thing is a simple line or a table row.
            if self.level==0:
                self.skiphead()
            line = self.readuntil(["par", "sect", "cell", "row"])
            if (0, "intbl") not in line:
                return line
            buf = line
            if (0, "row") not in buf:
                buf = buf + self.readuntil(["row"])
            cells = splitlist(buf, (0, "cell"))
            rest = cells[-1]
            del cells[-1]
            tok = rest[-1]
            del rest[-1]
            row = []
            for cell in cells:
                row.append(splitlist(cell, (0, "par")))
            row.append(rest)
            row.append(tok)
            return row

Slide 29: 6.2 Reading Word tables (cont.)

    # the little app: get all data from tables
    def main(fname = "c:\\ab\\brivudin\\513\\urin\\RE14-97E.rtf"):
        global tables
        tables = []
        rtf = rtftext(fname)
        intbl = 0
        while not rtf.eof:
            row = rtf.getthing()
            if (0, "row") not in row:
                intbl = 0
                pass # print gettext(row)
                continue
            if not intbl:
                tables.append([])
                intbl = 1
            textrow = []
            for cell in row[:-2]:
                celltext = map(gettext, cell)
                textrow.append(string.join(celltext, "\n"))
            tables[-1].append(textrow)

    # helpers
    # later, we will have an own paragraph structure
    # uhhm, bad without one. hack...
    def gettext(para):
        ret = []
        for code, val in para:
            if code==2:
                ret.append(val)
            elif val == "tab":
                ret.append("\t")
        return string.join(ret, "")

    def splitlist(lis, elem):
        # splits list, but keeps the elem found at the end.
        res = []
        while 1:
            try:
                pos = lis.index(elem)+1
                res.append(lis[:pos])
                lis[:pos] = []
            except ValueError:
                res.append(lis)
                return res

- Further examination is very data / problem specific

Slide 30: 7 Processing of data in Python
- reorganizing tables
- exploring of data
    - what are the contents?
    - what is the best datatype for this column?
- Grouping operations
- data normalization

Slide 31: 7.1 Processing of data
- reorganizing tables (1 of 3)

A common task: de-tabulate data from many columns into a long one. Hard for Access or SQL, this is a cakewalk with a dataset.

This is the raw data, prepared a little as an Access query. The data columns must be reorganized into one column.

Example taken from a huge pharmaceutical project: Brivudin Oral had 26 large Access databases with different structures. They were all harmonized and merged into one big Summary database.

Slide 32: 7.1 Processing of data
- reorganizing tables (2 of 3)

Here is an excerpt from the transformation code.

    def moveECG(limit=None):
        SRCT="ECG"
        # does all the blocks of data, just ECG for now
        #VarID,SubjectID,VisitNo,TimeStamp,Val,RefRangeID
        print 'loading ecg data, wait a minute...'
        ds1 = Halle.getDataSet('_exportECG',limit)
        ds1.display()
        ds2 = ds1.selectColumns(['Subject','Visit','P', 'PQ', 'QRS','QT','HR'])
        ds3 = ds2.deTabulate(2)
        ds4 = ds3.transformByColumn('Subject',globalID)
        ds5 = ds4.renameColumn('Subject','SubjectID')
        ds6 = ds5.renameColumn('Visit','VisitNo')
        ds6.display()
        Summary.addMeasurements(ds6, SRCT)

Here we go - reshaping our data
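
The implementation of deTabulate is not shown in the slides. As a sketch of the general wide-to-long reshaping it performs, the following Python 3 function assumes that the argument 2 means "keep the first two columns as keys and turn every remaining column into a (name, value) row". That reading, the function name detabulate and the sample measurements are all assumptions made for illustration.

```python
def detabulate(field_names, rows, key_count):
    """Turn wide rows into long rows of the form (keys..., column name, value)."""
    value_names = field_names[key_count:]
    long_rows = []
    for row in rows:
        keys = tuple(row[:key_count])
        # One output row per data column, tagged with that column's name.
        for name, value in zip(value_names, row[key_count:]):
            long_rows.append(keys + (name, value))
    return long_rows


# Invented sample values; only the column names come from the slide.
wide_names = ["Subject", "Visit", "P", "PQ", "QRS"]
wide_rows = [(1, 1, 100, 160, 90), (1, 2, 104, 158, 92)]
long_rows = detabulate(wide_names, wide_rows, 2)
```

Two wide rows with three data columns each become six long rows, which is the shape a generic measurements table needs.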

Slide 33: 7.1 Processing of data
- reorganizing tables (3 of 3)

And a look at the resulting table...

Slide 34: 7.2 Exploring of data
- What are the contents?

    >>> ds = db.getDataSet("ae_tab")
    opening ae_tab 1
    reading records... 217
    >>> ds.getFieldNames()
    ['Subject', 'Page', 'Row', 'OCCNO', 'AE', 'HARTS', 'SEVERITY',
     'STRTDT', 'STOPDT', 'PATTERN', 'RELSHIP', 'NOACT', 'SMDINCR',
     'SMDRED', 'SMDINTR', 'SMDDISC', 'NDTHER', 'CONALT', 'CONADD',
     'HOSPITL', 'OUTC', 'SAE', 'OK']
    >>> ds.getUniqueColumnValues("SEVERITY")
    [None, 1, 2, 3]
    >>>

Slide 35: 7.2 Exploring of data
- What is the best datatype for this column?

    import string

    def guessColumnType(data):
        # should go into Dataset perhaps.
        # This can be done much more sophisticated.
        # for now, we do simple heuristics.
        data = filter(None, data)
        if not data:
            # empty column: nothing to guess from
            return "VarChar(60)"
        if len(filter(isDateTime, data)) == len(data):
            return "Date"
        data = map(str, data)
        needed = max(map(len, data))
        # try to reduce unnecessary floats
        data = map(lambda s:s[-2:]==".0" and s[:-2] or s, data)
        try:
            # now try to convert
            data = map(string.atoi, data)
            maxval = max(data)
            minval = min(data)
            maxval = max(maxval, abs(minval))
            if maxval <= 65535:
                return "Integer"
            return "Long"
        except ValueError:
            pass
        try:
            data = map(string.atof, data)
            return "Double"
        except ValueError:
            pass
        if needed > 255:
            return "Memo"
        elif needed > 60:
            return "VarChar(255)"
        return "VarChar(60)"

This function didn't make it into dataset yet, since it is quite database dependent.

Slide 36: 7.3 Processing of data
- Grouping operations
    - dataset.reduce(columnlist) squeezes all repeated records into one and turns the values in columnlist into lists. dataset.expand does the inverse. This provides easy processing of groups of records.

    >>> ds = db.getDataSet("select subject, page, ae, severity from [ae_tab 2]")
    opening select subject, page, ae, severity from [ae_tab 2]
    reading records... 217
    >>> ds
    DataSet with 217 rows and 4 columns
    >>> ds2=ds.reduce(["page", "ae", "severity"])
    >>> ds2.display(colwidth=20, maxrows=5)
    'subject' ['page']             ['ae']               ['severity']
    1         [103, 132, 132, 240, ['Diastolic pressure [1, 1, 1, 1, 1, None
    7         [174, 174, 191, 191, ['Impatiences', 'Dro [1, 1, 1, 1, 1, 1, 1
    8         [174, 191, 213, 305, ['Palpitations', 'Pa [1, 1, 1, 2, 1, 2, 2
    9         [132, 152, 174, 174, ['Hypersalivation',  [2, 2, 2, 2, 2, 2, 1
    10        [132]                ['Decrease of diasto [1]
    >>>
    >>> ds2
    DataSet with 45 rows and 4 columns
    >>> ds2.expand()
    DataSet with 217 rows and 4 columns
    >>>
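
The grouping idea can be sketched on plain lists of tuples in a current Python. This is not the dataset module's code: reduce_rows and expand_rows are hypothetical names, and for brevity the sketch groups on the leading key_count columns instead of taking a list of columns to fold, which is an assumption that differs from the reduce(columnlist) signature above.

```python
def reduce_rows(rows, key_count):
    """Group rows by their first key_count values; the other columns become lists."""
    groups = {}
    for row in rows:
        key = tuple(row[:key_count])
        lists = groups.setdefault(key, [[] for _ in row[key_count:]])
        for column, value in zip(lists, row[key_count:]):
            column.append(value)
    return [key + tuple(lists) for key, lists in groups.items()]


def expand_rows(reduced, key_count):
    """Inverse of reduce_rows: emit one row per position in the value lists."""
    rows = []
    for row in reduced:
        key = tuple(row[:key_count])
        for values in zip(*row[key_count:]):
            rows.append(key + tuple(values))
    return rows


# Invented sample: (subject, page, ae)
rows = [(1, 103, "a"), (7, 174, "b"), (1, 132, "c")]
reduced = reduce_rows(rows, 1)
restored = expand_rows(reduced, 1)
```

Expanding returns the rows grouped by key, so the original row order is preserved within each group but not necessarily across groups.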

Slide 37: 7.4 Processing of data
- data normalization (1 of 2)
    - With a few of the grouping operations, redundancy in tables can be analyzed:
        - Group the columns with the contents by reduce
        - insert a unique key column
        - select master and detail datasets
        - expand the detail dataset

Slide 38: 7.4 Processing of data
- data normalization (2 of 2)

    >>> ds=dataset.DataSet(["nr", "nr2", "data1", "data2"], [])
    >>> import whrandom
    >>> for nr in range(1,5):
    ...     nr2 = nr*2
    ...     for k in range(whrandom.randint(1, 8)):
    ...         data1 = whrandom.randint(1, 1000)
    ...         data2 = whrandom.randint(1, 2000)
    ...         ds.append((nr, nr2, data1, data2))
    ...
    >>> ds2 = ds.reduce(["data1", "data2"])
    >>> ds3 = ds2.appendColumns(ds2.recordRange("key"))
    >>> dsmaster = ds3.selectColumns(ds3.notinlist(ds3.reduced()))
    >>> dsdetail = ds3.selectColumns(["key"]+ds3.reduced()).expand()
    >>> dsdetail.display()
    'key' 'data1' 'data2'
    0     883     732
    0     224     1853
    0     889     1170
    0     871     1581
    1     763     453
    1     867     1881
    1     870     566
    1     646     1509
    1     645     612
    2     150     1042
    >>> dsmaster.display()
    'nr' 'nr2' 'key'
    1    2     0
    2    4     1
    3    6     2
    4    8     3
    >>>
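
The master / detail split can also be sketched directly on tuples in a current Python, without the dataset module. The function normalize below is a hypothetical helper, and it assumes the repeated (master) columns come first in each row; the three sample rows reuse values from the display above.

```python
def normalize(rows, key_count):
    """Split rows into a master table (one row per group) and a detail table."""
    key_ids = {}
    master = []
    detail = []
    for row in rows:
        group = tuple(row[:key_count])
        if group not in key_ids:
            # First time this combination appears: assign the next key.
            key_ids[group] = len(key_ids)
            master.append(group + (key_ids[group],))
        detail.append((key_ids[group],) + tuple(row[key_count:]))
    return master, detail


# (nr, nr2, data1, data2)
rows = [(1, 2, 883, 732), (1, 2, 224, 1853), (2, 4, 763, 453)]
master, detail = normalize(rows, 2)
```

The master table carries each repeated combination once together with its generated key, and the detail table refers to it through that key only.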

Slide 39: 8 Creating results
- creating result tables in Access
- creating result tables in Word
- logging events in an Access table
- fast writing mode for Access
- producing cross tables beyond Access capabilities
- formatting output in Word

Slide 40: 8.1 Creating results
- creating result tables in Access (1 of 2)

    def makeTableSQL(name, ds):
        fields = []
        for cn in ds.getFieldNames():
            fields.append( "[%s] %s" % (cn, guessColumnType(ds.getColumn(cn))) )
        return "create table [%s] (%s)" % (name, string.join(fields, ", "))

    >>> from brivtools import makeTableSQL
    >>> makeTableSQL("master", dsmaster)
    'create table [master] ([nr] Integer, [nr2] Integer, [key] Integer)'
    >>> db.execSQL(makeTableSQL("master", dsmaster))
    >>> db.execSQL(makeTableSQL("detail", dsdetail))
    >>> db.insertDataSet("master", dsmaster)
    inserting data 4
    >>> db.insertDataSet("detail", dsdetail)
    inserting data 21
    >>>

Slide 41: 8.1 Creating results
- creating result tables in Access (2 of 2)

Slide 42: 8.2 Creating results
- creating result tables in Word (1 of 2)

    c = win32com.client.constants

    def appendtable(rows, columns):
        myrange = doc.Range()
        myrange.Collapse(c.wdCollapseEnd)
        # check whether we are inside a table.
        # if so, append a paragraph
        if myrange.Tables.Count:
            myrange.Paragraphs.Add()
            myrange.Collapse(c.wdCollapseEnd)
        tbl = myrange.Tables.Add(myrange, rows, columns)
        return tbl

    def dstotableslow(ds):
        nrows = len(ds)+1
        ncols = len(ds.getColumnNames())
        tbl = appendtable(nrows, ncols)
        header = ds.getFieldNames()
        cell = tbl.Cell
        for col in range(ncols):
            cell(1, col+1).Range.Text = str(header[col])
        content = ds.getTuples()
        for row in range(len(content)):
            for col in range(ncols):
                cell(row+2, col+1).Range.Text = str(content[row][col])
        return tbl

This is the straightforward way to create a Word table: add a table with rows and columns, and fill them cell by cell. Meanwhile you can brew coffee, or have a meal...

Slide 43: 8.2 Creating results
- creating result tables in Word (2 of 2)

    def dstostring(ds):
        nrows = len(ds)+1
        ncols = len(ds.getColumnNames())
        header = ds.getFieldNames()
        content = ds.getTuples()
        lis = [string.join(map(str, header), "\t")]
        for line in content:
            lis.append(string.join(map(str, line), "\t"))
        lis.append("")
        return string.join(lis, "\n")

    def dstotable(ds):
        nrows = len(ds)+1
        ncols = len(ds.getColumnNames())
        blob = dstostring(ds)
        if string.count(blob, "\n") != nrows or \
           string.count(blob, "\t") != (ncols-1) * nrows:
            return dstotableslow(ds)
        # no specials, we can use the fast one.
        doc.Range().Paragraphs.Add()
        myrange = doc.Range().Paragraphs.Add().Range
        myrange.Text = blob
        c = win32com.client.constants
        myrange.ConvertToTable(Separator=c.wdSeparateByTabs,
                               NumColumns=ncols, NumRows=nrows,
                               Format=c.wdTableFormatNone)
        return myrange.Tables(1)

In most cases, however, your data will not contain TAB characters. This leads to a very fast solution which converts megabytes of table data into Word in a few seconds. For the unlikely cases, we fall back to the slower method.
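
The decision between the fast and the slow path rests on a simple count: a clean blob has exactly one newline per row and (ncols-1) tabs per row. That check can be tried in isolation, without Word, in a current Python. The helper names rows_to_blob and blob_is_safe are hypothetical; the logic follows the condition in dstotable above.

```python
def rows_to_blob(header, rows):
    """Join a header and rows into tab-separated lines, each ending in a newline."""
    lines = ["\t".join(map(str, header))]
    lines.extend("\t".join(map(str, row)) for row in rows)
    return "\n".join(lines) + "\n"


def blob_is_safe(blob, nrows, ncols):
    """True if the separator counts match a clean table of nrows x ncols cells."""
    return (blob.count("\n") == nrows
            and blob.count("\t") == (ncols - 1) * nrows)


clean = rows_to_blob(["a", "b"], [(1, 2), (3, 4)])       # 3 rows, 2 columns
with_tab = rows_to_blob(["a", "b"], [("x\ty", 2)])       # stray TAB in a cell
with_newline = rows_to_blob(["a", "b"], [("x\ny", 2)])   # stray newline in a cell
```

Here blob_is_safe(clean, 3, 2) holds, while the two blobs with a stray separator fail the count and would be routed to the cell-by-cell method.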

Slide 44: 8.3 Creating results
- logging events in an Access table
    - a simple example where status records are written into existing Access records

    def msg_to_subjectvisits(subject, visit, msg):
        ddb = db.daoDB
        rs = ddb.OpenRecordset('''select * from SubjectVisits
            where Pat_No = %d and Visit = %d''' %(subject, visit))
        rs.Edit()
        rs.Fields("PythonResult").Value = msg
        rs.Update()
        rs.Close()

Slide 45: 8.4 Creating results
- fast writing mode for Access
    - Reading an Access table with axsaxs is very fast since it is done in larger blocks.
    - Writing is much more expensive since it always must happen record by record.
    - Early versions of Python's COM interface were slow at attribute access, and an accelerator module gave a speedup factor of about 3.8. The idea is to pick pre-bound functions which are applied later.
    - The speed gain is meanwhile down to a factor of 1.17, but still considerable.

    >>> import COMutil, speedCOM
    >>> speedCOM.Install(COMutil.findModule("DAO.DBEngine.35"))

Slide 46: 8.5 Creating results
- producing cross tables beyond Access capabilities
    - Access can do crosstabs only on single fields. The dataset module can collapse multiple fields into a tuple, rotate that and unpack it again, giving multiple crosstabs.
    - Group name and value fields together with selectColumns
    - Fold multiple name fields into one
        >>> ds = ds.fold(firstnamecol, num_of_cols)
    - Fold the same number of value fields into one
        >>> ds = ds.fold(firstvaluecol, num_of_cols)
    - Do the crosstabulation
        >>> dsx = ds.crossTabulate(namefield, valuefield)
    - Flatten the dataset, that's it.
        >>> dsx = dsx.flatten()
- You still have to work on the column names a little
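
How fold, crossTabulate and flatten work internally is not shown in the slides. The Python 3 sketch below only illustrates the general idea of pivoting rows whose value is a tuple and then unpacking the tuples into separate columns. The function names, the (name, index) column labels and the sample measurements are all assumptions made for illustration.

```python
def cross_tabulate(rows):
    """Pivot (key, name, value) rows into one dict per key, one entry per name."""
    table = {}
    for key, name, value in rows:
        table.setdefault(key, {})[name] = value
    return table


def flatten(table):
    """Unpack tuple-valued cells into separate columns labelled (name, index)."""
    flat = {}
    for key, cells in table.items():
        out = flat.setdefault(key, {})
        for name, value in cells.items():
            for index, part in enumerate(value):
                out[(name, index)] = part
    return flat


# Invented sample: each value is a folded pair of fields.
rows = [(1, "BP", (120, 80)), (1, "HR", (60, 1)), (2, "BP", (130, 85))]
crossed = cross_tabulate(rows)
flat = flatten(crossed)
```

The generated (name, index) labels are exactly the kind of column names that still need manual cleanup afterwards.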

Slide 47: 8.6 Creating results
- formatting output in Word (1 of 3)

PythonCOM still has some problems with some global functions of Word. Here is a little surrogate class which is not only useful for Word.

    class converter:
        def __init__(self, factor, divisor=1):
            if factor != 1 and divisor != 1:
                self.factor = float(factor)
                self.divisor = float(divisor)
                self.operation = self.scale
            elif divisor != 1:
                self.divisor = float(divisor)
                self.operation = self.divide
            elif factor != 1:
                self.factor = float(factor)
                self.operation = self.multiply
            else:
                self.operation = self.noop

        def multiply(self, arg):
            return arg * self.factor

        def divide(self, arg):
            return arg / self.divisor

        def scale(self, arg):
            return arg * self.factor / self.divisor

        def noop(self, arg):
            return arg+0.0

        def __call__(self, arg):
            return self.operation(arg)

    PicasToPoints = converter(12)
    PointsToPicas = converter(1, 12)
    InchesToPoints = converter(72)
    PointsToInches = converter(1, 72)
    LinesToPoints = converter(12)
    PointsToLines = converter(1, 12)
    InchesToCentimeters = converter(254, 100)
    CentimetersToInches = converter(100, 254)
    CentimetersToPoints = converter(InchesToPoints(100), 254)
    PointsToCentimeters = converter(254, InchesToPoints(100))
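
The arithmetic behind these converters is a single multiplication by factor / divisor, so it is easy to check by hand: 2.54 cm is one inch, which is 72 points. The following Python 3 sketch uses a closure instead of the class above. It is a simplified variant for trying the numbers, not a replacement for the author's class, and the lower-case names are hypothetical.

```python
def converter(factor, divisor=1):
    """Return a function that scales its argument by factor / divisor."""
    factor = float(factor)
    divisor = float(divisor)

    def convert(value):
        return value * factor / divisor

    return convert


inches_to_points = converter(72)
# 100 inches are 7200 points and 254 cm, hence points = cm * 7200 / 254.
centimeters_to_points = converter(inches_to_points(100), 254)
points_to_centimeters = converter(254, inches_to_points(100))

one_inch_in_points = centimeters_to_points(2.54)
```

Building the centimeter converters from inches_to_points(100) and 254 mirrors the definitions in the slide and avoids writing the rounded constant 28.35 by hand.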

Slide 48: 8.6 Creating results
- formatting output in Word (2 of 3)

Now for a simple formatter which makes only a few changes. On the next page we look at the result...

    True = -1
    False = 0

    def simpleformat(tbl, w1=3, w2 = 3.5):
        """formats a 3-columned table arbitrary"""
        c = win32com.client.constants
        tbl.Columns(1).SetWidth(CentimetersToPoints(w1), c.wdAdjustProportional)
        if tbl.Columns.Count >= 2:
            tbl.Columns(2).SetWidth(CentimetersToPoints(w2), c.wdAdjustProportional)
        tbl.Rows(1).HeadingFormat = True
        myrange = tbl.Rows(1).Range
        myrange.Font.Bold = True
        fmt = myrange.ParagraphFormat
        fmt.SpaceBefore = 3
        fmt.SpaceAfter = 3
        return tbl

Slide 49: 8.6 Creating results
- formatting output in Word (3 of 3)

Slide 50: A Supplemental
- how to find the right Access version of an .mdb file
- ODBC data sources and ADODB
- Running your data through SAS
- how to compress all your Access databases overnight

Slide 51: A.1 Supplemental
- how to find the right Access version of an .mdb file
    - This is a common problem: Access shows up and offers to convert your database. But you want to open it with the right Access version instead.
    - Solution: does what its name says.

Usage: After installation, you can double-click any .MDB file from the Explorer, and the corresponding Access version will be loaded. Only this action is intercepted. Opening from the file menu works as usual.

Installation:
- Copy this file into your python directory.
- Edit the path settings for your Access executables.
- From Explorer, open the File Type settings for .MDB and change the "open" method as follows (adjust paths accordingly):
    d:\python\pythonw.exe d:\python\ "%1"

DAO's "Version" method for Access 7/8 won't help, since the databases are identical in structure. Only the objects which Access stores in the database have changed. An undocumented feature which works for all my databases is a property "AccessVersion" which is stored in a database property collection. We use DAO to read the version number and fire up the right MSACCESS.EXE version.

Slide 52: A.2 Supplemental
- ODBC data sources and ADODB
    - This chapter is in preparation. The axsaxs module meanwhile has an axsado companion which works with several databases, like MS SQL Server 6.5, and it will support ODBC sources as well. The module will be ready in a few weeks.

Slide 53: A.3 Supplemental
- Running your data through SAS
    - This chapter is in preparation. The SAS module is under development and will be ready in a few weeks.

Slide 54: A.4 Supplemental
- how to compress all your databases overnight (1 of 3)

    # 19980113 recursion
    # compact
    # CT971215
    # compresses MS Access databases
    # and keeps the date info.
    import dao3032, os, stat, string, sys

    infofile = ".compactinfo"

    def compact(fname):
        e=dao3032.DBEngine()
        x=os.stat(fname)
        tim=(x[7],x[8]) # == stat.ST_ATIME, stat.ST_MTIME
        fneu = fname+".neu"
        e.CompactDatabase(fname, fneu)
        os.utime(fneu, tim)
        os.unlink(fname)
        os.rename(fneu, fname)
        x=os.stat(fname)
        return x[stat.ST_SIZE]

Slide 55: A.4 Supplemental
- how to compress all your databases overnight (2 of 3)

    def get_tree(path, retlist=None):
        if retlist is None: retlist=[]
        dirs = [] ; known = {}
        try:
            for row in open(os.path.join(path, infofile)).readlines():
                fields = eval(row)
                if len(fields) == 2:
                    known[fields[0]] = fields[1]
        except: pass
        for entry in os.listdir(path):
            fullname = os.path.join(path, entry)
            x=os.stat(fullname)
            mode = x[0] # == stat.ST_MODE
            if stat.S_ISDIR(mode):
                print "Directory: %s" %entry
                dirs.append(entry)
            elif stat.S_ISREG(mode) and string.lower(entry)[-4:]==".mdb":
                print "Database %s" % entry
                if not (mode & stat.S_IWRITE):
                    print 10*'*', 'WRITE-PROTECTED', 10*'*'
                    continue
                if known.has_key(entry) and known[entry]==x[stat.ST_SIZE]:
                    print 10*'+', 'size is unchanged', 10*'+'
                    continue

Slide 56: A.4 Supplemental
- how to compress all your databases overnight (3 of 3)

                # (still inside the elif branch of get_tree)
                retlist.append((x[stat.ST_SIZE], fullname))
        for subdir in dirs:
            newpath = os.path.join(path, subdir)
            print "recursing into %s" % newpath
            get_tree(newpath, retlist)
            print "back from %s" % newpath
        return retlist

    def compactall(path):
        print "searching tree %s" % path
        worklist = get_tree(path)
        print "sorting by size, smallest first"
        worklist.sort()
        for (size, database) in worklist:
            print "compacting %s (%d)" % (database, size),
            sys.stdout.flush()
            newsize = compact(database)
            logpath, entry = os.path.split(database)
            logpath = os.path.join(logpath, infofile)
            open(logpath, "a").write(repr((entry, newsize))+"\n")
            print "- %d (ratio=%0.2f)" % (newsize, (0.0+newsize)/size)
        print "done compacting all of %s" % path
