pydataframe

A DataFrame (tabular data structure) for Python, similar to R's dataframe

pydataframe Ranking & Summary

  • License: BSD License
  • Price: FREE
  • Publisher Name: Florian Finkernagel
  • Publisher web site: http://coonabibba.de

pydataframe Description

pydataframe is an implementation of an almost R-like DataFrame object.

Usage:

    u = DataFrame( { "Field1": , "Field2": }, optional: )

A DataFrame is basically a table with rows and columns. Columns are named, rows are numbered (but can be named), and both can easily be selected and calculated upon. Internally, columns are stored as 1d numpy arrays. If you set row names, they are converted into a dictionary for fast access.

There is a rich subselection/slicing API; see help(DataFrame.get_item) (it also works for setting values). Note that any slice gives you another DataFrame; to access individual entries, use get_row(), get_column(), or get_value().

DataFrames also understand basic arithmetic: you can add (multiply, ...) either a constant value or another DataFrame of the same size / with the same column names, like this:

    # multiply every value in ColumnA that is smaller than 5 by 6
    my_df < 5, 'ColumnA'] *= 6
    # you always need to specify both row and column selectors; use : to mean everything
    my_df = my_df + my_df
    # let's take every row that starts with Shu in ColumnA and replace it with a new list (comprehension)
    select = my_df.where(lambda row: row.startswith('Shu'))
    my_df = .replace('Shu', 'Sha') for row in my_df.iter_rows()]

DataFrames talk directly to R via rpy2 (rpy2 is not a prerequisite for the library!):

    from dataframe import DataFrame
    from rpy2 import robjects as ro
    my_df = DataFrame({"ColumnA": , 'ColumnB': })
    ro.r(my_df)

Combine DataFrames on rows or columns:

    my_df = a.rbind_copy(b)  # a and b have the same columns
    my_df = a.cbind_view(b)  # my_df is a composite sharing numpy arrays (columns) with a and b
    my_df = a.join_columns_on(b, 'Name_in_A', 'Name_in_B')  # join on common values

    # manipulate DataFrame columns
    my_df.insert_column("new_column_name", )
    my_df.drop_column('dropped_column_name')
    my_df.drop_all_columns_except('keep_me_please', 'keep_me_as_well')
    my_df.rename_column("old", "new")
    print(my_df.get_column_names())
    my_df.impose_partial_column_order(,)  # set the column order; everything between the first and the second list (unspecified columns) gets sorted alphabetically

    # access data
    my_df  # a new DataFrame with one column and one row
    my_df.get_value(100, 'ColumnA')  # whatever was in row 100, column 'ColumnA' (string, int, object...)
    my_df.get_row(100)  # -> {"ColumnA": value, "ColumnB": another_value}
    my_df.get_row_as_list(100)  # -> , in order of my_df.columns_ordered
    my_df.get_column('ColumnA')  # numpy array of the column (a copy)
    my_df.get_column_view('ColumnA')  # the actual underlying numpy array

    # iterate across the data
    my_df.iter_rows()  # iterate rows as dictionaries
    my_df.iter_rows_as_list()  # iterate rows as lists (see get_row_as_list())
    my_df.iter_values_columns_first()  # value by value: first column row 1, first column row 2, ...
    my_df.iter_values_rows_first()  # value by value: first column row 1, second column row 1, ...

    # turn into a boolean array for subselection
    my_df.where(lambda row: row.startswith("Hello") and row >= 5)
    my_df > 5  # any comparison

    # sort
    sorted_df = my_df.sort_by("ColumnA")  # copy sorted by ColumnA, ascending
    sorted_df = my_df.sort_by("ColumnA", False)  # copy sorted by ColumnA, descending
    sorted_df = my_df.sort_by(, )  # copy sorted by ColumnA descending, then ColumnB ascending

    # aggregation functions
    my_df.mean('ColumnA')  # average (mean) of the values in ColumnA
    my_df.mean_and_std('ColumnA')  # mean and standard deviation of the values in ColumnA

    # translate columns
    my_df.turn_into_level('ColumnA')  # turn into an R-compatible factor; optional: order of levels
    my_df.digitize_column('ColumnA')  # bin the values
    my_df.rankify_column('ColumnA', True)  # turn into ranks, ascending: (0.5, 0.6, 0.55) -> 0, 2, 1
    my_df.rescale_column_0_1('ColumnA')  # rescale a column to lie within 0..1 (inclusive)

    # import and export
    my_df = pydataframe.DF2CSV().read("filename", dialect=pydataframe.TabDialect(), handle_quotes=True)  # read a tab-separated value file; lots of options, please check the code
    pydataframe.DF2CSV().write(my_df, filename, dialect=pydataframe.TabDialect())  # write a tab-separated value file
    pydataframe.DF2Excel().read(filename)
    pydataframe.DF2Excel().write(my_filename)

Requirements: Python
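
The description says columns are stored internally as 1d numpy arrays and that row names are turned into a dictionary for fast access. Below is a toy sketch of that layout. It uses plain Python lists in place of numpy arrays so that it runs without dependencies. The class name TinyFrame is hypothetical, and its get_value()/get_row()/get_row_as_list() methods only mimic the behaviour listed above; this is not pydataframe's code.

```python
class TinyFrame:
    """Hypothetical column-oriented table (NOT pydataframe's DataFrame).

    Plain lists stand in for the 1d numpy arrays pydataframe uses.
    """

    def __init__(self, columns, row_names=None):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.num_rows = lengths.pop() if lengths else 0
        self.columns_ordered = list(columns)
        self.columns = {name: list(values) for name, values in columns.items()}
        # Optional row names: name -> row number, so a lookup is one dict access.
        self.row_index = None
        if row_names is not None:
            if len(row_names) != self.num_rows:
                raise ValueError("need exactly one row name per row")
            self.row_index = {name: i for i, name in enumerate(row_names)}

    def _row_number(self, row):
        # Strings are treated as row names when row names were set.
        if self.row_index is not None and isinstance(row, str):
            return self.row_index[row]
        return row

    def get_value(self, row, column):
        return self.columns[column][self._row_number(row)]

    def get_row(self, row):
        i = self._row_number(row)
        return {name: self.columns[name][i] for name in self.columns_ordered}

    def get_row_as_list(self, row):
        i = self._row_number(row)
        return [self.columns[name][i] for name in self.columns_ordered]


df = TinyFrame({"ColumnA": [1, 7, 3], "ColumnB": ["x", "y", "z"]},
               row_names=["r1", "r2", "r3"])
print(df.get_value("r2", "ColumnA"))  # 7
print(df.get_row(0))                  # {'ColumnA': 1, 'ColumnB': 'x'}
print(df.get_row_as_list("r3"))       # [3, 'z']
```

Keeping one list per column makes whole-column operations cheap. The row-name dictionary turns a named lookup into a single hash access instead of a search.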
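
The selection examples combine a per-row test (where()) with an in-place update of the selected cells. Here is a minimal, dependency-free sketch of that pattern on a dictionary of lists. The helpers where() and apply_masked() are hypothetical; pydataframe itself does this through its slicing syntax on numpy arrays.

```python
def where(rows, predicate):
    """One boolean per row: True where predicate(row) holds."""
    return [bool(predicate(row)) for row in rows]


def apply_masked(column, mask, func):
    """In place: column[i] = func(column[i]) wherever mask[i] is True."""
    if len(column) != len(mask):
        raise ValueError("mask must have one entry per row")
    for i, selected in enumerate(mask):
        if selected:
            column[i] = func(column[i])


columns = {"ColumnA": [3, 8, 4, 10], "ColumnB": ["Shu1", "abc", "Shu2", "xyz"]}
# Rows as dictionaries, similar in spirit to what iter_rows() is described as yielding.
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]

# Multiply every value in ColumnA that is smaller than 5 by 6.
small = where(rows, lambda row: row["ColumnA"] < 5)
apply_masked(columns["ColumnA"], small, lambda v: v * 6)

# Replace the "Shu" prefix with "Sha" in ColumnB.
starts_shu = where(rows, lambda row: row["ColumnB"].startswith("Shu"))
apply_masked(columns["ColumnB"], starts_shu, lambda s: s.replace("Shu", "Sha", 1))

print(columns["ColumnA"])  # [18, 8, 24, 10]
print(columns["ColumnB"])  # ['Sha1', 'abc', 'Sha2', 'xyz']
```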
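
Arithmetic against a constant, or against another DataFrame with the same column names, amounts to an element-wise operation per column. The sketch below works under that assumption. It uses a hypothetical combine() helper on dictionaries of lists and is not pydataframe's operator overloading.

```python
import operator


def combine(left, right, op):
    """Element-wise arithmetic on dict-of-lists tables.

    `right` is either a scalar or a table with the same column names and lengths.
    """
    if isinstance(right, dict):
        if set(left) != set(right):
            raise ValueError("both tables need the same column names")
        result = {}
        for name, values in left.items():
            other = right[name]
            if len(other) != len(values):
                raise ValueError(f"column {name!r} differs in length")
            result[name] = [op(a, b) for a, b in zip(values, other)]
        return result
    # Scalar: apply it to every cell.
    return {name: [op(a, right) for a in values] for name, values in left.items()}


df = {"ColumnA": [1, 2, 3], "ColumnB": [10, 20, 30]}
print(combine(df, df, operator.add))  # {'ColumnA': [2, 4, 6], 'ColumnB': [20, 40, 60]}
print(combine(df, 6, operator.mul))   # {'ColumnA': [6, 12, 18], 'ColumnB': [60, 120, 180]}
```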
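
rbind_copy() and cbind_view() differ in whether the result copies the data or shares the column arrays with its inputs. The hypothetical helpers stack_rows() and bind_columns_shared() below illustrate that distinction with plain lists. Mutating an input afterwards shows that only the column-bound result sees the change.

```python
def stack_rows(a, b):
    """Rows of b under those of a, in freshly allocated lists (a copy)."""
    if set(a) != set(b):
        raise ValueError("row binding needs the same columns in both tables")
    return {name: list(a[name]) + list(b[name]) for name in a}


def bind_columns_shared(a, b):
    """Columns of a and b side by side, WITHOUT copying the column lists."""
    clash = set(a) & set(b)
    if clash:
        raise ValueError(f"duplicate column names: {sorted(clash)}")
    lengths = {len(col) for col in a.values()} | {len(col) for col in b.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    combined = dict(a)  # new dict, but its values are the same list objects
    combined.update(b)
    return combined


a = {"ColumnA": [1, 2], "ColumnB": ["x", "y"]}
b = {"ColumnA": [3], "ColumnB": ["z"]}
c = {"ColumnC": [True, False]}

stacked = stack_rows(a, b)
side_by_side = bind_columns_shared(a, c)
a["ColumnA"][0] = 100           # mutate a after combining
print(stacked["ColumnA"])       # [1, 2, 3]  (the copy is unaffected)
print(side_by_side["ColumnA"])  # [100, 2]   (the shared column sees the change)
```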
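
join_columns_on() is described only as a join on common values. The sketch below assumes inner-join semantics: rows without a match are dropped, and every matching pair produces one row. It is implemented as a hash join. The name inner_join_on() and the "_right" suffix for clashing column names are illustrative choices, not documented pydataframe behaviour.

```python
def inner_join_on(left, right, left_key, right_key):
    """Inner join of two dict-of-lists tables on equal key values."""
    # Index the right table: key value -> list of its row numbers.
    index = {}
    for j, key in enumerate(right[right_key]):
        index.setdefault(key, []).append(j)

    right_names = [name for name in right if name != right_key]
    renamed = {name: (name + "_right" if name in left else name) for name in right_names}
    out = {name: [] for name in list(left) + list(renamed.values())}

    # One output row per (left row, matching right row) pair.
    for i, key in enumerate(left[left_key]):
        for j in index.get(key, []):
            for name in left:
                out[name].append(left[name][i])
            for name in right_names:
                out[renamed[name]].append(right[name][j])
    return out


a = {"Name_in_A": ["x", "y", "z"], "score": [1, 2, 3]}
b = {"Name_in_B": ["z", "x", "q"], "city": ["Zurich", "Oslo", "Quito"]}
joined = inner_join_on(a, b, "Name_in_A", "Name_in_B")
print(joined)
# {'Name_in_A': ['x', 'z'], 'score': [1, 3], 'city': ['Oslo', 'Zurich']}
```

Building the index over the right table once keeps the join linear in the number of rows plus matches, instead of comparing every pair of rows.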
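
impose_partial_column_order() takes two lists. The first fixes the leading columns and the second the trailing ones; everything unspecified in between is sorted alphabetically. The sketch below covers just that ordering rule, using a hypothetical partial_column_order() function that returns the new list of column names.

```python
def partial_column_order(all_columns, first, last):
    """`first` in the given order, then every unspecified column sorted
    alphabetically, then `last` in the given order."""
    missing = [name for name in list(first) + list(last) if name not in all_columns]
    if missing:
        raise KeyError(f"unknown columns: {missing}")
    pinned = set(first) | set(last)
    middle = sorted(name for name in all_columns if name not in pinned)
    return list(first) + middle + list(last)


columns = ["zeta", "id", "beta", "notes", "alpha"]
print(partial_column_order(columns, ["id"], ["notes"]))
# ['id', 'alpha', 'beta', 'zeta', 'notes']
```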
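
The two value iterators differ only in traversal order. Here is a sketch over a dictionary of lists; the generator names are hypothetical.

```python
def values_columns_first(columns, order):
    """Column 1 row 1, column 1 row 2, ..., then column 2 row 1, ..."""
    for name in order:
        yield from columns[name]


def values_rows_first(columns, order):
    """Row 1 column 1, row 1 column 2, ..., then row 2 column 1, ..."""
    n_rows = len(columns[order[0]]) if order else 0
    for i in range(n_rows):
        for name in order:
            yield columns[name][i]


cols = {"A": [1, 2], "B": ["x", "y"]}
print(list(values_columns_first(cols, ["A", "B"])))  # [1, 2, 'x', 'y']
print(list(values_rows_first(cols, ["A", "B"])))     # [1, 'x', 2, 'y']
```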
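
sort_by() accepts several columns with a direction for each. One standard way to get that behaviour is to rely on a stable sort: sort by the least significant key first and the most significant key last. This approach is assumed here, not taken from pydataframe's source, and sorted_copy() is a hypothetical helper.

```python
def sorted_copy(columns, keys, ascending):
    """Sorted copy of a dict-of-lists table.

    keys and ascending are parallel lists; earlier keys take priority.
    """
    n_rows = len(next(iter(columns.values()))) if columns else 0
    order = list(range(n_rows))
    # list.sort is stable (also with reverse=True), so sorting by the least
    # significant key first and the most significant key last yields the
    # combined ordering, each key with its own direction.
    for key, asc in reversed(list(zip(keys, ascending))):
        order.sort(key=lambda i: columns[key][i], reverse=not asc)
    return {name: [values[i] for i in order] for name, values in columns.items()}


df = {"ColumnA": [1, 3, 3, 2], "ColumnB": ["d", "b", "a", "c"]}
# ColumnA descending, then ColumnB ascending.
print(sorted_copy(df, ["ColumnA", "ColumnB"], [False, True]))
# {'ColumnA': [3, 3, 2, 1], 'ColumnB': ['a', 'b', 'c', 'd']}
```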
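
mean_and_std() does not say which standard deviation it reports. Because the columns are numpy arrays, numpy's default (the population version, dividing by n) is a plausible guess, but that is an assumption. The sketch below uses Python's statistics module and mentions the sample alternative in its docstring.

```python
import statistics


def mean_and_std(values):
    """Mean and population standard deviation (divides by n).

    statistics.stdev would give the sample version (divides by n - 1),
    which is what R's sd() reports.
    """
    return statistics.mean(values), statistics.pstdev(values)


m, s = mean_and_std([2, 4, 4, 4, 5, 5, 7, 9])
print(m, s)  # mean 5, population standard deviation 2.0
```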
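
The column translations can be sketched with the standard library. All three helper names below are hypothetical.

- rank_positions() reproduces the documented example (0.5, 0.6, 0.55) -> 0, 2, 1. As an assumption, ties keep their original order.
- rescale_0_1() maps the minimum to 0 and the maximum to 1.
- digitize() mimics numpy.digitize for increasing bin edges, because the description does not say how digitize_column() picks its bins.

```python
import bisect


def rank_positions(values, ascending=True):
    """0-based ranks; ties keep their original order (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)
    ranks = [0] * len(values)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks


def rescale_0_1(values):
    """Linearly map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant column cannot be rescaled")
    return [(v - lo) / (hi - lo) for v in values]


def digitize(values, edges):
    """Bin index per value, like numpy.digitize with increasing edges:
    the i such that edges[i-1] <= v < edges[i]."""
    return [bisect.bisect_right(edges, v) for v in values]


print(rank_positions([0.5, 0.6, 0.55]))         # [0, 2, 1]
print(rescale_0_1([2, 4, 6]))                   # [0.0, 0.5, 1.0]
print(digitize([0.5, 1, 2.5, 10], [1, 2, 5]))   # [0, 1, 2, 3]
```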
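
turn_into_level() produces an R-compatible factor. In R, a factor stores 1-based integer codes plus a list of levels, and the default levels are the sorted unique values. The sketch below shows that encoding with a hypothetical to_factor() helper. It includes an explicit level order, like the optional argument mentioned above.

```python
def to_factor(values, levels=None):
    """R-style factor: 1-based integer codes plus the list of levels.

    Like R's factor(), the default levels are the sorted unique values.
    Unlike R (which gives NA), a value missing from `levels` raises KeyError here.
    """
    if levels is None:
        levels = sorted(set(values))
    code = {level: i + 1 for i, level in enumerate(levels)}
    return [code[v] for v in values], list(levels)


codes, levels = to_factor(["low", "high", "mid", "low"], levels=["low", "mid", "high"])
print(codes, levels)  # [1, 3, 2, 1] ['low', 'mid', 'high']
```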
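
DF2CSV reads and writes tab-separated files. Below is a minimal round trip using Python's csv module and its built-in "excel-tab" dialect. It is not DF2CSV's implementation, and it leaves every value as a string on reading; whether DF2CSV converts types is not covered by the description. The helpers write_tsv() and read_tsv() are hypothetical.

```python
import csv
import io


def write_tsv(columns, order, handle):
    """Header row, then one tab-separated line per table row."""
    writer = csv.writer(handle, dialect="excel-tab")
    writer.writerow(order)
    n_rows = len(columns[order[0]]) if order else 0
    for i in range(n_rows):
        writer.writerow([columns[name][i] for name in order])


def read_tsv(handle):
    """Header row plus data rows into a dict of lists (all values are str)."""
    reader = csv.reader(handle, dialect="excel-tab")
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        for name, value in zip(header, row):
            columns[name].append(value)
    return columns


# An in-memory file; newline="" is what the csv docs recommend for real files too.
buffer = io.StringIO(newline="")
write_tsv({"ColumnA": [1, 2], "ColumnB": ["two words", "z"]}, ["ColumnA", "ColumnB"], buffer)
buffer.seek(0)
table = read_tsv(buffer)
print(table)  # {'ColumnA': ['1', '2'], 'ColumnB': ['two words', 'z']}
```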

