gdutils.extract

extract is a module in the gdutils package. It provides the ExtractTable class, which is used to convert and extract tabular data.

Module Functions

extract.read_file

gdutils.extract.read_file(filename: str, column: Optional[str] = None, value: Union[str, List[str], None] = None)

Returns an ExtractTable instance with a specified input filename.

Parameters:
  • filename (str) – Name/path of input file of tabular data to read.
  • column (str | None, optional, default = None) – Label of column to use as index for extracted table.
  • value (str | List[str] | None, optional, default = None) – Value(s) of specified column in rows to extract.
Returns:

An ExtractTable instance.

Return type:

extract.ExtractTable

Examples

>>> et1 = extract.read_file('example/input.shp')
>>> et2 = extract.read_file('example/file.csv', column='ID')
>>> et3 = extract.read_file('in.shp', column='foo', value='bar')
>>> et4 = extract.read_file('in.csv', column='X', value=['1','3'])
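
Because value accepts a single string, a list of strings, or None, calling code may find it convenient to normalize it to one shape before use. The following is a minimal sketch in plain Python; normalize_value is a hypothetical helper written for illustration and is not part of gdutils.

```python
from typing import List, Optional, Union


def normalize_value(value: Union[str, List[str], None]) -> Optional[List[str]]:
    """Return value as a list of strings, or None if no value was given.

    Hypothetical helper for illustration; not part of gdutils.
    """
    if value is None:
        return None
    if isinstance(value, str):
        return [value]  # a single value becomes a one-element list
    return list(value)  # copy, so the caller's list is never mutated
```

With this helper, 'bar' and ['bar'] are treated identically by whatever row filter comes next.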

Class gdutils.extract.ExtractTable

class gdutils.extract.ExtractTable(infile: Union[str, geopandas.geodataframe.GeoDataFrame, pandas.core.frame.DataFrame, None] = None, outfile: Optional[str] = None, column: Optional[str] = None, value: Union[str, List[str], None] = None)

For extracting tabular data. Run help(extract.ExtractTable) to view docs.

Specifying outfile determines the file type of the output table. Specifying column uses the given column as the output’s index. Specifying value restricts the output to rows whose entry in the specified column matches the given value(s).

infile

Name/path of input file of tabular data to read.

Type: str, optional, default = None
outfile

Path of output file for writing.

Type: pathlib.Path, optional, default = None
column

Label of column to use as index for extracted table.

Type: str, optional, default = None
value

Value(s) of specified column in rows to extract.

Type: str | List[str], optional, default = None
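
The class description says that outfile determines the file type of the output. One plausible way to make that choice is from the file extension. The sketch below is an assumption for illustration only: guess_format and SUFFIX_TO_FORMAT are hypothetical names, the mapping covers only extensions that appear in the examples on this page, and gdutils itself may decide differently.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping; the format labels are illustrative, not gdutils names.
SUFFIX_TO_FORMAT = {
    ".csv": "csv",
    ".xlsx": "excel",
    ".md": "markdown",
    ".tex": "latex",
    ".shp": "shapefile",
}


def guess_format(outfile: Optional[str]) -> Optional[str]:
    """Return a format label for outfile, or None if it cannot be inferred."""
    if outfile is None:
        return None
    # Compare suffixes case-insensitively, so 'out.TEX' matches '.tex'.
    return SUFFIX_TO_FORMAT.get(Path(outfile).suffix.lower())
```

A path with no extension, such as 'output', yields None here, which is the case where an explicit driver would be needed.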

Class Methods

extract.ExtractTable.__init__

gdutils.extract.ExtractTable.__init__(self, infile: Union[str, geopandas.geodataframe.GeoDataFrame, pandas.core.frame.DataFrame, None] = None, outfile: Optional[str] = None, column: Optional[str] = None, value: Union[str, List[str], None] = None)

ExtractTable initializer. Returns an ExtractTable instance.

Parameters:
  • infile (str | gpd.GeoDataFrame | pd.DataFrame | None, optional, default = None) – Name/path of input file of tabular data to read or geopandas GeoDataFrame or pandas DataFrame.
  • outfile (str | None, optional, default = None) – Name/path of output file for writing.
  • column (str | None, optional, default = None) – Label of column to use as index for extracted table.
  • value (str | List[str] | None, optional, default = None) – Value(s) of specified column in rows to extract.
Returns:

An ExtractTable instance.

Return type:

extract.ExtractTable

See also

extract.read_file()

Examples

>>> et1 = extract.ExtractTable()
# creates an empty ExtractTable instance
>>> et2 = extract.ExtractTable('example/input.shp')
# initializes the input tabular data
>>> et3 = extract.ExtractTable('example/file.csv', column='ID')
# initializes the input tabular data and sets the column to use
# as the index
>>> et4 = extract.ExtractTable('input.xlsx', 'output.md')
# initializes the input tabular data and specifies the output
# file
>>> et5 = extract.ExtractTable('in.csv', 'out.tex', 'ID', '01')
# initializes input tabular data source, output file, column to
# use as index, and value that isolates subtable to extract
>>> et6 = extract.ExtractTable('in.shp', column='ID', value=['1', '3'])
# initializes input data source, column, and a list of values that
# isolate the subtable to extract
>>> et7 = extract.ExtractTable(gpd.GeoDataFrame())
# initializes the input data source as a geopandas GeoDataFrame
>>> et8 = extract.ExtractTable(pd.DataFrame())
# initializes the input data source as a pandas DataFrame

Instance Methods

extract.ExtractTable.extract

gdutils.extract.ExtractTable.extract(self) → geopandas.geodataframe.GeoDataFrame

Returns a geopandas GeoDataFrame containing the extracted subtable.

Returns: A geopandas GeoDataFrame of the extracted table.
Return type: gpd.GeoDataFrame
Raises: RuntimeError – Raised if trying to extract from non-existent tabular data.

See also

extract.ExtractTable.extract_to_file()

Examples

>>> et = extract.read_file('input.csv')
>>> df1 = et.extract()
# extracts a GeoDataFrame from a '.csv' file
>>> print(df1.head())
  Unnamed: 0 col1 col2 geometry
0       asdf    a    b     None
1       fdsa    c    d     None
2       lkjh    c    3     None
>>> et.column = 'col1'
# sets index from column 'col1'
>>> print(et.extract().head())
     Unnamed: 0 col2 geometry
col1
a          asdf    b     None
c          fdsa    d     None
c          lkjh    3     None
>>> et.value = 'c'
# sets the isolating value to 'c'
>>> print(et.extract().head())
     Unnamed: 0 col2 geometry
col1
c          fdsa    d     None
c          lkjh    3     None
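
The column and value steps shown above resemble a set-index-then-filter pattern in pandas. The sketch below reproduces that pattern on the same sample data, minus the geometry column. It is an illustration under assumptions, not the gdutils implementation: extract_subtable is a hypothetical helper, and only standard pandas calls (DataFrame.set_index, Index.isin) are used.

```python
from typing import List, Optional, Union

import pandas as pd


def extract_subtable(
    table: pd.DataFrame,
    column: Optional[str] = None,
    value: Union[str, List[str], None] = None,
) -> pd.DataFrame:
    """Index table by column, then keep rows whose index is in value.

    Hypothetical helper for illustration; not part of gdutils.
    """
    if column is None:
        return table
    indexed = table.set_index(column)
    if value is None:
        return indexed
    wanted = [value] if isinstance(value, str) else list(value)
    return indexed[indexed.index.isin(wanted)]


table = pd.DataFrame(
    {
        "Unnamed: 0": ["asdf", "fdsa", "lkjh"],
        "col1": ["a", "c", "c"],
        "col2": ["b", "d", "3"],
    }
)
subtable = extract_subtable(table, column="col1", value="c")
print(subtable)
```

Passing a list such as ['a', 'c'] for value keeps rows matching any of the listed entries.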

extract.ExtractTable.extract_to_file

gdutils.extract.ExtractTable.extract_to_file(self, outfile: Optional[str] = None, driver: Optional[str] = None) → NoReturn

Writes the tabular extracted data to a file.

If a Fiona-supported OGR driver is given, the file is written using that driver. If outfile is None, the data is printed as plain text to stdout.

Parameters:
  • outfile (str | None, optional, default = None) – Name of file to write extracted data.
  • driver (str | None, optional, default = None) – Name of the Fiona-supported OGR driver to use for file writing.
Raises:

RuntimeError – Raised if unable to extract to output file.

See also

extract.ExtractTable.extract()

Examples

>>> et1 = extract.read_file('input.csv', 'col2', ['b', 'd'])
>>> et1.extract_to_file()
# outputs the extracted table to standard output
     Unnamed: 0 col1
col2
b          asdf    a
d          fdsa    c
>>> et1.outfile = 'output.xlsx'
# sets the output file to 'output.xlsx'
>>> et1.extract_to_file()
# outputs the extracted Excel table to 'output.xlsx'
>>> et2 = extract.ExtractTable('input.shp', 'output', 'col1', 'square')
# sets the output file to 'output'
>>> et2.extract_to_file(driver='ESRI Shapefile')
# extracts table to 'output' in specified format of 'ESRI Shapefile'
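
The "print to stdout when no output file is set, otherwise write to the file" behavior described for extract_to_file can be sketched with the standard library alone. In the sketch below, CSV stands in for whichever formats the library actually supports, and write_rows is a hypothetical helper that is not part of gdutils.

```python
import csv
import sys
from typing import Dict, List, Optional


def write_rows(rows: List[Dict[str, str]], outfile: Optional[str] = None) -> None:
    """Write rows as CSV to outfile, or to stdout when outfile is None.

    Hypothetical helper for illustration; not part of gdutils.
    """
    if not rows:
        raise RuntimeError("no tabular data to write")
    fieldnames = list(rows[0])
    if outfile is None:
        # No output file given: print the table to standard output.
        writer = csv.DictWriter(sys.stdout, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
        return
    # newline="" lets the csv module control line endings itself.
    with open(outfile, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```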

extract.ExtractTable.list_columns

gdutils.extract.ExtractTable.list_columns(self) → numpy.ndarray

Returns a list of all columns in the initialized source tabular data.

Returns: An array of column names in the initialized table.
Return type: np.ndarray
Raises: RuntimeError – Raised if trying to list columns from non-existent tabular data.

See also

extract.ExtractTable.list_values()

Examples

>>> et = extract.read_file('input.csv')
>>> cols = et.list_columns()
# gets a list of columns from 'input.csv'
>>> print(cols)
['Unnamed: 0' 'col1' 'col2']
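
The printed result has no commas because, per the signature, list_columns returns a NumPy array rather than a Python list. The sketch below shows the same kind of object obtained from a plain pandas DataFrame. It assumes nothing about gdutils internals and uses only Index.to_numpy and ndarray.tolist.

```python
import numpy as np
import pandas as pd

# An empty table with the same column labels as the example above.
table = pd.DataFrame(columns=["Unnamed: 0", "col1", "col2"])

cols = table.columns.to_numpy()  # NumPy array of column labels
assert isinstance(cols, np.ndarray)

col_list = cols.tolist()  # plain Python list, if one is needed
print(col_list)
```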

extract.ExtractTable.list_values

gdutils.extract.ExtractTable.list_values(self, column: Optional[str] = None, unique: Optional[bool] = False) → Union[numpy.ndarray, geopandas.array.GeometryArray]

Returns a list of values in the initialized column (default).

If column is given, returns the values in that column instead. If unique is True, returns only the unique values.

Parameters:
  • column (str | None, optional, default = None) – Name of the column whose values are to be listed. If None, lists the values of the initialized column.
  • unique (bool, optional, default = False) – If True, function lists only unique values.
Returns:

An array of values in the given column of the initialized source table. If the column is the ‘geometry’ column of a geopandas GeoDataFrame, the return value is a GeometryArray.

Return type:

np.ndarray | gpd.array.GeometryArray

Raises:
  • RuntimeError – Raised if trying to list values from non-existent tabular data.
  • KeyError – Raised if column does not exist in tabular data.
  • RuntimeError – Raised if trying to list values from non-existent column.

See also

extract.ExtractTable.list_columns()

Examples

>>> et = extract.read_file('input.csv', 'col2')
>>> vals = et.list_values()
# gets a list of values in 'col2' from 'input.csv'
>>> print(vals)
['b' 'd' '3' '5' '10']
>>> vals = et.list_values('col1')
# gets a list of values in 'col1' from 'input.csv'
>>> print(vals)
['a' 'c' 'c' 'c' 'b']
>>> vals = et.list_values('col1', unique=True)
# gets a list of unique values in 'col1' from 'input.csv'
>>> print(vals)
['a' 'c' 'b']
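
In the last example the unique values come back in order of first appearance ('a', 'c', 'b') rather than sorted. pandas' Series.unique behaves this way, whereas numpy.unique returns sorted values. Whether gdutils uses either call internally is an assumption. The sketch only contrasts the two orderings on the sample data.

```python
import numpy as np
import pandas as pd

col1 = pd.Series(["a", "c", "c", "c", "b"], name="col1")

all_values = col1.to_numpy()           # every value, duplicates included
unique_in_order = col1.unique()        # unique values, order of first appearance
unique_sorted = np.unique(all_values)  # unique values, sorted

print(list(unique_in_order))
print(list(unique_sorted))
```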