Friday, September 28, 2012

Reading Raster Data with Python and gdal

I am trying to learn Python for Geoprocessing. Here are some very basic notes on "playing" with Python/gdal/ogr. The online documentation is, I must say, rather confusing, so I try it step by step at the command line as below.

I follow the very useful lecture notes at http://www.gis.usu.edu/~chrisg/python/2009/ and the tutorial at http://www.gdal.org/gdal_tutorial.html

Importing both gdal and gdalconst; just "import gdalconst" does not work (and I don't understand why right now...):

    >>> import gdal
    >>> from gdalconst import *


Defining the filename (assuming it's located in the current working directory of python -- use os.getced and os.chdir):

    >>> filename = 'ERS1PRI_19920430o04133tr481fr1989_AppOrb_Calib_Spk_SarsimTC_LinDB.tif'

Now the file can be opened, the driver only needs to be imported for write-access if I understand correctly:

    >>> dataset = gdal.Open(filename, GA_ReadOnly)

Typing "dataset" shows the pointer to the opened file:

    >>> dataset
    <osgeo.gdal.Dataset; proxy of <Swig Object of type 'GDALDatasetShadow *' at 0x0000000003550EA0> >
>>>


Various information can be retrieved from the opened file:

    >>> cols = dataset.RasterXSize
    >>> rows = dataset.RasterYSize
    >>> bands = dataset.RasterCount
    >>> driver = dataset.GetDriver().LongName

    >>> cols
    7257
    >>> rows
    7226
    >>> bands
    1
    >>>driver

    'GeoTIFF'
    >>>

Geoinformation can be retrieved with GetGeoTransform().

>>> geotransform = dataset.GetGeoTransform()

The variable "geotransform" now contains a list with Geoinformation:

    >>> geotransform
    (368745.92379062285, 20.0, 0.0, 8828671.611738198, 0.0, -20.0) 


The answer to what these values mean are found in the documentation:

    adfGeoTransform[0] /* top left x */
    adfGeoTransform[1] /* w-e pixel resolution */
    adfGeoTransform[2] /* rotation, 0 if image is "north up" */
    adfGeoTransform[3] /* top left y */
    adfGeoTransform[4] /* rotation, 0 if image is "north up" */
    adfGeoTransform[5] /* n-s pixel resolution */


and one can retrieve a single value from this list for example with

    >>> originX = geotransform[0]
    >>> originY = geotransform[3]
    >>> pixelWidth = geotransform[1]
    >>> pixelHeight = geotransform[5]
    >>> originX
    368745.92379062285
    >>> originY
    8828671.611738198
    >>> pixelWidth
    20.0
    >>> pixelHeight
    -20.0 


But how to get the individual data values in the file?

Get the band and read the first line:

    >>> band = dataset.GetRasterBand(1)

    >>> bandtype = gdal.GetDataTypeName(band.DataType)
    >>> bandtype
    'Float32'
    >>> scanline = band.ReadRaster( 0, 0, band.XSize, 1,band.XSize, 1, band.DataType)




Since I was not sure what the "ReadRaster" parameters meant, google led me to this useful page:

The ReadRaster() call has the arguments: def ReadRaster(self, xoff, yoff, xsize, ysize, buf_xsize = None, buf_ysize = None, buf_type = None, band_list = None ): The xoff, yoff, xsize, ysize parameter define the rectangle on the raster file to read. The buf_xsize, buf_ysize values are the size of the resulting buffer. So you might say "0,0,512,512,100,100" to read a 512x512 block at the top left of the image into a 100x100 buffer (downsampling the image).
which I found here

Typing "scanline" gives me long lines of this:

    >>> scanline
    '`B\xa2\x8d`B\xa2\x8d`B\xa2\x8d`B\xa2\x8d`B\xa2\x8d`B\xa2\x8d`B\xa2\x8d`B\xa2\x8d`B\xa2\x8d`.........


Note that the returned scanline is of type string, and contains xsize*4 bytes of raw binary floating point data. To convert this into readable values use struct.unpack and instead you get long lines of float numbers:

    >>> import struct
    >>> value = struct.unpack('f' * band.XSize, scanline)
    >>> value
    (-1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, ....


Now I can get individual values, but all are in "one line" and not in an array:

     >>> value[8]
    -1.0000000031710769e-30


I rather read the whole file into an array:

    >>>data = band.ReadAsArray(0, 0, cols, rows)
    >>> value = data[3500,4000]
    >>> value
    -8.30476 


Using the numpy library I can define the datatype in the array:
   >>> import numpy
   >>> data = band.ReadAsArray(0, 0, dataset.RasterXSize, dataset.RasterYSize).astype(numpy.float)
   >>> value = data[3500,4000]
   >>> value
   -8.3047599792480469
   >>>


One has  to be careful not confusing column and rows! Matrix is value = data[row, column], and it starts with '0', so the value -8.30476 is located at y=row=3501 and x=column=4001.

To be continued....

33 comments:

  1. Hey Max,

    Any chance you've got x and y swapped on the very last set of identities in this blog?

    Seems like it should be y=row=3501,x=column=4001

    ReplyDelete
  2. oh yes, indeed, you are right!

    thanks for reading carefully!

    max

    ReplyDelete
  3. Haha, no problem ;-) I was actually trying to figure out why some of my own gdal arrays weren't behaving as I expected, so in the process of checking my own dimensions I ended up checking yours!

    Have you had any good experiences working with python/gdal and long raster time series?

    ReplyDelete
  4. >Have you had any good experiences working with python/gdal and long raster time series?

    Works really fine for me and getting into it --- I do processing of large numbers of images with Python, gdal and ESA NEST called from Python. Like this one extracting images containing area of interest and creating then calibrated, map projected GeoTIFFS for all Radarsat images (>100) in a folder:
    https://github.com/npolar/RemoteSensing/blob/master/ExtractSvalbard.py

    ReplyDelete
  5. oh cool! I hadn't seen the NEST library before; will keep that in mind for any future radar project.

    How are its interferometric tools?

    ReplyDelete
  6. hello max,

    your code for read the tiff image using gdal was perfectly working.i want to detect the lunar's crater (TIFF image).and i am going to use the hough transform algorithm.please help me for to work with that.and also i am going to find the depth and diameter of the crater.

    ReplyDelete
  7. Dear Prabakaran -- I don't think I can help you there, but try searching for example http://gis.stackexchange.com/ or post a question there. Greetings Max

    ReplyDelete
  8. Hello Max,

    I tried to use your tutorial to read .bil data from BioClim, and unfortunately it did not work. When I tried to convert the data with struct.unpack, I got this error message:
    struct.error: unpack requires a string argument of length 172800
    Therefore, I checked the length of scanline and found 86400. Maybe this has something to do with the band type (which is Int16 and not Float32), what do you think?
    How can I solve this issue and read my data?

    ReplyDelete
    Replies
    1. Not sure about this, all formats for the method above are found here http://www.gdal.org/formats_list.html but not BIL. Have not heard about this one before

      Here they say to convert it to Geotiff http://gis.stackexchange.com/questions/21028/adding-bil-raster-file-to-geoserver-and-viewing-it-on-geoserver-or-geonode ... Here I see how http://trac.osgeo.org/gdal/ticket/436.

      Apparently use gdal_translate filename.bil filename.tif and then use the tiff as I use it above

      Delete
    2. Thanks for your quick answer, Unfortunately I translated the file thanks to gdal_translate, but I still get the same error message, the same band type and the length of the scanline remains the same. Are the tiffs you read tiled or stripped ? Or do you know if there is a method to convert the band type ?

      Delete
    3. Hi again -- band type conversion is all gdal_translate --- this could be many things, maybe a corrupt file -- you could try posting your question at http://gis.stackexchange.com/

      Delete
    4. by accident I needed myself to read a *.bin" file these days, and information that helped me I found here https://stevendkay.wordpress.com/category/python/

      I think I write a short post on how I did it

      Delete
    5. You should use fmttypes in:

      values = struct.unpack(fmttypes[BandType] * band.YSize, scanline)

      The fmttypes could be defined in a dictionary:

      fmttypes = {'Byte':'B', 'UInt16':'H', 'Int16':'h', 'UInt32':'I', 'Int32':'i', 'Float32':'f', 'Float64':'d'}

      In my Blog (in spanish) there are examples with complete code and work perfectly.

      Delete
  9. Hi Max
    your script works very good , but is there a way to get all this informations in a file .txt ?

    ReplyDelete
    Replies
    1. which information you want written into a text file? For rows and columns you could use

      textfile = open( outputtextfile, 'w')
      textfile.write( 'rows: ' + str(rows) + ' cols: '+ str(cols) + ' \n\n' )
      textfile.close()

      Delete
  10. I need to convert a 2-d array,extracted from a raster image, into a text file. Text file should contain individual pixel values in different bands in that .tif image.

    ReplyDelete
    Replies
    1. the answer to the comment just above this one should give a hint on how to proceed? Otherwise check the mentioned tutorial at http://www.gis.usu.edu/~chrisg/python/2009/

      Delete
  11. i need python code to show the values of each pixel in diffrent bands of a raster image (.tif file) into one single 2-d array...that new array will contain the bands as its columns..
    i am new to python so pls help.

    ReplyDelete
    Replies
    1. similar to the question answered before, loop through each pixel, read it and write it to a text-file ...

      Delete
  12. please concatenate the python code to write that new array into a new textfile too.

    ReplyDelete
    Replies
    1. you may check at gis.stackexchange.com for such question ... as mentioned, with the comments above and the tutorial at http://www.gis.usu.edu/~chrisg/python/2009 you will find out ... I learned all what I need using this tutorial

      Delete
  13. can you please provide me the code to carry on that looping. How would i read pixels and their values in four arrays of individual bands.

    ReplyDelete
    Replies
    1. as mentioned before, please check the tutorial above, in there exactly these things are explained ...

      Delete
  14. This comment has been removed by the author.

    ReplyDelete
  15. Hello Max
    Are the information contained into data array relative to height of terrain?

    ReplyDelete
    Replies
    1. Hi Alessandro,

      if I understand your question correctly ... then this depends on the kind of raster array you have. If it is a digital elevation model, than the vallue at a raster point is the height of the terrain. If it is an image, then it is a brightness or backscatter or albedo value ...

      is that what you meant?

      Max

      Delete
    2. Thanks Max.
      I have a DEM file than the value contained into the array "data" represents a height map.
      More general, does it depend from the scope of specific application?

      Delete
    3. Hi again, I am unfortunately not sure what your questioms aims at ... An array or raster in Python or gdal is simply a mathematical array like a matrix. What the array contains is up to your data source or interest. Using gdal, you are interested in a geographical array. But the values at a givem point can be anything. Surface temperature, elevation, albedo or backscatter in case of a visual image, wind speed etc etc ... The gdal array is simply meant to be an array where a given raster point at a geographic location has some physical property described by the array value ... Python allows to work with arrays, gdal adds geographical ability like applying map projection etc etc ... Is that what you were after?

      Max

      Delete
  16. Hello Sir,
    How should I get all the pixel values of a tiff image? I m trying to get all the pixel values of land-use land cover image having 11 classes. I m new to the python programming.

    ReplyDelete
    Replies
    1. sorry, your question is not clear to me ... with the command above "data = band.ReadAsArray(0, 0, cols, rows)" the array "data" contains all the pixel values of the image. Otherwise have a look at the lecture notes linked at the beginning of this post

      Delete
  17. hi....
    i want to show list of bands and their values. how to do this using gdal

    ReplyDelete
  18. h I really find your blog useful, can you guide on extraction of calibrated value for radarsat 2 LUT files using this equation 𝑐𝑎𝑙𝑖𝑏𝑟𝑎𝑡𝑒𝑑 𝑣𝑎𝑙𝑢𝑒 = 𝑑𝑖𝑔𝑖𝑡𝑎𝑙 𝑣𝑎𝑙𝑢𝑒2 + 𝐵 /𝐴
    I have three LUT files (gamma, beta and alpha) and product.xml file, how do I extract calibrated value out of that? Any python liberty that you can suggest?

    ReplyDelete
  19. Hi
    Thanks for such a great post. I have a question-


    band1_12 = dataset1_12.GetRasterBand(1)
    max = band1_12.GetMaximum() // this gives a value of 2425

    but
    data1_12 = np.array(band1_12.ReadAsArray(0,0,band1_12.XSize, band1_12.YSize))
    print(np.amax(data1_12)) // this gives 1977

    Can you please tell why there is a difference and how to rectify it in python.

    ReplyDelete