Problem while decoding JPEG with TJpgDec

I'm currently working on JPEG decompression on an STM32. I found a great library from elm-chan (TJpgDec), but something seems to be wrong. Every call returns JDR_OK, yet the LCD shows only the first pixel of each block. Those pixels are valid, because the colors match the image. In debug mode I can see that the decompressor hands my output function only 4 bytes. I don't know what is going on. I think I've read every related discussion on the internet, and I can't find an error like this one. I hope someone has successfully ported this library to STM32 and can tell me what I'm doing wrong.

The code:
UINT tjd_input (
    JDEC* jd,       /* Decompression object */
    BYTE* buff,     /* Pointer to the read buffer (NULL: skip) */
    UINT nd         /* Number of bytes to read/skip from the input stream */
)
{
    UINT rb;

    if (buff) {     /* Read nd bytes from the input stream */
        fres = f_read(&fsrc, buff, nd, &rb);
        return rb;  /* Return the number of bytes actually read */
    } else {        /* Skip nd bytes on the input stream */
        return (f_lseek(&fsrc, f_tell(&fsrc) + nd) == FR_OK) ? nd : 0;
    }
}

/* User-defined callback function to output the RGB bitmap */
UINT tjd_output (
    JDEC* jd,       /* Decompression object of the current session */
    void* bitmap,   /* Bitmap data to be output */
    JRECT* rect     /* Rectangular region to output */
)
{
    jd = jd;    /* Suppress warning (device identifier is not needed in this application) */
    LCD_area(rect->left, rect->bottom, rect->right, rect->top);
    return 1;   /* Continue decompression */
}

void load_jpg (
    FIL *fp,        /* File to open */
    char *fn,
    void *work,     /* Pointer to the working buffer (must be 4-byte aligned) */
    UINT sz_work    /* Size of the working buffer (must be a power of 2) */
)
{
    JDEC jd;        /* Decompression object (70 bytes) */
    JRESULT rc;
    BYTE scale = 0;

    fres = f_open(&fsrc, fn, FA_READ | FA_OPEN_EXISTING);
    rc = jd_prepare(&jd, tjd_input, work, sz_work, fp);
    if (rc == JDR_OK)
        rc = jd_decomp(&jd, tjd_output, 0);
}

void DMA_Config(int ele, u8* buf)
{
    DMA1_Channel5->CCR |= DMA_CCR1_MEM2MEM;              // memory-to-memory mode
    DMA1_Channel5->CCR |= DMA_CCR1_PL;                   // priority: very high
    DMA1_Channel5->CCR &= ~DMA_CCR1_MSIZE;               // memory size: 8 bits
    DMA1_Channel5->CCR &= ~DMA_CCR1_PSIZE;               // peripheral size: 8 bits
    DMA1_Channel5->CCR |= DMA_CCR1_MINC;                 // increment memory address
    DMA1_Channel5->CCR |= DMA_CCR1_PINC;                 // increment peripheral address
    DMA1_Channel5->CCR |= DMA_CCR1_DIR;                  // direction: memory to peripheral
    DMA1_Channel5->CNDTR = ele;                          // number of transfers (buffer size)
    DMA1_Channel5->CPAR = (uint32_t)&LCD_WRITE_DATA;     // peripheral address
    DMA1_Channel5->CMAR = (uint32_t)buf;                 // memory address
    DMA1_Channel5->CCR |= DMA_CCR1_EN;                   // enable DMA1 channel 5
}
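
For reference, TJpgDec's input callback follows a simple contract: when `buff` is non-NULL it reads up to `nd` bytes and returns how many were read; when `buff` is NULL it skips `nd` bytes and returns `nd` on success (0 signals an error). In the code above, `load_jpg` opens the global `fsrc` but passes the separate `fp` to `jd_prepare`; this works only because `tjd_input` reads `fsrc` directly rather than the device pointer stored in `jd->device`. The sketch below illustrates the read/skip contract off-target, assuming standard C stdio (`FILE*`, `fread`, `fseek`) in place of FatFs and passing the file explicitly; `jpeg_input` is a hypothetical name, not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a TJpgDec input callback.
   stdio replaces FatFs here; fp plays the role of the device handle. */
size_t jpeg_input(FILE *fp, uint8_t *buff, size_t nd)
{
    assert(fp != NULL);
    if (buff) {
        /* Read up to nd bytes; a short count means the stream ran out. */
        return fread(buff, 1, nd, fp);
    }
    /* buff == NULL: skip nd bytes; report nd on success, 0 on failure. */
    return (fseek(fp, (long)nd, SEEK_CUR) == 0) ? nd : 0;
}
```

With a 6-byte stream, reading 2 bytes, skipping 2, and then asking for 4 returns 2, 2 and 2: the last read is short because only two bytes remain.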

I'm confident that DMA_Config works, because I can display a normal BMP file with it.
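
As a point of comparison: in TJpgDec each call to the output callback delivers one rectangular block whose `JRECT` bounds are inclusive, so the bitmap holds `(right - left + 1) * (bottom - top + 1)` pixels, at 3 bytes each for RGB888 or 2 bytes each for RGB565 (chosen at build time via the `JD_FORMAT` option, assuming a library version that has it). Because `DMA_Config` sets 8-bit memory and peripheral sizes, `CNDTR` counts bytes, so `ele` would need to be that full byte count for the whole block to reach the LCD. One possible source of a 4-byte figure is worth ruling out: `sizeof(bitmap)` gives the size of the pointer (4 bytes on a 32-bit Cortex-M), not the size of the block. The host-side sketch below illustrates this under those assumptions; `block_rect`, `block_bytes`, `send_block` and `output_block` are hypothetical names, and `send_block` stands in for `LCD_area()`/`DMA_Config()` by merely recording its arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of TJpgDec's JRECT: all four bounds are inclusive. */
typedef struct {
    uint16_t left, right, top, bottom;
} block_rect;

/* Bytes in one output block: width * height * bytes per pixel
   (3 for RGB888, 2 for RGB565). */
size_t block_bytes(const block_rect *r, unsigned bytes_per_pixel)
{
    assert(r->right >= r->left && r->bottom >= r->top);
    size_t w = (size_t)(r->right - r->left + 1);
    size_t h = (size_t)(r->bottom - r->top + 1);
    return w * h * bytes_per_pixel;
}

/* Hypothetical stand-in for the LCD/DMA transfer: it only records what it
   was asked to send so the callback logic can be checked on a host PC. */
static size_t last_count;
static const uint8_t *last_buf;

static void send_block(size_t count, const uint8_t *buf)
{
    last_count = count;
    last_buf = buf;
}

/* Output-callback shape: a real callback would first set the LCD window to
   rect, then push the whole bitmap (not just its first pixel).
   Returns 1 to let decompression continue. */
int output_block(void *bitmap, const block_rect *rect, unsigned bytes_per_pixel)
{
    send_block(block_bytes(rect, bytes_per_pixel), (const uint8_t *)bitmap);
    return 1;
}
```

For a 16x16 block, `block_bytes` gives 768 bytes in RGB888 and 512 in RGB565; an 8x8 block in RGB888 gives 192.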

I hope you can help me :)