How to improve the accuracy of the TensorFlow camera demo on iOS for a retrained graph

I have an Android app that was modeled after the TensorFlow Android demo for classifying images,

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android

The original app uses a TensorFlow graph (.pb) file to classify a generic set of images from Inception v3 (I think).

I then trained my own graph for my own images, following the instructions in the TensorFlow for Poets blog post,

https://petewarden.com/2016/02/28/tensorflow-for-poets/

and this worked very well in the Android app after changing the settings,

ClassifierActivity

private static final int INPUT_SIZE = 299;
private static final int IMAGE_MEAN = 128;
private static final float IMAGE_STD = 128.0f;
private static final String INPUT_NAME = "Mul";
private static final String OUTPUT_NAME = "final_result";
private static final String MODEL_FILE = "file:///android_asset/optimized_graph.pb";
private static final String LABEL_FILE =  "file:///android_asset/retrained_labels.txt";

To port the app to iOS, I then used the iOS camera demo, https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/ios/camera

and used the same graph file and changed the settings,

CameraExampleViewController.mm

// If you have your own model, modify this to the file name, and make sure
// you've added the file to your app resources too.
static NSString* model_file_name = @"tensorflow_inception_graph";
static NSString* model_file_type = @"pb";
// This controls whether we'll be loading a plain GraphDef proto, or a
// file created by the convert_graphdef_memmapped_format utility that wraps a
// GraphDef and parameter file that can be mapped into memory from file to
// reduce overall memory usage.
const bool model_uses_memory_mapping = false;
// If you have your own model, point this to the labels file.
static NSString* labels_file_name = @"imagenet_comp_graph_label_strings";
static NSString* labels_file_type = @"txt";
// These dimensions need to match those the model was trained with.
const int wanted_input_width = 299;
const int wanted_input_height = 299;
const int wanted_input_channels = 3;
const float input_mean = 128.0f;
const float input_std = 128.0f;
const std::string input_layer_name = "Mul";
const std::string output_layer_name = "final_result";

After this the app works on iOS, however...

The Android app performs much better than the iOS one at detecting the classified images. If I fill the camera viewport with the image, both perform similarly. But normally the image to detect is only part of the camera viewport. On Android this does not seem to matter much, but on iOS it matters a lot, to the point that iOS cannot classify the image.

My guess is that Android crops the camera viewport to the central 299x299 area, whereas iOS scales the whole camera viewport down to 299x299.

Can anyone confirm this? And does anyone know how to fix the iOS demo so that it better detects focused images (that is, make it crop)?
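To make the cropping idea concrete, here is a minimal C++ sketch of taking the central square of an interleaved 8-bit image without any scaling. `CenterCrop` and its parameters are hypothetical names for illustration and are not part of the TensorFlow demo; it assumes the source is at least `crop_size` pixels in both dimensions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the TensorFlow demo): copies the central
// crop_size x crop_size square of an interleaved 8-bit image, with no scaling.
// src_row_bytes is the stride of one source row in bytes.
std::vector<uint8_t> CenterCrop(const uint8_t* src, int src_width,
                                int src_height, int src_row_bytes,
                                int channels, int crop_size) {
  assert(crop_size <= src_width && crop_size <= src_height);
  const int margin_x = (src_width - crop_size) / 2;
  const int margin_y = (src_height - crop_size) / 2;
  std::vector<uint8_t> out(static_cast<std::size_t>(crop_size) * crop_size *
                           channels);
  for (int y = 0; y < crop_size; ++y) {
    // Start of the cropped span inside the source row.
    const uint8_t* in_row =
        src + (y + margin_y) * src_row_bytes + margin_x * channels;
    uint8_t* out_row =
        out.data() + static_cast<std::size_t>(y) * crop_size * channels;
    std::copy(in_row, in_row + crop_size * channels, out_row);
  }
  return out;
}
```

With a camera frame, `src` would be the locked base address of the pixel buffer and `src_row_bytes` its bytes-per-row value; the result could then be fed to the normalization loop in place of the scaled pixels.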

In the Android demo class,

ClassifierActivity.onPreviewSizeChosen()

rgbFrameBitmap = Bitmap.createBitmap(previewWidth, previewHeight, Config.ARGB_8888);
croppedBitmap = Bitmap.createBitmap(INPUT_SIZE, INPUT_SIZE, Config.ARGB_8888);

frameToCropTransform =
        ImageUtils.getTransformationMatrix(
            previewWidth, previewHeight,
            INPUT_SIZE, INPUT_SIZE,
            sensorOrientation, MAINTAIN_ASPECT);

cropToFrameTransform = new Matrix();
frameToCropTransform.invert(cropToFrameTransform);

and on iOS there is,

CameraExampleViewController.runCNNOnFrame()

const int sourceRowBytes = (int)CVPixelBufferGetBytesPerRow(pixelBuffer);
  const int image_width = (int)CVPixelBufferGetWidth(pixelBuffer);
  const int fullHeight = (int)CVPixelBufferGetHeight(pixelBuffer);

  CVPixelBufferLockFlags unlockFlags = kNilOptions;
  CVPixelBufferLockBaseAddress(pixelBuffer, unlockFlags);

  unsigned char *sourceBaseAddr =
      (unsigned char *)(CVPixelBufferGetBaseAddress(pixelBuffer));
  int image_height;
  unsigned char *sourceStartAddr;
  if (fullHeight <= image_width) {
    image_height = fullHeight;
    sourceStartAddr = sourceBaseAddr;
  } else {
    image_height = image_width;
    const int marginY = ((fullHeight - image_width) / 2);
    sourceStartAddr = (sourceBaseAddr + (marginY * sourceRowBytes));
  }
  const int image_channels = 4;

  assert(image_channels >= wanted_input_channels);
  tensorflow::Tensor image_tensor(
      tensorflow::DT_FLOAT,
      tensorflow::TensorShape(
          {1, wanted_input_height, wanted_input_width, wanted_input_channels}));
  auto image_tensor_mapped = image_tensor.tensor<float, 4>();
  tensorflow::uint8 *in = sourceStartAddr;
  float *out = image_tensor_mapped.data();
  for (int y = 0; y < wanted_input_height; ++y) {
    float *out_row = out + (y * wanted_input_width * wanted_input_channels);
    for (int x = 0; x < wanted_input_width; ++x) {
      const int in_x = (y * image_width) / wanted_input_width;
      const int in_y = (x * image_height) / wanted_input_height;
      tensorflow::uint8 *in_pixel =
          in + (in_y * image_width * image_channels) + (in_x * image_channels);
      float *out_pixel = out_row + (x * wanted_input_channels);
      for (int c = 0; c < wanted_input_channels; ++c) {
        out_pixel[c] = (in_pixel[c] - input_mean) / input_std;
      }
    }
  }

  CVPixelBufferUnlockBaseAddress(pixelBuffer, unlockFlags);

I think the problem is here,

tensorflow::uint8 *in_pixel =
    in + (in_y * image_width * image_channels) + (in_x * image_channels);
float *out_pixel = out_row + (x * wanted_input_channels);

My understanding is that this simply scales to size 299 by picking every x-th pixel instead of properly resampling the original image down to 299. This results in poor scaling and poor image recognition.
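The difference between picking every x-th pixel and actually resampling can be shown with a small C++ sketch. It assumes a single-channel image stored row by row; `ResizeNearest` and `ResizeBox` are hypothetical names, and the box filter is just one simple resampling choice, not what either demo does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Nearest-neighbour: each output pixel copies exactly one source pixel.
std::vector<uint8_t> ResizeNearest(const std::vector<uint8_t>& src, int src_w,
                                   int src_h, int dst_w, int dst_h) {
  assert(static_cast<int>(src.size()) == src_w * src_h);
  std::vector<uint8_t> dst(dst_w * dst_h);
  for (int y = 0; y < dst_h; ++y) {
    const int in_y = (y * src_h) / dst_h;
    for (int x = 0; x < dst_w; ++x) {
      const int in_x = (x * src_w) / dst_w;
      dst[y * dst_w + x] = src[in_y * src_w + in_x];
    }
  }
  return dst;
}

// Box filter: each output pixel is the average of the source pixels it covers.
std::vector<uint8_t> ResizeBox(const std::vector<uint8_t>& src, int src_w,
                               int src_h, int dst_w, int dst_h) {
  assert(static_cast<int>(src.size()) == src_w * src_h);
  std::vector<uint8_t> dst(dst_w * dst_h);
  for (int y = 0; y < dst_h; ++y) {
    const int y0 = (y * src_h) / dst_h;
    const int y1 = std::max(y0 + 1, ((y + 1) * src_h) / dst_h);
    for (int x = 0; x < dst_w; ++x) {
      const int x0 = (x * src_w) / dst_w;
      const int x1 = std::max(x0 + 1, ((x + 1) * src_w) / dst_w);
      int sum = 0;
      for (int yy = y0; yy < y1; ++yy) {
        for (int xx = x0; xx < x1; ++xx) {
          sum += src[yy * src_w + xx];
        }
      }
      dst[y * dst_w + x] = static_cast<uint8_t>(sum / ((y1 - y0) * (x1 - x0)));
    }
  }
  return dst;
}
```

On a fine checkerboard, nearest-neighbour sampling can land only on squares of one colour, while the box filter returns the mid-grey average, which is closer to what a properly downscaled image looks like.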

The solution would be to first scale the pixelBuffer to size 299. I tried this,

UIImage *uiImage = [self uiImageFromPixelBuffer: pixelBuffer];
float scaleFactor = (float)wanted_input_height / (float)fullHeight;
float newWidth = image_width * scaleFactor;
NSLog(@"width: %d, height: %d, scale: %f, newWidth: %f", image_width, fullHeight, scaleFactor, newWidth);
CGSize size = CGSizeMake(wanted_input_width, wanted_input_height);
UIGraphicsBeginImageContext(size);
[uiImage drawInRect:CGRectMake(0, 0, newWidth, size.height)];
UIImage *destImage = UIGraphicsGetImageFromCurrentImageContext();
UIGraphicsEndImageContext();
pixelBuffer = [self pixelBufferFromCGImage: destImage.CGImage];

and to convert the image back into a pixel buffer,

- (CVPixelBufferRef) pixelBufferFromCGImage: (CGImageRef) image
{
    NSDictionary *options = @{
                              (NSString*)kCVPixelBufferCGImageCompatibilityKey : @YES,
                              (NSString*)kCVPixelBufferCGBitmapContextCompatibilityKey : @YES,
                              };

    CVPixelBufferRef pxbuffer = NULL;
    CVReturn status = CVPixelBufferCreate(kCFAllocatorDefault, CGImageGetWidth(image),
                                          CGImageGetHeight(image), kCVPixelFormatType_32ARGB, (__bridge CFDictionaryRef) options,
                                          &pxbuffer);
    if (status!=kCVReturnSuccess) {
        NSLog(@"Operation failed");
    }
    NSParameterAssert(status == kCVReturnSuccess && pxbuffer != NULL);

    CVPixelBufferLockBaseAddress(pxbuffer, 0);
    void *pxdata = CVPixelBufferGetBaseAddress(pxbuffer);

    CGColorSpaceRef rgbColorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef context = CGBitmapContextCreate(pxdata, CGImageGetWidth(image),
                                                 CGImageGetHeight(image), 8, 4*CGImageGetWidth(image), rgbColorSpace,
                                                 kCGImageAlphaNoneSkipFirst);
    NSParameterAssert(context);

    CGContextConcatCTM(context, CGAffineTransformMakeRotation(0));
    CGAffineTransform flipVertical = CGAffineTransformMake( 1, 0, 0, -1, 0, CGImageGetHeight(image) );
    CGContextConcatCTM(context, flipVertical);
    CGAffineTransform flipHorizontal = CGAffineTransformMake( -1.0, 0.0, 0.0, 1.0, CGImageGetWidth(image), 0.0 );
    CGContextConcatCTM(context, flipHorizontal);

    CGContextDrawImage(context, CGRectMake(0, 0, CGImageGetWidth(image),
                                           CGImageGetHeight(image)), image);
    CGColorSpaceRelease(rgbColorSpace);
    CGContextRelease(context);

    CVPixelBufferUnlockBaseAddress(pxbuffer, 0);
    return pxbuffer;
}

- (UIImage*) uiImageFromPixelBuffer: (CVPixelBufferRef) pixelBuffer {
    CIImage *ciImage = [CIImage imageWithCVPixelBuffer: pixelBuffer];

    CIContext *temporaryContext = [CIContext contextWithOptions:nil];
    CGImageRef videoImage = [temporaryContext
                             createCGImage:ciImage
                             fromRect:CGRectMake(0, 0,
                                                 CVPixelBufferGetWidth(pixelBuffer),
                                                 CVPixelBufferGetHeight(pixelBuffer))];

    UIImage *uiImage = [UIImage imageWithCGImage:videoImage];
    CGImageRelease(videoImage);
    return uiImage;
}

I'm not sure this is the best way to resize, but it worked. However, it seemed to make the image classification even worse, not better...

Any ideas, or is there a problem with the image conversion/resize?

Answer 1

Since you are not using the YOLO detector, the MAINTAIN_ASPECT flag is set to false. Hence the image in the Android app is not cropped, it is scaled. However, in the code snippet provided I don't see the actual initialization of the flag. Confirm that the value of the flag is actually false in your app.

I know this is not a complete solution, but I hope it helps you debug the issue.
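As a rough illustration of why the flag matters, the C++ sketch below computes the scale factors for both settings. It assumes the usual meaning of a "maintain aspect" option: one uniform scale large enough to cover the target, with the overflow cropped. `ComputeScale` and `Scale` are hypothetical names, and whether ImageUtils.getTransformationMatrix behaves exactly this way should be checked against its source.

```cpp
#include <algorithm>
#include <cassert>

struct Scale {
  float x;
  float y;
};

// Hypothetical helper: scale factors that map a src_w x src_h frame onto a
// dst_w x dst_h input, with or without preserving the aspect ratio.
Scale ComputeScale(int src_w, int src_h, int dst_w, int dst_h,
                   bool maintain_aspect) {
  assert(src_w > 0 && src_h > 0);
  const float sx = static_cast<float>(dst_w) / src_w;
  const float sy = static_cast<float>(dst_h) / src_h;
  if (maintain_aspect) {
    // Uniform scale that fills the target; the longer side overflows and is
    // effectively cropped.
    const float s = std::max(sx, sy);
    return {s, s};
  }
  // Independent scales: the whole frame is squeezed into the target.
  return {sx, sy};
}
```

For a 640x480 preview and a 299x299 input, the aspect-preserving case uses about 0.62 on both axes, while the non-preserving case uses about 0.47 horizontally and 0.62 vertically, so the frame is distorted rather than cropped.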

Answer 2

TensorFlow object detection has default, standard configurations; below is the list of settings.

Important things to check, based on your input ML model:

→ model_file_name - set this to match your .pb file name,

→ model_uses_memory_mapping - it is up to you whether to enable this to reduce overall memory usage.

→ labels_file_name - this varies based on your label file name,

→ input_layer_name/output_layer_name - make sure you use the input/output layer names that you used when creating the graph (.pb) file.

Snippet:

// If you have your own model, modify this to the file name, and make sure
// you've added the file to your app resources too.
static NSString* model_file_name = @"graph";//@"tensorflow_inception_graph";
static NSString* model_file_type = @"pb";
// This controls whether we'll be loading a plain GraphDef proto, or a
// file created by the convert_graphdef_memmapped_format utility that wraps a
// GraphDef and parameter file that can be mapped into memory from file to
// reduce overall memory usage.
const bool model_uses_memory_mapping = true;
// If you have your own model, point this to the labels file.
static NSString* labels_file_name = @"labels";//@"imagenet_comp_graph_label_strings";
static NSString* labels_file_type = @"txt";
// These dimensions need to match those the model was trained with.
const int wanted_input_width = 224;
const int wanted_input_height = 224;
const int wanted_input_channels = 3;
const float input_mean = 117.0f;
const float input_std = 1.0f;
const std::string input_layer_name = "input";
const std::string output_layer_name = "final_result";

For TensorFlow custom image detection, you can use the working snippet below:

→ For this process you just need to pass the UIImage.CGImage object,

NSString* RunInferenceOnImageResult(CGImageRef image) {
    tensorflow::SessionOptions options;

    tensorflow::Session* session_pointer = nullptr;
    tensorflow::Status session_status = tensorflow::NewSession(options, &session_pointer);
    if (!session_status.ok()) {
        std::string status_string = session_status.ToString();
        return [NSString stringWithFormat: @"Session create failed - %s",
                status_string.c_str()];
    }
    std::unique_ptr<tensorflow::Session> session(session_pointer);
    LOG(INFO) << "Session created.";

    tensorflow::GraphDef tensorflow_graph;
    LOG(INFO) << "Graph created.";

    NSString* network_path = FilePathForResourceNames(@"tensorflow_inception_graph", @"pb");
    PortableReadFileToProto([network_path UTF8String], &tensorflow_graph);

    LOG(INFO) << "Creating session.";
    tensorflow::Status s = session->Create(tensorflow_graph);
    if (!s.ok()) {
        LOG(ERROR) << "Could not create TensorFlow Graph: " << s;
        return @"";
    }

    // Read the label list
    NSString* labels_path = FilePathForResourceNames(@"imagenet_comp_graph_label_strings", @"txt");
    std::vector<std::string> label_strings;
    std::ifstream t;
    t.open([labels_path UTF8String]);
    std::string line;
    // Stop when getline fails, so no empty trailing label is appended.
    while (std::getline(t, line)) {
        label_strings.push_back(line);
    }
    t.close();

    // Read the Grace Hopper image.
    //NSString* image_path = FilePathForResourceNames(@"grace_hopper", @"jpg");
    int image_width;
    int image_height;
    int image_channels;
//    std::vector<tensorflow::uint8> image_data = LoadImageFromFile(
//                                                                  [image_path UTF8String], &image_width, &image_height, &image_channels);
    std::vector<tensorflow::uint8> image_data = LoadImageFromImage(image,&image_width, &image_height, &image_channels);
    const int wanted_width = 224;
    const int wanted_height = 224;
    const int wanted_channels = 3;
    const float input_mean = 117.0f;
    const float input_std = 1.0f;
    assert(image_channels >= wanted_channels);
    tensorflow::Tensor image_tensor(
                                    tensorflow::DT_FLOAT,
                                    tensorflow::TensorShape({
        1, wanted_height, wanted_width, wanted_channels}));
    auto image_tensor_mapped = image_tensor.tensor<float, 4>();
    tensorflow::uint8* in = image_data.data();
    // tensorflow::uint8* in_end = (in + (image_height * image_width * image_channels));
    float* out = image_tensor_mapped.data();
    for (int y = 0; y < wanted_height; ++y) {
        const int in_y = (y * image_height) / wanted_height;
        tensorflow::uint8* in_row = in + (in_y * image_width * image_channels);
        float* out_row = out + (y * wanted_width * wanted_channels);
        for (int x = 0; x < wanted_width; ++x) {
            const int in_x = (x * image_width) / wanted_width;
            tensorflow::uint8* in_pixel = in_row + (in_x * image_channels);
            float* out_pixel = out_row + (x * wanted_channels);
            for (int c = 0; c < wanted_channels; ++c) {
                out_pixel[c] = (in_pixel[c] - input_mean) / input_std;
            }
        }
    }

    NSString* result;
//    result = [NSString stringWithFormat: @"%@ - %lu, %s - %dx%d", result,
//              label_strings.size(), label_strings[0].c_str(), image_width, image_height];

    std::string input_layer = "input";
    std::string output_layer = "output";
    std::vector<tensorflow::Tensor> outputs;
    tensorflow::Status run_status = session->Run({{input_layer, image_tensor}},
                                                 {output_layer}, {}, &outputs);
    if (!run_status.ok()) {
        LOG(ERROR) << "Running model failed: " << run_status;
        tensorflow::LogAllRegisteredKernels();
        result = @"Error running model";
        return result;
    }
    tensorflow::string status_string = run_status.ToString();
    result = [NSString stringWithFormat: @"Status :%s\n",
              status_string.c_str()];

    tensorflow::Tensor* output = &outputs[0];
    const int kNumResults = 5;
    const float kThreshold = 0.1f;
    std::vector<std::pair<float, int> > top_results;
    GetTopN(output->flat<float>(), kNumResults, kThreshold, &top_results);

    std::stringstream ss;
    ss.precision(3);
    for (const auto& result : top_results) {
        const float confidence = result.first;
        const int index = result.second;

        ss << index << " " << confidence << "  ";

        // Write out the result as a string
        if (index < label_strings.size()) {
            // just for safety: theoretically, the output is under 1000 unless there
            // is some numerical issues leading to a wrong prediction.
            ss << label_strings[index];
        } else {
            ss << "Prediction: " << index;
        }

        ss << "\n";
    }

    LOG(INFO) << "Predictions: " << ss.str();

    tensorflow::string predictions = ss.str();
    result = [NSString stringWithFormat: @"%@ - %s", result,
              predictions.c_str()];

    return result;
}

Scaling the image to a custom width and height (C++ code snippet),

std::vector<uint8> LoadImageFromImage(CGImageRef image,
                                     int* out_width, int* out_height,
                                     int* out_channels) {

    const int width = (int)CGImageGetWidth(image);
    const int height = (int)CGImageGetHeight(image);
    const int channels = 4;
    CGColorSpaceRef color_space = CGColorSpaceCreateDeviceRGB();
    const int bytes_per_row = (width * channels);
    const int bytes_in_image = (bytes_per_row * height);
    std::vector<uint8> result(bytes_in_image);
    const int bits_per_component = 8;
    CGContextRef context = CGBitmapContextCreate(result.data(), width, height,
                                                 bits_per_component, bytes_per_row, color_space,
                                                 kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(color_space);
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), image);
    CGContextRelease(context);
    // The caller owns `image` (for example via UIImage.CGImage), so it is not
    // released here.

    *out_width = width;
    *out_height = height;
    *out_channels = channels;
    return result;
}

The function above lets you load the image data according to your custom ratio. The image size that gives high accuracy for TensorFlow classification is 224 x 224 pixels.

You need to call the LoadImageFromImage function above from RunInferenceOnImageResult, passing the actual width and height arguments along with the image reference.

Answer 3

Please change this code:

// If you have your own model, modify this to the file name, and make sure
// you've added the file to your app resources too.
static NSString* model_file_name = @"tensorflow_inception_graph";
static NSString* model_file_type = @"pb";
// This controls whether we'll be loading a plain GraphDef proto, or a
// file created by the convert_graphdef_memmapped_format utility that wraps a
// GraphDef and parameter file that can be mapped into memory from file to
// reduce overall memory usage.
const bool model_uses_memory_mapping = false;
// If you have your own model, point this to the labels file.
static NSString* labels_file_name = @"imagenet_comp_graph_label_strings";
static NSString* labels_file_type = @"txt";
// These dimensions need to match those the model was trained with.
const int wanted_input_width = 299;
const int wanted_input_height = 299;
const int wanted_input_channels = 3;
const float input_mean = 128.0f;
const float input_std = 1.0f;
const std::string input_layer_name = "Mul";
const std::string output_layer_name = "final_result";

Here, change: const float input_std = 1.0f;
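To see what changing input_std does to the values the network receives, here is a small C++ sketch of the same arithmetic used in the demo loop, (pixel - mean) / std. `Normalize` is a hypothetical helper name; which mean and std are correct depends on how the model was trained, so this only shows the resulting ranges.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the per-channel normalization applied before the
// tensor is fed to the graph.
float Normalize(uint8_t pixel, float mean, float stddev) {
  assert(stddev != 0.0f);
  return (static_cast<float>(pixel) - mean) / stddev;
}
```

With mean 128 and std 128, pixel values 0..255 map to roughly -1..1 (exactly -1.0 to 0.9921875). With mean 128 and std 1, the same pixels map to -128..127, a very different input range.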